Page 1 of 1

limits to dsb_rand

Posted: Mon Jan 27, 2014 9:19 am
by kaypy
Are there any particular limits to how far you can push dsb_rand?

I'm throwing some pretty big numbers at it so I'm wondering if I ought to be breaking it up to something like
dsb_rand(0,9999)*10000 + dsb_rand(0,9999)
where the exact details would depend on the number I am after and what dsb_rand can handle.

Re: limits to dsb_rand

Posted: Mon Jan 27, 2014 10:10 pm
by Sophia
You're fine. The largest range dsb_rand can handle is 2147483647. :mrgreen:

Re: limits to dsb_rand

Posted: Tue Jan 28, 2014 12:27 am
by kaypy
OK, I'll admit that's a seriously big number 8-)

Thanks

Re: limits to dsb_rand

Posted: Sat Aug 08, 2015 5:22 am
by kaypy
A correction: while dsb_rand may not *crash* for very large ranges, I wouldn't call the results of any range above about 3200 really viable.

a) dsb_rand seems to use "generate a randomish number in 0 .. 32767 and then modulus it into the required range" which means that:

b) if the range is greater than 32768, then the higher values have zero probability

c) if the range doesn't perfectly fit the modulus, then lower numbers get an additional chance to be selected. So if you do dsb(1,30000) then you have twice as much change of getting 100 than of 10000.

(Or rather, experimentally determining (b) and (c) leads me to believe (a))

At about 3200ish the difference between high and low probabilities drops below 10%, hence my initial statement

Re: limits to dsb_rand

Posted: Sat Aug 08, 2015 11:08 pm
by Sophia
Interesting. I admit I haven't used DSB's RNG with very large numbers, so there is a possibility there could be some big problems with it.

I'm curious what would happen in your experiments if you replace dsb_rand with Lua's built-in math.random and see if that changes anything. (Don't forget to seed it with math.randomseed first)
I'd also appreciate it if you'd post your actual data, rather than just your conclusions, because it'll make me easier to analyze/debug.

Re: limits to dsb_rand

Posted: Sun Aug 09, 2015 3:44 am
by kaypy
If I put

Code: Select all

local dist={}
local temp
for i=0,999999 do
	temp=dsb_rand(0,99999999)
	if (dist[temp] == nil) then
		dist[temp] = 1
	else
		dist[temp] = dist[temp]+1
	end
end
for k,v in pairs(dist) do
	__log(tostring(k) .. " -> " .. tostring(v))
end
in startup.lua I get

Code: Select all

Lua: 1 -> 34
Lua: 2 -> 35
Lua: 3 -> 24
Lua: 4 -> 36
...
lots of about the same number
...
Lua: 32763 -> 33
Lua: 32764 -> 27
Lua: 32765 -> 32
Lua: 32766 -> 21
Lua: 32767 -> 28
Lua: 0 -> 25
(double check: 30 x 32768 is about the number of loops - so I'm not getting random numbers that aren't being printed)

Re: limits to dsb_rand

Posted: Sun Aug 09, 2015 3:49 am
by kaypy
If I put

Code: Select all

local dist={}
local temp
for i=0,999999 do
	temp=dsb_rand(0,30000)
	if (dist[temp] == nil) then
		dist[temp] = 1
	else
		dist[temp] = dist[temp]+1
	end
end
for k,v in pairs(dist) do
	__log(tostring(k) .. " -> " .. tostring(v))
end
in startup.lua I get

Code: Select all

Lua: 1 -> 61
Lua: 2 -> 60
Lua: 3 -> 62
Lua: 4 -> 70
Lua: 5 -> 54
...
Lua: 2758 -> 57
Lua: 2759 -> 68
Lua: 2760 -> 58
Lua: 2761 -> 62
Lua: 2762 -> 61
Lua: 2763 -> 62
Lua: 2764 -> 69
Lua: 2765 -> 58
Lua: 2766 -> 73
Lua: 2767 -> 25
Lua: 2768 -> 28
Lua: 2769 -> 46
Lua: 2770 -> 38
Lua: 2771 -> 31
Lua: 2772 -> 34
Lua: 2773 -> 24
Lua: 2774 -> 45
Lua: 2775 -> 30
Lua: 2776 -> 29
Lua: 2777 -> 23
...
Lua: 29996 -> 31
Lua: 29997 -> 40
Lua: 29998 -> 24
Lua: 29999 -> 37
Lua: 30000 -> 28
Lua: 0 -> 59

Re: limits to dsb_rand

Posted: Sun Aug 09, 2015 3:59 am
by kaypy
Lua's random falls through to ANSI C random, which can have a range (depending on the implementation) "as low as 32767"

using

Code: Select all

math.randomseed(34892895636) -- arbitrary seed, for reproducability

local dist={}
local temp
for i=0,999999 do
	temp=math.random(0,9999999)
	if (dist[temp] == nil) then
		dist[temp] = 1
	else
		dist[temp] = dist[temp]+1
	end
end
for k,v in pairs(dist) do
	__log(tostring(k) .. " -> " .. tostring(v))
end
gives me

Code: Select all

Lua: 4846949 -> 32
Lua: 8172856 -> 42
Lua: 9617297 -> 29
Lua: 5109103 -> 28
Lua: 3472090 -> 24
...
Lua: 3209936 -> 35
Lua: 6089053 -> 27
Lua: 7075106 -> 18
Lua: 2226935 -> 26
Lua: 5237281 -> 35
where there are still 32768 different return values, but now spread across the target range instead of being the first values.

Re: limits to dsb_rand

Posted: Sun Aug 09, 2015 4:13 am
by kaypy
The overlap case for lua random is a bit trickier, so I added a histogram

Code: Select all

math.randomseed(34892895636) -- arbitrary seed, for reproducability

local dist={}
local hist={}
local temp
for i=0,999999 do
	temp=math.random(0,30000)
	if (dist[temp] == nil) then
		dist[temp] = 1
	else
		dist[temp] = dist[temp]+1
	end
end
for k,v in pairs(dist) do
	__log(tostring(k) .. " -> " .. tostring(v))
	if (hist[v] == nil) then
		hist[v] = 1
	else
		hist[v] = hist[v]+1
	end
end
__log("---hist--")
for k,v in pairs(hist) do
	__log(tostring(k) .. " => " .. tostring(v))
end
The initial output is not immediately obviously skewed

Code: Select all

Lua: 1 -> 25
Lua: 2 -> 27
Lua: 3 -> 33
Lua: 4 -> 31
Lua: 5 -> 29
Lua: 6 -> 33
Lua: 7 -> 32
Lua: 8 -> 36
Lua: 9 -> 30
Lua: 10 -> 68
Lua: 11 -> 35
Lua: 12 -> 28
Lua: 13 -> 36
...
although the result for 10 there probably gives you an idea for what is coming:

The histogram shows a distinctive double bell curve. The results with increased probability are just again spread over the range, so 10, 21, 32, etc are double sized

peaks, troughs and midpoints for brevity:

Code: Select all

Lua: 11 => 3
Lua: 20 => 318
Lua: 29 => 1980
Lua: 38 => 750
Lua: 47 => 50
Lua: 53 => 91
Lua: 59 => 155
Lua: 77 => 17
Lua: 96 => 1

Re: limits to dsb_rand

Posted: Sun Aug 09, 2015 4:26 am
by kaypy
Oh two additional things:

1) Is there sufficient data up there, or would you rather the full reams and reams of it? (I would be looking at putting a zip file somewhere rather than dumping all that to the forum)

2) I'm not sure that any of this actually constitutes a problem with dsb_rand, just a limitation of the system. If you use it within the limits, everything is fine. (eg if you are generating numbers in a sensible 1-100 range then the inaccuracies come out to about 0.3% which really isn't much)

(I would be more worried about "is this falling though to a C LCRNG which means it will inherit the usual issues with LCRNGs?" - but really, that is probably not a big deal either)

Re: limits to dsb_rand

Posted: Sun Aug 09, 2015 11:19 am
by kaypy
Oh, and in case anyone winds up here looking for larger dsb_rands, I'll leave this here. (nb I know there are statistical issues with this style of algorithm, but I'm not using it for cryptography, so it ought to be OK)

Code: Select all

-- random number generator suitable for ranges up to a couple hundred million
function rand_30bit(minval,maxval)
	-- determine required range
	local randrange = 1 + maxval - minval
	if (randrange == 1) then return minval end
	-- get maximum whole group (above this is 'unfair')
	local randmax = math.floor(32768 * 32768 / randrange) * randrange
	-- get useful random number (ie not in the unfair part of the range)
	local randnum = 32768 * 32768
	while randnum >= randmax do	
		randnum = dsb_rand(0,32767) * 32768 + dsb_rand(0,32767)
	end
	return minval + math.mod(randnum,randrange)
end

Re: limits to dsb_rand

Posted: Sun Aug 09, 2015 9:18 pm
by Sophia
Thanks, that was all very helpful. I don't need the full reams and reams; I think I've managed to figure out what's going on from what you've posted.

First of all, you're exactly right, it is using a C LCRNG (dsb_rand is more or less just a wrapper for stdlib rand) and apparently not a particularly robust implementation of one, either. Since both my code and Lua's built-in code have the same issue, I'm guessing the implementation of rand() on Windows (or, at least, the one you get if you're not using MSVC) only gives the bare minimum of 15 bits.

As you might already know, there are two fair ways to clamp a random number generator to a given range:
1) Use mod, and throw out values that will skew the distribution, like your code does. The disadvantage is this can occasionally slow the program down.
2) Treat the number as a fixed-point number and multiply it by the total range desired, like Lua does. The disadvantage is that for large ranges you might still have some weirdness, and some missing numbers.

There's also the method used by DSB, which is to use mod and not throw out the unfair values-- I wrote this code some 8 years ago and I just didn't think of it back then. This is unnoticeable with small numbers, but becomes problematic with larger ones, especially when it's only generating 15 random bits rather than 32.

Anyway, there are very fast and good implementations of the Mersenne Twister algorithm that didn't exist back when I first started working on DSB, so I think the best solution is probably to just completely swap out DSB's RNG for something more robust, which is what I'll do.

Re: limits to dsb_rand

Posted: Mon Aug 10, 2015 12:23 pm
by kaypy
Cool. I wasn't sure any change was necessary, but if there's a drop in replacement MT in a suitable license, a better RNG is always better 8-)

thanks