limits to dsb_rand
Moderator: Sophia
Forum rules
Please read the Forum rules and policies before posting.
Please read the Forum rules and policies before posting.
limits to dsb_rand
Are there any particular limits to how far you can push dsb_rand?
I'm throwing some pretty big numbers at it so I'm wondering if I ought to be breaking it up to something like
dsb_rand(0,9999)*10000 + dsb_rand(0,9999)
where the exact details would depend on the number I am after and what dsb_rand can handle.
I'm throwing some pretty big numbers at it so I'm wondering if I ought to be breaking it up to something like
dsb_rand(0,9999)*10000 + dsb_rand(0,9999)
where the exact details would depend on the number I am after and what dsb_rand can handle.
Friends don't let friends eat worm round
Re: limits to dsb_rand
OK, I'll admit that's a seriously big number 8-)
Thanks
Thanks
Friends don't let friends eat worm round
Re: limits to dsb_rand
A correction: while dsb_rand may not *crash* for very large ranges, I wouldn't call the results of any range above about 3200 really viable.
a) dsb_rand seems to use "generate a randomish number in 0 .. 32767 and then modulus it into the required range" which means that:
b) if the range is greater than 32768, then the higher values have zero probability
c) if the range doesn't perfectly fit the modulus, then lower numbers get an additional chance to be selected. So if you do dsb(1,30000) then you have twice as much change of getting 100 than of 10000.
(Or rather, experimentally determining (b) and (c) leads me to believe (a))
At about 3200ish the difference between high and low probabilities drops below 10%, hence my initial statement
a) dsb_rand seems to use "generate a randomish number in 0 .. 32767 and then modulus it into the required range" which means that:
b) if the range is greater than 32768, then the higher values have zero probability
c) if the range doesn't perfectly fit the modulus, then lower numbers get an additional chance to be selected. So if you do dsb(1,30000) then you have twice as much change of getting 100 than of 10000.
(Or rather, experimentally determining (b) and (c) leads me to believe (a))
At about 3200ish the difference between high and low probabilities drops below 10%, hence my initial statement
Friends don't let friends eat worm round
- Sophia
- Concise and Honest
- Posts: 4240
- Joined: Thu Sep 12, 2002 9:50 pm
- Location: Nowhere in particular
- Contact:
Re: limits to dsb_rand
Interesting. I admit I haven't used DSB's RNG with very large numbers, so there is a possibility there could be some big problems with it.
I'm curious what would happen in your experiments if you replace dsb_rand with Lua's built-in math.random and see if that changes anything. (Don't forget to seed it with math.randomseed first)
I'd also appreciate it if you'd post your actual data, rather than just your conclusions, because it'll make me easier to analyze/debug.
I'm curious what would happen in your experiments if you replace dsb_rand with Lua's built-in math.random and see if that changes anything. (Don't forget to seed it with math.randomseed first)
I'd also appreciate it if you'd post your actual data, rather than just your conclusions, because it'll make me easier to analyze/debug.
Re: limits to dsb_rand
If I put
in startup.lua I get
(double check: 30 x 32768 is about the number of loops - so I'm not getting random numbers that aren't being printed)
Code: Select all
local dist={}
local temp
for i=0,999999 do
temp=dsb_rand(0,99999999)
if (dist[temp] == nil) then
dist[temp] = 1
else
dist[temp] = dist[temp]+1
end
end
for k,v in pairs(dist) do
__log(tostring(k) .. " -> " .. tostring(v))
end
Code: Select all
Lua: 1 -> 34
Lua: 2 -> 35
Lua: 3 -> 24
Lua: 4 -> 36
...
lots of about the same number
...
Lua: 32763 -> 33
Lua: 32764 -> 27
Lua: 32765 -> 32
Lua: 32766 -> 21
Lua: 32767 -> 28
Lua: 0 -> 25
Last edited by kaypy on Sun Aug 09, 2015 3:50 am, edited 1 time in total.
Friends don't let friends eat worm round
Re: limits to dsb_rand
If I put
in startup.lua I get
Code: Select all
local dist={}
local temp
for i=0,999999 do
temp=dsb_rand(0,30000)
if (dist[temp] == nil) then
dist[temp] = 1
else
dist[temp] = dist[temp]+1
end
end
for k,v in pairs(dist) do
__log(tostring(k) .. " -> " .. tostring(v))
end
Code: Select all
Lua: 1 -> 61
Lua: 2 -> 60
Lua: 3 -> 62
Lua: 4 -> 70
Lua: 5 -> 54
...
Lua: 2758 -> 57
Lua: 2759 -> 68
Lua: 2760 -> 58
Lua: 2761 -> 62
Lua: 2762 -> 61
Lua: 2763 -> 62
Lua: 2764 -> 69
Lua: 2765 -> 58
Lua: 2766 -> 73
Lua: 2767 -> 25
Lua: 2768 -> 28
Lua: 2769 -> 46
Lua: 2770 -> 38
Lua: 2771 -> 31
Lua: 2772 -> 34
Lua: 2773 -> 24
Lua: 2774 -> 45
Lua: 2775 -> 30
Lua: 2776 -> 29
Lua: 2777 -> 23
...
Lua: 29996 -> 31
Lua: 29997 -> 40
Lua: 29998 -> 24
Lua: 29999 -> 37
Lua: 30000 -> 28
Lua: 0 -> 59
Friends don't let friends eat worm round
Re: limits to dsb_rand
Lua's random falls through to ANSI C random, which can have a range (depending on the implementation) "as low as 32767"
using
gives me
where there are still 32768 different return values, but now spread across the target range instead of being the first values.
using
Code: Select all
math.randomseed(34892895636) -- arbitrary seed, for reproducability
local dist={}
local temp
for i=0,999999 do
temp=math.random(0,9999999)
if (dist[temp] == nil) then
dist[temp] = 1
else
dist[temp] = dist[temp]+1
end
end
for k,v in pairs(dist) do
__log(tostring(k) .. " -> " .. tostring(v))
end
Code: Select all
Lua: 4846949 -> 32
Lua: 8172856 -> 42
Lua: 9617297 -> 29
Lua: 5109103 -> 28
Lua: 3472090 -> 24
...
Lua: 3209936 -> 35
Lua: 6089053 -> 27
Lua: 7075106 -> 18
Lua: 2226935 -> 26
Lua: 5237281 -> 35
Friends don't let friends eat worm round
Re: limits to dsb_rand
The overlap case for lua random is a bit trickier, so I added a histogram
The initial output is not immediately obviously skewed
although the result for 10 there probably gives you an idea for what is coming:
The histogram shows a distinctive double bell curve. The results with increased probability are just again spread over the range, so 10, 21, 32, etc are double sized
peaks, troughs and midpoints for brevity:
Code: Select all
math.randomseed(34892895636) -- arbitrary seed, for reproducability
local dist={}
local hist={}
local temp
for i=0,999999 do
temp=math.random(0,30000)
if (dist[temp] == nil) then
dist[temp] = 1
else
dist[temp] = dist[temp]+1
end
end
for k,v in pairs(dist) do
__log(tostring(k) .. " -> " .. tostring(v))
if (hist[v] == nil) then
hist[v] = 1
else
hist[v] = hist[v]+1
end
end
__log("---hist--")
for k,v in pairs(hist) do
__log(tostring(k) .. " => " .. tostring(v))
end
Code: Select all
Lua: 1 -> 25
Lua: 2 -> 27
Lua: 3 -> 33
Lua: 4 -> 31
Lua: 5 -> 29
Lua: 6 -> 33
Lua: 7 -> 32
Lua: 8 -> 36
Lua: 9 -> 30
Lua: 10 -> 68
Lua: 11 -> 35
Lua: 12 -> 28
Lua: 13 -> 36
...
The histogram shows a distinctive double bell curve. The results with increased probability are just again spread over the range, so 10, 21, 32, etc are double sized
peaks, troughs and midpoints for brevity:
Code: Select all
Lua: 11 => 3
Lua: 20 => 318
Lua: 29 => 1980
Lua: 38 => 750
Lua: 47 => 50
Lua: 53 => 91
Lua: 59 => 155
Lua: 77 => 17
Lua: 96 => 1
Friends don't let friends eat worm round
Re: limits to dsb_rand
Oh two additional things:
1) Is there sufficient data up there, or would you rather the full reams and reams of it? (I would be looking at putting a zip file somewhere rather than dumping all that to the forum)
2) I'm not sure that any of this actually constitutes a problem with dsb_rand, just a limitation of the system. If you use it within the limits, everything is fine. (eg if you are generating numbers in a sensible 1-100 range then the inaccuracies come out to about 0.3% which really isn't much)
(I would be more worried about "is this falling though to a C LCRNG which means it will inherit the usual issues with LCRNGs?" - but really, that is probably not a big deal either)
1) Is there sufficient data up there, or would you rather the full reams and reams of it? (I would be looking at putting a zip file somewhere rather than dumping all that to the forum)
2) I'm not sure that any of this actually constitutes a problem with dsb_rand, just a limitation of the system. If you use it within the limits, everything is fine. (eg if you are generating numbers in a sensible 1-100 range then the inaccuracies come out to about 0.3% which really isn't much)
(I would be more worried about "is this falling though to a C LCRNG which means it will inherit the usual issues with LCRNGs?" - but really, that is probably not a big deal either)
Friends don't let friends eat worm round
Re: limits to dsb_rand
Oh, and in case anyone winds up here looking for larger dsb_rands, I'll leave this here. (nb I know there are statistical issues with this style of algorithm, but I'm not using it for cryptography, so it ought to be OK)
Code: Select all
-- random number generator suitable for ranges up to a couple hundred million
function rand_30bit(minval,maxval)
-- determine required range
local randrange = 1 + maxval - minval
if (randrange == 1) then return minval end
-- get maximum whole group (above this is 'unfair')
local randmax = math.floor(32768 * 32768 / randrange) * randrange
-- get useful random number (ie not in the unfair part of the range)
local randnum = 32768 * 32768
while randnum >= randmax do
randnum = dsb_rand(0,32767) * 32768 + dsb_rand(0,32767)
end
return minval + math.mod(randnum,randrange)
end
Friends don't let friends eat worm round
- Sophia
- Concise and Honest
- Posts: 4240
- Joined: Thu Sep 12, 2002 9:50 pm
- Location: Nowhere in particular
- Contact:
Re: limits to dsb_rand
Thanks, that was all very helpful. I don't need the full reams and reams; I think I've managed to figure out what's going on from what you've posted.
First of all, you're exactly right, it is using a C LCRNG (dsb_rand is more or less just a wrapper for stdlib rand) and apparently not a particularly robust implementation of one, either. Since both my code and Lua's built-in code have the same issue, I'm guessing the implementation of rand() on Windows (or, at least, the one you get if you're not using MSVC) only gives the bare minimum of 15 bits.
As you might already know, there are two fair ways to clamp a random number generator to a given range:
1) Use mod, and throw out values that will skew the distribution, like your code does. The disadvantage is this can occasionally slow the program down.
2) Treat the number as a fixed-point number and multiply it by the total range desired, like Lua does. The disadvantage is that for large ranges you might still have some weirdness, and some missing numbers.
There's also the method used by DSB, which is to use mod and not throw out the unfair values-- I wrote this code some 8 years ago and I just didn't think of it back then. This is unnoticeable with small numbers, but becomes problematic with larger ones, especially when it's only generating 15 random bits rather than 32.
Anyway, there are very fast and good implementations of the Mersenne Twister algorithm that didn't exist back when I first started working on DSB, so I think the best solution is probably to just completely swap out DSB's RNG for something more robust, which is what I'll do.
First of all, you're exactly right, it is using a C LCRNG (dsb_rand is more or less just a wrapper for stdlib rand) and apparently not a particularly robust implementation of one, either. Since both my code and Lua's built-in code have the same issue, I'm guessing the implementation of rand() on Windows (or, at least, the one you get if you're not using MSVC) only gives the bare minimum of 15 bits.
As you might already know, there are two fair ways to clamp a random number generator to a given range:
1) Use mod, and throw out values that will skew the distribution, like your code does. The disadvantage is this can occasionally slow the program down.
2) Treat the number as a fixed-point number and multiply it by the total range desired, like Lua does. The disadvantage is that for large ranges you might still have some weirdness, and some missing numbers.
There's also the method used by DSB, which is to use mod and not throw out the unfair values-- I wrote this code some 8 years ago and I just didn't think of it back then. This is unnoticeable with small numbers, but becomes problematic with larger ones, especially when it's only generating 15 random bits rather than 32.
Anyway, there are very fast and good implementations of the Mersenne Twister algorithm that didn't exist back when I first started working on DSB, so I think the best solution is probably to just completely swap out DSB's RNG for something more robust, which is what I'll do.
Re: limits to dsb_rand
Cool. I wasn't sure any change was necessary, but if there's a drop in replacement MT in a suitable license, a better RNG is always better 8-)
thanks
thanks
Friends don't let friends eat worm round