How Random is arc4random_uniform?

I remember when I was in college and iTunes became a thing. One of the aspects of iTunes that I liked was their feature where you could randomize a playlist.

After using it for a while, I noticed it didn’t seem to quite work. I would make a playlist with ten songs by one artist and two songs by another, and every time I randomized the playlist, the two songs by the second artist would play one after another.

A lot of people complained about it. There were articles on the Internet about how random does not mean that the algorithm will always separate those two songs. If you shuffle a deck of cards there will periodically be sets of cards that settle into order. In later versions of iTunes there were options to make things more or less random, which didn’t make a whole lot of sense to me.

To this day, I am still bothered by behavior that doesn’t seem truly random. I play a few German-style board game apps on my iPhone and I notice that the dice rolls never seem to feel natural. In Settlers of Catan you roll two dice. Statistically, seven should be the most common total, followed by six and eight. Whenever I play this game, I will have three or four rolls in a row where the dice total is three, which is statistically improbable. If this happened just once, it wouldn’t be weird because stuff like that can happen in real life, but it happens every game.
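To put numbers on that intuition, here is a minimal sketch that counts how many of the 36 equally likely two-dice combinations produce each total. The variable names are just for illustration. A total of three can only happen two ways (1+2 and 2+1), so three of them in a row should come up roughly once in 5,832 tries.

```swift
// Count how many of the 36 equally likely (first, second) pairs give each total.
var ways = [Int: Int]()
for first in 1...6 {
    for second in 1...6 {
        ways[first + second, default: 0] += 1
    }
}

// Print each possible total with the number of combinations that produce it.
for total in 2...12 {
    print("\(total): \(ways[total, default: 0])/36")
}

// Probability of a total of three on a single roll, and on three rolls in a row.
let chanceOfThree = Double(ways[3, default: 0]) / 36.0
let threeInARow = chanceOfThree * chanceOfThree * chanceOfThree
print("Three threes in a row: \(threeInARow)")
```

Seven has six combinations, six and eight have five each, and three has only two.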

I don't want him touching me!


Random isn’t random. Humans are bad at making truly random patterns. In the pilot episode of Numb3rs, Charlie demonstrates this by having everyone create a “random” distribution of themselves in the room, which isn’t truly random because everyone is trying to maintain a certain distance from one another. (That scene is about 18 minutes into the episode, if you don’t want to watch the whole thing.) If it were a truly random distribution there would be clumps of people. Cornell has a page summarizing this if you’d rather read than watch.

Humans also mistake things that are random for not actually being random. Synchronicity is the belief that some events are so coincidental that they could not possibly be coincidences and must instead reflect the presence of a higher power. Looking for patterns in randomness this way is the basis for divination methods such as the Tarot and the I Ching. There are books about this phenomenon.

So if humans are bad at creating random sequences and see meaning in random patterns, then computers should be awesome at it, right? Well…

RC4 and ARC4

There are a lot of reasons that a computer programmer would need randomly generated values. Most games depend on having a lot of randomly generated values. You can’t play Solitaire without a “shuffled” deck. Having enemies randomly spawn, rolling dice, and a lot of other things depend on random values to work.

Even more importantly, randomly generated values are vital for cryptography and security. It is this purpose that spawned our most used random generation algorithms.

arc4random is the most common function family used to generate random numbers. A simple way to generate a random dice roll in Swift is:

let diceRoll = Int(arc4random_uniform(6) + 1)
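The _uniform part of the name matters. Taking a raw random value and using the remainder after dividing by 6 slightly favors the low results whenever the number of raw values isn’t a multiple of 6, and arc4random_uniform is documented as avoiding that "modulo bias." Here is a toy sketch of the problem. The 8-bit range is an assumption made to keep the numbers small; arc4random itself returns 32-bit values.

```swift
// Map every possible 8-bit value (0...255) onto 0...5 with a plain remainder
// and count how often each result appears. 256 is not a multiple of 6,
// so the results cannot all be equally likely.
var counts = [Int](repeating: 0, count: 6)
for value in 0...255 {
    counts[value % 6] += 1
}
print(counts)  // [43, 43, 43, 43, 42, 42]
```

The first four results each get one extra raw value, which is exactly the kind of small skew a uniform function is meant to remove.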

So what is arc4random and where did it come from?

In 1987 a man named Ron Rivest created the Rivest Cipher 4 (RC4) algorithm. He did this while working for RSA and thus they owned the algorithm.

RC4 is a stream cipher: it generates a pseudo-random stream of bytes that gets combined with the data, so it can encrypt messages of any length. The algorithm remained a trade secret until 1994, when a description of it was posted anonymously online. "RC4" is a trademark owned by RSA, so this new, public version of the algorithm was named ARC4.

Since then, researchers have found serious weaknesses in this cipher, so it’s a really bad idea to use it for encryption. But it still works for creating randomly generated content, so it is now commonly used in things like game programming.

How Computers “Generate” Randomness

There are two flavors of random number generation in computing: Pseudo-Random Number Generators (PRNG) and True Random Number Generators (TRNG).

The byte stream that RC4 produces comes from a PRNG, which means that arc4random, which was built on it, is also a PRNG.

PRNGs work by computing a fixed sequence of values from a starting value called a seed. These values are supposed to mimic what you would get if you had true randomness. If you took an intern and had them roll a die a hundred times and record the result they got (for experience, of course) and made it into a table, reading down that table would give you something that behaves like a PRNG.

So if you start out with the same seed, you’ll get the same results over and over again. In order to get different results, you need to use another seed.
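The seed behavior is easy to see with a toy generator. The sketch below is a simple linear congruential generator, which is not the algorithm behind arc4random; the names ToyGenerator and rolls(seed:count:) are made up for this illustration. It plugs into Swift’s standard RandomNumberGenerator protocol (Swift 4.2 or later).

```swift
// A toy linear congruential generator (LCG). This is NOT what arc4random uses;
// it only illustrates how a seed determines the entire sequence.
struct ToyGenerator: RandomNumberGenerator {
    private var state: UInt64

    init(seed: UInt64) {
        state = seed
    }

    mutating func next() -> UInt64 {
        // Constants from Knuth's MMIX generator; &* and &+ wrap on overflow.
        state = state &* 6364136223846793005 &+ 1442695040888963407
        return state
    }
}

// Roll a six-sided die `count` times with a generator built from `seed`.
func rolls(seed: UInt64, count: Int) -> [Int] {
    var generator = ToyGenerator(seed: seed)
    var results = [Int]()
    for _ in 0..<count {
        results.append(Int.random(in: 1...6, using: &generator))
    }
    return results
}

print(rolls(seed: 42, count: 10))
print(rolls(seed: 42, count: 10))  // same seed, same ten rolls
print(rolls(seed: 43, count: 10))  // a different seed will almost certainly differ
```

The first two lines of output always match each other, because nothing but the seed feeds into the sequence.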

For most of what you need to do with random number generation, this is good enough. If you just want to have a game on your iPhone where you need to randomly generate bad guys, this is just fine. It’s not fine if you’re trying to encrypt credit card or personal health information.

Random Number Playground

I wanted to test out how random arc4random is because I feel like there are numbers that get repeated constantly and it’s not really an even distribution.

I created a playground that you can access here. I am planning to add additional functionality to this over time, but it’s pretty bare bones at the moment.
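A bare-bones version of that kind of experiment might look like the sketch below. This is not the playground’s actual code: tallyRolls is a hypothetical name, and Int.random(in:) stands in for arc4random_uniform so the sketch also runs off Apple platforms.

```swift
// Roll a six-sided die `count` times and tally how often each face comes up.
// Index 0 holds the tally for face 1, index 5 the tally for face 6.
func tallyRolls(count: Int) -> [Int] {
    var tallies = [Int](repeating: 0, count: 6)
    for _ in 0..<count {
        let roll = Int.random(in: 1...6)  // stand-in for Int(arc4random_uniform(6) + 1)
        tallies[roll - 1] += 1
    }
    return tallies
}

// Print the tallies for a run of twenty rolls.
for (index, tally) in tallyRolls(count: 20).enumerated() {
    print("Number of \(index + 1): \(tally)")
}
```

Changing the count argument is all it takes to rerun the experiment with a bigger sample.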

I decided to run this die roll function twenty times. With six values, there should be about three rolls per number. Didn’t quite get that:

  • Number of 1: 7
  • Number of 2: 2
  • Number of 3: 1
  • Number of 4: 4
  • Number of 5: 2
  • Number of 6: 4

Yikes! This seems to support my Settlers of Catan theory that the game is set to screw me over by generating an excessive number of ones.

However, as anyone who does polling and clinical trials can tell you, twenty is not a large enough sample to be statistically meaningful.

So what happens if you run this a lot? Like a lot a lot? What if you ran this 2000 times? I got these results:

  • Number of 1: 348
  • Number of 2: 304
  • Number of 3: 327
  • Number of 4: 329
  • Number of 5: 347
  • Number of 6: 345

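As you can see, if this gets run a lot, then the numbers even out significantly. With 2000 rolls each number should come up about 333 times, and every count lands within 30 of that. The counts of ones, fives, and sixes differ from one another by at most three.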
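One rough way to check whether either run is actually suspicious is Pearson’s chi-square statistic, which adds up the squared gap between each observed count and the expected count, divided by the expected count. The sketch below applies it to the two sets of results above; chiSquare is a name made up for this example. For six faces (five degrees of freedom), the usual 5% cutoff is about 11.07, and a statistic below that is consistent with a fair die.

```swift
// Pearson's chi-square statistic against a fair die:
// the sum over all faces of (observed - expected)^2 / expected.
func chiSquare(observed: [Int]) -> Double {
    let total = Double(observed.reduce(0, +))
    let expected = total / Double(observed.count)
    var statistic = 0.0
    for count in observed {
        let difference = Double(count) - expected
        statistic += difference * difference / expected
    }
    return statistic
}

let twentyRolls = [7, 2, 1, 4, 2, 4]
let twoThousandRolls = [348, 304, 327, 329, 347, 345]

print(chiSquare(observed: twentyRolls))       // about 7.0
print(chiSquare(observed: twoThousandRolls))  // about 4.37
```

Both values sit under the cutoff, so even the lopsided twenty-roll run is within what a fair die produces by chance. Treat the twenty-roll figure loosely, though, since the test is only an approximation when the expected count per face is as small as three.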

Conclusions

There is a difference between snapshots of things at certain points in time and long term patterns of behavior. If you look at something like the stock market, stock prices can deviate wildly over the course of an hour, but if you look at long term trends, then they tend to even out.

It’s rather frustrating to play a game expecting the dice rolls to behave statistically. However, if everything in life behaved exactly as you expected it to, it wouldn’t really be random, would it?

Additional Links