How Random is arc4random_uniform?

I remember when I was in college and iTunes became a thing. One of the aspects of iTunes that I liked was their feature where you could randomize a playlist.

After using it for a while, I noticed it didn’t seem to quite work. I would make a playlist with ten songs by one artist and two songs by another and every time I would randomize the playlist the two songs by one artist would always play one after another.

A lot of people complained about it. There were articles on the Internet about how random does not mean that the algorithm will always separate those two songs. If you shuffle a deck of cards there will periodically be sets of cards that settle into order. In later versions of iTunes there were options to make thing more or less random, which didn’t make a whole lot of sense to me.

To this day, I am still bothered by what seems to not be truly random modes of behavior. I play a few German-style board game apps on my iPhone and I notice that the dice rolls never seem to feel natural. In Settlers of Catan you roll two dice. Statistically, the dice should amount to seven as the most common dice combination, followed by six and eight. Whenever I play this game, I will have three or four rolls in a row where the dice total is three, which is statistically improbable. If this happened just once, it would be weird because stuff like that can happen in real life, but it happens every game.

I don't want him touching me!

I don’t want him touching me!

Random isn’t random. Humans are bad at making truly random patterns. In the pilot episode of Numb3rs, Charlie demonstrates this by having everyone create a “random” distribution of themselves in the room, which isn’t truly random because everyone is trying to maintain a certain distance from one another. (That scene is about 18 minutes into the episode, if you don’t want to watch the whole thing.) If it were a truly random distribution there would be clumps of people. Cornell has a page summarizing this if you’d rather read than watch.

Humans also mistake things that look random for not actually being random. Synchronicity is the practice of looking for patterns in randomness because it’s the believe that some things are so coincidental that they could not possibly be con incidents but the presence of a higher power. This is the basis for divination methods such as the Tarot and the iChing. There are books about this phenomenon.

So if humans are bad at creating random sequences and see meaning in random patterns, then computers should be awesome at it, right? Well…

RC4 and ARC4

There are a lot of reasons that a computer programmer would need randomly generated values. Most games depend on having a lot of randomly generated values. You can’t play Solitaire without a “shuffled” deck. Having enemies randomly spawn, rolling dice, and a lot of other things depend on random values to work.

Even more importantly, randomly generated values are vital for cryptography and security. It is this purpose that spawned our most used random generation algorithms.

Arc4Random is the most common command used to generate random numbers. A simple way to generate a random dice roll in Swift is:

let diceRoll = Int(arc4random_uniform(6) + 1)

So what is arc4Random and where did it come from?

In 1987 a man named Ron Rivest created the Rivest Cypher 4 (RC4) algorithm. He did this while working for RSA and thus they owned the algorithm.

RC4 is a stream cipher, allowing for varying lengths of bits to be encrypted. It remained secret and secure until 1994, when it was reverse engineered and the cypher was cracked. RC4 was a registered trademark owned by RSA, so this new, public algorithm was named ARC4.

Since this cypher has been cracked, it’s a really bad idea to use this to encrypt your programs. But it still works for creating randomly generated content, so it is now commonly used in things like game programing when you need randomly generated content.

How Computers “Generate” Randomness

There are two flavors of random number generation in computing: Pseudo-Random Number Generators (PRNG) and True Random Number Generators (TRNG).

RC4 is a PRNG, which means that arc4random is also a PRNG.

PRNGs work by generating a table of values from a seed. These values are supposed to mimic what you would get if you had true randomness. If you took an intern and had them roll a die a hundred times and record the result they got (for experience, of course) and made it into a table, you would have a PRNG.

So if you start out with the same seed, you’ll get the same results over and over again. In order to get different results, you need to use another seed.

For most of what you need to do with random number generation, this is good enough. If you just want to have a game on your iPhone where you need to randomly generate bad guys, this is just fine. It’s not fine if you’re trying to encrypt credit card or personal health information.

Random Number Playground

I wanted to test out how random arc4Random is because I feel like there are number that get repeated all the time constantly and it’s not really an even distribution.

I create a playground that you can access here. I am planning to add additional functionality to this over time, but it’s pretty bare bones at the moment.

I decided to run this die roll function twenty times. With six values, there should be about three rolls per number. Didn’t quite get that:

  • Number of 1: 7
  • Number of 2: 2
  • Number of 3: 1
  • Number of 4: 4
  • Number of 5: 2
  • Number of 6: 4

Yikes! This seems to buy into my Settlers of Catan theory that the game is set to screw me over by generating an excessive number of ones.

However, as anyone who does polling and clinical trials can tell you, twenty is not a large enough number to be statistically accurate.

So what happens if you run this a lot? Like a lot a lot? What if you ran this 2000 times? I got these results:

  • Number of 1: 348
  • Number of 2: 304
  • Number of 3: 327
  • Number of 4: 329
  • Number of 5: 347
  • Number of 6: 345

As you can see, if this gets run a lot, then the numbers even out significantly. The number of ones, fives, and sixes only deviates by three.

Conclusions

There is a difference between snapshots of things at certain points in time and long term patterns of behavior. If you look at something like the stock market, stock prices can deviate wildly over the course of an hour, but if you look at long term trends, then they tend to even out.

It’s rather frustrating to play a game expecting the dice rolls to behave statistically. However, if everything in life behaved exactly as you expected it to, it wouldn’t really be random, would it?

Additional Links

Caesar Cipher

I have been trying to do more programming exercises recently. One thing that I think is incredibly important when you are an engineer is to have an understanding of the problem you are trying to solve. If you’re not solving a problem, you wind up with Clever Code, and that’s just bad for everyone.

I have a casual interest in cryptography. It’s one of those things that always fascinated me but I never realized I could actually learn and apply to things.

I couldn’t sleep last night and I looked around for a book to read. I found my copy of The Code Book by Simon Singh.

I found a bunch of really interesting cryptographic methods that looked like they would be interesting code projects.

I am going to go through and do some of these in order of complexity and age. The older the method, the simpler it is. This makes sense because new cryptographic algorithms are only created in response to the most recent one being cracked.

I will be adding all of my projects to a Github repository.

I’ll probably hit a wall at some point where I can’t continue to do this, but for now I have a few that I can do and I am looking forward to figuring out how to implement these.

Working Caesar Cipher

Working Caesar Cipher

What is a Caesar Cipher?

A Caesar Cipher is an encryption technique where each letter of the alphabet is shifted by a certain offset and all the letters are substituted.

For example, if your offset is 3, then every time you have an “a” in your text that you are encrypting, it would be replaced by a “d.” “b” would be replaced by “e” and so on. Once you get to the end of the alphabet, you start at the beginning. “z” would be replaced by “c”, “y” would be replaced by “b”, and so forth.

This cipher was used for several hundred years, but it was vulnerable to being easily broken. In fact, there is a daily cryptographic puzzle that no one ever looks at published in the paper next to the comics.

Even though this specific cipher is fairly easy to break, it provides the basis for some more complex ciphers being used, which is why I am starting out with this one. I can build off of this one to work on more complicated problems.

What are our Requirements?

In order to implement a solution for this problem, it’s helpful to think through what this application has to do.

  1. Take in a string
  2. Choose an offset
  3. Check each letter in the string
  4. Shift the letter by the offset
  5. Return an encrypted string

The first part is easy. We simply use the built-in UI elements in iOS to create a text field that can contain the input text and the encrypted string.

Since we’re dealing with an offset where one letter will be substituted for another, it mades sense to use arrays. There should be a constant array that simply contains the entire alphabet with each letter constituting an element.

The second array is a little more tricky. We need to shift the elements in the array by a variable offset. You use the offset to figure out what the first element in the new array is.

After you get both arrays, you find the index of each letter in the first array and replace it with the correlating letter in the second array.

Implementation

I like to keep as much code as possible out of the view controller as possible, so I only have one line of code:

@IBAction func encryptText(sender: AnyObject) {
    encryptedText.text = encryptedString(inputText.text)
}

I have a helper function that takes the text input as a parameter and outputs the encrypted string and some other parts to the text output. Everything else is in a separate file of stand alone functions.

First thing I wanted to create was my immutable alphabet array:

let alphabet:[Character] = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]

Next up, I need my offset. I could make that something the user inputs, but it was easier to just have it be a randomly generated number:

func randomNumber() -> Int {
    return Int(arc4random_uniform(25) + 1)
}

I want this to return a number between one and twenty-five. Since the array starts at zero and a zero offset means no encryption, it makes sense to have the lowest number be a one. Since the alphabet is always going to be 26 characters long, having a hard-coded value of twenty-five makes sense in this scenario.

I would like to unit test this method, but I don’t know how to do that. Since the output is random, I don’t really know how to generate a useful unit test. At this point I have to just have faith that this piece of code won’t return a number that is outside of the index of my array.

I don’t like leaving things to chance, so in my code to create my encryption array I deal with this possibility to avoid making my app crash:

func encryptArray(var offset:Int) -> [Character] {
    
    if offset < 1 {
        offset = 1
    }
    
    if offset > 25 {
        offset = 25
    }
    
    var array = alphabet
    let slice = array[0.. < offset]
    array.removeRange(0.. < offset)
    array.appendContentsOf(slice)
    
    return array
    
}

The last few lines of code are copying the range at the beginning of the array, hacking it off of the front, the pasting it to the back.

I feel like there should be a more efficient way of doing this, but it works for now, so huzzah.

One of the things I really wanted to do was figure out a way to use map(). So far I have not found a good reason to use it. I have seen clever code from people that is longer than mine but more functional. I really wanted to find a situation where mapping a function made sense. Generating an array of characters that are offset by a certain fixed amount from another array of characters fits this to a tee.

Before I can use map(), I need to set up the function that is going to be mapped to the array:

func encryptCharacter(input:Character, encryptedArray:[Character]) -> Character {
    
    if alphabet.contains(input) {
        let index = alphabet.indexOf(input)
        return encryptedArray[index!]
    } else {
        return input
    }
}

One case I had to handle was if the character was not a letter. We’re dealing with spaces, commas, periods, etc… I do a check to make sure that the array contains the character being passed in. If it doesn’t, then it is one of the other characters we are returning directly.

Now that we have our processing function, and all of our other functions, for that matter, it’s time to pull it all together:

func encryptedString(stringToEncrypt:String) -> String {
    let lowercaseStringToEncrypt = stringToEncrypt.lowercaseString
    let offset = randomNumber()
    let encryptionArray = encryptArray(offset)
    let array = [Character](lowercaseStringToEncrypt.characters)
    
    let encryptedArray = array.map({encryptCharacter($0, encryptedArray: encryptionArray)})
    
    let encryptedString = String(encryptedArray)
    
    return "(encryptedString)
 The offset was (offset).
 (alphabet[0]) = (encryptionArray[0])"
}

This isn't working quite right...

This isn't working quite right...

The first time I ran this project, I realized that my analysis is case sensitive. I had letters that would fall through and be returned directly because the algorithm did not catch that they were letters. I resolved this by forcing all of the characters to be lowercase so that they would be processed. I think I could have made another array of upper case letters and add more complexity to the encryption process, but I didn’t feel like it. This was a simple fix.

Capital letters don't get no respect.

Capital letters don't get no respect.

I am creating variable to hold the results of all of my helper functions. Where necessary, I am passing in the results of some of these function to the functions that follow. We have a nice line of parts that all connect to one another in only one way.

In order to pass each character in, I need to create an array of characters from the string input in the view controller. There is a nice handy function that makes this process somewhat easy.

I was able to effectively use my nice map() function to process each character in the array of characters.

After going through and encrypting each character in the array, we can bring it back together again in a String.

Lastly, I wanted to include some information about the encrypted string beyond just the string. I wanted to include the offset and show what the first element in the encrypted array was.

Conclusion

I am not certain how to write unit tests for two of my stand alone function. They have randomly generated outputs. I think there must be a way to prove these work, even if it isn’t with unit tests. If anyone has suggestions about how to proceed with this I would appreciate it.

Overall, this was a nice little project to work on before heading off to RWDevCon. I have a larger independent project I would like to work on that I didn’t want to start when I would be gone for a week.

I find encoding and data processing to be fascinating. I would like to work on more stuff like this, even if it is just for my blog. I am hoping to have a more complicated encryption project on here soon.