
Random Data

I have recently been messing around with a few “Bible Code” generators, purely to see what I could find. For those who don’t know, the Bible Code is the idea that hidden words can be found in the Bible by picking a starting letter and then reading on at a fixed interval, skipping a set number of letters (5, say, or possibly more) each time. The idea is flawed: several statisticians have pointed out that, given a large enough volume of text, you can find all sorts of “hidden” messages.
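To make the skipping idea concrete, here is a minimal sketch of reading letters at a fixed interval. The function name els and its arguments are made up for illustration; this is not how any particular Bible Code generator works, only the basic "start here, then step by N" operation.

```shell
# els TEXT START SKIP LENGTH (hypothetical helper)
# Print up to LENGTH letters from TEXT, beginning at 1-based position START
# and moving forward SKIP letters each time.
els() {
    printf '%s' "$1" | awk -v start="$2" -v skip="$3" -v len="$4" '{
        out = ""
        for (i = 0; i < len; i++) {
            pos = start + i * skip
            if (pos > length($0)) break   # ran off the end of the text
            out = out substr($0, pos, 1)
        }
        print out
    }'
}

# Every 5th letter of the alphabet, starting from the first.
els "abcdefghijklmnopqrstuvwxyz" 1 5 5   # prints: afkpu
```

A real search would try many start positions and skip values and look the results up in a word list, which is exactly why a long enough text tends to produce matches.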

I’ll save my results for another blog post, since I have only just started doing some tests. This blog post details how I set up my comparison text. In order to perform the tests, I decided to use a text version of the King James Bible, and also a randomly generated text of the exact same file-size.

Getting the KJB text was easy enough, as Project Gutenberg has it for free download. All I had to do was remove the long introduction from the beginning, stopping when I got to Genesis 1:1. I then needed to strip everything other than the 26 letters of the alphabet, because the Bible Code algorithms only accept these. So numbers, spaces, carriage returns, apostrophes, colons all had to go. On Linux systems, this is easily done:

dd if=kjv10.txt | grep -ao "[A-Za-z]" | tr -d "\n" > bible-text.txt

dd is a command that copies and converts files, and I gave it the full King James text as its input file (if=kjv10.txt). With no output file specified, dd writes to standard output, so I piped this into grep, telling it to output only (-o) the characters that matched either A-Z (uppercase) or a-z (lowercase), and to treat the input as text (-a). grep prints every match on its own line, so I fed that result through tr, which, when given the “-d” argument, deletes every occurrence of the listed characters from its input. “\n” is the standard way of writing a newline as a character. The result of all these commands is one long string of letters, saved to the file “bible-text.txt”.
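For comparison, the same filtering can be sketched with tr alone, using -c to complement the set so that everything that is not a letter gets deleted. This is an alternative to the pipeline above, not the command I actually ran; the function name letters_only is made up, and LC_ALL=C is an assumption to keep the letter ranges byte-based regardless of locale.

```shell
# letters_only (hypothetical helper): read stdin, write only the ASCII
# letters A-Z and a-z. -c complements the set, -d deletes what is left.
letters_only() {
    LC_ALL=C tr -cd 'A-Za-z'
}

# Small demo on made-up sample input.
printf 'In the beginning, God created:\n1:1 heaven & earth.\n' | letters_only
echo   # the filtered text has no trailing newline, so add one for display

# On the real file this would be: letters_only < kjv10.txt > bible-text.txt
```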

A simple command for listing directory contents told me the size of the file.

ls -l

Now I had a file that was 3,224,520 bytes (3.2 megabytes) in file-size, and I needed to procure random data of the exact same size. This was done by a couple of slight modifications to the original command:

dd if=/dev/urandom | grep -ao "[A-Za-z]" | tr -d "\n" | dd bs=1 count=3224520 > random-text.txt

As you can see, the input for the first dd command is now /dev/urandom, the random number generator built into Linux, and its output is piped into the same commands to remove unwanted characters. The result is then piped into a second dd command, set up so that it writes a specific amount of data. The bs value is the block size, which for ease of calculation I set to 1 byte, so that count (the number of blocks to copy) is simply the output size in bytes: 3224520. The output of these commands is a string of random letters, equal in size to the string in bible-text.txt, and I saved this to the file random-text.txt.
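A shorter variant of the same idea, as a rough sketch: tr filters the random bytes down to letters and head -c cuts the stream off after a set number of bytes. The function name random_letters is made up, and head -c is assumed to be available (it is on GNU and BSD systems, though it is not in POSIX). Because dd with bs=1 copies one byte per read and write, this version may also finish sooner, but I have not timed it.

```shell
# random_letters N (hypothetical helper): print exactly N random ASCII
# letters to stdout. LC_ALL=C makes tr treat /dev/urandom as raw bytes.
random_letters() {
    LC_ALL=C tr -cd 'A-Za-z' < /dev/urandom | head -c "$1"
}

# Demo: 40 random letters, followed by a newline for display.
random_letters 40
echo

# To match an existing file's size, the count could come from wc, e.g.:
#   random_letters "$(wc -c < bible-text.txt)" > random-text.txt
```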

A further use of the ls -l command confirms that both files are indeed the same size.
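If this ends up in a script, the size check can be automated rather than read off by eye. The sketch below uses wc -c for the byte counts; the function name same_size and the demo files are made up for illustration.

```shell
# same_size FILE1 FILE2 (hypothetical helper): succeed if both files
# contain the same number of bytes. tr strips the padding that some
# wc implementations put in front of the number.
same_size() {
    [ "$(wc -c < "$1" | tr -d ' ')" -eq "$(wc -c < "$2" | tr -d ' ')" ]
}

# Demo on two throwaway files with made-up contents of equal length.
a=$(mktemp)
b=$(mktemp)
printf 'abcde' > "$a"
printf 'VWXYZ' > "$b"
if same_size "$a" "$b"; then
    echo "sizes match"
else
    echo "sizes differ"
fi
rm -f "$a" "$b"
```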

This method of generating a set size of random data is fast and easily scripted, which is why I thought it should be shared. Now I need ideas for experiments to run on the random data…
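One simple experiment to start with is counting how often each letter appears in a file. Uniformly random letters should come out roughly level, while English text will not. The sketch below is only one way to do it with standard tools; the function name letter_counts is made up, and upper and lower case are folded together as an assumption about what is worth comparing.

```shell
# letter_counts (hypothetical helper): read text on stdin and print
# "count letter" lines, most frequent first, ignoring case.
letter_counts() {
    LC_ALL=C tr -cd 'A-Za-z' |      # keep letters only
        LC_ALL=C tr 'A-Z' 'a-z' |   # fold to lower case
        fold -w 1 |                 # one letter per line
        LC_ALL=C sort |             # group identical letters together
        uniq -c |                   # count each group
        sort -rn                    # highest count first
}

# Demo on a made-up sample; show the three most common letters.
printf 'In the beginning God created the heaven and the earth' |
    letter_counts | head -n 3

# On the real files: letter_counts < bible-text.txt
#                    letter_counts < random-text.txt
```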


Written by Adrian Hayter

September 18th, 2008 at 7:03 pm

2 Responses to 'Random Data'


  1. #1

    Letters do not occur with equal frequency in English text. Your random text will have far too many Js, Qs, Xs, and Zs, and not enough vowels, so I suspect that your random text will yield fewer interesting results than the Bible text.

    Eric Haas

    19 Sep 08 at 3:13 pm (GMT)

  2. #2

    Instead of /dev/urandom, I’d use something repeatable like the digits of PI or e as a source. Perhaps calculate PI-3 in base 26 to create your series of letters.

    This would still have the same distribution problem that Eric identified.
