Christopher Wellons
mosquitopsu@gmail.com
Public GPG Key
Post Index - see a list of all of the entries.
I am a computer engineer working at the Johns Hopkins University Applied Physics Laboratory. I love computers and free software.
Most of the topics you will find here are about hobby computing and programming with free software (and a few exceptions).
E-mail me: mosquitopsu@gmail.com
Projects:
- Brainfuck Compiler
- PNG Archiver
- BINI Tools
-
Parallel Mandelbrot Generator
Archives:
September 2007
October 2007
November 2007
December 2007
January 2008
February 2008
March 2008
April 2008
June 2008
July 2008
August 2008
September 2008
At work they recently moved me to a new project. It is a Matlab-based data analysis thing. I haven't really touched Matlab in over a year (the last time I used Matlab at work), and, instead, use GNU Octave at home when the language is appropriate. I got so used to Octave that I found a pretty critical feature missing from Matlab's implementation: treat an expression as if it were of the type of its output.
Let's say we want to index into the result of a function. Take, for
example, the magic square function, magic(). This spits
out a
magic square
of the given size. In Octave we can generate a 4x4 magic square and
chop out the middle 2x2 portion in one line.
octave> magic(4)(2:3,2:3)
ans =
11 10
7 6
Or more possibly clearly,
octave> [magic(4)](2:3,2:3)
ans =
11 10
7 6
Try this in Matlab and you will get a big, fat error. You have to assign the magic square to a temporary variable to do the same thing. I kept trying to do this sort of thing in Matlab and was thinking to myself, "I know I can do this somehow!". Nope, I was just used to having Octave.
Where this really shows is when you want to reshape a matrix into a
nice, simple vector. If you have a matrix M and want to
count the number of NaN's it has, you can't just apply
the sum() function over isnan() because it
only does sums of columns. You can get around this with a special
index, (:).
So, to sum all elements in M directly,
octave> sum(M(:))
In Octave, to count NaN's with isnan(),
octave> sum(isnan(M)(:))
Again, Matlab won't let you index the result of isnan()
directly. Stupid. I guess the Matlab way to do this is to
apply sum() twice.
Every language I can think of handles this properly. C, C++, Perl, Ruby, etc. It is strange that Matlab itself doesn't have it. Score one more for Octave.
Stream ciphers are one of the two types of symmetric key algorithms, which is when the same key is used for encryption and decryption. They follow this simple concept: take a good pseudo-random number generator and combine, usually by XOR, its output with your plaintext stream. This is very similiar to the one-time pad, but the random pad is pseudo-random rather than truly random. The key is the seed (or part of one) for the PRNG.
Probably the most well known stream cipher is the extremely simple, yet cryptographically strong, Arcfour algorithm. The official name is actually RC4, which comes from RSA Security where it was developed. It was a trade secret until someone anonymously leaked the algorithm to the public. The name RC4 is still trademarked, though, so Arcfour is generally used instead, meaning "Alleged RC4" (alleged because RSA Security never confirmed the algorithm as being RC4). You have almost certainly used the cipher yourself, because it is used in applications such as WEP and SSL.
The algorithm has two parts: the key schedule algorithm and pseudo-random generation algorithm. The key schedule uses the key, and possible a non-secret initialization vector, to set up the state of the PRNG. The state is an array of length 256 holding all of the values from 0 to 255 in some order, along with two integers (initialized to 0 after the key schedule). The algorithm looks like this,
for i from 0 to 255
S[i] := i
endfor
j := 0
for i from 0 to 255
j := (j + S[i] + key[i mod keylength]) mod 256
swap(S[i],S[j])
endfor
The PRNG deals with one byte at a time, emitting a stream of bytes,
i := 0
j := 0
while GeneratingOutput:
i := (i + 1) mod 256
j := (j + S[i]) mod 256
swap(S[i],S[j])
output S[(S[i] + S[j]) mod 256]
endwhile
If you implement this in C and use the char type, you can
toss the modulus parts because they will just work automatically.
Now you just XOR your message with the output of the PRNG. The Wikipedia article probably explains it better than I can, so check it out if you still don't follow.
Now, Arcfour has some flaws. Specifically, the algorithm itself
doesn't specify how an initialization vector is applied, which is
important. Using a plan key twice is bad it allows an adversary to get
information easily. For example, Lets say you have two
messages A and B. You use the same
key k, which will produce the same keystream K. Now, you
create your two ciphertexts CA
and CB
CA= A ^ KCB= B ^ K
But notice if the adversary has both ciphertexts,
CA^CB= A ^ K ^ B ^ K = A ^ B
They are left with your two original messages XORed together. Let me illustrate: we have two messages as bitmap images (here as PNGs for the web),
Encrypt them using the same key. In this case, my key was "somekey".
If the adversary has both of these second "images", she can XOR them together (having no knowledge of the key!) and get this,
An initialization vector (IV) is a set of bytes we combine with the key. The IV is not a secret, as it is sent plaintext with the ciphertext. If different IVs were used above with the same key, XORing the ciphertext would result in nothing, because the keystreams are totally different for each message.
However, the way the IV is combined is important too. Simply concatenating the IV and the key can lead to weaknesses due to the way the key schedule algorithm works. Something more secure would be a cryptographic hash of the key and the IV. The reason WEP is broken is because in its design the IV wasn't used properly.
Another weakness is that the initial bytes of the keystream have low entropy. That is, some bits tend to be 0's or 1's consistently, which can leak information to an adversary. This can be countered by tossing the first few bytes of the keystream. Often the first 768 bytes are dropped, but a conservative amount would be 3072 bytes. Another way to deal with this is running the key schedule algorithm 10 or 20 times (not reinitializing the S array between them of course) rather than just once, which is the way CipherSaber-2 does it.
Yet another weakness is that the keystream becomes distinguishable from random data after about a gigabyte of output. That is, after about a gigabyte, the entropy of the overall stream can become too low and compromise the security of the message. A solution might be to change the IV each gigabyte.
I wrote an implementation of Arcfour in C, which you can get from my Git repostiory with,
git clone http://git.nullprogram.com/arcfour.git
Or grab a snapshot.
It is written as a library that can be used in different applications. Included are a couple programs that make use of it. I strongly suggest writing your own implementation. It is really easy to do, and you will automatically have the algorithm memorized once you do it. The Wikipedia article has some test vectors you can use to test it.
Don't stop here! This isn't everything. Check out the archives (on the left) for more posts. Or just have a look at the index.