Sunday, November 9, 2008

Diffie-Hellman: How to share a secret with everyone listening.

This post is the third post of a series of three posts on why we care about prime numbers. The first entry gave an overview of primes. The second discussed the link between Mersenne primes and perfect numbers. This post will discuss practical applications of large prime numbers and provide a concrete example.

The main focus of this post is the Diffie-Hellman algorithm. Diffie-Hellman is a procedure whereby two people can have a conversation and at the end of the conversation they will share a secret number. The remarkable thing is that even if an eavesdropper hears the entire conversation, the eavesdropper will have no feasible way of figuring out the secret. . By repeating this procedure many times two people develop can have a list of secret numbers which they can use to encrypt messages to each other that only they can read. As is standard in cryptography, I will refer to three people, Alice and Bob who want to share a secret number and Eve the evil eavesdropper wishes to obtain their secret number.

There is a standard analogy in cryptographic circles to accomplish this sort of task. Suppose Alice and Bob want to exchange an object through a mail system that they don't trust. But they know the mail employees aren't going to be able to get into a locked box. How can Alice send an object to Bob? She cannot just send a locked box because Bob won't have a key. Nor can she send a box and a key because then the corrupt mailmen can use the key to open the box. Here is the solution: Alice puts her object into the box and locks it with her lock. She mails this to Bob. Bob puts a second lock on the box and mails it back to Alice. Alice now removes her lock and sends it back to Bob. Bob can now remove his own lock and gain access to the contents of the box.

Before we discuss this algorithm in detail we need a few small mathematical facts. We call doing arithmetic where we only care about the remainder when divided by k to be “arithmetic mod k”. So for example, 4+6= 3 mod 7 since 4+6 = 10 and the remainder of 10 when divided by 7 is 3. For a concrete analog, when we tell what time it will be on a clock some number of hours from now we are doing arithmetic mod 12.

Now, we need to define a primitive root: Suppose for a fixed prime p we have some u such that u1 u2 u3 u4… up-1 mod p cycle through 1,2,3,4… p-1 in some order then we say that u is a primitive root for p. For example, if p=5, then 2 is a primitive root for 5 since 21=2,22=4,23=3,24=1. But 4 is not a primitive root for 5 since 41=4, 42=1,43=4,44=1 so we only cycled through 1 and 4. Now, it happens that every prime p has some primitive root u. The proof will not be given here but this fact is the only piece of non-trivial number theory we will need for this procedure.[1]

The last detail we need is to note how it is possible to quickly raises numbers to large powers. Naively, if I want to calculate xn for some fixed x and large integer n I need to multiply by x n times. And this applies if I am operating mod k also. But we have a nice trick called repeated squaring that allows us to get around this. Observe that if we know xy then we can calculate x2y by squaring xy. Thus, we can easily calculate x2, x4, x8, x16 all the way up to the largest power of 2 less than n. Then we can write n in base 2 and calculate xy in the obvious fashion by multiplying together the relevant powers of the form x2^t. Thus, calculating xn takes at most c log n multiplications for some fixed constant c.

We are now ready to outline the procedure. Alice says to Bob: Hey Bob, I want to talk to you. Let’s use Diffie-Hellman. Bob says ok and picks a large prime p (by large probably in the order of 100s of decimal digits) and Bob finds a primitive root u for this prime. Bob then picks a random positive integer between 1 and p-1, call it b. Bob calculates B = ub mod p which is easy to calculate by our above remark. Bob then transmits B, p and u to Alice. Alice picks some a and calculates A = ua mod p and sends A to Bob. Alice calculates S = Ba = (ub)a = uab mod p and Bob calculates by taking Ab mod p. Alice and Bob now share S. However, suppose we have our evil eavesdropper Eve. What information does she have? She has A. She has B. She has u and she has p. There’s no obvious way for Eve to work out uab mod p. If Eve had a fast way of finding a or b from knowing ua=A mod p she’d be set. But that turns out to be quite hard. This problem, known as the discrete log, is very difficult in general. So Eve is stuck. Alice and Bob can construct an agreed upon code by repeatedly using this procedure and there’s nothing Eve can do about it. If Alice and Bob set p to be a few hundred digits than the computation necessary for Eve to work out what S is turns out to be more than all the computational power on Earth today.

Let's do a small example with Say p=17 and u=3. Bob says "My prime is 17, my primitive root is 3." Bob picks 6 as his random number. Bob calculates that 36 mod 17 which is 15. So he says "my B is 15."

Alice picks a random number, say 13 and calculates 313 mod 17 which is 12 so she says "My A is 12.".

Bob then calculates A^b mod 17, which is 126 mod 17 which is 2. Alice calculates B^a mod 17 which is 1513 mod 17 which is 2

So Alice and Bob now share the secret number S = 2. Obviously, for 17 it isn’t that hard for Eve to work out the answer. She just needs to write out a table of powers of 3 mod 17. But if instead of 17 she had to deal with a prime that was a few hundred digits long that strategy would never work; Eve's table would have more entries in it than the number of atoms in the universe. There are other strategies Eve could try to attack this but in general she doesn’t have many options.

An important caveat: No one knows a quick way of doing the discrete log but it has not yet been proven that there is no quick way to do so. Almost everyone who has studied the problem is very sure that there isn’t a quick way.

Diffie-Hellman turns out to have a variety of practical problems with implementation. So more often a related algorithm called RSA is used instead. In a later post, I will discuss RSA. RSA has even more counter-intuitive properties than Diffie-Hellman. While Diffie-Hellman required Alice and Bob to be work with each other, in RSA the intended recipient simply needs to publish a key which can then be used by anyone to encrypt messages such that only the recipient can read them. The standard analogy used by cryptographers to describe this procedure of a box that anyone is able to lock but only the person with a special key can unlock. The fact that this is mathematically doable turns out to be very useful. Readers have likely used RSA without even realizing it if they have engaged in almost any form of commerce over the internet. Modern web browsers and other software implement RSA in a way that is invisible to casual user. Even though one does not notice it RSA is responsible for keeping credit card numbers and similar data to safe and secure.

Both Diffie-Hellman and RSA rest heavily on the ability to find large primes. Moreover, the security of these algorithms is connected strongly to our understanding of prime numbers and their properties. Thus, the mathematics of large primes is vital to the functioning of the modern economy.

Edit: Added the locked box analogy.

[1] In fact we won’t really need that fact other than that we need in the procedure access to a prime that has a primitive root and we want to know that there are large primes with this property. Since all primes have this property, so do all large primes.


Etienne said...

There aren't any tricks Eve can use to ease her task? For instance, she can cut her search space in half by raising u^a to (p-1)/2, to detect if a is odd or even. Similarly, if p-1 has any other easily-found small divisors, these can be leveraged to gain additional information about a.

Etienne said...

Also, there appears to be a slight bug in your post:

"Let's do a small example with Say p=17 and u=3. Bob says "My prime is 17, my primitive root is 3." Normal 0 false false false MicrosoftInternetExplorer4 Normal 0 false false false MicrosoftInternetExplorer4 "

I'm using Firefox, if that has anything to do with it.

Joshua said...

Thanks for picking up on the bad html. I normally write posts in Word and then import them directly to blogger. Sometimes the html doesn't always cooperate. (Hiss for not supporting LaTeX).

As to your first comment, yes there are tricks that Eve can use. I've glossed over some details. In particularly, to actually implement this you want to pick a prime p such that p-1 does not have many small prime divisors. This is in fact related to why factoring is connected to discrete log. If Eve can easily factor p-1 completely then Eve's task becomes easier. However, if you have only a small partial factorization then it doesn't help that much. If you have p to be around say 10^100 then a factor of 2 isn't going to be that helpful. To compare, that's still a lot larger than 10^80 which is the standard estimate for number of atoms in the observable universe.

John said...

You could embed PDF from latex couldn't you. I do love how every new iteration of html is supposed to have better math but it doesn't seem to happen

Watson Ladd said...

The connection is a bit different: basically factoring and discrete logarithms in Z/Z_p can both be turned into factoring a bunch of numbers by a bunch of small primes.

Joshua said...

Watson, you are correct about the formal connection. The above remark is more of a general argument for why they are connected.