Where is this strange pattern coming from? I didn’t have a good understanding of this until it was explained to me by Steven Miller. Consider a finite list of numbers that is reasonably well-behaved. We can write each member of our list in scientific notation. So, for example, if we had 236 on our list, we would write it as 2.36 * 10^2. Note that the lead digit of the number is then the lead digit of the part of the scientific notation that isn’t the power of 10. (This part is sometimes called the mantissa when one wants to be fancy.) Now, for each number x on our list, instead of looking at x itself, we can look at log10 x. What does this do to the scientific notation? Taking the logarithm turns the product into a sum: log10 x is the log of the mantissa plus an integer, namely the exponent. So, for example, log10 (2.36 * 10^2) = .3729... + 2.
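The split into mantissa and exponent, and the matching split of the logarithm, can be sketched in a few lines of Python:

```python
import math

x = 236
exponent = math.floor(math.log10(x))   # the power of 10: here 2
mantissa = x / 10 ** exponent          # the leading part: here 2.36

# the log of x is the log of the mantissa plus the integer exponent
print(math.log10(x))                    # 2.3729...
print(math.log10(mantissa) + exponent)  # same value
```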
Let’s examine the non-integer part of log10 x; call this f(x). What distribution do we expect for f(x)? It is a value between 0 and 1 with no clear cutoffs or biases in any direction, so the most obvious thing to do is to model it as a uniform distribution. That means there’s about a 5% chance that f(x) falls below .05, about a 20% chance that f(x) falls below .2, about a 50% chance that f(x) falls below .5, and so on.
So what does this tell us about the leading digit? If the mantissa is below 2, then we have lead digit 1. If the mantissa is between 2 and 3, then we have lead digit 2, and so on. The mantissa is below 2 exactly when f(x) is less than log10 2 ≈ .301. Accordingly, we should expect the mantissa to be below 2 about 30.1% of the time, and thus a number should have a lead digit of 1 about 30.1% of the time. Similar logic tells us how frequently we should expect the lead digit to be 2, or 3, and so on: the lead digit is d whenever f(x) falls between log10 d and log10 (d+1), which happens with probability log10 (d+1) - log10 d.
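A few lines of Python tabulate these predicted frequencies, which are exactly Benford’s law:

```python
import math

# predicted probability that the lead digit is d: log10(d+1) - log10(d)
benford = {d: math.log10(d + 1) - math.log10(d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"lead digit {d}: {100 * p:.1f}%")
```

This prints about 30.1% for lead digit 1 down to about 4.6% for lead digit 9, and the nine probabilities sum to 1, since the intervals tile all of [0, 1).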
What is wrong with the intuition that every lead digit is just as common? When we calculate probabilities, we are used to using the simplest probability distribution we can imagine, something like picking a positive integer uniformly from 1 to 10^n for some fixed n. We are used to this approach primarily because it is easy to calculate with. Consequently, most probability problems in high school and college assume such a uniform distribution, since that assumption makes the math much easier. But actual distributions in real life don’t often look like this. We might instead have a bell curve or some other distribution. For almost any distribution that arises in nature and spreads across several orders of magnitude, the fractional part of log10 x ends up close to uniform, so Benford’s law will apply by the logic we used earlier.
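As a sanity check on that claim, we can simulate a distribution that spreads over many orders of magnitude and count lead digits. Here is a sketch using lognormal samples; the parameters are arbitrary, chosen only so the samples span many powers of 10:

```python
import math
import random
from collections import Counter

random.seed(1)

def lead_digit(x):
    # shift x into [1, 10) and take the integer part
    while x < 1:
        x *= 10
    while x >= 10:
        x /= 10
    return int(x)

# 100,000 lognormal samples spanning many orders of magnitude
N = 100_000
counts = Counter(lead_digit(random.lognormvariate(0, 10)) for _ in range(N))

for d in range(1, 10):
    observed = counts[d] / N
    predicted = math.log10(1 + 1 / d)
    print(f"digit {d}: observed {observed:.3f}, Benford {predicted:.3f}")
```

The observed frequencies land very close to the Benford predictions, even though nothing about the sampling mentions lead digits at all.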
So what does this do for us? Perhaps most importantly, we can use this insight to detect fraud. When humans try to make up data, it often fails to fit Benford’s law; in general, humans are bad at constructing data that passes even minimal tests for randomness. Failure to obey a generalized version of Benford’s law was one of the major pieces of evidence for fraud in the 2009 Iranian election. The recent questions regarding whether Strategic Vision was falsifying poll data arose when Nate Silver noticed that its results diverged substantially from Benford’s law.
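A minimal version of such a check just compares observed lead-digit frequencies against the Benford predictions. Here is a sketch; the deviation measure and the two example datasets are made up purely for illustration, not taken from any real fraud analysis:

```python
import math
from collections import Counter

def lead_digit(n):
    return int(str(abs(n))[0])  # lead digit of a nonzero integer

def benford_deviation(numbers):
    """Total absolute gap between observed lead-digit frequencies
    and Benford's predicted frequencies (0 means a perfect fit)."""
    counts = Counter(lead_digit(n) for n in numbers)
    total = len(numbers)
    return sum(abs(counts[d] / total - math.log10(1 + 1 / d))
               for d in range(1, 10))

# data a naive forger might produce: every lead digit equally common
fake = list(range(100, 1000))

# an exponentially growing quantity, which closely follows Benford's law
growth = [int(1.05 ** k) for k in range(1, 400)]

print(benford_deviation(fake))    # large deviation
print(benford_deviation(growth))  # small deviation
```

A large deviation doesn’t prove fraud on its own, of course; it is a red flag that invites closer scrutiny, which is how it was used in the cases above.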
For more information on Benford’s Law and related patterns in data, as well as more mathematical discussions of that data, see Terry Tao’s blog post from which I shamelessly stole the hard data about populations of nations.