Where is this strange pattern coming from? I didn’t have a good understanding of this until it was explained to me by Steven Miller. Consider a finite list of numbers that is reasonably well-behaved. We can write each member of the list in scientific notation. So, for example, if we had 236 on our list, we would write it as 2.36 * 10^2. Note that the lead digit of the number is then just the lead digit of the part of the scientific notation that isn’t the power of 10. (This part is sometimes called the mantissa when one wants to be fancy.) Now, for each number x on our list, instead of looking at x itself, we can look at log10 x. What does this do to the scientific notation? Taking the log turns the product into a sum: log10 x is the log of the mantissa plus the integer exponent. So, for example, log10 (2.36 * 10^2) = .3729... + 2.
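The split into a fractional part and an integer part is easy to check numerically. Here is a minimal Python sketch; the helper name `split_log10` is made up for this illustration, and it assumes x is positive.

```python
import math

def split_log10(x):
    """Return (f, k): the fractional and integer parts of log10(x), for x > 0."""
    log_x = math.log10(x)
    k = math.floor(log_x)   # the integer exponent in scientific notation
    f = log_x - k           # the non-integer part, in [0, 1) up to rounding
    return f, k

f, k = split_log10(236)
mantissa = 10 ** f          # recovers the 2.36 in 2.36 * 10^2
print(round(f, 4), k, round(mantissa, 2))
```

For 236 this gives a fractional part of about .3729 and an exponent of 2, matching the example above.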

Let’s examine the non-integer part of log10 x (call this f(x)). What distribution do we expect for f(x)? It is a value between 0 and 1 with no clear cutoffs or biases in any direction, so the most obvious thing to do is to assume it is uniformly distributed. That means that there’s about a 5% chance that f(x) falls below .05, about a 20% chance that f(x) falls below .2, about a 50% chance that f(x) falls below .5, and so on.

So what does this tell us about the leading digit? If the mantissa is below 2, then we have lead digit 1. If the mantissa is between 2 and 3, then we have lead digit 2 and so on. The mantissa is below 2 if f(x) is less than log10 2 = .301. Accordingly, we should expect that the mantissa is below 2 about 30.1% of the time. Thus, a number should have a lead digit of 1 about 30.1% of the time. Similar logic works for how frequently we should expect the lead digit to be 2, or 3 or so on.
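The same logic handles every digit at once: the lead digit is d exactly when f(x) lies between log10 d and log10 (d+1), an interval of length log10 (1 + 1/d). Here is a short Python sketch that tabulates these expected frequencies, assuming as above that f(x) is uniform. The function name `benford_probability` is just for illustration.

```python
import math

def benford_probability(d):
    """Expected share of numbers with lead digit d, assuming f(x) is uniform on [0, 1)."""
    # Lead digit d means log10(d) <= f(x) < log10(d + 1).
    return math.log10(d + 1) - math.log10(d)

for d in range(1, 10):
    print(d, f"{benford_probability(d):.1%}")
```

The nine values sum to 1, and they fall from about 30.1% for a lead digit of 1 to about 4.6% for a lead digit of 9.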

What is wrong with the intuition that every lead digit is just as common? When we calculate probabilities, we are used to using the simplest probability distribution we can imagine, something like picking a positive integer from 1 to 10^n for some fixed n. We are used to this approach primarily because it is easy to calculate. Consequently, most probability problems in high school and college assume that we have such a uniform distribution, since that assumption makes the math much easier. But actual distributions in real life don’t often look like this. For example, we might have a bell curve or some other distribution. For many distributions that arise in nature, especially ones whose values spread over several orders of magnitude, Benford’s law will apply, at least approximately, due to the logic we used earlier.
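A quick simulation makes the contrast concrete. The sketch below is only an illustration under assumptions of my own choosing: it compares integers drawn uniformly from 1 to 999 with samples from a log-normal distribution that spans several orders of magnitude. The parameters, sample size and seed are arbitrary, and the helper names are made up for this example. It is not a claim about any particular real-world data set.

```python
import math
import random

def lead_digit(x):
    """First significant digit of a positive number, read off its scientific notation."""
    return int(f"{x:.16e}"[0])

def lead_digit_frequencies(values):
    """Fraction of values having each lead digit 1..9."""
    counts = [0] * 10
    n = 0
    for v in values:
        counts[lead_digit(v)] += 1
        n += 1
    return {d: counts[d] / n for d in range(1, 10)}

rng = random.Random(0)   # fixed seed so the run is repeatable
N = 100_000

# Assumption: the "textbook" case, every integer from 1 to 999 equally likely.
uniform_ints = [rng.randint(1, 999) for _ in range(N)]
# Assumption: a stand-in for data spread over many orders of magnitude.
wide_spread = [rng.lognormvariate(0, 3) for _ in range(N)]

uniform_freq = lead_digit_frequencies(uniform_ints)
spread_freq = lead_digit_frequencies(wide_spread)

print("digit  uniform  log-normal  Benford")
for d in range(1, 10):
    benford = math.log10(1 + 1 / d)
    print(f"{d:5d}  {uniform_freq[d]:7.3f}  {spread_freq[d]:10.3f}  {benford:7.3f}")
```

With these settings the uniform column should sit near 0.111 for every digit, while the log-normal column should track the Benford column closely.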

So what does this do for us? Perhaps most importantly, we can use this insight to detect fraud. When humans try to make up data, it often fails to fit Benford’s law. In general, humans are bad at constructing data that passes any minimal test for randomness. Failure to obey a generalized version of Benford’s law was one of the major pieces of evidence for election fraud in the last Iranian election. The recent questions regarding whether Strategic Visions Polling was falsifying poll data arose when Nate Silver noticed that its results diverged substantially from Benford’s law.

For more information on Benford’s Law and related patterns in data, as well as more mathematical discussions of that data, see Terry Tao’s blog post from which I shamelessly stole the hard data about populations of nations.

## 8 comments:

This is probably the clearest way to explain Benford's law.

One of the nicest alternative ways has to do with scale invariance. If, in a given set of quantities, there is any distribution of first non-zero digits that is not merely an artifact of scale, it must be independent of the units in which the measurement is expressed. And if we take (say) the heights of buildings expressed in feet, in yards, etc., we find that they satisfy Benford's law regardless of the choice of units. There are also some fascinating applications of Benford's Law in probing the distribution of primes.

Here's another way to think about Benford's Law. Suppose that, in some entry in a table of values, we find an entry that begins with the digit 1. By what fraction of the total value will we have to increase it in order to change that digit? (No peeking at the remaining digits of the number!) It could be anything up to 100%; we might have to double the original value to flip that leading digit to a 2. But if the leading digit is a 9, we will at most have to increase the total value by about 11% in order to change the leading digit to a 1.

I suspect there is some fancy way to make the logarithmic values drop out of this consideration, but I'm too sleepy right now to work it out. Over to you!

http://www.youtube.com/watch?v=69xYD8oWxYg&feature=sub

I'm still a bit confused... why do we expect f(x) to be uniformly distributed between 0 and 1? Can't I pick any different f(x) : R -> S^1, repeat your argument, and derive a different Law?

Shalmo, I'm sorry I just watched the first five minutes of that video. What does it have to do with the original post?

Etienne,

Yes, and in general you will expect that well-behaved functions will have a uniform distribution. Thus, for example, if I took sqrt(x) mod 1 as my function, it turns out that for natural data this will have a uniform distribution. What we are seeing in this particular case is a general example of that sort of behavior.

The only specific reason we notice this in the case of Benford's law is because of the natural convenience of base 10 and the well-behaved nature of the logarithm.

...so I guess then what's going on is that this works just as well with other functions, except that if we use any function other than a logarithm, we won't be able to draw any conclusions about the first digit?

Yes,

Although one can construct functions that deliberately don't map well to S^1. But you have to work at it and they generally fail in an obvious fashion.
