Math Vacation: May 2020

Saturday, May 30, 2020

Counting Prime Numbers

(Design vector created by freepik - www.freepik.com)

Imagine challenging a fellow mathematics fan to a friendly bar bet in which the person who best estimates the number of prime numbers found between a range of two given numbers wins a drink. Let your friend pick the first number and you will pick the second number. A list of prime numbers up to 10,000,000 is available here for you to check your answers.

What do you have to know?

You won't have to remember lengthy lists of prime numbers to win your bet. You just have to remember two numbers: 165 and 72.

After your friend suggests the first number, add 165 to get the second number of the range. To get your estimate, round the top number of the range to the nearest power of ten. Divide 72 by the number of zeros in the rounded number and this will be your estimate. So if your friend suggests 1000 for the first number, you'll state the second number will be 1165. Rounding 1165 to the nearest power of ten is 1000. Dividing 72 by 3 - the number of zeros in 1000 - yields an estimate of 72/3 = 24.

Going to the above link, if we count the number of primes between 1000 and 1165, we find the following primes:

1009,1013,1019,1021,1031,1033,1039,1049,1051,1061,1063,1069,1087,1091,1093, 1097,1103,1109,1117,1123,1129,1151,1153,1163. The count of actual primes is 24, matching your estimate.

Here's another example. Your friend picks 1 as the starting number; you add 165 so the final number is 166. Rounding 166 to the nearest power of 10 is 100. Divide 72 by 2 (the number of zeros in 100) and your estimate is 36.

Referring to the link above, the primes between 1 and 166 are:
2, 3, 5, 7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101, 103,107,109,113,127,131,137,139,149,151,157,163. The actual count is 38. Your estimate is still very close and probably closer than what your friend could estimate or count in a quick period of time.

How does this quick estimation work?

The key to this trick is the prime number theorem which states the number of primes less than a given number, n, is approximately: n/ln(n). Alternately, we can say the relative frequency of primes near n is 1/ln(n). Note: the prime-counting function is also discussed in another post: Math Vacation: The Frequency of Prime Numbers – The Prime Number Theorem (jamesmacmath.blogspot.com).

Our estimate used the nearest power of 10 as a starting point. Counting zeros in a number that is a power of ten is the same as taking the log (base 10) of that number: log(100) = 2, log(1000) = 3, log(10000) = 4, etc. The prime number theorem uses ln(n). Converting from log(n) to ln(n) requires a factor of about 2.3, since ln(n) = ln(10) x log(n) and ln(10) is approximately 2.303.

The prime number theorem stated the relative frequency of primes near n is 1/ln(n), so our estimate for a range of 165 should be 165/ln(n). Converting from log(n) to ln(n), our estimate becomes 165/(2.3 log(n)) or approximately 72/log(n). The range of 165 was chosen so we would end up with 72 in the numerator. Since 72 is easily divisible by many numbers, your estimation task is a little easier.
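The trick is easy to check in code. The following is a minimal Python sketch, not part of the original post: the function names are illustrative, and it assumes "nearest power of ten" means the closer of the two neighbouring powers of ten.

```python
import math

def is_prime(n):
    """Trial-division primality test; fine for small bar-bet ranges."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

def count_primes(first, second):
    """Count the primes in the range first..second, inclusive."""
    return sum(1 for k in range(first, second + 1) if is_prime(k))

def bar_bet_estimate(first):
    """Estimate the number of primes between first and first + 165."""
    second = first + 165
    k = len(str(second)) - 1              # 10**k is the power of ten at or below second
    lower, upper = 10 ** k, 10 ** (k + 1)
    zeros = k if second - lower <= upper - second else k + 1
    return 72 / zeros

# Compare the quick estimate with the true count for the two examples above.
for first in (1, 1000):
    print(first, bar_bet_estimate(first), count_primes(first, first + 165))
```

Trying other starting numbers shows how quickly the estimate drifts when the top of the range is far from a power of ten.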

A prior post wrote about the "Rule of 72" for quick approximations of compound interest. Now you have 2 uses of the number 72 to make quick estimations.

Credit is given to Grant Sanderson's site, 3b1b, for inspiring this trick. More on the natural logarithm is given here by Grant:

https://www.3blue1brown.com/videos-blog/what-makes-the-natural-log-natural-lockdown-math-ep-7.

Thursday, May 28, 2020

Trigonometric Proof of Pythagorean Theorem

This proof is similar to Bhaskara's second proof, but at the end we also derive an important trigonometric identity. In the diagram, A, B, C are the lengths of the sides of a right triangle. The lower case, a, represents the angle opposite of side A.

From the definitions of the common trigonometric functions of sin(a) and cos(a), sin(a) = A/C and cos(a) = B/C. These two equations can be rewritten as A = C sin(a) and B = C cos(a).

On the hypotenuse side of the triangle, C can be broken into two sub-lengths, A’ and B’, which represent the projection of sides A and B, respectively, onto side C.

From the similar triangles that are formed, sin(a) = A’/A and cos(a) = B’/B. These two equations can be rewritten as A’ = A sin(a) and B’ = B cos(a).

Substituting for sin(a) and cos(a), A’ = A (A/C) and B’ = B (B/C), or A’ = A²/C and B’ = B²/C.

Given that side C = A’ + B’, we get C = A²/C + B²/C. Multiplying both sides by C, we now derive the Pythagorean Theorem:

C² = A² + B².

The above proof of the Pythagorean Theorem used the trigonometric functions to derive the terms of A² and B². Alternatively, we can keep the sin(a) and cos(a) functions to show another important identity. Returning to the terms for A’ and B’, we can also express these two lengths as A’ = A sin(a) and B’ = B cos(a). Substituting for A and B in these two equations, A’ = C sin(a) sin(a) and B’ = C cos(a) cos(a). Trigonometric convention allows these terms to be stated as A’ = C sin²(a) and B’ = C cos²(a).

Substituting these last two equations into the relationship of side C being the sum of A’ and B’,

C = A’ + B’

C = C sin²(a) + C cos²(a)

Next, dividing both sides of the equation by C, we have an important trigonometric identity:

1 = sin²(a) + cos²(a)
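As a numerical sanity check of the steps above, here is a small Python sketch using a 3-4-5 right triangle. The triangle and the variable names are my own choices for illustration; the check confirms that the two projections A’ = A sin(a) and B’ = B cos(a) add up to C.

```python
import math

A, B = 3.0, 4.0              # legs of an example right triangle (assumed values)
C = math.hypot(A, B)         # hypotenuse, 5.0
a = math.asin(A / C)         # angle opposite side A

A_proj = A * math.sin(a)     # A', the projection of A onto C
B_proj = B * math.cos(a)     # B', the projection of B onto C

print(A_proj, A**2 / C)                      # both equal A squared over C
print(B_proj, B**2 / C)                      # both equal B squared over C
print(A_proj + B_proj, C)                    # the projections sum to C
print(math.sin(a)**2 + math.cos(a)**2)       # the identity, equal to 1 up to rounding
```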

Tuesday, May 26, 2020

Find Your Birthday or Phone Number in Pi

Princeton University has a site with the mathematical constant, pi, listed to 10 million digits. Readers are invited to search for their phone number or other favorite numerical sequence in pi. I found my 7-digit phone number, though not the full 10-digit number, early in the sequence.

Tests of randomness check that every n-digit sequence appears with equal frequency in a sufficiently long sequence of digits. My 7-digit phone number is one of 10 million possible 7-digit sequences, so I wasn't surprised when it showed up in a listing of 10 million digits; not unexpectedly, my full 10-digit number did not. With a listing of pi to more digits, I might find my full 10-digit phone number.
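How likely is a match? The sketch below is a rough Python estimate, assuming the digits of pi behave like independent, uniformly random digits and treating each starting position as an independent trial. Both are simplifying assumptions, and the function name is hypothetical.

```python
import math

def chance_of_finding(sequence_length, digits_available):
    """Approximate chance that a specific digit sequence appears at least once."""
    p_match = 10.0 ** -sequence_length                 # chance of a match at one position
    positions = digits_available - sequence_length + 1
    # 1 - (1 - p)^positions, computed in a numerically stable way
    return -math.expm1(positions * math.log1p(-p_match))

print(chance_of_finding(7, 10_000_000))    # roughly 0.63 for a 7-digit number
print(chance_of_finding(10, 10_000_000))   # roughly 0.001 for a 10-digit number
```

Under these assumptions, a 7-digit number is more likely than not to appear in 10 million digits, while a 10-digit number is a long shot.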

To try this exercise for yourself, go to the Princeton website linked above, and use your browser's "find on this page" tool to enter the numerical sequence of your choice. For my example, I used Microsoft Edge and the "find" tool is Control-F. Matching sequences on the page are highlighted.

Monday, May 25, 2020

Pi - How Many Digits are Really Needed?

A prior post showed how to estimate pi using random numbers. The spreadsheet linked to the post can estimate pi correctly to about 3 places; however, since each set of random numbers is different, the results will vary. Readers are encouraged to extend the spreadsheet beyond 1000 random numbers to improve the accuracy of the estimation.

Pi taken out to 50 decimal places is:

3.14159265358979323846264338327950288419716939937510...

Princeton University has a post where the digits of pi are listed to 10 million places. Digits beyond the first 100 places provide little practical value. Using pi to just the 50 places listed above, one could calculate the circumference of the observable universe to an accuracy smaller than the diameter of a single hydrogen atom. Readers are challenged to verify this, given that the estimated diameter of the observable universe is 8.8 x 10^26 m and the size of a hydrogen atom is about 1 angstrom (1/10,000,000,000 m).
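One way to take up the challenge is sketched below in Python. It relies on the fact that truncating pi after k decimal places leaves an error smaller than 10^-k, so the error in the circumference (pi times the diameter) is smaller than the diameter times 10^-k. The constant names are illustrative.

```python
import math

UNIVERSE_DIAMETER_M = 8.8e26      # diameter of the observable universe, from the text
HYDROGEN_ATOM_M = 1e-10           # about 1 angstrom, from the text
PI_ERROR_AT_50_PLACES = 1e-50     # upper bound on the truncation error of pi

# Error in circumference = diameter x error in pi
circumference_error_m = UNIVERSE_DIAMETER_M * PI_ERROR_AT_50_PLACES
print(circumference_error_m)                      # far smaller than one atom
print(circumference_error_m < HYDROGEN_ATOM_M)    # True

# Smallest number of decimal places k with diameter x 10^-k below one atom
places_needed = math.ceil(math.log10(UNIVERSE_DIAMETER_M / HYDROGEN_ATOM_M))
print(places_needed)
```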

For simple estimations, consider the following approximations and their relative error:

Approximation	Relative Error
3	.045
22/7	.0004
355/113	.00000008
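The relative errors can be reproduced with a few lines of Python. This is only a sketch; the helper name is my own, and relative error is taken as the absolute difference from pi divided by pi.

```python
import math
from fractions import Fraction

def relative_error(approximation):
    """Relative error of an approximation to pi."""
    return abs(float(approximation) - math.pi) / math.pi

for approx in (Fraction(3), Fraction(22, 7), Fraction(355, 113)):
    print(approx, relative_error(approx))
```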

Update (8-17-2021): A Swiss team recently established a new record for calculating pi to the most digits: 62.8 trillion.

Sunday, May 24, 2020

Using Random Numbers to Estimate Pi

A friend recently asked that I post something about the mathematical constant, pi. It is the ratio of a circle's circumference to its diameter. To fifty digits, pi is

3.14159265358979323846264338327950288419716939937510...

There have been many methods established for calculating pi. One of the more interesting methods is derived from Euler's solution to the zeta function:

ζ(2) = 1/1² + 1/2² + 1/3² + 1/4² + ...

Euler showed that this function, in its limit, is:

ζ(2) = π²/6

This result has been used in number theory to establish that the probability (P) of two random numbers being relatively prime (having no common factors greater than 1) is approximately:

P = 1/ζ(2) = 6/π² ≈ 0.608

Now, if we generate a large number of pairs of random numbers, we can count how many are relatively prime and then use the proportion to estimate pi using the formula:

π ≈ √(6/P)

I wrote a Google Sheet to make this approximation using 1000 pairs of random numbers. A link to this sheet is here and you can try for yourself. To refresh or change the 1000 pairs of numbers, simply update a blank cell in the sheet. In my first estimate, there were 612 of the 1000 pairs that were relatively prime yielding a proportion, P, of 0.612. This gave an approximation of pi of 3.13.
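For readers who prefer code to a spreadsheet, here is a minimal Python sketch of the same experiment. It is not the Google Sheet itself: the function names, the upper limit of one million for the random integers, and the optional seed are all my own assumptions. It uses the relationship P ≈ 6/π², so pi ≈ √(6/P).

```python
import math
import random

def pi_from_proportion(proportion):
    """Estimate pi from the proportion of relatively prime pairs."""
    return math.sqrt(6 / proportion)

def estimate_pi(pairs=1000, upper=10**6, seed=None):
    """Draw random integer pairs and estimate pi from the coprime proportion."""
    rng = random.Random(seed)
    coprime = sum(
        1
        for _ in range(pairs)
        if math.gcd(rng.randint(1, upper), rng.randint(1, upper)) == 1
    )
    proportion = coprime / pairs
    return pi_from_proportion(proportion), proportion

estimate, proportion = estimate_pi()
print(proportion, estimate)
print(pi_from_proportion(0.612))   # the 612-of-1000 result described above
```

As with the spreadsheet, each run gives a slightly different answer; increasing `pairs` narrows the spread.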

Pi Approximation - Google Sheets

Friday, May 22, 2020

Speed Limits and the Fibonacci Series

(By Amateria1121 - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=31513168)

Travelers between the United States and Canada have to adjust to speed limits from miles per hour to kilometres per hour. Astute drivers usually learn a few quick approximations for converting between the two systems. Some common limits and their approximate conversions are shown below.

km/h	30	50	80	130
mile/h	20	30	50	80

The more precise conversion is 1 mile = 1.61 km or 1 km = 0.62 mile.

If the numbers above seem familiar, you may have noticed the conversion between the two measures is very close to the golden ratio or 1.618. This ratio has the property that its inverse equals itself minus one: 1/1.618 = 0.618 = 1.618 - 1.

The calculation for the golden ratio is:

φ = (1 + √5)/2 ≈ 1.618

To be clear, the relationship between kilometres and miles being near the golden ratio is only a coincidence.

A common approximation to the golden ratio is given by the ratio of successive elements of the Fibonacci Series. The series begins as:

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89...

Each entry of the series is the sum of the previous two entries. Beginning with 1/1, the ratio of successive elements of the series is:

1, 2, 1.5, 1.667, 1.6, 1.625, 1.615, 1.619, 1.618

This ratio converges to the golden ratio of 1.6180339887...
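The convergence is easy to reproduce. The Python sketch below (the function name is illustrative) generates the ratios of successive Fibonacci numbers starting from 1/1 and compares them with (1 + √5)/2.

```python
import math

def fibonacci_ratios(count):
    """Ratios of successive Fibonacci numbers, beginning with 1/1."""
    ratios = []
    previous, current = 1, 1
    for _ in range(count):
        ratios.append(current / previous)
        previous, current = current, previous + current
    return ratios

golden_ratio = (1 + math.sqrt(5)) / 2
for ratio in fibonacci_ratios(9):
    print(round(ratio, 3), round(abs(ratio - golden_ratio), 6))
```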

Finally, return to the approximate kilometre-mile conversion table above. As with successive entries of the Fibonacci series, each entry from the third onward in both rows is the sum of the previous two entries.

Monday, May 18, 2020

The Rule of 72

Nearly anyone who has learned the time value of money has heard of the Rule of 72. The "rule" is a way to estimate the number of periods required to double an investment at a given interest rate. Conveniently, one can also use the rule to estimate the interest rate needed to double the principal in a given number of periods.

For example, at an interest rate of 6%, it takes approximately 12 years (72/6) to double your investment. Or, if you wanted to double your money in 9 years, you would need an interest rate of 8% (72/9) to achieve your goal. The rule gives only an approximate answer; however, the results are very good for a wide range of interest rates (or periods of time). Compared to the compound interest formula, the error in time is under 4% for interest rates between 1 and 16% and under 2% for interest rates between 4 and 12%. Especially since 72 is easily divisible by many numbers, the rule is a good one for those willing to accept the small error.

While I've heard many people talk about the rule, I've never seen an explanation of why the rule works. Certainly, as given by examples, one can see how it works but I spent some time to determine why the rule works.

Here is my explanation:

Without knowing the "rule of 72," suppose someone stated that there exists a number, x, for which the equation NI = x holds true, where N is the number of periods it takes an investment to double at a given interest rate, I (in %). Could we now solve for the "unknown" number, x?

For our further calculations, we need to express interest in decimal format. Therefore, let i = I/100.

Given the doubling rule, and using the formula for compound interest:

(1 + i)^N = 2

Since N = x/I, and i = I/100, N = x/(100i). Substituting for N in the above equation:

(1 + i)^(x/(100i)) = 2

Take the log of both sides of the equation:

(x/(100i)) ln(1 + i) = ln(2)

Now, solving for x:

x = 100i ln(2)/ln(1 + i)

Now, let's substitute some interest rates into this equation:

Rate, i	x
.04	70.69
.05	71.03
.06	71.37
.07	71.71
.08	72.05
.09	72.39
.10	72.73

While the value, x, varies with each interest rate, for a wide range of rates, x is close to 72, and that is why the "rule" of 72 works. For very low interest rates, one could say it is better to use the rule of 70; however, 72 remains popular because it is easily divisible by many numbers and this makes the approximations easier.
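The values of x can be recomputed with a short Python sketch. It solves the doubling condition (1 + i)^N = 2 with N = x/(100i), which gives x = 100 i ln(2)/ln(1 + i); the function name is my own.

```python
import math

def doubling_constant(i):
    """The x in N * I = x, for a decimal interest rate i per period."""
    return 100 * i * math.log(2) / math.log1p(i)

for percent in range(4, 11):
    i = percent / 100
    exact_periods = math.log(2) / math.log1p(i)   # exact doubling time
    rule_periods = 72 / percent                   # Rule of 72 estimate
    print(f"{i:.2f}  x = {doubling_constant(i):.2f}  "
          f"exact N = {exact_periods:.2f}  rule N = {rule_periods:.2f}")
```

Printing the exact and estimated doubling times side by side shows how small the error is across this range of rates.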

Compound Interest

I recently read a debate between two colleagues who disagreed on the math topics that should be taught to high school students. The essence of the disagreement boiled down to classical topics versus practical. I'll offer one topic of practical importance that should be emphasized: compound interest. Understanding the power of compound interest over long periods of time makes the difference between wealth accumulation and debt accumulation.

Interest rates for savers are currently very low, but just two or three years ago, I bought a 5-year CD with 3.3% interest. The power of compound interest is that one earns interest on the principal and on the earnings. Simplified a bit, $100 becomes $103.30 after one year. After 5 years, the initial investment becomes: $100 x (1.033) x (1.033) x (1.033) x (1.033) x (1.033) = $117.63. That is a bit more than the $116.50 that would have been earned under a simple interest arrangement ($100 x (1 + 5 x 0.033)).

On the debt-accumulation side, things are much worse as credit card companies are not generous in their terms to consumers. Consider a debt of $100 carried for 5 years at 18%. This debt becomes $228.78.

The amount by which the debt or the asset grows depends on two key things: the interest rate and time. Above we saw how the growth varies by interest rate, but time becomes a big factor, especially when long periods of time are involved.

For a final example, let's look at the purchase of Manhattan. This week marks the 396th anniversary of when Dutch settlers traded $24 worth of trinkets in exchange for the now very valuable New York real estate. Had the sellers been able to invest in a 396-year certificate of deposit, what would their proceeds be worth now? That would be $24 times (1.033) x (1.033) ... x (1.033), or $24 x (1.033) raised to the power of 396, which now would be $9,203,194. Such investments didn't exist at the time, but someone with a very long-term time frame would be told by financial advisors to invest in something with more risk, like stocks. Over the last sixty years or so, the S&P 500 index has yielded about 8%. An investment at 8% over four centuries becomes astronomical, so for the final calculation, let's back that average off to 6.5%. The $24, if invested in 1624, would now be worth $24 x (1.065) ^ 396 = $1.6 trillion. A search for the current value of all Manhattan real estate revealed it to be approximately $1.74 trillion.
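The examples above can be recomputed with a few lines of Python. This is a minimal sketch assuming annual compounding; the function names are my own.

```python
def compound(principal, rate, years):
    """Value after compounding once per year at a decimal rate."""
    return principal * (1 + rate) ** years

def simple(principal, rate, years):
    """Value under simple interest, where only the principal earns interest."""
    return principal * (1 + rate * years)

print(round(compound(100, 0.033, 5), 2))   # 5-year CD at 3.3%
print(round(simple(100, 0.033, 5), 2))     # same terms, simple interest
print(round(compound(100, 0.18, 5), 2))    # credit card debt at 18%
print(round(compound(24, 0.033, 396)))     # Manhattan at 3.3%, about $9.2 million
print(f"{compound(24, 0.065, 396):.2e}")   # Manhattan at 6.5%, about $1.6 trillion
```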

JamesMcMahon

Adjunct faculty member of the University of Redlands, School of Business. Retired Quality Engineering Manager - Abbott Labs (32 years). Favorite classes to teach: Management Science, Statistics, Operations Management, Analytics. Contributor to the Online Encyclopedia of Integer Sequences (OEIS) https://oeis.org/.

Sunday, May 17, 2020

Octal Number System

Most of our numbering systems are base-10, and this convention is likely a result of our ten fingers. If we ever come across extraterrestrial intelligent life, they may have a different number of digits and a different numbering system. For example, the popular little green man is often depicted with four fingers on each hand. Such beings would gravitate to a base-8 or octal numbering system.

We don't have to leave our planet to encounter examples of octal systems. Some native American tribes used an octal system as they counted on their knuckles. More recently, octal systems have been used in computer platforms. Hexadecimal systems (using 16 digits) are also used but require the use of 6 letters in addition to the digits 0 through 9.
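Most programming languages handle octal and hexadecimal directly. As a small illustration (the example values are my own), Python's built-in `format` and `int` convert between bases:

```python
n = 100

octal_text = format(n, "o")        # decimal 100 written in base 8
print(octal_text)                  # 144, since 1*64 + 4*8 + 4 = 100

back_to_decimal = int(octal_text, 8)
print(back_to_decimal)             # 100

hex_text = format(255, "x")        # hexadecimal needs letters beyond the digits 0-9
print(hex_text)                    # ff
```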

Volume measurements of dry goods are somewhat octal in their arrangement; 8 pints to a gallon; 8 gallons to a bushel; 8 bushels to a seam. The full system has other intermediate volumes. Each is double of the prior volume: pint, quart, pottle, gallon, peck, kenning, bushel, strike, coomb, and seam.

Octal numbering systems have been proposed to facilitate division of quantities into halves and quarters, but that doesn't seem like a good enough reason to drop our base-10 system. If easy divisibility were the goal, we would adopt the ancient Babylonian sexagesimal system (base-60). This system was used because 60 has many factors: 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60. We have carried on the sexagesimal system for our units of time, with 60 seconds in a minute and 60 minutes in an hour. Also, in our system of geographical positions, one degree of longitude or latitude is divided into 60 minutes and each minute into 60 seconds. The nautical mile is one minute of arc along a meridian.

(Image: https://openclipart.org/artist/laurent)

Saturday, May 16, 2020

Imprisoned Mathematicians

Some prior posts discussed clergymen who were mathematicians (Bayes, Lemaitre); it's come to my attention that there are also many cases of imprisoned mathematicians. I'll just give three examples.

Galileo Galilei was placed under house arrest in the 17th century due to his support of the heliocentric view of the solar system. In seclusion he wrote Two New Sciences, which became required reading for future generations of engineers on the topics of kinematics and strength of materials.

Moving forward, during World War II, Jakow Trachtenberg was a Russian Jew imprisoned in a Nazi concentration camp. He survived the ordeal and during his imprisonment he developed a system of mental calculations, now known as the Trachtenberg System.

Most recently, Christopher Havens, serving 25 years for murder, published his work on continued fractions in the journal Research in Number Theory in January 2020. What makes his story interesting is that Havens's mathematical ability was largely self-taught in prison. As he progressed in his studies, he reached out for tutoring and is earning an associate of science degree from Adams State University by correspondence.

Perhaps the one favorable condition these mathematicians had during their imprisonment was the forced solitude which they put to good use. It's a good lesson as we all deal with the current solitude we face during the Covid-19 shelter-in-place orders.

(image: LIMSKO)

Friday, May 15, 2020

Worm Farm Census

I keep a worm farm for the purpose of vermicomposting kitchen waste. Recently, I started new bins to increase the project's capacity. I had enough extra worms that I also helped some friends to begin their worm farm. The project suffered a set-back as locally we had a very early heat wave that killed many of the worms. My expansion bin has recovered a bit and I decided to make an estimate of the worm population. It is 2020 and everyone should be counted.

Method

The expansion project is shown below.

There are two 20-liter bins. Each is approximately half full. I used a small measuring cup (75 ml) and took six samples from each bin: two samples each at the bottom, middle and top of the bin. This sample cup size was chosen because this amount can be spread over a paper plate to facilitate the counting of the worms.

Also noted, but not counted, was the presence of pods which is an indication that the worm population is reproducing and that new worms will soon emerge.

Results

The results are given in the table below:

Bin	Position	Worm Count	Pods Present Y/N
Top	Top	2	N
Top	Top	1	N
Top	Mid	6	N
Top	Mid	4	Y
Top	Bottom	3	Y
Top	Bottom	3	Y
Lower	Top	7	Y
Lower	Top	4	Y
Lower	Mid	7	Y
Lower	Mid	4	Y
Lower	Bottom	2	Y
Lower	Bottom	1	Y
Average		44/12 = 3.67

An average sample count of 3.67 per 75 ml yields an average worm density of 3.67/.075 = 49 worms/liter. The total project volume is approximately 20 liters, so my current worm census is 980 worms.
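The census arithmetic can be laid out as a short Python sketch. The variable names are illustrative; the counts are the twelve sample counts from the table, and the density is rounded to whole worms per liter before scaling up, as in the text.

```python
# Worm counts from the twelve 75 ml samples in the results table
counts = [2, 1, 6, 4, 3, 3, 7, 4, 7, 4, 2, 1]

SAMPLE_LITERS = 0.075     # one 75 ml sample cup
PROJECT_LITERS = 20       # two 20-liter bins, each about half full

mean_count = sum(counts) / len(counts)           # average worms per sample
density = round(mean_count / SAMPLE_LITERS)      # worms per liter
census = density * PROJECT_LITERS                # estimated total population

print(sum(counts), round(mean_count, 2), density, census)
```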

Conclusion

We now have a baseline against which future counts can be compared. The presence of pods throughout the bins indicates the worms are healthy enough to reproduce. As a goal for future measurements, my first worm farm has a worm density of about 225 worms/liter so the expansion farm still needs some careful monitoring until the population increases.

Update: 6-11-2020

A repeat census was conducted. Since the prior month, a new bin (or layer) was added to the original two bins. Samples were taken from 3 locations in each of the 3 bins. The average count was 6 per 75 ml for an average density of 80 worms/liter. Additionally, pods were present in every sample of the two lower bins but not in any of the samples of the top (most recently added) bin.

Update: 1-23-2022
In the book, The Math of Life & Death, Kit Yates describes the capture-recapture method of estimating wildlife populations (method reviewed in this post).

Prime Number Gap Conjectures

A prior post discussed gaps between successive prime numbers and included a spreadsheet for exploring these gaps for all primes under 10,000. In this modest set of primes, the largest gap we see is 36. Seeing these gaps grow raises the question of how large the gap between successive primes can be. In fact, the gap grows without limit: for any n greater than 1, the n - 1 consecutive numbers from n! + 2 through n! + n are all composite. As of August 2018, the largest known maximal prime gap has length 1550, found by Bertil Nyman. This gap occurs between the prime 18,361,375,334,787,046,697 and the next prime.

There are many theories proposed about the gaps between prime numbers. One is Andrica’s Conjecture which states for all successive primes pₙ and pₙ₊₁:

√pₙ₊₁ − √pₙ < 1

This difference is given in the linked spreadsheet for all primes under 10,000. While these differences vary, empirically they appear to max out at approximately 0.670873.

Another popular theory is Legendre’s Conjecture which states that for all positive integers n, a prime number exists between:

n² and (n + 1)²

The linked spreadsheet does not test Legendre’s conjecture, but readers are invited to modify it in their exploration of prime numbers.
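Both conjectures can be explored in code as well as in a spreadsheet. The Python sketch below is my own illustration, not the linked spreadsheet. It checks Andrica's conjecture (the difference of the square roots of successive primes is less than 1) and Legendre's conjecture (a prime lies between n² and (n + 1)²) for primes under 10,000. Passing these checks is evidence, not proof.

```python
import math

def primes_below(limit):
    """Sieve of Eratosthenes: all primes less than limit (limit >= 2)."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, limit, p)))
    return [n for n, flag in enumerate(sieve) if flag]

primes = primes_below(10_000)
prime_set = set(primes)

# Andrica: largest difference of square roots of successive primes
andrica_max = max(math.sqrt(q) - math.sqrt(p) for p, q in zip(primes, primes[1:]))
print(andrica_max)

def legendre_holds(n):
    """True if some prime lies strictly between n squared and (n + 1) squared."""
    return any(k in prime_set for k in range(n * n + 1, (n + 1) ** 2))

# (n + 1)^2 stays within 10,000 for n up to 99
print(all(legendre_holds(n) for n in range(1, 100)))
```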

Math comedian, Matt Parker, has a good video on prime number gaps.

Thursday, May 14, 2020

Prime Number Gaps

(Image credits: https://commons.wikimedia.org/wiki/User:David_Eppstein/Gallery)

An interesting topic is the gap between prime numbers. A prior post discussed twin primes (those with a gap of 2). As we review the list of prime numbers, the first few gaps are mundane. The minimum gap is between 2 and 3. Since all prime numbers > 2 are odd, all subsequent gaps are at least 2. Progressing through the list of prime numbers below 100, most gaps are 2, 4 or 6. Just as we near 100, the gap between the primes 89 and 97 is 8. This observation invites the question of just how high this gap can become.

A spreadsheet with the first 1229 primes is linked here.

The gaps continue to increase, and the largest gap in the first 1229 primes is 36; it occurs between 9551 and 9587.
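As an alternative to the spreadsheet, the gaps can be computed directly. The following Python sketch (function names are my own) lists the primes under 10,000 and finds the largest gap between successive primes.

```python
import math

def primes_below(limit):
    """Sieve of Eratosthenes: all primes less than limit (limit >= 2)."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, limit, p)))
    return [n for n, flag in enumerate(sieve) if flag]

primes = primes_below(10_000)

# Pair each prime with its successor and pick the pair with the widest gap
lower, upper = max(zip(primes, primes[1:]), key=lambda pair: pair[1] - pair[0])
print(len(primes), upper - lower, lower, upper)
```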

Benford Distribution - Additional Thoughts

Shown above is a photo of one of my many slide rules. This is my favorite, an all-metal Pickett Model N4-ES. Many of the scales are arranged on a logarithmic basis. Benford's law states that with collections of numbers ranging over multiple magnitudes, there is a tendency for more numbers to begin with low digits than with higher digits. One explanation uses a logarithmic scale such as the one shown below:

(image credit: https://commons.wikimedia.org/wiki/User:GKFX )

Let this scale represent a collection of data spanning three orders of magnitude. Picking a point along the scale at random, one has about a 30% chance of landing on a number beginning with 1 and just under a 5% chance of landing on a number beginning with 9.

Link to prior post on Benford Distribution.

Wednesday, May 13, 2020

Quadratic Equation Solution - Completing the Square

Al-Khawarizmi was a ninth-century mathematician who developed the solution to the quadratic equation we commonly refer to as completing the square. It is an appealing visual, geometric proof.

Example: Given the following quadratic equation, solve for the value x.

x² + 8x = 65

This equation can be represented by a sum of a square with side x and a rectangle with sides of 8 and x.

Now split the rectangle into two equal halves forming rectangles with sides of 4 and x. Rearrange the rectangles around the square. The sum of the areas remains the same.

Complete the square - make the full arrangement square by adding a smaller square in the 4 x 4 gap (the yellow square below). The new area of the total arrangement is 65 + 16 = 81. The sides of the larger square are 9, the square root of 81.

Since each side of the completed square is 9, x = 9 - 4 = 5. Confirm the solution by substituting for x in the original equation:

5 x 5 + 8 x 5 = 65
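The geometric steps translate directly into arithmetic. For an equation of the form x² + bx = c, adding the small square of side b/2 gives a completed square of area c + (b/2)², so x is the square root of that area minus b/2. The Python sketch below uses a hypothetical helper name and returns only the positive (geometric) solution.

```python
import math

def complete_the_square(b, c):
    """Positive solution of x^2 + b*x = c by completing the square."""
    half = b / 2                       # split the rectangle into two halves
    completed_area = c + half ** 2     # add the small square in the corner
    side = math.sqrt(completed_area)   # side of the completed square
    return side - half

x = complete_the_square(8, 65)
print(x, x * x + 8 * x)
```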

Tuesday, May 12, 2020

Finite Difference Method for Determining Coefficients of a Quadratic Equation

Another post gives a geometric solution to solving the quadratic equation. But how does one solve for the coefficients of the quadratic equation if you are given the first few terms of the sequence?

Suppose you have a sequence of numbers you believe is generated by a quadratic equation and you wish to determine the equation's coefficients. The finite difference method can be used. Consider the series beginning 3, 12, 25. You wish to find the coefficients a, b, and c so that:

ax² + bx + c

will result in the answers of 3, 12, and 25 when one substitutes 0, 1, and 2 for x in the equation.

In general, let the first three terms of the series be designated t₁, t₂, and t₃. Then the coefficients can be found by the following equations:

a = ½ (t₃ - 2t₂ + t₁)

b = (t₂- t₁) - a

c = t₁

In our example above, this would give us:

a = ½ (25 - 24 + 3) = 2

b = (12 - 3) - 2 = 7

c = 3

Therefore, the quadratic equation to generate the sequence 3, 12, 25 is:

2x² + 7x + 3

In an earlier post, I introduced the Lazy Caterer Sequence, which starts: 1, 2, 4. Solving for a, b, c yields:

a = ½ (t₃ - 2t₂ + t₁) = ½ (4 - 4 + 1) = ½

b = (t₂- t₁) - a = (2 - 1) - ½ = ½

c = t₁ = 1
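The three formulas are short enough to wrap in a function. The Python sketch below uses a function name of my own and exact fractions so that results such as ½ are not turned into rounded decimals; it assumes the three terms correspond to x = 0, 1, and 2.

```python
from fractions import Fraction

def quadratic_coefficients(t1, t2, t3):
    """Coefficients (a, b, c) of a*x^2 + b*x + c from its values at x = 0, 1, 2."""
    a = Fraction(t3 - 2 * t2 + t1, 2)   # half of the second difference
    b = (t2 - t1) - a                   # first difference minus a
    c = Fraction(t1)                    # value at x = 0
    return a, b, c

def evaluate(coefficients, x):
    a, b, c = coefficients
    return a * x * x + b * x + c

for terms in ((3, 12, 25), (1, 2, 4)):
    coefficients = quadratic_coefficients(*terms)
    print(terms, coefficients, [evaluate(coefficients, x) for x in range(3)])
```

Evaluating the fitted quadratic at x = 0, 1, 2 reproduces the original three terms, which is a quick check that the coefficients are right.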

Monday, May 11, 2020

Fractals

Upon reading that I had started a Math Blog, a colleague sent me a short, yet encouraging, message that summarized a casual lesson I shared with him ten years ago: "Fractals are everywhere." Rather than a review of the equations that create fractal patterns, I find it more satisfying when we see fractal patterns in nature. Checking up on a young cedar in my yard, I saw this example of a self-similar pattern in nature.

More examples and better photographs than mine are available at Fractals in Nature.

Update 7/11/2021 - Article in Science News: How Romanesco cauliflower forms its spiraling fractals

Shout out to Steven Lasry for this image.

Sunday, May 10, 2020

Benford Distribution

In a statistics class I teach for the University of Redlands, School of Business, we spend a lot of time graphing data and reviewing distributions. It is important for students, or anyone analyzing data, to understand the underlying distribution of their data. Sometimes the distribution is known or given. When the distribution is not known, graphing the data can provide clues to the underlying distribution. Once the distribution is known, one can make calculations to further describe the data (descriptive statistics) or to make hypotheses about a population (inferential statistics).

Here are some typical data sets and the type of distributions that describe them:

Heights of people - Normal Distribution

Number of arrivals/time at an emergency room - Poisson Distribution

Time between successive patients at an ER - Exponential Distribution

Number of "Heads" when flipping 5 coins at a time - Binomial Distribution

Random numbers produced by Excel function RAND() - Uniform Distribution

First digit of entries in a large collection of numerical data - Benford Distribution

The last distribution listed above is one of my favorites because it is not intuitive. For example, consider a forensic accountant reviewing a large number of expense receipts submitted with expense reports. Such expenses vary over several orders of magnitude - a cup of coffee for $2.45 to an airline ticket for $2,500. When presented to students, the immediate response is that the first digit of all these entries should be uniformly distributed between 1 and 9 (that is, approximately 1/9 of the entries should begin with a 1, 1/9 with a 2, etc.). However, these entries will follow the Benford Distribution, which says more entries will begin with a lower digit than a higher digit.

The Benford Distribution is given by:

P(d) = ln(1 + 1/d) / ln(10)

where P(d) is the probability of d as first digit and ln is the natural logarithm.

First Digit	Distribution
1	30.1%
2	17.6%
3	12.5%
4	9.7%
5	7.9%
6	6.7%
7	5.8%
8	5.1%
9	4.6%
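The table can be regenerated in a few lines. This Python sketch (the function name is my own) uses the standard form of Benford's law, P(d) = ln(1 + 1/d)/ln(10), which is the same as the base-10 logarithm of 1 + 1/d.

```python
import math

def benford_probability(d):
    """Probability that the first digit is d (1 through 9) under Benford's law."""
    return math.log(1 + 1 / d) / math.log(10)

for d in range(1, 10):
    print(d, f"{100 * benford_probability(d):.1f}%")

# The nine probabilities account for every possible first digit
print(sum(benford_probability(d) for d in range(1, 10)))
```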

I looked at the distribution of the first digit of the longest rivers in the world. This list of rivers with over 180 entries starts with the Nile (6650 km) and ends with the Flinders in Australia (1004 km). The distribution of the first digit of the lengths is:

First Digit	Distribution Frequency
1	125
2	35
3	14
4	7
5	3
6	4
7	0
8	0
9	0

While this example doesn't follow the full distribution, it does follow the trend that low digit entries far exceed the higher digit entries. Re-examining the list, note the last entry is just over 1000 km; if the list continued, the next entries would probably be in the 900's, 800's and 700's, filling in the higher digits. This is not a perfect example as the range of the underlying data does not extend more than one order of magnitude.

I tried another list: largest city populations. The list given starts with Tokyo (37 million) and ends with Guadalajara (5 million). Again, we barely have one order of magnitude; however, if we continued the list we would see many orders of magnitude as small town populations of a few hundred are reached.

First Digit	Distribution Frequency
1	27
2	5
3	1
4	0
5	16
6	14
7	8
8	6
9	4

Wikipedia's entry on the Benford Distribution gives a better example using the heights of the 60 tallest structures in the world.

The distribution is given for the first digit of the heights as listed in meters or feet.

Leading digit	meters (count)	meters (%)	feet (count)	feet (%)	In Benford's law
1	26	43.3%	18	30.0%	30.1%
2	7	11.7%	8	13.3%	17.6%
3	9	15.0%	8	13.3%	12.5%
4	6	10.0%	6	10.0%	9.7%
5	4	6.7%	10	16.7%	7.9%
6	1	1.7%	5	8.3%	6.7%
7	2	3.3%	2	3.3%	5.8%
8	5	8.3%	1	1.7%	5.1%
9	0	0.0%	2	3.3%	4.6%

I read that the first digit of powers of 2 gives a good Benford Distribution. In the sheet linked here, I confirmed this by raising 2 to the powers 0 through 100:

First Digit	Count
1	31
2	17
3	13
4	10
5	7
6	7
7	6
8	5
9	5
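The same count can be made without a spreadsheet. Python integers have unlimited precision, so the first digit of each power of 2 can be read straight from its decimal representation. The sketch below is my own illustration, not the linked sheet.

```python
from collections import Counter

# First digit of 2**n for n = 0 through 100 (101 values in all)
first_digits = Counter(int(str(2 ** n)[0]) for n in range(101))

for digit in range(1, 10):
    print(digit, first_digits[digit])
```

Raising the upper limit of the range brings the proportions closer to the Benford percentages.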

Update: a reasonable explanation for the distribution is presented in another post.
