Logarithms

Introduction

Suppose you are a forest ranger tasked with monitoring the growth of trees in a forest reserve. When you begin, the forest contains 100 trees. Over the next four years, you observe that the number of trees doubles each year:

Year 0: 100 trees
Year 1: 200 trees
Year 2: 400 trees
Year 3: 800 trees
Year 4: 1,600 trees

Although the percentage change remains constant—each year the number of trees increases by 100%—the actual number of new trees grows exponentially: 100 new trees in the first year, 200 in the second, 400 in the third, and so on.

Disclaimer

Of course, this is a hypothetical example. In practice, forests don’t grow this precisely.

This type of growth can be expressed using an exponential function, as follows:

\[N = 100 \times 2^t\]

where \(N\) is the number of trees and \(t\) is the number of years.

This formula makes it easy to predict the number of trees at any future time point. For example, after two years, the expected number of trees will be: \[N = 100 \times 2^2 = 100 \times 4 = 400\]
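As a quick sketch, the formula can be evaluated for several years at once in R, because arithmetic in R is vectorised. The variable names `years` and `n_trees` are illustrative choices, not part of the original example:

```r
# Years since monitoring began (t in the formula)
years <- 0:4

# Evaluate N = 100 * 2^t for every year at once
n_trees <- 100 * 2^years
n_trees  # 100 200 400 800 1600
```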

However, suppose we reverse the question: In how many years will there be 6,400 trees? We must solve for \(t\) in the equation:

\[6400 = 100 \times 2^{t}\]

To isolate \(t\), we use logarithms. Taking base-2 logarithms of both sides:

\[\log_26400 = \log_2(100 \times 2^t)\]

\[\log_26400 = \log_2(100) + \log_2(2^t)\]

\[\log_26400 = \log_2(100) + t\]

Solving for \(t\), we have:

\[t = \log_26400 - \log_2100\]

\[t = \log_2 \left( \frac{6400}{100} \right)\]

\[t = \log_2(64)\]

\[t = 6\]

Thus, the forest reaches 6,400 trees after six years. In this example, the logarithm allows us to solve an equation in which the unknown appears in the exponent.
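The derivation can be checked numerically in R. This is a minimal sketch; `target` and `initial` are illustrative variable names, and `log(x, base = 2)` is base R's logarithm with an explicit base, the same function used later in this chapter:

```r
# Target number of trees and initial number of trees
target  <- 6400
initial <- 100

# t = log2(6400 / 100): the power of 2 that turns 100 trees into 6,400
years_needed <- log(target / initial, base = 2)
years_needed  # 6

# Equivalent form from the derivation: difference of two logs
log(target, base = 2) - log(initial, base = 2)

# Plugging t back into N = 100 * 2^t recovers the target
initial * 2^years_needed
```

Because of floating-point rounding, the difference of logs may differ from 6 in the last digits, so comparisons are safer with `all.equal()` than with `==`.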

This illustrates one of the most important uses of logarithms: solving equations involving exponential growth. Logarithms have broad applications in Data Science, Finance, Biology, Medicine, and many other fields. Yet, while many people know that \(\log_22 = 1\), they may not fully understand the utility of logarithms. This chapter introduces the intuition and mathematical properties of logarithms and demonstrates their practical value in a Data Science context.

Logarithms in Depth

Historical Purpose of Logarithms

Logarithms were developed by John Napier and Henry Briggs to simplify calculations (Havil, 2014). By converting multiplication into addition and division into subtraction, logarithms offered a powerful computational shortcut in the pre-digital era. For instance, instead of multiplying 32 by 16 directly, we express both as powers of 2:

\(32 = 2^{5}\)
\(16 = 2^4\)

Then, instead of multiplying, we only need to add the exponents:

\[32 \times 16 = 2^{5} \times 2^{4} = 2^{5 + 4} = 2^{9} = 512\]
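The same shortcut can be mimicked in R: take the base-2 logarithms of the two factors (their exponents), add them, and raise 2 to the sum. This sketch only illustrates the idea; the variable names are illustrative:

```r
# Exponents of the two factors: 32 = 2^5 and 16 = 2^4
pow_32 <- log(32, base = 2)
pow_16 <- log(16, base = 2)

# Multiplication becomes addition of exponents: 5 + 4 = 9
exponent_sum <- pow_32 + pow_16

# Converting back gives the product
2^exponent_sum  # 512
32 * 16         # 512, computed directly
```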

Some Terminology

Before we discuss the mathematical properties of logarithms, it is important to understand the terminology. Suppose we have the following mathematical equation:

\[2^{4} = 16\]

In this equation, the number 4 is called the power, exponent, or index, and the number 2 is the base. Similarly, if we have \(3^{4} = 81\), then 3 is the base and 4 is the power (or exponent, or index).

Logarithms in Equations

Suppose we have the following equation:

\[16 = 2^{4}\]

This equation is equivalent to:

\[\log_216 = 4\]

We read this equation as ‘log to base 2 of 16 equals 4’. We can confirm this result in R with the following code:

# Calculating log of 16 with base 2
log(16, base = 2)

[1] 4

The subscript of the log shows the base (2), and the result (4) is the power to which that base must be raised to obtain the number inside the log (16). In other words, to which power do we need to raise the base 2 in order to get 16? The answer is 4. This is the same question we asked in the introduction example, where we wanted to know in which year the forest would reach a given number of trees \(N\). More generally, we have the following equivalence:

\[x = \alpha^n \Leftrightarrow \log_{\alpha}x = n\]
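Because this is an equivalence, it can be checked in both directions: the log recovers the exponent, and raising the base to the log recovers the original number. A short R sketch, with the base 3 and exponent 4 chosen arbitrarily:

```r
alpha <- 3        # base
n     <- 4        # exponent
x     <- alpha^n  # x = alpha^n = 81

# log_alpha(x) returns the exponent n
log(x, base = alpha)        # 4

# Raising alpha to log_alpha(x) returns x
alpha^log(x, base = alpha)  # 81
```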

A special case of this property arises when \(x = \alpha\). Since \(\alpha = \alpha^{1}\) always holds, we can conclude that:

\[\log_{\alpha}\alpha = 1\]

We can see that this is the case when using R to do the calculation:

# a = 10
log(10, base = 10)

[1] 1

# a = 2
log(2, base = 2)

[1] 1

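Therefore, a logarithm always measures a number in powers of its base: the base is the reference point on which the calculation “pivots”. In our tree example, that base was 2, the factor by which the number of trees grew each year.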

Power in a Logarithm

We saw that:

\[x = \alpha^n \Leftrightarrow \log_{\alpha}x = n\]

Now suppose that, instead of \(x = \alpha^n\), we raise both sides of this equation to a power \(z\):

\[x^{z} = (\alpha^{n})^{z} \Leftrightarrow x^{z} = \alpha^{nz}\]

Taking the base-\(\alpha\) logarithm of both sides, we have:

\[\log_{\alpha}x^{z} = \log_{\alpha}\alpha^{nz} = nz\]

We know that \(\log_{\alpha}\alpha^{nz} = nz\) because, by definition, \(nz\) is the power to which we must raise the base \(\alpha\) to obtain \(\alpha^{nz}\).

Since we saw earlier that \(\log_{\alpha}x = n\), we can replace \(n\) in \(nz\) with \(\log_{\alpha}x\) and conclude that the following property holds:

\[\log_{\alpha}x^{z} = z \cdot \log_{\alpha}x\]
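The power rule can be checked numerically in R. In this sketch, the values of `x`, `z`, and the base `b` are arbitrary choices, and `all.equal()` is used because the two sides may differ by floating-point rounding:

```r
x <- 5     # any positive number
z <- 2.5   # any real power
b <- 10    # base

lhs <- log(x^z, base = b)    # log of the power
rhs <- z * log(x, base = b)  # power times the log

all.equal(lhs, rhs)  # TRUE
```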

From Multiplication to Addition

Suppose we have the following two equations:

\(x = \alpha^{n}\)
\(y = \alpha^{z}\)

Notice that both equations have the same base, \(\alpha\), on the right-hand side. If we multiply \(x\) and \(y\), we have:

\[xy = \alpha^{n} \times \alpha^z = \alpha^{(n + z)}\]

From the property of logarithms we previously explained, we have the following:

\[xy = \alpha^{(n + z)} \Leftrightarrow \log_{\alpha}xy = n + z\]

At the same time, if we apply the same property to each of the two equations separately, we get:

\[x = \alpha^{n} \Leftrightarrow \log_{\alpha}x = n\] and \[y = \alpha^{z} \Leftrightarrow \log_{\alpha}y = z\]

Therefore, we can conclude that the following property holds:

\[\log_{\alpha}xy = n + z \Leftrightarrow \log_{\alpha}xy = \log_{\alpha}x + \log_{\alpha}y\]
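A similar numerical check works for the product rule. The numbers 7 and 12 and the base 10 are arbitrary choices for this sketch:

```r
x <- 7
y <- 12
b <- 10  # base

log(x * y, base = b)                 # log of the product
log(x, base = b) + log(y, base = b)  # sum of the logs

all.equal(log(x * y, base = b),
          log(x, base = b) + log(y, base = b))  # TRUE
```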

From Division to Subtraction

Similarly, suppose we have (again) the following two equations:

\(x = \alpha^{n}\)
\(y = \alpha^{z}\)

This time, we want to divide \(x\) by \(y\). We have:

\[\frac{x}{y} = \frac{\alpha^{n}}{\alpha^{z}} = \alpha^{n - z}\]

Applying the first property of logarithms we discussed, we have the following:

\[\frac{x}{y} = \alpha^{n - z} \Leftrightarrow\]

\[\log_\alpha\frac{x}{y} = \log_\alpha{\alpha}^{n - z} \Leftrightarrow\]

\[\log_{\alpha}\frac{x}{y} = n - z\]

As we did previously, we can apply logs to each of the initial equations separately:

\[x = \alpha^{n} \Leftrightarrow \log_{\alpha}x = n\]

and

\[y = \alpha^{z} \Leftrightarrow \log_{\alpha}y = z\]

Therefore, we can conclude that the following property holds:

\[\log_{\alpha}\frac{x}{y} = n - z \Leftrightarrow \log_{\alpha}\frac{x}{y} = \log_{\alpha}x - \log_{\alpha}y\]
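The quotient rule can be checked the same way. In this sketch, 96 and 3 are arbitrary numbers chosen so that their ratio, 32, is a power of the base 2:

```r
x <- 96
y <- 3
b <- 2  # base

log(x / y, base = b)                 # log of the quotient: 5
log(x, base = b) - log(y, base = b)  # difference of the logs: 5

all.equal(log(x / y, base = b),
          log(x, base = b) - log(y, base = b))  # TRUE
```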

Inverse

Suppose we have the following two logarithms:

\(\log_93 = \frac{1}{2}\)
\(\log_39 = 2\)

For the first, we understand that we need to raise the base 9 to the power of 1/2 (which is equivalent to the square root) in order to get 3. For the second, we see that we need to raise the base 3 to the power of 2 in order to get 9.

Notice that the denominator on the right-hand side of the first equation (2) is exactly the value of the second equation, \(\log_39\). Substituting it, we have:

\[\log_93 = \frac{1}{\log_39}\]

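This is not a coincidence. If \(\log_{\alpha}\beta = n\), then \(\beta = \alpha^{n}\), so \(\alpha = \beta^{1/n}\) and therefore \(\log_{\beta}\alpha = \frac{1}{n}\). Thus, for any positive bases \(\alpha\) and \(\beta\) other than 1, the following equation holds: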

\[\log_{\alpha}\beta = \frac{1}{\log_{\beta}\alpha}\]
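Both the worked example and the general rule can be checked in R. The second pair of bases, 5 and 20, is an arbitrary choice for this sketch:

```r
# The worked example: log_9(3) and 1 / log_3(9)
log(3, base = 9)      # 0.5
1 / log(9, base = 3)  # 0.5

# Another arbitrary pair of bases
a <- 5
b <- 20
all.equal(log(b, base = a), 1 / log(a, base = b))  # TRUE
```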

Most Common Bases (and Why \(e\) Is Important)

The two most common bases are \(10\) and \(e\). In many textbooks, a log written without a specific base denotes base 10. In R, however, the function log() uses \(e\) as its default base. Since we know that \(\log_{10}10 = 1\), we can see which base R uses by default with the following code:

# Specifying the base
log(10, base = 10)

[1] 1

# Not specifying the base
log(10)

[1] 2.302585

The exponential constant \(e\) is a special number, approximately 2.71828, that arises naturally when quantities grow or decay continuously, as with populations, compound interest, or radioactive decay. It is the natural base for describing such processes and is very important in applications across physics, biology, and other fields. Logarithms with base \(e\) are usually written as \(\ln\) and are called natural logarithms. This also explains the output above: log(10) returned 2.302585 because it computed \(\ln 10\), not \(\log_{10}10\).
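In base R, \(e\) is available as `exp(1)`, and `log10()` and `log2()` are shortcuts for the other two bases used in this chapter. The last lines of this sketch illustrate one standard way \(e\) arises from growth: compounding a 100% growth rate \(n\) times per period gives \((1 + 1/n)^n\), which approaches \(e\) as \(n\) increases. The values of `n` are arbitrary:

```r
exp(1)       # the constant e: 2.718282
log(exp(1))  # natural log of e: 1

# Shortcuts for base 10 and base 2
log10(1000)  # 3
log2(8)      # 3

# Compounding more and more often: (1 + 1/n)^n approaches e
n <- c(1, 10, 100, 10000, 1e6)
(1 + 1 / n)^n
```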

Logarithms of 1 and 0

From high school, we may remember that any nonzero number \(\alpha\) raised to the power of 0 equals 1; that is, \(\alpha^{0} = 1\). The logarithm in this case is:

\[\log_{\alpha}1 = 0\]

In other words, the logarithm of 1 is 0 regardless of the base, because any valid base raised to the power of 0 equals 1.
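A one-line check in R, using a few arbitrary valid bases (a logarithm base must be positive and different from 1):

```r
# Arbitrary valid bases
bases <- c(2, exp(1), 10, 7)

# log(1, base = b) for each base
vapply(bases, function(b) log(1, base = b), numeric(1))  # 0 0 0 0
```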

A natural follow-up question is: “What about 0 or negative numbers?” We cannot take the logarithm of a non-positive number, including 0, using the standard logarithmic functions such as the natural logarithm (\(\ln\)) or the common logarithm (log with base 10). This is because a positive base raised to any real power is always positive, so no exponent can produce 0 or a negative number. As the input approaches 0 from the positive side, the logarithm decreases without bound toward negative infinity. Let us confirm this in R with the following code:

# Trying to calculate log of 0
log(0)

[1] -Inf
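Negative inputs behave differently from 0 in R: instead of `-Inf`, `log()` returns `NaN` and issues a warning that NaNs were produced. The sketch below shows this, together with how the log of some arbitrary positive values falls as they approach 0:

```r
# Negative input: no real logarithm, so R returns NaN with a warning
log(-1)

# Positive inputs shrinking toward 0: the log becomes more and more negative
inputs <- c(1, 0.1, 0.001, 1e-10)
log(inputs)
```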

Logarithms in Quantitative Analysis

Thus far, we have learned (or reviewed) the basic mathematical properties of logarithms and the intuition behind them. However, we have not yet discussed how and why we would use logarithms in practice. In other words, why are logarithms (logs) useful in fields such as Data Science? Although replacing multiplication with addition is a clever trick, today computers and calculators can handle far more complicated calculations directly.

The main reason for using logarithms in quantitative analysis is that we are interested in the percentage change (growth rate) of a variable rather than in its absolute values (Wooldridge, 2022). This was highlighted in the introduction example with the number of trees in the forest, where the percentage change (growth) was constant while the number of trees grew exponentially.
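To make the distinction concrete, this sketch computes both the absolute change and the percentage change for the first five years of the tree example, using base R only; `n_trees`, `abs_change`, and `pct_change` are illustrative names:

```r
# Number of trees in years 0 to 4 (N = 100 * 2^t)
n_trees <- 100 * 2^(0:4)

# Absolute change: new trees added each year
abs_change <- diff(n_trees)

# Percentage change relative to the previous year
pct_change <- 100 * diff(n_trees) / head(n_trees, -1)

abs_change  # 100 200 400 800 (keeps growing)
pct_change  # 100 100 100 100 (constant)
```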

To understand this better, let’s create the following data frame:

# Libraries
library(tidyverse)

# Creating a tibble with the tree example (N = 100 * 2^Year)
trees_example <- tibble(Year = 0:10,
                        Number_of_Trees = 100 * 2^Year)

# Printing all rows
print(trees_example, n = 11)

# A tibble: 11 × 2
    Year Number_of_Trees
   <int>           <dbl>
 1     0             100
 2     1             200
 3     2             400
 4     3             800
 5     4            1600
 6     5            3200
 7     6            6400
 8     7           12800
 9     8           25600
10     9           51200
11    10          102400

Using the data frame we just created, we have the following plot:

The number of trees grows exponentially over the years, meaning that each year more trees are added than in the year before. However, the growth rate remains constant. Remember, the number of trees always doubles (a 100% increase from one year to the next). If we take the log of the number of trees, the plot looks different:

Figure 12.2: Log of number of trees per year.

With logs, we transform (or, in a sense, simplify) exponential growth into a linear relationship, making it easier to understand the progression of the population (trees). For instance:

Year 1: 200 trees, \(\log_2 200 \approx 7.644\)
Year 2: 400 trees, \(\log_2 400 \approx 8.644\)
Year 3: 800 trees, \(\log_2 800 \approx 9.644\)

We therefore have a constant increase of 1 unit on the base-2 logarithmic scale, even though the number of trees grows exponentially. This is intuitive: each year we multiply the number of trees by 2, and the base-2 logarithm counts powers of 2. Since \(\log_2(2N) = \log_2 2 + \log_2 N = 1 + \log_2 N\), each doubling adds exactly 1 to the logarithm.
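The same pattern holds across the whole table. This base-R sketch rebuilds the eleven values of `Number_of_Trees` as an illustrative vector `n_trees` and compares year-to-year differences on the original and logarithmic scales; with natural logs, the constant step is \(\ln 2 \approx 0.693\) instead of 1:

```r
# Number of trees for years 0 to 10 (same values as trees_example)
n_trees <- 100 * 2^(0:10)

# Differences on the original scale keep growing
diff(n_trees)

# On the base-2 log scale, every step is 1
diff(log2(n_trees))

# With natural logs, every step is log(2), about 0.693
diff(log(n_trees))
```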