Yongfu's Blog

Constructing Life Tables with R

Dec 11, 2017

I have been using the package dplyr to handle with data for a while, and I thought I can use it with ease until I was stuck with my homework on contructing a life table. I found spreadsheets (either Excel or Google Spreadsheets) easy for handling this task, but had a hard time dealing with it in R. I think it was due to my unfamiliarity with the built-in functions and insufficient practice in R. So, I wrote this post as a review and practice of my data-wrangling skills in R.

I will illustrate how I constructed a life table with R, and you’ll find out how easy it is (and wonder how could I stumble on it).

Load Packages

I used these packages to construct a Life table.

library(readr)
library(dplyr)
library(knitr)

Load data

Load the csv file to life_table. The raw data contains 3 columns: Age, Survivorship at Age x ($l_x$), and Fecundity at Age x ($m_x$). I added options(scipen=999) to disable scientific notations.

options(scipen=999)  # Disable Scientific Notation
life_table <- read_csv("life_table.csv")
life_table
# A tibble: 10 x 3
     Age        lx    mx
   <int>     <dbl> <int>
 1     0 1.0000000     0
 2     1 0.0000620  4600
 3     2 0.0000340  8700
 4     3 0.0000200 11600
 5     4 0.0000155 12700
 6     5 0.0000110 12700
 7     6 0.0000065 12700
 8     7 0.0000020 12700
 9     8 0.0000020 12700
10     9 0.0000000     0

Variables to Construct

Here are the variables that need to be calculated.

StatisticNotationCalculation Formula
$l_x m_x$
$x l_x m_x$
$l_x m_x e^{-rx}$
Average survivorship
(age class)
$L_x$$L_x = (lx + l{x+1})/2$
Life expectancy$e_x$$e_x = (Lx + L{x+1} + … + L_{max})/l_x$
Reproductive value$V_x$$\displaystyle \frac{\sum_{y=x}^{max\hspace{0.3mm}x} e^{-ry} l_y m_y}{e^{-rx} l_x}$
Net reproductive rate$R_0$$\sum_{all \hspace{0.3mm} x} l_x m_x$
Generation time$G$$\frac{\sum_{all \hspace{0.3mm} x} x l_x m_x}{R_0}$
Intrinsic rate of increaseApproximate $r$$r \approx \frac{ln(R_0)}{G}$
Intrinsic rate of increase(True) $r$$\displaystyle \sum_{all \hspace{0.3mm} x} e^{-rx}l_x m_x = 1$

$l_xm_xe^{-rx}$ and $V_x$ will be calculated twice for the approximate and the true $r$.

Constructing Variables

mutate: Creating new columns

Using the pipe %>% and the function mutate in package dplyr, I first constructed 7 new variables. Note the dependencies of the variables, so that I couldn’t construct the life tables at once.

life_table <- life_table %>%
    mutate("lx*mx"=lx*mx,
           "x*lx*mx"=Age*lx*mx,
           "Lx"=(lx+lead(lx))/2,
           "Lx"=replace(Lx, 10, 0),
           "ex"=rev(cumsum(rev(Lx)))/lx,
           "R0"=sum(lx*mx),
           "G"=sum(Age*lx*mx)/R0,
           "approx.r"=log(R0)/G
           )

Two things worth noting in the mutate function:

  1. The code "Lx"=(lx+lead(lx))/2, "Lx"=replace(Lx, 10, 0)

    • lead(lx) shifts the whole column of $l_x$ to its next value, i.e. the column $lx$ becomes $l{(x+1)}$.
    • Due to lead(lx), the last entry of the new column $L_x$ must be a NA, so I have to assign 0 to it (otherwise all calculations based on it will become NAs).
  2. The code "ex"=rev(cumsum(rev(Lx)))/lx (This is where I was stuck)

    1. The numerator of ex is calculated by summing over $Lx$ to $L{max}$, the maximum age of $L_x$. This is not so intuitive when working with R.
    2. rev(Lx) reverse the order of $L_x$, and cumsum() is for cummulative sum. cumsum(rev(Lx)) then is equivalent to summing $L_x$ backwards (i.e. from $L_{max}$ to $L_x$).
    3. But since $L_x$ is reversed in the first place, cumsum(rev(Lx)) is also in reverse order. Reversing cumsum(rev(Lx)) with rev() then gives what I want.

Calculating $r$ by while loop

By the Eular-Lotka equation, I can calculate $r$.

$$\displaystyle \sum_{all \hspace{0.3mm} x} e^{-rx}l_x m_x = 1$$

Using while loop and Approximate $r$ calculated earlier as the starting value for r, $r$ is calculated as below.

df <- as.data.frame(life_table)
r <- 0.0812198
x <- sum(exp(-r*df$Age)*df$`lx*mx`)
while (abs(x-1) >= 0.000001) {
    if (x-1>0){
        r <- r+0.00000001
    }
    else{
        r <- r-0.00000001
    }
    x <- sum(exp(-r*df$Age)*df$`lx*mx`)
    r
}

The rest is simple, and the logic applied is the same.

life_table <- life_table %>%
    mutate(
        "approx.r"=log(R0)/G,
        "r"=r,
        "Vx"=rev(cumsum(rev(exp(-r*Age)*lx*mx)))/exp(-r*Age)*lx,
        "lx*mx*e^-rx"= lx*mx*exp(-r*Age),
        "approx.Vx"=rev(cumsum(rev(exp(-approx.r*Age)*lx*mx)))/exp(-approx.r*Age)*lx,
        "approx.lx*mx*e^-rx"= lx*mx*exp(-approx.r*Age)
    )

Now, I’m done constructing a life table.

Printing pretty Life Table

kable(life_table, format="markdown", align="c")
$Age$$l_x$$m_x$$l_x m_x$$x l_x m_x$$L_x$$e_x$$R_0$$G$$Approximate$
$r$
$r$$V_x$$l_x m_x e^{-rx}$$Approximate$
$V_x$
$Approximate$
$l_x m_x e^{-rx}$
01.000000000.000000.00000.50003100.5001531.28293.067270.08121980.08471171.00000100.00000001.00991030.0000000
10.000062046000.285200.28520.00004801.9677421.28293.067270.08121980.08471170.00006750.26203520.00006790.2629518
20.000034087000.295800.59160.00002702.1764711.28293.067270.08121980.08471170.00002970.24970000.00002990.2514499
30.0000200116000.232000.69600.00001782.3500001.28293.067270.08121980.08471170.00001260.17993620.00001260.1818310
40.0000155127000.196850.78740.00001321.8870971.28293.067270.08121980.08471170.00000670.14027370.00000670.1422467
50.0000110127000.139700.69850.00000871.4545461.28293.067270.08121980.08471170.00000280.09146340.00000280.0930743
60.0000065127000.082550.49530.00000421.1153851.28293.067270.08121980.08471170.00000080.04965670.00000080.0507081
70.0000020127000.025400.17780.00000201.5000001.28293.067270.08121980.08471170.00000010.01403800.00000010.0143853
80.0000020127000.025400.20320.00000100.5000001.28293.067270.08121980.08471170.00000010.01289780.00000010.0132632
90.000000000.000000.00000.0000000NaN1.28293.067270.08121980.08471170.00000000.00000000.00000000.0000000

Note that I edited the column names of the table in a text editor to make it display in LaTeX style. There is no simple knitr function (at least I don’t know) that print out pretty displayed style text in a table generated from a data frame