Yongfu's Blog

# Demystifying Item Response Theory (2/4)

IRT as Generalized Linear Models

Mar 6, 2023

In Part 1, we went through the simplest item response model, the 1PL model, from the perspective of simulations. Starting with item difficulty and testee ability, we worked forward to simulate item responses that mimic real-world data. Back then, we were precisely laying out the data generating process that is assumed by the item response theory. In this post, we work backward. We will start with the item responses and work back toward the unobserved difficulties and abilities, with the help of statistical models. But first, let’s simulate the data we will be using!

```r
logistic = function(x) 1 / (1 + exp(-x))
rbern = function( p, n=length(p) ) rbinom( n=n, size=1, prob=p )

set.seed(12)
n_item = 30   # number of items
n_subj = 60   # number of subjects
n_resp = n_subj * n_item  # number of responses

A = rnorm( n=n_subj, mean=0, sd=1 )  # Subjects' ability
D = seq( -1.6, 1, length=n_item )    # Items' difficulty

# The data
d = expand.grid( S=1:n_subj, I=1:n_item, KEEP.OUT.ATTRS = FALSE )
d$R = rbern( logistic(A[d$S] - D[d$I]) )
d$S = factor(d$S)
d$I = factor(d$I)
str(d)
```

```
'data.frame':	1800 obs. of  3 variables:
 $ S: Factor w/ 60 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ I: Factor w/ 30 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ R: int  0 1 0 0 1 1 1 1 1 1 ...
```


In the code above, the first two lines define the logistic and Bernoulli functions used previously. The second chunk of code sets the shape of our data. This time, the simulated data is much larger, with 30 items and 60 testees, or subjects (I will use the more general term “subject” hereafter). Since we assume that each subject responds to every item, this gives us 1800 responses in total.

The subject abilities come from a normal distribution with a mean of zero and a standard deviation of one (the standard normal). The item difficulties are equally spaced values ranging from -1.6 to 1. A notable change from the previous post is that numeric indices are used here to label the items (I) and subjects (S). For simple illustrations, letter indices are clearer, but for larger data sets, numeric indices are easier to manipulate with code. Now, since S and I are coded as integers, we need to explicitly convert them into factors. Otherwise, the model would treat the numeric indices as values of a continuous variable.

## DAGs Revisited

Before we move on to the statistical model, let me lay out the DAGs again. The DAG on the right below is a slightly modified version of the one from Part 1 (reproduced on the left), redrawn to emphasize the perspective of the statistical model. Here, the observed $S$ and $I$ take the place of the unobserved $A$ and $D$, respectively. So why the difference?

Recall that the nodes $A$ and $D$ represent the joint influences of a subject’s ability and an item’s difficulty on the probability of success on that item. However, the statistical model cannot see $A$ and $D$, since they are theoretical constructs proposed by IRT. What the model “sees” is closer to the DAG on the right. This DAG is theoretically neutral. All it says is that the probability of success is influenced by the particular subject and item present in an observation. It makes no further claims about the factors underlying each subject/item that lead to the results.

Given the data and the right DAG, the statistical model estimates the so-called subject effects and item effects. These effects will be estimates of subject ability and item difficulty if the IRT assumptions are met: when a subject and an item influence the result only through subject ability and item difficulty. With the concepts of subject/item effects in place, we can move on to the formulas of the statistical model.

## Equation, Index and Annoying Things

The equations in (1) are the formulation of our model. This model is known as logistic regression, or in GLM terms, a Generalized Linear Model of the binomial family with a logit link (more on this later). Lots of things are going on here. Don’t panic; I’ll walk you through them slowly.

\begin{align} & R_i \sim \text{Bernoulli}( P_i ) \\ & P_i = \text{logistic}( \mu_i ) \\ & \mu_i = \alpha_{[S_i]} + \delta_{[I_i]} \end{align} \tag{1}

First, note the subscript $_i$ common to the variables above. Its presence indicates that the equations are read at the observational level. The observational level is easier to think about with the help of the long data format. In the long format, each row records one observation and is indexed by the subscript $_i$. So you can think of the set of three equations as describing the links among the variables for each observation. Note that the long format is also the format we have been using for our data frames.

The last equation in (1), also related to the reading of the subscript $_i$, deserves some elaboration, as the square brackets after $\alpha$ and $\delta$ might be confusing. We have actually already met these brackets in Part 1: they serve the same function as R’s subsetting operator [], which we used to link the ability/difficulty of a particular subject/item to the rows (observations) of the data frame. What the square brackets after $\alpha$ and $\delta$ do is “look up” the subject and item indices of the $i$th observation, so that the $\alpha$ of that subject and the $\delta$ of that item can be correctly retrieved. The model thus “knows” which $\alpha$ and $\delta$ to update when it encounters an observation. For instance, suppose we are on the 3rd row (observation) of the data, in which $S_3 = 5$ and $I_3 = 8$. This tells the model that the observation carries information about $\alpha_5$ and $\delta_8$, and the model updates them accordingly.

I haven’t written about $\alpha$ and $\delta$ yet, but based on the previous paragraph, you might already know what they are: $\alpha$s are the subject effects and $\delta$s the item effects to be estimated by the model.
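This bracket look-up maps directly onto R’s vector indexing. Here is a minimal sketch; the `alpha` and `delta` values below are made up purely for illustration:

```r
# Hypothetical effects for 5 subjects and 8 items (made-up numbers)
alpha = c(-0.5, 0.1, 0.8, -1.2, 0.3)
delta = c( 0.2, -0.4, 0.0, 0.6, -0.1, 0.9, -0.7, 0.3)

# An observation with S_i = 5 and I_i = 8
S_i = 5
I_i = 8
mu_i = alpha[S_i] + delta[I_i]  # retrieves alpha_5 and delta_8
mu_i  # 0.3 + 0.3 = 0.6
```

The subsetting operator does exactly what the subscripted brackets in the equation do: it picks out the one $\alpha$ and the one $\delta$ relevant to the current observation.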

Now, let me walk you through the equations from bottom to top:

• $\mu_i = \alpha_{[S_i]} + \delta_{[I_i]}$
No surprise here. This equation simply illustrates how the model computes a new variable $\mu$ from $\alpha$ and $\delta$.
• $P_i = \text{logistic}( \mu_i )$
The equation should look familiar. It indicates how the model maps $\mu$, which can range from $-\infty$ to $\infty$, to probability, $P$, through the logistic function.
• $R_i \sim \text{Bernoulli}( P_i )$
This equation describes that each observed response is generated from a Bernoulli distribution with probability $P_i$. Or even simpler, $R_i$ would be $1$ with probability $P_i$ and $0$ with probability $1 - P_i$.

These equations all look familiar because they are essentially mathematical representations of the simulation we have done. Here, the model formulation is simply simulation in reverse.

The GLM formulation in (1) is often seen in the alternative form shown in (2). The only difference between (2) and (1) lies in the second equation: instead of the logistic function, (2) uses the logit function. What is the logit?

\begin{align} & R_i \sim \text{Bernoulli}( P_i ) \\ & \text{logit}(P_i) = \mu_i \\ & \mu_i = \alpha_{[S_i]} + \delta_{[I_i]} \end{align} \tag{2}

The logit function is simply the mirror of the logistic. They do the same mapping but in reverse directions:

• the logistic function maps real numbers to probabilities
• the logit function maps probabilities to real numbers

The logistic and the logit are inverse functions to each other. So if a real number gets converted to the probability by the logistic, the logit can convert it back to the original real, and vice versa.

```r
logit = function( p ) log( p/(1-p) )
x = seq( -1, 1, by=0.1 )
( p = logistic(x) )  # Transformed x on probability space
```

```
 [1] 0.2689414 0.2890505 0.3100255 0.3318122 0.3543437 0.3775407 0.4013123
 [8] 0.4255575 0.4501660 0.4750208 0.5000000 0.5249792 0.5498340 0.5744425
[15] 0.5986877 0.6224593 0.6456563 0.6681878 0.6899745 0.7109495 0.7310586
```

```r
# Logit gives x back
logit(p)
```

```
 [1] -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1  0.0  0.1  0.2  0.3  0.4
[16]  0.5  0.6  0.7  0.8  0.9  1.0
```


Some algebra would get us from the logistic to the logit:

\begin{aligned} \text{logistic}(x) &= \frac{1}{1 + e^{-x}} = p \\ & \Rightarrow ~~ e^{-x} = \frac{1-p}{p} \\ & \Rightarrow ~ -x = \text{log}(\frac{1-p}{p}) \\ & \Rightarrow ~~ x = -\text{log}(\frac{1-p}{p}) \\ & \phantom{\Rightarrow ~~ x } = \text{log}(\frac{p}{1-p}) = \text{logit}(p) \end{aligned}

There is really nothing special about the logit function. We learned all the important things through the logistic back in Part 1. I mention the logit here simply because the term is frequently used. When people talk about GLMs, they prefer to characterize the model by its link function. The link function, in the case of logistic regression, is the logit. It transforms the outcome probabilities into real numbers that can be modeled linearly. It’s just the logistic, working in the reverse direction.
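As a side note, R bundles the link and its inverse in the family object itself, which gives a quick way to check this correspondence (a small sketch, not part of the analysis):

```r
fam = binomial("logit")
fam$linkfun(0.5)  # logit: probability -> real number, gives 0
fam$linkinv(0)    # logistic: real number -> probability, gives 0.5
```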

## Fitting GLM

Now, we are equipped with the statistical muscle to carry out the analysis. Let’s fit the model to the data we’ve simulated. In R, this is done through the function glm(). The first argument of glm() is the formula, in which we specify our linear model with R’s model syntax. There are in principle two ways, one succinct and the other tedious, to express the formula when there are categorical predictors in the model. I will first demonstrate the tedious one, as it exposes all the details hidden by the succinct form. Though tedious, it saves us from confusion.
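For orientation, here is a sketch of what the fit can look like using the succinct form of the formula (the syntax, and how R handles the intercept, are unpacked in the sections that follow; the simulation code is repeated so the chunk runs on its own):

```r
# Re-create the simulated data so this chunk runs on its own
logistic = function(x) 1 / (1 + exp(-x))
set.seed(12)
A = rnorm( n=60, mean=0, sd=1 )
D = seq( -1.6, 1, length=30 )
d = expand.grid( S=1:60, I=1:30, KEEP.OUT.ATTRS = FALSE )
d$R = rbinom( nrow(d), size=1, prob=logistic(A[d$S] - D[d$I]) )
d$S = factor(d$S)
d$I = factor(d$I)

# `0 +` drops the global intercept so every subject level gets its own
# coefficient; R still drops the first item level for identifiability
fit = glm( R ~ 0 + S + I, data=d, family=binomial("logit") )
length( coef(fit) )  # 60 subject effects + 29 item effects = 89
```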

### Dummy Coding

The formulas we specify in glm() (and other model fitting functions in general) correspond pretty well to their mathematical counterparts. So let me first present the math before we move on to the code. Lots of things to explain here.

Equation (3.2) below is rewritten from the last two equations in (2), $\text{logit}(P_i) = \mu_i$ and $\mu_i = \alpha_{[S_i]} + \delta_{[I_i]}$, which are combined and reproduced here as Equation (3.1).

Earlier I mentioned that the square brackets after $\alpha$ and $\delta$ serve as a “look up” function to locate the relevant $\alpha$ and $\delta$ of each subject and item in an observation. There is an equivalent way to express the same formula without the use of these “look up” functions, which is shown in equation (3.2). For the sake of simplicity, let’s assume here that we have only two items (A, B) and three subjects (J, K, L). For real data, equation (3.2) would be extremely long.

\begin{align} \tag{3.1} \text{logit}(P_i) &= \alpha_{[S_i]} + \delta_{[I_i]} \\ \tag{3.2} \text{logit}(P_i) &= J_i \alpha_J + K_i \alpha_K + L_i \alpha_L + A_i \delta_A + B_i \delta_B \end{align}

The variables ($J_i, K_i, L_i, A_i, B_i$) in front of the $\alpha$s and $\delta$s have a value of either 0 or 1. Here, they serve as a “switch” that turns on the relevant $\alpha$ and $\delta$ and turns off the others in each observation. This is easier to see with the help of the tables below. Table 3.1 corresponds to Equation (3.1), and Table 3.2 corresponds to Equation (3.2). So, for instance, in row 2 of Table 3.2, $K$ and $A$ are 1 while the others are 0. This turns on, or picks out, $\alpha_K$ and $\delta_A$. As such, they would be updated by the model when it reaches this observation. In row 2 of Table 3.1, $\alpha_K$ and $\delta_A$ are picked out too, but not by the switches. They are directly picked out through the K and A present in the row.

Table 3.1

| $_i$ | $S$ | $I$ |
|---|---|---|
| 1 | J | A |
| 2 | K | A |
| 3 | L | B |

Table 3.2

| $_i$ | $J$ | $K$ | $L$ | $A$ | $B$ |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 1 | 0 |
| 2 | 0 | 1 | 0 | 1 | 0 |
| 3 | 0 | 0 | 1 | 0 | 1 |
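R’s model.matrix() builds exactly these switch columns from a formula. Here is a small sketch using a toy data frame matching the tables above; note that by default R drops one item column (the one for item A) to keep the model identifiable, so the output has one column fewer than Table 3.2:

```r
# Toy data: three observations with subjects J, K, L and items A, A, B
toy = data.frame( S = factor(c("J","K","L")),
                  I = factor(c("A","A","B")) )

# `0 +` keeps all subject columns; the first item level is still dropped
model.matrix( ~ 0 + S + I, data=toy )
```

Each row of the resulting matrix turns on the dummy columns for the subject and item present in that observation, just as the switches in Equation (3.2) do.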