\( \newcommand{\bm}[1]{\boldsymbol{\mathbf{#1}}} \)

Chapter 3 Factor Analysis

Factor analysis is a technique that represents the variables of a dataset \(X_1, X_2, \cdots, X_p\) (or \(\bm{X}_{p \times 1}\)) as linearly related to a smaller number of unobservable variables called factors, denoted \(F_1, F_2, \cdots, F_m\) (or \(\bm{F}_{m \times 1}\)). The factors represent latent variables underlying the original variables; they are hypothetical in the sense that they cannot be measured or observed directly.

Notation:

  • Capital Letter: Random Variable
  • Lower case Letter: Observation of a random variable
  • Bold: Vector or Matrix
  • Normal: Single Value

3.1 The Concept of Factor Analysis

The Orthogonal Factor Model posits that each variable \(X_i\) is a linear combination of the underlying latent factors \(F_1, F_2, \cdots, F_m\) plus an error term. For the variables in any observation vector in a sample, the model is defined as:

\[\begin{equation} \begin{split} X_1 - \mu_1 = \ell_{11} F_1 + \ell_{12} F_2 + &\cdots + \ell_{1m} F_m + \varepsilon_1 \\ X_2 - \mu_2 = \ell_{21} F_1 + \ell_{22} F_2 + &\cdots + \ell_{2m} F_m + \varepsilon_2 \\ & ~~ \vdots \\ X_p - \mu_p = \ell_{p1} F_1 + \ell_{p2} F_2 + &\cdots + \ell_{pm} F_m + \varepsilon_p \end{split} \tag{3.1} \end{equation}\]

or, in matrix notation,

\[ \bm{X}_{p \times 1} - \bm{\mu}_{p \times 1} = \bm{L}_{p \times m} \bm{F}_{m \times 1} + \bm{\varepsilon}_{p \times 1} \tag{3.2} \]

where \(\bm{\mu}\) is the mean vector and \(\bm{\varepsilon}\) is a random error term indicating that the relationship between the variables and the factors is not exact. Several assumptions must be made regarding the terms of the factor model described above.

  1. Factors are independent of each other.
    • \(E(F_j) = 0\), \(Var(F_j) = 1\)
    • \(Cov(F_j, F_k) = 0\) where \(j \neq k\)
  2. The error terms \(\varepsilon_i\) are independent of each other.
    • \(E(\varepsilon_i) = 0\), \(Var(\varepsilon_i) = \psi_i\)
    • \(Cov(\varepsilon_i, \varepsilon_j) = 0\) where \(i \neq j\)
  3. \(\varepsilon_i\) and \(F_j\) are independent: \(Cov(\varepsilon_i, F_j) = 0\).

Note that the assumption \(Cov(\varepsilon_i, \varepsilon_j) = 0\) implies the common factors account for all of the correlation among the variables \(X_i\); any covariance between two variables arises only through the factors.
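As a concrete illustration, the following R sketch simulates data from an orthogonal factor model with \(p = 4\) variables and \(m = 2\) factors. The loading matrix, mean vector, and specific variances below are made-up values chosen only for this example:

set.seed(1)

n   <- 5000                     # number of observations
L   <- matrix(c(0.9, 0.1,       # hypothetical p x m loading matrix
                0.8, 0.3,
                0.2, 0.7,
                0.1, 0.9), nrow = 4, byrow = TRUE)
mu  <- c(5, 3, 4, 1)            # hypothetical mean vector
psi <- c(0.2, 0.3, 0.25, 0.15)  # hypothetical specific variances

Fmat <- matrix(rnorm(n * 2), nrow = n)    # factors: mean 0, variance 1, uncorrelated
eps  <- matrix(rnorm(n * 4, sd = rep(sqrt(psi), each = n)), nrow = n)  # errors
X    <- sweep(Fmat %*% t(L) + eps, 2, mu, "+")   # each row follows X = mu + L F + eps

round(cov(X), 2)                 # should be close to L L^T + Psi
round(L %*% t(L) + diag(psi), 2)

The last two lines preview the covariance factorization derived in the next subsection: with the assumptions enforced in the simulation, the sample covariance of X is approximately \(\bm{L}\bm{L}^T + \bm{\Psi}\).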

3.1.1 Factoring the Covariance Matrix

With the assumptions above,

\[\begin{equation} \begin{split} & \bm{\Sigma}_{p \times p} = Cov(\bm{X}_{p \times 1}) = \bm{L L^T}_{p \times p} + \bm{\Psi}_{p \times p}~, ~~where \\ & \bm{L}_{p \times m} = \begin{pmatrix} \ell_{11} & \ell_{12} & \cdots & \ell_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ \ell_{p1} & \ell_{p2} & \cdots & \ell_{pm} \end{pmatrix}, \\ & \bm{\Psi}_{p \times p} = \begin{pmatrix} \psi_{1} & & 0\\ & \ddots & \\ 0 & & \psi_{p} \end{pmatrix} \end{split} \tag{3.3} \end{equation}\]



  • \(Var(X_i) = \sigma_{ii} = \ell^2_{i1} + \ell^2_{i2} + \cdots + \ell^2_{im} + \psi_i\)

  • \(Cov(X_i, X_j) = \sigma_{ij} = \ell_{i1}\ell_{j1} + \ell_{i2}\ell_{j2} + \cdots + \ell_{im}\ell_{jm}\)

  • \(Cov(\bm{X}, \bm{F}) = \bm{L}\), i.e. \(\ell_{ij} = Cov(X_i, F_j)\)
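The last identity follows directly from the model and the assumptions, since \(Cov(\bm{F}, \bm{F}) = \bm{I}_m\) and the errors are uncorrelated with the factors:

\[ Cov(\bm{X}, \bm{F}) = Cov(\bm{X} - \bm{\mu}, \bm{F}) = Cov(\bm{L}\bm{F} + \bm{\varepsilon}, \bm{F}) = \bm{L} \, Cov(\bm{F}, \bm{F}) + Cov(\bm{\varepsilon}, \bm{F}) = \bm{L} \, \bm{I}_m + \bm{0} = \bm{L} \]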

We therefore have a partitioning of the variance of each variable \(X_i\) into a component due to the common factors, \(h_i^2\), called the communality, and a component due to the specific variance \(\psi_i\):

\[ \begin{equation} \begin{split} Var(X_i) & = (\ell^2_{i1} + \ell^2_{i2} + \cdots + \ell^2_{im}) + \psi_i \\ & = h^2_i + \psi_i \end{split} \tag{3.4} \end{equation} \]
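In R, the communalities and specific variances can be read off a loading matrix directly; a minimal sketch, reusing the hypothetical \(\bm{L}\) and \(\psi_i\) values from the simulation above:

L   <- matrix(c(0.9, 0.1,     # hypothetical loading matrix (illustration only)
                0.8, 0.3,
                0.2, 0.7,
                0.1, 0.9), nrow = 4, byrow = TRUE)
psi <- c(0.2, 0.3, 0.25, 0.15)

h2 <- rowSums(L^2)   # communality h_i^2: sum of squared loadings in row i
h2 + psi             # Var(X_i) = h_i^2 + psi_i for each variable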

3.2 Principal Component Method

There are several methods for estimating the factor loadings and communalities, including the principal component method, the principal factor method, the iterated principal factor method, and the maximum likelihood method.

The approach of the principal component method is to calculate the sample covariance matrix \(\bm{S}\) from a sample of data and then find an estimate of the loading matrix, denoted \(\hat{\bm{L}}\), that can be used to factor \(\bm{S}\).

By orthogonally diagonalizing \(\bm{S}\):

\[ \begin{equation} \begin{split} &\bm{S} = PDP^T \\ \\ , ~ where \ & D = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_p \end{pmatrix}_{p \times p} , \\ & P = \begin{bmatrix} \bm{e}_1 & \bm{e}_2 & \cdots & \bm{e}_p \end{bmatrix}_{p \times p} \end{split} \tag{3.5} \end{equation} \]


\(D\) is a diagonal matrix whose diagonal entries \(\lambda_1 > \lambda_2 > \cdots > \lambda_p\) are the eigenvalues of \(\bm{S}\).
\(P\) is an orthogonal matrix whose columns \(\bm{e}_1, \cdots, \bm{e}_p\) are the corresponding eigenvectors of \(\bm{S}\).

By factoring the diagonal matrix \(D\) as \(D = D^{1/2} ~ D^{1/2}\), which is possible since all \(\lambda_i \geq 0\):

\[ \begin{equation} \begin{split} \bm{S} &= PDP^T = (P D^{1/2})(D^{1/2} P^T) \\ & = (PD^{1/2})(PD^{1/2})^T \\ & = \bm{LL^T} \end{split} \tag{3.6} \end{equation} \]

Since we are interested in finding \(m\) (\(< p\)) factors in the data, we want to find \(\bm{L}_{p \times m}\).
By dropping the last \((p-m)\) eigenvalues in \(D\) (and the corresponding columns of \(P\))¹, the estimated loading matrix becomes \(\hat{\bm{L}}_{p \times m} = P_{p \times m} ~ D^{1/2}_{m \times m}\). And,

\[ \begin{equation} \begin{split} & \bm{S} \approx \bm{L}_{p \times m} \bm{L}^T_{m \times p} + \bm{\Psi}_{p \times p} \\ \\ , where ~~ & \bm{\Psi} = diag(\bm{S} - \bm{LL}^T) \\ & ~~~ = \begin{pmatrix} s_{11} - h_1^2 & & 0 \\ & \ddots & \\ 0 & & s_{pp} - h_p^2 \end{pmatrix} \end{split} \tag{3.7} \end{equation} \]
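These steps can be carried out by hand in base R with eigen(); a sketch using the numeric columns of iris (choosing m = 2 here simply anticipates the analysis in Section 3.3, and the signs of the columns may differ from other software by a sign flip):

S   <- cov(iris[, -5])   # sample covariance matrix
eig <- eigen(S)          # eigenvalues (D) and eigenvectors (P), in decreasing order
m   <- 2

# Loading estimate: first m eigenvector columns scaled by sqrt of the eigenvalues
L.hat   <- eig$vectors[, 1:m] %*% diag(sqrt(eig$values[1:m]))
Psi.hat <- diag(diag(S - L.hat %*% t(L.hat)))   # specific variances on the diagonal

round(L.hat %*% t(L.hat) + Psi.hat, 3)          # approximately reproduces S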

The number of factors \(m\) can be determined by a scree plot of the eigenvalues, as illustrated below.
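A basic scree plot can be drawn in base R by plotting the eigenvalues of \(\bm{S}\) in decreasing order and looking for the "elbow" where the curve flattens out:

lambda <- eigen(cov(iris[, -5]))$values   # eigenvalues of S, decreasing order
plot(lambda, type = "b",
     xlab = "Component number", ylab = "Eigenvalue",
     main = "Scree plot")
# keep the m components before the curve levels off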

3.3 Factor Analysis with the psych Package

The psych package has many functions available for performing factor analysis.

library(psych)

The principal() function performs factor analysis with the principal component method as explained above. The rotate argument is set to 'none' for now, as we have not yet rotated the factors. The covar argument is set to TRUE so that the function factors the covariance matrix \(\bm{S}\) of the data, as we did above.

library(psych)
iris.FA.PC <- 
    principal(iris[,-5], 
              nfactors = 2, 
              rotate = 'none', 
              covar = TRUE)
iris.FA.PC
Principal Components Analysis
Call: principal(r = iris[, -5], nfactors = 2, rotate = "none", covar = TRUE)
Standardized loadings (pattern matrix) based upon correlation matrix
               PC1  PC2   h2     u2 com
Sepal.Length  0.89 0.36 0.92 0.0774 1.3
Sepal.Width  -0.46 0.88 0.99 0.0091 1.5
Petal.Length  0.99 0.02 0.98 0.0163 1.0
Petal.Width   0.96 0.06 0.94 0.0647 1.0

                       PC1  PC2
SS loadings           2.92 0.91
Proportion Var        0.73 0.23
Cumulative Var        0.73 0.96
Proportion Explained  0.76 0.24
Cumulative Proportion 0.76 1.00

Mean item complexity =  1.2
Test of the hypothesis that 2 components are sufficient.

The root mean square of the residuals (RMSR) is  0.03 
 with the empirical chi square  1.72  with prob <  NA 

Fit based upon off diagonal values = 1

The function's output matches our calculations. h2 and u2 are the communality and the specific variance (uniqueness), respectively, of the standardized loadings obtained from the correlation matrix \(\bm{R}\).

iris.FA.PC[["communality"]]
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
   0.9225986    0.9909193    0.9837300    0.9352804 
iris.FA.PC[["loadings"]]

Loadings:
             PC1    PC2   
Sepal.Length  0.890  0.361
Sepal.Width  -0.460  0.883
Petal.Length  0.992       
Petal.Width   0.965       

                 PC1   PC2
SS loadings    2.918 0.914
Proportion Var 0.730 0.229
Cumulative Var 0.730 0.958
iris.FA.PC[["Vaccounted"]]
                            PC1       PC2
SS loadings           2.9184978 0.9140305
Proportion Var        0.7296245 0.2285076
Cumulative Var        0.7296245 0.9581321
Proportion Explained  0.7615072 0.2384928
Cumulative Proportion 0.7615072 1.0000000

Proportion Var is the proportion of the total variance explained by each column of \(\hat{\bm{L}}\), i.e. by each common factor.

Proportion Explained is the variance due to one common factor divided by the variance due to all common factors.
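Both quantities can be reproduced from the loading matrix itself; a minimal check using the iris.FA.PC object from above (since the loadings are standardized, the total variance equals the number of variables, p = 4):

L.std <- unclass(iris.FA.PC$loadings)   # standardized loadings as a plain matrix
ss    <- colSums(L.std^2)               # SS loadings for each factor

ss / ncol(iris[, -5])                   # Proportion Var: share of the total variance (p = 4)
ss / sum(ss)                            # Proportion Explained: share among the common factors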


  1. This is reasonable because the last few \(\lambda_i\)'s are small, so dropping them does not greatly affect the total variance of \(\bm{S}\).