Chapter 3 Factor Analysis
Factor analysis is a technique that represents the variables of a dataset \(X_1, X_2, \cdots, X_p\) (or \(\bm{X}_{p \times 1}\)) as linearly related to some fewer unobservable variables called factors, denoted \(F_1, F_2, \cdots, F_m\) (or \(\bm{F}_{m \times 1}\)). The factors are representative of latent variables underlying the original variables, which is hypothetical as they cannot be measured or observed.
Notations:
- Capital Letter: Random Variable
- Lower case Letter: Observation of a random variable
- Bold: Vector or Matrix
- Normal: Single Value
3.1 The Concept of Factor Analysis
The Orthogonal Factor Model posits that each variable, \(X_i\) is a combination of the underlying latent variables, \(F_1, F_2, \cdots, F_m\). For the variables in any of the observation vectors in a sample, the model is defined as:
\[\begin{equation} \begin{split} X_1 - \mu_1 = \ell_{11} F_1 + \ell_{12} F_2 + &\cdots + \ell_{1m} F_m + \varepsilon_1 \\ X_2 - \mu_2 = \ell_{21} F_1 + \ell_{22} F_2 + &\cdots + \ell_{2m} F_m + \varepsilon_2 \\ & ~~ \vdots \\ X_p - \mu_p = \ell_{p1} F_1 + \ell_{p2} F_2 + &\cdots + \ell_{pm} F_m + \varepsilon_p \end{split} \tag{3.1} \end{equation}\], or
\[ \bm{X}_{p \times 1} - \bm{\mu}_{p \times 1} = \bm{L}_{p \times m} \bm{F}_{m \times 1} + \bm{\varepsilon}_{p \times 1} \tag{3.2} \]
, where \(\bm{\mu}\) is the mean vector and \(\bm{\varepsilon}\) is a random error term to show the relationship between the factors is not exact. There are several assumptions that must be made regarding the relationships of the factor model described above.
- Factors are independent of each other.
- \(E(F_j) = 0\), \(Var(F_j) = 1\)
- \(Cov(F_j, F_k) = 0\) where \(j \neq k\)
- The error terms \(\varepsilon_i\) are independent of each other.
- \(E(\varepsilon) = 0\), \(Var(\varepsilon_i) = \psi_i\)
- \(Cov(\varepsilon_i, \varepsilon_j) = 0\).
- \(\varepsilon_i\) and \(F_j\) are independent: \(Cov(\varepsilon_i, F_j) = 0\).
Note the assumption \(Cov(\varepsilon_i, \varepsilon_j) = 0\) implies the factors represent all correlations among the variables \(X_i\)'s.
3.1.1 Factoring the Covariance Matrix
With the assumptions above,
\[\begin{equation} \begin{split} & \bm{\Sigma}_{p \times p} = Cov(\bm{X}_{p \times p}) = \bm{L L^T}_{p \times p} + \bm{\Psi}_{p \times p}~, ~~where \\ & \bm{L}_{p \times m} = \begin{pmatrix} \ell_{11} & \ell_{12} & \cdots & \ell_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ \ell_{p1} & \ell_{p2} & \cdots & \ell_{pm} \end{pmatrix}, \\ & \bm{\Psi}_{p \times p} = \begin{pmatrix} \psi_{1} & & 0\\ & \ddots & \\ 0 & & \psi_{p} \end{pmatrix} \end{split} \tag{3.3} \end{equation}\]\(Var(X_i) = \sigma_{ii} = \ell^2_{i1} + \ell^2_{i2} + \cdots + \ell^2_{im} + \psi_i\)
\(Cov(X_i, X_j) = \sigma_{ij} = \ell_{i1}\ell_{j1} + \ell_{i2}\ell_{j2} + \cdots + \ell_{im}\ell_{jm}\)
\(Cov(\bm{X}, \bm{F}) = \bm{L}\), i.e. \(\ell_{ij} = Cov(X_i, F_j)\)
We therefore have a partitioning of the variance of the observation vector \(X_i\) into a component due to the common factors \(h_i^2\) and a component due to specific variance:
\[ \begin{equation} \begin{split} Var(X_i) & = (\ell^2_{i1} + \ell^2_{i2} + \cdots + \ell^2_{im}) + \psi_i \\ & = h^2_i + \psi_i \end{split} \tag{3.4} \end{equation} \]
3.2 Principal Component Method
There are several methods for estimating the factor loadings and communalities, including the principal component method, principal factor method, the iterated principal factor method and maximum likelihood method.
The approach of the principal component method is to calculate the sample covariance matrix \(\bm{S}\) from a sample of data and then find an estimator, denoted \(\hat{\ell}\) that can be used to factor \(\bm{S}\).
By orthogonal diagonalizing \(\bm{S}\):
\[ \begin{equation} \begin{split} &\bm{S} = PDP^T \\ \\ , ~ where \ & D = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_p \end{pmatrix}_{p \times p} , \\ & P = \begin{bmatrix} e_1 \ \vdots & \cdots & \vdots \ e_p \end{bmatrix}_{p \times p} \end{split} \tag{3.5} \end{equation} \]
\(D\) is a diagonal matrix with the diagonal entries, \(\lambda_1 > \lambda_2 > \cdots > \lambda_p\) equaling the eigenvalues of \(\bm{S}\).
\(P\) is an orthogonal matrix with columns of eigenvectors of \(\bm{S}\) corresponding to \(D\).
By factoring the diagonal matrix \(D\), \(~D = D^{1/2} ~ D^{1/2}\), since all \(\lambda_i \geq 0\):
\[ \begin{equation} \begin{split} \bm{S} &= PDP^T = (P D^{1/2})(D^{1/2} P^T) \\ & = (PD^{1/2})(PD^{1/2})^T \\ & = \bm{LL^T} \end{split} \tag{3.6} \end{equation} \]
Since we are interested in finding \(m\)(\(<p\)) factors in the data, we want to find \(\bm{L}_{p \times m}\).
By dropping the last \((p-m)\) terms in \(D\)4, the loading matrix then becomes \(\bm{L}_{p \times m} = P_{p \times m} ~ D_{m \times m}\). And,
\[ \begin{equation} \begin{split} & \bm{S} \approx \bm{L}_{p \times m} \bm{L}^T_{m \times p} + \bm{\Psi}_{p \times p} \\ \\ , where ~~ & \bm{\Psi} = diag(\bm{S} - \bm{LL}^T) \\ & ~~~ = \begin{pmatrix} s_{11} - h_1^2 & & 0 \\ & \ddots & \\ 0 & & s_{pp} - h_p^2 \end{pmatrix} \end{split} \tag{3.7} \end{equation} \]
The number \(m\) could be determined by a scree plot.
3.3 Factor Analysis with the psych
Package
The psych package has many functions available for performing factor analysis.
library(psych)
The principal()
function performs factor analysis with the principal component method as explained above. The rotation is set to none
for now as we have not yet done any rotation of the factors. The covar
argument is set to TRUE
so the function factors the covariance matrix \(\bm{S}\) of the data as we did above.
library(psych)
iris.FA.PC <-
principal(iris[,-5],
nfactors = 2,
rotate = 'none',
covar = TRUE)
iris.FA.PC
Principal Components Analysis
Call: principal(r = iris[, -5], nfactors = 2, rotate = "none", covar = TRUE)
Standardized loadings (pattern matrix) based upon correlation matrix
PC1 PC2 h2 u2 com
Sepal.Length 0.89 0.36 0.92 0.0774 1.3
Sepal.Width -0.46 0.88 0.99 0.0091 1.5
Petal.Length 0.99 0.02 0.98 0.0163 1.0
Petal.Width 0.96 0.06 0.94 0.0647 1.0
PC1 PC2
SS loadings 2.92 0.91
Proportion Var 0.73 0.23
Cumulative Var 0.73 0.96
Proportion Explained 0.76 0.24
Cumulative Proportion 0.76 1.00
Mean item complexity = 1.2
Test of the hypothesis that 2 components are sufficient.
The root mean square of the residuals (RMSR) is 0.03
with the empirical chi square 1.72 with prob < NA
Fit based upon off diagonal values = 1
The function's output matches our calculations. H2 and U2 are the *communality and specific variance, respectively, of the standardized loadings obtained from the correlation matrix \(\bm{R}\).
iris.FA.PC[["communality"]]
Sepal.Length Sepal.Width Petal.Length Petal.Width
0.9225986 0.9909193 0.9837300 0.9352804
iris.FA.PC[["loadings"]]
Loadings:
PC1 PC2
Sepal.Length 0.890 0.361
Sepal.Width -0.460 0.883
Petal.Length 0.992
Petal.Width 0.965
PC1 PC2
SS loadings 2.918 0.914
Proportion Var 0.730 0.229
Cumulative Var 0.730 0.958
iris.FA.PC[["Vaccounted"]]
PC1 PC2
SS loadings 2.9184978 0.9140305
Proportion Var 0.7296245 0.2285076
Cumulative Var 0.7296245 0.9581321
Proportion Explained 0.7615072 0.2384928
Cumulative Proportion 0.7615072 1.0000000
Proportion Var
is the proportion of total variance due to the column of \(\hat{\bm{L}}\), i.e the common factor.
Proportion Explained
is the variance due to one common factor devided by variance due to all common factors.
It's reasonable since the last few \(\lambda_i\)'s have small values, hence dropping them doesn't largely affects the total variance of \(\bm{S}\).↩