
Chapter 1 Multivariate Normal Distribution & Covariance Matrix

library(dplyr)
library(latex2exp)
library(ggplot2)
theme <- theme(axis.text.x = element_text(size = 7, face = "plain", angle = 30),
               axis.text.y = element_text(size = 7, face = "plain"),
               axis.title.x = element_text(size = 9, face = "bold"),
               axis.title.y = element_text(size = 9, face = "bold"))

1.1 Bivariate Normal Contour Map

1.1.1 ellipse()

ellipse() from the ellipse package (Murdoch and Chow 2020) generates the points of an ellipse from a correlation or a covariance matrix.

ellipse(x, scale, centre, level, npoints = 100)
  • x: a single number, the correlation of the two variables (a covariance matrix may be supplied instead).

  • scale: vector, standard deviation of the two variables.

  • centre: vector, center of the ellipse, i.e. the mean vector of the bivariate normal distribution.

  • level: a single number, the contour probability.

  • npoints: number of points used to draw the contour.

ellipse() returns a matrix with npoints rows and 2 columns (named x and y when x is a single correlation). Each row is one point on the contour, so the matrix can be plotted directly.
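As a rough illustration of what such a contour matrix contains, the points can be rebuilt in base R without the package. This is only a sketch: it assumes the contour at probability `level` is the set of points whose squared Mahalanobis distance from the mean equals `qchisq(level, 2)`, and the helper name `contour_points` is hypothetical, not part of the ellipse package or its internal implementation.

```r
# Hypothetical helper (not from the ellipse package): points on the contour
# of a bivariate normal with mean `centre`, standard deviations `scale`,
# and correlation `rho`.
contour_points <- function(rho, scale, centre, level, npoints = 100) {
    # Covariance matrix built from the correlation and standard deviations
    Sigma <- diag(scale) %*% matrix(c(1, rho, rho, 1), nrow = 2) %*% diag(scale)
    # Radius of the contour in standardized coordinates
    r <- sqrt(qchisq(level, df = 2))
    theta <- seq(0, 2 * pi, length.out = npoints)
    circle <- rbind(cos(theta), sin(theta))   # unit circle, 2 x npoints
    L <- t(chol(Sigma))                       # lower factor: Sigma = L %*% t(L)
    pts <- t(centre + r * (L %*% circle))     # map circle to ellipse, npoints x 2
    colnames(pts) <- c("x", "y")
    pts
}

pts <- contour_points(-0.8, scale = c(sqrt(2), 1), centre = c(1, 3), level = 0.5)
dim(pts)   # 100 rows (points), 2 columns (x, y)
```

Every row of `pts` has the same squared Mahalanobis distance from the centre, which can be checked with `mahalanobis()` from base R.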

1.1.2 Data Generation

The for loop below builds a data frame with 3 columns (variables):

  • Column 1: First variable of bivariate normal function (\(x_1\))

  • Column 2: Second variable of bivariate normal function (\(x_2\))

  • Column 3: The contour to which the point (\(x_1\), \(x_2\)) in that row belongs.

library(ellipse)

All_contours <- c(NA, NA, NA) 
    ## Set empty start for appending ##

for (i in 1:5) {
    level <- 0.1 * i
        ## Contour prob.: prob. that an obs falls within the contour ##
    ell_data <- ellipse(-0.8, scale = c(sqrt(2), 1), centre = c(1, 3),
                        level = level, npoints = 800 + (i - 1)^3)
        ## npoints: bigger contours get more points ##
    class <- rep(paste(level * 100, "% Contour", sep = ""), nrow(ell_data))
        ## Assign contour class ##
    ell_data <- as.data.frame(ell_data)
        ## Change to data.frame BEFORE cbind, ##
        ## or the numeric columns are coerced to character ##
    ell_data <- cbind(ell_data, class)
    
    All_contours <- rbind(All_contours, ell_data)
}

All_contours <- All_contours[-1,]
    ## Remove the empty start ##

1.1.3 Plotting

ggplot(data = All_contours) +
    geom_point(aes(x = x, y = y, color = class),
               size = 0.1) +
    scale_colour_grey(start = 0.7, end = 0.3) +
        ## Use gray scales instead of colored default ##
    labs(color = "Contours", 
         title = "Contour Plot",
         x = TeX("$x_1$"), y = TeX("$x_2$")
    )

1.2 Multivariate Normal Functions

1.2.1 Generate density f(x)

library(mvtnorm)

mu <- c(1, 3) # mean vector
Sigma <- matrix(c(2, -0.8*sqrt(2), -0.8*sqrt(2), 1),
                nrow = 2) # covariance matrix

dmvnorm(x = c(2, 5), mean = mu, sigma = Sigma)
[1] 1.562995e-05
  • x: The point x at which f(x) is evaluated, one value per variable of the multivariate normal distribution.
  • mean: Mean vector (center of the ellipse) of the multivariate normal distribution.
  • sigma: Covariance matrix of the multivariate normal distribution.

dmvnorm() returns f(x), the value of the multivariate normal density at x. For example, dmvnorm(x = c(2, 5), mean = mu, sigma = Sigma) returns the density f(\(x_1=2\), \(x_2=5\)) of the multivariate normal distribution specified by the mean vector mu and the covariance matrix Sigma.
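For a \(p\)-dimensional normal distribution with mean \(\mu\) and covariance matrix \(\Sigma\), the density is \(f(x) = (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\{-\tfrac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\}\). The base-R sketch below evaluates this formula directly; the helper name `dmvn_manual` is hypothetical and is meant only to show what dmvnorm() computes, not how mvtnorm implements it.

```r
# Hypothetical helper: evaluate the multivariate normal density from its formula.
dmvn_manual <- function(x, mean, sigma) {
    p <- length(mean)
    d <- x - mean
    q <- drop(t(d) %*% solve(sigma) %*% d)   # squared Mahalanobis distance
    exp(-q / 2) / sqrt((2 * pi)^p * det(sigma))
}

mu <- c(1, 3)                                          # mean vector
Sigma <- matrix(c(2, -0.8 * sqrt(2), -0.8 * sqrt(2), 1),
                nrow = 2)                              # covariance matrix
dmvn_manual(c(2, 5), mean = mu, sigma = Sigma)
```

The result should agree with the dmvnorm() value printed above, up to rounding.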

1.2.1.1 Example: Densities of a Contour

data <- All_contours %>% 
    filter(class == "50% Contour")

dmvnorm(x = data[1, 1:2], mean = mu, sigma = Sigma)[[1]]
[1] 0.09378295
dmvnorm(x = data[4, 1:2], mean = mu, sigma = Sigma)[[1]]
[1] 0.09378295

The returned values are the same (up to floating-point error), since both points lie on the same contour. See Section 1.1 above for how the contours are generated.
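The common value can also be worked out by hand. Assuming the contour at probability \(p\) is defined by the chi-square quantile with 2 degrees of freedom, every point on it has squared Mahalanobis distance \(-2\log(1-p)\), so the bivariate density there reduces to \((1-p) / (2\pi\sqrt{|\Sigma|})\). A short base-R check of this reasoning for the 50% contour:

```r
Sigma <- matrix(c(2, -0.8 * sqrt(2), -0.8 * sqrt(2), 1), nrow = 2)
level <- 0.5

# Squared Mahalanobis distance shared by all points on the contour
q <- qchisq(level, df = 2)   # equals -2 * log(1 - level) for 2 degrees of freedom

# Density at any point on the contour
f_contour <- exp(-q / 2) / (2 * pi * sqrt(det(Sigma)))
f_contour                    # about 0.0938, in line with the dmvnorm() output above
```

Under this assumption, contours at higher probability levels have lower density, because \(1-p\) shrinks as \(p\) grows.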

1.2.2 Covariance Matrix

Generate covariance and correlation matrices:

# Function to check whether package is installed
is.installed <- function(mypkg){
    is.element(mypkg, installed.packages()[,1])
} 

# Install "mat2tex" from GitHub if it is not already installed
if (!is.installed("mat2tex")){
    remotes::install_github("markheckmann/mat2tex")
}
  
library(mat2tex)
cov.mt <- cov(iris[,1:4]) ## Cov Matrix of variable 1~4
cor.mt <- cor(iris[,1:4]) ## Cor Matrix of variable 1~4

Covariance matrix \(= \begin{pmatrix} 0.69 & -0.04 & 1.27 & 0.52 \\ -0.04 & 0.19 & -0.33 & -0.12 \\ 1.27 & -0.33 & 3.12 & 1.30 \\ 0.52 & -0.12 & 1.30 & 0.58 \\ \end{pmatrix}\)

Correlation matrix \(= \begin{pmatrix} 1.00 & -0.12 & 0.87 & 0.82 \\ -0.12 & 1.00 & -0.43 & -0.37 \\ 0.87 & -0.43 & 1.00 & 0.96 \\ 0.82 & -0.37 & 0.96 & 1.00 \\ \end{pmatrix}\)
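The two matrices are linked through the standard deviations: each correlation is the covariance divided by the product of the two standard deviations, \(r_{ij} = s_{ij} / \sqrt{s_{ii}\, s_{jj}}\). A minimal base-R sketch of this relationship, using the same iris columns (the name `cor.manual` is introduced here only for illustration):

```r
cov.mt <- cov(iris[, 1:4])   # covariance matrix of variables 1 to 4
cor.mt <- cor(iris[, 1:4])   # correlation matrix of variables 1 to 4

# Rescale each covariance by the product of the two standard deviations
sds <- sqrt(diag(cov.mt))
cor.manual <- cov.mt / outer(sds, sds)

all.equal(cor.manual, cor.mt)        # compares up to floating-point tolerance
all.equal(cov2cor(cov.mt), cor.mt)   # base R's cov2cor() does the same rescaling
round(cor.manual, 2)
```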

Package Used

Murdoch, Duncan, and E. D. Chow. 2020. Ellipse: Functions for Drawing Ellipses and Ellipse-Like Confidence Regions. https://CRAN.R-project.org/package=ellipse.