Call:
lda(Direction ~ Lag1 + Lag2, data = Smarket, subset = Year <
2005)
Prior probabilities of groups:
Down Up
0.491984 0.508016
Group means:
Lag1 Lag2
Down 0.04279022 0.03389409
Up -0.03954635 -0.03132544
Coefficients of linear discriminants:
LD1
Lag1 -0.6420190
Lag2 -0.5135293
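For completeness, the printout above comes from a call like the one below (a minimal sketch: lda() is from the MASS package, Smarket is assumed to come from the ISLR package, and the object name lda.fit is taken from the plot() line that follows):
library(MASS)   # provides lda()
library(ISLR)   # provides the Smarket data set (assumed source)
lda.fit <- lda(Direction ~ Lag1 + Lag2, data = Smarket, subset = Year < 2005)
lda.fit         # printing the fit shows the priors, group means, and LD1 coefficients above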
plot(lda.fit, col="dodgerblue")
Smarket.2005 = subset(Smarket, Year == 2005) # Creating subset with 2005 data for prediction
lda.pred = predict(lda.fit, Smarket.2005)
names(lda.pred)
data.frame(lda.pred)[1:5,]
class posterior.Down posterior.Up LD1
999 Up 0.4901792 0.5098208 0.08293096
1000 Up 0.4792185 0.5207815 0.59114102
1001 Up 0.4668185 0.5331815 1.16723063
1002 Up 0.4740011 0.5259989 0.83335022
1003 Up 0.4927877 0.5072123 -0.03792892
table(lda.pred$class,Smarket.2005$Direction)
Down Up
Down 35 35
Up 76 106
mean(lda.pred$class==Smarket.2005$Direction)
[1] 0.5595238
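As a quick sanity check (a minimal sketch, using the posterior matrix whose columns appear in the data.frame printout above), the class labels produced by predict() can be recovered by applying a 50% threshold to the posterior probability of Down:
sum(lda.pred$posterior[, "Down"] >= 0.5)  # days labelled Down; should match 35 + 35 = 70
sum(lda.pred$posterior[, "Down"] < 0.5)   # days labelled Up; should match 76 + 106 = 182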
From the three methods (best subset, forward stepwise, and backward stepwise):
a. Which of the three models with k predictors has the smallest training RSS?
Of these methods, best subset selection gives the model with the smallest training RSS. For each model size k, it searches through all possible subsets of k predictors and keeps the one with the lowest training RSS, whereas forward and backward stepwise examine only a restricted, greedy sequence of models and so cannot do better on training RSS.
b. Which of the three models with k predictors has the smallest test RSS?
In order to pick the model with the smallest test RSS, we need to estimate the test error. We can do this directly with a validation set or cross-validation, or indirectly by adjusting the training error to account for the bias that comes from overfitting. Which model wins also depends on factors such as the number of variables and the shape of the underlying relationship. Cp, AIC, BIC, and adjusted R^2 are all techniques for adjusting the training error, and they allow us to select among models with different numbers of variables (the formulas are sketched in R after the list below):
Cp: Adds a penalty proportional to the number of predictors to the training RSS; it takes on a small value when the estimated test error is small.
AIC: Defined for a large class of models fit by maximum likelihood; for least-squares models it is proportional to Cp.
BIC: Derived from a Bayesian perspective, and it also takes on a small value when the estimated test error is small. Because it replaces Cp's penalty of 2*d*sigma^2 with log(n)*d*sigma^2 (and log(n) > 2 whenever n > 7), it places a heavier penalty on models with many variables and therefore tends to select smaller models than Cp.
Adjusted R^2: Unlike the other three, a large value indicates a model with a small test error; the statistic increases as the estimated test error decreases.
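For reference, a sketch of these adjustments as defined in ISLR (d is the number of predictors, n the number of observations, RSS and TSS the residual and total sums of squares, and sigma2 an estimate of the error variance; exact scaling constants vary slightly between references):
cp_stat  <- function(RSS, d, n, sigma2) (RSS + 2 * d * sigma2) / n
bic_stat <- function(RSS, d, n, sigma2) (RSS + log(n) * d * sigma2) / n  # log(n) > 2 once n > 7, hence the heavier penalty
adj_r2   <- function(RSS, TSS, d, n) 1 - (RSS / (n - d - 1)) / (TSS / (n - 1))
# For least-squares fits, AIC is proportional to Cp, so it is not repeated here.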
Application Exercise
# Use the rnorm() function to generate a predictor X of length n = 100, as well as a noise vector 𝜀 of length n = 100.
set.seed(1)
X = rnorm(100)
eps = rnorm(100)
# Generate the response vector according to the model and plot X and y
y <- 4 + 9*X + 2*X^2 + X^3 + eps
plot(X, y, main = "Response vs Predictor X", xlab = "X", ylab = "Response")
# Load leaps package and perform best subset selection
require(leaps)
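The best-subset call itself appears to have been lost in formatting; a minimal sketch of what likely produced the coefficient printout below (the object name fit_full and the use of coef(..., 10) are assumptions, mirroring the forward/backward code later in the exercise):
fit_full <- regsubsets(y ~ poly(X, 10, raw = T), data = data.frame(y, X), nvmax = 10)
full_summary <- summary(fit_full)
# Best model size according to each criterion
which.min(full_summary$cp)
which.min(full_summary$bic)
which.max(full_summary$adjr2)
coef(fit_full, 10)  # coefficients of the full 10-variable model, as printed below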
(Intercept) poly(X, 10, raw = T)1 poly(X, 10, raw = T)2
4.17282867 9.51409233 0.86853993
poly(X, 10, raw = T)3 poly(X, 10, raw = T)4 poly(X, 10, raw = T)5
0.06886485 1.90382807 0.55109577
poly(X, 10, raw = T)6 poly(X, 10, raw = T)7 poly(X, 10, raw = T)8
-1.26499408 -0.15569320 0.31986888
poly(X, 10, raw = T)9 poly(X, 10, raw = T)10
0.01627747 -0.02690171
From the coefficient printout we see that as the number of allowed variables k increases, the best k-variable model changes, and the training RSS shrinks while R squared grows, which is why Cp, BIC, and adjusted R^2 (rather than training RSS alone) are used to choose among model sizes.
5. Repeat 3, using forward stepwise selection and backward stepwise selection. How do your answers compare to the results in 3?
# Forward stepwise selection
fit_fwd <- regsubsets(y ~ poly(X, 10, raw = TRUE), data = data.frame(y, X), nvmax = 10, method = "forward")
summary(fit_fwd)
coef(fit_fwd, 10)  # coefficients of the 10-variable forward model (call implied by the output below)
(Intercept) poly(X, 10, raw = TRUE)1 poly(X, 10, raw = TRUE)2
4.17282867 9.51409233 0.86853993
poly(X, 10, raw = TRUE)3 poly(X, 10, raw = TRUE)4 poly(X, 10, raw = TRUE)5
0.06886485 1.90382807 0.55109577
poly(X, 10, raw = TRUE)6 poly(X, 10, raw = TRUE)7 poly(X, 10, raw = TRUE)8
-1.26499408 -0.15569320 0.31986888
poly(X, 10, raw = TRUE)9 poly(X, 10, raw = TRUE)10
0.01627747 -0.02690171
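The backward-stepwise fit that coef(fit_bwd, 10) below refers to is not shown; a minimal sketch of the likely call (mirroring the forward fit above, with the object name taken from the line that follows):
# Backward stepwise selection over the same degree-10 polynomial basis
fit_bwd <- regsubsets(y ~ poly(X, 10, raw = TRUE), data = data.frame(y, X), nvmax = 10, method = "backward")
summary(fit_bwd)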
coef(fit_bwd, 10)
(Intercept) poly(X, 10, raw = TRUE)1 poly(X, 10, raw = TRUE)2
4.17282867 9.51409233 0.86853993
poly(X, 10, raw = TRUE)3 poly(X, 10, raw = TRUE)4 poly(X, 10, raw = TRUE)5
0.06886485 1.90382807 0.55109577
poly(X, 10, raw = TRUE)6 poly(X, 10, raw = TRUE)7 poly(X, 10, raw = TRUE)8
-1.26499408 -0.15569320 0.31986888
poly(X, 10, raw = TRUE)9 poly(X, 10, raw = TRUE)10
0.01627747 -0.02690171
Here we see that forward stepwise and backward stepwise selection arrive at the same coefficient estimates as best subset selection, and the same models are chosen by Cp, BIC, and adjusted R^2. This makes sense: with only 10 candidate predictors, the data do not produce the situations described earlier in which the greedy stepwise searches miss the best subset. The stepwise methods can also be far more computationally efficient.
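To make the efficiency point concrete (standard model counts for these search strategies, not from the original write-up): with p = 10 candidate predictors, best subset examines every possible model, while forward or backward stepwise examines only a greedy sequence.
p <- 10
2^p                  # models examined by best subset selection: 1024
1 + p * (p + 1) / 2  # models examined by forward or backward stepwise: 56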