## Description

Stat 437 HW4

Your Name (Your student ID)

## — Attaching packages ————————————— tidyverse 1.3.0 —
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.6     v dplyr   1.0.4
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## — Conflicts —————————————— tidyverse_conflicts() —
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
##     smiths
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
##     select

General rule

Due by 11:59pm Pacific Standard Time, March 30, 2021. Please show your work and submit your computer code in order to get points. Providing correct answers without supporting details does not receive full credit. This HW covers

• Bayes classifier

• kNN classifier

• Discriminant analysis

You DO NOT have to submit your HW answers using typesetting software. However, your answers

must be legible for grading. Please upload your answers to the course space. This is a relatively

long HW. So, please complete it very carefully.

For exercises from the Text, there may be solutions posted online. Please do not plagiarize those solutions.


Conceptual exercises: I (Bayes classifier)

1. This exercise is on Bayes theorem and Bayes classifier.

1.1

State clearly the definition of the 0-1 loss function. Can this function be used in multi-class

classification problems?

1.1 Response

State clearly the definition of the 0-1 loss function. From Lecture Notes 4, the definition of

the 0-1 loss function is as follows:

• Let Ĝ(x) be the estimated class label for an observation x from X, i.e., Ĝ(x) = 1 or 2

• For k, l ∈ {1, 2}, the 0-1 loss L is defined as: L(k, l) = 0 if k = l, and L(k, l) = 1 if k ≠ l

In other words: if an observation is misclassified, the loss is 1; otherwise, the loss is 0.

Can this function be used in multi-class classification problems? Yes, the 0-1 loss function

can be utilized in multi-class classification problems.

TODO
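As a small illustration (a sketch, not part of the required answer), the 0-1 loss extends directly to any number of classes, since it only checks whether two labels agree:

```r
# 0-1 loss: 0 if the estimated label equals the true label, 1 otherwise.
# The definition never uses the number of classes, so it applies unchanged
# to multi-class problems.
zero_one_loss <- function(k, l) as.numeric(k != l)

zero_one_loss(1, 1)  # 0
zero_one_loss(2, 3)  # 1 (labels beyond {1, 2} are handled the same way)
```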

1.2

Let Y be the random variable for the class label of a random vector X, such that Y ∈ G = {1, . . . , K}

where K ≥ 2 is the number of classes. Let Yˆ be the estimated class label for X. Given the prior

Pr(Y = k) = πk, k ∈ G on Class k and the conditional density fk(x) of X when it comes from

Class k. Provide the formula to obtain the posterior Pr(Y = k|X = x), which is essentially

the Bayes theorem. What is the Bayes classifier and how does it classify a new observation x0

from X? Is the decision boundary of the Bayes classifier linear or quadratic in X? Explain (but

do not have to mathematically prove) why the Bayes classifier minimizes the expected 0-1 loss.

Note that a proof of the fact that the Bayes classifier minimizes the expected 0-1 loss is given in

“LectureNotes4_notes.pdf”. You should not copy and paste the proof. Instead, please provide the

explanation based on your understanding of the proof.

1.2 Response

Provide the formula to obtain the posterior Pr(Y = k|X = x), which is essentially the

Bayes theorem. Lesson 8 – Lecture Notes 4b
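For reference, the formula pointed to above, written out from the quantities defined in the question (prior πk and class-conditional density fk):

```latex
\Pr(Y = k \mid X = x) \;=\; \frac{\pi_k \, f_k(x)}{\sum_{l=1}^{K} \pi_l \, f_l(x)}
```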

What is the Bayes classifier and how does it classify a new observation x0 from X?

TODO


Is the decision boundary of the Bayes classifier linear or quadratic in X? TODO

Explain (but do not have to mathematically prove) why the Bayes classifier minimizes

the expected 0-1 loss. TODO

1.3

If K = 2 in subquestion 1.2), what is the threshold value on Pr(Y = 1|X = x0) that is used by

the Bayes classifier to determine the class label for x0? Suppose you use a different threshold value

on Pr(Y = 1|X = x0) to classify x0, is the corresponding classifier still the Bayes classifier, and is

the corresponding loss function still the 0-1 loss? Explain your answer. Provide a scenario where

to classify an observation a different threshold value is more sensible than the threshold value used

by the Bayes classifier.

1.3 Response

If K = 2 in subquestion 1.2), what is the threshold value on Pr(Y = 1|X = x0) that is

used by the Bayes classifier to determine the class label for x0? TODO

Suppose you use a different threshold value on Pr(Y = 1|X = x0) to classify x0, is the

corresponding classifier still the Bayes classifier, and is the corresponding loss function

still the 0-1 loss? TODO

Provide a scenario where to classify an observation a different threshold value is more

sensible than the threshold value used by the Bayes classifier. TODO

1.4

If K = 2 in subquestion 1.2), π1 = 0.6, f1(x) ∼ Gaussian(0, 1) and f2(x) ∼ Gaussian(2, 1) and

x0 = 1.5. Compute Pr(Y = 1|X = x0) and use the Bayes classifier to classify x0.

1.4 Response

Compute Pr(Y = 1|X = x0) and use the Bayes classifier to classify x0. TODO
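A quick numerical sketch of this computation in R (with π2 = 1 − π1 = 0.4, as implied by the two-class setup):

```r
# Posterior Pr(Y = 1 | X = x0) via Bayes theorem with Gaussian class densities
pi1 <- 0.6; pi2 <- 0.4
x0 <- 1.5
f1 <- dnorm(x0, mean = 0, sd = 1)  # class 1 density at x0
f2 <- dnorm(x0, mean = 2, sd = 1)  # class 2 density at x0
post1 <- pi1 * f1 / (pi1 * f1 + pi2 * f2)
post1  # ~0.356; below 0.5, so the Bayes classifier assigns x0 to class 2
```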

Conceptual exercises: II (k-NN classifier)

2. Given the training set T of n observations (x1, y1), . . . , (xn, yn), where yi is the class label of observation xi and yi ∈ G = {1, . . . , K} for K ≥ 2, consider the k-NN classifier, where k is the neighborhood size.

2.1

Describe how the decision boundary (such as its smoothness and shape) of k-NN classifier changes

as k changes.


2.1 Response

Describe how the decision boundary (such as its smoothness and shape) of k-NN

classifier changes as k changes. The decision boundary becomes smoother and less sharply defined as k increases. Because a larger “neighborhood” of points is taken into consideration, the decision boundary is less influenced by individual local points, and the general shape of the classes is captured better. TODO

2.2

Explain why the training error of 1-NN classifier is 0. Provide an estimator of the test error of

a classifier and explain why it can be used as such an estimator. Is it true that a large k leads

to a k-NN classifier with smaller test error? Can the test error of a k-NN classifier be equal to

the test error of the Bayes classifier? When k is large and k/n is small, what is a k-NN classifier

approximately estimating?

2.2 Response

Explain why the training error of 1-NN classifier is 0. In the case of 1-NN, only the single nearest point to the point being classified is taken into account. For a training observation, the nearest point in the training set is the observation itself (at distance 0), so each training point is assigned its own label, and the training error is 0. TODO

Provide an estimator of the test error of a classifier and explain why it can be used as

such an estimator.

The test error can be computed as ê_m = (1/m) Σ_{i=1}^{m} L(y_{0i}, ŷ_{0i}) over the m test observations. Alternatively, if the test error cannot be computed (e.g., no test samples), the training error, ê_n = (1/n) Σ_{i=1}^{n} L(y_i, ŷ_i), can serve as a good estimate of the test error because TODO used to estimate test error or 0-1 loss?

https://wsu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=4f1a98a7-eeb3-467e-8a53-acd0005822c0 1:12

TODO
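A minimal sketch of the estimator above: the empirical test error is simply the average 0-1 loss over held-out observations (the toy labels below are made up for illustration):

```r
# Average 0-1 loss over m test observations estimates the true test error,
# since it is the sample mean of i.i.d. losses.
y_test <- c(1, 2, 1, 2, 2)  # hypothetical true labels
y_hat  <- c(1, 2, 2, 2, 1)  # hypothetical predicted labels
test_error <- mean(y_test != y_hat)
test_error  # 0.4 (2 of 5 misclassified)
```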

Is it true that a large k leads to a k-NN classifier with smaller test error? A large k will

not necessarily lead to a smaller k-NN test error. If k is too small, the model will be over-fit to

the training samples and will likely not generalize well. Conversely, if k is too large, the model will

likely be too general. Therefore, the optimal value for k requires finding a balance between small

and large neighborhoods.

TODO

Can the test error of a k-NN classifier be equal to the test error of the Bayes classifier?

Yes, given that kNN essentially estimates the unknown conditional probabilities used by the Bayes

classifier, kNN asymptotically approaches the error of Bayes, possibly equaling it.

When k is large and k/n is small, what is a k-NN classifier approximately estimating? In the case of k being large and k/n being small, the k-NN classifier is effectively estimating the posterior probability that a point belongs to a given class. This is formally described in the lecture materials as ĝ_j(x) ≈ Pr(Y = j | X = x).


2.3

When there are K ≥ 2 classes, how does a k-NN classifier classify a test observation x0?

2.3 Response

When there are K ≥ 2 classes, how does a k-NN classifier classify a test observation

x0?

For the case where K = 2, based on the content from the lecture materials:

Let N_k(x) be the neighborhood of x consisting of the k points x_i in the training set T that are closest to x.

Compute the average label as: Ȳ(x) = (1/k) Σ_{x_i ∈ N_k(x)} y_i

Specify a threshold β ∈ (0, 1) and estimate the class label of x as Ŷ(x) = 1 if Ȳ(x) > β, or Ŷ(x) = 0 otherwise.

In other words: once the model is trained, find the k nearest neighbors of the new point being classified, and predict the label that is most common among those neighbors.

https://wsu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=447d454e-bec9-4f4d-9df2-acd000582323 3:06

https://wsu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=4f1a98a7-eeb3-467e-8a53-acd0005822c0 13:18

TODO
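The K-class rule described above can be sketched as follows (a hypothetical helper for illustration, not the class::knn function used later in this HW; Euclidean distance is assumed):

```r
# Majority vote among the k nearest neighbors, for any number of classes K >= 2
knn_classify <- function(train_x, train_y, x0, k = 5) {
  # Euclidean distances from x0 to every training observation
  d <- sqrt(rowSums((train_x - matrix(x0, nrow(train_x), length(x0), byrow = TRUE))^2))
  nbrs <- train_y[order(d)[1:k]]  # labels of the k closest points
  names(which.max(table(nbrs)))   # most common label (first level wins ties)
}

train_x <- matrix(c(0, 0, 0, 1, 5, 5, 5, 6), ncol = 2, byrow = TRUE)
train_y <- c("AA", "AA", "DL", "DL")
knn_classify(train_x, train_y, c(0, 0.5), k = 3)  # "AA"
```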

2.4

When should data be standardized before applying a k-NN classifier? When standardizing data,

do we standardize each observation or each feature?

2.4 Response

When should data be standardized before applying a k-NN classifier?

Data should be standardized when the features are measured on very different scales, since k-NN uses Euclidean distance as a dissimilarity measure, and without standardization the features with the largest numeric ranges dominate the distance.

When standardizing data, do we standardize each observation or each feature?

When standardizing data, we standardize each feature (column), not each observation.
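A small check of this in R: scale() operates column-wise, so each feature ends up with mean 0 and standard deviation 1 (the toy numbers below are illustrative):

```r
# scale() standardizes each column: subtract the column mean, divide by the column sd
X <- data.frame(delay = c(2, 4, -1), distance = c(733, 2565, 1400))
Xs <- scale(X)           # standardizes each feature, not each observation
round(colMeans(Xs), 10)  # 0 for each feature
apply(Xs, 2, sd)         # 1 for each feature
```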

2.5

Using your understanding of Example 3 in “LectureNotes4b_notes.pdf”, provide a step-by-step

guide on how to choose an optimal k for k-NN classifier using cross-validation. You can provide

such a guide in the form of “pseudocode” (see, e.g., https://en.wikipedia.org/wiki/Pseudocode for

some details on pseudocode). Suppose the training set has few observations, can you still perform

cross-validation in order to choose an optimal k? Explain your answer. (Hint: for the 2nd part,

think about if having more observations helps better estimate test error.)


2.5 Response

Using your understanding of Example 3 in “LectureNotes4b_notes.pdf”, provide a

step-by-step guide on how to choose an optimal k for k-NN classifier using cross-validation.

Start by defining a range of k-values to test, as well as a number of folds for the cross-validation process. Then, for each candidate k, compute the classification error on each held-out fold and average these errors to obtain a cross-validated error for that k. The k that produces the lowest cross-validated error is then chosen as the optimal k. TODO
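The steps above can be written as pseudocode:

```
Input: training set T of n labeled observations,
       candidate values k = 1, ..., kMax, number of folds m
Randomly partition T into m folds of roughly equal size
for k in 1..kMax:
    for s in 1..m:
        train a k-NN classifier on all folds except fold s
        err[s] <- average 0-1 loss of that classifier on fold s
    cvError[k] <- (1/m) * sum(err)
Output: k_hat = argmin over k of cvError[k]
```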

Suppose the training set has few observations, can you still perform cross-validation

in order to choose an optimal k?

Cross-validation can still be used to choose k when only a few observations are present; however, with few observations each held-out fold gives a noisy estimate of the test error, so the chosen k may not be a good choice. For this reason the method is not recommended in this case.

Conceptual exercises: III (Discriminant analysis)

## 3 Exercise 2 of Section 4.7 of the Text, which starts with “It was stated in the text that

classifying an observation to the class for which (4.12) is largest is equivalent to classifying an

observation to the class for which (4.13) is largest. Prove that this is the case.” (Helpful information

on how to prove this is contained in the lecture video on LDA and “LectureNotes5b_notes.pdf”.)

3 Response

Exercise 2 of Section 4.7 of the Text, which starts with “It was stated in the text

that classifying an observation to the class for which (4.12) is largest is equivalent to

classifying an observation to the class for which (4.13) is largest. Prove that this is

the case.”

TODO

## 4 Exercise 3 of Section 4.7 of the Text, which starts with “This problem relates to the QDA

model, in which the observations within each class are drawn from a normal distribution with a

class specific mean vector and a class specific covariance matrix. We consider the simple case where

p = 1; i.e. there is only one feature.” (Helpful information on how to prove this is contained in the

lecture video on QDA and “LectureNotes5b_notes.pdf”.)

4 Response

Exercise 3 of Section 4.7 of the Text, which starts with “This problem relates to the

QDA model, in which the observations within each class are drawn from a normal

distribution with a class specific mean vector and a class specific covariance matrix.

We consider the simple case where p = 1; i.e. there is only one feature.”

TODO

## 5 Exercise 5 of Section 4.7 of the Text, which starts with “We now examine the differences

between LDA and QDA.” (Hint: for this question, you may also use information from Figures 4.9,

4.10 and 4.11 in the Text.)


5 Response

Exercise 5 of Section 4.7 of the Text, which starts with “We now examine the differences between LDA and QDA.”

5 (a)

Given a linear Bayes decision boundary, LDA is expected to perform best on the test set. On the training set, QDA may achieve a lower error rate than LDA, but only by overfitting the training data when the data are not perfectly linearly separable.

TODO

5 (b)

Given a non-linear Bayes decision boundary, we would expect QDA to perform best (i.e., most closely approximate the Bayes decision boundary) on both the training set and the test set.

TODO

5 (c)

As the sample size n increases, one would expect QDA to perform better on any data set that is not perfectly linearly separable, since the linear decision boundary will incur more and more errors as the sample size increases for non-linearly separable data. In the case of perfectly linearly separable data, one would expect the two to perform the same, since a linear decision boundary is capable of optimally separating the classes.

TODO

5 (d)

False. For two reasons: First, the Bayes decision boundary is considered to be the “gold standard”

which other classification methods approximate based on available information. If the Bayes decision boundary is linear, then the most desirable decision boundary will be linear. Second, QDA

does not produce a linear decision boundary. It may “model” a linear decision boundary by producing a decision boundary that is very nearly linear, but it does not produce a linear boundary;

for this we would use LDA. An example of where QDA will certainly not perform better than LDA

is on data where the classes are perfectly linearly separable. In this case, it is not possible for QDA

to outperform LDA because LDA will be able to fit a boundary that yields 0% error. This may be rare in practice, but it does serve as a counterexample to an answer of “True” for this question.

TODO – SVM vs LDA/QDA? Does perfect linear separability matter?

6. Let Y be the random variable for the class label of a random vector X ∈ R^p (where p is the number of features), such that Y ∈ G = {1, . . . , K} and Pr(Y = k) = πk for Class k with k ∈ G, where K ≥ 2 is the number of classes. Consider the Gaussian mixture model such that the conditional density of X when it comes from Class k is fk(x) ∼ Gaussian(µk, Σk). Given the training set T of n observations (x1, y1), . . . , (xn, yn) on (X, Y), where yi is the class label of observation xi, do the following:


6.1

Provide the MLEs of πk, µk and Σk for each k ∈ G, respectively, for the case where all Σk's are equal and for the case where not all Σk's are equal. When p ≫ n, is the MLE of Σk still accurate? If not, recommend a different estimator for estimating Σk and provide details on this estimator.

6.1 Response

Provide the MLEs of πk, µk and Σk for each k ∈ G respectively for the case where all

Σk’s are equal and for the case where not all Σk’s are equal.

TODO

When p ≫ n, is the MLE of Σk still accurate? If not, recommend a different estimator

for estimating Σk and provide details on this estimator.

TODO

6.2

Assume p = 2 and K = 2 and k = 1. For the density fk(x) ∼ Gaussian(µk, Σk), what shape do

its contours take, and how does Σk control the shape of these contours? How do you check if the

conditional density of X given that it comes from Class k is Gaussian?

6.2 Response

For the density fk(x) ∼ Gaussian(µk, Σk), what shape do its contours take, and how

does Σk control the shape of these contours?

TODO

How do you check if the conditional density of X given that it comes from Class k is

Gaussian?

TODO

6.3

Is it true that discriminant analysis will perform badly if the Gaussian assumption is violated?

(Hint: for this question, you may also use the information provided by Figures 4.10 and 4.11 of

the Text.) Let X = (X1, . . . , Xp)^T, i.e., X1 up to Xp are the feature variables. Can discriminant analysis be applied to observations of X when some Xj, j = 1, . . . , p, is a discrete variable (such as a categorical variable)? Explain your answer.

6.3 Response

Is it true that discriminant analysis will perform badly if the Gaussian assumption is

violated?

TODO


Can discriminant analysis be applied to observations of X when some Xj, j = 1, . . . , p, is a discrete variable (such as a categorical variable)?

TODO

6.4

What is a ROC curve, and what is AUC? How is AUC used to gauge the performance of a classifier?

If you apply the same classifier, say, LDA or QDA under the same Gaussian mixture model, to

two data sets that are independently generated from the same data generating process, i.e., that

are independently generated from (X, Y ) for classification problems, and obtain two ROC curves,

would the two ROC curves be quite different? Explain your answer. When there are 3 or more

classes, are the codes provided in the lecture notes able to obtain ROC curves and their AUC’s for

LDA and QDA?

6.4 Response

What is a ROC curve, and what is AUC?

TODO

How is AUC used to gauge the performance of a classifier?

TODO

If you apply the same classifier, say, LDA or QDA under the same Gaussian mixture

model, to two data sets that are independently generated from the same data generating process, i.e., that are independently generated from (X, Y ) for classification

problems, and obtain two ROC curves, would the two ROC curves be quite different?

TODO

When there are 3 or more classes, are the codes provided in the lecture notes able to

obtain ROC curves and their AUC’s for LDA and QDA?

TODO

6.5

Describe the key similarities and differences, respectively, between LDA and logistic regression.

Provide a situation where discriminant analysis can still be sensibly applied but logistic regression

is not well-defined.

6.5 Response

Describe the key similarities and differences, respectively, between LDA and logistic

regression.

https://wsu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=4f1a98a7-eeb3-467e-8a53-

acd0005822c0 12:06 ?

TODO


Provide a situation where discriminant analysis can still be sensibly applied but logistic

regression is not well-defined.

TODO

Applied exercises: I (k-NN classifier)

7. Please refer to the NYC flight data nycflights13 that has been discussed in the lecture notes and

whose manual can be found at https://cran.r-project.org/web/packages/nycflights13/index.html.

We will use flights, a tibble from nycflights13.

Please use set.seed(123) for the whole of this exercise. Randomly select from flights, for each of the 3 carriers “UA”, “AA” and “DL”, 500 observations for the 3 features dep_delay, arr_delay and distance. Let us try to see if we can use the 3 features to identify if an observation belongs to a specific carrier. The following tasks and questions are based on the extracted observations. Note that you need to remove rows with NA's from the extracted observations.

# Setting the seed for the chunk

set.seed(123)

# Inspecting the raw data

head(flights)

## # A tibble: 6 x 19

## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time

## <int> <int> <int> <int> <int> <dbl> <int> <int>

## 1 2013 1 1 517 515 2 830 819

## 2 2013 1 1 533 529 4 850 830

## 3 2013 1 1 542 540 2 923 850

## 4 2013 1 1 544 545 -1 1004 1022

## 5 2013 1 1 554 600 -6 812 837

## 6 2013 1 1 554 558 -4 740 728

## # … with 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## # hour <dbl>, minute <dbl>, time_hour <dttm>

# Extracting bulk data, filtering, and removing missing

# values

filtered <- flights %>% dplyr::select(carrier, dep_delay, arr_delay,
    distance) %>% filter(carrier %in% c("UA", "AA", "DL")) %>%
    na.omit()

# Randomly selecting 500 observations from the bulk data

fl500 <- sample_n(filtered, size = 500, replace = FALSE)

# Inspecting the pre-processed data

head(fl500)


## # A tibble: 6 x 4

## carrier dep_delay arr_delay distance

## <chr> <dbl> <dbl> <dbl>

## 1 AA -9 -48 733

## 2 UA 83 41 2565

## 3 UA -7 -41 2565

## 4 AA -4 -39 2475

## 5 UA 9 -4 1400

## 6 DL -7 -42 2475

7.1

First, you need to standardize the features since they are on very different scales. Then randomly

split the observations into a training set that contains 70% of the observations and a test set that

contains the remaining observations.

7.1 Code

# Setting the seed for the chunk

set.seed(123)

# Standardizing features and splitting data and label columns

fl500Labels <- fl500 %>% dplyr::select(carrier) %>% as.data.frame()

fl500Std <- as.data.frame(scale(fl500[, 2:4]))

# Splitting the samples into train and test sets

trainingIndices <- sample(1:nrow(fl500Std), floor(0.7 * nrow(fl500Std)),

replace = FALSE)

testIndices <- (1:nrow(fl500Std))[-trainingIndices]

trainData <- as.data.frame(fl500Std[trainingIndices, ])

trainLabels <- as.data.frame(fl500Labels[trainingIndices, ])

colnames(trainLabels) <- "carrier"

testData <- as.data.frame(fl500Std[testIndices, ])

testLabels <- as.data.frame(fl500Labels[testIndices, ])

colnames(testLabels) <- "carrier"

7.2

Consider the observations as forming 3 classes that are determined by carrier. To the training

set, apply 10 fold cross-validation to k-NN classifier with features arr_delay, dep_delay, and

distance to determine the optimal k from the values {1, . . . , 15}. Apply the optimal k-NN to the

test set, provide the classification table and the overall error rate, and provide visualization of the

classification results. Do you think the error rate is reasonable? Explain your answer. (Hint: you

can follow the strategy provided by Example 3 in “LectureNotes4b_notes.pdf”. )


7.2 Code

# Setting chunk seed

set.seed(123)

# 10-fold CV to determine the best k from 1-15
library(class)  # knn()
m <- 10
folds <- sample(1:m, nrow(trainData), replace = TRUE)
kMax <- 15
testErrors <- matrix(0, nrow = 2, ncol = kMax)
for (k in 1:kMax) {
    testError1 <- double(m)
    for (s in 1:m) {
        trainingTmp <- trainData[folds != s, ]
        testTmp <- trainData[folds == s, ]
        trainingLabsTmp <- trainLabels[folds != s, ]
        testLabsTmp <- trainLabels[folds == s, ]
        knntmp <- knn(trainingTmp, testTmp, trainingLabsTmp, k)
        nMissedObs <- sum(1 - as.numeric(knntmp == testLabsTmp))
        tError <- nMissedObs/length(testLabsTmp)
        testError1[s] <- tError
    }
    testErrors[, k] <- c(mean(testError1), sd(testError1))
}
colnames(testErrors) <- paste("k=", 1:kMax, sep = "")
rownames(testErrors) <- c("mean(TestError)", "sd(TestError)")
testErrors <- as.data.frame(testErrors)
as.numeric(testErrors[1, ])

## [1] 0.5772902 0.5775851 0.5623222 0.5594703 0.5369742 0.5313338 0.5313181

## [8] 0.5366948 0.5329016 0.5386426 0.5388035 0.5336004 0.5449212 0.5380716

## [15] 0.5372920

hatk <- which(testErrors[1, ] == min(testErrors[1, ]))

hatk

## [1] 7


# Applying kNN with the optimal k
knnOut <- knn(trainData, testData, trainLabels$carrier, hatk)
missedObs <- sum(1 - as.numeric(knnOut == testLabels$carrier))
tErrorOpt <- missedObs/length(testLabels$carrier)

# Classification table and error rate
table(knnOut, testLabels$carrier)

##

## knnOut AA DL UA

## AA 2 2 6

## DL 14 29 21

## UA 17 18 41

cat("Error Rate:", tErrorOpt, "\n")

## Error Rate: 0.52

# Visualizing results
testResData <- as.data.frame(testData)
testResData$Carrier <- unlist(testLabels)
testResData$EstimatedCarrier <- unlist(knnOut)
ggplot(testResData, aes(arr_delay, distance)) +
    geom_point(aes(shape = EstimatedCarrier, color = Carrier)) +
    theme_bw() + labs(title = paste0(hatk, "-NN Results"))


[Figure: "7-NN Results" — scatter plot of distance vs arr_delay for the test set, points colored by Carrier and shaped by EstimatedCarrier (AA, DL, UA)]

7.2 Response

Do you think the error rate is reasonable? No; in this case the model did not perform adequately, having produced an error rate of 52%.

7.3

Note that you have standardized the features arr_delay, dep_delay, and distance. However, with the unstandardized features, you would surely know that none of them follows a Gaussian distribution no matter with respect to which class (i.e., carrier) you look at their observations, since these features have non-negative values (whereas a Gaussian distribution can generate negative values). Again, the 3 classes are determined by carrier. So, you will apply QDA based on the 3 standardized features to the training set to train the model, and then apply the trained model to the test set (that contains the 3 standardized features) to classify its observations.

7.3a

First, check if the Gaussian assumption is satisfied. For this, note that if the standardized features

arr_delay, dep_delay, and distance follow a trivariate Gaussian distribution for each individual

class, then any pair among the 3 standardized features follows a bivariate Gaussian distribution for

each individual class.


7.3a Code

# Setting the seed for the chunk

set.seed(123)

# Inspecting the data

head(fl500Std)

## dep_delay arr_delay distance

## 1 -0.57254000 -1.2198869 -0.88996180

## 2 1.90785556 0.8324748 1.54589275

## 3 -0.51861836 -1.0584652 1.54589275

## 4 -0.43773589 -1.0123447 1.42622741

## 5 -0.08724522 -0.2052362 -0.00310864

## 6 -0.51861836 -1.0815255 1.42622741

# Collecting data

fl500StdQDA <- as.data.frame(fl500Std)

fl500StdQDA$Carrier <- fl500Labels$carrier

# Melting the data to prepare for ggplot application
fl500Melt <- melt(fl500StdQDA, id = "Carrier")

# Plotting the densities of the class variables
ggplot(fl500Melt, aes(x = value, fill = Carrier)) + geom_density(alpha = 0.25) +
    theme_bw() + labs(title = "Class Distributions", x = "Value", y = "Density")


[Figure: "Class Distributions" — overlaid density plots (Density vs Value) of the standardized feature values by Carrier (AA, DL, UA)]

# TODO: add contour plots for the three variables,

# LectureNotes5b

# https://wsu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=cd443046-bb83-4fc6-b94d-acdb011e6a73

# Splitting data into training and test sets, 70-30 ratio

trainingIndices <- sample(1:nrow(fl500StdQDA), floor(0.7 * nrow(fl500StdQDA)),

replace = FALSE)

testIndices <- (1:nrow(fl500StdQDA))[-trainingIndices]

trainData <- fl500StdQDA[trainingIndices, ]

testData <- fl500StdQDA[testIndices, ]

# Training the model

qdaTrain <- qda(Carrier ~ ., data = trainData)

7.3a Response

Visualizing the observations from each of the three classes “AA”, “DL” and “UA”, the class data (pooled over the variables) do not appear to follow a trivariate Gaussian distribution: the density curves are highly irregular, appear bi-modal rather than uni-modal, and also have very long right tails. TODO Double check w/ Carl


7.3b

Apply the estimated (i.e., trained) QDA model to the test set, provide the estimated mixing

proportion and estimated mean vector for each class, and provide the classification table. If you

randomly pick an observation on the 3 standardized features, is it approximately equally likely to

belong to each of the 3 carriers? (You do not need to mathematically prove your answer. However,

think along this line: we are testing the equality of 3 population proportions, and each estimated

population proportion is based on around 350 observations, which approximately can be done via

a z-test since the central limit theorem is in effect.) How is the performance of QDA on this test

set? Explain your answers.

7.3b Code

# Setting the seed for the chunk

set.seed(123)

# Testing the model

qdaTest <- predict(qdaTrain, newdata = testData)

# Estimated mixing proportions
cat("Estimated Proportion - AA:", sum(fl500$carrier == "AA")/nrow(fl500), "\n")

## Estimated Proportion - AA: 0.202

cat("Estimated Proportion - DL:", sum(fl500$carrier == "DL")/nrow(fl500), "\n")

## Estimated Proportion - DL: 0.348

cat("Estimated Proportion - UA:", sum(fl500$carrier == "UA")/nrow(fl500), "\n")

## Estimated Proportion - UA: 0.45

# Estimated mean vector
mv <- fl500 %>% dplyr::select(carrier, dep_delay, arr_delay, distance) %>%
    dplyr::group_by(carrier) %>%
    dplyr::summarise(mean_dep_delay = mean(dep_delay),
        mean_arr_delay = mean(arr_delay), mean_distance = mean(distance))
mv

## # A tibble: 3 x 4

## carrier mean_dep_delay mean_arr_delay mean_distance

## * <chr> <dbl> <dbl> <dbl>


## 1 AA 5.63 -4.36 1434.

## 2 DL 12.2 6.10 1261.

## 3 UA 15.2 8.13 1497.

# TODO: mean vector standardized or not?

# Classification table and error

table(testData$Carrier, qdaTest$class)

##

## AA DL UA

## AA 0 22 11

## DL 1 23 25

## UA 0 37 31

cat("Classification Error:", 1 - mean(qdaTest$class == testData$Carrier))

## Classification Error: 0.64

7.3b Response

If you randomly pick an observation on the 3 standardized features, is it approximately equally likely to belong to each of the 3 carriers? No; observations belonging to UA are about twice as likely as those belonging to AA. The mixing proportions for the three carriers (classes) are:

• AA: 20.2%

• DL: 34.8%

• UA: 45%

TODO: is this more about whether any given sample could be predicted to be in all three? or the

actual known sample proportions?
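One way to carry out the suggested test of equal proportions (a sketch; the counts below are derived from the printed proportions, i.e., 0.202, 0.348 and 0.45 of the 500 sampled flights):

```r
# Chi-squared test of H0: the three carrier proportions are all equal (1/3 each)
counts <- c(AA = 101, DL = 174, UA = 225)  # 500 * c(0.202, 0.348, 0.450)
chisq.test(counts)  # tiny p-value, so equal proportions are rejected
```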

How is the performance of QDA on this test set? QDA performed poorly on the test set, with an error rate of 64%. This is likely because the Gaussian mixture model assumption is strongly violated, based on the density plots of each class.

7.3c

Extract observations that are for “UA” or “DL” from the training set and the test set, respectively,

to form a new training set and a new subset, so that there are now 2 classes “UA” and “DL”.

Apply QDA to the new training set and then apply the trained model to the new test set. Report

the overall error rate on the test set, provide the ROC curve, and calculate the AUC. How is the

performance of QDA on this test set? Explain your answer.


7.3c Code

# Setting the seed for the chunk

set.seed(123)

# Filtering observations in classes 'UA' and 'DL' from previous train/test sets
trainData2 <- trainData %>% dplyr::select(dep_delay, arr_delay, distance, Carrier) %>%
    dplyr::filter(Carrier %in% c("UA", "DL"))
testData2 <- testData %>% dplyr::select(dep_delay, arr_delay, distance, Carrier) %>%
    dplyr::filter(Carrier %in% c("UA", "DL"))

# Training the QDA model

qdaTrain2 <- qda(Carrier ~ ., data = trainData2)

qdaTest2 <- predict(qdaTrain2, newdata = testData2)

# Classification table and error rate

table(testData2$Carrier, qdaTest2$class)

##

## DL UA

## DL 23 26

## UA 37 31

cat("Classification Error:", 1 - mean(qdaTest2$class == testData2$Carrier), "\n")

## Classification Error: 0.5384615

# ROC and AUC
predQDA <- ROCR::prediction(qdaTest2$posterior[, 2], testData2$Carrier)
perfROC <- ROCR::performance(predQDA, "tpr", "fpr")
plot(perfROC, avg = "threshold", spread.estimate = "boxplot",
    colorize = TRUE, main = "ROC Curve")


[Figure: "ROC Curve" — average true positive rate vs average false positive rate, colorized by threshold]

AUC <- ROCR::performance(predQDA, measure = "auc")@y.values
cat("AUC:", AUC[[1]], "\n")

## AUC: 0.4846939

7.3c Response

How is the performance of QDA on this test set? The error rate for this model (compared to the last with all three carriers/classes) was slightly lower at 53.85%, but the model still failed to perform adequately. Based on the ROC curve and the AUC (0.48), the model performed no better than random guessing.

Applied exercises: II (Discriminant analysis)

8. The following is on software commands:

8.1

What is the main cause of the message “Warning in lda.default(x, grouping, . . . ): variables are

collinear”? What is the main cause of the message “Error in qda.default(x, grouping, . . . ) : some

group is too small for ‘qda’ ”?


8.1 Response

What is the main cause of the message “Warning in lda.default(x, grouping, . . . ):

variables are collinear”?

This warning indicates that the estimated (pooled) covariance matrix is singular or nearly singular,

i.e., some predictor variables are (nearly) linear combinations of the others and hence not linearly

independent.
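A minimal sketch that reproduces the warning on toy data (the duplicated column makes the pooled covariance matrix singular):

```r
library(MASS)
set.seed(1)
df <- data.frame(x1 = rnorm(20), grp = rep(c("A", "B"), 10))
df$x2 <- df$x1  # exact copy of x1, hence collinear predictors
msg <- tryCatch(lda(grp ~ x1 + x2, data = df),
                warning = function(w) conditionMessage(w))
msg  # "variables are collinear"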

What is the main cause of the message “Error in qda.default(x, grouping, . . . ) : some

group is too small for ‘qda’ ”?

This error occurs when some class does not contain enough observations to estimate its class-specific

parameters: qda{MASS} requires each group to have more observations than there are predictor

variables, since otherwise the standard estimate of that group's covariance matrix is singular.
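A minimal sketch that triggers the error (toy data: class "B" has only 2 observations but there are 3 predictors, so its covariance matrix cannot be estimated):

```r
library(MASS)
set.seed(1)
df <- data.frame(x1 = rnorm(12), x2 = rnorm(12), x3 = rnorm(12),
                 grp = c(rep("A", 10), rep("B", 2)))
msg <- tryCatch(qda(grp ~ ., data = df),
                error = function(e) conditionMessage(e))
msg  # "some group is too small for 'qda'"
```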

8.2

Provide details on the list that predict{MASS} returns.

8.2 Response

The list returned by predict{MASS} for an lda or qda fit contains the component class, the MAP

(maximum a posteriori) classification, and the component posterior, the posterior class probabilities

in the form of an n × k matrix, where n is the number of observations and k is the number of

classes. For an LDA fit the list additionally contains x, the matrix of scores on the linear

discriminants.
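A quick illustration on the built-in iris data (choice of data set is just for illustration; any labeled data set works):

```r
library(MASS)
fit <- lda(Species ~ ., data = iris)
pred <- predict(fit, iris)
names(pred)          # "class" "posterior" "x"
dim(pred$posterior)  # 150 x 3: one row per observation, one column per class
```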

8.3

The arguments gamma and lambda of rda{klaR} are usually determined by cross-validation. Can

they be set manually?

8.3 Response

Yes, both can be set manually by supplying gamma= and lambda= (each in [0, 1]) within the

rda{klaR} function call, in which case no cross-validation is performed for them.
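For instance, a sketch on the iris data (assuming the klaR package is installed):

```r
library(klaR)
# gamma and lambda fixed manually instead of cross-validated;
# both must lie in [0, 1].
fit <- rda(Species ~ ., data = iris, gamma = 0.1, lambda = 0.5)
fit$regularization  # reports the gamma and lambda actually used
```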

9. We will use the human cancer microarray data that were discussed in the lectures and are

provided by the R library ElemStatLearn (available at https://cran.r-project.org/src/contrib/

Archive/ElemStatLearn/). Pick 3 cancer types “MELANOMA”, “OVARIAN” and “RENAL”,

and randomly select the same set of 60 genes for each cancer type. Please use set.seed(123) for

the whole of this exercise. Your analysis will be based on observations for these genes and cancer

types.

# Using:

# https://cran.r-project.org/src/contrib/Archive/ElemStatLearn/ElemStatLearn_2015.6.26.tar.gz

# Setting the seed for the chunk

set.seed(123)


# Inspecting the raw data

nciRaw <- as.data.frame(nci)

head(nciRaw)

## CNS CNS CNS RENAL BREAST CNS CNS BREAST NSCLC

## 1 0.300 0.679961 0.940 2.800000e-01 0.485 0.310 -0.830 -0.190 0.460

## 2 1.180 1.289961 -0.040 -3.100000e-01 -0.465 -0.030 0.000 -0.870 0.000

## 3 0.550 0.169961 -0.170 6.800000e-01 0.395 -0.100 0.130 -0.450 1.150

## 4 1.140 0.379961 -0.040 -8.100000e-01 0.905 -0.460 -1.630 0.080 -1.400

## 5 -0.265 0.464961 -0.605 6.250000e-01 0.200 -0.205 0.075 0.005 -0.005

## 6 -0.070 0.579961 0.000 -1.387779e-17 -0.005 -0.540 -0.360 0.350 -0.700

## NSCLC RENAL RENAL RENAL RENAL RENAL RENAL RENAL BREAST NSCLC RENAL

## 1 0.760 0.270 -0.450 -0.030 0.710 -0.360 -0.210 -0.500 -1.060 0.150 -0.290

## 2 1.490 0.630 -0.060 -1.120 0.000 -1.420 -1.950 -0.520 -2.190 -0.450 0.000

## 3 0.280 -0.360 0.150 -0.050 0.160 -0.030 -0.700 -0.660 -0.130 -0.320 0.050

## 4 0.100 -1.040 -0.610 0.000 -0.770 -2.280 -1.650 -2.610 0.000 -1.610 0.730

## 5 -0.525 0.015 -0.395 -0.285 0.045 0.135 -0.075 0.225 -0.485 -0.095 0.385

## 6 0.360 -0.040 0.150 -0.250 -0.160 -0.320 0.060 -0.050 -0.430 -0.080 0.390

## UNKNOWN OVARIAN MELANOMA PROSTATE OVARIAN OVARIAN OVARIAN OVARIAN OVARIAN

## 1 -0.200 0.430 -0.490 -0.530 -0.010 0.640 -0.480 0.140 0.640

## 2 0.740 0.500 0.330 -0.050 -0.370 0.550 0.970 0.720 0.150

## 3 0.080 -0.730 0.010 -0.230 -0.160 -0.540 0.300 -0.240 -0.170

## 4 0.760 0.600 -1.660 0.170 0.930 -1.780 0.470 0.000 0.550

## 5 -0.105 -0.635 -0.185 0.825 0.395 0.315 0.425 1.715 -0.205

## 6 -0.080 -0.430 -0.140 0.010 -0.100 0.810 0.020 0.260 0.290

## PROSTATE NSCLC NSCLC NSCLC LEUKEMIA K562B-repro K562A-repro LEUKEMIA

## 1 0.070 0.130 0.320 0.515 0.080 0.410 -0.200 -0.36998050

## 2 0.290 2.240 0.280 1.045 0.120 0.000 0.000 -1.38998000

## 3 0.070 0.640 0.360 0.000 0.060 0.210 0.060 -0.05998047

## 4 1.310 0.680 -1.880 0.000 0.400 0.180 -0.070 0.07001953

## 5 0.085 0.135 0.475 0.330 0.105 -0.255 -0.415 -0.07498047

## 6 -0.620 0.300 0.110 -0.155 -0.190 -0.110 0.020 0.04001953

## LEUKEMIA LEUKEMIA LEUKEMIA LEUKEMIA COLON COLON COLON

## 1 -0.370 -0.430 -0.380 -0.550 -0.32003900 -0.620 -4.900000e-01

## 2 0.180 -0.590 -0.550 0.000 0.08996101 0.080 4.200000e-01

## 3 0.000 -0.500 -1.710 0.100 -0.29003900 0.140 -3.400000e-01

## 4 -1.320 -1.520 -1.870 -2.390 -1.03003900 0.740 7.000000e-02

## 5 -0.825 -0.785 -0.585 -0.215 0.09496101 0.205 -2.050000e-01

## 6 -0.130 0.520 0.120 -0.620 0.05996101 0.000 -1.387779e-17

## COLON COLON COLON COLON MCF7A-repro BREAST MCF7D-repro

## 1 0.07001953 -0.120 -0.290 -0.8100195 0.200 0.37998050 0.3100195

## 2 -0.82998050 0.000 0.030 0.0000000 -0.230 0.44998050 0.4800195

## 3 -0.59998050 -0.010 -0.310 0.2199805 0.360 0.65998050 0.9600195

## 4 -0.90998050 0.130 1.500 0.7399805 0.180 0.76998050 0.9600195

## 5 0.24501950 0.555 0.005 0.1149805 -0.315 0.05498047 -0.2149805

## 6 -0.43998050 -0.550 -0.540 0.1199805 0.410 0.54998050 0.3700195

## BREAST NSCLC NSCLC NSCLC MELANOMA BREAST BREAST MELANOMA


## 1 0.030 -0.42998050 0.160 0.010 -0.620 -0.380 0.04998047 0.650

## 2 0.220 -0.38998050 -0.340 -1.280 -0.130 0.000 -0.72001950 0.640

## 3 0.150 -0.17998050 -0.020 -0.770 0.200 -0.060 0.41998050 0.150

## 4 -1.240 0.86001950 -1.730 0.940 -1.410 0.800 0.92998050 -1.970

## 5 -0.305 0.78501950 -0.625 -0.015 1.585 -0.115 -0.09501953 -0.065

## 6 0.050 0.04001953 -0.140 0.270 1.160 0.180 0.19998050 0.130

## MELANOMA MELANOMA MELANOMA MELANOMA MELANOMA

## 1 -0.030 -0.270 0.210 -5.000000e-02 0.350

## 2 -0.480 0.630 -0.620 1.400000e-01 -0.270

## 3 0.070 -0.100 -0.150 -9.000000e-02 0.020

## 4 -0.700 1.100 -1.330 -1.260000e+00 -1.230

## 5 -0.195 1.045 0.045 4.500000e-02 -0.715

## 6 0.410 0.080 -0.400 -2.710505e-20 -0.340

# Extracting data based on required cancer types

n0 <- dim(nci)[2]

p0 <- dim(nci)[1]

rSel <- sample(1:p0, size = 60, replace = FALSE)

chk <- colnames(nci) %in% c("MELANOMA", "OVARIAN", "RENAL")

cSel <- which(chk == TRUE)

ncia <- nci[rSel, cSel]

colnames(ncia) <- colnames(nci)[cSel]

# Creating the dataframe for storing the required information

# with labels in the correct format

nciData <- data.frame(t(ncia))

nciData$Class <- colnames(ncia)

rownames(nciData) <- NULL

dim(nciData)

## [1] 23 61

head(nciData)

## X1 X2 X3 X4 X5 X6 X7 X8 X9 X10

## 1 -1.030000e+00 0.08 -0.06 0.8500195 -0.10 0.00 -0.23 -0.55 0.00 0.11

## 2 4.900000e-01 -0.30 -0.42 0.7200195 1.63 -0.80 0.08 -0.16 0.00 -0.83

## 3 -1.380000e+00 -0.94 -0.68 -0.2299805 -0.20 0.51 0.14 0.60 0.00 -0.18

## 4 -8.605855e-19 0.30 -0.35 0.1300195 0.85 0.13 -0.69 0.38 -0.17 -0.55

## 5 1.420000e+00 -0.50 -0.26 0.2900195 0.89 -0.06 -0.76 0.15 0.17 0.51

## 6 2.000000e-01 -0.27 0.03 -0.2299805 0.46 -0.53 -0.62 0.40 0.00 1.54

## X11 X12 X13 X14 X15 X16 X17

## 1 -0.669961 -0.1449902 0.00 -0.329990200 0.2600195 -0.40 -0.7300195

## 2 0.480039 0.4050098 0.68 -0.459990200 0.7700195 -0.16 -0.6200195

## 3 1.320039 -0.2649902 0.09 -0.009990234 2.8000190 0.10 0.1499805


## 4 -0.249961 -0.1649902 0.63 -0.249990200 -0.2499805 -0.01 -0.2000195

## 5 0.570039 -0.4449902 0.41 -0.189990200 0.4900195 -0.35 -0.4000195

## 6 0.130039 -0.4449902 0.66 -0.769990200 0.6500195 -0.62 -0.1000195

## X18 X19 X20 X21 X22 X23 X24 X25 X26 X27

## 1 -0.30498050 -0.46 -0.49 -0.5999902 0.24 0.255 -0.28 0.60 0.98500980 -0.49

## 2 -0.34498050 0.24 -0.26 -0.3099902 0.72 0.235 -0.41 -0.07 -0.36499020 -0.29

## 3 0.08501949 -0.67 0.22 0.7800098 0.12 -0.875 -0.37 0.04 1.42501000 0.68

## 4 -0.09498051 0.23 0.26 0.1200098 -0.21 -0.145 -0.43 -0.02 0.15500980 0.52

## 5 -0.27498050 1.29 0.13 -0.1899902 -0.06 0.755 -0.54 -0.36 0.01500977 0.37

## 6 -0.67498050 1.68 -0.36 0.5700098 -0.19 0.335 0.00 0.23 0.45500980 0.03

## X28 X29 X30 X31 X32 X33 X34 X35 X36 X37 X38

## 1 -1.255 -0.025 -0.61 0.80 0.68 -0.96 2.215 0.175 0.16 0.68 0.68001950

## 2 -0.195 -0.665 -0.03 0.10 -0.50 -0.09 1.255 -0.165 0.07 -0.10 -1.15998100

## 3 -0.345 0.255 -0.01 -0.40 0.00 0.13 1.765 -0.055 0.39 -0.18 0.76001950

## 4 0.000 -0.455 -0.50 -0.62 -0.68 -0.24 -1.085 0.315 -0.24 0.18 -0.55998050

## 5 -0.635 -0.075 -0.12 -0.73 -0.50 0.17 -0.415 0.055 -0.17 -0.01 0.05001949

## 6 -0.415 0.665 0.44 0.00 -1.04 0.09 -0.535 0.045 0.62 -0.29 -0.49998050

## X39 X40 X41 X42 X43 X44 X45 X46 X47 X48

## 1 -0.935 -0.30 -0.16 -0.16 0.105 1.18003900 4.500000e-01 -0.0925 -0.505 0.83

## 2 -0.505 -0.31 -0.11 0.10 1.955 1.30003900 -4.700000e-01 0.3375 -0.465 0.10

## 3 0.475 0.15 -0.67 -1.04 0.475 -0.77996100 -1.214306e-17 -0.6525 -0.045 -0.36

## 4 0.335 -0.15 -0.41 -0.32 1.455 -0.63996100 3.200000e-01 0.0075 0.765 -0.20

## 5 0.165 -0.48 -0.04 -0.24 0.795 -0.08996101 -2.100000e-01 0.1075 -0.715 0.03

## 6 -0.265 0.16 -0.10 -1.08 1.375 0.93003900 -1.500000e-01 0.0875 0.015 -0.55

## X49 X50 X51 X52 X53 X54 X55 X56 X57

## 1 0.1649902 -0.4450098 0.385 0.04001949 0.18 -0.51 0.28 -0.23 0.1300195

## 2 0.0000000 0.2249902 0.305 -0.78998050 -0.72 0.18 -0.15 -0.69 -0.1799805

## 3 1.6349900 -0.1150098 -0.475 -0.40998050 0.24 -0.22 -0.08 -0.03 -0.4099805

## 4 2.4049900 0.3149902 0.485 -0.24998050 -0.30 -0.01 -0.17 -0.63 0.3200195

## 5 1.6749900 0.5649902 0.245 -0.04998051 -0.22 0.13 -0.25 -0.03 -0.2299805

## 6 2.7449900 0.8549902 0.595 -0.02998051 -0.11 0.05 -0.48 0.33 0.4600195

## X58 X59 X60 Class

## 1 0.66 -0.03 3.50 RENAL

## 2 0.02 -0.04 0.43 RENAL

## 3 0.04 0.00 -0.85 RENAL

## 4 -0.02 -0.17 -0.12 RENAL

## 5 0.10 0.65 -1.00 RENAL

## 6 0.45 0.28 0.00 RENAL

9.1

Pick 2 features and visualize the observations using the 2 features. Do you think it is hard to classify

the observations based on the amount of overlap among the 3 neighborhoods of observations in each

of the 3 classes? Here “a neighborhood of observations in a class” is an “open disk that contains the

observations in the class”.


9.1 Code

# Setting the seed for the chunk

set.seed(123)

# Preparing the data for visualization

nciSub <- nciData %>% dplyr::select(X1, X2, Class)

# Plotting the data with two features (X1, X2)

ggplot(nciSub, aes(X1, X2)) + geom_point(aes(color = Class)) +

    theme_bw() + ggtitle("Data by Class for Features X1, X2")

(Figure: scatter plot “Data by Class for Features X1, X2” — X1 on the horizontal axis, X2 on the vertical axis, points colored by Class: MELANOMA, OVARIAN, RENAL.)

9.1 Response

Do you think it is hard to classify the observations based on the amount of overlap

among the 3 neighborhoods of observations in each of the 3 classes? It does appear that

the classes will be difficult to separate given the overlap. There is some visual separation as blue

seems to group towards the bottom left, red towards the bottom right, and green towards the top

center. Having said this, these data are visualized in the plot in a 2D space, whereas the actual

data exist in a 60D space. Therefore, it is difficult to say with certainty that it will be impossible

to separate the classes in their ambient space.


9.2

Apply LDA and report the classwise error rate for each cancer type.

9.2 Code

# Setting the seed for the chunk

set.seed(123)

# Training the LDA model

lda.fit <- lda(Class ~ ., data = nciData)

## Warning in lda.default(x, grouping, ...): variables are collinear

# Testing the model

lda.pred <- predict(lda.fit, nciData)

TrueClassLabel <- nciData$Class

LDAEstimatedClassLabel <- lda.pred$class

# Generating 2×2 prediction table and computing class-wise

# error rates

predTable <- table(LDAEstimatedClassLabel, TrueClassLabel)

predTable

## TrueClassLabel

## LDAEstimatedClassLabel MELANOMA OVARIAN RENAL

## MELANOMA 6 0 0

## OVARIAN 0 5 0

## RENAL 2 1 9

cwDF <- as.matrix.data.frame(predTable)

correct <- diag(cwDF)

total <- colSums(cwDF)

cwError <- 1 - (correct/total)

cat("Classwise Error:", cwError, "\n")

## Classwise Error: 0.25 0.1666667 0
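The class-wise error formula can be verified directly on the confusion table above (counts copied from the output; rows are predicted labels, columns are true labels, so column sums give the class totals):

```r
cm <- matrix(c(6, 0, 0,
               0, 5, 0,
               2, 1, 9), nrow = 3, byrow = TRUE)
# Per-class error: 1 minus the correct count over the true-class total
classwise_error <- 1 - diag(cm) / colSums(cm)
round(classwise_error, 7)  # 0.25 0.1666667 0, matching the printed values
```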

9.3

Use the library klaR, and apply regularized discriminant analysis (RDA) by setting the arguments

gamma and lambda of rda{klaR} manually so that the resulting classwise error rate for each cancer

type is zero.


9.3 Code

# Setting the seed for the chunk

set.seed(123)

# Training the RDA model

rda.fit <- rda(Class ~ ., data = nciData, gamma = 0.825, lambda = 0.37)

# Testing the model

rda.pred <- predict(rda.fit, nciData)

# Collecting performance metrics

TrueClassLabel <- nciData$Class

rDAEstimatedClassLabel <- rda.pred$class

# Computing and displaying performance metrics

predTable <- table(rDAEstimatedClassLabel, TrueClassLabel)

predTable

## TrueClassLabel

## rDAEstimatedClassLabel MELANOMA OVARIAN RENAL

## MELANOMA 8 0 0

## OVARIAN 0 6 0

## RENAL 0 0 9

cwDF <- as.matrix.data.frame(predTable)

correct <- diag(cwDF)

total <- colSums(cwDF)

cwError <- 1 - (correct/total)

cat("Classwise Error:", cwError, "\n")

## Classwise Error: 0 0 0

9.4

Obtain the estimated covariance matrices from the RDA and visualize them using the same strategy in Example 3 in “LectureNotes5c_notes.pdf”. What can you say about the degree of dependence among these genes for each of the three cancer types? (Hint and caution: the class labels

“MELANOMA”, “OVARIAN” and “RENAL” will be ordered alphabetically by R. So, you need to

keep track on which estimated covariance matrix is for which class. Otherwise, you will get wrong

visualization.)


9.4 Code

# Setting the seed for the chunk

set.seed(123)

# Extracting covariance matrices

hatSigma1 <- rda.fit$covariances[, , 1]

hatSigma2 <- rda.fit$covariances[, , 2]

hatSigma3 <- rda.fit$covariances[, , 3]

# Melting covariance matrices and prepping data for

# visualization

melted_hatSigma1 <- melt(hatSigma1)

melted_hatSigma2 <- melt(hatSigma2)

melted_hatSigma3 <- melt(hatSigma3)

EstSigmaAll <- rbind(melted_hatSigma1, melted_hatSigma2, melted_hatSigma3)

EstSigmaAll$Cancer <- rep(c("MELANOMA", "OVARIAN", "RENAL"),

each = nrow(melted_hatSigma1))

EstSigmaAll$Cancer <- factor(EstSigmaAll$Cancer)

# Plotting the data

ggplot(EstSigmaAll, aes(Var1, Var2, fill = value)) + geom_tile() +

    scale_fill_gradient2(low = "blue", high = "red") + facet_grid(~Cancer) +

    labs(title = "Estimated Covariance Matrices", x = "", y = "") +

    theme(plot.title = element_text(hjust = 0.5))

(Figure: “Estimated Covariance Matrices” — three heatmaps of the 60 × 60 estimated covariance matrices over genes X1–X60, one panel per cancer type (MELANOMA, OVARIAN, RENAL), with a fill gradient from blue (negative, about −2) to red (positive, about 2).)

9.4 Response

What can you say about the degree of dependence among these genes for each of the

three cancer types? Judging from the heatmaps, RENAL cancer expresses the most dependence

between genes, showing the strongest off-diagonal structure, while OVARIAN cancer expresses the

least, with its estimated covariance matrix closest to diagonal; MELANOMA falls in between.
