time.1 <- Sys.time()
format(time.1, "%Y-%m-%d-%H%M%S")
## [1] "2015-02-16-102701"

Load TRAINING feature.matrix file

format(Sys.time(), "%Y-%m-%d-%H%M%S")
## [1] "2015-02-16-102701"
load("plankton-train-wndchrm-features.Rdata", verbose=TRUE)
## Loading objects:
##   feature.matrix
format(Sys.time(), "%Y-%m-%d-%H%M%S")
## [1] "2015-02-16-102705"

feature.matrix is a matrix of doubles for storage efficiency – much better than a data.frame.

class(feature.matrix)
## [1] "matrix"
typeof(feature.matrix)
## [1] "double"

center and scale data

Create z values:

\[z = \frac{x - \mu_x}{\sigma_x}\]

format(Sys.time(), "%Y-%m-%d-%H%M%S")
## [1] "2015-02-16-102705"
plankton <- feature.matrix[,1]              # plankton class
scaledTrain <- scale(feature.matrix[, -1])  # exclude class variable in first column
format(Sys.time(), "%Y-%m-%d-%H%M%S")
## [1] "2015-02-16-102727"
dim(scaledTrain)
## [1] 30336  2894
save(plankton, scaledTrain, file="plankton-train-wndchrm-scaled.Rdata")

SVD analysis

format(Sys.time(), "%Y-%m-%d-%H%M%S")
## [1] "2015-02-16-102752"
svdTrain <- svd(scaledTrain)
format(Sys.time(), "%Y-%m-%d-%H%M%S")
## [1] "2015-02-16-104021"

Let’s inspect sizes of the three SVD components, u, d, and v.

(u and v are unitary rotation matrices and d is the diagonal of a scaling matrix.)

Note: Original matrix can be constructed from SVD decomposition:

svdTrain$u %*% diag(svdTrain$d) %*% t(svdTrain$v)
dim(svdTrain$u)
## [1] 30336  2894
length(svdTrain$d)
## [1] 2894
dim(svdTrain$v)
## [1] 2894 2894
save(svdTrain, file="plankton-train-wndchrm-svd.Rdata")

Principal Component Analysis

Let’s compute PCs separately with prcomp – but PCA and SVD results are related.

format(Sys.time(), "%Y-%m-%d-%H%M%S")
## [1] "2015-02-16-104056"
pcaTrain <- prcomp(scaledTrain)   # If not already scaled, use scale.=TRUE
format(Sys.time(), "%Y-%m-%d-%H%M%S")
## [1] "2015-02-16-105728"
save(pcaTrain, file="plankton-train-wndchrm-pca.Rdata")

First 75 Features

Let’s look at first 75 features associated with largest 75 eigenvalues …

Neigen <- 75
plot(svdTrain$d[1:Neigen]^2 / sum(svdTrain$d^2),pch=19,
     main=paste("Features associated with first", Neigen, "eigenvalues"),
     xlab="Eigenvalue rank",
     ylab="Fraction of variance explained by feature")
grid()
mtext(paste0(nrow(scaledTrain), " rows, ", ncol(scaledTrain), " variables"))

plot(cumsum(svdTrain$d[1:Neigen]^2 / sum(svdTrain$d^2)),pch=19,
     main=paste("First", Neigen, "eigenvalues"),
     xlab="Eigenvalue rank",
     ylab="Cumulative fraction of variance explained")
grid()

The plot above shows that first 75 features explain about 65% of the variance.

Features associated with the first Neigen eigenvalues explain large share of variance.

boxplot(svdTrain$v[,1:Neigen],
        xlab="Eigenvalue rank",
        main="Boxplots of right singular vector weights")
mtext(paste("First", Neigen, "right singular vectors"))

Patterns of variance in the rows

par(mfrow=c(2,2))
smoothScatter(svdTrain$v[,1], main="Right singular vector v1")
smoothScatter(svdTrain$v[,2], main="Right singular vector v2")
smoothScatter(svdTrain$v[,3], main="Right singular vector v3")

par(mfrow=c(2,2))

smoothScatter(svdTrain$v[,1], svdTrain$v[,2])
smoothScatter(svdTrain$v[,1], svdTrain$v[,3])
smoothScatter(svdTrain$v[,2], svdTrain$v[,3])

PCA and SVD

Show PCA results are the same as right singular vectors when SVD is done with scaled data.

par(mfrow=c(2,2))
smoothScatter(pcaTrain$rotation[,1], pcaTrain$rotation[,2])
smoothScatter(pcaTrain$rotation[,1], pcaTrain$rotation[,3])
smoothScatter(pcaTrain$rotation[,2], pcaTrain$rotation[,3])  

3D interactive view of first three principal components

The first column of feature.matrix gives the index of the 121 plankton classes in the training data. This column is converted to a factor for displaying the plankton in different colors – but with so many classes, they don’t all have unique colors.

library(rgl)
library(pca3d)
pca3d(pcaTrain, group=as.factor(feature.matrix[,1]))
## 
## Legend:
## ----------------------------------------
##        group:        color,        shape
## ----------------------------------------
##            1:          red, tetrahaedron
##            2:       green3,         cube
##            3:         blue,       sphere
##            4:         cyan, tetrahaedron
##            5:      magenta,         cube
##            6:       yellow,       sphere
##            7:         gray, tetrahaedron
##            8:        black,         cube
##            9:          red,       sphere
##           10:       green3, tetrahaedron
##           11:         blue,         cube
##           12:         cyan,       sphere
##           13:      magenta, tetrahaedron
##           14:       yellow,         cube
##           15:         gray,       sphere
##           16:        black, tetrahaedron
##           17:          red,         cube
##           18:       green3,       sphere
##           19:         blue, tetrahaedron
##           20:         cyan,         cube
##           21:      magenta,       sphere
##           22:       yellow, tetrahaedron
##           23:         gray,         cube
##           24:        black,       sphere
##           25:          red, tetrahaedron
##           26:       green3,         cube
##           27:         blue,       sphere
##           28:         cyan, tetrahaedron
##           29:      magenta,         cube
##           30:       yellow,       sphere
##           31:         gray, tetrahaedron
##           32:        black,         cube
##           33:          red,       sphere
##           34:       green3, tetrahaedron
##           35:         blue,         cube
##           36:         cyan,       sphere
##           37:      magenta, tetrahaedron
##           38:       yellow,         cube
##           39:         gray,       sphere
##           40:        black, tetrahaedron
##           41:          red,         cube
##           42:       green3,       sphere
##           43:         blue, tetrahaedron
##           44:         cyan,         cube
##           45:      magenta,       sphere
##           46:       yellow, tetrahaedron
##           47:         gray,         cube
##           48:        black,       sphere
##           49:          red, tetrahaedron
##           50:       green3,         cube
##           51:         blue,       sphere
##           52:         cyan, tetrahaedron
##           53:      magenta,         cube
##           54:       yellow,       sphere
##           55:         gray, tetrahaedron
##           56:        black,         cube
##           57:          red,       sphere
##           58:       green3, tetrahaedron
##           59:         blue,         cube
##           60:         cyan,       sphere
##           61:      magenta, tetrahaedron
##           62:       yellow,         cube
##           63:         gray,       sphere
##           64:        black, tetrahaedron
##           65:          red,         cube
##           66:       green3,       sphere
##           67:         blue, tetrahaedron
##           68:         cyan,         cube
##           69:      magenta,       sphere
##           70:       yellow, tetrahaedron
##           71:         gray,         cube
##           72:        black,       sphere
##           73:          red, tetrahaedron
##           74:       green3,         cube
##           75:         blue,       sphere
##           76:         cyan, tetrahaedron
##           77:      magenta,         cube
##           78:       yellow,       sphere
##           79:         gray, tetrahaedron
##           80:        black,         cube
##           81:          red,       sphere
##           82:       green3, tetrahaedron
##           83:         blue,         cube
##           84:         cyan,       sphere
##           85:      magenta, tetrahaedron
##           86:       yellow,         cube
##           87:         gray,       sphere
##           88:        black, tetrahaedron
##           89:          red,         cube
##           90:       green3,       sphere
##           91:         blue, tetrahaedron
##           92:         cyan,         cube
##           93:      magenta,       sphere
##           94:       yellow, tetrahaedron
##           95:         gray,         cube
##           96:        black,       sphere
##           97:          red, tetrahaedron
##           98:       green3,         cube
##           99:         blue,       sphere
##          100:         cyan, tetrahaedron
##          101:      magenta,         cube
##          102:       yellow,       sphere
##          103:         gray, tetrahaedron
##          104:        black,         cube
##          105:          red,       sphere
##          106:       green3, tetrahaedron
##          107:         blue,         cube
##          108:         cyan,       sphere
##          109:      magenta, tetrahaedron
##          110:       yellow,         cube
##          111:         gray,       sphere
##          112:        black, tetrahaedron
##          113:          red,         cube
##          114:       green3,       sphere
##          115:         blue, tetrahaedron
##          116:         cyan,         cube
##          117:      magenta,       sphere
##          118:       yellow, tetrahaedron
##          119:         gray,         cube
##          120:        black,       sphere
##          121:          red, tetrahaedron
## 
## Creating new device

There’s a fairly long delay (a minute or two) before the interactive display appears.

Shown below are three snapshots from interactive views of the first three principal components:

Plankton PCA View 1

Plankton PCA View 1

Plankton PCA View 1


time.2 <- Sys.time()
cat(sprintf("%.1f", as.numeric(difftime(time.2, time.1, units="secs"))), " secs\n")
## 1928.6  secs

efg @EarlGlynn

2015-02-16 1059