Earl F Glynn
14 June 2014
dim(HairEyeColor)
## [1] 4 4 2
HairEyeColor
## , , Sex = Male
##
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    32   11    10     3
##   Brown    53   50    25    15
##   Red      10   10     7     7
##   Blond     3   30     5     8
##
## , , Sex = Female
##
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    36    9     5     2
##   Brown    66   34    29    14
##   Red      16    7     7     7
##   Blond     4   64     5     8
attributes(HairEyeColor)
## $dim
## [1] 4 4 2
##
## $dimnames
## $dimnames$Hair
## [1] "Black" "Brown" "Red" "Blond"
##
## $dimnames$Eye
## [1] "Brown" "Blue" "Hazel" "Green"
##
## $dimnames$Sex
## [1] "Male" "Female"
##
##
## $class
## [1] "table"
str(HairEyeColor)
## table [1:4, 1:4, 1:2] 32 53 10 3 11 50 10 30 10 25 ...
## - attr(*, "dimnames")=List of 3
## ..$ Hair: chr [1:4] "Black" "Brown" "Red" "Blond"
## ..$ Eye : chr [1:4] "Brown" "Blue" "Hazel" "Green"
## ..$ Sex : chr [1:2] "Male" "Female"
HairEyeColor[1,1,1]
## [1] 32
HairEyeColor["Black", "Brown", "Male"]
## [1] 32
HairEyeColor[4,4,2]
## [1] 8
HairEyeColor["Blond", "Green", "Female"]
## [1] 8
males <- HairEyeColor[,,1]
dim(males)
## [1] 4 4
males
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    32   11    10     3
##   Brown    53   50    25    15
##   Red      10   10     7     7
##   Blond     3   30     5     8
females <- HairEyeColor[,,"Female"]
dim(females)
## [1] 4 4
females
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    36    9     5     2
##   Brown    66   34    29    14
##   Red      16    7     7     7
##   Blond     4   64     5     8
males
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    32   11    10     3
##   Brown    53   50    25    15
##   Red      10   10     7     7
##   Blond     3   30     5     8
apply(males, 1, sum)
## Black Brown   Red Blond
##    56   143    34    46
apply(males, 2, sum)
## Brown  Blue Hazel Green
##    98   101    47    33
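The third argument to apply is a function. Here, sum receives each row (or column) as a vector and returns a single value.
Commentary: the second argument selects the dimension to keep (1 = rows, 2 = columns). R’s use of the bare constants “1” and “2” here is a poor software engineering practice.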
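One way to avoid the bare constants, offered as a sketch rather than as the author's method: when an array has named dimnames, as HairEyeColor and its slices do, apply also accepts a dimension name as its second argument. The variable names below are illustrative only.

```r
# Rebuild the male slice so this example stands alone.
males <- HairEyeColor[, , "Male"]

# Name the margin instead of passing 1 or 2.
by_hair <- apply(males, "Hair", sum)  # same totals as apply(males, 1, sum)
by_eye  <- apply(males, "Eye",  sum)  # same totals as apply(males, 2, sum)

# rowSums() and colSums() are base R shortcuts for the same sums.
by_hair_alt <- rowSums(males)
by_eye_alt  <- colSums(males)

by_hair
by_eye
```

Naming the margin makes the call self-documenting, and it keeps working if the dimensions are ever reordered.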
females
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    36    9     5     2
##   Brown    66   34    29    14
##   Red      16    7     7     7
##   Blond     4   64     5     8
females <- cbind(females, TOTAL=apply(females,1,sum))
females <- rbind(females, TOTAL=apply(females,2,sum))
females
##       Brown Blue Hazel Green TOTAL
## Black    36    9     5     2    52
## Brown    66   34    29    14   143
## Red      16    7     7     7    37
## Blond     4   64     5     8    81
## TOTAL   122  114    46    31   313
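The cbind/rbind steps above can also be done in one call with base R's addmargins. This is a minimal sketch of an alternative, not the approach used in the original; the names females_tab and females_totals are made up for illustration, and addmargins labels the new row and column "Sum" rather than "TOTAL".

```r
# Start again from the untouched female slice of the table.
females_tab <- as.table(HairEyeColor[, , "Female"])

# addmargins() appends a "Sum" row and a "Sum" column.
females_totals <- addmargins(females_tab)

females_totals
females_totals["Blond", "Sum"]  # row total for blond hair
females_totals["Sum", "Sum"]    # grand total for females
```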
apply(HairEyeColor, 1, sum)
## Black Brown   Red Blond
##   108   286    71   127
apply(HairEyeColor, 2, sum)
## Brown  Blue Hazel Green
##   220   215    93    64
apply(HairEyeColor, 3, sum)
##   Male Female
##    279    313
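The margin argument can also name more than one dimension, which collapses the array over the dimensions left out. The sketch below is an illustration with made-up variable names: it sums over eye color to get a Hair-by-Sex table, and uses margin.table as a base R shortcut for a single margin.

```r
# Keep Hair and Sex, sum over Eye: a 4 x 2 matrix of counts.
hair_by_sex <- apply(HairEyeColor, c("Hair", "Sex"), sum)
hair_by_sex

# margin.table() computes the same kind of marginal sum for a table.
eye_totals <- margin.table(HairEyeColor, 2)
eye_totals
```

The columns of hair_by_sex should match the per-sex row sums computed earlier for the males and females slices.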
names <- c("Joe B Cool", "Jane R Smith", "First Middle Last")
splits <- strsplit(names, " ")
splits
## [[1]]
## [1] "Joe" "B" "Cool"
##
## [[2]]
## [1] "Jane" "R" "Smith"
##
## [[3]]
## [1] "First" "Middle" "Last"
splits[[1]]
## [1] "Joe" "B" "Cool"
splits is a list of character vectors.
Let’s use lapply to visit each element of the list. We’ll apply the “[” subscript operator (which is itself a function) with an index argument to each element, then use the unlist function to convert the result from a list to a vector.
first <- unlist(lapply(splits, "[", 1))
first
## [1] "Joe" "Jane" "First"
middle <- unlist(lapply(splits, "[", 2))
middle
## [1] "B" "R" "Middle"
last <- unlist(lapply(splits, "[", 3))
last
## [1] "Cool" "Smith" "Last"
Parsed name:
paste0("<",first[1], "><", middle[1], "><", last[1], ">")
## [1] "<Joe><B><Cool>"
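As a hedged variation on the lapply/unlist pattern above, vapply does both steps at once and also checks that every element yields exactly one string. The names full_names, parts, and parsed are illustrative and not from the original.

```r
full_names <- c("Joe B Cool", "Jane R Smith", "First Middle Last")
parts <- strsplit(full_names, " ", fixed = TRUE)

# vapply() returns a character vector directly; character(1) declares
# that each call to "[" must return a single string.
first  <- vapply(parts, "[", character(1), 1)
middle <- vapply(parts, "[", character(1), 2)
last   <- vapply(parts, "[", character(1), 3)

# paste0() is vectorized, so every name is formatted in one call.
parsed <- paste0("<", first, "><", middle, "><", last, ">")
parsed
```

Because paste0 works element by element, there is no need to index first[1], middle[1], and last[1] one name at a time.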
split can be used to split a data.frame into a list of data.frames, one per group. lapply/sapply can then be used to process each piece. But ddply in the “plyr” package may be a simpler approach.
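A minimal sketch of the split-then-sapply pattern, using only base R and the built-in state.x77 and state.region data sets. The data frame states and its column names are assumptions made for this illustration.

```r
# Build a small data.frame: one row per state.
states <- data.frame(
  region = state.region,
  area   = state.x77[, "Area"]
)

# split() returns a list with one data.frame per region.
by_region <- split(states, states$region)
names(by_region)

# sapply() processes each piece and simplifies the result to a vector.
area_by_region <- sapply(by_region, function(d) sum(d$area))
area_by_region
```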
1977 US Census data
options(width=100)
head(state.x77)
##            Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
## Alabama          3615   3624        2.1    69.05   15.1    41.3    20  50708
## Alaska            365   6315        1.5    69.31   11.3    66.7   152 566432
## Arizona          2212   4530        1.8    70.55    7.8    58.1    15 113417
## Arkansas         2110   3378        1.9    70.66   10.1    39.9    65  51945
## California      21198   5114        1.1    71.71   10.3    62.6    20 156361
## Colorado         2541   4884        0.7    72.06    6.8    63.9   166 103766
str(state.x77)
## num [1:50, 1:8] 3615 365 2212 2110 21198 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
## ..$ : chr [1:8] "Population" "Income" "Illiteracy" "Life Exp" ...
state.x77[,"Area"]
##        Alabama         Alaska        Arizona       Arkansas     California       Colorado
##          50708         566432         113417          51945         156361         103766
##    Connecticut       Delaware        Florida        Georgia         Hawaii          Idaho
##           4862           1982          54090          58073           6425          82677
##       Illinois        Indiana           Iowa         Kansas       Kentucky      Louisiana
##          55748          36097          55941          81787          39650          44930
##          Maine       Maryland  Massachusetts       Michigan      Minnesota    Mississippi
##          30920           9891           7826          56817          79289          47296
##       Missouri        Montana       Nebraska         Nevada  New Hampshire     New Jersey
##          68995         145587          76483         109889           9027           7521
##     New Mexico       New York North Carolina   North Dakota           Ohio       Oklahoma
##         121412          47831          48798          69273          40975          68782
##         Oregon   Pennsylvania   Rhode Island South Carolina   South Dakota      Tennessee
##          96184          44966           1049          30225          75955          41328
##          Texas           Utah        Vermont       Virginia     Washington  West Virginia
##         262134          82096           9267          39780          66570          24070
##      Wisconsin        Wyoming
##          54464          97203
state.region
##  [1] South         West          West          South         West          West
##  [7] Northeast     South         South         South         West          West
## [13] North Central North Central North Central North Central South         South
## [19] Northeast     South         Northeast     North Central North Central South
## [25] North Central West          North Central West          Northeast     Northeast
## [31] West          Northeast     South         North Central North Central South
## [37] West          Northeast     Northeast     South         North Central South
## [43] South         West          Northeast     South         West          South
## [49] North Central West
## Levels: Northeast South North Central West
tapply(state.x77[,"Area"], state.region, sum)
##     Northeast         South North Central          West
##        163269        873682        751824       1748019
1000 * tapply(state.x77[,"Population"], state.region, sum)
##     Northeast         South North Central          West
##      49456000      67330000      57636000      37899000
Population density [people/sq mile] by region (Population is stored in thousands, hence the factor of 1000):
1000 * tapply(state.x77[,"Population"], state.region, sum) / tapply(state.x77[,"Area"], state.region, sum)
##     Northeast         South North Central          West
##        302.91         77.06         76.66         21.68
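The density calculation above calls tapply twice. As a sketch of one alternative, base R's rowsum can total both columns by region in a single pass; the names totals and density are illustrative.

```r
# Sum Population (thousands of people) and Area (square miles) by region.
totals <- rowsum(state.x77[, c("Population", "Area")], state.region)
totals

# People per square mile: convert thousands to people, then divide.
density <- 1000 * totals[, "Population"] / totals[, "Area"]
round(density, 2)
```

Keeping the regional totals in one matrix makes it easy to derive other per-region ratios from the same sums.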
Coursera R Programming class: https://www.coursera.org/course/rprog
Week 3 videos