apply, lapply, tapply

R Programming

Beginner’s Session

Earl F Glynn
14 June 2014

Source: Coursera R Programming class: https://www.coursera.org/course/rprog

apply: dataset HairEyeColor

?HairEyeColor

help(package=datasets)

dim(HairEyeColor)

## [1] 4 4 2

HairEyeColor

## , , Sex = Male
## 
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    32   11    10     3
##   Brown    53   50    25    15
##   Red      10   10     7     7
##   Blond     3   30     5     8
## 
## , , Sex = Female
## 
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    36    9     5     2
##   Brown    66   34    29    14
##   Red      16    7     7     7
##   Blond     4   64     5     8

attributes(HairEyeColor)

## $dim
## [1] 4 4 2
## 
## $dimnames
## $dimnames$Hair
## [1] "Black" "Brown" "Red"   "Blond"
## 
## $dimnames$Eye
## [1] "Brown" "Blue"  "Hazel" "Green"
## 
## $dimnames$Sex
## [1] "Male"   "Female"
## 
## 
## $class
## [1] "table"

str(HairEyeColor)

##  table [1:4, 1:4, 1:2] 32 53 10 3 11 50 10 30 10 25 ...
##  - attr(*, "dimnames")=List of 3
##   ..$ Hair: chr [1:4] "Black" "Brown" "Red" "Blond"
##   ..$ Eye : chr [1:4] "Brown" "Blue" "Hazel" "Green"
##   ..$ Sex : chr [1:2] "Male" "Female"

Review of indexing

HairEyeColor[1,1,1]

## [1] 32

HairEyeColor["Black", "Brown", "Male"]

## [1] 32

HairEyeColor[4,4,2]

## [1] 8

HairEyeColor["Blond", "Green", "Female"]

## [1] 8

2D subsets

males <- HairEyeColor[,,1]
dim(males)

## [1] 4 4

males

##        Eye
## Hair    Brown Blue Hazel Green
##   Black    32   11    10     3
##   Brown    53   50    25    15
##   Red      10   10     7     7
##   Blond     3   30     5     8

females <- HairEyeColor[,,"Female"]
dim(females)

## [1] 4 4

females

##        Eye
## Hair    Brown Blue Hazel Green
##   Black    36    9     5     2
##   Brown    66   34    29    14
##   Red      16    7     7     7
##   Blond     4   64     5     8
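
Selecting a single level of Sex drops that dimension, which is why the subsets above are 4 x 4. As a small sketch (the variable name males_3d is only illustrative), drop = FALSE keeps the third dimension when later code expects a 3D array:

```r
# Keep the Sex dimension instead of collapsing to a 4 x 4 matrix
males_3d <- HairEyeColor[, , "Male", drop = FALSE]

dim(males_3d)            # 4 4 1
dimnames(males_3d)$Sex   # "Male"
```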

Males: row and column sums with ‘apply’

males

##        Eye
## Hair    Brown Blue Hazel Green
##   Black    32   11    10     3
##   Brown    53   50    25    15
##   Red      10   10     7     7
##   Blond     3   30     5     8

apply(males, 1, sum)

## Black Brown   Red Blond 
##    56   143    34    46

apply(males, 2, sum)

## Brown  Blue Hazel Green 
##    98   101    47    33

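The second argument selects the margin to keep (1 = rows, 2 = columns). The third argument is a function that accepts a vector; here, sum returns a single value for each row or column.

Commentary: R’s use of the bare constants “1” and “2” here is a poor software engineering practice, since nothing in the call says which one means rows.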
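
One way to make such calls easier to read is sketched below. The constants ROWS and COLS are hypothetical names defined here, not part of base R. The sketch also checks the results against rowSums and colSums, and shows that the function passed to apply may return more than one value.

```r
# Hypothetical named constants for the MARGIN argument (not part of base R)
ROWS <- 1
COLS <- 2

males <- HairEyeColor[, , "Male"]

# Same results as apply(males, 1, sum) and apply(males, 2, sum)
row_totals <- apply(males, ROWS, sum)
col_totals <- apply(males, COLS, sum)

# rowSums() and colSums() are base R shortcuts for these two cases
stopifnot(all(row_totals == rowSums(males)),
          all(col_totals == colSums(males)))

# If the function returns a vector, apply() returns a matrix:
# here 2 rows (min, max) and one column per hair color
row_ranges <- apply(males, ROWS, range)
```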

Females: row and column sums with ‘apply’

females

##        Eye
## Hair    Brown Blue Hazel Green
##   Black    36    9     5     2
##   Brown    66   34    29    14
##   Red      16    7     7     7
##   Blond     4   64     5     8

females <- cbind(females, TOTAL=apply(females,1,sum))
females <- rbind(females, TOTAL=apply(females,2,sum))
females

##       Brown Blue Hazel Green TOTAL
## Black    36    9     5     2    52
## Brown    66   34    29    14   143
## Red      16    7     7     7    37
## Blond     4   64     5     8    81
## TOTAL   122  114    46    31   313
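
As an alternative sketch, the addmargins function in the stats package (loaded by default) appends the same row and column totals in one call. It labels them "Sum" rather than "TOTAL". The name with_totals is only illustrative.

```r
# Start again from the original 4 x 4 female counts
females <- HairEyeColor[, , "Female"]

# Append a "Sum" row and a "Sum" column in one step
with_totals <- addmargins(females)

with_totals["Sum", "Sum"]   # grand total for females
```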

3D apply

apply(HairEyeColor, 1, sum)

## Black Brown   Red Blond 
##   108   286    71   127

apply(HairEyeColor, 2, sum)

## Brown  Blue Hazel Green 
##   220   215    93    64

apply(HairEyeColor, 3, sum)

##   Male Female 
##    279    313
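
The margin can also be a vector of dimensions. The sketch below, with the illustrative name hair_by_eye, keeps Hair and Eye and sums over Sex, then checks the result against base R's margin.table.

```r
# Keep dimensions 1 (Hair) and 2 (Eye); sum over dimension 3 (Sex)
hair_by_eye <- apply(HairEyeColor, c(1, 2), sum)

dim(hair_by_eye)                # 4 4
hair_by_eye["Black", "Brown"]   # 32 males + 36 females = 68

# margin.table() computes the same marginal sums
stopifnot(all(hair_by_eye == margin.table(HairEyeColor, c(1, 2))))
```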

lapply: splitting strings

strsplit and lapply

names <- c("Joe B Cool", "Jane R Smith", "First Middle Last")
splits <- strsplit(names, " ")
splits

## [[1]]
## [1] "Joe"  "B"    "Cool"
## 
## [[2]]
## [1] "Jane"  "R"     "Smith"
## 
## [[3]]
## [1] "First"  "Middle" "Last"

splits[[1]]

## [1] "Joe"  "B"    "Cool"

splits is a list of character vectors, one per name.

Let’s use lapply to visit each element of the list. We’ll apply the “[” subscript operator (which is itself a function) with an index argument to each element, then use the unlist function to convert the result from a list to a vector.

first <- unlist(lapply(splits, "[", 1))
first

## [1] "Joe"   "Jane"  "First"

middle <- unlist(lapply(splits, "[", 2))
middle

## [1] "B"      "R"      "Middle"

last <- unlist(lapply(splits, "[", 3))
last

## [1] "Cool"  "Smith" "Last"
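
A possible variation, shown only as a sketch: sapply and vapply simplify the result to a vector directly, so the unlist step is not needed. The variable full_names is used here instead of names to avoid masking the base function of that name, and last_word is an illustrative name.

```r
full_names <- c("Joe B Cool", "Jane R Smith", "First Middle Last")
splits <- strsplit(full_names, " ")

# sapply() simplifies the list result to a character vector
first <- sapply(splits, "[", 1)

# vapply() does the same but also checks that each result is one string;
# taking the last piece works even if a name has no middle part
last_word <- vapply(splits, function(parts) parts[length(parts)], character(1))
```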

Parsed name:

paste0("<",first[1], "><", middle[1], "><", last[1], ">")

## [1] "<Joe><B><Cool>"

split can be used to split a data.frame into a list of data frames, one per group. lapply/sapply can then be used to process each piece. However, ddply in the “plyr” package may be a simpler approach.
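
A minimal sketch of the split and sapply approach, using the built-in state.x77 and state.region datasets that the tapply section below describes. The names states, by_region, and density are illustrative.

```r
# Combine the numeric matrix and the region factor into one data frame;
# data.frame() converts column names such as "Life Exp" to "Life.Exp"
states <- data.frame(state.x77, region = state.region)

# One data frame per region
by_region <- split(states, states$region)

# Process each piece: people per square mile (Population is in thousands)
density <- sapply(by_region,
                  function(d) 1000 * sum(d$Population) / sum(d$Area))
density
```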

tapply: apply function to subsets

?state

1977 US Census data (Population is in thousands; Area is in square miles)

options(width=100)
head(state.x77)

##            Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
## Alabama          3615   3624        2.1    69.05   15.1    41.3    20  50708
## Alaska            365   6315        1.5    69.31   11.3    66.7   152 566432
## Arizona          2212   4530        1.8    70.55    7.8    58.1    15 113417
## Arkansas         2110   3378        1.9    70.66   10.1    39.9    65  51945
## California      21198   5114        1.1    71.71   10.3    62.6    20 156361
## Colorado         2541   4884        0.7    72.06    6.8    63.9   166 103766

str(state.x77)

##  num [1:50, 1:8] 3615 365 2212 2110 21198 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
##   ..$ : chr [1:8] "Population" "Income" "Illiteracy" "Life Exp" ...

state.x77[,"Area"]

##        Alabama         Alaska        Arizona       Arkansas     California       Colorado 
##          50708         566432         113417          51945         156361         103766 
##    Connecticut       Delaware        Florida        Georgia         Hawaii          Idaho 
##           4862           1982          54090          58073           6425          82677 
##       Illinois        Indiana           Iowa         Kansas       Kentucky      Louisiana 
##          55748          36097          55941          81787          39650          44930 
##          Maine       Maryland  Massachusetts       Michigan      Minnesota    Mississippi 
##          30920           9891           7826          56817          79289          47296 
##       Missouri        Montana       Nebraska         Nevada  New Hampshire     New Jersey 
##          68995         145587          76483         109889           9027           7521 
##     New Mexico       New York North Carolina   North Dakota           Ohio       Oklahoma 
##         121412          47831          48798          69273          40975          68782 
##         Oregon   Pennsylvania   Rhode Island South Carolina   South Dakota      Tennessee 
##          96184          44966           1049          30225          75955          41328 
##          Texas           Utah        Vermont       Virginia     Washington  West Virginia 
##         262134          82096           9267          39780          66570          24070 
##      Wisconsin        Wyoming 
##          54464          97203

state.region

##  [1] South         West          West          South         West          West         
##  [7] Northeast     South         South         South         West          West         
## [13] North Central North Central North Central North Central South         South        
## [19] Northeast     South         Northeast     North Central North Central South        
## [25] North Central West          North Central West          Northeast     Northeast    
## [31] West          Northeast     South         North Central North Central South        
## [37] West          Northeast     Northeast     South         North Central South        
## [43] South         West          Northeast     South         West          South        
## [49] North Central West         
## Levels: Northeast South North Central West

tapply(state.x77[,"Area"], state.region, sum)

##     Northeast         South North Central          West 
##        163269        873682        751824       1748019

1000 * tapply(state.x77[,"Population"], state.region, sum)

##     Northeast         South North Central          West 
##      49456000      67330000      57636000      37899000

Population density [people/sq mile] by region:

1000 * tapply(state.x77[,"Population"], state.region, sum) / tapply(state.x77[,"Area"], state.region, sum)

##     Northeast         South North Central          West 
##        302.91         77.06         76.66         21.68
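
To see what tapply is doing, the sketch below reproduces the area totals by splitting the vector by region and summing each piece with sapply. The variable names are illustrative.

```r
area <- state.x77[, "Area"]

# tapply: group the values by region, then apply sum to each group
by_tapply <- tapply(area, state.region, sum)

# The same computation in two explicit steps
pieces   <- split(area, state.region)   # list with one numeric vector per region
by_split <- sapply(pieces, sum)

stopifnot(all(by_tapply == by_split),
          identical(names(by_tapply), names(by_split)))
```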

Additional ‘apply’ info

Coursera R Programming class: https://www.coursera.org/course/rprog

R Programming

Week 3 videos

lapply
apply
tapply
split
mapply