This page shows several examples of reading an Excel file into a data.frame.
Recall Sample.csv file created using writeLines. View this csv on GitHub.
This CSV file was loaded into Excel .xlsx (Excel 2010) and .xls (Excel 2003) files. Download Sample.xlsx or Sample.xls.
Reading Excel files into R can be problematic. Since I have not found one package that works with all Excel files, I show four approaches below.
Find all the R code in this Gist, R-IO-Excel.R.
If using 64-bit Java, must use 64-bit R.
# Avoid Java out-of-memory error for large Excel files. See p. 16,
# http://cran.r-project.org/web/packages/XLConnect/vignettes/XLConnect.pdf
options(java.parameters = "-Xmx1024m")
library(XLConnect)
## XLConnect 0.2-7 by Mirai Solutions GmbH
## http://www.mirai-solutions.com ,
## http://miraisolutions.wordpress.com
d1 <- readWorksheetFromFile("Sample.xlsx", sheet = 1) # .xls or .xlsx
dim(d1)
## [1] 4 5
str(d1)
## 'data.frame': 4 obs. of 5 variables:
## $ Grade : chr "A" "B" "F" "B"
## $ Age : num 23 56 NA 35
## $ Code : chr "ABC" "DEF" NA "XYZ"
## $ Amount: num 48.98 2.72 0 3.14
## $ Start : POSIXct, format: "2014-01-01" "2014-01-02" ...
d1
## Grade Age Code Amount Start
## 1 A 23 ABC 48.982 2014-01-01
## 2 B 56 DEF 2.718 2014-01-02
## 3 F NA <NA> 0.000 <NA>
## 4 B 35 XYZ 3.142 2014-01-05
If using 64-bit Java, must use 64-bit R.
Read Excel file 2003 or earlier (.xls)
library(xlsx)
## Loading required package: rJava
## Loading required package: xlsxjars
##
## Attaching package: 'xlsx'
##
## The following objects are masked from 'package:XLConnect':
##
## createFreezePane, createSheet, createSplitPane, getCellStyle,
## getSheets, loadWorkbook, removeSheet, saveWorkbook,
## setCellStyle, setColumnWidth
d2 <- read.xlsx("Sample.xls", sheetIndex = 1)
dim(d2)
## [1] 4 5
str(d2)
## 'data.frame': 4 obs. of 5 variables:
## $ Grade : Factor w/ 3 levels "A","B","F": 1 2 3 2
## $ Age : num 23 56 NA 35
## $ Code : Factor w/ 3 levels "ABC","DEF","XYZ": 1 2 NA 3
## $ Amount: num 48.98 2.72 0 3.14
## $ Start : Date, format: "2014-01-01" "2014-01-02" ...
Read Excel file 2007 or later (.xlsx)
d2a <- read.xlsx("Sample.xlsx", sheetIndex = 1, stringsAsFactors = FALSE)
str(d2a)
## 'data.frame': 4 obs. of 5 variables:
## $ Grade : chr "A" "B" "F" "B"
## $ Age : num 23 56 NA 35
## $ Code : chr "ABC" "DEF" NA "XYZ"
## $ Amount: num 48.98 2.72 0 3.14
## $ Start : Date, format: "2014-01-01" "2014-01-02" ...
64-bit ActivePerl Community Edition works on Windows with either 32-bit or 64-bit R. http://www.activestate.com/activeperl.
library(gdata)
## gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
##
## gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
##
## Attaching package: 'gdata'
##
## The following object is masked from 'package:stats':
##
## nobs
##
## The following object is masked from 'package:utils':
##
## object.size
d3 <- read.xls("Sample.xls", sheet = 1) # .xls or .xlsx
dim(d3)
## [1] 4 5
str(d3)
## 'data.frame': 4 obs. of 5 variables:
## $ Grade : Factor w/ 3 levels "A","B","F": 1 2 3 2
## $ Age : int 23 56 NA 35
## $ Code : Factor w/ 4 levels "","ABC","DEF",..: 2 3 1 4
## $ Amount: num 48.98 2.72 0 3.14
## $ Start : Factor w/ 4 levels "","2014/01/01",..: 2 3 1 4
d3a <- read.xls("Sample.xlsx", sheet = 1, as.is = TRUE)
dim(d3a)
## [1] 4 5
str(d3a)
## 'data.frame': 4 obs. of 5 variables:
## $ Grade : chr "A" "B" "F" "B"
## $ Age : int 23 56 NA 35
## $ Code : chr "ABC" "DEF" "" "XYZ"
## $ Amount: num 48.98 2.72 0 3.14
## $ Start : int 41640 41641 NA 41644
Note: “odbcConnectExcel is only usable with 32-bit Windows” with 32-bit R.
channel <- odbcConnectExcel("Sample.xls") # Excel 2003 and earlier
#channel <- odbcConnectExcel2007("Sample.xlsx") # Excel 2007 and later
sqlTables(channel)
d4 <- sqlFetch(channel, "Sample") # Worksheet name
#'data.frame': 4 obs. of 5 variables:
# $ Grade : Factor w/ 3 levels "A","B","F": 1 2 3 2
# $ Age : num 23 56 NA 35
# $ Code : Factor w/ 3 levels "ABC","DEF","XYZ": 1 2 NA 3
# $ Amount: num 48.98 2.72 0 3.14
# $ Start : POSIXct, format: "2014-01-01" "2014-01-02" NA "2014-01-05"
d4a <- sqlQuery(channel, paste("SELECT Code,Amount,Start,Age,Grade",
"FROM [Sample$]",
"WHERE Grade = 'B'"),
as.is=TRUE)
dim(d4a)
#[1] 2 5
str(d4a)
#'data.frame': 2 obs. of 5 variables:
# $ Code : chr "DEF" "XYZ"
# $ Amount: num 2.72 3.14
# $ Start : chr "2014-01-02 00:00:00" "2014-01-05 00:00:00"
# $ Age : num 56 35
# $ Grade : chr "B" "B"
close(channel)
My advice: RODBC should only be used if you have an Excel file without holes, and with very regular numeric data. Without very regular data, RODBC can guess the wrong data types and cause other problems.
See Madelaine's notes for additional RODBC examples – and some RMySQL examples.
efg, 2014-01-08