Thursday, April 11, 2013

Tips to increase the data reading speed in R

Here are some tips to increase the data reading speed in R:

# Purpose: compare the processing time of read.table when reading data with different options
# test to read /data02/temp/temp_hsong/test/hsb12.sas7bdat
  
#1: Baseline, without any tuning options
system.time(read.table("/data02/temp/temp_hsong/test/sasxport.txt", header=TRUE))
 
#2: By default, R converts character variables into factors while reading
# Suppress this conversion with stringsAsFactors=FALSE
system.time(read.table("/data02/temp/temp_hsong/test/sasxport.txt", header=TRUE, stringsAsFactors=FALSE))
 
#3: If the data contain no comment characters, tell R so with comment.char=''
system.time(read.table("/data02/temp/temp_hsong/test/sasxport.txt", header=TRUE, comment.char=''))
 
#4: Give R a row count slightly greater than the actual number of records via nrows
system.time(read.table("/data02/temp/temp_hsong/test/sasxport.txt", header=TRUE, nrows=8000))
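
If the number of records is not known in advance, one way to get a value for nrows is to count the lines in the file first. The sketch below is only an illustration: it builds a small temporary file (the file and the names tmp and n_records are made up for this example, not part of the test above) and assumes one record per line plus a single header line. Note that readLines loads the whole file, so for very large files a cheaper line count may be preferable.

```r
# Hypothetical sample file, created here so the sketch runs on its own
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(x = 1:100, y = rnorm(100)), tmp,
            row.names = FALSE, quote = FALSE)

# Count the lines and subtract one for the header
n_records <- length(readLines(tmp)) - 1

# Pass a mild over-estimate to nrows; R stops at the end of the file anyway
dat <- read.table(tmp, header = TRUE, nrows = n_records + 10)
```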
 
#5: When reading data, it is better to tell R the class of each variable with colClasses
system.time(read.table("/data02/temp/temp_hsong/test/sasxport.txt", header=TRUE, sep=',', colClasses=rep("numeric", 11)))
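
When the column classes are not known up front, a common approach is to read a few rows first, let R guess the classes, and then reuse them for the full read. The following is a minimal sketch under stated assumptions: the temporary comma-separated file, its 11 numeric columns, and the names tmp, sample_rows and classes are invented for illustration. It also combines the options from the tips above in a single call.

```r
# Hypothetical test file: 2000 rows, 11 numeric columns, comma separated
set.seed(1)
tmp <- tempfile(fileext = ".csv")
write.table(as.data.frame(matrix(rnorm(11 * 2000), ncol = 11)), tmp,
            sep = ",", row.names = FALSE, quote = FALSE)

# Read a small sample and let R guess the class of each column
sample_rows <- read.table(tmp, header = TRUE, sep = ",", nrows = 50)
classes <- sapply(sample_rows, class)

# Full read with all options combined: known classes, a row count
# slightly above the real one, no comment character, no factor conversion
timing <- system.time(
  dat <- read.table(tmp, header = TRUE, sep = ",",
                    colClasses = classes, nrows = 2100,
                    comment.char = "", stringsAsFactors = FALSE)
)
print(timing)
```

One caveat with this approach: classes guessed from the first rows can be wrong for later rows (for example, a column that looks like integers at first but contains decimals further down), so the sample should be representative.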

The processing time is:
