Thursday, December 6, 2012

R: check which variables have missing data, and how many missing (NA) values each has, in a data frame

It's easy to check with the apply function:
## clear the console (form feed) and remove all objects from the workspace
cat("\014")
rm(list=ls())
 
seo_rev=read.csv("D:\\hsong\\SkyDrive\\Public\\seo_kw_rev_train.csv", header=TRUE, skip=2, na.strings='.')
 
names(seo_rev)<-tolower(names(seo_rev))
lapply(seo_rev, class)
str(seo_rev)
head(seo_rev)
 
all_var<-colnames(seo_rev)
## check which variables have missing values, and how many values are missing in each
getna<-function(x) sum(is.na(x))
apply(seo_rev, 2, getna)
The output lists each variable's name along with the number of observations that are missing for that variable.
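
The same per-variable count can also be had without apply. Here is a minimal sketch using colSums on a small made-up data frame (the df below and its columns a, b, c are only for illustration, not the CSV from this post):

```r
# Toy data frame, invented for illustration (not the seo_rev data)
df <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = c(TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# is.na() on a data frame returns a logical matrix,
# so colSums() gives the number of NAs in each column
na_count <- colSums(is.na(df))
print(na_count)

# keep only the variables that actually have missing values
na_only <- na_count[na_count > 0]
print(na_only)

# proportion of missing values in each column
na_share <- colMeans(is.na(df))
print(na_share)
```

Unlike apply, this does not first coerce the whole data frame to a single-type matrix, which should matter little here but can be noticeably faster on wide data.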

