Insert random NAs in a vector in R

I was recently writing a function that needed to deal with NAs in some kind of semi-intelligent way. I wanted to test it with some fake data, which meant I needed a vector with some random NAs sprinkled in. After a few disappointing Google searches and a Stack Overflow post or two that left something to be desired, I sat down, thought for a few minutes, and came up with this.

#create a vector of random values

 foo <- rnorm(n=100, mean=20, sd=5)

#randomly choose 15 indices to replace
#this is the step in which I thought I was clever
#because I use which() and %in% in the same line
 ind <- which(foo %in% sample(foo, 15))

#now replace those indices in foo with NA
 foo[ind]<-NA

#here is our vector with 15 random NAs 
 foo

Not especially game changing but more elegant than any of the solutions I found on the interwebs, so there it is FTW.

10 thoughts on “Insert random NAs in a vector in R”

You can just do:
foo <- rnorm(n=100, mean=20, sd=5)
foo[sample(1:length(foo),15)] <- NA

You don’t need which() or even the %in%. Just use ind <- sample(seq_along(foo), 15). Or replace seq_along(foo) with length(foo) for that matter, since sample() treats a single number n as 1:n.

If you want to keep the %in% part, leave off the which(); you can use the resulting logical vector to index without the which() to extract the indices of the TRUE values.
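To make the two suggestions above concrete, here is a minimal sketch. The seed and the names picked, by_logical and by_position are only there for illustration and are not from the post. It checks that indexing with the logical vector gives the same result as going through which(), then samples positions directly.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable
foo <- rnorm(n = 100, mean = 20, sd = 5)

# A logical vector can index directly, so which() is optional here
picked <- foo %in% sample(foo, 15)
by_logical <- foo[picked]
by_position <- foo[which(picked)]
identical(by_logical, by_position)

# Sampling positions instead of values avoids %in% altogether
ind <- sample(seq_along(foo), 15)
foo[ind] <- NA
sum(is.na(foo))
```

Because ind holds 15 distinct positions, exactly 15 elements become NA regardless of the values stored in foo.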

foo[sample(1:length(foo),10)] = NA

Foo says:

31 July, 2014 at 4:34 am

foo[sample(seq(foo), 15)] <- NA


Try this:
foo <- rnorm(n=100, mean=20, sd=5)
foo[sample.int(length(foo), 15)] <- NA_real_

Good call everyone on avoiding the index vector. A post on R-bloggers will result in clean code, that’s for sure.
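For reuse in tests, the one-liners from the comments could be wrapped in a small function. This is only a sketch: insert_random_na and its arguments x and n are hypothetical names, not something from the post or from any package.

```r
# Hypothetical helper: set n randomly chosen positions of x to NA
insert_random_na <- function(x, n) {
  # Refuse requests that cannot be satisfied without replacement
  stopifnot(n >= 0, n <= length(x))
  x[sample.int(length(x), n)] <- NA
  x
}

foo <- rnorm(n = 100, mean = 20, sd = 5)
foo_na <- insert_random_na(foo, 15)
sum(is.na(foo_na))
```

The function returns a modified copy, so the original foo stays free of NAs and can be compared against foo_na when testing NA-handling code.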

Here’s what I’ve been using (for data frames)…
################################################################
#
#
######### The create random missing values function.
#
# First, grab some data.

no.miss <- read.table("http://www.unt.edu/rss/class/Jon/R_SC/Module4/missForest_noMiss.txt",
header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
summary(no.miss)
nrow(no.miss)
ncol(no.miss)

# Next, take a sample of the data.

samp <- no.miss[sample(seq_len(nrow(no.miss)), 500, replace = FALSE),]
summary(samp)
nrow(samp)
ncol(samp)

# Now, create the function.

a.prob <- .05 # <-- This is the proportion of missing data.
b.prob <- 1 - a.prob

create.NA <- function(data, impute.cols = NULL){
sample(c(NA,1), 1, prob = c(a.prob,b.prob), replace = TRUE)
}

# Apply the function (across rows and columns); while leaving out
# the row ID column (column #1).

samp.na <- is.na(apply(samp[,-1], c(1,2), create.NA))
samp.NA <- samp[,-1]

for (j in 1:ncol(samp.na)){
out <- which(samp.na[,j])
samp.NA[out,j] <- NA
}; rm(a.prob, b.prob, j, out, create.NA, samp.na)

# Reassemble the data (with the row ID column).

samp.NA <- data.frame(samp[,1], samp.NA)
names(samp.NA)[1] <- "id"
summary(samp.NA)
nrow(samp.NA)
ncol(samp.NA)

# End script.
################################################################

Cheers,
Jon
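A shorter variant of the same idea, for comparison: build one logical matrix the same shape as the data and assign NA through it. This is a sketch under assumptions, not Jon's method. The toy data frame and its column names x, y and z are made up to stand in for the downloaded file, and the name mask is hypothetical.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable

# Toy data standing in for the downloaded file; column names are made up
samp <- data.frame(id = 1:500, x = rnorm(500), y = runif(500), z = rnorm(500))

a.prob <- 0.05  # proportion of cells to set to NA, on average

# Work on everything except the row ID column (column #1)
vals <- samp[, -1]

# Logical matrix with the same shape as vals: TRUE marks a cell to blank out
mask <- matrix(runif(nrow(vals) * ncol(vals)) < a.prob,
               nrow = nrow(vals), ncol = ncol(vals))
vals[mask] <- NA

# Reassemble the data (with the row ID column)
samp.NA <- data.frame(id = samp$id, vals)
mean(is.na(samp.NA[, -1]))
```

As in the script above, each cell is blanked independently with probability a.prob, so the realized proportion of NAs varies from run to run around 5% rather than being exact.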

J says:

31 July, 2014 at 1:27 pm

Baaahhh!
The data URL was cut-off above….
Here it is…
http://www.unt.edu/rss/class/Jon/R_SC/Module4/missForest_noMiss.txt


The original code will usually work if you don’t have the same value multiple times in foo, but if you do, then you may not have 15 eliminated. For example:

foo <- c(letters, letters)

will always eliminate the letters in pairs with the original code.
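A quick way to see this is the sketch below. It runs the original approach on the doubled alphabet and counts the NAs, then does the same with positions sampled directly. The names n_na_values and n_na_positions are only for illustration.

```r
# Original approach: sample values, then match them back with %in%
foo <- c(letters, letters)
ind <- which(foo %in% sample(foo, 15))
foo[ind] <- NA
n_na_values <- sum(is.na(foo))
n_na_values  # always even here, so never 15

# Sampling positions instead removes exactly 15 elements
bar <- c(letters, letters)
bar[sample(seq_along(bar), 15)] <- NA
n_na_positions <- sum(is.na(bar))
n_na_positions
```

Every letter appears twice, so %in% matches both copies of each sampled letter. Fifteen draws cover at least eight distinct letters, which means the original approach blanks at least 16 elements in this example.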

You are a life-saver, Patrick.

Paleocave Blog

Trust us, we're scientists
