Insert random NAs in a vector in R

I was recently writing a function that needed to deal with NAs in a semi-intelligent way. I wanted to test it with some fake data, so I needed a vector with some random NAs sprinkled in. After a few disappointing Google searches and a Stack Overflow post or two that left something to be desired, I sat down, thought for a few minutes, and came up with this.

# create a vector of random values
 foo <- rnorm(n = 100, mean = 20, sd = 5)
# randomly choose 15 values and find the indices where they occur
# this is the step in which I thought I was clever
# because I use which() and %in% in the same line
 ind <- which(foo %in% sample(foo, 15))
# now replace those indices in foo with NA
 foo[ind] <- NA
# here is our vector with 15 random NAs (assuming the values in foo are all distinct)
 foo

Not especially game changing but more elegant than any of the solutions I found on the interwebs, so there it is FTW.

9 Responses to Insert random NAs in a vector in R

  1. M. Clapham, 30 July, 2014 at 8:00 pm

    You can just do:
    foo <- rnorm(n=100, mean=20, sd=5)
    foo[sample(1:length(foo),15)] <- NA

  2. Gavin, 30 July, 2014 at 8:03 pm

    You don’t need which() or even the %in%. Just use ind <- sample(seq_along(foo), 15). Or replace seq_along(foo) with length(foo) for that matter, since sample(n, 15) draws from 1:n.

    If you want to keep the %in% part, leave off the which(); you can use the resulting logical vector to index without the which() to extract the indices of the TRUE values.
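
The two suggestions in this comment can be sketched side by side. This is only an illustration: the seed and the names `a`, `b`, and `in_sample` are arbitrary choices, not from the original post.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
foo <- rnorm(n = 100, mean = 20, sd = 5)

# Option 1: sample 15 positions directly, no which() or %in% needed
ind <- sample(seq_along(foo), 15)
a <- foo
a[ind] <- NA

# Option 2: keep %in%, but index with the logical vector itself
in_sample <- foo %in% sample(foo, 15)  # TRUE wherever a sampled value occurs
b <- foo
b[in_sample] <- NA

sum(is.na(a))
sum(is.na(b))
```

Both counts come out at 15 here because the values drawn by rnorm() are, for practical purposes, all distinct. Option 2 depends on that; Option 1 does not.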

  3. Vanessa, 30 July, 2014 at 10:32 pm

    foo[sample(1:length(foo),10)] = NA

    • Foo, 31 July, 2014 at 4:34 am

      foo[sample(seq(foo), 15)] <- NA

  4. wsobala, 30 July, 2014 at 11:03 pm

    Try this:
    foo <- rnorm(n=100, mean=20, sd=5)
    foo[sample.int(length(foo), 15)] <- NA_real_

  5. Patrick, 31 July, 2014 at 3:06 am

    Good call everyone on avoiding the index vector. A post on R-bloggers will result in clean code, that’s for sure.

  6. J, 31 July, 2014 at 1:25 pm

    Here’s what I’ve been using (for data frames)…
    ################################################################
    #
    #
    ######### The create random missing values function.
    #
    # First, grab some data.

    no.miss <- read.table("http://www.unt.edu/rss/class/Jon/R_SC/Module4/missForest_noMiss.txt",
    header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
    summary(no.miss)
    nrow(no.miss)
    ncol(no.miss)

    # Next, take a sample of the data.

    samp <- no.miss[sample(seq_len(nrow(no.miss)), 500, replace = FALSE),]
    summary(samp)
    nrow(samp)
    ncol(samp)

    # Now, create the function.

    a.prob <- .05 # <-- This is the proportion of missing data.
    b.prob <- 1 - a.prob

    create.NA <- function(data, impute.cols = NULL){
    sample(c(NA,1), 1, prob = c(a.prob,b.prob), replace = T)
    }

    # Apply the function (across rows and columns); while leaving out
    # the row ID column (column #1).

    samp.na <- is.na(apply(samp[,-1], c(1,2), create.NA))
    samp.NA <- samp[,-1]

    for (j in 1:ncol(samp.na)){
    out <- which(samp.na[,j])  # samp.na is already a logical matrix
    samp.NA[out,j] <- NA
    }; rm(a.prob, b.prob, j, out, create.NA, samp.na)

    # Reassemble the data (with the row ID column).

    samp.NA <- data.frame(samp[,1], samp.NA)
    names(samp.NA)[1] <- "id"
    summary(samp.NA)
    nrow(samp.NA)
    ncol(samp.NA)

    # End script.
    ################################################################

    Cheers,
    Jon
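
For comparison, the same idea (blank out each cell with a fixed probability, leaving the ID column alone) can be sketched without the loop by building a logical mask in one step. This is not Jon's code: the data frame `df`, its column names, and `p_missing` are made up for illustration, and a small simulated data set stands in for the downloaded file.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical stand-in data: an ID column plus three numeric columns
df <- data.frame(id = 1:200, x = rnorm(200), y = runif(200), z = rnorm(200))

p_missing <- 0.05   # assumed proportion of cells to blank out
vals <- df[, -1]    # everything except the ID column

# One uniform draw per cell; TRUE marks a cell that becomes NA
mask <- matrix(runif(nrow(vals) * ncol(vals)) < p_missing,
               nrow = nrow(vals), ncol = ncol(vals))
vals[mask] <- NA

# Reassemble with the untouched ID column
df_na <- data.frame(id = df$id, vals)
mean(is.na(df_na[, -1]))  # observed proportion of missing cells
```

As in the script above, each cell is dropped independently, so the observed proportion only approximates `p_missing`. To get an exact number of NAs, sample cell positions instead, as the vector solutions earlier in the thread do.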

  7. Bill Denney, 2 August, 2014 at 4:06 am

    The original code will usually work if you don’t have the same value multiple times in foo, but if you do, then you may not have 15 eliminated. For example:

    foo <- c(letters, letters)

    will always eliminate the letters in pairs with the original code.
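
A short sketch of that failure mode, next to the index-based version from the earlier comments (the seed and the name `bar` are arbitrary):

```r
set.seed(123)  # arbitrary seed, only so the sketch is reproducible

# Original approach on a vector where every value appears twice
foo <- c(letters, letters)
ind <- which(foo %in% sample(foo, 15))
length(ind)      # an even number from 16 to 30, never 15

# Sampling positions instead of values always gives exactly 15 NAs
bar <- c(letters, letters)
bar[sample(seq_along(bar), 15)] <- NA
sum(is.na(bar))  # 15
```

Matching on values marks every copy of each sampled letter, so the count is twice the number of distinct letters drawn. Sampling positions is unaffected by duplicates.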
