# Insert random NAs in a vector in R

I was recently writing a function that needed to handle NAs in a semi-intelligent way. To test it with fake data, I needed a vector with some random NAs sprinkled in. After a few disappointing Google searches and a Stack Overflow post or two that left something to be desired, I sat down, thought for a few minutes, and came up with this.

```r
# create a vector of random values
foo <- rnorm(n=100, mean=20, sd=5)

# randomly choose 15 indices to replace
# this is the step in which I thought I was clever
# because I use which() and %in% in the same line
ind <- which(foo %in% sample(foo, 15))

# now replace those indices in foo with NA
foo[ind] <- NA

# here is our vector with 15 random NAs
foo
```

Not especially game changing but more elegant than any of the solutions I found on the interwebs, so there it is FTW.

### 10 Responses to Insert random NAs in a vector in R

1. M. Clapham 30 July, 2014 at 8:00 pm #

You can just do:

```r
foo <- rnorm(n=100, mean=20, sd=5)
foo[sample(1:length(foo),15)] <- NA
```

2. Gavin 30 July, 2014 at 8:03 pm #

You don’t need which() or even the %in%. Just use ind <- sample(seq_along(foo), 15). Or replace seq_along() with length() for that matter.

If you want to keep the %in% part, leave off the which(); you can use the resulting logical vector to index without the which() to extract the indices of the TRUE values.
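
To make the two suggestions in this comment concrete, here is a minimal sketch of both. The names `ind`, `mask`, `bar` and `baz` are only illustrative, and the seed is there purely so the example is reproducible.

```r
set.seed(42)  # only so the sketch is reproducible
foo <- rnorm(n = 100, mean = 20, sd = 5)

# Option 1: sample positions directly, which gives exactly 15 distinct indices
ind <- sample(seq_along(foo), 15)
bar <- foo
bar[ind] <- NA

# Option 2: keep %in% but drop which(); %in% already returns a logical
# vector, and a logical vector can be used as an index on its own
mask <- foo %in% sample(foo, 15)
baz <- foo
baz[mask] <- NA

sum(is.na(bar))  # 15
```

Option 2 still matches by value rather than by position, so it only blanks exactly 15 elements when the sampled values are unique in `foo`.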

3. Vanessa 30 July, 2014 at 10:32 pm #

foo[sample(1:length(foo),10)] = NA

• Foo 31 July, 2014 at 4:34 am #

foo[sample(seq(foo), 15)] <- NA

4. wsobala 30 July, 2014 at 11:03 pm #

Try this:

```r
foo <- rnorm(n=100, mean=20, sd=5)
foo[sample.int(length(foo), 15)] <- NA_real_
```

5. Patrick 31 July, 2014 at 3:06 am #

Good call everyone on avoiding the index vector. A post on R-bloggers will result in clean code, that’s for sure.

6. J 31 July, 2014 at 1:25 pm #

Here’s what I’ve been using (for data frames)…
```r
################################################################
#
#
######### The create random missing values function.
#
# First, grab some data.

summary(no.miss)
nrow(no.miss)
ncol(no.miss)

# Next, take a sample of the data.

samp <- no.miss[sample(seq_len(nrow(no.miss)), 500, replace = FALSE), ]
summary(samp)
nrow(samp)
ncol(samp)

# Now, create the function.

a.prob <- .05 # <-- This is the proportion of missing data.
b.prob <- 1 - a.prob

# Returns NA with probability a.prob, otherwise 1; the arguments are unused.
create.NA <- function(data, impute.cols = NULL){
  sample(c(NA, 1), 1, prob = c(a.prob, b.prob), replace = TRUE)
}

# Apply the function (across rows and columns); while leaving out
# the row ID column (column #1).

samp.na <- is.na(apply(samp[, -1], c(1, 2), create.NA))
samp.NA <- samp[, -1]

for (j in 1:ncol(samp.na)){
  out <- which(samp.na[, j])
  samp.NA[out, j] <- NA
}; rm(a.prob, b.prob, j, out, create.NA, samp.na)

# Reassemble the data (with the row ID column).

samp.NA <- data.frame(samp[, 1], samp.NA)
names(samp.NA)[1] <- "id"
summary(samp.NA)
nrow(samp.NA)
ncol(samp.NA)

# End script.
################################################################
```

Cheers,
Jon
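
For comparison, the same idea (each cell outside the ID column goes missing independently with probability `a.prob`) can probably be written more compactly. This is only a sketch under assumptions: the data frame `samp` below is a made-up stand-in with hypothetical columns `x`, `y` and `z`, since the original `no.miss` data is not shown, and `value.cols` is an illustrative name.

```r
set.seed(123)  # only so the sketch is reproducible

# Hypothetical stand-in for the sampled data: an ID column plus three value columns
samp <- data.frame(id = 1:500, x = rnorm(500), y = runif(500), z = rnorm(500))

a.prob <- 0.05  # proportion of missing data, as in the script above

# Every column except the row ID column (column #1)
value.cols <- names(samp)[-1]

samp.NA <- samp
samp.NA[value.cols] <- lapply(samp[value.cols], function(col) {
  # each cell is set to NA independently with probability a.prob
  col[runif(length(col)) < a.prob] <- NA
  col
})

# Observed share of missing cells; close to a.prob on average, not exactly equal
mean(is.na(samp.NA[value.cols]))
```

Because each cell is drawn independently, the number of NAs varies from run to run. If an exact count is needed, sampling cell positions with `sample()` is the closer fit.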

7. Bill Denney 2 August, 2014 at 4:06 am #

The original code will usually work if you don’t have the same value multiple times in foo, but if you do, then you may not have 15 eliminated. For example:

foo <- c(letters, letters)

will always eliminate the letters in pairs with the original code.
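
A short sketch of that pitfall, comparing the value-matching approach from the post with index-based sampling. The names `by_value` and `by_index` are just illustrative.

```r
set.seed(1)  # only so the sketch is reproducible
foo <- c(letters, letters)  # every value appears exactly twice

# Original approach: %in% matches by value, so both copies of each
# sampled letter are replaced
by_value <- foo
by_value[which(foo %in% sample(foo, 15))] <- NA

# Index-based approach: positions are sampled, so exactly 15 are replaced
by_index <- foo
by_index[sample(seq_along(foo), 15)] <- NA

sum(is.na(by_value))  # always even, somewhere between 16 and 30
sum(is.na(by_index))  # 15
```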

8. Andreas 3 September, 2015 at 4:20 am #

You are a life-saver, Patrick.