Tag Archives: cut

Strange behavior from the cut function with dates in R

Update: @hadlywickham tweeted to me to let me know that  daylight savings time was the culprit. Though this explains the behavior I document in the first part of this post, the behavior of the cut function using truncated dates (discussed further down the post) is still unexplained.

I recently encountered some strange behavior from R when using the cut.POSIXt method with “day” as the interval specification. This function isn’t working as I intended and I doubt that it is working properly. I’ll show you the behavior I’m seeing (and what I was expecting) then I’ll show you my current base R workaround. To generate a reproducible example, I’ll use this latemail function I gleaned from this stack overflow post.

latemail <- function(N, st="2013/01/01", et="2013/12/31") {
 st <- as.POSIXct(as.Date(st))
 et <- as.POSIXct(as.Date(et))
 dt <- as.numeric(difftime(et,st,unit="sec"))
 ev <- sort(runif(N, 0, dt))
 rt <- st + ev
 }

And generate some data…


set.seed(7110)
#generate 1000 random POSIXlt dates and times
bar<-data.frame("date"=latemail(1000, st="2013/03/02", et="2013/03/30"))
# assign factors based on the day portion of the POSIXlt object
bar$dateCut <- cut(bar$date, "day", labels = FALSE)

I expected that all rows with the date 2013-03-01 would receive factor 1, all rows with the date 2013-03-02 would receive factor 2, and so on. At first glance this seems to be what is happening.

head(bar, 10)
     date                 dateCut
1    2013-03-01 19:10:31  1
2    2013-03-01 19:31:31  1
3    2013-03-01 19:55:02  1
4    2013-03-01 20:09:36  1
5    2013-03-01 20:13:32  1
6    2013-03-01 22:15:42  1
7    2013-03-01 22:16:06  1
8    2013-03-01 23:41:50  1
9    2013-03-02 00:30:53  2
10   2013-03-02 01:08:52  2

Note that at row 9 the date changes from March 1 to March 2 and the factor (dateCut) changes from 1 to 2. So far so good. But we shall see some strange things in the midnight hour.  
Continue reading Strange behavior from the cut function with dates in R

Share