Update: @hadleywickham tweeted to let me know that daylight saving time was the culprit. Although this explains the behavior I document in the first part of this post, the behavior of the cut function with truncated dates (discussed further down the post) is still unexplained.
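Here is a minimal sketch of why daylight saving time matters. It assumes a US Eastern session time zone ("America/New_York"), which the post does not actually name. In 2013, US clocks sprang forward at 2:00 a.m. on March 10, which falls inside the date range used below, so that local day lasted only 23 hours. If daily breaks are laid out in fixed 86400-second steps from local midnight, they drift to 1:00 a.m. after the change. Base R's seq() with by = "DSTday" steps by calendar day instead, so its breaks stay at midnight.

```r
# Assumption: US Eastern time; the post does not name the author's time zone.
tz <- "America/New_York"

# The local calendar day of March 10, 2013 contains the spring-forward jump.
mar10 <- as.POSIXct("2013-03-10", tz = tz)
mar11 <- as.POSIXct("2013-03-11", tz = tz)
day_hours <- as.numeric(difftime(mar11, mar10, units = "hours"))
print(day_hours)  # 23

# Fixed 86400-second steps from local midnight drift to 1:00 a.m. after the change.
start <- as.POSIXct("2013-03-09", tz = tz)
fixed_steps <- start + 86400 * 0:3
print(format(fixed_steps, "%Y-%m-%d %H:%M"))

# by = "DSTday" increments the calendar day, so every step stays at local midnight.
dst_steps <- seq(start, by = "DSTday", length.out = 4)
print(format(dst_steps, "%Y-%m-%d %H:%M"))
```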
I recently encountered some strange behavior from R when using the cut.POSIXt method with “day” as the interval specification. The function isn’t doing what I intended, and I doubt that it is working properly. I’ll show you the behavior I’m seeing (and what I was expecting), then I’ll show you my current base R workaround. To generate a reproducible example, I’ll use this latemail function, which I gleaned from this Stack Overflow post.
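latemail <- function(N, st = "2013/01/01", et = "2013/12/31") {
  # convert the start and end dates to POSIXct (midnight UTC)
  st <- as.POSIXct(as.Date(st))
  et <- as.POSIXct(as.Date(et))
  # length of the interval in seconds
  dt <- as.numeric(difftime(et, st, units = "secs"))
  # N sorted offsets drawn uniformly over the interval
  ev <- sort(runif(N, 0, dt))
  # return the random date-times
  st + ev
}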
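This small check shows where latemail’s range actually starts. as.POSIXct() applied to a Date gives midnight UTC, not local midnight. So st = "2013/03/02" begins at 19:00 on March 1 when viewed from a UTC-5 zone such as America/New_York. That zone is an assumption about the author’s session, though it is consistent with the first rows of the output shown further down. R versions may differ in whether this result carries a time zone attribute, so the sketch passes tz to format() explicitly.

```r
# Same conversion latemail() applies to its start date.
st <- as.POSIXct(as.Date("2013/03/02"))

# The underlying instant is midnight UTC on March 2 ...
utc_view <- format(st, "%Y-%m-%d %H:%M", tz = "UTC")
# ... which is the evening of March 1 in US Eastern (assumed) time.
ny_view <- format(st, "%Y-%m-%d %H:%M", tz = "America/New_York")

print(c(utc = utc_view, eastern = ny_view))
```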
And generate some data…
set.seed(7110)
# generate 1000 random POSIXct date-times
bar <- data.frame("date" = latemail(1000, st = "2013/03/02", et = "2013/03/30"))
# assign integer day codes based on the day portion of each date-time
bar$dateCut <- cut(bar$date, "day", labels = FALSE)
I expected that all rows with the date 2013-03-01 would receive factor 1, all rows with the date 2013-03-02 would receive factor 2, and so on. At first glance this seems to be what is happening.
head(bar, 10)
                  date dateCut
1  2013-03-01 19:10:31       1
2  2013-03-01 19:31:31       1
3  2013-03-01 19:55:02       1
4  2013-03-01 20:09:36       1
5  2013-03-01 20:13:32       1
6  2013-03-01 22:15:42       1
7  2013-03-01 22:16:06       1
8  2013-03-01 23:41:50       1
9  2013-03-02 00:30:53       2
10 2013-03-02 01:08:52       2
Note that at row 9 the date changes from March 1 to March 2 and the factor (dateCut) changes from 1 to 2. So far so good. But we shall see some strange things in the midnight hour.
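The expectation can also be checked row by row instead of by eye. day_code_mismatches() is a hypothetical helper, not part of base R or of the original post. It compares the codes from cut(..., labels = FALSE) with each timestamp’s local calendar day, numbered from 1 for the earliest day. The example data are assumed for illustration: 00:30 local time on four days around the 2013 US spring-forward date. With the post’s data it could be called as day_code_mismatches(bar$date).

```r
# Hypothetical helper: indices where cut()'s interval codes disagree with the
# local calendar day of x (earliest day = 1).
day_code_mismatches <- function(x, breaks = "day") {
  # format() uses the "tzone" attribute of x if present, else the session zone
  local_day <- as.Date(format(x, "%Y-%m-%d"))
  expected  <- as.integer(local_day - min(local_day)) + 1L
  observed  <- cut(x, breaks, labels = FALSE)
  which(observed != expected)
}

# Assumed example data: 00:30 local time around March 10, 2013 (US spring forward).
x <- as.POSIXct(c("2013-03-09 00:30", "2013-03-10 00:30",
                  "2013-03-11 00:30", "2013-03-12 00:30"),
                tz = "America/New_York")

day_code_mismatches(x, "day")     # fixed-length "day" breaks
day_code_mismatches(x, "DSTday")  # calendar-day breaks
```

If the "day" breaks are fixed 24-hour steps, as the daylight saving explanation in the update suggests, the first call should flag the rows after March 10. The "DSTday" call is expected to return an empty integer vector.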