Author Archive | Patrick

Strange behavior from the cut function with dates in R

Update: @hadleywickham tweeted to me to let me know that daylight saving time was the culprit. Though this explains the behavior I document in the first part of this post, the behavior of the cut function using truncated dates (discussed further down the post) is still unexplained.

I recently encountered some strange behavior from R when using the cut.POSIXt method with “day” as the interval specification. This function isn’t working as I intended and I doubt that it is working properly. I’ll show you the behavior I’m seeing (and what I was expecting), then I’ll show you my current base R workaround. To generate a reproducible example, I’ll use this latemail function I gleaned from this Stack Overflow post.

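latemail <- function(N, st="2013/01/01", et="2013/12/31") {
  # convert the start and end dates to POSIXct date-times
  st <- as.POSIXct(as.Date(st))
  et <- as.POSIXct(as.Date(et))
  # length of the window in seconds
  dt <- as.numeric(difftime(et, st, units="secs"))
  # N sorted random offsets (in seconds) from the start
  ev <- sort(runif(N, 0, dt))
  # return the random date-times explicitly
  st + ev
}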

And generate some data…


set.seed(7110)
# generate 1000 random POSIXct date-times
bar <- data.frame("date"=latemail(1000, st="2013/03/02", et="2013/03/30"))
# assign day codes based on the day portion of the POSIXct object
# (integer codes rather than a factor, because labels = FALSE)
bar$dateCut <- cut(bar$date, "day", labels = FALSE)

I expected that all rows with the date 2013-03-01 would receive factor 1, all rows with the date 2013-03-02 would receive factor 2, and so on. At first glance this seems to be what is happening.

head(bar, 10)
     date                 dateCut
1    2013-03-01 19:10:31  1
2    2013-03-01 19:31:31  1
3    2013-03-01 19:55:02  1
4    2013-03-01 20:09:36  1
5    2013-03-01 20:13:32  1
6    2013-03-01 22:15:42  1
7    2013-03-01 22:16:06  1
8    2013-03-01 23:41:50  1
9    2013-03-02 00:30:53  2
10   2013-03-02 01:08:52  2

Note that at row 9 the date changes from March 1 to March 2 and the factor (dateCut) changes from 1 to 2. So far so good. But we shall see some strange things in the midnight hour.  
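
Since the update at the top of the post points to daylight saving time, here is a minimal sketch of one way to bin by calendar day without depending on where cut places its breaks. It is not necessarily the workaround I mention above; the helper name day_code and the time zone in the example are assumptions made purely for illustration.

```r
# hypothetical helper: integer day codes taken from the calendar date
day_code <- function(x, tz = "UTC") {
  d <- as.Date(x, tz = tz)                 # calendar date in the chosen time zone
  as.integer(d) - as.integer(min(d)) + 1L  # earliest day gets code 1
}

# four date-times spanning the 2013 US daylight saving change (March 10)
x <- as.POSIXct(c("2013-03-09 23:30:00", "2013-03-10 00:30:00",
                  "2013-03-10 23:30:00", "2013-03-11 00:10:00"),
                tz = "America/Chicago")
codes <- day_code(x, tz = "America/Chicago")
codes
```

Passing the time zone explicitly matters here, because as.Date on a POSIXct object uses UTC unless told otherwise.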

Insert random NAs in a vector in R

I was recently writing a function which was going to need to deal with NAs in some kind of semi-intelligent way. I wanted to test it with some fake data, meaning that I was going to need a vector with some random NAs sprinkled in. After a few disappointing Google searches and a Stack Overflow post or two that left something to be desired, I sat down, thought for a few minutes, and came up with this.

# create a vector of random values
foo <- rnorm(n=100, mean=20, sd=5)
# randomly choose 15 indices to replace
# this is the step in which I thought I was clever
# because I use which() and %in% in the same line
# (it relies on the values in foo being unique, which is effectively always true for rnorm draws)
ind <- which(foo %in% sample(foo, 15))
# now replace those indices in foo with NA
foo[ind] <- NA
# here is our vector with 15 random NAs
foo

Not especially game changing but more elegant than any of the solutions I found on the interwebs, so there it is FTW.
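
For vectors that might contain repeated values (rounded or integer data, say), matching on values could replace more than 15 elements. A minimal sketch of an alternative, offered as a variation rather than the approach above, is to sample the positions directly. The seed is an arbitrary choice for reproducibility.

```r
set.seed(42)  # arbitrary seed, only so the example is reproducible
# create a vector of random values
foo <- rnorm(n = 100, mean = 20, sd = 5)
# sample 15 positions (not values) without replacement
ind <- sample(length(foo), 15)
# replace those positions with NA
foo[ind] <- NA
# count the NAs to confirm exactly 15 were inserted
sum(is.na(foo))
```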

GIS in R: Part 1

I messed around with R for years without really learning how to use it properly. I think it’s because I could always throw my hands up when the going got tough and run back and cling to the skirts of Excel or JMP or Systat. I finally learned how to use R when I needed to do a fairly hefty GIS project and I didn’t have access to a computer with ArcGIS and couldn’t afford to buy it (who can?). So I started looking into R’s spatial abilities.

Admittedly, R might not be the most obvious choice among free GIS options; combinations of QGIS (http://www.qgis.org/), GRASS (http://grass.osgeo.org/), PostGIS (http://postgis.refractions.net/), or OpenGeo (http://boundlessgeo.com/solutions/opengeo-suite/download/) might pop up in Google searches before R. R might not even be the first general purpose programming language you think of for GIS, especially now that ArcGIS relies on Python for much of its modeling. However, all of these tools have a significant learning curve, and I was farther along in R than any of these alternatives, so I started googling and watching tutorial videos. So should you be using R for analyzing and displaying spatial data? If you already know a little or a lot of R, if you need a cross platform solution, or if you need to apply some fairly heavy stats to your spatial data, R just might be a good solution for you. It turns out R has lots of support for spatial data and does a great job displaying it too.

There are a number of packages useful for analyzing and displaying your spatial data. I think the 4 most useful right out of the gate are sp, rgdal, maptools, and raster. If you haven’t installed packages before do this…

install.packages("sp")
install.packages("raster")
install.packages("maptools")

…and if you are on a Windows machine…

install.packages("rgdal")

If you’re on a Mac, installing rgdal is a little tricky. Give this a try

setRepositories(ind=1:2)
install.packages("rgdal")

If that doesn’t work read this over.
http://blog.fellstat.com/?p=46

After installing a package, you need to load it with library() before you can use the functions it contains. To use the functions in the sp package, you should type

library(sp)

to load the rgdal package…

library(rgdal)

etc.
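
If you aren’t sure which of these packages are already on your machine, something like the following sketch can check before you install anything. The helper name missing_pkgs is made up for this example; requireNamespace is base R and returns FALSE when a package isn’t available.

```r
# hypothetical helper: which of the requested packages are not installed?
missing_pkgs <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE without attaching the package
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

wanted <- c("sp", "raster", "maptools", "rgdal")
to_install <- missing_pkgs(wanted)

if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install what is missing
}
```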

Winter Solstice Survival Guide 2013

The most wonderful time of the year, obviously.

’Tis the season to consume! But for this part of the year, as the Winter Solstice approaches, we consume not for ourselves, but for others, and that’s generally a good thing. Each year we here at Science… sort of like to put together a list of suggestions for the science-inclined in your life. Whether you’re looking to give the gift of science, or fleshing out your own list to send to your regionally-appropriate gift giving elf-spirit, this list should have you covered.

We begin with Patrick, who really went above and beyond with his contribution, earning him well-deserved top billing.

Why are Birds Dinosaurs?

nationalgeographic.com

Month after month, one of the most popular posts on the Paleocave blog is the How to Read a Cladogram post I did some time ago. I always intended to follow it up with more cladistic fun. So, hold onto your butts, we’re going to let the dinosaurs loose.

Birds are dinosaurs. We’ve all heard this. But does that phrase make any sense? Not really. Dinosaurs, for the most part, are things that were really big, were mostly scaly, had fantastic teeth, and are extinct. Birds, on the other hand, don’t have teeth, are generally small, and are covered in feathers (I know that you know that lots of old school dinosaurs had feathers too, but whatever). So, why do we say that birds are dinosaurs? The answer involves evolution and the meaning of taxonomic names in biology.

Writing a for-loop in R

freeimages.co.uk

There may be no R topic that is more controversial than the humble for-loop. And, to top it off, good help is hard to find. I was astounded by the lack of useful posts when I googled “for loops in R” (the top return linked to a page that did not exist). In fact, even searching for help within R is not easy and not even that helpful when successful (?for won’t get you anywhere. ?'for' will get you the help page but it is by no means exhaustive.) So, at the request of Sam, a faithful reader of the Paleocave blog, I’m going to throw my hat into the ring and brace myself for the potential onslaught of internet troll wrath.

How to loop in R

Use the for loop if you want to do the same task a specific number of times.
It looks like this.

for (counter in vector) {commands}

I’m going to set up a loop to square every element of my dataset, foo, which contains the odd integers from 1 to 100 (keep in mind that vectorizing would be faster for my trivial example – see below).


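# the odd integers from 1 to 100 (50 values)
foo = seq(1, 100, by=2)

# empty container that the loop fills in
foo.squared = NULL

# square each element of foo in turn
for (i in 1:50) {
  foo.squared[i] = foo[i]^2
}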

If the creation of a new vector is the goal, first you have to set up a vector to store things in prior to running the loop. This is the foo.squared = NULL part. This was a hard lesson for me to learn. R doesn’t like being told to operate on a vector that doesn’t exist yet. So, we set up an empty vector to add stuff to later (note that this isn’t the most speed efficient way to do this, but it’s fairly fool-proof). Next, the real for-loop begins. This code says we’ll loop 50 times (1:50). The counter we set up is ‘i’ (but you can put whatever variable name you want there). On each pass, i equals the number of the loop that we are on (for the first loop, i=1; second loop, i=2), so the ith element of our new vector foo.squared gets the square of the ith element of foo.
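
For comparison, here is a minimal sketch of the two speedier options hinted at above: preallocating the result vector to its full length, and skipping the loop entirely by vectorizing. The names foo.squared.vec and same are introduced just for this example.

```r
foo <- seq(1, 100, by = 2)

# preallocate a numeric vector of the right length instead of growing from NULL
foo.squared <- numeric(length(foo))

# seq_along(foo) yields 1, 2, ..., length(foo), so 50 need not be hard-coded
for (i in seq_along(foo)) {
  foo.squared[i] <- foo[i]^2
}

# the vectorized version: ^ works element by element on the whole vector
foo.squared.vec <- foo^2

# check that both approaches agree
same <- isTRUE(all.equal(foo.squared, foo.squared.vec))
same
```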
Science-y New Year’s Resolution: Learn to Code

In a 1995 interview Steve Jobs said he thought that computer programming should be a liberal art. In other words, he thought everyone’s education should include a year of learning a computer language, because it teaches you how to think in a certain way. If that was true in 1995, just think how much more crucial knowing how to code in some language is today. Perhaps learning a computer language should be on your to-do list; maybe a new year’s resolution?

If you want to learn a computer language, a logical question would be: which one should you learn?

moRe

Hopefully my first R post whetted your appetite for open source data software. I’m gearing up for more R posts regardless. I thought I’d do a quick post about a couple of useful commands, ‘View’ and ‘fix’. When you first break the shackles of Excel, one of the toughest things is not being able to see your data. Try this: fire up R (go download it and install it if you haven’t already) and let’s call up a built-in dataset by typing

volcano
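
Typing the name alone prints the whole matrix to the console. As a rough sketch of some gentler ways to look at it, the lines below use only base R functions. The View call is wrapped in interactive() as an assumption that you might also run this from a script, where no viewer window can open.

```r
# volcano ships with R (datasets package) as a numeric matrix
dim(volcano)                  # number of rows and columns
volcano[1:5, 1:5]             # a small corner instead of the full printout
summary(as.vector(volcano))   # quick summary of all the values

# View() opens a spreadsheet-style viewer, so only call it interactively
if (interactive()) {
  View(volcano)
}
```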

Award Season

Hi Paleoposse… It’s podcast award season again. Here at Science… sort of we always view these things a little ambivalently. We, as a group of podcasters, don’t have too much ambition as far as winning a category goes. But we get a significant amount of new website traffic (and presumably new listeners) from the little bit of buzz these awards generate. So if you have a few minutes and want to help out the show, go visit the podcast awards and nominate us (voting comes later). This year Stitcher has decided to get into the game; we don’t quite know what to expect from them, but again, being nominated certainly can’t hurt (and we aren’t as highly ranked on Stitcher as we’d like to be). So go nominate us for a Stitcher award too if you are feeling generous.

Thanks for your support!

Mind Like Kindle

I ordered a Kindle about 6 months ago. In my solstice shopping guide, back in December, I talked a little bit about how much I liked it. But, I don’t just like it… it has sort of changed my life.

It has me reading more than I have in years – but the real change was the realization that it brought me to. That realization was that I am living a life full of distractions. Email, Facebook, Twitter, various blogs, sports results, and the news cycle in general, these things all compete for my attention – and these are just the non-work related items. But, we all know this already. I knew it intellectually, but I didn’t know it intuitively.