If you’ve listened to the show for a while or if you’ve been reading the paleocave blog from the beginning (like when we actually used to update it regularly), then you might know that I’m rather fascinated with statistics. Imagine my delight a few years ago when I found out that one of the most powerful statistical tools available (the one that most of the cool kids use) was available for free! That tool is called R. It’s a great tool but a terrible name. R is named both for the developers Robert Gentleman and Ross Ihaka (Robert and Ross), and as a sort of pun because it was an open source rewrite of the S language. That’s cool, I guess, but R as a name is horrible search engine optimization. Oh well, keeps out the riff-raff I suppose.
The vast majority of people would call R a programming language. Real computer programmers (the kind of people that argue about Ruby vs Perl) will tell you it’s not really a ‘language,’ it’s a ‘programming environment.’ Whatever, I don’t think I really know the difference. Don’t get intimidated, because it’s pretty easy to do as much or as little as you want in R.
I know what you’re thinking. “I don’t want to mess with that. I want something with a point and click interface and dropdown menus.” You probably do – now, but once you see what the possibilities are your curiosity will be piqued and you’ll learn how to do more than a point and click interface ever could (plus this is free, remember). Think of point and click sort of like public transportation. Right now you just want a way to get to the grocery store because it’s too far to walk. Are you going to learn to drive or just take the bus? You take the bus, less time and resources required. But later, you learn to drive and realize you can go anywhere you want. Maybe you occasionally still take the bus when it’s really convenient but sometimes you want to go someplace nobody else ever goes.
You’re still skeptical – I know, I was too. Here’s a hook. When I show this to people, many of them start sitting up straight and listening. The hook is the histogram, that old statistical standby. Ever try to make one in Excel? It’s basically impossible. Download R, install it, and open it. There’s some legalish text at the top of the screen and then a prompt that looks like this: >
First let’s assign a data set a name. Type “data”, then “=”, then “c”, and open some parentheses “(”. Inside those parentheses, type in your data points separated by commas, then close the parentheses “)”. You just assigned the name data to that data set. Now make a histogram from it. Type “hist(data)” and hit “return.” Bam! Histogram!
> data=c(1,3,4,6,7,5,7,8,9,7,8,6,7,4,5,6,4,3,10,11,13,2,3)
> hist(data)
It’s that easy! (If you cheated and copied and pasted my text, make sure to delete the prompts “>” before hitting return.)
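Once the basic histogram works, you can start nudging it. Here is a minimal sketch using the same data as above; the bin edges, title, axis label, and color are just choices I made for illustration, not anything R requires. It also uses “<-” for assignment, which is the more common R idiom, though “=” works the same way here.

```r
# Same data set as in the post.
data <- c(1,3,4,6,7,5,7,8,9,7,8,6,7,4,5,6,4,3,10,11,13,2,3)

# Illustrative bin edges: 0, 2, 4, ..., 14 (an arbitrary choice).
bin_edges <- seq(0, 14, by = 2)

# plot = FALSE computes the bins without drawing anything,
# so you can look at the counts behind the picture.
h <- hist(data, breaks = bin_edges, plot = FALSE)
print(h$counts)

# Draw the histogram with a title, an axis label, and a fill color.
hist(data, breaks = bin_edges,
     main = "My first histogram", xlab = "Value", col = "grey")
```

Typing ?hist at the prompt brings up the help page listing the other options.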
If you are starting to like what you see and you want an easy way to get some of the data stored in your Excel spreadsheets into R, I recommend Googling the “scan” command. It’s not the most elegant way of getting data into R, but it’s good for your first time out on the road (I still use it probably more than I should).
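To give a rough idea of what that looks like, here is a small sketch. The file names and the height and weight numbers are made up for illustration; the code writes its own tiny files to a temporary location so it runs anywhere. It shows scan reading a plain column of numbers, and also read.csv, which is another common route if you save a spreadsheet as a CSV file first.

```r
# A hypothetical text file with one number per line,
# like a single spreadsheet column pasted into a text editor.
numbers_path <- tempfile(fileext = ".txt")
writeLines(c("30", "41", "52"), numbers_path)

# scan reads the numbers into a plain vector.
# quiet = TRUE suppresses the "Read 3 items" message.
weights <- scan(numbers_path, quiet = TRUE)
hist(weights)

# A hypothetical CSV file standing in for a spreadsheet
# saved with "Save As" in CSV format.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("height,weight", "1.2,30", "1.5,41", "1.7,52"), csv_path)

# read.csv returns a data frame with one named column per spreadsheet column.
measurements <- read.csv(csv_path)
hist(measurements$weight)
```

Calling scan() with nothing in the parentheses lets you type or paste numbers straight into the console, finishing with a blank line.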
If you are starting to think you might really use R, you might want to invest in some books to show you the ropes. I have A First Course in Statistical Programming with R. I have also heard reasonably good things about Statistical Analysis with R. Both of these books are light on statistics and heavy on R. So if you are looking to brush up on stats, you probably need something like Using R for Introductory Statistics, though I really can’t speak to how good or bad this book is because I’ve never used it.
Lastly, I figure some people out there might be looking to learn something about programming languages (or environments as the case may be) and wonder if R is a good place to start. Well, in my opinion it’s a fairly gentle start into learning a programming language. What I don’t know is how well the skills you’ve learned will translate into other harder-hitting languages later on. You can read other people’s opinions on the matter (much more informed than my own) here http://www.psychwire.co.uk/2011/05/is-r-an-ideal-language-to-teach-the-fundamentals-of-programming-to-beginners/. Make sure you check out the comments for the back and forth discussion.
Ok, well enjoy getting started. Shoot me an email at patrick[at]sciencesortof.com if you get a kick out of using R or want me to try to help you with something, and I’ll expose my ignorance. Though I like R, I’m not particularly great at it, so tweet the once and future paleopal, @jdyeakel, for some real expertise.
Fine! I’m convinced and will redownload R to start playing with it. If only cause it’s free. You’re not wrong about it being horrible SEO, though.
Psst. If you want a free, R-based package that has a sweet GUI, you should check out SOFA (http://www.sofastatistics.com/faq.php).
Before I knew about R, I found the data entry to be a little confusing. But if you’ve had an introduction, it’ll probably be pretty easy!
R may not be a good choice for search engine optimisation, but they’ve overcome it: I searched for R on Google and it was the first result.
I’m frustrated with normal stats programs (in some respects, in others I love them), and I’m doing a 3 day R course to see if it can scratch my statistical itch. I’m looking forwards to it, the last time I used a programming language was BASIC in the mid 90s.
That’s awesome! Let us know what you learn in R bootcamp…
That’s cool, I guess, but R as a name is horrible search engine optimization. Oh well, keeps out the riff-raff I suppose.
The R community has already come up with a solution for that. Use Rseek to limit Google queries to R-related pages.
Karthik,
I did not know this. That’s pretty darn useful. Thanks.
I have heard that “R” is a riff on “S,” the language that most commercial software packages for statistics are based on, and that the original developers’ names start with R.
I also really like RStudio as a GUI.
Jacquelyn,
Yeah, I think that is the most accurate explanation for the language (as some of the earlier comments allude to), but at the time I wrote the post, I wasn’t nearly as versed in the folk mythology of R.
You know, you’re right, and there’s no reason not to edit the post, so I will…