R you ready for this? Statistics for free!

If you’ve listened to the show for a while or if you’ve been reading the paleocave blog from the beginning (like when we actually used to update it regularly), then you might know that I’m rather fascinated with statistics. Imagine my delight a few years ago when I found out that one of the most powerful statistical tools available (the one that most of the cool kids use) was available for free! That tool is called R.  It’s a great tool but a terrible name.  R is named both for the developers Robert Gentleman and Ross Ihaka (Robert and Ross), and as a sort of pun because it was an open source rewrite of the S language. That’s cool, I guess, but R as a name is horrible search engine optimization. Oh well, keeps out the riff-raff I suppose.

The vast majority of people would call R a programming language. Real computer programmers (the kind of people who argue about Ruby vs Perl) will tell you it’s not really a ‘language,’ it’s a ‘programming environment.’ Whatever, I don’t think I really know the difference.  Don’t get intimidated, because it’s pretty easy to do as much or as little as you want in R.

[Screenshot of the R console. Caption: What’s a pirate’s favorite statistical programming language?]

I know what you’re thinking.  “I don’t want to mess with that. I want something with a point and click interface and dropdown menus.”  You probably do, for now, but once you see what the possibilities are your curiosity will be piqued and you’ll learn how to do more than a point and click interface ever could (plus this is free, remember).  Think of point and click as being like public transportation.  Right now you just want a way to get to the grocery store because it’s too far to walk. Are you going to learn to drive or just take the bus?  You take the bus: less time and fewer resources required.  But later, you learn to drive and realize you can go anywhere you want.  Maybe you still take the bus occasionally when it’s really convenient, but sometimes you want to go someplace nobody else ever goes.

You’re still skeptical. I know, I was too.  Here’s a hook. When I show this to people, many of them start sitting up straight and listening.  The hook is the histogram, that old statistical standby.  Ever try to make one in Excel?  It’s basically impossible. Download R, install it, and open it. There’s some legalish text at the top of the screen and then a prompt that looks like this: >

First let’s assign a data set a name. Type “data”, then “=”, then “c”, and open some parentheses “(”. Inside those parentheses, type in your data points separated by commas, then close the parentheses “)”. You just assigned the name data to that data set. Now make a histogram from it.  Type “hist(data)” and hit “return.” Bam! Histogram!

> data = c(1,3,4,6,7,5,7,8,9,7,8,6,7,4,5,6,4,3,10,11,13,2,3)

> hist(data)


It’s that easy! (If you cheated and copied and pasted my text, make sure to delete the prompts “>” before hitting return.)
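
If you want to poke at the histogram a bit more, here is a minimal sketch using the same data. The title, axis label, bin count, and color are just illustrative choices of mine, and `breaks = 6` is only a suggestion that R is free to round to nicer bin edges.

```r
# Same data set as in the example above
data <- c(1,3,4,6,7,5,7,8,9,7,8,6,7,4,5,6,4,3,10,11,13,2,3)

# Quick numeric summary (min, quartiles, mean, max) before plotting
summary(data)

# Histogram with a title, an x-axis label, and a suggested number of bins.
# The labels and color here are made up for illustration.
h <- hist(data, breaks = 6, main = "My first histogram",
          xlab = "Value", col = "grey")

# hist() also returns the bin edges and the count in each bin
h$breaks
h$counts
```

Storing the result in `h` is optional; it is handy when you want the actual bin counts and not just the picture.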

If you are starting to like what you see and you want an easy way to get some of the data stored in your Excel spreadsheets into R, I recommend Googling the “scan” command.  It’s not the most elegant way of getting data into R, but it’s good for your first time out on the road (I still use it, probably more than I should).
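
As a rough sketch of getting spreadsheet data into R: one common route (an assumption on my part, not the only way) is to save the sheet as a CSV file and read it with read.csv(), while scan() handles a plain text file of numbers. The file contents and column names below are made up for illustration, and temporary files stand in for your real ones.

```r
# Stand-in for a spreadsheet saved as CSV (hypothetical columns and values)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("specimen,length_mm", "A,12.5", "B,14.1", "C,9.8"), csv_path)

# read.csv() returns a data frame with one column per spreadsheet column
measurements <- read.csv(csv_path)
hist(measurements$length_mm)

# scan() reads a plain file of whitespace-separated numbers into a vector
txt_path <- tempfile(fileext = ".txt")
writeLines("1 3 4 6 7 5 7", txt_path)
values <- scan(txt_path)
hist(values)
```

With your own data, you would swap the temporary paths for the path to the file you exported.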

If you are starting to think you might really use R, you might want to invest in some books to show you the ropes.  I have A First Course in Statistical Programming with R. I have also heard reasonably good things about Statistical Analysis with R.  Both of these books are light on statistics and heavy on R. So if you are looking to brush up on stats, you probably need something like Using R for Introductory Statistics, though I really can’t speak to how good or bad this book is because I’ve never used it.

Lastly, I figure some people out there might be looking to learn something about programming languages (or environments, as the case may be) and wonder if R is a good place to start.  Well, in my opinion, it’s a fairly gentle start into learning a programming language.  What I don’t know is how well the skills you learn will translate into other harder-hitting languages later on. You can read other people’s opinions on the matter (much more informed than my own) here: http://www.psychwire.co.uk/2011/05/is-r-an-ideal-language-to-teach-the-fundamentals-of-programming-to-beginners/. Make sure you check out the comments for the back and forth discussion.

OK, well, enjoy getting started. Shoot me an email at patrick[at]sciencesortof.com if you get a kick out of using R or want me to try to help you with something, and I’ll expose my ignorance. (Though I like R, I’m not particularly great at it. For some real expertise, tweet the once and future paleopal, @jdyeakel.)



11 Responses to R you ready for this? Statistics for free!

  1. Ryan, 7 May, 2011 at 2:48 pm

    Fine! I’m convinced and will redownload R to start playing with it. If only cause it’s free. You’re not wrong about it being horrible SEO, though.

    • Jacob, 9 May, 2011 at 1:10 pm

      Psst. If you want a free, R-based package that has a sweet GUI, you should check out SOFA (http://www.sofastatistics.com/faq.php).

      Before I knew about R, I found the data entry to be a little confusing. But if you’ve had an introduction, it’ll probably be pretty easy!

  2. John, 18 May, 2011 at 7:25 am

    R may not be a good choice for search engine optimisation, but they’ve overcome it. I searched for R on Google and it was the first result.

    I’m frustrated with normal stats programs (in some respects, in others I love them), and I’m doing a 3 day R course to see if it can scratch my statistical itch. I’m looking forwards to it, the last time I used a programming language was BASIC in the mid 90s.

    • Patrick, 19 May, 2011 at 9:49 am

      That’s awesome! Let us know what you learn in R bootcamp…

  3. Karthik Ram, 12 September, 2011 at 12:48 pm

    “That’s cool, I guess, but R as a name is horrible search engine optimization. Oh well, keeps out the riff-raff I suppose.”

    The R community has already come up with a solution for that. Use Rseek to limit Google queries to R-related pages.

  4. Patrick, 12 September, 2011 at 1:12 pm

    I did not know this. That’s pretty darn useful. Thanks.

  5. Jacquelyn, 28 December, 2012 at 6:32 pm

    I have heard that “R” is a riff on “S,” the language that most commercial software packages for statistics are based on, and that the original developers’ names start with R.

    I also really like RStudio as a GUI.

    • Patrick, 29 December, 2012 at 7:50 am

      Yeah, I think that is the most accurate explanation for the language (as some of the earlier comments allude to), but at the time I wrote the post, I wasn’t nearly as versed in the folk mythology of R.

    • Patrick, 2 February, 2013 at 8:04 am

      You know, you’re right, and there’s no reason not to edit the post, so I will…



