Data Science programming languages: (1) Resources for R
This month, let’s turn our attention to Data Science programming languages; today, resources for R.
Ever since the rise of R as an alternative to traditional statistical packages (like SAS, IBM Analytics etc.), there has been a growing focus on coding.
In the past I have tended to avoid these programming languages as a topic for this blog, as I have some concerns. Chiefly, I worry that as insight analysts take on the job title of Data Scientist, their role is being reduced to that of a programmer. Too much focus on coding skills & the capabilities of new packages can reduce the needed focus on interpretation, insight generation & influencing a business.
However, working with clients, I am seeing that a majority are now embracing this new generation of analytics tools/languages. So, I thought it would make sense to (hopefully) help readers by sharing the resources I have found online for a few of the most popular options.
In this post we will focus on the R programming language.
Resources for R: Learning the language
It makes sense to start by sharing how you or your team could learn this language. As more & more businesses seek to migrate from more expensive toolsets (like SAS) to use of R, what resources exist to accelerate learning curves?
If you search Google on this topic, you will find no end of blog posts, tutorials and videos providing introductions to the language. These are of varying quality. Some focus on helping you feel you can produce something visible ASAP rather than on mastering the basics, whilst others are, frankly, too boring.
As often with training, it pays to first validate the experience and aptitude of the tutor. For that reason, I am going to recommend a popular book by an author I know personally. Richard Pugh is Chief Data Scientist at Mango Solutions and a leading light (Silver Director) within the R Consortium. With his excellent work, he is doing as much as anyone to advance the understanding and use of R within the UK.
To learn R, quickly and well, I recommend his very accessible book (together with coauthors Andy Nicholls & Aimee Gott), “R in 24 hours”:
In just 24 lessons of one hour or less, Sams Teach Yourself R in 24 Hours helps you learn all the R skills you need to solve a wide spectrum of real-world data analysis problems. You’ll master the entire data analysis workflow, learning to build code that’s efficient, reproducible, and suitable for sharing with others.
Resources for R: Cheatsheets to help you remember
As your knowledge of R grows, especially through use of more packages to expand the capabilities of the basic language (more on that soon), you may struggle to remember how to do everything. The growing use of coding languages by analysts has, not surprisingly, also spawned a growing set of infographics or ‘cheat sheets’ as memory joggers.
A particularly useful collection of cheat sheets for R coders is maintained and published by RStudio. Hopefully you will find one or more here to help you at different stages of the analytics process. There are also some very handy contributed ones, not just those from RStudio themselves:
Open source and enterprise-ready professional software for data science
Resources for R: More packages for more functionality
Each of the programming languages we will look at this week has limitations in its base language. R is strong in its core focus, statistics & data science, but somewhat lacking in some of the data manipulation or logic functionality you might expect from other programming languages.
Where it really comes into its own, though, is the strength of the community or ecosystem that surrounds its use. The open source heritage of R still shows strongly, even if many modern users have purchased commercial implementations (e.g. Microsoft et al). That community has published a huge number of really helpful packages as add-ons to the basic language. Here is a collection, published by Computer Weekly, that impressed me as useful or of interest to the analytics work I know teams need to deliver:
But hard-core R programmers will already know that a great resource for R packages, frameworks and other aids is GitHub. If you are serious about R programming, it is worth becoming familiar with this library of technical wonders. Here is just one example, the “Awesome R” curation:
A curated list of awesome R packages and tools. Inspired by awesome-machine-learning.
gt – Easily generate information-rich, publication-quality tables from R
Integrated Development Environment
VSCode – vscode-R + vscode-r-lsp VSCode R Language Support
RStudio – A powerful and productive user interface for R.
Resources for R: Join the tribe at EARL conferences
I hope the above links and their resources have helped you either learn R or improve your capabilities with the language.
But human beings, even programmers, are social creatures and it can help to meet with others to share what you’ve discovered and learn more. We often share recommended events on this blog, because we believe there is still a role for face-to-face events in today’s L&D plans.
If you want to truly meet the R community and connect with others who might be facing similar challenges and have expertise to share, then a great option is the EARL conference. The full title of this event is “Enterprise Applications of the R Language” and its focus is just that. You’ll find plenty of expert speakers as well as active enthusiasts. As the Mango Solutions team are also actively involved, you might even bump into Richard Pugh there (you could get him to sign your book).
As I write, the next one in the UK is the London event in September 2017. Here are the details:
Resources for R: what has helped you?
I hope that was a useful collection of R resources to help you. Do you have others to share? If so, please publish those links in the comments box below. If there are particularly popular options, I will then update this post to include them.
You might have guessed that our next post will focus on Python. So, watch out for resources for that language next. Plus, there will be more languages to come…