Chromebook Data Science
We’re finally able to announce the official launch of our newest MOOC, Chromebook Data Science, a set of 12 courses offered on the Leanpub platform. Jeff Leek has explained this program in detail in a separate blog post, but briefly, these MOOCs are our attempt to minimize the barriers to entry into data science. The courses are pay-what-you-want, so the entire course set can be taken at no cost. All of the learning happens through a web browser, so any laptop or Chromebook can be used to complete the material. And the content has been developed without requiring any background knowledge in computing.
The point of this blog post, however, is to thank and acknowledge all of the people outside our group whose work helped make the development of this content possible.
Thank You
In addition to content developed by members of our group, we built upon the work of others to generate the content in these courses. As we developed the material, I did my best to keep an exhaustive list of everyone whose work we leaned on. This post is my humble attempt to thank all of these people.
Big Thanks
It probably goes without saying that much of the content generated has been either directly influenced or indirectly inspired by the work of Hadley Wickham and Jenny Bryan. Specifically, however, we would be remiss not to thank Hadley for both writing R for Data Science and for his contributions to the tidyverse packages. Additionally, we rely heavily on Jenny Bryan’s instructional approach to teaching version control and on the googlesheets package throughout the course set.
Beyond this, I think the best way to give individual thanks is course by course. This way, each of you knows where your work has been used and can most easily see how we’ve used and attributed it.
Data Tidying
In the Data Tidying course, learners are taught within the tidy data framework. These concepts would not be nearly as accessible or as programmatically approachable for new learners without Hadley’s (and others’!) contributions to the tidyverse set of packages.
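To give a sense of what the tidy data framework looks like in practice, here is a minimal sketch using a small, made-up dataset (the column names and values are placeholders, not examples from the course):

```r
library(tidyr)
library(dplyr)

# A made-up "untidy" table: one column per year, so the variable "year"
# is spread across column names rather than stored in its own column
untidy <- tibble(
  country = c("A", "B"),
  `2019`  = c(100, 200),
  `2020`  = c(110, 210)
)

# Tidied: each variable is a column, each observation is a row
tidy <- untidy %>%
  pivot_longer(cols = c(`2019`, `2020`),
               names_to = "year",
               values_to = "cases")

tidy
```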
Additionally, we used examples of data tidying from Miles McBain and Sharla Gelfand to demonstrate what untidy data are and what those data look like once they have been tidied. Thank you to Sharla and Miles for their wonderful blog posts demonstrating data tidying:
- Tidying the Australian Same Sex Marriage Postal Survey Data with R, by Miles McBain
- Tidying and mapping Toronto open data, by Sharla Gelfand
Lastly, in this course we relied heavily on Suzan Baert’s four amazing dplyr tutorials. I’ve attributed her work throughout the lessons and have linked to her blog posts in our courses. If you haven’t looked through them yet, I highly recommend them: Part 1, Part 2, Part 3, and Part 4.
Data Visualization
Data visualization is taught in this course set using ggplot2 exclusively, so more thanks to Hadley for his work and to all contributors to the ggplot2 package!
Additionally, we used a graph from a blog post by Lisa Charlotte Rost to demonstrate how to take a plot from exploratory and unpolished to polished and ready for publication. If you’re unfamiliar with Lisa Charlotte Rost’s work in data visualization (spoiler: she’s amazing!), check out Datawrapper and its blog.
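For readers curious what that exploratory-to-polished step can look like in ggplot2, here is a rough sketch; the dataset (mtcars) and the labels are stand-ins, not the example from Lisa’s post:

```r
library(ggplot2)

# Exploratory: a quick look with default labels and theme
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Polished: an informative title, readable axis labels, and a cleaner theme
p +
  labs(
    title = "Heavier cars get fewer miles per gallon",
    x = "Weight (1,000 lbs)",
    y = "Miles per gallon"
  ) +
  theme_minimal()
```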
Getting Data
In the Getting Data course, we owe thanks to:
- Jenny Bryan, for her googlesheets package and her incredible ability to write helpful documentation
- Tyler Clavelle, for his blog post Using R to extract data from web APIs
- Kan Nishida, for his blog post Working with JSON data in very simple way (a brief sketch of pulling JSON from a web API follows this list)
- Jose Roberto Ayala Solares, for his Web Scraping Tutorial in R
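As promised above, here is a hedged sketch of pulling JSON data from a web API with httr and jsonlite; the URL is a hypothetical placeholder, not an endpoint used in the course:

```r
library(httr)
library(jsonlite)

url <- "https://api.example.com/v1/records"   # hypothetical endpoint

resp <- GET(url)
stop_for_status(resp)   # stop with an informative error if the request failed

# Parse the JSON body into R objects (lists / data frames)
parsed <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
str(parsed)
```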
Data Analysis
In the Data Analysis course, we relied heavily on David Robinson’s blog post, Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half, as a wonderful example of how to formulate a data science question and determine whether you have the data you need, and for his contributions to the tidytext package.
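The tidytext workflow itself is compact; here is a small sketch with invented stand-in tweets (not the Trump tweet data from David’s post):

```r
library(dplyr)
library(tidytext)

tweets <- tibble(
  id   = 1:2,
  text = c("Just had a great meeting with the team",
           "The coverage has been very unfair")
)

tweets %>%
  unnest_tokens(word, text) %>%          # one row per word
  anti_join(stop_words, by = "word") %>% # drop common stop words
  count(word, sort = TRUE)
```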
The lessons in this course also benefited from:
- Nick Tierney’s awesome work, in particular his neato package for visualizing missing data
- Max Kuhn’s incredible caret package for predictive modeling (a brief sketch follows this list)
- Michael Hoffman and Carl de Boer’s helpful Twitter discussion about predictive modeling terminology
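For the curious, here is a minimal caret sketch of the predictive modeling workflow; the iris dataset and the rpart model are stand-ins, not course materials:

```r
library(caret)

set.seed(123)

# Hold out 20% of the rows for testing
in_train <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
training <- iris[in_train, ]
testing  <- iris[-in_train, ]

# Fit a simple classification tree with 5-fold cross-validation
fit <- train(Species ~ ., data = training, method = "rpart",
             trControl = trainControl(method = "cv", number = 5))

# Predict classes for the held-out rows
predict(fit, newdata = testing)
```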
Written & Oral Communication in Data Science
In Written and Oral Communication in Data Science, we utilized the work of others as examples of how to communicate effectively as a data scientist:
- Julia Silge’s Text Mining the tidy Way presentation (video and visuals), and her contributions to the tidytext package
- Lucy D’Agostino McGowan’s Harnessing the Power of the Web via R Clients for Web APIs presentation
- Suzan Baert’s four-part data wrangling series, as an example of how to write a “How-To” blog post
- Hilary Parker’s “Writing an R package from Scratch”, as an example of how to write a “How-To” blog post
- Greg Wilson’s Twitter Thread on Meetings, as guidelines for when and how to have meetings.
Getting a Job in Data Science
In Getting a Job in Data Science we’re thankful for contributions from:
- Yihui Xie, for the blogdown package (a quick sketch appears after this list) and his blog post You Do Not Need to Tell Me I have a Typo in My Documentation
- Renee Teate, for being supportive of and helpful to others on Twitter who are interested in getting a job in data science
- Emily Robinson, for her blog post Advice For Applying to Data Science Jobs, among many other helpful blog posts
- Mona Chalabi, for having a project gallery on her website (and for her stunning art & data visualizations)
- David Robinson, for his website, which we used as an example
- Nathan Yau, for Flowing Data, which we used as an example
- Kyle Scot Shank, for helping us out and providing an example of a data science interview take-home
- Mikhail Popov from the Wikimedia Foundation, for publicly sharing a Data Analysis Task as an example of a task one may have to complete during an interview
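As a quick illustration of why blogdown lowers the barrier to building a personal site, here is a minimal sketch; the theme shown is just one possibility, not a course requirement:

```r
# install.packages("blogdown")
library(blogdown)

# Create a new Hugo site in the current project directory
# (blogdown can install Hugo for you via install_hugo() if needed)
new_site(theme = "yihui/hugo-lithium")

# Preview the site locally while you write and edit posts
serve_site()
```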