Chromebook Data Science
We’re finally able to announce the official launch of our newest MOOC, Chromebook Data Science, a set of 12 courses offered on the Leanpub platform. Jeff Leek has explained this program in detail in a separate blog post, but briefly, these MOOCs are our attempt to minimize the barriers to entry into data science. The courses are pay-what-you-want, so the entire course set can be taken at no cost. All of the learning happens through a web browser, so any laptop or Chromebook can be used to complete the material. And the content has been developed without requiring any background knowledge in computing.
The point of this blog post, however, is to thank and acknowledge all of the people outside our group whose work helped make the development of this content possible.
Thank You
In addition to content developed by members of our group, we built upon the work of others to generate the content in these courses. As we developed the material, I did my best to keep an exhaustive list of everyone whose work we leaned on. This post is my humble attempt to thank all of these people.
Big Thanks
It probably goes without saying that much of the content generated has been either directly influenced or indirectly inspired by the work of Hadley Wickham and Jenny Bryan. Specifically, however, we would be remiss not to thank Hadley for both writing R for Data Science and for his contributions to the tidyverse packages. Additionally, we rely heavily on Jenny Bryan’s instructional approach to teaching version control and on the googlesheets package throughout the course set.
Beyond this, I think the best way to give individual thanks is course by course. This way, each of you knows where your work has been used and can most easily see how we’ve used and attributed it.
Data Tidying
In the Data Tidying course, learners are taught within the tidy data framework. These concepts would not be nearly as accessible or as programmatically approachable for new learners without Hadley’s (and others’!) contributions to the tidyverse set of packages.
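To give a sense of what the tidy data framework looks like in practice, here is a minimal sketch using a small, made-up dataset (the column names and values are placeholders, not examples from the course):

```r
library(tidyr)
library(dplyr)

# A made-up "untidy" table: one column per year, so the variable "year"
# is spread across column names rather than stored in its own column
untidy <- tibble(
  country = c("A", "B"),
  `2019`  = c(100, 200),
  `2020`  = c(110, 210)
)

# Tidied: each variable is a column, each observation is a row
tidy <- untidy %>%
  pivot_longer(cols = c(`2019`, `2020`),
               names_to = "year",
               values_to = "cases")

tidy
```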
Additionally, we used examples of data tidying from Miles McBain and Sharla Gelfand to demonstrate what untidy data are and what those data look like once they have been tidied. Thank you to Sharla and Miles for their wonderful blog posts demonstrating data tidying:
- Tidying the Australian Same Sex Marriage Postal Survey Data with R, by Miles McBain
- Tidying and mapping Toronto open data, by Sharla Gelfand
Lastly, in this course we relied heavily on Suzan Baert’s four amazing dplyr tutorials. I’ve attributed her work throughout the lessons and have linked to her blog posts in our courses. If you haven’t looked through them yet, I highly recommend them: Part 1, Part 2, Part 3, and Part 4.
Data Visualization
Data visualization is taught in this course set using ggplot2 exclusively, so more thanks to Hadley for his work and to all contributors to the ggplot2 package!
Additionally, we used a graph from a blog post by Lisa Charlotte Rost to demonstrate how to take a plot from exploratory and unpolished to polished and ready for publication. If you’re unfamiliar with Lisa Charlotte Rost’s work in data visualization (spoiler: she’s amazing!), check out Datawrapper and its blog.
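For readers curious what that exploratory-to-polished step can look like in ggplot2, here is a rough sketch; the dataset (mtcars) and the labels are stand-ins, not the example from Lisa’s post:

```r
library(ggplot2)

# Exploratory: a quick look with default labels and theme
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Polished: an informative title, readable axis labels, and a cleaner theme
p +
  labs(
    title = "Heavier cars get fewer miles per gallon",
    x = "Weight (1,000 lbs)",
    y = "Miles per gallon"
  ) +
  theme_minimal()
```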
Getting Data
In the Getting Data course, we owe thanks to:
- Jenny Bryan, for her googlesheets package and her incredible ability to write helpful documentation
- Tyler Clavelle, for his blog post Using R to extract data from web APIs
- Kan Nishida, for his blog post Working with JSON data in very simple way (a brief sketch of pulling JSON from a web API follows this list)
- Jose Roberto Ayala Solares, for his Web Scraping Tutorial in R
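As promised above, here is a hedged sketch of pulling JSON data from a web API with httr and jsonlite; the URL is a hypothetical placeholder, not an endpoint used in the course:

```r
library(httr)
library(jsonlite)

url <- "https://api.example.com/v1/records"   # hypothetical endpoint

resp <- GET(url)
stop_for_status(resp)   # stop with an informative error if the request failed

# Parse the JSON body into R objects (lists / data frames)
parsed <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
str(parsed)
```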
Data Analysis
In the Data Analysis course, we relied heavily on David Robinson’s blog post, Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half, as a wonderful example of how to formulate a data science question and determine whether you have the data you need, and for his contributions to the tidytext package.
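The tidytext workflow itself is compact; here is a small sketch with invented stand-in tweets (not the Trump tweet data from David’s post):

```r
library(dplyr)
library(tidytext)

tweets <- tibble(
  id   = 1:2,
  text = c("Just had a great meeting with the team",
           "The coverage has been very unfair")
)

tweets %>%
  unnest_tokens(word, text) %>%          # one row per word
  anti_join(stop_words, by = "word") %>% # drop common stop words
  count(word, sort = TRUE)
```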
The lessons in this course also benefited from:
- Nick Tierney’s awesome work, in particular his neato package for visualizing missing data
- Max Kuhn’s incredible caret package for predictive modeling (a brief sketch follows this list)
- Michael Hoffman and Carl de Boer’s helpful Twitter discussion about predictive modeling terminology
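For the curious, here is a minimal caret sketch of the predictive modeling workflow; the iris dataset and the rpart model are stand-ins, not course materials:

```r
library(caret)

set.seed(123)

# Hold out 20% of the rows for testing
in_train <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
training <- iris[in_train, ]
testing  <- iris[-in_train, ]

# Fit a simple classification tree with 5-fold cross-validation
fit <- train(Species ~ ., data = training, method = "rpart",
             trControl = trainControl(method = "cv", number = 5))

# Predict classes for the held-out rows
predict(fit, newdata = testing)
```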
Written & Oral Communication in Data Science
In Written and Oral Communication in Data Science, we utilized the work of others as examples of how to communicate effectively as a data scientist:
- Julia Silge’s Text Mining the tidy Way presentation (video and visuals), and her contributions to the tidytext package
- Lucy D’Agostino McGowan’s Harnessing the Power of the Web via R Clients for Web APIs presentation
- Suzan Baert’s four-part data wrangling series, as an example of how to write a “How-To” blog post
- Hilary Parker’s “Writing an R package from Scratch”, as an example of how to write a “How-To” blog post
- Greg Wilson’s Twitter Thread on Meetings, as guidelines for when and how to have meetings.
Getting a Job in Data Science
In Getting a Job in Data Science we’re thankful for contributions from:
- Yihui Xie, for the blogdown package (a quick sketch appears after this list) and his blog post You Do Not Need to Tell Me I have a Typo in My Documentation
- Renee Teate, for being supportive of and helpful to others on Twitter who are interested in getting a job in data science
- Emily Robinson, for her blog post Advice For Applying to Data Science Jobs, among many other helpful blog posts
- Mona Chalabi, for having a project gallery on her website (and for her stunning art & data visualizations)
- David Robinson, for his website, which we used as an example
- Nathan Yau, for Flowing Data, which we used as an example
- Kyle Scot Shank, for helping us out and providing an example of a data science interview take-home
- Mikhail Popov from the Wikimedia Foundation, for publicly sharing a Data Analysis Task as an example of a task one may have to complete during an interview
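As a quick illustration of why blogdown lowers the barrier to building a personal site, here is a minimal sketch; the theme shown is just one possibility, not a course requirement:

```r
# install.packages("blogdown")
library(blogdown)

# Create a new Hugo site in the current project directory
# (blogdown can install Hugo for you via install_hugo() if needed)
new_site(theme = "yihui/hugo-lithium")

# Preview the site locally while you write and edit posts
serve_site()
```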