
FREE Data Science Learning Roadmap

  • Writer: Simbarashe Chikaura
  • Aug 21, 2019
  • 5 min read

It's been four years since I began my love affair with Data Science, and it has been one hell of a ride. It wasn't always that way, however, especially at the beginning. Although the internet is a treasure trove of learning resources, that very abundance makes it overwhelming to decide which resource to start with and what to learn first. After many iterations through multiple MOOCs, I finally found my rhythm and nailed down the criteria a course needed to meet for me to learn the necessary data science skills from it.


This blog post is a compilation of the courses that I found most helpful when I was just beginning. I have arranged them in the order that I think one should follow when acquiring skills to become a data scientist. All of these options are free!



Although it might seem tempting to dive straight into programming and machine learning, it is imperative to first build a solid relationship with data, especially if you don't come from a background that deals with numbers and structured data.

This course eases you into data analysis without using any complicated verbiage or confusing statistical terms. As a matter of fact, it doesn't even involve any programming. The tool used is everyone's favourite data processing package, Excel. The main goal of the course is to teach you data literacy, i.e., how to think like a data analyst and which methods and procedures to follow to formulate relevant questions from a dataset so as to make data-driven decisions.


Python and R are the two leading scripting languages used in data science. With the exception of the first course above, all the courses in this blog post are based on Python. In my opinion, Dataquest is easily the best resource on the internet for learning Python for data science. They have a very good Python Basics course that teaches all the programming fundamentals such as variables and data types, collections, conditional statements, loops and functions.

Although this particular course is free, the ones that follow it are not. If you wish to carry on with this course provider, you'll have to pay US$29/month (it's worth every penny). However, if you want to stick with the free options, simply register, choose the Data Scientist/Analyst with Python path, and do the first free course.
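To give a flavour of the fundamentals listed above, here is a minimal sketch that touches variables, collections, a conditional, a loop and two functions. It is not taken from the Dataquest course; the quiz scores, the pass mark and the function names are made up for illustration.

```python
# Variables and basic data types (all values are invented).
course_name = "Python Basics"        # str
pass_mark = 70                       # int

# A collection: a list of quiz scores.
quiz_scores = [72, 85, 90, 64]


def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def grade(score, threshold=pass_mark):
    """Conditional logic: label a score as 'pass' or 'fail'."""
    if score >= threshold:
        return "pass"
    return "fail"


# A loop that fills a dictionary, another common collection.
results = {}
for number, score in enumerate(quiz_scores, start=1):
    results[f"quiz {number}"] = grade(score)

print(course_name, "average score:", average(quiz_scores))
print(results)
```

If you can read this snippet and predict what it prints, you already have most of what a basics course is trying to teach.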


***Dataquest is currently offering a scholarship for all of their courses. Apply here. It closes on Sep 3, 2019***


This course introduces data analysis in Python. It acquaints the learner with NumPy, the library used for numerical operations in Python. Pandas, the tabular data wrangling package, is also covered substantially in this course. Other important aspects such as exploratory data analysis and visualisations are covered too. The course caps off with a project through which the learner can test their newfound skills. I highly recommend taking this course, as data analysis is an essential part of the data science process.


Tidy data is very rare in data science and analytics. You are almost always guaranteed to receive data in a format you're not comfortable with: data that has missing values, wrong column names, etc.

This course takes the learner through a comprehensive data cleaning experience covering a host of scenarios. As a bonus, it even teaches the learner regular expressions.
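As a rough illustration of the kind of problems mentioned above (messy column names, missing values) and of how regular expressions help, here is a small sketch using only Python's standard library. The rows, column names and helper functions are hypothetical, and the course itself may well use other tools.

```python
import re

# Hypothetical raw records with untidy column names and a missing age.
raw_rows = [
    {" Customer Name ": "Ada", "Phone No.": "(021) 555-0134", "Age": "36"},
    {" Customer Name ": "Ben", "Phone No.": "021.555.0178", "Age": ""},
]


def clean_column(name):
    """Lower-case a column name and turn runs of other characters into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def clean_row(row):
    """Normalise column names, keep only digits in the phone, parse the age."""
    cleaned = {clean_column(key): value.strip() for key, value in row.items()}
    cleaned["phone_no"] = re.sub(r"\D", "", cleaned["phone_no"])
    # An empty string is treated as a missing value.
    cleaned["age"] = int(cleaned["age"]) if cleaned["age"] else None
    return cleaned


clean_rows = [clean_row(row) for row in raw_rows]
print(clean_rows)
```

The pattern `\D` matches any non-digit character, so both phone formats collapse to the same digits-only form.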


As a data scientist you'll be dealing with data coming in from multiple sources. One of those sources is almost definitely going to be a database. There are two broad types of databases: SQL databases and NoSQL databases. This course focuses on SQL databases. SQL is short for Structured Query Language. It is a language used to define, manage and retrieve the data stored in a relational database. SQL is an invaluable tool for any data professional, and data scientists are no exception.
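To make the idea of storing and retrieving data with SQL concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module and an in-memory database. The `sales` table and its contents are invented for the example; a real project would connect to whatever database your data lives in.

```python
import sqlite3

# An in-memory SQLite database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Define a table and insert a few made-up rows.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0)],
)

# Retrieve information: total sales per region, sorted by region name.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

for region, total in rows:
    print(region, total)
```

The `?` placeholders let the database driver handle quoting, which is safer than building SQL strings by hand.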


Mastering the command line is essential for anyone undertaking programming. The efficiency of the terminal should not be underestimated, and the basics should be at every data scientist's fingertips.

This DataCamp course has 5 chapters which introduce the basics of navigating the command line as a programmer. The first chapter is free, while the remaining 4 need a paid subscription. However, the first one is sufficient to get you up and running for what you'll need at this level. The rest can be learnt by correctly using Google.


Data Science needs a solid statistical foundation, and just knowing how to calculate the mean, median and mode won't cut it. When dealing with data, one will often encounter situations that call for statistical modelling. Statistical modelling is a simplified, mathematically formalized way to approximate reality (i.e. what generates your data) and, optionally, to make predictions from this approximation. This specialisation covers that fairly well.
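One of the simplest statistical models is a straight line fitted by ordinary least squares. The sketch below is my own illustration rather than material from the specialisation; the data points (hours studied against a test score out of 12) are invented, and the function names are hypothetical.

```python
# Invented data: hours studied (x) and score out of 12 (y).
hours = [1, 2, 3, 4, 5]
scores = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance of x and y divided by the variance of x.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(x_new, intercept, slope):
    """Use the fitted approximation to predict an unseen value."""
    return intercept + slope * x_new


intercept, slope = fit_line(hours, scores)
print(f"score ~ {intercept:.2f} + {slope:.2f} * hours")
print(f"predicted score after 6 hours: {predict(6, intercept, slope):.2f}")
```

The fitted line is the "approximation of reality"; calling `predict` on a value outside the data is the optional prediction step.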


Once you have a solid grasp of statistical modelling, the next step is to get acquainted with probability and inference. Statistical inference is the process of using data analysis to deduce properties of an underlying probability distribution. This course takes you through real-life scenarios where this would be useful.
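As one small example of inference, the sketch below estimates a 95% confidence interval for a population mean with a percentile bootstrap. The bootstrap is my choice of technique for the illustration, not necessarily what the course teaches, and the sample values and function name are made up.

```python
import random
import statistics

# Invented sample of ten measurements.
sample = [4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.1, 5.7, 4.4, 5.2]


def bootstrap_mean_ci(data, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    lower = means[int(tail * n_resamples)]
    upper = means[int((1 - tail) * n_resamples) - 1]
    return lower, upper


lower, upper = bootstrap_mean_ci(sample)
print(f"sample mean: {statistics.fmean(sample):.2f}")
print(f"95% bootstrap interval for the mean: ({lower:.2f}, {upper:.2f})")
```

The interval is a statement about the unknown mean of the process that produced the sample, which is exactly the "property of an underlying distribution" that inference is after.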


Machine Learning is probably the first concept that comes to mind when one hears of data science, and rightly so. It is probably the bridge between a data analyst and a data scientist: the ability to make predictions rather than just tell a story with data. This course from IBM eases you into using machine learning models very effectively. It sets you up nicely for courses which involve much more sophisticated concepts such as deep learning and neural networks.
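To show what "making predictions" looks like in its simplest form, here is a tiny k-nearest-neighbours classifier written in plain Python. It is only a sketch of the idea, not the approach used in the IBM course; the training points, labels and function name are invented.

```python
import math

# Invented training data: (feature_1, feature_2) -> label.
training_data = [
    ((1.0, 1.2), "small"),
    ((0.8, 1.0), "small"),
    ((3.1, 3.0), "large"),
    ((2.9, 3.4), "large"),
]


def predict_label(point, data=training_data, k=3):
    """Predict a label by majority vote among the k closest training points."""
    # Sort the training examples by Euclidean distance to the new point.
    neighbours = sorted(data, key=lambda item: math.dist(item[0], point))[:k]
    labels = [label for _, label in neighbours]
    # The most frequent label among the neighbours wins.
    return max(set(labels), key=labels.count)


print(predict_label((1.1, 0.9)))
print(predict_label((3.0, 3.2)))
```

The model has never seen either of the two query points, yet it assigns each a label from the examples it was given, which is the step that goes beyond describing the data.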



This is the end of the list of suggested courses to get you started in Data Science. It is by no means exhaustive, as the internet is filled with courses from multiple institutions that cover the same information. You should note that this is my opinion of the courses that I found helpful during my journey. Different institutions teach the same things differently. Likewise, learners receive the materials differently. It's not that some courses are better than others; sometimes it just ends up being a matter of preference. Make use of blog posts and articles on specific topics whenever you find that the courses are not sufficient. I always find that the DataCamp, Dataquest and Medium's Towards Data Science blogs have very good tutorials.


Also remember that data science certificates matter very little to employers. If you want something to prove your skill set, rather include links to completed projects on something like a GitHub portfolio or even a personal website. Data Science has been around for a few years now, but it is still very much a new field. As such, no standards have been agreed upon for what makes a good data science certification, so no one really cares about that piece of paper; what matters is your skills. Otherwise, happy learning. If you need any help in your journey, please feel free to contact me. My contact details can be found on my home page.

 
 
 



©2019 by Simbarashe Chikaura
