The State of Data Science in Zimbabwe - Part 1: What is Data Science?
- Simbarashe Chikaura
- Aug 8, 2019
This blog post is the first of my inaugural 2-part series, “The State of Data Science in Zimbabwe”. You’ve probably heard the terms Data Science and/or Big Data Analytics tossed around in recent years. If you haven’t, that’s ok too. That’s what this post is for. My series is meant to look at Data Science in the Zimbabwean context, i.e., what its status is and how it has progressed.
Part 1 of this blog is meant to be a brief introduction to the discipline for anyone who is not familiar with Data Science, or who simply needs a refresher. If you consider yourself acquainted with the subject & do not want to go through another “What is Data Science?” article, please skip straight ahead to Part 2.
Introduction
I was probably in my final year when it dawned on me that I was running out of time to choose a career path that fit with my ambitions, goals and vision. Granted, one’s last year in college is cutting it too close when it comes to figuring out what you want to do with your degree, but if my year on attachment had taught me anything, it was that what I thought I wanted was not what I wanted. So on to Google I went and performed a search query along the lines of “What to do with a Statistics degree?” In the top 5 results, at least 3 had the term ‘Data Science’. I had never heard of it before. What was this ominous field that Harvard Business Review had called “the sexiest job of the 21st century”?
Definition
At the most abstract level, Data Science can be described as the intersection of the fields of Math/Statistics, Computer Science and Domain/Business Expertise. It is a set of tools, rules & procedures used to derive meaning & insights from data in order to provide solutions.
What does this mean exactly?

Let’s break down the Venn Diagram:
Math/Statistics is the subject area nerd types use to collect, group, analyse and communicate data, which is mostly in the form of numbers. This expertise can be used to solve and/or manage a number of problems. An easy-to-understand use case of Statistics is a national census: citizen data is collected and grouped by demographics, and the findings, along with their implications, are then communicated to stakeholders. A more sophisticated use case could be the street design of a city (location of traffic lights, time intervals of traffic light changes, streets that should be one-way, etc.). This can be done by modelling how vehicles are expected to move through the city based on population, traffic frequency and geography.
Computer Science, another field which the guys in glasses dominate, is easier to grasp, at least when it comes to its application. From the very device you’re reading this on to business packages like Excel, software can be seen solving problems all around us. Fundamentally, computer science is used to simplify tasks that would otherwise be complex and time-consuming. Take finding the average monthly sales of a retail store, for example. If computers did not exist, one would have to add up the monthly figures manually, then divide by the number of months, on paper (remember arithmetic from those math classes in high school? You'd have to do that in real life). With a computer program like Excel, you can perform that very same task in seconds using a formula.
Both of these fields have been used in business before, with software packages such as Excel and SAP assisting business operations for decades now. Likewise, statistical methods have long been the bedrock of market research, performance metrics and financial projections.
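To make the average monthly sales example concrete, here is a minimal Python sketch of the same calculation. The sales figures and the function name average_sales are made up purely for illustration.

```python
# Hypothetical monthly sales figures for a retail store (illustrative values only).
monthly_sales = [12000, 15000, 11000, 18000, 16000, 14000]


def average_sales(sales):
    """Add up the figures, then divide by how many figures there are."""
    return sum(sales) / len(sales)


average = average_sales(monthly_sales)
print(f"Average monthly sales: {average:.2f}")
```

This is the same add-then-divide arithmetic you would do on paper, only the computer does it instantly, however many months you give it.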
So what’s the BIG deal?

It’s clear that the fields that make up Data Science are nothing new. In fact, they’ve been around for a long time and have interacted with each other to varying degrees. What’s with the buzzwords, then? What makes data science special? DATA, and lots of it. According to Data Never Sleeps, the world produces 2.5 exabytes of data in a day. To put that into context, that’s the equivalent of 312.5 million flash drives of size 8GB. Now imagine that much data being produced every single day. In fact, 90% of the data available today was reportedly produced in the last 2 years alone.
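The flash drive comparison is a simple division, and it is worth checking. The sketch below assumes decimal units (1 exabyte = 10^18 bytes, 1 gigabyte = 10^9 bytes); the variable names are my own.

```python
# Decimal (SI) units are assumed here.
EXABYTE = 10**18
GIGABYTE = 10**9

daily_data_bytes = 25 * EXABYTE // 10   # 2.5 exabytes produced per day
flash_drive_bytes = 8 * GIGABYTE        # capacity of one 8GB flash drive

# How many 8GB flash drives would one day's worth of data fill?
drives_per_day = daily_data_bytes // flash_drive_bytes
print(f"{drives_per_day:,} flash drives per day")
```

Under these assumptions, one day of data fills 312.5 million 8GB flash drives.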

Fortunately, the data boom has coincided with advancements in tech. The cloud can now store all of this data with ease, and it simplifies the tasks of accessing & retrieving it. Combined with modern programming tools, this has made it possible to manipulate the data & tell stories with it.
Applications
Now moving on to the last, and probably most important, part of my TED talk. Data Science use cases are practically endless, so I won’t attempt to list them all. I will, however, discuss the ones that I personally find the most interesting.
Fraud Detection
Every enterprise in which money is exchanged for a product or service is open to abuse, whether that be by theft or manipulation. Treating each transaction as a data point, the transactions can be viewed as a distribution which follows a certain pattern, as consumer behaviour is often repetitive. Whenever a disturbance to that pattern occurs, it can then be identified as possible fraud. Take for instance a mobile money service through which users transfer money to each other. An algorithm can be trained to isolate users that frequently perform transactions of anomalously high value. Data from other sources can then be integrated with these results, for example, the business backgrounds of the transactors. If the movements of money do not match the scale of the users’ businesses, they’re flagged for possible fraud.
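As a rough illustration of the mobile money example, the sketch below flags users who repeatedly send unusually large amounts. It is a simple rule-based stand-in, not a trained model: the user names, the amounts, and the thresholds (10 times the median, at least 2 occurrences) are all assumptions chosen for the example.

```python
from collections import Counter
from statistics import median


def flag_suspicious_users(transactions, multiple=10, min_count=2):
    """Return users with at least `min_count` transactions whose value
    exceeds `multiple` times the median transaction value."""
    amounts = [amount for _, amount in transactions]
    threshold = multiple * median(amounts)
    # Count the anomalously high transactions made by each user.
    high_value_counts = Counter(
        user for user, amount in transactions if amount > threshold
    )
    return sorted(
        user for user, count in high_value_counts.items() if count >= min_count
    )


# Hypothetical (user, amount) pairs, for illustration only.
transactions = [
    ("tendai", 20), ("tendai", 35), ("rudo", 15), ("rudo", 25),
    ("farai", 30), ("farai", 5000), ("farai", 7500),
    ("chipo", 40), ("chipo", 4000),
]

flagged = flag_suspicious_users(transactions)
print(flagged)
```

Here only farai is flagged: chipo has one large transfer, but a single outlier is not "frequent". A real system would go on to compare the flagged users against other data, such as the scale of their businesses, before calling anything fraud.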
Supply Chain Analytics
If the revenue of your business involves product sales, then you’re at a critical disadvantage if Supply Chain Management is not a chief driving factor of your business strategy. We now live in the era of globalisation. This means that certain raw materials or stock units may only be available from certain parts of the world. Keeping the final product competitively priced will mean choosing cheaper shipping options, e.g., sea freight, which in turn will increase lead/arrival times. All this will have to be managed along with limited storage space. Data Science provides a solution via supply chain planning, i.e., demand forecasting of sales, production and/or inventory optimisation. If you know how much you’re going to sell in advance, then you know how much you need to order and at what times. If this piques your interest, follow this link to a project I did on Demand Forecasting for a Retail Company.
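The link between a forecast and an order can be sketched in a few lines. The example below uses a plain moving average, which is one of the simplest forecasting methods and not necessarily what a production system would use; the sales history, the window size and the function names are assumptions for illustration.

```python
import math


def moving_average_forecast(sales_history, window=3):
    """Forecast next period's sales as the mean of the last `window` periods."""
    if len(sales_history) < window:
        raise ValueError("not enough history for the chosen window")
    recent = sales_history[-window:]
    return sum(recent) / window


def order_quantity(forecast, stock_on_hand):
    """Units to order so that stock covers the forecast (never negative)."""
    return max(0, math.ceil(forecast - stock_on_hand))


# Hypothetical units sold per month, for illustration only.
sales_history = [120, 135, 150, 160, 170, 180]

forecast = moving_average_forecast(sales_history)
to_order = order_quantity(forecast, stock_on_hand=50)
print(f"Forecast: {forecast:.0f} units, order: {to_order} units")
```

With long sea-freight lead times, the same idea would be applied several periods ahead, so that the order is placed early enough to arrive when the stock is needed.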
Healthcare
In what is probably one of the most important applications of data science, machine learning models can be trained to identify cancerous cells in a given sample. The way this works is that the mathematical model is fed with data on cells that are both cancerous and healthy. This data is labelled accordingly. The model can then learn from the features of the cells what constitutes a healthy cell, and what does not. When the model is then fed with a cell whose label is unknown, it can check the features of the unknown cell to see if they correspond to those of a healthy cell or a cancerous one, based on what it has learnt from the labelled cells. In some studies, this method has been found to be not only faster than a human manually evaluating the features, but more accurate as well.
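The learn-from-labelled-examples idea can be shown with a toy nearest-centroid classifier. This is a deliberately simple technique chosen for readability, not the kind of model used in real diagnostics, and the two features (cell radius and a texture score) and all the numbers are invented for the example.

```python
import math
from collections import defaultdict


def train(samples):
    """Learn one 'average cell' (centroid) per label from labelled samples."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest to the unknown cell."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical labelled data: (radius, texture score), label.
labelled_cells = [
    ((6.0, 1.0), "healthy"), ((7.0, 1.5), "healthy"), ((6.5, 1.2), "healthy"),
    ((14.0, 4.0), "cancerous"), ((15.5, 4.5), "cancerous"), ((13.0, 3.8), "cancerous"),
]

model = train(labelled_cells)
print(predict(model, (6.8, 1.1)))
print(predict(model, (14.2, 4.1)))
```

The first unknown cell sits close to the healthy examples and the second close to the cancerous ones, so they are labelled accordingly. Real models work on the same principle, but with far more features, far more data and careful validation.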
Sentiment Analysis
You can only sell a product/service to an audience for as long as that audience feels like buying it. This is where social media is a gold mine. People now share everything about their lives on social media, worryingly so. But that can also be good news, because this means they’ll also share their views on your product/brand. These views can be mined & translated into meaning using text analytics. By tracking keywords (brand/product name), you can investigate the kinds of terms that a particular keyword is associated with, on average. This is known as classification, & you can classify the sentiment of users/customers towards a particular brand as bad, good or neutral. Corrective action can then be taken if necessary. If you’d like to know more about this, please take a look at a project I did on Econet Sentiment Analysis on Twitter.
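Here is a minimal sketch of keyword tracking plus sentiment classification. It uses small hand-picked word lists, which is a much cruder approach than a trained text classifier; the brand name "AcmeNet", the word lists and the posts are all hypothetical.

```python
import re
from collections import Counter

# Tiny hand-picked word lists, assumed for illustration only.
POSITIVE = {"good", "great", "love", "fast", "reliable"}
NEGATIVE = {"bad", "slow", "hate", "expensive", "poor"}


def classify(post):
    """Label a post good, bad or neutral by counting sentiment words."""
    words = re.findall(r"[a-z']+", post.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "good"
    if score < 0:
        return "bad"
    return "neutral"


def brand_sentiment(posts, keyword):
    """Tally sentiment labels over the posts that mention the keyword."""
    mentions = [p for p in posts if keyword.lower() in p.lower()]
    return Counter(classify(p) for p in mentions)


# Hypothetical posts about a made-up brand.
posts = [
    "I love AcmeNet, the network is fast",
    "AcmeNet data bundles are expensive and slow",
    "Just topped up my AcmeNet line",
    "Great weather today",
]

summary = brand_sentiment(posts, "AcmeNet")
print(dict(summary))
```

The last post is ignored because it never mentions the brand. Tracking how these tallies shift over time is what tells you whether corrective action is needed.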
Conclusion
Technical complexity aside, Data Science is simply a new way of solving problems, and it’s rapidly increasing in popularity, too. It’s an efficient discipline that enables us to make informed decisions quickly. It’s a field that, when used correctly, can greatly improve people’s lives. And the nice thing about data science is that anyone can learn it, regardless of technical background. If you're willing to put in the necessary hours, you'll be up and running in no time.
This is the end of the first part of this blog series. Please move on to Part 2 for a dive into the State of Data Science in Zimbabwe.