
The State of Data Science in Zimbabwe Part 2: Report

  • Writer: Simbarashe Chikaura
  • Aug 8, 2019
  • 11 min read

This is the second and final instalment of my two-part blog series “The State of Data Science in Zimbabwe”. If you aren’t acquainted with Data Science as a discipline, I’d advise you to read Part 1: “What is Data Science?” first. Otherwise, proceed. Get ready for a LOT of visualisations.


Introduction

If you perform a Google search along the lines of “the State of Data Science”, you’ll find quite a few interesting reports. Some of the ones I’ve followed religiously over the last few years are those compiled by Anaconda, CrowdFlower and Stack Overflow. However, as informative as these documents are, they only go as far as showing what the data science space looks like in America, and maybe India and China at best. Nothing about Africa, and definitely nothing about Zimbabwe. This is restrictive if you are interested in Data Science and how it fits in a Zimbabwean context but can’t find the results to help you make sense of it. That is what this report is for. It’s meant to be a resource for anyone who would like a holistic snapshot of data science locally.


What are the key dimensions of the report?

  1. Methodology: Which techniques and resources were used to arrive at the findings in this report?

  2. Industry: Which industries have the most analytics professionals, and in what categories?

  3. Talent: What is the education and skills background of the people working in the field?

  4. Institutions: What institutions are actively promoting or offering data science related programs?

  5. Popularity: How has the popularity of data science changed over the last few years?

  6. Challenges: What challenges do data analytics professionals face?

  7. Conclusion


1. Methodology

I used a survey to gather insights from analytics professionals in Zimbabwe, and closed it when it reached a cap of 100 responses. The sample of respondents was selected using stratified random sampling. The stratum with the most responses came from LinkedIn. The LinkedIn search query I used was “data analyst OR data scientist”, with the location set to Zimbabwe. The LinkedIn search engine is smart enough to also return profiles with titles related to what you’re searching for. For instance, the query also returned profiles matching ‘Business Analyst’, ‘Business Intelligence Analyst’ and ‘Statistician’. However, it should be noted that LinkedIn only returns profiles within your 3rd degree connections, so someone with a different network to mine who ran the same query could get different results, that is, a different sample. Therefore, this method is not entirely random (you can read more about 1st, 2nd and 3rd degree connections here). The other respondents came from Twitter and the Data Science Zimbabwe membership.
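To make the sampling idea concrete, here is a minimal sketch of stratified random sampling with proportional allocation. It is only an illustration: the report does not say how respondents were allocated across strata, and the function name `stratified_sample`, the stratum sizes and the member identifiers are all invented.

```python
import random


def stratified_sample(strata, total_size, seed=0):
    """Draw a proportionally allocated random sample from each stratum."""
    rng = random.Random(seed)
    population = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation, rounded to the nearest whole person.
        k = round(total_size * len(members) / population)
        sample[name] = rng.sample(members, min(k, len(members)))
    return sample


# Hypothetical sampling frames; the sizes are invented for illustration.
strata = {
    "LinkedIn": [f"linkedin_{i}" for i in range(600)],
    "Twitter": [f"twitter_{i}" for i in range(200)],
    "Data Science Zimbabwe": [f"dsz_{i}" for i in range(200)],
}

sample = stratified_sample(strata, total_size=100)
print({name: len(chosen) for name, chosen in sample.items()})
```

Because each stratum's allocation is rounded separately, the sizes will not always add up to exactly `total_size` for other inputs.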


In addition to distributing the questionnaire, I used a scraper to mine data from the returned profiles. There were really only three data elements that I considered relevant: Current Position (job title and name of employer), Education (most recent institution and field of study) and Experience (position, organisation, and duration of previous roles). Limiting myself to these three elements not only saved time in writing and debugging the scraper but was also my attempt at minimising potential liability for not adhering to LinkedIn’s terms of service (using a bot on their site goes against those terms, even though the data is public).


After filtering out data science aspirants, students and profiles with insufficient information, I was left with 818 data scientist profiles (down from 1032). I used Tableau as my primary tool for producing the visualisations in this report, along with Python for the data cleaning and manipulation.
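The filtering step could look something like the sketch below. The original scraper and its data are not published, so the field names (`job_title`, `employer`, `institution`), the excluded title words and the sample records are all assumptions made for illustration.

```python
# Hypothetical field names; the original scraper's schema is not published.
REQUIRED_FIELDS = ("job_title", "employer", "institution")
# Assumed markers for students and aspirants.
EXCLUDED_TITLE_WORDS = ("student", "aspiring")


def is_practitioner(profile):
    """Return True if the profile looks like a working data professional."""
    # Drop profiles with insufficient information.
    if any(not profile.get(field) for field in REQUIRED_FIELDS):
        return False
    title = profile["job_title"].lower()
    # Drop students and aspirants based on their job title.
    return not any(word in title for word in EXCLUDED_TITLE_WORDS)


# Made-up records for illustration only.
profiles = [
    {"job_title": "Data Analyst", "employer": "Acme Bank", "institution": "Example University"},
    {"job_title": "Aspiring Data Scientist", "employer": "Self", "institution": "Example University"},
    {"job_title": "Statistician", "employer": "", "institution": "Example University"},
    {"job_title": "Student", "employer": "Example University", "institution": "Example University"},
]

kept = [p for p in profiles if is_practitioner(p)]
print(f"Kept {len(kept)} of {len(profiles)} profiles")
```

A keyword filter like this is crude, so in practice the borderline profiles would still need a manual check.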


2. Industry

Based on survey results, the industry with the largest proportion of data analytics professionals is Software, Tech and Telecoms at 30%. Next is the Financial Services sector with a share of 19%, as the pie chart above conveniently shows. Together, these 2 industries make up nearly half of all survey respondents. Assuming (naively) that the survey sample was completely random, we can infer that Zimbabwe’s data analytics professionals are concentrated in these 2 industries more than in any other sector.
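The “nearly half” figure is simple arithmetic on the reported shares. The sketch below rebuilds the counts from the 30% and 19% shares and the 100-response cap. The “Other sectors” bucket is a placeholder that lumps together everything else, since the remaining industries are not broken down in the text.

```python
from collections import Counter

# Counts reconstructed from the reported shares and the 100-response cap.
# "Other sectors" lumps together everything not broken down in the text.
responses = (
    ["Software, Tech and Telecoms"] * 30
    + ["Financial Services"] * 19
    + ["Other sectors"] * 51
)

counts = Counter(responses)
total = sum(counts.values())
# Share of each industry as a percentage of all responses.
shares = {industry: 100 * n / total for industry, n in counts.items()}

top_two = shares["Software, Tech and Telecoms"] + shares["Financial Services"]
print(f"Top two industries combined: {top_two:.0f}%")
```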

Cross-referencing the LinkedIn dataset mentioned earlier, the company with the greatest number of data professionals is Econet Wireless Zimbabwe. This not only reaffirms the earlier point that the largest share of survey respondents are from the telecoms sector, but it’s also to be expected given that Econet probably has the richest repository of data in the country. It would only make sense for them to hire professionals to mine insights from that data.


Looking at the bar plot, we can see that the organisations that follow Econet are mostly financial institutions: Steward Bank, Deloitte, BancABC, KPMG, FBC, CABS, CBZ, etc. Risk analytics, fraud detection and financial projections are all data science use cases in finance-based institutions. It is therefore no surprise that a significant number of professionals work in this sector as well.

Each bar in the plot above represents the year in which analytics professionals began their current positions. In other words, it can be interpreted as the year in which candidates got hired. This suggests that data analytics positions/vacancies have been on the rise over the last 10 years, peaking in 2018. 2017 to 2018 can be described as the period in which Zimbabwean companies seriously prioritised data analytics in their organisations.
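A tally like the one behind that bar plot can be sketched in a few lines. The start dates below are made up, and the “Mon YYYY” format is an assumption about how the scraped dates might look, not the actual dataset.

```python
from collections import Counter

# Made-up start dates in a hypothetical "Mon YYYY" format, standing in for
# the start of each profile's current position.
current_position_starts = ["Mar 2018", "Jan 2017", "Aug 2018", "Nov 2015", "Feb 2018"]


def start_year(text):
    """Extract the four-digit year from the end of a 'Mon YYYY' string."""
    return int(text.split()[-1])


hires_per_year = Counter(start_year(s) for s in current_position_starts)

# Print a crude text bar chart, one row per year.
for year in sorted(hires_per_year):
    print(year, "#" * hires_per_year[year])
```

One caveat with this kind of count: because only the start of the current position is used, anyone who changed jobs recently is counted in a recent year, which can make earlier years look quieter than they really were.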


3. Talent

More than half of the Data Analytics professionals in Zimbabwe who took part in the study hold a Bachelor’s degree as their highest qualification (55%). Of that subset, an overwhelming majority hold either a Statistics degree or a Computer Science degree. This should come as no surprise, as these fields make up two-thirds of the triumvirate of Data Science.


36% of respondents noted a Master’s degree as their highest level of qualification. The most common qualification in this subgroup is the MSc Data Analytics, most probably from Chinhoyi University of Technology (CUT), which has gained a lot of popularity in recent years. It is closely followed by an MSc in Statistics and/or Operations Research.


Unsurprisingly, numerate degrees make up the bulk of the qualifications held by data professionals.

28% of survey respondents identified with the title Data Analyst, with only 10% referring to themselves as Data Scientists.

According to the survey, data analytics professionals are most likely to be found working in the IT department. This is not surprising when you factor in the skills and tools required to perform data analysis. Research and Development is the second most staffed department, with a share of around 22%. One of the key deliverables of data science is the exploration of the unknown to produce insights that will drive new products and services. Professionals working in IT and R&D make up more than half of all respondents in this survey. Surprisingly, professionals in a designated Data Analytics department make up only 4% of respondents. This suggests that organisations either prefer to embed data professionals within existing departments, or simply classify them all as IT professionals.

The most widely used tool among data professionals is some form of scripting language: Python, R, etc. This is followed by everyone’s favourite data manipulation software, Excel. BI tools come in 3rd, closely followed by database tools. It is important to note that respondents who identify as Data Analysts seem to be the largest group of users in almost every tool category.

No doubt data science can be carried out with a host of tools, but the most popular are probably the open source ones. A healthy majority of survey respondents reported that their organisations had no problem allowing open source tools to be part of the analytics workflow. This is important because corporates often have strict policies on which software can be used to handle sensitive information. Open source tools are released under licences, but they usually come without a vendor warranty or support contract, so there’s no binding agreement with a supplier to fall back on where security is concerned.

Around 92% of survey respondents are familiar with at least one programming language, with more than 45% classified as intermediate users at the very least.

61% of survey respondents have up-skilled themselves in the last 5 years using Massive Open Online Courses (MOOCs). This is imperative for anyone pursuing a career in data analytics because the technologies change and evolve every year.

Data Visualisation is the most popular task carried out by data pros. As with the tools, Data Analysts have the greatest distribution across all tasks. However, it is concerning that ‘Predictive Analytics/Machine Learning’ and ‘Hypothesis formulation’ are 2 of the least carried out tasks by data professionals. Without these 2, one cannot confidently say that they are carrying out data science.


4. Institutions

Although there’s an absolute legion of Data Science learning resources on the internet, it seems very few are oriented to Zimbabwe particularly. This may affect the perception and popularity of the discipline in Zimbabwe because some people might feel it doesn’t apply to Zimbabwe’s problems.


Training Institutions

When I carried out a Google search along the lines of “data science training Zimbabwe”, it produced a few noteworthy results. I’ll outline them below.


a) The Knowledge Academy

I didn’t have time (or access) to audit the course material. They seem to be based in the UK, as the number listed under their ‘Contact Us’ has a British code (+44). Their Data Science Analytics course comes at a steep £500. The course outline seems to be that of a typical introduction to data science course.


b) Ulearn Systems

The second one has an Indian code (+91). They do not state how much the course costs. The course outline is pitched at an intermediate level data scientist.


(The fact that these 2 sites popped up on my search results is likely due to search engine optimisation (SEO) more than anything else. Nothing about their course material suggests that they’re specifically tailored for a Zimbabwean audience.)


c) 2KO Zimbabwe

The third one, 2KO Zimbabwe, has a South African code (+27) under their ‘About Us’. However, they do have a local presence, as they offer in-house training to organisations or teams across Zimbabwe. Their Data Analysis with Python and Pandas Online Course costs R900. The course focuses on data wrangling and manipulation and doesn’t go much beyond that.


d) MSc Data Analytics - Chinhoyi University of Technology

Upon digging deeper into the search query rabbit hole, I came across the MSc Data Analytics masters currently being offered by CUT. According to the ‘About’ section on the CUT website:


The Master of Science degree in Data Analytics is designed to equip business professionals, mathematicians, information technologists, economists and social scientists with specific tools and techniques involved in analytics to create value and wealth. The programme focuses on data and analytics combining a strong academic programme with practical applications tied to strategic decision making.

The Master of Science Degree in Data Analytics comprises three taught Semesters followed by a Dissertation.


This MSc was launched by CUT in 2018 and it is very popular in Zimbabwe’s Data Analytics community. A considerable chunk of the people I’ve engaged on data science matters are pursuing that qualification. This is a qualification I can vouch for to a degree (pun intended), because I’ve seen people use the skills they’re gaining from it to good effect. However, its content will probably only be useful to people with non-numerate first degrees, i.e., degrees that do not have a strong grounding in Statistics, Mathematics and Computer Science. Another point worth raising is that the head lecturer of the department, Dr Shepard Makurumidze, has a BSc in Economics, an MBA and a PhD in Finance; all financial disciplines. While these disciplines intersect with Statistics/Maths to a certain extent, they don’t pursue, in depth, the statistical concepts required for data science. I believe this in turn affects the scope of the qualification.


e) Statistical Data Analysis Training Programme - Harare Institute of Technology

The Harare Institute of Technology also has a Statistical Data Analysis Training Programme that runs on select weekends throughout the year. They offer training on statistical packages like SPSS, R, MATLAB, etc. for a combined fee of USD 500.


Data Science Zimbabwe

Data Science Zimbabwe is a vibrant group of data science/analytics professionals and enthusiasts from a host of different backgrounds, including traditional statistics, business intelligence, accounting, IT and web development, among others. Its membership comprises individuals already practising some level of data science as well as those who would like to join the field.


The group held its inaugural meet-up in December 2018, where it partnered with HIT, PyCon Zim and PyData. You can visit the group’s website here or join the WhatsApp group using this link if you’re interested.


5. Popularity

The search query ‘Data Science’ (and its associated terms) has steadily been growing in popularity in Zimbabwe over the last 5 years, peaking in 2018. 2019 is not over, so popularity might still be on the increase.


6. Challenges

According to survey respondents, the biggest challenge facing the progression of data science in Zimbabwe is Data Availability. This is particularly true in public institutions where records are yet to be properly digitised, as well as being made readily available to the public for scrutiny.


Some organisations also seem to not have fully embraced the 4th industrial revolution and its tools. Some companies are still handling their data using very inefficient methods, failing to extract the best value from it.


The lack of a visible data science ecosystem is also an issue. Although groups like Data Science Zimbabwe and PyCon Zim exist, they’re yet to fully establish a digital footprint on the internet. For instance, if you Google 'Data Science Zimbabwe' at the time of publishing this report, the DSZ website won't even be on the first page of search results. The website needs SEO so that it is the first thing that pops up whenever someone makes such a query. As things stand, entering the field of data science is quite daunting for a newbie, as there seemingly won't be any support. Also, making a strong case to both the private and public sectors for the importance of data science becomes increasingly difficult if the people practising it are not speaking with one voice.


7. Conclusion: What does this all mean for Data Science in Zimbabwe?

This report covered a lot, but here is a summary of the highlights and what they imply:

  • The Software, Tech & Telecoms and Financial Services industries have the greatest share of data analytics professionals. If you are thinking of pursuing a career in data analytics, you would be best advised to have those 2 at the top of your list;

  • The Econet group of companies employs data professionals more than any other organisation in Zimbabwe. This is probably due to the quantity and quality of the data it generates, as well as its data/information oriented products;

  • The last 3 years (2019 included) have witnessed a large surge in the demand for data professionals in Zimbabwe. This is the time to pursue the field or learn the skills that will make you a data scientist;

  • The vast majority of professionals practising data analytics in Zimbabwe have numerate degrees, with a considerable share holding either a Statistics or Computer Science degree. Individuals who do not have a first degree in this category are up-skilling using the MSc Data Analytics degree offered at CUT;

  • Approximately 50% of data professionals who took part in this study identify as either a Data Analyst, Business Analyst or Business Intelligence Analyst. Only 10% identify as Data Scientists;

  • 97% of survey respondents claim that they use scripting languages (Python, R, etc) as part of their workflow. If you’re looking into going into data science, I’d recommend learning at least one of those tools;

  • Hypothesis Formulation and Predictive Analytics/Machine Learning were the tasks least carried out by the survey respondents. This suggests that the majority of data professionals are not data scientists (yet), but rather Data Analysts or Business Intelligence Analysts. In other words, they’re routinely telling the same story using data tools and internal data, without forming new hypotheses to explore new possibilities;

  • The only institution currently offering a verifiable and structured data analytics training programme in Zimbabwe is CUT through its MSc Data Analytics programme. However, its scope and expertise can still be improved on. (I will say this as humbly as I can. I went through the outline of the courses in the MSc and I've personally learnt the majority of the content through MOOCs on my own without spending a single ZWL $);

  • Data Science Zimbabwe is currently the largest grouping of data professionals in the country with close to 190 members. However, its membership is mostly confined to WhatsApp. It needs to develop an online presence, as well as a membership platform on its official website. The group’s members should generate more content from diverse backgrounds to further push the case for data science in Zimbabwe (something similar to Data Science Central or the Towards Data Science publication on Medium). A Slack/Stack Overflow channel would also be beneficial for sharing technical problems and solutions brought forward by data professionals;

  • Data Science/Analytics is more popular in Zimbabwe right now than it has ever been. This is the time to capitalise on it, whether you’re a corporate strategy formulator, data practitioner or someone wishing to break into the field.

In conclusion, Data Science in Zimbabwe is currently in its formative years, which is exciting. It’s now up to those involved in the field to make something of it.




If you would like to know more about the author (me), please take some time to visit my portfolio on the following link. I am available on a freelance basis if you need any assistance on data related projects. You can find my contact details on my homepage.

 
 
 



©2019 by Simbarashe Chikaura
