Richard Taylor Predictive Insights Zindi champion 2023

Meet Richard Taylor, one of the four team members who won the UCT Hackathon based on the Predictive Insights | Zindi Youth Income Prediction challenge. An enthusiastic and intellectually curious student, Richard is intrigued by the junction between computer systems and people, and by harnessing the power of technology to solve human problems. This has naturally led him to explore the field of data science, a journey he is loving so far. Currently, Richard is studying towards his Honours in Computer Science.

Richard's experience

Q: Is this the first time you’ve participated in a competition like the Predictive Insights Zindi challenge? What was your experience?
  • I had no previous experience in hackathons of this style. As a team (consisting of myself, Gareth Warburton, Jeremy Simpson, and Zuleigha Patel), we learnt most things on the fly and collaborated to reach all our results and conclusions.

Approach and methodology

Q: What was your approach to tackling the Youth Income Prediction challenge?
  • Our approach involved two main tasks: data processing and model building.

    The data we used had a pooled cross-sectional structure, including different socio-economic outcomes in a specific round of the survey. It also included the current employment status and various information about geography and education.

    We split this data into test and training data – where the training data has our target variable (an unemployment dummy for 6 months’ time). For the purposes of this section, we explored relationships between the target variable and other variables using the training data set.

    Our team applied many pre-processing and feature engineering methods to clean and transform the data. For the sake of brevity, only the more interesting ones will be included below:

        • We combined “Math” and “Mathlit” into “Maths_combined”
        • We then created a binary feature “downturn” based on values in the “Round” column
        • As a team, we decided to use ElasticNet to discard irrelevant features
        • To handle missing values, we filled them with the minimum or the mode, depending on whether the variable was numerical or categorical.
        • We created various interaction variables. The heterogeneity of the labour market across South Africa was the motivation for some interactions.
          We also created interactions with the gender variable, motivated by the existence of gender barriers in the South African labour market. Finally, we interacted age and tenure in order to account for any non-linearities between these variables and the target.
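
The feature engineering steps listed above can be sketched in pandas as follows. This is an illustration under stated assumptions, not the team's actual code: only "Math", "Mathlit", "Maths_combined", "downturn" and "Round" are names from the interview. The columns "Age", "Tenure", "Female" and "Province", the rule for combining the two maths marks, the choice of which rounds count as a downturn, and the specific gender interaction are all hypothetical.

```python
import numpy as np
import pandas as pd


def engineer_features(df, downturn_rounds=(2, 3)):
    """Apply the pre-processing steps to a copy of df.

    downturn_rounds is a hypothetical choice; the interview does not say
    which survey rounds were flagged as a downturn.
    """
    out = df.copy()

    # Assumption: use the Math mark where present, otherwise the Mathlit mark.
    out["Maths_combined"] = out["Math"].combine_first(out["Mathlit"])

    # Binary flag derived from the survey round.
    out["downturn"] = out["Round"].isin(downturn_rounds).astype(int)

    # Fill missing values: minimum for numerical columns, mode for categorical.
    for col in list(out.columns):
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].min())
        else:
            modes = out[col].mode()
            if not modes.empty:
                out[col] = out[col].fillna(modes.iloc[0])

    # Interaction terms (column names are hypothetical).
    out["Age_x_Tenure"] = out["Age"] * out["Tenure"]
    out["Female_x_Tenure"] = out["Female"] * out["Tenure"]
    return out


# Tiny made-up sample, purely to show the function running.
sample = pd.DataFrame({
    "Round": [1, 2, 3, 4],
    "Math": [60.0, np.nan, np.nan, 45.0],
    "Mathlit": [np.nan, 70.0, np.nan, np.nan],
    "Age": [22, 25, 30, 19],
    "Tenure": [12.0, np.nan, 36.0, 0.0],
    "Female": [1, 0, 1, 0],
    "Province": ["Western Cape", None, "Gauteng", "Gauteng"],
})
features = engineer_features(sample)
```

Filling with the minimum is done after the maths marks are combined, so a respondent with neither mark receives the lowest observed combined mark. Whether the team applied the steps in this order is not stated.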

Q: Can you walk us through the key components of your methodology and the techniques you employed in your predictive model?

Our model was a stacked classifier. This type of ensemble model runs the processed data through a series of base learners. We then combined the predictions of each base learner with the target variable, and the resulting dataset was fed into a neural network. This allowed us to determine the best weight to give each model.

The base learning models in this case were XGBoost, AdaBoost, Bernoulli Naive Bayes, Gaussian Naive Bayes and K Nearest Neighbours (KNN).

Other methodologies we attempted included logistic regression and a voting ensemble classifier. However, these performed less effectively than the stacked classifier.
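
A stacked classifier of this kind can be sketched with scikit-learn's StackingClassifier. The sketch makes several assumptions that are not from the interview: the data are synthetic, GradientBoostingClassifier is used as a stand-in for XGBoost so the example needs only scikit-learn, and all hyperparameters (hidden layer size, number of folds, number of neighbours) are illustrative. StackingClassifier builds the meta-learner's training set from cross-validated base-learner predictions, which may differ from how the team assembled theirs.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification data standing in for the survey features.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners named in the interview; gradient boosting replaces XGBoost here.
base_learners = [
    ("gboost", GradientBoostingClassifier(random_state=0)),
    ("adaboost", AdaBoostClassifier(random_state=0)),
    ("bernoulli_nb", BernoulliNB()),
    ("gaussian_nb", GaussianNB()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# A small neural network learns how to weight the base learners' predictions.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=MLPClassifier(
        hidden_layer_sizes=(8,), max_iter=1000, random_state=0
    ),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_train, y_train)

# Predicted probability of the positive class for held-out rows.
val_proba = stack.predict_proba(X_val)[:, 1]
```

Using out-of-fold predictions for the meta-learner keeps the neural network from simply rewarding whichever base learner overfits the training data most.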

Challenges faced and future goals

Q: Were there any specific features or patterns in the data that were particularly informative for your model?
  • In the context of the South African labour market, there is a well-established link between educational attainment and the likelihood of being unemployed. We ran a linear probability model of the target on the matric and tertiary education variables as well as maths marks. Both matric and degree had a statistically significant effect on the probability of unemployment in the next round. However, the maths mark was not statistically significant.

    The economic status of a province also strongly affects its rate of unemployment.

    Finally, we noticed that the proportion of our training sample who were unemployed varied by round. This indicated that the “Round” variable might play an important role in determining whether or not someone is employed.
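
The two checks described in this answer, a linear probability model and the unemployment share by round, can be sketched as follows. A linear probability model is ordinary least squares with a binary outcome, so the sketch needs only NumPy and pandas. Everything here is an assumption made for illustration: the data are simulated, the column names "Matric", "Degree", "Maths_mark" and "Target" are hypothetical, and the heteroskedasticity-robust (HC1) standard errors are a common choice for this kind of model rather than something the interview specifies.

```python
import numpy as np
import pandas as pd

# Simulated data, for illustration only; not the competition data.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "Round": rng.integers(1, 5, n),
    "Matric": rng.integers(0, 2, n),
    "Degree": rng.integers(0, 2, n),
    "Maths_mark": rng.uniform(0, 100, n),
})
p_unemployed = 0.7 - 0.15 * df["Matric"] - 0.25 * df["Degree"]
df["Target"] = (rng.uniform(size=n) < p_unemployed).astype(int)


def linear_probability_model(data, target, regressors):
    """OLS of a 0/1 target on the regressors, with HC1 robust standard errors."""
    regressors = list(regressors)
    X = np.column_stack(
        [np.ones(len(data)), data[regressors].to_numpy(dtype=float)]
    )
    y = data[target].to_numpy(dtype=float)

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    n_obs, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    # Sandwich estimator: sum of squared residuals times outer products of rows.
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = xtx_inv @ meat @ xtx_inv * n_obs / (n_obs - k)
    std_err = np.sqrt(np.diag(cov))

    return pd.DataFrame(
        {"coef": beta, "std_err": std_err, "t_stat": beta / std_err},
        index=["const"] + regressors,
    )


lpm = linear_probability_model(df, "Target", ["Matric", "Degree", "Maths_mark"])

# Share of the sample that is unemployed in each survey round.
unemployment_by_round = df.groupby("Round")["Target"].mean()
```

Each coefficient reads as the change in the probability of unemployment associated with a one-unit change in that variable, and a t-statistic of roughly 2 or more in absolute value corresponds to significance at about the 5% level.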

Q: What are your future plans or goals in the field of data science, and how has this competition influenced your trajectory?
  • The word “influence” would be an understatement. Before the competition I had little knowledge of or interest in data science. I only decided to compete with some friends because “it looks like a pretty cool thing to do”. Also, free pizza was served at the event!

    Since the competition, I have become hooked on the field of data science! The Zindi challenge felt like a perfect application of the skills accumulated in my undergraduate science degree, and was lots of fun to compete in. I definitely plan to work in the field of data science in the future!

Passionate about using data science for good?
Check out the 2023 UCT Hackathon.