Youth unemployment prediction: Richard Taylor on winning the UCT Hackathon

Written by Predictive Insights | Apr 17, 2026 12:36:30 PM

Meet Richard Taylor, one of the four students behind the winning team in the UCT Hackathon based on the Predictive Insights | Zindi Youth Income Prediction challenge.

Richard is drawn to the intersection of computer systems and human outcomes. That curiosity has led him into data science, where he is currently building his skills while completing his Honors in Computer Science.

In this interview, he shares how his team approached the challenge, what worked, and how the experience shaped his career direction.

First hackathon, real pressure

When asked whether this was his first challenge of this kind, Richard was clear: it was.

He and teammates Gareth Warburton, Jeremy Simpson and Zuleigha Patel had limited prior hackathon experience, so much of the process was learned in real time. Their edge came from fast collaboration, clear role-sharing and a willingness to test ideas quickly.

How the team approached the youth income prediction challenge

The team split the work into two streams:

Data processing and feature engineering
Model development and evaluation

The data had a pooled cross-sectional structure (independent samples from successive survey rounds, stacked together) and included employment outcomes, geography, education and socio-economic indicators. They split the dataset into training and test sets, using the training set to explore relationships between input variables and the six-month unemployment target.
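As a rough illustration of the train/test split described above, here is a minimal sketch using scikit-learn. The data is synthetic and the column names (age, province, unemployed_6m), the 80/20 ratio and the stratification are assumptions for the example, not details confirmed by the team.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the survey data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 35, size=n),
    "province": rng.choice(["Western Cape", "Gauteng", "Limpopo"], size=n),
    "unemployed_6m": rng.integers(0, 2, size=n),  # hypothetical binary target
})

X = df.drop(columns="unemployed_6m")
y = df["unemployed_6m"]

# Hold out 20% of rows for testing (an assumed ratio). Stratifying on the
# target keeps the unemployed share similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

print(len(X_train), len(X_test))
```

Exploration and model fitting would then use only X_train and y_train, with the held-out rows reserved for a final check.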

Feature engineering that made a difference

The team used several practical transformations to improve model signal:

Combined separate maths-related fields into a single Maths_combined feature
Created a binary downturn variable from survey round data
Used ElasticNet to reduce irrelevant features
Imputed missing values by variable type (minimum for numeric fields, mode for categorical fields)
Built interaction features to reflect labor-market heterogeneity, including:
- geography interactions
- gender interactions
- age-tenure interactions to capture non-linear effects
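
The transformations listed above can be sketched in pandas roughly as follows. This is an illustration on a tiny made-up table: apart from Maths_combined, every column name is hypothetical, and which survey rounds count as a downturn is an assumption.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample; column names and values are hypothetical.
df = pd.DataFrame({
    "maths_pct": [55.0, np.nan, 72.0, np.nan],
    "maths_lit_pct": [np.nan, 61.0, np.nan, np.nan],
    "survey_round": [1, 2, 3, 4],
    "age": [19, 24, 28, 31],
    "tenure_months": [0.0, 6.0, np.nan, 30.0],
    "gender": ["F", "M", None, "F"],
    "province": ["Gauteng", "Limpopo", "Gauteng", None],
})

# 1. Combine the separate maths fields: use the first, fall back to the second.
df["Maths_combined"] = df["maths_pct"].fillna(df["maths_lit_pct"])

# 2. Binary downturn flag from the survey round (rounds 3 and 4 are assumed).
DOWNTURN_ROUNDS = [3, 4]
df["downturn"] = df["survey_round"].isin(DOWNTURN_ROUNDS).astype(int)

# 3. Impute by variable type: column minimum for numeric, mode for categorical.
numeric_cols = ["Maths_combined", "tenure_months"]
categorical_cols = ["gender", "province"]
for col in numeric_cols:
    df[col] = df[col].fillna(df[col].min())
for col in categorical_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

# 4. Interaction features: a numeric product and a combined category label.
df["age_x_tenure"] = df["age"] * df["tenure_months"]
df["gender_x_province"] = df["gender"] + "_" + df["province"]

print(df[["Maths_combined", "downturn", "age_x_tenure", "gender_x_province"]])
```

In a real pipeline the imputation values (minimum, mode) would be computed on the training set only and then reused on the test set, so that no information leaks from held-out rows.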

This balance of domain thinking and model discipline helped improve downstream performance.
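
One way to use ElasticNet for feature reduction is to fit it on standardized inputs and keep only the features whose coefficients are not shrunk to zero. The sketch below does this with scikit-learn's SelectFromModel on synthetic data. The alpha and l1_ratio values are assumptions chosen for the demo, and the team's actual settings are not stated in the interview.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

# Synthetic data: 20 features, of which only 5 carry signal.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, n_redundant=0,
    n_clusters_per_class=1, random_state=0,
)

# Standardize so the penalty treats every feature on the same scale.
X_scaled = StandardScaler().fit_transform(X)

# The L1 part of the penalty pushes weak coefficients to exactly zero;
# the threshold keeps features whose absolute coefficient exceeds 1e-5.
selector = SelectFromModel(ElasticNet(alpha=0.1, l1_ratio=0.5), threshold=1e-5)
selector.fit(X_scaled, y)

keep_mask = selector.get_support()
X_reduced = selector.transform(X_scaled)

print(f"Kept {keep_mask.sum()} of {X.shape[1]} features")
```

A larger alpha removes more features. In practice the value would usually be tuned by cross-validation rather than fixed by hand.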

Why a stacked model outperformed simpler alternatives

Their best-performing approach was a stacked classifier.

They first trained multiple base learners, then fed their predictions into a neural network that learned how to weight and combine them. Base learners included:

XGBoost
AdaBoost
Bernoulli Naive Bayes
Gaussian Naive Bayes
K-Nearest Neighbors (KNN)

They also tested logistic regression and a voting ensemble, but these underperformed relative to the stacked architecture.
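
A stacked architecture of this kind can be sketched with scikit-learn's StackingClassifier, using a small neural network (MLPClassifier) as the meta-learner. This is a sketch under stated assumptions, not the team's code: the data is synthetic, GradientBoostingClassifier stands in for XGBoost so the example depends only on scikit-learn, and all hyperparameters are assumed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification data in place of the survey features.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

base_learners = [
    # Stand-in for XGBoost (assumption, keeps the example scikit-learn only).
    ("gbm", GradientBoostingClassifier(random_state=0)),
    ("ada", AdaBoostClassifier(random_state=0)),
    ("bnb", BernoulliNB()),
    ("gnb", GaussianNB()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# Small neural network that learns how to combine the base predictions.
meta_learner = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)

# cv=5 means the meta-learner is trained on out-of-fold base predictions,
# which avoids rewarding base models that merely memorize the training rows.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=meta_learner,
    cv=5,
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.3f}")
```

Unlike a voting ensemble, which combines base predictions with fixed rules, the meta-learner here is fitted to the base models' outputs, so it can down-weight a weak learner or pick up on cases where two learners disagree.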

Patterns in the data that stood out

The team observed several patterns consistent with South African labor-market dynamics:

Educational attainment had a meaningful link to unemployment likelihood
Matric and tertiary education showed statistically significant effects
Provincial economic context mattered materially
Unemployment prevalence varied by survey round, suggesting time-period effects were important

These insights reinforced the value of combining statistical analysis with machine-learning workflows, rather than treating modelling as a black box.
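
A simple statistical check of this kind is to compare unemployment rates across groups and test whether the differences could be chance. The sketch below uses pandas and a chi-square test of independence from SciPy. The data is simulated and the probabilities are invented for the demo, so the output says nothing about the real survey, and this is not necessarily the test the team ran.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
n = 2000

# Simulated respondents; category labels and probabilities are invented.
education = rng.choice(["below_matric", "matric", "tertiary"], size=n)
survey_round = rng.integers(1, 5, size=n)
assumed_rate = {"below_matric": 0.7, "matric": 0.5, "tertiary": 0.3}
p_unemployed = np.array([assumed_rate[level] for level in education])
unemployed = (rng.random(n) < p_unemployed).astype(int)

df = pd.DataFrame({
    "education": education,
    "survey_round": survey_round,
    "unemployed": unemployed,
})

# Descriptive view: unemployment share by education level and by round.
rate_by_education = df.groupby("education")["unemployed"].mean()
rate_by_round = df.groupby("survey_round")["unemployed"].mean()

# Chi-square test: is unemployment status independent of education level?
table = pd.crosstab(df["education"], df["unemployed"])
chi2, p_value, dof, expected = chi2_contingency(table.to_numpy())

print(rate_by_education.round(3))
print(rate_by_round.round(3))
print(f"chi2={chi2:.1f}, dof={dof}, p={p_value:.3g}")
```

The same groupby on survey_round gives a quick view of time-period effects before any model is trained.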

From curiosity to career direction

Richard described the challenge as a turning point.

Before the competition, data science was not a major focus for him. After the hackathon, it became a serious career path. The experience showed how technical skills can be applied to real social and economic problems, and how quickly capabilities can grow in the right team environment.

Final takeaway

This project is a strong reminder that high-performing models are rarely just about algorithms.

They come from:

clear problem framing
disciplined data preparation
smart feature design
rigorous experimentation
and collaborative execution under constraints

For teams working on labor-market forecasting or social-impact modelling, that combination is what moves a project from interesting to useful.
