Meet Richard Taylor, one of the four team members who won the UCT Hackathon based on the Predictive Insights | Zindi Youth Income Prediction challenge. As an enthusiastic and intellectually curious student, Richard is intrigued by the junction between computer systems and people, and by harnessing the power of technology to solve human problems. This has naturally led him to explore the field of data science – a journey he is loving so far. Richard is currently studying towards his Honours in Computer Science.
The data we used had a pooled cross-sectional structure, capturing different socio-economic outcomes from a specific round of the survey. It also included current employment status and information about geography and education.
We split this data into training and test sets – where the training data includes our target variable (an unemployment dummy for six months' time). For the purposes of this section, we explored relationships between the target variable and the other variables using the training set.
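A split like the one described can be sketched with scikit-learn's `train_test_split`. Everything below is illustrative: the toy DataFrame and the column names (`unemployed_6m`, `age`, `education_years`) are hypothetical stand-ins for the survey data, not the actual competition fields.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame standing in for the pooled cross-sectional survey data;
# "unemployed_6m" plays the role of the unemployment dummy target.
df = pd.DataFrame({
    "age": [22, 35, 41, 29, 52, 33],
    "education_years": [12, 16, 10, 14, 8, 15],
    "unemployed_6m": [1, 0, 0, 1, 0, 1],
})

# Stratify on the target so both splits keep a similar class balance.
train, test = train_test_split(
    df, test_size=0.33, stratify=df["unemployed_6m"], random_state=0
)

X_train = train.drop(columns="unemployed_6m")
y_train = train["unemployed_6m"]
```

Exploratory work on target-feature relationships would then use only `train`, keeping `test` untouched for evaluation.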
Our team applied many pre-processing and feature engineering methods to clean and transform the data. For the sake of brevity, only the more interesting ones are included below.
Q: Can you walk us through the key components of your methodology and the techniques you employed in your predictive model?
Our model was a stacked classifier. This type of ensemble model runs the processed data through a series of base learners. We then combined the predictions of each base learner with the target variable, and the resulting dataset was fed into a neural net. This allowed us to determine the best weight to give each model.
The base learning models in this case were XGBoost, AdaBoost, Bernoulli Naive Bayes, Gaussian Naive Bayes and K Nearest Neighbours (KNN).
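The stacking setup described above can be sketched with scikit-learn's `StackingClassifier`, which trains the listed base learners and feeds their out-of-fold predicted probabilities to a neural-net meta-learner. This is a minimal illustration, not the team's actual pipeline: the data is synthetic, the hyperparameters are defaults, and `GradientBoostingClassifier` stands in for XGBoost (which lives in a separate package).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the processed survey features and unemployment dummy.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

base_learners = [
    ("gb", GradientBoostingClassifier(random_state=0)),  # stand-in for XGBoost
    ("ada", AdaBoostClassifier(random_state=0)),
    ("bnb", BernoulliNB()),
    ("gnb", GaussianNB()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# The meta-learner (a small neural net) learns how much weight to give
# each base learner's cross-validated predicted probabilities.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=MLPClassifier(hidden_layer_sizes=(16,),
                                  max_iter=1000, random_state=0),
    cv=5,
)
stack.fit(X, y)
```

Using cross-validated predictions (the `cv=5` argument) is what prevents the meta-learner from simply memorising base learners that overfit the training data.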
Other methodologies we attempted included logistic regression and a voting ensemble classifier. However, these performed less effectively than the stacked classifier model.