Employability and Related Context Prediction Framework for University Graduands: A Machine Learning Approach

Manushi Prabhavi Wijayapala, Lalith Premaratne, Imali T. Jayamanne


In Sri Lanka (SL), graduands’ employability remains a national issue because of the increasing number of graduates produced by higher education institutions each year. Predicting the employability of university graduands can help mitigate this issue: before completing their degrees, graduands can identify which qualifications or skills they need to strengthen in order to find a well-paid job in their desired field.

The main objective of the study is to assess whether a machine learning approach can be applied efficiently and effectively to predict the employability and related context of university graduands in SL. To this end, the study proposes an architectural framework consisting of four modules (employment status prediction, job salary prediction, job field prediction and job relevance prediction) and compares the performance of classification algorithms within each module. A series of machine learning algorithms, including C4.5, Naïve Bayes and AODE, were evaluated on the Graduand Employment Census - 2014 data. A pre-processing step is proposed to overcome the challenges embedded in graduand employability data, and a feature selection process is proposed to reduce computational complexity. Parameter tuning is also performed to find the best-performing parameter values. More importantly, the study applies several Sampling (Oversampling, Undersampling) and Ensemble (Bagging, Boosting, Random Forest) techniques, as well as a newly proposed hybrid approach, to overcome the limitations caused by class imbalance. For validation, a wide range of evaluation measures was used to analyze the effectiveness of the classification algorithms and class imbalance mitigation techniques on the dataset. Experimental results indicated that Random Forest recorded the highest classification performance for three modules, where the selected best predictive models were obtained under the hybrid approach with ROC AUC values interpreted as ‘Excellent’, while a C4.5 Decision Tree model under the Ensemble approach was selected as the best model for the remaining module (Salary Prediction).
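As a rough illustration of two ideas named in the abstract, the sketch below shows random oversampling of a minority class and a ROC AUC computation using only the Python standard library. It is not the study's implementation: the function names (`random_oversample`, `roc_auc`), the toy rows, labels and scores are all hypothetical, and the study's own tooling and hybrid approach are not described in enough detail here to reproduce.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def roc_auc(scores, labels, positive=1):
    """ROC AUC as the probability that a randomly chosen positive
    example scores higher than a randomly chosen negative one
    (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one numeric feature per graduand, imbalanced labels.
rows = [[3.1], [2.8], [3.6], [2.5], [3.9], [2.2]]
labels = ["employed", "employed", "employed", "employed",
          "unemployed", "unemployed"]
balanced_rows, balanced_labels = random_oversample(rows, labels)

# Hypothetical classifier scores for six examples (1 = positive class).
auc = roc_auc([0.9, 0.8, 0.4, 0.7, 0.3, 0.6], [1, 1, 1, 1, 0, 0])
```

In practice, oversampling is normally applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.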


Data Mining; Machine Learning; Predictive Analytics



Printing Sponsor
University of Colombo
School of Computing

Creative Commons License
This journal is published under a Creative Commons Attribution 4.0 International License.