Classification of Dropout Rates in West Sumatra Using the Random Forest Algorithm with Synthetic Minority Oversampling Technique

Authors

  • Anita Fadila, Universitas Negeri Padang
  • Syafriandi Syafriandi, Universitas Negeri Padang
  • Yenni Kurniawati, Universitas Negeri Padang
  • Admi Salma, Universitas Negeri Padang

DOI:

https://doi.org/10.24036/ujsds/vol2-iss3/183

Keywords:

Dropout Rates, Random Forest, Synthetic Minority Oversampling Technique

Abstract

This study aims to classify school dropout rates in West Sumatra Province using the Random Forest algorithm with the Synthetic Minority Oversampling Technique (SMOTE). According to 2021 data from the Ministry of Education, Culture, Research, and Technology (Kemdikbudristek), the dropout rate in West Sumatra is above the national average, and efforts to reduce it have so far produced suboptimal results. The study therefore seeks to identify the causes of student dropout and to compare the performance of Random Forest with and without SMOTE. The 2021 West Sumatra dropout data show a significant class imbalance, so SMOTE is applied to balance the classes. The dataset is split into training and testing sets in an 80:20 ratio, and the parameters mtry and ntree (the number of trees) are tuned. Both models are evaluated using a confusion matrix. The results show that Random Forest with SMOTE outperforms the version without SMOTE in precision, recall, and F1-score. Based on the Mean Decrease Gini value, the presence of the biological mother is identified as the most important factor influencing student dropout. The study concludes that using SMOTE with the Random Forest algorithm helps reduce classification bias and improves the model's ability to detect students at risk of dropping out.
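
The abstract does not describe the implementation, so the following is only a minimal sketch of two steps it names: SMOTE-style oversampling and the precision, recall, and F1-score read off a confusion matrix. It uses plain Python, and the function names (smote, precision_recall_f1), the value of k, and the toy data are assumptions made for illustration. None of them come from the study or its dataset.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by interpolation (SMOTE idea).

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (assumed): three minority-class points with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote(minority, n_new=7, k=2)

# Toy labels (assumed): 1 = dropout, 0 = not dropout.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours, which is how SMOTE adds minority examples without duplicating existing rows. In a workflow like the one described, oversampling would normally be applied to the training split only, so that the 20% test set still reflects the original class distribution; the abstract does not state which order the study used.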

Published

2024-08-24

How to Cite

Anita Fadila, Syafriandi Syafriandi, Yenni Kurniawati, & Admi Salma. (2024). Classification of Dropout Rates in West Sumatra Using the Random Forest Algorithm with Synthetic Minority Oversampling Technique. UNP Journal of Statistics and Data Science, 2(3), 279–286. https://doi.org/10.24036/ujsds/vol2-iss3/183
