Classification of Dropout Rates in West Sumatra Using the Random Forest Algorithm with Synthetic Minority Oversampling Technique
DOI:
https://doi.org/10.24036/ujsds/vol2-iss3/183
Keywords:
Dropout Rates, Random Forest, Synthetic Minority Oversampling Technique
Abstract
This study aims to classify school dropout rates in West Sumatra Province using the Random Forest algorithm with the Synthetic Minority Oversampling Technique (SMOTE). Based on 2021 data from the Ministry of Education, Culture, Research, and Technology (Kemdikbudristek), the dropout rate in West Sumatra is above the national average. Despite efforts to reduce dropout rates, results remain suboptimal. Therefore, this study seeks to identify the causes of student dropouts and compare the performance of the Random Forest algorithm with and without SMOTE. The study uses the 2021 dropout data from West Sumatra, which has a significant class imbalance. SMOTE is applied to balance the data. The dataset is split into training and testing sets in an 80%:20% ratio, and parameter tuning is performed to optimize mtry and the number of trees (ntree). The model is evaluated using a confusion matrix to compare performance. The results show that Random Forest with SMOTE outperforms the version without SMOTE, with improvements in precision, recall, and F1-score. The presence of the biological mother ( ) is identified as the most significant factor influencing student dropouts, based on the Mean Decrease Gini value. The study concludes that using SMOTE in the Random Forest algorithm helps reduce classification bias and enhances the model's ability to detect students at risk of dropping out.
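The oversampling and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which software was used, and the toy points, the function names `smote` and `precision_recall_f1`, and the confusion-matrix counts below are all hypothetical. The sketch assumes purely numeric features and shows the core SMOTE idea (a synthetic minority sample is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours), followed by the precision, recall, and F1-score computed from confusion-matrix counts.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by interpolation (basic SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the point itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def precision_recall_f1(tp, fp, fn):
    """Metrics for the positive (minority) class from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: 3 minority points against 9 majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
majority = [(float(i), float(i % 3)) for i in range(3, 12)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points

# Hypothetical confusion-matrix counts for the minority class.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
```

In practice the oversampling would be applied to the training portion only (the 80% split), so that the test set keeps the original class distribution. Whether the study followed this order is not stated in the abstract.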
License
Copyright (c) 2024 Anita Fadila, Syafriandi Syafriandi, Yenni Kurniawati, Admi Salma
This work is licensed under a Creative Commons Attribution 4.0 International License.