Comparison of Error Rate Prediction Methods of C4.5 Algorithm for Imbalanced Data

Authors

  • Yunistika Ilanda, Universitas Negeri Padang
  • Dodi Vionanda
  • Yenni Kurniawati
  • Dina Fitria

DOI:

https://doi.org/10.24036/ujsds/vol1-iss4/89

Keywords:

C4.5 Algorithm, Error Rate Prediction Methods, Imbalanced Data

Abstract

Classification models can be built with the C4.5 algorithm. The prediction accuracy of such a model needs to be assessed with an error rate prediction method. Imbalanced data increases the classification error of the C4.5 algorithm, because the predictions do not represent the entire data set, and it worsens the performance of the error rate prediction method. This study also considers data with different correlations, to find out whether correlation affects the performance of the error rate prediction methods. The purpose of the research is to identify the error rate prediction method best suited to the C4.5 algorithm on imbalanced data and to assess the influence of different correlations. The results show that K-fold cross-validation (K-Fold CV) is more suitable for the C4.5 algorithm on imbalanced data than the hold-out (HO) and leave-one-out cross-validation (LOOCV) methods. In addition, high correlation can worsen the performance of the error rate prediction methods.
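
The three error rate prediction methods named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The synthetic data, the 10% minority share, the 70/30 hold-out split, and all function names are assumptions made for illustration. A one-level decision stump stands in for C4.5, which is not reimplemented here, and the folds are not stratified.

```python
import random

def make_data(n=120, minority_frac=0.1, seed=0):
    """Synthetic imbalanced data (assumed): one numeric feature, binary label."""
    rng = random.Random(seed)
    n_minority = int(n * minority_frac)
    data = []
    for i in range(n):
        y = 1 if i < n_minority else 0
        x = rng.gauss(1.5 if y == 1 else 0.0, 1.0)
        data.append((x, y))
    rng.shuffle(data)
    return data

def fit_stump(train):
    """Decision stump (stand-in for C4.5): choose the threshold and
    orientation with the fewest training errors."""
    best_errs, best_t, best_hi = len(train) + 1, None, None
    candidates = [float("-inf")] + sorted({x for x, _ in train})
    for t in candidates:
        for hi in (0, 1):  # hi = label predicted when x > t
            errs = sum((hi if x > t else 1 - hi) != y for x, y in train)
            if errs < best_errs:
                best_errs, best_t, best_hi = errs, t, hi
    return lambda x: best_hi if x > best_t else 1 - best_hi

def error_rate(model, test):
    """Share of misclassified observations in the test set."""
    return sum(model(x) != y for x, y in test) / len(test)

def holdout(data, test_frac=0.3):
    """HO: train on the first part of the data, test once on the rest."""
    cut = int(len(data) * (1 - test_frac))
    return error_rate(fit_stump(data[:cut]), data[cut:])

def k_fold(data, k=10):
    """K-Fold CV: every observation is tested exactly once."""
    errors = 0
    for i in range(k):
        test = data[i::k]
        train = [d for j, d in enumerate(data) if j % k != i]
        model = fit_stump(train)
        errors += sum(model(x) != y for x, y in test)
    return errors / len(data)

def loocv(data):
    """LOOCV: K-Fold CV with one observation per fold."""
    return k_fold(data, k=len(data))

data = make_data()
print("HO       :", holdout(data))
print("10-fold  :", k_fold(data, k=10))
print("LOOCV    :", loocv(data))
```

Running the sketch on one synthetic data set does not reproduce the study's comparison; it only shows how the three estimates are computed from the same data.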

Published

2023-08-28

How to Cite

Yunistika Ilanda, Dodi Vionanda, Yenni Kurniawati, & Dina Fitria. (2023). Comparison of Error Rate Prediction Methods of C4.5 Algorithm for Imbalanced Data. UNP Journal of Statistics and Data Science, 1(4), 240–247. https://doi.org/10.24036/ujsds/vol1-iss4/89
