Comparasion of Error Rate Prediction Methods of C4.5 Algorithm for Balanced Data

Authors

  • Ichlas Djuazva, Universitas Negeri Padang
  • Dodi Vionanda
  • Nonong Amalita
  • Zilrahmi

DOI:

https://doi.org/10.24036/ujsds/vol1-iss4/74

Keywords:

C4.5, Error Prediction, HO, K-folds, LOOCV

Abstract

C4.5 is a highly effective decision tree algorithm that is widely used for classification. Compared with CHAID, CART, and ID3, C4.5 generates decision trees that are easier to understand and builds them faster, because it selects attributes by their information content at each stage of the process. Once the decision tree model has been built, its performance needs to be evaluated. A common measure is the prediction error rate, which can be estimated in two ways. The train error rate uses the same data to build and to test the model, so it tends to be optimistic and can hide overfitting. The test error rate instead divides the data into training and testing sets, using techniques such as Hold Out (HO), k-fold cross validation, and Leave One Out Cross Validation (LOOCV). This research compares these three methods for predicting the error rate of the C4.5 algorithm. The study uses artificially generated, normally distributed data in univariate, bivariate, and multivariate settings, with various combinations of mean differences and correlations. In the bivariate and multivariate data, different correlation structures are applied between two relevant variables and between relevant and irrelevant variables, at three levels: no correlation, moderate correlation, and high correlation. The results indicate that k-fold cross validation is the most suitable of the three methods for C4.5.
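The three estimators named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: a one-split decision stump that picks its threshold by information gain stands in for a full C4.5 tree, the data are univariate only, and every function name, the sample size, the mean difference, the 30% hold-out fraction, and k = 10 are hypothetical choices that do not come from the paper.

```python
import math
import random


def entropy(labels):
    """Shannon entropy (bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def majority(labels):
    """Most frequent label; ties go to class 1."""
    return int(2 * sum(labels) >= len(labels))


def fit_stump(train):
    """Stand-in for C4.5: one threshold on x chosen by information gain."""
    train = sorted(train)
    labels = [y for _, y in train]
    base = entropy(labels)
    best_gain, model = -1.0, (train[0][0], majority(labels), majority(labels))
    for i in range(1, len(train)):
        if train[i][0] == train[i - 1][0]:
            continue  # no threshold fits between equal values
        left, right = labels[:i], labels[i:]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(train)
        if gain > best_gain:
            threshold = (train[i - 1][0] + train[i][0]) / 2
            best_gain, model = gain, (threshold, majority(left), majority(right))
    return model


def error_rate(model, test):
    """Fraction of test rows the stump misclassifies."""
    threshold, left_label, right_label = model
    wrong = sum((left_label if x <= threshold else right_label) != y for x, y in test)
    return wrong / len(test)


def hold_out(data, rng, test_fraction=0.3):
    """Hold Out: one random split into training and testing sets."""
    shuffled = rng.sample(data, len(data))
    n_test = max(1, round(test_fraction * len(data)))
    return error_rate(fit_stump(shuffled[n_test:]), shuffled[:n_test])


def k_fold(data, rng, k=10):
    """k-fold: each row is tested once by a model trained on the other folds."""
    shuffled = rng.sample(data, len(data))
    wrong = 0.0
    for f in range(k):
        test = shuffled[f::k]
        train = [row for i, row in enumerate(shuffled) if i % k != f]
        wrong += error_rate(fit_stump(train), test) * len(test)
    return wrong / len(data)


def loocv(data, rng):
    """LOOCV is k-fold with k equal to the number of observations."""
    return k_fold(data, rng, k=len(data))


def make_balanced_data(n_per_class, mean_diff, rng):
    """Balanced univariate data: class 0 ~ N(0, 1), class 1 ~ N(mean_diff, 1)."""
    class0 = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    class1 = [(rng.gauss(mean_diff, 1.0), 1) for _ in range(n_per_class)]
    return class0 + class1


if __name__ == "__main__":
    rng = random.Random(0)
    data = make_balanced_data(n_per_class=50, mean_diff=1.0, rng=rng)
    for name, estimator in (("HO", hold_out), ("k-fold", k_fold), ("LOOCV", loocv)):
        print(name, estimator(data, rng))
```

Repeating the three estimates over many generated datasets, and comparing their spread against the error measured on a large independent sample, is one way such methods could be compared. The sketch stops short of that step and makes no claim about which method performs best.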

Published

2023-08-28

How to Cite

Ichlas Djuazva, Dodi Vionanda, Nonong Amalita, & Zilrahmi. (2023). Comparasion of Error Rate Prediction Methods of C4.5 Algorithm for Balanced Data. UNP Journal of Statistics and Data Science, 1(4), 297–305. https://doi.org/10.24036/ujsds/vol1-iss4/74
