Comparison of Error Prediction Methods in Classification Modeling with Chi-Squared Automatic Interaction Detection (CHAID) Methods for Balanced Data

Authors

  • Findri Wara Putri, Padang State University
  • Dodi Vionanda
  • Atus Amadi Putra
  • Fadhilah Fitri

DOI:

https://doi.org/10.24036/ujsds/vol1-iss5/116

Keywords:

CHAID, Hold Out, K-Fold Cross Validation, Leave One Out

Abstract

Chi-Squared Automatic Interaction Detection (CHAID) is an exploratory method for classifying data by building classification trees. The classification results are displayed as a tree diagram. Once the model is formed, its accuracy must be calculated in order to assess its performance. Accuracy can be determined by estimating the model's prediction error rate. Error rate prediction methods work by dividing the data into training data and testing data. Three such methods are leave-one-out cross validation (LOOCV), hold out, and k-fold cross validation. These methods divide the data into training and testing data differently, so each has advantages and disadvantages. Therefore, the three error rate prediction methods were compared with the aim of determining the appropriate method for CHAID. This is an experimental study that uses simulated data generated in RStudio. The comparison takes several factors into account, namely the marginal probability matrix and different correlations. The results are examined with boxplots, comparing the median error rate and identifying the lowest variance. This research found that k-fold cross validation is the most suitable error rate prediction method to apply to the CHAID method for balanced data.
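
As an illustration of how the three error rate prediction methods named in the abstract divide data into training and testing data, here is a minimal sketch in Python. It is not the authors' code, which was written in R. All function names, the 30% hold-out fraction, and the default of 5 folds are assumptions made for this example, and the majority-class classifier is only a placeholder standing in for a CHAID tree. The error rate is pooled over all tested observations, which is one of several reasonable conventions.

```python
import random


def hold_out_split(n, test_fraction=0.3, seed=0):
    """Hold out: one random split into training and testing indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n * test_fraction))
    return [(idx[n_test:], idx[:n_test])]


def k_fold_splits(n, k=5, seed=0):
    """K-fold: each of k disjoint folds serves once as the testing data."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, folds[i]))
    return splits


def loocv_splits(n):
    """LOOCV: every observation is the testing data exactly once."""
    return [([j for j in range(n) if j != i], [i]) for i in range(n)]


def majority_class_fit(X_train, y_train):
    """Placeholder classifier (NOT CHAID): always predicts the majority label."""
    counts = {}
    for label in y_train:
        counts[label] = counts.get(label, 0) + 1
    # Ties are broken by taking the smallest label in sorted order.
    majority = max(sorted(counts), key=counts.get)
    return lambda x: majority


def estimated_error_rate(X, y, splits, fit=majority_class_fit):
    """Misclassified test observations divided by all test observations."""
    errors = 0
    total = 0
    for train, test in splits:
        predict = fit([X[i] for i in train], [y[i] for i in train])
        errors += sum(predict(X[i]) != y[i] for i in test)
        total += len(test)
    return errors / total
```

To compare the methods on one data set, pass the output of each split function to `estimated_error_rate` with the same `X`, `y`, and `fit`. Replacing `majority_class_fit` with a function that fits a CHAID tree would bring the sketch closer to the setting described in the abstract.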

Published

2023-11-30

How to Cite

Findri Wara Putri, Dodi Vionanda, Atus Amadi Putra, & Fadhilah Fitri. (2023). Comparison of Error Prediction Methods in Classification Modeling with Chi-Squared Automatic Interaction Detection (CHAID) Methods for Balanced Data. UNP Journal of Statistics and Data Science, 1(5), 456–463. https://doi.org/10.24036/ujsds/vol1-iss5/116
