Comparison of Error Rate Prediction in CART for Imbalanced Data
DOI:
https://doi.org/10.24036/ujsds/vol1-iss5/117
Keywords:
CART, cross validation, error rate prediction, imbalanced data
Abstract
CART is a tree-based classification algorithm. A CART model is a tree consisting of a root node, internal nodes, and terminal nodes. The accuracy of a CART model can be assessed by estimating its prediction error, and cross-validation is a common way to do so. Three cross-validation algorithms are considered here: hold out, leave one out, and k-fold cross-validation. They differ in how they divide the data into training and testing sets, so each has its own advantages and disadvantages. Hold out cannot guarantee that the training set represents the entire dataset. Leave one out is very time-consuming and computationally expensive because the model must be trained as many times as there are data points. K-fold also increases computation time because the model must be trained k times. In practice, data are often imbalanced, meaning that the classes contain different numbers of observations, and in CART this imbalance affects the prediction results. This research compares error rate prediction methods for CART models fitted to imbalanced data. The study uses three types of data (univariate, bivariate, and multivariate) generated by varying the differences in population means and the correlations between independent variables. The results indicate that k-fold is the most suitable error rate prediction algorithm for CART with imbalanced data.
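The three error rate estimators described in the abstract can be illustrated with a minimal sketch. This is not the authors' simulation: the one-split Gini "stump" below is only a simplified stand-in for a full CART tree, and the class sizes (90 versus 10), the mean shift of 2.0, the random seed, the 30% hold-out fraction, and k = 10 are all assumptions chosen for illustration. The function names are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Fit a depth-1 tree on one feature by minimising weighted Gini impurity.
    A simplified stand-in for CART, not a full tree."""
    n = len(xs)
    majority = int(2 * sum(ys) >= n)
    # (impurity, threshold, left label, right label); fallback predicts the majority class
    best = (float("inf"), float("inf"), majority, majority)
    pairs = sorted(zip(xs, ys))
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = 0.0
        for side in (left, right):
            p = sum(side) / len(side)
            impurity += len(side) / n * 2 * p * (1 - p)
        if impurity < best[0]:
            thr = (pairs[i][0] + pairs[i - 1][0]) / 2
            best = (impurity, thr,
                    int(2 * sum(left) >= len(left)),
                    int(2 * sum(right) >= len(right)))
    return best[1:]

def predict(stump, x):
    thr, left_label, right_label = stump
    return left_label if x <= thr else right_label

def cv_error(xs, ys, splits):
    """Pooled misclassification rate over all (train, test) index splits."""
    wrong = total = 0
    for train, test in splits:
        stump = fit_stump([xs[i] for i in train], [ys[i] for i in train])
        wrong += sum(predict(stump, xs[i]) != ys[i] for i in test)
        total += len(test)
    return wrong / total

def hold_out_splits(n, test_frac, rng):
    """One random split: the model is trained once."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(n * test_frac)
    return [(idx[cut:], idx[:cut])]

def k_fold_splits(n, k, rng):
    """k disjoint test folds: the model is trained k times."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
            for i in range(k)]

def leave_one_out_splits(n):
    """Each observation is the test set once: the model is trained n times."""
    return [([j for j in range(n) if j != i], [i]) for i in range(n)]

# Assumed imbalanced univariate data: 90 majority and 10 minority observations
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(90)] + [rng.gauss(2.0, 1.0) for _ in range(10)]
ys = [0] * 90 + [1] * 10
n = len(xs)

err_holdout = cv_error(xs, ys, hold_out_splits(n, 0.3, rng))
err_kfold = cv_error(xs, ys, k_fold_splits(n, 10, rng))
err_loo = cv_error(xs, ys, leave_one_out_splits(n))
print("hold out:", err_holdout, "k-fold:", err_kfold, "leave one out:", err_loo)
```

The number of splits each function returns (1, k, and n) mirrors the computational cost the abstract attributes to each method. A single run such as this says nothing about which estimator is best; that comparison would require repeated simulation against a known true error rate.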
License
Copyright (c) 2023 Lifia Zullani, Dodi Vionanda, Syafriandi Syafriandi, Dina Fitria
This work is licensed under a Creative Commons Attribution 4.0 International License.