Comparison of Error Rate Prediction in CART for Imbalanced Data

Authors

  • Lifia Zullani, Padang State University
  • Dodi Vionanda
  • Syafriandi Syafriandi
  • Dina Fitria

DOI:

https://doi.org/10.24036/ujsds/vol1-iss5/117

Keywords:

CART, cross validation, error rate prediction, imbalanced data

Abstract

CART is a tree-based classification algorithm. A CART model is a tree consisting of a root node, internal nodes, and terminal nodes. The accuracy of a CART model can be assessed by measuring its prediction error. One common way to predict the error rate is cross-validation. Three commonly used cross-validation algorithms are leave-one-out, hold-out, and k-fold cross-validation. These methods differ in how they divide the data into training and testing sets, so each has its own advantages and disadvantages. Hold-out cannot guarantee that the training set represents the entire dataset. Leave-one-out is very time-consuming and computationally demanding because the model must be trained as many times as there are data points. K-fold increases computation time because the training algorithm must be run k times. In practice, the data encountered are often imbalanced, meaning that the classes contain different numbers of observations. In CART, imbalanced data affects the prediction results. This research compares error rate prediction methods for the CART model on imbalanced data. The study uses three types of data (univariate, bivariate, and multivariate) obtained from differences in population means and correlations between independent variables. The results indicate that k-fold is the most suitable error rate prediction algorithm for CART with imbalanced data.
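As a rough illustration of how the three estimators differ in the number of model fits they require, the following minimal sketch applies hold-out, 10-fold, and leave-one-out cross-validation to a simulated imbalanced univariate dataset. It is not the authors' procedure: the depth-1 tree (`fit_stump`) is a simplified stand-in for a full CART model, and the class sizes (90 versus 10), the means (0 and 2), the 30% hold-out fraction, and all function names are assumptions chosen only for illustration.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def majority(labels):
    """Majority class of a list of 0/1 labels (ties and empty lists give 0)."""
    return int(2 * sum(labels) > len(labels))

def fit_stump(data):
    """Fit a depth-1 tree: choose the threshold with the lowest weighted Gini."""
    pairs = sorted(data)
    ys = [y for _, y in pairs]
    # Fallback when no split improves on the unsplit node.
    best = (gini(ys), float("inf"), majority(ys), majority(ys))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left, right = ys[:i], ys[i:]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (score, threshold, majority(left), majority(right))
    return best[1:]

def error_rate(train, test):
    """Train on `train`, return the misclassification rate on `test`."""
    threshold, left_label, right_label = fit_stump(train)
    wrong = sum((left_label if x <= threshold else right_label) != y
                for x, y in test)
    return wrong / len(test)

def hold_out_error(data, test_fraction=0.3):
    """Single split: one model fit."""
    n_test = int(len(data) * test_fraction)
    return error_rate(data[n_test:], data[:n_test]), 1

def k_fold_error(data, k):
    """k splits: k model fits. With k == len(data) this is leave-one-out."""
    folds = [data[i::k] for i in range(k)]
    wrong = 0.0
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        wrong += error_rate(train, test) * len(test)
    return wrong / len(data), k

# Simulated imbalanced univariate data (assumed sizes and means).
rng = random.Random(0)
data = [(rng.gauss(0, 1), 0) for _ in range(90)]
data += [(rng.gauss(2, 1), 1) for _ in range(10)]
rng.shuffle(data)

results = {
    "hold-out": hold_out_error(data),
    "10-fold": k_fold_error(data, 10),
    "leave-one-out": k_fold_error(data, len(data)),
}
for name, (err, fits) in results.items():
    print(f"{name}: estimated error rate = {err:.3f}, model fits = {fits}")
```

The fit counts reflect the computational trade-off described in the abstract: hold-out trains once, k-fold trains k times, and leave-one-out trains once per observation. The error estimates from a single simulated dataset like this one should not be read as evidence for or against any method.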

Published

2023-11-30

How to Cite

Lifia Zullani, Dodi Vionanda, Syafriandi Syafriandi, & Dina Fitria. (2023). Comparison of Error Rate Prediction in CART for Imbalanced Data. UNP Journal of Statistics and Data Science, 1(5), 464–472. https://doi.org/10.24036/ujsds/vol1-iss5/117
