Comparison of Error Rate Prediction in CART for Imbalanced Data

Lifia Zullani; Dodi Vionanda; Syafriandi Syafriandi; Dina Fitria

doi:10.24036/ujsds/vol1-iss5/117

Authors

Lifia Zullani, Padang State University
Dodi Vionanda
Syafriandi Syafriandi
Dina Fitria

DOI:

https://doi.org/10.24036/ujsds/vol1-iss5/117

Keywords:

CART, cross validation, error rate prediction, imbalanced data

Abstract

CART is a tree-based classification algorithm. A CART model is a tree consisting of a root node, internal nodes, and terminal nodes. The accuracy of a CART model can be assessed by estimating its prediction error. One common approach to estimating error rates is cross-validation, which has three main variants: leave-one-out, hold-out, and k-fold. These methods differ in how they divide the data into training and testing sets, so each has advantages and disadvantages. Hold-out cannot guarantee that the training set represents the entire dataset; leave-one-out is time-consuming and computationally expensive because the model must be trained as many times as there are data points; and k-fold requires longer computation time because the training algorithm must be run k times. In practice, data are often imbalanced, meaning that the classes contain different numbers of observations. In CART, imbalanced data affects the prediction results. This research compares error rate prediction methods for the CART model on imbalanced data. The study uses three types of data: univariate, bivariate, and multivariate, obtained from differences in population means and in the correlations between independent variables. The results indicate that k-fold is the most suitable error rate prediction method for CART with imbalanced data.
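The three cross-validation schemes named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes a univariate, two-class synthetic sample with a 90/10 class ratio, and it substitutes a depth-1 tree (a single Gini split) for a full CART model to stay short. All function names and parameter values (`make_data`, `fit_stump`, `hold_out`, `k_fold`, `leave_one_out`, the class sizes, and the mean shift) are hypothetical choices made for illustration.

```python
import random

def make_data(n_major=90, n_minor=10, shift=2.0, seed=0):
    """Synthetic imbalanced sample: class 0 ~ N(0, 1), class 1 ~ N(shift, 1)."""
    rng = random.Random(seed)
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_major)]
    data += [(rng.gauss(shift, 1.0), 1) for _ in range(n_minor)]
    rng.shuffle(data)
    return data

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def majority(labels):
    """Most frequent label (ties go to class 0)."""
    return int(2 * sum(labels) > len(labels))

def fit_stump(train):
    """Depth-1 tree: pick the threshold that minimises weighted Gini impurity."""
    ys = [y for _, y in train]
    best = (float("inf"), None, majority(ys), majority(ys))
    xs = sorted({x for x, _ in train})
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in train if x <= t]
        right = [y for x, y in train if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(train)
        if score < best[0]:
            best = (score, t, majority(left), majority(right))
    return best[1:]  # (threshold, left_label, right_label)

def predict(model, x):
    t, left_label, right_label = model
    if t is None:  # no split was possible; fall back to the majority class
        return left_label
    return left_label if x <= t else right_label

def error_rate(train, test):
    """Fraction of test observations misclassified by a stump fit on train."""
    model = fit_stump(train)
    return sum(predict(model, x) != y for x, y in test) / len(test)

def hold_out(data, test_fraction=0.3):
    """Single split: the first test_fraction of the data is held out."""
    n_test = int(len(data) * test_fraction)
    return error_rate(data[n_test:], data[:n_test])

def k_fold(data, k=5):
    """Each observation is tested exactly once; the model is trained k times."""
    wrong = 0.0
    for i in range(k):
        test = data[i::k]
        train = [d for j, d in enumerate(data) if j % k != i]
        wrong += error_rate(train, test) * len(test)
    return wrong / len(data)

def leave_one_out(data):
    """k-fold with k equal to the number of observations."""
    return k_fold(data, k=len(data))

if __name__ == "__main__":
    sample = make_data()
    print("hold-out     :", hold_out(sample))
    print("5-fold       :", k_fold(sample, k=5))
    print("leave-one-out:", leave_one_out(sample))
```

The sketch makes the computational trade-off visible: `hold_out` fits the model once, `k_fold` fits it k times, and `leave_one_out` fits it once per observation. It does not reproduce the study's simulation design or its results.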
