Handling Multiclass Imbalance in the Sample Area Sampling Frame Survey Dataset using the SCUT Method

Authors

  • Wilia Sondriva, Universitas Negeri Padang
  • Yenni Kurniawati, Universitas Negeri Padang
  • Nonong Amalita
  • Admi Salma

DOI:

https://doi.org/10.24036/ujsds/vol2-iss2/163

Keywords:

Imbalance, Multiclass, SMOTE and Cluster-based Undersampling Technique (SCUT)

Abstract

The Area Sampling Frame (ASF) is a survey used by the Indonesian government to measure rice productivity. Accurate, high-quality rice productivity data are essential, which makes the ASF survey an important data source. The ASF survey data are extremely imbalanced across classes, so the imbalance must be handled. The SMOTE and Cluster-based Undersampling Technique (SCUT) is one method for addressing it: SCUT combines oversampling with SMOTE and undersampling with cluster-based undersampling (CUT). After SCUT is applied, the number of data points in each class is balanced. A two-sample mean test is then conducted to compare the means of the original dataset with those of the dataset after handling. For the early vegetative, late vegetative, and harvest phases, the means do not differ significantly between the original dataset and the dataset after handling; for the generative phase, the means differ significantly. Overall, data generated synthetically with the SCUT method generally preserve the mean characteristics of the original data.
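The balancing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common SCUT convention of resampling every class to the mean class size, uses a simplified SMOTE-style interpolation, and uses a plain k-means for the clustering step (the clustering algorithm is not stated in the abstract). The function names, the toy data, and the class sizes are all hypothetical.

```python
import random
from math import dist


def smote_like(points, n_new, k=3, rng=random):
    """Create n_new synthetic points by interpolating toward a near neighbour."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        # up to k nearest neighbours of base within the same class
        neighbours = sorted((p for p in points if p is not base),
                            key=lambda p: dist(base, p))[:k]
        if not neighbours:  # single-point class: fall back to duplication
            synthetic.append(tuple(base))
            continue
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def kmeans(points, n_clusters, n_iter=10, rng=random):
    """Plain k-means (assumed stand-in for the clustering step)."""
    centres = rng.sample(points, n_clusters)
    clusters = []
    for _ in range(n_iter):
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: dist(p, centres[i]))
            clusters[nearest].append(p)
        centres = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centres[i]
                   for i, cl in enumerate(clusters)]
    return [cl for cl in clusters if cl]


def cluster_undersample(points, target, n_clusters=2, rng=random):
    """Keep `target` original points, drawn in turn from each cluster."""
    clusters = kmeans(points, min(n_clusters, len(points)), rng=rng)
    for cl in clusters:
        rng.shuffle(cl)
    kept = []
    while len(kept) < target:
        for cl in clusters:
            if cl and len(kept) < target:
                kept.append(cl.pop())
    return kept


def scut(data, seed=0):
    """Resample every class to the mean class size (assumed target)."""
    rng = random.Random(seed)
    target = round(sum(len(v) for v in data.values()) / len(data))
    balanced = {}
    for label, points in data.items():
        if len(points) < target:    # minority class: oversample
            extra = smote_like(points, target - len(points), rng=rng)
            balanced[label] = list(points) + extra
        elif len(points) > target:  # majority class: undersample
            balanced[label] = cluster_undersample(list(points), target, rng=rng)
        else:
            balanced[label] = list(points)
    return balanced


# Toy two-feature data; labels and sizes are made up for illustration.
demo_rng = random.Random(1)
sizes = [("early_vegetative", 0, 40), ("late_vegetative", 3, 12),
         ("generative", 6, 6), ("harvest", 9, 2)]
toy = {name: [(demo_rng.gauss(mu, 1.0), demo_rng.gauss(mu, 1.0)) for _ in range(n)]
       for name, mu, n in sizes}
balanced = scut(toy)
print({label: len(points) for label, points in balanced.items()})
```

With these toy sizes (40, 12, 6, and 2 points), the mean class size is 15, so every class ends up with 15 points: the largest class keeps 15 of its original points, and the smaller classes keep all their points and gain synthetic ones.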

Published

2024-05-31

How to Cite

Sondriva, W., Kurniawati, Y., Amalita, N., & Salma, A. (2024). Handling Multiclass Imbalance in the Sample Area Sampling Frame Survey Dataset using the SCUT Method. UNP Journal of Statistics and Data Science, 2(2), 159–164. https://doi.org/10.24036/ujsds/vol2-iss2/163
