Handling Multiclass Imbalance in the Sample Area Sampling Frame Survey Dataset using the SCUT Method
DOI: https://doi.org/10.24036/ujsds/vol2-iss2/163
Keywords: Imbalance, Multiclass, SMOTE and Cluster-based Undersampling Technique (SCUT)
Abstract
The Area Sampling Frame (ASF) is a survey used by the Indonesian government to measure rice productivity in Indonesia. ASF survey data are important because accurate, high-quality rice productivity data are urgently needed. The ASF survey data are extremely imbalanced across classes, so this imbalance must be handled. The SMOTE and Cluster-based Undersampling Technique (SCUT) is a method for addressing such dataset imbalance: it combines oversampling with SMOTE and undersampling with the cluster-based undersampling technique (CUT). After applying SCUT, every class contains the same number of data points. A two-sample mean test is then conducted to compare the means of the original dataset with those of the dataset after handling. In the early vegetative, late vegetative, and harvest phases, the means of the original and handled datasets do not differ significantly, whereas in the generative phase they do differ significantly. Therefore, data generated synthetically with the SCUT method generally preserve the mean characteristics of the original data.
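The resampling step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Every class is brought to the mean class size, which is the usual SCUT target. Smaller classes are oversampled with SMOTE-style interpolation between a point and one of its k nearest same-class neighbours. Larger classes are clustered with a small k-means routine and then sampled round-robin across clusters. The function names (`smote`, `kmeans`, `cluster_undersample`, `scut`), the neighbour count, the cluster count, and the use of k-means for the clustering step are all hypothetical choices.

```python
import random
from collections import Counter, defaultdict


def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def smote(points, n_new, k=5, rng=random):
    """SMOTE-style oversampling: interpolate between a random point and
    one of its k nearest neighbours from the same class."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(points))
        base = points[i]
        others = [p for j, p in enumerate(points) if j != i]
        neighbours = sorted(others, key=lambda p: euclid(base, p))[:k]
        if not neighbours:  # a one-point class can only be duplicated
            synthetic.append(list(base))
            continue
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> nb
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def kmeans(points, k, iters=20, rng=random):
    """Tiny k-means (assumed clustering step); returns non-empty clusters."""
    centroids = [list(p) for p in rng.sample(points, k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: euclid(p, centroids[c]))
            clusters[idx].append(p)
        for c in range(k):
            if clusters[c]:
                centroids[c] = [sum(col) / len(clusters[c]) for col in zip(*clusters[c])]
    return [c for c in clusters if c]


def cluster_undersample(points, target, k=3, rng=random):
    """Cluster the class, then keep `target` original points, drawing
    from the shuffled clusters in round-robin order (assumed rule)."""
    clusters = [rng.sample(c, len(c)) for c in kmeans(points, min(k, len(points)), rng=rng)]
    kept = []
    while len(kept) < target:
        for c in clusters:
            if c and len(kept) < target:
                kept.append(c.pop())
    return kept


def scut(X, y, k_smote=5, k_clusters=3, seed=0):
    """Resample every class to the mean class size (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(list(row))
    target = round(len(X) / len(by_class))  # mean number of points per class
    X_out, y_out = [], []
    for label, pts in by_class.items():
        if len(pts) < target:  # minority class: oversample
            pts = pts + smote(pts, target - len(pts), k_smote, rng)
        elif len(pts) > target:  # majority class: undersample
            pts = cluster_undersample(pts, target, k_clusters, rng)
        X_out.extend(pts)
        y_out.extend([label] * len(pts))
    return X_out, y_out


if __name__ == "__main__":
    # Toy two-feature data with three imbalanced classes (40 / 4 / 16 rows).
    X = ([[float(i), float(i % 7)] for i in range(40)]
         + [[100.0 + i, 5.0] for i in range(4)]
         + [[50.0 + i, 1.0] for i in range(16)])
    y = ["a"] * 40 + ["b"] * 4 + ["c"] * 16
    X_bal, y_bal = scut(X, y)
    print(Counter(y_bal))
```

Undersampled classes keep only original rows, while oversampled classes keep all original rows plus synthetic ones. The two-sample mean comparison described in the abstract could then be run per phase and per feature, for example with `scipy.stats.ttest_ind`. The abstract does not say which specific test the authors used.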
License
Copyright (c) 2024 Wilia Sondriva, Yenni Kurniawati, Nonong Amalita, Admi Salma
This work is licensed under a Creative Commons Attribution 4.0 International License.