Hybrid LBFA-Based Feature Selection for Improving Machine Learning Classification Performance in Heart Disease Prediction

Authors

  • Hana Azizah, Universitas Brawijaya
  • Eni Sumarminingsih, Universitas Brawijaya
  • Adji Achmad Rinaldo Fernandes, Universitas Brawijaya

DOI:

https://doi.org/10.24036/ujsds/vol4-iss2/478

Keywords:

Feature Augmentation, Feature Selection, Imbalanced Data, Heart Disease, Prediction, Machine Learning

Abstract

Feature selection and feature engineering play an important role in improving the predictive performance of machine learning models, particularly for real-world datasets that often contain redundant variables and imbalanced class distributions. However, many feature augmentation approaches are implemented without a consistent preprocessing structure, which may cause information leakage or suboptimal feature representations. To address this issue, this study proposes a hybrid classification framework based on Logit-Based Feature Augmentation (LBFA) integrated with a structured preprocessing pipeline. In the proposed framework, the original predictors are first filtered using CatBoost-based feature selection to retain the ten most relevant variables. Feature augmentation is then performed through two complementary transformations: the LOGIT transformation and the Log Density Ratio (LDR). Each transformation is constructed in a dedicated preprocessing branch to ensure methodological consistency: one-hot encoding is applied only when constructing the LOGIT features, while numerical standardization is used only when estimating the LDR features. The augmented features are then combined with the selected original predictors to form several input configurations, which are used to train gradient boosting classifiers, including XGBoost, LightGBM, and CatBoost. Experiments are conducted on the Heart Disease dataset, and model performance is evaluated using accuracy, precision, recall, specificity, and F1-score. The best performance is achieved by the LightGBM + LOGIT + LDR model, with an accuracy of 0.8493, precision of 0.2531, recall of 0.8189, specificity of 0.8512, and F1-score of 0.9874, indicating that integrating feature selection with consistent feature augmentation can improve predictive performance in imbalanced classification problems.
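The two augmentation branches described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the LOGIT feature is assumed to be the log-odds from a logistic regression fitted on one-hot encoded categorical predictors, and the LDR feature is assumed to be a class-conditional log density ratio, log p(x | y=1) - log p(x | y=0), estimated with independent Gaussians on standardized numeric predictors. The helper names (`one_hot`, `fit_logit`, `fit_ldr`, `augment`) and the synthetic data are hypothetical, and the CatBoost selection step and the boosting classifiers are omitted. Each branch learns its encoder, scaler, and model parameters from training rows only, which is one way to avoid the information leakage the abstract mentions.

```python
import numpy as np

def one_hot(X_cat, categories=None):
    """One-hot encode integer-coded categorical columns (LOGIT branch only).
    Categories are learned from training data; unseen test values become all-zero rows."""
    if categories is None:
        categories = [np.unique(X_cat[:, j]) for j in range(X_cat.shape[1])]
    blocks = [(X_cat[:, [j]] == cats[None, :]).astype(float)
              for j, cats in enumerate(categories)]
    return np.hstack(blocks), categories

def fit_logit(Z, y, lr=0.1, n_iter=2000, l2=1e-3):
    """Assumed LOGIT model: L2-regularized logistic regression via gradient descent."""
    Z1 = np.hstack([np.ones((Z.shape[0], 1)), Z])   # prepend intercept column
    w = np.zeros(Z1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z1 @ w))
        grad = Z1.T @ (p - y) / len(y) + l2 * np.r_[0.0, w[1:]]  # intercept not penalized
        w -= lr * grad
    return w

def logit_feature(Z, w):
    """Log-odds score used as the augmented LOGIT feature."""
    return np.hstack([np.ones((Z.shape[0], 1)), Z]) @ w

def fit_ldr(X_num, y):
    """Assumed LDR model (LDR branch only): standardize, then fit per-class diagonal Gaussians."""
    mean, std = X_num.mean(axis=0), X_num.std(axis=0) + 1e-12
    Xs = (X_num - mean) / std
    params = {c: (Xs[y == c].mean(axis=0), Xs[y == c].std(axis=0) + 1e-6) for c in (0, 1)}
    return mean, std, params

def ldr_feature(X_num, mean, std, params):
    """log p(x | y=1) - log p(x | y=0); the shared -0.5*log(2*pi) terms cancel."""
    Xs = (X_num - mean) / std
    def loglik(c):
        mu, sd = params[c]
        return (-0.5 * ((Xs - mu) / sd) ** 2 - np.log(sd)).sum(axis=1)
    return loglik(1) - loglik(0)

def augment(X_cat_tr, X_num_tr, y_tr, X_cat_te, X_num_te):
    """Fit both branches on training rows, then append LOGIT and LDR columns."""
    Z_tr, cats = one_hot(X_cat_tr)
    Z_te, _ = one_hot(X_cat_te, cats)
    w = fit_logit(Z_tr, y_tr)
    mean, std, params = fit_ldr(X_num_tr, y_tr)
    def build(X_cat, X_num, Z):
        return np.column_stack([X_cat, X_num, logit_feature(Z, w),
                                ldr_feature(X_num, mean, std, params)])
    return build(X_cat_tr, X_num_tr, Z_tr), build(X_cat_te, X_num_te, Z_te)

# Hypothetical imbalanced toy data (about 20% positives), not the Heart Disease dataset
rng = np.random.default_rng(0)
n = 400
y = (rng.random(n) < 0.2).astype(int)
X_cat = np.column_stack([rng.integers(0, 3, n) + y, rng.integers(0, 2, n)])
X_num = np.column_stack([rng.normal(y * 1.5, 1.0), rng.normal(0, 1, n)])
tr, te = slice(0, 300), slice(300, n)
A_tr, A_te = augment(X_cat[tr], X_num[tr], y[tr], X_cat[te], X_num[te])
print(A_tr.shape, A_te.shape)  # (300, 6) (100, 6)
```

In this sketch the training rows receive in-sample LOGIT and LDR values; a more cautious variant would compute them out of fold, so that the downstream classifier does not learn from optimistically fitted scores.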
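For reference, the five evaluation metrics named in the abstract can be computed from binary confusion-matrix counts as in the sketch below. The function name and the counts are hypothetical (not taken from the paper); they only illustrate how an imbalanced test set can pair high accuracy with low precision.

```python
def classification_metrics(tp, fp, tn, fn):
    """Binary metrics from confusion-matrix counts (positive class = disease)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity, true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

# Hypothetical counts: 100 positives and 900 negatives
m = classification_metrics(tp=80, fp=120, tn=780, fn=20)
print({k: round(v, 4) for k, v in m.items()})
# accuracy 0.86, precision 0.4, recall 0.8, specificity 0.8667, f1 0.5333
```

Because F1 is the harmonic mean of precision and recall, it always lies between those two values.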

Published

2026-05-31

How to Cite

Azizah, H., Sumarminingsih, E., & Fernandes, A. A. R. (2026). Hybrid LBFA-Based Feature Selection for Improving Machine Learning Classification Performance in Heart Disease Prediction. UNP Journal of Statistics and Data Science, 4(2), 167–177. https://doi.org/10.24036/ujsds/vol4-iss2/478