Hybrid LBFA-Based Feature Selection for Improving Machine Learning Classification Performance in Heart Disease Prediction

Authors

  • Hana Azizah, Universitas Brawijaya
  • Eni Sumarminingsih, Universitas Brawijaya
  • Adji Achmad Rinaldo Fernandes, Universitas Brawijaya

DOI:

https://doi.org/10.24036/ujsds/vol4-iss2/478

Keywords:

Feature Augmentation, Heart Disease Prediction, LOGIT Transformation, Log Density Ratio, LightGBM, XGBoost

Abstract

Feature selection and feature engineering are essential steps in developing accurate machine learning models, particularly when dealing with imbalanced datasets and redundant variables. However, feature augmentation methods are often applied without a consistent preprocessing strategy, which can reduce model reliability and increase the risk of information leakage. To address this issue, this study proposes a hybrid classification framework that combines CatBoost-based feature selection with two feature augmentation techniques: LOGIT transformation and Log Density Ratio (LDR). A structured preprocessing pipeline was designed to ensure consistency throughout the modeling process. One-hot encoding was applied for the LOGIT transformation, while numerical standardization was used for LDR estimation. The generated features were then combined with the selected original variables to produce richer feature representations for classification. The proposed framework was evaluated on the Heart Disease dataset with three gradient boosting algorithms: LightGBM, XGBoost, and CatBoost. Model performance was assessed using accuracy, precision, sensitivity, specificity, and F1-score. The results show that the proposed approach consistently improved classification performance across all three models. LightGBM combined with LOGIT and LDR achieved the best performance, with an accuracy of 0.9618, precision of 0.9485, sensitivity of 0.9620, specificity of 0.9625, and F1-score of 0.9552. These findings suggest that combining feature selection with structured feature augmentation can substantially improve predictive performance in imbalanced classification tasks.
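For readers less familiar with the five evaluation metrics named in the abstract, the following is a minimal sketch of how they are conventionally computed from binary confusion-matrix counts. It is not the authors' evaluation code: the function names `classification_metrics` and `f1_from` and the example counts are hypothetical, chosen only for illustration. The last lines check that the reported F1-score is the harmonic mean of the reported precision and sensitivity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the five metrics from binary confusion-matrix counts.

    Assumes every denominator is non-zero (each class is both present
    and predicted at least once); names here are illustrative only.
    """
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall or true positive rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),  # true negative rate
        "f1": f1_from(precision, sensitivity),
    }


def f1_from(precision, sensitivity):
    """F1-score as the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical counts, not taken from the study.
example = classification_metrics(tp=40, fp=10, tn=45, fn=5)

# Consistency check on the values reported in the abstract:
# precision 0.9485 and sensitivity 0.9620 should give F1 close to 0.9552.
reported_f1 = f1_from(0.9485, 0.9620)
```

On imbalanced data, sensitivity and specificity are reported separately because accuracy alone can look high when a model mostly predicts the majority class.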

Published

2026-05-31

How to Cite

Azizah, H., Sumarminingsih, E., & Fernandes, A. A. R. (2026). Hybrid LBFA-Based Feature Selection for Improving Machine Learning Classification Performance in Heart Disease Prediction. UNP Journal of Statistics and Data Science, 4(2), 167–177. https://doi.org/10.24036/ujsds/vol4-iss2/478