Penalized Spline Semiparametric Logistic Regression for Modelling Coronary Heart Disease Risk Based on Demographic and Lifestyle Factors

  • Naufal Ramadhan Al Akhwal Siregar
  • Nur Chamidah (Airlangga University)
  • Marisa Rifada
  • Budi Lestari
  • Dursun Aydin (Department of Statistics, Faculty of Science, Muğla Sıtkı Koçman University, Muğla 48000, Turkey; Department of Mathematics, University of Wisconsin, Oshkosh Algoma Blvd, Oshkosh, WI 54901, USA)
Keywords: Coronary Heart Disease, Penalized Spline Estimator, Semiparametric Binary Logistic Regression

Abstract

This study presents a novel application of Penalized Spline Semiparametric Binary Logistic Regression (PS-SBLR) to evaluate Coronary Heart Disease (CHD) risk. By combining parametric and nonparametric components, the established PS-SBLR method extends classical logistic regression to model linear and non-linear relationships simultaneously. The nonparametric component is estimated with a penalized spline estimator, which produces smooth, adaptive curves. Generalized Approximate Cross Validation (GACV) is used to select the smoothing parameter, which avoids the nonconvergence issues often encountered with standard CV or GCV methods. While the theoretical foundation of PS-SBLR has shown strong potential in medical research, it has not yet been applied within an integrated framework for both CHD prediction and prevention. To address this gap, we developed a PS-SBLR predictive framework using real-world data to enhance the accuracy and efficiency of CHD risk prediction. This applied approach provides practical insights for the management and mitigation of CHD risk. The resulting predictive model achieved 84.38% accuracy on the training data with an AUC of 0.90, and 87.5% accuracy on the test data with an AUC of 0.98, demonstrating excellent performance in distinguishing CHD risk profiles. The analysis revealed that age and sugar consumption show a positive linear association with CHD, whereas continuous variables such as body weight and stress levels exhibit significant non-linear relationships.
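
For orientation, the model structure described in the abstract can be sketched in its usual textbook form. The abstract does not state the exact specification, so everything below is an assumption for illustration: the symbols ($x_i$, $t_i$, $\beta$, $\alpha_j$, $u_k$, $\kappa_k$, $\lambda$), the truncated power basis of degree $p$ with $K$ knots, and the ridge-type penalty on the knot coefficients. Only one nonparametric covariate is shown; with several (for example body weight and stress level), $f$ would typically be replaced by a sum of such terms.

```latex
% Semiparametric binary logistic model: linear part plus one smooth term.
% x_i holds the parametric covariates (no intercept; it is absorbed in f),
% t_i is the covariate modelled nonparametrically.
\[
  \operatorname{logit}(\pi_i)
  = \log\frac{\pi_i}{1-\pi_i}
  = x_i^{\top}\beta + f(t_i),
  \qquad \pi_i = P(y_i = 1 \mid x_i, t_i).
\]

% Penalized spline representation of f (truncated power basis, assumed here).
\[
  f(t) = \sum_{j=0}^{p} \alpha_j t^{j}
       + \sum_{k=1}^{K} u_k \,(t-\kappa_k)_{+}^{p},
  \qquad (z)_{+} = \max(z, 0).
\]

% Penalized log-likelihood: Bernoulli log-likelihood minus a roughness
% penalty on the knot coefficients, with smoothing parameter lambda >= 0.
\[
  \ell_{\lambda}(\beta,\alpha,u)
  = \sum_{i=1}^{n}\Bigl[\, y_i\,\eta_i - \log\bigl(1+e^{\eta_i}\bigr) \Bigr]
  - \frac{\lambda}{2}\sum_{k=1}^{K} u_k^{2},
  \qquad \eta_i = x_i^{\top}\beta + f(t_i).
\]
```

In this formulation, a larger $\lambda$ shrinks the knot coefficients $u_k$ toward zero, so the fitted $f$ approaches a degree-$p$ polynomial, while $\lambda = 0$ gives an unpenalized regression spline. The abstract states that the smoothing parameter is selected by GACV; the criterion itself is not reproduced here.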
Published
2026-03-07
How to Cite
Siregar, N. R. A. A., Chamidah, N., Rifada, M., Lestari, B., & Aydin, D. (2026). Penalized Spline Semiparametric Logistic Regression for Modelling Coronary Heart Disease Risk Based on Demographic and Lifestyle Factors. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-3334
Section
Research Articles