Towards Transparent AI for Lung Cancer Diagnosis: A Dual-Pipeline Explainable Framework Using Clinical and CT Imaging Data
DOI:
https://doi.org/10.19139/soic-2310-5070-3399
Keywords:
Explainable AI (XAI); Machine Learning; Deep Learning; Transparency; Lung Cancer Prediction
Abstract
In recent years, artificial intelligence (AI) has shown promising performance in medical diagnosis; however, its clinical adoption remains limited by a lack of interpretability. In this study, we propose a dual-pipeline explainable framework for lung cancer diagnosis using two independent data modalities: structured clinical data and CT imaging data. For the clinical data, we compared several machine learning models, including LightGBM, CatBoost, XGBoost, Random Forest, Logistic Regression, K-Nearest Neighbors, and Naïve Bayes. For the imaging data, we compared deep learning models, including VGG16, ResNet50, InceptionV3, MobileNetV2, DenseNet121, and Xception, on the IQ-OTH/NCCD dataset. To ensure reliable validation results, a strict patient-level split was used to avoid data leakage. LightGBM performed best on the clinical data, with an accuracy of 98.39% and an ROC-AUC of 0.99. MobileNetV2 performed best on the imaging data, with an accuracy of 0.97 while remaining highly computationally efficient. To improve interpretability, SHAP and LIME were used to analyze clinical feature importance, while Grad-CAM was used to highlight the discriminative regions in the CT images. The reliability of the explanations was verified through stability analysis using Spearman rank correlation, agreement analysis between SHAP and LIME, and review of the Grad-CAM visualizations by expert clinicians. The results demonstrate that the different XAI methods provide complementary insights, supporting the creation of transparent, reliable, and meaningful AI systems for lung cancer diagnosis.
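The abstract states that a strict patient-level split was used to avoid data leakage, but it does not describe how the split was implemented. The following is a minimal sketch of the general idea only, not the authors' code: samples are grouped by patient, and whole patients (rather than individual CT slices) are assigned to the training or test set. The function name `patient_level_split`, the `(patient_id, sample)` record layout, the 20% test fraction, and the toy data are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def patient_level_split(records, test_fraction=0.2, seed=0):
    """Split records so that every patient lands in exactly one subset.

    `records` is a list of (patient_id, sample) pairs. The pair layout,
    `test_fraction`, and `seed` are illustrative assumptions.
    """
    # Group all samples belonging to the same patient.
    by_patient = defaultdict(list)
    for patient_id, sample in records:
        by_patient[patient_id].append(sample)

    # Shuffle patients, not individual samples, so no patient is divided.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    # Hold out at least one patient for testing.
    n_test = max(1, round(len(patient_ids) * test_fraction))

    test = [(p, s) for p in patient_ids[:n_test] for s in by_patient[p]]
    train = [(p, s) for p in patient_ids[n_test:] for s in by_patient[p]]
    return train, test


# Hypothetical toy data: 5 patients with 3 CT slices each.
records = [(f"patient_{i}", f"slice_{i}_{j}") for i in range(5) for j in range(3)]
train, test = patient_level_split(records, test_fraction=0.2, seed=42)

train_patients = {p for p, _ in train}
test_patients = {p for p, _ in test}
```

Because the shuffle operates on patient identifiers, `train_patients` and `test_patients` never overlap, which is the property that prevents slices from one patient leaking across the split. In practice, scikit-learn's `GroupShuffleSplit` offers the same group-aware behavior.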
Published
2026-04-14
Section
Research Articles
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
How to Cite
Towards Transparent AI for Lung Cancer Diagnosis: A Dual-Pipeline Explainable Framework Using Clinical and CT Imaging Data. (2026). Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-3399