Scalable and robust machine learning framework for HIV classification using clinical and laboratory data

Publication details

Resource type:
WOS category:
PubMed category:

Indexed in: SCIE

Affiliations: [1] The Fourth Hospital of Hebei Medical University, Shijiazhuang, China. [2] Department of Mathematics, Xi'an Jiaotong-Liverpool University, Xi'an, China. [3] The Second Hospital of Hebei Medical University, Hebei, China. [4] School of International Business, Anhui International Studies University, Wuhu, Anhui, China. [5] Gezhi Future Research Institute, No.1501 Building L, HaiDian District, Beijing, China. [6] School of Systems and Computing, UNSW Australia, UNSW Canberra, R118, B15, Canberra, ACT, 2600, Australia.
Source:

Abstract:
Human Immunodeficiency Virus (HIV) is a retrovirus that weakens the immune system, increasing vulnerability to infections and cancers. HIV spreads primarily through unprotected sexual intercourse, needle sharing, and from mother to child during childbirth or breastfeeding. Early diagnosis and treatment are crucial to prevent progression from HIV to AIDS, which is associated with higher mortality. This study introduces a machine learning-based framework for classifying HIV infection, with the aim of supporting early intervention that limits disease progression and transmission risk and improves long-term health outcomes. First, the challenges posed by an imbalanced dataset are addressed with the Synthetic Minority Over-sampling Technique (SMOTE), which was chosen over two alternative methods based on its superior performance. Additionally, we enhance dataset quality by removing outliers using the interquartile range (IQR) method. A two-step feature selection process reduces the 22 original features to 12 critical variables. We evaluate five machine learning models and identify the Random Forest Classifier (RFC) and Decision Tree Classifier (DTC) as the most effective, as they achieve higher classification performance than the other models. By integrating these two models into a voting classifier, we achieve an overall accuracy of 89%, a precision of 90.84%, a recall of 87.63%, and an F1-score of 98.21%. The model is validated on multiple external datasets with varying instance counts, reinforcing its robustness. Furthermore, an analysis using only CD4 and CD8 cell counts, which are essential laboratory measurements for HIV monitoring, yields an accuracy of 87%, emphasizing the significance of these clinical features for the classification task.
These outcomes underscore the potential of combining machine learning techniques with critical clinical data to enhance the accuracy of HIV infection classification, ultimately contributing to improved patient management and treatment strategies. The findings also highlight the scalability of the approach, showing that it can be efficiently adapted for large-scale use across healthcare environments, including those with limited resources. © 2025. The Author(s).
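
As a rough illustration of two steps named in the abstract, the sketch below shows IQR-based outlier removal and the standard F1 formula (the harmonic mean of precision and recall). It is not the authors' code. The abstract does not give the fence multiplier, the quantile method, or the column names, so the conventional multiplier of 1.5, the "inclusive" quantile method, and the names `iqr_bounds`, `remove_outliers`, and `f1_from_precision_recall` are all assumptions made for this example.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is the conventional choice, assumed here; the abstract
    does not state which multiplier was used.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows, column, k=1.5):
    """Keep only the rows whose `column` value lies inside the fences.

    `rows` is a list of dicts; `column` is a hypothetical feature name.
    """
    lower, upper = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if lower <= row[column] <= upper]


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

Because F1 is a harmonic mean, it always lies between precision and recall. Applying `f1_from_precision_recall` to the reported precision (90.84%) and recall (87.63%) gives roughly 89.21%, so the 98.21% stated in the abstract may be a digit transposition; the published article would need to be checked to confirm this.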

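Language:
WOS:
PubMed ID:
CAS journal ranking:
Edition for the year of publication [2025]:
Major category | Tier 3, Multidisciplinary journals
Subcategory | Tier 3, Multidisciplinary journals
Latest edition [2025]:
Major category | Tier 3, Multidisciplinary journals
Subcategory | Tier 3, Multidisciplinary journals
JCR quartile:
Edition for the year of publication [2024]:
Q1 MULTIDISCIPLINARY SCIENCES
Latest edition [2024]:
Q1 MULTIDISCIPLINARY SCIENCES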

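Impact factor: latest [2024 edition]; latest five-year average; year of publication [2025 edition]; five-year average for the year of publication; year before publication [2024 edition]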

First author:
First author affiliation: [1] The Fourth Hospital of Hebei Medical University, Shijiazhuang, China.
Corresponding author:
Corresponding author affiliations: [5] Gezhi Future Research Institute, No.1501 Building L, HaiDian District, Beijing, China. [6] School of Systems and Computing, UNSW Australia, UNSW Canberra, R118, B15, Canberra, ACT, 2600, Australia.
Recommended citation (GB/T 7714):
APA:
MLA:
