Prediction of Diabetes Risk in Young and Middle-aged Adults: Machine Learning Analysis based on Health Behavior and Physiological Indicators

Authors

  • Ruohan Zhang

DOI:

https://doi.org/10.54097/6y8zk455

Keywords:

Diabetes; logistic regression; random forest.

Abstract

Diabetes is one of the most common chronic illnesses in the world, and in recent years its prevalence has been increasing among young and middle-aged people. This article aims to build a diabetes prediction model using a balanced dataset from the 2015 Behavioral Risk Factor Surveillance System (BRFSS), focusing mainly on health behavior and physiological indicators. Two machine learning models, logistic regression and random forest, were constructed, and their accuracy, F1-score, and confusion matrices were compared to select the more accurate model. The results indicate that logistic regression performs better on this dataset. However, its accuracy of 84.42% is not high enough for practical use: its predictions include 82 false positives (FPs) and 228 false negatives (FNs). Based on these findings, future model construction should consider more up-to-date variables, different parameter selections, and other prediction models (such as k-nearest neighbors and decision trees).
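The metrics named in the abstract (accuracy, F1-score, confusion matrix) can be related to one another with a short sketch. The Python below is illustrative only and is not the author's code: the `ConfusionMatrix` class and the `implied_sample_size` helper are hypothetical names introduced here. The only figures taken from the abstract are the 82 false positives, 228 false negatives, and 84.42% accuracy of the logistic regression model. The true positive and true negative counts are not reported, so precision, recall, and F1 cannot be reproduced from the abstract alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (positive class = diabetes)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def accuracy(self) -> float:
        # Share of all predictions that are correct.
        return (self.tp + self.tn) / self.total

    def precision(self) -> float:
        # Share of predicted positives that are truly positive.
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        # Share of true positives that the model detects.
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


def implied_sample_size(fp: int, fn: int, accuracy: float) -> float:
    """Solve accuracy = 1 - (fp + fn) / n for n (hypothetical helper)."""
    return (fp + fn) / (1.0 - accuracy)


# Figures reported in the abstract for the logistic regression model.
n = implied_sample_size(fp=82, fn=228, accuracy=0.8442)
print(round(n))  # about 1990 evaluated samples, an inference, not a reported value
```

Under these assumptions, the 310 reported errors and the 84.42% accuracy are consistent with an evaluation set of roughly 1990 records. The abstract also shows false negatives outnumbering false positives by nearly three to one, which matters for screening because a false negative corresponds to a missed case.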

Published

27-06-2025

How to Cite

Zhang, R. (2025). Prediction of Diabetes Risk in Young and Middle-aged Adults: Machine Learning Analysis based on Health Behavior and Physiological Indicators. Highlights in Science, Engineering and Technology, 144, 161-167. https://doi.org/10.54097/6y8zk455