Diabetes Detection Using Machine Learning Algorithm
Abstract
Diabetes is a chronic condition characterized by elevated blood glucose levels; it gives rise to several complications that can cause disability and a poor quality of life. The prevalence of diabetes has been observed to increase not only in affluent sections of society but also in socio-economically disadvantaged ones. This reflects the seriousness of the condition and the changing lifestyles of people worldwide. It is projected that by 2040 there will be 642 million cases of the disease globally. This research attempts to build a machine learning (ML) system that predicts a patient's risk of having diabetes. In the present study, two ML algorithms, logistic regression (LR) and k-nearest neighbors (KNN), were used. LR uses odds ratios (ORs) and p-values to identify diabetes risk factors, whereas KNN classifies a new case according to the labels of its nearest neighbors in the training data, measured by Euclidean distance. Evaluation metrics including precision, recall (sensitivity), and F1 score showed that LR predicted diabetes better than KNN. The overall accuracy of LR was 77%, compared with 72% for KNN. The macro-averaged values, which give equal weight to every class regardless of its size, also indicated better performance for LR. We suggest building a more comprehensive dataset that includes detailed lifestyle factors, which may help improve the performance of these models.
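The KNN classification step and the evaluation metrics named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the function names (`euclidean`, `knn_predict`, `per_class_metrics`, `macro_average`, `accuracy`), the choice of k = 3, and the two-feature toy data (a glucose-like and a BMI-like value, with label 1 meaning diabetic) are all hypothetical. Features are left unscaled for brevity, although in practice they are usually standardized so that no single feature dominates the Euclidean distance.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def per_class_metrics(y_true, y_pred, label):
    """Precision, recall (sensitivity) and F1 for one class, treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_average(y_true, y_pred):
    """Unweighted mean of per-class precision, recall and F1 (each class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [per_class_metrics(y_true, y_pred, c) for c in labels]
    return tuple(sum(s[i] for s in scores) / len(labels) for i in range(3))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data (glucose-like, BMI-like); NOT from the study's dataset.
train_X = [(85, 22.0), (90, 24.5), (100, 26.0), (150, 33.0), (165, 35.5), (180, 31.0)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(95, 23.0), (170, 34.0), (120, 30.0)]
test_y = [0, 1, 1]

preds = [knn_predict(train_X, train_y, q, k=3) for q in test_X]
print(preds)  # → [0, 1, 0]
print(accuracy(test_y, preds))
print(macro_average(test_y, preds))
```

In this toy run the third test case lies between the two clusters, is outvoted by two non-diabetic neighbors, and is misclassified. The macro average then weights the non-diabetic class (perfect recall, lower precision) and the diabetic class (perfect precision, lower recall) equally, regardless of how many samples each class contains.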