Mitigating Data Bias In Machine Learning: Enhancing Model Transparency Through Fairness-Aware Techniques
Abstract
As machine learning algorithms increasingly influence decision-making across domains such as healthcare and finance, bias in training data has become a critical challenge. Even a highly accurate AI system trained on a biased dataset can produce skewed predictions with unintended societal consequences, and can undermine the transparency and reliability of the system. This research explores biases in medical data used by machine learning models, which also raise ethical concerns. Bias here refers to systematic favouring or prejudice, often arising from imbalances in the medical records used to train algorithms. The results capture the effects of bias in a medical dataset and anticipate likely model outcomes while addressing the problem at its root. We used the MEPS medical dataset, which shows disparities across protected classes such as race, ethnicity, and age, and computed bias measures for the classification models' predictions. We further used open-source tools to generate reports on biased outcomes, quantifying bias with metrics such as Disparate Impact from IBM's fairness toolkit. The study compares four machine learning models: Logistic Regression, Decision Tree, Random Forest, and SVM, and examines how model complexity relates to bias in predictions. To mitigate bias, approaches such as reweighing, adversarial debiasing, reject option classification (ROC) post-processing, and prejudice remover were deployed, revealing trade-offs between bias reduction and model accuracy.
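To make the measurement and mitigation steps concrete, the sketch below shows how Disparate Impact can be computed on the MEPS dataset and how reweighing can be applied using IBM's open-source AIF360 toolkit, which provides both. Disparate Impact is the ratio Pr(favorable outcome | unprivileged group) / Pr(favorable outcome | privileged group); a value of 1.0 indicates parity, and values below roughly 0.8 are commonly flagged as disparate. This is a minimal sketch under assumptions: it uses AIF360's bundled MEPS panel-19 loader and the RACE-based group definitions from the standard AIF360 MEPS tutorial, which may differ from the exact setup used in this study.

```python
# Minimal sketch: measure Disparate Impact on MEPS and mitigate via
# reweighing with AIF360. Group definitions ({'RACE': 1} privileged)
# follow the AIF360 MEPS tutorial and are an assumption here.
from aif360.datasets import MEPSDataset19
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

privileged_groups = [{'RACE': 1}]    # White respondents
unprivileged_groups = [{'RACE': 0}]  # non-White respondents

# MEPS panel 19; requires the MEPS data files to be downloaded first
dataset = MEPSDataset19()

# Disparate Impact on the raw labels:
# DI = Pr(Y = favorable | unprivileged) / Pr(Y = favorable | privileged)
metric = BinaryLabelDatasetMetric(dataset,
                                  unprivileged_groups=unprivileged_groups,
                                  privileged_groups=privileged_groups)
print('Disparate impact before reweighing:', metric.disparate_impact())

# Pre-processing mitigation: reweighing assigns instance weights so that
# group membership and the label become statistically independent.
rw = Reweighing(unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)
dataset_rw = rw.fit_transform(dataset)

metric_rw = BinaryLabelDatasetMetric(dataset_rw,
                                     unprivileged_groups=unprivileged_groups,
                                     privileged_groups=privileged_groups)
print('Disparate impact after reweighing:', metric_rw.disparate_impact())
```

Because reweighing only adjusts instance weights before training, it leaves the downstream classifiers (Logistic Regression, Decision Tree, Random Forest, SVM) unchanged, which is what makes the accuracy-versus-fairness trade-off directly comparable across the mitigation methods listed above.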