Analysis of Aspects of ML Algorithms that Lead to Bias
Issues of algorithmic fairness and bias have received a lot of attention in AI and ML research in recent years. There are two main sources of bias in ML:

Negative Legacy: the bias is present in the training data, due to poor sampling, incorrect labelling or discriminatory practices in the past.
Underestimation: the classifier underfits the data, focusing on strong signals and missing more subtle phenomena.

In most cases the data (negative legacy), rather than the algorithm itself, is the source of bias. Because fairness research focuses on achieving fair outcomes whatever the source of the problem, the underestimation side of algorithmic bias has not received much attention. However, the algorithmic side is important because it is inextricably tied to regularisation, i.e. the extent to which the model fits (or overfits) the data. Overfitting occurs when the model fits noise in the training data, which reduces generalisation. ML practitioners expend a lot of effort avoiding overfitting, and measures that prevent overfitting can push a model towards underfitting and so accentuate underestimation.

This PhD research will focus on the algorithmic aspect of algorithmic bias and the relationship between model fitting and underestimation. An initial paper on this research is available on arXiv:

“Algorithmic Bias and Regularisation in Machine Learning”, Pádraig Cunningham and Sarah Jane Delany, https://arxiv.org/abs/2005.09052

For a wider perspective on research relating to fairness in ML, see the papers published at the ACM FAccT conferences: https://facctconference.org
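To make the notion of underestimation concrete, the sketch below compares how often a classifier predicts the favourable outcome for a group with how often that outcome actually occurs in the labels for that group. The function name `underestimation_score`, the ratio used and the toy data are all assumptions made for illustration; this is not presented as the exact measure used in the paper above.

```python
def underestimation_score(y_true, y_pred, group, target_group):
    """Ratio of predicted to actual positive counts within one group.

    Because both counts come from the same group, this equals the ratio
    of predicted to actual positive rates. A value below 1.0 means the
    classifier predicts the positive outcome for the group less often
    than it occurs in the labels.
    """
    actual = [y for y, g in zip(y_true, group) if g == target_group]
    predicted = [p for p, g in zip(y_pred, group) if g == target_group]
    if not actual:
        raise ValueError("no examples for the target group")
    actual_positives = sum(actual)
    if actual_positives == 0:
        raise ValueError("group has no positive labels; ratio is undefined")
    return sum(predicted) / actual_positives


# Toy data, made up purely for illustration: 1 = favourable outcome.
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
group = ["a", "a", "a", "a", "a", "a", "b", "b", "b", "b"]

score_a = underestimation_score(y_true, y_pred, group, "a")
score_b = underestimation_score(y_true, y_pred, group, "b")
print(score_a, score_b)
```

In this toy data the classifier reproduces the positive rate for group "a" but predicts only half of the positives that the labels contain for group "b". That gap is the kind of underestimation a model can show when it captures the strong signal in the data and misses the weaker one.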