Using ML to Signal Gender and the Use of Gendered Language
The focus of this project is to use machine learning and natural language processing to develop automatic techniques for identifying gender issues and bias in textual content. Such techniques are useful in a variety of application areas. In 2018, Amazon scrapped an internal AI recruitment model that showed significant bias against women: the model had been trained on the applications and CVs of successful applicants, most of whom were male, and so it 'learned' that successful candidates were typically male. Similarly, in the recent Labour leadership election in Britain, an analysis of the language used in news articles about the candidates revealed gender-related discrepancies in how they were described. The single male candidate was more likely to be discussed in terms of professional employment, politics, and law and order, while the two female candidates were far more likely to be discussed in terms of their families, in particular their fathers. Earlier projects in this area have applied techniques pioneered by Google to identify gender issues in news articles and to detect racist sentiment.

The approach is based on the idea that gender attribution relies on language use rather than on language itself; many contextual factors must therefore also be considered when determining who is being referred to in a text. The representation of women in text can thus be argued to be important not only for reflecting real-life occurrences but also for understanding how perception and social roles influence language use. Hypotheses about gender stereotyping in textual content situate language use within a wider discourse about gender differences and the ways in which they are constructed. The goal, then, is that providing authors with recommended linguistic modifications and positive reinforcement about their written text will influence and change writing behaviour.
Signaling text content that suggests gendered language or gender bias can encourage and influence writing behaviour that is gender neutral. This project therefore explores supervised machine learning and natural language processing methods for identifying and predicting gender bias and gendered language in text. The model will harness stylometric features, gender-specific language patterns, discourses of gender difference and principles of cognitive perception about an author's identity, using NLP techniques to identify phrases, language, constructs or patterns in writing that signal gender bias or gender stereotyping.
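As a minimal sketch of the signalling step described above, the following Python function flags gendered terms in a text and proposes neutral alternatives. The lexicon and suggestions here are illustrative assumptions, not part of the project's design; in the proposed system, a trained model over stylometric and language-pattern features would replace this hard-coded term list.

```python
import re

# Illustrative lexicon of gendered terms and neutral alternatives.
# Assumption for this sketch only: a real system would learn such
# patterns from labelled data rather than hard-code them.
GENDERED_TERMS = {
    "chairman": "chairperson",
    "manpower": "workforce",
    "mankind": "humankind",
    "policeman": "police officer",
    "stewardess": "flight attendant",
}

def flag_gendered_language(text):
    """Return (matched term, suggested alternative, character offset)
    for each gendered term found in the text, in order of appearance."""
    flags = []
    for term, suggestion in GENDERED_TERMS.items():
        # Whole-word, case-insensitive matching.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, re.IGNORECASE):
            flags.append((match.group(0), suggestion, match.start()))
    return sorted(flags, key=lambda f: f[2])

flags = flag_gendered_language("The chairman praised the manpower of the team.")
for term, suggestion, offset in flags:
    print(f"'{term}' at offset {offset}: consider '{suggestion}'")
```

Output of this kind (the flagged span plus a suggested rewrite) is the form of "recommended linguistic modification" the project envisages presenting to authors.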