Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting

Maria De-Arteaga, Alexey Romanov, Hanna M. Wallach, Jennifer T. Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Cem Geyik, Krishnaram Kenthapadi, Adam Tauman Kalai

2019

We present a large-scale study of gender bias in occupation classification, a task where the use of machine learning may lead to negative outcomes in people's lives. We analyze the potential allocation harms that can result from semantic representation bias. To do so, we study the impact on occupation classification of including explicit gender indicators, such as first names and pronouns, in different semantic representations of online biographies. Additionally, we quantify the bias that remains when these indicators are "scrubbed," and describe proxy behavior that occurs in the absence of explicit gender indicators. As we demonstrate, differences in true positive rates between genders are correlated with existing gender imbalances in occupations, which may compound these imbalances.
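
The central quantity in the abstract, the difference in true positive rates between genders for each occupation, can be illustrated with a minimal sketch. The abstract does not give the exact definition, so this sketch assumes binary gender labels ("F" and "M") and defines the gap as TPR for "F" minus TPR for "M". The function name `tpr_gender_gap` and the toy records are hypothetical and are not data from the study.

```python
from collections import defaultdict


def tpr_gender_gap(records):
    """Per-occupation true positive rate gap between genders.

    records: iterable of (true_occupation, predicted_occupation, gender),
    with gender in {"F", "M"} (an assumption of this sketch).
    Returns {occupation: TPR_F - TPR_M} for occupations that have
    at least one example of each gender.
    """
    hits = defaultdict(int)    # correct predictions per (occupation, gender)
    totals = defaultdict(int)  # all examples per (occupation, gender)
    for true_occ, pred_occ, gender in records:
        totals[(true_occ, gender)] += 1
        if pred_occ == true_occ:
            hits[(true_occ, gender)] += 1

    gaps = {}
    occupations = {occ for occ, _ in totals}
    for occ in occupations:
        n_f = totals.get((occ, "F"), 0)
        n_m = totals.get((occ, "M"), 0)
        if n_f == 0 or n_m == 0:
            continue  # TPR is undefined for a gender with no examples
        tpr_f = hits.get((occ, "F"), 0) / n_f
        tpr_m = hits.get((occ, "M"), 0) / n_m
        gaps[occ] = tpr_f - tpr_m
    return gaps


# Made-up toy records: (true occupation, predicted occupation, gender).
toy_records = [
    ("surgeon", "surgeon", "M"), ("surgeon", "surgeon", "M"),
    ("surgeon", "physician", "M"), ("surgeon", "surgeon", "M"),
    ("surgeon", "surgeon", "F"), ("surgeon", "nurse", "F"),
    ("nurse", "nurse", "F"), ("nurse", "nurse", "F"),
    ("nurse", "nurse", "F"), ("nurse", "surgeon", "F"),
    ("nurse", "nurse", "M"), ("nurse", "physician", "M"),
]
gaps = tpr_gender_gap(toy_records)
```

In this toy data the classifier recovers the majority gender in each occupation more often than the minority gender, so the gap is negative for the occupation with more "M" examples and positive for the one with more "F" examples. That is the kind of pattern the abstract describes as correlating with existing occupational imbalances.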

Keywords:

Information retrieval
Cognitive psychology
Supervised learning
Computer science
BIOS
Proxy (climate)
semantic representation
gender bias
