Racist Technology in Action: Chest X-ray classifiers exhibit racial, gender and socio-economic bias

The development and use of AI and machine learning in healthcare are proliferating. A 2020 study showed that the chest X-ray datasets used to train diagnostic models are biased against certain racial, gender and socioeconomic groups.

The authors wanted to find out whether AI classifiers trained on public medical imaging datasets were fair across different patient subgroups. Their study drew upon an aggregation of three large image datasets, consisting of more than 700,000 images in total and covering over 129,000 patients labelled with sex, age, race and insurance type.

They found that the datasets contained significant patterns of bias and imbalance. Female patients suffered the highest disparity, even though the proportion of women was only slightly lower than that of men. White patients, who make up the majority with 67.6% of all X-ray images, were the most favoured subgroup, while Hispanic patients were the least favoured. Bias also existed against patients with Medicaid insurance, a minority of the population accounting for only 8.98% of X-ray images; the classifiers often gave Medicaid patients incorrect diagnoses.
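To make the notion of a subgroup disparity concrete, here is a minimal sketch of how such a gap could be measured: a classifier’s true positive rate is computed separately for each patient group and the groups are then compared. The column names, the toy data and the choice of true positive rate as the metric are illustrative assumptions, not the study’s actual code or data.

```python
import pandas as pd

# Minimal sketch: the toy data and column names are illustrative assumptions,
# not the study's actual data or code.
predictions = pd.DataFrame({
    "group":      ["White", "White", "White", "Hispanic", "Hispanic", "Hispanic"],
    "true_label": [1, 1, 0, 1, 1, 0],   # 1 = disease present on the X-ray
    "predicted":  [1, 1, 0, 1, 0, 1],   # classifier output
})

def true_positive_rate(sub: pd.DataFrame) -> float:
    """Share of truly positive cases the classifier correctly flags, for one group."""
    positives = sub[sub["true_label"] == 1]
    if positives.empty:
        return float("nan")
    return float((positives["predicted"] == 1).mean())

# Per-group rate; the disparity is the gap between the best- and worst-served groups.
rates = predictions.groupby("group").apply(true_positive_rate)
print(rates)
print("Disparity (max - min):", rates.max() - rates.min())
```

In the study itself such comparisons are made per diagnostic label and over hundreds of thousands of images; the sketch only illustrates the shape of the computation.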

Despite the study’s limitations, the authors assert that even the implication of bias is enough to warrant a closer look at the datasets and at any models trained on them. Dataset challenges may give rise to bias in algorithms: groups that are underdiagnosed are often those that experience more negative social determinants of health. Women, minorities and people of lower socioeconomic status, in particular, may have less access to healthcare than others. The authors underline that ‘debiasing’ techniques have their limits, as crucial biases are inherent in existing large public datasets. Additionally, classifiers lack sufficient peer review, which can lead to unintended consequences when they are deployed in the real world.

The lack of transparency around, and access to, the code, techniques and data used to train AI algorithms for diagnosing diseases continues to perpetuate inequalities. More crucially, as pointed out in this report by Balayn and Gürses, a focus on ‘debiasing’ shifts social and political problems into a technical domain, further removed from addressing the actual issues of socioeconomic inequality and discrimination.
