Attempts to eliminate bias through diversifying datasets? A distraction from the root of the problem

In this eloquent and haunting piece, Hito Steyerl weaves the eugenicist history of statistics together with its ongoing integration into machine learning. She explains why attempts to eliminate bias in facial recognition technology by diversifying datasets obscure the root of the problem: machine learning and automation are fundamentally reliant on extracting and exploiting human labour.

In 2016, Steyerl found her name in a Microsoft database (MS-Celeb-1M) composed of 10 million images of 100,000 people scraped from the internet. She realised she was part of an early training dataset for facial recognition algorithms. This dataset was later used by the developers of another dataset, Racial Faces in the Wild, to optimise racial classification. Their aim was to “fix” the issue of bias in facial recognition software and to diversify training data. The results were “ghostlike apparitions of racialized phenotypes, or a quasi-platonic idea of discrimination as such.” Diversified datasets intended to reduce racial bias have simply been repackaged to identify minorities more accurately, optimising machine vision for “better” racial classification. As she found out, tools optimised for non-Caucasian faces are ideal for law enforcement and repression globally, and are operationalised in practice by companies such as SenseTime to track Uyghur minorities in China.

The effort to eliminate bias within datasets creates more problems than it purports to solve. The focus on amending discriminatory outputs simply satisfies Western liberal consumers, as it leaves the means of production intact. That production depends on invisible labour: the workers who clean, label, sort and annotate datasets, and who keep the internet “clean” by filtering and reviewing traumatic imagery, as well as the unpaid labour extracted from users uploading and posting their images and artwork. Other white-collar labourers (programmers, web designers, etc.) are also trained to use machine learning models. In doing so, they are integrated into new production pipelines and software and hardware stacks aligned with proprietary machine learning applications and cloud services owned by a handful of powerful tech companies, which repackage these tools into products rented back to them.

This overall infrastructure is thus based on hidden click work performed in countries of conflict and by refugees and migrants in metropoles, with users integrated into a system of extraction, exploitation and expropriation that also produces a massive carbon footprint. Efforts to “fix” bias rely on exploiting and expropriating labour at the level of production, obscuring how automation is the “labour of making labour seem to disappear”. In Steyerl’s words:

“Tweaking technology to be more ‘inclusive’ can thus lead to improved minority identification while outsourcing traumatic and underpaid labour. It can optimize discrimination, superficially sanitizing commercial applications while creating blatantly exploitative class hierarchies in the process. Political and military conflict as well as racially motivated migration barriers are important tools in creating this disenfranchised labour force. Perhaps bias is not a bug, but an important feature of a mean system of production.”

See: Mean Images at New Left Review.

Image from the original article, featuring images of Hito Steyerl.
