Bloomberg’s researchers used Stable Diffusion to gauge the magnitude of biases in generative AI. Analysing more than 5,000 images created by Stable Diffusion, they found that the model takes racial and gender disparities to extremes, producing results worse than those found in the real world.
The researchers used Stable Diffusion’s text-to-image model to generate thousands of images related to job titles and crime. For jobs, they prompted the model to create representations of workers in 14 U.S. occupations, generating 300 images each for seven jobs usually considered “high paying” and seven considered “low paying”. For crime, they created three categories.
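The prompting setup described above can be sketched as a simple prompt-building step. This is illustrative only: the job lists are partial (taken from occupations named in this article, not the study’s full lists of seven each), and the prompt template is an assumption, not Bloomberg’s actual wording.

```python
# Illustrative sketch of the prompt-generation step. Job lists are partial,
# drawn only from occupations mentioned in this article; the prompt template
# "a photo of a {job}" is an assumption, not the study's actual wording.
HIGH_PAYING = ["politician", "lawyer", "judge", "CEO"]            # 7 in the study
LOW_PAYING = ["cashier", "housekeeper", "fast food worker",
              "social worker"]                                    # 7 in the study
IMAGES_PER_JOB = 300

def build_prompts(jobs, n=IMAGES_PER_JOB):
    """Return one text prompt per image to be generated for each job title."""
    return [f"a photo of a {job}" for job in jobs for _ in range(n)]

prompts = build_prompts(HIGH_PAYING + LOW_PAYING)
print(len(prompts))  # 8 jobs x 300 images = 2400 prompts
```

Each prompt would then be fed to the text-to-image model; the resulting images form the dataset analysed for skin tone and perceived gender.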
They calculated an average colour from the parts of each image that made up the facial skin and, based on that average colour, used the Fitzpatrick Skin Scale (a scale used by dermatologists and researchers) to classify each face into one of six skin-tone categories.
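The classification step can be sketched as a nearest-colour lookup: average the facial-skin pixels, then assign the Fitzpatrick type whose reference colour is closest. A minimal sketch, assuming Euclidean distance in RGB and using illustrative reference values (real studies calibrate these carefully; the exact values and colour space Bloomberg used are not given here):

```python
import math

# Hypothetical reference RGB values for the six Fitzpatrick types
# (Type I lightest .. Type VI darkest). Illustrative assumptions only.
FITZPATRICK_REFS = {
    1: (244, 208, 177),
    2: (231, 180, 143),
    3: (210, 157, 124),
    4: (165, 114, 87),
    5: (110, 76, 57),
    6: (60, 42, 33),
}

def average_colour(pixels):
    """Mean RGB over the facial-skin pixels of one image."""
    return tuple(sum(channel) / len(pixels) for channel in zip(*pixels))

def classify_skin_tone(avg_rgb):
    """Return the Fitzpatrick type whose reference colour is nearest
    (Euclidean distance in RGB) to the face's average colour."""
    return min(FITZPATRICK_REFS,
               key=lambda t: math.dist(avg_rgb, FITZPATRICK_REFS[t]))

face = average_colour([(240, 190, 155), (230, 180, 145)])
print(classify_skin_tone(face))  # classifies this light tone as Type 2
```

Nearest-neighbour matching in plain RGB is the simplest choice; perceptually uniform colour spaces such as CIELAB would give more faithful distances, at the cost of a conversion step.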
They found that the image sets generated for every high-paying job were dominated by subjects with lighter skin tones, while subjects with darker skin tones dominated the images generated by prompts such as “fast food worker” and “social worker”.
Categorising images by perceived gender revealed a similar outcome: most occupations in the dataset were dominated by men, except for low-paying jobs such as cashier or housekeeper. When considering bias in terms of both gender and skin tone, lighter-skinned men represented most of the subjects in every high-paying job, such as “politician”, “lawyer”, “judge” and “CEO”.
They compared the findings from their analysis with data from the US Bureau of Labor Statistics, which tracks the gender and race of workers in every occupation. This comparison showed that Stable Diffusion depicts a different scenario: in terms of gender, it underrepresents women in high-paying occupations while overrepresenting them in low-paying ones. While it is harder to compare the skin-tone results of this experiment with government demographic data (skin tone does not equate to race), it can still be inferred that Stable Diffusion overrepresents people with darker skin tones in low-paying fields.
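The comparison above amounts to measuring, per occupation, the difference between a group’s share of the generated images and its share of the real workforce. A minimal sketch, with placeholder percentages that are not real BLS or Bloomberg figures:

```python
# Sketch of the comparison with labour-force data. The percentages below are
# placeholders for illustration, NOT real BLS or Bloomberg figures.
def representation_gap(generated_share, labour_force_share):
    """Positive gap = the model overrepresents the group relative to the
    real workforce; negative gap = it underrepresents them."""
    return generated_share - labour_force_share

# Hypothetical example: women's share of generated images for one
# high-paying occupation vs. their (placeholder) share of that workforce.
gap = representation_gap(generated_share=0.10, labour_force_share=0.35)
print(f"{gap:+.0%}")  # negative: women underrepresented in the images
```

Computed per occupation and per group, these gaps are what let the researchers say the model’s output diverges from the real-world workforce rather than merely mirroring it.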
Similarly, when Stable Diffusion was used to generate images for crime-related keywords such as ‘inmate’, ‘drug dealer’ and ‘terrorist’, the model also amplified racial and religious stereotypes. While skin tone and perceived gender are the signals most readily quantified, other details within the generated images – such as religious accessories or types of facial hair – were not measured but also contribute to the overall bias encoded in generative AI outputs.
In addition to issues of representation, the use of text-to-image generative models in policing and law enforcement would further exacerbate existing bias in a criminal justice system that already overcriminalises and discriminates against racialised people. Cognitive scientist Abeba Birhane warns that technology helps to legitimise bias by making it seem more objective. Or, as Ruha Benjamin has eloquently put it, this reality is the “New Jim Code”: a “combination of coded bias and imagined objectivity”.
See Generative AI Takes Stereotypes and Bias From Bad to Worse at Bloomberg.
Header image from the original Bloomberg article.