In a recent interview with The Guardian, Kate Crawford discusses her new book, Atlas of AI, which examines the broader landscape of how AI systems work by canvassing their structures of production and material realities. One example is ImageNet, a massive training dataset created by researchers at Stanford and used to benchmark the accuracy of object recognition algorithms. It was built by scraping photos and images from across the web and hiring crowd workers to label them according to WordNet, an outdated lexical database created in the 1980s.
ImageNet was revealed to contain extremely racist, ableist, sexist, and otherwise offensive classification categories. While its creators have since removed these categories and corrected parts of the dataset to be more representative, significant issues with such large datasets remain.
First, and as Crawford reiterates, the idea that simply creating more classifications solves problems of bias is too narrow and limiting. Take, for example, categorising people into only two binary genders, or labelling people according to their skin colour in ways that invite moral or ethical judgments. Any project of classification and categorisation should heed the lessons of the past. Second, the huge training datasets used to build machine learning systems are often held by private tech companies, hidden and inaccessible to outside scrutiny. Finally, the hidden human cost of cleaning and labelling these datasets often involves the large-scale exploitation of workers around the world.
It is important to address the issues of power embedded in AI systems, a technology too often taken for granted as a beacon of progress. As Crawford puts it:
“We need a renewed politics of refusal that challenges the narrative that just because a technology can be built it should be deployed.”
See: Microsoft’s Kate Crawford: ‘AI is neither artificial nor intelligent’ at The Guardian.