Do you remember when our Robin Pocornie filed a complaint with the Dutch Human Rights Institute because Proctorio, the proctoring spyware she was forced to use while taking her exams from home, couldn’t find her face because it was “too dark”? (If not, read the dossier of her case.)
Proctorio’s CEO, Mike Olsen, proved true to his reputation. He made sure that Proctorio acted in bad faith throughout the process and kept spreading disinformation, all to prevent two simple truths from coming out:
- That there was a time during COVID when Proctorio only used OpenCV to do face detection (a minimal sketch of what that kind of off-the-shelf detection looks like follows below this list).
- That this OpenCV module performed better on faces of white people than on the faces of people of colour.
And that these were the reasons for a third truth that did come out through the lived experiences of students of colour:
- That Proctorio was terrible at recognising the faces of people with a darker skin colour.
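We don’t know exactly how Proctorio wired OpenCV into its product, but a minimal sketch of what this kind of off-the-shelf face detection looks like is shown below, using OpenCV’s stock Haar cascade (one of the “older methods in open-source packages” that the FAccT paper discussed further down refers to). The file names and parameters are illustrative assumptions, not Proctorio’s actual code.

```python
# Minimal sketch of off-the-shelf OpenCV face detection (illustrative only,
# not Proctorio's actual code): the classic Haar cascade that ships with OpenCV.
import cv2

# Load OpenCV's bundled frontal-face Haar cascade.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

# A hypothetical webcam frame saved as an image.
frame = cv2.imread("webcam_frame.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns a bounding box for every face it finds;
# an empty result means "no face detected" -- which is what students
# like Robin kept running into.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
if len(faces) == 0:
    print("No face detected")  # the student gets flagged or locked out
else:
    print(f"Detected {len(faces)} face(s)")
```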
Often, these students weren’t believed, partly because of blog posts in which Proctorio blatantly manipulates the truth. Take, for example, the one titled Setting the Record Straight: Fairface. Let’s take a closer look at its lack of veracity.
Mike Olsen begins by writing that they have asked external parties to audit their software and that they share these audits with their clients. He doesn’t mention that their clients cannot make these audits public. For example, Robin and the Center couldn’t get access to the audit of the software as used during Robin’s exams without committing to keeping it confidential (a liability we couldn’t afford to take on). If their “AI audits” show that the software has no issues, why not share the results with the world?
Olsen then writes:
As published in 2021 by NPR, Proctorio continues to maintain that its auditors have “found no measurable bias” in our face-detection models.
Here, he is trying to leverage NPR’s credibility to lend weight to his own statement. However, NPR simply asked him for a comment on their piece and published it. It is also telling that he mis-cites himself on Proctorio’s blog, as the full NPR quote is (emphasis ours):
[Mike Olsen] added that the company has partnered with third-party data security auditors, and an analysis of Proctorio’s *latest* face-detection models found no measurable bias.
His quote does not say anything about the bias that their third-party auditors found in their earlier face detection models. We know that Proctorio made a change to their face detection models at some point during the pandemic. As RTL Nieuws has shown, they later included a model that would recognise faces everywhere. This is somewhat speculative, but ensuring that you see faces everywhere is one way to minimise bias in your face detection. It does mean your software doesn’t work very well, but how would your client ever know?
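To make that speculation concrete: a detector that claims to see a face in every frame can never fail, so any audit that measures bias as a difference in failure rates between groups will report no measurable bias at all. A toy sketch of that logic (purely illustrative, not RTL Nieuws’ findings or Proctorio’s code):

```python
# Toy illustration (not Proctorio's code): a "detector" that always reports
# a face has a 0% failure rate for every group of test-takers, so a bias
# audit based on failure-rate differences has nothing to report.
def always_detect(frame):
    height, width = frame["height"], frame["width"]
    return [(0, 0, width, height)]  # claim the whole frame is a face

def failure_rate(frames, detect):
    failures = sum(1 for frame in frames if len(detect(frame)) == 0)
    return failures / len(frames)

# Stand-ins for real webcam frames from two groups of students.
group_a = [{"width": 640, "height": 480}] * 100
group_b = [{"width": 640, "height": 480}] * 100

# Both groups "fail" 0% of the time, so measured bias is zero --
# and so is the usefulness of the detector.
print(failure_rate(group_a, always_detect))  # 0.0
print(failure_rate(group_b, always_detect))  # 0.0
```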
Next, Olsen writes:
While it is completely acceptable to question these claims and run your own analysis, we have unfortunately continued to see discredited research pop up from time to time. Despite its inaccurate and flawed approach to testing our models for bias, it continues to be referenced.
At the heart of these flawed and inaccurate reports lies a dataset called FairFace. While this dataset has been used to benchmark other facial detection algorithms and check them for potential bias, it can’t be used to test Proctorio because the FairFace dataset is not appropriate for the algorithm and its intended application.
The FairFace dataset contains composite images, photos of photos, with replaced backgrounds, children, cartoon images, photos with graphics, people intentionally hiding their faces, and partial side views of faces, all of which do not simulate a live test-taker’s remote exam experience. None of them were taken via a webcam.
If you have even a basic understanding of how face detection works, you can easily spot the flawed reasoning here: if an algorithm can’t find darker faces in a curated set of photos, how will it ever detect those same faces through students’ webcams in the real world?
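Benchmarking a detector on a curated, labelled dataset such as FairFace amounts to exactly this kind of comparison: run the detector over every photo and tally detection rates per group. A minimal sketch follows; the CSV file and its column names are hypothetical, and `detects_face` could be a wrapper around the OpenCV detector sketched above.

```python
# Sketch of a FairFace-style bias check (file name and column names are
# hypothetical): run any face detector over labelled photos and compare
# per-group detection rates.
import csv
from collections import defaultdict

def detection_rates(labels_csv, detects_face):
    # detects_face(path) -> bool, e.g. a wrapper around the OpenCV
    # detector sketched earlier in this piece.
    found, total = defaultdict(int), defaultdict(int)
    with open(labels_csv) as f:
        for row in csv.DictReader(f):  # columns assumed: path, group
            total[row["group"]] += 1
            found[row["group"]] += int(detects_face(row["path"]))
    return {group: found[group] / total[group] for group in total}

# Hypothetical usage:
#   rates = detection_rates("fairface_labels.csv", my_detector)
# If the rates differ sharply between groups on clean, curated photos,
# the gap will not disappear on blurrier, worse-lit webcam streams.
```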
Since the most recent ACM Conference on Fairness, Accountability, and Transparency (FAccT), we no longer need to rely on common sense to show that Proctorio is wrong. Researchers at the Maryland Test Facility have looked at the OpenCV algorithm that Proctorio had been using and have published a scientific article with their findings (PDF). These findings are shocking:
In this study, we demonstrate how scenario testing with demographically varied subjects, a form of prospective testing that simulates real-world conditions, revealed significant performance issues in biometric systems prior to broad deployment. Using generalized linear modeling, we show that subjects’ measured skin lightness, along with other demographic factors, significantly impacted the probability of failure to detect a face. Failure rates increased from just […] for subjects with the lightest skin in our sample to […] for subjects with the darkest, controlling for other factors. We show that skin lightness, rather than self-reported race, best explained the differences in system performance. We trace these issues to widely used, older methods in open-source packages for face detection.
Precisely the open-source packages that Proctorio was using during the first months of the pandemic.
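For readers who want to see what “generalized linear modeling” of detection failures looks like in practice, here is a minimal sketch of that kind of analysis: a binomial (logistic) GLM of a 0/1 failure outcome against skin lightness and other factors. This is not the authors’ actual code; the data file and column names are assumptions.

```python
# Sketch of the kind of generalized linear model the paper describes
# (not the authors' code; the data file and column names are assumptions).
# Each row is one detection attempt: a 0/1 failure outcome, the subject's
# measured skin lightness, and other demographic factors.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("detection_attempts.csv")  # hypothetical data file

# Binomial GLM (logistic regression): probability of failing to detect
# a face as a function of skin lightness, controlling for other factors.
model = smf.glm(
    "failed ~ skin_lightness + age + C(gender)",
    data=df,
    family=sm.families.Binomial(),
).fit()

print(model.summary())
# A significant negative coefficient on skin_lightness means lighter-skinned
# subjects were less likely to experience a detection failure.
```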
So if you are considering procuring a proctoring solution (for example, because you want to give students who are too ill to come to school the option of taking a remotely proctored test), then please avoid Proctorio at all costs: you don’t want to do business with a company that has such a proven track record of low integrity.
See: Performance Differentials in Deployed Biometric Systems Caused by Open-Source Face Detectors at the ACM Digital Library.