
Meta releases bias-testing data for computer vision models

Meta today released FACET, a new AI benchmark for assessing the “fairness” of AI models that classify and detect objects and people in photos and videos.

FACET, a tortured acronym for “Fairness in Computer Vision EvaluaTion,” comprises 32,000 images containing 50,000 people labeled by human annotators. It covers occupation and activity classes such as “basketball player,” “disc jockey” and “doctor,” as well as demographic and physical attributes, allowing Meta to run what it describes as “deep” evaluations of biases against those classes.

Meta wrote, “By releasing FACET, our goal is to enable researchers and practitioners to perform similar benchmarking to better understand the disparities present in their own models and monitor the impact of mitigations put in place to address fairness concerns.” The company also encourages researchers to use FACET to benchmark fairness across other vision and multimodal tasks.

Bias benchmarks for computer vision algorithms aren’t new. Meta itself released one several years ago to surface age, gender, and skin tone discrimination in computer vision and audio machine learning models. And several studies have examined whether computer vision models are biased against certain demographic groups. Spoiler: they usually are.

Meta also has a poor record on responsible AI.

Meta had to pull an AI demo late last year after it wrote racist and inaccurate scientific literature. Reports characterize the company’s AI ethics team as largely toothless and its anti-AI-bias tools as “completely insufficient.” Academics have accused Meta of exacerbating socioeconomic inequality through its ad-serving algorithms and of showing bias against Black users in its automated moderation systems.

Meta claims FACET is more thorough than previous computer vision bias benchmarks, capable of answering questions like “Are models better at classifying people as skateboarders when their perceived gender presentation has more stereotypically male attributes?” and “Are any biases magnified when the person has coily hair compared to straight hair?”

To create FACET, Meta had annotators label each of the 32,000 images for demographic attributes (e.g., perceived gender presentation and age group), physical attributes (e.g., skin tone, lighting, tattoos, headwear and eyewear, hairstyle and facial hair), and classes. These labels were combined with labels for people, hair, and clothing taken from Segment Anything 1 Billion, a Meta-designed data set for training computer vision models to “segment,” or isolate, objects and animals in images.
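To make that structure concrete, here is a purely hypothetical sketch of what a single person-level annotation along these lines could look like. The field names and values are illustrative assumptions, not FACET’s actual schema.

```python
# Hypothetical sketch of one person-level annotation; field names and
# values are illustrative only, NOT FACET's real file format.
from dataclasses import dataclass, field


@dataclass
class PersonAnnotation:
    image_id: str
    primary_class: str                  # e.g. "doctor", "disc jockey"
    perceived_gender_presentation: str  # demographic attribute
    perceived_age_group: str            # demographic attribute
    perceived_skin_tone: int            # physical attribute (scale value)
    hair_type: str                      # e.g. "coily", "straight"
    other_attributes: list[str] = field(default_factory=list)  # tattoos, headwear, eyewear...


example = PersonAnnotation(
    image_id="img_000123",
    primary_class="skateboarder",
    perceived_gender_presentation="more stereotypically masculine",
    perceived_age_group="young adult",
    perceived_skin_tone=4,
    hair_type="coily",
)
```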

Meta says FACET’s images were sourced from Segment Anything 1 Billion, whose photos were purchased from a “photo provider.” The people in the photos may not have known their images would be used for this purpose, and the blog post doesn’t explain how Meta recruited or paid the annotation teams.

Many annotators who label data sets for AI training and benchmarking are from developing countries and earn well below the U.S. minimum wage. The Washington Post reported this week that Scale AI, one of the largest and best-funded annotation firms, has paid workers low wages, delayed or withheld payments, and provided them with few avenues of recourse.

In a white paper, Meta describes the annotators as “trained experts” from “several geographic regions”—North America (United States), Latin America (Colombia), Middle East (Egypt), Africa (Kenya), Southeast Asia (Philippines), and East Asia (Taiwan). Meta used a third-party vendor’s “proprietary annotation platform,” and the annotators were paid “with an hour wage set per country.”

Potential issues aside, Meta says FACET can be used to probe classification, detection, “instance segmentation,” and “visual grounding” models across demographic attributes.
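As a rough illustration of that kind of probing, the sketch below computes a classifier’s per-group recall for a single class and the gap between groups. It assumes a hypothetical list of prediction records (with made-up keys like “perceived_gender_presentation”) rather than FACET’s actual format, and it is not Meta’s published evaluation code.

```python
# Minimal sketch: compare recall on one class across demographic groups.
# The record keys below are hypothetical, not FACET's real schema.
from collections import defaultdict


def recall_by_group(records, target_class, attribute):
    """Per-group recall for target_class.

    records: iterable of dicts with "label" (ground truth), "prediction",
    and a demographic attribute key such as "perceived_gender_presentation".
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        if r["label"] != target_class:
            continue
        group = r[attribute]
        totals[group] += 1
        if r["prediction"] == target_class:
            hits[group] += 1
    return {g: hits[g] / totals[g] for g in totals}


# Example: is "skateboarder" recognized equally well across groups?
records = [
    {"label": "skateboarder", "prediction": "skateboarder",
     "perceived_gender_presentation": "more masculine"},
    {"label": "skateboarder", "prediction": "pedestrian",
     "perceived_gender_presentation": "more feminine"},
]
recalls = recall_by_group(records, "skateboarder", "perceived_gender_presentation")
disparity = max(recalls.values()) - min(recalls.values())
print(recalls, disparity)
```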

Meta ran FACET against its own DINOv2 computer vision algorithm, which became available for commercial use this week. Meta says FACET surfaced several biases in DINOv2, including a bias against certain gender presentations and a tendency to stereotypically identify pictures of women as “nurses.”

Meta wrote in a blog post that DINOv2’s pre-training dataset may have replicated the biases of the reference datasets selected for curation, adding: “We plan to address these potential shortcomings in future work and believe image-based curation could help avoid search engine and text supervision biases.”
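For readers who want to run this style of check themselves, here is a minimal sketch. It assumes the torch.hub entry points published in the facebookresearch/dinov2 repository; the downstream protocol (frozen features feeding a linear probe, then per-group comparison) is an illustration, not Meta’s documented FACET methodology.

```python
# Sketch: extract frozen DINOv2 features for images, to be fed into a
# linear classifier whose per-class predictions can then be compared
# across demographic attribute groups (as in the recall example above).
import torch
from PIL import Image
from torchvision import transforms

# Loads a DINOv2 backbone via the public torch.hub entry point.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])


@torch.no_grad()
def embed(path: str) -> torch.Tensor:
    """Return the frozen DINOv2 embedding for one image file."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return model(img).squeeze(0)  # (embedding_dim,) feature vector
```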

No benchmark is perfect. Meta acknowledges that FACET may not sufficiently capture real-world concepts and demographic groups. It also notes that many depictions of professions in the data set may have changed since FACET was created: most doctors and nurses in FACET, photographed during the COVID-19 pandemic, are wearing more personal protective equipment than they would have before.

“At this time we do not plan to have updates for this data set,” Meta writes in the white paper. “We will allow users to flag objectionable images and remove them if found.”

Alongside the data set, Meta offers a web-based data set explorer tool. To use the tool and the data set, developers must agree not to train computer vision models on FACET, only to evaluate, test, and benchmark them.
