Report reveals child sexual abuse images in AI training database, raising alarm that they could be misused to generate realistic fake exploitation imagery (IT World Canada)


December 21, 2023

The discovery of thousands of child sexual abuse images in a widely used database for training artificial intelligence (AI) image generators has sparked concern about how those images could be misused. A recently released report from the Stanford Internet Observatory (SIO) reveals that the disturbing images were part of LAION-5B, a vast dataset employed by AI developers for training purposes.

This alarming discovery has prompted action to remove the source images. Researchers promptly reported the image URLs to organizations dedicated to combating child exploitation, including the National Center for Missing and Exploited Children (NCMEC) in the United States and the Canadian Centre for Child Protection (C3P).

The investigation located the troubling images within LAION-5B, recognized as one of the largest image repositories used by AI developers for training. LAION-5B contains billions of images scraped from a wide range of sources, including mainstream social media platforms and popular adult video sites.

Following the report's release, the nonprofit Large-scale Artificial Intelligence Open Network (LAION) said it maintains a zero-tolerance policy for illegal content and temporarily removed the datasets from its platform until the offending images could be expunged.

The SIO report described its methodology, which relied on hashing tools such as Microsoft’s PhotoDNA to match image fingerprints against databases maintained by nonprofits that combat online child sexual exploitation. The study did not involve viewing abusive content directly, and matches were verified by organizations such as NCMEC and C3P wherever possible.
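The report's actual matching pipeline is not published, and PhotoDNA itself is proprietary and access-restricted, so the sketch below is only a rough illustration of the general idea: compute a perceptual fingerprint for each candidate image and compare it against a reference set of known hashes within a small tolerance. It uses the open-source imagehash library as a stand-in for PhotoDNA; the hash value, file name, and distance threshold are hypothetical.

```python
# Illustrative sketch only. PhotoDNA is proprietary, so the open-source
# `imagehash` perceptual hash is used here as a rough stand-in for the
# general technique of fingerprint matching against a known hash set.
from PIL import Image
import imagehash

# Hypothetical reference set of perceptual hashes; in practice these lists
# are supplied and tightly controlled by child-safety organizations.
KNOWN_HASHES = {
    imagehash.hex_to_hash("ffd8e0c0a0808080"),  # placeholder value
}

MAX_HAMMING_DISTANCE = 4  # assumed tolerance for near-duplicate matches


def matches_known_hash(image_path: str) -> bool:
    """Return True if the image's perceptual hash is close to any known hash."""
    candidate = imagehash.phash(Image.open(image_path))
    # Subtracting two ImageHash objects gives the Hamming distance in bits.
    return any(candidate - known <= MAX_HAMMING_DISTANCE for known in KNOWN_HASHES)


if __name__ == "__main__":
    print(matches_known_hash("sample.jpg"))  # hypothetical file name
```

In a real workflow only the hashes are exchanged, never the images themselves, which is what allowed the researchers to identify matches without viewing abusive content directly.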

Addressing the challenges surrounding datasets used to train AI models, the SIO suggested safety measures for future data collection and model training. Recommendations included cross-checking images against known databases of child sexual abuse material (CSAM) and collaborating with child safety organizations like NCMEC and C3P.

While LAION-5B's creators attempted to filter explicit content and flag material depicting minors, the report noted that widely used AI image-generating models such as Stable Diffusion were nonetheless trained on a broad mix of content that included explicit material. It also noted that an audit of a LAION dataset used to train Google’s Imagen found inappropriate content, contributing to the decision not to release that model for public use.

Despite efforts to identify CSAM within LAION-5B, the SIO recognized limitations due to incomplete industry hash sets, content attrition, limited access to original reference sets, and inaccuracies in classifying "unsafe" content.

The report concluded that web-scale datasets pose significant problems, advocating for their restriction to research settings, while promoting more curated and meticulously sourced datasets for publicly distributed AI models.
