Report reveals AI database with abusive child images; raises alarm on misuse for generating realistic fake exploitation pictures. (IT World Canada)


December 21, 2023

The discovery of thousands of sexually abusive images of children in a widely used database for training artificial intelligence (AI) image generators has sparked concerns that the material could be misused to generate realistic fake exploitation imagery. A recently released report from the Stanford University Internet Observatory (SIO) reveals that these images were part of a vast database named LAION-5B, which AI developers use for training.

This alarming discovery has prompted action to remove the source images. Researchers promptly reported the image URLs to organizations dedicated to combating child exploitation, including the National Center for Missing and Exploited Children (NCMEC) in the United States and the Canadian Centre for Child Protection (C3P).

LAION-5B is recognized as one of the largest repositories of images used by AI developers for training. It contains billions of images sourced from various platforms, including mainstream social media and popular adult video sites.

Following the report's release, the nonprofit Large-scale Artificial Intelligence Open Network (LAION) said it has a stringent policy against illegal content. LAION temporarily removed the datasets from its platform until the offending images could be expunged.

The SIO report clarified its investigation methods, highlighting the use of hashing tools like Microsoft’s PhotoDNA to match image fingerprints with databases maintained by nonprofits addressing online child sexual exploitation. The study did not involve viewing abusive content directly, and matches were verified by relevant organizations like NCMEC and C3P wherever possible.
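The fingerprint-matching idea described above can be illustrated with a minimal sketch. It is not the SIO's actual tooling: PhotoDNA is a proprietary perceptual hash and is not reproduced here, so the example assumes SHA-256 from Python's standard library as a stand-in, which only catches byte-identical files. The function names `fingerprint` and `flag_matches` and the placeholder data are hypothetical.

```python
import hashlib
from typing import Iterable, List, Set, Tuple


def fingerprint(data: bytes) -> str:
    """Return a hex digest for a file's bytes.

    Assumption: SHA-256 stands in for a perceptual hash such as PhotoDNA.
    Unlike a perceptual hash, it matches only byte-identical files.
    """
    return hashlib.sha256(data).hexdigest()


def flag_matches(
    items: Iterable[Tuple[str, bytes]], known_hashes: Set[str]
) -> List[str]:
    """Return the IDs of items whose fingerprint is in the known-hash set.

    No content is opened or displayed; only fingerprints are compared.
    """
    return [item_id for item_id, data in items if fingerprint(data) in known_hashes]


# Hypothetical placeholder data: in practice the hash set would be supplied
# by a child-safety organization, not computed locally.
known = {fingerprint(b"placeholder-flagged-bytes")}
candidates = [
    ("url-1", b"harmless-bytes"),
    ("url-2", b"placeholder-flagged-bytes"),
]
flagged = flag_matches(candidates, known)
```

In this sketch only the IDs of matching entries are returned, so flagged items can be reported or removed without anyone viewing them, which mirrors the approach the report describes.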

Addressing the challenges surrounding datasets used to train AI models, the SIO suggested safety measures for future data collection and model training. Recommendations included cross-checking images against known databases of child sexual abuse material (CSAM) and collaborating with child safety organizations like NCMEC and C3P.

Although LAION-5B's creators attempted to filter explicit content and identify underage explicit material, the report highlighted issues with widely used AI image-generating models such as Stable Diffusion, which was trained on a diverse range of content, including explicit material. Additionally, other models trained using LAION datasets, such as Google’s Imagen, were found to contain inappropriate content, leading to concerns about public use.

Despite efforts to identify CSAM within LAION-5B, the SIO recognized limitations due to incomplete industry hash sets, content attrition, limited access to original reference sets, and inaccuracies in classifying "unsafe" content.

The report concluded that web-scale datasets pose significant problems, advocating for their restriction to research settings, while promoting more curated and meticulously sourced datasets for publicly distributed AI models.
