Report reveals AI database with abusive child images; raises alarm on misuse for generating realistic fake exploitation pictures. (IT World Canada)


December 21, 2023

The uncovering of thousands of sexually abusive images of children on a widely-used database for training artificial intelligence (AI) image generators has sparked concerns about the potential misuse of these offensive photos. A recently released report from the Stanford University Internet Observatory (SIO) reveals that these disturbing images were part of a vast database named LAION-5B, employed by AI developers for training purposes.

This alarming discovery has prompted action to remove the source images. Researchers promptly reported the image URLs to organizations dedicated to combating child exploitation, including the National Center for Missing and Exploited Children (NCMEC) in the United States and the Canadian Centre for Child Protection (C3P).

The investigation disclosed the troubling images within LAION-5B, recognized as one of the largest repositories of images used by AI developers for training. LAION-5B contains billions of images sourced from various platforms, encompassing mainstream social media and popular adult video sites.

Following the report's release, the nonprofit Large-scale Artificial Intelligence Open Network (LAION) emphasized a stringent policy against illegal content. LAION promptly removed the datasets from its platform until the offensive images could be expunged.

The SIO report clarified its investigation methods, highlighting the use of hashing tools like Microsoft’s PhotoDNA to match image fingerprints with databases maintained by nonprofits addressing online child sexual exploitation. The study did not involve viewing abusive content directly, and matches were verified by relevant organizations like NCMEC and C3P wherever possible.

Addressing the challenges surrounding datasets used to train AI models, the SIO suggested safety measures for future data collection and model training. Recommendations included cross-checking images against known databases of child sexual abuse material (CSAM) and collaborating with child safety organizations like NCMEC and C3P.

While LAION-5B's creation involved attempts to filter explicit content and identify underage explicit material, the report highlighted issues with widely-used AI image-generating models like Stable Diffusion, trained on a diverse range of content, including explicit material. Additionally, other models like Google’s Imagen, trained using LAION datasets, were found to contain inappropriate content, leading to concerns about public use.

Despite efforts to identify CSAM within LAION-5B, the SIO recognized limitations due to incomplete industry hash sets, content attrition, limited access to original reference sets, and inaccuracies in classifying "unsafe" content.

The report concluded that web-scale datasets pose significant problems, advocating for their restriction to research settings, while promoting more curated and meticulously sourced datasets for publicly distributed AI models.

How useful was this post?

Click on a star to rate it!

Average rating 0 / 5. Vote count: 0

No votes so far! Be the first to rate this post.

You may also like

Trump Weighs Tariffs to Fight Digital Taxes on US Tech Firms

Former President Donald Trump is considering imposing tariffs on countries that tax American tech giants like Alphabet (Google) and Meta....

Elon Musk’s $44B Gamble on X May Finally Pay Off

When Elon Musk purchased Twitter in October 2022 for $44 billion, many saw it as a costly mistake. He immediately....

NASA Leadership Shake-Up Raises Doubts on Moon Mission Plans

NASA is facing a leadership shake-up as four senior officials linked to its Artemis moon program step down, raising concerns....

Elon Musk Unveils Grok 3, Claims It Outperforms ChatGPT & More

Elon Musk’s AI startup, xAI, has officially launched Grok 3, its latest artificial intelligence model, which he claims surpasses leading....

Google Canada Rejects Claims of Market Power Abuse

Google Canada has dismissed allegations of monopolistic practices in response to the Competition Bureau’s lawsuit over its advertising operations. The....

Google Expands AI Hub in Poland for Energy, Cybersecurity

Google is strengthening its presence in Poland by expanding its artificial intelligence (AI) initiatives in key sectors like energy and....

OpenAI Rejects Musk’s $97.4B Bid to Take Over the Company

OpenAI’s board has firmly declined a $97.4 billion buyout offer led by Elon Musk, reinforcing its stance that the company....

TikTok Returns to U.S. App Stores After Temporary Ban

Google and Apple have reinstated TikTok on their U.S. app stores following a brief removal, marking another twist in the....

NASA’s Stuck Astronauts Set to Return to Earth Sooner

Two NASA astronauts stranded aboard the International Space Station (ISS) for over eight months may finally return home sooner than....

Beats Powerbeats Pro 2 Launches with Heart-Rate Monitor

Apple’s Beats brand has unveiled the Powerbeats Pro 2, a long-awaited update to its popular fitness-focused earbuds. This new version....

Space Telescope Captures Stunning Ring of Light Around Galaxy

A newly spotted glowing ring in deep space has captivated astronomers worldwide. The Euclid space telescope, launched by the European....

Musk’s $97.4B Bid for OpenAI Sparks Fresh AI Battle

Elon Musk and his group have made a staggering $97.4 billion offer to take over OpenAI, reigniting tensions with CEO....