Report reveals AI database with abusive child images; raises alarm on misuse for generating realistic fake exploitation pictures. (IT World Canada)


December 21, 2023

The discovery of thousands of sexually abusive images of children in a widely used dataset for training artificial intelligence (AI) image generators has raised concerns that the material could be misused to generate realistic fake images of child exploitation. A recently released report from the Stanford University Internet Observatory (SIO) reveals that these images were part of LAION-5B, a vast dataset that AI developers use for training.

This alarming discovery has prompted action to remove the source images. Researchers promptly reported the image URLs to organizations dedicated to combating child exploitation, including the National Center for Missing and Exploited Children (NCMEC) in the United States and the Canadian Centre for Child Protection (C3P).

LAION-5B is one of the largest image repositories used by AI developers for training. It references billions of images gathered from a wide range of sources, including mainstream social media platforms and popular adult video sites.

Following the report's release, the nonprofit Large-scale Artificial Intelligence Open Network (LAION) emphasized that it has a stringent policy against illegal content. LAION took its datasets offline until the offending images could be removed.

The SIO report clarified its investigation methods, highlighting the use of hashing tools like Microsoft’s PhotoDNA to match image fingerprints with databases maintained by nonprofits addressing online child sexual exploitation. The study did not involve viewing abusive content directly, and matches were verified by relevant organizations like NCMEC and C3P wherever possible.

Addressing the challenges surrounding datasets used to train AI models, the SIO suggested safety measures for future data collection and model training. Recommendations included cross-checking images against known databases of child sexual abuse material (CSAM) and collaborating with child safety organizations like NCMEC and C3P.
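The cross-checking step the SIO recommends can be illustrated with a minimal sketch. It assumes a plain SHA-256 digest as a stand-in for a perceptual hash such as PhotoDNA, which is not publicly available, so it would only catch byte-identical files. The names `Record`, `sha256_hex`, and `split_by_blocklist` are hypothetical, and the blocklist here is built from placeholder bytes. In practice, hash lists are held by organizations such as NCMEC and C3P, and flagged URLs would be reported to them rather than inspected.

```python
import hashlib
from typing import Iterable, List, NamedTuple, Set, Tuple


class Record(NamedTuple):
    """One dataset entry: where the file came from and its raw bytes."""
    url: str
    content: bytes


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def split_by_blocklist(
    records: Iterable[Record], blocklist: Set[str]
) -> Tuple[List[Record], List[str]]:
    """Separate records into those to keep and URLs to report.

    A record is flagged when the digest of its bytes appears in the
    blocklist. The content itself is never opened or displayed.
    """
    kept: List[Record] = []
    flagged_urls: List[str] = []
    for record in records:
        if sha256_hex(record.content) in blocklist:
            flagged_urls.append(record.url)
        else:
            kept.append(record)
    return kept, flagged_urls


# Placeholder data only: harmless byte strings stand in for files.
records = [
    Record("https://example.com/a", b"file-a"),
    Record("https://example.com/b", b"file-b"),
]
blocklist = {sha256_hex(b"file-b")}
kept, flagged_urls = split_by_blocklist(records, blocklist)
```

Because an exact digest changes completely when a file is resized or re-encoded, a real pipeline would rely on a perceptual hash supplied through a child safety organization. The filtering logic around it would look much the same.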

While LAION-5B's creation involved attempts to filter explicit content and identify underage explicit material, the report highlighted issues with widely used AI image-generating models like Stable Diffusion, which were trained on a diverse range of content, including explicit material. Additionally, the LAION data used to train other models, such as Google’s Imagen, was found to contain inappropriate content, raising concerns about releasing those models for public use.

Despite efforts to identify CSAM within LAION-5B, the SIO recognized limitations due to incomplete industry hash sets, content attrition, limited access to original reference sets, and inaccuracies in classifying "unsafe" content.

The report concluded that web-scale datasets pose significant problems, advocating for their restriction to research settings, while promoting more curated and meticulously sourced datasets for publicly distributed AI models.
