Report finds child sexual abuse images in an AI training dataset, raising alarm that they could be misused to generate realistic fake exploitation images. (IT World Canada)


December 21, 2023

The discovery of thousands of sexually abusive images of children in a widely used dataset for training artificial intelligence (AI) image generators has raised concerns that the material could be misused to generate realistic fake exploitation images. A recently released report from the Stanford University Internet Observatory (SIO) found the images in LAION-5B, a vast dataset that AI developers use for training.

This alarming discovery has prompted action to remove the source images. Researchers promptly reported the image URLs to organizations dedicated to combating child exploitation, including the National Center for Missing and Exploited Children (NCMEC) in the United States and the Canadian Centre for Child Protection (C3P).

LAION-5B is recognized as one of the largest repositories of images used by AI developers for training. It contains billions of images sourced from various platforms, including mainstream social media and popular adult video sites.

Following the report's release, the nonprofit Large-scale Artificial Intelligence Open Network (LAION) emphasized that it has a stringent policy against illegal content. LAION temporarily took the datasets down from its platform so that the illegal images could be removed.

The SIO report clarified its investigation methods, highlighting the use of hashing tools like Microsoft’s PhotoDNA to match image fingerprints with databases maintained by nonprofits addressing online child sexual exploitation. The study did not involve viewing abusive content directly, and matches were verified by relevant organizations like NCMEC and C3P wherever possible.

Addressing the challenges surrounding datasets used to train AI models, the SIO suggested safety measures for future data collection and model training. Recommendations included cross-checking images against known databases of child sexual abuse material (CSAM) and collaborating with child safety organizations like NCMEC and C3P.
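The cross-checking recommendation can be illustrated with a minimal sketch. It assumes a plain SHA-256 digest as a stand-in for a perceptual hash such as PhotoDNA, which is not publicly available and, unlike SHA-256, still matches images that have been resized or re-encoded. The function names, the example URLs, and the idea of receiving a hash list from a child safety organization in this form are all hypothetical, not details from the SIO report.

```python
import hashlib
from typing import Iterable, List, Set, Tuple


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the raw bytes as a hex string.

    Assumption: an exact-match cryptographic hash stands in for a
    perceptual hash here; it only catches byte-identical files.
    """
    return hashlib.sha256(data).hexdigest()


def filter_dataset(
    entries: Iterable[Tuple[str, bytes]],
    known_bad_hashes: Set[str],
) -> Tuple[List[Tuple[str, bytes]], List[str]]:
    """Split (url, content) entries into kept entries and flagged URLs.

    An entry is flagged when its hash appears in known_bad_hashes,
    a hypothetical hash list supplied by a child safety organization.
    Only hashes are compared; the content is never displayed.
    """
    kept: List[Tuple[str, bytes]] = []
    flagged: List[str] = []
    for url, data in entries:
        if sha256_hex(data) in known_bad_hashes:
            flagged.append(url)  # report the URL, drop the entry
        else:
            kept.append((url, data))
    return kept, flagged


# Placeholder byte strings stand in for downloaded files.
entries = [
    ("https://example.org/a.jpg", b"placeholder-a"),
    ("https://example.org/b.jpg", b"placeholder-b"),
]
known_bad = {sha256_hex(b"placeholder-b")}
kept, flagged = filter_dataset(entries, known_bad)
```

In this sketch the flagged URLs are collected separately so they can be reported, while only the unmatched entries remain available for training. A real pipeline would depend on hash sets and tooling provided by organizations such as NCMEC or C3P.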

While LAION-5B's creation involved attempts to filter explicit content and identify underage explicit material, the report highlighted issues with widely used AI image-generating models like Stable Diffusion, which was trained on a diverse range of content, including explicit material. Additionally, the LAION data used to train other models, such as Google's Imagen, was found to contain inappropriate content, leading to concerns about public use.

Despite efforts to identify CSAM within LAION-5B, the SIO recognized limitations due to incomplete industry hash sets, content attrition, limited access to original reference sets, and inaccuracies in classifying "unsafe" content.

The report concluded that web-scale datasets pose significant problems, advocating for their restriction to research settings, while promoting more curated and meticulously sourced datasets for publicly distributed AI models.

