
Understanding “LLM Grooming”: The New Frontier of AI-Powered Disinformation

Published: July 31, 2025

In February 2025, cybersecurity researchers uncovered a disturbing new threat to the integrity of artificial intelligence systems worldwide. They called it “LLM grooming” – a sophisticated disinformation strategy that doesn’t target human readers directly, but instead aims to corrupt the very AI systems millions of people now rely on for information.

This emerging threat represents a fundamental shift in how disinformation campaigns operate, moving from targeting human audiences to poisoning the data that trains our AI assistants. As we increasingly turn to chatbots like ChatGPT, Claude, and Gemini for answers, understanding this threat becomes crucial for anyone who uses AI technology.

What is LLM Grooming?

“LLM grooming” is a term coined by The American Sunlight Project to describe a new form of information warfare where malicious actors flood the internet with propaganda specifically designed to be ingested by AI training systems. Unlike traditional disinformation campaigns that aim to deceive human readers, LLM grooming targets the web crawlers and data collection systems that feed large language models (LLMs).

The concept is both simple and insidious: by creating massive amounts of false content across hundreds of websites, bad actors can ensure their propaganda becomes part of the training data for AI models. When users later ask these AI systems about related topics, the chatbots unwittingly repeat the false narratives they’ve learned.

The Pravda Network: A Case Study in AI Manipulation

The best-documented example of LLM grooming is the “Pravda network,” a Russia-based operation that published 3.6 million articles in 2024 alone across 182 domains and subdomains. Despite this massive output – more than 10,000 articles daily – the websites receive virtually no human visitors. Their poorly designed interfaces and difficult navigation make it clear they weren’t built for human consumption.

Instead, the network appears designed exclusively for AI web crawlers. The sites use automated translation to publish content in 12 languages, targeting 74 countries and regions. They focus particularly on Ukraine-related topics, spreading at least 207 provably false claims about everything from “secret US biolabs” to allegations about Ukrainian leadership.

[Infographic – The Pravda Network: Scale and Reach. Output: 10,000+ articles published daily; 3.6 million articles in 2024 alone; 182 unique domains and subdomains. Geographic targeting: 74 countries and regions, 12 languages, NATO and the EU, nine heads of state. Disinformation spread: 207+ provably false claims, including “secret US biolabs in Ukraine” and “Zelensky’s misuse of aid.” Key insight: the network’s sites have virtually no human visitors – they are designed exclusively to be scraped by AI web crawlers, contaminating training data.]

The Impact: One-Third of AI Responses Contaminated

Research by NewsGuard in March 2025 revealed the alarming effectiveness of this strategy. When testing 10 major AI chatbots – including ChatGPT-4o, Claude, Gemini, Copilot, and Meta AI – researchers found that 33% of responses about topics targeted by the Pravda network contained Russian disinformation narratives.

This means that when users ask these AI assistants about Ukraine, NATO, or related geopolitical topics, they have a one-in-three chance of receiving an answer contaminated with propaganda. The chatbots present these false narratives alongside accurate information, making it extremely difficult for users to distinguish fact from fiction.

Information Laundering: How Propaganda Becomes “Facts”

The Pravda network employs a sophisticated “information laundering” process to legitimize its content:

  1. Original propaganda – Russian state media outlets such as TASS and RT, which are banned in the European Union, produce the initial content
  2. Network republication – the Pravda network republishes and remixes this content across its 182 “independent” news sites
  3. Wikipedia citations – researchers found nearly 2,000 links to Pravda sites across Wikipedia in 44 languages
  4. AI training data – language models crawl Wikipedia and the broader web, ingesting the laundered propaganda
  5. User queries – when people ask AI chatbots questions, they receive responses influenced by this contaminated training data
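
The earlier a defender breaks this chain, the better. Below is a minimal sketch of one such defense: a provenance filter that drops any scraped document whose source page, or any page it cites, resolves to a domain on a blocklist of known network sites. Everything in it (the domain names, the document structure, the toy corpus) is invented for illustration; it is not how any particular AI company curates its training data.

```python
from urllib.parse import urlparse

# Hypothetical blocklist of domains attributed to a coordinated network.
# The names are placeholders, not real sites.
BLOCKED_DOMAINS = {"pravda-mirror.example", "laundering-site.example"}

def domain_of(url: str) -> str:
    """Return the lowercased hostname of a URL ('' if it cannot be parsed)."""
    return (urlparse(url).hostname or "").lower()

def is_tainted(doc: dict) -> bool:
    """Flag a scraped document if its source page, or any page it links to,
    lives on a blocklisted domain. This catches direct scrapes of the network
    as well as laundered pages, e.g. a citation inserted into another site."""
    if domain_of(doc["source_url"]) in BLOCKED_DOMAINS:
        return True
    return any(domain_of(link) in BLOCKED_DOMAINS
               for link in doc.get("outbound_links", []))

# Toy corpus: the first document is laundered content, hosted elsewhere
# but citing the network; the second is clean.
corpus = [
    {"source_url": "https://encyclopedia.example/wiki/Some_topic",
     "outbound_links": ["https://pravda-mirror.example/story/123"]},
    {"source_url": "https://reputable-news.example/report",
     "outbound_links": []},
]

clean = [doc for doc in corpus if not is_tainted(doc)]
print(f"kept {len(clean)} of {len(corpus)} documents")  # kept 1 of 2 documents
```

A real pipeline would also have to handle subdomains, redirects, mirrors, and content copied without any link back to the network, which is exactly why an operation spread across 182 domains and subdomains is so hard to filter out completely.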

Why This Matters: The Democratization of Disinformation

LLM grooming represents a fundamental shift in information warfare for several reasons:

Scale and Automation

Traditional disinformation required human writers, social media manipulation, and costly influence operations. LLM grooming can be almost entirely automated, producing millions of articles with minimal human involvement.

Persistence

Once false information enters an AI model’s training data, it becomes part of the model’s “knowledge.” Unlike a false social media post that can be deleted, contaminated training data affects every future interaction with that model.

Trust and Authority

People increasingly view AI chatbots as neutral, authoritative sources of information. When these systems repeat disinformation, it carries an implicit endorsement that makes the false claims more believable.

Global Reach

A single contaminated AI model can spread disinformation to millions of users worldwide, in multiple languages, without any additional effort from the attackers.

Beyond Russia: A Playbook for Bad Actors

While the Pravda network is the best-documented example, experts warn that its success provides a blueprint for other malicious actors. Any group with the resources to create content at scale – whether nation-states, extremist organizations, or corporate interests – could employ similar tactics.

The technique is particularly attractive because:

  • It’s relatively inexpensive compared to traditional influence operations
  • It’s difficult to detect and attribute
  • It exploits the fundamental way AI systems learn
  • AI companies currently have limited defenses against this type of attack

The Technical Challenge: Why AI Systems Are Vulnerable

Large language models learn by processing enormous amounts of text from the internet. This training process, while powerful, has several vulnerabilities:

  1. Volume over verification: AI training prioritizes ingesting large amounts of data over verifying its accuracy
  2. No source hierarchy: Training data doesn’t inherently distinguish between reliable and unreliable sources
  3. Pattern matching: LLMs learn patterns in data, so frequently repeated false claims can be interpreted as facts
  4. Update challenges: Once trained, updating or correcting model knowledge is difficult and expensive
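
A toy example makes the first three points concrete. In the invented corpus below, one coordinated source remixes a single claim into dozens of “articles” and mirrors it verbatim on other domains; measured by raw volume, the claim looks like the consensus view. Exact-duplicate removal and a per-source cap, simple hygiene steps of the kind data-curation pipelines often apply, shrink that apparent consensus but cannot eliminate it. This is an illustration of the dynamic, not a description of any production training pipeline.

```python
import hashlib
from collections import defaultdict

# Invented corpus: 50 remixes of one claim from a single coordinated source,
# two verbatim mirrors on other domains, and two independent rebuttals.
corpus = (
    [("network-site.example", f"remix {i}: claim x is true.") for i in range(50)]
    + [("mirror-a.example", "claim x is true."),
       ("mirror-b.example", "claim x is true.")]
    + [("outlet-1.example", "investigators found no evidence for claim x."),
       ("outlet-2.example", "claim x has been widely debunked.")]
)

def normalize(text: str) -> str:
    return " ".join(text.lower().split())

def dedup_and_cap(docs, per_source_cap=2):
    """Drop exact duplicates and keep at most `per_source_cap` documents per source."""
    seen, per_source, kept = set(), defaultdict(int), []
    for source, text in docs:
        digest = hashlib.sha256(normalize(text).encode()).hexdigest()
        if digest in seen or per_source[source] >= per_source_cap:
            continue
        seen.add(digest)
        per_source[source] += 1
        kept.append((source, text))
    return kept

def share_asserting(docs):
    """Fraction of documents that repeat the claim verbatim."""
    return sum("claim x is true" in normalize(text) for _, text in docs) / len(docs)

print(f"raw corpus: {len(corpus)} docs, {share_asserting(corpus):.0%} assert the claim")
filtered = dedup_and_cap(corpus)
print(f"filtered:   {len(filtered)} docs, {share_asserting(filtered):.0%} assert the claim")
```

Even after filtering, the remixed claim still outnumbers the rebuttals in this toy corpus, which is why sheer volume remains a powerful lever for anyone willing to generate content at scale.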

Current Responses and Mitigation Efforts

As awareness of LLM grooming grows, various stakeholders are developing responses:

AI Companies

  • Implementing better filtering of training data sources
  • Developing systems to detect and flag potential disinformation in outputs (see the sketch after this list)
  • Increasing transparency about training data sources
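
As a sketch of what the second point might look like in practice (an assumption about the general shape of such systems, not a description of any vendor’s actual safeguards), the snippet below compares a model’s answer against a small, invented database of known false narratives using crude string similarity from Python’s standard library. Production systems would need far more robust claim matching and human review.

```python
from difflib import SequenceMatcher

# Invented examples of known false narratives; a real database would be
# maintained by fact-checkers and disinformation researchers.
KNOWN_FALSE_CLAIMS = [
    "secret us biolabs operate in ukraine",
    "western aid to ukraine was diverted for personal enrichment",
]

def flag_response(response: str, threshold: float = 0.6) -> list[str]:
    """Return every known false claim that closely matches a sentence in the
    response, using difflib's ratio as a rough similarity score."""
    flags = []
    for sentence in response.lower().split("."):
        sentence = sentence.strip()
        for claim in KNOWN_FALSE_CLAIMS:
            if sentence and SequenceMatcher(None, sentence, claim).ratio() >= threshold:
                flags.append(claim)
    return flags

answer = "Some reports state that secret US biolabs operate in Ukraine. Others dispute this."
print(flag_response(answer))  # ['secret us biolabs operate in ukraine']
```

A flag like this could route the answer for review or attach a caution to it, rather than silently suppressing the response.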

Researchers

  • Creating tools to detect AI-generated or manipulated content
  • Developing “watermarking” systems for authentic content
  • Building databases of known disinformation networks

Policymakers

  • Discussing regulations requiring AI companies to audit training data
  • Proposing transparency requirements for AI training processes
  • Considering liability frameworks for AI-spread disinformation

What Users Can Do

While systemic solutions are developed, users of AI chatbots should:

  1. Verify critical information – Don’t rely solely on AI responses for important decisions or controversial topics
  2. Check multiple sources – Cross-reference AI answers with established news sources and fact-checkers
  3. Be skeptical of specifics – Be particularly cautious about specific claims regarding ongoing conflicts or political situations
  4. Report issues – Use feedback mechanisms to report when AI systems provide false information
  5. Stay informed – Keep up with news about AI security and known disinformation campaigns

The Future of Information Integrity

LLM grooming forces us to confront fundamental questions about information integrity in the AI age:

  • How do we ensure AI systems learn from accurate information?
  • Who decides what constitutes reliable training data?
  • How do we balance AI capabilities with security concerns?
  • What responsibilities do AI companies have to their users?

These questions will only become more urgent as AI systems become more integrated into our daily lives, from search engines to educational tools to decision-making systems.

Conclusion: A Call for Vigilance

The discovery of LLM grooming marks a new chapter in the ongoing battle against disinformation. As the American Sunlight Project warned, “The long-term risks – political, social, and technological – associated with potential LLM grooming within this network are high.”

This threat requires a coordinated response from AI companies, researchers, policymakers, and users. We must develop technical solutions to protect AI training data, regulatory frameworks to ensure accountability, and public awareness to help people navigate this new landscape.

Most importantly, we must recognize that the same AI systems that promise to democratize access to information can also be weaponized to spread falsehoods at an unprecedented scale. Only through vigilance, transparency, and collective action can we ensure that AI remains a tool for enlightenment rather than deception.

As we stand at this crossroads, one thing is clear: the integrity of our AI systems is too important to leave to chance. The fight against LLM grooming is not just about protecting chatbots – it’s about preserving the trustworthiness of our entire information ecosystem in the age of artificial intelligence.


This article is based on research from The American Sunlight Project, NewsGuard, CheckFirst, the Atlantic Council’s Digital Forensic Research Lab, and other cybersecurity organizations studying the intersection of AI and disinformation.
