When the training data becomes the battlefield: reflexive control and LLM poisoning

Many might have missed it, however last week DFRLab put out a new report, titled “Pravda in the pipeline: Early evidence of state-adjacent propaganda in AI training data” on AI poisioning. The report shows that by November 2025, some 40,000 English-language articles from the Pravda network had been archived in Common Crawl. This is the open dataset that serves as a primary ingredient for training most major language models.

The Pravda network is a network of over 150 pro-Kremlin websites that collectively published more than 10,000 articles per day. The network’s technical configuration, robots.txt and sitemaps explicitly designed to facilitate automated crawling, suggest that reaching human readers was not the primary objective. This is a relevant finding if we combine it with the finding in October 2025, that approximately 250 poisoned documents, representing as little as 0.00016% of total training data, can embed a persistent backdoor in a large language model, regardless of model size. Not a percentage of the data. A fixed, small, absolute number.

The convergence of these two findings raises a question that sits at the intersection of information warfare doctrine and AI security: can the Russian concept of reflexive control be applied to the poisoning of LLM training data?

In this blog post I will explore that question. My conclusion is that the mapping is imperfect but instructive. Understanding where the concept applies, and where it breaks down, offers a useful analytical framework for intelligence professionals who increasingly encounter LLM-generated information in their work. Before I dive into the analysis, some background explanation is needed. First I give some attention and thereafter to the vulnerabilities of LLMs.

Reflexive control: the mechanism

Reflexive control (рефлексивное управление) is a Soviet-era doctrine, developed from the 1960s onwards, that describes a specific type of information manipulation. The most widely cited English-language treatment is Timothy Thomas’s 2004 article “Russia’s Reflexive Control Theory and the Military“.

The core mechanism of reflexive control can be summarised as follows. The controller studies the adversary’s cognitive “filter”, the term for the target’s stable set of concepts, knowledge, experience and biases through which all information is processed. The controller then constructs tailored information inputs designed to pass through that filter and cause the target to voluntarily make a decision that serves the controller’s interests. The target never knows they are being controlled. They believe they arrived at their decision independently.

This is distinct from simple deception (maskirovka). Deception hides reality. Reflexive control shapes the adversary’s decision-making process itself. The Crimea operation in 2014 is widely cited as a contemporary example: a combination of military distraction, media suggestion, and diplomatic paralysis that caused Western decision-makers to voluntarily delay responses, exactly as intended.

Four vulnerable stages in the LLM pipeline

To understand how this doctrine might apply to AI systems, it helps to distinguish the four stages at which an LLM’s training pipeline is vulnerable to manipulation.

Pre-training data is the foundation. Models like GPT-4, Claude and Llama are trained on vast web scrapes. Common Crawl alone contains hundreds of billions of pages and no human review of this data is possible at scale. Poisoned content that enters the pre-training data becomes embedded in the model’s weights: persistent, difficult to attribute, and requiring full retraining to remediate. This is the stage that should receive most attention from the security community.
Fine-tuning data is smaller but not immune. Research by Wan et al. (2023) demonstrated that approximately 100 poisoned instruction-tuning samples are sufficient for targeted behaviour modification.
Then reinforcement learning from human feedback (RLHF) preference data presents a particularly dangerous attack surface. Rando and Tramèr (ICLR 2024) showed that poisoning as few as 0.5% of human feedback annotations can install universal jailbreak backdoors that generalise to entirely unseen prompts.
Instruction tuning datasets, many of which are shared openly on platforms such as Hugging Face, present supply-chain risks analogous to software dependency vulnerabilities.

For intelligence analysts, the key distinction is between pre-training poisoning and everything else. Fine-tuning, RLHF and instruction tuning are at least in principle auditable, the datasets are smaller, the provenance can be checked. Pre-training poisoning operates at a scale that makes systematic detection intractable with current methods.

The research: what has been demonstrated

Three studies define the current understanding of the threat. First is the Anthropic sleeper agents paper (Hubinger et al., January 2024) proved that LLMs can be trained to behave correctly under normal conditions while activating hidden behaviours when triggered by specific conditions, for example, writing secure code when the year is 2023, but inserting exploitable vulnerabilities when the year changes to 2024. The critical finding for the security community: standard safety training such as supervised fine-tuning, RLHF, and adversarial red-teaming, all failed to remove these backdoors. Larger models showed more persistent backdoors, not fewer.

The 250-document study I already mentioned scaled this to pre-training. Their finding that a near-constant number of poisoned documents suffices regardless of model size overturns the prior assumption that attackers would need to poison a percentage of training data, a seemingly prohibitive requirement. In practice, 250 documents is trivial.

Carlini et al. (2024) demonstrated two practical attack vectors. Split-view poisoning exploits expired domains still referenced in datasets, the authors estimated that 0.01% of the LAION-400M dataset could have been poisoned for approximately $60 by purchasing such domains. Frontrunning poisoning times edits to Wikipedia or other mutable sources to coincide with scheduled dataset snapshots. Both attacks were demonstrated as practical against ten popular training datasets.

The Pravda network: a case in point

Let’s now get to the case of the Pravda network. The most concrete example of what appears to be a deliberate, state-linked attempt to contaminate LLM training data is the Pravda network. Identified by VIGINUM in February 2024 and subsequently investigated by NewsGuard, the American Sunlight Project, and the DFRLab, this network comprises over 150 websites founded by Yevgeny Shevchenko, linked to Russian IT company TigerWeb.

Several indicators point to LLM contamination as a primary objective rather than a secondary effect. The sites generated enormous volumes of content, i.e. 10,000+ articles per day, but attracted negligible human traffic: approximately 31,000 visits per month across the entire network. As a media operation aimed at human audiences, this would be a failure. As a training data contamination operation, it is precisely structured. The DFRLab’s April 2026 report “Pravda in the Pipeline” documented the growth from 37 Pravda articles in Common Crawl in November 2024 to approximately 40,000 by November 2025.

The American Sunlight Project coined the term “LLM grooming” for this type of activity, a deliberately provocative term designed to convey the patient, long-term nature of the contamination effort.

It should be noted that the evidence is not uncontested. A Harvard Kennedy School study (Alyukov et al., 2025) tested four major chatbots against 13 known Kremlin disinformation claims and found that identifiable false narratives surfaced primarily in response to niche or obscure queries, leading the authors to favour a “data void” explanation over deliberate grooming. However, their methodology has a significant limitation when viewed through a reflexive control lens: they measured whether LLMs repeat recognisable disinformation i.e. known false claims traceable to Kremlin sources. By the logic of reflexive control, that is precisely what a successful operation would not produce. Effective training data manipulation would manifest as subtle shifts in framing, selective omission, and altered weighting of arguments. None of these involve repeating an identifiable talking point, and none of which the Alyukov methodology can detect. Their study answers the question “are LLMs parroting known Kremlin narratives?” and the answer is: rarely. But it does not, and methodologically cannot, answer the question “has the baseline framing of LLM outputs on contested topics been shifted by contaminated training data?” That second question remains open. Also the DFRLab’s finding of actual Common Crawl contamination stands, the debate concerns the mechanism (accidental data voids versus deliberate targeting), not the fact of contamination.

GEO: the more visible cousin

A separate but related phenomenon is visible in marketing circles: Generative Engine Optimisation, or GEO. Formally defined by Aggarwal et al., GEO describes the optimisation of web content to increase visibility in AI-generated responses. Their research demonstrated up to 40% visibility improvement using techniques such as adding statistics, source citations and authoritative language.

GEO operates at a fundamentally different level from training data poisoning. It targets retrieval-augmented generation (RAG) systems which is the live web searches that supplement an LLM’s trained knowledge. This distinction matters for security assessment: GEO manipulation is detectable (the source content can be inspected), reversible (update the source and the output changes), and attributable (traceable to specific domains). Training data poisoning, by contrast, is embedded in model weights, opaque, persistent, and practically irreversible without retraining.

The Pravda network represents what appears to be a hybrid approach: content optimised to surface in both RAG retrieval and training data pipelines simultaneously. The fact that a growing GEO industry already exists, demonstrates that the commercial infrastructure for influencing AI outputs is mature and available.

Does reflexive control actually apply?

Upon reflection, my initial idea appears not to be completely correct. The conceptual mapping between reflexive control and LLM data poisoning is suggestive but breaks down on several defining features of the doctrine.

First, there appears no voluntary decision-maker. Classical reflexive control requires the target to voluntarily make a specific decision. An LLM is a statistical pattern-completion system, it has no agency, beliefs, or decision-making process. Poisoning its weights is closer to sabotaging a tool than deceiving an adversary.
Also there is no cognitive filter to model. The doctrine requires deep study of the adversary’s cognitive framework. LLM processing is opaque even to its creators. No attacker can precisely predict how poisoned data will influence outputs across billions of possible queries.
There is no feedback loop. In Crimea 2014, Russian operators observed Western reactions in real time and calibrated their approach. Once poisoned data enters a training corpus, the attacker has no real-time feedback on its effects.
Lastly, we see diffusion, not precision. Classical reflexive control is targeted at specific decision-makers with precise intended outcomes. LLM poisoning is however inherently diffuse. The attacker cannot control who queries the model, what they ask, or how they weight the response.

In sum, we cannot map the idea reflexive controle suffcientkly on the LLM manpulation that appears to be ongoing.

Where the concept does apply: a layered model

That all said, there is a level at which the reflexive control framework offers genuine analytical value, that is if we apply it not to the LLM itself but to the broader decision-making ecosystem. Consider the following layers:

At the technical layer, poisoning training data targets model weights. This is supply-chain contamination, not reflexive control.
At the operational layer, contaminated outputs shape the information available to users. This is information environment manipulation.
At the strategic layer, decision-makers who rely on LLM-mediated information make choices informed by poisoned outputs. This is where the reflexive control mapping becomes relevant but only if the attacker has modelled the decision-making ecosystem sufficiently to predict how poisoned outputs will reach decision-makers.
At the systemic layer, poisoned LLM outputs feed back into next-generation training data, creating a self-reinforcing contamination cycle. This is epistemic infrastructure degradation.

Vasara’s (2020) distinction between constructive and destructive reflexive control is helpful here. Constructive reflexive control steers the adversary toward a specific desired decision. Destructive reflexive control degrades the adversary’s decision-making capacity. LLM data poisoning maps far more credibly onto the destructive variant: eroding the reliability of AI-mediated information systems rather than steering specific decisions.

There is one further dimension where the reflexive control framework applies with uncomfortable precision. The structural economics of information create an asymmetric battlefield. Authoritarian-state media networks produce high-volume, freely accessible, AI-optimised content. Quality Western journalism sits behind paywalls and increasingly blocks AI crawlers. The Foundation for Defense of Democracies reported this problem last March and demonstrated that 57% of LLM responses to questions about contested international conflicts cited state-aligned propaganda sources. This is not because the models were hacked it is because authoritarian content is freely available where independent journalism is not.

In this sense, Western AI developers, acting entirely rationally within market incentives, make the voluntary choice to train on freely available data that happens to be disproportionately shaped by adversarial actors. The adversary does not need to deceive anyone. They need only be the loudest voice in the room when the training data is collected. If there is a reflexive control operation at work here, it operates through the target’s own rational decision-making framework which is, in fact, the purest form of the doctrine.

Implications for intelligence analysis

For intelligence and security professionals, several practical conclusions could follow from this analysis.

First, unsurprisingly, default scepticism toward LLM output is warranted. I try to apply LLM capabilities in every stage in the intelligence cycle, including in the collection stage. Often LLMs are able to find certain open sources which I might not have easily found using other, more classic, collection methods. However, what I do not know, is which sources or arguments were not presented to me because they received a lower weight.

Secondly, and a bit more on the positive side: Indicators exist and can be monitored. Research identified observable precursors to LLM data poisoning: networks of websites with high crawl rates but negligible human traffic; robots.txt configurations designed to maximise crawler access; rapid growth of previously obscure domains in Common Crawl archives; content published in implausible language combinations (the Pravda network published in Scottish Gaelic, Welsh and Maori targeting low-resource language data voids); and coordinated Wikipedia editing campaigns timed to dataset snapshots.

Third, the GEO ecosystem provides a commercial blueprint. The techniques used by marketing agencies for Generative Engine Optimisation are methodologically identical to those required for adversarial LLM manipulation. The difference is intent, not capability. Any state actor with a GEO-literate team can apply these methods to influence operations.

Lastly, the “paywall paradox” is a structural vulnerability. The fact that high-quality information is increasingly locked behind paywalls while propaganda is freely available and AI-optimised, is not a technical problem it is a strategic one. It cannot be solved by better poisoning detection alone.

Conclusion

Is LLM training data poisoning reflexive control? Strictly speaking, no, the doctrine’s requirements for a voluntary decision-maker, cognitive modelling, precision targeting and real-time feedback are not met at the technical level of data poisoning itself. However, at the strategic level, where poisoned AI outputs feed into the decision-making processes of analysts, policymakers and citizens, the framework offers a useful lens, particularly in its destructive variant.

What is beyond reasonable doubt is that the capability exists, that the intent has been signalled (the Pravda network, Putin’s 2023 remarks on training AI from a Russian perspective), and that contamination has already occurred (40,000 articles in Common Crawl). Whether this constitutes reflexive control in the doctrinal sense is an academic question. Whether it constitutes a threat to the integrity of AI-mediated information, and therefore to the quality of intelligence analysis, is not.

This threat is not limited to commercial chatbots. Organisations that deploy in-house LLMs face the same vulnerability, unless they were trained exclusively on controlled, curated data. Any model built on Common Crawl, The Pile, or other open web scrapes inherits whatever contamination those corpora contain. The label “in-house” provides no protection if the training data was not.

For intelligence professionals, the practical response goes beyond the traditional principles of systematic evaluation and independent corroboration. It requires actively challenging the completeness of any LLM-generated collection output and critically examining the weighting of arguments in any reasoning task an LLM performs. An LLM does not flag what it does not know. It does not disclose that a counterargument was underrepresented in its training data. That burden falls on the analyst.