The Dispatch


The Zombie Scientific Archive

How AI-generated content and misinformation are corrupting online academic resources, creating a "zombie" internet where errors and fake science perpetuate themselves.

By Claudia Civinini

"What we see on Facebook is not the dead internet but the zombie internet – a mix of bots, humans and accounts that were once human but are not anymore."
"As it is now easier to create presentable academic research from the simplest of prompts, we would expect to see more and more content being created, which may create a demand for more predatory journals to publish it."
“The Dead (or Zombie) Internet Theory sees the internet as a place with not much human input; it’s all bots talking to each other.”

It’s been going on for a while. Everyone’s English spelling has suddenly become perfect, but the things we read often struggle to make sense. Grammatically correct word salads are everywhere. And while em dashes were in use long before ChatGPT, they were never quite this conspicuous in the wild.

And why are my friends’ posts on social media not showing up anymore? Is anybody there?

If you feel like this, you are not alone.

A similarly eerie feeling partly inspired two scientists, Dr Vlada Rozova and Dr Jake Renzella, to write an article about the Dead Internet Theory for The Conversation.

“A lot of the emails we send each other are very cliched, and with the large language models (LLMs) becoming really easily accessible, it seems everything we read written by the students is also very much like a bunch of templates just piped together,” says Dr Rozova, Research Fellow in Applied Machine Learning at the Centre for Digital Transformation of Health at the University of Melbourne, Australia.

“One of my colleagues said, ‘this year, the [research] proposals read nicely because all of a sudden no one has problems writing in English anymore. It’s easier to read, but it doesn’t make sense’,” she tells QS Insights Magazine.

When they talked about the subject of their article with friends and family, one of the most common reactions was a sense of relief, because many had the same feeling, adds Dr Renzella, Senior Lecturer at the School of Computer Science and Engineering at the University of New South Wales, Australia.

In the article, Dr Rozova and Dr Renzella reviewed evidence that bot-generated content is deployed on social media to push misinformation supporting specific narratives.

According to the Dead Internet Theory, much of what we see online is bot activity. While that is an exaggeration, a look at the feeds of some social media platforms lends the theory a degree of credibility.

The eeriness of the online world shines through the data as well.

According to the Thales Bad Bot Report, automated traffic surpassed human activity in 2024 for the first time in a decade, accounting for 51 percent of all web traffic. The share of automated traffic online has been rising steadily, according to past editions of the report.

Bad bots, software that performs automated tasks with malicious intent, accounted for 37 percent of all internet traffic.

The rise in bad bot activity specifically has the potential to damage academic integrity, Tim Ayling, Cybersecurity Specialist at Thales, tells QS Insights Magazine.

“They can be used to manipulate academic metrics, such as citation counts or journal impact factors. For example, some entities might deploy bots to generate fake citations or inflate the visibility of specific papers, skewing the evaluation of academic work,” he explains.

Generative AI, he explains, makes developing simple bots easier, enabling even people without deep technical knowledge to launch bot attacks. LLMs also make it possible to generate unoriginal or misleading academic content for submission and publication.

“Bot activity can support unethical and predatory publishing, because thanks to automation it can happen at a vast scale,” Ayling explains.

“This automated publishing threatens to flood academic communities with misleading and poorly generated content that may infringe on the copyright of existing material.”

Writing on 404 Media, Jason Koebler used another horror analogy to describe the state of things online, suggesting that what we see on Facebook is not the dead internet but the zombie internet – a mix of bots, humans and accounts that were once human but are not anymore.

The zombie analogy is particularly useful to understand what this situation means for academia, as it seems to fit with the online undergrowth of unethical scientific publishing: a mix of AI-generated content, human fraud, and hard-to-kill mistakes and fake science potentially living on forever in AI models.

Fanning the Fake Science Flames

Perfect English is everywhere, but some predatory journals, somewhat reassuringly, still send academics emails riddled with errors. Professor Graham Kendall, Deputy Vice Chancellor (Research & Quality Assurance) and incoming Vice-Chancellor at MILA University Malaysia, who runs the @fake_journals account on X, said in an interview with Cabells’ Simon Linacre that he would have expected to be seeing much better-written spam emails by now. “That has not changed since November 2022, when large language models became widely available,” he said.

Besides those misspelled emails, perhaps already a quaint reminder of a bygone era, generative AI is now central to the production of junk science. According to Retraction Watch’s Ivan Oransky, generative AI has allowed the bad actors in scientific publishing to “industrialise the overflow” of junk papers.

Generative AI is, as many breathlessly repeat, just a tool. And the incentives to use that tool for the wrong reasons existed before its introduction: unethical publishing practices and predatory journals are an old problem.

“Of course - generative AI, and that type of cheap content it can generate, does accelerate predatory publishing. Why not? It’s been going on for a long time, even before ChatGPT, and now those tools are available to everyone,” Professor Jutta Haider, a Professor in Information Studies at the University of Borås, Sweden, tells QS Insights Magazine.

Some of the incentives to publish unethically come from academia itself, as has been extensively documented.

Professor Haider thinks there should be a limit on how much academics publish. “It should be quality over quantity,” she says.

Generative AI tools that can be used for writing papers (not tools supporting research investigations) seem to amplify exactly the wrong aspect of academia.

“If it’s to make more text output that accelerates, amplifies, supercharges things that were wrong in the first place… we should instead think about redesigning policies and the evaluation systems that we use,” she explains.

“What do we want research for - to publish more or know more? They are two different things.”

While the new tools didn’t create the problem of predatory publishing, they will certainly alter the landscape to some extent.

Simon Linacre, Chief Commercial Officer at Cabells and author of The Predator Effect: Understanding the Past, Present and Future of Deceptive Journals, explains that authors who use predatory publishing fall into two categories: the unaware and the unethical. It’s unlikely that the wider availability of generative AI tools will affect unaware authors. However, it will make things significantly easier for the unethical group.

“If they can use AI to cheat, they’ll use it,” he says.

“And they don’t actually have to take the additional risk of publishing in a predatory journal, if they know it’s predatory. They can try and get the paper published in a legitimate journal. And that’s easier because there is less risk involved: there is going to be less shame in mistakenly using AI than knowingly publishing in a predatory journal.”

Another change could see predatory journals and paper mills joining ranks, Linacre explains. “I could almost see it as a predatory journal service - you pay $100 and you get your article published, but if you pay $250, we’ll write that article for you. For that unethical constituency, that might be attractive.”

Linacre adds that Cabells’ Predatory Reports now include nearly 20,000 journals, with a consistent rate of growth since the reports started in 2017.

While the wide availability of generative AI tools hasn’t yet left a clear imprint on the data on the number of predatory journals, the situation may change.

“As it is now easier to create presentable academic research from the simplest of prompts, we would expect to see more and more content being created, which may create a demand for more predatory journals to publish it,” Linacre says.

Kendall also agrees that generative AI will make the paper mill problem worse: he envisages people producing papers with generative AI and coercing editors and reviewers into publishing them, while the problem of authorship for sale also worsens. Paper mills have indeed become more enterprising; an investigation found that some had been bribing journal editors.

Once online, a flood of low-quality content is hard to contain – content selection and curation have never been the internet’s strongest features.

Discussing the problem of misinformation and junk science online, Dr Renzella explains that both stem from a genuinely positive development: a free, accessible, powerful internet that allows anyone to reach the world.

“That’s amazing, of course, but the downside of it is that everyone, including bots and people with dodgy intentions, can reach the world,” he says.

“Maybe we should ask: have we gone too far? I think we need to find ways to police it, to put sense checks into it, and to curate the content so that’s really meeting the bar to be published.”

Convincing-Looking, But Fake

Professor Haider co-authored research that found papers produced using ChatGPT listed on Google Scholar.

According to the research, there are increasing numbers of ‘questionable’ papers created using AI tools, and they tend to be about crucial topics that shape politics and policy, such as the environment, health and computing.

The questionable papers were identified by scraping Google Scholar for articles containing two phrases that AI tools are likely to use: “as of my last knowledge update” and “I don’t have access to real time data”.
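
To make the approach concrete, here is a minimal sketch of that phrase-matching idea. It is not the researchers’ actual pipeline (the article does not say which tools they used); it assumes the third-party Python library scholarly is installed, and the field names it reads are assumptions that may vary between library versions.

```python
# Minimal, hypothetical sketch of searching Google Scholar for chatbot
# boilerplate phrases. Not the study's actual method; assumes the
# third-party `scholarly` library (pip install scholarly). Google Scholar
# may rate-limit or block automated queries.
from scholarly import scholarly

# Phrases that AI chatbots tend to emit, quoted for exact-phrase search.
TELLTALE_PHRASES = [
    '"as of my last knowledge update"',
    '"I don\'t have access to real time data"',
]

def find_suspect_papers(max_results_per_phrase: int = 20) -> list[dict]:
    """Return basic bibliographic details of papers containing the phrases."""
    suspects = []
    for phrase in TELLTALE_PHRASES:
        results = scholarly.search_pubs(phrase)  # generator of publication records
        for i, pub in enumerate(results):
            if i >= max_results_per_phrase:
                break
            bib = pub.get("bib", {})  # field names vary by library version
            suspects.append({
                "phrase": phrase,
                "title": bib.get("title", ""),
                "year": bib.get("pub_year", ""),
                "venue": bib.get("venue", ""),
            })
    return suspects

if __name__ == "__main__":
    for hit in find_suspect_papers():
        print(f'{hit["phrase"]}: {hit["title"]} ({hit["year"]}, {hit["venue"]})')
```

A real study would of course go further, manually checking each hit to rule out legitimate uses of the phrases, for instance in papers that study chatbot output.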

Two-thirds of the papers retrieved were found to have been produced with undisclosed use of AI tools. Of these, 57 percent dealt with sensitive and policy-relevant topics.

“These texts, when made available online… leak into the databases of academic search engines and other parts of the research infrastructure for scholarly communication. This development exacerbates problems that were already present with less sophisticated text generators,” the authors wrote.

According to the authors, AI-produced fake scientific papers can overwhelm the scholarly communication system and jeopardise the integrity of the scientific record. These papers can also be optimised for popular search engines. They warn of an increased risk of what they suggest calling evidence hacking: the strategic and coordinated malicious manipulation of society’s evidence base.

Academic search engines, they explain, should be equipped with filtering options that include quality – allowing, for example, users to filter by ‘peer-reviewed’ papers – to mitigate the risks.

Google Scholar is commonly regarded as a tool that students (and the public) can use to form an evidence-based opinion on a topic, but, Professor Haider says, it doesn’t work like that anymore.

“It doesn’t distinguish between good and bad, indexed and non-indexed research. It’s really easy to get into Google Scholar,” she says.

“That can be really dangerous for policy and evidence hacking. It’s been done before, and it’s so much easier now.”

Google Scholar, Professor Haider explains, is a good tool for someone who already knows the discipline they are researching and how to do research online; otherwise, she says, it’s like climbing without ropes.

Courses on how to use Google Scholar could help, she concedes, but Google Scholar should also take responsibility, make amends and offer appropriate support for its users, she adds. “If we continue with the metaphor of rock climbing, you can’t expect people to do free climbing and not even offer hooks and ropes. It’s irresponsible.”

Google did not respond to our request for comment.

Zombie Papers and Other Hard-to-Kill Creatures

Professor May R. Berenbaum wrote in a 2021 article that she had no idea that killing zombies would be part of her job as Editor in Chief of Proceedings of the National Academy of Sciences (PNAS).

The zombies she referred to are retracted articles that continue to be cited; a sign that the scientific archive is already hard to clean up, and that the retraction system needs a rethink.

With generative AI, however, there could potentially be a much bigger zombie population to deal with: errors and junk science that live on inside an AI model.

According to a 2024 paper, the likelihood that unreliable and invalid articles are used for training AI models is high.

And the likelihood that AI tools, as they scrape the internet, are trained on junk science is equally high.

“Almost all predatory journals are open access,” Linacre explains. “Therefore, it’s safe to assume that they have been already, to a greater or lesser degree, covered by the different AIs.”

This will train AI models to perpetuate errors, which will affect the quality of their responses. That is going to be an issue for scientists but, most importantly, for the general public if they rely on AI to research and summarise information.

“AI can’t identify predatory journals. That’s going to be an issue because, with AI, people are not reading the papers, whether that’s a journalist or a researcher,” Linacre explains.

“If you ask AI to give you ten sources, and two of them are from a predatory journal, just because the AI has found it in this kind of huge, seething maelstrom of data it has in a server farm, then you have no way of checking it.”

In some cases, the mistakes may have been created by AI itself. Recently, the term ‘vegetative electron microscopy’ appeared in some papers.

The term is nonsense, but it originated when AI scanned two papers from the 1950s in which those words appeared, not in the same sentence, but across two columns. The AI combined the words, and a new term was born.

A mistranslation is also to blame, according to an article in The Conversation tracing the origins of ‘vegetative electron microscopy’. Eventually, the term made its way into other papers, and the article’s authors, Dr Aaron J. Snoswell, Dr Kevin Witzenberger and Rayane El Masri, found that it is now embedded in AI models.

They call the term a digital fossil, an error preserved in AI systems in the same way biological fossils are trapped in rocks. Its existence, the authors argue, raises important questions about knowledge integrity as AI-assisted research and writing become more common. The nonsense term may be permanently embedded in AI knowledge bases, and other ‘quirks’ are likely present too, they say.

“Digital fossils reveal not just the technical challenge of monitoring massive datasets, but the fundamental challenge of maintaining reliable knowledge in systems where errors can become self-perpetuating,” the article reads.

“Tech companies must be more transparent about training data and methods. Researchers must find new ways to evaluate information in the face of AI-generated convincing nonsense. Scientific publishers must improve their peer review processes to spot both human and AI-generated errors.”

“[Artefacts are] a problem for research. Maybe it’s just one reference among the hundreds so it doesn’t change anything. But it could,” Professor Haider says.

“And it makes it so much easier to question scientific publications and science itself as an institution. Just the fact that this can exist is a huge problem because it makes something that’s already under attack even more vulnerable.”

Tracing the original source of information is not always possible.

Link rot and digital decay are well-documented problems affecting online content. According to research from Pew Research Center, a quarter of web pages that existed between 2013 and 2023 are no longer accessible. A paper analysing 95 articles with a total of 2,424 references found that 14.7 percent of citations were unavailable.

While there are specific procedures and frameworks in place that publishers need to abide by to keep copy safe, Linacre explains, this doesn’t apply to some content online.

“A lot of the pre-print servers are not involved in these kinds of safeguarding schemes,” Linacre observes.

“And if you put anything on a website without the safeguarding, then it’s liable to digital decay. And if that has been sucked up by the AI and then it’s disappeared, then you can’t check the original link.”

The Zombie Scientific Archive

The Dead (or Zombie) Internet Theory sees the internet as a place with not much human input; it’s all bots talking to each other.

It’s plausible that the situation is not that far off in the most lawless parts of the unethical publishing world. In those zombified quarters, generative AI is used to write junk scientific papers based on the material it is trained on, which includes AI-generated junk science, and the cycle continues. Human intervention is possibly limited to prompts, paper mills and predatory publishers.

While human oversight wasn’t exactly a feature of predatory publishing pre-generative AI either, the junk science in the zombie scientific archive could become, like the ‘vegetative electron microscopy’ error, self-perpetuating.

Predatory publishing was already a problem before generative AI, spreading misinformation, jeopardising the reliability of the scientific archive and damaging the credibility of science in society.

Generative AI has not only magnified the scale of the problem, allowing predators to produce better-looking junk science faster, but has also made finding and fixing it much harder.

Thankfully, there are more and more zombie hunters out there.