The Dispatch


Zombie Hunters

While academic zombies silently infect research, some hunters are finding new ways of dealing with unethical and fraudulent AI-generated papers.

By Claudia Civinini

Talking points

  • Predatory publishers, paper mills, and the use of AI to generate fraudulent research papers are undermining the integrity of scientific literature.
  • The proliferation of AI-generated and fraudulent papers is creating a credibility crisis which is compounded by the difficulty in detecting AI-generated content, as traditional tools become outdated.
  • To combat these issues, researchers have developed new tools to detect unethical papers. Fostering a sleuthing mindset among students and improving collaboration among scientists, universities, and publishers are crucial.

Professor Graham Kendall, Vice-Chancellor at MILA University Malaysia, led a secret life for a while, anonymously managing the @fake_journals account on X.

The inspiration for an account shedding light on the predatory publishing landscape, Professor Kendall explains, came after Jeffrey Beall retired and his famed list of predatory publishers, aptly titled Beall’s List, was consequently closed down.

“Since Beall started his work in 2010, we haven’t really made any inroads into stopping those practices. If anything, it’s got worse,” he says.

One post in particular gained Professor Kendall’s account a surge of followers, pushing it past the 10,000 mark: it was about hyper-prolific authors.

While Professor Kendall concedes that, in some research fields, publishing shorter papers or a series of papers is more common, over-the-top productivity, such as publishing more than one paper a day, should be looked into, he says.

“There are certainly people using paper mills and generative AI to produce papers.”

But that is only the tip of the iceberg. Generative AI, paper mills, poor peer review and predatory publishers are all issues that, combined, will lead to the integrity of the scientific archive being harmed, according to Professor Kendall.

“If it carries on too long, we just won’t be able to rely on the scientific archive anymore, or have any faith in it,” he says.

“People would end up using papers produced with generative AI, papers that are not really good science, that haven’t been peer reviewed, but have still made their way into the scientific literature.”

Unethical publishing is an old problem, but generative AI tools are lending unprecedented speed to fraudsters, magnifying the scale of the problem. This in turn is creating a zombie academic archive, a mix of AI-generated content, human fraud, and hard-to-kill mistakes and fake science potentially living on forever in AI models.

Scientists all over the world are responding to the threat and trying to hunt down unethical publishing in a bid to safeguard the integrity of the scientific literature.

There have been warnings that the deluge of junk papers is creating a credibility crisis, and it is slowing research, especially in crucial fields such as medicine.

In comments to the Guardian, Professor Dorothy Bishop said in 2024: “In many fields, it is becoming difficult to build up a cumulative approach to a subject, because we lack a solid foundation of trustworthy findings. And it’s getting worse and worse.”

Bishop was one of the organisers of a recent conference at the University of Oxford in the UK on academic integrity and fraud.

Artificially More Intelligent

Dr Ophélie Fraisier-Vannier is a postdoctoral researcher at the Institut de Recherche en Informatique de Toulouse in France. Last year, she joined the team led by Dr Guillaume Cabanac, a professor of computer science whose activity as a ‘deception sleuth’ was recognised by Nature in 2021, to conduct further work on, among other topics, fraud detection.

Dr Cabanac is the creator of the Problematic Paper Screener (PPS), which screens research for signs that it has been produced unethically. These signs include tortured phrases – awkward stand-in expressions, for example ‘counterfeit consciousness’ instead of ‘artificial intelligence’ – which signal a crude use of synonyms, usually a way to disguise plagiarism. The PPS also flags other problematic papers, such as those generated with SCIgen, those containing ChatGPT ‘fingerprints’, or citejacked papers – articles in legitimate journals that cite articles in hijacked journals.
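
To give a sense of how this kind of screening works in principle, here is a minimal sketch in Python – not the actual Problematic Paper Screener, whose curated lists are far longer and more nuanced – that scans a paper’s text for a few documented tortured phrases:

    # Minimal sketch of tortured-phrase screening; illustrative only, not the real PPS.
    # The dictionary maps a few documented tortured phrases to the expected original term.
    TORTURED_PHRASES = {
        "counterfeit consciousness": "artificial intelligence",
        "profound learning": "deep learning",
        "bosom peril": "breast cancer",
    }

    def screen_for_tortured_phrases(text: str) -> list[str]:
        """Return one flag per known tortured phrase found in the text."""
        lowered = text.lower()
        return [
            f"tortured phrase: '{phrase}' (expected '{original}')"
            for phrase, original in TORTURED_PHRASES.items()
            if phrase in lowered
        ]

    print(screen_for_tortured_phrases(
        "We apply counterfeit consciousness and profound learning to medical images."
    ))

Real detectors work from much longer curated lists and must cope with variant wordings, but the underlying idea – matching known fingerprints against the full text of a paper – is the same.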

“It’s a really fascinating topic because it’s at the heart of what it is to do research: if you cannot trust the scientific record, it’s a big problem. Not just for science, but for society at large,” Dr Fraisier-Vannier explains.

“We need more people flagging the problems so that the scientific record can remain trustworthy and society can rely on scientific research, trusting what comes out of labs, universities, and the like.”

Unfortunately, generative AI is also making fraud detection harder.

“There are several types of detection tools included in the PPS, but the main one revolves around the tortured phrases,” Dr Fraisier-Vannier explains.

“The problem is that ChatGPT is way too smart to generate tortured phrases. So, we know already that the tortured phrases detector is kind of outdated, because tortured phrases are not used anymore to generate articles.”

Generative AI, Dr Fraisier-Vannier explains, could just be a very efficient tool for paper mills to generate papers even more quickly. Before AI, the people behind paper mills had to copy and paste articles and avoid plagiarism detection by using synonyms liberally, which created tortured phrases. Now, that might not be needed anymore.

“Their workflow has been reduced by one step,” she says. “They just need to generate an entirely new paper.”

Thankfully, some human mistakes still make the undisclosed use of AI evident.

Papers containing leftovers from an LLM response are not uncommon. These include phrases such as “I don’t have access to real-time data” and, in some cases, text literally noting that the author is an AI language model.

Those papers are regularly flagged on social media, and they are also being collected by researchers. One list, based on a search strategy developed by Dr Cabanac, can be found on Retraction Watch. Another project is Academ-AI.

Some use of AI is, of course, non-malicious. While opinions vary on what should be disclosed and what shouldn’t, light editing and spell checkers are probably fine. Generating a whole paper using an LLM and not removing ‘as of my last knowledge update’, on the other hand, is a completely different story.
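
The same naive fingerprint matching sketched earlier applies to these leftovers; hypothetically, the screening list simply grows to include phrases like the ones quoted above:

    # Illustrative list of LLM leftover phrases; a real screening list would be longer.
    LLM_LEFTOVERS = [
        "as an ai language model",
        "as of my last knowledge update",
        "i don't have access to real-time data",
        "regenerate response",
    ]

    def has_llm_leftover(text: str) -> bool:
        """True if the text contains a known LLM leftover phrase."""
        lowered = text.lower()
        return any(phrase in lowered for phrase in LLM_LEFTOVERS)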

But papers that make their undisclosed use of AI this evident are, according to Dr Fraisier-Vannier, outliers; catching AI-generated papers that don’t contain those tell-tale sentences requires more time-consuming analyses.

One of these analyses, Dr Fraisier-Vannier explains, can be done on citations: references that have nothing to do with the subject matter, or ghost citations (made-up references), can flag an article as suspect.

“We will have to lean more on this kind of flag rather than on keywords,” she says.

“Keywords will still catch some outliers, people who forgot to delete a sentence. But I fear we will miss the majority otherwise.”
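
Reference checking is harder to automate than keyword matching, but the basic idea can be sketched. The snippet below is a minimal illustration, not a tool the sleuths actually use: it asks the public Crossref API whether each cited DOI resolves to a real record, and treats DOIs that return nothing as candidate ghost citations. Judging whether a genuine reference is actually relevant to the subject matter still takes a human, or a far more sophisticated analysis.

    # Minimal sketch: flag cited DOIs that Crossref has no record of (possible ghost citations).
    # Illustrative only: it ignores rate limits, references without DOIs, and registries other
    # than Crossref, so a flag is a prompt for human checking, not proof of fabrication.
    import requests

    def unresolved_dois(dois: list[str]) -> list[str]:
        suspicious = []
        for doi in dois:
            response = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
            if response.status_code == 404:  # Crossref has never seen this DOI
                suspicious.append(doi)
        return suspicious

    # The second DOI below is deliberately made up for the example.
    print(unresolved_dois(["10.1038/171737a0", "10.9999/made.up.reference"]))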

Cross-border collaboration among scientists on the issue of academic integrity is strong.

“We have an informal Slack account where we discuss these integrity issues,” Dr Fraisier-Vannier says. The account includes some very well-known people in the field of academic integrity.

“And the community of people working on these issues tries to work with other stakeholders. Universities need to be involved, and publishers. We try to organise and to have the most impact possible with all the stakeholders, but there is still a lot to do, definitely.”

Asked whether this type of investigation will become a research field in its own right, Dr Fraisier-Vannier says it’s a question circulating in the sleuthing community as well. But perhaps it will become more of a specialisation within each research field.

“For example, thinking about image alteration, it’s more present in biology than in computer science, so it’s all field dependent,” she says.

Digital Natives

A sleuthing mindset needs to be fostered in students, too.

For students, it’s paramount to develop the skills to engage with generative AI safely and critically, but the digital native myth may work against them.

The week before our interview, while attending a conference, Dr Fraisier-Vannier heard a colleague say something that confirmed what she already knew: that students may not be the natural AI experts we think they are.

“There was one researcher there who explained that she had students who used ChatGPT to know what the time was,” she recalls.

AI tools are here to stay, and they can be used smartly, she explains, but we can’t assume that students will instinctively know how to do that. When the previous generation was growing up and the internet was still quite new, students were explicitly taught skills such as how to use the internet and what sources not to rely on, she explains, but this is not as common anymore.

“I feel that after my generation, there was a decline in this kind of teaching, because people assume that since children are growing up using digital tools, they know how to use the internet. But they don’t,” she adds.

Digital natives, people who are naturally tech-savvy because they grew up using technology, are indeed mythical creatures.

“We need to train students on the tools, what they are good at, what they are bad at, what their limitations are, and to always check their sources – that’s the first rule, ever since students started researching on the internet. And I think it’s even more important with generative AI,” Dr Fraisier-Vannier says.

This is especially important in the current online environment.

Dr Vlada Rozova, Research Fellow in Applied Machine Learning at the Centre for Digital Transformation of Health at the University of Melbourne, Australia, recounts that sometimes she receives submissions for Master’s research projects where the whole essay, including references, is generated – and some of those references are fake.

LLM-generated search summaries, which have come under fire for sometimes “hallucinating” responses, are very tempting to use to save time.

But Dr Rozova says it’s hard to estimate how reliable the information in an LLM summary is, as it could rely on any number of sources, including AI-generated fake papers, and while it’s convenient to read the summary, it doesn’t help students practise their research skills.

“It’s tempting, because you don’t need to do this clicking and confirming information yourself, opening multiple tabs and synthesising it yourself, but this activity is really important, especially in research,” she says.

“This is what we are telling our PhD students: anything you produce needs to go through you, it needs to have this filter applied which is you.”

Eerily Quiet

While researchers hunt zombies, and lecturers try to train AI-savvy students, publishers are grappling with other issues.

Simon Linacre, Chief Commercial Officer at Cabells and author of The Predator Effect: Understanding the Past, Present and Future of Deceptive Journals, explains that things have gone quiet online for publishers.

Download statistics, a valuable metric for both publishers and authors even in the open-access model, have become less useful. Increasingly, the figures won’t reflect actual usage, because some of that usage will come via AI rather than through downloads, as it did previously.

If people use AI to do research, they may not read any of the original papers. With a lot of the open-access papers already hoovered up by AI models, there won’t be any hits recorded on a publisher’s website.

“A lot of publishers and libraries are identifying that the download metric is in decline. Now that’s going to cause a real problem, because the libraries then have less of a valuable metric to understand what the cost per download looks like. The publishers are worrying because their traffic is dropping,” Linacre says.

Automated attacks, or bad bot activity, are another threat facing both universities and publishers.

According to cybersecurity firm Imperva, part of the Thales Group, bad bots can damage the education sector by, for example, taking over student and faculty accounts, and scraping proprietary research and data.

Tim Ayling, Cybersecurity Specialist at Thales, tells QS Insights Magazine that generative AI makes the development of simple bots easier, enabling even people without deep technical knowledge to launch bot attacks.

“Academic publishers own a vast amount of valuable copyrighted content, some of which can be decades old. Particularly prestigious journals, research papers and even the reputations of the academic authors and journals can hold great value, which naturally makes them a target for bad actors,” he says.

“Thanks to vulnerabilities with legacy identity and access management tools on the websites of publishers and universities, as well as limited abilities to monitor and block automated activity, this content can be at particular risk.”

Content can also be scraped in bulk from academic journals, he says, which can lead to increased operational costs for publishers from the huge increase in content requests, and threaten their financial viability.

“The scraping of copyrighted data represents a financial threat to academic publishers and universities alike,” Ayling says.

“If that ingested data is then used to generate unoriginal academic content using AI tools, this activity also threatens to devalue high-quality academic research by skewing the evaluation of academic work, as well as the risk of misinformation.”

Underlying all of this is the fundamental question around research integrity, which now feels even more urgent.

In a 2024 article, Dr Jessamy Bagenal, Senior Executive Editor – Head of Clinical at The Lancet, asked: how can scientific publishers and journal editors assure themselves that the research they are seeing is real?

How To Survive A Zombie Outbreak

There is another question to ponder, one that may at first look irrelevant here: what do we do when there is a zombie outbreak?

The Dead Internet Theory posits that much of what we see online is the result of bot activity. To describe the situation on social media, 404 Media suggested another term – the zombie internet: a mix of automated accounts, human accounts and accounts that were once human but are not anymore.

In the social media sphere, explains Dr Jake Renzella, Senior Lecturer at the School of Computer Science and Engineering at the University of New South Wales, Australia, platforms such as Discord, which allow people to create smaller and more private communities, are on the rise.

“What do we do when there’s a zombie outbreak? We build walls, we fence ourselves off in a community,” he explains.

“Now, that might sound a bit negative, but I think what those smaller social media platforms are saying is that people want a more intimate connection with other people who they know are real.”

“A similar thinking could be applied to the research space. However, of course, open publication and open access to research are incredibly important.”

The zombie internet analogy, as covered in “The Zombie Scientific Archive”, fits the online undergrowth of unethical scientific publishing: a mix of AI-generated content, human fraud, and hard-to-kill fake science living on forever in AI models.

What the walls Dr Renzella describes could mean for research and publishing is a question still to be pondered.

Linacre notes that library publishing, with university libraries assuming greater control of research outputs, is a solution that has been proposed in response to previous challenges to the traditional research and publishing models.

“In the current climate, I can see that this argument can still hold weight, with universities taking advantage of low barriers of entry into publishing by mandating publications through their own platforms,” he explains.

In her 2024 article, Dr Bagenal proposed a series of solutions.

One part, she wrote, would be to find new ways to finance the open access model, such as new types of group deals that remove the focus from article processing charges, while reforming the academic rewards system to prioritise quality over quantity and diminish the link between publication and promotion. She also advocated for more robust editorial processes scrutinising studies for signs of data fabrication.

Her paper, most importantly, is a call for editors and publishers to tackle the challenges created by generative AI. She quoted a book by AI entrepreneur Mustafa Suleyman and writer Michael Bhaskar, The Coming Wave, which warns that humanity is not ready for the impact of new technologies and introduces the concept of “pessimism aversion” – the reluctance to confront difficult change.

For journal editors and scientific publishers today, she warned, pessimism aversion is a dangerous trap to fall into.

“All the signs about generative AI in scientific publishing suggest things are not going to be ok,” she wrote.