Research suggests that AI-generated junk science is a big problem on Google Scholar


Scientific research generated by artificial intelligence is polluting the online academic information ecosystem, according to a disturbing report published in the Harvard Kennedy School Misinformation Review.

A team of researchers investigated the prevalence of research articles with evidence of artificially generated text on Google Scholar, an academic search engine that makes it easy to search for research published historically across many academic journals.

The team specifically investigated the misuse of generative pre-trained transformers (or GPTs), a class of large language models (LLMs) that includes now-familiar software such as OpenAI’s ChatGPT. These models can rapidly interpret text inputs and generate responses in the form of numbers, images, and long strings of text.

During the study, the team analyzed a sample of scientific articles found on Google Scholar with indications of GPT use. The selected papers contained one or two common phrases used by conversational agents (generally, chatbots) built on LLMs. The researchers then investigated the extent to which these suspicious papers were distributed and hosted across the internet.
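The screening approach the study describes lends itself to a simple illustration: scan a document's text for the boilerplate disclaimers chatbots tend to emit. Below is a minimal Python sketch of that idea; the phrase list is an illustrative assumption, not the study's actual search strings.

```python
import re

# Illustrative examples of boilerplate that chatbots often emit when their
# raw output is pasted into a manuscript. These stand in for whatever exact
# strings the study searched for (an assumption for this sketch).
TELLTALE_PHRASES = [
    "as of my last knowledge update",
    "i don't have access to real-time data",
    "as an ai language model",
]

def flag_gpt_boilerplate(text: str) -> list[str]:
    """Return any telltale chatbot phrases found in a document's text."""
    normalized = re.sub(r"\s+", " ", text.lower())  # collapse line breaks and whitespace
    return [phrase for phrase in TELLTALE_PHRASES if phrase in normalized]

# Example: a paper that pasted raw chatbot output verbatim would be flagged.
sample = "As of my last knowledge update, no peer-reviewed trials have shown this."
print(flag_gpt_boilerplate(sample))  # ['as of my last knowledge update']
```

A heuristic like this only catches careless copy-and-paste; it gives a lower bound on the problem, since text cleaned of such phrases passes unnoticed.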

“When AI-generated research is distributed in search engines, the risk of what we call ‘evidence hacking’ increases significantly,” Björn Ekström, a researcher at the Swedish School of Library and Information Science and co-author of the paper, said in a University of Borås release. “This can have serious consequences, as false results can seep into society and perhaps into more and more domains.”

Google Scholar trawls the web and, in the team’s view, does not vet papers for scholarly affiliation or for the peer-review status of their authors’ output; the engine captures academic bycatch (student papers, reports, preprints, and more) alongside research that has faced higher-stakes scrutiny.

The team found that two-thirds of the papers they examined had been produced, at least in part, through undisclosed use of GPTs. Of these GPT-fabricated papers, 14.5% concerned health, 19.5% the environment, and 23% computing.

“Most of these GPT-produced papers were found in non-indexed journals and working papers, but some cases included studies published in mainstream scientific journals and conference proceedings,” the team said.

The researchers noted two main risks posed by this development. “First, the proliferation of fictitious ‘studies’ infiltrating all areas of the research infrastructure threatens to overwhelm the scientific communication system and threaten the integrity of the scientific record,” the group wrote. “A second risk is the increased likelihood that content that looks convincingly scientific is actually craftily generated by AI tools and optimized for retrieval by publicly available academic search engines, particularly Google Scholar.”

Because Google Scholar is not a curated academic database, it is easy for the public to use when searching for scientific literature. That’s good. Unfortunately, it is harder for members of the public to separate the wheat from the chaff when it comes to reputable journals; even the difference between a peer-reviewed study and a working paper can be confusing. Additionally, AI-generated text was found in some peer-reviewed works as well as in less-scrutinized writing, suggesting that GPT-fabricated work is muddying the waters of the entire online academic information system, not just the work that circulates outside most official channels.

“If we don’t believe that the research we read is authentic, we risk making decisions based on the wrong information,” study co-author Jutta Haider, also a researcher at the Swedish School of Library and Information Science, said in the same release. “But this is as much a matter of media and information literacy as it is a matter of scientific misconduct.”

In recent years, publishers have failed to screen out several scientific articles that were, in fact, complete nonsense. In 2021, Springer Nature was forced to retract more than 40 articles from the Arabian Journal of Geosciences; despite the journal’s title, the articles covered a variety of topics, including sports, air pollution, and pediatrics. Beyond being off-topic, the articles were so poorly written that they made no sense, and the sentences often lacked a coherent line of thought.

Artificial intelligence is exacerbating the issue. Last February, the publisher Frontiers was caught publishing a paper in its journal Frontiers in Cell and Developmental Biology that included images generated by the AI software Midjourney; specifically, wildly anatomically incorrect images of signaling pathways and rat genitalia. Frontiers retracted the paper a few days after publication.

AI models can be a boon to science; such systems can decipher fragile texts from the Roman Empire, uncover previously unknown Nazca Lines, and reveal hidden details in dinosaur fossils. But AI’s impact can be as positive or as negative as the person who wields it.

Peer-reviewed journals, and perhaps the hosts and search engines of academic writing more broadly, need safeguards to ensure that the technology works in the service of scientific discovery, not against it.


