Effects and challenges of GenAI in scientific research.
GenAI displays a growing range of capacities that scientists increasingly use. It is therefore affecting core features of the scientific endeavour: creativity, reliability, publishing and academia-industry relations.
In scientific research, as in any other field, creativity means producing something novel and relevant. Empirical studies and researchers’ own experiences illustrate that GenAI can influence scientific creativity either directly, through its own discoveries, or indirectly, by augmenting scientists’ creative capacities. Moreover, it can also affect collective creativity.
Creativity in scientific research requires mastery of existing knowledge and the cognitive skills to move beyond it (Simon, 2001). Current GenAI models excel at storing and combining vast amounts of information; they can identify patterns and generate plausible re-combinations (interpolations) of existing ideas using statistical techniques. However, according to most tests, they struggle to extrapolate new concepts beyond their training data. Even their capacity to combine “ideas” might operate on words rather than ideas, given their lack of genuine understanding (the “stochastic parrot” metaphor). Their capacity to reason can also be questioned: many tests show a limited capacity for logical inference (Shojaee et al., 2025), an operation necessary for articulating scientific ideas. As a result, GenAI’s capacity for achieving radical scientific breakthroughs remains limited, while its capacity for incremental novelty requires significant human involvement. In testimonies collected by Castelvecchi (2024), one researcher argues that current LLMs “cannot formulate novel and useful scientific directions beyond superficial combinations of buzzwords.” Yet there are striking cases of GenAI models demonstrating genuine creativity. For instance, the model ESM3 has designed fluorescent proteins radically different from naturally occurring ones, variants that nature might have taken hundreds of millions of years to evolve (Hayes, 2025). In an experiment in natural language processing, Claude 3.5 Sonnet outperformed human experts at generating interesting research questions, although its suggestions were often less feasible in practice (Si et al., 2024). In another example, an AI algorithm designed unconventional configurations for gravitational wave detectors, some of which could dramatically increase the sensitivity of these instruments (Krenn, Drori and Adhikari, 2025).
Such cases demonstrate that GenAI can produce surprising and valuable outputs, although they remain rare and highly domain-specific. One factor that can make GenAI creative is the immensity of the search space it navigates for potential solutions. AI models can identify and examine huge numbers of ideas, situated in extremely remote parts of the search space. Scientific discovery often involves chance: finding unexpected solutions by exploring many possibilities. Because AI models can test and combine so many ideas, they can stumble upon configurations that humans might never have considered. This combinatorial advantage is particularly effective in fields like chemistry, materials science and biology, where novelty often comes from exploring new combinations. The fluorescent proteins generated by ESM3 illustrate how “interpolation” within an immense search space, of which humans know only small parts, can still yield significantly novel outcomes. Several additional aspects are worth noting:
• Curiosity: Human creativity is driven by curiosity. Ongoing efforts aim to endow AI models with a form of “artificial curiosity” that encourages questioning established ideas, notably ideas the model has generated itself.
• Anomaly detection: Many discoveries begin with spotting inconsistencies between observations and expectations (predictions of a theoretical model). GenAI can excel at detecting simple anomalies in vast datasets, but lacks the deep reasoning and rich representation of the world required to detect complex theoretical contradictions.
• Hallucinations: Also known as “confabulations”, these are statements or references invented by the AI model, often plausible though wrong, typically resulting from the accidental merger of two or more source pieces of information. They are not an example of true creativity, as they lack relevance: they are disconnected from the original question and do not carry meaning.
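The anomaly-detection route mentioned above can be sketched in miniature: flag observations whose residuals against a model’s predictions deviate by more than a chosen number of standard deviations. This is a toy illustration of the general idea, not the method of any specific GenAI system; the function name and threshold are illustrative choices.

```python
# Toy sketch: flag observations that deviate strongly from a model's
# predictions. Illustrative only; real pipelines use far richer models.

def flag_anomalies(observed, predicted, z_threshold=3.0):
    """Return indices where the residual's z-score exceeds z_threshold."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    mean = sum(residuals) / n
    std = (sum((r - mean) ** 2 for r in residuals) / n) ** 0.5
    if std == 0:
        return []  # all residuals identical: nothing stands out
    return [i for i, r in enumerate(residuals) if abs(r - mean) / std > z_threshold]

# One clearly deviant observation among near-perfect predictions.
obs  = [1.0, 2.1, 2.9, 4.0, 50.0, 6.1]
pred = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(flag_anomalies(obs, pred, z_threshold=2.0))  # → [4]
```

Spotting such a simple outlier is exactly where statistical tooling shines; recognising that an anomaly contradicts a theory, rather than reflecting a measurement error, is the part that still requires human reasoning.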
Although GenAI cannot yet rival human creativity, it offers complementary strengths that can enhance human creative work. A well-designed division of labour between humans and AI might then result in greater creativity of a human-AI hybrid system. Researchers are experimenting with human-AI collaborations in various ways:
• Brainstorming: AI can stimulate new ideas by challenging human assumptions, despite its tendency to mirror users’ inputs uncritically. Anderson, Shah and Kreminski (2024) found that people who used an LLM generated more creative stories than those working alone.
• Productivity: AI can free a researcher’s time by taking over simpler tasks, so that they can focus on higher-level thinking.
• Balancing exploration and exploitation: GenAI’s strength in re-combining known ideas (exploitation) may allow humans to concentrate on genuine leaps into the unknown (exploration) (Gans, 2025).
• Search and screening (evolutionary computation): GenAI can scan vast combinatorial spaces and present promising options for human evaluation. For example, Si et al. (2024) showed that LLMs can generate research plans, but human judgment remains essential to assess feasibility.
• Flexible collaboration modes: Tools like Agent Laboratory (an AI-based research assistant) offer both autonomous and co-pilot modes, where researchers can choose when to intervene and guide AI-generated work (Schmidgall et al., 2025).
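The “search and screening” pattern above can be sketched as a minimal evolutionary loop: candidates are generated, scored by a screening function, and the most promising survive to seed the next round (in a real workflow the final screening step would be human evaluation). Everything here is a toy assumption: bit-string candidates, a simulated optimum, and the specific mutation rate.

```python
import random

# Toy evolutionary search over a combinatorial space. Candidates are bit
# strings; each generation keeps the highest-scoring ones and mutates them.
# Purely illustrative; not any specific GenAI pipeline.

random.seed(0)

TARGET = [1, 0, 1, 1, 0, 1, 0, 1]  # stand-in for an unknown optimum

def score(candidate):
    """Screening score: agreement with the (here simulated) optimum."""
    return sum(c == t for c, t in zip(candidate, TARGET))

def mutate(candidate, rate=0.2):
    """Flip each bit independently with the given probability."""
    return [1 - b if random.random() < rate else b for b in candidate]

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(20)]
for _ in range(50):
    population.sort(key=score, reverse=True)
    parents = population[:5]  # keep the most promising candidates
    population = parents + [mutate(random.choice(parents)) for _ in range(15)]

best = max(population, key=score)
print(best, score(best))
```

The loop illustrates the division of labour the text describes: the machine enumerates and pre-screens vast numbers of combinations cheaply, while the scoring criterion, and ultimately the judgment about which survivors are worth pursuing, comes from humans.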
These mixed-initiative approaches underline a key insight: current GenAI is best seen as an augmentation tool that complements, rather than replaces, human scientific creativity. Additionally, some researchers worry that reliance on AI might erode human creativity, since it takes over tasks that support it: if “writing is thinking” and humans leave the writing to machines, then they will also do less thinking. A recent study (Lee et al., 2025) suggests that GenAI can reduce the perceived effort of critical thinking, fostering over-reliance on AI and diminishing independent involvement in problem-solving. However, these findings are based on laboratory studies of one-off tasks and literary creation rather than real-world research contexts, so more robust evidence is needed.
Even if GenAI enhances the output of individual researchers who use it, its widespread use could have a negative effect on overall scientific production; in particular, it could reduce the diversity of scientific inquiry. A large-scale study found that scientists who adopt AI tools publish significantly more papers and receive more citations than others, but AI also narrows the range of topics being explored collectively, as researchers using AI tend to concentrate on similar topics (Evans et al., 2024). Rather than inspiring bold ventures into new fields, AI appears to accelerate work in established, data-rich areas, raising concerns about a “homogenisation effect.” Similar findings emerge from creative writing experiments. For instance, Doshi and Hauser (2024) found that while generative AI ideas made stories more creative and engaging, especially for less creative writers, they also made the stories more similar to each other. Anderson (2024) similarly observed that users produced less semantically distinct ideas with ChatGPT than with other collaborative support tools, and felt less responsible for the ideas they generated.

In sum, GenAI is unlikely to replace human creativity in science any time soon. The time has not come for an AI system itself to be awarded a Nobel Prize (Kitano, 2016), even though the prize awarded for AlphaFold’s development made a point about the creative potential of AI. GenAI might nonetheless reshape and boost human creativity in profound ways. It can extend the combinatorial power of researchers, accelerate repetitive tasks, and foster serendipitous (“by chance”) discoveries. At the same time, risks remain: over-reliance on AI could dampen human critical thinking, and collective creativity might suffer if the research community focuses too narrowly on questions where GenAI works best.
Veracity is another pillar of science. A scientific statement is accepted as such only after it has been validated according to procedures and rules socially vetted by the scientific community. GenAI has been challenged on this front, with a number of cases where it generated clearly inaccurate results.
Technical limitations
A principal problem is that AI models have a notion of “veracity” that is restricted to their training set, which is significantly narrower than that of the real world, especially for LLMs, which are essentially trained on texts whose own veracity is often dubious. A second problem is “hallucinations”, or confabulations, which consist of confusing and conflating two or more pieces of information from the training set, resulting in incorrect statements that are usually plausible though often nonsensical. This sometimes happens with references to the literature, where articles are simply invented by the AI (e.g. attributing a real title to a real author when the two are not connected and the reference itself does not exist). Confabulation is generally attributed to technical factors: 1) compression of the data, a process that sometimes generates accidental mixtures of data in the decompression stage; notably, the machine will mix the relevant story with a similar but irrelevant one that it has memorised, and will issue a response based partly on the irrelevant story; 2) lack of “metacognition”, the set of reasoning procedures that, in the human brain, cross-check thoughts before they are expressed, making incoherent ideas less likely. Confabulation is also favoured by the sycophantic bias of most LLMs, which are trained to answer queries even when they can only find responses with low probability.
Reproducibility is a pillar of scientific operations. To be accepted by the scientific community, results must (usually) be verifiable and reproducible by others. One condition for reproducibility is full disclosure of the methods and data that led to the conclusion, meaning transparency and accessibility. From this perspective, GenAI models do not meet scientific criteria. First, the most popular GenAI models are “black boxes”: neither their weights (the parameters that define a neural network) nor their training data are publicised. Disentangling the contribution of the data from the contribution of the model’s various components is therefore difficult in any scientific result coming from such a model. This stems from the very nature of neural networks: knowledge is distributed, hence difficult to localise. As GenAI models have a random component at their core, some results might not be robust. In addition, access to the training data can be limited by the proprietary nature of many GenAI models: one example is the “AI Structural Biology Consortium”, an ongoing follow-up to AlphaFold-3 that makes use of data owned by pharmaceutical companies, data that is and will remain secret (Callaway, 2025). Current solutions for access include open weights (e.g. Llama) and open source (including access to training data). The importance of openness was demonstrated by AlphaFold2: the disclosure of its code and data triggered a series of initiatives refining the tool (Saplakoglu, 2024). Openness is essential to the cumulative progress at the core of science. Two further issues might make AI-based research less reproducible.
The first is randomness in how models work: a model can generate different results for the same prompt for no substantive reason. The second is model drift: models are regularly updated, and asking the same question to the same model at two different moments in time might bring different responses due to a change in the model’s parameters. But AI might also bring transparency to scientific research: AI systems record all that they do and can report all of their operations and corresponding outputs more precisely than humans. This makes higher traceability in research activities possible, and thus easier reproducibility. This transparency might also help identify and publicise “negative research results” (e.g. drugs that did not pass clinical trials). Such results are important for follow-up research, indicating avenues to avoid and saving time and resources, but they are currently seldom publicised, as researchers are under-rewarded for them (no one has yet received a Nobel Prize for a negative result). AI itself would also greatly benefit from the availability of negative results, as they would significantly enrich the datasets used for training.
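The randomness issue can be made concrete with a toy token sampler: drawing from the same output distribution twice gives different answers across runs unless the random seed is pinned or the temperature is driven to zero (greedy decoding). This is a generic sketch of temperature sampling, not any vendor’s API; the logits and token names are invented for illustration.

```python
import math
import random

# Toy next-token sampler. The same "prompt" (here, fixed logits) can yield
# different outputs across runs unless the seed is pinned or temperature
# is set to 0 (greedy decoding). Generic sketch, no real API involved.

def sample_token(logits, temperature=1.0, rng=None):
    """Sample a token from softmax(logits / temperature); 0 means greedy."""
    rng = rng or random
    if temperature == 0:
        return max(logits, key=logits.get)  # deterministic: highest logit wins
    weights = {t: math.exp(l / temperature) for t, l in logits.items()}
    total = sum(weights.values())
    r = rng.random() * total
    for token, w in weights.items():
        r -= w
        if r <= 0:
            return token
    return token  # numerical edge case: return the last token

logits = {"protein": 2.0, "molecule": 1.5, "cell": 0.5}

# Pinned seed: the draw is reproducible across runs.
a = sample_token(logits, rng=random.Random(42))
b = sample_token(logits, rng=random.Random(42))
print(a == b)  # → True

# Temperature 0: deterministic regardless of any seed.
print(sample_token(logits, temperature=0))  # → "protein"
```

Note that this only addresses sampling randomness; model drift (silent updates to the weights behind the same model name) cannot be fixed from the caller’s side and requires providers to version and preserve model snapshots.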