Effects and challenges of GenAI in scientific research.
GenAI displays a growing range of capacities that scientists increasingly use. It is therefore affecting core features of the scientific endeavour: creativity, reliability, publishing and academia-industry relations.
In scientific research, as in any other field, creativity means producing something novel and relevant. Empirical studies and researchers’ own experiences illustrate that GenAI can influence scientific creativity either directly, through its own discoveries, or indirectly, by augmenting scientists’ creative capacities. Moreover, it can also affect collective creativity.
Creativity in scientific research requires mastery of existing knowledge and the cognitive skills to move beyond it (Simon, 2001). Current GenAI models excel at storing and combining vast amounts of information; they can identify patterns and generate plausible re-combinations (interpolations) of existing ideas using statistical techniques. However, according to most tests, they struggle to extrapolate new concepts beyond their training data. Even their capacity to combine “ideas” might operate on words rather than ideas, given their lack of genuine understanding (the “stochastic parrot” metaphor). Their capacity to reason can also be questioned: many tests show a limited capacity for logical inference (Shojaee et al., 2025), an operation necessary for articulating scientific ideas. As a result, GenAI’s capacity for achieving radical scientific breakthroughs remains limited, while its capacity for incremental novelty requires significant human involvement. In testimonies collected by Castelvecchi (2024), one researcher argues that current LLMs “cannot formulate novel and useful scientific directions beyond superficial combinations of buzzwords.” Yet there are striking cases of GenAI models demonstrating genuine creativity. For instance, the model ESM3 has designed fluorescent proteins radically different from naturally occurring ones, variants that nature might have taken hundreds of millions of years to evolve (Hayes, 2025). In an experiment in natural language processing, Claude 3.5 Sonnet outperformed human experts at generating interesting research questions, although its suggestions were often less feasible in practice (Si et al., 2024). In another example, an AI algorithm designed unconventional configurations for gravitational wave detectors, some of which could dramatically increase the sensitivity of these instruments (Krenn, Drori and Adhikari, 2025).
Such cases demonstrate that GenAI can produce surprising and valuable outputs, although they remain rare and highly domain-specific. One factor that can make GenAI creative is the immensity of the search space it navigates for potential solutions. AI models can identify and examine huge numbers of ideas, situated in extremely remote parts of the search space. Scientific discovery often involves chance: finding unexpected solutions by exploring many possibilities. Because AI models can test and combine so many ideas, they can stumble upon configurations that humans might never have considered. This combinatorial advantage is particularly effective in fields like chemistry, materials science and biology, where novelty often comes from exploring new combinations. The fluorescent proteins generated by ESM3 illustrate how “interpolation” within an immense search space, of which humans know only small parts, can still yield significantly novel outcomes. Several additional aspects are worth noting:
• Curiosity: Human creativity is driven by curiosity. Ongoing efforts aim to endow AI models with a form of “artificial curiosity” that encourages questioning established ideas, notably ideas the model has generated itself.
• Anomaly detection: Many discoveries begin with spotting inconsistencies between observations and expectations (predictions of a theoretical model). GenAI can excel at detecting simple anomalies in vast datasets, but lacks the deep reasoning and rich representation of the world required to detect complex theoretical contradictions.
• Hallucinations: Also known as “confabulations”, these are statements or references invented by the AI model, often plausible though wrong, typically resulting from the accidental merger of two or more source pieces of information. They are not an example of true creativity, as they lack relevance: they are disconnected from the original question and do not carry meaning.
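The anomaly-detection route mentioned above can be sketched in miniature: flag observations whose residuals against a model’s predictions deviate by more than a chosen number of standard deviations. This is a toy illustration of the general idea, not the method of any specific GenAI system; the function name and threshold are illustrative choices.

```python
# Toy sketch: flag observations that deviate strongly from a model's
# predictions. Illustrative only; real pipelines use far richer models.

def flag_anomalies(observed, predicted, z_threshold=3.0):
    """Return indices where the residual's z-score exceeds z_threshold."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    mean = sum(residuals) / n
    std = (sum((r - mean) ** 2 for r in residuals) / n) ** 0.5
    if std == 0:
        return []  # all residuals identical: nothing stands out
    return [i for i, r in enumerate(residuals) if abs(r - mean) / std > z_threshold]

# One clearly deviant observation among near-perfect predictions.
obs  = [1.0, 2.1, 2.9, 4.0, 50.0, 6.1]
pred = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(flag_anomalies(obs, pred, z_threshold=2.0))  # → [4]
```

Spotting such a simple outlier is exactly where statistical tooling shines; recognising that an anomaly contradicts a theory, rather than reflecting a measurement error, is the part that still requires human reasoning.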
Although GenAI cannot yet rival human creativity, it offers complementary strengths that can enhance human creative work. A well-designed division of labour between humans and AI might then result in greater creativity of a human-AI hybrid system. Researchers are experimenting with human-AI collaborations in various ways:
• Brainstorming: AI can stimulate new ideas by challenging human assumptions, despite its tendency to mirror users’ inputs uncritically. Anderson, Shah and Kreminski (2024) found that people who used an LLM generated more creative stories than those working alone.
• Productivity: AI can free a researcher’s time by taking over simpler tasks, so that they can focus on higher-level thinking.
• Balancing exploration and exploitation: GenAI’s strength in re-combining known ideas (exploitation) may allow humans to concentrate on genuine leaps into the unknown (exploration) (Gans, 2025).
• Search and screening (evolutionary computation): GenAI can scan vast combinatorial spaces and present promising options for human evaluation. For example, Si et al. (2024) showed that LLMs can generate research plans, but human judgment remains essential to assess feasibility.
• Flexible collaboration modes: Tools like Agent Laboratory (an AI-based research assistant) offer both autonomous and co-pilot modes, where researchers can choose when to intervene and guide AI-generated work (Schmidgall et al., 2025).
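The “search and screening” pattern above can be sketched as a minimal evolutionary loop: candidates are generated, scored by a screening function, and the most promising survive to seed the next round (in a real workflow the final screening step would be human evaluation). Everything here is a toy assumption: bit-string candidates, a simulated optimum, and the specific mutation rate.

```python
import random

# Toy evolutionary search over a combinatorial space. Candidates are bit
# strings; each generation keeps the highest-scoring ones and mutates them.
# Purely illustrative; not any specific GenAI pipeline.

random.seed(0)

TARGET = [1, 0, 1, 1, 0, 1, 0, 1]  # stand-in for an unknown optimum

def score(candidate):
    """Screening score: agreement with the (here simulated) optimum."""
    return sum(c == t for c, t in zip(candidate, TARGET))

def mutate(candidate, rate=0.2):
    """Flip each bit independently with the given probability."""
    return [1 - b if random.random() < rate else b for b in candidate]

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(20)]
for _ in range(50):
    population.sort(key=score, reverse=True)
    parents = population[:5]  # keep the most promising candidates
    population = parents + [mutate(random.choice(parents)) for _ in range(15)]

best = max(population, key=score)
print(best, score(best))
```

The loop illustrates the division of labour the text describes: the machine enumerates and pre-screens vast numbers of combinations cheaply, while the scoring criterion, and ultimately the judgment about which survivors are worth pursuing, comes from humans.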
These mixed-initiative approaches underline a key insight: current GenAI is best seen as an augmentation tool that complements, rather than replaces, human scientific creativity. Additionally, some researchers worry that reliance on AI might erode human creativity, since it takes over tasks that support it: if “writing is thinking” and humans leave the writing to machines, then they will also do less thinking. A recent study (Lee et al., 2025) suggests that GenAI can reduce the perceived effort of critical thinking, fostering over-reliance on AI and diminishing independent involvement in problem-solving. However, these findings are based on laboratory studies of one-off tasks and literary creation rather than real-world research contexts, so more robust evidence is needed.
Even if GenAI enhances the output of individual researchers who use it, its widespread use could have a negative effect on overall scientific production; in particular, it could reduce the diversity of scientific inquiry. A large-scale study found that scientists who adopt AI tools publish significantly more papers and receive more citations than others, but AI also narrows the range of topics being explored collectively, as researchers using AI tend to concentrate on similar topics (Evans et al., 2024). Rather than inspiring bold ventures into new fields, AI appears to accelerate work in established, data-rich areas, raising concerns about a “homogenisation effect.” Similar findings emerge from creative writing experiments. For instance, Doshi and Hauser (2024) found that while generative AI ideas made stories more creative and engaging, especially for less creative writers, they also made the stories more similar to each other. Anderson (2024) similarly observed that users produced less semantically distinct ideas with ChatGPT than with other collaborative support tools, and felt less responsible for the ideas they generated.

In sum, GenAI is unlikely to replace human creativity in science any time soon. The time has not come for an AI system itself to be awarded a Nobel Prize (Kitano, 2016), even though the prize awarded for AlphaFold’s development made a point about the creative potential of AI. GenAI might nonetheless reshape and boost human creativity in profound ways. It can extend the combinatorial power of researchers, accelerate repetitive tasks, and foster serendipitous (“by chance”) discoveries. At the same time, risks remain: over-reliance on AI could dampen human critical thinking, and collective creativity might suffer if the research community focuses too narrowly on questions where GenAI works best.
Veracity is another pillar of science. A scientific statement is accepted as such only after it has been validated according to procedures and rules socially vetted by the scientific community. GenAI has been challenged on this front, with a number of cases where it generated clearly inaccurate results.
Technical limitations
A principal problem is that AI models have a notion of “veracity” that is restricted to their training set, which is significantly narrower than that of the real world, especially for LLMs, which are essentially trained on texts whose own veracity is often dubious. A second problem is “hallucinations”, or confabulations, which consist of confusing and conflating two or more pieces of information from the training set, resulting in incorrect statements that are usually plausible though often nonsensical. This sometimes happens with references to the literature, where articles are simply invented by the AI (e.g. attributing a real title to a real author when the two are not connected and the reference itself does not exist). Confabulation is generally attributed to technical factors: 1) compression of the data, a process that sometimes generates accidental mixtures of data in the decompression stage; notably, the machine will mix the relevant story with a similar but irrelevant one that it has memorised, and will issue a response based partly on the irrelevant story; 2) lack of “metacognition”, the set of reasoning procedures that, in the human brain, cross-check thoughts before they are expressed, making incoherent ideas less likely. Confabulation is also favoured by the sycophantic bias of most LLMs, which are trained to answer queries even when they can only find responses with low probability.
Reproducibility is a pillar of scientific operations. To be accepted by the scientific community, results must (usually) be verifiable and reproducible by others. One condition for reproducibility is full disclosure of the methods and data that led to the conclusion, meaning transparency and accessibility. From this perspective, GenAI models do not meet scientific criteria. First, the most popular GenAI models are “black boxes”: neither their weights (the parameters that define a neural network) nor their training data are publicised. Disentangling the contribution of the data from the contribution of the model’s various components is therefore difficult in any scientific result coming from such a model. This stems from the very nature of neural networks: knowledge is distributed, hence difficult to localise. As GenAI models have a random component at their core, some results might not be robust. In addition, access to the training data can be limited by the proprietary nature of many GenAI models: one example is the “AI Structural Biology Consortium”, an ongoing follow-up to AlphaFold-3 that makes use of data owned by pharmaceutical companies, data that is and will remain secret (Callaway, 2025). Current solutions for access include open weights (e.g. Llama) and open source (including access to training data). The importance of openness was demonstrated by AlphaFold2: the disclosure of its code and data triggered a series of initiatives refining the tool (Saplakoglu, 2024). Openness is essential to the cumulative progress at the core of science. Two further issues might make AI-based research less reproducible.
The first is randomness in how models work: a model can generate different results for the same prompt for no substantive reason. The second is model drift: models are regularly updated, and asking the same question to the same model at two different moments in time might bring different responses due to a change in the model’s parameters. But AI might also bring transparency to scientific research: AI systems record all that they do and can report all of their operations and corresponding outputs more precisely than humans. This makes higher traceability in research activities possible, and thus easier reproducibility. This transparency might also help identify and publicise “negative research results” (e.g. drugs that did not pass clinical trials). Such results are important for follow-up research, indicating avenues to avoid and saving time and resources, but they are currently seldom publicised, as researchers are under-rewarded for them (no one has yet received a Nobel Prize for a negative result). AI itself would also greatly benefit from the availability of negative results, as they would significantly enrich the datasets used for training.
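The randomness issue can be made concrete with a toy token sampler: drawing from the same output distribution twice gives different answers across runs unless the random seed is pinned or the temperature is driven to zero (greedy decoding). This is a generic sketch of temperature sampling, not any vendor’s API; the logits and token names are invented for illustration.

```python
import math
import random

# Toy next-token sampler. The same "prompt" (here, fixed logits) can yield
# different outputs across runs unless the seed is pinned or temperature
# is set to 0 (greedy decoding). Generic sketch, no real API involved.

def sample_token(logits, temperature=1.0, rng=None):
    """Sample a token from softmax(logits / temperature); 0 means greedy."""
    rng = rng or random
    if temperature == 0:
        return max(logits, key=logits.get)  # deterministic: highest logit wins
    weights = {t: math.exp(l / temperature) for t, l in logits.items()}
    total = sum(weights.values())
    r = rng.random() * total
    for token, w in weights.items():
        r -= w
        if r <= 0:
            return token
    return token  # numerical edge case: return the last token

logits = {"protein": 2.0, "molecule": 1.5, "cell": 0.5}

# Pinned seed: the draw is reproducible across runs.
a = sample_token(logits, rng=random.Random(42))
b = sample_token(logits, rng=random.Random(42))
print(a == b)  # → True

# Temperature 0: deterministic regardless of any seed.
print(sample_token(logits, temperature=0))  # → "protein"
```

Note that this only addresses sampling randomness; model drift (silent updates to the weights behind the same model name) cannot be fixed from the caller’s side and requires providers to version and preserve model snapshots.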