Exploring effective uses of generative artificial intelligence in education: An overview.

OECD Digital Education Outlook 2026.

This section examines generative AI (GenAI), a transformative technology that brought artificial intelligence into the public spotlight, including for students and education policymakers, following the launch of OpenAI’s ChatGPT in 2022. Unlike earlier educational AI systems, GenAI is available to and used by students outside of educational institutions, with or without the blessing of teachers, school leaders and policymakers. This presents both significant opportunities and complex challenges for education. After clarifying what is meant by GenAI, this chapter gives an overview of the uptake of GenAI among OECD populations, including students and teachers. It then provides a summary of the knowledge and information in this OECD Digital Education Outlook 2026: research evidence on the effects of GenAI on student learning, examples of what educational GenAI could look like, and possible uses to improve workflows at the institution and system levels.



GenAI is a subset of AI focused on producing new content such as text, pictures, videos, songs, mathematical equations or computer programmes, typically in response to a question or command (“prompt”).1 These outputs are generated based on large volumes of training data. To do this, GenAI relies on advanced machine-learning techniques, such as neural networks based on transformers (notably the Generative Pre-trained Transformer (GPT) architecture), embeddings and tokens. Most people have experienced GenAI via chatbots based on large language models (LLMs) such as OpenAI’s ChatGPT, Google’s Gemini, Microsoft’s Copilot, Anthropic’s Claude, Mistral’s LeChat or Deepseek’s Deepseek. In contrast, non-generative AI systems mainly produce predictions, classifications, recommendations and ratings, for example for movies, books or other products and services. While they may use similar techniques to GenAI, their primary goal is to identify patterns and relationships in vast amounts of data, rather than to create new content. Those AI systems are sometimes referred to as “rule-based”, “predictive” or “good old-fashioned” AI. Despite often being less visible to end users, these systems are still powerful and have a variety of uses, including in education. They are embedded in assistive technologies, for example for students with special needs, and used to adapt learning to personal needs within intelligent tutoring systems, to score assessments or to predict whether students are at risk of dropping out. An important distinction should be made between AI tools that are general-purpose and those that are specialised (in our case, mainly educational): general-purpose systems are versatile and designed to serve many purposes, including educational ones, whereas specialised educational tools are designed for educational purposes only (see Table 1.1).





General-purpose GenAI tools often provide pertinent and contextualised answers to questions, with the ability to clarify and ask follow-up questions. These capabilities were not possible with earlier (non-generative) natural language processing. They are trained on massive datasets that exceed what humans could retrieve manually. Moreover, they are flexible and can be applied to many different subjects. Unlike most educational AI, general-purpose GenAI tools usually offer free versions, enabling students and teachers to use them even if they are not provided by universities or schools, assuming they have an adequate device and connectivity. Small language models can even run offline, albeit with lower performance. A series of well-known shortcomings are also specific to current GenAI systems and inherent to the technology. Because they are based on probabilistic models, they can “hallucinate”, that is, produce a plausible but wrong answer or fabricate details of an output. They do not generate consistent results over time: repeating the same task several times will yield (at least slightly) different answers or productions, which is sometimes a problem. This is due to regular system updates and to their probabilistic nature. As they are trained on available datasets, their answers and other productions tend to reproduce the views and perspectives represented in those datasets, which are overwhelmingly based on English-speaking (and Western) cultures. For example, unless prompted otherwise, they will typically use Western names or examples in their productions. In addition, despite appearing intelligent, GenAI tools do not “understand” the input they process or the content they generate. As a result, their outputs typically require human supervision and scrutiny, often more than specialised, non-generative AI systems. While beyond the scope of this report, GenAI also comes with a series of societal challenges.
Many observers are concerned by its environmental footprint, though this is still difficult to measure and compare with that of other digital technologies. The dissemination of AI-generated information and data may decrease the quality of future generated content (as it enters training datasets) and amplify some current limitations of our knowledge. This will make critical thinking and the development of metacognitive and higher-order thinking skills even more important than before. The full impact of how GenAI might transform societies, labour markets and economies is still emerging.



Most people experience GenAI through chatbots based on large language models (LLMs), such as OpenAI’s ChatGPT, Google’s Gemini, Microsoft Copilot, Anthropic’s Claude, Mistral’s LeChat, and Deepseek’s Deepseek-R1. As of April 2025, based on website traffic data, chatbots dominated public use of GenAI tools, accounting for 95% of monthly traffic to the top 60 GenAI platforms. ChatGPT alone represented about 78% of the monthly visits, down from 89% in April 2023. Image-generating tools accounted for 2.4% of GenAI websites’ traffic, video and audio tools for 1.9%, and productivity and business tools for less than 0.5%. While these shares remain small, the use of these systems has grown significantly since 2023, in line with the overall growth of GenAI use. Competition is also mounting across platforms, with newcomers such as Deepseek and Perplexity gaining market share since 2023. Liu, Huang and Wang (2025) show that the use of GenAI tools has both expanded and intensified. For example, between 2024 and 2025, the number of unique users of ChatGPT grew by 42%, visits per user increased by 50% and the average session duration roughly doubled from 7 to 15 minutes, resulting in a more than doubling of its traffic (113% growth). Most of this growth has been driven by users in high-income countries. In 2025, they accounted for 60% of GenAI use (compared to 55% in 2024), against 39% for middle-income countries and less than 1% for low-income countries. This reflects strong uptake in OECD members as well as accession and key partner countries such as Brazil, China and India. However, it also points toward a widening digital divide based on an adoption and use gap. Part of this gap might be due to measurement issues, as users in low-connectivity regions may be unable to access platforms via the Internet and instead use versions running offline on their devices.
Figure 1.1 presents the share of Internet users that used ChatGPT in 2025 and 2024 and thus provides an estimate of the uptake of GenAI tools across populations, acknowledging that averages mask higher usage among younger generations.

 
The share of Internet users accessing ChatGPT has increased in OECD, accession and key partner countries

ChatGPT is not generally accessible in China. While ChatGPT remained by far the largest GenAI chatbot service, local alternative chatbots tend to be more popular in their countries/regions of origin. The figure highlights the growth of the use of GenAI chatbots in almost all countries.

While there is currently no authoritative comparative data on the use of GenAI by students at different levels of education, several domestic and international surveys provide an initial picture of how widely students use these tools and for what educational purposes. In Switzerland, a 2024 statistically representative survey of 8-18 year-old students points to a steep difference in use depending on age. Around 8% of primary students stated they used GenAI tools at least once a week, 30% in lower secondary, about half in general upper secondary education, and 40% in vocational education. Use in the home followed a similar age pattern (roughly 9%, 33% and 54%, respectively) (Oggenfuss and Wolter, 2024[5]). Including uses less frequent than at least once a week, about 70% of Swiss general upper secondary students use GenAI, and other Swiss pupils use it with a similar age/school pattern as intensive users.





In Estonia, a national survey of about 16 000 students found that 74% of lower secondary students and 90% of upper secondary students reported using AI tools to support their studies in 2024, with ChatGPT by far the dominant tool (used by 70% of students). Beyond national case studies, a cross-country European survey of more than 7 000 12-17 year-olds across seven countries (Germany, Greece, Portugal, Romania, Spain, Türkiye and the United Kingdom) found high use of GenAI by students. For example, 48% declared having used ChatGPT in 2024, with almost half of them instructed to do so by their teachers. The use of GenAI by higher education students seems to align with the age pattern mentioned above, although statistically representative surveys providing information on this are not yet available. Still, a few studies have surveyed a large number of higher education students (and reweighted their answers to make them more representative). In France, a 2023 study of about 4 500 students reported that 55% of higher education students used GenAI tools. In 2025, the share had increased to 82% (Pascal et al., 2025[9]). In Germany, a survey of over 23 000 higher education students found 94% used AI in 2025, including 65% daily or weekly (Hüsch, Horstmann and Breiter, 2025[10]). A 2024 international survey of 3 000 higher education students in 16 countries also found that 86% used AI in their studies, including 54% daily or weekly. Evidence suggests that student use of GenAI has moved rapidly from marginal to mainstream since 2022. This is illustrated by the trends among US upper secondary students, the United States being one of the few places where several surveys were conducted over time. Surveys conducted in 2023 already indicated widespread exposure to GenAI, with around 25-33% of secondary students reporting having used GenAI for schoolwork.
In 2024, comparable surveys suggest a marked acceleration, with close to 50% of middle and high school students reporting some use of AI tools, particularly for homework support, idea generation and explanations of difficult concepts. In 2025, about 68% of teenagers aged 15-17 reported using AI chatbots such as ChatGPT. The above-mentioned increase in GenAI engagement between 2024 and 2025 is also likely driven by younger age groups. In 2024, compared to general Internet users, younger and more educated people drove a substantial share of traffic to these tools, signalling early and concentrated use among teenagers and young adults. There is no reason to believe that their contribution to this share has decreased, and it is possible or even likely that the early experimenters of 2024 transitioned to routine users in 2025. In short, students do use GenAI: to a small extent in primary education, a moderate share in lower secondary education, and a majority seem to use it regularly in upper secondary and higher education. While student uptake of GenAI varies by country, the overall trends suggest student use is broadly growing across OECD countries.



Many students are clearly turning to GenAI tools for academic purposes. However, their primary motivations often centre on convenience and efficiency rather than deeper learning. When asked why they use GenAI, according to a number of studies, students typically responded that they wanted “cognitive support”, such as information, explanations and summaries, or “production support”, such as idea generation, drafting and, perhaps more problematically, solution generation. In Estonia, for instance, grade 6-12 students most often reported using GenAI to achieve better scores, make educational tasks easier and save time, motivations oriented towards efficiency rather than learning itself. Common uses include answering homework questions and generating ideas. Lower secondary students more often reported fact-checking, while upper secondary students tended to report summarising specific topics and creating visuals for presentations. In most of these cases, the primary motivation was efficiency and convenience rather than deeper learning. Similarly, in the seven-country European survey mentioned earlier, the most common out-of-school, non-instructed learning uses are obtaining information (56%) and getting explanations of terms and concepts (45%). Nearly one-third (31%) report using AI to provide complete solutions to tasks, while fewer (20%) use it for self-regulatory functions such as structuring personalised learning plans or tracking progress (Figure 1.2). These patterns align with findings from in-depth qualitative interviews with Dutch pupils.



In higher education, students seem mainly to use GenAI tools to search for information, as well as for linguistic tasks such as editing, summarising and paraphrasing, and to a lesser extent drafting. Hüsch, Horstmann and Breiter (2025) provide the most detailed categories of use and show a largely similar picture, with students primarily using GenAI for general search, idea generation and literature research on the “cognitive” side, and for summarising and drafting on the “production” side (with about 22% of regular users) (see Figure 1.3). Interestingly, about 33% of students use GenAI as a “learning partner”. Taken together, available evidence suggests that a growing number of students use GenAI for general searches, comprehension and drafting, including as a shortcut to complete tasks and homework. The uses do not seem to be very different between higher education and upper secondary education and tend to reflect the study expectations for students at these different levels.


Results from the CHE University Ranking's Student Survey 2025


While TALIS 2024 does not ask teachers whether they used AI in general or GenAI specifically, the tasks that teachers report suggest that most uses involve GenAI tools. Teachers primarily use AI for preparation and productivity tasks: on average, 68% report using it to efficiently learn about and summarise topics they teach, and 64% use it to generate lesson plans. Among AI users, on average 25% report using it to review data on student participation or performance, and 26% use it to assess or grade student work (Figure 1.4). In addition, 40% of teachers, on average, “agree” or “strongly agree” that AI helps them support students individually. Around 50% agree that AI assists in creating or improving lesson plans, though agreement ranges from as low as 18% in France to as high as 91% in Viet Nam. Seven in ten teachers, on average, believe AI could enable students to misrepresent others’ work as their own. Around four in ten teachers agree that AI may amplify biases, reinforce student misconceptions, or compromise data privacy and security. Teachers who have not used AI report feeling overwhelmed by the growing expectation to integrate digital tools in education, which they see as a barrier to using AI in their teaching. The share of such teachers varies markedly across systems, from fewer than 20% in Brazil, Chile, Costa Rica, Italy, Morocco, Türkiye and the United Arab Emirates, to over 50% in Croatia, the Flemish Community of Belgium, Japan and Serbia. On average, three in four teachers report that they lack the knowledge or skills to teach using AI. About half of these teachers also believe that AI should not be used in teaching. In terms of school policy, one in ten teachers reported that their school prohibits the use of AI in teaching.

Percentage of lower secondary teachers who agree with the following statements


What do we know about the use of GenAI tools by teachers in other levels of education? 

TALIS 2024 allows for comparisons of teachers’ use of AI at other levels in a limited number of participating countries. In these education systems, primary teachers are generally less likely to report using AI than their lower secondary counterparts, with particularly large gaps observed in Australia and the Flemish Community of Belgium. Teachers who use AI in primary education are often more likely to apply it to targeted pedagogical practices, notably to support students with special education needs and to adapt the difficulty of lesson materials to students’ learning needs, with especially large differences observed in France and the Netherlands. This may suggest that they use assistive AI technologies (for students with special needs) and adaptive learning systems, which may or may not involve GenAI. Domestic surveys and country studies largely confirm the picture drawn by TALIS in terms of GenAI uses. For example, in Estonia, a survey of about 4 000 teachers found that 53% reported using AI tools in their work, with higher reported use among primary and lower secondary school teachers (66%) than among upper secondary teachers (50%). Teachers who use GenAI mainly do so to increase efficiency in their work, such as preparing materials, supporting planning or streamlining routine tasks, rather than for deeply transformative or individualised pedagogical practices. The strongest predictors of use are teachers’ self-reported readiness, including confidence, access to tools and institutional support, and their belief that GenAI provides clear practical benefits for teaching. By contrast, non-use is largely explained by insufficient skills, lack of confidence, limited training opportunities or uncertainty about pedagogical value.
Age shows a small positive association with use, while years of teaching experience do not matter once readiness and perceived usefulness are taken into account, suggesting that GenAI adoption in Estonia is driven less by seniority than by capacity-building and perceived relevance to day-to-day teaching.



Studies in Australia (AHISA, 2023[18]; Collie and Martin, 2025[19]), Italy (INDIRE, 2025[20]), Slovenia (Licardo et al., 2025[21]) and the United States (Diliberti et al., 2024[22]) report similar types of use, mainly for preparation work (generating worksheets, lesson plans and activities). They also point to differences in use across subjects: for example, in the United States, English and “social studies” teachers were more likely to use these tools, possibly because they are more likely to design or adapt their lessons compared to teachers of some other subjects (Diliberti et al., 2024[22]). The uptake of GenAI is likely much greater in higher education, given the widespread use of GenAI tools for research (Guellec and Vincent-Lancrin, 2026). In France, a survey of 30 000 higher education students, teachers and academic staff found that 80% of higher education teachers had already used GenAI tools in 2025, mainly to help them draft and prepare their courses (49%) or draft student evaluations (26%), and more rarely to support them in correcting (13%) or marking (8%) student work (Pascal et al., 2025[9]). An international study of 1 700 teachers in 52 higher education institutions reported that 68% used AI in general. Among those, 75% used GenAI to create teaching materials and 24% to generate feedback on student work. Interestingly, both surveys suggest that higher education teachers rarely use GenAI as part of their actual teaching or request their students to use it. Unfortunately, data on the frequency of use remains limited. In sum, teachers’ use of GenAI varies by country and the education level they teach. While the use of GenAI seems more prominent in lower secondary than in primary education in TALIS 2024 data, research from Estonia shows that primary teachers there use these tools more than secondary teachers. Teachers seem to use these tools to the same or to a lesser extent than their students, but not more.
Again in Estonia, while 50% of upper secondary teachers used GenAI tools in 2025, 90% of their students did; in the United States, the only study surveying students and teachers at the same time found similar levels of regular use (with slightly higher levels of use overall for undergraduate students) (Impact Research, 2024). In higher education, the share of regular users seems similar among teachers and students. Whatever the education level, teachers report mainly using GenAI tools for the same reasons: assisting them with the generation of lesson plans, pedagogical activities and teaching/learning materials, and sometimes with the provision of feedback to students.



Given the widespread use of GenAI tools by students, including outside school settings and without teacher guidance, and to a lesser extent by teachers themselves, important questions for the education sector include: how does the use of GenAI affect learning? And how can GenAI tools be used to enhance learning? The first part of the OECD Digital Education Outlook 2026 addresses these questions, which are further explored throughout this section.



Gasevic and Yan provide an overview of the emerging research on teaching and learning with GenAI and highlight that, depending on their use, GenAI tools can undermine or enhance learning. In some cases, the use of GenAI can be deceptive. For example, GenAI systems may enhance the apparent quality of student work (that is, their performance at educational tasks) without improving their actual learning (their knowledge and skill acquisition), a paradox illustrated in several studies. A study of 1 000 high school students in mathematics (grades 9, 10 and 11) in Türkiye examined three practice conditions across six 90-minute sessions: 1) with their course notes and textbook (as usual); 2) with a general-purpose LLM chatbot (“GPT base”); 3) with an educational LLM chatbot (“GPT tutor”). Educational LLMs are configured (or fine-tuned) to avoid providing direct answers and to support learning (though there is no indication in this case that it was an adaptive learning tool). This randomised controlled trial analysed the results of the students during practice and found that the percentage of correct answers to the exercises was much higher for the students using GenAI tools than for those working by themselves, with a much higher performance for those using the educational chatbot. However, when their knowledge was assessed in a closed-book environment, the performance gains vanished: students who used the general-purpose GPT scored lower than those studying on their own (Figure 1.5). The students who used the educational chatbot performed about as well as their self-study peers. While students may have enhanced their GenAI skills, one would expect effective digital learning tools to enhance learning, not merely practice performance, which hints at the possible under-performance of self-declared “educational” GenAI tools.

Successfully performing a task with GenAI does not automatically lead to learning

Other studies show similar results in different circumstances, although with less statistical power. Their purpose is more to explain why enhanced performance when using general-purpose tools does not necessarily lead to learning. Two studies analysed the metacognitive processes of Chinese higher education students using general-purpose LLMs to revise an essay they had written in English without using a GenAI tool. In a first randomised controlled trial, students were assigned to revise their essay in four different ways: 1) alone; 2) with human expert advice; 3) using a checklist (and digital tools); 4) with a general-purpose LLM. The study found no statistically significant difference in motivation among the conditions, although the three groups with external support had slightly increased motivation. In terms of task performance, the group using the GenAI tool scored highest, but knowledge gains measured by a knowledge test did not improve. In terms of metacognitive processes or self-regulated learning, the group using GenAI performed fewer metacognitive tasks, especially evaluation and orientation. Another study compared students who revised essays by asking for human advice with those who used a general-purpose LLM. It found that those who interacted with human experts sought help in a linear way, following the models of “help-seeking” theory: diagnosing what they need help with, asking for help, evaluating the help received, iterating, and then implementing the final help. When interacting with a general-purpose chatbot, some students tended to ask directly for help and immediately implement the solution received, often skipping the diagnosis, evaluation and iteration stages. The authors refer to this as “metacognitive laziness”, a way of describing “cognitive offloading”. Another study is worth mentioning as it casts neuroscientific light on the above findings.
In the United States, students from five universities were asked to write a 20-minute essay under three working conditions: writing by themselves (“brain-only”), with a search engine, or with a general-purpose GenAI tool (ChatGPT). Afterwards, within one hour of the actual essay writing, only 12% of the LLM group could quote something from their essay (exact recall), as opposed to 89% in the two other groups. Even though the essays of the LLM group were well rated, the LLM group also had a lower ability to summarise their essay’s viewpoint, a lower level of ownership, and more similar content across essays. Brain imagery suggested a shift in their executive tasks from generating content to supervising the AI-generated content, with lower neural connectivity and involvement. The study also showed that writing alone first and then using the LLM preserved higher levels of activation and recall. In contrast, those who started with an LLM and then continued alone had low levels of activation and recall. These findings suggest that initial cognitive activation is crucial before using GenAI. This evidence highlights that a proportion of students using general-purpose LLMs may take shortcuts, avoiding the productive struggle and cognitive effort needed for learning and for durable knowledge and skill acquisition. This aligns with a “fast” rather than a more productive “slow” and iterative use of GenAI, as argued by Beghetto (2026[30]), and with the notion of “lazy” use of GenAI.


While evidence suggests GenAI tools sometimes enhance student performance at the cost of lasting skill and knowledge development, that does not mean positive outcomes are impossible. In fact, several chapters of this Outlook highlight some promising results too. Two types of uses of educational GenAI should be considered. Some LLM-based GenAI tools are repurposed for education. They have either been “fine-tuned”, that is to say, partly retrained based on education-relevant data, or “configured” through a series of instructions and prompts on how they should answer user requests. For example, in the United States, a Stanford-developed GenAI tool called “Tutor CoPilot” fine-tuned GPT-4 based on ethnographic observations of good teachers providing feedback. The tool was integrated in an online tutoring platform to assist 900 tutors (rather than teachers) in real time as they worked with 1 800 underserved pupils. The intervention raised student pass rates by 4% on average, with the largest gains among less experienced tutors (9%) and those previously rated as lower quality (7%). The use of the tool made less difference for the more effective and experienced tutors. Given the effectiveness of tutoring as a learning strategy, this robust study shows promise for GenAI tools to support tutors, and perhaps also less experienced teachers. Other promising uses are also being developed to make teaching more effective.

Results from a randomised controlled trial taking place over two weeks in a higher education introductory physics course, United States, 2023

Another randomised controlled trial in the United States compared learning in an undergraduate physics course at Harvard University delivered through in-person “active learning” versus an AI tutor based on GenAI configured to implement “active learning” pedagogical principles online. The idea was to have the same pedagogical practices in both conditions; only the delivery differed. The study found that students learnt significantly more in less time when using the AI tutor, and also felt more engaged and motivated (Figure 1.6). In China, a GenAI tool configured to implement problem-based learning in reading led to enhanced reading performance and motivation compared to a conventional, less personalised approach. Strauß and Rummel reviewed different studies that use LLMs to support collaborative learning. They noted that most studies do not use a general-purpose LLM out of the box but have instead usually configured the model to control its behaviour and assign it a specific role, following previously identified good practices in research on computer-assisted collaborative learning. While evidence is limited and often lacks statistical power, these studies show that GenAI can enhance learning when tools have been configured to be educational and grounded in expert teaching practice and learning science.



A key question is whether we should move exclusively to educational GenAI tools that have been designed from the outset to support learning. While these tools are in principle more likely to support learning than general-purpose models, a pedagogical use of general-purpose GenAI tools can also develop students’ knowledge and skills. In addition, the use of general-purpose GenAI tools will foster students’ AI literacy in a broader way, allowing them to engage with some of the tools they will have to use in the labour market. For example, in the United Kingdom, a study on creativity provides some lessons that could be applied to education. The study measured how the use of a general-purpose LLM can enhance people’s creative output as well as the quality of their writing (communication). Participants were asked to write an original eight-sentence creative story and were assigned to three groups: 1) writing independently, with no GenAI support; 2) able to ask for one GenAI idea (the GenAI was configured to provide a three-sentence idea); 3) able to ask for five GenAI ideas (same principle repeated). The groups that could brainstorm with GenAI outperformed those working alone both in terms of creativity and in terms of quality of the writing, with those receiving more ideas performing best (Figure 1.7). Importantly, compared to the studies presented before, in this case all participants wrote the story without using GenAI, so the benefits did not stem from cognitive (and production) offloading. However, as in the “brain imagery” study mentioned earlier, the stories written by the GenAI-assisted groups were more similar, pointing to a possible negative side effect and a drop in collective creativity.
A randomised controlled trial in nine secondary schools in Nigeria provided access to Microsoft Copilot based on GPT-4 in an after-school programme while the control group did not have access to it. The students in the treatment group worked in pairs and received teacher instructions on how to use Copilot, including prompts. While this was a general-purpose LLM, the students were pedagogically guided on how to use it effectively and intentionally placed in a particular pedagogical scenario (peer learning). The study found a positive, medium-sized impact on learning. In Indonesia, Darmawansah et al. compared the effects of using ChatGPT rather than search engines as a support tool to prepare and deliver argumentative speeches in English (as a foreign language). Far from being left alone, the ChatGPT group were provided with some initial training on prompting, and looked for initial information with ChatGPT. In a second phase, both groups were provided with a collaboration script guiding their collaborative learning. In a third phase, groups produced an argument map, with the help of ChatGPT for that group, before performing their argumentative speeches. The GenAI group outperformed the “search engine” group in learning gains on argumentation, and in self-reported levels of “critical thinking awareness” and “collaborative tendency”. The GenAI group provided more “backing” (factual information) for its arguments and benefited from the “rebuttals” provided by the GenAI, with results dependent on the quality of student prompting and on their English proficiency. A study in Vietnam in which students used ChatGPT and CharacterAI in a collaborative learning setting in English as a foreign language reported similar findings. Other studies show that the use of general-purpose LLMs in a pedagogical manner can improve students’ learning outcomes.
For example, studies indicate better knowledge acquisition when using an LLM as a teachable agent in computer science (as opposed to studying alone).

Figure 1.7. Comparison of creative writing output between participants with no GenAI support and those receiving one or five GenAI ideas


GenAI holds strong promise for enabling the rapid and scalable generation of feedback to students. Good formative assessment relies on frequent, timely, targeted, and individualised feedback on student work. Given class sizes, teachers cannot always provide detailed, personalised feedback to all students, which makes AI-generated feedback a plausible driver of learning outcomes. Gašević and Yan (2026[25]) review the research literature on GenAI-generated feedback and argue that GenAI can support teachers in giving better feedback, although it cannot replace human feedback. The research literature comparing feedback generated by LLMs, usually configured based on marking rubrics and examples of good answers, finds that AI-generated feedback matches human feedback in quality, while acknowledging the shortcomings of human feedback. Heinrich et al. find little difference in the grading of short answers in political science. Chevalier, Orzech and Stankov found that students who received feedback based on GPT-4 had similar learning gains to those who always received feedback from human instructors, as is also shown by a meta-analysis of AI feedback covering all forms of AI. Dai et al. compared the quality of human and GenAI-generated feedback on task (correctness), process (learning strategies), self-regulation (monitoring learning), and self (personal traits and motivation). On average, GenAI produces more readable and stylistically polished feedback than human educators on written essays, while human feedback tends to be more succinct. GenAI also slightly outperformed humans in the frequency of feedback on process and self-regulation, which supports deeper learning and learner autonomy. Despite similar performance on formative feedback, humans and GenAI can have low levels of agreement on the strengths and weaknesses of student work, and thus on the marking of student work.
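The configuration “based on marking rubrics and examples of good answers” mentioned above usually amounts to prompt assembly. The sketch below illustrates the pattern; the rubric, helper and wording are invented for illustration and are not drawn from any cited study.

```python
# Sketch of grounding LLM feedback in a marking rubric and an exemplar
# answer. The rubric content and prompt wording are illustrative only.

RUBRIC = {
    "thesis": "States a clear, arguable thesis",
    "evidence": "Supports claims with relevant evidence",
    "structure": "Organises paragraphs logically",
}

def build_feedback_prompt(essay: str, exemplar: str) -> str:
    """Assemble a formative-feedback prompt anchored in the rubric."""
    criteria = "\n".join("- " + k + ": " + v for k, v in RUBRIC.items())
    return (
        "Give formative feedback on the student essay below.\n"
        "Assess it against each rubric criterion:\n"
        + criteria + "\n"
        "Exemplar of a strong answer:\n" + exemplar + "\n"
        # Formative, not summative: comment on task, process, self-regulation.
        "Comment on the task, the learning process and self-regulation; "
        "do not assign a grade.\n"
        "Student essay:\n" + essay
    )

prompt = build_feedback_prompt("The industrial revolution...",
                               "A strong essay would...")
```

A teacher-facing tool would send this prompt to an LLM and surface the model's response as a draft for the teacher to review, in line with the hybrid approach discussed in this section.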
Does this equivalent performance imply that formative feedback on students’ tasks should be delegated to GenAI? Not necessarily. Feedback is only effective if taken seriously and acted upon, which partly depends on its quality but also on its credibility and “motivational” dimension. As noted by Gašević and Yan, comparable performance does not imply pedagogical interchangeability. The growing research literature on the perception of AI feedback by students finds that students perceive human feedback as more credible and meaningful, which makes it more likely to influence their motivation, evaluative judgement, and trust – relational dimensions that remain difficult for GenAI systems to reproduce.

As a result, the consensus among experts points towards a hybrid approach to feedback. The differences in performance between GenAI and human instructors create new opportunities to complement and enhance the effectiveness of teachers in feedback provision. For example, GenAI could generate initial feedback on student work, which teachers can use as suggestions to enhance their own feedback drafts. GenAI could also be used to assess the quality of the feedback teachers provide, drawing on research on high-quality feedback. To the extent that students’ work is done on digital platforms, GenAI could also provide feedback on the process of learning, an important form of feedback that is usually out of reach for teachers. Ultimately, having this feedback vetted by humans is essential for it to be trusted and effective for students.



In conclusion, using general-purpose GenAI tools carries risks, particularly if students use them to avoid the cognitive effort that educational tasks are designed to elicit. However, learning science also recognises that learners may need “scaffolding”, support that is gradually removed as students become more proficient. When used pedagogically, GenAI tools can serve as scaffolding (Strauß and Rummel, 2026[33]). This highlights the importance for teachers of developing pedagogical competences that integrate GenAI into their teaching and assignments. Studies that found a positive effect of the pedagogically guided use of general-purpose GenAI included some development of students’ GenAI literacy. Countries are starting to develop strategies to this effect (Box 1.1). The OECD and the European Commission have developed an AI literacy framework to support these efforts (European Commission and OECD, 2025[49]). Even when teachers do not integrate GenAI as part of their teaching, they may still have to adapt their teaching practices as their students can use these tools independently. How should teachers adapt their teaching and assignments so that they continue to yield positive learning results, even for students who use GenAI as a productivity, rather than a learning, tool? While many of these attempts at pedagogical change are undocumented, a few studies offer some insights. For example, Kosar et al. (2024[50]) document how they redesigned their computer science course by: 1) changing their home assignments so they could not be directly completed by a GenAI chatbot; 2) using lab time for oral defences of the produced code, asking “understanding” questions; 3) turning their mid-term exams into a paper-and-pencil exam focused on conceptual understanding. Under these conditions, the groups with and without access to GenAI had similar learning outcomes.
Documenting these efforts of pedagogical redesign internationally would support a more rapid sharing of knowledge around the use of GenAI.


Finally, while GenAI holds promise for formative assessment and the quality of the feedback given to students, it may mainly be used to support human instructors rather than to fully automate feedback. The human relationship remains a core element of teaching and learning, and AI-generated feedback does not carry the same credibility and motivational drive as human feedback, even when its quality is equivalent. While most research has so far focused on general-purpose GenAI tools, future studies should examine the effects of GenAI tools designed for education. These tools appear more promising. Nevertheless, their efficacy in terms of improving learning outcomes or pedagogical competence should be evaluated as a minimal requirement for their adoption. As shown in the reviewed evidence, educational GenAI tools can be student-facing, teacher-facing, or both. In practice, many tools combine these roles.

While general-purpose GenAI tools can support learning when used with clear educational purpose, current evidence suggests that the development of education-specific GenAI tools may hold even greater promise for improving teacher practice and student learning outcomes. This raises an important question: what does an educational GenAI tool look like? At a minimum, any educational GenAI tool should generate safe and age-appropriate content, respect users’ privacy and data protection, be explainable and transparent, and mitigate algorithmic bias to the extent possible. Beyond these safeguards, it should also be “educational”: it should help teachers teach more effectively and enable students to learn more or catch up with peers. This report provides several examples of prototypes or early implementations of such educational GenAI tools.


Adaptive learning systems are among the most widely used digital learning tools in education systems. They illustrate the “personalisation” agenda associated with AI by providing learners with opportunities to practise and expand their knowledge. These systems usually assess students’ initial knowledge and skills as well as their misconceptions, identify the types of problems that students should work on, and adjust the difficulty of problems depending on how students perform. Impact evaluations show that these systems are overall effective for learning. Intelligent tutoring systems usually provide feedback and support students to learn, rather than simply telling them whether they were right or wrong. These rule-based AI tutors struggled with unanticipated student inputs or questions, though, which limited the scalability and the richness of tutoring interactions. GenAI models, particularly LLMs, overcome these constraints, enabling more engaging and versatile tutoring experiences. Recent research comparing legacy intelligent tutors to next-generation LLM-driven systems helps articulate a vision for GenAI pedagogical agents. Li and Hu show how LLMs’ capacity to dynamically generate fluent, contextually appropriate dialogue brings new opportunities to intelligent tutoring systems in terms of adaptability to learner profiles and applicability to different subject matters. They allow for a more flexible tutoring experience, capable of addressing unforeseen questions or novel problem scenarios in real time. GenAI tutors can produce human-like explanations, ask clarification questions, and scaffold student thinking through multi-turn dialogue. Using techniques such as retrieval-augmented generation (RAG) or fine-tuning, they can incorporate up-to-date factual information into their tutoring.
Still, a key challenge is ensuring that GenAI tutors are configured or fine-tuned to have interactions that are pedagogically sound, as was the case for effective intelligent tutoring systems not based on GenAI. Through mechanisms like conversation history or explicit memory modules, GenAI tutors can iteratively refine learners’ profiles and adapt tutoring sessions accordingly, for example by adjusting difficulty or revisiting past misconceptions. Keeping the challenge level appropriate for learners is one of the key insights of learning science. GenAI tutors also have the potential to play different pedagogical roles, and to shift between these roles according to circumstances. They can play the role of mentors, providing academic guidance; coaches, providing motivational support; or peers, with less formal interactions that resemble peer learning. Researchers are also exploring how they could become companions and support learners across a wide range of learning activities and subjects. Often drawing on Socratic questioning and related strategies, GenAI tutors can provide scaffolded dialogue: guiding learners to develop their knowledge through carefully sequenced questions rather than delivering answers outright. This approach is rooted in Vygotskian “scaffolding” and “zone of proximal development” theory, where support is provided just beyond the learner’s current ability and gradually withdrawn as competence grows. GenAI tutors are particularly well-suited to implementing Socratic questioning as they can generate an extensive range of questions and follow-ups and flexibly rephrase or adjust the difficulty of questions based on learner responses. Li and Hu take the Socratic Playground as a case study to illustrate the possibilities of GenAI tutors, as well as presenting the underlying AI models that make it possible.


Improved learning outcomes come from sound pedagogical principles. When it comes to collaborative learning, Strauß and Rummel emphasise that general-purpose GenAI systems are unlikely to support effective collaborative learning, just as random human beings, even if knowledgeable, would not automatically make collaboration successful. To be effective, GenAI tutors should play different roles and target group interactions differently. Research has identified several roles GenAI can play when integrated into collaborative learning environments: 1) serving as a repository of information for the group; 2) collecting information about the group, its collaboration, or its results; 3) generating additional learning material for the group to use in its reflection, for example a contrasting case; 4) encouraging active participation as a facilitator positioned “outside of the group”; 5) developing domain-specific knowledge as a dialogue partner, in the Socratic spirit mentioned above; 6) bringing specific expertise to the group as an artificial group member. By playing these roles, GenAI chatbots can target different aspects of the collaboration: the cognitive part, by providing knowledge and expertise; the social part, by making sure that students contribute equally and that all voices are heard; the metacognitive part, by encouraging students to reflect on the collaborative process or prompting them to take further steps in their reflections. The small number of studies on GenAI collaborative learning show small- to medium-sized positive results, but importantly, GenAI does not orchestrate collaborative learning on its own – researchers and practitioners do. All the possible roles mentioned above must be assigned by researchers or developers by configuring or fine-tuning the GenAI tools used, building on the accumulated knowledge of computer-assisted collaborative learning.
One cannot assume that general-purpose LLMs will, by themselves, spontaneously take on the different roles needed to make collaborative learning successful. However, the rise of GenAI is opening new avenues for computer-assisted collaborative learning that show promise as long as they remain aligned with existing pedagogical knowledge and research.



As mentioned earlier, teachers often use general-purpose GenAI to support their work, for example the generation of lesson plans and learning materials. Some studies have shown that these activities can bring productivity gains. For example, a randomised controlled trial examined the use of GenAI among 259 teachers across 68 secondary schools in England and found that teachers who received practical guidance on using GenAI reduced their lesson and resource planning time by an average of 31%, cutting their weekly average planning time from 81.5 to 56.2 minutes, without compromising the quality of their lesson plans and resources. However, such uses may come with risks similar to those observed with students, of “cognitive offloading” or “metacognitive laziness”. Teachers and education systems will have to explore and define how, when and in which cases it is appropriate to use these tools. If teachers offload too many of their tasks to GenAI, they may stall their professional development and harm their relationships with students. For example, evidence shows that students prefer to receive feedback from human teachers, sometimes even when they rated the feedback from GenAI tools as superior. A related question is which design features ensure that educational GenAI tools for teachers improve teaching quality while maintaining teacher agency and autonomy. Cukurova proposes a conceptual framework for human-AI interactions and defines three different types: replacement (or full automation), when the AI tool accomplishes a task or sub-task for the teacher; complementarity, when the AI tool amplifies a teacher’s capabilities while the teacher remains actively involved; and augmentation, when the human-AI system accomplishes a task with an improved output that outperforms what either the human or the AI system could have produced alone.
While replacement may boost teachers’ productivity by saving time, it may come at a cost to teachers’ professional development and autonomy, and impoverish human relationships in teaching and learning. Which tasks to automate therefore requires thorough reflection. Complementarity boosts teachers’ productivity while keeping the teacher in control, but typically without enhancing teachers’ competence. Augmentation requires interactions during which both teachers and AI evaluate and critique each other’s suggestions and propositions to move towards a shared understanding and mutual development while solving a problem. Cukurova presents different examples of human-AI interactions and argues that GenAI tools provide new possibilities for augmenting teachers’ professional abilities. For example, Reza et al. developed a GenAI tool through a co-creation process with 10 mathematics teachers who worked on content creation for adaptive learning platforms. Using a prompting tool, teachers could quickly see how small changes affected the feedback provided to students. In this case, the human-AI interactions were iterative and evaluative, yet reduced the teachers’ perceived workload by 50% and shortened the content development process from several months to a few hours. While GenAI holds promise for more reciprocal exchanges thanks to its dialogic functionalities, GenAI tools with the capacity to push alternative perspectives grounded in educational theory and evidence remain to be developed. This will require advances in both the cognitive modelling of teaching expertise and the design of teacher-AI interaction interfaces.

The process used to design educational GenAI systems also matters. For example, Topali, Ortega-Arranz and Molenaar provide an example of the different steps of a human-centred design approach for educational GenAI tools for teachers. By involving teachers and students from the start, first by eliciting their uses, expressed needs and suggestions, the developers engaged them at different stages of creating a prototype. This involvement ensured the development of tools that align with teacher and student needs and also recognised their autonomy and agency, both during the design and use stages of the GenAI tool. The exemplar prototype co-designed through this process allows teachers to monitor student-AI interactions and to set the GenAI’s behaviour to some extent. Teachers can easily set the “percentage” of hallucinations of the tool, depending on how much critical thinking they want students to exercise while interacting with it. From the student perspective, this is a general-purpose GenAI chatbot that they use for education. Teachers valued the potential of the tool to increase their insights into student progress and to personalise their feedback, while maintaining pedagogical control. However, some pointed to the risk of added complexity and workload. Meaningful AI integration in teaching requires intentional design of autonomy, where teachers define pedagogical approaches, have settings allowing them to define AI behaviour, and retain responsibility for educational interpretation. GenAI then serves as an assistant within teacher-defined parameters.
Some educational GenAI tools that meet these requirements have already been developed and deployed to support teachers, teaching assistants and students in higher education. Baker et al. provide several examples in Czechia and India, and present an AI teaching assistant that they developed in the United States as a case study: JeepyTA. JeepyTA supports teaching assistants, students and teachers in various tasks: 1) answering logistics questions related to the course (admission, requirements, dates, etc.); 2) providing feedback on student essays based on specific pedagogical goals and rubrics designed by the instructors; 3) responding to student reflections and questions on the course readings and lectures, offering additional clarification, prompting further thinking, and connecting ideas across course materials; 4) providing debugging support for programming code (in courses where programming is not the learning goal); 5) generating a discussion prompt to start weekly discussions among the class and summarising conversations on the discussion forum, sometimes to provide information to teachers, sometimes to make it visible to learners; 6) suggesting ideas for essays and supporting student brainstorming; and 7) acting as different personas during a course. In order to maintain teachers’ autonomy and control of the tool, instructors or teaching assistants can modify the tool settings so that responses are automated (and go directly to students) or reviewed by them first. This helps prevent the provision of incorrect or misleading information, a particular issue in subject areas where misconceptions are prevalent on the web and therefore also in the LLM’s knowledge base. Instructors or teaching assistants can review flagged responses. If a response is inaccurate, they can discard it and reply directly. If the response is mostly correct but needs refinement, they can edit it before posting. When a response is accurate and well-structured, it can be approved with no changes.
This additional layer allows JeepyTA to provide timely support while making sure students receive information that is accurate, relevant, and aligned with the course objectives. While research on the efficacy of JeepyTA and similar GenAI use cases is still underway, existing studies show that JeepyTA reduced median response times to students from around 7 hours to approximately 2 hours, while keeping human oversight in place. It brought productivity gains leading to a better student experience. Students rated JeepyTA as comparable to human teaching assistants in clarity, accuracy and professionalism, though it was weaker at motivating students or offering higher-level developmental guidance. When used for feedback on essays, JeepyTA raised the proportion of students achieving top grades on essay assignments from roughly 64% to 95%, illustrating that careful prompt design and alignment with instructor expectations can significantly improve revision quality. However, risks include homogenisation of ideas (seen in brainstorming tasks), overreliance, and the temptation for institutions to reduce human teaching agency. Together, these empirical findings confirm the potential of GenAI to amplify teacher productivity and improve instructional quality while keeping human oversight and agency central to system design.
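The approve/edit/discard workflow described above can be pictured as a simple review queue. This is a hypothetical sketch of the logic, not JeepyTA’s actual implementation; all class and method names are invented.

```python
# Hypothetical sketch of a human-oversight workflow: AI-drafted responses
# are queued, and an instructor approves, edits or discards each one
# before it reaches students. Names are illustrative, not JeepyTA's code.

from dataclasses import dataclass, field

@dataclass
class ReviewQueue:
    auto_send: bool = False                       # full automation if True
    pending: list = field(default_factory=list)   # drafts awaiting review
    outbox: list = field(default_factory=list)    # responses sent to students

    def submit(self, draft: str) -> None:
        """Route an AI-drafted response: directly to students, or to review."""
        (self.outbox if self.auto_send else self.pending).append(draft)

    def review(self, index: int, action: str, edited: str = "") -> None:
        """Instructor approves, edits or discards a pending draft."""
        draft = self.pending.pop(index)
        if action == "approve":
            self.outbox.append(draft)
        elif action == "edit":
            self.outbox.append(edited or draft)
        # "discard": nothing is sent; the instructor replies directly instead

queue = ReviewQueue()
queue.submit("The deadline for essay 2 is Friday.")
queue.submit("Newton's third law means forces come in pairs...")
queue.review(0, "approve")
queue.review(0, "edit", edited="Forces always come in action-reaction pairs...")
```

Flipping `auto_send` to `True` corresponds to the fully automated setting mentioned above, where responses go directly to students without instructor review.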



GenAI tools can also support education systems and institutions in ways that do not immediately impact learning outcomes. As in other sectors, GenAI offers opportunities to streamline workflows and improve the operational efficiency of educational institutions and systems. This Outlook focuses on this aspect through three lenses. It shows how GenAI techniques can already support the improvement of “back office” processes in higher education institutions, such as degree recognition, educational programme design, and support for study advisers; the development of standardised assessment items; and new opportunities and challenges for scientific research, which largely apply to educational research as well.


One of the challenges for higher education institutions is articulating their programmes with those offered by other institutions, domestically and abroad. Recognising equivalences between them is key to ensuring that students can change study paths without losing time and to supporting student mobility across institutions, for example. Some systems address these issues by designing national degree structures or using common credit systems that ease pathways across institutions and subjects for students. Articulation agreements and exchange programmes between institutions play a similar role. In practice, however, transfer decisions and articulation agreements are still largely made by humans and time-consuming, requiring faculty and admissions officers to review syllabi and programmes and make a judgment on the equivalence between institutions. Simplifying these processes can enhance systems’ efficiency but also their completion and attainment rates, thanks to more flexible pathways for students. School systems can face similar issues when, typically in secondary education, they start offering more tracks and choices of courses to meet students’ interests. Pardos and Borchers show that models underpinning GenAI can support these equivalency processes by making relationships between courses, within and across institutions, apparent. To this effect, the AI model processes the text description of the courses and/or past enrolment patterns to provide administrators with better clarity on which courses within their own institution are close, similar or well-articulated, and which external programmes (and courses) can be considered equivalent to theirs. Current research shows that GenAI tools can closely match human judgment in identifying equivalences (and can also uncover new equivalence possibilities), but adoption depends on trust and on the way information is displayed to the final decision makers.
These models could also provide advice to students about the next steps of their studies, for example by recommending institutions or programmes. In the same spirit, work on how GenAI can support study advisers is emerging. In education systems and higher education institutions, study advisers typically provide advice on course selection and career pathways. Lekan and Pardos developed a GPT-driven model that asks first-year university students about their course preferences and career goals, and then gives recommendations with justifications for advisors to review before in-person meetings. Academic advisors rated the suggestions of the GenAI tool favourably, fully agreeing with the GenAI-generated major recommendation 33% of the time, saving them time while maintaining both their professional autonomy and student relationships. Other functionalities using similar techniques can be mentioned, such as the automatic tagging of learning content according to changing taxonomies. Tagging learning resources such as open educational resources, or the learning content included in an adaptive learning system, is essential for their discoverability and ensuring that they match local curricula.
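To illustrate how processing course descriptions can surface candidate equivalences, the toy sketch below compares invented course texts with a bag-of-words cosine similarity. The systems discussed above use transformer embeddings and enrolment data; this sketch only illustrates the principle, and the course codes and descriptions are made up.

```python
# Toy illustration of equivalence detection: represent course descriptions
# as word-count vectors and rank candidates by cosine similarity.
# All course codes and descriptions below are invented for illustration.

from collections import Counter
from math import sqrt

def vectorise(text: str) -> Counter:
    """Bag-of-words vector: word -> count (a stand-in for real embeddings)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

courses = {
    "MATH101": "introduction to calculus limits derivatives integrals",
    "CALC1":   "calculus one limits derivatives and integrals introduction",
    "HIST210": "modern european history nineteenth century politics",
}

# Rank every other course by similarity to MATH101's description.
target = vectorise(courses["MATH101"])
scores = {c: cosine(target, vectorise(t)) for c, t in courses.items() if c != "MATH101"}
best = max(scores, key=scores.get)   # closest candidate for equivalence review
```

In a real system, the top-ranked candidates would be shown to faculty or admissions officers as suggestions to review, not applied automatically, in line with the trust and presentation issues noted above.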




GenAI is increasingly used in the development and delivery of standardised assessments, including in high-stakes contexts. Pardos and Borchers and von Davier highlight how LLMs can automate the creation of multiple-choice and short-answer assessment items, particularly when anchored in existing curricular material or when models are initially designed by experts, possibly assisted by GenAI tools. For example, Bhandari et al. report that ChatGPT-generated algebra items demonstrated psychometric properties comparable to those created by humans. Notably, the LLM-generated items exhibited slightly stronger differentiation between high- and low-ability respondents, suggesting that GenAI can produce assessment content of similar or even superior quality under controlled conditions. This represents time and productivity gains for national authorities, test developers and potentially instructors designing assessments for students. This can also address known limitations of traditional test banks, for example the persistent challenge of overexposure of items (which makes them less effective). The limitation is that, until GenAI stops hallucinating, instructional staff will still need to check every problem before it is seen by a student. GenAI can also be used to innovate standardised assessments. For example, von Davier shows how the Duolingo English Test introduced two new types of writing and speaking assessments that would not have been possible without the use of GenAI. One is an interactive writing task, where a chatbot provides mid-task feedback to test-takers as they write a short essay in English, suggesting additional directions and revisions. The other is an automated assessment of oral speaking during a dialogue in “natural language” with a GenAI-powered agent. In the case of high-stakes assessments, generative models are just one layer of a more complex architecture involving other AI tools and humans.
Finally, GenAI models can be used to evaluate and calibrate the quality of standardised items. Liu et al. demonstrate that multi-agent AI models, which bring together ensembles of LLMs acting as “synthetic” or “simulated” respondents, can produce response distributions with psychometric properties closely aligned to those of students. These augmentation strategies, such as adding LLM responses to even a small set of human respondent data, suggest that LLM-based calibration can complement limited student response data, reducing costs and accelerating item validation cycles in the development of standardised assessments.
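The augmentation idea can be reduced to a small numerical illustration: classical item difficulty (the proportion of correct responses per item) computed first from a human sample, then from the sample augmented with simulated respondents. The response matrices below are invented; in the studies cited, the synthetic rows would come from LLM “simulated respondents”, and calibration would use proper psychometric models rather than raw proportions.

```python
# Illustration of augmenting item calibration data: classical item
# difficulty (share of correct responses) from a small human sample,
# then recomputed after adding simulated respondents. All data invented.

def item_difficulty(responses):
    """Proportion correct per item (rows = respondents, 1 = correct)."""
    n = len(responses)
    return [sum(row[i] for row in responses) / n for i in range(len(responses[0]))]

human = [          # 3 human respondents x 3 items
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]
synthetic = [      # 4 LLM-simulated respondents (assumed, for illustration)
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 1],
]

p_human = item_difficulty(human)               # estimates from humans alone
p_augmented = item_difficulty(human + synthetic)  # humans + synthetic rows
```

The point of the augmentation is that `p_augmented` rests on more observations than `p_human`; whether the synthetic rows actually improve the estimates depends on how faithfully the simulated respondents mimic real students, which is exactly what the cited research evaluates.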



GenAI is having a significant impact on scientific research, with growing use by researchers to write academic papers and assist them at different stages of the research process. Guellec and Vincent-Lancrin highlight these trends, providing examples of how GenAI intervenes in the research process in the natural sciences, including the elaboration of literature reviews, the analysis of large datasets to generate new research questions or hypotheses, and cooperation with humans to generate and perform research and experiments. While GenAI has already had spectacular achievements, such as the generation of the 3D structure of 200 million proteins, it usually still involves intense human supervision and presents some risks for the research enterprise, such as reduced collective originality and the increasing difficulty of keeping up with the collective research output. Improving the quality of learning and the effectiveness of education systems requires investment in high-quality research and evidence to inform policy and practice. While information on how education researchers use GenAI is not available, they likely leverage GenAI tools for tasks like writing and editing research papers and performing literature reviews. Beyond these tasks, three areas stand out as particularly promising for GenAI to support educational research. First, while education is a data-rich environment with a wealth of administrative data collected for the smooth operation of education systems and institutions, these data are often under-analysed because of legitimate privacy concerns. GenAI applications can now easily generate synthetic privacy-preserving datasets that reproduce the statistical characteristics of a dataset with very little risk of privacy breach (as the dataset is entirely artificial).
Second, simulated data could also be used, albeit with caution and where the context makes it appropriate, to supplement real data where survey response rates are too low. Lastly, the rise of multi-agent models based on GenAI (also called “agentic AI”) opens new possibilities for research that could be deployed in education, where answering research questions often requires interdisciplinary perspectives.
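A minimal sketch can convey what “reproducing the statistical characteristics of a dataset” means in the synthetic-data case. The toy generator below, with hypothetical column names and values, preserves only each column's mean and standard deviation; real synthetic-data generators (e.g. GAN- or copula-based tools) also preserve cross-column correlations and categorical structure, which this simplified version deliberately omits.

```python
import random
import statistics

random.seed(42)

# Toy "real" administrative dataset (hypothetical columns and values).
real = {
    "score": [62, 71, 55, 80, 68, 74, 59, 66],
    "hours": [3, 5, 2, 7, 4, 6, 2, 4],
}

def synthesize(data, n_rows):
    """Draw synthetic rows that match each column's mean and spread.

    No real record is copied: every value is freshly sampled, so the
    output carries very little re-identification risk.
    """
    synthetic = {}
    for col, values in data.items():
        mu = statistics.mean(values)
        sigma = statistics.stdev(values)
        synthetic[col] = [random.gauss(mu, sigma) for _ in range(n_rows)]
    return synthetic

fake = synthesize(real, 1000)
print(round(statistics.mean(fake["score"]), 1))  # close to the real mean
```

Because each synthetic value is sampled rather than copied, releasing the synthetic table does not expose any individual record, which is the property that makes such datasets attractive for privacy-constrained educational research.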

The emerging evidence on GenAI highlights its potential to improve the quality and effectiveness of education. It also demonstrates that it carries risks for student learning and for the professional development of teachers. GenAI appears more disruptive than non-generative AI because students have access to general-purpose GenAI platforms. They can and do use them at home to perform their educational assignments. Even if general-purpose GenAI tools are not used in education institutions, their availability outside of school would still challenge current educational processes. As a result, education stakeholders should consider how education systems can leverage and/or adjust to GenAI tools. Many countries have included digital skills, including GenAI literacy, as part of their curricular objectives. Students should acquire GenAI literacy over the course of formal education, mainly to prepare for the labour market and for societies where GenAI will likely continue to play an important role. In some domains, such as computer programming, students’ employability depends on their ability to use GenAI to code, even though understanding the core concepts and principles of programming remains a must. Digital content will also increasingly incorporate AI-generated content, hence the importance for all to have some understanding of how GenAI works. The acquisition of knowledge, skills and attitudes in various domains, including reasoning, critical thinking, creativity, empathy, curiosity and judgment, remains crucial to young people’s education. While these skills can be acquired without technology, GenAI could be leveraged by teachers and students for this purpose. Current evidence shows that educational GenAI tools aligned with educational knowledge and science can lead to better learning, but also that teachers can use general-purpose GenAI effectively if they embed it in a clear pedagogical strategy.
In that context, GenAI tools could be used at any stage of the educational process, provided they are well designed or used with a sound pedagogical purpose. GenAI tools can also be used or designed to support teachers and other educational staff such as teaching assistants or study and career advisers. While several studies show that GenAI can improve their productivity, allowing them to spend less time on some tasks with the same quality of output, these professionals face the same risks of cognitive offloading and lack of learning as students. Research and development on educational GenAI is exploring how GenAI tools can maintain users’ autonomy, professional learning and sense of responsibility for the final output. This can take different avenues, from co-creation with end users following a “human-centred design” approach to ensuring that teachers or other educational staff can adjust the tools to their local context and objectives. The provision of formative feedback to students, a crucial but time-consuming task for teachers, is a good case in point. Current research shows that feedback generated by GenAI, while often not as reliable and consistent as feedback provided by non-generative AI, matches or surpasses human feedback, with its own strengths and limitations. Still, most studies show that students find human feedback more meaningful, trustworthy and motivating. In this context, using GenAI to assist and complement teacher feedback, while teachers assume full responsibility for this feedback, may be the way forward. Understanding of what makes educational GenAI tools effective, and how this effectiveness compares to non-generative AI tools, is only just emerging. Similarly, knowledge about the effective integration of general-purpose GenAI systems in teacher-designed learning scenarios is nascent.
Several country initiatives will provide new knowledge on possible approaches (Box 1.2), highlighting the importance of international co-operation and educational research in this area. Current research on the use of GenAI in education is still limited and could be strengthened by research investment and international co-operation. For example, most current results are based on very short interventions rather than continuous, repeated use of GenAI tools over longer periods of time.
When considering how to effectively use GenAI in education, here are some key take-aways to consider:
• Successfully performing an educational task with GenAI does not automatically lead to learning;
• Acquiring and demonstrating foundational knowledge and skills in key subjects without the use of general-purpose GenAI remains key;
• GenAI tools, whether educational or general-purpose, should be used within learning scenarios intentionally designed by teachers to achieve specific learning goals;
• When using GenAI tools, teachers and other education staff must continue to exercise their professional judgment and remain responsible for the quality of the output by evaluating, modifying or endorsing the AI-generated output; 
• GenAI developers should design education-specific GenAI tools based on educational research and pedagogical knowledge, and involve teachers as well as other stakeholders such as students or parents or teacher unions, as appropriate, in the design process; 
• International co-operation on scientific research to assess the impacts of well-defined pedagogical uses of GenAI tools on students’ and teachers’ learning and well-being will help leverage these tools in an effective way.
 


As an example of the actual introduction of educational tools in school, Korea has licensed and made available GenAI-powered tutors to teachers and schools as AI digital learning materials that work like regular intelligent tutoring systems, allowing students to engage in adaptive practice in several subjects, and teachers to receive feedback on students’ possible misconceptions. Features include adaptive explanations, automated feedback and interactive dialogue aligned with the national curriculum.

With its AI Leap programme, implemented in 2025-26, Estonia aims to explore the use of GenAI in upper secondary education, with a system-level approach combining infrastructure, curriculum development, teacher training and partnerships with technology providers. The programme has made general-purpose GenAI tools available to all teachers and will provide high school students with free access to LLM chatbots that are configured to be educational, to follow education research principles and to use Estonian as the language of interaction.

In Greece, selected upper secondary schools are piloting the use of ChatGPT Edu through the OpenAI for Greece partnership (launched in 2025). The project includes teacher training and monitoring of the pedagogical impacts of using GenAI.

Other countries focus on the introduction of teacher- and school-facing tools. Slovakia is piloting AI assistants for lesson planning and assessment. Finland is testing GenAI applications primarily for teacher support and feedback. Japan, Canada and Australia conduct subnational pilots focused on writing support, feedback generation and workload reduction. France is developing a “sovereign AI” for education that will support teachers in lesson planning, as well as a chatbot that will provide generic answers to human resource management questions from its 1.3 million teachers, allowing humans to focus on individual cases.


Two examples can illustrate approaches to the development of appropriate GenAI tools for education.

In the United Kingdom (England), the Department for Education’s “content store” consolidates curriculum guidance, lesson plans, and anonymised pupil assessments to support the training of AI models, enabling the development of accurate, high-quality, and legally compliant educational GenAI tools tailored for English schools. Safety expectations for GenAI tools were also developed, providing developers with a clear set of expectations that can facilitate adoption by schools. In the Netherlands, the National Lab on Artificial Intelligence (NOLAI) co-designs and develops educational GenAI tools (among other AI tools) for the education system through a partnership between government, academia, industry and schools.
