Fostering collaborative learning and promoting collaboration skills: What generative AI could contribute.
This section illustrates the potentials of generative AI (GenAI) to support collaborative learning and
reviews the emerging research. After presenting how technology can support collaborative learning,
we illustrate the roles that GenAI can play during collaboration (for example serving as a repository
of information or as a teacher/tutor), which aspects of the collaboration it can support (for example
providing knowledge or fostering social interaction), and whether this support affects learning in
terms of domain-specific knowledge and effective collaboration skills. We conclude by discussing
potential ways to combine GenAI with established types of support for groups.
Collaborating with others in small groups can be effective for learning, fostering not only domain-specific knowledge
but also the knowledge and skills necessary for learning and working in teams. At the same time, groups encounter challenges during collaboration and efforts have therefore
been undertaken to design support that helps groups navigate these challenges. Work within the learning sciences,
especially the field of computer-supported collaborative learning (CSCL), has a long-standing tradition of developing
and testing computer-based support for collaborative learning. Such work has now been expanded by research exploring, in the context of collaborative learning, the use of machine learning models that fall under the term ‘generative artificial intelligence’ (GenAI). However, a clear picture of the landscape of GenAI in CSCL is currently
lacking. This chapter therefore illustrates how generative AI systems are currently used to support groups during
collaborative learning, and analyses their effectiveness in supporting collaborative learning. To this aim, we review a
broad sample of current research studies in which GenAI is utilised to support collaborative learning.
To understand how GenAI can be designed to support collaborative learning, and to determine what is required
to provide groups with support that is sensitive to their current needs, it is useful to elucidate how learning during
collaboration occurs. Therefore, we first provide a brief overview of the core mechanisms underlying collaborative
learning and illustrate why groups benefit from support. Drawing on previous work that focused on utilising computer
technology to support groups, we then present the most prominent CSCL approaches to supporting groups. This
allows us to place GenAI in the broader context of instructional support, to compare it to previous efforts in this field,
and to identify the potentials that this technology brings to the table in terms of providing groups with personalised
(i.e. adaptive) support.
The main body of this section describes how GenAI systems are currently used to provide groups with support. Here,
we illustrate the roles that GenAI can play during collaboration (for example serving as a repository of information or
as a teacher/tutor), which aspects of the collaboration can be supported by GenAI (for example providing knowledge
or fostering social interaction), whether this support affects learning in terms of domain-specific knowledge and
knowledge about effective collaboration, and on what grounds GenAI tools determine when and how groups can be
supported.
Our review illustrates the potentials of GenAI for supporting collaborative learning and underlines the need for more
systematic efforts to determine how GenAI should be designed to support collaboration, especially in terms of how it
affects collaboration processes and subsequent learning. Although more research is needed, future efforts to design
effective support using GenAI can leverage established insights from learning science research. We conclude the
chapter with an outlook, in which we discuss potential ways forward regarding the design of GenAI support, how this
technology can be combined with established types of support for groups, and how we can investigate the benefits
of GenAI support and deepen our understanding of its impact on collaboration and learning processes. Importantly,
we point to several issues that need to be navigated during this process.
Evidently, collaborative learning is complex and challenging for learners. Therefore, learning science research has
explored how to help groups engage in effective interaction that eventually affords learning. In the next section, we provide an overview of the rationales that guide the design of such collaboration support and illustrate the most
relevant types of adaptive support, starting with support that does not rely on GenAI. This overview serves as the
basis for analysing current implementations of GenAI and discussing future pathways for the development of GenAI
support for collaborative learning.
Providing groups with adaptive (personalised) support is not a new concept. The earliest efforts in CSCL stem from
the 1990s and investigated non-adaptive support. Although even such non-adaptive support has been shown to enhance the beneficial effects of collaborative learning, adaptive
support has been a constant theme in the discourse on collaboration support.
The conceptual basis for providing groups with support that is tailored to their needs is the concept of scaffolding.
Scaffolding evokes the metaphor of scaffolds during the construction of buildings, denoting that learners are given the
means with which to accomplish hitherto unattainable learning tasks.
Essential to scaffolding is that, despite receiving support, the learner is ultimately enabled to perform all parts of the task on their own ("independent activity"). Scaffolds come from a more knowledgeable other, such as a
teacher, learning materials, or a digital learning environment. For instance, scaffolds can model the problem-solving
process, direct attention to important aspects of the task, or elicit explanations. Given that
every learner enters a learning situation with different prior knowledge, a tenet of scaffolding is adaptivity. The same
is true for groups of learners. Such adaptive (or personalised) support is adjusted to the relevant characteristics of
the learner, the group or their interaction, with characteristics being “relevant” if they are expected to affect groups’
ability to achieve their goal (e.g. acquiring new domain-specific knowledge).
This resonates with work in CSCL, especially research on external collaboration scripts: research on expertise and on collaboration scripts has shown that not all groups benefit from the same amount of support. Rather, support that is too fine-grained can be expected to lower the performance of learners who are already highly competent ("overscripting"), whereas learners with little prior experience
may struggle to collaborate effectively when they receive too little support ("underscripting").
Adapting the support to a group’s needs is thus seen as crucial.
When designing support for collaborative learning, researchers and developers have to make a number of decisions,
for which the framework of CSCL design dimensions presented by Rummel can serve as a conceptual guide
(Figure 4.1).
First, developers must decide what the support is ultimately expected to help groups achieve, that is, the goal of the
support. This goal may be acquiring new domain-specific knowledge, acquiring collaboration skills, satisfaction
with the collaboration, or other relevant outcomes. Next, it has to be decided how the instructional support can
achieve this goal. Rummel refers to this as the target of the support. For instance, in order to co-construct
knowledge that each group member did not hold before, the group may receive support that targets their interaction,
for instance prompting groups to engage in a discussion. The same goal
may be addressed by helping groups monitor their understanding and support each other in repairing misconceptions.
Besides other dimensions such as timing, addressee, or availability of the support, it also has to be decided whether
the support is implemented in a fixed (every group receives the same support at the same time), adaptable (each
group can decide which support they receive) or adaptive way (a system decides under which circumstances a group
receives which degree of support).
The foundation for an adaptive system is a model of relevant prerequisites for collaborative learning and desired
states or processes that can occur during collaboration. This model underlies the
processes performed by the automated support system. The early work on adaptive support for collaborative
learning comes from the 2010s and leveraged techniques that represent "good old-fashioned", or symbolic, artificial
intelligence – they were rule-based.
In order to provide groups with adaptive support during the collaboration, a system has to collect information
about the collaboration, process this information to determine whether the group requires support, and then deploy
adequate interventions that help the group achieve their goals.
Following Molenaar, in the first step a system has to collect information about the learners, the groups and the collaboration process (detect), and process this information into indicators that represent relevant insights into characteristics and processes during collaboration (e.g. the distribution of relevant domain-specific knowledge among the group members). One of the early approaches to automatically collecting and processing relevant information about the collaboration process was leveraging text classification, a group of machine learning techniques from computational linguistics. Based on their experiences from an interdisciplinary project, Rosé et al. discussed how text classification can be utilised to analyse transcripts of collaborative dialogues in order to automatically analyse the process of knowledge co-construction using a multidimensional coding scheme (instead of merely analysing the content of the dialogue). This ability to automatically assess collaboration processes in terms of relevant events (e.g. knowledge co-construction) then provided the basis for deciding upon groups' need for support. The authors urged great care when developing the indicators to be used by an automated system to assess the collaboration process; for instance, it is essential that the indicators are externally valid and actually represent what they aim to represent.
These indicators inform the system about the current state of the collaboration and are then compared to a desired goal state (diagnose). Depending on the outcome of this comparison, support may be offered (act). Deiglmayr and Spada underlined the importance of developing rules that govern the behaviour of the support system. Specifically, they illustrate production rules (IF-THEN rules) that specify which support is offered under which circumstances (see also Radkowitsch et al.). These rules can target critical moments (for example, "IF state is detrimental, THEN offer support A") or opportune moments ("IF potential for beneficial interaction, THEN offer support B").
Following the concept of scaffolding, systems that support collaborative learning are designed to provide support only as long as the group requires it. As a consequence, the degree of support is gradually reduced (or, if necessary, increased) as the competence of the learner or the group changes. Such fading can be realised, for instance, by removing certain action prompts after they have been shown a set number of times. Thus, support in (and for) collaborative learning (i.e. scaffolding) is not available indefinitely, but only as long as learners require it.
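To make the detect-diagnose-act cycle and the notion of fading more concrete, the following minimal sketch (in Python) shows how production rules might map indicators to support actions and how support might be faded after a fixed number of deployments. The indicator names, thresholds, and messages are entirely hypothetical and do not reproduce any of the systems cited in this chapter.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class GroupState:
    # Indicators produced by the "detect" step (hypothetical examples).
    participation_shares: dict          # member -> share of contributions
    off_topic_ratio: float              # share of off-topic utterances
    prompts_shown: dict = field(default_factory=dict)  # rule name -> count

@dataclass
class ProductionRule:
    name: str
    condition: Callable[[GroupState], bool]   # IF part of the rule
    action: str                                # THEN part: support message
    max_repetitions: int = 3                   # fading: stop after N uses

RULES = [
    ProductionRule(
        name="uneven_participation",
        condition=lambda s: min(s.participation_shares.values()) < 0.10,
        action="Please make sure that every group member gets a turn.",
    ),
    ProductionRule(
        name="off_topic",
        condition=lambda s: s.off_topic_ratio > 0.30,
        action="Try to steer the discussion back to the task.",
    ),
]

def diagnose_and_act(state: GroupState) -> list:
    """Compare indicators to the desired state (diagnose) and deploy support (act)."""
    interventions = []
    for rule in RULES:
        shown = state.prompts_shown.get(rule.name, 0)
        if rule.condition(state) and shown < rule.max_repetitions:
            interventions.append(rule.action)
            state.prompts_shown[rule.name] = shown + 1  # basis for fading
    return interventions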
Having illustrated the mechanisms behind learning during collaboration and how support for these processes can be conceptualised, we now turn to the most central, well-established approaches to providing groups with adaptive support. Methods for supporting groups include:
1. adaptive collaboration scripts (Rummel et al.);
2. adaptive tutoring systems that have been modified to accommodate collaborative learning and tutoring; or
3. virtual agents in the form of collaborative conversational agents and chatbots.
These approaches aim to facilitate interactions that are conducive to knowledge co-construction or that inhibit
undesired interaction patterns while providing learners with opportunities to internalise beneficial interaction patterns. Besides approaches that aim at directly fostering the collaboration
in groups, other approaches are instead designed to help the teacher monitor and orchestrate the students'
collaboration.
These systems often use production rules (IF-THEN rules) to generate diagnoses of the collaboration or decisions
about the need for support. Despite the effectiveness of adaptive collaboration support,
over the years it has not been implemented on a broad scale. One reason for this may be the complex boundary
conditions for effective collaborative learning which have to be understood well in order to formulate production
rules. Moreover, assessing the state of the collaboration in a valid and automated manner is far from trivial.
The recent introduction of chatbots that leverage large language models (LLMs) has sparked a new conversation about providing groups with adaptive support, often accompanied by high hopes regarding what such systems can deliver.
As illustrated in the previous section, there is a long-standing tradition of employing computer technology to provide
groups with adaptive scaffolding. Research and development in this field was reinvigorated when the company
OpenAI provided public access to their LLM GPT-3.5 in the form of ‘ChatGPT’. The introduction of LLM platforms
such as OpenAI's ChatGPT, Google's Gemini, Microsoft's Copilot, Anthropic's Claude, Mistral's Le Chat, or DeepSeek's chatbot has been touted as "a new type of pedagogical scaffolding" with the potential to
reshape and revolutionise education.
Although these systems are not dedicated educational technology, expectations are high that they can achieve a more thorough personalisation of support than previous attempts. The hope is that past efforts can be continued with
adaptive support that is more flexible than what has hitherto been possible.
For the context of collaborative learning, authors highlight the capabilities of GenAI (usually LLMs) to process inputs
from different modalities such as text, speech, or images, as well as to retain the context of the conversation and
generate adequate responses, such as (almost) human-like, coherent texts or other content that is similar to human-created material. Furthermore, these tools are expected to perform automated assessments of the collaboration and
provide immediate, tailored, adaptive scaffolding, guidance, and correction of errors. Eventually,
this adaptive support is expected to lead to improved learning through collaboration.
As described above, to provide adaptive support, a system has to achieve three goals: 1) detect relevant characteristics
of the group members, the group as a whole, and the collaboration process; 2) formulate a diagnosis about the
current state of the collaboration by contrasting it with a desired goal state; and 3) select and deploy scaffolds (see
Molenaar, 2022[43]).
Generative AI may contribute to the detect and diagnose tasks. Like earlier approaches from computational linguistics (see Rosé et al.,
2008[45]), LLMs have natural language processing capabilities that can make a valuable contribution to
the analysis pipeline within collaboration support. For example, Zheng et al. (2021[79]) used an LLM to label data from
the interaction in groups. After labelling (i.e. categorising, or annotating, different collaborative actions according to a
coding schema), the system used production rules to generate support for the groups. This use case for LLMs hinges
on the accuracy with which the model assigns labels to collaboration data (e.g. audio, video, computer log files). Thus,
an essential question is to what extent LLMs are capable of labelling data reliably based on a coding schema. Some
studies found acceptable labelling performance across multiple labels when compared to human coding (as indicated
by interrater reliability scores greater than 0.7), while others reported lower overall
performance or high performance only for certain labels. Thus, there is potential to integrate LLMs into the analysis that lays the foundation upon which to provide
groups with adaptive support. But there are still obstacles to overcome, especially in terms of consistently accurate
diagnoses so that groups do not receive inadequate support due to misclassifications. In this context, Wong et al. recently demonstrated that the accuracy of automatic coding can be further improved by using models that
are able to leverage inputs from multiple modalities, such as speech (audio) and written text.
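As an illustration of this detection step, the following sketch (in Python) shows how an LLM might be asked to label individual utterances according to a coding schema and how agreement with human coders could be checked using Cohen's kappa. The schema, the fallback behaviour, and the call_llm function are placeholders rather than the procedure used in any of the cited studies; only the reliability check relies on an existing library function (scikit-learn's cohen_kappa_score).

from sklearn.metrics import cohen_kappa_score

CODING_SCHEMA = ["knowledge_co-construction", "coordination", "off-topic"]

def label_utterance(utterance, call_llm):
    """Ask the model to assign exactly one code from the schema."""
    prompt = (
        "Assign exactly one of the following codes to the utterance below.\n"
        f"Codes: {', '.join(CODING_SCHEMA)}\n"
        f"Utterance: {utterance}\n"
        "Answer with the code only."
    )
    answer = call_llm(prompt).strip()
    return answer if answer in CODING_SCHEMA else "off-topic"  # crude fallback

def agreement_with_human(transcript, human_codes, call_llm):
    """Interrater reliability (Cohen's kappa) between LLM and human coding;
    values above roughly 0.7 are often treated as acceptable (see above)."""
    llm_codes = [label_utterance(u, call_llm) for u in transcript]
    return cohen_kappa_score(human_codes, llm_codes, labels=CODING_SCHEMA)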
In this section, we focus on how researchers and developers have leveraged GenAI models to provide groups with scaffolds, and to what effect. To this end, we collected a broad sample of articles from scientific journals and conference proceedings that reported on studies in which generative AI systems were implemented in collaborative learning scenarios. Notably, we observed that studies in this field tended to use LLMs as the foundation of support, while rarely leveraging them to expand the capabilities of already established types of collaboration support, such as collaboration scripts or pedagogical agents. To gain a clearer picture of the landscape of GenAI support for collaborative learning, we analysed these studies and determined 1) which role a support system may assume during collaboration, 2) which aspects of the collaboration are scaffolded (i.e. the dimension ‘target’ in the framework proposed by Rummel), 3) whether the tools are effective in supporting knowledge acquisition during collaborative learning, and 4) on what grounds the adaptive behaviour of the tools is designed.
Inspecting current studies on collaborative learning with generative AI, we derived four different ways in which generative AI was integrated into collaborative learning settings in order to afford or support the collaboration. These are characterised below, with examples provided (see Figure 4.2 for an illustrative overview).
First, there are studies in which GenAI (i.e. LLMs) served as a repository of information, which groups can query to obtain information that can be used to solve their tasks, thus serving the function of a web search. For example, Darmawansah et al. implemented ChatGPT to support groups during argumentative knowledge co-construction. In the first phase of the collaborative activity, groups were tasked to research information, which they subsequently used to develop arguments. During this phase, groups could search for information about different topics using ChatGPT or other sources. In the subsequent phases of the activity, students used these arguments in their discussions.
Generative AI can further be a source of personalised learning material, that is, a tool that
collects information about the group, its collaboration, or its results, and generates additional learning material
that the group can use. For instance, Naik et al. implemented GenAI in a collaborative scenario where
groups first developed a solution for a problem and then contrasted their solution to an alternative solution,
subsequently discussing trade-offs between the two solutions. In this case, the group’s solution was processed
by ChatGPT using prompts from the researchers, who tasked the model to create an alternative solution, a
so-called contrasting case. Contrasting cases present learners with alternative solutions to the problem (i.e. cases). By
contrasting their own solution to another case, learners can gain a deeper understanding of the underlying principles
of the correct solution to the problem. The group then had to discuss
the merits and drawbacks of its own and the generated solution. In other studies, generative AI systems were used
as tools that performed tasks for the groups, such as generating narrations or images based on inputs from the
group.
Other studies used GenAI to intervene and scaffold the collaboration, thereby assuming the role of a teacher or facilitator. In this role, GenAI is positioned outside
of the group, monitors the collaboration, and provides support. For example, Cai et al. implemented a
chatbot based on ChatGPT, which followed pre-specified rules to facilitate active participation during collaboration.
To this end, the system monitored the participation and alerted the group if the participation was uneven, prompting
the group to ensure that all group members contributed. Furthermore, the bot scripted the discussion based on
the stages suggested by Tuckman to promote knowledge construction, for instance by asking follow-up
questions and steering the conversation back on-topic if necessary.
Feng described a system that only provided support upon request. Specifically, groups could ask the
system to provide summaries of or feedback on the group’s discussion. An analysis of the interaction between group
members and the chatbot revealed that most of the groups’ requests were cognitive interactions, such as asking
task-related questions or requesting the chatbot to perform tasks like formatting outputs. In other examples where
GenAI assumed this role, the system was designed to promote reflection about the collaboration process, for instance
by monitoring a group’s interaction in terms of group norms and promoting reflection about the interaction in the
group, by helping groups interpret and reflect upon information from a learning
analytics dashboard during a debriefing session after a collaboration phase, by providing
feedback on texts that were created during a knowledge-building activity, or by facilitating the
process of providing peer feedback.
A GenAI system may also assume the role of a tutor or dialogue partner. This role relates to that of a teacher, but instead of providing guidance on the interaction in the group, the GenAI system serves as a partner in a dialogue aimed at developing domain-specific knowledge. For example, Ahlström et al. described a collaborative learning scenario where groups interacted with digital characters in a virtual reality (VR) environment (e.g. a hurricane evacuee or a climate scientist) in order to collect information that was needed to solve a task. These characters represented a storytelling element and were not designed to act as pedagogical agents. A more widespread implementation of pedagogical dialogue partners can be found in arrangements where an individual learner interacts with an artificial agent acting as a tutor or partner for Socratic dialogue. As one such example, Goda et al. described a chatbot-based system that was structured around a set of principles for Socratic dialogue, such as providing learners with structured questions that are expected to elicit critical thinking and encourage them to analyse their own reasoning, instead of providing direct answers to students' questions. In a similar vein, Dang et al. developed an LLM chatbot for interaction during a mixed-reality learning activity. The chatbot was designed to make use of the conversation history between learner and chatbot in order to provide personalised responses that sought to promote critical thinking, such as asking open-ended questions and encouraging elaboration and reflection. We wish to note that this application of generative AI arguably represents a fringe case within the broader landscape of collaborative learning, which traditionally emphasises interaction between at least two human learners. We include it here since the interaction with an artificial interaction partner can elicit cognitive processes conducive to learning that can also be found in collaboration between human learners (e.g. eliciting argumentation and self-explanations). However, a separate discussion about whether these contexts truly represent collaborative learning is required.
Finally, generative AI may also be implemented as an artificial group member. In this role, the system is positioned as part of the group and participates in the learning activity, for example by contributing domain-specific knowledge to dialogues or by facilitating interaction. One of the reviewed studies, for instance, implemented support for learning about interdisciplinary collaboration: students were assigned the roles of different experts and then collaborated to solve a problem in a team. In one group, a conversational agent based on ChatGPT functioned as a peer group member. This agent played the role of an engineer and contributed the corresponding expert knowledge to the discussion in the group. Similarly, Hernandez-Leo et al. implemented an artificial peer in a knowledge-building activity. This agent participated in the activity and was designed to elicit discussions and promote critical thinking skills by submitting its own answers to the knowledge-building environment, rating other students' answers, and participating in the knowledge-building discussion.
While assuming these different roles, the support can scaffold different aspects of the collaboration to achieve its
goal, that is, different targets. In the following, we present examples of different targets that we
identified in the literature (Figure 4.3). Notably, in some studies, groups were free to choose how they used the GenAI
tool. Not all of the studies explicitly reported the
target of the support, in which case the target(s) of the support can only be assumed based on the reported results
or illustrative screenshots provided by the authors.
Support can target cognitive aspects of the collaboration, for instance by providing information or resources that the
group needs to solve its task and learn. One of the reviewed studies, for example, implemented a chatbot based on ChatGPT that acted as a peer during the
collaboration. One version of the chatbot in this study provided disciplinary knowledge that was necessary to solve
the collaborative task. In other studies, the LLM-based system aimed to elicit cognitive processes such as elaborating
on new information.
There are also approaches where GenAI tools are provided so that groups can offload tasks to them. For instance, Wei
et al. provided groups with a combination of different GenAI tools (ChatGPT, Midjourney, Runway) which
performed different tasks for the group, such as creating storyboards, images, and videos. These approaches, however,
cannot necessarily be conceptualised as scaffolding.
Other authors explored how generative AI can scaffold socio-cognitive interaction patterns that afford the
co-construction of knowledge in the group, or promote reflection processes. One such system, for example, highlighted certain parts of the groups' solutions and asked the group members to reflect on their solution and
explain concepts included in the solution.
Metacognitive aspects of the collaboration have often been scaffolded by providing groups with feedback about
their work. Dal Ponte et al. presented a system that evaluated a group’s solution and then provided the group with feedback about
the quality of the solution. Groups could then react to the feedback and revise their solution. Moreover, when groups
were free to choose how to use an LLM chatbot, some explicitly requested feedback on their written solution.
Another popular method of providing support in the context of feedback is peer feedback. Greisel et al. presented two approaches to how generative AI
may facilitate this process. The first approach (contribution by Greisel et al.) consists of a system
that processes a student-written text and generates a message with feedback for this text, while another
student (the peer reviewer) also reads the text and creates their own message with feedback. Subsequently,
the peer reviewer reviews both feedback messages and creates a feedback message that combines the
machine-generated message and the human-generated message, before sending the feedback message to the
student who created the text. In the second approach, an LLM provided feedback on a peer reviewer’s feedback
message. With the help of this machine-generated feedback, the peer reviewer then revised their own feedback
before sending it to the original student. Noroozi et al. further noted that an LLM may also help learners to
reflect on the peer feedback they have received or to revise their text using peer feedback.
Another way in which generative AI systems can provide feedback relates to promoting collaborative reflection about
the collaboration process. The system described by Ko and Foltz (contribution in Greisel et al.) monitored the
collaboration in terms of previously negotiated group norms and provided examples from the collaboration process
so that the learners could reflect upon their collaboration after collaborating.
Some systems target social aspects of the collaboration, such as the participation of the group members or off-topic discussions. As described
above, the systems implemented by Cai et al. monitored the amount of participation
in the groups and prompted groups to regulate the distribution of participation if it was uneven. Similarly, An et al. implemented an artificial group member that monitored the interaction to detect moments of uneven
participation, prompting inactive group members to contribute to the discussion.
The system by Cai et al. additionally monitored the content of the conversation to identify off-topic talk. If
off-topic conversations were detected, the system would prompt the group to return to the main task.
Other systems focused on guiding and orchestrating the collaboration on a behavioural level, for example by leading
the group through the task. For instance, Liu et al. designed a system that moderated the collaboration by guiding group members through the different phases of the
collaborative task, explaining tasks, and orchestrating turn-taking during discussions. The system presented by Lin et
al. served as a summarisation aid to create feedback for the group and propose potential further topics
with which the group may familiarise itself. Specifically, based on the contributions of the group members, the system
created feedback messages that summarised (the model’s “understanding” of) what the group members already
understood and how they could proceed with their task.
Finally, some systems were designed to support socio-emotional aspects during collaboration, which includes
motivating students, regulating emotions during learning, or giving compliments. While studies such as those
reported by Dang et al. and Feng investigated the interaction between learners and generative
AI systems in terms of socio-emotional processes, studies in which generative AI is deployed to specifically promote
beneficial socio-emotional processes or states are rare.
As we have illustrated, GenAI-based systems (usually LLMs) can assume several different roles and target different
aspects of the collaboration. The ultimate aim of supporting groups is to afford interaction patterns that benefit
groups in terms of achieving their goals, for instance learning. Learning may encompass the acquisition of domain-specific knowledge as well as knowledge that enables group members to collaborate more effectively. It is important to note that beneficial effects of a support tool depend on whether and to what extent the tool can elicit conducive
interaction patterns. Against this background, asking about “the” effect of generative AI on collaborative learning is
not precise enough. Instead, we should ask how learners and groups interact with a tool while working on a task,
which individual and collaborative learning processes are sparked in this way, and how this in turn affects outcomes
such as the co-construction and acquisition of knowledge. This insight comes from the so-called media/methods
debate, which is essential for contemporary research on technology-enhanced learning. In the following,
we briefly summarise the evidence on the effectiveness of generative AI support for learning.
To initiate and maintain effective collaboration that eventually leads to learning, groups need to perform a broad
variety of actions. Despite the crucial role
of collaboration skills, only a few of the studies we found investigated them as an outcome (or dependent) variable.
One example is the study by Darmawansah et al., who compared argumentative speaking performance,
as well as the complexity of the arguments, before and after collaborating with support from a generative AI system
when learning English as a foreign language. In one phase, the groups could query ChatGPT for information about
the topic, and in a subsequent phase, groups used predefined prompts for ChatGPT that processed the input from
the group, for instance "Rewrite these arguments using the argumentation model”. The results revealed a significant
large effect (η² = 0.33) of the scaffolding on students’ argumentative speaking performance and on the complexity of
the arguments that students were able to provide after the collaboration.
Another study, by X. Wei et al., examined the effect of a combination of different GenAI tools (ChatGPT,
Midjourney, Runway) during a digital storytelling activity on students' collaborative problem-solving skills. Over
several weeks, students created digital videos that told short stories. Groups in the experimental condition used
the different GenAI tools to create storyboards, images, and videos. As the dependent variable, the authors used a questionnaire on which participants rated statements regarding their past collaborative behaviour and potential
reactions to hypothetical collaborative situations. Learners who collaborated using these tools achieved a higher
score on this measure than did their peers who collaborated without GenAI tools (large effect, η² = 0.16).
Taken together, there is still very limited research on the effects of different approaches to using generative AI in
order to scaffold collaborative learning with the aim of promoting collaboration skills. Further studies in the field have
investigated other outcomes or characteristics of the collaboration process, such as group performance, interaction patterns and participant roles that emerged during collaboration when
participants had access to a generative AI system, reflective
thinking, the perceived influence of ChatGPT on group dynamics, or the overall perception of a generative AI chatbot.
Research emphasises that scaffolds are effective if they are designed in accordance with didactic principles. Against this background, we explored how generative AI systems were designed so as to provide groups with adaptive support.
One approach utilised by authors to afford personalised scaffolding was to rely on the generative model (i.e. LLMs)
to perform monitoring, diagnosis, and decisions about specific instructional actions. In some cases, groups could use the generative AI as they wished;
consequently, the specific adaptive actions from the system were dependent on groups’ prompts or the behaviour
that was transmitted to the LLM. In these cases, if the model received a request for feedback or guidance, the resulting support depended on the qualities of the underlying model (e.g. LLMs such as GPT-3.5).
Other researchers used prompts to instruct the GenAI system how it should react. One such approach is to specify a
role that the system should assume and specific evaluation criteria that the system should use to assess inputs from
a group. For
instance, groups participating in the study by dal Ponte et al. (2023[106]) were tasked to develop evaluation plans, which
were subsequently assessed by ChatGPT. The authors provided the model with an instructional prompt delineating
the role that the model should assume (i.e. an expert) as well as criteria against which the model should evaluate the
group's solution: “Act as an expert in the learning health systems framework with a focus on socio-technical evaluation
plans. [...] Meticulously analyse its content based on the specific categories” (p. 2). These categories included the
method of evaluation: “Assess the appropriateness, robustness, and feasibility of the chosen evaluation method”
(p. 2), or the data source: “Critically examine the listed data sources for their relevance, reliability, and potential to
address the evaluation’s objectives” (p. 2).
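The following sketch (in Python) illustrates how such a role specification and evaluation criteria might be assembled into a single instructional prompt, loosely following the pattern quoted above. The wording, the structure, and the commented-out call_llm function are placeholders and should not be read as dal Ponte et al.'s actual implementation.

ROLE = (
    "Act as an expert in the learning health systems framework with a focus "
    "on socio-technical evaluation plans."
)

CRITERIA = {
    "method of evaluation": (
        "Assess the appropriateness, robustness, and feasibility of the "
        "chosen evaluation method."
    ),
    "data source": (
        "Critically examine the listed data sources for their relevance, "
        "reliability, and potential to address the evaluation's objectives."
    ),
}

def build_feedback_prompt(group_solution):
    """Combine role, criteria, and the group's solution into one prompt."""
    criteria_text = "\n".join(
        f"- {name}: {instruction}" for name, instruction in CRITERIA.items()
    )
    return (
        f"{ROLE}\n"
        "Meticulously analyse the plan below based on the specific categories:\n"
        f"{criteria_text}\n\n"
        f"Evaluation plan submitted by the group:\n{group_solution}"
    )

# feedback = call_llm(build_feedback_prompt(plan_text))  # placeholder call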
An example of a prompt that was used to design an artificial peer can be found in An et al., who tested a
chatbot for collaborative learning. The authors gave GenAI models (Ernie-Speed-128k and Qwen LLM) prompts such
as “You are a student agent named Alice who will participate in a group discussion with several students on [topic]”
(p. 3), or “Once the discussion starts, you should guide the conversation and help students delve deeper into the
questions being discussed by sharing various viewpoints.” (p. 3). The artificial agent was implemented in a chatroom
where students could collaborate and participated in the discussion.
A more complex system, which makes use of multiple LLM-based agents, was described by Wu et al., who
examined a multi-agent system that was used to support groups of learners during a programming task. The authors
specified the system’s behaviour by giving it directions about the behaviour the system should exhibit (e.g. style and
persona), the format the system’s responses should have, and example responses.
An alternative approach to researchers or teachers providing the evaluation criteria that should be used by a
generative AI system is to allow the groups to determine the criteria themselves. One such example was presented
by Ko and Foltz (contribution in Greisel et al.), who described a generative AI system that monitors groups'
discussions against the background of group norms that were designed by the groups prior to the collaboration.
Afterwards, the system facilitates the reflection on the collaboration by providing examples from the collaboration
process in which these norms were visible.
As an alternative to using qualitative instructions to steer adaptive interventions by the generative AI system, interventions may be based on production rules. For instance, the system described by Cai et al. counted the number of contributions made by each group member. If the relative number of contributions of a group member fell below 10%, then the system would issue a prompt to encourage this group member to participate more. Similarly, Naik et al. implemented several production rules that used expression-based matching to identify specific types of code fragments that groups used while solving their programming tasks. Each production rule was aimed at encouraging reflection about one of five learning goals. For example, if the code entered by the students included a specific expression that aimed at altering the data type of a column, then the system would prompt the group to discuss their selection.
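A minimal sketch of such an expression-based production rule is shown below (in Python). If the group's code matches a pattern associated with a learning goal, a reflection prompt is issued; the patterns and prompts are illustrative assumptions, not the rules used by Naik et al. or Cai et al.

import re

REFLECTION_RULES = [
    # hypothetical learning goal: choosing appropriate data types for a column
    (re.compile(r"\.astype\("),
     "You changed the data type of a column. Discuss in your group why this "
     "type is appropriate for your analysis."),
    # hypothetical learning goal: handling missing values
    (re.compile(r"\.dropna\(|\.fillna\("),
     "You handled missing values. Discuss what this choice means for your "
     "results."),
]

def reflection_prompts(student_code):
    """Return a reflection prompt for every rule whose pattern matches."""
    return [prompt for pattern, prompt in REFLECTION_RULES
            if pattern.search(student_code)]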
Two further approaches to guide a generative AI system in providing instructions to groups are to fine-tune the
system and to provide it with additional information about the learning context. Fine-tuning consists of adapting a
pre-trained model for specific tasks or use cases using a smaller, specific dataset. One example of fine-tuning can
be found in An et al. While the authors stated that they used “extensive tutor dialogue data [...] to train
the LLM, ensuring the generation of more professional responses for instructional and pedagogical guidance” (p. 2),
they did not provide further details regarding the corpus of tutoring dialogue or the training procedure.
Lin et al. also mentioned that they fine-tuned ChatGPT before implementing it into their GPT-Assisted
Summarization Aid (GASA), but did not provide further information on this.
Instead of fine-tuning an LLM, Feng provided their LLM chatbot with presentation slides and a description of
the collaborative problem-solving task that were used in the course that students attended, in addition to specifying
the chatbot’s role and desired behaviour (see above). Feng argued that this approach is associated with
lower costs than fine-tuning and allows for adjustments as soon as the system is in use.
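A minimal sketch of this 'provide context instead of fine-tuning' approach is shown below (in Python): course materials and a role description are injected into the system prompt so that the chatbot stays grounded in the course. The directory name, the role text, and the commented-out call_llm function are assumptions for illustration, not Feng's implementation.

from pathlib import Path

def build_system_prompt(materials_dir, role_description):
    """Concatenate course materials into the system prompt."""
    materials = "\n\n".join(
        p.read_text(encoding="utf-8")
        for p in sorted(Path(materials_dir).glob("*.txt"))
    )
    return (
        f"{role_description}\n\n"
        "Use only the following course materials when answering questions "
        "about the collaborative problem-solving task:\n\n"
        f"{materials}"
    )

# system_prompt = build_system_prompt("course_materials/",
#                                     "You support a student group ...")
# answer = call_llm(system_prompt, user_message)  # placeholder call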
In response to generative AI, educational researchers and developers are exploring novel ways to adaptively support
collaborative learning using this new type of machine learning model (usually LLMs). To determine how best to design
effective support, and to establish for which learning outcomes such support has proven beneficial, we analysed a
broad sample of studies. Overall, we identified and illustrated the potential roles of GenAI support in collaborative
learning settings – often higher education settings. Upon closer scrutiny of the studies, we saw that GenAI systems
were often designed to assume the role of a tutor or facilitator, or to function as a repository of information (akin
to a search engine), while the role of artificial group members was explored less frequently. The support targeted
different aspects of the collaboration in order to promote collaborative learning, for example by providing domain-specific information, facilitating discussions, eliciting reflective thinking, or nudging equal participation of all group
members. Moreover, a significant proportion of the studies ultimately relied on human intelligence to implement
evaluation criteria and to specify rules to govern the desired pedagogical behaviour of the generative AI (rather than
relying on the LLM to make suggestions).
Given the small number of studies that have systematically investigated the benefits of GenAI support for knowledge
acquisition, it is currently somewhat difficult to determine how these potentials of GenAI to support collaborative
learning can best be leveraged. In terms of promoting the acquisition of domain-specific knowledge, results are
mixed, with two of the three studies that focused on domain-specific knowledge as an outcome finding small to
medium effects, although these studies targeted a very specific domain, namely computer programming. Studies that focused on learning to collaborate – instead of
collaborating to learn – were especially rare. However, Darmawansah et al. reported promising findings with
respect to supporting students’ argumentation skills.
Our overview shows that the potentials of GenAI are indeed being explored, and conceptual and empirical work
continues. Previous research on collaborative learning has already accumulated a wealth of insights into boundary
conditions for effective collaboration. These insights represent a fruitful basis for continuing research on adaptive
support that leverages GenAI, especially LLMs. Most importantly, collaborative learning scenarios should create social
interdependence, while support is most beneficial when it can elicit interaction processes (cognitive, metacognitive, motivational, or affective) that benefit the knowledge construction in the group.
Conceptually, it is important to discuss which roles GenAI can realistically play in the process of detecting and diagnosing
collaboration processes, how it can generate adequate instructional support, and how it may be integrated into
existing well-researched approaches to supporting collaboration. Empirically, it will be beneficial to explore whether
GenAI models actually provide the support that designers are expecting, whether the supporting actions are in line
with beneficial actions identified by previous research, and whether supporting actions by GenAI elicit beneficial
collaboration processes. Ideally, the conceptual and empirical perspectives should be combined in order to design
and test collaboration support. In the final section, we share ideas on designing collaboration support as well as
essentials for the design of future studies in this field. In doing so, we hope to encourage researchers and developers
to collaborate and to continue charting out the potentials of GenAI for collaborative learning.
After illustrating the current landscape of the use of generative AI in collaborative learning contexts, the question
arises of what the future may hold. As researchers, educators, developers, and policymakers, we must play an active
role in shaping this future by utilising educational theories and empirical evidence to design instructional support that
benefits learning in small groups. This section presents our thoughts on the current discourse and hopes regarding
the development and implementation of generative AI for collaborative learning, but also regarding the necessary
research to inform the development of support for collaborative learning.
While there is already ample research on collaborative learning and adaptive support for collaboration, initial hopes for leveraging GenAI to support collaborative learning tend to rely on the use of LLMs as a stand-alone solution
for instructional support. Given the rich tradition in adaptive collaboration support, this trend raises an important
question: which aspects of the design of LLMs warrant the assumption that LLMs are indeed capable of performing
all actions necessary for effective support in collaborative learning?
As illustrated above, to generate adaptive support for groups, generative AI models (and humans) must be capable
of performing activities such as those laid out in Molenaar’s detect-diagnose-act framework.
Research in the areas of CSCL, learning analytics, and AI in education has been exploring how to derive indicators
from multimodal data streams that represent objective, reliable, and valid operationalisations reflecting relevant
characteristics of learners, groups, and collaborative interaction. These efforts have shown that this is no trivial task.
Besides conceptual work, there are also
empirical approaches that aim at exploring multimodal data streams (e.g. video, voice, computer log files) to detect
aspects of the collaboration process. Notably, the results described in the overview by Schneider et al. (2021[126]) point to persisting
challenges in deriving valid indicators for different aspects of collaborative learning.
Supposing that the type and goal of the support have already been decided, a GenAI model then needs to determine
whether support is required and then generate supporting actions that elicit productive interaction (e.g. by providing
feedback, contrasting cases, or prompts). In other words, instead of merely predicting a ‘next token’ (i.e. parts of
words or sentences) generative AI systems are assumed to be capable of predicting ‘the next instructionally adequate
actions’. Given criticisms that LLMs represent ‘stochastic parrots’ and potentially generate
advice that reproduces neuromyths, as Richter et al. illustrate, one may wonder whether it is realistic to
expect LLMs to be capable of providing instructionally beneficial support, in the sense of a ‘next token pedagogy’. In
this regard, a thorough discussion among researchers and developers is needed to clarify our expectations of LLMs in
terms of their ability to detect relevant characteristics of collaborative learning and groups, leverage this information
for diagnoses, and generate adequate supporting actions that address the diagnosed states. This discussion should
be accompanied by empirical investigations into the functioning of LLM-based support that compare machine
outputs with best practices derived from research on learning and instruction. As we have seen, many researchers
specify pedagogical guidelines that the LLMs are expected to follow. However, whether the systems indeed produce
the desired outputs in response to events during collaboration or groups’ requests has rarely been reported.
A second approach (and question to discuss) is whether generative AI applications such as LLMs can complement or
expand established means of collaboration support. Given the wealth of research on collaboration support, it is surely
beneficial to explore how to leverage the capabilities of generative AI in combination with already existing types of
collaboration support. The most prominent features of LLMs are their natural language processing capabilities and
their potential dialogic nature (if implemented alongside a chat interface). A core value of collaborative learning is the interaction between humans, who support each other and co-construct knowledge. Provided that it does
not undermine human-human contact and interactions, generative AI may have its place in collaborative learning. For
instance, it may facilitate various aspects of the collaboration (i.e. targets), such as social interaction, socio-cognitive
processes, or group metacognition (see above). To this end, the support must elicit beneficial interaction patterns,
including sharing information, giving explanations, and monitoring the collaboration (King, 2007[15]; Kreijns, Kirschner
and Jochems, 2003[129]; Nokes-Malach et al., 2019[130]).
Here are a few potentially fruitful contributions of generative AI to already established types of collaboration support.
First, an LLM may adapt the phrasing of an external collaboration script (Kollar, Wecker and Fischer,
2018[36]) to the competence of the individual group members, for example by using easier terminology or providing
suggestions on how the activity prompted by the script may be carried out. If the script is more fine-grained, for instance
on the level of individual utterances or sentence-starters (scriptlets: Kollar, Wecker and Fischer, 2018[36]), an LLM may
help formulate messages that are tailored to the context of the task and the group’s conversation history. At the
same time, there is the potential for groups to interact with and adapt the support during collaboration: if support
is delivered by a chatbot, groups may ask clarifying questions, such as how to implement a particular collaboration
strategy. Groups may also modify the degree of support, for instance by increasing or decreasing the amount or
granularity of support, or by selecting specific aspects of their collaboration for which they wish (or no longer wish)
to receive support. In this case, GenAI provides not only adaptive support, but also adaptable support, that is, the
group can modify the support based on their needs during collaboration.
When integrated into conversational agents or intelligent tutoring systems, an LLM may render the
artificial agent’s text output more human-like, allow for natural language input from the learners, provide prompts
that are more context-aware, and take into account the history (i.e. context) of the conversation between the learner
and the artificial agent, such as the goals that the group is currently pursuing. Accordingly, learners may engage with
the support messages more actively. The same may apply to systems that do not take the role of a facilitator, but of a
peer that has to be taught by the group and thus elicits beneficial interaction patterns.
The effects of such applications of GenAI should be subject to empirical inquiry. It is essential to note that outputs
from LLMs are likely to suffer from encoded biases and stereotypes (for examples in learning contexts, see Kotek et al.) or to contain incorrect information that may even include neuromyths about learning and
teaching. Researchers, developers, and teachers must be sensitive to such issues and test a
system’s performance before implementing it on a broader scale.
Finally, generative AI, and especially LLMs, may be a valuable addition to the analytics pipeline of adaptive support
systems due to their natural language processing capabilities. In this context, GenAI is not directly responsible for diagnosing the collaboration or deploying pedagogical
actions but is part of the analysis of the interaction (e.g. dialogue in the group).
So far, we have discussed various specific ways to leverage GenAI to support collaboration. Next, we highlight
some more general aspects to consider when designing support that includes GenAI. Importantly, the design of
adaptive collaboration support needs to centre on challenges that groups experience, as opposed to employing
GenAI for its own sake. Thus, researchers and developers should design support that targets specific challenges
that may arise during collaborative learning. Here, the discussion returns to our assessment of the current state of
research on the effects of GenAI to facilitate collaborative learning. While the evidence regarding tangible beneficial
effects is currently limited though positive, we can conceptualise collaboration support that might be effective. The
mechanisms underlying effective collaboration and the challenges for groups are well documented, and previous work has illustrated the importance of designing support carefully based on
established theories and empirical evidence. Luckin and Cukurova illustrate one way in which insights and
methods from the learning sciences can guide the design of effective instructional support.
We have proposed conceptualising adaptive support as scaffolding. From this perspective, support does not remain
available to the groups, but is rather faded out as learners become increasingly competent to perform all aspects of
the task on their own. The ultimate goal of scaffolding is to achieve learning that develops
over time and leads to independent performance, as opposed to increasing the mere performance in the moment
(and perhaps only as long as support is available).
More specific guidance in designing support can be found in the framework of CSCL design dimensions presented
by Rummel (see Figure 4.1), to which developers can refer in order to familiarise themselves with design
decisions and potential options and subsequently utilise theoretical and empirical work to inform the specific design
of the support. One perspective that may be fruitful when reflecting on these more fine-grained design decisions
is the concept of cognitive offloading. This perspective highlights the question of
which (cognitive) activities are delegated (e.g. to an object in the environment or an artificial agent) and thus not (yet)
performed by human learners. While offloading or delegating certain tasks may lead to an increase in performance, it should be kept in mind that specific activities represent learning opportunities (or desirable difficulties)
for learners, and delegating them may thus have undesired effects. From this perspective, there are good arguments for on-loading (and supporting) activities
such as generating and collaboratively exploring different arguments and explanations or supporting other group
members in monitoring and regulating their engagement during collaboration.
Finally, the design, effects, and consequences of supporting learning, especially when leveraging machine learning
models, should be considered in the light of ethical, legal, and social implications (ELSI) or aspects (ELSA). Technology is never neutral, and we therefore need to reflect
not only upon our use of technology but also upon its design (ethics by design). Guidance for such reflection might be found in the field of applied ethics, especially in specific branches such as the ethics of technology, the ethics of artificial intelligence, and the ethics of artificial intelligence in education. A more practical approach to ethical reflection is offered by Simis, who provides suggestions and guiding questions. Given the current trend of exploring the use of LLMs in
educational settings, we summarise some insights that might be included during ethical reflection: while there are
undoubtedly potential benefits of utilising GenAI as part of educational technologies, we must be aware of potential
costs (legal, social, or economic) associated with them. Bender and Bender et al. describe some
of the costs linked to specific LLMs, including the question of which materials companies can legally use to train their
models or the precarious working conditions of the
data workers involved in the training and fine-tuning of models. Other costs
are biases and stereotypes that are encoded in machine learning models, and the energy necessary to train models
and process user queries. With this in mind, we have to discuss which costs we are willing to tolerate in
exchange for benefits such as positive effects (of a particular magnitude) on learning outcomes and on goals such as the UN Sustainable Development Goals (United Nations), for instance quality education,
gender equality, or decent work and economic growth. Here, we have to keep in mind that the costs associated with
LLMs can be expected to differ depending on the specifics of the development process (training) as well as our way
of implementing them in educational contexts.
As we have illustrated, more research on the effects of GenAI support is necessary to better inform the design
of collaborative learning settings. Therefore, we sketch out relevant design aspects of empirical studies that are
essential for creating robust and sufficiently reliable evidence to inform stakeholders’ decisions about the design of
effective collaboration environments.
Our review revealed that greater research attention needs to be placed on roles of support, goals of support, and
aspects of the collaboration that instructional support may address to achieve these goals. Thus, one avenue for
future research is to explore these more thoroughly. Such research could be guided by the framework of CSCL design
dimensions proposed by Rummel. Given the crucial role of collaboration processes for learning, research
should also investigate processes that occur in the group, such as which roles the learners in the group take when
interacting with each other and with an artificial agent,
the quality of the interaction, the regulation processes, or the processes of knowledge co-construction. As
noted by Cukurova, research should not be limited to cognitive outcomes. Collaborative learning offers
opportunities to learn how to build relationships with peers, practice emotional regulation, build self-esteem, or
acquire metacognitive capabilities.
However, future research might not only expand on potential targets and goals of collaboration support, or
systematically investigate other CSCL design dimensions, but might also investigate the consequences of introducing artificial agents into groups, such as altering the interaction in the group and thus how
information is processed by human learners. For example, the review by Vaccaro et al. highlights that the
success of human-machine teams is not guaranteed. Instead, performance benefits of human-AI teams are more likely only when the human alone would outperform the AI alone.
From the perspective of cognitive offloading, the introduction of artificial agents may
lead human learners to delegate tasks to the artificial agent instead of performing these tasks themselves. This may
undermine activities that are essential for learning, especially generative learning activities that pose desirable difficulties for the learners. The same applies to monitoring and controlling the collaboration. For the
context of individual learning, Fan et al. found that some learners delegated metacognitive processes to
an artificial agent, a finding they termed “metacognitive laziness”. At the same time, it is important to acknowledge
that cognitive offloading is not necessarily always detrimental, and can be part of the fading-out process described by Reiser and Tabak or of (facilitative) co-regulation of collaboration.
In pursuing avenues like those described above, we seek to gain insights that will allow us to determine how to design
effective collaboration support for different educational contexts.
The basis for designing empirical studies is to formulate testable hypotheses about the expected mechanisms underlying how the
support affects collaboration processes. Ideally, we can then develop experiments that isolate the effect of this
support on outcomes such as knowledge acquisition. Conducting such studies in the context of collaborative learning comes with additional challenges, such as small sample sizes, randomisation at the group rather than the individual level, and statistical analyses that account for learners being nested within groups. Experiments on the effects of GenAI-based support are
especially informative for educational practice if an experiment compares this newly designed support with a ‘strong’
control condition, such as other types of support (e.g. collaboration scripts, conversational agents). A comparison
of outcomes from groups receiving adaptive support that utilises GenAI with the outcomes from groups receiving
no support or receiving ‘business-as-usual’ instruction may confound the effects of the support with other factors,
such as the novelty factor, and may not be very informative for educational practice. When
comparing conditions, it is further vital to check whether the adaptive system indeed provides the intended support
(i.e. implementation check). Otherwise, the results provide little insight into how to design and implement adaptive
support. This is especially relevant for situations requiring insights about the conditions under which different types
of support are effective, enabling us to select between alternatives.
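To illustrate why small samples and group-nested data make such experiments demanding, the following back-of-the-envelope sketch applies the standard cluster-design effect; the numbers are purely illustrative.

```python
# Effective sample size under group nesting: n_total / (1 + (m - 1) * ICC),
# where m is the group size and ICC the intraclass correlation.
def effective_sample_size(n_groups: int, group_size: int, icc: float) -> float:
    n_total = n_groups * group_size
    design_effect = 1 + (group_size - 1) * icc
    return n_total / design_effect

# Example: 30 groups of 3 learners with a moderate ICC of 0.2
print(effective_sample_size(30, 3, 0.2))  # ~64.3 rather than 90 independent observations
```

Planning for such reductions in effective sample size, and analysing data with models that respect the nested structure, helps to avoid overstating the evidence for a given type of support.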
Ideally, there will come a time when ample evidence from rigorous research is available that can be synthesised into
overviews such as systematic literature reviews or meta-analyses. This appears to be a challenge pertinent to the
current generation of research on the effects of generative AI on learning, as illustrated by Weidlich et al.
Thus, we need more research that yields robust findings.
We can tackle the complexity of developing and evaluating GenAI support for collaborative learning by bringing
together expertise from fields such as the learning sciences, artificial intelligence, computer science, and educational
practice. Such interdisciplinary collaboration can lead to more comprehensive research designs, more nuanced and
robust data analysis, and consequently a deeper understanding of how a specific type of support affords collaborative
learning. While such collaborations require more time and effort, we believe that the costs required to generate
robust and reliable findings are justifiable given the consequences of premature conclusions, such as investing time in developing and implementing ineffective support or even hampering
learning. One framework for conducting interdisciplinary research that aims to exert an impact on practice in authentic
learning contexts is design-based research.
Instead of ‘moving fast and breaking things’, we advocate that the costs for conducting research that produces
reliable insights are well spent if harms are mitigated and we are able to have confidence that the support indeed
benefits our learners. As researchers, educators, developers, and policymakers, it is our responsibility to shape a
future where educational technology is used to afford meaningful collaboration and support effective interaction
that serves learning. To this end, technologies need to be designed thoughtfully. As Schleicher notes,
“digital technologies are also creating opportunities that will amplify great teaching, even if great digital technology
can never replace poor teaching” (p. 68). Therefore, considering insights from research on collaborative learning is
the backbone of designing support that benefits groups: support that not only fosters the acquisition of domain-specific knowledge, but also helps learners become capable team members.
We would like to thank our colleagues in the research group who inspired and helped us sharpen the chapter, as
well as Stéphan Vincent-Lancrin for his helpful comments on an earlier version of this chapter. Special thanks go to
Nadine Lordick for sharing her thoughts on the “costs” and benefits of LLMs. Additional thanks go to Mutlu Cukurova
and Lenka Schnaubert.