
Brainstorming with a Generative Language Model: Effect of Exposure to AI Ideas on Brainstorming Performance and Cognitive Load

  • Open Access
  • 18-11-2025
  • Research Paper


Abstract

This study delves into the effects of generative language models (GLMs) on brainstorming performance and cognitive load, comparing human-only brainstorming with human-GLM collaboration. The research focuses on four key metrics: fluency, flexibility, novelty, and value, and explores the cognitive load experienced by participants. The findings reveal that while individual human performance may decrease when working with GLMs, the human-GLM dyad achieves superior performance across all metrics. The study also introduces the concept of 'smart loafing,' where humans reduce effort to maintain cognitive resources, and discusses the implications for tool design and management strategies. The research provides valuable insights into the dynamics of human-AI collaboration, offering practical recommendations for enhancing creative output in professional settings.

Supplementary Information

The online version contains supplementary material available at https://doi.org/10.1007/s12599-025-00974-y.
Accepted after one revision by Alexander Richter.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

Generating ideas is the divergent stage of a larger process for solving creative problems in organizations, such as developing new products or marketing slogans (Dell’Acqua et al. 2023; Bouschery et al. 2023; Sawyer 2021). A popular idea-generation technique is brainstorming (Maaravi et al. 2021). Traditionally, it was designed for human groups. Today, advances in generative AI allow a single person to brainstorm with an AI system rather than with a colleague (Kumar et al. 2025; Memmert et al. 2024b; Holzner et al. 2025).
As generative language models (GLMs) such as ChatGPT are rapidly adopted and often used for on-demand ideation when human partners are unavailable (Jiang et al. 2024), it becomes both timely and relevant to understand their influence on brainstorming outcomes (Kumar et al. 2025). GLMs – a term we use interchangeably with large language models (LLMs) – refer to generative AI systems pre-trained on extensive text corpora that can generate coherent, contextually appropriate text in response to user input. Specifically, we examine how GLMs affect classical performance dimensions – fluency, flexibility, novelty, and value (Dean et al. 2006; Holzner et al. 2025) – and participants’ cognitive load (Gonzalez et al. 2024). Particularly as concerns have been raised that working with AI may invite reduced effort, i.e., loafing-like behavior (Stieglitz et al. 2022; Liu et al. 2023), or, conversely, may increase cognitive load, for instance, because humans evaluate the outputs (Tankelevitch et al. 2024), it is vital for tool users and designers as well as organizations to understand GLMs’ effects.
Using GLMs for brainstorming extends the long tradition of leveraging technology to support, rather than replace, human creativity. Decades of work on creativity-support systems show how tools can assist individuals during idea generation and, therefore, provide a useful conceptual foundation. Prior research has documented both the metrics used to assess brainstorming performance (Dean et al. 2006) and the mechanisms within human groups that enhance or reduce them (Pinsonneault et al. 1999; Dugosh et al. 2000). Among the most prominent are cognitive stimulation – the additional ideas produced when individuals are exposed to others’ ideas – and social loafing – the tendency to reduce effort when working with others because one’s contribution seems less essential (Pinsonneault et al. 1999). These two mechanisms form the theoretical backdrop for our investigation of human–GLM brainstorming.
Building on these insights, individual creativity-support systems (ICSS) were developed to help individuals generate ideas more effectively. So-called stimuli-provider ICSS supply brief cues such as single words or short phrases that can trigger cognitive stimulation, but ICSS typically cannot generate complete ideas dynamically (Wang and Nickerson 2017). We argue that GLMs represent a qualitative leap beyond such systems in three key respects (cf. Brown et al. 2020). First, they are generative rather than retrieval-based: instead of offering predefined cues, they produce original, contextually coherent text by predicting word sequences. Second, they operate across a broad semantic range, drawing on knowledge encoded across vast corpora rather than limited domain databases. Third, they are adaptive: their outputs can be conditioned on the user’s prior ideas and requests, enabling iterative exchanges that resemble human-to-human ideation. Together, these properties allow GLMs to produce complete, context-specific ideas in real time, with creative performance approaching that of many humans (Haase and Hanel 2023; Holzner et al. 2025). Hence, integrating GLMs into brainstorming changes the nature of creative interaction itself – from systems that merely inspire humans to partner-like systems that co-generate ideas.
This leap in generative capability fundamentally reshapes how brainstorming unfolds (e.g., Gonzalez et al. 2024; Specker et al. 2025; Kumar et al. 2025; Houde et al. 2025; Zhan et al. 2024; Tao et al. 2023). In human dyads, two humans actively supply complete ideas that build on each other, whereas stimuli-provider ICSS only (reactively) provide brief cues that the human must develop further. A human–GLM dyad may lie between these cases: one human actively contributes complete ideas but can also prompt the GLM to generate complete, copy-ready ideas on demand. Nonetheless, the GLM remains fundamentally reactive. Prior research has mapped interaction dynamics in both human groups (Pinsonneault et al. 1999) and settings involving stimuli-provider ICSS (Gabriel et al. 2016; Wang and Nickerson 2019, 2017). This reactive but capable role makes human–GLM dyads distinct from both human-only groups and traditional ICSS setups, underscoring the need for dedicated theoretical and empirical study. At the same time, insights from both prior streams – group brainstorming and ICSS research – help anticipate how GLM-based collaboration might enhance or impede brainstorming performance.
Because a human–GLM dyad includes only one human partner, the individual must continually decide whether to invest effort in crafting ideas personally or to request the GLM’s copy-ready suggestions, and, once received, whether to refine or adopt them. This effort-allocation trade-off may be obscured in multi-human–GLM sessions, where social pressures and peer ideas may also influence contributions (Pinsonneault et al. 1999). By eliminating human–human dynamics, the dyad setting allows us to isolate the effects of high-quality, reactive GLM output on (1) the individual's idea production and (2) the dyad's joint output as well as any changes in cognitive load. Following the ICSS research tradition, we therefore compare the dyad with a solitary brainstormer who faces no external ideas or social influence, and, therefore, no comparable trade-off decisions. At the individual level, early studies of GLM-supported brainstorming with individuals or small groups report mixed findings: GLM ideas can make idea generation feel easier (Nomura et al. 2024; Specker et al. 2025; Zhang et al. 2025) but do not necessarily lead humans to produce more or more original ideas (Kumar et al. 2025). Accordingly, our first research question examines how human brainstorming performance differs when considering only ideas originating from the human – comparing those who work with a GLM to those who brainstorm alone.
A central ambition of human–AI collaboration is to achieve superior or complementary performance, that is, performance in the human–AI group (here, the dyad) that exceeds what a human can achieve alone (Dellermann et al. 2019; Hemmer et al. 2021, 2025). Given GLMs’ ability to generate human-level creative ideas and the potential presence of both positive (e.g., cognitive stimulation, cf. Gonzalez et al. 2024; Specker et al. 2025) and negative group mechanisms (e.g., social loafing, cf. Liu et al. 2023, or the lack thereof, cf. Nomura et al. 2024) in human–GLM brainstorming (cf. Zhan et al. 2024), the overall effect on performance remains uncertain. Studies on individuals or human groups suggest that GLM use can positively affect the number of ideas and topic coverage (Specker et al. 2025), yet it is unclear how these benefits translate to human–GLM dyads. Recent meta-analytic evidence (Holzner et al. 2025) indicates that while generative-AI collaboration modestly enhances creative performance overall, the effects vary substantially across tasks and even different ideation task types. Hence, for the kind of ideation task examined in our study, the impact of human–AI collaboration remains an open question. This underscores the need for more granular, mechanism-oriented research to better understand GLMs’ impact in brainstorming contexts. Thus, second, we examine how dyad-level brainstorming performance differs from that of human brainstorming alone.
Lastly, it was hypothesized that humans might use AI to maintain or reduce cognitive resources (Stieglitz et al. 2022; Jiang et al. 2024). However, when brainstorming with a GLM, humans must balance their effort between generating their own ideas and leveraging or evaluating the GLM’s suggestions (Gonzalez et al. 2024), which might require additional cognitive resources (Tankelevitch et al. 2024). Accordingly, our third research question examines whether working with a GLM influences perceived cognitive load as compared to brainstorming alone.
To address these research questions, we conducted a between-subjects experiment (n = 75) in which participants brainstormed using a purpose-built app. In the treatment condition, the app allowed participants to request ideas from a GLM on demand, whereas in the control condition (human-only), participants received no GLM support. Brainstorming performance was assessed across four established dimensions – fluency, flexibility, novelty, and value – at both the individual (human) and dyad (human–GLM) levels. After the session, participants completed a cognitive load scale and answered open-ended questions about their experience. Overall, human–GLM dyads performed significantly better than solitary humans. However, contrary to expectations, when considering only ideas originating from humans (excluding GLM ideas), participants in human–GLM dyads did not outperform those working alone and reported no significant difference in perceived cognitive load.
By investigating human–GLM brainstorming, we contribute to the long-standing information systems research stream on ICSS (Althuizen and Reichel 2016; Pinsonneault et al. 1999; Siemon et al. 2015; Wang and Nickerson 2017, 2019). We show that GLMs influence humans in ways that differ from those of traditional, stimuli-provider ICSS. Whereas stimuli-provider ICSS seek to improve individual brainstorming performance, e.g., by helping the supported human produce more ideas, GLMs – contrary to expectations – do not improve individual brainstorming performance and may even reduce the number of ideas produced. Moreover, we demonstrate that simply applying the well-documented performance mechanisms from purely human groups (Pinsonneault et al. 1999; Dugosh et al. 2000) to human–GLM dyads can mischaracterize a shift in effort as social loafing. Based on our findings, we offer a nuanced perspective on human effort, reflecting both the new opportunities (e.g., GLMs as capable partners) and challenges (e.g., shift in effort and cognitive load) associated with working with GLMs in creative contexts. In doing so, we contribute towards addressing key research challenges in human–AI collaboration by advancing our understanding of human–AI group dynamics (cf. Makarius et al. 2020; Benbya et al. 2024) and of human–AI superior – or complementary – performance (cf. Hemmer et al. 2021, 2025; Dellermann et al. 2019).

2 Background

2.1 Brainstorming Performance and Determinants

Brainstorming is a popular technique for generating ideas (Maaravi et al. 2021; Osborn 1953). Osborn (1953) suggested four brainstorming rules: (1) delayed judgment, (2) encouragement of wild ideas, (3) idea quantity, and (4) combining ideas. While initially proposed for groups (Osborn 1953), different group structures were explored to determine performance factors (Pinsonneault et al. 1999). This included comparing real groups, in which members brainstormed together, to nominal groups, in which humans brainstormed alone without interaction and without seeing others’ ideas (Coskun et al. 2000; Diehl and Stroebe 1987), i.e., similar to individual or solitary brainstorming.
Common brainstorming performance metrics include the quantity of ideas, so-called fluency (Paulus 2000; Nijstad et al. 2010), and idea quality. For individual ideas, typical quality dimensions are, e.g., novelty (or originality) and value (or usefulness or feasibility) of ideas (Althuizen and Reichel 2016; Paulus 2000; Diehl and Stroebe 1987; Nijstad et al. 2010). A dimension might combine multiple characteristics, e.g., for rating novelty, the closely related characteristics of originality and surprisingness might be included (Siangliulue et al. 2015b); hence, we use the terms interchangeably. Ideas are usually rated by blind-to-condition judges (Althuizen and Reichel 2016) or, more recently, by GLMs (Haase and Hanel 2023). To avoid penalizing high idea quantity, it is common to consider only the best ideas. This might be achieved by having participants self-select ideas and considering only those shortlisted ideas (Siangliulue et al. 2015a). While perhaps realistic for work settings, this introduces uncertainty regarding the selection quality. A different approach is the “good-idea-count” metric (Reinig et al. 2007), in which only ideas rated above a defined threshold are considered. Beyond individual ideas, a key aspect of brainstorming is how narrowly or broadly the topic was covered, i.e., the breadth of exploration or flexibility (Althuizen and Reichel 2016), which can be assessed by categorizing all ideas of a brainstorming session and counting the number of categories containing at least one idea (Althuizen and Reichel 2016; Nijstad et al. 2010). The metrics can be calculated on both the individual level and the (collective) group level to understand individual and group performance.
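These counting-based metrics can be made concrete with a small sketch. The record structure, rating scale, and good-idea threshold below are illustrative assumptions, not details taken from the study:

```python
from dataclasses import dataclass

# Illustrative idea record; the field names and 1-5 rating scale
# are assumptions for this sketch, not the paper's actual coding scheme.
@dataclass
class Idea:
    category: str    # category assigned during idea coding
    novelty: float   # judge rating, e.g., on a 1-5 scale
    value: float     # judge rating, e.g., on a 1-5 scale

def fluency(ideas):
    """Fluency: the raw number of ideas produced."""
    return len(ideas)

def flexibility(ideas):
    """Flexibility: number of distinct categories containing >= 1 idea."""
    return len({i.category for i in ideas})

def good_idea_count(ideas, threshold=4.0):
    """Good-idea-count: only ideas rated above a defined threshold are
    counted; the threshold value here is an arbitrary illustration."""
    return sum(1 for i in ideas if i.novelty > threshold and i.value > threshold)

# Toy session with three ideas in two categories
session = [
    Idea("recycling", 4.5, 4.2),
    Idea("recycling", 3.0, 3.5),
    Idea("education", 4.8, 4.1),
]
print(fluency(session), flexibility(session), good_idea_count(session))  # 3 2 2
```

The same functions can be applied to an individual's ideas or to a group's pooled ideas, mirroring the individual-level versus group-level calculation described above.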
In human brainstorming groups, several group mechanisms rooted in psychological and social theories are known to lead to performance gains or performance losses during the brainstorming session (Pinsonneault et al. 1999). Here, we discuss the three mechanisms of cognitive stimulation, cognitive inertia, and free riding, as these were observed to be relevant for our GLM-based brainstorming setting.
Cognitive stimulation refers to group members creating more or better ideas because they are stimulated by others’ ideas, an effect that can occur in real groups when working with others (Pinsonneault et al. 1999) or through creativity support systems (Siemon et al. 2015), though paying attention to others’ ideas is necessary (Leggett Dugosh and Paulus 2005; Dugosh et al. 2000). Additionally, the number of stimuli and the semantic relatedness of stimuli, i.e., the kind of (categories of) ideas, determine if and how ideation is affected (Baruah and Paulus 2016; Wang and Nickerson 2019, 2017).
Cognitive inertia refers to humans “embark[ing] on a single train of thought, which limits creativity and productivity” (Pinsonneault et al. 1999, p. 114), which is typically observed in nominal groups (Pinsonneault et al. 1999), where humans brainstorm alone. In contrast to real groups, humans in nominal groups are not exposed to outside stimuli and thus might come up with more ideas in the same category, i.e., higher depth or higher within-category fluency (Nijstad et al. 2010), as opposed to exploring new categories (i.e., higher flexibility).
Free riding or social loafing refers to humans reducing their effort and relying on others to complete the task (Pinsonneault et al. 1999), an effect occurring in real groups (Pinsonneault et al. 1999). Influencing factors include perceived dispensability of effort, i.e., feeling one’s contribution would not make much of a difference (Kerr and Bruun 1983), and diffused responsibility (Pinsonneault et al. 1999; Latané et al. 1979), i.e., feeling less responsible when being part of a group, consequently reducing one's effort (Latané et al. 1979). While those mechanisms were initially discovered in human groups, decades of research explored how to affect these mechanisms through technology (Pinsonneault et al. 1999).

2.2 Towards Human–AI Collaboration for Brainstorming

Building on the human group mechanisms above, we examine how these might translate when brainstorming with a tool rather than a human. In human groups, each human contributes ideas directly but can also spark cognitive stimulation through those ideas in the other humans, helping them to overcome inertia (Pinsonneault et al. 1999). However, when humans feel responsibility is shared or their effort is dispensable, e.g., when working with a capable partner, they may reduce their effort, i.e., social loaf (Pinsonneault et al. 1999).
Creativity support systems research investigates tools to enhance brainstorming performance (Wang and Nickerson 2017), e.g., supporting individuals or groups on a meta-level through procedural facilitation that is independent of the brainstorming question, or on a content-level, i.e., acting more like a contributor for the specific brainstorming question (e.g., Siemon et al. 2015; Althuizen and Reichel 2016; Hwang and Won 2021). In a human-tool dyad, these tools, labelled as stimuli-provider ICSS, supply brief cues like words, phrases, or short sentences related to the brainstorming question. Examples include an “AI-like” system that mines social-media stimuli (Siemon et al. 2015) and a hint generator based on an association dictionary (Althuizen and Reichel 2016). Such cues act as external stimuli that can spark new ideas and overcome cognitive inertia, much like cognitive stimulation in human groups. Unlike another human, however, ICSS typically do not dynamically and actively contribute complete ideas; they merely provide inspiration, leaving the human to elaborate the full contribution.
We argue that GLMs mark a qualitative leap. Studies show that text-based generative AI, more specifically, large language models (LLMs) or generative language models (GLMs), can generate creative ideas at humanlike creative levels across a variety of subjects and creative tasks (Haase and Hanel 2023; Organisciak et al. 2023; Holzner et al. 2025). Replacing an ICSS’s backend with a GLM, therefore, enables generating live, content-level support across many topics without prior stimulus curation. Instead of supplying incomplete ideas or cues, GLMs deliver fully formed ideas that can both stimulate the human and can be copied directly to the joint idea pool, much like when brainstorming with another person. Moreover, these systems can dynamically respond to user input, much like a human brainstorming partner could do. Indeed, Gabriel et al. (2016, p. 112) refer to systems capable of suggesting ideas during brainstorming as “colleagues”. While one might argue that GLMs are still just tools and not human, this may fail to acknowledge the new behaviors and dynamics that may emerge in human–GLM dyads (Liu et al. 2023).
Past research suggests that humans might socially respond to technical systems (Nass and Moon 2000), which could mean that similar group mechanisms arise in human-tool dyads and in human–human dyads (Siemon and Wank 2021). Indeed, for several ICSS not based on GLMs, the group mechanisms of cognitive stimulation or free riding known from human groups were explored in different constellations with non-humans (Siemon and Wank 2021; Althuizen and Reichel 2016; Siemon et al. 2015), including dyads of humans and tools (Stieglitz et al. 2022; Althuizen and Reichel 2016), and GLMs specifically (Nomura et al. 2024). Recent studies show that humans might attribute responsibility to the AI and might feel their effort to be dispensable (Stieglitz et al. 2022; Memmert and Tavanapour 2023; Memmert et al. 2025), which are known antecedents for social loafing (Pinsonneault et al. 1999).
In human groups, social loafing is considered a negative behavior, as humans reduce their effort at the expense of other humans. However, when working with AI, loafing might need to be judged differently (Stieglitz et al. 2022). Particularly, if the AI group member performs well, a reduction of effort might be desirable to maintain cognitive resources. Thus, as an adaptation to social loafing, the construct of “smart loafing” was proposed, defined as “the reduction of effort in human-[virtual assistants] collaboration to maintain cognitive resources and enhance efficiency in work” (Stieglitz et al. 2022, p. 758). If the AI group member indeed leads to a reduction in cognitive load, this could be particularly relevant for the creative task of idea generation, because cognitive load and creative performance are negatively associated (Redifer et al. 2021).
Figure 1 summarizes these contrasts in a conceptual framework. Human–human and human–GLM dyads both involve an “other entity” that contributes full ideas, which can also act as stimulating cues, whereas stimuli-provider ICSS contribute cues only; solitary brainstorming contains neither. Because GLMs share the “full-idea” property with human partners while remaining tools that are triggered by human action (rather than acting proactively, as a human would), human–GLM brainstorming research may be informed by insights from both ICSS and group-brainstorming research, highlighting mechanisms like cognitive stimulation and overcoming inertia, but also cautioning regarding potential loafing behavior.
Fig. 1
Conceptual framework comparing selected mechanisms in human dyads, human-tool dyads, and solitary humans
Going beyond investigating GLM and human ideation separately (Haase and Hanel 2023; Organisciak et al. 2023), we explore joint human–GLM sessions in the tradition of content-level ICSS in typical brainstorming studies (Wang and Nickerson 2017; Gabriel et al. 2016; Althuizen and Reichel 2016). Though prior research reported a preference for brainstorming with a GLM over brainstorming alone (Muller et al. 2024; Houde et al. 2025), signs of both performance-enhancing and performance-reducing mechanisms have been predicted or observed (Zhan et al. 2024; Specker et al. 2025; Memmert 2024; Memmert and Bittner 2024), leaving it uncertain whether a GLM-based tool improves performance compared to unaided brainstorming and calling for more research. With our study, we seek to contribute to the discourse on whether and how group mechanisms known from human groups occur when individuals work with AI (Siemon and Wank 2021; Siemon 2022b; Stieglitz et al. 2022; Liu et al. 2023; Siemon et al. 2015).

2.3 Hypotheses Development

We contrast a human-only baseline, in which every idea is generated solely by the participant, with a human + GLM treatment that lets participants both produce their own ideas and request (and copy) GLM ideas. This conceptual comparison yields two analytically distinct performance layers (Fig. 2):
  • Human performance: ideas authored by the human, and
  • Dyad performance: the combined output of the human and the GLM.
Fig. 2
Idea in- and exclusion for hypotheses with performance levels and conditions
Separating these layers is essential for isolating (a) the GLM’s influence on the human’s own contribution and (b) the net benefit of adding a GLM. Accordingly, our hypotheses proceed in three steps. Hypothesis Set 1 tests how a GLM alters individual output; Hypothesis Set 2 asks whether the dyad outperforms the unaided human; and Hypothesis 3 compares self-reported cognitive load. Figure 3 summarizes the specific hypotheses, which we develop next by drawing on established brainstorming mechanisms and recent empirical findings.
Fig. 3
Research Model
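The separation of the two performance layers can be illustrated with a short sketch that computes the same metrics once over the human's own ideas and once over the joint pool; the author labels and example data are hypothetical:

```python
# Sketch of the two analytic layers: the 'human' layer counts only
# participant-authored ideas, the 'dyad' layer counts all ideas in the
# joint pool. Author labels and example data are illustrative assumptions.

def layer_metrics(ideas):
    """ideas: list of (author, category) pairs, author in {'human', 'glm'}."""
    human = [(a, c) for a, c in ideas if a == "human"]
    return {
        "human_fluency": len(human),
        "human_flexibility": len({c for _, c in human}),
        "dyad_fluency": len(ideas),
        "dyad_flexibility": len({c for _, c in ideas}),
    }

# One hypothetical session: two human ideas in one category, one copied GLM idea
session = [("human", "recycling"), ("glm", "education"), ("human", "recycling")]
print(layer_metrics(session))
```

In this toy session the dyad scores higher than the human layer on both fluency and flexibility, which is exactly the kind of divergence between the two layers that Hypothesis Sets 1 and 2 probe separately.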

2.3.1 Hypothesis Set 1: Human Performance

For hypothesis set 1: human performance, we investigate how the human brainstorming performance, i.e., considering only ideas originating from the human, excluding GLM ideas, varies across the two conditions, uncovering how typical brainstorming performance dimensions are affected by GLM usage. By adding a GLM, individual human brainstorming is transformed into collaborative human–GLM brainstorming. Previous research on all-human brainstorming groups suggests that individual performance can be enhanced or harmed due to being part of a group (Pinsonneault et al. 1999). While these effects are well-documented in purely human groups, and while humans might socially respond to technical systems (Nass and Moon 2000), such effects in groups with non-human members (Stieglitz et al. 2022; Siemon and Wank 2021; Liu et al. 2023; Tao et al. 2023) are still being researched, i.e., it is uncertain whether humans themselves perform differently when working with or without a GLM (Tao et al. 2023; Specker et al. 2025). Previous research suggests group effects like cognitive stimulation, social loafing, or cognitive inertia to be relevant in our setting (Stieglitz et al. 2022; Tao et al. 2023; Memmert 2024; Memmert and Tavanapour 2023).
In brainstorming sessions, human fluency decreases after the first few minutes (Baruah and Paulus 2016; Kohn and Smith 2011). When running out of ideas, humans working with the GLM can request GLM ideas. These may act as stimuli, activating the human’s memory and triggering the human to develop more ideas of their own, i.e., cognitive stimulation could occur (Nijstad et al. 2010; Dugosh et al. 2000), increasing the human’s fluency (Dugosh et al. 2000; Althuizen and Reichel 2016) as compared to solitary humans, who do not receive stimuli. However, a human working with a GLM might reduce their effort (i.e., social loafing could occur), particularly if they feel their effort is dispensable or feel less responsible for the results (Pinsonneault et al. 1999). Given that GLMs can outperform many humans in generating ideas (Haase and Hanel 2023), humans might feel dispensable and might, therefore, reduce their effort. Additionally, previous research suggests that humans might attribute some of the responsibility to the AI (Stieglitz et al. 2022; Memmert and Tavanapour 2023; Memmert 2024), potentially leading to a reduction in effort and fewer ideas contributed. While the above suggests both performance-enhancing and -reducing effects, recent studies of multi-human groups suggest that the flow of GLM ideas may make it easier to come up with more ideas (Specker et al. 2025; Zhang et al. 2025; Nomura et al. 2024) and may reduce loafing (Nomura et al. 2024); hence, we propose:
H1-a
Humans brainstorming with a GLM achieve higher fluency (excluding GLM ideas) compared to humans brainstorming alone (solitary brainstormers).
For flexibility or breadth of exploration, different effects could influence performance. Cognitive inertia refers to producing more ideas in the same category (i.e., higher within-category fluency) but exploring fewer categories (i.e., lower flexibility). It can occur when brainstorming alone (here, the human-only condition), i.e., without seeing others’ ideas and thereby without receiving external stimuli (Pinsonneault et al. 1999), such as ideas from a GLM (here, the human + GLM condition). However, the effect of GLM ideas depends on their degree of relatedness and their amount. Exposure to many closely related ideas might lead to higher conformity to those ideas or idea categories, and, consequently, to reduced flexibility (Lamm and Trommsdorff 1973; Kohn and Smith 2011). On the other hand, if stimuli such as GLM ideas are diverse, covering different categories, inertia might be reduced (Briggs and Reinig 2010), helping humans to explore new categories, i.e., increasing flexibility. Given recent literature suggesting that AI and GLMs, specifically, can expand the problem and solution space (Specker et al. 2025; Bouschery et al. 2023; Nomura et al. 2024), we suggest:
H1-b
Humans brainstorming with a GLM achieve higher flexibility (excluding GLM ideas) compared to humans brainstorming alone (solitary brainstormers).
Stimuli can enhance idea novelty and value (Althuizen and Reichel 2016). However, exposure to typical or related ideas is associated with reduced idea novelty and increased idea value, and exposure to a larger number of ideas is associated with increased fixation (Kohn and Smith 2011; Wang and Nickerson 2019). Prior literature comparing human and GLM ideas shows that GLMs can produce ideas of comparable or even higher creativity compared to many humans (Haase and Hanel 2023), which should result in humans also generating more novel ideas (Wang and Nickerson 2019). Results on the impact of GLMs on individual performance are mixed. Kumar et al. (2025) find that when humans work in a short brainstorming session (2 min) and receive a limited number of GLM ideas (7), they do not produce more novel ideas. However, Specker et al. (2025) report that participants working with a GLM perceive increased idea novelty at an individual level. We propose that when participants have more time and can request new GLM ideas without restriction, this high number of creative GLM ideas should induce cognitive stimulation (Leggett Dugosh and Paulus 2005) and should allow humans to more broadly and deeply explore the topic, which is associated with more novel and valuable ideas (Althuizen and Reichel 2016). Therefore, we suggest:
H1-c
Humans brainstorming with a GLM produce more novel ideas (excluding GLM ideas) compared to humans brainstorming alone (solitary brainstormers).
H1-d
Humans brainstorming with a GLM produce more valuable ideas (excluding GLM ideas) compared to humans brainstorming alone (solitary brainstormers).

2.3.2 Hypothesis Set 2: Dyad Performance

For hypothesis set 2: dyad performance, we compare the overall performance between the conditions, uncovering if superior performance by the human–GLM dyad compared to the individual – a key ambition for human–AI groups (Dellermann et al. 2019) – is achieved.
GLMs “are programmed to create vast amounts of text” (Haase and Hanel 2023, p. 10) and were shown to be capable of generating ideas (Haase and Hanel 2023; Dell’Acqua et al. 2023). Humans working with the GLM could repeatedly request and copy GLM ideas, steadily increasing group fluency directly. Thus, human–GLM dyads might outperform solitary humans with respect to fluency. In multi-human groups, GLM-supported groups produced more ideas (Specker et al. 2025). Therefore, we propose:
H2-a
Human–GLM dyads achieve higher fluency compared to solitary humans when brainstorming.
GLMs can produce many ideas, and a higher number of ideas is associated with higher flexibility (Nijstad et al. 2010). However, due to how machine-learning-based AI works, particularly as it is trained on historical data, there is a discussion on whether AI can be creative (Wu et al. 2021) or produces “more of the same”. Indeed, GLMs might reproduce bias and show limited output diversity (Bender et al. 2021; Lin et al. 2022). In a brainstorming context, producing “more of the same” might manifest in the GLM suggesting similar ideas, i.e., taking a narrow perspective on the topic, which aligns with the view of creativity being a comparative human strength (Dellermann et al. 2019). Additionally, research suggests humans working with GLMs to produce ideas with less variability, i.e., potentially lower flexibility (Dell’Acqua et al. 2023). However, recent work on group brainstorming indicates higher flexibility when working with a GLM (Specker et al. 2025). We thus propose:
H2-b
Human–GLM dyads achieve higher flexibility (i.e., breadth of exploration) compared to solitary humans when brainstorming.
Higher fluency and higher flexibility are associated with higher originality (Nijstad et al. 2010). Potentially, with the exploration of less common categories, less common (i.e., more original) ideas are explored (Nijstad et al. 2010). Additionally, a more in-depth exploration of categories is related to higher originality (Althuizen and Reichel 2016). Thus, the human and the GLM could jointly explore the problem more broadly or more in-depth, both of which are associated with higher originality (Nijstad et al. 2010). Previous research separately investigating humans and GLMs suggests that GLMs can produce ideas of comparable novelty and value to humans or even outperform many humans, though the most novel ideas still originate from humans (Haase and Hanel 2023; Boussioux et al. 2023). Thus, it was suggested that the human and AI could collectively outperform either entity alone, with humans focusing on producing novel and GLMs on producing valuable ideas (Boussioux et al. 2023). Furthermore, in a human–GLM brainstorming setting, many participants selected GLM ideas as their best ideas (Memmert and Tavanapour 2023). We thus propose:
H2-c
Human–GLM dyads produce more novel ideas compared to solitary humans when brainstorming.
H2-d
Human–GLM dyads produce more valuable ideas compared to solitary humans when brainstorming.

2.3.3 Hypothesis 3: Cognitive Load

For hypothesis 3 (cognitive load), we consider how cognitive load differs across the conditions. Social loafing describes the reduction of one’s effort at the expense of others. Non-human group members may relieve cognitive burdens on humans and support them during task execution (Stieglitz et al. 2022; Boussioux et al. 2023). Therefore, human reliance on AI might be desirable to maintain cognitive resources and improve work efficiency (“smart loafing”, Stieglitz et al. 2022), and previous research shows that humans might save cognitive resources when working with AI (Brachten et al. 2020). It is unclear whether this effect transfers to divergent thinking tasks like brainstorming, i.e., whether humans brainstorming with GLMs could maintain cognitive resources and would, therefore, report lower levels of cognitive load upon task completion. However, having to pay attention to others’ ideas during brainstorming could result in a distraction effect (Pinsonneault et al. 1999) and potentially increase cognitive load (Coskun et al. 2000) due to the need to process the stimuli and activate additional knowledge (Santanen et al. 2004). This could be amplified if the stimuli are of low relatedness, as distant or even topic-irrelevant knowledge (Wang and Nickerson 2019) would be activated. Moreover, prior research suggests that additional cognitive demand is placed on humans when working with GLMs, because they need to evaluate the outputs (Tankelevitch et al. 2024). Research on multi-human brainstorming groups showed that too many GLM hints can increase perceived cognitive load (Zhang et al. 2025) and that participants do not simply accept GLM ideas but review and evaluate them (Gonzalez et al. 2024), adding to the cognitive load. We, therefore, propose:
H3
Humans brainstorming with a GLM report a higher cognitive load compared to solitary brainstormers.

3 Method

3.1 Experimental Design and Procedure

We implemented a single-factor between-subjects experiment with two conditions: human-only vs. human + GLM. The study had four steps: briefing, 10-min brainstorming (cf. Michinov 2012), selection of top ideas, and post-survey. The procedure is summarized in Fig. 4. Task instructions followed Osborn’s (1953) rules: we asked participants to come up with as many ideas as possible and encouraged wild ideas. The brainstorming question, “How can we reduce food waste?”, was used in prior studies (Zhu et al. 2021). Participants in the human + GLM condition could request GLM ideas during the session.
Fig. 4
Experimental procedure with differences between conditions marked in italic font
Full size image

3.2 Experimental Treatments

Participants were randomly assigned to one of two conditions. In the human-only baseline, each person brainstormed individually with the app’s GLM function deactivated. In the human + GLM treatment, participants could request and copy GLM suggestions during their brainstorming session whenever they desired; the instructions required at least one such request.
There are many alternatives for selecting a baseline: the human brainstorms alone (ours, comparable to nominal brainstorming groups), with other support systems, or with other humans. We chose the human-only baseline for its research and practical relevance. From the perspective of human–AI collaboration research, this comparison reveals to what extent the human–AI group achieves “superior” or “complementary” performance compared to the individual (Dellermann et al. 2019; Hemmer et al. 2021, 2025), a key aspect of hybrid intelligence (Dellermann et al. 2019; Hemmer et al. 2021). Comparing individuals supported by an ICSS to unaided individuals has been typical in brainstorming research for understanding the effects of such tools (Althuizen and Reichel 2016; Dell’Acqua et al. 2023; Siemon et al. 2015; Wang and Nickerson 2017). We did not use human groups as the baseline, as we seek to understand the effect of working with the GLM on cognitive load in alignment with the “smart loafing” construct (Stieglitz et al. 2022). From a practical perspective, other humans might not always be (cost-efficiently) available for brainstorming.

3.3 Material: GPT-3.5-based Brainstorming Application and Use Process

To operationalize our treatment and investigate the effects of using GLMs for brainstorming, we developed a brainstorming app without social cues (Fig. 5) that allows adding, editing, and removing ideas, based on prior research (Di Fede et al. 2022; Memmert and Tavanapour 2023) and following the approach of adapting – through task-specific prompting – and integrating an existing GLM (see Schneider et al. 2024). We asked participants to enter ideas one at a time. The GLM functionality on the right was shown only to participants in the human + GLM condition. Its list was initially empty. With every click on the “generate ideas” button, three ideas were generated and shown at the top of the list. The copy button allowed accepting GLM ideas into the long list of ideas (left).
Fig. 5
Brainstorming app in the treatment condition (human + GLM) with participant’s long list of ideas (left) and on-demand-generated GLM ideas (right)
Full size image
Participants had to request GLM ideas (pull) rather than receiving them automatically (push), as this was shown to be most effective (Siangliulue et al. 2015b). When the user clicked the “generate ideas” button, our app’s backend automatically assembled a prompt based on the current canvas state and sent a request to a GLM. The response was processed by our app’s backend and displayed as separate GLM ideas without any changes, e.g., without filtering or randomization. The GLM capability is embedded into the app, abstracting away the interaction with the GLM. This allows humans to focus on the task, freeing them from interacting with the GLM directly, which might not be easy for non-technical or novice users (Zamfirescu-Pereira et al. 2023). Online Appendix A (available online via http://link.springer.com) depicts the user’s interaction with the app and the information flow to and from the GLM.
As a GLM, we used GPT-3.5-turbo-0301 with a temperature of 0.9, as OpenAI (2023) recommends for creative tasks. On every “Generate ideas” click, the backend filled a pre-defined prompt template (cf. Memmert et al. 2024a; see Fig. 6; full explanation in Online Appendix B) with the brainstorming question, the number of ideas per request, and the live canvas state:
  • The participant’s own idea list (left panel);
  • Any ideas the GLM had already produced for that participant in the same session (right panel).
Fig. 6
Prompt template with placeholders for producing GLM ideas
Full size image
Including these lists follows Osborn’s (1953) idea of building on others’ ideas and might lead the GLM to avoid repeats. No ideas were shared across participants, so sessions remained independent. For each request, the GLM returned three ideas, delivered as an ordered list and limited to 20 words each. The word limit was introduced because pre-tests showed responses to be lengthy otherwise. To ensure a consistent participant experience, if the request to the GLM timed out, we displayed suggestions generated with the initial prompt.
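The template-filling step at each click can be sketched as follows. This is a minimal illustration only: the template wording, function name, and data layout are our own assumptions, not the actual prompt (which is documented in Online Appendix B).

```python
# Hypothetical sketch of the backend's prompt assembly; the real template
# wording is given in Online Appendix B, so this text is an illustrative assumption.

TEMPLATE = (
    "We brainstorm on the question: {question}\n"
    "Ideas so far from the participant:\n{human_ideas}\n"
    "Ideas you already suggested:\n{glm_ideas}\n"
    "Suggest {n} new ideas, max 20 words each, as a numbered list."
)

def build_prompt(question, human_ideas, glm_ideas, n=3):
    """Fill the template with the live canvas state at click time."""
    fmt = lambda ideas: "\n".join(f"- {i}" for i in ideas) or "- (none yet)"
    return TEMPLATE.format(
        question=question,
        human_ideas=fmt(human_ideas),
        glm_ideas=fmt(glm_ideas),
        n=n,
    )

prompt = build_prompt(
    "How can we reduce food waste?",
    ["Share leftovers with neighbors"],
    [],  # no GLM ideas yet in this session
)
```

Because the canvas state is re-read on every click, later requests automatically include previously copied and previously suggested ideas, which is what allows the GLM to build on prior ideas and avoid repeats.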

3.4 Participants

A total of 77 participants brainstormed and completed the survey. We excluded data from two participants: one who misunderstood the task, and one who did not request any GLM ideas despite being in the human + GLM condition. Of the remaining 75 participants (age: mean = 23.3, min = 18, max = 37 years; gender: 13 female, 62 male), all but one were enrolled in study programs at the informatics department. The experiment was integrated into in-person university seminars across four courses.

3.5 Measures

We measured brainstorming performance via idea fluency and different idea quality dimensions as common brainstorming performance measures (Paulus 2000). Idea fluency refers to the number of ideas produced (Paulus 2000). However, fluency is not meaningful for directly comparing humans and GLMs, because GLMs can autonomously generate large quantities of text with minimal effort (Haase and Hanel 2023). In our design, by contrast, GLM output is human-bounded: a limited number of ideas are generated only when participants explicitly request them, and these ideas can be added to the idea list only through an active copy action. Consequently, the GLM cannot autonomously inflate fluency through continuous generation. Instead, the number of GLM ideas in a session depends on the participant’s choice, making the dyad’s output the relevant unit of analysis, as it reflects interaction behavior rather than raw machine capacity.
For quality, we followed the approach of Siangliulue et al. (2015a) in having participants select their best ideas after the session, i.e., reducing their long list to a shortlist of four ideas. Shortlisted ideas were evaluated independently by five blind-to-condition judges, including the three authors not involved in app development, on a 7-point Likert scale. We adapted the dimensions and their definitions from Siangliulue et al. (2015b), evaluating the ideas according to novelty – “consider how novel, original or surprising the idea is” – and value – “consider how useful the idea is.” The judges were provided with examples of ideas rated as very/somewhat/not at all novel and valuable, respectively, extracted from a prior study on the same topic (Zhu et al. 2021). We calculated the averages for novelty and value for the shortlisted ideas per participant. Including only the shortlisted ideas ensures that a high number of ideas, which might include low-quality ideas, is not penalized.
GLMs can be used as judges for open-ended questions, an approach known as LLM-as-a-judge (Zheng et al. 2023), and, more specifically, to rate ideas (Dell’Acqua et al. 2023; Haase and Hanel 2023; Organisciak et al. 2023), with good correlation to human ratings. Hence, we used GPT-4 in addition to human judges. Adapting an existing approach (Organisciak et al. 2023), we prompted the GLM to rate each idea on novelty and value. To achieve comparability, we provided the GLM with the same data as the human raters, including the same definitions of novelty and value and the same exemplary rated ideas, which was shown to increase performance (Organisciak et al. 2023). Results based on GLM ratings are reported separately. Besides considering the participant’s self-selected idea shortlist, we used the “Good-Idea-Count” metric, counting ideas in the idea long list that exceed a set rating threshold (Reinig et al. 2007), enabling an analysis independent of the participants’ subjective selection. As a threshold, we used the average GLM rating for novelty and value, respectively.
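The “Good-Idea-Count” reduces to a simple threshold count; a minimal sketch (the function name and example ratings are our own, not from the study data):

```python
def good_idea_count(ratings, threshold):
    """Count ideas whose rating exceeds the threshold (cf. Reinig et al. 2007).

    ratings:   per-idea ratings on one dimension (e.g., novelty) for a long list
    threshold: here, the average GLM rating on that dimension
    """
    return sum(1 for r in ratings if r > threshold)

# Example: with a novelty threshold of 3.4, two of these four ideas count as "good".
novelty_ratings = [2.0, 3.5, 4.0, 3.0]
count = good_idea_count(novelty_ratings, 3.4)  # → 2
```

Because the count is taken over the full long list rather than the shortlist, it does not depend on which ideas a participant chose to shortlist.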
Beyond individual ideas, we consider how broadly the topic was covered in each session, so-called flexibility, by categorizing each idea (Nijstad et al. 2010; Althuizen and Reichel 2016). After reviewing multiple category systems on the brainstorming topic “food waste”, we selected Specht et al.’s (2019) system, as its development was based on (1) user-generated content (tweets) of similar length compared to the contributions in our study and (2) contributions from a Western country (US; our participants attend a Western European university). Additionally, (3) the system had an appropriate level of granularity (Althuizen and Reichel 2016). A blind-to-condition student assistant categorized all except for three ideas. Non-categorized and deleted ideas were excluded from the analysis. For each brainstorming session, flexibility was determined as the number of categories with at least one idea present.
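Under this operationalization, a session’s flexibility is simply the number of distinct categories its ideas fall into; a minimal sketch (the category labels below are hypothetical, not taken from Specht et al.’s system):

```python
def flexibility(idea_categories):
    """Number of categories with at least one idea in a session (cf. Nijstad et al. 2010)."""
    return len(set(idea_categories))

# One category label per categorized, non-deleted idea in a session;
# the labels here are illustrative placeholders.
session_categories = ["households", "retail", "households", "policy"]
flex = flexibility(session_categories)  # → 3
```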
We considered both individual- and dyad-level performance (see Fig. 2). For the human-only condition, dyad performance is set to individual performance. On the individual level, for the human + GLM condition, only ideas originating from the human were considered. GLM ideas that were not included in the long list by the human were not considered, which was clearly explained in the task introduction.
For measuring perceived cognitive load, we used the same instrument as Stieglitz et al. (2022), the self-report-based, six-dimensional NASA-TLX (Hart and Staveland 1988). The post-survey contained open- and closed-ended questions (Table 1; see Online Appendix C for the full questionnaire). The operationalization for all brainstorming performance metrics and variables is detailed in Online Appendix D.
Table 1
Open-ended questions for the human + GLM condition
Felt responsibility (after asking participants to quantify their felt responsibility):
Please briefly explain your reasoning for the previous answer for the share of responsibility regarding the result type 'list of ideas' (prior to final idea selection)
Please briefly explain your reasoning for the previous answer for the share of responsibility regarding the result type 'list of 4 best ideas' (after the final idea selection)
GLM effect:
How did the task change for you due to the presence of the AI (as compared to brainstorming alone)?
How did the AI (ideas/suggestions) help you in this brainstorming task?
How did the AI (ideas/suggestions) hinder you in this brainstorming task (i.e., made it difficult to perform the brainstorming task)?

3.6 Data Collection, Preparation, and Analysis

We use two sources of data: the participants’ interaction data from the brainstorming sessions, such as the number of ideas or the number of suggestion requests, and the survey data, allowing us to contrast actual behavior with the participants’ perceptions. For evaluation, the shortlisted ideas were randomized and exported before rating. Using the consistency definition with average measures (McGraw and Wong 1996), the inter-rater reliability ICC(C,k) was 0.864 for novelty and 0.736 for value, indicating “strong agreement” (LeBreton and Senter 2008). Additionally, we adapted an approach of using GPT-4 to rate the ideas (Haase and Hanel 2023) on the same dimensions and scale. We find a significant positive correlation with high correlation coefficients for both novelty (Spearman ρ = 0.726, p < 0.001) and value (ρ = 0.727, p < 0.001) between average human and GPT-4 ratings, indicating that humans and GPT-4 generally agree on idea ratings, consistent with prior literature. For statistical analysis, we used the JASP statistics software (JASP 2023). A sensitivity analysis in G*Power (Faul et al. 2009), using two-tailed tests with α = 0.05, 80% power, and group sizes of 38 and 37, showed the experiment was sensitive to effects of Cohen’s d ≥ 0.656 for independent-samples t tests and d ≥ 0.671 for Mann–Whitney U tests.
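The agreement check between human and GPT-4 ratings rests on Spearman’s ρ, i.e., the Pearson correlation of the rank vectors. A minimal pure-Python sketch (function names and example data are our own; tied values receive average ranks, as is standard):

```python
def average_ranks(xs):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Perfectly monotonic (though nonlinear) ratings give rho = 1.0.
rho = spearman_rho([1, 2, 3, 4], [1, 4, 9, 16])  # → 1.0
```

Because only ranks enter the computation, ρ captures monotonic agreement between the two raters without assuming a linear relationship between their scales.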
To analyze the responses to the qualitative open-ended questions (Table 1), we followed a qualitative content analysis approach (Hsieh and Shannon 2005; Mayring 2014), more specifically, a directed content analysis (Hsieh and Shannon 2005). The goal of a directed approach is to “extend conceptually a theoretical framework,” particularly when “existing theory or prior research exists about a phenomenon that is incomplete or would benefit from further description” (Hsieh and Shannon 2005, p. 1281). This fits well with our study: while the mechanisms we investigate – cognitive stimulation/inertia, effort (in-)dispensability, responsibility (non-)diffusion, and cognitive load – are well established in human groups, their manifestation in human–AI settings is uncertain. Because these same mechanisms are used to motivate and derive our quantitative hypotheses (see Sect. 2.3), we used these as deductive categories for our qualitative coding. This alignment allows the qualitative analysis to extend and contextualize the interpretation of the quantitative findings.
After coding approximately 20% of the responses, two authors reviewed and discussed the coded material to resolve discrepancies and clarify category application. The finalized coding scheme (Online Appendix E) was used to analyze the remaining material. Individual answers to the open-ended survey questions were used as the unit of analysis, and illustrative quotes are reported in the results section.

4 Results

Participants contributed 373 ideas, made 206 requests for suggestions, copied 177 suggestions, and selected 283 ideas as their best ideas. Of those ideas selected as best, 224 (79.2%) originated from humans, and 59 (20.8%) originated from the GLM. We compared the two conditions on brainstorming performance on two levels (human, dyad) and four measures (fluency, flexibility, novelty, value), and on cognitive load.

4.1 Hypothesis Set 1: Human-Level Performance

To measure the GLM’s effect on humans, we compared the humans’ performance, including only human ideas, but excluding GLM ideas. The normality assumption was not fulfilled for the human-only condition (Shapiro–Wilk: W = 0.877, p < 0.001) for fluency on the human level (humanfluency). Thus, we calculated the non-parametric Mann–Whitney-U test, which showed a significant difference (U = 475.0, p = 0.015; Fig. 7a) between the human-only (mdn = 5) and human + GLM (mdn = 4) condition. The rank-biserial correlation rB = −0.324 suggests this to be a medium effect size (Goss-Sampson 2022). This corresponds to a Cohen’s d = 0.581 – below the study’s sensitivity threshold of d ≥ 0.671 for Mann–Whitney tests – so the significant difference should be interpreted with caution.
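The rank-biserial effect size reported here can be derived directly from U. A minimal sketch of the pairwise formulation (function names are our own): rB is the proportion of favorable pairs minus the proportion of unfavorable pairs across the two groups.

```python
def mann_whitney_u(a, b):
    """U statistic for group a: number of (a, b) pairs where a exceeds b; ties count 0.5."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a for y in b
    )

def rank_biserial(a, b):
    """rB = 2U/(n1*n2) - 1, in [-1, 1]; negative means b tends to exceed a."""
    u = mann_whitney_u(a, b)
    return 2.0 * u / (len(a) * len(b)) - 1.0

# If every value in b exceeds every value in a, rB = -1 (complete dominance by b).
rb = rank_biserial([1, 2], [3, 4])  # → -1.0
```

A negative rB for humanfluency thus directly expresses that participants in the human + GLM condition tended to rank lower on own-idea fluency than solitary brainstormers.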
Fig. 7
Comparison of human-level performance on a Humanfluency, b Humanflexibility, c Humannovelty, d Humanvalue (**p <.01)
Full size image
For human-level flexibility, we counted the distinct categories covered per session. The normality assumption was not fulfilled for the human-only condition (W = 0.914, p = 0.006). The Mann–Whitney U test did not show a significant difference (U = 656.0, p = 0.617; Fig. 7b) for flexibility (humanflexibility).
We compared the qualities of the shortlisted ideas on the human level. While in the human-only condition all ideas necessarily originated from the human, for the human + GLM condition, four participants who did not select any of their own ideas for the shortlist were excluded. Regarding humannovelty, the assumption of normality was not fulfilled for the human + GLM condition (W = 0.923, p = 0.022). The Mann–Whitney U test was not significant (U = 699.5, p = 0.406; Fig. 7c). For humanvalue, the assumption of normality was not fulfilled for the human + GLM condition (W = 0.928, p = 0.032). The test did not reveal a statistically significant difference between the groups (U = 737.5, p = 0.204; Fig. 7d).
In line with recent studies (Organisciak et al. 2023; Haase and Hanel 2023), we used GPT-4 (in addition to human judges) to rate the ideas and performed the same comparative group analyses. Again, we find no significant differences for novelty (U = 757.0, p = 0.569) or value (U = 653.5, p = 0.602) for the shortlisted ideas, underlining the robustness of our findings. Additionally, using the “Good-Idea-Count” metric (Reinig et al. 2007) with GPT-4 ratings, we counted the number of good ideas, i.e., ideas rated above average for novelty (mean = 3.396) and value (mean = 4.747), respectively, in the long lists of ideas. We find no significant differences for novelty (U = 663.5, p = 0.672) or value (U = 619.5, p = 0.369), confirming the results.

4.2 Hypothesis Set 2: Dyad-Level Performance

For dyad-level fluency (dyadfluency), the assumption of normality was not fulfilled for both groups (human-only: W = 0.877, p < 0.001, human + GLM: W = 0.818, p < 0.001). The Mann–Whitney U test shows a significant difference between the human-only (mdn = 5) and human + GLM (mdn = 7) condition on dyad-level fluency (U = 963.5, p = 0.006; Fig. 8a). The rank-biserial correlation rB = 0.371 suggests this to be a medium to large effect size (Goss-Sampson 2022).
Fig. 8
Comparison of dyad-level performance on a Dyadfluency, b Dyadflexibility, c Dyadnovelty, d Dyadvalue (**p <.01, ***p <.001)
Full size image
We then analyzed the flexibility, i.e., the number of distinct categories covered on the dyad level. The assumption of normality was not fulfilled for either of the conditions (human-only: W = 0.914, p = 0.006; human + GLM: W = 0.941, p = 0.048). The Mann–Whitney U test showed a significant difference (U = 1080.0, p < 0.001; Fig. 8b) with a large effect size (rB = 0.536) for dyadflexibility between the human-only (mdn = 3) and human + GLM (mdn = 5) conditions.
Lastly, we analyzed the shortlisted ideas according to novelty and value on the dyad level. For dyadnovelty, the assumptions of normality and homogeneity of variances (homoscedasticity) were fulfilled. The Student’s t-test showed a significant difference with a medium to large effect size (t(73) = 2.889, p = 0.005, Cohen's d = 0.667; Fig. 8c) for dyadnovelty between the human-only (mean = 2.876) and human + GLM (mean = 3.396) conditions. For dyadvalue, the assumption of normality was not fulfilled for the human + GLM condition (W = 0.797, p < 0.001). The Mann–Whitney U test revealed a significant difference with a medium to large effect size (U = 994.0, p = 0.002, rB = 0.414; Fig. 8d) between the human-only (mdn = 4.425) and human + GLM (mdn = 4.900) conditions for dyad-level value. The results are shown in Fig. 8, with interval plots for normally distributed variables and boxplots for non-normally distributed variables.
Using GPT-4 ratings, we found results similar to the human ratings for the shortlisted ideas, i.e., significant differences between the human-only and human + GLM conditions for both novelty (dyadnovelty GPT-rated: U = 932.5, p = 0.015) and value (dyadvalue GPT-rated: U = 986.5, p = 0.003). Using the “Good-Idea-Count” (Reinig et al. 2007), we find significant differences between the conditions for both novelty (U = 1073.5, p < 0.001; mdnhuman-only = 2, mdnhuman+GLM = 4) and value (U = 1095.5, p < 0.001; mdnhuman-only = 2, mdnhuman+GLM = 6), increasing result robustness.

4.3 Hypothesis 3: Cognitive Load

To understand if humans could reduce their cognitive load by relying on the GLM (cf. Stieglitz et al. 2022), we calculated a Student’s t-test. We find no significant difference between the two conditions overall (t(73) = −0.892, p = 0.375; Fig. 9) or for any of the six dimensions (mental demand, physical demand, temporal demand, performance, effort, frustration) of the cognitive load instrument (Cronbach’s α = 0.762, Hart and Staveland 1988).
Fig. 9
Comparison of cognitive load
Full size image
In summary, brainstorming performance was significantly lower regarding fluency for humans working with GLMs. We did not observe significant differences for flexibility, novelty, or value for ideas of human origin between conditions. However, we found brainstorming performance to be significantly higher for human–GLM dyads as compared to unaided humans for fluency, flexibility, novelty, and value, i.e., superior group performance (Dellermann et al. 2019; Hemmer et al. 2025, 2021) compared to individuals occurred. We find no significant difference between the conditions for cognitive load overall or on individual dimensions. Table 2 summarizes the findings.
Table 2
Findings for human- and dyad-level performance and cognitive load between the conditions human-only and human + GLM
| Hypothesis | Level | Variable | Significant difference (95% confidence) | Hypothesized higher condition | Actual higher condition |
|---|---|---|---|---|---|
| 1-a | Human | Fluency (No. of ideas) | Yes | Human + GLM | Human-only |
| 1-b | Human | Flexibility (No. of categories) | No | Human + GLM | – |
| 1-c | Human | Novelty | No | Human + GLM | – |
| 1-d | Human | Value | No | Human + GLM | – |
| 2-a | Dyad | Fluency (No. of ideas) | Yes | Human + GLM | Human + GLM |
| 2-b | Dyad | Flexibility (No. of categories) | Yes | Human + GLM | Human + GLM |
| 2-c | Dyad | Novelty | Yes | Human + GLM | Human + GLM |
| 2-d | Dyad | Value | Yes | Human + GLM | Human + GLM |
| 3 | Human | Cognitive Load | No | Human + GLM | – |

Note: “Higher condition” refers to higher performance or, for H3, higher cognitive load; “–” indicates no significant difference.

4.4 Explorative Analysis of Performance-affecting Aspects

To contextualize our findings, we explore performance in the human + GLM condition, particularly regarding the aforementioned group effects. First, we explore how the shortlisted ideas differ depending on their origin. Of the 283 shortlisted ideas, 224 were of human and 59 of GLM origin. The normality assumption was not fulfilled for the set of human ideas for both novelty (W = 0.945, p < 0.001) and value (W = 0.929, p < 0.001). The Mann–Whitney U test showed a significant difference (U = 9654.0, p < 0.001) between the ideas originating from humans (mdn = 2.8) and from the GLM (mdn = 4.0) on novelty with a medium to large effect size (rB = 0.461), and for value (U = 10,573.0, p < 0.001, human: mdn = 4.6, GLM: mdn = 5.2) with a large effect size (rB = 0.600). We found the same significant differences when using GPT-4 ratings. Given that GLM ideas were rated significantly higher, we expect a higher share of GLM ideas in a participant’s shortlist to correlate with higher dyad-level novelty and value ratings. Indeed, we find a strong positive correlation between the share of GLM ideas on the shortlist and both dyadnovelty (Pearson’s r = 0.458, p = 0.004) and dyadvalue (Spearman’s ρ = 0.649, p < 0.001).

4.4.1 Cognitive Stimulation and Cognitive Inertia

We analyzed the responses to the open-ended survey questions to shed light on potential cognitive stimulation or inertia when working with a GLM and receiving its dynamically generated ideas. In line with prior work (Houde et al. 2025; Memmert and Tavanapour 2023), several participants stated that the GLM ideas inspired them to start generating ideas, to generate additional ideas, or to get unstuck (sic!):
  • “It was an insperational help” (P68) and “It inspire me” (P55)
  • “The new ideas of the AI helped me thinkig of additional ideas” (P66)
  • “The ideas on the side which came from the AI were a great inspiration for developping new ideas on my own” (P67)
Cognitive stimulation might manifest in humans producing more, more novel, or more valuable ideas (Althuizen and Reichel 2016; Dugosh et al. 2000). However, as described above (hypothesis set 1), humans in the human + GLM condition produced fewer ideas themselves (humanfluency) and did not show higher levels of flexibility, novelty, or value (humanflexibility, humannovelty, humanvalue), which seems contrary to these subjective reports of feeling inspired. We asked participants about their perceived cognitive stimulation, adapting a 6-item instrument by Gozzo et al. (2022) (α = 0.723). We found no significant correlations between self-reported cognitive stimulation and humanfluency, humanflexibility, humannovelty, or humanvalue. Thus, participants potentially did not actually experience cognitive stimulation but merely felt that they covered the topic more efficiently and holistically as a (human–GLM) dyad. Indeed, there were significant correlations of reported cognitive stimulation with dyadfluency (Spearman’s ρ = 0.436, p = 0.007) and dyadflexibility (r = 0.402, p = 0.014). From many of the participants’ answers, it was unclear whether the GLM ideas actually stimulated them or whether these ideas were “just” used as additional entries for the long list (sic!):
  • “It made it easiert to come up with more solutions that I wouldn't have thought of” (P45)
  • “AI can help you to bring more ideas to light” (P74)
  • “It was easier to generate many ideas” (P41)
Similarly, responses regarding GLM ideas’ flexibility were mixed. Some participants stressed that the GLM helped them to gain a new perspective, which could hint at the AI ideas successfully broadening the breadth of exploration and reducing cognitive inertia:
  • “The AI included fields that I didn't think about before” (P66)
  • “Because of the AI I first came up with certain ideas” (P74)
  • “They showed me other fields that I had not considered before” (P63)
However, as reported above, while we observed increased flexibility on the dyad level (dyadflexibility), we did not observe a significant difference on the individual level (humanflexibility). Indeed, several participants perceived GLM ideas as repetitive (e.g., P44, P45, P48), with some stating they felt “governed” or their focus was set in a certain direction, which some felt reduced (“cutting off”, P69) or “took away” (P42) their creativity or concentration:
  • “It felt a lot faster, but a bit repetitive also since the AI started generating ideas similar to the previous ones” and “[…] It definitely felt like my thought process was being governed by the AI […]” (P48)
  • “The AI stopped coming up with original ideas after a few were generated and I couldn't concentrate on coming up with my own ideas” (P62)
  • “[…] set my focus in the direction of the suggestions” (P59)
This suggests that GLM ideas may risk narrowing ideation or reinforcing existing idea paths (cf. Zhan et al. 2024). Although fixation was not universal, this tension between stimulation and narrowing may help explain why individual-level performance effects were muted even though some participants reported subjective feelings of stimulation and gains surfaced at the dyad level.

4.4.2 Free Riding and Smart Loafing

Reduced effort and disengagement: Regarding free riding, qualitative responses indicated that some participants reduced their effort or became disengaged, for example, stating:
  • “I became lazy” (P61) and “I didn’t do much” (P61)
  • “By simply leaving [the AI] as an option, I didn't see the need to make an effort to come up with new ideas” (P46)
  • “I feel like using the AI made my brain shut off a little bit” (P47) and “It took the “thinking” out of my hand” (P47)
  • “I had to do less. […]” (P50)
Dispensability of effort: A contributing factor for free riding is felt dispensability of effort (Pinsonneault et al. 1999), i.e., the feeling that (additional) own effort would not (substantially) improve results. In line with recent literature finding that GLMs can produce creative ideas comparable to many humans (Haase and Hanel 2023), several participants expressed a sense of inferiority compared to the GLM when it came to generating ideas, e.g., describing themselves as “not that creative” (P60) or stating that GLM ideas were “overall better defined and closer to the actual topic” (P42). However, other participants described the GLM ideas as of “lower quality and more repetitive” (P44), “uncreative and boring” (P51), “too mundane and common” (P75), or as “generic answers [that] already are established in a way” that would “create different problems” (P53), and stated that their “ideas are unique and make more sense to [them] than the ‘cheap’ AI suggestions” (P49). These mixed results may hint at underlying individual differences in working with GLMs on creative tasks (cf. Memmert et al. 2024b).
Diffused responsibility: An additional aspect related to free-riding is diffused responsibility (Pinsonneault et al. 1999; Latané et al. 1979). In the post-survey, we asked participants about the degree of responsibility they felt for the longlist and shortlist of ideas. We find that none of the participants claimed full responsibility for the long list, and only 6 participants (16.2%) for the shortlist of ideas. We found a strong correlation between the felt responsibility and the share of ideas originating from the human in the long list (r = 0.710, p < 0.001). The same holds true for the share of human ideas on and felt responsibility for the shortlist (r = 0.489, p = 0.002). This is also reflected in the qualitative responses, where many participants provided a rationale based on the contribution share for the level of felt responsibility, e.g., stating “I shared a low responsibility because I choose to include comparatively many ideas from the AI.” (P44).
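To illustrate the type of analysis behind these coefficients, the following sketch computes a Pearson product-moment correlation between the share of human-originated ideas and felt responsibility in pure Python. The data are hypothetical placeholders for illustration only, not values from our study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-participant data: share of human-originated ideas on the
# long list (0..1) and felt responsibility for that list (e.g., 1-5 rating).
human_share = [0.9, 0.7, 0.5, 0.3, 0.2, 0.6]
felt_resp   = [5,   4,   3,   2,   2,   4]

r = pearson_r(human_share, felt_resp)  # positive r: more own ideas, more felt responsibility
```

In an actual analysis, a library routine such as `scipy.stats.pearsonr` would additionally return the p-value reported alongside the coefficient.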
Effort shift and cognitive load: Tying our findings back to theory, smart loafing refers to “the reduction of effort […] to maintain cognitive resources and enhance efficiency in work” (Stieglitz et al. 2022). While some participants reported having reduced their effort as described above, other participants offered more nuanced responses (quoted verbatim):
  • “The AI made the task easier and less mentally straining, I didn't have to put all of my focus on coming up with ideas, only on reading and evaluating the AI's ideas” (P62)
  • “It was less Brainstorming, more likely a reading task where i chose the best options” (P60)
  • “I had a helping hand which never ran out of ideas, so it was more stress releving.” (P68)
Rather than an effort reduction, these responses signal an effort shift from generating to curating ideas. For some participants, this effort shift appears to have been a source of both felt indispensability and non-diffused responsibility, e.g., stating “[…] I think I wouldnt really relie on AI ranking” (sic!, P47) for the former, and “I evaluated all of the ideas on my own an[d] picked the what i thought were the best ideas” (P68) for the latter. When asked how to achieve the best brainstorming performance with our app, most participants (n = 30) selected a mix of generating and curating ideas (see Fig. 10). As participants in the human + GLM condition spent time reviewing and selecting GLM ideas, this might explain their lower fluency. Indeed, the number of suggestion requests and human fluency are negatively correlated (r = −0.419, p = 0.01).
Fig. 10 Self-reported optimal effort distribution
This effort shift may explain why we did not find a significant difference in cognitive load, contrary to expectations under the smart-loafing assumption. Though several participants reported feeling that working with the AI made the task “easier” (e.g., P41, P42), “less mentally straining” (P62), and “took the stress away” (P59), many reported having to read and evaluate GLM ideas, explaining “I had to reject the ai suggestions and pay attention not to be influenced too much by it or to be bogged down in the ai's way of thinking” (P75). While few participants explicitly framed it as additional cognitive load, this constant evaluation effort might have offset the load reduction from the GLM contributing ideas (cf. Tankelevitch et al. 2024). Thus, while working with GLMs can offer performance improvements, there are also associated risks.

5 Discussion

5.1 General Discussion

We explored the effect of a GLM-based ICSS on brainstorming performance and cognitive load through the lens of group mechanisms known from purely human groups. We find that when excluding GLM ideas, humans in the human–GLM dyad produce fewer ideas compared to solitary brainstormers (hypothesis set 1). However, human–GLM dyads collectively, i.e., including GLM ideas, produce more, more novel, and more valuable ideas and cover more categories, compared to solitary brainstormers (hypothesis set 2). There was no difference in cognitive load between the conditions (hypothesis 3).

5.1.1 Superiority of Human–GLM Brainstorming

A key goal for human–AI collaboration is for the human–AI group to achieve superior performance compared to the human; that is, working with the AI system should yield better results than working alone (Dellermann et al. 2019). Similarly, Hemmer et al. (2021, p. 2) and Hemmer et al. (2025) describe the desired outcome for a human and an AI – which they label team members – as working together to achieve “complementary team performance (CTP),” defining it as “the team performance exceed[ing] the maximum performance of both individual entities.” AI was traditionally discussed as being helpful for (automating) repetitive tasks (Krogh 2018), whereas creative tasks were thought to be in the realm of humans (Redifer et al. 2021; Dellermann et al. 2019), particularly as AI systems’ output was thought to potentially lack diversity (Bender et al. 2021). After the creativity of GLMs and humans was investigated in isolation (Organisciak et al. 2023; Haase and Hanel 2023; Boussioux et al. 2023), we studied joint human–AI brainstorming, as has been called for in the literature (Di Fede et al. 2022; Liu et al. 2023; Rafner et al. 2023; Wu et al. 2021). In empirically demonstrating the superiority of the human–AI dyad in brainstorming, i.e., achievement of complementary performance (Hemmer et al. 2021, 2025) of the human–AI dyad compared to the individual human, we confirm what was hypothesized in the literature (Boussioux et al. 2023). Our results partially confirm earlier findings, such as those showing heightened fluency and flexibility when brainstorming with AI (Demir et al. 2024; Specker et al. 2025), but contradict other findings, e.g., those indicating reduced novelty (Demir et al. 2024; Kumar et al. 2025). Overall, the results align with the recent meta-analysis by Holzner et al. (2025), which suggests that human–AI dyads outperform solitary humans in creative performance, particularly when participants are non-domain experts and the AI is a capable GLM like GPT-3.5. 
However, our study offers a more nuanced analysis, distinguishing among four brainstorming metrics rather than using a single aggregated measure of creative performance. Holzner et al. (2025) report that AI reduces the diversity of ideas, whereas our study shows increased flexibility of human–AI dyads as compared to solitary humans: human–AI dyads cover more categories of ideas than solitary humans do. This may hint at a trade-off: although individuals may benefit from AI usage, increasing the human–AI dyad’s performance, on an aggregated level – when comparing pooled ideas of solitary humans with pooled ideas of human–AI dyads – results may become more homogeneous (cf. Dell’Acqua et al. 2023). Future research should explore the differences in measurement approaches, for example, by comparing flexibility assessed via the number of categories covered for a topic (as in our study) with statistical approaches, such as embedding-based measures like cosine similarity (e.g., Dell’Acqua et al. 2023; Holzner et al. 2025).
Moreover, future research should extend performance benchmarking to include both human–human dyads and AI-only conditions to provide a more comprehensive comparison of creative performance across different collaboration types. Including a human–human condition would enable disentangling the effects of GLM support from the general effects of working with another idea-contributing partner. While we deliberately omitted a human–human group to isolate the cognitive and behavioral mechanisms of human–AI interaction, thereby avoiding human social dynamics, future work could systematically contrast these settings. Such comparisons would directly respond to calls for studying human–AI complementarity (Hemmer et al. 2025; Dellermann et al. 2019) and to recent findings that challenge the notion that human input necessarily has a positive performance impact (Lee and Chung 2024; Vaccaro et al. 2024).

5.1.2 Unexpected Effect of GLM-Based ICSS on Human Performance

Intuitively, the results may not come as a surprise – tool support leads to better performance. However, our analysis shows that when excluding GLM ideas, humans working with GLMs themselves do not perform better compared to solitary brainstormers; instead, they perform worse. This is contrary to expectations for traditional stimuli-based ICSS (Wang and Nickerson 2017). After all, stimulating the human to create more and more novel ideas is the goal of a stimuli-provider ICSS; i.e., ICSS-induced cognitive stimulation should manifest in higher individual brainstorming performance (Pinsonneault et al. 1999).
Siangliulue et al. (2015b, p. 88) demonstrated that stimuli-based ICSS are most effective when humans request stimuli themselves (i.e., pull; as in our app), and they showed that “participants primarily requested examples when they ran out of ideas.” While there is no agreed-upon duration for brainstorming, experimental durations of 4, 10 (this study), and 15 min are common (Siangliulue et al. 2015a; Michinov 2012; Pinsonneault et al. 1999; Dennis et al. 2013), and past research has shown that fluency declines rapidly after the first few minutes (Kohn and Smith 2011; Baruah and Paulus 2016), with participants adopting more GLM ideas than producing ideas themselves after about 3–4 min (Memmert et al. 2024b). Given those empirical timings, participants in our study had sufficient time both to contribute their own ideas and to interact with the ICSS when running out of ideas.
In our study, the ICSS provides stimuli in the form of full ideas rather than only cues, the latter of which is common for prior stimuli-provider ICSS. In that, working with our tool might be somewhat similar to brainstorming with another human, who also contributes full ideas (see Fig. 1). According to the literature, both incomplete ideas from ICSS and complete ideas from other humans can induce cognitive stimulation and consequently improve human brainstorming performance (Pinsonneault et al. 1999; Wang and Nickerson 2017). However, contrary to those expectations, here, external stimuli (i.e., GLM ideas) did not increase humans’ brainstorming performance, but rather resulted in the humans generating fewer ideas themselves. Thus, while both the traditional and the GLM-based ICSS resulted in better collective group-level performance, different mechanisms seem to drive these improvements.
What seems interesting is that despite not objectively producing more (novel) ideas when working with the GLM, participants – in line with prior research (Specker et al. 2025) – subjectively reported feeling stimulated or inspired, which could be rooted in the new possibilities that brainstorming with a GLM provides. Unlike traditional ICSS, which require the human to develop a full idea from the stimuli, our app provides complete, creative ideas – GLM ideas were rated as more novel than human ideas – that the human can simply copy without additional effort. While this is similar to brainstorming with another human, participants did not have to credit another person for ideas they had not developed themselves. Hence, humans potentially felt subjectively inspired and indeed perceived themselves as having covered the topic more broadly because they subconsciously attributed the GLM ideas to themselves. This perspective is compatible with the theory of the “extended self” (Mirbabaie et al. 2021) and with viewing GLM-based ICSS as merely tools.
Perhaps, this reduced the drive of humans to come up with more ideas themselves based on the GLM's ideas, which might mean that the dyad-level performance improvement is mainly driven by the GLM’s ideas, rather than by improved human performance. This raises the question of whether “collaborative complementarity potential” (Hemmer et al. 2025, p. 7) is realized – that is, not merely that the dyad outperforms either entity alone, but that their interaction yields performance exceeding the additive combination of their individual contributions – or, put simply, whether the whole exceeds its parts.

5.1.3 Smart Loafing Perspective on Performance

Another potential explanation for us not observing higher performance with increased cognitive stimulation could be that the performance-enhancing and -reducing effects canceled each other out. Liu et al. (2023) raised the question, “Will humans be Free-Riders?”. Indeed, we observe that humans working with the GLM contribute fewer ideas (hypothesis set 1), and some admitted to having become “lazy,” suggesting that free riding might have occurred. A cause for free riding is the phenomenon of diffused responsibility. Indeed, no participant felt full responsibility for the long list of ideas, and only a few for the shortlist. Such responsibility attribution to non-human group members has been observed before (Stieglitz et al. 2022). Thus, one might assume that humans reduced their effort, i.e., free rode (Latané et al. 1979). However, on a group level, human + GLM dyads performed better (hypothesis set 2), so task performance was not harmed. We thus concur with Stieglitz et al. (2022) that free riding or social loafing may need to be conceptualized differently in human–AI groups, as it may sometimes be desirable, particularly if it does not reduce the quality of the result. Below, we reflect on their concept of smart loafing, i.e., humans' effort reduction to maintain cognitive resources while enhancing work efficiency (Stieglitz et al. 2022), and discuss the key parts of this definition.
Though participants may occasionally have done so, overall it is not apparent that they reduced their effort when working with the GLM. Humans in the human-only condition focused exclusively on generating ideas, whereas humans in the human + GLM condition split their effort between generating and curating (requesting, evaluating, and adjusting) ideas, implying a change in role from pure creator to creator and curator (cf. Gonzalez et al. 2024; Tankelevitch et al. 2024). Perhaps there was not a reduction in effort but a shift in effort allocation.
Maintaining cognitive resources could mean conserving them while performing the task, thanks to the AI's support. We would then expect lower cognitive load levels for humans working with the GLM. It could also mean improving performance without increasing cognitive load, i.e., enhancing work efficiency. Work efficiency enhancement was included in the original definition (Stieglitz et al. 2022) and was observed in our study: with similar cognitive load, the human–GLM dyad achieved higher collective task performance. We did not observe lower cognitive load, i.e., humans did not save cognitive resources when working with the GLM (hypothesis 3). A potential explanation is that there was a shift instead of a reduction in cognitive load. According to cognitive load theory, cognitive load comprises intrinsic, extraneous, and germane components (Sweller 1988; Paas et al. 2003). While the intrinsic cognitive load for generating own ideas might have decreased, an additional intrinsic cognitive load for curating ideas might have emerged (Fig. 11). Confirming earlier reports of such behavior (Gonzalez et al. 2024), many participants reported having reviewed the GLM ideas before adopting them. These findings align with prior research suggesting that GLMs may introduce additional cognitive demands, for example, for the evaluation of AI outputs (Tankelevitch et al. 2024). Moreover, additional extraneous cognitive load due to the presentation of the GLM ideas may have occurred, and germane cognitive load – the mental effort required for constructing, refining, or integrating knowledge structures or schemas (Paas et al. 2003, p. 2) – may have been maintained. Together, this may explain why we did not observe a reduction in cognitive load.
Fig. 11 Schematic comparison of the aspired (reduction) and potentially observed (shift) effect on cognitive load when brainstorming with a GLM
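Under the standard additive assumption of cognitive load theory, this shift argument can be written compactly. The decomposition of the intrinsic-load change into a generation and a curation part is our illustrative notation, not a measured quantity:

```latex
CL_{\text{total}} = CL_{\text{intrinsic}} + CL_{\text{extraneous}} + CL_{\text{germane}},
\qquad
\Delta CL_{\text{total}} \approx
\underbrace{\Delta CL_{\text{int}}^{\text{generate}}}_{<\,0}
+ \underbrace{\Delta CL_{\text{int}}^{\text{curate}}}_{>\,0}
+ \underbrace{\Delta CL_{\text{ext}}}_{\geq\,0}
\approx 0
```

The observed null effect on total load is consistent with the generation-related decrease being offset by the curation-related increase, while germane load is maintained.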
Future research might differentiate cognitive load types (Cheng et al. 2020; Javadi et al. 2013) and explore whether GLM ideas can be controlled to balance the positive effect of offering ideas against the negative effect of inducing cognitive load, for example, by controlling idea relatedness (Baruah and Paulus 2016; Wang and Nickerson 2019, 2017) through prompt design or filtering of GLM ideas (Summers-Stay et al. 2023), or by adjusting delivery mechanisms (Siangliulue et al. 2015b).
Smart loafing implies a deliberate and intelligent approach to loafing. This may require reflecting on one's own and the GLM's comparative task-relevant capabilities, such as creative ability or topic knowledge, to ultimately achieve a symbiotic collaboration and a productive division of work (Benbya et al. 2024; Boussioux et al. 2023; Vaccaro et al. 2024). We found signs of this reflection in some qualitative responses. In the literature, various roles have been proposed for AI in human–AI work (Siemon 2022a). Perhaps similarly interesting is the human's role expansion from creator to curator (Memmert et al. 2025). It might be fruitful to structure human roles and support individuals in adopting a suitable role or effort split, i.e., striking a balance between creating and curating in open-ended settings, enabling them to loaf “smartly”.
A related aspect is the attribution of responsibility in human–AI settings. As in prior studies (Stieglitz et al. 2022), participants did not attribute full responsibility to themselves despite being the only human. The share of human ideas was strongly positively correlated with felt responsibility. While causation is unclear, this might hint at humans feeling less responsible the more the AI contributes, even when in control of result curation. With GLMs becoming more capable, this may introduce problems. Future research should explore the rationale for responsibility attribution, such as the perception of the AI on the spectrum from tool to colleague, to inform interventions for calibrating felt responsibility.
Smart loafing was suggested to be desirable for repetitive or overhead tasks (Stieglitz et al. 2022). We show that it might be desirable even for one-off tasks, such as brainstorming for a specific problem, and thus agree with the call for extending the focus of smart loafing beyond its initial, more narrow scope towards other, more cognitively challenging tasks.

5.2 Contributions to Literature and Theory

With our study, we contribute to the long history of literature on ICSS and group performance mechanisms (Wang and Nickerson 2017; Gabriel et al. 2016; Althuizen and Reichel 2016) as well as to the emerging information systems human–AI collaboration literature, particularly regarding human–AI (superior) performance (Dellermann et al. 2019; Hemmer et al. 2021), group dynamics (Makarius et al. 2020; Liu et al. 2023), and roles (Siemon 2022a).
Following up on prior research comparing brainstorming of humans and GLMs in isolation (Koivisto and Grassini 2023; Haase and Hanel 2023), and human groups with and without GLM support (Specker et al. 2025), we investigate joint human–GLM brainstorming as suggested, e.g., by Di Fede et al. (2022) and discussed by Holzner et al. (2025). Achieving superior or complementary performance of the human–AI group compared to the individual human is a core ambition for human–AI collaboration (Hemmer et al. 2021, 2025; Dellermann et al. 2019) and for stimuli-based ICSS. We empirically demonstrate that such superior or complementary performance can be achieved with the proposed GLM-based ICSS. This is relevant, as we show that such superior performance can be achieved not only in large-scale decision-making (Hemmer et al. 2021, 2025), but also in generating ideas, a creative activity traditionally considered a human comparative strength (Dellermann et al. 2019).
Our findings align with recent meta-analytic evidence by Holzner et al. (2025) and seem to confirm prior findings for stimuli-provider ICSS (Wang and Nickerson 2017). We show that a GLM-based ICSS can enhance brainstorming performance. However, our in-depth analysis reveals a contradiction: humans produced fewer ideas – rather than more, as would typically be expected when external stimuli are present and cognitive stimulation is induced – yet they nonetheless reported feeling subjectively inspired. This suggests that different mechanisms drive performance improvement at the group level in traditional ICSS and GLM-based ICSS, and might hint at undesired side effects. We offer potential explanations based on the difference in capability and interaction: the GLM contributes complete ideas instead of cues, but the human needs to review those ideas, which may induce additional cognitive load (Tankelevitch et al. 2024). We highlight that mechanisms from human groups, which are increasingly used to investigate human–AI groups (Siemon et al. 2015; Stieglitz et al. 2022), need to be applied carefully to avoid drawing incorrect conclusions. To that end, we offer a nuanced perspective on the “smart loafing” construct (Stieglitz et al. 2022) based on our findings.

5.3 Implications for Practice

While free-text, conversational GLM interactions, such as those with ChatGPT, are possible, prompting can be challenging for novices (Zamfirescu-Pereira et al. 2023). Apps can make GLM capabilities available by automatically constructing prompts, abstracting away the GLM interaction, which could be referred to as embedded GLM capabilities. However, this might change user behavior in undesired ways, or, as one participant put it, “I became lazy” (P49). Users might overly rely on the suggestions without critically engaging with them (Stieglitz et al. 2022). Such overreliance might be particularly problematic if it results in creative humans, or humans who possess unique knowledge not accessible to the GLM, contributing fewer ideas, particularly because the most creative ideas still originate from humans (Haase and Hanel 2023). Developers should, therefore, focus on sociotechnical design to address undesired behavioral effects, thereby preventing productivity losses (Alavi et al. 2024) and fostering a symbiotic collaboration between GLMs and humans (Benbya et al. 2024; Feuerriegel et al. 2024). Tool designers may explore, e.g., providing only partial suggestions to “force” humans to actively engage with GLM ideas (Buçinca et al. 2021), which is a prerequisite for cognitive stimulation (Dugosh et al. 2000); change the timing of GLM ideas to encourage humans to think for themselves before working with a GLM, as suggested by some participants, or use dynamic feedback to encourage re-combining and improving ideas (cf. Di Fede et al. 2022; Javadi et al. 2013; Gordetzki et al. 2023), which is a core idea of brainstorming (Osborn 1953).
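Such embedded GLM capabilities can be as simple as filling a fixed template so that the user never writes a prompt themselves. The sketch below is a hypothetical illustration of this pattern, not our app's actual prompt; all names (`PROMPT_TEMPLATE`, `build_prompt`) and the wording are assumptions.

```python
# Hypothetical sketch of "embedded GLM capabilities": the app constructs the
# prompt from a template, abstracting away the GLM interaction from the user.
PROMPT_TEMPLATE = (
    "You are participating in a brainstorming session.\n"
    "Topic: {question}\n"
    "Ideas already on the list:\n{ideas}\n"
    "Suggest exactly {n} new, distinct ideas, one per line."
)

def build_prompt(question, existing_ideas, n=1):
    """Build the GLM prompt. The brainstorming question is a parameter, so
    swapping topics requires no change to the app itself."""
    ideas = "\n".join(f"- {idea}" for idea in existing_ideas) or "- (none yet)"
    return PROMPT_TEMPLATE.format(question=question, ideas=ideas, n=n)

prompt = build_prompt(
    "How can cities reduce household waste?",  # example topic, easily replaced
    ["Deposit schemes for packaging"],
    n=3,
)
```

Passing the current idea list back into the prompt lets the model avoid duplicates; design choices such as the timing of suggestions or showing only partial ideas would plug into the same template.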
Our findings have implications for managing employees who use GLMs. As we observe significant improvements in performance (hypothesis set 2), managers may want to encourage employees to explore the potential of GLMs for their work (Dell’Acqua et al. 2023). However, as we observe a strong correlation between the human contribution share and felt responsibility alongside responsibility attribution to the GLM, managers could educate employees about such expected changes in felt responsibility and consider clarifying their stance on responsibility for work results. Additionally, managers should consider this new way of working when assessing the employees’ performance, for example, by considering both critical engagement with the GLM and the outcome.

5.4 Limitations

Our study has several limitations. The sensitivity analysis showed that our sample size was adequate for detecting medium-to-large effects but underpowered for smaller ones; therefore, both the non-significant outcomes and the significant human fluency effect (Cohen’s d = 0.581, marginally below the 80% power threshold) should be interpreted with caution. We therefore provide qualitative interview data to contextualize both significant and non-significant quantitative findings. In addition, the sample – composed primarily of informatics students – limits generalizability, as such students may have greater prior exposure to and comfort with GLMs, potentially influencing their interaction behavior. Future research should replicate and extend these findings with larger and more diverse samples. Moreover, our participants worked on a general societal question that required no specific knowledge. It might be worthwhile to explore whether and how the participant-task-fit influences GLM usage behavior, for example, by varying the required level of task expertise and the participants’ level of expertise, such as assigning programming problems to informatics students. This might be particularly interesting because prior research has highlighted the potential moderating effect of domain expertise (Holzner et al. 2025).
Our study was limited to a specific understanding of brainstorming performance – specifically, idea fluency, flexibility, novelty, and value – but other dimensions exist (Dean et al. 2006). We selected these dimensions because they are commonly used in brainstorming research (Paulus 2000; Althuizen and Reichel 2016; Reinig et al. 2007). For group effects, we covered cognitive stimulation, cognitive inertia, and free riding (with smart-loafing adaptation, Stieglitz et al. 2022). While there are more potential effects (Pinsonneault et al. 1999), previous exploratory research has revealed those effects to be relevant for this GLM-based brainstorming setting (Memmert and Tavanapour 2023). Additionally, we have focused on dyads of one human and one GLM to investigate effects in isolation. However, GLMs will likely impact dynamics when added to multiple-human brainstorming groups (cf. Gonzalez et al. 2024; Muller et al. 2024), which future research should investigate.
Our brainstorming question addresses a societal problem, and such problems in general, and this question in particular, have been used in prior brainstorming research (Zhu et al. 2021). While we did not investigate other questions – limiting generalizability – our technical approach was not tailored to the specific question. On the contrary, the question could easily be replaced in our prompt template. Prior research has shown that GLMs can generate creative ideas across various topics (Haase and Hanel 2023); nonetheless, future research should investigate the robustness of these results across different topics.
Lastly, our study relied on GPT-3.5 as the embedded GLM, which represented the state of the art at the time of data collection. Generative AI is a rapidly evolving field, and newer models have been shown to outperform earlier ones, including in creative idea generation when assessed in isolation (Haase and Hanel 2023). However, such technical advances do not necessarily translate into improved human–AI creative performance, as shown in the meta-analysis by Holzner et al. (2025), i.e., increases in model capability may change rather than uniformly enhance co-creative outcomes. Nevertheless, the behavioral mechanisms we identify, such as the redistribution of effort, changes in responsibility attribution, and potential cognitive load trade-offs, may persist across model generations (cf. Memmert and Tavanapour 2023).

5.5 Outlook

We observed humans assuming the roles of both creators and curators. Future research should further investigate group dynamics and roles when working with GLMs. This may include checking whether patterns for GLM usage emerge, how and in which moments mechanisms like cognitive stimulation unfold, and how cognitive load is affected in these situations. To that end, it may be beneficial to combine interaction log data with observational or psychometric measures. With the increased embedding of GLM capabilities in apps, supporting humans in finding the right role balance and adapting favorably (see Benke et al. 2024) will become crucial. Along with taking on multiple roles, we observed that participants did not fully attribute responsibility to themselves. It will be important to understand the effect on behavior and how responsibility attribution can be influenced. Additionally, it will be important to understand how these group mechanisms change when systems with (higher) agency are introduced, e.g., if the human does not have to trigger the GLM but it contributes automatically (cf. Specker et al. 2025).
Regarding how generative AI will affect jobs (Dell’Acqua et al. 2023), we show that not only repetitive tasks (Dellermann et al. 2019) but also creative tasks can benefit from generative AI. However, creativity ranges from everyday creativity to genius-level creativity (Haase and Hanel 2023), underscoring the need for further research on tasks and people. For the former, the question arises as to which open-ended (creative) tasks can benefit from GLM support. For the latter, the question arises as to who benefits from such tools (Dell’Acqua et al. 2023; Benbya et al. 2024), i.e., do creative and less creative individuals benefit similarly (Wang and Nickerson 2017)? Lastly, future research should explore the broader implications of GLM usage, both for organizations and for individuals. While we find that individuals benefit in the short term by achieving higher creative performance on a dyad level, recent research suggests that, when taken together, GLM-assisted humans produce less diverse ideas compared to unaided humans (Holzner et al. 2025). Moreover, in the long term, individuals may exhibit a reduced level of engagement and creative thinking ability when using GLMs (Kosmyna et al. 2025; Kumar et al. 2025), raising the question of how tools should be designed for leveraging the creative potential of GLMs while mitigating negative long-term effects on engagement and creative ability.

6 Conclusions

As GLMs become more widely adopted, new opportunities for human–AI collaboration in solving complex problems emerge. To ensure the effectiveness of such collaboration, it is crucial to understand the group dynamics when working alongside AI (Makarius et al. 2020; Benbya et al. 2024). In our experiment, we observed higher performance by the human–GLM dyads compared to the solitary humans. However, humans working with the GLM contributed fewer ideas themselves, perhaps because they had to balance the roles of creator and curator of AI ideas. This role split was accompanied by most participants attributing less than full responsibility to themselves, despite being the only human in the group. We believe future research should explore this role change and support striking a balance between roles when individual human work is transformed into collaborative human–AI work.

Acknowledgements

This research was funded by the German Federal Ministry of Education and Research (BMBF) in the context of the project HyMeKI (reference number: 01IS20057).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Title: Brainstorming with a Generative Language Model: Effect of Exposure to AI Ideas on Brainstorming Performance and Cognitive Load
Authors: Lucas Memmert, Izabel Cvetkovic, Navid Tavanapour, Eva Bittner
Publication date: 18-11-2025
Publisher: Springer Fachmedien Wiesbaden
Published in: Business & Information Systems Engineering (Print ISSN: 2363-7005, Electronic ISSN: 1867-0202)
DOI: https://doi.org/10.1007/s12599-025-00974-y

References

Alavi M, Leidner DE, Mousavi R (2024) Knowledge management perspective of generative artificial intelligence. J Assoc Inf Syst 25(1):1–12. https://doi.org/10.17705/1jais.00859
Althuizen N, Reichel A (2016) The effects of IT-enabled cognitive stimulation tools on creative problem solving: a dual pathway to creativity. J Manag Inf Syst 33(1):11–44. https://doi.org/10.1080/07421222.2016.1172439
Baruah J, Paulus PB (2016) The role of time and category relatedness in electronic brainstorming. Small Group Res 47(3):333–342. https://doi.org/10.1177/1046496416642296
Benbya H, Strich F, Tamm T (2024) Navigating generative artificial intelligence promises and perils for knowledge and creative work. J Assoc Inf Syst 25(1):23–36. https://doi.org/10.17705/1jais.00861
Bender EM, Gebru T, McMillan-Major A, Shmitchell S (2021) On the dangers of stochastic parrots. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. ACM. https://doi.org/10.1145/3442188.3445922
Benke I, Knierim M, Adam M, Beigl M, Dorner V, Ebner-Priemer U, Herrmann M, Klarmann M, Maedche A, Nafziger J, Nieken P, Pfeiffer J, Puppe C, Putze F, Scheibehenne B, Schultz T, Weinhardt C (2024) Hybrid adaptive systems. Bus Inf Syst Eng 66(2):233–247. https://doi.org/10.1007/s12599-024-00861-y
Bouschery SG, Blazevic V, Piller FT (2023) Augmenting human innovation teams with artificial intelligence: exploring transformer-based language models. J Prod Innov Manag 40(2):139–153. https://doi.org/10.1111/jpim.12656
Boussioux L, Lane JN, Zhang M, Jacimovic V, Lakhani KR (2023) The crowdless future? How generative AI is shaping the future of human crowdsourcing. Harvard Business Sch 35(5):1589–1607. https://doi.org/10.2139/ssrn.4533642
Brachten F, Brünker F, Frick NRJ, Ross B, Stieglitz S (2020) On the ability of virtual agents to decrease cognitive load: an experimental study. Inf Syst e-Bus Manag 18(2):187–207. https://doi.org/10.1007/s10257-020-00471-7
Briggs RO, Reinig BA (2010) Bounded ideation theory. J Manag Inf Syst 27(1):123–144. https://doi.org/10.2753/MIS0742-1222270106
Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler DM, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D (2020) Language models are few-shot learners. In: NIPS'20: Proceedings of the 34th International Conference on Neural Information Processing Systems, pp 1877–1901. https://doi.org/10.5555/3495724.3495883
Buçinca Z, Malaya MB, Gajos KZ (2021) To trust or to think: cognitive forcing functions can reduce overreliance on AI in AI-assisted decision-making. In: Proceedings of the ACM on Human-Computer Interaction 5(CSCW1). https://doi.org/10.1145/3449287
Cheng X, Fu S, de Vree T, de Vree G-J, Seeber I, Maier R, Weber B (2020) Idea convergence quality in open innovation crowdsourcing: a cognitive load perspective. J Manag Inf Syst 37(2):349–376. https://doi.org/10.1080/07421222.2020.1759344
Coskun H, Paulus PB, Brown V, Sherwood JJ (2000) Cognitive stimulation and problem presentation in idea-generating groups. Group Dyn Theor Res Pract 4(4):307–329. https://doi.org/10.1037/1089-2699.4.4.307
Dean D, Hender J, Rodgers T, Santanen E (2006) Identifying quality, novel, and creative ideas: constructs and scales for idea evaluation. J Assoc Inf Syst 7(10):646–699. https://doi.org/10.17705/1jais.00106
Dell’Acqua F, McFowland E, Mollick ER, Lifshitz-Assaf H, Kellogg K, Rajendran S, Krayer L, Candelon F, Lakhani KR (2023) Navigating the jagged technological frontier: field experimental evidence of the effects of AI on knowledge worker productivity and quality. Harvard Business School Working Paper
Dellermann D, Ebel P, Söllner M, Leimeister JM (2019) Hybrid intelligence. Bus Inf Syst Eng 61(5):637–643. https://doi.org/10.1007/s12599-019-00595-2
Demir S, Fuegener A, Gupta A, Weinmann M (2024) The effect of AI support on Torrance’s creativity dimensions: Evidence from an online experiment. In: ECIS 2024 Proceedings. https://aisel.aisnet.org/ecis2024/track09_coghbis/track09_coghbis/17/
Dennis AR, Minas RK, Bhagwatwar AP (2013) Sparking creativity: improving electronic brainstorming with individual cognitive priming. J Manag Inf Syst 29(4):195–216. https://doi.org/10.2753/MIS0742-1222290407
Diehl M, Stroebe W (1987) Productivity loss in brainstorming groups: toward the solution of a riddle. J Pers Soc Psychol 53(3):497–509. https://doi.org/10.1037/0022-3514.53.3.497
Dugosh KL, Paulus PB, Roland EJ, Yang HC (2000) Cognitive stimulation in brainstorming. J Pers Soc Psychol 79(5):722–735. https://doi.org/10.1037/0022-3514.79.5.722
Faul F, Erdfelder E, Buchner A, Lang A-G (2009) Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses. Behav Res Methods 41(4):1149–1160. https://doi.org/10.3758/BRM.41.4.1149
Di Fede G, Rocchesso D, Dow SP, Andolina S (2022) The Idea Machine: LLM-based expansion, rewriting, combination, and suggestion of ideas. In: Creativity and Cognition. ACM. https://doi.org/10.1145/3527927.3535197
Feuerriegel S, Hartmann J, Janiesch C, Zschech P (2024) Generative AI. Bus Inf Syst Eng 66(1):111–126. https://doi.org/10.1007/s12599-023-00834-7
Gabriel A, Monticolo D, Camargo M, Bourgault M (2016) Creativity support systems: a systematic mapping study. Think Skills Creat 21:109–122. https://doi.org/10.1016/j.tsc.2016.05.009
Gonzalez GE, Moran DAS, Houde S, He J, Ross SI, Muller MJ, Kunde S, Weisz JD (2024) Collaborative Canvas: A tool for exploring LLM use in group ideation tasks. In: IUI Workshops. https://hai-gen.github.io/2024/papers/1541-Gonzalez.pdf
Gordetzki P, Blohm I, Hofstetter R (2023) Generative AI in idea development: The role of numeric and visual feedback. In: ICIS 2023 Proceedings. https://aisel.aisnet.org/icis2023/techandfow/techandfow/14
Gozzo M, Woldendorp MK, de Rooij A (2022) Creative collaboration with the “brain” of a search engine: Effects on cognitive stimulation and evaluation apprehension. In: Wölfel M, et al (eds) ArtsIT, Interactivity and Game Creation. Springer, pp 209–223. https://doi.org/10.1007/978-3-030-95531-1_15
Haase J, Hanel PHP (2023) Artificial muses: generative artificial intelligence chatbots have risen to human-level creativity. J Creat 33(3):100066. https://doi.org/10.48550/arXiv.2303.12003
Hart SG, Staveland LE (1988) Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In: Human Mental Workload. Elsevier, pp 139–183. https://doi.org/10.1016/S0166-4115(08)62386-9
Hemmer P, Schemmer M, Kühl N, Vössing M, Satzger G (2025) Complementarity in human-AI collaboration: concept, sources, and evidence. Eur J Inf Syst. https://doi.org/10.1080/0960085X.2025.2475962
Hemmer P, Schemmer M, Vössing M, Kühl N (2021) Human-AI complementarity in hybrid intelligence systems: A structured literature review. In: 25th Pacific Asia Conference on Information Systems, Dubai. https://aisel.aisnet.org/pacis2021/78
Holzner N, Maier S, Feuerriegel S (2025) Generative AI and creativity: A systematic literature review and meta-analysis. https://arxiv.org/abs/2505.17241v1
Houde S, Brimijoin K, Muller M, Ross SI, Silva Moran DA, Gonzalez GE, Kunde S, Foreman MA, Weisz JD (2025) Controlling AI agent participation in group conversations: A human-centered approach. In: IUI '25 Proceedings. ACM. https://doi.org/10.1145/3708359.3712089
Hsieh H-F, Shannon SE (2005) Three approaches to qualitative content analysis. Qual Health Res 15(9):1277–1288. https://doi.org/10.1177/1049732305276687
Hwang AH-C, Won AS (2021) IdeaBot: Investigating social facilitation in human-machine team creativity. In: CHI '21 Proceedings. ACM. https://doi.org/10.1145/3411764.3445270
JASP Team (2023) JASP (Version 0.17) [Computer software]
Javadi E, Gebauer J, Mahoney J (2013) The impact of user interface design on idea integration in electronic brainstorming: an attention-based view. J Assoc Inf Syst 14(1):1–21. https://doi.org/10.17705/1jais.00322
Jiang M, Karanasios S, Breidbach C (2024) Generative AI in the wild: An exploratory case study of knowledge workers. In: ECIS 2024 Proceedings. https://aisel.aisnet.org/ecis2024/track04impactai/track04impactai/7
Kerr NL, Bruun SE (1983) Dispensability of member effort and group motivation losses: free-rider effects. J Pers Soc Psychol 44(1):78–94. https://doi.org/10.1037/0022-3514.44.1.78
Kohn NW, Smith SM (2011) Collaborative fixation: effects of others’ ideas on brainstorming. Appl Cogn Psychol 25(3):359–371. https://doi.org/10.1002/acp.1699
Koivisto M, Grassini S (2023) Best humans still outperform artificial intelligence in a creative divergent thinking task. Sci Rep 13(1):13601. https://doi.org/10.1038/s41598-023-40858-3
Kosmyna N, Hauptmann E, Yuan YT, Situ J, Liao X-H, Beresnitzky AV, Braunstein I, Maes P (2025) Your brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant for essay writing task. https://arxiv.org/abs/2506.08872v1
Kumar H, Vincentius J, Jordan E, Anderson A (2025) Human creativity in the age of LLMs: Randomized experiments on divergent and convergent thinking. In: CHI'25 Proceedings. https://doi.org/10.1145/3706598.3714198
Lamm H, Trommsdorff G (1973) Group versus individual performance on tasks requiring ideational proficiency (brainstorming): a review. Eur J Soc Psychol 3(4):361–388. https://doi.org/10.1002/ejsp.2420030402
Latané B, Williams K, Harkins S (1979) Many hands make light the work: the causes and consequences of social loafing. J Pers Soc Psychol 37(6):822–832. https://doi.org/10.1037/0022-3514.37.6.822
LeBreton JM, Senter JL (2008) Answers to 20 questions about interrater reliability and interrater agreement. Organ Res Methods 11(4):815–852. https://doi.org/10.1177/1094428106296642
Lee BC, Chung JJ (2024) An empirical investigation of the impact of ChatGPT on creativity. Nat Hum Behav 8(10):1906–1914. https://doi.org/10.1038/s41562-024-01953-1
Leggett Dugosh K, Paulus PB (2005) Cognitive and social comparison processes in brainstorming. J Exp Soc Psychol 41(3):313–320. https://doi.org/10.1016/j.jesp.2004.05.009
Lin S, Hilton J, Evans O (2022) TruthfulQA: Measuring how models mimic human falsehoods. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.acl-long.229
Liu M, Ke W, Xu DJ (2023) Will humans be free-riders? The effects of expectations for AI on human-AI team performance. In: PACIS 2023 Proceedings. https://aisel.aisnet.org/pacis2023/20
Maaravi Y, Heller B, Shoham Y, Mohar S, Deutsch B (2021) Ideation in the digital age: literature review and integrative model for electronic brainstorming. Rev Manag Sci 15(6):1431–1464. https://doi.org/10.1007/s11846-020-00400-5
Makarius EE, Mukherjee D, Fox JD, Fox AK (2020) Rising with the machines: a sociotechnical framework for bringing artificial intelligence into the organization. J Bus Res 120:262–273. https://doi.org/10.1016/j.jbusres.2020.07.045
Mayring P (2014) Qualitative content analysis. Theoretical foundation, basic procedures and software solution. Klagenfurt
McGraw KO, Wong SP (1996) Forming inferences about some intraclass correlation coefficients. Psychol Methods 1(1):30–46. https://doi.org/10.1037/1082-989X.1.1.30
Memmert L, Soroko D, Bittner E (2025) From effort reduction to effort management: an expectancy theory perspective on professionals’ work practices with generative AI. Bus Inf Syst Eng. https://doi.org/10.1007/s12599-025-00960-4
Memmert L, Bittner E (2024) Human-AI collaboration for brainstorming: Effect of the presence of AI ideas on breadth of exploration. In: HICSS57 Proceedings. https://aisel.aisnet.org/hicss-57/cl/machines_as_teammates/6
Memmert L, Tavanapour N (2023) Towards human-AI collaboration in brainstorming: Empirical insights into the perception of working with a generative AI. In: 31st European Conference on Information Systems. https://aisel.aisnet.org/ecis2023_rp/429
Memmert L, Cvetkovic I, Bittner E (2024a) The more is not the merrier: Effects of prompt engineering on the quality of ideas generated by GPT-3. In: Proceedings of the 57th Hawaii International Conference on System Sciences, Honolulu, pp 7520–7529. https://hdl.handle.net/10125/107289
Memmert L, Mies J, Bittner E (2024b) Brainstorming with a generative language model: The role of creative ability and tool-support for brainstorming performance. In: Proceedings of the 45th International Conference on Information Systems (ICIS). https://aisel.aisnet.org/icis2024/aiinbus/aiinbus/7/
Memmert L (2024) Brainstorming with a generative language model: Understanding performance through brainstorming group effects. In: ECIS 2024 Proceedings. https://aisel.aisnet.org/ecis2024/track06_humanaicollab/track06_humanaicollab/1
Michinov N (2012) Is electronic brainstorming or brainwriting the best way to improve creative performance in groups? An overlooked comparison of two idea-generation techniques. J Appl Soc Psychol 42:E222–E243. https://doi.org/10.1111/j.1559-1816.2012.01024.x
Mirbabaie M, Stieglitz S, Brünker F, Hofeditz L, Ross B, Frick NRJ (2021) Understanding collaboration with virtual assistants – the role of social identity and the extended self. Bus Inf Syst Eng 63(1):21–37. https://doi.org/10.1007/s12599-020-00672-x
Muller M, Houde S, Gonzalez G, Brimijoin K, Ross SI, Moran DAS, Weisz JD (2024) Group brainstorming with an AI agent: Creating and selecting ideas. In: International Conference on Computational Creativity. https://computationalcreativity.net/iccc24/papers/ICCC24_paper_18.pdf
Nass C, Moon Y (2000) Machines and mindlessness: social responses to computers. J Soc Iss 56(1):81–103. https://doi.org/10.1111/0022-4537.00153
Nijstad BA, de Dreu CKW, Rietzschel EF, Baas M (2010) The dual pathway to creativity model: creative ideation as a function of flexibility and persistence. Eur Rev Soc Psychol 21(1):34–77. https://doi.org/10.1080/10463281003765323
NISO CRediT Working Group (2022) ANSI/NISO Z39.104–2022, CRediT, Contributor Roles Taxonomy. NISO, Baltimore
Nomura M, Ito T, Ding S (2024) Towards collaborative brainstorming among humans and AI agents: An implementation of the IBIS-based Brainstorming support system with multiple AI agents. In: ACM Collective Intelligence Conference Proceedings. ACM. https://doi.org/10.1145/3643562.3672609
Organisciak P, Acar S, Dumas D, Berthiaume K (2023) Beyond semantic distance: Automated scoring of divergent thinking greatly improves with large language models. Think Skills Creat 49:101356. https://doi.org/10.1016/j.tsc.2023.101356
Osborn AF (1953) Applied imagination: principles and procedures of creative thinking. Scribner, New York
Paas F, Renkl A, Sweller J (2003) Cognitive load theory and instructional design: recent developments. Educ Psychol 38(1):1–4. https://doi.org/10.1207/S15326985EP3801_1
Paulus P (2000) Groups, teams, and creativity: the creative potential of idea-generating groups. Appl Psychol 49(2):237–262. https://doi.org/10.1111/1464-0597.00013
Pinsonneault A, Barki H, Gallupe RB, Hoppen N (1999) Electronic brainstorming: the illusion of productivity. Inf Syst Res 10(2):110–133. https://doi.org/10.1287/isre.10.2.110
Rafner J, Beaty RE, Kaufman JC, Lubart T, Sherson J (2023) Creativity in the age of generative AI. Nat Hum Behav 7(11):1836–1838. https://doi.org/10.1038/s41562-023-01751-1
Redifer JL, Bae CL, Zhao Q (2021) Self-efficacy and performance feedback: Impacts on cognitive load during creative thinking. Learn Instruct 71:101395. https://doi.org/10.1016/j.learninstruc.2020.101395
Reinig BA, Briggs RO, Nunamaker JF (2007) On the measurement of ideation quality. J Manag Inf Syst 23(4):143–161. https://doi.org/10.2753/MIS0742-1222230407
Santanen EL, Briggs RO, de Vreede G-J (2004) Causal relationships in creative problem solving: comparing facilitation interventions for ideation. J Manag Inf Syst 20(4):167–198. https://doi.org/10.1080/07421222.2004.11045783
Sawyer RK (2021) The iterative and improvisational nature of the creative process. J Creat 31:100002. https://doi.org/10.1016/j.yjoc.2021.100002
Schneider J, Meske C, Kuss P (2024) Foundation models. Bus Inf Syst Eng 66(2):221–231. https://doi.org/10.1007/s12599-024-00851-0
Siangliulue P, Arnold KC, Gajos KZ, Dow SP (2015a) Toward collaborative ideation at scale. In: CSCW '15 Proceedings. ACM, pp 937–945. https://doi.org/10.1145/2675133.2675239
Siangliulue P, Chan J, Gajos KZ, Dow SP (2015b) Providing timely examples improves the quantity and quality of generated ideas. In: C&C '15: Creativity and Cognition. ACM, pp 83–92. https://doi.org/10.1145/2757226.2757230
Siemon D (2022a) Elaborating team roles for artificial intelligence-based teammates in human-AI collaboration. Group Decis Negot 31(5):871–912. https://doi.org/10.1007/s10726-022-09792-z
Siemon D (2022b) Let the computer evaluate your idea: evaluation apprehension in human-computer collaboration. Behav Inf Technol. https://doi.org/10.1080/0144929X.2021.2023638
Siemon D, Wank F (2021) Collaboration with AI-based teammates – Evaluation of the social loafing effect. In: PACIS 2021 Proceedings. https://aisel.aisnet.org/pacis2021/146/
Siemon D, Eckardt L, Robra-Bissantz S (2015) Tracking down the negative group creativity effects with the help of an artificial intelligence-like support system. In: 48th HICSS Proceedings. https://doi.org/10.1109/HICSS.2015.37
Specht AR, Buck EB (2019) Crowdsourcing change: An analysis of Twitter discourse on food waste and reduction strategies. J Appl Commun 103(2):8. https://doi.org/10.4148/1051-0834.2240
Specker RJ, Bucher A, Katsiuba D, Dolata M (2025) An extra brain in the room: Enhancing large group brainstorming with an autonomous AI-agent. In: ECIS 2025 Proceedings. https://aisel.aisnet.org/ecis2025/humanai/humanai/15
Stieglitz S, Mirbabaie M, Möllmann NRJ, Rzyski J (2022) Collaborating with virtual assistants in organizations: analyzing social loafing tendencies and responsibility attribution. Inf Syst Front J Res Innov 24(3):745–770. https://doi.org/10.1007/s10796-021-10201-0
Summers-Stay D, Voss CR, Lukin SM (2023) Brainstorm, then select: A generative language model improves its creativity score. In: The AAAI-23 Workshop on Creative AI Across Modalities
Sweller J (1988) Cognitive load during problem solving: effects on learning. Cogn Sci 12(2):257–285. https://doi.org/10.1016/0364-0213(88)90023-7
Tankelevitch L, Kewenig V, Simkute A, Scott AE, Sarkar A, Sellen A, Rintel S (2024) The metacognitive demands and opportunities of generative AI. In: CHI '24 Proceedings. https://doi.org/10.1145/3613904.3642902
Tao Y, Yoo C, Animesh A (2023) AI plus other technologies? The impact of ChatGPT and creativity support systems on individual creativity. In: ICIS 2023 Proceedings. https://aisel.aisnet.org/icis2023/aiinbus/aiinbus/1
Vaccaro M, Almaatouq A, Malone T (2024) When combinations of humans and AI are useful: a systematic review and meta-analysis. Nat Hum Behav 8(12):2293–2303. https://doi.org/10.1038/s41562-024-02024-1
von Krogh G (2018) Artificial intelligence in organizations: new opportunities for phenomenon-based theorizing. Acad Manag Discov 4(4):404–409. https://doi.org/10.3929/ethz-b-000320207
Wang K, Nickerson JV (2017) A literature review on individual creativity support systems. Comput Hum Behav 74:139–151. https://doi.org/10.1016/j.chb.2017.04.035
Wang K, Nickerson JV (2019) A wikipedia-based method to support creative idea generation: the role of stimulus relatedness. J Manag Inf Syst 36(4):1284–1312. https://doi.org/10.1080/07421222.2019.1661095
Wu Z, Ji D, Yu K, Zeng X, Wu D, Shidujaman M (2021) AI creativity and the human-AI co-creation model. In: Kurosu M (ed) Human-computer interaction. Theory, methods and tools. Springer, Cham, pp 171–190. https://doi.org/10.1007/978-3-030-78462-1_13
Zamfirescu-Pereira JD, Wong RY, Hartmann B, Yang Q (2023) Why Johnny can’t prompt: How non-AI experts try (and fail) to design LLM prompts. In: CHI '23 Proceedings. ACM. https://doi.org/10.1145/3544548.3581388
Zhan X, Sun H, Fang Y (2024) The Janus-faced effects of ideation with generative AI on individual creativity and dignity. In: ICIS 2024 Proceedings. https://aisel.aisnet.org/icis2024/userbehav/userbehav/23
Zhang Z, Peng W, Chen X, Cao L, Li TJ-J (2025) LADICA: A large shared display interface for generative AI cognitive assistance in co-located team collaboration. In: CHI'25 Proceedings. ACM. https://doi.org/10.1145/3706598.3713289
Zheng L, Chiang W-L, Sheng Y, Zhuang S, Wu Z, Zhuang Y, Lin Z, Li Z, Li D, Xing E, Zhang H, Gonzalez JE, Stoica I (2023) Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. In: Advances in Neural Information Processing Systems, pp 46595–46623. https://proceedings.neurips.cc/paper_files/paper/2023/file/91f18a1287b398d378ef22505bf41832-Paper-Datasets_and_Benchmarks.pdf
Zhu Y, Ritter SM, Dijksterhuis A (2021) The effect of rank-ordering strategy on creative idea selection performance. Eur J Soc Psych 51(2):360–376. https://doi.org/10.1002/ejsp.2743
