Open Access 08.01.2025

Accelerating social science knowledge production with the coordinated open-source model

Written by: Konrad Turek

Published in: Quality & Quantity


Abstract

With the growing complexity of knowledge production, social science must accelerate and open up to maintain explanatory power and responsiveness. This goal requires redesigning the front end of the research to build an open and expandable knowledge infrastructure that stimulates broad collaborations, enables breaking down inertia and path dependencies of conventional approaches, and boosts discovery and innovation. This article discusses the coordinated open-source model as a promising organizational scheme that can supplement conventional research infrastructure in certain areas. The model offers flexibility, decentralization, and community-based development and aligns with open science ideas, such as reproducibility and transparency. Similar solutions have been successfully applied in natural science, but social science needs to catch up. I present the model’s design and consider its potential and limitations (e.g., regarding development, sustainability, and coordination). I also discuss open-source applications in various areas, including a case study of an open-source survey harmonization project Comparative Panel File.
Notes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

The growing complexity of knowledge production requires social science to reconsider its collaborative and management approaches in order to accelerate and advance (Hofman et al. 2021; King 1995; Vazire 2018). The dominant conventional knowledge infrastructures – based on institutionalized, top-down decision-making processes – tend to reproduce certain methods of knowledge production. This inherent path dependence can limit science’s ability to understand and solve societal problems, stimulate innovations, and incorporate novel developments (Hirschman 2021; Spirling 2023; Sterman and Wittenberg 1999). As a result, researchers are increasingly considering alternative ways to open and connect research processes, utilize diverse resources, and share work. Since ad hoc and occasional efforts do not guarantee a cumulative scientific process, an efficient infrastructure for managing such collaborations is needed. In particular, the increasingly computational nature of social science requires new platforms for data management and new ways of interdisciplinary team collaboration (Lazer et al. 2020). One organizational model that has been successfully applied in various areas of (mainly natural) science is crowd- and open-source collaboration (Franzoni and Sauermann 2014; Moshontz et al. 2018). However, for some reason, it rarely appears in social science.
This article considers the application of the open-source model in social science, together with its benefits and limitations. The potential benefits include, first, flexibility, decentralized control, and community-based development, which can facilitate opening the knowledge infrastructure (i.e., an established way to collect, process, and distribute data, Hirschman 2021). As such, the model can help break down inertia and path dependencies and stimulate novel research applications, all at a fraction of the cost of a typical conventional alternative. Second, it can potentially allow social science to better identify and respond to societal challenges, e.g., by improving access to and the processing speed of high-quality data. For example, the response to the pandemic showed that crowd-based science can efficiently integrate global-scale efforts (Altman and Cohen 2022; Callaway 2020). Third, the open-source model aligns with the priorities of the open science movement and embodies the fundamental idea of science as a collaborative process, where researchers benefit from sharing their efforts and contribute to faster and more ambitious scientific progress.
However, the open-source model also has its limitations, especially when compared to more established, conventional, and expert-based initiatives. Some major limitations include organizational challenges to facilitate and incentivize collaborations, issues related to expertise coordination and quality control, and long-term sustainability of the initiatives. This article adds to the discussion on open-source management in social science by integrating lessons from other disciplines and software development, as well as experiences from a case study of the open-source survey harmonization project. In particular, I argue that the open-source model can supplement the more conventional research infrastructure in certain areas of social science, helping to advance and accelerate it. However, a certain level of coordination of bottom-up processes and adequate technical solutions are required to stimulate both the applications and crowd-based development. The article begins by discussing the role of an open knowledge ecosystem for scientific advancement. Then, I present the design, potential, and limitations of the open-source model in social science knowledge production. As a case study, I will discuss the survey harmonization project Comparative Panel File (CPF) and evaluate its first three years of functioning (Turek et al. 2021). Finally, I will consider other potential areas of implementation and critical success factors.

2 Opening the knowledge ecosystem

The trend toward open and transparent science is changing how knowledge is produced, verified, and distributed (Altman and Cohen 2022; Callaway 2020). At the fundamental level, “opening” science means making knowledge more transparent, accessible, reproducible, and reliable. Many relatively simple practices support these goals, e.g., open-access publishing, preprints, preregistration, replications, and transparency in data management (Altman and Cohen 2022; Firebaugh 2007; Freese et al. 2022; Manago 2023; Nosek et al. 2015). In the last decade, we have also witnessed large-scale initiatives to reform the more broadly defined scholarly knowledge infrastructure (Altman and Cohen 2022; Edwards et al. 2013), such as building institutional networks, open data infrastructures, and data hubs. Examples can be found in many countries, including Open Data Infrastructure for Social Science and Economic Innovations (Odissei) in the Netherlands, Dataverse Network, Harvard Dataverse, and openICPSR (Freese and King 2018; Gerring et al. 2020; Kapiszewski and Karcher 2020; King 2007; Moshontz et al. 2018).
Despite many advancements, social science knowledge infrastructure is still constrained by deeper structural problems that limit its explanatory power and responsiveness. The explanatory power refers to the ability to describe the observed phenomenon using theory, reveal mechanisms or relations underlying observed data, and predict generalized conclusions for new situations (Franck 2002; Sterman and Wittenberg 1999). The explanatory power of social science is challenged by the growing complexity of scientific problems (Anderson 1972; Evans and Foster 2011; Franck 2002). Such concerns have been expressed, among others, in psychology (Vazire 2018), sociology (Freese and Peterson 2017), political science (Jacobs et al. 2021), epidemiology (Galea et al. 2010; Zuiderwijk et al. 2024), organizational research (Anderson 1999; Starbuck 2006), ethnography (Murphy et al. 2021), and ecological research (Low-Décarie et al. 2014). Social sciences increasingly use sophisticated theories and advanced methodological, statistical, and computational approaches, which demand diverse analytical and programming skills (Edelmann et al. 2020; Jones 2009; Salganik 2017). To keep up with the growing complexity of knowledge production, “social scientists need to continue to build a common, open-source, collaborative infrastructure that makes data analysis and sharing easy” (King 2011, p. 720). Therefore, international and multidisciplinary collaborations are often becoming necessary for successful research (Woolley et al. 2015; Wuchty et al. 2007; Zuo and Zhao 2018). However, social researchers prefer smaller teams and rarely attempt to coordinate activities or share work (Auspurg and Brüderl 2021; Hucka et al. 2015).
Responsiveness of science refers to the ability to identify and address societal challenges and produce practically relevant outcomes. As such, responsiveness constitutes the basis for the speed of reaction and knowledge production, applicability of this knowledge, and openness to new perspectives and approaches (Savage 2021; Starbuck 2006; Van De Ven and Johnson 2006). The COVID-19 pandemic highlighted the importance of quickly producing reliable and innovative scientific solutions (Callaway 2020; Dahlander et al. 2021; Kühne et al. 2020). Conventional academic infrastructure was limited in this regard, for example, by slow research and review processes. Open data sharing was fundamental for efficient scientific cooperation, verification of the results, and implementation of life-saving solutions (Besancon et al. 2021; Lucas-Dominguez et al. 2021; Watson 2022). For example, preprints with replication files allowed quick crowd-based review (Fraser et al. 2021; Watson 2022). Crowd-based cooperation detected errors and helped retract unreliable studies, most likely faster than it could have happened in conventional expert-based systems. Briefly, the opening of knowledge production sped up investigations, the publishing process, the verification of findings, and the development of solutions.

2.1 Limitations of social science: the case of inequality studies

Several scholars have recently discussed the problems of weak explanatory power and slow responsiveness of social science, taking the history of economic studies on inequality as an example (Hirschman 2021; Jackson 2022; Savage 2021). This debate was stimulated by the late recognition of the sharply rising economic inequalities. Although income and wealth disparities have steadily grown since the 1980s, the trend was ‘discovered’ in the economics and broader public debate only in the early 2000s. In their impactful works, Piketty and Saez (2003) and Piketty (2014) showed a progressing accumulation of incomes among the richest. These results surprised the world and reframed the public debate to focus on the ‘rich as a social problem’ (Savage 2021, p. 3).
Given the importance of this problem, Hirschman (2021) wondered why it took economists as long as two decades to identify and incorporate this problem into the public debate (we should add that research on growing inequalities existed earlier, e.g., in sociology, yet with lower impact on the public debate; see also Savage 2021). An in-depth analysis of the history of economic studies on inequality led him first to blame the fundamental limitation of the conventional system of knowledge production. Economic research of that time was focused on other questions and approaches. As he suggests, once established, the ways, tools, norms, and perspectives of a specific knowledge infrastructure make certain research questions, applications, and outcomes more “doable”. Consequently, researchers follow these established paths, and those who succeed and gain more power benefit from maintaining the system. Such path dependency of scientific systems can constrain novelty, narrow the research perspectives, and limit the responsiveness and explanatory power. A similar argument has been presented in other discussions on the history of inequality studies (DiPrete and Fox-Williams 2021; Jackson 2022; Starbuck 2006; Tomaskovic-Devey and Avent-Holt 2019). For example, Jackson (2022) argues that the strong position of economists in the United States has promoted the economic paradigm and income-related outcomes and marginalized alternative approaches to studying inequalities.
The next question Hirschman (2021) asked was why and how Piketty and Saez succeeded in attracting public attention. One reason that is often emphasized is their novel take on the problem of rising inequalities (e.g., using the 1% indicator instead of an abstract Gini coefficient) and a long historical perspective (Savage 2021). However, Hirschman argues that their approach was not entirely novel and did not guarantee success. Researchers had been using similar data and indicators before. What was new and necessary for success, writes Hirschman, was the establishment of a new, stable, and open knowledge infrastructure for inequality data and sharing it with the scientific community. Specifically, Piketty and Saez made their data freely and easily available, first, by publishing a simple spreadsheet on a university webpage and later within an extensive collaborative research program. This enabled continuous, open, and stable production and distribution of data, allowing other researchers to verify their work, monitor the trends, and develop systematic research, which eventually resulted in a more significant influence on public debate.

2.2 Limitations of conventional knowledge infrastructure: the case of survey harmonization

The case of inequality studies illustrates the importance of a knowledge infrastructure and its openness. We will now turn to the case of survey harmonization to further discuss the limitations of conventional knowledge infrastructures. Ex-post survey harmonization aims to create a single comparative data file out of various surveys that are mostly not designed to be integrated. This time-demanding and complex process involves technical (e.g., different data and file structures) and conceptual (e.g., different questionnaires and sample designs) challenges. Thus, the second-order knowledge infrastructure provided by data harmonization can greatly benefit a broad research community. It saves weeks or months of harmonization work, increases the usability of existing data, provides larger data sets, and stimulates new (e.g., comparative) applications.
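To make the idea of ex-post harmonization concrete, the following minimal Python sketch pools two invented surveys into one comparative file. It is an illustration under stated assumptions only: the survey records, variable names, codes, and the `RULES` table are hypothetical and do not reproduce the procedure of any project discussed here.

```python
# Hypothetical raw records from two unrelated surveys (all names and codes are invented).
survey_a = [
    {"pid": 1, "year": 2019, "sex": 1, "hh_income": 32000},
    {"pid": 2, "year": 2019, "sex": 2, "hh_income": 41000},
]
survey_b = [
    {"person": "x7", "wave_year": 2019, "gender": "F", "income_k": 38.5},
]

# Per-survey harmonization rules: target variable -> function of the source record.
# Each rule resolves a technical or conceptual difference (names, codes, units).
RULES = {
    "A": {
        "person_id": lambda r: f"A-{r['pid']}",
        "year": lambda r: r["year"],
        "female": lambda r: int(r["sex"] == 2),      # assumed coding: 1 = male, 2 = female
        "income": lambda r: float(r["hh_income"]),   # already in currency units
    },
    "B": {
        "person_id": lambda r: f"B-{r['person']}",
        "year": lambda r: r["wave_year"],
        "female": lambda r: int(r["gender"] == "F"),
        "income": lambda r: r["income_k"] * 1000.0,  # assumed to be reported in thousands
    },
}


def harmonize(records, source):
    """Apply the rules of one source survey and return rows in the common schema."""
    rules = RULES[source]
    rows = []
    for record in records:
        row = {"source": source}
        for target, rule in rules.items():
            row[target] = rule(record)
        rows.append(row)
    return rows


# The single comparative file: rows from both surveys share one set of variables.
pooled = harmonize(survey_a, "A") + harmonize(survey_b, "B")
for row in pooled:
    print(row)
```

Even in this toy case, every rule encodes a decision (how sex is coded, which unit income uses) that someone must make and document, which is why the process becomes time-demanding at the scale of real surveys.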
Over the past four decades, various large-scale initiatives have aimed at building a stable framework for ex-post survey data harmonization (Doiron et al. 2012; Tomescu-Dubrow et al. 2023; Wolf et al. 2016; Wysmułek et al. 2021). For example, the Survey Data Recycling (SDR) project (which builds on previous similar initiatives) creates a multi-country, multi-year database pooled from cross-sectional surveys from over 150 countries and territories (Slomczynski and Tomescu-Dubrow 2018). One of the largest harmonization initiatives is IPUMS, which integrates census microdata collected over multiple decades in around 100 countries into one consistent database (Ruggles et al. 2023). Another example of a collaborative survey harmonization effort is QuestionLink, run by GESIS, which provides recoding scripts for selected concepts that can be compared across several surveys.
A particularly interesting area for ex-post harmonization relates to national panel studies. Long-running panel studies that follow individuals over time around the world provide stable measurements over extensive periods (even decades) with large samples and high response rates. Given the high costs of such surveys, their harmonization can stimulate the cost-effective reuse of data, facilitate comparative research, and contribute to the knowledge infrastructure of social science. However, only a few conventional harmonization initiatives of panel surveys exist in social science (Dubrow and Tomescu-Dubrow 2016). One of the flagship examples is the well-established and long-standing survey harmonization project, the Cross-National Equivalent File (CNEF) (Burkhauser et al. 2001; Frick et al. 2007; Lillard 2023). CNEF integrates extensive household panel surveys from several countries in cooperation with the national source data administrators. It provides data files with harmonized outcome variables. The great comparative research value of the CNEF data has been confirmed in numerous publications, e.g., on income-related topics, life satisfaction, and self-employment (Turek et al. 2021).
Despite differences in scale and scope, most harmonization initiatives so far have adopted the institutionalized and centralized organizational model, with a core development team separated from users, primarily top-down decision-making processes, and strong embedding in public institutional frameworks (e.g., academic or government) (Dubrow and Tomescu-Dubrow 2016). The key advantage of the model is the control over the methodology and the entire process of data harmonization, which requires expertise. The single final solution that centralized harmonization provides – the ready-to-work database – is an attractive option for users, which additionally assures a common methodological standard.
However, such a model has some limitations. To begin with, conventional harmonizations operate in closed developmental frameworks, which are focused primarily on providing the final products rather than supporting cooperative creation. Although harmonization scholars have argued about the benefits of opening the harmonization process to the research community (Burkhauser and Lillard 2005; Frick et al. 2007), conventional structures are usually restricted in this aspect.
At the application level, the closed design can restrict innovation and the scope of potential research based on the harmonized data. Scholars cannot add or modify variables, even if the necessary information is available in the original datasets. They also cannot modify the methodology of harmonization. Since multiple standards and strategies for ex-post harmonization exist (Dubrow and Tomescu-Dubrow 2016; Kołczyńska 2022), a single selected approach may not fit some of the research needs. For example, when employment status has been categorized into a few general categories, and a more detailed categorization is not included in the harmonized file, there is mostly no simple way to add it. The same problem occurs while harmonizing detailed education levels or subjective opinion items with different scales. In all such situations, the harmonization team is responsible for selecting the solution. This restricts users’ methodological flexibility and applicability and requires much responsibility and transparency from the harmonization team (Lillard 2023). Additionally, the top-down approach can also negatively affect learning processes since more open approaches could help improve common harmonization standards (Wysmułek et al. 2021).
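The employment-status and opinion-scale examples above can be sketched in a few lines of Python. This is a hedged illustration, not the method of any harmonization project: the category labels, both mapping tables, and the `recode` and `rescale` helpers are hypothetical, and linear rescaling is only one of several possible strategies for items with different scales.

```python
# A coarse scheme, as a closed harmonized file might ship it (hypothetical labels).
DETAILED_TO_COARSE = {
    "full_time": "employed",
    "part_time": "employed",
    "self_employed": "employed",
    "unemployed": "not_employed",
    "retired": "not_employed",
    "student": "not_employed",
}

# An alternative scheme a user might need, e.g., to study self-employment separately.
# It is only possible if the detailed source categories remain accessible.
USER_MAPPING = {**DETAILED_TO_COARSE, "self_employed": "self_employed"}


def recode(value, mapping, missing=None):
    """Map a source category to a target category; unknown codes become `missing`."""
    return mapping.get(value, missing)


def rescale(value, low, high):
    """Linearly map an answer on a [low, high] scale to the interval [0, 1]."""
    if value is None:
        return None
    return (value - low) / (high - low)


# The same respondent under the two categorization schemes.
print(recode("self_employed", DETAILED_TO_COARSE))  # employed
print(recode("self_employed", USER_MAPPING))        # self_employed

# A 1-5 item and a 0-10 item placed on a common 0-1 scale.
print(rescale(3, 1, 5))    # 0.5
print(rescale(5, 0, 10))   # 0.5
```

In a closed design, only the output of the first mapping reaches the user. When the mapping itself is shared as code, swapping in `USER_MAPPING` or a different rescaling rule is a one-line change.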
At the infrastructure level, the limited flexibility of the conventional harmonization model may favor certain research approaches and suppress others due to the dependence on public or governmental funding or the domination of field-specific research lenses. Survey harmonization initiatives were criticized for being strongly oriented towards the economic perspective, leading to a focus on economic factors while neglecting other aspects (Dubrow and Tomescu-Dubrow 2016). For example, CNEF provides very detailed indicators of income and wealth but much less on topics such as education, well-being, family relationships, or labor market status. As a result, despite the large number of articles using CNEF, the set of topics has been relatively narrow given the potential of the source data (Turek et al. 2021).
Moreover, conventional harmonization infrastructure can work slowly, potentially limiting scientific responsiveness. For example, large harmonization projects with several teams working worldwide and complex, centralized management may need much time for data production (e.g., integration of the latest panel waves). According to Burkhauser and Lillard (2005), the failure of the European Community Household Panel (ECHP), a prominent European harmonization project, was largely due to inefficient administration that ignored the research community’s needs and could not integrate the project with a broader scientific infrastructure.

2.3 Breaking the path dependency

The history of inequality studies and the case of survey harmonization motivate a rethinking of how social science operates. Conventional knowledge infrastructures tend to be conservative and path-dependent, which can constrain explanatory power and responsiveness (Benbya et al. 2006; Coombs and Hull 1998; Volberda et al. 2021). Path dependency is a historical development trajectory in which past decisions shape and constrain present choices, even if contextual factors have changed and alternative steps could be better (David 2007). It develops well-established patterns of behavior, routines, and know-how resources, and it petrifies dependencies between system elements. Such solutions have functional benefits because they are often efficient, reduce costs or effort, increase returns, and stabilize relationships in the system. However, path-dependent systems cope poorly with novel situations and innovations because they reinforce behaviors consistent with prior developments (Arthur 1994; Cohen and Levinthal 1990).
Academic systems seem to be perfect examples of such institutional path dependencies (Hollingsworth 2008; Krücken 2003). Academia is still largely structured by ideas and infrastructures forged in the nineteenth and early twentieth century, keeping scientific activity firmly within the boundaries of universities, research institutes, and companies (Franzoni and Sauermann 2014; Savage 2021). Despite being considered the cradle of liberal and progressive thinking, social science is ‘remarkably conservative’ in its academic practice (Savage 2021, p. 7), especially with regard to strong disciplinary boundaries and limited cooperation and communication. According to Hirschman (2021), the problems recognized in the economic approach to inequalities are inherent to each conventional knowledge infrastructure. The inertia of the scholarly system produces systematic ignorance that limits novelty. “Past priorities shape existing knowledge infrastructures that in turn channel researcher attention toward some problems and away from others” (Hirschman 2021, p. 742). Narrow theoretical perspectives also limit the contribution to public debate and the practical impact of research because they may ignore some important aspects of societal problems, as in the case of the dominance of the economic perspective in inequality studies (Jackson 2022). Knowledge infrastructure should provide enough room for discovery and innovation and enable breaking path dependencies (Swedberg 2020).
As the pressure to maintain explanatory power and practical applicability increases, social science must accelerate and open knowledge production. Meeting this challenge goes, however, beyond the back-end practices regarding dissemination and verification of results, which are increasingly popularized by the open science movement. Following Friesike et al. (2015), we should also redesign the front end of the research regarding production and innovation processes. Similarly, Altman and Cohen (2022, p. 2) call to “entirely re-engineer the systems of scholarly knowledge creation, dissemination, and discovery” by building a stable knowledge infrastructure that increases access to high-quality data, stimulates broad collaborations, and is open to discovery and innovation.

3 The rise and progress of open-source initiatives

From the perspective of the limitations of conventional knowledge infrastructures, the open-source model (also called crowd-based or networked collaboration) can be considered an interesting organizational alternative for producing knowledge. The model is based on the idea of sharing work in open networks of contributors and disseminating outcomes to a broad community of users. Active participation by contributing to the initiative is voluntary and unpaid. Usage and application of the outcomes are free of charge and do not require any active contributions, and open-source results become a public good.
Open-source cooperation originated as an alternative method to software development in the 1980s, but it gained more attention in the early 2000s with advancements in computer technologies and programming frameworks. Today, Free/Libre Open-Source Software (FLOSS) is a widespread programming solution (Crowston et al. 2012). However, the open-source idea goes beyond software development and can be found in various virtual collaborations that aim to generate knowledge and solutions by involving many external actors. Such collaborations are organized around virtual platforms that integrate people, processes, services, knowledge, resources, and opportunities. This integrative approach can maximize capabilities existing in the broader ecosystem and co-create value and innovations (Abbate et al. 2021; De Falco et al. 2017).
Open-source projects usually involve several groups. As in the traditional operation model, the core development team can initiate development and control critical activities. However, the team’s composition, role, and authority are much more fluid – they arise from bottom-up processes, e.g., as a result of the contributions to the commonly agreed goal, and can vastly differ between projects and change over time (Bonaccorsi and Rossi 2003). There are also passive users (who apply the product yet do not actively contribute to its development) and active users (they may report errors, comment, or request features). A critical role in open-source projects is played by peripheral developers, who temporarily and voluntarily contribute to the product (Crowston et al. 2012; Setia et al. 2012). Although peripheral developers usually have a shorter affiliation with the project than core developers, they contribute significantly to the success of many open-source initiatives (especially in the more mature stages of product development).
Adoption of the open-source model in science has been relatively slow and selective. Open-source scientific initiatives began to appear at a larger scale in the early 2000s, and since then, the amount of research done this way has steadily risen, but mainly in natural sciences, e.g., biology, medicine, ecology, physics, astronomy, and geography (Franzoni et al. 2021; Hucka et al. 2015; Kullenberg and Kasperowski 2016; Pfaff and Hasan 2007). For example, in medicine, the trend toward open-source collaborations has been described as a promising avenue for fast and democratic advancements in drug discoveries (DeLano 2005). Open-source solutions also contributed to biology and biomedicine, including fundamental studies on the genome and DNA (Rai 2005; Singh 2014). Specifically, open-source databases and software have been argued to provide a more coordinated response to complex problems, reduce transaction costs, and offer solutions to licensing difficulties that can significantly restrict access to knowledge in this area.
Open-source software and databases are also very important for environmental studies. This includes frameworks for mapping applications that collect and visualize geographical and spatial data (Lehtonen et al. 2024) and platforms for participatory modeling that allow the co-production of geospatial knowledge in the cloud and facilitate action by engaging stakeholders (White et al. 2023).
The role of open-source software and collaborations was also emphasized in computational chemistry (Pirhadi et al. 2016). One example is the Open Chemistry project, which offers a valuable open-source framework for producing, sharing, and visualizing quantum chemical data (Hanwell et al. 2020). Such initiatives were also considered highly valuable for educational purposes in this field (Lehtola and Karttunen 2022). Gezelter (2015) argues that open source should be the standard practice in chemistry, as it opens unexpected research opportunities, facilitates error correction, enhances reproducibility, and substantially lowers research costs.
However, open-source cooperation still rarely appears in social science (Beck et al. 2022; Firebaugh 2007; Franzoni and Sauermann 2014; Friesike et al. 2014; Gerring et al. 2020; Vazire 2018). When searching for “open source” in the Web of Science portal archives, we find a continuously rising trend in the number of publications. Yet most of the results come from computer science and engineering journals (due to the technical nature of this issue), and the rest are dominated by natural sciences, such as biology, ecology, astronomy, or physics. Many of the records refer to general statistical tools that are also used in social science, e.g., Python and Stan programming languages or R statistical packages. When it comes to broadly defined social science and humanities, the Web of Science query returns merely 1% of all entries (less than 15,000 from the total of 150,000).
Open-source initiatives can be considered a sub-category of a broader group of crowd research, also called crowd science, networked science, or crowdsourcing research (Auspurg and Brüderl 2021; Beck et al. 2022; Franzoni and Sauermann 2014; Uhlmann et al. 2019). Both approaches engage scholars who are not formally linked to cooperate in an open network. While crowd research is defined through the goal of investigating a certain common topic or applying a common research design, open-source focuses specifically on the bottom-up co-development of research tools, code, and entire infrastructure with potential applications for various topics. Crowd research can be considered a next step by using these open infrastructures, but it can also use other data types and tools. For example, Salganik et al. (2020) used scientific mass collaboration of 160 teams to perform the same research task – measuring the predictability of specific life outcomes using the same data but various methods. Another popular form of crowd research is large-scale collaborative replication efforts, such as SCORE (Alipourfard et al. 2021), which aims to assess the credibility of results published in social and behavioral science by engaging hundreds of researchers in distributed tasks, such as reproduction and replication. An example of infrastructure-oriented crowd research that also involves some open-source tools is the Psychological Science Accelerator, which builds a distributed network of laboratories designed to enable and support crowdsourced research projects (Moshontz et al. 2018).

4 The promise of an open-source model in social science

The open-source model can be a promising alternative to conventional ways of cooperation and production of social knowledge, given its specific characteristics and potential (summarized in Table 1). Although relatively new as an organizational scheme, it builds upon long-standing ideas of equal, inclusive, and open communication in science expressed by 20th-century philosophers (Breznau 2021). For example, Habermas (1984) considered open communication as a solution for inequalities in the production and consumption of communication. The open-source model also embodies the vision of science as a collaborative and cumulative process, where researchers benefit from sharing their efforts and contribute to faster and more ambitious scientific progress. As expressed by Popper (1959 [1934]), this is a never-ending, always incomplete process focused on temporary solutions and eliminating errors. Open cooperation also opens ways to innovation. For example, Peirce (1902) emphasized that the context of discovery stimulates open-minded approaches and unbiased conceptual frameworks necessary for explaining reality. Finally, Merton (1973 [1942]) famously argued for the need for communalism, universalism, and organized skepticism, which are often considered a foundation of the contemporary open science movement.
These ideas fit well with today’s knowledge creation, which has become a collaborative enterprise, strongly dependent on virtual research cooperation, computer-supported cooperative work, and distributed research networks (Almaatouq et al. 2021; Aydinoglu 2013; Bullinger-Hoffmann et al. 2021; Raasch et al. 2013; Wuchty et al. 2007). One of the fundamental benefits is that open collaborations allow for a large and diverse base of contributors. This heterogeneity of actors amplifies collective intelligence and creativity, potentially expanding the range of scientific problems that can be addressed (Arza et al. 2018).
Open-source projects are not only a way to accelerate the research process by sharing tasks. Very often, they are the essential requirement for conducting a large-scale project that exceeds the capabilities of any single team. Many modern scientific problems benefit from such collaborations, especially those that are complex, interdisciplinary, and heavily dependent on computer technology and dispersed knowledge (Felin and Zenger 2014; Raasch et al. 2013). Thus, crowd research and open collaborations can improve scientific quality.
Open and flexible infrastructures can enable faster responses to unpredictable challenges with novel ideas and solutions (Aydinoglu 2013). Virtual teams with well-developed communication systems and distributed division of labor create transactive memory systems (Chen et al. 2013), where information and knowledge are allocated, stored, and retrieved collectively. Such systems are characterized by a high absorptive capacity (Cohen and Levinthal 1990), an ability to identify, assimilate, and exploit knowledge from the environment. Virtual collaborations can increase performance by exploiting extended knowledge and resources (Volberda et al. 2021). They may also pursue open innovation by allowing unconstrained inflow and outflow of knowledge to accelerate value creation and build new applications (Chesbrough 2003; Levine and Prietula 2014; Raasch et al. 2013). Studies show that an openly-governed environment with interdisciplinary and diverse teams is more likely to generate innovative outcomes (Dahlander and Gann 2010; Felin and Zenger 2014; Raasch et al. 2013) and high-impact scientific publications (Banal-Estañol et al. 2019).
Furthermore, open-source initiatives can accelerate research, increase efficiency, and stimulate the accumulation of knowledge. For example, code designed for openly available data can be reused by other scientists, decreasing the workload. Importantly, it can also lower barriers to initiating and conducting studies, especially riskier ones with high entry costs (Arza et al. 2018; Franzoni and Sauermann 2014; Jones 2009). Reusing code also makes it easier to spot problems or errors and to correct them faster (Steinhardt et al. 2022). Overall, this aligns with the open-science principles of transparency and reproducibility.
Another important aspect is the relatively low cost of developing and managing open-source infrastructure. Distributed work implies the dispersion of labor costs among teams and organizations. Open-source is also based on the idea of reusing components developed for other purposes (Gezelter 2015). As a result, open-source projects can yield much higher returns on investment than conventional approaches, making them an attractive option for the mostly underinvested science. A broad scientific community organized around a crowd initiative can also have a stronger position in seeking financial funding (Hucka et al. 2015). Although maintenance of the software and platform is required, there are low-cost solutions, such as using open platforms like GitHub. In some applications, maintenance can be financed by applying a business model where part of the software or activities is used to gain revenues. This includes, for example, adding extra paid features (e.g., interface, tailored applications), or paid support, training, and consulting (Gezelter 2015).
From a broader perspective of the knowledge-creation ecosystems (Abbate et al. 2021), the open-source model can allow for shifting from a knowledge-based to a capabilities-based scientific ecosystem. The more conventional knowledge-based ecosystem entails a network of actors and institutions (e.g., research organizations, universities, and for-profit innovators) focusing on generating and sharing knowledge through individual and collaborative research (Abbate et al. 2021; Järvi et al. 2018). Knowledge-based ecosystems can vary in openness, engagement of actors, level of cooperation, diversity of knowledge sources, and many other aspects. For example, broad institutional networks, open data infrastructures, and data hubs can largely contribute to opening knowledge ecosystems. However, open-source collaborations belong to a qualitatively different model of a capability-based ecosystem. The capability-based ecosystem allows going beyond the exchange of knowledge and systematically stimulates value co-creation processes and the generation of new capabilities. Here, open innovation moves upfront as the major goal (although unpredictable and achieved in uncoordinated ways) supported by developmental activities, tools, and services.
To transition from the knowledge exchange to the capacity-generating framework, the ecosystem must be open and capable of breaking path dependencies by non-deterministic development. Drawing upon complexity theory (Benbya et al. 2006; Elder-Vass 2010), the transition can be viewed as acquiring two basic capacities of a complex adaptive system. One is self-organization, which allows for the development of new behavior patterns through interactions between agents who are independent of each other but share some common understanding or rules. The other is the ability to generate emergent properties, defined as qualitatively novel outcomes that develop in uncoordinated interactions and are irreducible to the inputs. Sterman and Wittenberg (1999) argue that openness, dynamics, and adaptability are essential for developing and expanding new paradigms, allowing scientific revolutions, and enhancing the explanatory power of science. For example, the recent advances in artificial intelligence language models (e.g., ChatGPT) can open new and unexpected ways for code-based collaborations. Such tools already perform well in writing and translating code, designing algorithms, or preparing code-based documentation. It is difficult to assess the consequences of the AI revolution for research processes, but the open-source model seems to provide a way to harness the potential of these developments.
Table 1
Main differences between conventional and open-source infrastructure

| Dimension | Conventional infrastructure | Open-source infrastructure |
| --- | --- | --- |
| Scientific ecosystem | Knowledge-based ecosystem; Integrating research actors and institutions; Aimed at generating and sharing knowledge through individual and collaborative research | Capabilities-based ecosystem; Integrating people, processes, services, knowledge, resources and opportunities; Aimed at value co-creation processes and generation of new capabilities |
| Management of knowledge and innovation processes | Institutionalized, top-down decision-making; Large and robust administration; Closed and static governance (authority, property rights, and hierarchies) | Inclusion of bottom-up decision-making; Limited administration and control; Open and dynamic governance (many external linkages, proactive role of managers and contributors) |
| Communication | Communication is regulated, based on hierarchies and power structures | Inclusive, open, and uncoordinated communication |
| Key actors | Developers and users | Developers, peripheral developers, passive users, and active users |
| Ownership of the infrastructure | Strongly regulated, private and public ownership | Weakly regulated, public good, openly shared resources |
| Infrastructure development | Organized around institutions and projects; Central role of key developers (authors, owners, experts) | Collaborative and cumulative process; Strong role of crowd-based development; Based on accumulation of resources, solutions, tools, etc. |
| Collaborations | Dominance of individual and small-team work; Institutional affiliation | Open and broad collaboration; Virtual teams |
| Distribution of knowledge and information | Fragmented, shared between actors | Collective (transactive memory systems); High absorptive capacity for external knowledge |
| Interdisciplinarity | Dominance of uni-disciplinary science; Interdisciplinarity based on well-established conventional approaches | Strong potential for multi- and interdisciplinarity, diversity of topics and applications |
| Novelty and innovation | Limited by the dominance of certain research topics and approaches; Aimed at improvement and verification | Nondeterministic, bottom-up development; Aimed at open innovations and discovery |
| Adaptation and change | Tendency to inertia and path-dependent development; slow adaptation | Fast adaptive capabilities; Ability to break path dependencies |
| Transparency, reproducibility and reliability | Limited transparency and reproducibility; Clear responsibility and ownership; Peer-review correction | Strong transparency and reproducibility; Dispersed responsibility; Crowd-based correction |
| Re-use of resources | Limited and restricted | Strong: based on sharing tasks and resources |
| Costs and risky projects | High costs of initiation and management; High barriers to risky projects | Typically low costs; Low barriers to risky projects |
| Sustainability and uncertainty management | Dependent on financing and institutional context; Stable and predictable development | Less dependence on external financing; Much uncertainty and unpredictability |

5 Potential fields of application

It seems that open-source collaborations may have something to offer to social sciences. Let us now consider several promising areas for implementing this model.

5.1 Computational social science

Computational advancements are a major driving force behind open-source and crowd cooperation in social science. The increasing implementation of computer-based methods, such as machine learning techniques, simulations, natural language processing, data mining, network analysis, and automated text analysis (Edelmann et al. 2020; Hofman et al. 2021; Lazer et al. 2020; Salganik 2017) is dependent on complex programming code and code-based cooperation. Very often, the general programming framework can serve multiple purposes and projects. Programming components and solutions developed for a particular goal can be reused and adjusted by other teams, speeding up the research processes.
A good example is Agent-Based Modelling (ABM), a method that simulates the adaptive behaviors of agents (e.g., individuals) who influence one another and react to the environment (Macy and Willer 2002; Steinbacher et al. 2021). Although ABM is gaining attention for studying emergent systems, collective behaviors, and complex experimentation, it is rarely applied in social science. One of the major obstacles is the technical complexity: ABM programming requires skills and time. Therefore, several open-source platforms have emerged to share and reuse ABM code and modeling frameworks (Devillers et al. 2010; Janssen et al. 2008; Marwick 2016). For example, CoMSES Net, the Network for Computational Modeling in Social and Ecological Sciences (www.comses.net), is an open community of researchers interested in ABM of social and ecological systems. Another initiative is AgentBlocks (Berger et al. 2024), a platform to share, improve, and reuse components for agent-based models.
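To give a sense of what even a very small agent-based model involves, the following Python sketch simulates agents who repeatedly adjust their opinions toward one another. It is an illustrative toy, not code from CoMSES Net or AgentBlocks; the function names, the pairwise-averaging rule, and all parameter values are assumptions chosen for brevity.

```python
import random


def simulate(n_agents=50, n_steps=2000, rate=0.3, seed=1):
    """Toy agent-based model: pairwise opinion averaging (illustrative assumption)."""
    rng = random.Random(seed)
    # Each agent starts with a random opinion between 0 and 1.
    opinions = [rng.random() for _ in range(n_agents)]
    initial = list(opinions)
    for _ in range(n_steps):
        # Two distinct agents meet at random ...
        i, j = rng.sample(range(n_agents), 2)
        # ... and each moves part of the way toward the other's opinion.
        diff = opinions[j] - opinions[i]
        opinions[i] += rate * diff
        opinions[j] -= rate * diff
    return initial, opinions


def spread(values):
    """Distance between the most extreme opinions in the population."""
    return max(values) - min(values)


initial, final = simulate()
print(f"opinion spread before: {spread(initial):.3f}, after: {spread(final):.3f}")
```

No agent is instructed to seek agreement, yet the population drifts toward consensus: a simple instance of a collective pattern arising from local interactions. Realistic models add networks, heterogeneous agents, and environmental feedback, which is where reusable shared components become valuable.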

5.2 Crowd-based experiments and virtual laboratories

Another promising field is virtual collaboration via digital platforms, such as virtual and crowd-based experimentation labs (Beck et al. 2022; De Falco et al. 2017; Hofman et al. 2021; Horton et al. 2011; Mason and Watts 2012). Such platforms allow for collecting experimental data at a scale and pace unavailable in physical laboratories. For example, Salganik et al. (2006) designed an artificial music market with 14,341 participants to study the effects of social influence on individual and collective decision-making in cultural markets.
Although such virtual platforms represent, in principle, the crowd-based research model, they can integrate some open-source solutions. Almaatouq et al. (2021) argue that the new open experimental ecosystems can boost creativity, leading to new types of methods and theories unavailable with conventional approaches. To fully utilize the opportunities of the digital world, they suggest developing a broader virtual lab infrastructure designed as an open, flexible, and modular system, where the research community can easily adapt the technical solutions to run larger, faster, and more complex experiments. A similar idea can be recognized in wiki surveys proposed by Salganik and Levy (2015) as an open and crowd-based survey instrument in which respondents’ answers to open questions are added to the list for further participants. The authors show that such a collaborative and adaptive design can help generate and evaluate ideas.

5.3 Open-source code for secondary data analysis

One of the most promising areas for the open-source model is secondary data analysis, where researchers use existing data collected by others. A re-use of data is very popular in social science, particularly for extensive and costly population surveys. There is also a growing interest in register and administrative data (Connelly et al. 2016). Since preparing and managing such data is technically challenging, code sharing (if allowed by security protocols) can have many advantages and increase efficiency (Fecher et al. 2015). For instance, data preparation (e.g., combining files, cleaning the data, integrating and harmonizing separate surveys) proceeds with a similar workflow regardless of the research topic.
One example of an open-source secondary data analysis initiative is the Gateway to Global Aging Data platform (www.g2aging.org). It provides free resources for harmonizing survey data on aging-related issues and encourages research collaboration and data sharing (Jain et al. 2016). Furthermore, some secondary survey data sources include users’ code repositories that enrich usability and applications (e.g., UK Understanding Society Household Longitudinal Study). In many other cases, researchers share such code directly, e.g., on GitHub or private websites. However, sharing individually created code files does not utilize the potential of open-source collaborations. It lacks procedures for crowd-based development and mostly aims at specific research purposes.

5.4 Survey harmonization example: the story of the Comparative Panel File

The last field I want to consider in detail is ex-post survey harmonization. Since the conventional approaches in this area – discussed previously – are limited in several ways, open-source appears as a suitable alternative. Similarly to secondary data analysis, much of the code-based work in harmonizing surveys can be shared between researchers. Regardless of the research goal and perspective, the basic coding framework is similar and follows the same steps, e.g., integrating data files, identifying similar source variables, and transforming them into target variables.
I will discuss what is probably the first fully open-source survey harmonization project in social science, the CPF (Turek et al. 2021). It was inspired by the limited applicability of the CNEF data: some key variables were harmonized in a way that was not useful for the intended analysis. A necessary correction of the harmonization algorithm would be easy, but the CNEF system does not allow for direct user modifications. Although, over the years, CNEF has introduced various solutions to extend explanatory power (e.g., adding countries and variables) and implement more open communication (Lillard 2023), the entire content is still prepared by the CNEF team (even if inspired by researchers’ needs) and focuses on an economic perspective (Dubrow and Tomescu-Dubrow 2016). This approach has many advantages typical for the well-established conventional infrastructure, including control over the methodology and quality of the final product. However, it also shares some limitations of the centralized and closed model discussed previously, such as limited responsiveness and flexibility in harmonization.
Building on the pioneering developments of CNEF and other conventional harmonization initiatives, CPF, published in December 2020, was an attempt to move the harmonization process into the open-science and crowdsourcing space. In principle, both approaches to harmonization can co-exist, addressing different audiences and allowing for different applications. Using network-based technologies, CPF provides researchers with new tools and possibilities. It is organized as a virtual platform that integrates tools for communication, code development, and general management of scientific research (Fig. 1; for details, see Turek et al. 2021). CPF is freely and openly available harmonization code built from scratch. The code generates a comparative dataset based on the original household panel surveys (which are available for free from national data providers). The procedures integrate datasets and waves within countries, transform input variables into harmonized variables, and merge them into a single dataset. The CPF version 1.5 data file contains over 3 million observations, coming from ca. 400 thousand individuals and covering up to 41 waves. Compared to CNEF (at least currently), CPF offers a different range of variables and more recent samples. The open-source code is organized into multiple lower- and higher-level files. It is stored on GitHub, a popular repository for open-source code that provides tools to develop the code, track and share changes, and integrate them into consecutive versions. Users can modify and add variables, include more recent samples, or add new surveys.
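The three procedural steps named above (integrating waves within countries, transforming input variables into harmonized ones, and merging everything into a single dataset) can be sketched in a few lines of Python. This is a schematic illustration only and does not reproduce the CPF code: the country labels, source variable names, codings, and function names are all invented for the example.

```python
# Hypothetical raw data: country -> survey year -> list of respondent records.
# Variable names and codings are invented for illustration.
RAW = {
    "DE": {
        2019: [{"pid": 1, "sex": 2, "birth_year": 1980}],
        2018: [{"pid": 1, "sex": 2, "birth_year": 1980},
               {"pid": 2, "sex": 1, "birth_year": 1975}],
    },
    "UK": {
        2018: [{"pidp": 7, "gender": "M", "age": 41}],
    },
}


def harmonize_de(rec, year):
    """Map the invented 'DE' source variables onto the target variables."""
    return {"country": "DE", "pid": rec["pid"], "year": year,
            "female": 1 if rec["sex"] == 2 else 0,
            "age": year - rec["birth_year"]}


def harmonize_uk(rec, year):
    """Map the invented 'UK' source variables onto the same target variables."""
    return {"country": "UK", "pid": rec["pidp"], "year": year,
            "female": 1 if rec["gender"] == "F" else 0,
            "age": rec["age"]}


RULES = {"DE": harmonize_de, "UK": harmonize_uk}


def build_comparative_file(raw, rules):
    """Return one long-format dataset with identical variables for all countries."""
    rows = []
    for country, waves in raw.items():
        for year in sorted(waves):          # step 1: integrate waves within a country
            for rec in waves[year]:
                rows.append(rules[country](rec, year))  # step 2: transform variables
    return rows                             # step 3: single merged dataset


rows = build_comparative_file(RAW, RULES)
for row in rows:
    print(row)
```

Because the country-specific rules are kept separate from the merging logic, a user who disagrees with one harmonization decision (say, how age is derived) can change that rule alone and regenerate the dataset.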
Although CPF shares the same goal as its predecessor and focuses on the same datasets, the novel open-source framework and tools may contribute to comparative social science in several ways. At the basic level, CPF’s open-source code can save weeks or months of harmonization work. All household panel studies included in the CPF are extensively used in research, and the comparative potential added by the CPF can only extend the utility of these surveys. Importantly, the code is also helpful when working with data from only one country.
The open-source model facilitates new applications and extends researchers’ flexibility. Compared to top-down initiatives, CPF allows for more open management, unconstrained development, and better responsiveness to researchers’ needs. The open-source format allows engaging the crowd wisdom (Beck et al. 2022) to boost creativity and extend the CPF code for new applications. Peripheral developers can directly contribute to the code through GitHub, and active users can share their ideas or suggestions. The modularity of the code development process, i.e., decomposition of a complex harmonization into more manageable and independent tasks (e.g., adding new variables), allows division of labor and parallel work. Once the coding framework that organizes the most technical and time-consuming aspects of harmonization (such as preparing and combining the source data files) is provided, researchers can focus on lower-scale tasks. Most of the distributed coding tasks in harmonization can be classified as low complexity and well-structured, according to the terminology of Franzoni and Sauermann (2014). This means that tasks tend to be independent, and contributors can work in parallel. Such tasks refer to the most important input for the CPF, i.e., adding new variables and developing small parts of the code. They are organized by microtask workflows that instruct how to proceed. However, CPF’s development can also involve highly complex and ill-structured tasks, such as adding new surveys or changing larger structures in the code. In this case, obtaining the final solution requires developing a shared understanding of the goal and approach, sequential cooperation, and coordinated verification of the changes.
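The idea of decomposing harmonization into independent, low-complexity tasks can be illustrated with a small registry pattern in Python. The sketch below is a hypothetical design, not the way CPF is implemented: the registry, the decorator, and the source variables `sex` and `emp_status` with their codings are assumptions made for the example.

```python
# Registry of harmonization rules: target variable name -> function(raw_record) -> value.
HARMONIZERS = {}


def target_variable(name):
    """Decorator that registers one self-contained rule for one target variable."""
    def register(func):
        if name in HARMONIZERS:
            raise ValueError(f"target variable {name!r} is already defined")
        HARMONIZERS[name] = func
        return func
    return register


# Contributor A's microtask: one new variable, independent of all others.
@target_variable("female")
def female(rec):
    # Assumed source coding: sex == 2 means female.
    return 1 if rec.get("sex") == 2 else 0


# Contributor B's microtask, written in parallel without touching A's code.
@target_variable("employed")
def employed(rec):
    # Assumed source coding: emp_status 1 or 2 means in employment.
    return 1 if rec.get("emp_status") in {1, 2} else 0


def harmonize(rec):
    """Apply every registered rule to one raw record."""
    return {name: func(rec) for name, func in HARMONIZERS.items()}


print(harmonize({"sex": 2, "emp_status": 3}))
```

Each rule can be written, reviewed, and corrected on its own, which is what makes parallel contributions feasible. Tasks that change the shared framework itself, such as adding a new survey, do not decompose this way and need the sequential, coordinated work described above.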
Furthermore, the open and dynamic design supports technical and substantive solutions for harmonization dilemmas. Conventional projects put much emphasis on developing a unified and reliable harmonization methodology because they provide ready-to-analyze integrated datasets (Tomescu-Dubrow et al. 2023). Instead of top-down and ultimate solutions to comparability, CPF’s open-source model explores a very different approach where researchers have complete control over the harmonization process, yet they are also responsible for the quality and outcomes.
Inappropriate and erroneous harmonization is one of the major risks of open-source code development. The main solutions to this are based on transparency and version control, which facilitate error detection and code improvement. GitHub allows integration of the distributed code into consecutive official versions, providing version control. CPF is also integrated with Open Science Framework, one of the most popular open science platforms that facilitate collaborative workflow on research projects, pre-registering studies, storing code and data, etc. With permanent identifiers and continuous access to all versions of the data and documentation, the design stimulates transparency and reproducibility of research. However, although the highest quality can be achieved in open-source projects, it is not guaranteed. The coordination team can play an important role in quality control, but the basic correcting mechanisms are crowd-based, and can also be more efficient in this task (Arza et al. 2018).
These aspects related to the quality and methodology of harmonization are crucial for assessing the applicability and risks of open and centralized harmonization models. The two approaches strike a different balance between quality control and efficiency of development. In the short term, open-source harmonization focuses on efficiency in developing new code, at the cost of the quality being initially dependent on the contributor. This potentially allows for many more errors. However, crowd-based mechanisms are expected to detect these errors and lead to quick improvements if needed (the more users, the better the quality). Centralized harmonization, on the other hand, can prioritize quality control based on internal expertise. However, with less emphasis on, and fewer tools for, external control, correcting errors (as well as any new developments) can take longer. Put differently, the open approach can quickly generate code for more applications that the user must first test, while the conventional approach provides tested and trusted data with somewhat limited applicability. As such, the two models answer different research needs.
As a crowd-based cooperation, CPF can also be independent of administrative and institutional constraints of conventional projects. The model can potentially improve the efficiency of harmonization projects and lower the costs and time required for comparative research. The cost of the CPF was substantially lower than the cost of most data harmonization initiatives while providing comparable results. For example, the cost of building the Consortium of Household Panels for European Socio-economic Research (CHER) between 2000 and 2003 exceeded one million Euros (Dubrow and Tomescu-Dubrow 2016). As a comparison, the first published version of the CPF cost about 20 times less (although, to be fair, technological advancements can also help to reduce the cost of conventional infrastructure nowadays). Similar advantages are recognized in the case of virtual labs that highly decrease development costs and time, resulting in lower investment risks (Almaatouq et al. 2021).
Open-source initiatives, such as CPF, appear and diffuse because such solutions are needed. However, they will develop and mature only if they are useful for scientific research. CPF is still in its initial stage but has already been recognized as a contribution to the research infrastructure in social science. It received a positive response from data providers, scholars, and research institutions. The interest was substantial in the first two years after the publication, with ca. 10,000+ site views from 100+ countries, 30,000+ social media interactions, and 6,400 views of the main article. A baseline evaluation criterion for measuring the success of open-source initiatives in science is the application of the infrastructure in research practice, e.g., in publications. Conventional harmonization initiatives (e.g., CNEF) were very successful in this regard, leading to many publications (Tomescu-Dubrow et al. 2023). For CPF, such evaluation is difficult because three years might be too early for the lengthy publication process in science, yet several articles have already been published (Thielemans and Mortelmans 2022; Turek et al. 2022; van Wijk and Billari 2024).

6 Challenges and limitations: towards the coordinated open-source model

So far, the article has focused more systematically on the advantages of the open-source model, but some serious challenges and limitations must be addressed. Studies on open-source projects over the past few decades show that abandonment and termination of such initiatives are not uncommon (Avelino et al. 2019). I will organize the discussion around three central and critical goals of open-source initiatives: stimulation of crowd-based development, appropriate technical infrastructure, and assurance of long-term sustainability (Table 2).

6.1 Crowd-based contribution and development

The key criterion for measuring the success of an open-source model is the active involvement of the academic community in code development. Sharing analytical code for publicly available secondary data sources is usually regarded as beneficial for stimulating and advancing research, but active participation in such practices is less common (Fecher et al. 2015; Linåker and Regnell 2020; Scheliga et al. 2018; Steinhardt et al. 2022; Zuiderwijk et al. 2024). According to Freese (2007), researchers may hesitate to release their code to the public because the effort required and the perceived risks often outweigh the potential individual benefits. Specifically, they may be reluctant to allow others to benefit from their work. Researchers or teams experienced with a particular dataset may see it as a competitive advantage for future projects. They can also worry that discovering errors in their code might negatively affect their careers. Moreover, original ideas or solutions can be stolen or used without authorization or proper credit. Another limitation is a lack of direct stimuli to contribute to voluntary work for the broader community. There is always a cost associated with communication (Baldwin and Clark 2006). In a survey of the users of the German Socio-Economic Panel (SOEP) data, the most common reason given for not sharing data and code was simply “too much effort” (Fecher et al. 2015).
This type of situation, where the group benefits and individual risks of collaboration are difficult to balance, is well known to social scientists as the collective action problem or free-riding problem (Baldwin and Clark 2006; Olson 1965). While cooperation would benefit the entire community, individualistic motives and conflicting objectives often discourage collective efforts. Open-source collaborations have sought two kinds of solutions to this problem: organizational and technical.
Organizational solutions focus on supporting community building and the active involvement of peripheral developers (Bonaccorsi and Rossi 2003; Fecher et al. 2015; Franzoni and Sauermann 2014; Matei and Irimia 2014; Shah 2006). This can be done by stimulating intrinsic motivation (e.g., by gamification, supporting community commitment, reputation, or reciprocity norm), raising interest (e.g., by providing access to materials or outcomes), or facilitating formal recognition (e.g., citations or increasing career prospects). For instance, a study of GitHub open-source projects showed that personal and professional needs were the primary motivations to contribute (Avelino et al. 2019). In a systematic literature review on the barriers to contributing to open-source software projects, Steinmacher et al. (2015) find that newcomers are often discouraged by a lack of social interactions with project members that would enable better socialization and identification with the initiative.
In the case of scientific applications, it seems that the necessary incentive structure should be built on two aspects. The first one considers tools and solutions for recognizing and protecting open-source contributions. The critical stimulus is a formal acknowledgment as a co-author of a research product (e.g., of a code, software, or documentation). The GitHub system of monitoring all modifications and linking them to specific contributors can be helpful but insufficient. It may be supported by registering and promoting scientific output based on open-source resources. Some propositions exist and are already occasionally implemented, e.g., the CRediT standard for documenting contributions (Holcombe et al. 2020). Still, some contributions to open-source initiatives are too small, too complex, or leave little trace in repositories.
Another vital issue is the protection of intellectual property, a recognized concern for open-source innovators and creators, especially in fields where scientific solutions can also be applied commercially, such as bioinformatics (Singh 2014) or medicine (DeLano 2005). Various regulations and protective methods have been discussed and implemented in open science and open source, including copyrights, licensing, patents, and citation rules for software and code (DeLano 2005), but the issue is still complex and unregulated due to the crowd-based nature of the products (Gorbatyuk et al. 2016). Nevertheless, in many research fields, the risks or consequences of intellectual theft are relatively low. For instance, contributing code that has already been published in open science materials (which is becoming a common practice in social science) does not bring many more risks.
The second aspect relates to the broader culture and norms in the scientific community. As mentioned before, many areas of social science highly value individual work or small-team cooperation. Cultural and normative change can be facilitated by the promotion of specific tools and solutions (e.g., for recognition and protection) by the scientific community, e.g., in hiring, promotion, tenure, and funding decisions. The open-source literature also points out social motives as essential incentives to contribute (Okoli and Oh 2007; Oreg and Nov 2008; Trinkenreich et al. 2020). Gaining status and respect in the community, networking, and building social capital can be very useful in academic careers, e.g., in building grant consortia. One way to support these processes is by organizing networks around specific open-source initiatives. Some fields, e.g., agent-based modeling, pay considerable attention to such contributions, making these roles visible and well-recognized.
Table 2
Challenges, limitations, and potential solutions for scientific open-source initiatives

| Key success factors of open-source initiatives | Key limitations | Priorities | Potential solutions |
| --- | --- | --- | --- |
| Crowd-based contribution and development | Lack of engagement of peripheral developers; Effort and risks of sharing work; Free-riding and collective action problem | Community building; Appropriate incentive structure; Changing scientific culture and norms; Protecting intellectual property | Increasing intrinsic motivation, e.g., gamification, reputation or reciprocity norm; Raising interest, e.g., by providing access to materials or outcomes; Supporting community commitment and identification, e.g., by socialization opportunities; Formal acknowledgment of open-source contributions in hiring, promotion, tenure, and funding decisions; Regulating contributions to large-team projects; Registering and promoting scientific output; Recognition of the contributing roles within networks and communities |
| Technical quality of the infrastructure | Technical difficulties in usage and contribution; Poor integration of contributions; Lack of transparency | Quality of infrastructure; Documentation and support; Streamlining bottom-up developments | Clear workflows and examples (e.g., videos, walk-through examples); Modularization, e.g., micro-tasks; Continuous, coordinated verification and integration |
| Long-term sustainability | Uncertainty about the speed, financing, continuity and direction of development; Uncertainty about the quality of outcomes and users’ interest; Coordination problems | Assuring daily functioning; Stimulation and structuration of bottom-up contributions; Integration with the scientific ecosystem; Expertise coordination | Stable core coordination team with limited goals; Balance between top-down and bottom-up processes; Flexible management related to projects’ stage and current challenges; Assuring linkages with conventional infrastructure, e.g., grants and research projects; Promotion and dissemination |

6.2 Appropriate technical infrastructure

Technical solutions to the collective action problem focus on improving the quality of the knowledge infrastructure (Freese and King 2018). While appropriate solutions can support easy contribution and continuous integration, design flaws can make researchers reluctant to be transparent and raise barriers to code and data sharing (Avelino et al. 2019; Gerring et al. 2020; Zuiderwijk et al. 2024). For example, Baldwin and Clark (2006) argue that a more modular and flexible codebase architecture can stimulate contributors’ engagement and mitigate the free-riding problem.
A complete open-source infrastructure builds on the engagement of peripheral developers. Such infrastructure must provide functional components that are modular, interoperable, and reusable (Almaatouq et al. 2021). The open-source software literature offers suggestions for successfully streamlining bottom-up processes, e.g., microtask workflows that modularize and pre-specify goals and actions, sequential cooperation, parallel and independent development paths, clear workflows for highly complex tasks, and coordinated verification (Franzoni and Sauermann 2014; Valentine et al. 2017). These goals should be supported by clear instructions and examples, e.g., videos or walk-through examples.
In the case of CPF, community-based input has so far been limited but not negligible. The project has received several substantial contributions from active users (e.g., error detection) and external developers (e.g., pieces of code, detailed suggestions). The authors have also initiated larger collaborations aimed at extensive developments. However, one concern is the usability of the GitHub environment for crowd-based code development. It appears challenging for many users, who prefer sharing ideas by email rather than introducing them directly into the code. In particular, more demanding tasks (like adding new countries) appear too complex for purely crowd-based cooperation. Alternative solutions for active development and technical improvements to website usability should be considered.

6.3 Sustainability, expertise and coordination

Another major challenge for open-source projects is their long-term sustainability. Even though design, functionality, and community are important, they do not guarantee that the initiative will continue. Open-source projects are much more uncertain than institutionalized initiatives, so long-term sustainability remains a significant risk factor. In contrast to conventional projects, financing plays a smaller (though not negligible) role here because open-source initiatives are less costly. A key sustainability factor is coordination.
The idea of “coordination” of open-source collaborations is not straightforward because decision-making processes have a predominantly bottom-up, self-organizing, and decentralized character (Bonaccorsi and Rossi 2003; Setia et al. 2012). Nevertheless, open-source projects also require a management framework and leadership. While a rigid management style may harm collaboration and limit the unique values of virtual research collaborations, some general leadership is required (Aydinoglu 2013; Duparc et al. 2022; Felin and Zenger 2014; Matei and Irimia 2014; Volberda et al. 2021). Studies on information and knowledge-based systems suggest that such systems fail if the growing complexity of the environment is not managed appropriately (Benbya et al. 2006). Ongoing coordination of the core management processes is especially vital for scientific applications, where the contributor base is relatively limited, expertise is dispersed and diverse, and the links to the academic environment are complex.
Effective coordination within complex adaptive systems, such as open-source crowd-science initiatives, hinges on harmonizing top-down and bottom-up processes. To begin with, coordination is necessary to ensure the daily functioning of open-source platforms and to resolve technical issues. The core management team can also initiate, stimulate, and structure contributions (Bonaccorsi and Rossi 2003). For example, it may provide positive feedback to bottom-up, self-organizing developments, facilitating new functionalities and structures (Duparc et al. 2022). Notably, stronger coordination may be necessary for more complex, open-ended tasks where pre-specification and modularization are difficult (Valentine et al. 2017). For instance, in the CPF, the coordinating team has led most of the development so far (partly based on users’ suggestions and input). Yet stimulating external developers and integrating them into the development process at a larger scale is fundamental to success (Duparc et al. 2022; Setia et al. 2012).
Moreover, top-down management is crucial for navigating the scientific environment and executing strategic moves. The success of scientific open-source initiatives greatly relies on their integration into the broader scientific ecosystem. This requires active promotion, collaboration with key actors, and funding acquisition, which are difficult to do through crowd-based collaborations.
Another vital issue is expertise coordination. As Scheliga et al. (2018) argue, quality assurance is pivotal to the success of open-source initiatives. Dispersed and diverse expertise may cause problems and even contribute to their failure (Faraj and Sproull 2000; Pfaff and Hasan 2007; Poor 2020). For example, Nupedia, the predecessor to Wikipedia, was abandoned largely due to the complicated process of article reviewing (Rosenzweig 2006). Wikipedia succeeded by implementing a new open knowledge management system that reduced the review and editing time. Expertise coordination, especially in scientific applications, relates primarily to the quality and credibility of outcomes (Franzoni and Sauermann 2014; Friesike et al. 2014). The conventional, expert-based model, where hierarchies, procedures, and prestige serve as gatekeeping mechanisms for quality outcomes, has been most natural for science. For example, the centralized harmonization model, such as in the CNEF project, is based on internal expertise, allowing for centralized control over the final harmonized database (Lillard 2023). Reliance on open-source mechanisms may raise concerns about the quality of outcomes, and crowd science must still meet rigorous scientific standards. However, quality does not have to be at risk here; it is achieved and reviewed in different ways. In contrast to the data-oriented approach of CNEF, the open-source harmonization in CPF is oriented toward the code and the process. This approach opens the way for various modifications in the methodology, allowing alternative solutions to co-exist (e.g., users have tools to change the harmonization code). There are three major ways to identify incorrect solutions (e.g., due to an error or inappropriate methodology). First, crowd-based cooperation allows errors in the core version of the open-source code to be identified. Appropriate feedback and correction mechanisms are thus crucial (Scheliga et al. 2018).
Second, the open-source model transfers the expertise requirement to the final user, who must eventually take responsibility for the quality. Errors can be identified in typical ways, as in any other peer-reviewed research, whether during the review or post-publication verification. Open access to the code facilitates this, especially in the case of deviations from the core version of the harmonization methodology. Third, the coordinated open-source model also implements some amount of conventional and centralized expertise control (which may be necessary for scientific applications). For example, the core team of CPF supervises the core code development, ensuring quality through its own activities (e.g., testing) and by collecting user input. Overall, expertise coordination is one of the crucial challenges for applying the open-source model in science, but the three mechanisms allow for achieving and verifying quality while leaving additional room for flexibility.
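The interplay of a core version, user-level modifications, and centralized verification can be illustrated with a minimal Python sketch. It is not taken from CPF: the variable name `school_years`, the coding thresholds, and all function names are hypothetical and serve only to show how alternative harmonization rules might co-exist with a core version that a coordinating team tests.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A harmonization rule maps one raw survey record to a harmonized value.
Rule = Callable[[dict], Optional[int]]


def core_education(record: dict) -> Optional[int]:
    """Hypothetical core rule: collapse years of schooling into three levels."""
    years = record.get("school_years")
    if years is None:
        return None  # keep missing values missing
    if years < 10:
        return 1  # low
    if years < 14:
        return 2  # medium
    return 3  # high


# Core version of the rules, maintained by the coordinating team.
CORE_RULES: Dict[str, Rule] = {"education": core_education}


def harmonize(records: List[dict],
              overrides: Optional[Dict[str, Rule]] = None) -> List[dict]:
    """Apply the core rules; user-supplied overrides replace them by name."""
    rules = {**CORE_RULES, **(overrides or {})}
    return [{name: rule(rec) for name, rule in rules.items()} for rec in records]


def verify(rules: Dict[str, Rule],
           checks: List[Tuple[str, dict, Optional[int]]]) -> List[str]:
    """Centralized check: run known test cases and report every mismatch."""
    failures = []
    for name, record, expected in checks:
        got = rules[name](record)
        if got != expected:
            failures.append(f"{name}: expected {expected}, got {got}")
    return failures


def user_education(record: dict) -> Optional[int]:
    """Hypothetical user modification: a two-level coding instead of three."""
    years = record.get("school_years")
    return None if years is None else int(years >= 12)


# Test cases the core team might keep for the core version.
CHECKS = [
    ("education", {"school_years": 9}, 1),
    ("education", {"school_years": 12}, 2),
    ("education", {"school_years": 16}, 3),
]

raw = [{"school_years": 8}, {"school_years": 12}, {"school_years": 16}, {}]
core = harmonize(raw)                                 # core methodology
alt = harmonize(raw, {"education": user_education})   # user's alternative
core_failures = verify(CORE_RULES, CHECKS)
```

In this sketch, the deviation from the core version is explicit in the code (`user_education`), so a reviewer or reader can see exactly where an analysis departs from the core methodology, while `verify` stands in for the testing done by a coordinating team.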
Management literature has recently shown a growing interest in the coordination of open-source initiatives and, more broadly, in managing knowledge development and stimulating innovation. The theoretical perspective on this topic appears to be shifting from static towards more dynamic models, which are more suitable for open innovation and knowledge exchange (Duparc et al. 2022; Felin and Zenger 2014; Volberda et al. 2021). Static models focus on more traditional, closed governance based on authority, property rights, and strong hierarchies. Dynamic models build upon a larger number of external linkages, entail some forms of open governance, and emphasize the proactive role of managers in shaping strategies for knowledge development and innovation (e.g., managerial agency theory). For example, Felin and Zenger (2014) distinguish three variants of open governance. The “markets/contracts” variant is based on a centrally controlled system of open knowledge transfer aimed at completed solutions and governed by explicit contracts. The “partnerships/alliances” model allows for more open knowledge exchange and numerous communication channels within a diverse network of cooperators. The last variant of open governance is based on the “user community” that generates solutions and manages the initiative.
Which type of governance is best suited to open-source initiatives is an open debate. Most likely, it depends on the application and stage of the project. An essential disadvantage of purely community-based governance is the limited control over innovation and development (Felin and Zenger 2014). Bottom-up governance may take time and require complex negotiations, which can be risky for early-stage projects, especially with a diverse user community. Thus, central, top-down coordination can benefit open-source initiatives at certain stages. However, it should not dominate the management structure as it could block all bottom-up, non-deterministic open innovations. Eventually, the goal of open-source virtual collaborations is to create an ecosystem of capabilities (Abbate et al. 2021), where sharing internal and external knowledge allows co-creation of novel solutions that exceed the potential of particular actors. Finding the balance between the strength and amount of coordination seems crucial for the success of open-source initiatives, especially in novel applications in social science. Given the limited evidence, we can only expect that more coordination is needed for the early stages of development, and it should be reduced (but not eliminated) once the bottom-up processes are activated.

7 Conclusions

Open-source collaborations are still rarely encountered in social research; however, they present an attractive organizational scheme. Given the rapid social changes and growing complexity of knowledge production, systemic constraints remain regarding the quality, innovativeness, efficiency, and speed of knowledge production in social science. The history of inequality studies revealed the limitations and malfunctions of conventional, institutionalized academia, such as narrow approaches, path-dependent development, and slow reactions to social problems. By contrast, the crowd-based response to the pandemic showed a much more responsive and dynamic side of science (Altman and Cohen 2022; Callaway 2020).
I have argued that social science can stimulate cooperation and development by adopting solutions to open the knowledge infrastructure. In particular, I focused on the coordinated open-source model. As shown with the examples of survey harmonization initiatives, the open-source model can potentially be applied in certain areas of the increasingly computational and coding-dependent social science, serving as a valuable addition to (not a replacement for) conventional research infrastructure. Flexibility, decentralized control, and community-based development facilitate breaking down path dependencies, open possibilities for innovation, and help respond to societal challenges. The model also aligns with the ideas of open science, such as reproducibility, transparency, and accessibility, which can stimulate more robust and impactful findings and appeal to funding agencies. However, the lessons learned from non-scientific open-source initiatives, e.g., software programming, point to various risks related to technical, organizational, and coordination aspects. Moreover, the conventional and expert-based model is often advantageous and irreplaceable, especially in applications where quality is prioritized over speed and creativity. Nevertheless, given the high potential and low costs, it is worth experimenting with open-source applications and evaluating their effects.

Declarations

Declarations of interest

None.
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References
Zurück zum Zitat Almaatouq, A., Becker, J.A., Bernstein, M.S., Botto, R., Bradlow, E., Damer, E., Duckworth, A.L., Griffiths, T., Hartshorne, J.K., Lazer, D., Law, E., Liu, M., Matias, J.N., Rand, D.G., Salganik, M.J., Satlof-Bedrick, E., Schweitzer, M., Shirado, H., Suchow, J.W., Yin, M.: Scaling up experimental social, behavioral, and economic science. Preprint: OSF https://doiorg. (2021). https://doi.org/10.17605/OSF.IO/KNVJSCrossRef Almaatouq, A., Becker, J.A., Bernstein, M.S., Botto, R., Bradlow, E., Damer, E., Duckworth, A.L., Griffiths, T., Hartshorne, J.K., Lazer, D., Law, E., Liu, M., Matias, J.N., Rand, D.G., Salganik, M.J., Satlof-Bedrick, E., Schweitzer, M., Shirado, H., Suchow, J.W., Yin, M.: Scaling up experimental social, behavioral, and economic science. Preprint: OSF https://​doiorg.​ (2021). https://​doi.​org/​10.​17605/​OSF.​IO/​KNVJSCrossRef
Zurück zum Zitat Anderson, P.W.: More is different: Broken symmetry and the nature of the hierarchical structure of science. Science. 177(4047), 393–396 (1972)CrossRef Anderson, P.W.: More is different: Broken symmetry and the nature of the hierarchical structure of science. Science. 177(4047), 393–396 (1972)CrossRef
Zurück zum Zitat Arthur, W.B.: Increasing Returns and Path Dependence in the Economy. University of Michigan Press (1994) Arthur, W.B.: Increasing Returns and Path Dependence in the Economy. University of Michigan Press (1994)
Zurück zum Zitat Avelino, G.A., Constantinou, E., Valente, M.T., Serebrenik, A.: On the abandonment and survival of open source projects: an empirical investigation. Proceedings – 13th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, IEEE Computer Society. (2019) Avelino, G.A., Constantinou, E., Valente, M.T., Serebrenik, A.: On the abandonment and survival of open source projects: an empirical investigation. Proceedings – 13th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, IEEE Computer Society. (2019)
Zurück zum Zitat Berger, U., Bell, A., Barton, C.M., Chappin, E., Dreßler, G., Filatova, T., Fronville, T., Lee, A., van Loon, E., Lorscheid, I., Meyer, M., Müller, B., Piou, C., Radchuk, V., Roxburgh, N., Schüler, L., Troost, C., Wijermans, N., Williams, T.G., Grimm, V.: Towards reusable building blocks for agent-based modelling and theory development. Environ. Model. Softw. (2024)., 175 https://doi.org/10.1016/j.envsoft.2024.106003CrossRef Berger, U., Bell, A., Barton, C.M., Chappin, E., Dreßler, G., Filatova, T., Fronville, T., Lee, A., van Loon, E., Lorscheid, I., Meyer, M., Müller, B., Piou, C., Radchuk, V., Roxburgh, N., Schüler, L., Troost, C., Wijermans, N., Williams, T.G., Grimm, V.: Towards reusable building blocks for agent-based modelling and theory development. Environ. Model. Softw. (2024)., 175 https://​doi.​org/​10.​1016/​j.​envsoft.​2024.​106003CrossRef
Zurück zum Zitat Burkhauser, R.V., Butrica, B.A., Daly, M.C., Lillard, D.R.: The cross-national Equivalent file: A product of cross-national research. In: Becker, I., Ott, N., Rolf, G. (eds.) Social Insurance in a Dynamic Society. Campus Fachbuch (2001) Burkhauser, R.V., Butrica, B.A., Daly, M.C., Lillard, D.R.: The cross-national Equivalent file: A product of cross-national research. In: Becker, I., Ott, N., Rolf, G. (eds.) Social Insurance in a Dynamic Society. Campus Fachbuch (2001)
Zurück zum Zitat Callaway, E.: Will the pandemic permanently alter scientific publishing? Nature. 582, 167–168 (2020)CrossRef Callaway, E.: Will the pandemic permanently alter scientific publishing? Nature. 582, 167–168 (2020)CrossRef
Zurück zum Zitat Chesbrough, H.: Open Innovation: The New Imperative for Creating and Profiting from Technology. Harvard Business (2003). Vol. Harvard Chesbrough, H.: Open Innovation: The New Imperative for Creating and Profiting from Technology. Harvard Business (2003). Vol. Harvard
Zurück zum Zitat Cohen, W., Levinthal, D.: Absorptive capacity - a new perspective on learning and innovation. Adm. Sci. Q. 30(1), 128–152 (1990)CrossRef Cohen, W., Levinthal, D.: Absorptive capacity - a new perspective on learning and innovation. Adm. Sci. Q. 30(1), 128–152 (1990)CrossRef
Zurück zum Zitat Coombs, R., Hull, R.: Knowledge management practices’ and path-dependency in innovation. Res. Policy. 28, 237–253 (1998)CrossRef Coombs, R., Hull, R.: Knowledge management practices’ and path-dependency in innovation. Res. Policy. 28, 237–253 (1998)CrossRef
Zurück zum Zitat Doiron, D., Raina, P., Raina, P., L’Heureux, F., Fortier, I.: Facilitating collaborative research: Implementing a platform supporting data harmonization and pooling. Norsk Epidemiologi. 21(2), 221–224 (2012)CrossRef Doiron, D., Raina, P., Raina, P., L’Heureux, F., Fortier, I.: Facilitating collaborative research: Implementing a platform supporting data harmonization and pooling. Norsk Epidemiologi. 21(2), 221–224 (2012)CrossRef
Zurück zum Zitat Edwards, P.N., Jackson, S.J., Chalmers, M.K., Bowker, G.C., Borgman, C.L., Ribes, D., Burton, M., Calvert, S.: Knowledge Infrastructures: Intellectual Frameworks and Research Challenges. Deep Blue. http://hdl.handle.net/2027.42/97552 (2013) Edwards, P.N., Jackson, S.J., Chalmers, M.K., Bowker, G.C., Borgman, C.L., Ribes, D., Burton, M., Calvert, S.: Knowledge Infrastructures: Intellectual Frameworks and Research Challenges. Deep Blue. http://​hdl.​handle.​net/​2027.​42/​97552 (2013)
Zurück zum Zitat Elder-Vass, D.: The Causal Power of Social Structures: Emergence, Structure and Agency. Cambridge University Press (2010) Elder-Vass, D.: The Causal Power of Social Structures: Emergence, Structure and Agency. Cambridge University Press (2010)
Zurück zum Zitat Firebaugh, G.: Replication Data sets and favored-hypothesis Bias. Sociol. Methods Res. 36(2), 200–209 (2007)CrossRef Firebaugh, G.: Replication Data sets and favored-hypothesis Bias. Sociol. Methods Res. 36(2), 200–209 (2007)CrossRef
Zurück zum Zitat Franck, R.E.: The Explanatory Power of Models: Bridging the Gap between Empirical and Theoretical Research in the Social Sciences. Springer Science & Business Media (2002) Franck, R.E.: The Explanatory Power of Models: Bridging the Gap between Empirical and Theoretical Research in the Social Sciences. Springer Science & Business Media (2002)
Zurück zum Zitat Fraser, N., Brierley, L., Dey, G., Polka, J., Pálfy, M., Nanni, F., Coates, J.: The evolving role of preprints in the dissemination of COVID-19 research and their impact on the science communication landscape. PLoS Biol. 19(4), e3000959 (2021). https://doi.org/10.5281/zenodoCrossRef Fraser, N., Brierley, L., Dey, G., Polka, J., Pálfy, M., Nanni, F., Coates, J.: The evolving role of preprints in the dissemination of COVID-19 research and their impact on the science communication landscape. PLoS Biol. 19(4), e3000959 (2021). https://​doi.​org/​10.​5281/​zenodoCrossRef
Zurück zum Zitat Freese, J.: Replication standards for quantitative social science: Why not sociology? Sociol. Methods Res. 36(2), 153–172 (2007)CrossRef Freese, J.: Replication standards for quantitative social science: Why not sociology? Sociol. Methods Res. 36(2), 153–172 (2007)CrossRef
Zurück zum Zitat Frick, J., Jenkings, S.P., Lillard, D.R., Lipps, O., Wooden, M.: The Cross-National Equivalent File (CNEF) and Its Member Country Household Panel Studies. EconStor Open Access Articles, ZBW - Leibniz Information Centre for Economics, 627–654. (2007) Frick, J., Jenkings, S.P., Lillard, D.R., Lipps, O., Wooden, M.: The Cross-National Equivalent File (CNEF) and Its Member Country Household Panel Studies. EconStor Open Access Articles, ZBW - Leibniz Information Centre for Economics, 627–654. (2007)
Zurück zum Zitat Gerring, J., Mahoney, J., Elman, C.: In: Elman, C., Gerring, J., Mahoney, J. (eds.) The Production of Knowledge: Enhancing Progress in Social Science. Cambridge University Press (2020). https://doi.org/DOI Gerring, J., Mahoney, J., Elman, C.: In: Elman, C., Gerring, J., Mahoney, J. (eds.) The Production of Knowledge: Enhancing Progress in Social Science. Cambridge University Press (2020). https://​doi.​org/​DOI
Zurück zum Zitat Habermas, J.: Theory of Communicative Action: Reason and the Rationalization of Society. Becon (1984) Habermas, J.: Theory of Communicative Action: Reason and the Rationalization of Society. Becon (1984)
Zurück zum Zitat Hanwell, M.D., Harris, C., Genova, A., Haghighatlari, M., Khatib, E., Avery, M., Hachmann, P., J., de Jong, W.A.: Open Chemistry, JupyterLab, REST, and quantum chemistry. Int. J. Quantum Chem. 121(1) (2020). https://doi.org/10.1002/qua.26472 Hanwell, M.D., Harris, C., Genova, A., Haghighatlari, M., Khatib, E., Avery, M., Hachmann, P., J., de Jong, W.A.: Open Chemistry, JupyterLab, REST, and quantum chemistry. Int. J. Quantum Chem. 121(1) (2020). https://​doi.​org/​10.​1002/​qua.​26472
Zurück zum Zitat Hofman, J.M., Watts, D.J., Athey, S., Garip, F., Griffiths, T.L., Kleinberg, J., Margetts, H., Mullainathan, S., Salganik, M.J., Vazire, S., Vespignani, A., Yarkoni, T.: Integrating explanation and prediction in computational social science. Nature. 595(7866), 181–188 (2021). https://doi.org/10.1038/s41586-021-03659-0CrossRef Hofman, J.M., Watts, D.J., Athey, S., Garip, F., Griffiths, T.L., Kleinberg, J., Margetts, H., Mullainathan, S., Salganik, M.J., Vazire, S., Vespignani, A., Yarkoni, T.: Integrating explanation and prediction in computational social science. Nature. 595(7866), 181–188 (2021). https://​doi.​org/​10.​1038/​s41586-021-03659-0CrossRef
Zurück zum Zitat Hollingsworth, J.R.: In: Hannaway, C. (ed.) Scientific Discoveries: An Institutionalist and Path-Dependent Perspective, pp. 317–353. IOS (2008) Hollingsworth, J.R.: In: Hannaway, C. (ed.) Scientific Discoveries: An Institutionalist and Path-Dependent Perspective, pp. 317–353. IOS (2008)
Zurück zum Zitat Hucka, M., Nickerson, D.P., Bader, G.D., Bergmann, F.T., Cooper, J., Demir, E., Garny, A., Golebiewski, M., Myers, C.J., Schreiber, F., Waltemath, D., Le Novere, N.: Promoting Coordinated Development of Community-Based Information standards for modeling in Biology: The COMBINE Initiative. Front. Bioeng. Biotechnol. 3, 19 (2015). https://doi.org/10.3389/fbioe.2015.00019CrossRef Hucka, M., Nickerson, D.P., Bader, G.D., Bergmann, F.T., Cooper, J., Demir, E., Garny, A., Golebiewski, M., Myers, C.J., Schreiber, F., Waltemath, D., Le Novere, N.: Promoting Coordinated Development of Community-Based Information standards for modeling in Biology: The COMBINE Initiative. Front. Bioeng. Biotechnol. 3, 19 (2015). https://​doi.​org/​10.​3389/​fbioe.​2015.​00019CrossRef
Zurück zum Zitat Jacobs, A.M., Büthe, T., Arjona, A., Arriola, L.R., Bellin, E., Bennett, A., Björkman, L., Bleich, E., Elkins, Z., Fairfield, T., Gaikwad, N., Greitens, S.C., Hawkesworth, M., Herrera, V., Herrera, Y.M., Johnson, K.S., Karakoç, E., Koivu, K., Kreuzer, M., Yashar, D.J.: The qualitative transparency deliberations: Insights and implications. Perspect. Politics. 19(1), 171–208 (2021). https://doi.org/10.1017/s1537592720001164CrossRef Jacobs, A.M., Büthe, T., Arjona, A., Arriola, L.R., Bellin, E., Bennett, A., Björkman, L., Bleich, E., Elkins, Z., Fairfield, T., Gaikwad, N., Greitens, S.C., Hawkesworth, M., Herrera, V., Herrera, Y.M., Johnson, K.S., Karakoç, E., Koivu, K., Kreuzer, M., Yashar, D.J.: The qualitative transparency deliberations: Insights and implications. Perspect. Politics. 19(1), 171–208 (2021). https://​doi.​org/​10.​1017/​s153759272000116​4CrossRef
Zurück zum Zitat Jain, U., Min, J., Lee, J.: Harmonization of cross-national studies of aging to the Health and Retirement Study - user guide: Family transfer - informal care. University of Southern California, CESR-Schaeffer Working Paper Series No. 2016-008. (2016) Jain, U., Min, J., Lee, J.: Harmonization of cross-national studies of aging to the Health and Retirement Study - user guide: Family transfer - informal care. University of Southern California, CESR-Schaeffer Working Paper Series No. 2016-008. (2016)
Zurück zum Zitat Janssen, M.A., Alessa, L.N.I., Barton, M., Bergin, S., Lee, A.: Towards a Community Framework for Agent-based modelling. J. Artif. Soc. Soc. Simul., 11(2). (2008) Janssen, M.A., Alessa, L.N.I., Barton, M., Bergin, S., Lee, A.: Towards a Community Framework for Agent-based modelling. J. Artif. Soc. Soc. Simul., 11(2). (2008)
Zurück zum Zitat Jones, B.: The Burden of Knowledge and the death of the Renaissance Man: Is Innovation getting harder? Rev. Econ. Stud. 76(1), 283–317 (2009)CrossRef Jones, B.: The Burden of Knowledge and the death of the Renaissance Man: Is Innovation getting harder? Rev. Econ. Stud. 76(1), 283–317 (2009)CrossRef
Zurück zum Zitat King, G.: Replication, replication. PS: Political Sci. Politics. 28(3), 444–452 (1995) King, G.: Replication, replication. PS: Political Sci. Politics. 28(3), 444–452 (1995)
Zurück zum Zitat King, G.: An introduction to the Dataverse Network as an infrastructure for data sharing. Sociol. Methods Res. 36(2), 173–199 (2007)CrossRef King, G.: An introduction to the Dataverse Network as an infrastructure for data sharing. Sociol. Methods Res. 36(2), 173–199 (2007)CrossRef
Zurück zum Zitat King, G.: Ensuring the Data-Rich Future of the Social sciences. Science. 331(11), 719–721 (2011a)CrossRef King, G.: Ensuring the Data-Rich Future of the Social sciences. Science. 331(11), 719–721 (2011a)CrossRef
Zurück zum Zitat King, G.: Ensuring the data-rich future of the social sciences. Science. 331, 719–721 (2011b)CrossRef King, G.: Ensuring the data-rich future of the social sciences. Science. 331, 719–721 (2011b)CrossRef
Zurück zum Zitat Krücken, G.: Learning the ‘New, New Thing’: On the role of path dependency in university structures. High. Educ. 46, 315–339 (2003)CrossRef Krücken, G.: Learning the ‘New, New Thing’: On the role of path dependency in university structures. High. Educ. 46, 315–339 (2003)CrossRef
Zurück zum Zitat Lazer, D., Pentland, A., Watts, D., Aral, S., Athey, S., Contractor, N., Freelon, D., Gonzalez-Bailon, S., King, G., Margetts, H.: Computational social science: Obstacles and opportunities. Science. 369(6507), 1060–1062 (2020)CrossRef Lazer, D., Pentland, A., Watts, D., Aral, S., Athey, S., Contractor, N., Freelon, D., Gonzalez-Bailon, S., King, G., Margetts, H.: Computational social science: Obstacles and opportunities. Science. 369(6507), 1060–1062 (2020)CrossRef
Zurück zum Zitat Lillard, D.R.: Harmonization of panel surveys: The cross-national Equivalent file. In: Tomescu-Dubrow, I., Wolf, C., Slomczynski, K.M., Jenkins, J.C. (eds.) Survey Data Harmonization in the Social Sciences, pp. 169–188. Wiley (2023) Lillard, D.R.: Harmonization of panel surveys: The cross-national Equivalent file. In: Tomescu-Dubrow, I., Wolf, C., Slomczynski, K.M., Jenkins, J.C. (eds.) Survey Data Harmonization in the Social Sciences, pp. 169–188. Wiley (2023)
Zurück zum Zitat Merton, R.K.: The Normative Structure of Science. In R. K. Merton & N. W. Storer (Eds.), The Sociology of Science: Theoretical and Empirical Investigations (pp. 267–278). University of Chicago Press. (1973) [1942] Merton, R.K.: The Normative Structure of Science. In R. K. Merton & N. W. Storer (Eds.), The Sociology of Science: Theoretical and Empirical Investigations (pp. 267–278). University of Chicago Press. (1973) [1942]
Zurück zum Zitat Moshontz, H., Campbell, L., Ebersole, C.R., Urry, H.I.J., Forscher, H.L., Grahe, P.S., McCarthy, J.E., Musser, R.J., Antfolk, E.D., Castille, J., Evans, C.M., Fiedler, T.R., Flake, S., Forero, J.K., Janssen, D.A., Keene, S.M.J., Protzko, J.R., Aczel, J., Chartier, B., C. R: The Psychological Science Accelerator: Advancing psychology through a distributed Collaborative Network. Adv. Methods Practices Psychol. Sci. 1(4), 501–515 (2018). https://doi.org/10.1177/2515245918797607CrossRef Moshontz, H., Campbell, L., Ebersole, C.R., Urry, H.I.J., Forscher, H.L., Grahe, P.S., McCarthy, J.E., Musser, R.J., Antfolk, E.D., Castille, J., Evans, C.M., Fiedler, T.R., Flake, S., Forero, J.K., Janssen, D.A., Keene, S.M.J., Protzko, J.R., Aczel, J., Chartier, B., C. R: The Psychological Science Accelerator: Advancing psychology through a distributed Collaborative Network. Adv. Methods Practices Psychol. Sci. 1(4), 501–515 (2018). https://​doi.​org/​10.​1177/​2515245918797607​CrossRef
Zurück zum Zitat Nosek, B.A., Alter, G., Banks, G.C., Borsboom, D., Bowman, S.D., Breckler, S.J., Buck, S., Chambers, C.D., Chin, G., Christensen, G., Contestabile, M., Dafoe, A., Eich, E., Freese, J., Glennerster, R., Goroff, D., Green, D.P., Hesse, B., Humphreys, M., Yarkoni, T.: Promoting an open research culture. Science. 348(6242), 1422–1425 (2015). https://doi.org/10.1126/science.aab2374CrossRef Nosek, B.A., Alter, G., Banks, G.C., Borsboom, D., Bowman, S.D., Breckler, S.J., Buck, S., Chambers, C.D., Chin, G., Christensen, G., Contestabile, M., Dafoe, A., Eich, E., Freese, J., Glennerster, R., Goroff, D., Green, D.P., Hesse, B., Humphreys, M., Yarkoni, T.: Promoting an open research culture. Science. 348(6242), 1422–1425 (2015). https://​doi.​org/​10.​1126/​science.​aab2374CrossRef
Zurück zum Zitat Olson, M.: The Logic of Collective Action. Harvard University Press (1965) Olson, M.: The Logic of Collective Action. Harvard University Press (1965)
Peirce, C.S.: Truth and Falsity and Error. Dict. Philos. Psychol., 718–720 (1902)
Pfaff, C., Hasan, H.: Can Knowledge Management be Open Source. In: Feller, J., et al. (eds.) The International Federation for Information Processing, vol. 234, pp. 59–70. Springer (2007)
Piketty, T.: Capital in the Twenty-First Century. Belknap (2014)
Piketty, T., Saez, E.: Income inequality in the United States, 1913–1998. Q. J. Econ. 118(1), 1–41 (2003)
Popper, K.R.: The Logic of Scientific Discovery. Basic Books (1959 [1934])
Rai, A.: Open and Collaborative Research: A New Model for Biomedicine (2005)
Ruggles, S., Cleveland, L., Sobek, M.: Harmonization of Census Data: IPUMS-International. In: Tomescu-Dubrow, I., Wolf, C., Slomczynski, K.M., Jenkins, J.C. (eds.) Survey Data Harmonization in the Social Sciences, pp. 207–226. Wiley (2023)
Salganik, M.: Bit by Bit: Social Research in the Digital Age. Princeton University Press (2017)
Salganik, M.J., Lundberg, I., Kindel, A.T., Ahearn, C.E., Al-Ghoneim, K., Almaatouq, A., Altschul, D.M., Brand, J.E., Carnegie, N.B., Compton, R.J., Datta, D., Davidson, T., Filippova, A., Gilroy, C., Goode, B.J., Jahani, E., Kashyap, R., Kirchner, A., McKay, S., McLanahan, S.: Measuring the predictability of life outcomes with a scientific mass collaboration. Proc. Natl. Acad. Sci. U.S.A. 117(15), 8398–8403 (2020). https://doi.org/10.1073/pnas.1915006117
Savage, M.: The Return of Inequality. Harvard University Press (2021)
Singh, K.K.: Intellectual Property Protection in Bioinformatics and Open Bio Development. Asian Biotechnol. Dev. Rev. 16(3), 25–45 (2014)
Slomczynski, K., Tomescu-Dubrow, I.: Basic Principles of Survey Data Recycling. In: Johnson, T.P., Pennell, B.-E., Stoop, I.A.L., Dorer, B. (eds.) Advances in Comparative Survey Methodology: Multinational, Multiregional and Multicultural Contexts, pp. 937–962. Wiley, Hoboken (2018)
Spirling, A.: Why open-source generative AI models are an ethical way forward for science. Nature 616, 413 (2023)
Starbuck, W.H.: The Production of Knowledge: The Challenge of Social Science. Oxford University Press (2006)
Tomescu-Dubrow, I., Wolf, C., Slomczynski, K.M., Jenkins, J.C.: Survey Data Harmonization in the Social Sciences. Wiley (2023)
Turek, K., Henkens, K., Kalmijn, M.: Gender and educational inequalities in extending working lives: Late-life employment trajectories across three decades in seven countries. Work Aging Retire. (2022). waac021
Valentine, M.A., Retelny, D., To, A., Rahmati, N., Doshi, T., Bernstein, M.S.: Flash Organizations. In: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (2017)
Van De Ven, A., Johnson, P.E.: Knowledge for theory and practice. Acad. Manage. Rev. 31(4), 802–821 (2006)
Volberda, H., Schneidmuller, T., Zadeh, T.: Knowledge and innovation: From path dependency toward managerial agency. In: Duhaime, I.M., Hitt, M.A., Lyles, M.A. (eds.) Strategic Management: State of the Field and Its Future, pp. 445–466. Oxford University Press (2021)
Wolf, C., Schneider, S., Behr, D., Joye, D.: Harmonizing survey questions between cultures and over time. In: Wolf, C., Joye, D., Smith, T., Fu, Y.-c. (eds.) The SAGE Handbook of Survey Methodology, pp. 502–524. SAGE (2016)
Wuchty, S., Jones, B.F., Uzzi, B.: The increasing dominance of teams in production of knowledge. Science 316, 1036–1039 (2007)
Metadata
Title: Accelerating social science knowledge production with the coordinated open-source model
Author: Konrad Turek
Publication date: 8 January 2025
Publisher: Springer Netherlands
Published in: Quality & Quantity
Print ISSN: 0033-5177
Electronic ISSN: 1573-7845
DOI: https://doi.org/10.1007/s11135-024-02020-7
