Introduction

Cooperation is a key ingredient for understanding the origins of animal societies and, in particular, of human ones1. A number of mechanisms leading to the emergence and stability of cooperative behavior have been proposed2,3, ranging from kin selection4 to the existence of a sessile or socially structured population5,6. For the specific case of human behavior7, additional paths for the emergence and maintenance of cooperation have been proposed. These include reciprocity, be it direct8 or indirect9, punishment10,11, refusing to interact12, use of social information13 and others, based on the human cognitive capacity to keep track of others' behavior and use control mechanisms. Without enforcement mechanisms such as those mentioned above, human groups often fail to sustain a public resource, which every group member is free to overuse14,15,16, except in the case of pairwise interactions17,18,19,20. This is also the case when interaction is restricted to a set of neighbors on a (spatial) lattice, as shown by a number of experimental works22,23,24,25, unless there is a possibility to sever connections to non-cooperative individuals or groups26,27, something that can be understood as punishing or abstaining from participation.

One of the most plausible explanations for the decay of cooperation in public goods settings is the fact that many individuals are willing to contribute more the more their partners do. This behavior, called conditional cooperation, has been observed in many public goods experiments28,29,30,31, often along with a large percentage of free-riders. It can then be seen that the combination of people who behave in this manner with free-riders leads to a rapid deterioration of cooperation32. For the case of interactions taking place on a structured or networked population, similar results have been found in the iterated multi-player Prisoner's Dilemma (IMPD) on a square lattice by Traulsen et al.23 and in an experiment by Rand et al.27 in which subjects were allowed to change partners by modifying the network links (except when link rewiring was global, random and at every round). The most recent development on this issue arises from the experiments by Grujić et al.24 and Gracia-Lázaro et al.25, who observed that conditional cooperation may also depend on the individual's own past action, i.e., on the ‘mood’ in which the subject currently is. In this case, individuals behave as conditional cooperators if they cooperated in the previous round, whereas they ignore the context and free-ride with high probability if they did not cooperate.

From the viewpoint of the ultimate origins of this behavior, conditional cooperation is itself a puzzle, as it has been shown33 that in an IMPD, the only conditionally cooperative, evolutionarily stable strategy prescribes cooperation only if all other group members cooperated in the previous period, which is not what is observed. Furthermore, for the case of moody conditional cooperation, theoretical results based on a replicator dynamics approach showed that in groups with five or more people, the coexistence of moody conditional cooperators with free-riders (and possibly a few unconditional cooperators) is not possible34. In this paper, we advance the knowledge on this issue by reporting on a series of experiments with human subjects playing an IMPD in groups of different sizes. As we will see, the analysis of our results allows us to confirm very clearly the existence of moody conditional cooperation in all group sizes, this being in fact the behavior of almost all the subjects. By developing a generalized linear mixed model (GLMM)35,36 we will also show very strong evidence that the behavior of people changes from what is observed in pairwise interactions to the case of 3 or more players and is independent of the group size once at least 3 players are involved. This results in cooperation actually increasing for pairwise interactions, while decaying as usual for groups of 3 or more individuals. Finally, our analysis also indicates that moody conditional cooperation is very similar in all subjects as far as the reciprocity factor is concerned, i.e., how much more willing to contribute a subject is as a function of the number of partners who cooperated in the previous round. The idiosyncratic component resides instead in the initial generosity or propensity to cooperate.

Results

Existence of moody conditional cooperation

In our experiment, subjects played a multiplayer prisoner's dilemma in which they had to choose one action to interact with their opponents. Payoffs were set as follows: mutual cooperation was rewarded with a payoff R for both players and mutual defection earned them nothing, while a cooperator facing a defector obtained nothing, leaving the defector with the temptation payoff T. R and T were slightly modified for each of our four group sizes so the expected earnings would roughly be the same in all cases, always keeping the ratio T/R constant to stay within the same type of game. See Methods for a detailed description of our setup.
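As a minimal sketch of the payoff rule just described, one may assume that the single action chosen by a player is played against each opponent in the group and that the pairwise payoffs are summed. The summation over opponents, the function name and the use of the values R = 7 and T = 10 quoted in Methods are assumptions made for illustration only.

```python
R = 7   # reward for mutual cooperation, in ECUs (value quoted in Methods)
T = 10  # temptation payoff for defecting against a cooperator, in ECUs


def round_payoff(action, opponent_actions, reward=R, temptation=T):
    """Payoff of one player in one round of the multiplayer PD.

    action: 'C' or 'D', the single action played against every opponent.
    opponent_actions: iterable of 'C'/'D' for the other group members.
    Assumption: pairwise payoffs are summed over all opponents.
    """
    if action not in ("C", "D"):
        raise ValueError("action must be 'C' or 'D'")
    cooperating_opponents = sum(1 for a in opponent_actions if a == "C")
    # A cooperator earns R per cooperating opponent and nothing from defectors;
    # a defector earns T per cooperating opponent and nothing from defectors.
    per_cooperator = reward if action == "C" else temptation
    return per_cooperator * cooperating_opponents


print(round_payoff("C", ["C", "D", "C"]))  # cooperator, two cooperating opponents
print(round_payoff("D", ["C", "D", "C"]))  # defector facing the same opponents
```

Under these assumptions, and because T > R, defecting never pays less than cooperating within a single round, which is what makes the game a dilemma.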

Let us begin reporting on the results of our experiment by looking at our first question, namely the existence of moody conditional cooperators and whether it depends on the group size or not. Figure 1 shows our results on this issue. We observe that moody conditional cooperators are indeed present in all sizes, including groups of four and five players, at variance with the analysis in Ref. 34. However, this disagreement is not entirely surprising, since theoretical results for repeated games are notoriously sensitive to modeling assumptions: for instance, computational results on IMPD based on genetic algorithms38 show that the evolution of cooperation in theoretical models depends very much on the implementation details. Therefore, the fact that our experimental observations do not agree with the predictions of a very specific model based on the replicator equation is something that can be expected. On the other hand, we observe only a few players using AllD (always defect) and even fewer players playing AllC (always cooperate), so what we are observing may be close to a homogeneous state consisting only of moody conditional cooperators, something that is possible even in large groups for certain parameters34. In any event, Fig. 1 shows very clearly the difference between the probability of cooperating after having cooperated or having defected, highlighting the importance of relating the current action with the one in the previous round. The plot also indicates that the probability of cooperation increases with the number of cooperators in the group in the last round, for all group sizes. Cooperation when no one cooperated before is relatively large in groups of size 2 and lower for other group sizes (but similar among them). Interestingly, the increment in probability with increasing number of cooperators is similar for all groups.
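The conditional probabilities discussed here can be estimated from raw decisions by simple counting. The sketch below is one hypothetical way to do it, assuming that the context is measured as the number of cooperators among the other group members in the previous round; the data layout, the function name and the toy data are illustrative assumptions and not the authors' analysis code.

```python
from collections import defaultdict


def conditional_cooperation(history):
    """Estimate P(cooperate | own previous action, cooperators among others).

    history: list of rounds; each round is a list of 'C'/'D', one entry per
    group member, always in the same player order.
    Returns {(previous_action, n_other_cooperators): fraction cooperating}.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [cooperations, observations]
    for prev, curr in zip(history, history[1:]):
        total_coop = prev.count("C")
        for player, own_prev in enumerate(prev):
            # Cooperators among the *other* group members in the previous round.
            others = total_coop - (1 if own_prev == "C" else 0)
            key = (own_prev, others)
            counts[key][1] += 1
            if curr[player] == "C":
                counts[key][0] += 1
    return {key: c / n for key, (c, n) in counts.items()}


# Toy data for one group of three players over three rounds (not real data).
toy_history = [
    ["C", "C", "D"],
    ["C", "D", "D"],
    ["C", "D", "D"],
]
for key, freq in sorted(conditional_cooperation(toy_history).items()):
    print(key, freq)
```

Plotting the resulting fractions against the number of cooperators, separately for the two values of the previous action, gives curves of the kind shown in Fig. 1.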

Figure 1

Probability that an individual cooperates after having cooperated (squares) and after having defected (circles) in the previous round, for groups of 2 (top left), 3 (top right), 4 (bottom left) and 5 (bottom right) people.

Open symbols, experimental results; full symbols, predictions from our GLMM. Error bars correspond to the standard deviation of the observations. Lines are only a guide to the eye.

Group size dependence of cooperation

Figure 2 depicts the fraction of cooperative actions as a function of the iteration of the game, demonstrating that the results for groups of size two (i.e., pairwise interactions or the usual 2 × 2 IPD) are very different from the observations on the rest of the groups (sizes three and higher). Pairwise interactions show very high cooperation levels with an increasing trend (see below for a discussion of similar, earlier results17,18) whereas for the rest of the groups we find that cooperation decays from initially large values (around 60% or larger) much in the same way as in most Public Goods or networked IPD experiments. The fact that for groups with three subjects or more the cooperation level behaves in a similar manner is in agreement with earlier findings that the level of contributions to voluntary public goods does not depend significantly on the group size39. In addition, the low levels of cooperation we observed for groups of size three and four are consistent with the results in Public Goods experiments with up to 50 rounds40,41. In this context, our experiment, by analyzing sizes from two to five in the same experimental setup, provides evidence that there is an abrupt change in behavior in going from a two-player IPD to IMPD or public goods games with three or more participants, i.e., we could say that three is a crowd.

Figure 2

Percentage of cooperation as a function of the round for groups of 2 (top left), 3 (top right), 4 (bottom left) and 5 (bottom right) people.

Open circles, experimental results; solid line, predictions from our GLMM.

The case of the two-player IPD

The results for the pairwise IPD deserve a separate discussion as they offer several interesting insights. In our experiment, participants were not informed about the number of rounds of the game, although they were given an estimate of the time duration of the procedure, so they could realize that there would be a sizable number of rounds in any event. Therefore, the ‘shadow of the future’ effect is very present. As a consequence, pairwise IPD experiments show a large level of cooperation in agreement with the observations of Ref. 19, obtained for much shorter IMPDs (an expected length smaller than 6 rounds). Interestingly, the large length of our repeated game allows us to go beyond this observation: Indeed, if we compare our observations to those reported in Ref. 37, who carried out experiments of length 12, we find an agreement for this initial part of the repeated game, as in both cases the cooperation level decreases with increasing round number. However, as the game continues in our experiment, we observe a clear trend towards increasing cooperation, punctuated by episodes of lower cooperation levels which are rapidly overcome. It is also worth mentioning in this context the (often overlooked) early experiments by Rapoport and Chammah17 and by Flood and Dresher (with only one pair of subjects, reviewed in Ref. 18) which, by running up to 300 and 100 iterations of the PD respectively, already showed that cooperative behavior could be stable, in agreement with our findings here. The similarity of the cooperation curve in Ref. 17 to ours, including the initial decrease, is indeed remarkable.

The difference in cooperation among groups does not arise from the initial propensity to cooperate, as cooperation at the first round is mostly independent of the group size (cf. Fig. 2). Instead, it is due to the behavior of the players as the repeated game progresses. This is in agreement with the type of moody conditionally cooperative strategy we found: the strategy parameters for pairwise PD, being clearly different from those of the larger groups, indicate that choosing cooperation is very likely if one cooperated in the previous step, while the probability to cooperate following a defection is relatively large, below but close to 0.5. This strategy is not the well-known Tit-for-tat (TFT)5, as TFT does not depend on the player's own previous action, while Fig. 1 strongly suggests that the previous action of the player affects her next choice. Our result is also in agreement with those reported by Fudenberg et al.21, who in their treatments without noise found that when a player has cooperated in all rounds, a defection by her partner is not immediately answered with defection in 42% of the cases, a number that is roughly similar to the ones we obtain for our moody conditional cooperators (although the comparison must be taken with caution, as the way to characterize the behavior in both experiments is not exactly the same).
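The contrast between TFT and a moody conditional cooperator in the pairwise game can be made concrete with a small sketch. Both functions return the probability of cooperating in the next round. The function names and every numerical value are hypothetical, chosen only to mimic the qualitative description above (very likely cooperation after one's own cooperation, a probability just below 0.5 after one's own defection); they are not estimates from the experiment.

```python
def tit_for_tat(own_previous, partner_previous):
    """Tit-for-tat: copy the partner's last action; own_previous is ignored."""
    return 1.0 if partner_previous == "C" else 0.0


def moody_conditional(own_previous, partner_previous,
                      p_after_coop=(0.5, 0.9), p_after_defect=0.45):
    """Probability of cooperating for a moody conditional cooperator.

    p_after_coop: (prob. if the partner defected, prob. if she cooperated),
    used only when the player herself cooperated in the previous round.
    p_after_defect: prob. of cooperating after the player's own defection,
    irrespective of what the partner did.
    All default values are hypothetical illustration values.
    """
    if own_previous == "C":
        return p_after_coop[1] if partner_previous == "C" else p_after_coop[0]
    return p_after_defect


for own in ("C", "D"):
    for partner in ("C", "D"):
        print(own, partner,
              tit_for_tat(own, partner),
              moody_conditional(own, partner))
```

The TFT column changes only with the partner's action, whereas the moody column changes with the player's own previous action as well, which is the dependence visible in Fig. 1.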

Model

To take into account that our data contain repeated measures of a binary variable on each subject, we developed a GLMM as follows. Let $y_{ijt}$ be the response of subject i in group j at time t, with $y_{ijt} = 1$ if this subject cooperates at time t and $y_{ijt} = 0$ otherwise, for all i, j and t. Then $y_{ijt} \sim \mathrm{Bernoulli}(p_{ijt})$. By the nature of the experiment, the subjects are nested in groups. Thus, a model needs to take into account the nested structure of the data and the repeated measures on the subjects.

Our concern with respect to dependency is the repeated measures on the same subject. First, the observations on the same subject are correlated just because they are decisions of the same person. This is also known as within subject variability. Second, the observations close in time, on the same actor, are more likely to be highly correlated as opposed to the observations further apart. We interpret this as latent generosity with a time component. Third, another source of variation is the latent component of the individual reaction to the number of cooperative actions observed in the group in the previous round. We can interpret this as latent reciprocity. These latent effects then measure “between-subject” variability.
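The second source of dependence, namely that decisions close in time are more strongly correlated than decisions far apart, is commonly encoded with a first-order autoregressive, AR(1), correlation pattern. The sketch below builds such a correlation matrix; the function name and the value of rho are hypothetical and serve only to illustrate the geometric decay.

```python
def ar1_correlation(n_rounds, rho):
    """Return the n_rounds x n_rounds AR(1) correlation matrix.

    Entry (s, t) equals rho ** |s - t|, so the correlation between two
    decisions of the same subject shrinks as the rounds get further apart.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return [[rho ** abs(s - t) for t in range(n_rounds)]
            for s in range(n_rounds)]


# rho = 0.5 is a hypothetical value, not an estimate from the experiment.
for row in ar1_correlation(4, 0.5):
    print(row)
```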

Before introducing the model we finally chose as the best for our data, let us point out that, in alternative specifications, we checked for effects of major and gender, without finding any significant effect. Most importantly, we tested the dependence on whether the group was manipulated by the computer or not, again finding no differences (see Methods below). With these inputs, we finally proposed the following model:

$$
\mathrm{logit}(p_{it}) = \sum_{l=2}^{5} \beta_l\,\chi(\mathrm{size}_{il}) + \sum_{l=2}^{5} \beta_{l+4}\,\chi(\mathrm{size}_{il})\,\mathrm{LagCoop}_{it} + \beta_{10}\,\mathrm{LagAction}_{it} + \alpha_i + \gamma_i\,\mathrm{LagCoop}_{it} + \xi_{it},
$$

where $p_{it}$ is the probability of cooperation of subject i at time t and the factors that affect it are as follows: $\chi(\mathrm{size}_{il})$ is the characteristic function corresponding to the group size of subject i, that is, $\chi(\mathrm{size}_{il}) = 1$ if subject i played in a group of size l and 0 otherwise; $\mathrm{LagCoop}_{it}$ is the number of cooperative actions received by subject i at time t − 1; $\mathrm{LagAction}_{it}$ is equal to 1 if the subject cooperated in the previous round and 0 otherwise; and $\beta_l$ and $\beta_{l+4}$, l = 2…5, and $\beta_{10}$ are the parameters of the fixed effects. On the other hand, $\alpha_i$ is the latent cooperativeness of each subject and $\gamma_i$ is her latent reciprocity (the individual random variation in the response to perceived cooperation). Individual latent effects follow normal distributions: $\alpha \sim N(0, \Sigma_\alpha)$, where $\Sigma_\alpha = \sigma_\alpha^2 I$ and $I$ is the identity matrix, and analogously $\gamma \sim N(0, \Sigma_\gamma)$, where $\Sigma_\gamma = \sigma_\gamma^2 I$. In addition, the repeated-measures structure is modeled as an AR(1) structure through the $\xi_{it}$ term, where $\xi_{it} = \rho\,\xi_{i,t-1} + u_{it}$ and $u$ is a vector of random variables with variance $\sigma_u$. That is, the model includes a random component that measures the “within subject” variability. The covariance matrix for this effect is a symmetric matrix, $R$, whose $(i, j)$-entry is proportional to $\rho^{|i-j|}$.
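To make the role of each ingredient concrete, the following sketch maps the quantities listed above to a cooperation probability. It assumes a logit link and an additive linear predictor with size-specific intercepts and size-specific slopes on LagCoop; that additive form, the function names and all numerical coefficients are hypothetical and are not the estimates reported in Table 2.

```python
import math


def inverse_logit(eta):
    """Map a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def cooperation_probability(size, lag_coop, lag_action,
                            intercepts, slopes, beta_action,
                            alpha_i=0.0, gamma_i=0.0, xi_it=0.0):
    """Probability that a subject cooperates in the current round.

    size: group size (2 to 5), selects the size-specific fixed effects.
    lag_coop: number of cooperative actions received in the previous round.
    lag_action: 1 if the subject cooperated in the previous round, else 0.
    alpha_i, gamma_i: subject-level latent generosity and reciprocity.
    xi_it: value of the serially correlated within-subject term.
    """
    eta = (intercepts[size]
           + slopes[size] * lag_coop
           + beta_action * lag_action
           + alpha_i
           + gamma_i * lag_coop
           + xi_it)
    return inverse_logit(eta)


# Hypothetical coefficients, for illustration only.
intercepts = {2: 0.0, 3: -1.0, 4: -1.0, 5: -1.0}
slopes = {2: 1.0, 3: 0.6, 4: 0.4, 5: 0.3}
beta_action = 1.5

for k in range(3):
    after_c = cooperation_probability(3, k, 1, intercepts, slopes, beta_action)
    after_d = cooperation_probability(3, k, 0, intercepts, slopes, beta_action)
    print(k, round(after_c, 3), round(after_d, 3))
```

In this reading, a larger alpha_i shifts a subject's whole curve upwards (more generosity), while a larger gamma_i makes it steeper in the number of cooperating partners (more reciprocity).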

Model results

The model captures well the observations from the experiment, as can be seen from the comparison between the experimental data and the model predictions in Figs. 1 and 2. The agreement is particularly good for the cooperation level, as this magnitude can be obtained directly from the model, whereas there are small discrepancies in the slope of the conditional cooperation lines, mostly for the highest cooperative contexts. These discrepancies can be understood as a consequence of the fact that the estimation of these lines is an indirect product of the model. Another feature that is confirmed is the clear dependence on the players' own previous action, their ‘moodiness’, an aspect to which we come back below.

We first discuss the latent factors in the model. The corresponding variance components estimated within our model are reported in Table 1. The corresponding p-values are obtained by applying the log-likelihood ratio significance test (LRT) on the boundary of the variance parameter space as in Refs. 43 and 44. From this table, a very interesting result arises which could not be seen from our analysis so far: while there is substantial heterogeneity in baseline attitudes to cooperation, the attitudes to reciprocal altruism are more uniform. To put it more formally, the between-individual variation of the latent generosity effect (α) is three times larger than that of the reciprocity random effect (γ). Hence, we can conclude that individuals, while differing greatly in generosity, are closer in reciprocity, i.e., they enter the game with a naturally diverse predisposition to cooperate, but once they are playing, the way they respond to a given number of cooperative partners is similar among players. This is a remarkable finding insofar as there have been reports of the importance of heterogeneous behavior in related experiments30,45, but here we are able to identify for the first time the aspect for which heterogeneity is more relevant, namely the a priori cooperative predisposition of the subjects.

Table 1 Results for the variance of the random effects. Shown are the estimates, their standard error and the log-likelihood ratio (LRT) p-value assessing their significance. From top to bottom, the table shows the results for the generosity, the reciprocity and the two parameters of the AR(1) formalism

Turning now to the fixed effects, the predicted values for the corresponding parameters are presented in Table 2. The estimates and their p-values give us the individual significance levels. The type III tests collect the information on the overall significance of the effects. Based on Table 2, size, LagCoop and LagAction are highly significant covariates (at the 1% significance level). Other relevant results include, for instance, the fact that the size of the groups is important for cooperative attitudes. As Table 2 shows, the parameter for the baseline cooperative attitude in a group of size 2 is larger and statistically different from all the others. In turn, the baseline cooperative attitude is not statistically different between sizes 3 and 5. The conditional cooperation declines monotonically with group size, although the differences become smaller as size increases and the coefficient is still statistically different from zero even at the largest size. This is an interesting point that might be useful to understand why cooperation is more fragile in large groups, which could in turn explain why social groups often evolve punishment strategies directed solely at deviators, as in Ref. 10. Finally, the result that LagAction is relevant points to the dependence of actions on what occurred at the previous round. In this respect, it is important to mention that we also tried other models in which dependence on two previous time steps was included and we found that this was not significant. Therefore, the dependence on the player's own previous choice is enough to capture the results of the experiment, a finding that is in agreement with earlier work20,21.

Table 2 Results for the fixed effects. Shown are the estimates, their standard error and the p-value assessing their significance. The upper left part of the table shows the estimates for βi coefficients, i = 2, …, 10. The lower left and the right parts of the table show significance test results for the different factors in the model. The tests are summarized in Methods

Discussion

In this paper, we have reported on experiments on IMPD showing that most subjects behave in a moody conditionally cooperative manner, reciprocating the observed cooperation after a cooperative choice while changing into a non-reactive, mostly defector strategy following their own defection. This had been observed earlier in lattice PD experiments24,25 and our results now show that this type of behavior is characteristic of the social dilemma and not of the number of partners or their (spatial) arrangement. By means of our GLMM, we have confirmed by an independent analysis that in order to understand the behavior of subjects in the experiments, it is enough to consider the actions of the previous round, as the information on the round before that turned out not to be significant. Interestingly, this may be related to the recent results in Ref. 46 showing that, in iterated games, the player with the shortest memory in effect sets the rules of the game. An additional insight provided by our GLMM concerns the universality and heterogeneity of the moody conditionally cooperative strategy: remarkably, we have shown that heterogeneity arises through the initial predisposition to cooperate, which turns out to be quite idiosyncratic; in contrast, the probability to reciprocate cooperation after having cooperated has an approximately linear dependence whose slope shows a much smaller degree of variability.

On the other hand, while the behavior is generically the same for all sizes, we have found quantitative differences between size two and the other sizes studied, as both the probability of cooperating following a defection and the initial propensity to cooperate are larger for size two. The combination of these modifications leads to the emergence of cooperation in pairwise PDs: following an initial phase in which cooperation decays, compatible with previous observations, the fraction of cooperative actions increases, reaching values above 80% after 100 rounds (although some spikes of defection appear from time to time, followed by larger levels of cooperation than those observed prior to the spike). In this way, our pairwise PD experiments confirm the possibility of the appearance of very high levels of cooperation through a reciprocity mechanism which, interestingly, does not seem to be mediated by TFT-type strategies (even if they only require one-step memory, as stated above), but rather appears to be more related to the reactive type3. With respect to groups of size three or larger, we have found that cooperation is less likely in large groups. This has been already reported by other authors: in experiments with Cournot oligopolies, it has been shown47 that for two firms there is some collusion, for three firms the output is Nash and for four or five firms there is never collusion and the Cournot outcome is obtained. Interestingly, this is also reminiscent of the results from a recently proposed48 agent-based model for certain combinations of public or private signaling and players' mistakes. Notwithstanding those results, here, for the first time, we observe within a unique experimental setup that there is a sharp boundary separating the case of size 2 from the rest. Indeed, our experiments show that in terms of the emergence of cooperation in the PD, three is already a crowd and cooperation does not emerge. We believe that this is the case because as soon as there is more than one partner in the game, defections hurt equally partners who defected in the previous round and partners who cooperated. Therefore, in the absence of specific punishment mechanisms, reciprocity does not work as it does in the case of the pairwise PD and cooperation eventually deteriorates.

Our results will have important consequences on the study of cooperation in social dilemmas. The confirmation of the existence and generality of moody conditional cooperation leads, as has been shown in Ref. 49, to the prediction that when players play a PD with their neighbors on a network, the resulting level of cooperation is the same as when they are in a well-mixed population, in which they can interact with every other player. This is a very relevant result, as there has been an ongoing discussion for 20 years about the influence of networks on cooperation50, which could thus be closed for good. The recent experiments on heterogeneous networks25 may indeed provide that closure in view of the results reported here: Our observation that three players are enough for the decline of cooperation offers an explanation for the observation in experiments that different types of neighborhoods (e.g., different degrees) on the network are irrelevant as well. We hope that this work stimulates further experimental research to assess the range of social dilemmas and related problems for which three is a crowd.

Methods

Experimental setup

A total of 228 subjects participated in our experiments. Subjects were volunteers from the pool of the Economics Laboratory at the Department of Economics of Universidad Carlos III de Madrid. Participants interacted anonymously via computers at the Laboratory using software written with z-Tree42. In all, 12 sessions were conducted on three consecutive days in April 2011. Each session lasted approximately 45 min on average. In each session, the subjects were paid a 10 € show-up fee. Each subject's final score summed over all rounds was converted into euros at an exchange rate that depended on the group size. The payoffs were set to R = 7 ECUs (Experimental Currency Units) and T = 10 ECUs in all group sizes. The adjustment of the expected payoffs was then implemented through the conversion rate. The exchange rate for 100 ECUs was 2 € in groups of 2 players, 1.66 € in 3-player groups, 1.33 € in 4-player groups and 1 € in 5-player groups. Earnings in a typical session ranged from 5 to 15 €. The instructions of the experiment, translated into English, are included in the Supporting Information. The Spanish original is also available upon request.
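The conversion of scores into money can be written as a one-line calculation. The sketch below uses the rates and the show-up fee quoted above; that the final payment is simply the show-up fee plus the converted score, as well as the function and constant names, are assumptions made for illustration.

```python
# Euros paid per 100 ECUs, by group size (values quoted in the text).
EUROS_PER_100_ECU = {2: 2.00, 3: 1.66, 4: 1.33, 5: 1.00}
SHOW_UP_FEE = 10.0  # euros


def payment_in_euros(total_ecus, group_size, include_show_up_fee=True):
    """Convert a subject's accumulated score (in ECUs) into euros.

    Assumption: payment = converted score (+ show-up fee if requested).
    """
    rate = EUROS_PER_100_ECU[group_size]
    earnings = total_ecus * rate / 100.0
    return earnings + (SHOW_UP_FEE if include_show_up_fee else 0.0)


print(payment_in_euros(500, 2))                             # pairwise game
print(payment_in_euros(500, 5, include_show_up_fee=False))  # group of five
```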

Computer intervention

In half of the sessions, for all group sizes, there was a computer intervention in the decisions, in order to improve the statistics on the most cooperative contexts. The players were informed that:

“Occasionally and in completely random way, the computer can change your decision or that of the other player. The program does not report this change when it occurs. In such cases the payment is calculated as if the player concerned had actually taken the decision that the computer chose. The frequency with which this happens is low: your actions will remain unchanged for at least an 85% of the time.”

Computer intervention was carried out in the following manner and only after the first 5 rounds took place unmodified: From round 5 to round 25, there is a 20% chance of having a computer intervention. In case there is such an intervention, the idea is to increase the number of highly cooperative contexts and therefore every defection was turned to cooperation with probability (N − Ncoop)/(N − Ncoop + 1), N being the number of players in the group and Ncoop being the number of players that cooperated in that round. After round 25, there was still a 20% chance of intervention, but now it was intended to increase the number of contexts that had appeared the smallest number of times up to that round. Let us call the number of cooperators in such a context Nwanted. If this number is higher than Ncoop, we change defection to cooperation with probability (Nwanted − Ncoop)/(N − Ncoop + 1); otherwise, we change cooperation to defection with probability (Ncoop − Nwanted)/(Ncoop + 1). This procedure allowed us to obtain better statistics for the highly cooperative contexts and it was mild enough not to influence the results, as was shown by the comparison of the results of the treatments with and without computer intervention.
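The intervention rule can be sketched as follows. The sketch reads the three probabilities as (N − Ncoop)/(N − Ncoop + 1), (Nwanted − Ncoop)/(N − Ncoop + 1) and (Ncoop − Nwanted)/(Ncoop + 1), i.e., with differences in the numerators and denominators, and assumes that each eligible action is flipped independently. Both points, and all function names, are assumptions; this is not the z-Tree code used in the experiment.

```python
import random


def phase1_flip_probability(n_players, n_coop):
    """Rounds up to 25: probability of turning a defection into cooperation."""
    n_defect = n_players - n_coop
    return n_defect / (n_defect + 1)


def phase2_flip(n_players, n_coop, n_wanted):
    """After round 25: return (from_action, to_action, probability)."""
    if n_wanted > n_coop:
        prob = (n_wanted - n_coop) / (n_players - n_coop + 1)
        return ("D", "C", prob)
    prob = (n_coop - n_wanted) / (n_coop + 1)
    return ("C", "D", prob)


def intervene_phase1(actions, rng, intervention_chance=0.2):
    """Apply the first-phase rule to one round of actions (list of 'C'/'D')."""
    if rng.random() >= intervention_chance:
        return list(actions)  # no intervention in this round
    p = phase1_flip_probability(len(actions), actions.count("C"))
    # Assumption: each defection is flipped independently with probability p.
    return ["C" if a == "D" and rng.random() < p else a for a in actions]


rng = random.Random(2011)  # fixed seed so the sketch is reproducible
print(phase1_flip_probability(5, 3))
print(phase2_flip(5, 1, 3))
print(intervene_phase1(["C", "D", "D", "C", "D"], rng))
```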

Statistical tests of significance

The statistical tests used in the paper are the log-likelihood ratio (LRT) significance test of variance parameters, standard significance tests and type III tests on fixed effects and the contrast analysis for the levels of fixed effects. The log-likelihood ratio significance test is used for the variance parameters since the tested value, 0, is on the boundary of the parameter space of the variances. The theory and development behind this test are explained in Ref. 43. Basically, a low p-value indicates a significant variance parameter, i.e., heterogeneity among the participants in the corresponding random effect. These results are presented in Table 1. The standard significance tests and the type III tests assess the significance of the parameters, the former individually and the latter jointly, for all levels of a given variable. For example, consider the variable size. The first part of Table 2 presents the standard significance test results for β2, β3, β4 and β5, which correspond to sizes 2, 3, 4 and 5, respectively. The second part of the same table reports the joint significance of the size effect, i.e., H0: β2 = β3 = β4 = β5 = 0 versus the alternative that at least one is nonzero. The last test performed here is the contrast analysis. This procedure investigates the differences between the levels of the same variable. For example, again using size, in the previous tests we have considered differences from 0; now we are studying whether the effect of being in a group of size 2 is different from that of sizes 3, 4 and 5, respectively. Due to the multiple testing in this procedure, the p-values are adjusted using the Bonferroni adjustment, which multiplies the p-values by the number of tests performed. This adjustment is on the conservative side, that is, we fail to reject the null hypothesis more often.
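Two of these procedures are simple enough to sketch in a few lines. The Bonferroni adjustment multiplies each p-value by the number of tests and caps the result at 1. For a single variance component tested at the boundary value 0, a standard reference distribution for the likelihood ratio statistic is a 50:50 mixture of a point mass at zero and a chi-square with one degree of freedom, which halves the naive p-value. Whether this is exactly the variant applied by the authors following Ref. 43 is an assumption, and the function names are illustrative.

```python
import math


def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def chi2_sf_1df(x):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def boundary_lrt_p_value(loglik_full, loglik_reduced):
    """p-value for H0: variance = 0, for a single variance component.

    Uses the 50:50 mixture of chi-square(0) and chi-square(1) as reference
    distribution, i.e., half of the naive 1-degree-of-freedom p-value.
    """
    statistic = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    if statistic == 0.0:
        return 1.0
    return 0.5 * chi2_sf_1df(statistic)


# Hypothetical numbers, for illustration only.
print(bonferroni_adjust([0.01, 0.02, 0.5]))
print(boundary_lrt_p_value(-1200.0, -1203.5))
```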

Ethics statement

All participants in the experiments reported in the manuscript signed an informed consent to participate. Besides, their anonymity was always preserved (in agreement with the Spanish Law for Personal Data Protection) by randomly assigning them a username which would identify them in the system. No association was ever made between their real names and the results. As is standard in socio-economic experiments, no ethical concerns are involved other than preserving the anonymity of participants.

This procedure was checked and approved by the Viceprovost of Research of Universidad Carlos III de Madrid, the institution hosting the experiment.