Detecting autocatalytic, self-sustaining sets in chemical reaction systems

https://doi.org/10.1016/j.jtbi.2003.11.020Get rights and content

Abstract

The ability of systems of molecular reactions to be simultaneously autocatalylic and sustained by some ambient ‘food source’ of simple molecules may have been an essential step in the origin of life. In this paper we first describe a polynomial-time algorithm that determines whether any given set of molecules, reactions and catalysations contains a subsystem that is both autocatalytic and able to be sustained from a given subset of the molecules. We also describe some combinatorial properties of this algorithm, and show how it can be used to find irreducible auto-catalysing and sustaining subsystems. In the second part of the paper we use the algorithm to investigate random catalytic networks—in particular, a model described by Kauffman. Using simulations and some analytic techniques we investigate the rate of catalysis that is required for the emergence of autocatalytic and sustaining subsystems.

Introduction

The origin of life still remains an elusive problem. Two main paradigms currently exist: the RNA-first theory and the protein-first theory (see e.g. Joyce, 1989; Kauffman, 1986). The former theory (RNA-first) maintains that life started with self-reproducing RNA molecules, which replicate by virtue of template complementarity. These molecules then evolved to encode (genetic) information that is translated into proteins. The main drawback of this theory is that replication (even template-based) is difficult, or at least very slow, to achieve without catalysts. Although it has been shown that some RNA has catalytic capability, this is only very limited and quite specific. Thus, the appearance of an RNA molecule that catalyses its own replication (or even that of others) seems highly unlikely.

In the other main theory (protein-first) it is assumed that life started as collectively self-reproducing sets of catalytic polymers (proteins). Since proteins form easily from freely available amino acids, and even small proteins have catalytic capabilities (often quite general), this theory may seem more plausible. However, proteins do not have a template structure, and thus it is hard to envision how they could have replicated easily without the aid of a genetic code.

An alternative theory, which allows both scenarios to occur concurrently, even cooperatively, is that of autocatalytic sets of molecules (including RNA and proteins, and possibly others). The idea is that molecules engage in catalysed chemical reactions, creating or transforming into other molecules. It has been argued that in sufficiently complex chemical reaction systems, an autocatalytic set will emerge spontaneously, i.e. a subset of molecules and reactions where each molecule is created by at least one reaction from this set, and each reaction is catalysed by at least one molecule from the set (‘catalytic closure’) (Dyson, 1985; Kauffman 1986, Kauffman 1993).

Here, we formally investigate this probability of autocatalytic sets arising in chemical reaction systems in more detail, both theoretically and numerically. We present a general, polynomial-time algorithm for detecting, in any given chemical reaction system, autocatalytic sets that can be sustained from a suitable ‘food’ source. We then apply our algorithm to perform numerical simulations, combined with some analytic techniques, to investigate Kauffman's simple abstract origin of life model involving large numbers of molecules randomly catalysing the ligation and cleavage of other molecules (Kauffman 1986, Kauffman 1993). Kauffman claimed that life-like subsystems (‘connected, reflexively autocatalytic’ sets) must spontaneously arise with high probability once the number of molecule types becomes sufficiently large, a conclusion that was subsequently criticised by Lifson (1997), and partially resolved in Steel (2000).

The probability of such autocatalytic sets occurring depends largely on the probability p that a molecule catalyses any given reaction. In Kauffman's model, this probability remains constant, even if the total number of possible reactions increases (e.g. by virtue of having more and more molecule types). In other words, the average number of reactions that a molecule catalyses increases rapidly with an increasing number of molecule types. It is intuitively clear that, given enough molecule types, an autocatalytic network will then occur with high probability. However, in Lifson's interpretation of Kauffman's model, this probability of catalysation p is inversely proportional to the total number of possible reactions, and thus the average number of reactions that a molecule catalyses remains constant with an increasing number of molecule types. In this case, it is not clear whether an autocatalytic set will arise at all, and for p low enough it is almost certain it will not (Steel, 2000).

What is of interest here, is what happens in between these two extreme cases (constant or inverse proportional catalysation probabilities). In particular, where does the transition occur whereby the existence of an autocatalytic set changes from being highly unlikely to being almost certain? And, furthermore, what does this transition look like? In this paper we provide a theorem that shows that an intermediate degree of catalysation (between the two extremes) suffices for there to be, on average, a large number of self-sustaining autocatalytic systems (in line with a conjecture from Steel (2000)). We then provide numerical results generated by an implementation of our algorithm as applied to Kauffman's model, which shed additional light on the occurrence and shape of the transition between the two extreme cases.

The paper is organized as follows. Section 2 formally introduces catalytic reaction systems, and Section 3 defines the notion of autocatalytic sets generated by a food source (called RAF sets here). Section 4 then proves some basic properties of RAF sets, after which Section 5 introduces a pair of operations that can be applied to any catalytic reaction system to reduce its size while maintaining these basic properties. Section 6 then states our main result and shows how the reduction rules can be used to find whether a catalytic reaction system contains an RAF set. Section 7 presents the complete algorithm for finding RAF sets and shows that it has a polynomial running time. Section 8 then provides an overview of random catalytic reaction systems and formalizes Kauffman's model, while the results of our simulations are presented in Section 9. Finally, Section 10 summarizes our main conclusions.

Section snippets

Catalytic reaction systems

In this section, we review the notion of catalytic reaction systems and reaction graphs.

Definitions

  • Let X denote a set of molecules. A reaction is an ordered pair r=(A,B) where A, BX, and AB=∅. If A={a1,…,as} and B={b1,…,bt} then r represents an admissible chemical reaction in which the molecules in A (the ‘reactants’) combine, in certain proportions, to produce the molecules in B (the ‘products’), writtenn1a1+⋯+nsas→n1′b1+⋯+nt′bt,where ni and ni′ are positive integers for all i. Note that a reversible

Autocatalytic sets generated by a food source

Given a subset R of R, and a subset X′ of X, define the closure of X′ relative to R, denoted clR(X′) to be the (unique) minimal subset W of X that contains X′ and that satisfies the condition that, for each reaction (A, B) in R,A⊆X′∪W⇒B⊆W.Informally, clR(X′) is X′ together with all molecules that can be constructed from X′ by the repeated application of reactions in R. Note that clR(X′) is well defined and is a subset of π(R′)∪X′ since π(R′)∪X′ satisfies Eq. (1) and the collection of

Properties of RAF sets

In this section we present some lemmas that provide useful information on the structure of RAF sets.

Lemma 4.1

Consider a catalytic reaction system Q=(X,R,C) over F, and two subsets R1, R2 of R. Then,

  • (i)

    if R1 and R2 are both RA for Q (respectively F-generated) then R1R2 is RA for Q (respectively F-generated);

  • (ii)

    if R1 and R2 are both RAF for Q then R1R2 is RAF for Q.

Proof

The result for RA in part (i) is clear from the definition. If R1 and R2 are both F-generated, thenρ(R1R2)=ρ(R1)∪ρ(R2)⊆clR1(F)∪clR2(F)⊆clR1R2(F)

Reduction rules

In this section we introduce a pair of reduction rules which can be applied to any catalytic reaction system Q=(X,R,C). These rules operate as follows.

  • (R1)

    For any r∈R with {x∈supp(R):(x,r)∈C}=∅, delete r from R.

  • (R2)

    For any r∈R for which ρ(r)⊈clR(F), delete r from R.

Let γ(R)⊆R denote a set of reactions obtained by repeated applications of (R1) to (X,R,C) until no further reductions can be made, and let δ(R)⊆R denote a set of reactions obtained by repeated applications of (R2) to (X,R,C) until no

Finding recursively autocatalytic (and RAF) sets

Using the operations γ and δ as introduced in the previous section, we can now construct a decreasing (in size) and nested sequence R=R1,R2,… of subsets of R whereRi+1=δ(γ(Ri)).Let R=⋂i⩾1Ri.

Given a catalytic reaction system Q=(X,R,C), and a subset R of R, we say that R is an irreducible RAF for Q if R is RAF for Q, but no proper subset of R has this property.

We can now state our main result.

Theorem 6.1

A catalytic reaction system Q=(X,R,C) has an RAF set if and only if R is non-empty, in which case R

The RAF algorithm

We now present the polynomial-time algorithm for computing R. The algorithm consists of three separate subroutines that are called sequentially and repeatedly. All three subroutines assume the existence of a set X of molecule types, a set R of reactions, a set C of catalysations, and a set W representing the closure of the food set F relative to R. The subroutines assume X, F, and C to be fixed, but will change the contents of the sets W and R.

The first subroutine, ReduceToRA, implements the γ(

Random catalytic reaction systems

In this section we will consider catalytic reaction systems Q=(X,R,C) over F for which X, F and R are fixed, but C is a random set of catalysations. In particular, we will suppose that each pair (x,r)∈X×R is independently placed in C with the same probability p. Under this model, we will let P(Q,p) denote the probability that there exists an RAF set for Q.

Proposition 8.1

P(Q,p) is a monotone increasing function of p.

Proof

First note that if C1⊂C2⊆X×R, and if (X,R,C1) contains an RAF set, then (X,R,C2) contains an RAF

Simulations

We have implemented the RAF algorithm and used it on Kauffman's original model to study the questions raised in the previous section numerically. Even though our algorithm is polynomial in |R| (the total number of reactions), in Kauffman's model this number increases exponentially with increasing maximum molecule length n, so the values for n that can be numerically investigated in a reasonable time is still rather limited. In the results presented in this section, the highest value for n used

Conclusions

We have described a polynomial-time algorithm for determining whether an arbitrary catalytic reaction system contains an RAF set, without having to search through the exponentially large space of all possible subsets of reactions. The algorithm relies on some simple combinatorial properties of catalytic reaction systems, and the techniques described may be useful for investigating a variety of random biochemical models.

In this paper we have demonstrated the algorithm's use for the Kauffman

Acknowledgements

We thank the Allan Wilson Centre for Molecular Ecology and Evolution for allowing us to use their Beowulf cluster Helix for our numerical simulations. Also thanks to David Penny for helpful comments.

References (10)

There are more references available in the full text version of this article.
View full text