There are a number of challenges that become especially relevant when scaling up the size and complexity of gene circuits for useful functions: the ability of a circuit to sense inputs and generate useful outputs, to manage resource consumption, and to maintain the modularity of its parts. In particular, modularity faces challenges in orthogonality, retroactivity and the avoidance of undesirable behaviour arising from genetic and cellular context. Each problem will be defined, its importance explained, and the current state of the art and future prospects examined.
7.1 Modular, robust and well characterized parts
Generally, larger circuits contain more parts, with each part and connection representing another point of failure. Parts must therefore be well characterized, with robust and predictable behaviour regardless of context, so that the design of large-scale circuits can be fast, predictable and reliable. Essentially this refers to modularity, whereby parts retain their inherent function and behavioural characteristics irrespective of the conditions in which they are placed (Sauro
2008). This enables two key processes: the decomposition of a system into individual parts that can be constructed and tested separately, and the subsequent construction of larger systems from a library of smaller, well-understood pieces that generate predictable functions. Modularity is difficult because of several overlapping yet distinct challenges common to biological systems: connectivity, retroactivity, orthogonality, and context effects.
Connectivity in this scenario refers to the ability of parts to communicate reliably with other parts. Robust signal propagation through a system is important for generating a consistent output; if a signal degrades due to noise or cannot be propagated, it can disrupt function. It is therefore desirable to protect circuits by maintaining good connectivity.
Context dependency is the phenomenon whereby part behaviour becomes dependent on, or affected by, unwanted interactions with the host, the environment, or even its own composition (Cardinale and Arkin
2012). Unlike biological parts, the modules of electronic circuits are linked by discrete wires which, when layered correctly, are unidirectional, do not propagate signals to unintended recipients, and interact minimally with the surrounding environment. This trait is increasingly important as circuits get larger and more complex, as crosstalk leads to noise and unpredictability. Functionality can break down due to these unwanted interactions with the host, system and environment (Kwok
2010; Wang et al.
2011; Wang and Buck
2012; Liu et al.
2018a,
b). A part that does not interact significantly with this context can be surmised to be orthogonal (Liu et al.
2018a). Orthogonality, therefore, is important for both functionality and modularity.
The problem of context runs deep: even genetically identical cells in the same environment can show variable phenotypes, attributed in part to stochastic gene expression arising from the variable behaviour of small numbers of interacting molecules (Munsky et al.
2012). Synthetic pathways can elicit host responses such as stress, or simply display toxicity, and circuit performance is tied closely to the health of the host: its physiology, growth rate, cell volume, division state and (as discussed later) the availability of resources, both internal and external (Cardinale and Arkin
2012; Brophy and Voigt
2014; Liao et al.
2017). Context can also extend into environmental factors such as pH or media (Wang et al.
2011). Temperature in particular has been shown to affect the rate of transcription and the secondary structure of DNA and RNA (Cardinale and Arkin
2012). There is also genetic context; expression can be disturbed by the composition of the adjacent DNA sequences resulting in UTRs affecting the secondary structure and translation rate of the mRNA (Reeve et al.
2014). The size and copy number of the host plasmid can also affect behaviour (Liu et al.
2018b). DNA folding and spacing can affect the steric (spatial) ability for transcription factors to bind, sequence homology can cause deleterious effects and even the orientation of genes on the plasmid can modulate expression levels (Yeung et al.
2017). Since the DNA must be replicated, errors will occur, potentially rendering parts non-functional. Because many parts have a negative effect on cell health and growth, populations will eventually comprise an increasingly large fraction of non-functional circuits; this is known as genetic instability (Zhang et al.
2016). This occurs in spite of selection methods using, for example, antibiotics, as the cells will still evolve to retain only the minimum number of genes required.
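The intrinsic stochasticity of low-copy-number expression mentioned above can be illustrated with a minimal Gillespie simulation of constitutive mRNA production and first-order degradation; the rate constants and time horizon below are arbitrary illustrative values, not taken from any cited study.

```python
import random

def gillespie_birth_death(k=5.0, g=0.5, t_end=100.0, seed=0):
    """Exact stochastic simulation of mRNA copy number n, with constant
    production (propensity k) and degradation (propensity g * n).
    Returns the copy number at time t_end."""
    rng = random.Random(seed)
    t, n = 0.0, 0
    while True:
        total = k + g * n                 # total reaction propensity
        t += rng.expovariate(total)       # time to the next reaction
        if t > t_end:
            return n
        if rng.random() < k / total:      # choose which reaction fired
            n += 1                        # production event
        else:
            n -= 1                        # degradation event

# Genetically identical "cells" (different seeds) end up with different
# copy numbers, scattered around the deterministic mean k / g = 10.
samples = [gillespie_birth_death(seed=s) for s in range(200)]
mean = sum(samples) / len(samples)
```

At steady state the copy number here is Poisson-distributed with mean k/g, so run-to-run variability is proportionally largest exactly when molecule numbers are small.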
Retroactivity, specifically, was defined by Jayanthi et al. (
2013) as “the phenomenon by which a downstream system changes the dynamic state of an upstream system in the process of receiving information from the latter”. In this case, downstream and upstream are relative to the intended flow of information (Del Vecchio et al.
2008). Essentially, this means that attaching an example part B to receive the output of an example part A will change the way part A behaves; this problem naturally scales with the number of connections. In biology it can occur when upstream regulatory factors bind their downstream targets. The effect worsens as the 'load' (the number of binding sites relative to factors) increases, and is magnified in larger circuits, as signalling molecules that are bound can no longer transfer information.
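The sequestration effect behind retroactivity can be seen in a toy equilibrium model (the concentrations and dissociation constant below are invented for illustration, not taken from the cited work): a fixed pool of transcription factor is partitioned between its free form and complexes with downstream binding sites, so adding load reduces the free pool available for signalling.

```python
import math

def free_tf(t_total, p_total, kd):
    """Free transcription factor at binding equilibrium when p_total
    downstream binding sites compete for it (single-site binding with
    dissociation constant kd). Solves the conservation relation
    t_total = T_free + p_total * T_free / (kd + T_free) for T_free."""
    b = kd + p_total - t_total
    return (-b + math.sqrt(b * b + 4.0 * kd * t_total)) / 2.0

# Hypothetical concentrations (e.g. nM): 100 units of factor, kd = 10.
no_load = free_tf(100.0, 0.0, 10.0)     # 100.0: everything stays free
light   = free_tf(100.0, 50.0, 10.0)    # ~57: some factor is sequestered
heavy   = free_tf(100.0, 200.0, 10.0)   # ~8: most factor is tied up
```

The qualitative point matches the text: the larger the downstream load relative to the factor pool, the less free factor remains to carry information onward.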
The questions of context, orthogonality, signal strength and modularity, although distinct, are overlapping challenges with interacting solutions. One of the best ways to maintain a robust signal in digital-like circuits is to maintain a large dynamic range, that is, a large difference or ratio between the ON and OFF states. Analog, graded responses are by nature more vulnerable to noise, as they have continuous outputs, but in both cases a large dynamic range means the relative effect of any noise is smaller, provided the scale of the noise remains the same. This protects the signal from degradation as it propagates through a system, which in turn supports modularity, as a strong signal helps behaviour remain robust across different environments. The tuning tools discussed earlier in Sect.
2 are often used to adjust response curves and in optimisation, ensuring the output of one part can be received and function as a relevant input for the downstream part. They can also affect the dynamic range, and modulate retroactivity by increasing expression of the component (Brophy and Voigt
2014). Alternatively, signal strength can be modulated by the addition of other parts such as amplifiers (Wang et al.
2014).
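As a sketch of what such tuning accomplishes, consider a generic activating Hill transfer function (the parameters below are made up, not those of any published part): shifting the threshold k determines whether the upstream part's output swing lands in the downstream part's sensitive region, and the ON/OFF ratio quantifies the resulting dynamic range.

```python
def hill_response(inp, vmax=100.0, k=10.0, n=2.0, basal=1.0):
    """Generic activating Hill function: output rises from `basal`
    towards `basal + vmax` as the input crosses the threshold `k`."""
    return basal + vmax * inp ** n / (k ** n + inp ** n)

def dynamic_range(low_in, high_in, **params):
    """ON/OFF ratio between outputs at the two input levels."""
    return hill_response(high_in, **params) / hill_response(low_in, **params)

# Upstream part output swings between 2 and 50 (arbitrary units).
matched = dynamic_range(2.0, 50.0, k=10.0)      # threshold inside the swing
mismatched = dynamic_range(2.0, 50.0, k=500.0)  # threshold far above it
```

With the threshold inside the swing the ratio is roughly 20-fold, while the mismatched part barely responds (about 2-fold), so noise of the same absolute scale corrupts a much larger fraction of its signal.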
Other solutions to retroactivity have been attempted by borrowing concepts from control theory, with the subsequent addition of feedback and feed-forward loops for insulation, although the latter can only be applied when it is known how the disturbance affects the system, making it a much more specialized solution (Del Vecchio et al.
2016). The ideal insulator has zero retroactivity to the input and is not functionally affected in terms of output after taking on the load. One possibility is to use phosphorylation-dephosphorylation cycles since they work on a much faster timescale and do not place a large metabolic burden on the host (Del Vecchio et al.
2008).
To avoid crosstalk within a circuit, we must minimize unwanted interactions with the host and with other sections of the circuit. This generally means avoiding repeated use of the same parts, in turn requiring proportionally more unique parts as the complexity and scale of a system grow, making the expansion of the library of well characterized orthogonal parts essential. Alternatively, the circuit can be insulated from unwanted interactions; for example, it can be constructed so as not to rely on the host transcriptional machinery (Liu et al.
2018a) or follow the multicellular distributed approach mentioned previously. The former has gained some traction within the community. The bacteriophage T7 RNAP has been co-opted to separate the transcriptional machinery from that of the bacterial host (Temme et al.
2012). Chan et al. (
2005), refactored the T7 phage genome itself by isolating genes through physical separation, removing or standardizing the sequences adjacent to each coding region whilst retaining functionality, making it much simpler to model and easier to manipulate. This has further led to the idea of an entirely orthogonal central dogma, conceptualising the addition of orthogonal DNA polymerases for replication, and aminoacyl-tRNA synthetases and ribosomes for translation (Liu et al.
2018a). Cello has incorporated into its design space strong terminators that prevent RNAP read-through, and ribozyme sequences whose secondary structures cleave off the UTR to standardize context (Lou et al.
2012; Nielsen et al.
2016). Carr et al. (
2017) developed a degenerate insulator screening (DIS) technique to determine the exact levels of insulation desired for bacterial promoters. Lengthy DNA sequences can be compressed by sharing regulatory parts, though paradoxically this takes a part out of the genetic context in which it was characterized, adding more uncertainty (Brophy and Voigt
2014). Lowering expression levels and resource consumption, reducing the number of repeated sequences, and using inducible promoters can all reduce genetic instability, as can tying the function of the circuit to host health, thereby making it advantageous to host survival (Sleight et al.
2010). Noise can be resisted with negative feedback, as well as with feed-forward loops incorporating both positive and negative regulation, whilst cell–cell communication has also been suggested to confer robustness to noise (Zhang et al.
2016).
Understandably, for large-scale gene circuits all of these issues are proportionally magnified: the more parts, the more points of failure. Ideally, there would be a large number of highly modular components that could easily be assembled together with predictable behaviour, in a sense 'plug and play'. However, this is far from the case: despite the wide array of parts described in the literature, many are not well characterized enough to facilitate easy reuse. Simply put, the behaviour of many components becomes less predictable as they are taken further away from their original context. This is partly down to a lack of standardization in characterization. Protocols vary across groups, equipment differs, and characterization is subject to a host of specific design factors such as plasmid, strain and reporter choice, which we often do not understand well enough to reliably predict behaviour when they are changed. The latter problem is a result of our general lack of knowledge regarding basic biological system behaviour. Whilst mapping out potential cross-reactivity between a small library of parts is reasonably feasible, mapping all potential connections and determining all possible interactions with the host is an order of magnitude more complex, and even more so when accounting for changing environments and different species across time and space. Not only would this be computationally burdensome and difficult to model mathematically, it would also require a large amount of accurate and precise data that simply does not exist at the required scale.
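The scaling argument above can be made concrete with a back-of-the-envelope count (the library and genome sizes below are purely illustrative): pairwise crosstalk checks grow quadratically with the number of parts, and checks against the host's genes multiply on top of that.

```python
def pairwise_checks(n_parts):
    """Distinct part pairs that would need crosstalk screening."""
    return n_parts * (n_parts - 1) // 2

small = pairwise_checks(10)       # 45 pairs: feasible to test directly
large = pairwise_checks(1000)     # 499500 pairs
# Against a bacterial genome of ~4000 genes the count multiplies again:
versus_host = 1000 * 4000         # 4 million part-gene combinations
```

And this still ignores higher-order combinations, environments and timing, which is why exhaustive characterization does not scale.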
Although the optimisation steps listed above are possible, the time cost of performing them on multiple components is vast, and any crosstalk only increases the time needed, as parts respond to multiple unwanted factors and become more difficult to adjust. In a recent pressure test in which organisms were to be engineered to produce 10 molecules not known in advance, Casini et al. (
2018) noticed that literature searches and database entries did not produce actionable data, and even standard procedures such as sequence verification and plasmid/oligo design became bottlenecks. In addition, they had to wait 3–8 weeks for DNA synthesis, further reducing available bench time and suggesting that there is room for improvement across the board.
The solution to these problems will lie in more accurate and standardized initial characterization of parts, improved understanding of basic circuit–circuit and circuit–host interactions to predict behaviour under different conditions, and the reduction of the man-hours required through high-throughput automated design, construction and characterization methods. It is in this context that large-scale circuits could benefit from the scale-up and automation of microfluidics for tasks such as genetic assembly and high-throughput characterization experiments that gather precise single-cell data (which offers a much deeper understanding of host–circuit physiology than population averages), from cell-free systems enabling rapid prototyping, and from methods such as RNA-seq that give a much wider view of cell state. The resulting data can then be fed into computational simulations and models to inform the next round of the DBTL cycle. This will create a positive feedback loop of knowledge: as circuits become better characterized, our understanding of systems will increase, further informing our designs and our ability to model and predict behaviour, and subsequently reducing the time needed to complete the DBTL cycle.
For circuits to have pertinent real-world applications, they must be able to sense relevant phenomena such as the intracellular concentration of a metabolite or extracellular factors such as heavy metals, RNA, DNA, protein, pH, light, oxygen or heat. In addition they must actuate outputs that are valuable to human endeavour. By doing so gene circuits can make the leap from interesting academic problems to useful biotechnological applications.
The generation of novel functional parts often finds its inspiration in already existing natural systems, although a degree of characterization and refining of these parts is necessary to add them to the toolbox (Wang et al.
2015a). Existing proteins have been engineered to sense new metabolites through directed evolution (Collins et al.
2006; Taylor et al.
2016) and some hybrids with novel function have also been developed. A synthetic light-sensitive sensor kinase (Cph1–EnvZ) was made in
E. coli by fusing the photoreceptor domain of the phytochrome Cph1 protein from Synechocystis to the intracellular signal transduction domain of the
E. coli EnvZ kinase, yielding a functional sensor chimera (Tabor et al.
2009). Antibody domains have been fused with DNA-binding domains and activated via ligand-induced dimerization to enable sensing of new molecules (Chang et al.
2018) and chimeric custom proteins have also been demonstrated with modified Notch receptors (Morsut et al.
2016). In some cases sensors can be modified to work in different hosts, as demonstrated with the retooling of TetR family repressors, to work in human embryonic kidney (HEK293) and Chinese hamster ovary (CHO) cells (Stanton et al.
2014). Examples of outputs range from useful biological or small-molecule products (Paddon and Keasling
2014) and simple signalling responses to difficult-to-detect stimuli (Wang et al.
2013a; Bereza-Malcolm et al.
2015), to cancer-targeting classifier circuits that secrete apoptotic proteins (Xie et al.
2011).
Larger-scale circuits will likely include a greater number of these unique sensing and output parts, enabling complex programmable functionality. For example, a bioremediation-based system could potentially monitor many environmental inputs and secrete specific waste-degrading enzymes in response. Circuits would benefit, then, from a larger library of unique, well characterized and modular parts, the general challenges and solutions of which have already been discussed. In particular, the ability to link novel inputs and outputs would benefit strongly from improved protein engineering techniques for modifying existing functionality or building chimeric proteins. This in turn would require a deep understanding of structure–function relationships to avoid time-consuming trial-and-error experimentation (Wang et al.
2013b). Bioinformatics may be able to play a strong role too, in estimating structure and function of candidate proteins from their genetic sequences to narrow the design space (Stanton et al.
2013).
Metabolic burden, or load, can be understood as the resource consumption that the engineered system imposes upon the host. Burden is often the focus of metabolic engineers optimizing a product-producing pathway; however, it is also relevant to the construction of gene circuits, as resource limitation fundamentally affects system behaviour. Cells have an upper limit of nutrient and energy intake that limits all cellular activity; one of these hard limits can usually be described in terms of ATP. Cells can compensate somewhat by increasing respiration and catabolism, but under too much strain total protein production drops sharply to near zero, often resulting in the collapse of the population (Wu et al.
2016). The effect of foreign protein production on the host was noticed early on: increasing amounts of foreign protein led to decreasing growth rates in
E. coli (Bentley et al.
1990; Bhattacharya and Dubey
1995). The amino acid content of recombinant proteins has also been shown to affect production levels (Bonomo and Gill
2005) whilst the amount of free ribosomes and RNAPs is also important, itself affected by presence of plasmid DNA (Birnbaum and Bailey
1991). There is evidence that genetic load follows equations resembling Ohm's law for resistance in electrical circuits (Carbonell-Ballestero et al.
2016). Other findings have shown that 'leaky' basal levels of transcription and high plasmid copy number contribute to the protein burden (Lee et al.
2016), with copy number also changing gene circuit expression within the host cell. Increasing copy number increases expression of the receptors for the system input, thereby increasing retroactivity, and decreases the sensitivity and dynamic range of repressor-based systems given the same amount of repressor, with the converse for activator-based systems (Wang et al.
2015a; Liu et al.
2018b).
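The resource effects above can be caricatured with a toy shared-pool model (made-up units, only loosely inspired by the Ohm's-law analogy rather than the cited derivation): every expressed gene draws on the same expression machinery, so adding genes lowers each gene's individual output.

```python
def expression_shares(demands, capacity=100.0):
    """Each gene's output is its demand-weighted share of a fixed
    expression-machinery pool, so genes load one another much as
    resistors divide a fixed voltage. `demands` are relative strengths."""
    total = sum(demands)
    return [capacity * d / (1.0 + total) for d in demands]

alone = expression_shares([2.0])            # [~66.7]
competing = expression_shares([2.0, 2.0])   # [40.0, 40.0]: each gets less
```

The same caricature explains why adding an unrelated burden-heavy construct can perturb an existing circuit even with no direct regulatory interaction.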
Managing load requires accurate characterization and calculated mitigation. The copy number and general expression levels of the circuit should be kept as low as is compatible with predictable behaviour. If necessary, the circuit can be spread across multiple cells following the principles of distributed computing. RNA-based control tends to be the least burdensome on host metabolism; Lapique and Benenson (
2017) even combined two orthogonal binding sites into one DNA sequence using recombinases to reversibly express equal amounts of the forward and reverse DNA sequence, thereby generating two separate species of RNA, each with one functional and orthogonal binding site. Ceroni et al. (
2015) inserted a constitutively expressed GFP element that would act as a tracker for metabolic change in the host. The Cello design framework manages burden through simulating the load on each cell by factoring in the impact on growth relative to the functional activity of the input promoter (Nielsen et al.
2016). This information can be used by the designer to optimize the circuit (Wu et al.
2016). Liao et al. (
2017) created a model that considers different RNA levels, the proteome (dividing it into gene expression apparatus and metabolic machinery), resource partitioning (including ATP and amino acid synthesis) as well as other factors such as growth, copy number and cell volume. The CRISPR-Cas system has been used to attenuate leaky gene expression with T7 RNAP and has been shown to improve growth in systems with previously toxic leaky expression (McCutcheon et al.
2018). Incoherent feedforward loops (iFFLs) have been engineered into promoters using transcription-activator-like effectors (TALEs), stabilising expression levels across different copy numbers (Segall-Shapiro et al.
2018) whilst Lee et al. (
2016) created single copy plasmids with stable expression.
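The copy-number compensation provided by such an iFFL can be sketched with a toy steady-state model (parameter values are invented, not the published ones): because the repressor is encoded on the same DNA as its target, its level also scales with copy number, cancelling the copy-number dependence of the output.

```python
def iffl_output(copies, a_gene=10.0, a_rep=1.0, k_rep=1.0):
    """Toy iFFL: target and TALE-like repressor share the same copy
    number, so repression grows with copies and the output plateaus
    near a_gene * k_rep / a_rep instead of growing without bound."""
    repressor = a_rep * copies
    return copies * a_gene / (1.0 + repressor / k_rep)

def unregulated_output(copies, a_gene=10.0):
    """Without the iFFL, output scales linearly with copy number."""
    return copies * a_gene

low, high = iffl_output(5), iffl_output(50)   # ~8.3 vs ~9.8: nearly flat
```

In this sketch a ten-fold change in copy number shifts the unregulated output ten-fold but the iFFL output by less than 20%.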
Larger circuits mean more components, which will inevitably have a proportionally larger effect on metabolic load. Selecting parts with minimal resource consumption (such as RNA-based tools), and reducing the consumption of existing parts through tuning, will constitute a large part of the solution. The latter case brings complications, as once a part is modified away from its original specifications it will need to be characterized again. Furthermore, reducing expression levels can have negative effects on signal robustness and increase susceptibility to unwanted interactions and noise. The literature has suggested that parts with analog behaviour are significantly more resource-efficient, and the authors suggest that hybrid devices will likely be common in the future (Sarpeshkar
2014). Parts might also be arranged so as not to overlap in the type of load they produce, for example by distributing load across both transcription and translation, or they might be combined into a single layer that does not require communication between parts for sub-computation, as demonstrated earlier (Weinberg et al.
2017). However, the authors do note that this means the performance of a circuit cannot be predicted from its constituent parts. Another solution would be to distribute the circuit across different consortia, as discussed earlier; this is likely to become a common approach, as the load from individual parts cannot be reduced indefinitely.
Tools that allow us to monitor and predict load will also become increasingly important. High-throughput experimentation will again allow us to gather a larger amount of data in a shorter space of time, and tools such as RNA-seq or whole-cell mass spectrometry, which offer a wide view of cellular gene expression and metabolism, will be key in deciphering the interactions between circuit and host (Liu et al.
2018b). Here the related field of metabolic engineering holds promise, having developed tools such as metabolic flux balance analysis to predict the distribution of important resources such as carbon (Yang et al.
2007). Finally, as before, as data becomes more readily available and accurate, computational prediction will become increasingly important.