2016 | Open Access | Book

Anti-fragile ICT Systems

About this book

This book introduces a novel approach to the design and operation of large ICT systems. It views the technical solutions and their stakeholders as complex adaptive systems and argues that traditional risk analyses cannot predict all future incidents with major impacts. To avoid unacceptable events, it is necessary to establish and operate anti-fragile ICT systems that limit the impact of all incidents, and which learn from small-impact incidents how to function increasingly well in changing environments.

The book applies four design principles and one operational principle to achieve anti-fragility for different classes of incidents. It discusses how systems can achieve high availability, prevent malware epidemics, and detect anomalies. Analyses of Netflix’s media streaming solution, Norwegian telecom infrastructures, e-government platforms, and Numenta’s anomaly detection software show that cloud computing is essential to achieving anti-fragility for classes of events with negative impacts.

Table of Contents

Frontmatter

The Concept of Anti-fragility

Frontmatter

Open Access

Chapter 1. Introduction
Abstract
Modern societies cannot function without information and communications technology (ICT) systems. When ICT systems such as electronic government (e-government) systems, e-payment infrastructures, and mobile phone networks fail, users can still access alternative systems based on older technologies, but these alternatives are rapidly disappearing. It is therefore necessary to develop ICT systems that remain highly robust to undesirable incidents over time and that are available to citizens around the clock.
Kjell Jørgen Hole

Open Access

Chapter 2. Achieving Anti-fragility
Abstract
A stakeholder is a person or institution with a legitimate interest in a given information and communications technology (ICT) system. Examples of stakeholders are users, owners, operators, regulatory government agencies, system architects, and software developers. Given a set of stakeholders, a complex adaptive ICT system is fragile to a particular type of negative impact, for example, downtime, if a possible large impact is unacceptable to some stakeholders in the set and robust if all possible impacts are acceptable to all stakeholders. The ICT system is anti-fragile if it learns (perhaps with help from some stakeholders) to maintain an acceptable impact to all stakeholders as the system and environment change over time.
Kjell Jørgen Hole

Open Access

Chapter 3. The Need to Build Trust
Abstract
An organization operating and managing a complex adaptive information and communications technology (ICT) system is said to be anti-fragile when, over time, the organization is able to protect the user population from serious consequences of system failures and simultaneously provide digital services fulfilling the users’ changing needs. According to Chap. 2, failures are inevitable in a complex ICT system. Unless a user population has a high level of trust in the system, the population may abandon the system after a failure. Hence, any anti-fragile organization running a complex ICT system must maintain a high level of trust over time to keep its users after inevitable system failures.
Kjell Jørgen Hole

Open Access

Chapter 4. Principles Ensuring Anti-fragility
Abstract
This chapter first introduces four design principles that together isolate local failures before they propagate and cause systemic failures. It then presents one operational principle to quickly remove exploitable vulnerabilities. Finally, the chapter discusses how a systemic failure can occur in a complex adaptive system even when no parts fail, as well as the need to build models to understand such extreme global behavior.
Kjell Jørgen Hole

Anti-fragility to Downtime

Frontmatter

Open Access

Chapter 5. Anti-fragile Cloud Solutions
Abstract
To better understand how to achieve anti-fragility to downtime, the chapters of Part II discuss how to realize the four design principles and the one operational principle from Chap. 4 in different types of systems. The current chapter focuses on how to realize the principles in customer-facing web-scale solutions in the cloud. Much of the discussion is based on design and operational patterns described by Michael T. Nygard and Netflix’s realization of these patterns in its cloud-based streaming service.
Kjell Jørgen Hole

Open Access

Chapter 6. Toward an Anti-fragile e-Government System
Abstract
This chapter first studies the Norwegian electronic government system Altinn as it appeared in 2012 to better understand why it is advantageous to base the design of anti-fragile web-scale systems on fine-grained service-oriented architectures in public clouds with scalable and distributed data storage. Next, the chapter considers the United Kingdom’s e-government system to understand the need for user-focused and iterative development to support both rapid change and high availability. Finally, the chapter discusses whether a nation should have a single e-government system running many services or multiple independent and diverse systems running a few services each.
Kjell Jørgen Hole

Open Access

Chapter 7. Anti-fragile Cloud-Based Telecom Systems
Abstract
This chapter studies how to apply the five design and operational principles from Chap. 4 to develop and maintain cloud-based telecom infrastructures with anti-fragility to downtime.
Kjell Jørgen Hole

Anti-fragility to Malware

Frontmatter

Open Access

Chapter 8. Robustness to Malware Spreading
Abstract
This chapter investigates software diversity’s ability to make systems robust to the spreading of infectious malware and argues that diversity increases the time needed to compromise enterprise systems, thus increasing the probability of early detection and mitigation.
Kjell Jørgen Hole

Open Access

Chapter 9. Robustness to Malware Reinfections
Abstract
In this chapter, we study a stochastic epidemiological model of multimalware outbreaks where arbitrary but fixed probabilities determine whether nodes are infected. Furthermore, nodes recover from infections with given probabilities, only to be reinfected later. An incident from 2007, where the same worm repeatedly infected the internal networks of a Norwegian bank, illustrates how reinfections can occur in real networks.
Kjell Jørgen Hole

Open Access

Chapter 10. Anti-fragility to Malware Spreading
Abstract
To achieve anti-fragility to malware spreading, this chapter applies the fail fast principle from Chap. 4 to the robust malware-halting technique developed in the two previous chapters. According to the fail fast principle, it is necessary to learn from failures in complex adaptive systems when the impact of the failures is still small. In the case of infectious malware epidemics, once malware is detected on a node in a networked system, other nodes infected by the same malware should be healed and susceptible nodes should be protected from future infections of this malware.
Kjell Jørgen Hole

Anomaly Detection

Frontmatter

Open Access

Chapter 11. The HTM Learning Algorithm
Abstract
According to the fail fast principle in Chap. 4, we need to learn from systems’ abnormal behavior and downright failures to achieve anti-fragility to classes of negative events. The earlier we can detect problems, the smaller the negative consequences are and the faster we can start learning how to improve the systems. Since humans are not good at detecting anomalies, especially in streaming data from large cloud applications, a form of automatic anomaly detection is needed. This first chapter of Part IV introduces a general learning algorithm based on Jeff Hawkins’s developing theory of how the brain learns, called hierarchical temporal memory (HTM). The HTM learning algorithm is used in the next chapter to detect anomalies in a system’s behavior.
Kjell Jørgen Hole

Open Access

Chapter 12. Anomaly Detection with HTM
Abstract
We model information and communications technology (ICT) systems as complex adaptive systems. Since we cannot hope to predict all future incidents in complex systems, real-time monitoring is needed to detect local failures before they propagate into global failures with an intolerable impact. In particular, monitoring is required to determine the consequences of injecting artificial errors into production systems and to learn how to avoid or limit the impact of future incidents.
Kjell Jørgen Hole

Future Anti-fragile Systems

Frontmatter

Open Access

Chapter 13. Summary and Future Work
Abstract
We have come to the end of the book, which has investigated different aspects of anti-fragile information and communications technology (ICT) systems. This chapter summarizes the book’s main insights into the development and operation of anti-fragile ICT systems, discusses the design of future systems, and outlines the need for anti-fragile processes, especially to handle attacks on the confidentiality, integrity, and availability of ICT systems.
Kjell Jørgen Hole
Backmatter
Metadata
Title
Anti-fragile ICT Systems
Author
Kjell Jørgen Hole
Copyright Year
2016
Electronic ISBN
978-3-319-30070-2
Print ISBN
978-3-319-30068-9
DOI
https://doi.org/10.1007/978-3-319-30070-2
