
2018 | Book

Ecological Informatics

Data Management and Knowledge Discovery


About this book

This book introduces readers to ecological informatics as an emerging discipline that takes into account the data-intensive nature of ecology, the valuable information to be found in ecological data, and the need to communicate results and inform decisions, including those related to research, conservation and resource management. At its core, ecological informatics combines developments in information technology and ecological theory with applications that facilitate ecological research and the dissemination of results to scientists and the public. Its conceptual framework links ecological entities (genomes, organisms, populations, communities, ecosystems, landscapes) with data management, analysis and synthesis, and communicates new findings to inform decisions by following the course of a loop.

In comparison to the 2nd edition published in 2006, the 3rd edition of Ecological Informatics has been completely restructured on the basis of the generic conceptual framework provided in Figure 1. It reflects the significant advances in data management, analysis and synthesis that have been made over the past 10 years, including new remote and in situ sensing techniques, the emergence of ecological and environmental observatories, novel evolutionary computations for knowledge discovery and forecasting, and new approaches to communicating results and informing decisions.

Table of Contents



Chapter 1. Ecological Informatics: An Introduction
Ecological Informatics is an emerging discipline that takes into account the data-intensive nature of ecology, the valuable information content of ecological data, and the need to communicate results and inform decisions, including those related to research, conservation and resource management (Recknagel 2017). At its core, ecological informatics combines developments in information technology and ecological theory with applications that facilitate ecological research and the dissemination of results to scientists and the public. Its conceptual framework links ecological entities (genomes, organisms, populations, communities, ecosystems, landscapes) with data management, analysis and synthesis, and communicating and informing decisions by following the course of a loop (Fig. 1.1).
Friedrich Recknagel, William K. Michener

Managing Ecological Data

Chapter 2. Project Data Management Planning
A data management plan (DMP) describes how you will manage data during a research project and what you will do with the data after the project ends. Research sponsors may have very specific requirements for what should be included in a DMP. In lieu of or in addition to those requirements, good plans address 11 key issues: (1) research context (e.g., what questions or hypotheses will be examined); (2) how the data will be collected and acquired (e.g., human observation, in situ or remote sensing, surveys); (3) how the data will be organized (e.g., spreadsheets, databases); (4) quality assurance and quality control procedures; (5) how the data will be documented; (6) how the data will be stored, backed up and preserved for the long-term; (7) how the data will be integrated, analyzed, modeled and visualized; (8) policies that affect data use and redistribution; (9) how data will be communicated and disseminated; (10) roles and responsibilities of project personnel; and (11) adequacy of budget allocations to implement the DMP. Several tips are offered in preparing and using the DMP. In particular, researchers should start early in the project development process to create the DMP, seek input from others, engage all relevant project personnel, use common and widely available tools, and adopt community practices and standards. The best DMPs are those that are referred to frequently, reviewed and revised on a routine basis, and recycled for use in subsequent projects.
William K. Michener
Chapter 3. Scientific Databases for Environmental Research
Databases are an important tool in the arsenal of environmental researchers. There are a rich variety of database types available to researchers for the management of their own data and for sharing data with others. However, using databases for research is not without challenges due to the characteristics of scientific data, which differ in terms of longevity, volume, diversity and ways they are used from many business applications. This chapter reviews some successful scientific databases, pathways for developing scientific data resources, and general classes of Database Management Systems (DBMS). It also provides an introduction to data modeling, normalization and how databases and data derived from databases can be interlinked to produce new scientific products.
John H. Porter
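As an illustration of the data modeling and normalization the chapter introduces, here is a minimal sketch using Python's built-in sqlite3 module; the schema, table names and values are hypothetical, not taken from the chapter:

```python
import sqlite3

# Hypothetical example: a small normalized schema for field data.
# Site attributes live in one table; repeated measurements reference
# the site by key instead of duplicating its attributes (normalization).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE site (
        site_id   INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        latitude  REAL,
        longitude REAL
    );
    CREATE TABLE measurement (
        site_id INTEGER REFERENCES site(site_id),
        day     TEXT,
        temp_c  REAL
    );
""")
con.execute("INSERT INTO site VALUES (1, 'North Inlet', 44.02, -72.31)")
con.executemany("INSERT INTO measurement VALUES (?, ?, ?)",
                [(1, "2018-06-01", 18.4), (1, "2018-06-02", 19.1)])

# A join recombines the normalized tables into an analysis-ready view,
# one way databases can be interlinked to produce derived products.
rows = con.execute("""
    SELECT s.name, m.day, m.temp_c
    FROM measurement m JOIN site s USING (site_id)
    ORDER BY m.day
""").fetchall()
print(rows)  # [('North Inlet', '2018-06-01', 18.4), ('North Inlet', '2018-06-02', 19.1)]
```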
Chapter 4. Quality Assurance and Quality Control (QA/QC)
This chapter introduces, first, the quality assurance processes and procedures that are employed to prevent data contamination from occurring and, second, the quality control processes and procedures that are used to identify and deal with errors after they have been introduced. In addition, QA/QC activities that can be implemented throughout the entire data life cycle, from data acquisition through analysis and preservation, are described, and general rules of thumb for promoting data quality are presented.
William K. Michener
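A simple quality-control step of the kind the chapter covers is a range check that flags implausible values rather than silently deleting them. The function name, the flags and the plausible range below are illustrative assumptions, not the chapter's procedure:

```python
# Hypothetical sketch of a range-check QC step: flag values outside a
# plausible range so a reviewer can inspect them later.
def flag_out_of_range(values, lo, hi):
    """Return (value, flag) pairs; flag is 'OK' or 'SUSPECT'."""
    return [(v, "OK" if lo <= v <= hi else "SUSPECT") for v in values]

# Two obvious sensor glitches in an otherwise plausible temperature series.
water_temps_c = [12.1, 13.0, -45.0, 14.2, 99.9]
flagged = flag_out_of_range(water_temps_c, lo=-2.0, hi=40.0)
print(flagged)
```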
Chapter 5. Creating and Managing Metadata
This chapter introduces the reader to metadata as well as the standards and tools that can be used to generate and manage standardized metadata during a research project. First, metadata is defined and many of the benefits that accrue from creating comprehensive metadata are listed. Second, the different types of metadata that may be necessary to understand and use (e.g., analyze, visualize) a data set are described along with some relevant examples. Third, the content, characteristics, similarities and differences among many of the relevant ecological metadata standards are presented. Fourth, the various software tools that enable one to create metadata are described and best practices for creating and managing metadata are recommended.
William K. Michener
Chapter 6. Preserve: Protecting Data for Long-Term Use
This chapter provides guidance on fundamental data management practices that investigators should perform during the course of data collection to improve both the preservation and usability of their data sets over the long term. Topics covered include fundamental best practices on how to choose the best format for your data, how to better structure data within files, how to define parameters and units, and how to develop data documentation so that others can find, understand, and use your data easily. We also showcase advanced best practices on how to properly specify spatial and temporal characteristics of your data in standard ways so your data are ready and easy to visualize in both 2-D and 3-D viewers. By following this guidance, data will be less prone to error, more efficiently structured for analysis, and more readily understandable for any future questions that the data products might help address.
Robert B. Cook, Yaxing Wei, Leslie A. Hook, Suresh K. S. Vannan, John J. McNelis
Chapter 7. Data Discovery
Data may be discovered by searching commercially available internet search engines, institutional and public repositories, online data directories, and the content exposed by data aggregators. Chapter 7 describes these various search approaches and presents seven best practices that can promote data discovery and reuse. It further emphasizes the need for data products to be uniquely identifiable and attributable to the data originators who must also be uniquely identifiable.
William K. Michener
Chapter 8. Data Integration: Principles and Practice
Data integration is the process of combining (also called “merging” or “joining”) multiple, distinct data objects into a single unified data object. The motivation for integrating data is usually to bring together the information needed to jointly analyze or model some phenomena. Producing a single, consistently structured object through data integration vastly simplifies further manipulation of those data, while clarifying presumed relationships among them.
Data integration is essential for many scientific disciplines, but especially in disciplines such as ecology and the environmental sciences, where processes and patterns of interest often emerge from interactions among numerous complex physical phenomena. Observations of these distinct phenomena are often collected by disparate parties in uncoordinated ways, using different data systems. It is then necessary to gather these data together and appropriately integrate them, to clarify through further modeling and analysis the nature and strength of any relationships among them. Synthesis studies, in particular, often require finding, and then bringing together disparate data in order to integrate them, and reveal new insights.
This chapter describes aspects of data that are critical for determining whether and how data can be integrated, and discusses some of the theoretical considerations and common mechanisms for integrating data.
Mark Schildhauer
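The joining mechanism the chapter discusses can be sketched in a few lines; this is an illustrative example (the variables, sites and values are invented), joining two data sets collected by different parties on a shared (site, date) key:

```python
# Two hypothetical data sources, each keyed on (site, date).
temps = {("L1", "2018-06-01"): 18.4, ("L1", "2018-06-02"): 19.1}
chla  = {("L1", "2018-06-01"): 3.2,  ("L1", "2018-06-03"): 4.7}

# Inner join: keep only keys present in both sources, so every merged
# record has a value for each variable -- one common integration choice.
merged = {k: {"temp_c": temps[k], "chl_a": chla[k]}
          for k in temps.keys() & chla.keys()}
print(merged)  # only 2018-06-01 appears in both sources
```

Other join types (outer, left) keep unmatched records at the cost of missing values, one of the trade-offs such integration decisions involve.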

Analysis, Synthesis and Forecasting of Ecological Data

Chapter 9. Inferential Modelling of Population Dynamics
This chapter introduces the design and applications of evolutionary algorithms and regression trees for inferential modelling of complex ecological data. Evolutionary algorithms prove to be superior tools for developing short-term forecasting models, revealing ecological thresholds and supporting quantitative meta-analyses, as demonstrated by means of the hybrid evolutionary algorithm (HEA). A case study of Lake Müggelsee (Germany) illustrates that models developed by HEA enable one to identify ecological thresholds and driving forces and to perform short-term forecasting of population growth. The meta-analysis of Lake Wivenhoe (Australia) and Lake Paranoa (Brazil) exemplifies the capability of models developed by HEA to test hypotheses on forcing functions of population growth across different environmental and climate conditions. Regression trees display fully transparent correlations between habitat properties and ecological entities. The tree induction process does not require prior assumptions, is fast, and is not influenced by redundant variables and noise. Case studies for Lake Prespa (Macedonia) and land areas in Victoria (Australia) illustrate the capacity of regression trees to unravel complex ecological relationships.
Friedrich Recknagel, Dragi Kocev, Hongqing Cao, Christina Castelo Branco, Ricardo Minoti, Saso Dzeroski
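The core step of regression-tree induction is choosing a split threshold, which is also what makes trees natural detectors of ecological thresholds. The following is a generic sketch of one split (a "stump"), with invented toy data, not the chapter's HEA or tree software:

```python
def best_split(x, y):
    """Find the threshold on x that minimizes the summed squared error
    when each side of the split is predicted by its own mean (one node
    of a regression tree)."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best_t, best_err = None, float("inf")
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        err = sse(left) + sse(right)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Toy data: the response jumps once the driver exceeds ~20,
# mimicking an ecological threshold.
driver = [10, 12, 15, 18, 22, 25, 28, 30]
response = [1.0, 1.1, 0.9, 1.2, 5.1, 5.3, 4.9, 5.2]
print(best_split(driver, response))  # 18 -- the split lands at the jump
```

A full tree applies this search recursively to each resulting partition.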
Chapter 10. Process-Based Modeling of Nutrient Cycles and Food-Web Dynamics
Mathematical models are indispensable for addressing pressing aquatic ecosystem management issues, such as understanding the oceanic response to climate change, the interplay between plankton dynamics and atmospheric CO2 levels, and alternative management plans for eutrophication control. The appeal of process-based (mechanistic) models mainly stems from their ability to synthesize among different types of information reflecting our best understanding of the ecosystem functioning, to identify the key individual relationships and feedback loops from a complex array of intertwined ecological processes, and to probe ecosystem behavior using a range of model application domains. Significant progress in developing and applying mechanistic aquatic biogeochemical models has been made during the last three decades. Many of these ecological models have been coupled with hydrodynamic models and include detailed biogeochemical/biological processes that enable comprehensive assessment of system behavior under various conditions. In this chapter, case studies illustrate ecological models with different spatial configurations. Given that each segmentation depicts different trade-offs among model complexity, information gained, and predictive uncertainty, our objective is to draw parallels and ultimately identify the strengths and weaknesses of each strategy.
George Arhonditsis, Friedrich Recknagel, Klaus Joehnk
Chapter 11. Uncertainty Analysis by Bayesian Inference
The scientific methodology of mathematical models and their credibility to form the basis of public policy decisions have been frequently challenged. The development of novel methods for rigorously assessing the uncertainty underlying model predictions is one of the priorities of the modeling community. Striving for novel uncertainty analysis tools, we present the Bayesian calibration of process-based models as a methodological advancement that warrants consideration in ecosystem analysis and biogeochemical research. This modeling framework combines the advantageous features of both process-based and statistical approaches; that is, mechanistic understanding that remains within the bounds of data-based parameter estimation. The incorporation of mechanisms improves the confidence in predictions made for a variety of conditions, whereas the statistical methods provide an empirical basis for parameter value selection and allow for realistic estimates of predictive uncertainty. Other advantages of the Bayesian approach include the ability to sequentially update beliefs as new knowledge is available, the rigorous assessment of the expected consequences of different management actions, the optimization of the sampling design of monitoring programs, and the consistency with the scientific process of progressive learning and the policy practice of adaptive management. We illustrate some of the anticipated benefits from the Bayesian calibration framework, well suited for stakeholders and policy makers when making environmental management decisions, using the Hamilton Harbour and the Bay of Quinte—two eutrophic systems in Ontario, Canada—as case studies.
George Arhonditsis, Dong-Kyun Kim, Noreen Kelly, Alex Neumann, Aisha Javed
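The engine behind such Bayesian calibration is usually a Markov chain Monte Carlo sampler. Below is a deliberately minimal Metropolis sketch for a single parameter (a mean with a flat prior and known observation error); the data, step size and chain length are illustrative assumptions, not the chapter's process-based setup:

```python
import math
import random
import statistics

random.seed(1)
data = [4.8, 5.1, 5.3, 4.9, 5.2]   # hypothetical observations
sigma = 0.5                        # assumed known observation error

def log_posterior(mu):
    # Flat prior on mu, Gaussian likelihood: log p(mu | data) up to a constant.
    return -sum((d - mu) ** 2 for d in data) / (2 * sigma ** 2)

mu, samples = 0.0, []
for _ in range(20000):
    prop = mu + random.gauss(0, 0.5)   # random-walk proposal
    # Metropolis acceptance rule on the log-posterior difference.
    if random.random() < min(1.0, math.exp(log_posterior(prop) - log_posterior(mu))):
        mu = prop
    samples.append(mu)

# Discard burn-in; the remaining samples approximate the posterior of mu.
posterior_mean = statistics.mean(samples[5000:])
print(round(posterior_mean, 1))
```

Sequentially updating beliefs, as the chapter describes, amounts to using one posterior as the prior for the next round of data.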
Chapter 12. Multivariate Data Analysis by Means of Self-Organizing Maps
Ecological data range widely in variability, showing non-linear and complex relationships among variables. Although conventional multivariate analyses are useful tools to explore ecological data, data mining by non-linear methods is preferred because a high degree of complexity resides in ecological phenomena. One of these methods is artificial neural networks, machine learning models based on biologically inspired learning algorithms. The self-organizing map (SOM) is one of the most popular unsupervised artificial neural networks and is commonly used to seek patterns and clusters in ecological data. SOMs are versatile in analysing the non-linear and complex data observed frequently in ecological systems. In this chapter, we explain the theory of SOMs and their application in ecological modelling, with a focus on learning processes, visualization, preprocessing of input data, and interpretation of results. We also discuss the advantages and disadvantages of SOM approaches.
Young-Seuk Park, Tae-Soo Chon, Mi-Jung Bae, Dong-Hwan Kim, Sovan Lek
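The SOM learning process the chapter explains can be reduced to a tiny, self-contained sketch: find the best-matching unit for each input and pull it and its map neighbours toward that input, with decaying learning rate and neighbourhood radius. The map size, data and schedules below are illustrative assumptions:

```python
import math
import random

random.seed(0)
# Toy inputs: two clusters in 2-D feature space.
data = [(0.1, 0.2), (0.15, 0.1), (0.9, 0.8), (0.85, 0.95)]
# A 1-D map of four nodes, each with a 2-D weight vector.
nodes = [[random.random(), random.random()] for _ in range(4)]

def bmu(x):
    """Index of the best-matching unit (closest node) for input x."""
    return min(range(len(nodes)),
               key=lambda i: sum((nodes[i][d] - x[d]) ** 2 for d in range(2)))

for t in range(500):
    lr = 0.5 * (1 - t / 500)                     # decaying learning rate
    radius = max(0.5, 2.0 * (1 - t / 250))       # decaying neighbourhood
    x = random.choice(data)
    b = bmu(x)
    for i, w in enumerate(nodes):
        # Gaussian neighbourhood kernel over map distance to the BMU.
        h = math.exp(-((i - b) ** 2) / (2 * radius ** 2))
        for d in range(2):
            w[d] += lr * h * (x[d] - w[d])

# After training, the two clusters should map to different units.
print(bmu((0.1, 0.15)), bmu((0.9, 0.9)))
```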
Chapter 13. GIS-Based Data Synthesis and Visualization
Synthesizing and properly visualizing data in 2D systems is a key issue when aiming at explaining spatial patterns by spatial processes.
In this chapter we address the topics of synthesis and visualization in relation to the following ecological issues: (1) synthesizing species distribution models relying on virtual species, (2) visualizing spatial uncertainty in species distribution based on cartograms, (3) fuzzy methods to synthesize species distribution uncertainty, (4) remote sensing data synthesis by exploratory analysis and replotting data in new systems, (5) measuring and visualizing ecological diversity from space based on generalized entropy, and (6) neutral landscapes for testing ecological theories. We make use of examples from the free and open source software GRASS GIS and R.
Duccio Rocchini, Carol X. Garzon-Lopez, A. Marcia Barbosa, Luca Delucchi, Jonathan E. Olandi, Matteo Marcantonio, Lucy Bastin, Martin Wegmann

Communicating and Informing Decisions

Chapter 14. Communicating and Disseminating Research Findings
This chapter provides guidance on approaches and best practices for communicating and disseminating research findings to technical audiences via scholarly publications such as peer-reviewed journal articles, abstracts, technical reports, books and book chapters. We also discuss approaches for communicating findings to more general audiences via newspaper and magazine articles and highlight best practices for designing effective figures that explain and support the research findings that are presented in scientific and general audience publications. Research findings may also be presented verbally to educate, change perceptions and attitudes, or influence policy and resource management. Key topics include simple steps for giving effective presentations and best practices for designing slide text and graphics, posters and handouts. Websites and social media are increasingly important mechanisms for communicating science. We discuss forms of commonly used social media, identify simple steps for effectively using social media, and highlight ways to track and understand your social media and overall research impact using various metrics and altmetrics.
Amber E. Budden, William K. Michener
Chapter 15. Operational Forecasting in Ecology by Inferential Models and Remote Sensing
This chapter addresses the demand of environmental agencies and water industries for tools enabling them to prevent and mitigate events of rapid deterioration of environmental assets, such as contamination of air, soils and water, declining biodiversity, and desertification of landscapes. Access to reliable early warning signals may avoid excessive ecological and economic costs.
Here we present examples of recently emerging technologies for predictive modelling and remote sensing suitable for early warning of outbreaks of toxic cyanobacteria blooms in freshwaters, which pose a serious threat to public health and biodiversity. As demonstrated by two case studies, inferential models developed from in situ water quality data by evolutionary computation prove suitable for forecasting, up to 30 days ahead, the population dynamics of cyanobacteria and the concentrations of cyanotoxins in drinking water reservoirs with different climates. The models forecast not only daily concentrations of cyanobacteria and cyanotoxins but also daily proliferation rates. Proliferation rates exceeding 0.2 day⁻¹ serve as criteria for early warning: an alarm is triggered if forecasted concentrations of cyanobacteria or cyanotoxins exceed predefined threshold values and proliferation rates exceed 0.2 day⁻¹, constituting a bloom event. Findings from these case studies suggest that cyanobacteria blooms can be forecasted up to 30 days ahead in real-time mode solely on the basis of online water quality data monitored by multi-sensor data loggers.
Advanced remote sensing technology makes it possible to quantify the absorption/reflectance characteristics of algal pigments in a water column, deriving chlorophyll-a concentrations as an indicator of algal biomass, or discriminating cyanobacteria by specific pigments such as cyano-phycocyanin and cyano-phycoerythrin. It has the potential to monitor the spatio-temporal distribution of water quality parameters and cyanobacteria blooms, given sufficient spatial, temporal and spectral resolution of the sensors and the availability of suitable algorithms to match satellite information with high-resolution in situ measurements. The chapter discusses the prospect of using remote sensing technology for forecasting seasonal trajectories of cyanobacteria blooms, which requires the combination of in situ monitoring and remote sensing data with hydrodynamic models. By deriving vertical light attenuation in the water column from remote sensing data, hydrodynamic models will be enabled to predict seasonally occurring cyanobacteria blooms.
Friedrich Recknagel, Philip Orr, Annelie Swanepoel, Klaus Joehnk, Janet Anstee
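The early-warning rule in the abstract above can be sketched directly: the 0.2 day⁻¹ criterion comes from the chapter, but the function names, the abundance threshold and the toy forecast values are illustrative assumptions:

```python
import math

def proliferation_rate(n_today, n_yesterday):
    """Exponential growth rate per day from two successive abundances."""
    return math.log(n_today / n_yesterday)

def alarm(forecast, threshold_cells=20000.0, threshold_rate=0.2):
    """Trigger only if forecasted abundance exceeds a predefined threshold
    AND the population is proliferating faster than 0.2 per day."""
    r = proliferation_rate(forecast[-1], forecast[-2])
    return forecast[-1] > threshold_cells and r > threshold_rate

# Hypothetical forecasted daily cyanobacteria counts (cells/mL).
forecast = [9000.0, 15000.0, 21000.0]
print(alarm(forecast))  # rate = ln(21000/15000) ~= 0.34 > 0.2 and count > 20000
```

Requiring both conditions keeps a slowly drifting high count, or a fast-growing but still sparse population, from raising a false alarm.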
Chapter 16. Strategic Forecasting in Ecology by Inferential and Process-Based Models
Long-term forecasts are crucial for successful preventative and restorative management in ecology, and therefore require valid forecasting models. However, the validity of models is restricted by their scope and their inherent uncertainties.
This chapter discusses the benefits of ensemble modelling for strengthening the validity and reliability of long-term forecasts. An ensemble of inferential models is demonstrated to overcome the limited scope of a single model for forecasting population dynamics of the cyanobacterium Microcystis in response to adaptive flow management of the River Nakdong (South Korea). Ensembles of alternative process-based models based on model averaging are examined to decrease the uncertainties of single models when applied to determine the Remedial Action Plan for eutrophication control of Hamilton Harbour (Canada) and global warming effects on the phytoplankton community of Lake Engelsholm (Denmark). An ensemble of the complementary models SWAT and SALMO is applied to the catchment-reservoir system Millbrook (Australia) to overcome the limited scope of the two individual models. Results indicate that both complex catchment-specific and lake-specific processes need to be considered in order to realistically forecast spatial cascading effects between catchments and lakes under the influence of prospective land use and climate changes.
Friedrich Recknagel, George Arhonditsis, Dong-Kyun Kim, Hong Hanh Nguyen
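Model averaging, the ensemble mechanism named above, reduces in its simplest form to a skill-weighted mean of the individual forecasts. The forecast values and weights below are invented for illustration:

```python
def ensemble_forecast(forecasts, weights):
    """Weighted average of forecasts from several models; weights are
    normalized so they need not sum to one."""
    total = sum(weights)
    return sum(f * w for f, w in zip(forecasts, weights)) / total

# Hypothetical chlorophyll-a forecasts (ug/L) from three models, with
# weights chosen to reflect each model's (assumed) past skill.
models = [12.0, 15.0, 18.0]
weights = [0.5, 0.3, 0.2]
print(ensemble_forecast(models, weights))  # 12*0.5 + 15*0.3 + 18*0.2 ~= 14.1
```

The spread among the individual forecasts also gives a rough measure of the predictive uncertainty a single model would hide.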

Case Studies

Chapter 17. Biodiversity Informatics
Biodiversity informatics, the application of informatics techniques to biodiversity data, is rooted in physical objects and nomenclatural codes. Through two user stories, one from wildlife conservation and another from agriculture, we demonstrate the importance and process of biodiversity informatics. We discuss the importance and integration of taxonomic names, identification tools, species distributions, phylogenetic trees, traits, associations, the literature, ontologies, controlled vocabularies, standards, and genomics. Despite the plethora of resources, a seamless biodiversity question-and-answer engine is still out of reach. The largest impediment to our user stories is the lack of cross-disciplinary infrastructure and of the digitized and standardized data needed to support services. Satisfying our user stories will require additional investment in infrastructure and data that will be a challenge to manage and sustain. This chapter discusses the basic biodiversity informatics concepts that are at the heart of our user stories, and will be the basis of the user stories of the future as society rushes to cope with global environmental change.
Cynthia S. Parr, Anne E. Thessen
Chapter 18. Lessons from Bioinvasion of Lake Champlain, U.S.A.
Freshwater lakes provide ideal habitat for invasive species, such as the zebra mussel, which can weaken lake ecological integrity by altering food web structure and dynamics. This case study utilized 23 years of Lake Champlain data to examine relationships among water quality, invasive species, native mysids (Mysis diluviana) and the zooplankton community. Canonical correspondence analysis (CCA) was employed to ordinate and qualitatively assess long-term patterns across the datasets, and the hybrid evolutionary algorithm (HEA) revealed quantitative relationships and thresholds. Results from both methods are complementary and suggest that: (1) zebra mussels directly affect rotifer densities by preying on slow-moving rotifers, and (2) zebra mussels indirectly affect cladocerans, copepods and mysids by both preying on rotifers and grazing on phytoplankton. The direct and indirect effects of zebra mussels on the zooplankton community as well as on mysids adversely affect the ecological integrity of Lake Champlain. Data ordination by CCA and inferential modelling by HEA proved useful for elucidating long-term food web patterns in the complex Lake Champlain ecosystem.
Timothy B. Mihuc, Friedrich Recknagel
Chapter 19. The Global Lake Ecological Observatory Network
This chapter explores a socio-technological (S-T) approach to information management within the Global Lake Ecological Observatory Network (GLEON). In S-T systems, information management, relevant organizational policies, and the supporting technologies are integral components of the network fabric. They derive from the needs of the community, articulated through representative governance, and they service the needs of the community by engaging data providers as partners in scientific endeavors. Through a brief history of GLEON, we recount the emergence of the S-T approach as part of GLEON’s philosophy as a learning organization. It is clear that there is still much to be learned about streamlining data curation and publishing, especially from an international network of observatories with diverse data and sensor networks. Grassroots networks such as GLEON often do not have the resources—human, financial, and infrastructure—required for persistent and highly efficient data curation and publishing. However, strategies that address directly the needs of the network community, such as providing credit to data providers, tracking the progress of projects that use the data, and sharing high-value synthesized data sets, quickly gain acceptance and garner commitment by the community. Today, S-T systems require ‘humans in the loop’ for data curation, which, in turn, results in constraints on scalability of these systems. One of the great challenges that lie ahead will be connecting GLEON S-T, which represents a diverse international community, with existing external data curation and archiving services.
Paul C. Hanson, Kathleen C. Weathers, Hilary A. Dugan, Corinna Gries
Chapter 20. Long-Term Ecological Research in the Nakdong River: Application of Ecological Informatics to Harmful Algal Blooms
In recent decades, the importance of long-term ecological research (LTER) has been highlighted because of the growing interest in global environmental changes. Specifically, LTER data allow one to track the history of target ecosystems (e.g., trends of particular ecological entities) and to understand the causal relationships of ecosystem functioning. One such ecological problem is harmful algal blooms (HABs) in freshwater environments. It is generally perceived that global warming and local eutrophication are responsible for serious and frequent HAB events, and various efforts have been made to explain and forecast HABs. LTER data for HABs typically consist of various forcing functions and variables; thus, the selection of appropriate data-analysis methods for a HAB database is necessary. This chapter presents a series of studies related to the prediction and elucidation of two HABs, summer cyanobacteria (e.g., Microcystis aeruginosa) and winter diatoms (e.g., Stephanodiscus hantzschii), that occur in the regulated Nakdong River, South Korea. First, HAB, water quality, and zooplankton patterns were analyzed using self-organizing maps (SOMs), and the major factors closely related to HABs, i.e., water temperature, pH, and rainfall, were selected. Based on this information, we created predictive models and control scenarios for HABs using a variety of methods (evolutionary computation, artificial neural networks). We also suggest potential further studies of the Nakdong River.
This chapter focuses on: (1) properties of the limnological dataset of the Nakdong River derived from Korean Long-Term Ecological Research (KLTER), (2) analysis and time-series modelling of KLTER dataset by means of machine learning techniques, and (3) benefits of applied ecological informatics for KLTER dataset.
Dong-Gyun Hong, Kwang-Seuk Jeong, Dong-Kyun Kim, Gea-Jae Joo
Chapter 21. From Ecological Informatics to the Generation of Ecological Knowledge: Long-Term Research in the English Lake District
Lakes are highly connected systems that are affected by a hierarchy of stressors operating at different scales, making them particularly sensitive to anthropogenic perturbation. Traditionally, lakes are studied as a whole system 'from physics to fish', and long-term monitoring programmes were initiated on this basis, some starting over a century ago. This chapter describes the long-term monitoring programme on the Cumbrian lakes, UK, how it is operated and how its scientific value is increased by combining it with additional activities. Case studies are presented on the advances long-term research has made in testing ecological theory and understanding teleconnections and phenology. Automatic high-frequency measurements are an important complementary approach that has been made possible by technological revolutions in computing and telecommunications. They provide a window into the true dynamic nature of lakes that cannot be achieved by manual sampling. The large volume of data produced can now be quality controlled and analysed by bespoke software that has been developed in recent years by a global network of lake and data scientists. Finally, lake models constructed using the insights from monitoring, as well as experiments, are powerful ways to identify knowledge gaps and allow forecasts to be made of future responses to environmental change or management intervention. As other approaches become incorporated into lake research, such as Earth Observation and citizen science, the scale of knowledge about the system will increase, improving our ability to provide robust scientific advice for the sustainable management of these fragile but important ecosystems.
S. C. Maberly, D. Ciar, J. A. Elliott, I. D. Jones, C. S. Reynolds, S. J. Thackeray, I. J. Winfield
Editors: Dr. Friedrich Recknagel, Prof. Dr. William K. Michener