Data-Centric Biology: A Philosophical Study
by Sabina Leonelli
University of Chicago Press, 2016
Cloth: 978-0-226-41633-5 | Paper: 978-0-226-41647-2 | Electronic: 978-0-226-41650-2
DOI: 10.7208/chicago/9780226416502.001.0001

ABOUT THIS BOOK

In recent decades, there has been a major shift in the way researchers process and understand scientific data. Digital access to data has revolutionized ways of doing science in the biological and biomedical fields, leading to a data-intensive approach to research that uses innovative methods to produce, store, distribute, and interpret huge amounts of data. In Data-Centric Biology, Sabina Leonelli probes the implications of these advancements and confronts the questions they pose. Are we witnessing the rise of an entirely new scientific epistemology? If so, how does that alter the way we study and understand life—including ourselves?

 Leonelli is the first scholar to use a study of contemporary data-intensive science to provide a philosophical analysis of the epistemology of data. In analyzing the rise, internal dynamics, and potential impact of data-centric biology, she draws on scholarship across diverse fields of science and the humanities—as well as her own original empirical material—to pinpoint the conditions under which digitally available data can further our understanding of life. Bridging the divide between historians, sociologists, and philosophers of science, Data-Centric Biology offers a nuanced account of an issue that is of fundamental importance to our understanding of contemporary scientific practices.

AUTHOR BIOGRAPHY

Sabina Leonelli is associate professor of philosophy and history of science at the University of Exeter.
 

REVIEWS

“The first critical book-length study of data centrism in the life sciences and beyond, Data-Centric Biology sheds new and surprising light on the phenomenon of big data. Analytically competent, historically informed, sociologically sensitive, this book is a brilliant and successful demonstration of what bringing together philosophy, history, and social studies of science can achieve.”
— Hans-Jörg Rheinberger, Max Planck Institute for the History of Science

“Going far beyond the epistemic concerns that preoccupy many philosophers, Data-Centric Biology brilliantly shows readers the practices that make data informative and meaningful, the biocurators who carefully attend to data’s forms, and the social, economic, and political resources on which our systems of Big Data Sciences depend. Leonelli is a leader in this area of scholarship, commanding a vast comprehensive knowledge of the historical, philosophical, and social studies of the life sciences and the data practices that sustain them.”
— Mike Fortun, Rensselaer Polytechnic Institute

“At the core of the book is a philosophical view of data, which is seen as a relational category, applicable to portable (though not immutable) research outputs that have or are expected to have evidential value for knowledge claims. A key feature of data-centrism is then, for Leonelli, the recognition that this evidential value is underdetermined, i.e. that we cannot say in advance what claims any given data may bear upon. This prospective and open nature of data has a number of consequences on how it is to be dealt with, which much of the book sets out to explore. . . . It offers very interesting epistemological views about data, its usage, and the challenges in its dissemination; it does an excellent job in revealing the often less visible work of curators and of the social and conceptual structures underlying broad efforts at data annotation, organization, and dissemination; and it represents an important resource for scholars interested in these issues.”
— History and Philosophy of the Life Sciences

"Has there been a shift in what “data” means that is key to understanding the future of science? Sabina Leonelli’s new book, Data-Centric Biology: A Philosophical Study, argues for an important and fruitful answer outside the comfort zone of many philosophers. Along the way, she also delivers valuable insights into expanding efforts to standardize, automate, and communicate how scientists handle, share, reproduce, interpret, and store data."
— Philosophy of Science

"In Data-Centric Biology: A Philosophical Study, Sabina Leonelli provides the first book-length study of what is novel about data-centric research in the life sciences. . . . Her book is a laudable achievement in integrating insights from a methodologically permissive and pluralistic outlook."
— Quarterly Review of Biology

"This is one of the most important books published in science studies over the last few years, and generalist readers should not be put off by its seemingly narrow topic, namely the actualization and validation of the vast mass of biological data encoded in digital formats, so that it may be ‘shared’ amongst relevant scientists. One should take the subtitle seriously: the world is bloated with attempts by philosophers to clarify the fine points of empiricism in science, but almost none of that achieves anything near the level of insight and enlightenment achieved in this short book. Perhaps more surprisingly, few authors prove capable of grasping the profound transformations happening in the political economy of science with greater sensitivity than Leonelli. Her pungent observations deserve the attention of anyone interested in the meaning and significance of the ‘Open Science’ movement."
— Metascience

"The dynamic, procedural themes running throughout Leonelli’s book usefully shift our philosophical focus from the abstracta of understanding how science works from a frozen perspective to a more useful analysis of how scientists in fact get the job done (when they do) and—more critically—how they might do it better."
— BJPS Review of Books

"Data-Centric Biology is a rich and carefully argued book that nicely integrates social and philosophical perspectives . . . a pathbreaking contribution"
— Isis

TABLE OF CONTENTS

- Sabina Leonelli
DOI: 10.7208/chicago/9780226416502.003.0001
[data revolution;data paradigm;empirical philosophy of science;data science;novelty of data science]
Summary of the argument, background, methods and approach as "philosophy of science in practice". The introduction articulates what is revolutionary and novel about the current turn to "data-centric science", and why it is important to develop an epistemological analysis of data-centric science. (pages 1 - 10)
This chapter is available at:
    https://academic.oup.com/chica...

- Sabina Leonelli
DOI: 10.7208/chicago/9780226416502.003.0002
[databases;data journeys;model organism research;data flows;experimental biology]
Chapter 1 analyses the development of sophisticated technologies to disseminate data over the last three decades of experimental biology research. I focus in particular on the online databases used to collect and analyse the wide variety of data acquired on key model organisms, such as the fruit-fly Drosophila melanogaster, the thale-cress Arabidopsis thaliana and the nematode Caenorhabditis elegans. I discuss the history and features of these databases as a prime example of technology set up to organise and interpret such data; and the wealth and diversity of expertise, resources and conceptual scaffolding that such databases draw upon in order to function. I conclude with a discussion of the idea of data journeys, which I use to capture the labor-intensive, uneven and unpredictable nature of data travel through biological databases. (pages 13 - 44)
This chapter is available at:
    https://academic.oup.com/chica...

- Sabina Leonelli
DOI: 10.7208/chicago/9780226416502.003.0003
[Open Data;Open Science;data value;data governance;science policy;scientific institutions]
Chapter 2 broadens the scope of my analysis by considering the social structures of data-centric biology and some of the reasons why data have taken a central place in contemporary science and policy discourse. To this end, I relate the techno-scientific developments discussed in the first chapter to the social, political and economic roles played by this terminology and related practices. I start with an overview of how online databases are managed, financed and institutionalised, where I argue that the consortia and steering committees set up to create and maintain these resources have acquired a key regulatory function in contemporary biology. I then discuss the relation between the establishment of databases and the growth of the Open Data movement, which is currently playing an important role in articulating the reasons and incentives for sharing scientific data in the first place. This brings me to reflect on the multiple types of value acquired by data when circulating across contexts. (pages 45 - 66)
This chapter is available at:
    https://academic.oup.com/chica...

- Sabina Leonelli
DOI: 10.7208/chicago/9780226416502.003.0004
[data epistemology;data ontology;evidence;philosophy of science;raw data]
Chapter 3 articulates a novel relational view of the epistemology of data. I portray data as an analytic category that emerges from my empirical discussion and relate it to existing philosophical treatments. Data are tools that scientists use to understand the world and communicate with each other: sources of knowledge, rather than knowledge in themselves. I argue that this is what biologists typically mean when talking about ‘raw data’, or ‘data in need of an interpretation’: an object becomes a datum when it is viewed as potential or actual evidence for one or more prospective knowledge claims, and given the unpredictable variety of claims which data might be used to corroborate, their evidential value can vary. Data do not have truth-value in and of themselves, nor can they be seen as straightforward representations of given phenomena. Rather, data are relational and fungible objects, which are defined by their portability and their prospective usefulness as evidence. (pages 69 - 92)
This chapter is available at:
    https://academic.oup.com/chica...

- Sabina Leonelli
DOI: 10.7208/chicago/9780226416502.003.0005
[experimentation;metadata;data generation;data analysis;scientific knowledge]
Chapter 4 examines the relationship between experimental practices at the bench, particularly ones that involve the physical manipulation of organic materials, and the processes through which data on organisms are packaged for dissemination. Capturing the embodied knowledge involved in experimental work is a crucial task for database curators. This type of knowledge is a key component of data analysis; and codifying it in ways that can be accessed and understood by a wide variety of users constitutes one of the fundamental challenges confronted by database curators. A close investigation of how databases are set up reveals how information about experimental practices of data production (‘meta-data’) is essential to the successful adoption and re-use of data in a new research context, and yet cannot be accurately captured without relying on regular feedback and active participation by database users. Tracing the ways in which embodied knowledge is involved in data journeys sheds light on the high number and varied expertise of individuals involved in making data travel. This leads me to portray data-centric biology as a remarkably effective form of distributed cognition; and reject the possibility that experimental research could be fully automated, and thus replaced by data analysis in silico.
This chapter is available at:
    https://academic.oup.com/chica...

- Sabina Leonelli
DOI: 10.7208/chicago/9780226416502.003.0006
[theory;ontologies;Gene Ontology;classification;bio-ontologies;scientific knowledge]
Chapter 5 shifts the focus of analysis from embodied to propositional knowledge, by investigating the conceptual scaffolding developed by curators to order and systematize data, so as to facilitate their travel and future re-use. In particular, I describe the classificatory systems and practices involved in data organisation and dissemination within model organism databases, with particular emphasis on bio-ontologies such as the Gene Ontology. I argue that the activities through which bio-ontologies are created and maintained should be viewed as theory-making rather than simply theory-laden, insofar as they involve the formalization and expression of the scientific significance attributed to the data being classified. This brings me to highlight the role of theories in data-centric biology and point out that, far from being theory-free or signalling the “end of theory”, this approach is imbued with theoretical commitments. In the second part of the chapter, I propose that at least some of the classificatory activities involved in research, including the ones used by database curators in model organism biology, can be usefully regarded as generating their own specific form of scientific theory. I call this classificatory theory, and discuss its features and role both within and beyond data-centric research.
This chapter is available at:
    https://academic.oup.com/chica...

- Sabina Leonelli
DOI: 10.7208/chicago/9780226416502.003.0007
[data integration;data use;data reuse;plant science;translational research;Big Data]
Chapter 6, which is targeted at anyone interested specifically in the life sciences, examines the implications of my analysis of data-centric science for biology as a discipline and for the knowledge thus gained about living organisms. I consider three examples of successful data integration within the field of plant science, which I take to exemplify three different modes of data integration in biology as a whole, each with its own goals, means and ways of valuing and assessing data. I then use this analysis to stress how differences in how data are selected, analyzed and integrated may challenge existing conceptions of what counts as scientific knowledge in the first place. Finally, I highlight the opportunities offered by data-centric research as well as the dangers and misconceptions associated with big data rhetoric and practices, paying particular attention to related processes of inclusion and exclusion, and the ways in which data infrastructures can affect the visibility and future development of research traditions both within and beyond the life sciences. (pages 141 - 175)
This chapter is available at:
    https://academic.oup.com/chica...

- Sabina Leonelli
DOI: 10.7208/chicago/9780226416502.003.0008
[inference from data;situations;scientific epistemology;research context;data epistemology;process epistemology]
Chapter 7 focuses on the processes through which data are situated within specific contexts of inquiry. I start from the notion of ‘research context’ itself, which remains under-theorized within philosophy despite the importance that is routinely attributed to it in assessing the background, development and results of science. Using insights from John Dewey’s pragmatist logic of inquiry, I reflect on what constitutes context in scientific research and how this notion can be used to better understand and investigate, rather than dismiss or discount, its social and material nature. My analysis of data journeys in biology illustrates how what counts as data is strongly intertwined with what counts as context for their interpretation, and for whom, at any point of their travels. What counts as data can therefore vary considerably, and how these variations are managed matters enormously to the development and results of scientific inquiry. (pages 176 - 192)
This chapter is available at:
    https://academic.oup.com/chica...

- Sabina Leonelli
DOI: 10.7208/chicago/9780226416502.003.0009
[data epistemology;data intensive science;scientific progress;big data;open data]
The conclusion summarises the arguments within the book. I defend the idea that data-centrism brings new salience to aspects of scientific practice which have always been vital to successful empirical research and yet have been overlooked by policy-makers, funders, publishers, philosophers of science and sometimes even scientists themselves, who have frequently evaluated science in terms of its products (e.g. new claims about phenomena or technologies for intervention in the world) rather than in terms of the processes through which such results are eventually achieved. These include the processes involved in valuing data as a key scientific resource; structuring scientific institutions and credit mechanisms so that data dissemination is supported and regulated in ways conducive to the advancement of both science and society; and situating and organising data into a context within which they can be interpreted reliably. (pages 193 - 198)
This chapter is available at:
    https://academic.oup.com/chica...