About this book

This book springs from a multidisciplinary, multi-organizational, and multi-sector conversation about the privacy and ethical implications of research in human affairs using big data. The need to cultivate and enlist the public’s trust in the abilities of particular scientists and scientific institutions constitutes one of this book’s major themes.

The advent of the Internet, the mass digitization of research information, and social media brought about, among many other things, the ability to harvest – sometimes implicitly – a wealth of human genomic, biological, behavioral, economic, political, and social data for the purposes of scientific research as well as commerce, government affairs, and social interaction. What types of ethical dilemmas do such changes generate? How should scientists collect, manipulate, and disseminate this information? The effects of this revolution and its ethical implications are wide-ranging.

This book includes the opinions of myriad investigators, practitioners, and stakeholders in big data research on human beings who routinely reflect on the privacy and ethical issues of this phenomenon. Dedicated to the practice of ethical reasoning and reflection in action, the book offers a range of observations, lessons learned, reasoning tools, and suggestions for institutional practice to promote responsible big data research on human affairs.

The book caters to a broad audience of educators, researchers, and practitioners. Educators can use the volume in courses on big data handling and processing. Researchers can use it when designing new methods of collecting, processing, and disseminating big data, whether in raw form or as analysis results. Lastly, practitioners can use it to steer future tools and procedures for handling big data. Because this topic represents an area of great interest that still remains largely undeveloped, the book fills an obvious gap in the currently available literature.

Table of Contents



Without Abstract
Jeff Collmann, Sorin Adam Matei

Applying a Contextual Analysis of Privacy in Big Data Research


A Theoretical Framework for Ethical Reflection in Big Data Research

The emergence of massive datasets collected using automated or large-scale data harvesting methodologies ushered in by digital tools creates new challenges for researchers with respect to the ethical use and securing of private information. Such concerns are heightened by the fact that big data is likely to be reused, reanalyzed, recombined, and repurposed without explicit consent from the original data providers. The present chapter offers a methodology for assessing the ethical nature of a given big data use situation. The approach is contextual and relational. Researchers are invited to use a privacy matrix that intersects five ethical concerns (non-maleficence, beneficence, justice, autonomy, and trust) with four possible contexts of use (social, commercial, government, and science). The matrix can be used to determine the significance of the ethical problem for a given situation by tallying the number and assessing the nature of concerns present in using the data in a given context. Furthermore, the heuristic procedure can be enhanced with a modified type of trade-off analysis which starts from a minimum ethical threshold below which trade-offs should not be performed. The matrix and the heuristic method offer a pragmatic, yet ethically grounded, approach to dealing with the complex nature of privacy in the context of big data research.
Michael Steinmann, Sorin Adam Matei, Jeff Collmann
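As a rough illustration of the matrix-and-tally heuristic this abstract describes, the following Python sketch counts the ethical concerns flagged for a given context of use and checks the result against a minimum threshold before trade-off analysis. The context labels, scoring scheme, and function names are assumptions for illustration, not the authors’ implementation.

```python
# Hypothetical sketch of the privacy matrix: rows are ethical concerns,
# columns are contexts of use. A situation is assessed by tallying the
# concerns flagged for one context and comparing against a threshold.

CONCERNS = ["non-maleficence", "beneficence", "justice", "autonomy", "trust"]
CONTEXTS = ["social", "commercial", "government", "science"]

def assess(matrix, context, threshold=2):
    """matrix maps (concern, context) -> True if that concern is present.
    Returns the tally for the context and whether trade-off analysis may
    proceed (tally at or below the assumed minimum ethical threshold)."""
    tally = sum(1 for c in CONCERNS if matrix.get((c, context), False))
    return tally, tally <= threshold

# Example: reusing a dataset commercially raises two flagged concerns.
flags = {("autonomy", "commercial"): True, ("trust", "commercial"): True}
tally, tradeoff_ok = assess(flags, "commercial")
```

A real assessment would weigh the *nature* of each concern, not just the count; the numeric tally here only mirrors the chapter’s first, quantitative step.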

Ethical Reasoning Beyond Privacy in Big Data


The Privacy Preferences of Americans

Most Americans are in a transactional frame of mind as they consider whether to share personal information, or accept surveillance, in return for a benefit. They care about who is collecting data about them; with whom it is being shared; how it is being used; and the bargain they are being offered. Many think they have lost control of their information, and that frustrates them because many believe they cannot reclaim key parts of their identities once their information has been collected. They are divided about whether the legal system protects them well enough, and many would like to see more laws to protect them. Those who work with big data would benefit from understanding these realities and from disclosing what they are doing; alerting users and companies when they find deficiencies in personal data protection; being explicit about the insights they are gaining from their analysis and the benefits those insights yield; and pressing for legal regimes that minimize the potential harms that could emerge from big data.
Lee Rainie

Engaging the Public in Ethical Reasoning About Big Data

The public constitutes a major stakeholder in the debate about, and resolution of, privacy and ethical issues in Big Data research about human affairs. Thus, scientists must learn to take public concerns about Big Data research seriously and to communicate messages designed to build trust in specific big data projects and in the institution of science in general. This chapter explores the implications of various examples of engaging the public in online activities such as Wikipedia, which contrast with “Notice and Consent” forms, and offers models for scientists to consider when approaching their potential subjects in research. Drawing from Lessig’s Code and Other Laws of Cyberspace, the chapter suggests that four main regulators drive the shape of online activity: Code (or Architecture), Laws, Markets, and Norms. Specifically, scientists should adopt best practices in protecting computerized Big Data (Code), remain completely transparent about their data management practices (Law), make smart choices when deploying digital solutions that place a premium on information protection (Market), and, critically, portray themselves to the public as seriously concerned with protecting the privacy of persons and the security of data (Norms). The community of Big Data users and collectors should remember that such data are not just “out there” somewhere but are the intimate details of the lives of real persons, who have just as deep an interest in protecting their privacy as they do in the good work that is conducted with such data.
Justin Anthony Knapp

The Ethics of Large-Scale Genomic Research

The potential for big data to advance our understanding of human disease has been particularly heralded in the field of genomics. Recent technological advances have accelerated the massive data generation capabilities of genomic research, which has allowed researchers to undertake larger scale genomic research, with significantly more participants, further spurring the generation of massive amounts of data. The advance of technology has also triggered a significant reduction in cost, allowing large-scale genomic research to be increasingly feasible, even for smaller research sites. The rise of genetic research has triggered the creation of many large-scale genomic repositories (LSGRs) some of which contain the genomic information of millions of research participants. While LSGRs have genuine potential, they also have raised a number of ethical concerns. Most prominently, commentators have raised questions about the privacy implications of LSGRs, given that all genomic data is theoretically re-identifiable. Privacy can be further threatened by the possibility of aggregation of data sets, which can give rise to unexpected, and potentially sensitive, information. Beyond privacy concerns, LSGRs also raise questions about participant autonomy, public trust in research, and justice. In this chapter, we explore these ethical challenges, with the goal of elucidating which ones require closer scrutiny and perhaps policy action. Our analysis suggests that caution is warranted before any major policies are implemented. Much attention has been directed at privacy concerns raised by LSGRs, but perhaps for the wrong reasons, and perhaps at the expense of other relevant concerns. We do not think that there is yet sufficient evidence to motivate enactment of major policy changes in order to safeguard welfare interests, although there might be some stronger reasons to worry about subjects’ non-welfare interests. 
We also believe that LSGRs raise genuine concerns about autonomy and justice. Big data research, and LSGRs in particular, have the potential to radically advance our understanding of human disease. While these new research resources raise important ethical concerns, any policies implemented concerning LSGRs should be carefully tailored to ensure that research is not unduly burdened.
Benjamin E. Berkman, Zachary E. Shapiro, Lisa Eckstein, Elizabeth R. Pike

Neurotechnological Convergence and “Big Data”: A Force-Multiplier Toward Advancing Neuroscience

In this chapter, we address whether and how big data computational capability could enable the establishment of a common, accessible database for neuroscientific research and its translation that provides a resource for 1) (raw) data harvesting; 2) data fusion; 3) data integration, functional formulation, and exchange; and 4) broad data access and use. We posit that big data represents a force multiplier to augment and optimize the capabilities, and de-limit certain constraints, that impede broad-scale use of neuroscientific information. More than a simple repository, we maintain that this enterprise would require a dynamic—and secure—resource of tools and methods for harvesting (and provenance), quality evaluation (and data retraction if and when quality issues or problems are revealed), distribution, and sharing. Such an integrated big data system could allow for (a) methodological validity, (b) adequate probabilistic inference, and (c) reliability. However, the employment of big data in the brain sciences also incurs a number of ethico-legal issues; these are addressed, and approaches toward their resolution are discussed.
Diane DiEuliis, James Giordano
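The repository behaviors this abstract calls for—harvesting with provenance, quality evaluation, retraction when problems surface, and controlled sharing—can be sketched minimally as follows. All class, method, and field names are illustrative assumptions, not part of any actual system described in the chapter.

```python
# Minimal sketch of a repository that keeps provenance with each entry
# and supports retraction: a retracted entry is retained (with its
# provenance and the stated reason) but excluded from sharing.

class NeuroRepository:
    def __init__(self):
        self._entries = {}

    def harvest(self, entry_id, data, provenance):
        """Store raw data together with its provenance record."""
        self._entries[entry_id] = {"data": data,
                                   "provenance": provenance,
                                   "retracted": False}

    def retract(self, entry_id, reason):
        """Withdraw an entry from use without deleting its record."""
        entry = self._entries[entry_id]
        entry["retracted"] = True
        entry["retraction_reason"] = reason

    def shareable(self):
        """Return only entries that have not been retracted."""
        return [e for e in self._entries.values() if not e["retracted"]]

repo = NeuroRepository()
repo.harvest("scan-001", data=[0.1, 0.2], provenance="site A, 2015 protocol")
repo.retract("scan-001", reason="failed quality re-evaluation")
```

Keeping retracted records rather than deleting them preserves the audit trail that the chapter’s emphasis on provenance implies.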

Data Ethics—Attaining Personal Privacy on the Web

Every form a user fills out, every click a user makes on a website, every comment or recommendation a user posts about a product creates a new data point that companies and researchers use to better understand, and potentially infer, human behavior. In this chapter we highlight cases in which companies and/or researchers stepped over the boundary of ethically reasonable uses of big data or manipulated individuals online without their express consent. We describe tools that members of the American public could use to improve their privacy on the Internet and to gain more control over their data. Yet, recognizing the limits of individual protections, we also argue that the time has come to launch a public discussion about ethical uses of large-scale human behavioral data. We should develop guidelines and regulations that protect users while still allowing companies and researchers to advance knowledge about human behavior in responsible ways.
Lisa Singh

Institutionalizing Ethical Reasoning About Big Data


Technology for Privacy Assurance

A government’s ability to quickly collect, analyze, and share information is often considered vital to its national security and public safety. At scale, however, this same information sharing increasingly stands in tension with the protection of personal liberties. This chapter describes a method for relieving this tension, offering an approach that enables both goals to be pursued simultaneously. The approach is founded on the concept of an impenetrable “Black Box” into which information can be placed but which no person can ever access. The operation of this box is defined by encoded policy statements that specify patterns of reasonable suspicion and/or public safety concern. These statements are derived from standards of law and governance guidelines, as interpreted by a duly constituted policy body. The technological approach is specifically designed to output only the detection of authorized patterns, without ever revealing or enabling any sort of access to the information the box contains. With no human involvement in the analytic process, substantially higher levels of privacy assurance are believed possible.
J. C. Smart
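The interface contract described above—records go in, and the only output is whether a pre-authorized policy pattern matched—can be sketched as follows. This is a toy model of the concept only; the chapter’s actual system presumably relies on cryptographic and hardware protections, and every name below is an assumption for illustration.

```python
# Toy model of the "Black Box" contract: ingest() accepts records, and
# detect() reports only True/False for an authorized pattern. No method
# returns the stored records themselves.

class BlackBox:
    def __init__(self, authorized_patterns):
        # authorized_patterns: pattern_id -> predicate over a record,
        # standing in for policy statements adopted by a policy body.
        self._patterns = dict(authorized_patterns)
        self._records = []   # by convention, never exposed

    def ingest(self, record):
        self._records.append(record)

    def detect(self, pattern_id):
        """Report only whether any record matches the named pattern."""
        predicate = self._patterns[pattern_id]
        return any(predicate(r) for r in self._records)

# Example: policy authorizes detecting three or more visits by one party.
box = BlackBox({"repeat-visitor": lambda r: r.get("visits", 0) >= 3})
box.ingest({"id": "a1", "visits": 4})
```

In Python the hiding is purely conventional; the point of the sketch is the shape of the interface, not the strength of the enforcement.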

Institutionalizing Ethical Reasoning: Integrating the ASA’s Ethical Guidelines for Professional Practice into Course, Program, and Curriculum

An emphasis on the application of data mining and other “big data” techniques has raised concerns about ethical use, analysis, and interpretation of large amounts of data obtained from a variety of sources. In 2014, the Committee on Professional Ethics of the American Statistical Association (ASA) initiated a revision of its Ethical Guidelines for Professional Practice, which was completed in 2015. Although interest in these new Guidelines is keen across the ASA membership and leadership, as of the 2013–2014 academic year, only 35% of universities in the United States required any ethics content for “at least some” of their students in statistics and biostatistics programs. There are two main barriers to increasing this to 100% of universities and 100% of students. First is time and effort: either a new (additional) course is needed or time within existing courses must be carved out to accommodate new material. Similarly, teaching and learning about ethics, or professional practice, is qualitatively different—particularly in terms of assessment—from teaching and learning about statistics and biostatistics/data analysis. A second barrier is content: it can sometimes seem that “ethics training” is only required for those who violate norms for ethical practice (or the law—by falsifying data, plagiarizing, or committing fraud in scientific research). Moreover, most faculty, if they have received training in ethics or in the “responsible conduct of research”, have experienced a major focus on memorization of rules or guidelines and possibly an emphasis on the protection of human subjects and their privacy. However, “statistical practice” involves a great deal more than just the consideration of the human (or animal) subjects in a research study—as does responsible conduct in research. The Guideline principles interact, and sometimes must be prioritized due to their potentially conflicting applicability in any given situation.
Therefore, memorization of the Guidelines—or their simple distribution to students or faculty—is unlikely to promote the awareness of their use and importance that is desired by the ASA and the Committee on Professional Ethics that has maintained and revised them. This article outlines elements of the 2015 revision of the ASA Ethical Guidelines for Statistical Practice that are suitable—and important—components of training all undergraduates and graduates, whether or not they are statistics majors. It also contains recommendations reflecting current research on how best to promote adult learning—using a constructivist approach consistent with principles of andragogy, and supporting the development of expertise, or at least its initiation. Methods for assessment of student work across level (undergrad/grad/postdoc/faculty) and context (within major/non-stats majors) are also discussed.
Rochelle E. Tractenberg

Data Management Plans, Institutional Review Boards, and the Ethical Management of Big Data About Human Subjects

This chapter offers investigators a set of decision trees, with detailed case examples, to help them prepare Data Management Plans (DMPs) that address ethical and information protection issues in big data research projects. It opens with an explanation of the 4Rs of big data research, that is, reuse, repurposing, recombination, and reanalysis. The 4Rs highlight the importance of ethical provenance and ethical horizon; that is, the ethical implications of the possibility that investigators using big data for human research will draw on pre-existing data from multiple kinds of sources and that other investigators will use their data in subsequent studies. The DMP decision tree encourages investigators to reflect on the ethical status of any consent provided by the human subjects under consideration, if known or potentially known, for both existing data and data proposed for collection. The decision tree also encourages investigators to reflect on the range of ethical implications of their research, including potential benefits and harms as well as implications for individual and group autonomy, social justice, and trust in the institution of science. We offer the DMP decision tree as a tool to help investigators and their organizations become more adept at assessing the privacy and ethical risks of big data research with human subjects and, thus, ensure the public’s acceptance of and participation in the projects they plan for the future. Working through the steps of the decision trees also highlights the need for investigators to seek counsel from various institutional resources relevant to protecting human subjects and their information in big data research, particularly their organization’s Institutional Review Board, office of general counsel, and computer security staff.
From this perspective, we urge caution against downgrading the role of such organizations in managing human subjects research in big data until the scientific community as a whole has more experience with its complexities.
Jeff Collmann, Kevin T. FitzGerald, Samantha Wu, Joel Kupersmith, Sorin Adam Matei
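One branch of the kind of decision tree this abstract describes—starting from the consent status of pre-existing data and ending in a list of resources to consult—might be sketched like this. The questions, outcomes, and wording are illustrative assumptions, not the book’s actual trees.

```python
# Hypothetical sketch of one DMP decision-tree branch: given answers
# about consent for source data and planned new collection, return the
# planning steps and institutional consultations indicated.

def dmp_checklist(consent_known, consent_covers_reuse, new_collection):
    steps = []
    if not consent_known:
        steps.append("consult IRB: consent status of source data unknown")
    elif not consent_covers_reuse:
        steps.append("consult IRB and general counsel: "
                     "planned reuse exceeds original consent")
    if new_collection:
        steps.append("draft consent language covering the 4Rs "
                     "(reuse, repurposing, recombination, reanalysis)")
    steps.append("review benefits and harms, autonomy, justice, and trust")
    return steps

# Example: known consent that does not cover reuse, plus new collection.
plan = dmp_checklist(consent_known=True, consent_covers_reuse=False,
                     new_collection=True)
```

The unconditional final step mirrors the chapter’s point that reflection on benefits, harms, autonomy, justice, and trust applies to every project, whatever the consent status.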

Integrating Ethical Reasoning into Preparation for Participation to Work in/with Big Data Through the Stewardship Model

In 2001, the Carnegie Foundation for the Advancement of Teaching instituted a 5-year, in-depth review of doctoral training in the United States, the Carnegie Initiative on the Doctorate (CID). To frame that initiative, the project leaders defined a “steward of the discipline” as “someone who will creatively generate new knowledge, critically conserve valuable and useful ideas, and responsibly transform those understandings through writing, teaching, and application.” Although formulated as the purpose of doctoral education, disciplinary stewardship can support a more general model for “education and preparation”, and need not be limited to doctoral training. This chapter argues that stewardship can support the integration of ethical reasoning into preparation to work in/with Big Data. Three dimensions of stewardship that can be introduced earlier than doctoral education are the ideas that: (1) disciplines and fields are dynamic, and require stewardship; (2) the quality and integrity of disciplines must be actively preserved and conserved; and (3) there are particular habits of mind that characterize “those to whom we can entrust” the core features of a discipline or field. In order to formulate clear objectives that will support the explicit and evaluable integration of ethical reasoning across an institution of higher education, doctorally trained faculty need to perceive its relevance to their own disciplinary stewardship. Efforts to prepare students for participation to work in/with Big Data—whether at or prior to the doctoral program—cannot presume that all possible future situations will have been conceptualized. Preparation for participation to work in/with Big Data must include the initiation of awareness of—and readiness for—the potential to engage with complex ethical, legal, and social implications (ELSI) that have not yet been recognized or encountered.
Since Big Data draws practitioners from a wide range of disciplines, it might also be impossible to integrate training for reasoning through ELSI in Big Data into a few specific programs. Thus, a specific and evaluable institutional objective would be “one instructor in every degree-granting program will teach one required course in ethical reasoning for <their discipline>”. The creation of that course is supported by a published developmental framework for instruction in ethical reasoning that can either comprise the single course or could be a first step to initiate career-spanning engagement with ethical reasoning. A course built with these recommendations can support ongoing development in ethical reasoning—both within, and independent of, the target discipline.
Rochelle E. Tractenberg