About this book

This edited volume focuses on big data implications for computational social science and humanities from management to usage. The first part of the book covers geographic data, text corpus data, and social media data, and exemplifies their concrete applications in a wide range of fields including anthropology, economics, finance, geography, history, linguistics, political science, psychology, public health, and mass communications.

The second part of the book provides a panoramic view of the development of big data in the fields of computational social sciences and humanities. The following questions are addressed: Why is there a need for novel data governance for this new type of data? Why is big data important for social scientists? How will it revolutionize the way social scientists conduct research?

With the advent of the information age and technologies such as Web 2.0, ubiquitous computing, wearable devices, and the Internet of Things, digital society has fundamentally changed what we now know as "data", the very use of this data, and what we now call "knowledge". Big data has become the standard in social sciences, and has made these sciences more computational. Big Data in Computational Social Science and Humanities will appeal to graduate students and researchers working in the many subfields of the social sciences and humanities.



Chapter 1. Big Data in Computational Social Sciences and Humanities: An Introduction

This chapter provides an overview of the current development of big data in the computational social sciences and humanities. It is composed of two parts. In the first part, we review works that use the three most frequently seen types of big data, namely geographic data, text corpus data, and social media data, to conduct social science research in a wide range of fields, including anthropology, economics, finance, geography, history, linguistics, political science, psychology, public health, and mass communications. The second part of the chapter provides a panoramic view of the development of big data in the computational social sciences and humanities, including recent trends and the challenges they raise. As for the trends, we review four representative cases of timely development: big data finance, big data in psychology, the spatial humanities, and cloud computing. As for the challenges, we present an overview of four challenges associated with big data, namely the complexity of big data (or the ontology and epistemology of big data), big data search, big data simulation, and big data risk.
Shu-Heng Chen, Tina Yu



Chapter 2. Application of Citizen Science and Volunteered Geographic Information (VGI): Tourism Development for Rural Communities

Owing to the rapid development of geospatial and mobile communication technologies in recent years, the acquisition of high-quality spatial and temporal information has become much more efficient and cost-effective than before. As a result, many researchers hold that data collected by volunteers with little training can be used for scientific research if a carefully designed quality assurance process is applied. In this chapter, we introduce the application of VGI (volunteered geographic information) in the spatial humanities. In particular, we demonstrate the procedures for obtaining high-quality spatiotemporal information on various community resources from data collected by volunteers equipped with mobile devices such as smartphones, tablet PCs, and GPS (global positioning system) taggers.
Jihn-Fa Jan

Chapter 3. Telling Stories Through R: Geo-Temporal Mappings of Epigraphic Practices on Penghu

Analyzing the transformation of epigraphic practices on Penghu in the wake of the Japanese occupation, we try to shed some light on a keystone event that shaped a century of epigraphic practices on Penghu and Taiwan and that has the potential to help us understand the nature of social practices, their emergence, transformation, and conceptualization as tradition. We show to what extent the geological and climatic conditions of a site shape these processes, yet without fully determining their development. Instead, human agents twist and fix shaped practices in accordance with their political or economic strategies, trying to outperform potential rivals and to conquer new markets with convincing cosmologies that sell their strategic inventions. Based on a large digital archive of epigraphic practices in East Asia, we try to set up a showcase of how to approach this and similar datasets in a framework of digital humanities that is driven by a hermeneutic concern of understanding textual or symbolic communication. We use R as a Swiss army knife that allows us to plot tables, timelines, maps, and more, hoping to create a wider interest in data exploration endeavors, which, despite their technical appearance, endow our lives with meaning and a sense of relatedness.
Oliver Streiter

Chapter 4. Expressing Dynamic Maps Through Seventeenth-Century Taiwan Dutch Manuscripts

This chapter guides the reader through the process of applying digital software to seventeenth-century Dutch handwritten manuscripts which document the presence of the Dutch community in Taiwan and are indispensable for our understanding of Taiwan history in a global setting. The case study for this exercise is the digitized version of the Church Minutes (Kercboek) of the manuscript Kercboek, Brievenboek van Formosa, 23 januari 1642-4 maart 1660 in Dutch and English translation. The following steps are addressed. First, how should issues of transcription and transliteration be handled in view of non-standardized orthography and spelling? This particularly pertains to place and personal names, which are used as main entries in Proper Noun (PN) recognition. Second, we document how the set of training documents which generate word-clips is applied to generate candidate PNs. Finally, some observations are shared on how working with digitized historical documents pertaining to Dutch Formosa research enables a new line of inquiry that approaches the cultural encounter between Dutch and indigenous society by paying attention to the various ways in which the encounter was expressed and represented, and by which our current understanding is shaped.
Ann Heylen

Chapter 5. Has Homo economicus Evolved into Homo sapiens from 1992 to 2014: What Does Corpus Linguistics Say?

Thaler (Journal of Economic Perspectives 14:133–141, 2000) predicted that the paradigm of Homo economicus, which basically formulates the rationality of economic behavior in an ideal mathematical optimization framework and dominated orthodox economics for a substantial part of the twentieth century, would “evolve” into the paradigm of Homo sapiens, which emphasizes the consideration of the psychological, cultural, and social factors that constrain a human’s rationality. We applied a corpus linguistic approach to examine whether this prediction has held. To this end, we built a corpus using the abstracts of 51,285 economics research articles published from 1992 to 2014 in 42 mainstream economics journals. By analyzing the upward-trending and downward-trending words in this corpus, we found the Homo sapiens paradigm to have expanded significantly, while there was no clear evidence that the Homo economicus paradigm had given ground. From the analysis of increasingly used words related to Homo sapiens, we can further attribute the expansion of the Homo sapiens paradigm to the research attention increasingly drawn to the interdisciplinary integration of the social sciences, human heterogeneity and (cognitive) constraints, and the complexity of economic behaviors. Likewise, from the analysis of words related to Homo economicus that are used less and less, we found that research attention was gradually drawn away from the concept of equilibrium. Our main finding based on the corpus linguistic analysis was further supported and consolidated by the co-word network analysis.
Yawen Zou, Shu-Heng Chen
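
The idea of upward-trending and downward-trending words in the Chapter 5 abstract can be illustrated with a minimal sketch. The abstract does not say how trends were measured, so the least-squares slope of a word's yearly relative frequency used here is an assumption, and the toy corpus and the function names `relative_frequencies` and `trend_slope` are hypothetical.

```python
from collections import Counter


def relative_frequencies(abstracts_by_year):
    """Map each year to {word: share of all tokens published in that year}."""
    freqs = {}
    for year, abstracts in abstracts_by_year.items():
        counts = Counter(
            token for text in abstracts for token in text.lower().split()
        )
        total = sum(counts.values())
        freqs[year] = {word: n / total for word, n in counts.items()}
    return freqs


def trend_slope(word, freqs):
    """Least-squares slope of a word's relative frequency across years.

    Assumption: a positive slope marks an upward-trending word and a
    negative slope a downward-trending one.
    """
    years = sorted(freqs)
    if len(years) < 2:
        raise ValueError("at least two years are needed to estimate a trend")
    ys = [freqs[year].get(word, 0.0) for year in years]
    mean_x = sum(years) / len(years)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in years)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, ys))
    return cov_xy / var_x


# Hypothetical toy corpus; the real corpus holds 51,285 abstracts.
toy_corpus = {
    1992: ["equilibrium model of rational agents", "general equilibrium analysis"],
    2003: ["equilibrium and behavioral evidence", "bounded rational agents"],
    2014: ["behavioral evidence on heterogeneity", "behavioral network experiments"],
}
freqs = relative_frequencies(toy_corpus)
slopes = {word: trend_slope(word, freqs) for word in ("equilibrium", "behavioral")}
```

In this toy data the slope for "behavioral" is positive and the slope for "equilibrium" is negative. A real analysis would also need tokenization, stop-word handling, and a significance test, none of which are sketched here.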

Chapter 6. Big Data and FinTech

In this chapter, we examine research issues related to real-time and mobile data analytics in the area of FinTech. The issues examined consist of a non-traditional data analytics approach, news media sentiment analysis and opinion mining, asset pricing modeling, a real-world financial multi-case study, and mobile cloud computing creation. We develop a multifactor asset pricing model, a multiword text analytics approach, a supply-demand framework of financial service innovation, and a Big Data mobile prototype system. The promising research results include technology transfer to a start-up firm, university-industry cooperation, a set of three apps in the Android store, and multi-case study reports and academic publications.
Jia-Lang Seng, Yao-Min Chiang, Pang-Ru Chang, Feng-Shang Wu, Yung-Shen Yen, Tzu-Chieh Tsai

Chapter 7. Health in Biodiversity-Related Conventions: Analysis of a Multiplex Terminological Network (1973–2016)

Included from 1992 in the International Convention on Biological Diversity (CBD), themes related to Health are increasingly cited in later COPs (Conferences of the Parties) and are also taken into account in other conventions (CMS, the Convention on the Conservation of Migratory Species, and CITES, the Convention on International Trade in Endangered Species). From a biodiversity perspective, the Health theme encompasses dimensions of human health, animal health (domestic and wild fauna), and ecosystem health. Other ecological or environmental concepts such as biodiversity, the ecosystem approach, and risk assessment favored the emergence of Health issues and their integration into the CBD.
Having mined the textual corpus comprising the three conventions related to biodiversity and all the decisions or resolutions of their respective COPs up to 2014, we obtain more than 22,172 complex nominal terms, among which 213 are related to Health. Those terms are organized hierarchically into concepts (micro-ontologies), specific to each concept linked to Health (biodiversity, disease, health, pathogen, security, warning, etc.). We thus analyze how concepts are used in a complete or partial form in each COP and how they are transmitted between COPs through a multiplex network: each type of link of the network corresponds to a concept. Then, we identify the most central COPs and their grouping into communities in the process of the emergence of Health issues. With the terminological network links colored by concept, we analyze how each concept contributes to the building of an integrative and multi-dimensional approach to Health issues within the main biodiversity-related conventions.
Claire Lajaunie, Pierre Mazzega, Romain Boulet

Chapter 8. How Does Linguistic Complexity in Shakespeare’s Plays Relate to the Production History of a Commercial American Theater?

Sweller’s Cognitive Load Theory (CLT) prompted educators to analyze the cognitive load they are placing upon their students. A Shakespearean play as performed commercially for modern audiences may be classified as entertainment, but in terms of CLT, the spoken words and actions on stage may be classified as signals, and are therefore subject to analysis. This paper uses a tool to quantify the linguistic complexity (LC) of the spoken text of each play in the Shakespeare Corpus along four dimensions: average syllables per word, average words per sentence, percentage of complex words, and percentage of words not found in a standard dictionary. The plays were ranked from lowest LC score to highest. Then these rankings were compared with the ranked production frequency of a commercial Shakespearean theater, whose mission facilitates a modern audience experiencing the play as Shakespeare originally intended. Results indicate that the plays offered with the highest frequency over the theater’s history were also among the least complex of Shakespeare’s plays. Therefore, there appears to be a relationship between linguistic complexity in the text of Shakespeare’s plays and the commercial viability of offering those plays to a paying audience. As the linguistic complexity of a performed play affects the cognitive load on audience members, it is reasonable that plays with the lowest linguistic complexity will be chosen for production more often than their higher linguistic complexity counterparts for a theater that seeks to successfully entertain patrons and keep them coming back.
Brian Kokensparger
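
The four dimensions of linguistic complexity named in the Chapter 8 abstract can be sketched as follows. This is not the tool used in the chapter: the vowel-run syllable heuristic, the three-syllable threshold for "complex" words, the regular expressions, and the tiny `toy_dictionary` are all assumptions made for illustration.

```python
import re

VOWEL_RUNS = re.compile(r"[aeiouy]+")
WORDS = re.compile(r"[a-z]+(?:'[a-z]+)*")


def count_syllables(word):
    """Rough heuristic (assumption): one syllable per run of vowels, minimum one."""
    return max(1, len(VOWEL_RUNS.findall(word.lower())))


def complexity_profile(text, dictionary):
    """Return the four measures for one text.

    `dictionary` is any set of known lowercase words. "Complex" is taken
    to mean three or more syllables, which is an assumption.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORDS.findall(text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = [count_syllables(word) for word in words]
    n_words = len(words)
    return {
        "syllables_per_word": sum(syllables) / n_words,
        "words_per_sentence": n_words / len(sentences),
        "pct_complex_words": 100 * sum(s >= 3 for s in syllables) / n_words,
        "pct_not_in_dictionary": 100 * sum(w not in dictionary for w in words) / n_words,
    }


# Toy input; a real run would use the full spoken text of each play
# and a standard dictionary word list.
sample = "To be, or not to be. That is the question."
toy_dictionary = {"to", "be", "or", "not", "that", "is", "the"}
profile = complexity_profile(sample, toy_dictionary)
```

How the four measures are combined into a single score per play is not stated in the abstract, so the sketch stops at the per-dimension values.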

Chapter 9. Language Communities, Corpora, and Cognition

Language data are digitized for analyzing and computing patterns of linguistic form, meaning, and use shaped and reshaped in the interactions of the users in social–cultural contexts in homogeneous or heterogeneous language communities. In this chapter, the rationale and tenets of the corpus-driven paradigm are introduced. Then, three areas of studies of linguistic patterns and cognition based on digitized corpus data collected from different language communities are discussed so as to understand what kinds of corpus data are employed in language studies, how the corpus data manifest the recurrent patterns of linguistic form, meaning, and use in various social–cultural contexts, and how the linguistic patterns reveal the linguistic cognition of a language community. In the first study, the corpus-based linguistic findings in news media demonstrate the intricate patterns of language in the social–cultural discourse in Taiwan. In the second, the use of language and gesture in Taiwan Mandarin shows the cross-modal behaviors and cognition embodied in people’s perceptual and bodily experiences in recurrent individual and social–cultural practices. Finally, in the third study, the narrative data produced by typical and atypical children in Taiwan Mandarin sheds light on the developmental integration of social–emotional, cognitive, and linguistic abilities across the two groups of young language users.
Huei-Ling Lai, Kawai Chui, Wen-Hui Sah, Siaw-Fong Chung, Chao-Lin Liu

Chapter 10. From Naive Expectation to Realistic Progress: Government Applications of Big Data on Public Opinions Mining

Identifying the public policy agenda and relevant issues is one of the crucial stages in public policy analysis. In addition to traditional channels such as telephones and newspapers, the Internet has been emerging as the most challenging source of citizens’ complaints and comments. The existing practice and literature, however, appear insufficient to provide a systematic investigation of how to conduct Internet public opinions analysis (IPOA). The present study reflects upon planning and implementing IPOA in a public agency via a series of interviews and field observation.
The field experience contributes to the development of a step-by-step process to facilitate how public officials interact with consulting professionals and the IPOA service provider. Unlike transaction-oriented information systems, implementing IPOA is more similar to implementing a decision support system, which requires iterative communication and interpretation among these three parties. Moreover, longitudinal volume and sentiment analyses have effectively provided fundamental insight. Nevertheless, the IPOA results appear to have potential limitations when policy makers aspire to dig into the correspondence between events and the in-depth content related to public attitudes and arguments concerning the policy examined.
Naiyi Hsiao, Zhoupeng Liao, Don-Yun Chen

Chapter 11. Understanding “The User-Generated”: The Construction of the “ABC Model” and the Imagination of “Digital Humanities”

The “Sunflower Movement,” which sprouted in March 2014, was viewed as the best evidence of how students spread information and get organized through Facebook. In contrast to studies concerning relationships between Facebook and political-social life, this study focuses on the influence of a group of fan pages serving a social movement. We propose a data-driven approach based on the analysis of digital footprints; the visualization tools applied in this study are constructed based on the “ABC model” and evolved along with the needs of the analysis. Through fans’ action characteristics, we can construct the role played by each fan page. It is found that most fans visit only once and leave a single footprint. The active 20% of users either concentrate their engagement in a short period or spread their involvement over a rather long period.
Hui-Wen Liu, I-Ying Lin, Ming-Te Chi, Kuo-Wei Hsu
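
The footprint findings in the Chapter 11 abstract (one-time visitors versus the most active 20% of users) suggest a simple summary over per-user activity. The sketch below is only an illustration under stated assumptions: footprints are `(user_id, day)` pairs with integer day numbers, "active" means the top fifth of users by footprint count, and the function name `footprint_summary` is hypothetical. It does not reproduce the "ABC model".

```python
from collections import Counter


def footprint_summary(footprints):
    """Summarize per-user activity from (user_id, day) pairs."""
    footprints = list(footprints)
    if not footprints:
        raise ValueError("no footprints given")
    per_user = Counter(user for user, _ in footprints)
    n_users = len(per_user)

    # Share of users who left exactly one footprint.
    one_time_share = sum(1 for n in per_user.values() if n == 1) / n_users

    # Top 20% of users by footprint count (at least one user).
    top_n = max(1, n_users // 5)
    active_users = [user for user, _ in per_user.most_common(top_n)]

    # Days between the first and last footprint of each active user:
    # a short span suggests a burst, a long span sustained involvement.
    active_spans = {}
    for user in active_users:
        days = [day for u, day in footprints if u == user]
        active_spans[user] = max(days) - min(days)

    return {
        "one_time_share": one_time_share,
        "active_users": active_users,
        "active_spans": active_spans,
    }


# Hypothetical toy data: user ids and day numbers are invented.
toy_footprints = [
    ("u1", 1), ("u1", 3), ("u1", 4), ("u1", 9),
    ("u2", 2),
    ("u3", 2),
    ("u4", 5),
    ("u5", 1), ("u5", 2),
]
summary = footprint_summary(toy_footprints)
```

In the toy data three of five users leave a single footprint, and the one active user spreads four footprints over eight days.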

Survey and Challenges


Chapter 12. Big Data Finance and Financial Markets

Financial markets are always the most aggressive adopters of new information technologies. The recent boom in big data has enhanced the effect of information diffusion in financial markets since the physical cost of participation has been reduced and interactions among investors have become more efficient. In this chapter, we provide an overview of the current state of the art related to the utilization of big data in financial markets. To start with, we introduce the concept of financial big data from the perspective of complementing our understanding of the predictability and dynamics of financial markets as well as illustrating the changing landscape from conventional media to big data in academic research. Secondly, we summarize the medium effects of financial big data on the efficient market hypothesis and the market dynamics, respectively. Thirdly, we further probe into the underlying mechanisms as to why financial big data exhibits superior predictability and explanatory power for the market dynamics. Finally, this chapter outlines the challenges and promising avenues for future research.
Dehua Shen, Shu-Heng Chen

Chapter 13. Applications of Internet Methods in Psychology

Web technology has evolved quickly from Web 1.0 to Web 2.0 and even Web 3.0 since its birth in the 1990s. It is now not only a broadcasting channel (e.g., Wikipedia) but also a platform where people share their opinions, ideas, and sentiments with friends (e.g., social network sites). Therefore, more and more psychologists are interested in how the Web can help us investigate the human mind and behavior. In this chapter, I review different approaches to psychological studies on the Internet as a summary of the current applications of Internet technology in psychology. The first approach is simply conducting surveys and experiments online, although caution is needed for some types of online experiment. The second approach is using an Internet search engine (e.g., Google or Wikipedia) to search for behavior criteria on Web pages. The last one is directly using social network sites (e.g., Facebook) to investigate people’s behaviors in online social contexts.
Lee-Xieng Yang

Chapter 14. Spatial Humanities: An Integrated Approach to Spatiotemporal Research

Spatial humanities are a sub-discipline of digital humanities based on geographic information systems (GIS) and timelines providing an effective integrating and contextualizing function for geo-cultural attributes. As information systems from multiple sources and in multiple formats they create visual indexes for diverse cultural data. Spatiotemporal interfaces provide new methods of integrating primary source materials into web-based interactive and 3D visualizations. We are able to chart the extent of specific traits of cultural information via maps using GIS gazetteer style spreadsheets for collecting and curating datasets.
The system is based on GIS point locations, routes, and regions linked to enriched attribute information. These are charted and visualized in maps and can be analyzed with network analysis, creating an innovative digital infrastructure for scholarly collaboration and creation of customizable visualizations. This method gives the researchers an expanse of data in layers of time across space providing new tools to advance humanistic inquiry. This in turn becomes a Web-based bulletin board for local community and scholarly knowledge exchange.
David Blundell, Ching-Chih Lin, James X. Morris

Chapter 15. Cloud Computing in Social Sciences and Humanities

Using the cloud implies purchasing time on web-based virtual computers or servers. The affordability and accessibility of this level of computing power stands to revolutionize Computational Social Science and Humanities. Increasingly, we live in a quantitative world. The ability to read, store, and manipulate larger and larger amounts of data is becoming a prerequisite for being on the cutting edge of research. Econometric methods utilizing big data and high-performance computing may shed new light on existing beliefs or unsolved puzzles in Social Sciences and Humanities. In this chapter, some of the rapidly expanding cloud computing options available to the researcher are explored. Vendors vary with respect to costs and accessibility, operating systems, and available software. Microsoft Cloud Solutions, Hewlett Packard Enterprise, and Google Cloud, to name a few, are household names which now provide cloud computing solutions. These solutions are modular services which allow users to create the environment best suited to their needs and applications, whether that is websites, storage, or complex computational applications. Furthermore, household names in research such as MATLAB have partnered with existing cloud solutions such as Amazon so that these familiar applications can easily be scaled up to analyze huge data sets. At the other end of the spectrum, some cloud solution providers provide only barebones Linux, OS X, or Windows operating systems, and the researcher is given the opportunity to construct the environment which specifically meets their needs. Included are brief instructions to set up a high-performance computing environment using Amazon and freely available open source applications such as Open MPI (Open Message Passing Interface) and R with RStudio Server. These instructions allow the researcher to build a high-performance parallel computing environment with a minimum of time or expense.
Michael J. Gallagher

Chapter 16. Analysis of Social Media Data: An Introduction to the Characteristics and Chronological Process

One means of understanding the problems facing today’s social scientists is the analysis of social media data. This analysis involves forecasting and analyzing phenomena within big data generated by social media. The approach demands interdisciplinary teamwork between the data sciences and other disciplines. It is still an emerging discourse and therefore demands the ongoing devotion of researchers in allied disciplines. This chapter seeks to describe the characteristics, elements, and chronological process of analyzing social media data from a mass communication scholar’s perspective. It aims to present, in the form of case studies, the chronological process by which a researcher deals with social media data, and how that researcher handles the data in light of the question posed by the study.
Pai-Lin Chen, Yu-Chung Cheng, Kung Chen

Chapter 17. Big Data and Research Opportunities Using HRAF Databases

The HRAF databases, eHRAF World Cultures and eHRAF Archaeology, each containing large corpora of curated text subject-indexed at the paragraph-level by anthropologists, were designed to facilitate rapid retrieval of information. The texts describe social and cultural life in past and present societies around the world. As of the spring of 2018, eHRAF contains almost three million indexed “paragraph” units from over 8000 documents describing over 400 societies and archaeological traditions. This chapter first discusses concrete problems of scale resulting from large numbers of complex elements retrieved by any given search. Second, we discuss potential and partial solutions that resolve these problems to advance research, whether based on specific hypotheses, classification or identifying and evaluating embedded patterns of relationships. Third, we discuss new kinds of research possibilities that can be further advanced, have not yet been successfully attempted, or have not even been considered using anthropological data because of scale and complexity of achieving a result.
Michael D. Fischer, Carol R. Ember

Chapter 18. Computational History: From Big Data to Big Simulations

The first section of this chapter gives an overview of how big data and their mathematical calculation enter historical discourse. It introduces the two main issues that have so far prevented ‘big’ results from emerging. Firstly, the input is problematic because historical records cannot be easily and comprehensively decomposed into unambiguous fields, except for population and taxation records, which are rare and scattered throughout space and time until the nineteenth century. Secondly, even if we run machine-learning tools on properly structured data, big results cannot emerge until we build formal models with explanatory and predictive powers. The second section of the chapter presents a complex network, data-driven approach to mining historical sources and supporting the perennial historical chase for truth. In the time-integrated network obtained by overlaying all records from the historians’ databases, the nodes are actors, while the links are actions. The third section explains how this tool allows historians to deal with historical data issues (e.g., source criticism, facts validation, trade-conflict-diplomacy relationships, etc.), and take advantage of automatic extraction of key narratives to formulate and test their hypotheses on the courses of history in other actions or in additional data sets. The conclusions describe the vision of how this narrative-driven analysis of historical big data can lead to the development of multiscale agent-based models and simulations to generate ensembles of counterfactual histories that would deepen our understanding of why our actual history developed the way it did and how to treasure these human experiences.
Andrea Nanetti, Siew Ann Cheong
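
The time-integrated network described in the Chapter 18 abstract (actors as nodes, actions as links, all records overlaid) can be sketched with plain dictionaries. The record layout `(source, action, target, year)`, the function names, and the toy records are assumptions made for illustration; the authors' databases and tooling are not described in the abstract.

```python
from collections import defaultdict


def time_integrated_network(records):
    """Overlay all records into one network.

    Each record is (source_actor, action, target_actor, year). Actors are
    nodes; every action becomes a link, kept with its year so that the
    overlay can still be sliced by period later.
    """
    links = defaultdict(list)
    for source, action, target, year in records:
        links[(source, target)].append((year, action))
    return links


def actor_degree(links):
    """Count the recorded actions each actor takes part in."""
    counts = defaultdict(int)
    for (source, target), actions in links.items():
        counts[source] += len(actions)
        counts[target] += len(actions)
    return dict(counts)


# Hypothetical records; actor names, actions, and years are invented.
toy_records = [
    ("Actor A", "trades with", "Actor B", 1420),
    ("Actor A", "sends envoy to", "Actor C", 1423),
    ("Actor B", "attacks", "Actor C", 1431),
    ("Actor A", "trades with", "Actor B", 1440),
]
network = time_integrated_network(toy_records)
degrees = actor_degree(network)
```

Keeping parallel links between the same pair of actors, instead of merging them, preserves repeated actions such as the two trade records above, which is what an overlay of all records implies.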

Chapter 19. A Posthumanist Reflection on the Digital Humanities and Social Sciences

The emergence of interdisciplinary studies in the digital humanities and social sciences is closely related to the development of digital technologies. Although the digital humanities were often seen as only a technical support to the “real” humanities studies in the early days, the definition of “digital” changed with the advent of the Internet in the 1990s. Scholars have argued that entirely new disciplinary paradigms were introduced by the two waves of the digital humanities. Thus, researchers in the digital humanities and social sciences are currently experiencing a fundamental transformation of epistemology, in addition to the introduction of new tools and methods. This article investigates this transformation from the “posthumanist” theoretical perspective. The author argues that posthumanist theories can elucidate how researchers and their digital tools coproduce knowledge. In other words, from the posthumanist perspective, this article points out that digital technologies inevitably affect current research practices and knowledge production and, more importantly, that researchers also experience fundamental transformations in this coconstitution process.
Chia-Rong Tsao

