Introduction: The Changing Definition of Big Data
In terms of Volume, Big Data are data that cannot be handled by traditional analytics tools. In terms of Velocity, Big Data refers to data that arrive in (almost) real time. In terms of Variety, Big Data are complex datasets that include very different sources of content, such as unstructured text, media (e.g., images and videos), logs, and other data sources.
-
Internet data. These are online text, video, and sound data, encompassing all online content relevant to a research question. Using such data is commonly referred to as Internet research methods (Hewson et al. 2016).
-
Social media data. Social media data are a subset of Internet data and include text, photos, and videos that are publicly available and can be obtained by mining social media networks such as Twitter and Facebook. Social media data are probably the first and most studied form of Big Data for public opinion measurement (Schober et al. 2016).
-
Website metadata, logs, cookies, transactions, and website analytics. These are data produced by websites and analytics tools (e.g., Google Analytics or Adobe Analytics) and used heavily in online advertising, shopping analytics, and website analytics.
-
The Internet of Things. The Internet of Things (IOT) (Gershenfeld et al. 2004) refers to devices that can communicate with one another using the Internet as the common transmission protocol. As more and more devices become connected via the Internet, more data are generated and can be used to answer research questions.
-
Behavioral data. Behavioral data are a subset of IOT data. They come from devices such as smartphones, wearable technology, and smart watches that are carried by subjects and passively record data such as location, physical activity, and health status (e.g., Swan 2013). Behavioral data can also be recorded manually by users.
-
Transaction data. In the business world, transaction data have been around since before electronic data formats existed. They are records of, for example, orders, shipments, payments, returns, billing, and credit card activity (Ferguson 2014). Transaction data are nowadays part of customer relationship management tools, which attempt to capture every interaction a customer has with a company or product. The area is also called business intelligence (Hsinchun et al. 2012). The same applies to the government and public sector, where more and more user interactions are stored digitally.
-
Administrative data. Administrative data and registers are a form of Big Data collected by public offices, such as national health, tax, school, benefits, pension, and driver's license databases. Administrative data have a long tradition of being used for statistical purposes (Wallgren and Wallgren 2014). Survey data can be linked to administrative data, as shown by Sakshaug in this volume. Health data in some countries are collected and stored by private companies; although they are of the same nature as public health data, they are usually not discussed as administrative data in the academic literature.
-
Commercially available databases. More and more companies are collecting, curating, and storing data about consumers. By using publicly available records, records purchased from other companies, matching techniques (Pasek, this volume), and other algorithms such as imputation from other sources (e.g., census data), these companies create a profile for each individual in their database. They combine data from the sources described above. Examples are Acxiom, Epsilon, Experian Marketing Services, or, in the political domain, Catalist, Aristotle, and NationBuilder. These companies are often referred to as data brokers (Committee on Commerce, Science and Transportation 2013).
The Perspectives About Error and Data Quality
Challenges and New Skills Needed for the Survey Researcher Working with Big Data
Changes in the Survey Landscape
-
Do-it-yourself (DIY) web survey platforms
-
In-house web survey tools
-
From offline data collection methods to web surveys
-
From web surveys to mobile web surveys
-
From outsourced market research to in-house market research using DIY web survey platforms
-
From outsourced market research to in-house market research fully integrated with internal systems
How Surveys and Big Data Can Work Together
Answering the What and the Why
The goal of News Feed is to show you the stories that matter most to you. The actions people take on Facebook – liking, clicking, commenting or sharing a post – are historically some of the main factors considered to determine what to show at the top of your News Feed. But these factors don’t always tell us the whole story of what is most meaningful to you. As part of our ongoing effort to improve News Feed, we ask over a thousand people to rate their experience every day and tell us how we can improve the content they see when they check Facebook – we call this our Feed Quality Panel. We also survey tens of thousands of people around the world each day to learn more about how well we’re ranking each person’s feed.
Surveys Are Just One of a Number of Tools
Strengths and Challenges of Surveys and Big Data
Privacy, Confidentiality, and Transfer of Data
Barack Obama’s campaign began the year of his reelection fairly confident it knew the names of every one of the 69,456,897 Americans whose votes had put him in the White House. The votes may have been cast via secret ballot, but because Obama’s analysts had come up with individual-level predictions, they could look at the Democrat’s vote totals in each precinct and identify the people most likely to have backed him. (Issenberg 2012)
Looking at the Future of Big Data and Surveys
-
a comprehensive coverage of the constructs relevant to a research program.
-
the inclusion of multiple complementary indicators that enable accurate and efficient quantification of the target constructs and their relationships.
-
the application of appropriate tools to extract information from data, derive defensible and useful insights, and communicate them in compelling fashion.
-
Using high-quality surveys to validate the quality of Big Data sources. An example is the use of surveys to validate the accuracy of voter registration records, as reported by Berent et al. (2016).
-
Using Big Data to ask better questions in surveys. Big Data can be used as validation data (the "true value"), and different question wordings can be tested to determine which comes closest to that value. The idea is to extend the traditional validation data used in many medical studies, such as physician or nurse tests (e.g., Kenny Gibson et al. 2014), with validation data collected at scale from wearables or other IOT devices.
-
Augmenting Big Data with survey data, as in the Google Local Guides program. This opt-in program asks its users to answer a few "Yes, No, Not Sure" questions about locations such as restaurants, stores, or points of interest. For example, users can be asked whether the restaurant they just visited is family friendly or has Wi-Fi.
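The second use listed above, testing question wordings against Big Data validation values, can be illustrated with a minimal sketch. It is not a procedure taken from the chapter. The records, field names, the two wordings "A" and "B", and the choice of mean absolute error as the comparison metric are all hypothetical assumptions made for illustration. The sketch also treats the device value as the "true value", although in practice wearable data carry measurement error of their own.

```python
from statistics import mean

# Hypothetical records: each respondent answered one wording of a question
# such as "minutes of physical activity yesterday", and a wearable device
# supplied the validation value. All names and numbers are made up.
responses = [
    {"wording": "A", "reported": 45, "validated": 30},
    {"wording": "A", "reported": 60, "validated": 40},
    {"wording": "A", "reported": 20, "validated": 25},
    {"wording": "B", "reported": 35, "validated": 30},
    {"wording": "B", "reported": 40, "validated": 42},
    {"wording": "B", "reported": 28, "validated": 25},
]


def mean_absolute_error_by_wording(records):
    """Return {wording: mean of |reported - validated|} for each wording."""
    errors = {}
    for record in records:
        gap = abs(record["reported"] - record["validated"])
        errors.setdefault(record["wording"], []).append(gap)
    return {wording: mean(gaps) for wording, gaps in errors.items()}


mae = mean_absolute_error_by_wording(responses)
# The wording with the smallest average gap is the one closest to the
# validation data under this (assumed) metric.
best_wording = min(mae, key=mae.get)
print(mae, best_wording)
```

With these invented numbers, wording "B" has the smaller average gap. A real study would also need random assignment to wordings and a check on the quality of the validation source.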