2022 | Book

Modern Survey Analysis

Using Python for Deeper Insights

About this book

This book develops survey data analysis tools in Python, to create and analyze cross-tab tables and data visuals, weight data, perform hypothesis tests, and handle special survey questions such as Check-all-that-Apply. In addition, the basics of Bayesian data analysis and its Python implementation are presented. Since surveys are widely used as the primary method to collect data, and ultimately information, on attitudes, interests, and opinions of customers and constituents, these tools are vital for private or public sector policy decisions.

As a compact volume, this book uses case studies to illustrate methods of analysis essential for those who work with survey data in either sector. It focuses on two overarching objectives:

Demonstrate how to extract actionable, insightful, and useful information from survey data; and
Introduce Python and Pandas for analyzing survey data.

Table of Contents

Frontmatter
Chapter 1. Introduction to Modern Survey Analytics
Abstract
There are two things, it is often said, that you cannot escape: death and taxes. This is too narrow because there is a third: surveys. You are inundated daily by surveys of all kinds that cover both the private and public spheres of your life. In the private sphere, there are product surveys designed to learn what you buy, use, have, would like to have, and uncover what you believe is right and wrong about existing products. They are also used to determine the optimal marketing mix that consists of the right product, placement, promotion, and pricing combination to effectively sell products. They are further used to segment the market recognizing that one marketing mix does not equally apply to all customers. There are surveys used to gauge how well the producers of these products perform in all aspects of making, selling, and supporting their products. And there are surveys internal to those producers to help business managers determine if their employees are happy with their jobs and if they have any ideas for making processes more efficient or have suggestions and advice regarding new reorganization efforts and management changes.
Walter R. Paczkowski
Chapter 2. First Step: Working with Survey Data
Abstract
You cannot do basic survey data analysis or any type of data analysis, whether it be for surveys or not, without understanding the structure of your data. For surveys, this means at least understanding the background of your respondents: their gender, age, education, and so forth. This amounts to understanding respondents’ profiles. Examples include age distribution, gender distribution, income distribution, political party affiliation distribution, and residency distribution, to mention just a few. Profiles provide a perspective on how your respondents answer the main survey questions; different groups answer differently. But your data have to be organized to allow you to do this. In this chapter, you will gain a perspective on how to organize your data to prepare to look at the basic distributions of your respondents. You will then begin to look at your data in the next chapter.
Walter R. Paczkowski
Chapter 3. Shallow Survey Analysis
Abstract
Once you understand your data’s structure, you can begin to analyze them for your Core Questions. Analysis usually begins by creating tabulations (the “tabs”) and visualizations. I classify these as Shallow Analyses. They are shallow because they only skim the surface of your data, providing almost obvious results but not penetrating insight. Summaries are usually developed and presented as if they are detailed analyses, but they are not the essential and critical information contained in the data. Key decision-makers do not get the information they need to make their best decisions. If anything, Shallow Analysis raises more questions than it answers. In addition, those who develop Shallow Analyses pass the actual analyses onto their clients who must decipher meaning, content, and messages from them. These are the responsibilities of the analysts, responsibilities met by Deep Analysis but left unaddressed by Shallow Analysis.
Walter R. Paczkowski
Chapter 4. Beginning Deep Survey Analysis
Abstract
I had divided the analysis of survey data into Shallow Analysis and Deep Analysis. The former just skims the surface of all the data collected from a survey, highlighting only the minimum of findings with the simplest analysis tools. These tools, useful and informative in their own right, are only the first that should be used, not the only ones. They help you dig out some findings but leave much buried. I covered them and their use in Python in the previous chapter.
Walter R. Paczkowski
Chapter 5. Advanced Deep Survey Analysis: The Regression Family
Abstract
I will discuss some advanced analysis methods in this chapter. Specifically, I will discuss modeling survey responses using linear regression for continuous variable responses, logistic regression for binary variable responses, and Poisson regression for count responses. The latter two are particularly important and relevant for survey data analysis because many survey Core Questions have discrete, primarily binary and count, responses such as “Will you vote in the next presidential election?”, “Do you shop for jewelry online?”, and “How many times have you seen your doctor?” Logistic regression leads to a form of analysis called key driver analysis (KDA) which seeks the key factors that drive or determine a Core Question.
Walter R. Paczkowski
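
As a rough illustration of why the three regressions suit different response types, the sketch below shows the inverse link each one conventionally uses to turn a linear predictor into a predicted mean. The function names and the value of `eta` are hypothetical and are not taken from the book.

```python
import math

def linear_mean(eta):
    """Linear regression: identity link, the mean can be any real number."""
    return eta

def logistic_mean(eta):
    """Logistic regression: logit link, the mean is a probability in (0, 1)."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)  # avoids overflow for very negative eta
    return e / (1.0 + e)

def poisson_mean(eta):
    """Poisson regression: log link, the mean is a positive expected count."""
    return math.exp(eta)

# Hypothetical linear predictor value, e.g. b0 + b1 * x for one respondent.
eta = -2.0
print("continuous:", linear_mean(eta))
print("binary    :", logistic_mean(eta))
print("count     :", poisson_mean(eta))
```

The same linear predictor gives a negative value under the identity link but a valid probability and a valid positive count under the other two, which is the practical reason for matching the model to the response type.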
Chapter 6. Sample of Specialized Survey Analyses
Abstract
There are many specialized analyses for Core Questions. I will summarize a few in this chapter to highlight the possibilities with Python.
Walter R. Paczkowski
Chapter 7. Complex Surveys
Abstract
The surveys I considered until now have been “simple” sample surveys. This is not to say that the sampling is trivial or unimportant. It is to say that their design is uncomplicated and easily developed. Simple sample surveys are based on simple random sampling (SRS). Recall that random sampling could be with and without replacement. The former refers to placing a sampled unit back into the population. In essence, the population becomes infinitely large because it is never depleted. Without replacement means that the population size always gets smaller with each sampled unit. So, the probability of selecting any unit changes as units are sampled.
Walter R. Paczkowski
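
The difference between the two sampling schemes can be sketched with a small calculation. The helper below is hypothetical (not from the book) and assumes a population of five units; it tracks the chance of picking any one specific unit still in the population on each successive draw.

```python
from fractions import Fraction

def selection_probabilities(population_size, draws, replace):
    """Probability of selecting one specific remaining unit on each draw."""
    probs = []
    remaining = population_size
    for _ in range(draws):
        probs.append(Fraction(1, remaining))
        if not replace:
            remaining -= 1  # the sampled unit is not put back
    return probs

with_replacement = selection_probabilities(5, 3, replace=True)
without_replacement = selection_probabilities(5, 3, replace=False)

print([str(p) for p in with_replacement])     # constant across draws
print([str(p) for p in without_replacement])  # rises as the population shrinks
```

With replacement the probability stays at 1/5 on every draw; without replacement it moves to 1/4 and then 1/3 as units are removed.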
Chapter 8. Bayesian Survey Analysis: Introduction
Abstract
I previously discussed and illustrated deep analysis methods for survey data when the target variable of a Core Question is measured on a continuous or discrete scale. A prominent method is OLS regression for a continuous target. The target is the dependent or left-hand-side variable, and the independent variables, or features (perhaps from Surround Questions such as demographics), are the right-hand-side variables in a linear model. A logit model is used rather than an OLS model for a discrete target because of statistical issues, the most important being that OLS can predict outside the range of the target. For example, if the target is customer satisfaction measured on a 5-point Likert scale, but the five points are encoded as 0 and 1 (i.e., B3B and T2B, respectively), then OLS could predict a value of −2 for the binary target. What is −2? A logit model is used to avoid this nonsensical result. I illustrated how this is handled in Chap. 5.
Walter R. Paczkowski
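
The out-of-range problem can be reproduced with a minimal sketch. The data below are made up for illustration, and the logit model is fitted with plain gradient ascent rather than any particular library routine, so this is an assumption-laden toy and not the book's implementation.

```python
import math

# Hypothetical data: one feature x and a binary (0/1) target y.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS with a single feature, using the closed-form slope and intercept.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

def ols_predict(x_new):
    return intercept + slope * x_new

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Logit model fitted by gradient ascent on the log-likelihood (x is centered).
b0, b1 = 0.0, 0.0
for _ in range(5000):
    p = [sigmoid(b0 + b1 * (xi - x_bar)) for xi in x]
    b0 += 0.05 * sum(yi - pi for yi, pi in zip(y, p))
    b1 += 0.05 * sum((yi - pi) * (xi - x_bar) for xi, yi, pi in zip(x, y, p))

def logit_predict(x_new):
    return sigmoid(b0 + b1 * (x_new - x_bar))

# Compare predictions just outside the observed range of x.
for x_new in (0, 10):
    print(x_new, round(ols_predict(x_new), 3), round(logit_predict(x_new), 3))
```

On these data the OLS line predicts below 0 at `x = 0` and above 1 at `x = 10`, while the logit predictions remain between 0 and 1.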
Chapter 9. Bayesian Survey Analysis: Multilevel Extension
Abstract
The unilevel approach I covered in Chap. 8 is sufficient for many survey-based problems, and, in fact, for many problems whether survey-based or not. It is fundamentally one way of viewing the probabilistic structure of the target variable. There are times, however, as I noted at the beginning of that chapter, when the data structure requires a different approach. That structure is hierarchical with primary sampling units (PSUs) nested under a larger category of objects so that there are multiple levels to the data. The problem is multilevel as opposed to unilevel as in Chap. 8. I want to extend the unilevel Bayesian framework in this chapter to cover multilevel modeling.
Walter R. Paczkowski
Backmatter
Metadata
Title
Modern Survey Analysis
Written by
Walter R. Paczkowski
Copyright year
2022
Electronic ISBN
978-3-030-76267-4
Print ISBN
978-3-030-76266-7
DOI
https://doi.org/10.1007/978-3-030-76267-4