
2021 | Book

A Beginner’s Guide to Statistics for Criminology and Criminal Justice Using R

Authors: Alese Wooditch, Nicole J. Johnson, Reka Solymosi, Juanjo Medina Ariza, Dr. Samuel Langton

Publisher: Springer International Publishing


About this book

This book provides hands-on guidance for researchers and practitioners in criminal justice and criminology to perform statistical analyses and data visualization in the free and open-source software R. It offers a step-by-step guide for beginners to become familiar with the RStudio platform and tidyverse set of packages.

This volume will help users master the fundamentals of the R programming language, providing tutorials in each chapter that lay out research questions and hypotheses centered on a real criminal justice dataset, such as data from the National Survey on Drug Use and Health, the National Crime Victimization Survey, the Youth Risk Behavior Surveillance System, the Monitoring the Future study, and the National Youth Survey. Users will also learn how to manipulate common sources of agency data, such as calls-for-service (CFS) data. Each chapter ends with exercises that reinforce the R tutorial examples, designed to help users master the software as well as to provide practice with statistical concepts, data analysis, and interpretation of results.

The text can be used as a stand-alone guide to learning R or it can be used as a companion guide to an introductory statistics textbook, such as Basic Statistics in Criminal Justice (2020).

Table of Contents

Frontmatter
Chapter one. A First Lesson on R and RStudio
Abstract
R is a powerful tool for statistical analyses and data visualization that is widely used and increasingly popular in criminology and criminal justice. R is open source—and it’s free! In early 2020, there were over 15,000 available packages, and the number of things that can be done with R grows every day as users keep adding new packages. Because it is open source, new statistical methods are quickly implemented, and R offers more analytical solutions, flexibility, and customizability than commonly used statistical software. This book offers a step-by-step guide for beginners to become familiar with the RStudio platform and master the fundamentals of the R programming language quickly and painlessly. The text can be used as a stand-alone guide to learning R, or it can be used as a companion guide to an introductory statistics textbook. Along the way, users will get hands-on experience with actual criminal justice datasets, and application activity exercises are provided at the end of each chapter to practice covered material.
Alese Wooditch, Nicole J. Johnson, Reka Solymosi, Juanjo Medina Ariza, Samuel Langton
Chapter two. Getting to Know Your Data
Abstract
Now that you are familiar with creating data in different formats in R, we can start to discuss one of the most important steps of the data analysis process—transforming your data into what Hadley Wickham (Wickham, Journal of Statistical Software, 59(10), 1–23, 2014) calls tidy data. Fittingly, many useful functions for data tidying are in a set of packages called the tidyverse. Data tidying is a very important step that will ensure your data are in the format you need to conduct your analyses. It includes steps such as viewing data types; viewing, editing, and adding both variable labels and value labels; formatting classes; and recoding and creating new variables. Learning basic techniques to determine, for example, how different variables in your dataset are stored or whether a variable has too many missing cases can be extremely useful when you are planning what you can feasibly analyze and how to do it. In pretty much any research project involving data analysis, you can expect that your data will require some level of manipulation. We rarely receive data that are perfectly clean and set up for our purpose! Fortunately, R offers a great deal of flexibility in how to accomplish these tasks. In this section, we will walk through some examples of common data transformations you may need to perform in your own analysis while at the same time practicing the concept of levels of measurement using data from the National Crime Victimization Survey.
Alese Wooditch, Nicole J. Johnson, Reka Solymosi, Juanjo Medina Ariza, Samuel Langton
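A minimal sketch of the kind of tidying the chapter describes, using a toy data frame rather than the actual National Crime Victimization Survey file (all variable names here are invented, not from the book):

```r
library(dplyr)

# Toy data standing in for survey responses
victims <- data.frame(
  age    = c(19, 34, 27, 51),
  injury = c(1, 0, 1, 0)   # stored numerically, but really a nominal variable
)

str(victims)               # inspect how each variable is stored

victims <- victims %>%
  mutate(
    # attach value labels by converting the numeric code to a factor
    injury = factor(injury, levels = c(0, 1),
                    labels = c("No injury", "Injury")),
    # create a new ordinal variable from a ratio-level one
    age_group = case_when(
      age < 25 ~ "Under 25",
      age < 45 ~ "25-44",
      TRUE     ~ "45 and over"
    )
  )
```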
Chapter three. Data Visualization
Abstract
Are you sick of those drab-looking graphs and plots in Stata and SPSS? We are too! So, this chapter covers data visualization in R. Specifically, you will be working with ggplot2, a package within the tidyverse set of packages, for making high-quality, reproducible graphics. Data visualization is an accessible, aesthetically pleasing, and powerful way to explore, analyze, and convey complex information. It is an integral part of investigating data and disseminating findings to wider audiences. Learning the basics of data visualization in R can improve your workflow and make your findings easier to interpret and more impactful. This chapter reviews how to visually represent nominal and ordinal data using bar charts and how to visualize ratio and interval data using histograms, scatterplots, line graphs, and boxplots in R. Crime data from Greater Manchester, England, will be used to demonstrate these visualization techniques.
Alese Wooditch, Nicole J. Johnson, Reka Solymosi, Juanjo Medina Ariza, Samuel Langton
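As a flavor of the ggplot2 approach the chapter teaches, here is a sketch with invented data (not the Greater Manchester crime files): a bar chart for a nominal variable and a histogram for an interval/ratio one.

```r
library(ggplot2)

# Invented counts standing in for recorded crime categories
crimes <- data.frame(
  type = c("Burglary", "Robbery", "Vehicle", "Violence"),
  n    = c(120, 45, 80, 150)
)

# Bar chart for a nominal variable
ggplot(crimes, aes(x = type, y = n)) +
  geom_col() +
  labs(x = "Offence type", y = "Recorded incidents")

# Histogram for a ratio/interval variable
ages <- data.frame(age = rnorm(500, mean = 30, sd = 8))
ggplot(ages, aes(x = age)) +
  geom_histogram(binwidth = 2)
```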
Chapter four. Descriptive Statistics: Measures of Central Tendency
Abstract
This chapter covers some basic and commonly used methods of describing your data, including calculating measures of central tendency. When we talk about measures of central tendency, we are referring to cases that fall in the middle of a distribution. In other words, we can think of these as being the typical or average case. Measures of central tendency are very efficient ways of describing how some variable is distributed in the population. It is very important in the early stages of data analysis to examine your dataset descriptively, particularly by looking at the various measures of central tendency (the focus of this chapter) and dispersion (the focus of Chap. 5). Chances are, whether you are working with a dataset you have compiled or a commonly available criminal justice dataset, your data will be a sample of some population. R makes it simple to describe your sample using the various measures of central tendency. The three common measures reviewed in this chapter are the mean, the median, and the mode. We will demonstrate how to compute these statistics using the 2016 Body-Worn Camera Survey from the Law Enforcement Management and Administrative Statistics Survey.
Alese Wooditch, Nicole J. Johnson, Reka Solymosi, Juanjo Medina Ariza, Samuel Langton
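The three measures can be computed in a few lines of base R; the data here are hypothetical sentence lengths, not the Body-Worn Camera Survey. Note that base R has no built-in mode function for data, so one must be written by hand:

```r
# Hypothetical sentence lengths in months
sentences <- c(12, 24, 24, 36, 48, 24, 60)

mean(sentences)     # arithmetic average
median(sentences)   # middle case: 24

# base R's mode() does something else entirely, so define our own
get_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}
get_mode(sentences)   # most frequent value: 24
```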
Chapter five. Measures of Dispersion
Abstract
In the last chapter, we covered how to use R functions to calculate various measures of central tendency. But while it is certainly useful to know the mean of a given variable you are examining, it is even more helpful if you also know the spread, or dispersion, of cases around this mean. For this chapter, we will focus on measures of dispersion for both nominal-/ordinal-level variables (variation ratio and index of qualitative variation) and ratio-/interval-level variables (range, variance, standard deviation, and coefficient of relative variation). We will use data from the 2004 Survey of Inmates in State and Federal Correctional Facilities, a nationally representative sample of US prison inmates, to demonstrate some of these concepts.
Alese Wooditch, Nicole J. Johnson, Reka Solymosi, Juanjo Medina Ariza, Samuel Langton
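A quick base-R sketch of these measures, using invented values rather than the inmate survey:

```r
# Hypothetical sentence lengths in months
sentences <- c(12, 24, 24, 36, 48, 24, 60)

range(sentences)                  # minimum and maximum
diff(range(sentences))            # the range as a single number: 48
var(sentences)                    # sample variance (n - 1 denominator)
sd(sentences)                     # standard deviation
sd(sentences) / mean(sentences)   # coefficient of relative variation

# Variation ratio for a nominal variable:
# 1 minus the proportion of cases in the modal category
plea <- c("guilty", "guilty", "not guilty", "guilty", "no contest")
1 - max(table(plea)) / length(plea)   # 1 - 3/5 = 0.4
```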
Chapter six. Inferential Statistics
Abstract
In many topics within criminology and criminal justice research, we want to draw conclusions from our data that are generalizable to wider populations. Generalizable conclusions make our findings relevant to the real world, rather than specific to any one study or dataset. This is where inferential statistics are particularly useful. There are a number of key concepts underlying inferential statistics that we can demonstrate visually within R. This chapter will review sampling from a population, standard errors and confidence intervals, and how to generate data based on a distribution in R. In doing so, you will generate a synthetic dataset of intelligence quotient (IQ) scores for each of the approximately 3.6 million probationers in the United States.
Alese Wooditch, Nicole J. Johnson, Reka Solymosi, Juanjo Medina Ariza, Samuel Langton
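The chapter's simulation idea can be sketched as follows; the IQ parameters (mean 100, standard deviation 15) are the conventional ones, and everything else is illustrative:

```r
set.seed(1234)   # make the simulation reproducible

# Synthetic "population" of IQ scores for roughly 3.6 million probationers
population <- rnorm(3600000, mean = 100, sd = 15)

# Draw one sample and estimate the population mean from it
samp <- sample(population, size = 500)
xbar <- mean(samp)
se   <- sd(samp) / sqrt(length(samp))   # standard error of the mean

# 95% confidence interval around the sample mean
c(lower = xbar - 1.96 * se, upper = xbar + 1.96 * se)
```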
Chapter seven. Defining the Observed Significance Level of a Test
Abstract
It is probably clear to you by now that research in criminal justice is often concerned with making inferences to a population based on a sample statistic. During the course of our research, we may often use tests of statistical significance to determine whether we can safely reject a null hypothesis for our population of interest. But when we conduct these tests, we always run some risk of what is called a type I error (mistakenly concluding that an intervention or strategy is effective or efficacious). This chapter illustrates some basics of probability theory in R that demonstrate how we quantify the risk of a type I error. In doing so, you will be posed with scenarios in which you compute binomial probabilities of a criminal court judge delivering a guilty verdict in bench trials using for loops and while loops, covering the multiplication rule and arrangements along the way.
Alese Wooditch, Nicole J. Johnson, Reka Solymosi, Juanjo Medina Ariza, Samuel Langton
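A sketch of the multiplication-rule-plus-arrangements logic with a for loop, for an invented scenario (10 bench trials, a 0.5 probability of a guilty verdict):

```r
# P(exactly k guilty verdicts in n independent bench trials)
n <- 10; k <- 7; p <- 0.5

# Multiplication rule: probability of ONE specific ordering of outcomes
prob_one_order <- 1
for (i in 1:n) {
  prob_one_order <- prob_one_order * if (i <= k) p else (1 - p)
}

# Arrangements: how many orderings contain exactly k guilty verdicts
arrangements <- choose(n, k)   # 120

arrangements * prob_one_order  # about 0.117
dbinom(k, n, p)                # R's built-in shortcut gives the same answer
```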
Chapter eight. Hypothesis Testing Using the Binomial Distribution
Abstract
Many people involved in criminology and criminal justice research spend time making predictions about populations in the real world. These predictions tend to be based on a theoretical framework and are formally stated as hypotheses in order to answer a specific research question. Using inferential statistics (see Chap. 6), we can test to what extent our data support these hypotheses and provide empirical evidence to support (or reject) our expectations in R. This chapter uses a simulated dataset of results from a crime reduction intervention for at-risk youth to explore how the binomial distribution allows us to generalize from a sample of 100 participants in a study to the wider population.
Alese Wooditch, Nicole J. Johnson, Reka Solymosi, Juanjo Medina Ariza, Samuel Langton
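In R, a binomial hypothesis test of this kind is a one-liner; the numbers below are invented to mirror the chapter's scenario of 100 study participants:

```r
# Suppose 63 of 100 at-risk youth in the intervention were not rearrested,
# and the null hypothesis is a 50% success rate in the wider population
binom.test(x = 63, n = 100, p = 0.5, alternative = "two.sided")

# The upper-tail probability can also be built directly from the
# binomial distribution: P(63 or more successes out of 100)
pbinom(62, size = 100, prob = 0.5, lower.tail = FALSE)
```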
Chapter nine. Chi-Square and Contingency Tables
Abstract
This chapter introduces methods to explore the relationship between two categorical variables (measured at either the nominal or ordinal level) using British Crime Survey data. We will cover how to tabulate and visualize this kind of relationship using a two-way contingency table (also referred to as a cross tabulation, or cross tabs), and we will also review the chi-square test (χ2 test), the statistical significance test used to infer association in these cases. The chapter also covers Fisher’s exact test and the calculation of residuals (the difference between observed and expected frequencies).
Alese Wooditch, Nicole J. Johnson, Reka Solymosi, Juanjo Medina Ariza, Samuel Langton
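The core workflow can be sketched with an invented 2 × 2 table (not the British Crime Survey data):

```r
# Hypothetical cross-tab: victimization by urban/rural residence
tab <- matrix(c(40, 60,
                25, 75),
              nrow = 2, byrow = TRUE,
              dimnames = list(area   = c("Urban", "Rural"),
                              victim = c("Yes", "No")))

test <- chisq.test(tab)   # note: applies Yates' correction on 2 x 2 tables
test                      # chi-square statistic, df, p-value
test$expected             # expected frequencies under independence
test$observed - test$expected   # residuals: observed minus expected

fisher.test(tab)          # exact test, useful when expected counts are small
```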
Chapter ten. The Normal Distribution and Single-Sample Significance Tests
Abstract
In this chapter, we focus on characteristics of the normal distribution and single-sample significance tests that are used for variables measured at the ratio and interval levels. Specifically, this chapter reviews percentages under the normal curve, application of the 68-95-99.7 rule, and how to conduct a significance test in R for the following: (1) comparing a sample mean to a known population (single-sample z-test for means), (2) comparing a sample mean to an unknown population (single-sample t-test), and (3) comparing a sample proportion to a population proportion (single-sample z-test for proportions). In doing so, the chapter walks through criminal justice-related examples, lays out the null and alternative hypotheses for presented examples, and shows the user how to make a determination about the null hypothesis for the aforementioned tests from R output. Additionally, you will learn how to write your own functions in R.
Alese Wooditch, Nicole J. Johnson, Reka Solymosi, Juanjo Medina Ariza, Samuel Langton
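Base R has `t.test()` but no single-sample z-test for means, which is why writing your own function fits naturally here. A sketch with invented data:

```r
# Single-sample z-test for means, written by hand
z_test <- function(x, mu, sigma) {
  z <- (mean(x) - mu) / (sigma / sqrt(length(x)))
  p <- 2 * pnorm(-abs(z))   # two-tailed p-value from the normal curve
  c(z = z, p = p)
}

scores <- rnorm(50, mean = 102, sd = 15)     # hypothetical sample
z_test(scores, mu = 100, sigma = 15)         # known population sd

# When the population sd is unknown, use the single-sample t-test
t.test(scores, mu = 100)
```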
Chapter eleven. Comparing Two-Sample Means or Proportions
Abstract
In criminal justice research, we are often interested in comparing the means or proportions of two samples of data: two different groups, the same group across time (before/after), or two related samples (e.g., twins). In this chapter, we will walk through using the independent-samples t-test and the dependent-samples t-test for two sample means, and the z-test for two sample proportions, in R using data from the National Youth Survey, a nationally representative survey of 1725 American adolescents, aged 11–17, that gauges adolescents’ attitudes and behaviors on various topics, including school performance, family life, deviance, drug use, and peer influence.
Alese Wooditch, Nicole J. Johnson, Reka Solymosi, Juanjo Medina Ariza, Samuel Langton
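The three designs map onto three short calls in R; all values below are invented, not from the National Youth Survey:

```r
# Independent samples: delinquency scores for treatment vs. control
scores <- c(4, 7, 5, 9, 6, 8, 3, 6, 5, 7)
group  <- factor(rep(c("Treatment", "Control"), each = 5))
t.test(scores ~ group)

# Dependent samples: the same youths measured before and after a program
before <- c(9, 7, 8, 6, 7)
after  <- c(7, 6, 8, 5, 5)
t.test(before, after, paired = TRUE)

# Two sample proportions, e.g. rearrest in two groups of 100
prop.test(x = c(30, 45), n = c(100, 100))
```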
Chapter twelve. Analysis of Variance (ANOVA)
Abstract
In the previous chapter, you learned to compare the means of a numeric variable between two groups. But what if you want to compare a ratio or interval variable between more than two groups? If you are interested in comparing across more than two groups, you cannot run multiple t-tests because it increases the risk of a type I error (mistakenly concluding an intervention is effective or efficacious). In these instances, you will want to conduct a one-way analysis of variance (ANOVA). In this chapter, you will walk through how to conduct ANOVA and the appropriate post hoc tests by comparing frequencies of stop and searches conducted by the police between neighborhoods across different local authorities in London.
Alese Wooditch, Nicole J. Johnson, Reka Solymosi, Juanjo Medina Ariza, Samuel Langton
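A minimal sketch of the ANOVA-plus-post-hoc workflow, using invented counts rather than the London stop-and-search data:

```r
# Hypothetical stop-and-search counts across three local authorities
searches  <- c(12, 15, 11, 30, 28, 33, 20, 22, 19)
authority <- factor(rep(c("A", "B", "C"), each = 3))

fit <- aov(searches ~ authority)
summary(fit)     # F-statistic and p-value for the overall test

# Post hoc pairwise comparisons, adjusted for multiple testing,
# so we avoid the inflated type I error of running repeated t-tests
TukeyHSD(fit)
```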
Chapter thirteen. Measures of Association for Nominal and Ordinal Variables
Abstract
Throughout this book, we have covered various ways of measuring the relationships among variables. We have already discussed tests of statistical significance and how they help us infer differences in a population based on a sample from the population. However, tests of statistical significance do not tell us about the strength of associations among variables. In criminal justice research, we often want to detect not only whether a relationship exists among variables but the size of this relationship as well. Determining the size of the relationship among variables makes the interpretation of our results much more meaningful and useful in real-life applications. In this chapter, we focus on how to use various measures of association for nominal- and ordinal-level variables in R by relying on data from the Seattle Neighborhoods and Crime Survey, which aimed to test multilevel theories of neighborhood social organization and crime using telephone surveys of 2,220 Seattle, WA, residents.
Alese Wooditch, Nicole J. Johnson, Reka Solymosi, Juanjo Medina Ariza, Samuel Langton
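As one example of such a measure, Cramér’s V rescales the chi-square statistic into a 0-to-1 measure of association strength; the table below is invented, not from the Seattle survey:

```r
# Hypothetical cross-tab of neighborhood disorder by victimization
tab <- matrix(c(50, 30,
                20, 60), nrow = 2, byrow = TRUE)

chi <- chisq.test(tab, correct = FALSE)$statistic

# Cramer's V: sqrt(chi-square / (n * (k - 1))),
# where k is the smaller of the number of rows and columns
n <- sum(tab)
k <- min(nrow(tab), ncol(tab))
unname(sqrt(chi / (n * (k - 1))))
```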
Chapter fourteen. Bivariate Correlation
Abstract
This chapter covers how to measure the strength of the relationship between two ratio-/interval-level variables and between two ordinal-level variables. The walk-through starts by visually examining the bivariate relationship between the two variables of interest using a scatterplot. This is important because it informs whether we measure the strength of the relationship using Pearson’s correlation (a parametric test for a linear relationship) or Spearman’s rho/Kendall’s tau (nonparametric, rank-based alternatives suited to monotonic but nonlinear relationships). The chapter draws on a dataset used by Patrick Sharkey et al. (2017) to study the effect of nonprofit organizations on levels of crime.
Alese Wooditch, Nicole J. Johnson, Reka Solymosi, Juanjo Medina Ariza, Samuel Langton
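The plot-first, then-correlate workflow looks like this in base R; the data are simulated for illustration, not the Sharkey et al. dataset:

```r
set.seed(42)
# Simulated city-level data: more nonprofits, less crime
nonprofits <- rpois(50, lambda = 10)
crime_rate <- 500 - 8 * nonprofits + rnorm(50, sd = 30)

plot(nonprofits, crime_rate)   # inspect the shape of the relationship first

cor(nonprofits, crime_rate, method = "pearson")    # linear association
cor(nonprofits, crime_rate, method = "spearman")   # rank-based alternative
cor.test(nonprofits, crime_rate)                   # adds a significance test
```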
Chapter fifteen. Ordinary Least Squares Regression
Abstract
This chapter provides an introduction to ordinary least squares (OLS) regression analysis in R. This is a technique used to explore whether one or more variables (the independent variables, or X) can predict or explain the variation in another variable (the dependent variable, or Y). OLS regression belongs to a family of techniques called generalized linear models, so the variables being examined must be measured at the ratio or interval level and have a linear relationship. The chapter also reviews how to assess model fit using regression error (the difference between the predicted and actual values of Y) and R2. While you learn these techniques in R, you will be using the Crime Survey for England and Wales data from 2013 to 2014; these data derive from a face-to-face survey that asks people about their experiences of crime during the 12 months prior to the interview.
Alese Wooditch, Nicole J. Johnson, Reka Solymosi, Juanjo Medina Ariza, Samuel Langton
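An OLS model in R is a call to `lm()`; the variables below are simulated stand-ins, not the Crime Survey for England and Wales measures:

```r
set.seed(7)
# Simulated data: fear of crime predicted by worry score and victimization
worry  <- rnorm(200, mean = 5, sd = 2)
victim <- rbinom(200, size = 1, prob = 0.3)
fear   <- 2 + 0.6 * worry + 1.5 * victim + rnorm(200)

fit <- lm(fear ~ worry + victim)   # OLS: Y regressed on the X's
summary(fit)                       # coefficients, standard errors, R-squared

# Regression error: predicted minus actual values of Y
head(fitted(fit) - fear)
sum(residuals(fit)^2)              # sum of squared residuals
```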
Chapter sixteen. Correction to: A Beginner’s Guide to Statistics for Criminology and Criminal Justice Using R
Alese Wooditch, Nicole J. Johnson, Reka Solymosi, Juanjo Medina Ariza, Samuel Langton
Backmatter
Metadata
Title
A Beginner’s Guide to Statistics for Criminology and Criminal Justice Using R
Authors
Alese Wooditch
Nicole J. Johnson
Reka Solymosi
Juanjo Medina Ariza
Dr. Samuel Langton
Copyright year
2021
Electronic ISBN
978-3-030-50625-4
Print ISBN
978-3-030-50624-7
DOI
https://doi.org/10.1007/978-3-030-50625-4