About this book

Stata is the most flexible and extensible data analysis package available from a commercial vendor. R is a similarly flexible free and open source package for data analysis, with over 3,000 add-on packages available. This book shows you how to extend the power of Stata through the use of R. It introduces R using Stata terminology with which you are already familiar. It steps through more than 30 programs written in both languages, comparing and contrasting the two packages' different approaches. When finished, you will be able to use R in conjunction with Stata, or separately, to import data, manage and transform it, create publication-quality graphics, and perform basic statistical analyses.

A glossary defines over 50 R terms using Stata jargon and again using more formal R terminology. The table of contents and index allow you to find equivalent R functions by looking up Stata commands and vice versa. The example programs and practice datasets for both R and Stata are available for download.

Table of Contents

Frontmatter

1. Introduction

Abstract
R [38] is a powerful and flexible environment for research computing. Written by Ross Ihaka, Robert Gentleman (hence the name “R”), the R Core Development Team, and an army of volunteers, R provides a wider range of analytical and graphical commands than any other software. The fact that this level of power is available free of charge has dramatically changed the landscape of research software.
Robert A. Muenchen, Joseph M. Hilbe

2. Installing and Updating R

Abstract
Stata and R are somewhat similar in that both are modular. Each comes with a single “binary” executable file and a large number of individual functions or commands. These are text files that users can modify in a text editor. Both applications come with their own built-in text editors, and both allow the use of outside text editors as well.
Robert A. Muenchen, Joseph M. Hilbe

3. Running R

Abstract
There are several ways you can run R:
• Interactively using its programming language. You can see the result of each function call immediately after you submit it.
• Interactively using one of several graphical user interfaces (GUIs) that you can add on to R. Some of these use programming and some use menus much like Stata.
• Noninteractively in batch mode using its programming language. You enter your program into a file and run it all at once.
Robert A. Muenchen, Joseph M. Hilbe

4. Help and Documentation

Abstract
The full Stata package comes with 17 volumes of reference manuals. Both these manuals and the Stata help files are well written and authoritative and their style is consistent. They are of great help to beginners through advanced users.
Robert A. Muenchen, Joseph M. Hilbe

5. Programming Language Basics

Abstract
R is an object-oriented language. Everything that exists in it, including variables, data sets, and functions (commands), is an object.
Robert A. Muenchen, Joseph M. Hilbe
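
As a minimal illustration of the Chapter 5 abstract, the sketch below creates a variable, a data set, and a function, then asks R what kind of object each one is. The names x, survey, and double_it are made up for this example and do not come from the book.

```r
# Illustrative objects; the names x, survey and double_it are hypothetical
x <- c(1, 2, 3)                  # a variable (numeric vector)
survey <- data.frame(q1 = x)     # a data set (data frame)
double_it <- function(v) v * 2   # a user-written function

class(x)          # "numeric"
class(survey)     # "data.frame"
class(double_it)  # "function"
class(mean)       # built-in functions are objects too: "function"

ls()              # lists the names of the objects in the workspace
```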

6. Data Acquisition

Abstract
R can read or import data from a wide range of sources. It includes a data editor for manual input, and it can access files in text as well as Stata format. For other topics, such as importing data from relational databases, see the R Data Import/Export manual [39].
Robert A. Muenchen, Joseph M. Hilbe

7. Selecting Variables

Abstract
In Stata, selecting variables for an analysis is simple, while selecting observations is often a bit more complicated. In R, these two processes can be almost identical. As a result, variable selection in R can at times be somewhat more complex. However, since you need to learn that complexity to select observations, it is not much added effort.
Robert A. Muenchen, Joseph M. Hilbe

8. Selecting Observations

Abstract
It bears repeating that the approaches that R uses to select observations are, for the most part, the same as those discussed in the previous chapter for selecting variables. This chapter focuses only on selecting observations, and it does so in the same order as the chapter on selecting variables. The next chapter will cover the selection of variables and observations at the same time but will do so in much less detail.
Robert A. Muenchen, Joseph M. Hilbe

9. Selecting Variables and Observations

Abstract
In the previous two chapters, we focused on selecting variables and observations separately. You can combine those approaches to select both variables and observations at the same time. As an example, we will use the various methods to select the variables workshop and q1 to q4 for only the males.
Robert A. Muenchen, Joseph M. Hilbe
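
The selection described in the Chapter 9 abstract can be sketched in R as follows. This is only an illustration under stated assumptions: the data frame mydata, its gender variable, the "m"/"f" coding, and all values are invented here, and the book may use different methods or names.

```r
# Hypothetical practice data; the variable names workshop and q1 to q4 follow
# the abstract, while gender, its coding, and every value are assumptions
mydata <- data.frame(
  workshop = c(1, 2, 1, 2),
  gender   = c("f", "m", "m", "f"),
  q1 = c(1, 2, 4, 5),
  q2 = c(1, 1, 3, 4),
  q3 = c(5, 4, 3, 2),
  q4 = c(1, 3, 4, 5)
)

# Index notation: a logical condition selects the rows (observations),
# a character vector of names selects the columns (variables)
males_index <- mydata[mydata$gender == "m",
                      c("workshop", "q1", "q2", "q3", "q4")]

# subset(): the same selection, using the q1:q4 range shorthand
males_subset <- subset(mydata, gender == "m",
                       select = c(workshop, q1:q4))

print(males_index)
print(males_subset)
```

Both forms keep only the two male observations and the five requested variables. The index form shows how rows and columns are selected with the same bracket notation, which is the point the abstracts of Chapters 7 and 8 make.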

10. Data Management

Abstract
There is an old rule of thumb that says 80% of your data analysis time is spent transforming, reshaping, merging, and otherwise managing your data. Stata has a reputation for being more flexible than R for data management. However, as you will see in this chapter, R can do everything that Stata can do on these important tasks.
Robert A. Muenchen, Joseph M. Hilbe

11. Enhancing Your Output

Abstract
As we have seen, R output is quite sparse and not nicely formatted for word processing. As in Stata, you can improve R’s output by adding value and variable labels. You can also format the output to make beautiful tables to use with word processors, web pages, and document preparation systems.
Robert A. Muenchen, Joseph M. Hilbe

12. Generating Data

Abstract
Stata can generate data in a number of ways. The simplest is by use of the generate command. It can generate data in loops and through the use of matrix operations as well.
Robert A. Muenchen, Joseph M. Hilbe

13. Managing Your Files and Workspace

Abstract
Stata and R both have commands that replicate many of your computer’s operating system functions such as listing names of objects, deleting them, setting search paths, and so on. Learning how to use these commands is especially important because, like Stata, R stores its data in your computer’s limited random access memory. You need to make the most of your computer’s memory when handling large data sets or when a command is highly iterative.
Robert A. Muenchen, Joseph M. Hilbe

14. Graphics Overview

Abstract
Graphics is perhaps the most difficult topic to compare between Stata and R. Both packages contain at least two graphical approaches, each with dozens of options and each with entire books devoted to them. Therefore, we will focus on only two main approaches in R, and we will discuss many more examples in R than in Stata. This chapter focuses on a broad, high-level comparison. The next chapter focuses on R’s traditional graphics. The one after that focuses just on the grammar of graphics approaches used in both R and Stata.
Robert A. Muenchen, Joseph M. Hilbe

15. Traditional Graphics

Abstract
In the previous chapter, we discussed the various graphics packages in R and Stata. Now we will delve into R’s traditional, or base, graphics.
Robert A. Muenchen, Joseph M. Hilbe

16. Graphics with ggplot2

Abstract
As we discussed in Chapter 14, “Graphics Overview,” the ggplot2 package is an implementation of Wilkinson’s Grammar of Graphics (hence the “gg” in its name). The last chapter focused on R’s traditional graphics functions. Many plots were easy, but other plots were a lot of work compared to Stata. In particular, adding things like legends and confidence intervals was complicated.
Robert A. Muenchen, Joseph M. Hilbe

17. Statistics

Abstract
This chapter demonstrates some basic statistical methods. Since this book is aimed at people who already know Stata, we assume you are already familiar with most of these methods. We briefly list each example test’s goal and assumptions and how to get R to perform them. For more statistical coverage see Dalgaard’s Introductory Statistics with R [9], or Venables and Ripley’s much more advanced Modern Applied Statistics with S [51]. For a comprehensive text that shows Stata and R code being used for the same analysis, see Hilbe’s Logistic Regression Models [21]. As usual, Stata code duplicating the R examples used throughout the text is found at the end of the chapter.
Robert A. Muenchen, Joseph M. Hilbe

18. Conclusion

Abstract
As we have seen, R has many features in common with Stata. Both share rich programming environments optimized for extensibility, functions open for you to see and modify, and flourishing ecosystems of extensions written by their devoted users.
Robert A. Muenchen, Joseph M. Hilbe

Backmatter
