
About this book

This textbook is aimed at computer science undergraduates late in their sophomore year or early in their junior year, supplying a comprehensive background in qualitative and quantitative data analysis, probability, random variables, and statistical methods, including machine learning.

With careful treatment of topics that fill the curricular needs for the course, Probability and Statistics for Computer Science features:

• A treatment of random variables and expectations dealing primarily with the discrete case.

• A practical treatment of simulation, showing how many interesting probabilities and expectations can be extracted, with particular emphasis on Markov chains.

• A clear but crisp account of simple point inference strategies (maximum likelihood; Bayesian inference) in simple contexts. This is extended to cover some confidence intervals, samples and populations for random sampling with replacement, and the simplest hypothesis testing.

• A chapter dealing with classification, explaining why it’s useful; how to train SVM classifiers with stochastic gradient descent; and how to use implementations of more advanced methods such as random forests and nearest neighbors.

• A chapter dealing with regression, explaining how to set up, use and understand linear regression and nearest neighbors regression in practical problems.

• A chapter dealing with principal components analysis, developing intuition carefully, and including numerous practical examples. There is a brief description of multivariate scaling via principal coordinate analysis.

• A chapter dealing with clustering via agglomerative methods and k-means, showing how to build vector quantized features for complex signals.

Illustrated throughout, each main chapter includes many worked examples and other pedagogical elements such as boxed Procedures, Definitions, Useful Facts, and Remember This (short tips). Problems and Programming Exercises are at the end of each chapter, with a summary of what the reader should know.

Instructor resources include a full set of model solutions for all problems, and an Instructor's Manual with accompanying presentation slides.

Table of contents

Frontmatter

Describing Datasets

Frontmatter

1. First Tools for Looking at Data

Abstract
The single most important question for a working scientist—perhaps the single most useful question anyone can ask—is: “what’s going on here?” Answering this question requires creative use of different ways to make pictures of datasets, to summarize them, and to expose whatever structure might be there. This is an activity that is sometimes known as “Descriptive Statistics”. There isn’t any fixed recipe for understanding a dataset, but there is a rich variety of tools we can use to get insights.
David Forsyth

2. Looking at Relationships

Abstract
We think of a dataset as a collection of d-tuples (a d-tuple is an ordered list of d elements). For example, the Chase and Dunner dataset had entries for Gender; Grade; Age; Race; Urban/Rural; School; Goals; Grades; Sports; Looks; and Money (so it consisted of 11-tuples). The previous chapter explored methods to visualize and summarize a set of values obtained by extracting a single element from each tuple. For example, I could visualize the heights or the weights of a population (as in Fig. 1.7). But I could say nothing about the relationship between the height and weight. In this chapter, we will look at methods to visualize and summarize the relationships between pairs of elements of a dataset.
David Forsyth
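As a minimal illustration of the pairwise summaries this chapter describes, the sketch below computes the correlation coefficient between two elements of each data item (hypothetical height and weight values invented here, not the book's data).

```python
import numpy as np

# Hypothetical data: one (height_cm, weight_kg) pair per person.
heights = np.array([160.0, 172.0, 181.0, 158.0, 175.0, 169.0])
weights = np.array([55.0, 68.0, 80.0, 52.0, 74.0, 63.0])

def correlation(x, y):
    # Standardize each variable, then average the product of the
    # standard coordinates: this is the correlation coefficient.
    xs = (x - x.mean()) / x.std()
    ys = (y - y.mean()) / y.std()
    return np.mean(xs * ys)

print("corr(height, weight) =", correlation(heights, weights))
# The same value from the library routine, as a check.
print("np.corrcoef gives    =", np.corrcoef(heights, weights)[0, 1])
```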

Probability

Frontmatter

3. Basic Ideas in Probability

Abstract
We will perform experiments—which could be pretty much anything, from flipping a coin, to eating too much saturated fat, to smoking, to crossing the road without looking—and reason about the outcomes (mostly bad for the examples I gave). But these outcomes are uncertain, and we need to weigh those uncertainties against one another. If I flip a coin, I could get heads or tails, and there’s no reason to expect to see one more often than the other. If I eat too much saturated fat or smoke, I will very likely have problems, though I might not. If I cross the road without looking, I may be squashed by a truck or I may not. Our methods need also to account for information. If I look before I cross the road, I am much less likely to be squashed. Probability is the machinery we use to describe and account for the fact that some outcomes are more frequent than others.
David Forsyth

4. Random Variables and Expectations

Abstract
We have machinery to describe experiments with random outcomes, but we mostly care about numbers that are random. It is straightforward to link a number to the outcome of an experiment. The result is a random variable, a useful new idea. Random variables turn up in all sorts of places. For example, the amount of money you win or lose on a bet is a random variable. Now if you take the same bet repeatedly, you could wonder how much money will change hands in total, per bet. This yields a new and useful idea, the expected value of a random variable.
David Forsyth
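To make the expected value of a bet concrete, here is a small simulation sketch (my own illustration, not the book's code): a hypothetical fair bet that pays 1 unit on heads and costs 1 unit on tails, repeated many times, with the average winnings per bet compared against the exact expectation of 0.

```python
import random

random.seed(0)

# Hypothetical bet: win 1 unit on heads, lose 1 unit on tails (fair coin).
def one_bet():
    return 1 if random.random() < 0.5 else -1

n_bets = 100_000
total = sum(one_bet() for _ in range(n_bets))

# The average winnings per bet should be close to the expected value,
# which is 0.5 * (+1) + 0.5 * (-1) = 0 for this bet.
print("average per bet:", total / n_bets)
```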

5. Useful Probability Distributions

Abstract
We will use probability as a tool to resolve practical questions about data. Here are important example questions. We could ask what process produced the data? For example, I observe a set of independent coin flips. I would now like to know the probability of observing a head when the coin is flipped. We could ask what sort of data can we expect in the future? For example, what will be the outcome of the next election? Answering this requires collecting information about voters, preferences, and the like, then using it to build a model that predicts the outcome. We could ask what labels should we attach to unlabelled data? For example, we might see a large number of credit card transactions, some known to be legitimate and others known to be fraudulent. We now see a new transaction: is it legitimate? We could ask is an effect easily explained by chance variations, or is it real? For example, a medicine appears to help patients with a disease. Is there a real effect, or is it possible that by chance the patients we tested the medicine on felt better?
David Forsyth

Inference

Frontmatter

6. Samples and Populations

Abstract
Very often the data we see is a small part of the data we could have seen. The data we could have observed, if we could have seen everything, is the population. I will write populations like random variables with capital letters to emphasize we don’t actually know the whole population. The data we actually have is the sample (lower case, as usual). We would like to know the population mean, which we write popmean(X). We must estimate this using the sample.
David Forsyth
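A small sketch of the estimation step described above, using invented numbers: the sample mean estimates the population mean, and the standard error indicates how much that estimate is likely to vary from sample to sample.

```python
import numpy as np

# Hypothetical sample drawn from some much larger population.
sample = np.array([71.2, 68.5, 74.1, 69.9, 72.4, 70.8, 73.3, 67.6])

n = len(sample)
sample_mean = sample.mean()
# Unbiased estimate of the population standard deviation (ddof=1),
# then the standard error of the sample mean.
std_err = sample.std(ddof=1) / np.sqrt(n)

print("estimate of popmean:", sample_mean)
print("standard error:     ", std_err)
```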

7. The Significance of Evidence

Abstract
Imagine you believe the mean human body weight is 72kg. The mean human weight isn’t a random number, but it’s very hard to measure directly. You are forced to take a sample, and compute the sample mean. This sample mean is a random variable, and it will have different values for different samples. You need to know how to tell whether the difference between the observed value and 72kg is just an effect of variance caused by sampling, or is because the mean weight actually isn’t 72kg. One strategy is to construct an interval around the sample mean within which the true value will lie for (say) 99% of possible samples. If 72kg is outside that interval, then very few samples are consistent with the idea that 72kg is the mean human body weight. If you want to believe the mean human body weight is 72kg, you have to believe that you obtained a very odd sample.
David Forsyth
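The interval described in this abstract can be sketched as follows (my own illustration with invented weights, assuming the sample is large enough for a normal approximation): build a 99% interval around the sample mean and check whether 72 kg falls inside it.

```python
import numpy as np

# Hypothetical sample of body weights in kg.
weights = np.array([74.3, 71.8, 76.5, 73.2, 75.0, 72.9, 77.1, 74.6,
                    73.8, 75.9, 72.4, 76.2])

mean = weights.mean()
std_err = weights.std(ddof=1) / np.sqrt(len(weights))

# 99% interval under a normal approximation (z is about 2.58).
z = 2.58
lo, hi = mean - z * std_err, mean + z * std_err

print(f"99% interval: [{lo:.2f}, {hi:.2f}] kg")
print("72 kg inside interval:", lo <= 72.0 <= hi)
```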

8. Experiments

Abstract
An experiment tries to evaluate the effects of one or more treatments. Imagine you wish to evaluate the effect of some treatment. One natural strategy is: choose groups of subjects; apply this treatment at different levels to different groups; then see if the groups are different after treatment. For example, you might wish to investigate whether consuming a painkiller affected headaches. Different levels of treatment would correspond to different numbers of pills (say, none, one or two). Different subjects would be different people. We would then divide the subjects into groups, apply different levels of treatment (i.e. give them different numbers of pills), and record outcomes. Now we need to tell whether the groups are different.
David Forsyth
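One standard way to ask whether two treatment groups differ is a two-sample t-test on the recorded outcomes; the sketch below illustrates the general idea with invented data and is not the book's specific procedure.

```python
import numpy as np
from scipy import stats

# Hypothetical outcomes (say, headache duration in hours) for two groups:
# subjects who received no pills and subjects who received one pill.
no_pill  = np.array([3.1, 2.8, 3.5, 4.0, 2.9, 3.3, 3.8])
one_pill = np.array([2.2, 2.6, 1.9, 2.4, 2.8, 2.1, 2.5])

# Two-sample t-test: a small p-value suggests the difference in group
# means is hard to explain by chance variation alone.
t_stat, p_value = stats.ttest_ind(no_pill, one_pill)
print("t =", t_stat, " p =", p_value)
```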

9. Inferring Probability Models from Data

Abstract
One very useful way to draw conclusions from a dataset is to fit a probability model to that dataset. Once this is done, we can apply any of our procedures to the probability model to (for example) predict new data or estimate properties of future data. For example, you could flip a coin ten times and see five heads and five tails. To be able to estimate the probability of seeing heads on a future flip, you would need to fit a model. But once you had done so, you could also estimate the probability that five future flips would give you three heads and two tails, and so on.
David Forsyth
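The coin example in this abstract can be sketched directly (a minimal illustration, not the book's code): the maximum likelihood estimate of the probability of heads is the observed fraction of heads, and the fitted binomial model then gives the probability of three heads in five future flips.

```python
from math import comb

# Observed data: 10 flips, 5 heads.
flips, heads = 10, 5

# Maximum likelihood estimate of P(heads) is the observed frequency.
p_hat = heads / flips

# Under the fitted model, the probability of 3 heads in 5 future flips
# is a binomial probability.
k, n = 3, 5
prob = comb(n, k) * p_hat**k * (1 - p_hat)**(n - k)
print("MLE of P(heads):", p_hat)
print("P(3 heads in 5 future flips):", prob)
```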

Tools

Frontmatter

10. Extracting Important Relationships in High Dimensions

Abstract
Chapter 2 described methods to explore the relationship between two elements in a dataset. We could extract a pair of elements and construct various plots. For vector data, we could also compute the correlation between different pairs of elements. But if each data item is d-dimensional, there could be a lot of pairs to deal with, and it is hard to plot d-dimensional vectors.
David Forsyth
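A minimal sketch of the problem this chapter addresses (random invented data, not the book's examples): for d-dimensional items, the correlations between all pairs of elements can be collected into a single d x d matrix, and principal components analysis summarizes the main directions of variation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical dataset: 200 items, each a 5-dimensional vector.
X = rng.normal(size=(200, 5))

# Correlations between every pair of elements, as a d x d matrix.
corr = np.corrcoef(X, rowvar=False)
print("correlation matrix shape:", corr.shape)

# Principal components: eigenvectors of the covariance matrix of the
# mean-centered data, ordered by the variance they explain.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
print("variance along each principal direction:", eigvals[::-1])
```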

11. Learning to Classify

Abstract
A classifier is a procedure that accepts a set of features and produces a class label for them. Classifiers are immensely useful, and find wide application, because many problems are naturally classification problems. For example, if you wish to determine whether to place an advert on a web page or not, you would use a classifier (i.e. look at the page, and say yes or no according to some rule). As another example, if you have a program that you found for free on the web, you would use a classifier to decide whether it was safe to run it (i.e. look at the program, and say yes or no according to some rule). As yet another example, credit card companies must decide whether a transaction is good or fraudulent.
David Forsyth

12. Clustering: Models of High Dimensional Data

Abstract
High-dimensional data comes with problems. Data points tend not to be where you think; they can be scattered quite far apart, and can be quite far from the mean. There is an important rule of thumb for coping with high dimensional data: Use simple models. One very good, very simple, model for high dimensional data is to assume that it consists of multiple blobs. To build models like this, we must determine which datapoints belong to which blob by collecting together data points that are close and forming blobs out of them.
David Forsyth
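A minimal k-means sketch of the "collect nearby points into blobs" idea (my own illustration on invented, well-separated data; k-means is one of the methods the chapter covers, but this is not the book's code).

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: two well-separated blobs in 2D.
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2))
X = np.vstack([blob_a, blob_b])

def kmeans(X, k, n_iter=50):
    # Start from k randomly chosen data points as centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center ...
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # ... then move each center to the mean of its assigned points.
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centers, labels

centers, labels = kmeans(X, k=2)
print("cluster centers:\n", centers)
```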

13. Regression

Abstract
Classification tries to predict a class from a data item. Regression tries to predict a value. For example, we know the zip code of a house, the square footage of its lot, the number of rooms and the square footage of the house, and we wish to predict its likely sale price. As another example, we know the cost and condition of a trading card for sale, and we wish to predict a likely profit in buying it and then reselling it. As yet another example, we have a picture with some missing pixels—perhaps there was text covering them, and we want to replace it—and we want to fill in the missing values. As a final example, you can think of classification as a special case of regression, where we want to predict either +1 or −1; this isn’t usually the best way to classify, however. Predicting values is very useful, and so there are many examples like this.
David Forsyth
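As a minimal illustration of the regression setup (invented numbers, not the book's data), the sketch below fits a linear model predicting sale price from square footage by least squares, then predicts the price of a new house.

```python
import numpy as np

# Hypothetical training data: house square footage and sale price.
sqft  = np.array([850.0, 1200.0, 1500.0, 1800.0, 2100.0, 2600.0])
price = np.array([155e3, 210e3, 255e3, 300e3, 345e3, 420e3])

# Least-squares fit of price ~ intercept + slope * sqft.
A = np.column_stack([np.ones_like(sqft), sqft])
coef, *_ = np.linalg.lstsq(A, price, rcond=None)
intercept, slope = coef

# Predict the likely sale price of a new, unseen house.
new_sqft = 1650.0
print("predicted price:", intercept + slope * new_sqft)
```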

14. Markov Chains and Hidden Markov Models

Abstract
There are many situations where one must work with sequences. Here is a simple, and classical, example. We see a sequence of words, but the last word is missing. I will use the sequence “I had a glass of red wine with my grilled xxxx”. What is the best guess for the missing word? You could obtain one possible answer by counting word frequencies, then replacing the missing word with the most common word. This is “the”, which is not a particularly good guess because it doesn’t fit with the previous word. Instead, you could find the most common pair of words matching “grilled xxxx”, and then choose the second word. If you do this experiment (I used Google Ngram viewer, and searched for “grilled *”), you will find mostly quite sensible suggestions (I got “meats”, “meat”, “fish”, “chicken”, in that order). If you want to produce random sequences of words, the next word should depend on some of the words you have already produced.
David Forsyth
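The "most common pair" strategy in this abstract can be sketched in a few lines (a toy illustration with a tiny invented corpus, not the book's implementation): count bigrams, then predict the word that most often follows the given context word.

```python
from collections import Counter, defaultdict

# Tiny hypothetical corpus; a real application would use far more text.
corpus = ("i had a glass of red wine with my grilled fish . "
          "we served grilled chicken and grilled fish with red wine .").split()

# Count how often each word follows each context word (bigram counts).
following = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    following[prev][word] += 1

def predict_next(prev_word):
    # Most common word seen after prev_word in the corpus.
    return following[prev_word].most_common(1)[0][0]

print("grilled ->", predict_next("grilled"))
```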

Mathematical Bits and Pieces

Frontmatter

15. Resources and Extras

Abstract
This chapter contains some mathematical material that you will likely have seen, but that may not have stayed with you. I have also relegated the detailed discussion of how one splits a node in a decision tree to this chapter.
David Forsyth

Backmatter
