
2016 | OriginalPaper | Chapter

11. Using Generalized Linear (Mixed) Models in HCI

Author: Maurits Kaptein

Published in: Modern Statistical Methods for HCI

Publisher: Springer International Publishing

Abstract

In HCI we often encounter dependent variables which are not (conditionally) normally distributed: we measure response-times, mouse-clicks, or the number of dialog steps it took a user to complete a task. Furthermore, we often encounter nested or grouped data; users are grouped within companies or institutes, or we obtain multiple observations within users. The standard linear regression models and ANOVAs used to analyze our experimental data are not always feasible in such cases since their assumptions are violated, or the predictions from the fitted models are outside the range of the observed data. In this chapter we introduce extensions to the standard linear model (LM) to enable the analysis of these data. The use of [R] to fit both Generalized Linear Models (GLMs) as well as Generalized Linear Mixed Models (GLMMs, also known as random effects models or hierarchical models) is explained. The chapter also briefly covers regularized regression models which are hardly used in the social sciences despite the fact that these models are extremely popular in Machine Learning, often for good reasons. We end with a number of recommendations for further reading on the topics that are introduced: the current text serves as a basic introduction.

Footnotes
1
The practice of fitting GLMs is also briefly discussed in Chap. 6 as a method for dealing with non-normal dependent data.
 
2
See also Chap. 3 of this volume for more on data visualization.
 
3
This is quite obvious since in this particular case we generated the data using a second-order polynomial, \(y = 320 + 25x - 0.3x^2 + \varepsilon\). However, in a real study one would not know the exact data-generating model, and visual inspection helps one understand the data.
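
As an illustration, such data could be simulated in [R] along these lines (a sketch; the sample size, predictor range, and noise standard deviation are illustrative assumptions, not the chapter's actual values):

    # Simulate data from the data-generating model in this footnote.
    set.seed(1)
    n <- 200
    x <- runif(n, min = 0, max = 80)
    y <- 320 + 25 * x - 0.3 * x^2 + rnorm(n, mean = 0, sd = 50)
    plot(x, y)  # visual inspection of the raw data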
 
4
More formally, finding the “best” or “closest” line can be (and often is) defined as minimizing the squared error \((y - X\beta)^T (y - X\beta)\) where, using matrix notation, X is the \(n \times k\) design matrix and \(\beta \) the vector of coefficients of length k. For example, for model M1 (\(\hat{y} = \beta _0 + \beta _1 x\), see body text), X is an \(n \times 2\) matrix of which the first column contains only 1’s (for the intercept) and the second column contains the values \(x_1, \ldots , x_n\). We are looking for the vector \(\beta \) that minimizes the squared error, which is relatively easy to do by taking the gradient (the vector of first-order partial derivatives) of the error function and setting it to 0. Minimizing the squared error gives the same solution for \(\beta \) as maximizing the likelihood using a probabilistic framework. Likelihood maximization provides an estimation method that scales more easily to more complex models than the minimization of the squared error. To use maximum likelihood estimation we would assume \(y \mid X \sim \mathscr {N}(X\beta , \sigma ^2)\), where \(\sigma ^2\) denotes the residual variance. Hence, in this model we assume the dependent variable y to be normally distributed conditional on X. The likelihood of the dataset, given that we assume our observations to be independent and identically distributed (i.i.d.), is simply the product of the likelihoods of the individual datapoints. This we can maximize by taking its derivative and setting it to zero. Often, for practical purposes, we take the derivative of the log of the likelihood, which turns the product over datapoints into a summation and is thus easier to differentiate (for more info see, e.g., Gelman and Hill 2007; Millar 2011). In the case of simple linear models (LMs) an exact solution for \(\beta \) exists and is given, in matrix notation, by \(\hat{\beta } = (X^T X)^{-1} X^T y\).
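
A minimal sketch of this closed-form solution in [R], reusing the simulated x and y from the sketch under footnote 3; it should reproduce the coefficients returned by lm:

    # Design matrix for M1: a column of 1's plus the predictor x.
    X <- cbind(1, x)
    # Exact OLS solution: beta-hat = (X'X)^{-1} X'y.
    beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
    beta_hat
    # lm() arrives at the same coefficients:
    coef(lm(y ~ x))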
 
5
The call takes a number of additional arguments which are not discussed here. For more details one can always type ?lm into the [R] terminal and see the documentation.
 
6
Please be cautious using these types of comparisons: a “good” fit does not mean the model is true. This chapter is too short to properly cover model selection methods; more on the topic of model selection can be found in, e.g., Bozdogan (1987).
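
For instance, information criteria such as the AIC discussed by Bozdogan (1987) can be compared directly in [R] (a sketch using the simulated data from above; a lower value indicates a better fit-complexity trade-off, not a “true” model):

    m1 <- lm(y ~ x)           # first-order polynomial
    m2 <- lm(y ~ x + I(x^2))  # second-order polynomial
    AIC(m1, m2)               # lower AIC = better penalized fit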
 
7
Note that analyses using only categorical predictors are often thought of in an ANOVA framework by most social scientists. However, lm models and ANOVA models are mathematically exactly the same, perhaps with a different choice of dummy encoding and different summary statistics of interest.
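
This equivalence is easy to verify in [R]; the three-level grouping factor below is hypothetical:

    # A hypothetical three-level factor, combined with the simulated y:
    g <- factor(sample(c("a", "b", "c"), length(y), replace = TRUE))
    fit_lm  <- lm(y ~ g)   # linear model with dummy coding
    fit_aov <- aov(y ~ g)  # the "classical" ANOVA
    anova(fit_lm)          # yields the same F-test as summary(fit_aov)
    summary(fit_aov)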
 
8
Model M0 is a 0th-order polynomial, M1 is a first-order polynomial of age (the linear term), and M2 a second-order polynomial of age (the quadratic term). The poly function easily generates higher-order polynomials. The model we fit here thus looks as follows: \(y = \beta _0 + \beta _1 \texttt {age} + \beta _2 \texttt {age}^2 + \cdots + \beta _{30} \texttt {age}^{30}\).
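
In [R] such fits might look as follows (a sketch; here the simulated x from footnote 3 plays the role of age, which is an assumption for illustration):

    # poly() expands age into the polynomial terms.
    dat <- data.frame(age = x, y = y)
    m2  <- lm(y ~ poly(age, 2),  data = dat)  # quadratic model
    m30 <- lm(y ~ poly(age, 30), data = dat)  # 30th-order polynomial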
 
9
Actually, in this specific case we know that M3 is modeling noise: we generated the data using only a 2nd order polynomial and some Gaussian noise.
 
10
This procedure outlines the standard method used to assess overfitting: by splitting a dataset into a training-set and a test-set one can fit models on the training-set and subsequently evaluate them on the test-set. If a model performs well on the training-set but badly on the test-set (in terms of error), then the model likely overfits the data.
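
A sketch of such a split in [R], continuing with the hypothetical dat from the footnote 8 sketch:

    # Random 50/50 split into a training-set and a test-set.
    idx   <- sample(seq_len(nrow(dat)), size = nrow(dat) / 2)
    train <- dat[idx, ]
    test  <- dat[-idx, ]
    fit   <- lm(y ~ poly(age, 30), data = train)
    # A much larger test error than training error suggests overfitting.
    mean((train$y - predict(fit, newdata = train))^2)  # training MSE
    mean((test$y  - predict(fit, newdata = test))^2)   # test MSE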
 
11
Regularization changes the definition of the “best” line to one in which we minimize a term that looks like \((y - X\beta )^T (y - X\beta ) + \lambda f(\beta )\). Here, \(f(\beta )\) is some function whose output grows as the size of the elements of \(\beta \) grows. Using the sum of the absolute elements of \(\beta \) (the L1 norm) for \(f(\beta )\) is called the “Lasso”, while using the squared L2 norm is called “ridge” regression. Here, \(\lambda \) is a tuning parameter which determines the magnitude of the penalty.
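
The chapter does not prescribe a specific package, but the glmnet package is one common way to fit such models in [R] (an assumption here; alpha = 1 gives the Lasso, alpha = 0 ridge regression, and cv.glmnet selects \(\lambda \) by cross-validation):

    library(glmnet)  # assumed available; the text names no package
    # Model matrix without the intercept column (glmnet adds its own):
    X <- model.matrix(y ~ poly(age, 30), data = dat)[, -1]
    lasso <- cv.glmnet(X, dat$y, alpha = 1)  # L1 penalty: the Lasso
    ridge <- cv.glmnet(X, dat$y, alpha = 0)  # L2 penalty: ridge
    coef(lasso, s = "lambda.min")  # coefficients at the CV-selected lambda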
 
12
Note the use of jitter to display the values of adopt. Since these observations are \(\in \{ 0 ,1 \}\), plotting the values directly would clutter the figure. The jitter command adds a slight random variation to the observed values.
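
For example (a sketch; the adopt outcome and its income predictor below are hypothetical illustrations, not the chapter's data):

    # Hypothetical binary adoption data:
    income <- runif(100, min = 20, max = 100)
    adopt  <- rbinom(100, size = 1, prob = plogis(-5 + 0.08 * income))
    # jitter() perturbs the 0/1 values slightly so points do not overplot:
    plot(income, jitter(adopt, amount = 0.05))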
 
13
For example, one of the assumptions of the linear model (in the maximum likelihood framework) is that the conditional expected value and the variance of the observed variable are unrelated: while this holds for normally distributed outcomes, it is not generally true for many other outcome types.
 
14
For more info on the actual methods of finding a “best” line in this context using maximum likelihood estimation, see Hastie et al. (2013).
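
In [R], GLMs are fit by maximum likelihood with the glm function; for instance, for the hypothetical binary adopt data sketched under footnote 12:

    # Logistic regression: a GLM with a binomial likelihood and logit link.
    fit <- glm(adopt ~ income, family = binomial(link = "logit"))
    summary(fit)  # coefficients are on the log-odds scale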
 
15
Often, in a completely unpooled approach, the analyst actually fits multiple, completely independent models. The slopes would then also differ per country. Here we focus only on the intercept for illustration purposes.
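
A sketch of such a fully unpooled fit in [R] (the data frame d and its y, x, and country columns are hypothetical):

    # One completely independent model per country; both intercepts and
    # slopes differ across countries.
    fits <- lapply(split(d, d$country), function(grp) lm(y ~ x, data = grp))
    sapply(fits, coef)  # one column of coefficients per country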
 
16
From the above it can be seen why, more formally, mixed models can be regarded as “in-between” pooled and unpooled models: a pooled model is the special case of a mixed model where \(\beta _{[k]}\,\sim \,\mathscr {N}(\beta _0, 0)\), and an unpooled model is the special case where \(\beta _{[k]}\,\sim \,\mathscr {N}(\beta _0, \infty )\).
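
One widely used [R] package for such models is lme4 (an assumption here; the chapter's package choice may differ), in which the partially pooled model with a random intercept per country reads:

    library(lme4)  # assumed available
    # Random intercept per country: beta_[k] ~ N(beta_0, sigma_k^2),
    # i.e. partial pooling between the two special cases above.
    fit <- lmer(y ~ x + (1 | country), data = d)
    summary(fit)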
 
17
The analyst could also include different distributional assumptions regarding the “batches” of coefficients. This, however, is outside the scope of this chapter.
 
Literature
Bishop CM (2006) Pattern recognition and machine learning. Springer
Bozdogan H (1987) Model selection and Akaike's information criterion (AIC): the general theory and its analytical extensions. Psychometrika 52(3):345–370
Gelman A, Hill J (2007) Data analysis using regression and multilevel/hierarchical models. Cambridge University Press
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2013) Bayesian data analysis, 3rd edn, vol 1. CRC Press
Hastie T, Tibshirani R, Friedman J (2013) The elements of statistical learning: data mining, inference, and prediction, vol 11. Springer Science & Business Media
Hawkins DM (2004) The problem of overfitting. J Chem Inf Comput Sci 44(1):1–12
Kaptein MC, Eckles D (2012) Heterogeneity in the effects of online persuasion. J Interact Mark 26(3):176–188
Kaptein MC, van Halteren A (2012) Adaptive persuasive messaging to increase service retention. J Pers Ubiquit Comput 17(6):1173–1185
Millar RB (2011) Maximum likelihood estimation and inference: with examples in R, SAS and ADMB. Wiley, Chichester
Morris CN, Lysy M (2012) Shrinkage estimation in multilevel normal models. Stat Sci 27(1):115–134
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J Roy Stat Soc: Ser B (Stat Methodol) 67(2):301–320
Metadata
Title
Using Generalized Linear (Mixed) Models in HCI
Author
Maurits Kaptein
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-26633-6_11