Skip to main content
main-content

Tipp

Weitere Artikel dieser Ausgabe durch Wischen aufrufen

01.12.2019 | Regular article | Ausgabe 1/2019 Open Access

EPJ Data Science 1/2019

Predicting and explaining behavioral data with structured feature space decomposition

Zeitschrift:
EPJ Data Science > Ausgabe 1/2019
Autoren:
Peter G. Fennell, Zhiya Zuo, Kristina Lerman
Wichtige Hinweise

Electronic Supplementary Material

The online version of this article (https://​doi.​org/​10.​1140/​epjds/​s13688-019-0201-0) contains supplementary material.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Abstract

Modeling human behavioral data is challenging due to its scale, sparseness (few observations per individual), heterogeneity (differently behaving individuals), and class imbalance (few observations of the outcome of interest). An additional challenge is learning an interpretable model that not only accurately predicts outcomes, but also identifies important factors associated with a given behavior. To address these challenges, we describe a statistical approach to modeling behavioral data called the structured sum-of-squares decomposition (S3D). The algorithm, which is inspired by decision trees, selects important features that collectively explain the variation of the outcome, quantifies correlations between the features, and bins the subspace of important features into smaller, more homogeneous blocks that correspond to similarly-behaving subgroups within the population. This partitioned subspace allows us to predict and analyze the behavior of the outcome variable both statistically and visually, giving a medium to examine the effect of various features and to create explainable predictions. We apply S3D to learn models of online activity from large-scale data collected from diverse sites, such as Stack Exchange, Khan Academy, Twitter, Duolingo, and Digg. We show that S3D creates parsimonious models that can predict outcomes in the held-out data at levels comparable to state-of-the-art approaches, but in addition, produces interpretable models that provide insights into behaviors. This is important for informing strategies aimed at changing behavior, designing social systems, but also for explaining predictions, a critical step towards minimizing algorithmic bias.

Unsere Produktempfehlungen

Premium-Abo der Gesellschaft für Informatik

Sie erhalten uneingeschränkten Vollzugriff auf alle acht Fachgebiete von Springer Professional und damit auf über 45.000 Fachbücher und ca. 300 Fachzeitschriften.

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 69.000 Bücher
  • über 500 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Umwelt
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Testen Sie jetzt 30 Tage kostenlos.

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 58.000 Bücher
  • über 300 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Testen Sie jetzt 30 Tage kostenlos.

Weitere Produktempfehlungen anzeigen
Zusatzmaterial
Supplementary information (PDF 527 kB)
13688_2019_201_MOESM1_ESM.pdf
Literatur
Über diesen Artikel

Weitere Artikel der Ausgabe 1/2019

EPJ Data Science 1/2019 Zur Ausgabe

Premium Partner

    Bildnachweise