1 Introduction
2 Topic modeling
2.1 Latent Dirichlet Allocation
2.2 Comparing LDA to related methods
2.3 Extensions of the basic LDA
2.4 Procedures and criteria for model evaluation
2.5 Limitations and critique
3 Approaches and applications in marketing research
3.1 Data structures and data retrieval
| Units are already present | Extraction of units beforehand | | Synthetically generated data | |
|---|---|---|---|---|
| The units to be pre-processed and used in topic models are already present beforehand, like words in texts, products in purchase records, etc. | Automatized recognition of discrete units from high-dimensional data using algorithms, lexica, etc. | Extraction of discrete units using manually predefined groups / categorical reduction of high-dimensional data | Algorithmic generation of entirely artificial data | Generating artificial data based on / including real data |
| E.g., Do and Gatica-Perez (2010, pp. 4); | E.g., Ishigaki et al. (2015, pp. 13); | | | |
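To make the "extraction of units beforehand" column concrete, here is a minimal, stdlib-only sketch of turning raw review texts into discrete countable units (a bag-of-words representation) before any topic model is fitted. The stop-word list and example reviews are illustrative assumptions, not taken from the studies cited above.

```python
# Minimal sketch: extracting discrete word units from raw text documents.
from collections import Counter

STOP_WORDS = {"the", "and", "is", "a", "after"}  # hypothetical, tiny stop list

def extract_units(document: str) -> Counter:
    """Lowercase, split on whitespace, strip punctuation, drop stop words,
    and count the remaining word units."""
    tokens = [t.strip(".,!?").lower() for t in document.split()]
    return Counter(t for t in tokens if t and t not in STOP_WORDS)

reviews = [
    "The battery is great and the screen is great!",
    "Battery died after a week.",
]
unit_counts = [extract_units(r) for r in reviews]
print(unit_counts[0])  # Counter({'great': 2, 'battery': 1, 'screen': 1})
```

In practice this step is done with dedicated tooling (tokenizers, lemmatizers, domain lexica), but the output shape is the same: per-document counts over a discrete vocabulary, which is exactly what LDA consumes.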
3.2 Topic model implementation and extensions
| Exploratory baseline | Topic models as part of a more complex model or research objective | | |
|---|---|---|---|
| Using the output of topic models as research results | Using the output of topic models for further processing in a bigger model or research aim | Using the output of other methods / models as vital input for topic models | Using several topic models in a framework |
| E.g., basic LDA to cluster textual online reviews from PatientsLikeMe.com (Park and Ha 2016, pp. 1494); Further examples: Cao et al. (2014, p. 8964); Christidis and Mentzas (2013, pp. 4375); Karpienko and Reutterer (2017, p. 17); Luo et al. (2015, pp. 1185); Schröder et al. (2017, p. 42); Sun et al. (2013, p. 7); Wang et al. (2015, p. 3); Yang et al. (2015, p. 419); | E.g., using the clustering output of LDA as input for further models (PCIM & GICIM) (Sun et al. 2013, pp. 4); Further examples: Cao et al. (2014, pp. 8959); Christidis and Mentzas (2013, pp. 4373); Dan et al. (2017, pp. 46); Karpienko and Reutterer (2017, pp. 11); Luo et al. (2015, pp. 1180); Schröder et al. (2017, pp. 42); Yang et al. (2015, pp. 420); | | E.g., VSTM, which consists of a foreground and a background topic model (Cao et al. 2014, p. 8959); |
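The "output of topic models for further processing" pattern can be sketched in a few lines: take the per-document topic proportions (theta) that any LDA fit produces and hard-assign each document to its dominant topic as a simple downstream segmentation step. The theta values below are hypothetical stand-ins for a real model's output.

```python
# Hypothetical document-topic proportions (theta), as produced by an LDA run
# over three reviews and three topics.
theta = [
    [0.85, 0.10, 0.05],
    [0.10, 0.80, 0.10],
    [0.70, 0.20, 0.10],
]

# Downstream step: hard-assign each document to its dominant topic,
# yielding a simple review/customer segmentation usable as model input.
segments = [max(range(len(row)), key=row.__getitem__) for row in theta]
print(segments)  # [0, 1, 0]
```

Richer pipelines (such as the PCIM/GICIM example above) keep the full soft proportions as features instead of hard assignments, but the handoff is the same: theta leaves the topic model and enters the bigger model.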
Integrating additional variables | Changing the inference method | Changing basic assumptions | Introducing constraints |
---|---|---|---|
| Incorporating extra information into the model in terms of additional variables / parameters | Changing the inference method (variational approximation with EM in the original LDA; Blei et al. 2003, p. 1003) to optimize the predictive performance, the rate of convergence, and the computational efficiency with respect to, e.g., the data, the number of topics, and hyperparameter settings (Asuncion et al. 2009, pp. 28) | Changing basic assumptions entailed in LDA to adapt to specific data and research interests | Optimizing the topics to be learned by putting constraints into the model with respect to certain purposes and assumptions of the specific research endeavour |
| | | E.g., assumed distributions (e.g., Trusov et al. 2016, pp. 415), bag-of-words (Yang et al. 2015, p. 418), that the order of documents does not matter (e.g., Wang et al. 2012, pp. 124), etc. Further examples: Büschken and Allenby (2016, p. 958); Phuong and Phuong (2012, pp. 66); Ramage et al. (2010, p. 132); | |
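As an illustration of the "changing the inference method" column: a widely used alternative to the variational EM of the original LDA is collapsed Gibbs sampling. The following is a minimal, stdlib-only sketch on a toy corpus (word ids, hyperparameters, and iteration count are all illustrative assumptions), not a production implementation.

```python
# Minimal collapsed Gibbs sampler for LDA over documents given as word-id lists.
import random

def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              iters=200, seed=0):
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]                # doc-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]   # topic-word counts
    nk = [0] * n_topics                                 # tokens per topic
    z = []                                              # topic per token
    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zs)
    # Sweep: resample each token's topic from its full conditional.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # p(z = k | rest), up to a normalizing constant.
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][w] + beta)
                    / (nk[j] + vocab_size * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return ndk, nkw

# Toy corpus over a 4-word vocabulary with two clearly separated themes.
docs = [[0, 1, 0, 1], [0, 0, 1], [2, 3, 2], [3, 2, 3, 3]]
doc_topic, topic_word = gibbs_lda(docs, n_topics=2, vocab_size=4)
```

The returned count matrices are what a practitioner normalizes into the document-topic (theta) and topic-word (phi) estimates; swapping the sampler for variational EM or online variants changes convergence speed and scalability, which is exactly the trade-off the column describes.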
3.3 Evaluation procedures
Computational performance | Optimal parameter settings | Model Fit (in sample and predictive out of sample) | Analysis of clustering output | Analysis of the estimator (for inference) |
---|---|---|---|---|
| Evaluating the computational performance in terms of, e.g., computational time, scalability, etc. | Evaluating/determining the optimal parameter settings in terms of, e.g., the number of topics, prior values, etc. | Evaluating the predictive performance of the model (as a topic quality indicator), e.g., in sample and out of sample, using real or synthetic data | Evaluating the clustering output (topics) in terms of, e.g., semantic coherence, exclusivity, etc. | Analyzing the estimator for topic inference, e.g., in terms of the number of iterations until convergence |
| E.g., Ishigaki et al. (2015, pp. 13); | | | | |
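The "model fit" column is most commonly operationalized as perplexity over (in-sample or held-out) tokens. A minimal sketch, using hypothetical hard-coded parameter estimates rather than a fitted model: given document-topic proportions theta and topic-word distributions phi, the token likelihood is the mixture p(w|d) = sum_k theta[d][k] * phi[k][w], and perplexity is the exponentiated negative mean log-likelihood.

```python
# Perplexity of toy LDA estimates: 2 topics over a 3-word vocabulary.
import math

theta = [[0.9, 0.1], [0.2, 0.8]]           # per-document topic proportions
phi = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]   # per-topic word distributions
docs = [[0, 0, 1], [2, 2, 1]]              # documents as word-id sequences

# Sum of log p(w | d) over all tokens, with p(w | d) marginalized over topics.
log_lik = sum(
    math.log(sum(theta[d][k] * phi[k][w] for k in range(len(phi))))
    for d, doc in enumerate(docs)
    for w in doc
)
n_tokens = sum(len(doc) for doc in docs)
perplexity = math.exp(-log_lik / n_tokens)
print(perplexity)
```

Lower perplexity means the model assigns higher probability to the observed tokens; computing it on held-out documents turns the same formula into the predictive out-of-sample criterion named in the header.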
| Types of comparisons | Description | Sources |
|---|---|---|
| Human ratings/scores | Comparing results of an automated process to human ratings/scores and evaluations | E.g., Tirunillai and Tellis (2014, pp. 470); |
| External reports and categories | Comparing the clustering output of topic models to external reports (e.g., consumer reports) or already present categories | E.g., Tirunillai and Tellis (2014, pp. 470); |
| Traditional clustering techniques | Comparing the output of topic models to traditional customer segmentation and clustering techniques | E.g., Trusov et al. (2016, p. 417); |
| Specific metrics associated with a field | Comparing a topic model to specific algorithms associated with a research field (like PageRank, in-degree, etc.) | E.g., Weng et al. (2010, pp. 267); |
| Topic models^a | Comparing a topic model to other topic models; comparing (mathematical, componential, parameter-wise, e.g., the number of topics) variations of the same topic model | |