2015 | OriginalPaper | Chapter

Visual Analysis of Topical Evolution in Unstructured Text: Design and Evaluation of TopicFlow

Authors: Alison Smith, Sana Malik, Ben Shneiderman

Published in: Applications of Social Media and Social Network Analysis

Publisher: Springer International Publishing

Abstract

Topic models are regularly used to provide directed exploration and a high-level overview of a corpus of unstructured text. In many cases, it is important to analyze the evolution of topics over a time range. In this work, we present an application of statistical topic modeling and alignment (binned topic models) that groups related documents into automatically generated topics and aligns those topics across a time range. Additionally, we present TopicFlow, an interactive tool for visualizing the evolution of these topics. The tool was developed using an iterative design process based on feedback from expert reviewers. We demonstrate the utility of the tool with a detailed analysis of a corpus of data collected over the period of an academic conference, and we show the effectiveness of this visualization for reasoning about large data through a usability study with 18 participants.
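As a rough illustration of the binning step behind binned topic models, the sketch below splits timestamped documents into time slices so that a topic model could then be fit to each slice separately. The abstract does not say how bins are defined; equal-width bins, the function name bin_documents, and the sample documents are all assumptions made for this example.

```python
from datetime import datetime


def bin_documents(docs, num_bins):
    """Split (timestamp, text) pairs into num_bins equal-width time bins.

    Equal-width binning is an assumption; the source does not specify
    how its time bins are chosen.
    """
    bins = [[] for _ in range(num_bins)]
    if not docs:
        return bins
    times = [t for t, _ in docs]
    start, end = min(times), max(times)
    span = (end - start).total_seconds()
    for t, text in docs:
        if span == 0:
            index = 0  # all documents share one timestamp
        else:
            offset = (t - start).total_seconds()
            # The latest document would land at index num_bins, so clamp it.
            index = min(int(offset / span * num_bins), num_bins - 1)
        bins[index].append(text)
    return bins


# Made-up sample data, for illustration only.
docs = [
    (datetime(2012, 5, 7, 9, 0), "keynote starts now"),
    (datetime(2012, 5, 7, 13, 0), "great demo session"),
    (datetime(2012, 5, 8, 9, 0), "paper awards announced"),
    (datetime(2012, 5, 8, 17, 0), "closing remarks"),
]
binned = bin_documents(docs, 2)
```

Each inner list of binned would then be modeled independently, and the resulting topics aligned between adjacent bins.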

Footnotes
1
This work is an extension of our prior work [13], in which we originally introduced TopicFlow as a Twitter analysis tool. A video demonstrating this work can be found here: https://www.youtube.com/watch?v=qqIlvMOQaOE&feature=youtu.be
 
2
For TopicFlow, the number of topics is adjustable with a default of 15 to balance granularity and comprehensibility of the resulting topics.
 
3
For this implementation the LDA algorithm runs for 100 iterations with \(\alpha =0.5\) and \(\beta =0.5\).
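To make these settings concrete, here is a minimal sketch of LDA using collapsed Gibbs sampling, with the defaults taken from the footnotes (15 topics, 100 iterations, \(\alpha =0.5\), \(\beta =0.5\)). The source does not state which inference method or library the implementation uses, so the sampler, the function name lda_gibbs, and the toy documents are assumptions for illustration only.

```python
import random


def lda_gibbs(docs, num_topics=15, alpha=0.5, beta=0.5, iterations=100, seed=0):
    """Fit LDA to tokenized docs with a collapsed Gibbs sampler (illustrative)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][n]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    return vocab, topic_word, doc_topic


# Made-up toy corpus; a small topic count keeps the example quick.
toy_docs = [
    ["topic", "model", "text"],
    ["visual", "tool", "topic"],
    ["tweet", "conference", "tweet"],
]
vocab, topic_word, doc_topic = lda_gibbs(toy_docs, num_topics=3)
```

The returned topic_word counts, smoothed by \(\beta\) and normalized per topic, give the topic-word distributions that a later alignment step could compare.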
 
4
For example, Twitter-specific stop words include {rt, retweet, etc.} and Spanish stop words include {el, la, tu, etc.}.
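A minimal sketch of this kind of stop-word filtering is shown below. The two sets contain only the words named in the footnote; the tokenizer, the function name remove_stop_words, and the sample tweet are assumptions for illustration, and real stop-word lists would be much longer.

```python
import re

# Only the examples named in the footnote; real lists are much longer.
TWITTER_STOP_WORDS = {"rt", "retweet"}
SPANISH_STOP_WORDS = {"el", "la", "tu"}
STOP_WORDS = TWITTER_STOP_WORDS | SPANISH_STOP_WORDS


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into tokens, and drop stop words."""
    # Keep a leading # or @ so hashtags and mentions stay intact.
    tokens = re.findall(r"[#@]?\w+", text.lower())
    return [tok for tok in tokens if tok not in stop_words]


# Made-up sample tweet.
tokens = remove_stop_words("RT la conferencia: el panel de visualization")
```

Note that "de" survives here only because the illustrative Spanish list is limited to the three words given in the footnote.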
 
5
\(\cos(A, B) = \frac{A\cdot B}{\left\| A \right\| \left\| B \right\| }\).
 
6
For prototyping and evaluation purposes, the threshold was set between 0.15 and 0.25 depending on the dataset.
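The cosine measure and threshold from the footnotes above can be sketched as follows. This assumes each topic is represented as a vector of word weights over a shared vocabulary and that a link is kept when the similarity is at or above the threshold; neither detail is confirmed by this page. The function names, the sample vectors, and the default of 0.2 (a value inside the stated 0.15 to 0.25 range) are illustrative.

```python
import math


def cosine(a, b):
    """cos(A, B) = (A . B) / (||A|| ||B||); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def align_topics(prev_topics, next_topics, threshold=0.2):
    """Link topics in adjacent time bins whose similarity meets the threshold."""
    links = []
    for i, a in enumerate(prev_topics):
        for j, b in enumerate(next_topics):
            score = cosine(a, b)
            if score >= threshold:
                links.append((i, j, score))
    return links


# Made-up word-weight vectors over a four-word vocabulary.
prev_topics = [[0.7, 0.3, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
next_topics = [[0.6, 0.4, 0.0, 0.0], [0.0, 0.1, 0.0, 0.9]]
links = align_topics(prev_topics, next_topics)
```

With these vectors, topic 0 links to topic 0 and topic 1 links to topic 1, while the weak cross pairs fall below the threshold and are dropped. A topic may link to zero, one, or several topics in the next bin, which is what lets an alignment represent topics that appear, end, split, or merge.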
 
7
A prototype of the TopicFlow tool is available for demo here: http://www.cs.umd.edu/~maliks/topicflow/TopicFlow.html.
 
8
Twitter’s open API and the fact that tweets are rich with metadata, specifically time stamps, make it an appropriate data source for prototyping and testing.
 
Literature
1.
Blei DM, Lafferty JD (2006) Dynamic topic models. In: Proceedings of 23rd international conference on machine learning. ACM Press, New York, pp 113–120
2.
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
4.
Cui W, Liu S, Tan L, Shi C, Song Y, Gao Z, Qu H, Tong X (2011) TextFlow: towards better understanding of evolving topics in text. IEEE Trans Vis Comput Graph 17(12):2412–2421
5.
Hart S, Staveland L (1988) Development of NASA-TLX (Task Load Index): results of empirical and theoretical research. Hum Mental Workload 1:139–183
6.
Havre S, Hetzler B, Nowell L (2000) ThemeRiver: visualizing theme changes over time. In: Proceedings of IEEE symposium on information visualization, pp 115–123
7.
Hu Y, Boyd-Graber J, Satinoff B, Smith A (2013) Interactive topic modeling. Mach Learn J 95:423–469
8.
Kleinberg J (2003) Bursty and hierarchical structure in streams. Data Min Knowl Discov 7:373–397
10.
Leskovec J, Backstrom L, Kleinberg J (2009) Meme-tracking and the dynamics of the news cycle. In: Proceedings of 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 497–506
11.
Lin J (1991) Divergence measures based on the shannon entropy. IEEE Trans Inf Theory 37(1):145–151
12.
Liu Y, Niculescu-Mizil A, Gryc W (2009) Topic-link LDA: joint models of topic and author community. In: Proceedings of 26th annual international conference on machine learning. ACM Press, New York, pp 665–672
13.
Malik S, Smith A, Hawes T, Dunne C, Papadatos P, Li J, Shneiderman B (2013) Topicflow: visualizing topic alignment of twitter data over time. In: The 2013 IEEE/ACM international conference on advances in social networks analysis and mining
14.
Mimno D, McCallum A (2007) Organizing the OCA: learning faceted subjects from a library of digital books. In: Proceedings of the 7th ACM/IEEE-CS joint conference on digital libraries. ACM Press, New York, pp 376–385
15.
Nikulin M (2001) Hazewinkel, Michiel, encyclopaedia of mathematics: an updated and annotated translation of the Soviet. Mathematical encyclopaedia. Reidel Sold and distributed in the U.S.A. and Canada. Kluwer Academic, Boston
16.
O’Brien WL (2012) Preliminary investigation of the use of Sankey diagrams to enhance building performance simulation-supported design. In: Proceedings of 2012 symposium on simulation for architecture and urban design. Society for Computer Simulation International, San Diego, pp 15:1–15:8
17.
Ramage D, Hall D, Nallapati R, Manning CD (2009) Labeled LDA: a supervised topic model for credit attribution in multi-labeled corpora. In: Proceedings of 2009 conference on empirical methods in natural language processing, vol 1. Association for Computational Linguistics, New York, pp 248–256
19.
Sopan A, Rey P, Butler B, Shneiderman B (2012) Monitoring academic conferences: real-time visualization and retrospective analysis of backchannel conversations. In: ASE international conference on social informatics, pp 63–69
20.
Tan PN, Steinbach M, Kumar V (2005) Introduction to data mining, 1st edn. Addison Wesley, New York
21.
Teh YW, Jordan MI, Beal MJ, Blei DM (2006) Hierarchical Dirichlet processes. J Am Stat Assoc 101:1566–1581
22.
Wang X, McCallum A (2006) Topics over time: a non-markov continuous-time model of topical trends. In: Proceedings of 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 424–433
23.
Wilbur WJ, Sirotkin K (1992) The automatic identification of stop words. J Inf Sci 18(1):45–55
24.
Zhai K, Boyd-Graber J, Asadi N, Alkhouja M (2012) Mr. LDA: a flexible large scale topic modeling package using variational inference in mapreduce. In: ACM international conference on world wide web
Metadata
Title
Visual Analysis of Topical Evolution in Unstructured Text: Design and Evaluation of TopicFlow
Authors
Alison Smith
Sana Malik
Ben Shneiderman
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-19003-7_9
