2015 | Original Paper | Book Chapter

Visual Analysis of Topical Evolution in Unstructured Text: Design and Evaluation of TopicFlow

Written by: Alison Smith, Sana Malik, Ben Shneiderman

Published in: Applications of Social Media and Social Network Analysis

Publisher: Springer International Publishing

Abstract

Topic models are regularly used to provide directed exploration and a high-level overview of a corpus of unstructured text. In many cases, it is important to analyze the evolution of topics over a time range. In this work, we present an application of statistical topic modeling and alignment (binned topic models) to group related documents into automatically generated topics and align the topics across a time range. Additionally, we present TopicFlow, an interactive tool to visualize the evolution of these topics. The tool was developed using an iterative design process based on feedback from expert reviewers. We demonstrate the utility of the tool with a detailed analysis of a corpus of data collected over the period of an academic conference, and we demonstrate the effectiveness of this visualization for reasoning about large data through a usability study with 18 participants.
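The binning step implied by "binned topic models" can be sketched as follows. This is a minimal illustration only: the abstract does not say how bins are chosen, so equal-width time bins are an assumption here, and the function name `bin_documents` and the sample documents are hypothetical. A topic model would then be fitted to each bin separately.

```python
from datetime import datetime


def bin_documents(docs, num_bins):
    """Split (timestamp, text) pairs into num_bins equal-width time bins.

    Equal-width bins are an assumption of this sketch, not a detail
    taken from the chapter.
    """
    if not docs:
        return []
    times = [t for t, _ in docs]
    start, end = min(times), max(times)
    span = (end - start).total_seconds()
    bins = [[] for _ in range(num_bins)]
    for t, text in docs:
        if span == 0:
            idx = 0  # all documents share one timestamp
        else:
            idx = int((t - start).total_seconds() / span * num_bins)
        # The latest document would land one past the end; clamp it.
        idx = min(idx, num_bins - 1)
        bins[idx].append(text)
    return bins


# Hypothetical sample data: four short posts from one conference day.
sample_docs = [
    (datetime(2015, 1, 1, 9, 0), "keynote starting"),
    (datetime(2015, 1, 1, 10, 0), "great keynote"),
    (datetime(2015, 1, 1, 15, 0), "poster session"),
    (datetime(2015, 1, 1, 17, 0), "closing remarks"),
]
sample_bins = bin_documents(sample_docs, 4)
```

With this sample data the first and last bins each hold two documents and the middle two are empty, which shows why bin width may need tuning for bursty data.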

Footnotes
1
This work is an extension of our prior work [13], in which we originally introduced TopicFlow as a Twitter analysis tool. A video demonstrating this work can be found here: https://www.youtube.com/watch?v=qqIlvMOQaOE&feature=youtu.be
 
2
For TopicFlow, the number of topics is adjustable with a default of 15 to balance granularity and comprehensibility of the resulting topics.
 
3
For this implementation the LDA algorithm runs for 100 iterations with \(\alpha =0.5\) and \(\beta =0.5\).
 
4
For example, Twitter-specific stop words include {rt, retweet, etc.} and Spanish stop words include {el, la, tu, etc.}.
 
5
\(\cos(A, B) = \frac{A \cdot B}{\left\| A \right\| \left\| B \right\|}\).
 
6
For prototyping and evaluation purposes, the threshold was set between 0.15 and 0.25 depending on the dataset.
 
7
A prototype of the TopicFlow tool is available for demo here: http://www.cs.umd.edu/~maliks/topicflow/TopicFlow.html.
 
8
Twitter’s open API and the richness of tweet metadata, specifically time stamps, make Twitter an appropriate data source for prototyping and testing.
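
The cosine similarity in footnote 5 and the threshold in footnote 6 suggest how topics in adjacent time bins might be linked. The sketch below is an illustration under stated assumptions, not the authors' implementation: topics are represented as sparse word-to-weight dictionaries, a link is kept when the similarity is greater than or equal to the threshold (the footnotes do not say whether the comparison is strict), and the names `cosine`, `align_topics`, and the sample topics are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse word -> weight dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty topic is similar to nothing
    return dot / (norm_a * norm_b)


def align_topics(prev_topics, next_topics, threshold=0.2):
    """Return (i, j, similarity) for topic pairs at or above threshold.

    The default of 0.2 is simply a value inside the 0.15-0.25 range
    mentioned in footnote 6; using >= is an assumption of this sketch.
    """
    links = []
    for i, a in enumerate(prev_topics):
        for j, b in enumerate(next_topics):
            sim = cosine(a, b)
            if sim >= threshold:
                links.append((i, j, sim))
    return links


# Hypothetical topic-word weights for two adjacent time bins.
bin1 = [{"visualization": 0.6, "data": 0.4}, {"coffee": 0.7, "break": 0.3}]
bin2 = [{"visualization": 0.5, "talk": 0.5}, {"lunch": 0.8, "break": 0.2}]
links = align_topics(bin1, bin2, threshold=0.2)
```

In this example only the two "visualization" topics are linked (similarity about 0.59); the two "break" topics share a word but fall below the threshold (about 0.10), so no link is drawn between them.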
 
References
1.
Blei DM, Lafferty JD (2006) Dynamic topic models. In: Proceedings of 23rd international conference on machine learning. ACM Press, New York, pp 113–120
2.
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
4.
Cui W, Liu S, Tan L, Shi C, Song Y, Gao Z, Qu H, Tong X (2011) TextFlow: towards better understanding of evolving topics in text. IEEE Trans Vis Comput Graph 17(12):2412–2421
5.
Hart S, Staveland L (1988) Development of NASA-TLX (Task Load Index): results of empirical and theoretical research. Hum Mental Workload 1:139–183
6.
Havre S, Hetzler B, Nowell L (2000) ThemeRiver: visualizing theme changes over time. In: Proceedings of IEEE symposium on information visualization, pp 115–123
7.
Hu Y, Boyd-Graber J, Satinoff B, Smith A (2013) Interactive topic modeling. Mach Learn J 95:423–469
8.
Kleinberg J (2003) Bursty and hierarchical structure in streams. Data Min Knowl Discov 7:373–397
10.
Leskovec J, Backstrom L, Kleinberg J (2009) Meme-tracking and the dynamics of the news cycle. In: Proceedings of 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 497–506
11.
Lin J (1991) Divergence measures based on the Shannon entropy. IEEE Trans Inf Theory 37(1):145–151
12.
Liu Y, Niculescu-Mizil A, Gryc W (2009) Topic-link LDA: joint models of topic and author community. In: Proceedings of 26th annual international conference on machine learning. ACM Press, New York, pp 665–672
13.
Malik S, Smith A, Hawes T, Dunne C, Papadatos P, Li J, Shneiderman B (2013) TopicFlow: visualizing topic alignment of Twitter data over time. In: The 2013 IEEE/ACM international conference on advances in social networks analysis and mining
14.
Mimno D, McCallum A (2007) Organizing the OCA: learning faceted subjects from a library of digital books. In: Proceedings of the 7th ACM/IEEE-CS joint conference on digital libraries. ACM Press, New York, pp 376–385
15.
Nikulin M (2001) Hazewinkel, Michiel, encyclopaedia of mathematics: an updated and annotated translation of the Soviet. Mathematical encyclopaedia. Reidel Sold and distributed in the U.S.A. and Canada. Kluwer Academic, Boston
16.
O’Brien WL (2012) Preliminary investigation of the use of Sankey diagrams to enhance building performance simulation-supported design. In: Proceedings of 2012 symposium on simulation for architecture and urban design. Society for Computer Simulation International, San Diego, pp 15:1–15:8
17.
Ramage D, Hall D, Nallapati R, Manning CD (2009) Labeled LDA: a supervised topic model for credit attribution in multi-labeled corpora. In: Proceedings of 2009 conference on empirical methods in natural language processing, vol 1. Association for Computational Linguistics, New York, pp 248–256
19.
Sopan A, Rey P, Butler B, Shneiderman B (2012) Monitoring academic conferences: real-time visualization and retrospective analysis of backchannel conversations. In: ASE international conference on social informatics, pp 63–69
20.
Tan PN, Steinbach M, Kumar V (2005) Introduction to data mining, 1st edn. Addison Wesley, New York
21.
Teh YW, Jordan MI, Beal MJ, Blei DM (2006) Hierarchical Dirichlet processes. J Am Stat Assoc 101:1566–1581
22.
Wang X, McCallum A (2006) Topics over time: a non-Markov continuous-time model of topical trends. In: Proceedings of 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 424–433
23.
Wilbur WJ, Sirotkin K (1992) The automatic identification of stop words. J Inf Sci 18(1):45–55
24.
Zhai K, Boyd-Graber J, Asadi N, Alkhouja M (2012) Mr. LDA: a flexible large scale topic modeling package using variational inference in MapReduce. In: ACM international conference on world wide web
Metadata
Title
Visual Analysis of Topical Evolution in Unstructured Text: Design and Evaluation of TopicFlow
Authors
Alison Smith
Sana Malik
Ben Shneiderman
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-19003-7_9