1 Introduction
2 Background
2.1 Latent Dirichlet allocation (LDA) model and Author Topic (AT) model
2.2 Dynamic Topic (DT) model, Sequential LDA (S-LDA) model, and Sequential Entity Group Topic (SEGT) model
2.3 Topics Over Time (TOT) model and Temporal Author Topic (TAT) model
Notation | Meaning of notation |
---|---|
D | Number of documents |
N | Number of words |
T | Number of topics |
A | Number of authors |
V | Size of vocabulary |
Y | Number of time spans (years, in this paper) |
\(\alpha \) | A T-dimensional Dirichlet prior vector for \(\theta \) of the authors |
\(\beta \) | A V-dimensional Dirichlet prior vector for \(\phi \) of the topics |
\(\gamma \) | A Y-dimensional Dirichlet prior vector for \(\psi \) of the topics |
\(\theta \) | T-dimensional topic distributions of the authors |
\(\psi \) | Y-dimensional time span distributions of the topics |
\(\phi \) | The topics (the V-dimensional probability distributions over the vocabulary) |
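The following minimal Python sketch shows one plausible TAT-style generative process, to make the roles of \(\theta\), \(\phi\) and \(\psi\) concrete. It rests on two assumptions that this section does not spell out. First, each word's author is drawn uniformly from the document's author list, as in the AT model. Second, each word also draws a time span from the \(\psi\) of its topic, a per-word time draw in the spirit of TOT. The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def sample_categorical(rng: random.Random, probs: Sequence[float]) -> int:
    """Draw an index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(
    rng: random.Random,
    authors: List[int],          # observed author list of the document
    n_words: int,                # number of words to generate
    theta: List[List[float]],    # theta[a][t]: topic distribution of author a
    phi: List[List[float]],      # phi[t][v]: word distribution of topic t
    psi: List[List[float]],      # psi[t][y]: time-span distribution of topic t
) -> List[Tuple[int, int, int, int]]:
    """Return (author, topic, word, time span) tuples for one synthetic document.

    Hypothetical sketch: the uniform author draw follows the AT model, and the
    per-word time-span draw from psi is an assumption about TAT.
    """
    tokens = []
    for _ in range(n_words):
        a = rng.choice(authors)                # author chosen uniformly
        t = sample_categorical(rng, theta[a])  # topic ~ theta_a
        v = sample_categorical(rng, phi[t])    # word ~ phi_t
        y = sample_categorical(rng, psi[t])    # time span ~ psi_t (assumed)
        tokens.append((a, t, v, y))
    return tokens
```

Seeding a `random.Random` instance keeps toy runs reproducible. With toy parameters in which each \(\psi_t\) puts all its mass on a single year, the sampled time span is fully determined by the sampled topic.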
3 Finding topic flows of authors over time
3.1 Problem definition and contributions
3.2 Author Topic-Flow (ATF) model
Notation | Meaning of notation |
---|---|
Y | Number of time spans (years, in this paper) |
D | Number of documents |
N | Number of words |
T | Number of topics |
A | Number of authors |
V | Size of vocabulary |
\(\alpha _a\) | A T-dimensional Dirichlet prior vector for \(\theta \) of the ath author |
\(\beta \) | A V-dimensional Dirichlet prior vector for \(\phi \) |
\(\gamma _a\) | A Y-dimensional Dirichlet prior vector for \(\psi \) of the ath author |
\(\theta _{ay}\) | A T-dimensional topic distribution of the ath author for the yth year |
\(\psi _a\) | A Y-dimensional time span distribution of the ath author |
\(\phi _t\) | The tth topic (the V-dimensional probability distribution over the vocabulary) |
\(a_d\) | An observed author list of the dth document |
\(y_d\) | The observed time tag (e.g., the year) of the dth document |
\(a_{dn}\) | An author assignment of the nth word in the dth document |
\(z_{dn}\) | A topic assignment of the nth word in the dth document |
\(\mathbf{z}_{-dn}\) | A vector of topic assignments for all words except the nth word of the dth document |
\(w_{dn}\) | The observed nth word in the dth document |
\(\mathbf{w}\) | A vector of all the observed words |
\(C^{AYT}_{ayt}\) | The number of words that are assigned to the tth topic and the ath author, within the yth year |
\(C^{AY}_{ay}\) | The number of words that are assigned to the ath author within the yth year |
\(C^{TV}_{tv}\) | The number of occurrences of the vth vocabulary word that are assigned to the tth topic, across all documents |
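The count notation above suggests the usual collapsed-Gibbs point estimates for an AT-style model whose topic distributions are specific to each author and year. Writing \(C_{tv}\) for the topic-word count in the last row of the table, these estimates are \(\hat\theta_{ayt} = (C^{AYT}_{ayt} + \alpha_{at}) / (C^{AY}_{ay} + \sum_{t'} \alpha_{at'})\) and \(\hat\phi_{tv} = (C_{tv} + \beta_v) / (\sum_{v'} C_{tv'} + \sum_{v'} \beta_{v'})\). Treat these formulas as an assumption, not as ATF's exact update equations. The nested-list layout and the function names in the sketch below are hypothetical.

```python
from typing import List


def estimate_theta(
    c_ayt: List[List[List[int]]],  # c_ayt[a][y][t]: words of author a in year y assigned to topic t
    alpha: List[List[float]],      # alpha[a][t]: entries of the per-author prior alpha_a
) -> List[List[List[float]]]:
    """Smoothed topic distribution theta_{ay} for every author a and year y (assumed form)."""
    theta = []
    for a, per_year in enumerate(c_ayt):
        alpha_sum = sum(alpha[a])
        rows = []
        for counts in per_year:
            # C^{AY}_{ay}: each word of author a in year y carries exactly one topic
            c_ay = sum(counts)
            rows.append([(c + alpha[a][t]) / (c_ay + alpha_sum)
                         for t, c in enumerate(counts)])
        theta.append(rows)
    return theta


def estimate_phi(
    c_tv: List[List[int]],  # c_tv[t][v]: occurrences of word v assigned to topic t
    beta: List[float],      # beta[v]: entries of the V-dimensional prior
) -> List[List[float]]:
    """Smoothed word distribution phi_t for every topic t (assumed form)."""
    beta_sum = sum(beta)
    phi = []
    for counts in c_tv:
        total = sum(counts)
        phi.append([(c + beta[v]) / (total + beta_sum) for v, c in enumerate(counts)])
    return phi
```

An author with no words in a given year falls back to the normalized prior. For example, a symmetric \(\alpha_a = (1, 1)\) yields \(\hat\theta_{ay} = (0.5, 0.5)\).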
4 Experiments
4.1 Description of dataset and environment
4.2 Topic discovery
4.3 Topic-wise research interest over years
Topic | Descriptive languages | | Robot group control | |
---|---|---|---|---|
Model | TAT | ATF | TAT | ATF |
Top word 1 | Ontology | Query | Robot | Robot |
Top word 2 | Owl | Ontology | Group | Group |
Top word 3 | Logic | Owl | Robots | Control |
Top word 4 | Reason | Logic | Individual | System |
Top word 5 | Query | Dl | Behavior | Task |
Top word 6 | Dl | Reason | Collect | Individual |
Top word 7 | Language | Description | Evolution | Robots |
Top word 8 | Description | Language | Task | Collect |
Top word 9 | Express | Answer | Swarm | Swarm |
Top word 10 | Semantic | Express | Fault | Communication |
Years | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
Ian Horrocks | 10 (8) | 10 (8) | 10 (7) | 10 (6) | 10 (8) |
Bernardo Cuenca | 4 (4) | 5 (4) | 7 (6) | 6 (3) | 3 (2) |
Yevgeny Kazakov | 4 (4) | 1 (1) | 3 (2) | 4 (2) | 2 (1) |
Ulrike Sattler | 8 (8) | 6 (6) | 1 (1) | 2 (2) | 2 (1) |
Birte Glimm | 3 (3) | 2 (2) | 1 (1) | 2 (2) | 3 (3) |
Years | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
Marco Dorigo | 10 (7) | 10 (9) | 10 (8) | 10 (7) | 10 (7) |
Stefano Nolfi | 10 (9) | 4 (3) | 7 (5) | 8 (5) | 2 (2) |
Anders Lyhne Christensen | 4 (4) | 3 (3) | 3 (3) | 6 (4) | 4 (2) |
Rehan O’Grady | 3 (3) | 3 (3) | 2 (2) | 4 (3) | 3 (3) |
Christos Ampatzis | 2 (2) | 1 (1) | 3 (1) | 4 (2) | 3 (3) |
4.4 Author-wise research interest over years
Years | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
Robot group control | 2 | 1 | 1 | 2 | 3 |
Network | 0 | 0 | 0 | 2 | 0 |
Years | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
Rank 1 | Ian Horrocks | Ian Horrocks | Ian Horrocks | Ian Horrocks | Ian Horrocks |
Rank 2 | Ulrike Sattler | Bernardo Cuenca | Bernardo Cuenca | Yevgeny Kazakov | Carsten Lutz |
Rank 3 | Birte Glimm | Ulrike Sattler | Carsten Lutz | Bernardo Cuenca | Birte Glimm |
Rank 4 | Bernardo Cuenca | Birte Glimm | Yevgeny Kazakov | Birte Glimm | Bernardo Cuenca |
Rank 5 | Yevgeny Kazakov | Carsten Lutz | Birte Glimm | Carsten Lutz | Yevgeny Kazakov |
Years | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
Rank 1 | Ulrike Sattler | Ian Horrocks | Boris Motik | Ian Horrocks | Ian Horrocks |
Rank 2 | Ian Horrocks | Ulrike Sattler | Bernardo Cuenca | Boris Motik | Frank Wolter |
Rank 3 | Yevgeny Kazakov | Bernardo Cuenca | Ian Horrocks | Bernardo Cuenca | Ilianna Kollia |
Rank 4 | Birte Glimm | Boris Motik | Carsten Lutz | Yevgeny Kazakov | Carsten Lutz |
Rank 5 | Bernardo Cuenca | Birte Glimm | Yevgeny Kazakov | Giorgos Stoilos | Birte Glimm |
2007 | | 2008 | | 2009 | | 2010 | | 2011 | |
---|---|---|---|---|---|---|---|---|---|
Ian Horrocks | 10 (8) | Ian Horrocks | 10 (8) | Ian Horrocks | 10 (7) | Ian Horrocks | 10 (6) | Ian Horrocks | 10 (8) |
Ulrike Sattler | 8 (8) | Ulrike Sattler | 6 (6) | Bernardo Cuenca | 7 (6) | Boris Motik | 5 (4) | Frank Wolter | 4 (4) |
Yevgeny Kazakov | 4 (4) | Bernardo Cuenca | 5 (4) | Boris Motik | 7 (6) | Bernardo Cuenca | 6 (3) | Carsten Lutz | 4 (4) |
Bernardo Cuenca | 4 (4) | Boris Motik | 4 (3) | Carsten Lutz | 4 (3) | Giorgos Stoilos | 3 (3) | Ilianna Kollia | 3 (3) |
Birte Glimm | 3 (3) | Birte Glimm | 2 (2) | Frank Wolter | 3 (3) | Yevgeny Kazakov | 4 (2) | Birte Glimm | 3 (3) |
4.5 Author ranking
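This section does not spell out the ranking procedure. As a hedged illustration only, one simple rule ranks authors for a topic t in year y by \(C^{AYT}_{ayt}\), the number of their words in year y that are assigned to t. The sketch below does this and breaks ties alphabetically so the output is deterministic. The scoring rule, the dictionary layout, and the `rank_authors` helper are assumptions, and the rule is not necessarily the one that produced the ranking tables.

```python
from typing import Dict, List, Tuple


def rank_authors(
    c_ayt: Dict[str, List[List[int]]],  # hypothetical layout: author name -> counts[y][t]
    year: int,
    topic: int,
    k: int = 5,
) -> List[Tuple[str, int]]:
    """Return the top-k (author, count) pairs for one topic in one year."""
    scored = [(name, counts[year][topic]) for name, counts in c_ayt.items()]
    # Highest count first; equal counts are ordered by author name.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Toy counts for illustration only (not data from the paper).
toy = {"A": [[5, 0]], "B": [[7, 1]], "C": [[5, 2]]}
print(rank_authors(toy, year=0, topic=0, k=2))  # [('B', 7), ('A', 5)]
```

Replacing the raw count with the estimated share \(\hat\theta_{ayt}\) would favour authors who focus on the topic over those who simply write more. The source does not say which behaviour the experiments intend.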
4.6 Finding authors similar to a particular author
2007 | | 2008 | | 2009 | | 2010 | | 2011 | |
---|---|---|---|---|---|---|---|---|---|
Yevgeny Kazakov | 4 | Bernardo Cuenca | 5 | Bernardo Cuenca | 6 | Bernardo Cuenca | 4 | Carsten Lutz | 0 |
Bernardo Cuenca | 4 | Birte Glimm | 4 | Carsten Lutz | 0 | Yevgeny Kazakov | 3 | Birte Glimm | 3 |
Birte Glimm | 3 | Ulrike Sattler | 6 | Yevgeny Kazakov | 0 | Carsten Lutz | 0 | Bernardo Cuenca | 3 |
2007 | | 2008 | | 2009 | | 2010 | | 2011 | |
---|---|---|---|---|---|---|---|---|---|
Ulrike Sattler | 8 | Ulrike Sattler | 6 | Bernardo Cuenca | 6 | Boris Motik | 4 | Frank Wolter | 4 |
Yevgeny Kazakov | 4 | Bernardo Cuenca | 5 | Carsten Lutz | 0 | Yevgeny Kazakov | 3 | Birte Glimm | 3 |
Birte Glimm | 3 | Birte Glimm | 4 | Hector Perez | 2 | Bernardo Cuenca | 4 | Bernardo Cuenca | 3 |
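This section does not state which similarity measure is used. The Jensen-Shannon divergence is a common choice for comparing topic distributions. The sketch below assumes that authors are compared through their per-year topic distributions \(\theta_{ay}\) and returns the k authors whose distributions diverge least from a target author's. The measure, the dictionary layout, and the function names are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetric Jensen-Shannon divergence; m_i >= p_i / 2, so no division by zero."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def most_similar(
    target: str,
    theta_y: Dict[str, List[float]],  # hypothetical: author -> theta_{ay} for one year y
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k authors whose topic distribution is closest to the target's."""
    base = theta_y[target]
    scored = [(name, js_divergence(base, dist))
              for name, dist in theta_y.items() if name != target]
    # Smallest divergence first; ties ordered by author name.
    scored.sort(key=lambda item: (item[1], item[0]))
    return scored[:k]
```

Running `most_similar` once per year gives a list of similar authors for each year, matching the year-by-year layout of the tables above. Unlike KL divergence, the Jensen-Shannon divergence is symmetric and stays finite when one distribution assigns zero probability to a topic.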
4.7 Author prediction on unseen documents
4.8 Efficiency of models
# of topics | 10 | 30 | 50 | 70 | 90 | Average |
---|---|---|---|---|---|---|
AT model | 126.8 | 372.9 | 568.6 | 856.3 | 1013.1 | 587.54 |
TAT model | 147.2 | 520.9 | 682.3 | 981.3 | 1212.1 | 708.76 |
ATF model | 132.7 | 418.8 | 617.9 | 941.7 | 1105.1 | 643.24 |
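The Average column is the arithmetic mean over the five topic settings. The short check below recomputes it from the per-setting values in the efficiency table, keeping the table's own (unstated) units and rounding to two decimals as the table does. For the ATF model this gives 643.24.

```python
from typing import Dict, List, Sequence

# Per-setting values copied from the efficiency table (10, 30, 50, 70, 90 topics).
VALUES: Dict[str, List[float]] = {
    "AT model": [126.8, 372.9, 568.6, 856.3, 1013.1],
    "TAT model": [147.2, 520.9, 682.3, 981.3, 1212.1],
    "ATF model": [132.7, 418.8, 617.9, 941.7, 1105.1],
}


def average(values: Sequence[float]) -> float:
    """Arithmetic mean, rounded to two decimals as in the table."""
    return round(sum(values) / len(values), 2)


for model, values in VALUES.items():
    print(f"{model}: {average(values)}")
```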