
2017 | OriginalPaper | Book chapter

Multi-perspective Hierarchical Dirichlet Process for Geographical Topic Modeling

Written by: Yuan He, Cheng Wang, Changjun Jiang

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing


Abstract

The spread of location acquisition technology has strongly propelled the popularity of geo-tagged user-generated content (UGC), which in turn opens new computational possibilities for investigating geographical topics and users' spatial behaviors. This paper proposes a novel method for geographical topic modeling that combines text content with user information and spatial knowledge. Topics are estimated as the interests of users and the features of locations. The joint modeling of these three heterogeneous sources (1) leads to high accuracy in predicting visit behaviors driven by personal interests, (2) discovers coherent topic representations, and (3) enables a recommender system to suggest interpretable locations. Our framework is flexible enough to incorporate new dimensions of data, such as temporal information, without substantially changing the model structure. We also demonstrate experimentally the limitations of the traditional assumption that the choice of topic depends heavily on the location. In many cases, the published topics are affected mainly by the user's interests rather than by the current location. Our model discriminates between these two scenarios. By employing a hierarchical Dirichlet process, we also need not predefine the number of topics, as other mixture models require. Experiments on three different datasets show that our model is effective in discovering spatial topics and significantly outperforms the state of the art.
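The abstract's remark that the number of topics need not be fixed in advance can be illustrated with the Chinese restaurant process, the standard clustering view of a single Dirichlet process. The sketch below is only an illustration of that general property, not the authors' multi-perspective model or their inference procedure; the function name `crp_assignments` and the concentration parameter `alpha` are assumptions introduced here.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster assignments from a Chinese restaurant process.

    Hypothetical helper for illustration only. Item i (0-based) joins an
    existing cluster k with probability counts[k] / (i + alpha) and opens
    a new cluster with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of items currently in cluster k
    assignments = []   # assignments[i] = cluster index chosen for item i
    for _ in range(n_items):
        # Unnormalized weights: one per existing cluster, plus alpha for a new one.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    for n in (10, 100, 1000):
        _, counts = crp_assignments(n, alpha=1.0)
        print(n, "items ->", len(counts), "clusters")
```

The number of clusters is an outcome of sampling rather than an input, and it tends to grow slowly as more items arrive. A hierarchical Dirichlet process shares such clusters across groups of data, which this single-level sketch does not attempt to show.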


Metadata
Title
Multi-perspective Hierarchical Dirichlet Process for Geographical Topic Modeling
Written by
Yuan He
Cheng Wang
Changjun Jiang
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-57454-7_63