2018 | Original Paper | Book Chapter

Movie Genre Detection Using Topological Data Analysis

Authors: Pratik Doshi, Wlodek Zadrozny

Published in: Statistical Language and Speech Processing

Publisher: Springer International Publishing

Abstract

We show that by applying discourse features derived through topological data analysis (TDA), namely homological persistence, we can improve classification results on the task of movie genre detection, including identification of overlapping movie genres. On the IMDB dataset we improve on prior results: we increase the Jaccard score by 4.7% over recent results by Hoang. We also significantly improve the F-score (by over 15%) and slightly improve the hit rate (by 0.5%, ibid.). We see our contribution as threefold: (a) for a general audience of computational linguists, we want to increase awareness of topology as a possible source of semantic features; (b) for researchers using machine learning for NLP tasks, we propose the use of topological features when the number of training examples is small; and (c) for those already aware of computational topology, we see this work as contributing to the discussion about the value of topology for NLP, in view of the mixed results reported by others.
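The abstract reports three scores for a task in which one movie can carry several genres at once. As a reading aid, the following minimal Python sketch shows one common way such multi-label scores are computed per movie and then averaged. The definitions used here are assumptions, not taken from the chapter: the Jaccard score is intersection over union of the true and predicted genre sets, the F-score is the per-movie harmonic mean of precision and recall, and the hit rate counts a movie as a hit when at least one predicted genre is correct. The function names and the toy data are hypothetical.

```python
from typing import Callable, List, Set

# Assumed per-example definitions; the chapter may define these scores differently.


def jaccard(true: Set[str], pred: Set[str]) -> float:
    """Intersection over union of two genre sets (1.0 if both are empty)."""
    union = true | pred
    return len(true & pred) / len(union) if union else 1.0


def f1(true: Set[str], pred: Set[str]) -> float:
    """Harmonic mean of precision and recall for a single movie."""
    overlap = len(true & pred)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(true)
    return 2 * precision * recall / (precision + recall)


def hit(true: Set[str], pred: Set[str]) -> float:
    """1.0 if at least one predicted genre is correct, else 0.0."""
    return 1.0 if true & pred else 0.0


def mean_score(
    metric: Callable[[Set[str], Set[str]], float],
    y_true: List[Set[str]],
    y_pred: List[Set[str]],
) -> float:
    """Average a per-movie metric over the whole dataset."""
    return sum(metric(t, p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: three movies with overlapping genres.
Y_TRUE = [{"drama", "romance"}, {"action"}, {"comedy", "crime"}]
Y_PRED = [{"drama"}, {"action", "thriller"}, {"horror"}]

if __name__ == "__main__":
    print("Jaccard :", round(mean_score(jaccard, Y_TRUE, Y_PRED), 3))
    print("F-score :", round(mean_score(f1, Y_TRUE, Y_PRED), 3))
    print("Hit rate:", round(mean_score(hit, Y_TRUE, Y_PRED), 3))
```

On this toy data the hit rate is higher than the Jaccard score, because a partially correct prediction counts as a full hit but earns only partial Jaccard credit. Under these assumed definitions, that is one reason the three scores can move by different amounts for the same classifier.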

References
1. Bamman, D., O’Connor, B., Smith, N.A.: Learning latent personas of film characters. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), p. 352 (2014)
2. Brown, K.A., Knudson, K.P.: Nonlinear statistics of human speech data. Int. J. Bifurcat. Chaos 19(07), 2307–2319 (2009)
4. De Silva, V., Ghrist, R.: Homological sensor networks. Not. Am. Math. Soc. 54(1) (2007)
5. De Silva, V., Ghrist, R.: Coverage in sensor networks via persistent homology. Algebraic Geom. Topol. 7(1), 339–358 (2007)
7. Edelsbrunner, H., Harer, J.: Computational Topology: An Introduction. American Mathematical Society, Providence (2010)
8. Edelsbrunner, H., Letscher, D., Zomorodian, A.: Topological persistence and simplification. In: 2000 Proceedings of 41st Annual Symposium on Foundations of Computer Science, pp. 454–463. IEEE (2000)
9. Freedman, D., Chen, C.: Algebraic topology for computer vision. Comput. Vis. 239–268 (2009)
10. Gamble, J., Heo, G.: Exploring uses of persistent homology for statistical analysis of landmark-based shape data. J. Multivariate Anal. 101(9), 2184–2199 (2010)
11. Guan, H., Tang, W., Krim, H., Keiser, J., Rindos, A., Sazdanovic, R.: A topological collapse for document summarization. In: 2016 IEEE 17th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pp. 1–5. IEEE (2016)
12. Guss, W.H., Salakhutdinov, R.: On characterizing the capacity of neural networks using algebraic topology. arXiv preprint arXiv:1802.04443 (2018)
14. Hull, D.A.: Stemming algorithms: a case study for detailed evaluation. J. Am. Soc. Inf. Sci. 47(1), 70–84 (1996)
15. Kasson, P.M., Zomorodian, A., Park, S., Singhal, N., Guibas, L.J., Pande, V.S.: Persistent voids: a new structural metric for membrane fusion. Bioinformatics 23(14), 1753–1759 (2007)
16. Liu, J.Y., Jeng, S.K., Yang, Y.H.: Applying topological persistence in convolutional neural network for music audio signals. arXiv preprint arXiv:1608.07373 (2016)
17. Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 142–150. Association for Computational Linguistics (2011)
18. Michel, P., Ravichander, A., Rijhwani, S.: Does the geometry of word embeddings help document classification? A case study on persistent homology based representations. arXiv preprint arXiv:1705.10900 (2017)
19. Pang, B., Lee, L.: Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pp. 115–124. Association for Computational Linguistics (2005)
20. Sami, I.R., Farrahi, K.: A simplified topological representation of text for local and global context. In: Proceedings of the 2017 ACM on Multimedia Conference, pp. 1451–1456. ACM (2017)
21. Singh, G., Memoli, F., Ishkhanov, T., Sapiro, G., Carlsson, G., Ringach, D.L.: Topological analysis of population activity in visual cortex. J. Vis. 8(8), 11–11 (2008)
22. Wasserman, L.: Topological data analysis. Ann. Rev. Stat. Appl. (2016)
23. Zhu, X.: Persistent homology: an introduction and a new text representation for natural language processing. In: IJCAI, pp. 1953–1959 (2013)
Metadata
Title
Movie Genre Detection Using Topological Data Analysis
Authors
Pratik Doshi
Wlodek Zadrozny
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-00810-9_11