
2018 | OriginalPaper | Chapter

Movie Genre Detection Using Topological Data Analysis

Authors : Pratik Doshi, Wlodek Zadrozny

Published in: Statistical Language and Speech Processing

Publisher: Springer International Publishing


Abstract

We show that by applying discourse features derived through topological data analysis (TDA), namely homological persistence, we can improve classification results on the task of movie genre detection, including the identification of overlapping movie genres. On the IMDB dataset we improve on prior art: we increase the Jaccard score by 4.7% over the recent results of Hoang. We also significantly improve the F-score (by over 15%) and slightly improve the hit rate (by 0.5%, relative to the same baseline). We see our contribution as threefold: (a) for a general audience of computational linguists, we want to raise awareness of topology as a possible source of semantic features; (b) for researchers using machine learning for NLP tasks, we propose the use of topological features when the number of training examples is small; and (c) for those already aware of computational topology, we see this work as contributing to the discussion about the value of topology for NLP, in view of the mixed results reported by others.
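The abstract does not say how the persistence features are computed. As a minimal illustration of the general idea, and not the authors' pipeline, the sketch below assumes each document has already been turned into a point cloud (for example, one vector per sentence of a plot summary; that embedding step is an assumption and is not shown). It computes the 0-dimensional persistence barcode of the point cloud's Vietoris-Rips filtration with a union-find sweep over edges sorted by length. The summary function `persistence_features` and its four statistics are hypothetical choices for turning a barcode into fixed-length classifier inputs.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return the death times of 0-dimensional homology classes (connected
    components) in the Vietoris-Rips filtration of a point cloud.

    Every point is born at filtration value 0; when two components merge at
    edge length r, one of them dies at r. The one component that never dies
    (infinite persistence) is omitted from the result.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sweep all pairwise edges in order of Euclidean length (Kruskal-style).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(r)  # one component dies at scale r
            if len(deaths) == n - 1:
                break  # everything is connected; no more finite bars
    return deaths


def persistence_features(points):
    """Hypothetical fixed-length summary of the H0 barcode, meant to be used
    as extra columns next to ordinary text features in a classifier."""
    deaths = h0_persistence(points)
    if not deaths:
        return [0.0, 0.0, 0.0, 0.0]
    return [
        float(len(deaths)),         # number of finite bars
        sum(deaths),                # total persistence
        max(deaths),                # longest finite bar
        sum(deaths) / len(deaths),  # mean bar length
    ]


# Toy point cloud: two close points and one far away.
print(h0_persistence([(0, 0), (1, 0), (5, 0)]))        # [1.0, 4.0]
print(persistence_features([(0, 0), (1, 0), (5, 0)]))  # [2.0, 5.0, 4.0, 2.5]
```

The H0 case reduces to single-linkage clustering, which is why a union-find sweep is enough here; higher-dimensional features (H1 loops, H2 voids) require building the full simplicial complex, which is usually delegated to a dedicated TDA library.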


Literature
1. Bamman, D., O’Connor, B., Smith, N.A.: Learning latent personas of film characters. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), p. 352 (2014)
2. Brown, K.A., Knudson, K.P.: Nonlinear statistics of human speech data. Int. J. Bifurcat. Chaos 19(07), 2307–2319 (2009)
4. De Silva, V., Ghrist, R.: Homological sensor networks. Not. Am. Math. Soc. 54(1) (2007)
5. De Silva, V., Ghrist, R.: Coverage in sensor networks via persistent homology. Algebraic Geom. Topol. 7(1), 339–358 (2007)
7. Edelsbrunner, H., Harer, J.: Computational Topology: An Introduction. American Mathematical Society, Providence (2010)
8. Edelsbrunner, H., Letscher, D., Zomorodian, A.: Topological persistence and simplification. In: 2000 Proceedings of 41st Annual Symposium on Foundations of Computer Science, pp. 454–463. IEEE (2000)
9. Freedman, D., Chen, C.: Algebraic topology for computer vision. Comput. Vis. 239–268 (2009)
10. Gamble, J., Heo, G.: Exploring uses of persistent homology for statistical analysis of landmark-based shape data. J. Multivariate Anal. 101(9), 2184–2199 (2010)
11. Guan, H., Tang, W., Krim, H., Keiser, J., Rindos, A., Sazdanovic, R.: A topological collapse for document summarization. In: 2016 IEEE 17th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pp. 1–5. IEEE (2016)
12. Guss, W.H., Salakhutdinov, R.: On characterizing the capacity of neural networks using algebraic topology. arXiv preprint arXiv:1802.04443 (2018)
14. Hull, D.A.: Stemming algorithms: a case study for detailed evaluation. J. Am. Soc. Inf. Sci. 47(1), 70–84 (1996)
15. Kasson, P.M., Zomorodian, A., Park, S., Singhal, N., Guibas, L.J., Pande, V.S.: Persistent voids: a new structural metric for membrane fusion. Bioinformatics 23(14), 1753–1759 (2007)
16. Liu, J.Y., Jeng, S.K., Yang, Y.H.: Applying topological persistence in convolutional neural network for music audio signals. arXiv preprint arXiv:1608.07373 (2016)
17. Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 142–150. Association for Computational Linguistics (2011)
18. Michel, P., Ravichander, A., Rijhwani, S.: Does the geometry of word embeddings help document classification? A case study on persistent homology based representations. arXiv preprint arXiv:1705.10900 (2017)
19. Pang, B., Lee, L.: Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pp. 115–124. Association for Computational Linguistics (2005)
20. Sami, I.R., Farrahi, K.: A simplified topological representation of text for local and global context. In: Proceedings of the 2017 ACM on Multimedia Conference, pp. 1451–1456. ACM (2017)
21. Singh, G., Memoli, F., Ishkhanov, T., Sapiro, G., Carlsson, G., Ringach, D.L.: Topological analysis of population activity in visual cortex. J. Vis. 8(8), 11–11 (2008)
22. Wasserman, L.: Topological data analysis. Ann. Rev. Stat. Appl. (2016)
23. Zhu, X.: Persistent homology: an introduction and a new text representation for natural language processing. In: IJCAI, pp. 1953–1959 (2013)
Metadata
Title: Movie Genre Detection Using Topological Data Analysis
Authors: Pratik Doshi, Wlodek Zadrozny
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-030-00810-9_11
