
2024 | OriginalPaper | Book chapter

Filtering Communities in Word Co-Occurrence Networks to Foster the Emergence of Meaning

Authors: Anna Béranger, Nicolas Dugué, Simon Guillot, Thibault Prouteau

Published in: Complex Networks & Their Applications XII

Publisher: Springer Nature Switzerland


Abstract

With SINr, we introduced a way to design graph and word embeddings based on community detection. Unlike deep learning approaches, it requires little compute, and it has been shown to be state-of-the-art in interpretability for word embeddings. In this paper, we investigate how filtering the communities detected on word co-occurrence networks can improve the performance of the approach. Community detection algorithms tend to uncover communities whose sizes follow a power-law distribution. As a consequence, the number of activations per dimension in SINr also follows a power law: a few dimensions are activated by many words, and many dimensions are activated by only a few words. By filtering this distribution, removing part of its head and tail, we improve the embeddings' intrinsic evaluation scores while reducing their dimensionality by a factor of five. We also show that these results are stable across several runs, thus defining a subset of distinctive features that describe a given corpus.
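The head-and-tail filtering described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: we assume the embedding is a dense mapping from each word to a vector in which dimension i stands for detected community i, that a word "activates" a dimension when its value there is non-zero, and that the head and tail are cut as fixed fractions of the dimensions ranked by activation count. The names `activations_per_dimension`, `filter_dimensions`, `head_fraction` and `tail_fraction` are hypothetical, and the toy vectors are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def activations_per_dimension(embeddings: Dict[str, Sequence[float]]) -> Counter:
    """Count, for each dimension, how many words have a non-zero value on it."""
    counts: Counter = Counter()
    for vector in embeddings.values():
        for dim, value in enumerate(vector):
            if value != 0:
                counts[dim] += 1
    return counts


def filter_dimensions(
    embeddings: Dict[str, Sequence[float]],
    head_fraction: float,
    tail_fraction: float,
) -> Tuple[List[int], Dict[str, List[float]]]:
    """Drop the most- and least-activated dimensions (hypothetical helper).

    Assumptions: every vector has the same length, dimension i stands for
    community i, and the head/tail cut is a fixed fraction of all dimensions.
    Returns the indices of the kept dimensions and the filtered vectors.
    """
    if head_fraction < 0 or tail_fraction < 0 or head_fraction + tail_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    if not embeddings:
        return [], {}
    n_dims = len(next(iter(embeddings.values())))
    counts = activations_per_dimension(embeddings)
    # Rank dimensions from most to least activated; ties are broken by index.
    ranked = sorted(range(n_dims), key=lambda d: (-counts[d], d))
    n_head = int(head_fraction * n_dims)  # most-activated dimensions to drop
    n_tail = int(tail_fraction * n_dims)  # least-activated dimensions to drop
    kept = sorted(ranked[n_head:n_dims - n_tail])
    filtered = {word: [vector[d] for d in kept] for word, vector in embeddings.items()}
    return kept, filtered


if __name__ == "__main__":
    # Toy 5-dimensional embedding, invented for illustration only.
    toy = {
        "cat": [0.6, 0.4, 0.0, 0.0, 0.0],
        "dog": [0.5, 0.3, 0.2, 0.0, 0.0],
        "car": [0.7, 0.0, 0.3, 0.0, 0.0],
        "road": [0.8, 0.0, 0.1, 0.1, 0.0],
        "zebra": [0.9, 0.0, 0.0, 0.0, 0.1],
    }
    kept, filtered = filter_dimensions(toy, head_fraction=0.2, tail_fraction=0.2)
    print(kept)              # [1, 2, 3]
    print(filtered["road"])  # [0.0, 0.1, 0.1]
```

In this toy run, dimension 0 (active for every word) is removed as the head and dimension 4 (active for a single word) as the tail. The abstract does not say how the actual thresholds are chosen, and breaking ties among equally activated dimensions by index is an arbitrary choice of this sketch.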


Metadata
Copyright year: 2024
DOI: https://doi.org/10.1007/978-3-031-53468-3_32
