2015 | OriginalPaper | Chapter

Multilingual Documents Clustering Based on Closed Concepts Mining

Authors: Mohamed Chebel, Chiraz Latiri, Eric Gaussier

Published in: Database and Expert Systems Applications

Publisher: Springer International Publishing

Abstract

The scarcity of bilingual and multilingual parallel corpora has prompted many researchers to stress the need for new methods that enhance the quality of comparable corpora. In this paper, we highlight the usefulness of Formal Concept Analysis in multilingual document clustering as a means of improving corpus comparability. We propose a statistical approach to clustering multilingual documents, based on multilingual Closed Concepts Mining, that partitions documents belonging to one or more collections and written in more than one language into a set of classes. Experimental evaluation was conducted on two collections and showed a significant improvement in the comparability of the generated classes.
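To make the notion of a closed concept concrete, the following is a minimal sketch of formal concept enumeration on a toy document-term context. It is an illustration only and not the authors' algorithm: the document ids, the terms, the assumption that terms from both languages are already mapped to a shared vocabulary, and the helper names `extent`, `intent` and `closed_concepts` are all hypothetical, and the enumeration is brute force rather than an efficient closed-itemset miner.

```python
from itertools import combinations

# Toy formal context (hypothetical data): document id -> set of index terms.
# Assumption: terms of both languages are already mapped to one shared vocabulary.
context = {
    "en_1": {"economy", "bank"},
    "fr_1": {"economy", "bank", "euro"},
    "en_2": {"football", "goal"},
    "fr_2": {"football", "goal", "euro"},
}


def extent(terms, ctx):
    """Return the documents that contain every term in `terms`."""
    return frozenset(doc for doc, doc_terms in ctx.items() if terms <= doc_terms)


def intent(docs, ctx):
    """Return the terms shared by every document in the non-empty group `docs`."""
    return frozenset(set.intersection(*(ctx[doc] for doc in docs)))


def closed_concepts(ctx):
    """Enumerate (extent, intent) pairs by closing every non-empty document group.

    Brute force and exponential in the number of documents, so only suitable
    for toy inputs. The concept with an empty extent is not generated.
    """
    concepts = set()
    docs = list(ctx)
    for size in range(1, len(docs) + 1):
        for group in combinations(docs, size):
            shared_terms = intent(group, ctx)
            concepts.add((extent(shared_terms, ctx), shared_terms))
    return concepts


concepts = closed_concepts(context)
for docs, terms in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(docs), sorted(terms))
```

In this sketch, a concept whose extent mixes documents from both languages, such as the one pairing `en_1` and `fr_1` on their shared terms, is the kind of grouping that could serve as a candidate class; how the paper actually selects and scores classes is not stated in the abstract.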

Metadata
Title
Multilingual Documents Clustering Based on Closed Concepts Mining
Authors
Mohamed Chebel
Chiraz Latiri
Eric Gaussier
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-22849-5_36
