DOI: 10.1145/1753326.1753370
research-article

The tower of Babel meets web 2.0: user-generated content and its applications in a multilingual context

Published: 10 April 2010

ABSTRACT

This study explores language's fragmenting effect on user-generated content by examining the diversity of knowledge representations across 25 different Wikipedia language editions. This diversity is measured at two levels: the concepts that are included in each edition and the ways in which these concepts are described. We demonstrate that the diversity present is greater than has been presumed in the literature and has a significant influence on applications that use Wikipedia as a source of world knowledge. We close by explicating how knowledge diversity can be beneficially leveraged to create "culturally-aware applications" and "hyperlingual applications".

Published in

CHI '10: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
April 2010
2690 pages
ISBN: 9781605589299
DOI: 10.1145/1753326

Copyright © 2010 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 6,199 of 26,314 submissions (24%)
