ABSTRACT
This study explores language's fragmenting effect on user-generated content by examining the diversity of knowledge representations across 25 different Wikipedia language editions. This diversity is measured at two levels: the concepts that are included in each edition and the ways in which these concepts are described. We demonstrate that the diversity present is greater than has been presumed in the literature and has a significant influence on applications that use Wikipedia as a source of world knowledge. We close by explicating how knowledge diversity can be beneficially leveraged to create "culturally-aware applications" and "hyperlingual applications".
Index Terms
- The tower of Babel meets web 2.0: user-generated content and its applications in a multilingual context