
2016 | OriginalPaper | Chapter

Languages of Russia: Using Social Networks to Collect Texts

Authors: Irina Krylova, Boris Orekhov, Ekaterina Stepanova, Lyudmila Zaydelman

Published in: Information Retrieval

Publisher: Springer International Publishing


Abstract

In this paper we outline a method for finding texts in minor languages of Russia on social networks, using VKontakte as an example. We identify language-specific markers: tokens that contain letter combinations unique to a given language and that occur with high frequency in texts in that language. We use Yandex.XML to generate lists of web pages that contain texts in these languages. We then download data from pages in the https://vk.com domain through the VKontakte API.
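The marker-selection step can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything here is an assumption made for illustration: the function names (`tokenize`, `char_ngrams`, `find_markers`), the use of character bigrams as "letter combinations", the comparison against a small set of texts in other languages, and ranking by raw token frequency. Querying Yandex.XML with the resulting markers and downloading pages through the VKontakte API are not shown.

```python
from collections import Counter
import re

# Assumption: a token is a maximal run of Unicode letters.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Lowercase the text and split it into letter-only tokens."""
    return TOKEN_RE.findall(text.lower())


def char_ngrams(token, n):
    """Return the set of character n-grams contained in a token."""
    return {token[i:i + n] for i in range(len(token) - n + 1)}


def find_markers(target_texts, other_texts, n=2, top_k=5):
    """Pick frequent target-language tokens that contain at least one
    character n-gram never seen in the comparison texts.

    Hypothetical helper: n and top_k are illustrative defaults,
    not values taken from the paper.
    """
    # Token frequencies in the target language sample.
    target_counts = Counter(
        tok for text in target_texts for tok in tokenize(text)
    )

    # Every n-gram that occurs in the other languages.
    other_ngrams = set()
    for text in other_texts:
        for tok in tokenize(text):
            other_ngrams |= char_ngrams(tok, n)

    # Keep tokens with at least one n-gram unique to the target language,
    # ordered from most to least frequent.
    markers = [
        (tok, count)
        for tok, count in target_counts.most_common()
        if char_ngrams(tok, n) - other_ngrams
    ]
    return markers[:top_k]
```

In this sketch a token qualifies as a marker only if both conditions from the abstract hold: it contains a letter combination absent from the comparison texts, and it ranks among the most frequent tokens of the target sample. In practice the comparison set would need to cover Russian and related languages well enough that "unique" is meaningful.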


Metadata
Title
Languages of Russia: Using Social Networks to Collect Texts
Authors
Irina Krylova
Boris Orekhov
Ekaterina Stepanova
Lyudmila Zaydelman
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-41718-9_11