DOI: 10.1145/2824864.2824872
research-article

IIIT-H System Submission for FIRE2014 Shared Task on Transliterated Search

Published: 05 December 2014

ABSTRACT

This paper describes our submission to the FIRE 2014 Shared Task on Transliterated Search. The shared task comprises two subtasks: Query Word Labeling and Mixed-script Ad hoc Retrieval for Hindi Song Lyrics.

Query Word Labeling involves token-level language identification of the words in code-mixed queries and back-transliteration of the identified Indian-language words into their native scripts. We developed letter-based language models for token-level language identification and a structured perceptron model for back-transliteration of Indic words.
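To illustrate what a letter-based language model for token-level language identification can look like, here is a minimal sketch. It is not the system described in the paper: the class name `CharNgramLM`, the trigram order, the add-one smoothing, and the tiny training word lists are all assumptions chosen for illustration. Each token is assigned the language whose letter model gives it the higher log-probability.

```python
import math
from collections import Counter


class CharNgramLM:
    """Letter n-gram language model with add-one smoothing (illustrative only)."""

    def __init__(self, n=3):
        self.n = n
        self.ngrams = Counter()    # counts of full letter n-grams
        self.contexts = Counter()  # counts of the (n-1)-letter histories
        self.vocab = set()         # letters seen in training, incl. padding

    def _pad(self, word):
        # "^" marks the word start, "$" marks the word end.
        return "^" * (self.n - 1) + word.lower() + "$"

    def train(self, words):
        for word in words:
            padded = self._pad(word)
            self.vocab.update(padded)
            for i in range(len(padded) - self.n + 1):
                gram = padded[i:i + self.n]
                self.ngrams[gram] += 1
                self.contexts[gram[:-1]] += 1

    def log_prob(self, word):
        padded = self._pad(word)
        v = len(self.vocab) + 1  # one extra slot for unseen letters
        total = 0.0
        for i in range(len(padded) - self.n + 1):
            gram = padded[i:i + self.n]
            total += math.log(
                (self.ngrams[gram] + 1) / (self.contexts[gram[:-1]] + v)
            )
        return total


def label_tokens(query, models):
    """Label each whitespace-separated token with the best-scoring language."""
    return [
        (tok, max(models, key=lambda lang: models[lang].log_prob(tok)))
        for tok in query.split()
    ]


# Hypothetical toy training data; a real system would use large word lists.
ENGLISH = ["the", "song", "lyrics", "love", "heart",
           "download", "video", "music", "of", "you"]
HINDI_ROMAN = ["tum", "hai", "dil", "pyaar", "mera",
               "tera", "kya", "nahi", "zindagi", "kabhi"]

models = {"en": CharNgramLM(), "hi": CharNgramLM()}
models["en"].train(ENGLISH)
models["hi"].train(HINDI_ROMAN)

labels = label_tokens("pyaar music", models)
print(labels)
```

With training lists this small the labels are only reliable for words close to the training data; the point is the scoring mechanism, not the accuracy.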

The second subtask, Mixed-script Ad hoc Retrieval for Hindi Song Lyrics, is to retrieve a ranked list of songs from a corpus of Hindi song lyrics given an input query in Devanagari or transliterated Roman script. We used edit-distance-based query expansion and language modeling, followed by relevance-based reranking, to retrieve the Hindi song lyrics relevant to a given query.
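The idea of edit-distance-based query expansion can be sketched as follows. This is an assumption-laden illustration rather than the authors' implementation: the function names `edit_distance` and `expand_query`, the distance threshold of 1, and the small vocabulary are hypothetical. Each query term is expanded with the vocabulary terms that lie within a small Levenshtein distance, which helps match spelling variants of transliterated words.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def expand_query(query, vocabulary, max_distance=1):
    """Pair each query term with the vocabulary terms within max_distance edits."""
    expanded = []
    for term in query.lower().split():
        variants = sorted(
            w for w in vocabulary if edit_distance(term, w) <= max_distance
        )
        expanded.append((term, variants))
    return expanded


# Hypothetical vocabulary of Roman-script spellings found in a lyrics corpus.
vocabulary = {"pyaar", "pyar", "pyaara", "dil"}
expansion = expand_query("pyar", vocabulary)
print(expansion)
```

The threshold trades recall against noise: a larger `max_distance` catches more spelling variants but also pulls in unrelated short words, so a real system would likely tune it or scale it with word length.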

    • Published in

      FIRE '14: Proceedings of the 6th Annual Meeting of the Forum for Information Retrieval Evaluation
      December 2014
      151 pages
      ISBN: 9781450337557
      DOI: 10.1145/2824864
      • Editors:
      • Prasenjit Majumder,
      • Mandar Mitra,
      • Sukomal Pal,
      • Madhulika Agrawal,
      • Parth Mehta

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 19 of 64 submissions, 30%
