ABSTRACT
This empirical study investigates the relationship between the perplexity of an aspect-based language model and the information retrieval performance that the model yields. On the corpora considered, the perplexity of the language model shows a systematic relationship with the achievable precision-recall performance, although the relationship is not statistically significant.
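For readers unfamiliar with the two quantities being compared, the following is a minimal sketch of how each is commonly computed. It is not the authors' experimental code: the function names, the per-token probabilities, and the document identifiers are all hypothetical, and the study's aspect-based model is replaced here by a plain list of probabilities that any language model could have assigned to held-out tokens.

```python
import math

def perplexity(token_probs):
    """Perplexity of a model on held-out text.

    token_probs: the probability the model assigned to each held-out
    token (hypothetical input; each value must be > 0).
    Computed as exp(-(1/N) * sum(log p_i)).
    """
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall over the top-k documents of a ranking.

    ranked_ids: document ids in ranked order (hypothetical ids).
    relevant_ids: set of ids judged relevant for the query.
    """
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k, hits / len(relevant_ids)

# Illustrative values only, not data from the study.
ppl = perplexity([0.25, 0.25, 0.25, 0.25])
p, r = precision_recall_at_k(["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"}, 2)
```

Lower perplexity means the model assigns higher probability to unseen text. The study asks whether that improvement is accompanied by better precision and recall.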
Index Terms
- Investigating the relationship between language model perplexity and IR precision-recall measures