ABSTRACT
Results from general-purpose search engines often fail to satisfy domain-specific information retrieval tasks because the technical level of the returned documents does not match the technical expertise of the user. In this paper, we investigate the problem of ranking domain-specific documents by technical difficulty. We propose an unsupervised conceptual terrain model that uses Latent Semantic Indexing (LSI) to re-rank the search results obtained from a similarity-based search system. We connect the sequence of terms in a document within the latent space, using the semantic distance between terms, and compute a traversal cost for the document that indicates its technical difficulty. Our experiments on a domain-specific corpus demonstrate the efficacy of our method.
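The traversal-cost idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of raw term counts, Euclidean distance as the "semantic distance", and summing hop distances as the "traversal cost" are all assumptions made only for illustration.

```python
import numpy as np

def term_document_matrix(docs):
    """Build a raw-count term-document matrix from tokenised documents."""
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in d:
            X[index[t], j] += 1.0
    return X, index

def lsi_term_vectors(X, k):
    """Project terms into a k-dimensional latent space via truncated SVD."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    return U[:, :k] * s[:k]  # one row per term

def traversal_cost(doc, index, term_vecs):
    """Sum of distances between consecutive terms of a document.

    Assumption: Euclidean distance in the latent space stands in for the
    semantic distance, and the plain sum stands in for the traversal cost.
    """
    vecs = [term_vecs[index[t]] for t in doc if t in index]
    if len(vecs) < 2:
        return 0.0
    return float(sum(np.linalg.norm(b - a) for a, b in zip(vecs, vecs[1:])))

def rerank_by_difficulty(results, index, term_vecs):
    """Return positions of `results`, lowest traversal cost first."""
    costs = [traversal_cost(d, index, term_vecs) for d in results]
    return sorted(range(len(results)), key=lambda i: costs[i])

if __name__ == "__main__":
    # Toy corpus of already-tokenised documents (hypothetical data).
    corpus = [["a", "a", "a"], ["a", "b", "a", "b"]]
    X, index = term_document_matrix(corpus)
    vecs = lsi_term_vectors(X, k=2)
    print(rerank_by_difficulty(corpus, index, vecs))
```

Under these assumptions, a document that keeps jumping between distant terms in the latent space accumulates a higher cost than one that stays within a small neighbourhood. A length-normalised variant (mean hop distance) would be an equally plausible choice.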
Index Terms
- An unsupervised technical difficulty ranking model based on conceptual terrain in the latent space