ABSTRACT
Users look for information suited to their level of expertise, but finding it often takes considerable effort: one has to sift through many pages to find one that matches the appropriate technical background. In this paper, a query-independent ranking system is proposed for technical web pages. The pages returned by the system are sorted by their relative technical difficulty, in either ascending or descending order as specified by the user. The technical difficulty of a document, i.e., a sequence of terms, is first computed by combining the geometry of each individual term in the low-dimensional latent semantic indexing (LSI) space, which can be visualized as a conceptual terrain. The pages are then ranked by the expected cost of traversing the terrain. Results indicate that our terrain-based method outperforms traditional readability measures.
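The abstract does not give the terrain cost function, so the following is only a rough sketch of the general idea, not the authors' method. It assumes a raw term-count matrix, a truncated SVD as the LSI step, and a hypothetical cost defined as the mean Euclidean jump between consecutive terms in the latent space. The names `lsi_term_vectors`, `traversal_cost`, and `rank_by_difficulty` are illustrative.

```python
import numpy as np


def lsi_term_vectors(docs, k=2):
    """Map each term to a k-dimensional vector via truncated SVD of term counts."""
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in d:
            counts[index[t], j] += 1.0
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    # Rows of U_k * S_k are the term coordinates in the latent space.
    return index, u[:, :k] * s[:k]


def traversal_cost(doc, index, vectors):
    """ASSUMED difficulty proxy: mean distance between consecutive terms."""
    points = vectors[[index[t] for t in doc]]
    if len(points) < 2:
        return 0.0
    steps = np.linalg.norm(np.diff(points, axis=0), axis=1)
    return float(steps.mean())


def rank_by_difficulty(docs, k=2, ascending=True):
    """Return document indices sorted by the assumed cost, plus the costs."""
    index, vectors = lsi_term_vectors(docs, k)
    costs = [traversal_cost(d, index, vectors) for d in docs]
    order = sorted(range(len(docs)), key=lambda i: costs[i],
                   reverse=not ascending)
    return order, costs


if __name__ == "__main__":
    toy_docs = [
        ["cat", "cat", "cat"],
        ["cat", "kernel", "cat", "eigenvalue"],
        ["kernel", "eigenvalue", "kernel"],
    ]
    order, costs = rank_by_difficulty(toy_docs, k=3, ascending=True)
    print(order, costs)
```

Passing `ascending=False` returns the documents with the highest assumed cost first, mirroring the user-specified sort direction described above. Because the ranking depends only on the documents themselves, no query is involved.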
Index Terms
- An unsupervised ranking method based on a technical difficulty terrain