skip to main content
research-article

Efficient retrieval of the top-k most relevant spatial web objects

Published:01 August 2009Publication History
Skip Abstract Section

Abstract

The conventional Internet is acquiring a geo-spatial dimension. Web documents are being geo-tagged, and geo-referenced objects such as points of interest are being associated with descriptive text documents. The resulting fusion of geo-location and documents enables a new kind of top-k query that takes into account both location proximity and text relevancy. To our knowledge, only naive techniques exist that are capable of computing a general web information retrieval query while also taking location into account.

This paper proposes a new indexing framework for location-aware top-k text retrieval. The framework leverages the inverted file for text retrieval and the R-tree for spatial proximity querying. Several indexing approaches are explored within the framework. The framework encompasses algorithms that utilize the proposed indexes for computing the top-k query, thus taking into account both text relevancy and location proximity to prune the search space. Results of empirical studies with an implementation of the framework demonstrate that the paper's proposal offers scalability and is capable of excellent performance.

References

  1. E. Amitay, N. Har'El, R. Sivan, and A. Soffer. Web-a-where: geotagging web content. In SIGIR, pp. 273--280, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. V. N. Anh, O. de Kretser, and A. Moffat. Vector-space ranking with effective early termination. In SIGIR, pp. 35--42, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: an efficient and robust access method for points and rectangles. In SIGMOD, pp. 322--331, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Y.-Y. Chen, T. Suel, and A. Markowetz. Efficient query processing in geographic web search engines. In SIGMOD, pp. 277--288, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. G. Cong, L. Wang, C.-Y. Lin, Y.-I. Song, and Y. Sun. Finding question-answer pairs from online forums. In SIGIR, pp. 467--474, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. I. De Felipe, V. Hristidis, and N. Rishe. Keyword search on spatial databases. In ICDE, pp. 656--665, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Ding, L. Gravano, and N. Shivakumar. Computing geographical scopes of web resources. In VLDB, pp. 545--556, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci., 66(4):614--656, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. C. Faloutsos and S. Christodoulakis. Signature files: an access method for documents and its analytical performance evaluation. ACM TODS, 2(4):267--288, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. C. Faloutsos and H. V. Jagadish. Hybrid index organizations for text databases. In EDBT, pp. 310--327, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Guttman. R-trees: a dynamic index structure for spatial searching. In SIGMOD, pp. 47--57, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. R. Hariharan, B. Hore, C. Li, and S. Mehrotra. Processing spatial-keyword (SK) queries in geographic information retrieval (GIR) systems. In SSDBM, p. 16, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. G. R. Hjaltason and H. Samet. Distance browsing in spatial databases. ACM Trans. Database Syst., 24(2):265--318, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. N. Katayama and S. Satoh. The SR-tree: an index structure for high-dimensional nearest neighbor queries. In SIGMOD, pp. 369--380, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. N. Bruno, L. Gravano, and A. Marian. Evaluating top-k queries over web-accessible databases. In ICDE, pp. 369--380, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. B. Martins, M. J. Silva, and L. Andrade. Indexing and ranking in geo-IR systems. In GIR, pp. 31--34, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. K. S. McCurley. Geospatial mapping and navigation of the web. In WWW, pp. 221--229, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. A. Moffat and J. Zobel. Coding for compression in full-text retrieval systems. Data Compression Conference, pp. 72--81, 1992.Google ScholarGoogle Scholar
  21. M. Persin, J. Zobel, and R. Sacks-Davis. Filtered document retrieval with frequency-sorted indexes. J. Am. Soc. Inf. Sci., 47(10):749--764, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. M. Ponte and W. B. Croft. A language modeling approach to information retrieval. In SIGIR, pp. 275--281, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. In TREC, 19 pages, 1994.Google ScholarGoogle Scholar
  24. N. Roussopoulos, S. Kelley, and F. Vincent. Nearest neighbor queries. In SIGMOD, pp. 71--79, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. M. Sanderson and J. Kohler. Analyzing geographic queries. In SIGIR Workshop on Geographic Information Retrieval, 2 pages, 2004.Google ScholarGoogle Scholar
  26. B. Schnitzer and S. Leutenegger. Master-client R-trees: a new parallel R-tree architecture. In SSDBM, pp. 68--77, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. T. Strohman, H. Turtle, and W. B. Croft. Optimization strategies for complex queries. In SIGIR, pp. 219--225, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. S. Vaid, C. B. Jones, H. Joho, and M. Sanderson. Spatio-textual indexing for geographical search on the web. In SSTD, pp. 218--235, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. D. A. White and R. Jain. Similarity indexing with the SS-tree. In ICDE, pp. 516--523, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. C. Zhai and J. Lafferty. A study of smoothing methods for language models applied to information retrieval. ACM TOIS, 22(2):179--214, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. D. Zhang, Y. M. Chee, A. Mondal, A. K. H. Tung, and M. Kitsuregawa. Keyword search in spatial databases: Towards searching by document. In ICDE, pp. 688--699, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Y. Zhou, X. Xie, C. Wang, Y. Gong, and W.-Y. Ma. Hybrid index structures for location-based web search. In CIKM, pp. 155--162, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. J. Zobel and A. Moffat. Inverted files for text search engines. ACM Comput. Surv., 38(2), 56 pages, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. J. Zobel, A. Moffat, and K. Ramamohanarao. Inverted files versus signature files for text indexing. ACM TODS, 23(4):453--490, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Efficient retrieval of the top-k most relevant spatial web objects

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader