Abstract
The conventional Internet is acquiring a geo-spatial dimension. Web documents are being geo-tagged, and geo-referenced objects such as points of interest are being associated with descriptive text documents. The resulting fusion of geo-location and documents enables a new kind of top-k query that takes into account both location proximity and text relevancy. To our knowledge, only naive techniques exist that are capable of computing a general web information retrieval query while also taking location into account.
This paper proposes a new indexing framework for location-aware top-k text retrieval. The framework leverages the inverted file for text retrieval and the R-tree for spatial proximity querying. Several indexing approaches are explored within the framework. The framework encompasses algorithms that utilize the proposed indexes for computing the top-k query, thus taking into account both text relevancy and location proximity to prune the search space. Results of empirical studies with an implementation of the framework demonstrate that the paper's proposal offers scalability and is capable of excellent performance.
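To make the query concrete, the following is a minimal sketch of the naive baseline that the abstract contrasts the framework with: every object is scored and the k best are kept, with no index and no pruning. It is not the paper's algorithm. The weighted-sum ranking function, the parameter names `alpha` and `max_dist`, the term-overlap text score, and the sample objects are all assumptions made for illustration.

```python
import heapq
import math


def text_score(query_terms, doc_terms):
    """Fraction of query terms present in the document.

    A toy stand-in for a real text relevancy model (assumption).
    """
    if not query_terms:
        return 0.0
    present = set(doc_terms)
    return sum(1 for t in query_terms if t in present) / len(query_terms)


def combined_score(obj, query_loc, query_terms, alpha, max_dist):
    """Weighted sum of spatial proximity and text relevancy, both in [0, 1].

    alpha = 1.0 ranks by distance only; alpha = 0.0 ranks by text only.
    The linear combination is an illustrative assumption.
    """
    dist = math.dist(obj["loc"], query_loc)
    proximity = 1.0 - min(dist / max_dist, 1.0)
    relevancy = text_score(query_terms, obj["terms"])
    return alpha * proximity + (1.0 - alpha) * relevancy


def top_k(objects, query_loc, query_terms, k, alpha=0.5, max_dist=10.0):
    """Naive top-k: score every object, then keep the k highest scores."""
    return heapq.nlargest(
        k,
        objects,
        key=lambda o: combined_score(o, query_loc, query_terms, alpha, max_dist),
    )


# Hypothetical points of interest with attached text descriptions.
objects = [
    {"name": "A", "loc": (0.0, 0.0), "terms": ["pizza", "italian"]},
    {"name": "B", "loc": (1.0, 0.0), "terms": ["sushi"]},
    {"name": "C", "loc": (9.0, 0.0), "terms": ["pizza", "cheap"]},
]

balanced = [o["name"] for o in top_k(objects, (0.0, 0.0), ["pizza"], k=2)]
nearest = [o["name"] for o in top_k(objects, (0.0, 0.0), ["pizza"], k=2, alpha=1.0)]
```

With equal weights, the distant but textually matching object C outranks the nearby non-matching object B; with `alpha=1.0` the ranking reduces to plain nearest-neighbor order. This full scan touches every object, which is the cost that an index combining an inverted file with an R-tree is meant to avoid by pruning.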
Index Terms
- Efficient retrieval of the top-k most relevant spatial web objects