skip to main content
10.1145/1142473.1142505acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

Efficient query processing in geographic web search engines

Published:27 June 2006Publication History

ABSTRACT

Geographic web search engines allow users to constrain and order search results in an intuitive manner by focusing a query on a particular geographic region. Geographic search technology, also called local search, has recently received significant interest from major search engine companies. Academic research in this area has focused primarily on techniques for extracting geographic knowledge from the web. In this paper, we study the problem of efficient query processing in scalable geographic search engines. Query processing is a major bottleneck in standard web search engines, and the main reason for the thousands of machines used by the major engines. Geographic search engine query processing is different in that it requires a combination of text and spatial data processing techniques. We propose several algorithms for efficient query processing in geographic search engines, integrate them into an existing web search query processor, and evaluate them on large sets of real data and query traces.

References

  1. E. Amitay, N. Har'El, R. Sivan, and A. Soffer. Web-a-where: geotagging web content. In Proc. of the 27th Annual SIGIR Conf. on Research and Development in Information Retrieval, pages 273--280, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. V. Anh, O. Kretser, and A. Moffat. Vector-space ranking with effective early termination. In Proc. of the 24th Annual SIGIR Conf. on Research and Development in Information Retrieval, pages 35--42, September 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. A. Arasu, J. Cho, H. Garcia-Molina, and S. Raghavan. Searching the web. ACM Transactions on Internet Technologies, 1(1), June 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addision Wesley, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The r*-tree: An efficient and robust access method for points and rectangles. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 322--331, May 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. D. U. Board. Dublin Core Qualifiers. Recommendation of the DCMI, Dublin Core Metadata Initiative, Jul 2000.Google ScholarGoogle Scholar
  7. S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proc. of the 7th World Wide Web Conference, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. O. Buyukkokten, J. Cho, H. Garcia-Molina, L. Gravano, and N. Shivakumar. Exploiting Geographical Location Information of Web Pages. In 2nd Int. Workshop on the Web and Databases (WebDB), pages 91--96, 1999.Google ScholarGoogle Scholar
  9. S. Chakrabarti, M. van den Berg, and B. Dom. Focused crawling: A new approach to topic-specific web resource discovery. In Proc. of the 8th Int. World Wide Web Conference, May 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. S. Chaudhuri and L. Gravano. Optimizing queries over multimedia repositories. Data Engineering Bulletin, 19(4):45--52, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. Daviel. Geographic registration of HTML documents, Apr 2001. Internet Draft http://geotags.com/geo/draft-daviel-html-geo-tag-05.html.Google ScholarGoogle Scholar
  12. J. Ding, L. Gravano, and N. Shivakumar. Computing geographical scopes of web resources. In Proc. of the 26th Conf. on Very Large Data Bases (VLDB), pages 545--556, September 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Egenhofer. Toward the semantic geospatial web. In Proc. of the 10th ACM Int. Symp. on Advances in Geographic Information Systems (GIS), pages 1--4, Nov 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. R. Fagin. Combining fuzzy information from multiple systems. In Proc. of the ACM Symp. on Principles of Database Systems, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. In Proc. of the ACM Symp. on Principles of Database Systems, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. Fontoura, J. Zien, R. Lempel, and R. Qi. Inverted index support for parametric search. Technical Paper RJ10329, IBM, 2004.Google ScholarGoogle Scholar
  17. V. Gaede and O. Günther. Multidimensional access methods. ACM Computing Surveys, 30(2):170--231, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. L. Gravano. Geosearch: A geographically-aware search engine, 2003. http://geosearch.cs.columbia.edu.Google ScholarGoogle Scholar
  19. L. Gravano, V. Hatzivassiloglou, and R. Lichtenstein. Categorizing web queries according to geographical locality. In Proc. of the 12th Conf. on Information and Knowledge Management (CIKM), pages 325--333, November 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. R. H. Güting. An introduction to spatial database systems. VLDB Journal, 3(4):357--399, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. C. B. Jones, A. I. Abdelmoty, D. Finch, G. Fu, and S. Vaid. The spirit spatial search engine: Architecture, ontologies and spatial indexing. In Proc. 3rd Int. Conf. on Geographic Information Science, pages 125--139, October 2004.Google ScholarGoogle ScholarCross RefCross Ref
  22. B. T. Jonsson, M. J. Franklin, and D. Srivastava. Interaction of query evaluation and buffer management for information retrieval. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 118--129, June 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. M. Kaszkiel, J. Zobel, and R. Sacks-Davis. Efficient passage ranking for document databases. ACM Transactions on Information Systems (TOIS), 17(4):406--439, October 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. R. Lempel and S. Moran. Optimizing result prefetching in web search engines with segmented indices. In Proc. of the 28th Int. Conf. on Very Large Data Bases (VLDB), August 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. R. Lempel and S. Moran. Predictive caching and prefetching of query results in search engines. In Proc. of the 12th Int. World Wide Web Conference, pages 19--28, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. X. Long and T. Suel. Optimized query execution in large search engines with global page ordering. In Proc. of the 29th Int. Conf. on Very Large Data Bases (VLDB), September 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. X. Long and T. Suel. Three-level caching for efficient query processing in large web search engines. In Proc. of the 14th Int. World Wide Web Conference, May 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. E. Markatos. On caching search engine query results. In 5th Int. Web Caching and Content Delivery Workshop, May 2000.Google ScholarGoogle Scholar
  29. A. Markowetz, T. Brinkhoff, and B. Seeger. Exploiting the internet as a geospatial database. In Workshop on Next Generation Geospatial Information, October 2003. (Also presented at the 3rd Int. Workshop on Web Dynamics at WWW13, May 2004.)Google ScholarGoogle Scholar
  30. A. Markowetz, Y.-Y. Chen, T. Suel, X. Long, and B. Seeger. Design and implementation of a geographic search engine. In 8th Int. Workshop on the Web and Databases (WebDB), June 2005.Google ScholarGoogle Scholar
  31. B. Martins, M. Silva, and L. Andrade. Indexing and ranking in geoir systems. In Proc. of the 2nd Int. Workshop on Geo-IR (GIR), November 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. K. McCurley. Geospatial mapping and navigation of the web. In Proc. of the 10th Int. World Wide Web Conference, pages 221--229, May 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. A. Moffat and J. Zobel. Self-indexing inverted files for fast text retrieval. ACM Transactions on Information Systems, pages 349--379, October 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. G. Navarro, E. de Moura, M. Neubert, N. Ziviani, and R. Baeza-Yates. Adding compression to block addressing inverted indexes. Information Retrieval, pages 49--77, July 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. J. Nievergelt, H. Hinterberger, and K. C. Sevcik. The grid file: An adaptable, symmetric multikey file structure. ACM Transactions on Database Systems, 9(1):38--71, March 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. C. Ohm, G. Klump, and H. Kriegel. Xz-ordering: A space-filling curve for objects with spatial extension. In Proc. of the 6th Int. Symp. on Advances in Spatial Databases, pages 75--90, July 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. M. Persin, J. Zobel, and R. Sacks-Davis. Filtered document retrieval with frequency-sorted indexes. Journal of the American Society for Information Science, 47(10):749--764, May 1996. Google ScholarGoogle ScholarCross RefCross Ref
  38. Räber Information Management GmbH. www.search.ch.Google ScholarGoogle Scholar
  39. K. Risvik and R. Michelsen. Search engines and web dynamics. Computer Networks, 39:289--302, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  40. P. Saraiva, E. de Moura, N. Ziviani, W. Meira, R. Fonseca, and B. Ribeiro-Neto. Rank-preserving two-level caching for scalable search engines. In Proc. of the 24th Annual SIGIR Conf. on Research and Development in Information Retrieval, pages 51--58, September 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. F. Scholer, H. Williams, J. Yiannis, and J. Zobel. Compression of inverted indexes for fast query evaluation. In Proc. of the 25th Annual SIGIR Conf. on Research and Development in Information Retrieval, pages 222--229, August 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. S. Vaid, C. B. Jones, H. Joho, and M. Sanderson. Spatio-textual indexing for geographical search on the web. In Proc. of the 9th Int. Symp. on Spatial and Temporal Databases (SSTD), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. M. van Kreveld, I. Reinbacher, A. Arampatzis, and R. van Zwol. Distributed ranking methods for geographic information retrieval. In 20th European Workshop on Computational Geometry, March 2004.Google ScholarGoogle Scholar
  44. I. H. Witten, A. Moffat, and T. C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann, second edition, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Y. Xie and D. O'Hallaron. Locality in search engine queries and its implications for caching. In IEEE Infocom 2002, pages 1238--1247, March 2002.Google ScholarGoogle Scholar
  46. Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Ma. Hybrid index structures for location-based web search. In Proc. of the 14th Conf. on Information and Knowledge Management (CIKM), pages 155--162, November 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Efficient query processing in geographic web search engines

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data
          June 2006
          830 pages
          ISBN:1595934340
          DOI:10.1145/1142473

          Copyright © 2006 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 27 June 2006

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate785of4,003submissions,20%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader