ABSTRACT
Geographic web search engines allow users to constrain and order search results in an intuitive manner by focusing a query on a particular geographic region. Geographic search technology, also called local search, has recently received significant interest from major search engine companies. Academic research in this area has focused primarily on techniques for extracting geographic knowledge from the web. In this paper, we study the problem of efficient query processing in scalable geographic search engines. Query processing is a major bottleneck in standard web search engines, and the main reason for the thousands of machines used by the major engines. Geographic search engine query processing is different in that it requires a combination of text and spatial data processing techniques. We propose several algorithms for efficient query processing in geographic search engines, integrate them into an existing web search query processor, and evaluate them on large sets of real data and query traces.
- E. Amitay, N. Har'El, R. Sivan, and A. Soffer. Web-a-where: geotagging web content. In Proc. of the 27th Annual SIGIR Conf. on Research and Development in Information Retrieval, pages 273--280, 2004. Google ScholarDigital Library
- V. Anh, O. Kretser, and A. Moffat. Vector-space ranking with effective early termination. In Proc. of the 24th Annual SIGIR Conf. on Research and Development in Information Retrieval, pages 35--42, September 2001. Google ScholarDigital Library
- A. Arasu, J. Cho, H. Garcia-Molina, and S. Raghavan. Searching the web. ACM Transactions on Internet Technologies, 1(1), June 2001. Google ScholarDigital Library
- R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addision Wesley, 1999. Google ScholarDigital Library
- N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The r*-tree: An efficient and robust access method for points and rectangles. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 322--331, May 1990. Google ScholarDigital Library
- D. U. Board. Dublin Core Qualifiers. Recommendation of the DCMI, Dublin Core Metadata Initiative, Jul 2000.Google Scholar
- S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proc. of the 7th World Wide Web Conference, 1998. Google ScholarDigital Library
- O. Buyukkokten, J. Cho, H. Garcia-Molina, L. Gravano, and N. Shivakumar. Exploiting Geographical Location Information of Web Pages. In 2nd Int. Workshop on the Web and Databases (WebDB), pages 91--96, 1999.Google Scholar
- S. Chakrabarti, M. van den Berg, and B. Dom. Focused crawling: A new approach to topic-specific web resource discovery. In Proc. of the 8th Int. World Wide Web Conference, May 1999. Google ScholarDigital Library
- S. Chaudhuri and L. Gravano. Optimizing queries over multimedia repositories. Data Engineering Bulletin, 19(4):45--52, 1996. Google ScholarDigital Library
- A. Daviel. Geographic registration of HTML documents, Apr 2001. Internet Draft http://geotags.com/geo/draft-daviel-html-geo-tag-05.html.Google Scholar
- J. Ding, L. Gravano, and N. Shivakumar. Computing geographical scopes of web resources. In Proc. of the 26th Conf. on Very Large Data Bases (VLDB), pages 545--556, September 2000. Google ScholarDigital Library
- M. Egenhofer. Toward the semantic geospatial web. In Proc. of the 10th ACM Int. Symp. on Advances in Geographic Information Systems (GIS), pages 1--4, Nov 2002. Google ScholarDigital Library
- R. Fagin. Combining fuzzy information from multiple systems. In Proc. of the ACM Symp. on Principles of Database Systems, 1996. Google ScholarDigital Library
- R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. In Proc. of the ACM Symp. on Principles of Database Systems, 2001. Google ScholarDigital Library
- M. Fontoura, J. Zien, R. Lempel, and R. Qi. Inverted index support for parametric search. Technical Paper RJ10329, IBM, 2004.Google Scholar
- V. Gaede and O. Günther. Multidimensional access methods. ACM Computing Surveys, 30(2):170--231, 1998. Google ScholarDigital Library
- L. Gravano. Geosearch: A geographically-aware search engine, 2003. http://geosearch.cs.columbia.edu.Google Scholar
- L. Gravano, V. Hatzivassiloglou, and R. Lichtenstein. Categorizing web queries according to geographical locality. In Proc. of the 12th Conf. on Information and Knowledge Management (CIKM), pages 325--333, November 2003. Google ScholarDigital Library
- R. H. Güting. An introduction to spatial database systems. VLDB Journal, 3(4):357--399, 1994. Google ScholarDigital Library
- C. B. Jones, A. I. Abdelmoty, D. Finch, G. Fu, and S. Vaid. The spirit spatial search engine: Architecture, ontologies and spatial indexing. In Proc. 3rd Int. Conf. on Geographic Information Science, pages 125--139, October 2004.Google ScholarCross Ref
- B. T. Jonsson, M. J. Franklin, and D. Srivastava. Interaction of query evaluation and buffer management for information retrieval. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 118--129, June 1998. Google ScholarDigital Library
- M. Kaszkiel, J. Zobel, and R. Sacks-Davis. Efficient passage ranking for document databases. ACM Transactions on Information Systems (TOIS), 17(4):406--439, October 1999. Google ScholarDigital Library
- R. Lempel and S. Moran. Optimizing result prefetching in web search engines with segmented indices. In Proc. of the 28th Int. Conf. on Very Large Data Bases (VLDB), August 2002. Google ScholarDigital Library
- R. Lempel and S. Moran. Predictive caching and prefetching of query results in search engines. In Proc. of the 12th Int. World Wide Web Conference, pages 19--28, 2003. Google ScholarDigital Library
- X. Long and T. Suel. Optimized query execution in large search engines with global page ordering. In Proc. of the 29th Int. Conf. on Very Large Data Bases (VLDB), September 2003. Google ScholarDigital Library
- X. Long and T. Suel. Three-level caching for efficient query processing in large web search engines. In Proc. of the 14th Int. World Wide Web Conference, May 2005. Google ScholarDigital Library
- E. Markatos. On caching search engine query results. In 5th Int. Web Caching and Content Delivery Workshop, May 2000.Google Scholar
- A. Markowetz, T. Brinkhoff, and B. Seeger. Exploiting the internet as a geospatial database. In Workshop on Next Generation Geospatial Information, October 2003. (Also presented at the 3rd Int. Workshop on Web Dynamics at WWW13, May 2004.)Google Scholar
- A. Markowetz, Y.-Y. Chen, T. Suel, X. Long, and B. Seeger. Design and implementation of a geographic search engine. In 8th Int. Workshop on the Web and Databases (WebDB), June 2005.Google Scholar
- B. Martins, M. Silva, and L. Andrade. Indexing and ranking in geoir systems. In Proc. of the 2nd Int. Workshop on Geo-IR (GIR), November 2005. Google ScholarDigital Library
- K. McCurley. Geospatial mapping and navigation of the web. In Proc. of the 10th Int. World Wide Web Conference, pages 221--229, May 2001. Google ScholarDigital Library
- A. Moffat and J. Zobel. Self-indexing inverted files for fast text retrieval. ACM Transactions on Information Systems, pages 349--379, October 1996. Google ScholarDigital Library
- G. Navarro, E. de Moura, M. Neubert, N. Ziviani, and R. Baeza-Yates. Adding compression to block addressing inverted indexes. Information Retrieval, pages 49--77, July 2000. Google ScholarDigital Library
- J. Nievergelt, H. Hinterberger, and K. C. Sevcik. The grid file: An adaptable, symmetric multikey file structure. ACM Transactions on Database Systems, 9(1):38--71, March 1984. Google ScholarDigital Library
- C. Ohm, G. Klump, and H. Kriegel. Xz-ordering: A space-filling curve for objects with spatial extension. In Proc. of the 6th Int. Symp. on Advances in Spatial Databases, pages 75--90, July 1999. Google ScholarDigital Library
- M. Persin, J. Zobel, and R. Sacks-Davis. Filtered document retrieval with frequency-sorted indexes. Journal of the American Society for Information Science, 47(10):749--764, May 1996. Google ScholarCross Ref
- Räber Information Management GmbH. www.search.ch.Google Scholar
- K. Risvik and R. Michelsen. Search engines and web dynamics. Computer Networks, 39:289--302, 2002.Google ScholarCross Ref
- P. Saraiva, E. de Moura, N. Ziviani, W. Meira, R. Fonseca, and B. Ribeiro-Neto. Rank-preserving two-level caching for scalable search engines. In Proc. of the 24th Annual SIGIR Conf. on Research and Development in Information Retrieval, pages 51--58, September 2001. Google ScholarDigital Library
- F. Scholer, H. Williams, J. Yiannis, and J. Zobel. Compression of inverted indexes for fast query evaluation. In Proc. of the 25th Annual SIGIR Conf. on Research and Development in Information Retrieval, pages 222--229, August 2002. Google ScholarDigital Library
- S. Vaid, C. B. Jones, H. Joho, and M. Sanderson. Spatio-textual indexing for geographical search on the web. In Proc. of the 9th Int. Symp. on Spatial and Temporal Databases (SSTD), 2005. Google ScholarDigital Library
- M. van Kreveld, I. Reinbacher, A. Arampatzis, and R. van Zwol. Distributed ranking methods for geographic information retrieval. In 20th European Workshop on Computational Geometry, March 2004.Google Scholar
- I. H. Witten, A. Moffat, and T. C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann, second edition, 1999. Google ScholarDigital Library
- Y. Xie and D. O'Hallaron. Locality in search engine queries and its implications for caching. In IEEE Infocom 2002, pages 1238--1247, March 2002.Google Scholar
- Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Ma. Hybrid index structures for location-based web search. In Proc. of the 14th Conf. on Information and Knowledge Management (CIKM), pages 155--162, November 2005. Google ScholarDigital Library
Index Terms
- Efficient query processing in geographic web search engines
Recommendations
Overlap Among Major Web Search Engines
ITNG '06: Proceedings of the Third International Conference on Information Technology: New GenerationsOur study examined the overlap among results retrieved by three major Web search engines for a large set of more than 10,316 queries. Previous smaller studies have discussed the lack of overlap in results returned by Web search engines for the same ...
Mining Web search engines for query suggestion
Queries to Web search engines are usually short and ambiguous, which provides insufficient information needs of users for effectively retrieving relevant Web pages. To address this problem, query suggestion is implemented by most search engines. However,...
Query routing for Web search engines: architecture and experiments
AbstractGeneral-purpose search engines such as AltaVista and Lycos are notorious for returning irrelevant results in response to user queries. Consequently, thousands of specialized, topic-specific search engines (from VacationSpot.com to ...
Comments