ABSTRACT
Peer-to-peer Data Networks (PDNs) are large-scale, self-organizing, distributed query processing systems. Familiar examples of PDN are peer-to-peer file-sharing networks, which support exact-match search queries to locate user-requested files. In this paper, we formalize the more general problem of <i>similarity-search</i> in PDNs, and propose a <i>family</i> of distributed access methods, termed <i>Small-World Access Methods (SWAM)</i>, for efficient execution of various similarity-search queries, namely exact-match, range, and k-nearest-neighbor queries. Unlike its predecessors, i.e., LH* and DHTs, SWAM does not control the assignment of data objects to PDN nodes; each node autonomously stores its own data. Besides, SWAM supports all similarity-search queries on multiple attributes. SWAM guarantees that the query object will be found (if it exists in the network) in average time logarithmically proportional to the network size. Moreover, once the query object is found, all the similar objects would be in its proximate network neighborhood and hence enabling efficient range and k-nearest-neighbor queries.
As a specific instance of SWAM, we propose <i>SWAM-V</i>, a Voronoi-based SWAM that indexes PDNs with multi-attribute data objects. For a PDN with <i>N</i> nodes SWAM-V has query time, communication cost, and computation cost of <i>O</i>(log <i>N</i>) for exact-match queries, and <i>O</i>(log <i>N</i> + <b>s</b><i>N</i>) and <i>O</i>(log <i>N</i> + <b>k</b>) for range queries (with selectivity <b>s</b>) and <b>k</b>NN queries, respectively. Our experiments show that SWAM-V consistently outperforms a similarity-search enabled version of CAN in query time and communication cost by a factor of 2 to 3.
- K. Aberer, P. Cudré-Mauroux, A. Datta, Z. Despotovic, M. Hauswirth, M. Punceva, and R. Schmidt. P-grid: A self-organizing structured p2p system. SIGMOD Record, 32(2), 2003. Google ScholarDigital Library
- F. Banaei-Kashani and C. Shahabi. Efficient flooding in power-law networks. In Proceedings of Twenty-Second ACM Symposium on Principles of Distributed Computing (PODC'03), July 2003. Google ScholarDigital Library
- F. Banaei-Kashani and C. Shahabi. Searchable querical data networks. In Proceedings of the International Workshop on Databases, Information Systems and Peer-to-Peer Computing in conjunction with VLDB'03, September 2003.Google Scholar
- M. Bawa, G. Manku, and P. Raghavan. SETS: Search enhanced by topic-segmentation. In Proceedings of the 26th Annual International Conference on Research and Development in Informaion Retrieval (SIGIR'03), August 2003. Google ScholarDigital Library
- B. Bollobás. Random Graphs. Academic Press, New York, 1985.Google Scholar
- S. Brin. Near neighbor search in large metric spaces. In Proceedings of the 21th International Conferenceon Very Large Data Bases (VLDB'95), September 1995. Google ScholarDigital Library
- E. Chavez, G. Navarro, R. A. Baeza-Yates, and J. L. Marroquin. Searching in metric spaces. ACM Computing Surveys, 33(3), 2001. Google ScholarDigital Library
- A. Doan, P. Domingos, and A. Halevy. Reconciling schemas of disparate data sources: A machine-learning approach. In Proceedings of ACM International Conference on Management of Data (SIGMOD'01), November 2001. Google ScholarDigital Library
- Freenet. Freenet project, 2004. http://freenet.sourceforge.net/.Google Scholar
- V. Gaede and O. Günther. Multidimensional access methods. ACM Computing Surveys, 30(2), 1997. Google ScholarDigital Library
- A. Gupta, D. Agrawal, and A. El Abbadi. Approximate range selection queries in peer-to-peer systems. In Proceedings of the First Biennial Conference on Innovative Data Systems Research, January 2003.Google Scholar
- R. Huebsch, N. Lanham, B. Loo, J. Hellerstein, S. Shenker, and I. Stoica. Querying the inernet withPIER. In Proceedings of 29th International Conference on Very Large Data Bases (VLDB'03), September 2003. Google ScholarDigital Library
- KaZaA. Sharman networks, 2004. http://www.kazaa.com/.Google Scholar
- J. Kleinberg. The small-world phenomenon: an algorithmic perspective. In Proceedings of the 32nd ACM Symposium on Theory of Computing, May 2000. Google ScholarDigital Library
- W. Litwin, M. Neimat, and D. Schneider. LH*:A scalable, distributed data structure. ACM Transactions on Database Systems, 21(4), 1996. Google ScholarDigital Library
- G. Navarro. Searching in metric spaces by spatial approximation. The Very Large Databases Journal (VLDBJ), 11(1), 2002. Google ScholarDigital Library
- A. Okabe, B. Boots, K. Sugihara, and S. Chiu. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. John Wiley, 2nd edition, 2000. Google ScholarDigital Library
- S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content-addressable network. In Proceedings of ACM SIGCOMM '01, August 2001. Google ScholarDigital Library
- R. Seidel. Exact upper bounds for the number of faces in d-dimensional Voronoi diagrams, DIMACS Series, volume~4. American Mathematical Society, 1991.Google Scholar
- SETI@home, 2004. http://setiathome.ssl.berkeley.edu//.Google Scholar
- I. Stoica, R. Morris, D. Karger, M. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of ACM SIGCOMM '01, August 2001. Google ScholarDigital Library
- D. Watts and S. Strogatz. Collective dynamics of small world networks. Nature, (393):440--442, 1998.Google Scholar
- B. Yang and H. Garcia-Molina. Designing a super-peer network. In Proceedings of the 19th International Conference on Data Engineering (ICDE'03), March 2003.Google Scholar
Index Terms
- SWAM: a family of access methods for similarity-search in peer-to-peer data networks
Recommendations
Improving lookup latency in distributed hash table systems using random sampling
Distributed hash table (DHT) systems are an important class of peer-to-peer routing infrastructures. They enable scalable wide-area storage and retrieval of information, and will support the rapid development of a wide variety of Internet-scale ...
Efficient Range Query Processing in Peer-to-Peer Systems
With the increasing popularity of the peer-to-peer (P2P) computing paradigm, many general range query schemes for distributed hash table (DHT)-based P2P systems have been proposed in recent years. Although those schemes can provide range query ...
Chord2: A two-layer Chord for reducing maintenance overhead via heterogeneity
Empirical studies have shown that participating nodes in peer-to-peer (P2P) systems are not equivalent. Some nodes, known as ''super peers'', are more powerful and stable than the others. Such heterogeneity has been taken into account in the design of ...
Comments