Skip to main content
Erschienen in: International Journal of Machine Learning and Cybernetics 2/2016

01.04.2016 | Original Article

Pivot selection for metric-space indexing

verfasst von: Rui Mao, Peihan Zhang, Xingliang Li, Xi Liu, Minhua Lu

Erschienen in: International Journal of Machine Learning and Cybernetics | Ausgabe 2/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Metric-space indexing abstracts various data types into universal metric spaces and prunes data only exploiting the triangle inequality of the distance function in metric spaces. Since there is no coordinates in metric space, one usually first pick a number of reference points, pivots, and consider the distances from a data point to the pivots as its coordinates. In this paper, we first survey and discuss the state of the art of pivot selection for metric-space indexing from the perspectives of importance, objective function, number of pivots, and selection algorithm. Further, we propose a new objective function, a new method to determine the number of pivots and an incremental sampling framework for pivot selection. Experimental results show that the new objective function is more consistent with the query performance, the new method to determine the number of pivots is more efficient, and the incremental sampling framework leads to better query performance.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Weitere Produktempfehlungen anzeigen
Literatur
1.
Zurück zum Zitat Mao R, Honglong X, Wenbo W, Li J, Li Y, Minhua L (2015) Overcoming the challenge of variety: big data abstraction, the next evolution of data management for AAL communication systems. IEEE Commun Mag 53(1):42–47CrossRef Mao R, Honglong X, Wenbo W, Li J, Li Y, Minhua L (2015) Overcoming the challenge of variety: big data abstraction, the next evolution of data management for AAL communication systems. IEEE Commun Mag 53(1):42–47CrossRef
2.
Zurück zum Zitat Roman S (1992) Advanced linear. Algebra graduate texts in mathematics, vol 135. Springer, BerlinCrossRef Roman S (1992) Advanced linear. Algebra graduate texts in mathematics, vol 135. Springer, BerlinCrossRef
3.
Zurück zum Zitat Chavez E, Navarro G, Baeza-Yates R, Marroqu J (2001) Searching in metric spaces. ACM Comput Surv 33(3):273–321CrossRef Chavez E, Navarro G, Baeza-Yates R, Marroqu J (2001) Searching in metric spaces. ACM Comput Surv 33(3):273–321CrossRef
4.
Zurück zum Zitat Zezula P, Amato G, Dohnal V, Batko M (2006) Similarity search: the metric space approach. Springer, HeidelbergMATH Zezula P, Amato G, Dohnal V, Batko M (2006) Similarity search: the metric space approach. Springer, HeidelbergMATH
5.
Zurück zum Zitat Samet H (2006) Foundations of multidimensional and metric data structures. Morgan-Kaufmann, San FranciscoMATH Samet H (2006) Foundations of multidimensional and metric data structures. Morgan-Kaufmann, San FranciscoMATH
6.
Zurück zum Zitat Hjaltason G, Samet H (2003) Index-driven similarity search in metric spaces. ACM Trans Database Syst (TODS) 28(4):517–580CrossRef Hjaltason G, Samet H (2003) Index-driven similarity search in metric spaces. ACM Trans Database Syst (TODS) 28(4):517–580CrossRef
7.
Zurück zum Zitat Mao R, Miranker W, Miranker DP (2012) Pivot Selection: dimension reduction for distance-based indexing. J Discret Algorithm Elsevier 13:32–46MathSciNetCrossRefMATH Mao R, Miranker W, Miranker DP (2012) Pivot Selection: dimension reduction for distance-based indexing. J Discret Algorithm Elsevier 13:32–46MathSciNetCrossRefMATH
8.
Zurück zum Zitat Uhlmann JK (1991) Satisfying general proximity/similarity queries with metric trees. Inf Proc Lett 40(4):175–179CrossRefMATH Uhlmann JK (1991) Satisfying general proximity/similarity queries with metric trees. Inf Proc Lett 40(4):175–179CrossRefMATH
9.
Zurück zum Zitat Yianilos PN (1993) Data structures and algorithms for nearest neighbor search in general metric spaces. In the fourth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics Yianilos PN (1993) Data structures and algorithms for nearest neighbor search in general metric spaces. In the fourth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics
10.
Zurück zum Zitat Bozkaya T, Ozsoyoglu M (1999) Indexing large metric spaces for similarity search queries. ACM Trans Database Syst 24(3):361–404CrossRef Bozkaya T, Ozsoyoglu M (1999) Indexing large metric spaces for similarity search queries. ACM Trans Database Syst 24(3):361–404CrossRef
11.
Zurück zum Zitat Bustos B, Navarro G, Chavez E (2003) Pivot selection techniques for proximity searching in metric spaces. Pattern Recogn Lett 24(14):2357–2366CrossRefMATH Bustos B, Navarro G, Chavez E (2003) Pivot selection techniques for proximity searching in metric spaces. Pattern Recogn Lett 24(14):2357–2366CrossRefMATH
12.
Zurück zum Zitat Clarkson KL (2006) Nearest-neighbor searching and metric space dimensions, In: Nearest-neighbor methods for learning and vision: theory and practice, MIT Press, pp. 15–59 Clarkson KL (2006) Nearest-neighbor searching and metric space dimensions, In: Nearest-neighbor methods for learning and vision: theory and practice, MIT Press, pp. 15–59
13.
Zurück zum Zitat Kegl B (2003) Intrinsic dimension estimation using packing numbers. Adv Neural Inf Proc Syst 15:681–688 Kegl B (2003) Intrinsic dimension estimation using packing numbers. Adv Neural Inf Proc Syst 15:681–688
14.
Zurück zum Zitat Camastra F (2003) Data dimensionality estimation methods: a survey. Pattern Recogn 36(12):2945–2954CrossRefMATH Camastra F (2003) Data dimensionality estimation methods: a survey. Pattern Recogn 36(12):2945–2954CrossRefMATH
15.
Zurück zum Zitat Mao R, Xu W, Ramakrishnan S, Nuckolls G, Miranker DP (2005) On optimizing distance-based similarity search for biological databases. In the 2005 IEEE computational systems bioinformatics conference (CSB 2005) Mao R, Xu W, Ramakrishnan S, Nuckolls G, Miranker DP (2005) On optimizing distance-based similarity search for biological databases. In the 2005 IEEE computational systems bioinformatics conference (CSB 2005)
16.
Zurück zum Zitat Traina C, Jr, Traina A, Faloutsos C (1999) Distance exponent: a new concept for selectivity estimation in metric trees, Technical Report CMU-CS-99-110, Computer Science Department, Carnegie Mellon University Traina C, Jr, Traina A, Faloutsos C (1999) Distance exponent: a new concept for selectivity estimation in metric trees, Technical Report CMU-CS-99-110, Computer Science Department, Carnegie Mellon University
17.
Zurück zum Zitat Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is “nearest neighbor” meaningful? The 7th international conference on database theory. Springer, Berlin Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is “nearest neighbor” meaningful? The 7th international conference on database theory. Springer, Berlin
18.
Zurück zum Zitat Shaft U, Ramakrishnan R (2005) When is nearest neighbors indexable? In the tenth international conference on database theory (ICDT 2005). Springer, Berlin Shaft U, Ramakrishnan R (2005) When is nearest neighbors indexable? In the tenth international conference on database theory (ICDT 2005). Springer, Berlin
19.
Zurück zum Zitat Grassberger P, Procaccia I (1983) Measuring the strangeness of strange attractors. Physica 9D(1–2):189–208MathSciNetMATH Grassberger P, Procaccia I (1983) Measuring the strangeness of strange attractors. Physica 9D(1–2):189–208MathSciNetMATH
20.
Zurück zum Zitat Roweis S (1997) EM Algorithms for PCA and SPCA. Neural Inf Proc Syst 10:626–632 Roweis S (1997) EM Algorithms for PCA and SPCA. Neural Inf Proc Syst 10:626–632
21.
Zurück zum Zitat Brin S (1995) Near neighbor search in large metric spaces. In the 21th international conference on very large data bases (VLDB’95). 1995. Zurich, Switzerland, Morgan Kaufmann Publishers Inc Brin S (1995) Near neighbor search in large metric spaces. In the 21th international conference on very large data bases (VLDB’95). 1995. Zurich, Switzerland, Morgan Kaufmann Publishers Inc
22.
Zurück zum Zitat Ciaccia P, Patella M (1997) Bulk loading the M-tree. In 9th Australasian database conference (ADO’98) Ciaccia P, Patella M (1997) Bulk loading the M-tree. In 9th Australasian database conference (ADO’98)
23.
Zurück zum Zitat Navarro G (1999) Searching in metric spaces by spatial approximation. In: Proceedings of the string processing and information retrieval symposium and international workshop on groupware. IEEE Computer Society Navarro G (1999) Searching in metric spaces by spatial approximation. In: Proceedings of the string processing and information retrieval symposium and international workshop on groupware. IEEE Computer Society
25.
Zurück zum Zitat Hochbaum DS, David B (1985) Shmoys, A best possible heuristic for the k-center problem. Math Op Res 10(2):180–184CrossRefMATH Hochbaum DS, David B (1985) Shmoys, A best possible heuristic for the k-center problem. Math Op Res 10(2):180–184CrossRefMATH
27.
Zurück zum Zitat Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443–453CrossRef Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443–453CrossRef
28.
Zurück zum Zitat Xu W, Miranker DP (2004) A metric model of amino acid substitution. Bioinformatics 20(8):1214–1221CrossRef Xu W, Miranker DP (2004) A metric model of amino acid substitution. Bioinformatics 20(8):1214–1221CrossRef
29.
Zurück zum Zitat Navarro G (2009) Analyzing metric space indexes: what for? In the proceedings of the second international conference on similarity search and applications (SISAP2009), pp. 3–10 Navarro G (2009) Analyzing metric space indexes: what for? In the proceedings of the second international conference on similarity search and applications (SISAP2009), pp. 3–10
30.
Zurück zum Zitat Venkateswaran J, Kahveci T, Jermaine CM, Lachwani D (2008) Reference-based indexing for metric spaces with costly distance measures. VLDB J 17(5):1231–1251 Springer CrossRef Venkateswaran J, Kahveci T, Jermaine CM, Lachwani D (2008) Reference-based indexing for metric spaces with costly distance measures. VLDB J 17(5):1231–1251 Springer CrossRef
31.
Zurück zum Zitat Celik C (2002) Priority vantage points structures for similarity queries in metric spaces. In: Proceedings of EurAsia-ICT 2002: information and communication technology, ser. LNCS(2510). pp. 256–263. Springer Celik C (2002) Priority vantage points structures for similarity queries in metric spaces. In: Proceedings of EurAsia-ICT 2002: information and communication technology, ser. LNCS(2510). pp. 256–263. Springer
32.
Zurück zum Zitat Celik C (2008) Effective use of space for pivot-based metric indexing structures. In: Proceedings of international workshop on similarity search and applications (SISAP’08). IEEE Press, pp. 402–409 Celik C (2008) Effective use of space for pivot-based metric indexing structures. In: Proceedings of international workshop on similarity search and applications (SISAP’08). IEEE Press, pp. 402–409
33.
Zurück zum Zitat Micó ML, Oncina J, Vidal E (1994) A new version of the nearest-neighbour approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements. Pattern Recognition Letters 5(1):9–17CrossRef Micó ML, Oncina J, Vidal E (1994) A new version of the nearest-neighbour approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements. Pattern Recognition Letters 5(1):9–17CrossRef
34.
Zurück zum Zitat Vleugels J, Veltkamp RC (2002) Efficient image retrieval through vantage objects. Pattern Recogn. 35(1):69–80 Elsevier CrossRefMATH Vleugels J, Veltkamp RC (2002) Efficient image retrieval through vantage objects. Pattern Recogn. 35(1):69–80 Elsevier CrossRefMATH
35.
Zurück zum Zitat Van Leuken RH, Veltkamp RC (2011) Selecting vantage objects for similarity indexing. ACM Trans Multim Comput Commun Appl 7(3):1–18CrossRef Van Leuken RH, Veltkamp RC (2011) Selecting vantage objects for similarity indexing. ACM Trans Multim Comput Commun Appl 7(3):1–18CrossRef
36.
Zurück zum Zitat Shapiro M (1977) The choice of reference points in best-match file searching. Commun ACM 20(5):339–343CrossRef Shapiro M (1977) The choice of reference points in best-match file searching. Commun ACM 20(5):339–343CrossRef
37.
Zurück zum Zitat Ramasubramanian V, Paliwal KK (1992) An efficient approximation-elimination algorithm for fast nearest-neighbor search based on a spherical distance coordinate formulation. Pattern Recogn Lett 13(7):471–480CrossRef Ramasubramanian V, Paliwal KK (1992) An efficient approximation-elimination algorithm for fast nearest-neighbor search based on a spherical distance coordinate formulation. Pattern Recogn Lett 13(7):471–480CrossRef
38.
Zurück zum Zitat Traina C Jr, Filho RF, Traina AJ, Vieira MR, Faloutsos C (2007) The Omni-family of all-purpose access methods: a simple and effective way to make similarity search more efficient. VLDB J 16(4):483–505CrossRef Traina C Jr, Filho RF, Traina AJ, Vieira MR, Faloutsos C (2007) The Omni-family of all-purpose access methods: a simple and effective way to make similarity search more efficient. VLDB J 16(4):483–505CrossRef
39.
Zurück zum Zitat Mao R, Xu W, Singh N, Miranker DP (2005) An assessment of a metric space database index to support sequence homology. Int J Artif Intell Tools 14(5):867–885CrossRef Mao R, Xu W, Singh N, Miranker DP (2005) An assessment of a metric space database index to support sequence homology. Int J Artif Intell Tools 14(5):867–885CrossRef
40.
Zurück zum Zitat Hennig C, Latecki LJ (2003) The choice of vantage objects for image retrieval. Pattern Recognit 36(9):2187–2196CrossRefMATH Hennig C, Latecki LJ (2003) The choice of vantage objects for image retrieval. Pattern Recognit 36(9):2187–2196CrossRefMATH
41.
Zurück zum Zitat Brisaboa NR, Farina A, Pedreira O, Reyes N (2006) Similarity search using sparse pivots for efficient multimedia information retrieval. In Proceedings of the 8th IEEE international symposium on multimedia (ISM’06). IEEE Press, pp. 881–888 Brisaboa NR, Farina A, Pedreira O, Reyes N (2006) Similarity search using sparse pivots for efficient multimedia information retrieval. In Proceedings of the 8th IEEE international symposium on multimedia (ISM’06). IEEE Press, pp. 881–888
42.
Zurück zum Zitat Bustos B, Pedreira O, Brisaboa NR (2008) A dynamic pivot selection technique for similarity search in metric spaces. In Proceedings of 1st international workshop on similarity search and applications (SISAP’08). IEEE Press, pp. 105–112 Bustos B, Pedreira O, Brisaboa NR (2008) A dynamic pivot selection technique for similarity search in metric spaces. In Proceedings of 1st international workshop on similarity search and applications (SISAP’08). IEEE Press, pp. 105–112
43.
Zurück zum Zitat Berman A, Shapiro LG (1998) Selecting good keys for triangle-inequality-based pruning algorithms. In: Proceedings of the 1998 international workshop on content-based access of image and video databases (CAIVD ‘98), pp. 12–19,1998, Bombay, India Berman A, Shapiro LG (1998) Selecting good keys for triangle-inequality-based pruning algorithms. In: Proceedings of the 1998 international workshop on content-based access of image and video databases (CAIVD ‘98), pp. 12–19,1998, Bombay, India
Metadaten
Titel
Pivot selection for metric-space indexing
verfasst von
Rui Mao
Peihan Zhang
Xingliang Li
Xi Liu
Minhua Lu
Publikationsdatum
01.04.2016
Verlag
Springer Berlin Heidelberg
Erschienen in
International Journal of Machine Learning and Cybernetics / Ausgabe 2/2016
Print ISSN: 1868-8071
Elektronische ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-016-0504-4

Weitere Artikel der Ausgabe 2/2016

International Journal of Machine Learning and Cybernetics 2/2016 Zur Ausgabe

Neuer Inhalt