Skip to main content
Erschienen in: Cluster Computing 4/2017

12.09.2017

An eigenvalue-based pivot selection strategy for efficient indexing and searching in metric spaces

verfasst von: Sung-Hwan Kim, Da-Young Lee, Hwan-Gue Cho

Erschienen in: Cluster Computing | Ausgabe 4/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Pivots are used widely during indexing and searching in metric spaces. We maintain the distances from pivots to data objects to be indexed so the pre-computed distances can be used to prune unpromising objects during the search process. The search efficiency depends on the pivots used, but choosing good pivots is a challenging task. In this paper, we propose a new pivot selection method that incrementally chooses pivots using an eigenvalue-based uncorrelatedness scoring function. We also present a GPU implementation for computing the uncorrelatedness score in order to accelerate the pivot selection process. Our experimental results demonstrated that the proposed method performed better than other previously described pivot selection methods.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Beecks, C., Lokoč, J., Seidl, T., Skopal, T.: Indexing the signature quadratic form distance for efficient content-based multimedia retrieval. In: Proceedings of the 1st ACM International Conference on Multimedia Retrieval (ICMR), p. 24 (2011) Beecks, C., Lokoč, J., Seidl, T., Skopal, T.: Indexing the signature quadratic form distance for efficient content-based multimedia retrieval. In: Proceedings of the 1st ACM International Conference on Multimedia Retrieval (ICMR), p. 24 (2011)
2.
Zurück zum Zitat Böhm, C., Berchtold, S., Keim, D.A.: Searching in high-dimensional spaces-index structures for improving the performance of multimedia databases. ACM Comput. Surv. 33(3), 322–373 (2001)CrossRef Böhm, C., Berchtold, S., Keim, D.A.: Searching in high-dimensional spaces-index structures for improving the performance of multimedia databases. ACM Comput. Surv. 33(3), 322–373 (2001)CrossRef
3.
Zurück zum Zitat Böhm, C., Braunmüller, B., Breunig, M., Kriegel, H.P.: High performance clustering based on the similarity join. In: Proceedings of the 9th International Conference on Information and Knowledge Management (CIKM), pp. 298–305 (2000) Böhm, C., Braunmüller, B., Breunig, M., Kriegel, H.P.: High performance clustering based on the similarity join. In: Proceedings of the 9th International Conference on Information and Knowledge Management (CIKM), pp. 298–305 (2000)
4.
Zurück zum Zitat Brin, S.: Near neighbor search in large metric spaces. In: Proceedings of the 21st Conference on Very Large Databases (VLDB), pp. 574–584 (1995) Brin, S.: Near neighbor search in large metric spaces. In: Proceedings of the 21st Conference on Very Large Databases (VLDB), pp. 574–584 (1995)
5.
Zurück zum Zitat Bustos, B., Navarro, G., Chavez, E.: Pivot selection techniques for proximity searching in metric spaces. Pattern Recognit. Lett. 24, 2357–2366 (2003)CrossRefMATH Bustos, B., Navarro, G., Chavez, E.: Pivot selection techniques for proximity searching in metric spaces. Pattern Recognit. Lett. 24, 2357–2366 (2003)CrossRefMATH
6.
Zurück zum Zitat Chavez, E., Navarro, G., Baeza-Yates, R., Marroquin, J.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (2001)CrossRef Chavez, E., Navarro, G., Baeza-Yates, R., Marroquin, J.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (2001)CrossRef
7.
Zurück zum Zitat Chen, L., Gao, Y., Li, X., Jensen, C.S., Chen, G.: Efficient metric indexing for similarity search. In: Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE), pp. 591–602 (2015) Chen, L., Gao, Y., Li, X., Jensen, C.S., Chen, G.: Efficient metric indexing for similarity search. In: Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE), pp. 591–602 (2015)
8.
Zurück zum Zitat Coskun, B., Giura, P.: Mitigating SMS spam by online detection of repetitive near-duplicate messages. In: Proceedings of the IEEE International Conference on Communication (ICC), pp. 999–1004 (2012) Coskun, B., Giura, P.: Mitigating SMS spam by online detection of repetitive near-duplicate messages. In: Proceedings of the IEEE International Conference on Communication (ICC), pp. 999–1004 (2012)
9.
Zurück zum Zitat Farago, A., Linder, T., Lugosi, G.: Fast nearest-neighbor search in dissimilarity spaces. IEEE Trans. Pattern Anal. Mach. Intell. 15(9), 957–962 (1999)CrossRef Farago, A., Linder, T., Lugosi, G.: Fast nearest-neighbor search in dissimilarity spaces. IEEE Trans. Pattern Anal. Mach. Intell. 15(9), 957–962 (1999)CrossRef
10.
Zurück zum Zitat Traina Jr., C., Filho, R.F.S., Traina, A.J.M., Vieira, M.R.: The omni-family of all purpose access method: a simple and effective way to make similarity search more efficient. VLDB J 16, 483–505 (2007)CrossRef Traina Jr., C., Filho, R.F.S., Traina, A.J.M., Vieira, M.R.: The omni-family of all purpose access method: a simple and effective way to make similarity search more efficient. VLDB J 16, 483–505 (2007)CrossRef
11.
Zurück zum Zitat Kim, S.H., Lee, D.Y., Cho, H.G.: An eigenvalue-based pivot selection strategy for improving search efficiency in metric spaces. In: Proceedings of the 2016 International Conference on Big Data and Smart Computing (BigComp), pp. 207–214 (2016) Kim, S.H., Lee, D.Y., Cho, H.G.: An eigenvalue-based pivot selection strategy for improving search efficiency in metric spaces. In: Proceedings of the 2016 International Conference on Big Data and Smart Computing (BigComp), pp. 207–214 (2016)
12.
Zurück zum Zitat Mao, R., Miranker, L., Miranker, D.P.: Pivot selection: Dimension reduction for distance-based indexing. J Discret. Algorithms 13, 32–46 (2012)CrossRefMATHMathSciNet Mao, R., Miranker, L., Miranker, D.P.: Pivot selection: Dimension reduction for distance-based indexing. J Discret. Algorithms 13, 32–46 (2012)CrossRefMATHMathSciNet
13.
Zurück zum Zitat Maon, R., Liu, S., Xu, H., Zhang, D., Miranker, D.P.: On data partitioning in tree structure metric-space indexes. In: Lecture Notes in Computer Science: Database Systems for Advanced Applications, vol. 8421, pp. 141–155 (2014) Maon, R., Liu, S., Xu, H., Zhang, D., Miranker, D.P.: On data partitioning in tree structure metric-space indexes. In: Lecture Notes in Computer Science: Database Systems for Advanced Applications, vol. 8421, pp. 141–155 (2014)
14.
Zurück zum Zitat Micó, M.L., Oncina, J.: A new version of the nearest-neighbour approximating and eliminating search algorithm (aesa) with linear preprocessing time and memory requirements. Pattern Recognit. Lett. 15(1), 9–17 (1994)CrossRef Micó, M.L., Oncina, J.: A new version of the nearest-neighbour approximating and eliminating search algorithm (aesa) with linear preprocessing time and memory requirements. Pattern Recognit. Lett. 15(1), 9–17 (1994)CrossRef
15.
Zurück zum Zitat Sarawagi, S., Bhamidipaty, A.: Interactive deduplication using active learning. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 269–278 (2002) Sarawagi, S., Bhamidipaty, A.: Interactive deduplication using active learning. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 269–278 (2002)
16.
Zurück zum Zitat Savary, A.: Typographical nearest-neighbor search in a finite-state lexicon and its application to spelling correction. In: Proceedings of the 6th International Conference on Implementation and Application of Automata (CIAA), pp. 251–260 (2001) Savary, A.: Typographical nearest-neighbor search in a finite-state lexicon and its application to spelling correction. In: Proceedings of the 6th International Conference on Implementation and Application of Automata (CIAA), pp. 251–260 (2001)
17.
Zurück zum Zitat Uhlmann, J.: Satisfying general proximity/similarity queries with metric trees. Inf. Process. Lett 40, 175–179 (1991)CrossRefMATH Uhlmann, J.: Satisfying general proximity/similarity queries with metric trees. Inf. Process. Lett 40, 175–179 (1991)CrossRefMATH
18.
Zurück zum Zitat Uribe-Paredes, R., Valero-Lara, ., Arias, E., Sánchez, J.L., Cazorla, D.: A GPU-based implementation for range queries on spaghettis data structure. In: Proceedings of the 11th International Conference on Computational Science and Its Applications. Lecture Notes in Computer Science, vol. 6782, pp. 615–629 (2011) Uribe-Paredes, R., Valero-Lara, ., Arias, E., Sánchez, J.L., Cazorla, D.: A GPU-based implementation for range queries on spaghettis data structure. In: Proceedings of the 11th International Conference on Computational Science and Its Applications. Lecture Notes in Computer Science, vol. 6782, pp. 615–629 (2011)
19.
Zurück zum Zitat Yoon, T., Park, S.Y., Cho, H.G.: A smart filtering system for newly coined profanities by using approximate string alignment. In: Proceedings of the 10th IEEE International Conference on Computer and Information Technology (CIT), pp. 643–650 (2010) Yoon, T., Park, S.Y., Cho, H.G.: A smart filtering system for newly coined profanities by using approximate string alignment. In: Proceedings of the 10th IEEE International Conference on Computer and Information Technology (CIT), pp. 643–650 (2010)
20.
Zurück zum Zitat Zhou, X., Wang, G., Zhou, X., Yu, G.: Bm+-tree: A hyperplane-based index method for high-dimensional metric spaces. In: Lecture Notes in Computer Science: Database Systems for Advanced Applications, vol. 3453, pp. 398–409 (2005) Zhou, X., Wang, G., Zhou, X., Yu, G.: Bm+-tree: A hyperplane-based index method for high-dimensional metric spaces. In: Lecture Notes in Computer Science: Database Systems for Advanced Applications, vol. 3453, pp. 398–409 (2005)
Metadaten
Titel
An eigenvalue-based pivot selection strategy for efficient indexing and searching in metric spaces
verfasst von
Sung-Hwan Kim
Da-Young Lee
Hwan-Gue Cho
Publikationsdatum
12.09.2017
Verlag
Springer US
Erschienen in
Cluster Computing / Ausgabe 4/2017
Print ISSN: 1386-7857
Elektronische ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-1153-4

Weitere Artikel der Ausgabe 4/2017

Cluster Computing 4/2017 Zur Ausgabe