Skip to main content

2018 | OriginalPaper | Buchkapitel

Efficient Approximate Algorithms for the Closest Pair Problem in High Dimensional Spaces

verfasst von : Xingyu Cai, Sanguthevar Rajasekaran, Fan Zhang

Erschienen in: Advances in Knowledge Discovery and Data Mining

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The Closest Pair Problem (CPP) is one of the fundamental problems that has a wide range of applications in data mining, such as unsupervised data clustering, user pattern similarity search, etc. A number of exact and approximate algorithms have been proposed to solve it in the low dimensional space. In this paper, we address the problem when the metric space is of a high dimension. For example, the drug-target or movie-user interaction data could contain as many as hundreds of features. To solve this problem under the \(\ell _2\) norm, we present two novel approximate algorithms. Our algorithms are based on the novel idea of projecting the points into the real line. We prove high probability bounds on the run time and accuracy for both of the proposed algorithms. Both algorithms are evaluated via comprehensive experiments and compared with existing best-known approaches. The experiments reveal that our proposed approaches outperform the existing methods.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Literatur
1.
Zurück zum Zitat Bentley, J.L., Shamos, M.I.: Divide-and-conquer in multidimensional space. In: Proceedings of the Eighth Annual ACM Symposium on Theory of Computing, pp. 220–230. ACM (1976) Bentley, J.L., Shamos, M.I.: Divide-and-conquer in multidimensional space. In: Proceedings of the Eighth Annual ACM Symposium on Theory of Computing, pp. 220–230. ACM (1976)
2.
Zurück zum Zitat Chaudhuri, S., Dubhashi, D.: Probabilistic recurrence relations revisited. Theoret. Comput. Sci. 181(1), 45–56 (1997)MathSciNetCrossRef Chaudhuri, S., Dubhashi, D.: Probabilistic recurrence relations revisited. Theoret. Comput. Sci. 181(1), 45–56 (1997)MathSciNetCrossRef
3.
Zurück zum Zitat Corral, A., Manolopoulos, Y., Theodoridis, Y., Vassilakopoulos, M.: Closest pair queries in spatial databases. In: ACM SIGMOD Record, vol. 29, pp. 189–200. ACM (2000)CrossRef Corral, A., Manolopoulos, Y., Theodoridis, Y., Vassilakopoulos, M.: Closest pair queries in spatial databases. In: ACM SIGMOD Record, vol. 29, pp. 189–200. ACM (2000)CrossRef
4.
Zurück zum Zitat Dasgupta, S., Gupta, A.: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms 22(1), 60–65 (2003)MathSciNetCrossRef Dasgupta, S., Gupta, A.: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms 22(1), 60–65 (2003)MathSciNetCrossRef
5.
Zurück zum Zitat Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the Twentieth Annual Symposium on Computational Geometry, pp. 253–262. ACM (2004) Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the Twentieth Annual Symposium on Computational Geometry, pp. 253–262. ACM (2004)
6.
Zurück zum Zitat Dietzfelbinger, M., Hagerup, T., Katajainen, J., Penttonen, M.: A reliable randomized algorithm for the closest-pair problem. J. Algorithms 25(1), 19–51 (1997)MathSciNetCrossRef Dietzfelbinger, M., Hagerup, T., Katajainen, J., Penttonen, M.: A reliable randomized algorithm for the closest-pair problem. J. Algorithms 25(1), 19–51 (1997)MathSciNetCrossRef
7.
Zurück zum Zitat Rabin, M.: Probabilistic algorithms (1976) Rabin, M.: Probabilistic algorithms (1976)
8.
Zurück zum Zitat Fortune, S., Hopcroft, J.: A note on Rabin’s nearest-neighbor algorithm. Inf. Process. Lett. 8(1), 20–23 (1979)MathSciNetCrossRef Fortune, S., Hopcroft, J.: A note on Rabin’s nearest-neighbor algorithm. Inf. Process. Lett. 8(1), 20–23 (1979)MathSciNetCrossRef
9.
Zurück zum Zitat Ge, Q., Wang, H.-T., Zhu, H.: An improved algorithm for finding the closest pair of points. J. Comput. Sci. Technol. 21(1), 27–31 (2006)MathSciNetCrossRef Ge, Q., Wang, H.-T., Zhu, H.: An improved algorithm for finding the closest pair of points. J. Comput. Sci. Technol. 21(1), 27–31 (2006)MathSciNetCrossRef
11.
Zurück zum Zitat Jiang, M., Gillespie, J.: Engineering the divide-and-conquer closest pair algorithm. J. Comput. Sci. Technol. 22(4), 532–540 (2007)MathSciNetCrossRef Jiang, M., Gillespie, J.: Engineering the divide-and-conquer closest pair algorithm. J. Comput. Sci. Technol. 22(4), 532–540 (2007)MathSciNetCrossRef
12.
Zurück zum Zitat Johnson, W.B., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. Contemp. Math. 26(189–206), 1 (1984)MathSciNetMATH Johnson, W.B., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. Contemp. Math. 26(189–206), 1 (1984)MathSciNetMATH
14.
Zurück zum Zitat Khuller, S., Matias, Y.: A simple randomized Sieve algorithm for the closest-pair problem. Inf. Comput. 118(1), 34–37 (1995)MathSciNetCrossRef Khuller, S., Matias, Y.: A simple randomized Sieve algorithm for the closest-pair problem. Inf. Comput. 118(1), 34–37 (1995)MathSciNetCrossRef
15.
Zurück zum Zitat Lopez, M.A., Liao, S.: Finding k-closest-pairs efficiently for high dimensional data (2000) Lopez, M.A., Liao, S.: Finding k-closest-pairs efficiently for high dimensional data (2000)
16.
Zurück zum Zitat Mueen, A., Keogh, E., Zhu, Q., Cash, S., Westover, B.: Exact discovery of time series motifs. In: Proceedings of the 2009 SIAM International Conference on Data Mining, pp. 473–484. SIAM (2009)CrossRef Mueen, A., Keogh, E., Zhu, Q., Cash, S., Westover, B.: Exact discovery of time series motifs. In: Proceedings of the 2009 SIAM International Conference on Data Mining, pp. 473–484. SIAM (2009)CrossRef
17.
Zurück zum Zitat Pereira, J.C., Lobo, F.G.: An optimized divide-and-conquer algorithm for the closest-pair problem in the planar case. J. Comput. Sci. Technol. 27(4), 891–896 (2012)MathSciNetCrossRef Pereira, J.C., Lobo, F.G.: An optimized divide-and-conquer algorithm for the closest-pair problem in the planar case. J. Comput. Sci. Technol. 27(4), 891–896 (2012)MathSciNetCrossRef
19.
Zurück zum Zitat Shamos, M.I., Hoey, D.: Closest-point problems. In: 16th Annual Symposium on Foundations of Computer Science, pp. 151–162. IEEE (1975) Shamos, M.I., Hoey, D.: Closest-point problems. In: 16th Annual Symposium on Foundations of Computer Science, pp. 151–162. IEEE (1975)
20.
Zurück zum Zitat Tao, Y., Yi, K., Sheng, C., Kalnis, P.: Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Trans. Database Syst. (TODS) 35(3), 20 (2010)CrossRef Tao, Y., Yi, K., Sheng, C., Kalnis, P.: Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Trans. Database Syst. (TODS) 35(3), 20 (2010)CrossRef
21.
Zurück zum Zitat Yao, A.C.-C.: Lower bounds for algebraic computation trees of functions with finite domains. SIAM J. Comput. 20(4), 655–668 (1991)MathSciNetCrossRef Yao, A.C.-C.: Lower bounds for algebraic computation trees of functions with finite domains. SIAM J. Comput. 20(4), 655–668 (1991)MathSciNetCrossRef
Metadaten
Titel
Efficient Approximate Algorithms for the Closest Pair Problem in High Dimensional Spaces
verfasst von
Xingyu Cai
Sanguthevar Rajasekaran
Fan Zhang
Copyright-Jahr
2018
DOI
https://doi.org/10.1007/978-3-319-93040-4_13