Skip to main content

2016 | OriginalPaper | Buchkapitel

kdANN+: A Rapid AkNN Classifier for Big Data

verfasst von : Nikolaos Nodarakis, Evaggelia Pitoura, Spyros Sioutas, Athanasios Tsakalidis, Dimitrios Tsoumakos, Giannis Tzimas

Erschienen in: Transactions on Large-Scale Data- and Knowledge-Centered Systems XXIV

Verlag: Springer Berlin Heidelberg

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

A k-nearest neighbor (kNN) query determines the k nearest points, using distance metrics, from a given location. An all k-nearest neighbor (AkNN) query constitutes a variation of a kNN query and retrieves the k nearest points for each point inside a database. Their main usage resonates in spatial databases and they consist the backbone of many location-based applications and not only. In this work, we propose a novel method for classifying multidimensional data using an AkNN algorithm in the MapReduce framework. Our approach exploits space decomposition techniques for processing the classification procedure in a parallel and distributed manner. To our knowledge, we are the first to study the kNN classification of multidimensional objects under this perspective. Through an extensive experimental evaluation we prove that our solution is efficient, robust and scalable in processing the given queries.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
The real dataset is part of the Canadian Planetary Emulation Terrain 3D Mapping Dataset, which is a collection of 3-dimensional laser scans gathered at two unique planetary analogue rover test facilities in Canada. The dataset provides the coordinates (xyz) for each laser scan in meters. http://​asrl.​utias.​utoronto.​ca/​datasets/​3dmap/​.
 
Literatur
1.
Zurück zum Zitat Aji, A., Wang, F., Vo, H., Lee, R., Liu, Q., Zhang, X., Saltz, J.: Hadoop GIS: a high performance spatial data warehousing system over MapReduce. Proc. VLDB Endow. 6, 1009–1020 (2013)CrossRef Aji, A., Wang, F., Vo, H., Lee, R., Liu, Q., Zhang, X., Saltz, J.: Hadoop GIS: a high performance spatial data warehousing system over MapReduce. Proc. VLDB Endow. 6, 1009–1020 (2013)CrossRef
2.
Zurück zum Zitat Afrati, F.N., Ullman, J.D.: Optimizing joins in a map-reduce environment. In: Proceedings of the 13th International Conference on Extending Database Technology, pp. 99–110. ACM, New York (2010) Afrati, F.N., Ullman, J.D.: Optimizing joins in a map-reduce environment. In: Proceedings of the 13th International Conference on Extending Database Technology, pp. 99–110. ACM, New York (2010)
3.
Zurück zum Zitat Böhm, C., Krebs, F.: The k-nearest neighbour join: turbo charging the KDD process. Knowl. Inf. Syst. 6, 728–749 (2004)CrossRef Böhm, C., Krebs, F.: The k-nearest neighbour join: turbo charging the KDD process. Knowl. Inf. Syst. 6, 728–749 (2004)CrossRef
4.
Zurück zum Zitat Chatzimilioudis, G., Zeinalipour-Yazti, D., Lee, W.-C., Dikaiakos, M. D.: Continuous all k-nearest-neighbor querying in smartphone networks. In: Proceedings of the 2012 IEEE 13th International Conference on Mobile Data Management, pp. 79–88. IEEE Computer Society, Washington, DC (2012) Chatzimilioudis, G., Zeinalipour-Yazti, D., Lee, W.-C., Dikaiakos, M. D.: Continuous all k-nearest-neighbor querying in smartphone networks. In: Proceedings of the 2012 IEEE 13th International Conference on Mobile Data Management, pp. 79–88. IEEE Computer Society, Washington, DC (2012)
5.
Zurück zum Zitat Chang, J., Luo, J., Huang, J.Z., Feng, S., Fan, J.: Minimum spanning tree based classification model for massive data with mapreduce implementation. In: Proceedings of the 10th IEEE International Conference on Data Mining Workshop, pp. 129–137. IEEE Computer Society, Washington, DC (2010) Chang, J., Luo, J., Huang, J.Z., Feng, S., Fan, J.: Minimum spanning tree based classification model for massive data with mapreduce implementation. In: Proceedings of the 10th IEEE International Conference on Data Mining Workshop, pp. 129–137. IEEE Computer Society, Washington, DC (2010)
6.
Zurück zum Zitat Chen, Y., Patel, J.M.: Efficient evaluation of all-nearest-neighbor queries. In: Proceedings of the 23rd IEEE International Conference on Data Engineering, pp. 1056–1065. IEEE Computer Society, Washington, DC (2007) Chen, Y., Patel, J.M.: Efficient evaluation of all-nearest-neighbor queries. In: Proceedings of the 23rd IEEE International Conference on Data Engineering, pp. 1056–1065. IEEE Computer Society, Washington, DC (2007)
7.
Zurück zum Zitat Coxeter, H.S.M.: Regular Polytopes. Dover Publications, New York (1973) Coxeter, H.S.M.: Regular Polytopes. Dover Publications, New York (1973)
8.
Zurück zum Zitat Cromwell, P.R.: Polyhedra. Cambridge University Press, Cambridge (1999)MATH Cromwell, P.R.: Polyhedra. Cambridge University Press, Cambridge (1999)MATH
9.
Zurück zum Zitat Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Proceedings of the 6th Symposium on Operating Systems Design and Implementation, pp. 137–150. USENIX Association, Berkeley (2004) Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Proceedings of the 6th Symposium on Operating Systems Design and Implementation, pp. 137–150. USENIX Association, Berkeley (2004)
10.
Zurück zum Zitat Dunham, M.H.: Data Mining: Introductory and Advanced Topics. Prentice Hall, Upper Saddle River (2002) Dunham, M.H.: Data Mining: Introductory and Advanced Topics. Prentice Hall, Upper Saddle River (2002)
11.
Zurück zum Zitat Eldawy, A.: SpatialHadoop: towards flexible and scalable spatial processing using mapreduce. In: Proceedings of the 2014 SIGMOD Ph.D. Symposium, pp. 46–50. ACM, New York (2014) Eldawy, A.: SpatialHadoop: towards flexible and scalable spatial processing using mapreduce. In: Proceedings of the 2014 SIGMOD Ph.D. Symposium, pp. 46–50. ACM, New York (2014)
12.
Zurück zum Zitat Emrich, T., Graf, F., Kriegel, H.-P., Schubert, M., Thoma, M.: Optimizing all-nearest-neighbor queries with trigonometric pruning. In: Gertz, M., Ludäscher, B. (eds.) SSDBM 2010. LNCS, vol. 6187, pp. 501–518. Springer, Heidelberg (2010)CrossRef Emrich, T., Graf, F., Kriegel, H.-P., Schubert, M., Thoma, M.: Optimizing all-nearest-neighbor queries with trigonometric pruning. In: Gertz, M., Ludäscher, B. (eds.) SSDBM 2010. LNCS, vol. 6187, pp. 501–518. Springer, Heidelberg (2010)CrossRef
13.
Zurück zum Zitat Gkoulalas-Divanis, A., Verykios, V.S., Bozanis, P.: A network aware privacy model for online requests in trajectory data. Data Knowl. Eng. 68, 431–452 (2009)CrossRef Gkoulalas-Divanis, A., Verykios, V.S., Bozanis, P.: A network aware privacy model for online requests in trajectory data. Data Knowl. Eng. 68, 431–452 (2009)CrossRef
14.
Zurück zum Zitat He, Q., Zhuang, F., Li, J., Shi, Z.: Parallel implementation of classification algorithms based on mapreduce. In: Yu, J., Greco, S., Lingras, P., Wang, G., Skowron, A. (eds.) RSKT 2010. LNCS, vol. 6401, pp. 655–662. Springer, Heidelberg (2010)CrossRef He, Q., Zhuang, F., Li, J., Shi, Z.: Parallel implementation of classification algorithms based on mapreduce. In: Yu, J., Greco, S., Lingras, P., Wang, G., Skowron, A. (eds.) RSKT 2010. LNCS, vol. 6401, pp. 655–662. Springer, Heidelberg (2010)CrossRef
15.
Zurück zum Zitat Ioup, E., Shaw, K., Sample, J., Abdelguerfi, M.: Efficient AKNN spatial network queries using the m-tree. In: Proceedings of the 15th Annual ACM International Symposium on Advances in Geographic Information Systems, pp. 46:1–46:4. ACM, New York (2007) Ioup, E., Shaw, K., Sample, J., Abdelguerfi, M.: Efficient AKNN spatial network queries using the m-tree. In: Proceedings of the 15th Annual ACM International Symposium on Advances in Geographic Information Systems, pp. 46:1–46:4. ACM, New York (2007)
16.
Zurück zum Zitat Lee, K., Ganti, R.K., Srivatsa, M., Liu, L.: Efficient spatial query processing for big data. In: Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 469–472. ACM, New York (2014) Lee, K., Ganti, R.K., Srivatsa, M., Liu, L.: Efficient spatial query processing for big data. In: Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 469–472. ACM, New York (2014)
17.
Zurück zum Zitat Lu, W., Shen, Y., Chen, S., Ooi, B.C.: Efficient processing of k nearest neighbor joins using mapreduce. Proc. VLDB Endow. 5, 1016–1027 (2012)CrossRef Lu, W., Shen, Y., Chen, S., Ooi, B.C.: Efficient processing of k nearest neighbor joins using mapreduce. Proc. VLDB Endow. 5, 1016–1027 (2012)CrossRef
18.
Zurück zum Zitat Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets. Cambridge University Press, New York (2011)CrossRef Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets. Cambridge University Press, New York (2011)CrossRef
19.
Zurück zum Zitat Roussopoulos, N., Kelley, S., Vincent, F.: Nearest neighbor queries. In: Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, pp. 71–79. ACM, New York (1995) Roussopoulos, N., Kelley, S., Vincent, F.: Nearest neighbor queries. In: Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, pp. 71–79. ACM, New York (1995)
20.
21.
Zurück zum Zitat Stupar, A., Michel, S., Schenkel, R.: RankReduce - processing k-nearest neighbor queries on top of MapReduce. In: Proceedings of the 8th Workshop on Large-Scale Distributed Systems for Information Retrieval, pp. 13–18 (2010) Stupar, A., Michel, S., Schenkel, R.: RankReduce - processing k-nearest neighbor queries on top of MapReduce. In: Proceedings of the 8th Workshop on Large-Scale Distributed Systems for Information Retrieval, pp. 13–18 (2010)
23.
Zurück zum Zitat Tsoumakos, D., Konstantinou, I., Boumpouka, C., Sioutas, S., Koziris, N.: Automated, elastic resource provisioning for nosql clusters using tiramola. In: Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, pp. 34–41 (2013) Tsoumakos, D., Konstantinou, I., Boumpouka, C., Sioutas, S., Koziris, N.: Automated, elastic resource provisioning for nosql clusters using tiramola. In: Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, pp. 34–41 (2013)
24.
Zurück zum Zitat Vernica, R., Carey, M.J., Li, C.: Efficient parallel set-similarity joins using MapReduce. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 495–506. ACM, New York (2010) Vernica, R., Carey, M.J., Li, C.: Efficient parallel set-similarity joins using MapReduce. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 495–506. ACM, New York (2010)
25.
Zurück zum Zitat White, T.: Hadoop: The Definitive Guide, 3rd edn. O’Reilly Media/Yahoo Press (2012) White, T.: Hadoop: The Definitive Guide, 3rd edn. O’Reilly Media/Yahoo Press (2012)
26.
Zurück zum Zitat Xia, C., Lu, H., Chin, B., Hu, O.J.: Gorder: An efficient method for KNN join processing. In: VLDB, pp. 756–767. VLDB Endowment (2004) Xia, C., Lu, H., Chin, B., Hu, O.J.: Gorder: An efficient method for KNN join processing. In: VLDB, pp. 756–767. VLDB Endowment (2004)
27.
Zurück zum Zitat Yao, B., Li, F., Kumar, P.: K nearest neighbor queries and KNN-joins in large relational databases (almost) for free. In: Proceedings of the 26th International Conference on Data Engineering, pp. 4–15. IEEE Computer Society, Washington, DC (2010) Yao, B., Li, F., Kumar, P.: K nearest neighbor queries and KNN-joins in large relational databases (almost) for free. In: Proceedings of the 26th International Conference on Data Engineering, pp. 4–15. IEEE Computer Society, Washington, DC (2010)
28.
Zurück zum Zitat Yokoyama, T., Ishikawa, Y., Suzuki, Y.: Processing all k-nearest neighbor queries in hadoop. In: Gao, H., Lim, L., Wang, W., Li, C., Chen, L. (eds.) WAIM 2012. LNCS, vol. 7418, pp. 346–351. Springer, Heidelberg (2012)CrossRef Yokoyama, T., Ishikawa, Y., Suzuki, Y.: Processing all k-nearest neighbor queries in hadoop. In: Gao, H., Lim, L., Wang, W., Li, C., Chen, L. (eds.) WAIM 2012. LNCS, vol. 7418, pp. 346–351. Springer, Heidelberg (2012)CrossRef
29.
Zurück zum Zitat Yu, C., Cui, B., Wang, S., Su, J.: Efficient index-based KNN join processing for high-dimensional data. Inf. Softw. Technol. 49, 332–344 (2007)CrossRef Yu, C., Cui, B., Wang, S., Su, J.: Efficient index-based KNN join processing for high-dimensional data. Inf. Softw. Technol. 49, 332–344 (2007)CrossRef
30.
Zurück zum Zitat Zhang, C., Li, F., Jestes, J.: Efficient parallel kNN joins for large data in MapReduce. In: Proceedings of the 15th International Conference on Extending Database Technology, pp. 38–49. ACM, New York (2012) Zhang, C., Li, F., Jestes, J.: Efficient parallel kNN joins for large data in MapReduce. In: Proceedings of the 15th International Conference on Extending Database Technology, pp. 38–49. ACM, New York (2012)
31.
Zurück zum Zitat Zhang, J., Mamoulis, N., Papadias, D., Tao, Y.: All-nearest-neighbors queries in spatial databases. In: Proceedings of the 16th International Conference on Scientific and Statistical Database Management, pp. 297–306. IEEE Computer Society, Washington, DC (2004) Zhang, J., Mamoulis, N., Papadias, D., Tao, Y.: All-nearest-neighbors queries in spatial databases. In: Proceedings of the 16th International Conference on Scientific and Statistical Database Management, pp. 297–306. IEEE Computer Society, Washington, DC (2004)
Metadaten
Titel
kdANN+: A Rapid AkNN Classifier for Big Data
verfasst von
Nikolaos Nodarakis
Evaggelia Pitoura
Spyros Sioutas
Athanasios Tsakalidis
Dimitrios Tsoumakos
Giannis Tzimas
Copyright-Jahr
2016
Verlag
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-49214-7_5

Neuer Inhalt