skip to main content
10.1145/3167132.3167226acmconferencesArticle/Chapter ViewAbstractPublication PagessacConference Proceedingsconference-collections
research-article

Adaptive kNN using expected accuracy for classification of geo-spatial data

Published:09 April 2018Publication History

ABSTRACT

The k-Nearest Neighbor (kNN) classification approach is conceptually simple - yet widely applied since it often performs well in practical applications. However, using a global constant k does not always provide an optimal solution, e. g., for datasets with an irregular density distribution of data points. This paper proposes an adaptive kNN classifier where k is chosen dynamically for each instance (point) to be classified, such that the expected accuracy of classification is maximized. We define the expected accuracy as the accuracy of a set of structurally similar observations. An arbitrary similarity function can be used to find these observations. We introduce and evaluate different similarity functions. For the evaluation, we use five different classification tasks based on geo-spatial data. Each classification task consists of (tens of) thousands of items. We demonstrate, that the presented expected accuracy measures can be a good estimator for kNN performance, and the proposed adaptive kNN classifier outperforms common kNN and previously introduced adaptive kNN algorithms. Also, we show that the range of considered k can be significantly reduced to speed up the algorithm without negative influence on classification accuracy.

References

  1. Gianni Barlacchi, Marco De Nadai, Roberto Larcher, Antonio Casella, Cristiana Chitic, Giovanni Torrisi, Fabrizio Antonelli, Alessandro Vespignani, Alex Pentland, and Bruno Lepri. 2015. A multi-source dataset of urban life in the city of Milan and the Province of Trentino. Scientific Data 2 (Oct. 2015), 150055.Google ScholarGoogle Scholar
  2. Martin Becker, Saverio Caminiti, Donato Fiorella, Louise Francis, Pietro Gravino, Mordechai (Muki) Haklay, Andreas Hotho, Vittorio Loreto, Juergen Mueller, Ferdinando Ricchiuti, Vito D. P. Servedio, Alina Sîrbu, and Francesca Tria. 2013. Awareness and Learning in Participatory Noise Sensing. PLOS ONE 8, 12 (2013).Google ScholarGoogle Scholar
  3. Y. Fang, R. Cheng, W. Tang, S. Maniu, and X. Yang. 2016. Scalable Algorithms for Nearest-Neighbor Joins on Big Trajectory Data. IEEE Transactions on Knowledge and Data Engineering 28, 3 (March 2016), 785--800. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Tom Fawcett. 2006. An Introduction to ROC Analysis. Pattern Recogn. Lett. 27, 8 (June 2006), 861--874. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Evelyn Fix and Joseph L Hodges Jr. 1951. Discriminatory analysis-nonparametric discrimination: consistency properties. Technical Report. California Univ Berkeley.Google ScholarGoogle Scholar
  6. B Apexa Kamdar and K Ishan Rajani. 2015. Improved Adaptive K Nearest Neighbor algorithm using MapReduce. International Journal of Science, Engineering and Technology Research (IJSETR) 4, 6 (2015).Google ScholarGoogle Scholar
  7. David J. Ketchen and Christopher L. Shook. 1996. The Application of Cluster Analysis in Strategic Management Research: an Analysis and Critique. Strategic Management Journal 17, 6 (1996), 441--458. <441::AID-SMJ819>3.0.CO;2-GGoogle ScholarGoogle ScholarCross RefCross Ref
  8. S. B. Kotsiantis. 2007. Supervised Machine Learning: A Review of Classification Techniques. Informatica 31, 3 (2007). http://www.informatica.si/index.php/informatica/article/view/148Google ScholarGoogle Scholar
  9. Stefanos Ougiaroglou, Alexandros Nanopoulos, Apostolos N. Papadopoulos, Yannis Manolopoulos, and Tatjana Welzer-Druzovec. 2007. Adaptive k-Nearest-Neighbor Classification Using a Dynamic Number of Nearest Neighbors. In 11th East-European Conference on Advances in Databases and Information Systems (ADBIS 2007). Springer, 66--82. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Vandana Roy and Sriganesh Madhvanath. 2008. A Skew-tolerant Strategy and Confidence Measure for k-NN Classification of Online Handwritten Characters. Technical Report 4. Tech. rep., HP Laboratories. 48.Google ScholarGoogle Scholar
  11. Robin Senge, Stefan Bösner, Krzysztof Dembczynski, Jörg Haasenritter, Oliver Hirsch, Norbert Donner-Banzhoff, and Eyke Hüllermeier. 2014. Reliable Classification: Learning Classifiers that Distinguish Aleatoric and Epistemic Uncertainty. Inform. Sci. 255 (2014), 16--29. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Shiliang Sun and Rongqing Huang. 2010. An Adaptive k-Nearest Neighbor Algorithm. In 7th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2010), Vol. 1. IEEE, 91--94.Google ScholarGoogle ScholarCross RefCross Ref
  13. Jigang Wang, Predrag Neskovic, and Leon N. Cooper. 2006. Neighborhood Size Selection in the k-Nearest-Neighbor Rule Using Statistical Confidence. Pattern Recogn. 39, 3 (2006), 417--423. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Jigang Wang, Predrag Neskovic, and Leon N. Cooper. 2007. Improving Nearest Neighbor Rule with a Simple Adaptive Distance Measure. Pattern Recogn. Lett. 28, 2 (2007), 207--213. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Adaptive kNN using expected accuracy for classification of geo-spatial data

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SAC '18: Proceedings of the 33rd Annual ACM Symposium on Applied Computing
      April 2018
      2327 pages
      ISBN:9781450351911
      DOI:10.1145/3167132

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 9 April 2018

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate1,650of6,669submissions,25%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader