ABSTRACT
In the age of data collection, machine learning algorithms have to be able to efficiently cope with vast data sets. This requires scalable algorithms and efficient implementations that can cope with heterogeneous hardware. We propose a new, performance-portable implementation of a well-known, robust, and versatile multi-class classification method that supports multiple Graphics Processing Units (GPUs) from different vendors. It is based on a performance-portable implementation of the approximate k-nearest neighbors (k-NN) algorithm in SYCL. The k-NN assigns a class to a data point based on a majority vote of its neighborhood. The naive approach compares a data point x to all other data points in the training data to identify the k nearest ones. However, this has quadratic runtime and is infeasible for large data sets. Therefore, approximate variants have been developed. Such an algorithm is the Locality-Sensitive Hashing (LSH) algorithm, which uses hash tables together with locality-sensitive hash functions to reduce the data points that have to be examined to compute the k-NN.
To the best of our knowledge, there is no distributed LSH version supporting multiple GPUs from different vendors available so far despite the fact that k-NNs are frequently employed. Therefore, we have developed the library. It provides the first hardware-independent, yet efficient and distributed implementation of the LSH algorithm that is suited for modern supercomputers. The implementation uses C++17 together with SYCL 1.2.1, which is an abstraction layer for OpenCL that allows targeting different hardware with a single implementation. To support large data sets, we utilize multiple GPUs using the Message Passing Interface (MPI) to enable the usage of both shared and distributed memory systems.
We have tested different parameter combinations for two locality-sensitive hash function implementations, which we compare. Our results show that our library can easily scale on multiple GPUs using both hash function types, achieving a nearly optimal parallel speedup of up to 7.6 on 8 GPUs. Furthermore, we demonstrate that the library supports different SYCL implementations—ComputeCpp, hipSYCL, and DPC++—to target different hardware architectures without significant performance differences.
- Aksel Alpay and Vincent Heuveline. 2020. SYCL beyond OpenCL: The architecture, current state and future direction of hipSYCL. In Proceedings of the International Workshop on OpenCL. ACM. https://doi.org/10.1145/3388333.3388658Google ScholarDigital Library
- Alexandr Andoni, Ilya Razenshteyn, and Negev Shekel Nosatzki. 2017. LSH Forest: Practical Algorithms Made Theoretical. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9781611974782.5Google ScholarCross Ref
- Bahman Bahmani, Ashish Goel, and Rajendra Shinde. 2012. Efficient distributed locality sensitive hashing. In Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12. ACM Press. https://doi.org/10.1145/2396761.2398596Google ScholarDigital Library
- P. Baldi, P. Sadowski, and D. Whiteson. 2014. Searching for exotic particles in high-energy physics with deep learning. Nature Communications 5, 1 (jul 2014).Google ScholarCross Ref
- Mayank Bawa, Tyson Condie, and Prasanna Ganesan. 2005. LSH forest. In Proceedings of the 14th international conference on World Wide Web - WWW '05. ACM Press. https://doi.org/10.1145/1060745.1060840Google ScholarDigital Library
- David Beckingsale, Richard Hornung, Tom Scogland, and Arturo Vargas. 2019. Performance portable C++ programming with RAJA. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming. 455–456.Google ScholarDigital Library
- Jon Louis Bentley. 1975. Multidimensional binary search trees used for associative searching. Commun. ACM 18, 9 (sep 1975), 509–517. https://doi.org/10.1145/361002.361007Google ScholarDigital Library
- Thomas Cover and P. Hart. 1967. Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13, 1 (Jan. 1967), 7 pages. https://doi.org/10.1109/tit.1967.1053964Google ScholarDigital Library
- Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. 2004. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the twentieth annual ACM symposium on Computational geometry. ACM Press.Google ScholarDigital Library
- H. Carter Edwards, Christian R. Trott, and Daniel Sunderland. 2014. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns. J. Parallel and Distrib. Comput. 74, 12 (2014), 3202 – 3216. https://doi.org/10.1016/j.jpdc.2014.07.003Domain-Specific Languages and High-Level Frameworks for High-Performance Computing.Google ScholarDigital Library
- Ioannis Z. Emiris, Alexandros Konstantinakis-Karmis, Dimitri Nicolopoulos, and A. Thanos-Filis. 2012. Data structures for approximate nearest neighbor search. Techn. Ber. Technical Report CGLTR-29, NKUA(2012).Google Scholar
- Ioannis Z. Emiris and Dimitri Nicolopoulos. 2013. Randomized kd-trees for approximate nearest neighbor search. CGL-TR-78, NKUA, Tech. Rep.(2013).Google Scholar
- Jerome H. Friedman. 1991. Multivariate Adaptive Regression Splines. The Annals of Statistics 19, 1 (mar 1991), 1–67.Google ScholarCross Ref
- Jerome H. Friedman, Jon Louis Bentley, and Raphael Ari Finkel. 1977. An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Trans. Math. Software 3, 3 (sep 1977), 209–226. https://doi.org/10.1145/355744.355745Google ScholarDigital Library
- Parisa Haghani, Sebastian Michel, and Karl Aberer. 2009. Distributed similarity search in high dimensions using locality sensitive hashing. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. 744–755.Google ScholarDigital Library
- Parisa Haghani, Sebastian Michel, Karl Aberer, 2008. Lsh at large-distributed knn search in high dimensions. In 11th International Workshop on the Web and Databases, WebDB.Google Scholar
- Jared Hoberock, Michael Garland, Chris Kohlhoff, Chris Mysen, Carter Edwards, Gordon Brown, David Hollman, Lee Howes, Kirk Shoop, Lewis Baker, and Eric Niebler. 2020. A Unified Executors Proposal for C++ (P0443R14). Technical Report.Google Scholar
- Jared Hoberock, Michael Garland, Chris Kohlhoff, Chris Mysen, Carter Edwards, Gordon Brown, and Michael Wong. 2018. Executors Design Document (P0761R2). Technical Report.Google Scholar
- Jared Hoberock and Chris Kohlhoff. 2019. C++20 Executors are Resilient to ABI Breakage (P1405R0). Technical Report.Google Scholar
- Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors. In Proceedings of the thirtieth annual ACM symposium on Theory of computing.Google ScholarDigital Library
- Nicolai Josuttis. 2018. hash_combine() Again (P0814R2). Proposal P0814R2.Google Scholar
- Beis J.S. and D.G. Lowe. [n.d.]. Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Comput. Soc. https://doi.org/10.1109/cvpr.1997.609451Google ScholarCross Ref
- Ronan Keryell, Maria Rovatsou, and Lee Howes. 2020. SYCL Specification 1.2.1: SYCL integrates OpenCL devices with modern C++. Khronos SYCL Working Group. Revision: 7.Google Scholar
- Khronos OpenCL Working Group. 2020. The OpenCL Specification.Google Scholar
- Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. 2014. Mining of Massive Datasets. Cambridge University Press. https://doi.org/10.1017/CBO9781139924801Google ScholarCross Ref
- Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. 2007. Multi-probe LSH: efficient indexing for high-dimensional similarity search. In Proceedings of the 33rd international conference on Very large data bases. 950–961.Google Scholar
- Marius Muja and David G. Lowe. 2009. Fast approximate nearest neighbors with automatic algorithm configuration.VISAPP (1) 2, 331-340 (2009), 2.Google Scholar
- Jia Pan and Dinesh Manocha. 2012. Bi-level Locality Sensitive Hashing for k-Nearest Neighbor Computation. In 2012 IEEE 28th International Conference on Data Engineering. IEEE. https://doi.org/10.1109/icde.2012.40Google ScholarDigital Library
- Rina Panigrahy. 2006. Entropy based nearest neighbor search in high dimensions. In Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm - SODA '06. ACM Press. https://doi.org/10.1145/1109557.1109688Google ScholarCross Ref
- Md Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Narayanan Sundaram, Jialin Liu, Peter Sadowski, Evan Racah, Suren Byna, Craig Tull, Wahid Bhimji, Pradeep Dubey, 2016. Panda: Extreme scale parallel k-nearest neighbor on distributed architectures. In 2016 IEEE international parallel and distributed processing symposium (IPDPS). IEEE, 494–503.Google Scholar
- Parikshit Ram and Kaushik Sinha. 2019. Revisiting kd-tree for Nearest Neighbor Search. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. https://doi.org/10.1145/3292500.3330875Google ScholarDigital Library
- Gomes da Silva Ramon, Matheus Henrique Dal Molin Ribeiro, Viviana Cocco Mariani, and Leandro dos Santos Coelho. 2020. Forecasting Brazilian and American COVID-19 cases based on artificial intelligence coupled with climatic exogenous variables. Chaos, Solitons & Fractals 139 (Oct. 2020), 110027 pages.Google Scholar
- Warda M. Shaban, Asmaa H. Rabie, Ahmed I. Saleh, and M.A. Abo-Elsoud. 2020. A new COVID-19 Patients Detection Strategy (CPDS) based on hybrid feature selection and enhanced KNN classifier. Knowledge-Based Systems 205 (Oct. 2020).Google Scholar
- Chanop Silpa-Anan and Richard Hartley. 2008. Optimised KD-trees for fast image descriptor matching. In 2008 IEEE Conference on Computer Vision and Pattern Recognition. IEEE. https://doi.org/10.1109/cvpr.2008.4587638Google ScholarCross Ref
- Narayanan Sundaram, Aizana Turmukhametova, Nadathur Satish, Todd Mostak, Piotr Indyk, Samuel Madden, and Pradeep Dubey. 2013. Streaming similarity search over one billion tweets using parallel locality-sensitive hashing. Proceedings of the VLDB Endowment 6, 14 (sep 2013), 1930–1941. https://doi.org/10.14778/2556549.2556574Google ScholarDigital Library
- Qiang Wang, Zhiyuan Guo, Gang Liu, and Jun Guo. 2012. Entropy based locality sensitive hashing. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE.Google ScholarCross Ref
- Roger Weber, Hans-Jörg Schek, and Stephen Blott. 1998. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In VLDB, Vol. 98. 194–205.Google Scholar
- Chi Zhang, Feifei Li, and Jeffrey Jestes. 2012. Efficient parallel kNN joins for large data in MapReduce. In Proceedings of the 15th international conference on extending database technology. 38–49.Google ScholarDigital Library
- Peter Zuzek. 2016. Implementation of the SYCL Heterogeneous Computing Library: Masters Thesis: the 2nd Cycle Masters Study Programme Computer and Information Science. (2016).Google Scholar
Index Terms
- Performance-Portable Distributed k-Nearest Neighbors using Locality-Sensitive Hashing and SYCL
Recommendations
Evaluating the performance portability of SYCL across CPUs and GPUs on bandwidth-bound applications
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and AnalysisIn this paper, we evaluate the portability of the SYCL programming model on some of the latest CPUs and GPUs from a wide range of vendors, utilizing the two main compilers: DPC++ and hipSYCL/OpenSYCL. Both compilers currently support GPUs from all three ...
Improving Performance Portability of the Procedurally Generated High Energy Physics Event Generator MadGraph Using SYCL
IWOCL '24: Proceedings of the 12th International Workshop on OpenCL and SYCLEvent Generators are essential tools for simulating Standard Model particle interactions, representing the initial step in modeling proton-proton collisions in the Large Hadron Collider (LHC) at CERN. Traditionally relying on a few algorithms like ...
SYCL-Bench 2020: Benchmarking SYCL 2020 on AMD, Intel, and NVIDIA GPUs
IWOCL '24: Proceedings of the 12th International Workshop on OpenCL and SYCLToday, the SYCL standard represents the most advanced programming model for heterogeneous computing, delivering both productivity, portability, and performance in pure C++17. SYCL 2020, in particular, represents a major enhancement that pushes the ...
Comments