Skip to main content
Top

2020 | OriginalPaper | Chapter

LSR-Forest: An LSH-Based Approximate k-Nearest Neighbor Query Algorithm on High-Dimensional Uncertain Data

Authors : Jiagang Wang, Tu Qian, Anbang Yang, Hui Wang, Jiangbo Qian

Published in: Data Science

Publisher: Springer Singapore

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Uncertain data is widely used in many practical applications, such as data cleaning, location-based services, privacy protection and so on. With the development of technology, the data has a tendency to high-dimensionality. The most common indexes for nearest neighbor search on uncertain data are the R-Tree and the KD-Tree. These indexes will inevitably bring about “curse of dimension”. Focus on this problem, this paper proposes a new hash algorithm, called the LSR-forest, which based on the locality-sensitive hashing and the R-Tree, to solve the high-dimensional uncertain data approximate neighbor search problem. The LSR-forest can hash similar high dimensional uncertain data into a same bucket with a high probability, and then constructs multiple R-Tree-based indexes for hashed buckets. When querying, it is possible to judge neighbors by checking the data in the hypercube which the query point is in. One can also adjust the query range automatically by different parameter k. Many experiments on different data sets are presented in this paper. The results show that LSR-forest has better effectiveness and efficiency than R-Tree on high-dimensional datasets.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Xiaoye, M., Yunjun, G., Gang, C.: Processing incomplete k nearest neighbor search. IEEE Trans. Fuzzy Syst. 24(6), 1349–1363 (2016)CrossRef Xiaoye, M., Yunjun, G., Gang, C.: Processing incomplete k nearest neighbor search. IEEE Trans. Fuzzy Syst. 24(6), 1349–1363 (2016)CrossRef
2.
go back to reference Sistla, A., Wolfson, O., Xu, B.: Continuous nearest-neighbor queries with location uncertainty. VLDB 24(1), 25–50 (2015)CrossRef Sistla, A., Wolfson, O., Xu, B.: Continuous nearest-neighbor queries with location uncertainty. VLDB 24(1), 25–50 (2015)CrossRef
3.
go back to reference Jian, L., Haiao, W.: Range queries on uncertain data. Theoret. Comput. Sci. 609(1), 32–48 (2016)MathSciNet Jian, L., Haiao, W.: Range queries on uncertain data. Theoret. Comput. Sci. 609(1), 32–48 (2016)MathSciNet
4.
go back to reference Lin, J.C.W., Gan, W., Fournier-Viger, P., Hong, T.P., Chao, H.C.: Mining weighted frequent itemsets without candidate generation in uncertain databases. Int. J. Inf. Technol. Decis. Making 16(06), 1549–1579 (2017) Lin, J.C.W., Gan, W., Fournier-Viger, P., Hong, T.P., Chao, H.C.: Mining weighted frequent itemsets without candidate generation in uncertain databases. Int. J. Inf. Technol. Decis. Making 16(06), 1549–1579 (2017)
5.
go back to reference Ebrahimnejad, A., Tavana, M., Nasseri, S.H., Gholami, O.F.: A new method for solving dual DEA problems with fuzzy stochastic data. Int. J. Inf. Technol. Decis. Making 18(01), 147–170 (2019) Ebrahimnejad, A., Tavana, M., Nasseri, S.H., Gholami, O.F.: A new method for solving dual DEA problems with fuzzy stochastic data. Int. J. Inf. Technol. Decis. Making 18(01), 147–170 (2019)
6.
go back to reference Jianhua, J., Yujun, C., Xianqiu, M., Limin, W., Keqin, L.: A novel density peaks clustering algorithm based on k nearest neighbours for improving assignment process. Phys. A 523, 702–713 (2019)CrossRef Jianhua, J., Yujun, C., Xianqiu, M., Limin, W., Keqin, L.: A novel density peaks clustering algorithm based on k nearest neighbours for improving assignment process. Phys. A 523, 702–713 (2019)CrossRef
7.
go back to reference Jiang, J., Chen, Y., Hao, D., Li, K.: DPC-LG: density peaks clustering based on logistic distribution and gravitation. Phys. A 514, 25–35 (2019)CrossRef Jiang, J., Chen, Y., Hao, D., Li, K.: DPC-LG: density peaks clustering based on logistic distribution and gravitation. Phys. A 514, 25–35 (2019)CrossRef
8.
go back to reference Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: ACM SIGMOD International Conference on Management of Data, New York, NY, USA, vol. 14(2), pp. 47–57 (1984) Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: ACM SIGMOD International Conference on Management of Data, New York, NY, USA, vol. 14(2), pp. 47–57 (1984)
9.
go back to reference Peng, Y., Li, H., Cui, J.: An efficient range query model over encrypted outsourced data using secure k-d tree. In: International Conference on Networking and Network Applications, pp. 250–253 (2016) Peng, Y., Li, H., Cui, J.: An efficient range query model over encrypted outsourced data using secure k-d tree. In: International Conference on Networking and Network Applications, pp. 250–253 (2016)
10.
go back to reference Weber, R., Schek, H., Blot, S.: A quantitative analysis and performance study for similarity search methods in high-dimensional spaces. In: International Conference on Very Large Data Bases, New York, pp. 194–205 (1998) Weber, R., Schek, H., Blot, S.: A quantitative analysis and performance study for similarity search methods in high-dimensional spaces. In: International Conference on Very Large Data Bases, New York, pp. 194–205 (1998)
11.
go back to reference Zhenyun, D., Xiaoshu, Z., Debo, C., Ming, Z., Shichao, Z.: Efficient kNN classification algorithm for big data. Neurocomputing 195, 143–148 (2016)CrossRef Zhenyun, D., Xiaoshu, Z., Debo, C., Ming, Z., Shichao, Z.: Efficient kNN classification algorithm for big data. Neurocomputing 195, 143–148 (2016)CrossRef
12.
go back to reference Giyasettin Ozcan, F.: Unsupervised learning from multi-dimensional data: a fast clustering algorithm utilizing canopies and statistical information. Int. J. Inf. Technol. Decis. Making 17(03), 841–856 (2018)CrossRef Giyasettin Ozcan, F.: Unsupervised learning from multi-dimensional data: a fast clustering algorithm utilizing canopies and statistical information. Int. J. Inf. Technol. Decis. Making 17(03), 841–856 (2018)CrossRef
13.
go back to reference Cheng, R., Dmitri, V., Sunil, P.: Evaluating probabilistic queries over imprecise data. In: ACM SIGMOD International Conference on Management of Data, New York, USA, pp. 551–562 (2003) Cheng, R., Dmitri, V., Sunil, P.: Evaluating probabilistic queries over imprecise data. In: ACM SIGMOD International Conference on Management of Data, New York, USA, pp. 551–562 (2003)
14.
go back to reference Lianmeng, J., Xiaojiao, G., Quanpan, B.: KNN: k-nearest neighbor classifier with pairwise distance metrics and belief function theory. IEEE Access 7, 48935–48947 (2019)CrossRef Lianmeng, J., Xiaojiao, G., Quanpan, B.: KNN: k-nearest neighbor classifier with pairwise distance metrics and belief function theory. IEEE Access 7, 48935–48947 (2019)CrossRef
15.
go back to reference Ljosa, V., Singh, A.: APLA: indexing arbitrary probability distributions. In: Proceedings ICDE, Turkey, pp. 946–955 (2007) Ljosa, V., Singh, A.: APLA: indexing arbitrary probability distributions. In: Proceedings ICDE, Turkey, pp. 946–955 (2007)
16.
go back to reference Reynold, C., Prabhakar, S., Dmitri, V.: Querying imprecise data in moving object environments. IEEE Trans. Knowl. Data Eng. J. 16(9), 1112–1127 (2003) Reynold, C., Prabhakar, S., Dmitri, V.: Querying imprecise data in moving object environments. IEEE Trans. Knowl. Data Eng. J. 16(9), 1112–1127 (2003)
18.
go back to reference Reynold, C., Jinchuan, C., Mohamed, M.: Probabilistic verifiers: evaluating constrained nearest-neighbor queries over uncertain data. In: Proceedings of International Conference on Data Engineering (ICDE). Piscataway, NJ, pp. 973–982. IEEE (2008) Reynold, C., Jinchuan, C., Mohamed, M.: Probabilistic verifiers: evaluating constrained nearest-neighbor queries over uncertain data. In: Proceedings of International Conference on Data Engineering (ICDE). Piscataway, NJ, pp. 973–982. IEEE (2008)
19.
go back to reference Reynold, C., Lei, C., Jinchuan, C.: Evaluating probability threshold k-nearest-neighbor queries over uncertain data. In: Proceedings of International Conference on Extending Database Technology, New York, pp. 672–683 (2009) Reynold, C., Lei, C., Jinchuan, C.: Evaluating probability threshold k-nearest-neighbor queries over uncertain data. In: Proceedings of International Conference on Extending Database Technology, New York, pp. 672–683 (2009)
20.
go back to reference Gionis, A., Indyky, P., Motwaniz, R.: Similarity search in high dimensions via hashing. In: International Conference on Very Large Data Bases, Cairo, Egypt, pp. 518–529 (1998) Gionis, A., Indyky, P., Motwaniz, R.: Similarity search in high dimensions via hashing. In: International Conference on Very Large Data Bases, Cairo, Egypt, pp. 518–529 (1998)
Metadata
Title
LSR-Forest: An LSH-Based Approximate k-Nearest Neighbor Query Algorithm on High-Dimensional Uncertain Data
Authors
Jiagang Wang
Tu Qian
Anbang Yang
Hui Wang
Jiangbo Qian
Copyright Year
2020
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-15-2810-1_17

Premium Partner