nach oben

World Wide Web

Erschienen in:

13.05.2016

Multiple k nearest neighbor search

verfasst von: Yu-Chi Chung, I-Fang Su, Chiang Lee, Pei-Chi Liu

Erschienen in: World Wide Web | Ausgabe 2/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

The problem of kNN (k Nearest Neighbor) queries has received considerable attention in the database and information retrieval communities. Given a dataset D and a kNN query q, the k nearest neighbor algorithm finds the closest k data points to q. The applications of kNN queries are board, not only in spatio-temporal databases but also in many areas. For example, they can be used in multimedia databases, data mining, scientific databases and video retrieval. The past studies of kNN query processing did not consider the case that the server may receive multiple kNN queries at one time. Their algorithms process queries independently. Thus, the server will be busy with continuously reaccessing the database to obtain the data that have already been acquired. This results in wasting I/O costs and degrading the performance of the whole system. In this paper, we focus on this problem and propose an algorithm named COrrelated kNN query Evaluation (COKE). The main idea of COKE is an “information sharing” strategy whereby the server reuses the query results of previously executed queries for efficiently processing subsequent queries. We conduct a comprehensive set of experiments to analyze the performance of COKE and compare it with the Best-First Search (BFS) algorithm. Empirical studies indicate that COKE outperforms BFS, and achieves lower I/O costs and less running time.

Vorheriger Artikel Identifying diverse reviews about products

Nächster Artikel A fast method to exactly calculate the diameter of incremental disconnected graphs

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Foursquer. https://foursquare.com/

Google+. https://plus.google.com/

The north east dataset. http://www.rtreeportal.org/

Yelp. http://www.yelp.com/

Bohm, C., Ooi, B. C., Plant, C., Yan, Y.: Efficiently Processing Continuous K-Nn Queries on Data Streams. In: Proceedings of the International Conference on Data Engineering (ICDE), pp. 156–165 (2007)

Braunmuller, B., Ester, M., Kriegel, H.P., Sander, J.: Multiple similarity queries: a basic dbms operation for mining in metric databases. IEEE Trans. Knowl. Data Eng. 13(1), 79–95 (2001)CrossRef

Chávez, E., Navarro, G., Baeza-Yates, R., Marroguin, J.L.: Searching in metric spaces. J. ACM Comput. Surv. (CSUR) 33(3), 273–321 (2001)CrossRef

Cui, B., Ooi, B.C., Su, J., Tan, K.L.: Indexing high-dimensional data for efficient in-memory similarity search. IEEE Trans. Knowl. Data Eng. 17(3), 339–353 (2005)CrossRef

Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R.: Advances in knowledge discovery and data mining (1996)

10.

Hjaltason, G.R., Samet, H.: Distance browsing in spatial databases. ACM Trans. Database Syst. (TODS) 24(2), 265–318 (2010)CrossRef

11.

Jagadish, H.V., Ooi, B.C., lee Tan, K., Yu, C., Zhang, R.: Idistance: An adaptive b+-tree based indexing method for nearest neighbor search. ACM Trans. Database Syst. (TODS) 30(2), 364–397 (2005)CrossRef

12.

Jiazhu, D., Zhilong, L.: A Location Authentication Scheme Based on Proximity Test of Location Tags. In: Proceedings of 2013 International Conference on Information and Network Security (ICINS 2013), pp. 1–6 (2013)

13.

Kolahdouzan, M., Shahabi, C.: Voronoi-Based K Nearest Neighbor Search for Spatial Network Databases. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases, pp. 840–851 (2004)

14.

Korn, F., Muthukrishnan, S.: Influence Sets Based on Reverse Nearest Neighbor Queries. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 201–212 (2000)

15.

Korn, F., Sidiropoulos, N., Faloutsos, C., Siegel, E., Protopapas, Z.: Fast Nearest Neighbor Search in Medical Image Databases (1996). CA, USA

16.

Lu, H., Ooi, B.C., Shen, H.T., Xue, X.: Hierarchical indexing structure for efficient similarity search in video retrieval. IEEE Trans. Knowl. Data Eng. 18(11), 1544–1559 (2006)CrossRef

17.

Lv, Q., Josephson, W., Wang, Z., Charikar, M., Li, K.: Multi-Probe Lsh: Efficient Indexing for High-Dimensional Similarity Search. In: Proceedings of the 33Rd International Conference on Very Data Bases, pp. 950–961 (2007)

18.

Mokbel, M.F., Xiong, X., Aref, W.G.: Sina: Scalable Incremental Processing of Continuous Queries in Spatio-Temporal Databases. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, pp. 623–634 (2004)

19.

Mouratidis, K., Papadias, D.: Continuous nearest neighbor queries over sliding windows. IEEE Trans. Knowl. Data Eng. 19(6), 789–803 (2007)CrossRef

20.

Papadias, D., Tao, Y., Mouratidis, K., Hui, C.K.: Aggregate nearest neighbor queries in spatial databases. ACM Trans. Database Syst. 30(2), 529–576 (2005)CrossRef

21.

Rao, J., Ross, K.A.: Making B+-Trees Cache Conscious in Main Memory. In: ACM SIGMOD Record, vol. 29, pp. 475–486 (2000)

22.

Roussopoulos, N., Kelly, S., Vncent, F.: Nearest Neighbor Queries. In: ACM SIGMOD International Conference on Management of Data. New Your, USA, pp. 71–79 (1995)

23.

Saraiva, P.C., de Moura, E.S., Ziviani, N., andRodrigo Fonseca, W.M., Riberio-Neto, B.: Rank-Preserving Two-Level Caching for Scalable Search Engines. In: Proceedings of the 24Th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 51–58 (2001)

24.

Seidl, T., Kriegel, H.P.: Efficient User-Adaptable Similarity Search in Large Multimedia Database. In: Proceedings of the 23Rd International Conference on Very Large Data Bases, pp. 506–515 (1997)

25.

Sharifzadeh, M., Shahabi, C.: Vor-Tree: R-trees with Voronoi Diagrams for Efficient Processing of Spatial Nearest Neighbor Queries. In: Proceedings of the VLDB Endowment (2010)

26.

Tao, Y., Papadias, D., Lian, X.: Reverse Knn Search in Arbitrary Dimensionality. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30, pp. 744–755 (2004)

27.

Tao, Y., Yi, K., Sheng, C., Kalnis, P.: Quality and Efficiency in High Dimensional Nearest Neighbor Search. In: ACM SIGMOD (2009)

28.

Teevan, J., Adar, E., Jones, R., Potts, M.: History Repeats Itself: Repeat Queries in Yahoo’s Query Logs. In: Proceedings of the 29Th Annual ACM Conference on Research and Development in Information Retrieval (SIGIR ’06), pp. 703–704 (2005)

29.

Weber, R., Schek, H., Blott, S.: A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. In: VLDB, pp. 194–205 (1998)

30.

Willhalm, T., Popovici, N., Boshmaf, Y., Plattner, H., Zeier, A., Schaffner, J.: Simd-scan: ultra fast in-memory table scan using on-chip vector processing units. Proc. VLDB Endowment 2(1), 385–394 (2009)CrossRef

31.

Xie, Y., O’Hallaron, D.: Locality in Search Engine Queries and Its Implications for Caching. In: Proceedings of 21 Annual Joint Conference of the IEEE Computer and Communications Societies (Infocom), pp. 1238–1247 (2002)

32.

Xiong, X., Mokbel, M.F., Aref, W.G.: Sea-Cnn: Scalable Processing of Continuous K-Nearest Neighbor Queries in Spatio-Temporal Databases. In: Proceedings of the 21St International Conference on Data Engineering, pp. 643–654 (2005)

33.

Yang, C., Lin, K.I.: An Index Structure for Efficient Reverse Nearest Neighbor Queries. In: Proceedings of the 2001 International Conference on Data Engineering, pp. 485–492 (2001)

34.

Zhang, J., Zhu, M., Papadias, D., Tao, Y., Lee, D.L.: Location-Based Spatial Queries. In: ACM SIGMOD Int. Conf. Manag. Data. NY, USA, pp. 443–454 (2003)

35.

Zhang, R., Stradling, M.: The hv-tree: a memory hierarchy aware version index. Proc. VLDB Endowment 3(1-2), 397–408 (2010)CrossRef

36.

Zhuang, Y., Li, Q., Chen, L.: Multi-Query Optimization for Distributed Similarity Query Processing. In: International Conference on Distributed Computing Systems (ICDCS) (2008)

Titel: Multiple k nearest neighbor search
verfasst von: Yu-Chi Chung
I-Fang Su
Chiang Lee
Pei-Chi Liu
Publikationsdatum: 13.05.2016
Verlag: Springer US
Erschienen in: World Wide Web / Ausgabe 2/2017
Print ISSN: 1386-145X
Elektronische ISSN: 1573-1413
DOI: https://doi.org/10.1007/s11280-016-0392-2

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Weitere Artikel der Ausgabe 2/2017

A probabilistic method for emerging topic tracking in Microblog stream

Aspect term extraction for sentiment analysis in large movie reviews using Gini Index feature selection method and SVM classifier

Twitter summarization with social-temporal context

Categorical top-k spatial influence query

Adaptive task scheduling strategy in cloud: when energy consumption meets performance guarantee

Identifying diverse reviews about products