Skip to main content
Top
Published in: Knowledge and Information Systems 3/2015

01-06-2015 | Regular Paper

TDEP: efficiently processing top-k dominating query on massive data

Authors: Xixian Han, Jianzhong Li, Hong Gao

Published in: Knowledge and Information Systems | Issue 3/2015

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

In many applications including multicriteria decision making, top-k dominating query is a practically useful tool to return k tuples with the highest domination scores in a potentially huge data space. The existing algorithms, either requiring indexes built on the specific attribute subset, or incurring high I/O cost or memory cost, cannot process top-k dominating query on massive data efficiently. In this paper, a novel algorithm TDEP is proposed to utilize sorted lists built for each attribute with low cost to return top-k dominating result on massive data efficiently. Through analysis, it is found that TDEP can be divided into two phases: growing phase and shrinking phase. In each phase, TDEP retrieves the sorted lists in round-robin fashion and maintains the candidates until the stop condition is satisfied. The theoretical analysis is provided for the execution behavior in two phases. An efficient method is developed to compute the domination scores of tuples with the obtained candidates only. Besides, TDEP adopts early pruning to reduce the number of candidate tuples maintained significantly. The extensive experimental results, conducted on synthetic and real-life data sets, show the significant performance advantage of TDEP over the existing algorithms.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Footnotes
1
In this paper, we consider that the tuples are in general position [24].
 
2
In the following section, we will see that the practical probability is even much higher.
 
Literature
1.
go back to reference Akbarinia R, Pacitti E, Valduriez P (2007) Best position algorithms for top-k queries. In: Proceedings of the 33rd international conference on very large data bases, pp 495–506 Akbarinia R, Pacitti E, Valduriez P (2007) Best position algorithms for top-k queries. In: Proceedings of the 33rd international conference on very large data bases, pp 495–506
2.
go back to reference Bloom B (1970) Space/time trade-offs in hash coding with allowable errors. Commun ACM 13(7):422–426CrossRefMATH Bloom B (1970) Space/time trade-offs in hash coding with allowable errors. Commun ACM 13(7):422–426CrossRefMATH
3.
go back to reference Börzsönyi S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the 17th international conference on data engineering, pp 421–430 Börzsönyi S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the 17th international conference on data engineering, pp 421–430
4.
go back to reference Chomicki J, Godfrey P, Gryz J, Liang D (2003) Skyline with presorting. In: Proceedings of the 19th international conference on data engineering, pp 717–719 Chomicki J, Godfrey P, Gryz J, Liang D (2003) Skyline with presorting. In: Proceedings of the 19th international conference on data engineering, pp 717–719
5.
go back to reference Fagin R, Lotem A, Naor M (2001) Optimal aggregation algorithms for middleware. In: Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 102–113 Fagin R, Lotem A, Naor M (2001) Optimal aggregation algorithms for middleware. In: Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 102–113
7.
go back to reference Fan H, Zaïane O, Foss A, Wu J (2009) Resolution-based outlier factor: detecting the top-n most outlying data points in engineering data. Knowl Inf Syst 19(1):31–51CrossRef Fan H, Zaïane O, Foss A, Wu J (2009) Resolution-based outlier factor: detecting the top-n most outlying data points in engineering data. Knowl Inf Syst 19(1):31–51CrossRef
8.
go back to reference Feller W (1968) An introduction to probability theory, vol 1, 3rd edn. Wiley, New YorkMATH Feller W (1968) An introduction to probability theory, vol 1, 3rd edn. Wiley, New YorkMATH
9.
go back to reference Godfrey P, Shipley R, Gryz J (2007) Algorithms and analyses for maximal vector computation. VLDB J 16(1):5–28CrossRef Godfrey P, Shipley R, Gryz J (2007) Algorithms and analyses for maximal vector computation. VLDB J 16(1):5–28CrossRef
10.
go back to reference Güntzer U, Balke W, Kießling W (2000) Optimizing multi-feature queries for image databases. In: Proceedings of the 26th international conference on very large data bases, pp 419–428 Güntzer U, Balke W, Kießling W (2000) Optimizing multi-feature queries for image databases. In: Proceedings of the 26th international conference on very large data bases, pp 419–428
11.
go back to reference Güntzer U, Balke W, Kießling W (2001) Towards efficient multi-feature queries in heterogeneous environments. In: Proceedings of the international conference on information technology: coding and computing, pp 622–628 Güntzer U, Balke W, Kießling W (2001) Towards efficient multi-feature queries in heterogeneous environments. In: Proceedings of the international conference on information technology: coding and computing, pp 622–628
12.
13.
go back to reference Han X, Li J, Wang J, Yang D (2013) Efficient skyline computation on big data. IEEE Trans Knowl Data Eng 25(11):2521–2535CrossRef Han X, Li J, Wang J, Yang D (2013) Efficient skyline computation on big data. IEEE Trans Knowl Data Eng 25(11):2521–2535CrossRef
14.
go back to reference Hose K, Vlachou A (2012) A survey of skyline processing in highly distributed environments. VLDB J 21(3):359–384CrossRef Hose K, Vlachou A (2012) A survey of skyline processing in highly distributed environments. VLDB J 21(3):359–384CrossRef
15.
go back to reference Huang Z, Sun S, Wang W (2010) Efficient mining of skyline objects in subspaces over data streams. Knowl Inf Syst 22(2):159–183CrossRef Huang Z, Sun S, Wang W (2010) Efficient mining of skyline objects in subspaces over data streams. Knowl Inf Syst 22(2):159–183CrossRef
16.
go back to reference Ilyas I, Beskales G, Soliman M (2008) A survey of top-k query processing techniques in relational database systems. ACM Comput Surv 40(4):11CrossRef Ilyas I, Beskales G, Soliman M (2008) A survey of top-k query processing techniques in relational database systems. ACM Comput Surv 40(4):11CrossRef
17.
go back to reference Kontaki M, Papadopoulos A, Manolopoulos Y (2012) Continuous top-k dominating queries. IEEE Trans Knowl Data Eng 24(5):840–853CrossRef Kontaki M, Papadopoulos A, Manolopoulos Y (2012) Continuous top-k dominating queries. IEEE Trans Knowl Data Eng 24(5):840–853CrossRef
18.
go back to reference Kossmann D, Ramsak F, Rost S (2002) Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of the 28th international conference on very large data bases, pp 275–286 Kossmann D, Ramsak F, Rost S (2002) Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of the 28th international conference on very large data bases, pp 275–286
19.
go back to reference Lian X, Chen L (2009) Top-k dominating queries in uncertain databases. In: Proceedings of the 12th international conference on extending database technology: advances in database technology, pp 660–671 Lian X, Chen L (2009) Top-k dominating queries in uncertain databases. In: Proceedings of the 12th international conference on extending database technology: advances in database technology, pp 660–671
20.
go back to reference Mamoulis N, Yiu M, Cheng K, Cheung D (2007) Efficient top-k aggregation of ranked inputs. ACM Trans Database Syst 32(3):19CrossRef Mamoulis N, Yiu M, Cheng K, Cheung D (2007) Efficient top-k aggregation of ranked inputs. ACM Trans Database Syst 32(3):19CrossRef
21.
go back to reference Pang H, Ding X, Zheng B (2010) Efficient processing of exact top-k queries over disk-resident sorted lists. VLDB J 19(3):437–456CrossRef Pang H, Ding X, Zheng B (2010) Efficient processing of exact top-k queries over disk-resident sorted lists. VLDB J 19(3):437–456CrossRef
22.
go back to reference Papadias D, Tao Y, Fu G, Seeger B (2005) Progressive skyline computation in database systems. ACM Trans Database Syst 30(1):41–82CrossRef Papadias D, Tao Y, Fu G, Seeger B (2005) Progressive skyline computation in database systems. ACM Trans Database Syst 30(1):41–82CrossRef
23.
go back to reference Salam A, Khayal M (2012) Mining top-k frequent patterns without minimum support threshold. Knowl Inf Syst 30(1):57–86CrossRef Salam A, Khayal M (2012) Mining top-k frequent patterns without minimum support threshold. Knowl Inf Syst 30(1):57–86CrossRef
24.
go back to reference Sheng C, Tao Y (2011) On finding skylines in external memory. In: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 107–116 Sheng C, Tao Y (2011) On finding skylines in external memory. In: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 107–116
25.
go back to reference Skoutas D, Sacharidis D, Simitsis A, Kantere V, Sellis T (2009) Top-k dominant web services under multi-criteria matching. In: Proceedings of the 12th international conference on extending database technology: advances in database technology, pp 898–909 Skoutas D, Sacharidis D, Simitsis A, Kantere V, Sellis T (2009) Top-k dominant web services under multi-criteria matching. In: Proceedings of the 12th international conference on extending database technology: advances in database technology, pp 898–909
26.
go back to reference Sun S, Huang Z, Zhong H, Dai D, Liu H, Li J (2010) Efficient monitoring of skyline queries over distributed data streams. Knowl Inf Syst 25(3):575–606CrossRef Sun S, Huang Z, Zhong H, Dai D, Liu H, Li J (2010) Efficient monitoring of skyline queries over distributed data streams. Knowl Inf Syst 25(3):575–606CrossRef
27.
go back to reference Tan K, Eng P, Ooi B (2001) Efficient progressive skyline computation. In: Proceedings of the 27th international conference on very large data bases, pp 301–310 Tan K, Eng P, Ooi B (2001) Efficient progressive skyline computation. In: Proceedings of the 27th international conference on very large data bases, pp 301–310
28.
go back to reference Tiakas E, Papadopoulos A, Manolopoulos Y (2011) Progressive processing of subspace dominating queries. VLDB J 20(6):921–948CrossRef Tiakas E, Papadopoulos A, Manolopoulos Y (2011) Progressive processing of subspace dominating queries. VLDB J 20(6):921–948CrossRef
29.
go back to reference Yang B, Huang H (2010) Topsil-miner: an efficient algorithm for mining top-k significant itemsets over data streams. Knowl Inf Syst 23(2):225–242CrossRef Yang B, Huang H (2010) Topsil-miner: an efficient algorithm for mining top-k significant itemsets over data streams. Knowl Inf Syst 23(2):225–242CrossRef
30.
go back to reference Yiu M, Mamoulis N (2007) Efficient processing of top-k dominating queries on multi-dimensional data. In: Proceedings of the 33rd international conference on very large data, bases, pp 483–494 Yiu M, Mamoulis N (2007) Efficient processing of top-k dominating queries on multi-dimensional data. In: Proceedings of the 33rd international conference on very large data, bases, pp 483–494
31.
go back to reference Yiu M, Mamoulis N (2009) Multi-dimensional top-k dominating queries. VLDB J 18(3):695–718CrossRef Yiu M, Mamoulis N (2009) Multi-dimensional top-k dominating queries. VLDB J 18(3):695–718CrossRef
32.
go back to reference Zhang W, Lin X, Zhang Y, Pei J, Wang W (2010) Threshold-based probabilistic top-k dominating queries. VLDB J 19(2):283–305CrossRef Zhang W, Lin X, Zhang Y, Pei J, Wang W (2010) Threshold-based probabilistic top-k dominating queries. VLDB J 19(2):283–305CrossRef
33.
go back to reference Zipf G (1949) Human behaviour and the principle of least-effort. Addison-Wesley, Cambridge Zipf G (1949) Human behaviour and the principle of least-effort. Addison-Wesley, Cambridge
Metadata
Title
TDEP: efficiently processing top-k dominating query on massive data
Authors
Xixian Han
Jianzhong Li
Hong Gao
Publication date
01-06-2015
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 3/2015
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-013-0728-5

Other articles of this Issue 3/2015

Knowledge and Information Systems 3/2015 Go to the issue

Premium Partner