Skip to main content
Erschienen in: Knowledge and Information Systems 3/2021

13.01.2021 | Regular Paper

Efficient computation of deletion-robust k-coverage queries

verfasst von: Jiping Zheng, Xingnan Huang, Yuan Ma

Erschienen in: Knowledge and Information Systems | Ausgabe 3/2021

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Extracting a controllable subset from a large-scale dataset so that users can fully understand the entire dataset is a significant topic for multicriteria decision making. In recent years, this problem has been widely studied, and various query models have been proposed, such as top-k, skyline, k-regret and k-coverage queries. Among these models, the k-coverage query is an ideal query method; this model has stability, scale invariance and high traversal efficiency. However, current methods including k-coverage queries cannot deal with deleting some points from the dataset while providing an effective solution set efficiently. In this paper, we study the robustness of k-coverage queries in two cases involving the dynamic deletion of data points. The first case is when it is assumed that the whole dataset can be obtained in advance, while the second is when the data points arrive in a stream. For a centralized dataset, we introduce a sieving mechanism and use a precalculated threshold to filter a coreset from the entire dataset. Then, the k-coverage query can be carried out on this small coreset instead of the entire dataset, and we propose a threshold-based k-coverage query algorithm, which greatly accelerates query processing. For a streaming dataset, a special chain structure is adopted. Furthermore, a single-pass streaming algorithm named Robust-Sieving is proposed. Moreover, the coreset-based method is extended to answer the problem. In addition, sampling techniques are adopted to accelerate query processing under these two circumstances. Extensive experiments verify the effectiveness of our proposed Robust-Sieving algorithm and the coreset-based algorithms with or without sampling.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Allenby R B, Slomson A (2010) How to count: an introduction to combinatorics, 2nd edn. Chapman & Hall/CRC, Boca RatonCrossRef Allenby R B, Slomson A (2010) How to count: an introduction to combinatorics, 2nd edn. Chapman & Hall/CRC, Boca RatonCrossRef
2.
Zurück zum Zitat Badanidiyuru A, Mirzasoleiman B, Karbasi A, Krause A (2014) Streaming submodular maximization: Massive data summarization on the fly. In: Proceedings of the international conference on knowledge discovery and data mining (SIGKDD), pp 671–680 Badanidiyuru A, Mirzasoleiman B, Karbasi A, Krause A (2014) Streaming submodular maximization: Massive data summarization on the fly. In: Proceedings of the international conference on knowledge discovery and data mining (SIGKDD), pp 671–680
3.
Zurück zum Zitat Bai M, Xin J, Wang G, Zhang L, Zimmermann R, Yuan Y, Wu X (2016) Discovering the \(k\) representative skyline over a sliding window. IEEE Trans Knowl Data Eng (TKDE) 28(8):2041–2056CrossRef Bai M, Xin J, Wang G, Zhang L, Zimmermann R, Yuan Y, Wu X (2016) Discovering the \(k\) representative skyline over a sliding window. IEEE Trans Knowl Data Eng (TKDE) 28(8):2041–2056CrossRef
4.
Zurück zum Zitat Börzsöny S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 421–430 Börzsöny S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 421–430
5.
Zurück zum Zitat Boykov YY, Jolly M (2001) Interactive graph cuts for optimal boundary & region segmentation of objects in n-d images. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 105–112 Boykov YY, Jolly M (2001) Interactive graph cuts for optimal boundary & region segmentation of objects in n-d images. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 105–112
6.
Zurück zum Zitat Chan C-Y, Jagadish HV, Tan K-L, Tung AKH, Zhang Z (2006) Finding k-dominant skylines in high dimensional space. In: Proceedings of the international conference on management of data (SIGMOD), pp 503–514 Chan C-Y, Jagadish HV, Tan K-L, Tung AKH, Zhang Z (2006) Finding k-dominant skylines in high dimensional space. In: Proceedings of the international conference on management of data (SIGMOD), pp 503–514
7.
Zurück zum Zitat Chan C-Y, Jagadish HV, Tan K-L, Tung AKH, Zhang Z (2006) On high dimensional skylines. In: Proceedings of the international conference on extending database technology (EDBT), pp 478–495 Chan C-Y, Jagadish HV, Tan K-L, Tung AKH, Zhang Z (2006) On high dimensional skylines. In: Proceedings of the international conference on extending database technology (EDBT), pp 478–495
8.
Zurück zum Zitat Chester S, Thomo A, Venkatesh S, Whitesides S (2014) Computing k-regret minimizing sets. In: Proceedings of the international conference on very large data bases (VLDB), pp 389–400 Chester S, Thomo A, Venkatesh S, Whitesides S (2014) Computing k-regret minimizing sets. In: Proceedings of the international conference on very large data bases (VLDB), pp 389–400
9.
Zurück zum Zitat Faulkner TK, Brackenbury W, Lall A (2015) k-regret queries with nonlinear utilities. In: Proceedings of the international conference on very large data bases (VLDB), pp 2098–2109 Faulkner TK, Brackenbury W, Lall A (2015) k-regret queries with nonlinear utilities. In: Proceedings of the international conference on very large data bases (VLDB), pp 2098–2109
10.
Zurück zum Zitat Feldman M, Karbasi A, Kazemi E (2018) Do less, get more: streaming submodular maximization with subsampling. In: Proceedings of advances in neural information processing systems (NIPS), pp 732–742 Feldman M, Karbasi A, Kazemi E (2018) Do less, get more: streaming submodular maximization with subsampling. In: Proceedings of advances in neural information processing systems (NIPS), pp 732–742
11.
Zurück zum Zitat Gomes R, Krause A (2010) Budgeted nonparametric learning from data streams. In: Proceedings of the international conference on international conference on machine learning (ICML), pp 391–398 Gomes R, Krause A (2010) Budgeted nonparametric learning from data streams. In: Proceedings of the international conference on international conference on machine learning (ICML), pp 391–398
12.
Zurück zum Zitat Huang X, Zheng J (2019) Deletion-robust k-coverage queries. In: Proceedings of international conference on database systems for advanced applications (DASFAA), pp 215–219 Huang X, Zheng J (2019) Deletion-robust k-coverage queries. In: Proceedings of international conference on database systems for advanced applications (DASFAA), pp 215–219
13.
Zurück zum Zitat Huang X, Zheng J (2019) Streaming deletion-robust k-coverage queries. In: Proceedings of the international symposium on spatial and temporal databases (SSTD), pp 170–173 Huang X, Zheng J (2019) Streaming deletion-robust k-coverage queries. In: Proceedings of the international symposium on spatial and temporal databases (SSTD), pp 170–173
14.
Zurück zum Zitat Ilyas IF, Beskales G, Soliman MA (2008) A survey of top-k query processing techniques in relational database systems. ACM Comput Surv (CSUR) 40(4):1–58CrossRef Ilyas IF, Beskales G, Soliman MA (2008) A survey of top-k query processing techniques in relational database systems. ACM Comput Surv (CSUR) 40(4):1–58CrossRef
15.
Zurück zum Zitat Kazemi E, Zadimoghaddam M, Karbasi A (2018) Scalable deletion-robust submodular maximization: data summarization with privacy and fairness constraints. In: Proceedings of the international conference on machine learning (ICML), pp 2549–2558 Kazemi E, Zadimoghaddam M, Karbasi A (2018) Scalable deletion-robust submodular maximization: data summarization with privacy and fairness constraints. In: Proceedings of the international conference on machine learning (ICML), pp 2549–2558
16.
Zurück zum Zitat Krause A, Singh A, Guestrin C (2008) Near-optimal sensor placements in gaussian processes: theory, efficient algorithms and empirical studies. J Mach Learn Res (JMLR) 9:235–284MATH Krause A, Singh A, Guestrin C (2008) Near-optimal sensor placements in gaussian processes: theory, efficient algorithms and empirical studies. J Mach Learn Res (JMLR) 9:235–284MATH
17.
Zurück zum Zitat Lee J, You G won, Hwang S won (2009) Personalized top-k skyline queries in high-dimensional space. Inf Syst 34(1):45–61CrossRef Lee J, You G won, Hwang S won (2009) Personalized top-k skyline queries in high-dimensional space. Inf Syst 34(1):45–61CrossRef
18.
Zurück zum Zitat Lian X, Chen L (2009) Top-k dominating queries in uncertain databases. In: Proceedings of the international conference on extending database technology (EDBT), pp 660–671 Lian X, Chen L (2009) Top-k dominating queries in uncertain databases. In: Proceedings of the international conference on extending database technology (EDBT), pp 660–671
19.
Zurück zum Zitat Lin H, Bilmes J (2011) A class of submodular functions for document summarization. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies (HLT), pp 510–520 Lin H, Bilmes J (2011) A class of submodular functions for document summarization. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies (HLT), pp 510–520
20.
Zurück zum Zitat Lin X, Yuan Y, Wei W, Lu H (2005) Stabbing the sky: efficient skyline computation over sliding windows. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 502–513 Lin X, Yuan Y, Wei W, Lu H (2005) Stabbing the sky: efficient skyline computation over sliding windows. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 502–513
21.
Zurück zum Zitat Lin X, Yuan Y, Zhang Q, Zhang Y (2007) Selecting stars: the k most representative skyline operator. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 86–95 Lin X, Yuan Y, Zhang Q, Zhang Y (2007) Selecting stars: the k most representative skyline operator. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 86–95
22.
Zurück zum Zitat Lu H, Jensen CS, Zhang Z (2011) Flexible and efficient resolution of skyline query size constraints. IEEE Trans Knowl Data Eng (TKDE) 23(7):991–1005CrossRef Lu H, Jensen CS, Zhang Z (2011) Flexible and efficient resolution of skyline query size constraints. IEEE Trans Knowl Data Eng (TKDE) 23(7):991–1005CrossRef
23.
Zurück zum Zitat Magnani M, Assent I, Mortensen ML (2014) Taking the big picture: representative skylines based on significance and diversity. VLDB J 23(5):795–815CrossRef Magnani M, Assent I, Mortensen ML (2014) Taking the big picture: representative skylines based on significance and diversity. VLDB J 23(5):795–815CrossRef
24.
Zurück zum Zitat Mindolin D, Chomicki J (2009) Discovering relative importance of skyline attributes. In: Proceedings of the international conference on very large data bases (VLDB), pp 610–621 Mindolin D, Chomicki J (2009) Discovering relative importance of skyline attributes. In: Proceedings of the international conference on very large data bases (VLDB), pp 610–621
25.
Zurück zum Zitat Mirzasoleiman B, Badanidiyuru A, Karbasi A, Vondrak J, Krause A (2015) Lazier than lazy greedy. In: Proceedings of the AAAI conference on artificial intelligence (AAAI), pp 1812–1818 Mirzasoleiman B, Badanidiyuru A, Karbasi A, Vondrak J, Krause A (2015) Lazier than lazy greedy. In: Proceedings of the AAAI conference on artificial intelligence (AAAI), pp 1812–1818
26.
Zurück zum Zitat Mirzasoleiman B, Karbasi A, Krause A (2017) Deletion-robust submodular maximization: Data summarization with “the right to be forgotten”. In: Proceedings of the international conference on machine learning (ICML), pp 2449–2458 Mirzasoleiman B, Karbasi A, Krause A (2017) Deletion-robust submodular maximization: Data summarization with “the right to be forgotten”. In: Proceedings of the international conference on machine learning (ICML), pp 2449–2458
27.
Zurück zum Zitat Nanongkai D, Lall A, Sarma AD, Makino K (2012) Interactive regret minimization. In: Proceedings of the international conference on management of data (SIGMOD), pp 109–120 Nanongkai D, Lall A, Sarma AD, Makino K (2012) Interactive regret minimization. In: Proceedings of the international conference on management of data (SIGMOD), pp 109–120
28.
Zurück zum Zitat Nanongkai D, Sarma AD, Lall A, Lipton RJ, Xu J (2010) Regret-minimizing representative databases. In: Proceedings of the international conference on very large data bases (VLDB), pp 1114–1124 Nanongkai D, Sarma AD, Lall A, Lipton RJ, Xu J (2010) Regret-minimizing representative databases. In: Proceedings of the international conference on very large data bases (VLDB), pp 1114–1124
29.
Zurück zum Zitat Nemhauser GL, Wolsey LA, Fisher ML (1978) An analysis of approximations for maximizing submodular set functions–i. Math Program 14(1):265–294MathSciNetCrossRef Nemhauser GL, Wolsey LA, Fisher ML (1978) An analysis of approximations for maximizing submodular set functions–i. Math Program 14(1):265–294MathSciNetCrossRef
30.
Zurück zum Zitat Orlin JB, Schulz AS, Udwani R (2016) Robust monotone submodular function maximization. In: Louveaux Q, Skutella M (eds) Integer programming and combinatorial optimization, pp 312–324 Orlin JB, Schulz AS, Udwani R (2016) Robust monotone submodular function maximization. In: Louveaux Q, Skutella M (eds) Integer programming and combinatorial optimization, pp 312–324
31.
Zurück zum Zitat Papadias D, Tao Y, Fu G, Seeger B (2003) An optimal and progressive algorithm for skyline queries. In: Proceedings of the international conference on management of data (SIGMOD), pp 467–478 Papadias D, Tao Y, Fu G, Seeger B (2003) An optimal and progressive algorithm for skyline queries. In: Proceedings of the international conference on management of data (SIGMOD), pp 467–478
32.
Zurück zum Zitat Papadias D, Tao Y, Fu G, Seeger B (2005) Progressive skyline computation in database systems. TODS 30(1):41–82CrossRef Papadias D, Tao Y, Fu G, Seeger B (2005) Progressive skyline computation in database systems. TODS 30(1):41–82CrossRef
33.
Zurück zum Zitat Peng P, Wong RC-W (2014) Geometry approach for k-regret query. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 772–783 Peng P, Wong RC-W (2014) Geometry approach for k-regret query. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 772–783
34.
Zurück zum Zitat Qi J, Zuo F, Samet H, Yao JC (2018) K-regret queries using multiplicative utility functions. TODS 43(2):10:1–10:41MathSciNetCrossRef Qi J, Zuo F, Samet H, Yao JC (2018) K-regret queries using multiplicative utility functions. TODS 43(2):10:1–10:41MathSciNetCrossRef
35.
Zurück zum Zitat Søholm M, Chester S, Assent I (2016) Maximum coverage representative skyline. In: Proceedings of the international conference on extending database technology (EDBT), pp 702–703 Søholm M, Chester S, Assent I (2016) Maximum coverage representative skyline. In: Proceedings of the international conference on extending database technology (EDBT), pp 702–703
36.
Zurück zum Zitat Soliman MA, Ilyas IF, Chang KC (2007) Top-k query processing in uncertain databases. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 896–905 Soliman MA, Ilyas IF, Chang KC (2007) Top-k query processing in uncertain databases. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 896–905
37.
Zurück zum Zitat Tao Y, Ding L, Lin X, Pei J (2009) Distance-based representative skyline. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 892–903 Tao Y, Ding L, Lin X, Pei J (2009) Distance-based representative skyline. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 892–903
38.
Zurück zum Zitat Wang S, Cheema MA, Zhang Y, Lin X (2015) Selecting representative objects considering coverage and diversity. In: Proceedings of the international ACM workshop on managing and mining enriched geo-spatial data (GeoRich), pp 31–38 Wang S, Cheema MA, Zhang Y, Lin X (2015) Selecting representative objects considering coverage and diversity. In: Proceedings of the international ACM workshop on managing and mining enriched geo-spatial data (GeoRich), pp 31–38
39.
Zurück zum Zitat Xie M, Wong RC-W, Lall A (2019) Strongly truthful interactive regret minimization. In: Proceedings of the international conference on management of data (SIGMOD), pp 281–298 Xie M, Wong RC-W, Lall A (2019) Strongly truthful interactive regret minimization. In: Proceedings of the international conference on management of data (SIGMOD), pp 281–298
40.
Zurück zum Zitat Xie M, Wong RC-W, Lall A (2020) An experimental survey of regret minimization query and variants: bridging the best worlds between top-k query and skyline query. VLDB J 29:147–175CrossRef Xie M, Wong RC-W, Lall A (2020) An experimental survey of regret minimization query and variants: bridging the best worlds between top-k query and skyline query. VLDB J 29:147–175CrossRef
41.
Zurück zum Zitat Xie M, Wong RC-W, Li J, Long C, Lall A (2018) Efficient k-regret query algorithm with restriction-free bound for any dimensionality. In: Proceedings of the international conference on management of data (SIGMOD), pp 959–974 Xie M, Wong RC-W, Li J, Long C, Lall A (2018) Efficient k-regret query algorithm with restriction-free bound for any dimensionality. In: Proceedings of the international conference on management of data (SIGMOD), pp 959–974
42.
Zurück zum Zitat Yufei T, Dimitris P (2006) Maintaining sliding window skylines on data streams. IEEE Trans Knowl Data Eng (TKDE) 18(3):377–391CrossRef Yufei T, Dimitris P (2006) Maintaining sliding window skylines on data streams. IEEE Trans Knowl Data Eng (TKDE) 18(3):377–391CrossRef
43.
Zurück zum Zitat Zeighami S, Wong RC-W (2016) Minimizing average regret ratio in database. In: Proceedings of the international conference on management of data (SIGMOD), pp 2265–2266 Zeighami S, Wong RC-W (2016) Minimizing average regret ratio in database. In: Proceedings of the international conference on management of data (SIGMOD), pp 2265–2266
Metadaten
Titel
Efficient computation of deletion-robust k-coverage queries
verfasst von
Jiping Zheng
Xingnan Huang
Yuan Ma
Publikationsdatum
13.01.2021
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 3/2021
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-020-01540-6

Weitere Artikel der Ausgabe 3/2021

Knowledge and Information Systems 3/2021 Zur Ausgabe