2018 | Original Paper | Book Chapter

Efficient Processing of Top-K Dominating Queries on Incomplete Data Using MapReduce

Authors: Xiangwu Ding, Chao Yan, Yuan Zhao, Zewei Yang

Published in: Cloud Computing and Security

Publisher: Springer International Publishing


Abstract

Top-k dominating (TKD) queries, which return the k best items according to a comprehensive “goodness” criterion based on dominance, have attracted considerable attention recently because of their important role in many data mining applications, including multi-criteria decision making. In the Big Data era, data storage and processing are becoming distributed, and data are commonly incomplete in some real applications. Existing research focuses either on centralized datasets or on complete data in distributed environments, and does not address incomplete data in distributed environments. In this work, we present the first study of processing top-k dominating queries on incomplete data in distributed environments. Through detailed analysis, we show that even though the dominance relation on incomplete data objects is non-transitive in general, transitivity does hold for some incomplete data objects with different bitmaps. Utilizing this property, we then propose a novel MapReduce-based algorithm, TKDI-MR, for processing TKD queries on incomplete data in distributed environments. Extensive experiments with both real-world and large-scale synthetic datasets demonstrate that our approach achieves good efficiency and stability.
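
To make the terms in the abstract concrete, the following is a minimal single-machine sketch, not the TKDI-MR algorithm itself. It assumes the dominance definition commonly used for incomplete data (objects are compared only on the dimensions both have observed, with smaller values being better); the abstract does not spell this definition out. The names `bitmap`, `dominates`, `score`, and `top_k_dominating` are hypothetical and chosen for illustration only.

```python
from typing import List, Optional, Sequence, Tuple

# An object is a sequence of values; None marks a missing dimension (assumption).
Obj = Sequence[Optional[float]]


def bitmap(o: Obj) -> Tuple[bool, ...]:
    """Return which dimensions of o are observed (True) or missing (False)."""
    return tuple(v is not None for v in o)


def dominates(a: Obj, b: Obj) -> bool:
    """Assumed definition: a dominates b if, on the dimensions observed in
    both, a is never worse and strictly better at least once (smaller wins)."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not common:
        return False  # no shared dimension, so the objects are incomparable
    return all(x <= y for x, y in common) and any(x < y for x, y in common)


def score(o: Obj, data: Sequence[Obj]) -> int:
    """Number of objects in data that o dominates."""
    return sum(dominates(o, p) for p in data)


def top_k_dominating(data: Sequence[Obj], k: int) -> List[int]:
    """Indices of the k objects with the highest dominance scores
    (brute force, quadratic in the number of objects)."""
    ranked = sorted(range(len(data)), key=lambda i: -score(data[i], data))
    return ranked[:k]


# Non-transitivity: a dominates b, b dominates c, yet a does not dominate c,
# because a and c share no observed dimension (their bitmaps do not overlap).
a, b, c = (1, None), (2, 1), (None, 2)
print(dominates(a, b), dominates(b, c), dominates(a, c))
print(top_k_dominating([a, b, c], k=2))
```

Because transitivity can fail in this way, pruning rules that rely on it for complete data cannot be applied blindly, which is why the abstract singles out the cases where transitivity still holds.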


Metadata
Title
Efficient Processing of Top-K Dominating Queries on Incomplete Data Using MapReduce
Authors
Xiangwu Ding
Chao Yan
Yuan Zhao
Zewei Yang
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-00006-6_44