Skip to main content
Erschienen in: Knowledge and Information Systems 1/2018

19.08.2017 | Regular Paper

Tractable queries on big data via preprocessing with logarithmic-size output

verfasst von: Jiannan Yang, Hanpin Wang, Yongzhi Cao

Erschienen in: Knowledge and Information Systems | Ausgabe 1/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

To provide a dichotomy between those queries that are feasible on big data after appropriate preprocessing and those for which preprocessing does not help, Fan et al. developed the \(\sqcap \)-tractability theory, which provides a formal foundation on the tractability of query classes in the context of big data. Inspired by some technologies used to deal with big data, we introduce a novel notion of \(\sqcap '\)-tractability in this paper. We place a restriction on preprocessing functions, which limits the functions to produce relatively short outputs, at most logarithmic-size of the inputs. We set a complexity class to denote the classes of Boolean queries that are \(\sqcap '\)-tractable and conclude that it is properly contained in that of \(\sqcap \)-tractable query classes, after discovering that a \(\sqcap \)-tractable query class is not \(\sqcap '\)-tractable. With an existing reduction, which does not allow re-factorizing data and query parts, we define complete query classes for the complexity class and give an efficient way to detect such query classes. We also investigate the query classes that can be made \(\sqcap '\)-tractable and prove that all PTIME classes of Boolean queries can be made \(\sqcap '\)-tractable.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Cao Y, Fan W, Wo T, Yu W (2014) Bounded conjunctive queries. PVLDB 7(12):1231–1242 Cao Y, Fan W, Wo T, Yu W (2014) Bounded conjunctive queries. PVLDB 7(12):1231–1242
2.
Zurück zum Zitat Chen CP, Zhang CY (2014) Data-intensive applications, challenges, techniques and technologies: a survey on big data. Inf Sci 275:314–347CrossRef Chen CP, Zhang CY (2014) Data-intensive applications, challenges, techniques and technologies: a survey on big data. Inf Sci 275:314–347CrossRef
3.
Zurück zum Zitat Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRef Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRef
4.
5.
Zurück zum Zitat Fan W, Li J, Wang X, Wu Y (2012) Query preserving graph compression. In: Proceedings of the ACM 2012 international conference on management of data, pp 157–168 Fan W, Li J, Wang X, Wu Y (2012) Query preserving graph compression. In: Proceedings of the ACM 2012 international conference on management of data, pp 157–168
6.
Zurück zum Zitat Fan W, Geerts F, Neven F (2013) Making queries tractable on big data with preprocessing: through the eyes of complexity theory. PVLDB 6(9):685–696 Fan W, Geerts F, Neven F (2013) Making queries tractable on big data with preprocessing: through the eyes of complexity theory. PVLDB 6(9):685–696
7.
Zurück zum Zitat Fan W, Geerts F, Libkin L (2014) On scale independence for querying big data. In: Proceedings of the ACM 33rd symposium on principles of database systems, pp 51–62 Fan W, Geerts F, Libkin L (2014) On scale independence for querying big data. In: Proceedings of the ACM 33rd symposium on principles of database systems, pp 51–62
8.
Zurück zum Zitat Fan W, Wang X, Wu Y (2014) Querying big graphs within bounded resources. In: Proceedings of the ACM 2014 international conference on management of data, pp 301–312 Fan W, Wang X, Wu Y (2014) Querying big graphs within bounded resources. In: Proceedings of the ACM 2014 international conference on management of data, pp 301–312
9.
Zurück zum Zitat Fiori A, Mignone A, Rospo G (2016) Decoclu: density consensus clustering approach for public transport data. Inf Sci 328:378–388CrossRef Fiori A, Mignone A, Rospo G (2016) Decoclu: density consensus clustering approach for public transport data. Inf Sci 328:378–388CrossRef
10.
Zurück zum Zitat Gani A, Siddiqa A, Shamshirband S, Hanum F (2016) A survey on indexing techniques for big data: taxonomy and performance evaluation. Knowl Inf Syst 46(2):241–284CrossRef Gani A, Siddiqa A, Shamshirband S, Hanum F (2016) A survey on indexing techniques for big data: taxonomy and performance evaluation. Knowl Inf Syst 46(2):241–284CrossRef
12.
Zurück zum Zitat Greenlaw R, Hoover HJ, Ruzzo WL (1995) Limits to parallel computation: P-completeness theory. Oxford University Press, New YorkMATH Greenlaw R, Hoover HJ, Ruzzo WL (1995) Limits to parallel computation: P-completeness theory. Oxford University Press, New YorkMATH
13.
Zurück zum Zitat Hamooni H, Mueen A, Neel A (2016) Phoneme sequence recognition via dtw-based classification. Knowl Inf Syst 48(2):253–275CrossRef Hamooni H, Mueen A, Neel A (2016) Phoneme sequence recognition via dtw-based classification. Knowl Inf Syst 48(2):253–275CrossRef
14.
Zurück zum Zitat Hashem IAT, Yaqoob I, Anuar NB, Mokhtar S, Gani A, Khan SU (2015) The rise of “big data” on cloud computing: review and open research issues. Inf Syst 47:98–115CrossRef Hashem IAT, Yaqoob I, Anuar NB, Mokhtar S, Gani A, Khan SU (2015) The rise of “big data” on cloud computing: review and open research issues. Inf Syst 47:98–115CrossRef
15.
Zurück zum Zitat Jagadish HV, Gehrke J, Labrinidis A, Papakonstantinou Y, Patel JM, Ramakrishnan R, Shahabi C (2014) Big data and its technical challenges. Commun ACM 57(7):86–94CrossRef Jagadish HV, Gehrke J, Labrinidis A, Papakonstantinou Y, Patel JM, Ramakrishnan R, Shahabi C (2014) Big data and its technical challenges. Commun ACM 57(7):86–94CrossRef
16.
Zurück zum Zitat Jung G, Gnanasambandam N, Mukherjee T (2012) Synchronous parallel processing of big-data analytics services to optimize performance in federated clouds. In: IEEE proceedings of the 5th international conference on cloud computing, pp 811–818 Jung G, Gnanasambandam N, Mukherjee T (2012) Synchronous parallel processing of big-data analytics services to optimize performance in federated clouds. In: IEEE proceedings of the 5th international conference on cloud computing, pp 811–818
17.
Zurück zum Zitat Kang U, Tong H, Sun J, Lin C, Faloutsos C (2011) Gbase: A scalable and general graph management system. In: ACM proceedings of the 17th international conference on knowledge discovery and data mining, pp 1091–1099 Kang U, Tong H, Sun J, Lin C, Faloutsos C (2011) Gbase: A scalable and general graph management system. In: ACM proceedings of the 17th international conference on knowledge discovery and data mining, pp 1091–1099
18.
Zurück zum Zitat Marz N, Warren J (2015) Big data: principles and best practices of scalable realtime data systems. Manning Publications Co, Greenwich Marz N, Warren J (2015) Big data: principles and best practices of scalable realtime data systems. Manning Publications Co, Greenwich
19.
Zurück zum Zitat Michael K, Miller KW (2013) Big data: new opportunities and new challenges. Computer 46(6):22–24CrossRef Michael K, Miller KW (2013) Big data: new opportunities and new challenges. Computer 46(6):22–24CrossRef
20.
Zurück zum Zitat Mozafari B, Zeng K, D’Antoni L, Zaniolo C (2013) High-performance complex event processing over hierarchical data. ACM T Database Syst 38(4):21MathSciNetMATH Mozafari B, Zeng K, D’Antoni L, Zaniolo C (2013) High-performance complex event processing over hierarchical data. ACM T Database Syst 38(4):21MathSciNetMATH
21.
Zurück zum Zitat National Research Council (2013) Frontiers in massive data analysis. The National Academies Press, Washington National Research Council (2013) Frontiers in massive data analysis. The National Academies Press, Washington
22.
Zurück zum Zitat Papadimitriou CH (2003) Computational complexity. In: Encyclopedia of computer science. Wiley, Chichester, pp 260–265 Papadimitriou CH (2003) Computational complexity. In: Encyclopedia of computer science. Wiley, Chichester, pp 260–265
23.
Zurück zum Zitat Ramentol E, Caballero Y, Bello R, Herrera F (2012) Smote-rsb*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using smote and rough sets theory. Knowl Inf Syst 33(2):245–265CrossRef Ramentol E, Caballero Y, Bello R, Herrera F (2012) Smote-rsb*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using smote and rough sets theory. Knowl Inf Syst 33(2):245–265CrossRef
24.
Zurück zum Zitat del Río S, López V, Benítez JM, Herrera F (2014) On the use of mapreduce for imbalanced big data using random forest. Inf Sci 285:112–137CrossRef del Río S, López V, Benítez JM, Herrera F (2014) On the use of mapreduce for imbalanced big data using random forest. Inf Sci 285:112–137CrossRef
25.
Zurück zum Zitat Sarma AD, Lee H, Gonzalez H, Madhavan J, Halevy AY (2013) Consistent thinning of large geographical data for map visualization. ACM T Database Syst 38(4):22MathSciNetMATH Sarma AD, Lee H, Gonzalez H, Madhavan J, Halevy AY (2013) Consistent thinning of large geographical data for map visualization. ACM T Database Syst 38(4):22MathSciNetMATH
26.
Zurück zum Zitat Vardi MY (1982) The complexity of relational query languages. In: Proceedings of the 14th Annual ACM Symposium on Theory of Computing, pp 137–146 Vardi MY (1982) The complexity of relational query languages. In: Proceedings of the 14th Annual ACM Symposium on Theory of Computing, pp 137–146
27.
Zurück zum Zitat Wu X, Zhu X, Wu G, Ding W (2014) Data mining with big data. IEEE T Knowl Data En 26(1):97–107CrossRef Wu X, Zhu X, Wu G, Ding W (2014) Data mining with big data. IEEE T Knowl Data En 26(1):97–107CrossRef
28.
Zurück zum Zitat Yang C, Zhang X, Zhong C, Liu C, Pei J, Ramamohanarao K, Chen J (2014) A spatiotemporal compression based approach for efficient big data processing on cloud. J Comput Syst Sci 80(8):1563–1583MathSciNetCrossRefMATH Yang C, Zhang X, Zhong C, Liu C, Pei J, Ramamohanarao K, Chen J (2014) A spatiotemporal compression based approach for efficient big data processing on cloud. J Comput Syst Sci 80(8):1563–1583MathSciNetCrossRefMATH
Metadaten
Titel
Tractable queries on big data via preprocessing with logarithmic-size output
verfasst von
Jiannan Yang
Hanpin Wang
Yongzhi Cao
Publikationsdatum
19.08.2017
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 1/2018
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-017-1092-7

Weitere Artikel der Ausgabe 1/2018

Knowledge and Information Systems 1/2018 Zur Ausgabe