Skip to main content

2018 | OriginalPaper | Book chapter

Humble Data Management to Big Data Analytics/Science: A Retrospective Stroll

Authors: Sharma Chakravarthy, Abhishek Santra, Kanthi Sannappa Komar

Published in: Big Data Analytics

Publisher: Springer International Publishing


Abstract

We are on the cusp of holistically analyzing the wide variety of data being collected, in diverse ways, in every walk of life, and of developing a science (Big Data Science) to benefit humanity at large in the best possible way. This warrants developing and using new approaches (technological, scientific, and systems) while building upon and integrating with those developed so far. An ambitious goal like this also carries the risk that these advancements will be misused or abused, as we have seen so many times with new technologies.
In this paper, we provide a retrospective bird's-eye view of the approaches that have emerged for managing and analyzing data over the last 40+ years. Since the advent of Database Management Systems (DBMSs), and especially Relational DBMSs (RDBMSs), data management and analysis have made several significant strides. Today, data has become an important tool (or even a weapon) in society, and its role and importance are unprecedented.
The goal of this paper is to give the reader an understanding of data management and analysis approaches: where we have come from, the motivations for developing them, and what this journey has been about in a short span of 40+ years. We sincerely hope this presentation offers a historical as well as a pedagogical perspective for those who are new to the field, and a useful perspective that those who have been working in and contributing to the field can relate to and appreciate.

Footnotes
1
Prof. Michael Stonebraker fondly refers to SQL as "intergalactic dataspeak". Others may see it differently.
 
2
This could be for a building, a mall, a checkpoint, or a parking lot.
 
Metadata
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-04780-1_3