2017 | OriginalPaper | Chapter

A Software Architecture for Enabling Statistical Learning on Big Data

Authors: Ali Behnaz, Fethi Rabhi, Maurice Peat

Published in: Advances in Time Series Analysis and Forecasting

Publisher: Springer International Publishing

Abstract

Most big data analytics research is scattered across multiple disciplines, such as applied statistics, machine learning, language technology, and databases. Little attention has been paid to aligning big data solutions with end users' mental models for conducting exploratory and predictive data analysis. We are particularly interested in how domain experts analyze big data by applying statistics, with a focus on statistical learning. In this paper, we compare and contrast how the fields of statistics and computer science view data. We review popular analysis techniques and tools within a defined analytics stack. We then propose a model-driven architecture that uses semantic and event processing technologies to separate the expression of the mathematical model from the computational requirements. The paper also describes an implementation of the proposed architecture, with a case study in funds management.

Footnotes
1
This is a multiple linear regression, a widely used form in statistical learning.
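
The equation this footnote refers to is not reproduced on this page. As a reminder of the general form only, a multiple linear regression can be written as below. The symbols (response y, predictors x_1 to x_p, coefficients β_0 to β_p, error term ε) are standard textbook notation assumed here and may differ from the notation used in the chapter.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
```

Here β_0 is the intercept, each β_j measures the change in y associated with a one-unit change in x_j while the other predictors are held fixed, and ε collects the variation that the predictors do not explain.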
 
Metadata
Title
A Software Architecture for Enabling Statistical Learning on Big Data
Authors
Ali Behnaz
Fethi Rabhi
Maurice Peat
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-55789-2_24