2021 | OriginalPaper | Chapter

4. The Analysis of Big Financial Data Through Artificial Intelligence Methods

Authors : Erkan Ozhan, Erdinç Uzun

Published in: The Impact of Artificial Intelligence on Governance, Economics and Finance, Volume I

Publisher: Springer Nature Singapore

Abstract

The evolution of technology has produced a new world of data that does not degrade, can be reached from anywhere, and streams and multiplies continuously. The data created by business firms, scientific research centers, and automation systems in particular have reached enormous volumes. Extracting meaningful, previously unexplored, and valuable information or inferences from these piles of data has become the main goal of many data analysts. This chapter first discusses artificial intelligence techniques and their capabilities. It then compares the techniques most widely used in the finance sector, their advantages and weaknesses, and the methods that can be used to process the data generated by the finance sector, which is one of the leading sources of big data. The current versions of the artificial intelligence methods most used in the finance sector are surveyed, and the new capabilities and contributions they bring to the sector are examined. In particular, the chapter explains what classification, clustering, association rules, and time series analysis methods cover and which problems they can solve. Sample studies illustrate credit scoring and customer segmentation, where classification and clustering methods are especially employed. The chapter aims to present the principles on which up-to-date methods are based, along with their theoretical and practical applications. It also describes practical software that can be used for data analysis in the finance sector and the capabilities of this software. Finally, because financial data are classified as big data, the use of big data processing techniques is examined through examples.
The difficulties encountered when analyzing big data, which is a natural product of this sector, are presented together with solutions to them. Current big data processing solutions, in particular Hadoop, Spark, MapReduce, distributed computing, and GPU (Graphics Processing Unit) computing, are explained comparatively. The main principles underlying big data processing techniques are simplified for readers and supported by examples from the sector. Spark, Hadoop, and MapReduce, which are leading methods for processing big data, receive particular attention. Finally, the contributions of artificial intelligence and big data processing techniques to the sector are summarized and the results are presented.
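As a rough illustration of the MapReduce principle named in the abstract, the following minimal sketch runs the map, shuffle, and reduce phases in a single Python process. It is not taken from the chapter: the sample transactions, the function names, and the choice of total spend per customer as the aggregate are all assumptions made for illustration, and a real Hadoop or Spark job would distribute these phases across a cluster.

```python
from collections import defaultdict

# Hypothetical sample records: (customer_id, transaction_amount).
transactions = [
    ("c1", 120.0),
    ("c2", 35.5),
    ("c1", 80.0),
    ("c3", 10.0),
    ("c2", 64.5),
]


def map_phase(record):
    """Map: emit one (key, value) pair per input record."""
    customer_id, amount = record
    return customer_id, amount


def shuffle_phase(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values of one key into a single result."""
    return key, sum(values)


def total_spend_per_customer(records):
    """Run the three phases in sequence on an in-memory list."""
    mapped = [map_phase(record) for record in records]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


totals = total_spend_per_customer(transactions)
print(totals)
```

Because each record is mapped independently and each key is reduced independently, both phases can in principle be split across many machines, which is the property that frameworks such as Hadoop exploit.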



Identifying a vehicle's license plate amounts to determining the class of each character on the plate, drawn from an alphabet of about 40 classes on average.

These companies promise the required infrastructure for businesses by providing monthly or annual subscriptions. Even though they are widely used today, some businesses are establishing their own data analysis departments for carrying out these analyses.

ENIAC (Electronic Numerical Integrator And Computer): “Built between 1943 and 1945 the first large-scale computer to run at electronic speed without being slowed by any mechanical parts” (CHM).

https://github.com/erdincuzun/ml_intro.

https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients.

House Price Prediction with Numeric-only Dataset: https://www.kaggle.com/youngseoklee/house-price-prediction-with-numeric-only-dataset/data.

https://github.com/sowmyacr/kmeans_cluster/blob/master/CLV.csv.

https://archive.ics.uci.edu/ml/datasets/online+retail.

For detail: https://www.cs.waikato.ac.nz/ml/weka/.

For detail: https://rapidminer.com/.

For detail: https://orange.biolab.si/.

For detail: https://www.knime.com/.

For detail: https://www.python.org/.

For detail: https://www.r-project.org/.

Visit https://kudu.apache.org/ for more detailed information.

https://kafka.apache.org/.

Anderson TE, Dahlin MD, Neefe JM, et al (1995) Serverless network file systems. SIGOPS Oper Syst Rev 29:109–126. https://doi.org/10.1145/224057.224066.

Apache Software Foundation (2019) Apache Hadoop 2.10.0—HDFS Architecture. https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Introduction. Accessed 13 Dec 2019.

Apache Software Foundation Apache Hadoop 3.2.1—Apache Hadoop YARN. https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html. Accessed 19 Dec 2019.

Apache Software Foundation Impala. https://impala.apache.org/overview.html. Accessed 23 Dec 2019.

The Apache Software Foundation Apache Hadoop. https://hadoop.apache.org/. Accessed 17 Dec 2019.

The Apache Software Foundation Apache Kudu—Fast Analytics on Fast Data. https://kudu.apache.org/. Accessed 19 Dec 2019.

Artis M, Ayuso M, Guillén M (2002) Detection of automobile insurance fraud with discrete choice models and misclassified claims. J Risk Insur 69:325–340.

Atlassian (2016) Sentry Tutorial—Apache Sentry—Apache Software Foundation. https://cwiki.apache.org/confluence/display/SENTRY/Sentry+Tutorial. Accessed 23 Dec 2019.

Attiya H, Welch J (2004) Distributed Computing: Fundamentals, Simulations, and Advanced Topics. Wiley.

Bahrammirzaee A (2010) A comparative survey of artificial intelligence applications in finance: artificial neural networks, expert system and hybrid intelligent systems. Neural Comput Appl 19:1165–1195. https://doi.org/10.1007/s00521-010-0362-z.

Bhattacharyya S, Jha S, Tharakunnel K, Westland JC (2011) Data mining for credit card fraud: a comparative study. Decis Support Syst 50:602–613. https://doi.org/10.1016/j.dss.2010.08.008.

Blazejewski A, Coggins R (2004) Application of self-organizing maps to clustering of high-frequency financial data. In: Proceedings of the Second Workshop on Australasian Information Security, Data Mining and Web Intelligence, and Software Internationalisation - Volume 32. Australian Computer Society, Inc., Darlinghurst, Australia, pp 85–90.

Boeing G, Waddell P (2017) New insights into rental housing markets across the United States: web scraping and analyzing Craigslist rental listings. J Plan Educ Res 37:457–476. https://doi.org/10.1177/0739456X16664789.

Brause R, Langsdorf T, Hepp M (1999) Neural data mining for credit card fraud detection. In: Proceedings 11th International Conference on Tools with Artificial Intelligence, pp 103–106.

Brockwell PJ, Davis RA (2016) Introduction to Time Series and Forecasting. Springer International Publishing.

Cai L, Zhu Y (2015) The challenges of data quality and data quality assessment in the big data era. Data Sci J 14:2. https://doi.org/10.5334/dsj-2015-002.

Castillo O, Melin P (1995) An intelligent system for financial time series prediction combining dynamical systems theory, fractal theory, and statistical methods. In: Proceedings of 1995 Conference on Computational Intelligence for Financial Engineering (CIFEr). IEEE, pp 151–155.

Chapman B, Jost G, van der Pas R, Kuck DJ (2008) Using OpenMP: Portable Shared Memory Parallel Programming. https://apps2.mdp.ac.id/perpustakaan/ebook/Karya%20Umum/Portable_Shared_Memory_Parallel_Programming.pdf. Accessed 3 May 2020.

Chawla NV (2009) Data mining for imbalanced datasets: an overview. In: Data Mining and Knowledge Discovery Handbook. Springer US, Boston, MA, pp 875–886.

Chen D, Sain SL, Guo K (2012) Data mining for the online retail industry: a case study of RFM model-based customer segmentation using data mining. J Database Mark Cust Strateg Manag 19:197–208. https://doi.org/10.1057/dbm.2012.17.

Chen S (2016) Detection of fraudulent financial statements using the hybrid data mining approach. Springerplus 5:89. https://doi.org/10.1186/s40064-016-1707-6.

CHM ENIAC—CHM Revolution. https://www.computerhistory.org/revolution/birth-of-the-computer/4/78. Accessed 9 Dec 2019.

Chou C-H, Hsieh S-C, Qiu C-J (2017) Hybrid genetic algorithm and fuzzy clustering for bankruptcy prediction. Appl Soft Comput 56:298–316. https://doi.org/10.1016/j.asoc.2017.03.014.

Ciszak L (2008) Application of clustering and association methods in data cleaning. In: 2008 International Multiconference on Computer Science and Information Technology, pp 97–103.

Cryer JD, Chan KS (2008) Time Series Analysis: With Applications in R. Springer, New York.

Cumby C, Fano A, Ghani R, Krema M (2004) Predicting customer shopping lists from point-of-sale purchase data. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, pp 402–409.

de Sá AGC, Pereira ACM, Pappa GL (2018) A customized classification algorithm for credit card fraud detection. Eng Appl Artif Intell 72:21–29. https://doi.org/10.1016/j.engappai.2018.03.011.

Doumpos M, Zopounidis C (2002) Multi–criteria classification methods in financial and banking decisions. Int Trans Oper Res 9:567–581. https://doi.org/10.1111/1475-3995.00374.

Erl T, Khattak W, Buhler P (2015) Big Data Fundamentals: Concepts, Drivers & Techniques. Prentice Hall.

Farajian MA, Mohammadi S (2010) Mining the banking customer behavior using clustering and association rules methods. Int J Indust Eng Prod Res 21:239–245.

Ghemawat S, Gobioff H, Leung S-T (2003) The Google file system. In: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles. ACM, New York, NY, USA, pp 29–43.

Gottfredson LS (1997) Mainstream science on intelligence: an editorial with 52 signatories, history, and bibliography. Intelligence 24:13–23. https://doi.org/10.1016/S0160-2896(97)90011-8.

Guida T (2018) Big Data and Machine Learning in Quantitative Investment. Wiley.

Gupta R, Pathak C (2014) A machine learning framework for predicting purchase by online customers based on dynamic pricing. Procedia Comput Sci 36:599–605. https://doi.org/10.1016/j.procs.2014.09.060.

Hamuro Y, Katoh N, Edward IH, et al (2003) Combining information fusion with string pattern analysis: a new method for predicting future purchase behavior. In: Torra V (ed) Information Fusion in Data Mining. Springer Berlin Heidelberg, Berlin, Heidelberg, pp 161–187.

Holland JH (1992) Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. MIT Press.

Holmes A (2012) Hadoop in practice—MEAP. In: Hadoop in Practice. p 525.

Hsu CF, Hung HF (2009) Classification methods of credit rating—a comparative analysis on SVM, MDA and RST. In: 2009 International Conference on Computational Intelligence and Software Engineering. pp 1–4.

Hurwitz J, Nugent A, Halper F, Kaufman M (2013) Big Data for Dummies, 1st edn. For Dummies.

Ishwarappa, Anuradha J (2015) A brief introduction on Big Data 5Vs characteristics and Hadoop Technology. Procedia Comput Sci 48:319–324. https://doi.org/10.1016/j.procs.2015.04.188.

Joudaki H, Rashidian A, Minaei-Bidgoli B, et al (2015) Using data mining to detect health care fraud and abuse: a review of literature. Glob J Health Sci 7:194.

Kaggle, House Price Prediction with Numeric-only, https://www.kaggle.com/youngseoklee/house-price-prediction-with-numeric-only-dataset/data. Accessed 5 Apr 2020.

Khan MA, Uddin MF, Gupta N (2014) Seven V’s of Big Data understanding Big Data to extract value. In: Proceedings of the 2014 Zone 1 Conference of the American Society for Engineering Education. IEEE, pp 1–5.

Kim E, Kim W, Lee Y (2003) Combination of multiple classifiers for the customer’s purchase behavior prediction. Decis Support Syst 34:167–175. https://doi.org/10.1016/S0167-9236(02)00079-9.

Kirkos E, Spathis C, Manolopoulos Y (2007) Data mining techniques for the detection of fraudulent financial statements. Expert Syst Appl 32:995–1003.

Kirlidog M, Asuk C (2012) A fraud detection approach with data mining in health insurance. Procedia-Social Behav Sci 62:989–994.

Kshemkalyani AD, Singhal M (2011) Distributed Computing: Principles, Algorithms, and Systems. Cambridge University Press.

Kumar BS, Ravi V (2016) A survey of the applications of text mining in financial domain. Knowledge-Based Syst 114:128–147.

Kunigk J, Buss I, Wilkinson P, George L (2018) Architecting Modern Data Platforms: A Guide to Enterprise Hadoop at Scale. O’Reilly Media.

Labrinidis A, Jagadish H V (2012) Challenges and opportunities with Big Data. Proc VLDB Endow 5:2032–2033. https://doi.org/10.14778/2367502.2367572.

McCarthy J, Minsky ML, Rochester N, Shannon CE (2006) A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955. AI Magazine 27(4):12. https://doi.org/10.1609/aimag.v27i4.1904.

Meng X, Bradley J, Yavuz B, et al (2016) MLlib: Machine Learning in Apache Spark. J Mach Learn Res 17:1–7.

Mitchell TM (1999) Machine learning and data mining. Commun ACM 42:30–36. https://doi.org/10.1145/319382.319388.

Mukid MA, Widiharih T, Rusgiyono A, Prahutama A (2018) Credit scoring analysis using weighted k nearest neighbor. In: Warsito B, Putro SP, Khumaeni A (eds) 7th International Seminar on New Paradigm and Innovation on Natural Science and Its Application. IOP Publishing, Bristol, England.

Ngai EWT, Hu Y, Wong YH, et al (2011) The application of data mining techniques in financial fraud detection: a classification framework and an academic review of literature. Decis Support Syst 50:559–569. https://doi.org/10.1016/j.dss.2010.08.006.

Owens JD, Houston M, Luebke D, et al (2008) GPU computing. Proc IEEE 96:879–899. https://doi.org/10.1109/JPROC.2008.917757.

Pavlidis NG, Plagianakos VP, Tasoulis DK, Vrahatis MN (2006) Financial forecasting through unsupervised clustering and neural networks. Oper Res 6:103–127. https://doi.org/10.1007/BF02941227.

Qiu J, Lin Z, Li Y (2015) Predicting customer purchase behavior in the e-commerce context. Electron Commer Res 15:427–452. https://doi.org/10.1007/s10660-015-9191-6.

Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65. https://doi.org/10.1016/0377-0427(87)90125-7.

Sato M (2002) OpenMP: parallel programming API for shared memory multiprocessors and on-chip multiprocessors. In: 15th International Symposium on System Synthesis, 2002. pp 109–111.

Schmuck F, Haskin R (2002) GPFS: A shared-disk file system for large computing clusters. In: Proceedings of the 1st USENIX Conference on File and Storage Technologies. USENIX Association, Berkeley, CA, USA.

Spaggiari JM, Kovacevic M, Noland B, Bosshart R (2018) Getting Started with Kudu: Perform Fast Analytics on Fast Data. O’Reilly Media.

Trobec R, Slivnik B, Bulić P, Robič B (2018) Introduction to Parallel Computing: From Algorithms to Programming on State-of-the-Art Platforms. Springer International Publishing.

Turkington G, Deshpande T, Karanth S (2016) Hadoop: Data Processing and Modelling. Packt Publishing.

Uzun E, Özhan E (2018) Examining the impact of feature selection on classification of user reviews in web pages. In: International Conference on Artificial Intelligence and Data Processing (IDAP 2018). Malatya, Turkey, pp 430–437.

Vohra D (2016) Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools. Apress.

Wagner W, Otto J, Chung Q (2002) Knowledge acquisition for expert systems in accounting and financial problem domains. Knowledge-Based Syst 15:439–447. https://doi.org/10.1016/S0950-7051(02)00026-6.

Wang Y, Xu W (2018) Leveraging deep learning with LDA-based text analytics to detect automobile insurance fraud. Decis Support Syst 105:87–95.

Wei-Yang Lin, Ya-Han Hu, Chih-Fong Tsai (2012) Machine learning in financial crisis prediction: a survey. IEEE Trans Syst Man, Cybern Part C (Applications Rev) 42:421–436. https://doi.org/10.1109/TSMCC.2011.2170420.

Witten IH, Frank E, Hall MA, Pal CJ (2016) Data Mining: Practical Machine Learning Tools and Techniques. Elsevier Science.

Woodward WA, Gray HL, Elliott AC (2017) Applied Time Series Analysis with R. CRC Press.

Xindong Wu, Xingquan Zhu, Gong-Qing Wu, Wei Ding (2014) Data mining with big data. IEEE Trans Knowl Data Eng 26:97–107. https://doi.org/10.1109/TKDE.2013.109.

Yao M, Zhou A, Jia M (2018) Applied Artificial Intelligence: A Handbook for Business Leaders. Topbots.

Yeh I-C, Lien C (2009) The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Syst Appl 36:2473–2480. https://doi.org/10.1016/j.eswa.2007.12.020.

Zhi-min Xu, Rui Zhang (2009) Financial revenue analysis based on association rules mining. In: 2009 Asia-Pacific Conference on Computational Intelligence and Industrial Applications (PACIIA), pp 220–223.

Zikopoulos P, Eaton C (2011) Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, 1st edn. McGraw-Hill Osborne Media.

Title: The Analysis of Big Financial Data Through Artificial Intelligence Methods
Authors: Erkan Ozhan, Erdinç Uzun
Publisher: Springer Nature Singapore
Book: The Impact of Artificial Intelligence on Governance, Economics and Finance, Volume I
Print ISBN: 978-981-336-810-1

Electronic ISBN: 978-981-336-811-8

Copyright Year: 2021
DOI: https://doi.org/10.1007/978-981-33-6811-8_4
