
2016 | Chapter

2. Big Data Analytics

Authors: Chun-Wei Tsai, Chin-Feng Lai, Han-Chieh Chao, Athanasios V. Vasilakos

Published in: Big Data Technologies and Applications

Publisher: Springer International Publishing


Abstract

The age of big data is now upon us, but traditional data analytics may not be able to handle such large quantities of data. The questions that now arise are how to develop a high-performance platform to analyze big data efficiently and how to design appropriate mining algorithms to find useful information in big data. To discuss these issues in depth, this paper begins with a brief introduction to data analytics, followed by a discussion of big data analytics. Some important open issues and further research directions for the next step of big data analytics are also presented.


In this chapter, by data analytics we mean the whole KDD process, whereas by data analysis we mean the part of data analytics that aims to find hidden information in the data, such as data mining.

In this chapter, by unlabeled input data we mean input data whose group membership is unknown. If all the input data are unlabeled, the distribution of the input data is unknown.

In this chapter, the analysis framework refers to the whole system, from raw data gathering through data reformatting and data analysis, all the way to knowledge representation.
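As a rough illustration of such a framework, the following Python sketch chains the four stages (raw data gathering, data reformatting, data analysis, and knowledge representation) as plain functions. The function names, the comma-separated record format, and the use of simple frequent-item counting as the analysis step are all assumptions made for illustration; a real framework would typically run each stage on a distributed platform.

```python
from collections import Counter
from typing import Iterable


def gather(raw_lines: Iterable[str]) -> list[str]:
    """Raw data gathering: keep non-empty records (stand-in for a real data source)."""
    return [line for line in raw_lines if line.strip()]


def reformat(records: list[str]) -> list[list[str]]:
    """Data reformatting: split comma-separated records into lowercase items."""
    return [
        [item.strip().lower() for item in rec.split(",") if item.strip()]
        for rec in records
    ]


def analyze(transactions: list[list[str]], min_support: int) -> dict[str, int]:
    """Data analysis: keep items that occur in at least `min_support` records."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: n for item, n in counts.items() if n >= min_support}


def represent(frequent: dict[str, int]) -> str:
    """Knowledge representation: render findings as a ranked, readable summary."""
    ranked = sorted(frequent.items(), key=lambda kv: (-kv[1], kv[0]))
    return "; ".join(f"{item} ({n})" for item, n in ranked)


def run_framework(raw_lines: Iterable[str], min_support: int = 2) -> str:
    # Each stage consumes the output of the previous one, end to end.
    return represent(analyze(reformat(gather(raw_lines)), min_support))


if __name__ == "__main__":
    raw = ["Milk, Bread", "bread, Butter", "", "milk,bread,eggs"]
    print(run_framework(raw))
```

With the sample input above, the sketch prints `bread (3); milk (2)`. Keeping each stage a separate function makes it easy to swap in a different reformatting or analysis step without touching the rest of the chain.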

In a system that has only one master, the whole system may go down when the master machine crashes.

The learner typically represents the classification function that creates the classifier, which in turn helps us classify unknown input data.
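To make the distinction between learner and classifier concrete, the sketch below treats the learner as a function that takes labeled training data and returns a classifier, which is itself a function applied to unknown input. Nearest-centroid classification is used only because it is short; the names `nearest_centroid_learner` and `classify` are hypothetical and are not taken from the chapter.

```python
from math import dist
from typing import Callable, Sequence

Point = tuple[float, ...]
Classifier = Callable[[Point], str]


def nearest_centroid_learner(samples: Sequence[Point],
                             labels: Sequence[str]) -> Classifier:
    """Learner: build a classifier from labeled training samples."""
    if not samples or len(samples) != len(labels):
        raise ValueError("need equally many samples and labels (at least one)")

    # Accumulate per-label coordinate sums and counts.
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1

    # One centroid (mean point) per label.
    centroids = {y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()}

    def classify(x: Point) -> str:
        """Classifier: assign x to the label with the nearest centroid."""
        return min(centroids, key=lambda y: dist(x, centroids[y]))

    return classify


if __name__ == "__main__":
    clf = nearest_centroid_learner([(0, 0), (0, 1), (5, 5), (6, 5)],
                                   ["a", "a", "b", "b"])
    print(clf((0.5, 0.5)), clf((5, 6)))
```

Here the learner runs once over the training data, while the returned classifier can then be applied to any number of unknown inputs; for the two query points above it returns `a` and `b`.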

The basic idea of [128] is that each ant picks up and drops data items according to their similarity to the items in its local neighborhood.
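A minimal sketch of such a pick-up/drop rule follows. It assumes the probabilities commonly used in ant-based clustering, in the form usually attributed to Deneubourg et al.'s basic sorting model: p_pick = (k1 / (k1 + f))^2 and p_drop = (f / (k2 + f))^2, where f is the perceived similarity of an item to its local neighbors. The similarity-weighted density used for f, the constants `k1` and `k2`, and the parameters `alpha` and `cells` are illustrative assumptions, not values taken from the chapter.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def neighborhood_similarity(item: T, neighbors: Sequence[T],
                            distance: Callable[[T, T], float],
                            alpha: float, cells: int) -> float:
    """Perceived local similarity f of `item`, clamped to [0, 1].

    Assumed form: a similarity-weighted density over the `cells` grid cells
    around the ant; `alpha` scales how quickly distance reduces similarity.
    """
    total = sum(1.0 - distance(item, other) / alpha for other in neighbors)
    return min(1.0, max(0.0, total / cells))


def pick_up_probability(f: float, k1: float = 0.1) -> float:
    # Small f (item unlike its neighbors) -> probability near 1: pick it up.
    return (k1 / (k1 + f)) ** 2


def drop_probability(f: float, k2: float = 0.15) -> float:
    # Large f (neighbors are similar) -> probability near 1: drop it here.
    return (f / (k2 + f)) ** 2


if __name__ == "__main__":
    def gap(a: float, b: float) -> float:
        return abs(a - b)

    f_similar = neighborhood_similarity(0.5, [0.5] * 8, gap, alpha=1.0, cells=8)
    f_isolated = neighborhood_similarity(0.0, [], gap, alpha=1.0, cells=8)
    print(f"similar:  pick={pick_up_probability(f_similar):.3f} "
          f"drop={drop_probability(f_similar):.3f}")
    print(f"isolated: pick={pick_up_probability(f_isolated):.3f} "
          f"drop={drop_probability(f_isolated):.3f}")
```

For an item surrounded by eight identical neighbors, the sketch gives a pick-up probability of about 0.008 and a drop probability of about 0.756; for an isolated item the values are 1 and 0. Repeating these decisions as ants move over a grid is what gradually gathers similar items into clusters.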

1.

Lyman P, Varian H. How much information 2003? Tech. Rep. 2004. [Online]. http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/printable_report.pdf.

2.

Xu R, Wunsch D. Clustering. Hoboken: Wiley-IEEE Press; 2009.

3.

Ding C, He X. K-means clustering via principal component analysis. In: Proceedings of the twenty-first international conference on machine learning; 2004. pp. 1–9.

4.

Kollios G, Gunopulos D, Koudas N, Berchtold S. Efficient biased sampling for approximate clustering and outlier detection in large data sets. IEEE Trans Knowl Data Eng. 2003;15(5):1170–87.

5.

Fisher D, DeLine R, Czerwinski M, Drucker S. Interactions with big data analytics. Interactions. 2012;19(3):50–9.

6.

Laney D. 3D data management: controlling data volume, velocity, and variety. META Group, Tech. Rep. 2001. [Online]. http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf.

7.

van Rijmenam M. Why the 3v’s are not sufficient to describe big data. BigData Startups, Tech. Rep. 2013. [Online]. http://www.bigdata-startups.com/3vs-sufficient-describe-big-data/.

8.

Borne K. Top 10 big data challenges: a serious look at 10 big data v’s. Tech. Rep. 2014. [Online]. https://www.mapr.com/blog/top-10-big-data-challenges-look-10-big-data-v.

9.

Press G. $16.1 billion big data market: 2014 predictions from IDC and IIA, Forbes, Tech. Rep. 2013. [Online]. http://www.forbes.com/sites/gilpress/2013/12/12/16-1-billion-big-data-market-2014-predictions-from-idc-and-iia/.

10.

Big data and analytics—an IDC four pillar research area. IDC, Tech. Rep. 2013. [Online]. http://www.idc.com/prodserv/FourPillars/bigData/index.jsp.

11.

Taft DK. Big data market to reach $46.34 billion by 2018. EWEEK, Tech. Rep. 2013. [Online]. http://www.eweek.com/database/big-data-market-to-reach-46.34-billion-by-2018.html.

12.

ABI Research. Big data spending to reach $114 billion in 2018; look for machine learning to drive analytics. Tech. Rep. 2013. [Online]. https://www.abiresearch.com/press/big-data-spending-to-reach-114-billion-in-2018-loo.

13.

Furrier J. Big data market $50 billion by 2017 – HP vertica comes out #1 – according to wikibon research, SiliconANGLE. Tech. Rep. 2012. [Online]. http://siliconangle.com/blog/2012/02/15/big-data-market-15-billion-by-2017-hp-vertica-comes-out-1-according-to-wikibon-research/.

14.

Kelly J, Vellante D, Floyer D. Big data market size and vendor revenues. Wikibon, Tech. Rep. 2014. [Online]. http://wikibon.org/wiki/v/Big_Data_Market_Size_and_Vendor_Revenues.

15.

Kelly J, Floyer D, Vellante D, Miniman S. Big data vendor revenue and market forecast 2012–2017, Wikibon. Tech. Rep. 2014. [Online]. http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2012-2017.

16.

Mayer-Schonberger V, Cukier K. Big data: a revolution that will transform how we live, work, and think. Boston: Houghton Mifflin Harcourt; 2013.

17.

Chen H, Chiang RHL, Storey VC. Business intelligence and analytics: from big data to big impact. MIS Quart. 2012;36(4):1165–88.

18.

Kitchin R. The real-time city? big data and smart urbanism. Geo J. 2014;79(1):1–14.

19.

Fayyad UM, Piatetsky-Shapiro G, Smyth P. From data mining to knowledge discovery in databases. AI Mag. 1996;17(3):37–54.

20.

Han J. Data mining: concepts and techniques. San Francisco: Morgan Kaufmann Publishers Inc.; 2005.

21.

Agrawal R, Imieliński T, Swami A. Mining association rules between sets of items in large databases. Proc ACM SIGMOD Int Conf Manag Data. 1993;22(2):207–16.

22.

Witten IH, Frank E. Data mining: practical machine learning tools and techniques. San Francisco: Morgan Kaufmann Publishers Inc.; 2005.

23.

Abbass H, Newton C, Sarker R. Data mining: a heuristic approach. Hershey: IGI Global; 2002.

24.

Cannataro M, Congiusta A, Pugliese A, Talia D, Trunfio P. Distributed data mining on grids: services, tools, and applications. IEEE Trans Syst Man Cyber Part B Cyber. 2004;34(6):2451–65.

25.

Krishna K, Murty MN. Genetic k-means algorithm. IEEE Trans Syst Man Cyber Part B Cyber. 1999;29(3):433–9.

26.

Tsai C-W, Lai C-F, Chiang M-C, Yang L. Data mining for internet of things: a survey. IEEE Commun Surv Tutor. 2014;16(1):77–97.

27.

Jain AK, Murty MN, Flynn PJ. Data clustering: a review. ACM Comp Surv. 1999;31(3):264–323.

28.

MacQueen JB. Some methods for classification and analysis of multivariate observations. In: Proceedings of the Berkeley symposium on mathematical statistics and probability; 1967. pp. 281–297.

29.

Safavian S, Landgrebe D. A survey of decision tree classifier methodology. IEEE Trans Syst Man Cyber. 1991;21(3):660–74.

30.

McCallum A, Nigam K. A comparison of event models for naive bayes text classification. In: Proceedings of the national conference on artificial intelligence; 1998. pp. 41–48.

31.

Boser BE, Guyon IM, Vapnik VN. A training algorithm for optimal margin classifiers. In: Proceedings of the annual workshop on computational learning theory; 1992. pp. 144–152.

32.

Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In: Proceedings of the ACM SIGMOD international conference on management of data; 2000. pp. 1–12.

33.

Kaya M, Alhajj R. Genetic algorithm based framework for mining fuzzy association rules. Fuzzy Sets Syst. 2005;152(3):587–601.

34.

Srikant R, Agrawal R. Mining sequential patterns: generalizations and performance improvements. In: Proceedings of the international conference on extending database technology: advances in database technology; 1996. pp. 3–17.

35.

Zaki MJ. SPADE: an efficient algorithm for mining frequent sequences. Mach Learn. 2001;42(1–2):31–60.

36.

Baeza-Yates RA, Ribeiro-Neto B. Modern Information Retrieval. Boston: Addison-Wesley Longman Publishing Co., Inc; 1999.

37.

Liu B. Web data mining: exploring hyperlinks, contents, and usage data. Berlin: Springer; 2007.

38.

d’Aquin M, Jay N. Interpreting data mining results with linked data for learning analytics: motivation, case study and directions. In: Proceedings of the international conference on learning analytics and knowledge. pp. 155–164.

39.

Shneiderman B. The eyes have it: a task by data type taxonomy for information visualizations. In: Proceedings of the IEEE symposium on visual languages; 1996. pp. 336–343.

40.

Mani I, Bloedorn E. Multi-document summarization by graph search and matching. In: Proceedings of the national conference on artificial intelligence and ninth conference on innovative applications of artificial intelligence; 1997. pp. 622–628.

41.

Kopanakis I, Pelekis N, Karanikas H, Mavroudkis T. Visual techniques for the interpretation of data mining outcomes. In: Proceedings of the Panhellenic conference on advances in informatics; 2005. pp. 25–35.

42.

Elkan C. Using the triangle inequality to accelerate k-means. In: Proceedings of the international conference on machine learning; 2003. pp. 147–153.

43.

Catanzaro B, Sundaram N, Keutzer K. Fast support vector machine training and classification on graphics processors. In: Proceedings of the international conference on machine learning; 2008. pp. 104–111.

44.

Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. In: Proceedings of the ACM SIGMOD international conference on management of data; 1996. pp. 103–114.

45.

Ester M, Kriegel HP, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining; 1996. pp. 226–231.

46.

Ester M, Kriegel HP, Sander J, Wimmer M, Xu X. Incremental clustering for mining in a data warehousing environment. In: Proceedings of the International Conference on Very Large Data Bases; 1998. pp. 323–333.

47.

Ordonez C, Omiecinski E. Efficient disk-based k-means clustering for relational databases. IEEE Trans Knowl Data Eng. 2004;16(8):909–21.

48.

Kogan J. Introduction to clustering large and high-dimensional data. Cambridge: Cambridge Univ Press; 2007.

49.

Mitra S, Pal S, Mitra P. Data mining in soft computing framework: a survey. IEEE Trans Neural Netw. 2002;13(1):3–14.

50.

Mehta M, Agrawal R, Rissanen J. SLIQ: a fast scalable classifier for data mining. In: Proceedings of the 5th international conference on extending database technology: advances in database technology; 1996. pp. 18–32.

51.

Micó L, Oncina J, Carrasco RC. A fast branch and bound nearest neighbour classifier in metric spaces. Pattern Recogn Lett. 1996;17(7):731–9.

52.

Djouadi A, Bouktache E. A fast algorithm for the nearest-neighbor classifier. IEEE Trans Pattern Anal Mach Intel. 1997;19(3):277–82.

53.

Ververidis D, Kotropoulos C. Fast and accurate sequential floating forward feature selection with the bayes classifier applied to speech emotion recognition. Signal Process. 2008;88(12):2956–70.

54.

Pei J, Han J, Mao R. CLOSET: an efficient algorithm for mining frequent closed itemsets. In: Proceedings of the ACM SIGMOD workshop on research issues in data mining and knowledge discovery; 2000. pp. 21–30.

55.

Zaki MJ, Hsiao C-J. Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans Knowl Data Eng. 2005;17(4):462–78.

56.

Burdick D, Calimlim M, Gehrke J. MAFIA: a maximal frequent itemset algorithm for transactional databases. In: Proceedings of the international conference on data engineering; 2001. pp. 443–452.

57.

Chen B, Haas P, Scheuermann P. A new two-phase sampling based algorithm for discovering association rules. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining; 2002. pp. 462–468.

58.

Yan X, Han J, Afshar R. CloSpan: mining closed sequential patterns in large datasets. In: Proceedings of the SIAM international conference on data mining; 2003. pp. 166–177.

59.

Pei J, Han J, Asl MB, Pinto H, Chen Q, Dayal U, Hsu MC. PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proceedings of the international conference on data engineering; 2001. pp. 215–226.

60.

Ayres J, Flannick J, Gehrke J, Yiu T. Sequential pattern mining using a bitmap representation. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining; 2002. pp. 429–435.

61.

Masseglia F, Poncelet P, Teisseire M. Incremental mining of sequential patterns in large databases. Data Knowl Eng. 2003;46(1):97–121.

62.

Xu R, Wunsch-II DC. Survey of clustering algorithms. IEEE Trans Neural Netw. 2005;16(3):645–78.

63.

Chiang M-C, Tsai C-W, Yang C-S. A time-efficient pattern reduction algorithm for k-means clustering. Inform Sci. 2011;181(4):716–31.

64.

Bradley PS, Fayyad UM. Refining initial points for k-means clustering. In: Proceedings of the international conference on machine learning; 1998. pp. 91–99.

65.

Laskov P, Gehl C, Krüger S, Müller K-R. Incremental support vector learning: analysis, implementation and applications. J Mach Learn Res. 2006;7:1909–36.

66.

Russom P. Big data analytics. TDWI: Tech. Rep; 2011.

67.

Ma C, Zhang HH, Wang X. Machine learning for big data analytics in plants. Trends Plant Sci. 2014;19(12):798–808.

68.

Boyd D, Crawford K. Critical questions for big data. Inform Commun Soc. 2012;15(5):662–79.

69.

Katal A, Wazid M, Goudar R. Big data: issues, challenges, tools and good practices. In: Proceedings of the international conference on contemporary computing; 2013. pp. 404–409.

70.

Baraniuk RG. More is less: signal processing and the data deluge. Science. 2011;331(6018):717–9.

71.

Lee J, Hong S, Lee JH. An efficient prediction for heavy rain from big weather data using genetic algorithm. In: Proceedings of the international conference on ubiquitous information management and communication; 2014. pp. 25:1–25:7.

72.

Famili A, Shen W-M, Weber R, Simoudis E. Data preprocessing and intelligent data analysis. Intel Data Anal. 1997;1(1–4):3–23.

73.

Zhang H. A novel data preprocessing solution for large scale digital forensics investigation on big data, Master’s thesis, Norway; 2013.

74.

Ham YJ, Lee H-W. International journal of advances in soft computing and its applications. Calc Paralleles Reseaux et Syst Repar. 2014;6(1):1–18.

75.

Cormode G, Duffield N. Sampling for big data: a tutorial. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining; 2014. pp. 1975–1975.

76.

Satyanarayana A. Intelligent sampling for big data using bootstrap sampling and chebyshev inequality. In: Proceedings of the IEEE Canadian conference on electrical and computer engineering; 2014. pp. 1–6.

77.

Jun SW, Fleming K, Adler M, Emer JS. Zip-io: architecture for application-specific compression of big data. In: Proceedings of the international conference on field-programmable technology; 2012. pp. 343–351.

78.

Zou H, Yu Y, Tang W, Chen HM. Improving I/O performance with adaptive data compression for big data applications. In: Proceedings of the international parallel and distributed processing symposium workshops; 2014. pp. 1228–1237.

79.

Yang C, Zhang X, Zhong C, Liu C, Pei J, Ramamohanarao K, Chen J. A spatiotemporal compression based approach for efficient big data processing on cloud. J Comp Syst Sci. 2014;80(8):1563–83.

80.

Xue Z, Shen G, Li J, Xu Q, Zhang Y, Shao J. Compression-aware I/O performance analysis for big data clustering. In: Proceedings of the international workshop on big data, streams and heterogeneous source mining: algorithms, systems, programming models and applications; 2012. pp. 45–52.

81.

Pospiech M, Felden C. Big data—a state-of-the-art. In: Proceedings of the Americas conference on information systems; 2012. pp. 1–23. [Online]. http://aisel.aisnet.org/amcis2012/proceedings/DecisionSupport/22.

82.

Apache Hadoop, February 2, 2015. [Online]. http://hadoop.apache.org.

83.

Cuda, February 2, 2015. [Online]. http://www.nvidia.com/object/cuda_home_new.html.

84.

Apache Storm, February 2, 2015. [Online]. http://storm.apache.org/.

85.

Curtin RR, Cline JR, Slagle NP, March WB, Ram P, Mehta NA, Gray AG. MLPACK: a scalable C++ machine learning library. J Mach Learn Res. 2013;14:801–5.

86.

Apache Mahout, February 2, 2015. [Online]. http://mahout.apache.org/.

87.

Huai Y, Lee R, Zhang S, Xia CH, Zhang X. DOT: a matrix model for analyzing, optimizing and deploying software for big data analytics in distributed systems. In: Proceedings of the ACM symposium on cloud computing; 2011. pp. 4:1–4:14.

88.

Rusu F, Dobra A. GLADE: a scalable framework for efficient analytics. In: Proceedings of LADIS workshop held in conjunction with VLDB; 2012. pp. 1–6.

89.

Cheng Y, Qin C, Rusu F. GLADE: big data analytics made easy. In: Proceedings of the ACM SIGMOD international conference on management of data; 2012. pp. 697–700.

90.

Essa YM, Attiya G, El-Sayed A. Mobile agent based new framework for improving big data analysis. In: Proceedings of the international conference on cloud computing and big data; 2013. pp. 381–386.

91.

Wonner J, Grosjean J, Capobianco A, Bechmann D. Starfish: a selection technique for dense virtual environments. In: Proceedings of the ACM symposium on virtual reality software and technology; 2012. pp. 101–104.

92.

Demchenko Y, de Laat C, Membrey P. Defining architecture components of the big data ecosystem. In: Proceedings of the international conference on collaboration technologies and systems; 2014. pp. 104–112.

93.

Ye F, Wang ZJ, Zhou FC, Wang YP, Zhou YC. Cloud-based big data mining and analyzing services platform integrating R. In: Proceedings of the international conference on advanced cloud and big data; 2013. pp. 147–151.

94.

Wu X, Zhu X, Wu G-Q, Ding W. Data mining with big data. IEEE Trans Knowl Data Eng. 2014;26(1):97–107.

95.

Laurila JK, Gatica-Perez D, Aad I, Blom J, Bornet O, Do T, Dousse O, Eberle J, Miettinen M. The mobile data challenge: big data for mobile computing research. In: Proceedings of the mobile data challenge by Nokia workshop; 2012. pp. 1–8.

96.

Demirkan H, Delen D. Leveraging the capabilities of service-oriented decision support systems: putting analytics and big data in cloud. Decis Support Syst. 2013;55(1):412–21.

97.

Talia D. Clouds for scalable big data analytics. Computer. 2013;46(5):98–101.

98.

Lu R, Zhu H, Liu X, Liu JK, Shao J. Toward efficient and privacy-preserving computing in big data era. IEEE Netw. 2014;28(4):46–50.

99.

Cuzzocrea A, Song IY, Davis KC. Analytics over large-scale multidimensional data: the big data revolution!. In: Proceedings of the ACM international workshop on data warehousing and OLAP; 2011. pp. 101–104.

100.

Zhang J, Huang ML. 5Ws model for big data analysis and visualization. In: Proceedings of the international conference on computational science and engineering; 2013. pp. 1021–1028.

101.

Chandarana P, Vijayalakshmi M. Big data analytics frameworks. In: Proceedings of the international conference on circuits, systems, communication and information technology applications; 2014. pp. 430–434.

102.

Apache Drill, February 2, 2015. [Online]. http://drill.apache.org/.

103.

Hu H, Wen Y, Chua T-S, Li X. Toward scalable systems for big data analytics: a technology tutorial. IEEE Access. 2014;2:652–87.

104.

Sagiroglu S, Sinanc D. Big data: a review. In: Proceedings of the international conference on collaboration technologies and systems; 2013. pp. 42–47.

105.

Fan W, Bifet A. Mining big data: current status, and forecast to the future. ACM SIGKDD Explor Newslett. 2013;14(2):1–5.

106.

Diebold FX. On the origin(s) and development of the term “big data”. Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, Tech. Rep. 2012. [Online]. http://economics.sas.upenn.edu/sites/economics.sas.upenn.edu/files/12-037.pdf.

107.

Weiss SM, Indurkhya N. Predictive data mining: a practical guide. San Francisco: Morgan Kaufmann Publishers Inc.; 1998.

108.

Fahad A, Alshatri N, Tari Z, Alamri A, Khalil I, Zomaya A, Foufou S, Bouras A. A survey of clustering algorithms for big data: taxonomy and empirical analysis. IEEE Trans Emerg Topics Comp. 2014;2(3):267–79.

109.

Shirkhorshidi AS, Aghabozorgi SR, Teh YW, Herawan T. Big data clustering: a review. In: Proceedings of the international conference on computational science and its applications; 2014. pp. 707–720.

110.

Xu H, Li Z, Guo S, Chen K. CloudVista: interactive and economical visual cluster analysis for big data in the cloud. Proc VLDB Endow. 2012;5(12):1886–9.

111.

Cui X, Gao J, Potok TE. A flocking based algorithm for document clustering analysis. J Syst Archit. 2006;52(8–9):505–15.

112.

Cui X, Charles JS, Potok T. GPU enhanced parallel computing for large scale data clustering. Future Gener Comp Syst. 2013;29(7):1736–41.

113.

Feldman D, Schmidt M, Sohler C. Turning big data into tiny data: constant-size coresets for k-means, pca and projective clustering. In: Proceedings of the ACM-SIAM symposium on discrete algorithms; 2013. pp. 1434–1453.

114.

Tekin C, van der Schaar M. Distributed online big data classification using context information. In: Proceedings of the Allerton conference on communication, control, and computing; 2013. pp. 1435–1442.

115.

Rebentrost P, Mohseni M, Lloyd S. Quantum support vector machine for big feature and big data classification. CoRR, vol. abs/1307.0471; 2014. [Online]. http://dblp.uni-trier.de/db/journals/corr/corr1307.html#RebentrostML13.

116.

Lin MY, Lee PY, Hsueh SC. Apriori-based frequent itemset mining algorithms on mapreduce. In: Proceedings of the international conference on ubiquitous information management and communication; 2012. pp. 76:1–76:8.

117.

Riondato M, DeBrabant JA, Fonseca R, Upfal E. PARMA: a parallel randomized algorithm for approximate association rules mining in mapreduce. In: Proceedings of the ACM international conference on information and knowledge management; 2012. pp. 85–94.

118.

Leung CS, MacKinnon R, Jiang F. Reducing the search space for big data mining for interesting patterns from uncertain data. In: Proceedings of the international congress on big data; 2014. pp. 315–322.

119.

Yang L, Shi Z, Xu L, Liang F, Kirsh I. DH-TRIE frequent pattern mining on hadoop using JPA. In: Proceedings of the international conference on granular computing; 2011. pp. 875–878.

120.

Huang JW, Lin SC, Chen MS. DPSP: Distributed progressive sequential pattern mining on the cloud. In: Proceedings of the advances in knowledge discovery and data mining, vol. 6119; 2010. pp. 27–34.

121.

Paz CE. A survey of parallel genetic algorithms. Calc Paralleles Reseaux et Syst Repar. 1998;10(2):141–71.

122.

Kranthi Kiran B, Babu AV. A comparative study of issues in big data clustering algorithm with constraint based genetic algorithm for associative clustering. Int J Innov Res Comp Commun Eng. 2014;2(8):5423–32.

123.

Bu Y, Borkar VR, Carey MJ, Rosen J, Polyzotis N, Condie T, Weimer M, Ramakrishnan R. Scaling datalog for machine learning on big data. CoRR, vol. abs/1203.0160; 2012. [Online]. http://dblp.uni-trier.de/db/journals/corr/corr1203.html#abs-1203-0160.

124.

Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G. Pregel: A system for large-scale graph processing. In: Proceedings of the ACM SIGMOD international conference on management of data; 2010. pp. 135–146.

125.

Hasan S, Shamsuddin S, Lopes N. Soft computing methods for big data problems. In: Proceedings of the symposium on GPU computing and applications; 2013. pp. 235–247.

126.

Ku-Mahamud KR. Big data clustering using grid computing and ant-based algorithm. In: Proceedings of the international conference on computing and informatics; 2013. pp. 6–14.

127.

Deneubourg JL, Goss S, Franks N, Sendova-Franks A, Detrain C, Chrétien L. The dynamics of collective sorting: robot-like ants and ant-like robots. In: Proceedings of the international conference on simulation of adaptive behavior on from animals to animats; 1990. pp. 356–363.

128.

Radoop [Online]. https://rapidminer.com/products/radoop/. Accessed 2 Feb 2015.

129.

PigMix [Online]. https://cwiki.apache.org/confluence/display/PIG/PigMix. Accessed 2 Feb 2015.

130.

GridMix [Online]. http://hadoop.apache.org/docs/r1.2.1/gridmix.html. Accessed 2 Feb 2015.

131.

TeraSort [Online]. http://sortbenchmark.org/. Accessed 2 Feb 2015.

132.

TPC, transaction processing performance council [Online]. http://www.tpc.org/. Accessed 2 Feb 2015.

133.

Cooper BF, Silberstein A, Tam E, Ramakrishnan R, Sears R. Benchmarking cloud serving systems with YCSB. In: Proceedings of the ACM symposium on cloud computing; 2010. pp. 143–154.

134.

Ghazal A, Rabl T, Hu M, Raab F, Poess M, Crolotte A, Jacobsen HA. BigBench: towards an industry standard benchmark for big data analytics. In: Proceedings of the ACM SIGMOD international conference on management of data; 2013. pp. 1197–1208.

135.

Cheptsov A. Hpc in big data age: An evaluation report for java-based data-intensive applications implemented with hadoop and openmpi. In: Proceedings of the European MPI Users’ Group Meeting; 2014. pp. 175:175–175:180.

136.

Yuan LY, Wu L, You JH, Chi Y. Rubato db: A highly scalable staged grid database system for OLTP and big data applications. In: Proceedings of the ACM international conference on information and knowledge management; 2014. pp. 1–10.

137.

Zhao JM, Wang WS, Liu X, Chen YF. Big data benchmark—big DS. In: Proceedings of the advancing big data benchmarks; 2014. pp. 49–57.

138.

Saletore V, Krishnan K, Viswanathan V, Tolentino M. HcBench: Methodology, development, and full-system characterization of a customer usage representative big data/hadoop benchmark. In: Advancing big data benchmarks; 2014. pp. 73–93.

139.

Zhang L, Stoffel A, Behrisch M, Mittelstadt S, Schreck T, Pompl R, Weber S, Last H, Keim D. Visual analytics for the big data era—a comparative review of state-of-the-art commercial systems. In: Proceedings of the IEEE conference on visual analytics science and technology; 2012. pp. 173–182.

140.

Harati A, Lopez S, Obeid I, Picone J, Jacobson M, Tobochnik S. The TUH EEG CORPUS: A big data resource for automated EEG interpretation. In: Proceedings of the IEEE signal processing in medicine and biology symposium; 2014. pp. 1–5.

141.

Thusoo A, Sarma JS, Jain N, Shao Z, Chakka P, Anthony S, Liu H, Wyckoff P, Murthy R. Hive: a warehousing solution over a map-reduce framework. Proc VLDB Endow. 2009;2(2):1626–9.

142.

Beckmann M, Ebecken NFF, de Lima BSLP, Costa MA. A user interface for big data with rapidminer. RapidMiner World, Boston, MA, Tech. Rep.; 2014. [Online]. http://www.slideshare.net/RapidMiner/a-user-interface-for-big-data-with-rapidminer-marcelo-beckmann.

143.

Januzaj E, Kriegel HP, Pfeifle M. DBDC: Density based distributed clustering. In: Proceedings of the advances in database technology, vol. 2992; 2004. pp. 88–105.

144.

Zhao W, Ma H, He Q. Parallel k-means clustering based on mapreduce. Proc Cloud Comp. 2009;5931:674–9.

145.

Nolan RL. Managing the crises in data processing. Harvard Bus Rev. 1979;57(1):115–26.

146.

Tsai CW, Huang WC, Chiang MC. Recent development of metaheuristics for clustering. In: Proceedings of the mobile, ubiquitous, and intelligent computing, vol. 274; 2014. pp. 629–636.

Title: Big Data Analytics
Authors: Chun-Wei Tsai, Chin-Feng Lai, Han-Chieh Chao, Athanasios V. Vasilakos
Publisher: Springer International Publishing
Book: Big Data Technologies and Applications
Print ISBN: 978-3-319-44548-9

Electronic ISBN: 978-3-319-44550-2

Copyright Year: 2016
DOI: https://doi.org/10.1007/978-3-319-44550-2_2
