
2016 | OriginalPaper | Chapter

2. Big Data Analytics

Authors: Chun-Wei Tsai, Chin-Feng Lai, Han-Chieh Chao, Athanasios V. Vasilakos

Published in: Big Data Technologies and Applications

Publisher: Springer International Publishing


Abstract

The age of big data is arriving, but traditional data analytics may not be able to handle such large quantities of data. The questions that arise are how to develop a high-performance platform that analyzes big data efficiently and how to design an appropriate mining algorithm that finds useful information in big data. To discuss these issues in depth, this paper begins with a brief introduction to data analytics, followed by a discussion of big data analytics. Some important open issues and further research directions are also presented for the next steps in big data analytics.


Footnotes
1
In this chapter, data analytics refers to the whole KDD process, while data analysis refers to the part of data analytics that is aimed at finding the hidden information in the data, such as data mining.
 
2
In this chapter, unlabeled input data means data for which it is unknown to which group it belongs. If all the input data are unlabeled, the distribution of the input data is unknown.
 
3
In this paper, the analysis framework refers to the whole system, from raw data gathering and data reformatting through data analysis, all the way to knowledge representation.
 
4
A system that has only one master may go down entirely when the master machine crashes.
 
5
The learner typically represents the classification function, which creates the classifier that helps us classify unknown input data.
 
6
The basic idea of [128] is that each ant picks up and drops data items according to the similarity of the items in its local neighborhood.
 
Literature
2.
Xu R, Wunsch D. Clustering. Hoboken: Wiley-IEEE Press; 2009.
3.
Ding C, He X. K-means clustering via principal component analysis. In: Proceedings of the twenty-first international conference on machine learning; 2004. pp. 1–9.
4.
Kollios G, Gunopulos D, Koudas N, Berchtold S. Efficient biased sampling for approximate clustering and outlier detection in large data sets. IEEE Trans Knowl Data Eng. 2003;15(5):1170–87.
5.
Fisher D, DeLine R, Czerwinski M, Drucker S. Interactions with big data analytics. Interactions. 2012;19(3):50–9.
12.
13.
16.
Mayer-Schonberger V, Cukier K. Big data: a revolution that will transform how we live, work, and think. Boston: Houghton Mifflin Harcourt; 2013.
17.
Chen H, Chiang RHL, Storey VC. Business intelligence and analytics: from big data to big impact. MIS Quart. 2012;36(4):1165–88.
18.
Kitchin R. The real-time city? big data and smart urbanism. Geo J. 2014;79(1):1–14.
19.
Fayyad UM, Piatetsky-Shapiro G, Smyth P. From data mining to knowledge discovery in databases. AI Mag. 1996;17(3):37–54.
20.
Han J. Data mining: concepts and techniques. San Francisco: Morgan Kaufmann Publishers Inc.; 2005.
21.
Agrawal R, Imieliński T, Swami A. Mining association rules between sets of items in large databases. Proc ACM SIGMOD Int Conf Manag Data. 1993;22(2):207–16.
22.
Witten IH, Frank E. Data mining: practical machine learning tools and techniques. San Francisco: Morgan Kaufmann Publishers Inc.; 2005.
23.
Abbass H, Newton C, Sarker R. Data mining: a heuristic approach. Hershey: IGI Global; 2002.
24.
Cannataro M, Congiusta A, Pugliese A, Talia D, Trunfio P. Distributed data mining on grids: services, tools, and applications. IEEE Trans Syst Man Cyber Part B Cyber. 2004;34(6):2451–65.
25.
Krishna K, Murty MN. Genetic k-means algorithm. IEEE Trans Syst Man Cyber Part B Cyber. 1999;29(3):433–9.
26.
Tsai C-W, Lai C-F, Chiang M-C, Yang L. Data mining for internet of things: a survey. IEEE Commun Surv Tutor. 2014;16(1):77–97.
27.
Jain AK, Murty MN, Flynn PJ. Data clustering: a review. ACM Comp Surv. 1999;31(3):264–323.
28.
MacQueen JB. Some methods for classification and analysis of multivariate observations. In: Proceedings of the Berkeley symposium on mathematical statistics and probability; 1967. pp. 281–297.
29.
Safavian S, Landgrebe D. A survey of decision tree classifier methodology. IEEE Trans Syst Man Cyber. 1991;21(3):660–74.
30.
McCallum A, Nigam K. A comparison of event models for naive bayes text classification. In: Proceedings of the national conference on artificial intelligence; 1998. pp. 41–48.
31.
Boser BE, Guyon IM, Vapnik VN. A training algorithm for optimal margin classifiers. In: Proceedings of the annual workshop on computational learning theory; 1992. pp. 144–152.
32.
Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In: Proceedings of the ACM SIGMOD international conference on management of data; 2000. pp. 1–12.
33.
34.
Srikant R, Agrawal R. Mining sequential patterns: generalizations and performance improvements. In: Proceedings of the international conference on extending database technology: advances in database technology; 1996. pp. 3–17.
35.
Zaki MJ. Spade: an efficient algorithm for mining frequent sequences. Mach Learn. 2001;42(1–2):31–60.
36.
Baeza-Yates RA, Ribeiro-Neto B. Modern Information Retrieval. Boston: Addison-Wesley Longman Publishing Co., Inc; 1999.
37.
Liu B. Web data mining: exploring hyperlinks, contents, and usage data. Berlin: Springer; 2007.
38.
d’Aquin M, Jay N. Interpreting data mining results with linked data for learning analytics: motivation, case study and directions. In: Proceedings of the international conference on learning analytics and knowledge. pp. 155–164.
39.
Shneiderman B. The eyes have it: a task by data type taxonomy for information visualizations. In: Proceedings of the IEEE symposium on visual languages; 1996. pp. 336–343.
40.
Mani I, Bloedorn E. Multi-document summarization by graph search and matching. In: Proceedings of the national conference on artificial intelligence and ninth conference on innovative applications of artificial intelligence; 1997. pp. 622–628.
41.
Kopanakis I, Pelekis N, Karanikas H, Mavroudkis T. Visual techniques for the interpretation of data mining outcomes. In: Proceedings of the Panhellenic conference on advances in informatics; 2005. pp. 25–35.
42.
Elkan C. Using the triangle inequality to accelerate k-means. In: Proceedings of the international conference on machine learning; 2003. pp. 147–153.
43.
Catanzaro B, Sundaram N, Keutzer K. Fast support vector machine training and classification on graphics processors. In: Proceedings of the international conference on machine learning; 2008. pp. 104–111.
44.
Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. In: Proceedings of the ACM SIGMOD international conference on management of data; 1996. pp. 103–114.
45.
Ester M, Kriegel HP, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining; 1996. pp. 226–231.
46.
Ester M, Kriegel HP, Sander J, Wimmer M, Xu X. Incremental clustering for mining in a data warehousing environment. In: Proceedings of the International Conference on Very Large Data Bases; 1998. pp. 323–333.
47.
Ordonez C, Omiecinski E. Efficient disk-based k-means clustering for relational databases. IEEE Trans Knowl Data Eng. 2004;16(8):909–21.
48.
Kogan J. Introduction to clustering large and high-dimensional data. Cambridge: Cambridge Univ Press; 2007.
49.
Mitra S, Pal S, Mitra P. Data mining in soft computing framework: a survey. IEEE Trans Neural Netw. 2002;13(1):3–14.
50.
Mehta M, Agrawal R, Rissanen J. SLIQ: a fast scalable classifier for data mining. In: Proceedings of the 5th international conference on extending database technology: advances in database technology; 1996. pp. 18–32.
51.
Micó L, Oncina J, Carrasco RC. A fast branch and bound nearest neighbour classifier in metric spaces. Pattern Recogn Lett. 1996;17(7):731–9.
52.
Djouadi A, Bouktache E. A fast algorithm for the nearest-neighbor classifier. IEEE Trans Pattern Anal Mach Intel. 1997;19(3):277–82.
53.
Ververidis D, Kotropoulos C. Fast and accurate sequential floating forward feature selection with the bayes classifier applied to speech emotion recognition. Signal Process. 2008;88(12):2956–70.
54.
Pei J, Han J, Mao R. CLOSET: an efficient algorithm for mining frequent closed itemsets. In: Proceedings of the ACM SIGMOD workshop on research issues in data mining and knowledge discovery; 2000. pp. 21–30.
55.
Zaki MJ, Hsiao C-J. Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans Knowl Data Eng. 2005;17(4):462–78.
56.
Burdick D, Calimlim M, Gehrke J. MAFIA: a maximal frequent itemset algorithm for transactional databases. In: Proceedings of the international conference on data engineering; 2001. pp. 443–452.
57.
Chen B, Haas P, Scheuermann P. A new two-phase sampling based algorithm for discovering association rules. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining; 2002. pp. 462–468.
58.
Yan X, Han J, Afshar R. CloSpan: mining closed sequential patterns in large datasets. In: Proceedings of the SIAM international conference on data mining; 2003. pp. 166–177.
59.
Pei J, Han J, Asl MB, Pinto H, Chen Q, Dayal U, Hsu MC. PrefixSpan mining sequential patterns efficiently by prefix projected pattern growth. In: Proceedings of the international conference on data engineering; 2001. pp. 215–226.
60.
Ayres J, Flannick J, Gehrke J, Yiu T. Sequential pattern Mining using a bitmap representation. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining; 2002. pp. 429–435.
61.
Masseglia F, Poncelet P, Teisseire M. Incremental mining of sequential patterns in large databases. Data Knowl Eng. 2003;46(1):97–121.
62.
Xu R, Wunsch-II DC. Survey of clustering algorithms. IEEE Trans Neural Netw. 2005;16(3):645–78.
63.
Chiang M-C, Tsai C-W, Yang C-S. A time-efficient pattern reduction algorithm for k-means clustering. Inform Sci. 2011;181(4):716–31.
64.
Bradley PS, Fayyad UM. Refining initial points for k-means clustering. In: Proceedings of the international conference on machine learning; 1998. pp. 91–99.
65.
Laskov P, Gehl C, Krüger S, Müller K-R. Incremental support vector learning: analysis, implementation and applications. J Mach Learn Res. 2006;7:1909–36.
66.
Russom P. Big data analytics. TDWI: Tech. Rep; 2011.
67.
Ma C, Zhang HH, Wang X. Machine learning for big data analytics in plants. Trends Plant Sci. 2014;19(12):798–808.
68.
Boyd D, Crawford K. Critical questions for big data. Inform Commun Soc. 2012;15(5):662–79.
69.
Katal A, Wazid M, Goudar R. Big data: issues, challenges, tools and good practices. In: Proceedings of the international conference on contemporary computing; 2013. pp. 404–409.
70.
Baraniuk RG. More is less: signal processing and the data deluge. Science. 2011;331(6018):717–9.
71.
Lee J, Hong S, Lee JH. An efficient prediction for heavy rain from big weather data using genetic algorithm. In: Proceedings of the international conference on ubiquitous information management and communication; 2014. pp. 25:1–25:7.
72.
Famili A, Shen W-M, Weber R, Simoudis E. Data preprocessing and intelligent data analysis. Intel Data Anal. 1997;1(1–4):3–23.
73.
Zhang H. A novel data preprocessing solution for large scale digital forensics investigation on big data, Master’s thesis, Norway; 2013.
74.
Ham YJ, Lee H-W. International journal of advances in soft computing and its applications. Calc Paralleles Reseaux et Syst Repar. 2014;6(1):1–18.
75.
Cormode G, Duffield N. Sampling for big data: a tutorial. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining; 2014. pp. 1975–1975.
76.
Satyanarayana A. Intelligent sampling for big data using bootstrap sampling and chebyshev inequality. In: Proceedings of the IEEE Canadian conference on electrical and computer engineering; 2014. pp. 1–6.
77.
Jun SW, Fleming K, Adler M, Emer JS. Zip-io: architecture for application-specific compression of big data. In: Proceedings of the international conference on field-programmable technology; 2012. pp. 343–351.
78.
Zou H, Yu Y, Tang W, Chen HM. Improving I/O performance with adaptive data compression for big data applications. In: Proceedings of the international parallel and distributed processing symposium workshops; 2014. pp. 1228–1237.
79.
Yang C, Zhang X, Zhong C, Liu C, Pei J, Ramamohanarao K, Chen J. A spatiotemporal compression based approach for efficient big data processing on cloud. J Comp Syst Sci. 2014;80(8):1563–83.
80.
Xue Z, Shen G, Li J, Xu Q, Zhang Y, Shao J. Compression-aware I/O performance analysis for big data clustering. In: Proceedings of the international workshop on big data, streams and heterogeneous source mining: algorithms, systems, programming models and applications; 2012. pp. 45–52.
85.
Curtin RR, Cline JR, Slagle NP, March WB, Ram P, Mehta NA, Gray AG. MLPACK: a scalable C++ machine learning library. J Mach Learn Res. 2013;14:801–5.
87.
Huai Y, Lee R, Zhang S, Xia CH, Zhang X. DOT: a matrix model for analyzing, optimizing and deploying software for big data analytics in distributed systems. In: Proceedings of the ACM symposium on cloud computing; 2011. pp. 4:1–4:14.
88.
Rusu F, Dobra A. GLADE: a scalable framework for efficient analytics. In: Proceedings of LADIS workshop held in conjunction with VLDB; 2012. pp. 1–6.
89.
Cheng Y, Qin C, Rusu F. GLADE: big data analytics made easy. In: Proceedings of the ACM SIGMOD international conference on management of data; 2012. pp. 697–700.
90.
Essa YM, Attiya G, El-Sayed A. Mobile agent based new framework for improving big data analysis. In: Proceedings of the international conference on cloud computing and big data; 2013. pp. 381–386.
91.
Wonner J, Grosjean J, Capobianco A, Bechmann D. Starfish: a selection technique for dense virtual environments. In: Proceedings of the ACM symposium on virtual reality software and technology; 2012. pp. 101–104.
92.
Demchenko Y, de Laat C, Membrey P. Defining architecture components of the big data ecosystem. In: Proceedings of the international conference on collaboration technologies and systems; 2014. pp. 104–112.
93.
Ye F, Wang ZJ, Zhou FC, Wang YP, Zhou YC. Cloud-based big data mining and analyzing services platform integrating r. In: Proceedings of the international conference on advanced cloud and big data; 2013. pp. 147–151.
94.
Wu X, Zhu X, Wu G-Q, Ding W. Data mining with big data. IEEE Trans Knowl Data Eng. 2014;26(1):97–107.
95.
Laurila JK, Gatica-Perez D, Aad I, Blom J, Bornet O, Do T, Dousse O, Eberle J, Miettinen M. The mobile data challenge: big data for mobile computing research. In: Proceedings of the mobile data challenge by Nokia workshop; 2012. pp. 1–8.
96.
Demirkan H, Delen D. Leveraging the capabilities of service-oriented decision support systems: putting analytics and big data in cloud. Decis Support Syst. 2013;55(1):412–21.
97.
Talia D. Clouds for scalable big data analytics. Computer. 2013;46(5):98–101.
98.
Lu R, Zhu H, Liu X, Liu JK, Shao J. Toward efficient and privacy-preserving computing in big data era. IEEE Netw. 2014;28(4):46–50.
99.
Cuzzocrea A, Song IY, Davis KC. Analytics over large-scale multidimensional data: the big data revolution! In: Proceedings of the ACM international workshop on data warehousing and OLAP; 2011. pp. 101–104.
100.
Zhang J, Huang ML. 5Ws model for big data analysis and visualization. In: Proceedings of the international conference on computational science and engineering; 2013. pp. 1021–1028.
101.
Chandarana P, Vijayalakshmi M. Big data analytics frameworks. In: Proceedings of the international conference on circuits, systems, communication and information technology applications; 2014. pp. 430–434.
103.
Hu H, Wen Y, Chua T-S, Li X. Toward scalable systems for big data analytics: a technology tutorial. IEEE Access. 2014;2:652–87.
104.
go back to reference Sagiroglu S, Sinanc D, Big data: a review. In: Proceedings of the international conference on collaboration technologies and systems; 2013. pp. 42–47. Sagiroglu S, Sinanc D, Big data: a review. In: Proceedings of the international conference on collaboration technologies and systems; 2013. pp. 42–47.
105.
go back to reference Fan W, Bifet A. Mining big data: current status, and forecast to the future. ACM SIGKDD Explor Newslett. 2013;14(2):1–5.CrossRef Fan W, Bifet A. Mining big data: current status, and forecast to the future. ACM SIGKDD Explor Newslett. 2013;14(2):1–5.CrossRef
106.
go back to reference Diebold FX. On the origin(s) and development of the term “big data”. Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, Tech. Rep. 2012. [Online]. http://economics.sas. upenn.edu/sites/economics.sas.upenn.edu/files/12-037.pdf. Diebold FX. On the origin(s) and development of the term “big data”. Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, Tech. Rep. 2012. [Online]. http://​economics.​sas. upenn.edu/sites/economics.sas.upenn.edu/files/12-037.pdf.
107.
go back to reference Weiss SM, Indurkhya N. Predictive data mining: a practical guide. San Francisco: Morgan Kaufmann Publishers Inc.; 1998.MATH Weiss SM, Indurkhya N. Predictive data mining: a practical guide. San Francisco: Morgan Kaufmann Publishers Inc.; 1998.MATH
108.
go back to reference Fahad A, Alshatri N, Tari Z, Alamri A, Khalil I, Zomaya A, Foufou S, Bouras A. A survey of clustering algorithms for big data: taxonomy and empirical analysis. IEEE Trans Emerg Topics Comp. 2014;2(3):267–79.CrossRef Fahad A, Alshatri N, Tari Z, Alamri A, Khalil I, Zomaya A, Foufou S, Bouras A. A survey of clustering algorithms for big data: taxonomy and empirical analysis. IEEE Trans Emerg Topics Comp. 2014;2(3):267–79.CrossRef
109.
go back to reference Shirkhorshidi AS, Aghabozorgi SR, Teh YW, Herawan T. Big data clustering: a review. In: Proceedings of the international conference on computational science and its applications; 2014. pp. 707–720. Shirkhorshidi AS, Aghabozorgi SR, Teh YW, Herawan T. Big data clustering: a review. In: Proceedings of the international conference on computational science and its applications; 2014. pp. 707–720.
110.
go back to reference Xu H, Li Z, Guo S, Chen K. Cloudvista: interactive and economical visual cluster analysis for big data in the cloud. Proc VLDB Endow. 2012;5(12):1886–9.CrossRef Xu H, Li Z, Guo S, Chen K. Cloudvista: interactive and economical visual cluster analysis for big data in the cloud. Proc VLDB Endow. 2012;5(12):1886–9.CrossRef
111.
go back to reference Cui X, Gao J, Potok TE. A flocking based algorithm for document clustering analysis. J Syst Archit. 2006;52(89):505–15.CrossRef Cui X, Gao J, Potok TE. A flocking based algorithm for document clustering analysis. J Syst Archit. 2006;52(89):505–15.CrossRef
112.
go back to reference Cui X, Charles JS, Potok T. GPU enhanced parallel computing for large scale data clustering. Future Gener Comp Syst. 2013;29(7):1736–41.CrossRef Cui X, Charles JS, Potok T. GPU enhanced parallel computing for large scale data clustering. Future Gener Comp Syst. 2013;29(7):1736–41.CrossRef
113.
go back to reference Feldman D, Schmidt M, Sohler C. Turning big data into tiny data: constant-size coresets for k-means, pca and projective clustering. In: Proceedings of the ACM-SIAM symposium on discrete algorithms; 2013. pp. 1434–1453. Feldman D, Schmidt M, Sohler C. Turning big data into tiny data: constant-size coresets for k-means, pca and projective clustering. In: Proceedings of the ACM-SIAM symposium on discrete algorithms; 2013. pp. 1434–1453.
114.
go back to reference Tekin C, van der Schaar M. Distributed online big data classification using context information. In: Proceedings of the Allerton conference on communication, control, and computing; 2013. pp. 1435–1442. Tekin C, van der Schaar M. Distributed online big data classification using context information. In: Proceedings of the Allerton conference on communication, control, and computing; 2013. pp. 1435–1442.
116.
go back to reference Lin MY, Lee PY, Hsueh SC. Apriori-based frequent itemset mining algorithms on mapreduce. In: Proceedings of the international conference on ubiquitous information management and communication; 2012. pp. 76:1–76:8. Lin MY, Lee PY, Hsueh SC. Apriori-based frequent itemset mining algorithms on mapreduce. In: Proceedings of the international conference on ubiquitous information management and communication; 2012. pp. 76:1–76:8.
117.
go back to reference Riondato M, DeBrabant JA, Fonseca R, Upfal E. PARMA: a parallel randomized algorithm for approximate association rules mining in mapreduce. In: Proceedings of the ACM international conference on information and knowledge management; 2012. pp. 85–94. Riondato M, DeBrabant JA, Fonseca R, Upfal E. PARMA: a parallel randomized algorithm for approximate association rules mining in mapreduce. In: Proceedings of the ACM international conference on information and knowledge management; 2012. pp. 85–94.
118.
go back to reference Leung CS, MacKinnon R, Jiang F. Reducing the search space for big data mining for interesting patterns from uncertain data. In: Proceedings of the international congress on big data; 2014. pp. 315–322. Leung CS, MacKinnon R, Jiang F. Reducing the search space for big data mining for interesting patterns from uncertain data. In: Proceedings of the international congress on big data; 2014. pp. 315–322.
119.
go back to reference Yang L, Shi Z, Xu L, Liang F, Kirsh I. DH-TRIE frequent pattern mining on hadoop using JPA. In: Proceedings of the international conference on granular computing; 2011. pp. 875–878. Yang L, Shi Z, Xu L, Liang F, Kirsh I. DH-TRIE frequent pattern mining on hadoop using JPA. In: Proceedings of the international conference on granular computing; 2011. pp. 875–878.
120.
go back to reference Huang JW, Lin SC, Chen MS. DPSP: Distributed progressive sequential pattern mining on the cloud. In: Proceedings of the advances in knowledge discovery and data mining, vol. 6119; 2010. pp. 27–34. Huang JW, Lin SC, Chen MS. DPSP: Distributed progressive sequential pattern mining on the cloud. In: Proceedings of the advances in knowledge discovery and data mining, vol. 6119; 2010. pp. 27–34.
121.
go back to reference Paz CE. A survey of parallel genetic algorithms. Calc Paralleles Reseaux et Syst Repar. 1998;10(2):141–71. Paz CE. A survey of parallel genetic algorithms. Calc Paralleles Reseaux et Syst Repar. 1998;10(2):141–71.
122.
go back to reference Kranthi Kiran B, Babu AV. A comparative study of issues in big data clustering algorithm with constraint based genetic algorithm for associative clustering. Int J Innov Res Comp Commun Eng. 2014;2(8):5423–32. Kranthi Kiran B, Babu AV. A comparative study of issues in big data clustering algorithm with constraint based genetic algorithm for associative clustering. Int J Innov Res Comp Commun Eng. 2014;2(8):5423–32.
124.
go back to reference Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G. Pregel: A system for large-scale graph processing. In: Proceedings of the ACM SIGMOD international conference on management of data; 2010. pp. 135–146. Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G. Pregel: A system for large-scale graph processing. In: Proceedings of the ACM SIGMOD international conference on management of data; 2010. pp. 135–146.
125.
go back to reference Hasan S, Shamsuddin S, Lopes N. Soft computing methods for big data problems. In: Proceedings of the symposium on GPU computing and applications; 2013. pp. 235–247. Hasan S, Shamsuddin S, Lopes N. Soft computing methods for big data problems. In: Proceedings of the symposium on GPU computing and applications; 2013. pp. 235–247.
126.
go back to reference Ku-Mahamud KR. Big data clustering using grid computing and ant-based algorithm. In: Proceedings of the international conference on computing and informatics; 2013. pp. 6–14. Ku-Mahamud KR. Big data clustering using grid computing and ant-based algorithm. In: Proceedings of the international conference on computing and informatics; 2013. pp. 6–14.
127.
go back to reference Deneubourg JL, Goss S, Franks N, Sendova-Franks A, Detrain C, Chrétien L. The dynamics of collective sorting robot-like ants and ant-like robots. In: Proceedings of the international conference on simulation of adaptive behavior on from animals to animats; 1990. pp. 356–363. Deneubourg JL, Goss S, Franks N, Sendova-Franks A, Detrain C, Chrétien L. The dynamics of collective sorting robot-like ants and ant-like robots. In: Proceedings of the international conference on simulation of adaptive behavior on from animals to animats; 1990. pp. 356–363.
133.
go back to reference Cooper BF, Silberstein A, Tam E, Ramakrishnan R, Sears R. Benchmarking cloud serving systems with YCSB. In: Proceedings of the ACM symposium on cloud computing; 2010. pp. 143–154. Cooper BF, Silberstein A, Tam E, Ramakrishnan R, Sears R. Benchmarking cloud serving systems with YCSB. In: Proceedings of the ACM symposium on cloud computing; 2010. pp. 143–154.
134.
go back to reference Ghazal A, Rabl T, Hu M, Raab F, Poess M, Crolotte A, Jacobsen HA. BigBench: towards an industry standard benchmark for big data analytics. In: Proceedings of the ACM SIGMOD international conference on management of data; 2013. pp. 1197–1208. Ghazal A, Rabl T, Hu M, Raab F, Poess M, Crolotte A, Jacobsen HA. BigBench: towards an industry standard benchmark for big data analytics. In: Proceedings of the ACM SIGMOD international conference on management of data; 2013. pp. 1197–1208.
135.
go back to reference Cheptsov A. Hpc in big data age: An evaluation report for java-based data-intensive applications implemented with hadoop and openmpi. In: Proceedings of the European MPI Users’ Group Meeting; 2014. pp. 175:175–175:180. Cheptsov A. Hpc in big data age: An evaluation report for java-based data-intensive applications implemented with hadoop and openmpi. In: Proceedings of the European MPI Users’ Group Meeting; 2014. pp. 175:175–175:180.
136.
go back to reference Yuan LY, Wu L, You JH, Chi Y. Rubato db: A highly scalable staged grid database system for OLTP and big data applications. In: Proceedings of the ACM international conference on conference on information and knowledge management; 2014. pp. 1–10. Yuan LY, Wu L, You JH, Chi Y. Rubato db: A highly scalable staged grid database system for OLTP and big data applications. In: Proceedings of the ACM international conference on conference on information and knowledge management; 2014. pp. 1–10.
137.
go back to reference Zhao JM, Wang WS, Liu X, Chen YF. Big data benchmark—big DS. In: Proceedings of the advancing big data benchmarks; 2014. pp. 49–57. Zhao JM, Wang WS, Liu X, Chen YF. Big data benchmark—big DS. In: Proceedings of the advancing big data benchmarks; 2014. pp. 49–57.
138.
go back to reference Saletore V, Krishnan K, Viswanathan V, Tolentino M. HcBench: Methodology, development, and full-system characterization of a customer usage representative big data/hadoop benchmark. In: Advancing big data benchmarks; 2014. pp. 73–93. Saletore V, Krishnan K, Viswanathan V, Tolentino M. HcBench: Methodology, development, and full-system characterization of a customer usage representative big data/hadoop benchmark. In: Advancing big data benchmarks; 2014. pp. 73–93.
139.
go back to reference Zhang L, Stoffel A, Behrisch M, Mittelstadt S, Schreck T, Pompl R, Weber S, Last H, Keim D. Visual analytics for the big data era—a comparative review of state-of-the-art commercial systems. In: Proceedings of the IEEE conference on visual analytics science and technology; 2012. pp. 173–182. Zhang L, Stoffel A, Behrisch M, Mittelstadt S, Schreck T, Pompl R, Weber S, Last H, Keim D. Visual analytics for the big data era—a comparative review of state-of-the-art commercial systems. In: Proceedings of the IEEE conference on visual analytics science and technology; 2012. pp. 173–182.
140.
go back to reference Harati A, Lopez S, Obeid I, Picone J, Jacobson M, Tobochnik S. The TUH EEG CORPUS: A big data resource for automated EEG interpretation. In: Proceeding of the IEEE signal processing in medicine and biology symposium; 2014. pp. 1–5. Harati A, Lopez S, Obeid I, Picone J, Jacobson M, Tobochnik S. The TUH EEG CORPUS: A big data resource for automated EEG interpretation. In: Proceeding of the IEEE signal processing in medicine and biology symposium; 2014. pp. 1–5.
141.
go back to reference Thusoo A, Sarma JS, Jain N, Shao Z, Chakka P, Anthony S, Liu H, Wyckoff P, Murthy R. Hive: a warehousing solution over a map-reduce framework. Proc VLDB Endow. 2009;2(2):1626–9.CrossRef Thusoo A, Sarma JS, Jain N, Shao Z, Chakka P, Anthony S, Liu H, Wyckoff P, Murthy R. Hive: a warehousing solution over a map-reduce framework. Proc VLDB Endow. 2009;2(2):1626–9.CrossRef
143.
go back to reference Januzaj E, Kriegel HP, Pfeifle M. DBDC: Density based distributed clustering. In: Proceedings of the advances in database technology, vol. 2992; 2004. pp. 88–105. Januzaj E, Kriegel HP, Pfeifle M. DBDC: Density based distributed clustering. In: Proceedings of the advances in database technology, vol. 2992; 2004. pp. 88–105.
144.
go back to reference Zhao W, Ma H, He Q. Parallel k-means clustering based on mapreduce. Proc Cloud Comp. 2009;5931:674–9.CrossRef Zhao W, Ma H, He Q. Parallel k-means clustering based on mapreduce. Proc Cloud Comp. 2009;5931:674–9.CrossRef
145.
go back to reference Nolan RL. Managing the crises in data processing. Harvard Bus Rev. 1979;57(1):115–26. Nolan RL. Managing the crises in data processing. Harvard Bus Rev. 1979;57(1):115–26.
146.
go back to reference Tsai CW, Huang WC, Chiang MC. Recent development of metaheuristics for clustering. In: Proceedings of the mobile, ubiquitous, and intelligent computing, vol. 274. 2014; pp. 629–636. Tsai CW, Huang WC, Chiang MC. Recent development of metaheuristics for clustering. In: Proceedings of the mobile, ubiquitous, and intelligent computing, vol. 274. 2014; pp. 629–636.
Metadata
Title
Big Data Analytics
Authors
Chun-Wei Tsai
Chin-Feng Lai
Han-Chieh Chao
Athanasios V. Vasilakos
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-44550-2_2
