Abstract
Big Data is a new term used to identify datasets that, because of their large size and complexity, cannot be managed with current methodologies or data mining software tools. Big Data mining is the capability of extracting useful information from these large datasets or streams of data. New mining techniques are necessary because of the volume, variability, and velocity of such data. The Big Data challenge is becoming one of the most exciting opportunities for the years to come. In this issue, we present a broad overview of the topic, its current status, the controversy surrounding it, and a forecast for the future. We introduce four articles, written by influential scientists in the field, that cover the most interesting and state-of-the-art topics in Big Data mining.
Index Terms
- Mining big data: current status, and forecast to the future