research-article

Mining big data: current status, and forecast to the future

Published: 30 April 2013

Abstract

Big Data is a new term used to identify datasets that we cannot manage with current methodologies or data mining software tools due to their large size and complexity. Big Data mining is the capability of extracting useful information from these large datasets or streams of data. New mining techniques are necessary due to the volume, variability, and velocity of such data. The Big Data challenge is becoming one of the most exciting opportunities for the years to come. In this issue, we present a broad overview of the topic, its current status, controversy, and a forecast to the future. We introduce four articles, written by influential scientists in the field, covering the most interesting and state-of-the-art topics in Big Data mining.


Published in

ACM SIGKDD Explorations Newsletter, Volume 14, Issue 2
December 2012
81 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/2481244

Copyright © 2013 Authors

Publisher: Association for Computing Machinery, New York, NY, United States
