research-article

Mining big data: current status, and forecast to the future

Authors:
Wei Fan
Huawei Noah's Ark Lab, Hong Kong Science Park, Shatin, Hong Kong

Albert Bifet
Yahoo! Research Barcelona, Barcelona, Catalonia, Spain

ACM SIGKDD Explorations Newsletter, Volume 14, Issue 2, December 2012, pp. 1–5. https://doi.org/10.1145/2481244.2481246

Published: 30 April 2013

Abstract

Big Data is a new term used to identify datasets that we cannot manage with current methodologies or data mining software tools because of their large size and complexity. Big Data mining is the capability of extracting useful information from these large datasets or streams of data. New mining techniques are necessary because of the volume, variability, and velocity of such data. The Big Data challenge is becoming one of the most exciting opportunities for the years to come. In this issue, we present a broad overview of the topic, its current status, controversy, and a forecast to the future. We introduce four articles, written by influential scientists in the field, covering the most interesting and state-of-the-art topics in Big Data mining.

Index Terms

Mining big data: current status, and forecast to the future
1. Information systems
  1. Information retrieval
    1. Document representation
  2. Information systems applications
    1. Data mining

Published in

ACM SIGKDD Explorations Newsletter Volume 14, Issue 2
December 2012
81 pages
ISSN:1931-0145
EISSN:1931-0153
DOI:10.1145/2481244

Copyright © 2013 Authors
Publisher
Association for Computing Machinery
New York, NY, United States

Article Metrics
- Total Citations: 506
- Total Downloads: 11,404
- Downloads (last 12 months): 218
- Downloads (last 6 weeks): 33
