ABSTRACT
As Big Data becomes better understood, there is a need for a comprehensive definition of Big Data to support work in fields such as data quality for Big Data. Existing definitions take one of three approaches: they define Big Data by comparison with existing, usually relational, definitions; they define it in terms of data characteristics; or they combine data characteristics with the Big Data environment. In this paper we examine existing definitions of Big Data and discuss the strengths and limitations of the different approaches, with particular reference to issues related to data quality in Big Data. We identify the issues presented by incomplete or inconsistent definitions. We then propose an alternative definition and relate it to our work on quality in Big Data.