
2024 | Book


Development Methodologies for Big Data Analytics Systems

Plan-driven, Agile, Hybrid, Lightweight Approaches

Edited by: Manuel Mora, Fen Wang, Jorge Marx Gomez, Hector Duran-Limon

Publisher: Springer International Publishing

Book series: Transactions on Computational Science and Computational Intelligence

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"


About this book

This book presents research on big data analytics (BDA) for businesses of all sizes. The authors analyze problems that arise when applying BDA in business by studying development methodologies based on three approaches: 1) plan-driven, 2) agile, and 3) hybrid lightweight. The authors first describe BDA systems and how they emerged from the convergence of Statistics, Computer Science, and Business Intelligent Analytics, with the practical aim of providing the concepts, models, methods, and tools required to exploit the wide variety, volume, and velocity of available internal and external business data (i.e., Big Data) and to deliver decision-making value to decision-makers. The book presents high-quality conceptual and empirical research-oriented chapters on plan-driven, agile, and hybrid lightweight development methodologies and relevant supporting topics for BDA systems suitable for large, medium-sized, and small business organizations.

Table of Contents

Frontmatter

Open Source IT for Delivering Big Data Analytics Systems as Services: A Selective Review

Abstract

Advanced IT is used to develop and deliver big data analytics systems (BDAS) as services. Over the last decade, proprietary and open source IT has become available to organizations interested in providing IT services classified as BDAS as a service (BDASaaS), as a platform (BDASaaP), or as an infrastructure (BDASaaI). For small and medium-sized organizations, in both business and academic settings, technological and economic barriers usually impede the implementation of proprietary IT, and, consequently, open source IT is preferred. Nevertheless, there is a vast variety of open source IT for BDAS services, and selecting and integrating it adequately is not straightforward. This chapter therefore reports a descriptive review of the main open source IT for developing and delivering BDASaaS, BDASaaP, and BDASaaI capabilities from a high-level architectural perspective. The review is theoretically framed using a generic BDAS pipeline derived from the main BDAS literature and the new NIST Big Data Reference Architecture (NBDRA). It thus provides theoretical and practical insights for implementing BDAS services.

Manuel Mora, Paola Yuritzy Reyes-Delgado, Sergio Galvan-Cruz, Lizeth I. Solano-Romo

The Role of Machine Learning in Big Data Analytics: Current Practices and Challenges

Abstract

A massive amount of data is generated at an ever-increasing rate. Social media, mobile phones, sensors, and medical imaging, among others, are examples of data sources. An important characteristic of the data generated by these sources is that it is commonly either unstructured or semi-structured. Big data analytics comprises software systems that are able to analyze vast amounts of data to uncover information, such as patterns and correlations, that helps decision-makers make better decisions. Traditional approaches such as data warehousing and the use of a classic relational database management system (RDBMS) have become impractical for analyzing such unstructured and semi-structured data. Machine learning (ML) algorithms, on the other hand, have proven successful in analyzing such vast amounts of data. In this chapter, we present some of the most widely used ML algorithms in big data analytics as well as the distributed platforms typically employed for processing the data. We also present three important application domains where ML algorithms have been applied to perform big data analytics: healthcare, weather forecasting, and social networking. Finally, we review relevant approaches used in each domain, the most commonly used ML algorithms per domain, and domain-specific issues that need further research in big data analytics.

Hector A. Duran-Limon, Arturo Chavoya, Martha Hernández-Ochoa

The Data Value Chain Ontology

Abstract

When it comes to decision-making, companies are often unaware of how valuable data is. As a result, data is underutilized. Before data can be used for data-driven decision-making, users of data analytics must define their needs. Users know the difficult decision processes, problems, and questions particular to their domain. However, they often lack the skills necessary to collect, analyze, and understand data. Data scientists, on the other hand, know how to prepare and examine data, yet they do not know the value of their work for the application domain. A data product is an answer to an important question for users that data scientists can provide using data analysis. However, this concept does not help much in connecting the specific problems of data science with those of the application domain in order to optimize the data value chain of an organization. For this reason, this paper develops a data value chain ontology from a literature review.

Dirk Bendlin, Jorge Marx Gómez, H. Kaddoura, A. Kucewicz, M. Werther Häckell

Requirements for Machine Learning Methodology Software Tooling

Abstract

A number of machine learning process models (SEMMA, KDD, CRISP-DM, CRISP-ML(Q), Data-to-Value, etc.) have recently been proposed to facilitate the development of machine learning models in their organizational context. While the existing proposals vary with respect to complexity and suitability for particular tasks, it would be desirable to have software tools that embody and support these methodologies and make it easier for project teams to capture, share among team members and stakeholders, and preserve the relevant project information pertaining to the various process stages. Various existing software systems cover parts such as team and communication management (Confluence, Jira, Slack, Zoom, etc.), project management (Scrum, Kanban, etc.), data and information management (Model Management Platform, cf. (Weber and Hirmer, Business Information Systems. Springer International Publishing, Cham, 2020), inter alia), or experimentation (RapidMiner, Orange, Weka, TensorFlow, etc.), but we are not aware of any management tools that tie them together and ensure methodology compliance. To the best of our knowledge, no requirements analysis exists to date for a system that guides teams in following a machine learning methodology and manages all of a project’s metadata throughout its entire life cycle. To this end, we present an analysis and a resulting set of 29 requirements for software tooling for machine learning methodologies, derived from properties of the methodologies, user stories, and introspection of the authors.

Jochen L. Leidner, Michael Reiche

A Selective Conceptual Review of CRISP-DM and DDSL Development Methodologies for Big Data Analytics Systems

Abstract

Big data analytics systems (BDAS) have emerged through the convergence of analytics techniques and the availability of sources of massive data, internal and external to the organization, with descriptive, predictive, or prescriptive purposes. BDAS are relevant software systems pursued in diverse application domains such as marketing, healthcare, finance, manufacturing, logistics, education, and tourism, among others. However, although BDAS are modern software systems, their development has been conducted mainly using either ad hoc practical guidelines or old rigor-oriented heavyweight methodologies. The competitive business environment currently demands modern (i.e., lightweight or agile) BDAS development methodologies. However, although some new BDAS development methodologies have been proposed, studies contrasting rigor-oriented vs. lightweight BDAS development methodologies are still scarce in the literature. In this chapter, we address this knowledge gap. Using the ISO/IEC 29110 standard (Basic profile) as the theoretically expected lightweight development process, we report a selective conceptual review comparing CRISP-DM, the main rigor-oriented BDAS methodology, and DDSL, a new relevant proprietary lightweight one. Our selective comparative review provides theoretical and practical insights for distinguishing between the two BDAS development approaches, useful for researchers and practitioners in the domain of BDAS development projects.

David Montoya-Murillo, Manuel Mora, Sergio Galvan-Cruz, Angel Muñoz-Zavala

A Selective Comparative Review of CRISP-DM and TDSP Development Methodologies for Big Data Analytics Systems

Abstract

Big data analytics systems (BDAS) are modern software systems with descriptive, predictive, or prescriptive purposes developed by current organizations. BDAS are viable due to the convergence of analytics techniques and the availability of sources of massive data, internal and external to the organization. BDAS are developed in a variety of application domains such as marketing, healthcare, finance, manufacturing, logistics, education, and tourism, among others. However, although BDAS are modern software systems, organizations have used trial-and-error practical guidelines or old rigor-oriented heavyweight methodologies (a.k.a. plan-driven ones). The competitive business environment currently demands modern (i.e., lightweight or agile) BDAS development methodologies, and in recent years, the first modern methodologies have been proposed. However, studies contrasting rigor-oriented vs. lightweight or agile BDAS development methodologies are still scarce in the literature. In this chapter, we address this knowledge gap and report a comparative review of CRISP-DM, the main rigor-oriented BDAS methodology, and the Team Data Science Process (TDSP), a new relevant proprietary agile one, using a Scrum-XP workflow of practices as the theoretical agile development framework. Our comparative review provides theoretical and practical insights for distinguishing between the two BDAS development approaches, useful for researchers and practitioners in the BDAS development domain.

Gerardo Salazar-Salazar, Manuel Mora, Hector A. Duran-Limon, Francisco Javier Álvarez Rodríguez

BDAS-EPM: An Integrated Evolution Process Model for Big Data Analytics Systems

Abstract

This chapter reports the results of a review and synthesis of selected literature on the evolution of big data analytics (BDA) systems, from theory to practice. It addresses four research questions: the main concepts and their evolution (RQ.1), the most relevant frameworks (RQ.2), the main application domains reported (RQ.3), and the main trends and challenges for effective decisional support with BDA systems (RQ.4). A selective literature review methodology is used to assess BDA systems techniques, frameworks, and emerging trends, with the aim of providing a summary of core concepts, a succinct but valuable description, and an account of how to address big data challenges and enhance big data opportunities. Based on the results reviewed and synthesized, this chapter presents a big data analytics systems evolution process model (BDAS-EPM). The BDAS-EPM is an integrated and organized view of BDA systems and techniques, which organizations can adopt to make big data more appropriate and more useful in achieving their goals and objectives. Centered on the BDAS-EPM, the chapter offers a set of practical recommendations for data scientists and data architects, as well as for executives and leaders in organizations, in their strategic and operational pursuit of innovative advancement and competitive edge.

Fen Wang, Tiko Iyamu, Gloria Phillips-Wren, Jeffrey Yi-Lin Forrest

Big Data Adoption Factors and Development Methodologies: A Multiple Case Study Analysis

Abstract

Data is one of the most valuable resources in any organization. Big data primarily provides access to the data’s often untapped potential. This research has two main focal areas. First, we strive to identify critical factors affecting the adoption of big data (BD) in organizations using interpretative phenomenological analysis (IPA) and the technology-organization-environment (TOE) framework. Factors affecting BD adoption in this research include finding the appropriate use case to extract value, the challenge of security, the challenge of managing large datasets, privacy concerns, cost concerns, the burden of regulation, and the challenge of finding big data IT expertise. Second, we explore the organizations’ BD development methodologies using the IPA methodology to examine their successes and challenges. Most organizations examined in this research use agile with medium-sized teams. The agile development methodology enabled them to achieve rapid development, continuous improvement, increased stakeholder participation, and the ability to develop with incomplete big data expertise. It also poses some challenges, including repeated conflict, feature interaction regressions, divergence of development paths, and longer development cycles due to experimentation.

Ahmad B. Alnafoosi, Olayele Adelakun

Detection of Breast Cancer in Mammography Using Pretrained Convolutional Neural Networks with Fine-Tuning

Abstract

Breast cancer is a major health concern for women, especially in Latin America where the incidence and mortality rates are high. Mammography is an essential diagnostic tool in detecting breast cancer, but interpreting mammogram images can be challenging due to their complex nature. To assist radiologists in identifying abnormalities in mammogram images, deep learning algorithms, specifically deep convolutional neural networks (DCNNs), are being employed. This chapter explores the effectiveness of several pretrained DCNN models, such as ResNet-50, ResNet-152, VGG19, and EfficientNet-B7, in classifying mammogram images.

To ensure reliable results, the Mini-MIAS and CBIS-DDSM datasets, consisting of 334 and 2620 scanned film mammography images, respectively, were selected for this study. The images were categorized into binary classification and multiclassification groups based on the severity of the lesion. For both datasets, the same preprocessing approach was used to enhance image quality. This involved normalizing the images and applying contrast limited adaptive histogram equalization (CLAHE). The efficacy of the preprocessing techniques was evaluated by comparing the performance of the models on the entire dataset and just the normalized images. Four different stages were tested using images from both datasets, and the performance of each model was evaluated using five metrics, namely, accuracy, precision, recall, F1-score, and area under the ROC curve (AUC).
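As an illustration of the five evaluation metrics named above, the following is a minimal Python sketch for the binary case. It is not the chapter authors' code: the function names (`binary_metrics`, `roc_auc`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example, and AUC is computed with the rank-based (Mann-Whitney) formulation rather than any specific library routine.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against empty denominators by reporting 0.0 in that case.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative one;
    ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not taken from the study).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC is computed from the raw scores and therefore summarizes ranking quality across all thresholds.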

Cesar Muñoz-Chavez, Hermilo Sánchez-Cruz, Humberto Sossa-Azuela, Julio Ponce-Gallegos

Challenges and Opportunities of Intercompany Big Data Analytics in Supply Chains

Abstract

This chapter investigates the possible challenges and opportunities of big data analytics between supply chain participants. It addresses the question of which barriers prevent companies from using intercompany big data analytics. In addition, it investigates which potential benefits such analysis can bring to supply chain members and how these benefits influence the relationship between the companies. The chapter presents existing data exchange options and elaborates on the benefits that intercompany data exchange might offer. After describing these forms of cooperation, it discusses why companies are willing to exchange data in these cases but not on a regular basis. Following this discussion, the chapter shows possible concepts, applications, and solutions to overcome the identified lack of intercompany data exchange and analysis. The chapter closes with a discussion of the findings.

J. Kallisch, Jorge Marx-Gómez, C. Wunck

From Big Data to Big Insights: A Synthesis of Real-World Applications of Big Data Analytics

Abstract

Constant change and advancement in technology have drastically improved real-world applications in diverse industries such as healthcare, finance, education, sports, retail, and manufacturing, among others. Several domains such as marketing and logistics are using big data to make better decisions and gain competitive advantage (Du et al., Gen Hosp Psychiat 67: 144, 2020; Idemudia et al., Using information technology advancements to adapt to global pandemics. IGI Global, 2022; Handbook of research on IT applications for strategic competitive advantage and decision making. IGI Global, Hershey, 2020). Through a selective review and synthesis of real-world applications, this book chapter provides insights into how different companies and industries are using big data analytics to facilitate more effective and efficient organizational decision-making in this new era. The chapter also discusses the implications for theory and practice in the big data age, including recommendations and ethical considerations (Loebbecke and Galliers, CAIS 49(1): 22, 2021) for real-world applications of big data analytics.

Mahesh S. Raisinghani, Efosa C. Idemudia, Fen Wang

Backmatter

Title: Development Methodologies for Big Data Analytics Systems
Edited by: Manuel Mora, Fen Wang, Jorge Marx Gomez, Hector Duran-Limon
Publisher: Springer International Publishing
Electronic ISBN: 978-3-031-40956-1
Print ISBN: 978-3-031-40955-4
DOI: https://doi.org/10.1007/978-3-031-40956-1