About this book

This book is inspired by the Machine Learning Model Building Process Flow, which gives the reader the ability to understand an ML algorithm and apply the entire process of building an ML model from raw data.

This new paradigm of teaching Machine Learning will bring about a radical change in perception for many of those who think this subject is difficult to learn. Though the theory sometimes looks difficult, especially when heavy mathematics is involved, the seamless flow from theoretical aspects to example-driven learning provided in this book makes it easy to connect the dots.

For every Machine Learning algorithm covered in this book, a 3-D approach of theory, case study, and practice is given. Where appropriate, the mathematics is explained through visualization in R.

All practical demonstrations are explored in R, a powerful programming language and software environment for statistical computing and graphics. The various packages and methods available in R are used to explain the topics. By the end, readers will have learned about some of the latest technological advancements in building a scalable machine learning model with Big Data.

Who This Book is For:

This book is for data scientists, data science professionals, and researchers in academia who want to understand the nuances of machine learning approaches and algorithms and see them in practice using R. It will also benefit readers who want to understand the technology behind implementing a scalable machine learning model using Apache Hadoop, Hive, Pig, and Spark.

What you will learn:

1. ML model building process flow
2. Theoretical aspects of Machine Learning
3. Industry-based case study
4. Example-based understanding of ML algorithms using R
5. Building ML models using Apache Hadoop and Spark

Table of Contents

Frontmatter

Chapter 1. Introduction to Machine Learning and R

Machine learning has played a pivotal role in transforming statistics into a more accessible subject by showing its applications to real-world problems. However, many statisticians probably won't agree that machine learning gave life to statistics, which gives rise to never-ending chicken-and-egg discussions.

Karthik Ramasubramanian, Abhishek Singh

Chapter 2. Data Preparation and Exploration

As we emphasized in our introductory chapter on applying machine learning (ML) algorithms with a simplified process flow, this chapter goes deeper into the first block of the machine learning process flow: data exploration and preparation.

Karthik Ramasubramanian, Abhishek Singh

Chapter 3. Sampling and Resampling Techniques

Sampling is an important block in our machine learning process flow and it serves the dual purpose of cost savings in data collection and reduction in computational cost without compromising the power of the machine learning model.

Karthik Ramasubramanian, Abhishek Singh

Chapter 4. Data Visualization in R

Information visualization is the broadest term that could be taken to subsume all the developments described here. At this level, almost anything, if sufficiently organized, is information of a sort. Tables, graphs, maps, and even text, whether static or dynamic, provide some means to see what lies within, determine the answer to a question, find relations, and perhaps apprehend things which could not be seen so readily in other forms.

Karthik Ramasubramanian, Abhishek Singh

Chapter 5. Feature Engineering

In machine learning, feature engineering is a blanket term covering both the statistical and the business judgment aspects of modeling real-world problems. Feature engineering is a recently coined term that gives due importance to the domain knowledge required to select sets of features for machine learning algorithms. This reliance on domain knowledge is one of the reasons most machine learning professionals call it an informal process. In this chapter, we provide an easy-to-use guide to the key terms and methodology used in feature engineering. The chapter gives due weight to domain knowledge and to some common business limitations encountered when using machine learning algorithms to solve business problems.

Karthik Ramasubramanian, Abhishek Singh

Chapter 6. Machine Learning Theory and Practices

The world is quickly adopting Machine Learning (ML). Whether it's driverless cars, intelligent personal assistants, or machines playing games like Go and Jeopardy against humans, ML is pervasive.

Karthik Ramasubramanian, Abhishek Singh

Chapter 7. Machine Learning Model Evaluation

In many cases, we may even discard the complete model based on the performance metrics. This phase of the PEBE plays a critical role in the success of any ML-based project.

Karthik Ramasubramanian, Abhishek Singh

Chapter 8. Model Performance Improvement

Model performance is a broad term generally used to measure how the model performs on a new dataset, usually a test dataset. The performance metrics also play the role of thresholds to decide whether the model can be put into actual decision making systems or needs improvements. In the previous chapter, we discussed some performance metrics for our continuous and discrete cases. In this chapter, we will discuss how changing the modeling process can help us improve model performance on the metrics.

Karthik Ramasubramanian, Abhishek Singh

Chapter 9. Scalable Machine Learning and Related Technologies

A few years back, you would not have heard the word "scalable" in machine learning parlance. This was mainly due to the lack of infrastructure, data, and real-world applications. Machine learning was discussed mostly in the academic research community or in well-funded industry research labs. A prototype of any real-world application using machine learning was considered a big feat and a demonstration of breakthrough research. However, times have changed with the availability of powerful commodity hardware at reduced cost and the widespread adoption of big data technology. As a result, data has become easily accessible and software development is becoming more and more data savvy. Every single byte of data is being captured, even if its use is not clear in the near future.

Karthik Ramasubramanian, Abhishek Singh

Backmatter
