About This Book

Know how to do machine learning with Microsoft technologies. This book teaches you to do predictive, descriptive, and prescriptive analyses with Microsoft Power BI, Azure Data Lake, SQL Server, Stream Analytics, Azure Databricks, HDInsight, and more.

The ability to analyze massive amounts of real-time data and predict future behavior of an organization is critical to its long-term success. Data science, and more specifically machine learning (ML), is today’s game changer and should be a key building block in every company’s strategy. Managing a machine learning process in each tool, from business understanding through data acquisition and cleaning to modeling and deployment, is a valuable skill set.

Machine Learning with Microsoft Technologies is a demo-driven book that explains how to do machine learning with Microsoft technologies. You will gain valuable insight into designing the best architecture for development, sharing, and deploying a machine learning solution. This book simplifies the process of choosing the right architecture and tools for doing machine learning based on your specific infrastructure needs and requirements.

Detailed content is provided on the main algorithms for supervised and unsupervised machine learning, and examples show ML practices using both R and Python, the main languages inside Microsoft technologies.

What You'll Learn

Choose the right Microsoft product for your machine learning solution
Create and manage Microsoft’s tool environments for development, testing, and production of a machine learning project
Implement and deploy supervised and unsupervised learning in Microsoft products
Set up Microsoft Power BI, Azure Data Lake, SQL Server, Stream Analytics, Azure Databricks, and HDInsight to perform machine learning
Set up a data science virtual machine and test-drive installed tools, such as Azure ML Workbench, Azure ML Server Developer, Anaconda Python, Jupyter Notebook, Power BI Desktop, Cognitive Services, machine learning and data analytics tools, and more
Architect a machine learning solution factoring in all aspects of self service, enterprise, deployment, and sharing

Who This Book Is For

Data scientists, data analysts, developers, architects, and managers who want to leverage machine learning in their products, organization, and services, and make educated, cost-saving decisions about their ML architecture and tool set.

Table of Contents

Frontmatter

Getting Started

Frontmatter

Chapter 1. Introduction to Machine Learning

Abstract
Machine learning allows decision makers to gain more insight from their data. Today, the application of machine learning is no longer limited to research and specific industries. In most fields, there is a valuable opportunity to use machine learning to obtain more concise and in-depth information from available data. As a result, most big software companies give their users access to machine learning via easy-to-use software. For example, Microsoft, a pioneer in developing business software, leverages machine learning in developing products such as the Bing search engine, Xbox, Kinect, and others. The use of machine learning in Microsoft is not limited to the production of new software. In many of Microsoft’s software development tools, such as Microsoft SQL Server, Power BI, and .NET, there is an opportunity to use machine learning to create smarter applications and reports.
Leila Etaati

Chapter 2. Introduction to R

Abstract
R is undoubtedly one of the most popular languages for machine learning. It is a programming language and free software environment used mainly for statistical computing and data visualization. R has been used by academics, data scientists, and statisticians for a long time. It is a statistical language, which is excellent for machine learning, statistics, and use as a visualization tool. There is an integration between Microsoft technologies and the R language that enhances the capability of machine learning in Microsoft applications and reports. R is an open source language that is available for the Windows and Mac operating systems. It can be extended via packages [1]. This chapter provides an overview of installing RStudio and of extending R's capabilities by installing packages. R data structures, machine learning, and statistical analysis and visualization with R are also explained.
Leila Etaati

Chapter 3. Introduction to Python

Abstract
Python is one of the main languages used for performing machine learning. It is a multi-purpose language that has been leveraged for device programming, object-oriented programming, machine learning, and so forth. In this chapter, you will learn
Leila Etaati

Chapter 4. R Visualization in Power BI

Abstract
Power BI is self-service business intelligence (BI) software. This tool can be used for data visualization, data cleaning, modeling, analysis, and collaboration at enterprise scale. Many books and blogs have been published about how to use Power BI. In this chapter, I am going to show how we can leverage R to create better visualizations and get additional value from Power BI. I will explain how to set up R within Power BI, how to set up the Power BI report environment for writing R code, and how to draw charts in Power BI using R scripts.
Leila Etaati

Machine Learning with R and Power BI

Frontmatter

Chapter 5. Business Understanding

Abstract
Business understanding is the first and most important step in undertaking machine learning on any platform or in any language. Not all business problems can be addressed by machine learning approaches. There are some basic categories into which machine learning falls, including supervised learning and unsupervised learning.
Leila Etaati

Chapter 6. Data Wrangling for Predictive Analysis

Abstract
In the machine learning process, after business understanding, the next step is collecting the right data, feature selection, and data wrangling. Data wrangling includes data cleaning, joining different data sources, quality control, data integration, data transformation, and data reduction processes (Figure 6-1).
Leila Etaati
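
As an illustration of the wrangling steps named in this abstract (cleaning, joining data sources, transformation, and reduction), here is a minimal sketch in plain Python. It is not taken from the book; the sample records, field names, and helper functions are all hypothetical, and mean imputation is only one of several ways to handle missing values.

```python
# Hypothetical sample data; none of these names or values come from the book.
customers = [
    {"id": 1, "name": " Ada ", "age": "36"},
    {"id": 2, "name": "Grace", "age": None},   # missing value
    {"id": 2, "name": "Grace", "age": None},   # duplicate row
    {"id": 3, "name": "Alan", "age": "41"},
]
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 3, "amount": 80.0},
    {"customer_id": 3, "amount": 20.0},
]


def clean(rows):
    """Drop rows with a repeated id, trim names, and convert ages to int."""
    seen, cleaned = set(), []
    for row in rows:
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        age = int(row["age"]) if row["age"] is not None else None
        cleaned.append({"id": row["id"], "name": row["name"].strip(), "age": age})
    return cleaned


def impute_mean_age(rows):
    """Replace missing ages with the mean of the known ages."""
    known = [r["age"] for r in rows if r["age"] is not None]
    mean_age = sum(known) / len(known)
    return [{**r, "age": r["age"] if r["age"] is not None else mean_age} for r in rows]


def join_total_spend(rows, order_rows):
    """Join customers to orders, reducing the orders to one total per customer."""
    totals = {}
    for order in order_rows:
        key = order["customer_id"]
        totals[key] = totals.get(key, 0.0) + order["amount"]
    return [{**r, "total_spend": totals.get(r["id"], 0.0)} for r in rows]


wrangled = join_total_spend(impute_mean_age(clean(customers)), orders)
```

Each step returns a new list rather than modifying its input, so the steps can be reordered or tested one at a time.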

Chapter 7. Predictive Analysis in Power Query with R

Abstract
In this chapter, the process of doing machine learning inside Power BI Query Editor by writing R code will be explained. The main aim here is to provide some examples of how we can use R code for predictive analysis (classification and regression). The concepts and code related to some of the algorithms will be provided. In addition, the process of automating predictions via parameters inside Power BI Query Editor will be discussed.
Leila Etaati
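
To make the two kinds of predictive analysis mentioned in this abstract concrete, the following sketch shows one simple instance of each in plain Python: least-squares line fitting for regression and a one-nearest-neighbor rule for classification. The chapter itself works in R inside Power Query; the algorithms chosen here, the function names, and the sample data are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_value(model, x):
    """Regression: predict a number from a fitted (slope, intercept) pair."""
    slope, intercept = model
    return slope * x + intercept


def predict_label(training, point):
    """Classification: return the label of the closest training example."""
    def squared_distance(example):
        features, _label = example
        return sum((a - b) ** 2 for a, b in zip(features, point))
    return min(training, key=squared_distance)[1]


# Hypothetical data: y grows linearly with x; two labeled groups of points.
line = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
labeled = [((1.0, 1.0), "low"), ((1.5, 0.5), "low"), ((8.0, 9.0), "high")]
```

Regression returns a number and classification returns a category; that difference in output type is what separates the two tasks.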

Chapter 8. Descriptive Analysis in Power Query with R

Abstract
This chapter focuses on descriptive analysis in Power BI. A brief introduction explains how we can use descriptive analysis to help decision making. Next comes a brief introduction to clustering, how clustering is performed in Power BI Report, and how we can do clustering in Power Query Editor. Finally, I will cover how to do market basket analysis in Power BI.
Leila Etaati
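
Market basket analysis, mentioned at the end of this abstract, usually rests on three measures for a rule "A implies B": support, confidence, and lift. The sketch below computes them in plain Python over a hypothetical set of transactions. It is not the book's R code, and the item names are invented.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence relative to how common the consequent is overall."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


bread_milk_support = support({"bread", "milk"}, transactions)
bread_to_milk_confidence = confidence({"bread"}, {"milk"}, transactions)
bread_to_milk_lift = lift({"bread"}, {"milk"}, transactions)
```

A lift above 1 suggests the two items appear together more often than independence would predict; a lift below 1 suggests the opposite.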

Machine Learning SQL Server

Frontmatter

Chapter 9. Using R with SQL Server 2016 and 2017

Abstract
In 2016, Microsoft announced the possibility of writing R code inside SQL Server Management Studio. To write R code in SQL Server 2016, we must first install R Services. In 2017, the ability to write Python code was added in SQL Server 2017. A developer can write R and Python code inside SQL Server Management Studio, using Machine Learning Server and accessing the different R or Python packages. In SQL Server 2017, R Services is replaced by Machine Learning Services, which allows us to embed R or Python code in SQL scripts. In this chapter, the process of setting up SQL Server Management Studio to write R or Python scripts is explained. A brief explanation of some essential packages is also provided. In addition, best practices for creating a model and reusing it for another data set are explained.
Leila Etaati
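
One general way to reuse a trained model on another data set is to serialize it to bytes, store those bytes, and restore them before scoring. The sketch below shows that round trip with Python's standard `pickle` module. It is an assumption-laden illustration, not the chapter's method: the "model" is a deliberately trivial stand-in, the function names are hypothetical, and the step of storing the bytes in a database is not shown.

```python
import pickle


def train_mean_model(values):
    """'Train' a trivial model that only remembers the mean of its training data."""
    return {"mean": sum(values) / len(values)}


def score(model, values):
    """Flag each new value as above (True) or not above (False) the trained mean."""
    return [value > model["mean"] for value in values]


# Train once on one data set.
model = train_mean_model([10, 20, 30])

# Serialize to bytes; these could be kept in a file or a binary column.
blob = pickle.dumps(model)

# Later, restore the model and score a different data set.
restored = pickle.loads(blob)
flags = score(restored, [5, 25, 40])
```

Only unpickle data from a trusted source, because `pickle.loads` can execute arbitrary code embedded in the bytes.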

Chapter 10. Azure Databricks

Abstract
Databricks is an analytics service based on the Apache Spark open source project. Apache Spark is a batch processing and real-time processing environment. Apache Spark is quite popular among data scientists because of its ability to analyze huge amounts of data, its streaming capabilities, graph computation, machine learning, and interactive query engine. Spark provides in-memory cluster computing. One of the popular tools for big data analytics on Spark is Databricks. Databricks has been used for ingesting significant amounts of data, cleaning data, applying machine learning, and so forth. In February 2018, an integration between Microsoft Azure and Databricks was introduced that improves collaboration between data engineers, data scientists, and data analysts. This integration provides data science and data engineering teams with a fast, easy, and collaborative Spark-based platform in Azure [1]. Azure Databricks is a new platform for big data analytics and machine learning. The notebook in Azure Databricks enables data engineers, data scientists, and business analysts to collaborate using a single tool. This chapter gives an overview of what Azure Databricks is, the environment it inhabits, and its use in data science.
Leila Etaati

Machine Learning in Azure

Frontmatter

Chapter 11. R in Azure Data Lake

Abstract
Azure Data Lake Store is one of the components in Microsoft Cloud that helps developers, data scientists, and analysts to store data of any size and shape. Azure Data Lake is optimized for processing large amounts of data. It provides parallel processing with optimum performance. In Azure Data Lake, we can create a hierarchical data folder structure. Because of these capabilities, Azure Data Lake makes it easy for data scientists to apply advanced analytics and machine learning modeling with high scalability and cost-effectiveness. Azure Data Lake Analytics includes U-SQL, which is a language like SQL that enables you to process unstructured data [1]. It is possible to perform machine learning inside Azure Data Lake and explore the Azure Data Lake from RStudio to create models inside the RStudio environment. Moreover, it is possible to get data from Azure Data Lake with a Hive query and to use that data inside Azure Machine Learning. In this chapter, you will see how we can write and work with data, using the U-SQL language with R in Azure Data Lake, and how we can import data from Azure Data Lake to RStudio or import data from RStudio into Azure Data Lake.
Leila Etaati

Chapter 12. Azure Machine Learning Studio

Abstract
Azure Machine Learning (ML) Studio is a cloud machine learning platform. It features a drag-and-drop environment that is easy to use. It contains more than 20 predefined machine learning algorithms. With Azure ML Studio, it is possible to import data from different resources, devise machine learning experiments, and create a web service from the model. Moreover, it is possible to run R or Python code inside the Azure ML Studio environment. In this chapter, I will first explain the environment and how to formulate an experiment in it, how to create a simple machine learning model, how to test and evaluate the model, and how to import data from the local machine and from other Azure components. I will also discuss the process of creating a web service from the model. The process of running R code inside Azure ML Studio will be explored. In addition, the process of exploring an Azure ML experiment in RStudio will be elaborated.
Leila Etaati

Chapter 13. Machine Learning in Azure Stream Analytics

Abstract
Azure Stream Analytics is an event-processing engine that allows users to analyze high volumes of data streaming from devices, sensors, and applications. Azure Stream Analytics can be used for Internet of Things (IoT) real-time analytics, remote monitoring, and data inventory controls. Azure Stream Analytics is also another component in Azure on which we can run machine learning. It is possible to use a machine learning model API created in Azure ML Studio inside Azure Stream Analytics for applying machine learning to streaming data from sensors, applications, and live databases. In this chapter, I will explain how to use machine learning inside Azure Stream Analytics. First, a general introduction to Azure Stream Analytics is given. Then, a simple example of an Azure ML Studio API applied to the stream data is presented.
Leila Etaati

Chapter 14. Azure Machine Learning (ML) Workbench

Abstract
Azure ML Workbench is another tool introduced by Microsoft in 2017. Azure Machine Learning services (preview) integrate end-to-end data science with advanced analytics tools. They help professional data scientists prepare data, develop experiments, and deploy models at cloud scale [1]. This chapter first provides a brief introduction to Azure ML Workbench and then compares Azure ML Studio with Azure ML Workbench. The process of installing Azure ML Workbench is presented next. After that, loading, preparing, and visualizing the data are discussed.
Leila Etaati

Chapter 15. Machine Learning on HDInsight

Abstract
In this chapter, an overview of how to use HDInsight for the purpose of machine learning will be presented. HDInsight is based on Apache Spark and used for in-memory cluster processing. Processing data in-memory is much faster than disk-based computing. Spark also supports the Scala language, which supports distributed data sets. Creating a cluster in Spark is very fast, and it is able to use Jupyter Notebook, which makes data processing and visualization easier. Spark clusters can also be integrated with Azure Event Hub and Kafka. Moreover, it is possible to set up Azure Machine Learning (ML) services to run distributed R computations. In the next section, the process of setting up Spark in HDInsight will be discussed.
Leila Etaati

Chapter 16. Data Science Virtual Machine and AI Frameworks

Abstract
Data Science Virtual Machine (DSVM) is a virtual machine on the Azure cloud that is customized for doing data science. DSVM has preconfigured and preinstalled tools that help users build artificial intelligence (AI) applications. DSVM gives data science teams access to a consistent setup. In this chapter, a brief introduction to DSVM and how to install it is provided, in addition to an overview of the tools installed.
Leila Etaati

Chapter 17. Deep Learning Tools with Cognitive Toolkit (CNTK)

Abstract
Microsoft Cognitive Toolkit (CNTK) is an open source deep learning tool [1]. In this chapter, an introduction to neural networks and deep learning will be provided first. Next, an introduction to what CNTK is and how it is accessed and installed will be provided. Finally, a basic case study using CNTK to solve a simple problem will be elaborated.
Leila Etaati

Data Science Virtual Machine

Frontmatter

Chapter 18. Cognitive Services Toolkit

Abstract
Microsoft Cognitive Services are collections of APIs and services that help developers create smarter applications and reports. By using Cognitive Services, developers can add such intelligent features as face recognition, emotion recognition, text analytics, and so forth, to their applications. This chapter first presents an overview of Cognitive Services and then explains how to use them for text analytics in Power BI Report. Finally, how to use Cognitive Services in a Windows application is explored briefly.
Leila Etaati

Chapter 19. Bot Framework

Abstract
A bot is an application that is able to interact with users conversationally [1]. It can be a very simple application that supports dialog and basic questions, or it can be sophisticated, capable of understanding language. In Microsoft Azure, it is possible to create a bot in C# or Node.js. You can create a bot using .NET SDK [2] and test it via such tools as Emulator [3]. In addition, some bot components help you to add more features [4]. In this chapter, a very simple bot using an Azure component will be presented. First, how to create a bot service in Azure will be shown, then how to create a simple bot for questions and answers will be presented, as well as a more complex one.
Leila Etaati

Chapter 20. Overview of Microsoft Machine Learning Tools

Abstract
The last 19 chapters were an overview of how you can undertake machine learning with different Microsoft products. First, an introduction to machine learning approaches, such as descriptive, predictive, and prescriptive analytics, was provided. The R language, as one of the principal languages for machine learning, was then discussed. This was followed by a brief explanation of how to do machine learning using such tools as Power BI, Azure ML Studio, SQL Server, and others.
Leila Etaati

Backmatter
