2021 | Book

Advanced Analytics with Transact-SQL

Exploring Hidden Patterns and Rules in Your Data

About this book

Learn about business intelligence (BI) features in T-SQL and how they can help you with data science and analytics efforts without the need to bring in other languages such as R and Python. This book shows you how to compute statistical measures using your existing skills in T-SQL. You will learn how to calculate descriptive statistics, including centers, spreads, skewness, and kurtosis of distributions. You will also learn to find associations between pairs of variables, including calculating linear regression formulas and confidence levels with definite integration.

No analysis is good without data quality. Advanced Analytics with Transact-SQL introduces data quality issues and shows you how to check for completeness and accuracy, and measure improvements in data quality over time. The book also explains how to optimize queries involving temporal data, such as when you search for overlapping intervals. More advanced time-oriented information in the book includes hazard and survival analysis. Forecasting with exponential moving averages and autoregression is covered as well.

Every web/retail shop wants to know the products customers tend to buy together. Trying to predict the target discrete or continuous variable with few input variables is important for practically every type of business. This book helps you understand data science, the advanced algorithms used to analyze data, and terms such as data mining, machine learning, and text mining.

Key to many of the solutions in this book are T-SQL window functions. Author Dejan Sarka demonstrates efficient statistical queries that are based on window functions and optimized through algorithms built using mathematical knowledge and creativity. The formulas and usage of those statistical procedures are explained so you can understand and modify the techniques presented.

T-SQL is supported in SQL Server, Azure SQL Database, and Azure Synapse Analytics. There are so many BI features in T-SQL that it might become your primary analytic database language. If you want to learn how to get information from your data with the T-SQL language that you are already familiar with, then this is the book for you.

What You Will Learn

Describe distribution of variables with statistical measures
Find associations between pairs of variables
Evaluate the quality of the data you are analyzing
Perform time-series analysis on your data
Forecast values of a continuous variable
Perform market-basket analysis to predict customer purchasing patterns
Predict target variable outcomes from one or more input variables
Categorize passages of text by extracting and analyzing keywords

Who This Book Is For

Database developers and database administrators who want to translate their T-SQL skills into the world of business intelligence (BI) and data science. For readers who want to analyze large amounts of data efficiently by using their existing knowledge of T-SQL and Microsoft’s various database platforms such as SQL Server and Azure SQL Database. Also for readers who want to improve their querying by learning new and original optimization techniques.

Table of Contents

Frontmatter

Statistics

Frontmatter
Chapter 1. Descriptive Statistics
Abstract
Descriptive statistics summarize or quantitatively describe variables from a dataset. In SQL Server, a dataset is a set of rows, or a rowset, that comes from a table, view, or tabular expression. A variable is stored in a column of the rowset. In statistics, a variable is frequently called a feature.
Dejan Sarka
Chapter 2. Associations Between Pairs of Variables
Abstract
After successfully analyzing distributions of single variables, let’s move on to finding associations between pairs of variables. There are three possibilities.
Dejan Sarka

Data Preparation and Quality

Frontmatter
Chapter 3. Data Preparation
Abstract
In any analytical or business intelligence (BI) project, data preparation is crucial. It might also be the longest part of a project—exhausting and sometimes tedious. However, the success of a project heavily depends on data preparation.
Dejan Sarka
Chapter 4. Data Quality and Information
Abstract
You start a shiny new analytical project. You do the initial overview of the data. And bang! You meet probably the biggest issue in advanced analysis and business intelligence: data quality. Garbage in, garbage out is a very old rule. Before doing advanced analyses, it is always recommended that you check the data quality. Measuring improvements in data quality over time can help to understand the factors that influence it.
Dejan Sarka

Dealing with Time

Frontmatter
Chapter 5. Time-Oriented Data
Abstract
Understanding what kind of temporal data can appear in a database is very important. Some queries that deal with temporal data are hard to optimize. Data preparation of time series data has many of its own rules.
Dejan Sarka
Chapter 6. Time-Oriented Analyses
Abstract
Who are my top customers, and what are my top-selling products? How long is a customer faithful to the supplier or the subscribed services and service provider? Which are the most likely days to lose a customer? What are the sales for the next few periods? This chapter shows how to answer these questions using T-SQL.
Dejan Sarka

Data Science

Frontmatter
Chapter 7. Data Mining
Abstract
Every web and/or retail shop wants to know which products customers tend to buy together. Trying to predict a target discrete or continuous variable with few input variables is important in practically every type of business. This chapter introduces some of the most popular algorithms implemented in T-SQL. You learn about the following.
Dejan Sarka
Chapter 8. Text Mining
Abstract
This last chapter of the book introduces text mining with T-SQL. Text mining means analysis of texts. Text mining can include semantic search, term extraction, quantitative analysis of words and characters, and more. Data mining algorithms like association rules can be used to get a deeper understanding of the analyzed text. In this chapter, you learn about the following.
Dejan Sarka
Backmatter
Metadata
Title
Advanced Analytics with Transact-SQL
Author
Dejan Sarka
Copyright year
2021
Publisher
Apress
Electronic ISBN
978-1-4842-7173-5
Print ISBN
978-1-4842-7172-8
DOI
https://doi.org/10.1007/978-1-4842-7173-5