2019 | Book

Practical Text Analytics

Maximizing the Value of Text Data

Authors: Murugan Anandarajan, Chelsey Hill, Thomas Nolan

Publisher: Springer International Publishing

Book series: Advances in Analytics and Data Science

About this book

This book introduces text analytics as a valuable method for deriving insights from text data. Unlike other text analytics publications, Practical Text Analytics: Maximizing the Value of Text Data makes technical concepts accessible to those without extensive experience in the field. Using text analytics, organizations can derive insights from content such as emails, documents, and social media.

Practical Text Analytics is divided into five parts. The first part introduces text analytics, discusses its relationship with content analysis, and provides a general overview of text mining methodology. In the second part, the authors discuss the practice of text analytics, including data preparation and the overall planning process. The third part covers text analytics techniques such as cluster analysis, topic models, and machine learning. In the fourth part of the book, readers learn about techniques used to communicate insights from text analysis, including data storytelling. The final part of Practical Text Analytics offers worked examples using software programs for text analytics, enabling readers to mine their own text data to uncover information.

Table of Contents

Frontmatter
Chapter 1. Introduction to Text Analytics
Abstract
In this chapter we define text analytics, discuss its origins, cover its current usage, and show its value to businesses. The chapter describes examples of current text analytics uses to demonstrate the wide array of real-world impacts. Finally, we present a process road map as a guide to text analytics and to the book.
Murugan Anandarajan, Chelsey Hill, Thomas Nolan

Planning the Text Analytics Project

Frontmatter
Chapter 2. The Fundamentals of Content Analysis
Abstract
In this chapter, the reader is introduced to content analysis, with emphasis on the parallels between content analysis and text analytics. The reader learns the differences between content types and is walked through a demonstration of the content analysis process. The chapter concludes with a discussion of how to properly manage the subject area's existing theory to achieve the desired results.
Murugan Anandarajan, Chelsey Hill, Thomas Nolan
Chapter 3. Planning for Text Analytics
Abstract
This chapter encourages readers to consider the reason for their analysis so that they can chart the correct path for conducting it. It outlines how to plan a text analytics project, starting by asking the analyst to consider the objective, data availability, cost, and desired outcome. Analysis paths are then presented as possible ways to achieve the goal.
Murugan Anandarajan, Chelsey Hill, Thomas Nolan

Text Preparation

Frontmatter
Chapter 4. Text Preprocessing
Abstract
This chapter begins the process of preparing text data for analysis. It introduces the choices available for cleansing text data, including tokenizing, standardizing and cleaning, removing stop words, and stemming. The chapter also covers advanced topics in text preprocessing, such as n-grams, part-of-speech tagging, and custom dictionaries. These preprocessing decisions shape the document representation created for analysis.
Murugan Anandarajan, Chelsey Hill, Thomas Nolan
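The preprocessing steps listed in this abstract can be sketched in a few lines of Python. This is not the authors' code: the stop list, the crude suffix-stripping function standing in for a real stemmer such as Porter's, and the names `preprocess`, `stem`, and `ngrams` are hypothetical choices made for illustration.

```python
import re

# Hypothetical, deliberately tiny stop list; real stop lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "at", "is", "it", "of", "the", "this", "to"}

# Crude suffix stripping standing in for a real stemmer (e.g. Porter); order matters.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Standardize (lowercase), tokenize, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())  # letters only: drops digits/punctuation
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def ngrams(tokens: list[str], n: int = 2) -> list[tuple[str, ...]]:
    """Contiguous n-grams over an already preprocessed token list."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


print(preprocess("The dogs are barking at the mailman!"))  # ['dog', 'bark', 'mailman']
```

Stop words are removed before stemming here so that the stop list matches unstemmed word forms; reversing the order, or choosing a different stemmer, changes the resulting representation, which is the point the abstract makes about preprocessing decisions.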
Chapter 5. Term-Document Representation
Abstract
This chapter details the process of converting documents into an analysis-ready term-document representation. Preprocessed text documents are first transformed into an inverted index for demonstrative purposes. Then, the inverted index is manipulated into a term-document or document-term matrix. The chapter concludes with descriptions of different weighting schemes for the analysis-ready term-document representation.
Murugan Anandarajan, Chelsey Hill, Thomas Nolan
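The pipeline in this abstract (preprocessed documents, then an inverted index, then a matrix, then weighting) can be sketched as follows. This is a minimal illustration, not the book's implementation; the function names are hypothetical, and tf-idf with a natural-log idf is only one of several common weighting schemes.

```python
import math
from collections import Counter, defaultdict


def inverted_index(docs: list[list[str]]) -> dict[str, dict[int, int]]:
    """Map each term to {document id: term frequency in that document}."""
    index: dict[str, dict[int, int]] = defaultdict(dict)
    for doc_id, tokens in enumerate(docs):
        for term, count in Counter(tokens).items():
            index[term][doc_id] = count
    return dict(index)


def document_term_matrix(index: dict[str, dict[int, int]], n_docs: int):
    """Rows are documents, columns are terms (sorted); cells hold raw counts.
    Transposing this gives the term-document matrix."""
    terms = sorted(index)
    matrix = [[index[t].get(d, 0) for t in terms] for d in range(n_docs)]
    return terms, matrix


def tfidf(matrix: list[list[int]]) -> list[list[float]]:
    """One common weighting: tf * log(N / df), df = number of documents containing the term."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    return [
        [row[j] * math.log(n_docs / df[j]) if df[j] else 0.0 for j in range(n_terms)]
        for row in matrix
    ]


# Hypothetical, already preprocessed documents.
docs = [["dog", "bark"], ["dog", "fetch", "fetch"], ["cat"]]
terms, dtm = document_term_matrix(inverted_index(docs), len(docs))
print(terms)  # ['bark', 'cat', 'dog', 'fetch']
print(dtm)    # [[1, 0, 1, 0], [0, 0, 1, 2], [0, 1, 0, 0]]
```

With this weighting, "dog" (present in two of three documents) is down-weighted relative to "fetch", which appears twice in a single document.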

Text Analysis Techniques

Frontmatter
Chapter 6. Semantic Space Representation and Latent Semantic Analysis
Abstract
In this chapter, we introduce latent semantic analysis (LSA), which uses singular value decomposition (SVD) to reduce the dimensionality of the document-term representation. This method reduces the large matrix to an approximation that is made up of fewer latent dimensions that can be interpreted by the analyst. Two important concepts in LSA, cosine similarity and queries, are explained. Finally, we discuss decision-making in LSA.
Murugan Anandarajan, Chelsey Hill, Thomas Nolan
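A minimal sketch of the LSA ideas named in this abstract, assuming NumPy is available: a truncated SVD of a toy term-document matrix, cosine similarity, and a query folded into the reduced space. The matrix, the function names, and the choice to compare the folded query with the rows of V_k (rather than V_k scaled by the singular values, another convention in use) are assumptions for illustration, not the book's code.

```python
import numpy as np


def lsa(tdm: np.ndarray, k: int):
    """Rank-k truncated SVD of a term-document matrix (terms x documents):
    A ~= U_k @ diag(s_k) @ Vt_k."""
    U, s, Vt = np.linalg.svd(tdm, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def fold_in_query(q: np.ndarray, U_k: np.ndarray, s_k: np.ndarray) -> np.ndarray:
    """Project a query's term vector into the latent space: q^T U_k S_k^{-1}."""
    return (q @ U_k) / s_k


# Toy term-document matrix; rows: dog, bark, cat, meow; columns: three documents.
tdm = np.array([[1, 1, 0],
                [1, 2, 0],
                [0, 0, 1],
                [0, 0, 1]], dtype=float)
U_k, s_k, Vt_k = lsa(tdm, k=2)
doc_vecs = Vt_k.T  # one row per document in the 2-dimensional latent space
query = fold_in_query(np.array([0, 1, 0, 0], dtype=float), U_k, s_k)  # query: "bark"
print([round(cosine(query, d), 3) for d in doc_vecs])
```

The query "bark" lands on the same latent dimension as the first two documents (cosine close to 1) and is nearly orthogonal to the third (cosine close to 0). The number of retained dimensions `k` is one of the decisions the chapter discusses.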
Chapter 7. Cluster Analysis: Modeling Groups in Text
Abstract
This chapter explains cluster analysis, an unsupervised learning method for grouping data. The chapter shows how hierarchical and k-means clustering can place terms or documents into meaningful groups to increase understanding of the data. Clustering is a valuable tool that helps us find naturally occurring similarities.
Murugan Anandarajan, Chelsey Hill, Thomas Nolan
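Of the two methods in this abstract, k-means is the easier to sketch. The version below is plain Lloyd's algorithm on hypothetical 2-D document coordinates (for example, coordinates produced by an LSA reduction). Seeding the centroids with the first k points is a simplification for reproducibility; practical tools typically use random or k-means++ seeding. Hierarchical clustering is not shown.

```python
import math


def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with centroids seeded from the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # no reassignment, so the algorithm has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical 2-D document coordinates.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centroids = kmeans(points, k=2)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [[0.0, 0.5], [10.0, 10.5]]
```

Here the poor seeding first lumps three points together, and the update steps recover the two natural groups; with real data, results can depend on the seeds, which is why multiple restarts are common.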
Chapter 8. Probabilistic Topic Models
Abstract
In this chapter, the reader is introduced to a family of unsupervised, probabilistic models known as topic models. In topic models, the full term-document matrix (TDM), or document-term matrix (DTM), is broken down into two major components: each topic's distribution over terms and each document's distribution over topics. The topic models introduced in this chapter include latent Dirichlet allocation, dynamic topic models, correlated topic models, supervised latent Dirichlet allocation, and structural topic models. Finally, decision-making and topic model validation are presented.
Murugan Anandarajan, Chelsey Hill, Thomas Nolan
Chapter 9. Classification Analysis: Machine Learning Applied to Text
Abstract
This chapter introduces classification models. We begin with a description of the various measures for evaluating a model's performance. Then, we explain popular classification models, including Naïve Bayes, k-nearest neighbors, support vector machines, decision trees, random forests, and neural networks. We demonstrate each model using the example data on four dog breeds.
Murugan Anandarajan, Chelsey Hill, Thomas Nolan
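As a sketch of one model from this list, the code below implements multinomial Naïve Bayes with add-one (Laplace) smoothing on a tiny, hypothetical labeled corpus. The data, the labels, and the function names are invented for illustration; they are not the book's dog-breed example, and the other models listed are not shown.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    vocab = {t for d in docs for t in d}
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)  # class -> term frequencies
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        total = sum(term_counts[c].values())
        denom = total + alpha * len(vocab)
        log_lik[c] = {t: math.log((term_counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik


def predict_nb(model, tokens):
    """Pick the class with the highest log posterior (up to a constant).
    Tokens never seen in training are ignored."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][t] for t in tokens if t in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical, already preprocessed training documents with class labels.
train_docs = [["goal", "match", "team"], ["team", "score"], ["recipe", "bake"], ["recipe", "oven"]]
train_labels = ["sports", "sports", "food", "food"]
model = train_nb(train_docs, train_labels)
print(predict_nb(model, ["team", "bake", "match"]))  # sports
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and smoothing keeps a term unseen in one class from zeroing out that class's score.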
Chapter 10. Modeling Text Sentiment: Learning and Lexicon Models
Abstract
This chapter presents two types of sentiment analysis: lexicon-based and learning-based. Both methods aim to extract the overall feeling or opinion from text. Each approach is described with an example, and then the difficulties of sentiment analysis are discussed.
Murugan Anandarajan, Chelsey Hill, Thomas Nolan
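The lexicon-based approach described in this abstract reduces to looking up each word's polarity and combining the scores. The sketch below assumes a tiny, hypothetical lexicon and simple summation; it is not the book's method, and real sentiment lexicons contain thousands of entries.

```python
import re

# Hypothetical toy lexicon (word -> polarity score).
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "boring": -2, "awful": -2}


def lexicon_score(text: str) -> int:
    """Sum the polarity of every token found in the lexicon; unknown words score 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(t, 0) for t in tokens)


def classify(text: str) -> str:
    """Map the summed score to a sentiment label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("I love this movie"))                # positive
print(classify("A great cast, but a boring plot"))  # neutral
print(classify("The ending was not good"))          # positive (negation is missed)
```

The last example shows one of the difficulties of sentiment analysis: plain word-level scoring ignores negation, so "not good" counts as positive.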

Communicating the Results

Frontmatter
Chapter 11. Storytelling Using Text Data
Abstract
This chapter explores the concept of data storytelling, an approach used to communicate insights to an audience to inform, influence, and spur action. A storytelling framework is included for reference; it can be used to develop, focus, and deliver, within a narrative, the most important concepts from an analysis.
Murugan Anandarajan, Chelsey Hill, Thomas Nolan
Chapter 12. Visualizing Analysis Results
Abstract
Text visualizations are the topic for this chapter. The chapter begins with general techniques to help create effective visualizations. From there, it moves to common visualizations used in text analysis. The chapter describes heat maps, word clouds, top term plots, cluster visualizations, topics over time, and network graphs.
Murugan Anandarajan, Chelsey Hill, Thomas Nolan

Text Analytics Examples

Frontmatter
Chapter 13. Sentiment Analysis of Movie Reviews Using R
Abstract
In this chapter, the reader is presented with a step-by-step lexicon-based sentiment analysis using the open-source software R. Using 1,000 movie reviews labeled by sentiment, the example analysis assesses the predictive accuracy of lexicons built into R. A custom stop list is then applied and accuracy is reevaluated.
Murugan Anandarajan, Chelsey Hill, Thomas Nolan
Chapter 14. Latent Semantic Analysis (LSA) in Python
Abstract
This chapter presents the application of latent semantic analysis (LSA) in Python as a complement to Chap. 6, which covers semantic space modeling and LSA. In this chapter, we show how to implement text analysis with LSA through annotated code in Python. In this example, we run LSA over a dataset that includes 401 instances of both online and offline review sources from the Areias do Seixo Eco-Resort (data available at https://archive.ics.uci.edu/ml/datasets/Eco-hotel).
Murugan Anandarajan, Chelsey Hill, Thomas Nolan
Chapter 15. Learning-Based Sentiment Analysis Using RapidMiner
Abstract
This chapter provides a step-by-step sentiment analysis in RapidMiner using classification analysis. After being introduced to the RapidMiner software, the reader learns to build a process map-based analysis to classify Amazon reviews by sentiment. Two machine learning methods, k-nearest neighbor and naïve Bayes, are demonstrated and assessed for predictive performance.
Murugan Anandarajan, Chelsey Hill, Thomas Nolan
Chapter 16. SAS Visual Text Analytics
Abstract
This chapter presents a step-by-step visualization analysis of over 4,000 health news tweets using SAS Visual Text Analytics (VTA). SAS VTA is a commercial software program that uses a pipeline, or process-based, approach to the analysis of text. This chapter demonstrates the creation of visualizations including tree maps, line charts, pie charts, and word clouds using the software.
Murugan Anandarajan, Chelsey Hill, Thomas Nolan
Backmatter
Metadata
Title: Practical Text Analytics
Authors: Murugan Anandarajan, Chelsey Hill, Thomas Nolan
Copyright year: 2019
Publisher: Springer International Publishing
Electronic ISBN: 978-3-319-95663-3
Print ISBN: 978-3-319-95662-6
DOI: https://doi.org/10.1007/978-3-319-95663-3