
2021 | Book

Python Programming for Data Analysis


About this book

This textbook grew out of notes for the ECE143 Programming for Data Analysis class that the author has been teaching at the University of California, San Diego, which is a requirement for both graduate and undergraduate degrees in Machine Learning and Data Science. This book is ideal for readers with some Python programming experience. The book covers key language concepts that must be understood to program effectively, especially for data analysis applications. Certain low-level language features are discussed in detail, especially Python memory management and data structures. Using Python effectively means taking advantage of its vast ecosystem. The book discusses Python package management and how to use third-party modules as well as how to structure your own Python modules. The section on object-oriented programming explains features of the language that facilitate common programming patterns.

After developing the key Python language features, the book moves on to third-party modules that are foundational for effective data analysis, starting with Numpy. The book develops key Numpy concepts and discusses internal Numpy array data structures and memory usage. Then, the author moves on to Pandas and details its many features for data processing and alignment. Because strong visualizations are important for communicating data analysis, key modules such as Matplotlib are developed in detail, along with web-based options such as Bokeh, Holoviews, Altair, and Plotly.

The text is sprinkled with many tricks of the trade that help avoid common pitfalls. The author explains the internal logic embodied in the Python language so that readers can get into the Python mindset and make better design choices in their code, which is especially helpful for newcomers to both Python and data analysis.

To get the most out of this book, open a Python interpreter and type along with the many code samples.

Table of contents

Frontmatter
Chapter 1. Basic Programming
Abstract
Understanding the internal logic of the Python language makes it easier to use effectively. We provide the motivations and development for key data structures such as lists and dictionaries as well as looping structures, decorators, and generators. We detail how these data structures use memory and give an in-depth breakdown of the internals of Python functions. Python asynchronous programming via asyncio is discussed, as are methods for debugging and logging code.
José Unpingco
Chapter 2. Object-Oriented Programming
Abstract
Python is an object-oriented language but with many features implemented by convention instead of by the language itself. This leads to more flexibility in object-oriented design. We discuss and develop examples using Python multiple inheritance and break down the individual elements of Python class design including class functions and static methods. Metaprogramming techniques such as monkey-patching are developed alongside abstract base classes. Some common design patterns implemented using Python are also discussed.
José Unpingco
Chapter 3. Using Modules
Abstract
Python comes with an amazing standard library as well as a lively community of third-party modules. Thus, using modules effectively is key to good Python programming. This section develops the methods and strategies for using both the standard library and third-party modules, as well as recommendations for creating virtual environments for code development and deployment. Both the pip and conda package managers are discussed.
José Unpingco
Chapter 4. Numpy
Abstract
Numpy numerical arrays are the foundation of all data science and machine learning in Python. This section develops the Numpy array data structure in detail, especially memory management. Slicing, reshaping, and stacking arrays are developed in detail. Using the wide variety of universal functions to accelerate numerical computations is discussed, as is broadcasting, arguably the most powerful feature of Numpy arrays. Managing numerical types using Numpy dtypes is key to accelerating computations and using memory effectively. Numpy includes a powerful linear algebra library that is also discussed.
José Unpingco
Chapter 5. Pandas
Abstract
Pandas is a powerful data processing library that makes complicated data transformations almost automatic. This chapter develops the key data structures of Pandas, the Series and DataFrame, as well as how to use them effectively. Pandas categorical objects allow for efficient memory usage. Like Numpy, Pandas also supports broadcasting. Understanding the Pandas MultiIndex object helps with slicing and aligning multidimensional data. Pandas provides an extension framework for customizing the visual display of DataFrames, which abbreviates code by adding new code to the DataFrame itself. Pandas supports methods such as rolling and filling, which are very important for longitudinal time-series analysis.
José Unpingco
Chapter 6. Visualizing Data
Abstract
Data visualization is key for presenting analysis results as well as for debugging code. Matplotlib is developed in detail, as are web-based visualization alternatives such as Bokeh, Altair, Holoviews, and Plotly. The Seaborn statistical visualization module, which is built on top of Matplotlib, is developed in detail.
José Unpingco
Backmatter
Metadata
Title
Python Programming for Data Analysis
Written by
Dr. José Unpingco
Copyright year
2021
Electronic ISBN
978-3-030-68952-0
Print ISBN
978-3-030-68951-3
DOI
https://doi.org/10.1007/978-3-030-68952-0
