2024 | Book

Data and Process Visualisation for Graphic Communication

A Hands-on Approach with Python

Author: Francesco Bianconi

Publisher: Springer Nature Switzerland

Part of: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"

About this book

This book guides the reader through the process of graphic communication, with a particular focus on representing data and processes. It covers the graphic communication scenarios that arise most frequently in practical applications.

The book is organized in two parts: representing data (Part I) and representing processes (Part II). The first part deals with the graphical representation of data. It starts with an introductory chapter on the types of variables, then guides the reader through the most common data visualization scenarios: representing magnitudes, proportions, one variable as a function of the other, groups, relations, and bivariate, trivariate and geospatial data. The second part covers various tools for the visual representation of processes; these include timelines, flowcharts, Gantt charts and PERT diagrams. The book also features four appendices that cover cross-chapter topics: mathematics and statistics review, Matplotlib primer, color representation and usage, and representation of geospatial data.

Aimed at junior and senior undergraduate students in various technical, scientific, and economic fields, this book is also a valuable aid for researchers and practitioners in data science, marketing, entertainment, media and other fields.

Frontmatter

Data

Frontmatter

Chapter 1. Introducing Data

Abstract

The aim of this first chapter is to briefly introduce the reader to the different types of data. We explain the distinction between quantitative and categorical variables as well as ordered vs. unordered variables. The notions of dimension and measure are also presented.

Francesco Bianconi

Chapter 2. Magnitudes

Abstract

This chapter deals with the graphical representation of magnitudes over a set of classes, where classes are identified by a categorical variable (e.g., country, province, gender) or a numerical discrete one (e.g., year). Magnitudes indicate the level of some characteristic of a class; hence magnitudes are expressed as a numerical variable, continuous or discrete (for instance population, gross domestic product, or area). The bar chart and its variations (paired bar charts, stacked bar charts, and multiple bar charts) are the mainstay in this context and are all presented in this chapter. Another option for displaying magnitudes, the packed bubble chart, is also discussed.

Francesco Bianconi
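
As an illustration of the stacked bar charts mentioned in the abstract above, the following is a minimal sketch (not taken from the book) of the bookkeeping a stacked bar chart needs: each series is drawn on top of the running total of the series before it. The function name `stacked_bottoms`, the class labels, and the sample figures are hypothetical; the resulting offsets are the kind of values one would pass as the `bottom` argument of Matplotlib's `bar`.

```python
def stacked_bottoms(series):
    """Return the baseline of every bar segment plus the per-class totals.

    `series` maps a series name to one magnitude per class; all lists
    must have the same length. Series are stacked in insertion order.
    """
    n_classes = len(next(iter(series.values())))
    running = [0.0] * n_classes          # current height of each stack
    bottoms = {}
    for name, values in series.items():
        if len(values) != n_classes:
            raise ValueError(f"series {name!r} has the wrong length")
        bottoms[name] = list(running)    # this segment starts where the stack ends
        running = [r + v for r, v in zip(running, values)]
    return bottoms, running


# Hypothetical data: two classes, two stacked series.
classes = ["North", "South"]
series = {"2022": [10, 20], "2023": [5, 15]}
bottoms, totals = stacked_bottoms(series)
```

With these inputs the first series starts at zero and the second starts at the heights of the first, so the stack tops equal the per-class totals.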

Chapter 3. Proportions

Abstract

The topic of this chapter is the visualization of proportions, that is, part-to-whole relationships. We are in the situation where the data can be partitioned into a set of classes, for instance gender, nationality, age group, etc. The classes are usually expressed as a categorical variable; the proportion values are expressed as continuous variables with values in [0, 1] or, equivalently, as percentages. The general goal is to emphasize the relative importance of the categories while also bringing the reader’s attention to each category’s portion of the total. We present different tools for visualizing proportions; these include pie charts, doughnut charts, waffle charts, hundred percent stacked bar charts, hundred percent divergent stacked bar charts, and tree maps.

Francesco Bianconi
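
To make the part-to-whole idea in the abstract above concrete, here is a small sketch (not from the book) that turns raw class counts into proportions in [0, 1] and then into the angular extent each class would occupy in a pie chart. The helper names `proportions` and `wedge_angles` and the sample counts are assumptions made for illustration only.

```python
def proportions(counts):
    """Convert raw class counts into fractions of the total, each in [0, 1]."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {label: value / total for label, value in counts.items()}


def wedge_angles(fractions, start=0.0):
    """Return (label, start_angle, end_angle) in degrees for each class."""
    wedges = []
    angle = start
    for label, fraction in fractions.items():
        sweep = 360.0 * fraction         # a wedge's sweep is proportional to its share
        wedges.append((label, angle, angle + sweep))
        angle += sweep
    return wedges


# Hypothetical class counts.
counts = {"A": 50, "B": 30, "C": 20}
fractions = proportions(counts)
wedges = wedge_angles(fractions)
```

The same fractions, multiplied by 100, give the percentages used by waffle charts and hundred percent stacked bar charts.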

Chapter 4. One Variable as a Function of the Other

Abstract

This chapter explores ways to represent relationships between two variables a, b when the relationship can be expressed as an explicit function, that is, in the form b = f(a). We refer to a and b respectively as the independent and dependent variables. It is customary to plot the independent variable on the x-axis of the chart and the dependent one on the y-axis. The function f can be given as an algebraic expression or, more commonly, as a list of (a, b) pairs. The chapter presents two different tools for visualizing one variable as a function of the other: line charts and slope charts.

Francesco Bianconi

Chapter 5. Frequency Distributions

Abstract

Consider a numerical variable x (continuous or discrete) with values in [a, b] and let \(\mathcal{P}\) be a partition of [a, b] into K monotonically increasing intervals (bins). The frequency distribution (histogram) of x over \(\mathcal{P}\) is a function that returns the number of values of x that fall within each bin. Visualizing the distribution of all the observations in a quantitative dataset is essential to gain a better understanding of its shape, center, and spread. This chapter presents different approaches for visualizing frequency distributions: histogram plots, dot diagrams, pyramid plots, and area charts.

Francesco Bianconi
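
The definition of a frequency distribution given in the abstract above can be sketched in a few lines of standard-library Python. This is an illustrative implementation, not the book's code: the function name `frequency_distribution` and the sample values are hypothetical, and the treatment of bin boundaries (half-open bins, with the last bin closed on the right, as in NumPy's `histogram`) is an assumption the abstract does not specify.

```python
from bisect import bisect_right


def frequency_distribution(values, edges):
    """Count how many values fall within each bin.

    `edges` holds the K + 1 increasing bin boundaries of a partition of
    [a, b]. Bins are half-open, [e_k, e_(k+1)), except the last one,
    which also includes b. Values outside [a, b] are ignored.
    """
    if len(edges) < 2 or any(lo >= hi for lo, hi in zip(edges, edges[1:])):
        raise ValueError("edges must be strictly increasing")
    counts = [0] * (len(edges) - 1)
    for value in values:
        if value < edges[0] or value > edges[-1]:
            continue
        k = bisect_right(edges, value) - 1   # index of the bin's left edge
        k = min(k, len(counts) - 1)          # value == b goes into the last bin
        counts[k] += 1
    return counts


# Hypothetical observations and a partition of [0, 10] into K = 4 bins.
values = [1, 2, 2, 3, 5, 7, 8, 10]
edges = [0, 2.5, 5, 7.5, 10]
counts = frequency_distribution(values, edges)   # [3, 1, 2, 2]
```

Plotting `counts` as adjacent bars whose widths match the bins yields a histogram plot.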

Chapter 6. Groups

Abstract

The focus of this chapter is the representation of sets of data of a continuous or discrete variable (the target variable) across levels of a categorical or numerical discrete variable (the group variable). There are a variety of choices, and the selection primarily depends on the degree of data aggregation desired for the visualization. We describe how to show every data point in each group through strip plots or swarm plots, to display summary statistics of each group by box plots, or to visualize the overall frequency distribution via violin plots. The chapter also describes how to combine the above plots in various ways, for instance by overlapping strip plots with box plots and violin plots with box plots.

Francesco Bianconi
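
The summary statistics that a box plot displays for each group, as described in the abstract above, can be computed with the standard library. The sketch below is illustrative only: the name `five_number_summary`, the group labels, and the data are hypothetical, and the choice of the "inclusive" quartile method and of plain minimum and maximum as the extremes is an assumption (plotting libraries may apply other quartile or whisker rules).

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


# Hypothetical target variable observed across two levels of a group variable.
groups = {
    "control": [1, 2, 3, 4, 5],
    "treated": [2, 4, 6, 8, 10],
}
summary = {name: five_number_summary(values) for name, values in groups.items()}
# summary["control"] is (1, 2.0, 3.0, 4.0, 5)
```

A strip or swarm plot would instead draw every value in `groups`, which is the trade-off in data aggregation that the abstract refers to.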

Chapter 7. Relations

Abstract

In numerous scenarios it may be useful to illustrate the connections between categories that represent items, entities, or classes. This chapter is concerned with representing pairwise interaction between categories when the interaction consists of some kind of uni- or bi-directional quantitative flow from one class/entity to another. In this type of chart there is one measure (the amount of flow between the classes/entities) and one or more dimensions that define the classes. This chapter introduces two specific tools for representing relations: chord diagrams and Sankey diagrams.

Francesco Bianconi

Chapter 8. Bivariate Data

Abstract

This chapter deals with bivariate observations, i.e., those where datapoints consist of pairs of values (x, y) of numeric (discrete or, more commonly, continuous) variables. The variables can either represent raw features (e.g., height, weight, blood pressure) or be the result of a transformation from originally multivariate data via some dimensionality reduction procedure. The main tools for representing bivariate data are the scatter plots, which are also the focus of this chapter. Representing bivariate data by scatter plots plays a major role in exploratory data analysis, particularly for (1) qualitatively assessing the correlation between variables (or the lack of it) and (2) visualizing clusters. The chapter shows methods and applications.

Francesco Bianconi

Chapter 9. Trivariate Data

Abstract

The chapter introduces different approaches for the visualization of trivariate data. Observations in this case are represented by triplets of values (x, y, z) where the third variable z is typically numeric and continuous, whereas x and y can be either numerical (discrete or continuous) or categorical. The solution is to plot the data on the xy plane and delegate the encoding of z to marker size and/or color. In this context there are two scenarios: (a) x and y are continuous variables and (b) x and y are numerical discrete or categorical variables. The chapter describes solutions for both situations, respectively scatter bubble plots for (a) and heat maps and lattice bubble plots for (b).

Francesco Bianconi
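
For scenario (b) in the abstract above, where x and y are categorical or numerical discrete, a heat map needs the (x, y, z) triplets arranged on a grid. The following sketch, which is not taken from the book, shows one way to do that in plain Python; the function name `to_grid`, the sample categories, and the use of `None` for missing cells are assumptions made for illustration.

```python
def to_grid(triplets, fill=None):
    """Arrange (x, y, z) triplets into a row-major grid of z values.

    Returns the sorted x levels (columns), the sorted y levels (rows),
    and a list of rows. Cells with no observation are set to `fill`.
    """
    xs = sorted({x for x, _, _ in triplets})
    ys = sorted({y for _, y, _ in triplets})
    column = {x: j for j, x in enumerate(xs)}
    row = {y: i for i, y in enumerate(ys)}
    grid = [[fill] * len(xs) for _ in ys]
    for x, y, z in triplets:
        grid[row[y]][column[x]] = z      # z is what color or marker size encodes
    return xs, ys, grid


# Hypothetical observations: z measured for (day, period) combinations.
triplets = [("Mon", "am", 3.0), ("Mon", "pm", 5.0), ("Tue", "am", 4.0)]
xs, ys, grid = to_grid(triplets)
# xs == ["Mon", "Tue"], ys == ["am", "pm"], grid == [[3.0, 4.0], [5.0, None]]
```

Each grid cell then maps to one colored tile in a heat map or one sized marker in a lattice bubble plot.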

Chapter 10. Geospatial Data

Abstract

This chapter addresses the visualization of geospatial data. With this term we refer to any kind of data that carry information about locations on the Earth’s surface. Geospatial data are in a way equivalent to trivariate data, except that they carry information specific to locations on the Earth’s surface: typically country, region, province, and/or other boundaries, but in some cases also cities, rivers, and other such landmarks. The chapter introduces and discusses four tools for representing geospatial data: choropleth maps, hexgrid maps, proportional symbol maps, and cartograms.

Francesco Bianconi

Representing Processes

Frontmatter

Chapter 11. Timelines

Abstract

A timeline is a visual representation of a period of time where significant events are highlighted. Timelines are commonly used to visualize and understand the order and duration of a series of events occurring (or predicted to occur) in a stretch of time delimited by a start and an end point. The events are usually represented as bullets on a straight line (vertical or horizontal), and each bullet is accompanied by a text label describing the corresponding event. Timelines can be used to display chronologies, and find application in various fields including history, genealogy, project management, education, journalism, communication, and psychotherapy. The chapter shows how to generate horizontal and vertical timelines using Matplotlib.

Francesco Bianconi

Chapter 12. Flowcharts

Abstract

Flowcharts are diagrams that display a problem’s definition, analysis, or solution approach and employ symbols to represent activities, data, flow, equipment, and other actions. Various types of flowcharts exist, among which are data, system, and program flowcharts. This chapter focuses on program flowcharts, which are the most commonly used in practice and show the sequence of steps and decisions required to perform a process or algorithm. We present examples of how to generate flowcharts using the Schemdraw package.

Francesco Bianconi

Chapter 13. Gantt Charts

Abstract

Gantt charts are one of the most common tools for graphic communication in project management. Named after American engineer Henry L. Gantt (1861–1919), Gantt charts are essentially bar charts in which the bars represent activities (tasks) in a process. In the typical layout the bars are plotted horizontally with the x-axis representing time, while the left of the chart reports the names of the tasks. The initial and final positions of each bar reflect the start and end dates of the activity; hence the length of the bar is proportional to the amount of time required to complete that task. A Gantt chart can optionally show the activities grouped into phases. The chapter provides examples of Gantt charts with activities alone as well as with phases and activities.

Francesco Bianconi
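
The geometry described in the abstract above (each bar starts at the activity's start date and has a length proportional to its duration) can be sketched as follows. This is an illustrative helper, not the book's code: the name `gantt_bars`, the task names, and the dates are hypothetical. The resulting offsets and lengths are the kind of values one would pass as the `left` and `width` arguments of Matplotlib's `barh`.

```python
from datetime import date


def gantt_bars(tasks):
    """Convert (name, start, end) tasks into (name, offset, length) in days.

    The offset is measured from the earliest start date among all tasks,
    so the first activity begins at position 0 on the time axis.
    """
    origin = min(start for _, start, _ in tasks)
    return [
        (name, (start - origin).days, (end - start).days)
        for name, start, end in tasks
    ]


# Hypothetical project with three activities.
tasks = [
    ("Design", date(2024, 1, 1), date(2024, 1, 10)),
    ("Build", date(2024, 1, 8), date(2024, 1, 20)),
    ("Test", date(2024, 1, 20), date(2024, 1, 25)),
]
bars = gantt_bars(tasks)
# [("Design", 0, 9), ("Build", 7, 12), ("Test", 19, 5)]
```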

Chapter 14. PERT Diagrams

Abstract

The acronym PERT (Program Evaluation and Review Technique) refers to a class of visuals in which processes are depicted as directed acyclic graphs. The activities can be either nodes or arcs of the graph. The methodology was originally developed by the U.S. Navy in the late 1950s as a part of the Polaris submarine missile development program. One major advantage of PERT diagrams over Gantt charts is that they allow the visualization of dependencies between the activities, whereas Gantt charts do not. There are two main types of PERT diagrams, both of which are treated in this chapter: those where the activities are represented as nodes of the graph (activity-on-node, AoN) and those in which they are arcs of the graph (activity-on-arc, AoA).

Francesco Bianconi
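
To illustrate the activity-on-node representation mentioned in the abstract above, the sketch below stores a small project as a directed acyclic graph and walks it in dependency order to obtain the earliest start and finish time of each activity. It is not taken from the book: the function name `earliest_times`, the activities, and their durations are hypothetical, and the forward pass shown here is only one of the computations commonly associated with PERT. It relies on `graphlib.TopologicalSorter` from the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def earliest_times(durations, predecessors):
    """Forward pass over an activity-on-node graph.

    `predecessors` maps each activity to the activities it depends on.
    Returns two dicts: earliest start and earliest finish per activity.
    A cyclic graph makes TopologicalSorter raise graphlib.CycleError.
    """
    start, finish = {}, {}
    for task in TopologicalSorter(predecessors).static_order():
        # An activity can start once all of its predecessors have finished.
        start[task] = max(
            (finish[p] for p in predecessors.get(task, ())), default=0
        )
        finish[task] = start[task] + durations[task]
    return start, finish


# Hypothetical project: B and C depend on A; D depends on both B and C.
durations = {"A": 3, "B": 2, "C": 4, "D": 1}
predecessors = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
start, finish = earliest_times(durations, predecessors)
project_duration = max(finish.values())   # 8
```

The dependency information in `predecessors` is what a PERT diagram draws as arrows between nodes and what a plain Gantt chart leaves out.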

Backmatter

Title: Data and Process Visualisation for Graphic Communication
Author: Francesco Bianconi
Publisher: Springer Nature Switzerland
Electronic ISBN: 978-3-031-57051-3
Print ISBN: 978-3-031-57050-6
DOI: https://doi.org/10.1007/978-3-031-57051-3
