Skip to main content
main-content
Top

About this book

Build working implementations of hash tables, written in the C programming language. This book starts with simple first attempts devoid of collision resolution strategies, and moves through improvements and extensions illustrating different design ideas and approaches, followed by experiments to validate the choices.
Hash tables, when implemented and used appropriately, are exceptionally efficient data structures for representing sets and lookup tables, providing low overhead, constant time, insertion, deletion, and lookup operations.
The Joys of Hashing walks you through the implementation of efficient hash tables and the pros and cons of different design choices when building tables. The source code used in the book is available on GitHub for your re-use and experiments.
What You Will LearnMaster the basic ideas behind hash tables
Carry out collision resolution, including strategies for handling collisions and their consequences for performance
Resize or grow and shrink tables as needed
Store values by handling when values must be stored with keys to make general sets and mapsWho This Book Is For
Those with at least some prior programming experience, especially in C programming.

Table of Contents

Frontmatter

Chapter 1. The Joys of Hashing

This book is an introduction to the hash table data structure. When implemented and used appropriately, hash tables are exceptionally efficient data structures for representing sets and lookup tables. They provide constant time, low overhead, insertion, deletion, and lookup. The book assumes the reader is familiar with programming and the C programming language. For the theoretical parts of the book, it also assumes some familiarity with probability theory and algorithmic theory.
Thomas Mailund

Chapter 2. Hash Keys, Indices, and Collisions

As mentioned in the introduction, this book is primarily about implementing hash tables and not hash functions. So to simplify the exposition, assume that the data values stored in tables are identical to the hash keys. In Chapter 5, you will address which changes you have to make to store application data together with keys, but for most of the theory of hash tables you only need to consider hash keys; everywhere else, you will view additional data as black box data and just store their keys. While the code snippets below cover all that you need to implement the concepts in the chapter, you cannot easily compile them from the book, but you can download the complete code listings from https://github.com/mailund/JoyChapter2 .
Thomas Mailund

Chapter 3. Collision Resolution, Load Factor, and Performance

Collisions are inevitable when using a hash table, at least if you want the table size, and thus the initialization time for the table, to be linear in the number of keys you put into it. Therefore, you need a way to deal with collisions so you can still insert keys if the bin you map it to is already occupied. There are two classical approaches to collision resolution: chaining (where you use linked lists to store colliding keys) and open addressing (where you find alternative empty slots to store values in when keys collide).
Thomas Mailund

Chapter 4. Resizing

If you know how performance degrades as the load factor of a hash table increases, you can use this to pick a table size where the expected performance matches your needs, presuming that you know how many keys the table will need to store. If you do not know the number of elements you need to store, n, then you cannot choose a table size, m, that ensures that α = n/m is below a desired upper bound. In most applications, you do not know n before you run your program. Therefore, you must adjust m as n increases by resizing the table.
Thomas Mailund

Chapter 5. Adding Application Keys and Values

So far, you have only considered storing keys in your hash tables. Most of the techniques for implementing hash tables do not depend on whether you store simple keys or whether you associate application values with them. The setup where you only store keys that you can also use as hash keys, however, is practically never used in real-world applications. This chapter is about storing application values in bins together with their hash keys. You can download the code at https://github.com/mailund/JoyChapter5 .
Thomas Mailund

Chapter 6. Heuristic Hash Functions

The main topic of this book is implementing hash tables; it’s only secondarily about hash functions. This is why you have assumed a priori that you have uniformly distributed hash keys. In reality, this is unlikely to be the case; real data are rarely random samples from the space of possible data values. In this chapter, you will learn about commonly used heuristic hash functions. In the next chapter, you will see an approach to achieving stronger probabilistic guarantees.
Thomas Mailund

Chapter 7. Universal Hashing

Generally, you cannot assume that your application can produce uniformly distributed keys; the hash functions in Chapter 6 are only heuristics. They make no guarantees about the results of hashing application keys and thus risk pathological cases where operations are linear rather than constant.
Thomas Mailund

Chapter 8. Conclusions

In this book, you explored the hash table data structure. You learned how to map keys from a large space, in which you assume that keys are uniformly distributed, into a small space of table bins and considered a table’s performance as a function of the number of bins vs. how many keys you store in a table. You covered strategies for handling collisions when two or more different keys map to the same bin and the performance consequences of the choice of strategy. You also learned how to dynamically adjust the size of tables to avoid them filling up and incurring high runtime performance penalties as a consequence, while at the same ensuring that you do not allocate tables larger than necessary and thus incur memory penalties as a consequence.
Thomas Mailund

Backmatter

Additional information

Premium Partner

image credits