×
By the end of this chapter, you should be able to:
A hash table or hash map is a data structure that can map keys (of any type) to values. Hash tables are one of the most impressive data structures as they have an average runtime of O(1) for searching, insertion, and deletion.
Before we dive deep into the performance characteristics of hash tables, let's first examine how they work. You can think of a hash table like a JavaScript object, except instead of all the keys being strings, the keys can be any kind of value.
When searching for values in the hash table, we pass the key through a special function called a "hashing function," which turns the key into an index. A computer then uses that index to access the key's corresponding value. This isn't exposed to you; it's just used by the computer internally to associate keys with values.
One thing to be mindful of when working with hash functions is that they sometimes cause collisions. In other words, it's possible for two keys to get hashed to the same index. When working with hash tables, it is essential to have a good hashing function that does not frequently cause collisions (different inputs return the same output). The function at a minimum should provide a uniform distribution (equal likelihood for all values). The hash table also needs a way to resolve collisions; we'll talk more about this in the next chapter.
We typically think of objects in JavaScript as being hash tables, but objects are restricted in certain ways. For example, the keys in JavaScript objects must be strings or symbols; no other data type is allowed. In order to accomodate more general hash tables, in ES2015, two new constructors called Map
and WeakMap
were introduced.
Maps are similar to objects, but with a few key differences. From MDN:
Object
has a prototype, so there are default keys in the map that could collide with your keys if you're not careful. This could be bypassed by using map = Object.create(null)
since ES5, but was seldom done.Object
are Strings
and Symbols
, whereas they can be any value for a Map
, including functions, objects, and any primitive.Map
easily with the size
property, while the size of an Object
must be determined manually. What's the difference between Map
and WeakMap
, then? They are both implementations of Hash Tables, but keys in WeakMaps must be objects (the trade off here is performance). WeakMaps also behave differently when it comes to garbage collection. You can read more about the differences here and here.
So how can we create a hashing function? One approach is for our function to return a number between 0 and the size of our hash table. We can then use this number as an index inside of a suitably large array.
Let's start with a simple example: suppose we have a collection of numbers that we want to store as values in a hash table. We could create a hash function by using the number modulo the size of the table as the index. For example, if our hash table has a size of 10
, and our number is 50
, then our index would be 50 % 10 = 0
.
This is a nice start, but we quickly run into problems. One problem is that the effectiveness of our hash table depends a great deal on the numbers we're trying to hash. For example, if we're trying to store the numbers 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100, we immediately run into a problem with a hash table of size 10, because each one of these numbers will yield an index of 0 with our current hashing function!
One solution to this problem is to create a large hash table. In most applications, in order to minimize collisions it's also helpful to ensure that the size is a prime number (here's one article that goes into why primes are helpful).
The larger issue, however, is that our hashing function depends on the input being a number. But we may want to hash all kinds of different types of data!
Here's an example of a more sophisticated hashing function which converts strings to indices. This function sums the ASCII values of the letters in the key. It then finds the value of that sum modulus the size of the hash table:
// let's imagine hashTable is an array with 7919 (a large prime number) var hashTable = new Array(7919) function basicHash(key){ var sumOfCharacters = key.split("").reduce(function(previousValue, nextValue){ return previousValue + nextValue.charCodeAt(0) }, 0) return sumOfCharacters % hashTable.length } basicHash('Elie') // 383 basicHash('Matt') // 406 basicHash('Tim') // 298
We could now use the value returned to use from the basicHash
function as an index and implement a set
method to add the value into our hash table. While this will work, it still has some problems. While hashing the three strings "Elie", "Matt" and "Tim" seem fine so far, that is because we have a relatively large hash table size. If we were to add more and more values the distribution would not be even and we would quickly get hashing collisions.
To fix our issue, we can introduce another small prime number to multiply each part of our sumOfCharacters
by. This is an implementation of an algorithm called Horner's method. We will not be getting into the math of it (you can read more about it here), but in short, this will lead to a more uniform distribution of indices with fewer collisions on average.
// lets imagine hashTable is an array with 7919 (a large prime number) var hashTable = new Array(7919) function hornerHash(key){ var smallPrime = 37; var sumOfCharacters = key.split("").reduce(function(previousValue, nextValue){ return smallPrime * previousValue + nextValue.charCodeAt(0) }, 0) return sumOfCharacters % hashTable.length } hornerHash('Elie') // 4155 hornerHash('Matt') // 6711 hornerHash('Tim') // 205
Note that in the example above, we are assuming that our keys are strings. What makes a hash table so special is that your keys can be any data type. This includes strings, numbers, booleans, null, undefined, objects, arrays, and even functions!
Don't worry though, for the exercises below, you will not be implementing your own hashing function. What's important to understand is what makes a good hashing function and how you can handle hashing collisions, which we will see more in the next chapter.
Hash tables have quite remarkable performance characteristics. If you can manage collisions in a hash table or if you are lucky enough to know all the keys ahead of time you can even create a perfect hash which will guarantee O(1) lookups for all cases. Here is some more detail about the performance of a hash table.
Operation | Runtime |
---|---|
search | O(1) |
insert | O(1) |
delete | O(1) |
space complexity | O(n) |
Complete Part I of the exercises here.
When you're ready, move on to Handling Collisions