Data Structures: Explain Hash Tables. Why do we use them?
This is a question I was recently faced with in an interview. I fumbled it a bit, so I decided to go back and brush up on hash tables.
For most data structures, Wikipedia does a darn good job of explaining the basics. If you haven’t already, I’d highly recommend you read through the Hash Table Wikipedia entry. If you’re too lazy to do that, you can just read on ahead – just be warned that this is more of a refresher rather than a course.
What is a Hash Table?
From Wikipedia, we get:
a hash table (hash map) is a data structure used to implement an associative array, a structure that can map keys to values”). A hash table uses a hash function to compute an index into an array of buckets or slots, from which the correct value can be found.
In simplest terms, you can think of the backbone of the data structure as a standard array – this is where the data is held. What makes hash tables unique is that when you are putting a key-value pair into it, you first run the key through what’s called a hash function.
A hash function is basically just an algorithm that transforms the key into a valid index of the array. This returned value will give us the position where we’ll store, or find, the data inside of the aforementioned backbone. It’s important to note that each time you put a key through a hash function, the returned value must be the same. First and foremost, this helps us find the data again once we’ve stored it. It also helps us achieve the theoretical O(1) store and retrieval time (actual performance varies on data being stored, size of list, and the quality of the hash function among other things).
You may have noticed that there could be problems if the hash function hashes multiple keys to the same index as it would either have to overwrite the data or, more commonly, follow a collision protocol to find a new place for it. You’d be right. It is an issue in nearly every implementation which is why the solution to it can be huge when it comes to the observed performance of the hash table.
Why do we Use Hash Tables?
We use hash tables because they are often times more efficient than any other data structure. The ability to lookup and store data with O(1) complexity is very rare.
As always, the best way to learn anything is to practice. Try creating your own hash table along with a working hash function, collision protocol, and dynamic resizing. Then use it to do something useful.
“Hash Table.” Wikipedia. Wikimedia Foundation, n.d. Web. 15 Jan. 2015.