A wall of numbered post office mailboxes. Photo · Elizabeth Kay / Unsplash
A hash map: hand a key to the function, get back a slot number, walk straight to the box. No search, no sort, no waiting.

Last week we said arrays look up by index in O(1). Wonderful — but only if you happen to know the index. What if you have a name, and you want the phone number? A username, and you want the user record? A word, and you want its translation? You don't have an index; you have a key. Searching the array is O(n). Sorted with binary search, it's O(log n). The hash map turns that lookup into O(1) — the same cost as an array index, on collections of any size.

This data structure is so useful it's the default collection type in many languages. JavaScript objects are hash maps. Python dictionaries are hash maps. Lua tables are hash maps. Without them, modern software essentially does not work.

The trick

You have an array of N "buckets". You want to store a key/value pair. The hash map runs the key through a hash function — a function that maps any input to a number — then takes the result mod N to pick a bucket. Store the value there.

To look up: hash the key, mod N, go to that bucket. Done. One hash computation, one mod, one memory read. O(1).

// conceptually:
slot = hash(key) % num_buckets;
buckets[slot] = {key, value};
// to find:
value = buckets[hash(key) % num_buckets].value;

Collisions

Two different keys can hash to the same bucket. The hash function is many-to-one — it has to be, since there are infinite possible strings but only N buckets. Two main strategies handle this. Separate chaining gives each bucket a short list of entries: on collision, append to the list; on lookup, walk it comparing keys. Open addressing keeps everything in the array itself: on collision, probe other slots (the next one along, or one chosen by a second hash) until the key or an empty slot turns up.

As long as the table isn't too full (typical "load factor" cap is around 0.7), collisions are rare and the average lookup is still O(1). When it gets too full, the table rehashes — allocates a bigger backing array and reinserts everything. Like vector growth: amortised O(1).
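The whole trick fits in a few lines. Here's a minimal sketch of a separate-chaining table in C++ — fixed bucket count, no rehashing, no deletion, all names my own:

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Minimal separate-chaining hash table: string keys, int values.
// Sketch only: fixed bucket count, no rehashing, no deletion.
class ChainedMap {
    std::vector<std::list<std::pair<std::string, int>>> buckets_;

    std::size_t slot(const std::string& key) const {
        return std::hash<std::string>{}(key) % buckets_.size();
    }

public:
    explicit ChainedMap(std::size_t n = 16) : buckets_(n) {}

    void put(const std::string& key, int value) {
        auto& chain = buckets_[slot(key)];
        for (auto& entry : chain)
            if (entry.first == key) { entry.second = value; return; }
        chain.emplace_back(key, value);   // collision: append to the chain
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    const int* get(const std::string& key) const {
        for (const auto& entry : buckets_[slot(key)])
            if (entry.first == key) return &entry.second;
        return nullptr;
    }
};
```

With 16 buckets and a handful of keys, each chain holds one or two entries, so `get` does one hash, one mod, and a walk of a very short list — the O(1) average in miniature.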

Practical use

#include <iostream>
#include <string>
#include <unordered_map>

std::unordered_map<std::string, int> scores;

scores["alice"] = 95;
scores["bob"]   = 82;
scores["carol"] = 71;

// O(1) lookup
std::cout << scores["alice"];      // 95

// existence check
if (scores.contains("dave")) {...}     // C++20 — clearest
if (scores.find("dave") != scores.end()) {...} // pre-C++20

Note: scores["dave"] on a missing key silently inserts a default value (here, 0) and returns it. This is convenient for counters but a footgun for "is this key present?" — use contains or find for that.
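The convenience and the footgun are two sides of the same behaviour, easy to see side by side (helper names are my own):

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// The footgun: merely *reading* a missing key with [] inserts it.
// Returns the map's size after one bracket read: 1, not 0.
std::size_t size_after_bracket_read() {
    std::unordered_map<std::string, int> scores;
    int v = scores["dave"];   // inserts {"dave", 0} and returns 0
    (void)v;
    return scores.size();
}

// The safe check: find() never inserts.
// Returns true only if the map grew — it shouldn't.
bool find_inserted_anything() {
    std::unordered_map<std::string, int> scores;
    bool found = scores.find("dave") != scores.end();   // false, no insert
    return found || !scores.empty();
}
```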

Use std::unordered_map when you want fast lookup. Use std::map when you also want sorted traversal of the keys (it's a tree internally — O(log n) per op, but iterates in key order).
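The difference shows up as soon as you iterate — a quick sketch:

```cpp
#include <map>
#include <string>
#include <vector>

// std::map iterates in sorted key order, no matter the insertion order.
// (A std::unordered_map would iterate in an unspecified order.)
std::vector<std::string> ordered_keys() {
    std::map<std::string, int> m{{"carol", 71}, {"alice", 95}, {"bob", 82}};
    std::vector<std::string> keys;
    for (const auto& [key, value] : m) keys.push_back(key);
    return keys;   // alice, bob, carol
}
```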

Hash functions in practice

The standard library has a built-in std::hash<T> for every primitive and most STL types. To use a hash map keyed by your own class, specialise std::hash for your type — the implementation is usually a combination of hashes of the fields. For most application code you'll never need to write a hash function from scratch.
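Here's roughly what that specialisation looks like for a small struct. The combine step below is a common shift-xor recipe (the constant is the one boost::hash_combine uses); any mixing of the field hashes works for a sketch:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

struct Point {
    int x, y;
    // unordered_map also needs equality, to resolve collisions.
    bool operator==(const Point& o) const { return x == o.x && y == o.y; }
};

// Specialise std::hash so Point can be a hash-map key.
template <>
struct std::hash<Point> {
    std::size_t operator()(const Point& p) const noexcept {
        std::size_t hx = std::hash<int>{}(p.x);
        std::size_t hy = std::hash<int>{}(p.y);
        // Combine the field hashes (boost::hash_combine-style mixing).
        return hx ^ (hy + 0x9e3779b9 + (hx << 6) + (hx >> 2));
    }
};
```

After that, `std::unordered_map<Point, std::string>` just works: the container calls your `std::hash<Point>` to pick the bucket and your `operator==` to confirm the match.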

One important property: a good hash function should distribute keys uniformly across the bucket range. Bad hash functions (e.g. always returning 0) pile every entry into one bucket, reducing the table to a single linked list, and lookups degenerate to O(n). Several languages (Python, Rust among them) now use keyed, randomised string hashes such as SipHash so attackers can't precompute colliding keys — the adversarial "hash flooding" attack.

Why this matters for AI

Tokenizers in language models are massive hash maps from string to integer ID, and back. A 50,000-token vocabulary is 50,000 entries; lookup is O(1) per token; you're tokenising thousands of tokens per second. Without hash maps, this is the slow path.
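A toy version of that vocabulary, with the IDs assigned on first sight (real tokenizers typically keep the reverse direction as a plain array indexed by ID, as here):

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Toy tokenizer vocabulary: string -> ID via hash map, ID -> string via array.
struct Vocab {
    std::unordered_map<std::string, int> to_id;
    std::vector<std::string> to_token;

    // Return the token's ID, assigning the next free ID on first sight.
    int add(const std::string& tok) {
        auto it = to_id.find(tok);
        if (it != to_id.end()) return it->second;   // O(1) average
        int id = static_cast<int>(to_token.size());
        to_id.emplace(tok, id);
        to_token.push_back(tok);
        return id;
    }
};
```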

Same in databases: every "find user by email" is a hash-map lookup against an index. Same in compilers: every variable name is hashed during parsing. Same in caching: every CDN entry. The hash map is, after the array, the most-used data structure in software.

If a problem can be reduced to "I have a key; give me the value", a hash map is the answer. A surprising number of problems reduce to exactly that.

Try it yourself
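Write a word-frequency counter: given a sentence, report how many times each word appears. The default-insert behaviour of operator[] described above makes it nearly a one-liner. One possible solution, splitting on whitespace:

```cpp
#include <sstream>
#include <string>
#include <unordered_map>

// Count how often each whitespace-separated word appears.
// operator[] default-inserts 0, so ++counts[word] works for unseen words.
std::unordered_map<std::string, int> word_counts(const std::string& text) {
    std::unordered_map<std::string, int> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word) ++counts[word];
    return counts;
}
```

As an extension, try printing the counts in alphabetical order — you'll want the std::map variant from earlier.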

What's next

Now we leave data structures momentarily and meet a control-flow idea so strange it has its own folklore. A function that calls itself.

Week 32 is Recursion — the Inception episode.

Photo credit

Photo free under the Unsplash license. Mailboxes · Elizabeth Kay.