A weighing scale at rest, perfectly balanced. Photo · Piret Ilver / Unsplash
Big O is a scale. It doesn't tell you how heavy the algorithm is — it tells you how the weight changes as the input grows.

Suppose you have two algorithms that solve the same problem. One takes 0.1 seconds on a small input; the other takes 0.5 seconds. Which is better?

Trick question. The right answer is "I don't know yet — what happens when the input is a thousand times bigger?" The first might take 100 seconds; the second might take 0.7 seconds. The algorithm that wins on small inputs can lose badly at scale, and behaviour at scale is what determines whether your code is shippable.

Big O notation is how programmers talk about that scaling behaviour. It is not optional. It is not academic. It is, after the kitchen, the most useful piece of vocabulary in this entire course.

The idea

Big O describes how the running time of an algorithm grows as the input size grows. We don't care about exact times — those depend on the chef's speed. We care about the shape of the curve.

If doubling the input doubles the time, that's O(n) — linear. If doubling the input quadruples the time, that's O(n²) — quadratic. If doubling the input adds one extra step, that's O(log n) — logarithmic. If doubling the input doesn't change the time at all, that's O(1) — constant.

The letter O is just shorthand. O(n) reads as "order of n". The notation drops constant factors and lower-order terms — we don't care about 3n + 7, only that it grows linearly. Big O is the shape, not the slope.
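
To see the constants vanish concretely, here's a tiny standalone check (the function and numbers are just an illustration): the ratio f(2n)/f(n) for f(n) = 3n + 7 drifts toward exactly 2 as n grows, which is all O(n) claims.

#include <cstdio>

int main() {
    long long n = 1000;
    for (int i = 0; i < 6; i++) {
        long long f  = 3 * n + 7;           // f(n)
        long long f2 = 3 * (2 * n) + 7;     // f(2n)
        printf("n = %-9lld f(2n)/f(n) = %.5f\n", n, (double)f2 / (double)f);
        n *= 2;
    }
    // The ratio approaches 2.0: doubling the input doubles the work.
    // The 3 and the 7 never mattered; only the shape did.
}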

The classes you'll meet

Notation    | Name           | Example                             | At n = 1,000,000
O(1)        | constant       | array index, hash lookup            | 1 step
O(log n)    | logarithmic    | binary search                       | ~20 steps
O(n)        | linear         | summing the elements; max/min       | 1,000,000 steps
O(n log n)  | "linearithmic" | good sorting (mergesort, quicksort) | ~20,000,000 steps
O(n²)       | quadratic      | bubble sort; nested loops           | 10¹² steps — uncomfortable
O(n³)       | cubic          | naïve matrix multiply               | 10¹⁸ steps — won't finish
O(2ⁿ)       | exponential    | brute-force subset enumeration      | won't finish, ever
O(n!)       | factorial      | brute-force travelling salesman     | heat death of the universe

Read those last three rows again. Different complexity classes are not "a little faster" — they are different planets. An O(n) algorithm finishes in a fraction of a second on a million elements; an O(n²) algorithm on the same input takes minutes to hours; an O(2ⁿ) algorithm has effectively no chance of completing.
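
The last column isn't mysterious; here's a minimal sketch to reproduce it, counting one "step" per basic operation (a simplification, but the same one Big O itself makes):

#include <cmath>
#include <cstdio>

int main() {
    double n = 1e6;
    printf("O(log n):   %.0f steps\n", std::log2(n));       // ~20
    printf("O(n):       %.0f steps\n", n);                  // 1,000,000
    printf("O(n log n): %.0f steps\n", n * std::log2(n));   // ~20 million
    printf("O(n^2):     %.0e steps\n", n * n);              // 1e12
    printf("O(n^3):     %.0e steps\n", n * n * n);          // 1e18
    // 2^n at n = 1,000,000 doesn't even fit in a double; the number
    // has roughly 301,030 decimal digits. "Won't finish, ever."
}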

This is why "I rewrote the inner loop in C" can be a 2× speedup, but "I changed from O(n²) to O(n log n)" can be a 10,000× speedup. At scale, the complexity class dominates; constant factors are second-order.

Reading Big O from code

You don't need to count instructions. You count loops.

// O(1) — one step, regardless of how large arr is
x = arr[0];

// O(n) — one loop over n items
for (int i = 0; i < n; i++) { total += arr[i]; }

// O(n²) — nested loops, both depending on n
for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
        M[i][j] = 0;

// O(log n) — each step halves the range
while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;   // avoids overflow of lo + hi
    if (arr[mid] == target) return mid;
    if (arr[mid] < target) lo = mid + 1;
    else                  hi = mid - 1;
}

Heuristics:

  1. A single loop over n items is O(n); a few sequential loops are still O(n).
  2. Nested loops multiply: a loop over n inside a loop over n is O(n²).
  3. A loop that halves the remaining range each step is O(log n).
  4. Sequential blocks add, and the biggest term wins; see the fragment below.
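
For instance (reusing the made-up names in the style of the fragments above): an O(n) pass followed by an O(n²) pass is O(n²) overall, because the nested loop swamps the single one.

// O(n) + O(n²) = O(n²): the nested loop dominates
for (int i = 0; i < n; i++)
    total += arr[i];                    // linear pass

for (int i = 0; i < n; i++)             // quadratic pass sets the class
    for (int j = 0; j < n; j++)
        pairs += arr[i] * arr[j];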

Big O lies (a little)

Big O isn't the whole story. Two real cautions:

  1. Constant factors matter at small scale. An O(n²) algorithm with a tiny inner loop can beat an O(n log n) algorithm with a huge constant for small n. Common implementations of the standard library's std::sort fall back to insertion sort (O(n²)) for small partitions because it's faster than quicksort there.
  2. Cache effects matter. An O(n) walk through contiguous memory crushes an O(n) walk through a linked list; the sketch below makes this concrete. Big O thinks one operation is one operation; the chef knows otherwise.
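
Here's a rough benchmark sketch of caution 2 (the container size is arbitrary; timings vary wildly by machine and compiler flags, so build with -O2): both traversals do the same n additions, yet the contiguous one typically wins by a large margin.

#include <chrono>
#include <cstdio>
#include <list>
#include <numeric>
#include <vector>

int main() {
    const int n = 10000000;
    std::vector<int> vec(n, 1);   // contiguous storage
    std::list<int> lst(n, 1);     // one heap node per element

    auto t0 = std::chrono::steady_clock::now();
    long long a = std::accumulate(vec.begin(), vec.end(), 0LL);   // cache-friendly walk
    auto t1 = std::chrono::steady_clock::now();
    long long b = std::accumulate(lst.begin(), lst.end(), 0LL);   // pointer-chasing walk
    auto t2 = std::chrono::steady_clock::now();

    auto ms = [](auto d) { return std::chrono::duration<double, std::milli>(d).count(); };
    printf("vector sum %lld: %.1f ms\n", a, ms(t1 - t0));
    printf("list   sum %lld: %.1f ms\n", b, ms(t2 - t1));
}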

So: use Big O to think about scale; benchmark to think about speed. They're complementary tools, not substitutes.

Why this matters for AI

The Transformer architecture, in its standard form, has an attention step that's O(n²) in the sequence length: every token attends to every other token. Doubling the context window quadruples the compute. This single fact is why context length is the most expensive thing to scale in language models, and why "long-context" innovations come in two flavours: sliding windows and sparse attention attack the exponent itself, while FlashAttention keeps the O(n²) arithmetic and instead slashes the constant factor by reorganising memory access. Whole research areas exist because someone wanted the Big O of attention to drop below O(n²).
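
A back-of-the-envelope sketch (the head dimension of 64 is an assumed, illustrative value; this is rough arithmetic, not a model of any real system): count the multiply-adds in the n × n attention-score matrix and watch each doubling of context quadruple the cost.

#include <cstdio>

int main() {
    const double d = 64;                        // assumed head dimension
    for (double n = 1024; n <= 131072; n *= 2)  // context lengths 1k .. 128k
        printf("context %7.0f -> ~%.2e multiply-adds per head per layer\n",
               n, n * n * d);
    // Each row is 4x the previous one: O(n²) in the sequence length.
}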

Big O is the difference between a model that runs and one that doesn't.

Try it yourself
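
One experiment worth running: a minimal timing harness (sizes are arbitrary; tune them to your machine) that doubles n and times an O(n) sum against an O(n²) pair-count. The first column should roughly double from row to row; the second should roughly quadruple.

#include <chrono>
#include <cstdio>
#include <vector>

volatile long long sink;   // keeps the compiler from deleting the loops

int main() {
    for (int n = 1000; n <= 16000; n *= 2) {
        std::vector<int> a(n);
        for (int i = 0; i < n; i++) a[i] = i & 1;   // arbitrary data

        auto t0 = std::chrono::steady_clock::now();
        long long s = 0;
        for (int i = 0; i < n; i++) s += a[i];      // O(n)
        auto t1 = std::chrono::steady_clock::now();

        long long p = 0;
        for (int i = 0; i < n; i++)                 // O(n²)
            for (int j = 0; j < n; j++)
                p += a[i] * a[j];
        auto t2 = std::chrono::steady_clock::now();

        sink = s + p;
        auto ms = [](auto d) { return std::chrono::duration<double, std::milli>(d).count(); };
        printf("n = %6d   O(n): %7.3f ms   O(n^2): %9.2f ms\n",
               n, ms(t1 - t0), ms(t2 - t1));
    }
}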

What's next

One data structure has the magical-seeming property of doing lookups in O(1) — finding any item, in any-sized collection, in essentially constant time. It's the workhorse of databases, compilers, networking, and AI. Time to meet it.

Week 31 is The Speed Demon — hash maps and the trick that makes them fast.

Photo credit

Scale · Piret Ilver, free under the Unsplash license.