Week 10 · Phase 2 — The Ancestry
Compilers vs. interpreters — why C is fast, why Python is friendly, and why both matter.
Photo · Mick Haupt / Unsplash
The chef from Phase 1 only understands one language: machine code. Strings of bytes that decode into ADD, MOV, JMP, and friends. Nothing else. Not Python, not JavaScript, not C. The chef will never read your source code. Ever.
So between you and the chef, there is always a translator. The choice of which translator — and when the translation happens — is one of the deepest design decisions in any programming language. It's the reason Python is forgiving and slow, the reason C is unforgiving and fast, the reason JavaScript got a hundred times faster between 2008 and 2015 without changing its syntax, and the reason every single AI inference engine ends up spitting out machine code at the bottom.
There are two ways to bridge the gap between human source code and the chef: translate the whole program ahead of time (a compiler), or translate it on the fly, every time it runs (an interpreter).
The first is like producing the full English translation of a French novel and printing it as a book — slow up front, but every reader thereafter just reads English. The second is like having a translator stand next to you reading aloud, line by line, every time you want to read the book. Convenient if you only read it once. Murderous if you read it a million times.
Photo · Bank Phrom / Unsplash
A compiler is a printing press. The pages take time to set, but every copy after the first comes out fast.
"Compile" sounds like one verb. It is, technically, four:
The compiler:
This whole pipeline takes seconds for a small program, minutes for Linux, an hour for a modern web browser. But once it's done, the executable can run as many times as you like, at full chef-speed. Every C program you ever ship runs through these four steps before it meets the chef.
An interpreter skips most of that. It reads your Python file, builds a quick representation, and starts executing it line by line — looking up types at runtime, dispatching operators at runtime, allocating memory at runtime. It is friendly and forgiving (you don't have to wait for "the build" — you just run), but it's also paying a steep tax on every single line.
The price: when you write x + y in Python, the interpreter has to look up what x is, look up what y is, find the right "plus" function for those types, call it, store the result, and continue. In compiled C, the same line is one machine instruction. Tens of clock cycles in Python become one in C, and that's the optimistic case.
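You can watch that dispatch machinery directly with Python's built-in dis module, which lists the bytecode the interpreter walks through (a sketch; exact opcode names vary between Python versions):

```python
import dis

# Compile the expression "x + y" and list the bytecode
# instructions the interpreter will dispatch at runtime.
code = compile("x + y", "<example>", "eval")
for instr in dis.get_instructions(code):
    print(instr.opname)
```

On recent CPython versions this prints a LOAD instruction for each name, a BINARY_OP (BINARY_ADD on older versions), and a return, and each of those is a full trip through the interpreter's dispatch loop, where compiled C would emit a single add instruction.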
For interactive work — a notebook, a quick script, a teaching environment — interpretation is the better trade. For tight inner loops, it's miserable.
It's hard to give universally fair numbers — workloads vary, optimisations matter, modern interpreters are clever. But for arithmetic-heavy code, the order of magnitude is roughly this, with C as the baseline:
- C (compiled ahead of time): 1×, the baseline
- JIT-compiled runtimes (Java, JavaScript on V8, PyPy): within a small factor of C on hot loops
- CPython (pure interpretation): 50–100× slower
Read the bottom row again. The same loop in vanilla Python is 50 to 100 times slower than the same loop in C. This is real and observable. It's why every Python data-science library is, secretly, a thin Python wrapper around a fast C or C++ engine.
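You can feel this gap with nothing but the standard library (a sketch, not a benchmark; absolute numbers depend on your machine): time a pure-Python loop against the C-implemented sum builtin doing the same work.

```python
import timeit

data = list(range(1_000_000))

def python_loop(xs):
    # Every iteration pays the full interpreter tax:
    # fetch, dispatch, type-check, add, store.
    total = 0
    for x in xs:
        total += x
    return total

def c_loop(xs):
    # sum() runs the same loop inside CPython's C code.
    return sum(xs)

t_py = timeit.timeit(lambda: python_loop(data), number=5)
t_c = timeit.timeit(lambda: c_loop(data), number=5)
print(f"python loop: {t_py:.3f}s, builtin sum: {t_c:.3f}s, ratio ~{t_py / t_c:.0f}x")
```

The builtin wins by a wide margin even though both run "in Python", because only one of them steps through the interpreter on every element.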
For a long time the choice was binary: compile, ship, accept the workflow pain — or interpret, accept the speed hit. Then somebody had a clever idea.
Run the program through an interpreter. Watch which functions get called a lot. After they've been hit a few hundred times, take that hot function aside, compile it to machine code in the background, and from then on use the compiled version. Cold code stays interpreted (fast to start). Hot code becomes compiled (fast to run). You get most of compilation's speed and most of interpretation's flexibility.
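That mechanism can be sketched in a few dozen lines of Python. Everything here (HOT_THRESHOLD, JittedExpr, the tuple tree format) is an illustrative toy, not any real runtime's design: interpret an arithmetic expression tree by walking it, count the calls, and once it turns hot, compile the tree to real Python code and swap that in.

```python
# A toy JIT: tree-walking interpreter + compile-when-hot.
HOT_THRESHOLD = 100  # interpreted calls before we bother compiling

class JittedExpr:
    """Arithmetic expressions as nested tuples, e.g. ("+", "x", ("*", "x", "x"))."""

    def __init__(self, tree):
        self.tree = tree
        self.calls = 0
        self.compiled = None  # filled in once the function is hot

    def _interpret(self, node, env):
        # Cold path: dispatch on the node type at every single step.
        if isinstance(node, (int, float)):
            return node
        if isinstance(node, str):
            return env[node]
        op, left, right = node
        a = self._interpret(left, env)
        b = self._interpret(right, env)
        return a + b if op == "+" else a * b

    def _to_source(self, node):
        # One-time translation of the tree into Python source text.
        if isinstance(node, (int, float)):
            return repr(node)
        if isinstance(node, str):
            return node
        op, left, right = node
        return f"({self._to_source(left)} {op} {self._to_source(right)})"

    def __call__(self, **env):
        self.calls += 1
        if self.compiled is None and self.calls >= HOT_THRESHOLD:
            # Hot: compile once in the "background", reuse forever.
            code = compile(self._to_source(self.tree), "<jit>", "eval")
            self.compiled = lambda e: eval(code, {}, e)
        if self.compiled is not None:
            return self.compiled(env)           # fast path
        return self._interpret(self.tree, env)  # cold path

f = JittedExpr(("+", "x", ("*", "x", "x")))  # x + x*x
print(f(x=3))                   # 12, via the tree-walking interpreter
for i in range(200):
    f(x=i)
print(f.compiled is not None)   # True: hot calls now run compiled code
```

A real JIT compiles to machine code rather than Python bytecode and specialises on observed types, but the shape is the same: interpret while cold, count, compile when hot, swap.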
This is Just-In-Time compilation, or JIT. JavaScript got 100× faster between 2008 and 2015 mostly because Google built V8, an aggressive JIT for JavaScript. Java has had a JIT since the late '90s. .NET has one. Python's "PyPy" is a JIT for Python (and is several times faster than CPython on heavy loops). Modern AI runtimes — including the inner loops of PyTorch's compiled mode — use JIT-style techniques to specialise for the actual shapes of tensors at runtime.
The translator stops being one strategy and becomes a continuum: pure interpretation on one end, pure compilation on the other, and everything in between is a JIT trying to figure out which lines are worth compiling early.
Photo · Possessed Photography / Unsplash
A modern compiler pipeline really is a factory: source goes in, optimisation passes happen, and machine-code rolls off the line.
Because the language was deliberately designed to not get in the way of the compiler. Every C type has a known size at compile time. Every C function has a known signature. Pointer arithmetic is allowed. There's no garbage collector watching over you. There's no runtime type checking. The compiler can reason about your program with extreme precision, generate excellent assembly, and the resulting executable speaks the chef's language directly with no translation layer at runtime.
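One concrete piece of that: C types have sizes fixed by the platform ABI before the program ever runs, which you can poke at from Python's own ctypes module. A Python int, by contrast, is a heap object whose size depends on the value it holds (a sketch; plain c_int's width is platform-defined, the fixed-width types are not):

```python
import ctypes
import sys

# Sizes the C compiler knows at compile time.
for name, ctype in [("int32_t", ctypes.c_int32),
                    ("int64_t", ctypes.c_int64),
                    ("double", ctypes.c_double)]:
    print(f"{name}: {ctypes.sizeof(ctype)} bytes")

# A Python int is a full heap object; bigger values need more bytes.
print("Python int 1:", sys.getsizeof(1), "bytes")
print("Python int 10**100:", sys.getsizeof(10**100), "bytes")
```

That fixed layout is exactly what lets the C compiler plan registers and memory precisely, and exactly what CPython has to discover at runtime, over and over.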
This is also why C is dangerous. The same lack of guard rails that lets the compiler optimise your code aggressively also lets you walk straight off a cliff. The overwhelming majority of serious memory-safety bugs filed against OS kernels — buffer overflows, dangling pointers, use-after-free — trace back to C and its derivatives. We pay for the speed.
Modern alternatives like Rust try to keep C-level speed while making the worst categories of bugs literally impossible to express. We'll come back to that argument in Phase 7.
AI training and inference live or die on the inner-loop speed of a few specific operations: large matrix multiplications, convolutions, attention. Those operations are written in C++/CUDA, compiled aggressively, and called from Python only once per "step". The Python for loop never sees a single floating-point number.
This is the trick of modern numerical Python. Your code looks like x = matrix @ vector. That single line dispatches into a precompiled C++ kernel that runs at full chef-speed for milliseconds. Then you get one slow Python step, then another precompiled kernel, and so on. Most of your running time is in the kernels — and the kernels are, by careful design, compiled ahead of time.
If you ever look at why a numerical Python program is slow, the answer is almost always: too much Python between the kernels. The kernels themselves are already as fast as they're going to get.
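The same principle can be shown without NumPy at all (a stdlib sketch; in real numerical code the fix is vectorising with NumPy or PyTorch): push the loop itself into C-implemented code instead of stepping through it in Python.

```python
import operator
import timeit

a = [float(i) for i in range(200_000)]
b = [float(i) % 7 for i in range(200_000)]

def dot_python(a, b):
    # The loop body is tiny, but the interpreter runs between
    # every pair of elements: "too much Python between the kernels".
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

def dot_pushed_down(a, b):
    # map() and sum() run their loops inside CPython's C code;
    # Python only sees the final result.
    return sum(map(operator.mul, a, b))

t_loop = timeit.timeit(lambda: dot_python(a, b), number=10)
t_push = timeit.timeit(lambda: dot_pushed_down(a, b), number=10)
print(f"python loop: {t_loop:.3f}s, pushed down: {t_push:.3f}s")
```

Both compute the same dot product; the second just keeps the per-element work out of the interpreter, which is the whole design philosophy of the kernel-calling style.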
"Compiled" and "interpreted" are not opposites. They are the two ends of one dial — and modern systems are constantly turning it.
Watch a compiler at work, end-to-end:
hello.c:
#include <stdio.h>

int main(void) {
    printf("hello, world\n");
    return 0;
}
1. cc hello.c -o hello && ./hello. Two seconds, then a binary on disk.
2. cc -S -O2 hello.c. Open hello.s. That is what the chef will actually see — a few dozen lines of ARM or x86.
3. Create hello.py with one print("hello, world"). Run python3 hello.py. No build step, no binary — but every time you run it, Python re-parses, re-builds, re-interprets.
4. time ./hello vs time python3 hello.py. Even on this trivial program, Python's startup takes longer than C's entire execution.

You now know there's a translator between you and the chef, and roughly how it works. Time to actually write something for the translator to translate.
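That last comparison, interpreter startup dominating a trivial program, can also be measured from inside Python itself (a sketch; absolute numbers depend on your machine):

```python
import subprocess
import sys
import time

# Time a full "start Python, print, exit" cycle, several times.
runs = 5
start = time.perf_counter()
for _ in range(runs):
    subprocess.run([sys.executable, "-c", "print('hello, world')"],
                   check=True, capture_output=True)
elapsed = time.perf_counter() - start
print(f"average startup+run: {elapsed / runs * 1000:.1f} ms")
```

Most of that time is the interpreter booting, importing, and tearing down, work the compiled hello binary simply never does.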
Week 11 is Hello World — the most-written program in history, dissected line by line. By the end of next week you'll know exactly what every word in #include <stdio.h> means, and why main is special.
All photos are free under the Unsplash license. Dictionary · Mick Haupt · Press · Bank Phrom · Factory · Possessed Photography. Pipeline and speed-bars are inline SVG / CSS.