Week 07 — 32-bit vs 64-bit · From Zero to AI Hero

Aerial photograph of a wide concrete highway with multiple lanes.

A wider road carries more cars per cycle. A wider register carries more bits per instruction. Same idea, smaller scale.

"Is your computer 32-bit or 64-bit?" is one of the more confusing questions in computing — partly because nobody bothered to write the answer down anywhere obvious, and partly because the answer is two things at once: how wide is the chef's hand? and how big is the warehouse the chef can address?

Both numbers happen to be the same. They didn't have to be. They are, on every modern CPU, by deep convention. And the choice between 32 and 64 has shaped, more than almost any other decision, what the last twenty years of computing have been able to do.

Two questions, one answer

Every CPU has a word size — the natural width of the registers, the width of the bus the registers ride on, the size of one "thing" the chef holds in one hand. This is the number we mean when we say a CPU is "32-bit" or "64-bit".

That single number controls two things you'd expect to be unrelated:

How big a number you can do arithmetic on in one instruction. A 32-bit ADD adds two 32-bit numbers in one cycle. To add two 64-bit numbers on a 32-bit CPU, you need two instructions and a carry — that doubles every long-arithmetic operation.
How much memory you can possibly address. An address is just a number. If your registers and pointers are 32 bits, you can write down at most 2³² different addresses. That's 4,294,967,296 bytes. Four gigabytes. Hard ceiling.
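
To make both points concrete, here is a minimal Python sketch. It imitates a 32-bit machine by masking every intermediate value to 32 bits. The helper name add64_on_32bit is made up for this illustration, and a real CPU would do this with an add followed by an add-with-carry instruction rather than with Python arithmetic.

```python
MASK32 = 0xFFFFFFFF  # the largest value a 32-bit register can hold


def add64_on_32bit(a, b):
    """Add two 64-bit numbers using only 32-bit-wide steps (illustrative)."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Step 1: add the low halves; anything above bit 31 is the carry.
    lo = a_lo + b_lo
    carry = lo >> 32
    lo &= MASK32

    # Step 2: add the high halves plus the carry from step 1.
    hi = (a_hi + b_hi + carry) & MASK32

    return (hi << 32) | lo


# Arithmetic: two 32-bit steps reproduce one 64-bit add.
x, y = 0x00000001_FFFFFFFF, 0x00000000_00000001
print(hex(add64_on_32bit(x, y)))   # 0x200000000

# Addressing: a 32-bit pointer can name this many distinct bytes.
print(2 ** 32)                     # 4294967296
```

The carry is the whole story: the low halves overflow, and the second step has to pick that overflow up, which is why the wide add costs two operations instead of one.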

The first microprocessors with 32-bit registers shipped around 1979. For decades, "you can never have more than 4 GB of RAM" was just the natural order of things. By 2005 it was a serious problem: Photoshop wanted more, video editors wanted more, databases desperately wanted more. The industry moved, awkwardly and over a decade, to 64-bit. Practically every general-purpose CPU you've used in the last fifteen years is 64-bit.

A close-up of a measuring tape with markings.

Photo · Siora Photography / Unsplash

A bit is a notch on the ruler. The more notches, the bigger the number you can write down.

Why each extra bit doubles the world

Each bit you add doubles the number of possible patterns. This sounds obvious. It is also extremely easy to underestimate.

Bits	Distinct values	Addressable memory	Feels like
8	256	256 B	a sentence
16	65,536	64 KB	a small picture
24	~16 million	16 MB	an MP3
32	~4.3 billion	4 GB	an HD movie
40	~1 trillion	1 TB	all your photos
48	~280 trillion	256 TB	a small data centre
64	~18 quintillion	16 EB	all data on Earth, several times

From 8 bits to 32 bits is 4× as many bits, but the number of values is about 16 million times larger. From 32 to 64 is only twice as many bits, yet the number of values is about 4 billion times larger. That's the doubling: each new bit doubles the addressable region. It is one of the most slowly-realised facts in software, and one of the most consequential.
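
The table's numbers are easy to regenerate. Here is a small Python sketch, assuming one address per byte and binary units (1 KB = 1024 B), which is how the table counts. The function name human_size is just for this example.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def human_size(n_bytes):
    """Express a byte count in binary units (1 KB = 1024 B)."""
    size = n_bytes
    for unit in UNITS:
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:g} {unit}"
        size /= 1024


for bits in (8, 16, 24, 32, 40, 48, 64):
    values = 2 ** bits           # one more bit, twice as many patterns
    print(f"{bits:>2} bits  {values:>26,}  {human_size(values)}")

# Adding a single bit always doubles the count.
assert 2 ** 33 == 2 * 2 ** 32
```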

The 4 GB ceiling, and why it broke an era

For the first decade of the 2000s, "32-bit" was the universal default. Windows XP, the first iPhone, a lot of game consoles, every consumer Linux. And then, slowly, programs started bumping their heads against the ceiling.

Photoshop on 32-bit Windows XP could not, by law of nature, address more than 4 GB. Uncompressed 4K video runs to roughly 33 MB per frame at 4 bytes per pixel, so 4 GB holds only a few seconds of it. Databases routinely had data sets that didn't fit. Workarounds appeared: banks, segments, and the dreaded PAE (Physical Address Extension) hack on Windows, which let the OS see more than 4 GB while each individual program could still only see 4. None of it scaled.

The 64-bit migration was the most quietly significant infrastructure shift of the 2000s. It happened mostly without users noticing — Apple did it in 2003 (the G5), Intel/AMD by 2006, iPhone in 2013 (the iPhone 5s — the first 64-bit phone). With it, the 4 GB ceiling vanished, and "memory" effectively became unlimited from the program's point of view.

How big is 16 exabytes, really

A 64-bit address space can name 2⁶⁴ different bytes: 18,446,744,073,709,551,616 of them. Sixteen exabytes. We do not have an intuition for this number.

In practice CPUs only physically wire up about 48 bits of that — 256 terabytes — because there's no point putting transistors on address lines that nothing in the universe will use. But the conceptual answer is clear: we will not run out of address space again in our lifetimes. 64-bit is, for working purposes, infinite. The next move, when it eventually comes, won't be motivated by running out of room.
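
One way to build some intuition is to ask how long it would take to touch every byte once. The sketch below is back-of-envelope arithmetic only; the 10 GB/s write speed is an assumed round number, not a measurement of any particular machine.

```python
FULL_SPACE = 2 ** 64          # bytes a 64-bit address can name
WIRED_SPACE = 2 ** 48         # bytes reachable with 48 address bits

ASSUMED_BYTES_PER_SECOND = 10 * 10 ** 9   # assumption: 10 GB/s, a round figure
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_fill(n_bytes, rate=ASSUMED_BYTES_PER_SECOND):
    """Years needed to write n_bytes once at a constant rate."""
    return n_bytes / rate / SECONDS_PER_YEAR


print(f"64-bit space: {FULL_SPACE / 2 ** 60:.0f} EB, "
      f"{years_to_fill(FULL_SPACE):.1f} years to fill")
print(f"48-bit space: {WIRED_SPACE / 2 ** 40:.0f} TB, "
      f"{years_to_fill(WIRED_SPACE) * 365.25 * 24:.1f} hours to fill")
print(f"ratio: {FULL_SPACE // WIRED_SPACE:,}x")   # 65,536x
```

At that assumed rate the 48-bit space takes hours to write once, while the full 64-bit space takes decades.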

A vast night sky filled with stars and the band of the Milky Way galaxy.

Photo · Jeremy Thomas / Unsplash

There are roughly 10²² stars in the observable universe. A 64-bit pointer addresses 1.8×10¹⁹ bytes. That's about one byte for every five hundred stars. Genuinely.

Wider registers do more in a single cycle

The other half of "64-bit" is arithmetic width. A 64-bit register holds a 64-bit integer, which can represent signed values up to about 9.2 quintillion. With one ADD instruction, the chef adds two 64-bit numbers in one cycle. Floating-point doubles fit naturally. Pointers fit naturally. Math on large numbers becomes a single op, where on a 32-bit CPU it took two or three.

Even better: many modern CPUs have much wider registers for parallel work — 128, 256, 512 bits — and special SIMD (Single Instruction Multiple Data) instructions that do the same operation across all the lanes at once. A 256-bit register can hold eight 32-bit floats; a single SIMD ADD does eight additions in the time of one. AVX-512 doubles that again. The CPU is no longer just fast; it's wide.
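
The lane arithmetic is simple division, and the "one instruction, many additions" idea can be imitated in plain Python. This is only a model of the concept: the functions lanes and simd_add are invented for illustration, and Python still performs the additions one at a time under the hood.

```python
import struct


def lanes(register_bits, element_bits):
    """How many elements of a given width fit in one register."""
    return register_bits // element_bits


def simd_add(a, b):
    """Model of a lane-wise add: one call, every lane added."""
    assert len(a) == len(b)
    return [x + y for x, y in zip(a, b)]


print(lanes(256, 32))   # 8  (eight 32-bit floats in a 256-bit register)
print(lanes(512, 32))   # 16 (a 512-bit register doubles it)

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [0.5] * 8

# Eight 32-bit floats really do occupy 256 bits when packed.
packed = struct.pack("<8f", *a)
print(len(packed) * 8)  # 256

print(simd_add(a, b))   # [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]
```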

And once you've got SIMD, you've got something else: a baby GPU. The same wide-vector idea, taken to extremes, is exactly what graphics cards and AI accelerators do — except instead of doing eight or sixteen at once, they do thousands.

Why this matters for AI

Modern neural networks famously use lower-precision math: FP16, BF16, sometimes FP8 or even 4-bit integers. People sometimes assume this is a step backwards. It's not. It's the wider-register trick, taken further.

If your registers are 64 bits and your math fits in 16 bits, you can pack four values into one register and do four operations in one cycle. Drop to 8-bit and you do eight at once. Drop to 4-bit and sixteen. The math gets a bit blurrier — but for inference workloads, blurry-but-fast is exactly the right trade. The GPUs running today's largest models are doing 16-way and 32-way parallel arithmetic per instruction, in registers that look enormous compared to the chef of 1995.
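
The packing trick can be demonstrated without special hardware. The sketch below stores four 16-bit integers in one 64-bit value and adds all four lanes with a single wide addition, using a standard masking technique so that a carry in one lane never spills into its neighbour. It uses integers rather than FP16 to keep the bit manipulation readable. The function names are made up for this example, and real accelerators do this in dedicated hardware.

```python
LANE_BITS = 16
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1          # 0xFFFF
HIGH_BITS = 0x8000_8000_8000_8000         # top bit of every 16-bit lane
LOW_BITS = 0x7FFF_7FFF_7FFF_7FFF          # everything except those top bits


def pack(values):
    """Pack four 16-bit unsigned integers into one 64-bit integer."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit integer back into four 16-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add(a, b):
    """Add four lanes at once; each lane wraps around on overflow."""
    # Add the low 15 bits of every lane: no carry can cross a lane boundary.
    low_sum = (a & LOW_BITS) + (b & LOW_BITS)
    # Fold the top bit of each lane back in with XOR (an add without carry-out).
    return low_sum ^ ((a ^ b) & HIGH_BITS)


x = pack([65535, 2, 3, 4])
y = pack([1, 20, 30, 40])
print(unpack(packed_add(x, y)))   # [0, 22, 33, 44]
```

The first lane overflows (65535 + 1 wraps to 0) and the second lane is unaffected, which is the property that makes one wide register behave like four narrow ones.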

"Faster" stopped being the only frontier. The two new ones, since 2010, are wider and more parallel.

Try it yourself

Find your own register width:

Anywhere: open a Python prompt. Type import sys; sys.maxsize. On a 64-bit Python (which is everything modern), you'll see 9223372036854775807. That's 2⁶³ − 1, the largest signed 64-bit integer.
macOS / Linux: uname -m. x86_64, arm64 or aarch64 means 64-bit. (If you ever see i386 on something newer than 2010, run.)
Windows: Settings → System → About. The "System type" line will say 64-bit.
Want to see SIMD? sysctl -a | grep hw.optional on macOS. You'll see arm64, neon, and on Apple Silicon a list of accelerators including the matrix-math instructions that 2024-era AI inference relies on.
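
If you would rather check from one script, the following Python sketch gathers similar facts using only the standard library. Note that it reports the width of the Python interpreter's own pointers, which can be 32 bits even on a 64-bit machine if a 32-bit Python is installed.

```python
import platform
import struct
import sys

# Size of a C pointer ("P") in bytes, as seen by this Python build.
pointer_bits = struct.calcsize("P") * 8

print("machine:      ", platform.machine())   # e.g. x86_64, arm64, AMD64
print("pointer width:", pointer_bits, "bits")
print("sys.maxsize:  ", sys.maxsize)
print("64-bit Python:", sys.maxsize > 2 ** 32)
```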

What's next

Phase 1 has now shown you the chef, the hierarchy, the catalogue, the cop, the streets, and the road width. There is one last actor: the second chef who lives next door, who is a thousand times less skilled but works in groups of ten thousand. That chef draws your screen, and increasingly, runs your AI.

Week 08 is The GPU & NPU — specialised brains for graphics and AI. Why one of them is doing what your CPU could not.

Photo credits

All photos are free under the Unsplash license. Highway · Roman Logov · Tape · Siora Photography · Galaxy · Jeremy Thomas. Address-space comparison and scale table are inline SVG / CSS.