How CPUs Work
A 7-minute read
The chip in your computer executes billions of instructions every second. Here's what actually happens inside that tiny silicon square.
In 1971, Intel released the 4004, the first commercial microprocessor. As Intel’s own history documents, it contained 2,300 transistors and could perform about 92,000 calculations per second. Today’s fastest consumer CPUs contain over 15 billion transistors and run at clock speeds above 5 GHz, with each core completing multiple instructions per cycle. The fundamental principles haven’t changed, but the scale is almost incomprehensible.
The short answer
A CPU (central processing unit) executes instructions by using transistors as microscopic switches that turn each other on and off in patterns representing binary data (1s and 0s). These transistors form logic gates, which combine into circuits that perform arithmetic and make decisions. The CPU runs through this cycle billions of times per second: fetch an instruction from memory, decode what it means, execute the operation, then store the result.
The full picture
The transistor: the building block
Every calculation your CPU performs happens through transistors. A transistor is a semiconductor device that can act as a switch or amplifier. When you apply voltage to one part of a transistor, it allows current to flow through another part. This on/off state represents the 1 or 0 of binary information.
Modern CPUs are built using CMOS (complementary metal-oxide-semiconductor) technology. The AMD Ryzen 9 7950X, for example, packs roughly 13.1 billion transistors across its chiplets, each one switching on and off up to 5 billion times per second. They’re etched onto silicon wafers using photolithography — what Intel’s chip-manufacturing explainer describes as “printing with light” using glass masks at scales smaller than wavelengths of visible light. These same principles of semiconductor physics power everything from the servers that run the internet to the phone in your pocket.
From transistors to logic gates
Individual transistors aren’t useful on their own. When you combine them in specific configurations, you create logic gates. An AND gate outputs 1 only if both inputs are 1. An OR gate outputs 1 if either input is 1. A NOT gate inverts the signal, turning 1 into 0 and vice versa.
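The three gate types can be modeled as tiny functions on bits (a sketch in Python; real gates are voltage levels on transistors, not function calls):

```python
# Logic gates modeled as functions on single bits (0 or 1).
def AND(a, b):
    return a & b   # 1 only if both inputs are 1

def OR(a, b):
    return a | b   # 1 if either input is 1

def NOT(a):
    return 1 - a   # inverts the bit: 1 becomes 0, 0 becomes 1
```

Every table in a digital-logic textbook is just these functions evaluated on all input combinations: `AND(1, 1)` is 1, `AND(1, 0)` is 0, `NOT(1)` is 0.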
These three gate types combine to form everything your CPU does. Adders produce each sum bit with exclusive-or logic (itself buildable from AND, OR, and NOT) and use an AND gate to detect when both input bits are 1, which generates a carry. Multiplexers use logic gates to choose between different inputs based on conditions. Memory cells use combinations of gates to store bits that persist until changed.
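The adder idea can be sketched in a few lines: a half adder handles one pair of bits, a full adder also accepts a carry, and chaining full adders (a ripple-carry adder) adds whole numbers.

```python
def XOR(a, b):
    # Exclusive-or built from AND, OR, NOT: (a OR b) AND NOT(a AND b)
    return (a | b) & (1 - (a & b))

def half_adder(a, b):
    return XOR(a, b), a & b          # (sum bit, carry bit)

def full_adder(a, b, carry_in):
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2               # carry out if either stage carried

def add4(x, y):
    """Add two 4-bit numbers by rippling the carry through four full adders."""
    carry, result = 0, 0
    for i in range(4):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result                    # final carry is discarded: 4-bit wraparound

# add4(5, 6) → 11
```

This is exactly the circuit inside the ALU, just written as code instead of wired in silicon.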
The instruction cycle
CPUs execute programs through a repeating cycle called the fetch-decode-execute cycle. The CPU’s control unit manages this process, stepping through instructions one at a time at a pace set by its clock signal.
First, the CPU fetches the next instruction from RAM, storing it in the instruction register. Then, it decodes the instruction, examining the binary pattern to determine what operation to perform (add, compare, move data, jump to a different location). Next comes execution, where the arithmetic logic unit (ALU) actually performs the calculation or operation. Finally, the result gets stored back in memory or a register, and the cycle repeats.
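The cycle can be sketched as a toy interpreter for a hypothetical three-instruction machine (not any real instruction set):

```python
def run(program, memory):
    """Repeat fetch → decode → execute → store until the program ends."""
    acc = 0                        # a single register (accumulator)
    pc = 0                         # program counter: index of next instruction
    while pc < len(program):
        op, arg = program[pc]      # fetch the instruction
        if op == "LOAD":           # decode, then execute:
            acc = arg              #   put a value in the register
        elif op == "ADD":
            acc += arg             #   arithmetic (the ALU's job)
        elif op == "STORE":
            memory[arg] = acc      #   store the result back to memory
        pc += 1                    # advance; the cycle repeats
    return memory

# run([("LOAD", 2), ("ADD", 3), ("STORE", 0)], {}) → {0: 5}
```

A real CPU does the same loop in hardware, with the control unit playing the role of the `while` loop and the clock setting its pace.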
This happens billions of times per second. A 3.5 GHz CPU completes 3.5 billion clock cycles per second. Modern CPUs can execute multiple instructions per cycle through techniques like instruction-level parallelism.
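The arithmetic is simple multiplication (the instructions-per-cycle figure below is illustrative; real IPC varies by workload and chip):

```python
def instructions_per_second(clock_hz, ipc):
    """Throughput = clock rate × average instructions completed per cycle."""
    return clock_hz * ipc

# A 3.5 GHz core averaging 4 instructions per cycle:
# instructions_per_second(3.5e9, 4) → 1.4e10, i.e. 14 billion per second
```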
Cores, threads, and parallelism
The single-core CPU that powered computers into the early 2000s hit physical limits. Increasing clock speed generated more heat than could be practically dissipated. So manufacturers added more cores instead.
Each core is essentially a complete CPU that can execute instructions independently. A quad-core CPU can handle four simultaneous streams of instructions. Your operating system presents these cores as additional processing power to your applications.
Intel’s Hyper-Threading and AMD’s Simultaneous Multithreading (SMT) add another layer: each physical core can handle two threads (or more) by sharing execution resources. An 8-core CPU with SMT appears as 16 logical processors to your operating system, though each core can only truly do so much work at once. Intel’s Hyper-Threading documentation explains how this works at the microarchitecture level.
Cache and memory hierarchy
RAM is fast, but not fast enough for a CPU running at billions of cycles per second. Waiting for data from RAM would waste hundreds of cycles each time the CPU needs something. That’s where cache comes in.
CPUs have multiple levels of cache. L1 cache is smallest (typically 32-64 KB per core) and fastest, built directly into the processor core. L2 cache is larger (256 KB to several MB) and slightly slower. L3 cache can be shared across all cores and ranges from 8 MB to over 30 MB in modern processors.
When the CPU needs data, it checks L1 first, then L2, then L3, and finally goes to RAM if not found in any cache. This happens automatically; programmers don’t explicitly manage cache, though they can write code that tends to be cache-friendly.
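The lookup order can be sketched like this (the latencies are illustrative round numbers, not measurements; real figures vary by chip):

```python
# Illustrative access costs in clock cycles (assumed for this sketch).
LATENCY = {"L1": 4, "L2": 12, "L3": 40, "RAM": 200}

def access(address, l1, l2, l3):
    """Return (level that served the request, cost in cycles)."""
    if address in l1:
        return "L1", LATENCY["L1"]
    if address in l2:
        return "L2", LATENCY["L2"]
    if address in l3:
        return "L3", LATENCY["L3"]
    return "RAM", LATENCY["RAM"]   # a miss in every cache level

# access(0x10, {0x10}, set(), set()) → ("L1", 4)
# access(0x99, set(), set(), set()) → ("RAM", 200)
```

The gap between the first and last lines of that table is why cache-friendly code, which keeps its working data in L1 and L2, can run many times faster than code that constantly reaches out to RAM.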
Why it matters
Understanding how CPUs work helps you make better decisions about the technology you buy and use. Marketing departments love to emphasize clock speed, but raw GHz tells you less than you might think.
Consider the 2020 Apple M1 chip. It runs at 3.2 GHz, slower than many Intel processors of the same era. Yet it outperformed them significantly in real tasks. The M1 executes more instructions per clock cycle and, as Apple’s technical specifications show, combines high-performance and energy-efficient cores with unified memory on the same chip. Greater per-cycle efficiency let a slower clock beat a faster one with an older design.
For everyday users, this means the CPU with the highest GHz isn’t always the fastest. Applications optimized for multiple cores benefit from more cores, while single-threaded tasks depend more on per-cycle efficiency. Your web browser benefits from a few cores; video editing and 3D rendering can use all of them.
Common misconceptions
“More GHz means a faster CPU.”
This was roughly true in the 1990s when all CPUs had similar architectures. Today, it’s misleading. A 2024 CPU at 4.0 GHz may be two to three times faster than a 2015 CPU at the same 4.0 GHz due to architectural improvements, better cache, and more efficient instruction handling. Compare processors within the same generation or manufacturer family for meaningful comparisons.
“More cores always means faster.”
Not all applications can use multiple cores. A spreadsheet recalculation or older game might only use one or two cores, leaving the others idle. Software must be specifically written to take advantage of multiple cores, a process called parallelization. Some tasks inherently can’t be parallelized and must run sequentially.
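This limit can be quantified with Amdahl's law, a standard formula: if only a fraction p of a task can run in parallel, extra cores stop helping quickly.

```python
def amdahl_speedup(p, n):
    """Overall speedup when fraction p of the work runs in parallel on n cores.

    The serial fraction (1 - p) takes the same time no matter how many
    cores you add, so it dominates as n grows.
    """
    return 1 / ((1 - p) + p / n)

# If 90% of a task parallelizes, 8 cores give roughly a 4.7x speedup, not 8x.
# If only 50% parallelizes, even 1000 cores can't quite double performance.
```

This is why doubling the core count rarely doubles real-world performance outside of fully parallel workloads like rendering.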
“CPUs keep getting faster every year.”
Clock speeds have plateaued around 5 GHz for consumer CPUs. The focus has shifted to other improvements: better efficiency, more cores, integrated graphics, and specialized processing units (like AI accelerators). The 2024 CPU in your computer is faster than the 2019 version, but not because it runs at a higher speed. It’s faster because it does more useful work per clock cycle and handles data more efficiently.