AI & ML March 10, 2026

How Neural Networks Work

A 7-minute read

Neural networks learn by failing millions of times: they make a prediction, measure how wrong they were, adjust thousands of internal numbers slightly, and repeat until the errors become small enough to be useful.

In the 1980s, researchers tried to build AI systems by writing out explicit rules: if you see these shapes arranged this way, it’s a cat. Decades of effort produced systems that could barely recognize handwriting. Then a different approach took over — instead of writing rules, you show the system millions of examples and let it figure out the rules itself. The shift felt almost like cheating. It turned out to be the breakthrough that unlocked modern AI.

The short answer

A neural network is a mathematical function made up of layers of simpler functions, with millions or billions of adjustable numbers called weights. You feed it input data, it produces an output, and then you measure how wrong that output was. A process called backpropagation traces the error backward through the network and adjusts the weights slightly so the next prediction is a little more accurate.

Do this millions of times, with millions of examples, and the weights gradually settle into configurations that capture the patterns in the data. The network hasn’t been told the rules. It has found them by being wrong a lot.

The full picture

The basic unit: a neuron

The fundamental building block is an artificial neuron (also called a node or unit). It does something simple: it takes several numbers as input, multiplies each by a weight, adds them all together, adds a bias value, and passes the result through an activation function that determines its output.

The activation function is usually something like ReLU (Rectified Linear Unit): output the input if it’s positive, output zero if it’s negative. This simple nonlinearity is crucial. Without activation functions, stacking layers of neurons would be mathematically equivalent to just one layer. The nonlinearities allow networks to approximate complicated, non-linear relationships.
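The whole computation fits in a few lines of plain Python. This is a minimal sketch with made-up weight and bias values, just to make the arithmetic concrete:

```python
def relu(x):
    # ReLU activation: pass positive values through, clamp negatives to zero
    return max(0.0, x)

def neuron(inputs, weights, bias):
    # Weighted sum of inputs, plus bias, passed through the activation
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return relu(total)

# Illustrative values only: 1*0.5 + 2*(-0.25) + 4*0.125 + 0.25 = 0.75
print(neuron([1.0, 2.0, 4.0], [0.5, -0.25, 0.125], 0.25))  # 0.75
```

Change a weight and the output shifts; training is nothing more than making millions of such adjustments automatically.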

Individually, a neuron is trivial. Connected by the thousands and stacked across many layers, they become capable of representing remarkably complex functions.

Layers: how networks are organized

A neural network consists of an input layer (which receives raw data), one or more hidden layers (which transform the data through cascades of computations), and an output layer (which produces the final answer).
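A layer is just a collection of neurons all reading the previous layer's outputs. Here is a tiny sketch of data flowing through that structure, with made-up weights (two inputs, a hidden layer of three neurons, one output):

```python
def relu(x):
    return max(0.0, x)

def layer(inputs, weights, biases):
    # One output per neuron: weighted sum of inputs + bias, through ReLU
    return [relu(sum(i * w for i, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

# Illustrative weights only
hidden_w = [[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6]]
hidden_b = [0.0, 0.1, -0.1]
out_w = [[1.0, 1.0, 0.5]]
out_b = [0.0]

x = [1.0, 2.0]                    # input layer: the raw data
h = layer(x, hidden_w, hidden_b)  # hidden layer: transformed representation
y = layer(h, out_w, out_b)        # output layer: the final answer
print(h, y)
```

Note that the third hidden neuron outputs zero here: its weighted sum came out negative, and ReLU silenced it. Which neurons fire for which inputs is exactly what training shapes.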

Each layer learns to represent the data at a different level of abstraction. In an image recognition network, early layers learn to detect edges and color gradients. Middle layers combine these into shapes and textures. Later layers combine shapes into parts of objects. The final layer combines all of this to recognize the object.

This hierarchical representation is powerful because it’s learned automatically. You don’t design the feature hierarchy; you just provide examples and let the training process discover which representations are useful.

Deep learning refers to networks with many hidden layers, typically dozens to hundreds. The “deep” refers to depth of layers, not depth of thought. More layers allow more abstract representations, which is why deep networks outperform shallow ones on complex tasks.

Training: learning by failing

Before training, a network’s weights are initialized to small random numbers. Feed in an image and it produces nonsense output.

Training is the process of adjusting those weights to make the output less wrong. It requires labeled examples: inputs where you already know the correct answer. For image recognition, thousands of images labeled with their contents.

For each training example:

  1. Pass the input through the network and get a prediction (the forward pass)
  2. Compare the prediction to the correct answer using a loss function that measures the error
  3. Use backpropagation to calculate how much each weight contributed to the error
  4. Adjust every weight slightly in the direction that reduces the error (this step is gradient descent)
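The four steps above can be sketched for the smallest possible "network": a single weight and no activation, learning the relationship y = 2x. The gradient here is written out analytically for a squared-error loss; real frameworks compute it automatically via backpropagation:

```python
# Labeled examples: inputs paired with known correct answers (here, y = 2x)
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.1    # weights start small and essentially arbitrary
lr = 0.05  # learning rate: the size of each nudge

for epoch in range(200):
    for x, y_true in data:
        y_pred = w * x            # 1. forward pass
        error = y_pred - y_true   # 2. loss is (y_pred - y_true)**2
        grad = 2 * error * x      # 3. gradient of the loss w.r.t. w (chain rule)
        w -= lr * grad            # 4. gradient descent step

print(round(w, 3))  # 2.0
```

Nothing told the program that the rule was "multiply by two"; the weight settled there because every other value produced larger errors.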

Backpropagation is the key algorithm. It applies the chain rule from calculus to trace how each weight in the network affected the final error, going backward from the output layer to the input layer. This gives you the gradient of the loss with respect to each weight: a number telling you how much changing that weight would change the error, and in which direction.

Gradient descent then nudges each weight by a small amount (determined by the learning rate) in the direction that decreases the error. The size of the nudge matters. Too large and the weights overshoot good values and bounce around. Too small and training takes forever.
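The trade-off is easy to see on the simplest possible loss, f(w) = w², whose gradient is 2w. This toy sketch runs the same descent with three learning rates:

```python
def descend(lr, steps=20, w=1.0):
    # Gradient descent on f(w) = w**2; the gradient at w is 2*w
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(descend(0.4))    # well-chosen rate: w shrinks rapidly toward the minimum at 0
print(descend(1.1))    # too large: every step overshoots, and w grows without bound
print(descend(0.001))  # too small: after 20 steps, w has barely moved
```

Real training faces the same three regimes, just in billions of dimensions at once.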

Why training requires so much data and compute

A large neural network has billions of weights. Each one needs to be adjusted through thousands of iterations of gradient descent to reach useful values. Training a large language model takes an enormous number of mathematical operations (on the order of 10²³ floating-point operations for a GPT-3-scale model), which is why it costs millions of dollars in computing time and requires specialized hardware (GPUs and TPUs).

But there’s a deeper reason for needing so much data. A network only generalizes to examples it hasn’t seen if it has seen enough variety during training to learn the underlying pattern rather than memorizing specific examples. A face recognition system trained only on photos taken in bright daylight will fail in dim lighting, because it learned “face in good light” rather than “face.”

Overfitting is the failure mode where a network memorizes training examples rather than learning patterns. Techniques like dropout (randomly disabling neurons during training), data augmentation (artificially creating more varied examples), and careful train/test splits are used to detect and prevent it.
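The dropout idea is simple enough to sketch in a few lines. This assumes the common "inverted dropout" convention: each activation is zeroed with some probability during training, and the survivors are scaled up so the expected total stays the same:

```python
import random

def dropout(activations, p_drop, training=True):
    # During training: zero each activation with probability p_drop, and
    # scale survivors by 1/(1 - p_drop) so the expected sum is unchanged.
    # At inference time the layer passes everything through untouched.
    if not training:
        return activations
    keep = 1.0 - p_drop
    return [a / keep if random.random() < keep else 0.0 for a in activations]

random.seed(0)  # seeded only so the example is repeatable
print(dropout([1.0, 2.0, 3.0, 4.0], p_drop=0.5))  # roughly half zeroed, rest doubled
```

Because a different random subset of neurons is silenced on every training step, no single neuron can be relied on to memorize a specific example.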

Modern architectures: transformers and convolutions

Not all neural networks have the same structure. Convolutional neural networks (CNNs) are specialized for image data: they apply filters that slide across the image, detecting local patterns regardless of position. This matches the structure of image data well and dramatically reduces the number of weights needed.
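The sliding-filter idea can be shown in miniature on a 1-D "image". The same small filter (a shared set of weights) is applied at every position, so it detects its pattern wherever the pattern occurs:

```python
def convolve1d(signal, kernel):
    # Slide the kernel across the signal: one dot product per position
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# An edge-detecting filter [-1, 1] responds wherever the signal jumps
signal = [0, 0, 0, 5, 5, 5, 0, 0]
print(convolve1d(signal, [-1, 1]))  # [0, 0, 5, 0, 0, -5, 0]
```

Two weights suffice to find every edge in the signal, no matter where it sits. A fully connected layer would need a separate weight for every position; this weight sharing is why CNNs need so many fewer parameters.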

Transformers, introduced in 2017 by Ashish Vaswani and colleagues at Google Brain in their landmark paper “Attention Is All You Need”, power most modern language models. Their key innovation is an attention mechanism that allows each part of the input to selectively focus on other parts, capturing long-range dependencies. When processing a sentence, a transformer can learn that “it” refers to a specific noun mentioned fifteen words earlier.
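The heart of that paper, scaled dot-product attention, can be sketched for a single head with tiny made-up vectors. Each query scores every key, the scores become weights via softmax, and the output is a weighted average of the values:

```python
import math

def softmax(xs):
    # Exponentiate and normalize so the scores sum to 1
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    # Scaled dot-product attention: each query scores every key, and the
    # output is the score-weighted average of the value vectors
    d = len(keys[0])
    out = []
    for q in queries:
        scores = softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                          for k in keys])
        out.append([sum(s * v[j] for s, v in zip(scores, values))
                    for j in range(len(values[0]))])
    return out

# Toy vectors: the query matches the first key far more strongly,
# so the output is dominated by the first value vector
q = [[10.0, 0.0]]
k = [[10.0, 0.0], [0.0, 10.0]]
v = [[1.0, 0.0], [0.0, 1.0]]
print(attention(q, k, v))  # close to [[1.0, 0.0]]
```

In a real transformer the queries, keys, and values are themselves produced by learned weights, so the network learns *what* to attend to; that is how "it" can reach back fifteen words to its noun.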

The black box problem: why neural networks can’t explain themselves

There’s a troubling aspect to neural networks that gets less attention than their capabilities: nobody — including the researchers who built them — fully understands why they work as well as they do, or what they’re actually doing internally.

This is called the black box problem or interpretability problem. A neural network is not a set of logical rules that a human wrote. It's a vast array of billions of floating-point numbers, each one the weight of a single connection. The network produces a result, but tracing why it produced that result is genuinely hard.

This matters for two reasons. First, it makes errors hard to predict. A neural network that can accurately identify tumors in medical scans may fail catastrophically on images taken with a different type of scanner, because it learned to use subtle artifacts of the original scanner as shortcuts rather than genuinely learning to see tumors. Self-driving cars face a similar version of this problem with edge cases on real roads. This is called shortcut learning, and it’s been demonstrated in medical AI, self-driving research, and NLP systems.

Second, it makes bias hard to detect and fix. A hiring model trained on historical data might encode the preferences of whoever made past hiring decisions, including any systematic discrimination, without containing a single explicit rule about race or gender. The bias is distributed invisibly across millions of weights.

Interpretability research is an active field trying to solve this. Techniques like SHAP values, attention visualization, and probing classifiers try to reveal what information a network uses when making a decision. But even the most sophisticated interpretability tools give incomplete answers. We can see which input features seem to matter most, but we can’t read the “reasoning” the network used the way we might trace a decision through a traditional rule-based system. This is perhaps the defining unsolved problem of modern AI.
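The simplest version of the "which features matter" question can be probed directly. This is not SHAP, just an occlusion-style sensitivity check: blank out one input at a time and watch how much the output moves. The model here is a hypothetical stand-in (a plain linear function) so the example stays self-contained:

```python
def sensitivity(model, inputs, baseline=0.0):
    # Occlusion-style probe: replace one input at a time with a baseline
    # value and record how far the model's output moves. A larger change
    # means that feature mattered more for this particular prediction.
    original = model(inputs)
    changes = []
    for i in range(len(inputs)):
        occluded = inputs[:i] + [baseline] + inputs[i + 1:]
        changes.append(abs(model(occluded) - original))
    return changes

# Hypothetical stand-in model that leans heavily on its second input
model = lambda x: 0.1 * x[0] + 5.0 * x[1] + 0.2 * x[2]
print(sensitivity(model, [1.0, 1.0, 1.0]))  # roughly [0.1, 5.0, 0.2]
```

Probes like this reveal *where* a model is looking, but not *why*; that gap is the unsolved part.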

Why it matters

Neural networks have become the dominant approach for a wide range of tasks that previously seemed beyond the reach of computers: medical image diagnosis, real-time language translation, voice recognition, protein structure prediction, and generating text and images.

The surprising thing isn’t just that they work. It’s that the same basic architecture (neurons, layers, backpropagation) scales: larger networks trained on more data consistently perform better. This scaling law — formalized in 2020 by Jared Kaplan and colleagues at OpenAI — is what drove the investment in large language models and the current wave of AI capability. It turned out that just making the networks bigger and feeding them more text was enough to produce systems that can reason, write, and code.

Common misconceptions

Neural networks work like the human brain. This is misleading. Artificial neurons are loosely inspired by biological neurons, but the analogy stops there. The brain involves chemical signals, timing, plasticity, and billions of cells interacting in ways we still don’t fully understand. Neural networks are mathematical functions, not brain models.

Neural networks understand what they’re processing. They don’t. A network that recognizes cats in images has no concept of “cat” as an animal, only statistical patterns in pixel values. When it fails, it fails in ways that reveal it never understood semantics, only correlations in training data.

More data always makes neural networks better. Up to a point. If the additional data is redundant or low quality, training can plateau or even degrade. Networks also need diverse data that represents the full range of situations they’ll encounter in practice.

Neural networks can be fully understood if we examine them closely enough. Despite intense research, neural networks remain largely opaque. A network with billions of weights encodes knowledge in ways that resist simple interpretation. We can observe behavior, but explaining exactly why a network makes a specific decision is often genuinely difficult.