AI & ML March 11, 2026

How Fine-Tuning Works

A 7-minute read

GPT-4 didn't come out of the box knowing how to write code or answer medical questions. Fine-tuning is how we teach base models specialized skills, and it's the difference between a general-purpose tool and an expert.

A base language model like GPT-4 or Claude is remarkably capable out of the box. It can write poetry, explain physics, and debug code. But it wasn’t trained to be any of those things specifically. What it learned during training was the structure of language itself: grammar, reasoning patterns, world knowledge encoded in statistical relationships. Fine-tuning is how we take that general foundation and shape it into something specialized.

The short answer

Fine-tuning is the process of taking a pre-trained model and continuing its training on a specific dataset to modify its behavior. The base model has already learned general language patterns from massive data; fine-tuning adjusts its parameters to specialize in a particular task or style. A model fine-tuned on medical texts learns to discuss symptoms and treatments. A model fine-tuned on customer service chats learns to be polite and helpful. The base capabilities stay, but the focus sharpens.

The full picture

Why fine-tuning exists

Training a large language model from scratch costs millions of dollars and requires enormous computational resources. Large foundation models are trained on massive corpora with huge compute budgets, a pattern documented in reports such as the GPT-4 Technical Report and the PaLM paper.

But you don’t need to start from zero to teach a model something new. The model already understands language deeply. It knows grammar, reasoning, and a vast amount of world knowledge. What’s missing is specialization. Fine-tuning leverages all that pre-existing knowledge and redirects it toward a specific task.

The efficiency gains are enormous. Fine-tuning might require a few hours on a single GPU rather than months on a cluster. This is what makes it practical for companies to create specialized AI assistants without building foundation models from scratch.

How fine-tuning actually works

The process builds on the same core mechanism as initial training: gradient descent. You show the model examples of what good output looks like, measure how wrong its current outputs are, and adjust the model’s parameters to be less wrong next time.
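That loop can be sketched in miniature. This is a toy illustration only, not a real training setup: a single scalar parameter stands in for billions of weights, and a squared-error loss stands in for a language-modeling loss.

```python
# Toy fine-tuning loop: show examples, measure how wrong the model is,
# nudge the parameter downhill. Real fine-tuning does the same thing
# across billions of parameters.

def loss(w, examples):
    # Mean squared error between the model's output (here just w * x)
    # and the desired output y for each (x, y) example.
    return sum((w * x - y) ** 2 for x, y in examples) / len(examples)

def gradient(w, examples):
    # Derivative of the mean squared error above with respect to w.
    return sum(2 * (w * x - y) * x for x, y in examples) / len(examples)

def fine_tune(w, examples, lr=0.1, steps=100):
    for _ in range(steps):
        w -= lr * gradient(w, examples)  # adjust to be "less wrong"
    return w

# A "pre-trained" parameter, specialized on examples where y = 2x.
w = fine_tune(0.5, [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```

After a few dozen steps, w converges to 2.0: the parameter has been pulled toward whatever the fine-tuning examples demonstrate.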

In practice, fine-tuning typically uses one of two approaches:

Supervised fine-tuning (SFT) is the simpler method. You prepare a dataset of input-output pairs (questions and answers, or prompts and desired responses) and train the model on these examples directly. The model learns to produce outputs similar to your examples.
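Concretely, SFT data is often shipped as JSONL, one prompt-response pair per line. The field names below are assumptions for illustration; the exact schema varies by framework and provider.

```python
import json

# Hypothetical SFT dataset: each record pairs a prompt with the
# response we want the model to imitate.
examples = [
    {"prompt": "Summarize: The meeting moved to Friday.",
     "response": "Meeting rescheduled to Friday."},
    {"prompt": "Summarize: Shipping is delayed by two days.",
     "response": "Shipping delayed by two days."},
]

# Many fine-tuning pipelines accept JSONL: one JSON object per line.
jsonl = "\n".join(json.dumps(ex) for ex in examples)
```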

Reinforcement Learning from Human Feedback (RLHF) adds a layer. After initial fine-tuning, human raters rank different model outputs from best to worst. A separate “reward model” learns to score outputs based on these rankings. The original model is then fine-tuned to maximize its scores according to this reward model, as described in OpenAI’s InstructGPT paper. This is how chat assistants became more helpful rather than merely fluent.
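Reward models of this kind are typically trained with a pairwise ranking loss: the loss is small when the human-preferred output scores higher than the rejected one. A minimal sketch in plain Python, where the scalar scores stand in for a real reward model's outputs:

```python
import math

def pairwise_ranking_loss(score_chosen, score_rejected):
    # Bradley-Terry style ranking loss: -log(sigmoid(chosen - rejected)).
    # Near zero when the preferred output already scores higher;
    # large (strong training signal) when the model has it backwards.
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Reward model already prefers the chosen output: small loss.
good = pairwise_ranking_loss(2.0, -1.0)
# Reward model ranks them backwards: large loss.
bad = pairwise_ranking_loss(-1.0, 2.0)
```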

Parameter-efficient fine-tuning: doing more with less

Full fine-tuning updates every parameter in a model. Billions of them. This is expensive and risks "catastrophic forgetting," where the model loses capabilities it originally learned.

LoRA (Low-Rank Adaptation) and similar techniques solve this differently. Instead of modifying the model’s core parameters, they add small trainable matrices alongside the existing weights, following the approach from the original LoRA paper. During fine-tuning, only these small matrices are updated. The base model stays frozen.

When you run inference, the small matrices can be merged into the base weights, or applied alongside them. The result behaves as if the model had been fully fine-tuned, but you only trained a tiny fraction of the parameters.
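A toy sketch of the arithmetic. Real LoRA applies this to attention weight matrices with dimensions in the thousands and small rank r, which is where the savings become dramatic; here everything is shrunk to a 4x4 example.

```python
# Toy LoRA sketch: a frozen 4x4 weight matrix W plus a rank-1 update
# built from two small trainable matrices, B (4x1) and A (1x4).
d, r = 4, 1
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
B = [[0.1], [0.2], [0.0], [0.3]]   # d x r, trainable
A = [[0.5, 0.0, 0.5, 0.0]]         # r x d, trainable

# Merge for inference: W' = W + B @ A
merged = [[W[i][j] + sum(B[i][k] * A[k][j] for k in range(r))
           for j in range(d)] for i in range(d)]

trainable = d * r + r * d   # parameters in B and A (8 here)
total = d * d               # parameters in W (16 here)
```

At realistic sizes (d in the thousands, r around 8 or 16) the trainable fraction drops below one percent, which is the whole point.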

This approach has practical benefits: faster training, lower memory usage, and easier maintenance of multiple specialized versions (just swap the small adapter matrices).

What fine-tuning can and can’t do

Fine-tuning excels at shaping style and focus. You can make a model sound more formal, answer in a specific format, or focus on a domain like law or medicine. The model absorbs the patterns in your training data.

But fine-tuning has limits. It is unreliable at adding genuinely new knowledge. If the base model doesn't know who won the 2022 World Cup, fine-tuning on a small set of World Cup facts rarely produces dependable recall: the information wasn't in its training data, and a typical fine-tuning dataset is far too small to teach it robustly. Fine-tuning shapes what the model already knows; it is a poor vehicle for injecting new facts.

When fine-tuning makes sense

Fine-tuning is worth the effort in several scenarios:

Consistency matters more than flexibility. If you need a model to always respond in a specific format or tone, fine-tuning is more reliable than prompting.

Domain expertise is required. A model fine-tuned on medical literature will use the right terminology and follow clinical reasoning patterns more consistently.

Scale demands efficiency. Running a 50-example prompt for every query costs more in API tokens than fine-tuning a model once and calling it with simple prompts.

Specialized output format. If you need structured JSON outputs, code in a specific framework, or consistent formatting, fine-tuning is more dependable.
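The scale argument is simple arithmetic. A back-of-envelope sketch, with made-up prices and sizes (not real API rates), of what a long few-shot prompt costs compared with a short prompt to a fine-tuned model:

```python
# Illustrative token math; all numbers here are assumptions.
few_shot_prompt_tokens = 50 * 80   # 50 in-prompt examples x ~80 tokens each
simple_prompt_tokens = 60          # a fine-tuned model needs only a short prompt
queries_per_day = 10_000

extra_tokens_per_day = (few_shot_prompt_tokens - simple_prompt_tokens) * queries_per_day
# At a hypothetical $1 per million input tokens:
extra_cost_per_day = extra_tokens_per_day / 1_000_000 * 1.00
```

Under these assumptions the few-shot approach burns roughly 39 million extra tokens a day; a one-time fine-tune amortizes quickly at that volume.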

Why it matters

The difference between a base model and a fine-tuned model is the difference between a talented generalist and a specialist. Base models are extraordinarily capable but unfocused. Fine-tuning is how we focus them.

As AI becomes embedded in every product and workflow, fine-tuning is how companies build competitive advantages. Your fine-tuned model isn’t just using the same foundation as everyone else. It’s been shaped by your data, your use cases, and your requirements.

The result is an AI that doesn’t just know how to answer questions, but knows how to answer your questions, in your context, the way you need them answered.

Common misconceptions

“Fine-tuning is only for experts with large datasets.” This is outdated. Techniques like LoRA make fine-tuning cheap enough to run on a single GPU, and useful results are possible with just hundreds of well-chosen examples. Many companies fine-tune models effectively using data they already have, without needing massive labeled datasets.

“A fine-tuned model learns new facts from your training data.” Fine-tuning primarily shapes how a model expresses what it already knows; it is unreliable at injecting new information. If the base model doesn’t know something, a small fine-tuning set rarely teaches it dependably. What the model absorbs is mostly patterns of style, format, and focus rather than new factual knowledge.

“Fine-tuning replaces the need for prompt engineering.” They serve different purposes. Fine-tuning establishes consistent behavior, style, and task focus, while prompt engineering still matters for guiding specific outputs. The best results often come from combining both approaches.