At some point in every big project, two people will look at the same piece of work and one will say "we're done" and the other will say "we haven't even started." Both will be completely right. And that is the problem this chapter is about.
"Done" sounds like it should be simple. You either finished the thing or you didn't. But in practice, the word carries so much hidden baggage that it becomes one of the primary ways projects blow up. Timelines slip not because engineers work slowly, but because "done" kept meaning different things to different people across the life of the project. The engineers thought done meant "the code is written and tests pass." The product manager thought done meant "users can actually do the thing." The VP thought done meant "the metrics moved." Legal thought done meant "the privacy review is signed off." Nobody was wrong. They were just answering different questions.
This chapter is about making "done" a precise, shared, and negotiated thing before you start — not something everyone assumes they agree on until the deadline arrives and they realize they never did.
The Three Things "Done" Usually Means
Let's start by pulling the word apart. When someone says a project is "done," they usually mean one of three very different things. Understanding which one they mean — and which one you need — is the first move.
Success Criteria
This is measured in outcomes: did checkout conversion improve? Did on-call pages drop? Did new user activation increase? Success criteria live in the world of results. They are typically owned by the person who originated the project — a product manager, a director, an engineering leader. You might ship perfectly functioning code and still fail the success criteria because the code didn't actually change user behavior the way you hoped.
Acceptance Criteria
This is measured in behaviors: does the system do what the spec says? Does the feature work across all the edge cases we agreed to handle? Does the API return the right shape of data? Acceptance criteria live in the world of requirements. They are typically owned by the engineers and QA, verified through testing. You can pass every acceptance criterion and still not move the metrics, because the spec itself was wrong.
Launch Criteria
This is measured in readiness: is there a rollback plan? Is the monitoring in place? Did security review it? Does customer support know it's coming? Launch criteria live in the world of operational safety. They are typically owned by the infrastructure team, security, legal, and your ops org. You can pass acceptance criteria and have a good shot at success criteria but still not be allowed to launch because the rollout plan is missing.
Notice that these three are completely independent. A project can pass one and fail the others. A project can pass two and stall on the third. Most of the time, when a project is "done" but somehow still not going anywhere, it's because it's finished from one frame but stuck in another.
A team spends twelve weeks rebuilding an internal data pipeline. The new pipeline is faster, cleaner, better tested, and more maintainable. Engineers are proud of it. From an acceptance criteria standpoint it's done. But the business metric the pipeline was supposed to improve — reporting latency for the analytics team — is unchanged, because it turned out the pipeline wasn't the bottleneck. The dashboard queries were. Nobody checked. The project is "done" and a complete failure simultaneously. Success criteria were never written down at the start.
Why Everyone Skips This Step
If defining done is so important, why don't teams do it? The honest answer is that it's uncomfortable. Writing down specific, measurable criteria forces you to commit to something. It makes failure concrete and visible. As long as "done" remains fuzzy, there's always a way to declare victory. Nobody wants to stand in front of their director six months later and say "here is the exact metric we said we'd move, and we didn't move it."
There's also a subtler problem. The people who kick off a project often don't fully know what success looks like. The project started because someone had a strong feeling that something was wrong, or that an opportunity existed, but the details are fuzzy. Writing done criteria forces you to surface this fuzziness early, and that's embarrassing. It's much more comfortable to say "we'll know it when we see it" and start building.
But "we'll know it when we see it" is a trap. By the time you see it, you've already spent six months building the wrong thing. Or worse — you've built exactly the right thing but nobody can agree that you did, because the goalposts moved while you were building.
How to Write Success Criteria That Are Actually Useful
Good success criteria have three properties: they are measurable, they have a timeframe, and they include a baseline. Without all three, you don't have a success criterion — you have a wish.
Measurable
"Improve performance" is not measurable. "Reduce median API response time from 420ms to under 200ms" is measurable. "Make the checkout flow better" is not measurable. "Increase checkout completion rate from 61% to 68%" is measurable. The test is simple: could two people with access to the same data independently determine whether this criterion was met? If yes, it's measurable. If the answer depends on judgment, it's not.
This sounds obvious but it's genuinely hard to do in practice, because most projects start with goals that are stated in qualitative language. "Improve developer productivity." "Reduce customer friction." "Make the system more reliable." These are not goals — they're directions. Your job is to turn directions into destinations. That means having the uncomfortable conversation where you ask "by how much?" and "as measured by what?" until someone gives you a number they're willing to be held to.
Timeframe
"Increase retention" with no timeframe attached is useless. Retention might go up three years after your project ships because of compounding improvements from a dozen other initiatives. Does that count? Without a timeframe you can't answer that question. Good success criteria say "within 60 days of launch" or "measured over the 30-day cohort that first encounters the feature." The timeframe forces you to think about causality — you're claiming that your specific project will cause this specific change within this specific window.
Baseline
You cannot measure improvement without knowing where you started. This sounds so obvious that it barely deserves saying, except that teams skip it constantly. Before you start building, write down the current state of every metric you claim you will change. Measure it. Screenshot it. Store it. The reason is not just intellectual honesty — it's self-defense. Without a baseline, someone will always be able to argue that the metric was already moving before your project launched, or that you can't tell your contribution apart from something else. A baseline is your before-and-after photograph.
Write it like this, every time
State the metric
Name the exact thing you will measure. Be specific enough that two people independently know how to look it up. "User-reported satisfaction score (CSAT) from post-purchase survey" is better than "satisfaction."
Record the current baseline
Measure it now, before anything changes. Write down the date, the methodology, and the number. "As of [date], measured via [method], the baseline is [X]."
Name the target
State the number you are committing to reach. Be specific. "Improve by 10%" is weaker than "reach 72%." When the target is a relative improvement, state both the relative and absolute form.
Set the measurement window
Say exactly when you will check. "We will measure this over the 30 days following full rollout to 100% of users, starting no later than [date]."
Name who owns the call
Someone must be responsible for declaring whether this criterion was met. Without an owner, the measurement never happens or nobody acts on it.
Here is what this looks like fully written out: "Checkout completion rate, currently 61.3% (measured via [analytics platform] over the 30 days ending [date]), will reach 68% or above as measured over the 30-day window following full rollout, expected by [target date]. Measurement is owned by [person]."
Is that more work than writing "improve checkout"? Yes. Does it force uncomfortable conversations about whether 68% is actually achievable? Yes. Does it mean someone will be clearly accountable if it doesn't happen? Yes. All of those things are features, not bugs. The project that started with that criterion will be run very differently than the project that started with "improve checkout" — and it will have a much higher chance of actually improving checkout.
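If your team keeps its done document in a repository, the same criterion can also be captured as structured data so the baseline, target, window, and owner never get lost. Here is a minimal sketch in Python — the field names, dates, and owner are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class SuccessCriterion:
    """One success criterion: metric, baseline, target, window, and owner."""
    metric: str           # the exact thing being measured
    baseline: float       # current value, measured before work begins
    baseline_date: date   # when the baseline was measured
    baseline_method: str  # how it was measured, so it can be reproduced
    target: float         # the value we are committing to reach
    window_days: int      # measurement window after full rollout
    owner: str            # who declares whether the criterion was met

# Hypothetical entry mirroring the checkout criterion above.
checkout_completion = SuccessCriterion(
    metric="Checkout completion rate",
    baseline=61.3,
    baseline_date=date(2024, 1, 15),   # illustrative date
    baseline_method="analytics platform, trailing 30 days",
    target=68.0,
    window_days=30,
    owner="Jane Doe (PM)",             # illustrative owner
)
```

The point of the structure is not the code — it's that every field is mandatory, so a criterion with a missing baseline or owner is visibly incomplete.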
Acceptance Criteria: The Spec Problem
While success criteria answer "did we achieve the goal," acceptance criteria answer "did we build the thing we said we'd build." They live closer to the engineering side and they're usually more comfortable to write, because they deal with observable behavior rather than business outcomes. But they have their own failure modes.
The Incomplete Spec
The most common problem is that acceptance criteria only cover the happy path. Someone writes down what the feature should do when everything goes right, and leaves blank what should happen when things go wrong. What happens when a user submits the form twice? What happens when the third-party payment processor times out? What happens when the input is malformed? What happens when the user's session expires mid-flow?
These edge cases aren't exotic hypotheticals — they are the normal operating conditions of a live system. Leaving them out of the acceptance criteria doesn't make them go away; it just means they get discovered in production, at the worst possible time, with real users affected. The way to prevent this is to treat edge cases as first-class citizens in your acceptance criteria, not as afterthoughts.
Acceptance criteria written only for the happy path will pass QA and fail production. Every edge case you leave out of the spec is a future incident you're agreeing to have. The cost of writing it down now is one hour. The cost of leaving it out is one 3am page and an emergency patch.
The Ambiguous Spec
The second problem is ambiguity. "The page should load quickly" is an acceptance criterion that can mean anything from under 100ms to under 10 seconds depending on who you ask. "The form should validate user input correctly" tells you nothing about what "correctly" means. Every piece of language that requires judgment to interpret is a future argument waiting to happen.
The cure is to write acceptance criteria as if you were writing automated test cases — because you basically are. "The page load time (Time to First Contentful Paint) should be under 1.5 seconds on a simulated 4G connection" is unambiguous. "All required form fields should display an inline error message within 200ms of the user attempting to submit with them empty" is unambiguous. You should be able to write a test that passes or fails mechanically, without a human making judgment calls about what the criterion means.
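As a sketch of what "mechanically verifiable" means in practice, here is roughly what two such criteria might look like as automated tests. This assumes pytest and the requests library, and the endpoint paths, field names, and status codes are hypothetical — the shape is what matters: each test passes or fails with no judgment call.

```python
import requests  # assumes the service under test is reachable at BASE_URL

BASE_URL = "http://localhost:8080"  # hypothetical test environment

def test_invalid_token_returns_4xx_with_readable_message():
    """Criterion: requests with an invalid token get a 4xx response
    whose body contains a non-empty, human-readable error message."""
    resp = requests.get(
        f"{BASE_URL}/api/orders",
        headers={"Authorization": "Bearer not-a-real-token"},
        timeout=5,
    )
    assert 400 <= resp.status_code < 500   # a client error, not a 500
    body = resp.json()
    assert body.get("error")               # message exists and is non-empty

def test_required_fields_rejected_when_empty():
    """Criterion: submitting with required fields empty is rejected,
    and each missing field is named in the response."""
    resp = requests.post(f"{BASE_URL}/api/checkout", json={}, timeout=5)
    assert resp.status_code == 422
    missing = {e["field"] for e in resp.json()["errors"]}
    assert {"email", "shipping_address"} <= missing  # illustrative field names
```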
The Ownership Gap
A third, less obvious problem is that acceptance criteria often don't say who is responsible for verifying each one. If nobody is explicitly responsible for checking whether "the API returns a 4xx error with a human-readable message when given an invalid token," that check will be skipped. Not because anyone is lazy — because people are busy and they work on what they're explicitly responsible for. Make a table. Each criterion gets an owner. The owner is responsible for writing the test, running the test, and signing off that it passed.
Launch Criteria: The Gate Nobody Talks About Until It's a Problem
Launch criteria are the things that must be true before you can ship to real users. They are distinct from both success criteria (what you hope to achieve) and acceptance criteria (whether the thing works as specified). They are about operational readiness: is the system safe to put in front of people?
Most teams handle launch criteria informally. Someone who has been around a while remembers to ask about the rollback plan. Someone else remembers that Legal needs to review anything that touches user data. Someone remembers that Customer Support needs to be briefed. This works until it doesn't — until the person who remembered to ask about the rollback plan is on vacation, or the team is new and nobody knows that Legal needs to review.
The solution is a checklist. Not a long, bureaucratic, fill-every-field document — a short, clear list of non-negotiables that is agreed upon at the start of the project and revisited at launch time. Here is a baseline version that works for most product and infrastructure projects:
- Monitoring and alerts: Metrics are in dashboards. Alerts are configured. Someone is responsible for the first 72 hours of on-call after launch.
- Rollback plan: A written procedure exists for reverting to the previous state. It has been tested in staging. The estimated time to rollback is known.
- Rollout plan: Traffic is not going to 100% on day one unless explicitly decided. The rollout sequence (1% → 10% → 50% → 100%) is written down with criteria for advancing each step.
- Data migrations: If the launch involves a schema change or data migration, the migration has been run in staging. Forward and backward migrations both exist.
- Support readiness: Customer Support has been briefed on the new behavior. A FAQ or internal wiki page exists covering the most common questions they will receive.
- Legal and privacy review: If the feature collects, processes, or exposes user data differently than before, the privacy review is signed off. Any required consent flows are in place.
- Dependencies confirmed: Every external dependency (third-party APIs, partner teams, infrastructure changes) has been verified ready. No items are "assumed ready."
- Incident playbook: A short runbook exists for the most likely failure mode. Engineers who are not the author have reviewed and understand it.
You will customize this for your project and your organization. Some projects won't need a privacy review. Some will need load testing results and others won't. The point is not the specific items — it's the habit of making launch criteria explicit so they can be verified rather than assumed.
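One lightweight way to make the checklist verifiable rather than assumed is to keep it as data with a simple gate check. A minimal sketch, assuming your team tracks this in code at all — the item names, owners, and required flags are placeholders:

```python
from dataclasses import dataclass

@dataclass
class LaunchItem:
    name: str
    owner: str             # who verifies this item
    required: bool         # required items block launch; the rest are expected
    verified: bool = False

# Illustrative checklist; adapt the items to your project.
checklist = [
    LaunchItem("Monitoring dashboards and alerts configured", "sre-team", required=True),
    LaunchItem("Rollback procedure written and tested in staging", "eng-lead", required=True),
    LaunchItem("Staged rollout plan written down with advancement criteria", "eng-lead", required=True),
    LaunchItem("Customer Support briefed, FAQ published", "support-lead", required=False),
    LaunchItem("Privacy review signed off", "legal", required=True),
]

def launch_blockers(items):
    """Return the required items that have not been verified yet."""
    return [i.name for i in items if i.required and not i.verified]

blockers = launch_blockers(checklist)
if blockers:
    print("Launch blocked by:", *blockers, sep="\n  - ")
else:
    print("All required launch criteria verified.")
```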
Define your launch criteria at the beginning of the project, not the week before you plan to ship. When launch criteria are written late, they get negotiated under time pressure. People skip items because the deadline is tomorrow. When they're written at the start, they become part of the project plan — with budget and time allocated to satisfy them.
The MVP Trap: When "Minimum" Becomes an Excuse
The concept of the Minimum Viable Product is one of the most useful and most abused ideas in modern software development. When it works, the MVP is a disciplined technique for learning quickly: build the smallest thing that can test your most important assumption, ship it to real users, learn from what happens, and iterate. When it fails, the MVP is an excuse: it justifies cutting every corner, removing every quality bar, and shipping something that doesn't actually work well enough to generate meaningful learning.
The difference between a good MVP and a bad one comes down to whether the "minimum" was chosen deliberately. A good MVP answers one specific question. A bad MVP is just a project where everything you didn't have time to build got labeled "V2."
The "V2" Graveyard
Here is a dynamic that plays out on almost every team at some point. During planning, the project scope gets large and the timeline gets uncomfortable. Someone suggests "we can ship an MVP in the original timeline and do the rest in V2." Everyone agrees because the pressure is real and the alternative — pushing the date — feels worse. V1 ships. The team celebrates, then immediately moves on to the next project, because there is always a next project. V2 never ships. The features that got deferred in planning live forever in the backlog, gradually becoming irrelevant as the product evolves around them. The users who needed those features either found workarounds or churned.
The reason V2 doesn't ship is almost never laziness or bad intentions. It's because V2 was never a real commitment. It was a placeholder for "the stuff we don't have time for now" dressed up as a plan. When you write "V2" in a planning doc without a committed timeline and a named owner, what you are actually writing is "this will probably never happen."
Before you agree to defer something to V2, ask two questions. First: does the product still deliver meaningful value to users without this feature, or are we just shipping an incomplete experience? Second: who is committed to owning V2, and by when? If you can't answer both clearly, the thing you're deferring should either be in scope or explicitly cut — not deferred.
The "Minimum" Part Is the Hard Part
Most teams get the "viable" part of MVP wrong in one direction — they cut so much that the product isn't actually viable. But some teams get the "minimum" part wrong in the other direction — they can't bring themselves to cut anything, so the "MVP" ends up being a full product that just happens to be called an MVP.
The question that unlocks the minimum is: what is the one question we are trying to answer with this launch? Not five questions. One. Everything that doesn't help you answer that question is scope that could be cut or deferred, at least for this launch. If you can't name the single question, you don't have an MVP — you have a product launch, and that's fine, but you should call it that and plan for it accordingly.
Here is a concrete example. The question you're answering: "Will users pay for a premium tier if it offers offline access?"
Good MVP: Basic offline sync for the core use case. No fancy UI. No background sync. No conflict resolution. Just: does offline access change whether users pay? Launching this answers your question cleanly.
Bad MVP: Offline access, plus a redesigned settings page, plus a new notification system, plus three new dashboard widgets, all shipped at the same time. Now you can't tell what caused the change in conversion — was it offline access? The new dashboard? The settings redesign? The notification system? You shipped a lot of things and learned very little.
Viable Means Viable
There is also a floor below which "minimum" becomes dishonest. A product is not viable if it crashes regularly. It's not viable if the core flow breaks in a common browser. It's not viable if the performance is so bad users give up before finishing. When the "minimum" in your MVP is below the threshold of actual usability, you are not running a learning experiment — you are generating bad data. Users abandoning a product because it's broken tells you nothing about whether they want the product.
This is the real minimum in MVP: the experience has to be good enough that a user's behavior is driven by whether they want the thing, not by whether the thing works. Anything below that threshold is not a minimum viable product — it's a prototype, and that's perfectly fine as long as you know that's what it is.
Negotiating Scope Without Losing Integrity
Let's be direct: scope gets negotiated. Every project involves people with different priorities arguing about what should and shouldn't be in the release. That's normal and healthy. What's not healthy is when the negotiation is conducted poorly — when it's driven by deadlines rather than tradeoffs, when features get added without anything coming out, when engineers agree to timelines they know are impossible because they don't want to be the difficult ones.
Good scope negotiation is about making the cost of each choice visible before anyone commits to it. The tool for doing this is called the scope tradeoff conversation, and it goes like this.
The Scope Tradeoff Conversation
When someone wants to add something to scope, your job is not to say yes or no. Your job is to make the cost visible and ask the person making the request to make the tradeoff explicit. There are four things to make visible:
- Time cost: If we add this, when does the launch move to? Give a specific estimate, not a range. "This adds three weeks. The new launch date is [date]."
- Quality cost: If we add this without moving the date, what do we cut or cut corners on? Be specific. "We can do this by [date] if we drop [feature X] or if we skip performance testing on [component Y]."
- Complexity cost: Does this addition make the overall system harder to change in the future? If so, say so. "Adding this now means the payment flow will be harder to modify for the next six months."
- Opportunity cost: What else doesn't happen if we do this? Engineering time is finite. "If we take this on, Team B's migration gets delayed by a month."
Once these costs are visible, the person requesting the addition gets to make a real choice. Maybe they still want it — and now they own the tradeoff explicitly, which is important. Maybe they realize they didn't understand what they were asking for, and they withdraw the request. Maybe it leads to a larger conversation about priorities that needed to happen anyway.
What this conversation does not do is make you the person who says no to everything. You are not blocking the request — you are informing the decision. The difference matters. You are providing information; the business is making the call.
Your job in scope negotiation is not to protect your timeline. Your job is to make sure the decision-maker has accurate information about what each choice costs. If they have accurate information and choose to take on the cost, that is a legitimate business decision. Your integrity is intact as long as you told them the truth about the cost, not as long as you prevented every addition.
Scope Creep vs. Scope Evolution
Not all scope changes are bad. Sometimes new information genuinely changes what the project needs to do. The market shifted. A competitor launched. A customer discovery session revealed an assumption was wrong. In these cases, changing scope is correct — it means the team is paying attention to reality and updating accordingly.
The difference between healthy scope evolution and destructive scope creep is deliberateness. Scope creep happens when additions accumulate without anyone consciously choosing them — each individual addition seems small and reasonable, but the cumulative effect is a project that grew 40% without anyone noticing. Scope evolution happens when someone explicitly says "I've learned X, which means we should change Y, and I'm willing to accept the following tradeoff to accommodate it."
The mechanism that distinguishes them is documentation. If every scope change is written down with its cost and the decision-maker who approved it, scope evolution stays visible and controlled. If scope changes are made informally in hallway conversations or Slack messages that get forgotten, creep is inevitable.
Hardening Your Done Definition Across Time
One of the underappreciated problems with defining done is that it can drift. The criteria you wrote at the start of the project were based on your understanding at the start of the project. Three months in, you know much more — about the problem, about the users, about the technical constraints. Sometimes you need to update the done criteria to reflect that knowledge. But there's a difference between an honest update and a quiet revision designed to make the project look better.
Legitimate Updates
Some updates to done criteria are legitimate and even necessary. You might discover that the metric you chose to measure success is not actually a good proxy for what you care about — users are gaming it, or it's measuring something adjacent to the real goal. You might discover that a launch criterion you defined is actually not achievable due to a third-party dependency you didn't fully understand. In these cases, updating the criteria is the honest thing to do.
The test for a legitimate update is: would you write this change down in the project log and tell your stakeholders about it explicitly? If yes, it's a legitimate update. If no — if the change feels like something you'd rather do quietly — that's a signal you're moving the goalposts.
Moving the Goalposts
Goalpost-moving is what happens when a project is heading for failure and instead of acknowledging the failure, someone quietly changes the definition of success to something the project can actually achieve. It's dishonest and it's very common. It usually happens incrementally — nobody sits down and decides to move the goalposts. The target slips a little here, gets reframed a little there, and after three months the project that was supposed to increase retention by 15% is being celebrated for "improving the overall user experience."
The way to prevent it is to treat your original done criteria as a public record. Write them in the project brief. Share them with stakeholders. Review them explicitly at each milestone. Make them hard to change quietly by keeping them visible. The discomfort of a public conversation about why the criteria are changing is far less than the damage of a project that quietly declared victory on terms nobody agreed to.
Writing the Done Document
Everything in this chapter converges on a single artifact: a one- to two-page document that defines done for your project, written before any significant engineering begins. It should cover all three types of done criteria: success, acceptance, and launch.
Here is the structure. Adapt it to your context, but keep all the sections.
One to two pages. Written before engineering begins. Reviewed at each milestone.
The Goal (One Sentence)
What problem are we solving and for whom? Deliberately short. If you can't say it in one sentence, you don't understand it well enough yet.
Success Criteria
Two to four metrics. For each: the metric name, the current baseline with measurement date, the target value, the measurement window after launch, and who owns the measurement call. If there's a primary and secondary metric, label them explicitly.
Acceptance Criteria
The list of behaviors the system must exhibit, including edge cases. Written as verifiable statements. Each criterion has an owner who is responsible for verifying it before launch. Each criterion is either "required" or "desired" — a required criterion blocks launch; a desired one does not.
Launch Criteria
The checklist of operational readiness items. For each: what needs to be true, who verifies it, and when. Items marked "required" block launch. Items marked "expected" should be done but don't block in an emergency.
Explicitly Out of Scope
A short list of things this project will NOT do. This is not a backlog — it's a fence. If it's on this list, it won't be done in this project. Writing this down prevents scope creep from "assumed inclusions" and aligns everyone on what V1 is.
Amendment Log
An append-only record of changes to the criteria. Each entry: date, what changed, why, who approved. This is what separates legitimate evolution from goalpost-moving — the changes are visible and attributed.
The done document is not a heavyweight process artifact. It's a one- to two-page agreement. It should take two to four hours to write the first time — mostly because the writing forces conversations that need to happen. The value is not in the document itself but in the alignment the document creates and the arguments it prevents.
The "Done Enough" Conversation
There is one more form of "done" that doesn't appear in any of the categories above but that every experienced engineer has had to navigate: done enough. This is the conversation you have near the end of a project when the team has done solid work, most of the criteria are met, and there are still a handful of open items that feel important but not critical.
The done enough conversation is fundamentally about risk tolerance. The question is: what is the risk of shipping now versus the risk of waiting? Both choices carry risk. Shipping now means you might encounter a failure mode you haven't tested. Waiting means you carry the cost of delay — your users don't have the thing, your team's focus is split, and the project starts to lose momentum.
To have this conversation well, you need to separate the open items into three buckets:
| Bucket | What it means | Action |
|---|---|---|
| Blockers | Known failure modes that will definitely affect users in common scenarios. Things that are broken, not just imperfect. | Fix before launch. These are non-negotiable. |
| Risks | Potential failure modes that might affect some users in less common scenarios. Imperfect handling of edge cases. Things that could cause a bad experience but won't cause data loss or security issues. | Make the risk explicit and get explicit sign-off from the decision-maker that the risk is accepted. Document it. |
| Polish | Things that could be better but that don't affect the core function. Minor UX issues. Non-critical performance improvements. Copy changes. | Cut or defer deliberately. Write them down so they don't get forgotten, but don't let them block launch. |
The blockers are clear. The polish is clear. The risks are where the real conversation lives. Many teams treat risks as blockers because that feels safe — you can always justify more time by pointing to a risk that hasn't been fully addressed. The honest question is: given this specific risk, with this specific probability, affecting this specific percentage of users, is waiting worth the delay? Sometimes yes, sometimes no. But the answer should come from that analysis, not from general risk aversion.
You are not the person who decides whether to ship. You are the person who makes the risk picture accurate. Enumerate what is and isn't done. Be specific about the failure modes associated with the open items. Give a realistic estimate of how long it would take to address them. Then let the decision-maker decide. If they ship with known risk, that's their call — your job was to make sure they made it with open eyes.
Putting It Together
Defining done is not a ceremony. It is one of the highest-leverage things you can do on a large project, because it changes the behavior of every person on the project for the entire duration of the project. Engineers who know exactly what "done" means make better decisions about what to build and what to skip. Product managers who have committed to specific success criteria are more disciplined about scope. Leaders who have agreed to launch criteria in advance have fewer last-minute surprises.
The projects that go off the rails are almost always the ones where nobody had this conversation at the start. The team built something, shipped it, celebrated it, and then found that none of the metrics moved and nobody could agree on what to do next. The conversation about what success looked like — the one that would have taken four hours at the beginning — ended up taking four months at the end, with everyone demoralized and the trust between the team and the business eroded.
Four hours at the start versus four months at the end. Define done.
The Key Principle
"Done" is not a moment — it is a negotiated agreement. There are three separate things it can mean (success, acceptance, launch), and conflating them is one of the primary ways large projects fail. Define all three explicitly before building begins.
The Most Common Mistake
Treating "done" as obvious and deferring the conversation until the deadline arrives, at which point every definition of done is contested under time pressure, teams are demoralized, and the project declares victory on terms nobody agreed to at the start.
Three Questions for Your Next Project
- Can you write down, in one sentence for each, how you would determine whether the project succeeded, whether the thing was built correctly, and whether it was safe to ship?
- Do all your key stakeholders agree on those definitions, or are you assuming alignment that hasn't been verified?
- Is anything currently labeled "V2" actually in the V2 backlog with a committed owner, or is it a graveyard with a hopeful name?