Mapping the Unknowns
What You'll Get From This Chapter
A concrete method for making ambiguity manageable. By the end, you'll have tools to see what you don't know, organize it, and attack it systematically — instead of hoping it goes away.
- The four-quadrant model of what you know and don't know
- Why unknown unknowns kill more projects than technical difficulty
- The assumption register: your first line of defense against surprise
- Technical uncertainty vs. organizational uncertainty — different problems, different tools
- The spike: how to run a targeted investigation that actually reduces risk
- The discipline of making unknowns visible to your team and stakeholders
The Uncomfortable Truth About Ambiguous Projects
When you get handed a large, ambiguous project, there's a specific feeling that comes with it. It's not quite fear, not quite excitement. It's closer to standing in a room where someone just turned off the lights. You know there's furniture in there. You just don't know where.
The instinct — for almost every engineer — is to start moving. To write the design doc. To spin up the prototype. To begin sprint planning. Movement feels like progress, and progress feels good.
But here's the problem: moving fast in a dark room mostly means you hit furniture faster. And in a six-month project, "hitting furniture" means discovering three weeks from launch that the one team whose API you depend on has a six-week code freeze, or that the "simple migration" you've been building toward requires a schema change that takes down a service your CEO demos daily.
The engineers who consistently deliver large, ambiguous projects have one thing in common: they spend time mapping the room before they move through it. Not endlessly — not six weeks of analysis before a single line of code is written. But enough. Enough to know where the load-bearing walls are. Enough to know which doors are locked. Enough to know which areas of the room they genuinely cannot see from where they're standing.
This chapter is about how to do that systematically. It's not about eliminating uncertainty — that's impossible. It's about turning invisible uncertainty into visible, trackable, actionable items. Because uncertainty you can see is a problem. Uncertainty you can't see is a trap.
The Four Quadrants of Knowledge
There's a framework that sounds simple and turns out to be one of the most useful things you can carry into any ambiguous project. It comes from decision theory, but you don't need to know anything about decision theory to use it. Here it is:
Everything you need to know about your project falls into one of four buckets. The buckets are defined by two questions: Do you know it? And do you know that you don't know it?
Known Knowns
Things you know, and you know you know them. The stack you're using. The team size. The deadline. Facts you can state confidently right now.
Known Unknowns
Things you don't know, but you know you don't know them. "We haven't decided on the data model yet." "We don't know if this will fit in the SLA." Questions you can write down.
Unknown Unknowns
Things you don't know, and you don't know you don't know them. The questions you haven't thought to ask yet. The dependency you haven't discovered. The constraint you'll learn about in month four.
Unknown Knowns
Things that are actually known — somewhere in your org, or in your own head — but not by the right people. The institutional knowledge trapped in the head of someone who hasn't been looped in yet.
Quadrant 1: Known Knowns — Your Foundation
These are the facts you can write down today without hesitation. The technology is Python and Postgres. The deadline is Q3. There are three engineers on the team. The budget is fixed.
Known knowns are your foundation. They don't require investigation, just documentation. The mistake people make with known knowns is not writing them down. They live in the project lead's head, and three months later there's a disagreement about what "the deadline" actually meant, or whether the budget included infrastructure costs, or which teams were officially in scope.
Write them down. Put them in the one-pager. Make them explicit. Known knowns become contested facts the moment they're left unwritten.
Quadrant 2: Known Unknowns — Your Work Queue
These are the open questions you already know you have. You're aware of them. You just haven't answered them yet. "We need to decide whether to migrate existing data or backfill lazily." "We don't know if the third-party payment API supports batch operations." "We haven't confirmed that legal has approved the new data retention policy."
Known unknowns are your work queue. They are the questions your project needs to answer in order to move forward. Some of them block design decisions. Some of them block coding. Some of them block launch. Knowing which ones block which things — and in what order they need to be answered — is a significant part of project planning.
The failure mode here is treating all known unknowns as equal urgency. They're not. Some will resolve themselves as the project progresses. Some need to be resolved in week one or they'll cascade into delays in week eight. Part of your job is triaging them.
Quadrant 3: Unknown Unknowns — The Real Danger
This is the quadrant that kills projects. Not the hard technical problems. Not the tight deadlines. The things you didn't know you didn't know.
A team spends four months building a real-time messaging system. In month four, they discover that the mobile app's background process limitations mean messages can't be delivered reliably while the app is backgrounded — a constraint no one had thought to ask about, because no one on the backend team had shipped a mobile app before. The architecture has to be redesigned. Two months of work become partially obsolete.
A platform team completes a major API redesign. Two weeks before launch, they discover that an internal data science team has thirty pipelines hardcoded against the old API format — pipelines that no one documented and that don't show up in any dependency graph. Launch gets delayed six weeks.
These aren't edge cases. They are the normal experience of working on large, ambiguous projects at companies with more than a few hundred engineers. The larger the org, the more unknown unknowns lurk.
Known unknowns appear on your risk register. You budget time to resolve them. You tell stakeholders "we're investigating X." Unknown unknowns don't appear anywhere — until they explode. By then, you're usually too far into the project to absorb the impact without slipping a deadline or cutting scope. The surprise is the cost.
You cannot eliminate unknown unknowns. By definition, you don't know what they are. But you can reduce their probability and reduce the blast radius when they hit. The tools for this are: broader stakeholder mapping (looping in more people earlier), cross-org discovery sessions, explicit assumption-capturing, and staged rollouts that let you discover problems when they're still cheap to fix.
Quadrant 4: Unknown Knowns — The Hidden Resource
This is the most underrated quadrant. Unknown knowns are things that are known, just not by you — not yet, not in the right room.
Somewhere in your company, someone has already tried to build what you're building. Somewhere, someone knows that the API you're planning to use has a rate limit that will destroy your design at scale. Somewhere, someone wrote a post-mortem eighteen months ago about exactly the failure mode you're walking into.
Unknown knowns are why talking to people is not just a social nicety — it's a project management technique. Every senior engineer you sit down with for thirty minutes is a chance to convert unknown knowns into known knowns. Every old post-mortem you read. Every Slack channel you search. Every person you pull into your design doc review who wasn't originally on the list.
At the start of every project, ask this question in a meeting with your team: "Who else has tried to build something like this, and what happened?" Then actually find those people and talk to them. This single habit converts more unknown knowns into known knowns than any amount of individual research.
The Real Project Killer Is Always in Quadrant 3
You've probably heard war stories from experienced engineers. They always follow the same structure. "We were so close to done, and then we discovered..." That discovery is always something from quadrant three. An undocumented dependency. A hidden constraint. A team that was doing something adjacent that nobody knew about. A performance characteristic of the underlying system that only manifested at production scale.
The reason this keeps happening isn't that engineers are careless. It's that large systems are genuinely complex, organizations are not fully legible, and the human mind is only good at worrying about things it can already see.
"The project was not killed by what we knew and couldn't solve. It was killed by what we didn't know we needed to know."
Your job, as the person leading an ambiguous project, is to shrink quadrant three as fast as possible. You do this by converting its contents into quadrant two (things you know you don't know) — so you can put them on your work queue, assign them to someone, and resolve them before they explode.
The assumption register, the spike, the two-uncertainty framework — all the tools in this chapter are ultimately in service of this one goal: moving things out of the dark part of the room and into the lit part, where you can deal with them properly.
The Assumption Register
What It Is
An assumption register is a living document — nothing fancier than a table — that captures every significant assumption your project is making. An assumption is anything you're treating as true, but that you haven't confirmed is true.
Most projects are built on dozens of assumptions. The upstream service will continue to support its current API format. The data volume won't exceed ten million rows. The legal team will approve the approach before the end of Q2. The mobile team can integrate the new SDK without a forced app update. Every one of these, if wrong, could derail the project. Most of them are never written down.
When assumptions are implicit — living in someone's head rather than on paper — three things happen. First, they get forgotten. Second, when they turn out to be false, nobody remembers making them, so the failure looks like bad luck rather than a knowable risk. Third, different team members are operating on different assumptions without realizing it, which creates alignment problems that look like disagreements but are really just different invisible premises.
Writing them down fixes all three problems.
How to Build One
A useful assumption register has five columns. You don't need more than five.
| Assumption | Type | Risk if Wrong | Owner | Status |
|---|---|---|---|---|
| The Payments API supports idempotent retries on 5xx errors | Technical | High | Priya | Unverified — spike planned for week 2 |
| Legal sign-off on the new data retention model will arrive by March 15 | Org | High | Marcus | Meeting scheduled for March 8 |
| The current read load on the Users table is under 5k QPS | Technical | Medium | Dana | Verified — checked dashboards Jan 14 |
| The iOS team can absorb a two-day SDK integration without delaying their sprint | Org | Medium | James | Unverified — need to sync with iOS lead |
| Backfilling existing rows at migration time will complete within the maintenance window | Technical | High | Priya | Unverified — backfill test needed on staging data volume |
| The data science team doesn't depend on the current event schema format | Org | High | Marcus | Unverified — nobody has asked them yet |
You'll notice the last row. "Nobody has asked them yet." That is an unknown known sitting in plain sight, freshly converted into a known unknown just by the act of writing it down. That's the power of the register.
The assumption register is not a one-time artifact you produce at kickoff and file away. It's a living document. Status evolves. "Unverified" becomes "Verified" or — more often than people expect — "False: assumption was wrong, here is what we're doing instead." A register that isn't being updated is just a snapshot of your ignorance on day one.
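One way to keep the register alive is to make it a structured artifact rather than prose. Here's a minimal sketch in Python, assuming a team that keeps the register as a JSON file in the project repo; the field names, the `assumptions.json` path, and the risk levels are illustrative, not prescriptive. The `weekly_review` helper anticipates the sync ritual described in the next section.

```python
import json
from dataclasses import dataclass

@dataclass
class Assumption:
    text: str    # the assumption, stated as a claim about the world
    kind: str    # "technical" or "org"
    risk: str    # "high", "medium", "low": impact if the assumption is false
    owner: str   # who is responsible for verifying it
    status: str  # "unverified", "verified", or "false: <what we're doing instead>"

def load_register(path: str = "assumptions.json") -> list[Assumption]:
    """Load the register from a JSON file checked into the project repo."""
    with open(path) as f:
        return [Assumption(**row) for row in json.load(f)]

def weekly_review(register: list[Assumption]) -> list[Assumption]:
    """The five-minute sync view: unverified assumptions, highest risk first."""
    open_items = [a for a in register if a.status.startswith("unverified")]
    order = {"high": 0, "medium": 1, "low": 2}
    return sorted(open_items, key=lambda a: order.get(a.risk, 3))

if __name__ == "__main__":
    for a in weekly_review(load_register()):
        print(f"[{a.risk.upper()}] {a.text} -- owner: {a.owner} ({a.status})")
```

The format matters less than the habit: whatever holds the register, it should be trivially filterable down to "unverified, high risk" in the first minute of the sync.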
Doing this well requires one thing that doesn't come naturally to most engineers: the willingness to write down things you're not sure about. Writing down an assumption feels like admitting weakness, like you should have already verified it. This instinct is wrong. Writing it down is not admitting weakness — it's exercising rigor. The weakness is in leaving it implicit.
Keeping It Alive Through the Project
The register dies when it becomes someone's homework instead of the team's shared artifact. Here's how to keep it alive:
1. Review it every week in your project sync. Five minutes. Go through the unverified high-risk ones. Ask: what happened this week that changes our assumptions? Who is working on verifying what? This keeps it fresh and surfaces blockers early.
2. Treat every new discovery as a potential new row. When someone says "I assume we can just use the existing API for this," that's a new row. When a stakeholder says "we've always done it this way," that's an assumption about the future behaving like the past. Add it.
3. When an assumption breaks, do a mini-retro on it. Why did we think this was true? When did we first make this assumption? What would have helped us catch it sooner? This builds the team's assumption-detection reflex over time.
4. Share it with stakeholders — the high-risk ones, at least. Stakeholders appreciate transparency about what you're not sure of. "Here are the three things we're treating as assumptions that we're actively verifying" is a much stronger status update than "we're on track." It shows rigor and surfaces risks early, when they're still manageable.
Two Kinds of Uncertainty — and Why the Difference Matters
Once you start capturing your assumptions, you'll notice they fall into two very different buckets. This distinction is not just semantic. These two types of uncertainty have different causes, different resolution paths, and different blast radii when they go wrong.
Technical Uncertainty
Technical uncertainty is about whether something is possible or feasible. Can we migrate six hundred million rows in a rolling window without downtime? Does the ML model generalize well enough to the production data distribution? Will the new event-driven architecture stay within our 50ms p99 SLA under peak load?
Technical uncertainty lives in the code, the data, the infrastructure. You resolve it by building things and measuring them. By running tests. By doing load tests on production-scale data. By looking at what similar systems have done.
Technical uncertainty has an important property: it respects evidence. You run an experiment, you get data, you know more than you did before. The feedback loop is relatively clean. You ask the database a question, it gives you an answer. You run the load test, you get a latency histogram. Technical uncertainty is uncomfortable, but it's honest.
Your team is building a new search service. You assume it can handle the current query volume under 200ms. But you haven't tested it at full scale. That's technical uncertainty. The way to resolve it is to build a replica of the production load and run it. The answer will come back as a number. You'll either be right or wrong, and you'll know.
The mistake is not doing this test early. Teams that defer load testing to the week before launch discover their p99 is 1.2 seconds, not 200ms — and now they have two weeks to fix an architecture problem that they've been building on for four months.
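The measurement itself rarely needs heavy tooling. A minimal sketch of the core, assuming you can replay a log of production queries against a staging endpoint; `run_query` is a hypothetical stub you'd wire to your actual service, and the 200ms threshold comes from the example above.

```python
import statistics
import time

def run_query(query: str) -> None:
    """Hypothetical stub: issue one search request against the staging endpoint."""
    raise NotImplementedError

def measure_latencies(queries: list[str]) -> list[float]:
    """Replay logged queries one at a time, recording wall-clock latency in ms."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

def report(latencies: list[float]) -> None:
    # statistics.quantiles(n=100) returns the 1st through 99th percentile cut points
    cuts = statistics.quantiles(latencies, n=100)
    print(f"p50: {cuts[49]:.1f}ms  p99: {cuts[98]:.1f}ms  max: {max(latencies):.1f}ms")
```

A serial replay like this answers "how fast is a single query," not "does the SLA hold at peak load"; for that, you'd drive the same measurement from concurrent workers at production QPS. Either way, the assumption becomes a number, and the number either clears 200ms or it doesn't.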
Organizational Uncertainty
Organizational uncertainty is about whether something is allowed, agreed upon, or politically viable. Will legal sign off on this approach? Is the data science team willing to migrate their pipelines on our timeline? Does the VP of Engineering actually support this project, or are we doing it because someone three levels below her thought it was a good idea?
Organizational uncertainty lives in people and relationships. You resolve it by having conversations, getting explicit commitments, and understanding the incentives and priorities of the people around you.
And here is the critical difference: organizational uncertainty does not always respond to evidence. You can show a person data that clearly supports your approach, and they can still say no, because their reasons aren't technical. A team might not migrate their pipelines on your timeline not because they can't, but because they have their own roadmap and your project isn't on it. Legal might not sign off not because your approach is actually non-compliant, but because the lawyer assigned to your review is overloaded and underinformed.
Organizational uncertainty is resolved through alignment, not investigation. You need to find the right people, bring them into the conversation early, understand their constraints and incentives, and create agreement — not just proof.
Why This Distinction Changes How You Work
Engineers who treat organizational uncertainty like technical uncertainty get stuck in a loop. They produce more evidence, better data, more thorough analysis — and wonder why people still aren't moving. The answer is that they're applying the wrong tool to the problem.
When you hit a blocker, ask yourself: Is this a question of feasibility or a question of alignment? If it's feasibility, go build an experiment. If it's alignment, go have a conversation. Treating an alignment problem like a research problem is one of the most common ways senior engineers waste months.
The classic failure mode: building a technically perfect solution that an adjacent team won't adopt, because you never got their buy-in before you built it. You can prove your design is better in every measurable way, and it won't matter. Adoption is an organizational question, not a technical one. The time to resolve organizational uncertainty is before you build, not after.
In practice, most hard projects have both types of uncertainty entangled together. "Can we migrate the authentication service?" is simultaneously a technical question (is the new auth service API-compatible enough?) and an organizational question (will the twelve teams that use auth give us their attention for integration testing?). You need to track both threads, resolve them with different tools, and understand that one often unblocks the other in unexpected ways.
The Spike: Your First Tool Against Ambiguity
What a Spike Is
A spike is a short, focused, timeboxed investigation designed to answer a specific question. The word comes from the Extreme Programming tradition — like driving a spike through a piece of wood to check its consistency — but the concept is universal. It's not about building something that will ship. It's about building something that will tell you whether you can ship the thing you actually want to build.
A spike is how you attack technical uncertainty directly. Instead of assuming the database can handle the query load, you build a minimal test harness and measure it. Instead of assuming the third-party API behaves the way the docs say it does, you write fifty lines of code and call it. Instead of assuming the machine learning model will generalize, you run it against a held-out production sample.
Spikes are the antidote to the most common form of early-project paralysis: you know you don't know something, you're not sure how bad it could be, and so the question festers in the background while the team starts building on the assumption that it'll work out. The spike says: no. Let's find out now, while it's cheap to be wrong.
The Rules of a Good Spike
Not all investigations are spikes. A spike has a specific structure that makes it different from "noodling on a problem" or "doing some research."
1. One question, stated before you start. A spike begins with a question written down in one sentence. "Can we process 50k events per second through the new pipeline without exceeding 200ms p99 latency?" Not "let's explore the pipeline." A question. If you can't write the question in one sentence, you don't know what you're spiking yet.
2. A timebox — agreed on before the spike starts. One day. Two days. One week. You set the clock before you start, and when the clock runs out, the spike is over. You either have your answer or you have a better understanding of why the question is harder than you thought, which is itself an answer. A spike without a timebox is just an open-ended research project, which is a different thing entirely.
3. The output is a decision, not a prototype. When the spike ends, you should be able to say: "Yes, this is feasible, and here's the evidence" or "No, this approach doesn't work, and here's what we learned." The code you wrote during the spike is usually thrown away. The point was never the code — it was the answer. Teams that try to keep and productionize spike code almost always regret it.
4. The spike should change your plan. If the result of the spike doesn't change anything you were going to do, you spiked the wrong thing. A good spike answers something that was genuinely load-bearing — something that, if the answer came back wrong, would force you to make a different architectural decision, scope a different milestone, or have a different conversation with your stakeholders.
A Spike vs. Not a Spike
This distinction trips people up. Here's a concrete way to think about it:
A spike:
Question: "Can we backfill 800 million historical rows in our analytics table within a 6-hour maintenance window without impacting read performance?"
Timebox: 2 days
Method: Run the backfill against a sample of 50 million rows on a staging clone with production-level read load simulated. Measure wall clock time and read p99 during backfill.
Output: "At current I/O throttling settings, the full backfill will take 22 hours, not 6. We need either a longer maintenance window, a chunked online migration approach, or a different backfill strategy. Decision: we will use the chunked online approach. Milestone 2 scope changes accordingly."

Not a spike:
"Let's spike on the new search architecture for a week."
This is not a spike. There is no question. There is no success criterion. There is no decision this will inform. What you'll get after a week is a half-built prototype that nobody is sure what to do with, and a team that is now emotionally invested in an architecture they built without constraints. This is exploration, which is fine — but it's not a spike, and it doesn't reduce risk the way a spike does.
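For the backfill spike above, the two days go mostly into setting up the staging clone and the read-load generator; the harness itself can be small. A sketch of its core loop, with `backfill_chunk` as a hypothetical stub for the real SQL; the chunk size is illustrative, the row counts mirror the example, and the linear extrapolation is itself an assumption worth flagging in the spike writeup.

```python
import time

CHUNK_SIZE = 50_000
SAMPLE_ROWS = 50_000_000   # rows in the staging sample
FULL_ROWS = 800_000_000    # rows in the production table
WINDOW_HOURS = 6

def backfill_chunk(offset: int, limit: int) -> None:
    """Hypothetical stub: run the real backfill statement for one chunk on the clone."""
    raise NotImplementedError

def run_spike() -> None:
    start = time.perf_counter()
    for offset in range(0, SAMPLE_ROWS, CHUNK_SIZE):
        backfill_chunk(offset, CHUNK_SIZE)
        # (in the real spike, a sidecar load generator holds production-level
        #  read traffic against the clone and records read p99 throughout)
    elapsed_h = (time.perf_counter() - start) / 3600
    # Linear extrapolation assumes I/O behavior stays flat as the table grows;
    # index maintenance and vacuum pressure can make the full run worse, not better.
    projected_h = elapsed_h * (FULL_ROWS / SAMPLE_ROWS)
    print(f"sample backfill: {elapsed_h:.1f}h -> projected full run: {projected_h:.1f}h")
    print("fits window" if projected_h <= WINDOW_HOURS else "DOES NOT fit window")
```

Notice how the output maps directly onto a decision: a projected 22-hour run against a 6-hour window forces the chunked-online-migration conversation now, while the architecture is still cheap to change.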
Spikes are most valuable in the first two to four weeks of a project, when uncertainty is highest and decisions are most reversible. By the time you're six weeks in and the architecture is mostly built, a spike that tells you the architecture is wrong is less useful — not because the information isn't valuable, but because the cost of acting on it has increased dramatically.
Run your spikes early. Deferring them is the single biggest mistake teams make. "We'll figure that out once we have the core working" is how you get an elegant core sitting on top of a foundation that can't support it.
The Discipline of Making Unknowns Visible
Everything in this chapter — the four quadrants, the assumption register, the two-type framework, the spike — is really one idea expressed in different forms: the work of the first weeks of an ambiguous project is to make invisible things visible.
This discipline runs counter to how most people are wired. We are rewarded, in engineering, for having answers. Standing up in a meeting and saying "here are twelve things we don't know yet" feels exposing. It feels like you should have done more homework. It feels like it will make stakeholders nervous.
The opposite is true. Stakeholders who have been around long enough know that large projects have unknowns. What makes them nervous isn't the existence of unknowns — it's the sense that the team doesn't know about them. When you walk in with a structured list of what you know, what you're investigating, and what you're monitoring, you create confidence. You demonstrate rigor. You show that when something unexpected happens, you'll see it early — because you're looking.
Not: "We're not sure if this will work."
But: "We've identified five assumptions our current plan depends on. Two are verified. Two we're spiking this week. One — the data science team's API dependency — we're resolving in a meeting Thursday. Our plan is solid for the verified assumptions, and we have specific actions underway for the rest."
Same level of uncertainty. Completely different signal to the person in the room.
The discipline also builds something in your team that is hard to teach directly: the habit of asking "what are we assuming here?" Teams that work this way catch problems earlier. They make better architectural decisions. They have fewer late-stage surprises. And over time, they get a reputation for shipping reliably — not because they were lucky, but because they were systematic.
The fog doesn't go away all at once. You don't write the assumption register on day one and wake up with a perfect map. What happens instead is that every week, the map gets a little more complete. A few more assumptions get verified. A spike answers a question that was blocking a design decision. A conversation with an adjacent team reveals a dependency nobody knew about. Slowly, the room gets lit.
Your job is to make sure that process is happening — consistently, visibly, and faster than the project is moving.
Chapter Principles
- 01 Everything you know and don't know about your project falls into one of four buckets: known knowns, known unknowns, unknown unknowns, and unknown knowns. Your job is to move items from the dark buckets into the light ones.
- 02 Unknown unknowns kill more projects than hard technical problems. They are invisible until they're expensive, which is why you must actively hunt for them rather than wait for them to surface.
- 03 An assumption register is not a sign that you are uncertain — it is a sign that you are rigorous. Projects built on implicit assumptions fail for reasons nobody can reconstruct. Projects built on explicit, tracked assumptions fail for reasons you can understand and learn from.
- 04 Technical uncertainty and organizational uncertainty require different tools. Technical uncertainty yields to experiments and data. Organizational uncertainty yields to conversations, alignment, and explicit commitments. Using the wrong tool wastes time.
- 05 A spike is a timeboxed investigation with a single question and a clear decision as its output. Run them early, when being wrong is still cheap. Spikes that come back with bad news are not failures — they are the system working exactly as intended.
- 06 The discipline of making unknowns visible is what separates engineers who are reliably trusted with large projects from engineers who aren't. Stakeholders don't fear unknowns. They fear working with someone who doesn't know what they don't know.