Appendix C

The Risk Register

Most projects don't fail because of bad luck. They fail because of bad luck that was predictable. The risk register is how you predict it and decide in advance what you'll do about it.

What a Risk Is (and What It Isn't)

A risk is something that might happen that would hurt your project. It has not happened yet. The moment it happens, it becomes an issue, and you deal with it differently.

This distinction matters because risk management and issue management require completely different responses. With a risk, you have time to prepare. You can take actions now — before it happens — that either reduce the chance it happens or reduce the damage if it does. Once it becomes an issue, you're reacting under pressure.

The most common mistake engineers make with risk is conflating "risk" with "fear." A fear is vague: "something might go wrong with the database." A risk is specific: "our database connection pool size is hardcoded to 100, and the load test shows we'll need 300 connections at peak. If we don't change this before launch, the service will start rejecting connections under real traffic." The second version is actionable. The first is just stress.

The Test for a Real Risk

Ask: can I name what specifically happens, when it's likely to happen, and what the consequence is? If all three answers are yes, you have a real risk. If any answer is "I don't know," you have a fear you need to investigate before it becomes a risk.

The Five Fields That Make a Risk Actionable

Most teams either skip risk registers entirely or build them with ten fields per risk and then never update them. The sweet spot is five fields. Each one pulls its weight.

Field 1: Description

One or two sentences, maximum. What might happen and why. Written as a cause-and-effect statement: "If X, then Y." For example: "If the Identity team can't deliver the user-lookup API by May 10, we can't complete the IdP integration, and the May 29 staging milestone slips by at least two weeks."

The cause-and-effect format is not just stylistic. It forces you to think about whether you've actually identified the cause — and therefore whether you have any options to address it.

Field 2: Probability (1–5)

How likely is it that this risk actually materializes? Use a simple 1–5 scale, where 1 means very unlikely and 5 means almost certain.

Be honest. Most engineers rate risks as 2 or 3 to avoid looking pessimistic. A risk you rate 3 but treat as a 5 is still fine. A risk you rate 2 when it's actually a 4 — because you didn't want to seem negative in the planning meeting — is a problem.

Field 3: Impact (1–5)

If this risk materializes, how bad is it? Same 1–5 scale, but measuring damage: 1 is a minor inconvenience, 5 kills the project or the deadline.

Field 4: Mitigation

What can you do right now to either reduce the probability or reduce the impact? There are two kinds of mitigation: preventive (things you do to make the risk less likely) and contingency (things you prepare in advance so if it happens, you recover faster).

For high-score risks, you usually want both. For example, for a risky external dependency: preventive is "meet with the Identity team this week and get a written commitment on the May 10 date." Contingency is "if they slip, we already have a mock API we can develop against while waiting."

Field 5: Decision Trigger

This is the field most teams skip, and it's the most valuable one. A decision trigger is a pre-agreed answer to the question: at what point do we stop waiting and do something different?

"If the Identity team has not delivered by May 7 (three days before the dependency date), we will escalate to their VP and simultaneously build the mock." This is a decision trigger. It means when May 7 arrives with no delivery, you don't have to call a meeting to decide what to do. You already decided. You just execute.

The value of decision triggers is that they move decisions from emotional moments (when the risk has just materialized and everyone is stressed) to calm moments (when you were planning and thinking clearly).
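Because a decision trigger is a rule decided in advance, it can literally be written down as data plus a trivial check. A minimal sketch of the May 7 example above (the year and the function name are my own assumptions, not from the text):

```python
from datetime import date

# Hypothetical encoding of the May 7 trigger described above.
# The year (2025) is an illustrative assumption.
TRIGGER = {
    "check_date": date(2025, 5, 7),  # three days before the May 10 dependency
    "actions": [
        "escalate to the Identity team's VP",
        "start building against the mock API",
    ],
}

def trigger_due(today: date, delivered: bool, trigger: dict) -> bool:
    """Fire when the decision date arrives and the dependency has not landed."""
    return today >= trigger["check_date"] and not delivered
```

The point is not the code; it is that the condition and the actions are fixed before the stressful moment, so executing them requires no meeting.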

The Risk Score and the Heat Map

Multiply probability by impact to get the risk score (maximum 25). This number tells you where to invest your energy. Don't treat all risks equally — that's how you end up spending two weeks mitigating a score-4 risk when there's a score-20 risk you've ignored.

Risk Heat Map — Probability × Impact

                 Probability →
              1    2    3    4    5
  Impact 5 |  5   10   15   20   25
         4 |  4    8   12   16   20
         3 |  3    6    9   12   15
         2 |  2    4    6    8   10
         1 |  1    2    3    4    5

■ 1–6: Low — monitor ■ 8–10: Medium — mitigate ■ 12–16: High — own it ■ 20–25: Critical — escalate
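Since the score is just a product bucketed by the legend above, the whole mechanic fits in a few lines. A minimal sketch (the function names are my own, not from the text):

```python
def risk_score(probability: int, impact: int) -> int:
    """Multiply probability (1-5) by impact (1-5); maximum is 25."""
    assert 1 <= probability <= 5 and 1 <= impact <= 5
    return probability * impact

def risk_band(score: int) -> str:
    """Map a score onto the heat-map bands from the legend."""
    if score >= 20:
        return "Critical — escalate"
    if score >= 12:
        return "High — own it"
    if score >= 8:
        return "Medium — mitigate"
    return "Low — monitor"
```

For example, a probability-4, impact-5 risk scores 20 and lands in the critical band, which is exactly R1 below.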

Filled Example: Enterprise SSO Risk Register

Here are the four most significant risks for the SSO project, fully filled out. Notice how specific each one is — and notice the decision triggers in particular. They are the part that makes the register actually useful at 11pm when something has just gone wrong.

R1: Identity team API not ready by May 10. Score: 20 (P4 × I5)
Description
If the Identity team does not deliver the user-lookup-by-email and org-sso-config-write endpoints by May 10, the IdP integration cannot be completed, and the May 29 staging milestone slips by at least two weeks, likely making the June 6 GA date impossible.
Mitigation
Preventive: Written commitment from Priya Nair by May 4, confirmed with her manager. Weekly check-in starting May 4.
Contingency: Mock API already being built in parallel (Sam Chen owns this). If they slip, we develop against the mock and swap out for real API later.
Decision Trigger
If by May 7 there is no green signal from Priya on the API delivery, we (1) escalate to the Identity team manager immediately, (2) flip the mock flag on all IdP integration work, and (3) inform Vikram Singh that the June 6 date is at risk. We do not wait until May 10 to make this call.
R2: SAML library integration takes 2× longer than estimated. Score: 15 (P3 × I5)
Description
We have never integrated a SAML library before. The two most-used open-source SAML libraries have known edge cases with non-standard IdP configs. Our estimate of 2 weeks could easily be 4. That alone kills the June deadline.
Mitigation
Preventive: Start with a time-boxed spike this week (3 days max) to validate the SAML library choice before committing to the full integration. Get Acme Corp's IdP metadata file in advance to test against real data.
Contingency: If the spike reveals a complexity problem, we consider a commercial SAML library (budget: up to $800/year) that has better enterprise IdP coverage.
Decision Trigger
If the 3-day spike has not produced a working basic SAML exchange by May 8, we evaluate the commercial library option that same day. Decision made by May 8 EOD — not "let's see how it goes next week."
R3: Security review finds a design problem after staging. Score: 12 (P3 × I4)
Description
If the security review on May 18 surfaces a fundamental issue with the token handling or cert rotation design, we could need 1–2 weeks of rework after staging, which consumes the entire buffer before June 6.
Mitigation
Preventive: Share the security design doc with Maria Okonkwo by May 15 so she can do a preliminary scan before the formal review. Address any structural concerns in an async conversation before the May 18 meeting.
Contingency: Build two weeks of float into the June 6 date by framing it as "GA no later than June 6" with the understanding that June 13 is a hard backstop.
Decision Trigger
If the security review on May 18 results in any finding rated "critical" or "high," we pause the staging timeline and schedule a 24-hour design sprint with Maria's team. We do not ship past a known high-severity security issue regardless of deadline pressure. We tell Vikram the same day.
R4: Pilot customer (Acme Corp) IT team is slow to test. Score: 9 (P3 × I3)
Description
Acme's IT team may take longer than the three days we've budgeted to validate SSO in staging. Enterprise IT teams often have internal approval processes for testing new auth configs. A week's delay here eats into the June 6 window.
Mitigation
Preventive: David Tan (Sales) to brief Acme's IT contact this week. Get their IT team's testing calendar in advance. Set the staging test date bilaterally — not "let us know when you're ready."
Contingency: Run our own end-to-end test with a test Okta tenant that mirrors Acme's config. If Acme is slow, we can still launch with high confidence and Acme validates in prod with a fallback to password auth.
Decision Trigger
If we do not have a confirmed Acme IT test slot by May 24, we ship to GA on June 6 based on internal testing only and treat the Acme validation as a post-launch milestone. Decision made May 24, not June 4.

How to Maintain the Register During the Project

Review weekly, not daily

A daily risk review is overkill and will kill the habit. Once a week, spend ten minutes running through the register. Ask three questions for each risk: has the probability changed? Has the impact changed? Are any decision triggers about to fire?

Retire risks when they pass

A risk that is no longer possible should be marked closed with a note on how it resolved. "Identity team delivered API on May 9 — risk retired" is useful historical context. An old risk that you leave open with no update is just noise.

Add new risks when you discover them

The register is not a prediction of all risks at project start. It is a living document. When something surfaces mid-project that could hurt you, add it immediately. The habit of adding risks promptly is more important than having a perfect register at kickoff.

Share it with your manager

The risk register is not a private document. Your manager should be able to look at it and understand the top three things you're worried about. If sharing the register feels uncomfortable, that usually means you haven't been transparent enough about the project's challenges — and that is a problem to fix sooner rather than later.

The Blank Template

RISK REGISTER — [PROJECT NAME]

ID: _____
Title: ________________________________________________
Probability (1–5): _____   Impact (1–5): _____   Score: _____

Description (If X, then Y):
__________________________________________________________________________

Preventive mitigation:
__________________________________________________________________________

Contingency mitigation:
__________________________________________________________________________

Decision Trigger (at what point / what action):
__________________________________________________________________________

Owner: ______________________   Status: [ ] Open [ ] Mitigated [ ] Closed
Last reviewed: _______________
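For teams that keep the register in a script or tool rather than a document, the template above maps naturally onto a small record type. A sketch under that assumption (field names mirror the template; nothing here is prescribed by the text):

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    OPEN = "open"
    MITIGATED = "mitigated"
    CLOSED = "closed"

@dataclass
class Risk:
    """One entry of the register, mirroring the blank template."""
    id: str
    title: str
    probability: int        # 1-5
    impact: int             # 1-5
    description: str        # written as "If X, then Y"
    preventive: str
    contingency: str
    decision_trigger: str   # at what point / what action
    owner: str
    status: Status = Status.OPEN

    @property
    def score(self) -> int:
        # Derived, never stored, so it can't drift out of sync.
        return self.probability * self.impact
```

Deriving the score as a property rather than storing it is deliberate: it keeps the weekly review down to re-rating probability and impact.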

The Point of All This

A risk register does not prevent bad things from happening. It prevents bad things from catching you by surprise. And it prevents you from having to make hard decisions under emotional pressure, because you already made them in a calm room with a blank whiteboard. That is the whole point.
