Methodology · Stradyn Probability

How the number gets to the page.

Stradyn Probability is a calibrated estimate, not a proprietary model. This page documents the analytical workflow — the questions we ask, the drivers we name, the market-implied benchmarks we compare against, and the resolution discipline that makes every forecast gradeable.

The North Star · Applied to Every Probability
"Does this change what the client does in the next 24–90 days?"
Stradyn Probability only publishes when the forecast is decision-relevant on a meaningful horizon. We do not produce probabilities on questions whose resolution would not change how a client repositions.

Five stages, in order. Every time.

No forecast leaves Stradyn without clearing all five. Stages are sequenced because each one constrains the next: a poorly framed question cannot yield a well-calibrated probability, no matter how careful the estimation step.

Stage 01

Question Framing

Specific, binary or bounded, with observable resolution criteria and a named resolution date.

Stage 02

Driver Enumeration

Explicit drivers pushing probability up and down — dated, sourced, specific actors and events.

Stage 03

Market Benchmark

Comparison against prediction markets, futures pricing, or implied probabilities where they exist.

Stage 04

Calibrated Estimate

Point probability with explicit confidence level. No false precision, no hedged language.

Stage 05

Resolution & Grading

Fixed date, fixed criteria, graded correct / partial / miss when the forecast resolves. Public.

What happens at each stage.

Stage 01
Question Framing — the gatekeeper stage.

The quality of a probability is capped by the quality of the question. A vague question — "Will tensions escalate?" — cannot resolve, cannot be graded, and cannot pass the Decision Test. We write questions the way Good Judgment Inc writes IARPA forecasting tournament questions: specific, binary or bounded, with observable resolution criteria and a named resolution date.

Every Stradyn Probability frame must satisfy:

  • A specific observable event or threshold as the resolution trigger
  • A named resolution date, typically 30–180 days forward
  • Binary outcome (Yes / No) or bounded range (e.g., Fed cuts by 0, 25, or 50 bps)
  • Decision relevance — the resolution changes at least one named repositioning

Example · Pass
"Will the ECB raise its deposit rate at the April 30, 2026 Governing Council meeting?"
Resolution: official ECB rate announcement April 30 · Binary · 7-day horizon · Affects EUR duration positioning.

Example · Fail
"Will the ECB become more hawkish?"
No observable trigger · No date · Not binary · No graded decision.
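
The four framing requirements can be read as a pre-publication checklist. The sketch below is illustrative only: `QuestionFrame`, `framing_failures`, and the field names are hypothetical and not part of any Stradyn tooling. It checks only that each element is present, not that it is well chosen.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class QuestionFrame:
    """Hypothetical container for one framed question."""
    text: str
    resolution_trigger: Optional[str]  # observable event or threshold
    resolution_date: Optional[date]    # named resolution date
    outcomes: List[str]                # ["Yes", "No"] or a bounded set of values
    repositionings: List[str]          # named decisions the resolution would change


def framing_failures(frame: QuestionFrame) -> List[str]:
    """Return the framing requirements this question does not meet."""
    failures = []
    if not frame.resolution_trigger:
        failures.append("no observable trigger")
    if frame.resolution_date is None:
        failures.append("no resolution date")
    if len(frame.outcomes) < 2:
        failures.append("not binary or bounded")
    if not frame.repositionings:
        failures.append("no decision relevance")
    return failures


pass_frame = QuestionFrame(
    text="Will the ECB raise its deposit rate at the April 30, 2026 "
         "Governing Council meeting?",
    resolution_trigger="official ECB rate announcement",
    resolution_date=date(2026, 4, 30),
    outcomes=["Yes", "No"],
    repositionings=["EUR duration positioning"],
)
fail_frame = QuestionFrame(
    text="Will the ECB become more hawkish?",
    resolution_trigger=None,
    resolution_date=None,
    outcomes=[],
    repositionings=[],
)

print(framing_failures(pass_frame))  # []
print(framing_failures(fail_frame))  # all four failures listed
```

A frame is publishable only when the returned list is empty; a single failure sends the question back to be reframed.
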
Stage 02
Driver Enumeration — the reasoning made visible.

Every Stradyn Probability is accompanied by an explicit list of drivers — specific, dated, sourced factors pushing the probability up and pushing it down. This is not narrative prose; it is a structured accounting of what the estimate is built on.

Drivers must be:

  • Specific — named actors, events, or data points, not general trends
  • Dated — every driver has a time reference ("April 16 ECB minutes," not "recent ECB communications")
  • Sourced — traceable to a public document, filing, market, or reporting we can cite
  • Falsifiable — a reader can check each driver against the source

If we cannot enumerate at least three drivers up and three drivers down, we do not publish the probability. Thin reasoning is worse than no forecast.

Example · Drivers up (ECB hike scenario)
April 16 minutes show five council members open to a hike · March CPI at 2.5%, above the 2% target · Energy shock compounding into second-round wage effects · Lagarde April 22 speech emphasized anchoring · April 28 PCE likely to confirm stickiness.
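
The publication rule (at least three drivers up and three down, each specific, dated, and sourced) can be sketched as a simple gate. Everything named here is an assumption for illustration: the `Driver` record, the `is_publishable` helper, the placeholder source strings, and the year 2026, which is inferred from the framing example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

MIN_DRIVERS_PER_SIDE = 3  # "at least three drivers up and three drivers down"


@dataclass
class Driver:
    """Hypothetical record for one enumerated driver."""
    description: str
    direction: str          # "up" or "down"
    dated: Optional[date]   # time reference for the driver
    source: Optional[str]   # citable public document, filing, market, or report


def is_complete(d: Driver) -> bool:
    """A driver counts only if it is described, directional, dated, and sourced."""
    return (
        bool(d.description)
        and d.direction in ("up", "down")
        and d.dated is not None
        and bool(d.source)
    )


def is_publishable(drivers: List[Driver]) -> bool:
    """Apply the three-up, three-down minimum to complete drivers only."""
    complete = [d for d in drivers if is_complete(d)]
    up = sum(d.direction == "up" for d in complete)
    down = sum(d.direction == "down" for d in complete)
    return up >= MIN_DRIVERS_PER_SIDE and down >= MIN_DRIVERS_PER_SIDE


# Three of the "drivers up" from the example; sources are placeholders.
up_only = [
    Driver("Minutes show five council members open to a hike", "up",
           date(2026, 4, 16), "ECB minutes (placeholder citation)"),
    Driver("Lagarde speech emphasized anchoring", "up",
           date(2026, 4, 22), "speech transcript (placeholder citation)"),
    Driver("PCE likely to confirm stickiness", "up",
           date(2026, 4, 28), "data release (placeholder citation)"),
]

print(is_publishable(up_only))  # False: no drivers down have been enumerated
```

Filtering out incomplete drivers before counting means an undated or unsourced item cannot be used to reach the minimum.
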
Stage 03
Market Benchmark — where the edge lives.

The single most important calibration discipline is comparison against a market-implied probability. If Polymarket prices a 12% chance of an ECB hike and Stradyn Probability is 28%, the question becomes: what do we see that the market does not, and is that edge real or are we wrong?

Benchmarks we use, in priority order:

  • Prediction markets (Polymarket, Kalshi, Manifold) where live contracts exist
  • Futures-implied probabilities (rate decisions, policy paths)
  • Options-implied distributions for price or index outcomes
  • Insurance/reinsurance market pricing for specific event risks

When no market exists, we note that explicitly. We do not invent a benchmark, and we are more cautious with the probability itself — the absence of a market is information.

Why this matters
Stradyn Probability's value is not accuracy in absolute terms; it is calibrated divergence from the market-implied probability. A correctly priced forecast that matches the market tells the subscriber nothing they couldn't get from Bloomberg. A differentiated, justified, gradeable divergence is the actionable product.
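
As a rough illustration of the benchmark comparison, the divergence can be stated as a signed gap in percentage points. The function names below are hypothetical, and the sketch assumes the market-implied probability has already been obtained from one of the listed sources; it does not fetch or derive market prices.

```python
from typing import Optional


def divergence_points(stradyn_pct: int, market_pct: Optional[float]) -> Optional[float]:
    """Signed gap in percentage points, or None when no market exists."""
    if market_pct is None:
        return None
    return stradyn_pct - market_pct


def benchmark_line(stradyn_pct: int, market_pct: Optional[float]) -> str:
    """One-line summary; the absence of a market is stated explicitly."""
    gap = divergence_points(stradyn_pct, market_pct)
    if gap is None:
        return f"Stradyn {stradyn_pct}% | no market benchmark exists"
    return f"Stradyn {stradyn_pct}% | market {market_pct:g}% | divergence {gap:+g} points"


# The ECB example from the text: market at 12%, Stradyn at 28%.
print(benchmark_line(28, 12))
# Stradyn 28% | market 12% | divergence +16 points
print(benchmark_line(28, None))
# Stradyn 28% | no market benchmark exists
```

The number itself is only the starting point: a 16-point gap is a prompt to ask what the analyst sees that the market does not.
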
Stage 04
Calibrated Estimate — the point probability.

This is the stage the subscriber reads first but the analyst writes last. Having framed the question, enumerated the drivers, and benchmarked against the market-implied probability, the analyst assigns an integer percentage reflecting structured judgment over the drivers and the evidence base.

Estimation discipline:

  • Integer percentages only — no false precision from decimals
  • Confidence level stated explicitly — high, medium, or low — with one-sentence reasoning
  • Probabilities in the 40–60% range are flagged as low-conviction; we will not force binary language onto coin-flip cases
  • Extreme probabilities (below 10%, above 90%) require additional justification in the confidence note

Stradyn Probability is explicitly not a quantitative model output. It is structured analytical judgment, graded against outcomes. The Scorecard keeps us honest.
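
The estimation rules are mechanical enough to check automatically, even though the estimate itself is a judgment. The sketch below is a hypothetical validator, not a description of Stradyn tooling; `estimate_flags` and its flag strings are invented for illustration.

```python
from typing import List

CONFIDENCE_LEVELS = {"high", "medium", "low"}


def estimate_flags(probability_pct: int, confidence: str, confidence_note: str) -> List[str]:
    """Validate an estimate against the stated discipline and return any flags."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(probability_pct, bool) or not isinstance(probability_pct, int):
        raise ValueError("probability must be an integer percentage")
    if not 0 <= probability_pct <= 100:
        raise ValueError("probability must be between 0 and 100")
    if confidence not in CONFIDENCE_LEVELS:
        raise ValueError("confidence must be high, medium, or low")
    if not confidence_note.strip():
        raise ValueError("confidence needs one sentence of reasoning")

    flags = []
    if 40 <= probability_pct <= 60:
        flags.append("low-conviction")
    if probability_pct < 10 or probability_pct > 90:
        flags.append("extreme: additional justification required")
    return flags


print(estimate_flags(28, "medium", "Drivers up outweigh drivers down."))  # []
print(estimate_flags(50, "low", "Evidence is evenly split."))             # ['low-conviction']
```

Raising on a decimal input enforces the integer-only rule at the point of entry rather than rounding silently.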

Stage 05
Resolution & Grading — the discipline.

Every forecast resolves on its named date against its named criteria. The result is recorded — correct, partial, or miss — and appended to the Scorecard. Misses stay on the record.

Grading rubric:

  • Correct: Outcome matches the forecast side and the probability assigned was on the confident side (≥60% for Yes, ≤40% for No)
  • Partial: Outcome matches the forecast side but the probability was near 50/50, or the outcome was directionally correct but mis-magnitude (e.g., we said 25bp cut, they cut 50bp)
  • Miss: Outcome is the opposite of the forecast side with meaningful conviction (probability ≥60% for the wrong side)

The Scorecard publishes the track record: correct, partial, and miss counts, plus a hit rate calculated as (correct + 0.5 × partial) / total. This is a coarser measure than the Brier score used in the forecasting literature, which averages the squared error between each stated probability and the outcome, but it follows the same principle: generous enough to reward directional calibration, strict enough to penalize overconfident misses.
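
For binary questions, the rubric and the hit-rate formula can be sketched as follows. Two assumptions go beyond the text and are marked in the code: a probability of exactly 60% counts as confident on either side, and any forecast that gave the realized outcome between 40% and 60% is graded partial, including a near-50/50 call on the wrong side, which the rubric does not address. The mis-magnitude case for bounded questions is not modeled. The function names are hypothetical.

```python
from typing import List, Optional


def grade(probability_yes_pct: int, outcome_yes: bool) -> str:
    """Grade one binary forecast as 'correct', 'partial', or 'miss'."""
    # Probability the forecast assigned to the outcome that actually occurred.
    p_outcome = probability_yes_pct if outcome_yes else 100 - probability_yes_pct
    if p_outcome >= 60:
        return "correct"   # confident and on the right side
    if p_outcome > 40:
        return "partial"   # ASSUMPTION: the whole near-50/50 zone is partial
    return "miss"          # ASSUMPTION: 60% or more on the wrong side is a miss


def hit_rate(grades: List[str]) -> Optional[float]:
    """(correct + 0.5 * partial) / total; None until something has resolved."""
    if not grades:
        return None
    correct = grades.count("correct")
    partial = grades.count("partial")
    return (correct + 0.5 * partial) / len(grades)


# Illustrative inputs only, not real Stradyn forecasts.
record = [grade(70, True), grade(30, False), grade(55, True), grade(80, False)]
print(record)            # ['correct', 'correct', 'partial', 'miss']
print(hit_rate(record))  # 0.625
```

Converting each forecast to the probability it gave the realized outcome lets one threshold pair handle both Yes-side and No-side forecasts.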

What Stradyn Probability is

A calibrated analytical frame.

Stradyn Probability is structured judgment — disciplined, documented, and graded against outcomes. Every forecast carries an explicit question, dated drivers, a market-implied benchmark, a confidence level, and a resolution date.

The method is transparent by design. A subscriber can audit any published probability, check the drivers against named sources, compare to the market benchmark, and — when resolution comes — see how the forecast graded.

This is the Good Judgment Project model, applied to geopolitical intelligence. Calibration through practice and scorekeeping, not through proprietary math.

What it is not

A quantitative model.

Stradyn does not run proprietary Bayesian networks, Monte Carlo simulations, or machine-learned classifiers to produce probabilities. We do not claim to.

Geopolitical outcomes do not decompose into clean equations. They emerge from actor incentives, institutional constraints, and historical analogy — domains where structured judgment outperforms narrow quantitative methods, provided the judgment is disciplined and graded.

Claiming a model we do not run would be marketing inflation. We prefer the honest framing: calibrated frames, transparent drivers, public scorecard. The work speaks through the track record, not through the label.

Where the method breaks.

Every analytical framework has failure modes. Naming ours is part of the discipline — it tells subscribers where to trust the output and where to weight it lighter. We update this list as the Scorecard reveals new patterns.

01

Step-change events without analog

Our drivers lean on historical patterns and stated actor incentives. When an outcome breaks from history — a first-use nuclear event, an unprecedented regime collapse, a novel trade architecture — our probability will be poorly calibrated because the reference class is thin. We flag these forecasts with low confidence, but the uncertainty is real.

02

Tail-probability estimation

Probabilities below 10% and above 90% are harder to calibrate than mid-range estimates. Small biases compound. Our historical track record on tails has a higher miss rate than on base-case frames. We still publish tail forecasts when the decision implication warrants it — but we weight confidence accordingly.

03

Sub-weekly horizons

Stradyn Probability is built for 7–180 day horizons. On shorter windows, market-implied probabilities from prediction markets and futures are typically better calibrated than structured judgment — markets react faster to new information than analysts do. We do not publish probabilities on timescales under seven days.

The Stradyn Scorecard · Calibration Proof
The method only earns trust through the track record.
Every Stradyn Probability resolves, gets graded, and gets added to the public Scorecard. Hits and misses alike. A methodology that cannot be graded cannot be trusted; one that can is the only honest claim to calibration.
Live record · Year to date: Resolved probabilities · Correct · Partial · Miss · Hit rate
The Scorecard populates as Stradyn Probabilities resolve and are graded against their criteria. Track record begins with v6 launch; forecasts published prior are not retroactively counted.