Goldeneye

Decision Calibration

Forecast reliability · diagnostics

How well do your convictions calibrate?

Reliability diagram across your logged journal entries. Perfect calibration sits on the diagonal: a 70% conviction band should resolve as hits 70% of the time. Bands below the diagonal are over-confident (hit rate lower than stated conviction); bands above it are under-confident.
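The banding behind a reliability diagram can be sketched as follows. This is a minimal illustration, not the dashboard's implementation: the `Entry` type, the function names, and the choice of five equal-width bands are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    conviction: float        # stated probability of a hit, 0.0 to 1.0
    outcome: Optional[bool]  # True = hit, False = miss, None = unresolved


def band_index(conviction: float, n_bands: int = 5) -> int:
    """Map a conviction to its band; a conviction of 1.0 joins the top band."""
    return min(int(conviction * n_bands), n_bands - 1)


def reliability_rows(entries: List[Entry], n_bands: int = 5) -> List[dict]:
    """Count entries, resolved entries, and hits per conviction band."""
    rows = [{"total": 0, "resolved": 0, "hits": 0} for _ in range(n_bands)]
    for e in entries:
        row = rows[band_index(e.conviction, n_bands)]
        row["total"] += 1
        if e.outcome is not None:  # unresolved entries count as neither hit nor miss
            row["resolved"] += 1
            row["hits"] += int(e.outcome)
    return rows


def hit_rate(row: dict) -> Optional[float]:
    """Realized hit rate of a band, or None when nothing has resolved yet."""
    return row["hits"] / row["resolved"] if row["resolved"] else None
```

Plotting each band's hit rate against its mean stated conviction gives the diagram; a band is well calibrated when the two values match.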

Headline

Your 80-100% theses (mean stated conviction 85%) resolved as hits only 43% of the time (n=14).

Sample
Total entries: 48
Resolved: 45
Unresolved: 3

Reliability diagram: 3 buckets · diagonal = perfect calibration
Per-bucket detail
Bucket    Claimed mean  Total n  Resolved  Hits  Hit rate
0-20%     -             0        0         0     n=0 (need 3+)
20-40%    -             0        0         0     n=0 (need 3+)
40-60%    54%           18       15        11    73%
60-80%    65%           16       16        10    63%
80-100%   85%           14       14        6     43%
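The per-bucket figures can be turned into a signed calibration gap (hit rate minus claimed mean). The sketch below copies the resolved counts, hits, and claimed means from the per-bucket detail. It assumes the "need 3+" rule applies to resolved entries; the names `MIN_RESOLVED`, `calibration_gap`, and `verdict` are illustrative.

```python
MIN_RESOLVED = 3  # assumption: "need 3+" refers to resolved entries

# (bucket, claimed mean, resolved, hits), copied from the per-bucket detail
BUCKETS = [
    ("0-20%", None, 0, 0),
    ("20-40%", None, 0, 0),
    ("40-60%", 0.54, 15, 11),
    ("60-80%", 0.65, 16, 10),
    ("80-100%", 0.85, 14, 6),
]


def calibration_gap(claimed, resolved, hits):
    """Hit rate minus claimed mean, or None when the bucket is too sparse."""
    if claimed is None or resolved < MIN_RESOLVED:
        return None
    return hits / resolved - claimed


def verdict(gap):
    """Below the diagonal is over-confident, above it is under-confident."""
    if gap is None:
        return "insufficient data"
    if gap < 0:
        return "over-confident"
    if gap > 0:
        return "under-confident"
    return "calibrated"


for name, claimed, resolved, hits in BUCKETS:
    gap = calibration_gap(claimed, resolved, hits)
    shown = "n/a" if gap is None else f"{gap:+.1%}"
    print(f"{name:8s} gap={shown:7s} {verdict(gap)}")
```

On these numbers the 80-100% bucket sits furthest below the diagonal, which is the bucket the headline singles out.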

Model Health · how each model fails


Calib err (reliability) is how far stated confidence sits from the realized hit rate; lower is better. Sharpness (resolution) is how much the model discriminates across its confidence levels; higher is better. Dir gap flags a one-sided edge. All three are descriptive, in-sample statistics over the backtest window, not a forward forecast.
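One standard reading of "reliability" and "resolution" is the Murphy decomposition of the Brier score. Whether this panel computes its columns that way is an assumption; the sketch below only shows what the two terms mean under that reading. Each bucket is a hypothetical `(mean stated confidence, resolved count, hits)` tuple.

```python
def murphy_terms(buckets):
    """Return (reliability, resolution) for a list of
    (mean_confidence, resolved, hits) buckets.

    reliability: weighted squared gap between stated confidence and hit rate
                 (lower is better)
    resolution:  weighted squared gap between each bucket's hit rate and the
                 overall base rate (higher is better)
    """
    used = [(f, n, h) for f, n, h in buckets if n > 0]  # skip empty buckets
    total = sum(n for _, n, _ in used)
    if total == 0:
        raise ValueError("no resolved forecasts")
    base_rate = sum(h for _, _, h in used) / total
    reliability = sum(n * (f - h / n) ** 2 for f, n, h in used) / total
    resolution = sum(n * (h / n - base_rate) ** 2 for f, n, h in used) / total
    return reliability, resolution


# Illustrative input: a forecaster whose 50% calls hit half the time and whose
# 100% calls always hit is perfectly reliable and also discriminates.
rel, res = murphy_terms([(0.5, 10, 5), (1.0, 10, 10)])
print(f"reliability={rel:.4f} resolution={res:.4f}")
```

A model can score well on one term and badly on the other: always stating the base rate gives perfect reliability but zero resolution.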

Desk Calibration · skill vs. luck


Calibration (Brier score on stated conviction) measures whether an analyst's confidence is reliable, which is the skill signal; Hit is the raw outcome, which luck contaminates. Scores are withheld until 10 decisions resolve. These are descriptive decision-quality diagnostics, not advice.
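To illustrate why a Brier score separates skill from raw hit rate, here is a minimal sketch. The function names and the two sample analysts are hypothetical; only the 10-decision withholding threshold comes from the text above.

```python
MIN_RESOLVED_DECISIONS = 10  # scores are withheld below this count


def brier_score(decisions):
    """Mean squared error between stated conviction and outcome (0 or 1).

    `decisions` is a list of (conviction, outcome) pairs, where outcome is
    True, False, or None for unresolved. Returns None while too few have
    resolved. Lower is better.
    """
    resolved = [(p, o) for p, o in decisions if o is not None]
    if len(resolved) < MIN_RESOLVED_DECISIONS:
        return None
    return sum((p - float(o)) ** 2 for p, o in resolved) / len(resolved)


def hit_rate(decisions):
    """Raw share of resolved decisions that were hits."""
    resolved = [o for _, o in decisions if o is not None]
    return sum(resolved) / len(resolved) if resolved else None


# Two hypothetical analysts with the same 60% hit rate.
outcomes = [True] * 6 + [False] * 4
modest = [(0.60, o) for o in outcomes]  # states 60% conviction each time
bold = [(0.95, o) for o in outcomes]    # states 95% conviction each time

for name, book in (("modest", modest), ("bold", bold)):
    print(name, f"hit={hit_rate(book):.0%}", f"brier={brier_score(book):.4f}")
```

Both analysts hit 6 of 10, but the one whose stated conviction matches the realized rate gets the lower (better) Brier score.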