Decision Calibration
Forecast reliability · diagnostics
How well do your convictions calibrate?
Reliability diagram across your logged journal entries. Perfect calibration sits on the diagonal: a 70% conviction band should resolve as hits 70% of the time. Bands below the diagonal are over-confident; bands above it are under-confident.
Your 85% theses resolved at 43% (n=14).
| Bucket | Claimed mean | Total n | Resolved | Hits | Hit rate |
|---|---|---|---|---|---|
| 0-20% | — | 0 | 0 | 0 | n=0 (need 3+) |
| 20-40% | — | 0 | 0 | 0 | n=0 (need 3+) |
| 40-60% | 54% | 18 | 15 | 11 | 73% |
| 60-80% | 65% | 16 | 16 | 10 | 63% |
| 80-100% | 85% | 14 | 14 | 6 | 43% |
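The rows above can be reproduced with a small bucketing routine. The sketch below is illustrative only: the entry layout (a conviction in [0, 1] paired with an outcome that is `True`, `False`, or `None` while unresolved), the names `BandRow` and `reliability_table`, and the choice to average claimed conviction over resolved entries are all assumptions, not the page's actual implementation. The `min_resolved=3` gate mirrors the "need 3+" note in the table.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical entry layout: (stated conviction in [0, 1], outcome or None if unresolved).
Entry = Tuple[float, Optional[bool]]


@dataclass
class BandRow:
    label: str
    claimed_mean: Optional[float]  # mean stated conviction over resolved entries
    total: int                     # all entries logged in the band
    resolved: int                  # entries with a known outcome
    hits: int                      # resolved entries that came true
    hit_rate: Optional[float]      # None when too few entries have resolved


def reliability_table(entries: Sequence[Entry],
                      n_bands: int = 5,
                      min_resolved: int = 3) -> List[BandRow]:
    """Group entries into equal-width conviction bands and summarize each band."""
    buckets: List[List[Entry]] = [[] for _ in range(n_bands)]
    for conviction, outcome in entries:
        # A conviction of exactly 1.0 falls in the top band.
        index = min(int(conviction * n_bands), n_bands - 1)
        buckets[index].append((conviction, outcome))

    rows: List[BandRow] = []
    for index, bucket in enumerate(buckets):
        low = round(100 * index / n_bands)
        high = round(100 * (index + 1) / n_bands)
        resolved = [(c, o) for c, o in bucket if o is not None]
        hits = sum(1 for _, o in resolved if o)
        enough = len(resolved) >= min_resolved
        rows.append(BandRow(
            label=f"{low}-{high}%",
            claimed_mean=(sum(c for c, _ in resolved) / len(resolved)) if resolved else None,
            total=len(bucket),
            resolved=len(resolved),
            hits=hits,
            hit_rate=(hits / len(resolved)) if enough else None,
        ))
    return rows
```

A band is over-confident when its `hit_rate` is below its `claimed_mean`, which is the case for the 80-100% row above (6 hits out of 14 resolved against a claimed mean of 85%).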
Model Health · how each model fails
Calib err (reliability) is how far stated confidence sits from realized hit-rate — lower is better; Sharpness (resolution) is how much the model discriminates across its confidence levels — higher is better; Dir gap flags a one-sided edge. Descriptive, in-sample over the backtest window — not a forward forecast.
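One common way to compute a reliability and a resolution figure is the reliability and resolution terms of the Murphy decomposition of the Brier score. Whether this page uses exactly that definition is an assumption; the sketch below only shows the standard formulas, and the function name `reliability_and_resolution` and the `(claimed_mean, resolved, hits)` band layout are hypothetical. "Dir gap" is not defined precisely enough in the text to sketch, so it is left out.

```python
from typing import Sequence, Tuple

# Hypothetical band layout: (claimed mean conviction, resolved count, hit count).
Band = Tuple[float, int, int]


def reliability_and_resolution(bands: Sequence[Band]) -> Tuple[float, float]:
    """Return (reliability, resolution) in the Murphy-decomposition sense.

    reliability = sum_k n_k * (f_k - o_k)^2 / N      (lower is better)
    resolution  = sum_k n_k * (o_k - o_bar)^2 / N    (higher is better)
    where f_k is the claimed mean, o_k the realized hit rate of band k,
    and o_bar the overall hit rate across all resolved entries.
    """
    usable = [(f, n, h) for f, n, h in bands if n > 0]
    total = sum(n for _, n, _ in usable)
    if total == 0:
        raise ValueError("no resolved entries")
    base_rate = sum(h for _, _, h in usable) / total

    reliability = sum(n * (f - h / n) ** 2 for f, n, h in usable) / total
    resolution = sum(n * (h / n - base_rate) ** 2 for _, n, h in usable) / total
    return reliability, resolution


# Illustration using the three populated bands from the journal table above.
journal_bands = [(0.54, 15, 11), (0.65, 16, 10), (0.85, 14, 6)]
calib_err, sharpness = reliability_and_resolution(journal_bands)
```

Both quantities here are computed on resolved outcomes only, which matches the in-sample, descriptive framing of the note above.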
Desk Calibration · skill vs. luck
Calibration (Brier on stated conviction) measures whether an analyst's confidence is reliable — the skill signal; Hit is the raw outcome, which luck contaminates. Scores are withheld until 10 decisions resolve. Descriptive decision-quality diagnostics, not advice.
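As a minimal sketch of the two figures described above, the Brier score is the mean squared gap between stated conviction and the 0/1 outcome, while the hit rate ignores conviction entirely. The function name `desk_scores` and the decision layout are hypothetical; the `min_resolved=10` default mirrors the "withheld until 10 decisions resolve" rule.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical decision layout: (stated conviction in [0, 1], outcome or None if unresolved).
Decision = Tuple[float, Optional[bool]]


def desk_scores(decisions: Sequence[Decision],
                min_resolved: int = 10) -> Optional[Tuple[float, float]]:
    """Return (brier, hit_rate) over resolved decisions, or None while withheld."""
    resolved = [(p, o) for p, o in decisions if o is not None]
    if len(resolved) < min_resolved:
        return None  # too few resolved decisions to report a score

    # Brier score: 0.0 is perfect; a constant 50% conviction always scores 0.25.
    brier = sum((p - (1.0 if o else 0.0)) ** 2 for p, o in resolved) / len(resolved)
    hit_rate = sum(1 for _, o in resolved if o) / len(resolved)
    return brier, hit_rate
```

Two analysts with the same hit rate can have very different Brier scores: the one whose stated convictions track outcomes more closely scores lower, which is why the text treats calibration rather than raw hits as the skill signal.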