Statistical foundations · reference
METHODOLOGY
A concise reference for the math behind every indicator on hypo.markets — what each number means, how it's computed, and the canonical papers behind the estimators. Companion to the live per-asset dashboards and the interactive simulator.
Kelly criterion#
f★ = p − (1 − p) / b
The fraction of bankroll that maximises expected log-growth on a binary bet with win probability p and net odds b (profit per unit staked, i.e. decimal odds minus 1). For a YES share bought at price c that pays 1, b = (1 − c)/c. Full Kelly extracts the most growth but with brutal variance; most pros run ¼ or ½ Kelly to survive model error. Negative f★ ⇒ the edge is on the other side and you should not take the position.
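For a continuous-return asset (perp), the analogue is f★ = μ / σ² (Merton continuous-time), where μ is the expected excess return and σ² the return variance over the same horizon. hypo.markets shows both the empirical argmax of the log-growth curve g(f) and the parametric μ/σ² as side-by-side markers on the perp Kelly card.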
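A minimal sketch of both formulas, assuming a YES share bought at a given price that pays 1 on success. The function names and the 55¢ model vs 40¢ market numbers are illustrative, not the site's implementation.

```python
def kelly_binary(p: float, b: float) -> float:
    """Kelly fraction for win probability p and net odds b (profit per unit staked)."""
    if not 0.0 <= p <= 1.0 or b <= 0.0:
        raise ValueError("need 0 <= p <= 1 and b > 0")
    return p - (1.0 - p) / b


def kelly_from_price(model_prob: float, price: float) -> float:
    """Kelly fraction for buying a YES share at `price` that pays 1 if the event occurs."""
    b = (1.0 - price) / price  # net odds implied by the share price
    return kelly_binary(model_prob, b)


def kelly_continuous(mu: float, sigma: float) -> float:
    """Continuous-time analogue f* = mu / sigma^2 (mu: excess drift, sigma: volatility)."""
    return mu / sigma ** 2


full = kelly_from_price(0.55, 0.40)          # model says 55c, market trades at 40c
print(round(full, 4), round(0.5 * full, 4))  # full Kelly, half Kelly -> 0.25 0.125
```

A negative return value from `kelly_binary` signals that the edge is on the other side of the bet.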
Ref: J. Kelly Jr., 1956 · Bell System Technical Journal
Brier score + Murphy decomposition#
BRIER = REL − RES + UNC
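The Brier score is the mean squared error of probabilistic forecasts: BS = mean((fᵢ − oᵢ)²), where fᵢ is the forecast probability and oᵢ ∈ {0, 1} the outcome. Lower is better; zero is perfect; the climatology baseline (always forecasting the base rate ō) scores ō·(1 − ō).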
Murphy's decomposition splits BRIER into three meaningful parts: REL (reliability — how miscalibrated the model is; lower is better), RES (resolution — how decisively the model separates outcomes; higher is better), and UNC (uncertainty — the irreducible base-rate term).
The reliability diagram bins forecasts and plots the observed frequency against the stated probability. Points on the diagonal indicate perfect calibration. The sharpness histogram below the diagonal shows how often each forecast level is used.
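The decomposition can be sketched in a few lines, assuming equal-width bins. The function name and the toy ledger are illustrative. The identity BRIER = REL − RES + UNC is exact only when all forecasts inside a bin are identical, as they are in this toy data; with heterogeneous bins a small within-bin term remains.

```python
def murphy_decomposition(forecasts, outcomes, n_bins=10):
    """Return Brier score plus reliability, resolution and uncertainty terms."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    bins = {}
    for f, o in zip(forecasts, outcomes):
        k = min(int(f * n_bins), n_bins - 1)  # equal-width bin index
        bins.setdefault(k, []).append((f, o))

    rel = res = 0.0
    for members in bins.values():
        n_k = len(members)
        f_bar = sum(f for f, _ in members) / n_k  # mean forecast in the bin
        o_bar = sum(o for _, o in members) / n_k  # observed frequency in the bin
        rel += n_k * (f_bar - o_bar) ** 2
        res += n_k * (o_bar - base_rate) ** 2

    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n
    return {
        "brier": brier,
        "rel": rel / n,
        "res": res / n,
        "unc": base_rate * (1.0 - base_rate),
    }


forecasts = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.9, 0.9, 0.9, 0.9]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]
parts = murphy_decomposition(forecasts, outcomes)
print(parts)
```

The `(f_bar, o_bar)` pairs computed per bin are the points of the reliability diagram, and the bin counts `n_k` give the sharpness histogram.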
Ref: A. H. Murphy, 1973 · J. Applied Meteorology
ROC curve + AUC#
AUC = ∫₀¹ TPR d(FPR)
ROC plots true positive rate against false positive rate as the decision threshold sweeps from 1 to 0. AUC is the area underneath: 0.5 = coin-flip, 1.0 = perfect ranker. It measures ranking power — independent of calibration. A model with AUC = 0.9 but bad calibration can be rescued by Platt scaling; a model with AUC = 0.55 cannot.
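A minimal sketch of the threshold sweep and the trapezoidal area, with tied scores handled as one step. The names and the eight-row toy data are illustrative. Squaring the scores changes their calibration but not their order, so the AUC is unchanged.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) as the threshold sweeps from high to low."""
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every example sharing this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Trapezoidal area under the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
auc_raw = auc(roc_points(scores, labels))
auc_squared = auc(roc_points([s * s for s in scores], labels))
print(auc_raw, auc_squared)  # 0.8125 0.8125
```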
Bayesian Beta posterior#
θ ~ Beta(α, β), κ = α + β
Conjugate prior for a Bernoulli likelihood. We model your stated probability q as the mean of a Beta posterior whose concentration κ = q(1−q)/σ² − 1 is set by your stated 1σ uncertainty, so that α = q·κ and β = (1−q)·κ.
The 95% credible interval is the highest-density region containing 95% of the posterior mass — computed via cumulative trapezoidal integration over a 1000-point grid. If the market price falls outside this band, the disagreement is statistically meaningful.
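One way to sketch this, assuming α, β > 1 so the density is unimodal and vanishes at 0 and 1: build the CDF by cumulative trapezoidal integration on a 1000-point grid, then scan for the narrowest grid interval holding 95% of the mass. The function names and the q = 0.55, σ = 0.04, market = 40¢ inputs are illustrative.

```python
import math


def beta_params(q, sigma):
    """Map a stated mean q and 1-sigma uncertainty to Beta(alpha, beta)."""
    kappa = q * (1.0 - q) / sigma ** 2 - 1.0
    if kappa <= 0.0:
        raise ValueError("sigma is too large for a Beta with this mean")
    return q * kappa, (1.0 - q) * kappa


def beta_pdf(x, a, b):
    if x <= 0.0 or x >= 1.0:
        return 0.0  # valid at the endpoints when a, b > 1 (assumed here)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


def credible_interval(a, b, mass=0.95, n=1000):
    """Narrowest grid interval containing `mass` of the posterior."""
    xs = [i / (n - 1) for i in range(n)]
    pdf = [beta_pdf(x, a, b) for x in xs]
    dx = xs[1] - xs[0]
    cdf = [0.0]
    for i in range(1, n):  # cumulative trapezoidal integration
        cdf.append(cdf[-1] + 0.5 * (pdf[i - 1] + pdf[i]) * dx)
    cdf = [c / cdf[-1] for c in cdf]  # renormalise the grid approximation

    best = (xs[0], xs[-1])
    hi = 0
    for lo in range(n):
        while hi < n - 1 and cdf[hi] - cdf[lo] < mass:
            hi += 1
        if cdf[hi] - cdf[lo] < mass:
            break  # no interval starting at lo can hold enough mass
        if xs[hi] - xs[lo] < best[1] - best[0]:
            best = (xs[lo], xs[hi])
    return best


alpha, beta = beta_params(0.55, 0.04)
lo, hi = credible_interval(alpha, beta)
market = 0.40
print(round(lo, 3), round(hi, 3), "market outside band:", not lo <= market <= hi)
```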
Binary entropy + KL divergence#
H(p) = −p·log₂ p − (1−p)·log₂(1−p)
Shannon's binary entropy peaks at 1 bit when p = 0.5 (maximal doubt) and falls to 0 at the boundaries (certainty).
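KL divergence D_KL(q ‖ p) = q·ln(q/p) + (1−q)·ln((1−q)/(1−p)), measured in nats, quantifies the information your belief q adds beyond the market price p. If q is correct and fees are ignored, it equals the expected log-growth per bet at full Kelly, which makes it the theoretical upper bound on exploitable edge. Zero KL ⇒ you know nothing the crowd doesn't. The branch decomposition splits KL into the YES contribution q·ln(q/p) and the NO contribution (1−q)·ln((1−q)/(1−p)).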
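A short sketch of both quantities with the branch split made explicit. The function names and the belief 0.70 vs market 0.50 inputs are illustrative. Entropy is returned in bits and KL in nats, matching the formulas above.

```python
import math


def binary_entropy_bits(p):
    """Shannon entropy of a Bernoulli(p) outcome, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # limit value at the boundaries
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def kl_branches(q, p):
    """YES and NO contributions to D_KL(q || p), in nats (requires 0 < p < 1)."""
    yes = q * math.log(q / p) if q > 0.0 else 0.0
    no = (1.0 - q) * math.log((1.0 - q) / (1.0 - p)) if q < 1.0 else 0.0
    return yes, no


yes, no = kl_branches(0.70, 0.50)
print(binary_entropy_bits(0.5))  # 1.0
print(round(yes, 4), round(no, 4), round(yes + no, 4))
```

A single branch can be negative (here the NO branch is), but the sum never is.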
GARCH(1,1) conditional volatility#
σ²ₜ = ω + α·r²ₜ₋₁ + β·σ²ₜ₋₁
Volatility-clustering model. The conditional variance at time t is a constant ω plus a weighted sum of yesterday's squared shock and yesterday's variance. Persistence α + β near 1 ⇒ shocks decay slowly; calm and turbulent regimes cluster. When α + β < 1 the unconditional variance is ω / (1 − α − β). Position sizing must track the conditional σ, not the unconditional average.
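The recursion itself is a one-line filter. The sketch below assumes the parameters are already known (the values shown are made up, not fitted) and starts the recursion at the unconditional variance, which is one common but not unique choice.

```python
def garch_variance_path(returns, omega, alpha, beta):
    """Conditional variances from the GARCH(1,1) recursion.

    sigma2[t] is the variance applying to returns[t]; the final entry is the
    one-step-ahead forecast for the bar after the sample ends.
    """
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    sigma2 = [omega / (1.0 - alpha - beta)]  # assumed start: unconditional variance
    for r in returns:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


# Illustrative parameters: a calm bar, a 5% shock, then another calm bar.
path = garch_variance_path([0.0, 0.05, 0.0], omega=1e-5, alpha=0.10, beta=0.85)
print([round(v ** 0.5, 5) for v in path])  # conditional sigma per bar
```

After the shock the conditional variance roughly doubles relative to its long-run level and then decays by a factor of β per calm bar.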
Ref: T. Bollerslev, 1986 · J. Econometrics
ADF + KPSS unit-root tests#
The Augmented Dickey-Fuller (ADF) test has H₀: the series has a unit root (non-stationary, behaves like a random walk). Reject at the 5% level if the t-statistic is below ≈ −2.86 (the asymptotic critical value for a regression with a constant and no trend).
The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test has the opposite null: H₀: the series is level-stationary. Reject at 5% if the statistic exceeds 0.463.
The cleanest verdict comes from running both. ADF rejects and KPSS does not ⇒ stationary; KPSS rejects and ADF does not ⇒ unit root. Rejecting both ⇒ the series sits in a grey zone; rejecting neither ⇒ the data are not informative enough to decide.
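The joint decision rule can be written as a small function. It assumes the two test statistics have already been computed elsewhere (for example with `adfuller` and `kpss` from `statsmodels.tsa.stattools`); the function name and the verdict strings are illustrative.

```python
ADF_CRIT_5PCT = -2.86    # reject the unit-root null below this value
KPSS_CRIT_5PCT = 0.463   # reject the level-stationarity null above this value


def stationarity_verdict(adf_stat, kpss_stat):
    """Combine ADF and KPSS statistics into a single verdict at the 5% level."""
    adf_rejects = adf_stat < ADF_CRIT_5PCT
    kpss_rejects = kpss_stat > KPSS_CRIT_5PCT
    if adf_rejects and not kpss_rejects:
        return "stationary"
    if kpss_rejects and not adf_rejects:
        return "unit root"
    if adf_rejects and kpss_rejects:
        return "grey zone (both nulls rejected)"
    return "inconclusive (neither null rejected)"


print(stationarity_verdict(-4.1, 0.12))  # stationary
print(stationarity_verdict(-1.3, 0.90))  # unit root
```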
Variance ratio test#
VR(q) = Var(r_q) / (q · Var(r₁))
Here r_q is the q-period return and r₁ the one-period return. Under a random walk, VR(q) ≈ 1 for all q. VR > 1 means positive serial correlation (trending: long horizons are riskier than the IID assumption suggests); VR < 1 means mean reversion (long horizons are tamer). We use Lo-MacKinlay's asymptotic z-statistic for the significance test.
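A simplified sketch of the statistic using overlapping q-period returns and the homoskedastic asymptotic variance 2(2q − 1)(q − 1) / (3qT). It omits the small-sample bias corrections and the heteroskedasticity-robust variant, so treat it as an approximation; names and the simulated data are illustrative.

```python
import math
import random


def variance_ratio(returns, q):
    """Return (VR(q), z) for one-period returns under the IID random-walk null."""
    T = len(returns)
    mu = sum(returns) / T
    var_1 = sum((r - mu) ** 2 for r in returns) / (T - 1)

    # Overlapping q-period returns via a rolling sum.
    window = sum(returns[:q])
    q_returns = [window]
    for t in range(q, T):
        window += returns[t] - returns[t - q]
        q_returns.append(window)
    var_q = sum((s - q * mu) ** 2 for s in q_returns) / len(q_returns)

    vr = var_q / (q * var_1)
    z = (vr - 1.0) / math.sqrt(2.0 * (2 * q - 1) * (q - 1) / (3.0 * q * T))
    return vr, z


rng = random.Random(0)
iid = [rng.gauss(0.0, 0.01) for _ in range(5000)]

# AR(1) returns with coefficient 0.5: positively autocorrelated, so VR > 1.
trending = [0.0]
for _ in range(4999):
    trending.append(0.5 * trending[-1] + rng.gauss(0.0, 0.01))

vr_iid, z_iid = variance_ratio(iid, 5)
vr_trend, z_trend = variance_ratio(trending, 5)
print(round(vr_iid, 3), round(z_iid, 2))
print(round(vr_trend, 3), round(z_trend, 2))
```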
Ref: A. Lo + A. MacKinlay, 1988 · Review of Financial Studies
Hurst exponent (R/S analysis)#
log(R/S) = H · log(n) + c
Rescaled-range analysis estimates long memory: H > 0.5 ⇒ persistent / trending, H < 0.5 ⇒ anti-persistent / mean-reverting, H ≈ 0.5 ⇒ memoryless random walk. Estimated via OLS on the log-log plot of R/S vs window size n.
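A bare-bones sketch of the estimator, assuming non-overlapping windows and the window sizes shown (both are illustrative choices). Classical R/S is known to be biased upward in short windows, so white noise typically lands a little above 0.5 rather than exactly on it.

```python
import math
import random


def rescaled_range(chunk):
    """R/S of one window: range of cumulative mean deviations over the std dev."""
    n = len(chunk)
    mean = sum(chunk) / n
    cum = lo = hi = 0.0
    for x in chunk:
        cum += x - mean
        lo = min(lo, cum)
        hi = max(hi, cum)
    sd = math.sqrt(sum((x - mean) ** 2 for x in chunk) / n)
    return (hi - lo) / sd if sd > 0.0 else None


def hurst_rs(returns, window_sizes=(16, 32, 64, 128, 256, 512)):
    """Slope of log(mean R/S) against log(n), fitted by OLS."""
    xs, ys = [], []
    for n in window_sizes:
        values = []
        for start in range(0, len(returns) - n + 1, n):  # non-overlapping windows
            rs = rescaled_range(returns[start:start + n])
            if rs:
                values.append(rs)
        if values:
            xs.append(math.log(n))
            ys.append(math.log(sum(values) / len(values)))
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
over_differenced = [b - a for a, b in zip(noise, noise[1:])]  # strongly anti-persistent

h_noise = hurst_rs(noise)
h_anti = hurst_rs(over_differenced)
print(round(h_noise, 3), round(h_anti, 3))
```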
Engle-Granger cointegration#
Two non-stationary series can still have a stationary linear combination — they are cointegrated. Step 1: OLS regression y = α + β·x + ε. Step 2: ADF on the residuals. Reject H₀ (no cointegration) at 5% if the residual ADF statistic is below ≈ −3.34.
Useful for pairs trading: when the spread (the regression residual) drifts away from its mean, you can short the rich leg and buy the cheap leg, expecting reversion.
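The two steps can be sketched with the standard library alone. As a simplification, step 2 here is a plain Dickey-Fuller regression with no lagged differences (an ADF with zero lags); a production test would pick the lag order from the data. Names and the simulated pair are illustrative.

```python
import math
import random


def ols(x, y):
    """Step 1: fit y = alpha + beta * x by least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - beta * mx, beta


def dickey_fuller_stat(e):
    """Step 2 (simplified): t-statistic of rho in  delta_e[t] = rho * e[t-1] + u[t]."""
    lagged = e[:-1]
    diff = [b - a for a, b in zip(e, e[1:])]
    sxx = sum(v * v for v in lagged)
    rho = sum(l * d for l, d in zip(lagged, diff)) / sxx
    resid = [d - rho * l for l, d in zip(lagged, diff)]
    s2 = sum(u * u for u in resid) / (len(diff) - 1)
    return rho / math.sqrt(s2 / sxx)


def engle_granger(x, y):
    alpha, beta = ols(x, y)
    spread = [b - alpha - beta * a for a, b in zip(x, y)]
    return alpha, beta, dickey_fuller_stat(spread)


rng = random.Random(42)
x = [0.0]
for _ in range(499):
    x.append(x[-1] + rng.gauss(0.0, 1.0))            # random walk
y = [2.0 + 0.5 * v + rng.gauss(0.0, 1.0) for v in x]  # cointegrated with x

alpha_hat, beta_hat, stat = engle_granger(x, y)
print(round(beta_hat, 3), round(stat, 2), "cointegrated:", stat < -3.34)
```

The fitted `spread` is the series a pairs trade would monitor for deviations from its mean.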
Ref: R. Engle + C. Granger, 1987 · Econometrica
VaR + Expected Shortfall (CVaR)#
VaRα is the α-quantile of the loss distribution: on the worst 100·(1−α)% of bars you lose at least VaRα. CVaRα (also called Expected Shortfall, ES) is the average loss conditional on being in that tail. CVaR is always ≥ VaR and is the number that actually ruins accounts. Under normality with zero mean, ES95 / VaR95 ≈ 1.25; ratios > 1.5 signal fat tails.
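A sketch of the historical (empirical) estimator next to the Gaussian benchmark ratio. The tail-size convention (round n·(1 − α) to the nearest integer) is one of several in use, and the function names and toy returns are illustrative.

```python
from statistics import NormalDist


def var_cvar(returns, alpha=0.95):
    """Historical VaR and CVaR, both reported as positive loss numbers."""
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    k = max(1, round(len(losses) * (1.0 - alpha)))        # number of tail bars
    tail = losses[:k]
    return tail[-1], sum(tail) / k  # smallest tail loss, mean tail loss


def normal_es_var_ratio(alpha=0.95):
    """ES / VaR for a zero-mean normal distribution."""
    z = NormalDist().inv_cdf(alpha)
    expected_shortfall = NormalDist().pdf(z) / (1.0 - alpha)
    return expected_shortfall / z


toy_returns = [-i / 100 for i in range(1, 21)]  # losses of 1% .. 20%
var90, cvar90 = var_cvar(toy_returns, alpha=0.90)
print(var90, cvar90)
print(round(normal_es_var_ratio(0.95), 3))  # 1.254
```

Comparing the empirical `cvar / var` ratio against `normal_es_var_ratio` gives the fat-tail check described above.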
Monte-Carlo equity simulation#
For binary positions we sample N parallel careers of K bets each, drawing the true probability from N(q, σ²) each bet, then resolving against the market price at deployed-Kelly leverage. For continuous returns (perps) we bootstrap with replacement from the observed return distribution — this preserves the fat tails the parametric Gaussian misses.
Outputs include the percentile fan (5/25/50/75/95), VaR95, CVaR95, maximum drawdown of the median path, terminal-wealth distribution, and the ruin rate (fraction of paths that ever crossed the 50% bankroll floor).
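The binary-bet simulation can be sketched as follows. Assumptions made for this example only: the drawn probability is clipped to [0, 1], the stake is a fixed multiple of the Kelly fraction computed from q, the ruin floor is measured against the starting bankroll of 1, and only terminal percentiles and the ruin rate are reported. All names and inputs are illustrative.

```python
import random


def simulate_careers(q, sigma, price, kelly_mult,
                     n_paths=2000, n_bets=200, ruin_floor=0.5, seed=0):
    """Monte-Carlo careers of repeated YES purchases at `price`."""
    rng = random.Random(seed)
    b = (1.0 - price) / price                          # net odds of the share
    f = max(0.0, kelly_mult * (q - (1.0 - q) / b))     # deployed fraction
    terminal, ruined = [], 0
    for _ in range(n_paths):
        wealth, hit_floor = 1.0, False
        for _ in range(n_bets):
            p_true = min(1.0, max(0.0, rng.gauss(q, sigma)))  # clipped draw
            if rng.random() < p_true:
                wealth *= 1.0 + f * b
            else:
                wealth *= 1.0 - f
            if wealth < ruin_floor:
                hit_floor = True
        terminal.append(wealth)
        ruined += hit_floor
    terminal.sort()

    def pct(p):
        return terminal[min(len(terminal) - 1, int(p * len(terminal)))]

    return {"fraction": f, "p5": pct(0.05), "p50": pct(0.50),
            "p95": pct(0.95), "ruin_rate": ruined / n_paths}


full = simulate_careers(q=0.55, sigma=0.04, price=0.40, kelly_mult=1.0)
half = simulate_careers(q=0.55, sigma=0.04, price=0.40, kelly_mult=0.5)
print("full Kelly:", full)
print("half Kelly:", half)
```

Running both multipliers on the same seed makes the trade-off visible: the full-Kelly fan is wider and its ruin rate higher than the half-Kelly one.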