Binary Exhaustion Lab: hybrid_v1 METHODOLOGY

A short, honest whitepaper · 2026-06-11 · ~6 min read

Contents
  1. What this is, and what it is not
  2. The pipeline, top to bottom
  3. The math, briefly
  4. Calibration honesty
  5. Wager sizing in plain English
  6. Known limitations
  7. CHOP BUSTER — the non-directional addendum
  8. References

1. What this is, and what it is not

The Binary Exhaustion Lab is a research workbench for studying whether a deterministic candle-exhaustion screen reader, optionally cross-checked by a multimodal LLM, can predict the directional outcome of the next N bars on a single instrument with a measurable, calibrated edge. Every prediction is auto-resolved against the actual OHLC close and recorded; the lab learns its own win rate over time rather than asserting one.

This is simulation, not advice. No real orders are placed. Wagers are paper, bankroll is fictional, all P&L is for study. The interesting question the lab tries to answer rigorously is: does the heuristic stack have an edge, how big is it, and how does its confidence relate to reality?

2. The pipeline, top to bottom

On each observation the lab executes a six-stage pipeline. The first stage is the original deterministic engine; the rest are the "hybrid_v1" additions distilled from a four-way agent competition held on this lab.

  1. Engine prior
     What it does: Runs binary_exhaustion_engine.analyze_binary_exhaustion: consecutive-candle counter, MA stacking (9/20/50/200), distance-from-mean in ATR units, ZigZag swing context. Outputs a bull-probability in [0,1].
     Why it's there: Unmodified domain heuristic, kept as the prior so all later additions are auditable changes.

  2. Regime detection
     What it does: Computes Kaufman's Efficiency Ratio (ER) on the last 10 and 30 closes; labels the regime trending, mixed, or choppy and produces two smoothly interpolated feature weights trend_weight and revert_weight.
     Why it's there: Mean-reversion signals are noise in strong trends; momentum signals are noise in chop. The regime gate down-weights each accordingly.

  3. Logit-space ensemble
     What it does: Adds five regime-gated independent features to the prior: RSI(14) reversion, ATR-normalized stretch from EMA(20), 3-bar ATR momentum, last-bar wick rejection, and a higher-timeframe MTF bias (EMA9 vs EMA21). All contributions are summed in logit space, then squashed through a sigmoid.
     Why it's there: No single feature can saturate the estimate (an 88% honesty clamp caps heuristic-only confidence). Each feature is auditable in the factors[] response field.

  4. Vision LLM fold
     What it does: If a screenshot is provided, a multimodal model is asked to reply with Direction / Confidence / Pattern / Reasoning in a structured block. Its vote is folded into the ensemble with bounded weight (~0.35 in logit space), never as a hard override.
     Why it's there: The LLM sees the picture (gaps, wicks, structure breaks) the deterministic engine cannot. But it is also the noisiest input, so its weight is bounded and clamped.

  5. PAV isotonic calibration
     What it does: Every resolved (won/lost) observation in MongoDB updates an isotonic regression that maps raw heuristic confidence to realized win rate. Below 25 resolved samples the calibrator is identity; above that it becomes the truth-teller.
     Why it's there: "75% confidence" actually meaning 75% over the long run requires a non-parametric, monotone map from claim to reality. Pool-Adjacent-Violators is the standard estimator.

  6. Drawdown-braked ¼-Kelly sizing
     What it does: Wager sizing inputs are the per-confidence-band Wilson lower bound (so a thin bucket can't claim a fat edge) and the Kelly fraction is multiplied by a drawdown brake (linear ramp, floor 0.25) and a losing-streak damper (×0.5 after 3 consecutive losses).
     Why it's there: Kelly assumes p is known. We estimate p. The brake accepts that estimation, which prevents catastrophic ramp-up during structural regime shifts the engine hasn't adapted to yet.

3. The math, briefly

3.1 Kaufman Efficiency Ratio (regime detector)

ER(n) = |close[t] − close[t−n]|  /  Σ |close[i] − close[i−1]|     (i = t−n+1 … t)

ER → 1 is a perfect trend, ER → 0 is pure chop. We blend ER(10) and ER(30) at 60/40 and map the result to two weights trend_weight ∈ [0.35, 1.0] and revert_weight = 1 − (trend_weight − 0.35).
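As a rough illustration of the regime detector, here is a minimal Python sketch. The function names are hypothetical, and the linear map from blended ER onto trend_weight is an assumption: the text only says the weights are "smoothly interpolated" within the stated range.

```python
def efficiency_ratio(closes, n):
    """Kaufman ER over the last n bars: net move / sum of absolute bar-to-bar moves."""
    if len(closes) < n + 1:
        raise ValueError("need at least n + 1 closes")
    window = closes[-(n + 1):]
    net = abs(window[-1] - window[0])
    path = sum(abs(b - a) for a, b in zip(window, window[1:]))
    # Assumption: a perfectly flat tape (zero path length) is treated as chop.
    return net / path if path > 0 else 0.0


def regime_weights(closes, fast=10, slow=30):
    """Blend ER(10) and ER(30) at 60/40 and derive the two feature weights."""
    blended = 0.6 * efficiency_ratio(closes, fast) + 0.4 * efficiency_ratio(closes, slow)
    # Assumption: linear interpolation onto [0.35, 1.0]; the lab's exact curve is not given.
    trend_weight = 0.35 + 0.65 * blended
    revert_weight = 1.0 - (trend_weight - 0.35)
    return blended, trend_weight, revert_weight


if __name__ == "__main__":
    steady_uptrend = [float(i) for i in range(40)]
    print(regime_weights(steady_uptrend))
```

Under this assumed map, a perfectly straight trend gives trend_weight near 1.0 and revert_weight near 0.35, and a strictly alternating series gives the reverse.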

3.2 Logit-space ensemble

z   = 0.55 · logit(p_engine)
    + ∑  contrib_i  (each contrib gated by regime weight)
p   = σ(z)
p   ∈ [0.12, 0.88]                   (honesty clamp)

Each contrib_i is a small, bounded shift (typically ±0.2 to ±0.5 in logit units). The clamp prevents any combination of heuristics from claiming more than 88% certainty — that level of confidence has to be earned by calibration data, not asserted by an indicator.
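The ensemble step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lab's code: the function names, the (shift, regime_weight) pair representation, and the epsilon guard on the engine probability are all hypothetical.

```python
import math


def logit(p, eps=1e-6):
    # Assumption: clip so an engine output of exactly 0 or 1 does not produce infinity.
    p = min(1.0 - eps, max(eps, p))
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ensemble(p_engine, contribs, prior_weight=0.55, lo=0.12, hi=0.88):
    """contribs: iterable of (shift_in_logit_units, regime_weight) pairs."""
    z = prior_weight * logit(p_engine)
    z += sum(shift * weight for shift, weight in contribs)
    p = sigmoid(z)
    # Honesty clamp: heuristics alone may not claim more than 88% (or less than 12%).
    return min(hi, max(lo, p))


if __name__ == "__main__":
    # One mean-reversion feature gated down to 0.4, one momentum feature at full weight.
    print(ensemble(0.62, [(-0.3, 0.4), (0.2, 1.0)]))
```

Summing in logit space keeps each feature's effect additive and signed, which is what makes a per-feature audit trail straightforward.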

3.3 Vision fold (when present)

z'  = (1 − w) · logit(p)  +  w · logit(p_LLM)     w = 0.35
p'  = σ(z')
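A minimal Python sketch of the fold follows. The function name and the epsilon guard are assumptions, and the text does not say whether the honesty clamp is re-applied after the fold, so this sketch leaves the result unclamped.

```python
import math


def fold_vision(p, p_llm, w=0.35, eps=1e-6):
    """Weighted average of the heuristic and LLM estimates in logit space."""
    def logit(x):
        # Assumption: clip so an LLM confidence of exactly 0 or 1 stays finite.
        x = min(1.0 - eps, max(eps, x))
        return math.log(x / (1.0 - x))

    z = (1.0 - w) * logit(p) + w * logit(p_llm)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # A neutral heuristic estimate nudged upward by a confident LLM vote.
    print(fold_vision(0.50, 0.80))
```

Because the weights sum to one, the folded estimate always lies between the two inputs, so the LLM can pull the verdict but never override it.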

3.4 Pool-Adjacent-Violators (PAV) isotonic regression

Given resolved samples (c_i, y_i) with y_i ∈ {0,1}, sorted by claimed confidence c_i, PAV produces the monotone non-decreasing function that minimizes squared error. Algorithmically, sweep left to right, replace any adjacent pair of blocks whose right mean is below the left mean with a single block at their weighted mean, and repeat until the sequence is monotone. We interpolate linearly between fitted knots. Below 25 samples the calibrator stays identity to avoid overfitting.
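The pooling sweep can be sketched in plain Python as below. The names pav_fit and calibrate are hypothetical. Placing each knot at the mean claimed confidence of its pooled block, and holding the curve flat beyond the outermost knots, are assumptions; the text only specifies linear interpolation between knots.

```python
from bisect import bisect_right
from collections import defaultdict

MIN_SAMPLES = 25  # below this the calibrator stays identity, per the text


def pav_fit(confidences, outcomes):
    """Return isotonic knots [(confidence, win_rate), ...], or None if too few samples."""
    if len(confidences) < MIN_SAMPLES:
        return None
    # Merge samples sharing a claimed confidence so knots have distinct x values.
    agg = defaultdict(lambda: [0.0, 0])
    for c, y in zip(confidences, outcomes):
        agg[c][0] += y
        agg[c][1] += 1
    blocks = []  # each block: [sum_of_outcomes, sample_count, sum_of_confidences]
    for c in sorted(agg):
        wins, n = agg[c]
        blocks.append([wins, n, c * n])
        # Pool while the newest block has a lower mean than its left neighbour.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            w, n2, sc = blocks.pop()
            blocks[-1][0] += w
            blocks[-1][1] += n2
            blocks[-1][2] += sc
    return [(sc / n, w / n) for w, n, sc in blocks]


def calibrate(p, knots):
    """Linear interpolation between knots; flat beyond the ends; identity if no fit."""
    if not knots:
        return p
    xs = [x for x, _ in knots]
    if p <= xs[0]:
        return knots[0][1]
    if p >= xs[-1]:
        return knots[-1][1]
    i = bisect_right(xs, p)
    (x0, y0), (x1, y1) = knots[i - 1], knots[i]
    return y0 + (y1 - y0) * (p - x0) / (x1 - x0)
```

For example, if the 0.6 band wins 70% and the 0.7 band wins only 50%, the sweep pools them into one block at 60%, so a higher claim never maps to a lower calibrated win rate.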

3.5 Wilson score lower bound (per-bucket truth)

p_lower(w, l, z=1.96)
    = ((p + z²/2n) − z · √(p(1−p)/n + z²/4n²)) / (1 + z²/n)
   where  p = w/(w+l),  n = w+l

This is the conservative true win rate of each confidence band given n samples. A bucket with n < 50 is marked insufficient and forces the floor wager regardless of how good its point estimate looks.
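The bound is easy to check numerically. The sketch below is a direct transcription of the formula above; the function names and the dictionary layout returned by bucket_status are hypothetical.

```python
import math


def wilson_lower(wins, losses, z=1.96):
    """Wilson score lower bound on the true win rate of one confidence bucket."""
    n = wins + losses
    if n == 0:
        return 0.0  # assumption: an empty bucket carries no evidence of edge
    p = wins / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


def bucket_status(wins, losses, min_n=50):
    """Attach the 'insufficient below 50 samples' flag described in the text."""
    n = wins + losses
    return {"n": n, "p_lower": wilson_lower(wins, losses), "sufficient": n >= min_n}


if __name__ == "__main__":
    # 6 wins and 2 losses: a 75% point estimate on only 8 samples.
    print(bucket_status(6, 2))
```

With 6 wins and 2 losses the bound comes out near 0.41, the same figure quoted for the n = 8 case in the calibration section; with ten times the data at the same win rate the bound rises substantially.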

3.6 Fractional Kelly with drawdown brake

f_full     = (b · p − q) / b              # b = payout ≈ 0.85, q = 1−p
f          = f_full · 0.25 · brake · streak       # ¼-Kelly + brake
brake      = max(0.25, 1 − 2 · drawdown_pct)      # 25% floor
streak     = 0.5  if losing_streak ≥ 3 else 1.0
size       = clip(f, floor=0.001, cap=0.02)       # 0.1% … 2% of bankroll
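The five lines above translate almost directly into Python. In this sketch the function name is hypothetical, and p is assumed to be the Wilson lower bound of the relevant confidence band, as the pipeline description suggests.

```python
def directional_size(p, drawdown_pct, losing_streak, b=0.85,
                     kelly_fraction=0.25, floor=0.001, cap=0.02):
    """Fraction of bankroll to wager: quarter-Kelly with drawdown brake and streak damper.

    p is assumed to be the Wilson lower bound for the bucket, not the raw point estimate.
    drawdown_pct is a fraction (0.30 means a 30% drawdown).
    """
    q = 1.0 - p
    f_full = (b * p - q) / b                      # full-Kelly fraction for payout b
    brake = max(0.25, 1.0 - 2.0 * drawdown_pct)   # linear ramp with a 25% floor
    streak = 0.5 if losing_streak >= 3 else 1.0   # halve after 3 straight losses
    f = f_full * kelly_fraction * brake * streak
    return min(cap, max(floor, f))                # 0.1% to 2% of bankroll


if __name__ == "__main__":
    print(directional_size(0.60, drawdown_pct=0.0, losing_streak=0))
    print(directional_size(0.60, drawdown_pct=0.30, losing_streak=3))
    print(directional_size(0.41, drawdown_pct=0.0, losing_streak=0))
```

With b = 0.85 the break-even win rate is 1 / (1 + b), about 54%, so any p below that gives a negative Kelly fraction and the sizer falls back to the 0.1% floor.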

4. Calibration honesty

The single hardest thing about a confidence number is that it must be true. We make four explicit honesty commitments:

  1. The 88% honesty clamp (applied to the probability after the sigmoid) means no purely heuristic configuration can claim >88% certainty. Higher confidence has to come from real resolved-sample evidence via PAV.
  2. The per-bucket Wilson lower bound means a 75% point estimate on n=8 samples becomes p_lower≈0.41, which crosses below the Kelly break-even and forces a floor wager. Thin evidence cannot fund a fat bet.
  3. The identity calibrator below 25 samples is openly displayed (calibrator_active: false). We don't pretend we have an isotonic fit when we don't.
  4. Every observation stores the full factors[] trail including each contribution's signed magnitude. Anyone can reconstruct why a verdict said what it said. Black box, this is not.

5. Wager sizing in plain English

Sizing answers one question: given everything I know about this confidence band's history, how much would a sane fractional-Kelly bettor risk? The lab's answer is roughly:

6. Known limitations


7. CHOP BUSTER — the non-directional addendum

Everything in sections 1–6 above is about directional conviction: "the next bar closes UP (or DOWN)". But every directional engine has the same failure mode — the regime where price is genuinely undecided. On a chart where the moving averages have collapsed into a squeeze and price is ribboning sideways, even an 80%-confidence directional call is, in information-theoretic terms, noise.

The CHOP BUSTER card sitting below the directional wager on the lab page is a deliberate second opinion designed for exactly that regime. It is a non-directional bet: instead of guessing whether price goes up or down, it asks "does price stay between two specific strikes for the next N bars?". When the regime detector (§3.1) reports compressed / squeeze / mixed, the chop-buster arms; otherwise it sits in a dimmed STANDBY state that shows the user what bracket it would propose if conditions flipped. It is never auto-charged.

7.1 The math: binary atoms as a complete basis

The chop-buster is a synthetic short strangle expressed as two of our binary atoms with the sign flipped:

By Breeden–Litzenberger (1978), the risk-neutral density of the underlying is recoverable from the second derivative of a continuum of European call prices in strike. Carr–Madan (1998) generalises this: any sufficiently smooth payoff on the terminal underlying can be statically replicated by a portfolio of vanilla options across all strikes. Because a digital is, up to sign, the strike-derivative of a vanilla, the same payoff can equivalently be built from digitals at every strike. A short strangle in this digital form is the simplest non-trivial case: three regions of payoff (lose, win, lose), constructed from exactly two digital sells. We're not pricing options; we are using the payoff structure as a hypothesis-test instrument.
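The three-region payoff can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the function names, the unit-payout digitals, the single credit collected for the two sells, and evaluation on the terminal price only (the card itself asks whether price stays inside the bracket over the whole horizon).

```python
def digital_call(s, strike):
    """Pays 1 if the terminal price finishes above the strike, else 0."""
    return 1.0 if s > strike else 0.0


def digital_put(s, strike):
    """Pays 1 if the terminal price finishes below the strike, else 0."""
    return 1.0 if s < strike else 0.0


def chop_payoff(s, k_lo, k_hi, credit):
    """Short one digital put at k_lo and one digital call at k_hi.

    credit is the hypothetical total premium received for the two sells.
    Result: lose below k_lo, win (keep the credit) between the strikes, lose above k_hi.
    """
    return credit - digital_put(s, k_lo) - digital_call(s, k_hi)


if __name__ == "__main__":
    for terminal in (98.0, 100.0, 102.0):
        print(terminal, chop_payoff(terminal, k_lo=99.0, k_hi=101.0, credit=0.4))
```

Only one of the two sold digitals can finish in the money when k_lo is below k_hi, which is why the loss in either outer region is the same size.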

7.2 Why this is honest about its own edge

The p_stay figure on the card is not theoretical. It is an empirical, walk-forward sample: for the most recent 60 sliding windows of horizon bars in the current OHLC history, count what fraction stayed strictly inside a ±0.8·ATR bracket around the window's starting close. If the bracket got blown through, count it as a loss; otherwise a win. The card publishes that number front-and-centre, e.g.:

p(stay inside) = 7%  ·  60-window historical sample  ·  horizon 3 × 1m

A 7% number is loud and embarrassing on purpose. It is telling the user exactly what is true: this specific bracket, at these specific strikes, on this specific chart, gets killed 93% of the time historically — do not fund it with real conviction. The wager sizer agrees and floors the position to 0.1% of bankroll. The card is "ARMED" only in the sense that the regime qualifies (sideways), not in the sense that the math endorses the trade. This distinction is intentional: showing the user the regime-detector state separately from the historical-edge state turns a single confusing recommendation into two pieces of information they can reason about.
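The walk-forward count described above might look like the following Python sketch. The function name and argument layout are hypothetical. Two details are assumptions because the text does not pin them down: a single ATR value is used for every window, and "blown through" is judged on each bar's high and low rather than its close.

```python
def estimate_p_stay(closes, highs, lows, atr, horizon=3, n_windows=60, k=0.8):
    """Fraction of recent windows whose bars stayed strictly inside close ± k·ATR.

    Returns (p_stay, windows_counted); p_stay is None when no full window exists.
    """
    last_start = len(closes) - 1 - horizon          # latest start with a full window after it
    first_start = max(0, last_start - n_windows + 1)
    wins = total = 0
    for s in range(first_start, last_start + 1):
        lower = closes[s] - k * atr
        upper = closes[s] + k * atr
        bars = range(s + 1, s + 1 + horizon)
        # A window wins only if every bar's full range stays strictly inside the bracket.
        stayed = all(lower < lows[i] and highs[i] < upper for i in bars)
        wins += stayed
        total += 1
    return (wins / total, total) if total else (None, 0)


if __name__ == "__main__":
    n = 100
    quiet = estimate_p_stay([100.0] * n, [100.1] * n, [99.9] * n, atr=1.0)
    print(quiet)
```

Returning the window count alongside the fraction lets a caller show how large the sample behind the number actually was.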

7.3 Sizing: ½-Kelly, half the cap, two reasons

When the historical sample does support a bet (p_stay ≥ 60% on the 60-window sample), the chop-buster sizes at ½-Kelly (vs. the directional engine's ¼-Kelly) and caps at 1% of bankroll (vs. the directional engine's 2%). The two reasons:

  1. Asymmetric payoff geometry. Selling a strangle wins small and often, but the loss when the bracket is blown is up to the full premium at once. This is the classic short-vol fat-tail problem and the standard Mertonian remedy is a smaller bet size, not a higher one. Halving the cap is the conservative response.
  2. Regime-detector half-life. The "compressed" label that arms the card has a half-life: the engine notices the squeeze after it has already begun. The bet is therefore being taken with a tailwind that may already be fading. Smaller sizing absorbs this without blowing the bankroll.

The sizer formula:

b      = 0.85                              (binary-style payout)
f_full = (b·p_stay − (1 − p_stay)) / b     (full-Kelly fraction)
f_half = max(floor_pct, min(cap_pct,       (½-Kelly, clamped)
             f_full · 0.5))
size$  = bankroll · f_half
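In Python the sizer formula might read as follows. The function name is hypothetical; the default floor of 0.1% and cap of 1% are taken from the figures quoted earlier in this section. The sketch implements the formula only and does not encode the separate 60% support threshold.

```python
def chop_size(bankroll, p_stay, b=0.85, floor_pct=0.001, cap_pct=0.01):
    """Half-Kelly stake in currency units, clamped between floor and cap."""
    f_full = (b * p_stay - (1.0 - p_stay)) / b          # full-Kelly fraction
    f_half = max(floor_pct, min(cap_pct, 0.5 * f_full)) # half-Kelly, clamped
    return bankroll * f_half


if __name__ == "__main__":
    print(chop_size(10_000, 0.07))   # weak history: floored
    print(chop_size(10_000, 0.70))   # strong history: capped
```

With these defaults the 1% cap binds for most p_stay values comfortably above break-even, so in practice the cap, more than the Kelly fraction, determines the stake.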

7.4 Two opinions, not two bets

The user is reminded by the UI that the directional wager and the chop-buster are two independent opinions on the same chart, not two legs of a hedge. They draw from the same bankroll but are not combined. A typical pattern when the engine fires both:

7.5 Limits specific to the chop buster


8. References

  1. Kaufman, P.J. (1995). Smarter Trading. McGraw-Hill. (Efficiency Ratio.)
  2. Wilson, E. B. (1927). "Probable inference, the law of succession, and statistical inference." Journal of the American Statistical Association 22 (158): 209–212.
  3. Ayer, M., Brunk, H. D., Ewing, G. M., Reid, W. T., Silverman, E. (1955). "An empirical distribution function for sampling with incomplete information." Annals of Mathematical Statistics 26: 641–647. (PAV / isotonic regression.)
  4. Kelly, J. L. (1956). "A new interpretation of information rate." Bell System Technical Journal 35 (4): 917–926.
  5. Wang, X. et al. (2022). "Self-Consistency Improves Chain of Thought Reasoning in Language Models." arXiv:2203.11171. (Vote-strength ensembling of LLM outputs.)
  6. Breeden, D. T., Litzenberger, R. H. (1978). "Prices of State-Contingent Claims Implicit in Option Prices." Journal of Business 51 (4): 621–651. (Risk-neutral density from option prices — the formal justification for treating digitals as the basis of all payoffs.)
  7. Carr, P., Madan, D. (1998). "Towards a Theory of Volatility Trading." In Volatility: New Estimation Techniques for Pricing Derivatives (R. Jarrow, ed.), Risk Books, 417–427. (Static replication of arbitrary payoffs by portfolios of vanillas; foundation of the chop-buster atom.)
  8. Merton, R. C. (1969). "Lifetime Portfolio Selection under Uncertainty: The Continuous-Time Case." Review of Economics and Statistics 51 (3): 247–257. (Why fat-tailed downside payoffs justify smaller Kelly fractions than the textbook full-Kelly answer.)