An engineer watching two of our own instruments — Candlevision and the Exhaustion Lab — placed a pair of paper wagers after every chart refresh, a call and a put, sized them by hand, trailed them as the price moved, and watched a fictional bankroll climb $81 → $114 at a roughly 80% hit rate. That is a textbook result — but a paper one: no spread was paid, no order arrived late, no fill slipped. The single engineering question this lab exists to answer is the one the engineer actually asked:
Does this paper edge survive contact with real-world execution friction — spread, slippage, and execution delay?
The TRAIL coefficient (§5) reduces that question to one number. Everything else on the page is plumbing in service of measuring it honestly. This is simulation, not advice: no real orders are placed, the bankroll is fictional, and a result of "the edge dies in friction" is treated as a perfectly valid scientific finding, not a failure.
On every tick — once per loop_every_s (default 30 s) — the lab
captures one chart frame, runs the binary engine to obtain a
direction and a confidence, and then opens
both legs of a straddle: a call (long) wager and a
put (short) wager, every single time, regardless of which way the
engine leans. We never skip a leg.
Opening both legs unconditionally is deliberate. The engine's edge, if it
exists, lives in how it sizes the two legs (§3), not in a binary
"trade / don't-trade" gate. Holding the always-both-legs rule fixed makes the
sizing tilt the only free variable — exactly the quantity under study. Each leg
carries an expiry_ts = entry_ts + horizon_seconds (default 60 s) and
is graded by a background resolver against the underlying's actual close at
expiry.
With bankroll = $1000, a per-leg cap f_max = 0.05, and
the engine reporting 70% long (p = 0.68 in the canonical
example), the two stakes come out to:
That $50 — exactly bankroll × f_max — is the same whether the engine
is 50/50 or 78% convinced. This is the scaled-up version of the $1–$5-on-$85
sizing the engineer was doing by instinct.
The sizing rule is a probability-matched application of the Kelly criterion
(Kelly 1956). For a single binary wager at net odds b with win
probability p and loss probability q = 1 − p, Kelly
prescribes the growth-optimal fraction:
Full Kelly is famously volatile, so practitioners stake a constant fraction of
it — Thorp (1969) popularized exactly this fractional-Kelly discipline. We fix a
conservative per-leg ceiling f_max = 0.05 (the "¼-Kelly" budget) and
split it between the two mutually exclusive outcomes in proportion to each
outcome's own probability. Writing p for the engine's long-side
confidence:
The elegant property falls straight out of summing the legs:
Total capital at risk per tick is constant — it never depends on
p. Confidence only redistributes risk between the legs. At
p = 0.50 the legs are equal (a true straddle); at p =
0.78 the call gets $39 and the put $11 — still carried as cheap insurance
against being wrong. The reference implementation is the pure function
compute_leg_sizes(p_long, bankroll, f_max) in
trail_lab_service.py.
p is always the long-side
probability. When the engine reports direction == "short" with some
confidence c, the caller must pass p_long = 1 − c, so
that a strong short conviction correctly tilts size into the put leg.
Friction is what separates the pristine paper world from a real fill. The lab applies three penalties, configured per session at arm-time:
| Penalty | Symbol | Effect |
|---|---|---|
| Spread | spread_pct |
Multiplicative haircut on stake at entry:
stake_realistic = stake_pristine × (1 − spread_pct). |
| Slippage | slippage_bps |
A symmetric band (in basis points) widening the price each leg must clear to win. |
| Exec delay | exec_delay_ms |
Models the lag between the engine's tick and the actual fill, so the realistic entry is struck against a slightly later price. |
Spread is multiplicative, never additive — a spread is a percentage of notional,
not a flat subtraction. The same multiplicative logic defines the win
conditions. With slippage_pct derived from
slippage_bps, a leg is graded a winner exactly when:
The symmetric friction band is the realistic-side math. The pristine side
ignores friction entirely and grades on the bare comparison
close > entry (call) or close < entry (put). Two
P/L lines are tracked side by side in the same session from the same observations
— pnl_pristine and pnl_realistic — so the only
difference between them is friction.
The headline metric is the running ratio of the two P/L lines:
It reads as cents of real edge retained per dollar of paper edge. Interpretation thresholds:
| TRAIL_coeff | Reading |
|---|---|
| ≥ 0.6 | Real edge that survives friction. Worth thinking about. |
| 0.0 – 0.6 | Works on paper; friction eats most of the edge. Borderline. |
| < 0.0 | Loses money once friction is applied. A pure paper-trading artifact. |
pnl_pristine == 0 — division by zero, and also genuinely
meaningless, since "fraction of edge retained" has no content when there is no
paper edge yet. The lab reports None and the page shows —
until enough legs have resolved that pnl_pristine ≠ 0. We do not
substitute a placeholder number; an honest blank beats a fabricated ratio.
The lab's founding mission is that the engine infers edge from the screen capture. Phase 1 honoured the friction math but read prices from numeric OHLC providers — fine for major FX and provider-covered crypto, useless for the brokers the operator actually wanted to study: Pocket Option OTC synthetics and broker-internal instruments that appear in no public feed. For those the OHLC call errors out or returns a stale array, so the engine emits a frozen verdict while the screen shows new candles forming.
Vision Mode makes the screenshot the source of truth. A session armed with
vision_mode=true sends each captured frame to
vision_extractor.extract_chart_state(), which prompts a multimodal
model (OpenAI GPT-4o, 2024) to read the rightmost candles and return strict JSON:
{direction, confidence, last_close, last_open, consecutive_green/red,
support, resistance}. The extractor clamps confidence to [0.5,
0.95], derives p_long, and hands the backend the same
engine block and entry_price the OHLC path produces. From there
nothing changes: compute_leg_sizes, the asymmetric ¼-Kelly tilt, and
the sunk-cost friction model are byte-for-byte identical.
Why this is still honest math. The friction algebra never inspects where
entry_price and close_price came from. As long as both
sides of a comparison are read from the same instrument — the screen at
observe-time and the screen re-captured at expiry-time — the realistic book
subtracts the same stake × friction_pct it always did, so
bankroll_pristine ≥ bankroll_realistic holds by the identical
derivation as §4. Vision changes the data source, not the accounting.
ok=False — never a fabricated
number — when it cannot read last_close, and unresolvable legs fall
through to the resolver's status:"stale" path (counted, zero P/L),
exactly as a dead OHLC feed does. A misread is a declined measurement, not
a silent corruption of the coefficient.