TRAIL Lab: Trailing Reality-Adjusted Inference
METHODOLOGY

A short, honest whitepaper · 2026-06-13 · ~5 min read


1. The question

An engineer watching two of our own instruments — Candlevision and the Exhaustion Lab — placed a pair of paper wagers after every chart refresh, a call and a put, sized them by hand, trailed them as the price moved, and watched a fictional bankroll climb $81 → $114 at a roughly 80% hit rate. That is a textbook result — but a paper one: no spread was paid, no order arrived late, no fill slipped. The single engineering question this lab exists to answer is the one the engineer actually asked:

Does this paper edge survive contact with real-world execution friction — spread, slippage, and execution delay?

The TRAIL coefficient (§5) reduces that question to one number. Everything else on the page is plumbing in service of measuring it honestly. This is simulation, not advice: no real orders are placed, the bankroll is fictional, and a result of "the edge dies in friction" is treated as a perfectly valid scientific finding, not a failure.

2. The straddle mechanic

On every tick — once per loop_every_s (default 30 s) — the lab captures one chart frame, runs the binary engine to obtain a direction and a confidence, and then opens both legs of a straddle: a call (long) wager and a put (short) wager, every single time, regardless of which way the engine leans. We never skip a leg.

Opening both legs unconditionally is deliberate. The engine's edge, if it exists, lives in how it sizes the two legs (§3), not in a binary "trade / don't-trade" gate. Holding the always-both-legs rule fixed makes the sizing tilt the only free variable — exactly the quantity under study. Each leg carries an expiry_ts = entry_ts + horizon_seconds (default 60 s) and is graded by a background resolver against the underlying's actual close at expiry.
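
The always-both-legs rule and the expiry stamp can be sketched as follows. This is a minimal illustration, not the lab's code: the Leg record, the open_straddle name, and every field other than expiry_ts, entry_ts, and horizon_seconds are hypothetical, and the sizing inside it is the probability split described in §3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leg:
    """One paper wager; field names other than expiry_ts are illustrative."""
    side: str           # "call" (long) or "put" (short)
    stake: float        # dollars at risk on this leg
    entry_price: float  # price observed at the tick
    entry_ts: float     # entry time in seconds
    expiry_ts: float    # entry_ts + horizon_seconds


def open_straddle(p_long, bankroll, entry_price, entry_ts,
                  f_max=0.05, horizon_seconds=60.0):
    """Open both legs unconditionally; p_long only tilts the sizing."""
    if not 0.0 <= p_long <= 1.0:
        raise ValueError("p_long must lie in [0, 1]")
    budget = bankroll * f_max                  # total at risk this tick
    expiry_ts = entry_ts + horizon_seconds     # both legs share one expiry
    call = Leg("call", budget * p_long, entry_price, entry_ts, expiry_ts)
    put = Leg("put", budget * (1.0 - p_long), entry_price, entry_ts, expiry_ts)
    return call, put
```

There is no branch that skips a leg: even at p_long = 1.0 the put is opened, with a zero stake.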

2.1 Worked example

With bankroll = $1000, a per-leg cap f_max = 0.05, and the engine reporting 68% long (p = 0.68, the canonical example), the two stakes come out to:

call_stake = 1000 × 0.05 × 0.68 = $34
put_stake = 1000 × 0.05 × (1 − 0.68) = $16
─────────────────────────────────
total at risk this tick = $50

That $50 — exactly bankroll × f_max — is the same whether the engine is 50/50 or 78% convinced. This is the scaled-up version of the $1–$5-on-$85 sizing the engineer was doing by instinct.

3. The asymmetric ¼-Kelly tilt

The sizing rule is a probability-matched application of the Kelly criterion (Kelly 1956). For a single binary wager at net odds b with win probability p and loss probability q = 1 − p, Kelly prescribes the growth-optimal fraction:

f* = (b·p − q) / b
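
As a quick numerical check of the formula, here is a small sketch that evaluates it directly. The function name kelly_fraction is an illustrative choice and does not come from the lab's code.

```python
def kelly_fraction(p: float, b: float) -> float:
    """Growth-optimal fraction f* = (b*p - q) / b for one binary wager.

    p is the win probability, b the net odds (profit per unit staked).
    A negative result means the wager has no edge at these odds.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if b <= 0.0:
        raise ValueError("net odds b must be positive")
    q = 1.0 - p
    return (b * p - q) / b
```

At even odds (b = 1) and p = 0.6 the formula gives a fraction of about 0.2; at p = 0.5 it gives zero, and below that it turns negative.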

Full Kelly is famously volatile, so practitioners stake a constant fraction of it; Thorp (1969) popularized exactly this fractional-Kelly discipline. We fix a conservative ceiling f_max = 0.05 (the "¼-Kelly" budget) on the fraction of bankroll staked per tick, which also bounds either single leg, and split it between the two mutually exclusive outcomes in proportion to each outcome's own probability. Writing p for the engine's long-side confidence:

call_stake = bankroll × f_max × p
put_stake = bankroll × f_max × (1 − p)

The elegant property falls straight out of summing the legs:

call_stake + put_stake = bankroll × f_max × (p + (1 − p)) = bankroll × f_max

Total capital at risk per tick is constant: it never depends on p. Confidence only redistributes risk between the legs. At p = 0.50 the legs are equal (a true straddle); at p = 0.78 on the $1000 bankroll of §2.1, the call gets $39 and the put $11, with the put still carried as cheap insurance against being wrong. The reference implementation is the pure function compute_leg_sizes(p_long, bankroll, f_max) in trail_lab_service.py.
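
A minimal sketch of what such a pure function could look like, assuming only the signature quoted above. The body, the input validation, and the tuple return shape are assumptions, not the contents of trail_lab_service.py.

```python
def compute_leg_sizes(p_long, bankroll, f_max):
    """Split the per-tick budget bankroll * f_max between call and put.

    Returns (call_stake, put_stake). The tuple return shape is an assumption.
    """
    if not 0.0 <= p_long <= 1.0:
        raise ValueError("p_long must lie in [0, 1]")
    if bankroll < 0.0 or not 0.0 <= f_max <= 1.0:
        raise ValueError("bankroll must be >= 0 and f_max in [0, 1]")
    budget = bankroll * f_max
    call_stake = budget * p_long
    put_stake = budget * (1.0 - p_long)
    return call_stake, put_stake
```

Calling it with p_long = 0.68 and then 0.78 on a $1000 bankroll with f_max = 0.05 reproduces the $34/$16 and $39/$11 splits from the text, up to floating-point rounding, and the two stakes sum to $50 in both cases.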

Sign convention. p is always the long-side probability. When the engine reports direction == "short" with some confidence c, the caller must pass p_long = 1 − c, so that a strong short conviction correctly tilts size into the put leg.
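
The conversion can be sketched as a small helper. The name to_p_long is hypothetical, and the "long" label is assumed by symmetry with the "short" value quoted above.

```python
def to_p_long(direction: str, confidence: float) -> float:
    """Map an engine verdict (direction, confidence) to a long-side probability."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if direction == "long":       # assumed label for the long verdict
        return confidence
    if direction == "short":
        return 1.0 - confidence   # strong short conviction gives a small p_long
    raise ValueError(f"unknown direction: {direction!r}")
```

A short verdict at confidence 0.8 becomes p_long of about 0.2, so most of the budget lands on the put leg.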

4. The friction model

Friction is what separates the pristine paper world from a real fill. The lab applies three penalties, configured per session at arm-time:

Penalty | Symbol | Effect
Spread | spread_pct | Multiplicative haircut on stake at entry: stake_realistic = stake_pristine × (1 − spread_pct).
Slippage | slippage_bps | A symmetric band (in basis points) widening the price each leg must clear to win.
Exec delay | exec_delay_ms | Models the lag between the engine's tick and the actual fill, so the realistic entry is struck against a slightly later price.

Spread is multiplicative, never additive: a spread is a percentage of notional, not a flat subtraction. The same multiplicative logic defines the win conditions. With slippage_pct derived from slippage_bps (one basis point is 0.0001 as a fraction, so slippage_pct = slippage_bps / 10,000), a leg is graded a winner exactly when:

call_leg WIN iff close(expiry) > entry_price × (1 + spread_pct + slippage_pct)
put_leg WIN iff close(expiry) < entry_price × (1 − spread_pct − slippage_pct)

The symmetric friction band is the realistic-side math. The pristine side ignores friction entirely and grades on the bare comparison close > entry (call) or close < entry (put). Two P/L lines are tracked side by side in the same session from the same observations — pnl_pristine and pnl_realistic — so the only difference between them is friction.
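
The two grading rules can be sketched side by side. The helper names are hypothetical, spread_pct is assumed to be a fraction (0.001 for 0.1%), and the sketch covers only the stake haircut and the win test; it models neither exec_delay_ms nor any payout rule, which this section does not specify.

```python
def realistic_stake(stake_pristine: float, spread_pct: float) -> float:
    """Multiplicative spread haircut applied to the stake at entry."""
    return stake_pristine * (1.0 - spread_pct)


def leg_wins(side: str, entry_price: float, close_price: float,
             spread_pct: float = 0.0, slippage_bps: float = 0.0) -> bool:
    """Grade one leg; with both friction terms at zero this is the pristine rule."""
    slippage_pct = slippage_bps / 10_000.0   # basis points to a fraction
    band = spread_pct + slippage_pct         # symmetric friction band
    if side == "call":
        return close_price > entry_price * (1.0 + band)
    if side == "put":
        return close_price < entry_price * (1.0 - band)
    raise ValueError(f"unknown side: {side!r}")
```

For an entry at 100.00 and a close at 100.05, the call wins on the pristine book but loses on a realistic book with spread_pct = 0.001 and slippage_bps = 5, because the close must then exceed roughly 100.15.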

5. The TRAIL coefficient

The headline metric is the running ratio of the two P/L lines:

TRAIL_coeff = pnl_realistic / pnl_pristine

It reads as cents of real edge retained per dollar of paper edge. Interpretation thresholds:

TRAIL_coeff | Reading
≥ 0.6 | Real edge that survives friction. Worth thinking about.
0.0 – 0.6 | Works on paper; friction eats most of the edge. Borderline.
< 0.0 | Loses money once friction is applied. A pure paper-trading artifact.

Undefined at the origin. The ratio is undefined when pnl_pristine == 0: that is a division by zero, and it is also genuinely meaningless, since "fraction of edge retained" has no content when there is no paper edge yet. The lab reports None and the page shows a blank until enough legs have resolved that pnl_pristine ≠ 0. We do not substitute a placeholder number; an honest blank beats a fabricated ratio.
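
The ratio, its None guard, and the threshold bands can be sketched as below. The function names and the short reading labels are illustrative and not taken from the lab's code.

```python
from typing import Optional


def trail_coeff(pnl_realistic: float, pnl_pristine: float) -> Optional[float]:
    """Running ratio of realistic to pristine P/L; None while pnl_pristine is 0."""
    if pnl_pristine == 0:
        return None               # report a blank, never a placeholder number
    return pnl_realistic / pnl_pristine


def reading(coeff: Optional[float]) -> str:
    """Map a coefficient onto the interpretation bands in the table above."""
    if coeff is None:
        return "undefined"
    if coeff >= 0.6:
        return "survives friction"
    if coeff >= 0.0:
        return "borderline"
    return "paper artifact"
```

The bands read naturally when pnl_pristine is positive; this sketch does not special-case a negative paper P/L, where the sign of the ratio flips.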

6. Vision Mode (Phase 1.5)

The lab's founding mission is that the engine infers edge from the screen capture. Phase 1 honoured the friction math but read prices from numeric OHLC providers — fine for major FX and provider-covered crypto, useless for the brokers the operator actually wanted to study: Pocket Option OTC synthetics and broker-internal instruments that appear in no public feed. For those the OHLC call errors out or returns a stale array, so the engine emits a frozen verdict while the screen shows new candles forming.

Vision Mode makes the screenshot the source of truth. A session armed with vision_mode=true sends each captured frame to vision_extractor.extract_chart_state(), which prompts a multimodal model (OpenAI GPT-4o, 2024) to read the rightmost candles and return strict JSON: {direction, confidence, last_close, last_open, consecutive_green/red, support, resistance}. The extractor clamps confidence to [0.5, 0.95], derives p_long, and hands the backend the same engine block and entry_price the OHLC path produces. From there nothing changes: compute_leg_sizes, the asymmetric ¼-Kelly tilt, and the sunk-cost friction model are byte-for-byte identical.
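
The post-processing step (clamp the confidence, derive p_long, hand over an entry price) might look like the sketch below. It does not call any model. The function name, the output keys, the "long"/"short" direction labels, and the use of last_close as entry_price are all assumptions made for illustration.

```python
def engine_block_from_vision(state: dict) -> dict:
    """Normalise an already-parsed vision JSON reply into an engine block.

    `state` is assumed to carry at least direction, confidence, last_close.
    """
    direction = state["direction"]
    if direction not in ("long", "short"):        # assumed label set
        raise ValueError(f"unknown direction: {direction!r}")
    # Clamp confidence to the [0.5, 0.95] range stated in the text.
    confidence = max(0.5, min(0.95, float(state["confidence"])))
    # Sign convention from §3: p_long is always the long-side probability.
    p_long = confidence if direction == "long" else 1.0 - confidence
    return {
        "direction": direction,
        "confidence": confidence,
        "p_long": p_long,
        "entry_price": float(state["last_close"]),  # assumption: entry = last close
    }
```

Because the clamp caps confidence at 0.95, p_long stays within [0.05, 0.95], so under the §3 split neither leg's stake ever reaches zero on this path.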

Why this is still honest math. The friction algebra never inspects where entry_price and close_price came from. As long as both sides of a comparison are read from the same instrument — the screen at observe-time and the screen re-captured at expiry-time — the realistic book subtracts the same stake × friction_pct it always did, so bankroll_pristine ≥ bankroll_realistic holds by the identical derivation as §4. Vision changes the data source, not the accounting.

Failure modes, handled honestly. Vision can misread an auto-scaled Y-axis, or fail entirely when overlays/panels cover the rightmost candles. Two guards contain this: the extractor returns ok=False — never a fabricated number — when it cannot read last_close, and unresolvable legs fall through to the resolver's status:"stale" path (counted, zero P/L), exactly as a dead OHLC feed does. A misread is a declined measurement, not a silent corruption of the coefficient.
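
The "declined measurement" behaviour can be sketched for the pristine comparison. The function name, the shape of the close reading, and the "resolved" status label are hypothetical; only ok=False and status:"stale" with zero P/L come from the text. P/L for a resolved leg is left out because the payout rule is not specified here.

```python
def resolve_pristine(side: str, entry_price: float, close_reading: dict) -> dict:
    """Grade one leg on the bare comparison, or decline if the close is unreadable.

    close_reading is a hypothetical dict such as {"ok": True, "last_close": 101.2}.
    """
    if not close_reading.get("ok") or close_reading.get("last_close") is None:
        # Unreadable expiry frame: counted as stale, contributes zero P/L.
        return {"status": "stale", "won": None, "pnl": 0.0}
    close_price = float(close_reading["last_close"])
    if side == "call":
        won = close_price > entry_price
    elif side == "put":
        won = close_price < entry_price
    else:
        raise ValueError(f"unknown side: {side!r}")
    return {"status": "resolved", "won": won}
```

A stale leg leaves both P/L lines unchanged, so it cannot move the TRAIL coefficient in either direction.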

7. References

  1. Kelly, J. L. (1956). A New Interpretation of Information Rate. Bell System Technical Journal, 35(4), 917–926.
  2. Thorp, E. O. (1969). Optimal Gambling Systems for Favorable Games. Review of the International Statistical Institute, 37(3), 273–293.
  3. Almgren, R., & Chriss, N. (2001). Optimal Execution of Portfolio Transactions. Journal of Risk, 3(2), 5–39.
  4. Cover, T. M., & Thomas, J. A. (2006). Elements of Information Theory (2nd ed.), Ch. 6: Gambling and Data Compression. Wiley.
  5. OpenAI (2024). GPT-4o System Card / Hello GPT-4o. The multimodal vision backbone behind Vision Mode (§6).