Live trading is honest, but it is slow. If Ladder takes one or two trades a day, it can take forever to know if a setup is real. If we force trades just to collect data, then we are not testing anymore. We are just donating money with extra steps.
So we built Replay Lab. Not a magic backtest, not a money-printer simulator, just a filter. Its job is simple:
bad ideas die here
good ideas earn the next test
What Did Not Work
Raw ideas look very smart on charts. "Price reclaimed VWAP." Nice. "EMA support." Very nice. "Volume came in." Now it sounds like a trader with a podcast.
But when we replayed raw setups, many of them were just noise wearing a clean shirt. Backtests can lie with a straight face. They know the future, they do not pay the spread, they do not panic on a fast candle, and they do not get partial fills. Very convenient life.
So the first rule became:
Replay must only use what Ladder could know at that moment.
No future candle. No perfect exit. No fake fill. No "it would have worked if we entered at the best tick." That is not research. That is fan fiction.
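The rule above can be sketched as a bar-by-bar replay loop. This is a minimal sketch, not Ladder's actual engine: the `Bar` and `Trade` types, the `signal` interface, entry at the next bar's open, the half-spread cost on each side, and the stop-first tie-break are all assumptions chosen to illustrate "no future candle, no perfect exit, no fake fill". Partial fills are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float


@dataclass
class Trade:
    entry_price: float
    stop: float
    target: float
    exit_price: Optional[float] = None

    @property
    def r_multiple(self) -> float:
        # Result in units of initial risk (entry - stop) for a long trade.
        return (self.exit_price - self.entry_price) / (self.entry_price - self.stop)


# Hypothetical signal interface: it receives only bars that have already
# closed and returns (stop, target) for a long entry, or None for no trade.
Signal = Callable[[Sequence[Bar]], Optional[Tuple[float, float]]]


def replay(bars: Sequence[Bar], signal: Signal, spread: float) -> List[Trade]:
    trades: List[Trade] = []
    i = 0
    while i < len(bars) - 1:
        plan = signal(bars[: i + 1])  # no future candle: bars after i stay hidden
        if plan is None:
            i += 1
            continue
        stop, target = plan
        # No best-tick entry: buy at the NEXT bar's open and pay half the spread.
        entry = bars[i + 1].open + spread / 2
        if entry <= stop:  # price gapped through the stop before we could enter
            i += 1
            continue
        trade = Trade(entry, stop, target)
        j = i + 1
        while j < len(bars) and trade.exit_price is None:
            bar = bars[j]
            # If one bar touches both levels, the order inside the bar is unknown,
            # so assume the worse outcome and take the stop.
            if bar.low <= stop:
                # A gap below the stop fills at the open, not at the stop price.
                trade.exit_price = min(bar.open, stop) - spread / 2
            elif bar.high >= target:
                trade.exit_price = target - spread / 2
            j += 1
        if trade.exit_price is None:  # still open when the data ends
            trade.exit_price = bars[-1].close - spread / 2
        trades.append(trade)
        i = j - 1  # the exit bar has closed, so the next signal may see it
    return trades
```

The stop-first tie-break is deliberately pessimistic: when a replay cannot know what happened inside a candle, it should not award itself the better outcome.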
What Worked Better
The useful finding came from VWAP Reclaim. At first we tested the broad idea:
price crosses back above VWAP = possible long
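For reference, the broad rule can be written as a cross test against a running session VWAP (cumulative price times volume divided by cumulative volume). This is a sketch under assumptions: it uses the common (high + low + close) / 3 typical price, covers a single session, and counts any close above VWAP after a close at or below it as a raw reclaim. Ladder's exact definitions are not stated here.

```python
from typing import List, Sequence


def session_vwap(highs: Sequence[float], lows: Sequence[float],
                 closes: Sequence[float], volumes: Sequence[float]) -> List[float]:
    """Running VWAP for one session, using the typical price (high + low + close) / 3."""
    vwap: List[float] = []
    cum_pv = 0.0
    cum_v = 0.0
    for high, low, close, vol in zip(highs, lows, closes, volumes):
        cum_pv += (high + low + close) / 3 * vol
        cum_v += vol
        # Before any volume prints, fall back to the close.
        vwap.append(cum_pv / cum_v if cum_v > 0 else close)
    return vwap


def raw_vwap_reclaims(closes: Sequence[float], vwap: Sequence[float]) -> List[int]:
    """Bar indexes where the close moves from at/below VWAP to above it."""
    return [
        i for i in range(1, len(closes))
        if closes[i - 1] <= vwap[i - 1] and closes[i] > vwap[i]
    ]
```

Any dip and bounce across the line counts here, with no requirement about what happened before it.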
It worked a little, but not enough. Then we tightened the setup and asked for more structure before calling it a real trade:
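if reclaimed_vwap:                      # price is back above VWAP
    if held_below_vwap_first:           # after real pressure below it
        if ema_support_present:         # with EMA support underneath
            if stop_is_realistic and reward_risk_ok:   # stop room and acceptable reward/risk
                promote("qualified_vwap_reclaim")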
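One way to make the nested check concrete and replayable is to turn every condition into a number. Every threshold below is a hypothetical placeholder, not Ladder's actual rule: three closes below VWAP before the reclaim, "EMA support" meaning price above a flat-or-rising EMA, a stop between 0.1% and 1% of price, and at least 2:1 reward/risk. Entry is approximated by the signal bar's close.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical thresholds, for illustration only.
MIN_BARS_BELOW = 3      # "held below VWAP first"
MIN_STOP_PCT = 0.001    # stop not so tight that noise hits it
MAX_STOP_PCT = 0.01     # stop no wider than 1% of price
MIN_REWARD_RISK = 2.0


@dataclass
class Setup:
    closes: Sequence[float]  # closed bars only, oldest first
    vwap: Sequence[float]    # VWAP aligned with closes
    ema: Sequence[float]     # EMA aligned with closes (length is an assumption)
    stop: float
    target: float


def qualify(s: Setup) -> Optional[str]:
    n = MIN_BARS_BELOW
    if len(s.closes) < n + 1 or len(s.ema) < 2:
        return None
    price = s.closes[-1]
    reclaimed_vwap = s.closes[-2] <= s.vwap[-2] and price > s.vwap[-1]
    # The n closes before the reclaim bar must all sit below VWAP.
    held_below_vwap_first = all(
        c < v for c, v in zip(s.closes[-n - 1:-1], s.vwap[-n - 1:-1]))
    ema_support_present = price > s.ema[-1] and s.ema[-1] >= s.ema[-2]
    risk = price - s.stop
    stop_is_realistic = MIN_STOP_PCT * price <= risk <= MAX_STOP_PCT * price
    reward_risk_ok = risk > 0 and (s.target - price) / risk >= MIN_REWARD_RISK
    if (reclaimed_vwap and held_below_vwap_first and ema_support_present
            and stop_is_realistic and reward_risk_ok):
        return "qualified_vwap_reclaim"
    return None
```

Because each condition is an explicit threshold, a replay can change one number at a time and see what it does to the result.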
That changed the result. One replay report showed the shape very clearly:
Raw VWAP Reclaim: 402 trades, 39% win rate, +0.18 avg R
Qualified VWAP Reclaim: 50 trades, 58% win rate, +0.79 avg R
Holdout sample: 17 trades, 59% win rate, +0.85 avg R
This is the kind of finding I care about. Not "AI liked the chart." Not "the candle looked bullish." A repeatable method improved when we added structure.
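The three columns in a report like this can be computed from per-trade results expressed in R (multiples of initial risk). A minimal sketch, assuming a "win" is any trade that closes above 0 R and that the holdout is the most recent slice of trades, kept aside until the rules are frozen. Neither definition is stated in the report itself.

```python
from typing import Dict, List, Sequence, Tuple


def summarize(r_multiples: Sequence[float]) -> Dict[str, float]:
    """Trades, win rate, and average R from per-trade results in R."""
    n = len(r_multiples)
    if n == 0:
        return {"trades": 0, "win_rate": 0.0, "avg_r": 0.0}
    wins = sum(1 for r in r_multiples if r > 0)  # assumption: a win is any result above 0 R
    return {"trades": n, "win_rate": wins / n, "avg_r": sum(r_multiples) / n}


def chronological_holdout(items: Sequence, holdout_fraction: float) -> Tuple[List, List]:
    """Split time-ordered trades; the newest slice is held out and not used for tuning."""
    n_hold = round(len(items) * holdout_fraction)
    cut = len(items) - n_hold
    return list(items[:cut]), list(items[cut:])
```

Splitting by time rather than at random keeps the holdout closer to what live trading will look like: data the rules have never seen, from after the period they were tuned on.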
The Big Lesson
A setup name is not a strategy. "VWAP Reclaim" is a label. "VWAP Reclaim after pressure below VWAP, with reclaim, support, stop room, and acceptable reward/risk" is closer to a strategy.
That difference matters. One is a sentence. The other can be tested.
Where AI Fits
Replay does not remove the AI. It keeps the AI honest.
The AI can still ask whether the move is early or late, whether the market is helping or fighting, whether volume is real, whether this is a trap, and whether the setup should be watched instead of traded. But the AI should not invent structure because a chart looks exciting.
That is how we get expensive poetry.
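A sketch of that division of labor, assuming a hypothetical `final_decision` step that combines the rule-based verdict with the AI's read. The AI can veto a qualified setup or downgrade it to "watch", but it cannot promote a setup the rules rejected. The names and the three-way trade/watch/skip outcome are illustrative, not Ladder's interface.

```python
from enum import Enum
from typing import Optional


class AiView(Enum):
    TRADE = "trade"
    WATCH = "watch"
    SKIP = "skip"


def final_decision(rule_verdict: Optional[str], ai_view: AiView) -> str:
    """The AI can downgrade a qualified setup, but never upgrade an unqualified one."""
    if rule_verdict is None:
        return "skip"   # no structure: an exciting chart is not enough
    if ai_view is AiView.SKIP:
        return "skip"   # e.g. the AI flags a trap or a late move
    if ai_view is AiView.WATCH:
        return "watch"
    return "trade"
```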
The Promotion Path
This is the path I want Ladder to follow:
idea
-> reason it should work
-> replay test
-> holdout replay
-> shadow-live tracking
-> tiny paper execution
-> bigger paper sample
-> real money only after explicit approval
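The path can be read as a one-way state machine. A minimal sketch, with hypothetical stage names and an assumed `approved` flag standing in for "explicit approval": a pass earns exactly one step, any failure retires the idea, and the last step never happens automatically.

```python
from enum import IntEnum


class Stage(IntEnum):
    RETIRED = -1
    IDEA = 0
    REASON = 1
    REPLAY = 2
    HOLDOUT_REPLAY = 3
    SHADOW_LIVE = 4
    TINY_PAPER = 5
    BIGGER_PAPER = 6
    REAL_MONEY = 7


def advance(stage: Stage, passed: bool, approved: bool = False) -> Stage:
    if stage is Stage.RETIRED:
        return stage
    if not passed:
        return Stage.RETIRED              # bad ideas die here
    if stage is Stage.REAL_MONEY:
        return stage                      # top of the path
    nxt = Stage(stage + 1)                # good ideas earn only the next test
    if nxt is Stage.REAL_MONEY and not approved:
        return stage                      # passing paper is not enough without sign-off
    return nxt
```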
Slow? Yes. But markets punish excitement. Replay is how we stay ambitious without being dumb about it.
Verdict
Replay Lab works. Not because it proves we will make money, but because it tells us which ideas are not worth live damage.
The main finding so far is simple:
raw signal = too noisy
qualified structure = much better
That is a real step forward. Not the finish line, but finally something we can build on.
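One way to put "not the finish line" into numbers is a confidence interval on the win rate. This sketch uses the Wilson score interval, a standard choice for small samples; it is an added illustration, not something the replay report includes.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a win rate of wins / n."""
    if n == 0:
        return (0.0, 1.0)
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (center - half, center + half)
```

A 59% win rate on 17 holdout trades implies 10 wins (10/17 ≈ 58.8%), and `wilson_interval(10, 17)` comes out to roughly 0.36 to 0.78. Encouraging, but wide enough that more trades are needed before the number means much on its own.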