Blind Data Testing
Blind data testing is the closest approximation to live trading that backtesting can provide. You develop and validate your strategy on one dataset, then test it exactly once on completely unseen data.
Why blind data testing exists
Walk-forward testing provides strong validation by testing on out-of-sample (OOS) windows throughout the dataset. But those OOS windows are “seen” in the sense that the researcher knows the approximate market conditions in each period. Blind data testing goes further by using data the researcher has genuinely never analyzed.
Critically: you run the blind test exactly once. If you adjust parameters after seeing the result and re-run, the blind dataset is no longer blind, because it has become part of your optimization process.
The correct workflow
1. Choose your datasets before starting
Decide in advance which data will be Dataset A (development) and Dataset B (blind test). Common split: Dataset A = 2010–2022, Dataset B = 2023–present. Do not look at Dataset B statistics before locking your strategy.
2. Fully develop and validate on Dataset A
Run all your backtests, walk-forward validation, and Monte Carlo simulation on Dataset A. Iterate and optimize until you are satisfied. This is your development set, so use it freely.
3. Lock your parameters
Choose the final parameter set from your Dataset A analysis. Write it down, save it, and commit to not changing it after the blind test.
4. Run Dataset B once
Apply your locked parameters to Dataset B. Run it once. Do not re-run with any adjustments.
5. Interpret the result
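Compare Dataset B performance to your Dataset A walk-forward OOS performance. A shortfall of 30–50% relative to the walk-forward OOS figure is acceptable, since some degradation on unseen data is expected. If Dataset B delivers less than 50% of the expected performance, the strategy may not generalize well.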
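The comparison in the last step can be sketched as a simple ratio check. This is a minimal illustration, not a prescribed method: the function name `interpret_blind_result` is hypothetical, the metric is assumed to be one where higher is better (for example annualized return), and the 50% cutoff reflects one reading of the guideline above, namely that Dataset B should retain at least half of the walk-forward OOS figure.

```python
def interpret_blind_result(oos_metric: float, blind_metric: float) -> str:
    """Classify a blind-test result against the walk-forward OOS baseline.

    Both arguments are the same performance metric, where higher is better.
    The baseline must be positive so that the ratio is meaningful.
    """
    if oos_metric <= 0:
        raise ValueError("baseline OOS metric must be positive to form a ratio")

    # Fraction of the walk-forward OOS performance retained on Dataset B.
    retained = blind_metric / oos_metric

    # Assumed threshold: keeping at least 50% of expectations is acceptable.
    if retained >= 0.5:
        return "acceptable"
    return "may not generalize"


# Illustrative numbers only: a 12% OOS return against 7% and 4% blind returns.
verdict_ok = interpret_blind_result(12.0, 7.0)
verdict_weak = interpret_blind_result(12.0, 4.0)
```

The check deliberately refuses a non-positive baseline, because a ratio against zero or negative expected performance says nothing useful about generalization.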
Dataset selection guidelines
Time-based split
Most common approach. Use earlier data for Dataset A, recent data for Dataset B. Ensures Dataset B includes current market conditions.
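A time-based split might look like the sketch below. It assumes the price history is a list of `(date, value)` tuples; the helper name `time_split`, the cutoff date, and the sample rows are placeholders for illustration, not real market data.

```python
from datetime import date


def time_split(rows, cutoff):
    """Split (date, value) rows into Dataset A (before cutoff) and Dataset B (cutoff onward)."""
    dataset_a = [row for row in rows if row[0] < cutoff]
    dataset_b = [row for row in rows if row[0] >= cutoff]
    return dataset_a, dataset_b


# Placeholder rows; the values are made up for the example.
rows = [
    (date(2021, 6, 1), 100.0),
    (date(2022, 12, 30), 104.5),
    (date(2023, 1, 3), 103.2),
    (date(2024, 2, 15), 110.8),
]

# Everything from 2023 onward is reserved as the blind set.
dataset_a, dataset_b = time_split(rows, cutoff=date(2023, 1, 1))
```

Because the cutoff is a single date, no Dataset B row can leak into the development set as long as the split is done before any analysis begins.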
Asset-based split
Develop on one set of tickers, test on a different set of tickers in the same universe. Useful for testing whether a strategy generalizes across securities.
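One way to make an asset-based split reproducible is to shuffle the universe with a fixed seed and hold back a fraction of tickers. The sketch below is only an assumption about how this could be done: `asset_split`, the 30% blind fraction, the seed, and the `TICK00`-style names are all hypothetical.

```python
import random


def asset_split(tickers, blind_fraction=0.3, seed=42):
    """Assign tickers to a development set and a blind set, reproducibly."""
    # Sort first so the result does not depend on input order.
    ordered = sorted(set(tickers))
    rng = random.Random(seed)
    rng.shuffle(ordered)

    # Hold back at least one ticker for the blind set.
    n_blind = max(1, round(len(ordered) * blind_fraction))
    blind = sorted(ordered[:n_blind])
    development = sorted(ordered[n_blind:])
    return development, blind


# Placeholder universe of ten tickers.
universe = [f"TICK{i:02d}" for i in range(10)]
development, blind = asset_split(universe)
```

Fixing the seed before looking at any per-ticker results keeps the assignment from being influenced by which securities happen to backtest well.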
Minimum Dataset B size
Dataset B should be long enough to generate at least 20–30 trades. A 3-month blind dataset on a strategy that trades weekly yields only about 13 trades, which is too few to be meaningful.
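A rough way to check the size of Dataset B before committing to it is to estimate the trade count from the strategy's typical frequency. The helper names and the default minimum of 20 trades below are assumptions chosen to match the lower bound of the guideline above.

```python
def expected_trades(period_days: int, trades_per_week: float) -> float:
    """Estimate how many trades a strategy generates over a calendar period."""
    return period_days / 7 * trades_per_week


def is_long_enough(period_days: int, trades_per_week: float, min_trades: int = 20) -> bool:
    """Return True if the period is expected to produce at least min_trades trades."""
    return expected_trades(period_days, trades_per_week) >= min_trades


# About 3 months (91 days) of a weekly strategy versus a full year.
three_month_trades = expected_trades(91, 1)
three_months_ok = is_long_enough(91, 1)
one_year_ok = is_long_enough(365, 1)
```

The estimate uses the trade frequency observed on Dataset A, so it can be computed without looking at Dataset B at all.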
Interpreting results
Important: the one-run rule
The integrity of blind data testing depends entirely on running it once with locked parameters. If you see poor Dataset B results and then adjust your strategy and re-test on Dataset B, you are now optimizing on what was supposed to be your blind set. The result is no longer a blind test. If you want to investigate and refine after a failed blind test, treat the current Dataset B as a new Dataset A and find a genuinely new dataset for the next blind test.
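The one-run rule can also be enforced mechanically rather than by discipline alone. The sketch below records a fingerprint of the locked parameters in a ledger file and refuses any second run against the same ledger. Everything here is hypothetical: `run_blind_once`, the ledger file name, the parameter names, and the stand-in backtest function are illustrative and do not belong to any particular backtesting tool.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def params_fingerprint(params: dict) -> str:
    """Hash the locked parameters so later changes are detectable."""
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_blind_once(params: dict, run_backtest, ledger_path) -> dict:
    """Run the blind test only if the ledger shows Dataset B is still unused."""
    ledger = Path(ledger_path)
    if ledger.exists():
        raise RuntimeError("Dataset B has already been used; find a new blind dataset")

    result = run_backtest(params)  # result is assumed to be JSON-serializable
    record = {"params_sha256": params_fingerprint(params), "result": result}
    ledger.write_text(json.dumps(record))
    return result


def fake_backtest(params: dict) -> dict:
    """Stand-in for a real backtest; returns a made-up metric."""
    return {"sharpe": 0.9}


with tempfile.TemporaryDirectory() as tmp:
    ledger_file = Path(tmp) / "dataset_b_ledger.json"
    locked_params = {"fast_ma": 10, "slow_ma": 50}  # hypothetical parameters

    first_result = run_blind_once(locked_params, fake_backtest, ledger_file)

    # A second attempt is rejected, even with identical parameters.
    try:
        run_blind_once(locked_params, fake_backtest, ledger_file)
        second_run_blocked = False
    except RuntimeError:
        second_run_blocked = True
```

Storing the parameter hash alongside the result also gives you a record of exactly which configuration the blind result belongs to, which is useful if the dataset is later reclassified as a new Dataset A.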