Blind Data Testing

Blind data testing is the closest approximation to live trading that backtesting can provide. You develop and validate your strategy on one dataset, then test it exactly once on completely unseen data.

Why blind data testing exists

Walk-forward testing provides strong validation by testing on out-of-sample (OOS) windows throughout the dataset. But those OOS windows are “seen” in the sense that the researcher knows the approximate market conditions in each period. Blind data testing goes further by using data the researcher has genuinely never analyzed.

Critically: you run the blind test exactly once. If you adjust parameters after seeing the result and re-run, the blind dataset is no longer blind — it has become part of your optimization process.

The correct workflow

  1. Choose your datasets before starting

    Decide in advance which data will be Dataset A (development) and Dataset B (blind test). Common split: Dataset A = 2010–2022, Dataset B = 2023–present. Do not look at Dataset B statistics before locking your strategy.

  2. Fully develop and validate on Dataset A

    Run all your backtests, walk-forward validation, and Monte Carlo simulation on Dataset A. Iterate and optimize until you are satisfied. This is your development set — use it freely.

  3. Lock your parameters

    Choose the final parameter set from your Dataset A analysis. Write it down, save it, and commit to not changing it after the blind test.

  4. Run Dataset B once

    Apply your locked parameters to Dataset B. Run it once. Do not re-run with any adjustments.

  5. Interpret the result

    Compare Dataset B performance to your Dataset A walk-forward OOS performance. Dataset B performance of at least 50% of the walk-forward OOS figure is acceptable. Below 50% of expectations suggests the strategy may not generalize well.
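
One way to make the "lock your parameters" step verifiable is to record a fingerprint of the final parameter set before touching Dataset B. The following is a minimal sketch, not a prescribed method: the function names and the example parameter names (`lookback`, `stop_loss`) are hypothetical, and it assumes the parameters are JSON-serializable.

```python
import hashlib
import json


def lock_parameters(params: dict) -> str:
    """Return a SHA-256 fingerprint of a canonical JSON encoding of params.

    Keys are sorted so the fingerprint does not depend on dict ordering.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_locked(params: dict, fingerprint: str) -> bool:
    """Check that params still match the fingerprint recorded at lock time."""
    return lock_parameters(params) == fingerprint


# Hypothetical parameter set chosen from Dataset A analysis.
final_params = {"lookback": 20, "stop_loss": 0.02}
locked_fingerprint = lock_parameters(final_params)

# Immediately before the blind run, confirm nothing has changed.
assert verify_locked(final_params, locked_fingerprint)
```

Saving the fingerprint somewhere you cannot quietly edit (a commit message, a dated note) makes any later parameter change detectable.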

Dataset selection guidelines

Time-based split

The most common approach. Use earlier data for Dataset A and the most recent data for Dataset B, so that Dataset B reflects current market conditions.
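
A time-based split can be sketched as a single cutoff date. This example assumes the data is a list of `(date, price)` tuples and uses a cutoff of 1 January 2023 to mirror the 2010–2022 / 2023–present split mentioned earlier; the function name and row layout are illustrative assumptions.

```python
from datetime import date


def time_split(rows, cutoff):
    """Split (date, value) rows into Dataset A (before cutoff) and Dataset B (on or after)."""
    dataset_a = [row for row in rows if row[0] < cutoff]
    dataset_b = [row for row in rows if row[0] >= cutoff]
    return dataset_a, dataset_b


# Hypothetical price rows, for illustration only.
rows = [
    (date(2015, 6, 1), 50.0),
    (date(2022, 12, 30), 100.0),
    (date(2023, 1, 3), 101.0),
]
dataset_a, dataset_b = time_split(rows, cutoff=date(2023, 1, 1))
```

Fixing the cutoff before any analysis, and never loading `dataset_b` during development, is what keeps the later test blind.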

Asset-based split

Develop on one set of tickers, test on a different set of tickers in the same universe. Useful for testing whether a strategy generalizes across securities.
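
An asset-based split can be made reproducible by shuffling the ticker list with a fixed seed. The sketch below is one possible approach under stated assumptions: the 30% blind fraction, the seed, and the ticker symbols are arbitrary illustrative choices, not recommendations from this guide.

```python
import random


def asset_split(tickers, blind_fraction=0.3, seed=42):
    """Deterministically split tickers into development (A) and blind (B) sets."""
    unique = sorted(set(tickers))
    if len(unique) < 2:
        raise ValueError("need at least two distinct tickers to split")
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    rng.shuffle(unique)
    # Keep at least one ticker on each side of the split.
    n_blind = min(len(unique) - 1, max(1, round(len(unique) * blind_fraction)))
    dataset_b = sorted(unique[:n_blind])
    dataset_a = sorted(unique[n_blind:])
    return dataset_a, dataset_b


# Hypothetical universe of ten tickers.
universe = ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF", "GGG", "HHH", "III", "JJJ"]
dev_tickers, blind_tickers = asset_split(universe)
```

Choosing the blind tickers at random, rather than by hand, avoids unconsciously picking securities whose behavior you already know.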

Minimum Dataset B size

Dataset B should be long enough to generate at least 20–30 trades. A 3-month blind dataset on a strategy that trades weekly yields only about 13 trades, which may not be enough to be meaningful.
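
The arithmetic behind this guideline is simple enough to check before choosing a split. The sketch below assumes trades arrive at a roughly constant rate, which is a simplification; the function names are illustrative.

```python
def expected_trades(trades_per_year: float, months: float) -> float:
    """Approximate trade count over a period, assuming a constant trade rate."""
    return trades_per_year * months / 12.0


def months_needed(trades_per_year: float, min_trades: float) -> float:
    """Approximate Dataset B length (in months) needed to reach min_trades."""
    return 12.0 * min_trades / trades_per_year


# A weekly strategy (about 52 trades per year) over a 3-month blind set.
weekly_3_months = expected_trades(52, 3)    # 13.0 trades

# Length needed for the upper end of the 20-30 trade guideline.
weekly_for_30 = months_needed(52, 30)       # just under 7 months
```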

Interpreting results

Strong: Dataset B Sharpe ≥ 80% of Dataset A walk-forward OOS
Acceptable: Dataset B Sharpe 50–80% of Dataset A walk-forward OOS
Investigate further: Dataset B Sharpe 30–50% of Dataset A walk-forward OOS
Red flag: Dataset B Sharpe < 30% of Dataset A walk-forward OOS
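
These bands can be expressed as a small classification helper. This is a sketch under two assumptions: the Dataset A walk-forward OOS Sharpe is positive (the ratio is not meaningful otherwise), and a value exactly on a boundary falls into the higher band. The function name is hypothetical.

```python
def classify_blind_result(sharpe_b: float, sharpe_a_oos: float) -> str:
    """Map the Dataset B / Dataset A walk-forward OOS Sharpe ratio to a rating band."""
    if sharpe_a_oos <= 0:
        raise ValueError("Dataset A walk-forward OOS Sharpe must be positive")
    ratio = sharpe_b / sharpe_a_oos
    if ratio >= 0.8:
        return "Strong"
    if ratio >= 0.5:
        return "Acceptable"
    if ratio >= 0.3:
        return "Investigate further"
    return "Red flag"


# Illustrative numbers only: a blind Sharpe of 0.6 against an OOS Sharpe of 1.0.
rating = classify_blind_result(0.6, 1.0)  # "Acceptable"
```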

Important: the one-run rule

The integrity of blind data testing depends entirely on running it once with locked parameters. If you see poor Dataset B results and then adjust your strategy and re-test on Dataset B, you are now optimizing on what was supposed to be your blind set. The result is no longer a blind test. If you want to investigate and refine after a failed blind test, treat the current Dataset B as a new Dataset A and find a genuinely new dataset for the next blind test.
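
The one-run rule can also be enforced mechanically rather than by discipline alone. The following sketch uses a marker file created in exclusive mode, so a second attempt fails before any backtest code runs. The marker path, exception name, and `run_fn` callable are all hypothetical; the guard only helps if you do not delete the marker.

```python
class BlindTestAlreadyRun(RuntimeError):
    """Raised when a blind test is attempted a second time."""


def run_blind_test_once(marker_path, run_fn):
    """Run run_fn() only if no marker file exists yet at marker_path.

    The marker is created with mode "x", which fails if the file already exists.
    It is written before the run, so even a crashed run counts as the one attempt.
    """
    try:
        with open(marker_path, "x") as marker:
            marker.write("blind test started\n")
    except FileExistsError:
        raise BlindTestAlreadyRun(
            f"blind test already run (marker: {marker_path}); "
            "use a genuinely new dataset for the next blind test"
        ) from None
    return run_fn()
```

A typical call might look like `run_blind_test_once("dataset_b.lock", lambda: backtest(dataset_b, final_params))`, where `backtest` stands in for whatever backtest function you already have.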