Backtest Expert
Systematic approach to backtesting trading strategies based on professional methodology that prioritizes robustness over optimistic results.
Core Philosophy
Goal: Find strategies that "break the least", not strategies that "profit the most" on paper.
Principle: Add friction, stress test assumptions, and see what survives. If a strategy holds up under pessimistic conditions, it's more likely to work in live trading.
When to Use This Skill
Use this skill when:
- Developing or validating systematic trading strategies
- Evaluating whether a trading idea is robust enough for live implementation
- Troubleshooting why a backtest might be misleading
- Learning proper backtesting methodology
- Avoiding common pitfalls (curve-fitting, look-ahead bias, survivorship bias)
- Assessing parameter sensitivity and regime dependence
- Setting realistic expectations for slippage and execution costs
Backtesting Workflow
1. State the Hypothesis
Define the edge in one sentence.
Example: "Stocks that gap up more than 3% on earnings and pull back to the previous day's close within the first hour provide a mean-reversion opportunity."
If you can't articulate the edge clearly, don't proceed to testing.
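To illustrate, the example hypothesis can be turned into a yes/no test. The Python sketch below rests on stated assumptions. The `Bar` type, the function name and the per-minute bar layout are hypothetical, and the "on earnings" condition is assumed to be filtered upstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bar:
    """Hypothetical intraday bar: minutes since the open plus the bar's range."""
    minute: int
    high: float
    low: float


def gap_up_pullback_signal(prev_close: float, today_open: float, bars: list[Bar],
                           min_gap: float = 0.03, window_minutes: int = 60) -> bool:
    """True if the open gapped up by more than min_gap and price traded back
    down to the previous close within the first window_minutes of the session."""
    gap = today_open / prev_close - 1.0
    if gap <= min_gap:
        return False
    # Only bars inside the first hour count; a later pullback does not qualify.
    return any(bar.low <= prev_close for bar in bars if bar.minute < window_minutes)
```

A pullback at minute 30 triggers the signal. The same pullback at minute 90 does not, because the rule fixes the window in advance.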
2. Codify Rules with Zero Discretion
Define with complete specificity:
- Entry: Exact conditions, timing, price type
- Exit: Stop loss, profit target, time-based exit
- Position sizing: Fixed dollar amount, % of portfolio, volatility-adjusted
- Filters: Market cap, volume, sector, volatility conditions
- Universe: Which instruments are eligible
Critical: No subjective judgment allowed. Every decision must be rule-based and unambiguous.
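One way to enforce zero discretion is to store every exit rule as an explicit parameter and make each exit decision a pure function of data. This minimal sketch assumes a long-only position checked bar by bar. `ExitRules` and `exit_reason` are hypothetical names. The tie-break (stop before target when both are touched in one bar) is an assumed pessimistic convention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExitRules:
    """Hypothetical exit specification; every value is fixed before testing."""
    stop_loss_pct: float      # 0.02 = exit if price trades 2% below entry
    profit_target_pct: float  # 0.04 = exit if price trades 4% above entry
    max_hold_minutes: int     # time-based exit


def exit_reason(rules: ExitRules, entry_price: float, bar_low: float,
                bar_high: float, minutes_held: int) -> Optional[str]:
    """Why a long position exits on this bar, or None to keep holding.

    If the stop and the target are both touched inside one bar, the stop is
    assumed to have filled first (the pessimistic tie-break)."""
    stop_price = entry_price * (1 - rules.stop_loss_pct)
    target_price = entry_price * (1 + rules.profit_target_pct)
    if bar_low <= stop_price:
        return "stop"
    if bar_high >= target_price:
        return "target"
    if minutes_held >= rules.max_hold_minutes:
        return "time"
    return None
```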
3. Run Initial Backtest
Test over:
- Minimum 5 years (preferably 10+)
- Multiple market regimes (bull, bear, high/low volatility)
- Realistic costs: Commissions + conservative slippage
Examine initial results for basic viability. If the strategy is fundamentally broken, iterate on the hypothesis.
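Below is a minimal sketch of charging realistic costs to every trade. It assumes a long trade, a flat commission per order and a fixed per-share slippage on each leg. All three are hypothetical parameters, and real cost models vary by broker and instrument.

```python
def net_trade_pnl(entry_price: float, exit_price: float, shares: int,
                  commission_per_order: float, slippage_per_share: float) -> float:
    """Dollar P&L of a long round trip after costs.

    Slippage worsens both legs (pay more on entry, receive less on exit),
    and a flat commission is charged on each of the two orders."""
    fill_in = entry_price + slippage_per_share
    fill_out = exit_price - slippage_per_share
    gross = (fill_out - fill_in) * shares
    return gross - 2 * commission_per_order


# A $1.00 move on 100 shares nets about $94 after $0.02/share slippage and $1 per order.
print(net_trade_pnl(50.0, 51.0, 100, 1.0, 0.02))
```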
4. Stress Test the Strategy
This is where 80% of testing time should be spent.
Parameter sensitivity:
- Test stop loss at 50%, 75%, 100%, 125%, 150% of baseline
- Test profit target at 80%, 90%, 100%, 110%, 120% of baseline
- Vary entry/exit timing by ±15-30 minutes
- Look for "plateaus" of stable performance, not narrow spikes
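The parameter variations listed above can be generated mechanically so that no combination is skipped. This sketch only enumerates parameter sets. Running each set through a backtest is left to whatever engine you use, and no specific API is assumed. The function name and the rounding are illustrative.

```python
from itertools import product

STOP_MULTIPLIERS = [0.50, 0.75, 1.00, 1.25, 1.50]
TARGET_MULTIPLIERS = [0.80, 0.90, 1.00, 1.10, 1.20]
TIMING_SHIFTS_MIN = [-30, -15, 0, 15, 30]


def sensitivity_grid(base_stop: float, base_target: float):
    """Yield every (stop, target, timing shift in minutes) combination around the baseline."""
    for s, t, dt in product(STOP_MULTIPLIERS, TARGET_MULTIPLIERS, TIMING_SHIFTS_MIN):
        # Rounding keeps values readable, e.g. 0.04 * 0.8 -> 0.032.
        yield round(base_stop * s, 6), round(base_target * t, 6), dt
```

With a 2% baseline stop and a 4% baseline target this produces 125 runs. Reviewing all of them, not only the best, is what reveals a plateau or a spike.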
Execution friction:
- Increase slippage to 1.5-2x typical estimates
- Model worst-case fills (buy at ask+1 tick, sell at bid-1 tick)
- Add realistic order rejection scenarios
- Test with pessimistic commission structures
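Below is a sketch of the worst-case fill rule. It assumes quotes with a known bid, ask and tick size. `pessimistic_fill` and its parameters are hypothetical, and stacking the extra tick on top of inflated slippage is a deliberately harsh assumption.

```python
def pessimistic_fill(side: str, bid: float, ask: float, tick: float,
                     typical_slippage: float, slippage_mult: float = 2.0) -> float:
    """Worst-case fill price for a stress test.

    Buys pay the ask plus one tick, sells receive the bid minus one tick,
    and typical slippage is inflated by slippage_mult (1.5-2x in the text)."""
    extra = typical_slippage * slippage_mult
    if side == "buy":
        return ask + tick + extra
    if side == "sell":
        return bid - tick - extra
    raise ValueError(f"unknown side: {side!r}")
```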
Time robustness:
- Analyze year-by-year performance
- Require positive expectancy in the majority of years
- Ensure strategy doesn't rely on 1-2 exceptional periods
- Test in different market regimes separately
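The year-by-year checks can be sketched as follows, assuming trades are given as (exit date, dollar P&L) pairs. The helper names are hypothetical. `top_years_profit_share` is one illustrative way to quantify reliance on one or two exceptional years, not a standard metric.

```python
from collections import defaultdict
from datetime import date


def yearly_pnl(trades: list[tuple[date, float]]) -> dict[int, float]:
    """Sum trade P&L by calendar year of each trade's exit date."""
    by_year: dict[int, float] = defaultdict(float)
    for exit_date, pnl in trades:
        by_year[exit_date.year] += pnl
    return dict(sorted(by_year.items()))


def time_robustness(by_year: dict[int, float], top_n: int = 2) -> dict[str, float]:
    """Share of profitable years, and share of total profit from the top_n best years.

    A profit share above 1.0 means the remaining years lose money overall,
    i.e. the result leans on a few exceptional periods."""
    total = sum(by_year.values())
    positive_share = sum(p > 0 for p in by_year.values()) / len(by_year)
    best = sum(sorted(by_year.values(), reverse=True)[:top_n])
    # With no net profit there is nothing to attribute; treat as maximally concentrated.
    concentration = best / total if total > 0 else float("inf")
    return {"positive_year_share": positive_share, "top_years_profit_share": concentration}
```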
Sample size:
- Absolute minimum: 30 trades
- Preferred: 100+ trades
- High confidence: 200+ trades
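Trade counts are only a proxy. A t-statistic on per-trade returns shows more directly whether the mean is distinguishable from zero, and a common rule of thumb treats |t| above about 2 as evidence against pure luck. This sketch assumes per-trade returns are roughly independent. The tier labels are hypothetical and simply mirror the thresholds above.

```python
import math
import statistics


def trade_t_stat(returns: list[float]) -> float:
    """t-statistic of the mean per-trade return against zero:
    mean / (sample standard deviation / sqrt(n))."""
    if len(returns) < 2:
        raise ValueError("need at least two trades")
    return statistics.mean(returns) / (statistics.stdev(returns) / math.sqrt(len(returns)))


def sample_size_tier(n_trades: int) -> str:
    """Map a trade count onto the 30 / 100 / 200 thresholds."""
    if n_trades < 30:
        return "insufficient"
    if n_trades < 100:
        return "minimum"
    if n_trades < 200:
        return "preferred"
    return "high confidence"
```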
5. Out-of-Sample Validation
Walk-forward analysis:
1. Optimize on the training period (e.g., Years 1-3)
2. Test on the validation period (Year 4)
3. Roll forward and repeat
4. Compare in-sample vs. out-of-sample performance
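The rolling schedule can be generated as follows. The sketch assumes whole calendar years and the 3-year train / 1-year test split from the example, and the function name is hypothetical.

```python
def walk_forward_windows(first_year: int, last_year: int,
                         train_years: int = 3, test_years: int = 1):
    """Yield (train_range, test_range) pairs of years, rolling forward by
    test_years each step; a test year never appears in its own training set."""
    start = first_year
    while start + train_years + test_years - 1 <= last_year:
        train = range(start, start + train_years)
        test = range(start + train_years, start + train_years + test_years)
        yield train, test
        start += test_years
```

For 2010-2015 this yields three windows: train 2010-2012 and test 2013, then test 2014, then test 2015.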
Warning signs:
- Out-of-sample performance below 50% of in-sample performance
- Parameters need frequent re-optimization
- Parameters change dramatically between periods
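Two of these warning signs are easy to check automatically. The sketch assumes one performance number per walk-forward window and a single optimized parameter. The 50% relative-change threshold for a "dramatic" parameter shift is an assumption, not a rule from this methodology, and the function name is hypothetical.

```python
def walk_forward_warnings(is_perf: list[float], oos_perf: list[float],
                          params: list[float], max_rel_change: float = 0.5) -> list[str]:
    """Flag weak out-of-sample results and unstable parameters.

    is_perf / oos_perf: one performance number per window (e.g. net return).
    params: the optimized value of one parameter chosen in each window."""
    warnings = []
    is_mean = sum(is_perf) / len(is_perf)
    oos_mean = sum(oos_perf) / len(oos_perf)
    if is_mean > 0 and oos_mean < 0.5 * is_mean:
        warnings.append("out-of-sample below 50% of in-sample")
    # Compare each window's chosen parameter with the previous window's choice.
    for prev, cur in zip(params, params[1:]):
        if prev != 0 and abs(cur - prev) / abs(prev) > max_rel_change:
            warnings.append(f"parameter jumped from {prev} to {cur}")
    return warnings
```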
6. Evaluate Results
Questions to answer:
- Does the edge survive pessimistic assumptions?
- Is performance stable across parameter variations?
- Does the strategy work in multiple market regimes?
- Is sample size sufficient for statistical confidence?
- Are results realistic, not "too good to be true"?
Decision criteria:
- ✅ Deploy: Survives all stress tests with acceptable performance
- 🔄 Refine: Core logic sound but needs parameter adjustment
- ❌ Abandon: Fails stress tests or relies on fragile assumptions
Key Testing Principles
Punish the Strategy
Add friction everywhere:
- Commissions higher than reality
- Slippage 1.5-2x typical
- Worst-case fills
- Order rejections
- Partial fills
Rationale: Strategies that survive pessimistic assumptions often outperform in live trading.
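Order rejections and partial fills can be injected with a seeded random generator so that runs stay reproducible. The 5% rejection probability and the 50-100% fill range below are illustrative assumptions, and `simulate_order` is a hypothetical helper.

```python
import random


def simulate_order(shares: int, rng: random.Random,
                   reject_prob: float = 0.05, min_fill_frac: float = 0.5) -> int:
    """Shares filled under pessimistic execution assumptions.

    With probability reject_prob the order is rejected outright (0 filled);
    otherwise a random fraction between min_fill_frac and 1.0 of it fills."""
    if rng.random() < reject_prob:
        return 0
    frac = rng.uniform(min_fill_frac, 1.0)
    return int(shares * frac)  # round down: never assume the extra share filled


rng = random.Random(42)  # fixed seed so every rerun sees the same fills
fills = [simulate_order(100, rng) for _ in range(10)]
```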
Seek Plateaus, Not Peaks
Look for parameter ranges where performance is stable, not optimal values that create performance spikes.
Good: The strategy is profitable with a stop loss anywhere from 1.5% to 3.0%
Bad: The strategy only works with a stop loss of exactly 2.13%
Stable performance indicates genuine edge; narrow optima suggest curve-fitting.
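A simple plateau check looks for the longest run of adjacent parameter values that are all profitable. This sketch assumes a one-dimensional sweep with parameters sorted in ascending order, and the function name is hypothetical.

```python
from typing import Optional


def longest_profitable_range(params: list[float],
                             results: list[float]) -> Optional[tuple[float, float]]:
    """(low, high) of the longest run of adjacent parameter values that are all
    profitable, or None if no value is. params must be sorted ascending."""
    if len(params) != len(results):
        raise ValueError("params and results must have the same length")
    best = None        # (start_index, end_index) of the longest run so far
    run_start = None   # start index of the current profitable run
    for i, result in enumerate(results):
        if result > 0:
            if run_start is None:
                run_start = i
            if best is None or i - run_start > best[1] - best[0]:
                best = (run_start, i)
        else:
            run_start = None
    return None if best is None else (params[best[0]], params[best[1]])
```

A wide range such as (0.015, 0.030) suggests a plateau. A range that collapses to a single value, such as (0.020, 0.020), is the narrow spike that points to curve-fitting.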
Test All Cases, Not Cherry-Picked Examples
Wrong approach: Study hand-picked "market leaders" that worked
Right approach: Test every stock that met criteria, including those that failed
Selective examples create survivorship bias and overestimate strategy quality.
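Survivorship bias often enters through the universe itself: screening today's listed stocks silently drops every stock that was delisted. A point-in-time universe avoids this. The sketch assumes you have listing and delisting dates for each symbol. `Listing` and `universe_as_of` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Listing:
    """Hypothetical listing record for one symbol."""
    symbol: str
    listed: date
    delisted: Optional[date] = None  # None means the symbol still trades


def universe_as_of(listings: list[Listing], as_of: date) -> list[str]:
    """Symbols that were tradable on as_of, including ones delisted later.

    Screening only today's survivors would silently drop every later failure."""
    return sorted(
        item.symbol
        for item in listings
        if item.listed <= as_of and (item.delisted is None or item.delisted > as_of)
    )
```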
Separate Idea Generation from Validation
Intuition: Useful for generating hypotheses
Validation: Must be purely data-driven
Never let attachment to an idea influence interpretation of test results.
Common Failure Patterns
Recognize these patterns early to save time:
1. Parameter sensitivity: Only works with exact parameter values
2. Regime-specific: Great in some years, terrible in others
3. Slippage sensitivity: Unprofitable when realistic costs added
4. Small sample: Too few trades for statistical confidence
5. Look-ahead bias: "Too good to be true" results
6. Over-optimization: Many parameters, poor out-of-sample results
See references/failed_tests.md for detailed examples and diagnostic framework.
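Look-ahead bias frequently comes from an off-by-one error, where a signal computed from a bar's close is credited with that same bar's return. This sketch applies the signal from bar t-1 to the return of bar t. The moving-average signal is hypothetical and chosen only for illustration.

```python
def signals(closes: list[float], lookback: int = 3) -> list[int]:
    """1 if close[t] is above the mean of the previous `lookback` closes, else 0.
    Each signal uses data up to and including bar t only."""
    out = []
    for t in range(len(closes)):
        window = closes[max(0, t - lookback):t]
        out.append(int(bool(window) and closes[t] > sum(window) / len(window)))
    return out


def strategy_returns(closes: list[float], sig: list[int]) -> list[float]:
    """Return earned from bar t-1 to bar t using the signal known at bar t-1.
    Using sig[t] here would trade on a close that had not happened yet."""
    return [sig[t - 1] * (closes[t] / closes[t - 1] - 1) for t in range(1, len(closes))]
```

Swapping `sig[t - 1]` for `sig[t]` inflates returns, because the position then reacts to the same close it profits from.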
Available Reference Documentation
Methodology Reference
File: references/methodology.md
When to read: For detailed guidance on specific testing techniques.
Contents:
- Stress testing methods
- Parameter sensitivity analysis
- Slippage and friction modeling
- Sample size requirements
- Market regime classification
- Common biases and pitfalls (survivorship, look-ahead, curve-fitting, etc.)
Failed Tests Reference
File: references/failed_tests.md
When to read: When a strategy fails tests, or when learning from past mistakes.
Contents:
- Why failures are valuable
- Common failure patterns with examples
- Case study documentation framework
- Red flags checklist for evaluating backtests
Critical Reminders
Time allocation: Spend 20% generating ideas, 80% trying to break them.
Context-free requirement: If a strategy requires "perfect context" to work, it is not robust enough for systematic trading.
Red flag: If backtest results look too good (>90% win rate, minimal drawdowns, perfect timing), audit carefully for look-ahead bias or data issues.
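Part of this audit can be automated. The sketch below computes the win rate and the maximum drawdown of an equity curve. The 90% win-rate threshold comes from the red-flag rule, but the 2% "minimal drawdown" cutoff is an assumed placeholder that you should set for your own strategy. Both function names are hypothetical.

```python
def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def too_good_flags(trade_pnls: list[float], equity: list[float],
                   max_win_rate: float = 0.90, min_drawdown: float = 0.02) -> list[str]:
    """Red flags that call for a look-ahead or data audit (thresholds are assumptions)."""
    flags = []
    win_rate = sum(p > 0 for p in trade_pnls) / len(trade_pnls)
    if win_rate > max_win_rate:
        flags.append(f"win rate {win_rate:.0%}")
    dd = max_drawdown(equity)
    if dd < min_drawdown:
        flags.append(f"max drawdown only {dd:.1%}")
    return flags
```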
Tool limitations: Understand your backtesting platform's quirks (interpolation methods, handling of low liquidity, data alignment issues).
Statistical significance: Small edges require large sample sizes to prove. For example, a 5% edge per trade needs 100+ trades to distinguish from luck; the exact number depends on how volatile individual trade returns are.
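A normal approximation makes the link between edge size and sample size explicit: the standard error of the mean trade shrinks with the square root of the trade count. This sketch assumes independent trades and a known per-trade standard deviation. `trades_needed` is a hypothetical helper, and z = 1.96 corresponds to roughly 95% two-sided confidence.

```python
import math


def trades_needed(edge: float, stdev: float, z: float = 1.96) -> int:
    """Approximate number of trades before the mean per-trade return `edge`
    sits z standard errors above zero (normal approximation, independent trades).

    Standard error of the mean = stdev / sqrt(n), so we need
    z * stdev / sqrt(n) <= edge, i.e. n >= (z * stdev / edge) ** 2."""
    if edge <= 0 or stdev <= 0:
        raise ValueError("edge and stdev must be positive")
    return math.ceil((z * stdev / edge) ** 2)
```

For example, `trades_needed(0.05, 0.25)` returns 97. With a 25% per-trade standard deviation, a 5% edge therefore needs roughly 100 trades. Because n scales with 1/edge², halving the edge roughly quadruples the requirement.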
Discretionary vs Systematic Differences
This skill focuses on systematic/quantitative backtesting where:
- All rules are codified in advance
- No discretion or "feel" in execution
- Testing happens on all historical examples, not cherry-picked cases
- Context (news, macro) is deliberately stripped out
Discretionary traders study differently, so this skill may not apply to setups that require subjective judgment.
Skill name: backtest-expert-zc