Backtest Expert

A systematic approach to backtesting trading strategies, based on a professional methodology that prioritizes robustness over optimistic results.

Core Philosophy

Goal: Find strategies that "break the least", not strategies that "profit the most" on paper.

Principle: Add friction, stress test assumptions, and see what survives. If a strategy holds up under pessimistic conditions, it's more likely to work in live trading.

When to Use This Skill

Use this skill when:

- Developing or validating systematic trading strategies
- Evaluating whether a trading idea is robust enough for live implementation
- Troubleshooting why a backtest might be misleading
- Learning proper backtesting methodology
- Avoiding common pitfalls (curve-fitting, look-ahead bias, survivorship bias)
- Assessing parameter sensitivity and regime dependence
- Setting realistic expectations for slippage and execution costs

Backtesting Workflow

1. State the Hypothesis

Define the edge in one sentence.

Example: "Stocks that gap up >3% on earnings and pull back to the previous day's close within the first hour offer a mean-reversion opportunity."

If you can't articulate the edge clearly, don't proceed to testing.

2. Codify Rules with Zero Discretion

Define with complete specificity:

- Entry: Exact conditions, timing, price type
- Exit: Stop loss, profit target, time-based exit
- Position sizing: Fixed dollar amount, % of portfolio, or volatility-adjusted
- Filters: Market cap, volume, sector, volatility conditions
- Universe: What instruments are eligible

Critical: No subjective judgment allowed. Every decision must be rule-based and unambiguous.
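One way to enforce zero discretion is to hold every rule in an immutable structure that the backtest reads from. The sketch below is illustrative only: the class name `StrategyRules`, the helper `entry_signal`, and every threshold value are hypothetical placeholders loosely based on the earnings-gap example above, not recommended settings.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class StrategyRules:
    """Each field is a fixed number or label, so nothing is left to judgment."""
    # Entry: exact condition, timing, price type (all values are assumptions)
    min_gap_pct: float = 3.0              # gap-up threshold, in percent
    entry_deadline: time = time(10, 30)   # pullback must happen by this time
    entry_price_type: str = "limit"       # hypothetical label for the order type
    # Exit: stop loss, profit target, time-based exit
    stop_loss_pct: float = 2.0
    profit_target_pct: float = 4.0
    time_exit: time = time(15, 55)
    # Position sizing: percent of portfolio per trade
    position_pct: float = 1.0
    # Filters / universe
    min_avg_volume: int = 1_000_000
    min_market_cap: float = 2e9


def entry_signal(rules: StrategyRules, gap_pct: float, low_since_open: float,
                 prev_close: float, now: time) -> bool:
    """Return True only when every codified entry condition holds."""
    return (gap_pct > rules.min_gap_pct
            and low_since_open <= prev_close   # price pulled back to prior close
            and now <= rules.entry_deadline)


rules = StrategyRules()
print(entry_signal(rules, gap_pct=3.5, low_since_open=99.8,
                   prev_close=100.0, now=time(10, 0)))
```

Because the dataclass is frozen, a rule cannot be adjusted mid-run; changing a parameter means creating a new, explicitly different rule set.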

3. Run Initial Backtest

Test over:

- Minimum 5 years (preferably 10+)
- Multiple market regimes (bull, bear, high/low volatility)
- Realistic costs: Commissions + conservative slippage

Examine initial results for basic viability. If fundamentally broken, iterate on hypothesis.
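As a rough illustration of how costs change basic viability, the sketch below subtracts round-trip commission and slippage from each trade's gross return. The function names and all numbers are made up for the example; costs are assumed to be charged once on entry and once on exit, as fractions of trade value.

```python
from typing import Sequence


def net_return(gross_return: float, commission_per_side: float,
               slippage_per_side: float) -> float:
    """Subtract round-trip costs (entry + exit) from a gross trade return.

    All inputs are fractions of trade value, e.g. 0.001 means 0.1%.
    """
    round_trip_cost = 2 * (commission_per_side + slippage_per_side)
    return gross_return - round_trip_cost


def expectancy(returns: Sequence[float]) -> float:
    """Average return per trade."""
    return sum(returns) / len(returns)


# Hypothetical gross trade returns, before any costs
gross = [0.012, -0.008, 0.015, -0.006, 0.004]
net = [net_return(r, commission_per_side=0.0005, slippage_per_side=0.001)
       for r in gross]

print(f"gross expectancy: {expectancy(gross):.4%}")
print(f"net expectancy:   {expectancy(net):.4%}")
```

In this toy data most of the per-trade edge disappears once costs are applied, which is the kind of result that should send the hypothesis back for iteration.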

4. Stress Test the Strategy

This is where 80% of testing time should be spent.

Parameter sensitivity:

- Test stop loss at 50%, 75%, 100%, 125%, 150% of baseline
- Test profit target at 80%, 90%, 100%, 110%, 120% of baseline
- Vary entry/exit timing by ±15-30 minutes
- Look for "plateaus" of stable performance, not narrow spikes
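The stop-loss and profit-target variations above can be generated mechanically. This is a minimal sketch: `parameter_grid` and `sweep` are hypothetical helpers, and `run_backtest` stands in for whatever function your platform exposes to return a single performance metric.

```python
from itertools import product

# Multipliers taken from the lists above
STOP_MULTIPLIERS = (0.50, 0.75, 1.00, 1.25, 1.50)
TARGET_MULTIPLIERS = (0.80, 0.90, 1.00, 1.10, 1.20)


def parameter_grid(base_stop: float, base_target: float):
    """Return every (stop, target) pair from the two multiplier ranges."""
    return [(round(base_stop * s, 6), round(base_target * t, 6))
            for s, t in product(STOP_MULTIPLIERS, TARGET_MULTIPLIERS)]


def sweep(run_backtest, base_stop: float, base_target: float):
    """Map each parameter pair to the metric returned by run_backtest."""
    return {params: run_backtest(*params)
            for params in parameter_grid(base_stop, base_target)}


# Stand-in for a real backtest: returns the reward-to-risk ratio only
results = sweep(lambda stop, target: target / stop,
                base_stop=2.0, base_target=4.0)
print(len(results), "parameter combinations tested")
```

Inspecting the full grid, rather than only the best cell, is what makes it possible to tell a plateau from a spike.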

Execution friction:

- Increase slippage to 1.5-2x typical estimates
- Model worst-case fills (buy at ask+1 tick, sell at bid-1 tick)
- Add realistic order rejection scenarios
- Test with pessimistic commission structures
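The worst-case fill rule can be written as a small pricing function. This is a sketch under simple assumptions: a single quote with a known tick size, no partial fills, and hypothetical function names.

```python
def worst_case_fill(side: str, bid: float, ask: float, tick: float) -> float:
    """Pessimistic fill price: buy one tick above the ask, sell one tick below the bid."""
    if side == "buy":
        return ask + tick
    if side == "sell":
        return bid - tick
    raise ValueError(f"unknown side: {side!r}")


def stressed_slippage(typical_slippage: float, multiplier: float = 2.0) -> float:
    """Scale a typical slippage estimate by a stress multiplier (1.5-2x in the text)."""
    return typical_slippage * multiplier


# Hypothetical quote: bid 100.00, ask 100.02, tick size 0.01
buy_px = worst_case_fill("buy", bid=100.00, ask=100.02, tick=0.01)
sell_px = worst_case_fill("sell", bid=100.00, ask=100.02, tick=0.01)
print(f"buy at {buy_px:.2f}, sell at {sell_px:.2f}, "
      f"round-trip friction {buy_px - sell_px:.2f}")
```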

Time robustness:

- Analyze year-by-year performance
- Require positive expectancy in majority of years
- Ensure strategy doesn't rely on 1-2 exceptional periods
- Test in different market regimes separately
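The first three checks above can be approximated from a plain list of trades. In this sketch the trade data is invented, the function names are hypothetical, and "relies on 1-2 exceptional periods" is interpreted as "total return turns non-positive once the best `top_n` years are removed", which is only one possible definition.

```python
from collections import defaultdict


def yearly_expectancy(trades):
    """Mean return per trade for each year; trades is a list of (year, return)."""
    by_year = defaultdict(list)
    for year, ret in trades:
        by_year[year].append(ret)
    return {year: sum(rets) / len(rets) for year, rets in sorted(by_year.items())}


def time_robustness(trades, top_n=2):
    """Summarize how evenly performance is spread across years."""
    per_year = yearly_expectancy(trades)
    positive_share = sum(1 for v in per_year.values() if v > 0) / len(per_year)

    yearly_totals = defaultdict(float)
    for year, ret in trades:
        yearly_totals[year] += ret
    best_years = sorted(yearly_totals.values(), reverse=True)[:top_n]
    remainder = sum(yearly_totals.values()) - sum(best_years)

    return {
        "positive_year_share": positive_share,
        "majority_positive": positive_share > 0.5,
        "profitable_without_best_years": remainder > 0,
    }


# Invented example trades: (year, return per trade)
trades = [(2019, 0.02), (2019, -0.01), (2020, 0.05), (2020, 0.03),
          (2021, -0.03), (2022, 0.01), (2023, 0.02)]
print(yearly_expectancy(trades))
print(time_robustness(trades))
```

In this toy data most years are positive, yet the strategy is unprofitable without its two best years, so it would pass the second check and fail the third.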

Sample size:

- Absolute minimum: 30 trades
- Preferred: 100+ trades
- High confidence: 200+ trades

5. Out-of-Sample Validation

Walk-forward analysis:

1. Optimize on training period (e.g., Year 1-3)
2. Test on validation period (Year 4)
3. Roll forward and repeat
4. Compare in-sample vs out-of-sample performance
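The rolling train/validate split described in these steps can be sketched as a window generator. The function name and the default window lengths (3 training years, 1 validation year) are assumptions that mirror the example in step 1.

```python
def walk_forward_windows(years, train_len=3, test_len=1):
    """Return (train_years, test_years) pairs, rolling forward by test_len each step."""
    windows = []
    start = 0
    while start + train_len + test_len <= len(years):
        train = years[start:start + train_len]
        test = years[start + train_len:start + train_len + test_len]
        windows.append((train, test))
        start += test_len
    return windows


for train, test in walk_forward_windows(list(range(2015, 2021))):
    print(f"optimize on {train}, validate on {test}")
```

Each validation year is used only after parameters are fixed on the preceding training years, so no window sees its own test data during optimization.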

Warning signs:

- Out-of-sample <50% of in-sample performance
- Need frequent parameter re-optimization
- Parameters change dramatically between periods
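Two of these warning signs can be checked numerically. In the sketch below the 50% degradation threshold comes from the list above, while the definition of a "dramatic" parameter change (more than a 50% relative shift between consecutive windows) is an assumption chosen for illustration.

```python
def oos_warning_signs(in_sample, out_of_sample, params_by_window,
                      max_param_shift=0.5):
    """Flag out-of-sample degradation and unstable optimized parameters.

    params_by_window holds one optimized, non-zero parameter value per window.
    max_param_shift is an assumed cutoff for a "dramatic" relative change.
    """
    if in_sample <= 0:
        raise ValueError("in-sample performance must be positive to form a ratio")
    ratio = out_of_sample / in_sample
    shifts = [abs(b - a) / abs(a)
              for a, b in zip(params_by_window, params_by_window[1:])]
    return {
        "oos_to_is_ratio": ratio,
        "degraded": ratio < 0.5,
        "unstable_parameters": any(s > max_param_shift for s in shifts),
    }


# Hypothetical numbers: expectancy per trade and optimized stop loss per window
print(oos_warning_signs(0.010, 0.004, [2.0, 2.1, 4.5]))
print(oos_warning_signs(0.010, 0.008, [2.0, 2.1, 2.2]))
```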

6. Evaluate Results

Questions to answer:

- Does edge survive pessimistic assumptions?
- Is performance stable across parameter variations?
- Does strategy work in multiple market regimes?
- Is sample size sufficient for statistical confidence?
- Are results realistic, not "too good to be true"?

Decision criteria:

- ✅ Deploy: Survives all stress tests with acceptable performance
- 🔄 Refine: Core logic sound but needs parameter adjustment
- ❌ Abandon: Fails stress tests or relies on fragile assumptions

Key Testing Principles

Punish the Strategy

Add friction everywhere:

- Commissions higher than reality
- Slippage 1.5-2x typical
- Worst-case fills
- Order rejections
- Partial fills

Rationale: A strategy that stays profitable under pessimistic assumptions has a margin of safety, so its live results are more likely to match or beat the backtest.

Seek Plateaus, Not Peaks

Look for parameter ranges where performance is stable, not optimal values that create performance spikes.

Good: Strategy profitable with stop loss anywhere from 1.5% to 3.0%
Bad: Strategy only works with stop loss at exactly 2.13%

Stable performance indicates genuine edge; narrow optima suggest curve-fitting.
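One simple way to tell a plateau from a peak is to count how many adjacent parameter values stay above a performance threshold. The sketch below uses invented results and a hypothetical `plateau_width` helper; the threshold of zero (merely profitable) is an assumption you may want to raise.

```python
def plateau_width(results, threshold=0.0):
    """Longest run of consecutive parameter values whose metric exceeds threshold.

    results maps a parameter value to a performance metric.
    """
    best = current = 0
    for _, metric in sorted(results.items()):
        current = current + 1 if metric > threshold else 0
        best = max(best, current)
    return best


# Invented metrics keyed by stop-loss percentage
plateau = {1.5: 0.40, 2.0: 0.50, 2.5: 0.45, 3.0: 0.40, 3.5: -0.10}
spike = {2.0: -0.20, 2.13: 0.90, 2.25: -0.30}

print("plateau width:", plateau_width(plateau))
print("spike width:  ", plateau_width(spike))
```

A wide run of profitable neighbors matches the "Good" case above; a run of length one, where both neighbors lose money, matches the "Bad" case.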

Test All Cases, Not Cherry-Picked Examples

Wrong approach: Study hand-picked "market leaders" that worked
Right approach: Test every stock that met criteria, including those that failed

Selective examples create survivorship bias and overestimate strategy quality.

Separate Idea Generation from Validation

Intuition: Useful for generating hypotheses
Validation: Must be purely data-driven

Never let attachment to an idea influence interpretation of test results.

Common Failure Patterns

Recognize these patterns early to save time:

1. Parameter sensitivity: Only works with exact parameter values
2. Regime-specific: Great in some years, terrible in others
3. Slippage sensitivity: Unprofitable when realistic costs added
4. Small sample: Too few trades for statistical confidence
5. Look-ahead bias: "Too good to be true" results
6. Over-optimization: Many parameters, poor out-of-sample results

See references/failed_tests.md for detailed examples and diagnostic framework.

Available Reference Documentation

Methodology Reference

File: references/methodology.md

When to read: For detailed guidance on specific testing techniques.

Contents:

- Stress testing methods
- Parameter sensitivity analysis
- Slippage and friction modeling
- Sample size requirements
- Market regime classification
- Common biases and pitfalls (survivorship, look-ahead, curve-fitting, etc.)

Failed Tests Reference

File: references/failed_tests.md

When to read: When strategy fails tests, or learning from past mistakes.

Contents:

- Why failures are valuable
- Common failure patterns with examples
- Case study documentation framework
- Red flags checklist for evaluating backtests

Critical Reminders

Time allocation: Spend 20% generating ideas, 80% trying to break them.

Context-free requirement: If strategy requires "perfect context" to work, it's not robust enough for systematic trading.

Red flag: If backtest results look too good (>90% win rate, minimal drawdowns, perfect timing), audit carefully for look-ahead bias or data issues.

Tool limitations: Understand your backtesting platform's quirks (interpolation methods, handling of low liquidity, data alignment issues).

Statistical significance: Small edges require large sample sizes to prove. A 5% edge per trade needs 100+ trades to distinguish it from luck.
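The relationship between edge size and sample size can be made concrete under one specific reading, which is an assumption rather than the definition used above: the "edge" is a win rate above a 50% baseline on even-payoff trades, tested one-sided with a normal approximation. Under that reading the helpers below (hypothetical names) estimate how many trades are needed before the observed win rate is distinguishable from chance.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_score(wins: int, trades: int, baseline: float = 0.5) -> float:
    """Standard errors by which the observed win rate exceeds the baseline."""
    observed = wins / trades
    standard_error = sqrt(baseline * (1 - baseline) / trades)
    return (observed - baseline) / standard_error


def trades_needed(edge: float, baseline: float = 0.5, alpha: float = 0.05) -> int:
    """Trades required for a win-rate edge to reach one-sided significance alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return ceil(baseline * (1 - baseline) * (z_crit / edge) ** 2)


print("z-score for 55 wins in 100 trades:", round(z_score(55, 100), 2))
print("trades needed for a 5-point edge: ", trades_needed(0.05))
print("trades needed for a 10-point edge:", trades_needed(0.10))
```

Under these assumptions the required count lands well above 100 for a 5-point win-rate edge, which is consistent with treating 100 trades as a floor rather than a target. Halving the edge roughly quadruples the number of trades needed.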

Discretionary vs Systematic Differences

This skill focuses on systematic/quantitative backtesting where:

- All rules are codified in advance
- No discretion or "feel" in execution
- Testing happens on all historical examples, not cherry-picked cases
- Context (news, macro) is deliberately stripped out

Discretionary traders study setups differently, so this skill may not apply to setups requiring subjective judgment.

Skill name: backtest-expert-zc