Forge — Autonomous Quality Engineering Swarm
Quality forged in, not bolted on.
Forge is a self-learning, autonomous quality engineering swarm that unifies three approaches into one:
| Pillar | Source | What It Does |
|---|---|---|
| Build | DDD+ADR+TDD methodology | Structured development with quality gates, defect prediction, confidence-tiered fixes |
| Verify | BDD/Gherkin behavioral specs | Continuous behavioral verification: the PRODUCT works, not just the CODE |
| Heal | Autonomous E2E fix loop | Test → Analyze → Fix → Commit → Learn → Repeat |
"DONE DONE" means: the code compiles AND the product behaves as specified. Every Gherkin scenario passes. Every quality gate clears. Every dependency graph is satisfied.
ARCHITECTURE ADAPTABILITY
Forge adapts to any project architecture. Before first run, it discovers your project structure:
Supported Architectures
| Architecture | How Forge Adapts |
|---|---|
| Monolith | Single backend process, all contexts in one codebase. Forge runs all tests against one server. |
| Modular Monolith | Single deployment with bounded contexts as modules. Forge discovers modules and tests each context independently. |
| Microservices | Multiple services. Forge discovers service endpoints, tests each service, validates inter-service contracts. |
| Monorepo | Multiple apps/packages in one repo. Forge detects workspace structure (Turborepo, Nx, Lerna, Melos, Cargo workspace). |
| Mobile + Backend | Frontend app with backend API. Forge starts backend, then runs E2E tests against it. |
| Full-Stack Monolith | Frontend and backend in same deployment. Forge tests through the UI layer against real backend. |
Project Discovery
On first invocation, Forge analyzes the project to build a context map:
CODEBLOCK0
Forge stores the discovered project map:
CODEBLOCK1
Configuration Override
Projects can provide a forge.config.yaml at the repo root to override auto-discovery:
CODEBLOCK2
CRITICAL: NO MOCKING OR STUBBING ALLOWED
ABSOLUTE RULE: This skill NEVER uses mocking or stubbing of the backend API.
- ALL tests run against the REAL backend API
- NO mocking frameworks for API calls (no mockito, wiremock, MockClient, nock, msw, httpretty, etc.)
- NO stubbed responses or fake data from API endpoints
- The backend MUST be running and healthy before any tests execute
- Test data is seeded through REAL API calls, not mocked state
Why No Mocking:
- Mocks hide real integration bugs
- Mocks create false confidence
- Mocks don't test the actual data flow
- Real API tests catch serialization, validation, and timing issues
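One way to check the no-mocking rule mechanically is to scan test sources for the framework names listed above before a run. The sketch below is only an illustration and not part of Forge: the `find_mock_usage` helper is hypothetical, and a plain whole-word match will also flag mentions inside comments.

```python
import re
from typing import List

# Framework names taken from the rule above; extend for your own stack.
BANNED_MOCK_LIBRARIES = ["mockito", "wiremock", "MockClient", "nock", "msw", "httpretty"]
_BANNED_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in BANNED_MOCK_LIBRARIES) + r")\b"
)


def find_mock_usage(source: str) -> List[str]:
    """Return the banned framework names that appear as whole words in the source text."""
    return sorted(set(_BANNED_PATTERN.findall(source)))


# Example input is made up for illustration.
print(find_mock_usage("import nock from 'nock';\nimport { setupServer } from 'msw/node';"))
```

The match is case-sensitive and word-bounded, so an identifier such as `knock` is not reported.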
PHASE 0: BACKEND SETUP (MANDATORY FIRST STEP)
BEFORE ANY TESTING, the backend MUST be built, compiled, and running.
This is the FIRST thing the skill does — no exceptions.
Step 1: Check and Start Backend
CODEBLOCK3
Step 2: Verify Backend Health
CODEBLOCK4
Step 3: Contract Validation
CODEBLOCK5
Step 4: Seed Test Data (Real API Calls)
CODEBLOCK6
PHASE 1: BEHAVIORAL SPECIFICATION & ARCHITECTURE RECORDS
Before testing, verify Gherkin specs and architecture decision records exist for the target bounded context.
Behavioral specifications define WHAT the product does from the user's perspective. Every test traces back to a Gherkin scenario. If tests pass but specs fail, the product is broken.
Spec Location
Gherkin specs are stored alongside tests:
CODEBLOCK7
The exact location depends on your project's test structure. Forge auto-discovers this from the project map.
Spec-to-Test Mapping
Each Gherkin Scenario maps to exactly one test function. The mapping is tracked:
CODEBLOCK8
Missing Spec Generation
If specs are missing for a target context, the Specification Verifier agent creates them:
1. Read the screen/component/route implementation files for the context
2. Extract all user-visible features, interactions, and states
3. Generate Gherkin scenarios covering every cyclomatic path
4. Write to INLINECODE8
5. Map each scenario to its corresponding test function
Agent-Optimized ADR Generation
When Forge discovers a bounded context without an Architecture Decision Record, the Specification Verifier generates one. ADRs follow an agent-optimized format designed for machine consumption:
CODEBLOCK9
ADR Storage:
- ADRs are stored in docs/decisions/ or the project-configured ADR directory
- Each bounded context has exactly one ADR
- ADRs are updated when contracts change or new dependencies are discovered
- The Specification Verifier agent includes ADR generation in its workflow
PHASE 2: CONTRACT & DEPENDENCY VALIDATION
Contract Validation
Before running tests, verify API response schemas match expected DTOs:
CODEBLOCK10
Contract violations are treated as Gate 7 failures and must be resolved before functional testing proceeds.
Shared Types Validation
For bounded contexts that share dependencies, validate type consistency across context boundaries:
1. Identify shared DTOs/models: for each context, extract types used in API requests and responses
2. Cross-reference types: compare DTOs between contexts that share dependencies (from the dependency graph)
3. Flag type mismatches: e.g., context A expects userId: string but context B sends INLINECODE11
4. Validate value objects: ensure value objects (email, money, address) follow consistent patterns across contexts
5. Report violations: flag as pre-Gate warnings with specific file locations and expected vs actual types
CODEBLOCK11
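The cross-referencing step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Forge's implementation: the dictionary layout (context name to field name to declared type), the `find_type_mismatches` helper, and the sample data are all invented for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: context name -> {field name -> declared type}.
ContextTypes = Dict[str, Dict[str, str]]


def find_type_mismatches(contexts: ContextTypes) -> List[Tuple[str, str, str, str, str]]:
    """Return (field, context_a, type_a, context_b, type_b) for each shared field whose types differ."""
    mismatches = []
    names = sorted(contexts)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Only fields declared by both contexts can disagree.
            for field in sorted(set(contexts[a]) & set(contexts[b])):
                if contexts[a][field] != contexts[b][field]:
                    mismatches.append((field, a, contexts[a][field], b, contexts[b][field]))
    return mismatches


# Illustrative data only.
example = {
    "identity": {"userId": "string", "email": "string"},
    "payments": {"userId": "number", "amount": "number"},
}
for field, a, type_a, b, type_b in find_type_mismatches(example):
    print(f"{field}: {a} declares {type_a}, {b} declares {type_b}")
```

A real check would restrict the pairs to contexts connected in the dependency graph rather than comparing every pair.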
Cross-Cutting Foundation Validation
Verify cross-cutting concerns are consistent across all bounded contexts:
- Auth patterns: same header format (Authorization: Bearer <token>), same token validation approach across all endpoints
- Error response format: all API endpoints return errors in the project's standard format (consistent structure, error codes, HTTP status codes)
- Logging patterns: consistent log levels, structured format, and correlation IDs across contexts
- Pagination format: consistent pagination parameters and response format across collection endpoints
Cross-cutting violations are reported as warnings before Gate evaluation begins.
Dependency Graph
Bounded contexts have dependencies. When a fix touches context X, all contexts that depend on X must be re-tested.
CODEBLOCK12
Cascade Re-Testing
When Bug Fixer modifies a file in context X:
1. Identify which context the modified file belongs to (context X)
2. Look up all contexts in the blocks list for X
3. After X's tests pass, automatically re-run tests for blocked contexts
4. If a cascade failure occurs, trace it back to the original fix
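The lookup in the steps above can be sketched as a walk over the `blocks` edges of the dependency graph. The text does not say whether Forge follows the chain transitively; this sketch does, as an assumption, and the `contexts_to_retest` helper is hypothetical.

```python
from collections import deque
from typing import Dict, List


def contexts_to_retest(blocks: Dict[str, List[str]], changed: str) -> List[str]:
    """Breadth-first walk over the blocks edges, starting at the changed context."""
    order: List[str] = []
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in blocks.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


# Sample graph: identity blocks payments and orders; payments blocks orders.
graph = {"identity": ["payments", "orders"], "payments": ["orders"]}
print(contexts_to_retest(graph, "identity"))
```

The `seen` set keeps a context that is reachable through several paths from being scheduled twice.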
PHASE 3: SWARM INITIALIZATION
CODEBLOCK13
MODEL ROUTING
Forge routes each agent to the appropriate model tier based on task complexity, optimizing for cost without sacrificing quality:
| Agent | Model | Rationale |
|---|---|---|
| Specification Verifier | INLINECODE14 | Reads code + generates Gherkin: moderate reasoning |
| Test Runner | haiku | Structured execution, output parsing: low reasoning |
| Failure Analyzer | sonnet | Root cause analysis: moderate reasoning |
| Bug Fixer | opus | First-principles code fixes: high reasoning |
| Quality Gate Enforcer | haiku | Threshold comparison: low reasoning |
| Accessibility Auditor | sonnet | Code analysis + WCAG rules: moderate reasoning |
| Auto-Committer | haiku | Git operations, message formatting: low reasoning |
| Learning Optimizer | sonnet | Pattern analysis, prediction: moderate reasoning |
Projects can override model assignments in forge.config.yaml:
CODEBLOCK14
When no override is specified, the defaults above are used. This routing reduces token cost by ~60% compared to running all agents on the highest-tier model.
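The fallback rule amounts to a lookup with a default. The sketch below assumes overrides arrive as a plain agent-to-model mapping; the agent keys and the `resolve_model` helper are hypothetical, and the real override schema is whatever forge.config.yaml defines.

```python
from typing import Dict, Optional

# Defaults copied from the table above. The Specification Verifier is left out
# because its default is not legible in the table. Key names are illustrative.
DEFAULT_MODELS: Dict[str, str] = {
    "test-runner": "haiku",
    "failure-analyzer": "sonnet",
    "bug-fixer": "opus",
    "quality-gate-enforcer": "haiku",
    "accessibility-auditor": "sonnet",
    "auto-committer": "haiku",
    "learning-optimizer": "sonnet",
}


def resolve_model(agent: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Prefer a project override; otherwise fall back to the default tier."""
    if overrides and agent in overrides:
        return overrides[agent]
    return DEFAULT_MODELS[agent]


print(resolve_model("bug-fixer"))
print(resolve_model("test-runner", {"test-runner": "sonnet"}))
```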
PHASE 4: SPAWN AUTONOMOUS AGENTS
Claude Code MUST spawn these 8 agents in a SINGLE message with run_in_background: true:
CODEBLOCK15
PHASE 5: QUALITY GATES
Seven gates are evaluated after each fix cycle. Every blocking gate must pass before a commit is created; gates marked "Warning only" are reported but do not block the commit.
| Gate | Check | Threshold | Blocking |
|---|---|---|---|
| 1. Functional | All tests pass | 100% pass rate | YES |
| 2. Behavioral | Gherkin scenarios satisfied | 100% of targeted scenarios | YES |
| 3. Coverage | Path coverage | >=85% overall, >=95% critical | YES (critical only) |
| 4. Security | No hardcoded secrets, secure storage, SAST checks | 0 critical/high violations | YES |
| 5. Accessibility | Accessible labels, target sizes, contrast | WCAG AA | Warning only |
| 6. Resilience | Offline handling, timeout handling, error states | Tested for target context | Warning only |
| 7. Contract | API response matches expected schema | 0 mismatches | YES |
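The Blocking column of the table can be read as a small decision function. The sketch below is one interpretation, not Forge's implementation: it treats "YES (critical only)" as two coverage checks of which only the critical-path one blocks, and the gate keys and the `evaluate_gates` helper are hypothetical.

```python
from typing import Dict, List, Tuple

# True means a failure blocks the commit; False means it is only a warning.
BLOCKING: Dict[str, bool] = {
    "functional": True,
    "behavioral": True,
    "coverage_critical": True,
    "coverage_overall": False,
    "security": True,
    "accessibility": False,
    "resilience": False,
    "contract": True,
}


def evaluate_gates(results: Dict[str, bool]) -> Tuple[bool, List[str], List[str]]:
    """Return (can_commit, blocking_failures, warnings) for one fix cycle."""
    blocking_failures = [gate for gate, ok in results.items() if not ok and BLOCKING[gate]]
    warnings = [gate for gate, ok in results.items() if not ok and not BLOCKING[gate]]
    return (not blocking_failures, blocking_failures, warnings)


# Illustrative results: accessibility fails, everything else passes.
print(evaluate_gates({"functional": True, "accessibility": False, "contract": True}))
```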
Gate Failure Categories
When gates fail, failures are categorized for targeted re-runs:
- Functional failures → Re-run Bug Fixer on failing tests
- Behavioral failures → Check spec-to-test mapping, may need new tests
- Coverage failures → Generate additional test paths
- Security failures → Fix hardcoded values, update storage patterns
- Accessibility failures → Add accessible labels, fix target sizes
- Resilience failures → Add offline/error state handling
- Contract failures → Update DTOs or flag API regression
AUTONOMOUS EXECUTION LOOP
CODEBLOCK16
REAL-TIME PROGRESS REPORTING
Each agent emits structured progress events during execution for observability:
CODEBLOCK17
Progress File:
- Events are appended to .forge/progress.jsonl (one JSON object per line)
- The file is created at the start of each Forge run, or truncated if it already exists
- Tools can tail this file for real-time monitoring: INLINECODE25
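The progress-file behavior described above (truncate at run start, then append one JSON object per line) can be sketched as follows. The helper names and the event fields are assumptions made for illustration; the actual event schema is the one Forge emits.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def start_run(path: str) -> None:
    """Create the progress file, truncating any content from a previous run."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8"):
        pass


def append_event(path: str, event: Dict[str, Any]) -> None:
    """Append one event as a single JSON line."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(event) + "\n")


def read_events(path: str) -> List[Dict[str, Any]]:
    """Parse every non-empty line of the progress file."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Demo in a temporary directory; the event fields are made up for illustration.
progress_path = os.path.join(tempfile.mkdtemp(), ".forge", "progress.jsonl")
start_run(progress_path)
append_event(progress_path, {"agent": "test-runner", "status": "started"})
append_event(progress_path, {"agent": "test-runner", "status": "finished"})
print(read_events(progress_path))
```

Because each line is a complete JSON document, a monitor can process the file incrementally without waiting for the run to end.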
Integration with Agentic QE AG-UI:
- When the AQE AG-UI protocol is available, events stream directly to the user interface
- Users see live progress: which gate is being evaluated, which test is running, which fix is being applied
- When running in Claude Code without AG-UI, progress is visible through agent output files
CONFIDENCE TIERS FOR FIX PATTERNS
Every fix pattern is tracked with a confidence score that evolves over time:
CODEBLOCK18
Tier Thresholds
| Tier | Confidence | Auto-Apply | Behavior |
|---|---|---|---|
| Platinum | >= 0.95 | Yes | Apply immediately without review |
| Gold | >= 0.85 | Yes | Apply and flag in commit message |
| Silver | >= 0.75 | No | Suggest to Bug Fixer, don't auto-apply |
| Bronze | >= 0.70 | No | Store for learning only, never auto-apply |
| Expired | < 0.70 | No | Pattern demoted, needs revalidation |
Confidence Updates
After each application:
- Success: confidence += 0.05 (capped at 1.0)
- Failure: confidence -= 0.10 (floored at 0.0)
- Tier promotion when crossing threshold upward
- Tier demotion when crossing threshold downward
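The tier thresholds and update rules above translate directly into two small functions. This is a sketch rather than Forge's code: the function names are hypothetical, and the rounding step is an addition to keep floating-point drift from pushing a score just past a threshold.

```python
def update_confidence(confidence: float, success: bool) -> float:
    """Apply +0.05 on success or -0.10 on failure, clamped to [0.0, 1.0]."""
    delta = 0.05 if success else -0.10
    # Rounding is an addition of this sketch, not something the rules require.
    return round(min(1.0, max(0.0, confidence + delta)), 4)


def tier_for(confidence: float) -> str:
    """Map a confidence score to the tier names in the table above."""
    if confidence >= 0.95:
        return "platinum"
    if confidence >= 0.85:
        return "gold"
    if confidence >= 0.75:
        return "silver"
    if confidence >= 0.70:
        return "bronze"
    return "expired"


def can_auto_apply(confidence: float) -> bool:
    """Only the two highest tiers are applied without review."""
    return tier_for(confidence) in ("platinum", "gold")


score = 0.86
after_failure = update_confidence(score, success=False)
print(tier_for(score), "->", tier_for(after_failure))
```

Because a failure costs twice what a success earns, a pattern needs two clean applications to recover from one regression.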
DEFECT PREDICTION
Before running tests, the Learning Optimizer analyzes historical data to predict which tests are most likely to fail:
Input Signals
1. Files changed since last green run (git diff against last-green-commit)
2. Historical failure rates per bounded context (from forge-results namespace)
3. Fix pattern freshness: recently applied fixes are more likely to regress
4. Complexity metrics: contexts with more cyclomatic paths fail more often
5. Dependency chain length: deeper dependency chains have higher failure rates
Prediction Output
CODEBLOCK19
Tests are executed in descending probability order — predicted-to-fail tests run FIRST for faster convergence.
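The ordering rule itself is a sort on predicted failure probability. The sketch below assumes predictions are available as a test-name-to-probability mapping; that shape, the sample names, and the `execution_order` helper are hypothetical, and how the probabilities are derived from the input signals is not shown.

```python
from typing import Dict, List


def execution_order(predictions: Dict[str, float]) -> List[str]:
    """Sort test names by predicted failure probability, highest first; ties break by name."""
    return sorted(predictions, key=lambda name: (-predictions[name], name))


# Illustrative probabilities only.
predicted = {"payments_checkout": 0.72, "identity_login": 0.15, "rides_booking": 0.72}
print(execution_order(predicted))
```

Breaking ties by name keeps the order deterministic between runs, which makes run-to-run comparisons easier.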
EXHAUSTIVE EDGE CASE TESTING
General UI Element Edge Cases
For EVERY interactive element, test:
1. Interaction States
   - Single interaction → expected action
   - Repeated rapid interaction → no duplicate action
   - Long press / right-click → context menu if applicable
   - Disabled state → no action, visual feedback
2. Input Field States
   - Empty → placeholder visible
   - Focus → visual focus indicator
   - Valid input → no error
   - Invalid input → error message
   - Max length reached → prevents further input
   - Paste → validates pasted content
   - Clear → resets to empty
3. Async Operation States
   - Before load → loading indicator
   - During load → spinner, disabled submit
   - Success → data displayed, spinner gone
   - Error → error message, retry option
   - Timeout → timeout message, retry option
4. Navigation Edge Cases
   - Back navigation → previous screen or exit confirmation
   - Deep link → correct screen with params
   - Invalid deep link → fallback/error screen
   - Browser forward/back (web) → correct state
5. Scroll Edge Cases
   - Overscroll → appropriate feedback
   - Scroll to hidden content → content becomes visible
   - Keyboard appears → scroll to focused field
Network Edge Cases
1. No internet → offline indicator, cached data if available
2. Slow connection → loading states persist, timeout handling
3. Connection restored → auto-retry pending operations
4. Server error 500 → generic error message
5. Auth error 401 → redirect to login
6. Permission error 403 → permission denied message
7. Not found 404 → "not found" message
Chaos Testing (Resilience)
For each target context, inject controlled failures:
1. Timeout injection → API calls take >10s → verify timeout UI
2. Partial response → API returns incomplete data → verify graceful degradation
3. Rate limiting → API returns 429 → verify retry-after behavior
4. Concurrent mutations → Multiple clients modify same resource → verify conflict handling
5. Session expiry → Token expires mid-flow → verify re-auth prompt
Visual Regression Testing
For UI-heavy projects, Forge captures and compares screenshots to detect unintended visual changes:
1. Before fix: capture baseline screenshots of all screens in the target context
2. After fix: capture new screenshots of the same screens
3. Compare: pixel-by-pixel comparison with configurable threshold (default: 0.1% diff tolerance)
4. Report: flag visual regressions as Gate 5 (Accessibility) warnings
5. Store: save screenshot diffs in memory for review
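The comparison step can be sketched without an imaging library by treating each screenshot as a flat list of RGB tuples. This is an illustration under assumptions: the helper names are hypothetical, and the text does not say whether a diff exactly at the threshold passes; this sketch lets it pass.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def diff_ratio(baseline: List[Pixel], current: List[Pixel]) -> float:
    """Fraction of pixels that differ between two same-sized screenshots."""
    if len(baseline) != len(current):
        raise ValueError("screenshots must have the same dimensions")
    if not baseline:
        return 0.0
    changed = sum(1 for a, b in zip(baseline, current) if a != b)
    return changed / len(baseline)


def is_regression(baseline: List[Pixel], current: List[Pixel], tolerance: float = 0.001) -> bool:
    """0.001 corresponds to the 0.1% default tolerance mentioned above."""
    return diff_ratio(baseline, current) > tolerance


# 1000 black pixels, then the same image with two pixels turned white.
before = [(0, 0, 0)] * 1000
after = [(255, 255, 255)] * 2 + [(0, 0, 0)] * 998
print(diff_ratio(before, after), is_regression(before, after))
```

An exact per-pixel comparison is sensitive to anti-aliasing and font rendering, so real tooling usually adds a per-pixel color tolerance as well.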
Screenshot Capture by Platform:
| Platform | Method |
|---|---|
| Web (Playwright) | INLINECODE26 |
| Web (Cypress) | cy.screenshot() |
| Flutter | await tester.binding.setSurfaceSize(size); await expectLater(find.byType(App), matchesGoldenFile('name.png')) |
| Mobile (native) | Platform-specific screenshot capture |
Configuration:
CODEBLOCK20
When Agentic QE is available, delegate to the visual-tester agent for parallel viewport comparison across multiple screen sizes.
INVOCATION MODES
CODEBLOCK21
MEMORY NAMESPACES
| Namespace | Purpose | Key Pattern |
|---|---|---|
| INLINECODE30 | Fix patterns with confidence tiers | INLINECODE31 |
| INLINECODE32 | Test run results | test-run-[timestamp] |
| forge-state | Coverage + gate status | forge-coverage-status, gates-[context]-[ts], last-green-commit |
| forge-commits | Commit history | commit-[hash] |
| forge-screens | Implemented screens/pages | screen-[name] |
| forge-specs | Gherkin specifications | specs-[context]-[timestamp] |
| forge-contracts | API contract snapshots | contract-snapshot-[timestamp] |
| forge-predictions | Defect prediction history | prediction-[date] |
OPTIONAL: AGENTIC QE INTEGRATION
Forge can optionally integrate with the Agentic QE framework via MCP for enhanced capabilities. All AQE features are additive — Forge works identically without AQE.
Detection
On startup, Forge checks for AQE availability:
CODEBLOCK22
Enhanced Capabilities When AQE Is Available
| Forge Component | Without AQE (Default) | With AQE |
|---|---|---|
| Pattern Storage | claude-flow memory (forge-patterns namespace) | ReasoningBank: HNSW vector-indexed, 150x faster pattern search, experience replay |
| Defect Prediction | Historical failure rates + file changes | defect-intelligence domain: root-cause-analyzer + defect-predictor agents |
| Security Scanning | Gate 4 static checks (secrets, injection vectors) | security-compliance domain: full SAST/DAST via security-scanner agent |
| Accessibility Audit | Forge Accessibility Auditor agent | visual-accessibility domain: visual-tester + accessibility-auditor agents |
| Contract Testing | Gate 7 schema validation | contract-testing domain: contract-validator + graphql-tester agents |
| Progress Reporting | .forge/progress.jsonl file | AG-UI streaming protocol for real-time UI updates |
Fallback Behavior
When AQE is NOT available, Forge falls back to its built-in behavior for every capability. No configuration is required — the skill auto-detects and adapts.
Configuration
CODEBLOCK23
AQE Agent Delegation Map
When AQE is enabled, Forge delegates specific subtasks to specialized AQE agents:
| Forge Agent | AQE Domain | AQE Agents Used |
|---|---|---|
| Specification Verifier | INLINECODE54 | bdd-generator, requirements-validator |
| Failure Analyzer | defect-intelligence | root-cause-analyzer, defect-predictor |
| Quality Gate Enforcer (Gate 4) | security-compliance | security-scanner, security-auditor |
| Accessibility Auditor | visual-accessibility | visual-tester, accessibility-auditor |
| Quality Gate Enforcer (Gate 7) | contract-testing | contract-validator, graphql-tester |
| Learning Optimizer | learning-optimization | learning-coordinator, pattern-learner |
Forge agents that have no AQE equivalent (Test Runner, Bug Fixer, Auto-Committer) continue to run as built-in agents regardless of AQE availability.
DEFENSIVE TEST PATTERNS
The Bug Fixer agent uses defensive patterns appropriate to the project's test framework. Examples:
Flutter: Safe Tap
CODEBLOCK24
Flutter: Safe Text Entry
CODEBLOCK25
Flutter: Visual Observation Delay
CODEBLOCK26
Flutter: Scroll Until Visible
CODEBLOCK27
Flutter: Wait For API Response
CODEBLOCK28
Cypress / Playwright: Safe Click
CODEBLOCK29
Cypress / Playwright: Wait For API
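```javascript
// Playwright: resolves with the first HTTP 200 response whose URL contains urlPattern.
// Assumes a Playwright `page` object is in scope (for example, from the test fixture).
async function waitForApi(urlPattern, { timeout = 10000 } = {}) {
  return page.waitForResponse(
    response => response.url().includes(urlPattern) && response.status() === 200,
    { timeout }
  );
}
```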
COMMON FIX PATTERNS
Pattern: Element Not Found
CODEBLOCK31
Pattern: Timeout
CODEBLOCK32
Pattern: Assertion Failed
CODEBLOCK33
Pattern: API Response Mismatch
```json
{
  "error": "Type error / null value / schema mismatch",
  "cause": "Backend response format changed",
  "tier": "gold",
  "confidence": 0.86,
  "fixes": [
    "Update model/DTO to match current API response",
    "Add null safety handling",
    "Check API version compatibility"
  ]
}
```
COVERAGE TRACKING
The Learning Optimizer maintains coverage status per context:
CODEBLOCK35
AUTO-COMMIT MESSAGE FORMAT
CODEBLOCK36
ROLLBACK & CONFLICT RESOLUTION
Rollback Capability
If a fix introduces regressions:
CODEBLOCK37
Fix Conflict Protocol
When Bug Fixer's fix causes a cascade regression (tests in dependent contexts fail):
1. Halt: stop the fix loop for the affected context
2. Re-analyze: Failure Analyzer examines both the original failure AND the cascade failure
3. Categorize: compare root cause categories:
   - Different root cause → The fix is kept; the cascade failure is treated as a new, independent failure in the next loop iteration
   - Same root cause → The fix is reverted and the pattern is demoted (-0.10 confidence)
4. Revert limit: maximum 2 revert cycles per test before escalating to user review
5. Escalation: if 2 reverts occur for the same test, Forge pauses and reports:
CODEBLOCK38
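The keep, revert, and escalate decisions in the protocol above can be sketched as a single function. This is a reading of the listed steps, not Forge's implementation; the `resolve_cascade` helper, its return shape, and the root-cause labels are hypothetical.

```python
from typing import Tuple

MAX_REVERTS_PER_TEST = 2  # revert limit stated in the protocol


def resolve_cascade(original_cause: str, cascade_cause: str, reverts_so_far: int) -> Tuple[str, int, bool]:
    """Return (action, revert_count, escalate) for one cascade regression."""
    if original_cause != cascade_cause:
        # Different root cause: keep the fix; the cascade failure is handled next iteration.
        return ("keep_fix", reverts_so_far, False)
    # Same root cause: revert the fix and count the revert against the limit.
    reverts = reverts_so_far + 1
    return ("revert_fix", reverts, reverts >= MAX_REVERTS_PER_TEST)


print(resolve_cascade("timeout", "schema-mismatch", 0))
print(resolve_cascade("timeout", "timeout", 1))
```

The confidence demotion for a reverted pattern is left to the caller so the decision logic stays free of side effects.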
Agent Disagreement Resolution
When two agents disagree (e.g., Bug Fixer wants to change a file that Spec Verifier says shouldn't change):
1. Quality Gate Enforcer acts as arbiter: it evaluates both proposed states
2. The change that results in more gates passing wins
3. Tie-breaking order:
   - Fewer files changed (prefer minimal diff)
   - Higher confidence tier (prefer proven patterns)
   - Bug Fixer defers to Spec Verifier (specs are source of truth)
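The arbitration rules above form an ordered comparison. The sketch below assumes each proposal carries a gate count, a file count, and a numeric tier rank; the `ProposedChange` type, its fields, and the rank encoding are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProposedChange:
    agent: str           # "bug-fixer" or "spec-verifier" (labels are illustrative)
    gates_passing: int   # number of quality gates that pass with this change
    files_changed: int
    tier_rank: int       # hypothetical encoding: higher means a more trusted tier


def arbitrate(bug_fixer: ProposedChange, spec_verifier: ProposedChange) -> ProposedChange:
    """Pick a winner by gate count first, then by the tie-breakers in order."""
    if bug_fixer.gates_passing != spec_verifier.gates_passing:
        return max(bug_fixer, spec_verifier, key=lambda c: c.gates_passing)
    if bug_fixer.files_changed != spec_verifier.files_changed:
        return min(bug_fixer, spec_verifier, key=lambda c: c.files_changed)
    if bug_fixer.tier_rank != spec_verifier.tier_rank:
        return max(bug_fixer, spec_verifier, key=lambda c: c.tier_rank)
    return spec_verifier  # final tie-breaker: specs are the source of truth


fix = ProposedChange("bug-fixer", gates_passing=7, files_changed=3, tier_rank=4)
spec = ProposedChange("spec-verifier", gates_passing=7, files_changed=1, tier_rank=2)
print(arbitrate(fix, spec).agent)
```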
POST-EXECUTION LEARNING
After each autonomous run, the skill triggers comprehensive learning:
CODEBLOCK39
PROJECT-SPECIFIC EXTENSIONS
Forge can be extended per-project by creating a forge.contexts.yaml file alongside the skill:
CODEBLOCK40
This separates the generic Forge engine from project-specific configuration, making Forge reusable across any codebase.
QUICK REFERENCE CHECKLIST
Before running Forge:
- [ ] Backend built and running
- [ ] Health check passes
- [ ] Test data seeded via real API calls
- [ ] No mocking or stubbing in test code
- [ ] Gherkin specs exist for target context (or will be generated)
- [ ] All new screens/pages have test coverage
- [ ] Edge cases documented and tested
After Forge completes:
- [ ] Gate 1 (Functional): All tests pass
- [ ] Gate 2 (Behavioral): All targeted Gherkin scenarios satisfied
- [ ] Gate 3 (Coverage): >=85% overall, >=95% critical paths
- [ ] Gate 4 (Security): No hardcoded secrets, no injection vectors, no critical CVEs
- [ ] Gate 5 (Accessibility): WCAG AA compliance checked
- [ ] Gate 6 (Resilience): Offline/timeout/error states tested
- [ ] Gate 7 (Contract): API responses match expected schemas
- [ ] Confidence tiers updated for all applied fix patterns
- [ ] Defect predictions updated for next run
- [ ] All fixes committed with detailed messages
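Forge: Autonomous Quality Engineering Swarm
Quality forged in, not bolted on.
Forge is a self-learning, autonomous quality engineering swarm that unifies three approaches into one:
| Pillar | Source | What It Does |
|---|---|---|
| Build | DDD+ADR+TDD methodology | Structured development with quality gates, defect prediction, confidence-tiered fixes |
| Verify | BDD/Gherkin behavioral specs | Continuous behavioral verification: the product works, not just the code |
| Heal | Autonomous E2E fix loop | Test → Analyze → Fix → Commit → Learn → Repeat |
"DONE DONE" means: the code compiles AND the product behaves as specified. Every Gherkin scenario passes. Every quality gate clears. Every dependency graph is satisfied.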
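ARCHITECTURE ADAPTABILITY
Forge adapts to any project architecture. Before first run, it discovers your project structure:
Supported Architectures
| Architecture | How Forge Adapts |
|---|---|
| Monolith | Single backend process, all contexts in one codebase. Forge runs all tests against one server. |
| Modular Monolith | Single deployment with bounded contexts as modules. Forge discovers modules and tests each context independently. |
| Microservices | Multiple services. Forge discovers service endpoints, tests each service, validates inter-service contracts. |
| Monorepo | Multiple apps/packages in one repo. Forge detects workspace structure (Turborepo, Nx, Lerna, Melos, Cargo workspace). |
| Mobile + Backend | Frontend app with backend API. Forge starts backend, then runs E2E tests against it. |
| Full-Stack Monolith | Frontend and backend in same deployment. Forge tests through the UI layer against real backend. |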
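Project Discovery
On first invocation, Forge analyzes the project to build a context map:
```bash
# Forge auto-discovers:
# 1. Backend technology (Rust/Cargo, Node/npm, Python/pip, Go, Java/Maven/Gradle, .NET)
# 2. Frontend technology (Flutter, React, Next.js, Vue, Angular, SwiftUI, Kotlin/Compose)
# 3. Test framework (integration_test, Jest, Pytest, Go test, JUnit, xUnit)
# 4. Project structure (monorepo layout, service boundaries, module boundaries)
# 5. API protocol (REST, GraphQL, gRPC, WebSocket)
# 6. Build system (Make, npm scripts, Gradle tasks, Cargo features)
```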
Forge stores the discovered project map:
```json
{
  "architecture": "mobile-backend",
  "backend": {
    "technology": "rust",
    "buildCommand": "cargo build --release --features test-endpoints",
    "runCommand": "cargo run --release --features test-endpoints",
    "healthEndpoint": "/health",
    "port": 8080,
    "migrationCommand": "cargo sqlx migrate run"
  },
  "frontend": {
    "technology": "flutter",
    "testCommand": "flutter drive --driver=test_driver/integration_test.dart --target={target}",
    "testDir": "integration_test/e2e/",
    "specDir": "integration_test/e2e/specs/"
  },
  "contexts": ["identity", "rides", "payments", "..."],
  "testDataSeeding": {
    "method": "api",
    "endpoint": "/api/v1/test/seed",
    "authHeader": "X-Test-Key"
  }
}
```
Configuration Override
Projects can provide a forge.config.yaml at the repo root to override auto-discovery:
```yaml
# forge.config.yaml (optional; Forge auto-discovers if this file is missing)
architecture: microservices
backend:
  services:
    - name: auth-service
      port: 8081
      healthEndpoint: /health
      buildCommand: npm run build
      runCommand: npm start
    - name: payment-service
      port: 8082
      healthEndpoint: /health
      buildCommand: npm run build
      runCommand: npm start
frontend:
  technology: react
  testCommand: npx cypress run --spec {target}
  testDir: cypress/e2e/
  specDir: cypress/e2e/specs/
contexts:
  - name: identity
    testFile: auth.cy.ts
    specFile: identity.feature
  - name: payments
    testFile: payments.cy.ts
    specFile: payments.feature
dependencies:
  identity:
    blocks: [payments, orders]
  payments:
    depends_on: [identity]
    blocks: [orders]
```
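CRITICAL: NO MOCKING OR STUBBING ALLOWED
ABSOLUTE RULE: This skill NEVER uses mocking or stubbing of the backend API.
- ALL tests run against the REAL backend API
- NO mocking frameworks for API calls (no mockito, wiremock, MockClient, nock, msw, httpretty, etc.)
- NO stubbed responses or fake data from API endpoints
- The backend MUST be running and healthy before any tests execute
- Test data is seeded through REAL API calls, not mocked state
Why No Mocking:
- Mocks hide real integration bugs
- Mocks create false confidence
- Mocks don't test the actual data flow
- Real API tests catch serialization, validation, and timing issues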
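PHASE 0: BACKEND SETUP (MANDATORY FIRST STEP)
BEFORE ANY TESTING, the backend MUST be built, compiled, and running.
This is the FIRST thing the skill does, with no exceptions.
Step 1: Check and Start Backend
```bash
# 1. Read project config or auto-discover backend settings
# 2. Check whether the backend is already running
curl -s "http://localhost:${BACKEND_PORT}/${HEALTH_ENDPOINT}" || {
  echo "Backend not running. Starting..."
  # 3. Navigate to the backend directory
  cd "${BACKEND_DIR}"
  # 4. Ensure the environment is configured
  cp .env.example .env 2>/dev/null || true
  # 5. Build the backend
  ${BUILD_COMMAND}
  # 6. Run database migrations (if applicable)
  ${MIGRATION_COMMAND}
  # 7. Start the backend (background)
  nohup ${RUN_COMMAND} > backend.log 2>&1 &
  echo $! > backend.pid
  # 8. Wait for the backend to become healthy (up to 60 seconds)
  for i in {1..60}; do
    if curl -s "http://localhost:${BACKEND_PORT}/${HEALTH_ENDPOINT}" | grep -q "ok\|healthy\|UP"; then
      echo "Backend healthy on port ${BACKEND_PORT}"
      break
    fi
    sleep 1
  done
}
```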
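Step 2: Verify Backend Health
```bash
# Verify that critical endpoints respond
curl -s "http://localhost:${BACKEND_PORT}/${HEALTH_ENDPOINT}" | jq .
# Verify the test fixture endpoint (used for seeding)
curl -s -H "${TEST_AUTH_HEADER}" "http://localhost:${BACKEND_PORT}/${TEST_STATUS_ENDPOINT}" | jq .
```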
Step 3: Contract Validation
```bash
# Verify the API spec matches the running API (if OpenAPI/Swagger is available)
curl -s "http://localhost:${BACKEND_PORT}/${OPENAPI_ENDPOINT}" > /tmp/live-spec.json
# Store a contract snapshot for regression detection
npx @claude-flow/cli@latest memory store \
  --key "contract-snapshot-$(date +%s)" \
  --value "$(cat /tmp/live-spec.json | head -c 5000)" \
  --namespace forge-contracts
```
Step 4: Seed Test Data (Real API Calls)
```bash
# Seed test data through the real API; adjust to your project's seeding endpoint
curl -X POST "http://localhost:${BACKEND_PORT}/${SEED_ENDPOINT}" \
  -H "Content-Type: application/json" \
  -H "${TEST_AUTH_HEADER}" \
  -d "${SEED_PAYLOAD}"
```
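PHASE 1: BEHAVIORAL SPECIFICATION & ARCHITECTURE RECORDS
Before testing, verify Gherkin specs and architecture decision records exist for the target bounded context.
Behavioral specifications define WHAT the product does from the user's perspective. Every test traces back to a Gherkin scenario. If tests pass but specs fail, the product is broken.
Spec Location
Gherkin specs are stored alongside tests: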
${SPEC_DIR}/
├── [context-a].feature
├── [context-b].feature
├── [context-c].feature