gstack-review: Multi-Perspective Code Review
Review Framework
When asked to review code, follow this three-perspective framework. Run all steps systematically.
Step 0: Detect What to Review
Priority order for review scope:
- 1. Uncommitted changes (
git diff HEAD) — if working directory is dirty - Specific files mentioned by user
- Branch diff vs main — if on a feature branch
- Recent commits — if no clear scope
CODEBLOCK0
Step 1: Gather Context
Before reviewing, collect:
CODEBLOCK1
Step 2: Read Changed Files
For each changed file, read the full content to understand context. Don't just look at the diff — read the surrounding code.
CODEBLOCK2
Step 3: CEO / Product Perspective
Ask: Does this code serve the business and users?
Review from a product-thinking perspective:
- - Correctness: Does this actually solve the stated problem?
- Complexity: Is this 10x simpler than the previous approach? Or did we add an abstraction layer that solves nothing?
- Product signal: Is this a feature users asked for, or engineer-invented complexity?
- Scope creep: Did the diff grow beyond its original purpose?
- Tech vs business: Is this engineering for its own sake, or does it genuinely ship value?
Key question: "If I had to explain this change to the CEO in 30 seconds, would they be excited or confused?"
Step 4: Engineering Perspective
Ask: Is this code sound, maintainable, and safe?
- - Architecture: Does this fit the existing patterns? Or did we invent a new framework within a framework?
- Error handling: Are all failure modes handled? (network, disk, invalid input, timeouts)
- Resource management: Connections closed? Memory leaked? Background tasks cancelled?
- Dependencies: Did we add a new heavy dependency when a stdlib call would suffice?
- Security: SQL injection, XSS, auth bypass, secrets in code, overly permissive CORS?
- Performance: N+1 queries? Unindexed queries in loops? Memory-allocating operations in hot paths?
- API design: Are interfaces clean and composable, or did we leak internal state?
Code quality signals:
- - Functions under 30 lines? Exceptions are allowed but need strong justification.
- Meaningful variable/function names? No
temp2_final_v3 patterns. - Comments that explain why, not what? (the code shows what)
- Tests that test behavior, not implementation?
Step 5: QA / Testing Perspective
Ask: Would this pass a senior engineer's gut check for correctness?
- - Test coverage: Are the changed paths actually tested? Not just "tested" but validated?
- Edge cases: Empty input, max length, null bytes, unicode edge cases, concurrent access
- Happy path: Does the main user flow actually work end-to-end?
- Failure modes: What breaks when this code is wrong? Can a user detect it?
- Smoke tests: Does it at least import/parse/load without crashing?
- Test quality: Are tests asserting on behavior or mocking everything and asserting on internals?
Step 6: Assemble the Review
Present a structured review with three clearly labeled sections.
Review Output Template
CODEBLOCK3
Review Principles
- 1. Be direct. Don't hedge. "This is wrong" is more useful than "this might be worth considering."
- Distinguish severity. A missing test on a utility function ≠ a SQL injection vulnerability.
- Context matters. Code that looks wrong in isolation might be the right solution for the system.
- Praise good work. If the code is clean, simple, and well-tested, say so. Reinforce the pattern.
- Actionable over academic. "Consider using a WeakMap" is less useful than "Replace
new Map() with new WeakMap() on line 42 to avoid memory leaks in the closure." - No bikeshedding. Don't flag style preferences that a linter wouldn't flag. Focus on what matters.
When to Escalate (Don't Review Alone)
Escalate to human review for:
- - 🔴 Security vulnerabilities (auth bypass, injection, data exposure)
- 🔴 Breaking changes to external APIs or database schemas
- 🔴 Complex concurrency or distributed systems changes without expert review
- 🔴 Changes to payment, billing, or access control logic
- 🔴 Anything that would require a rollback plan
Inspired by Garry Tan's gstack (github.com/garrytan/gstack) — ported for OpenClaw.
gstack-review:多视角代码审查
审查框架
当被要求审查代码时,请遵循以下三个视角的框架。系统性地执行所有步骤。
步骤0:检测审查内容
审查范围的优先级顺序:
- 1. 未提交的更改(git diff HEAD)——如果工作目录有未暂存更改
- 用户指定的特定文件
- 与主分支的分支差异——如果当前在功能分支上
- 最近的提交——如果没有明确范围
bash
检查工作树状态
git status --short
获取未提交的更改
git diff HEAD
获取当前分支与origin/main之间的已提交更改
BRANCH=$(git branch --show-current 2>/dev/null || git rev-parse --abbrev-ref HEAD 2>/dev/null)
MAIN_BRANCH=$(git main-branch 2>/dev/null || echo main)
git log ${MAIN_BRANCH}..${BRANCH} --oneline 2>/dev/null | head -20
git diff ${MAIN_BRANCH}..${BRANCH} 2>/dev/null | head -500
列出已更改的文件
git diff --name-only ${MAIN_BRANCH}..${BRANCH} 2>/dev/null
步骤1:收集上下文
在审查之前,收集以下信息:
bash
项目类型和语言
ls
.json .toml
.yaml .gradle *.xml Makefile package.json 2>/dev/null | head -5
cat package.json 2>/dev/null | grep name\|scripts | head -5
运行测试(静默执行,捕获退出码)
TEST_OUTPUT=$(npm test 2>&1 || pytest 2>&1 || cargo test 2>&1 || true)
TEST_EXIT=$?
类型检查/代码检查
LINT_OUTPUT=$(npm run lint 2>&1 || npx tsc --noEmit 2>&1 || true)
构建检查
BUILD_OUTPUT=$(npm run build 2>&1 || cargo build 2>&1 || true)
BUILD_EXIT=$?
步骤2:阅读已更改的文件
对于每个已更改的文件,阅读完整内容以理解上下文。不要只看差异——阅读周围的代码。
对于差异中的每个文件:
1. 阅读完整文件(不仅仅是更改的行——上下文很重要)
2. 识别代码实际做什么与差异声称做什么之间的差异
3. 注意任何仅包含二进制/生成文件的文件(跳过详细审查)
步骤3:CEO/产品视角
提问:这段代码是否服务于业务和用户?
从产品思维的角度进行审查:
- - 正确性:这真的解决了所述的问题吗?
- 复杂度:这比之前的方法简单了10倍吗?还是我们添加了一个毫无意义的抽象层?
- 产品信号:这是用户要求的功能,还是工程师发明的复杂性?
- 范围蔓延:差异是否超出了其原始目的?
- 技术vs业务:这是为了技术本身而开发,还是真正交付了价值?
关键问题: 如果我必须在30秒内向CEO解释这个变更,他们会感到兴奋还是困惑?
步骤4:工程视角
提问:这段代码是否合理、可维护且安全?
- - 架构:这符合现有的模式吗?还是我们在一个框架内发明了一个新框架?
- 错误处理:所有失败模式都处理了吗?(网络、磁盘、无效输入、超时)
- 资源管理:连接关闭了吗?内存泄漏了吗?后台任务取消了吗?
- 依赖:当标准库调用就足够时,我们是否添加了新的重型依赖?
- 安全性:SQL注入、XSS、认证绕过、代码中的秘密、过于宽松的CORS?
- 性能:N+1查询?循环中的未索引查询?热路径中的内存分配操作?
- API设计:接口是否干净且可组合,还是我们泄露了内部状态?
代码质量信号:
- - 函数是否在30行以内?允许例外但需要强有力的理由。
- 变量/函数名称是否有意义?没有temp2finalv3这样的模式。
- 注释是否解释为什么,而不是什么?(代码显示什么)
- 测试是否测试行为,而不是实现?
步骤5:QA/测试视角
提问:这能通过高级工程师对正确性的直觉检查吗?
- - 测试覆盖率:更改的路径是否真的被测试了?不仅仅是测试过而是验证过?
- 边界情况:空输入、最大长度、空字节、Unicode边界情况、并发访问
- 快乐路径:主要用户流程是否端到端正常工作?
- 失败模式:当这段代码出错时会发生什么?用户能检测到吗?
- 冒烟测试:它至少能导入/解析/加载而不崩溃吗?
- 测试质量:测试是在断言行为,还是模拟一切并断言内部实现?
步骤6:组装审查
呈现一个结构化的审查,包含三个明确标记的部分。
审查输出模板
代码审查:[分支名称] — [日期]
═══════════════════════════════════════════════
摘要
[一段:什么改变了以及为什么改变。如果这是一条提交信息,它好吗?]
更改的文件: N个文件 | 行数: +N -N
测试: [通过/失败/无] | 构建: [通过/失败/无] | 代码检查: [通过/失败/无]
🏛️ CEO/产品审查
[要点发现。用🔴标记问题,用✅表扬好的决策]
结论: [明确陈述——发布、重做或与团队讨论]
⚙️ 工程审查
[要点发现。按类别分组:架构、安全性、性能、代码质量]
结论: [明确陈述]
🧪 QA/测试审查
[要点发现。按以下分组:覆盖率、边界情况、正确性]
结论: [明确陈述]
行动项
- - [ ] [优先级] [具体的可操作项——谁应该修复以及如何修复]
- [ ] [优先级] ...
总体: ✅ 批准发布 / ⚠️ 需要修订 / ❌ 阻止
═══════════════════════════════════════════════
审查原则
- 1. 直接了当。 不要含糊其辞。这是错误的比这可能值得考虑更有用。
- 区分严重性。 工具函数上缺少测试 ≠ SQL注入漏洞。
- 上下文很重要。 孤立看起来错误的代码可能是系统的正确解决方案。
- 表扬好的工作。 如果代码干净、简单且经过良好测试,就说出来。强化这种模式。
- 可操作而非学术性。 考虑使用WeakMap不如在第42行将new Map()替换为new WeakMap()以避免闭包中的内存泄漏有用。
- 不要纠结于琐事。 不要标记代码检查工具不会标记的风格偏好。关注重要的事情。
何时升级(不要单独审查)
将以下情况升级到人工审查:
- - 🔴 安全漏洞(认证绕过、注入、数据泄露)
- 🔴 对外部API或数据库模式的破坏性更改
- 🔴 没有专家审查的复杂并发或分布式系统更改
- 🔴 对支付、计费或访问控制逻辑的更改
- 🔴 任何需要回滚计划的内容
灵感来自Garry Tan的gstack(github.com/garrytan/gstack)——为OpenClaw移植。