Code Archaeology Skill
Overview
Code Archaeology is a systematic analysis methodology for understanding legacy codebases and extracting actionable insights for modernization. This skill provides tools and workflows for:
- Business Rule Extraction: Identify and document business logic from legacy code
- Technical Specification Generation: Extract data models, API contracts, and system architecture
- Security Risk Assessment: Identify security vulnerabilities and technical debt
- Migration Planning: Generate detailed migration requirements and task breakdowns
- AI Plan Generator Integration: Convert analysis results into AI-executable context documents
Unified Directory Structure
Code Archaeology results are organized in a standardized directory structure:
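```
{project}_code_archaeology/
├── results/                     # Main analysis outputs (for AI integration)
│   ├── {project}_api_analysis.md
│   ├── {project}_security_audit_results.md
│   ├── {project}_performance_analysis.md
│   ├── {project}_technical_debt_assessment.md
│   ├── {project}_optimization_recommendations.md
│   └── {project}_code_archaeology_final_report.md
├── process/                     # Detailed analysis artifacts (30+ files)
│   ├── 01-system-constants-analysis.md
│   ├── 02-database-schema-analysis.md
│   ├── 03-business-domain-file-list.md
│   ├── {domain}-analysis.md     # One per business domain
│   └── round2_progress.json
├── source/                      # Original source code reference
│   └── {project}/
└── {project}_archaeology_status.json   # Analysis status tracking
```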
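As a rough illustration of how a script might work with this layout, the sketch below builds the expected result-file paths for a project and reports which ones are missing. The `resultPaths` and `missingResults` helpers are hypothetical, and the file-name suffixes assume the naming pattern described in this document; adjust them to match the actual output.

```javascript
const fs = require("fs");
const path = require("path");

// Result files expected under results/ (suffixes assumed from this document's layout).
const RESULT_SUFFIXES = [
  "api_analysis",
  "security_audit_results",
  "performance_analysis",
  "technical_debt_assessment",
  "optimization_recommendations",
  "code_archaeology_final_report",
];

// Build the expected result paths for a project (hypothetical helper).
function resultPaths(baseDir, project) {
  const root = path.join(baseDir, `${project}_code_archaeology`);
  return RESULT_SUFFIXES.map((suffix) =>
    path.join(root, "results", `${project}_${suffix}.md`)
  );
}

// Return the expected result files that do not exist on disk.
function missingResults(baseDir, project) {
  return resultPaths(baseDir, project).filter((p) => !fs.existsSync(p));
}
```

Calling `missingResults(".", "legacy_project")` before handing results to a downstream tool gives a quick completeness check.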
Core Capabilities
1. Multi-Round Analysis
- Round 1: Business domain mapping and core architecture analysis
- Round 2: Deep technical assessment (security, performance, optimization)
2. Domain-Specific Analysis
- Financial Management: Payment processing, invoicing, reconciliation
- Customer Management: User authentication, profile management
- Contract Management: Contract lifecycle, status transitions
- Supply Chain: Inventory, procurement, logistics
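For a domain such as contract management, an extracted business rule can often be recorded as a transition table. The sketch below is only an illustration: the state names and the `canTransition` helper are invented, and real states would come from the legacy code under analysis.

```javascript
// Hypothetical contract states and allowed transitions, as they might be
// documented after extracting them from legacy code.
const CONTRACT_TRANSITIONS = {
  draft: ["active", "cancelled"],
  active: ["suspended", "expired", "cancelled"],
  suspended: ["active", "cancelled"],
  expired: [],
  cancelled: [],
};

// Check whether a status change is allowed by the documented rules.
// Unknown source states are treated as having no allowed transitions.
function canTransition(from, to) {
  const allowed = CONTRACT_TRANSITIONS[from];
  return Array.isArray(allowed) && allowed.includes(to);
}
```

Keeping the rule as data rather than scattered conditionals makes it easy to review with business stakeholders and to reuse as a test oracle during migration.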
3. Security Risk Identification
- Critical: Hardcoded credentials, SQL injection vulnerabilities
- High: Weak password storage, session management issues
- Medium: XSS/CSRF protection gaps, insecure file permissions
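The severity tiers above can be captured as a simple lookup so that findings sort consistently. This is a minimal sketch: the category keys and the `rankFindings` helper are illustrative names, not part of the skill.

```javascript
// Severity tiers from the list above; the category keys are illustrative.
const SEVERITY_BY_CATEGORY = {
  hardcoded_credentials: "critical",
  sql_injection: "critical",
  weak_password_storage: "high",
  session_management: "high",
  xss_csrf_gap: "medium",
  insecure_file_permissions: "medium",
};

const SEVERITY_ORDER = ["critical", "high", "medium"];

// Position of a severity in the ordering; unknown values sort last.
function severityRank(severity) {
  const index = SEVERITY_ORDER.indexOf(severity);
  return index === -1 ? SEVERITY_ORDER.length : index;
}

// Attach a severity to each finding and sort most severe first.
// Categories without a mapping are labeled "unclassified".
function rankFindings(findings) {
  return findings
    .map((finding) => {
      const known = Object.prototype.hasOwnProperty.call(
        SEVERITY_BY_CATEGORY,
        finding.category
      );
      const severity = known ? SEVERITY_BY_CATEGORY[finding.category] : "unclassified";
      return { ...finding, severity };
    })
    .sort((a, b) => severityRank(a.severity) - severityRank(b.severity));
}
```

For example, passing a cross-site scripting finding and a hardcoded-credentials finding returns the credentials finding first.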
4. Technical Debt Assessment
- Architecture: Monolithic limitations, lack of layered architecture
- Code Quality: Code duplication, outdated language features
- Maintainability: Missing documentation, poor test coverage
- Performance: Database query optimization, caching mechanisms
AI Plan Generator Integration
Code Archaeology results can be directly consumed by AI Plan Generator to create:
- Campaign Documents: Strategic migration plans with clear boundaries
- Context Documents: AI-executable business rules and technical specifications
- Task Decomposition: Detailed implementation tasks with priorities and dependencies
- Validation Standards: Comprehensive testing requirements and acceptance criteria
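Tasks with priorities and dependencies need an execution order that respects both. One way to derive it is a topological sort that breaks ties by priority. The sketch below assumes a hypothetical task shape (`id`, `priority`, `dependsOn`) and does not reflect the AI Plan Generator's actual format.

```javascript
// Order tasks so that every dependency comes first; among tasks that are
// ready, the lowest priority number runs earliest. Task shape is hypothetical.
function orderTasks(tasks) {
  const done = new Set();
  const ordered = [];

  while (ordered.length < tasks.length) {
    // Tasks not yet scheduled whose dependencies have all been scheduled.
    const ready = tasks.filter(
      (task) =>
        !done.has(task.id) &&
        (task.dependsOn || []).every((dep) => done.has(dep))
    );
    if (ready.length === 0) {
      throw new Error("Cyclic or unresolved dependency among tasks");
    }
    ready.sort((a, b) => a.priority - b.priority);
    done.add(ready[0].id);
    ordered.push(ready[0].id);
  }
  return ordered;
}
```

The loop is quadratic in the number of tasks, which is acceptable for plan-sized inputs; a queue-based variant would suit larger graphs.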
Integration Workflow
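```bash
# 1. Run the Code Archaeology analysis
code-archaeology analyze legacy-project --output-dir legacy_project_code_archaeology

# 2. Generate AI Plan Generator context from the archaeology results
ai-plan-generator generate-context-from-archaeology \
  /path/to/legacy_project_code_archaeology \
  context-documents \
  finance

# 3. Validate context document completeness
ai-plan-generator analyze-completeness context-documents

# 4. Create a ClawTeam migration team
clawteam create --name finance-migration --description-file campaign.md
```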
Usage Guidelines
When to Use
- Legacy System Modernization: Planning migration from PHP 5.x, legacy Java, etc.
- Business Logic Documentation: Extracting undocumented business rules
- Security Remediation: Identifying and prioritizing security vulnerabilities
- Technical Debt Reduction: Planning systematic codebase improvements
Input Requirements
- Source Code Access: Full access to legacy codebase
- Business Context: Understanding of business domains and requirements
- Target Architecture: Clear vision of target modern architecture
Output Artifacts
- Comprehensive Reports: Executive summaries and detailed technical analysis
- Actionable Recommendations: Prioritized improvement and migration tasks
- Risk Assessments: Security and business continuity risk evaluations
- Integration Ready: Structured data for AI Plan Generator consumption
Best Practices
Analysis Process
1. Start Broad: Begin with high-level business domain mapping
2. Go Deep: Focus on critical domains (financial, security-sensitive)
3. Validate Findings: Cross-reference analysis results with business stakeholders
4. Iterate: Refine analysis based on feedback and new discoveries
Documentation Standards
- Machine Readable: Structure outputs for AI consumption
- Human Understandable: Provide clear explanations for business stakeholders
- Action Oriented: Focus on actionable insights and recommendations
- Version Controlled: Track analysis evolution over time
Integration Patterns
- ClawTeam Orchestration: Use analysis results to drive multi-agent coordination
- Continuous Validation: Regularly validate AI interpretations against original code
- Feedback Loops: Use implementation results to refine future analyses
Example Use Cases
Financial Module Migration
Input: Legacy PHP financial system with hardcoded credentials
Analysis: Identifies payment processing logic, security vulnerabilities, data models
Output: Complete migration plan with security remediation and validation standards
User Authentication Modernization
Input: Custom authentication system with weak password storage
Analysis: Extracts user management workflows, identifies security gaps
Output: Modern authentication implementation plan with proper security controls
API Standardization
Input: Inconsistent RPC-style APIs across multiple modules
Analysis: Documents all API endpoints, request/response formats, error handling
Output: RESTful API redesign specification with backward compatibility strategy
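To make the backward compatibility idea concrete, the sketch below maps RPC-style method names to REST routes through a lookup table, so legacy callers can keep using the old names while new clients call the routes directly. All method names and routes here are invented for illustration.

```javascript
// Hypothetical mapping from legacy RPC method names to REST routes.
const RPC_TO_REST = {
  getInvoice: { method: "GET", path: "/invoices/:id" },
  createInvoice: { method: "POST", path: "/invoices" },
  deleteInvoice: { method: "DELETE", path: "/invoices/:id" },
};

// Translate a legacy RPC call into a REST request description.
// Path parameters (":id") are filled from params; the remaining
// params become the request body.
function translateRpcCall(rpcMethod, params = {}) {
  if (!Object.prototype.hasOwnProperty.call(RPC_TO_REST, rpcMethod)) {
    throw new Error(`No REST mapping for RPC method: ${rpcMethod}`);
  }
  const route = RPC_TO_REST[rpcMethod];
  const body = { ...params };
  const url = route.path.replace(/:(\w+)/g, (match, name) => {
    if (!(name in body)) {
      throw new Error(`Missing path parameter: ${name}`);
    }
    const value = body[name];
    delete body[name];
    return encodeURIComponent(String(value));
  });
  return { method: route.method, url, body };
}
```

A compatibility shim built this way can log each translated call, which helps identify which legacy methods are still in use before they are retired.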
Code Archaeology transforms legacy code understanding from an art into a systematic, repeatable science that powers AI-driven modernization.
Integration Scripts
This skill includes integration scripts for converting Code Archaeology results to AI Plan Generator format:
- convert-to-ai-plan-generator.cjs: Main conversion utility
- code-archaeology-integrator.cjs: Core parsing and extraction logic
- process-file-manager.cjs: File location and organization management
Usage
```bash
node convert-to-ai-plan-generator.cjs /path/to/archaeology-results output-dir domain
```
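The conversion command takes three positional arguments: the archaeology results path, an output directory, and a domain. The sketch below shows one way a wrapper might validate them before invoking the converter. `parseConvertArgs` is a hypothetical helper, and the actual script may handle its arguments differently.

```javascript
// Validate the three positional arguments used by the conversion command
// (results path, output directory, domain). Hypothetical helper.
function parseConvertArgs(argv) {
  const [resultsPath, outputDir, domain] = argv;
  const missing = [];
  if (!resultsPath) missing.push("archaeology results path");
  if (!outputDir) missing.push("output directory");
  if (!domain) missing.push("domain");
  if (missing.length > 0) {
    throw new Error(`Missing argument(s): ${missing.join(", ")}`);
  }
  return { resultsPath, outputDir, domain };
}
```

In a wrapper script this would typically be called as `parseConvertArgs(process.argv.slice(2))`.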
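Code Archaeology Skill
Overview
Code Archaeology is a systematic analysis methodology for understanding legacy codebases and extracting actionable insights for modernization. This skill provides the following tools and workflows:
- Business Rule Extraction: Identify and document business logic from legacy code
- Technical Specification Generation: Extract data models, API contracts, and system architecture
- Security Risk Assessment: Identify security vulnerabilities and technical debt
- Migration Planning: Generate detailed migration requirements and task breakdowns
- AI Plan Generator Integration: Convert analysis results into AI-executable context documents
Unified Directory Structure
Code Archaeology results are organized in a standardized directory structure: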
```
{project}_code_archaeology/
├── results/                     # Main analysis outputs (for AI integration)
│   ├── {project}_api_analysis.md
│   ├── {project}_security_audit_results.md
│   ├── {project}_performance_analysis.md
│   ├── {project}_technical_debt_assessment.md
│   ├── {project}_optimization_recommendations.md
│   └── {project}_code_archaeology_final_report.md
├── process/                     # Detailed analysis artifacts (30+ files)
│   ├── 01-system-constants-analysis.md
│   ├── 02-database-schema-analysis.md
│   ├── 03-business-domain-file-list.md
│   ├── {domain}-analysis.md     # One per business domain
│   └── round2_progress.json
├── source/                      # Original source code reference
│   └── {project}/
└── {project}_archaeology_status.json   # Analysis status tracking
```
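Core Capabilities
1. Multi-Round Analysis
- Round 1: Business domain mapping and core architecture analysis
- Round 2: Deep technical assessment (security, performance, optimization)
2. Domain-Specific Analysis
- Financial Management: Payment processing, invoicing, reconciliation
- Customer Management: User authentication, profile management
- Contract Management: Contract lifecycle, status transitions
- Supply Chain: Inventory, procurement, logistics
3. Security Risk Identification
- Critical: Hardcoded credentials, SQL injection vulnerabilities
- High: Weak password storage, session management issues
- Medium: XSS/CSRF protection gaps, insecure file permissions
4. Technical Debt Assessment
- Architecture: Monolithic limitations, lack of layered architecture
- Code Quality: Code duplication, outdated language features
- Maintainability: Missing documentation, poor test coverage
- Performance: Database query optimization, caching mechanisms
AI Plan Generator Integration
Code Archaeology results can be consumed directly by AI Plan Generator to create:
- Campaign Documents: Strategic migration plans with clear boundaries
- Context Documents: AI-executable business rules and technical specifications
- Task Decomposition: Detailed implementation tasks with priorities and dependencies
- Validation Standards: Comprehensive testing requirements and acceptance criteria
Integration Workflow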
```bash
# 1. Run the Code Archaeology analysis
code-archaeology analyze legacy-project --output-dir legacy_project_code_archaeology

# 2. Generate AI Plan Generator context from the archaeology results
ai-plan-generator generate-context-from-archaeology \
  /path/to/legacy_project_code_archaeology \
  context-documents \
  finance

# 3. Validate context document completeness
ai-plan-generator analyze-completeness context-documents

# 4. Create a ClawTeam migration team
clawteam create --name finance-migration --description-file campaign.md
```
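Usage Guidelines
When to Use
- Legacy System Modernization: Planning migration from PHP 5.x, legacy Java, etc.
- Business Logic Documentation: Extracting undocumented business rules
- Security Remediation: Identifying and prioritizing security vulnerabilities
- Technical Debt Reduction: Planning systematic codebase improvements
Input Requirements
- Source Code Access: Full access to legacy codebase
- Business Context: Understanding of business domains and requirements
- Target Architecture: Clear vision of target modern architecture
Output Artifacts
- Comprehensive Reports: Executive summaries and detailed technical analysis
- Actionable Recommendations: Prioritized improvement and migration tasks
- Risk Assessments: Security and business continuity risk evaluations
- Integration Ready: Structured data for AI Plan Generator consumption
Best Practices
Analysis Process
1. Start Broad: Begin with high-level business domain mapping
2. Go Deep: Focus on critical domains (financial, security-sensitive)
3. Validate Findings: Cross-reference analysis results with business stakeholders
4. Iterate: Refine analysis based on feedback and new discoveries
Documentation Standards
- Machine Readable: Structure outputs for AI consumption
- Human Understandable: Provide clear explanations for business stakeholders
- Action Oriented: Focus on actionable insights and recommendations
- Version Controlled: Track analysis evolution over time
Integration Patterns
- ClawTeam Orchestration: Use analysis results to drive multi-agent coordination
- Continuous Validation: Regularly check that AI interpretations remain consistent with the original code
- Feedback Loops: Use implementation results to refine future analyses
Example Use Cases
Financial Module Migration
Input: Legacy PHP financial system with hardcoded credentials
Analysis: Identifies payment processing logic, security vulnerabilities, data models
Output: Complete migration plan with security remediation and validation standards
User Authentication Modernization
Input: Custom authentication system with weak password storage
Analysis: Extracts user management workflows, identifies security gaps
Output: Modern authentication implementation plan with proper security controls
API Standardization
Input: Inconsistent RPC-style APIs across multiple modules
Analysis: Documents all API endpoints, request/response formats, error handling
Output: RESTful API redesign specification with backward compatibility strategy
Code Archaeology transforms legacy code understanding from an art into a systematic, repeatable science that powers AI-driven modernization.
Integration Scripts
This skill includes integration scripts for converting Code Archaeology results to AI Plan Generator format:
- convert-to-ai-plan-generator.cjs: Main conversion utility
- code-archaeology-integrator.cjs: Core parsing and extraction logic
- process-file-manager.cjs: File location and organization management
Usage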
```bash
node convert-to-ai-plan-generator.cjs /path/to/archaeology-results output-dir domain
```