Context Builder — Agentic Skill
Generate a single, structured markdown file from any codebase directory. The output is optimized for LLM consumption with relevance-based file ordering, AST-aware code signatures, automatic token budgeting, and smart defaults.
Installation
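```bash
# Requires the Rust toolchain. Builds from source, with cryptographic verification via crates.io.
cargo install context-builder --features tree-sitter-all
```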
Pre-built binaries with SHA256 checksums are also available for manual download from GitHub Releases.
Verify: context-builder --version (expected: 0.8.3)
Security & Path Scoping
IMPORTANT: This tool reads file contents from the specified directory. Agents MUST follow these rules:
- Only target explicit project directories: always pass the exact project root (e.g., `/home/user/projects/myapp`). Never point at home directories, system paths, or credential stores (`~/.ssh`, `~/.aws`, `/etc`, `~`, `/`)
- Use scoped filters: use `-f` to limit to known source extensions (e.g., `-f rs,toml,md`), reducing the exposure surface
- Output to project-local paths: write output to the project's `docs/` folder or `/tmp/`, never to shared or public locations
- Review before sharing: the output may contain API keys, secrets, or credentials embedded in source files; always review it, or use `.gitignore` patterns to exclude sensitive files
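As a rough illustration of the first rule, an agent wrapper could refuse unsafe targets before invoking the tool. This sketch is not part of context-builder: `is_safe_target` is a hypothetical helper, and its deny-list covers only the example locations named in the rules.

```bash
#!/usr/bin/env bash
# Hypothetical guard (not a context-builder feature): succeeds only for an
# absolute path that is not a home directory, /etc, or a credential store.
is_safe_target() {
  local target="${1%/}"   # strip one trailing slash; "/" becomes ""
  case "$target" in
    ""|"~"|"$HOME"|/etc|/etc/*) return 1 ;;             # root, home, system paths
    "$HOME"/.ssh|"$HOME"/.ssh/*) return 1 ;;            # SSH keys
    "$HOME"/.aws|"$HOME"/.aws/*) return 1 ;;            # AWS credentials
    /*) return 0 ;;                                     # any other absolute path
    *)  return 1 ;;                                     # relative paths are rejected
  esac
}
```

A wrapper might run `is_safe_target "$dir" || exit 1` before calling `context-builder -d "$dir"`. A real deployment would likely want an allow-list of project roots rather than a deny-list.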
Built-in protections (always active, no configuration needed):
- Excludes `.git/`, `node_modules/`, and 19 other heavy/sensitive directories at any depth
- Respects `.gitignore` rules when a `.git` directory is present
- Binary files are auto-detected and skipped via UTF-8 sniffing
- Output file and cache directory are auto-excluded to prevent self-ingestion
When to Use
- Deep code review: feed an entire codebase to an LLM for architecture analysis or bug hunting
- Onboarding: generate a project snapshot for understanding unfamiliar codebases
- Diff-based updates: after code changes, generate only the diffs to update an LLM's understanding
- AST signatures: extract function/class signatures for token-efficient structural understanding
- Cross-project research: quickly package a dependency's source for analysis
Core Workflow
1. Quick Context (whole project)
```bash
context-builder -d /path/to/project -y -o context.md
```
- `-y` skips confirmation prompts (recommended for agent workflows when the path is explicitly scoped)
- Output includes: header → file tree → files sorted by relevance (config → source → tests → docs)
2. Scoped Context (specific file types)
```bash
context-builder -d /path/to/project -f rs,toml -i docs,assets -y -o context.md
```
- `-f rs,toml` includes only Rust and TOML files
- `-i docs,assets` excludes directories by name
3. AST Signatures Mode (minimal tokens)
```bash
context-builder -d /path/to/project --signatures -f rs,ts,py -y -o signatures.md
```
- Replaces full file content with extracted function/class signatures (~4K vs ~15K tokens per file)
- Supports 8 languages: Rust, JavaScript (.js/.jsx), TypeScript (.ts/.tsx), Python, Go, Java, C, C++
- Requires `--features tree-sitter-all` at install time
4. Signatures with Structural Summary
```bash
context-builder -d /path/to/project --signatures --structure -y -o context.md
```
- `--structure` appends a count summary (e.g., "6 functions, 2 structs, 1 impl block")
- Combine with `--visibility public` to show only the public API surface
5. Budget-Constrained Context
```bash
context-builder -d /path/to/project --max-tokens 100000 -y -o context.md
```
- Caps output to ~100K tokens (estimated)
- Files are included in relevance order until the budget is exhausted
- Automatically warns if output exceeds 128K tokens
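To make the budgeting behavior concrete, here is a minimal sketch of relevance-ordered selection under a token cap. It is an illustration only, not the tool's implementation: `select_within_budget` is a hypothetical name, the per-file token counts are supplied by the caller, and stopping at the first file that does not fit is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: reads "<tokens> <path>" lines from stdin, already in
# relevance order, and prints paths until the next file would exceed the budget.
select_within_budget() {
  local budget="$1" used=0 tokens path
  while read -r tokens path; do
    if (( used + tokens > budget )); then
      break                      # assumption: stop at the first file that does not fit
    fi
    used=$(( used + tokens ))
    printf '%s\n' "$path"
  done
}
```

For example, piping three files of 40, 50, and 30 tokens into `select_within_budget 100` keeps the first two and drops the third, because it would push the running total past the cap.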
6. Token Count Preview
```bash
context-builder -d /path/to/project --token-count
```
- Prints the estimated token count without generating output
- Use this first to decide if filtering or `--signatures` is needed
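One way an agent could act on the estimate is a small decision helper like the sketch below. `choose_strategy` is hypothetical, and the thresholds are assumptions chosen for illustration (the 4x multiplier loosely follows the ~4K vs ~15K per-file figure quoted for signatures mode). The token count is passed in as an argument because the exact output format of `--token-count` is not described here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: maps an estimated token count to a suggested strategy.
# $1 = estimated tokens, $2 = model budget (defaults to 100000).
choose_strategy() {
  local tokens="$1" budget="${2:-100000}"
  if (( tokens <= budget )); then
    echo "full"                  # whole project fits: plain run
  elif (( tokens <= budget * 4 )); then
    echo "signatures"            # assumed: signatures alone bring it under budget
  else
    echo "filter+signatures"     # assumed: also narrow with -f / -i
  fi
}
```

With the default budget, `choose_strategy 80000` suggests a full run, while `choose_strategy 300000` suggests switching to `--signatures`.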
7. Incremental Diffs
First, ensure context-builder.toml exists with:
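```toml
timestamped_output = true
auto_diff = true
```
Then run twice:
```bash
# First run: baseline snapshot
context-builder -d /path/to/project -y

# After code changes: generates diff annotations
context-builder -d /path/to/project -y
```
For minimal output (diffs only, no full file bodies):
```bash
context-builder -d /path/to/project -y --diff-only
```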
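An agent may want to make sure the diff configuration is in place before the first run. The helper below is a sketch under assumptions: `ensure_diff_config` is a hypothetical name, it writes only the two settings this workflow relies on (`timestamped_output` and `auto_diff`), and it deliberately leaves an existing `context-builder.toml` untouched rather than trying to merge keys into it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: creates a minimal context-builder.toml in the given
# project directory if none exists. Existing files are never modified.
ensure_diff_config() {
  local cfg="$1/context-builder.toml"
  if [ -f "$cfg" ]; then
    return 0                     # keep whatever the project already has
  fi
  printf 'timestamped_output = true\nauto_diff = true\n' > "$cfg"
}
```

Because an existing file is left alone, it is still worth checking by hand that it enables both settings before expecting diff output.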
Smart Defaults
These behaviors require no configuration:
| Feature | Behavior |
|---|---|
| Auto-ignore | `node_modules`, `dist`, `build`, `__pycache__`, `.venv`, `vendor`, and 12 more heavy dirs are excluded at any depth |
| Self-exclusion | Output file, cache dir, and `context-builder.toml` are auto-excluded |
| `.gitignore` | Respected automatically when a `.git` directory exists |
| Binary detection | Binary files are skipped via UTF-8 sniffing |
| File ordering | Config/docs first → source (entry points before helpers) → tests → build/CI → lockfiles |
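The file-ordering row can be pictured as a rank assigned to each path, followed by a stable sort. The sketch below is only an approximation of that idea: `relevance_rank` and `sort_by_relevance` are hypothetical names, the glob patterns are illustrative guesses rather than the tool's actual rules, and the entry-point-before-helper refinement is not modeled.

```bash
#!/usr/bin/env bash
# Hypothetical ranking: lower numbers sort first. Patterns are illustrative.
relevance_rank() {
  case "$1" in
    *.lock|package-lock.json)          echo 4 ;;  # lockfiles last
    .github/*|Makefile|Dockerfile)     echo 3 ;;  # build/CI
    tests/*|*_test.*|*.test.*)         echo 2 ;;  # tests
    *.toml|*.json|*.yaml|*.yml|*.md)   echo 0 ;;  # config and docs first
    *)                                 echo 1 ;;  # everything else: source
  esac
}

# Reads paths from stdin and prints them in rank order; ties keep input order.
sort_by_relevance() {
  local path
  while read -r path; do
    printf '%s %s\n' "$(relevance_rank "$path")" "$path"
  done | sort -s -n -k1,1 | cut -d' ' -f2-
}
```

Feeding `Cargo.lock`, `src/main.rs`, `tests/a.rs`, and `Cargo.toml` through `sort_by_relevance` yields `Cargo.toml`, `src/main.rs`, `tests/a.rs`, `Cargo.lock`.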
CLI Reference (Agent-Relevant Flags)
| Flag | Purpose | Agent Guidance |
|---|---|---|
| `-d <PATH>` | Input directory | Always use absolute paths for reliability |
| `-o <FILE>` | Output path | Write to project `docs/` or `/tmp/` |
| `-f <EXT>` | Filter by extension | Comma-separated: `-f rs,toml,md` |
| `-i <NAME>` | Ignore dirs/files | Comma-separated: `-i tests,docs,assets` |
| `--max-tokens <N>` | Token budget cap | Use `100000` for most models, `200000` for Gemini |
| `--token-count` | Dry-run token estimate | Run first to check if filtering is needed |
| `-y` | Skip all prompts | Use only with explicit, scoped project paths |
| `--preview` | Show file tree only | Quick exploration without generating output |
| `--diff-only` | Output only diffs | Minimizes tokens for incremental updates |
| `--signatures` | AST signature extraction | Requires `tree-sitter-all` feature at install |
| `--structure` | Structural summary | Pair with `--signatures` for compact output |
| `--visibility <V>` | Filter by visibility | `all` (default), `public` (public API only) |
| `--truncate <MODE>` | Truncation strategy | `smart` (AST-aware) or `simple` |
| `--init` | Create config file | Auto-detects project file types |
| `--clear-cache` | Reset diff cache | Use if diff output seems stale |
Recipes
Recipe: Deep Think Code Review
Generate a scoped context file, then prompt an LLM for deep analysis:
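```bash
# Step 1: Generate focused context
context-builder -d /path/to/project -f rs,toml --max-tokens 120000 -y -o docs/deepthinkcontext.md

# Step 2: Feed the context to an LLM with a review prompt
# Attach docs/deepthinkcontext.md and ask for:
# - Architecture review
# - Bug hunting
# - Performance analysis
```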
Recipe: API Surface Review (signatures only)
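```bash
# Extract only public signatures: typically 80-90% fewer tokens than full source
context-builder -d /path/to/project --signatures --visibility public -f rs -y -o docs/api_surface.md
```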
Recipe: Compare Two Versions
CODEBLOCK12
Recipe: Monorepo Slice
CODEBLOCK13
Recipe: Quick Size Check Before Deciding Strategy
CODEBLOCK14
Configuration File (Optional)
Create context-builder.toml in the project root for persistent settings:
CODEBLOCK15
Initialize one automatically with context-builder --init.
Output Format
The generated markdown follows this structure:
# Directory Structure Report
[metadata: project name, filters, content hash]
## File Tree
[visual tree of included files]
## Files
### File: src/main.rs
[code block with file contents, syntax-highlighted by extension]
### File: src/lib.rs
...
Files appear in relevance order (not alphabetical), prioritizing config and entry points so LLMs build understanding faster.
When --signatures is active, file contents are replaced with extracted signatures:
### File: src/lib.rs
CODEBLOCK16
Error Handling
- If `context-builder` is not installed, install it with `cargo install context-builder --features tree-sitter-all`
- If `--signatures` shows no output for a file, the language may not be supported or the feature was not enabled at install
- If output exceeds token limits, add `--max-tokens`, narrow with `-f` / `-i`, or use `--signatures`
- If the project has no `.git` directory, auto-ignores still protect against dependency flooding
- Use `--clear-cache` if diff output seems stale or incorrect
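Context Builder: Agentic Skill
Generate a single, structured markdown file from any codebase directory. The output is optimized for LLM consumption, with relevance-based file ordering, AST-aware code signatures, automatic token budgeting, and smart defaults.
Installation
```bash
# Requires the Rust toolchain. Builds from source, with cryptographic verification via crates.io.
cargo install context-builder --features tree-sitter-all
```
Pre-built binaries with SHA256 checksums are also available for manual download from GitHub Releases.
Verify: `context-builder --version` (expected output: 0.8.3)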
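Security & Path Scoping
IMPORTANT: This tool reads file contents from the specified directory. Agents MUST follow these rules:
- Only target explicit project directories: always pass the exact project root (e.g., `/home/user/projects/myapp`). Never point at home directories, system paths, or credential stores (`~/.ssh`, `~/.aws`, `/etc`, `~`, `/`)
- Use scoped filters: use `-f` to limit to known source extensions (e.g., `-f rs,toml,md`), reducing the exposure surface
- Output to project-local paths: write output to the project's `docs/` folder or `/tmp/`, never to shared or public locations
- Review before sharing: the output may contain API keys, secrets, or credentials embedded in source files; always review it, or use `.gitignore` patterns to exclude sensitive files
Built-in protections (always active, no configuration needed):
- Excludes `.git/`, `node_modules/`, and 19 other heavy/sensitive directories at any depth
- Respects `.gitignore` rules when a `.git` directory is present
- Binary files are auto-detected and skipped via UTF-8 sniffing
- Output file and cache directory are auto-excluded to prevent self-ingestion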
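When to Use
- Deep code review: feed an entire codebase to an LLM for architecture analysis or bug hunting
- Onboarding: generate a project snapshot for understanding unfamiliar codebases
- Diff-based updates: after code changes, generate only the diffs to update an LLM's understanding
- AST signatures: extract function/class signatures for token-efficient structural understanding
- Cross-project research: quickly package a dependency's source for analysis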
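Core Workflow
1. Quick Context (whole project)
```bash
context-builder -d /path/to/project -y -o context.md
```
- `-y` skips confirmation prompts (recommended for agent workflows when the path is explicitly scoped)
- Output includes: header → file tree → files sorted by relevance (config → source → tests → docs)
2. Scoped Context (specific file types)
```bash
context-builder -d /path/to/project -f rs,toml -i docs,assets -y -o context.md
```
- `-f rs,toml` includes only Rust and TOML files
- `-i docs,assets` excludes directories by name
3. AST Signatures Mode (minimal tokens)
```bash
context-builder -d /path/to/project --signatures -f rs,ts,py -y -o signatures.md
```
- Replaces full file content with extracted function/class signatures (~4K vs ~15K tokens per file)
- Supports 8 languages: Rust, JavaScript (.js/.jsx), TypeScript (.ts/.tsx), Python, Go, Java, C, C++
- Requires `--features tree-sitter-all` at install time
4. Signatures with Structural Summary
```bash
context-builder -d /path/to/project --signatures --structure -y -o context.md
```
- `--structure` appends a count summary (e.g., "6 functions, 2 structs, 1 impl block")
- Combine with `--visibility public` to show only the public API surface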
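5. Budget-Constrained Context
```bash
context-builder -d /path/to/project --max-tokens 100000 -y -o context.md
```
- Caps output to ~100K tokens (estimated)
- Files are included in relevance order until the budget is exhausted
- Automatically warns if output exceeds 128K tokens
6. Token Count Preview
```bash
context-builder -d /path/to/project --token-count
```
- Prints the estimated token count without generating output
- Use this first to decide if filtering or `--signatures` is needed
7. Incremental Diffs
First, ensure `context-builder.toml` exists with:
```toml
timestamped_output = true
auto_diff = true
```
Then run twice:
```bash
# First run: baseline snapshot
context-builder -d /path/to/project -y

# After code changes: generates diff annotations
context-builder -d /path/to/project -y
```
For minimal output (diffs only, no full file bodies):
```bash
context-builder -d /path/to/project -y --diff-only
```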
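Smart Defaults
These behaviors require no configuration:
| Feature | Behavior |
|---|---|
| Auto-ignore | `node_modules`, `dist`, `build`, `__pycache__`, `.venv`, `vendor`, and 12 more heavy dirs are excluded at any depth |
| Self-exclusion | Output file, cache dir, and `context-builder.toml` are auto-excluded |
| `.gitignore` | Respected automatically when a `.git` directory exists |
| Binary detection | Binary files are skipped via UTF-8 sniffing |
| File ordering | Config/docs first → source (entry points before helpers) → tests → build/CI → lockfiles |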
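CLI Reference (Agent-Relevant Flags)
| Flag | Purpose | Agent Guidance |
|---|---|---|
| `-d <PATH>` | Input directory | Always use absolute paths for reliability |
| `-o <FILE>` | Output path | Write to project `docs/` or `/tmp/` |
| `-f <EXT>` | Filter by extension | Comma-separated: `-f rs,toml,md` |
| `-i <NAME>` | Ignore dirs/files | Comma-separated: `-i tests,docs,assets` |
| `--max-tokens <N>` | Token budget cap | Use `100000` for most models, `200000` for Gemini |
| `--token-count` | Dry-run token estimate | Run first to check if filtering is needed |
| `-y` | Skip all prompts | Use only with explicit, scoped project paths |
| `--preview` | Show file tree only | Quick exploration without generating output |
| `--diff-only` | Output only diffs | Minimizes tokens for incremental updates |
| `--signatures` | AST signature extraction | Requires `tree-sitter-all` feature at install |
| `--structure` | Structural summary | Pair with `--signatures` for compact output |
| `--visibility <V>` | Filter by visibility | `all` (default), `public` (public API only) |
| `--truncate <MODE>` | Truncation strategy | `smart` (AST-aware) or `simple` |
| `--init` | Create config file | Auto-detects project file types |
| `--clear-cache` | Reset diff cache | Use if diff output seems stale |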
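Recipes
Recipe: Deep Think Code Review
Generate a scoped context file, then prompt an LLM for deep analysis:
```bash
# Step 1: Generate focused context
context-builder -d /path/to/project -f rs,toml --max-tokens 120000 -y -o docs/deepthinkcontext.md

# Step 2: Feed the context to an LLM with a review prompt
# Attach docs/deepthinkcontext.md and ask for:
# - Architecture review
# - Bug hunting
# - Performance analysis
```
Recipe: API Surface Review (signatures only)
```bash
# Extract only public signatures: typically 80-90% fewer tokens than full source
context-builder -d /path/to/project --signatures --visibility public -f rs -y -o docs/api_surface.md
```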
Recipe: Compare Two Versions
bash
# Generate context for both versions
context-builder -d ./v1 -f py -y -o /tmp/v1_context.md
context-builder -d ./v