FL Plugin — Model Migration Skill

Usage

/model-migrate-flagos <model_name> [upstream_folder] [plugin_folder]

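| Argument | Required | Default |
|---|---|---|
| model_name | Yes | none |
| upstream_folder | No | /tmp/vllm-upstream-ref |
| plugin_folder | No | current working directory |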

Execution

Step 1: Parse arguments and validate paths

Extract from user input:

- {{model_name}} = first argument (required, snake_case)
- {{upstream_folder}} = second argument or /tmp/vllm-upstream-ref
- {{plugin_folder}} = third argument or current working directory

If {{upstream_folder}} doesn't exist, ask user whether to clone it. If {{plugin_folder}} doesn't exist, error out.

→ Tell user: Confirm parsed model name and paths.
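The parsing and validation rules above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: `MigrationArgs`, `parse_migration_args`, and the snake_case regular expression are hypothetical names and choices; only the argument order, the defaults, and the "missing plugin folder is an error" rule come from Step 1.

```python
import os
import re
from dataclasses import dataclass

# Default taken from Step 1; everything else in this block is a hypothetical helper.
DEFAULT_UPSTREAM = "/tmp/vllm-upstream-ref"
# One plausible definition of snake_case (assumption): lowercase words joined by "_".
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


@dataclass
class MigrationArgs:
    model_name: str
    upstream_folder: str
    plugin_folder: str
    upstream_exists: bool  # False means: ask the user whether to clone


def parse_migration_args(argv, cwd=None):
    """Map positional arguments to the three values used by the procedure."""
    if not argv:
        raise ValueError("model_name is required")
    model_name = argv[0]
    if not SNAKE_CASE.match(model_name):
        raise ValueError(f"model_name must be snake_case, got {model_name!r}")
    upstream = argv[1] if len(argv) > 1 else DEFAULT_UPSTREAM
    plugin = argv[2] if len(argv) > 2 else (cwd or os.getcwd())
    if not os.path.isdir(plugin):
        # A missing plugin folder is fatal.
        raise FileNotFoundError(f"plugin folder not found: {plugin}")
    # A missing upstream folder is not fatal; the caller decides whether to clone.
    return MigrationArgs(model_name, upstream, plugin, os.path.isdir(upstream))
```

Returning `upstream_exists` instead of raising keeps the two failure modes distinct: a missing upstream folder leads to a question for the user, while a missing plugin folder stops the run.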

Step 2: Load references and resolve placeholders

Read these files (relative to this SKILL.md):

- references/procedure.md: step-by-step migration procedure
- references/compatibility-patches.md: 0.13.0 patch catalog
- references/operational-rules.md: communication, TaskList, bash rules, resilience

The procedure references executable scripts in scripts/:

- scripts/validate_migration.py: automated code review (Step 6)
- scripts/benchmark.sh: benchmark verification (Step 9)
- scripts/serve.sh: serve model locally (Step 10.1, also used for E2E)
- scripts/request.sh: test request (Step 10.2)
- scripts/e2e_eval.py: E2E correctness verification (Step 11)
- scripts/e2e_test_prompts.json: test prompts for E2E (5 text + 5 multimodal)
- scripts/e2e_config.template.json: E2E config template (copy to e2e_config.json and fill in)
- scripts/e2e_remote_serve.sh: manage GT server on remote machine via SSH

Then investigate upstream source + HuggingFace to resolve all placeholders:

| Placeholder | How to derive |
|---|---|
| {{model_name}} | Direct from argument |
| {{model_name_lower}} | Lowercase of model_name (usually identical, e.g. qwen3_5); used in file paths |
| {{MODEL_DISPLAY_NAME}} | From upstream code or HF model card |
| {{ModelClassName}} | From upstream model class (PascalCase) |
| {{model_type}} | From HF config.json model_type field |
| {{ConfigClassName}} | From upstream or derive from model_type |
| {{skill_root}} | Absolute path to this skill's folder (the directory containing this SKILL.md) |

Naming conventions vary per model — always verify from actual source, never guess.

→ Tell user: Present all resolved values. Use AskUserQuestion if anything is ambiguous.
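As a rough sketch of the resolution step, the snippet below reads `model_type` from a HuggingFace `config.json` and substitutes `{{name}}` placeholders in a template string. The helper names `read_model_type` and `render` are hypothetical, and the skill may resolve placeholders differently; the only facts taken from the text are the `{{...}}` syntax and the `model_type` field.

```python
import json
import re
from pathlib import Path

# Matches {{name}} placeholders as written in this document.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def read_model_type(hf_config_path):
    """Return the model_type field from a HuggingFace config.json."""
    config = json.loads(Path(hf_config_path).read_text(encoding="utf-8"))
    return config["model_type"]


def render(template, values):
    """Substitute {{name}} placeholders; fail loudly on any unresolved name."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        # Refusing to guess mirrors the "always verify, never guess" rule.
        raise KeyError(f"unresolved placeholders: {missing}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
```

Raising on an unresolved name, rather than leaving the placeholder in place, makes an ambiguous value visible before any file is written.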

Step 3: Execute procedure

With placeholders resolved, execute every step in procedure.md sequentially. Apply patches from compatibility-patches.md during the copy-then-patch step. Follow operational-rules.md throughout.

→ Tell user: Before starting, output a numbered plan. Report progress at each step boundary.
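The "plan first, then report at each step boundary" behavior could look like the sketch below. `run_plan` and the `(title, callable)` step format are illustrative assumptions; the real step list lives in procedure.md and is not reproduced here.

```python
def run_plan(steps):
    """Print a numbered plan, then run each step in order and report progress.

    `steps` is a list of (title, callable) pairs. The titles are supplied by
    the caller; none of them are taken from procedure.md.
    """
    total = len(steps)
    for i, (title, _) in enumerate(steps, start=1):
        print(f"{i}. {title}")
    completed = []
    for i, (title, action) in enumerate(steps, start=1):
        print(f"[{i}/{total}] starting: {title}")
        action()  # an exception here stops the run at the failing step
        completed.append(title)
        print(f"[{i}/{total}] done: {title}")
    return completed
```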

Scripts Reference

| Script | Step | Description |
|---|---|---|
| validate_migration.py | 6 | Automated import/API/registration checks |
| benchmark.sh | 9 | Benchmark verification |

Examples

Example 1: Typical new model
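User says: /model-migrate-flagos kimi_k25
Actions:
1. Parse → model_name=kimi_k25; upstream and plugin paths use the defaults
2. Clone upstream and locate vllm/model_executor/models/kimi_k25.py
3. Find that it wraps DeepseekV2 → follow the kimi_k25 (wrapper) pattern
4. Copy the files, apply the P1+P2 patches, create the config bridge
5. Register, validate, test, benchmark, serve + request
6. Run E2E verification against the upstream GT
Result: kimi_k25 works fully in the plugin; all 11 steps pass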

Example 2: Re-run after upstream update
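User says: Re-migrate qwen3_5, upstream has been updated
Actions:
1. Idempotent re-run: overwrite the existing files with a fresh upstream copy
2. Re-apply patches, re-validate, re-test
3. Re-run E2E to confirm there are no regressions
Result: qwen3_5 is updated to the latest upstream version with no regressions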

Troubleshooting

General principle: When any runtime error occurs, first compare vLLM upstream code against both the plugin adaptation and the installed 0.13.0 environment. The diff is the fastest path to root cause. See operational-rules.md § Debugging Priority: Upstream-First for the full protocol.
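One simple way to produce such a diff is Python's standard `difflib`. The sketch below compares a single upstream file with its plugin copy; the function name `diff_files` is an assumption, and the full protocol in operational-rules.md may use different tooling. The same function can be pointed at the file from the installed 0.13.0 environment to cover the third side of the comparison.

```python
import difflib
from pathlib import Path


def diff_files(upstream_path, plugin_path):
    """Return a unified diff between the upstream file and its plugin copy."""
    upstream = Path(upstream_path).read_text(encoding="utf-8").splitlines(keepends=True)
    plugin = Path(plugin_path).read_text(encoding="utf-8").splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            upstream,
            plugin,
            fromfile=str(upstream_path),
            tofile=str(plugin_path),
        )
    )
```

An empty string means the two files are identical, so any non-empty result is a candidate explanation for behavior that differs from upstream.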

| Problem | Typical Cause | Fix |
|---|---|---|
| `ImportError` after copy-then-patch | Missing P1 fix (relative→absolute imports) | Verify that every `from .xxx` import is converted to `from vllm.*` or `from vllm_fl.*` |
| `AttributeError: module vllm has no attribute X` | API doesn't exist in 0.13.0 | Check P3 in compatibility-patches.md; stub or remove |
| Config not recognized by vLLM | `model_type` mismatch or config bridge missing | Verify `_CONFIG_REGISTRY[model_type]` matches HF `config.json` exactly |
| Registration has no effect | Class name or import path typo | Compare with existing registrations in `__init__.py` |
| Benchmark `KeyError` on config field | Config bridge missing a field | Compare upstream config class vs bridge; add missing fields with defaults |
| Benchmark/Serve fails with OOM or "insufficient memory" | GPUs occupied by other processes | Kill GPU processes: `nvidia-smi --query-compute-apps=pid --format=csv,noheader \| xargs -r kill -9`, then retry. Never skip these steps. |
| Model outputs garbled/gibberish text | `ColumnParallelLinear` used for merged projections with different sub-dimensions (TP sharding mismatch) | Override `__init__` to use `MergedColumnParallelLinear(output_sizes=[...])`. See P8 in compatibility-patches.md |
| `AssertionError: Duplicate op name` | Child class imports custom op from a different module path than the parent | Use the same import path as the parent module (e.g. `vllm_fl.ops.fla`, not `vllm_fl.models.fla_ops`). See P11 |
| `AttributeError` on `fused_recurrent_*` during CUDA graph warmup | `__init__` override with `nn.Module.__init__(self)` missed attributes used by inherited `_forward_core` | Create ALL attributes from the parent's `__init__`, especially custom ops. See P12 |
| E2E: local server not reachable | `serve.sh` port doesn't match the local port in `e2e_config.json` | Ensure both use the same port (default 8122) |
| E2E: GT server not reachable | GT machine down or docker/conda env wrong | Check `e2e_remote_serve.sh status` or SSH manually |
| E2E: early token divergence (first 5 tokens) | Weight loading bug, TP sharding error | Check `load_weights`, `stacked_params_mapping`, `MergedColumnParallelLinear` |
| E2E: late minor divergence (token #15+) | Numerical noise from different op implementations | Usually acceptable; document in report |
| `resolve_op` fails with `VLLM_FL_PREFER_ENABLED=false` | Op not registered in dispatch, no fallback | Add a try/except fallback to `flag_gems` in the op import code |
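The two E2E divergence rows can be turned into a small triage helper. This is a sketch under stated assumptions: the function names are hypothetical, the thresholds 5 and 15 are read off the table, and the band between them is labeled "inconclusive" because the table does not classify it. It is not how `e2e_eval.py` is known to work.

```python
def first_divergence(local_tokens, gt_tokens):
    """Return the 0-based index of the first differing token, or None if equal."""
    for i, (a, b) in enumerate(zip(local_tokens, gt_tokens)):
        if a != b:
            return i
    if len(local_tokens) != len(gt_tokens):
        # One sequence is a strict prefix of the other.
        return min(len(local_tokens), len(gt_tokens))
    return None


def classify_divergence(local_tokens, gt_tokens, early=5, late=15):
    """Classify a local-vs-GT comparison using the table's two thresholds."""
    idx = first_divergence(local_tokens, gt_tokens)
    if idx is None:
        return "match"
    position = idx + 1  # 1-based, as in "token #15"
    if position <= early:
        return "early"  # suspect weight loading or TP sharding
    if position >= late:
        return "late"  # likely numerical noise; document in report
    return "inconclusive"  # the table gives no guidance for this band
```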

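FL Plugin: Model Migration Skill

Usage

/model-migrate-flagos <model_name> [upstream_folder] [plugin_folder]

| Argument | Required | Default |
|---|---|---|
| model_name | Yes | none |
| upstream_folder | No | /tmp/vllm-upstream-ref |
| plugin_folder | No | current working directory |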

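Execution

Step 1: Parse arguments and validate paths

Extract from user input:

- {{model_name}} = first argument (required, snake_case)
- {{upstream_folder}} = second argument or /tmp/vllm-upstream-ref
- {{plugin_folder}} = third argument or current working directory

If {{upstream_folder}} doesn't exist, ask the user whether to clone it. If {{plugin_folder}} doesn't exist, error out.

→ Tell user: Confirm parsed model name and paths.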

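Step 2: Load references and resolve placeholders

Read these files (relative to this SKILL.md):

- references/procedure.md: step-by-step migration procedure
- references/compatibility-patches.md: 0.13.0 patch catalog
- references/operational-rules.md: communication, TaskList, bash rules, resilience

The procedure references executable scripts in scripts/:

- scripts/validate_migration.py: automated code review (Step 6)
- scripts/benchmark.sh: benchmark verification (Step 9)
- scripts/serve.sh: serve model locally (Step 10.1, also used for E2E)
- scripts/request.sh: test request (Step 10.2)
- scripts/e2e_eval.py: E2E correctness verification (Step 11)
- scripts/e2e_test_prompts.json: test prompts for E2E (5 text + 5 multimodal)
- scripts/e2e_config.template.json: E2E config template (copy to e2e_config.json and fill in)
- scripts/e2e_remote_serve.sh: manage GT server on remote machine via SSH

Then investigate upstream source + HuggingFace to resolve all placeholders:

| Placeholder | How to derive |
|---|---|
| {{model_name}} | Direct from argument |
| {{model_name_lower}} | Lowercase of model_name (usually identical, e.g. qwen3_5); used in file paths |
| {{MODEL_DISPLAY_NAME}} | From upstream code or HF model card |
| {{ModelClassName}} | From upstream model class (PascalCase) |
| {{model_type}} | From HF config.json model_type field |
| {{ConfigClassName}} | From upstream or derive from model_type |
| {{skill_root}} | Absolute path to this skill's folder (the directory containing this SKILL.md) |

Naming conventions vary per model. Always verify from actual source, never guess.

→ Tell user: Present all resolved values. Use AskUserQuestion if anything is ambiguous.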

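Step 3: Execute procedure

With placeholders resolved, execute every step in procedure.md sequentially. Apply patches from compatibility-patches.md during the copy-then-patch step. Follow operational-rules.md throughout.

→ Tell user: Before starting, output a numbered plan. Report progress at each step boundary.

Scripts Reference

| Script | Step | Description |
|---|---|---|
| validate_migration.py | 6 | Automated import/API/registration checks |
| benchmark.sh | 9 | Benchmark verification |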

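Examples

Example 1: Typical new model

User says: /model-migrate-flagos kimi_k25
Actions:
1. Parse → model_name=kimi_k25; upstream and plugin paths use the defaults
2. Clone upstream and locate vllm/model_executor/models/kimi_k25.py
3. Find that it wraps DeepseekV2 → follow the kimi_k25 (wrapper) pattern
4. Copy the files, apply the P1+P2 patches, create the config bridge
5. Register, validate, test, benchmark, serve + request
6. Run E2E verification against the upstream GT
Result: kimi_k25 works fully in the plugin; all 11 steps pass

Example 2: Re-run after upstream update

User says: Re-migrate qwen3_5, upstream has been updated
Actions:
1. Idempotent re-run: overwrite the existing files with a fresh upstream copy
2. Re-apply patches, re-validate, re-test
3. Re-run E2E to confirm there are no regressions
Result: qwen3_5 is updated to the latest upstream version with no regressions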

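Troubleshooting

General principle: When any runtime error occurs, first compare vLLM upstream code against both the plugin adaptation and the installed 0.13.0 environment. The diff is the fastest path to root cause. See operational-rules.md § Debugging Priority: Upstream-First for the full protocol.

| Problem | Typical Cause | Fix |
|---|---|---|
| `ImportError` after copy-then-patch | Missing P1 fix (relative→absolute imports) | Verify that every `from .xxx` import is converted to `from vllm.*` or `from vllm_fl.*` |
| `AttributeError: module vllm has no attribute X` | API doesn't exist in 0.13.0 | Check P3 in compatibility-patches.md; stub or remove |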
