Ark Video Storyboard

Turn a scene idea into a structured video plan, then optionally execute it with the Ark video generation API.

This skill is confirmation-first:

- First generate the storyboard and prompts
- Let the user review and revise them
- Generate video only after explicit user approval

Workflow

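1. Receive the scene description: the user describes the video scene (e.g., "going to a cyberpunk internet café to play games after work").
2. Ask about reference images: after the user describes the scene, proactively ask: "Do you have any reference images?" (images serve as style or character references)
3. Confirm the reference image's role: if the user provides a reference image, ask: "Is this image a background/style reference or a character appearance reference?"
   - Background/style reference: the visual baseline for environment, color palette, and atmosphere
   - Character appearance reference: the baseline for the protagonist's looks, outfit, and movement
4. Confirm the character description: if the video has multiple segments and no character reference image, proactively ask: "What does the main character in this video look like?" (e.g., "East Asian man, short black hair, white T-shirt"). Once collected, keep it exactly identical in every segment prompt.
5. Generate the script: expand the scene into a richer overall script and split it into multiple coherent segments.
6. Output the storyboard: each segment includes the reference image's purpose, the character description (identical across segments), visual description, lighting state, continuity notes, and an English AI prompt (with the reference-image style description and the consistent character description).
7. User confirmation: present the storyboard and ask: "Is this the script/prompt you want?"
8. Revise: if the user wants adjustments (style, pacing, camera language, character details, prompt wording), revise and present it again.
9. Execution confirmation: once the user confirms, ask: "Shall I start generating the video?"
10. Submit to the API: only after the user explicitly says "OK" or "start generating", submit to the Ark API, poll each segment's result, and download the videos.
11. Merge and send: once all segments are downloaded, merge them into one complete video with ffmpeg, check the file size (Feishu's limit is about 20MB), compress if necessary, and send it to the user via Feishu.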

Video Merge and Send Process

After all segments are downloaded, merge them and send the result to the user with the following steps:

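Step 1: Locate the segment directory

Video segments downloaded from the Ark API are saved by default under ~/.openclaw/media/{timestamp}/, one directory per timestamp. Confirm the directory exists and contains the segments:

```bash
ls ~/.openclaw/media/{timestamp}/seg*.mp4
```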

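Step 2: Merge the videos

1. Create the segment list file

```bash
cd ~/.openclaw/media/{timestamp}/
# One "file 'segN.mp4'" line per segment, in numeric order (so seg10 follows seg9)
N=$(ls seg*.mp4 | wc -l)
for i in $(seq 1 "$N"); do echo "file 'seg$i.mp4'"; done > concat.txt
```

The seg numbers must run from 1 to the segment count, with one line per segment.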

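2. Run the merge

```bash
ffmpeg -f concat -safe 0 -i concat.txt -c copy merged.mp4
```

3. Verify

```bash
ls -lh merged.mp4
ffprobe -v quiet -print_format json -show_format merged.mp4
```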

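Step 3: Check the size and compress if needed

Feishu's direct-send limit is about 20MB:

- ≤20MB: use `merged.mp4` directly
- >20MB: compress it before sending

```bash
ffmpeg -i merged.mp4 \
  -c:v libx264 \
  -crf 28 \
  -c:a aac \
  -b:a 128k \
  -y merged_compressed.mp4
```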
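The size decision in step 3 can be wrapped in a small helper so the right file path is always handed to the send step. This is a minimal sketch, assuming "about 20MB" is treated as 20 × 1024 × 1024 bytes; `pick_send_file` and `LIMIT_BYTES` are hypothetical names. It reuses the libx264 / CRF 28 / AAC 128k settings of the step 3 compression command and prints the path to send.

```shell
# Hypothetical helper: print the path of the file to send from a segment directory,
# compressing merged.mp4 first when it exceeds the (assumed) 20 MiB limit.
LIMIT_BYTES=$((20 * 1024 * 1024))

pick_send_file() {
  local dir="$1" merged size
  merged="$dir/merged.mp4"
  size=$(wc -c < "$merged") || return 1   # byte count; fails if the file is missing
  size=$((size))                           # strip padding some wc versions add
  if [ "$size" -le "$LIMIT_BYTES" ]; then
    echo "$merged"
    return 0
  fi
  # Same settings as the step 3 compression command
  ffmpeg -loglevel error -i "$merged" -c:v libx264 -crf 28 -c:a aac -b:a 128k \
    -y "$dir/merged_compressed.mp4" || return 1
  echo "$dir/merged_compressed.mp4"
}
```

Calling `pick_send_file` with the timestamped media directory yields either `merged.mp4` or `merged_compressed.mp4`, which can then be used as the `filePath` in step 4.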

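Step 4: Send to Feishu

Send the file with the message tool:

- filePath: `~/.openclaw/media/{timestamp}/merged_compressed.mp4` (or `merged.mp4` if no compression was needed)
- channel: `feishu`
- message: tell the user the video has been merged, how many segments it contains, and its total duration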

Step 5: Update the workflow log

Record this run in ~/.openclaw/workspace/WORKFLOW.md (timestamp, segment count, output file path, file size).
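Step 5 lists the fields to record but not a format. A minimal sketch, assuming one Markdown table row per run is acceptable; `log_workflow_run` and the `WORKFLOW_FILE` override variable are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: append one record per run to WORKFLOW.md.
# WORKFLOW_FILE can override the target (e.g. for testing); defaults to the documented path.
log_workflow_run() {
  local timestamp="$1" segment_count="$2" output_path="$3" file size
  file="${WORKFLOW_FILE:-$HOME/.openclaw/workspace/WORKFLOW.md}"
  size=$(du -h "$output_path" | cut -f1) || return 1   # human-readable size, e.g. 18M
  mkdir -p "$(dirname "$file")"
  printf '| %s | %s segments | %s | %s |\n' \
    "$timestamp" "$segment_count" "$output_path" "$size" >> "$file"
}
```

For example, `log_workflow_run "$TIMESTAMP" 6 "$send_path"` appends a single row with the timestamp, segment count, output path, and size.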

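Character Consistency Rules (Critical)

If the video has multiple segments and the user has not provided a character reference image:

- Proactively ask for the character description in workflow step 4
- Keep the character description exactly the same in every segment prompt (the wording for appearance, hairstyle, outfit, etc. must match word for word)
- Example character description: `East Asian young man, black short hair, white T-shirt, 25 years old`

If the user provided a character reference image, write the same phrase in every prompt: `consistent with the character in reference image`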
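Because the character description must match word for word, a mechanical check before presenting the storyboard can catch accidental drift. A minimal sketch, assuming each segment prompt is stored in its own text file (a layout this document does not specify); `check_character_consistency` is a hypothetical name.

```shell
# Hypothetical check: every prompt file must contain the exact character description.
# Usage: check_character_consistency "<description>" prompt1.txt prompt2.txt ...
check_character_consistency() {
  local description="$1" status=0 f
  shift
  for f in "$@"; do
    # -F matches a fixed string, so commas and periods are compared literally
    if ! grep -qF -- "$description" "$f"; then
      echo "character description missing or altered in: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

A reworded prompt such as "short black hair" instead of "black short hair" fails the check, which is exactly the kind of drift the rule forbids.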

Interaction Phases

Phase 1: Script / Prompt Confirmation

- User gives the scene, style, references, and goal.
- If images are provided, first confirm whether each one is a background/environment reference or a character/subject reference.
- If there are multiple segments and no character reference image, ask for a consistent character description.
- Generate the storyboard, segment plan, and English prompts first.
- Ask the user whether this version is correct.
- If the user asks to tweak tone, pacing, camera language, subject details, prompt wording, or image-role interpretation, revise and show the updated version again.
- Do not call the Ark API in this phase unless the user explicitly asks for direct generation.

Phase 2: Execution Confirmation

- After the user confirms the script/prompts, ask whether to start generation, unless they have already said so explicitly.
- Run the API submission / polling / download flow only after explicit approval.
- If submission fails, immediately report the exact stage and error.

Input Requirements

Collect as many of these as possible before writing prompts:

- Reference image(s) (proactively ask whether the user has any)
- Scene description
- Subject or product
- Target style (cinematic, cozy, commercial, dreamy, realistic, etc.)
- Intended use (ad, social clip, atmosphere film, storytelling, product demo)
- Constraints such as camera language, pacing, lighting, or ending mood
- Total duration target
- Segment count target
- Consistent character description (required when there are multiple segments and no character reference image)

If inputs are incomplete, still proceed with reasonable defaults and clearly state the assumptions.

Hard Rules

- Default all human characters to East Asian unless the user explicitly specifies otherwise.
- All segments must belong to the same video, not unrelated clips.
- Maintain continuity for character appearance, wardrobe, environment, props, lighting logic, and emotional progression.
- Write the planning fields in Chinese unless the user requests another language.
- Write the final generation prompts in English unless the user explicitly wants Chinese prompts.
- Prefer cinematic, visual, action-oriented prompts over abstract descriptions.
- Never silently retry failed API submissions in the background; always tell the user.

Segment Output Format

Follow the schema in references/storyboard-schema.md.

At minimum include:

- Segment index
- Duration in seconds
- Character description (same across all segments)
- Visual description
- Lighting state
- Continuity notes
- English AI prompt (includes the consistent character description and reference style)

Prompt Construction Rules

When writing a segment prompt, include the details that matter most for video generation:

- Subject identity and appearance (the consistent character description, identical in every segment)
- Camera angle or shot type
- Motion or action
- Scene and environment
- Lighting and mood
- Pacing or motion quality
- Style words, only when they improve consistency
- Reference image style description, if the user provided one (e.g., "in cyberpunk neon city style per reference image")
- For all segments of the same video, the character description must be IDENTICAL
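The field order in these rules (and in the Example Shape section: character description, shot, action, lighting, mood, with the reference style appended last) can be made explicit with a small assembler. A minimal sketch; `build_segment_prompt` is a hypothetical name, and joining the parts with ", " is an assumption rather than a documented prompt format.

```shell
# Hypothetical assembler: join prompt parts in a fixed order, skipping empty ones.
# Order: character, shot, action, scene, lighting/mood, pacing, reference style (last).
build_segment_prompt() {
  local character="$1" shot="$2" action="$3" scene="$4" lighting="$5" pacing="$6" ref_style="$7"
  local out="" part
  for part in "$character" "$shot" "$action" "$scene" "$lighting" "$pacing" "$ref_style"; do
    [ -z "$part" ] && continue
    if [ -z "$out" ]; then out="$part"; else out="$out, $part"; fi
  done
  printf '%s\n' "$out"
}
```

Passing the same character string to every call keeps that part of the prompt identical across segments by construction.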

Sequence Design Rules

Use this narrative rhythm by default unless the user asks for a different structure:

- Segment 1: establish subject, place, and mood
- Segment 2: deepen action or environment interaction
- Segment 3: push to a visual/emotional peak or transition
- Final segment: resolve, land, or fade out with a clear ending image

For more than 4 segments, insert additional deepen / transition beats while preserving continuity.
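One way to apply this default rhythm to any segment count is sketched below. This is an interpretation, not a rule from the document: segment 1 establishes, the last segment resolves, the second-to-last is the peak when there are at least 4 segments, and the remaining middle segments alternate between deepen and transition. `beat_for_segment` is a hypothetical name.

```shell
# Hypothetical mapping from a 1-based segment index to a narrative beat.
beat_for_segment() {
  local i="$1" n="$2"
  if [ "$i" -eq 1 ]; then
    echo "establish"
  elif [ "$i" -eq "$n" ]; then
    echo "resolve"
  elif [ "$n" -ge 4 ] && [ "$i" -eq $((n - 1)) ]; then
    echo "peak"
  elif [ $((i % 2)) -eq 0 ]; then
    echo "deepen"
  else
    echo "transition"
  fi
}
```

For 4 segments this reproduces the default (establish, deepen, peak, resolve); for 6 it yields establish, deepen, transition, deepen, peak, resolve.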

Duration / Segment Logic

This skill should support dynamic segment splitting.

Examples:

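- 60 seconds ÷ 4 segments = 15 seconds each (note: 15 seconds exceeds the Seedance 1.5 Pro maximum of 12 seconds stated below, so this split would be rejected)
- 60 seconds ÷ 6 segments = 10 seconds each
- 60 seconds ÷ 12 segments = 5 seconds each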

Current validated Seedance 1.5 Pro rule from user-confirmed testing:

- duration must be an integer in the range [4, 12]

So before execution:

1. Compute `duration = total_duration_seconds / segment_count`
2. Ensure the result is an integer
3. Ensure the result is within 4 to 12
4. If not, stop and explain the issue to the user before submitting
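The pre-execution check can be expressed as one function that either prints the per-segment duration or explains why the split is invalid. A minimal sketch; `segment_duration` is a hypothetical name, and the range [4, 12] is the user-confirmed Seedance 1.5 Pro rule stated above.

```shell
# Hypothetical check: compute total / count and enforce an integer in [4, 12].
# Prints the duration on success; prints the reason to stderr and fails otherwise.
segment_duration() {
  local total="$1" count="$2" d
  if [ "$count" -le 0 ]; then
    echo "segment count must be positive" >&2; return 1
  fi
  if [ $((total % count)) -ne 0 ]; then
    echo "${total}s / ${count} segments is not a whole number of seconds" >&2; return 1
  fi
  d=$((total / count))
  if [ "$d" -lt 4 ] || [ "$d" -gt 12 ]; then
    echo "${d}s per segment is outside the allowed range [4, 12]" >&2; return 1
  fi
  echo "$d"
}
```

For example, `segment_duration 60 6` prints 10, while `segment_duration 60 7` fails because 60 is not divisible by 7; the stderr message is what should be relayed to the user before anything is submitted.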

API Execution

API key loading order for actual generation:

1. An explicit wrapper argument, if one is added later
2. Environment variable `ARK_API_KEY`
3. `~/.openclaw/openclaw.json` → `skills.entries.ark-video-storyboard.apiKey`
4. Backward-compatible old format: INLINECODE17
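The loading order can be sketched as a shell function. This is a minimal sketch, assuming `jq` is installed for reading the JSON config; `resolve_ark_api_key` and the `OPENCLAW_CONFIG` override variable are hypothetical, and the backward-compatible old format is not handled because its shape is not given here.

```shell
# Hypothetical resolver following the documented order; prints the key or fails.
resolve_ark_api_key() {
  local explicit="$1" config key
  config="${OPENCLAW_CONFIG:-$HOME/.openclaw/openclaw.json}"
  # 1. Explicit argument
  if [ -n "$explicit" ]; then echo "$explicit"; return 0; fi
  # 2. Environment variable
  if [ -n "${ARK_API_KEY:-}" ]; then echo "$ARK_API_KEY"; return 0; fi
  # 3. Config file (requires jq); the entry name contains hyphens, so bracket syntax is used
  if [ -f "$config" ] && command -v jq >/dev/null 2>&1; then
    key=$(jq -r '.skills.entries["ark-video-storyboard"].apiKey // empty' "$config")
    if [ -n "$key" ]; then echo "$key"; return 0; fi
  fi
  # 4. Backward-compatible old format: not implemented (format not specified here)
  echo "no Ark API key found" >&2
  return 1
}
```

Keeping the lookup in one place makes it easy to report which source was missing when generation cannot start.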

If the user wants actual generation, read references/api.md and use the scripts:

- scripts/build_storyboard.py to assemble structured segment data
- INLINECODE20 to submit segments sequentially, poll each task, collect video_url, and optionally download the videos
- INLINECODE22 to submit one segment at a time
- INLINECODE23 to query a task once and extract INLINECODE24
- INLINECODE25 to poll until completion and return INLINECODE26
- INLINECODE27 to download a finished video_url to local storage

Submit segments sequentially, not in parallel, unless the user explicitly asks otherwise.
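The poll-until-completion step can be sketched as a bounded loop around a single status query. This is a minimal sketch: `query_task_status` is a hypothetical stand-in for whichever script queries a task once, and the status strings `succeeded`, `failed`, and `cancelled` are assumptions about the Ark task API that should be checked against references/api.md. `poll_task` is likewise a hypothetical name.

```shell
# Hypothetical polling loop. query_task_status must be defined by the caller and
# print one status word for the given task id (assumed: queued, running,
# succeeded, failed, cancelled).
poll_task() {
  local task_id="$1" interval="${2:-10}" max_attempts="${3:-60}" attempt status
  for attempt in $(seq 1 "$max_attempts"); do
    status=$(query_task_status "$task_id") || { echo "query failed for $task_id" >&2; return 1; }
    case "$status" in
      succeeded) return 0 ;;
      failed|cancelled) echo "task $task_id ended with status: $status" >&2; return 1 ;;
    esac
    sleep "$interval"
  done
  echo "task $task_id still not finished after $max_attempts checks" >&2
  return 1
}
```

Bounding the number of checks means a stuck task surfaces as an explicit error instead of an endless silent wait.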

Current Known Ark Payload Requirements

Current known request requirements include:

- INLINECODE29
- INLINECODE30 (first item is the text prompt)
- INLINECODE31
- INLINECODE32
- INLINECODE33
- Reference image: add `{"type": "image_url", "image_url": {"url": "<data_uri or url>"}}` to the content array
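For a local reference image, the `data_uri` form can be produced with `base64`. A minimal sketch; `image_content_item` is a hypothetical name, and the caller passes the MIME type (for example `image/png`) because this document does not say which image formats Ark accepts.

```shell
# Hypothetical helper: print one content-array item embedding a local image as a data URI.
image_content_item() {
  local path="$1" mime="$2" b64
  [ -r "$path" ] || { echo "cannot read $path" >&2; return 1; }
  # GNU base64 wraps long output; strip newlines to get a single-line URI
  b64=$(base64 < "$path" | tr -d '\n')
  printf '{"type": "image_url", "image_url": {"url": "data:%s;base64,%s"}}\n' "$mime" "$b64"
}
```

The printed object can be appended after the text-prompt item in the content array.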

Current validated model in this workspace:

- INLINECODE35

Error Handling Rule

If API submission fails, returns any model / parameter / schema error, or returns no valid task_id:

- Stop immediately
- Tell the user the exact failing segment and stage
- Show the key error message
- Explain which parameter or payload assumption most likely caused it
- Do not pretend generation is still running
- Do not continue to later segments

If API submission succeeds and returns a valid task_id:

- Continue to the next segment by default, without notifying or interrupting the user after each successful submission
- After all segments are submitted successfully, send one consolidated update saying that all tasks are in the Ark queue and generation is underway
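The submission rules (stop at the first failure, stay silent on individual successes, send one consolidated update at the end) can be sketched as a loop. `submit_segment` is a hypothetical stand-in for the single-segment submit script and is assumed to print the task_id on success; `submit_all_segments` is likewise a hypothetical name.

```shell
# Hypothetical sequential submitter. submit_segment <index> must print a task_id
# on success; a non-zero exit or empty output counts as a failed submission.
submit_all_segments() {
  local count="$1" i task_id ids=""
  for i in $(seq 1 "$count"); do
    if ! task_id=$(submit_segment "$i") || [ -z "$task_id" ]; then
      # Stop immediately, name the failing segment, and do not submit later ones
      echo "segment $i: submission failed (no valid task_id); later segments not submitted" >&2
      return 1
    fi
    ids="${ids:+$ids }$task_id"   # collected silently; no per-segment notification
  done
  # One consolidated update once every segment is queued
  echo "All $count segments submitted to the Ark queue: $ids"
}
```

Any error text that `submit_segment` writes to stderr passes through unchanged, so the key error message can be shown to the user as the rule requires.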

Example Shape

A good segment should look like this:

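- Character description: East Asian man, short black hair, white T-shirt (identical in all segments)
- Reference image purpose: background/style reference (cozy bedroom, city night view outside the window, warm lighting)
- Visual description: describe the subject, action, composition, and environmental changes
- Lighting state: specify brightness, key light, rim light, and atmosphere changes
- AI prompt: character description + shot + action + lighting + mood, with the reference-image style description appended at the end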

See references/examples.md for a concrete sleeping-scene example.

When To Read References

- Read references/storyboard-schema.md before generating structured segments.
- Read references/prompt-rules.md when you need guardrails for prompt quality or continuity.
- Read references/api.md before building or submitting API payloads.
- Read references/examples.md when the user wants output that matches the example style.
