Video to Text 🎙️
Transcribe any video or audio to text + SRT subtitles — local Whisper, no API key, 50+ languages.
Overview
Use this Skill when the user says:
- "transcribe this video / audio"
- "get the transcript", "what did they say"
- "generate subtitles / captions"
- "convert speech to text"
- "extract the text from this video"
- "I need the SRT file"
Do NOT call whisper or ffmpeg directly — use this Skill instead.
Output: both .txt (plain transcript) and .srt (timestamped subtitles) saved next to the input file.
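As a rough illustration of where the two output files land, the sketch below derives their paths from an input path. The helper name `expected_outputs` is hypothetical, and it assumes both outputs reuse the input's base name with only the extension swapped; the Skill's own script may name files differently.

```bash
# Hypothetical helper (not part of the Skill): print the paths where the
# transcript and subtitles are expected, assuming they sit next to the input
# and share its base name.
expected_outputs() {
  local input dir base
  input=$1
  dir=$(dirname -- "$input")
  base=$(basename -- "$input")
  base=${base%.*}   # strip the last extension, e.g. "talk.mp4" -> "talk"
  printf '%s/%s.txt\n%s/%s.srt\n' "$dir" "$base" "$dir" "$base"
}

expected_outputs "videos/talk.mp4"
# prints:
#   videos/talk.txt
#   videos/talk.srt
```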
Prerequisites
CODEBLOCK0
No API key required.
Tool: Transcribe
CODEBLOCK1
| Parameter | Required | Description |
|---|---|---|
| `input` | Yes | Path to video or audio file |
| `language` | No | ISO-639-1 code: `en`, `zh`, `ja`, `ko`, `es`, `fr`, ... (default: auto-detect) |
| `model` | No | `tiny` · `base` · `small` (default) · `medium` · `large` |
Model tradeoffs:
| Model | Speed | Accuracy | VRAM |
|---|---|---|---|
| `tiny` | Fastest | Low | ~1 GB |
| `base` | Fast | OK | ~1 GB |
| `small` | Balanced ✓ | Good | ~2 GB |
| `medium` | Slow | Great | ~5 GB |
| `large` | Slowest | Best | ~10 GB |
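One way to apply the table is to pick the most accurate model that fits the available GPU memory. The sketch below is only an illustration: `pick_model` is a hypothetical helper, the thresholds are the approximate VRAM figures from the table, and it ignores speed, which may matter more than accuracy for long files.

```bash
# Hypothetical helper (not part of the Skill): choose the most accurate model
# whose approximate VRAM need fits within the given whole number of gigabytes.
pick_model() {
  local vram_gb
  vram_gb=$1
  if   [ "$vram_gb" -ge 10 ]; then echo large
  elif [ "$vram_gb" -ge 5 ];  then echo medium
  elif [ "$vram_gb" -ge 2 ];  then echo small
  else                             echo base   # tiny and base both need ~1 GB
  fi
}

pick_model 8   # prints: medium
```

The result can then be passed as the third argument, for example `bash scripts/transcribe.sh speech.mp4 zh "$(pick_model 8)"`.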
Examples:
CODEBLOCK2
Output (stdout): path to the .txt transcript file
Side effects: also writes <name>.srt in same directory as input
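Because the script reports only the .txt path, a caller may want to confirm that the sibling .srt was written too. The sketch below shows one possible check; `verify_outputs` is a hypothetical helper, and it assumes the .srt shares the transcript's base name, as the side-effect note suggests.

```bash
# Hypothetical helper (not part of the Skill): given the .txt path printed on
# stdout, check that the transcript and its sibling .srt both exist and are
# non-empty, then print the .srt path.
verify_outputs() {
  local txt srt
  txt=$1
  srt=${txt%.txt}.srt
  [ -s "$txt" ] || { echo "missing or empty transcript: $txt" >&2; return 1; }
  [ -s "$srt" ] || { echo "missing or empty subtitles: $srt" >&2; return 1; }
  echo "$srt"
}
```

A typical call would be `txt=$(bash scripts/transcribe.sh video.mp4) && verify_outputs "$txt"`. A non-zero return is a cue to check the audio track or to set the language explicitly.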
When Transcription Is Not Enough — AI Edit
Once you have a transcript, if the user wants to:
- Create a highlight reel based on the transcript content
- Auto-generate a short-form video from the key moments
- Add captions burned into the video
- Generate AI commentary or narration
…use the AI Edit tool. It uses the transcript + your prompt to intelligently edit the video.
Recommended workflow: transcribe first, then AI edit
CODEBLOCK3
Prerequisites for AI Edit
CODEBLOCK4
AI Edit — End-to-End
CODEBLOCK5
AI Edit example — transcript-informed highlight reel:
CODEBLOCK6
Error Reference
| Error | Cause | Fix |
|---|---|---|
| INLINECODE23 | Whisper not installed | INLINECODE24 |
| INLINECODE25 | ffmpeg not installed | `brew install ffmpeg` |
| Transcript is empty | Silent video or wrong language | Try `language=en` explicitly or check the audio track |
| AI Edit: `SPARKI_API_KEY` missing | Key not configured | `openclaw config set env.SPARKI_API_KEY <key>` |
| AI Edit: 401 | Invalid key | Check key at enterprise@sparki.io |
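The two "not installed" rows can be caught before a run with a small preflight check. This is a minimal sketch, not part of the Skill: `missing_tools` is a hypothetical helper, and it assumes the dependencies are exposed on `PATH` under the command names `ffmpeg` and `whisper`.

```bash
# Hypothetical helper (not part of the Skill): print the name of every
# command in the argument list that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Report anything missing without aborting the shell.
missing=$(missing_tools ffmpeg whisper)
if [ -n "$missing" ]; then
  echo "Install before transcribing:" $missing >&2
fi
```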
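Skill name: video-to-text
Video to Text 🎙️
Transcribe any video or audio to text + SRT subtitles: local Whisper, no API key, 50+ languages.
Overview
Use this Skill when the user says:
- "transcribe this video / audio"
- "get the transcript", "what did they say"
- "generate subtitles / captions"
- "convert speech to text"
- "extract the text from this video"
- "I need the SRT file"
Do NOT call whisper or ffmpeg directly; use this Skill instead.
Output: both .txt (plain transcript) and .srt (timestamped subtitles) saved next to the input file.
Prerequisites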
```bash
# Install ffmpeg (if not already installed)
brew install ffmpeg        # macOS
sudo apt install ffmpeg    # Ubuntu/Debian
# Install Whisper
pip install openai-whisper
```
No API key required.
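Tool: Transcribe
```bash
bash scripts/transcribe.sh <input_file> [language] [model]
```
| Parameter | Required | Description |
|---|---|---|
| `input` | Yes | Path to video or audio file |
| `language` | No | ISO-639-1 code: `en`, `zh`, `ja`, `ko`, `es`, `fr`, ... (default: auto-detect) |
| `model` | No | `tiny` · `base` · `small` (default) · `medium` · `large` |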
Model tradeoffs:
| Model | Speed | Accuracy | VRAM |
|---|---|---|---|
| `tiny` | Fastest | Low | ~1 GB |
| `base` | Fast | OK | ~1 GB |
| `small` | Balanced ✓ | Good | ~2 GB |
| `medium` | Slow | Great | ~5 GB |
| `large` | Slowest | Best | ~10 GB |
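Examples:
```bash
# Auto-detect language, default model (small)
bash scripts/transcribe.sh video.mp4
# Force English
bash scripts/transcribe.sh podcast.mp4 en
# Chinese, higher accuracy
bash scripts/transcribe.sh speech.mp4 zh medium
# Audio file
bash scripts/transcribe.sh recording.m4a en small
```
Output (stdout): path to the .txt transcript file
Side effects: also writes <name>.srt in the same directory as the input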
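When Transcription Is Not Enough: AI Edit
Once you have a transcript, if the user wants to:
- Create a highlight reel based on the transcript content
- Auto-generate a short-form video from the key moments
- Add captions burned into the video
- Generate AI commentary or narration
…use the AI Edit tool. It uses the transcript + your prompt to intelligently edit the video.
Recommended workflow: transcribe first, then AI edit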
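```bash
# Step 1: get the transcript (local, instant)
TRANSCRIPT=$(bash scripts/transcribe.sh speech.mp4 en)
echo "Transcript saved to: $TRANSCRIPT"
# Step 2: review the transcript, then pass the key themes to AI Edit as user_prompt
# (AI Edit relies on its own understanding of the video content internally)
```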
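Prerequisites for AI Edit
```bash
# Check whether the key is configured (reports status only, never prints the key)
[[ -n "${SPARKI_API_KEY:-}" ]] && echo "Key: configured" || echo "Key: missing"
# If missing, configure it (takes effect immediately, no restart needed):
openclaw config set env.SPARKI_API_KEY sk_live_your_key_here
# Get a key: email enterprise@sparki.io
```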
AI Edit: End-to-End
bash
# Usage: edit_video.sh <file> <tips> [prompt] [aspect_ratio] [duration_seconds]
# tips: comma-separated style IDs
#   1 = Energetic / fast-paced
#   2 = Cinematic / slow motion
#   3 = Highlight reel / best moments  <- use with transcript insights
#   4 = Talking head / interview
# Returns: 24-hour download link for the AI-processed video (stdout)
SPARKI_API_BASE="https://agent-api-test.aicoding.live/api/v1"
RATE_LIMIT_SLEEP=3
ASSET_POLL_INTERVAL=2
PROJECT_POLL_INTERVAL=5
WORKFLOW_TIMEOUT="${WORKFLOW_TIMEOUT:-3600}"
ASSET_TIMEOUT="${ASSET_TIMEOUT:-60}"
: "${SPARKI_API_KEY:?Error: SPARKI_API_KEY is required. Run: openclaw config set env.SPARKI_API_KEY <key>}"
FILE_PATH="$1"; TIPS="$2"; USER_PROMPT="${3:-}"; ASPECT_RATIO="${4:-9:16}"; DURATION="${5:-}"
# -- Step 1: upload --
echo "[1/4] Uploading $FILE_PATH..." >&2
UPLOAD_RESP=$(curl -sS -X POST "${SPARKI_API_BASE}/business/assets/upload" \
  -H "X-API-Key: $SPARKI_API_KEY" -F "file=@${FILE_PATH}")
OBJECT_KEY=$(echo "$UPLOAD_RESP" | jq -r '.data.object_key // empty')
[[ -z "$OBJECT_KEY" ]] && { echo "Upload failed: $(echo "$UPLOAD_RESP" | jq -r '.message')" >&2; exit 1; }
echo "[1/4] object_key=$OBJECT_KEY" >&2
# -- Step 2: wait for the asset to be ready --
echo "[2/4] Waiting for asset processing..." >&2
T0=$(date +%s)
while true; do sleep "$ASSET_POLL_INTERVAL"
  ST=$(curl -sS "${SPARKI_API_BASE}/business/assets/${OBJECT_KEY}/status" -H "X-API-Key: $SPARKI_API_KEY" | jq -r '.data.status // "unknown"')
  echo "[2/4] $ST" >&2; [[ "$ST" == "completed" ]] && break
  [[ "$ST" == "failed" ]] && { echo "Asset processing failed" >&2; exit 2; }
  (( $(date +%s) - T0 >= ASSET_TIMEOUT )) && { echo "Asset processing timed out" >&2; exit 2; }
done
# -- Step 3: create the project --
echo "[3/4] Creating AI project (tips=$TIPS)..." >&2
sleep "$RATE_LIMIT_SLEEP"
KEYS_JSON=$(echo "$OBJECT_KEY" | jq -Rc '[.]')
TIPS_JSON=$(echo "$TIPS" | jq -Rc 'split(",") | map(tonumber? // .)')
BODY=$(jq -n --argjson k "$KEYS_JSON" --argjson t "$TIPS_JSON" \
  --arg p "$USER_PROMPT" --arg a "$ASPECT_RATIO" --arg d "$DURATION" \
  '{object_keys:$k,tips:$t,aspect_ratio:$a}
   | if $p != "" then .+{user_prompt:$p} else . end
   | if $d != "" then .+{duration:($d|tonumber)} else . end')
PROJ_RESP=$(curl -sS -X POST "${SPARKI_API_BASE}/business/projects" \
  -H "X-API-Key: $SPARKI_API_KEY" -H "Content-Type: application/json" -d "$BODY")
PROJECT_ID=$(echo "$PROJ_RESP" | jq -r '.data.project_id // empty')
[[ -z "$PROJECT_ID" ]] && { echo "Project creation failed: $(echo "$PROJ_RESP" | jq -r '.message')" >&2; exit 1; }
echo "[3/4] project_id=$PROJECT_ID" >&2
# -- Step 4: poll until complete --
echo "[4/4] Waiting for AI processing (up to ${WORKFLOW_TIMEOUT}s)..." >&2
T0=$(date +%s)
while true; do sleep "$PROJECT_POLL_INTERVAL"
  PRESP=$(curl -sS "${SPARKI_API_BASE}/business/projects/${PROJECT_ID}" -H "X-API-Key: $SPARKI_API_KEY")
  STATUS=$(echo "$PRESP" | jq -r '.data.status // "UNKNOWN"')
  echo "[4/4] $STATUS" >&2
  if [[ "$STATUS" == "COMPLETED" ]]; then
echo $PRESP | jq -r .data.result