Voice Translate
Use this skill for two closely related modes:
1. Chat-native mode: the user sends audio or a voice note in OpenClaw; return transcript text, translation text, and translated audio.
2. Local pipeline mode: run a deterministic file-based pipeline that writes transcript, translation, wav, and metadata artifacts.
Default to an LLM-assisted translation workflow: let the current agent produce the translation, then either save it to a file (local pipeline mode) or return it directly in the surrounding agent turn (chat-native mode).
Workflow
A. Chat-native mode
Use this when an inbound message already contains an audio transcript from OpenClaw media understanding, or when the user asks you to process a voice message conversationally.
1. Detect that the user sent audio or that the request is for voice translation.
2. Obtain or confirm the transcript text.
3. Translate with the current model.
4. Send the transcript text to the user.
5. Send the translated text to the user.
6. Synthesize the translated text as audio:
   - prefer the OpenClaw `tts` tool when you need an immediate chat reply with audio
   - prefer Piper when you need a local wav artifact
7. Keep the output order stable: transcript first, translation second, audio last.
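The fixed ordering in the steps above can be sketched in Python. This is a minimal illustration, not OpenClaw's API: `ChatReply`, `build_chat_replies`, and the `translate` / `synthesize` callables are hypothetical names standing in for the current model and for the `tts` tool or Piper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChatReply:
    kind: str     # "transcript", "translation", or "audio"
    payload: str  # text, or a path/handle to the synthesized audio


def build_chat_replies(
    transcript: str,
    translate: Callable[[str], str],   # hypothetical: wraps the current model
    synthesize: Callable[[str], str],  # hypothetical: wraps tts or Piper
) -> List[ChatReply]:
    """Return replies in the fixed order: transcript, translation, audio."""
    translation = translate(transcript)
    audio_ref = synthesize(translation)
    return [
        ChatReply("transcript", transcript),
        ChatReply("translation", translation),
        ChatReply("audio", audio_ref),
    ]


if __name__ == "__main__":
    # Stand-in callables; a real run would call the model and a TTS backend.
    for reply in build_chat_replies(
        "你好",
        translate=lambda text: "Hello",
        synthesize=lambda text: "/tmp/translation.wav",
    ):
        print(reply.kind, reply.payload)
```

Building the whole list before sending anything keeps the user-visible order independent of how long translation or synthesis takes.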
B. Local pipeline mode
1. Confirm input/output expectations: source language, target language, output directory, and whether the run should be real or mock.
2. Choose backends:
   - `faster-whisper` for real transcription, `mock` for pipeline testing.
   - `llm` as the default translation path when an agent/model is available.
   - `service` only when unattended HTTP translation is preferable.
   - `manual` only as a fallback.
   - `piper` for real TTS, `mock` for dry-run testing.
3. Run transcription.
4. If using the default `llm` path, read the transcript and translate it with the current model. Save the translated text to a file.
5. Run synthesis/output writing with `--translation-file`.
6. Inspect outputs:
   - `01_transcript.txt`
   - `02_translation.txt`
   - `03_translation.wav`
   - `result.json`
7. If the user wants chat updates during processing, pass notifier commands with `--transcript-command`, `--translation-command`, and `--audio-command`.
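The "inspect outputs" step can be automated with a small check. The sketch below is an assumption about how one might verify a run, not part of the skill's scripts: `missing_artifacts` and `load_metadata` are hypothetical helpers, and only the four file names come from this document. The structure of `result.json` is not specified here, so the helper returns whatever JSON it contains.

```python
import json
from pathlib import Path
from typing import List

# File names taken from the output list above.
EXPECTED_ARTIFACTS = (
    "01_transcript.txt",
    "02_translation.txt",
    "03_translation.wav",
    "result.json",
)


def missing_artifacts(output_dir: str) -> List[str]:
    """Return the expected artifact names that are not present as files."""
    root = Path(output_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]


def load_metadata(output_dir: str) -> object:
    """Parse result.json without assuming any particular keys."""
    path = Path(output_dir) / "result.json"
    return json.loads(path.read_text(encoding="utf-8"))
```

An empty list from `missing_artifacts("./outputs/mock-run")` would indicate that the file contract was met for that run.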
Preferred execution patterns
Default LLM-assisted path
Use this when the agent handling the task can translate the transcript itself.
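1. Run the pipeline once transcription is available, or run the full command after preparing `translation.txt`.
2. Save the model-produced translation to a file.
3. Invoke:

```bash
bash scripts/run_voice_translate_llm.sh \
  /path/to/input.m4a \
  ./outputs/llm-run \
  zh \
  en \
  /path/to/en_US-lessac-medium.onnx \
  ./translation.txt \
  --whisper-model small \
  --transcribe-backend faster-whisper \
  --tts-backend piper
```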
Read references/llm-translation-pattern.md when you need the exact orchestration pattern or a reusable translation prompt.
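The save-then-invoke steps above can also be driven from Python. This is a sketch under stated assumptions: `save_translation` and `build_command` are hypothetical helpers, the flag set mirrors the mock validation command in this document, and whether the same combination of flags suits every backend is not confirmed here.

```python
import shlex
from pathlib import Path
from typing import List


def save_translation(text: str, path: str) -> Path:
    """Write the model-produced translation as UTF-8 for --translation-file."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text.strip() + "\n", encoding="utf-8")
    return target


def build_command(
    input_path: str,
    output_dir: str,
    source_lang: str,
    target_lang: str,
    translation_file: str,
    piper_model: str,
    transcribe_backend: str = "mock",
    tts_backend: str = "mock",
) -> List[str]:
    """Assemble an argv list using only flags that appear in this document."""
    return [
        "python3", "scripts/run_voice_translate.py",
        "--input", input_path,
        "--output-dir", output_dir,
        "--source-lang", source_lang,
        "--target-lang", target_lang,
        "--transcribe-backend", transcribe_backend,
        "--translation-file", translation_file,
        "--translation-backend", "llm",
        "--no-interactive-translate",
        "--tts-backend", tts_backend,
        "--piper-model", piper_model,
    ]


if __name__ == "__main__":
    saved = save_translation("Hello, world.", "./translated.txt")
    argv = build_command(
        "references/examples/mock-input.txt", "./outputs/mock-run",
        "zh", "en", str(saved), "./dummy.onnx",
    )
    print(shlex.join(argv))  # inspect before passing to subprocess.run
```

Returning an argv list rather than a shell string avoids quoting problems when paths contain spaces.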
Mock end-to-end validation
Use this first when you need to validate the pipeline structure without model/runtime dependencies.
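```bash
python3 scripts/run_voice_translate.py \
  --input references/examples/mock-input.txt \
  --output-dir ./outputs/mock-run \
  --source-lang zh \
  --target-lang en \
  --transcribe-backend mock \
  --translation-file ./translated.txt \
  --translation-backend llm \
  --no-interactive-translate \
  --tts-backend mock \
  --piper-model ./dummy.onnx
```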
Notes:
- `mock` transcription reads plain text from the input file.
- `mock` TTS writes a silent wav file.
- `--piper-model` is still required by the current CLI shape even when using mock TTS; use any placeholder path.
- `llm` mode currently means the translation must already exist in `--translation-file`.
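To make the "silent wav" note concrete, here is one way such a file could be produced with Python's standard `wave` module. This is an illustrative sketch, not the skill's actual mock TTS code: `write_silent_wav`, the one-second duration, and the 16 kHz mono 16-bit format are all assumptions.

```python
import wave


def write_silent_wav(path: str, seconds: float = 1.0, sample_rate: int = 16000) -> int:
    """Write mono 16-bit PCM silence and return the number of frames written."""
    frame_count = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * frame_count)  # all-zero samples
    return frame_count
```

A file like this keeps downstream steps (artifact checks, audio notifiers) exercisable without a real TTS model.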
Service fallback
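```bash
python3 scripts/run_voice_translate.py \
  --input /path/to/input.m4a \
  --output-dir ./outputs/service-run \
  --source-lang zh \
  --target-lang en \
  --whisper-model small \
  --transcribe-backend faster-whisper \
  --translation-backend service \
  --translation-service-url http://127.0.0.1:8000/translate \
  --tts-backend piper \
  --piper-model /path/to/en_US-lessac-medium.onnx
```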
Resources
scripts/
- `run_voice_translate.py`: primary entrypoint.
- `run_voice_translate_llm.sh`: thin wrapper for the default LLM-assisted path.
- `voice_translate_app/`: pipeline modules.
- `send_text.py`: wrap stage text and forward it via a shell command.
- `send_audio.py`: forward generated audio via a shell command.
- `mock_text_sender.py`, `mock_audio_sender.py`: local smoke-test helpers.
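The "wrap stage text and forward it via a shell command" idea can be sketched as follows. This is not the code in the text-sender script: `wrap_stage_text`, `forward_text`, the `[stage]` header format, and passing the text on standard input are all assumptions made for illustration, and the example assumes a POSIX shell.

```python
import subprocess


def wrap_stage_text(stage: str, text: str) -> str:
    """Prefix the text with a stage label (hypothetical format)."""
    return f"[{stage}]\n{text}"


def forward_text(stage: str, text: str, command: str) -> int:
    """Run the shell command with the wrapped text on stdin; return its exit code."""
    completed = subprocess.run(
        command,
        shell=True,   # the notifier is given as a shell command string
        input=wrap_stage_text(stage, text),
        text=True,
        check=False,  # let the caller decide how to treat failures
    )
    return completed.returncode


if __name__ == "__main__":
    # `cat` simply echoes the wrapped text; a real notifier would post it to chat.
    forward_text("transcript", "你好", "cat")
```

Returning the exit code instead of raising lets the pipeline continue writing artifacts even if a chat notification fails.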
references/
- Read `references/runtime-notes.md` for dependency/setup details, backend behavior, and integration constraints.
- Read `references/llm-translation-pattern.md` when the surrounding agent should perform translation with its own model.
- Read `references/openclaw-chat-mode.md` when implementing or following the conversational flow: receive voice, output transcript text, output translation text, then output translated audio.
Editing guidance
- Keep `SKILL.md` procedural and short.
- Put environment- or backend-specific detail in references.
- Treat `llm` as the preferred translation path for agent-driven workflows.
- In chat-native mode, preserve the user-visible ordering: transcript text, translation text, then audio.
- Prefer OpenClaw `tts` for immediate conversational audio replies; prefer Piper for local wav artifacts and offline pipelines.
- If the user wants tighter OpenClaw integration, add an attachment-aware outer workflow or hook instead of rewriting ASR/TTS first.
- Preserve the current file contract unless the user asks to change it: transcript, translation, wav, metadata JSON.