Faster Whisper Local Service

Provision a local STT backend used by voice skills.

What this sets up

- Python venv for faster-whisper
- `transcribe-server.py` HTTP endpoint at `http://127.0.0.1:18790/transcribe`
- systemd user service: `openclaw-transcribe.service`

Important: Model download on first run

On first startup, faster-whisper downloads model weights from Hugging Face (~1.5 GB for medium). This requires internet access and disk space. After the initial download, models are cached locally and the service runs fully offline.

| Model | Download size | RAM usage |
|---|---|---|
| tiny | ~75 MB | ~400 MB |
| base | ~150 MB | ~500 MB |
| small | ~500 MB | ~800 MB |
| medium | ~1.5 GB | ~1.4 GB |
| large-v3 | ~3.0 GB | ~3.5 GB |

To pre-download models for use in an air-gapped environment, see the faster-whisper docs.
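As a rough aid for choosing a model, the sketch below picks the largest listed model that fits a given RAM and disk budget. The figures are the approximate ones from the table above; the function name is hypothetical and not part of this skill.

```python
# Approximate figures copied from the table above: (model, download MB, RAM MB).
MODELS = [
    ("tiny", 75, 400),
    ("base", 150, 500),
    ("small", 500, 800),
    ("medium", 1500, 1400),
    ("large-v3", 3000, 3500),
]


def largest_model_that_fits(ram_budget_mb, disk_budget_mb):
    """Return the name of the largest listed model within both budgets, or None."""
    best = None
    for name, download_mb, ram_mb in MODELS:  # ordered smallest to largest
        if ram_mb <= ram_budget_mb and download_mb <= disk_budget_mb:
            best = name
    return best
```

For example, `largest_model_that_fits(2000, 5000)` selects `medium`, because `large-v3` needs roughly 3.5 GB of RAM.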

Security notes

Network isolation

- Binds to `127.0.0.1` only, so it is not reachable from the network.
- CORS restricted to a single origin (`https://127.0.0.1:8443` by default).
- No credentials, API keys, or secrets are used or stored.
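The two network controls above can be illustrated with a minimal standard-library sketch. This is not the code in `transcribe-server.py`; the handler, helper names, and port default are assumptions chosen to show a loopback-only bind and an exact-match CORS check.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

BIND_HOST = "127.0.0.1"                    # loopback only
ALLOWED_ORIGIN = "https://127.0.0.1:8443"  # default origin from the list above


def cors_headers(request_origin, allowed_origin=ALLOWED_ORIGIN):
    """Return CORS headers only when the request origin matches exactly."""
    if request_origin == allowed_origin:
        return {"Access-Control-Allow-Origin": allowed_origin, "Vary": "Origin"}
    return {}


class Handler(BaseHTTPRequestHandler):
    def do_OPTIONS(self):
        # Answer the CORS preflight; non-matching origins get no CORS headers.
        self.send_response(204)
        for key, value in cors_headers(self.headers.get("Origin")).items():
            self.send_header(key, value)
        self.end_headers()


def make_server(port=18790):
    # Binding to 127.0.0.1 (not 0.0.0.0) keeps the socket off external interfaces.
    return HTTPServer((BIND_HOST, port), Handler)
```

A browser page served from any other origin receives no `Access-Control-Allow-Origin` header, so its cross-origin request fails on the client side.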

Input validation

- Upload size limit: Requests exceeding the configured limit are rejected before processing (HTTP 413). Default: 50 MB, configurable via `MAX_UPLOAD_MB`.
- Magic-byte check: Only files with recognized audio signatures (WAV, OGG, FLAC, MP3, WebM, M4A) are accepted. Unrecognized formats are rejected (HTTP 415) before reaching GStreamer.
- Subprocess safety: All arguments to `gst-launch-1.0` are passed as a list, so no shell is involved and shell expansion or injection is not possible.
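The size limit and magic-byte check can be sketched as below. The signatures are the commonly documented ones for each container; the exact checks in the real server may differ, and the function names are hypothetical. The sketch treats 1 MB as 1024 × 1024 bytes, which is an assumption.

```python
MAX_UPLOAD_MB = 50  # default stated above


def sniff_audio_format(data: bytes):
    """Return a format name based on leading magic bytes, or None."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"OggS":
        return "ogg"
    if data[:4] == b"fLaC":
        return "flac"
    if data[:4] == b"\x1a\x45\xdf\xa3":  # EBML header used by WebM/Matroska
        return "webm"
    if data[4:8] == b"ftyp":             # ISO base media file (M4A/MP4)
        return "m4a"
    if data[:3] == b"ID3":               # MP3 with an ID3v2 tag
        return "mp3"
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"                     # bare MPEG audio frame sync
    return None


def validate_upload(content_length: int, head: bytes, max_mb: int = MAX_UPLOAD_MB) -> int:
    """Return the HTTP status the request should get: 413, 415, or 200."""
    if content_length > max_mb * 1024 * 1024:
        return 413  # rejected on the declared size, before reading the body
    if sniff_audio_format(head) is None:
        return 415  # not a recognized audio signature
    return 200
```

Checking the size first means an oversized request is refused without inspecting its content, and only payloads that pass both checks would be handed to the decoder.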

GStreamer dependency

The service uses GStreamer's decodebin for audio format conversion. Like any media library, GStreamer's parsers process binary data and should be kept up to date. Mitigation: install gst-launch-1.0 from your OS vendor's trusted packages and apply security updates regularly. The magic-byte pre-filter above reduces the attack surface by rejecting non-audio payloads before they reach GStreamer.

No data exfiltration

- No outbound network calls (after initial model download).
- No telemetry, analytics, or phone-home behavior.
- Temporary files are created in a per-request `TemporaryDirectory` and cleaned up immediately.
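A minimal sketch of how a per-request `TemporaryDirectory` and a list-based `gst-launch-1.0` call might fit together. The pipeline elements are standard GStreamer ones, but the 16 kHz mono output, the 60-second timeout, the file names, and the function names are assumptions for illustration, not the skill's actual pipeline.

```python
import subprocess
import tempfile
from pathlib import Path


def build_gst_args(src: Path, dst: Path) -> list:
    """Build the gst-launch-1.0 argument vector as a list (no shell string)."""
    return [
        "gst-launch-1.0", "-q",
        "filesrc", f"location={src}", "!",
        "decodebin", "!",
        "audioconvert", "!",
        "audioresample", "!",
        "audio/x-raw,rate=16000,channels=1", "!",  # assumed target format
        "wavenc", "!",
        "filesink", f"location={dst}",
    ]


def convert_to_wav(data: bytes, runner=subprocess.run) -> bytes:
    """Decode uploaded audio to WAV inside a per-request temporary directory."""
    # The directory and both files are deleted when the with-block exits.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.bin"  # fixed name: nothing user-controlled in the path
        dst = Path(tmp) / "output.wav"
        src.write_bytes(data)
        # shell=False (the default) passes each list item as one argv entry.
        runner(build_gst_args(src, dst), check=True, timeout=60)
        return dst.read_bytes()
```

The `runner` parameter exists only so the function can be exercised without GStreamer installed; in normal use the default `subprocess.run` is called.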

Reproducibility defaults

- Pinned package: `faster-whisper==1.1.1` (override via env)
- Explicit dependency check for `gst-launch-1.0`
- CORS restricted to one origin by default
- Configurable workspace/service paths (no hardcoded user path)
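The dependency check and the pinned version with an environment override might look roughly like this. `FASTER_WHISPER_VERSION` is a hypothetical variable name used only for this sketch; the source does not name the actual override variable.

```python
import shutil


def require_binary(name: str) -> str:
    """Return the absolute path of `name` on PATH, or raise with a clear message."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"required dependency not found on PATH: {name}")
    return path


def pip_requirement(env) -> str:
    """Build the pinned pip requirement, allowing an override from the environment."""
    # FASTER_WHISPER_VERSION is a hypothetical variable name for this sketch.
    return "faster-whisper==" + env.get("FASTER_WHISPER_VERSION", "1.1.1")
```

Calling `require_binary("gst-launch-1.0")` before starting the service turns a missing GStreamer install into an immediate, readable error instead of a failure on the first request.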

Deploy

```bash
bash scripts/deploy.sh
```

With custom settings:

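```bash
WORKSPACE=~/.openclaw/workspace \
TRANSCRIBE_PORT=18790 \
WHISPER_MODEL_SIZE=medium \
WHISPER_LANGUAGE=auto \
TRANSCRIBE_ALLOWED_ORIGIN=https://10.0.0.42:8443 \
bash scripts/deploy.sh
```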

Language setting

Default: `auto` (auto-detect language). Set `WHISPER_LANGUAGE=de` for German only, `en` for English only, and so on. A fixed language is faster and more accurate when you only transcribe one language.
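One way a server could read these settings is sketched below. The variable names are inferred from the deploy settings and the defaults from this document; the `Settings` class and `load_settings` function are hypothetical. Mapping `auto` to `None` matches how faster-whisper's `transcribe` treats `language=None` as auto-detection.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    port: int
    model_size: str
    language: Optional[str]  # None means auto-detect
    allowed_origin: str
    max_upload_mb: int


def load_settings(env=os.environ) -> Settings:
    """Read configuration from environment variables, falling back to defaults."""
    lang = env.get("WHISPER_LANGUAGE", "auto").strip().lower()
    return Settings(
        port=int(env.get("TRANSCRIBE_PORT", "18790")),
        model_size=env.get("WHISPER_MODEL_SIZE", "medium"),
        language=None if lang in ("", "auto") else lang,
        allowed_origin=env.get("TRANSCRIBE_ALLOWED_ORIGIN", "https://127.0.0.1:8443"),
        max_upload_mb=int(env.get("MAX_UPLOAD_MB", "50")),
    )
```

With `WHISPER_LANGUAGE=de`, `load_settings().language` is `"de"`, which could then be passed as the `language` argument when transcribing.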

The deploy script is idempotent: it is safe to run repeatedly.

What this skill modifies

| What | Path | Action |
|---|---|---|
| Python venv | `$WORKSPACE/.venv-faster-whisper/` | Creates venv, installs faster-whisper via pip |
| Transcribe server | | |

Uninstall

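```bash
systemctl --user stop openclaw-transcribe.service
systemctl --user disable openclaw-transcribe.service
rm -f ~/.config/systemd/user/openclaw-transcribe.service
systemctl --user daemon-reload
```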

Optional full cleanup:

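```bash
rm -rf ~/.openclaw/workspace/.venv-faster-whisper
rm -f ~/.openclaw/workspace/voice-input/transcribe-server.py
```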

Verify

```bash
bash scripts/status.sh
```

Expected:

- service is `active`
- endpoint responds (HTTP 200 or 500 is acceptable for an invalid sample payload)
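The endpoint check can also be done by hand from Python. This is a sketch and not what `scripts/status.sh` does; the function names are hypothetical, and the empty POST body stands in for an invalid sample payload.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0):
    """POST an empty body and return the HTTP status, or None if unreachable."""
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, just not with a 2xx status
    except (urllib.error.URLError, OSError):
        return None      # nothing listening, or the connection failed


def endpoint_is_up(status) -> bool:
    # Per the expectation above, 200 and 500 both show the server is responding.
    return status in (200, 500)
```

For example, `endpoint_is_up(probe("http://127.0.0.1:18790/transcribe"))` returns `False` when nothing is listening on the port.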

Notes

- This skill provides backend transcription only.
- Pair with webchat-voice-proxy for browser mic + HTTPS/WSS integration.
- For one-step install, use webchat-voice-full-stack (deploys backend + proxy in order).
