MiMo Voice Assistant

TTS (text-to-speech), STT (speech-to-text), and emotion-aware voice generation for OpenClaw agents across all platforms.

Architecture

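    User voice → OpenClaw (Telegram/Discord/WhatsApp/...)
      → STT (MiMo-V2-Omni transcription)
      → Agent processing
      → TTS (MiMo-V2-TTS with emotion + language)
      → Voice reply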

Before Install

⚠️ This skill sends text/audio to Xiaomi's MiMo API (api.xiaomimimo.com) for TTS/STT processing. Ensure you trust this service and have a valid MIMO_API_KEY. If you need higher security, consider deploying the proxy in an isolated environment (Docker/container) and rotating your API key regularly.

Quick Start

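    # 1. Install dependencies
    cd mimo-tts-proxy && npm install

    # 2. Set the API key
    export MIMO_API_KEY=your-key-here

    # 3. Start the proxy
    node src/server.mjs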

OpenClaw config (openclaw.json):
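    {
      "messages": {
        "tts": {
          "auto": "inbound",
          "provider": "openai",
          "baseUrl": "http://127.0.0.1:3999",
          "maxTextLength": 4000
        }
      }
    }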

Emotion Detection

See references/emotion-detection.md

Multi-Platform

See references/platforms.md

API Endpoints

Endpoint	Method	Description
/health	GET	Health check
/v1/models

Request format:
    {"model": "tts-1", "input": "Hello", "voice": "mimo_default", "response_format": "mp3"}

Formats: wav (default), mp3 (needs ffmpeg), opus (needs ffmpeg)
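As an illustration, the request could be assembled as in the sketch below. The route `/v1/audio/speech` and the field names `voice` and `response_format` are assumptions borrowed from the OpenAI-style speech API that the `openai` provider setting suggests; the helper name `buildSpeechRequest` is hypothetical. Check the proxy source for the actual route before relying on it.

```javascript
// Sketch: build (but do not send) a TTS request for the local proxy.
// ASSUMPTION: the speech route is /v1/audio/speech; this document does not name it.
const SPEECH_PATH = "/v1/audio/speech"; // hypothetical route
const SUPPORTED_FORMATS = ["wav", "mp3", "opus"]; // mp3 and opus need ffmpeg on the proxy host

function buildSpeechRequest(text, options = {}) {
  const { baseUrl = "http://127.0.0.1:3999", format = "wav", apiKey } = options;
  if (!SUPPORTED_FORMATS.includes(format)) {
    throw new Error(`Unsupported format: ${format}`);
  }
  const headers = { "Content-Type": "application/json" };
  if (apiKey) {
    headers.Authorization = `Bearer ${apiKey}`; // optional; the proxy can also read MIMO_API_KEY
  }
  return {
    url: new URL(SPEECH_PATH, baseUrl).toString(),
    init: {
      method: "POST",
      headers,
      // Field names are assumed to follow the OpenAI-style speech request.
      body: JSON.stringify({
        model: "tts-1",
        input: text,
        voice: "mimo_default",
        response_format: format,
      }),
    },
  };
}
```

The returned pair can be passed straight to `fetch(url, init)` on Node 18 or later; keeping construction separate from sending makes the request easy to inspect or log first.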

Multi-Language Support

CRITICAL: TTS output must match the user's language automatically.

Language Detection

Detect the user's language from their message and respond in the same language for both text and voice.

User sends	Agent text reply	TTS voice output
"你好，帮我查一下天气"	Reply in Chinese	Chinese speech
"What's the weather?"	Reply in English	English speech

How It Works

1. The agent detects the language from the user's message (the first message or the latest one)
2. The agent replies in that language (text)
3. TTS speaks in that language; MiMo-V2-TTS supports Chinese, English, Japanese, Korean, and more
4. No explicit instruction is needed; this behavior is automatic
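The detection step can be approximated with a script-based heuristic, sketched below. This is a stand-in for illustration only, not the detector the agent or MiMo actually uses; the function name `detectLanguage` is hypothetical, and anything that is not recognisably Japanese, Korean, or Chinese falls back to English.

```javascript
// Sketch: guess a language code from the Unicode scripts present in a message.
// Heuristic only: Japanese written entirely in kanji would be reported as "zh".
function detectLanguage(text) {
  if (/[\u3040-\u30ff]/.test(text)) return "ja"; // hiragana or katakana
  if (/[\uac00-\ud7af]/.test(text)) return "ko"; // hangul syllables
  if (/[\u4e00-\u9fff]/.test(text)) return "zh"; // CJK ideographs with no kana
  return "en"; // fallback for Latin script and anything unrecognised
}

detectLanguage("你好，帮我查一下天气"); // "zh"
detectLanguage("What's the weather?"); // "en"
```

Kana is checked before ideographs because Japanese text usually mixes both, while Chinese text contains no kana.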

When to Override

Only switch language if the user explicitly asks:

- "请用英语回答" ("Please answer in English") → Switch to English
- "Speak in Japanese" → Switch to Japanese
- Otherwise, always match the user's language
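One way to picture the override rule is a small lookup that runs before the detected language is used. The phrase patterns below are illustrative assumptions, not an exhaustive or official list, and `resolveReplyLanguage` is a hypothetical helper name.

```javascript
// Sketch: honor an explicit language request, otherwise keep the detected language.
// The patterns are examples only; a real agent would likely rely on the model itself.
const OVERRIDE_PATTERNS = [
  { lang: "en", re: /请用英语|\b(?:speak|reply|answer|respond) in english\b/i },
  { lang: "ja", re: /请用日语|\b(?:speak|reply|answer|respond) in japanese\b/i },
  { lang: "ko", re: /请用韩语|\b(?:speak|reply|answer|respond) in korean\b/i },
  { lang: "zh", re: /请用中文|\b(?:speak|reply|answer|respond) in chinese\b/i },
];

function resolveReplyLanguage(message, detectedLang) {
  for (const { lang, re } of OVERRIDE_PATTERNS) {
    if (re.test(message)) return lang; // explicit request wins
  }
  return detectedLang; // default: match the user's language
}

resolveReplyLanguage("请用英语回答", "zh"); // "en"
resolveReplyLanguage("Speak in Japanese", "en"); // "ja"
```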

TTS Language Compatibility

MiMo-V2-TTS supports natural speech in:

- ✅ Chinese (Mandarin)
- ✅ English (US/UK)
- ✅ Japanese
- ✅ Korean
- ✅ Other languages (quality varies)

Implementation

In your response, you can use [lang:xx] hints for the TTS proxy (optional):

    [lang:zh]你好，这是你的语音回复。
    [lang:en]Hello, here is your voice reply.
    [lang:ja]こんにちは、音声返信です。

Or simply reply normally — the TTS proxy will automatically handle the language based on the text content.
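A leading `[lang:xx]` hint could be separated from the spoken text as sketched below. How the proxy really parses the hint is not described here, so treat the helper `splitLangHint` and its two-to-three-letter code pattern as assumptions.

```javascript
// Sketch: split an optional leading [lang:xx] hint from the text to be spoken.
// ASSUMPTION: the hint appears only at the very start and uses a 2-3 letter code.
function splitLangHint(reply) {
  const match = /^\[lang:([a-z]{2,3})\]\s*/i.exec(reply);
  if (!match) {
    return { lang: null, text: reply }; // no hint: let the proxy infer the language
  }
  return { lang: match[1].toLowerCase(), text: reply.slice(match[0].length) };
}

splitLangHint("[lang:en]Hello, here is your voice reply.");
// { lang: "en", text: "Hello, here is your voice reply." }
```

Returning `lang: null` when no hint is present keeps the automatic, content-based path as the default.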

Security & Data Flow

- API key: passed via the MIMO_API_KEY environment variable or an Authorization: Bearer header, never hardcoded
- Network: the proxy connects only to api.xiaomimimo.com (Xiaomi's official API); text and base64-encoded audio are sent there for TTS/STT processing
- Local binding: the proxy binds to 127.0.0.1:3999 (localhost only, not exposed externally)
- Temp files: cleaned up automatically after each request
- User responsibility: if you use systemd/launchd for persistence, store API keys securely (an env file or secret manager, not inline in service files)
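The key-handling rule above might look like the following sketch. Giving the request header precedence over the environment variable is an assumption, since the order is not specified here, and `resolveApiKey` is a hypothetical helper, not the proxy's actual code.

```javascript
// Sketch: resolve the MiMo API key without ever hardcoding it.
// ASSUMPTION: a Bearer header on the request takes precedence over MIMO_API_KEY.
function resolveApiKey(headers = {}, env = process.env) {
  const auth = headers.authorization ?? headers.Authorization ?? "";
  const match = /^Bearer\s+(\S+)$/i.exec(auth.trim());
  if (match) return match[1]; // key supplied with the request
  if (env.MIMO_API_KEY) return env.MIMO_API_KEY; // key supplied by the environment
  throw new Error("No API key: set MIMO_API_KEY or send an Authorization: Bearer header");
}
```

Failing loudly when neither source is present avoids forwarding unauthenticated requests to api.xiaomimimo.com.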

MiMo Voice Assistant

TTS (text-to-speech), STT (speech-to-text), and emotion-aware voice generation for OpenClaw agents on all platforms.

Architecture

    User voice → OpenClaw (Telegram/Discord/WhatsApp/...)
      → STT (MiMo-V2-Omni transcription)
      → Agent processing
      → TTS (MiMo-V2-TTS with emotion + language)
      → Voice reply

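Before Install

⚠️ This skill sends text/audio to Xiaomi's MiMo API (api.xiaomimimo.com) for TTS/STT processing. Ensure you trust this service and have a valid MIMO_API_KEY. If you need higher security, consider deploying the proxy in an isolated environment (Docker/container) and rotating your API key regularly.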

Quick Start

    # 1. Install dependencies
    cd mimo-tts-proxy && npm install

    # 2. Set the API key
    export MIMO_API_KEY=your-key-here

    # 3. Start the proxy
    node src/server.mjs

OpenClaw config (openclaw.json):

    {
      "messages": {
        "tts": {
          "auto": "inbound",
          "provider": "openai",
          "baseUrl": "http://127.0.0.1:3999",
          "maxTextLength": 4000
        }
      }
    }

Emotion Detection

See references/emotion-detection.md

Multi-Platform

See references/platforms.md

API Endpoints

Endpoint	Method	Description
/health	GET	Health check
/v1/models

Request format:

    {"model": "tts-1", "input": "Hello", "voice": "mimo_default", "response_format": "mp3"}

Formats: wav (default), mp3 (needs ffmpeg), opus (needs ffmpeg)

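Multi-Language Support

CRITICAL: TTS output must match the user's language automatically.

Language Detection

Detect the user's language from their message and respond in the same language for both text and voice.

User sends	Agent text reply	TTS voice output
"你好，帮我查一下天气"	Reply in Chinese	Chinese speech
"What's the weather?"	Reply in English	English speech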

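How It Works

1. The agent detects the language from the user's message (the first message or the latest one)
2. The agent replies in that language (text)
3. TTS speaks in that language; MiMo-V2-TTS supports Chinese, English, Japanese, Korean, and more
4. No explicit instruction is needed; this behavior is automatic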

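When to Override

Only switch language if the user explicitly asks:

- "请用英语回答" ("Please answer in English") → Switch to English
- "Speak in Japanese" → Switch to Japanese
- Otherwise, always match the user's language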

TTS Language Compatibility

MiMo-V2-TTS supports natural speech in:

- ✅ Chinese (Mandarin)
- ✅ English (US/UK)
- ✅ Japanese
- ✅ Korean
- ✅ Other languages (quality varies)

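Implementation

In your response, you can use [lang:xx] hints for the TTS proxy (optional):

    [lang:zh]你好，这是你的语音回复。
    [lang:en]Hello, here is your voice reply.
    [lang:ja]こんにちは、音声返信です。

Or simply reply normally; the TTS proxy will handle the language automatically based on the text content.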

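Security & Data Flow

- API key: passed via the MIMO_API_KEY environment variable or an Authorization: Bearer header, never hardcoded
- Network: the proxy connects only to api.xiaomimimo.com (Xiaomi's official API); text and base64-encoded audio are sent there for TTS/STT processing
- Local binding: the proxy binds to 127.0.0.1:3999 (localhost only, not exposed externally)
- Temp files: cleaned up automatically after each request
- User responsibility: if you use systemd/launchd for persistence, store API keys securely (an env file or secret manager, not inline in service files)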
