SenseAudio Voice Cloner

Guide users through platform-side voice cloning, then generate personalized TTS with the resulting cloned voice_id.

What This Skill Does

- Explain the official SenseAudio voice-cloning workflow
- Validate whether a sample is likely suitable for cloning
- Help users manage cloned voice slots and voice_id values
- Generate TTS with a cloned voice through the official TTS API
- Apply optional pronunciation dictionary control for cloned voices

Credential and Dependency Rules

- Read the API key from SENSEAUDIO_API_KEY.
- Send auth only as Authorization: Bearer <API_KEY>.
- Do not place API keys in query parameters, logs, or saved examples.
- If Python helpers are used, this skill expects python3, requests, and pydub.
- pydub is only needed for optional local audio validation.
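
The credential rules above can be sketched as a small helper. This is a minimal illustration, not part of the official SDK: the function names build_auth_headers and redact are hypothetical, and the "Bearer ***" masking format is an assumption chosen for this example.

```python
import os


def build_auth_headers(env=None):
    """Build request headers from SENSEAUDIO_API_KEY (hypothetical helper)."""
    env = os.environ if env is None else env
    api_key = env.get("SENSEAUDIO_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("SENSEAUDIO_API_KEY is not set")
    # The key travels only in the Authorization header, never in the URL.
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def redact(headers):
    """Return a copy of the headers that is safe to log (masking is an assumption)."""
    safe = dict(headers)
    if "Authorization" in safe:
        safe["Authorization"] = "Bearer ***"
    return safe
```

Logging redact(headers) instead of the raw headers keeps the key out of logs and saved examples while leaving the real headers untouched for the request.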

Official Voice-Cloning Constraints

Use the official SenseAudio platform voice-cloning rules summarized below:

- Cloning itself is platform-side only; there is no direct public API to create a cloned voice.
- Users must first clone on the platform, then retrieve the resulting voice_id for API use.
- Sample requirements for platform cloning:
  - duration: 3-30 seconds
  - size: <=50MB
  - format: MP3, WAV, or AAC
  - recording environment: quiet and echo-free
- Cloning consumes a voice slot on the user's plan.
- Deleting unused cloned voices frees slots.
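
Before a user uploads a sample, the documented limits can be pre-checked with the standard library alone. The sketch below is an assumption-laden illustration: precheck_sample and the issue labels are hypothetical names, 50MB is interpreted as 50 * 1024 * 1024 bytes, and duration is measured only for PCM WAV files because the standard library cannot decode MP3 or AAC.

```python
import os
import wave

MAX_BYTES = 50 * 1024 * 1024           # assumed reading of the documented <=50MB limit
ALLOWED_EXTENSIONS = (".mp3", ".wav", ".aac")
MIN_SECONDS, MAX_SECONDS = 3.0, 30.0   # documented 3-30 second window


def precheck_sample(path):
    """Return a list of issue labels for a candidate cloning sample."""
    issues = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        issues.append("unsupported_extension")
    if os.path.getsize(path) > MAX_BYTES:
        issues.append("file_too_large")
    if ext == ".wav":
        # Duration is only checked for PCM WAV; other formats need a decoder.
        with wave.open(path, "rb") as wav:
            seconds = wav.getnframes() / wav.getframerate()
        if not MIN_SECONDS <= seconds <= MAX_SECONDS:
            issues.append("duration_out_of_range")
    return issues
```

An empty list means no documented limit was violated; it says nothing about noise or echo, which still need a listening check.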

Official TTS Constraints for Cloned Voices

Use the official TTS API on /v1/t2a_v2 after the user already has a cloned voice_id:

- Standard TTS model: SenseAudio-TTS-1.0
- voice_setting.voice_id is required and may be a cloned voice ID
- Optional audio formats: mp3, wav, pcm, flac
- Optional sample rates: 8000, 16000, 22050, 24000, 32000, 44100
- Optional MP3 bitrates: 32000, 64000, 128000, 256000
- Optional channels: 1 or 2
- The optional pronunciation dictionary is only for cloned voices and requires model=SenseAudio-TTS-1.5
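
The allowed option values listed above can be enforced locally before a request is sent. This is a sketch only: build_audio_setting is a hypothetical helper, the field names follow the audio_setting object used in this skill's TTS helper, and sending bitrate only for mp3 is an assumption drawn from the "MP3 bitrates" wording.

```python
ALLOWED_FORMATS = {"mp3", "wav", "pcm", "flac"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100}
ALLOWED_BITRATES = {32000, 64000, 128000, 256000}
ALLOWED_CHANNELS = {1, 2}


def build_audio_setting(fmt="mp3", sample_rate=32000, bitrate=128000, channel=2):
    """Validate options locally and return an audio_setting dict."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if channel not in ALLOWED_CHANNELS:
        raise ValueError(f"unsupported channel count: {channel}")
    setting = {"format": fmt, "sample_rate": sample_rate, "channel": channel}
    if fmt == "mp3":
        # Assumption: bitrate applies only to MP3 output.
        if bitrate not in ALLOWED_BITRATES:
            raise ValueError(f"unsupported bitrate: {bitrate}")
        setting["bitrate"] = bitrate
    return setting
```

Failing fast on an unsupported value gives the user a clearer message than a rejected API call.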

Recommended Workflow

1. Confirm cloning status:

- If the user does not yet have a cloned voice, direct them to the platform cloning flow first.
- If they already have a cloned voice, ask for the voice_id.

2. Validate the source sample when helpful:

- Check duration, file type, and basic audio quality locally.
- Warn when the sample is noisy, reverberant, or outside the documented size/duration limits.

3. Generate TTS with the cloned voice:

- Use SenseAudio-TTS-1.0 for normal synthesis.
- Use SenseAudio-TTS-1.5 only when a pronunciation dictionary is needed.

4. Keep output safe and reproducible:

- Decode returned hex audio before writing files.
- Keep filenames deterministic and avoid logging secrets.
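
Step 4 can be sketched as follows. Both function names are hypothetical, and hashing the voice_id together with the text is just one possible naming scheme (an assumption of this example); it keeps names stable across runs without writing the raw voice_id into the filename.

```python
import binascii
import hashlib


def deterministic_filename(text, voice_id, ext="mp3"):
    """Derive a stable filename from the inputs (naming scheme is an assumption)."""
    digest = hashlib.sha256(f"{voice_id}\n{text}".encode("utf-8")).hexdigest()[:16]
    return f"tts_{digest}.{ext}"


def save_hex_audio(hex_audio, path):
    """Decode hex-encoded audio and write the raw bytes; return the byte count."""
    audio_bytes = binascii.unhexlify(hex_audio)
    with open(path, "wb") as f:
        f.write(audio_bytes)
    return len(audio_bytes)
```

The same text and voice_id always map to the same file, so reruns overwrite rather than accumulate output.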

Platform Guidance Helper

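```python
def guide_voice_cloning():
    """Return step-by-step instructions for cloning a voice on the platform."""
    return """
To clone a voice on the SenseAudio platform:

1. Open https://senseaudio.cn/platform/voice-clone
2. Prepare a clean voice sample:
   - Duration: 3-30 seconds
   - Format: MP3 / WAV / AAC
   - Size: 50MB or smaller
   - Environment: quiet, low echo, clear speech
3. Upload or record the sample on the platform
4. Wait for the platform to finish training
5. Copy the resulting voice_id from the voice list
6. Use that voice_id in subsequent TTS API calls
"""
```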

Minimal TTS Helper

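```python
import binascii
import os

import requests

API_KEY = os.environ["SENSEAUDIO_API_KEY"]
API_URL = "https://api.senseaudio.cn/v1/t2a_v2"


def generate_with_cloned_voice(text, voice_id, speed=1.0, vol=1.0, pitch=0):
    response = requests.post(
        API_URL,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "model": "SenseAudio-TTS-1.0",
            "text": text,
            "stream": False,
            "voice_setting": {
                "voice_id": voice_id,
                "speed": speed,
                "vol": vol,
                "pitch": pitch,
            },
            "audio_setting": {
                "format": "mp3",
                "sample_rate": 32000,
                "bitrate": 128000,
                "channel": 2,
            },
        },
        timeout=60,
    )
    response.raise_for_status()
    data = response.json()
    # The audio is returned hex-encoded; decode it to raw bytes.
    return binascii.unhexlify(data["data"]["audio"]), data.get("trace_id")
```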

Pronunciation Dictionary Pattern

Use this only for cloned voices that need explicit polyphone correction.

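```python
# Reuses API_KEY, API_URL, and requests from the minimal TTS helper.
def generate_with_dictionary(text, voice_id, dictionary):
    response = requests.post(
        API_URL,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "model": "SenseAudio-TTS-1.5",
            "text": text,
            "voice_setting": {"voice_id": voice_id},
            "dictionary": dictionary,
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()
```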

Dictionary items follow the official shape:

- original: source text span
- replacement: pronunciation override such as [hao4]干净
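
The model choice and dictionary shape can be combined in one payload builder. This is a hedged sketch: make_dictionary_item and build_tts_payload are hypothetical names, and passing the dictionary as a list of original/replacement objects is an assumption based on the item shape above.

```python
def make_dictionary_item(original, replacement):
    """Build one dictionary entry in the original/replacement shape."""
    if not original or not replacement:
        raise ValueError("original and replacement must be non-empty")
    return {"original": original, "replacement": replacement}


def build_tts_payload(text, voice_id, dictionary=None):
    """Pick the model from whether a pronunciation dictionary is supplied."""
    payload = {
        # 1.5 is reserved for dictionary use; 1.0 is the default.
        "model": "SenseAudio-TTS-1.5" if dictionary else "SenseAudio-TTS-1.0",
        "text": text,
        "voice_setting": {"voice_id": voice_id},
    }
    if dictionary:
        # Assumption: the API accepts a list of entries under "dictionary".
        payload["dictionary"] = list(dictionary)
    return payload
```

Deriving the model from the presence of a dictionary keeps callers from pairing a dictionary with SenseAudio-TTS-1.0 by mistake.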

Optional Local Validation

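```python
from pydub import AudioSegment


def validate_cloning_audio(audio_file):
    audio = AudioSegment.from_file(audio_file)
    issues = []

    # len(audio) is the duration in milliseconds.
    if not 3000 <= len(audio) <= 30000:
        issues.append("duration_out_of_range")
    if audio.frame_rate < 16000:
        issues.append("sample_rate_low")
    if audio.channels > 2:
        issues.append("too_many_channels")
    if not audio_file.lower().endswith((".mp3", ".wav", ".aac")):
        issues.append("unsupported_extension")

    return {
        "valid": not issues,
        "issues": issues,
        "duration_ms": len(audio),
        "sample_rate": audio.frame_rate,
        "channels": audio.channels,
    }
```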

Output Options

- MP3 or WAV audio synthesized with a cloned voice
- Markdown instructions for platform cloning and slot management
- JSON metadata containing voice_id labels and local descriptions
- Optional validation report for source samples
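
The JSON metadata output could look like the sketch below. The helper name build_voice_metadata and the record fields label and description are illustrative assumptions; only voice_id comes from the platform.

```python
import json


def build_voice_metadata(voices):
    """Serialize (voice_id, label, description) tuples as stable JSON."""
    records = [
        {"voice_id": voice_id, "label": label, "description": description}
        for voice_id, label, description in voices
    ]
    # Sort so repeated runs produce byte-identical output.
    records.sort(key=lambda record: record["voice_id"])
    return json.dumps({"voices": records}, ensure_ascii=False, indent=2, sort_keys=True)
```

The metadata holds only identifiers and local notes, never the API key.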

Safety Notes

- Do not claim that voice cloning can be initiated through the public API.
- Do not mix API_KEY and SENSEAUDIO_API_KEY; use SENSEAUDIO_API_KEY consistently.
- Use SenseAudio-TTS-1.0 by default; reserve SenseAudio-TTS-1.5 for cloned-voice dictionary use.
- Treat voice_id values as user-specific operational identifiers.

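SenseAudio Voice Cloner

Guide users through platform-side voice cloning, then use the resulting cloned voice_id to generate personalized TTS.

What This Skill Does

- Explain the official SenseAudio voice-cloning workflow
- Validate whether a sample is suitable for cloning
- Help users manage cloned voice slots and voice_id values
- Generate TTS with a cloned voice through the official TTS API
- Apply optional pronunciation dictionary control for cloned voices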

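Credential and Dependency Rules

- Read the API key from SENSEAUDIO_API_KEY.
- Send auth only as Authorization: Bearer <API_KEY>.
- Do not place API keys in query parameters, logs, or saved examples.
- If Python helpers are used, this skill needs python3, requests, and pydub.
- pydub is only used for optional local audio validation.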

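Official Voice-Cloning Constraints

Use the official SenseAudio platform voice-cloning rules summarized below:

- Cloning itself happens only on the platform; there is no public API that creates a cloned voice directly.
- Users must first clone on the platform, then retrieve the resulting voice_id for API use.
- Sample requirements for platform cloning:
  - duration: 3-30 seconds
  - size: <=50MB
  - format: MP3, WAV, or AAC
  - recording environment: quiet and echo-free
- Cloning consumes one voice slot on the user's plan.
- Deleting unused cloned voices frees slots.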

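Official TTS Constraints for Cloned Voices

Use the official TTS API on /v1/t2a_v2 after the user already has a cloned voice_id:

- Standard TTS model: SenseAudio-TTS-1.0
- voice_setting.voice_id is required and may be a cloned voice ID
- Optional audio formats: mp3, wav, pcm, flac
- Optional sample rates: 8000, 16000, 22050, 24000, 32000, 44100
- Optional MP3 bitrates: 32000, 64000, 128000, 256000
- Optional channels: 1 or 2
- The optional pronunciation dictionary is only for cloned voices and requires model=SenseAudio-TTS-1.5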

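Platform Guidance Helper

```python
def guide_voice_cloning():
    """Return step-by-step instructions for cloning a voice on the platform."""
    return """
To clone a voice on the SenseAudio platform:

1. Open https://senseaudio.cn/platform/voice-clone
2. Prepare a clean voice sample:
   - Duration: 3-30 seconds
   - Format: MP3 / WAV / AAC
   - Size: 50MB or smaller
   - Environment: quiet, low echo, clear speech
3. Upload or record the sample on the platform
4. Wait for the platform to finish training
5. Copy the resulting voice_id from the voice list
6. Use that voice_id in subsequent TTS API calls
"""
```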

Minimal TTS Helper

```python
import binascii
import os

import requests

API_KEY = os.environ["SENSEAUDIO_API_KEY"]
API_URL = "https://api.senseaudio.cn/v1/t2a_v2"


def generate_with_cloned_voice(text, voice_id, speed=1.0, vol=1.0, pitch=0):
    response = requests.post(
        API_URL,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "model": "SenseAudio-TTS-1.0",
            "text": text,
            "stream": False,
            "voice_setting": {
                "voice_id": voice_id,
                "speed": speed,
                "vol": vol,
                "pitch": pitch,
            },
            "audio_setting": {
                "format": "mp3",
                "sample_rate": 32000,
                "bitrate": 128000,
                "channel": 2,
            },
        },
        timeout=60,
    )
    response.raise_for_status()
    data = response.json()
    # The audio is returned hex-encoded; decode it to raw bytes.
    return binascii.unhexlify(data["data"]["audio"]), data.get("trace_id")
```

Pronunciation Dictionary Pattern

Use this only for cloned voices that need explicit polyphone correction.

```python
# Reuses API_KEY, API_URL, and requests from the minimal TTS helper.
def generate_with_dictionary(text, voice_id, dictionary):
    response = requests.post(
        API_URL,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "model": "SenseAudio-TTS-1.5",
            "text": text,
            "voice_setting": {"voice_id": voice_id},
            "dictionary": dictionary,
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()
```

Dictionary items follow the official shape:

- original: source text span
- replacement: pronunciation override such as [hao4]干净

Optional Local Validation

```python
from pydub import AudioSegment


def validate_cloning_audio(audio_file):
    audio = AudioSegment.from_file(audio_file)
    issues = []

    # len(audio) is the duration in milliseconds.
    if not 3000 <= len(audio) <= 30000:
        issues.append("duration_out_of_range")
    if audio.frame_rate < 16000:
        issues.append("sample_rate_low")
    if audio.channels > 2:
        issues.append("too_many_channels")
    if not audio_file.lower().endswith((".mp3", ".wav", ".aac")):
        issues.append("unsupported_extension")

    return {
        "valid": not issues,
        "issues": issues,
        "duration_ms": len(audio),
        "sample_rate": audio.frame_rate,
        "channels": audio.channels,
    }
```

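Output Options

- MP3 or WAV audio synthesized with a cloned voice
- Markdown instructions for platform cloning and slot management
- JSON metadata containing voice_id labels and local descriptions
- Optional validation report for source samples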

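Safety Notes

- Do not claim that voice cloning can be initiated through the public API.
- Do not mix API_KEY and SENSEAUDIO_API_KEY; use SENSEAUDIO_API_KEY consistently.
- Use SenseAudio-TTS-1.0 by default; reserve SenseAudio-TTS-1.5 for cloned-voice dictionary use.
- Treat voice_id values as user-specific operational identifiers.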
