AudioClaw Conversation Rehearsal
What this skill is for
This skill is for realistic conversation rehearsal in high-pressure situations:
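- Work reports and performance self-reviews
- Upward communication
- Performance review conversations
- Promotion defenses
- Conversations with a difficult boss or an overbearing colleague
- Formal conversations that call for desensitization practice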
It is designed to simulate the other person speaking back, not just generate a script.
Default stance
Use two voice modes:
- Proxy voice (proxy_voice)
  - Recommended default
  - Uses a role-appropriate system voice and behavior style
- Authorized clone
  - Use only when the voice sample is explicitly authorized for rehearsal or internal training
  - Best official path: clone on the AudioClaw platform first, then pass the prepared clone voice_id
  - A prepared clone voice_id commonly looks like vc-... and can be passed directly with --prepared-clone-voice-id
Do not default to cloning a real person's voice without clear permission.
Workflow
1. Define the rehearsal:
   - scenario
   - counterpart role
   - relationship
   - talk topic
   - desired outcome
   - fear triggers
   - difficulty
2. Run scripts/build_rehearsal_blueprint.py.
3. Decide voice mode:
   - proxy voice
   - authorized clone
4. Run the live loop in your agent stack:
   - counterpart turn via TTS
   - user spoken reply via ASR
   - if you want faster perceived intake, enable stream ASR
   - agent judges tone, structure, and progress
   - use scripts/build_counterpart_turn.py to generate the next counterpart reply
   - use scripts/senseaudio_counterpart_tts.py to synthesize that reply
   - official clone chain: prepare the clone on the AudioClaw platform first and pass the resulting voice_id
   - if that voice_id is a clone id like vc-..., counterpart TTS now auto-routes to SenseAudio-TTS-1.5
   - optional experimental path: if an authorized platform token is available, use scripts/senseaudio_clone_workspace.py to inspect clone slots or attempt a rehearsal-only clone from an authorized sample
   - if the user wants to actually hear the counterpart turns in Feishu or AudioClaw, use --send-feishu-audio or run scripts/send_rehearsal_counterparts_to_feishu.py
5. After the session, run scripts/analyze_rehearsal_transcript.py.
6. Produce a debrief:
   - weak openings
   - over-explaining
   - vague asks
   - missing evidence
   - apologetic or defensive tone
   - better rewrites
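The live loop in step 4 can be sketched as a small driver that alternates counterpart turns and user replies. This is a minimal sketch, not the skill's actual runtime: RehearsalState, run_live_loop, and every callable parameter are hypothetical stand-ins for the ASR, turn-generation, and TTS scripts, injected as plain functions so the loop can be exercised without any network call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RehearsalState:
    """Hypothetical session container; not the skill's real schema."""
    blueprint: dict
    transcript: List[dict] = field(default_factory=list)


def run_live_loop(
    state: RehearsalState,
    next_user_audio: Callable[[], Optional[bytes]],  # returns None when the user stops
    transcribe: Callable[[bytes], str],              # stand-in for the ASR step
    build_turn: Callable[[RehearsalState], str],     # stand-in for counterpart generation
    synthesize: Callable[[str], bytes],              # stand-in for the TTS step
    max_turns: int = 6,
) -> RehearsalState:
    """Alternate counterpart turns (TTS) and user replies (ASR) until done."""
    for _ in range(max_turns):
        line = build_turn(state)
        synthesize(line)  # a real stack would play or send the returned audio
        state.transcript.append({"speaker": "counterpart", "text": line})

        audio = next_user_audio()
        if audio is None:
            break
        state.transcript.append({"speaker": "user", "text": transcribe(audio)})
    return state


# Demo with in-memory fakes instead of real audio services.
replies = iter([b"reply-1", b"reply-2", None])
demo = run_live_loop(
    RehearsalState(blueprint={"scenario": "manager_update"}),
    next_user_audio=lambda: next(replies),
    transcribe=lambda audio: audio.decode(),
    build_turn=lambda s: f"counterpart turn {len(s.transcript) // 2 + 1}",
    synthesize=lambda text: text.encode(),
)
```

Keeping the three services behind callables means the same loop can be pointed at the real scripts or at test fakes; the resulting transcript is what a post-session analysis step would consume.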
AudioClaw Trigger Pattern
Use this skill as a structured multi-turn rehearsal mode.
Recommended user trigger:
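    Start a rehearsal with $senseaudio-conversation-rehearsal.
    Scenario: manager_update
    Counterpart: strict_manager
    Topic: explaining a project delay
    Goal: get the recovery plan approved
    Fear triggers: being interrupted, having my execution questioned
    Difficulty: medium
    prepared clone voice_id: your_clone_voice_id
    I will send voice messages next. Run a multi-turn rehearsal with me and give me a debrief at the end.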
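A trigger message made of label: value lines can be split into rehearsal slots before the blueprint is built. The sketch below is an assumption about how an agent might do that, not the skill's own parser: parse_trigger is a hypothetical helper, and the labels in the sample are illustrative. It only splits each line on the first colon, accepting both the ASCII colon and the full-width colon common in Chinese text.

```python
def parse_trigger(message: str) -> dict:
    """Split a trigger message into slot fields and free-text lines."""
    slots, notes = {}, []
    for raw in message.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Normalize the first full-width colon (U+FF1A) to an ASCII colon.
        normalized = line.replace("\uff1a", ":", 1)
        key, sep, value = normalized.partition(":")
        if sep and key.strip() and value.strip():
            slots[key.strip().lower()] = value.strip()
        else:
            notes.append(line)  # lines without a usable label stay as free text
    return {"slots": slots, "notes": notes}


sample = """Start a rehearsal with $senseaudio-conversation-rehearsal.
Scenario: manager_update
Counterpart: strict_manager
Difficulty\uff1amedium
I will reply by voice."""

parsed = parse_trigger(sample)
```

Any slot still missing after parsing would be the cue for the agent to ask a follow-up question before entering rehearsal mode.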
The agent should:
1. Collect the rehearsal slots first.
2. Build the blueprint.
3. Enter rehearsal mode, with reply mode defaulting to voice.
4. Start the scene with the opening counterpart turn as voice, not text.
5. For every later rehearsal turn:
   - transcribe with scripts/senseaudio_asr.py
   - generate the next counterpart turn
   - synthesize that turn with proxy voice or the prepared clone voice_id
   - in ongoing rehearsal mode, default to --send-feishu-audio so the counterpart turns are sent as Feishu audio messages without needing the user to repeat that request
   - only fall back to text-first replies if the user explicitly asks for text-only output or the channel cannot play voice
6. End with scripts/analyze_rehearsal_transcript.py and return a concrete debrief.
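To make "concrete debrief" tangible, here is a toy heuristic that flags a few of the issue types named in the workflow. It is a sketch under loud assumptions: it does not reproduce what scripts/analyze_rehearsal_transcript.py does, the marker lists and the word threshold are invented for illustration, and a real analyzer would need language-aware rules.

```python
import re

# Hypothetical marker lists, chosen only for this example.
APOLOGETIC = ("sorry", "i apologize", "my fault")
VAGUE_ASK = ("maybe", "if possible", "some support", "at some point")
MAX_WORDS_PER_TURN = 60  # arbitrary threshold for "over-explaining"


def debrief_turn(text: str) -> list:
    """Return issue labels for a single user turn."""
    lowered = text.lower()
    issues = []
    if any(marker in lowered for marker in APOLOGETIC):
        issues.append("apologetic tone")
    if any(marker in lowered for marker in VAGUE_ASK):
        issues.append("vague ask")
    if not re.search(r"\d", text):
        issues.append("missing evidence")  # crude proxy: no number cited
    if len(text.split()) > MAX_WORDS_PER_TURN:
        issues.append("over-explaining")
    return issues


def debrief(user_turns: list) -> dict:
    """Map each turn index to its issues, skipping clean turns."""
    report = {}
    for index, text in enumerate(user_turns):
        issues = debrief_turn(text)
        if issues:
            report[index] = issues
    return report


example = debrief([
    "Sorry, maybe we could get some support if possible.",
    "We slipped 5 days; I need 2 engineers until March 14 to recover.",
])
```

Reporting issues per turn index keeps the feedback operational: each flag points at a specific reply the user can rewrite.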
Rehearsal mode should be sticky inside the same session:
- Keep the same scenario, counterpart role, relationship, topic, desired outcome, fear triggers, difficulty, and chosen voice_id
- Keep voice reply as the default from the opening turn onward until the user explicitly says to switch back to text replies or exit rehearsal mode
- If the user says "直接发语音给我练" ("just send me voice to practice with") or "每轮都发语音" ("send voice every turn"), treat that as confirming the same sticky voice mode rather than a one-turn exception
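Sticky behavior amounts to keeping a small piece of session state that changes only on explicit control phrases. The sketch below assumes a hypothetical RehearsalSession class; the English control phrases are invented for illustration, while the two Chinese phrases are the ones quoted above.

```python
from dataclasses import dataclass

# English phrases are illustrative assumptions; the Chinese ones come from the text.
TEXT_SWITCH = ("switch to text", "text only")
EXIT = ("exit rehearsal", "stop rehearsal")
VOICE_CONFIRM = ("直接发语音给我练", "每轮都发语音")


@dataclass
class RehearsalSession:
    voice_id: str
    reply_mode: str = "voice"  # voice is the default from the opening turn
    active: bool = True

    def handle_control(self, utterance: str) -> str:
        """Update sticky state from a user utterance and return the reply mode."""
        lowered = utterance.lower()
        if any(phrase in lowered for phrase in EXIT):
            self.active = False
        elif any(phrase in lowered for phrase in TEXT_SWITCH):
            self.reply_mode = "text"
        elif any(phrase in utterance for phrase in VOICE_CONFIRM):
            # Confirms the sticky voice mode; not a one-turn exception.
            self.reply_mode = "voice"
        return self.reply_mode


session = RehearsalSession(voice_id="proxy_voice")
modes = [
    session.handle_control(u)
    for u in ["Here is my update.", "每轮都发语音", "Another reply.", "Please switch to text.", "ok"]
]
```

Ordinary rehearsal replies never touch the mode, which is what keeps voice output stable across turns until the user explicitly opts out.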
If the user asks to "use the cloned voice", interpret that as:
- use a platform-prepared clone voice_id when available
- otherwise pause and ask for the clone voice_id, or fall back to proxy_voice
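The interpretation above, combined with the vc-... routing noted in the workflow, can be expressed as one small decision function. This is a sketch, assuming a hypothetical resolve_cloned_voice_request helper and return shape; only the vc- prefix, the SenseAudio-TTS-1.5 model name, and the proxy_voice fallback come from this document.

```python
from typing import Optional

CLONE_PREFIX = "vc-"
CLONE_TTS_MODEL = "SenseAudio-TTS-1.5"  # routing target named in the workflow
PROXY_VOICE = "proxy_voice"


def resolve_cloned_voice_request(prepared_voice_id: Optional[str], can_ask_user: bool) -> dict:
    """Decide what to do when the user asks to use the cloned voice."""
    if prepared_voice_id:
        is_clone = prepared_voice_id.startswith(CLONE_PREFIX)
        return {
            "action": "synthesize",
            "voice_id": prepared_voice_id,
            # None means: leave the TTS default model untouched.
            "model": CLONE_TTS_MODEL if is_clone else None,
        }
    if can_ask_user:
        # Pause the rehearsal and request the clone voice_id.
        return {"action": "ask_for_voice_id", "voice_id": None, "model": None}
    return {"action": "synthesize", "voice_id": PROXY_VOICE, "model": None}
```

Note that the function never creates a clone on its own; without a prepared voice_id it either asks or falls back to the proxy voice, in line with the permission rule.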
Design rules
- Prioritize behavior realism over exact voice likeness.
- Treat the public documented clone flow and the experimental workspace automation flow as separate paths.
- For scary-counterpart scenarios, structure the rehearsal in phases:
  - opening pressure
  - pushback
  - challenge question
  - close
- Judge both:
  - what the user said
  - how the user said it
- Keep debrief concrete and operational.
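The four phases can be mapped onto counterpart turns with a simple schedule. The split below is one possible choice, not something the skill prescribes: phase_for_turn is a hypothetical helper that pins the first turn to opening pressure, the last turn to close, and divides the middle turns between pushback and challenge question.

```python
PHASES = ("opening pressure", "pushback", "challenge question", "close")


def phase_for_turn(turn_index: int, total_turns: int) -> str:
    """Map a zero-based counterpart turn to one of the four phases."""
    if total_turns < len(PHASES):
        raise ValueError("need at least one turn per phase")
    if not 0 <= turn_index < total_turns:
        raise ValueError("turn_index out of range")
    if turn_index == 0:
        return PHASES[0]
    if turn_index == total_turns - 1:
        return PHASES[-1]
    # Split the middle turns: the first half pushes back, the rest challenge.
    middle = total_turns - 2
    position = turn_index - 1
    return PHASES[1] if position < (middle + 1) // 2 else PHASES[2]


schedule = [phase_for_turn(i, 6) for i in range(6)]
```

For a six-turn session this yields one opening turn, two pushback turns, two challenge turns, and one closing turn.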
API key lookup
For this skill, use SENSEAUDIO_API_KEY as the default API key source.
Practical rule:
- scripts/run_live_rehearsal_session.py, scripts/run_complete_rehearsal_service.py, and scripts/senseaudio_counterpart_tts.py now default to SENSEAUDIO_API_KEY
- If the host app injects SENSEAUDIO_API_KEY as a login token such as v2.public..., the shared bootstrap replaces it with the real sk-... value from ~/.audioclaw/workspace/state/senseaudio_credentials.json before the rehearsal call starts
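The lookup rule can be illustrated with a short resolver. This is only a sketch of the described behavior, not the shared bootstrap itself: resolve_api_key is a hypothetical function, and the JSON field name api_key is an assumption, since the layout of senseaudio_credentials.json is not documented here.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional

CREDENTIALS_PATH = Path.home() / ".audioclaw/workspace/state/senseaudio_credentials.json"


def resolve_api_key(
    env: Mapping[str, str] = os.environ,
    credentials_path: Path = CREDENTIALS_PATH,
    json_field: str = "api_key",  # ASSUMPTION: the real field name is not documented
) -> Optional[str]:
    """Return an sk-... key, swapping out a v2.public... login token if needed."""
    value = env.get("SENSEAUDIO_API_KEY", "")
    if value.startswith("sk-"):
        return value
    if value.startswith("v2.public") and credentials_path.is_file():
        data = json.loads(credentials_path.read_text(encoding="utf-8"))
        stored = data.get(json_field, "") if isinstance(data, dict) else ""
        if isinstance(stored, str) and stored.startswith("sk-"):
            return stored
    return None  # caller decides how to report a missing key


# Demo against a temporary credentials file rather than the real one.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "senseaudio_credentials.json"
    demo_path.write_text(json.dumps({"api_key": "sk-demo"}), encoding="utf-8")
    swapped = resolve_api_key({"SENSEAUDIO_API_KEY": "v2.public.token"}, demo_path)

direct = resolve_api_key({"SENSEAUDIO_API_KEY": "sk-direct"}, demo_path)
missing = resolve_api_key({}, demo_path)
```

Passing the environment and the file path as parameters keeps the resolver testable without touching the real credentials on disk.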
Resources
- scripts/build_rehearsal_blueprint.py
  - Builds a structured rehearsal plan and counterpart persona
- scripts/build_counterpart_turn.py
  - Generates the next counterpart turn from rehearsal state and the user's latest reply
- scripts/senseaudio_asr.py
  - Transcribes user spoken rehearsal turns with the official AudioClaw HTTP ASR API
- scripts/senseaudio_counterpart_tts.py
  - Synthesizes a counterpart turn using a safe proxy voice or an explicitly authorized clone voice_id
- scripts/run_live_rehearsal_session.py
  - Runs a multi-turn live rehearsal session from user audio replies, counterpart generation, TTS, and automatic debrief
  - Supports --stream-asr and --send-feishu-audio
- scripts/send_rehearsal_counterparts_to_feishu.py
  - Reuses the Feishu voice delivery path to send the generated counterpart turns one by one as audio messages
- scripts/senseaudio_clone_workspace.py
  - Lists clone slots, lists available voices, and creates an authorized rehearsal clone through the official AudioClaw workspace endpoints, preferring a platform token and otherwise trying a logged-in Chrome browser session
- scripts/senseaudio_platform_token.py
  - Resolves an AudioClaw workspace platform token from env or a logged-in Chrome AudioClaw tab when Apple Events JavaScript is enabled
- scripts/run_complete_rehearsal_service.py
  - One entry point that builds the blueprint, optionally resolves a prepared clone voice_id or attempts experimental workspace clone automation, runs the live rehearsal session, and writes a summary bundle
  - Supports --send-feishu-audio so the rehearsal counterpart can proactively send voice turns to Feishu or AudioClaw-linked chats
- scripts/analyze_rehearsal_transcript.py
  - Scores a rehearsal transcript for tone and communication risks
- references/live_rehearsal_loop.md
  - A minimal multi-turn runtime pattern for AudioClaw or another agent orchestrator
- references/rehearsal_design.md
  - Product design, safety policy, and rollout plan