AudioClaw Game NPC Director
What this skill is for
This skill is for building low-cost, high-immersion voice assets for games and interactive worlds.
It treats voice as part of the world model, not just a final rendering step.
You can use it to give each NPC:
- a fixed voice
- a role or class identity
- catchphrases
- relationship-aware tone shifts
- event-based spoken lines
- ASR-driven reactions to what the player actually says
Strong use cases
1. Quest and task broadcasters
Generate the following, all in one consistent NPC voice:
- new quest lines
- reminder lines
- completion lines
- failure or delay lines
2. Relationship-aware NPC dialogue
Use the same NPC voice but adjust line style based on:
- stranger
- neutral
- trusted
- close ally
This makes the world feel reactive without needing fully hand-authored voice libraries.
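One way to picture relationship-aware lines is a small lookup that keeps the NPC's voice_id fixed and varies only the wording. The sketch below is illustrative only: the four tier names come from the list above, while the templates, the `style_line` helper, and the sample voice id are hypothetical and not part of the skill's scripts.

```python
# Hypothetical sketch: wording varies with the relationship tier,
# while the voice_id stays the same for every line.
RELATION_TEMPLATES = {
    "stranger": "State your business. {message}",
    "neutral": "{message}",
    "trusted": "Good to see you. {message}",
    "close ally": "Friend, listen closely. {message}",
}


def style_line(message: str, relationship: str, voice_id: str) -> dict:
    """Return one line entry; unknown tiers fall back to the neutral wording."""
    template = RELATION_TEMPLATES.get(relationship, RELATION_TEMPLATES["neutral"])
    return {"voice_id": voice_id, "text": template.format(message=message)}
```

Calling `style_line("The ledger is missing.", tier, "vc-example")` for each tier yields four different texts that all carry the same voice_id, which is the property the use case depends on.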
3. Player voice intake
Use AudioClaw ASR to transcribe a player's spoken line, then generate a relation-aware NPC reply.
This is the bridge to interactive voiced worlds, where NPCs react to what the player actually says.
4. Dynamic world event announcements
Generate voiced lines for:
- invasion warnings
- weather changes
- market events
- faction alerts
- town broadcasts
5. Worldbuilding narration
Generate short lore or ambient narration using one narrator voice or one faction-specific voice.
Workflow
1. Define the NPC profile:
   - name
   - role
   - world
   - speaking style
   - catchphrase
   - default voice_id
2. Choose one of two paths:
   - scene-first: define an event and generate NPC lines directly
   - player-first: transcribe player audio with scripts/senseaudio_asr.py, then build NPC reply lines from the transcript
3. Define the current scene:
   - event type
   - player relationship
   - player state
   - objective
4. Run either scripts/build_npc_scene_manifest.py or scripts/build_npc_reply_from_player.py.
5. Review the generated lines.
6. Run scripts/batch_tts_scene.py with the fixed voice_id.
   - If you already created a clone on the AudioClaw platform, use that prepared clone voice_id.
   - A prepared cloned voice id commonly looks like vc-... and can be passed directly with --clone-voice-id.
   - This skill already uses streaming TTS internally and now records stream chunk metadata.
   - If the chosen voice is a clone id like vc-..., scene synthesis now auto-routes to SenseAudio-TTS-1.5.
7. If the user wants to hear the NPC lines directly in Feishu or AudioClaw, run scripts/send_npc_scene_to_feishu.py, or add --send-feishu-audio to scripts/run_player_voice_npc_pipeline.py.
   - This step reuses the same Feishu audio delivery path as the dedicated voice-reply skill.
   - It transcodes the generated .mp3 lines into .ogg/.opus and sends them one by one as real audio messages.
   - scripts/run_player_voice_npc_pipeline.py can now take either --input-audio or --input-text, so ongoing NPC dialogue does not need to drop back to text just because the player typed instead of speaking.
   - If the user enters an ongoing NPC dialogue mode, treat voice delivery as the default unless the user explicitly asks for text-only replies.
8. Attach the resulting assets to your runtime, editor tooling, or content review flow.
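Steps 1 and 3 of the workflow amount to filling in two small records. The sketch below shows one possible way to hold them and serialize them together. The field names mirror the bullets above, but the class names, the `build_scene_request` helper, and the JSON layout are assumptions made for illustration; the input format the skill's scripts actually expect is not specified in this document.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class NpcProfile:
    name: str
    role: str
    world: str
    speaking_style: str
    catchphrase: str
    voice_id: str  # default voice used for every line this NPC speaks


@dataclass
class Scene:
    event_type: str
    player_relationship: str
    player_state: str
    objective: str


def build_scene_request(profile: NpcProfile, scene: Scene) -> str:
    """Serialize profile and scene into one JSON document (hypothetical layout)."""
    payload = {"npc": asdict(profile), "scene": asdict(scene)}
    return json.dumps(payload, ensure_ascii=False, indent=2)
```

Keeping the profile separate from the scene makes it easy to reuse one NPC across many events without touching its voice_id.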
AudioClaw Trigger Pattern
Use this skill as a mode-based session.
Recommended user trigger:
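    Enter NPC mode with $senseaudio-game-npc-director.
    NPC: A-Yan, archivist of Mist Harbor
    Relationship: trusted
    Location: North Dock
    Objective: recover the missing ledger
    clone voice_id: your_clone_voice_id
    From now on I will send voice messages; reply to all of them with this setup.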
After mode entry, the agent should keep session state with:
- NPC identity
- relationship
- location
- objective
- chosen voice_id
- reply mode, defaulting to voice
For each new player turn:
1. If the input is audio, run scripts/run_player_voice_npc_pipeline.py --input-audio ...
2. If the input is text, still run scripts/run_player_voice_npc_pipeline.py --input-text ... so the reply stays on the same voice pipeline.
3. In ongoing NPC dialogue mode, default to --send-feishu-audio so the generated NPC lines are sent one by one as Feishu audio messages.
4. Only fall back to text-first replies if the user explicitly asks for text-only output or the channel cannot play voice.
5. If the user says "直接发语音" ("just send the voice") or "一条一条发 NPC 语音" ("send the NPC voice lines one by one"), keep the same voice mode and continue sending audio without asking again.
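The per-turn rules above can be sketched as a function that assembles the pipeline command line. This is a rough illustration, not the skill's own code: the script path and flag names come from this document, but treating --stream-asr and --send-feishu-audio as plain switches is an assumption, and `build_turn_command` is a hypothetical helper. Any profile or scene arguments the script also needs are left out because this document does not name their flags.

```python
import sys
from typing import List, Optional

PIPELINE = "scripts/run_player_voice_npc_pipeline.py"


def build_turn_command(
    *,
    audio_path: Optional[str] = None,
    text: Optional[str] = None,
    clone_voice_id: Optional[str] = None,
    stream_asr: bool = False,
    text_only: bool = False,
) -> List[str]:
    """Build the argv for one player turn; exactly one input kind is required."""
    if (audio_path is None) == (text is None):
        raise ValueError("provide exactly one of audio_path or text")
    cmd = [sys.executable, PIPELINE]
    if audio_path is not None:
        cmd += ["--input-audio", audio_path]
        if stream_asr:
            cmd.append("--stream-asr")  # only meaningful when there is audio to transcribe
    else:
        cmd += ["--input-text", text]  # typed turns stay on the same voice pipeline
    if clone_voice_id:
        cmd += ["--clone-voice-id", clone_voice_id]
    if not text_only:
        cmd.append("--send-feishu-audio")  # voice delivery is the default
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. Building the argv as a list avoids shell quoting problems when the player's text contains spaces or quotes.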
NPC mode should be sticky inside the same session:
- Keep using the same NPC identity, relationship, location, objective, and voice settings for every following turn
- Keep voice reply as the default until the user explicitly says to exit NPC mode or switch back to text replies
If the user asks to switch voice, only swap the configured voice_id; keep the same NPC profile and relationship state.
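A minimal sketch of sticky session state, assuming the agent keeps it in memory as an immutable record. The fields follow the session-state list above; the `NpcSession` class and the two helpers are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NpcSession:
    npc: str
    relationship: str
    location: str
    objective: str
    voice_id: str
    reply_mode: str = "voice"  # voice stays the default until the user opts out


def switch_voice(session: NpcSession, new_voice_id: str) -> NpcSession:
    """Swap only the voice_id; profile and relationship state carry over."""
    return replace(session, voice_id=new_voice_id)


def set_reply_mode(session: NpcSession, mode: str) -> NpcSession:
    """Change the reply mode only on an explicit request for 'voice' or 'text'."""
    if mode not in ("voice", "text"):
        raise ValueError(f"unknown reply mode: {mode!r}")
    return replace(session, reply_mode=mode)
```

Because the record is frozen, every change produces a new session object, so a voice switch cannot accidentally alter the NPC identity or the relationship.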
Design rules
- Keep one NPC tied to one stable voice wherever possible.
- Let emotion and relation change the wording, not the identity.
- Use short lines for reactive NPC speech and system announcements.
- For player voice loops, make ASR intake deterministic before adding deeper agent logic.
- If you want NPC responses to feel faster, use streaming ASR (--stream-asr) for the player-input leg.
- Treat cloned voices or exclusive voices as drop-in replacements for the same workflow.
- Official clone support is a two-step chain:
  - create the clone on the AudioClaw platform first
  - then use the prepared clone voice_id here
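The workflow notes that clone ids commonly look like vc-... and that such ids are routed to SenseAudio-TTS-1.5. A rough sketch of that routing decision follows. The prefix check is a heuristic based on the word "commonly" in this document, and the function names are hypothetical; the skill's real routing logic may differ, and the model used for non-clone voices is not stated here, so the sketch leaves it to the caller.

```python
from typing import Optional

CLONE_PREFIX = "vc-"                    # clone ids "commonly" start like this
CLONE_TTS_MODEL = "SenseAudio-TTS-1.5"  # model named in this document for clone voices


def is_clone_voice(voice_id: str) -> bool:
    """Heuristic check: treat ids with the vc- prefix as prepared clones."""
    return voice_id.startswith(CLONE_PREFIX)


def pick_tts_model(voice_id: str, default_model: Optional[str] = None) -> Optional[str]:
    """Return the clone model for clone ids, otherwise the caller's default."""
    return CLONE_TTS_MODEL if is_clone_voice(voice_id) else default_model
```

The rest of the workflow is unchanged either way, which is what makes a cloned voice a drop-in replacement.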
API key lookup
For the NPC generation side of this skill:
- TTS-oriented scripts now default to SENSEAUDIO_API_KEY
Practical rule:
- scripts/batch_tts_scene.py and scripts/run_player_voice_npc_pipeline.py now default to SENSEAUDIO_API_KEY
- If the host app injects SENSEAUDIO_API_KEY as a login token such as v2.public..., the shared bootstrap replaces it with the real sk-... value from ~/.audioclaw/workspace/state/senseaudio_credentials.json before the TTS stage starts
- The ASR scripts keep their own existing defaults and are intentionally not changed here
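The key-replacement rule can be sketched roughly as below. This is not the shared bootstrap itself. The environment variable name, the token prefixes, and the credentials path come from this document, but the layout of the JSON file is not described, so the `api_key` field name is purely an assumption, as is the `resolve_api_key` helper.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

CREDENTIALS_PATH = (
    Path.home() / ".audioclaw" / "workspace" / "state" / "senseaudio_credentials.json"
)
LOGIN_TOKEN_PREFIX = "v2.public"


def resolve_api_key(
    env: Optional[Mapping[str, str]] = None,
    credentials_path: Path = CREDENTIALS_PATH,
    field: str = "api_key",  # hypothetical field name; the real file layout is unknown
) -> Optional[str]:
    """Return a usable key, swapping an injected login token for the stored sk- key."""
    env = os.environ if env is None else env
    key = env.get("SENSEAUDIO_API_KEY")
    if not key or not key.startswith(LOGIN_TOKEN_PREFIX):
        return key or None  # already a real key, or nothing set
    try:
        data = json.loads(Path(credentials_path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None  # missing or unreadable credentials file
    stored = data.get(field) if isinstance(data, dict) else None
    if isinstance(stored, str) and stored.startswith("sk-"):
        return stored
    return None
```

Returning None instead of the login token means a missing credentials file fails early, before any TTS request is sent with a token the API would reject.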
Resources
- scripts/build_npc_scene_manifest.py
  - Builds scene lines from an NPC profile and game state
- scripts/senseaudio_asr.py
  - Calls AudioClaw ASR using the official open API host or the official platform endpoint
  - Defaults to the official sense-asr-deepthink model
- scripts/build_npc_reply_from_player.py
  - Turns a player transcript into intent-aware NPC reply lines
- scripts/run_player_voice_npc_pipeline.py
  - Runs the full player input pipeline end to end
  - Supports --input-audio, --input-text, --stream-asr, --clone-voice-id, and --send-feishu-audio
- scripts/batch_tts_scene.py
  - Synthesizes all scene lines with one fixed voice
- scripts/send_npc_scene_to_feishu.py
  - Reuses the Feishu voice delivery path to send generated NPC lines one by one as audio messages
- references/npc_voice_design.md
  - Patterns for worldbuilding, relation states, and event announcements
- references/asr_player_loop.md
  - Official ASR findings and the recommended player voice pipeline