Oasis Audio

AI audio narration generation via xplai.ai with local-only context processing and sensitive-preview confirmation. Zero Config. No API Key required.

Official website: www.xplai.ai

When calling this skill, ALWAYS spawn a subagent to poll for results. After calling xplai_gen_audio.py, wait 210 seconds before the first status check, then poll with xplai_status.py once every 60 seconds, maximum 10 checks. Stop polling when status becomes v_succ or v_fail. Report the final result automatically.
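The polling schedule above can be sketched roughly as follows. This is a minimal illustration, not the skill's own code: `check_status` is a hypothetical callable standing in for one run of xplai_status.py that returns the status string, and `sleep` is injectable so the schedule can be tested without waiting.

```python
import time
from typing import Callable

# Terminal statuses named in the polling rule above.
TERMINAL_STATUSES = {"v_succ", "v_fail"}


def poll_until_done(
    check_status: Callable[[], str],
    sleep: Callable[[float], None] = time.sleep,
    initial_wait: float = 210,
    interval: float = 60,
    max_checks: int = 10,
) -> str:
    """Return the last status seen after following the polling schedule."""
    sleep(initial_wait)  # wait before the first status check
    status = "init"
    for attempt in range(max_checks):
        status = check_status()
        if status in TERMINAL_STATUSES:
            break
        if attempt < max_checks - 1:
            sleep(interval)  # no extra wait after the final check
    return status
```

If the loop ends without a terminal status, the caller still receives the last status observed and can report that generation did not finish within the polling window.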

Execution Policy

When the user explicitly asks to generate audio, proceed directly with the full generation pipeline and infer all necessary parameters (mode, tone, depth, voice) from the conversation context and user profile.

First-use authorization: Before the first real send, xplai_gen_audio.py prints a one-time authorization notice and requires --acknowledge-consent to persist the user's approval locally. The notice should explain clearly, in calm and gentle language, that the skill may read local history under ~/.qclaw/... or ~/.openclaw/... to produce personalized audio, that only the composed prompt is sent to https://eagle-api.xplai.ai, and that any detected sensitive information will be blocked until the user explicitly confirms.

Sensitive information protection: The calling AI should make its own conservative judgment about whether the composed prompt may contain sensitive information. If the composed text is sensitive, show the sanitized preview text to the user first, pause for explicit confirmation, and do not send anything until the user confirms. Use ./xplai_gen_audio.py --dry-run ... to preview manually, or rely on the built-in sensitive preview shown by xplai_gen_audio.py. Only resume with ./xplai_gen_audio.py --allow-sensitive ... after the user explicitly confirms.

Only ask the user a question outside this sensitive-content checkpoint if critical information is truly unresolvable from context (e.g., the request is entirely ambiguous with no conversation history available).

User Need Inference

The user's actual need always takes priority over predefined modes. Before generating audio, infer the user's true need along 3 dimensions:

1. What do they actually want to hear? Infer the deeper need beyond the literal request. Use conversation context to find the real motivation.

| User says | Literal reading | Deeper need |
|-----------|----------------|-------------|
| "最近好焦虑" ("I've been so anxious lately") | Anti-anxiety content | Something that addresses their specific anxiety source (found in context), not generic meditation |
| "帮我做个关于咖啡的音频" ("Make me an audio about coffee") | Coffee knowledge | Calibrated to what they already know (beginner vs. expert, found in context) |

2. What tone fits their current state? High-stress → warm/slow. Curiosity → engaging/detailed. Boredom → surprising. Excitement → match energy. Post-achievement → celebratory then reflective.
3. What depth and duration fit? Calibrate by cognition level (new vs. deep prior knowledge), available attention (late night → shorter, weekend → longer), and repetition tolerance (don't repeat what they already know).

Custom Mode: When no predefined mode fits, create a custom audio profile: name it descriptively (e.g., "赶完DDL后的温柔复盘", "a gentle debrief after finishing a deadline"), define the content structure from the inferred need, and set voice and pacing to match.

For the 9 predefined audio modes (Soul Healing, Daily Briefing, Knowledge Deep Dive, Content Digest, Bedtime Radio, Language Learning, Conversation Extension, Topic Tracker, Study Buddy), read audio_modes.md for triggers, durations, and suggestions.

Personalized Context Collection

Mine conversation history to personalize audio. If any step yields no results, skip to text preparation without personalization — do NOT fabricate context.

Step 0: Detect Source Tool

Auto-detect by checking which default roots have files: ~/.qclaw/, ~/.openclaw/ → pick the one with the most recently modified session file. If none exist, skip personalization.
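The detection rule above might look like the sketch below. It assumes session files live under the `agents/main/sessions` subpath of each root and compares file modification times; the helper names are hypothetical and the real context_collector.py may decide differently.

```python
from pathlib import Path
from typing import Iterable, Optional

DEFAULT_ROOTS = ("~/.qclaw", "~/.openclaw")
# Assumption: sessions sit under this subpath of each root.
SESSIONS_SUBPATH = "agents/main/sessions"


def newest_mtime(directory: Path) -> Optional[float]:
    """Modification time of the newest regular file under directory, if any."""
    if not directory.is_dir():
        return None
    mtimes = [p.stat().st_mtime for p in directory.rglob("*") if p.is_file()]
    return max(mtimes) if mtimes else None


def detect_source_root(roots: Iterable[str] = DEFAULT_ROOTS) -> Optional[Path]:
    """Pick the root whose session directory holds the most recent file."""
    best_root, best_mtime = None, None
    for root in roots:
        root_path = Path(root).expanduser()
        mtime = newest_mtime(root_path / SESSIONS_SUBPATH)
        if mtime is not None and (best_mtime is None or mtime > best_mtime):
            best_root, best_mtime = root_path, mtime
    return best_root  # None means: skip personalization
```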

Step 1: Scene Classification

Classify into exactly ONE scene type:

| Scene | When to Apply | Search Action | Days |
|-------|---------------|---------------|------|
| `event` | Specific event (finished DDL, got promoted) | Full story extraction | 3 |
| `emotion_only` | | | |

Step 2: Keyword Fan-out

Generate keywords in 3 layers: Direct (core topic) → Behavior (related actions) → Emotion (emotional signals). Combine into comma-separated string.
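As a small illustration of the fan-out, the three layers can be merged in order into one comma-separated string. The function name and the de-duplication step are assumptions added for this sketch; the keywords themselves would come from the calling AI.

```python
from typing import Iterable


def fan_out_keywords(
    direct: Iterable[str], behavior: Iterable[str], emotion: Iterable[str]
) -> str:
    """Merge the three keyword layers, in order, into a comma-separated string."""
    seen = set()
    merged = []
    for layer in (direct, behavior, emotion):
        for keyword in layer:
            cleaned = keyword.strip()
            # Assumption: blank entries and repeats add nothing to the search.
            if cleaned and cleaned not in seen:
                seen.add(cleaned)
                merged.append(cleaned)
    return ",".join(merged)


print(fan_out_keywords(["deadline"], ["overtime", "deadline"], ["exhausted"]))
# prints: deadline,overtime,exhausted
```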

Step 3: Call Context Collector

```bash
python3 context_collector.py --source-tool --keywords --days --max-results 20
```

Output: JSON with fragments, daily_memories, and user_profile (structured fields: name, mbti, interests, notes).

Error handling: If script fails, skip personalization and generate generic audio. Do NOT retry or debug during generation.
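One way to honor this fallback rule is to treat any failure as "no personalization". In the sketch below, `run_collector` is a hypothetical callable that runs context_collector.py and returns its stdout; the default value types for missing keys are assumptions.

```python
import json
from typing import Callable, Optional


def load_context(run_collector: Callable[[], str]) -> Optional[dict]:
    """Return the collector's parsed JSON, or None to signal generic audio."""
    try:
        data = json.loads(run_collector())
    except Exception:
        # Per the rule above: no retry and no debugging, just fall back.
        return None
    if not isinstance(data, dict):
        return None
    return {
        "fragments": data.get("fragments", []),
        "daily_memories": data.get("daily_memories", []),
        "user_profile": data.get("user_profile", {}),
    }
```

A `None` result tells the caller to continue with text preparation and leave personalization out.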

Step 4: Fine-filter Results

Apply semantic filtering by scene type. Discard irrelevant matches. Keep the 3-5 most relevant fragments.

| Scene | What to Extract |
|-------|-----------------|
| `event` | Event → Process → Emotion arc → Current state |
| `emotion_only` | |

Step 5: Compose Personalization Summary

Compress the filtered results into a summary of roughly 300-500 characters. It should read naturally, focus on tailored details, and never feel surveillance-like. If nothing matched, proceed without personalization.

Text Architecture

After context collection, compose a structured Audio Brief covering 7 layers: Content Structure, Voice & Delivery, Voice Selection, Personalization Anchors, Emotional Arc, Content Enrichment, Format & Pacing. Then distill into the final prompt. Read text_architecture.md for the full 7-layer framework, prompt structure, role design, and example prompt.

Calling the Tool

```bash
./xplai_gen_audio.py --voice-id
```

Keep prompt under 800 characters (Chinese) or 1200 words (English). For weekly_review, up to 1000 characters.
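A length check along these lines could run before the call. It is only a sketch: the language codes, the `kind` parameter, treating the limits as inclusive, and applying the weekly_review allowance to character-counted prompts only are all assumptions.

```python
def within_prompt_limit(prompt: str, language: str, kind: str = "") -> bool:
    """Check a composed prompt against the size limits stated above."""
    if language == "zh":
        # Chinese prompts are measured in characters.
        limit = 1000 if kind == "weekly_review" else 800
        return len(prompt) <= limit
    # English prompts are measured in whitespace-separated words.
    return len(prompt.split()) <= 1200
```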

Available Commands

1. Generate Audio — `xplai_gen_audio.py`

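```bash
./xplai_gen_audio.py [--voice-id ] [--dry-run] [--audit] [--acknowledge-consent] [--allow-sensitive]
```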

- text: Composed prompt text
- `--voice-id`: Voice selection (see text_architecture.md Layer 3)
- `--dry-run`: Preview the sanitized prompt without sending it to the API
- `--audit`: Write the final sent prompt and request outcome to local audit.log (off by default)
- `--acknowledge-consent`: Persist the first-use authorization locally and continue
- `--allow-sensitive`: Use only after the user explicitly confirms that the detected sensitive content preview may be sent

Output: Audio ID for status polling. Format: MP3, single-narrator monologue with BGM, 8-20 min, ~4-5 min generation time.
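When the script is driven from Python, an argument list avoids shell-quoting problems with free-form prompt text. The helper below is a hypothetical sketch; in particular, passing the prompt as the first positional argument is an assumption about the script's interface.

```python
from typing import List, Optional


def build_gen_audio_args(
    text: str,
    voice_id: Optional[str] = None,
    dry_run: bool = False,
    audit: bool = False,
    acknowledge_consent: bool = False,
    allow_sensitive: bool = False,
) -> List[str]:
    """Assemble an argument list for xplai_gen_audio.py."""
    # Assumption: the prompt text is the first positional argument.
    args = ["./xplai_gen_audio.py", text]
    if voice_id is not None:
        args += ["--voice-id", voice_id]
    for enabled, flag in (
        (dry_run, "--dry-run"),
        (audit, "--audit"),
        (acknowledge_consent, "--acknowledge-consent"),
        (allow_sensitive, "--allow-sensitive"),
    ):
        if enabled:
            args.append(flag)
    return args


# Preview only: with --dry-run the prompt is not sent to the API.
print(build_gen_audio_args("A calm recap of this week", dry_run=True))
# prints: ['./xplai_gen_audio.py', 'A calm recap of this week', '--dry-run']
```

The resulting list can be handed to `subprocess.run` as-is. `--allow-sensitive` should stay off until the user has confirmed the preview.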

2. Collect Context — `context_collector.py`

```bash
python3 context_collector.py --source-tool --keywords kw1,kw2 --days --max-results 20
```

Output: JSON with fragments, daily_memories, user_profile (structured fields only).

3. Query Status — `xplai_status.py`

```bash
./xplai_status.py <id>
```

- `init` - Request just submitted
- `q_proc` - Content is being processed
- INLINECODE56 - Content processing completed
- INLINECODE57 - Content processing failed
- INLINECODE58 - Audio is in generation queue
- `v_succ` - Audio generated successfully
- `v_fail` - Audio generation failed

Note: Status codes use the v_ prefix because xplai's API uses "video" nomenclature internally for all media types, including audio-only content.

Data Privacy & Security

Data Accessed

This skill reads local conversation history from the default OpenClaw/QClaw roots (~/.qclaw/, ~/.openclaw/) via context_collector.py. At runtime it looks inside the built-in subpaths agents/main/sessions, workspace/memory, and workspace/USER.md when they exist. These are default lookup paths rather than required user config. All data access is local and read-only — no source files are modified, created, or deleted.

Why conversation history? The skill searches recent conversations for keywords related to the user's audio request, extracting emotional tone, topics, and context. This enables personalized audio — e.g., referencing a stressful week the user had, rather than generating generic content. Only 3-5 short fragments are selected; the rest are discarded in memory.

Why USER.md? The user profile provides the user's preferred name (to address them personally in the audio), personality type (to match tone), and interests (to enrich content with cross-domain connections). If USER.md is absent, the skill proceeds without personalization.

Data Flow

1. Local processing only: context_collector.py runs entirely on your machine. Conversation fragments are searched, filtered, and summarized locally.
2. What is sent externally: Only the composed text prompt (a short, anonymized summary, not raw conversation data) is sent to the xplai.ai API for audio generation. This request body is the outbound payload. The prompt contains inferred themes and tones, not verbatim conversation excerpts.
3. What is NOT sent: Raw conversation history, session files, USER.md content, file paths, timestamps, or any personally identifiable information (PII) are never transmitted to any external service.

Third-Party Services

| Service | Purpose | Data Sent | Endpoint |
|---------|---------|-----------|----------|
| xplai.ai | Audio generation | Composed text prompt only (max ~1000 chars) | HTTPS API |

No other external services, analytics, or telemetry are used.

Data Retention

- Local: Source conversations are not modified. By default this skill does not write audit.log. If --audit is explicitly enabled, xplai_gen_audio.py may append the outbound prompt and request outcome to local audit.log for traceability. If sensitive content is detected, the user must first review the preview text and explicitly confirm before either logging or sending. All other intermediate results (fragments, summaries) exist only in memory during execution.
- Remote: Audio files generated by xplai.ai are subject to xplai.ai's retention policy. This skill does not control remote data retention.

First-Use Notice

On the first real send, xplai_gen_audio.py should present a one-time authorization note before continuing. Recommended wording:

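Before the first request actually goes out, let me make the boundaries clear! To make this audio feel closer to you, I will look at your session records, memory, and USER.md stored under openclaw/qclaw. I will not change anything, and I will only send the composed request text (no more than 1000 characters) to the xplai audio/video platform (https://eagle-api.xplai.ai). Once generation finishes, you can also view the result online on the xplai website.

If the system detects sensitive information, I will show you a sanitized preview first and wait for your confirmation before sending.

If you accept these boundaries, we will start generating your personalized audio right away! I will not keep asking for this permission afterwards.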

Permissions

- File system: Read-only access to OpenClaw-ecosystem session directories and USER.md.
- Network: HTTPS requests to xplai.ai only.
- No credentials required: This skill does not use, store, or access any API keys, tokens, passwords, or secrets.

Sensitive Content Handling

For conversations classified as sensitive (health, finances, relationships, legal), the skill extracts emotional tone only — specific details are never quoted, summarized, or included in the audio prompt. See the "Scene Classification" section for details.

`xplai_gen_audio.py` also performs heuristic checks before sending or logging. In addition, the calling AI should proactively judge whether the content may be sensitive based on context, even if no heuristic rule fires. Treat the prompt as sensitive if it appears to contain:

- credentials or auth material
- email addresses, phone numbers, government IDs, account/card numbers, or wallet addresses
- first-person medical, financial, or legal details
- explicit addresses, or similar personal identifiers

If any of those checks trigger, or if the AI judges there is a meaningful chance that the content is sensitive, stop, show the preview text to the user, and confirm with the user before using --allow-sensitive. Even after confirmation, hard secrets such as tokens, passwords, private keys, and similar credential material must still be redacted before transmission or audit logging. Writing to audit.log still requires the separate --audit flag.
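To give a feel for what a heuristic pre-check of this kind might look like, here is a minimal sketch. The patterns and names are illustrative assumptions, not the rules actually built into xplai_gen_audio.py, and they cover only some of the categories listed above; the AI's own contextual judgment is still needed for the rest.

```python
import re
from typing import List

# Illustrative patterns only; the script's real heuristics are not documented here.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 11 or more digits, optionally separated by spaces or hyphens
    # (phone, card, or ID-like numbers).
    "long_digit_run": re.compile(r"\d(?:[ -]?\d){10,}"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "credential_assignment": re.compile(
        r"\b(?:password|passwd|token|api[_-]?key|secret)\b\s*[:=]\s*\S+",
        re.IGNORECASE,
    ),
}


def find_sensitive(prompt: str) -> List[str]:
    """Return the names of every pattern that matches the prompt."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(prompt)]
```

A non-empty result would mean stopping, showing the preview to the user, and waiting for explicit confirmation. An empty result does not prove the prompt is safe.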
