Oasis Audio is an AI audio narration generator that transforms your current state of mind, content you want to digest, or recent life events into a personalized audio with BGM. It supports Chinese & English, 9 reference modes (Soul Healing, Daily Briefing, Knowledge Deep Dive, Content Digest, Bedtime Radio, Language Learning, Conversation Extension, Topic Tracker, Study Buddy), and custom audio profiles when no template fits. Use this skill when the user asks to make audio, generate a podcast, t
When calling this skill, ALWAYS spawn a subagent to poll for results. After calling xplai_gen_audio.py, wait 210 seconds before the first status check, then poll with xplai_status.py once every 60 seconds, maximum 10 checks. Stop polling when status becomes v_succ or v_fail. Report the final result automatically.
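The polling schedule above can be sketched as a small loop. This is a minimal illustration, not the skill's actual implementation: `get_status` is a hypothetical callable standing in for whatever wraps `xplai_status.py`, and `sleep` is injectable only so the timing can be checked without waiting.

```python
import time
from typing import Callable

# Terminal statuses named in the polling policy.
TERMINAL_STATUSES = {"v_succ", "v_fail"}


def poll_until_done(
    get_status: Callable[[], str],  # hypothetical wrapper around xplai_status.py
    sleep: Callable[[float], None] = time.sleep,
    initial_wait: float = 210,
    interval: float = 60,
    max_checks: int = 10,
) -> str:
    """Follow the documented schedule and return the last observed status."""
    sleep(initial_wait)  # wait before the first status check
    status = "init"
    for check in range(max_checks):
        status = get_status()
        if status in TERMINAL_STATUSES:
            break
        if check < max_checks - 1:
            sleep(interval)  # pause between checks, but not after the last one
    return status
```

If the loop ends without a terminal status, the caller still gets the last status seen and can report that generation has not finished within the polling window.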
Execution Policy
When the user explicitly asks to generate audio, proceed directly with the full generation pipeline and infer all necessary parameters (mode, tone, depth, voice) from conversation context and the user profile.
First-use authorization: Before the first real send, xplai_gen_audio.py prints a one-time authorization notice and requires --acknowledge-consent to persist the user's approval locally. The notice should clearly explain, in calm and gentle language, that the skill may read ~/.qclaw/... or ~/.openclaw/... local history to produce personalized audio, that only the composed prompt is sent to https://eagle-api.xplai.ai, and that any detected sensitive information will be blocked until the user explicitly confirms.
Sensitive information protection: The calling AI should make its own conservative judgment about whether the composed prompt may contain sensitive information. If the composed text is sensitive, show the sanitized preview text to the user first, pause for explicit confirmation, and do not send anything until the user confirms. Use ./xplai_gen_audio.py --dry-run ... to preview manually, or rely on the built-in sensitive preview shown by xplai_gen_audio.py. Only resume with ./xplai_gen_audio.py --allow-sensitive ... after the user explicitly confirms.
Only ask the user a question outside this sensitive-content checkpoint if critical information is truly unresolvable from context (e.g., the request is entirely ambiguous with no conversation history available).
User Need Inference
The user's actual need always takes priority over predefined modes. Before generating audio, infer the user's true need along 3 dimensions:
1. What do they actually want to hear? — Infer the deeper need beyond the literal request. Use conversation context to find the real motivation.
| User says | Literal reading | Deeper need |
|-----------|----------------|-------------|
| "最近好焦虑" ("I've been so anxious lately") | Anti-anxiety content | Something that addresses their specific anxiety source (found in context), not generic meditation |
| "帮我做个关于咖啡的音频" ("Make me an audio about coffee") | Coffee knowledge | Calibrated to what they already know (beginner vs. expert, found in context) |
2. What tone fits their current state? — High-stress → warm/slow. Curiosity → engaging/detailed. Boredom → surprising. Excitement → match energy. Post-achievement → celebratory then reflective.
3. What depth and duration fit? — Calibrate by cognition level (new vs. deep prior knowledge), available attention (late night → shorter, weekend → longer), and repetition tolerance (don't repeat what they already know).
Custom Mode: When no predefined mode fits, create a custom audio profile: name it descriptively (e.g., "赶完DDL后的温柔复盘", "a gentle debrief after making the deadline"), define the content structure based on the inferred need, and set voice and pacing to match.
For the 9 predefined audio modes (Soul Healing, Daily Briefing, Knowledge Deep Dive, Content Digest, Bedtime Radio, Language Learning, Conversation Extension, Topic Tracker, Study Buddy), read audio_modes.md for triggers, durations, and suggestions.
Personalized Context Collection
Mine conversation history to personalize audio. If any step yields no results, skip to text preparation without personalization — do NOT fabricate context.
Step 0: Detect Source Tool
Auto-detect by checking which default roots have files: ~/.qclaw/, ~/.openclaw/ → pick the one with the most recently modified session file. If none exist, skip personalization.
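The detection rule above could look roughly like the following. It is a sketch under assumptions: it treats every file under the `agents/main/sessions` subpath as a session file, and the function name `detect_source_root` is made up for illustration.

```python
from pathlib import Path
from typing import Iterable, Optional

# Default roots named in this document.
DEFAULT_ROOTS = [Path.home() / ".qclaw", Path.home() / ".openclaw"]


def detect_source_root(roots: Iterable[Path]) -> Optional[Path]:
    """Return the root whose newest session file was modified most recently."""
    best_root: Optional[Path] = None
    best_mtime = -1.0
    for root in roots:
        sessions = root / "agents" / "main" / "sessions"
        if not sessions.is_dir():
            continue  # this root has no session directory
        # Assumption: any regular file under sessions/ counts as a session file.
        mtimes = [p.stat().st_mtime for p in sessions.rglob("*") if p.is_file()]
        if mtimes and max(mtimes) > best_mtime:
            best_root, best_mtime = root, max(mtimes)
    return best_root  # None means: skip personalization
```

Calling `detect_source_root(DEFAULT_ROOTS)` returns `None` when neither root contains session files, which maps onto the "skip personalization" branch.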
Compress the collected context into a summary of roughly 300-500 characters. The summary should read naturally, focus on details tailored to the request, and never feel surveillance-like. If nothing matched, proceed without personalization.
Text Architecture
After context collection, compose a structured Audio Brief covering 7 layers: Content Structure, Voice & Delivery, Voice Selection, Personalization Anchors, Emotional Arc, Content Enrichment, Format & Pacing. Then distill into the final prompt. Read text_architecture.md for the full 7-layer framework, prompt structure, role design, and example prompt.
Calling the Tool
CODEBLOCK1
Keep prompt under 800 characters (Chinese) or 1200 words (English). For weekly_review, up to 1000 characters.
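A pre-send length check along these lines may help enforce the limits. The language codes `"zh"` and `"en"` and the `kind` parameter are hypothetical names, and the sketch assumes the `weekly_review` allowance applies to character-counted prompts only.

```python
def prompt_within_limit(prompt: str, language: str, kind: str = "") -> bool:
    """Check a prompt against the documented length limits."""
    if language == "en":
        # English prompts are counted in whitespace-separated words.
        return len(prompt.split()) < 1200
    if kind == "weekly_review":
        # Assumption: the 1000-character allowance is inclusive ("up to").
        return len(prompt) <= 1000
    # Chinese prompts are counted in characters.
    return len(prompt) < 800
```

Counting English by whitespace-separated words is a simplification; punctuation-heavy prompts may be counted slightly differently by other tools.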
Available Commands
1. Generate Audio — xplai_gen_audio.py
CODEBLOCK2
- text — Composed prompt text
- INLINECODE42 — Voice selection (see text_architecture.md Layer 3)
- --dry-run — Preview sanitized prompt without sending to API
- --audit — Write the final sent prompt and request outcome to local audit.log (off by default)
- --acknowledge-consent — Persist the first-use authorization notice locally and continue
- --allow-sensitive — Only use after the user explicitly confirms that the detected sensitive content preview may be sent
Output: Audio ID for status polling. Format: MP3, single-narrator monologue with BGM, 8-20 min, ~4-5 min generation time.
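One way a caller might assemble the command line is sketched below, using only flags this document names elsewhere (`--dry-run`, `--audit`, `--acknowledge-consent`, `--allow-sensitive`). Placing the prompt text last as a positional argument is an assumption, and `build_gen_audio_command` is a hypothetical helper, not part of the skill.

```python
from typing import List


def build_gen_audio_command(
    text: str,
    *,
    dry_run: bool = False,
    audit: bool = False,
    acknowledge_consent: bool = False,
    user_confirmed_sensitive: bool = False,
) -> List[str]:
    """Build an argument list for xplai_gen_audio.py (argument order is assumed)."""
    cmd = ["./xplai_gen_audio.py"]
    if dry_run:
        cmd.append("--dry-run")  # preview only, nothing is sent
    if audit:
        cmd.append("--audit")  # opt in to local audit.log
    if acknowledge_consent:
        cmd.append("--acknowledge-consent")  # persist first-use approval
    if user_confirmed_sensitive:
        # Only after the user has seen the preview and explicitly confirmed.
        cmd.append("--allow-sensitive")
    cmd.append(text)
    return cmd
```

The returned list can be passed to `subprocess.run` without a shell, which avoids quoting problems when the prompt contains spaces or punctuation.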
2. Collect Context — context_collector.py
CODEBLOCK3
Output: JSON with fragments, daily_memories, user_profile (structured fields only).
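A caller could guard against empty collector output before personalizing, roughly as follows. Only the three top-level keys come from this document; the shapes of their values, the sample data in the usage line, and the helper names are assumptions.

```python
import json
from typing import Any, Dict

# Top-level keys named in the collector's output description.
EXPECTED_KEYS = ("fragments", "daily_memories", "user_profile")


def load_context(raw: str) -> Dict[str, Any]:
    """Parse collector output, keeping only expected keys that hold data."""
    data = json.loads(raw)  # assumes the top-level JSON value is an object
    return {key: data[key] for key in EXPECTED_KEYS if data.get(key)}


def has_personalization(context: Dict[str, Any]) -> bool:
    """True when at least one key carried non-empty data."""
    return bool(context)
```

For example, `load_context('{"fragments": [], "user_profile": {}}')` yields an empty dict, so `has_personalization` returns `False` and the pipeline would continue without personalization rather than inventing context.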
3. Query Status — xplai_status.py
CODEBLOCK4
- init - Request just submitted
- INLINECODE55 - Content is being processed
- INLINECODE56 - Content processing completed
- INLINECODE57 - Content processing failed
- INLINECODE58 - Audio is in generation queue
- INLINECODE59 - Audio generated successfully
- INLINECODE60 - Audio generation failed
Note: Status codes use the v_ prefix because xplai's API uses "video" nomenclature internally for all media types, including audio-only content.
Data Privacy & Security
Data Accessed
This skill reads local conversation history from the default OpenClaw/QClaw roots (~/.qclaw/, ~/.openclaw/) via context_collector.py. At runtime it looks inside the built-in subpaths agents/main/sessions, workspace/memory, and workspace/USER.md when they exist. These are default lookup paths rather than required user config. All data access is local and read-only — no source files are modified, created, or deleted.
Why conversation history? The skill searches recent conversations for keywords related to the user's audio request, extracting emotional tone, topics, and context. This enables personalized audio — e.g., referencing a stressful week the user had, rather than generating generic content. Only 3-5 short fragments are selected; the rest are discarded in memory.
Why USER.md? The user profile provides the user's preferred name (to address them personally in the audio), personality type (to match tone), and interests (to enrich content with cross-domain connections). If USER.md is absent, the skill proceeds without personalization.
Data Flow
1. Local processing only: context_collector.py runs entirely on your machine. Conversation fragments are searched, filtered, and summarized locally.
2. What is sent externally: Only the composed text prompt (a short, anonymized summary — not raw conversation data) is sent to the xplai.ai API for audio generation. This request body is the outbound payload. The prompt contains inferred themes and tones, not verbatim conversation excerpts.
3. What is NOT sent: Raw conversation history, session files, USER.md content, file paths, timestamps, or any personally identifiable information (PII) are never transmitted to any external service.
No other external services, analytics, or telemetry are used.
Data Retention
- Local: Source conversations are not modified. By default this skill does not write audit.log. If --audit is explicitly enabled, xplai_gen_audio.py may append the outbound prompt and request outcome to local audit.log for traceability. If sensitive content is detected, the user must first review the preview text and explicitly confirm before either logging or sending. All other intermediate results (fragments, summaries) exist only in memory during execution.
- Remote: Audio files generated by xplai.ai are subject to xplai.ai's retention policy. This skill does not control remote data retention.
First-Use Notice
On the first real send, xplai_gen_audio.py should present a one-time authorization note before continuing. Recommended wording:
- File system: Read-only access to OpenClaw-ecosystem session directories and USER.md.
- Network: HTTPS requests to xplai.ai only.
- No credentials required: This skill does not use, store, or access any API keys, tokens, passwords, or secrets.
Sensitive Content Handling
For conversations classified as sensitive (health, finances, relationships, legal), the skill extracts emotional tone only — specific details are never quoted, summarized, or included in the audio prompt. See the "Scene Classification" section for details.
xplai_gen_audio.py also performs heuristic checks before sending or logging. In addition, the calling AI should proactively judge whether the content may be sensitive based on context, even if no heuristic rule fires. Treat the prompt as sensitive if it appears to contain:
- credentials or auth material
- email addresses, phone numbers, government IDs, account/card numbers, or wallet addresses
- first-person medical, financial, or legal details
- explicit addresses, or similar personal identifiers
If any of those checks trigger, or if the AI judges there is a meaningful chance that the content is sensitive, stop, show the preview text to the user, and confirm with the user before using --allow-sensitive. Even after confirmation, hard secrets such as tokens, passwords, private keys, and similar credential material must still be redacted before transmission or audit logging. Writing to audit.log still requires the separate --audit flag.
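The kind of heuristic check and redaction described above might look like this. The patterns are illustrative assumptions covering only a few of the listed categories; the actual rules inside xplai_gen_audio.py are not shown in this document and may differ.

```python
import re
from typing import List

# Illustrative patterns only; real heuristics are likely broader.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s-]{8,}\d"),
    "secret": re.compile(r"(?i)\b(password|token|api[_-]?key)\s*[:=]\s*\S+"),
}


def find_sensitive(prompt: str) -> List[str]:
    """Return the names of the patterns that match the prompt."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(prompt)]


def redact_secrets(prompt: str) -> str:
    """Replace credential values even when the user has approved sending."""
    return PATTERNS["secret"].sub(lambda m: f"{m.group(1)}=[REDACTED]", prompt)
```

A non-empty result from `find_sensitive` would trigger the preview-and-confirm step, while `redact_secrets` models the rule that hard secrets are removed regardless of confirmation. Regexes cannot replace the calling AI's own contextual judgment, which the policy above also requires.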
- Added sensitive information preview and explicit user confirmation before transmitting any potentially sensitive prompt.
- Introduced a one-time, first-use consent notice that must be acknowledged by the user before their local data can be used for personalized audio.
- Clarified description and execution policy around local-only context processing, privacy, and confirmation requirements.
- Updated context source detection (only `~/.qclaw/` and `~/.openclaw/` are checked by default).
- Refined documentation to streamline user instructions, prioritizing user consent and sensitive-content handling.