Audio Cog - AI Audio Generation Powered by CellCog

Create professional audio with AI — voiceovers, music, sound effects, and personalized avatar voices.

CellCog provides three voice providers, each with different strengths. Choose based on your needs:

Scenario	Provider	Why
Standard narration/voiceover	OpenAI	Best voice style control, consistent quality
Emotional/dramatic delivery	ElevenLabs
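The selection rule can be written down as a small lookup. This is a minimal sketch: the scenario keys and the `pick_provider` helper are hypothetical names chosen for illustration and are not part of the CellCog SDK. The mapping itself follows the provider descriptions in this document, with OpenAI as the default.

```python
# Hypothetical helper: the scenario keys are illustrative, not CellCog identifiers.
PROVIDER_BY_SCENARIO = {
    "narration": "OpenAI",       # standard narration/voiceover
    "emotional": "ElevenLabs",   # emotional/dramatic delivery
    "cloned_voice": "MiniMax",   # avatar / cloned voices
}


def pick_provider(scenario: str) -> str:
    """Return the suggested provider, falling back to the default (OpenAI)."""
    return PROVIDER_BY_SCENARIO.get(scenario, "OpenAI")


def provider_hint(scenario: str) -> str:
    """Build a sentence that names the provider at the start of a prompt."""
    return f"Generate speech using {pick_provider(scenario)}."


print(provider_hint("emotional"))
```

Naming the provider in the prompt mirrors the ElevenLabs example prompt in this document; whether CellCog requires that exact wording is not stated here.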

How to Use

For your first CellCog task in a session, read the cellcog skill for the full SDK reference — file handling, chat modes, timeouts, and more.

OpenClaw (fire-and-forget):
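
# Fire-and-forget: returns without waiting for the task to finish.
result = client.create_chat(
    prompt="[your task prompt]",
    notify_session_key="agent:main:main",
    task_label="my-task",
    chat_mode="agent",
)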

All agents except OpenClaw (blocks until done):

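from cellcog import CellCogClient

# Pass the name of the agent you are running in (pick one of the listed values).
client = CellCogClient(agent_provider="openclaw|cursor|claude-code|codex|...")

# Blocks until the task is done, then returns the result.
result = client.create_chat(
    prompt="[your task prompt]",
    task_label="my-task",
    chat_mode="agent",
)
print(result["message"])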
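The blocking call can be wrapped so that every audio task is submitted the same way. The sketch below is an illustration only: `run_audio_task`, `ChatClient`, and `FakeClient` are hypothetical names. It assumes, as in the snippet above, that `create_chat` returns a mapping with a `"message"` key. The fake client stands in for `CellCogClient` so the example runs without the SDK or network access.

```python
from typing import Any, Protocol


class ChatClient(Protocol):
    """Anything exposing create_chat(**kwargs), such as a CellCogClient."""

    def create_chat(self, **kwargs: Any) -> dict: ...


def run_audio_task(client: ChatClient, prompt: str, task_label: str) -> str:
    """Submit one audio task in agent mode and return the reply text."""
    result = client.create_chat(
        prompt=prompt,
        task_label=task_label,
        chat_mode="agent",  # the mode this document recommends for audio
    )
    return result["message"]


class FakeClient:
    """Stand-in used only to exercise the wrapper offline."""

    def __init__(self) -> None:
        self.calls: list[dict] = []

    def create_chat(self, **kwargs: Any) -> dict:
        self.calls.append(kwargs)
        return {"message": f"queued: {kwargs['task_label']}"}


fake = FakeClient()
reply = run_audio_task(fake, "Generate a 5-second door slam sound effect", "sfx-door")
print(reply)
```

In real use, pass the `client` created above instead of `FakeClient()`.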

Voice Providers

OpenAI (Default)

Best for standard narration, voiceovers, and single-speaker content with precise delivery control.

Key strength: Natural-language style instructions — describe the accent, tone, pacing, and emotion you want.

8 built-in voices:

Voice	Gender	Characteristics
cedar	Male	Warm, resonant, authoritative, trustworthy
marin	Female

Best quality: cedar (male), marin (female).
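A request that pairs one of these voices with a natural-language style instruction can be assembled as plain text. This is a rough sketch: `build_narration_prompt` is a hypothetical helper and the sentence layout is an assumption, since CellCog accepts free-form prompts rather than a fixed template.

```python
def build_narration_prompt(script: str, voice: str = "cedar", style: str = "") -> str:
    """Compose a plain-language voiceover request for the OpenAI provider."""
    if not script.strip():
        raise ValueError("script must contain the exact words to be spoken")
    parts = [f"Generate a voiceover using OpenAI with the {voice} voice."]
    if style:
        parts.append(f"Style: {style}")
    parts.append(f"Script: '{script}'")
    return "\n".join(parts)


prompt = build_narration_prompt(
    "Welcome to our quarterly update.",
    voice="marin",
    style="Confident but warm, medium pace.",
)
print(prompt)
```

Keeping the style and the script on separate lines makes it harder for style words to end up in the spoken text.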

Style customization examples:

- "Warm conversational tone, medium pace, slight enthusiasm when mentioning features. American accent."
- "Deep, hushed, enigmatic, with a slow deliberate cadence, in a true crime narrator style."
- "Heavy French accent, sophisticated yet friendly, moderate pacing with deliberate pauses."

ElevenLabs

Best for emotional delivery, dramatic content, character voices, and audiobook narration.

Key strength: Emotion tags embedded directly in text — [laughs], [sighs], [whispers], [excited], [sarcastic]. Plus 100+ diverse pre-made voices.

Emotion tags (use sparingly — 1-2 per paragraph):

Tag	Effect
[laughs]	Natural laughter
[chuckles]

Example prompt:

"Generate speech using ElevenLabs with a warm British male voice:

'And then, just when everyone thought it was over... [pause] [whispers] it wasn't.'"
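The guideline of 1-2 tags per paragraph is easy to check before sending a script. The sketch below is a local check written for illustration, not an ElevenLabs or CellCog feature. It assumes tags are single lowercase words in square brackets and that paragraphs are separated by blank lines.

```python
import re

TAG_PATTERN = re.compile(r"\[[a-z]+\]")  # matches tags such as [laughs] or [whispers]
MAX_TAGS_PER_PARAGRAPH = 2


def overtagged_paragraphs(script: str) -> list[tuple[int, int]]:
    """Return (paragraph_index, tag_count) for each paragraph above the limit."""
    flagged = []
    paragraphs = [p for p in script.split("\n\n") if p.strip()]
    for index, paragraph in enumerate(paragraphs):
        count = len(TAG_PATTERN.findall(paragraph))
        if count > MAX_TAGS_PER_PARAGRAPH:
            flagged.append((index, count))
    return flagged


script = (
    "And then, just when everyone thought it was over... [pause] [whispers] it wasn't.\n\n"
    "[laughs] Well. [sighs] That was [excited] unexpected!"
)
print(overtagged_paragraphs(script))  # [(1, 3)]
```

The first paragraph has two tags and passes; the second has three and is reported with its zero-based index.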

MiniMax

Best for cloned voices (avatars) and fine-grained voice control.

Key strength: MiniMax Speech 2.8 HD — studio-grade audio quality. Supports avatar cloned voice IDs for personalized content, plus 17+ standard pre-made voices with granular speed, pitch, and volume control.

Standard voices include: Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Wise_Woman, Friendly_Person, Young_Knight, Elegant_Man, and more.

Voice settings: emotion (happy/sad/angry/neutral/etc.), speed (0.5–2.0), volume (0–10), pitch (-12 to 12).
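Numeric settings can be range-checked before they go into a prompt. This is a minimal sketch: `check_voice_settings` is a hypothetical helper, and the ranges are copied from the settings line above. Emotion is left unchecked because the list of allowed values is open-ended here.

```python
# Ranges taken from the voice settings listed above.
RANGES = {
    "speed": (0.5, 2.0),
    "volume": (0, 10),
    "pitch": (-12, 12),
}


def check_voice_settings(**settings: float) -> list[str]:
    """Return one message per unknown or out-of-range setting; empty means OK."""
    problems = []
    for name, value in settings.items():
        if name not in RANGES:
            problems.append(f"unknown setting: {name}")
            continue
        low, high = RANGES[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low} to {high}")
    return problems


print(check_voice_settings(speed=1.2, volume=5, pitch=0))
print(check_voice_settings(speed=2.5, pitch=-14))
```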

Avatar / Cloned Voice

Users can create avatars on CellCog with their own cloned voice. When an avatar has a cloned voice, CellCog uses the MiniMax provider to generate speech that sounds like that person.

How it works:

- The user creates an avatar on cellcog.ai and uploads voice samples
- CellCog clones their voice using MiniMax Speech 2.8 HD
- Any audio request referencing that avatar uses their cloned voice

Example prompt:

"Generate a voiceover using my avatar Luna's voice: 'Welcome to our quarterly update. I'm excited to share some incredible results with you today.'"

This is powerful for creating consistent, personalized content — marketing videos, podcast intros, course narration — all in the user's own voice.

Sound Effects (SFX)

CellCog generates standalone sound effects from text descriptions. Effects are royalty-free and run from 0.1 to 30 seconds.

Example prompts:

- "Generate a sound effect of heavy rain hitting a metal roof with occasional thunder, 10 seconds"
- "Create a crispy footsteps-on-fresh-snow sound effect, 5 seconds"
- "Generate an echoing door slam in a large empty warehouse"

Tips for better SFX:

- Be specific about textures and environment
- Specify duration when exact length matters
- For ambient audio longer than 30 seconds, generate a short loopable segment and extend with ffmpeg
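Extending a loopable clip with ffmpeg can be sketched as follows. The helper name and file names are hypothetical. The command relies on ffmpeg's `-stream_loop` input option, which counts repeats after the first play, and on `-t` to cap the output length. With `-c copy` the cut falls on packet boundaries, so the final length may be approximate for compressed formats.

```python
import math


def loop_command(src: str, dst: str, clip_seconds: float, target_seconds: float) -> list[str]:
    """Build an ffmpeg command that repeats src until it covers target_seconds."""
    if not 0.1 <= clip_seconds <= 30:
        raise ValueError("clip length must be within the 0.1 to 30 second SFX limit")
    plays = math.ceil(target_seconds / clip_seconds)
    extra_loops = max(plays - 1, 0)  # -stream_loop counts repeats after the first play
    return [
        "ffmpeg",
        "-stream_loop", str(extra_loops),
        "-i", src,
        "-t", str(target_seconds),  # stop once the target length is reached
        "-c", "copy",               # no re-encoding
        dst,
    ]


command = loop_command("rain.wav", "rain_2min.wav", clip_seconds=10, target_seconds=120)
print(" ".join(command))
```

The list form can be passed directly to `subprocess.run(command, check=True)` on a machine where ffmpeg is installed.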

Music Generation

Create original, royalty-free music from text descriptions. Tracks can run from 3 seconds to 10 minutes.

Capabilities:

- Any genre or genre fusion
- Instrumental and vocal tracks (specify if you want vocals)
- Complex arrangements, mood transitions, and energy dynamics
- Describe what you want; the model handles the music theory

Example prompts:

- "Create 2 minutes of calm lo-fi hip-hop background music with soft piano and mellow beats, 75 BPM"
- "Generate a 15-second upbeat tech podcast intro jingle"
- "Create 90 seconds of cinematic orchestral music: start soft and inspiring, build to a confident crescendo"
- "Generate a 3-minute pop song about summer adventures with female vocals"

For precise section-by-section control (exact timing per section), describe your composition plan in detail — CellCog handles the structure.
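One way to write such a plan is to list each section with its start and end time. This is a sketch under assumptions: `composition_plan` is a hypothetical helper and the line format is just one readable layout, not a syntax that CellCog requires. The length check uses the 3 second to 10 minute range stated above.

```python
def composition_plan(sections: list[tuple[str, int, str]]) -> str:
    """Turn (name, seconds, description) sections into a timed plan."""
    total = sum(seconds for _, seconds, _ in sections)
    if not 3 <= total <= 600:
        raise ValueError("total length must be between 3 seconds and 10 minutes")
    lines = [f"Create {total} seconds of music with this structure:"]
    start = 0
    for name, seconds, description in sections:
        end = start + seconds
        span = f"{start // 60}:{start % 60:02d}-{end // 60}:{end % 60:02d}"
        lines.append(f"{span} {name}: {description}")
        start = end
    return "\n".join(lines)


plan = composition_plan([
    ("Intro", 20, "soft piano, inspiring"),
    ("Build", 40, "strings enter, rising energy"),
    ("Finale", 30, "full orchestra, confident crescendo"),
])
print(plan)
```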

All generated music is royalty-free — use commercially without attribution or licensing fees.

Multi-Language Support

All three voice providers support 40+ languages. Provide speech text in the target language:

English, Spanish, French, German, Italian, Portuguese, Chinese (Mandarin/Cantonese), Japanese, Korean, Hindi, Arabic, Russian, Polish, Dutch, Turkish, and many more.

Chat Mode

Use chat_mode="agent" for all audio tasks. Audio generation runs efficiently in agent mode, so there is no need for the agent team mode.

Tips for Better Audio

1. Choose the right provider: OpenAI for standard narration, ElevenLabs for emotional/dramatic, MiniMax for cloned voices
2. Provide the complete script: Write out exactly what should be spoken; don't say "something about our product"
3. Include style instructions: "Confident but warm", "slow and deliberate", "with slight excitement"
4. For music: Specify duration, mood, genre, and tempo (BPM if you know it)
5. Pronunciation guidance: For names or technical terms, add hints: "CellCog (pronounced SELL-kog)"
6. For ElevenLabs emotion tags: Use sparingly (1-2 per paragraph). Tags affect all subsequent text until a new tag.
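Pronunciation hints can be generated from a small glossary so that only terms present in the script are mentioned. This is an illustrative sketch: `pronunciation_notes` is a hypothetical helper, the glossary entry for "Xylo" is made up, and placing the hints on a separate line is an assumption about prompt layout.

```python
def pronunciation_notes(script: str, hints: dict[str, str]) -> str:
    """Return a hint line covering only the terms that occur in the script."""
    used = [
        f"{term} (pronounced {spoken})"
        for term, spoken in hints.items()
        if term in script
    ]
    if not used:
        return ""
    return "Pronunciation: " + "; ".join(used)


glossary = {"CellCog": "SELL-kog", "Xylo": "ZY-loh"}  # "Xylo" is an invented example term
note = pronunciation_notes("Welcome to CellCog.", glossary)
print(note)
```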

If CellCog is not installed

Run /cellcog-setup (or /cellcog:cellcog-setup depending on your tool) to install and authenticate.
OpenClaw users: Run clawhub install cellcog instead.
Manual setup: pip install -U cellcog and set CELLCOG_API_KEY. See the cellcog skill for SDK reference.
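For the manual route, a quick pre-flight check can confirm that both pieces are in place. This is a minimal sketch: `setup_problems` is a hypothetical helper. It only checks that a package named `cellcog` is importable and that `CELLCOG_API_KEY` is non-empty. It does not verify that the key is valid.

```python
import importlib.util
import os
from typing import Mapping


def setup_problems(env: Mapping[str, str], check_package: bool = True) -> list[str]:
    """Return missing prerequisites; an empty list means the setup looks ready."""
    problems = []
    if check_package and importlib.util.find_spec("cellcog") is None:
        problems.append("cellcog package not found: run pip install -U cellcog")
    if not env.get("CELLCOG_API_KEY"):
        problems.append("CELLCOG_API_KEY is not set")
    return problems


for problem in setup_problems(os.environ):
    print(problem)
```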

