Audio Cog - AI Audio Generation Powered by CellCog
Create professional audio with AI — voiceovers, music, sound effects, and personalized avatar voices.
CellCog provides three voice providers, each with different strengths. Choose based on your needs:
| Scenario | Provider | Why |
|---|---|---|
| Standard narration/voiceover | OpenAI | Best voice style control, consistent quality |
| Emotional/dramatic delivery | ElevenLabs | Richest emotional range, supports emotion tags |
| Cloned voice (avatar) | MiniMax | Only provider with voice cloning support |
| Character voice with specific accent | ElevenLabs | 100+ diverse pre-made voices |
| Fine pitch/speed/volume control | MiniMax | Granular voice settings |
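The table can be read as a simple lookup. The sketch below is one way to keep provider choice consistent in your own code; the scenario keys and both helper functions are hypothetical names for illustration, not part of the CellCog SDK, and the prompt wording is only an assumption about what a reasonable request might look like.

```python
# Hypothetical helper: maps a scenario keyword to the provider the table recommends.
# The scenario keys are illustrative names, not CellCog identifiers.
PROVIDER_BY_SCENARIO = {
    "narration": "OpenAI",
    "emotional": "ElevenLabs",
    "cloned_voice": "MiniMax",
    "character_accent": "ElevenLabs",
    "fine_control": "MiniMax",
}


def choose_provider(scenario: str) -> str:
    """Return the recommended provider, falling back to the default (OpenAI)."""
    return PROVIDER_BY_SCENARIO.get(scenario, "OpenAI")


def with_provider(scenario: str, task: str) -> str:
    """Prefix a task description with an explicit provider request."""
    return f"Generate speech using {choose_provider(scenario)}: {task}"


print(with_provider("emotional", "a dramatic reading of the script below"))
```

The resulting string would be passed as the `prompt` argument of `create_chat`.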
How to Use
For your first CellCog task in a session, read the cellcog skill for the full SDK reference — file handling, chat modes, timeouts, and more.
OpenClaw (fire-and-forget):
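    # Assumes `client` is a CellCogClient created as in the blocking example.
    result = client.create_chat(
        prompt="[your task prompt]",
        notify_session_key="agent:main:main",
        task_label="my-task",
        chat_mode="agent",
    )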
All agents except OpenClaw (blocks until done):
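    from cellcog import CellCogClient

    # agent_provider names the agent making the call; pick one of the listed values.
    client = CellCogClient(agent_provider="openclaw|cursor|claude-code|codex|...")

    # Blocks until the task is done, then returns the result.
    result = client.create_chat(
        prompt="[your task prompt]",
        task_label="my-task",
        chat_mode="agent",
    )
    print(result["message"])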
Voice Providers
OpenAI (Default)
Best for standard narration, voiceovers, and single-speaker content with precise delivery control.
Key strength: Natural-language style instructions — describe the accent, tone, pacing, and emotion you want.
8 built-in voices:
| Voice | Gender | Characteristics |
|---|---|---|
| cedar | Male | Warm, resonant, authoritative, trustworthy |
| marin | Female | Bright, articulate, emotionally agile, professional |
| ballad | Male | Smooth, melodic, musical quality |
| coral | Female | Vibrant, lively, dynamic, spirited |
| echo | Male | Calm, measured, thoughtful, deliberate |
| sage | Female | Wise, contemplative, reflective |
| shimmer | Female | Soft, gentle, soothing, approachable |
| verse | Male | Poetic, rhythmic, artistic, expressive |
Best quality: cedar (male), marin (female).
Style customization examples:
- "Warm conversational tone, medium pace, slight enthusiasm when mentioning features. American accent."
- "Deep, hushed, enigmatic, with a slow deliberate cadence — true crime narrator style."
- "Heavy French accent, sophisticated yet friendly, moderate pacing with deliberate pauses."
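A voice name, a style instruction, and the full script can be combined into one prompt. The sketch below assumes a prompt layout chosen for illustration (CellCog accepts natural language, so the exact wording is not prescribed anywhere in this document); `build_voiceover_prompt` is a hypothetical helper, and only the eight voice names come from the table above.

```python
# The eight built-in OpenAI voices listed in this document.
OPENAI_VOICES = {"cedar", "marin", "ballad", "coral", "echo", "sage", "shimmer", "verse"}


def build_voiceover_prompt(script: str, voice: str = "cedar", style: str = "") -> str:
    """Assemble a voiceover prompt; the layout is an assumption, not a required format."""
    if voice not in OPENAI_VOICES:
        raise ValueError(f"unknown OpenAI voice: {voice!r}")
    if not script.strip():
        raise ValueError("provide the complete script to be spoken")
    parts = [f"Generate a voiceover using the OpenAI voice '{voice}'."]
    if style:
        parts.append(f"Style: {style}")
    parts.append(f'Script: "{script}"')
    return "\n".join(parts)


print(build_voiceover_prompt(
    "Welcome to the show.",
    voice="marin",
    style="Warm conversational tone, medium pace. American accent.",
))
```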
ElevenLabs
Best for emotional delivery, dramatic content, character voices, and audiobook narration.
Key strength: Emotion tags embedded directly in text — [laughs], [sighs], [whispers], [excited], [sarcastic]. Plus 100+ diverse pre-made voices.
Emotion tags (use sparingly — 1-2 per paragraph):
| Tag | Effect |
|---|---|
| [laughs] | Natural laughter |
| INLINECODE6 | Soft/brief laughter |
| [sighs] | Sighing |
| [gasps] | Surprise/shock |
| [whispers] | Whispering delivery |
| [pause] | Natural pause/beat |
| [sad], [happy], [excited], [angry], [sarcastic] | Emotional delivery |
Example prompt:
"Generate speech using ElevenLabs with a warm British male voice:
'And then, just when everyone thought it was over... [pause] [whispers] it wasn't.'"
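The "1-2 tags per paragraph" guideline is easy to check before sending a script. The following is a minimal sketch, assuming paragraphs are separated by blank lines and that tags are single lowercase words in square brackets; `overtagged_paragraphs` is a hypothetical helper, not a CellCog or ElevenLabs function.

```python
import re

# Matches single-word lowercase tags such as [pause] or [whispers].
TAG_PATTERN = re.compile(r"\[[a-z]+\]")


def overtagged_paragraphs(text: str, max_tags: int = 2) -> list[int]:
    """Return 1-based indices of paragraphs containing more than max_tags tags."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [
        index
        for index, paragraph in enumerate(paragraphs, start=1)
        if len(TAG_PATTERN.findall(paragraph)) > max_tags
    ]


script = (
    "And then, just when everyone thought it was over... [pause] [whispers] it wasn't.\n\n"
    "[gasps] No way! [laughs] I knew it. [sighs] Well, [excited] here we go."
)
print(overtagged_paragraphs(script))  # [2]
```

The first paragraph stays within the limit with two tags; the second has four and is flagged.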
MiniMax
Best for cloned voices (avatars) and fine-grained voice control.
Key strength: MiniMax Speech 2.8 HD — studio-grade audio quality. Supports avatar cloned voice IDs for personalized content, plus 17+ standard pre-made voices with granular speed, pitch, and volume control.
Standard voices include: Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Wise_Woman, Friendly_Person, Young_Knight, Elegant_Man, and more.
Voice settings: emotion (happy/sad/angry/neutral/etc.), speed (0.5–2.0), volume (0–10), pitch (-12 to 12).
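The ranges above can be validated locally before they are written into a prompt. This is a sketch under stated assumptions: `VoiceSettings` is a hypothetical container, the default values (speed 1.0, volume 1.0, pitch 0, neutral emotion) are guesses for illustration, and the phrase produced by `describe` is just one way to word the request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container for the MiniMax ranges listed above."""

    speed: float = 1.0        # assumed default; allowed 0.5 to 2.0
    volume: float = 1.0       # assumed default; allowed 0 to 10
    pitch: int = 0            # assumed default; allowed -12 to 12
    emotion: str = "neutral"  # not validated: the full emotion list is not given

    def __post_init__(self) -> None:
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError(f"speed {self.speed} is outside 0.5 to 2.0")
        if not 0 <= self.volume <= 10:
            raise ValueError(f"volume {self.volume} is outside 0 to 10")
        if not -12 <= self.pitch <= 12:
            raise ValueError(f"pitch {self.pitch} is outside -12 to 12")

    def describe(self) -> str:
        """Render the settings as a phrase that can be appended to a prompt."""
        return (
            f"emotion {self.emotion}, speed {self.speed}, "
            f"volume {self.volume}, pitch {self.pitch}"
        )


settings = VoiceSettings(speed=1.2, pitch=-2, emotion="happy")
print(f"Use the MiniMax voice Calm_Woman with {settings.describe()}.")
```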
Avatar / Cloned Voice
Users can create avatars on CellCog with their own cloned voice. When an avatar has a cloned voice, CellCog uses the MiniMax provider to generate speech that sounds like that person.
How it works:
- The user creates an avatar on cellcog.ai and uploads voice samples
- CellCog clones their voice using MiniMax Speech 2.8 HD
- Any audio request referencing that avatar uses their cloned voice
Example prompt:
"Generate a voiceover using my avatar Luna's voice: 'Welcome to our quarterly update. I'm excited to share some incredible results with you today.'"
This is powerful for creating consistent, personalized content — marketing videos, podcast intros, course narration — all in the user's own voice.
Sound Effects (SFX)
CellCog generates standalone sound effects from text descriptions. Royalty-free, 0.1 to 30 seconds.
Example prompts:
- "Generate a sound effect of heavy rain hitting a metal roof with occasional thunder, 10 seconds"
- "Create a crispy footsteps-on-fresh-snow sound effect, 5 seconds"
- "Generate an echoing door slam in a large empty warehouse"
Tips for better SFX:
- Be specific about textures and environment
- Specify duration when exact length matters
- For ambient audio longer than 30 seconds, generate a short loopable segment and extend with ffmpeg
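The looping tip can be sketched as a small command builder. It relies on ffmpeg's `-stream_loop` input option, which repeats the input a given number of additional times, and on `-t` to trim the output to the target length. The function name and file names are hypothetical, and stream copy (`-c copy`) is an assumption that suits formats such as WAV; re-encode instead if your format does not loop cleanly.

```python
import math


def loop_command(src: str, dst: str, segment_seconds: float, target_seconds: float) -> list[str]:
    """Build an ffmpeg command that repeats a short segment up to target_seconds."""
    if not 0.1 <= segment_seconds <= 30:
        raise ValueError("generated sound effects are 0.1 to 30 seconds long")
    # -stream_loop N plays the input N extra times, so subtract the first pass.
    extra_loops = max(0, math.ceil(target_seconds / segment_seconds) - 1)
    return [
        "ffmpeg",
        "-stream_loop", str(extra_loops),
        "-i", src,
        "-t", str(target_seconds),  # trim the output to the exact target length
        "-c", "copy",
        dst,
    ]


command = loop_command("rain_10s.wav", "rain_2min.wav", segment_seconds=10, target_seconds=120)
print(" ".join(command))
```

To run it, pass the list to `subprocess.run(command, check=True)` on a machine with ffmpeg installed.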
Music Generation
Create original music from text descriptions. 3 seconds to 10 minutes. Royalty-free.
Capabilities:
- Any genre or genre fusion
- Instrumental and vocal tracks (specify if you want vocals)
- Complex arrangements, mood transitions, and energy dynamics
- Describe what you want — the model handles music theory
Example prompts:
- "Create 2 minutes of calm lo-fi hip-hop background music with soft piano and mellow beats, 75 BPM"
- "Generate a 15-second upbeat tech podcast intro jingle"
- "Create 90 seconds of cinematic orchestral music — start soft and inspiring, build to a confident crescendo"
- "Generate a 3-minute pop song about summer adventures with female vocals"
For precise section-by-section control (exact timing per section), describe your composition plan in detail — CellCog handles the structure.
All generated music is royalty-free — use commercially without attribution or licensing fees.
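A section-by-section composition plan can be written out with explicit timings. The sketch below is one possible way to do that, assuming a plain-text layout invented for illustration; `composition_prompt` is a hypothetical helper, and only the 3-second to 10-minute limit comes from this document.

```python
def composition_prompt(genre: str, sections: list[tuple[str, int]]) -> str:
    """Turn (description, seconds) pairs into a timed composition plan."""
    total = sum(seconds for _, seconds in sections)
    if not 3 <= total <= 600:
        raise ValueError("music length must be between 3 seconds and 10 minutes")
    lines = [f"Create {total} seconds of {genre} music with this structure:"]
    start = 0
    for description, seconds in sections:
        end = start + seconds
        lines.append(f"- {start}s to {end}s: {description}")
        start = end
    return "\n".join(lines)


plan = composition_prompt(
    "cinematic orchestral",
    [("soft and inspiring", 30), ("building strings", 40), ("confident crescendo", 20)],
)
print(plan)
```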
Multi-Language Support
All three voice providers support 40+ languages. Provide speech text in the target language:
English, Spanish, French, German, Italian, Portuguese, Chinese (Mandarin/Cantonese), Japanese, Korean, Hindi, Arabic, Russian, Polish, Dutch, Turkish, and many more.
Chat Mode
Use chat_mode="agent" for all audio tasks. Audio generation executes efficiently in agent mode — no need for agent team.
Tips for Better Audio
1. Choose the right provider: OpenAI for standard narration, ElevenLabs for emotional/dramatic, MiniMax for cloned voices
2. Provide the complete script: Write out exactly what should be spoken; don't say "something about our product"
3. Include style instructions: "Confident but warm", "slow and deliberate", "with slight excitement"
4. For music: Specify duration, mood, genre, and tempo (BPM if you know it)
5. Pronunciation guidance: For names or technical terms, add hints: "CellCog (pronounced SELL-kog)"
6. For ElevenLabs emotion tags: Use sparingly, 1-2 per paragraph. Tags affect all subsequent text until a new tag.
If CellCog is not installed
Run /cellcog-setup (or /cellcog:cellcog-setup depending on your tool) to install and authenticate.
OpenClaw users: Run clawhub install cellcog instead.
Manual setup: pip install -U cellcog and set CELLCOG_API_KEY. See the cellcog skill for SDK reference.
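Before creating a client, a script can check whether the manual setup has been done. This is a minimal sketch using only the standard library; `cellcog_status` is a hypothetical helper, and it only reports whether the `cellcog` package is importable and whether `CELLCOG_API_KEY` is non-empty, without validating the key.

```python
import importlib.util
import os


def cellcog_status(env=None) -> dict:
    """Report whether the SDK is importable and the API key variable is set."""
    env = os.environ if env is None else env
    return {
        "sdk_installed": importlib.util.find_spec("cellcog") is not None,
        "api_key_set": bool(env.get("CELLCOG_API_KEY", "").strip()),
    }


print(cellcog_status())
```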