AudioPod AI
Full audio processing API: music generation, stem separation, TTS, noise reduction, transcription, speaker separation, wallet management.
Setup
CODEBLOCK0
Auth: set AUDIOPOD_API_KEY env var or pass to client constructor.
Getting an API Key
1. Sign up at https://audiopod.ai/auth/signup (free, no credit card required)
2. Go to https://www.audiopod.ai/dashboard/account/api-keys
3. Click "Create API Key" and copy the key (starts with `ap_`)
4. Add funds to your wallet at https://www.audiopod.ai/dashboard/account/wallet (pay-as-you-go, no subscription)
CODEBLOCK1
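As a rough illustration of the two auth options above, the helper below resolves a key from an explicit argument or from the AUDIOPOD_API_KEY environment variable and checks the `ap_` prefix. `resolve_api_key` is a hypothetical name used only for this sketch; it is not part of the AudioPod SDK, whose own lookup may behave differently.

```python
import os

ENV_VAR = "AUDIOPOD_API_KEY"
KEY_PREFIX = "ap_"  # keys start with ap_ according to the steps above


def resolve_api_key(explicit=None, environ=None):
    """Return an API key from the argument or the environment.

    Hypothetical helper for illustration; not an AudioPod SDK function.
    """
    env = os.environ if environ is None else environ
    key = explicit or env.get(ENV_VAR)
    if not key:
        raise ValueError(f"no API key: pass one or set {ENV_VAR}")
    if not key.startswith(KEY_PREFIX):
        raise ValueError(f"API key should start with {KEY_PREFIX!r}")
    return key
```

Checking the prefix early turns a mistyped or truncated key into a local error instead of a rejected request.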
AI Music Generation
Generate songs, rap, instrumentals, samples, and vocals from text prompts.
Tasks: text2music (song with vocals), text2rap (rap), prompt2instrumental (instrumental), lyric2vocals (vocals only), text2samples (loops/samples), audio2audio (style transfer), songbloom
Python SDK
CODEBLOCK2
cURL
CODEBLOCK3
Parameters
| Field | Required | Description |
|---|---|---|
| prompt | yes | Style/genre description |
| lyrics | for song/rap/vocals | Song lyrics with verse/chorus structure |
| audio_duration | no | Duration in seconds (default: 30) |
| genre_preset | no | Genre preset name (from presets endpoint) |
| display_name | no | Track display name |
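The parameter rules above can be sketched as a small request builder. This is an illustration under stated assumptions: `build_music_request` is a hypothetical helper, the URL pattern is taken from this document's cURL examples, and reading "song/rap/vocals" as the tasks text2music, text2rap, and lyric2vocals is an interpretation of the table rather than a documented rule.

```python
BASE_URL = "https://api.audiopod.ai"
# Assumption: these are the tasks the table means by "song/rap/vocals".
LYRIC_TASKS = {"text2music", "text2rap", "lyric2vocals"}


def build_music_request(task, prompt, lyrics=None, audio_duration=30, **optional):
    """Return (url, json_body) for a music generation call.

    Hypothetical helper; it only assembles data and sends nothing.
    """
    if not prompt:
        raise ValueError("prompt is required")
    if task in LYRIC_TASKS and not lyrics:
        raise ValueError(f"lyrics are required for {task}")
    body = {"prompt": prompt, "audio_duration": audio_duration}
    if lyrics:
        body["lyrics"] = lyrics
    # Pass through optional fields such as genre_preset or display_name.
    body.update({k: v for k, v in optional.items() if v is not None})
    return f"{BASE_URL}/api/v1/music/{task}", body
```

The returned pair can be handed to any HTTP client together with the X-API-Key header.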
Stem Separation
Split audio into individual instrument/vocal tracks.
Modes
| Mode | Stems | Output | Use Case |
|---|---|---|---|
| single | 1 | Specified stem only | Vocal isolation, drum extraction |
| two | 2 | vocals + instrumental | Karaoke tracks |
| four | 4 | vocals, drums, bass, other | Standard remixing (default) |
| six | 6 | + guitar, piano | Full instrument separation |
| producer | 8 | + kick, snare, hihat | Beat production |
| studio | 12 | + cymbals, sub_bass, synth | Professional mixing |
| mastering | 16 | Maximum detail | Forensic analysis |
Single stem options: vocals, drums, bass, guitar, piano, other
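To make the mode table easier to use from code, the sketch below picks the least granular mode that still yields a requested number of tracks. The stem counts are copied from the table; `smallest_mode_for` is a hypothetical helper, not an SDK call.

```python
# Stem counts per mode, copied from the Modes table.
MODE_STEMS = {
    "single": 1, "two": 2, "four": 4, "six": 6,
    "producer": 8, "studio": 12, "mastering": 16,
}


def smallest_mode_for(stem_count):
    """Return the mode with the fewest stems that still yields stem_count tracks."""
    candidates = [(n, mode) for mode, n in MODE_STEMS.items() if n >= stem_count]
    if stem_count < 1 or not candidates:
        raise ValueError(f"no mode provides {stem_count} stems")
    # Tuples sort by stem count first, so min() gives the smallest fitting mode.
    return min(candidates)[1]
```

For example, asking for 5 tracks selects six, since four is one track short.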
Python SDK
CODEBLOCK4
cURL
CODEBLOCK5
Response Format
CODEBLOCK6
Text to Speech
Generate speech from text with 50+ voices in 60+ languages. Supports voice cloning.
Voice Types
- 50+ production-ready voices: multilingual, supporting 60+ languages with auto-detection
- Custom clones: clone any voice from a ~5-second audio sample
Python SDK
CODEBLOCK7
cURL (Raw HTTP — most reliable)
CODEBLOCK8
Generate Parameters
| Field | Required | Description |
|---|---|---|
| input_text | yes | Text to speak (max 5000 chars). Use input_text for raw HTTP, text for SDK |
| audio_format | no | mp3, wav, ogg (default: mp3) |
| speed | no | 0.25 - 4.0 (default: 1.0) |
| language | no | ISO code, auto-detected if omitted |
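The limits in the table lend themselves to client-side validation before a request is sent. The following is a minimal sketch for the raw HTTP (form data) variant; `build_tts_form` is a hypothetical helper, and the field names `input_text` and `audio_format` follow the table as reconstructed here, so check them against the live API.

```python
ALLOWED_FORMATS = {"mp3", "wav", "ogg"}
MAX_CHARS = 5000


def build_tts_form(text, audio_format="mp3", speed=1.0, language=None):
    """Validate inputs and return form fields for the raw HTTP endpoint.

    Hypothetical helper; it only builds the dict and sends nothing.
    """
    if not text or len(text) > MAX_CHARS:
        raise ValueError(f"text must be 1-{MAX_CHARS} characters")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"audio_format must be one of {sorted(ALLOWED_FORMATS)}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    # Form values are sent as strings.
    form = {"input_text": text, "audio_format": audio_format, "speed": str(speed)}
    if language:  # omit to let the service auto-detect
        form["language"] = language
    return form
```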
Response Format
CODEBLOCK9
Important Notes
- The raw HTTP generate endpoint uses form data, not JSON. The field is `input_text`, not `text`.
- The SDK endpoint (/api/v1/voice/tts/generate) uses JSON with the field `text`.
- Output files may be WAV data with an .mp3 extension; convert with ffmpeg.
- ~55 credits per generation, wallet-based billing
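Because a downloaded file may hold WAV data behind an .mp3 extension, it can help to check the container signature before deciding whether to convert. The sketch below relies only on the standard RIFF/WAVE header layout (bytes 0-3 are `RIFF`, bytes 8-11 are `WAVE`); the function names are illustrative.

```python
def looks_like_wav(header: bytes) -> bool:
    """True if the first 12 bytes carry the RIFF/WAVE container signature."""
    return len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"


def sniff_file(path):
    """Read just enough of a file to run the signature check."""
    with open(path, "rb") as fh:
        return looks_like_wav(fh.read(12))
```

A file for which `sniff_file` returns True is a candidate for re-encoding, whatever its extension says.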
Speaker Separation
Separate audio by speaker with automatic diarization.
Python SDK
CODEBLOCK10
cURL
CODEBLOCK11
Speech to Text (Transcription)
Transcribe audio/video with speaker diarization, word-level timestamps, and multiple output formats.
Python SDK
CODEBLOCK12
cURL
CODEBLOCK13
Parameters
| Field | Required | Description |
|---|---|---|
| url / urls | yes (or file) | URL(s) to transcribe (YouTube, SoundCloud, direct links) |
| language | no | ISO 639-1 code (auto-detected if omitted) |
| enable_speaker_diarization | no | Enable speaker identification (default: false) |
| min_speakers / max_speakers | no | Speaker count hints for better diarization |
| word_timestamps | no | Enable word-level timestamps (default: true) |
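A JSON body for URL transcription could be assembled along these lines. This is a sketch, not the SDK's behavior: `build_transcribe_payload` is hypothetical, and always sending a `urls` list (rather than a single `url`) is an assumption about how the endpoint accepts the "url / urls" field.

```python
def build_transcribe_payload(urls, language=None, enable_speaker_diarization=False,
                             min_speakers=None, max_speakers=None, word_timestamps=True):
    """Assemble a JSON body using the field names from the parameters table."""
    if isinstance(urls, str):
        urls = [urls]
    if not urls:
        raise ValueError("at least one URL is required")
    if min_speakers and max_speakers and min_speakers > max_speakers:
        raise ValueError("min_speakers cannot exceed max_speakers")
    payload = {
        "urls": list(urls),  # assumption: a list is accepted even for one URL
        "enable_speaker_diarization": enable_speaker_diarization,
        "word_timestamps": word_timestamps,
    }
    # Only include optional hints that were actually provided.
    for key, value in (("language", language),
                       ("min_speakers", min_speakers),
                       ("max_speakers", max_speakers)):
        if value is not None:
            payload[key] = value
    return payload
```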
Output Formats
- json: Full structured output with segments, timestamps, speakers
- srt: SubRip subtitle format
- vtt: WebVTT subtitle format
- txt: Plain text transcript
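The formats above are selected through the `format` query parameter of the transcript endpoint listed in the endpoint summary. A minimal sketch of building that URL follows; `transcript_url` is a hypothetical helper and the base URL is the one used in this document's cURL examples.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.audiopod.ai"
FORMATS = ("json", "srt", "vtt", "txt")


def transcript_url(job_id, fmt="json"):
    """Build the transcript download URL for a finished transcription job."""
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    query = urlencode({"format": fmt})
    return f"{BASE_URL}/api/v1/transcribe/jobs/{job_id}/transcript?{query}"
```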
Noise Reduction
Remove background noise from audio/video files.
Python SDK
CODEBLOCK14
cURL
CODEBLOCK15
Wallet & Billing
Check balance, estimate costs, and view usage history.
Python SDK
CODEBLOCK16
cURL
CODEBLOCK17
API Endpoint Summary
| Service | Endpoint | Method |
|---|---|---|
| Music | INLINECODE16 | POST |
| Music jobs | /api/v1/music/jobs/{id} | GET/DELETE |
| Music presets | /api/v1/music/presets | GET |
| Stems | /api/v1/stem-extraction/api/extract | POST (multipart) |
| Stems status | /api/v1/stem-extraction/status/{id} | GET |
| Stems modes | /api/v1/stem-extraction/modes | GET |
| Stems jobs | /api/v1/stem-extraction/jobs | GET |
| TTS generate | /api/v1/voice/voices/{uuid}/generate | POST (form data) |
| TTS generate (SDK) | /api/v1/voice/tts/generate | POST (JSON) |
| TTS status | /api/v1/voice/tts-jobs/{id}/status | GET |
| TTS status (SDK) | /api/v1/voice/tts/status/{id} | GET |
| Voice list | /api/v1/voice/voice-profiles | GET |
| Voice list (SDK) | /api/v1/voice/voices | GET |
| Speaker | /api/v1/speaker/diarize | POST (multipart) |
| Speaker jobs | /api/v1/speaker/jobs/{id} | GET/DELETE |
| Transcribe URL | /api/v1/transcribe/transcribe | POST (JSON) |
| Transcribe upload | /api/v1/transcribe/transcribe-upload | POST (multipart) |
| Transcript output | /api/v1/transcribe/jobs/{id}/transcript?format= | GET |
| Transcribe jobs | /api/v1/transcribe/jobs | GET |
| Denoise | /api/v1/denoiser/denoise | POST (multipart) |
| Denoise jobs | /api/v1/denoiser/jobs/{id} | GET/DELETE |
| Wallet balance | /api/v1/api-wallet/balance | GET |
| Wallet pricing | /api/v1/api-wallet/pricing | GET |
| Wallet usage | /api/v1/api-wallet/usage | GET |
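Most services pair a submit endpoint with a job or status endpoint, which suggests a submit-then-poll pattern for raw HTTP use. The loop below is a generic sketch of that pattern, not the SDK's implementation. `wait_for_job` is a hypothetical name, the `status` key and the `FAILED` value are assumptions, and `COMPLETED` is the only status value that appears in this document.

```python
import time


def wait_for_job(fetch_status, job_id, timeout=600, interval=5,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status(job_id) until the job finishes or timeout elapses.

    fetch_status is any callable returning a dict with a "status" key,
    for example a wrapper around GET /api/v1/music/jobs/{id}.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_status(job_id)
        status = str(job.get("status", "")).upper()
        if status == "COMPLETED":
            return job
        if status == "FAILED":  # assumed failure value
            raise RuntimeError(f"job {job_id} failed: {job}")
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        sleep(interval)
```

Injecting `fetch_status`, `sleep`, and `clock` keeps the loop independent of any HTTP library and lets it run in tests without waiting.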
Auth Headers
Two auth styles work:
- X-API-Key: ap_... (works for most endpoints)
- INLINECODE41 (works for TTS generate/status)
Known Issues
- SDK method signatures may differ from the raw API; when in doubt, use the cURL examples.
- TTS output is stored on Cloudflare R2; download it via output_url in the job status.
- TTS output files may be WAV data with an .mp3 extension; convert with ffmpeg before sending via WhatsApp.
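For the ffmpeg conversion mentioned above, one possible approach is to re-encode the file to a new path. The sketch assumes ffmpeg is installed and on PATH; the helper names are illustrative, and only the standard `-y` (overwrite) and `-i` (input) options are used, with the output format implied by the destination extension.

```python
import shutil
import subprocess


def ffmpeg_reencode_cmd(src, dst):
    """Build an ffmpeg command that re-encodes src into the format implied by dst."""
    # -y overwrites dst if it exists. ffmpeg generally probes the input's
    # contents, so WAV data in a file named .mp3 is still read as WAV.
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def reencode(src, dst):
    """Run the conversion, raising if ffmpeg is missing or exits non-zero."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(ffmpeg_reencode_cmd(src, dst), check=True)
```

Writing to a separate destination path avoids reading and overwriting the same file in one ffmpeg run.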
AudioPod AI
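Full audio processing API: music generation, stem separation, TTS, noise reduction, transcription, speaker separation, wallet management.
Setup
```bash
pip install audiopod  # Python
npm install audiopod  # Node.js
```
Auth: set the AUDIOPOD_API_KEY environment variable or pass the key to the client constructor.
Getting an API Key
1. Sign up at https://audiopod.ai/auth/signup (free, no credit card required)
2. Go to https://www.audiopod.ai/dashboard/account/api-keys
3. Click "Create API Key" and copy the key (starts with `ap_`)
4. Add funds to your wallet at https://www.audiopod.ai/dashboard/account/wallet (pay-as-you-go, no subscription)
```python
from audiopod import AudioPod

client = AudioPod()  # uses the AUDIOPOD_API_KEY environment variable
# or: client = AudioPod(api_key="ap_...")
```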
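AI Music Generation
Generate songs, rap, instrumentals, samples, and vocals from text prompts.
Tasks: text2music (song with vocals), text2rap (rap), prompt2instrumental (instrumental), lyric2vocals (vocals only), text2samples (loops/samples), audio2audio (style transfer), songbloom
Python SDK
```python
# Generate a full song with lyrics
result = client.music.song(
    prompt="Upbeat pop, synths, drums, 120 BPM, female vocals, radio-ready production",
    lyrics="Verse 1:\nWalking down the sunny street\n\nChorus:\nWe're on fire tonight!",
    duration=60
)
print(result["output_url"])

# Generate rap
result = client.music.rap(
    prompt="Lo-fi hip hop, 100 BPM, male rap vocals, melancholic, keyboard chords",
    lyrics="Verse 1:\nStarted from the bottom, now we're climbing...",
    duration=60
)

# Generate an instrumental (no lyrics needed)
result = client.music.instrumental(
    prompt="Atmospheric ambient soundscape, uplifting, driving mood",
    duration=30
)

# Generic generation with an explicit task type
result = client.music.generate(
    prompt="Electronic dance music, high energy",
    task="text2samples",  # any task type
    duration=30
)

# Async: submit, then poll
job = client.music.create(
    prompt="Chill lo-fi beat",
    duration=30,
    task="prompt2instrumental"
)
result = client.music.wait_for_completion(job["id"], timeout=600)

# Get available genre presets
presets = client.music.get_presets()

# List/manage jobs
jobs = client.music.list(skip=0, limit=50)
job = client.music.get(job_id=123)
client.music.delete(job_id=123)
```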
cURL
```bash
# Song with lyrics
curl -X POST https://api.audiopod.ai/api/v1/music/text2music \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "upbeat pop, synths, 120 bpm, female vocals", "lyrics": "Walking down the sunny street...", "audio_duration": 60}'

# Rap
curl -X POST https://api.audiopod.ai/api/v1/music/text2rap \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "lo-fi hip hop, male rap vocals, 100 BPM", "lyrics": "Started from the bottom...", "audio_duration": 60}'

# Instrumental
curl -X POST https://api.audiopod.ai/api/v1/music/prompt2instrumental \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "ambient soundscape, uplifting", "audio_duration": 30}'

# Samples/loops
curl -X POST https://api.audiopod.ai/api/v1/music/text2samples \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "drum loop, sad mood", "audio_duration": 15}'

# Vocals only
curl -X POST https://api.audiopod.ai/api/v1/music/lyric2vocals \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "clean vocals, happy", "lyrics": "A chorus of eternal unity...", "audio_duration": 30}'

# Check job status / get the result
curl https://api.audiopod.ai/api/v1/music/jobs/JOB_ID \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# Get genre presets
curl https://api.audiopod.ai/api/v1/music/presets \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# List jobs (URL quoted so the shell does not interpret & or ?)
curl "https://api.audiopod.ai/api/v1/music/jobs?skip=0&limit=50" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# Delete a job
curl -X DELETE https://api.audiopod.ai/api/v1/music/jobs/JOB_ID \
  -H "X-API-Key: $AUDIOPOD_API_KEY"
```
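Parameters
| Field | Required | Description |
|---|---|---|
| prompt | yes | Style/genre description |
| lyrics | for song/rap/vocals | Song lyrics with verse/chorus structure |
| audio_duration | no | Duration in seconds (default: 30) |
| genre_preset | no | Genre preset name (from presets endpoint) |
| display_name | no | Track display name |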
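Stem Separation
Split audio into individual instrument/vocal tracks.
Modes
| Mode | Stems | Output | Use Case |
|---|---|---|---|
| single | 1 | Specified stem only | Vocal isolation, drum extraction |
| two | 2 | vocals + instrumental | Karaoke tracks |
| four | 4 | vocals, drums, bass, other | Standard remixing (default) |
| six | 6 | + guitar, piano | Full instrument separation |
| producer | 8 | + kick, snare, hihat | Beat production |
| studio | 12 | + cymbals, sub_bass, synth | Professional mixing |
| mastering | 16 | Maximum detail | Forensic analysis |
Single stem options: vocals, drums, bass, guitar, piano, other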
Python SDK
```python
# Sync: extract and wait for the result
result = client.stems.separate(
    url="https://youtube.com/watch?v=VIDEO_ID",
    mode="six",
    timeout=600
)
for stem, url in result["download_urls"].items():
    print(f"{stem}: {url}")

# From a local file
result = client.stems.separate(file="/path/to/song.mp3", mode="four")

# Single stem extraction
result = client.stems.separate(
    url="https://youtube.com/watch?v=ID",
    mode="single",
    stem="vocals"
)

# Async: submit, then poll
job = client.stems.extract(url="https://youtube.com/watch?v=ID", mode="six")
print(f"Job ID: {job['id']}")
status = client.stems.status(job["id"])
# or wait:
result = client.stems.wait_for_completion(job["id"], timeout=600)

# List available modes
modes = client.stems.modes()

# Job management
jobs = client.stems.list(skip=0, limit=50, status="COMPLETED")
job = client.stems.get(job_id=1234)
client.stems.delete(job_id=1234)
```
cURL
bash
# Extract from URL
curl -X POST https://api.audiopod.ai/api/v1/stem-extraction/api/extract \
-H X