AI Podcast Pipeline
⚠️ Security Notice
This skill may trigger antivirus false positives due to legitimate use of:
- base64 decoding: Used ONLY to decode audio data from Gemini TTS API responses (standard practice for binary data in JSON)
- subprocess calls: Used ONLY to invoke ffmpeg for audio/video processing
- Environment variables: Reads API keys from the user-configured environment (`GEMINI_API_KEY`)
- Network requests: Calls the Google Gemini API for text-to-speech generation
All code is open source and auditable in this repository. No malicious behavior.
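The base64 use described above can be illustrated with a short sketch. It assumes the TTS response carries audio as a base64 string inside a JSON document; the field name `data` and the sample payload are hypothetical and are not taken from this skill's scripts.

```python
import base64
import json


def decode_audio_field(response_text: str, field: str = "data") -> bytes:
    """Decode a base64-encoded audio payload embedded in a JSON document.

    The field name is a hypothetical placeholder; the real response layout
    depends on the API and is not shown in this document.
    """
    payload = json.loads(response_text)
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(payload[field], validate=True)


# Round-trip demonstration with fake bytes standing in for audio.
fake_audio = b"\x00\x01RIFF-demo"
sample = json.dumps({"data": base64.b64encode(fake_audio).decode("ascii")})
decoded = decode_audio_field(sample)
```

Nothing here executes decoded content; the bytes are only written out or handed to ffmpeg, which is why the pattern is benign despite sometimes tripping scanners.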
Build end-to-end podcast assets from Trend/QuickView-* content.
Core Workflow
1. Select source QuickView file.
2. Generate script (full or compressed mode).
3. Build dual-voice MP3 (Gemini multi-speaker, chunked for reliability).
4. Generate full-text Korean subtitles (no ellipsis truncation).
5. Render subtitle MP4 with tuned font/size/timing shift.
6. Build thumbnail + YouTube metadata.
7. Deliver final package.
Step 1) Select Source
Prefer the weekly QuickView file from your configured Quartz root.
If the user gives a wk.aiee.app URL, map it to the local Quartz markdown file first.
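One way the URL-to-file mapping could look is sketched below. It assumes the site path mirrors the folder layout under the Quartz content root and that page slugs match file names exactly; both are assumptions, and `url_to_quartz_path` and the example paths are hypothetical names used only for illustration.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def url_to_quartz_path(url: str, quartz_root: str) -> Path:
    """Map a wk.aiee.app page URL to a markdown file under quartz_root.

    Assumes the URL path equals the note's relative path without ".md".
    """
    parsed = urlparse(url)
    if parsed.netloc != "wk.aiee.app":
        raise ValueError(f"unexpected host: {parsed.netloc!r}")
    # Strip slashes and decode percent-escapes in the page slug.
    slug = unquote(parsed.path).strip("/")
    if not slug:
        raise ValueError("URL has no page path")
    return Path(quartz_root) / f"{slug}.md"


example = url_to_quartz_path(
    "https://wk.aiee.app/Trend/QuickView-example", "/vault/content"
)
```

If the site rewrites slugs (for instance, replacing spaces with dashes), the lookup would need a fallback such as searching the Trend folder for the closest file name.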
Step 2) Generate Script
Read and apply:
- references/podcastprompttemplate_ko.md
Modes:
- Full mode: 15 to 20 minutes
- Compressed mode: 5 to 7 minutes (core tips only)
Rules:
- No system/meta text in spoken lines
- Host intro once, at the opening only
- Conversational Korean, short sentences, actionable
- Save the script in `archive/`
Step 3) Build Audio (Gemini Multi-Speaker, Reliable)
Preferred: chunked builder (timeout-safe)
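```bash
# Set the API key via environment variable (required)
export GEMINI_API_KEY=

# Run from the skills/ai-podcast-pipeline/ directory
python3 scripts/build_dualvoice_audio.py \
  --input \
  --outdir \
  --basename podcastfulldualvoice \
  --chunk-lines 6
```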
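The idea behind chunking can be sketched as follows. This is an illustration of the general technique, not the builder's actual implementation; the function name `chunk_script_lines` is hypothetical, and the default of 6 mirrors the `--chunk-lines 6` value used in this document.

```python
from typing import Iterator, List


def chunk_script_lines(lines: List[str], chunk_lines: int = 6) -> Iterator[List[str]]:
    """Yield consecutive groups of at most chunk_lines non-blank lines."""
    if chunk_lines < 1:
        raise ValueError("chunk_lines must be at least 1")
    # Drop blank lines so every chunk carries spoken content only.
    spoken = [line.strip() for line in lines if line.strip()]
    for start in range(0, len(spoken), chunk_lines):
        yield spoken[start:start + chunk_lines]


script = [f"Callie: line {i}" for i in range(1, 15)]
chunks = list(chunk_script_lines(script, chunk_lines=6))
```

Each chunk would then be sent as its own TTS request and the resulting audio segments joined in order, so a timeout costs one small request rather than the whole script.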
Single-pass (short scripts)
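```bash
python3 scripts/geminimultispeakertts.py \
  --input-file \
  --outdir \
  --basename podcast_dualvoice \
  --retries 3 \
  --timeout-seconds 120
```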
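A retry loop with a per-attempt timeout is a common safeguard for single-pass calls. The sketch below shows the generic pattern only; `call_with_retries` is a hypothetical helper, the defaults mirror the `--retries 3` and `--timeout-seconds 120` values of the single-pass command, and treating `retries` as the total number of attempts is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(request: Callable[[float], T], retries: int = 3,
                      timeout_seconds: float = 120.0,
                      backoff_seconds: float = 2.0) -> T:
    """Call request(timeout_seconds), retrying when it raises TimeoutError."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return request(timeout_seconds)
        except TimeoutError as exc:
            last_error = exc
            if attempt < retries:
                # Linear backoff: wait longer after each failed attempt.
                time.sleep(backoff_seconds * attempt)
    raise RuntimeError(f"gave up after {retries} attempts") from last_error
```

The `request` callable is expected to honor the timeout it receives; the wrapper itself does not interrupt a call that is still running.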
Default voice mapping (fixed as of 2026-02-10):
- Callie (female) → `Kore`
- Nick (male) → `Puck`
Output: MP3 (default delivery format)
Step 4) Build Korean Subtitles (Full Text)
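Use the full-text subtitle builder (no `...` truncation):
```bash
python3 scripts/buildkoreansrt.py \
  --script \
  --audio \
  --output /podcast.srt \
  --max-chars 22
```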
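Splitting long lines instead of truncating them can be sketched like this. It is a minimal illustration under the assumption that subtitles break on whitespace; `split_subtitle_text` is a hypothetical name, and the default of 22 mirrors the `--max-chars 22` value used in this document.

```python
from typing import List


def split_subtitle_text(text: str, max_chars: int = 22) -> List[str]:
    """Split text into segments of at most max_chars, keeping every word."""
    segments: List[str] = []
    current = ""
    for word in text.split():
        # Hard-split a single word that is longer than the limit.
        while len(word) > max_chars:
            if current:
                segments.append(current)
                current = ""
            segments.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            segments.append(current)
            current = word
    if current:
        segments.append(current)
    return segments


line = "오늘은 새로운 AI 도구 세 가지를 빠르게 살펴보겠습니다"
segments = split_subtitle_text(line, max_chars=22)
```

Because every word lands in some segment, the full script text survives; only the number of cues grows when lines are long.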
Step 5) Render Subtitled MP4 (Font + Timing)
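Use the renderer with adjustable font and timing shift:
```bash
python3 scripts/rendersubtitledvideo.py \
  --image \
  --audio \
  --srt \
  --output /final.mp4 \
  --font-name "Do Hyeon" \
  --font-size 27 \
  --shift-ms -250
```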
Notes:
- A negative `--shift-ms` moves subtitles earlier (use it to fix lag).
- If text clipping occurs, lower `--font-size` (e.g., 25 to 27).
- Keep text inside the safe area; avoid overlap with characters or objects.
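The effect of a timing shift on SRT cues can be sketched as follows. This shows the arithmetic only and is not the renderer's actual code; `shift_timestamp` and `shift_cue_line` are hypothetical helpers, and clamping at zero is an assumption about how early cues should be handled.

```python
import re

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(stamp: str, shift_ms: int) -> str:
    """Shift one SRT timestamp (HH:MM:SS,mmm) by shift_ms milliseconds."""
    match = TIMESTAMP.fullmatch(stamp)
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    total = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis + shift_ms
    total = max(total, 0)  # clamp so early cues never go negative
    hours, rest = divmod(total, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def shift_cue_line(line: str, shift_ms: int) -> str:
    """Shift both timestamps in an SRT timing line such as 'a --> b'."""
    return TIMESTAMP.sub(lambda m: shift_timestamp(m.group(0), shift_ms), line)
```

With a shift of -250, a cue that started at one second starts a quarter of a second earlier, which is the direction needed when subtitles trail the audio.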
Step 6) Build Thumbnail + YouTube Metadata
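```bash
# Set the API key via environment variable (required)
export GEMINI_API_KEY=

python3 scripts/buildpodcastassets.py \
  --source
```
Reference (layout/copy guardrails):
- references/thumbnailguidelinesko.md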
Step 7) Final Delivery Checklist
Always include:
1. source used
2. final MP3 path
3. subtitle MP4 path + size
4. thumbnail path
5. YouTube title options (3)
6. YouTube description
Reliability Rules
- Gemini timeout on long input: use the chunked builder (`build_dualvoice_audio.py`)
- Subtitle clipping: reduce font size and increase bottom margin
- Subtitle lag: adjust `--shift-ms` (usually -150 to -300)
- Keep generated assets under Telegram practical limits
Security Notes
- API keys must be passed via environment variables (`GEMINI_API_KEY`), not hardcoded.
- Never paste raw keys into prompts, logs, screenshots, or public posts.
- Recent hardening: thumbnail generation now passes keys via env (not CLI args).
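Reading the key from the environment and keeping it out of logs can be sketched as below. The helper names `load_gemini_api_key` and `redact` are hypothetical and are not part of this skill's scripts; only the `GEMINI_API_KEY` variable name comes from this document.

```python
import os


def load_gemini_api_key(environ=os.environ) -> str:
    """Return the API key from the environment, failing loudly if absent."""
    key = environ.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; export it before running")
    return key


def redact(secret: str, visible: int = 4) -> str:
    """Return a log-safe form that shows only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Passing the key through the environment, rather than as a command-line argument, keeps it out of shell history and process listings.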
References
- references/podcastprompttemplate_ko.md
- references/workflowrunbook.md
- references/thumbnailguidelinesko.md
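AI Podcast Pipeline
⚠️ Security Notice
This skill may trigger antivirus false positives due to legitimate use of:
- base64 decoding: Used ONLY to decode audio data from Gemini TTS API responses (standard practice for binary data in JSON)
- subprocess calls: Used ONLY to invoke ffmpeg for audio/video processing
- Environment variables: Reads API keys from the user-configured environment (`GEMINI_API_KEY`)
- Network requests: Calls the Google Gemini API for text-to-speech generation
All code is open source and auditable in this repository. No malicious behavior.
Build end-to-end podcast assets from Trend/QuickView-* content.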
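Core Workflow
1. Select source QuickView file.
2. Generate script (full or compressed mode).
3. Build dual-voice MP3 (Gemini multi-speaker, chunked for reliability).
4. Generate full-text Korean subtitles (no ellipsis truncation).
5. Render subtitle MP4 with tuned font/size/timing shift.
6. Build thumbnail + YouTube metadata.
7. Deliver final package.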
Step 1) Select Source
Prefer the weekly QuickView file from your configured Quartz root.
If the user gives a wk.aiee.app URL, map it to the local Quartz markdown file first.
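Step 2) Generate Script
Read and apply:
- references/podcastprompttemplate_ko.md
Modes:
- Full mode: 15 to 20 minutes
- Compressed mode: 5 to 7 minutes (core tips only)
Rules:
- No system/meta text in spoken lines
- Host intro once, at the opening only
- Conversational Korean, short sentences, actionable
- Save the script in `archive/`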
Step 3) Build Audio (Gemini Multi-Speaker, Reliable)
Preferred: chunked builder (timeout-safe)
```bash
# Set the API key via environment variable (required)
export GEMINI_API_KEY=

# Run from the skills/ai-podcast-pipeline/ directory
python3 scripts/build_dualvoice_audio.py \
  --input \
  --outdir \
  --basename podcastfulldualvoice \
  --chunk-lines 6
```
Single-pass (short scripts)
```bash
python3 scripts/geminimultispeakertts.py \
  --input-file \
  --outdir \
  --basename podcast_dualvoice \
  --retries 3 \
  --timeout-seconds 120
```
Default voice mapping (fixed as of 2026-02-10):
- Callie (female) → `Kore`
- Nick (male) → `Puck`
Output: MP3 (default delivery format)
Step 4) Build Korean Subtitles (Full Text)
Use the full-text subtitle builder (no `...` truncation):
```bash
python3 scripts/buildkoreansrt.py \
  --script \
  --audio \
  --output /podcast.srt \
  --max-chars 22
```
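Step 5) Render Subtitled MP4 (Font + Timing)
Use the renderer with adjustable font and timing shift:
```bash
python3 scripts/rendersubtitledvideo.py \
  --image \
  --audio \
  --srt \
  --output /final.mp4 \
  --font-name "Do Hyeon" \
  --font-size 27 \
  --shift-ms -250
```
Notes:
- A negative `--shift-ms` moves subtitles earlier (use it to fix lag).
- If text clipping occurs, lower `--font-size` (e.g., 25 to 27).
- Keep text inside the safe area; avoid overlap with characters or objects.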
Step 6) Build Thumbnail + YouTube Metadata
```bash
# Set the API key via environment variable (required)
export GEMINI_API_KEY=

python3 scripts/buildpodcastassets.py \
  --source
```
Reference (layout/copy guardrails):
- references/thumbnailguidelinesko.md
Step 7) Final Delivery Checklist
Always include:
1. source used
2. final MP3 path
3. subtitle MP4 path + size
4. thumbnail path
5. YouTube title options (3)
6. YouTube description
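Reliability Rules
- Gemini timeout on long input: use the chunked builder (`build_dualvoice_audio.py`)
- Subtitle clipping: reduce font size and increase bottom margin
- Subtitle lag: adjust `--shift-ms` (usually -150 to -300)
- Keep generated assets under Telegram practical limits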
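Security Notes
- API keys must be passed via environment variables (`GEMINI_API_KEY`), not hardcoded.
- Never paste raw keys into prompts, logs, screenshots, or public posts.
- Recent hardening: thumbnail generation now passes keys via env (not CLI args).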
References
- references/podcastprompttemplate_ko.md
- references/workflowrunbook.md
- references/thumbnailguidelinesko.md