Audio/Video Transcription & Summarization

Transcribe audio/video files using the SenseASR API (api.senseaudio.cn), then summarize the content into structured notes.

{baseDir} refers to this skill's directory.

Prerequisites

- Environment variable SENSEAUDIO_API_KEY configured (get your key at https://senseaudio.cn/platform/api-key)
- Python 3.8+ with requests installed
- For large files (>10MB): ffmpeg installed for splitting (macOS: brew install ffmpeg; Windows: download from ffmpeg.org and add it to PATH; Linux: apt install ffmpeg)
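
The prerequisites can be verified before running anything. The following is a minimal sketch, not part of the skill's script: the name `check_prerequisites` is hypothetical, reading "10MB" as 10 × 1024 × 1024 bytes is an assumption, and it checks only what the list states (the SENSEAUDIO_API_KEY variable, the requests package, and ffmpeg on PATH for large files).

```python
import importlib.util
import os
import shutil

MAX_REQUEST_BYTES = 10 * 1024 * 1024  # assumed reading of the 10MB per-request limit


def check_prerequisites(file_size_bytes=0, env=None, which=shutil.which,
                        find_spec=importlib.util.find_spec):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper for illustration. `env`, `which`, and `find_spec`
    are injectable so the check can run without touching the real system.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("SENSEAUDIO_API_KEY"):
        problems.append("SENSEAUDIO_API_KEY is not set")
    if find_spec("requests") is None:
        problems.append("the requests package is not installed")
    # ffmpeg is only required when the file must be split.
    if file_size_bytes > MAX_REQUEST_BYTES and which("ffmpeg") is None:
        problems.append("ffmpeg is not on PATH (needed to split files over 10MB)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing:", problem)
```

Passing the size of the file you intend to transcribe makes the ffmpeg check conditional, so small files do not report ffmpeg as missing.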

Quick Start

1. Run the transcription script:

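    python {baseDir}/scripts/transcribe.py <audio_file> [--model sense-asr-pro] [--language zh] [--speakers] [--sentiment] [--translate en]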

2. The script outputs a transcript .txt file alongside the source file.
3. Read the transcript and generate a summary (see Summary Format below).

Workflow

Step 1: Assess the Audio File

Check file size and format:

- Supported formats: wav, mp3, ogg, flac, aac, m4a, mp4
- Max file size per request: 10MB
- If the file is larger than 10MB, the script auto-splits it using ffmpeg
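
The checks in this step are simple enough to sketch. The snippet below is illustrative only and is not the script's actual logic: `assess_audio_file` is a hypothetical name, the extension set is copied from the list above, and treating 10MB as 10 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

SUPPORTED_FORMATS = {"wav", "mp3", "ogg", "flac", "aac", "m4a", "mp4"}
MAX_REQUEST_BYTES = 10 * 1024 * 1024  # assumed interpretation of "10MB"


def assess_audio_file(path, size_bytes=None):
    """Report whether the format is supported and whether splitting is needed.

    `size_bytes` may be passed explicitly; otherwise it is read from disk.
    """
    path = Path(path)
    extension = path.suffix.lower().lstrip(".")
    if size_bytes is None:
        size_bytes = path.stat().st_size
    return {
        "format": extension,
        "supported": extension in SUPPORTED_FORMATS,
        "size_bytes": size_bytes,
        "needs_split": size_bytes > MAX_REQUEST_BYTES,
    }
```

For example, `assess_audio_file("meeting.WAV", size_bytes=25 * 1024 * 1024)` reports a supported format that needs splitting, which is the case where ffmpeg must be installed.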

Step 2: Choose the Right Model

| Model | Use When |
| --- | --- |
| sense-asr-lite | Quick batch transcription, simple audio, cost-sensitive |
| sense-asr | General transcription, need speaker separation or timestamps |
| sense-asr-pro | High accuracy needed: meetings, interviews, complex audio |
| sense-asr-deepthink | Noisy audio, dialects, heavy jargon, speech-to-clean-text |

Default to sense-asr-pro for best quality.
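
The model guidance can be read as a small decision rule. Below is one possible encoding, offered as a sketch: the function `choose_model` and its boolean parameters are hypothetical, and the priority order among the cases is an assumption. Only the four model names and the default come from this document.

```python
def choose_model(noisy_or_dialect=False, high_accuracy=False,
                 needs_speakers_or_timestamps=False, cost_sensitive=False):
    """Map rough audio characteristics to one of the documented model names.

    The order of the checks is an illustrative assumption, not documented behavior.
    """
    if noisy_or_dialect:
        return "sense-asr-deepthink"
    if high_accuracy:
        return "sense-asr-pro"
    if needs_speakers_or_timestamps:
        return "sense-asr"
    if cost_sensitive:
        return "sense-asr-lite"
    return "sense-asr-pro"  # documented default
```

With no hints at all the function returns sense-asr-pro, matching the stated default.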

Step 3: Transcribe

Run the transcription script. Key options:

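    # Basic transcription
    python {baseDir}/scripts/transcribe.py recording.mp3

    # Multi-speaker meeting + sentiment analysis
    python {baseDir}/scripts/transcribe.py meeting.wav \
      --model sense-asr-pro \
      --speakers --max-speakers 4 \
      --sentiment \
      --timestamps segment

    # Transcribe and translate to English
    python {baseDir}/scripts/transcribe.py lecture.mp3 \
      --model sense-asr \
      --translate en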
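
When the script is driven from another program, the command line can be assembled as a list rather than a shell string. This is a sketch under stated assumptions: it uses only the flags this document shows for the script (--model, --language, --speakers, --max-speakers, --sentiment, --timestamps, --translate), `build_transcribe_command` is a hypothetical helper, and `base_dir` stands in for {baseDir}.

```python
import sys
from pathlib import Path


def build_transcribe_command(base_dir, audio_file, model="sense-asr-pro",
                             language=None, speakers=False, max_speakers=None,
                             sentiment=False, timestamps=None, translate=None):
    """Return an argv list that invokes scripts/transcribe.py.

    Hypothetical wrapper; flag semantics are whatever the script defines.
    """
    script = Path(base_dir) / "scripts" / "transcribe.py"
    command = [sys.executable, str(script), str(audio_file), "--model", model]
    if language:
        command += ["--language", language]
    if speakers:
        command.append("--speakers")
        # --max-speakers is only emitted together with --speakers.
        if max_speakers is not None:
            command += ["--max-speakers", str(max_speakers)]
    if sentiment:
        command.append("--sentiment")
    if timestamps:
        command += ["--timestamps", timestamps]
    if translate:
        command += ["--translate", translate]
    return command
```

The resulting list can be handed to `subprocess.run(command, check=True)`; building a list avoids shell quoting problems with file names that contain spaces.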

Step 4: Summarize

After transcription, read the transcript file and produce a summary using the format below.

Summary Format

Generate summaries in this structure:

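    # [Title - inferred from content]

    Source: filename.mp3
    Duration: X min Y sec
    Date: YYYY-MM-DD
    Speakers: [if speaker separation was used]

    ## Key Points

    - Point 1
    - Point 2
    - ...

    ## Detailed Summary

    [2-4 paragraphs summarizing the content, organized by topic or chronologically]

    ## Action Items

    - [ ] Action item 1 (assigned to Speaker X, if applicable)
    - [ ] Action item 2

    ## Notable Quotes

    > "Direct quote from the transcript" - Speaker X, [timestamp if available]

    ## Full Transcript

    <details>
    <summary>Click to expand full transcript</summary>

    [Full transcript text, with speaker labels and timestamps if available]

    </details>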

Adapt the template based on content type:

- Meeting: emphasize action items, decisions, speaker contributions
- Lecture/Talk: emphasize key concepts, learning points, structure
- Interview: emphasize Q&A pairs, key responses
- Podcast: emphasize topics discussed, interesting insights
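
The adaptation rule above amounts to a lookup from content type to emphasis. A minimal sketch follows; `summary_emphasis` is a hypothetical helper, and returning an empty list for unknown types (meaning "use the template unchanged") is an assumption rather than something the skill specifies.

```python
EMPHASIS_BY_CONTENT_TYPE = {
    "meeting": ["action items", "decisions", "speaker contributions"],
    "lecture": ["key concepts", "learning points", "structure"],
    "interview": ["Q&A pairs", "key responses"],
    "podcast": ["topics discussed", "interesting insights"],
}


def summary_emphasis(content_type):
    """Return the aspects to emphasize for a given content type."""
    key = content_type.strip().lower()
    if key == "talk":  # the list treats lectures and talks alike
        key = "lecture"
    # Unknown types fall back to no special emphasis (assumption).
    return EMPHASIS_BY_CONTENT_TYPE.get(key, [])
```

Keeping the mapping in one table makes it easy to add a content type without touching the summary template itself.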

API Reference

For full SenseASR API parameters and response formats, see api-reference.md.

