Meeting To Text
Use this skill when the job is a local file-to-transcript workflow.
Do not use this skill if the user only wants audio extraction, a meeting summary, environment setup, or an explanation of the models.
Inputs To Collect
Always collect:
- one local source file path
- one output target path
Output target rules:
- If the target ends with .txt, write exactly to that file.
- Otherwise treat it as a directory and write <source-stem>_transcript.txt inside it.
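The output target rule can be sketched as a small helper. This is an illustration only, not the bundled script's code: the function name resolve_output_path is hypothetical, and the case-sensitive match on .txt is an assumption.

```python
from pathlib import Path


def resolve_output_path(source: str, target: str) -> Path:
    """Apply the output target rule (hypothetical helper, not the script's own code)."""
    # A target ending in .txt is used exactly as given.
    # Assumption: the match is case-sensitive.
    if target.endswith(".txt"):
        return Path(target)
    # Any other target is treated as a directory that receives
    # <source-stem>_transcript.txt.
    return Path(target) / f"{Path(source).stem}_transcript.txt"


print(resolve_output_path("meeting.mp4", "out/notes.txt"))
print(resolve_output_path("meeting.mp4", "out"))
```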
Supported source types:
- Video: .mp4, .mkv, .mov, .avi, .webm
- Audio: .wav, .mp3, .m4a, .aac, .flac, .ogg
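A pre-check of the source extension might look like the sketch below. The helper name classify_source is hypothetical, and lower-casing the extension before the lookup is an assumption about how the script treats case.

```python
from pathlib import Path

VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".avi", ".webm"}
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".aac", ".flac", ".ogg"}


def classify_source(path: str) -> str:
    """Return "video" or "audio" for a supported file, else raise ValueError."""
    # Assumption: extensions are compared case-insensitively.
    suffix = Path(path).suffix.lower()
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    raise ValueError(f"unsupported source type: {suffix or '(no extension)'}")
```

The returned label is also what the report to the user needs when stating whether the source was audio or video.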
Runtime
Read references/runtime_paths.md before running the script.
Run the bundled entrypoint with the local ASR environment:
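    & <CONDA_ENV_PYTHON_PATH> C:\path\to\your\meeting-to-text\scripts\meeting_to_text.py --input <INPUT_PATH> --output <OUTPUT_TARGET>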
If you need a stable temp location, add:
    --work-dir <WORKSPACE_TEMP_PATH>
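If the entrypoint is launched from Python rather than typed into a shell, the argument list could be assembled as follows. This is a sketch under assumptions: build_command is a hypothetical helper, and the interpreter and script paths must come from references/runtime_paths.md rather than from this example.

```python
from typing import List, Optional


def build_command(
    python_exe: str,
    script: str,
    source: str,
    target: str,
    work_dir: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for the bundled entrypoint (hypothetical helper)."""
    command = [python_exe, script, "--input", source, "--output", target]
    # --work-dir is optional; add it only when a stable temp location is wanted.
    if work_dir is not None:
        command += ["--work-dir", work_dir]
    return command
```

The resulting list can be passed to subprocess.run(command, capture_output=True, text=True), which keeps stdout available for result handling and avoids shell quoting problems with paths that contain spaces.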
Result Handling
The script may print library noise before the final machine-readable result.
Always treat the last non-empty stdout line as the JSON result object.
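A minimal sketch of that rule, assuming the captured stdout is available as one string. The function name parse_last_json_line is hypothetical.

```python
import json


def parse_last_json_line(stdout: str) -> dict:
    """Parse the last non-empty stdout line as the JSON result object."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("stdout contained no non-empty lines")
    # Earlier lines may be library noise; only the final line is parsed.
    result = json.loads(lines[-1])
    if not isinstance(result, dict):
        raise ValueError("last stdout line is not a JSON object")
    return result
```

json.loads raises json.JSONDecodeError, a subclass of ValueError, when the last line is not valid JSON, so a caller can treat both failures the same way.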
Interpret results this way:
- Exit code 0 with status: success: transcript file was created with no warnings.
- Exit code 0 with status: warning: transcript file was created, but you must report the warnings and any skipped segments.
- Non-zero exit code or status: error: do not claim success; surface the warning list and the intended output path.
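The three cases can be expressed as one decision function. This is a sketch: interpret_result is a hypothetical name, and treating a missing or unrecognized status as an error is an assumption chosen so that success is never claimed by default.

```python
def interpret_result(exit_code: int, result: dict) -> str:
    """Map the exit code and the status field to "success", "warning", or "error"."""
    status = result.get("status")
    # A non-zero exit code or an explicit error status always means failure.
    if exit_code != 0 or status == "error":
        return "error"
    if status == "warning":
        return "warning"
    if status == "success":
        return "success"
    # Assumption: an unknown or missing status is treated as a failure.
    return "error"
```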
Important fields in the final JSON:
- output_path: final transcript file path
- speaker_count: number of detected 说话人N labels in the written transcript
- segment_count: normalized diarization segments sent into transcription
- transcribed_segment_count: segments that produced text
- skipped_segment_count: dropped or failed segments
- failed_segments: segment-level failures with start, end, and reason
- warnings: run-level warnings such as only one speaker detected
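When a run finishes with warnings, the skipped segments have to be reported. The sketch below turns the segment-level failures into readable lines. It assumes the field is named failed_segments and holds a list of objects with start, end, and reason keys; the helper name and the output wording are hypothetical.

```python
from typing import List


def describe_failed_segments(result: dict) -> List[str]:
    """Format each failed segment as "<start> to <end>: <reason>"."""
    descriptions = []
    # A missing field is treated as "no failures" (assumption).
    for segment in result.get("failed_segments", []):
        descriptions.append(
            f"{segment['start']} to {segment['end']}: {segment['reason']}"
        )
    return descriptions
```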
Behavior Guarantees
The entrypoint already enforces the workflow. Do not rewrite the pipeline ad hoc in the conversation.
The script will:
- normalize audio with FFmpeg instead of renaming extensions
- use local SenseVoiceSmall for ASR
- use local 3D-Speaker embeddings plus clustering for diarization
- write a plain text transcript with timestamps and 说话人N labels
- stop on diarization failure instead of silently emitting a non-speaker-separated transcript
Report Back To The User
On success, report:
- the final transcript path
- whether the source was audio or video
- the detected speaker count
- any warnings that matter for review
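A success report covering these four items could be assembled from the result object as sketched below. It assumes the fields are named output_path, speaker_count, and warnings, that warnings is a list of strings, and that the caller already knows whether the source was audio or video. The helper name and the wording of each line are hypothetical.

```python
def build_success_report(result: dict, source_kind: str) -> str:
    """Build a short plain-text success report from the JSON result object."""
    lines = [
        f"Transcript: {result['output_path']}",
        f"Source type: {source_kind}",
        f"Speakers detected: {result['speaker_count']}",
    ]
    # Assumption: warnings is a list of strings and may be absent.
    for warning in result.get("warnings", []):
        lines.append(f"Warning: {warning}")
    return "\n".join(lines)
```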
On failure, report:
- the exit code category
- the warning message from the JSON result
- whether the failure happened during validation, media normalization, diarization, transcription, or output writing
References
Read these only when needed: