YouTube Batch Transcript Extractor API Skill
📖 Introduction
This skill uses the BrowserAct YouTube Batch Transcript Extractor API template to extract YouTube video transcripts and metadata in batch. Given search keywords and filters, it returns full video transcripts, like counts, and channel metadata without requiring you to write crawler scripts.
✨ Features
- No hallucinations, ensuring stable and accurate data extraction: Pre-set workflows avoid generative AI hallucinations.
- No CAPTCHA issues: No need to handle reCAPTCHA or other verification challenges.
- No IP access restrictions or geofencing: No need to deal with regional IP restrictions.
- Faster execution: Tasks execute faster compared to pure AI-driven browser automation solutions.
- High cost-effectiveness: Significantly reduces data acquisition costs compared to AI solutions that consume a large number of tokens.
🔑 API Key Guide Process
Before running, check the BROWSERACT_API_KEY environment variable. If it is not set, take no other action; ask the user to provide it and wait until they do.
The Agent must inform the user at this time:
"Since you have not yet configured the BrowserAct API Key, please go to the BrowserAct Console to get your Key first."
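The check described above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual code: the helper name check_api_key and the injectable environ argument are hypothetical, and only the BROWSERACT_API_KEY variable name and the message text come from this document.

```python
import os

# Message text taken from the instruction above.
API_KEY_PROMPT = (
    "Since you have not yet configured the BrowserAct API Key, "
    "please go to the BrowserAct Console to get your Key first."
)


def check_api_key(environ=None):
    """Return (ok, message).

    `environ` defaults to os.environ; it is injectable only so the
    check can be exercised without touching the real environment.
    """
    env = os.environ if environ is None else environ
    key = env.get("BROWSERACT_API_KEY", "").strip()
    if not key:
        # Missing or blank key: stop here and surface the prompt.
        return False, API_KEY_PROMPT
    return True, ""
```

An Agent following this sketch would call check_api_key() first and, on a False result, show the message and wait rather than launching the script.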
🛠️ Input Parameters
When calling the script, the Agent should flexibly configure the following parameters based on user needs:
1. KeyWords
   - Type: string
   - Description: The keyword to search for on YouTube.
   - Example: OpenClaw, AI automation
2. Upload_date
   - Type: string
   - Description: Filter for the upload date of the videos.
   - Optional values: Today, This week, This month, This year.
   - Default value: This week
3. Datelimit
   - Type: number
   - Description: The number of videos to extract. Adjust as needed.
   - Default value: 5
   - Recommendation: Set a smaller value (1-5) for quick tests and a larger value for bulk extraction.
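The parameter rules above can be sketched as a small validation helper. This is a minimal sketch under stated assumptions, not the script's own logic: build_args and ALLOWED_UPLOAD_DATES are hypothetical names, the positional order (keyword, upload date, limit) is assumed, and the This week default for Upload_date is assumed as well.

```python
# Accepted Upload_date values as listed in the parameter description.
ALLOWED_UPLOAD_DATES = ("Today", "This week", "This month", "This year")


def build_args(keywords, upload_date="This week", datelimit=5):
    """Validate the three inputs and return them as command-line strings."""
    if not isinstance(keywords, str) or not keywords.strip():
        raise ValueError("KeyWords must be a non-empty string")
    if upload_date not in ALLOWED_UPLOAD_DATES:
        raise ValueError(
            "Upload_date must be one of: " + ", ".join(ALLOWED_UPLOAD_DATES)
        )
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(datelimit, bool) or not isinstance(datelimit, int) or datelimit < 1:
        raise ValueError("Datelimit must be a positive integer")
    return [keywords.strip(), upload_date, str(datelimit)]
```

Validating before launching the script avoids spending a multi-minute browser task on an input that was never going to be accepted.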
🚀 Invocation Method (Recommended)
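The Agent should run the following standalone script to get results with a single command:
```bash
# Example invocation
python -u ./scripts/youtube_batch_transcript_extractor_api.py "KeyWords" "Upload_date" Datelimit
```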
⏳ Execution Status Monitoring
Because this task involves automated browser operations, it may take several minutes. While running, the script continuously outputs timestamped status logs (e.g., [14:30:05] Task Status: running).
Agent Notes:
- While waiting for the script to return results, keep an eye on the terminal output.
- As long as the terminal keeps printing new status logs, the task is running normally. Do not mistake it for a deadlock or an unresponsive process.
- Consider triggering the retry mechanism only if the status remains unchanged for a long time, or if the script stops producing output without returning a result.
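One way to tell a slow task from a stuck one is to track how long the terminal has been silent. The sketch below is illustrative only: parse_status, OutputWatcher, and the 300-second threshold are hypothetical choices, and the log format is inferred solely from the single example line shown above.

```python
import re

# Matches lines shaped like "[14:30:05] Task Status: running".
# The exact format is an assumption based on the one documented example.
STATUS_LINE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\] Task Status: (.+)$")


def parse_status(line):
    """Return (seconds_since_midnight, status), or None for other lines."""
    match = STATUS_LINE.match(line.strip())
    if match is None:
        return None
    hours, minutes, seconds, status = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds), status.strip()


class OutputWatcher:
    """Remembers when output was last seen so silence can be detected."""

    def __init__(self, start_time, max_silence_seconds=300.0):
        self.last_output_time = start_time
        self.max_silence_seconds = max_silence_seconds

    def record_output(self, now):
        # Call this for every line the script prints.
        self.last_output_time = now

    def is_stalled(self, now):
        # Stalled only when the silence exceeds the threshold.
        return now - self.last_output_time > self.max_silence_seconds
```

In practice the now values would come from a clock such as time.monotonic(), with record_output called for each line read from the script.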
📊 Data Output Description
After successful execution, the script will parse and print the results directly from the API response. The results include:
- Video title: The title of the YouTube video.
- Video URL: The direct link to the original video.
- Publisher: The name of the channel publishing the video.
- Channel link: The URL of the publisher's YouTube channel.
- Video likes: The number of likes the video has received.
- Transcript: The complete extracted transcript/subtitles of the video.
⚠️ Error Handling & Retry
If an error is encountered during the execution of the script (such as network fluctuation or task failure), the Agent should follow the logic below:
1. Check the output content:
   - If the output contains "Invalid authorization", the API Key is invalid or expired. Do not retry in this case; guide the user to recheck and provide the correct API Key.
   - If the output does not contain "Invalid authorization" but the task fails (for example, the output starts with Error: or the result is empty), the Agent should automatically run the script once more.
2. Retry limits:
   - Automatic retry is limited to one attempt. If the second attempt still fails, stop retrying and report the specific error message to the user.
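The decision logic above can be sketched as follows. This is a hedged illustration rather than the skill's implementation: classify_output and run_with_retry are hypothetical names, and run_once stands in for whatever actually launches the script and returns its output text.

```python
def classify_output(output):
    """Map raw script output to 'auth_error', 'failed', or 'ok'."""
    text = output.strip()
    if "Invalid authorization" in text:
        return "auth_error"
    if not text or text.startswith("Error:"):
        return "failed"
    return "ok"


def run_with_retry(run_once, max_retries=1):
    """Call run_once (a zero-argument callable returning output text).

    Returns (outcome, output, attempts). Auth errors are never retried;
    other failures are retried at most max_retries times.
    """
    attempts = 0
    while True:
        output = run_once()
        attempts += 1
        outcome = classify_output(output)
        if outcome != "failed" or attempts > max_retries:
            return outcome, output, attempts
```

Checking for "Invalid authorization" before the generic Error: prefix matters: an authorization failure that also starts with Error: must still skip the retry.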
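YouTube Batch Transcript Extractor API Skill
📖 Introduction
This skill uses the BrowserAct YouTube Batch Transcript Extractor API template to provide an automated service for batch extraction of YouTube video transcripts and metadata. Provide search keywords and filter conditions to batch extract full video transcripts, like counts, and channel metadata, with no crawler scripts to write.
✨ Features
- No hallucinations, ensuring stable and accurate data extraction: Preset workflows avoid the hallucination problems of generative AI.
- No CAPTCHA issues: No need to handle reCAPTCHA or other verification challenges.
- No IP access restrictions or geo-blocking: No need to deal with regional IP restrictions.
- Faster execution: Tasks run faster than with purely AI-driven browser automation solutions.
- High cost-effectiveness: Significantly lowers data acquisition costs compared with AI solutions that consume large numbers of tokens.
🔑 API Key Guide Process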
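Before running, check the BROWSERACT_API_KEY environment variable. If it is not set, take no other action; ask the user to provide it and wait until they do.
The Agent must then tell the user:
"Since you have not yet configured the BrowserAct API Key, please go to the BrowserAct Console to get your Key first."
🛠️ Input Parameters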
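When calling the script, the Agent should flexibly configure the following parameters based on user needs:
1. KeyWords
   - Type: string
   - Description: The keyword to search for on YouTube.
   - Example: OpenClaw, AI automation
2. Upload_date
   - Type: string
   - Description: Filter for the upload date of the videos.
   - Optional values: Today, This week, This month, This year.
   - Default value: This week
3. Datelimit
   - Type: number
   - Description: The number of videos to extract. Adjust as needed.
   - Default value: 5
   - Recommendation: Set a smaller value (1-5) for quick tests and a larger value for bulk extraction.
🚀 Invocation Method (Recommended)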
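The Agent should run the following standalone script to get results with a single command:
```bash
# Example invocation
python -u ./scripts/youtube_batch_transcript_extractor_api.py "KeyWords" "Upload_date" Datelimit
```
⏳ Execution Status Monitoring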
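Because this task involves automated browser operations, it may take several minutes. While running, the script continuously outputs timestamped status logs (e.g., [14:30:05] Task Status: running).
Agent Notes:
- While waiting for the script to return results, keep a close eye on the terminal output.
- As long as the terminal keeps printing new status logs, the task is running normally. Do not mistake it for a deadlock or an unresponsive process.
- Consider triggering the retry mechanism only if the status remains unchanged for a long time, or if the script stops producing output without returning a result.
📊 Data Output Description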
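After successful execution, the script parses and prints the results directly from the API response. The results include:
- Video title: The title of the YouTube video.
- Video URL: The direct link to the original video.
- Publisher: The name of the channel publishing the video.
- Channel link: The URL of the publisher's YouTube channel.
- Video likes: The number of likes the video has received.
- Transcript: The complete extracted video transcript/subtitles.
⚠️ Error Handling & Retry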
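If an error occurs while the script is running (such as network fluctuation or task failure), the Agent should follow the logic below:
1. Check the output content:
   - If the output contains "Invalid authorization", the API Key is invalid or expired. Do not retry in this case; guide the user to recheck and provide the correct API Key.
   - If the output does not contain "Invalid authorization" but the task fails (for example, the output starts with Error: or the result is empty), the Agent should automatically run the script once more.
2. Retry limits:
   - Automatic retry is limited to one attempt. If the second attempt still fails, stop retrying and report the specific error message to the user.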