YouTube Search and Transcripts via Scavio
Search YouTube, retrieve video metadata, extract transcripts, and check AI trainability. All endpoints return structured JSON.
When to trigger
Use this skill when the user asks to:
- Find YouTube videos on a topic
- Get the transcript of a YouTube video (for summarization, RAG, Q&A)
- Check view counts, upload date, or video metadata
- Verify if a video is suitable for AI training or has captions available
Setup
Get a free API key at https://scavio.dev (1,000 free credits/month, no card required) and export it:

```bash
export SCAVIO_API_KEY=sk_live_your_key
```
Workflow
1. Finding a video: call `/search` with the topic. Use `sort_by: view_count` for the most-watched result.
2. Getting content: if the user wants a summary, call `/transcript` with the `videoId` from the search results.
3. Checking metadata: call `/metadata` for view counts, likes, tags, or channel info.
4. RAG pipelines: inject transcript segments directly into context. Each segment has `text`, `start`, and `duration`.
5. Trainability: call `/trainability` to check license and caption availability before ingesting.
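The trainability step above can be sketched as a small gate in front of an ingestion pipeline. This is only an illustration under stated assumptions: the function name `should_ingest` is hypothetical, the `license` value in the sample is invented, and the trainability result is assumed to be already unwrapped into a plain dict carrying the documented `has_transcript`, `transcript_languages`, and `is_trainable` fields.

```python
def should_ingest(trainability: dict, language: str = "en") -> bool:
    """Return True only if the video is flagged trainable and has a
    transcript in the requested language. Missing fields count as "no"."""
    if not trainability.get("is_trainable", False):
        return False
    if not trainability.get("has_transcript", False):
        return False
    return language in trainability.get("transcript_languages", [])


# Hypothetical trainability result shaped after the documented field names.
sample = {
    "has_transcript": True,
    "transcript_languages": ["en", "de"],
    "license": "creative_commons",  # assumed value, for illustration only
    "is_trainable": True,
}

print(should_ingest(sample))                 # English transcript requested
print(should_ingest(sample, language="fr"))  # French transcript requested
```

Treating missing fields as a refusal keeps the gate conservative: a video is ingested only when the response explicitly says it is suitable.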
Endpoints
| Endpoint | Description |
|---|---|
| `POST https://api.scavio.dev/api/v1/youtube/search` | Search YouTube videos |
| `POST https://api.scavio.dev/api/v1/youtube/metadata` | Get structured video metadata |
| `POST https://api.scavio.dev/api/v1/youtube/transcript` | Extract video transcript |
| `POST https://api.scavio.dev/api/v1/youtube/trainability` | Check AI training suitability |

```
Authorization: Bearer $SCAVIO_API_KEY
```
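Since all four endpoints share one base path and the same bearer-token header, the URL and headers can be built in one place. The sketch below is a convenience wrapper, not part of any Scavio SDK; `build_request`, `BASE_URL`, and `ENDPOINTS` are hypothetical names introduced for illustration.

```python
BASE_URL = "https://api.scavio.dev/api/v1/youtube"
ENDPOINTS = ("search", "metadata", "transcript", "trainability")


def build_request(endpoint: str, api_key: str) -> tuple:
    """Return (url, headers) for one of the four YouTube endpoints."""
    name = endpoint.strip("/")  # accept both "transcript" and "/transcript"
    if name not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    headers = {"Authorization": f"Bearer {api_key}"}
    return f"{BASE_URL}/{name}", headers


url, headers = build_request("/transcript", "sk_live_your_key")
print(url)
```

The returned pair can be passed straight to `requests.post(url, headers=headers, json=payload)`.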
Search Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `search` | string | required | Search query. Note: this field is `search`, not `query` |
| `upload_date` | string | -- | `last_hour`, `today`, `this_week`, `this_month`, `this_year` |
| `type` | string | -- | `video`, `channel`, or `playlist` |
| `duration` | string | -- | `short` (< 4 min), `medium` (4-20 min), `long` (> 20 min) |
| `sort_by` | string | `relevance` | `relevance`, `date`, `view_count`, `rating` |
| `subtitles` | boolean | `false` | Videos with captions only |
| `creative_commons` | boolean | `false` | Creative Commons videos only |
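A request body for `/search` can be validated locally against the values in the table before any credit is spent. The helper below is a minimal sketch; `build_search_payload` is a hypothetical name, and rejecting unknown keys is a design choice of this sketch, not documented API behavior.

```python
ALLOWED_VALUES = {
    "upload_date": {"last_hour", "today", "this_week", "this_month", "this_year"},
    "type": {"video", "channel", "playlist"},
    "duration": {"short", "medium", "long"},
    "sort_by": {"relevance", "date", "view_count", "rating"},
}
BOOLEAN_FLAGS = {"subtitles", "creative_commons"}


def build_search_payload(search: str, **filters) -> dict:
    """Build the JSON body for /search, checking each filter locally."""
    if not search:
        raise ValueError("search is required")
    payload = {"search": search}
    for name, value in filters.items():
        if name in ALLOWED_VALUES:
            if value not in ALLOWED_VALUES[name]:
                allowed = sorted(ALLOWED_VALUES[name])
                raise ValueError(f"{name} must be one of {allowed}, got {value!r}")
        elif name in BOOLEAN_FLAGS:
            if not isinstance(value, bool):
                raise ValueError(f"{name} must be a boolean")
        else:
            # Catches the common mistake of sending "query" instead of "search".
            raise ValueError(f"unknown search parameter: {name}")
        payload[name] = value
    return payload


payload = build_search_payload("rag tutorial", sort_by="view_count", subtitles=True)
print(payload)
```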
Transcript Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `video_id` | string | required | YouTube video ID (e.g. `sVcwVQRHIc8`) |
| `language` | string | `en` | Language code for transcript |
| `transcript_origin` | string | -- | `auto_generated` or `uploader_provided` |
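The transcript parameters translate into a request body the same way. The sketch below uses a hypothetical helper, `build_transcript_payload`; it leaves `transcript_origin` out of the body unless the caller sets it, on the assumption that the API then applies its own default.

```python
TRANSCRIPT_ORIGINS = {"auto_generated", "uploader_provided"}


def build_transcript_payload(video_id: str, language: str = "en",
                             transcript_origin: str = None) -> dict:
    """Build the JSON body for /transcript from the documented parameters."""
    if not video_id:
        raise ValueError("video_id is required")
    payload = {"video_id": video_id, "language": language}
    if transcript_origin is not None:
        if transcript_origin not in TRANSCRIPT_ORIGINS:
            raise ValueError(
                f"transcript_origin must be one of {sorted(TRANSCRIPT_ORIGINS)}"
            )
        payload["transcript_origin"] = transcript_origin
    return payload


print(build_transcript_payload("sVcwVQRHIc8"))
print(build_transcript_payload("sVcwVQRHIc8", transcript_origin="uploader_provided"))
```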
Examples
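```python
import os

import requests

BASE = "https://api.scavio.dev"
HEADERS = {"Authorization": f"Bearer {os.environ['SCAVIO_API_KEY']}"}

# Search: use the "search" field, not "query"
results = requests.post(
    f"{BASE}/api/v1/youtube/search",
    headers=HEADERS,
    json={"search": "langchain tutorial", "type": "video", "sort_by": "view_count"},
).json()
video_id = results["data"][0]["videoId"]

# Transcript for RAG
transcript = requests.post(
    f"{BASE}/api/v1/youtube/transcript",
    headers=HEADERS,
    json={"video_id": video_id, "language": "en"},
).json()
text = " ".join(seg["text"] for seg in transcript["data"])
```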
Search Response
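```json
{
  "data": [
    {
      "videoId": "sVcwVQRHIc8",
      "title": "Learn RAG From Scratch - Python AI Tutorial",
      "channel": "freeCodeCamp.org",
      "publishedAt": "2024-04-17",
      "duration": "2:33:11",
      "viewCount": 1258310,
      "thumbnail": "https://i.ytimg.com/vi/sVcwVQRHIc8/hq720.jpg"
    }
  ],
  "credits_used": 1
}
```

Transcript response: `[{"text": "...", "start": 0.0, "duration": 3.2}]`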
Metadata response: title, view_count, like_count, comment_count, categories, tags, channel_id, upload_date, thumbnails.
Trainability response: has_transcript, transcript_languages, license, is_trainable.
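For RAG use, the per-segment `text`, `start`, and `duration` fields can be merged into larger, timestamped chunks before embedding. The sketch below is one simple way to do that and is not prescribed by the API; `chunk_segments` and the `max_chars` budget are hypothetical, and the sample segments are made up.

```python
def chunk_segments(segments, max_chars=500):
    """Merge consecutive transcript segments into chunks of at most
    max_chars characters, keeping each chunk's start and end time (seconds)."""
    chunks = []
    current = None
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue
        # Close the current chunk if adding this segment would exceed the budget.
        if current is not None and len(current["text"]) + 1 + len(text) > max_chars:
            chunks.append(current)
            current = None
        end = seg["start"] + seg["duration"]
        if current is None:
            current = {"text": text, "start": seg["start"], "end": end}
        else:
            current["text"] += " " + text
            current["end"] = end
    if current is not None:
        chunks.append(current)
    return chunks


# Made-up segments in the documented transcript shape.
segments = [
    {"text": "hello world", "start": 0.0, "duration": 2.0},
    {"text": "this is a test", "start": 2.0, "duration": 3.0},
    {"text": "of chunking", "start": 5.0, "duration": 1.5},
]
for chunk in chunk_segments(segments, max_chars=26):
    print(chunk)
```

Keeping `start` and `end` on every chunk lets an answer cite the moment in the video it came from.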
Guardrails
- The search parameter is `search`, not `query`. This is different from other Scavio endpoints.
- Never fabricate video IDs, view counts, or transcript content.
- Transcripts are raw captions and may lack punctuation. Summarize them clearly.
- If a transcript is not available, call `/trainability` to check why before reporting an error.
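The last guardrail can be sketched as a function that turns a trainability result into a user-facing explanation. The function name and the message wording are hypothetical; the sketch assumes the result is a plain dict with the documented `has_transcript` and `transcript_languages` fields.

```python
def explain_missing_transcript(trainability: dict, language: str = "en") -> str:
    """Explain why a transcript request may have failed, using only
    fields reported by the trainability check."""
    if not trainability.get("has_transcript", False):
        return "This video has no captions, so no transcript can be extracted."
    languages = trainability.get("transcript_languages", [])
    if language not in languages:
        available = ", ".join(languages) or "unknown"
        return (f"No '{language}' transcript is available. "
                f"Available languages: {available}.")
    return ("Captions exist for this language; "
            "the transcript request may have failed for another reason.")


print(explain_missing_transcript({"has_transcript": False}))
print(explain_missing_transcript(
    {"has_transcript": True, "transcript_languages": ["de", "fr"]}))
```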
Failure handling
- If search returns no results, suggest different keywords or relaxing filters.
- If transcript fails, check if the video has captions using `/trainability`.
- If `SCAVIO_API_KEY` is not set, prompt the user to export it before continuing.
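Two of these checks are easy to make explicit in code. The sketch below uses hypothetical helper names (`require_api_key`, `first_video_id`) and assumes that an empty search comes back as an empty or missing `data` list, which the documentation does not confirm.

```python
import os


class MissingApiKeyError(RuntimeError):
    """Raised when SCAVIO_API_KEY is not available in the environment."""


def require_api_key(env=None) -> str:
    """Return the API key, or raise with an instruction the user can act on."""
    env = os.environ if env is None else env
    key = (env.get("SCAVIO_API_KEY") or "").strip()
    if not key:
        raise MissingApiKeyError(
            "SCAVIO_API_KEY is not set. Run: export SCAVIO_API_KEY=<your key>"
        )
    return key


def first_video_id(search_response: dict):
    """Return the videoId of the top search result, or None when there are
    no results, so the caller can suggest other keywords or looser filters."""
    results = search_response.get("data") or []  # assumed shape for "no results"
    if not results:
        return None
    return results[0].get("videoId")


print(first_video_id({"data": []}))
print(first_video_id({"data": [{"videoId": "sVcwVQRHIc8"}]}))
```

Passing `env` explicitly makes the key check testable without touching the real environment.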
LangChain
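```bash
pip install scavio-langchain
```

```python
from scavio_langchain import ScavioSearchTool

tool = ScavioSearchTool(engine="youtube")
```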
YouTube Search and Transcripts via Scavio
Search YouTube, retrieve video metadata, extract transcripts, and check AI trainability. All endpoints return structured JSON.
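When to trigger
Use this skill when the user asks to:
- Find YouTube videos on a topic
- Get the transcript of a YouTube video (for summarization, RAG, Q&A)
- Check view counts, upload date, or video metadata
- Verify whether a video is suitable for AI training or has captions available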
Setup
Get a free API key at https://scavio.dev (1,000 free credits/month, no card required):

```bash
export SCAVIO_API_KEY=sk_live_your_key
```
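Workflow
1. Finding a video: call `/search` with the topic. Use `sort_by: view_count` for the most-watched result.
2. Getting content: if the user wants a summary, call `/transcript` with the `videoId` from the search results.
3. Checking metadata: call `/metadata` for view counts, likes, tags, or channel info.
4. RAG pipelines: inject transcript segments directly into context. Each segment contains `text`, `start`, and `duration`.
5. Trainability: call `/trainability` to check license and caption availability before ingesting.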
Endpoints
| Endpoint | Description |
|---|---|
| `POST https://api.scavio.dev/api/v1/youtube/search` | Search YouTube videos |
| `POST https://api.scavio.dev/api/v1/youtube/metadata` | Get structured video metadata |
| `POST https://api.scavio.dev/api/v1/youtube/transcript` | Extract video transcript |
| `POST https://api.scavio.dev/api/v1/youtube/trainability` | Check AI training suitability |

```
Authorization: Bearer $SCAVIO_API_KEY
```
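Search Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `search` | string | required | Search query. Note: this field is `search`, not `query` |
| `upload_date` | string | -- | `last_hour`, `today`, `this_week`, `this_month`, `this_year` |
| `type` | string | -- | `video`, `channel`, or `playlist` |
| `duration` | string | -- | `short` (< 4 min), `medium` (4-20 min), `long` (> 20 min) |
| `sort_by` | string | `relevance` | `relevance`, `date`, `view_count`, `rating` |
| `subtitles` | boolean | `false` | Videos with captions only |
| `creative_commons` | boolean | `false` | Creative Commons videos only |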
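Transcript Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `video_id` | string | required | YouTube video ID (e.g. `sVcwVQRHIc8`) |
| `language` | string | `en` | Language code for transcript |
| `transcript_origin` | string | -- | `auto_generated` or `uploader_provided` |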
Examples

```python
import os

import requests

BASE = "https://api.scavio.dev"
HEADERS = {"Authorization": f"Bearer {os.environ['SCAVIO_API_KEY']}"}

# Search: use the "search" field, not "query"
results = requests.post(
    f"{BASE}/api/v1/youtube/search",
    headers=HEADERS,
    json={"search": "langchain tutorial", "type": "video", "sort_by": "view_count"},
).json()
video_id = results["data"][0]["videoId"]

# Transcript for RAG
transcript = requests.post(
    f"{BASE}/api/v1/youtube/transcript",
    headers=HEADERS,
    json={"video_id": video_id, "language": "en"},
).json()
text = " ".join(seg["text"] for seg in transcript["data"])
```
Search Response

```json
{
  "data": [
    {
      "videoId": "sVcwVQRHIc8",
      "title": "Learn RAG From Scratch - Python AI Tutorial",
      "channel": "freeCodeCamp.org",
      "publishedAt": "2024-04-17",
      "duration": "2:33:11",
      "viewCount": 1258310,
      "thumbnail": "https://i.ytimg.com/vi/sVcwVQRHIc8/hq720.jpg"
    }
  ],
  "credits_used": 1
}
```

Transcript response: `[{"text": "...", "start": 0.0, "duration": 3.2}]`
Metadata response: title, view_count, like_count, comment_count, categories, tags, channel_id, upload_date, thumbnails.
Trainability response: has_transcript, transcript_languages, license, is_trainable.
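Guardrails
- The search parameter is `search`, not `query`. This is different from other Scavio endpoints.
- Never fabricate video IDs, view counts, or transcript content.
- Transcripts are raw captions and may lack punctuation. Summarize them clearly.
- If a transcript is not available, call `/trainability` to check why before reporting an error.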
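Failure handling
- If search returns no results, suggest different keywords or relaxing filters.
- If transcript fails, check whether the video has captions using `/trainability`.
- If `SCAVIO_API_KEY` is not set, prompt the user to export it before continuing.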
LangChain

```bash
pip install scavio-langchain
```

```python
from scavio_langchain import ScavioSearchTool

tool = ScavioSearchTool(engine="youtube")
```