Getting Started
Welcome! I'm here to transcribe your video into text — completely free, no account needed. Upload your mp4, mov, avi, webm, or mkv file and tell me what you'd like done with the transcript to get started.
Try saying:
- "Transcribe this mp4 interview recording and format it as a clean Q&A"
- "Convert my lecture video to text and highlight any key terms mentioned"
- "Extract all spoken dialogue from this webm clip and give me a plain text transcript"
On first use, the skill connects to NemoVideo automatically:
- Check for the NEMO_TOKEN env var, or acquire one via https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token (100 free credits)
- Persist a Client-ID to ~/.config/nemovideo/client_id for rate-limit tracking (UUID only, no secrets)
- Create a session and start working immediately
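The first-use steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function names are hypothetical, and the HTTP method (POST) and the X-Client-Id header on the anonymous-token call are assumptions, since the text only gives the endpoint URL. The request is built but never sent.

```python
import os
import uuid
import urllib.request
from pathlib import Path

TOKEN_URL = "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token"
DEFAULT_CLIENT_ID_PATH = Path.home() / ".config" / "nemovideo" / "client_id"


def load_or_create_client_id(path=DEFAULT_CLIENT_ID_PATH):
    """Return the persisted Client-ID, creating a new UUID on first use."""
    path = Path(path)
    if path.exists():
        stored = path.read_text(encoding="utf-8").strip()
        if stored:
            return stored
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id, encoding="utf-8")
    return client_id


def token_from_env(env=None):
    """Return NEMO_TOKEN if it is set and non-empty, otherwise None."""
    env = os.environ if env is None else env
    return env.get("NEMO_TOKEN") or None


def build_anonymous_token_request(client_id):
    """Build (but do not send) the anonymous-token request.

    Assumption: POST with an X-Client-Id header. The document names
    only the endpoint, not the method or the headers.
    """
    return urllib.request.Request(
        TOKEN_URL,
        method="POST",
        headers={"X-Client-Id": client_id},
    )
```

A caller would try token_from_env() first and fall back to the anonymous-token request only when it returns None.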
Turn Any Video Into Words — Instantly
Ever watched a video and wished you could just read it instead? Or recorded an interview and spent hours typing it out by hand? That's exactly the problem this skill solves. Drop in your video file and get back a full transcript — word for word — without touching a single setting or signing up for anything.
This skill is built for real-world use cases: pulling quotes from a recorded webinar, generating subtitles for a short film, turning a lecture recording into study notes, or extracting dialogue from a client call. Whatever the source, the output is clean, structured text you can immediately work with.
Unlike browser-based tools that cap your file length or hide accuracy behind a pricing tier, this video-to-text-free skill gives you full transcription without compromise. It handles casual speech, multiple speakers, and background noise reasonably well — making it a practical everyday tool rather than a demo. Whether you're a solo creator or a researcher processing dozens of clips, this skill fits into your workflow without friction.
Environment
| Variable | Default | Purpose |
|---|---|---|
| NEMO_TOKEN | Auto-acquired on first use (100 free credits, 7-day expiry) | API authentication |
| NEMO_API_URL | https://mega-api-prod.nemovideo.ai | API base URL |
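As a rough sketch of how these two settings might be resolved at startup: the variable names are taken to be NEMO_TOKEN and NEMO_API_URL, and the NemoConfig type and load_config function are hypothetical names introduced only for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_API_URL = "https://mega-api-prod.nemovideo.ai"


@dataclass(frozen=True)
class NemoConfig:
    token: Optional[str]  # None means an anonymous token still has to be acquired
    api_url: str


def load_config(env: Optional[Mapping[str, str]] = None) -> NemoConfig:
    """Read the two environment variables, applying the documented default URL."""
    env = os.environ if env is None else env
    token = env.get("NEMO_TOKEN") or None
    # Strip a trailing slash so endpoint paths can be appended uniformly.
    api_url = (env.get("NEMO_API_URL") or DEFAULT_API_URL).rstrip("/")
    return NemoConfig(token=token, api_url=api_url)
```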
Routing Your Transcription Requests
Every video-to-text request is parsed for file format, language hints, and timestamp preferences before being dispatched to the appropriate NemoVideo transcription endpoint.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
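The routing table could be approximated with a simple keyword matcher like the one below. It is only a sketch under stated assumptions: case-insensitive substring matching, first-match-wins in table order, and an attached file routing to Upload when no keyword matches are all guesses about behavior the table does not specify. The Route type and route function are hypothetical names.

```python
from typing import NamedTuple


class Route(NamedTuple):
    section: str
    skip_sse: bool


# (keywords, section) pairs, in the same order as the table rows.
_KEYWORD_ROUTES = [
    (("export", "导出", "download", "send me the video"), "§3.5 Export"),
    (("credits", "积分", "balance", "余额"), "§3.3 Credits"),
    (("status", "状态", "show tracks"), "§3.4 State"),
    (("upload", "上传"), "§3.2 Upload"),
]


def route(message: str, has_file: bool = False) -> Route:
    """Pick a handler section for a user message (assumed matching rules)."""
    text = message.lower()
    for keywords, section in _KEYWORD_ROUTES:
        if any(keyword in text for keyword in keywords):
            return Route(section, skip_sse=True)
    if has_file:
        # The table's "user sends file" case, applied when no keyword matched.
        return Route("§3.2 Upload", skip_sse=True)
    # Everything else goes through the SSE message endpoint.
    return Route("§3.1 SSE", skip_sse=False)
```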
NemoVideo API Reference
The NemoVideo backend accepts uploaded video files or direct URLs, extracts the audio track, and runs it through a speech-recognition pipeline that returns a full verbatim transcript with optional word-level timestamps. Supported formats include MP4, MOV, AVI, MKV, and WebM.
Required headers on all requests: X-Skill-Source: $SKILL_NAME, X-Skill-Version: $SKILL_VERSION, X-Skill-Platform: $SKILL_SOURCE
Create session: POST $API/api/tasks/me/with-session/nemo_agent; returns task_id and session_id
Send message (SSE): POST $API/run_sse with session_id and user message. Stream responses; ~30% of edits return no text (query state to confirm changes).
Upload: POST $API/api/upload-video/nemo_agent/me/<sid> — file or URL upload. Supports: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
Check credits: GET $API/api/credits/balance/simple
Query state: GET $API/api/state/nemo_agent/me/<sid>/latest — check draft, tracks, generated media
Export: POST $API/api/render/proxy/lambda — export does NOT cost credits. Poll GET $API/api/render/proxy/lambda/<id> until status: completed.
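Task link: $WEB/workspace/claim?token=$TOKEN&task={task_id}&session={session_id}&skill_name=$SKILL_NAME&skill_version=$SKILL_VERSION&skill_source=$SKILL_SOURCE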
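To make the reference above more concrete, here is a small sketch of the shared headers, the endpoint URLs, and the export polling loop. Several things are assumptions: the third header is taken to be X-Skill-Platform: $SKILL_SOURCE, the poll response is assumed to be a JSON object with a "status" key, and the polling interval and attempt limit are arbitrary. All function names are hypothetical. How the token is transmitted is not stated in this section, so no authorization header is shown, and the HTTP call itself is injected as fetch_status so that nothing here touches the network.

```python
import time
from typing import Callable, Dict


def skill_headers(skill_name: str, skill_version: str, skill_source: str) -> Dict[str, str]:
    """Headers required on every request, per the reference above."""
    return {
        "X-Skill-Source": skill_name,
        "X-Skill-Version": skill_version,
        "X-Skill-Platform": skill_source,
    }


def endpoints(api: str, session_id: str) -> Dict[str, str]:
    """Expand $API and <sid> into the URLs listed in the reference."""
    api = api.rstrip("/")
    return {
        "create_session": f"{api}/api/tasks/me/with-session/nemo_agent",
        "send_message": f"{api}/run_sse",
        "upload": f"{api}/api/upload-video/nemo_agent/me/{session_id}",
        "credits": f"{api}/api/credits/balance/simple",
        "state": f"{api}/api/state/nemo_agent/me/{session_id}/latest",
        "export": f"{api}/api/render/proxy/lambda",
    }


def wait_for_export(
    fetch_status: Callable[[str], dict],
    render_id: str,
    interval: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the render reports status 'completed' (assumed payload shape)."""
    for attempt in range(max_polls):
        payload = fetch_status(render_id)
        if payload.get("status") == "completed":
            return payload
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError(f"render {render_id} not completed after {max_polls} polls")
```

In real use, fetch_status would issue GET $API/api/render/proxy/lambda/<id> with the headers from skill_headers and return the decoded JSON body.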
Common Errors
If your token has expired, simply re-authenticate through the ClawHub interface to generate a fresh session token before resubmitting your video. A 'session not found' error means your session has lapsed — start a new session and re-upload your file. Hitting a credit wall? Head to nemovideo.ai to register for a free account and unlock your transcription credits.
Troubleshooting
If your transcript comes back with gaps or unclear sections, the most common culprit is audio quality in the original video. Background music, overlapping voices, or a low-quality microphone can reduce accuracy. Try uploading a version of the video with the cleanest audio track possible — even running it through a basic noise reduction tool beforehand can make a noticeable difference.
For very long videos, processing may take a moment longer than expected. If the skill appears to stall, check that your file isn't corrupted by playing it in a local media player first. Files that won't play locally won't transcribe correctly either.
If you're uploading an avi or mkv file and getting an error, confirm the file isn't using an uncommon or legacy codec. Re-exporting to mp4 using a tool like VLC or HandBrake usually resolves codec compatibility issues instantly. When in doubt, mp4 is the most reliable format to use with this video-to-text-free skill.
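A pre-upload check along these lines can catch format problems before any credits or time are spent. It is only an illustrative sketch: the function name and the three result labels are hypothetical, and it inspects the file extension only, so it cannot detect the codec issues described above.

```python
from pathlib import PurePath

# Video extensions the skill lists as supported.
SUPPORTED_VIDEO = {"mp4", "mov", "avi", "webm", "mkv"}
# Containers the troubleshooting notes single out for codec problems.
CODEC_PRONE = {"avi", "mkv"}


def check_video_file(filename: str) -> str:
    """Classify a file by extension before uploading it."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_VIDEO:
        return "unsupported"
    if ext in CODEC_PRONE:
        return "ok-but-reexport-to-mp4-on-error"
    return "ok"
```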
Common Workflows
One of the most popular ways people use this skill is for content repurposing. Upload a YouTube video or recorded stream, get the transcript, then use it as the basis for a blog post, newsletter, or social media thread — turning one piece of content into several without writing from scratch.
For researchers and journalists, the workflow typically looks like this: record an interview, upload the file, receive the transcript, then copy it into a document for annotation and quoting. It cuts transcription time from hours to minutes and keeps the original wording intact.
Students often upload recorded lectures or study group sessions and ask for the transcript formatted as bullet-point notes. This is a great way to make video content scannable and easier to review before exams.
Podcast creators also use this video-to-text-free skill to generate show notes. Upload the video version of an episode, transcribe it, then ask for a summary of key topics covered — a complete show notes draft in one step.