Getting Started
Welcome! You're one text prompt away from a finished video — this skill turns your words, scripts, or story ideas into real video content using AI, completely free. Paste your script or describe your video concept to get started right now.
Try saying:
- "Turn my script into a video"
- "Generate scenes from this text"
- "Create a promo video from description"
Automatic Setup
On first interaction, connect to the processing API before doing anything else. Show a brief status like "Setting things up...".
Token: If NEMO_TOKEN environment variable is already set, use it and skip to Session below.
Free token: Generate a UUID as client identifier, then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id: <uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry).
Session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.
Confirm to the user you're connected and ready. Don't print tokens or raw JSON.
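The setup steps above can be sketched with the Python standard library. This is a minimal, hedged sketch, not an official client: the helper names are hypothetical, and the attribution headers described in the API reference are left out for brevity (a real client must add them to every request). The response path data.token is the one documented above; where session_id sits in the session response is not documented, so it is not parsed here.

```python
import json
import os
import urllib.request
import uuid

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_anonymous_token_request(client_id: str) -> urllib.request.Request:
    """POST /api/auth/anonymous-token, identified by the X-Client-Id header."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def build_session_request(token: str) -> urllib.request.Request:
    """POST the documented session body with Bearer auth."""
    body = json.dumps({"task_name": "project"}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def get_token() -> str:
    """Reuse NEMO_TOKEN when set; otherwise fetch a free anonymous token."""
    token = os.environ.get("NEMO_TOKEN")
    if token:
        return token
    # A fresh UUID serves as the client identifier.
    req = build_anonymous_token_request(str(uuid.uuid4()))
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return payload["data"]["token"]  # documented field: data.token
```

Never echo the returned token to the user; only confirm that the connection succeeded.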
From Blank Page to Finished Video — Instantly
Most people have ideas but no way to turn them into video. Hiring editors is expensive. Learning software takes weeks. And staring at a blank timeline is frustrating when all you have is a good script and a deadline. This skill exists to close that gap entirely.
With the AI Text to Video Generator Free skill, you write your message — a product pitch, a how-to guide, a short story, a social media hook — and the AI handles the visual storytelling. It interprets your text, builds a scene structure, selects appropriate visual flow, and assembles a cohesive video that matches your tone and intent.
Whether you're producing explainer content for a startup, educational clips for a classroom, or quick promo videos for Instagram and TikTok, this skill gives you a fast, zero-cost path from idea to finished video. No subscriptions, no timeline dragging, no rendering headaches — just text in, video out.
Prompt Routing and Video Dispatch
When you submit a text prompt, the skill parses your scene descriptors, style tags, and duration parameters before routing the generation request to the optimal AI rendering pipeline.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
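The routing table can be read as a small dispatcher. The sketch below is an assumption about how to apply it: it uses a naive case-insensitive substring match on the trigger phrases from the table, and the handler names are hypothetical labels for the §3.x sections.

```python
# Trigger phrases copied from the routing table, checked in table order.
# Substring matching is a simplifying assumption, not a documented rule.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),  # §3.5
    (("credits", "积分", "balance", "余额"), "credits"),               # §3.3
    (("status", "状态", "show tracks"), "state"),                      # §3.4
    (("upload", "上传"), "upload"),                                     # §3.2
]


def route(message: str, has_attachment: bool = False) -> tuple[str, bool]:
    """Return (handler, skip_sse) for one user message."""
    if has_attachment:
        return "upload", True  # user sent a file
    text = message.lower()
    for triggers, handler in ROUTES:
        if any(trigger in text for trigger in triggers):
            return handler, True
    return "sse", False  # generate, edit, add BGM, and everything else
```

Plain substring matching will misroute a creative request that happens to mention a trigger word (for example, a video about a bank "balance"), so a real implementation would likely check intent more carefully.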
Cloud Rendering API Reference
The free-tier video synthesis backend processes your text-to-video requests through a distributed diffusion model cluster, handling keyframe interpolation, voiceover synthesis, and scene transitions entirely in the cloud — no local GPU required. Rendered video assets are temporarily cached on the CDN edge node closest to your region for fast preview and download.
Skill attribution (read from this file's YAML frontmatter at runtime):
- X-Skill-Source: ai-text-to-video-generator-free
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
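A minimal sketch of assembling the required headers, assuming the caller has already read version from the YAML frontmatter (frontmatter parsing is not shown) and knows the skill's install directory. The function names are hypothetical.

```python
import os

SKILL_SOURCE = "ai-text-to-video-generator-free"


def detect_platform(install_path: str) -> str:
    """Map the skill's install directory to an X-Skill-Platform value."""
    path = os.path.abspath(os.path.expanduser(install_path))
    home = os.path.expanduser("~")
    if path.startswith(os.path.join(home, ".clawhub") + os.sep):
        return "clawhub"
    if path.startswith(os.path.join(home, ".cursor", "skills") + os.sep):
        return "cursor"
    return "unknown"


def attribution_headers(token: str, version: str, install_path: str) -> dict[str, str]:
    """Headers every request must carry; omitting them makes export fail with 402."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": SKILL_SOURCE,
        "X-Skill-Version": version,  # value of the frontmatter version field
        "X-Skill-Platform": detect_platform(install_path),
    }
```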
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
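The message body is fixed by the reference above; reading the response is a matter of splitting the text/event-stream into events. The parser below is a simplified reading of standard SSE framing (data: lines accumulate, a blank line ends the event, comment lines starting with ":" are ignored). It assumes the backend uses plain data: events, which the reference does not spell out.

```python
def build_sse_body(session_id: str, text: str) -> dict:
    """Body for POST /run_sse, as documented (send with Accept: text/event-stream)."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }


def iter_sse_data(lines):
    """Yield the joined data payload of each SSE event from an iterable of lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # one leading space after the colon is dropped
            buffer.append(value)
        # Other fields (event:, id:) and comments (":") are ignored in this sketch.
    if buffer:
        yield "\n".join(buffer)
```

An empty "data:" line yields an empty string, which the event table treats as a heartbeat: keep waiting.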
Upload: POST /api/upload-video/nemo_agent/me/<sid>. For a file, send multipart -F "files=@/path"; for a URL, send {"urls":["<url>"],"source_type":"url"}.
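For the URL variant, the upload is a plain JSON POST. This sketch assumes the JSON body shape {"urls": [...], "source_type": "url"} and a JSON content type; multipart file upload is easiest with the documented curl form (-F "files=@/path") and is not reimplemented here. The helper name is hypothetical.

```python
import json
import urllib.request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_url_upload(session_id: str, urls: list[str], token: str) -> urllib.request.Request:
    """POST remote media URLs into the session instead of uploading a file."""
    body = {"urls": urls, "source_type": "url"}  # assumed body shape
    return urllib.request.Request(
        f"{API_BASE}/api/upload-video/nemo_agent/me/{session_id}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```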
Credits: GET /api/credits/balance/simple, which returns available, frozen, and total.
Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media
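Both read-only endpoints are simple authenticated GETs. The sketch below builds them and pulls the documented key fields out of an already-parsed state response; helper names are hypothetical, and missing fields come back as None rather than raising.

```python
import urllib.request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_get(path: str, token: str) -> urllib.request.Request:
    """Authenticated GET request (attribution headers omitted for brevity)."""
    return urllib.request.Request(API_BASE + path, headers={"Authorization": f"Bearer {token}"})


def credits_request(token: str) -> urllib.request.Request:
    return build_get("/api/credits/balance/simple", token)


def state_request(session_id: str, token: str) -> urllib.request.Request:
    return build_get(f"/api/state/nemo_agent/me/{session_id}/latest", token)


def state_key_fields(state_response: dict) -> dict:
    """Extract data.state.draft, video_infos and generated_media from parsed JSON."""
    state = state_response.get("data", {}).get("state", {})
    return {
        "draft": state.get("draft"),
        "video_infos": state.get("video_infos"),
        "generated_media": state.get("generated_media"),
    }
```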
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
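The export flow is a submit-then-poll loop. In this sketch, fetch_status is a hypothetical caller-supplied function that performs GET /api/render/proxy/lambda/<id> and returns parsed JSON. It assumes status and output.url sit at the top level of that JSON and that <ts> is a Unix timestamp in seconds; neither detail is spelled out above. The max_polls cap is an arbitrary safety limit, not a documented value.

```python
import time


def build_export_body(session_id: str, draft: dict) -> dict:
    """Render request body as documented; id is 'render_' plus a timestamp."""
    return {
        "id": f"render_{int(time.time())}",  # assumed: seconds since the epoch
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_render(fetch_status, render_id: str, interval: float = 30.0, max_polls: int = 60) -> str:
    """Poll until status == 'completed' and return the download URL (output.url)."""
    for _ in range(max_polls):
        job = fetch_status(render_id)
        if job.get("status") == "completed":
            return job["output"]["url"]
        time.sleep(interval)  # documented polling interval: every 30s
    raise TimeoutError(f"render {render_id} not completed after {max_polls} polls")
```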
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
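Checking the extension locally before uploading avoids a round trip that ends in error 4001. This sketch takes the list above literally, so an extension it does not name (such as .jpeg) is rejected even if the backend might accept it.

```python
import os

SUPPORTED_EXTENSIONS = {
    "mp4", "mov", "avi", "webm", "mkv",   # video
    "jpg", "png", "gif", "webp",          # image
    "mp3", "wav", "m4a", "aac",           # audio
}


def is_supported(filename: str) -> bool:
    """True when the file extension (case-insensitive) is in the supported list."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    return ext in SUPPORTED_EXTENSIONS
```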
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
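The event table and the no-text fallback can be combined into one loop. This is a sketch under stated assumptions: events arrive already classified as ("text" | "tool" | "heartbeat", payload) pairs (that classification is hypothetical, since the raw event schema is not documented here), text chunks are simply concatenated, and notify and poll_state are caller-supplied callbacks.

```python
import time


def handle_stream(events, notify, poll_state, clock=time.monotonic, notice_every=120.0):
    """Apply the SSE event table; fall back to session state when no text arrives."""
    texts = []
    last_notice = clock()
    for kind, payload in events:
        if kind == "text":
            texts.append(payload)  # later passed through GUI translation (§4)
        elif kind == "tool":
            continue               # processed internally, never forwarded
        elif kind == "heartbeat":
            now = clock()
            if now - last_notice >= notice_every:  # every 2 minutes
                notify("⏳ Still working...")
                last_notice = now
    # The stream has closed.
    if texts:
        return "".join(texts)
    # No text in the stream: verify the edit via session state and summarize it.
    return poll_state()
```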
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
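A minimal sketch of the translation table as a lookup, assuming substring matching on the backend text. The more specific cues are checked first, so "click the Export button" maps to the export workflow rather than to a generic click; the returned action strings are illustrative labels.

```python
from typing import Optional

# Ordered most specific first; cues come from the translation table.
GUI_TRANSLATIONS = [
    (("export button", "导出"), "run the export workflow"),
    (("preview in timeline",), "show a track summary"),
    (("drag", "drop", "拖拽"), "send the edit via SSE"),
    (("open", "打开"), "query session state"),
    (("click", "点击"), "execute the action via the API"),
]


def translate_backend_hint(text: str) -> Optional[str]:
    """Map GUI-oriented backend wording to the API action to take, if any."""
    lowered = text.lower()
    for cues, action in GUI_TRANSLATIONS:
        if any(cue in lowered for cue in cues):
            return action
    return None  # nothing GUI-specific: present the text as-is
```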
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
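```
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "City Dreams" (0-3s)
```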
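Using the draft field mapping, a track summary can be generated from data.state.draft. This sketch assumes the draft nests as {"t": [{"tt": ..., "sg": [{"d": ..., "m": {...}}]}]}; the mapping lists field names but not their nesting. Segment start times are not part of the mapping, so the summary reports segment counts and total duration per track instead of time ranges.

```python
TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}  # tt values from the mapping


def summarize_draft(draft: dict) -> str:
    """Render a short, user-facing timeline summary from a compact draft."""
    tracks = draft.get("t", [])
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for i, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), f"type {track.get('tt')}")
        segments = track.get("sg", [])
        total_ms = sum(seg.get("d", 0) for seg in segments)  # d is in milliseconds
        lines.append(f"{i}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s")
    return "\n".join(lines)
```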
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
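The error table maps each code to one next step. A minimal sketch with hypothetical action labels; the only stateful rule is that a 429 is retried once, so a second 429 gives up.

```python
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauth",                 # request a new anonymous token
    1002: "new_session",            # create a session again (§3.0)
    2001: "no_credits",             # registration URL or top-up message
    4001: "unsupported_file",       # show the supported formats
    4002: "file_too_large",         # suggest compressing or trimming
    400: "new_client_id",           # generate X-Client-Id and retry
    402: "plan_blocks_export",      # subscription tier, not credits
    429: "retry_once_after_30s",
}


def next_action(code: int, already_retried: bool = False) -> str:
    """Pick the next step for a response code; unknown codes are surfaced as-is."""
    action = ERROR_ACTIONS.get(code, "unknown_error")
    if action == "retry_once_after_30s" and already_retried:
        return "give_up"  # the table allows exactly one retry
    return action
```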
Best Practices for AI Text to Video Generation
The quality of your output is directly tied to the clarity of your input. Vague prompts like 'make a video about health' produce generic results. Instead, specify your audience, tone, video length, and purpose — for example: 'A 45-second upbeat video for millennials about meal prepping on a budget.'
Break your text into logical segments if you're working with longer scripts. The AI performs best when it can map distinct ideas to distinct scenes. Think in chunks: hook, problem, solution, call to action.
For social media content, always mention the platform. A LinkedIn explainer video has a very different pace and tone from a TikTok. Naming the platform helps the AI calibrate visual energy, text density, and scene length appropriately.
Finally, include emotional cues when relevant. Words like 'urgent,' 'inspiring,' 'calm,' or 'playful' guide the AI toward the right visual atmosphere for your message.
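The advice above amounts to filling in a few slots before writing the prompt. This sketch of a prompt builder is purely illustrative (the function and its parameters are hypothetical): it reproduces the example prompt from this section and optionally appends the platform, a scene-count constraint, and hook/problem/solution/call-to-action beats.

```python
from typing import Optional, Sequence, Tuple


def build_prompt(topic: str, audience: str, tone: str, seconds: int,
                 platform: Optional[str] = None,
                 scenes: Optional[int] = None,
                 beats: Optional[Sequence[Tuple[str, str]]] = None) -> str:
    """Assemble a prompt that states length, tone, audience, and purpose."""
    parts = [f"A {seconds}-second {tone} video for {audience} about {topic}."]
    if platform:
        parts.append(f"Platform: {platform}.")  # helps calibrate pacing
    if scenes:
        parts.append(f"{scenes} scenes.")       # explicit scene-count constraint
    if beats:
        # One beat per scene-sized chunk, e.g. hook, problem, solution, call to action.
        parts.append("Structure: " + "; ".join(f"{name}: {text}" for name, text in beats) + ".")
    return " ".join(parts)
```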
Troubleshooting Common Issues
If your generated video structure feels off or scenes don't match your intent, the most common cause is an ambiguous prompt. Re-read your input and ask: could this be interpreted multiple ways? If yes, tighten the language and resubmit.
If the AI is generating too many scenes for a short video, explicitly state your target duration and scene count — for example: '3 scenes, under 30 seconds, fast-paced.' Constraints help the model stay focused.
When working with technical or niche topics, avoid heavy jargon without context. The AI may misinterpret industry-specific terms. Either define them briefly in your prompt or use plain-language equivalents.
If your output feels generic despite a detailed prompt, try reframing your input as a story rather than a list of facts. Narrative structure — character, conflict, resolution — consistently produces more dynamic and engaging video concepts than bullet-point style descriptions.
FAQ — AI Text to Video Generator Free
Do I need any video editing experience to use this? No. This skill is built for people who have never touched a timeline. You provide text; the AI handles all structural and visual decision-making.
What types of videos can I generate? Explainers, social media shorts, product demos, educational content, storytelling videos, promotional clips, and more. If it can be scripted, it can be generated.
Is there a word or character limit for my input text? For best results, keep your input focused — roughly equivalent to a 30 to 90 second spoken script. Longer inputs can be broken into multiple generation requests.
Can I use this for commercial projects? Yes. The generated video concepts, scripts, and scene structures are yours to use, adapt, and produce commercially. Always review final output to ensure it aligns with your brand voice before publishing.