Getting Started

Welcome! You're one text prompt away from a finished video — this skill turns your words, scripts, or story ideas into real video content using AI, completely free. Paste your script or describe your video concept to get started right now.

Try saying:

- "Turn my script into a video"
- "Generate scenes from this text"
- "Create a promo video from description"

Automatic Setup

On first interaction, connect to the processing API before doing anything else. Show a brief status like "Setting things up...".

Token: If the NEMO_TOKEN environment variable is already set, use it and skip to Session below.

Free token: Generate a UUID as client identifier, then POST to https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id: <uuid>. The response field data.token becomes your NEMO_TOKEN (100 credits, 7-day expiry).
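The token steps above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the helper names are hypothetical, and it assumes the endpoint needs no request body beyond the X-Client-Id header. The request is built but not sent, so the sketch can be inspected offline.

```python
import json
import os
import urllib.request
import uuid

AUTH_URL = "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token"


def existing_token(env=None):
    """Return NEMO_TOKEN from the environment, or None if unset or empty."""
    env = os.environ if env is None else env
    return env.get("NEMO_TOKEN") or None


def build_anonymous_token_request(client_id=None):
    """Build (but do not send) the POST request for a free anonymous token."""
    # A fresh UUID acts as the client identifier when none is supplied.
    client_id = client_id or str(uuid.uuid4())
    # Assumption: no request body is required, only the X-Client-Id header.
    return urllib.request.Request(
        AUTH_URL,
        headers={"X-Client-Id": client_id},
        method="POST",
    )


def extract_token(response_text):
    """Read data.token from the JSON response text."""
    return json.loads(response_text)["data"]["token"]
```

To send the request, pass it to `urllib.request.urlopen(req, timeout=30)` and hand the decoded body to `extract_token`. Keep the token in memory only and never print it.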

Session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Save session_id from the response.

Confirm to the user you're connected and ready. Don't print tokens or raw JSON.

From Blank Page to Finished Video — Instantly

Most people have ideas but no way to turn them into video. Hiring editors is expensive. Learning software takes weeks. And staring at a blank timeline is frustrating when all you have is a good script and a deadline. This skill exists to close that gap entirely.

With the AI Text to Video Generator Free skill, you write your message (a product pitch, a how-to guide, a short story, a social media hook) and the AI handles the visual storytelling. It interprets your text, builds a scene structure, selects an appropriate visual flow, and assembles a cohesive video that matches your tone and intent.

Whether you're producing explainer content for a startup, educational clips for a classroom, or quick promo videos for Instagram and TikTok, this skill gives you a fast, zero-cost path from idea to finished video. No subscriptions, no timeline dragging, no rendering headaches — just text in, video out.

Prompt Routing and Video Dispatch

When you submit a text prompt, the skill parses your scene descriptors, style tags, and duration parameters before routing the generation request to the optimal AI rendering pipeline.

| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
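The routing table can be expressed as a small lookup. The sketch below assumes case-insensitive substring matching, which the table does not specify; the function and action names are hypothetical.

```python
# Keyword groups copied from the routing table, each mapped to an action label.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "export"),
    (("credits", "积分", "balance", "余额"), "credits"),
    (("status", "状态", "show tracks"), "state"),
    (("upload", "上传"), "upload"),
]


def route(message, has_file=False):
    """Return (action, uses_sse) for a user message."""
    # A file attachment always goes to the upload path.
    if has_file:
        return ("upload", False)
    text = message.lower()
    for keywords, action in ROUTES:
        # Assumption: a keyword anywhere in the message triggers the route.
        if any(keyword in text for keyword in keywords):
            return (action, False)
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return ("sse", True)
```

Substring matching is deliberately simple and can misfire: a request such as "add a status bar caption" would be routed to the state check. A stricter matcher may be preferable in practice.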

Cloud Rendering API Reference

The free-tier video synthesis backend processes your text-to-video requests through a distributed diffusion model cluster, handling keyframe interpolation, voiceover synthesis, and scene transitions entirely in the cloud — no local GPU required. Rendered video assets are temporarily cached on the CDN edge node closest to your region for fast preview and download.

Skill attribution — read from this file's YAML frontmatter at runtime:

- X-Skill-Source: ai-text-to-video-generator-free
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
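A minimal sketch of assembling the required headers, assuming the skill name and version have already been read from the frontmatter and are passed in as plain strings. The function names are hypothetical.

```python
def detect_platform(install_path):
    """Map the skill's install path to a platform label."""
    # Normalise Windows separators so one check covers both styles.
    normalized = install_path.replace("\\", "/")
    if "/.clawhub/" in normalized:
        return "clawhub"
    if "/.cursor/skills/" in normalized:
        return "cursor"
    return "unknown"


def build_headers(token, skill_source, skill_version, install_path):
    """Return the four headers every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": skill_source,
        "X-Skill-Version": skill_version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```

Building the headers in one place makes it harder to omit an attribution header, which is the condition that makes export fail with 402.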

API base: https://mega-api-prod.nemovideo.ai

Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.
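As an illustration, the session request could be built like this. The Content-Type header, the default language value "en", and the helper names are assumptions. The reference says only that task_id and session_id are returned, so the parser accepts them either at the top level or under a data wrapper.

```python
import json
import urllib.request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_session_request(headers, language="en", task_name="project"):
    """Build (but do not send) the create-session POST request."""
    body = json.dumps({"task_name": task_name, "language": language})
    request_headers = dict(headers)
    # Assumption: the API expects a JSON content type for this body.
    request_headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        API_BASE + "/api/tasks/me/with-session/nemo_agent",
        data=body.encode("utf-8"),
        headers=request_headers,
        method="POST",
    )


def parse_session_response(response_text):
    """Return (task_id, session_id) from the response text."""
    payload = json.loads(response_text)
    # Assumption: the ids may be wrapped in "data" like the token response.
    data = payload.get("data", payload)
    return data["task_id"], data["session_id"]
```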

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
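The message body and a basic reader for the event stream might look like the sketch below. The parser follows the generic Server-Sent Events line format; the structure of each event's payload is not documented here, so it is returned as raw text. Function names are hypothetical.

```python
def build_sse_body(session_id, message):
    """Return the JSON-serialisable body for POST /run_sse."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": message}]},
    }


def iter_sse_data(lines):
    """Yield the data payload of each complete event from decoded lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[5:]
            # One leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (":...") and other fields are ignored in this sketch.
    # An event cut off before its closing blank line is dropped.
```

When reading a real response, decode each line as UTF-8 before passing it in, and set the client timeout to cover the 15-minute maximum.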

Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple — returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media
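For reference, the upload, credits, and state paths can be built with small helpers such as these. The helper names are hypothetical, and percent-encoding the session id is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import quote

API_BASE = "https://mega-api-prod.nemovideo.ai"


def upload_url(session_id):
    """URL for POST uploads tied to a session."""
    return f"{API_BASE}/api/upload-video/nemo_agent/me/{quote(session_id, safe='')}"


def credits_url():
    """URL for the GET credit balance check."""
    return f"{API_BASE}/api/credits/balance/simple"


def state_url(session_id):
    """URL for the GET latest-session-state check."""
    return f"{API_BASE}/api/state/nemo_agent/me/{quote(session_id, safe='')}/latest"


def pick_state_fields(payload):
    """Return draft and video_infos from a parsed session-state response."""
    state = payload.get("data", {}).get("state", {})
    return {"draft": state.get("draft"), "video_infos": state.get("video_infos")}
```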

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
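The export flow is a submit-then-poll loop, sketched below under several assumptions: the <ts> placeholder is taken to be a Unix timestamp in seconds, status and output.url are read from the top level of the poll response, and the cap of 60 polls is arbitrary. `fetch_status` is a hypothetical callable that performs the GET and returns the parsed JSON.

```python
import time


def build_export_body(session_id, draft, now=None):
    """Return the body for POST /api/render/proxy/lambda."""
    # Assumption: <ts> is a Unix timestamp in whole seconds.
    timestamp = int(time.time() if now is None else now)
    return {
        "id": f"render_{timestamp}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_until_completed(fetch_status, interval=30.0, max_polls=60, sleep=time.sleep):
    """Poll every `interval` seconds and return the download URL when done."""
    for attempt in range(max_polls):
        result = fetch_status()
        if result.get("status") == "completed":
            return result["output"]["url"]
        # Wait before the next poll, but not after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError("render did not complete within the polling limit")
```

Failure statuses are not documented in this reference, so the sketch only recognises completed and otherwise keeps polling until the limit is reached.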

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

SSE Event Handling

| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | |

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
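One way to handle a silent edit is to compare the draft before and after the turn. This sketch assumes both drafts were fetched from the session state endpoint and that the draft's t field holds the list of tracks; the function name and summary wording are hypothetical.

```python
def reply_for_turn(sse_texts, draft_before, draft_after):
    """Return the message to show the user after an editing turn."""
    text = " ".join(part.strip() for part in sse_texts if part.strip())
    if text:
        # The stream carried text, so present it as usual.
        return text
    if draft_before == draft_after:
        return "No change detected in the draft; the edit may not have been applied."
    # Assumption: "t" is the track list in the draft.
    before = len(draft_before.get("t", []))
    after = len(draft_after.get("t", []))
    return f"Edit applied. Tracks: {before} -> {after}."
```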

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | |

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

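Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "City Dreams" (0-3s)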
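The draft field mapping can be turned into a readable track summary along these lines. The nesting is an assumption: tracks are taken to be a list under t, each track carrying tt and a list of segments under sg, and each segment carrying its duration in d. The function name is hypothetical.

```python
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft):
    """Return one human-readable line per track in the draft."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        # d is in milliseconds; sum the segments and convert to seconds.
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s")
    return lines
```

A summary like this is what the user should see in place of raw draft JSON.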

Error Handling

| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | | |

Best Practices for AI Text to Video Generation

The quality of your output is directly tied to the clarity of your input. Vague prompts like 'make a video about health' produce generic results. Instead, specify your audience, tone, video length, and purpose — for example: 'A 45-second upbeat video for millennials about meal prepping on a budget.'

Break your text into logical segments if you're working with longer scripts. The AI performs best when it can map distinct ideas to distinct scenes. Think in chunks: hook, problem, solution, call to action.

For social media content, always mention the platform. A LinkedIn explainer video has very different pacing and tone from a TikTok clip. Naming the platform helps the AI calibrate visual energy, text density, and scene length appropriately.

Finally, include emotional cues when relevant. Words like 'urgent,' 'inspiring,' 'calm,' or 'playful' guide the AI toward the right visual atmosphere for your message.
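If you assemble prompts programmatically, the advice above can be captured in a small template. This is only an illustrative sketch; the function and its parameters are hypothetical and not part of the skill.

```python
def build_prompt(topic, audience, tone, seconds, platform=None, scenes=None):
    """Compose a specific prompt from audience, tone, length, and purpose."""
    parts = [f"A {seconds}-second {tone} video for {audience} about {topic}."]
    if platform:
        # Naming the platform helps calibrate pacing and text density.
        parts.append(f"Platform: {platform}.")
    if scenes:
        # An explicit scene count keeps short videos focused.
        parts.append(f"Use {scenes} scenes.")
    return " ".join(parts)
```

For example, `build_prompt("meal prepping on a budget", "millennials", "upbeat", 45)` reproduces the sample prompt given earlier in this section.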

Troubleshooting Common Issues

If your generated video structure feels off or scenes don't match your intent, the most common cause is an ambiguous prompt. Re-read your input and ask: could this be interpreted multiple ways? If yes, tighten the language and resubmit.

If the AI is generating too many scenes for a short video, explicitly state your target duration and scene count — for example: '3 scenes, under 30 seconds, fast-paced.' Constraints help the model stay focused.

When working with technical or niche topics, avoid heavy jargon without context. The AI may misinterpret industry-specific terms. Either define them briefly in your prompt or use plain-language equivalents.

If your output feels generic despite a detailed prompt, try reframing your input as a story rather than a list of facts. Narrative structure — character, conflict, resolution — consistently produces more dynamic and engaging video concepts than bullet-point style descriptions.

FAQ — AI Text to Video Generator Free

Do I need any video editing experience to use this? No. This skill is built for people who have never touched a timeline. You provide text; the AI handles all structural and visual decision-making.

What types of videos can I generate? Explainers, social media shorts, product demos, educational content, storytelling videos, promotional clips, and more. If it can be scripted, it can be generated.

Is there a word or character limit for my input text? For best results, keep your input focused — roughly equivalent to a 30 to 90 second spoken script. Longer inputs can be broken into multiple generation requests.

Can I use this for commercial projects? Yes. The generated video concepts, scripts, and scene structures are yours to use, adapt, and produce commercially. Always review final output to ensure it aligns with your brand voice before publishing.

