Getting Started
Welcome! I'm here to turn your descriptions into beautiful, custom-generated images using text-to-image-ai. Tell me what you'd like to visualize — be as detailed or as open-ended as you like — and let's create something together.
Try saying:
- "Generate a photorealistic image of a cozy coffee shop interior on a rainy afternoon with warm lighting and wooden furniture"
- "Create an illustration in a vintage travel poster style showing the coastline of Amalfi, Italy at sunset"
- "Make a futuristic product mockup of a sleek smartwatch with a dark matte finish on a clean white background"
Quick Start Setup
This skill connects to NemoVideo's cloud backend. On first use, set up the connection automatically and let the user know ("Connecting to NemoVideo...").
Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:
- Read ~/.config/nemovideo/client_id, or generate a UUID and save it there.
- POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with an X-Client-Id header.
- Extract data.token from the response. This is your NEMO_TOKEN (100 free credits, 7-day expiry).
Session: POST /api/tasks/me/with-session/nemo_agent at the same host with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.
Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.
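The token check and session setup can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not NemoVideo's client. The helper names are hypothetical. The functions only build `urllib.request.Request` objects, which you would send with `urllib.request.urlopen`. `extract_token` assumes the response JSON nests the token at `data.token`, as described above.

```python
import json
import os
import uuid
import urllib.request
from pathlib import Path
from typing import Optional

API_BASE = "https://mega-api-prod.nemovideo.ai"
CLIENT_ID_PATH = Path.home() / ".config" / "nemovideo" / "client_id"


def env_token() -> Optional[str]:
    """Token check: a NEMO_TOKEN already in the environment skips anonymous auth."""
    return os.environ.get("NEMO_TOKEN")


def load_or_create_client_id(path: Path = CLIENT_ID_PATH) -> str:
    """Read the saved client id, or generate a UUID and save it for next time."""
    if path.exists():
        saved = path.read_text().strip()
        if saved:
            return saved
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id)
    return client_id


def anonymous_token_request(client_id: str) -> urllib.request.Request:
    """POST /api/auth/anonymous-token, identified by the X-Client-Id header."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        data=b"",
        headers={"X-Client-Id": client_id},
        method="POST",
    )


def extract_token(response_body: bytes) -> str:
    """Assumed response shape: {"data": {"token": "..."}}."""
    return json.loads(response_body)["data"]["token"]


def session_request(token: str, language: Optional[str] = None) -> urllib.request.Request:
    """POST the session-creation call on the same host with Bearer auth."""
    body = {"task_name": "project"}
    if language:
        body["language"] = language  # optional field listed in the API reference
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

The attribution headers required by the API reference are left out here so the setup steps stay visible. A real request would merge them in.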
From Words on a Page to Images That Speak
Describing what you want to see has never been enough — until now. The text-to-image-ai skill on ClawHub takes your written prompts and transforms them into fully realized visuals, no design software or artistic background required. Whether you're building a brand campaign, drafting a storyboard, or just trying to visualize an idea that's been living in your head, this skill bridges the gap between imagination and output.
You can go broad or incredibly specific. Describe a misty mountain village at dawn, a futuristic city skyline in neon tones, or a minimalist logo concept — and receive a generated image that reflects your intent. The skill is built to interpret natural language, so you don't need to learn prompt engineering jargon to get great results.
This is a practical tool for content creators producing social media visuals, product teams exploring design directions, writers building visual references for their stories, and anyone who needs original imagery without the cost and time of a traditional creative process.
Prompt Routing and Model Dispatch
Every natural-language prompt you submit is parsed for style tokens, aspect ratio hints, and subject descriptors before being dispatched to the optimal diffusion pipeline.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
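The following is a minimal Python routing sketch for the table above. It assumes the trigger phrases are matched case-insensitively as substrings and that routes are checked in table order. The real skill may detect intent differently, and the route names are illustrative.

```python
from typing import Tuple

# (route, trigger phrases, skip SSE?) in table order. Triggers are matched
# case-insensitively as substrings of the user's message (an assumption).
ROUTES = [
    ("export", ("export", "导出", "download", "send me the video"), True),
    ("credits", ("credits", "积分", "balance", "余额"), True),
    ("state", ("status", "状态", "show tracks"), True),
    ("upload", ("upload", "上传"), True),
]


def route(message: str, has_attachment: bool = False) -> Tuple[str, bool]:
    """Return (route, skip_sse) for one user message."""
    if has_attachment:
        return ("upload", True)  # a user-sent file always goes to Upload
    text = message.lower()
    for name, triggers, skip_sse in ROUTES:
        if any(trigger in text for trigger in triggers):
            return (name, skip_sse)
    return ("sse", False)  # generate, edit, add BGM, ... go through SSE
```

Plain substring matching is deliberately crude. For example, "status" also matches "statuses", so a production router would want word boundaries.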
NemoVideo API Reference
The NemoVideo backend handles inference through a queued diffusion engine that processes your prompt, negative prompt, and sampling parameters — including steps, CFG scale, and seed — to return a high-resolution image URL. Latency varies based on model load and resolution tier selected.
Skill attribution — read from this file's YAML frontmatter at runtime:
- X-Skill-Source: text-to-image-ai
- X-Skill-Version: from the frontmatter version field
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)
All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
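One way to assemble these headers is sketched below in Python. The sketch makes several assumptions:
- The frontmatter is a leading `---`-delimited block with a top-level `version:` line. It is scanned by hand rather than with a YAML library.
- The version falls back to "unknown" when none is found.
- X-Skill-Source is the skill name, text-to-image-ai.

The function names are hypothetical.

```python
import os
from typing import Dict


def detect_platform(install_path: str) -> str:
    """Map the skill's install path to the X-Skill-Platform value."""
    path = os.path.expanduser(install_path)
    if path.startswith(os.path.expanduser("~/.clawhub/")):
        return "clawhub"
    if path.startswith(os.path.expanduser("~/.cursor/skills/")):
        return "cursor"
    return "unknown"


def read_frontmatter_version(skill_file_text: str) -> str:
    """Minimal frontmatter scan: only handles an unindented 'version: x' line."""
    lines = skill_file_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "unknown"
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep and key == "version":  # no strip, so nested keys are ignored
            return value.strip().strip("\"'")
    return "unknown"


def attribution_headers(token: str, skill_file_text: str, install_path: str) -> Dict[str, str]:
    """Auth plus the three attribution headers every request must carry."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "text-to-image-ai",
        "X-Skill-Version": read_frontmatter_version(skill_file_text),
        "X-Skill-Platform": detect_platform(install_path),
    }
```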
API base: https://mega-api-prod.nemovideo.ai
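Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}; it returns task_id and session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=<NEMO_TOKEN>&task=<task_id>&session=<session_id>&skill_name=text-to-image-ai&skill_version=1.0.0&skill_source=<platform>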
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
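The `/run_sse` request can be built in Python as shown in this minimal sketch. `run_sse_request` is a hypothetical helper that assumes `/run_sse` lives under the API base. Pass it the full header set, including the attribution headers, because it only adds `Accept` and `Content-Type`.

```python
import json
import urllib.request
from typing import Dict

API_BASE = "https://mega-api-prod.nemovideo.ai"
SSE_MAX_SECONDS = 15 * 60  # documented maximum for one SSE exchange


def run_sse_request(session_id: str, text: str, headers: Dict[str, str]) -> urllib.request.Request:
    """POST /run_sse with the user's message as a single text part."""
    payload = {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }
    all_headers = dict(headers)  # copy so the caller's dict is untouched
    all_headers["Accept"] = "text/event-stream"
    all_headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{API_BASE}/run_sse",
        data=json.dumps(payload).encode("utf-8"),
        headers=all_headers,
        method="POST",
    )
```

The timeout passed to `urllib.request.urlopen(req, timeout=...)` applies to individual blocking socket operations, not to the whole stream. Enforcing the 15-minute cap therefore means tracking a deadline while reading lines.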
Upload: POST /api/upload-video/nemo_agent/me/<sid>. For a file, send multipart -F "files=@/path"; for a URL, send {"urls":["<url>"],"source_type":"url"}.
Credits: GET /api/credits/balance/simple; returns available, frozen, and total.
Session state: GET /api/state/nemo_agent/me/<sid>/latest; key fields: data.state.draft, data.state.video_infos, data.state.generated_media.
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
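The export-and-poll loop is sketched below in Python. `fetch_status` is a hypothetical callable that performs the GET and returns the decoded JSON. The sketch assumes `status` and `output.url` sit at the top level of that JSON. The `max_polls` cap is an added safeguard, not part of the documented API.

```python
import time
from typing import Any, Callable, Dict


def render_body(session_id: str, draft: Dict[str, Any], ts: int) -> Dict[str, Any]:
    """Body for POST /api/render/proxy/lambda (export costs no credits)."""
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_render(
    fetch_status: Callable[[str], Dict[str, Any]],
    render_id: str,
    interval_s: float = 30.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll every interval_s seconds until status == "completed"; return output.url."""
    for _ in range(max_polls):
        job = fetch_status(render_id)  # GET /api/render/proxy/lambda/<id>
        if job.get("status") == "completed":
            return job["output"]["url"]
        sleep(interval_s)
    raise TimeoutError(f"render {render_id} not completed after {max_polls} polls")
```

Injecting `sleep` keeps the loop testable without waiting 30 seconds per poll.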
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
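This small Python sketch checks files against the supported-format list before uploading and builds the URL-upload body. It makes two assumptions:
- The format is judged by file extension alone.
- The URL body has the shape `{"urls": [...], "source_type": "url"}`.

The helper names are hypothetical.

```python
import json
from pathlib import PurePosixPath
from typing import List

SUPPORTED_FORMATS = {
    "mp4", "mov", "avi", "webm", "mkv",  # video
    "jpg", "png", "gif", "webp",         # image
    "mp3", "wav", "m4a", "aac",          # audio
}


def is_supported(filename: str) -> bool:
    """Extension check only; the file contents are not inspected."""
    return PurePosixPath(filename).suffix.lower().lstrip(".") in SUPPORTED_FORMATS


def upload_path(session_id: str) -> str:
    """Upload endpoint for one session."""
    return f"/api/upload-video/nemo_agent/me/{session_id}"


def url_upload_body(urls: List[str]) -> bytes:
    """JSON body for uploading by URL instead of multipart."""
    return json.dumps({"urls": urls, "source_type": "url"}).encode("utf-8")
```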
SSE Event Handling
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| heartbeat / empty data: | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
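The event table and the no-text fallback are sketched below in Python. The backend's event schema isn't documented here, so the sketch assumes each non-empty `data:` line is JSON whose `content.parts` entries carry either `text` (shown to the user) or tool-call data (kept internal). This mirrors Google ADK's event format. The time-based "Still working" notice is left out.

```python
import json
from typing import Iterable, List, Tuple


def handle_sse(lines: Iterable[str]) -> Tuple[List[str], bool]:
    """Collect user-facing text and flag when the stream carried none.

    Returns (texts, needs_state_check). When needs_state_check is True, the
    caller should poll session state to confirm the edit, then summarize it.
    """
    texts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, event: lines
        payload = line[len("data:"):].strip()
        if not payload:
            continue  # heartbeat / empty data: keep waiting
        event = json.loads(payload)  # assumed JSON, ADK-style
        for part in (event.get("content") or {}).get("parts") or []:
            if part.get("text"):
                texts.append(part["text"])  # text response: present to user
            # tool-call / tool-result parts are processed internally, not forwarded
    return texts, not texts
```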
Backend Response Translation
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
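Example track summary:

```
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
```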
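This Python sketch turns a draft into a short track summary for the user, using the draft field mapping. It makes several assumptions:
- The draft is a dict whose `t` list holds tracks with `tt` and `sg` keys.
- Segment durations `d` are in milliseconds.
- Unknown track types can be labeled "Other".

Track start offsets aren't part of the documented mapping, so the sketch reports segment count and total duration instead of time ranges.

```python
from typing import Any, Dict, List

TRACK_TYPES = {0: "Video", 1: "Audio", 7: "Text"}  # tt values from the mapping


def summarize_draft(draft: Dict[str, Any]) -> List[str]:
    """One line per track: type, segment count, and summed duration."""
    tracks = draft.get("t", [])  # t = tracks
    lines = [f"Timeline ({len(tracks)} tracks):"]
    for i, track in enumerate(tracks, start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "Other")  # tt = track type
        segments = track.get("sg", [])  # sg = segments
        total_ms = sum(seg.get("d", 0) for seg in segments)  # d = duration in ms
        lines.append(f"{i}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s")
    return lines
```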
Error Handling
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-authenticate via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | Create a new session (§3.0) |
| 2001 | No credits | Anonymous: show the registration URL with ?bind=<id> (get <id> from the create-session or state response when needed). Registered: "Top up at nemovideo.ai" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compressing or trimming |
| 400 | Missing X-Client-Id | Generate a Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register at nemovideo.ai to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry once after 30s |
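The error table can be encoded as a lookup in Python, as this sketch shows. The step names are hypothetical labels for the actions in the table. Like the table, the sketch mixes backend codes (1001, 2001, ...) with HTTP statuses (400, 402, 429) in one map and assumes they never collide. The single-retry rule for 429 is the only retry logic it enforces.

```python
from typing import Dict, Optional, Tuple

# code -> (next step, delay in seconds before retrying, or None for no automatic retry)
ERROR_ACTIONS: Dict[int, Tuple[str, Optional[float]]] = {
    0: ("continue", None),
    1001: ("reauth_anonymous_token", 0.0),
    1002: ("create_new_session", 0.0),
    2001: ("show_credit_options", None),
    4001: ("show_supported_formats", None),
    4002: ("suggest_compress_or_trim", None),
    400: ("generate_client_id_and_retry", 0.0),
    402: ("suggest_registration", None),  # subscription tier, not credits
    429: ("retry", 30.0),
}


def next_step(code: int, retries_so_far: int = 0) -> Tuple[str, Optional[float]]:
    """Pick the follow-up for a response code; 429 is retried only once."""
    if code == 429 and retries_so_far >= 1:
        return ("give_up", None)
    return ERROR_ACTIONS.get(code, ("report_error", None))
```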
Integration Guide
The text-to-image-ai skill fits naturally into creative and production workflows on ClawHub. You can chain it with other skills — for example, generating a base image and then passing it into a video animation skill to create motion content, or feeding the output into an upscaling skill for print-ready resolution.
For teams working on content calendars, the skill supports batch-style use: prepare a set of descriptive prompts in advance and run them sequentially to generate a full library of visuals in one session. This is particularly useful for social media managers who need consistent themed imagery across multiple posts.
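Batch use can be sketched in Python as a loop that sends prompts one at a time. `generate` stands in for whatever call actually submits a prompt and returns a result such as an image URL. It is a hypothetical parameter, not a ClawHub API. The optional `theme` is one assumed way to keep a themed series consistent. Results keep input order.

```python
from typing import Callable, Iterable, List, Tuple


def run_batch(
    prompts: Iterable[str],
    generate: Callable[[str], str],
    theme: str = "",
) -> List[Tuple[str, str]]:
    """Send prompts sequentially, optionally appending a shared theme to each."""
    results: List[Tuple[str, str]] = []
    for prompt in prompts:
        full_prompt = f"{prompt}, {theme}" if theme else prompt
        results.append((full_prompt, generate(full_prompt)))  # finish one before the next
    return results
```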
If you're using this skill as part of a storyboarding or pre-production process, consider pairing your prompts with scene descriptions from a script. The skill can generate reference frames for each scene, giving directors and clients a visual language to react to before production begins. Output images can be saved and organized directly within your ClawHub workspace for easy access across projects.
Best Practices
The quality of your output is directly tied to the quality of your prompt. Start with the subject, then layer in environment, lighting, style, and mood. A strong prompt structure might look like: '[Subject] in [setting], [time of day or lighting], [artistic style], [color palette], [camera angle].' This framework keeps your description organized and gives the model clear signals to work with.
For brand-consistent imagery, establish a style anchor in every prompt — such as 'flat design illustration' or 'cinematic photography' — so that a series of generated images shares a visual identity. This is especially valuable for marketing teams producing multiple assets for a single campaign.
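The prompt template can be expressed as a small Python helper. `build_prompt` is a hypothetical name. It follows the '[Subject] in [setting], [lighting], [style], [palette], [camera]' order given above and skips any optional slot left empty.

```python
from typing import Optional


def build_prompt(
    subject: str,
    setting: str,
    lighting: str,
    style: str,
    palette: Optional[str] = None,
    camera: Optional[str] = None,
) -> str:
    """Join the template slots in order, dropping empty optional ones."""
    parts = [f"{subject} in {setting}", lighting, style, palette, camera]
    return ", ".join(p for p in parts if p)
```

Passing the same `style` value (for example "cinematic photography") to every call in a series is the style anchor described above.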
Avoid overloading a single prompt with conflicting styles or too many competing subjects. If you want a complex scene, break it into layers: generate the background first, then generate foreground elements separately if needed. Keeping prompts focused on one primary visual idea tends to produce sharper, more intentional results than trying to describe everything at once.
Troubleshooting
If your generated image doesn't match what you envisioned, the most common cause is an underspecified prompt. Vague descriptions like 'a nice landscape' give the model wide latitude, which can lead to generic results. Try adding details about lighting, mood, color palette, perspective, and style — for example, 'a foggy forest at dawn with soft golden light filtering through pine trees, painterly style.'
If the image contains unwanted elements, use negative phrasing in your prompt to signal what to exclude, such as 'no people, no text, no cars.' For outputs that feel off in composition or proportion, try specifying the framing directly — 'wide shot,' 'close-up portrait,' or 'bird's eye view' can dramatically change results.
Occasionally, certain abstract or highly conceptual prompts may produce inconsistent outputs across generations. In these cases, running the prompt two or three times and selecting the best result is a reliable workaround. If a prompt repeatedly fails to generate, simplify it to its core subject and build complexity back in gradually.