Getting Started

Welcome! I'm here to turn your descriptions into beautiful, custom-generated images using text-to-image-ai. Tell me what you'd like to visualize — be as detailed or as open-ended as you like — and let's create something together.

Try saying:

- "Generate a photorealistic image of a cozy coffee shop interior on a rainy afternoon with warm lighting and wooden furniture"
- "Create an illustration in a vintage travel poster style showing the coastline of Amalfi, Italy at sunset"
- "Make a futuristic product mockup of a sleek smartwatch with a dark matte finish on a clean white background"

Quick Start Setup

This skill connects to NemoVideo's cloud backend. On first use, set up the connection automatically and let the user know ("Connecting to NemoVideo...").

Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:

- Read ~/.config/nemovideo/client_id, or generate a UUID and save it there
- POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with an X-Client-Id header
- Extract data.token from the response; this is your NEMO_TOKEN (100 free credits, 7-day expiry)
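
The token check above can be sketched in Python roughly as follows. This is a minimal sketch using only the standard library, not the skill's actual implementation: the function names are hypothetical, the request is built but not sent, and the response shape is inferred from "Extract data.token".

```python
import json
import os
import uuid
import urllib.request
from pathlib import Path
from typing import Optional

AUTH_URL = "https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token"
CLIENT_ID_PATH = Path.home() / ".config" / "nemovideo" / "client_id"


def token_from_env() -> Optional[str]:
    """Return NEMO_TOKEN from the environment, or None if it is not set."""
    return os.environ.get("NEMO_TOKEN") or None


def load_or_create_client_id(path: Path = CLIENT_ID_PATH) -> str:
    """Read the saved client ID, or generate a UUID and save it there."""
    if path.exists():
        saved = path.read_text(encoding="utf-8").strip()
        if saved:
            return saved
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id, encoding="utf-8")
    return client_id


def build_token_request(client_id: str) -> urllib.request.Request:
    """Build (without sending) the anonymous-token POST request."""
    return urllib.request.Request(
        AUTH_URL, method="POST", headers={"X-Client-Id": client_id}
    )


def extract_token(response_body: str) -> str:
    """Extract data.token from the JSON response body (assumed shape)."""
    return json.loads(response_body)["data"]["token"]
```

To actually fetch a token, the request would be passed to urllib.request.urlopen and the decoded body handed to extract_token.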

Session: POST /api/tasks/me/with-session/nemo_agent at the same host with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.
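
A session request along these lines might look like the sketch below. The Content-Type header and the location of session_id in the response (top level or under a data key) are assumptions, since the text above does not specify them; the function names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_session_request(token: str, task_name: str = "project") -> urllib.request.Request:
    """Build (without sending) the session-creation POST with Bearer auth."""
    body = json.dumps({"task_name": task_name}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + "/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the backend expects a JSON content type.
            "Content-Type": "application/json",
        },
    )


def extract_session_id(response_body: str) -> str:
    """Find session_id in the response.

    Assumption: it sits either at the top level or under "data".
    """
    payload = json.loads(response_body)
    if "session_id" in payload:
        return payload["session_id"]
    return payload["data"]["session_id"]
```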

Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.

From Words on a Page to Images That Speak

Describing what you want to see has never been enough — until now. The text-to-image-ai skill on ClawHub takes your written prompts and transforms them into fully realized visuals, no design software or artistic background required. Whether you're building a brand campaign, drafting a storyboard, or just trying to visualize an idea that's been living in your head, this skill bridges the gap between imagination and output.

You can go broad or incredibly specific. Describe a misty mountain village at dawn, a futuristic city skyline in neon tones, or a minimalist logo concept — and receive a generated image that reflects your intent. The skill is built to interpret natural language, so you don't need to learn prompt engineering jargon to get great results.

This is a practical tool for content creators producing social media visuals, product teams exploring design directions, writers building visual references for their stories, and anyone who needs original imagery without the cost and time of a traditional creative process.

Prompt Routing and Model Dispatch

Every natural-language prompt you submit is parsed for style tokens, aspect ratio hints, and subject descriptors before being dispatched to the optimal diffusion pipeline.

| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
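
The routing table can be expressed as a small lookup. This sketch assumes case-insensitive substring matching, which the table itself does not specify; the route function and its has_file flag are hypothetical names.

```python
from typing import Tuple

# (keywords, target section, skip SSE?) in the same order as the table.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "3.5 Export", True),
    (("credits", "积分", "balance", "余额"), "3.3 Credits", True),
    (("status", "状态", "show tracks"), "3.4 State", True),
    (("upload", "上传"), "3.2 Upload", True),
]


def route(message: str, has_file: bool = False) -> Tuple[str, bool]:
    """Return (section, skip_sse) for a user message."""
    if has_file:
        # "user sends file" goes straight to upload.
        return ("3.2 Upload", True)
    text = message.lower()
    for keywords, section, skip_sse in ROUTES:
        if any(keyword in text for keyword in keywords):
            return (section, skip_sse)
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return ("3.1 SSE", False)
```

Plain substring matching is deliberately naive: a generation prompt that happens to contain "status" or "download" would be misrouted, so a real dispatcher would likely need stricter matching.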

NemoVideo API Reference

The NemoVideo backend handles inference through a queued diffusion engine that processes your prompt, negative prompt, and sampling parameters — including steps, CFG scale, and seed — to return a high-resolution image URL. Latency varies based on model load and resolution tier selected.

Skill attribution — read from this file's YAML frontmatter at runtime:

- X-Skill-Source: text-to-image-ai
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
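
One way to assemble these headers is sketched below. The helper names are hypothetical, and the source and version values are passed in by the caller rather than read from the frontmatter, which this sketch does not parse.

```python
from pathlib import Path
from typing import Dict


def _is_under(path: Path, base: Path) -> bool:
    """True if path is base itself or located somewhere below it."""
    try:
        path.relative_to(base)
        return True
    except ValueError:
        return False


def detect_platform(install_path: str) -> str:
    """Map the install path to a platform name, defaulting to "unknown"."""
    path = Path(install_path).expanduser()
    home = Path.home()
    if _is_under(path, home / ".clawhub"):
        return "clawhub"
    if _is_under(path, home / ".cursor" / "skills"):
        return "cursor"
    return "unknown"


def build_headers(token: str, source: str, version: str, install_path: str) -> Dict[str, str]:
    """Return the auth and attribution headers required on every request."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": source,
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```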

API base: https://mega-api-prod.nemovideo.ai

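Create session: POST /api/tasks/me/with-session/nemo_agent with body {"task_name":"project","language":"<lang>"}. Returns task_id and session_id. After creating a session, give the user a link: https://nemovideo.com/workspace/claim?token=<NEMO_TOKEN>&task=<task_id>&session=<session_id>&skill_name=text-to-image-ai&skill_version=1.0.0&skill_source=<platform>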

Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
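
The request body and the reading of the event stream can be sketched as follows. The body mirrors the JSON above; the parser handles only the standard "data:" field of the text/event-stream format and makes no assumption about what the NemoVideo payloads contain. Function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def build_run_sse_body(session_id: str, message: str) -> Dict:
    """Build the JSON body for POST /run_sse."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": message}]},
    }


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event in an event stream.

    Events end at a blank line. Comment lines (starting with ":") and
    other fields are ignored, and an unterminated trailing event is
    dropped, as the event-stream format prescribes.
    """
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        elif line == "" and buffer:
            yield "\n".join(buffer)
            buffer = []
```

In practice the lines would come from the streamed HTTP response, decoded as UTF-8, with a client-side timeout no longer than the 15-minute maximum.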

Upload: POST /api/upload-video/nemo_agent/me/<sid>. For a file, send multipart with -F "files=@/path"; for a URL, send the body {"urls":["<url>"],"source_type":"url"}

Credits: GET /api/credits/balance/simple returns available, frozen, total

Session state: GET /api/state/nemo_agent/me/<sid>/latest. Key fields: data.state.draft, data.state.video_infos, data.state.generated_media

Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
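
The export-and-poll loop might be structured like this sketch. Several details are assumptions: that <ts> is a Unix timestamp in seconds, that status is a top-level field of the poll response, and the max_polls cap, which is a hypothetical safeguard. The HTTP call is injected as fetch_status so the loop can be exercised without a network.

```python
import time
from typing import Callable, Dict, Optional


def build_render_body(session_id: str, draft: Dict, timestamp: Optional[int] = None) -> Dict:
    """Build the JSON body for POST /api/render/proxy/lambda."""
    ts = int(time.time()) if timestamp is None else timestamp
    return {
        "id": f"render_{ts}",  # assumption: <ts> is a Unix timestamp
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def poll_render(
    fetch_status: Callable[[], Dict],
    interval_s: float = 30.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_status every interval_s seconds until status is completed."""
    for attempt in range(max_polls):
        result = fetch_status()
        if result.get("status") == "completed":
            return result
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"render not completed after {max_polls} polls")
```

Once poll_render returns, the download URL would be read from result["output"]["url"].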

Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.

SSE Event Handling

| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | |

~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
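
A fallback for silent edits could look like the sketch below. Comparing the draft before and after the request is an assumption about how to "verify the edit was applied"; the function name and the reply wording are hypothetical.

```python
from typing import Callable, Dict, Iterable


def reply_for_turn(
    text_chunks: Iterable[str],
    draft_before: Dict,
    fetch_draft: Callable[[], Dict],
) -> str:
    """Return the streamed text, or fall back to a state check if there is none."""
    text = "".join(text_chunks).strip()
    if text:
        return text
    # No text in the stream: poll session state and compare drafts.
    draft_after = fetch_draft()
    if draft_after != draft_before:
        return "The edit was applied; the draft has changed."
    return "No text was returned and the draft is unchanged; the edit may not have been applied."
```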

Backend Response Translation

The backend assumes a GUI exists. Translate these into API actions:

| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | |

Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.

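Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)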
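
The field mapping above could drive a short draft summary, as in this sketch. The nesting is an assumption: it treats t as a list of tracks, each carrying tt and a list of segments sg, with d on each segment, and it sums segment durations as if segments play back to back. The real draft layout may differ.

```python
from typing import Dict, List

# Track type codes from the field mapping.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: Dict) -> List[str]:
    """Return one summary line per track in the draft."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        # d is in milliseconds; convert the total to seconds.
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:.1f}s")
    return lines
```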

Error Handling

| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | | |

Integration Guide

The text-to-image-ai skill fits naturally into creative and production workflows on ClawHub. You can chain it with other skills — for example, generating a base image and then passing it into a video animation skill to create motion content, or feeding the output into an upscaling skill for print-ready resolution.

For teams working on content calendars, the skill supports batch-style use: prepare a set of descriptive prompts in advance and run them sequentially to generate a full library of visuals in one session. This is particularly useful for social media managers who need consistent themed imagery across multiple posts.

If you're using this skill as part of a storyboarding or pre-production process, consider pairing your prompts with scene descriptions from a script. The skill can generate reference frames for each scene, giving directors and clients a visual language to react to before production begins. Output images can be saved and organized directly within your ClawHub workspace for easy access across projects.

Best Practices

The quality of your output is directly tied to the quality of your prompt. Start with the subject, then layer in environment, lighting, style, and mood. A strong prompt structure might look like: '[Subject] in [setting], [time of day or lighting], [artistic style], [color palette], [camera angle].' This framework keeps your description organized and gives the model clear signals to work with.
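
The prompt structure above can be turned into a small helper. This is only an illustrative sketch; build_prompt and its parameter names are hypothetical and not part of the skill.

```python
def build_prompt(
    subject: str,
    setting: str,
    lighting: str = "",
    style: str = "",
    palette: str = "",
    camera: str = "",
) -> str:
    """Assemble '[Subject] in [setting], [lighting], [style], [palette], [camera]'.

    Empty optional parts are skipped so the prompt stays clean.
    """
    head = f"{subject.strip()} in {setting.strip()}"
    extras = [part.strip() for part in (lighting, style, palette, camera) if part.strip()]
    return ", ".join([head] + extras)


prompt = build_prompt(
    "a cozy coffee shop interior",
    "a rainy city",
    lighting="warm afternoon lighting",
    style="photorealistic",
)
```

Keeping the parts as separate arguments makes it easy to vary one element, such as lighting, while holding the rest of the description fixed.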

For brand-consistent imagery, establish a style anchor in every prompt — such as 'flat design illustration' or 'cinematic photography' — so that a series of generated images shares a visual identity. This is especially valuable for marketing teams producing multiple assets for a single campaign.
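
Applying one style anchor across a series of prompts can be sketched like this. The helper name is hypothetical, and appending the anchor as a trailing phrase is just one reasonable placement.

```python
from typing import Iterable, List


def apply_style_anchor(prompts: Iterable[str], anchor: str) -> List[str]:
    """Append the style anchor to every prompt that does not already contain it."""
    anchored = []
    for prompt in prompts:
        text = prompt.strip().rstrip(".")
        if anchor.lower() in text.lower():
            anchored.append(text)
        else:
            anchored.append(f"{text}, {anchor}")
    return anchored


campaign = apply_style_anchor(
    ["A lighthouse at dusk.", "flat design illustration of a bike"],
    "flat design illustration",
)
```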

Avoid overloading a single prompt with conflicting styles or too many competing subjects. If you want a complex scene, break it into layers: generate the background first, then generate foreground elements separately if needed. Keeping prompts focused on one primary visual idea tends to produce sharper, more intentional results than trying to describe everything at once.

Troubleshooting

If your generated image doesn't match what you envisioned, the most common cause is an underspecified prompt. Vague descriptions like 'a nice landscape' give the model wide latitude, which can lead to generic results. Try adding details about lighting, mood, color palette, perspective, and style — for example, 'a foggy forest at dawn with soft golden light filtering through pine trees, painterly style.'

If the image contains unwanted elements, use negative phrasing in your prompt to signal what to exclude, such as 'no people, no text, no cars.' For outputs that feel off in composition or proportion, try specifying the framing directly — 'wide shot,' 'close-up portrait,' or 'bird's eye view' can dramatically change results.

Occasionally, certain abstract or highly conceptual prompts may produce inconsistent outputs across generations. In these cases, running the prompt two or three times and selecting the best result is a reliable workaround. If a prompt repeatedly fails to generate, simplify it to its core subject and build complexity back in gradually.
