Topview AI Skill

Modular Python toolkit for the Topview AI API.

✨ Generate. Edit. Collaborate. — All in One Place. ✨

- 🧠 All Mainstream Models: Seamlessly access the world's top-tier AI models for video, image, and voice in one toolkit.
🗣️ Describe to Create: Just tell the agent what you want. From talking avatars to product composites, your prompts generate the exact output.
⚡ Zero Manual Ops: No manual uploads, no tedious tweaking. Everything is automated straight to your shared board.

Execution Rule

Always use the Python scripts in scripts/. Do NOT use curl or direct HTTP calls.

User-Facing Reply Rules

⚠️ HIGHEST PRIORITY — every user-facing reply MUST follow ALL rules below.
Most users are non-technical. Many chat from Feishu, WeChat, or similar apps and cannot see local browser popups or terminals.

1. Keep replies short — give the result or next step directly. If one sentence is enough, don't write three.
Use plain language — no API jargon, no terminal references, no mentions of environment variables, polling, JSON, scripts, or "auth flow". Speak as if the user has never seen a command line.
Never mention terminal details — do not reference command output, logs, exit codes, file paths, config files, or any technical internals. These mean nothing to the user.
Never ask the user to operate a browser popup — the user cannot see the agent's machine screen. When login is needed, the only correct action is to send the authorization link directly in the chat.
Always send the direct login link — extract URL: ... from auth.py login output and use the login template below. Never say "browser opened" or similar. If the URL is not found in the output, re-run auth.py login to get a new link. Never skip sending the link.
Wait for user confirmation after login — ask the user to reply "好了" / "done", then continue the task.
Explain errors simply — if a task fails, tell the user in one sentence what happened and ask if they want to retry. Never paste error messages or technical details.
Be result-oriented — after task completion, give the user the result (link, image, video) directly. Do not describe intermediate steps.
Always take the user's perspective — the user can only see the chat conversation, nothing else. Anything requiring user action (links, confirmations) must appear in the chat.
Do not tell the user to register separately — the authorization page includes both login and sign-up. New users can register directly on that page. Never say "go to topview.ai to register first".
Act directly, don't ask which method — when login is needed, just run auth.py login and send the link. Don't ask "which method do you prefer?" or present multiple options. The user asked you to do something — login is just an intermediate step, handle it.
Give time estimates for generation tasks — after submitting a task, tell the user the estimated wait time so they know what to expect. Use the estimates from the "Estimated Generation Time" table below.

Estimated Generation Time

Tell the user the estimated wait time after submitting a task. Match the user's language.

Task Type	Model	Estimated Time
Video	Standard / Fast (Seedance 2.0)	~5–10 min
Video

Example messages after submitting:

- Chinese: "已经开始生成了，视频大约需要 5-10 分钟，请稍等~"
English: "Generation started — the video will take roughly 5–10 minutes. I'll send it to you as soon as it's ready."

Required login message template

Replace <LOGIN_URL> with the actual link. Follow the user's language (Chinese template for Chinese users, English for English users). Send the login link in Markdown-friendly plain text format: 点击登录 (<LOGIN_URL>) for Chinese, Click to sign in (<LOGIN_URL>) for English.

中文模板：

CODEBLOCK0

English template:

CODEBLOCK1

Banned phrases (including any variations):

- "Browser has opened" / "browser popped up"
"Run this in the terminal" / "run the login command"
"Check the popup" / "look at the browser"
"Set the environment variable"
"Command executed successfully"
"Polling task status"
"Script output is as follows"
"Go operate on that computer" / "check the robot's computer"
"Authorization page popped up" / "if the page appeared"
"Go to topview.ai to register first" — auth page has built-in registration
"Which method do you prefer?" / "two options for you" — don't give choices, just act
"Auth flow" / "perform authentication" / "complete authentication" — too technical
"Python config" / "environment setup" — user doesn't need to know
Anything asking the user to operate outside the chat window
Anything containing code, commands, or file paths

Fallback when login URL is not captured:

If auth.py login output does not contain a URL: line (e.g. background execution missed the output), re-run auth.py login to get a fresh link.
NEVER fall back to telling the user to "check the browser popup" or "go operate on the agent's computer". The user cannot see it.

Prerequisites

- Python 3.8+
Authenticated — see references/auth.md for the direct-link login flow
Credits available — see references/user.md to check balance
Env vars TOPVIEW_UID + TOPVIEW_API_KEY are handled automatically after login; manual setup is only for CI/internal use

CODEBLOCK2

Agent Workflow Rules

These rules apply to ALL generation modules (avatar4, videogen, aiimage, removebg, productavatar, text2voice).

1. Always start with run — it submits the task and polls automatically until done. This is the default and correct choice in almost all situations.
Do NOT ask the user to check the task status themselves. The agent is responsible for polling until the task completes or the timeout is reached.
Only use query when run has already timed out and you have a taskId to resume, or when the user explicitly provides an existing taskId.
query polls continuously — it keeps checking every --interval seconds until status is success or fail, or --timeout expires. It does not stop after one check.
If query also times out (exit code 2), increase --timeout and try again with the same taskId. Do not resubmit unless the task has actually failed.

CODEBLOCK3

Task Status:

Status	Description
INLINECODE27	Task is queued, waiting to be processed
INLINECODE28

Board ID Protocol

Every generation task should include a --board-id so results are organized and viewable on the web.

1. Session start — before submitting the first task, run board.py list --default -q to get the default board ID ("My First Board"). Only need to do this once per session.
Pass to all tasks — add --board-id <id> to every generation command (avatar4.py, video_gen.py, ai_image.py, product_avatar.py, text2voice.py).
After completion — if the task result contains a boardTaskId, show the user the edit link: https://www.topview.ai/board/{boardId}?boardResultId={boardTaskId}. Tell the user they can view and edit the result via this link.
User wants a new board — run board.py create --name "..." and use the returned board ID for subsequent tasks.
User specifies a board — use the user-provided board ID instead of the default.
Forgot the board ID? — run board.py list --default -q again.

CODEBLOCK4

Modules

Module	Script	Reference	Description
Auth	INLINECODE43	auth.md	OAuth 2.0 Device Flow — generate login link, wait for authorization, save credentials
Avatar4

Read individual reference docs for usage, options, and code examples.
Local files (image/audio/video) are auto-uploaded when passed as arguments — no manual upload step needed.

Creative Guide

Core Principle: Start from the user's intent, not from the API.
Analyze what the user wants to achieve, then pick the right tool, model, and parameters.

Step 1 — Intent Analysis

Every time a user requests content, identify:

Dimension	Ask Yourself	Fallback
Output Type	Image? Video? Audio? Composite?	Must ask
Purpose

Step 2 — Tool Selection

CODEBLOCK5

Quick-reference routing table:

User says...	Script & Type
"Make a talking avatar video with this photo and text"	INLINECODE56 (pass local image path directly)
"Generate a video with this photo and my audio recording"

Video model selection — see references/video_gen.md § Model Recommendation.

Image model tip: For all image tasks, default to Nano Banana 2 — strongest all-round model with best quality, 14 aspect ratios, up to 4K, and 14 reference images for editing. See references/ai_image.md § Model Recommendation.

Product Avatar workflow: For best results, use the 2-step flow: remove_bg.py to get a bgRemovedImageFileId, then product_avatar.py with --product-image-no-bg. Use product_avatar.py list-avatars to browse public templates and get an avatarId. See references/product_avatar.md § Full Workflow.

Caption styles for avatar4: Use avatar4.py list-captions to discover available caption styles, then pass the captionId via --caption.

Talking-head tip — avatar4 vs video_gen with native audio:
Some video_gen models (e.g. Standard, Kling V3, Veo 3.1) support native audio and can produce talking-head videos with better visual quality than avatar4. However, they have shorter max duration (5–15s) and are significantly more expensive. Avatar4 supports up to 120s per segment at much lower cost.
Rule of thumb: Default to avatar4 for most talking-head needs. Consider video_gen native-audio models only when the clip is short (<=15s) and the user explicitly prioritizes top-tier visual quality over cost.

Step 3 — Simple vs Complex

Simple requests — the user's need is clear, materials are ready → handle directly from the reference docs.

Complex requests — the user gives a goal (e.g., "make a promo video", "explain how AI works") rather than a direct API instruction. Follow this universal workflow:

1. Deconstruct & Clarify: Ask the user for the target audience, core message, intended duration, and what assets they currently have (photos, scripts).
Determine the Route:

- Has a person's photo + needs narration → Use avatar4 (Talking Head). - Has a product/reference photo → Use video_gen --type i2v or omni. - No assets, purely visual concept → Use video_gen --type t2v. - Requires both → Plan a Hybrid approach (Avatar narration + B-roll inserts).

3. Structure the Content:

- Write a structured script (Hook → Body/Explanation → Call to Action). - Add <break time="0.5s"/> tags to TTS scripts for natural pacing. - For visuals, write detailed prompts covering Subject + Action + Lighting + Camera.

4. Handle Long-Form (>120s): If the script exceeds the 120s limit for a single avatar4 task, split it into logical segments (e.g., 60s each) at natural sentence boundaries. Submit tasks in parallel using the submit command, ensure parameters (voice/model) remain locked across segments, and deliver them in order.

Pre-Execution Protocol

Follow this before EVERY generation task.

1. Estimate cost — use video_gen.py estimate-cost for video tasks, ai_image.py estimate-cost for image tasks; avatar4 costs depend on video length; product_avatar is fixed 0.5 credits; text2voice is fixed 0.1 credits
Validate parameters — ensure model, aspect ratio, resolution, and duration are compatible (use list-models to check)
Ask about missing key parameters — if the user has not specified important parameters that affect the output, ask before proceeding. Key parameters by module:

- video_gen: duration, aspect ratio, model - ai_image: aspect ratio, resolution, model, number of images - avatar4: (usually determined by input, but confirm voice if not specified) - text2voice: voice selection - Do NOT silently pick defaults for these — always confirm with the user.

4. Confirm before first submission — before the very first generation task in a session, present the full plan (tool, model, parameters, cost estimate) and ask the user:

- Whether to proceed with the generation - Whether they want the agent to ask for confirmation before each subsequent task, or trust the agent to proceed automatically for the rest of the session - These two questions should be combined into a single confirmation message. - If the user chooses "auto-proceed", skip the confirmation step (but still ask about missing parameters) for subsequent tasks in the same session. - If the user explicitly said "just do it" or similar upfront, treat it as auto-proceed from the start.

Agent Behavior Protocol

During Execution

1. Pass local paths directly — scripts auto-upload local files to S3 before submitting tasks
Parallelize independent steps — independent generation tasks can run concurrently
Keep consistency across segments — when generating multiple segments, use identical parameters

After Execution

Use the structured result templates below. The user should see the output link first, then the board link, then key metadata. Keep it clean and scannable.

Video result template:

CODEBLOCK6

Image result template:

CODEBLOCK7

English video result template:

CODEBLOCK8

English image result template:

CODEBLOCK9

Rules:

1. Result link first — always show the video/image URL at the very top.
Board link second — if boardTaskId is available, show the board edit link.
Key metadata only — duration, aspect ratio/resolution, model, cost. Don't dump raw JSON or extra fields.
Offer iteration — end with a short note that the user can ask for adjustments. Remind that regeneration costs additional credits.
Multiple outputs — if the task produced multiple results, number them (1, 2, 3…) each with its own link and metadata.
Match user language — use the Chinese template for Chinese users, English for English users.

Error Handling

See references/error_handling.md for error codes, task-level failures, and recovery decision tree.

Capability Boundaries

Capability	Status	Script
Photo avatar / talking head	Available	INLINECODE96
Caption styles

Never promise capabilities that don't exist as modules.

Topview AI 技能

适用于 Topview AI API 的模块化 Python 工具包。

✨ 生成。编辑。协作。—— 一站式完成。 ✨

- 🧠 所有主流模型：在一个工具包中无缝访问全球顶尖的 AI 视频、图像和语音模型。
🗣️ 描述即可创作：只需告诉智能体你想要什么。从口播数字人到产品合成图，你的提示词就能生成精确的输出。
⚡ 零手动操作：无需手动上传，无需繁琐调整。一切自动完成，直接发送到你的共享面板。

执行规则

始终使用 scripts/ 目录中的 Python 脚本。不要使用 curl 或直接 HTTP 调用。

面向用户的回复规则

⚠️ 最高优先级 —— 每一条面向用户的回复都必须遵循以下所有规则。
大多数用户是非技术背景的。许多用户通过飞书、微信或类似应用聊天，并且无法看到本地浏览器弹窗或终端。

1. 保持回复简短 —— 直接给出结果或下一步。如果一句话能说清，就不要写三句。
使用通俗语言 —— 不要使用 API 术语、终端引用、环境变量、轮询、JSON、脚本或认证流程等词汇。要像用户从未见过命令行一样说话。
绝不提及终端细节 —— 不要引用命令输出、日志、退出码、文件路径、配置文件或任何技术内部细节。这些对用户毫无意义。
绝不让用户操作浏览器弹窗 —— 用户看不到智能体机器的屏幕。当需要登录时，唯一正确的操作是在聊天中直接发送授权链接。
始终发送直接登录链接 —— 从 auth.py login 输出中提取 URL: ...，并使用下面的登录模板。绝不要说浏览器已打开或类似的话。如果在输出中找不到 URL，请重新运行 auth.py login 获取新链接。绝不能跳过发送链接。
登录后等待用户确认 —— 让用户回复好了/done，然后继续任务。
简单解释错误 —— 如果任务失败，用一句话告诉用户发生了什么，并询问是否要重试。绝不要粘贴错误消息或技术细节。
以结果为导向 —— 任务完成后，直接给用户结果（链接、图片、视频）。不要描述中间步骤。
始终站在用户角度 —— 用户只能看到聊天对话，其他什么都看不到。任何需要用户操作的内容（链接、确认）都必须出现在聊天中。
不要告诉用户单独注册 —— 授权页面同时包含登录和注册。新用户可以直接在该页面注册。绝不要说先去 topview.ai 注册。
直接行动，不要问用哪种方式 —— 当需要登录时，直接运行 auth.py login 并发送链接。不要问你更喜欢哪种方式？或提供多个选项。用户让你做某事 —— 登录只是一个中间步骤，处理好它。
为生成任务提供时间预估 —— 提交任务后，告诉用户预计等待时间，让他们心里有数。使用下方预计生成时间表格中的预估。

预计生成时间

提交任务后告诉用户预计等待时间。使用与用户相同的语言。

任务类型	模型	预计时间
视频	Standard / Fast (Seedance 2.0)	~5–10 分钟
视频

所有其他视频模型 (Kling, Sora, Veo, Vidu 等) | ~3–5 分钟 | | 图片 | GPT Image 1.5 | ~1 分钟 | | 图片 | 所有其他图片模型 (Nano Banana, Seedream, Imagen, Kontext, Grok 等) | ~30秒–1 分钟 | | 数字人 | avatar4 | ~2–5 分钟 (取决于脚本长度) | | TTS | text2voice | ~10–30秒 | | 背景移除 | remove_bg | ~10–30秒 | | 产品模特图 | product_avatar | ~1–2 分钟 |

提交后的示例消息：

- 中文：已经开始生成了，视频大约需要 5-10 分钟，请稍等~
英文：Generation started — the video will take roughly 5–10 minutes. Ill send it to you as soon as its ready.

必需的登录消息模板

将 URL> 替换为实际链接。使用用户的语言（中文用户使用中文模板，英文用户使用英文模板）。以 Markdown 友好的纯文本格式发送登录链接：中文为点击登录 (URL>)，英文为 Click to sign in ()。

中文模板：

text
安装完成，Topview Skill 已连接到你的智能助手。

点击下方登录链接，登录后将解锁以下能力：

点击登录 ()

🎬 视频生成
文字转视频、图片转视频、参考视频生成，自动配音配乐。
视频模型：Seedance 2.0 · Sora 2 · Kling 3 · Veo 3.1 · Vidu Q3 · wan2.7

🖼️ AI 图片生成与编辑
文字生图、AI 修图、风格转换，最高支持 4K。
图片模型：Nano Banana 2 · Seedream 5.0 · GPT Image 1.5 · Imagen 4 · Kontext-Pro · Grok Image

🧑‍💼 口播数字人
上传一张照片 + 文案，自动生成真人口播视频，支持多语种。

✂️ 背景移除
一键抠图，产品图、人像、任意图片秒去背景。

👗 产品模特图
把你的产品图放到模特身上，自动生成带货展示图。

🎙️ 语音与配音
文字转语音、声音克隆，支持多语种配音输出。

登录完成后回我一句好了，我马上继续。

英文模板：

text
Installation complete. Topview Skill is now connected to your agent.

Click the sign-in link below. After signing in, the following capabilities will be unlocked.

Click to sign in ()

🎬 Video Generation
Text-to-video, image-to-video, reference-based generation with auto sound & music.
Models: Seedance 2.0 · Sora 2 · Kling 3 · Veo 3.1 · Vidu Q3 · wan2.7

🖼️ AI Image Generation & Editing
Text-to-image, AI retouching, style transfer — up to 4K resolution.
Models: Nano Banana 2 · Seedream 5.0 · GPT Image 1.5 · Imagen 4 · Kontext-Pro · Grok Image

🧑‍💼 Talking Avatar
Upload a photo + script to auto-generate presenter-style talking head videos.

✂️ Background Removal
One-click cutout for product shots, portraits, and any image.

👗 Product Model Shots
Place your product onto model templates for e-commerce showcase images.

🎙️ Voice & TTS
Text-to-speech, voice cloning, multilingual dubbing and narration.

Once youve signed in, just reply done and Ill continue right away.

禁用短语（包括任何变体）：

- 浏览器已打开 / 浏览器弹出了
在终端中运行这个 / 运行登录命令
检查弹窗 / 查看浏览器
设置环境变量
命令执行成功
轮询任务状态
脚本输出如下
去那台电脑上操作 / 检查机器人的电脑
授权页面弹出了 / 如果页面出现了
先去 topview.ai 注册 —— 认证页面内置注册功能
你更喜欢哪种方式？ / 给你两个选项 —— 不要给选择，直接行动
认证流程 / 执行认证 / 完成认证 —— 太技术化
Python 配置 / 环境设置 —— 用户不需要知道
任何要求用户在聊天窗口之外操作的内容
任何包含代码、命令或文件路径的内容

当未捕获到登录 URL 时的回退方案：

如果 auth.py login 的输出不包含 URL: 行（例如后台执行错过了输出），重新运行 auth.py login 以获取新链接。
绝不能回退到告诉用户检查浏览器弹窗或去操作智能体的电脑。用户看不到这些。

前提条件

- Python 3.8+
已认证 —— 参见 references/auth.md 了解直接链接登录流程
有可用积分 —— 参见 references/user.md 检查余额
环境变量 TOPVIEWUID + TOPVIEWAPI_KEY 在登录后会自动处理；手动设置仅用于 CI/内部使用

bash
pip install -r {baseDir}/scripts/requirements.txt

智能体工作流规则

这些规则适用于所有生成模块

topview-skill全能AI创作工具