UNITH Digital Humans Skill
Create, configure, update, and deploy AI-powered Digital Human avatars using the UNITH API.
Quick Overview
UNITH digital humans are AI avatars that can speak, converse, and interact with users. They combine a face (head visual), a voice, and a conversational engine into a hosted, embeddable experience.
Base API URL: https://platform-api.unith.ai
Docs: https://docs.unith.ai
Prerequisites
The user must supply the following credentials (stored as environment variables):
| Variable | Description | How to obtain |
|---|---|---|
| UNITH_EMAIL | Account email | Register at https://unith.ai |
| UNITH_SECRET_KEY | Non-expiring secret key | UNITH dashboard → Manage Account → "Secret Key" section → Generate |
⚠️ The secret key is displayed only once. If lost, the user must delete and regenerate it.
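Before authenticating, it can help to fail fast when a credential is missing. The sketch below assumes the two variables are named UNITH_EMAIL and UNITH_SECRET_KEY; the function name check_unith_env is hypothetical and is not part of the shipped scripts.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the UNITH scripts.
# Prints the name of each credential that is unset or empty and
# returns non-zero if at least one is missing.
check_unith_env() {
  local missing=0
  [ -n "${UNITH_EMAIL:-}" ] || { echo "UNITH_EMAIL"; missing=1; }
  [ -n "${UNITH_SECRET_KEY:-}" ] || { echo "UNITH_SECRET_KEY"; missing=1; }
  return "$missing"
}
```

A caller might run `check_unith_env || echo "set the variables listed above first" >&2` before sourcing the auth script.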
Authentication
All API calls require a Bearer token (valid 7 days). Use the auth script:
```bash
source scripts/auth.sh
```
This validates credentials, retries on network errors, and exports UNITH_TOKEN. On failure, it prints specific guidance (wrong key, expired token, etc.).
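The token is valid for seven days while the auth script caches it for six, which leaves a one-day margin so a cached token should not expire mid-session. A minimal sketch of such a freshness check follows, assuming the cache is a plain file whose modification time marks when the token was issued. The name token_cache_fresh is hypothetical, and the real auth.sh may work differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the cache layout is an assumption.
# Returns 0 when the cache file exists, is non-empty, and was modified
# less than max_days (default 6) full days ago.
token_cache_fresh() {
  local cache_file="$1"
  local max_days="${2:-6}"
  [ -s "$cache_file" ] || return 1
  # find prints the path only when its age in whole days is below max_days.
  [ -n "$(find "$cache_file" -mtime "-$max_days" 2>/dev/null)" ]
}
```

With this check, a script would reuse the cached token when the function succeeds and request a new one otherwise.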
Workflow: Creating a Digital Human
Step 1: Choose an Operating Mode
Ask the user what they want the digital human to do. Map their answer to one of 5 modes:
| Mode | operationMode value | Use case | Output |
|---|---|---|---|
| Text-to-Video | ttt | Generate an MP4 video of the avatar speaking provided text | MP4 file |
| Open Dialogue | oc | Free-form conversational avatar guided by a system prompt | Hosted conversational URL |
| Document Q&A | doc_qa | Avatar answers questions from uploaded documents | Hosted conversational URL |
| Voiceflow | voiceflow | Guided conversation flow via Voiceflow | Hosted conversational URL |
| Plugin | plugin | Connect any external LLM or conversational engine via webhook | Hosted conversational URL |
Complexity spectrum (simple → sophisticated):
- Simplest (ttt): just text in, video out. No knowledge base needed.
- Standard (oc): conversational with a system prompt. Good for general assistants.
- Knowledge-grounded (doc_qa): upload documents, avatar answers from them. Best for support/FAQ.
- Workflow-driven (voiceflow): structured conversation paths. Requires Voiceflow account.
- Most flexible (plugin): BYO conversational engine. Maximum control.
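The five mode values and their outputs can be encoded as a small lookup, which is handy for rejecting a mistyped mode before any API call is made. This is only a sketch; mode_output is a hypothetical helper name, and the strings simply mirror the mode table above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print what a given operationMode produces,
# or fail for a value outside the five documented modes.
mode_output() {
  case "$1" in
    ttt)
      echo "MP4 file" ;;
    oc|doc_qa|voiceflow|plugin)
      echo "Hosted conversational URL" ;;
    *)
      echo "unknown operationMode: $1" >&2
      return 1 ;;
  esac
}
```

For example, `mode_output doc_qa` prints `Hosted conversational URL`, while `mode_output docqa` fails with an error on stderr.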
Step 2: List Available Faces
```bash
bash scripts/list-resources.sh faces
```
Each face has an id (used as headVisualId in creation). Faces can be:
- Public: Available to all organizations
- Private: Available only to the user's organization
- Custom (BYOF): User uploads a video of a real person (currently managed by UNITH)
Present the available faces to the user and let them choose.
Step 3: List Available Voices
```bash
bash scripts/list-resources.sh voices
```
Voices come from providers: elevenlabs, azure, audiostack. Present options to the user. Voices have performance rankings — faster voices are better for real-time conversation.
Step 4: Create the Digital Human
Build a JSON payload file (see references/api-payloads.md for the schema per mode), then:
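```bash
bash scripts/create-head.sh payload.json --dry-run   # validate first
bash scripts/create-head.sh payload.json             # create
```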
The script validates required fields, checks mode-specific requirements, retries on server errors, and prints the publicUrl on success.
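As an illustration of what a payload file might contain, the sketch below writes a minimal JSON file from shell variables. The field names headVisualId, operationMode, ttsVoice, and greetings all appear in this document, but whether this set is sufficient for creation is an assumption; references/api-payloads.md remains the authority on the schema. The function name write_payload and all sample values are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal payload file.
# Values are inserted verbatim, so they must not contain double quotes
# or backslashes; a real tool should use a proper JSON encoder.
write_payload() {
  local out="$1" face_id="$2" mode="$3" voice="$4" greeting="$5"
  cat > "$out" <<EOF
{
  "headVisualId": "$face_id",
  "operationMode": "$mode",
  "ttsVoice": "$voice",
  "greetings": "$greeting"
}
EOF
}
```

A call such as `write_payload payload.json "<face id>" oc rachel "Hello"` would produce a file that can then be checked with the `--dry-run` flag before creating anything.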
Step 5 (doc_qa only): Upload Knowledge Document
For doc_qa mode, the digital human needs a knowledge document:
```bash
bash scripts/upload-document.sh /path/to/document.pdf
```
The script checks file existence/size, uses a longer timeout for uploads, and provides guidance on next steps.
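The existence and size checks mentioned above might look roughly like the following. This is a sketch, not the code in upload-document.sh: the function name check_upload_file is hypothetical, and the 10 MiB default is a placeholder because this document does not state the real size limit.

```shell
#!/usr/bin/env bash
# Hypothetical pre-upload check. Prints the file size in bytes on success.
check_upload_file() {
  local file="$1"
  local max_bytes="${2:-10485760}"   # placeholder limit (10 MiB), not a documented value
  if [ ! -f "$file" ]; then
    echo "file not found: $file" >&2
    return 1
  fi
  local size
  size=$(( $(wc -c < "$file") ))     # arithmetic expansion strips any padding wc adds
  if [ "$size" -eq 0 ]; then
    echo "file is empty: $file" >&2
    return 1
  fi
  if [ "$size" -gt "$max_bytes" ]; then
    echo "file is $size bytes, over the $max_bytes byte limit" >&2
    return 1
  fi
  echo "$size"
}
```

Running the check first avoids spending the longer upload timeout on a path that was mistyped.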
Step 6: Test and Iterate
The digital human is live at the publicUrl from Step 4. The user should:
1. Visit the URL and test the conversation
2. Update configuration as needed (see below)
Updating a Digital Human
Use the update script to modify any parameter except the face (changing face requires creating a new head):
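```bash
bash scripts/update-head.sh updates.json                                   # from a JSON file
bash scripts/update-head.sh --field ttsVoice=rachel                        # single field
bash scripts/update-head.sh --field ttsVoice=rachel --field greetings=Hi!  # multiple fields
```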
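A `--field key=value` argument has to be split on the first `=` only, so that values containing `=` survive intact. The sketch below shows one way to do that with shell parameter expansion; field_to_json is a hypothetical name, and this is not taken from update-head.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "key=value" into a one-field JSON object.
# The value is inserted verbatim, so quotes and backslashes are not escaped.
field_to_json() {
  local pair="$1"
  case "$pair" in
    *=*) ;;
    *)
      echo "expected key=value, got: $pair" >&2
      return 1 ;;
  esac
  local key="${pair%%=*}"    # text before the first '='
  local value="${pair#*=}"   # text after the first '='
  printf '{"%s": "%s"}\n' "$key" "$value"
}
```

For instance, `field_to_json ttsVoice=rachel` prints `{"ttsVoice": "rachel"}`.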
Listing Existing Digital Humans
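```bash
bash scripts/list-resources.sh heads   # list all
bash scripts/list-resources.sh head    # get details for a single head
```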
Deleting a Digital Human
```bash
bash scripts/delete-head.sh --confirm   # always use --confirm in automated/agent contexts
```
This permanently removes the digital human and cannot be undone.
Agent note: Always pass --confirm when calling this script. Without it, the script prompts for interactive input and will hang.
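The hang described above is typical of any script that waits on `read` when no terminal is attached. As a sketch of a defensive pattern (not how delete-head.sh is implemented), a caller could put a guard like the hypothetical confirm_delete below in front of a destructive step, so that a missing flag fails immediately instead of blocking.

```shell
#!/usr/bin/env bash
# Hypothetical guard. Returns 0 if deletion may proceed, 1 otherwise.
confirm_delete() {
  if [ "${1:-}" = "--confirm" ]; then
    return 0
  fi
  # No terminal on stdin (agent, CI, pipe): refuse rather than wait for input.
  if [ ! -t 0 ]; then
    echo "refusing to prompt without a terminal; pass --confirm" >&2
    return 1
  fi
  printf 'Permanently delete this digital human? [y/N] '
  local answer
  read -r answer
  [ "$answer" = "y" ] || [ "$answer" = "Y" ]
}
```

The `[ -t 0 ]` test is true only when standard input is a terminal, which is what separates an interactive user from an automated caller.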
Embedding
Digital humans can be embedded in websites/apps. See references/embedding.md for code snippets and configuration options.
Scripts
All scripts include retry logic (exponential backoff), meaningful error messages, and input validation.
| Script | Purpose |
|---|---|
| scripts/utils.sh | Shared utilities: retry wrapper, colored logging, error parsing |
| scripts/auth.sh | Authenticate and export UNITH_TOKEN (with 6-day token caching) |
| scripts/list-resources.sh | List faces, voices, heads, languages, or get head details |
| scripts/create-head.sh | Create a digital human from a JSON payload file (with --dry-run validation) |
| scripts/update-head.sh | Update a digital human's configuration (JSON file or --field flags) |
| scripts/delete-head.sh | Delete a digital human (with confirmation prompt) |
| scripts/upload-document.sh | Upload knowledge document to a doc_qa head |
Configuration via environment variables:
- UNITH_MAX_RETRIES: max retry attempts (default: 3)
- UNITH_RETRY_DELAY: initial delay between retries in seconds (default: 2, doubles each retry)
- UNITH_CURL_TIMEOUT: curl timeout in seconds (default: 30, 120 for uploads)
- UNITH_CONNECT_TIMEOUT: connection timeout in seconds (default: 10)
- UNITH_TOKEN_CACHE: token cache file path (default: /tmp/.unith_token_cache, set empty to disable)
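To make the retry settings concrete, here is a minimal sketch of an exponential-backoff wrapper driven by UNITH_MAX_RETRIES and UNITH_RETRY_DELAY. It is not the code from scripts/utils.sh. The function name retry is hypothetical, and the sketch assumes that "max retry attempts" counts total attempts, including the first.

```shell
#!/usr/bin/env bash
# Hypothetical retry wrapper: run a command until it succeeds or the
# attempt budget is used up, doubling the delay after each failure.
retry() {
  local max="${UNITH_MAX_RETRIES:-3}"    # assumed to mean total attempts
  local delay="${UNITH_RETRY_DELAY:-2}"  # seconds before the second attempt
  local attempt=1
  while true; do
    "$@" && return 0
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

With the defaults, a command that keeps failing is tried three times, with waits of 2 and then 4 seconds in between.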
Detailed API Reference
For full payload schemas, configuration parameters, and mode-specific details:
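```
Read references/api-payloads.md    # Full request/response schemas per mode
Read references/configuration.md   # All configurable parameters
Read references/embedding.md       # Embed code and options
```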
Common Patterns
"I want a quick video of someone saying X" → ttt mode, minimal config
"I want a customer support avatar" → doc_qa mode with knowledge docs
"I want an AI sales rep" → oc mode with a sales personality prompt
"I want to connect my own LLM" → plugin mode with webhook URL
"I want a guided onboarding flow" → voiceflow mode with Voiceflow API key
Information to Collect from the User
Before creating, ask for:
1. Purpose / use case → determines operating mode
2. Face preference → list available faces for selection
3. Voice preference → language, accent, gender, speed priority
4. Alias → display name for the digital human
5. Language → speech recognition and UI language (e.g., en-US, es-ES)
6. Greeting message → initial message the avatar says
7. System prompt (for oc/doc_qa) → personality and behavior instructions
8. Knowledge documents (for doc_qa) → files to upload
9. Voiceflow API key (for voiceflow) → from their Voiceflow account
10. Plugin URL (for plugin) → webhook endpoint for their custom engine
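The checklist above splits into inputs every mode needs and inputs only some modes need. A small lookup can make that explicit for an agent deciding what to ask. In this sketch the function name required_inputs and the snake_case labels are hypothetical shorthand for the checklist items, not API field names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the inputs to collect for a given mode.
required_inputs() {
  local common="purpose face voice alias language greeting"
  case "$1" in
    ttt)       echo "$common" ;;
    oc)        echo "$common system_prompt" ;;
    doc_qa)    echo "$common system_prompt knowledge_documents" ;;
    voiceflow) echo "$common voiceflow_api_key" ;;
    plugin)    echo "$common plugin_url" ;;
    *)
      echo "unknown operationMode: $1" >&2
      return 1 ;;
  esac
}
```

An agent could loop over the output of `required_inputs "$mode"` and prompt only for the items it has not yet collected.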
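UNITH Digital Humans Skill
Create, configure, update, and deploy AI-powered Digital Human avatars using the UNITH API.
Quick Overview
UNITH digital humans are AI avatars that can speak, converse, and interact with users. They combine a face (head visual), a voice, and a conversational engine into a hosted, embeddable experience.
Base API URL: https://platform-api.unith.ai
Docs: https://docs.unith.ai
Prerequisites
The user must supply the following credentials (stored as environment variables):
| Variable | Description | How to obtain |
|---|---|---|
| UNITH_EMAIL | Account email | Register at https://unith.ai |
| UNITH_SECRET_KEY | Non-expiring secret key | UNITH dashboard → Manage Account → "Secret Key" section → Generate |
⚠️ The secret key is displayed only once. If lost, the user must delete and regenerate it.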
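Authentication
All API calls require a Bearer token (valid 7 days). Use the auth script:
```bash
source scripts/auth.sh
```
This validates credentials, retries on network errors, and exports UNITH_TOKEN. On failure, it prints specific guidance (wrong key, expired token, etc.).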
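Workflow: Creating a Digital Human
Step 1: Choose an Operating Mode
Ask the user what they want the digital human to do. Map their answer to one of 5 modes:
| Mode | operationMode value | Use case | Output |
|---|---|---|---|
| Text-to-Video | ttt | Generate an MP4 video of the avatar speaking provided text | MP4 file |
| Open Dialogue | oc | Free-form conversational avatar guided by a system prompt | Hosted conversational URL |
| Document Q&A | doc_qa | Avatar answers questions from uploaded documents | Hosted conversational URL |
| Voiceflow | voiceflow | Guided conversation flow via Voiceflow | Hosted conversational URL |
| Plugin | plugin | Connect any external LLM or conversational engine via webhook | Hosted conversational URL |
Complexity spectrum (simple → sophisticated):
- Simplest (ttt): just text in, video out. No knowledge base needed.
- Standard (oc): conversational with a system prompt. Good for general assistants.
- Knowledge-grounded (doc_qa): upload documents, avatar answers from them. Best for support/FAQ.
- Workflow-driven (voiceflow): structured conversation paths. Requires Voiceflow account.
- Most flexible (plugin): BYO conversational engine. Maximum control.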
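Step 2: List Available Faces
```bash
bash scripts/list-resources.sh faces
```
Each face has an id (used as headVisualId in creation). Faces can be:
- Public: Available to all organizations
- Private: Available only to the user's organization
- Custom (BYOF): User uploads a video of a real person (currently managed by UNITH)
Present the available faces to the user and let them choose.
Step 3: List Available Voices
```bash
bash scripts/list-resources.sh voices
```
Voices come from providers: elevenlabs, azure, audiostack. Present options to the user. Voices have performance rankings; faster voices are better for real-time conversation.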
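Step 4: Create the Digital Human
Build a JSON payload file (see references/api-payloads.md for the schema per mode), then:
```bash
bash scripts/create-head.sh payload.json --dry-run   # validate first
bash scripts/create-head.sh payload.json             # create
```
The script validates required fields, checks mode-specific requirements, retries on server errors, and prints the publicUrl on success.
Step 5 (doc_qa only): Upload Knowledge Document
For doc_qa mode, the digital human needs a knowledge document:
```bash
bash scripts/upload-document.sh /path/to/document.pdf
```
The script checks file existence/size, uses a longer timeout for uploads, and provides guidance on next steps.
Step 6: Test and Iterate
The digital human is live at the publicUrl from Step 4. The user should:
1. Visit the URL and test the conversation
2. Update configuration as needed (see below)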
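Updating a Digital Human
Use the update script to modify any parameter except the face (changing the face requires creating a new head):
```bash
bash scripts/update-head.sh updates.json                                   # from a JSON file
bash scripts/update-head.sh --field ttsVoice=rachel                        # single field
bash scripts/update-head.sh --field ttsVoice=rachel --field greetings=Hi!  # multiple fields
```
Listing Existing Digital Humans
```bash
bash scripts/list-resources.sh heads   # list all
bash scripts/list-resources.sh head    # get details for a single head
```
Deleting a Digital Human
```bash
bash scripts/delete-head.sh --confirm   # always use --confirm in automated/agent contexts
```
This permanently removes the digital human and cannot be undone.
Agent note: Always pass --confirm when calling this script. Without it, the script prompts for interactive input and will hang.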
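Embedding
Digital humans can be embedded in websites/apps. See references/embedding.md for code snippets and configuration options.
Scripts
All scripts include retry logic (exponential backoff), meaningful error messages, and input validation.
| Script | Purpose |
|---|---|
| scripts/utils.sh | Shared utilities: retry wrapper, colored logging, error parsing |
| scripts/auth.sh | Authenticate and export UNITH_TOKEN (with 6-day token caching) |
| scripts/list-resources.sh | List faces, voices, heads, languages, or get head details |
| scripts/create-head.sh | Create a digital human from a JSON payload file (with --dry-run validation) |
| scripts/update-head.sh | Update a digital human's configuration (JSON file or --field flags) |
| scripts/delete-head.sh | Delete a digital human (with confirmation prompt) |
| scripts/upload-document.sh | Upload knowledge document to a doc_qa head |
Configuration via environment variables:
- UNITH_MAX_RETRIES: max retry attempts (default: 3)
- UNITH_RETRY_DELAY: initial delay between retries in seconds (default: 2, doubles each retry)
- UNITH_CURL_TIMEOUT: curl timeout in seconds (default: 30, 120 for uploads)
- UNITH_CONNECT_TIMEOUT: connection timeout in seconds (default: 10)
- UNITH_TOKEN_CACHE: token cache file path (default: /tmp/.unith_token_cache, set empty to disable)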
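Detailed API Reference
For full payload schemas, configuration parameters, and mode-specific details:
```
Read references/api-payloads.md    # Full request/response schemas per mode
Read references/configuration.md   # All configurable parameters
Read references/embedding.md       # Embed code and options
```
Common Patterns
"I want a quick video of someone saying X" → ttt mode, minimal config
"I want a customer support avatar" → doc_qa mode with knowledge docs
"I want an AI sales rep" → oc mode with a sales personality prompt
"I want to connect my own LLM" → plugin mode with webhook URL
"I want a guided onboarding flow" → voiceflow mode with Voiceflow API key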
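Information to Collect from the User
Before creating, ask for:
1. Purpose / use case → determines operating mode
2. Face preference → list available faces for selection
3. Voice preference → language, accent, gender, speed priority
4. Alias → display name for the digital human
5. Language → speech recognition and UI language (e.g., en-US, es-ES)
6. Greeting message → initial message the avatar says
7. System prompt (for oc/doc_qa) → personality and behavior instructions
8. Knowledge documents (for doc_qa) → files to upload
9. Voiceflow API key (for voiceflow) → from their Voiceflow account
10. Plugin URL (for plugin) → webhook endpoint for their custom engine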