Stella Selfie
Generate persona-consistent selfie images using Google Gemini or fal (xAI Grok Imagine) and send them to messaging channels via OpenClaw. Supports multi-reference avatar blending for strong character consistency.
When to Use
- User says "send a pic", "send me a photo", "send a selfie", "发张照片", "发自拍"
- User says "show me what you look like...", "send a pic of you...", "展示你在..."
- User describes a scene: "send a pic wearing...", "send a pic at...", "穿着...发张图"
- User wants the agent to appear in a specific outfit, location, or situation
Prompt Modes
Mode 1: Mirror Selfie (default)
Best for: outfit showcases, full-body shots, fashion content

    A mirror selfie of this person, [user's context], showing the full-body reflection.

Mode 2: Direct Selfie
Best for: close-up portraits, location shots, emotional expressions

    A selfie of this person, [user's context], looking directly into the camera.

Mode 3: Third-Person Photo
Best for: non-selfie viewpoints, including explicit third-person requests and scenes that should not read as a selfie

    A natural third-person photo of this person, [user's context], natural composition, not a selfie.

Mode Selection Logic
| Signal | Auto-Select Mode |
|---|---|
| Strong user keywords: outfit, wearing, clothes, dress, suit, fashion | `mirror` |
| Strong user keywords: full-body, mirror, reflection, pose, show the look | `mirror` |
| Strong user keywords: selfie, close-up, portrait, face, eyes, smile, looking into the lens | `direct` |
| Strong user keywords: third-person, not a selfie, candid shot, 他拍, 路拍, 抓拍 | `third_person` |
| Legacy keywords: travel photo, tourist photo, 旅拍, 打卡照, 风景合影 | `third_person` |
Default policy:
- Interpret explicit user requirements first: camera style, outfit emphasis, body framing, scene, pose, and expression.
- Use `mirror` by default for outfit / full-body / self-presentation requests, even if the user did not explicitly mention a mirror.
- Use `direct` by default for selfie requests focused on face, emotion, immediacy, or in-the-moment presence.
- Use `third_person` only when the user explicitly asks for a non-selfie style or clearly describes a shot that should not read as a selfie.
Default mode when no keywords match and timeline is unavailable: `mirror`
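
The keyword table and default policy above can be sketched as a small lookup. This is a minimal illustration, not the skill's actual matcher: the `MODE_KEYWORDS` table and `selectMode` function are hypothetical names, and plain substring matching is an assumption that will over-match (for example, "address" contains "dress").

```javascript
// Hypothetical keyword table mirroring the rows above.
// third_person is checked first so that "not a selfie" is not caught by "selfie".
const MODE_KEYWORDS = [
  {
    mode: "third_person",
    keywords: ["third-person", "not a selfie", "candid shot", "他拍", "路拍", "抓拍",
      "travel photo", "tourist photo", "旅拍", "打卡照", "风景合影"],
  },
  {
    mode: "mirror",
    keywords: ["outfit", "wearing", "clothes", "dress", "suit", "fashion",
      "full-body", "mirror", "reflection", "pose", "show the look"],
  },
  {
    mode: "direct",
    keywords: ["selfie", "close-up", "portrait", "face", "eyes", "smile",
      "looking into the lens"],
  },
];

// Returns the first mode whose keyword list matches, else the documented default.
function selectMode(message) {
  const text = message.toLowerCase();
  for (const { mode, keywords } of MODE_KEYWORDS) {
    if (keywords.some((k) => text.includes(k))) return mode;
  }
  return "mirror"; // default when no keywords match
}
```

Checking `mirror` before `direct` reflects the policy that outfit and full-body requests win even when the user also says "selfie".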
Resolution Keywords
| User says | Resolution |
|---|---|
| (default) | `1K` |
| 2k, 2048, medium res, 中等分辨率 | `2K` |
| 4k, high res, ultra, 超清, 高分辨率 | `4K` |
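
A minimal sketch of the resolution lookup, assuming simple case-insensitive substring matching. `parseResolution` is a hypothetical helper name, not part of the skill's documented API.

```javascript
// Maps resolution keywords from the table above to a resolution label.
function parseResolution(message) {
  const text = message.toLowerCase();
  const fourK = ["4k", "high res", "ultra", "超清", "高分辨率"];
  const twoK = ["2k", "2048", "medium res", "中等分辨率"];
  if (fourK.some((k) => text.includes(k))) return "4K";
  if (twoK.some((k) => text.includes(k))) return "2K";
  return "1K"; // documented default
}
```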
Step-by-Step Instructions
Step 1: Collect User Input
Determine from the user's message:
- Explicit context (optional): scene, outfit, location, activity; detect from keywords
- Mode (optional): `mirror`, `direct`, or `third_person`; auto-detect from explicit user intent if not specified
- Target channel: Where to send (e.g., `#general`, `@username`, channel ID)
- Channel provider (optional): Which platform (discord, telegram, whatsapp, slack)
- Resolution (optional): 1K / 2K / 4K; default 1K
- Count (optional): How many images; default 1, only increase if explicitly requested
- Has explicit scene?: Does the request contain any specific scene/outfit/location/activity keywords?
Step 2: Enrich with Timeline Context Or Recent Scene Recall
`timeline_resolve` is an optional enhancement, not a prerequisite.
- If `timeline_resolve` is unavailable in the current environment, skip this step and proceed with Stella's default behavior.
- If the request is a current-state Sparse prompt (for example "发张自拍", "发张照片", "想看看你", "send a selfie", "send a photo", "show me what you look like") and `timeline_resolve` is available, load and follow `references/timeline-integration.md`.
- If the current request clearly refers back to a single recently resolved timeline scene in the current conversation, load and follow `references/timeline-integration.md` even if the photo request itself is not Sparse.
- If the user already provided a clear standalone scene, outfit, location, activity, or camera requirement and it is not a callback to a recently resolved timeline scene, do not use timeline enhancement. Follow the default policy directly.
- When you do call `timeline_resolve`, do not freely rewrite the request into output-slot questions. Use the fixed query rules in `references/timeline-integration.md`.
- Only enable Nano Banana real-world grounding when the prompt can explicitly include a concrete `city` plus an exact local date/time anchor from timeline data. If those anchors are missing, do not claim real-world synchronization.
- If timeline returns `fact.status === "empty"`, is missing `result.consumption`, or any error occurs, immediately fall back to Step 3 without mentioning timeline failure to the user.
Never block image generation on timeline availability. Timeline enrichment is best-effort and should only be used for current-state Sparse prompts or explicit callbacks to a recently resolved timeline scene.
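
The gating rules above reduce to two small checks. The sketch below is illustrative only: the function names and the boolean input flags are hypothetical, and the response shape (`response.fact.status`, `response.result.consumption`) is an assumption inferred from the field names mentioned in the fallback rule.

```javascript
// Decide whether timeline enrichment should be attempted at all.
// All three flags are hypothetical inputs computed elsewhere.
function shouldUseTimeline({ timelineAvailable, isSparse, isRecentSceneCallback }) {
  if (!timelineAvailable) return false; // optional enhancement, never a prerequisite
  return Boolean(isSparse || isRecentSceneCallback); // standalone scenes skip enrichment
}

// Decide whether a timeline response is usable or whether to fall back to Step 3.
// The response shape here is an assumption, not a documented contract.
function isUsableTimelineResponse(response) {
  if (!response) return false;
  if (response.fact?.status === "empty") return false;
  return response.result?.consumption != null;
}
```

Any thrown error from the timeline call would be caught by the caller and treated the same as an unusable response, so image generation is never blocked.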
Step 3: Assemble Prompt
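Select mode from the default policy first.
If the request is Sparse, and you loaded `references/timeline-integration.md` and obtained usable timeline context, apply its Sparse-only merge and prompt rules.
When that timeline enrichment includes outdoor real-world grounding, keep the grounding clause as a separate strong instruction sentence rather than a soft atmosphere phrase like "Make it feel like...".
Otherwise, use the user's explicit context directly and keep Stella's original fallback behavior:

    [mirror] A mirror selfie of this person, [user's explicit context, if any], showing the full-body reflection.
    [direct] A selfie of this person, [user's explicit context, if any], looking directly into the camera.
    [third_person] A natural third-person photo of this person, [user's explicit context, if any], natural composition, not a selfie.
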
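
The fallback assembly for the three modes can be sketched as a template lookup. The template wording below is a paraphrase rather than the skill's exact strings, and `FALLBACK_TEMPLATES` and `assemblePrompt` are hypothetical names.

```javascript
// Paraphrased fallback templates, one per mode; `ctx` is either "" or ", <context>".
const FALLBACK_TEMPLATES = {
  mirror: (ctx) => `A mirror selfie of this person${ctx}, showing the full-body reflection.`,
  direct: (ctx) => `A selfie of this person${ctx}, looking directly into the camera.`,
  third_person: (ctx) =>
    `A natural third-person photo of this person${ctx}, natural composition, not a selfie.`,
};

// Builds the prompt; the user context is optional and omitted cleanly when empty.
function assemblePrompt(mode, userContext = "") {
  const template = FALLBACK_TEMPLATES[mode] || FALLBACK_TEMPLATES.mirror;
  const trimmed = userContext.trim();
  return template(trimmed ? `, ${trimmed}` : "");
}
```

Unknown modes fall back to `mirror`, matching the default mode stated in the selection logic.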
Step 4: Generate Image
Run the Stella script:
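
    node {baseDir}/dist/scripts/skill.js \
      --prompt "<assembled prompt>" \
      --target "<target channel>" \
      --channel "<channel provider>" \
      --caption "<caption text>" \
      --resolution <1K|2K|4K> \
      --count <count>
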
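
When the script is launched programmatically, building an argument array avoids shell-quoting problems with prompts that contain quotes or spaces. The sketch below assumes the flags `--prompt`, `--target`, `--channel`, `--caption`, `--resolution`, and `--count`; `buildSkillArgs` is a hypothetical helper, and whether `skill.js` accepts an omitted `--channel` or `--caption` is an assumption based on those inputs being optional in Step 1.

```javascript
// Builds the argv array for skill.js from the inputs collected in Step 1.
function buildSkillArgs({ prompt, target, channel, caption, resolution = "1K", count = 1 }) {
  const args = ["--prompt", prompt, "--target", target];
  if (channel) args.push("--channel", channel); // optional channel provider
  if (caption) args.push("--caption", caption); // optional caption text
  args.push("--resolution", resolution, "--count", String(count));
  return args;
}
```

The resulting array can be passed to Node's `child_process.execFile("node", [scriptPath, ...args], callback)`, which does not spawn a shell by default.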
Step 5: Confirm Result
After the script completes, confirm to the user:
- Image was generated successfully
- Image was sent to the target channel
- If any error occurred, send a concise actionable failure message
Environment Variables
Stella supports multiple providers and a gateway-backed send path, so its sensitive runtime environment variables are explicitly declared in `metadata.openclaw.requires.env` for OpenClaw's env-injection allowlist.
The skill also sets metadata.openclaw.always: true, so these declarations do not become hard load-time gates.
Actual credential validation remains runtime-driven inside skill.js, based on the selected provider.
| Variable | Required | Description |
|---|---|---|
| `GEMINI_API_KEY` | Required (if `Provider=gemini`) | Google Gemini API key |
| `FAL_KEY` | Required (if `Provider=fal`) | fal.ai API key |
| `LAOZHANG_API_KEY` | Required (if `Provider=laozhang`) | laozhang.ai API key (`sk-xxx`); get it at api.laozhang.ai |
| `Provider` | Optional | Image provider: `gemini`, `fal`, or `laozhang` |
| `AvatarBlendEnabled` | Optional | Enable or disable multi-reference avatar blending |
| `AvatarMaxRefs` | Optional | Maximum number of reference images to blend |
Credential requirements are provider-specific:
- Default `Provider=gemini`: requires `GEMINI_API_KEY`
- `Provider=fal`: requires `FAL_KEY`
- `Provider=laozhang`: requires `LAOZHANG_API_KEY`
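
A minimal sketch of provider-specific credential validation at runtime. It illustrates the rule above and is not the code in `skill.js`; `REQUIRED_KEY` and `checkCredentials` are hypothetical names.

```javascript
// Which credential each provider needs, per the list above.
const REQUIRED_KEY = {
  gemini: "GEMINI_API_KEY",
  fal: "FAL_KEY",
  laozhang: "LAOZHANG_API_KEY",
};

// Validates only the key needed by the selected provider (gemini when unset).
function checkCredentials(env) {
  const provider = env.Provider || "gemini";
  const keyName = REQUIRED_KEY[provider];
  if (!keyName) return { ok: false, error: `Unknown Provider: ${provider}` };
  if (!env[keyName]) {
    return { ok: false, error: `${keyName} is required when Provider=${provider}` };
  }
  return { ok: true, provider };
}
```

Calling `checkCredentials(process.env)` before generation would produce the kind of concise, actionable failure message that Step 5 asks for.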
Media File Handling (Gemini)
When `Provider=gemini`, Stella writes generated files to `~/.openclaw/workspace/stella-selfie/`.
After successful send, Stella deletes the local file immediately. If send fails, the file is kept for debugging.
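
The delete-after-send behavior can be sketched as follows. This is an illustration under assumptions: `send` and `remove` are hypothetical injected callbacks (for example a wrapper around the OpenClaw CLI and `fs.unlinkSync`), and the real script may be asynchronous.

```javascript
// Sends a generated file, deleting it only when the send succeeds.
// `send(filePath)` must throw on failure; `remove(filePath)` deletes the file.
function sendThenCleanup(filePath, send, remove) {
  try {
    send(filePath);
  } catch (err) {
    // Keep the file so the failure can be debugged.
    return { sent: false, kept: filePath, error: err.message };
  }
  remove(filePath); // immediate cleanup after a successful send
  return { sent: true, kept: null };
}
```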
Skill Environment Options
Configure in your OpenClaw openclaw.json under skills.entries.stella-selfie.env:
| Option | Default | Description |
|---|---|---|
| `Provider` | `gemini` | Image provider: `gemini`, `fal`, or `laozhang` |
| `AvatarBlendEnabled` | `true` | Enable multi-reference avatar blending |
| `AvatarMaxRefs` | `3` | Maximum number of reference images to blend |
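
A minimal sketch of applying these defaults, assuming the options arrive as strings in the environment. `resolveOptions` is a hypothetical helper, and treating any value other than "false" as enabled is an assumption, not documented behavior.

```javascript
// Resolves skill options from an env-like object, applying the documented defaults.
function resolveOptions(env = {}) {
  const maxRefs = Number.parseInt(env.AvatarMaxRefs ?? "3", 10);
  return {
    provider: env.Provider || "gemini",
    // Assumption: blending stays on unless explicitly set to "false".
    avatarBlendEnabled: String(env.AvatarBlendEnabled ?? "true").toLowerCase() !== "false",
    // Fall back to 3 when the value is missing or not a positive integer.
    avatarMaxRefs: Number.isInteger(maxRefs) && maxRefs > 0 ? maxRefs : 3,
  };
}
```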
Note for Provider=fal users: fal's image editing API only accepts HTTP/HTTPS image URLs. Local file paths (from Avatar / AvatarsDir) are not supported. Configure AvatarsURLs in IDENTITY.md with public URLs of your reference images to enable image editing with fal.
Note for Provider=laozhang users: laozhang.ai uses the Google-native Gemini API format (gemini-3-pro-image-preview). It requires local reference images from Avatar / AvatarsDir and does not use AvatarsURLs. Supports 1K/2K/4K resolution and 10 aspect ratios. Get your API key at api.laozhang.ai — remember to configure a billing mode in the token settings before use.
Delivery Path
- Stella sends via `openclaw message send`.
- Delivery auth and routing are handled by the local OpenClaw installation, not by skill-level gateway tokens.
External Endpoints And Data Flow
| Endpoint / path | When used | Data sent |
|---|---|---|
| Google Gemini API | `Provider=gemini` | Prompt text and selected local reference images from `Avatar` / `AvatarsDir` |
| fal API | `Provider=fal` | Prompt text and public reference image URLs from `AvatarsURLs` |
| laozhang.ai API (`api.laozhang.ai`) | `Provider=laozhang` | Prompt text and local reference images (`Avatar` / `AvatarsDir`, uploaded as base64) |
| Local OpenClaw CLI | Always for delivery | Target channel, target id, caption text, and generated media path/URL |
Security And Privacy
- Stella reads `~/.openclaw/workspace/IDENTITY.md` and local avatar files to build reference context.
- Under `Provider=gemini`, selected local avatar images are uploaded to Gemini as part of normal image generation.
- Under `Provider=fal`, only public `http`/`https` avatar URLs are sent; local avatar files are not uploaded to fal directly.
- Under `Provider=laozhang`, local avatar files from `Avatar` / `AvatarsDir` are base64-encoded and uploaded to laozhang.ai.
- Generated files (Gemini and laozhang) are written to `~/.openclaw/workspace/stella-selfie/` and deleted after successful send.
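
The per-provider data flow above can be summarized in a small routing function. This is a sketch under assumptions: `referencePlan` and the `identity` fields (`avatar`, `avatarsDirFiles`, `avatarsURLs`) are hypothetical names standing in for the parsed `IDENTITY.md` values.

```javascript
// Chooses which reference images leave the machine for a given provider.
function referencePlan(provider, identity) {
  const localFiles = [identity.avatar, ...(identity.avatarsDirFiles || [])].filter(Boolean);
  switch (provider) {
    case "fal":
      // Only public http/https URLs are sent; local files are never uploaded to fal.
      return {
        kind: "urls",
        refs: (identity.avatarsURLs || []).filter((u) => /^https?:\/\//i.test(u)),
      };
    case "gemini":
    case "laozhang":
      // Local avatar files are uploaded (base64-encoded for laozhang).
      return { kind: "local", refs: localFiles };
    default:
      throw new Error(`Unknown Provider: ${provider}`);
  }
}
```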
User Configuration
Before using this skill, you must configure your OpenClaw workspace. See templates/SOUL.fragment.md for the recommended capability snippet to add to your SOUL.md.
Required: IDENTITY.md
Add the following fields to ~/.openclaw/workspace/IDENTITY.md:
CODEBLOCK5
- `Avatar`: Path to your primary reference image (relative to workspace root)
- `AvatarsDir`: Directory containing multiple reference photos of the same character (different styles, scenes, outfits)
- `AvatarsURLs`: Comma-separated public URLs of reference images; required for `Provider=fal` (local files are not supported by fal's API)
Required: avatars/ Directory
Place your reference photos in ~/.openclaw/workspace/avatars/:
- Use `jpg`, `jpeg`, `png`, or `webp` format
- All photos should be of the same character
- Different styles, scenes, outfits, and expressions work best
- Images are selected by creation time (newest first)
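
The selection rule above (supported formats, newest first, capped by `AvatarMaxRefs`) can be sketched as a pure function. `pickReferenceImages` and the `{ name, createdMs }` input shape are hypothetical; in practice `createdMs` could come from `fs.statSync(path).birthtimeMs`.

```javascript
const ALLOWED_EXTENSIONS = new Set(["jpg", "jpeg", "png", "webp"]);

// Returns up to `maxRefs` supported image names, newest creation time first.
function pickReferenceImages(files, maxRefs = 3) {
  return files
    .filter((f) => ALLOWED_EXTENSIONS.has(f.name.split(".").pop().toLowerCase()))
    .sort((a, b) => b.createdMs - a.createdMs) // newest first
    .slice(0, maxRefs)
    .map((f) => f.name);
}
```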
Required: SOUL.md
Add the Stella capability block to ~/.openclaw/workspace/SOUL.md. See README.md ("4. SOUL.md") for the copy/paste snippet.
Installation
CODEBLOCK6
After installation, complete the configuration steps above before using the skill.