clawdess

Reference Image

The user should define the reference image.

When to Use

Photo:

- User says "send a pic", "send me a pic", "send a photo", "send a selfie"
- User says "send a pic of you...", "send a selfie of you..."
- User asks "what are you doing?", "how are you doing?", "where are you?"
- User describes a context: "send a pic wearing...", "send a pic at..."

Video:

- User says "send a video"
- User says "send a video of you..."
- User says "send a video wearing...", "send a video at..."

Voice:

- User says "talk to me", "send me a voice message", "send a voice note"
- User wants to hear Clawdess's voice
- Any situation where a voice message would be better than text
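
The trigger phrases above can be sketched as a simple keyword router. This is only an illustration: `pick_subcommand` and its keyword lists are hypothetical and not part of the clawdess CLI, and a real agent would likely rely on its own judgment rather than substring matching.

```python
from typing import Optional

# Hypothetical keyword lists distilled from the trigger phrases above.
VIDEO_HINTS = ("video",)
VOICE_HINTS = ("talk to me", "voice message", "voice note")
PHOTO_HINTS = ("pic", "photo", "selfie", "what are you doing",
               "how are you doing", "where are you")


def pick_subcommand(message: str) -> Optional[str]:
    """Return 'video', 'voice', 'photo', or None for a user message."""
    text = message.lower()
    # Video and voice are checked before the broader photo hints.
    if any(hint in text for hint in VIDEO_HINTS):
        return "video"
    if any(hint in text for hint in VOICE_HINTS):
        return "voice"
    if any(hint in text for hint in PHOTO_HINTS):
        return "photo"
    return None
```

Substring matching is deliberately crude here (for example, "pic" also matches "topic"), so treat the result as a hint rather than a decision.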

Subcommands

The CLI has three independent subcommands:

| Subcommand | Purpose |
|---|---|
| `photo` | Generate an AI-edited photo from a reference image |
| `video` | Generate a video from an image |
| `voice` | Generate a voice message via TTS |

API Keys

| Subcommand | Flag | Environment Variable | Notes |
|---|---|---|---|
| `photo` | `--api` | `CLAWDESSPHOTOAPI` | |
| `video` | | | |

Providers

| Type | Available Providers | Default |
|---|---|---|
| Photo | FAL, HUOSHANYUN | FAL |
| Video | | |

Photo Mode

Workflow

1. Get the user's prompt describing how to edit the image
2. Edit the image via the AI provider, using the fixed reference image
3. Extract the image URL from the response

Prompt Crafting

Before writing any prompt, think about the scene context:

1. Where is she? Be specific about the location (living room, bedroom, kitchen, cafe, park, office). This anchors the whole image.
2. What time is it? Morning, afternoon, evening, or late night. This affects lighting and mood, so the prompt must reflect the current time.
3. What is she wearing? Match the outfit to the location and time, for example pajamas at home late at night, casual clothes at a cafe, workout clothes at the gym. She also has her own go-to outfit. Don't put her in a dress at the gym.
4. What is she doing? The pose or action should feel natural for the setting: cooking in the kitchen, reading on the couch, stretching after a workout.
5. What expression? Match the mood: sleepy smile for late night, energetic grin for morning, playful wink for teasing.
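
Because the scene has to reflect the current time, one option is to derive a time-of-day label from the clock before writing the prompt. The sketch below is hypothetical: `time_of_day`, the hour boundaries, and the lighting hints are assumptions chosen for illustration, not values defined by clawdess.

```python
from datetime import datetime
from typing import Optional, Tuple


def time_of_day(hour: int) -> Tuple[str, str]:
    """Map an hour (0-23) to a (label, lighting hint) pair.

    The boundaries below are assumed, not taken from clawdess.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if 5 <= hour < 12:
        return "morning", "bright morning light"
    if 12 <= hour < 17:
        return "afternoon", "warm afternoon sunlight"
    if 17 <= hour < 22:
        return "evening", "soft evening light"
    return "late night", "warm dim lamp light"


def current_time_of_day(now: Optional[datetime] = None) -> Tuple[str, str]:
    """Use the local clock unless a datetime is supplied (handy in tests)."""
    return time_of_day((now or datetime.now()).hour)
```

The returned lighting hint can then be dropped into the lighting slot of the prompt so that outfit, lighting, and mood stay consistent with the hour.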

Key rules:

- Always start the prompt with `Render this image as make`
- Always end with `WITHOUT Depth of field.` (keeps the image looking like a real phone camera shot)
- Keep it coherent: outfit, location, lighting, and expression must all match
- For selfie types, use `Normal phone camera selfie photo. Phone camera photo quality` to keep it realistic
- Don't over-describe: one clear scene beats a wall of adjectives
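
The mechanical parts of these rules can be checked before a prompt is sent. A minimal sketch, assuming the required prefix is `Render this image as make` as the example prompts suggest; `check_photo_prompt` is a hypothetical helper and not part of the clawdess script.

```python
from typing import List

PREFIX = "Render this image as make"
SUFFIX = "WITHOUT Depth of field."
SELFIE_PHRASE = "Normal phone camera selfie photo. Phone camera photo quality"


def check_photo_prompt(prompt: str, selfie: bool = False) -> List[str]:
    """Return a list of rule violations; an empty list means the prompt passes."""
    problems = []
    text = prompt.strip()
    if not text.startswith(PREFIX):
        problems.append("missing required prefix")
    if not text.endswith(SUFFIX):
        problems.append("missing required suffix")
    if selfie and SELFIE_PHRASE not in text:
        problems.append("selfie prompt lacks the phone camera phrase")
    return problems
```

Coherence between outfit, location, lighting, and expression still needs human or model judgment; this only catches the fixed strings.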

Prompt Templates

Every prompt must cover all 5 checklist items: where, when (lighting), outfit, action/pose, expression.

Type 1: Mirror Selfie — outfit showcases, full-body shots

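Render this image as make make a pic of this person, a full body photo but [outfit]. the person is taking a mirror selfie in [location], [lighting], [action/pose], [expression]. Normal phone camera selfie photo. Phone camera photo quality WITHOUT Depth of field.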

Examples:

Render this image as make make a pic of this person, a full body photo but wearing oversized pajamas and fuzzy slippers. the person is taking a mirror selfie in her bedroom, warm dim lamp light at night, one hand on hip leaning slightly against the doorframe, sleepy half-smile with messy hair falling over one eye. Normal phone camera selfie photo. Phone camera photo quality WITHOUT Depth of field.

Render this image as make make a pic of this person, a full body photo but wearing a black sports bra and leggings with sneakers. the person is taking a mirror selfie at the gym, bright overhead fluorescent lighting, flexing one arm with the other holding the phone, confident grin with a light sheen of sweat on her forehead. Normal phone camera selfie photo. Phone camera photo quality WITHOUT Depth of field.

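Render this image as make make a pic of this person, a full body photo but wearing a casual white tee and denim shorts with sandals. the person is taking a mirror selfie in a hotel room, soft afternoon sunlight through sheer curtains, standing relaxed with one knee slightly bent, playful peace sign near her face with a bright smile. Normal phone camera selfie photo. Phone camera photo quality WITHOUT Depth of field.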

Type 2: Non-Selfie — location/portrait focus

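Render this image as make make a pic of this person, [outfit]. by herself at [location + details], [lighting], [action/pose], looking straight into the lens, eyes centered and clearly visible, [expression]. WITHOUT Depth of field.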

Examples:

Render this image as make make a pic of this person, wearing a cozy cream knit sweater and jeans. by herself at a cafe window seat with a latte on the table, warm golden afternoon sunlight streaming through the glass, chin resting on one hand with elbow on the table, looking straight into the lens, eyes centered and clearly visible, soft relaxed smile with a dreamy gaze. WITHOUT Depth of field.

Render this image as make make a pic of this person, wearing a light sundress with a straw hat. by herself at a park bench under cherry blossom trees, bright spring morning light with soft pink petals in the air, sitting with legs crossed holding a book in her lap, looking straight into the lens, eyes centered and clearly visible, gentle warm smile with sunlight catching her eyes. WITHOUT Depth of field.

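Render this image as make make a pic of this person, wearing an oversized hoodie with the hood half up. by herself on a rooftop with city lights behind her, cool blue evening twilight just after sunset, leaning on the railing with both arms, looking straight into the lens, eyes centered and clearly visible, calm thoughtful expression with a slight smirk. WITHOUT Depth of field.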
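
The two prompt types can be assembled from the five checklist items with plain string templates. This is a sketch that follows the pattern of the example prompts above; `build_photo_prompt` and the `mirror` / `non_selfie` type names are hypothetical, not identifiers from the clawdess script.

```python
MIRROR_TEMPLATE = (
    "Render this image as make make a pic of this person, a full body photo "
    "but {outfit}. the person is taking a mirror selfie in {location}, "
    "{lighting}, {action}, {expression}. Normal phone camera selfie photo. "
    "Phone camera photo quality WITHOUT Depth of field."
)
NON_SELFIE_TEMPLATE = (
    "Render this image as make make a pic of this person, {outfit}. "
    "by herself at {location}, {lighting}, {action}, looking straight into "
    "the lens, eyes centered and clearly visible, {expression}. "
    "WITHOUT Depth of field."
)
TEMPLATES = {"mirror": MIRROR_TEMPLATE, "non_selfie": NON_SELFIE_TEMPLATE}


def build_photo_prompt(kind: str, outfit: str, location: str, lighting: str,
                       action: str, expression: str) -> str:
    """Fill one of the two templates; every checklist item is required."""
    if kind not in TEMPLATES:
        raise ValueError(f"unknown prompt type: {kind}")
    fields = {"outfit": outfit, "location": location, "lighting": lighting,
              "action": action, "expression": expression}
    missing = [name for name, value in fields.items() if not value.strip()]
    if missing:
        raise ValueError("missing checklist items: " + ", ".join(missing))
    return TEMPLATES[kind].format(**fields)
```

Refusing to build a prompt when any of the five items is empty enforces the "cover all 5 checklist items" rule at the point where it is cheapest to catch.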

Common Mistakes to Avoid

- Saying "at home" without specifying which room: be specific (living room, bedroom, kitchen)
- Outfit that doesn't match the setting: no heels at the beach, no pajamas at a restaurant
- Forgetting lighting: indoor at night needs warm lamp light, not bright sunlight
- Generic expressions: "smiling" is weak; "sleepy half-smile with one eye squinting" is vivid

Execute Photo

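    python3 {baseDir}/scripts/clawdess.py photo \
      --api CLAWDESSPHOTOAPI \
      --prompt "<your prompt>" \
      --image "<reference image URL>"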

Optional flags: --provider FAL|HUOSHANYUN
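
When the photo subcommand is invoked from Python rather than a shell, building the argument list explicitly avoids quoting problems with long prompts. A minimal sketch, assuming the subcommand takes `--api`, `--prompt`, `--image`, and the optional `--provider`; `build_photo_command` is a hypothetical helper, and the script path is whatever `{baseDir}/scripts/clawdess.py` resolves to in your setup.

```python
from typing import List, Optional

PHOTO_PROVIDERS = ("FAL", "HUOSHANYUN")


def build_photo_command(script: str, api_key: str, prompt: str,
                        image_url: str,
                        provider: Optional[str] = None) -> List[str]:
    """Build the argv list for the photo subcommand (does not run it)."""
    cmd = ["python3", script, "photo",
           "--api", api_key, "--prompt", prompt, "--image", image_url]
    if provider is not None:
        if provider not in PHOTO_PROVIDERS:
            raise ValueError(f"unknown photo provider: {provider}")
        cmd += ["--provider", provider]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`; omitting `--provider` leaves the choice to the script's default (FAL for photos).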

Video Mode

Workflow

1. Use `--image` as the source (either a previously generated photo URL or any image URL)
2. Generate the video from the image via the AI provider

Video Prompt Crafting

The video prompt describes what happens next in the scene from the photo. Think of the photo as frame 1 — the video prompt is what she does after that moment. The video is 10-15 seconds long, so the prompt must describe enough action to fill that time. Short prompts = dead air where nothing happens.

Key rules:

- Fill the full duration: describe a sequence of 3-4 connected actions with pacing words (slowly, then, gradually, after that). A single action like "she waves" gives you 2 seconds of content and 13 seconds of nothing.
- Continue the scene: if the photo is in a kitchen cooking, the video should be her stirring, tasting, turning around. Don't teleport her to a different location.
- Keep it physical: describe body movements, not abstract concepts. "walks to the couch and sits down" not "feels relaxed".
- Add micro-movements: hair tucks, weight shifts, lip bites, blinking, head tilts. These fill gaps between main actions and make it look natural.
- Match the energy: sleepy photo = slow gentle movements. Energetic photo = bouncy, lively motion.
- Mention the camera: if she's facing the camera, include eye contact, glances, or reactions toward the viewer.

Prompt structure (aim for 2-3 sentences minimum):
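[Main action 1 with a pacing word], [micro-movement or transition], [main action 2], [final action or camera interaction]. [Overall mood/movement style].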

Examples (notice the detail and length):

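- Photo at living room couch → `She slowly reaches for the remote on the coffee table, leans back into the couch cushions, and crosses her legs. She tucks a strand of hair behind her ear, gives the camera a soft smile, then pulls a blanket over her legs and settles in. Smooth natural movement, warm cozy mood.`
- Photo at kitchen counter → INLINECODE18
- Photo in bed, late night → INLINECODE19
- Photo at a park → INLINECODE20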

Common Mistakes to Avoid

- Too short: `she smiles and waves` is ~2 seconds of action for a 15-second video. Always describe 3-4 sequential actions.
- Action that contradicts the photo: sitting down when the photo shows her already sitting
- Forgetting the camera: if she's facing the camera in the photo, the video should acknowledge that (eye contact, waving, etc.)
- No pacing words: without "slowly", "then", "gradually", the AI rushes through everything in the first 3 seconds
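
Two of these mistakes (too short, no pacing words) are easy to detect automatically. A rough sketch, assuming English prompts; `check_video_prompt` is a hypothetical helper, and the sentence threshold simply mirrors the "2-3 sentences minimum" guideline.

```python
import re
from typing import List

PACING_WORDS = ("slowly", "then", "gradually", "after that")


def check_video_prompt(prompt: str) -> List[str]:
    """Return warnings for prompts that are too short or lack pacing words."""
    warnings = []
    # Split on sentence-ending punctuation and ignore empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", prompt) if s.strip()]
    if len(sentences) < 2:
        warnings.append("fewer than 2 sentences")
    text = prompt.lower()
    if not any(re.search(rf"\b{re.escape(w)}\b", text) for w in PACING_WORDS):
        warnings.append("no pacing words")
    return warnings
```

Contradictions with the source photo and missing camera interaction still have to be judged against the photo itself.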

Execute Video

CODEBLOCK10

Optional flags: INLINECODE22

Photo + Video Together

When the user requests a video, first generate the photo, then use the generated photo URL as --image for the video subcommand:

CODEBLOCK11
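
The chaining step can be sketched in Python as follows. Several things here are assumptions and not confirmed by this document: that the photo subcommand prints the image URL somewhere in its output, and that the video subcommand accepts `--prompt`. Only `--image` is documented for video. The video API key flag is not shown here, so it is passed through the hypothetical `extra_args` parameter.

```python
import re
from typing import List, Optional, Sequence

# Matches an http(s) URL up to whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_first_url(output: str) -> Optional[str]:
    """Return the first URL found in the script output, if any."""
    match = URL_PATTERN.search(output)
    return match.group(0) if match else None


def build_video_command(script: str, photo_output: str, prompt: str,
                        extra_args: Sequence[str] = ()) -> List[str]:
    """Build the video argv from the photo step's output (does not run it)."""
    image_url = extract_first_url(photo_output)
    if image_url is None:
        raise ValueError("no photo URL found in photo output")
    # --prompt is an assumed flag name; --image comes from the workflow above.
    return ["python3", script, "video",
            "--image", image_url, "--prompt", prompt, *extra_args]
```

Failing fast when no URL is found keeps a failed photo step from silently producing a video request with an empty source image.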

Voice Mode

Workflow

1. Get the user's prompt for what Clawdess should say
2. Generate the voice message via the TTS provider
3. Extract the voice URL from the response

Voice Prompt Crafting

Write what she actually says — natural speech, not a script description. The TTS engine reads it literally.

Key rules:

- Match the moment: if she just sent a sleepy bedtime photo, the voice should sound cozy and gentle, not hyper
- Keep it short: under 30 seconds. One or two sentences is ideal. Long monologues sound robotic.
- Use natural fillers: "hmm", "hehe", "aww" make it sound human
- Stay in character: match the personality defined in IDENTITY.md / SOUL.md

Examples by context:

- Morning: INLINECODE24
- Late night: INLINECODE25
- Playful: INLINECODE26
- Missing someone: INLINECODE27

Common Mistakes to Avoid

- Writing stage directions: `(whispers softly)` won't work because the TTS reads it literally
- Too formal: "I would like to inform you" sounds like a robot, not a person
- Mismatch with photo/video: if she just sent a gym selfie, don't send a sleepy voice note
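
Stage directions and overlong text can be caught before calling the TTS provider. A rough sketch with loud assumptions: the 75-word limit assumes about 150 spoken words per minute and only makes sense for space-separated languages (it will not work for Chinese text), and `check_voice_text` is a hypothetical helper.

```python
import re
from typing import List

# Assumed speaking rate of ~150 words per minute, so ~75 words in 30 seconds.
MAX_WORDS = 75
# Parenthesised, bracketed, or *starred* asides look like stage directions.
STAGE_DIRECTION = re.compile(r"\([^)]*\)|\[[^\]]*\]|\*[^*]+\*")


def check_voice_text(text: str) -> List[str]:
    """Return problems that would make the TTS output sound wrong."""
    problems = []
    if STAGE_DIRECTION.search(text):
        problems.append("contains a stage direction")
    if len(text.split()) > MAX_WORDS:
        problems.append("likely longer than 30 seconds")
    return problems
```

Tone mismatches with a preceding photo or video cannot be detected this way and still depend on context.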

Execute Voice

CODEBLOCK12

Example:
CODEBLOCK13

Optional flags: --api, --provider ALIYUN|ZAI

Output

If the script returns a URL, respond with "MEDIA:" followed by the URL; otherwise, upload the file.
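
The branching in this rule can be sketched as below. `plan_reply` is a hypothetical helper; whether a space follows `MEDIA:` is not specified here, so the sketch assumes none, and the actual upload step is left to the caller.

```python
from typing import Tuple
from urllib.parse import urlparse


def plan_reply(script_output: str) -> Tuple[str, str]:
    """Return ('reply', 'MEDIA:<url>') for a URL, else ('upload', <path>)."""
    value = script_output.strip()
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # Assumed format: no space between the marker and the URL.
        return "reply", f"MEDIA:{value}"
    return "upload", value
```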

Error Handling

- API key missing: Ensure the API key is set in the environment or passed as an argument
- Image/voice generation failed: Check the prompt content and API quota
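
The missing-key case can be surfaced early with a small lookup. A minimal sketch, assuming an explicit `--api` value takes precedence over the environment variable; that precedence and the `resolve_api_key` helper are assumptions, while the `CLAWDESSPHOTOAPI` name is the one this skill uses for the photo subcommand.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(cli_value: Optional[str], env_var: str,
                    environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the key from the flag if given, else from the environment."""
    env = os.environ if environ is None else environ
    key = cli_value or env.get(env_var, "")
    if not key:
        raise ValueError(f"API key missing: pass --api or set {env_var}")
    return key
```

Calling `resolve_api_key(args_api, "CLAWDESSPHOTOAPI")` before any generation request turns a vague provider error into a clear local one.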

Tips

1. Mirror mode context examples (outfit focus):

- "wearing a santa hat", "in a business suit", "wearing a summer dress"

2. Direct mode context examples (location/portrait focus):

- "a cozy cafe with warm lighting", "a sunny beach at sunset"

3. Voice style: Uses "Chelsie" voice (female, Chinese) by default. Keep voice messages short (under 30 seconds).

4. Scheduling: Combine with OpenClaw scheduler for automated posts
