Seedance 2.0 Prompt Generator
Generate production-ready prompts for ByteDance's Seedance 2.0 AI video model.
Prompt Architecture
Every prompt follows this strict order. Deviating causes drift.
Subject → Action → Camera → Style → Audio → Constraints
1. Subject (WHO/WHAT)
- One primary subject. Multiple subjects split the model's attention.
- Include: age/material, clothing, distinguishing features
- Example: "Wooden Koda creature with glowing orange eyes, green gems on head, purple cape"
2. Action (WHAT HAPPENS)
- Specific verb phrases, present tense
- Describe beat by beat for complex sequences
- One action per beat. Chain beats chronologically.
- Example: "walks to cliff edge, pauses, turns head slowly to camera, cape billowing"
3. Camera (HOW WE SEE IT)
- Shot size FIRST: wide / medium / close-up / extreme close-up
- Movement SECOND: dolly-in, dolly-out, track left/right, crane up/down, pan, tilt, handheld, gimbal, locked-off
- Angle: eye level, low angle, high angle, bird's eye, Dutch
- Lens feel: wide (24-28mm), normal (35-50mm), telephoto (85mm+)
- ONE verb per shot. Compound moves = separate beats: "Start: slow dolly-in. Then: gentle pan right for final 2s"
Shot Cheat Sheet:
| Shot | Use | Pair With |
|---|---|---|
| Wide | Establish space/context | Slow dolly, locked-off |
| Medium | Subject + context, dialogue | Handheld (personal), gimbal (polished) |
| Close-up | Detail, emotion | Tiny push-in, avoid pans |
| Tracking | Movement, energy | Lateral follow, side profile |
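The "one verb per shot" rule can be checked mechanically before a prompt is sent. The sketch below is only an illustration: the helper names `moves_in` and `overloaded_beats` are hypothetical, the move list is copied from the Camera layer above, and matching is literal (it will not catch inflected forms such as "panning").

```python
import re

# Movement terms copied from the Camera layer; matching is literal and case-insensitive.
MOVES = ["dolly-in", "dolly-out", "track left", "track right",
         "crane up", "crane down", "pan", "tilt"]
_MOVE_RE = re.compile(
    r"\b(" + "|".join(re.escape(m) for m in MOVES) + r")\b", re.IGNORECASE
)


def moves_in(clause: str) -> list[str]:
    """Return the camera moves named in one clause, in order of appearance."""
    return [m.group(1).lower() for m in _MOVE_RE.finditer(clause)]


def overloaded_beats(camera_text: str) -> list[str]:
    """Split on periods and return the beats that name more than one move."""
    beats = [b.strip() for b in camera_text.split(".") if b.strip()]
    return [b for b in beats if len(moves_in(b)) > 1]


good = "Start: slow dolly-in. Then: gentle pan right for final 2s"
bad = "Slow dolly-in with a pan right"
print(overloaded_beats(good))  # []
print(overloaded_beats(bad))   # ['Slow dolly-in with a pan right']
```

Any beat the check returns should be split into two beats, as in the "Start: ... Then: ..." pattern.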
4. Style (THE LOOK)
- ONE visual anchor > six adjectives
- Lighting: key light type, time of day, practical sources
- Color treatment: muted/saturated/monochrome/specific palette
- Texture: film grain, clean digital, anamorphic, etc.
- Reference format: "[film/artist/era] aesthetic"
5. Audio (WHAT WE HEAR)
- Seedance 2.0 generates dual-channel stereo audio
- Specify: background music genre, environmental sounds, dialogue/VO, silence
- Audio syncs to visual action automatically
- Example: "ambient wind, distant thunder, no music, footsteps on stone"
6. Constraints (GUARDRAILS)
- Ban list: no text overlays, no extra characters, no snap zooms, no watermarks
- Timing: hold frames, beat durations, total length (5s or 10s for testing, 15s max)
- Consistency: "maintain character identity throughout", "no morphing"
- Physics: "realistic cloth physics", "gravity-accurate"
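The six-layer order can be enforced in code so a prompt is never assembled out of sequence. This is a minimal sketch, not part of Seedance itself: `PromptLayers` and `build_prompt` are hypothetical names, and the example values are adapted from the examples above, with the character described visually rather than by name.

```python
from dataclasses import dataclass

# Layer order from the Prompt Architecture section; field names are illustrative.
LAYER_ORDER = ("subject", "action", "camera", "style", "audio", "constraints")


@dataclass
class PromptLayers:
    subject: str
    action: str
    camera: str
    style: str
    audio: str
    constraints: str


def build_prompt(layers: PromptLayers) -> str:
    """Join the six layers in the fixed order, rejecting any empty layer."""
    parts = []
    for name in LAYER_ORDER:
        text = getattr(layers, name).strip()
        if not text:
            raise ValueError(f"missing layer: {name}")
        # Normalize so each layer ends with exactly one period.
        parts.append(text.rstrip(".") + ".")
    return " ".join(parts)


example = PromptLayers(
    subject="Small wooden creature with glowing orange eyes, green gems on head, purple cape",
    action="walks to cliff edge, pauses, turns head slowly to camera, cape billowing",
    camera="Wide shot, slow dolly-in, low angle, 24mm feel",
    style="Golden-hour key light, muted palette, light film grain",
    audio="Ambient wind, distant thunder, no music, footsteps on stone",
    constraints="No text overlays, no extra characters, no snap zooms, 5s total",
)
prompt = build_prompt(example)
print(prompt)
```

Raising on an empty layer is a design choice for this sketch; a looser version could skip the Audio layer when silence is intended.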
Reference System (@Tags)
When the user provides images, videos, or audio, use @tags:
- @Image1, @Image2, etc. for uploaded images
- @Video1, @Video2, etc. for uploaded videos
- @Audio1, etc. for uploaded audio
Usage patterns:
- Character identity: "@Image1 is the main character"
- First/last frame: "@Image1 as first frame, @Image2 as last frame"
- Motion transfer: "@Image1 performs the dance from @Video1"
- Style reference: "match the color palette of @Image3"
- Multi-reference: up to 9 images + 3 videos + 3 audio clips
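The tag numbering and the multi-reference limits can be captured in a small helper. This is a sketch under the assumption that tags are numbered from 1 in upload order; `assign_tags` and `MAX_REFS` are hypothetical names, and the limits are the ones stated in the last bullet above.

```python
# Limits taken from the "Multi-reference" bullet; names are illustrative.
MAX_REFS = {"Image": 9, "Video": 3, "Audio": 3}


def assign_tags(images: int = 0, videos: int = 0, audio: int = 0) -> list[str]:
    """Return @tags for the given upload counts, enforcing the stated limits."""
    counts = {"Image": images, "Video": videos, "Audio": audio}
    tags = []
    for kind, count in counts.items():
        if count < 0 or count > MAX_REFS[kind]:
            raise ValueError(f"{kind}: expected 0-{MAX_REFS[kind]}, got {count}")
        tags.extend(f"@{kind}{i}" for i in range(1, count + 1))
    return tags


tags = assign_tags(images=2, videos=1)
print(tags)  # ['@Image1', '@Image2', '@Video1']
```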
Prompt Templates
Cinematic Scene
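[Scene type] style. [Subject with details]. [Action beat 1], [action beat 2], [action beat 3].
[Camera: shot size], [movement], [angle], [lens feel].
[Lighting description], [color treatment], [texture/grain].
[Audio: music/SFX/ambience].
[Constraints: ban list, timing, consistency notes].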
Multi-Shot Narrative
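Shot 1: [Wide/establishing]. [Scene description]. [Camera movement]. [Duration].
Shot 2: [Medium/close-up]. [Character action]. [Camera movement]. [Duration].
Shot 3: [Close-up/detail]. [Emotional beat]. [Camera movement]. [Duration].
[Overall style], [color grade], [audio design].
[Constraints].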
Action Sequence
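[Genre] action sequence. [Setting description].
Beat 1: [Action], [camera follows, movement type].
Beat 2: [Reaction/counter], [cut to shot size], [slow motion if needed].
Beat 3: [Resolution], [camera pulls wide to reveal].
[Style: reference film/show]. [Audio: impact SFX, score].
[Constraints: physics accuracy, no artifacts].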
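A Multi-Shot Narrative prompt can be filled in programmatically while checking the 15s cap from the Constraints layer. The sketch below assumes each shot line has the shape "Shot N: size. description. camera move. duration."; `build_multishot` is a hypothetical helper, not a Seedance API.

```python
MAX_TOTAL_SECONDS = 15  # "15s max" from the Constraints layer


def build_multishot(shots, style: str, constraints: str) -> str:
    """Build a multi-shot prompt.

    shots: list of (shot_size, description, camera_move, seconds) tuples.
    """
    total = sum(seconds for _, _, _, seconds in shots)
    if total > MAX_TOTAL_SECONDS:
        raise ValueError(f"total {total}s exceeds {MAX_TOTAL_SECONDS}s")
    lines = [
        f"Shot {i}: {size}. {desc}. {move}. {seconds}s."
        for i, (size, desc, move, seconds) in enumerate(shots, start=1)
    ]
    lines.append(style.rstrip(".") + ".")
    lines.append(constraints.rstrip(".") + ".")
    return "\n".join(lines)


prompt = build_multishot(
    shots=[
        ("Wide", "Cliff edge at dusk", "Slow dolly-in", 4),
        ("Medium", "Creature pauses and turns its head", "Locked-off", 3),
        ("Close-up", "Glowing eyes narrow", "Tiny push-in", 3),
    ],
    style="Muted palette, light film grain, ambient wind",
    constraints="No text overlays, no extra characters, no morphing",
)
print(prompt)
```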
Negative Prompt Checklist (pick 3-5 per generation)
Visual noise: no text overlays, no watermarks, no floating UI, no lens flares
Identity drift: no extra characters, no crowd, no mirrors reflecting others
Camera chaos: no snap zooms, no whip pans, no Dutch angles, no jump cuts
Body artifacts: no extra fingers, no deformed hands, no warped objects, no melting edges
Branding: no logos, no labels, no recognizable brands
Color: no neon lighting, no heavy teal/orange, no cartoon saturation
Environment: no rain/fog/smoke unless stated, no confetti, no dust particles
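The "pick 3-5" rule lends itself to a simple validator. This is a sketch only: `CHECKLIST` mirrors the categories above, and `negative_clause` is a hypothetical helper that rejects items not on the list and counts outside the 3-5 range.

```python
# Checklist entries copied from the section above, grouped by category.
CHECKLIST = {
    "visual noise": ["no text overlays", "no watermarks", "no floating UI", "no lens flares"],
    "identity drift": ["no extra characters", "no crowd", "no mirrors reflecting others"],
    "camera chaos": ["no snap zooms", "no whip pans", "no Dutch angles", "no jump cuts"],
    "body artifacts": ["no extra fingers", "no deformed hands", "no warped objects", "no melting edges"],
    "branding": ["no logos", "no labels", "no recognizable brands"],
    "color": ["no neon lighting", "no heavy teal/orange", "no cartoon saturation"],
    "environment": ["no rain/fog/smoke unless stated", "no confetti", "no dust particles"],
}


def negative_clause(*items: str) -> str:
    """Validate 3-5 checklist items and join them into one constraint sentence."""
    known = {entry for group in CHECKLIST.values() for entry in group}
    unknown = [item for item in items if item not in known]
    if unknown:
        raise ValueError(f"not on the checklist: {unknown}")
    if not 3 <= len(items) <= 5:
        raise ValueError(f"pick 3-5 items, got {len(items)}")
    return ", ".join(items) + "."


clause = negative_clause("no text overlays", "no snap zooms", "no extra fingers")
print(clause)  # no text overlays, no snap zooms, no extra fingers.
```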
Advanced: Clean High-Motion Technique
Learned from real-world results. These techniques produce sharp, blur-free motion even at extreme speed.
The Continuous Shot Lock
- Declare "single continuous shot" upfront: this forces temporal coherence and prevents inter-scene interpolation artifacts
- The model treats the entire generation as one fluid motion path instead of stitched segments
Physics-Motivated Camera
- Every camera move needs a VERB with physical motivation: dive, slingshot, whip, dart, blast
- Never say "dynamic camera" — say WHY the camera moves (following subject, reacting to explosion, releasing into reveal)
- Camera attached to subject ("lock-on," "staying glued") = subject stays sharp because relative motion is zero
Environmental Anchoring
- Scatter static reference geometry throughout: walls, arches, furniture, hanging objects
- The model needs stable background to render motion AGAINST — parallax creates perceived speed without subject blur
- Static objects streaking past a centered subject = clean speed
Scale Progression Arc
- Structure as Macro → Micro → Macro (wide establish → tight detail → wide reveal)
- Gives model clear resolution targets at each stage — doesn't try to render everything at once
- The "reveal" at the end (pulling wide after sustained close action) creates cinematic payoff
Sensory Render Instructions (Not Mood Words)
- Replace adjectives with computable effects: "heat haze" not "hot," "grit snapping off ledge" not "dusty," "mist turning into rainbow" not "magical"
- Each detail should be something the model can physically simulate
Rhythm Through Verbs
- Pacing lives in action chain length, not "hold for Xs" timers
- Quick beat: "snaps a last-inch swerve" (short clause = fast)
- Sustained beat: "threads through hanging laundry lines and open windows in one fluid line" (long clause = flowing)
- Climax: contrast — "sudden calm" after chaos = tension release
Reference Prompt (Proven Clean High-Motion)
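Hoverbike chase through a cliff city (single continuous shot)
Start on a grand cliff city carved into stone; the camera dives toward a small streak of light racing along a narrow ledge road. Lock-on: a hoverbike hugging the wall at insane speed. The camera slingshots forward, whips back, then stays glued to the rear thrusters: heat haze, grit snapping off the ledge, flickering warning lights. A collapsing balcony drops debris; the rider snaps a last-inch swerve under a falling arch, then threads through hanging laundry lines and open windows in one fluid line. The camera passes through the same openings, staying glued to the motion. One final bend and sudden calm: the camera blasts outward to reveal the city opening onto an endless valley of waterfalls, mist turning into rainbow.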
Pro Tips
1. High-res references: 2K/4K input images give better output. Blurry in, blurry out.
2. Test at 5s first: iterate fast, then extend to 10-15s once the motion is right.
3. One change at a time: on a miss, tweak one element instead of rewriting the whole prompt.
4. Creativity/Consistency sliders: 60% consistency / 40% creativity is the sweet spot.
5. Beat timing: write "hold for 2s" or "pause 1s" to control pacing (for high-motion shots, rely on verb-chain length instead, as described under Rhythm Through Verbs).
6. Compound camera = separate beats: never jam two movements into one clause.
7. Style anchor > adjective soup: "Blade Runner 2049 aesthetic" beats "cinematic dark moody neon futuristic".
8. Environmental audio: even without dialogue, specify ambient sounds for immersion.
⚠️ Content Policy Rules (CRITICAL)
Seedance is a video diffusion model with no internet access. It does not know:
- "Otherside": means nothing to it
- "BAYC," "Koda," "Other Games" — proprietary names get flagged
- Gaming/metaverse/NFT terminology — triggers content moderation
- Crowd scenes — flagged automatically
- Any brand, game title, or IP name
The rule: Never use brand names. Describe what you see, not what it's called.
Translation Layer — Concept → Visual Description
| What you mean | What to write in the prompt |
|---|---|
| Otherside world | "alien landscape with bioluminescent terrain, floating rock formations, purple and teal crystal growths, twin moons visible through violet sky" |
| Otherside biome | "ancient stone ruins overtaken by glowing fungal growth, rivers of liquid light flowing through cracked earth" |
| Otherside portal | "swirling circular energy vortex of purple and gold light, crackling at the edges, suspended in midair" |
| Koda character | "small wooden creature with bark-textured skin, glowing amber eyes in a dark hollow face, green teardrop gems across the top of its head, dark feathered collar, flowing purple cape, small dark armored clawed hands, chibi proportions" |
| Other Games event | "large gathering of stylized 3D avatar characters in a glowing arena, spotlights, festive atmosphere" (avoid "crowd"; use "scattered figures" or "a handful of characters") |
| Gaming/metaverse | Use: "virtual world," "digital realm," "fantastical landscape," "animated environment" |
Banned Words (will trigger rejection)
- Otherside, Yuga, BAYC, Koda, ApeCoin, NFT, metaverse, web3, blockchain, cryptocurrency
- crowd, mob, large group, mass of people
- Any real brand name, game title, or IP
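A prompt can be screened against the explicit banned terms before submission. The sketch below is a hypothetical pre-check, not Seedance's actual moderation logic: it covers only the terms listed above (it cannot know every brand, game title, or IP name), and `find_banned` is an illustrative name.

```python
import re

# Terms copied from the Banned Words list; brand and IP names in general are not covered.
BANNED = [
    "Otherside", "Yuga", "BAYC", "Koda", "ApeCoin", "NFT", "metaverse",
    "web3", "blockchain", "cryptocurrency",
    "crowd", "mob", "large group", "mass of people",
]
_BANNED_RE = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in BANNED) + r")\b", re.IGNORECASE
)


def find_banned(prompt: str) -> list[str]:
    """Return banned terms found in the prompt, lowercased, in order of first appearance."""
    seen = []
    for match in _BANNED_RE.finditer(prompt):
        term = match.group(1).lower()
        if term not in seen:
            seen.append(term)
    return seen


hits = find_banned("A Koda walks through the Otherside past a crowd on a mobile stage")
print(hits)  # ['koda', 'otherside', 'crowd']
```

Word boundaries keep "mob" from matching inside "mobile". Any hit should be replaced with a visual description from the translation table rather than simply deleted.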
Koda-Specific Prompts (Other Games IP)
When generating Koda content:
- NEVER say "Koda"; describe the character visually every time
- Use: "small wooden creature with bark-textured skin, glowing amber eyes in a dark hollow face, green teardrop gems across the top of its head, dark feathered collar, flowing purple cape, small dark armored clawed hands, chibi proportions"
- NEVER say "Otherside" — describe the environment visually
- Use: "alien landscape with bioluminescent terrain, floating rock formations, purple and teal crystal growths"
- Maintain character consistency across shots
- Always provide Honey B's Koda image as @Image1 for I2V generations (best result)
- Test at 5s first to confirm character renders correctly before extending
Platform Access
- Jimeng AI (即梦): jimeng.jianying.com → Video Generation → Seedance 2.0
- Doubao App: dialogue box → Seedance 2.0 → select 2.0 model
- Volcano Engine: experience center → Doubao-Seedance-2.0
When User Asks for a Prompt
1. Ask what scene/concept they want (or use their description)
2. Determine: T2V (text only), I2V (image + text), or R2V (multi-reference)
3. Pick the right template
4. Fill in all 6 layers (subject → constraints)
5. Add 3-5 relevant negative constraints
6. Output the final prompt ready to paste
7. Suggest aspect ratio (16:9 cinematic, 9:16 social, 1:1 square)
8. Suggest starting duration (5s test → extend)
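Steps 2, 7, and 8 of this workflow can be sketched as a small decision helper. The mode rule here is an assumption: no uploads means T2V, exactly one image means I2V, and any other combination of references is treated as R2V. `pick_mode` and `suggest` are hypothetical names.

```python
# Aspect ratios from step 7; the keys are illustrative labels.
ASPECT_RATIOS = {"cinematic": "16:9", "social": "9:16", "square": "1:1"}


def pick_mode(images: int = 0, videos: int = 0, audio: int = 0) -> str:
    """Choose T2V, I2V, or R2V from the number of uploaded references (assumed rule)."""
    total = images + videos + audio
    if total == 0:
        return "T2V"
    if images == 1 and total == 1:
        return "I2V"
    return "R2V"


def suggest(images: int = 0, videos: int = 0, audio: int = 0,
            target: str = "cinematic") -> dict:
    """Bundle the mode, aspect ratio, and a 5s starting duration for a request."""
    return {
        "mode": pick_mode(images, videos, audio),
        "aspect_ratio": ASPECT_RATIOS[target],
        "start_duration_s": 5,  # test at 5s first, then extend
    }


print(suggest())                           # text-only request
print(suggest(images=1, target="social"))  # single reference image
```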
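Seedance 2.0 Prompt Generator
Generate production-ready prompts for ByteDance's Seedance 2.0 AI video model.
Prompt Architecture
Every prompt follows this strict order. Deviating causes drift.
Subject → Action → Camera → Style → Audio → Constraints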
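1. Subject (WHO/WHAT)
- One primary subject. Multiple subjects split the model's attention.
- Include: age/material, clothing, distinguishing features
- Example: "Wooden Koda creature with glowing orange eyes, green gems on head, purple cape"
2. Action (WHAT HAPPENS)
- Specific verb phrases, present tense
- Describe beat by beat for complex sequences
- One action per beat. Chain beats chronologically.
- Example: "walks to cliff edge, pauses, turns head slowly to camera, cape billowing"
3. Camera (HOW WE SEE IT)
- Shot size FIRST: wide / medium / close-up / extreme close-up
- Movement SECOND: dolly-in, dolly-out, track left/right, crane up/down, pan, tilt, handheld, gimbal, locked-off
- Angle: eye level, low angle, high angle, bird's eye, Dutch
- Lens feel: wide (24-28mm), normal (35-50mm), telephoto (85mm+)
- ONE verb per shot. Compound moves = separate beats: "Start: slow dolly-in. Then: gentle pan right for final 2s"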
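Shot Cheat Sheet:
| Shot | Use | Pair With |
|---|---|---|
| Wide | Establish space/context | Slow dolly, locked-off |
| Medium | Subject + context, dialogue | Handheld (personal), gimbal (polished) |
| Close-up | Detail, emotion | Tiny push-in, avoid pans |
| Tracking | Movement, energy | Lateral follow, side profile |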
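4. Style (THE LOOK)
- ONE visual anchor > six adjectives
- Lighting: key light type, time of day, practical sources
- Color treatment: muted/saturated/monochrome/specific palette
- Texture: film grain, clean digital, anamorphic, etc.
- Reference format: "[film/artist/era] aesthetic"
5. Audio (WHAT WE HEAR)
- Seedance 2.0 generates dual-channel stereo audio
- Specify: background music genre, environmental sounds, dialogue/VO, silence
- Audio syncs to visual action automatically
- Example: "ambient wind, distant thunder, no music, footsteps on stone"
6. Constraints (GUARDRAILS)
- Ban list: no text overlays, no extra characters, no snap zooms, no watermarks
- Timing: hold frames, beat durations, total length (5s or 10s for testing, 15s max)
- Consistency: "maintain character identity throughout", "no morphing"
- Physics: "realistic cloth physics", "gravity-accurate"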
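Reference System (@Tags)
When the user provides images/videos, use @tags:
- @Image1, @Image2, etc. for uploaded images
- @Video1, @Video2, etc. for uploaded videos
- @Audio1, etc. for uploaded audio
Usage patterns:
- Character identity: "@Image1 is the main character"
- First/last frame: "@Image1 as first frame, @Image2 as last frame"
- Motion transfer: "@Image1 performs the dance from @Video1"
- Style reference: "match the color palette of @Image3"
- Multi-reference: up to 9 images + 3 videos + 3 audio clips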
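Prompt Templates
Cinematic Scene
[Scene type] style. [Subject with details]. [Action beat 1], [action beat 2], [action beat 3].
[Camera: shot size], [movement], [angle], [lens feel].
[Lighting description], [color treatment], [texture/grain].
[Audio: music/SFX/ambience].
[Constraints: ban list, timing, consistency notes].
Multi-Shot Narrative
Shot 1: [Wide/establishing]. [Scene description]. [Camera movement]. [Duration].
Shot 2: [Medium/close-up]. [Character action]. [Camera movement]. [Duration].
Shot 3: [Close-up/detail]. [Emotional beat]. [Camera movement]. [Duration].
[Overall style], [color grade], [audio design].
[Constraints].
Action Sequence
[Genre] action sequence. [Setting description].
Beat 1: [Action], [camera follows, movement type].
Beat 2: [Reaction/counter], [cut to shot size], [slow motion if needed].
Beat 3: [Resolution], [camera pulls wide to reveal].
[Style: reference film/show]. [Audio: impact SFX, score].
[Constraints: physics accuracy, no artifacts].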
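Negative Prompt Checklist (pick 3-5 per generation)
Visual noise: no text overlays, no watermarks, no floating UI, no lens flares
Identity drift: no extra characters, no crowd, no mirrors reflecting others
Camera chaos: no snap zooms, no whip pans, no Dutch angles, no jump cuts
Body artifacts: no extra fingers, no deformed hands, no warped objects, no melting edges
Branding: no logos, no labels, no recognizable brands
Color: no neon lighting, no heavy teal/orange, no cartoon saturation
Environment: no rain/fog/smoke unless stated, no confetti, no dust particles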
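Advanced: Clean High-Motion Technique
Learned from real-world results. These techniques produce sharp, blur-free motion even at extreme speed.
The Continuous Shot Lock
- Declare "single continuous shot" upfront: this forces temporal coherence and prevents inter-scene interpolation artifacts
- The model treats the entire generation as one fluid motion path instead of stitched segments
Physics-Motivated Camera
- Every camera move needs a verb with physical motivation: dive, slingshot, whip, dart, blast
- Never say "dynamic camera"; say WHY the camera moves (following subject, reacting to explosion, releasing into reveal)
- Camera attached to subject ("lock-on," "staying glued") = subject stays sharp because relative motion is zero
Environmental Anchoring
- Scatter static reference geometry throughout the scene: walls, arches, furniture, hanging objects
- The model needs a stable background to render motion against; parallax creates perceived speed without subject blur
- Static objects streaking past a centered subject = clean speed
Scale Progression Arc
- Structure as Macro → Micro → Macro (wide establish → tight detail → wide reveal)
- Gives the model clear resolution targets at each stage, so it doesn't try to render everything at once
- The "reveal" at the end (pulling wide after sustained close action) creates cinematic payoff
Sensory Render Instructions (Not Mood Words)
- Replace adjectives with computable effects: "heat haze" not "hot," "grit snapping off ledge" not "dusty," "mist turning into rainbow" not "magical"
- Each detail should be something the model can physically simulate
Rhythm Through Verbs
- Pacing lives in action chain length, not "hold for Xs" timers
- Quick beat: "snaps a last-inch swerve" (short clause = fast)
- Sustained beat: "threads through hanging laundry lines and open windows in one fluid line" (long clause = flowing)
- Climax: contrast; "sudden calm" after chaos = tension release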
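Reference Prompt (Proven Clean High-Motion)
Hoverbike chase through a cliff city (single continuous shot)
Start on a grand cliff city carved into stone; the camera dives toward a small streak of light racing along a narrow ledge road. Lock-on: a hoverbike hugging the wall at insane speed. The camera slingshots forward, whips back, then stays glued to the rear thrusters: heat haze, grit snapping off the ledge, flickering warning lights. A collapsing balcony drops debris; the rider snaps a last-inch swerve under a falling arch, then threads through hanging laundry lines and open windows in one fluid line. The camera passes through the same openings, staying glued to the motion. One final bend and sudden calm: the camera blasts outward to reveal the city opening onto an endless valley of waterfalls, mist turning into rainbow.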
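Pro Tips
1. High-res references: 2K/4K input images give better output. Blurry in, blurry out.
2. Test at 5s first: iterate fast, then extend to 10-15s once the motion is right.
3. One change at a time: on a miss, tweak one element instead of rewriting the whole prompt.
4. Creativity/Consistency sliders: 60% consistency / 40% creativity is the sweet spot.
5. Beat timing: write "hold for 2s" or "pause 1s" to control pacing.
6. Compound camera = separate beats: never jam two movements into one clause.
7. Style anchor > adjective soup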