Virtual Try-On Effect Scorer (虚拟试穿效果打分)
Evaluate the quality of AI-generated virtual try-on results by comparing the source person,
the target garment, and the generated output. Produce structured per-dimension scores with
brief explanations, then aggregate them into a weighted total score.
Input Recognition
Users may provide images in two ways. Correctly identifying which format you're dealing
with is critical — getting this wrong invalidates the entire evaluation.
Format A: Three Separate Images
The user uploads three distinct images:
1. Source Person Image: the original photo of the person before try-on
2. Target Garment Image: the reference clothing item to be tried on
3. Try-On Result Image: the AI-generated output showing the person wearing the target garment
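When filenames are available, a keyword check is one way to make a first guess at these three roles. The sketch below is illustrative only: the role names and the helpers `guess_role` and `assign_roles` are hypothetical, and the keyword lists simply mirror common naming conventions. It returns `None` whenever the guess is not unambiguous, which is the cue to ask the user instead.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical role names; keywords mirror common filename conventions.
ROLE_KEYWORDS = {
    "source_person": ("person", "model", "source"),
    "target_garment": ("cloth", "garment", "target"),
    "tryon_result": ("result", "output", "generated"),
}


def guess_role(filename: str) -> Optional[str]:
    """Return the one role whose keywords appear in the filename, else None."""
    stem = Path(filename).stem.lower()
    matches = [
        role
        for role, keywords in ROLE_KEYWORDS.items()
        if any(keyword in stem for keyword in keywords)
    ]
    # Zero matches or several matching roles are both treated as ambiguous.
    return matches[0] if len(matches) == 1 else None


def assign_roles(filenames: List[str]) -> Optional[Dict[str, str]]:
    """Map each role to one filename, or return None when the user must be asked."""
    roles: Dict[str, str] = {}
    for name in filenames:
        role = guess_role(name)
        if role is None or role in roles:
            return None  # unknown or duplicated role: ask the user
        roles[role] = name
    return roles if len(roles) == len(ROLE_KEYWORDS) else None
```

A name such as `target_person.png` matches two roles and is deliberately left unresolved rather than guessed.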
When three images are provided, ask the user to clarify which is which if it's not
obvious from context or filenames. Common naming patterns: "person/model/source",
"cloth/garment/target", "result/output/generated".
Format B: Single Concatenated Image
The user uploads one image that contains all three photos stitched together (side by side,
or in a grid layout). In this case:
1. Identify the panel layout (horizontal strip, vertical strip, 2×2 grid, etc.)
2. Determine which panel is which by visual cues:
   - The source person panel shows a person in their original outfit (different from the target garment)
   - The target garment panel typically shows clothing on a flat lay, mannequin, or a different model
   - The try-on result panel shows the same person from the source wearing the target garment
3. If the layout is ambiguous, describe what you see in each panel and ask the user to confirm
Key distinction signals:
- If one panel shows an isolated clothing item (no person, or only a mannequin), that's the garment reference
- If two panels show the same person but in different clothes, the one matching the garment reference is the result
- Pay attention to background consistency between panels — the source person and result often share the same background
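For a concatenated image, the panel regions can be computed from the image size once the layout is known. The following is a minimal sketch that assumes equally sized panels with no borders or padding; the function name `panel_boxes` and the layout labels are hypothetical. It only produces crop rectangles. Deciding which panel is the source, garment, or result still relies on the visual cues above.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def panel_boxes(width: int, height: int, layout: str) -> List[Box]:
    """Return crop boxes for equally sized panels, in reading order."""
    if layout == "horizontal":
        rows, cols = 1, 3
    elif layout == "vertical":
        rows, cols = 3, 1
    elif layout == "grid2x2":
        rows, cols = 2, 2  # one of the four cells may be empty or unused
    else:
        raise ValueError(f"unknown layout: {layout}")

    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            # Integer division keeps the boxes contiguous with no gaps.
            left = c * width // cols
            right = (c + 1) * width // cols
            top = r * height // rows
            bottom = (r + 1) * height // rows
            boxes.append((left, top, right, bottom))
    return boxes
```

For example, a 1536×512 horizontal strip yields three 512×512 boxes. The tuples use the same (left, upper, right, lower) ordering that Pillow's `Image.crop` expects, should that library be used for the actual cropping.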
Evaluation Dimensions
Score each dimension on a 0–100 scale. The dimensions are listed in order of importance,
which also determines their weight in the final score.
Dimension 1: Face Identity Preservation (Weight: 40%)
This is the single most important criterion. The person in the try-on result must be
recognizably the same individual as in the source image. The face is the primary carrier
of identity — if the face changes, the try-on is fundamentally broken regardless of
how good everything else looks.
What to examine:
- Facial structure: jawline, cheekbones, face shape
- Key facial features: eyes, nose, mouth, eyebrows
- Skin tone and complexion
- Facial expression (should be similar or naturally plausible)
- Hairstyle and hair color at the boundary with the face
- Accessories on the face (glasses, earrings, piercings)
Scoring guide:
- 90–100: Face is virtually identical; the person is immediately recognizable
- 70–89: Minor differences exist but identity is clearly preserved; would pass as the same person
- 50–69: Noticeable changes to facial features; identity is questionable
- 30–49: Significant facial distortion or identity shift; hard to confirm same person
- 0–29: Face is severely altered, blurred, or unrecognizable
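The bands above partition the 0–100 range, so a score can be checked against its band mechanically. Here is a small sketch, assuming integer scores; the names `FACE_BANDS` and `face_band` are hypothetical and the summaries are shortened from the guide above.

```python
from typing import List, Tuple

# (lower bound, upper bound, shortened summary) taken from the face scoring guide.
FACE_BANDS: List[Tuple[int, int, str]] = [
    (90, 100, "virtually identical; immediately recognizable"),
    (70, 89, "minor differences; identity clearly preserved"),
    (50, 69, "noticeable changes; identity questionable"),
    (30, 49, "significant distortion or identity shift"),
    (0, 29, "severely altered, blurred, or unrecognizable"),
]


def face_band(score: int) -> Tuple[int, int, str]:
    """Return the band that contains an integer score in 0..100."""
    if not isinstance(score, int) or not 0 <= score <= 100:
        raise ValueError(f"score must be an integer in 0..100, got {score!r}")
    for low, high, summary in FACE_BANDS:
        if low <= score <= high:
            return (low, high, summary)
    raise AssertionError("bands should cover 0..100")  # not reachable
```

A check like this is mainly useful for confirming that the written explanation matches the band the score falls in.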
Dimension 2: Garment Fidelity & Fit (Weight: 30%)
The clothing in the result should faithfully reproduce the target garment's visual
characteristics and fit naturally on the person's body. This matters because the whole
point of virtual try-on is to show how a specific garment looks on a specific person.
What to examine:
- Color accuracy: hue, saturation, brightness matching the reference
- Pattern/print fidelity: logos, stripes, patterns, text should be preserved
- Garment structure: collar style, sleeve type, hem shape, buttons, zippers
- Fabric texture: material appearance should match the reference
- Fit quality: the garment should drape naturally on the person's body
- Proportions: garment proportions should be appropriate for the person's body size
- Boundary quality: clean edges where garment meets skin/other clothing
Scoring guide:
- 90–100: Garment matches the reference in color, pattern, and structure, and fits naturally
- 70–89: Minor color shifts or small pattern distortions; fit looks natural
- 50–69: Noticeable garment differences; some fit issues (floating/clipping)
- 30–49: Significant garment deviations; poor fit or unnatural draping
- 0–29: Garment is barely recognizable or severely distorted
Dimension 3: Non-Face Body Identity Preservation (Weight: 20%)
Beyond the face, other body characteristics should remain consistent between the source
and result. This reinforces the overall sense that this is genuinely the same person,
not a face-swap on a different body.
What to examine:
- Body shape and proportions (build, height impression, body type)
- Skin tone on visible body parts (arms, hands, neck, legs)
- Hands: finger count, pose, natural appearance
- Tattoos, scars, or other visible body markings
- Jewelry and accessories not on the face (watches, bracelets, rings)
- Parts of the outfit that shouldn't change (pants if only top is swapped, shoes, etc.)
- Hair (length, style, color) in areas away from the face
Scoring guide:
- 90–100: Body characteristics are perfectly preserved
- 70–89: Minor inconsistencies but overall body identity is intact
- 50–69: Some body parts look different or have artifacts
- 30–49: Noticeable body shape changes or significant artifacts
- 0–29: Severe body distortion or completely different body characteristics
Dimension 4: Background Preservation (Weight: 10%)
The background should remain stable between the source person image and the try-on
result. Background changes are distracting and reduce the realism of the try-on.
What to examine:
- Scene consistency: same environment, objects, spatial layout
- Color and lighting: consistent tones and illumination
- Artifacts: blurring, ghosting, or warping around the person's silhouette
- Object integrity: furniture, walls, patterns should be unaltered
- Edge blending: smooth transition between person and background
Scoring guide:
- 90–100: Background is identical or virtually indistinguishable
- 70–89: Very minor changes; need close inspection to notice
- 50–69: Some noticeable background alterations or artifacts
- 30–49: Significant background changes or heavy artifacts
- 0–29: Background is severely altered or replaced
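The four weights stated in the dimension headings (40%, 30%, 20%, 10%) combine into the total as a plain weighted sum. A minimal sketch of that arithmetic follows; the dictionary keys and the function name `weighted_total` are hypothetical, and rounding to one decimal place is an assumption.

```python
from typing import Dict

# Weights in percent, as given in the dimension headings above.
WEIGHTS_PERCENT: Dict[str, int] = {
    "face": 40,
    "garment": 30,
    "body": 20,
    "background": 10,
}


def weighted_total(scores: Dict[str, int]) -> float:
    """Combine four 0..100 dimension scores into a weighted total out of 100."""
    missing = set(WEIGHTS_PERCENT) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    for name in WEIGHTS_PERCENT:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} score out of range: {scores[name]}")
    # Integer percent weights avoid floating-point drift before the final division.
    weighted_sum = sum(WEIGHTS_PERCENT[name] * scores[name] for name in WEIGHTS_PERCENT)
    return round(weighted_sum / 100, 1)


example = {"face": 85, "garment": 78, "body": 90, "background": 95}
print(weighted_total(example))  # 34.0 + 23.4 + 18.0 + 9.5 = 84.9
```

The weighting makes the face score dominant: with a face score of 40 and 95 on every other dimension, the total is 16.0 + 28.5 + 19.0 + 9.5 = 73.0.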
Output Format
Present the evaluation in this exact structure:
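    Virtual Try-On Effect Scoring Report

    Input Recognition
    - Input format: [three separate images / single concatenated image]
    - Source person: [brief description of the person's features]
    - Target garment: [brief description of the garment's features]
    - Try-on result: [brief overview of the try-on result]

    Per-Dimension Scores
    1. Face Identity Preservation (weight 40%)
    - Score: XX/100
    - Comment: [1-2 sentences explaining the score]
    2. Garment Fidelity & Fit (weight 30%)
    - Score: XX/100
    - Comment: [1-2 sentences explaining the score]
    3. Non-Face Body Identity Preservation (weight 20%)
    - Score: XX/100
    - Comment: [1-2 sentences explaining the score]
    4. Background Preservation (weight 10%)
    - Score: XX/100
    - Comment: [1-2 sentences explaining the score]

    Total Score
    - Weighted total: XX.X/100
    - Calculation: (face × 0.4) + (garment × 0.3) + (body × 0.2) + (background × 0.1)

    Overall Assessment
    [2-3 sentences summarizing the overall quality, highlighting the strongest and weakest aspects, and suggesting what could be improved in the try-on pipeline]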
Evaluation Philosophy
The scoring should be honest and calibrated. A few guiding principles:
- Don't grade on a curve. A score of 95 should mean genuinely excellent quality,
  not just "better than average." Reserve scores above 90 for results that would fool
  a careful human observer.
- Weight the dimensions as specified. Face identity (40%) dominates because a
  face change means the try-on has failed its core purpose: showing what this person
  would look like in that outfit. A technically perfect garment transfer with a
  different face is worthless.
- Be specific in feedback. Don't just say "looks good" or "has issues." Point to
  concrete observations: "the nose bridge appears slightly narrower" or "the striped
  pattern on the left sleeve is distorted."
- Consider the use case. Virtual try-on is a practical tool: users want to know
  if they'd look good in a piece of clothing before buying it. Evaluate from that
  perspective: would this result help someone make a confident purchase decision?
Edge Cases
- If the garment type doesn't match (e.g., the source wears a t-shirt and the result
  shows a completely different category like a dress), note this but still evaluate
  the result against the target garment reference.
- If the image quality is very low, note the limitation and explain that the scores
  might not be fully reliable due to resolution constraints.
- If the user only provides two images (missing one of the three), ask which one
  is missing and whether they can provide it. If they can't provide the garment
  reference, you can still evaluate face/body/background but should caveat the
  garment score.
- If the try-on only changes part of the outfit (e.g., only the top), only evaluate
  the changed portion for garment fidelity, and note what was preserved from the original.
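The missing-image case can be read as a small decision rule. The sketch below is one possible encoding; the role names, the return labels, and the function `next_step` are all hypothetical. The text above only defines the fallback for a missing garment reference, so the handling of a missing source or result image here is an assumption.

```python
from typing import Set

REQUIRED = {"source_person", "target_garment", "tryon_result"}


def next_step(provided: Set[str], user_can_supply_missing: bool) -> str:
    """Decide how to proceed given which of the three images are available."""
    missing = REQUIRED - provided
    if not missing:
        return "evaluate"
    if user_can_supply_missing:
        return "ask_for_missing_image"
    if missing == {"target_garment"}:
        # Covered by the text: score face/body/background, caveat the garment score.
        return "evaluate_with_garment_caveat"
    # Not specified in the text; continuing to ask is an assumption.
    return "ask_for_missing_image"
```

For example, `next_step({"source_person", "tryon_result"}, user_can_supply_missing=False)` returns `"evaluate_with_garment_caveat"`.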