Website Frontend Visual Replication
Prerequisites
This workflow depends on either Playwright MCP or the agent-browser skill. As long as at least one of them is installed and available, the workflow can run normally. If neither is available in your environment, remind the user to install one.
Authorization Gate (MUST execute first)
Before proceeding, the agent MUST ask the user:
“Do you own this website, or do you have explicit written permission from the owner to replicate it? Unauthorized replication may violate copyright, terms of service, or applicable law.”
- If the user confirms authorization → proceed.
- If the user cannot confirm → STOP. Do not proceed with replication. Suggest alternatives (e.g., building an original design inspired by general layout patterns).
Scope & Limitations
This skill replicates FRONTEND VISUAL PRESENTATION only. Specifically:
| Included | NOT Included |
|---|---|
| Page layout & visual styling | Backend / server-side logic |
| Navigation structure | Databases & data stores |
| Publicly visible text & images | Authentication systems / sessions |
| CSS/design tokens | API business logic |
| Client-side interaction patterns | Non-public or behind-login content |
| Static asset files (images, fonts) | Credentials, secrets, or API keys |
Data handling rules:
1. Never scrape behind a login wall. Only capture publicly accessible pages.
2. Never collect or store credentials, API keys, session tokens, or personal data (PII).
3. Never reproduce copyrighted content verbatim (articles, copy text) unless the user holds rights.
4. Respect robots.txt and rate limits. If the site signals crawl restrictions, honor them.
5. Output is for reference & mockup purposes unless the user has confirmed full rights.
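The robots.txt rule above can be checked mechanically before any page is opened. The following is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `make_robots_gate` and the sample robots.txt content are illustrative assumptions, not part of this skill.

```python
from urllib.robotparser import RobotFileParser


def make_robots_gate(robots_txt: str, user_agent: str = "*"):
    """Build (is_allowed, crawl_delay) from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def is_allowed(url: str) -> bool:
        # True when the rules permit this user agent to fetch the URL.
        return parser.can_fetch(user_agent, url)

    # crawl_delay is None when the file declares no Crawl-delay.
    return is_allowed, parser.crawl_delay(user_agent)


# Hypothetical robots.txt content, used only for illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 2
"""

is_allowed, crawl_delay = make_robots_gate(SAMPLE_ROBOTS)
```

In a real run the robots.txt text would be fetched from the target site first, and pages for which `is_allowed` returns `False` would be skipped and noted in the blueprint.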
Core Idea
1. Recursively explore every public page of the target website, record its visual content, client-side interaction logic, and publicly available asset files, and organize everything into a structured “website replication blueprint.” The blueprint holds detailed information for each page and maps the site’s navigation relationships through its folder hierarchy: the current page is a folder, and every page reachable from it is a child folder. Each page folder also stores that page’s screenshots, component interaction records, and related asset files. The finished blueprint therefore presents the content and interaction details of each page while implicitly reflecting the site’s overall information architecture and navigation paths.
An example blueprint folder structure:
CODEBLOCK0
2. After completing the blueprint construction above, build a frontend visual replica of the target website based on that blueprint, approximating the original's page-to-page navigation, visual layout, and client-side interaction patterns. This is a frontend-only reproduction and does not include backend behavior replication.
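The "page as folder, reachable page as child folder" mapping described above can be sketched as a small path helper. The function names `slugify` and `page_folder` are hypothetical, and the slug rules (lowercase, hyphen-separated) are an assumption rather than something this skill prescribes.

```python
import re
from pathlib import PurePosixPath


def slugify(label: str) -> str:
    """Turn a page label into a folder-safe name (assumed convention)."""
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    return slug or "page"


def page_folder(trail: list[str], root: str = "blueprint") -> PurePosixPath:
    """Map a navigation trail (home first) to its nested blueprint folder."""
    return PurePosixPath(root, *(slugify(step) for step in trail))


detail_dir = page_folder(["Home", "Products", "Product Detail"])
```

Here `detail_dir` is `blueprint/home/products/product-detail`, so the folder path itself records that the detail page was reached from the product list, which was reached from the home page.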
Replication Workflow
The whole process is divided into five phases: initialization → recursively collect pages and build the sitemap → generate summary outputs → frontend visual replication → visual comparison and revision.
The first three phases focus on exploration and documentation, while the final two phases focus on implementing the frontend replica based on the collected blueprint and visually verifying it. Below, the agent-browser workflow is used as an example; if using Playwright MCP, the overall process and usage are essentially the same and can be followed with the same approach.
Step 1: Initialize the project
CODEBLOCK1
Write the following content into blueprint/_meta.md:
CODEBLOCK2
Step 2: Recursively collect pages and build the sitemap
For every recursively discovered page, execute the following standard procedure:
First capture a full-page screenshot → download assets → traverse interaction states → scroll down → download assets again → traverse interaction states again → scroll down again → ... → continue until reaching the bottom of the page.
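One way to plan the scroll passes in this loop is to derive the scroll offsets and screenshot names up front. This is only a sketch: `scroll_plan` is a hypothetical helper, and it assumes the page height is known and stays fixed, which does not hold for pages that lazy-load content (those need the height re-measured after each scroll).

```python
import math


def scroll_plan(page_height: int, viewport_height: int) -> list[tuple[int, str]]:
    """Return (scroll offset in px, screenshot file name) for each viewport-sized pass."""
    if page_height <= 0 or viewport_height <= 0:
        raise ValueError("heights must be positive")
    passes = max(1, math.ceil(page_height / viewport_height))
    return [(i * viewport_height, f"_scroll_{i:02d}.png") for i in range(passes)]


plan = scroll_plan(page_height=2500, viewport_height=1080)
```

Each entry in `plan` corresponds to one round of "download assets, traverse interaction states, scroll down" in the procedure above.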
2.1: Asset download
After opening the page and waiting for it to fully load, collect all asset links on the page (images, videos, fonts, etc.) and download them into that page’s _assets/ directory whenever possible. Record all failed downloads and the reasons for failure.
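The "download what you can, record what fails" behavior can be sketched as follows. `download_assets` and `fetch_bytes` are hypothetical names; the demo swaps in a fake fetcher so it runs without network access, and it assumes asset file names do not collide within one page.

```python
import tempfile
from pathlib import Path
from typing import Callable
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url: str) -> bytes:
    """Default fetcher: a plain HTTP GET with a timeout."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_assets(
    urls: list[str],
    assets_dir: Path,
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> list[tuple[str, str]]:
    """Save each asset into assets_dir and return (url, reason) for every failure."""
    assets_dir.mkdir(parents=True, exist_ok=True)
    failures: list[tuple[str, str]] = []
    for url in urls:
        name = Path(urlparse(url).path).name or "unnamed_asset"
        try:
            (assets_dir / name).write_bytes(fetch(url))
        except Exception as error:  # record the failure instead of aborting the crawl
            failures.append((url, f"{type(error).__name__}: {error}"))
    return failures


def fake_fetch(url: str) -> bytes:
    """Stand-in fetcher for the demo: fails for URLs containing 'missing'."""
    if "missing" in url:
        raise OSError("HTTP 404")
    return b"fake-bytes"


with tempfile.TemporaryDirectory() as tmp:
    assets_dir = Path(tmp) / "home" / "_assets"
    failures = download_assets(
        ["https://example.com/img/logo.png", "https://example.com/img/missing.png"],
        assets_dir,
        fetch=fake_fetch,
    )
    saved = sorted(path.name for path in assets_dir.iterdir())
```

The returned `failures` list is what would be written into the page's blueprint files alongside the reasons for failure.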
2.2: Traverse interaction states
Obtain the list of interactive elements on the page, interact with all of them, capture UI changes, and record all newly discovered pages.
CODEBLOCK3
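When an interaction reveals a new URL, the crawler has to decide whether it is a page to explore or an external link to record only. A minimal sketch, assuming "same host" is the test for internal pages (so a `www.` variant or other subdomain would count as external, and schemes such as `mailto:` are not handled specially); `classify_link` is a hypothetical helper.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def classify_link(page_url: str, href: str) -> tuple[str, str]:
    """Resolve href against page_url and label it 'internal' or 'external'."""
    # Drop the #fragment so anchors on the same page are not treated as new pages.
    absolute, _fragment = urldefrag(urljoin(page_url, href))
    same_host = urlparse(absolute).netloc == urlparse(page_url).netloc
    return ("internal" if same_host else "external", absolute)


internal = classify_link("https://example.com/products", "/products/123#reviews")
external = classify_link("https://example.com/products", "https://other.example.org/")
```

Internal results would be queued for exploration (once per normalized URL), while external results would only be recorded with their triggering element and trigger method.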
Write the updated sitemap into blueprint/_sitemap.md.
Write the updated page collection results into blueprint/home/products/_page.md:
CODEBLOCK4
Write all updated interactions into blueprint/home/products/_interactions.md:
CODEBLOCK5
Step 3: Generate summary outputs
After all page collection is complete, generate the global summary files.
blueprint/_sitemap.md
CODEBLOCK6
blueprint/_navigation_graph.md
CODEBLOCK7
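The blueprint's folder listing describes _navigation_graph.md as a Mermaid graph. One possible way to render recorded navigation edges into Mermaid flowchart text is sketched below; the edge tuple shape `(source, target, trigger)` and the helper names are assumptions, and labels containing quotes or pipes would need escaping that this sketch omits.

```python
import re


def node_id(path: str) -> str:
    """Derive a Mermaid-safe node identifier from a URL path."""
    return re.sub(r"\W+", "_", path).strip("_") or "root"


def to_mermaid(edges: list[tuple[str, str, str]]) -> str:
    """Render (source path, target path, trigger) edges as a Mermaid flowchart."""
    lines = ["graph TD"]
    for source, target, trigger in edges:
        lines.append(
            f'    {node_id(source)}["{source}"] -->|{trigger}| {node_id(target)}["{target}"]'
        )
    return "\n".join(lines)


graph = to_mermaid([("/", "/products", "click nav link")])
```

Because every navigation source is listed as its own edge, pages reachable from several parents still show all of their inbound links even though their folder exists only once.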
Close the browser
CODEBLOCK8
Step 4: Frontend Visual Replication
After completing website exploration and blueprint generation, build a frontend visual replica based on the collected blueprint. During replication, refer to the blueprint’s page structures, visual styles, and publicly available assets, and use your preferred frontend tools and frameworks to rebuild the website’s client-side presentation. The goal is to approximate the original’s visual design and navigation, not to reproduce backend behavior or non-public functionality.
Step 5: Visual Comparison & Revision
After finishing the frontend replication, use Playwright MCP or the agent-browser skill to render both the original website and the replicated version, systematically compare them for visual consistency in layout, colors, typography, and navigation structure, verify the degree of visual approximation page by page, and make adjustments based on the comparison results.
Key Rules
1. Authorization first. Never begin replication without user confirmation of legal right to replicate.
2. Public pages only. Only explore and capture publicly accessible pages. Do not attempt to access login-protected areas, admin panels, or authenticated endpoints.
3. No credential/PII handling. If any captured content contains credentials, tokens, or personal data, redact or exclude it immediately.
4. Frontend only. Never claim or attempt to replicate backend business logic, database schemas, or server-side behavior.
5. Folders represent navigation relationships. If page A can navigate to page B, then page B should be created as a subfolder inside page A’s folder.
6. If multiple pages can navigate to the same target page, you may skip redundant exploration of that page and create its folder only once under the most natural parent. However, all navigation sources must still be recorded in _sitemap.md, _navigation_graph.md, and the related pages’ _page.md files.
7. Every folder must contain:
   - _page.md: page blueprint (sections, components, outbound navigation)
   - _full.png: full-page screenshot
   - _interactions.md: interaction behavior record
   - _interactions/: directory for interaction state screenshots
   - _assets/: page-specific assets
   - _scroll_00.png ~ _scroll_N.png: scrolling screenshot sequence
8. Screenshots must be captured from the real site. Do not describe visual information from memory; all visual details must be based on actual screenshots.
9. Interactions must be genuinely triggered and recorded one by one, including hover, click, focus, and so on.
10. Assets should be downloaded whenever possible. Use curl to download images, and use agent-browser eval or Playwright MCP to extract SVG source code and save it. All failed downloads must be recorded together with the reasons for failure.
11. Record only; do not evaluate. Accurately document observed results without making quality judgments.
12. Keep files updated in real time. After finishing exploration of each page window, update the sitemap and all related blueprint files such as _page.md and _interactions.md so the output always remains synchronized with the current exploration state.
13. Ensure elements are visible. Before interacting with any page element, make sure the target element is within the visible viewport. Scroll the page or adjust the viewport position if necessary.
14. Each call to agent-browser may execute only one command, such as taking a screenshot, getting the element list, or performing one interaction. Do not combine multiple commands in a single call. The consecutive commands in the example are only for illustrating the workflow; in actual execution they must be split into separate calls.
15. If interaction with a webpage element triggers navigation to an external domain, there is no need to deeply explore or continue scraping that external page. However, you must record the key information of that external link, including the external URL, the page element that triggered the link (text/button name/selector or ref), and the trigger method (such as click). In the replicated website, the corresponding page element should preserve the same trigger method and external destination.
16. _full.png records the overall visual overview of the page, while the scrolling screenshot sequence (_scroll_00.png ~ _scroll_N.png) records the details of each segment. During website replication, reference both the full-page screenshot and the scrolling screenshot sequence to ensure accurate visual reproduction and complete detail restoration.
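The "every folder must contain" rule lends itself to an automated check before moving on to the replication phase. A minimal sketch, assuming the artifact names listed in the rules above; `missing_artifacts` is a hypothetical helper and the demo builds a deliberately incomplete page folder in a temporary directory.

```python
import tempfile
from pathlib import Path

REQUIRED_FILES = ("_page.md", "_full.png", "_interactions.md")
REQUIRED_DIRS = ("_interactions", "_assets")


def missing_artifacts(page_dir: Path) -> list[str]:
    """List the required blueprint artifacts that page_dir does not yet contain."""
    missing = [name for name in REQUIRED_FILES if not (page_dir / name).is_file()]
    missing += [f"{name}/" for name in REQUIRED_DIRS if not (page_dir / name).is_dir()]
    # At least one scrolling screenshot is expected per page.
    if not any(page_dir.glob("_scroll_*.png")):
        missing.append("_scroll_NN.png")
    return missing


with tempfile.TemporaryDirectory() as tmp:
    page_dir = Path(tmp) / "home"
    (page_dir / "_assets").mkdir(parents=True)
    (page_dir / "_page.md").write_text("placeholder page blueprint\n")
    missing = missing_artifacts(page_dir)
```

Running the check over every page folder gives a quick list of pages whose exploration is incomplete.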
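Website Frontend Visual Replication
Prerequisites
This workflow depends on either Playwright MCP or the agent-browser skill. As long as at least one of them is installed and available, the workflow can run normally. If neither is available in your environment, remind the user to install one.
Authorization Gate (MUST execute first)
Before proceeding, the agent MUST ask the user:
“Do you own this website, or do you have explicit written permission from the owner to replicate it? Unauthorized replication may violate copyright, terms of service, or applicable law.”
- If the user confirms authorization → proceed.
- If the user cannot confirm → STOP. Do not proceed with replication. Suggest alternatives (e.g., building an original design based on general layout patterns).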
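Scope & Limitations
This skill replicates FRONTEND VISUAL PRESENTATION only. Specifically:
| Included | NOT Included |
|---|---|
| Page layout & visual styling | Backend / server-side logic |
| Navigation structure | Databases & data stores |
| Publicly visible text & images | Authentication systems / sessions |
| CSS/design tokens | API business logic |
| Client-side interaction patterns | Non-public or behind-login content |
| Static asset files (images, fonts) | Credentials, secrets, or API keys |
Data handling rules:
1. Never scrape behind a login wall. Only capture publicly accessible pages.
2. Never collect or store credentials, API keys, session tokens, or personally identifiable information (PII).
3. Never reproduce copyrighted content verbatim (articles, copy text) unless the user holds the relevant rights.
4. Respect robots.txt and rate limits. If the site signals crawl restrictions, honor them.
5. Output is for reference and prototyping purposes only, unless the user has confirmed full rights.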
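Core Idea
1. Recursively explore every public page of the target website, systematically record its visual content, client-side interaction logic, and publicly available asset files, and organize everything into a structured website replication blueprint. The blueprint should contain detailed information for each page and map the site's navigation relationships through its folder hierarchy. Specifically, during exploration, use nested folders to organize the collected page information: represent the current page as a folder, and represent every page reachable from it as a child folder. Save that page's screenshots, component interaction records, and related asset files inside the page folder. With this structure, the final blueprint presents the content and interaction details of each page while implicitly reflecting the site's overall information architecture and navigation paths.
An example blueprint folder structure:
```text
blueprint/
├── _meta.md                             # Site metadata
├── _sitemap.md                          # Sitemap
├── _assets/                             # Global assets (fonts, favicon, etc.)
├── home/                                # Home page
│   ├── _page.md                         # Page blueprint (sections, components, interaction summary)
│   ├── _full.png                        # Full-page screenshot
│   ├── _scroll_00.png ~ _scroll_N.png   # Scrolling screenshot sequence
│   ├── _interactions.md                 # All interaction records
│   ├── _interactions/                   # Interaction state screenshots (organized by interaction type)
│   ├── _assets/                         # Assets for this page (images, videos, etc.)
│   ├── products/                        # Product list, reachable from the home page
│   │   ├── _page.md
│   │   ├── _full.png
│   │   ├── _scroll_00.png ~ _scroll_N.png
│   │   ├── _interactions.md
│   │   ├── _interactions/
│   │   ├── _assets/
│   │   ├── product-detail/              # Product detail, reachable from the product list
│   │   │   ├── _page.md
│   │   │   └── ...
│   │   └── category/                    # Category filter, reachable from the product list
│   │       └── ...
│   ├── about/                           # About us, reachable from the home page
│   │   └── ...
│   ├── blog/                            # Blog, reachable from the home page
│   │   └── ...
│   └── login/                           # Login, reachable from the home page
│       └── ...
└── _navigation_graph.md                 # Site navigation graph (Mermaid)
```
2. After completing the blueprint construction above, build a frontend visual replica of the target website based on that blueprint, approximating the original's page-to-page navigation, visual layout, and client-side interaction patterns. This is a frontend-only reproduction and does not include backend behavior replication.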
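Replication Workflow
The whole process is divided into five phases: initialization → recursively collect pages and build the sitemap → generate summary outputs → frontend visual replication → visual comparison and revision.
The first three phases focus on exploration and documentation, while the final two phases focus on implementing the frontend replica based on the collected blueprint and visually verifying it. Below, the agent-browser workflow is used as an example; if using Playwright MCP, the overall process and usage are essentially the same and can be followed with the same approach.
Step 1: Initialize the project
```bash
# Create the blueprint root directory
mkdir -p blueprint/_assets
# Open the browser and visit the target site
agent-browser open <target URL>
agent-browser set viewport 1920 1080
agent-browser wait --load networkidle
```
Write the following content into blueprint/_meta.md:
```markdown
# Website Replication Blueprint
- Target website:
- Exploration date: <date>
- Viewport size: 1920×1080
```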
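Step 2: Recursively collect pages and build the sitemap
For every recursively discovered page, execute the following standard procedure:
First capture a full-page screenshot → download assets → traverse interaction states → scroll down → download assets again → traverse interaction states again → scroll down again → ... → continue until reaching the bottom of the page.
2.1: Asset download
After opening the page and waiting for it to fully load, collect all asset links on the page (images, videos, fonts, etc.) and download them into that page's _assets/ directory whenever possible. Record all failed downloads and the reasons for failure.
2.2: Traverse interaction states
Obtain the list of interactive elements on the page, interact with all of them, capture UI changes, and record all newly discovered pages.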
```bash
# === Example: the product list page before scrolling down ===
agent-browser open <product list URL>
agent-browser screenshot blueprint/home/products/_scroll_00.png
agent-browser wait --load networkidle
agent-browser snapshot -i
# Assume the output is:
#   button All [ref=e1]          ← filter tab
#   button Electronics [ref=e2]
#   card Product A [ref=e3]      ← product card
#   select Sort [ref=e4]         ← sort dropdown
#   link Home [ref=e5]           ← breadcrumb navigation link
#   button Learn more [ref=e6]   ← button inside the product card

# --- Interaction 1: hover over the product card ---
agent-browser hover @e3
agent-browser screenshot blueprint/home/products/_interactions/card_hover.png

# --- Interaction 2: click a filter tab ---
agent-browser click @e2
agent-browser wait --load networkidle
agent-browser screenshot blueprint/home/products/_interactions/filter_electronics.png

# --- Interaction 3: change the sort order ---
agent-browser select @e4 "Price: Low to High"
agent-browser wait --load networkidle
agent-browser screenshot blueprint/home/products/_interactions/sort_price_asc.png

# --- Interaction 4: click the product card (triggers navigation) ---
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser get url   # → /products/123
agent-browser screenshot blueprint/home/products/product-detail/_full.png --full
agent-browser back

# --- Interaction 5: click the Home breadcrumb ---
agent-browser click @e5
agent-browser wait --load networkidle
agent-browser get url   # → /
agent-browser screenshot blueprint/home/_full.png --full
agent-browser back

# --- Interaction 6: click the Learn more button (external navigation) ---
agent-browser click @e6
agent-browser wait --load networkidle
agent-browser get url   # → https://www.angelokeirsebilck.be/
agent-browser screenshot blueprint/home/products/_interactions/learn_more_external.png --full
agent-browser back
```
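Write the updated sitemap into blueprint/_sitemap.md.
Write the updated page collection results into blueprint/home/products/_page.md:
markdown
Product List Page
- URL: /products
- Source: Products link in the home page main navigation
Section Structure
| No. | Section name | Layout pattern | Content type |
|---|---|---|---|
| 1 | Page title | Full-width single column | H1 heading + description text |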
| 2 |