MrScraper
Run AI-powered, unblockable web scraping and data extraction from natural-language instructions via the MrScraper API.
Actions
This skill supports:
- Opening blocked pages through the unblocker (stealth browser + IP rotation)
- Starting AI scraper runs from natural-language instructions
- Rerunning existing scraper configurations on one or more URLs
- Rerunning manual, workflow-based scrapers
- Fetching paginated results and detailed results by ID
This skill is API-only and does not depend on bundled local scripts.
Base URLs
- Unblocker API: `https://api.mrscraper.com`
- Platform API: `https://api.app.mrscraper.com`
Authentication
Unblocker API auth
Use query-parameter auth on the unblocker endpoint: pass the token as the `token` query parameter.
Platform API auth
Use header-based auth on platform endpoints:
    x-api-token: <MRSCRAPER_API_TOKEN>
    accept: application/json
    content-type: application/json
How to get MRSCRAPER_API_TOKEN?
An API token lets your applications securely interact with MrScraper APIs and rerun scrapers created in the dashboard.
Follow these steps in the dashboard:
1. Click your User Profile at the top-right corner.
2. Select API Tokens.
3. Click New Token.
4. Enter a name and set an expiration date.
5. Click Create.
6. Copy the new token and store it securely as `MRSCRAPER_API_TOKEN`.
7. Use it in requests through the `x-api-token` header.
Security rule:
- Never expose tokens in client-side code (browser/mobile app bundles).
- Store tokens in environment variables or server-side secret managers.
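The storage rule above can be sketched in Python. This is a minimal illustration rather than anything prescribed by MrScraper: the helper name `platform_headers` is hypothetical, and it assumes the token is kept in an environment variable named `MRSCRAPER_API_TOKEN`.

```python
import os


def platform_headers(env=os.environ):
    """Build Platform API headers from a server-side environment variable.

    Hypothetical helper: only the variable name MRSCRAPER_API_TOKEN and the
    header names come from this document.
    """
    token = env.get("MRSCRAPER_API_TOKEN", "").strip()
    if not token:
        # Fail early instead of sending an unauthenticated request.
        raise RuntimeError("MRSCRAPER_API_TOKEN is not set")
    return {
        "x-api-token": token,
        "accept": "application/json",
        "content-type": "application/json",
    }
```

Because the token is read at call time and never written to a file or log, it stays out of commits and client bundles.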
Notes from the auth docs:
- The API key works for all V3 Platform endpoints.
- The same key can be used for endpoints on `sync.scraper.mrscraper.com`.
- For access to endpoints on other hosts, contact `support@mrscraper.com`.
Install and Runtime
- No local install step is required by this skill document.
- No bundled `scripts/` are required.
- Calls are direct HTTPS requests to the two base URLs above.
Data and Scope
- Data is sent only to `api.app.mrscraper.com` and `api.mrscraper.com`.
- Responses may contain extracted page content and scrape metadata.
- This skill does not define hidden persistence or background jobs.
- Never expose tokens in logs, commits, or output.
Endpoints
1. Unblocker
- Method: `GET`
- URL: `https://api.mrscraper.com`
- Auth: `token` query parameter
Opens a target URL through stealth browsing and IP rotation, then returns HTML. Use this when direct access is blocked by captcha or anti-bot protections.
Query parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `token` | `string` | Yes | — | Unblocker token (`MRSCRAPER_API_TOKEN`) |
| `url` | `string` | Yes | — | URL-encoded target URL |
| `timeout` | `number` | No | `60` | Max wait in seconds (example: `120`) |
| `geoCode` | `string` | No | None | Geographic routing code (example: `SG`) |
| `blockResources` | `boolean` | No | `false` | Block non-essential resources |
Request example:
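    curl --location 'https://api.mrscraper.com?token=<MRSCRAPER_API_TOKEN>&timeout=120&geoCode=SG&url=https%3A%2F%2Fwww.lazada.sg%2Fproducts%2Fpdp-i111650098-s23209659764.html&blockResources=false'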
Response example:
CODEBLOCK2
Notes:
- Prefer an explicit `geoCode` and practical timeouts for repeatable behavior.
- Only pass cookies when session-specific content is required.
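The query parameters above can be assembled as in the following Python sketch. It only builds the URL and makes no request; the helper name `build_unblocker_url` and its argument names are hypothetical, and sending booleans as lowercase `true`/`false` is an assumption.

```python
from urllib.parse import urlencode

UNBLOCKER_BASE_URL = "https://api.mrscraper.com"


def build_unblocker_url(token, target_url, timeout=60, geo_code=None,
                        block_resources=False):
    """Return the GET URL for the unblocker (hypothetical helper)."""
    params = {"token": token, "url": target_url, "timeout": timeout}
    if geo_code is not None:
        params["geoCode"] = geo_code
    # Assumption: booleans are sent as the lowercase strings true/false.
    params["blockResources"] = "true" if block_resources else "false"
    # urlencode percent-encodes the target URL, as the `url` field requires.
    return f"{UNBLOCKER_BASE_URL}?{urlencode(params)}"
```

Passing `geo_code` and `timeout` explicitly, as the notes recommend, keeps repeated runs comparable.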
2. Create AI Scraper
- Method: `POST`
- Host: `https://api.app.mrscraper.com`
- Path: `/api/v1/scrapers-ai`
- Auth: `x-api-token`
Creates a new AI scraper run from natural-language instructions.
Payload parameters (for `agent: general` or `agent: listing`):
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | `string` | Yes | — | Target URL |
| `message` | `string` | Yes | — | Extraction instruction |
| `agent` | `string` | No | `general` | The AI agent type to use for scraping: `general`, `listing`, or `map` |
| `proxyCountry` | `string` | No | None | ISO country code for proxy-based scraping |
Payload parameters (for `agent: map`):
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | `string` | Yes | — | Target URL |
| `agent` | `string` | No | `map` | The AI agent type to use for scraping (in this case, `map`) |
| `maxDepth` | `number` | No | `2` | Maximum depth level for crawling links from the starting URL. `0` = only the starting URL, `1` = starting URL plus direct links |
| `maxPages` | `number` | No | `50` | Maximum number of pages to scrape during the crawling process. |
| `limit` | `number` | No | `1000` | Maximum number of data records to extract across all pages. Scraping stops when this limit is reached. |
| `includePatterns` | `string` | No | `""` | Regex patterns to include (separate multiple with `\|\|`) |
| `excludePatterns` | `string` | No | `""` | Regex patterns to exclude (separate multiple with `\|\|`) |
Request example:
CODEBLOCK3
Response example:
CODEBLOCK4
Notes:
- Choose the agent type carefully, because each agent is specialized for particular use cases:
  - `general`: the default for most standard web scraping tasks. Use it when the user does not specify a type or the connected LLM is not confident about the page type. It is used mostly for product pages but handles other page types well.
  - `listing`: for listing pages such as product listings or job listings. Choose it when the connected LLM can confidently identify the given URL as a listing page.
  - `map`: for crawling a website and collecting its subdomains or subpages. Choose it when the user indicates that the given URL is a whole website rather than a specific page.
- For the `map` agent, these arguments control crawling:
  - `maxDepth`: lower values (1–2) keep scraping focused; a maximum of 3 is recommended.
  - `maxPages`: limits total pages regardless of depth.
  - `limit`: caps total records extracted.
  - `includePatterns` / `excludePatterns`: patterns separated by `||` that specify which URLs to crawl or skip, e.g. `*/products/*||*/blog/*` or `*/cart/*||*.pdf`.
- If `includePatterns` is an empty string, all URLs are included. If `excludePatterns` is an empty string, no URLs are excluded.
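The agent rules above could be encoded roughly as follows. This is a sketch under stated assumptions: the function `build_create_payload` and its argument names are hypothetical, the JSON keys follow the parameter tables in this section, and the defaults simply mirror the documented ones.

```python
def build_create_payload(url, message=None, agent="general", proxy_country=None,
                         max_depth=2, max_pages=50, limit=1000,
                         include_patterns=(), exclude_patterns=()):
    """Assemble a JSON-serializable body for the create-AI-scraper call."""
    if agent not in ("general", "listing", "map"):
        raise ValueError(f"unknown agent: {agent!r}")
    payload = {"url": url, "agent": agent}
    if agent == "map":
        payload.update({
            "maxDepth": max_depth,
            "maxPages": max_pages,
            "limit": limit,
            # Multiple patterns are joined with "||"; "" means no filter.
            "includePatterns": "||".join(include_patterns),
            "excludePatterns": "||".join(exclude_patterns),
        })
        return payload
    if not message:
        raise ValueError("message is required for general and listing agents")
    payload["message"] = message
    if proxy_country:
        payload["proxyCountry"] = proxy_country
    return payload
```

For example, `build_create_payload("https://example.com", agent="map", include_patterns=["*/products/*", "*/blog/*"])` yields an `includePatterns` value of `*/products/*||*/blog/*`, matching the separator convention described in the notes.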
3. Rerun AI Scraper
- Method: INLINECODE77
- Host: INLINECODE78
- Path: INLINECODE79
- Auth: INLINECODE80
Reruns an existing scraper configuration on a new URL.
Payload parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| INLINECODE81 | INLINECODE82 | Yes | — | Scraper ID returned when the AI scraper was created |
| INLINECODE83 | `string` | Yes | — | Target URL |
Optional payload parameters for the `map` agent:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `maxDepth` | `number` | No | `2` | Crawl depth |
| `maxPages` | `number` | No | `50` | Maximum pages to crawl |
| `limit` | `number` | No | `1000` | Result limit |
| `includePatterns` | `string` | No | `""` | Regex patterns to include (separate multiple with `\|\|`) |
| `excludePatterns` | `string` | No | `""` | Regex patterns to exclude (separate multiple with `\|\|`) |
Request example:
CODEBLOCK5
Response example:
CODEBLOCK6
4. Bulk Rerun AI Scraper
- Method: INLINECODE93
- Host: INLINECODE94
- Path: INLINECODE95
- Auth: INLINECODE96
Runs one scraper configuration over multiple URLs.
Payload parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| INLINECODE97 | INLINECODE98 | Yes | — | Existing AI scraper configuration ID |
| INLINECODE99 | `array[string]` | Yes | — | Target URLs to run |
Request example:
CODEBLOCK7
Response example:
CODEBLOCK8
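Since a bulk rerun sends many URLs in one request, it can help to validate the list before calling the API. The Python sketch below is illustrative only: the helper `build_bulk_rerun_payload` is hypothetical, and the JSON key names `scraperId` and `urls` are assumptions, not confirmed field names.

```python
from urllib.parse import urlparse


def build_bulk_rerun_payload(scraper_id, urls):
    """Validate inputs and return a body for a bulk rerun.

    The key names "scraperId" and "urls" are assumptions for illustration.
    """
    if not scraper_id:
        raise ValueError("scraper_id is required")
    cleaned = []
    for url in urls:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
        if url not in cleaned:  # drop duplicates, keep first-seen order
            cleaned.append(url)
    if not cleaned:
        raise ValueError("at least one URL is required")
    return {"scraperId": scraper_id, "urls": cleaned}
```

Rejecting malformed or duplicate URLs locally avoids spending a run on input that cannot succeed.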
5. Rerun Manual Scraper
- Method: INLINECODE101
- Host: INLINECODE102
- Path: INLINECODE103
- Auth: INLINECODE104
Executes a rerun using a manual browser workflow.
Creating a Manual Scraper
Before calling the manual rerun endpoint, you need to create and save a manual scraper from the dashboard. Follow these steps:
1. Open the MrScraper dashboard and go to Scraper.
2. Click New Manual Scraper +.
3. Enter your target URL.
4. Add workflow steps that match your site's behavior (e.g., Input, Click, Delay, Extract, Inject JavaScript).
5. Configure pagination if needed (using options like Query Pagination, Directory Pagination, or Next Page Link).
6. Test and save the scraper, then copy its `scraperId` to use in API reruns.
Payload parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| INLINECODE117 | INLINECODE118 | Yes | — | ID of the manual scraper to rerun. |
| INLINECODE119 | `string` | Yes | — | Target URL for the rerun. |
| `workflow` | `array<object>` | No | None | Overrides the saved workflow steps. By default, the workflow saved during manual creation is used. |
Request example:
CODEBLOCK9
Response example:
CODEBLOCK10
6. Bulk Rerun Manual Scraper
- Method: INLINECODE123
- Host: INLINECODE124
- Path: INLINECODE125
- Auth: INLINECODE126
Runs one scraper configuration over multiple URLs.
Payload parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| INLINECODE127 | INLINECODE128 | Yes | — | Existing manual scraper configuration ID |
| INLINECODE129 | `array[string]` | Yes | — | Target URLs to run |
Request example:
CODEBLOCK11
Response example:
CODEBLOCK12
7. Fetch Results
- Method: INLINECODE131
- Host: INLINECODE132
- Path: INLINECODE133
- Auth: INLINECODE134
Returns paginated scrape results.
Query parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `sortField` | `string` | Yes | INLINECODE136 | Sort column |
| INLINECODE137 | `string` | Yes | `DESC` | Sort direction |
| `page` | `number` | Yes | `1` | Page number |
| `pageSize` | `number` | Yes | `10` | Items per page |
| `search` | `string` | No | None | Search keyword |
| `dateRangeColumn` | `string` | No | `createdAt` | Date field to filter |
| `startAt` | `string` | No | None | Date range start (ISO) |
| `endAt` | `string` | No | None | Date range end (ISO) |
Notes:
- `sortField` options: `createdAt`, `updatedAt`, `id`, `type`, `url`, `status`, `error`, `tokenUsage`, INLINECODE155
- INLINECODE156 options: `ASC`, `DESC`
- `dateRangeColumn` options: `createdAt`, INLINECODE161
Request example:
CODEBLOCK13
Response example:
CODEBLOCK14
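The pagination parameters above can be turned into a query string as in this Python sketch. It builds the string only and makes no request. The helper `build_results_query` is hypothetical; the key name `sortOrder` for the sort direction and the `createdAt` default for `sortField` are assumptions, since this document does not confirm them.

```python
from urllib.parse import urlencode


def build_results_query(page=1, page_size=10, sort_field="createdAt",
                        sort_order="DESC", search=None, start_at=None,
                        end_at=None, date_range_column="createdAt"):
    """Return a query string for the paginated results call (a sketch)."""
    if sort_order not in ("ASC", "DESC"):
        raise ValueError("sort_order must be ASC or DESC")
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    params = {
        "sortField": sort_field,
        "sortOrder": sort_order,  # assumed key name, not confirmed here
        "page": page,
        "pageSize": page_size,
    }
    if search:
        params["search"] = search
    if start_at or end_at:
        # Only send the date column when a range bound is actually given.
        params["dateRangeColumn"] = date_range_column
        if start_at:
            params["startAt"] = start_at
        if end_at:
            params["endAt"] = end_at
    return urlencode(params)
```

To walk a large result set, a caller would increase `page` by one per request until a page comes back with fewer than `page_size` items.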
8. Fetch Detailed Result by ID
- Method: INLINECODE162
- Host: INLINECODE163
- Path: INLINECODE164
- Auth: INLINECODE165
Returns one detailed result object for a specific result ID.
Query parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| INLINECODE166 | INLINECODE167 | Yes | — | Result ID |
Request example:
CODEBLOCK15
Response example:
CODEBLOCK16
Errors
Standard platform API errors:
| Status | Meaning |
|---|---|
| INLINECODE168 | Invalid request payload |
| INLINECODE169 | Missing or invalid API token |
| `404` | Scraper or result not found |
| `429` | Rate limit exceeded |
| `500` | Internal scraper error |
Error format:
CODEBLOCK17
Operating Rules
- Validate required fields before every call.
- Use pagination for large result sets.
- Retry on `429` with exponential backoff.
- Never expose credentials in outputs.
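The retry rule above might look like the following in Python. It is a generic sketch, not MrScraper-specific code: `call_with_backoff` and its parameters are hypothetical, and `send` stands in for whatever function performs the real HTTP call and returns a `(status_code, body)` pair.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    `send` is any zero-argument callable returning (status_code, body);
    this sketch itself performs no network I/O.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Delay doubles each time: base, 2*base, 4*base, ...
            sleep(base_delay * (2 ** attempt))
    # Still rate limited after the final attempt: hand the 429 back.
    return status, body
```

Injecting `sleep` keeps the helper testable without real waiting; production callers can leave the default.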
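MrScraper
Run AI-powered, unblockable web scraping and data extraction through the MrScraper API, driven by natural-language instructions.
Actions
This skill supports:
- Opening blocked pages through the unblocker (stealth browser + IP rotation)
- Starting AI scraper runs from natural-language instructions
- Rerunning existing scraper configurations on one or more URLs
- Running manual workflow-based reruns
- Fetching paginated results and detailed results by ID
This skill is API-only and does not depend on bundled local scripts.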
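Base URLs
- Unblocker API: `https://api.mrscraper.com`
- Platform API: `https://api.app.mrscraper.com`
Authentication
Unblocker API auth
Use query-parameter auth on the unblocker endpoint by passing the token as the `token` query parameter.
Platform API auth
Use header-based auth on platform endpoints:
    x-api-token: <MRSCRAPER_API_TOKEN>
    accept: application/json
    content-type: application/json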
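How to get MRSCRAPER_API_TOKEN?
An API token lets your applications securely interact with MrScraper APIs and rerun scrapers created in the dashboard.
Follow these steps in the dashboard:
1. Click your User Profile at the top-right corner.
2. Select API Tokens.
3. Click New Token.
4. Enter a name and set an expiration date.
5. Click Create.
6. Copy the new token and store it securely as `MRSCRAPER_API_TOKEN`.
7. Use it in requests through the `x-api-token` header.
Security rule:
- Never expose tokens in client-side code (browser/mobile app bundles).
- Store tokens in environment variables or server-side secret managers.
Notes from the auth docs:
- The API key works for all V3 Platform endpoints.
- The same key can be used for endpoints on `sync.scraper.mrscraper.com`.
- For access to endpoints on other hosts, contact `support@mrscraper.com`.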
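Install and Runtime
- No local install step is required by this skill document.
- No bundled `scripts/` directory is required.
- Calls are direct HTTPS requests to the two base URLs above.
Data and Scope
- Data is sent only to `api.app.mrscraper.com` and `api.mrscraper.com`.
- Responses may contain extracted page content and scrape metadata.
- This skill does not define hidden persistence or background jobs.
- Never expose tokens in logs, commits, or output.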
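Endpoints
1. Unblocker
- Method: `GET`
- URL: `https://api.mrscraper.com`
- Auth: `token` query parameter
Opens a target URL through stealth browsing and IP rotation, then returns HTML. Use this when direct access is blocked by CAPTCHA or anti-bot protections.
Query parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `token` | `string` | Yes | — | Unblocker token (`MRSCRAPER_API_TOKEN`) |
| `url` | `string` | Yes | — | URL-encoded target URL |
| `timeout` | `number` | No | `60` | Max wait in seconds (example: `120`) |
| `geoCode` | `string` | No | None | Geographic routing code (example: `SG`) |
| `blockResources` | `boolean` | No | `false` | Block non-essential resources |
Request example:
    curl --location 'https://api.mrscraper.com?token=<MRSCRAPER_API_TOKEN>&timeout=120&geoCode=SG&url=https%3A%2F%2Fwww.lazada.sg%2Fproducts%2Fpdp-i111650098-s23209659764.html&blockResources=false'
Response example:
    ...
    ...
Notes:
- Prefer an explicit `geoCode` and practical timeouts for repeatable behavior.
- Only pass cookies when session-specific content is required.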
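2. Create AI Scraper
- Method: `POST`
- Host: `https://api.app.mrscraper.com`
- Path: `/api/v1/scrapers-ai`
- Auth: `x-api-token`
Creates a new AI scraper run from natural-language instructions.
Payload parameters (for `agent: general` or `agent: listing`):
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | `string` | Yes | — | Target URL |
| `message` | `string` | Yes | — | Extraction instruction |
| `agent` | `string` | No | `general` | The AI agent type to use for scraping: `general`, `listing`, or `map` |
| `proxyCountry` | `string` | No | None | ISO country code for proxy-based scraping |
Payload parameters (for `agent: map`):
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | `string` | Yes | — | Target URL |
| `agent` | `string` | No | `map` | The AI agent type to use for scraping (in this case, `map`) |
| `maxDepth` | `number` | No | `2` | Maximum depth level for crawling links from the starting URL. `0` = only the starting URL, `1` = starting URL plus direct links |
| `maxPages` | `number` | No | `50` | Maximum number of pages to scrape during the crawling process. |
| `limit` | `number` | No | `1000` | Maximum number of data records to extract across all pages. Scraping stops when this limit is reached. |
| `includePatterns` | `string` | No | `""` | Regex patterns to include (separate multiple with `\|\|`) |
| `excludePatterns` | `string` | No | `""` | Regex patterns to exclude (separate multiple with `\|\|`) |
Request example:
    curl -X POST "https://api.app.mrscraper.com/api/v1/scrapers-ai" \
      -H "x-api-token: <MRSCRAPER_API_TOKEN>" \
      -H "Content-Type: application/json" \
      -d '{
        "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
        "message": "Extract title, price, stock availability, and rating",
        "agent": "general"
      }'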
Response example:
    {
      "id": "497f6eca-6276-4993-bfeb-53cbbbba6f08",
      "createdAt": "2019-08-24T14:15:22Z",
      "createdById": "e13e432a-5323-4484-a91d-b5969bc564d9",
      "updatedAt": "2019-08-24T14:15:22Z",
      "updatedById": "d8bc6076-4141-4a88-80b9-0eb31643066f",
      "deletedAt": "2019-08-24T14:15:22Z",
      "deletedById": "8ef578ad-7f1e-4656-b48b-b1b4a9aaa1cb",
      "userId": "2c4a230c-5085-4924-a3e1-25fb4fc5965b",
      "scraperId": "6695bf87-aaa6-46b0-b1ee-88586b222b0b",
      "type": "AI",
      "url": "http://example.com",
      "status": "Completed",
      "error": "string",
      "tokenUsage": 0,
      "runtime": 0,
      "data": {}, // main scraped data
      "htmlPath": "string",
      "recordingPath": "string",
      "screenshotPath": "string",
      "dataPath": "string"
    }
Notes:
- Choose the agent type carefully, because each agent is specialized for particular use cases. For