EZCTO Smart Web Reader for OpenClaw
What it does
Reads any URL and returns structured JSON containing page identity, content sections, image descriptions (text-inferred), video metadata, and actionable links. Acts as the Agent's default web access layer — replacing raw web_fetch with zero-token cache hits and intelligent HTML parsing. 80%+ token savings vs screenshots.
Key Features
✓ Transparent URL interception - Fires automatically whenever Agent accesses any URL
✓ Cache-first strategy - Check EZCTO asset library before parsing (zero cost)
✓ Zero-token site detection - Auto-detect crypto/ecommerce/restaurant sites via text matching
✓ Local-first storage - Aligns with OpenClaw's philosophy (~/.ezcto/cache/)
✓ Community-driven - Contribute parsed results back to shared asset library
✓ OpenClaw-native output - Includes agent suggestions and skill chaining hints
Security Manifest
| Category | Detail |
|---|
| External endpoints | INLINECODE1 only (EZCTO community cache) |
| Data transmitted |
URL string, SHA256 HTML hash, extracted structured JSON |
|
NOT transmitted | Raw HTML, local file contents, credentials, env variables |
|
Shell injection guard | All user-supplied values URL-encoded or passed as python3 args, never string-interpolated |
|
Prompt injection guard | HTML sanitized (scripts/styles/comments stripped), wrapped in
<untrusted_html_content> XML delimiters, explicit LLM guardrail injected before content |
|
Shell commands used |
curl (fetch/API),
sha256sum (hashing),
python3 (URL encoding, safe JSON construction) |
|
Filesystem writes |
~/.ezcto/cache/ (cached results),
/tmp/ (temp files, cleaned up) |
Workflow
Step 1: Check EZCTO Cache (Zero-cost fast path)
CODEBLOCK0
Conditional logic:
- - If
http_code == 200 AND valid JSON → SKIP to Step 9 (return cached result) - If
http_code == 404 → Cache miss, continue to Step 2 - If
http_code >= 500 → API error, log warning, continue to Step 2 (fallback mode)
OpenClaw note: Cache hits cost 0 tokens and complete in ~1 second.
Step 2: Fetch HTML
CODEBLOCK1
Error handling:
CODEBLOCK2
Guardrail: If HTML > 500KB, extract <body> only to prevent context overflow.
Step 3: Compute HTML Hash (Tamper-proof verification)
CODEBLOCK3
Purpose: Enables deduplication and tamper detection in the asset library.
Step 4: Auto-detect Site Type (Zero tokens, pure text matching)
Execute pattern matching per references/site-type-detection.md:
CODEBLOCK4
Step 5: Assemble Translation Prompt
CODEBLOCK5
Token optimization: If HTML + prompt > 100K tokens, truncate HTML to first 50KB + last 10KB (preserves header and footer).
Step 6: Parse HTML with Local LLM
CODEBLOCK6
Error handling:
if (!result.content || result.content.length < 50) {
return {
"status": "error",
"error": {
"code": "translation_failed",
"message": "LLM returned empty or invalid response",
"suggestion": "Try again or check if HTML is too malformed"
}
}
}
Step 7: Validate JSON Output
CODEBLOCK8
Step 8: Dual-store (Local cache + Asset library)
8.1 Store locally (OpenClaw-native format)
CODEBLOCK9
8.2 Contribute to EZCTO asset library
CODEBLOCK10
Step 9: Return to OpenClaw Agent
Output format (OpenClaw-native wrapper):
CODEBLOCK11
For cache hits (Step 1 direct return):
{
"skill": "ezcto-smart-web-reader",
"status": "success",
"result": { /* cached translation */ },
"metadata": {
"source": "cache",
"cache_key": "ezcto_asset_library",
"translation_time_ms": 234,
"token_cost": 0,
"cached_at": "2026-02-15T08:00:00Z"
}
}
Guardrails
- - Never modify URLs - Preserve all URLs exactly as they appear in HTML
- Never fabricate data - Use
null for missing fields, never guess - Truncate large HTML - If HTML > 500KB, extract
<body> only - Report errors explicitly - Never silently fail, always return structured error
- Respect rate limits - If EZCTO API returns 429, back off for 60 seconds
- No sensitive data - Never store or transmit API keys, passwords, or PII
Dependencies
Reference files (must exist in same directory):
- -
references/translate-prompt.md - Base translation instructions - INLINECODE16 - JSON output specification
- INLINECODE17 - Site type detection rules
- INLINECODE18 - Crypto-specific extraction
- INLINECODE19 - E-commerce extraction
- INLINECODE20 - Restaurant extraction
- INLINECODE21 - OpenClaw integration guide
System requirements:
- -
curl command available - INLINECODE23 (or
shasum -a 256 on macOS) - Writable
~/.ezcto/cache/ directory
Testing
Test with a crypto site:
CODEBLOCK13
Test with e-commerce:
CODEBLOCK14
Test cache hit:
/use ezcto-smart-web-reader https://ezcto.fun
# Run again immediately - should return cached result in <2 seconds
Learn More
- - EZCTO Website: https://ezcto.fun
- API Documentation: https://ezcto.fun/api-docs
- OpenClaw Integration: See INLINECODE26
- Report Issues: https://github.com/pearl799/ezcto-web-translator/issues
EZCTO Smart Web Reader for OpenClaw
功能说明
读取任意URL并返回结构化JSON,包含页面标识、内容区块、图像描述(文本推断)、视频元数据和可操作链接。作为Agent的默认网络访问层——用零令牌缓存命中率和智能HTML解析替代原始web_fetch。相比截图节省80%以上令牌消耗。
核心特性
✓ 透明URL拦截 - Agent访问任何URL时自动触发
✓ 缓存优先策略 - 解析前先查询EZCTO资产库(零成本)
✓ 零令牌站点检测 - 通过文本匹配自动识别加密货币/电商/餐饮类网站
✓ 本地优先存储 - 符合OpenClaw理念(~/.ezcto/cache/)
✓ 社区驱动 - 将解析结果回馈至共享资产库
✓ OpenClaw原生输出 - 包含Agent建议和技能链提示
安全声明
| 类别 | 详情 |
|---|
| 外部端点 | 仅限https://api.ezcto.fun(EZCTO社区缓存) |
| 传输数据 |
URL字符串、SHA256 HTML哈希、提取的结构化JSON |
|
不传输 | 原始HTML、本地文件内容、凭证、环境变量 |
|
Shell注入防护 | 所有用户提供值均进行URL编码或作为python3参数传递,绝不进行字符串插值 |
|
提示注入防护 | HTML经过清理(移除脚本/样式/注释),包裹在
htmlcontent> XML分隔符中,在内容前注入明确的LLM护栏指令 |
| 使用的Shell命令 | curl(获取/API)、sha256sum(哈希)、python3(URL编码、安全JSON构建) |
| 文件系统写入 | ~/.ezcto/cache/(缓存结果)、/tmp/(临时文件,使用后清理) |
工作流程
步骤1:检查EZCTO缓存(零成本快速路径)
bash
set -euo pipefail
验证URL协议——拒绝非http/https以防止SSRF
if [[ ! {URL} =~ ^https?:// ]]; then
echo {found:false,error:invalidurl} > /tmp/cacheresponse.json
http_code=400
else
# URL编码以防止查询字符串注入
encoded_url=$(python3 -c import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1],safe=)) -- {URL})
httpcode=$(curl -s -o /tmp/cacheresponse.json -w %{http_code} \
https://api.ezcto.fun/v1/translate?url=${encoded_url})
fi
条件逻辑:
- - 如果httpcode == 200且为有效JSON → 跳至步骤9(返回缓存结果)
- 如果httpcode == 404 → 缓存未命中,继续步骤2
- 如果http_code >= 500 → API错误,记录警告,继续步骤2(降级模式)
OpenClaw说明: 缓存命中消耗0令牌,约1秒内完成。
步骤2:获取HTML
bash
set -euo pipefail
将URL作为参数传递给curl——--分隔符可防止URL以-开头时的标志注入
curl -s -L -A OpenClaw/1.0 (EZCTO Smart Web Reader) -o /tmp/page.html -- {URL}
fetch_status=$?
错误处理:
javascript
if (fetch_status !== 0) {
return {
skill: ezcto-smart-web-reader,
status: error,
error: {
code: fetch_failed,
message: 无法获取URL:{URL},
httpstatus: fetchstatus,
suggestion: 检查URL是否可访问且未被地理封锁
}
}
}
护栏: 如果HTML超过500KB,仅提取
以防止上下文溢出。
步骤3:计算HTML哈希(防篡改验证)
bash
html_hash=$(sha256sum /tmp/page.html | awk {print $1})
echo HTML哈希:sha256:${html_hash} >&2 # 记录日志用于调试
目的: 实现资产库中的去重和篡改检测。
步骤4:自动检测站点类型(零令牌,纯文本匹配)
按references/site-type-detection.md执行模式匹配:
javascript
const html = readFile(/tmp/page.html)
let site_types = []
let extensionstoload = []
// 加密货币/Web3检测(需要3个以上信号)
let crypto_signals = 0
if (/0x[a-fA-F0-9]{40}/.test(html) && /contract|token address|CA/i.test(html)) crypto_signals++
if (/tokenomics|token distribution|buy tax|sell tax/i.test(html)) crypto_signals++
if (/dexscreener|dextools|pancakeswap|uniswap|raydium/i.test(html)) crypto_signals++
if (/smart contract|blockchain|DeFi|NFT|staking|web3/i.test(html)) crypto_signals++
if (/t\.me\/|discord\.gg\//i.test(html)) crypto_signals++
if (crypto_signals >= 3) {
site_types.push(crypto)
extensionstoload.push(references/extensions/crypto-fields.md)
}
// 电商检测(需要3个以上信号)
let ecommerce_signals = 0
if (/add to cart|buy now|checkout|shopping cart/i.test(html)) ecommerce_signals++
if (/\$\d+\.\d{2}|¥\d+|€\d+|£\d+/.test(html)) ecommerce_signals++
if (/@type\s:\s(Product|Offer)/.test(html)) ecommerce_signals++
if (/shopify|stripe|paypal|square/i.test(html)) ecommerce_signals++
if (/shipping|returns|warranty|inventory/i.test(html)) ecommerce_signals++
if (ecommerce_signals >= 3) {
site_types.push(ecommerce)
extensionstoload.push(references/extensions/ecommerce-fields.md)
}
// 餐饮检测(需要3个以上信号)
let restaurant_signals = 0
if (/\bmenu\b|reservation|order online|delivery/i.test(html)) restaurant_signals++
if (/@type\s:\s(Restaurant|FoodEstablishment)/.test(html)) restaurant_signals++
if (/doordash|ubereats|opentable|grubhub/i.test(html)) restaurant_signals++
if (/Mon-Fri|\d{1,2}:\d{2}\s*[AP]M|opening hours/i.test(html)) restaurant_signals++
if (/cuisine|dine-in|takeout|catering/i.test(html)) restaurant_signals++
if (restaurant_signals >= 3) {
site_types.push(restaurant)
extensionstoload.push(references/extensions/restaurant-fields.md)
}
// 未匹配任何类型时默认通用
if (site_types.length === 0) {
site_types = [general]
}
console.log(检测到的站点类型:${site_types.join(, )})
步骤5:组装翻译提示
javascript
// 加载基础提示
let prompt = readFile(references/translate-prompt.md)
// 追加类型特定扩展
for (const extpath of extensionsto_load) {
prompt += \n\n---\n\n + readFile(ext_path)
}
// --- 提示注入防护 ---
// 清理HTML:在注入LLM提示前移除脚本、样式、注释和元标签。
// 防止恶意网页嵌入可操纵Agent的指令。
function sanitizeHTML(html) {
html = html.replace(/