Convert Document To Markdown

Use this skill when a user wants a supported local file converted into Markdown for later processing.

What this skill does

- Converts supported local files into Markdown:

.pdf, .docx, .pptx, .xlsx, .jpg, .jpeg, .png, .gif, .bmp, .txt, .json, .xml, .md

- Image handling modes are file-type dependent:

ocr / vl / none for .docx, .pptx, .xlsx, and image files; ocr / vl / vl-page / none for .pdf

- Only runs through Docker. Do not use local Python execution as an operational path.
- Uses a prebuilt Aliyun CR image with fixed version 0.0.1:

convert-document-to-markdown-arm64:0.0.1 on ARM64 hosts, convert-document-to-markdown-x64:0.0.1 on x64 hosts

- Returns structured JSON by default so later tool calls can consume markdown, logs, and meta.
- Reads one-time VL configuration from OpenClaw skill config or the repository .env file, then forwards it into the container automatically.
- Only exposes the file command. URL, health, and version commands are intentionally removed to keep startup lean.
- Do not use latest, do not build a fallback image at runtime, and do not treat .doc, .ppt, .xls, audio files, or unlisted image formats as supported inputs.
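
The extension and mode rules above can be sketched as a small pre-flight check. This is a minimal sketch, not the wrapper's actual validation. The function name check_mode and its three result strings are hypothetical. Matching extensions case-insensitively is an assumption, and so is treating the text formats (.txt, .json, .xml, .md) like Office files for mode purposes, since the list above does not say how modes apply to them.

```shell
# Hypothetical pre-flight check. Prints "ok", "unsupported-extension",
# or "unsupported-mode" and returns non-zero for the last two.
check_mode() {
    file=$1
    mode=$2
    # A name without a dot has no extension at all.
    case "$file" in
        *.*) ;;
        *) echo unsupported-extension; return 1 ;;
    esac
    # Assumption: extensions are matched case-insensitively.
    ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        pdf) is_pdf=yes ;;
        docx|pptx|xlsx|jpg|jpeg|png|gif|bmp) is_pdf=no ;;
        # Assumption: text formats accept the same modes as Office files.
        txt|json|xml|md) is_pdf=no ;;
        *) echo unsupported-extension; return 1 ;;
    esac
    case "$mode" in
        ocr|vl|none) echo ok ;;
        vl-page)
            # vl-page is documented as PDF only.
            if [ "$is_pdf" = yes ]; then
                echo ok
            else
                echo unsupported-mode
                return 1
            fi ;;
        *) echo unsupported-mode; return 1 ;;
    esac
}
```

Calling check_mode before the Docker run lets a caller reject .doc, .ppt, or .xls inputs early instead of waiting for the container to fail.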

Required workflow

1. By default the scripts use crpi-4auaoyyj6r36p6lb.cn-hangzhou.personal.cr.aliyuncs.com/huozige_lab.
2. Let the wrapper script resolve the host architecture and choose convert-document-to-markdown-arm64:0.0.1 or convert-document-to-markdown-x64:0.0.1.
3. If needed, override with IMAGE_REGISTRY or IMAGE_NAME.
4. For a local file, run:

scripts/run_docker_cli.sh file <absolute-or-relative-path> --format json

5. Parse the JSON result.
   - If success is false, surface error.message and relevant logs.
   - If success is true, use markdown as the canonical output for downstream work.
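
The parsing step above can be sketched with jq. This is a minimal sketch under stated assumptions: jq is assumed to be installed on the host (the skill itself does not mention it), and handle_result is a hypothetical helper name. Only the fields named above (success, markdown, error.message, logs) are used, and because the shape of logs is not specified here, it is printed as raw JSON.

```shell
# Hypothetical helper: print Markdown on success, or the error on stderr.
# Assumes jq is installed on the host.
handle_result() {
    result=$1
    if printf '%s' "$result" | jq -e '.success == true' >/dev/null 2>&1; then
        # success is true: markdown is the canonical output.
        printf '%s' "$result" | jq -r '.markdown'
    else
        # success is false, or the output is not valid JSON: surface the error.
        printf '%s' "$result" | jq -r '.error.message // "unknown error"' >&2
        # The shape of logs is not specified, so print it as raw JSON.
        printf '%s' "$result" | jq '.logs // empty' >&2
        return 1
    fi
}

# Usage sketch (commented out because it needs Docker and the wrapper script):
# result=$(scripts/run_docker_cli.sh file "./notes.pdf" --format json)
# handle_result "$result" > notes.md
```

The non-zero return on failure makes it harder for a caller to treat a failed conversion as a success.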

One-time VL configuration

This skill is designed so the user does not need to re-enter Vision API settings on each run.

Preferred OpenClaw configuration in ~/.openclaw/openclaw.json:

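    {
      "skills": {
        "entries": {
          "convert_document_to_markdown": {
            "enabled": true,
            "apiKey": "sk-xxx",
            "env": {
              "VL_BASE_URL": "https://api.openai.com/v1",
              "VL_MODEL": "gpt-4.1-mini"
            }
          }
        }
      }
    }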

This works because:

- skillKey is convert_document_to_markdown
- primaryEnv is VL_API_KEY, so apiKey maps to VL_API_KEY
- env can hold VL_BASE_URL and VL_MODEL

Repository-local runtime configuration:

- copy .env.example to .env
- fill VL_BASE_URL, VL_API_KEY, and VL_MODEL
- by default the scripts use crpi-4auaoyyj6r36p6lb.cn-hangzhou.personal.cr.aliyuncs.com/huozige_lab
- optionally override with IMAGE_REGISTRY or IMAGE_NAME
- use scripts/run_docker_cli.sh, which loads .env, forwards any host VL_* variables into docker run, and pulls the correct fixed-version image if missing
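
The load-and-forward behaviour described above can be sketched as follows. This is a hedged illustration, not the contents of scripts/run_docker_cli.sh: load_env_file and vl_env_flags are hypothetical names, and sourcing the file is only one common way to read a .env file. The sketch relies on the documented docker run behaviour that -e NAME without a value passes the host's exported value through, which keeps secrets out of the command line.

```shell
# Hypothetical helper: export every assignment in an env file, if it exists.
# Pass a path containing a slash (for example ./.env) so "." does not search PATH.
load_env_file() {
    if [ -f "$1" ]; then
        set -a          # export all variables assigned while this is on
        . "$1"
        set +a
    fi
}

# Hypothetical helper: print "-e" and a variable name on alternating lines
# for each exported variable whose name starts with VL_.
vl_env_flags() {
    env | sed -n 's/^\(VL_[A-Za-z0-9_]*\)=.*$/\1/p' | sort -u |
    while read -r name; do
        printf '%s\n' -e "$name"
    done
}

# Usage sketch (commented out because it needs Docker):
# load_env_file ./.env
# docker run $(vl_env_flags) <image> <container arguments>
# The flags are left unquoted on purpose so each word becomes its own argument.
```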

Command patterns

Local file:

    scripts/run_docker_cli.sh file ./notes.pdf --image-process-model ocr --format json

Parameters

- --image-process-model ocr

Default mode. Use Tesseract OCR for images.

- --image-process-model vl

Use a Vision API. Only choose this when the environment provides VL_API_KEY, VL_BASE_URL, and VL_MODEL.

- --image-process-model none

Skip image recognition for speed.

- --image-process-model vl-page

PDF only. Do not use this mode for Office documents or image files.

- --format json|markdown

Use json unless the user explicitly wants raw Markdown on stdout.

- --output <path>

Save the Markdown to a file. Prefer this only when you invoke docker run directly with a writable host mount.

- --log-file <path>

Save detailed logs to a file. Prefer this only when you invoke docker run directly with a writable host mount.
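
Before choosing vl or vl-page, a caller can confirm that the Vision settings are present. The sketch below is one way to do that and only inspects the current shell environment, so it assumes any .env file or OpenClaw-provided values have already been loaded into it. The helper name missing_vl_vars is hypothetical.

```shell
# Hypothetical helper: print the name of each VL setting that is unset or empty.
# Prints nothing when all three are configured.
missing_vl_vars() {
    for name in VL_BASE_URL VL_API_KEY VL_MODEL; do
        # Indirect lookup of a fixed, known variable name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Usage sketch:
# if [ -n "$(missing_vl_vars)" ]; then
#     echo "vl mode needs: $(missing_vl_vars | tr '\n' ' ')" >&2
# fi
```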

Operational notes

- For very large local files, stay with the Docker CLI path; do not wrap the file content into base64 or a temporary HTTP service.
- The skill is Docker-only. Do not instruct users to run uv, python, or any other local runtime path for production use.
- The wrapper scripts choose the image by host architecture. Override with IMAGE_ARCH only when you have a concrete reason.
- Prefer IMAGE_REGISTRY plus the fixed version 0.0.1; only use IMAGE_NAME when you need to pass the full image reference explicitly.
- When the user asks for vl or vl-page, first check whether VL_BASE_URL, VL_API_KEY, and VL_MODEL are already configured via OpenClaw skill config or .env.
- If the user only needs extracted Markdown and not the raw JSON wrapper, read the JSON and return the markdown field.
- If the user provides an unsupported extension such as .doc, .ppt, .xls, .wav, .mp3, .m4a, or .mp4, say the current skill does not reliably support it.
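
The image selection rules in the notes above can be sketched as a single function. This is an illustration of the described precedence, not the wrapper's real code. The name resolve_image is hypothetical, and the set of architecture strings accepted for IMAGE_ARCH (arm64, aarch64, x64, x86_64, amd64) is an assumption. The default registry, image names, and the 0.0.1 tag come from this document.

```shell
# Hypothetical helper: print the image reference to run.
resolve_image() {
    # IMAGE_NAME, when set, is taken as the complete image reference.
    if [ -n "${IMAGE_NAME:-}" ]; then
        printf '%s\n' "$IMAGE_NAME"
        return 0
    fi
    registry=${IMAGE_REGISTRY:-crpi-4auaoyyj6r36p6lb.cn-hangzhou.personal.cr.aliyuncs.com/huozige_lab}
    # IMAGE_ARCH overrides detection; otherwise ask the host.
    arch=${IMAGE_ARCH:-$(uname -m)}
    case "$arch" in
        arm64|aarch64) suffix=arm64 ;;
        x64|x86_64|amd64) suffix=x64 ;;
        *) echo "unsupported architecture: $arch" >&2; return 1 ;;
    esac
    # The tag is pinned to 0.0.1; latest is never used.
    printf '%s/convert-document-to-markdown-%s:0.0.1\n' "$registry" "$suffix"
}
```

Keeping the tag hard-coded in one place mirrors the rule that the skill never falls back to latest or to a runtime build.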

Safety notes

- Treat file paths as untrusted input. Quote shell arguments correctly.
- Do not claim success unless the command returns success: true.
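
A minimal sketch of the quoting advice above, assuming a POSIX shell. The helper name convert_file and the RUNNER variable are hypothetical; RUNNER exists only so the command can be swapped out for a dry run. The sketch quotes the path so spaces, globs, and command substitutions in a file name are passed through literally, and it prefixes bare relative names with ./ so a name beginning with a dash is not read as an option.

```shell
# Hypothetical helper: run the wrapper on one untrusted path.
convert_file() {
    path=$1
    # Refuse anything that is not an existing regular file.
    if [ ! -f "$path" ]; then
        echo "not a file: $path" >&2
        return 1
    fi
    # Prefix bare relative names so "-rf.pdf" cannot be parsed as an option.
    case "$path" in
        /*|./*|../*) ;;
        *) path=./$path ;;
    esac
    # Double quotes keep the path as exactly one argument, unexpanded.
    "${RUNNER:-scripts/run_docker_cli.sh}" file "$path" --format json
}
```

Building a command string and passing it to eval or sh -c would reintroduce the injection risk, which is why the path is only ever passed as a quoted argument.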
