doc2markdown
Document conversion assistant that automatically converts documents to Markdown (MD), saving output to the same directory as the source file. Designed to help intelligent agents read and process document content in various formats.
Quick Start
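    # Convert a document (polls for up to 60 seconds; downloads on completion, returns the doc ID on timeout)
    node scripts/doc2markdown.js convert <file path>

    # Check status and download (for documents that timed out earlier)
    node scripts/doc2markdown.js check <doc ID> <original file path>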
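An agent running inside Node may prefer to call the script programmatically rather than through a shell. The following is a minimal sketch, assuming the CLI accepts `convert <file path>` and `check <doc ID> <original file path>` and that the working directory is the skill's root folder. The helper names (`convertArgs`, `checkArgs`, `runDoc2Markdown`) are hypothetical and not part of the tool.

```javascript
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");

const execFileAsync = promisify(execFile);

// Script path as used in the Quick Start commands; assumes the current
// working directory is the skill's root folder.
const SCRIPT = "scripts/doc2markdown.js";

// Builds the argument list for `convert <file path>`.
function convertArgs(filePath) {
  return [SCRIPT, "convert", filePath];
}

// Builds the argument list for `check <doc ID> <original file path>`.
function checkArgs(docId, originalFilePath) {
  return [SCRIPT, "check", docId, originalFilePath];
}

// Hypothetical wrapper: runs the CLI with the current Node binary and
// returns its raw stdout. The output format is not documented here,
// so no parsing is attempted.
async function runDoc2Markdown(args) {
  const { stdout } = await execFileAsync(process.execPath, args);
  return stdout;
}
```

Passing arguments as an array to `execFile` avoids shell quoting problems when a file path contains spaces, for example `runDoc2Markdown(convertArgs("/tmp/annual report.pdf"))`.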
Capabilities
- Supported formats: docx, doc, pdf, ppt, pptx, xls, xlsx, jpg, jpeg, png, ceb, teb, caj, odt, ofd, cebx, odp, ott, wps, ods, et, dps, epub, chm, sdc, sdd, sdw, mobi, etc.
- Preserves document structure, tables, and images
- No API key or account required; zero external dependencies
- Output directory: {doc_id}_{filename}/ in the same directory as the source file
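Before invoking a conversion, an agent could check the file extension against the listed formats and predict where the output will land. This is only a sketch under stated assumptions: the helper names are hypothetical, the extension set covers only the formats named above (the list ends with "etc.", so the tool may accept more), and `{filename}` is assumed to mean the base name without its extension.

```javascript
const path = require("node:path");

// Extensions named in the Capabilities list. The source list ends with
// "etc.", so this is a conservative subset, not the tool's full list.
const LISTED_EXTENSIONS = new Set([
  "docx", "doc", "pdf", "ppt", "pptx", "xls", "xlsx", "jpg", "jpeg", "png",
  "ceb", "teb", "caj", "odt", "ofd", "cebx", "odp", "ott", "wps", "ods",
  "et", "dps", "epub", "chm", "sdc", "sdd", "sdw", "mobi",
]);

// Hypothetical helper: true if the file extension is one of the listed formats.
function isListedFormat(filePath) {
  const ext = path.extname(filePath).slice(1).toLowerCase();
  return LISTED_EXTENSIONS.has(ext);
}

// Hypothetical helper: builds the expected output directory next to the source.
// Assumption: {filename} is the base name without its extension.
function expectedOutputDir(docId, filePath) {
  const { dir, name } = path.parse(filePath);
  return path.join(dir, `${docId}_${name}`);
}
```

For instance, `isListedFormat("Report.PDF")` returns `true` because the comparison is case-insensitive, while `isListedFormat("notes.txt")` returns `false`.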
When to Use
- User requests to "read", "extract", "convert", or "view" a document
- User provides a document path and asks about its content
- User needs to summarize or analyze a document
- User needs to convert document content to Markdown format
Workflow
convert — Convert Document
1. Invoke the file parsing service
2. Auto-poll the conversion status (up to 60 seconds)
3. Completes within 60s → auto-download and extract to the source file directory
4. Exceeds 60s → return the doc ID for a subsequent check query
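The poll-then-hand-back behavior described above can be sketched as a bounded polling loop. This is an illustration, not the script's actual implementation: `pollConversion` is a hypothetical name, `checkStatus` is an assumed caller-supplied async function returning `{ done: boolean }`, and the 2-second interval is an arbitrary choice (only the 60-second limit comes from the text).

```javascript
// Resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical sketch: poll until the conversion is done or the time budget
// runs out. `checkStatus` is an assumed async function returning
// { done: boolean }; it is not the tool's real API.
async function pollConversion(
  docId,
  checkStatus,
  { timeoutMs = 60_000, intervalMs = 2_000 } = {}
) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const status = await checkStatus(docId);
    if (status.done) {
      // Caller can now download and extract the result.
      return { state: "complete", docId };
    }
    if (Date.now() + intervalMs > deadline) {
      // Out of time: hand the doc ID back for a later `check`.
      return { state: "pending", docId };
    }
    await sleep(intervalMs);
  }
}
```

Returning the doc ID in both outcomes lets the caller either proceed to download or store the ID and run `check` later, without treating a slow conversion as an error.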
check — Query and Download
1. Provide the previously returned doc ID
2. Download if complete; otherwise continue polling for up to 60 seconds
3. Prompt to retry later if still not complete
Data & Privacy
- convert uploads files to the docchain cloud service (lab.hjcloud.com) for parsing. Results are returned as a ZIP archive and extracted locally.
- All transfers use HTTPS encryption.
- Before converting documents with sensitive content, please verify that the service's data retention policy meets your security requirements.
- For details, visit https://lab.hjcloud.com/llmdoc
Feedback & Support
For parsing errors, format issues, or other problems, please submit an issue on GitHub:
https://github.com/wct-lab/docchain-skills