Finance OCR Pro
Run this skill only after the user has expressed intent to perform OCR.
This skill is especially helpful for financial reports, annual reports, prospectuses, investor presentations, regulatory filings, research reports, and other documents with complicated structure, charts, graphs, tables, and mixed layout elements.
Security And Privacy
Before running OCR, make the operating model clear:
- This skill requires three environment variables, all of which must be configured before OCR can run:
  - API_KEY (sensitive) -- the API key for authenticating with the VLM endpoint.
  - BASE_URL -- the base URL of the OpenAI-compatible VLM endpoint. All page images and OCR prompts are transmitted to this URL.
  - VLM_MODEL -- the vision-capable model identifier. Must support image inputs; text-only models will not work.
- OCR sends rendered page images and structured prompts to BASE_URL. This is the primary data-transmission path. Users must verify that the endpoint is trusted before processing sensitive documents.
- If the user wants offline or local-only OCR, BASE_URL must point to a local VLM service. Do not run this skill against an external endpoint with sensitive documents unless the provider is trusted.
- Never commit a populated .env file. Use .env.example as a template and keep real credentials local.
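The configuration requirement above can be verified before any page image leaves the machine. The following is a minimal sketch rather than the skill's own code: the helper name `missing_ocr_env` is hypothetical, and it assumes the three variables are read from the process environment.

```python
import os

# The three variables named in this document.
REQUIRED_VARS = ("API_KEY", "BASE_URL", "VLM_MODEL")


def missing_ocr_env(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_ocr_env()
    if missing:
        # Report names only; never print the value of API_KEY.
        print("OCR cannot run; missing: " + ", ".join(missing))
    else:
        print("All required variables are set.")
```

Only variable names are reported, so the sensitive API_KEY value never reaches logs or the terminal.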
Pre-Run Notice
After the user asks for OCR or extraction, give a short notice that includes:
- whether BASE_URL is local or remote
- which VLM_MODEL will be used
- which execution mode will be used
- where results will be written
- that page images and prompts will be transmitted to the configured endpoint
Proceed automatically unless the user asks to change those defaults.
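One way to assemble the notice described above is sketched here. Both function names are hypothetical, and the local/remote test is an assumed heuristic (localhost or a loopback IP counts as local), not a rule stated by the skill; an endpoint on a private LAN address would be reported as remote.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(base_url):
    """Heuristic: True when the URL host is localhost or a loopback address."""
    host = urlparse(base_url).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name): treat it as remote.
        return False


def pre_run_notice(base_url, model, mode, result_path):
    """Assemble the five notice items into a short multi-line message."""
    location = "local" if is_local_endpoint(base_url) else "remote"
    return "\n".join([
        f"Endpoint: {location} (BASE_URL)",
        f"Model: {model}",
        f"Execution mode: {mode}",
        f"Results: {result_path}",
        "Page images and prompts will be transmitted to the configured endpoint.",
    ])
```

Treating every non-loopback host as remote errs on the side of warning the user, which fits the privacy guidance in this document.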
Defaults To Announce
- Running mode: background job by default
- Model: VLM_MODEL
- Threads: 1
- Result path:
  - background: ~/.semantic-ocr/jobs/<job_id>/results/
  - synchronous: ocroutput/OCR/results/
Setup
Use the skill-local virtual environment if present.
- macOS/Linux: .venv/bin/python
- Windows: .venv/Scripts/python.exe
- Fallback: python
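The interpreter choice above can be expressed as a small helper. This is a sketch under assumptions: `pick_interpreter` is a hypothetical name, and the virtual environment is assumed to live in `.venv` directly under the skill directory.

```python
import os
from pathlib import Path


def pick_interpreter(skill_dir=".", os_name=None):
    """Return the skill-local venv interpreter if it exists, else "python"."""
    os_name = os.name if os_name is None else os_name
    root = Path(skill_dir)
    if os_name == "nt":  # Windows
        candidate = root / ".venv" / "Scripts" / "python.exe"
    else:  # macOS/Linux
        candidate = root / ".venv" / "bin" / "python"
    # Fall back to whatever "python" resolves to on PATH.
    return str(candidate) if candidate.is_file() else "python"
```

The `os_name` parameter exists only so the choice can be exercised on any platform; in normal use it defaults to the running system.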
Run:
    python scripts/ocr_setup.py --check

If setup is incomplete, run:

    python scripts/ocr_setup.py
Preferred Execution
By default, start a background worker:
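    python scripts/ocrctl.py --json start /path/to/document.pdf

Then inspect progress and outputs:

    python scripts/ocrctl.py --json status
    python scripts/ocrctl.py --json artifacts
    python scripts/ocrctl.py --json tail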
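Because a background job can run for a while, the status check is a natural candidate for a polling loop. The sketch below wraps `python scripts/ocrctl.py --json status` and is illustrative only: `ocrctl_json` and `wait_for_job` are hypothetical helpers, and the shape of the JSON that ocrctl prints is not documented here, so the caller supplies the `is_finished` predicate.

```python
import json
import subprocess
import time


def ocrctl_json(subcommand, python="python", run=subprocess.run):
    """Run `ocrctl.py --json <subcommand>` and return the parsed JSON output."""
    cmd = [python, "scripts/ocrctl.py", "--json", subcommand]
    proc = run(cmd, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


def wait_for_job(is_finished, interval=30.0, max_polls=120,
                 fetch=ocrctl_json, sleep=time.sleep):
    """Poll the status until is_finished(status) is true or polls run out."""
    status = None
    for attempt in range(max_polls):
        status = fetch("status")
        if is_finished(status):
            break
        if attempt < max_polls - 1:
            sleep(interval)  # wait before the next check
    return status
```

A caller would pass a predicate that matches whatever the status output actually contains, for example `wait_for_job(lambda s: s.get("state") == "done")`, where the `state` field is an assumption made for illustration.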
Use synchronous mode only when the user explicitly wants inline execution:
    python scripts/ocr_main.py /path/to/document.pdf
Notes
- Inputs: PDF, common office documents, Apple office formats, and images.
- Outputs: merged Markdown, HTML review report, DOCX, and Excel.
- OCR requires API_KEY, BASE_URL, and VLM_MODEL to be configured before running.
- Sensitive document pages are transmitted to the configured endpoint during OCR unless the endpoint is a local service.
- Best suited for financial documents and other visually dense materials with tables, charts, graphs, and complex page structure.
- Office-document conversion may require LibreOffice.
- OCR extraction by the VLM can be time-consuming; check the job status regularly.
Finance OCR Pro
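Run this skill only after the user has expressed intent to perform OCR.
This skill is especially suited to financial statements, annual reports, prospectuses, investor presentations, regulatory filings, research reports, and other documents with complex structure that contain charts, graphs, tables, and mixed layout elements.
Security And Privacy
Before running OCR, make the operating model clear:
- This skill requires three environment variables, all of which must be configured before OCR can run:
  - API_KEY (sensitive) -- the API key used to authenticate with the VLM endpoint.
  - BASE_URL -- the base URL of the OpenAI-compatible VLM endpoint. All page images and OCR prompts are transmitted to this URL.
  - VLM_MODEL -- the vision-capable model identifier. It must support image inputs; text-only models will not work.
- OCR sends rendered page images and structured prompts to BASE_URL. This is the primary data-transmission path. Users must confirm that the endpoint is trusted before processing sensitive documents.
- If the user needs offline or local-only OCR, BASE_URL must point to a local VLM service. Do not run this skill against an external endpoint with sensitive documents unless the provider is trusted.
- Never commit a .env file that contains real values. Use .env.example as a template and keep real credentials local.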
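Pre-Run Notice
After the user requests OCR or extraction, give a short notice that includes:
- whether BASE_URL is local or remote
- which VLM_MODEL will be used
- which execution mode will be used
- where results will be written
- that page images and prompts will be transmitted to the configured endpoint
Proceed automatically unless the user asks to change these defaults.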
Defaults To Announce
- Running mode: background job by default
- Model: VLM_MODEL
- Threads: 1
- Result path:
  - background: ~/.semantic-ocr/jobs/<job_id>/results/
  - synchronous: ocroutput/OCR/results/
Setup
Use the skill-local virtual environment if one exists.
- macOS/Linux: .venv/bin/python
- Windows: .venv/Scripts/python.exe
- Fallback: python
Run:

    python scripts/ocr_setup.py --check

If setup is incomplete, run:

    python scripts/ocr_setup.py
Preferred Execution
By default, start a background worker:

    python scripts/ocrctl.py --json start /path/to/document.pdf

Then inspect progress and outputs:

    python scripts/ocrctl.py --json status
    python scripts/ocrctl.py --json artifacts
    python scripts/ocrctl.py --json tail

Use synchronous mode only when the user explicitly asks for inline execution:

    python scripts/ocr_main.py /path/to/document.pdf
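Notes
- Inputs: PDF, common office documents, Apple office formats, and images.
- Outputs: merged Markdown, HTML review report, DOCX, and Excel.
- API_KEY, BASE_URL, and VLM_MODEL must be configured before OCR runs.
- During OCR, sensitive document pages are transmitted to the configured endpoint unless the endpoint is a local service.
- Best suited for financial documents and other visually dense materials that contain tables, charts, graphs, and complex page structure.
- Office-document conversion may require LibreOffice.
- OCR extraction by the VLM can be time-consuming; check the status regularly.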