Finance OCR Pro

Run this skill only after OCR intent from the user.

This skill is especially helpful for financial reports, annual reports, prospectuses, investor presentations, regulatory filings, research reports, and other documents with complicated structure, charts, graphs, tables, and mixed layout elements.

Security And Privacy

Before running OCR, make the operating model clear:

- This skill requires three environment variables, all of which must be configured before OCR can run:

- API_KEY (sensitive) -- the API key for authenticating with the VLM endpoint. - BASE_URL -- the base URL of the OpenAI-compatible VLM endpoint. All page images and OCR prompts are transmitted to this URL. - VLM_MODEL -- the vision-capable model identifier. Must support image inputs; text-only models will not work.

- OCR sends rendered page images and structured prompts to BASE_URL. This is the primary data-transmission path. Users must verify that the endpoint is trusted before processing sensitive documents.
If the user wants offline or local-only OCR, BASE_URL must point to a local VLM service. Do not run this skill against an external endpoint with sensitive documents unless the provider is trusted.
Never commit a populated .env file. Use .env.example as a template and keep real credentials local.

Pre-Run Notice

After the user asks for OCR or extraction, give a short notice that includes:

- whether BASE_URL is local or remote
which VLM_MODEL will be used
which execution mode will be used
where results will be written
that page images and prompts will be transmitted to the configured endpoint

Proceed automatically unless the user asks to change those defaults.

Defaults To Announce

- Running mode: background job by default
Model: INLINECODE9
Threads: INLINECODE10
Result path:

- background: ~/.semantic-ocr/jobs/<job_id>/results/ - synchronous: INLINECODE12

Setup

Use the skill-local virtual environment if present.

- macOS/Linux: INLINECODE13
Windows: INLINECODE14
Fallback: INLINECODE15

Run:

CODEBLOCK0

If setup is incomplete, run:

CODEBLOCK1

Preferred Execution

By default, start a background worker:

CODEBLOCK2

Then inspect progress and outputs:

CODEBLOCK3

Use synchronous mode only when the user explicitly wants inline execution:

CODEBLOCK4

Notes

- Inputs: PDF, common office documents, Apple office formats, and images.
Outputs: merged Markdown, HTML review report, DOCX, and Excel.
OCR requires API_KEY, BASE_URL, and VLM_MODEL to be configured before running.
Sensitive document pages are transmitted to the configured endpoint during OCR unless the endpoint is a local service.
Best suited for financial documents and other visually dense materials with tables, charts, graphs, and complex page structure.
Office-document conversion may require LibreOffice.
OCR extraction by the VLM model may be time-consuming; check the status regularly.

Finance OCR Pro

仅在用户提出OCR意图后运行此技能。

此技能特别适用于财务报表、年报、招股说明书、投资者演示文稿、监管文件、研究报告以及其他结构复杂、包含图表、图形、表格和混合布局元素的文档。

安全与隐私

在运行OCR之前，明确操作模式：

- 此技能需要三个环境变量，所有变量必须在OCR运行前配置完成：

- API_KEY（敏感信息）——用于向VLM端点进行身份验证的API密钥。 - BASE_URL——兼容OpenAI的VLM端点的基础URL。所有页面图像和OCR提示词都将传输到此URL。 - VLM_MODEL——支持视觉功能的模型标识符。必须支持图像输入；纯文本模型无法工作。

- OCR会将渲染后的页面图像和结构化提示词发送到BASEURL。这是主要的数据传输路径。在处理敏感文档前，用户必须确认端点可信。
如果用户需要离线或仅本地OCR，BASEURL必须指向本地VLM服务。除非服务提供商可信，否则不要对敏感文档使用外部端点运行此技能。
切勿提交包含实际值的.env文件。使用.env.example作为模板，并将真实凭据保存在本地。

运行前通知

在用户请求OCR或提取后，给出简短通知，内容包括：

- BASEURL是本地还是远程
将使用哪个VLMMODEL
将使用哪种执行模式
结果将写入何处
页面图像和提示词将传输到已配置的端点

除非用户要求更改这些默认设置，否则自动继续执行。

需声明的默认设置

- 运行模式：默认后台作业
模型：VLM_MODEL
线程数：1
结果路径：

- 后台：~/.semantic-ocr/jobs//results/ - 同步：ocroutput/OCR/results/

环境设置

如果存在技能本地虚拟环境，则使用该环境。

- macOS/Linux：.venv/bin/python
Windows：.venv/Scripts/python.exe
备用：python

运行：

bash
python scripts/ocr_setup.py --check

如果设置不完整，运行：

bash
python scripts/ocr_setup.py

首选执行方式

默认情况下，启动后台工作进程：

bash
python scripts/ocrctl.py --json start /path/to/document.pdf

然后检查进度和输出：

bash
python scripts/ocrctl.py --json status
python scripts/ocrctl.py --json artifacts
python scripts/ocrctl.py --json tail

仅当用户明确要求内联执行时使用同步模式：

bash
python scripts/ocr_main.py /path/to/document.pdf

备注

- 输入：PDF、常见办公文档、Apple办公格式和图像。
输出：合并后的Markdown、HTML审查报告、DOCX和Excel。
OCR运行前需要配置APIKEY、BASEURL和VLM_MODEL。
在OCR过程中，敏感文档页面会传输到已配置的端点，除非端点是本地服务。
最适合处理包含表格、图表、图形和复杂页面结构的财务文档及其他视觉密集材料。
办公文档转换可能需要LibreOffice。
VLM模型的OCR提取可能耗时较长，请定期检查状态。

finance-ocr-pro金融OCR专业

finance-ocr-pro

Finance OCR Pro

Security And Privacy

Pre-Run Notice

Defaults To Announce

Setup

Preferred Execution

Notes

Finance OCR Pro

安全与隐私

运行前通知

需声明的默认设置

环境设置

首选执行方式

备注

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

finance-ocr-pro金融OCR专业

finance-ocr-pro

Finance OCR Pro

Security And Privacy

Pre-Run Notice

Defaults To Announce

Setup

Preferred Execution

Notes

Finance OCR Pro

安全与隐私

运行前通知

需声明的默认设置

环境设置

首选执行方式

备注

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement