CareMax Upload & OCR

Requires caremax-auth as a sibling directory (../caremax-auth/). If missing, tell the user to install caremax-auth first (e.g. npx skills add KittenYang/caremax-skills).

Upload medical report files (PDF, JPG, PNG, HEIC) and extract structured data via AI-powered OCR.

Session-based workflow: upload → OCR → review → confirm. All operations are on a single session.

Checkpoint & resume: Every pipeline step saves progress to the database. If OCR fails mid-way (LLM timeout, worker crash, network error), retrying automatically resumes from the last checkpoint — no work is lost.

Agent default behavior (MANDATORY)

1. Upload and OCR are one continuous workflow. When the user uploads report files (or asks you to upload, scan, or recognize a checkup report, e.g. 扫描 / 识别体检报告), you must run $OCRSTREAM <session_id> in the same turn, as soon as $UPLOAD returns successfully, using the returned session_id. Do not end the task after upload.sh alone.
2. Upload-only exception: Skip immediate OCR only if the user explicitly asked to upload without recognition (e.g. 只上传 "upload only", 不要识别 "don't recognize", 别跑 OCR "don't run OCR", 只存文件 "just store the files"). If unclear, default to running OCR after upload.
3. Progress: Stream each SSE line to the user as it arrives (normalize / ocr / structure / …).
4. After step=done: Always continue to Step 3 (review). Do not auto-call confirm; wait for user approval before Step 4.

Prerequisites — Auto-Auth (MANDATORY)

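```bash
APICALL="bash ../caremax-auth/scripts/api-call.sh"
UPLOAD="bash ../caremax-auth/scripts/upload.sh"
OCRSTREAM="bash ../caremax-auth/scripts/ocr-stream.sh"
```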

If any script returns no_credentials → run bash ../caremax-auth/scripts/auth-flow.sh [base_url] (from this skill’s root, sibling of caremax-auth/).
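The check can be sketched as follows. This is only an illustration: it assumes a script signals missing credentials by including the literal string no_credentials somewhere in its output (the exact response shape is not specified here), and `needs_auth` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when a script's output mentions
# no_credentials. The exact response shape is an assumption.
needs_auth() {
  case "$1" in
    *no_credentials*) return 0 ;;
    *) return 1 ;;
  esac
}

# Made-up response body for illustration; a real one would come from
# $APICALL, $UPLOAD or $OCRSTREAM.
response='{"error":"no_credentials"}'
if needs_auth "$response"; then
  echo "run: bash ../caremax-auth/scripts/auth-flow.sh"
fi
```

After the auth flow completes, the original command would be run again.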

Step 1: Upload (creates session)

```bash
$UPLOAD /path/to/report1.jpg /path/to/report2.jpg /path/to/report.pdf
```

Returns:
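```json
{
  "session_id": "uuid-xxx",
  "member_id": "uuid-yyy",
  "files": [
    { "id": "file-1", "original_name": "report1.jpg" },
    { "id": "file-2", "original_name": "report2.jpg" },
    { "id": "file-3", "original_name": "report.pdf" }
  ]
}
```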

Save the session_id.
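One way to capture the session_id in a shell variable is sketched below. It assumes the upload response is a JSON object with a top-level string field named session_id; `extract_session_id` is a hypothetical helper, and the sed expression is a simplification (a real JSON parser such as jq would be more robust).

```bash
#!/usr/bin/env bash
# Hypothetical helper: pull the "session_id" string value out of a JSON text.
# sed-based extraction is for illustration only.
extract_session_id() {
  printf '%s\n' "$1" | tr -d '\n' |
    sed -n 's/.*"session_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sample response standing in for the real output of $UPLOAD.
upload_response='{ "session_id": "uuid-xxx", "member_id": "uuid-yyy", "files": [] }'
session_id=$(extract_session_id "$upload_response")
echo "session_id=$session_id"
# The next command in the same turn would be: $OCRSTREAM "$session_id"
```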

Step 2: OCR with real-time progress

```bash
$OCRSTREAM <session_id>
```

Outputs one JSON per line:
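```json
{"step":"resume","progress":1,"message":"Resuming from checkpoint (last completed: ocr)..."}
{"step":"normalize","progress":5,"message":"Loading file 1/3..."}
{"step":"ocr","progress":30,"message":"OCR page 2/3: report2.jpg"}
{"step":"ocr_retry","progress":35,"message":"Retrying OCR page 1/1: report1.jpg"}
{"step":"structure","progress":62,"message":"Detecting report groups..."}
{"step":"structure","progress":75,"message":"Structuring report 2/2..."}
{"step":"normalize_indicators","progress":88,"message":"Normalizing..."}
{"step":"done","progress":100,"data":{"session_id":"...","reports":[...],"resumed":true}}
```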

Display progress to the user as each line arrives.
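A minimal sketch of per-line progress display follows. It assumes each line is a compact JSON object that starts with a "step" field and carries a numeric "progress" field; `format_progress` is a hypothetical helper, and the sample lines stand in for the live stream.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn one progress line into e.g. "[ 30%] ocr".
# sed-based field extraction is a simplification that relies on the
# compact {"step":...,"progress":...} shape assumed in the lead-in.
format_progress() {
  local line=$1 step progress
  step=$(printf '%s\n' "$line" | sed -n 's/^{"step":"\([^"]*\)".*/\1/p')
  progress=$(printf '%s\n' "$line" | sed -n 's/.*"progress":\([0-9][0-9]*\).*/\1/p')
  printf '[%3s%%] %s\n' "${progress:-?}" "${step:-unknown}"
}

# In practice the loop would read from: $OCRSTREAM "$session_id"
printf '%s\n' \
  '{"step":"normalize","progress":5,"message":"Loading file 1/3..."}' \
  '{"step":"ocr","progress":30,"message":"OCR page 2/3: report2.jpg"}' |
while IFS= read -r line; do
  format_progress "$line"
done
```

Reading with `while IFS= read -r` handles each line as soon as it arrives instead of waiting for the stream to end.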

Key progress events

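| step | meaning |
| --- | --- |
| resume | Pipeline is resuming from a saved checkpoint (not starting from zero) |
| info | Informational message (e.g. which step the run is resuming from) |
| normalize | Loading and preprocessing files |
| ocr | Page-by-page OCR text extraction |
| ocr_retry | Retrying only the pages that failed previously |
| structure | AI analysis and grouping of reports |
| normalize_indicators | Standardizing indicator names |
| done | Complete; the data field contains the full results |
| error | Pipeline failed; see message for details |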

If step=resume appears, tell the user: "正在从上次的进度继续处理（不需要重新开始）" (in English: "Continuing from the previous progress; no need to start over").

Error responses from `$OCRSTREAM`

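| code | meaning | action |
| --- | --- | --- |
| processing_in_progress | Another OCR run is still active | Wait and retry, or poll /status |
| ocr_limit_exceeded | Free OCR quota exhausted | Tell the user to upgrade |
| (no code) | Pipeline error (LLM timeout, etc.) | Retry; it auto-resumes from the checkpoint |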
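The mapping from error code to action can be sketched as a small lookup. `action_for_error` and the action labels it prints are hypothetical names chosen for illustration; the code ocr_limit_exceeded is assumed to be the quota-exhausted code, and anything else falls through to an "unknown" branch.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map the code of a step=error event to a suggested action.
# The printed labels are illustrative, not part of the CareMax API.
action_for_error() {
  case "$1" in
    processing_in_progress) echo "wait_or_poll_status" ;;
    ocr_limit_exceeded)     echo "tell_user_to_upgrade" ;;
    "")                     echo "retry_ocr" ;;      # no code: retry resumes from checkpoint
    *)                      echo "unknown_code" ;;   # not described in this document
  esac
}

action_for_error "processing_in_progress"
action_for_error ""
```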

Step 2b: Poll status (when SSE disconnects)

If the SSE stream disconnects (network timeout, terminal closed), use the status endpoint to check progress:

```bash
$APICALL GET "/api/skill/sessions/<session_id>/status"
```

Returns:
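```json
{
  "session_id": "uuid",
  "status": "processing",
  "pipeline": {
    "completedStep": "ocr",
    "pageCount": 5,
    "ocrCompleted": 4,
    "ocrFailed": 1,
    "reportCount": 0,
    "errors": [{"step":"ocr","pageIndex":2,"message":"PaddleOCR timeout"}]
  },
  "error": null,
  "is_stale": false
}
```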

Field guide:

- status = processing + is_stale = false → OCR is still running normally
- status = processing + is_stale = true → Worker crashed/timed out, safe to retry OCR
- status = awaiting_confirm → OCR completed! Fetch session detail for results
- status = uploading + error present → Last OCR attempt failed, retry will resume from checkpoint
- pipeline.completedStep → How far the pipeline got (normalize → ocr → structure → done)
- pipeline.ocrFailed → Number of pages that failed OCR (will be retried on next attempt)

Polling workflow:
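```
1. Call $OCRSTREAM → SSE disconnects mid-way
2. Poll GET /sessions/<session_id>/status every 5-10 seconds
3. When status = awaiting_confirm → fetch the full results with GET /sessions/<session_id>
4. If status = uploading (failed) → retry with $OCRSTREAM (auto-resumes)
5. If is_stale = true → retry with $OCRSTREAM (auto-resumes from checkpoint)
```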
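The decision made on each poll can be sketched as a pure function of three fields from the status response. `next_action` and the labels it prints are hypothetical; extracting status, is_stale and error from the JSON is left out, and an uploading session with no error is treated as "OCR not started yet", which is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide what to do after one status poll.
# Arguments: status, is_stale ("true"/"false"), error ("" when null).
next_action() {
  local status=$1 is_stale=$2 error=$3
  case "$status" in
    awaiting_confirm) echo "fetch_results" ;;
    processing)
      if [ "$is_stale" = "true" ]; then echo "retry_ocr"; else echo "keep_polling"; fi ;;
    uploading)
      # With an error present the last attempt failed; a retry resumes from checkpoint.
      if [ -n "$error" ]; then echo "retry_ocr"; else echo "start_ocr"; fi ;;
    *) echo "unknown_status" ;;
  esac
}

next_action processing false ""
next_action processing true ""
next_action awaiting_confirm false ""
```

A polling loop would call this every 5-10 seconds and stop as soon as the result is anything other than keep_polling.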

Step 3: Review results (MANDATORY)

Parse the step=done data. Show formatted summary. Do NOT auto-confirm.

Each report has a reportType field: lab, genetic, imaging, pathology, or other.

Lab reports (reportType = "lab")

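Show indicators table:

```
📋 Report 1: [lab] Urine biochemistry (No.: 114431194)
   Date: 2025-02-05  Doctor: 俞海瑾
   Indicators: 12 (3 abnormal)
┌──────────────────────┬────────┬──────────┬────────────┬──────────┐
│ Indicator            │ Result │ Unit     │ Reference  │ Abnormal │
├──────────────────────┼────────┼──────────┼────────────┼──────────┤
│ 24h urinary sodium   │ 130.0  │ mmol/24h │ 137-257    │ ⬇        │
└──────────────────────┴────────┴──────────┴────────────┴──────────┘
```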

Non-lab reports (reportType = "genetic" / "imaging" / etc.)

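Show summary + sections:

```
📋 Report 1: [genetic] Genetic test report
   Date: 2025-09-12  Lab: 南京申友医学检验所
   Summary: 18-item cardiovascular gene panel... hypertension and coronary heart disease risk: average...
   Sections: 18 sections
     [gene_variant] Hypertension — risk: normal
     [gene_variant] Coronary heart disease — risk: average
     [medication] ACEI antihypertensives — normal metabolizer
     ...
```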
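Choosing between the two layouts comes down to a switch on reportType, sketched here. `render_mode` and the labels it prints are hypothetical names used only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pick the review layout for a report from its reportType.
render_mode() {
  case "$1" in
    lab)                             echo "indicators_table" ;;
    genetic|imaging|pathology|other) echo "summary_and_sections" ;;
    *)                               echo "unknown_report_type" ;;
  esac
}

render_mode lab
render_mode genetic
```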

Supported file types

- Images (JPG/PNG/HEIC): PaddleOCR → structure
- PDF (any size): Azure Mistral Document AI page-split → structure
  - Large PDFs (e.g. 23-page gene report, 9.6MB) are fully supported
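As a rough illustration of the routing above, the sketch below classifies a file by its extension before upload. The routing itself happens server-side; `pipeline_for_file` and its output labels are hypothetical, and accepting .jpeg as an alias for .jpg is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: name the OCR path a file would take, based on its extension.
pipeline_for_file() {
  local ext
  # Lowercase the extension with tr so the sketch also runs on older bash versions.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    jpg|jpeg|png|heic) echo "paddleocr" ;;               # jpeg alias is an assumption
    pdf)               echo "document_ai_page_split" ;;
    *)                 echo "unsupported" ;;
  esac
}

pipeline_for_file /path/to/report1.jpg
pipeline_for_file /path/to/report.pdf
```

A check like this lets unsupported files be rejected before $UPLOAD is called.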

Step 4: Confirm and save

After user confirms:

CODEBLOCK10

Returns: INLINECODE44

Resuming incomplete sessions

When the user asks to continue/resume a previous upload, or when checking for unfinished work:

Step A: Find pending sessions

CODEBLOCK11

Show a summary of pending sessions to the user (file names, dates, status).

Step B: Resume based on status

- uploading: Start OCR directly → go to Step 2 ($OCRSTREAM <session_id>)
  - If there's a saved checkpoint (previous failed attempt), OCR auto-resumes from it

- processing: Check with status endpoint first:

  $APICALL GET "/api/skill/sessions/<session_id>/status"

  - is_stale = false → still running, wait or poll
  - is_stale = true → worker died, safe to retry: $OCRSTREAM <session_id> (auto-resumes from checkpoint)

- awaiting_confirm: Get session detail → show results → go to Step 3 (review & confirm)

CODEBLOCK13

If the session is awaiting_confirm, the response includes ocr_result with the previously parsed reports. Display them for review (Step 3), then confirm (Step 4) once the user approves.

Resume-aware response handling

When $OCRSTREAM outputs step=done:

- resumed = true in the data → tell user: "已从上次的进度恢复，OCR 结果已就绪" (in English: "Resumed from the previous progress; the OCR results are ready")
- resumed = false (or absent) → normal fresh run

When $OCRSTREAM outputs step=error:

- code = processing_in_progress → tell user OCR is still running, poll /status instead
- code = ocr_limit_exceeded → tell user to upgrade
- No code → LLM/network error, safe to retry (will auto-resume from checkpoint)
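For the step=done case, the resumed check can be sketched as a substring match. This assumes the done line is compact JSON, so a resumed run contains the literal text "resumed":true; `done_note` is a hypothetical helper that prints the note to show the user, or nothing for a fresh run.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose the user-facing note for a step=done line.
# Relies on compact JSON (no spaces around the colon), which is an assumption.
done_note() {
  case "$1" in
    *'"resumed":true'*) echo "已从上次的进度恢复，OCR 结果已就绪" ;;
    *)                  echo "" ;;   # resumed=false or absent: normal fresh run
  esac
}

done_note '{"step":"done","progress":100,"data":{"session_id":"uuid-xxx","reports":[],"resumed":true}}'
```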

Step C: Delete individual reports or stale sessions

Delete a single report (does NOT affect other reports in the same session):

CODEBLOCK14

Delete an entire session (cascade deletes ALL files + reports):

CODEBLOCK15

Other session operations

CODEBLOCK16
