CareMax Upload & OCR
Requires caremax-auth as a sibling directory (../caremax-auth/). If missing, tell the user to install caremax-auth first (e.g. npx skills add KittenYang/caremax-skills).
Upload medical report files (PDF, JPG, PNG, HEIC) and extract structured data via AI-powered OCR.
Session-based workflow: upload → OCR → review → confirm. All operations are on a single session.
Checkpoint & resume: Every pipeline step saves progress to the database. If OCR fails mid-way (LLM timeout, worker crash, network error), retrying automatically resumes from the last checkpoint — no work is lost.
Agent default behavior (MANDATORY)
1. Upload and OCR are one continuous workflow. When the user uploads report files (or asks you to upload, scan (扫描), or recognize a medical checkup report (识别体检报告), etc.), you must run $OCRSTREAM <session_id> in the same turn, using the session_id returned by a successful $UPLOAD. Do not end the task after upload.sh alone.
2. Upload-only exception: Skip immediate OCR only if the user explicitly asked to upload without recognition (e.g. 只上传 "upload only", 不要识别 "don't recognize", 别跑 OCR "don't run OCR", 只存文件 "just store the files"). If unclear, default to running OCR after upload.
3. Progress: Stream each SSE line to the user as it arrives (normalize / ocr / structure / …).
4. After step=done: Always continue to Step 3 (review). Do not auto-call confirm; wait for user approval before Step 4.
Prerequisites — Auto-Auth (MANDATORY)
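    APICALL="bash ../caremax-auth/scripts/api-call.sh"
    UPLOAD="bash ../caremax-auth/scripts/upload.sh"
    OCRSTREAM="bash ../caremax-auth/scripts/ocr-stream.sh"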
If any script returns no_credentials → run bash ../caremax-auth/scripts/auth-flow.sh [base_url] (from this skill’s root, sibling of caremax-auth/).
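The auto-auth rule can be sketched as a small wrapper. This is only a minimal sketch under stated assumptions: run_with_auth and the AUTH_FLOW variable are hypothetical names, not part of caremax-auth. It also assumes the string no_credentials appears somewhere in the script's output; the document does not say whether it arrives on stdout, on stderr, or inside a JSON body. Because it buffers output, it suits upload and API calls rather than the streaming OCR script.

```shell
# Hypothetical wrapper: run a command, and if its output mentions
# no_credentials, run the auth flow once and retry the same command.
# Assumption: "no_credentials" appears in the command's stdout or stderr.
AUTH_FLOW=${AUTH_FLOW:-"bash ../caremax-auth/scripts/auth-flow.sh"}

run_with_auth() {
  out=$("$@" 2>&1)
  case "$out" in
    *no_credentials*)
      # Authenticate (auth output goes to stderr), then retry exactly once.
      $AUTH_FLOW >&2 || return 1
      out=$("$@" 2>&1)
      ;;
  esac
  printf '%s\n' "$out"
}

# Usage (with UPLOAD defined as in the prerequisites):
#   run_with_auth $UPLOAD /path/to/report.pdf
```

Retrying only once keeps the wrapper from looping forever if authentication keeps failing.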
Step 1: Upload (creates session)
    $UPLOAD /path/to/report1.jpg /path/to/report2.jpg /path/to/report.pdf
Returns:
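    {
      "session_id": "uuid-xxx",
      "member_id": "uuid-yyy",
      "files": [
        { "id": "file-1", "original_name": "report1.jpg" },
        { "id": "file-2", "original_name": "report2.jpg" },
        { "id": "file-3", "original_name": "report.pdf" }
      ]
    }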
Save the session_id.
Step 2: OCR with real-time progress
    $OCRSTREAM <session_id>
Outputs one JSON per line:
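    {"step":"resume","progress":1,"message":"从检查点恢复(上次完成:ocr)..."}
    {"step":"normalize","progress":5,"message":"加载文件 1/3..."}
    {"step":"ocr","progress":30,"message":"OCR 第2/3页:report2.jpg"}
    {"step":"ocr_retry","progress":35,"message":"重试OCR 第1/1页:report1.jpg"}
    {"step":"structure","progress":62,"message":"检测报告分组..."}
    {"step":"structure","progress":75,"message":"结构化报告 2/2..."}
    {"step":"normalize_indicators","progress":88,"message":"标准化中..."}
    {"step":"done","progress":100,"data":{"session_id":"...","reports":[...],"resumed":true}}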
Display progress to the user as each line arrives.
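One way to turn each streamed line into a short progress message is sketched below. It is an illustration, not the skill's own code: format_progress is a hypothetical helper, and the sed patterns assume each event is a single-line JSON object whose first key is step, optionally followed directly by a numeric progress key. A real JSON parser such as jq would be more robust if it is available.

```shell
# Hypothetical helper: print "[<progress>%] <step>" for one progress line.
# Assumption: "step" is the first key, and "progress" (if present) is the second.
format_progress() {
  step=$(printf '%s\n' "$1" |
    sed -n 's/^[^"]*"step"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  progress=$(printf '%s\n' "$1" |
    sed -n 's/^[^"]*"step"[^,]*,[[:space:]]*"progress"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
  if [ -z "$step" ]; then
    # Not a recognizable event line: pass it through unchanged.
    printf '%s\n' "$1"
    return 1
  fi
  # Events without a numeric progress field are shown with "-".
  printf '[%s%%] %s\n' "${progress:--}" "$step"
}

# Usage (with OCRSTREAM defined as in the prerequisites):
#   $OCRSTREAM "$session_id" | while IFS= read -r line; do format_progress "$line"; done
```

Reading with `while IFS= read -r` handles each line as soon as it arrives, which keeps the progress display real-time.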
Key progress events
| step | meaning |
|---|---|
| resume | Pipeline is resuming from a saved checkpoint (not starting from zero) |
| info | Informational message (e.g. which step was resumed from) |
| normalize | Loading and preprocessing files |
| ocr | OCR text extraction per page |
| ocr_retry | Retrying previously failed pages only |
| structure | AI analyzing and grouping reports |
| normalize_indicators | Standardizing indicator names |
| done | Complete; the data field contains the full results |
| error | Pipeline failed; check message for details |
If step=resume appears, tell the user: "正在从上次的进度继续处理(不需要重新开始)" (in English: "Continuing from where the last run left off; no need to start over.")
Error responses from $OCRSTREAM
| code | meaning | action |
|---|---|---|
| processing_in_progress | Another OCR run is still active | Wait and retry, or poll /status |
| ocr_limit_exceeded | Free OCR quota exhausted | Tell user to upgrade |
| (no code) | Pipeline error (LLM timeout etc.) | Retry; it will auto-resume from checkpoint |
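The error table can be expressed as a simple lookup, sketched below. ocr_error_action and the action names it prints are hypothetical. The quota code is spelled ocr_limit_exceeded here on the assumption that it follows the same underscore convention as processing_in_progress. The fallback for unrecognized codes is also an assumption, since the table does not cover that case.

```shell
# Hypothetical helper: map an error code from $OCRSTREAM to a next action.
# Pass an empty string when the error event carries no code.
ocr_error_action() {
  case "$1" in
    processing_in_progress) echo "poll_status" ;;  # another run is active
    ocr_limit_exceeded)     echo "ask_upgrade" ;;  # free quota exhausted
    "")                     echo "retry" ;;        # pipeline error: resumes from checkpoint
    *)                      echo "show_message" ;; # assumption: surface unknown codes to the user
  esac
}

# Usage:
#   action=$(ocr_error_action "$code")
```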
Step 2b: Poll status (when SSE disconnects)
If the SSE stream disconnects (network timeout, terminal closed), use the status endpoint to check progress:
    $APICALL GET "/api/skill/sessions/<session_id>/status"
Returns:
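    {
      "session_id": "uuid",
      "status": "processing",
      "pipeline": {
        "completedStep": "ocr",
        "pageCount": 5,
        "ocrCompleted": 4,
        "ocrFailed": 1,
        "reportCount": 0,
        "errors": [{"step":"ocr","pageIndex":2,"message":"PaddleOCR超时"}]
      },
      "error": null,
      "is_stale": false
    }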
Field guide:
- status = processing + is_stale = false → OCR is still running normally
- status = processing + is_stale = true → Worker crashed/timed out, safe to retry OCR
- status = awaiting_confirm → OCR completed! Fetch session detail for results
- status = uploading + error present → Last OCR attempt failed, retry will resume from checkpoint
- pipeline.completedStep → How far the pipeline got (normalize → ocr → structure → done)
- pipeline.ocrFailed → Number of pages that failed OCR (will be retried on next attempt)
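The status rules in this field guide amount to a small decision function, sketched below. next_action and the action names it prints are hypothetical; the caller is assumed to have already extracted status, is_stale, and error from the status response. The uploading-without-error branch follows the resume rule that a session still in uploading goes straight to OCR.

```shell
# Hypothetical helper: decide what to do from the status endpoint's fields.
#   $1 = status, $2 = is_stale ("true"/"false"), $3 = error ("" or "null" when absent)
next_action() {
  case "$1" in
    awaiting_confirm)
      echo "fetch_results" ;;
    processing)
      # A stale worker has crashed or timed out, so retrying is safe.
      if [ "$2" = "true" ]; then echo "retry_ocr"; else echo "wait"; fi ;;
    uploading)
      # With an error present the last attempt failed; a retry resumes from the checkpoint.
      if [ -n "$3" ] && [ "$3" != "null" ]; then echo "retry_ocr"; else echo "start_ocr"; fi ;;
    *)
      echo "unknown" ;;
  esac
}

# Usage:
#   next_action processing false null
```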
Polling workflow:
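    1. Call $OCRSTREAM → SSE disconnects midway
    2. Poll GET /sessions/<session_id>/status every 5-10 seconds
    3. When status = awaiting_confirm → fetch the full results with GET /sessions/<session_id>
    4. If status = uploading (failed) → retry with $OCRSTREAM (auto-resume)
    5. If is_stale = true → retry with $OCRSTREAM (auto-resumes from checkpoint)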
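A polling loop for the disconnect case might look like the sketch below. It is not the skill's own code: poll_until_settled, its arguments, and the attempt limit are hypothetical. The status fetcher is passed in as a command name so the loop can be tried without network access. The sed extraction assumes the response has a single top-level "status" key; a JSON parser such as jq would be safer in practice.

```shell
# Hypothetical helper: poll the status endpoint until the session settles.
#   $1 = session id
#   $2 = command or function that prints the status JSON for a session id
#   $3 = seconds between polls (default 5)
#   $4 = maximum attempts (default 60; the limit is an assumption)
poll_until_settled() {
  sid=$1; fetch=$2; interval=${3:-5}; max=${4:-60}
  i=0
  while [ "$i" -lt "$max" ]; do
    body=$($fetch "$sid") || return 2
    # Flatten pretty-printed JSON onto one line before matching.
    flat=$(printf '%s' "$body" | tr '\n' ' ')
    status=$(printf '%s\n' "$flat" |
      sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
    stale=$(printf '%s\n' "$flat" |
      sed -n 's/.*"is_stale"[[:space:]]*:[[:space:]]*\([a-z]*\).*/\1/p')
    case "$status:$stale" in
      awaiting_confirm:*)          echo "fetch_results"; return 0 ;;
      uploading:*|processing:true) echo "retry_ocr"; return 0 ;;
    esac
    sleep "$interval"
    i=$((i + 1))
  done
  echo "timeout"
  return 1
}

# Usage (with APICALL defined as in the prerequisites):
#   fetch_status() { $APICALL GET "/api/skill/sessions/$1/status"; }
#   poll_until_settled "$session_id" fetch_status 5
```

On retry_ocr the caller would rerun $OCRSTREAM with the same session id, which resumes from the saved checkpoint.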
Step 3: Review results (MANDATORY)
Parse the step=done data. Show formatted summary. Do NOT auto-confirm.
Each report has a reportType field: lab, genetic, imaging, pathology, or other.
Lab reports (reportType = "lab")
Show indicators table:
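    📋 报告 1: [lab] 尿生化 (编号: 114431194)
    日期: 2025-02-05 医生: 俞海瑾
    指标: 12 个 (3 个异常)
    ┌──────────────────────┬────────┬──────────┬────────────┬──────┐
    │ 指标 │ 结果 │ 单位 │ 参考范围 │ 异常 │
    ├──────────────────────┼────────┼──────────┼────────────┼──────┤
    │ 24H尿钠 │ 130.0 │ mmol/24h │ 137-257 │ ⬇ │
    └──────────────────────┴────────┴──────────┴────────────┴──────┘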
Non-lab reports (reportType = "genetic" / "imaging" / etc.)
Show summary + sections:
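    📋 报告 1: [genetic] 基因检测报告
    日期: 2025-09-12 检测机构: 南京申友医学检验所
    摘要: 心血管18项基因检测...高血压、冠心病风险一般...
    段落: 18 sections
    [gene_variant] 高血压 — 风险: 正常
    [gene_variant] 冠心病 — 风险: 一般
    [medication] ACEI类降压药 — 正常代谢型
    ...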
Supported file types
- Images (JPG/PNG/HEIC): PaddleOCR → structure
- PDF (any size): Azure Mistral Document AI page-split → structure
- Large PDFs (e.g. 23-page gene report, 9.6MB) are fully supported
Step 4: Confirm and save
After user confirms:
CODEBLOCK10
Returns: INLINECODE44
Resuming incomplete sessions
When the user asks to continue/resume a previous upload, or when checking for unfinished work:
Step A: Find pending sessions
CODEBLOCK11
Show a summary of pending sessions to the user (file names, dates, status).
Step B: Resume based on status
- uploading: Start OCR directly → go to Step 2 ($OCRSTREAM <session_id>)
  - If there's a saved checkpoint (previous failed attempt), OCR auto-resumes from it
- processing: Check with status endpoint first: $APICALL GET "/api/skill/sessions/<session_id>/status"
  - is_stale = false → still running, wait or poll
  - is_stale = true → worker died, safe to retry: $OCRSTREAM <session_id> (auto-resumes from checkpoint)
- awaiting_confirm: Get session detail → show results → go to Step 3 (review & confirm)
CODEBLOCK13
If the session is awaiting_confirm, the response includes ocr_result with the previously parsed reports. Display them for review (Step 3), then confirm (Step 4) only after the user approves.
Resume-aware response handling
When $OCRSTREAM outputs step=done:
- resumed = true in the data → tell user: "已从上次的进度恢复,OCR 结果已就绪" (in English: "Resumed from the previous progress; the OCR results are ready.")
- resumed = false (or absent) → normal fresh run
When $OCRSTREAM outputs step=error:
- code = processing_in_progress → tell user OCR is still running, poll /status instead
- code = ocr_limit_exceeded → tell user to upgrade
- No code → LLM/network error, safe to retry (will auto-resume from checkpoint)
Step C: Delete individual reports or stale sessions
Delete a single report (does NOT affect other reports in the same session):
CODEBLOCK14
Delete an entire session (cascade deletes ALL files + reports):
CODEBLOCK15
Other session operations
CODEBLOCK16
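CareMax Upload & OCR
Requires caremax-auth as a sibling directory (../caremax-auth/). If it is missing, tell the user to install caremax-auth first (e.g. npx skills add KittenYang/caremax-skills).
Upload medical report files (PDF, JPG, PNG, HEIC) and extract structured data via AI-powered OCR.
Session-based workflow: upload → OCR → review → confirm. All operations happen within a single session.
Checkpoint & resume: Every pipeline step saves its progress to the database. If OCR fails midway (LLM timeout, worker crash, network error), a retry automatically resumes from the last checkpoint, so no work is lost.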
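Agent default behavior (MANDATORY)
1. Upload and OCR are one continuous workflow. When the user uploads report files (or asks you to upload, scan, or recognize a medical checkup report, etc.), you must run $OCRSTREAM <session_id> in the same turn, using the session_id returned by a successful $UPLOAD. Do not end the task after running upload.sh alone.
2. Upload-only exception: Skip immediate OCR only when the user explicitly asks to upload without recognition (e.g. 只上传 "upload only", 不要识别 "don't recognize", 别跑OCR "don't run OCR", 只存文件 "just store the files"). If unsure, default to running OCR after upload.
3. Progress: Stream each SSE line to the user in real time (normalize / ocr / structure / ...).
4. After step=done: Always continue to Step 3 (review). Do not call confirm automatically; wait for user approval before Step 4.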
Prerequisites: Auto-Auth (MANDATORY)
    APICALL="bash ../caremax-auth/scripts/api-call.sh"
    UPLOAD="bash ../caremax-auth/scripts/upload.sh"
    OCRSTREAM="bash ../caremax-auth/scripts/ocr-stream.sh"
If any script returns no_credentials → run bash ../caremax-auth/scripts/auth-flow.sh [base_url] (from this skill's root directory, a sibling of caremax-auth/).
Step 1: Upload (creates session)
    $UPLOAD /path/to/report1.jpg /path/to/report2.jpg /path/to/report.pdf
Returns:
    {
      "session_id": "uuid-xxx",
      "member_id": "uuid-yyy",
      "files": [
        { "id": "file-1", "original_name": "report1.jpg" },
        { "id": "file-2", "original_name": "report2.jpg" },
        { "id": "file-3", "original_name": "report.pdf" }
      ]
    }
Save the session_id.
Step 2: OCR with real-time progress
    $OCRSTREAM <session_id>
Outputs one JSON object per line:
    {"step":"resume","progress":1,"message":"从检查点恢复(上次完成:ocr)..."}
    {"step":"normalize","progress":5,"message":"加载文件 1/3..."}
    {"step":"ocr","progress":30,"message":"OCR 第2/3页:report2.jpg"}
    {"step":"ocr_retry","progress":35,"message":"重试OCR 第1/1页:report1.jpg"}
    {"step":"structure","progress":62,"message":"检测报告分组..."}
    {"step":"structure","progress":75,"message":"结构化报告 2/2..."}
    {"step":"normalize_indicators","progress":88,"message":"标准化中..."}
    {"step":"done","progress":100,"data":{"session_id":"...","reports":[...],"resumed":true}}
Display progress to the user as each line arrives.
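Key progress events
| step | meaning |
|---|---|
| resume | Pipeline is resuming from a saved checkpoint (not starting from zero) |
| info | Informational message (e.g. which step was resumed from) |
| normalize | Loading and preprocessing files |
| ocr | OCR text extraction per page |
| ocr_retry | Retrying previously failed pages only |
| structure | AI analyzing and grouping reports |
| normalize_indicators | Standardizing indicator names |
| done | Complete; the data field contains the full results |
| error | Pipeline failed; check message for details |
If step=resume appears, tell the user: "正在从上次的进度继续处理(不需要重新开始)" (in English: "Continuing from where the last run left off; no need to start over.")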
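Error responses from $OCRSTREAM
| code | meaning | action |
|---|---|---|
| processing_in_progress | Another OCR run is still active | Wait and retry, or poll /status |
| ocr_limit_exceeded | Free OCR quota exhausted | Tell user to upgrade |
| (no code) | Pipeline error (LLM timeout etc.) | Retry; it will auto-resume from checkpoint |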
Step 2b: Poll status (when SSE disconnects)
If the SSE stream disconnects (network timeout, terminal closed), use the status endpoint to check progress:
    $APICALL GET "/api/skill/sessions/<session_id>/status"
Returns:
    {
      "session_id": "uuid",
      "status": "processing",
      "pipeline": {
        "completedStep": "ocr",
        "pageCount": 5,
        "ocrCompleted": 4,
        "ocrFailed": 1,
        "reportCount": 0,
        "errors": [{"step":"ocr","pageIndex":2,"message":"PaddleOCR超时"}]
      },
      "error": null,
      "is_stale": false
    }
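Field guide:
- status = processing + is_stale = false → OCR is still running normally
- status = processing + is_stale = true → Worker crashed/timed out, safe to retry OCR
- status = awaiting_confirm → OCR completed! Fetch session detail for results
- status = uploading + error present → Last OCR attempt failed, retry will resume from checkpoint
- pipeline.completedStep → How far the pipeline got (normalize → ocr → structure → done)
- pipeline.ocrFailed → Number of pages that failed OCR (will be retried on next attempt)
Polling workflow:
1. Call $OCRSTREAM → SSE disconnects midway
2. Poll GET /sessions/<session_id>/status every 5-10 seconds
3. When status = awaiting_confirm → fetch the full results with GET /sessions/<session_id>
4. If status = uploading (failed) → retry with $OCRSTREAM (auto-resume)
5. If is_stale = true → retry with $OCRSTREAM (auto-resumes from checkpoint)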
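Step 3: Review results (MANDATORY)
Parse the step=done data. Show a formatted summary. Do NOT auto-confirm.
Each report has a reportType field: lab, genetic, imaging, pathology, or other.
Lab reports (reportType = "lab")
Show indicators table: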
📋 报告 1: [lab] 尿生化 (编号: 114431194)
日期: 2025-02-05 医生: 俞海瑾
指标: 12 个 (3 个异常)
┌──────────────────────┬────────┬──────────┬────────────┬──────┐
│ 指标 │ 结果 │ 单位 │ 参考范围 │ 异常 │
├──────────────────────┼────────┼──────────┼────────────┼──────┤
│ 24H尿钠 │ 130.0 │ mmol/24h │ 137-257 │ ⬇ │
└──────────────────────┴────────┴──────────┴────────────┴──────┘
Non-lab reports (reportType = "genetic" / "imaging" / etc.)
Show summary + sections:
📋 报告 1: [genetic] 基因检测报告
日期: 2025-09-12 检测机构: 南京申友医学检验所
摘要: 心血管18项基因检测...高血压、冠心病风险一般...
段落: 18 sections
[gene_variant] 高血压 — 风险: 正常
[gene_variant] 冠心病 — 风险: 一般
[medication] ACEI类降压药 — 正常代谢型
...
Supported file types
- Images (JPG/PNG/HEIC): PaddleOCR → structure
- PDF (any size): Azure Mistral Document AI page-split → structure
- Large PDFs (e.g. a 23-page gene report, 9.6MB) are fully supported