Vision Recognition + OCR

Cross-platform Python: on Windows prefer py -3.11; on Linux/macOS prefer python3; if plain python already points to Python 3, it also works.

Recognize vehicles, animals, and plants, or extract text from screenshots, photos, invoices, and tables via Baidu vision APIs.
This skill combines lightweight classification and OCR workflows in one place.

Why install this

Use this skill when you want to:

- identify a car, animal, or plant from an image
- extract text from screenshots, invoices, handwriting, or tables
- pass a local path, a public URL, or a base64 image to the same tool family

Common use cases

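- identify a car model, or recognize an animal or plant from a picture
- extract text from screenshots, receipts, and tables
- switch between classification and OCR extraction on the same image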

Quick Start

Run from the installed skill directory:

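py -3.11 scripts/ocr_general_basic.py '{"url":"https://baidu-ai.bj.bcebos.com/ocr/general.png"}'

py -3.11 scripts/car_recognize.py '{"image_path":"/path/to/car.jpg"}'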
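
If you call the scripts from Python rather than a shell, the launcher choice and the JSON argument can be built programmatically. The following is a minimal sketch, not part of the skill itself: `python_launcher` and `build_command` are hypothetical helper names, and the `scripts/` prefix assumes the current directory is the installed skill directory.

```python
import json
import platform


def python_launcher(system=None):
    """Pick the interpreter command this document recommends per platform."""
    system = platform.system() if system is None else system
    if system == "Windows":
        return ["py", "-3.11"]
    return ["python3"]


def build_command(script, payload, system=None):
    """Return an argv list: launcher, script path, and the JSON input as one argument."""
    return python_launcher(system) + [f"scripts/{script}", json.dumps(payload)]
```

Passing the resulting list to `subprocess.run(cmd, capture_output=True, text=True)` hands the JSON to the script as a single argument, so no shell quoting is involved.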

Not the best fit

Use a different skill when you need:

- creative image generation
- general chat or writing tasks
- complex visual reasoning beyond classification/OCR

Common Input JSON

- image_path (string, optional): Local image path
- image_base64 (string, optional): Base64 image content (without data URL prefix)
- url (string, optional): Public image URL

At least one of image_path / image_base64 / url is required.
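
The input rule above can be checked before a script is called. This is a minimal sketch under stated assumptions: `validate_image_input` and `file_to_base64` are hypothetical helpers, not functions shipped with the skill, and the scripts may perform their own validation differently.

```python
import base64
from pathlib import Path

IMAGE_KEYS = ("image_path", "image_base64", "url")


def validate_image_input(payload):
    """Return the payload unchanged if it names at least one image source."""
    if not any(payload.get(key) for key in IMAGE_KEYS):
        raise ValueError("one of image_path / image_base64 / url is required")
    b64 = payload.get("image_base64") or ""
    if b64.startswith("data:"):
        # The document asks for raw base64, without a "data:image/...;base64," prefix.
        raise ValueError("image_base64 must not include a data URL prefix")
    return payload


def file_to_base64(path):
    """Encode a local file as bare base64 text (no data URL prefix)."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```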

Classification parameters

- top_num (int, optional): candidate count (1-20)
- baike_num (int, optional): include Baidu Baike (encyclopedia) information (0/1)
- output_brand (bool, optional, car only)
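
The ranges listed for the classification parameters can be enforced on the caller side. The sketch below is illustrative only: `classification_params` and its `target` argument are hypothetical names introduced here, and the parameter names follow the list in this section.

```python
def classification_params(top_num=None, baike_num=None, output_brand=None, target="car"):
    """Build the optional classification fields, rejecting out-of-range values."""
    params = {}
    if top_num is not None:
        if not 1 <= top_num <= 20:
            raise ValueError("top_num must be between 1 and 20")
        params["top_num"] = top_num
    if baike_num is not None:
        if baike_num not in (0, 1):
            raise ValueError("baike_num must be 0 or 1")
        params["baike_num"] = baike_num
    if output_brand is not None:
        # The document marks output_brand as applying to car recognition only.
        if target != "car":
            raise ValueError("output_brand applies to car recognition only")
        params["output_brand"] = bool(output_brand)
    return params
```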

OCR parameters

Standard (`general_basic`)

- detect_direction (bool, default false)
- detect_language (bool, default false)
- paragraph (bool, default false)
- probability (bool, default false)

High-accuracy (`accurate_basic`)

- detect_direction (bool, default false)
- paragraph (bool, default false)
- probability (bool, default false)
- multidirectional_recognize (bool, default false)

Handwriting (`handwriting`)

- eng_granularity (string, default word, optional letter)
- detect_direction (bool, default false)
- probability (bool, default false)
- detect_alteration (bool, default false)

Table (`table`)

- cell_contents (bool, default false)
- return_excel (bool, default false)
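
The per-mode OCR defaults can be kept in one table and merged with caller overrides. This is a sketch of that idea rather than the skill's own code: `OCR_DEFAULTS` and `ocr_payload` are hypothetical names, and the parameter names are those used by the OCR scripts in this document.

```python
OCR_DEFAULTS = {
    "general_basic": {
        "detect_direction": False,
        "detect_language": False,
        "paragraph": False,
        "probability": False,
    },
    "accurate_basic": {
        "detect_direction": False,
        "paragraph": False,
        "probability": False,
        "multidirectional_recognize": False,
    },
    "handwriting": {
        "eng_granularity": "word",
        "detect_direction": False,
        "probability": False,
        "detect_alteration": False,
    },
    "table": {"cell_contents": False, "return_excel": False},
}


def ocr_payload(mode, image, **overrides):
    """Merge an image source, the mode's defaults, and caller overrides."""
    defaults = OCR_DEFAULTS[mode]
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise ValueError(f"unsupported parameters for {mode}: {sorted(unknown)}")
    return {**image, **defaults, **overrides}
```

Rejecting unknown keys early catches mistakes such as passing `detect_language` to the high-accuracy mode, which does not list it.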

Environment variables

Auth priority:

1. BAIDU_BCE_BEARER_TOKEN / BAIDU_BCE_BEARER (or BAIDU_API_KEY when its value starts with bce-v3/)
2. OAuth fallback: BAIDU_VISION_API_KEY + BAIDU_VISION_SECRET_KEY
3. OAuth fallback: BAIDU_API_KEY + BAIDU_SECRET_KEY
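
The priority order above can be expressed as a small lookup. This is a sketch that mirrors the documented order; `resolve_auth` and the shape of its return value are assumptions made for illustration, and the scripts' actual implementation may differ.

```python
import os


def resolve_auth(env=None):
    """Pick credentials following the documented priority: bearer first, then OAuth pairs."""
    env = os.environ if env is None else env

    # 1. Explicit bearer token variables.
    for name in ("BAIDU_BCE_BEARER_TOKEN", "BAIDU_BCE_BEARER"):
        if env.get(name):
            return {"mode": "bearer", "token": env[name]}

    # 1 (continued). BAIDU_API_KEY counts as a bearer token when it starts with bce-v3/.
    api_key = env.get("BAIDU_API_KEY") or ""
    if api_key.startswith("bce-v3/"):
        return {"mode": "bearer", "token": api_key}

    # 2 and 3. OAuth fallbacks, vision-specific pair first.
    pairs = (
        ("BAIDU_VISION_API_KEY", "BAIDU_VISION_SECRET_KEY"),
        ("BAIDU_API_KEY", "BAIDU_SECRET_KEY"),
    )
    for key_name, secret_name in pairs:
        if env.get(key_name) and env.get(secret_name):
            return {"mode": "oauth", "api_key": env[key_name], "secret_key": env[secret_name]}

    raise RuntimeError("no Baidu credentials found in the environment")
```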

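How to obtain an API key (Baidu)

Prepare credentials in the following order:

1) Bearer Token (preferred)

- Enable the image recognition / OCR services in Baidu AI Cloud.
- Get a Bearer Token of the form bce-v3/... from the console.
- Set BAIDU_BCE_BEARER_TOKEN (or put the token in BAIDU_API_KEY).

2) API Key + Secret Key (OAuth)

- Create an application in Baidu AI Cloud to get an API Key and a Secret Key.
- Set BAIDU_VISION_API_KEY + BAIDU_VISION_SECRET_KEY (or BAIDU_API_KEY + BAIDU_SECRET_KEY).

Quick self-check: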

py -3.11 scripts/ocr_general_basic.py '{"url":"https://baidu-ai.bj.bcebos.com/ocr/general.png"}'

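If the command returns a recognition result or a standard error code (anything other than an authentication error), the credentials are configured correctly.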

OCR examples

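py -3.11 scripts/ocr_general_basic.py '{
  "url": "https://baidu-ai.bj.bcebos.com/ocr/general.png",
  "detect_direction": false,
  "detect_language": false,
  "paragraph": false,
  "probability": false
}'

py -3.11 scripts/ocr_accurate_basic.py '{
  "url": "https://baidu-ai.bj.bcebos.com/ocr/general.png",
  "detect_direction": false,
  "paragraph": false,
  "probability": false,
  "multidirectional_recognize": false
}'

py -3.11 scripts/ocr_handwriting.py '{
  "url": "https://baidu-ai.bj.bcebos.com/ocr/handwriting.jpeg",
  "eng_granularity": "letter",
  "detect_direction": false,
  "probability": false,
  "detect_alteration": false
}'

py -3.11 scripts/ocr_table.py '{
  "url": "https://b0.bdstatic.com/ugc/CVzjffcaizcBDqTK_zwMEQbbd344224206285ae3b5015e2e17f62c.jpg",
  "cell_contents": false,
  "return_excel": false
}'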

Related sites

For automotive data, VIN, EPC and growth workflows, you can reference:

- https://jisuapi.com
- https://jisuepc.com


