imsg-media

Full iMessage multimedia pipeline:

- 🎙️ Voice memo → text via Silicon Flow ASR (SenseVoiceSmall; cloud-hosted, no local model required)
- 🖼️ Image → description/OCR via the agent's built-in vision model

Requirements

macOS permissions

- Full Disk Access must be granted to the process running OpenClaw
  - Settings → Privacy & Security → Full Disk Access → add your terminal/app
  - Without this, imsg cannot read ~/Library/Messages/chat.db and will return permissionDenied
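
One way to confirm the permission before invoking the skill is to try opening the database for reading. This is a minimal sketch only; the helper name `can_read_chat_db` is hypothetical and not part of imsg or OpenClaw.

```python
from pathlib import Path

# Path taken from the requirement above.
CHAT_DB = Path.home() / "Library" / "Messages" / "chat.db"


def can_read_chat_db(path: Path = CHAT_DB) -> bool:
    """Return True if the current process can open the file for reading."""
    try:
        with open(path, "rb") as fh:
            fh.read(1)
        return True
    except OSError:
        # Covers PermissionError (e.g. no Full Disk Access) and a missing file.
        return False
```

If `can_read_chat_db()` returns False on a Mac where Messages has been used, the likely cause is that the terminal or app has not been added under Full Disk Access.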

API key (audio only)

- Silicon Flow API key: sign up free at https://siliconflow.cn
  - Long-term use: add to ~/.openclaw/.env: SILICONFLOWKEY=sk-...
  - Quick test / override: pass --api-key sk-... directly to the script
  - Image analysis does not require this key
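
The key lookup described above can be sketched as follows. This is an illustration rather than the script's actual logic: the function names are hypothetical, and the precedence between the process environment and the .env file is an assumption (the document only states that --api-key overrides).

```python
import os
from pathlib import Path
from typing import Optional

ENV_FILE = Path.home() / ".openclaw" / ".env"
KEY_NAME = "SILICONFLOWKEY"  # variable name as written in this document


def read_env_file(path: Path) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(cli_key: Optional[str] = None,
                    env_file: Path = ENV_FILE) -> Optional[str]:
    """Assumed precedence: --api-key, then environment, then the .env file."""
    return (cli_key
            or os.environ.get(KEY_NAME)
            or read_env_file(env_file).get(KEY_NAME))
```

A return value of None would correspond to the "key not set" case, which only matters for audio; image analysis never needs the key.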

CLI dependency

- imsg CLI: npm install -g imsg

Trigger conditions

Activate this skill when:

- Incoming message text contains the attachment placeholder
- User says "语音转文字", "转写", "识别语音", "transcribe"
- User says "看图", "识别图片", "读图", "OCR", "截图里写的什么"
- User references a photo/audio/file they just sent via iMessage
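
The keyword triggers can be approximated with a simple substring check. This is a rough sketch, not how the agent actually decides; `detect_intent` is a hypothetical helper, and the keyword lists are copied from the conditions above.

```python
from typing import Optional

# Keywords copied from the trigger conditions; matching is case-insensitive.
TRANSCRIBE_KEYWORDS = ("语音转文字", "转写", "识别语音", "transcribe")
IMAGE_KEYWORDS = ("看图", "识别图片", "读图", "ocr", "截图里写的什么")


def detect_intent(text: str) -> Optional[str]:
    """Return "audio", "image", or None based on trigger keywords."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in TRANSCRIBE_KEYWORDS):
        return "audio"
    if any(keyword in lowered for keyword in IMAGE_KEYWORDS):
        return "image"
    return None
```

Substring matching is deliberately crude (for example, "ocr" also matches inside "mediocre"), so a real trigger would likely combine it with the presence of an attachment.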

Decision flow

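    Attachment detected?
    ├── Audio (.m4a / .caf / .wav / .mp3) → transcribe via Silicon Flow ASR
    ├── Image (.jpg / .png / .heic / .gif) → read with the vision model
    └── Unknown / not downloaded → increase --limit or ask the user to resend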
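
Routing by file extension could be sketched like this. The function name and return labels are hypothetical; the extension sets follow the formats this skill lists as supported.

```python
from pathlib import Path
from typing import Optional

AUDIO_EXTS = {".m4a", ".caf", ".wav", ".mp3"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".heic", ".heif", ".gif"}


def route_attachment(file_path: Optional[str]) -> str:
    """Map an attachment path to "transcribe", "vision", or "retry"."""
    if not file_path:
        # Nothing downloaded yet: raise --limit or ask the user to resend.
        return "retry"
    ext = Path(file_path).suffix.lower()
    if ext in AUDIO_EXTS:
        return "transcribe"
    if ext in IMAGE_EXTS:
        return "vision"
    return "retry"
```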

Workflow

Step 1 — Get the sender identifier

Always read from the message envelope:

- [iMessage sender@example.com ...] → use sender@example.com
- [SMS +1234567890 ...] → use +1234567890
- Never hardcode an address
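
Extracting the identifier from the envelope can be done with a small regular expression. This sketch assumes the envelope always starts with `[iMessage ` or `[SMS ` followed by the address, as in the two examples; `extract_identifier` is a hypothetical helper.

```python
import re
from typing import Optional

# Channel name, whitespace, then the identifier up to the next space or "]".
ENVELOPE_RE = re.compile(r"^\[(iMessage|SMS)\s+([^\s\]]+)")


def extract_identifier(envelope: str) -> Optional[str]:
    """Return the sender address or phone number, or None if no envelope."""
    match = ENVELOPE_RE.match(envelope.strip())
    return match.group(2) if match else None
```

Returning None rather than a default keeps the "never hardcode an address" rule intact: the caller has to handle the missing-envelope case explicitly.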

Step 2 — Fetch the attachment

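    # Run from the skill directory
    cd ~/.openclaw/skills/imsg-voice-transcribe

    python3 scripts/imsgvoicetranscribe.py fetch \
      --identifier sender@example.com \
      --limit 50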

Returns JSON with file, type (audio or image), and metadata.

If nothing found, try --limit 100.
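
The fetch-then-widen pattern could be wrapped as below. This is a sketch under stated assumptions: that the script prints a single JSON object on stdout, and that "nothing found" shows up as a non-zero exit code or a result without a `file` field. Neither is confirmed by this document, and `run_fetch` and `fetch_with_retry` are hypothetical names.

```python
import json
import subprocess
from typing import Callable, Optional, Sequence

SCRIPT = "scripts/imsgvoicetranscribe.py"  # path as given in this document


def run_fetch(identifier: str, limit: int) -> Optional[dict]:
    """Run the fetch subcommand and parse stdout as JSON (assumed shape)."""
    cmd = ["python3", SCRIPT, "fetch",
           "--identifier", identifier, "--limit", str(limit)]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        return None
    try:
        data = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return None
    # Assumption: a successful fetch carries a non-empty "file" field.
    return data if isinstance(data, dict) and data.get("file") else None


def fetch_with_retry(identifier: str,
                     limits: Sequence[int] = (50, 100),
                     fetch: Callable[[str, int], Optional[dict]] = run_fetch
                     ) -> Optional[dict]:
    """Try each limit in order and return the first successful result."""
    for limit in limits:
        result = fetch(identifier, limit)
        if result is not None:
            return result
    return None
```

The `fetch` parameter is injectable so the retry logic can be exercised without the real script or a Messages database.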

Step 3a — Audio: transcribe

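    # One-liner (fetch + transcribe)
    python3 scripts/imsgvoicetranscribe.py auto \
      --identifier sender@example.com \
      --limit 50 --raw

    # Or transcribe a specific file
    python3 scripts/imsgvoicetranscribe.py transcribe \
      --file /path/to/audio.m4a --raw

    # Quick test with an explicit API key (no env var needed)
    python3 scripts/imsgvoicetranscribe.py transcribe \
      --file /path/to/audio.m4a --api-key sk-... --raw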
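
When the agent assembles these invocations programmatically, building the argument list explicitly avoids shell-quoting problems. The helper below is a hypothetical sketch; it uses only the subcommands and flags named in this skill (auto, transcribe, --identifier, --file, --limit, --api-key, --raw), and it assumes that auto also accepts --api-key.

```python
from typing import List, Optional

SCRIPT = "scripts/imsgvoicetranscribe.py"  # path as given in this document


def transcribe_argv(identifier: Optional[str] = None,
                    file: Optional[str] = None,
                    api_key: Optional[str] = None,
                    limit: int = 50) -> List[str]:
    """Build argv for a known file (transcribe) or a sender (auto)."""
    if file:
        argv = ["python3", SCRIPT, "transcribe", "--file", file]
    elif identifier:
        argv = ["python3", SCRIPT, "auto",
                "--identifier", identifier, "--limit", str(limit)]
    else:
        raise ValueError("either file or identifier is required")
    if api_key:
        # Assumption: both subcommands accept --api-key.
        argv += ["--api-key", api_key]
    argv.append("--raw")
    return argv
```

The resulting list can be passed to `subprocess.run(argv, capture_output=True, text=True)` from the skill directory.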

Step 3b — Image: analyze

After fetch returns an image path (e.g. {"file": "/path/to/photo.jpg", "type": "image"}):

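    # Example: fetch an image from the sender
    python3 scripts/imsgvoicetranscribe.py fetch \
      --identifier sender@example.com --type image --limit 50
    # → {"file": "/Users/.../Messages/Attachments/photo.jpg", "type": "image", ...}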

Then in the agent:

1. If HEIC/HEIF: convert first → sips -s format png input.heic --out output.png
2. Open with the read tool → agent vision model processes it
3. Respond with: what it is, main subject, any text/OCR, notable details
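
The HEIC/HEIF check and the matching sips command could be generated as follows. This is a sketch: `sips_command` is a hypothetical helper, and writing the PNG next to the source file is an assumption (a temporary directory may be preferable to writing inside the Messages attachments folder).

```python
from pathlib import Path
from typing import List, Optional

HEIC_EXTS = {".heic", ".heif"}


def sips_command(src: str) -> Optional[List[str]]:
    """Return the sips argv converting src to PNG, or None if not HEIC/HEIF."""
    path = Path(src)
    if path.suffix.lower() not in HEIC_EXTS:
        return None
    out = path.with_suffix(".png")  # assumption: output beside the source
    return ["sips", "-s", "format", "png", str(path), "--out", str(out)]
```

On macOS the returned list can be run with `subprocess.run(cmd, check=True)`; a None result means the image can be opened with the read tool directly.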

Default image response format:

- What it is: photo / screenshot / document
- Main subject: 1–2 sentences
- Text (OCR): quote key text, or "无明显文字"
- Details: 3–5 bullets
- Follow-up: ask if they want OCR / table extraction / comparison / etc.
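
For illustration, the default reply layout can be expressed as a small formatter. `format_image_reply` and its default follow-up wording are hypothetical; only the five field labels come from the format above.

```python
from typing import Optional, Sequence


def format_image_reply(kind: str,
                       subject: str,
                       details: Sequence[str],
                       ocr_text: Optional[str] = None,
                       follow_up: str = "Want full OCR, table extraction, "
                                        "or a comparison?") -> str:
    """Render the five-part image reply as plain text."""
    lines = [
        f"What it is: {kind}",
        f"Main subject: {subject}",
        # Fall back to the fixed phrase when no text is visible.
        f"Text (OCR): {ocr_text or '无明显文字'}",
        "Details:",
    ]
    lines += [f"- {detail}" for detail in details]
    lines.append(f"Follow-up: {follow_up}")
    return "\n".join(lines)
```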

Supported formats

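| Format | Type | Notes |
|---|---|---|
| .m4a | Audio | Standard iMessage voice memo |
| .caf | Audio | Older iOS voice memos (AAC in CAF) |
| .wav .mp3 | Audio | Other sources |
| .jpg .jpeg .png | Image | Standard photos |
| .heic .heif | Image | iPhone default format; convert to PNG first |
| .gif | Image | Animated or static |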

Troubleshooting

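| Error | Cause | Fix |
|---|---|---|
| permissionDenied | No Full Disk Access | Grant FDA in System Settings |
| SILICONFLOWKEY not set | No Silicon Flow API key configured | Add SILICONFLOWKEY=sk-... to ~/.openclaw/.env, or pass --api-key sk-... |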
