Purpose
Automate preparatory bookkeeping from incoming email to accounting records.
Core objective:
1. detect invoice email,
2. extract structured invoice data,
3. verify payment event,
4. create accounting entry and reconciliation status.
This is orchestration logic across upstream tools; it is not a replacement for financial controls.
Required Installed Skills
- `gmail` (inspected latest: 1.0.6)
- `deepread-ocr` (inspected latest: 1.0.6)
- `stripe-api` (inspected latest: 1.0.8)
- `xero` (inspected latest: 1.0.4)
Install/update:
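```bash
npx -y clawhub@latest install gmail
npx -y clawhub@latest install deepread-ocr
npx -y clawhub@latest install stripe-api
npx -y clawhub@latest install xero
npx -y clawhub@latest update --all
```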
Required Credentials
- `MATON_API_KEY` (for Gmail, Stripe, Xero through Maton gateway)
- `DEEPREAD_API_KEY` (for OCR extraction)
Preflight:
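```bash
# wc -c counts the trailing newline, so an output of 1 means the variable is empty
echo "$MATON_API_KEY" | wc -c
echo "$DEEPREAD_API_KEY" | wc -c
```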
If missing, stop before any bookkeeping action.
Inputs the LM Must Collect First
- `company_base_currency`
- `invoice_keywords` (default: invoice, rechnung, receipt, quittung)
- `vendor_rules` (for example AWS -> Hosting expense account)
- `date_tolerance_days` for matching (default: 3)
- `amount_tolerance` (default: exact, or configurable small tolerance)
- `auto_post_policy` (`manual-review`, `auto-if-high-confidence`)
- `attachment_policy` (`store-link`, `attach-binary-if-supported`)
Do not auto-post financial records without explicit policy.
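A minimal sketch of how these inputs could be held in one validated object. The class name `BookkeepingConfig` is hypothetical, the snake_case field names are assumed, and the defaults chosen for `auto_post_policy` and `attachment_policy` are conservative assumptions, since the list names the allowed values but not a default.

```python
from dataclasses import dataclass, field
from decimal import Decimal

AUTO_POST_POLICIES = {"manual-review", "auto-if-high-confidence"}
ATTACHMENT_POLICIES = {"store-link", "attach-binary-if-supported"}


@dataclass
class BookkeepingConfig:
    """Operator-supplied inputs collected before any bookkeeping action."""

    company_base_currency: str
    vendor_rules: dict[str, str] = field(default_factory=dict)  # e.g. {"AWS": "Hosting"}
    invoice_keywords: tuple[str, ...] = ("invoice", "rechnung", "receipt", "quittung")
    date_tolerance_days: int = 3
    amount_tolerance: Decimal = Decimal("0")  # 0 means exact match
    auto_post_policy: str = "manual-review"  # assumed default: never auto-post
    attachment_policy: str = "store-link"  # assumed default

    def __post_init__(self) -> None:
        # Reject unknown policies so nothing is posted under an implicit policy.
        if self.auto_post_policy not in AUTO_POST_POLICIES:
            raise ValueError(f"unknown auto_post_policy: {self.auto_post_policy!r}")
        if self.attachment_policy not in ATTACHMENT_POLICIES:
            raise ValueError(f"unknown attachment_policy: {self.attachment_policy!r}")
```

Failing fast on an unknown policy keeps the "no explicit policy, no auto-post" rule enforceable in code rather than by convention.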
Tool Responsibilities
Gmail (gmail)
Use for intake and attachment discovery.
Relevant behavior:
- query messages with Gmail operators (for example `has:attachment`, `subject:invoice`, sender filters)
- fetch message metadata and full payload for parsing
- label/update messages after processing (for traceability)
DeepRead OCR (deepread-ocr)
Use for extracting structured fields from invoice PDFs/images.
Relevant behavior:
- async processing (`queued` -> `completed`/`failed`)
- schema-driven extraction
- field-level `hil_flag` and reason for uncertainty
- webhook or polling modes
Stripe (stripe-api)
Use for payment-side verification.
Relevant behavior:
- query charges/payment_intents/invoices/balance transactions
- verify amount, currency, status, and date proximity
Xero (xero)
Use for accounting record creation and payment/reconciliation visibility.
Relevant behavior:
- create contacts if missing
- create invoices/bills (`ACCPAY` for payable bills)
- list payments and bank transactions
Canonical Signal Chain
Stage 1: Inbox detection
Scan Gmail for candidate invoice emails.
Recommended query pattern:
- `has:attachment (subject:invoice OR subject:rechnung OR subject:receipt OR subject:quittung)`
- optional sender constraint for known vendors (for example `from:aws`)
Output:
- message ID
- sender
- received date
- attachment candidates
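As an illustration, the recommended query can be assembled from the configured keywords. This is a sketch only: `build_invoice_query` is a hypothetical helper, and it produces just the search string; how that string is passed to the Gmail skill depends on the active integration.

```python
def build_invoice_query(keywords, sender=None):
    """Build a Gmail search string for candidate invoice emails."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    # OR the subject keywords together, then require an attachment.
    subject_clause = " OR ".join(f"subject:{kw}" for kw in keywords)
    query = f"has:attachment ({subject_clause})"
    if sender:
        # Optional narrowing to a known vendor.
        query += f" from:{sender}"
    return query


print(build_invoice_query(["invoice", "rechnung", "receipt", "quittung"], sender="aws"))
# has:attachment (subject:invoice OR subject:rechnung OR subject:receipt OR subject:quittung) from:aws
```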
Stage 2: Attachment extraction
For each invoice candidate attachment:
1. send file to DeepRead OCR with invoice schema
2. wait for async completion (webhook preferred; polling fallback)
3. parse structured result
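The polling fallback could look roughly like the sketch below. `get_status` is a hypothetical callable standing in for whatever the DeepRead OCR skill exposes; it is assumed to return a dict with a `status` key that eventually becomes `completed` or `failed`. The interval and timeout values are illustrative.

```python
import time


def wait_for_ocr(get_status, job_id, interval_s=5.0, timeout_s=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll an async OCR job until it reaches a terminal state.

    Returns the final job dict, or raises TimeoutError so the caller
    can keep the email queued for retry.
    """
    deadline = clock() + timeout_s
    while True:
        job = get_status(job_id)  # hypothetical status lookup
        if job["status"] in ("completed", "failed"):
            return job
        if clock() >= deadline:
            raise TimeoutError(f"OCR job {job_id} not finished after {timeout_s}s")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.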
Minimum extracted fields:
- `vendor`
- `invoice_date`
- `invoice_number`
- `total_amount`
- `tax_amount`
- `currency`
Quality gate:
- if critical fields have `hil_flag=true`, route to review queue before posting.
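A sketch of this quality gate, under two assumptions that the inspected docs do not confirm: each extracted field arrives as a dict with `value`, `hil_flag`, and `reason` keys, and the critical set is vendor, invoice date, invoice number, and total. Both the shape and the `CRITICAL_FIELDS` tuple are illustrative.

```python
# Assumed critical set; adjust to the operator's policy.
CRITICAL_FIELDS = ("vendor", "invoice_date", "invoice_number", "total_amount")


def fields_needing_review(extracted, critical=CRITICAL_FIELDS):
    """Return {field_name: reason} for critical fields that are missing or flagged."""
    problems = {}
    for name in critical:
        entry = extracted.get(name)
        if entry is None or entry.get("value") in (None, ""):
            problems[name] = "missing"
        elif entry.get("hil_flag"):
            # Surface the OCR's own reason rather than dropping the field.
            problems[name] = entry.get("reason") or "flagged for human review"
    return problems
```

An empty result means the attachment may proceed; anything else goes to the review queue with the reasons attached.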
Stage 3: Payment verification
Use Stripe to check whether corresponding payment occurred.
Matching policy:
- amount equals invoice total (within tolerance)
- currency matches
- date within tolerance window
- status is successful/paid
If multiple candidates match, mark as ambiguous_match and require review.
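The matching policy can be sketched as a pure function over already-fetched payments. The `Payment` record and `match_payment` function are hypothetical, and the accepted status values (`succeeded`, `paid`) are an assumption about which Stripe object types are queried. Stripe reports amounts in the smallest currency unit, so this sketch assumes they were converted to decimal major units beforehand.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Payment:
    payment_id: str
    amount: Decimal  # major units, e.g. Decimal("53.20")
    currency: str
    paid_on: date
    status: str


def match_payment(total, currency, invoice_date, payments,
                  amount_tolerance=Decimal("0"), date_tolerance_days=3,
                  ok_statuses=("succeeded", "paid")):
    """Return (outcome, candidates): 'matched', 'ambiguous_match', or 'no_match'."""
    candidates = [
        p for p in payments
        if abs(p.amount - total) <= amount_tolerance
        and p.currency.upper() == currency.upper()
        and abs((p.paid_on - invoice_date).days) <= date_tolerance_days
        and p.status in ok_statuses
    ]
    if not candidates:
        return "no_match", []
    if len(candidates) > 1:
        return "ambiguous_match", candidates  # requires manual review
    return "matched", candidates
```

Returning the candidate list alongside the outcome preserves the matching evidence for the processing log.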
Stage 4: Accounting write
Use Xero for booking.
Default payable flow:
1. ensure vendor contact exists (create if needed)
2. create bill entry (`Type: ACCPAY`) with line item category (for example Hosting)
3. mark as paid/reconciled state only when Stripe verification is confident
4. include reference fields: invoice number, source message ID, payment reference
Attachment handling:
- if binary attachment endpoint/path is available in the active integration, attach file
- otherwise store durable file reference and include link/reference in description/metadata
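A hedged sketch of the bill payload for the default payable flow. The field names follow Xero's public Invoices resource as understood here and should be verified against the active integration; `build_xero_bill`, the `account_code` argument, and the choice to carry trace references in the line description are assumptions. The bill is created as a draft, so nothing is posted or marked paid by this step.

```python
def build_xero_bill(invoice, account_code, source_message_id, payment_ref=None):
    """Build a draft ACCPAY bill dict from a normalized invoice record."""
    # Keep the audit trail in the line description (assumed placement).
    trace = f"invoice {invoice['invoice_number']}; gmail message {source_message_id}"
    if payment_ref:
        trace += f"; payment {payment_ref}"
    return {
        "Type": "ACCPAY",
        "Contact": {"Name": invoice["vendor"]},
        "Date": invoice["invoice_date"],
        "InvoiceNumber": invoice["invoice_number"],
        "CurrencyCode": invoice["currency"],
        "Status": "DRAFT",  # never auto-approve from this sketch
        "LineAmountTypes": "Inclusive",  # assumes total already includes tax
        "LineItems": [
            {
                "Description": trace,
                "Quantity": 1,
                "UnitAmount": invoice["total"],
                "AccountCode": account_code,  # hypothetical chart-of-accounts code
            }
        ],
    }
```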
Stage 5: Traceability updates
After successful processing:
- apply Gmail processed label
- store processing log (source email, extraction confidence, matching evidence, xero IDs)
- keep idempotency key to avoid duplicate posting
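One way to derive an idempotency key is to hash the same fields used for the duplicate check (vendor, invoice number, total). This is a sketch: the function name and the normalization rules (case-folding, two decimal places) are assumptions, not part of the inspected skills.

```python
import hashlib
from decimal import Decimal


def idempotency_key(vendor, invoice_number, total):
    """Derive a stable key so the same invoice is never posted twice."""
    parts = [
        vendor.strip().lower(),
        invoice_number.strip().lower(),
        f"{Decimal(str(total)):.2f}",  # normalize 53.2 and "53.20" alike
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
```

Storing this key with the processing log lets a retry detect that a bill for the same invoice already exists.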
Scenario Mapping (AWS Invoice)
For the scenario "AWS invoice by email -> Xero + card match":
1. Gmail finds AWS email with PDF attachment.
2. DeepRead OCR extracts structured fields (vendor/date/total/tax/invoice number).
3. Stripe check confirms payment event around invoice date and amount.
4. Xero creates payable entry (`ACCPAY`) under Hosting category.
5. Record is marked paid only after confident match; source PDF linked/attached per policy.
Data Contract
Normalize to one transaction record before posting:
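```json
{
  "source": {
    "gmail_message_id": "...",
    "sender": "billing@aws.amazon.com",
    "attachment_name": "invoice.pdf"
  },
  "invoice": {
    "vendor": "AWS",
    "invoice_number": "INV-123",
    "invoice_date": "2024-05-01",
    "total": 53.20,
    "tax": 0.00,
    "currency": "USD",
    "ocr_confidence_ok": true
  },
  "payment_match": {
    "provider": "stripe",
    "matched": true,
    "transaction_id": "ch_...",
    "amount": 53.20,
    "date": "2024-05-01"
  },
  "accounting": {
    "system": "xero",
    "entry_type": "ACCPAY",
    "category": "Hosting",
    "status": "Paid"
  }
}
```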
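Before posting, the normalized record can be checked for completeness. The sketch below uses the key names of the normalized transaction record; treating only `provider` and `matched` as mandatory under `payment_match` is an assumption, made so that an unmatched invoice still validates. `missing_keys` is a hypothetical helper.

```python
REQUIRED_KEYS = {
    "source": {"gmail_message_id", "sender", "attachment_name"},
    "invoice": {"vendor", "invoice_number", "invoice_date", "total", "tax",
                "currency", "ocr_confidence_ok"},
    "payment_match": {"provider", "matched"},  # other keys apply only when matched
    "accounting": {"system", "entry_type", "category", "status"},
}


def missing_keys(record):
    """Return a list of missing sections or 'section.key' paths (empty if complete)."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        body = record.get(section)
        if not isinstance(body, dict):
            missing.append(section)
            continue
        missing.extend(f"{section}.{key}" for key in sorted(keys - body.keys()))
    return missing
```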
Output Contract
Always return:
- emails scanned, invoice candidates found
- extracted fields and `hil_flag` status
- matched/not matched + evidence
- created/updated records and IDs
- any records requiring manual validation
Quality Gates
Before auto-posting:
- vendor identified
- invoice number/date/total present
- no critical `hil_flag` unresolved
- payment match confidence above policy threshold
- duplicate check passed (same vendor + invoice number + total)
If any gate fails, return Needs Review and do not auto-post.
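The gates can be evaluated together so that every failure is reported, not only the first. This is a sketch under assumptions: `evaluate_gates` is hypothetical, match confidence is taken to be a number compared against a policy threshold, and the `OK to post` label is illustrative (only `Needs Review` comes from this document).

```python
def evaluate_gates(invoice, hil_unresolved, match_confidence, threshold, is_duplicate):
    """Return (decision, failures); decision is 'Needs Review' if any gate fails."""
    failures = []
    if not invoice.get("vendor"):
        failures.append("vendor not identified")
    for key in ("invoice_number", "invoice_date", "total"):
        if invoice.get(key) in (None, ""):
            failures.append(f"{key} missing")
    if hil_unresolved:
        failures.append("critical hil_flag unresolved")
    if match_confidence < threshold:
        failures.append("payment match confidence below threshold")
    if is_duplicate:
        failures.append("duplicate of an existing record")
    return ("Needs Review" if failures else "OK to post"), failures
```

Collecting all failures gives the reviewer the full picture in one pass.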
Guardrails
- Never mark invoice as paid without payment evidence.
- Never silently overwrite existing accounting records.
- Never drop uncertain OCR fields; surface them explicitly.
- Prefer manual review when amount/date ambiguity exists.
- Preserve source audit trail for every booking action.
Failure Handling
- Gmail unavailable: stop intake and report connection issue.
- OCR job failed/timeout: keep email queued for retry.
- Stripe no match: post as unpaid bill or route to review per policy.
- Xero write failed: keep normalized record and retry safely with idempotency key.
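A sketch of the safe-retry behavior for a failed accounting write. `post` and `already_posted` are hypothetical callables supplied by the orchestrator, the retry count and backoff are illustrative, and `ConnectionError` stands in for whatever transient error the integration raises.

```python
import time


def post_with_retry(post, record, key, already_posted,
                    attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Try to post a record; never post twice for the same idempotency key."""
    for attempt in range(attempts):
        # Re-check before every attempt: an earlier try may have succeeded
        # on the server even though the response was lost.
        if already_posted(key):
            return "already_posted"
        try:
            post(record, key)
            return "posted"
        except ConnectionError:
            if attempt < attempts - 1:
                sleep(base_delay_s * 2 ** attempt)  # exponential backoff
    return "kept_for_retry"  # caller keeps the normalized record
```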
Known Limits from Inspected Upstream Skills
- DeepRead OCR is asynchronous and may require webhook/polling orchestration.
- The inspected Xero skill docs emphasize core accounting endpoints but do not fully document attachment upload flow; attachment behavior depends on supported endpoint path in active integration.
- Stripe/Xero matching is orchestration logic here, not a single native "auto-reconcile" endpoint in these inspected skill docs.
- QuickBooks is not part of this researched stack; this meta-skill is Xero-first.
Treat these limits as mandatory operator disclosures.