Expense Categorization
Receipt OCR, GL mapping, policy compliance, and anomaly detection for business expenses.
Workflow
1. Receipt Extraction (OCR)
Use tesseract (local) or Vision API for image receipts; pdfplumber for PDF receipts.
Key fields to extract:
- Vendor name, date, total amount, line items
- Payment method (last 4 digits if visible)
- Tax amount (HST/GST/sales tax)
- Tips/gratuity (separate from subtotal)
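```bash
# Tesseract OCR on a receipt image
tesseract receipt.jpg stdout --psm 4 | python3 scripts/parse_receipt.py
# For complex layouts, use Claude vision directly
```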
For complex or handwritten receipts → use vision model with prompt in references/ocr-prompt.md.
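As a rough illustration of the field extraction described above, the sketch below pulls the vendor, date, tax, tip, total, and card digits out of raw OCR text with regular expressions. The function name `parse_receipt_text`, the label patterns, and the "vendor is the first line" rule are assumptions made for this example; the actual scripts/parse_receipt.py may work differently.

```python
import re
from decimal import Decimal

# Hypothetical label patterns; real receipts vary widely, so treat these as a starting point.
AMOUNT = r"\$?\s*(\d+(?:,\d{3})*\.\d{2})"
PATTERNS = {
    "total": re.compile(r"^\s*(?:grand\s+)?total\b[^\d$]*" + AMOUNT, re.I | re.M),
    "tax": re.compile(r"^\s*(?:hst|gst|sales\s+tax|tax)\b[^\d$]*" + AMOUNT, re.I | re.M),
    "tip": re.compile(r"^\s*(?:tip|gratuity)\b[^\d$]*" + AMOUNT, re.I | re.M),
}
DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")          # ISO dates only in this sketch
CARD = re.compile(r"(?:\*{2,}|x{2,})\s*(\d{4})\b", re.I)  # e.g. ****1234 or XXXX1234


def parse_receipt_text(text: str) -> dict:
    """Extract key fields from OCR text; fields that are not found come back as None."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Assumption: the vendor name is the first non-empty line of the receipt.
    fields = {"vendor": lines[0] if lines else None}
    for name, pattern in PATTERNS.items():
        m = pattern.search(text)
        fields[name] = Decimal(m.group(1).replace(",", "")) if m else None
    m = DATE.search(text)
    fields["date"] = m.group(1) if m else None
    m = CARD.search(text)
    fields["card_last4"] = m.group(1) if m else None
    return fields
```

Anchoring the `total` pattern to the start of a line keeps it from matching a "Subtotal" line, which is one way to keep tips and tax separate from the subtotal.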
2. GL Code Mapping
Map extracted expense category to chart of accounts. See references/gl-mapping.md for:
- Standard QBO GL codes for common expense types
- IRS-aligned categories (meals 50%, travel, home office, etc.)
- Crypto/DeFi expense categories
Matching logic:
1. Exact vendor name match (known vendor list)
2. MCC code match (credit card transactions)
3. Keyword match on description/line items
4. Fallback: prompt user to select category
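The matching order above can be sketched as a simple cascade that stops at the first rule that hits. The lookup tables here are hypothetical stand-ins for references/gl-mapping.md: GL codes 6200 and 6210 appear elsewhere in this document, while the meals code and the function name `map_gl_code` are placeholders.

```python
from typing import Optional

# Hypothetical lookup tables; real mappings would come from references/gl-mapping.md.
KNOWN_VENDORS = {"delta air lines": ("6200", "Travel - Airfare")}
MCC_CODES = {"5812": ("6300", "Meals")}  # 5812 = restaurants; "6300" is a placeholder GL code
KEYWORDS = {"mileage": ("6210", "Auto/Mileage")}


def map_gl_code(vendor: str, mcc: Optional[str] = None, description: str = "") -> Optional[dict]:
    """Apply the matching rules in priority order; None means 'prompt the user'."""
    # 1. Exact vendor name match (case-insensitive)
    hit = KNOWN_VENDORS.get(vendor.strip().lower())
    if hit:
        return {"gl_code": hit[0], "category": hit[1], "matched_by": "vendor"}
    # 2. MCC code match
    if mcc in MCC_CODES:
        code, category = MCC_CODES[mcc]
        return {"gl_code": code, "category": category, "matched_by": "mcc"}
    # 3. Keyword match on the description
    text = description.lower()
    for keyword, (code, category) in KEYWORDS.items():
        if keyword in text:
            return {"gl_code": code, "category": category, "matched_by": "keyword"}
    # 4. Fallback: nothing matched, so the caller should ask the user
    return None
```

Recording `matched_by` alongside the code makes it easier to audit why a transaction landed in a given account.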
3. Policy Compliance Check
Apply policy rules before approval routing. See references/policy-rules.md for standard rules.
Core checks:
- Per diem limits: Meals >$75 require itemized receipt; travel per diem by city
- Receipt threshold: Receipt required for any expense ≥$25 (a common company policy; the IRS documentary-evidence threshold is $75, lodging excepted)
- Time limit: Receipts must be submitted within 30/60/90 days (configurable)
- Duplicate detection: Same vendor + amount ± $1 within 7 days = flag
- Split transactions: Same vendor, sequential dates, amounts just below approval threshold = flag
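A minimal sketch of three of the checks above (receipt threshold, submission time limit, duplicate detection), assuming each expense is a dict with `vendor`, `amount` (Decimal), `date`, and optionally `has_receipt` and `submitted`. The function names, field names, and flag strings are illustrative, and the per diem and split-transaction checks are left out for brevity.

```python
from datetime import date
from decimal import Decimal


def is_duplicate(a: dict, b: dict, tolerance=Decimal("1.00"), window_days: int = 7) -> bool:
    """Same vendor, amounts within the tolerance, and dates within the window."""
    return (
        a["vendor"].lower() == b["vendor"].lower()
        and abs(a["amount"] - b["amount"]) <= tolerance
        and abs((a["date"] - b["date"]).days) <= window_days
    )


def policy_flags(expense: dict, history: list,
                 receipt_threshold=Decimal("25"), submit_limit_days: int = 60) -> list:
    """Return policy flags for one expense, checked against earlier expenses."""
    flags = []
    # Receipt threshold: a receipt is required at or above the threshold
    if expense["amount"] >= receipt_threshold and not expense.get("has_receipt", False):
        flags.append("missing_receipt")
    # Time limit: configurable 30/60/90-day submission window
    submitted = expense.get("submitted")
    if submitted is not None and (submitted - expense["date"]).days > submit_limit_days:
        flags.append("late_submission")
    # Duplicate detection against previously seen expenses
    if any(is_duplicate(expense, prior) for prior in history):
        flags.append("possible_duplicate")
    return flags
```

Using `Decimal` for amounts avoids float rounding surprises when comparing against the ±$1 tolerance.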
4. Anomaly Detection
Flag for human review:
- Amount > 2× historical average for that vendor/category
- Weekend or holiday transactions (especially travel/entertainment)
- Round-number amounts (potential personal purchase)
- Vendor in restricted list (casinos, adult entertainment, competitors)
- Missing required fields (date, vendor, business purpose)
- Out-of-state purchases for office supply categories
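Several of these rules can be sketched as one function that returns a list of flags. The names, the flag strings, and the definition of "round" as a whole multiple of `round_to` are assumptions for illustration; holiday and out-of-state checks are omitted because they need calendar and location data this example does not have.

```python
from datetime import date
from statistics import mean

REQUIRED_FIELDS = ("date", "vendor", "business_purpose")


def anomaly_flags(expense: dict, past_amounts: list,
                  restricted_vendors=frozenset(), round_to: int = 50) -> list:
    """Return anomaly flags for one expense; an empty list means nothing stood out."""
    flags = []
    # Missing required fields
    missing = [f for f in REQUIRED_FIELDS if not expense.get(f)]
    if missing:
        flags.append("missing_fields:" + ",".join(missing))
    amount = expense.get("amount", 0)
    # More than 2x the historical average for this vendor/category
    if past_amounts and amount > 2 * mean(past_amounts):
        flags.append("above_2x_average")
    # Round-number amount (assumed here to mean a multiple of round_to)
    if amount and amount % round_to == 0:
        flags.append("round_amount")
    # Weekend transaction: Monday is 0, so 5 and 6 are Saturday and Sunday
    when = expense.get("date")
    if when is not None and when.weekday() >= 5:
        flags.append("weekend")
    # Vendor on the restricted list (compared case-insensitively)
    if (expense.get("vendor") or "").lower() in restricted_vendors:
        flags.append("restricted_vendor")
    return flags
```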
5. Output Format
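```json
{
  "receipt_id": "REC-20260315-001",
  "vendor": "Delta Air Lines",
  "date": "2026-03-15",
  "amount": 487.50,
  "currency": "USD",
  "gl_code": "6200",
  "category": "Travel - Airfare",
  "policy_status": "approved",
  "flags": [],
  "confidence": 0.94,
  "requires_review": false,
  "notes": "Business purpose required for reimbursement"
}
```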
Batch Processing
For expense report batches:
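```python
# Process a folder of receipts
import glob
from collections import defaultdict

# glob has no brace expansion, so match each extension separately
receipts = [p for ext in ("jpg", "png", "pdf") for p in glob.glob(f"receipts/*.{ext}")]
results = [categorize(r) for r in receipts]  # categorize() is assumed to be defined elsewhere

# Summary stats
flagged = [r for r in results if r["requires_review"]]
total = sum(r["amount"] for r in results)
by_category = defaultdict(list)
for r in results:
    by_category[r["category"]].append(r)
```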
Output batch summary as CSV or feed directly to QBO via qbo-automation skill.
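One possible shape for the CSV summary, assuming each result is a dict with `category`, `amount`, and `requires_review` keys. The function name `batch_summary_csv` and the column layout are illustrative choices, not a format required by QBO.

```python
import csv
import io
from collections import defaultdict


def batch_summary_csv(results: list) -> str:
    """Return a per-category summary (count, total, flagged) as CSV text."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0, "flagged": 0})
    for r in results:
        row = summary[r["category"]]
        row["count"] += 1
        row["total"] += r["amount"]
        row["flagged"] += int(r["requires_review"])

    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["category", "count", "total", "flagged"])
    for category in sorted(summary):  # sorted for stable, diff-friendly output
        row = summary[category]
        writer.writerow([category, row["count"], f"{row['total']:.2f}", row["flagged"]])
    return buf.getvalue()
```

Returning the CSV as a string keeps the function easy to test; writing it to disk is then a single `Path("summary.csv").write_text(...)` call.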
Common Patterns
Credit card statement import:
1. Parse CSV/OFX from bank
2. Match known vendors → auto-categorize
3. Unknown vendors → ML classification or prompt
4. Export mapped transactions to QBO
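The first three steps might look like the sketch below for a CSV statement. The column names (`Date`, `Description`, `Amount`), the vendor table, and the function name `import_statement` are assumptions; bank exports differ, and OFX parsing is not covered here.

```python
import csv
import io

# Hypothetical known-vendor table; a real one would be loaded from the chart-of-accounts mapping.
KNOWN_VENDORS = {"delta air lines": "6200"}


def import_statement(csv_text: str):
    """Split statement rows into auto-categorized and unknown-vendor transactions."""
    mapped, unknown = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        txn = {
            "date": row["Date"],
            "vendor": row["Description"].strip(),
            "amount": float(row["Amount"]),
        }
        gl_code = KNOWN_VENDORS.get(txn["vendor"].lower())
        if gl_code:
            mapped.append({**txn, "gl_code": gl_code})  # known vendor: auto-categorize
        else:
            unknown.append(txn)  # unknown vendor: send to classification or a user prompt
    return mapped, unknown
```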
Expense report approval routing:
- Auto-approve: policy-compliant, under $250, no flags
- Manager approval: $250–$2,500 or single flag
- Finance review: >$2,500, multiple flags, or restricted category
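The three tiers above can be expressed as a small routing function that checks the strictest tier first. The function name and return strings are illustrative. The tiers do not say where a non-compliant expense under $250 with no flags should go, so sending it to a manager is an assumption of this sketch.

```python
def route_expense(amount: float, flags: list, policy_compliant: bool = True,
                  restricted_category: bool = False) -> str:
    """Pick an approval tier, checking the strictest tier first."""
    # Finance review: >$2,500, multiple flags, or restricted category
    if amount > 2500 or len(flags) > 1 or restricted_category:
        return "finance_review"
    # Manager approval: $250 to $2,500 or a single flag
    # (assumption: non-compliant items that reach this point also go to a manager)
    if amount >= 250 or len(flags) == 1 or not policy_compliant:
        return "manager_approval"
    # Auto-approve: policy-compliant, under $250, no flags
    return "auto_approve"
```

Checking the finance tier first means an expense that qualifies for more than one tier always lands in the stricter one.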
Mileage reimbursement:
- Extract start/end locations + business purpose
- Calculate at current IRS rate (check references/irs-rates.md)
- Map to GL 6210 (Auto/Mileage)
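A minimal sketch of the mileage calculation, assuming the distance is already known and the per-mile rate is passed in by the caller after looking it up in references/irs-rates.md. The function name and record fields are illustrative, and the rate in the usage example is a placeholder, not an actual IRS rate.

```python
from decimal import Decimal, ROUND_HALF_UP


def mileage_reimbursement(start: str, end: str, purpose: str, miles, rate_per_mile) -> dict:
    """Build a mileage expense record mapped to GL 6210 (Auto/Mileage)."""
    if not purpose.strip():
        raise ValueError("business purpose is required")
    # Convert via str() so float inputs such as 42.5 do not carry binary rounding noise
    amount = (Decimal(str(miles)) * Decimal(str(rate_per_mile))).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return {
        "start": start,
        "end": end,
        "business_purpose": purpose,
        "miles": Decimal(str(miles)),
        "amount": amount,
        "gl_code": "6210",
        "category": "Auto/Mileage",
    }


# Usage with a placeholder rate of 0.50 per mile
record = mileage_reimbursement("Office", "Client site", "Quarterly review", 42.5, "0.50")
```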
Integration Points
- qbo-automation: Push categorized transactions directly to QBO
- crypto-tax-agent: Route DeFi/crypto expenses for cost basis tracking
- kpi-alert-system: Trigger alerts when department spend exceeds budget
- invoice-automation: Cross-reference receipts with vendor invoices
Negative Boundaries
- Not for PTIN-backed tax work: categorization ≠ tax advice; defer to a licensed preparer
- Not for payroll: employee expense reimbursement ≠ payroll processing
- Not a real-time feed: batch review with human sign-off before posting to GL
- Not for legal contracts: use contract-review-agent for vendor agreements
- Confidence <0.7 → always route to human review, never auto-post