Master Data Intelligent Matching System

Overview

A production-ready skill for intelligent entity resolution across business domains. It combines exact-match and vector-semantic retrieval, OCR field mapping with confidence coloring, and human-in-the-loop verification with active learning.

Usage

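```javascript
import mdm from './index.js';

// 1. Get the supported domains
mdm.getSupportedDomains(); // ['procurement', 'finance', 'sales', 'hr']

// 2. Build the OCR-to-schema mapping with confidence colors
const mapping = mdm.buildOcrSchemaMapping(ocrFields, 'procurement');

// 3. Run the full matching pipeline
const result = mdm.runMatchingPipeline(ocrEntity, 'procurement', dbRecords);

// 4. Format the result as a summary
console.log(mdm.formatMatchingSummary(result));
```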

Key Features

Business Domain Isolation

Four isolated schemas:

- procurement: vendor records (vendor_name, vendor_code, tax_id, contact, etc.)
- finance: company records (company_name, registration_number, fiscal_year_end, etc.)
- sales: customer records (customer_name, customer_code, industry, credit_limit, etc.)
- hr: employee records (employee_name, employee_id, id_number, department, etc.)

OCR Field to Schema Visual Line Mapping

buildOcrSchemaMapping(ocrFields, domain) maps raw OCR field names to schema fields with confidence colors:

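| Color | Score | Meaning |
| --- | --- | --- |
| 🟢 green | ≥ 0.92 | High confidence mapping |
| 🟡 yellow | 0.70–0.92 | Medium confidence mapping |
| 🔴 red | < 0.70 | Low confidence / unmapped |
| 🔵 blue | DB only | Database field with no OCR data |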
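The color bands can be sketched as a small helper. This is an illustration rather than the skill's own code: `confidenceColor` is a hypothetical name, the band edges 0.92 and 0.70 are the thresholds quoted in this document, and treating a missing score as a database-only field is an assumption.

```javascript
// Hypothetical helper: maps a mapping score to a confidence color.
// Band edges (0.92 and 0.70) follow the thresholds quoted in this document.
function confidenceColor(score) {
  // Assumption: a missing score marks a DB-only field with no OCR data.
  if (score === null || score === undefined) return 'blue';
  if (score >= 0.92) return 'green';
  if (score >= 0.7) return 'yellow';
  return 'red';
}

// Classify a few sample scores, including a DB-only field.
console.log([0.95, 0.8, 0.4, null].map(confidenceColor));
```

Keeping the band edges in one function means the mapping view and any review queue can share the same cut-offs.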

Dual-Path Entity Retrieval

dualPathEntityRetrieval(entity, domain, dbRecords) runs two parallel paths:

1. Exact Match (threshold 0.92): all critical fields must match exactly
2. Vector Semantic (threshold 0.70): weighted similarity across all fields

Results include needsHumanReview: true if confidence < 0.92 or no match found.
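The two paths can be sketched roughly as follows. Everything here is an assumption made for illustration: `dualPathSketch`, the `criticalFields` and `weights` options, and the bigram Dice coefficient (a simple stand-in for real vector similarity) are not taken from the skill. The sketch also tries the exact path first and falls back to the weighted path per record, instead of running the two in parallel.

```javascript
// Thresholds quoted in this document.
const EXACT_THRESHOLD = 0.92;
const SEMANTIC_THRESHOLD = 0.7;

const normalize = (value) => String(value ?? '').toLowerCase().trim();

// Split a string into overlapping two-character chunks.
function bigrams(text) {
  const grams = [];
  for (let i = 0; i < text.length - 1; i += 1) grams.push(text.slice(i, i + 2));
  return grams;
}

// Stand-in for vector similarity: Dice coefficient over character bigrams.
function similarity(a, b) {
  const na = normalize(a);
  const nb = normalize(b);
  if (na === '' || nb === '') return 0;
  if (na === nb) return 1;
  const grams = bigrams(na);
  const pool = bigrams(nb);
  if (grams.length === 0 || pool.length === 0) return 0;
  const total = grams.length + pool.length;
  let shared = 0;
  for (const gram of grams) {
    const index = pool.indexOf(gram);
    if (index !== -1) {
      shared += 1;
      pool.splice(index, 1); // each bigram in the pool can be matched once
    }
  }
  return (2 * shared) / total;
}

// Exact path: every critical field must be present and identical.
function exactMatch(entity, record, criticalFields) {
  return criticalFields.every(
    (f) => normalize(entity[f]) !== '' && normalize(entity[f]) === normalize(record[f])
  );
}

// Semantic path: weighted average similarity over the fields the OCR captured.
function weightedSimilarity(entity, record, weights) {
  let total = 0;
  let weightSum = 0;
  for (const [field, weight] of Object.entries(weights)) {
    if (entity[field] === undefined) continue;
    total += weight * similarity(entity[field], record[field]);
    weightSum += weight;
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

function dualPathSketch(entity, records, { criticalFields, weights }) {
  let best = { record: null, confidence: 0, path: 'none' };
  for (const record of records) {
    const candidate = exactMatch(entity, record, criticalFields)
      ? { record, confidence: 1, path: 'exact' }
      : { record, confidence: weightedSimilarity(entity, record, weights), path: 'semantic' };
    if (candidate.confidence > best.confidence) best = candidate;
  }
  const matched = best.path === 'exact' || best.confidence >= SEMANTIC_THRESHOLD;
  if (!matched) best = { record: null, confidence: best.confidence, path: 'none' };
  // Review is needed when nothing matched or confidence is below 0.92.
  return { ...best, needsHumanReview: !matched || best.confidence < EXACT_THRESHOLD };
}
```

In this sketch only an exact match on the critical fields, or a weighted score of at least 0.92, avoids human review, which mirrors the `needsHumanReview` rule stated above.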

Field Value Verification

verifyFieldValues(ocrEntity, dbRecord, domain) returns 4-state verification per field:

| State | Meaning |
| --- | --- |
| match | OCR and DB values agree |
| mismatch | |
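A per-field check along these lines might look like the sketch below. Only two of the four states are visible in this document (one for agreeing values, one for a mismatch). The state names used here, including `new_info` and `db_only`, are guesses based on the "new info" fields and "DB only" color mentioned elsewhere, and `verifyFieldsSketch` is a hypothetical function, not the skill's `verifyFieldValues`.

```javascript
// Hypothetical sketch: classify each field by comparing OCR and DB values.
// State names are assumptions; the document does not show all four states.
function verifyFieldsSketch(ocrEntity, dbRecord, fields) {
  const isEmpty = (v) => v === undefined || v === null || String(v).trim() === '';
  const result = {};
  for (const field of fields) {
    const ocr = ocrEntity[field];
    const db = dbRecord[field];
    if (isEmpty(ocr) && isEmpty(db)) continue; // nothing to verify on either side
    if (isEmpty(db)) {
      result[field] = 'new_info'; // OCR carries a value the database lacks
    } else if (isEmpty(ocr)) {
      result[field] = 'db_only'; // database value with no OCR counterpart
    } else {
      result[field] = String(ocr).trim() === String(db).trim() ? 'match' : 'mismatch';
    }
  }
  return result;
}

const states = verifyFieldsSketch(
  { vendor_name: 'Acme Corporation Ltd', email: 'john.smith@acme.com' },
  { vendor_name: 'Acme Corporation Ltd', email: 'j.smith@acme.com', phone: '+86-10-12345678' },
  ['vendor_name', 'email', 'phone']
);
console.log(states);
```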

Human-in-the-Loop

Every pipeline result generates a hitlRequest with:

- Mismatched fields highlighted
- New info fields listed
- Available review actions: confirm_match, reject_match, create_new, update_fields

Use processHumanDecision(decision, state) to process human feedback and generate learning payloads.
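As a rough picture of what handling a decision could involve, the sketch below validates the action and builds a payload. It is not the skill's `processHumanDecision`: the function name, the `status` value, the payload fields, and the optional `wrongFields` input are all assumptions, and the snake_case action names follow the `confirm_match` spelling used in this document's example.

```javascript
// Review actions, written in the snake_case form assumed from the example.
const REVIEW_ACTIONS = ['confirm_match', 'reject_match', 'create_new', 'update_fields'];

// Hypothetical sketch: validate a reviewer's decision and derive a learning payload.
function processDecisionSketch(decision, state) {
  if (!REVIEW_ACTIONS.includes(decision.action)) {
    throw new Error(`Unknown review action: ${decision.action}`);
  }
  const learningPayload = {
    domain: state.domain,
    action: decision.action,
    // Assumption: the reviewer may name the fields they judged wrong.
    wrongFields: decision.wrongFields ?? [],
    notes: decision.notes ?? '',
  };
  return { status: 'processed', learningPayload };
}

const outcome = processDecisionSketch(
  { action: 'confirm_match', notes: 'Email mismatch is acceptable' },
  { domain: 'procurement' }
);
console.log(outcome.learningPayload);
```

Rejecting unknown actions early keeps malformed feedback out of the learning statistics.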

Active Learning

updateActiveLearning(payloads, stats) tracks:

- Per-domain confirmation/rejection/new-record rates
- Per-field error rates
- Automatic threshold adjustment when a field's error rate exceeds 30%
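The bookkeeping described above can be sketched as follows. The document does not say how thresholds are adjusted, so this sketch only flags fields whose error rate exceeds 30% and leaves the adjustment itself open. The function name, the payload fields (`reviewedFields`, `wrongFields`), and the shape of the stats object are assumptions.

```javascript
const ERROR_RATE_LIMIT = 0.3; // "field error rate > 30%" from the text

// Hypothetical sketch: fold learning payloads into running statistics.
function updateLearningSketch(payloads, stats = {}) {
  // Copy the top-level maps so the caller's stats object is not mutated.
  const next = {
    domains: { ...(stats.domains ?? {}) },
    fields: { ...(stats.fields ?? {}) },
  };
  for (const p of payloads) {
    const d = { confirmed: 0, rejected: 0, created: 0, total: 0, ...(next.domains[p.domain] ?? {}) };
    d.total += 1;
    if (p.action === 'confirm_match') d.confirmed += 1;
    if (p.action === 'reject_match') d.rejected += 1;
    if (p.action === 'create_new') d.created += 1;
    next.domains[p.domain] = d;

    for (const field of p.reviewedFields ?? []) {
      const f = { errors: 0, seen: 0, ...(next.fields[field] ?? {}) };
      f.seen += 1;
      if ((p.wrongFields ?? []).includes(field)) f.errors += 1;
      next.fields[field] = f;
    }
  }
  // Flag fields whose observed error rate exceeds the limit.
  next.fieldsNeedingAdjustment = Object.entries(next.fields)
    .filter(([, f]) => f.seen > 0 && f.errors / f.seen > ERROR_RATE_LIMIT)
    .map(([name]) => name);
  return next;
}
```

Rates such as the confirmation rate can then be derived on demand, for example `confirmed / total` per domain.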

Example

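```javascript
import mdm from './index.js';

// Sample OCR entity from a vendor invoice
const ocrVendor = {
  vendor_name: 'Acme Corporation Ltd',
  vendor_code: 'V-5001',
  tax_id: '91110000123456789X',
  contact_person: 'John Smith',
  email: 'john.smith@acme.com',
};

// Existing database records
const dbRecords = [
  {
    id: 'rec_001',
    vendor_name: 'Acme Corporation Ltd',
    vendor_code: 'V-5001',
    tax_id: '91110000123456789X',
    contact_person: 'John Smith',
    email: 'j.smith@acme.com', // slight email mismatch
    phone: '+86-10-12345678',
    address: 'Chaoyang District, Beijing',
    bank_account: '6222021234567890',
  },
];

// Run the pipeline
const result = mdm.runMatchingPipeline(ocrVendor, 'procurement', dbRecords);
console.log(mdm.formatMatchingSummary(result));

// Process the human decision
const decision = { action: 'confirm_match', notes: 'Email mismatch is acceptable' };
const { status, learningPayload } = mdm.processHumanDecision(decision, {
  domain: 'procurement',
  ocrEntity: ocrVendor,
  matchResult: result.matchResult,
});

// Update active learning
const newStats = mdm.updateActiveLearning([learningPayload], {});
```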

API Reference

| Function | Description |
| --- | --- |
| `getSupportedDomains()` | List all supported business domains |
| `getDomainSchema(domain)` | |

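Master Data Intelligent Matching System

Overview

A production-grade intelligent entity resolution skill for business domains. It combines exact matching with vector-semantic retrieval, OCR field mapping with confidence color coding, and human-in-the-loop verification with active learning.

Usage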

```javascript
import mdm from './index.js';

// 1. Get the supported domains
mdm.getSupportedDomains(); // ['procurement', 'finance', 'sales', 'hr']

// 2. Build the OCR-to-schema mapping with confidence colors
const mapping = mdm.buildOcrSchemaMapping(ocrFields, 'procurement');

// 3. Run the full matching pipeline
const result = mdm.runMatchingPipeline(ocrEntity, 'procurement', dbRecords);

// 4. Format the result as a summary
console.log(mdm.formatMatchingSummary(result));
```

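Key Features

Business Domain Isolation

Four isolated schemas:

- Procurement: vendor records (vendor name, vendor code, tax ID, contact person, etc.)
- Finance: company records (company name, registration number, fiscal year end, etc.)
- Sales: customer records (customer name, customer code, industry, credit limit, etc.)
- HR: employee records (employee name, employee ID, ID number, department, etc.)

OCR Field to Schema Visual Line Mapping

`buildOcrSchemaMapping(ocrFields, domain)` maps raw OCR field names to schema fields and attaches a confidence color:

| Color | Score | Meaning |
| --- | --- | --- |
| 🟢 Green | ≥ 0.92 | High confidence mapping |
| 🟡 Yellow | 0.70–0.92 | Medium confidence mapping |
| 🔴 Red | < 0.70 | Low confidence / unmapped |
| 🔵 Blue | DB only | Database field with no OCR data |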

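Dual-Path Entity Retrieval

`dualPathEntityRetrieval(entity, domain, dbRecords)` runs two parallel paths:

1. Exact match (threshold 0.92): all critical fields must match exactly
2. Vector-semantic match (threshold 0.70): weighted similarity across all fields

If confidence is below 0.92 or no match is found, the result includes `needsHumanReview: true`.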

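Field Value Verification

`verifyFieldValues(ocrEntity, dbRecord, domain)` returns a four-state verification result for each field:

| State | Meaning |
| --- | --- |
| Match | OCR and database values agree |
| Mismatch | |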

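Human-in-the-Loop

Every pipeline result generates a `hitlRequest` containing:

- Mismatched fields, highlighted
- New info fields, listed
- Available review actions: confirm match, reject match, create new record, update fields

Use `processHumanDecision(decision, state)` to process human feedback and generate learning payloads.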

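Active Learning

`updateActiveLearning(payloads, stats)` tracks:

- Per-domain confirmation/rejection/new-record rates
- Per-field error rates
- Automatic threshold adjustment when a field's error rate exceeds 30%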

Example

```javascript
import mdm from './index.js';

// Sample OCR entity from a vendor invoice
const ocrVendor = {
  vendor_name: 'Acme Corporation Ltd',
  vendor_code: 'V-5001',
  tax_id: '91110000123456789X',
  contact_person: 'John Smith',
  email: 'john.smith@acme.com',
};

// Existing database records
const dbRecords = [
  {
    id: 'rec_001',
    vendor_name: 'Acme Corporation Ltd',
    vendor_code: 'V-5001',
    tax_id: '91110000123456789X',
    contact_person: 'John Smith',
    email: 'j.smith@acme.com', // slight email mismatch
    phone: '+86-10-12345678',
    address: 'Chaoyang District, Beijing',
    bank_account: '6222021234567890',
  },
];

// Run the pipeline
const result = mdm.runMatchingPipeline(ocrVendor, 'procurement', dbRecords);
console.log(mdm.formatMatchingSummary(result));

// Process the human decision
const decision = { action: 'confirm_match', notes: 'Email mismatch is acceptable' };
const { status, learningPayload } = mdm.processHumanDecision(decision, {
  domain: 'procurement',
  ocrEntity: ocrVendor,
  matchResult: result.matchResult,
});

// Update active learning
const newStats = mdm.updateActiveLearning([learningPayload], {});
```

API Reference

| Function | Description |
| --- | --- |
| `getSupportedDomains()` | List all supported business domains |
| `getDomainSchema(domain)` | |
