Meerkat Governance
Scope: This skill provides two API endpoints your agent can call. It does not auto-activate, does not run in the background, and does not access content unless explicitly called by the agent. The developer controls what content is sent to Meerkat.
Privacy and data handling: https://meerkatplatform.com/privacy
Meerkat processes content in memory and discards it after the response. Only trust scores and metadata are stored. No raw content is retained. No data is shared with third parties. All processing stays in Canada.
Security: Your API key authenticates requests to Meerkat's API. If a key is compromised, rotate it via the dashboard. All communication is encrypted with TLS 1.2 or later. Meerkat endpoints are hosted on Azure Container Apps with managed SSL certificates. Verify that the endpoint hostname (api.meerkatplatform.com) matches the TLS certificate before sending data.
Ingress Shield
The /v1/shield endpoint scans content for prompt injection, jailbreaks, data exfiltration, and social engineering. The agent can call this before processing content the developer designates as untrusted. Common examples include external emails, web-scraped content, and user-uploaded documents. Developers can optionally configure their agent to shield skill descriptions before installation.
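```bash
curl -s -X POST https://api.meerkatplatform.com/v1/shield \
  -H "Authorization: Bearer $MEERKAT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "<...>"}'
```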
Response fields:
- `safe` (boolean): Whether the content passed scanning
- `threat_level`: `NONE`, `LOW`, `MEDIUM`, `HIGH`, or `CRITICAL`
- `attack_type`: Category of detected threat (if any)
- `detail`: Human-readable description
- `sanitized_input`: Content with threats removed (when available)
- `audit_id`: Unique identifier for the audit record
The agent can use the response to decide how to proceed. For example, content flagged HIGH or CRITICAL could be blocked, while MEDIUM could prompt user confirmation. If sanitized_input is returned, the agent can use the cleaned version.
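The decision logic described above could be sketched as follows. This is a minimal illustration, assuming the shield response has already been parsed into a dictionary with the fields listed above. The function name `decide_action` and the action labels (`allow`, `confirm`, `block`) are hypothetical and not part of Meerkat's API, and letting `LOW` through without confirmation is an assumption, since the text only gives examples for `MEDIUM` and above.

```python
from typing import Any, Dict, Tuple


def decide_action(shield_response: Dict[str, Any], original: str) -> Tuple[str, str]:
    """Map a parsed /v1/shield response to (action, content_to_use).

    The action labels are illustrative only, not part of Meerkat's API.
    """
    level = shield_response.get("threat_level")
    # Prefer the cleaned version when the service returned one.
    content = shield_response.get("sanitized_input") or original

    if level in ("NONE", "LOW"):
        # Assumption: NONE and LOW proceed without user confirmation.
        return "allow", content
    if level == "MEDIUM":
        return "confirm", content
    # HIGH, CRITICAL, or a missing/unrecognized level: fail closed.
    return "block", ""
```

Treating a missing or unrecognized `threat_level` as a block is a conservative design choice for this sketch; a developer could just as well route such cases to user confirmation.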
Egress Verify
The /v1/verify endpoint checks AI-generated output against source data using up to five ML checks: entailment (DeBERTa NLI), numerical verification, semantic entropy, implicit preference detection, and claim extraction.
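```bash
curl -s -X POST https://api.meerkatplatform.com/v1/verify \
  -H "Authorization: Bearer $MEERKAT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "<REQUEST>", "output": "<OUTPUT>", "context": "<...>", "domain": "<...>"}'
```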
The domain field applies domain-specific rules. Supported values: healthcare, financial, legal, general.
Response fields:
- `trust_score` (0-100): Weighted composite score across all checks
- `status`: `PASS` or `FLAG` (severity communicated via `trust_score` and `remediation.severity`)
- `checks`: Per-check scores, flags, and details
- `remediation`: Corrections and agent instructions (when status is not `PASS`)
- `audit_id`: Unique identifier for the audit record
- `session_id`: Session identifier for linking retry attempts
The agent can use the status and trust score to decide whether to proceed. When remediation is present, the agent_instruction field contains guidance for self-correction, and corrections lists specific errors (e.g., found value vs expected value). The agent can regenerate output with corrections applied and resubmit using the same session_id to link attempts.
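The regenerate-and-resubmit loop described above might look like the sketch below. It is written against two injected callables so that it stays independent of any HTTP client: `verify` stands in for a POST to /v1/verify and `regenerate` stands in for the agent's own generation step. Both names, the `max_attempts` limit, and the choice to send `session_id` as a field in the request body are assumptions made for illustration; the source only says that the same session_id is reused to link attempts.

```python
from typing import Any, Callable, Dict, Optional

VerifyFn = Callable[[Dict[str, Any]], Dict[str, Any]]
RegenerateFn = Callable[[str, Dict[str, Any]], str]


def verify_with_retries(
    verify: VerifyFn,
    regenerate: RegenerateFn,
    request: str,
    output: str,
    context: str,
    domain: str = "general",
    max_attempts: int = 3,
) -> Dict[str, Any]:
    """Resubmit regenerated output until status is PASS or attempts run out."""
    session_id: Optional[str] = None
    result: Dict[str, Any] = {}
    for attempt in range(max_attempts):
        payload: Dict[str, Any] = {
            "input": request,
            "output": output,
            "context": context,
            "domain": domain,
        }
        if session_id is not None:
            # Assumption: the session is linked via a body field of this name.
            payload["session_id"] = session_id
        result = verify(payload)
        session_id = result.get("session_id", session_id)
        remediation = result.get("remediation")
        last_attempt = attempt == max_attempts - 1
        if result.get("status") == "PASS" or not remediation or last_attempt:
            break
        # Let the agent apply agent_instruction and corrections.
        output = regenerate(output, remediation)
    # The returned output is always the one the returned result refers to.
    return {"output": output, "result": result}
```

The caller still has to decide what to do when the final result is `FLAG`, for example by withholding the output or asking the user.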
Observation Mode
When no context field is provided, Meerkat runs in observation mode: it checks semantic entropy and implicit preference but skips source-grounded checks. The context_mode field in the response will be observation. This is useful for checking open-ended generation where no source document exists.
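As a rough sketch of how an agent might switch between the two modes, the helper below builds a verify request body and simply leaves out `context` when no source document exists. The helper names are hypothetical, and defaulting `domain` to `general` is an assumption; the field names come from the request and response descriptions above.

```python
from typing import Any, Dict, Optional


def build_verify_payload(
    request: str,
    output: str,
    context: Optional[str] = None,
    domain: str = "general",
) -> Dict[str, str]:
    """Build a /v1/verify body; omitting context requests observation mode."""
    payload = {"input": request, "output": output, "domain": domain}
    if context is not None:
        payload["context"] = context
    return payload


def ran_in_observation_mode(response: Dict[str, Any]) -> bool:
    """Check the context_mode field of a parsed verify response."""
    return response.get("context_mode") == "observation"
```

Checking `context_mode` in the response lets the agent confirm that source-grounded checks were skipped, rather than assuming so from the request alone.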
Audit Trail
Every shield and verify call is logged with an audit ID. The /v1/audit/<audit_id> endpoint retrieves the full record. Add ?include_session=true to see all linked attempts in a retry session.
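```bash
curl -s "https://api.meerkatplatform.com/v1/audit/<audit_id>" \
  -H "Authorization: Bearer $MEERKAT_API_KEY"
```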
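The audit lookup described above could be prepared from Python's standard library as sketched here. The function name `audit_request` and its parameters are illustrative; the URL shape, the `include_session` query parameter, and the bearer-token header follow the description in this section, and reading the key from `MEERKAT_API_KEY` assumes the environment variable from the setup steps.

```python
import os
import urllib.request
from typing import Optional
from urllib.parse import quote

API_BASE = "https://api.meerkatplatform.com"


def audit_request(
    audit_id: str,
    include_session: bool = False,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a GET request for one audit record."""
    key = api_key if api_key is not None else os.environ["MEERKAT_API_KEY"]
    # Percent-encode the ID so it cannot alter the path.
    url = f"{API_BASE}/v1/audit/{quote(audit_id, safe='')}"
    if include_session:
        url += "?include_session=true"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {key}"})
```

Passing the result to `urllib.request.urlopen` sends the request; with its default HTTPS settings, Python validates the server certificate and checks that it matches the hostname.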
Setup
1. Get a free API key at https://meerkatplatform.com (10,000 verifications/month, no credit card)
2. Set the environment variable: `MEERKAT_API_KEY=mk_live_your_key_here`
3. The developer controls which content is sent to Meerkat through their agent configuration. The agent calls the shield endpoint before processing untrusted external content, and the verify endpoint before executing high-impact actions.
Detection Capabilities
See https://meerkatplatform.com/docs for example payloads and response formats.
Ingress detects: prompt injection, indirect injection, data exfiltration attempts, jailbreak and role-hijacking patterns, credential harvesting, and social engineering.
Egress detects: hallucinated facts, numerical distortions (medication doses, financial figures, legal terms), fabricated entities and citations, confident confabulation, bias and implicit preference, and ungrounded numbers.
Usage Headers
Every API response includes usage headers:
- `X-Meerkat-Usage`: Current verification count
- `X-Meerkat-Limit`: Monthly limit (or "unlimited")
- `X-Meerkat-Remaining`: Verifications remaining
- `X-Meerkat-Warning`: Warning when approaching limit (80%+)
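An agent that wants to track its quota could read these headers along the lines of the sketch below. The header names are taken from the list above; the assumption that the counts arrive as plain integer strings, and the function name `parse_usage`, are illustrative and not confirmed by the source.

```python
from typing import Any, Dict, Mapping, Optional


def _to_int(value: Optional[str]) -> Optional[int]:
    """Return an int, or None when the value is missing or not numeric."""
    try:
        return int(value)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return None


def parse_usage(headers: Mapping[str, str]) -> Dict[str, Any]:
    """Extract the X-Meerkat-* usage headers from a response header mapping."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    limit_raw = lowered.get("x-meerkat-limit")
    return {
        "used": _to_int(lowered.get("x-meerkat-usage")),
        "limit": _to_int(limit_raw),
        "unlimited": (limit_raw or "").strip().lower() == "unlimited",
        "remaining": _to_int(lowered.get("x-meerkat-remaining")),
        "warning": lowered.get("x-meerkat-warning"),
    }
```

A non-empty `warning` value could be surfaced to the developer so that the monthly limit is not reached unexpectedly.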
Privacy
Meerkat processes content for security scanning only. Content is not stored beyond the audit trail retention period. Your API key is scoped to your organization. See https://meerkatplatform.com/privacy for details.