Content Moderation
Moderate user-generated content using Vettly's AI-powered content moderation API. This skill uses the @vettly/mcp MCP server to check text, images, and video against configurable moderation policies with auditable decisions.
Setup
Add the @vettly/mcp MCP server to your configuration:
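```json
{
  "mcpServers": {
    "vettly": {
      "command": "npx",
      "args": ["-y", "@vettly/mcp"],
      "env": {
        "VETTLY_API_KEY": "your-api-key"
      }
    }
  }
}
```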
Get an API key at vettly.dev.
Available Tools
moderate_content
Check text, image, or video content against a Vettly moderation policy. Returns a safety assessment with category scores, the action taken, provider used, latency, and cost.
Parameters:
- content (required) - The content to moderate (text string, or URL for images/video)
- policyId (required) - The policy ID to use for moderation
- contentType (optional, default: text) - Type of content: text, image, or video
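As a rough illustration of how these parameters fit together, the sketch below builds an MCP `tools/call` request for moderate_content. The JSON-RPC envelope follows the general MCP convention. The helper name `build_moderate_request` and the policy ID `community-forum` are hypothetical and not part of Vettly's documentation.

```python
import json

# Content types listed in the parameter description above.
ALLOWED_CONTENT_TYPES = {"text", "image", "video"}


def build_moderate_request(content, policy_id, content_type="text", request_id=1):
    """Build an MCP tools/call request for the moderate_content tool."""
    if not content:
        raise ValueError("content is required")
    if not policy_id:
        raise ValueError("policyId is required")
    if content_type not in ALLOWED_CONTENT_TYPES:
        raise ValueError(
            f"contentType must be one of {sorted(ALLOWED_CONTENT_TYPES)}"
        )
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "moderate_content",
            "arguments": {
                "content": content,
                "policyId": policy_id,
                "contentType": content_type,
            },
        },
    }


# "community-forum" is a made-up policy ID used only for illustration.
request = build_moderate_request("Great post, thanks!", "community-forum")
print(json.dumps(request, indent=2))
```

In practice an MCP client usually builds this envelope for you. The sketch is mainly useful for seeing which arguments are required and which fall back to a default.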
validate_policy
Validate a Vettly policy YAML without saving it. Returns validation results with any syntax or configuration errors. Use this to test policy changes before deploying them.
Parameters:
- yamlContent (required) - The YAML policy content to validate
list_policies
List all moderation policies available in your Vettly account. Takes no parameters. Use this to discover available policy IDs before moderating content.
get_usage_stats
Get usage statistics for your Vettly account including request counts, costs, and moderation outcomes.
Parameters:
- days (optional, default: 30) - Number of days to include in statistics (1-365)
get_recent_decisions
Get recent moderation decisions with optional filtering by outcome, content type, or policy.
Parameters:
- limit (optional, default: 10) - Number of decisions to return (1-50)
- flagged (optional) - Filter to only flagged content (true) or safe content (false)
- policyId (optional) - Filter by specific policy ID
- contentType (optional) - Filter by content type: text, image, or video
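The optional filters above can be assembled defensively before calling the tool. The following is a minimal sketch, assuming the documented ranges and names. The helper `build_recent_decisions_args` is hypothetical and only prepares the arguments; it does not contact Vettly.

```python
def build_recent_decisions_args(limit=10, flagged=None, policy_id=None,
                                content_type=None):
    """Build the arguments for get_recent_decisions, omitting unset filters."""
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    if content_type is not None and content_type not in ("text", "image", "video"):
        raise ValueError("contentType must be text, image, or video")

    args = {"limit": limit}
    # Compare against None so that flagged=False (safe content only) is kept.
    if flagged is not None:
        args["flagged"] = flagged
    if policy_id is not None:
        args["policyId"] = policy_id
    if content_type is not None:
        args["contentType"] = content_type
    return args


print(build_recent_decisions_args(limit=20, flagged=True, content_type="image"))
```

Checking `flagged is not None` rather than truthiness matters here: `false` is a meaningful filter value, not the absence of a filter.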
When to Use
- Moderate user-generated content (comments, posts, uploads) before publishing
- Test and validate moderation policy YAML configs during development
- Audit recent moderation decisions to review flagged content
- Monitor moderation costs and usage across your account
- Compare moderation results across different policies
Examples
Moderate a user comment
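```
Moderate this user comment against my community forum policy:
I hate this product, it's the worst thing I've ever used, the developers should be ashamed
```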
Call list_policies to find available policies, then moderate_content with the appropriate policy ID and return the safety assessment.
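The two-step flow described above could look roughly like the sketch below. It assumes a generic `call_tool(name, arguments)` function supplied by whatever MCP client you use. The response shapes (a list of policies with `id` and `name` keys, a result containing `action`) are assumptions made for illustration and are not taken from Vettly's documentation.

```python
def moderate_with_named_policy(call_tool, content, policy_name):
    """Look up a policy by name via list_policies, then moderate the content."""
    policies = call_tool("list_policies", {})
    # ASSUMPTION: each policy is a dict with "id" and "name" keys.
    matches = [p for p in policies if p["name"] == policy_name]
    if not matches:
        raise LookupError(f"no policy named {policy_name!r}")
    return call_tool(
        "moderate_content",
        {
            "content": content,
            "policyId": matches[0]["id"],
            "contentType": "text",
        },
    )


def fake_call_tool(name, arguments):
    """Stand-in for a real MCP client; returns canned, made-up data."""
    if name == "list_policies":
        return [{"id": "pol_123", "name": "community-forum"}]
    if name == "moderate_content":
        return {"policyId": arguments["policyId"], "action": "allow"}
    raise ValueError(f"unknown tool: {name}")


result = moderate_with_named_policy(
    fake_call_tool, "I hate this product", "community-forum"
)
print(result)
```

Passing `call_tool` in as a parameter keeps the lookup logic independent of any particular MCP client library and makes it easy to exercise with a fake, as done here.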
Validate a policy before deploying
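```
Validate this moderation policy YAML:
categories:
  - name: toxicity
    threshold: 0.8
    action: flag
  - name: spam
    threshold: 0.6
    action: block
```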
Call validate_policy and report any syntax or configuration errors.
Review recent flagged content
```
Show me all content that was flagged in the last week
```
Call get_recent_decisions with flagged: true to retrieve recent moderation decisions that were flagged.
Tips
- Always call list_policies first if you don't know which policy ID to use
- Use validate_policy to test policy changes before deploying to production
- Use get_usage_stats to monitor costs and catch unexpected spikes
- Filter get_recent_decisions by contentType or policyId to narrow results
- For image and video moderation, pass the content URL rather than raw data
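One way to act on the cost-monitoring tip is a simple spike check on per-day costs. The sketch below assumes you have already turned the get_usage_stats output into a mapping of date to cost. That mapping, the sample figures, and the `find_cost_spikes` helper are all hypothetical; the actual response format is not described here.

```python
from statistics import median


def find_cost_spikes(daily_costs, factor=3.0):
    """Return (day, cost) pairs whose cost exceeds factor times the median."""
    if not daily_costs:
        return []
    baseline = median(daily_costs.values())
    return [
        (day, cost)
        for day, cost in sorted(daily_costs.items())
        if cost > factor * baseline
    ]


# Made-up figures for illustration only.
sample = {
    "2024-01-01": 1.20,
    "2024-01-02": 1.10,
    "2024-01-03": 6.50,
    "2024-01-04": 1.30,
}
print(find_cost_spikes(sample))
```

The median is used as the baseline because a single expensive day shifts it far less than it would shift the mean.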
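Content Moderation
Moderate user-generated content using Vettly's AI-powered content moderation API. This skill uses the @vettly/mcp MCP server to check text, images, and video against configurable moderation policies and produces auditable decision records.
Setup
Add the @vettly/mcp MCP server to your configuration: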
```json
{
  "mcpServers": {
    "vettly": {
      "command": "npx",
      "args": ["-y", "@vettly/mcp"],
      "env": {
        "VETTLY_API_KEY": "your-api-key"
      }
    }
  }
}
```
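Get an API key at vettly.dev.
Available Tools
moderate_content
Check text, image, or video content against a Vettly moderation policy. Returns a safety assessment with category scores, the action taken, the provider used, latency, and cost.
Parameters:
- content (required) - The content to moderate (text string, or URL for images/video)
- policyId (required) - The policy ID to use for moderation
- contentType (optional, default: text) - Type of content: text, image, or video
validate_policy
Validate a Vettly policy YAML without saving it. Returns validation results with any syntax or configuration errors. Use this tool to test policy changes before deploying them.
Parameters:
- yamlContent (required) - The YAML policy content to validate
list_policies
List all moderation policies available in your Vettly account. Takes no parameters. Use this tool to discover available policy IDs before moderating content.
get_usage_stats
Get usage statistics for your Vettly account, including request counts, costs, and moderation outcomes.
Parameters:
- days (optional, default: 30) - Number of days to include in statistics (1-365)
get_recent_decisions
Get recent moderation decisions, with optional filtering by outcome, content type, or policy.
Parameters:
- limit (optional, default: 10) - Number of decisions to return (1-50)
- flagged (optional) - Filter to only flagged content (true) or safe content (false)
- policyId (optional) - Filter by specific policy ID
- contentType (optional) - Filter by content type: text, image, or video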
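When to Use
- Moderate user-generated content (comments, posts, uploads) before publishing
- Test and validate moderation policy YAML configs during development
- Audit recent moderation decisions to review flagged content
- Monitor moderation costs and usage across your account
- Compare moderation results across different policies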
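Examples
Moderate a user comment
```
Moderate this user comment against my community forum policy:
I hate this product, it's the worst thing I've ever used, the developers should be ashamed
```
Call list_policies to find available policies, then call moderate_content with the appropriate policy ID and return the safety assessment.
Validate a policy before deploying
```
Validate this moderation policy YAML:
categories:
  - name: toxicity
    threshold: 0.8
    action: flag
  - name: spam
    threshold: 0.6
    action: block
```
Call validate_policy and report any syntax or configuration errors.
Review recent flagged content
```
Show me all content that was flagged in the last week
```
Call get_recent_decisions with flagged: true to retrieve recent moderation decisions that were flagged.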
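Tips
- Always call list_policies first if you are not sure which policy ID to use
- Use validate_policy to test policy changes before deploying to production
- Use get_usage_stats to monitor costs and catch unexpected spikes
- Filter get_recent_decisions by contentType or policyId to narrow results
- For image and video moderation, pass the content URL rather than raw data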