HubSpot CRM Database Audit
Run a full diagnostic audit of a HubSpot CRM portal. This skill collects metrics across eight dimensions, grades each one, and produces a prioritized report with actionable recommendations.
Setup
- 1. Get the API token. Check
.env for HUBSPOT_API_TOKEN. If it is not set, ask the user to provide their HubSpot private app API token and store it in .env:
CODEBLOCK0
- 2. Install dependencies. Use
uv (not pip):
CODEBLOCK1
- 3. Create the output directory if it does not exist:
CODEBLOCK2
Audit Dimensions
Run queries for each of the following eight dimensions. Collect exact counts for every metric listed.
1. Database Size
- - Total contacts
- Total companies
- Total deals
- Marketing contacts vs non-marketing contacts (if Marketing Hub is active)
2. Email Deliverability
- - Hard bounced contacts (
hs_email_hard_bounce_reason_enum is not empty) - Soft bounced contacts (
hs_email_bounce > 0 AND no hard bounce) - Global unsubscribes (
hs_is_unworked or hs_email_optout = true) - Never-emailed contacts (no
hs_email_last_send_date) - Invalid email format (regex check on
email property) - Contacts with 3+ bounces
3. Data Completeness
- - Missing INLINECODE10
- Missing
company (contact-level) - Missing
industry (contact-level) - Missing
country and/or INLINECODE14 - Missing INLINECODE15
- Missing INLINECODE16
- Missing INLINECODE17
- Companies missing INLINECODE18
- Companies missing INLINECODE19
- Companies missing
city / state / INLINECODE22
4. Engagement Health
- - Last activity distribution: active in last 30 days, 31-90 days, 91-180 days, 181-365 days, 365+ days, never engaged
- Email open rate (last 90 days)
- Email click rate (last 90 days)
- Contacts with zero page views
- Contacts with zero form submissions
5. Duplicate Analysis
- - Duplicate email addresses (exact match)
- Companies sharing the same INLINECODE23
- Companies with very similar names (fuzzy — note: API cannot do fuzzy matching natively; count exact duplicates on
name and flag for manual review)
6. Owner Health
- - Deactivated owners who still have assigned contacts
- Deactivated owners who still have assigned companies
- Deactivated owners who still have assigned deals
- Contacts with no owner
- Companies with no owner
7. List & Workflow Health
- - Total active lists vs static lists
- Lists with zero members
- Workflows currently active
- Workflows that have not enrolled anyone in 90+ days
- Forms with zero submissions
- Forms with submissions in last 30 days
8. Deal Pipeline Health
- - Deals without INLINECODE25
- Deals without INLINECODE26
- Deals in each pipeline stage
- Stale deals (no activity in 60+ days, still open)
- Average deal age by stage
API Technical Notes
These details are critical for getting accurate results:
- - Null checks: Use the
NOT_HAS_PROPERTY filter operator to find contacts where a property has never been set. HubSpot stores "never happened" as null (property absent), not as 0 or empty string.
CODEBLOCK3
- - Search API pagination limit: The Search API returns a maximum of 10,000 results per query. If you expect more than 10K, segment queries by another property (e.g.,
createdate ranges, lifecycle stage, or first letter of email) and sum the results.
- - Deactivated owners: The Owners API does not return deactivated owners by default. Pass
archived=True:
CODEBLOCK4
- - Rate limiting: Private apps are limited to 100 requests per 10 seconds. Add a small delay between batch calls or use exponential backoff on 429 responses.
- - Engagement timestamps: Use
hs_last_sales_activity_timestamp and notes_last_contacted for activity dating. hs_email_last_open_date and hs_email_last_click_date are useful for email engagement specifically.
- - Marketing contact status: The property
hs_marketable_status indicates whether a contact is set as a marketing contact. This property is read-only via API.
Script Structure
Write a single Python script (scripts/audit_portal.py) that:
- 1. Loads the API token from INLINECODE36
- Initializes the HubSpot client:
from hubspot import HubSpot
api_client = HubSpot(access_token=os.getenv("HUBSPOT_API_TOKEN"))
- 3. Runs each dimension's queries sequentially (respect rate limits)
- Collects all results into a structured dict
- Computes letter grades per dimension (see grading rubric below)
- Renders the markdown report
- Saves to INLINECODE37
Grading Rubric
Assign a letter grade to each dimension based on severity:
| Grade | Meaning | Criteria |
|---|
| A | Healthy | < 5% of records affected |
| B |
Minor issues | 5-15% of records affected |
| C | Needs attention | 15-30% of records affected |
| D | Significant problems | 30-50% of records affected |
| F | Critical | > 50% of records affected |
For dimensions without a simple percentage (e.g., Owner Health), use judgment based on the number of affected records and business impact.
Output Format
Save the report to reports/hubspot-audit-{YYYY-MM-DD}.md with this structure:
CODEBLOCK6
Skill Prescription
After generating the audit report, prescribe a specific ordered list of skills the user should run. Do not just present findings — tell the user exactly what to do next.
Step 1: Map Findings to Skills
For each audit finding that scored C or worse, map it to the appropriate skill. Use this category-ordered lookup:
Database Hygiene (run first — billing and deliverability impact):
| Finding | Skill | Priority |
|---|
| Contacts missing email | INLINECODE39 | P0 |
| Hard bounced contacts |
/suppress-hard-bounced | P0 |
| Global unsubscribes |
/suppress-global-unsubscribes | P0 |
| Ghost/never-engaged contacts |
/suppress-ghost-contacts | P1 |
| Duplicate companies |
/merge-duplicate-companies | P1 |
| Deactivated owners with contacts |
/reassign-deactivated-owners | P1 |
Data Enrichment (run second — data quality):
| Finding | Skill | Priority |
|---|
| Missing company name | INLINECODE45 | P1 |
| Missing industry |
/enrich-industry | P1 |
| Inconsistent geo data |
/standardize-geo-values | P2 |
| Missing geo data |
/backfill-geo-data | P2 |
| Missing/wrong lifecycle stage |
/fix-lifecycle-stages | P1 |
| Unowned marketing contacts |
/assign-unowned-contacts | P1 |
Segmentation & Scoring (run third — targeting):
| Finding | Skill | Priority |
|---|
| No ICP classification | INLINECODE51 | P2 |
| No lead scoring |
/build-lead-scoring | P2 |
| No segment lists |
/build-smart-lists | P2 |
Automation Workflows (run fourth — prevention):
| Finding | Skill | Priority |
|---|
| No new-contact hygiene | INLINECODE54 | P2 |
| High disengagement rate |
/engagement-suppression-workflow | P2 |
| No lifecycle automation |
/lifecycle-progression-workflow | P3 |
| No bounce monitoring |
/bounce-monitoring-workflow | P2 |
Ongoing Maintenance (run last — sustainability):
| Finding | Skill | Priority |
|---|
| Unused lists | INLINECODE58 | P3 |
| Unused forms |
/cleanup-forms | P3 |
| Stale workflows |
/cleanup-workflows | P3 |
| Dashboard clutter |
/cleanup-dashboards | P3 |
| Deal pipeline issues |
/cleanup-deals | P3 |
| Unused properties |
/cleanup-properties | P3 |
Step 2: Present the Ordered Prescription
After the audit report, present a numbered action list — not just findings. Format like this:
CODEBLOCK7
Step 3: Handle Missing Skills
If the audit reveals a problem that no existing skill covers, do the following:
- 1. Tell the user clearly: "This audit found an issue that isn't covered by any existing skill: [description]."
- Offer to create it on the spot: "I can create a new skill for this right now. It would be called
/[suggested-name] and would handle [brief description]." - Ask about contributing upstream: "Would you like to contribute this new skill back to the community? If yes, I'll:
- Create the skill in
skills/[name]/SKILL.md
- Fork the repo (if not already forked)
- Push the new skill to your fork
- Open a pull request to INLINECODE66
This helps everyone who uses these skills in the future."
- 4. If the user agrees, create the skill following the standard SKILL.md format, commit it, and open the PR.
- If the user declines the upstream contribution, still create the skill locally so they can use it.
Step 4: Suggest Next Step
End with:
CODEBLOCK8
After Running
- - Print the file path of the saved report
- Present the ordered skill prescription (Step 2 above)
- Highlight the top 3 most critical findings
- Flag any findings that have no matching skill (Step 3 above)
- Suggest running
/hubspot-implementation-plan for the full phased plan
HubSpot CRM 数据库审计
对 HubSpot CRM 门户进行全面诊断审计。该技能从八个维度收集指标,对每个维度进行评分,并生成一份带有可操作建议的优先级报告。
设置
- 1. 获取 API 令牌。 检查 .env 文件中是否有 HUBSPOTAPITOKEN。如果未设置,请要求用户提供其 HubSpot 私有应用 API 令牌并将其存储在 .env 中:
HUBSPOTAPITOKEN=pat-na1-xxxxxxxx
- 2. 安装依赖项。 使用 uv(而非 pip):
bash
uv pip install hubspot-api-client python-dotenv
- 3. 创建输出目录(如果不存在):
bash
mkdir -p reports
审计维度
对以下八个维度分别运行查询。收集所列每个指标的精确计数。
1. 数据库规模
- - 总联系人数量
- 总公司数量
- 总交易数量
- 营销联系人与非营销联系人(如果营销中心已激活)
2. 邮件送达率
- - 硬退回联系人(hsemailhardbouncereasonenum 不为空)
- 软退回联系人(hsemailbounce > 0 且无硬退回)
- 全局退订(hsisunworked 或 hsemailoptout = true)
- 从未发送邮件的联系人(无 hsemaillastsend_date)
- 无效邮箱格式(对 email 属性进行正则检查)
- 退回次数超过 3 次的联系人
3. 数据完整性
- - 缺少 email
- 缺少 company(联系人级别)
- 缺少 industry(联系人级别)
- 缺少 country 和/或 state
- 缺少 lifecyclestage
- 缺少 hubspotownerid
- 缺少 jobtitle
- 公司缺少 domain
- 公司缺少 industry
- 公司缺少 city / state / country
4. 互动健康度
- - 最后活动分布:过去 30 天内活跃、31-90 天、91-180 天、181-365 天、365 天以上、从未互动
- 邮件打开率(过去 90 天)
- 邮件点击率(过去 90 天)
- 零页面浏览量的联系人
- 零表单提交的联系人
5. 重复数据分析
- - 重复的邮箱地址(精确匹配)
- 共享相同 domain 的公司
- 名称高度相似的公司(模糊匹配——注意:API 原生不支持模糊匹配;对 name 进行精确重复计数并标记以供人工审核)
6. 负责人健康度
- - 已停用但仍分配有联系人的负责人
- 已停用但仍分配有公司的负责人
- 已停用但仍分配有交易的负责人
- 无负责人的联系人
- 无负责人的公司
7. 列表与工作流健康度
- - 活跃列表与静态列表总数
- 零成员列表
- 当前活跃的工作流
- 超过 90 天未注册任何人的工作流
- 零提交的表单
- 过去 30 天内有提交的表单
8. 交易管道健康度
- - 缺少 amount 的交易
- 缺少 closedate 的交易
- 各管道阶段的交易
- 停滞交易(超过 60 天无活动且仍处于开启状态)
- 按阶段划分的平均交易年龄
API 技术说明
以下细节对于获取准确结果至关重要:
- - 空值检查:使用 NOTHASPROPERTY 筛选运算符查找从未设置过属性的联系人。HubSpot 将从未发生存储为空值(属性不存在),而非 0 或空字符串。
python
{
filterGroups: [{
filters: [{
propertyName: hs
emaillast
senddate,
operator: NOT
HASPROPERTY
}]
}]
}
- - 搜索 API 分页限制:搜索 API 每次查询最多返回 10,000 条结果。如果预期超过 10K,请按其他属性(例如 createdate 范围、生命周期阶段或邮箱首字母)分段查询并汇总结果。
- - 已停用负责人:默认情况下,负责人 API 不返回已停用的负责人。请传递 archived=True:
python
api
client.crm.owners.ownersapi.get_page(archived=True)
- - 速率限制:私有应用每 10 秒限制 100 个请求。在批量调用之间添加少量延迟,或在收到 429 响应时使用指数退避。
- - 互动时间戳:使用 hslastsalesactivitytimestamp 和 noteslastcontacted 进行活动日期标记。hsemaillastopendate 和 hsemaillastclickdate 特别适用于邮件互动。
- - 营销联系人状态:属性 hsmarketablestatus 指示联系人是否被设置为营销联系人。此属性通过 API 为只读。
脚本结构
编写一个 Python 脚本(scripts/audit_portal.py),该脚本:
- 1. 从 .env 加载 API 令牌
- 初始化 HubSpot 客户端:
python
from hubspot import HubSpot
api
client = HubSpot(accesstoken=os.getenv(HUBSPOT
APITOKEN))
- 3. 依次运行每个维度的查询(遵守速率限制)
- 将所有结果收集到结构化字典中
- 计算每个维度的字母等级(参见下面的评分标准)
- 渲染 Markdown 报告
- 保存到 reports/hubspot-audit-{YYYY-MM-DD}.md
评分标准
根据严重程度为每个维度分配字母等级:
轻微问题 | 受影响记录 5-15% |
| C | 需要注意 | 受影响记录 15-30% |
| D | 严重问题 | 受影响记录 30-50% |
| F | 危急 | 受影响记录 > 50% |
对于没有简单百分比的维度(例如负责人健康度),根据受影响记录的数量和业务影响进行判断。
输出格式
将报告保存到 reports/hubspot-audit-{YYYY-MM-DD}.md,结构如下:
markdown
HubSpot CRM 审计报告
日期: YYYY-MM-DD
门户 ID: [portal-id]
执行摘要
| 维度 | 等级 | 关键发现 |
|---|
| 数据库规模 | B | ~XX,000 联系人,XX,000 公司 |
| 邮件送达率 |
D | XX% 硬退回,XX% 全局退订 |
| 数据完整性 | F | XX% 缺少邮箱,XX% 缺少行业 |
| 互动健康度 | D | XX% 从未互动,XX% 超过 12 个月未活跃 |
| 重复数据分析 | C | ~X,XXX 重复公司域名 |
| 负责人健康度 | F | X 个已停用负责人分配了 XX,XXX 个联系人 |
| 列表与工作流健康度 | B | XX 个未使用列表,X 个停滞工作流 |
| 交易管道健康度 | C | XX% 交易缺少金额,X 个停滞交易 |
总体等级:X
优先建议
- 1. [危急] 删除无邮箱联系人 — XX,XXX 个无邮箱地址的联系人
是不可计费的无效数据。运行 /delete-no-email-contacts。
工作量:1 小时 | 完全可脚本化
- 2. [危急] 抑制硬退回联系人 — XX,XXX 个硬退回正在破坏
发件人声誉。运行 /suppress-hard-bounced。
工作量:1 小时 | 混合(API + 工作流)
- 3. [高] 重新分配已停用负责人的联系人 — XX,XXX 个联系人分配给
X 个已停用用户。运行 /reassign-deactivated-owners。
工作量:2 小时 | 完全可脚本化
- 4. ...按影响继续排序...
详细发现
1. 数据库规模
| 指标 | 计数 | 占总数的百分比 |
|---|
| 总联系人 | XX,XXX | — |
| 总公司 |
XX,XXX | — |
| 总交易 | X,XXX | — |
| 营销联系人 | XX,XXX | XX% |
2. 邮件送达率
XX%