When to Use
Use this skill when the user needs to analyze, explain, or visualize data from SQL, spreadsheets, notebooks, dashboards, exports, or ad hoc tables.
Use it for KPI debugging, experiment readouts, funnel or cohort analysis, anomaly reviews, executive reporting, and quality checks on metrics or query logic.
Prefer this skill over generic coding or spreadsheet help when the hard part is analytical judgment: metric definition, comparison design, interpretation, or recommendation.
Typical triggers: the user asks about analyzing data, finding patterns, understanding metrics, testing hypotheses, cohort analysis, A/B testing, churn analysis, or statistical significance.
Core Principle
Analysis without a decision is just arithmetic. Always clarify: What would change if this analysis shows X vs Y?
Methodology First
Before touching data:
- What decision is this analysis supporting?
- What would change your mind? (the real question)
- What data do you actually have vs what you wish you had?
- What timeframe is relevant?
Statistical Rigor Checklist
- [ ] Sample size sufficient? (small N = wide confidence intervals)
- [ ] Comparison groups fair? (same time period, similar conditions)
- [ ] Multiple comparisons? (at α = 0.05, 20 tests yield about 1 "significant" result by chance alone)
- [ ] Effect size meaningful? (statistically significant != practically important)
- [ ] Uncertainty quantified? ("12-18% lift" not just "15% lift")
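Two of the checklist items can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: it uses a normal-approximation (Wald) interval for the difference between two conversion rates and a Bonferroni adjustment for multiple comparisons. The function names and the sample counts are hypothetical, and the expected-false-positive figure assumes every null hypothesis is true.

```python
import math
from statistics import NormalDist


def lift_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald interval for the absolute difference in conversion rate (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


def expected_false_positives(num_tests, alpha=0.05):
    """Expected count of 'significant' results when no real effect exists."""
    return num_tests * alpha


def bonferroni_alpha(num_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / num_tests


# Hypothetical experiment: 100/1000 conversions in A, 130/1000 in B.
low, high = lift_confidence_interval(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"absolute lift between {low:.3f} and {high:.3f}")
print(f"expected false positives across 20 tests: {expected_false_positives(20):.1f}")
print(f"Bonferroni per-test alpha for 20 tests: {bonferroni_alpha(20):.4f}")
```

Reporting the interval rather than the 3-point difference alone is what the "uncertainty quantified" item asks for. For small samples or rates near 0 or 1, a different interval method is likely a better choice than the Wald approximation used here.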
Architecture
This skill does not require local folders, persistent memory, or setup state.
Use the included reference files as lightweight guides:
- `metric-contracts.md` for KPI definitions and caveats
- `chart-selection.md` for visual choice and chart anti-patterns
- `decision-briefs.md` for stakeholder-facing outputs
- `pitfalls.md` and `techniques.md` for analytical rigor and method choice
Quick Reference
Load only the smallest relevant file to keep context focused.
| Topic | File |
|---|---|
| Metric definition contracts | `metric-contracts.md` |
| Visual selection and chart anti-patterns | `chart-selection.md` |
| Decision-ready output formats | `decision-briefs.md` |
| Failure modes to catch early | `pitfalls.md` |
| Method selection by question type | `techniques.md` |
Core Rules
1. Start from the decision, not the dataset
- Identify the decision owner, the question that could change a decision, and the deadline before doing analysis.
- If no decision would change, reframe the request before computing anything.
2. Lock the metric contract before calculating
- Define entity, grain, numerator, denominator, time window, timezone, filters, exclusions, and source of truth.
- If any of those are ambiguous, state the ambiguity explicitly before presenting results.
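One way to make the contract explicit is to write it down as a small record and list whichever fields are still undefined. This is a minimal sketch, not a format prescribed by `metric-contracts.md`: the `MetricContract` class, its `ambiguities` method, and the example KPI are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MetricContract:
    """One KPI definition; None marks a field nobody has pinned down yet."""
    name: str
    entity: Optional[str] = None
    grain: Optional[str] = None
    numerator: Optional[str] = None
    denominator: Optional[str] = None
    time_window: Optional[str] = None
    timezone: Optional[str] = None
    filters: Optional[str] = None
    exclusions: Optional[str] = None
    source_of_truth: Optional[str] = None

    def ambiguities(self):
        """Return the names of contract fields that are still undefined."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Hypothetical KPI with three parts of the contract left open.
contract = MetricContract(
    name="weekly_conversion_rate",
    entity="user",
    grain="week",
    numerator="users with at least one order",
    denominator="users with at least one session",
    time_window="ISO week",
    source_of_truth="orders table",  # hypothetical source
)
print(contract.ambiguities())  # ['timezone', 'filters', 'exclusions']
```

Anything returned by `ambiguities()` is what should be stated in the write-up before results are presented.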
3. Separate extraction, transformation, and interpretation
- Keep query logic, cleanup assumptions, and analytical conclusions distinguishable.
- Never hide business assumptions inside SQL, formulas, or notebook code without naming them in the write-up.
4. Choose visuals to answer a question
- Select charts based on the analytical question: trend, comparison, distribution, relationship, composition, funnel, or cohort retention.
- Do not add charts that make the deck look fuller but do not change the decision.
5. Brief every result in decision format
- Every output should include the answer, evidence, confidence, caveats, and recommended next action.
- If the output is going to a stakeholder, translate the method into business implications instead of leading with technical detail.
6. Stress-test claims before recommending action
- Segment by obvious confounders, compare the right baseline, quantify uncertainty, and check sensitivity to exclusions or time windows.
- Strong-looking numbers without robustness checks are not decision-ready.
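A sensitivity check on exclusions can be as simple as recomputing the metric with each exclusion rule applied on its own and comparing against the baseline. The sketch below assumes rows are plain dictionaries with a boolean `converted` field; the helper names, the `internal` flag, and the data are hypothetical.

```python
def rate(rows):
    """Share of rows marked converted; NaN when there are no rows."""
    if not rows:
        return float("nan")
    return sum(1 for r in rows if r["converted"]) / len(rows)


def exclusion_sensitivity(rows, exclusions):
    """Recompute the rate with each exclusion applied separately.

    exclusions maps a label to a predicate; rows where the predicate
    returns True are dropped. Returns {label: (rate, rows_kept)}.
    """
    results = {"baseline": (rate(rows), len(rows))}
    for label, should_exclude in exclusions.items():
        kept = [r for r in rows if not should_exclude(r)]
        results[label] = (rate(kept), len(kept))
    return results


# Hypothetical data: 4 internal test accounts that all converted,
# plus 6 external users of whom 1 converted.
rows = (
    [{"converted": True, "internal": True}] * 4
    + [{"converted": True, "internal": False}]
    + [{"converted": False, "internal": False}] * 5
)
results = exclusion_sensitivity(rows, {"drop internal": lambda r: r["internal"]})
for label, (value, kept) in results.items():
    print(f"{label}: {value:.1%} on {kept} rows")
```

In this made-up case the rate falls from 50% to about 17% once internal accounts are removed, which is the kind of swing that should be reported before any recommendation is made.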
7. Escalate when the data cannot support the claim
- Block or downgrade conclusions when sample size is weak, the source is unreliable, definitions drifted, or confounding is unresolved.
- It is better to say "unknown yet" than to produce false confidence.
Common Traps
- Reusing a KPI name after changing numerator, denominator, or exclusions -> trend comparisons become invalid.
- Comparing daily, weekly, and monthly grains in one chart -> movement looks real but is mostly aggregation noise.
- Showing percentages without underlying counts -> leadership overreacts to tiny denominators.
- Using a pretty chart instead of the right chart -> the output looks polished but hides the actual decision signal.
- Hunting for interesting cuts after seeing the result -> narrative follows chance instead of evidence.
- Shipping automated reports without metric owners or caveats -> bad numbers spread faster than they can be corrected.
- Treating observational patterns as causal proof -> action plans get built on correlation alone.
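The tiny-denominator trap is easy to illustrate: the same percentage carries very different uncertainty depending on the count behind it. The sketch below uses the Wilson score interval for a binomial proportion; the `describe_rate` helper and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, total, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


def describe_rate(successes, total):
    """Format a rate together with its underlying counts and 95% interval."""
    low, high = wilson_interval(successes, total)
    return f"{successes}/{total} = {successes / total:.0%} (95% CI {low:.0%} to {high:.0%})"


# Same 30% headline, very different evidence behind it.
print(describe_rate(3, 10))
print(describe_rate(300, 1000))
```

Printing the counts and the interval next to the percentage lets a reader see that the first figure could plausibly sit anywhere in a wide range, while the second is tightly pinned down.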
Approach Selection
| Question type | Approach | Key output |
|---|---|---|
| "Is X different from Y?" | Hypothesis test | p-value + effect size + CI |
| "What predicts Z?" | Regression/correlation | Coefficients + R² + residual check |
| "How do users behave over time?" | Cohort analysis | Retention curves by cohort |
| "Are these groups different?" | Segmentation | Profiles + statistical comparison |
| "What's unusual?" | Anomaly detection | Flagged points + context |
For technique details and when to use each, see techniques.md.
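As an illustration of the cohort analysis row, retention curves can be computed from a flat list of activity records. This is a sketch under stated assumptions, not the method described in `techniques.md`: it assumes each record is a `(user_id, signup_period, active_period)` tuple with integer periods, and that every user has at least one record. The function name and the data are hypothetical.

```python
from collections import defaultdict


def retention_curves(events):
    """Build retention curves by signup cohort.

    events: iterable of (user_id, signup_period, active_period).
    Returns {signup_period: {periods_since_signup: share_of_cohort_active}}.
    """
    cohort_users = defaultdict(set)   # cohort -> all users in it
    active_users = defaultdict(set)   # (cohort, offset) -> users active then
    for user_id, signup, active in events:
        cohort_users[signup].add(user_id)
        offset = active - signup
        if offset >= 0:  # ignore activity recorded before signup
            active_users[(signup, offset)].add(user_id)

    curves = {}
    for (cohort, offset), users in sorted(active_users.items()):
        curves.setdefault(cohort, {})[offset] = len(users) / len(cohort_users[cohort])
    return curves


# Hypothetical activity log for two signup cohorts (periods 0 and 1).
events = [
    ("a", 0, 0), ("a", 0, 1), ("b", 0, 0),
    ("c", 1, 1), ("c", 1, 2), ("d", 1, 1),
]
curves = retention_curves(events)
print(curves)  # {0: {0: 1.0, 1: 0.5}, 1: {0: 1.0, 1: 0.5}}
```

Each inner dictionary is one cohort's curve. Reporting cohort sizes alongside the shares keeps small cohorts from being over-read.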
Output Standards
- Lead with the insight, not the methodology
- Quantify uncertainty - ranges, not point estimates
- State limitations - what this analysis can't tell you
- Recommend next steps - what would strengthen the conclusion
Red Flags to Escalate
- User wants to "prove" a predetermined conclusion
- Sample size too small for reliable inference
- Data quality issues that invalidate analysis
- Confounders that can't be controlled for
External Endpoints
This skill makes no external network requests.
| Endpoint | Data Sent | Purpose |
|---|---|---|
| None | None | N/A |
No data is sent externally.
Security & Privacy
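Data that leaves your machine: none. This skill makes no external network requests.
Data that stays local: everything you provide. The skill itself keeps no persistent memory or setup state.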
This skill does NOT:
- Access undeclared external endpoints.
- Store credentials or raw exports in hidden local memory files.
- Create or depend on local folder systems for persistence.
- Create automations or background jobs without explicit user confirmation.
- Rewrite its own instruction source files.
Related Skills
Install with `clawhub install <slug>` if user confirms:
- `sql` - query design and review for reliable data extraction.
- `csv` - cleanup and normalization for tabular inputs before analysis.
- `dashboard` - implementation patterns for KPI visualization layers.
- `report` - structured stakeholder-facing deliverables after analysis.
- `business-intelligence` - KPI systems and operating cadence beyond one-off analysis.
Feedback
- If useful: `clawhub star data-analysis`
- Stay updated: `clawhub sync`