Data Analysis

When to Use

Use this skill when the user needs to analyze, explain, or visualize data from SQL, spreadsheets, notebooks, dashboards, exports, or ad hoc tables.

Use it for KPI debugging, experiment readouts, funnel or cohort analysis, anomaly reviews, executive reporting, and quality checks on metrics or query logic.

Prefer this skill over generic coding or spreadsheet help when the hard part is analytical judgment: metric definition, comparison design, interpretation, or recommendation.

Typical requests involve analyzing data, finding patterns, understanding metrics, testing hypotheses, cohort analysis, A/B testing, churn analysis, or statistical significance.

Core Principle

Analysis without a decision is just arithmetic. Always clarify: What would change if this analysis shows X vs Y?

Methodology First

Before touching data:

1. What decision is this analysis supporting?
2. What would change your mind? (the real question)
3. What data do you actually have vs. what you wish you had?
4. What timeframe is relevant?

Statistical Rigor Checklist

- [ ] Sample size sufficient? (small N = wide confidence intervals)
- [ ] Comparison groups fair? (same time period, similar conditions)
- [ ] Multiple comparisons? (at α = 0.05, 20 independent tests with no real effect yield about 1 "significant" result by chance)
- [ ] Effect size meaningful? (statistically significant != practically important)
- [ ] Uncertainty quantified? ("12-18% lift", not just "15% lift")
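
A minimal Python sketch of the arithmetic behind the multiple-comparisons and sample-size items, assuming independent tests at α = 0.05 and a normal-approximation (Wald) interval for a single proportion. The function names are illustrative, not part of this skill.

```python
import math

def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected count of 'significant' results when no test has a real effect."""
    return n_tests * alpha

def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test alpha that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

def wald_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% interval for a proportion p from n units."""
    return z * math.sqrt(p * (1 - p) / n)

print(expected_false_positives(20))         # about 1 false positive
print(round(familywise_error_rate(20), 3))  # 0.642
print(bonferroni_threshold(20))             # roughly 0.0025 per test
for n in (100, 1_000, 10_000):
    # Same 10% rate, different N: the interval shrinks with 1/sqrt(n).
    print(n, f"10% +/- {wald_half_width(0.10, n):.1%}")
```

Two consequences worth stating in a write-up: with 20 uncorrected tests there is roughly a 64% chance of at least one false "win", and a 100x larger sample only makes the interval about 10x narrower. Bonferroni is conservative; Holm or false-discovery-rate procedures are common alternatives.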

Architecture

This skill does not require local folders, persistent memory, or setup state.

Use the included reference files as lightweight guides:

- metric-contracts.md for KPI definitions and caveats
- chart-selection.md for visual choice and chart anti-patterns
- decision-briefs.md for stakeholder-facing outputs
- pitfalls.md and techniques.md for analytical rigor and method choice

Quick Reference

Load only the smallest relevant file to keep context focused.

Topic	File
Metric definition contracts	metric-contracts.md
Visual selection and chart anti-patterns	chart-selection.md

Core Rules

1. Start from the decision, not the dataset

- Identify the decision owner, the question that could change a decision, and the deadline before doing analysis.
- If no decision would change, reframe the request before computing anything.

2. Lock the metric contract before calculating

- Define entity, grain, numerator, denominator, time window, timezone, filters, exclusions, and source of truth.
- If any of those are ambiguous, state the ambiguity explicitly before presenting results.
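
One way to make the contract concrete is a small record whose unset fields are reported as open ambiguities. This is a hypothetical Python sketch: the class name, field names, and example metric are illustrative, and `None` is assumed to mean "not yet agreed" while an empty list means "explicitly none".

```python
from dataclasses import dataclass, fields
from typing import List, Optional

@dataclass
class MetricContract:
    """Hypothetical record of the definitions Rule 2 asks you to lock."""
    name: str
    entity: Optional[str] = None          # what is counted, e.g. user vs account
    grain: Optional[str] = None           # e.g. daily, weekly
    numerator: Optional[str] = None
    denominator: Optional[str] = None
    time_window: Optional[str] = None
    timezone: Optional[str] = None
    filters: Optional[List[str]] = None   # [] means "explicitly no filters"
    exclusions: Optional[List[str]] = None
    source_of_truth: Optional[str] = None

    def ambiguities(self) -> List[str]:
        """Names of fields still undefined; state these before presenting results."""
        return [f.name for f in fields(self)
                if f.name != "name" and getattr(self, f.name) is None]

contract = MetricContract(
    name="weekly_active_rate",  # illustrative metric, not from this skill
    entity="user",
    grain="weekly",
    numerator="users with at least one session in the week",
    denominator="users registered before the week started",
    time_window="calendar week",
    filters=[],
    exclusions=["internal test accounts"],
)
print(contract.ambiguities())  # ['timezone', 'source_of_truth']
```

Here the report would have to say up front that the timezone and the source of truth are not yet agreed.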

3. Separate extraction, transformation, and interpretation

- Keep query logic, cleanup assumptions, and analytical conclusions distinguishable.
- Never hide business assumptions inside SQL, formulas, or notebook code without naming them in the write-up.
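
A minimal Python sketch of keeping the three stages apart, using hypothetical in-memory rows in place of a real query. Each cleanup step in `transform` records the assumption it applies, so `interpret` can state those assumptions next to the number instead of leaving them buried in code. Function names, columns, and values are all illustrative.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]

def extract() -> List[Row]:
    """Stand-in for a SQL query or export: raw rows, no business logic."""
    return [
        {"user_id": 1, "is_test": False, "revenue": 120.0},
        {"user_id": 2, "is_test": True, "revenue": 999.0},
        {"user_id": 3, "is_test": False, "revenue": None},
    ]

def transform(rows: List[Row], drop_test_accounts: bool = True,
              missing_revenue_as_zero: bool = True) -> Tuple[List[Row], List[str]]:
    """Apply cleanup and return the cleaned rows plus the assumptions used."""
    assumptions: List[str] = []
    if drop_test_accounts:
        rows = [r for r in rows if not r["is_test"]]
        assumptions.append("test accounts excluded")
    if missing_revenue_as_zero:
        rows = [{**r, "revenue": 0.0 if r["revenue"] is None else r["revenue"]}
                for r in rows]
        assumptions.append("missing revenue treated as 0")
    else:
        rows = [r for r in rows if r["revenue"] is not None]
        assumptions.append("users with missing revenue dropped")
    return rows, assumptions

def interpret(rows: List[Row], assumptions: List[str]) -> str:
    """Compute the headline number and state the assumptions behind it."""
    avg = sum(r["revenue"] for r in rows) / len(rows)
    return f"Average revenue per user: {avg:.2f} (assumptions: {'; '.join(assumptions)})"

print(interpret(*transform(extract())))
print(interpret(*transform(extract(), missing_revenue_as_zero=False)))
```

The two runs report 60.00 and 120.00 from the same raw rows; the only difference is one cleanup assumption, which is why it belongs in the write-up rather than only in the code.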

4. Choose visuals to answer a question

- Select charts based on the analytical question: trend, comparison, distribution, relationship, composition, funnel, or cohort retention.
- Do not add charts that make the deck look fuller but do not change the decision.

5. Brief every result in decision format

- Every output should include the answer, evidence, confidence, caveats, and recommended next action.
- If the output is going to a stakeholder, translate the method into business implications instead of leading with technical detail.
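
A hypothetical Python template for the five-part format, which refuses to render if any part is empty. The class name and placeholder text are illustrative; fill the placeholders with the real findings.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DecisionBrief:
    """Hypothetical template for the five parts every output should carry."""
    answer: str
    evidence: List[str]
    confidence: str        # e.g. "high", "medium", "low"
    caveats: List[str]
    next_action: str

    def render(self) -> str:
        # Refuse to produce a brief with any section missing or empty.
        missing = [name for name, value in vars(self).items() if not value]
        if missing:
            raise ValueError(f"brief is missing: {', '.join(missing)}")
        lines = [f"Answer: {self.answer}", "Evidence:"]
        lines += [f"  - {item}" for item in self.evidence]
        lines.append(f"Confidence: {self.confidence}")
        lines.append("Caveats:")
        lines += [f"  - {item}" for item in self.caveats]
        lines.append(f"Next action: {self.next_action}")
        return "\n".join(lines)

brief = DecisionBrief(
    answer="<one-sentence answer to the decision question>",
    evidence=["<key number with its counts and interval>"],
    confidence="medium",
    caveats=["<main limitation>"],
    next_action="<what to do, or what data would settle it>",
)
print(brief.render())
```

Leading with `Answer:` keeps the insight first; the method belongs under evidence or caveats, not at the top.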

6. Stress-test claims before recommending action

- Segment by obvious confounders, compare the right baseline, quantify uncertainty, and check sensitivity to exclusions or time windows.
- Strong-looking numbers without robustness checks are not decision-ready.
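
A small Python illustration of why segmenting by an obvious confounder matters. The counts are hypothetical and chosen so the two groups have very different device mixes, as can happen in a before/after or other non-randomized comparison; the variant converts better within each segment yet looks far worse pooled (Simpson's paradox).

```python
from typing import Dict, Tuple

Counts = Dict[str, Tuple[int, int]]  # segment -> (conversions, visitors)

def rate(conversions: int, visitors: int) -> float:
    return conversions / visitors

def pooled_rate(counts: Counts) -> float:
    """Conversion rate after summing counts across all segments."""
    conversions = sum(c for c, _ in counts.values())
    visitors = sum(v for _, v in counts.values())
    return conversions / visitors

# Hypothetical counts: control is mostly desktop, variant is mostly mobile.
control: Counts = {"desktop": (336, 1600), "mobile": (10, 400)}
variant: Counts = {"desktop": (90, 400), "mobile": (48, 1600)}

print(f"pooled: control {pooled_rate(control):.1%}, variant {pooled_rate(variant):.1%}")
for segment in control:
    print(f"{segment}: control {rate(*control[segment]):.1%}, "
          f"variant {rate(*variant[segment]):.1%}")
```

Pooled, control converts at 17.3% versus 6.9% for the variant; within desktop (21.0% vs 22.5%) and mobile (2.5% vs 3.0%) the variant is ahead. The headline gap comes from the traffic mix, not the treatment.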

7. Escalate when the data cannot support the claim

- Block or downgrade conclusions when sample size is weak, the source is unreliable, definitions drifted, or confounding is unresolved.
- It is better to say "unknown yet" than to produce false confidence.

Common Traps

- Reusing a KPI name after changing numerator, denominator, or exclusions -> trend comparisons become invalid.
- Comparing daily, weekly, and monthly grains in one chart -> movement looks real but is mostly aggregation noise.
- Showing percentages without underlying counts -> leadership overreacts to tiny denominators.
- Using a pretty chart instead of the right chart -> the output looks polished but hides the actual decision signal.
- Hunting for interesting cuts after seeing the result -> narrative follows chance instead of evidence.
- Shipping automated reports without metric owners or caveats -> bad numbers spread faster than they can be corrected.
- Treating observational patterns as causal proof -> action plans get built on correlation alone.
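
A small Python helper sketch for the counts-versus-percentages trap: it always prints the counts next to the rate and flags small denominators. The function name and the 30-unit threshold are assumptions; set the threshold from the precision the decision actually needs.

```python
def format_rate(numerator: int, denominator: int, min_denominator: int = 30) -> str:
    """Show a rate with its underlying counts and flag tiny denominators.

    min_denominator is an illustrative default, not a statistical rule.
    """
    if denominator == 0:
        return f"{numerator}/0 (undefined)"
    text = f"{numerator}/{denominator} ({numerator / denominator:.1%})"
    if denominator < min_denominator:
        text += " [small denominator]"
    return text

print(format_rate(3, 4))       # 3/4 (75.0%) [small denominator]
print(format_rate(750, 1000))  # 750/1000 (75.0%)
```

Both rows would read "75%" on a dashboard; only the counts reveal that one of them rests on four units.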

Approach Selection

Question type	Approach	Key output
"Is X different from Y?"	Hypothesis test	p-value + effect size + CI
"What predicts Z?"

For technique details and when to use each, see techniques.md.
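
For the first row, a minimal stdlib-only Python sketch of a two-sided two-proportion z-test that returns all three key outputs. It assumes binary outcomes, independent units, and counts large enough for the normal approximation (with small counts, an exact test such as Fisher's is safer); the function name and example counts are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_test(x_a: int, n_a: int, x_b: int, n_b: int, alpha: float = 0.05):
    """Two-sided z-test for p_b - p_a using the normal approximation.

    Returns the p-value, the absolute difference (effect size), the relative
    lift over group A, and a (1 - alpha) Wald confidence interval for the difference.
    """
    p_a, p_b = x_a / n_a, x_b / n_b
    diff = p_b - p_a
    # Pooled standard error under H0 (no difference) for the test statistic.
    p_pool = (x_a + x_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the confidence interval.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "p_value": p_value,
        "abs_diff": diff,
        "relative_lift": diff / p_a if p_a else float("nan"),
        "ci": (diff - z_crit * se, diff + z_crit * se),
    }

result = two_proportion_test(200, 2000, 250, 2000)  # hypothetical control vs variant
low, high = result["ci"]
print(f"p = {result['p_value']:.4f}, diff = {result['abs_diff']:.1%}, "
      f"lift = {result['relative_lift']:.0%}, 95% CI = [{low:.1%}, {high:.1%}]")
```

With these counts the difference is 2.5 percentage points (a 25% relative lift), p is about 0.012, and the 95% interval runs from roughly 0.5 to 4.5 points. Reporting the interval alongside the p-value makes it clear the true lift could be small.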

Output Standards

1. Lead with the insight, not the methodology
2. Quantify uncertainty - ranges, not point estimates
3. State limitations - what this analysis can't tell you
4. Recommend next steps - what would strengthen the conclusion

Red Flags to Escalate

- User wants to "prove" a predetermined conclusion
- Sample size too small for reliable inference
- Data quality issues that invalidate analysis
- Confounders that can't be controlled for

External Endpoints

This skill makes no external network requests.

Endpoint	Data Sent	Purpose
None	None	N/A

No data is sent externally.

Security & Privacy

Data that leaves your machine:

- Nothing by default.

Data that stays local:

- Nothing by default.

This skill does NOT:

- Access undeclared external endpoints.
- Store credentials or raw exports in hidden local memory files.
- Create or depend on local folder systems for persistence.
- Create automations or background jobs without explicit user confirmation.
- Rewrite its own instruction source files.

Related Skills

Install with clawhub install <slug> if the user confirms:

- sql - query design and review for reliable data extraction.
- INLINECODE13 - cleanup and normalization for tabular inputs before analysis.
- INLINECODE14 - implementation patterns for KPI visualization layers.
- INLINECODE15 - structured stakeholder-facing deliverables after analysis.
- INLINECODE16 - KPI systems and operating cadence beyond one-off analysis.

Feedback

- If useful: clawhub star data-analysis
- Stay updated: clawhub sync

