Data Analysis

When to Use

Use this skill when the user needs to analyze, explain, or visualize data from SQL, spreadsheets, notebooks, dashboards, exports, or ad hoc tables.

Use it for KPI debugging, experiment readouts, funnel or cohort analysis, anomaly reviews, executive reporting, and quality checks on metrics or query logic.

Prefer this skill over generic coding or spreadsheet help when the hard part is analytical judgment: metric definition, comparison design, interpretation, or recommendation.

Typical triggers: the user asks about analyzing data, finding patterns, understanding metrics, testing hypotheses, cohort analysis, A/B testing, churn analysis, or statistical significance.

Core Principle

Analysis without a decision is just arithmetic. Always clarify: What would change if this analysis shows X vs Y?

Methodology First

Before touching data:

1. What decision is this analysis supporting?
2. What would change your mind? (the real question)
3. What data do you actually have vs what you wish you had?
4. What timeframe is relevant?

Statistical Rigor Checklist

- [ ] Sample size sufficient? (small N = wide confidence intervals)
- [ ] Comparison groups fair? (same time period, similar conditions)
- [ ] Multiple comparisons? (20 tests at p < 0.05 -> about 1 "significant" result by chance alone)
- [ ] Effect size meaningful? (statistically significant != practically important)
- [ ] Uncertainty quantified? ("12-18% lift" not just "15% lift")
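The multiple-comparisons item is easy to check numerically. The Python sketch below assumes m independent tests in which every null hypothesis is true, and uses the Bonferroni correction only because it is the simplest option; it is one correction among several, not a recommendation from this skill.

```python
"""Multiple comparisons: how many 'significant' results appear by chance.

Assumes m independent tests and that every null hypothesis is true.
"""


def expected_false_positives(m: int, alpha: float = 0.05) -> float:
    """Expected number of tests that cross the alpha threshold purely by chance."""
    return m * alpha


def familywise_error_rate(m: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(m: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / m


if __name__ == "__main__":
    m = 20
    print(f"Expected false positives: {expected_false_positives(m):.2f}")     # 1.00
    print(f"P(at least one false positive): {familywise_error_rate(m):.2f}")  # 0.64
    print(f"Bonferroni per-test alpha: {bonferroni_alpha(m):.4f}")            # 0.0025
```

With 20 tests, the chance of at least one spurious hit is roughly 64%, which is why a single significant cut among many deserves suspicion rather than a headline.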

Architecture

This skill does not require local folders, persistent memory, or setup state.

Use the included reference files as lightweight guides:

- metric-contracts.md for KPI definitions and caveats
- chart-selection.md for visual choice and chart anti-patterns
- decision-briefs.md for stakeholder-facing outputs
- pitfalls.md and techniques.md for analytical rigor and method choice

Quick Reference

Load only the smallest relevant file to keep context focused.

Topic	File
Metric definition contracts	metric-contracts.md
Visual selection and chart anti-patterns

Core Rules

1. Start from the decision, not the dataset

- Identify the decision owner, the question that could change a decision, and the deadline before doing analysis.
- If no decision would change, reframe the request before computing anything.

2. Lock the metric contract before calculating

- Define entity, grain, numerator, denominator, time window, timezone, filters, exclusions, and source of truth.
- If any of those are ambiguous, state the ambiguity explicitly before presenting results.
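One way to make the contract explicit is to store it as data and refuse to report until every field is pinned down. The `MetricContract` class and the example metric below are hypothetical Python illustrations, not part of this skill's reference files. An unset field (`None`) means "not yet decided", while an explicit string such as `"none"` records a deliberate choice.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MetricContract:
    """Hypothetical record of the metric-contract fields listed in Rule 2."""

    name: str
    entity: Optional[str] = None
    grain: Optional[str] = None
    numerator: Optional[str] = None
    denominator: Optional[str] = None
    time_window: Optional[str] = None
    timezone: Optional[str] = None
    filters: Optional[str] = None
    exclusions: Optional[str] = None
    source_of_truth: Optional[str] = None

    def ambiguities(self) -> List[str]:
        """Names of fields still unset; these must be stated before presenting results."""
        return [f.name for f in fields(self)
                if f.name != "name" and getattr(self, f.name) is None]


if __name__ == "__main__":
    contract = MetricContract(
        name="weekly_active_rate",  # hypothetical metric
        entity="user",
        grain="week",
        numerator="users with at least one session in the week",
        denominator="users registered before the week started",
        time_window="last 12 complete weeks",
        filters="none",  # deliberately no filters, not "unknown"
    )
    print(contract.ambiguities())  # ['timezone', 'exclusions', 'source_of_truth']
```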

3. Separate extraction, transformation, and interpretation

- Keep query logic, cleanup assumptions, and analytical conclusions distinguishable.
- Never hide business assumptions inside SQL, formulas, or notebook code without naming them in the write-up.
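A minimal Python sketch of keeping the three stages apart, assuming a stubbed extraction step with made-up rows. Each cleanup rule in `transform` returns a plain-language assumption alongside the data, so the write-up can list it instead of burying it in code. The function names and order fields are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def extract() -> List[Row]:
    """Extraction: rows exactly as the source returns them (hypothetical stub data)."""
    return [
        {"order_id": 1, "amount": 120.0, "is_test": False},
        {"order_id": 2, "amount": 0.0, "is_test": False},
        {"order_id": 3, "amount": 80.0, "is_test": True},
    ]


def transform(rows: List[Row]) -> Tuple[List[Row], List[str]]:
    """Transformation: every cleanup rule is paired with a named assumption."""
    assumptions: List[str] = []
    rows = [r for r in rows if not r["is_test"]]
    assumptions.append("Test orders (is_test=True) are excluded.")
    rows = [r for r in rows if r["amount"] > 0]
    assumptions.append("Zero-amount orders are treated as cancellations and excluded.")
    return rows, assumptions


def interpret(rows: List[Row], assumptions: List[str]) -> str:
    """Interpretation: the conclusion, followed by the assumptions it rests on."""
    total = sum(r["amount"] for r in rows)
    lines = [f"Revenue: {total:.2f} (orders kept: {len(rows)})", "Assumptions:"]
    lines += [f"- {a}" for a in assumptions]
    return "\n".join(lines)


if __name__ == "__main__":
    clean_rows, notes = transform(extract())
    print(interpret(clean_rows, notes))
```

Returning the assumptions as data, rather than only as code comments, is the design choice that lets them flow straight into the stakeholder write-up.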

4. Choose visuals to answer a question

- Select charts based on the analytical question: trend, comparison, distribution, relationship, composition, funnel, or cohort retention.
- Do not add charts that make the deck look fuller but do not change the decision.
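To make the question-first rule mechanical, a lookup from question type to a default chart can sit in front of any plotting code. The question types come from the rule above; the chart for each is one common convention assumed for illustration (chart-selection.md is this skill's own guide and may differ). `pick_chart` is a hypothetical helper that refuses to choose when no question type is named.

```python
# Default chart per analytical question type. The keys come from Rule 4;
# the chart choices are an assumed convention, not taken from chart-selection.md.
CHART_FOR_QUESTION = {
    "trend": "line chart over time",
    "comparison": "bar chart sorted by value",
    "distribution": "histogram or box plot",
    "relationship": "scatter plot",
    "composition": "stacked bar chart",
    "funnel": "funnel or step bar chart",
    "cohort retention": "cohort heatmap (cohort by period)",
}


def pick_chart(question_type: str) -> str:
    """Return a default chart for a question type, or refuse if none is named."""
    key = question_type.strip().lower()
    if key not in CHART_FOR_QUESTION:
        raise ValueError(
            f"Unknown question type {question_type!r}; "
            "name the question the chart answers before choosing a visual."
        )
    return CHART_FOR_QUESTION[key]


if __name__ == "__main__":
    print(pick_chart("Trend"))  # line chart over time
```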

5. Brief every result in decision format

- Every output should include the answer, evidence, confidence, caveats, and recommended next action.
- If the output is going to a stakeholder, translate the method into business implications instead of leading with technical detail.
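The decision format can be enforced with a small template. The `DecisionBrief` dataclass below is a hypothetical Python sketch whose fields mirror the five elements listed above. It leads with the answer and prints explicit placeholders when caveats or a next action are missing, so gaps stay visible instead of silently disappearing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DecisionBrief:
    """Hypothetical template: answer, evidence, confidence, caveats, next action."""

    answer: str
    evidence: List[str]
    confidence: str  # e.g. "high", "medium", or "low"
    caveats: List[str] = field(default_factory=list)
    next_action: str = ""

    def render(self) -> str:
        """Plain-text brief that leads with the answer, not the method."""
        parts = [f"Answer: {self.answer}", f"Confidence: {self.confidence}", "Evidence:"]
        parts += [f"- {e}" for e in self.evidence]
        parts.append("Caveats:")
        # An empty caveat list is shown explicitly rather than silently omitted.
        parts += [f"- {c}" for c in self.caveats] or ["- none stated"]
        parts.append(f"Next action: {self.next_action or 'none proposed'}")
        return "\n".join(parts)


if __name__ == "__main__":
    brief = DecisionBrief(
        answer="<one-sentence answer>",            # placeholder text
        evidence=["<key number with its range>"],  # placeholder text
        confidence="medium",
    )
    print(brief.render())
```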

6. Stress-test claims before recommending action

- Segment by obvious confounders, compare the right baseline, quantify uncertainty, and check sensitivity to exclusions or time windows.
- Strong-looking numbers without robustness checks are not decision-ready.
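The sensitivity part of this rule can be automated: recompute the same metric under each plausible exclusion rule and report the spread rather than a single number. The Python sketch below uses made-up order values and a mean as the metric. `sensitivity`, `mean_value`, and the variant names are hypothetical, and each variant must keep at least one row because `statistics.mean` raises on empty input.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


def mean_value(rows: List[Row]) -> float:
    """Example metric: average order value (assumes a 'value' field)."""
    return mean(r["value"] for r in rows)


def sensitivity(
    rows: List[Row],
    metric: Callable[[List[Row]], float],
    variants: Dict[str, Callable[[Row], bool]],
) -> Tuple[Dict[str, float], Tuple[float, float]]:
    """Recompute `metric` under each row filter; return per-variant values and (min, max)."""
    results = {name: metric([r for r in rows if keep(r)]) for name, keep in variants.items()}
    return results, (min(results.values()), max(results.values()))


if __name__ == "__main__":
    orders = [{"value": v} for v in (20, 25, 30, 35, 5000)]  # made-up data, one outlier
    variants = {
        "all rows": lambda r: True,
        "exclude values above 1000": lambda r: r["value"] <= 1000,
    }
    results, (low, high) = sensitivity(orders, mean_value, variants)
    for name, value in results.items():
        print(f"{name}: {value:.1f}")
    print(f"range: {low:.1f} to {high:.1f}")  # range: 27.5 to 1022.0
```

A spread that wide means one row drives the headline, so the claim is not decision-ready until that row is explained.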

7. Escalate when the data cannot support the claim

- Block or downgrade conclusions when sample size is weak, the source is unreliable, definitions drifted, or confounding is unresolved.
- It is better to say "unknown yet" than to produce false confidence.
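Whether a sample is too weak can be checked before looking at results. The Python sketch below uses the common normal-approximation formula for the per-group sample size of a two-sided, two-proportion comparison. The baseline rate, target rate, alpha, and power are assumptions the analyst must supply, and `underpowered_verdict` is a hypothetical helper that answers "unknown yet" instead of a conclusion when the groups are too small.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate per-group n to detect p1 vs p2 with a two-sided two-proportion z-test."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


def underpowered_verdict(n_per_group: int, p1: float, p2: float) -> str:
    """Downgrade to 'unknown yet' when each group is smaller than the required n."""
    needed = required_n_per_group(p1, p2)
    if n_per_group < needed:
        return f"unknown yet: {n_per_group} per group, about {needed} needed"
    return "sample size adequate for this effect"


if __name__ == "__main__":
    # Assumed inputs: 10% baseline conversion, hoping to detect a lift to 12%.
    print(required_n_per_group(0.10, 0.12))  # 3839
    print(underpowered_verdict(1500, 0.10, 0.12))
```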

Common Traps

- Reusing a KPI name after changing numerator, denominator, or exclusions -> trend comparisons become invalid.
- Comparing daily, weekly, and monthly grains in one chart -> movement looks real but is mostly aggregation noise.
- Showing percentages without underlying counts -> leadership overreacts to tiny denominators.
- Using a pretty chart instead of the right chart -> the output looks polished but hides the actual decision signal.
- Hunting for interesting cuts after seeing the result -> narrative follows chance instead of evidence.
- Shipping automated reports without metric owners or caveats -> bad numbers spread faster than they can be corrected.
- Treating observational patterns as causal proof -> action plans get built on correlation alone.
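For the percentages-without-counts trap, a formatting helper that never emits a rate without its numerator and denominator is a cheap guard. `pct_with_counts` below is a hypothetical Python helper, not part of this skill.

```python
def pct_with_counts(numerator: int, denominator: int, digits: int = 1) -> str:
    """Format a rate together with the counts behind it, e.g. '75.0% (3/4)'."""
    if denominator == 0:
        return f"n/a ({numerator}/0)"
    rate = 100 * numerator / denominator
    return f"{rate:.{digits}f}% ({numerator:,}/{denominator:,})"


if __name__ == "__main__":
    # The same rate reads very differently once the denominator is visible.
    print(pct_with_counts(3, 4))         # 75.0% (3/4)
    print(pct_with_counts(7500, 10000))  # 75.0% (7,500/10,000)
```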

Approach Selection

Question type	Approach	Key output
"Is X different from Y?"	Hypothesis test	p-value + effect size + CI
"What predicts Z?"

For technique details and when to use each, see techniques.md.
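For the "Is X different from Y?" row, comparing two conversion rates is a common case. The Python sketch below assumes two independent groups with binary outcomes and samples large enough for the normal approximation. It returns the three outputs the table names: a two-sided p-value from a pooled two-proportion z-test, the absolute difference as the effect size, and a Wald confidence interval for that difference. The function name and example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def compare_proportions(x_a: int, n_a: int, x_b: int, n_b: int, alpha: float = 0.05) -> dict:
    """Two-proportion comparison of group B against group A."""
    p_a, p_b = x_a / n_a, x_b / n_b
    diff = p_b - p_a  # effect size: absolute difference in rates

    # Test statistic uses the pooled rate, the estimate under H0 (p_a == p_b).
    pooled = (x_a + x_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Confidence interval uses the unpooled standard error (Wald interval).
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {"diff": diff, "ci": (diff - z_crit * se, diff + z_crit * se), "p_value": p_value}


if __name__ == "__main__":
    # Hypothetical counts: 100/1000 conversions in A, 130/1000 in B.
    result = compare_proportions(100, 1000, 130, 1000)
    low, high = result["ci"]
    print(f"diff = {result['diff']:.3f}, 95% CI {low:.3f} to {high:.3f}, "
          f"p = {result['p_value']:.3f}")
```

Reporting the interval next to the p-value matters here: a CI running from roughly 0.2 to 5.8 points is "significant" yet still leaves the size of the effect quite uncertain.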

Output Standards

1. Lead with the insight, not the methodology
2. Quantify uncertainty - ranges, not point estimates
3. State limitations - what this analysis can't tell you
4. Recommend next steps - what would strengthen the conclusion
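To report a range instead of a bare rate, a Wilson score interval is a reasonable default for a single proportion, since it behaves better than the simple Wald interval at small n or at rates near 0 or 1. It is one choice among several, assumed here for illustration; the Python sketch uses only the standard library, and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion successes / n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Same observed rate (15%), very different certainty.
    for k, n in [(15, 100), (150, 1000)]:
        low, high = wilson_interval(k, n)
        print(f"{k}/{n}: {low:.1%} to {high:.1%}")
    # 15/100: 9.3% to 23.3%
    # 150/1000: 12.9% to 17.3%
```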

Red Flags to Escalate

- User wants to "prove" a predetermined conclusion
- Sample size too small for reliable inference
- Data quality issues that invalidate analysis
- Confounders that can't be controlled for

External Endpoints

This skill makes no external network requests.

Endpoint	Data Sent	Purpose
None	None	N/A

No data is sent externally.

Security & Privacy

Data that leaves your machine:

- Nothing by default.

Data that stays local:

- Nothing by default.

This skill does NOT:

- Access undeclared external endpoints.
- Store credentials or raw exports in hidden local memory files.
- Create or depend on local folder systems for persistence.
- Create automations or background jobs without explicit user confirmation.
- Rewrite its own instruction source files.

Related Skills

Install with clawhub install <slug> if the user confirms:

- sql - query design and review for reliable data extraction.
- INLINECODE13 - cleanup and normalization for tabular inputs before analysis.
- INLINECODE14 - implementation patterns for KPI visualization layers.
- INLINECODE15 - structured stakeholder-facing deliverables after analysis.
- INLINECODE16 - KPI systems and operating cadence beyond one-off analysis.

Feedback

- If useful: clawhub star data-analysis
- Stay updated: clawhub sync
