When to Use
Use this skill when the user needs to analyze, explain, or visualize data from SQL, spreadsheets, notebooks, dashboards, exports, or ad hoc tables.
Use it for KPI debugging, experiment readouts, funnel or cohort analysis, anomaly reviews, executive reporting, and quality checks on metrics or query logic.
Prefer this skill over generic coding or spreadsheet help when the hard part is analytical judgment: metric definition, comparison design, interpretation, or recommendation.
Typical triggers: the user asks about analyzing data, finding patterns, understanding metrics, testing hypotheses, cohort analysis, A/B testing, churn analysis, or statistical significance.
Core Principle
Analysis without a decision is just arithmetic. Always clarify: What would change if this analysis shows X vs Y?
Methodology First
Before touching data:
1. What decision is this analysis supporting?
2. What would change your mind? (the real question)
3. What data do you actually have vs what you wish you had?
4. What timeframe is relevant?
Statistical Rigor Checklist
- [ ] Sample size sufficient? (small N = wide confidence intervals)
- [ ] Comparison groups fair? (same time period, similar conditions)
- [ ] Multiple comparisons? (at a 0.05 threshold, 20 independent tests yield about 1 "significant" result by chance alone)
- [ ] Effect size meaningful? (statistically significant != practically important)
- [ ] Uncertainty quantified? ("12-18% lift" not just "15% lift")
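As a rough illustration of the sample-size, multiple-comparison, and uncertainty items above, the sketch below computes a normal-approximation (Wald) interval for a difference in conversion rates and the chance of at least one false positive across independent tests. The function names and all counts are hypothetical, and the Wald interval is only one of several reasonable choices.

```python
from math import sqrt
from statistics import NormalDist


def diff_in_proportions_ci(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald (normal-approximation) interval for p_b - p_a."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


def family_wise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests


# Hypothetical counts: identical observed rates (10% vs 15%), two sample sizes.
small = diff_in_proportions_ci(10, 100, 15, 100)
large = diff_in_proportions_ci(1000, 10000, 1500, 10000)
print(f"n=100 per arm:    {small[0]:+.3f} to {small[1]:+.3f}")
print(f"n=10,000 per arm: {large[0]:+.3f} to {large[1]:+.3f}")
print(f"P(>=1 false positive in 20 tests): {family_wise_error_rate(20):.2f}")
```

With the small sample the interval spans zero even though the observed lift is the same, which is the point of reporting a range rather than a single number.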
Architecture
This skill does not require local folders, persistent memory, or setup state.
Use the included reference files as lightweight guides:
- metric-contracts.md for KPI definitions and caveats
- chart-selection.md for visual choice and chart anti-patterns
- decision-briefs.md for stakeholder-facing outputs
- pitfalls.md and techniques.md for analytical rigor and method choice
Quick Reference
Load only the smallest relevant file to keep context focused.
| Topic | File |
|---|---|
| Metric definition contracts | metric-contracts.md |
| Visual selection and chart anti-patterns | chart-selection.md |
| Decision-ready output formats | decision-briefs.md |
| Failure modes to catch early | pitfalls.md |
| Method selection by question type | techniques.md |
Core Rules
1. Start from the decision, not the dataset
- Identify the decision owner, the question that could change a decision, and the deadline before doing analysis.
- If no decision would change, reframe the request before computing anything.
2. Lock the metric contract before calculating
- Define entity, grain, numerator, denominator, time window, timezone, filters, exclusions, and source of truth.
- If any of those are ambiguous, state the ambiguity explicitly before presenting results.
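One way to make the contract checkable is to hold it in a small record and list whatever is still undefined. This is a minimal sketch, not the format used in metric-contracts.md; the class name, the `ambiguities` helper, and the example values are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MetricContract:
    """One field per item the rule above says to define."""
    entity: Optional[str] = None
    grain: Optional[str] = None
    numerator: Optional[str] = None
    denominator: Optional[str] = None
    time_window: Optional[str] = None
    timezone: Optional[str] = None
    filters: Optional[str] = None
    exclusions: Optional[str] = None
    source_of_truth: Optional[str] = None

    def ambiguities(self):
        """Names of fields that are still undefined or blank."""
        return [
            f.name for f in fields(self)
            if not (getattr(self, f.name) or "").strip()
        ]


# Hypothetical, partially specified contract.
contract = MetricContract(
    entity="user",
    grain="weekly",
    numerator="users with at least one purchase",
    denominator="users active in the week",
    time_window="last 12 full weeks",
)
missing = contract.ambiguities()
if missing:
    print("Undefined, state before presenting results:", ", ".join(missing))
```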
3. Separate extraction, transformation, and interpretation
- Keep query logic, cleanup assumptions, and analytical conclusions distinguishable.
- Never hide business assumptions inside SQL, formulas, or notebook code without naming them in the write-up.
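The separation can be sketched as three small functions, with each business assumption recorded next to the step that applies it so it reaches the write-up. The function names, the `Analysis` container, and the order rows are hypothetical stand-ins, not part of this skill.

```python
from dataclasses import dataclass, field


@dataclass
class Analysis:
    rows: list
    assumptions: list = field(default_factory=list)  # named, not buried in code


def extract():
    # Stand-in for query logic; hypothetical order rows.
    return [
        {"order_id": 1, "amount": 40.0, "is_test": False},
        {"order_id": 2, "amount": 0.0, "is_test": True},
        {"order_id": 3, "amount": 60.0, "is_test": False},
    ]


def transform(rows):
    # Cleanup step: the exclusion is applied and named in the same place.
    kept = [r for r in rows if not r["is_test"]]
    return Analysis(
        rows=kept,
        assumptions=["Internal test orders are excluded from revenue."],
    )


def interpret(analysis):
    # Conclusion step: the write-up repeats every assumption made upstream.
    avg = sum(r["amount"] for r in analysis.rows) / len(analysis.rows)
    lines = [f"Average order value: {avg:.2f} (n={len(analysis.rows)})"]
    lines += [f"Assumption: {a}" for a in analysis.assumptions]
    return "\n".join(lines)


report = interpret(transform(extract()))
print(report)
```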
4. Choose visuals to answer a question
- Select charts based on the analytical question: trend, comparison, distribution, relationship, composition, funnel, or cohort retention.
- Do not add charts that make the deck look fuller but do not change the decision.
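A lookup like the one below is one way to encode this rule. The chart defaults are assumed common conventions, not the contents of chart-selection.md, and `pick_chart` is a hypothetical helper.

```python
# Assumed defaults based on common practice; adjust to chart-selection.md.
DEFAULT_CHART = {
    "trend": "line chart",
    "comparison": "bar chart",
    "distribution": "histogram",
    "relationship": "scatter plot",
    "composition": "stacked bar chart",
    "funnel": "funnel chart",
    "cohort retention": "cohort heatmap",
}


def pick_chart(question_type, changes_decision):
    """Return a chart type, or None when the chart would not affect the decision."""
    if not changes_decision:
        return None
    key = question_type.strip().lower()
    if key not in DEFAULT_CHART:
        raise ValueError(f"Unknown question type: {question_type!r}")
    return DEFAULT_CHART[key]


print(pick_chart("Trend", changes_decision=True))
print(pick_chart("composition", changes_decision=False))
```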
5. Brief every result in decision format
- Every output should include the answer, evidence, confidence, caveats, and recommended next action.
- If the output is going to a stakeholder, translate the method into business implications instead of leading with technical detail.
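The five parts can be kept in a fixed order with a small template. This sketch is an assumption about layout, not the format defined in decision-briefs.md, and the example content is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DecisionBrief:
    answer: str
    evidence: str
    confidence: str
    caveats: str
    next_action: str

    def render(self):
        """Lead with the answer; method detail stays out of the first line."""
        return "\n".join([
            f"Answer: {self.answer}",
            f"Evidence: {self.evidence}",
            f"Confidence: {self.confidence}",
            f"Caveats: {self.caveats}",
            f"Recommended next action: {self.next_action}",
        ])


# Hypothetical readout.
brief = DecisionBrief(
    answer="The new checkout flow likely improves conversion.",
    evidence="Lift of 12-18% across two full weeks of traffic.",
    confidence="Moderate; one market only.",
    caveats="Holiday week overlaps the test window.",
    next_action="Extend the test to a second market before full rollout.",
)
rendered = brief.render()
print(rendered)
```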
6. Stress-test claims before recommending action
- Segment by obvious confounders, compare the right baseline, quantify uncertainty, and check sensitivity to exclusions or time windows.
- Strong-looking numbers without robustness checks are not decision-ready.
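A sensitivity check can be as simple as recomputing the headline metric under a few alternative exclusions and time windows and looking at the spread. The rows, segment names, and scenarios below are hypothetical.

```python
def conversion_rate(rows, exclude_segments=(), min_day=0):
    """Conversions / visitors after applying exclusions and a start day."""
    kept = [
        r for r in rows
        if r["segment"] not in exclude_segments and r["day"] >= min_day
    ]
    visitors = sum(r["visitors"] for r in kept)
    conversions = sum(r["conversions"] for r in kept)
    return conversions / visitors if visitors else float("nan")


# Hypothetical daily rows; paid traffic grows sharply on day 2.
rows = [
    {"day": 1, "segment": "organic", "visitors": 1000, "conversions": 50},
    {"day": 1, "segment": "paid", "visitors": 200, "conversions": 30},
    {"day": 2, "segment": "organic", "visitors": 1100, "conversions": 54},
    {"day": 2, "segment": "paid", "visitors": 900, "conversions": 140},
]

scenarios = {
    "baseline": {},
    "exclude paid": {"exclude_segments": ("paid",)},
    "day 2 only": {"min_day": 2},
}
results = {name: conversion_rate(rows, **kw) for name, kw in scenarios.items()}
for name, rate in results.items():
    print(f"{name:>12}: {rate:.1%}")

spread = max(results.values()) - min(results.values())
print(f"spread across scenarios: {spread:.1%}")
```

If the spread is large relative to the effect being claimed, the headline number depends on the cut and should be reported with that dependence stated.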
7. Escalate when the data cannot support the claim
- Block or downgrade conclusions when sample size is weak, the source is unreliable, definitions drifted, or confounding is unresolved.
- It is better to say "unknown yet" than to produce false confidence.
Common Traps
- Reusing a KPI name after changing numerator, denominator, or exclusions -> trend comparisons become invalid.
- Comparing daily, weekly, and monthly grains in one chart -> movement looks real but is mostly aggregation noise.
- Showing percentages without underlying counts -> leadership overreacts to tiny denominators.
- Using a pretty chart instead of the right chart -> the output looks polished but hides the actual decision signal.
- Hunting for interesting cuts after seeing the result -> narrative follows chance instead of evidence.
- Shipping automated reports without metric owners or caveats -> bad numbers spread faster than they can be corrected.
- Treating observational patterns as causal proof -> action plans get built on correlation alone.
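For the tiny-denominator trap in particular, printing the counts and an interval next to the percentage makes the weakness visible. The sketch below uses the Wilson score interval, which is one reasonable choice for small samples; `describe_rate` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def describe_rate(successes, n):
    """Percentage with its underlying counts and a 95% interval."""
    low, high = wilson_interval(successes, n)
    return f"{successes / n:.0%} ({successes}/{n}, 95% CI {low:.0%} to {high:.0%})"


# Same headline rate, very different evidence.
print(describe_rate(3, 4))
print(describe_rate(750, 1000))
```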
Approach Selection
| Question type | Approach | Key output |
|---|---|---|
| "Is X different from Y?" | Hypothesis test | p-value + effect size + CI |
| "What predicts Z?" | Regression/correlation | Coefficients + R² + residual check |
| "How do users behave over time?" | Cohort analysis | Retention curves by cohort |
| "Are these groups different?" | Segmentation | Profiles + statistical comparison |
| "What's unusual?" | Anomaly detection | Flagged points + context |
For technique details and when to use each, see techniques.md.
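As one worked instance of the table, the cohort row ("retention curves by cohort") could be computed roughly as follows. This is a sketch under stated assumptions: events arrive as hypothetical `(user_id, signup_week, active_week)` tuples, and every user has at least one event so cohort size can be read from the log itself.

```python
from collections import defaultdict


def retention_by_cohort(events):
    """Return {cohort_week: {weeks_since_signup: share_of_cohort_active}}."""
    cohort_users = defaultdict(set)
    active = defaultdict(lambda: defaultdict(set))
    for user_id, signup_week, active_week in events:
        cohort_users[signup_week].add(user_id)
        active[signup_week][active_week - signup_week].add(user_id)
    return {
        cohort: {
            offset: len(users) / len(cohort_users[cohort])
            for offset, users in sorted(active[cohort].items())
        }
        for cohort in sorted(cohort_users)
    }


# Hypothetical activity log.
events = [
    ("a", 1, 1), ("a", 1, 2), ("a", 1, 3),
    ("b", 1, 1), ("b", 1, 2),
    ("c", 2, 2), ("c", 2, 3),
    ("d", 2, 2),
]
curves = retention_by_cohort(events)
for cohort, curve in curves.items():
    row = ", ".join(f"week +{k}: {v:.0%}" for k, v in curve.items())
    print(f"cohort {cohort}: {row}")
```

Reporting the cohort sizes alongside these shares keeps the curves honest when a cohort is small.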
Output Standards
1. Lead with the insight, not the methodology
2. Quantify uncertainty - ranges, not point estimates
3. State limitations - what this analysis can't tell you
4. Recommend next steps - what would strengthen the conclusion
Red Flags to Escalate
- User wants to "prove" a predetermined conclusion
- Sample size too small for reliable inference
- Data quality issues that invalidate analysis
- Confounders that can't be controlled for
External Endpoints
This skill makes no external network requests.
| Endpoint | Data Sent | Purpose |
|---|---|---|
| None | None | N/A |
No data is sent externally.
Security & Privacy
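Data that leaves your machine: none; this skill makes no external network requests.
Data that stays local: everything you provide for analysis, since no data is sent externally.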
This skill does NOT:
- Access undeclared external endpoints.
- Store credentials or raw exports in hidden local memory files.
- Create or depend on local folder systems for persistence.
- Create automations or background jobs without explicit user confirmation.
- Rewrite its own instruction source files.
Related Skills
Install with clawhub install <slug> if user confirms:
- sql - query design and review for reliable data extraction.
- csv - cleanup and normalization for tabular inputs before analysis.
- dashboard - implementation patterns for KPI visualization layers.
- report - structured stakeholder-facing deliverables after analysis.
- business-intelligence - KPI systems and operating cadence beyond one-off analysis.
Feedback
- If useful: clawhub star data-analysis
- Stay updated: clawhub sync