data-analysis-litiao

When to Load

User asks about: analyzing data, finding patterns, understanding metrics, testing hypotheses, cohort analysis, A/B testing, churn analysis, statistical significance.

Core Principle

Analysis without a decision is just arithmetic. Always clarify: What would change if this analysis shows X vs Y?

Methodology First

Before touching data:

1. What decision is this analysis supporting?
2. What would change your mind? (the real question)
3. What data do you actually have vs. what you wish you had?
4. What timeframe is relevant?

Statistical Rigor Checklist

- [ ] Sample size sufficient? (small N = wide confidence intervals)
- [ ] Comparison groups fair? (same time period, similar conditions)
- [ ] Multiple comparisons? (at α = 0.05, 20 tests yield about 1 "significant" result by chance alone)
- [ ] Effect size meaningful? (statistically significant ≠ practically important)
- [ ] Uncertainty quantified? ("12-18% lift", not just "15% lift")
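
Two of the checklist items above can be made concrete with a few lines of arithmetic. The following is a minimal sketch using only the Python standard library; the function names are illustrative, the counts are hypothetical, and the Wilson score interval is one reasonable choice of interval, not one this document prescribes.

```python
import math
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


# Hypothetical data: the same observed 15% rate at two sample sizes.
small = proportion_ci(15, 100)
large = proportion_ci(1500, 10_000)
print(f"n=100:    {small[0]:.1%} to {small[1]:.1%}")
print(f"n=10000:  {large[0]:.1%} to {large[1]:.1%}")

# 20 tests at alpha = 0.05: one false positive expected on average.
print(f"expected false positives: {20 * 0.05:.1f}")
print(f"P(at least one false positive): {familywise_error_rate(0.05, 20):.2f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha(0.05, 20):.4f}")
```

The interval at n=100 is roughly ten times wider than at n=10,000, which is why the same point estimate can support a decision in one case and not the other. Bonferroni is the simplest correction and is conservative; other corrections may suit a given analysis better.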

Analytical Pitfalls to Catch

Pitfall	What it looks like	How to avoid
Simpson's Paradox	Trend reverses when you segment	Always check by key dimensions
Survivorship bias

For detailed examples of each pitfall, see pitfalls.md.
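
As a quick illustration of the Simpson's Paradox row in the table above, the sketch below uses made-up conversion counts (the segment names, variant labels, and numbers are all hypothetical) in which variant A wins inside every segment yet loses in the aggregate, because the two variants have very different segment mixes.

```python
# Hypothetical counts: (conversions, visitors) per segment and variant.
data = {
    "mobile":  {"A": (40, 50),  "B": (150, 200)},
    "desktop": {"A": (60, 200), "B": (10, 50)},
}


def overall_rates(data):
    """Pool counts across segments and return the conversion rate per variant."""
    totals = {}
    for segment in data.values():
        for variant, (conversions, visitors) in segment.items():
            c, v = totals.get(variant, (0, 0))
            totals[variant] = (c + conversions, v + visitors)
    return {variant: c / v for variant, (c, v) in totals.items()}


def segment_winners(data):
    """Return the variant with the highest conversion rate in each segment."""
    return {
        name: max(segment, key=lambda v: segment[v][0] / segment[v][1])
        for name, segment in data.items()
    }


print(overall_rates(data))    # pooled view: B looks better
print(segment_winners(data))  # segmented view: A wins in both segments
```

Here A converts 80% vs 75% on mobile and 30% vs 20% on desktop, but only 40% vs 64% overall, since most of A's traffic is low-converting desktop. Checking the key dimensions before reporting the pooled number is what catches this.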

Approach Selection

Question type	Approach	Key output
"Is X different from Y?"	Hypothesis test	p-value + effect size + CI
"What predicts Z?"

For technique details and when to use each, see techniques.md.
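
For the "Is X different from Y?" row, one possible way to produce all three key outputs (p-value, effect size, CI) for two conversion rates is a two-proportion z-test. This is a sketch under stated assumptions: a normal approximation that needs reasonably large samples, Cohen's h as the effect size, and a hypothetical function name and hypothetical counts. It is not necessarily the technique described in techniques.md.

```python
import math
from statistics import NormalDist


def two_proportion_test(x_a, n_a, x_b, n_b, confidence=0.95):
    """Two-sided z-test for a difference in proportions (B minus A).

    Assumes both groups are large enough for the normal approximation and
    that the pooled rate is strictly between 0 and 1.
    """
    p_a, p_b = x_a / n_a, x_b / n_b
    diff = p_b - p_a

    # Pooled standard error under the null hypothesis of equal rates.
    p_pool = (x_a + x_b) / (n_a + n_b)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval on the difference.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # Cohen's h: effect size for two proportions on the arcsine scale.
    cohens_h = 2 * math.asin(math.sqrt(p_b)) - 2 * math.asin(math.sqrt(p_a))

    return {"diff": diff, "p_value": p_value, "ci": ci, "cohens_h": cohens_h}


# Hypothetical A/B counts: 200/2000 conversions vs 250/2000.
result = two_proportion_test(200, 2000, 250, 2000)
print(f"lift: {result['diff']:.1%} "
      f"(95% CI {result['ci'][0]:.1%} to {result['ci'][1]:.1%}), "
      f"p = {result['p_value']:.3f}, h = {result['cohens_h']:.2f}")
```

Reporting the interval and effect size alongside the p-value keeps the result tied to the decision: a difference can clear the significance threshold while its plausible range still includes values too small to act on.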

Output Standards

1. Lead with the insight, not the methodology
2. Quantify uncertainty: ranges, not point estimates
3. State limitations: what this analysis can't tell you
4. Recommend next steps: what would strengthen the conclusion

Red Flags to Escalate

- User wants to "prove" a predetermined conclusion
- Sample size too small for reliable inference
- Data quality issues that invalidate analysis
- Confounders that can't be controlled for

