OpenClaw R Stats
When to Use
Use this skill when the user asks for any statistical analysis, hypothesis testing, group comparison,
prediction, association, survival analysis, meta-analysis, causal inference,
or power/sample size calculation, or mentions R statistical packages.
What This Skill Does NOT Do
- Claim causality from observational data (use "associated with")
- Run large exploratory fishing without clear user intent
- Silently ignore assumption violations
- Report only p-values (always include effect sizes and CIs)
Pre-Flight (Mandatory)
1. Confirm the dataset exists and is readable
2. Run schema inspection: `bash {baseDir}/scripts/run-rstats.sh schema --data <path>`
3. Report: rows, columns, types, missing values
4. If missing > 5%, warn. If n < 30, warn about the small sample.
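The pre-flight thresholds can be sketched in a few lines of Python. This is only an illustration of the checks, not the skill's schema command: the `preflight` helper, the per-column reading of the 5% rule, and the tokens treated as missing are all assumptions, and type inference is left out for brevity.

```python
import csv
import io

MISSING_WARN_FRACTION = 0.05  # "missing > 5%" threshold from the checklist
SMALL_SAMPLE_N = 30           # "n < 30" threshold from the checklist
MISSING_TOKENS = {"", "NA"}   # assumption: how missing cells are encoded


def preflight(csv_text):
    """Summarise CSV text and collect the pre-flight warnings (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    n = len(rows)
    # Count missing cells per column; short rows yield None, treated as missing.
    missing = {
        col: sum(1 for r in rows if (r[col] or "").strip() in MISSING_TOKENS)
        for col in columns
    }
    warnings = []
    for col, count in missing.items():
        if n and count / n > MISSING_WARN_FRACTION:
            warnings.append(f"{col}: {count}/{n} values missing (> 5%)")
    if n < SMALL_SAMPLE_N:
        warnings.append(f"small sample: n = {n} (< 30)")
    return {"rows": n, "columns": columns, "missing": missing, "warnings": warnings}


sample = "group,score\na,1.2\nb,NA\na,3.4\n"  # made-up data for illustration
report = preflight(sample)
print(report["rows"], report["columns"], report["warnings"])
```

On the three-row sample both warnings fire: one for the `score` column and one for the small sample.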
Environment Setup
On first use, or when errors occur: `bash {baseDir}/scripts/run-rstats.sh doctor`
Install by profile (only when needed):
| Profile | Script | Methods |
|---|---|---|
| Core | `install-core.R` | t-test, regression, ANOVA, chi-square |
| Survival | `install-survival.R` | KM, Cox, competing risks, RMST |
| Missing | `install-missing.R` | MICE, MCAR test |
| Mixed | `install-mixed.R` | LMM, GLMM, GEE, ICC |
| Bayes | `install-bayes.R` | brms, Bayes factors |
| Causal | `install-causal.R` | PSM, IPTW, IV, DiD, RDD, TMLE |
| Meta | `install-meta.R` | meta-analysis, NMA |
| SEM | `install-sem.R` | SEM, CFA, lavaan |
| Diagnostic | `install-diagnostic.R` | ROC, kappa, alpha |
| Advanced | `install-advanced.R` | GAM, quantile, zero-inflated |
| Power | `install-power.R` | power/sample size |
Workflow
1. Determine analysis type (see references/METHOD_TABLE.md)
2. Inspect dataset schema
3. Build JSON spec:
```json
{
  "dataset_path": "<path>",
  "analysis_type": "<type>",
  "outcome": "<column>",
  "predictors": ["<col1>"],
  "hypothesis": "<plain language>",
  "alpha": 0.05,
  "seed": 42,
  "output_dir": "<path>"
}
```
4. Save as .json, run: `bash {baseDir}/scripts/run-rstats.sh analyze --spec <path>`
5. Read summary.json + report.md
6. Present: Summary → Statistics → Interpretation → Plots → Assumptions → Caveats
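As a rough sketch of the spec-building and run steps, the Python below assembles the spec fields listed above, writes them to a .json file, and composes the analyze command without executing it. The helper names, the example dataset path, the column names, and the `/path/to/skill` stand-in for `{baseDir}` are all hypothetical; only the field names and the command shape come from this document.

```python
import json
import tempfile
from pathlib import Path


def build_spec(dataset_path, analysis_type, outcome, predictors,
               hypothesis, output_dir, alpha=0.05, seed=42):
    """Assemble a spec dict with the fields shown in the workflow."""
    return {
        "dataset_path": dataset_path,
        "analysis_type": analysis_type,
        "outcome": outcome,
        "predictors": list(predictors),
        "hypothesis": hypothesis,
        "alpha": alpha,
        "seed": seed,
        "output_dir": output_dir,
    }


def analyze_command(base_dir, spec_path):
    """Argument list for the analyze step; base_dir stands in for {baseDir}."""
    return ["bash", f"{base_dir}/scripts/run-rstats.sh",
            "analyze", "--spec", str(spec_path)]


spec = build_spec(
    dataset_path="data/trial.csv",        # hypothetical dataset
    analysis_type="ttest",
    outcome="score",                      # hypothetical column
    predictors=["group"],                 # hypothetical column
    hypothesis="Mean score differs between the two groups",
    output_dir="results/ttest",           # hypothetical output location
)

# Write the spec to disk, then compose (but do not run) the command.
spec_path = Path(tempfile.mkdtemp()) / "spec.json"
spec_path.write_text(json.dumps(spec, indent=2), encoding="utf-8")
command = analyze_command("/path/to/skill", spec_path)
print(" ".join(command))
```

Keeping `alpha` and `seed` as explicit defaults means every saved spec records them, which helps when a result has to be reproduced later.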
Analysis Selection
For the complete 82-method table with user intent mapping,
see references/METHOD_TABLE.md.
Quick lookup — most common:
| Intent | analysis_type |
|---|---|
| Compare 2 groups | `ttest` or `wilcoxon` |
| Compare 3+ groups | `anova` or `kruskal` |
| Categorical association | `chisq` or `fisher` |
| Predict continuous | `linear_regression` |
| Predict binary | `logistic_regression` |
| Survival curves | `kaplan_meier` |
| Survival regression | `cox_regression` |
| Meta-analysis | `meta_analysis` |
| Causal effect | `propensity_match` or `did` |
| Power/sample size | `power_analysis` |
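The quick-lookup table can also be held as a plain mapping. In the sketch below the intent keys are hypothetical labels invented for illustration; the `analysis_type` values are the ones from the table.

```python
# Intent labels (keys) are hypothetical; values are the analysis_type
# names from the quick-lookup table, in the order listed there.
QUICK_LOOKUP = {
    "compare_2_groups": ["ttest", "wilcoxon"],
    "compare_3plus_groups": ["anova", "kruskal"],
    "categorical_association": ["chisq", "fisher"],
    "predict_continuous": ["linear_regression"],
    "predict_binary": ["logistic_regression"],
    "survival_curves": ["kaplan_meier"],
    "survival_regression": ["cox_regression"],
    "meta_analysis": ["meta_analysis"],
    "causal_effect": ["propensity_match", "did"],
    "power_sample_size": ["power_analysis"],
}


def candidates(intent):
    """Return candidate analysis_type values for an intent label."""
    try:
        return QUICK_LOOKUP[intent]
    except KeyError:
        raise ValueError(
            f"unknown intent {intent!r}; see references/METHOD_TABLE.md"
        ) from None


print(candidates("compare_2_groups"))
```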
Automatic Method Switching
- Non-normal + n < 30 → `wilcoxon` over `ttest`
- Unequal variance → Welch t-test (`equal_var: false`)
- Expected cell counts < 5 → `fisher` over `chisq`
- Overdispersion in Poisson → suggest negative binomial
- Heteroscedastic residuals → robust SE warning
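The switching rules above can be read as a small decision function. The Python sketch below is one possible encoding, not the skill's actual logic: the function name, its keyword arguments, and the `poisson_regression` type name are assumptions, and it takes the diagnostics (normality, variance equality, expected cell counts) as already computed elsewhere.

```python
def switch_method(analysis_type, *, n=None, normal=True, equal_variance=True,
                  min_expected_cell=None, overdispersed=False,
                  heteroscedastic=False):
    """Apply the switching rules; return (analysis_type, spec_overrides, notes)."""
    overrides, notes = {}, []

    if analysis_type == "ttest":
        if not normal and n is not None and n < 30:
            analysis_type = "wilcoxon"
            notes.append("non-normal data with n < 30: switched to wilcoxon")
        elif not equal_variance:
            overrides["equal_var"] = False
            notes.append("unequal variance: using Welch t-test")
    elif analysis_type == "chisq":
        if min_expected_cell is not None and min_expected_cell < 5:
            analysis_type = "fisher"
            notes.append("expected cell count < 5: switched to fisher")
    elif analysis_type == "poisson_regression":  # hypothetical type name
        if overdispersed:
            notes.append("overdispersion detected: consider negative binomial")

    if heteroscedastic:
        notes.append("heteroscedastic residuals: robust standard errors advised")

    return analysis_type, overrides, notes


print(switch_method("ttest", n=12, normal=False))
print(switch_method("ttest", n=100, equal_variance=False))
print(switch_method("chisq", min_expected_cell=3))
```

Returning the notes alongside the chosen method keeps every switch visible in the final report, in line with the rule against silently ignoring assumption violations.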
Reporting Rules (Non-Negotiable)
Every analysis MUST include:
- Sample size (n) and missing data handling
- Method name and rationale
- Point estimates with confidence intervals
- Effect sizes (Cohen's d, η², R², OR, HR, etc.)
- Assumption check results
- Limitations
Language: "associated with" / "evidence suggests" — NEVER "proves" / "causes"
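One way to enforce these rules mechanically is a check that runs before a report is presented. The Python sketch below is an assumption-laden illustration: the field names in `REQUIRED_FIELDS` are hypothetical and do not describe the layout of summary.json, and the word filter is deliberately crude (it matches only "prove(s)" and "cause(s)" as whole words).

```python
import re

# Hypothetical field names, one per required reporting element.
REQUIRED_FIELDS = [
    "n", "missing_data_handling", "method", "rationale", "estimate",
    "confidence_interval", "effect_size", "assumption_checks", "limitations",
]
# Whole-word match, so "because" is not flagged.
BANNED = re.compile(r"\b(proves?|causes?)\b", re.IGNORECASE)


def check_report(fields, narrative):
    """Return a list of problems; an empty list means the checks pass."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if fields.get(name) in (None, "", [])
    ]
    problems += [f"banned word: {m.group(0)}" for m in BANNED.finditer(narrative)]
    return problems


# Illustrative values only, not results from a real analysis.
complete = {
    "n": 40,
    "missing_data_handling": "complete cases",
    "method": "Welch t-test",
    "rationale": "unequal variances between groups",
    "estimate": 1.8,
    "confidence_interval": (0.4, 3.2),
    "effect_size": {"cohens_d": 0.55},
    "assumption_checks": ["normality checked", "variances unequal"],
    "limitations": "observational data; single site",
}
print(check_report(complete, "Treatment was associated with higher scores."))
print(check_report({"n": 40}, "Treatment causes higher scores."))
```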
Spec Field Reference
See references/SPEC_REFERENCE.md for required/optional fields per analysis_type.