Two-Sample Mendelian Randomization Research Planner
Generates a complete two-sample MR study design from a user-provided research direction. Always outputs four workload configurations and a recommended primary plan.
Supported Study Styles
| Style | Description | Example |
|---|---|---|
| A. Single Exposure → Single Outcome | One biomarker or trait to one disease | Serum uric acid → gout; vitamin D → osteoporosis |
| B. Multi-Exposure Screening | Panel of exposures to one outcome | Dietary factors → endometriosis; cytokine panel → RA |
| C. Bidirectional MR | Reciprocal causal testing | Inflammation ↔ depression; BMI ↔ osteoarthritis |
| D. Lifestyle / Diet / Behavioral | Self-reported behavioral exposures | Coffee intake → hypertension; sleep duration → stroke |
| E. Biomarker / Molecular Trait | Circulating proteins, metabolites | Cytokines → autoimmune disease; plasma proteins → Alzheimer's |
| F. Publication-Oriented | Comprehensive sensitivity-rich design | Full estimator suite with complete figure set |
Minimum User Input
- One exposure (or exposure set) + one outcome
- If limited detail is provided, infer a reasonable default design and state all assumptions explicitly
Step-by-Step Execution
Step 1: Infer Study Type
Identify:
- Exposure(s) and outcome
- Exposure class (dietary, biomarker, metabolite, behavioral, disease trait, molecular)
- User goal: screening, bidirectional, causal verification, or publication strength
- Whether MVMR or colocalization is justified
- Time or resource constraints stated by the user
Step 2: Output Four Configurations
Always generate all four. For each configuration describe: goal, required data, major modules, expected workload, figure set, strengths, and weaknesses.
| Config | Goal | Timeframe | Best For |
|---|---|---|---|
| Lite | Fast minimal causal test | 2–4 weeks | Quick launch, 1 exposure × 1 outcome |
| Standard | Publication-ready core MR | 4–8 weeks | Single or small panel + sensitivity suite |
| Advanced | Robust multi-extension design | 8–14 weeks | Bidirectional, MVMR, replication GWAS |
| Publication+ | High-impact comprehensive paper | 12–20 weeks | Full sensitivity, MVMR, colocalization, power |
Step 3: Recommend One Primary Plan
Select the best-fit configuration and explain why, given the exposure type, outcome, and any stated user constraints (time, data access, publication goal).
Step 4: Full Step-by-Step Workflow
For each step include: step name, purpose, input, method, key parameters/thresholds, expected output, failure points, and alternative approaches.
Core modules to address when relevant:
- Exposure GWAS selection + ancestry matching
- Outcome GWAS selection
- Instrument extraction (p < 5×10⁻⁸, LD clumping r² < 0.001 / 10,000 kb)
- F-statistic screening (F > 10)
- Harmonization (palindromic SNP handling)
- IVW (primary analysis, random effects)
- MR-Egger, weighted median, simple/weighted mode (complementary)
- Heterogeneity (Cochran's Q, I²)
- Pleiotropy (MR-Egger intercept, MR-PRESSO)
- Leave-one-out analysis
- Bidirectional MR (when justified — see Hard Rules)
- MVMR (when confounding exposures need adjustment)
- Power / MDES discussion
- Colocalization (Advanced / Publication+ only; PP.H4 > 0.8 standard)
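The F-statistic screen in the module list can be approximated per SNP from summary statistics as the squared ratio of the SNP-exposure effect to its standard error. The following is a minimal base-R sketch under that approximation; the column names `beta.exposure` and `se.exposure` are assumed to follow the TwoSampleMR naming convention, and the toy values are invented for illustration only.

```r
# Toy instrument table; values are illustrative only, not real GWAS estimates.
instruments <- data.frame(
  SNP           = c("rs_example1", "rs_example2", "rs_example3"),
  beta.exposure = c(0.050, 0.020, 0.012),
  se.exposure   = c(0.005, 0.004, 0.006)
)

# Per-SNP F-statistic approximated as the squared z-score of the
# SNP-exposure association.
instruments$F_stat <- (instruments$beta.exposure / instruments$se.exposure)^2

# Split instruments at the conventional F > 10 threshold.
strong <- instruments[instruments$F_stat > 10, ]
weak   <- instruments[instruments$F_stat <= 10, ]
```

Reporting both the retained and the dropped instruments makes the weak-instrument risk visible to the reader instead of hiding it in a filter step.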
Exposure-class IV count benchmarks — state expected IV count and flag weak-instrument risk accordingly:
→ Full benchmarks by exposure class: references/iv_benchmarks.md
GWAS data sources by exposure class:
→ Recommended databases and last-verified dates: references/gwas_databases.md
Fault tolerance guidelines:
- If the target GWAS is unavailable: state this explicitly, suggest the closest publicly available alternative, and recommend the Lite configuration until data access is confirmed
- If IV count falls below 3: warn the user that MR is not feasible with current instruments; suggest waiting for larger GWAS or pivoting to a proxy exposure
- If F-statistic < 10 for all IVs: do not proceed with IVW as primary; escalate to weak-instrument-robust methods (LIML, sisVIVE) and note this as a study limitation
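The three fallback rules above can be expressed as a small decision function. This is only a sketch: `triage_instruments` is a hypothetical helper name, and the returned messages paraphrase the guidelines rather than reproduce any library's output.

```r
# Hypothetical helper: maps instrument diagnostics to the fallback actions
# described in the fault tolerance guidelines.
triage_instruments <- function(f_stats, gwas_available = TRUE) {
  if (!gwas_available) {
    return("GWAS unavailable: suggest closest public alternative; use Lite configuration")
  }
  if (length(f_stats) < 3) {
    return("Fewer than 3 IVs: MR not feasible with current instruments")
  }
  if (all(f_stats < 10)) {
    return("All F < 10: do not use IVW as primary; use weak-instrument-robust methods")
  }
  "Proceed: IVW as primary analysis"
}

# Example calls with made-up F-statistics.
triage_instruments(c(35, 22, 18, 41))
triage_instruments(c(4, 7, 9))
```

The checks run in order of severity, so a missing GWAS is reported before any instrument-level warning.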
Step 5: Figure and Deliverable Plan
Always list:
- Scatter plots (exposure–outcome per estimator)
- Forest plots (leave-one-out)
- Funnel plots (pleiotropy visual)
- Summary results table (all estimators)
- Sensitivity analysis table
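The plots in this list map onto TwoSampleMR plotting functions. The wrapper below is a sketch, not a prescribed workflow: `make_mr_figures` is a hypothetical name, it assumes the TwoSampleMR package is installed, and it assumes `harmonized` is the output of `harmonise_data()`.

```r
# Hypothetical wrapper that builds the standard figure set from a
# harmonized dataset. Nothing is computed until the function is called.
make_mr_figures <- function(harmonized) {
  res        <- TwoSampleMR::mr(harmonized)
  single_snp <- TwoSampleMR::mr_singlesnp(harmonized)
  loo        <- TwoSampleMR::mr_leaveoneout(harmonized)
  list(
    scatter    = TwoSampleMR::mr_scatter_plot(res, harmonized),  # one line per estimator
    forest_loo = TwoSampleMR::mr_leaveoneout_plot(loo),          # leave-one-out forest plot
    funnel     = TwoSampleMR::mr_funnel_plot(single_snp)         # asymmetry suggests pleiotropy
  )
}
```

Each element of the returned list holds one plot per exposure-outcome pair, so the same wrapper can serve single-exposure and screening designs.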
Step 6: Validation and Robustness Plan
State what each layer proves and what it does not prove. Distinguish:
- Primary MR evidence: IVW result + instrument validity checks (F > 10, no strong pleiotropy signal)
- Sensitivity support: estimator consistency across MR-Egger, weighted median, mode; Cochran's Q non-significant
- Higher-tier causal strengthening: MVMR (adjusts for correlated exposures), bidirectional MR (tests for reverse causation), colocalization (helps exclude confounding by LD)
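To make the first two layers concrete, the IVW estimate and Cochran's Q can be computed by hand from harmonized summary statistics. The base-R sketch below assumes first-order inverse-variance weights and a multiplicative random-effects standard error; the variable names and toy values are illustrative and do not come from any real GWAS.

```r
# Toy harmonized summary statistics; values are illustrative only.
bx   <- c(0.10, 0.20, 0.15, 0.05)   # SNP-exposure effects
by   <- c(0.02, 0.05, 0.03, 0.01)   # SNP-outcome effects
se_y <- c(0.01, 0.01, 0.01, 0.01)   # standard errors of SNP-outcome effects

ratio  <- by / bx                   # per-SNP Wald ratio estimates
weight <- bx^2 / se_y^2             # first-order inverse-variance weights

# IVW estimate: weighted mean of the Wald ratios.
beta_ivw <- sum(weight * ratio) / sum(weight)
se_fixed <- sqrt(1 / sum(weight))

# Cochran's Q and I^2 summarize heterogeneity across the Wald ratios.
q_stat <- sum(weight * (ratio - beta_ivw)^2)
q_df   <- length(ratio) - 1
q_p    <- pchisq(q_stat, df = q_df, lower.tail = FALSE)
i2     <- max(0, (q_stat - q_df) / q_stat)

# Multiplicative random-effects SE: inflated only when Q exceeds its
# degrees of freedom.
se_random <- se_fixed * max(1, sqrt(q_stat / q_df))
```

A non-significant Q supports, but does not prove, instrument validity: balanced pleiotropy can leave Q unremarkable, which is why this check sits in the sensitivity layer and not in the primary evidence.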
Step 7: Risk Review
Always include a self-critical section addressing:
- Strongest part of the design
- Most assumption-dependent part
- Most likely source of false positives
- Easiest part to overinterpret
- Most likely reviewer criticisms: weak instruments, pleiotropy, ancestry mismatch, sample overlap, multiple-testing (for screening studies), behavioral phenotype noise, insufficient IV count for dietary/microbiome exposures
- Revision strategy if first-pass findings fail
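For screening designs, the multiple-testing criticism can be addressed by adjusting the per-exposure IVW p-values. The sketch below uses base R's `p.adjust`; the exposure labels and p-values are invented for illustration, and the choice between Bonferroni and Benjamini-Hochberg is an assumption to be justified per study.

```r
# Illustrative p-values for a hypothetical panel of five exposures;
# these are not real results.
screen <- data.frame(
  exposure = c("exp_A", "exp_B", "exp_C", "exp_D", "exp_E"),
  p_ivw    = c(0.001, 0.012, 0.030, 0.200, 0.650)
)

# Family-wise control (Bonferroni) and false discovery rate control (BH).
screen$p_bonferroni <- p.adjust(screen$p_ivw, method = "bonferroni")
screen$p_fdr        <- p.adjust(screen$p_ivw, method = "BH")

# Flag exposures that survive the stricter correction.
screen$passes_bonferroni <- screen$p_bonferroni < 0.05
```

Reporting both adjusted columns lets reviewers see which findings are robust to the stricter correction and which are only suggestive.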
Step 8: Minimal Executable Version
Slim version using only publicly available GWAS: 1 exposure (or small set), 1 outcome, IVW + 1–2 complementary estimators, heterogeneity/pleiotropy/leave-one-out, concise interpretation. Confirm this fits within any stated time constraints before recommending.
Step 9: Publication Upgrade Path
Explain what to add beyond Standard, which additions most improve publication strength, and which modules add rigor versus complexity. For molecular trait MR (proteins, metabolites), always include colocalization as a required upgrade for high-impact journals.
R Code Framework Guidelines
When providing R code examples or frameworks:
- Always use the TwoSampleMR package (installable from the MRCIEU GitHub repository or its R-universe) as the primary tool
- Mark all GWAS IDs as examples with an explicit inline comment: `# example ID: replace with your target phenotype ID`
- Do not present example IDs as validated or guaranteed to resolve correctly
- Provide the IEU Open GWAS API query pattern so users can search for their own phenotype IDs
Standard R framework template:
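```r
library(TwoSampleMR)
library(MRPRESSO)

# Step 1: Extract instruments for the exposure
# The ID below is an example: replace it with your target exposure GWAS ID
exposure <- extract_instruments(outcomes = "ukb-b-XXXXX")  # example ID

# Step 2: Extract outcome data
# The ID below is an example: replace it with your target outcome GWAS ID
outcome <- extract_outcome_data(
  snps = exposure$SNP,
  outcomes = "ieu-b-XXXXX"  # example ID
)

# Step 3: Harmonize exposure and outcome data
harmonized <- harmonise_data(exposure, outcome)

# Step 4: Primary and sensitivity analyses (method names are passed as strings)
res <- mr(harmonized, method_list = c(
  "mr_ivw",
  "mr_egger_regression",
  "mr_weighted_median",
  "mr_weighted_mode"
))

# Step 5: Heterogeneity, pleiotropy, and leave-one-out
het <- mr_heterogeneity(harmonized)
plt <- mr_pleiotropy_test(harmonized)
loo <- mr_leaveoneout(harmonized)
```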
To find valid GWAS IDs: `ao <- available_outcomes(); View(ao)`
Hard Rules
1. Never output only one generic plan; always output all four configurations.
2. Always recommend one primary plan with justification.
3. Always separate necessary modules from optional modules.
4. Distinguish primary MR evidence, sensitivity support, and higher-tier causal strengthening.
5. Do not force bidirectional or MVMR if the topic does not justify it.
6. Do not overclaim causality when instruments are weak or behavioral phenotypes are noisy.
7. Do not treat nominal estimator agreement as proof if sensitivity analyses are inconsistent.
8. Do not ignore ancestry mismatch or sample-overlap concerns.
9. If the user provides limited detail, infer a reasonable default design and state all assumptions clearly.
10. Do not produce only a literature summary or flat methods list.
11. Out-of-scope redirect: If the user requests a non-MR causal inference design (RCT, propensity score matching, DAG-based observational analysis, Bayesian network, etc.), clearly state that this skill covers two-sample MR only and recommend consulting appropriate resources (e.g., CONSORT for RCTs, STROBE for observational studies).
Input Validation
This skill accepts: a research direction involving a causal question between an exposure (biomarker, dietary factor, behavioral trait, molecular trait, or disease) and an outcome, where the user wants to design a two-sample Mendelian randomization study.
If the user's request does not involve MR study design — for example, asking to design an RCT, conduct a systematic review, write a manuscript introduction, perform propensity score analysis, or answer a general epidemiology question — do not proceed with the MR planning workflow. Instead respond:
"Two-Sample MR Research Planner is designed to generate Mendelian randomization study designs using GWAS summary statistics. Your request appears to be outside this scope. Please provide an exposure–outcome pair you want to test using MR, or use a more appropriate skill for your task (e.g., a systematic review skill for literature synthesis, or an experimental design skill for RCTs)."
Reference Files
| File | Contents | Used in |
|---|---|---|
| references/iv_benchmarks.md | Typical IV count ranges and weak-instrument risk flags by exposure class | Step 4 (instrument extraction) |
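Two-Sample Mendelian Randomization Research Planner
Generates a complete two-sample MR study design from a user-provided research direction. Always outputs four workload configurations and one recommended primary plan.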
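Supported Study Styles
| Style | Description | Example |
|---|---|---|
| A. Single Exposure → Single Outcome | One biomarker or trait to one disease | Serum uric acid → gout; vitamin D → osteoporosis |
| B. Multi-Exposure Screening | Panel of exposures to one outcome | Dietary factors → endometriosis; cytokine panel → rheumatoid arthritis |
| C. Bidirectional MR | Reciprocal causal testing | Inflammation ↔ depression; BMI ↔ osteoarthritis |
| D. Lifestyle / Diet / Behavioral | Self-reported behavioral exposures | Coffee intake → hypertension; sleep duration → stroke |
| E. Biomarker / Molecular Trait | Circulating proteins, metabolites | Cytokines → autoimmune disease; plasma proteins → Alzheimer's disease |
| F. Publication-Oriented | Comprehensive sensitivity-rich design | Full estimator suite with complete figure set |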
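Minimum User Input
- One exposure (or exposure set) + one outcome
- If limited detail is provided, infer a reasonable default design and state all assumptions explicitly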
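Step-by-Step Execution
Step 1: Infer Study Type
Identify:
- Exposure(s) and outcome
- Exposure class (dietary, biomarker, metabolite, behavioral, disease trait, molecular)
- User goal: screening, bidirectional, causal verification, or publication strength
- Whether MVMR or colocalization is justified
- Time or resource constraints stated by the user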
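Step 2: Output Four Configurations
Always generate all four configurations. For each configuration describe: goal, required data, major modules, expected workload, figure set, strengths, and weaknesses.
| Config | Goal | Timeframe | Best For |
|---|---|---|---|
| Lite | Fast minimal causal test | 2–4 weeks | Quick launch, 1 exposure × 1 outcome |
| Standard | Publication-ready core MR | 4–8 weeks | Single or small exposure panel + sensitivity suite |
| Advanced | Robust multi-extension design | 8–14 weeks | Bidirectional MR, MVMR, replication GWAS |
| Publication+ | High-impact comprehensive paper | 12–20 weeks | Full sensitivity analysis, MVMR, colocalization, power |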
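Step 3: Recommend One Primary Plan
Select the best-fit configuration and explain why, given the exposure type, outcome, and any constraints stated by the user (time, data access, publication goal).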
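Step 4: Full Step-by-Step Workflow
For each step include: step name, purpose, input, method, key parameters/thresholds, expected output, failure points, and alternative approaches.
Core modules to address when relevant:
- Exposure GWAS selection + ancestry matching
- Outcome GWAS selection
- Instrument extraction (p < 5×10⁻⁸, LD clumping r² < 0.001 / 10,000 kb)
- F-statistic screening (F > 10)
- Harmonization (palindromic SNP handling)
- IVW (primary analysis, random effects)
- MR-Egger, weighted median, simple/weighted mode (complementary)
- Heterogeneity (Cochran's Q, I²)
- Pleiotropy (MR-Egger intercept, MR-PRESSO)
- Leave-one-out analysis
- Bidirectional MR (when justified; see Hard Rules)
- MVMR (when confounding exposures need adjustment)
- Power / MDES discussion
- Colocalization (Advanced / Publication+ only; PP.H4 > 0.8 as the standard)
Exposure-class IV count benchmarks: state the expected IV count and flag weak-instrument risk accordingly:
→ Full benchmarks by exposure class: references/iv_benchmarks.md
GWAS data sources by exposure class:
→ Recommended databases and last-verified dates: references/gwas_databases.md
Fault tolerance guidelines:
- If the target GWAS is unavailable: state this explicitly, suggest the closest publicly available alternative, and recommend the Lite configuration until data access is confirmed
- If IV count falls below 3: warn the user that MR is not feasible with current instruments; suggest waiting for a larger GWAS or pivoting to a proxy exposure
- If F-statistic < 10 for all IVs: do not use IVW as the primary analysis; escalate to weak-instrument-robust methods (LIML, sisVIVE) and note this as a study limitation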
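Step 5: Figure and Deliverable Plan
Always list:
- Scatter plots (exposure–outcome relationship per estimator)
- Forest plots (leave-one-out)
- Funnel plots (pleiotropy visual)
- Summary results table (all estimators)
- Sensitivity analysis table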
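Step 6: Validation and Robustness Plan
State what each layer proves and what it does not prove. Distinguish:
- Primary MR evidence: IVW result + instrument validity checks (F > 10, no strong pleiotropy signal)
- Sensitivity support: estimator consistency across MR-Egger, weighted median, mode; Cochran's Q non-significant
- Higher-tier causal strengthening: MVMR (adjusts for correlated exposures), bidirectional MR (tests for reverse causation), colocalization (helps exclude confounding by LD)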
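Step 7: Risk Review
Always include a self-critical section addressing:
- Strongest part of the design
- Most assumption-dependent part
- Most likely source of false positives
- Easiest part to overinterpret
- Most likely reviewer criticisms: weak instruments, pleiotropy, ancestry mismatch, sample overlap, multiple testing (for screening studies), behavioral phenotype noise, insufficient IV count for dietary/microbiome exposures
- Revision strategy if first-pass findings fail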
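Step 8: Minimal Executable Version
Slim version using only publicly available GWAS: 1 exposure (or small set), 1 outcome, IVW + 1–2 complementary estimators, heterogeneity/pleiotropy/leave-one-out analyses, concise interpretation. Confirm this version fits within any stated time constraints before recommending it.
Step 9: Publication Upgrade Path
Explain what to add beyond Standard, which additions most improve publication strength, and which modules add rigor versus complexity. For molecular trait MR (proteins, metabolites), always include colocalization as a required upgrade for high-impact journals.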
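R Code Framework Guidelines
When providing R code examples or frameworks:
- Always use the TwoSampleMR package (installable from the MRCIEU GitHub repository or its R-universe) as the primary tool
- Mark all GWAS IDs as examples with an explicit inline comment: `# example ID: replace with your target phenotype ID`
- Do not present example IDs as validated or guaranteed to resolve correctly
- Provide the IEU Open GWAS API query pattern so users can search for their own phenotype IDs
Standard R framework template: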
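```r
library(TwoSampleMR)
library(MRPRESSO)

# Step 1: Extract instruments for the exposure
# The ID below is an example: replace it with your target exposure GWAS ID
exposure <- extract_instruments(outcomes = "ukb-b-XXXXX")  # example ID

# Step 2: Extract outcome data
# The ID below is an example: replace it with your target outcome GWAS ID
outcome <- extract_outcome_data(
  snps = exposure$SNP,
  outcomes = "ieu-b-XXXXX"  # example ID
)

# Step 3: Harmonize exposure and outcome data
harmonized <- harmonise_data(exposure, outcome)

# Step 4: Primary and sensitivity analyses (method names are passed as strings)
res <- mr(harmonized, method_list = c(
  "mr_ivw",
  "mr_egger_regression",
  "mr_weighted_median",
  "mr_weighted_mode"
))

# Step 5: Heterogeneity, pleiotropy, and leave-one-out
het <- mr_heterogeneity(harmonized)
plt <- mr_pleiotropy_test(harmonized)
loo <- mr_leaveoneout(harmonized)
```
To find valid GWAS IDs: `ao <- available_outcomes(); View(ao)`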
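Hard Rules
1. Never output only one generic plan; always output all four configurations.
2. Always recommend one primary plan with justification.
3. Always separate necessary modules from optional modules.
4. Distinguish primary MR evidence, sensitivity support, and higher-tier causal strengthening.
5. Do not force bidirectional MR or MVMR if the topic does not justify it.
6. Do not overclaim causality when instruments are weak or behavioral phenotypes are noisy.
7. Do not treat nominal estimator agreement as proof if sensitivity analyses are inconsistent.
8. Do not ignore ancestry mismatch or sample-overlap concerns.
9. If the user provides limited detail, infer a reasonable default design and state all assumptions clearly.
10. Do not produce only a literature summary or flat methods list.
11. Out-of-scope redirect: If the user requests a non-MR causal inference design (RCT, propensity score matching, DAG-based observational analysis, Bayesian network, etc.), clearly state that this skill covers two-sample MR only and recommend consulting appropriate resources (e.g., CONSORT for RCTs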