Nutrigenomics — Personalised Nutrition from Genetic Data
Skill ID: nutrigenomics
Version: 0.3.1
Status: Beta
Author: David de Lorenzo
Requires: Python 3.11+, pandas, numpy, matplotlib, seaborn, reportlab (optional)
What This Skill Does
The Nutrigenomics skill generates a personalised nutrition report from consumer
genetic data (23andMe or AncestryDNA raw files, or VCF). It interrogates a curated
set of nutritionally relevant SNPs drawn from the GWAS Catalog, ClinVar, and
peer-reviewed nutrigenomics literature, then translates genotype calls into
actionable dietary and supplementation guidance. All computation runs locally.
Key outputs
- Markdown nutrition report with risk scores and per-SNP genotype calls
- Radar chart of nutrient risk profile
- Gene × nutrient heatmap
- Reproducibility bundle (README_reproducibility.txt, environment.yml, checksums.txt, provenance.json)
Trigger Phrases
The Bio Orchestrator should route to this skill when the user says anything like:
- "personalised nutrition", "nutrigenomics", "diet genetics"
- "what should I eat based on my DNA"
- "nutrient metabolism", "vitamin absorption genetics"
- "MTHFR", "APOE", "FTO", "BCMO1", "VDR", "FADS1/2"
- "folate", "omega-3", "vitamin D", "caffeine metabolism", "lactose", "gluten"
- Input files: .txt or .csv (23andMe), .csv (AncestryDNA), .vcf
Curated SNP Panel
Macronutrient Metabolism
| Gene | SNP | Nutrient Impact | Evidence |
|---|---|---|---|
| FTO | rs9939609 | Energy balance, fat mass, carb sensitivity | Strong (GWAS) |
| PPARG | rs1801282 | Fat metabolism, insulin sensitivity | Moderate |
| APOA5 | rs662799 | Triglyceride response to dietary fat | Strong |
| TCF7L2 | rs7903146 | Carbohydrate metabolism, T2D risk | Strong |
| ADRB2 | rs1042713 | Fat oxidation, exercise × diet interaction | Moderate |
Micronutrient Metabolism
| Gene | SNP | Nutrient | Effect of risk allele |
|---|---|---|---|
| MTHFR | rs1801133 | Folate / B12 | ↓ 5-MTHF conversion (~70%) |
| MTHFR | rs1801131 | Folate / B12 | ↓ enzyme activity (~30%) |
| MTR | rs1805087 | B12 / homocysteine | ↑ homocysteine risk |
| BCMO1 | rs7501331 | Beta-carotene → Vitamin A | ↓ conversion (~50%) |
| BCMO1 | rs12934922 | Beta-carotene → Vitamin A | ↓ conversion (compound het) |
| VDR | rs2228570 | Vitamin D absorption | ↓ VDR function |
| VDR | rs731236 | Vitamin D | ↓ bone mineral density response |
| GC | rs4588 | Vitamin D binding | ↑ deficiency risk |
| SLC23A1 | rs33972313 | Vitamin C transport | ↓ renal reabsorption |
| ALPL | rs1256335 | Vitamin B6 | ↓ alkaline phosphatase activity |
Omega-3 / Fatty Acid Metabolism
| Gene | SNP | Nutrient | Effect |
|---|---|---|---|
| FADS1 | rs174546 | LC-PUFA synthesis | ↑/↓ EPA/DHA from ALA |
| FADS2 | rs1535 | LC-PUFA synthesis | Modulates omega-6:omega-3 ratio |
| ELOVL2 | rs953413 | DHA synthesis | ↓ elongation of EPA→DHA |
| APOE | rs429358 | Saturated fat response | ε4 → ↑ LDL-C on high SFA diet |
| APOE | rs7412 | Saturated fat response | Combined with rs429358 for ε typing |
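
The two APOE SNPs are only interpretable as an ε genotype when read together. The sketch below shows one way to combine them, assuming both genotypes are reported on the forward strand (rs429358 T/C, rs7412 C/T). The function name `apoe_genotype`, the ASCII labels (`e2`, `e3`, `e4`) and the `UNDETERMINED` fallback are illustrative and are not taken from the skill's code.

```python
def apoe_genotype(rs429358: str, rs7412: str) -> str:
    """Return an unphased APOE call such as 'e3/e4' from two genotype strings.

    Haplotypes (rs429358 allele, rs7412 allele): e2 = T,T; e3 = T,C; e4 = C,C.
    """
    # Sort the alleles so that "TC" and "CT" are treated identically.
    first = "".join(sorted(rs429358.upper()))
    second = "".join(sorted(rs7412.upper()))
    calls = {
        ("TT", "TT"): "e2/e2",
        ("TT", "CT"): "e2/e3",
        ("TT", "CC"): "e3/e3",
        ("CT", "CT"): "e2/e4",  # double heterozygote, unphased (see note below)
        ("CT", "CC"): "e3/e4",
        ("CC", "CC"): "e4/e4",
    }
    # Any other combination (or a missing call) cannot be typed from these SNPs.
    return calls.get((first, second), "UNDETERMINED")


print(apoe_genotype("TC", "CC"))
```

Consumer arrays report unphased genotypes, so the double heterozygote is called ε2/ε4 by convention; the alternative ε1/ε3 arrangement is rare.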
Caffeine & Alcohol
| Gene | SNP | Compound | Effect |
|---|---|---|---|
| CYP1A2 | rs762551 | Caffeine | Slow/Fast metaboliser |
| AHR | rs4410790 | Caffeine | Modulates CYP1A2 induction |
| ADH1B | rs1229984 | Alcohol | Acetaldehyde accumulation risk |
| ALDH2 | rs671 | Alcohol | Asian flush / toxicity risk |
Food Sensitivities
| Gene | SNP | Sensitivity | Effect |
|---|---|---|---|
| MCM6 | rs4988235 | Lactose intolerance | Non-persistence of lactase |
| HLA-DQ2 | Proxy SNPs | Coeliac / gluten | HLA-DQA1/DQB1 risk haplotypes |
Antioxidant & Detoxification
| Gene | SNP | Pathway | Effect |
|---|---|---|---|
| SOD2 | rs4880 | Manganese SOD | ↓ mitochondrial antioxidant |
| GPX1 | rs1050450 | Selenium / GSH-Px | ↓ glutathione peroxidase |
| GSTT1 | Deletion | Glutathione S-transferase | Null genotype → ↑ oxidative risk |
| NQO1 | rs1800566 | Coenzyme Q10 | ↓ CoQ10 regeneration |
| COMT | rs4680 | Catechol / B vitamins | Met/Val → methylation load |
Algorithm
1. Input Parsing (parse_input.py)
Accepts:
- 23andMe .txt or .csv (tab-separated: rsid, chromosome, position, genotype)
- AncestryDNA .csv
- Standard VCF (extracts GT field)
The format is auto-detected from the header lines. Alleles are normalised to the forward strand using
a hard-coded reference table, which avoids requiring external databases.
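
Header-based detection can be illustrated with a short sketch. It relies on three well-known header conventions: VCF files open with `##fileformat=VCF`, 23andMe exports carry a commented column header ending in `genotype`, and AncestryDNA exports list `allele1` and `allele2` as separate columns. The function name `detect_format` and the `unknown` fallback are assumptions for illustration; the actual rules in parse_input.py may differ.

```python
def detect_format(header_lines: list[str]) -> str:
    """Guess the input format from the first few lines of a raw data file."""
    for line in header_lines:
        text = line.strip().lower()
        # VCF files declare themselves on the first meta-information line.
        if text.startswith("##fileformat=vcf"):
            return "vcf"
        # AncestryDNA reports the two alleles in separate columns.
        if "allele1" in text and "allele2" in text:
            return "ancestry"
        # 23andMe uses a commented header: "# rsid chromosome position genotype".
        if text.startswith("#") and "rsid" in text and "genotype" in text:
            return "23andme"
    return "unknown"


print(detect_format(["# rsid\tchromosome\tposition\tgenotype"]))
```

Returning an explicit `unknown` value rather than guessing lets the caller ask the user for a --format value.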
2. Genotype Extraction (extract_genotypes.py)
For each SNP in the panel:
1. Look up the rsid in the parsed data.
2. Return the genotype string (e.g. "AT", "TT", "AA").
3. Flag as "NOT_TESTED" if absent (common, because SNP content varies between genotyping chips).
3. Risk Scoring (score_variants.py)
Each SNP is scored on a 0 / 0.5 / 1.0 scale:
- 0.0: homozygous reference (lowest risk)
- 0.5: heterozygous
- 1.0: homozygous risk allele
Composite Nutrient Risk Scores (0–10) are computed per nutrient domain by
summing weighted SNP scores. Weights are derived from reported effect sizes
(beta coefficients or OR) in the primary literature.
Risk categories:
- 0–3 (low risk): standard dietary advice applies
- 3–6 (moderate risk): dietary optimisation recommended
- 6–10 (elevated risk): consider testing and targeted supplementation
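
The bands share their boundary values (3 and 6), so any implementation has to pick a side. The sketch below assumes each band includes its lower bound and excludes its upper bound, with 10 falling in the top band. That convention and the function name `risk_category` are assumptions, not something the skill documents.

```python
def risk_category(score: float) -> str:
    """Map a 0-10 composite score onto the three risk bands."""
    if not 0 <= score <= 10:
        raise ValueError("score must lie in the range 0-10")
    if score < 3:
        return "Low"
    if score < 6:
        return "Moderate"
    return "Elevated"


for value in (2.9, 3.0, 6.0, 10.0):
    print(value, risk_category(value))
```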
Important caveat: These are polygenic risk indicators based on common
variants. They are not diagnostic. Rare pathogenic variants (e.g. MTHFR
compound heterozygosity with high homocysteine) require clinical confirmation.
4. Report Generation (generate_report.py)
Outputs a structured Markdown report with:
- Executive summary (top 3 personalised findings)
- Per-nutrient sections: genotype table → interpretation → recommendation
- Radar chart (matplotlib) of nutrient risk scores
- Gene × nutrient heatmap (seaborn)
- Supplement interactions table
- Disclaimer section
- Reproducibility block
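
The per-nutrient layout (genotype table, then interpretation, then recommendation) can be sketched as a small rendering helper. The function name `render_nutrient_section`, the `##` heading level and the placeholder strings are hypothetical; generate_report.py may structure its output differently.

```python
def render_nutrient_section(
    title: str,
    rows: list[tuple[str, str, str]],
    interpretation: str,
    recommendation: str,
) -> str:
    """Render one report section: genotype table, interpretation, recommendation."""
    lines = [f"## {title}", "", "| Gene | SNP | Genotype |", "|---|---|---|"]
    # One table row per panel SNP in this nutrient domain.
    lines += [f"| {gene} | {rsid} | {genotype} |" for gene, rsid, genotype in rows]
    lines += ["", f"Interpretation: {interpretation}"]
    lines += ["", f"Recommendation: {recommendation}"]
    return "\n".join(lines)


section = render_nutrient_section(
    "Folate",
    [("MTHFR", "rs1801133", "CT")],
    "placeholder interpretation",
    "placeholder recommendation",
)
print(section)
```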
5. Reproducibility Bundle (repro_bundle.py)
Exports to the output directory (not committed to the repo):
- README_reproducibility.txt: step-by-step instructions to reproduce the analysis manually
- environment.yml: pinned conda environment
- checksums.txt: SHA-256 checksums of the SNP panel and output report (input file intentionally excluded to avoid persisting a fingerprint of genetic data)
- provenance.json: timestamp, version, and format arguments (input filename intentionally omitted)
Note: No executable scripts are generated. The reproducibility bundle contains
only text files for documentation and integrity verification.
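
The checksum and provenance files can be produced with the standard library alone. The sketch below is one possible shape, not the contents of repro_bundle.py: the function names, the `sha256sum`-style line format and the JSON key names are assumptions. In line with the description above, it hashes only the SNP panel and the report, and records neither the input file's name nor its hash.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_bundle(out_dir: Path, panel: Path, report: Path, version: str, fmt: str) -> None:
    """Write checksums.txt and provenance.json into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Only the panel and the report are hashed; the genetic input file is not.
    lines = [f"{sha256_of(p)}  {p.name}" for p in (panel, report)]
    (out_dir / "checksums.txt").write_text("\n".join(lines) + "\n")
    provenance = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "version": version,
        "format": fmt,  # the --format argument; no input filename is stored
    }
    (out_dir / "provenance.json").write_text(json.dumps(provenance, indent=2))
```

Calling `write_bundle(Path("out"), Path("data/snp_panel.json"), Path("report.md"), "0.3.1", "auto")` would write both files into `out/`; the report path here is a placeholder.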
Execution
To run the analysis on a user-provided genetic file, execute this command directly:
CODEBLOCK0
To run a demo without real genetic data (synthetic patient file included with the skill):
CODEBLOCK1
INLINECODE28 is replaced by OpenClaw at runtime with the absolute path to this skill's folder. Do not substitute it manually. Output is written to a timestamped directory (nutrigenomics_output_YYYYMMDD_HHMMSS/) in the current working directory and persists until manually deleted.
Supported --format values: auto (default), 23andme, ancestry, vcf.
Usage
CODEBLOCK2
File Structure
CODEBLOCK3
Note: Runtime output directories and randomly generated patient files are
excluded from version control. Only the pre-rendered demo
report in examples/output/ is committed.
Privacy
All computation runs locally — no genetic data is ever transmitted to external
servers or third-party services.
What the report contains: The Markdown report includes per-SNP genotype calls
(e.g. AT, TT) for each of the 58 panel SNPs analysed. This is intentional:
knowing your specific genotype at each nutrition-related locus is what makes the
report actionable. Full raw genome data from the input file is not reproduced in
the report; only the 58 panel SNPs are included.
File persistence: Output files (report, figures, reproducibility bundle) are
written to a timestamped nutrigenomics_output_YYYYMMDD_HHMMSS/ directory under
the working directory and persist on disk until manually deleted. The input
file is read-only and is never copied into the output directory.
If you are running this skill on behalf of others or in a shared environment,
delete the output directory once the user has downloaded their results.
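
For shared environments, the naming scheme and the clean-up step can be sketched as follows. Only the directory name pattern comes from this document; the helper names and the prefix guard, which refuses to delete anything that does not look like a run directory, are illustrative assumptions.

```python
import shutil
from datetime import datetime
from pathlib import Path

PREFIX = "nutrigenomics_output_"


def output_dir_name(now: datetime) -> str:
    """Build a directory name of the form nutrigenomics_output_YYYYMMDD_HHMMSS."""
    return now.strftime(f"{PREFIX}%Y%m%d_%H%M%S")


def remove_output_dir(path: Path) -> bool:
    """Delete a run directory; refuse paths that do not carry the prefix."""
    if path.is_dir() and path.name.startswith(PREFIX):
        shutil.rmtree(path)
        return True
    return False


print(output_dir_name(datetime(2025, 1, 2, 3, 4, 5)))
```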
Limitations & Disclaimer
1. Not a medical device. This skill provides educational, research-oriented
   nutrigenomics analysis. It does not constitute medical advice.
2. Common variants only. The panel covers SNPs with MAF > 1% in at least one
   major population. Rare pathogenic variants are out of scope.
3. Population context. Effect sizes are predominantly derived from European
   GWAS cohorts. Risk estimates may not generalise equally across all ancestries.
4. Gene–environment interaction. Genetic risk scores interact with baseline
   diet, lifestyle, microbiome, and epigenetic state. A "high risk" score does not
   mean a nutrient deficiency is present; it means the individual may benefit from
   monitoring.
5. Simpson's Paradox note. Population-level associations used to derive weights
   may not reflect individual trajectories (see Corpas 2025, *Nutrigenomics and
   the Ecological Fallacy*).
Roadmap
- [ ] v0.2: Microbiome × genotype interaction module (16S rRNA input)
- [ ] v0.3: Longitudinal tracking — compare reports across time
- [ ] v0.4: HLA typing for immune-mediated food reactions (coeliac, gluten sensitivity)
- [ ] v1.0: Multi-omics integration (metabolomics + genomics + dietary recall)
References
This skill's SNP panel and methodology are informed by peer-reviewed nutrigenomics research. For verification and additional details, consult:
- PubMed MEDLINE: https://pubmed.ncbi.nlm.nih.gov/
- GWAS Catalog: https://www.ebi.ac.uk/gwas/ (published genome-wide association studies)
- ClinVar: https://www.ncbi.nlm.nih.gov/clinvar/ (variant interpretations)
Users are encouraged to verify specific claims through these authoritative sources and with qualified healthcare providers.
Contributing
The SNP panel (data/snp_panel.json) is maintained by the skill author.
To suggest additions or corrections, contact David de Lorenzo directly via
GitHub (@drdaviddelorenzo) or open
an issue on GitHub.