Volcano Plot Script Generator
A skill for generating publication-ready volcano plots from differential gene expression analysis results.
Overview
Volcano plots visualize the relationship between statistical significance (p-values) and magnitude of change (fold changes) in gene expression data. This skill generates customizable R or Python scripts for creating high-quality figures suitable for publications.
Use Cases
- Visualize RNA-seq DEG analysis results
- Identify significantly upregulated and downregulated genes
- Highlight genes of interest (markers, pathways)
- Generate publication-quality figures for manuscripts
- Compare multiple experimental conditions
Input Requirements
Required input data format:
- Gene identifier (gene symbol or ENSEMBL ID)
- Log2 fold change values
- Adjusted or raw p-values
- Optional: gene annotations, pathways
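As a minimal sketch of how such an input table might be read and checked, the snippet below uses only the standard library. The function name `load_deg_rows` is hypothetical, the column names are the defaults from the parameter table, and the policy of skipping rows with missing or non-numeric values is an assumption rather than the skill's documented behavior.

```python
import csv
import io
import math


def load_deg_rows(handle, gene_col="gene", log2fc_col="log2FoldChange",
                  pvalue_col="padj", delimiter=","):
    """Read DEG rows, keeping only those with usable numeric values.

    Hypothetical helper: pass delimiter="\t" for TSV input.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    required = {gene_col, log2fc_col, pvalue_col}
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")

    rows, skipped = [], 0
    for record in reader:
        try:
            log2fc = float(record[log2fc_col])
            pvalue = float(record[pvalue_col])
        except (TypeError, ValueError):
            skipped += 1  # e.g. "NA" or an empty cell
            continue
        # Reject NaN values and p-values outside [0, 1].
        if math.isnan(log2fc) or math.isnan(pvalue) or not 0.0 <= pvalue <= 1.0:
            skipped += 1
            continue
        rows.append({"gene": record[gene_col], "log2fc": log2fc, "pvalue": pvalue})
    return rows, skipped


# Made-up example data, for illustration only.
example = io.StringIO(
    "gene,log2FoldChange,padj\n"
    "GENE_A,2.5,0.001\n"
    "GENE_B,-1.8,0.02\n"
    "GENE_C,0.3,NA\n"
)
rows, skipped = load_deg_rows(example)
print(len(rows), "usable rows;", skipped, "skipped")
```

Counting skipped rows instead of silently dropping them makes it easier to report how much of the input was unusable.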
Output
- Publication-ready volcano plot (PNG/PDF/SVG)
- Customizable R or Python script
- Optional: labeled significant gene lists
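The following sketch illustrates what a generated Python script could look like in its simplest form, assuming matplotlib is installed. The gene values are made up, the thresholds and colors are the defaults from the parameter table, and the exact comparison operators are assumptions. It is not the script the skill actually emits.

```python
import math
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up (gene, log2 fold change, adjusted p-value) triples.
genes = [
    ("GENE_A", 2.5, 1e-6),
    ("GENE_B", -1.8, 2e-4),
    ("GENE_C", 0.3, 0.6),
    ("GENE_D", 1.4, 0.2),
    ("GENE_E", -0.2, 0.01),
]

LOG2FC_THRESH, PVALUE_THRESH = 1.0, 0.05
COLOR_UP, COLOR_DOWN, COLOR_NS = "#E74C3C", "#3498DB", "#95A5A6"

x = [fc for _, fc, _ in genes]
# p-values of exactly 0 would need clipping before taking the logarithm.
y = [-math.log10(p) for _, _, p in genes]
colors = [
    COLOR_UP if p < PVALUE_THRESH and fc >= LOG2FC_THRESH
    else COLOR_DOWN if p < PVALUE_THRESH and fc <= -LOG2FC_THRESH
    else COLOR_NS
    for _, fc, p in genes
]

fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(x, y, c=colors, s=20)
# Dashed guide lines mark the significance thresholds.
for cutoff in (-LOG2FC_THRESH, LOG2FC_THRESH):
    ax.axvline(cutoff, linestyle="--", color="grey", linewidth=0.8)
ax.axhline(-math.log10(PVALUE_THRESH), linestyle="--", color="grey", linewidth=0.8)
ax.set_xlabel("log2 fold change")
ax.set_ylabel("-log10(adjusted p-value)")

# The output format is inferred from the file extension.
out_dir = tempfile.mkdtemp()
written = []
for ext in ("png", "pdf", "svg"):
    path = os.path.join(out_dir, f"volcano_plot.{ext}")
    fig.savefig(path, dpi=300, bbox_inches="tight")
    written.append(path)
plt.close(fig)
print(written)
```

Vector formats (PDF, SVG) are usually the safer choice for journal submission, while PNG at 300 dpi suits slides and previews.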
Usage
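```bash
# Example: run the volcano plot generator
python scripts/main.py --input deg_results.csv --output volcano_plot.png
```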
Parameters
| Parameter | Description | Default |
|---|---|---|
| `--input` | Path to DEG results CSV/TSV | required |
| `--output` | Output plot file path | `volcano_plot.png` |
| `--log2fc-col` | Column name for log2 fold change | `log2FoldChange` |
| `--pvalue-col` | Column name for p-value | `padj` |
| `--gene-col` | Column name for gene IDs | `gene` |
| `--log2fc-thresh` | Log2 FC threshold for significance | `1.0` |
| `--pvalue-thresh` | P-value threshold | `0.05` |
| `--label-genes` | File with genes to label | None |
| `--top-n` | Label top N significant genes | `10` |
| `--color-up` | Color for upregulated genes | `#E74C3C` |
| `--color-down` | Color for downregulated genes | `#3498DB` |
| `--color-ns` | Color for non-significant genes | `#95A5A6` |
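For orientation, the parameters above could be declared with `argparse` roughly as follows. This is a hypothetical reconstruction from the table, not the actual contents of `scripts/main.py`. The flag names and defaults come from the table, while the value types, the `build_parser` name, and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        description="Generate a volcano plot from DEG results."
    )
    parser.add_argument("--input", required=True, help="Path to DEG results CSV/TSV")
    parser.add_argument("--output", default="volcano_plot.png", help="Output plot file path")
    parser.add_argument("--log2fc-col", default="log2FoldChange")
    parser.add_argument("--pvalue-col", default="padj")
    parser.add_argument("--gene-col", default="gene")
    parser.add_argument("--log2fc-thresh", type=float, default=1.0)
    parser.add_argument("--pvalue-thresh", type=float, default=0.05)
    parser.add_argument("--label-genes", default=None, help="File with genes to label")
    parser.add_argument("--top-n", type=int, default=10)
    parser.add_argument("--color-up", default="#E74C3C")
    parser.add_argument("--color-down", default="#3498DB")
    parser.add_argument("--color-ns", default="#95A5A6")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--input", "deg_results.csv", "--pvalue-thresh", "0.01"]
)
print(args.output, args.log2fc_thresh, args.pvalue_thresh)
```

Note that `argparse` converts dashes in flag names to underscores, so `--log2fc-thresh` is read back as `args.log2fc_thresh`.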
Technical Difficulty
Medium - Requires understanding of:
- DEG analysis concepts (fold change, p-values, FDR)
- Data visualization principles
- Matplotlib/ggplot2 plotting libraries
Dependencies
Python
- pandas
- matplotlib
- seaborn
- numpy
R
- ggplot2
- dplyr
- ggrepel (for label positioning)
Author
Auto-generated skill for bioinformatics visualization.
Risk Assessment
| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output plots | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited (pandas, matplotlib, seaborn, numpy)
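The path-related checklist items could be enforced along these lines. This is a sketch under stated assumptions: `resolve_inside` is a hypothetical helper, and what counts as the "workspace" directory is decided by the caller.

```python
import tempfile
from pathlib import Path


def resolve_inside(workspace, user_path):
    """Resolve user_path against workspace and reject anything that escapes it."""
    root = Path(workspace).resolve()
    # Joining with an absolute user_path discards root, so absolute
    # paths outside the workspace are rejected by the check below too.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes workspace: {user_path}")
    return candidate


workspace = tempfile.mkdtemp()  # stand-in for the real workspace directory
safe = resolve_inside(workspace, "plots/volcano_plot.png")
print(safe.name)

try:
    resolve_inside(workspace, "../outside.png")
except ValueError as err:
    print("rejected:", err)
```

Resolving before comparing matters: a plain string check for `../` misses symlinks and absolute paths, while comparing resolved paths covers both.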
Prerequisites
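```bash
# Python dependencies
pip install -r requirements.txt
# R dependencies (if using R)
Rscript -e 'install.packages(c("ggplot2", "dplyr", "ggrepel"))'
```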
Evaluation Criteria
Success Metrics
- [ ] Successfully generates executable Python/R script
- [ ] Output plot is publication-ready quality
- [ ] Correctly identifies significant genes based on thresholds
- [ ] Handles missing or malformed data gracefully
- [ ] Color scheme is accessible (colorblind-friendly)
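To make the "correctly identifies significant genes" criterion concrete, here is one possible classification rule. The source does not state the exact rule, so the comparison operators (p-value strictly below the threshold, absolute log2 fold change at or above it) and the treatment of missing values as non-significant are assumptions. `classify_gene` is a hypothetical name.

```python
from collections import Counter


def classify_gene(log2fc, pvalue, log2fc_thresh=1.0, pvalue_thresh=0.05):
    """Return 'up', 'down', or 'ns' for one gene."""
    if log2fc is None or pvalue is None:
        return "ns"  # assumption: missing values count as non-significant
    if pvalue >= pvalue_thresh:
        return "ns"
    if log2fc >= log2fc_thresh:
        return "up"
    if log2fc <= -log2fc_thresh:
        return "down"
    return "ns"


# Made-up (gene, log2 fold change, adjusted p-value) triples.
genes = [
    ("GENE_A", 2.5, 0.001),
    ("GENE_B", -1.8, 0.02),
    ("GENE_C", 0.3, 0.001),
    ("GENE_D", 1.6, 0.2),
    ("GENE_E", None, None),
]

default_counts = Counter(classify_gene(fc, p) for _, fc, p in genes)
strict_counts = Counter(
    classify_gene(fc, p, log2fc_thresh=2.0, pvalue_thresh=0.01)
    for _, fc, p in genes
)
print(dict(default_counts))
print(dict(strict_counts))
```

Running the same genes through two threshold settings is a quick way to check that tightening the cutoffs only ever moves genes into the non-significant class.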
Test Cases
1. Basic DEG Visualization: Input standard DESeq2 results → Valid volcano plot
2. Custom Thresholds: Adjust log2FC and p-value thresholds → Correct gene classification
3. Gene Labeling: Specify genes to label → Labels appear correctly
4. Large Dataset: Input 20,000+ genes → Performance remains acceptable
5. Malformed Data: Input with missing values → Graceful error handling
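The gene labeling case can be exercised with a small selection routine like the one below. How `--label-genes` and `--top-n` interact is not specified in this document. The sketch assumes the labels are the union of the top N significant genes (ranked by p-value) and any requested genes present in the data. `select_labels` is a hypothetical name.

```python
def select_labels(rows, requested=(), top_n=10,
                  log2fc_thresh=1.0, pvalue_thresh=0.05):
    """Pick genes to label: top N significant genes plus requested ones."""
    significant = [
        r for r in rows
        if r["pvalue"] < pvalue_thresh and abs(r["log2fc"]) >= log2fc_thresh
    ]
    # Smallest p-value first; ties broken by larger absolute fold change.
    significant.sort(key=lambda r: (r["pvalue"], -abs(r["log2fc"])))
    labels = [r["gene"] for r in significant[:top_n]]

    known = {r["gene"] for r in rows}
    for gene in requested:
        # Requested genes absent from the data are skipped.
        if gene in known and gene not in labels:
            labels.append(gene)
    return labels


# Made-up rows, for illustration only.
rows = [
    {"gene": "GENE_A", "log2fc": 2.5, "pvalue": 1e-6},
    {"gene": "GENE_B", "log2fc": -1.8, "pvalue": 2e-4},
    {"gene": "GENE_C", "log2fc": 0.3, "pvalue": 0.6},
    {"gene": "GENE_D", "log2fc": 1.4, "pvalue": 0.03},
]
labels = select_labels(rows, requested=["GENE_C", "GENE_X"], top_n=2)
print(labels)
```

Keeping the label count small matters on dense plots, where overlapping text is the usual reason a figure stops being readable.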
Lifecycle Status
- Current Stage: Draft
- Next Review Date: 2026-03-06
- Known Issues: None
- Planned Improvements:
  - Add interactive plot option (Plotly)
  - Support for multiple comparison groups
  - Integration with pathway enrichment tools