Volcano Plot Labeler (ID: 148)
Automatically identify and label the Top 10 most significant genes in volcano plots using a repulsion algorithm to prevent label overlap.
Features
- Smart Gene Selection: Automatically identifies the top 10 most significant genes based on p-value and fold change
- Repulsion Algorithm: Uses force-directed positioning to prevent text label overlap
- Customizable: Configurable thresholds, label styling, and positioning options
- Multiple Output Formats: PNG, PDF, SVG support
Installation
CODEBLOCK0
Usage
Basic Usage
CODEBLOCK1
Advanced Usage
CODEBLOCK2
Command Line Usage
CODEBLOCK3
Input Format
Expected CSV/TSV columns:
- `log2FoldChange`: Log2 fold change values
- `padj` or `pvalue`: Adjusted p-values or raw p-values
- `gene_name`: Gene identifiers
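As a minimal sketch of how a table in this format could be loaded and checked before plotting, the helper below reads a CSV/TSV and verifies the expected columns. `load_results` is a hypothetical name introduced for illustration and is not part of the tool's API; preferring `padj` over `pvalue` when both exist, and dropping rows with missing values, are assumptions of this sketch.

```python
import pandas as pd


def load_results(source, sep=","):
    """Load a differential-expression table and check the expected columns.

    Hypothetical helper: pass sep="\t" for TSV input.
    Returns the cleaned DataFrame and the name of the p-value column found.
    """
    df = pd.read_csv(source, sep=sep)

    missing = [c for c in ("log2FoldChange", "gene_name") if c not in df.columns]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")

    # Assumption: use adjusted p-values when both columns are present.
    if "padj" in df.columns:
        pvalue_col = "padj"
    elif "pvalue" in df.columns:
        pvalue_col = "pvalue"
    else:
        raise ValueError("expected a 'padj' or 'pvalue' column")

    # Rows without a fold change or p-value cannot be placed on the plot.
    df = df.dropna(subset=["log2FoldChange", pvalue_col]).reset_index(drop=True)
    return df, pvalue_col
```

The returned column name can then be passed on as `pvalue_col`, so the rest of the pipeline does not need to know which of the two columns was present.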
Algorithm
Significance Calculation
1. Calculate `-log10(pvalue)` for all genes
2. Rank genes by combined score: `|log2FC| * -log10(pvalue)`
3. Select top N genes with highest significance
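The ranking steps above can be sketched with pandas and NumPy as follows. `select_top_genes` is a hypothetical name, and clipping p-values at `1e-300` (so that a reported p-value of exactly 0 does not produce an infinite score) is an assumption of this sketch rather than documented behavior.

```python
import numpy as np
import pandas as pd


def select_top_genes(df, log2fc_col="log2FoldChange", pvalue_col="padj", top_n=10):
    """Return the top_n rows ranked by |log2FC| * -log10(p)."""
    # Assumption: floor p-values so -log10 stays finite when p == 0.
    p = df[pvalue_col].clip(lower=1e-300)
    neg_log10_p = -np.log10(p)

    # Combined score from the description: |log2FC| * -log10(pvalue).
    score = df[log2fc_col].abs() * neg_log10_p

    ranked = df.assign(neg_log10_p=neg_log10_p, score=score)
    return ranked.nlargest(top_n, "score")
```

Because the score multiplies effect size by significance, a gene with a very small p-value but a fold change near zero ranks low, which is usually the desired behavior for labeling.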
Repulsion Algorithm
1. Initial Placement: Place labels at gene coordinates
2. Force Calculation:
   - Repulsive force between overlapping labels
   - Spring force pulling label toward its gene point
   - Boundary forces to keep labels within plot area
3. Iterative Optimization: Update positions for up to N iterations, stopping early on convergence
4. Arrow Drawing: Draw connecting lines from labels to gene points
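Steps 1 to 3 can be illustrated with a small NumPy-only layout routine. This is a sketch under stated assumptions, not the tool's implementation: `repel_labels`, `spring_force`, `tol`, and `bounds` are hypothetical names, label boxes are treated as axis-aligned rectangles, and the boundary "force" is simplified to a hard clamp. Because the spring keeps pulling labels back toward their points, this simple version reduces overlap but does not strictly guarantee zero overlap at the final iteration.

```python
import numpy as np


def repel_labels(anchors, sizes, bounds=(0.0, 1.0, 0.0, 1.0),
                 iterations=100, repulsion_force=0.05,
                 spring_force=0.01, tol=1e-4):
    """Nudge label centres apart while keeping them near their anchors.

    anchors: (n, 2) gene point coordinates.
    sizes:   (n, 2) label width and height, in the same units as bounds.
    bounds:  (xmin, xmax, ymin, ymax) of the plot area.
    """
    anchors = np.asarray(anchors, dtype=float)
    half = np.asarray(sizes, dtype=float) / 2.0
    xmin, xmax, ymin, ymax = bounds

    # Step 1: start at the gene coordinates, with a tiny jitter so labels
    # sharing an anchor still have a direction to separate along.
    rng = np.random.default_rng(0)
    pos = anchors + rng.normal(scale=1e-3, size=anchors.shape)

    for _ in range(iterations):
        # Step 2a: unit-vector repulsion between labels whose boxes overlap.
        delta = pos[:, None, :] - pos[None, :, :]
        min_sep = half[:, None, :] + half[None, :, :]
        overlap = np.all(np.abs(delta) < min_sep, axis=2)
        np.fill_diagonal(overlap, False)
        dist = np.linalg.norm(delta, axis=2, keepdims=True) + 1e-12
        push = np.where(overlap[..., None], delta / dist, 0.0).sum(axis=1)

        # Step 2b: spring pulling each label back toward its gene point.
        pull = anchors - pos

        new_pos = pos + repulsion_force * push + spring_force * pull

        # Step 2c: keep label boxes inside the plot area (hard clamp).
        new_pos[:, 0] = np.clip(new_pos[:, 0], xmin + half[:, 0], xmax - half[:, 0])
        new_pos[:, 1] = np.clip(new_pos[:, 1], ymin + half[:, 1], ymax - half[:, 1])

        # Step 3: stop early once no label moves appreciably.
        moved = np.abs(new_pos - pos).max()
        pos = new_pos
        if moved < tol:
            break
    return pos
```

The `iterations` and `repulsion_force` arguments play the role of `repulsion_iterations` and `repulsion_force` in the parameter table; a larger force separates labels faster at the cost of longer leader lines.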
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `df` | DataFrame | - | Input data |
| `log2fc_col` | str | 'log2FoldChange' | Column name for log2 fold change |
| `pvalue_col` | str | 'padj' | Column name for p-value |
| `gene_col` | str | 'gene_name' | Column name for gene names |
| `top_n` | int | 10 | Number of top genes to label |
| `pvalue_threshold` | float | 0.05 | P-value cutoff for coloring |
| `log2fc_threshold` | float | 1.0 | Log2FC cutoff for coloring |
| `repulsion_iterations` | int | 100 | Iterations for repulsion algorithm |
| `repulsion_force` | float | 0.05 | Strength of repulsion force |
| `label_fontsize` | int | 10 | Font size for labels |
| `figsize` | tuple | (10, 10) | Figure size |
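To make the two coloring cutoffs concrete, the sketch below assigns each gene to one of three categories. `classify_points` is a hypothetical helper, and whether the comparisons are strict or inclusive at the thresholds is an assumption here, since the table does not say.

```python
import numpy as np
import pandas as pd


def classify_points(df, log2fc_col="log2FoldChange", pvalue_col="padj",
                    pvalue_threshold=0.05, log2fc_threshold=1.0):
    """Label each row as 'up', 'down', or 'not significant'."""
    significant = df[pvalue_col] < pvalue_threshold
    # Assumption: fold-change cutoff is inclusive on both sides.
    up = significant & (df[log2fc_col] >= log2fc_threshold)
    down = significant & (df[log2fc_col] <= -log2fc_threshold)

    labels = np.select([up, down], ["up", "down"], default="not significant")
    return pd.Series(labels, index=df.index, name="category")
```

A gene must pass both cutoffs to be colored: a large fold change with a p-value above `pvalue_threshold` stays in the "not significant" group.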
Output
- Labeled volcano plot with:
  - Color-coded points (up/down/not significant)
  - Top 10 gene labels with leader lines
  - No overlapping text labels
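One way labels with leader lines could be drawn is with matplotlib's `Axes.annotate`, which places text at one position and draws a connector back to the data point. The sketch below is illustrative only: `draw_labels` is a hypothetical helper, and the line width and centering choices are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def draw_labels(ax, names, anchors, label_positions,
                label_fontsize=10, label_color="black", arrow_color="gray"):
    """Draw each gene name at its label position with a line to its point."""
    annotations = []
    for name, (x, y), (lx, ly) in zip(names, anchors, label_positions):
        ann = ax.annotate(
            name,
            xy=(x, y),          # gene point the leader line ends at
            xytext=(lx, ly),    # where the text is placed
            fontsize=label_fontsize,
            color=label_color,
            ha="center",
            va="center",
            # arrowstyle "-" draws a plain line without an arrow head.
            arrowprops=dict(arrowstyle="-", color=arrow_color, lw=0.8),
        )
        annotations.append(ann)
    return annotations
```

The label positions would come from whatever layout step runs beforehand; when a label sits exactly on its point, the connector has zero length and is effectively invisible.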
License
MIT
Risk Assessment
| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
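The two path-related items in the checklist (no `../` traversal, output restricted to the workspace) can be checked with a few lines of `pathlib`. This is a sketch of one possible check, not the tool's actual validation; `resolve_in_workspace` is a hypothetical name.

```python
from pathlib import Path


def resolve_in_workspace(user_path, workspace):
    """Return the resolved path, or raise ValueError if it leaves the workspace."""
    root = Path(workspace).resolve()
    # Joining with an absolute user_path discards root, which the check
    # below then rejects; ".." segments are collapsed by resolve().
    candidate = (root / user_path).resolve()
    try:
        # relative_to raises ValueError when candidate is not under root.
        candidate.relative_to(root)
    except ValueError:
        raise ValueError(f"path escapes workspace: {user_path!r}") from None
    return candidate
```

Resolving before comparing matters: a plain string check for `..` would miss symlinks and absolute paths, whereas comparing resolved paths covers both.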
Prerequisites
CODEBLOCK4
Evaluation Criteria
Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
Test Cases
1. Basic Functionality: Standard input → Expected output
2. Edge Case: Invalid input → Graceful error handling
3. Performance: Large dataset → Acceptable processing time
Lifecycle Status
- Current Stage: Draft
- Next Review Date: 2026-03-06
- Known Issues: None
- Planned Improvements:
  - Performance optimization
  - Additional feature support
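Volcano Plot Labeling Tool (ID: 148)
Uses a repulsion algorithm to automatically identify and label the 10 most significant genes in a volcano plot while preventing label overlap.
Features
- Smart gene selection: automatically identifies the 10 most significant genes based on p-value and fold change
- Repulsion algorithm: uses force-directed positioning to prevent text labels from overlapping
- Customizable: configurable thresholds, label styling, and positioning options
- Multiple output formats: supports PNG, PDF, and SVG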
Installation
```bash
pip install pandas matplotlib numpy scipy
```
Usage
Basic Usage
```python
from volcano_plot_labeler import label_volcano_plot
import pandas as pd

# Load data
df = pd.read_csv('differential_expression_results.csv')

# Generate the labeled volcano plot
fig = label_volcano_plot(
    df,
    log2fc_col='log2FoldChange',
    pvalue_col='padj',
    gene_col='gene_name',
    top_n=10
)

fig.savefig('volcano_plot_labeled.png', dpi=300, bbox_inches='tight')
```
Advanced Usage
```python
from volcano_plot_labeler import label_volcano_plot

fig = label_volcano_plot(
    df,
    log2fc_col='log2FoldChange',
    pvalue_col='padj',
    gene_col='gene_name',
    top_n=10,
    pvalue_threshold=0.05,
    log2fc_threshold=1.0,
    figsize=(12, 10),
    repulsion_iterations=100,
    repulsion_force=0.05,
    label_fontsize=10,
    label_color='black',
    arrow_color='gray',
    save_path='output.png'
)
```
Command Line Usage
```bash
python scripts/main.py \
  --input data/deseq2_results.csv \
  --output volcano_labeled.png \
  --log2fc-col log2FoldChange \
  --pvalue-col padj \
  --gene-col gene_name \
  --top-n 10
```
Input Format
Expected CSV/TSV columns:
- `log2FoldChange`: log2 fold change values
- `padj` or `pvalue`: adjusted or raw p-values
- `gene_name`: gene identifiers
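Algorithm
Significance Calculation
1. Calculate `-log10(pvalue)` for all genes
2. Rank genes by the combined score `|log2FC| * -log10(pvalue)`
3. Select the top N genes with the highest significance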
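Repulsion Algorithm
1. Initial placement: place each label at its gene's coordinates
2. Force calculation:
   - Repulsive force between overlapping labels
   - Spring force pulling each label toward its gene point
   - Boundary force keeping labels inside the plot area
3. Iterative optimization: update positions for up to N iterations, stopping early on convergence
4. Arrow drawing: draw connecting lines from labels to gene points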
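Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `df` | DataFrame | - | Input data |
| `log2fc_col` | str | 'log2FoldChange' | Column name for log2 fold change |
| `pvalue_col` | str | 'padj' | Column name for p-value |
| `gene_col` | str | 'gene_name' | Column name for gene names |
| `top_n` | int | 10 | Number of top genes to label |
| `pvalue_threshold` | float | 0.05 | P-value threshold for coloring |
| `log2fc_threshold` | float | 1.0 | Log2FC threshold for coloring |
| `repulsion_iterations` | int | 100 | Number of repulsion algorithm iterations |
| `repulsion_force` | float | 0.05 | Strength of the repulsion force |
| `label_fontsize` | int | 10 | Label font size |
| `figsize` | tuple | (10, 10) | Figure size |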
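Output
- Color-coded points (upregulated/downregulated/not significant)
- Labels for the top 10 genes, with leader lines
- No overlapping text labels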
License
MIT
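Risk Assessment
| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Reads input files, writes output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |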
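Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Scripts executed in a sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited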
Prerequisites
```bash
# Python dependencies
pip install -r requirements.txt
```
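Evaluation Criteria
Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
Test Cases
1. Basic functionality: standard input → expected output
2. Edge case: invalid input → graceful error handling
3. Performance: large dataset → acceptable processing time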
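Lifecycle Status
- Current Stage: Draft
- Next Review Date: 2026-03-06
- Known Issues: None
- Planned Improvements:
  - Performance optimization
  - Additional feature support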