Volcano Plot Labeler (ID: 148)

Automatically identify and label the top 10 most significant genes in a volcano plot, using a repulsion algorithm to prevent label overlap.

Features

- Smart Gene Selection: Automatically identifies the top 10 most significant genes based on p-value and fold change
- Repulsion Algorithm: Uses force-directed positioning to prevent text labels from overlapping
- Customizable: Configurable thresholds, label styling, and positioning options
- Multiple Output Formats: PNG, PDF, and SVG support

Installation

```bash
pip install pandas matplotlib numpy scipy
```

Usage

Basic Usage

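```python
from volcano_plot_labeler import label_volcano_plot
import pandas as pd

# Load data
df = pd.read_csv("differential_expression_results.csv")

# Generate the labeled volcano plot
fig = label_volcano_plot(
    df,
    log2fc_col="log2FoldChange",
    pvalue_col="padj",
    gene_col="gene_name",
    top_n=10,
)
fig.savefig("volcano_plot_labeled.png", dpi=300, bbox_inches="tight")
```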

Advanced Usage

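```python
from volcano_plot_labeler import label_volcano_plot

# df is the DataFrame loaded in Basic Usage
fig = label_volcano_plot(
    df,
    log2fc_col="log2FoldChange",
    pvalue_col="padj",
    gene_col="gene_name",
    top_n=10,
    pvalue_threshold=0.05,
    log2fc_threshold=1.0,
    figsize=(12, 10),
    repulsion_iterations=100,
    repulsion_force=0.05,
    label_fontsize=10,
    label_color="black",
    arrow_color="gray",
    save_path="output.png",
)
```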

Command Line Usage

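```bash
python scripts/main.py \
  --input data/deseq2_results.csv \
  --output volcano_labeled.png \
  --log2fc-col log2FoldChange \
  --pvalue-col padj \
  --gene-col gene_name \
  --top-n 10
```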
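The command-line flags map naturally onto Python's `argparse`. The sketch below is a hypothetical reconstruction of how `scripts/main.py` might declare them; `build_parser`, `parse_args`, the defaults, and the help strings are assumptions inferred from the examples, not taken from the actual script.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented scripts/main.py flags."""
    parser = argparse.ArgumentParser(description="Label top genes on a volcano plot.")
    parser.add_argument("--input", required=True, help="CSV/TSV of differential expression results")
    parser.add_argument("--output", required=True, help="Output image path (PNG, PDF, or SVG)")
    # Column-name defaults are assumptions copied from the usage examples.
    parser.add_argument("--log2fc-col", default="log2FoldChange", help="Log2 fold-change column")
    parser.add_argument("--pvalue-col", default="padj", help="Adjusted or raw p-value column")
    parser.add_argument("--gene-col", default="gene_name", help="Gene identifier column")
    parser.add_argument("--top-n", type=int, default=10, help="Number of genes to label")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argv=None makes argparse read sys.argv[1:]
    return build_parser().parse_args(argv)
```

`argparse` turns hyphens in option names into underscores, so `--log2fc-col` is available as `args.log2fc_col`, matching the keyword argument used in the Python API examples.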

Input Format

Expected CSV/TSV columns:

- `log2FoldChange`: log2 fold-change values
- `padj` or `pvalue`: adjusted or raw p-values
- `gene_name`: gene identifiers
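A minimal standard-library sketch of loading such a table and choosing the p-value column. It assumes `padj` is preferred over `pvalue` when both are present (the document does not say which wins); `read_de_table` is a hypothetical helper, and the tool itself lists pandas as a dependency and may load data differently.

```python
import csv
from typing import Dict, List, TextIO, Tuple


def read_de_table(
    stream: TextIO,
    delimiter: str = ",",  # pass "\t" for TSV input
    log2fc_col: str = "log2FoldChange",
    gene_col: str = "gene_name",
) -> Tuple[List[Dict[str, str]], str]:
    """Return the table rows plus the name of the p-value column to use."""
    reader = csv.DictReader(stream, delimiter=delimiter)
    columns = reader.fieldnames or []  # reading fieldnames consumes the header row
    missing = [c for c in (log2fc_col, gene_col) if c not in columns]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    # Prefer adjusted p-values when both columns exist (an assumption).
    if "padj" in columns:
        p_col = "padj"
    elif "pvalue" in columns:
        p_col = "pvalue"
    else:
        raise ValueError("need a 'padj' or 'pvalue' column")
    return list(reader), p_col
```

Values come back as strings, so numeric columns still need converting with `float()` before scoring.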

Algorithm

Significance Calculation

1. Calculate `-log10(pvalue)` for all genes.
2. Rank genes by the combined score `|log2FC| * -log10(pvalue)`.
3. Select the top N genes (`top_n`) with the highest combined score.
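The three steps above can be sketched in plain Python. `top_genes_by_score` is a hypothetical helper; flooring p-values at the smallest positive double and skipping missing values are assumptions about edge cases the document does not cover.

```python
import math
import sys
from typing import List, Optional, Sequence


def top_genes_by_score(
    genes: Sequence[str],
    log2fc: Sequence[float],
    pvalues: Sequence[Optional[float]],
    top_n: int = 10,
) -> List[str]:
    """Return the top_n genes ranked by |log2FC| * -log10(p)."""
    scored = []
    for gene, fc, p in zip(genes, log2fc, pvalues):
        # Skip genes with missing statistics (e.g. padj reported as NA/NaN).
        if p is None or math.isnan(p) or math.isnan(fc):
            continue
        # Floor p at the smallest positive normal double so that p == 0
        # yields a large finite score instead of math.log10(0) raising.
        neg_log_p = -math.log10(max(p, sys.float_info.min))
        scored.append((abs(fc) * neg_log_p, gene))
    # Highest score first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [gene for _, gene in scored[:top_n]]
```

For example, `top_genes_by_score(["A", "B", "C"], [3.0, -4.0, 0.5], [1e-10, 1e-3, 1e-20], top_n=2)` returns `["A", "B"]` (scores of about 30, 12, and 10). Note that the product rewards a very small p-value even when the fold change is modest, which is why gene C still outranks weaker candidates.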

Repulsion Algorithm

1. Initial Placement: Place each label at its gene's coordinates.
2. Force Calculation:
   - Repulsive force between overlapping labels
   - Spring force pulling each label toward its gene point
   - Boundary forces keeping labels within the plot area
3. Iterative Optimization: Update label positions for up to `repulsion_iterations` iterations, or until they converge.
4. Arrow Drawing: Draw leader lines connecting each label to its gene point.
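A minimal pure-Python sketch of the force loop described above, assuming labels are fixed-size boxes in data coordinates. `repel_labels` and its parameters are hypothetical (only `repulsion_iterations` and `repulsion_force` appear in the documented API, mirrored here as `iterations` and `repulsion`), the boundary force is simplified to a hard clamp, and this is not necessarily how the tool implements it.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def repel_labels(
    anchors: List[Point],
    box_w: float,
    box_h: float,
    bounds: Tuple[float, float, float, float],  # (xmin, xmax, ymin, ymax)
    iterations: int = 100,
    repulsion: float = 0.05,
    spring: float = 0.01,
    tol: float = 1e-6,
) -> List[Point]:
    """Return label centres after force-directed repulsion (illustrative sketch)."""
    xmin, xmax, ymin, ymax = bounds
    # Step 1: initial placement, each label centred on its gene point.
    pos = [[x, y] for x, y in anchors]
    n = len(pos)
    for _ in range(iterations):
        forces = []
        for i in range(n):
            fx = fy = 0.0
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                # Repulsion acts only while the two label boxes overlap.
                if abs(dx) < box_w and abs(dy) < box_h:
                    dist = math.hypot(dx, dy)
                    if dist == 0.0:
                        # Coincident centres: push apart horizontally by index order.
                        ux, uy = (1.0 if i > j else -1.0), 0.0
                    else:
                        ux, uy = dx / dist, dy / dist
                    fx += repulsion * ux
                    fy += repulsion * uy
            # Spring force pulling the label back toward its own gene point.
            fx += spring * (anchors[i][0] - pos[i][0])
            fy += spring * (anchors[i][1] - pos[i][1])
            forces.append((fx, fy))
        max_move = 0.0
        for i, (fx, fy) in enumerate(forces):
            # Hard clamp keeps each box inside the plot area (stands in for a boundary force).
            nx = min(max(pos[i][0] + fx, xmin + box_w / 2), xmax - box_w / 2)
            ny = min(max(pos[i][1] + fy, ymin + box_h / 2), ymax - box_h / 2)
            max_move = max(max_move, abs(nx - pos[i][0]), abs(ny - pos[i][1]))
            pos[i] = [nx, ny]
        if max_move < tol:  # converged: no label moved noticeably
            break
    return [(x, y) for x, y in pos]
```

Forces are computed for all labels before any position is updated, so the result does not depend on label order. Because repulsion only acts while boxes overlap, labels settle roughly where they stop touching, and the spring constant controls how far they drift from their genes. The returned centres could then be passed as `xytext` to matplotlib's `Axes.annotate`, with `arrowprops` drawing the leader line back to `xy`, the gene point.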

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `df` | DataFrame | - | Input data |
| `log2fc_col` | | | |

Output

- Labeled volcano plot with:
  - Color-coded points (up-regulated / down-regulated / not significant)
  - Top 10 gene labels with leader lines
  - Non-overlapping text labels
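The up/down/not-significant color coding can be sketched as a simple threshold rule. This assumes the `pvalue_threshold` and `log2fc_threshold` parameters from Advanced Usage drive the classification with strict inequalities; `classify_points` is a hypothetical helper, and the tool's exact cut-off rules may differ.

```python
from typing import List, Sequence


def classify_points(
    log2fc: Sequence[float],
    pvalues: Sequence[float],
    pvalue_threshold: float = 0.05,
    log2fc_threshold: float = 1.0,
) -> List[str]:
    """Assign each gene to 'up', 'down', or 'ns' (not significant) for coloring."""
    labels = []
    for fc, p in zip(log2fc, pvalues):
        if p < pvalue_threshold and fc > log2fc_threshold:
            labels.append("up")
        elif p < pvalue_threshold and fc < -log2fc_threshold:
            labels.append("down")
        else:
            labels.append("ns")
    return labels
```

Genes with a NaN p-value or fold change fall through to `"ns"`, because every comparison involving NaN evaluates to false.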

License

MIT

Risk Assessment

| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | | |

Security Checklist

- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (`../`)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no `../` traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
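Two checklist items (validated input paths and an output directory restricted to the workspace) can be enforced with `pathlib` by resolving a user-supplied path and confirming it stays under a workspace root. `resolve_in_workspace` is a hypothetical helper, not part of the tool; a minimal sketch:

```python
from pathlib import Path


def resolve_in_workspace(workspace: str, user_path: str) -> Path:
    """Resolve user_path relative to workspace, rejecting paths that escape it."""
    root = Path(workspace).resolve()
    # Joining an absolute user_path replaces root entirely, and resolve()
    # collapses "../" segments and follows existing symlinks, so both
    # cases are caught by the containment check below.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes workspace: {user_path}")
    return candidate
```

The same check can be applied to both `--input` and `--output` before any file is opened.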

Prerequisites

```bash
# Python dependencies
pip install -r requirements.txt
```

Evaluation Criteria

Success Metrics

- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable

Test Cases

1. Basic Functionality: Standard input → Expected output
2. Edge Case: Invalid input → Graceful error handling
3. Performance: Large dataset → Acceptable processing time

Lifecycle Status

- Current Stage: Draft
- Next Review Date: 2026-03-06
- Known Issues: None
- Planned Improvements:
  - Performance optimization
  - Additional feature support


volcano-plot-labeler: volcano plot gene annotation