Protein-Ligand Docking

Use this skill for research questions such as:

- "Can ligand X plausibly bind protein Y?"
- "Is this inhibitor likely to be selective between bacterial and human homologs?"
- "Should we continue to docking, or is sequence/structure divergence already too large?"

Keep the workflow practical. If an early step already rules out a meaningful docking analysis, stop and explain why instead of forcing the full pipeline.

Inputs To Collect First

Ask for or infer:

- target protein name and species
- ligand name and available structure format
- whether the user wants a quick feasibility screen or a fuller workflow
- whether an experimental structure already exists

Useful concrete inputs:

- UniProt ID or protein sequence
- ligand SDF or SMILES
- known PDB ID, if available
- comparison target, if this is a selectivity question

Workflow

1. Sequence Retrieval

- Retrieve the target sequence from UniProt when the user provides a protein name or UniProt ID.
- Save FASTA files with clear names because later scripts depend on them.
- If the question is about selectivity, retrieve both sequences before moving on.
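The retrieval and naming step can be sketched as follows. This is a minimal sketch, not the skill's own script: the function names and the `<role>_<species>_<accession>.fasta` naming scheme are assumptions made for illustration. The URL pattern is UniProt's REST endpoint for FASTA downloads.

```python
import re
import urllib.request
from pathlib import Path

UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fasta_url(accession: str) -> str:
    """Build the UniProt REST URL for one accession."""
    return UNIPROT_FASTA_URL.format(accession=accession.strip().upper())


def fasta_filename(accession: str, species: str, role: str = "target") -> str:
    """Hypothetical naming scheme: role, species slug, accession."""
    slug = re.sub(r"[^a-z0-9]+", "_", species.lower()).strip("_")
    return f"{role}_{slug}_{accession.strip().upper()}.fasta"


def fetch_fasta(accession: str, species: str, role: str = "target",
                out_dir: str = ".") -> Path:
    """Download one sequence and save it under a descriptive name."""
    path = Path(out_dir) / fasta_filename(accession, species, role)
    with urllib.request.urlopen(fasta_url(accession), timeout=30) as resp:
        text = resp.read().decode("utf-8")
    if not text.startswith(">"):
        # A FASTA record always begins with a ">" header line.
        raise ValueError(f"unexpected response for {accession}")
    path.write_text(text)
    return path
```

For a selectivity question, calling `fetch_fasta` once with `role="target"` and once with `role="comparison"` keeps the two files easy to tell apart in later steps.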

2. Structure Search

- Search RCSB PDB for experimentally solved structures first.
- Prefer structures with a relevant ligand, catalytic domain, or biologically meaningful complex.
- If no suitable structure exists, plan to use AlphaFold or AlphaFold-Multimer in Colab.
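One way to make the preference order above explicit is to rank candidate structures once their metadata has been collected. The sketch below is illustrative only: the `Candidate` fields, the priority order (ligand first, then domain coverage, then resolution), and the PDB IDs in the usage note are assumptions, not part of the skill's scripts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    pdb_id: str
    resolution: Optional[float]  # angstroms; None when no resolution is reported
    has_relevant_ligand: bool
    covers_catalytic_domain: bool


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Sort so ligand-bound, domain-covering, better-resolved entries come first."""
    def key(c: Candidate):
        res = c.resolution if c.resolution is not None else float("inf")
        # False sorts before True, so negate the properties we want first.
        return (not c.has_relevant_ligand, not c.covers_catalytic_domain, res)
    return sorted(candidates, key=key)


def choose_structure(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the top-ranked candidate, or None to signal a fall-back to modeling."""
    ranked = rank_candidates(candidates)
    return ranked[0] if ranked else None
```

With placeholder entries such as `Candidate("1ABC", 1.8, False, True)` and `Candidate("2DEF", 2.4, True, True)`, `choose_structure` returns `2DEF`: a bound relevant ligand outweighs the better resolution under this ordering. An empty candidate list returns `None`, which corresponds to planning an AlphaFold run.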

3. Sequence Conservation Check

When the question involves homolog comparison, run scripts/step3_alignment.py.

- High similarity suggests the binding region may be conserved and docking can be informative.
- Borderline similarity means docking may still help, but interpretation must stay cautious.
- Very low similarity can support an early "binding pocket likely not conserved" conclusion.

Detailed interpretation thresholds live in references/decision-guide.md.
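To show what a similarity number means here, the sketch below computes percent identity from a simple global alignment. It is a stand-in written for illustration and is not claimed to match scripts/step3_alignment.py: the scoring values are arbitrary assumptions, and a real comparison would normally use a substitution matrix such as BLOSUM62 through an alignment library. No similarity thresholds are hard-coded, since those belong to references/decision-guide.md.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical columns divided by alignment length (gap columns included)."""
    aligned_a, aligned_b = global_align(a, b)
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)
```

Identity over the whole sequence can hide a divergent binding pocket, so when pocket residues are known it is worth applying the same calculation to just those alignment columns.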

4. Structure Modeling

Use AlphaFold-Multimer only when a suitable experimental structure is missing and a complex model is still needed.

- The Colab template is in references/alphafoldmultimer_colab.ipynb.
- Include only biologically relevant chains.
- Tell the user clearly when this step requires manual Colab execution.

5. Model Quality Assessment

Before docking an AlphaFold-derived structure, run scripts/step5pae_analysis.py.

Focus on two questions:

- Is the fold itself credible enough to use?
- Is the interface or predicted docking region reliable enough to interpret?

If interface confidence is poor, stop and say docking would likely be misleading.
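The two questions map onto different blocks of the predicted aligned error (PAE) matrix, which AlphaFold reports in angstroms: the diagonal blocks describe confidence within each chain, and the off-diagonal blocks describe confidence in how the chains sit relative to each other. The sketch below assumes a two-chain model whose PAE matrix has already been loaded as a square list of lists, with chain A first. The function names are hypothetical, and this is not claimed to be what scripts/step5pae_analysis.py does.

```python
def block_mean(matrix, rows, cols):
    """Mean of the sub-matrix selected by the given row and column indices."""
    values = [matrix[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


def pae_summary(pae, len_chain_a):
    """Summarise intra-chain and inter-chain PAE for a two-chain model."""
    n = len(pae)
    if any(len(row) != n for row in pae):
        raise ValueError("PAE matrix must be square")
    if not 0 < len_chain_a < n:
        raise ValueError("len_chain_a must split the matrix into two chains")
    a = range(0, len_chain_a)
    b = range(len_chain_a, n)
    return {
        "intra_a": block_mean(pae, a, a),
        "intra_b": block_mean(pae, b, b),
        # PAE is not symmetric, so average both off-diagonal blocks.
        "inter_chain": (block_mean(pae, a, b) + block_mean(pae, b, a)) / 2,
    }
```

Low intra-chain values combined with a high inter-chain value suggest that each fold may be usable while the interface placement is not, which is the case where docking into the interface would likely mislead. The cut-offs for "low" and "high" are left to references/decision-guide.md.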

6. Docking

Run scripts/step6vina_docking.py when all of the following are true:

- the receptor structure is usable
- the ligand structure is available
- the docking box is justified by structure or interface analysis

Prefer docking settings derived from the modeled or known interaction region, not arbitrary whole-protein boxes.
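A box derived from the interaction region can be sketched as the bounding box of the selected pocket or interface atoms plus a margin. The example below assumes the coordinates have already been extracted as (x, y, z) tuples in angstroms; the 5 Å default padding and the function names are illustrative assumptions. The `receptor`, `ligand`, `center_*`, and `size_*` keys are standard AutoDock Vina configuration options.

```python
def docking_box(coords, padding=5.0):
    """Return (center, size) of a box enclosing the coordinates plus padding."""
    if not coords:
        raise ValueError("at least one coordinate is required")
    mins = [min(c[k] for c in coords) for k in range(3)]
    maxs = [max(c[k] for c in coords) for k in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    # Padding is added on both sides of every axis.
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size


def vina_config(receptor, ligand, center, size):
    """Render a Vina configuration file body for the given box."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    for axis, c, s in zip("xyz", center, size):
        lines.append(f"center_{axis} = {c:.3f}")
        lines.append(f"size_{axis} = {s:.3f}")
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file and passed to Vina with `--config`. Recording which residues defined the box alongside the coordinates makes the justification for the search space easy to report later.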

7. Report The Result

Use scripts/step7summary_report.py when the user wants a structured deliverable.

The final answer should cover:

- binding affinity range, not just the single best score
- whether the pose lands in a biologically meaningful region
- whether the structure quality supports interpretation
- what the main uncertainty is
- what experimental validation would best test the claim
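Reporting a range rather than the single best score can be done by reading every pose from the docking output. The sketch below relies on the `REMARK VINA RESULT:` lines that AutoDock Vina writes into its output PDBQT, where the first number is the predicted affinity in kcal/mol. The function names and the returned dictionary layout are assumptions for illustration, not the format used by scripts/step7summary_report.py.

```python
def vina_affinities(pdbqt_text):
    """Collect the affinity (kcal/mol) of every pose in a Vina output file."""
    scores = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # Fields: REMARK VINA RESULT: <affinity> <rmsd l.b.> <rmsd u.b.>
            scores.append(float(line.split()[3]))
    return scores


def affinity_range(pdbqt_text):
    """Summarise the best and worst pose scores; more negative is stronger."""
    scores = vina_affinities(pdbqt_text)
    if not scores:
        raise ValueError("no Vina result lines found")
    return {
        "n_poses": len(scores),
        "best": min(scores),
        "worst": max(scores),
    }
```

A narrow range across poses that occupy different sites is itself a warning sign, so the range is best read together with where each pose lands.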

Decision Rules

Use these rules during execution:

- Do not treat docking as proof of binding.
- Do not continue if the structure or interface confidence is clearly too poor.
- Do not over-interpret small score differences across targets.
- If the user only needs a quick answer, stop once the evidence is sufficient.
- For biomedical research, always separate computational plausibility from experimental validation.

Thresholds, QC checks, and result wording guidance are in references/decision-guide.md.
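The rule about small score differences can be made mechanical by requiring an explicit tolerance before any selectivity statement is made. In the sketch below the function name and return labels are hypothetical, and the tolerance is deliberately a required argument so that the value comes from references/decision-guide.md rather than from this example.

```python
def compare_targets(best_a, best_b, tolerance):
    """Compare best docking scores (kcal/mol) for one ligand on two targets.

    More negative scores mean stronger predicted binding. Differences
    smaller than the tolerance are reported as inconclusive.
    """
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    delta = best_a - best_b
    if abs(delta) < tolerance:
        return "inconclusive"
    return "favors_a" if delta < 0 else "favors_b"
```

For example, with an illustrative tolerance of 1.0, scores of -8.1 and -7.9 give `"inconclusive"`. Even a `"favors_a"` result is only a computational hint and should be worded as such.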

Expected Outputs

Depending on the stage reached, provide some or all of:

- FASTA files for targets
- selected PDB IDs or modeled structures
- alignment summary JSON
- model quality JSON with grid box coordinates
- docking summary JSON
- a short written conclusion in plain language
- optional Summary.md, Summary.docx, and figure output

Dependencies

This skill may rely on:

- UniProt and RCSB web access
- Google Colab for AlphaFold-Multimer
- Python 3 plus Biopython, NumPy, RDKit, OpenBabel, and py3Dmol
- AutoDock Vina in WSL or Linux

Installation notes and recommended thresholds are in references/decision-guide.md.

Limits To State Explicitly

Always warn the user about the main limits:

- docking scores are approximate, not definitive
- static docking ignores induced fit and many solvent effects
- AlphaFold confidence does not guarantee a correct ligand-binding geometry
- experimental assays remain the standard for validation

