Protein-Ligand Docking
Use this skill for research questions such as:
- "Can ligand X plausibly bind protein Y?"
- "Is this inhibitor likely to be selective between bacterial and human homologs?"
- "Should we continue to docking, or is sequence/structure divergence already too large?"
Keep the workflow practical. If an early step already rules out a meaningful docking analysis, stop and explain why instead of forcing the full pipeline.
Inputs To Collect First
Ask for or infer:
- target protein name and species
- ligand name and available structure format
- whether the user wants a quick feasibility screen or a fuller workflow
- whether an experimental structure already exists
Useful concrete inputs:
- UniProt ID or protein sequence
- ligand SDF or SMILES
- known PDB ID, if available
- comparison target, if this is a selectivity question
Workflow
1. Sequence Retrieval
- Retrieve the target sequence from UniProt when the user provides a protein name or UniProt ID.
- Save FASTA files with clear names because later scripts depend on them.
- If the question is about selectivity, retrieve both sequences before moving on.
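A minimal sketch of the retrieval step, assuming the public UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/<accession>.fasta`. The helper names and the `<role>_<species>_<accession>.fasta` naming scheme are illustrative assumptions, not the convention the skill's own scripts necessarily expect.

```python
import urllib.request
from pathlib import Path

UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fasta_url(accession: str) -> str:
    """Build the UniProt REST URL for one accession."""
    return UNIPROT_FASTA_URL.format(accession=accession.strip().upper())


def fasta_filename(accession: str, species: str, role: str = "target") -> str:
    """Hypothetical naming scheme: <role>_<species>_<accession>.fasta."""
    safe_species = "_".join(species.lower().split())
    return f"{role}_{safe_species}_{accession.strip().upper()}.fasta"


def fetch_fasta(accession: str, species: str, role: str = "target",
                out_dir: str = ".") -> Path:
    """Download one FASTA record and save it under a descriptive name."""
    with urllib.request.urlopen(fasta_url(accession), timeout=30) as response:
        text = response.read().decode("utf-8")
    # A valid FASTA record starts with a '>' header line.
    if not text.startswith(">"):
        raise ValueError(f"No FASTA record returned for {accession}")
    out_path = Path(out_dir) / fasta_filename(accession, species, role)
    out_path.write_text(text)
    return out_path
```

For a selectivity question, the same function could be called twice, once with `role="target"` and once with `role="comparison"`, so both files are on disk before the alignment step.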
2. Structure Search
- Search RCSB PDB for experimentally solved structures first.
- Prefer structures with a relevant ligand, catalytic domain, or biologically meaningful complex.
- If no suitable structure exists, plan to use AlphaFold or AlphaFold-Multimer in Colab.
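The preference order above can be expressed as a sort key once candidate structures have been collected. This is only a sketch: the `Candidate` fields and the exact tie-breaking order (domain coverage, then relevant ligand, then resolution) are assumptions made for illustration, and the flags would have to be filled in by inspecting each PDB entry.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One experimentally solved structure under consideration (hypothetical fields)."""
    pdb_id: str
    resolution: Optional[float]  # angstroms; None when the method reports none (e.g. NMR)
    has_relevant_ligand: bool
    covers_binding_domain: bool


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Sort so that the most docking-relevant structures come first."""
    def sort_key(c: Candidate):
        resolution = c.resolution if c.resolution is not None else float("inf")
        # False sorts before True, so negate the "good" flags.
        return (not c.covers_binding_domain, not c.has_relevant_ligand, resolution)
    return sorted(candidates, key=sort_key)
```

An empty candidate list is the signal to fall back to a predicted model.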
3. Sequence Conservation Check
When the question involves homolog comparison, run scripts/step3_alignment.py.
- High similarity suggests the binding region may be conserved and docking can be informative.
- Borderline similarity means docking may still help, but interpretation must stay cautious.
- Very low similarity can support an early "binding pocket likely not conserved" conclusion.
Detailed interpretation thresholds live in references/decision-guide.md.
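Global identity can hide divergence in the pocket itself, so it may help to compute identity twice: over the whole alignment and over the binding-site columns only. The sketch below assumes the two sequences are already aligned (equal length, `-` for gaps) and that the pocket column indices are known; it is not a description of what `scripts/step3_alignment.py` actually does, and it deliberately applies no thresholds.

```python
from typing import Iterable, Optional


def percent_identity(aligned_a: str, aligned_b: str,
                     columns: Optional[Iterable[int]] = None) -> float:
    """Percent identity over alignment columns.

    Columns where both sequences have a gap are skipped; a gap against a
    residue counts as a mismatch. `columns` restricts the calculation to
    selected 0-based alignment columns (e.g. binding-site positions).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("Aligned sequences must have the same length")
    indices = range(len(aligned_a)) if columns is None else columns
    compared = 0
    matches = 0
    for i in indices:
        a, b = aligned_a[i], aligned_b[i]
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

Comparing the two numbers against the thresholds in references/decision-guide.md keeps a high global identity from masking a divergent pocket.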
4. Structure Modeling
Use AlphaFold-Multimer only when a suitable experimental structure is missing and a complex model is still needed.
5. Model Quality Assessment
Before docking an AlphaFold-derived structure, run scripts/step5pae_analysis.py.
Focus on two questions:
- Is the fold itself credible enough to use?
- Is the interface or predicted docking region reliable enough to interpret?
If interface confidence is poor, stop and say docking would likely be misleading.
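One way to quantify the second question is the mean predicted aligned error (PAE) between two residue groups, for example the two chains of a complex or the pocket residues against the rest of the fold. The sketch below assumes the PAE matrix has already been loaded as a square list of lists in angstroms and that residue indices are 0-based; the function name is illustrative and no cutoff is implied here.

```python
from typing import List, Sequence


def mean_pae_between(pae: List[List[float]],
                     group_a: Sequence[int],
                     group_b: Sequence[int]) -> float:
    """Average PAE over all residue pairs between two groups.

    PAE is not symmetric, so both pae[i][j] and pae[j][i] are included.
    """
    values = []
    for i in group_a:
        for j in group_b:
            values.append(pae[i][j])
            values.append(pae[j][i])
    if not values:
        raise ValueError("Both residue groups must be non-empty")
    return sum(values) / len(values)
```

A low within-chain value combined with a high between-group value is the pattern that suggests a credible fold but an unreliable interface.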
6. Docking
Run scripts/step6vina_docking.py when all of the following are true:
- the receptor structure is usable
- the ligand structure is available
- the docking box is justified by structure or interface analysis
Prefer docking settings derived from the modeled or known interaction region, not arbitrary whole-protein boxes.
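As an illustration of a box derived from the interaction region, the sketch below builds a bounding box around the coordinates of the chosen pocket or interface atoms, adds padding, and renders an AutoDock Vina configuration file. The 5 Å padding default and the function names are assumptions for this example; the keys written (`receptor`, `ligand`, `center_*`, `size_*`, `exhaustiveness`) are standard Vina config options.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def box_from_coords(coords: Iterable[Point], padding: float = 5.0) -> Tuple[Point, Point]:
    """Return (center, size) of a padded bounding box around the given atoms."""
    xs, ys, zs = zip(*coords)
    center = tuple((max(axis) + min(axis)) / 2 for axis in (xs, ys, zs))
    size = tuple((max(axis) - min(axis)) + 2 * padding for axis in (xs, ys, zs))
    return center, size


def vina_config(receptor: str, ligand: str, center: Point, size: Point,
                exhaustiveness: int = 8) -> str:
    """Render the text of an AutoDock Vina config file."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    for axis, value in zip("xyz", center):
        lines.append(f"center_{axis} = {value:.3f}")
    for axis, value in zip("xyz", size):
        lines.append(f"size_{axis} = {value:.3f}")
    lines.append(f"exhaustiveness = {exhaustiveness}")
    return "\n".join(lines) + "\n"
```

Recording which residues defined the box alongside the config makes the choice auditable when the result is reported.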
7. Report The Result
Use scripts/step7summary_report.py when the user wants a structured deliverable.
The final answer should cover:
- binding affinity range, not just the single best score
- whether the pose lands in a biologically meaningful region
- whether the structure quality supports interpretation
- what the main uncertainty is
- what experimental validation would best test the claim
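To report a range rather than a single score, the per-pose affinities can be read from the `REMARK VINA RESULT:` lines that AutoDock Vina writes into its output PDBQT file. This is a small sketch with illustrative function names; it is not the logic of `scripts/step7summary_report.py`.

```python
from typing import Dict, List


def parse_vina_affinities(pdbqt_text: str) -> List[float]:
    """Collect the affinity (kcal/mol) of every pose in a Vina output file."""
    affinities = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # Fields after the label: affinity, RMSD lower bound, RMSD upper bound.
            affinities.append(float(line.split()[3]))
    return affinities


def summarize_affinities(affinities: List[float]) -> Dict[str, float]:
    """Summarize the score range; more negative means a better predicted score."""
    if not affinities:
        raise ValueError("No Vina result lines found")
    best = min(affinities)
    worst = max(affinities)
    return {
        "n_poses": len(affinities),
        "best": best,
        "worst": worst,
        "spread": round(worst - best, 3),
    }
```

A narrow spread across poses that land in different sites is itself a reason for caution and belongs in the stated uncertainty.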
Decision Rules
Use these rules during execution:
- Do not treat docking as proof of binding.
- Do not continue if the structure or interface confidence is clearly too poor.
- Do not over-interpret small score differences across targets.
- If the user only needs a quick answer, stop once the evidence is sufficient.
- For biomedical research, always separate computational plausibility from experimental validation.
Thresholds, QC checks, and result wording guidance are in references/decision-guide.md.
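The stop-early behaviour can be sketched as an ordered list of gates, where the first failing gate ends the run and supplies the explanation. The gate names and the `run_gates` helper are hypothetical; the actual pass/fail criteria would come from references/decision-guide.md.

```python
from typing import Callable, Dict, List, Tuple

# (gate name, check returning True when it is safe to continue, reason shown on failure)
Gate = Tuple[str, Callable[[], bool], str]


def run_gates(gates: List[Gate]) -> Dict[str, object]:
    """Evaluate gates in order and stop at the first one that fails."""
    passed: List[str] = []
    for name, check, reason in gates:
        if not check():
            return {"proceed": False, "stopped_at": name,
                    "reason": reason, "passed": passed}
        passed.append(name)
    return {"proceed": True, "stopped_at": None, "reason": None, "passed": passed}
```

Because later checks are never evaluated after a failure, expensive steps such as modeling or docking can be wrapped as later gates without being run unnecessarily.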
Expected Outputs
Depending on the stage reached, provide some or all of:
- FASTA files for targets
- selected PDB IDs or modeled structures
- alignment summary JSON
- model quality JSON with grid box coordinates
- docking summary JSON
- a short written conclusion in plain language
- optional Summary.md, Summary.docx, and figure output
Dependencies
This skill may rely on:
- UniProt and RCSB web access
- Google Colab for AlphaFold-Multimer
- Python 3 plus Biopython, NumPy, RDKit, OpenBabel, and py3Dmol
- AutoDock Vina in WSL or Linux
Installation notes and recommended thresholds are in references/decision-guide.md.
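A quick pre-flight check can report which of these dependencies are present before any step is run. The sketch below uses the usual import names for each package (`Bio`, `numpy`, `rdkit`, `openbabel`, `py3Dmol`) and assumes the Vina executable is on `PATH` under the name `vina`; adjust that name if the local install differs.

```python
import importlib.util
import shutil
from typing import Dict, List

# Display name -> importable module name.
PYTHON_MODULES = {
    "Biopython": "Bio",
    "NumPy": "numpy",
    "RDKit": "rdkit",
    "OpenBabel": "openbabel",
    "py3Dmol": "py3Dmol",
}


def check_dependencies(vina_executable: str = "vina") -> Dict[str, bool]:
    """Return availability of each Python package and the Vina binary."""
    status = {
        name: importlib.util.find_spec(module) is not None
        for name, module in PYTHON_MODULES.items()
    }
    status["AutoDock Vina"] = shutil.which(vina_executable) is not None
    return status


def missing_dependencies(status: Dict[str, bool]) -> List[str]:
    """List the names of dependencies that were not found."""
    return [name for name, available in status.items() if not available]
```

Not every stage needs every package, so a missing entry only matters if the workflow reaches the step that uses it.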
Limits To State Explicitly
Always warn the user about the main limits:
- docking scores are approximate, not definitive
- static docking ignores induced fit and many solvent effects
- AlphaFold confidence does not guarantee a correct ligand-binding geometry
- experimental assays remain the standard for validation