Chemistry Query Agent v1.4.1

Overview

Full-stack chemistry toolkit combining PubChem data retrieval with RDKit molecule processing, visualization, analysis, retrosynthesis, and synthesis planning. All outputs are structured JSON for easy downstream chaining. Generates PNG/SVG images on demand.

Key capabilities:

- PubChem compound lookup (info, structure, synthesis refs, similarity search)
- RDKit molecular properties (MW, logP, TPSA, HBD/HBA, rotatable bonds, aromatic rings)
- 2D molecule visualization (PNG/SVG)
- BRICS retrosynthesis with recursive depth control
- Multi-step synthesis route planning
- Forward reaction simulation with SMARTS templates
- Morgan fingerprints and similarity/substructure search
- 21 named reaction templates (Suzuki, Heck, Grignard, Wittig, Diels-Alder, etc.)

Quick Start

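```bash
# PubChem compound info
exec python scripts/query_pubchem.py --compound aspirin --type info

# Compute molecular properties from SMILES
exec python scripts/rdkit_mol.py --smiles "CC(=O)Oc1ccccc1C(=O)O" --action props

# Retrosynthesis
exec python scripts/rdkit_mol.py --target "CC(=O)Oc1ccccc1C(=O)O" --action retro --depth 2

# Full chain (name → props + draw + retro)
exec python scripts/chain_entry.py --input-json '{"name": "caffeine", "context": "user"}'
```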

Scripts

`scripts/query_pubchem.py`

PubChem REST API queries with automatic name→CID resolution and timeout handling.

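```
--compound <name|CID> --type <info|structure|synthesis|similar> [--format smiles|inchi|image|json] [--threshold 80]
```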

- info: Formula, MW, IUPAC name, InChIKey (JSON)
- structure: SMILES, InChI, image URL, or full JSON
- synthesis: Synonyms/references for a compound
- similar: Similar compounds by 2D fingerprint (top 20)
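
The name→CID resolution and the `info` fields listed above correspond to requests against PubChem's PUG REST interface. The following is a minimal sketch of how such requests could be built, not the script's actual code: the helper names `cid_lookup_url`, `info_url`, and `fetch_json` are hypothetical, and the 15-second default timeout is an assumption taken from the value mentioned in the changelog.

```python
import json
import urllib.parse
import urllib.request

PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(name: str) -> str:
    """Build the PUG REST URL that resolves a compound name to CIDs."""
    quoted = urllib.parse.quote(name, safe="")  # escape spaces, slashes, etc.
    return f"{PUG_BASE}/compound/name/{quoted}/cids/JSON"


def info_url(cid: int) -> str:
    """Build the URL for the properties reported by the info query type."""
    props = "MolecularFormula,MolecularWeight,IUPACName,InChIKey"
    return f"{PUG_BASE}/compound/cid/{int(cid)}/property/{props}/JSON"


def fetch_json(url: str, timeout: float = 15.0) -> dict:
    """GET a URL and decode the JSON body; return an error dict on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError) as exc:  # network errors, timeouts, bad JSON
        return {"status": "error", "error": str(exc)}
```

Calling `fetch_json(cid_lookup_url("aspirin"))` would perform the lookup over the network; the URL builders themselves can be checked offline.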

`scripts/rdkit_mol.py`

RDKit cheminformatics engine. Resolves names via PubChem automatically.

```
--smiles <SMILES> --action <action>
```

| Action | Description | Key Args |
|--------|-------------|----------|
| props | MW, logP, TPSA, HBD, HBA, rotB, aromRings | `--smiles` |
| draw | | |
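
The `props` action reports MW, logP, TPSA, HBD, HBA, rotatable bonds, and aromatic ring count. A minimal sketch of how those values can be computed with RDKit is shown here; the function name `compute_props` and the output keys are hypothetical, and the error dictionary only imitates the graceful JSON error behaviour described in this document rather than reproducing the script's real output.

```python
def compute_props(smiles: str) -> dict:
    """Return basic descriptors for a SMILES string, or a JSON-style error."""
    try:
        from rdkit import Chem
        from rdkit.Chem import Crippen, Descriptors, Lipinski, rdMolDescriptors
    except ImportError:
        return {"status": "error", "error": "RDKit is not installed"}

    # MolFromSmiles returns None for unparsable input; empty input is
    # rejected explicitly because it would otherwise yield an empty molecule.
    mol = Chem.MolFromSmiles(smiles) if smiles else None
    if mol is None:
        return {"status": "error", "error": f"Invalid SMILES: {smiles!r}"}

    return {
        "status": "success",
        "mw": round(Descriptors.MolWt(mol), 2),
        "logp": round(Crippen.MolLogP(mol), 2),
        "tpsa": round(rdMolDescriptors.CalcTPSA(mol), 2),
        "hbd": Lipinski.NumHDonors(mol),
        "hba": Lipinski.NumHAcceptors(mol),
        "rotb": Lipinski.NumRotatableBonds(mol),
        "arom_rings": rdMolDescriptors.CalcNumAromaticRings(mol),
    }
```

Returning a dictionary in both the success and failure cases keeps the result serializable with `json.dumps`, which suits the JSON-only output convention of the toolkit.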

`scripts/chain_entry.py`

Standard agent chain interface. Accepts {"smiles": "...", "context": "..."} or {"name": "...", "context": "..."}. Returns unified JSON with props, visualization, and retrosynthesis.

```bash
python scripts/chain_entry.py --input-json '{"name": "sotorasib", "context": "user"}'
```

Output schema:
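```json
{
  "agent": "chemistry-query",
  "version": "1.4.0",
  "smiles": "<canonical SMILES>",
  "status": "success|error",
  "report": {"props": {...}, "draw": {...}, "retro": {...}},
  "risks": [],
  "viz": ["path/to/image.png"],
  "recommend_next": ["pharmacology", "toxicology"],
  "confidence": 0.95,
  "warnings": [],
  "timestamp": "ISO8601"
}
```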
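
A downstream agent may want to check that a result carries every field of the output schema before using it. The sketch below builds and checks such an envelope; `make_envelope`, `missing_keys`, and `REQUIRED_KEYS` are hypothetical names, and the field values are assumptions based only on the schema documented in this section.

```python
import json
from datetime import datetime, timezone

# Top-level keys of the documented output schema.
REQUIRED_KEYS = {
    "agent", "version", "smiles", "status", "report", "risks",
    "viz", "recommend_next", "confidence", "warnings", "timestamp",
}


def make_envelope(smiles, report, viz=(), warnings=(), confidence=0.95, ok=True):
    """Assemble a result dictionary with the documented top-level keys."""
    return {
        "agent": "chemistry-query",
        "version": "1.4.0",
        "smiles": smiles,
        "status": "success" if ok else "error",
        "report": dict(report),
        "risks": [],
        "viz": list(viz),
        "recommend_next": ["pharmacology", "toxicology"],
        "confidence": confidence,
        "warnings": list(warnings),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def missing_keys(envelope: dict) -> list:
    """Return the schema keys absent from an envelope, sorted by name."""
    return sorted(REQUIRED_KEYS - set(envelope))


if __name__ == "__main__":
    env = make_envelope("CCO", {"props": {}, "draw": {}, "retro": {}})
    print(json.dumps(env, indent=2))
    print("missing:", missing_keys(env))
```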

`scripts/templates.json`

21 named reaction templates with SMARTS, expected yields, conditions, and references. Includes: Suzuki, Heck, Buchwald-Hartwig, Grignard, Wittig, Diels-Alder, Click, Sonogashira, Negishi, and more.

Chaining

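1. Name → Full Profile: `chain_entry.py` with `{"name": "ibuprofen"}` → props + draw + retro
2. Chemistry → Pharmacology: Output feeds directly into `pharma-pharmacology-agent`
3. Retro + Viz: Get precursors, then draw each one
4. Suzuki Test: `--action react --reactants "c1ccccc1Br" "c1ccccc1B(O)O" --smarts "[c:1][Br:2].[c:3]B(O)O>>[c:1][c:3]"`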
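
The chaining patterns above amount to invoking `chain_entry.py` with a JSON payload and reading the JSON it prints. A minimal sketch of a caller follows; `build_chain_command`, `run_chain`, and `next_agents` are hypothetical helpers, the 60-second timeout is an arbitrary assumption, and the script is assumed to write a single JSON object to standard output.

```python
import json
import subprocess
import sys


def build_chain_command(payload: dict, script: str = "scripts/chain_entry.py") -> list:
    """Build an argv list; no shell is involved, so no quoting is needed."""
    return [sys.executable, script, "--input-json", json.dumps(payload)]


def run_chain(payload: dict, timeout: float = 60.0) -> dict:
    """Run the chain script and parse its stdout as JSON."""
    try:
        proc = subprocess.run(
            build_chain_command(payload),
            capture_output=True, text=True, timeout=timeout, check=False,
        )
        return json.loads(proc.stdout)
    except (OSError, subprocess.TimeoutExpired, ValueError) as exc:
        return {"status": "error", "warnings": [str(exc)]}


def next_agents(result: dict) -> list:
    """Return the recommended follow-up agents of a successful result."""
    if result.get("status") != "success":
        return []
    return list(result.get("recommend_next", []))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with SMILES characters such as parentheses and brackets.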

Tested With

All features verified end-to-end with RDKit 2024.03+:

| Molecule | SMILES | Tests Passed |
|----------|--------|--------------|
| Caffeine | `CN1C=NC2=C1C(=O)N(C(=O)N2C)C` | info, structure, props, draw, retro, plan, chain |
| Aspirin | `CC(=O)Oc1ccccc1C(=O)O` | info, structure, props, draw, retro, plan, chain |
| Sotorasib | PubChem name lookup | info, structure, props, draw, retro, chain |
| Ibuprofen | PubChem name lookup | info, structure, props, chain |
| Invalid SMILES | `XXXINVALID` | Graceful JSON error |
| Empty input | `{}` | Graceful JSON error |

Resources

- `references/api_endpoints.md`: PubChem API endpoint reference and rate limits
- `scripts/rdkit_reaction.py`: Legacy reaction module
- `scripts/chembl_query.py`, `scripts/pubmed_search.py`, `scripts/admet_predict.py`: Additional query modules

`scripts/advanced_chem.py`

Advanced cheminformatics engine with 6 Tier 1 capabilities.

```
--action <action> --smiles <SMILES> [options]
```

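| Action | Description | Key Args |
|--------|-------------|----------|
| standardize | Salt stripping, charge normalization, tautomer enumeration | `--smiles` |
| descriptors | 217+ molecular descriptors (full RDKit set), QED, SA Score, Lipinski/Veber rules | `--smiles --descriptor_set all\|druglike\|physical\|topological` |
| scaffold | Murcko scaffold extraction, generic scaffolds, diversity analysis, R-group decomposition | `--smiles` or `--target_smiles smi1,smi2,... --rgroup_core` |
| mcs | Maximum common substructure across 2+ molecules | `--target_smiles smi1,smi2,...` |
| mmpa | Matched molecular pair analysis: find single-point transformations | `--target_smiles smi1,smi2,...` |
| chemspace | Chemical space visualization (PCA/t-SNE/UMAP scatter plot PNG) | `--target_smiles smi1,smi2,... --method pca\|tsne\|umap --output plot.png` |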

Examples:
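```bash
# Standardize a salt form
python scripts/advanced_chem.py --action standardize --smiles "[Na+].CC(=O)[O-]"

# Full descriptors (217+)
python scripts/advanced_chem.py --action descriptors --smiles "CC(=O)Oc1ccccc1C(=O)O" --descriptor_set all

# Scaffold diversity of a set
python scripts/advanced_chem.py --action scaffold --target_smiles "CC(=O)Oc1ccccc1C(=O)O,CN1C=NC2=C1C(=O)N(C(=O)N2C)C,CC(C)Cc1ccc(cc1)C(C)C(=O)O"
```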
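
The extended descriptor set includes Lipinski and Veber rule checks. As an illustration of what such checks involve, the sketch below applies the commonly cited thresholds (Lipinski: MW ≤ 500, logP ≤ 5, HBD ≤ 5, HBA ≤ 10; Veber: rotatable bonds ≤ 10, TPSA ≤ 140 Å²) to a dictionary of descriptor values. The function names and dictionary keys are hypothetical, and whether `advanced_chem.py` uses exactly these cutoffs is an assumption.

```python
def lipinski_violations(p: dict) -> list:
    """List the Lipinski rule-of-five criteria that a molecule violates."""
    rules = [
        ("MW > 500", p["mw"] > 500),
        ("logP > 5", p["logp"] > 5),
        ("HBD > 5", p["hbd"] > 5),
        ("HBA > 10", p["hba"] > 10),
    ]
    return [name for name, broken in rules if broken]


def passes_veber(p: dict) -> bool:
    """Veber criteria: at most 10 rotatable bonds and TPSA of at most 140."""
    return p["rotb"] <= 10 and p["tpsa"] <= 140


if __name__ == "__main__":
    # Illustrative input values only, not output of the toolkit.
    example = {"mw": 720.0, "logp": 6.2, "hbd": 2, "hba": 12,
               "rotb": 14, "tpsa": 150.0}
    print(lipinski_violations(example))
    print(passes_veber(example))
```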

Changelog

v2.0.0 (2026-02-28)

- NEW: `advanced_chem.py` with 6 Tier 1 cheminformatics capabilities
  - Molecular Standardization & Tautomer Enumeration (salt stripping, charge normalization, canonical tautomers)
  - Extended Descriptors (217+ RDKit descriptors, QED, SA Score, Lipinski, Veber)
  - Scaffold Analysis (Murcko, generic scaffolds, diversity ratio, R-group decomposition)
  - Maximum Common Substructure (rdFMCS with coverage per molecule)
  - Matched Molecular Pair Analysis (rdMMPA fragmentation, transformation detection)
  - Chemical Space Visualization (PCA/t-SNE/UMAP with matplotlib scatter plots)
- Dependencies: scikit-learn, matplotlib (added)

v1.4.1 (2026-02-25)

- Security hardening: input sanitization for all subprocess calls (SMILES, compound names, output paths)
- Added `_sanitize_input()`: length limits, null-byte rejection for all user inputs
- Added `_sanitize_output_path()`: prevents path traversal, restricts extensions, blocks arbitrary file writes
- Added shell metacharacter rejection in INLINECODE40
- Added SMILES validation via RDKit in `chem_ui.py` before subprocess calls
- Added compound input validation in `query_pubchem.py` (length/null-byte checks)
- Added timeout to `resolve_target()` PubChem subprocess call
- Addresses VirusTotal "suspicious" classification for argument injection vectors
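
The checks named in this release (length limits, null-byte rejection, path traversal prevention, extension restriction, shell metacharacter rejection) can be sketched as follows. This is not the code of `_sanitize_input()` or `_sanitize_output_path()`: the function names below, the 2000-character limit, the allowed extensions, and the metacharacter set are all assumptions chosen for illustration.

```python
import os

MAX_INPUT_LEN = 2000                    # hypothetical length limit
ALLOWED_EXTENSIONS = {".png", ".svg"}   # assumed from the PNG/SVG outputs
SHELL_METACHARACTERS = set(";&|`\n\r")  # hypothetical, deliberately narrow


def sanitize_input(value: str, max_len: int = MAX_INPUT_LEN) -> str:
    """Reject empty, oversized, or null-byte-containing input."""
    if not isinstance(value, str) or not value:
        raise ValueError("input must be a non-empty string")
    if len(value) > max_len:
        raise ValueError("input exceeds length limit")
    if "\x00" in value:
        raise ValueError("input contains a null byte")
    return value


def has_shell_metacharacters(value: str) -> bool:
    """True if the value contains a character from the rejected set."""
    return any(ch in SHELL_METACHARACTERS for ch in value)


def sanitize_output_path(path: str, base_dir: str = ".") -> str:
    """Resolve path under base_dir; reject traversal and odd extensions."""
    sanitize_input(path)
    if os.path.splitext(path)[1].lower() not in ALLOWED_EXTENSIONS:
        raise ValueError("output extension not allowed")
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, path))
    # The resolved target must stay inside the base directory.
    if os.path.commonpath([base, target]) != base:
        raise ValueError("output path escapes the base directory")
    return target
```

The metacharacter set is kept narrow because SMILES strings legitimately contain characters such as parentheses, brackets, `=`, and `#`.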

v1.4.0 (2026-02-14)

- Fixed PubChem SMILES/InChI endpoint (`property/CanonicalSMILES/TXT`)
- Fixed `chain_entry.py` HTML entity corruption
- Fixed `brics_retro` to handle `BRICSDecompose` string output correctly
- Added request timeouts (15s) to all PubChem calls
- Graceful error handling for invalid SMILES and empty input
- Updated chain output version and schema
- Comprehensive end-to-end testing

v1.3.0

- RDKit props NoneType fixes, invalid SMILES graceful errors
- React fix: `ReactionFromSmarts` import
- Name resolution via PubChem for all RDKit actions

v1.2.0

- BRICS retrosynthesis + 21 reaction templates library
- Multi-step synthesis planning

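Chemistry Query Agent v1.4.1

Overview

Full-stack chemistry toolkit combining PubChem data retrieval with RDKit molecule processing, visualization, analysis, retrosynthesis, and synthesis planning. All outputs are structured JSON for easy downstream chaining. Generates PNG/SVG images on demand.

Key capabilities:

- PubChem compound lookup (info, structure, synthesis refs, similarity search)
- RDKit molecular properties (MW, logP, TPSA, HBD/HBA, rotatable bonds, aromatic rings)
- 2D molecule visualization (PNG/SVG)
- BRICS retrosynthesis with recursive depth control
- Multi-step synthesis route planning
- Forward reaction simulation with SMARTS templates
- Morgan fingerprints and similarity/substructure search
- 21 named reaction templates (Suzuki, Heck, Grignard, Wittig, Diels-Alder, etc.)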

Quick Start

```bash
# PubChem compound info
exec python scripts/query_pubchem.py --compound aspirin --type info

# Compute molecular properties from SMILES
exec python scripts/rdkit_mol.py --smiles "CC(=O)Oc1ccccc1C(=O)O" --action props

# Retrosynthesis
exec python scripts/rdkit_mol.py --target "CC(=O)Oc1ccccc1C(=O)O" --action retro --depth 2

# Full chain (name → props + draw + retro)
exec python scripts/chain_entry.py --input-json '{"name": "caffeine", "context": "user"}'
```

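Scripts

`scripts/query_pubchem.py`

PubChem REST API queries with automatic name→CID resolution and timeout handling.

```
--compound <name|CID> --type <info|structure|synthesis|similar> [--format smiles|inchi|image|json] [--threshold 80]
```

- info: Formula, MW, IUPAC name, InChIKey (JSON)
- structure: SMILES, InChI, image URL, or full JSON
- synthesis: Synonyms/references for a compound
- similar: Similar compounds by 2D fingerprint (top 20)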

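`scripts/rdkit_mol.py`

RDKit cheminformatics engine. Resolves names via PubChem automatically.

```
--smiles <SMILES> --action <action>
```

| Action | Description | Key Args |
|--------|-------------|----------|
| props | MW, logP, TPSA, HBD, HBA, rotB, aromRings | `--smiles` |
| draw | | |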

`scripts/chain_entry.py`

Standard agent chain interface. Accepts `{"smiles": "...", "context": "..."}` or `{"name": "...", "context": "..."}`. Returns unified JSON with props, visualization, and retrosynthesis.

```bash
python scripts/chain_entry.py --input-json '{"name": "sotorasib", "context": "user"}'
```

Output schema:
```json
{
  "agent": "chemistry-query",
  "version": "1.4.0",
  "smiles": "<canonical SMILES>",
  "status": "success|error",
  "report": {"props": {...}, "draw": {...}, "retro": {...}},
  "risks": [],
  "viz": ["path/to/image.png"],
  "recommend_next": ["pharmacology", "toxicology"],
  "confidence": 0.95,
  "warnings": [],
  "timestamp": "ISO8601"
}
```

`scripts/templates.json`

21 named reaction templates with SMARTS, expected yields, conditions, and references. Includes: Suzuki, Heck, Buchwald-Hartwig, Grignard, Wittig, Diels-Alder, Click, Sonogashira, Negishi, and more.

Chaining

1. Name → Full Profile: `chain_entry.py` with `{"name": "ibuprofen"}` → props + draw + retro
2. Chemistry → Pharmacology: Output feeds directly into `pharma-pharmacology-agent`
3. Retro + Viz: Get precursors, then draw each one
4. Suzuki Test: `--action react --reactants "c1ccccc1Br" "c1ccccc1B(O)O" --smarts "[c:1][Br:2].[c:3]B(O)O>>[c:1][c:3]"`

Tested With

All features verified end-to-end with RDKit 2024.03+:

| Molecule | SMILES | Tests Passed |
|----------|--------|--------------|
| Caffeine | `CN1C=NC2=C1C(=O)N(C(=O)N2C)C` | info, structure, props, draw, retro, plan, chain |
| Aspirin | `CC(=O)Oc1ccccc1C(=O)O` | info, structure, props, draw, retro, plan, chain |
| Sotorasib | PubChem name lookup | info, structure, props, draw, retro, chain |
| Ibuprofen | PubChem name lookup | info, structure, props, chain |
| Invalid SMILES | `XXXINVALID` | Graceful JSON error |
| Empty input | `{}` | Graceful JSON error |

Resources

- `references/api_endpoints.md`: PubChem API endpoint reference and rate limits
- `scripts/rdkit_reaction.py`: Legacy reaction module
- `scripts/chembl_query.py`, `scripts/pubmed_search.py`, `scripts/admet_predict.py`: Additional query modules

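`scripts/advanced_chem.py`

Advanced cheminformatics engine with 6 Tier 1 capabilities.

```
--action <action> --smiles <SMILES> [options]
```

| Action | Description | Key Args |
|--------|-------------|----------|
| standardize | Salt stripping, charge normalization, tautomer enumeration | `--smiles` |
| descriptors | 217+ molecular descriptors (full RDKit set), QED, SA Score, Lipinski/Veber rules | `--smiles --descriptor_set all\|druglike\|physical\|topological` |
| scaffold | Murcko scaffold extraction, generic scaffolds, diversity analysis, R-group decomposition | `--smiles` or `--target_smiles smi1,smi2,... --rgroup_core` |
| mcs | Maximum common substructure across 2+ molecules | `--target_smiles smi1,smi2,...` |
| mmpa | Matched molecular pair analysis: find single-point transformations | `--target_smiles smi1,smi2,...` |
| chemspace | Chemical space visualization (PCA/t-SNE/UMAP scatter plot PNG) | `--target_smiles smi1,smi2,... --method pca\|tsne\|umap --output plot.png` |

Examples:
```bash
# Standardize a salt form
python scripts/advanced_chem.py --action standardize --smiles "[Na+].CC(=O)[O-]"

# Full descriptors (217+)
python scripts/advanced_chem.py --action descriptors --smiles "CC(=O)Oc1ccccc1C(=O)O" --descriptor_set all

# Scaffold diversity of a set
python scripts/advanced_chem.py --action scaffold --target_smiles "CC(=O)Oc1ccccc1C(=O)O,CN1C=NC2=C1C(=O)N(C(=O)N2C)C,CC(C)Cc1ccc(cc1)C(C)C(=O)O"

# MCS of aspirin and salicylic acid
```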
