SMILES De-salter
ID: 176
Batch process chemical structure strings, removing salt ion portions and retaining only the active core.
When to Use
- Use this skill when the task requires batch processing of chemical SMILES strings to remove salt ions and retain only the active core.
- Use this skill for data analysis tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.
Key Features
- Scope-focused workflow aligned to: Analyze data with smiles-de-salter using a reproducible workflow, explicit validation, and structured outputs for review-ready interpretation.
- Packaged executable path(s): scripts/main.py.
- Reference material available in references/ for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
Dependencies
- Python >= 3.8
- rdkit >= 2022.03.1
Example Usage
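See ## Usage below for related details.
bash
cd "20260318/scientific-skills/Data Analytics/smiles-de-salter"
python -m py_compile scripts/main.py
python scripts/main.py --help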
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file CONFIG block or documented parameters if the script uses fixed settings.
3. Run python scripts/main.py with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.
Implementation Details
See ## Workflow below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: scripts/main.py.
- Reference guidance: references/ contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
Quick Check
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
bash
python -m py_compile scripts/main.py
Audit-Ready Commands
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
bash
python -m py_compile scripts/main.py
python scripts/main.py --help
python scripts/main.py -s "CCO.[Na+]"
Workflow
1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
Function Description
This skill processes chemical SMILES strings: it automatically identifies and removes counterions and retains only the active pharmaceutical ingredient (API).
Salt Ion Identification Rules
- Identify multiple components through the . separator
- Salt ions are usually smaller ions (such as Na⁺, Cl⁻, K⁺, Br⁻, etc.)
- Retain the component with the most atoms as the core
- Support common inorganic salts and organic acid salts
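The rules above can be sketched in a few lines of plain Python. This is a rough illustration under stated assumptions, not the code in scripts/main.py: the helper names rough_atom_count, split_components, and pick_core are hypothetical, and the regular expression only approximates atom counting (the documented workflow relies on RDKit for real parsing).

```python
import re
from typing import List

# Approximate SMILES atom tokens: bracket atoms, two-letter halogens,
# the remaining organic-subset atoms, and aromatic lowercase atoms.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def rough_atom_count(fragment: str) -> int:
    """Count atom tokens in one component; implicit hydrogens are not counted."""
    return len(ATOM_TOKEN.findall(fragment))


def split_components(smiles: str) -> List[str]:
    """Split a SMILES string on the '.' separator, dropping empty parts."""
    return [part for part in smiles.split(".") if part]


def pick_core(smiles: str) -> str:
    """Return the component with the most atoms, or '' for empty input."""
    parts = split_components(smiles)
    return max(parts, key=rough_atom_count) if parts else ""


print(pick_core("CCO.[Na+]"))  # CCO
```

When two components tie on atom count, max() keeps the first one, which is an arbitrary choice in this sketch.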
Supported Salt Types
| Type | Examples |
|---|---|
| Inorganic salts | NaCl, KCl, HCl, H₂SO₄ |
| Organic acid salts | Citrate, Tartrate, Maleate |
| Quaternary ammonium salts | Various quaternary ammonium compounds |
Usage
Command Line
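text
python -m py_compile scripts/main.py
Example invocation: python scripts/main.py -i input.csv -o output.csv -c smiles_column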
Parameter Description
| Parameter | Short | Description | Default |
|---|---|---|---|
| --input | -i | Input file path (CSV/TSV/SMILES) | Required |
| --output | -o | Output file path | desalted_output.csv |
| --column | -c | SMILES column name | smiles |
| --keep-largest | -k | Keep largest component (by atom count) | True |
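For readers who want to see how the table might map onto code, here is a minimal argparse sketch. It only mirrors the four documented parameters; how scripts/main.py actually declares them is not shown in this document, so the build_parser name and the handling of --keep-largest are assumptions. The -s option used in the single-processing example is left out because its long form is not documented.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented parameter table."""
    parser = argparse.ArgumentParser(
        prog="main.py", description="Remove salt ions from SMILES strings."
    )
    parser.add_argument("-i", "--input", required=True,
                        help="Input file path (CSV/TSV/SMILES)")
    parser.add_argument("-o", "--output", default="desalted_output.csv",
                        help="Output file path")
    parser.add_argument("-c", "--column", default="smiles",
                        help="SMILES column name")
    # Assumption: a store_true flag whose default is already True, so passing
    # -k does not change behaviour; the real script may handle this differently.
    parser.add_argument("-k", "--keep-largest", action="store_true", default=True,
                        help="Keep largest component (by atom count)")
    return parser


args = build_parser().parse_args(["-i", "input.csv", "-c", "smiles_column"])
print(args.input, args.output, args.column, args.keep_largest)
```

With these arguments the unspecified options fall back to the defaults listed in the table.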
Single Processing Example
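text
python scripts/main.py -s "CC(C)CN1C(=O)N(C)C(=O)C2=C1N=CN2C.[Na+]"
Output: CC(C)CN1C(=O)N(C)C(=O)C2=C1N=CN2C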
Input Format
CSV/TSV Files
csv
id,smiles,name
1,CCO.[Na+],ethanol_sodium
2,c1ccccc1.[Cl-],benzene_hcl
Pure SMILES Files
One SMILES string per line:
CCO.[Na+]
c1ccccc1.[Cl-]
Output Format
Output file contains original data and new processing result columns:
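csv
id,smiles,name,desalted_smiles,status
1,CCO.[Na+],ethanol_sodium,CCO,success
2,c1ccccc1.[Cl-],benzene_hcl,c1ccccc1,success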
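As an illustration of the output format, the sketch below appends desalted_smiles and status columns to CSV text using only the standard library. The function names are hypothetical, toy_desalt is a demo-only stand-in for the real desalting step, and the "failed" status label is an assumption (the document only shows "success").

```python
import csv
import io
from typing import Callable


def toy_desalt(smiles: str) -> str:
    """Demo-only stand-in: drop components that are a single bracket atom, e.g. [Na+]."""
    parts = [p for p in smiles.split(".") if p]
    kept = [p for p in parts
            if not (p.startswith("[") and p.endswith("]") and p.count("[") == 1)]
    return ".".join(kept)


def add_result_columns(csv_text: str, column: str = "smiles",
                       desalt: Callable[[str], str] = toy_desalt) -> str:
    """Append desalted_smiles and status columns to every row of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + ["desalted_smiles", "status"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        core = desalt(row.get(column) or "")
        row["desalted_smiles"] = core
        # "success" comes from the documented output; "failed" is an assumed label.
        row["status"] = "success" if core else "failed"
        writer.writerow(row)
    return out.getvalue()


sample = (
    "id,smiles,name\n"
    "1,CCO.[Na+],ethanol_sodium\n"
    "2,c1ccccc1.[Cl-],benzene_hcl\n"
)
result = add_result_columns(sample)
print(result)
```

For TSV input the same approach would need delimiter="\t" on both the reader and the writer.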
Install Dependencies
text
pip install rdkit pandas
Processing Logic
1. Parse SMILES: Use RDKit to parse the input string
2. Component Splitting: Identify multiple molecular components separated by .
3. Core Identification:
   - Default selects the component with the most atoms
   - Optional: based on molecular weight, ring count, etc.
4. Output Result: Return the clean core SMILES
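The four steps above could look roughly like this with RDKit. It is a minimal sketch, assuming the default largest-by-atom-count rule; the function name desalt_smiles is hypothetical, and scripts/main.py may be organized differently.

```python
from typing import Optional

from rdkit import Chem


def desalt_smiles(smiles: str) -> Optional[str]:
    """Return the SMILES of the largest component, or None if parsing fails."""
    # Step 1: parse the input string (returns None for invalid SMILES).
    mol = Chem.MolFromSmiles(smiles)
    if mol is None or mol.GetNumAtoms() == 0:
        return None
    # Step 2: split into disconnected components, each as its own molecule.
    fragments = Chem.GetMolFrags(mol, asMols=True)
    # Step 3: pick the component with the most heavy (non-hydrogen) atoms.
    core = max(fragments, key=lambda frag: frag.GetNumHeavyAtoms())
    # Step 4: write the core back out as canonical SMILES.
    return Chem.MolToSmiles(core)


print(desalt_smiles("CCO.[Na+]"))  # CCO
```

Note that Chem.MolToSmiles returns canonical SMILES, so for ring systems the output text can differ from the input spelling (for example, aromatic lowercase atoms instead of alternating double bonds) while describing the same molecule.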
Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If scripts/main.py fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
Examples
Example 1: Simple Inorganic Salt
Input: CCO.[Na+]
Output: CCO
Example 2: HCl Salt
Input: CN1C=NC2=C1C(=O)N(C)C(=O)N2C.Cl
Output: CN1C=NC2=C1C(=O)N(C)C(=O)N2C
Example 3: Complex Organic Salt
Input: CC(C)CN1C(=O)N(C)C(=O)C2=C1N=CN2C.C(C(=O)O)C(CC(=O)O)(C(=O)O)O
Output: CC(C)CN1C(=O)N(C)C(=O)C2=C1N=CN2C (retains the larger xanthine component, a caffeine analog with 17 heavy atoms, over citric acid with 13)
Notes
1. This tool assumes the core is the component with the most atoms
2. For co-crystals or multi-component drugs, manual review may be needed
3. Some hydrochloride salts may exist as [Cl-] or Cl
4. It is recommended to sample and verify results
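One way to act on notes 2 and 3 is a lookup that treats both spellings of a counterion the same way and flags multi-component leftovers for review. This is a different technique from the largest-component rule and is not described as part of scripts/main.py; the KNOWN_COUNTERIONS list and the function name are illustrative assumptions.

```python
from typing import Tuple

# Illustrative, hand-picked list; both neutral and charged spellings are included.
KNOWN_COUNTERIONS = frozenset({"Cl", "[Cl-]", "Br", "[Br-]", "[Na+]", "[K+]"})


def strip_known_counterions(smiles: str) -> Tuple[str, bool]:
    """Return (remaining SMILES, needs_review).

    needs_review is True when more than one component survives (possible
    co-crystal or multi-component drug) or when nothing survives.
    """
    parts = [p for p in smiles.split(".") if p]
    kept = [p for p in parts if p not in KNOWN_COUNTERIONS]
    return ".".join(kept), len(kept) != 1


for example in ("CN1C=NC2=C1C(=O)N(C)C(=O)N2C.Cl",
                "CN1C=NC2=C1C(=O)N(C)C(=O)N2C.[Cl-]",
                "CCO.CCN.Cl"):
    print(strip_known_counterions(example))
```

The first two inputs give the same core with no review flag, while the third keeps two components and is flagged for manual review.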
Author
OpenClaw Skill Hub
Version
v1.0.0
Risk Assessment
| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
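The path-related checklist items (no ../ traversal, output restricted to the workspace) can be checked with a small helper like the one below. It is a sketch under assumptions: resolve_inside is a hypothetical name, and the document does not say how scripts/main.py validates paths.

```python
import tempfile
from pathlib import Path


def resolve_inside(workspace: Path, user_path: str) -> Path:
    """Resolve user_path against workspace and reject anything that escapes it."""
    root = workspace.resolve()
    candidate = (root / user_path).resolve()
    try:
        candidate.relative_to(root)  # raises ValueError when outside root
    except ValueError:
        raise ValueError(f"path escapes workspace: {user_path!r}") from None
    return candidate


workspace = Path(tempfile.mkdtemp())
print(resolve_inside(workspace, "data/input.csv").name)  # input.csv
try:
    resolve_inside(workspace, "../outside.csv")
except ValueError as exc:
    print("rejected:", exc)
```

Resolving before comparing means absolute paths and symlink tricks are rejected along with literal ../ segments. relative_to is used rather than Path.is_relative_to so the sketch also runs on Python 3.8.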
Prerequisites
Install rdkit and pandas as described under Install Dependencies; no other Python packages are required.
Evaluation Criteria
Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
Test Cases
1. Basic Functionality: Standard input → Expected output
2. Edge Case: Invalid input → Graceful error handling
3. Performance: Large dataset → Acceptable processing time
Lifecycle Status
- Current Stage: Draft
- Next Review Date: 2026-03-06
- Known Issues: None
- Planned Improvements:
  - Performance optimization
  - Additional feature support
Output Requirements
Every final response should make these items explicit when they are relevant:
- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks
Input Validation
This skill accepts requests that match the documented purpose of smiles-de-salter and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
smiles-de-salter only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
Inputs to Collect
- Required inputs: the user goal, the primary data or source file, and the requested output format.
- Optional inputs: output directory, formatting preferences, and validation constraints.
- If a required input is unavailable, return a short clarification request before continuing.
Output Contract
- Return a short summary, the main deliverables, and any assumptions that materially affect interpretation.
- If execution is partial, label what succeeded, what failed, and the next safe recovery step.
- Keep the final answer within the documented scope of the skill.
Validation and Safety Rules
- Validate identifiers, file paths, and user-provided parameters before execution.
- Do not fabricate results, metrics, citations, or downstream conclusions.
- Use safe fallback behavior when dependencies, credentials, or required inputs are missing.
- Surface any execution failure with a concise diagnosis and recovery path.
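SMILES De-salter
ID: 176
Batch process chemical structure strings, removing salt ion portions and retaining only the active core.
When to Use
- Use this skill when the task requires batch processing of chemical SMILES strings to remove salt ions and retain only the active core.
- Use this skill for data analysis tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.
Key Features
- Scope-focused workflow aligned to: Analyze data with smiles-de-salter using a reproducible workflow, explicit validation, and structured outputs for review-ready interpretation.
- Packaged executable path(s): scripts/main.py.
- Reference material available in references/ for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
Dependencies
- Python >= 3.8
- rdkit >= 2022.03.1
Example Usage
See ## Usage below for related details.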
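bash
cd "20260318/scientific-skills/Data Analytics/smiles-de-salter"
python -m py_compile scripts/main.py
python scripts/main.py --help
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file CONFIG block or documented parameters if the script uses fixed settings.
3. Run python scripts/main.py with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.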
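Implementation Details
See ## Workflow below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: scripts/main.py.
- Reference guidance: references/ contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
Quick Check
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
bash
python -m py_compile scripts/main.py
Audit-Ready Commands
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
bash
python -m py_compile scripts/main.py
python scripts/main.py --help
python scripts/main.py -s "CCO.[Na+]"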
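Workflow
1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
Function Description
This skill processes chemical SMILES strings: it automatically identifies and removes counterions and retains only the active pharmaceutical ingredient (API).
Salt Ion Identification Rules
- Identify multiple components through the . separator
- Salt ions are usually smaller ions (such as Na⁺, Cl⁻, K⁺, Br⁻, etc.)
- Retain the component with the most atoms as the core
- Support common inorganic salts and organic acid salts
Supported Salt Types
| Type | Examples |
|---|---|
| Inorganic salts | NaCl, KCl, HCl, H₂SO₄ |
| Organic acid salts | Citrate, Tartrate, Maleate |
| Quaternary ammonium salts | Various quaternary ammonium compounds |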
Usage
Command Line
text
python -m py_compile scripts/main.py
Example invocation: python scripts/main.py -i input.csv -o output.csv -c smiles_column
Parameter Description
| Parameter | Short | Description | Default |
|---|---|---|---|
| --input | -i | Input file path (CSV/TSV/SMILES) | Required |
| --output | -o | Output file path | desalted_output.csv |
| --column | -c | SMILES column name | smiles |
| --keep-largest | -k | Keep largest component (by atom count) | True |
Single Processing Example
text
python scripts/main.py -s "CC(C)CN1C(=O)N(C)C(=O)C2=C1N=CN2C.[Na+]"
Output: CC(C)CN1C(=O)N(C)C(=O)C2=C1N=CN2C
Input Format
CSV/TSV Files
csv
id,smiles,name
1,CCO.[Na+],ethanol_sodium
2,c1ccccc1.[Cl-],benzene_hcl
Pure SMILES Files
One SMILES string per line:
CCO.[Na+]
c1ccccc1.[Cl-]
Output Format
Output file contains original data and new processing result columns:
csv
id,smiles,name,desalted_smiles,status
1,CCO.[Na+],ethanol_sodium,CCO,success
2,c1ccccc1.[Cl-],benzene_hcl,c1ccccc1,success
Install Dependencies
text
pip install rdkit pandas
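Processing Logic
1. Parse SMILES: Use RDKit to parse the input string
2. Component Splitting: Identify multiple molecular components separated by .
3. Core Identification:
   - Default selects the component with the most atoms
   - Optional: based on molecular weight, ring count, etc.
4. Output Result: Return the clean core SMILES
Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If scripts/main.py fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.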
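Examples
Example 1: Simple Inorganic Salt
Input: CCO.[Na+]
Output: CCO
Example 2: HCl Salt
Input: CN1C=NC2=C1C(=O)N(C)C(=O)N2C.Cl
Output: CN1C=NC2=C1C(=O)N(C)C(=O)N2C
Example 3: Complex Organic Salt
Input: CC(C)CN1C(=O)N(C)C(=O)C2=C1N=CN2C.C(C(=O)O)C(CC(=O)O)(C(=O)O)O
Output: CC(C)CN1C(=O)N(C)C(=O)C2=C1N=CN2C (retains the larger xanthine component, a caffeine analog with 17 heavy atoms, over citric acid with 13)
Notes
1. This tool assumes the core is the component with the most atoms
2. For co-crystals or multi-component drugs, manual review may be needed
3. Some hydrochloride salts may exist as [Cl-] or Cl
4. It is recommended to sample and verify results
Author
OpenClaw Skill Hub
Version
v1.0.0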
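Risk Assessment
| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
Prerequisites
Install rdkit and pandas as described under Install Dependencies; no other Python packages are required.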
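Evaluation Criteria
Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
Test Cases
1. Basic Functionality: Standard input → Expected output
2. Edge Case: Invalid input → Graceful error handling
3. Performance: Large dataset → Acceptable processing time
Lifecycle Status
- Current Stage: Draft
- Next Review Date: 2026-03-06
- Known Issues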