Ask-More
Consult multiple LLMs in parallel, then merge answers into a structured diff report: consensus, unique insights, conflicts, and actionable next steps.
Configuration
Config file: skills/ask-more/config.yaml
If it is missing, copy it from skills/ask-more/config.example.yaml and guide the user through setup (see § First-Use Setup).
Required
- models: a list of at least two model IDs, each from a provider already configured in the gateway
Optional
- presets: named model groups (see § Presets)
- synthesis_model: a separate model used for the diff merge (must NOT be in the models pool)
- enable_logging: log run results to logs/runs.jsonl
Validation
Run bash skills/ask-more/scripts/load-config.sh skills/ask-more to validate the config. The script parses the YAML properly and checks the model count, preset validity, and synthesis_model conflicts.
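The following Python sketch shows what the three checks amount to when applied to an already-parsed config dict. It is not the contents of load-config.sh. The function name validate_config, the error wording, and the rule that each preset also needs at least two models are all illustrative assumptions.

```python
from typing import Any


def validate_config(cfg: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    errors: list[str] = []
    models = cfg.get("models") or []
    # Model count: the default pool needs at least two models to compare.
    if len(models) < 2:
        errors.append(f"models: need at least 2 model IDs, got {len(models)}")
    # Preset validity (assumed rule): each preset is a list of at least two model IDs.
    for name, members in (cfg.get("presets") or {}).items():
        if not isinstance(members, list) or len(members) < 2:
            errors.append(f"presets.{name}: must be a list of at least 2 model IDs")
    # synthesis_model conflict: the merge model must not also be consulted.
    synth = cfg.get("synthesis_model")
    if synth and synth in models:
        errors.append(f"synthesis_model {synth!r} must not also appear in models")
    return errors
```

Reading the YAML file itself would need a parser such as PyYAML's yaml.safe_load. The sketch starts from a dict so that it stays dependency-free.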
First-Use Setup
When /ask is triggered but no models are configured:
1. List the providers already configured in the gateway.
2. Suggest 2-3 diverse models from different providers.
3. Offer to write config.yaml for them.
4. Do NOT block: get the user running in a single interaction.
Commands
| Command | Description |
|---|---|
| `/ask [question]` | Consult all configured models |
| `/ask` | Consult on the most recent conversation topic |
| `/ask --preset <name> [question]` | Use a named preset model group |
| `/ask --compare [question]` | Raw compare mode: show responses side by side, no synthesis |
Presets
If presets is configured in config.yaml, users can select a named group:
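```yaml
presets:
  fast:
    - google/gemini-2.5-flash
    - deepseek/deepseek-chat
  deep:
    - anthropic/claude-opus-4
    - openai/gpt-4o
    - google/gemini-2.5-pro
```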
When --preset <name> is used, override the default models with the preset's list for that run. If the preset doesn't exist, list available presets and abort.
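Here is a minimal sketch of the override rule. It assumes the config has already been parsed into a dict with models and presets keys. resolve_models is a hypothetical helper. For an unknown preset it raises an error that names the available presets, so the caller can show them and abort.

```python
from typing import Optional


def resolve_models(cfg: dict, preset: Optional[str] = None) -> list[str]:
    """Return the model list for one run: the named preset if given, else the default pool."""
    if preset is None:
        return list(cfg["models"])
    presets = cfg.get("presets") or {}
    if preset not in presets:
        # Unknown preset: report what exists so the caller can list it and abort.
        available = ", ".join(sorted(presets)) or "(none configured)"
        raise ValueError(f"Unknown preset {preset!r}. Available presets: {available}")
    # The preset replaces the default models for this run only; cfg itself is not modified.
    return list(presets[preset])


cfg = {
    "models": ["anthropic/claude-opus-4", "openai/gpt-4o"],
    "presets": {"fast": ["google/gemini-2.5-flash", "deepseek/deepseek-chat"]},
}
print(resolve_models(cfg, "fast"))
```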
Workflow
1. Parse trigger
- /ask <question> → use the provided question
- /ask (no args) → use the most recent user question or topic from the conversation. If that topic is ambiguous, ask the user to clarify instead of guessing.
- /ask --preset <name> ... → use the preset model group
- /ask --compare ... → raw compare mode (run step 6, then skip step 7a and go directly to step 7b)
Pre-flight check: If the question is trivially simple or empty (e.g., /ask hi, /ask ok), warn the user that multi-model consultation may not add value for this type of question, and confirm they want to proceed.
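The trigger rules and the pre-flight check can be sketched in Python as follows. Several details are assumptions made for illustration, not part of the skill's specification: the Trigger type, the grammar (flags must come before the question), and the three-word threshold in looks_trivial.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trigger:
    mode: str                # "normal" or "compare"
    preset: Optional[str]    # preset name, or None for the default models
    question: Optional[str]  # None means "use the most recent topic from the conversation"


def parse_trigger(text: str) -> Trigger:
    """Parse '/ask [--preset NAME] [--compare] [question]'; flags must come before the question."""
    tokens = text.split()
    if not tokens or tokens[0] != "/ask":
        raise ValueError("not an /ask command")
    mode, preset, i = "normal", None, 1
    while i < len(tokens) and tokens[i].startswith("--"):
        flag = tokens[i]
        if flag == "--compare":
            mode = "compare"
        elif flag == "--preset":
            if i + 1 >= len(tokens):
                raise ValueError("--preset needs a name")
            preset = tokens[i + 1]
            i += 1  # skip over the preset name
        else:
            raise ValueError(f"unknown flag {flag}")
        i += 1
    question = " ".join(tokens[i:]) or None
    return Trigger(mode, preset, question)


def looks_trivial(question: str, min_words: int = 3) -> bool:
    # Hypothetical pre-flight heuristic: inputs shorter than min_words (e.g. "hi", "ok") get a warning.
    return len(question.split()) < min_words
```

Splitting on whitespace and re-joining collapses repeated spaces in the question, which is harmless for this purpose.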
2. Privacy check (first use only)
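Read the config. If privacy_acknowledged is false, show:
⚠️ Privacy notice: ask-more sends a summary of your question to each of the model providers you have configured (e.g., OpenAI, Google, Anthropic). Do not use it in scenarios involving highly sensitive information.
By continuing, you acknowledge this.
Then set privacy_acknowledged: true in the config and proceed.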
3. Model availability pre-check
Before context packing, verify that configured models are likely reachable:
- Check that each model ID exists in the gateway's provider config
- If a model is not found, warn the user and suggest removing it
- If fewer than min_models are available, abort early with a clear message
This prevents users from waiting through the full flow only to discover models are misconfigured.
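The pre-check reduces to a set-membership test. This sketch assumes the gateway's provider config can be flattened into an iterable of model IDs. How to obtain that list depends on the gateway and is not shown. precheck_models and its return shape are hypothetical. A caller would warn about each entry in missing and suggest removing it from config.yaml.

```python
from typing import Iterable


def precheck_models(
    models: list[str], gateway_models: Iterable[str], min_models: int
) -> tuple[list[str], list[str]]:
    """Split configured models into (available, missing); abort early if too few remain."""
    known = set(gateway_models)
    available = [m for m in models if m in known]
    missing = [m for m in models if m not in known]
    if len(available) < min_models:
        raise RuntimeError(
            f"Only {len(available)} of {len(models)} configured models exist in the gateway "
            f"(need at least {min_models}). Missing: {', '.join(missing)}"
        )
    return available, missing
```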
4. Context Packing
Rewrite the user's question into a self-contained description:
- Summarize relevant background (what was discussed, constraints, requirements)
- Replace all pronouns/references ("this", "that plan") with concrete descriptions
- State the core question explicitly
- Write neutrally — do NOT inject the primary model's own opinion or framing
- Explicitly note any assumptions you introduced that weren't in the original conversation
Summary mode (default, context_mode: summary):
- Distill to 500-1000 words
Full mode (context_mode: full):
- Include the last N turns of raw conversation verbatim (N = full_mode_max_turns)
- Prepend a 2-3 sentence summary of what the conversation is about
- Append the explicit question at the end
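This minimal sketch shows the full-mode layout: the summary first, then the last N raw turns, then the question. It assumes the conversation is available as (role, text) pairs and that one message counts as one turn. pack_full_context and the section labels are hypothetical. Summary mode has no mechanical equivalent, because distilling the conversation is the model's own job.

```python
def pack_full_context(
    turns: list[tuple[str, str]], max_turns: int, summary: str, question: str
) -> str:
    """Full mode: short summary, then the last max_turns messages verbatim, then the explicit question."""
    recent = turns[-max_turns:] if max_turns > 0 else []  # guard: turns[-0:] would return everything
    lines = [f"Summary: {summary}", "", "Recent conversation (verbatim):"]
    lines += [f"{role}: {text}" for role, text in recent]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```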
5. Confirmation (editable)
Present to user:
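```
📋 Context: [packed context]
❓ Question: [explicit question]
⚠️ Assumptions: [any assumptions introduced during packing, if any]
🤖 Will consult: [model list]
💰 Estimate: [N] model calls, ~$X.XX (rough estimate based on model pricing tiers)
Send? You can also edit the question description above directly.
Please check that the question description is accurate, neutral, and free of bias.
```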
Wait for user response:
- Confirm → proceed
- User sends modified text → use their version, re-confirm
- Cancel → abort
Dedup check: Hash the packed question. If an identical question was asked in the last 5 minutes, warn user and confirm they want to re-run (saves cost on accidental double-triggers).
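The dedup rule needs only a hash and a timestamp per question. This sketch keeps both in memory and hashes the exact packed text with SHA-256 from Python's hashlib. A real skill would need storage that persists across turns, which the source does not specify. The class name and the exact-match hashing are assumptions. Passing now explicitly makes the five-minute window easy to test.

```python
import hashlib
import time
from typing import Optional

DEDUP_WINDOW_SECONDS = 5 * 60  # "the last 5 minutes"


class DedupGuard:
    def __init__(self, window: float = DEDUP_WINDOW_SECONDS) -> None:
        self.window = window
        self._seen: dict[str, float] = {}  # question hash -> time it was last sent

    @staticmethod
    def fingerprint(packed_question: str) -> str:
        return hashlib.sha256(packed_question.encode("utf-8")).hexdigest()

    def is_duplicate(self, packed_question: str, now: Optional[float] = None) -> bool:
        """True if the identical packed question was sent less than `window` seconds ago."""
        now = time.time() if now is None else now
        last = self._seen.get(self.fingerprint(packed_question))
        return last is not None and now - last < self.window

    def record(self, packed_question: str, now: Optional[float] = None) -> None:
        self._seen[self.fingerprint(packed_question)] = time.time() if now is None else now
```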
6. Parallel consultation
For each model, spawn a subagent:
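```
sessions_spawn(
  task: <packed question + subagent prompt from references/prompt-templates.md>,
  model: <model ID>,
  mode: run,
  runTimeoutSeconds:
)
```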
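Outside the gateway, the same fan-out-with-timeout pattern can be sketched with asyncio. Here ask stands in for whatever actually calls a model. It is a hypothetical callable, not the gateway's subagent API. The state strings success, failed, and timed_out are this sketch's naming. asyncio.wait_for enforces the per-model timeout, and asyncio.gather waits for every model. Together they mirror collecting results once each model has finished or timed out.

```python
import asyncio
from typing import Awaitable, Callable

AskFn = Callable[[str, str], Awaitable[str]]  # hypothetical: (model_id, prompt) -> response text


async def consult_one(ask: AskFn, model: str, prompt: str, timeout: float) -> tuple[str, str, str]:
    """Return (model, state, payload); payload is the answer, or the error text on failure."""
    try:
        text = await asyncio.wait_for(ask(model, prompt), timeout=timeout)
        return model, "success", text
    except asyncio.TimeoutError:
        return model, "timed_out", ""
    except Exception as exc:  # API error or malformed response
        return model, "failed", str(exc)


async def consult_all(ask: AskFn, models: list[str], prompt: str, timeout: float) -> list:
    # All calls start together; gather returns once each has succeeded, failed, or timed out,
    # and keeps results in the same order as `models`.
    return list(await asyncio.gather(*(consult_one(ask, m, prompt, timeout) for m in models)))
```

Because consult_one converts every exception into a state, a failure in one model never cancels the others.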
Progress feedback: After spawning, show status. As each model completes, update:
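```
⏳ Received 1/3 responses (Claude ✅)...
⏳ Received 2/3 responses (Claude ✅ GPT ✅)...
✅ All received, merging...
```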
State tracking per model:
- pending → spawned, waiting
- success → response received
- failed → API error or malformed response
- timed_out → exceeded timeout_seconds
After all models complete or time out, call sessions_yield() to collect results.
Failure handling:
- If a model returns an error → mark it as failed and log the reason
- If a model returns an extremely short response (<50 words) or a refusal → mark it as degenerate; include it in the report metadata but exclude it from synthesis
- If fewer than min_models models succeeded (non-degenerate) → inform the user, offer to show whatever raw responses are available, and abort synthesis
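The failure rules can be expressed as a small classifier. The 50-word threshold comes from the rule above. The refusal phrases are a hypothetical and deliberately crude heuristic; in practice, the orchestrating model judges refusals. Note also that counting words by whitespace undercounts languages written without spaces, such as Chinese.

```python
# Hypothetical refusal markers; real refusal detection would need model judgment.
REFUSAL_MARKERS = ("i can't help with", "i cannot help with", "i'm unable to assist")


def classify(state: str, text: str, min_words: int = 50) -> str:
    """Map a raw result to success / failed / timed_out / degenerate."""
    if state != "success":
        return state  # failed and timed_out pass through unchanged
    lowered = text.lower()
    if len(text.split()) < min_words or any(m in lowered for m in REFUSAL_MARKERS):
        return "degenerate"
    return "success"


def usable_for_synthesis(
    results: dict[str, tuple[str, str]], min_models: int
) -> tuple[list[str], bool]:
    """results maps model -> (state, text); return the mergeable models and whether synthesis may run."""
    good = [m for m, (state, text) in results.items() if classify(state, text) == "success"]
    return good, len(good) >= min_models
```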
7a. Diff merge (normal mode)
Read the merge prompt from references/prompt-templates.md § "Diff Merge Prompt".
If synthesis_model is configured, use it for this step. Verify that synthesis_model is NOT in the consultation pool; if it is, warn and fall back to the primary model.
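This minimal sketch shows the selection and fallback rule for the merge model. primary_model is a hypothetical name for whichever model is running the skill. The warning is printed only to keep the example self-contained.

```python
from typing import Optional


def pick_merge_model(synthesis_model: Optional[str], pool: list[str], primary_model: str) -> str:
    """Use synthesis_model only if it is set and is not one of the consulted models."""
    if synthesis_model is None:
        return primary_model
    if synthesis_model in pool:
        # Conflict: the rule forbids merging with a consulted model, so fall back.
        print(f"⚠️ synthesis_model {synthesis_model!r} is also in the consultation pool; "
              f"using {primary_model!r} instead")
        return primary_model
    return synthesis_model
```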
Degenerate response handling in merge prompt:
- If a model refused to answer, note it as "[Model] declined to answer (safety/policy)"
- If a model gave an extremely short/generic response, note it as "[Model] provided limited input"
- These notes belong in the report metadata and must not pollute the consensus/diff analysis
Unanimous agreement shortcut:
If all models give substantially the same answer with no meaningful unique insights or conflicts, use a shorter output:
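```
🔍 Ask-More Consultation Report
📝 Question: [question]
✅ Strong agreement: all models are essentially aligned.
This consultation adds limited incremental value; here are the synthesized key points:
🎯 Synthesis: ...
📊 Models consulted: [all models] ✅
```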
Normal output structure:
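```
🔍 Ask-More Consultation Report
📝 Question: [question]
🤝 Consensus: (points raised by most models)
⚠️ Consensus reflects agreement among the consulted models, not objective fact. The models may share the same training biases.
💡 Differing views:
Underlying assumption: ...
Underlying assumption: ...
⚡ Conflicts: (where models contradict each other)
- [Model A] holds X, but [Model B] holds Y
  → Reason for the disagreement: ... (different assumptions? different priorities? different evidence?)
⚠️ For high-stakes decisions, rely on human judgment; do not adopt either side uncritically.
🎯 Synthesis:
- Current best judgment: ...
- Uncertainty label: [high agreement / assumption-sensitive / weak evidence / value-judgment disagreement]
- Information gaps: ...
- Suggested next steps: ...
📊 Models consulted: [Model A] ✅ | [Model B] ✅ | [Model C] ⏱️ timed out
```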
Chat interface adaptation: In Telegram/Discord/WhatsApp, deliver a 3-5 bullet summary first, then the full report. Avoid walls of text.
7b. Raw compare mode (--compare)
Skip synthesis entirely. Present each model's response in sequence:
CODEBLOCK6
This mode is for users who want to judge themselves without synthesis bias.
8. Graceful degradation
When things go wrong, degrade gracefully instead of failing silently:
| Scenario | Action |
|---|---|
| Synthesis model fails | Fall back to the primary model for the merge. If that also fails, return the raw responses (as in --compare mode) with a note. |
| Only min_models models respond (the bare minimum) | Proceed, but mark the report "⚠️ Low confidence: only N models responded" |
| Only 1 model responds | Abort synthesis and show the single response with the note "Only 1 response received; multi-model comparison is not possible" |
| All models time out | Abort, suggest user check model config and try again |
| Model returns refusal | Exclude from synthesis, note in report as policy difference |
| Model returns gibberish/extremely short | Exclude from synthesis, note as degenerate response |
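The table above can be read as two decisions. The first is what to do given how many usable responses arrived. The second is how to fall back if merging fails. The sketch below encodes both under stated assumptions:
- The returned tags are hypothetical names.
- The all-failed case, which the table does not cover, is treated as an abort.
- A merge function signals failure by raising an exception.

```python
from typing import Callable, Sequence


def plan_after_consultation(n_usable: int, n_timed_out: int, n_total: int, min_models: int) -> str:
    """Map response counts to one of the table's actions (tag names are hypothetical)."""
    if n_total > 0 and n_timed_out == n_total:
        return "abort_all_timed_out"  # suggest checking the model config and retrying
    if n_usable == 0:
        return "abort_no_usable"      # assumption: every model failed or was degenerate
    if n_usable == 1:
        return "single_response"      # show it with the "only 1 response" note
    if n_usable < min_models:
        return "raw_only"             # too few for synthesis: offer the raw responses
    if n_usable == min_models:
        return "low_confidence"       # proceed, but flag the report
    return "synthesize"


MergeFn = Callable[[dict], str]


def merge_with_fallback(responses: dict, mergers: Sequence[MergeFn]) -> str:
    """Try each merger in order (synthesis model, then primary model); fall back to raw output."""
    for merge in mergers:
        try:
            return merge(responses)
        except Exception:
            continue  # this merger failed; try the next one
    raw = "\n\n".join(f"[{model}]\n{text}" for model, text in responses.items())
    return "Note: synthesis failed, showing raw responses.\n\n" + raw
```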
9. Logging
If enable_logging: true is set in the config, log each run to skills/ask-more/logs/runs.jsonl:
CODEBLOCK7
Use bash skills/ask-more/scripts/log-run.sh <skill-dir> '<json>' to append.
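This section does not show the record format, so the fields below are hypothetical. The sketch only illustrates the JSON Lines convention: one json.dumps object per line, appended in "a" mode. Within the skill, appending goes through log-run.sh as stated above. Storing a hash of the question instead of its text is also an assumption, chosen here to limit how much of the question reaches disk.

```python
import json
import time
from pathlib import Path


def log_run(log_path: Path, mode: str, question_sha256: str, states: dict[str, str]) -> dict:
    """Append one JSON object per run to a JSON Lines file (field names are hypothetical)."""
    record = {
        "ts": int(time.time()),
        "mode": mode,                        # "normal" or "compare"
        "question_sha256": question_sha256,  # hash instead of text (an assumption)
        "models": states,                    # model ID -> final state
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```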
Prompt Engineering Notes
Sub-model prompt design
The subagent prompt (in references/prompt-templates.md) explicitly:
- Structures output as: assumptions → analysis → risks → recommendations
- Encourages challenging the question's premises: rather than just answering, the model checks whether the question itself rests on flawed assumptions
- Uses a soft word limit (~800 words) without hard truncation
Context packing discipline
- Write neutrally, with no opinion injection
- Explicitly list any assumptions introduced by packing
- Let the user edit the packed question; this is a core feature, not an afterthought
Merge prompt discipline
- Sections 1-3 (consensus/diff/conflicts): stay faithful to the model outputs, with no added analysis
- Section 4 (synthesis): clearly labeled as synthesis, not truth
- Distinguish true conflicts from mere phrasing differences
- Include uncertainty labels: high agreement / assumption-sensitive / weak evidence / value-judgment disagreement
- Cite which model supports each claim
- Handle degenerate/refusal responses explicitly
- For high-stakes conflicts: escalate with warning instead of resolving
Extensibility
Adding new modes
The architecture supports pluggable consultation modes beyond normal and compare:
- Debate mode: models critique each other's responses in rounds
- Red-team mode: one model defends, the others attack
- Domain-specific merge: different prompt templates for coding / product / legal / research questions
To add a mode: create a new prompt template in references/, add a command flag, and branch in step 7.
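That extension point can be sketched as a registry. Each mode pairs a command flag with a prompt template, and step 7 branches on whether a merge template exists. The Mode type, the registry, and the --debate entry with its "Debate Prompt" template are all hypothetical. Only the normal and compare modes exist today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mode:
    flag: Optional[str]      # command flag; None marks the default mode
    template: Optional[str]  # section name in references/prompt-templates.md; None = no synthesis


MODES = {
    "normal": Mode(flag=None, template="Diff Merge Prompt"),
    "compare": Mode(flag="--compare", template=None),
    "debate": Mode(flag="--debate", template="Debate Prompt"),  # hypothetical future mode
}


def mode_for_flags(flags: list[str]) -> str:
    """Pick the first registered mode whose flag was passed; default to normal."""
    for name, mode in MODES.items():
        if mode.flag is not None and mode.flag in flags:
            return name
    return "normal"


def step7_branch(mode_name: str) -> str:
    mode = MODES[mode_name]
    if mode.template is None:
        return "7b: present raw responses"
    return f"7a: merge using the {mode.template!r} template"
```

With this layout, adding a mode means adding one registry entry plus its template, and the branch in step 7 does not change.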
Model metadata (future)
For smarter model selection, config could include per-model metadata:
CODEBLOCK8