Multi Search Engine
Integration of 16 search engines for web crawling without API keys.
Workflow
1. Preparation: The AI agent initializes an empty in-memory cookie store. Cookies are acquired dynamically during search operations, and only when access is denied.
2. Language Evaluation: Detect the language of the search query. If the query is in Chinese, use the Domestic search engines (Baidu, Bing CN, Bing INT, 360, Sogou, WeChat, Shenma). Otherwise, use the International search engines (Google, Google HK, DuckDuckGo, Yahoo, Startpage, Brave, Ecosia, Qwant, WolframAlpha). Within the chosen group, select engines based on query relevance and availability.
3. Controlled Search: Use web_fetch to execute search requests with rate limiting:
   - Add a 1-2 second delay between requests to limit server load
   - Send requests in batches of 3-4 engines, and run the batches sequentially
   - Include standard browser headers so that requests identify as a regular browser user agent
   - If access is denied (403/429), fetch the engine homepage to obtain fresh session cookies
4. Cookie Management:
   - Cookies are stored ONLY in memory during runtime
   - Cookies are acquired on demand when search requests fail
   - No cookies are read from or written to config.json or any other file
   - Cookies are cleared after the search session completes
   - Only session cookies from search engine domains are captured
5. Retry Mechanism: If a search fails due to cookie/session issues, retry once with freshly acquired cookies after a 2-second delay.
6. Result Aggregation: Consolidate the successful results from all engines, then organize and summarize them into a core search report.
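Steps 2 and 3 of the workflow can be sketched as a few pure functions. This is a minimal illustration rather than the tool's actual implementation: the helper names (`isChineseQuery`, `selectEngines`, `toBatches`) are hypothetical, and the sketch assumes that a query counts as Chinese when it contains at least one Han character, which would also match Japanese text written in kanji.

```javascript
// Engine groups as listed in the workflow; helper names are hypothetical.
const DOMESTIC = ["Baidu", "Bing CN", "Bing INT", "360", "Sogou", "WeChat", "Shenma"];
const INTERNATIONAL = ["Google", "Google HK", "DuckDuckGo", "Yahoo", "Startpage",
                       "Brave", "Ecosia", "Qwant", "WolframAlpha"];

// Assumption: any Han character marks the query as Chinese.
function isChineseQuery(query) {
  return /\p{Script=Han}/u.test(query);
}

// Pick the engine group that matches the query language.
function selectEngines(query) {
  return isChineseQuery(query) ? DOMESTIC : INTERNATIONAL;
}

// Split an engine list into batches of `size` (the workflow suggests 3-4).
function toBatches(engines, size = 4) {
  const batches = [];
  for (let i = 0; i < engines.length; i += size) {
    batches.push(engines.slice(i, i + size));
  }
  return batches;
}
```

With a batch size of 4, the nine international engines split into batches of 4, 4, and 1, which would then be processed one batch after another.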
Search Engines
Domestic (7)
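- Baidu: `https://www.baidu.com/s?wd={keyword}`
- Bing CN: `https://cn.bing.com/search?q={keyword}&ensearch=0`
- Bing INT: `https://cn.bing.com/search?q={keyword}&ensearch=1`
- 360: `https://www.so.com/s?q={keyword}`
- Sogou: `https://sogou.com/web?query={keyword}`
- WeChat: `https://wx.sogou.com/weixin?type=2&query={keyword}`
- Shenma: `https://m.sm.cn/s?q={keyword}`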
International (9)
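- Google: `https://www.google.com/search?q={keyword}`
- Google HK: `https://www.google.com.hk/search?q={keyword}`
- DuckDuckGo: `https://duckduckgo.com/html/?q={keyword}`
- Yahoo: `https://search.yahoo.com/search?p={keyword}`
- Startpage: `https://www.startpage.com/sp/search?query={keyword}`
- Brave: `https://search.brave.com/search?q={keyword}`
- Ecosia: `https://www.ecosia.org/search?q={keyword}`
- Qwant: `https://www.qwant.com/?q={keyword}`
- WolframAlpha: `https://www.wolframalpha.com/input?i={keyword}`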
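Each engine is addressed through a URL template with a `{keyword}` placeholder. The sketch below shows one way the placeholder might be filled safely. `buildSearchUrl` and the `TEMPLATES` map are hypothetical names, and only four of the sixteen templates are copied in for brevity. It percent-encodes spaces as `%20` rather than the `+` used in the quick examples, on the assumption that the engines treat the two forms the same way in a query string.

```javascript
// A few templates copied from this document's engine lists.
const TEMPLATES = {
  baidu: "https://www.baidu.com/s?wd={keyword}",
  google: "https://www.google.com/search?q={keyword}",
  duckduckgo: "https://duckduckgo.com/html/?q={keyword}",
  wolframalpha: "https://www.wolframalpha.com/input?i={keyword}",
};

// buildSearchUrl is a hypothetical helper, not part of web_fetch.
function buildSearchUrl(engine, keyword) {
  const template = TEMPLATES[engine];
  if (!template) throw new Error(`Unknown engine: ${engine}`);
  // encodeURIComponent escapes spaces, '&', '+', and non-ASCII characters.
  // A replacer function avoids special handling of '$' in the replacement.
  return template.replace("{keyword}", () => encodeURIComponent(keyword));
}
```

Encoding the keyword keeps characters such as `&` in a query from being read as a separate URL parameter.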
Quick Examples
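```javascript
// Basic search
web_fetch({url: "https://www.google.com/search?q=python+tutorial"})
// Restrict to a site
web_fetch({url: "https://www.google.com/search?q=site:github.com+react"})
// File type
web_fetch({url: "https://www.google.com/search?q=machine+learning+filetype:pdf"})
// Time filter (past week)
web_fetch({url: "https://www.google.com/search?q=ai+news&tbs=qdr:w"})
// Privacy search
web_fetch({url: "https://duckduckgo.com/html/?q=privacy+tools"})
// DuckDuckGo bang shortcut
web_fetch({url: "https://duckduckgo.com/html/?q=!gh+tensorflow"})
// Knowledge computation
web_fetch({url: "https://www.wolframalpha.com/input?i=100+USD+to+CNY"})
```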
Advanced Operators
| Operator | Example | Description |
|---|---|---|
| `site:` | `site:github.com python` | Search within site |
| `filetype:` | `filetype:pdf report` | Specific file type |
| `""` | `"machine learning"` | Exact match |
| `-` | `python -snake` | Exclude term |
| `OR` | `cat OR dog` | Either term |
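The operators above can be combined in a single query. As a rough sketch, a helper could assemble them from structured options; `composeQuery` and its option names are hypothetical, and the output is the raw query string before URL encoding.

```javascript
// composeQuery is a hypothetical helper that joins search operators
// into one query string (not yet URL-encoded).
function composeQuery({ terms = [], exact, site, filetype, exclude = [], anyOf = [] }) {
  const parts = [...terms];
  if (exact) parts.push(`"${exact}"`);                      // exact match
  if (anyOf.length > 0) parts.push(anyOf.join(" OR "));     // either term
  if (site) parts.push(`site:${site}`);                     // search within site
  if (filetype) parts.push(`filetype:${filetype}`);         // specific file type
  for (const word of exclude) parts.push(`-${word}`);       // exclude term
  return parts.join(" ");
}

// Example: composeQuery({ terms: ["python"], exclude: ["snake"] })
// produces the string: python -snake
```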
Time Filters
| Parameter | Description |
|---|---|
| `tbs=qdr:h` | Past hour |
| `tbs=qdr:d` | Past day |
| `tbs=qdr:w` | Past week |
| `tbs=qdr:m` | Past month |
| `tbs=qdr:y` | Past year |
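The `tbs` values appear in this document only on Google search URLs, so the following sketch assumes a Google URL; other engines may use different parameters. `withTimeFilter` and the `TIME_RANGES` map are hypothetical names. The standard `URL` API serializes the colon as `%3A`, which decodes back to the same `qdr:w` value.

```javascript
// Map readable range names to the qdr codes from the table above.
const TIME_RANGES = { hour: "h", day: "d", week: "w", month: "m", year: "y" };

// withTimeFilter is a hypothetical helper; it assumes a Google search URL.
function withTimeFilter(searchUrl, range) {
  const code = TIME_RANGES[range];
  if (!code) throw new Error(`Unknown time range: ${range}`);
  const url = new URL(searchUrl);
  url.searchParams.set("tbs", `qdr:${code}`); // adds or overwrites tbs
  return url.toString();
}
```

Using `searchParams.set` rather than string concatenation avoids adding a second `tbs` parameter when the URL already carries one.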
Privacy Engines
- DuckDuckGo: No tracking
- Startpage: Google results + privacy
- Brave: Independent index
- Qwant: EU GDPR compliant
Bangs Shortcuts (DuckDuckGo)
| Bang | Destination |
|---|---|
| `!g` | Google |
| `!gh` | GitHub |
| `!so` | Stack Overflow |
| `!w` | Wikipedia |
| `!yt` | YouTube |
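A bang is simply a prefix on the query sent to DuckDuckGo. A minimal sketch, assuming the HTML endpoint listed for DuckDuckGo in this document (`bangUrl` is a hypothetical helper name):

```javascript
// bangUrl is a hypothetical helper: prefix the keyword with a bang
// and build a DuckDuckGo HTML-endpoint URL.
function bangUrl(bang, keyword) {
  const query = `!${bang} ${keyword}`;
  // encodeURIComponent leaves '!' as-is and encodes the space as %20.
  return `https://duckduckgo.com/html/?q=${encodeURIComponent(query)}`;
}
```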
WolframAlpha Queries
- Math: `integrate x^2 dx`
- Conversion: `100 USD to CNY`
- Stocks: `AAPL stock`
- Weather: `weather in Beijing`
Documentation
- `references/advanced-search.md` - Domestic search guide
- `references/international-search.md` - International search guide
- `CHANGELOG.md` - Version history
License
MIT
Security & Privacy Notice
Cookie Handling
- Purpose: Cookies are used ONLY to maintain search session state when access is denied (403/429 errors)
- Storage: Cookies are kept STRICTLY in memory during runtime - NEVER persisted to disk or config files
- Acquisition: Cookies are acquired on-demand from search engine homepages only when search requests fail
- Scope: Only session cookies from the specific search engine domain are captured
- Lifecycle: Cookies are cleared immediately after the search session completes
- No Pre-configuration: No cookies are loaded from config.json or any external file at startup
- No API Keys: This tool uses standard web search URLs, so no authentication is required
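The cookie rules above map naturally onto a small in-memory structure. The following is a sketch under stated assumptions, not the tool's actual code: `SessionCookieStore` is a hypothetical class, "session cookie" is taken to mean a cookie without `Expires` or `Max-Age`, and domain matching is simplified to an exact host string.

```javascript
// Hypothetical in-memory store: nothing is read from or written to disk.
class SessionCookieStore {
  constructor() {
    this.byDomain = new Map(); // domain -> Map(cookie name -> value)
  }

  // Record one Set-Cookie header value for a search engine domain.
  capture(domain, setCookieHeader) {
    const [pair, ...attributes] = setCookieHeader.split(";");
    // Assumption: cookies with Expires or Max-Age are persistent, so skip them.
    const persistent = attributes.some((attr) => {
      const lower = attr.trim().toLowerCase();
      return lower.startsWith("expires=") || lower.startsWith("max-age=");
    });
    if (persistent) return;
    const eq = pair.indexOf("=");
    if (eq <= 0) return; // malformed or nameless cookie
    const name = pair.slice(0, eq).trim();
    const value = pair.slice(eq + 1).trim();
    if (!this.byDomain.has(domain)) this.byDomain.set(domain, new Map());
    this.byDomain.get(domain).set(name, value);
  }

  // Build the Cookie request header for one domain ("" if none stored).
  headerFor(domain) {
    const jar = this.byDomain.get(domain);
    if (!jar) return "";
    return [...jar].map(([name, value]) => `${name}=${value}`).join("; ");
  }

  // Drop everything once the search session completes.
  clear() {
    this.byDomain.clear();
  }
}
```

Keying the store by domain means a cookie captured from one engine is never sent to another, and calling `clear()` at the end of the session covers the lifecycle rule.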
Crawling Ethics
- Rate Limiting: Implement reasonable delays between requests (recommend 1-2 seconds)
- Respect robots.txt: Honor search engine crawling policies
- Terms of Service: Users are responsible for complying with search engine ToS
- Purpose: Designed for legitimate search aggregation, not mass data scraping
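The delay between requests and the single retry described in the workflow could be combined as sketched below. Several names here are assumptions made for illustration: `fetchFn` stands in for web_fetch and is assumed to resolve to an object with a `status` field, `refreshCookies` is a hypothetical callback that fetches the engine homepage, and the 1500 ms default is simply a value inside the recommended 1-2 second range.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run one search; on 403/429, refresh cookies, wait, and retry exactly once.
// fetchFn and refreshCookies are injected stand-ins (assumptions).
async function searchWithRetry(url, { fetchFn, refreshCookies, retryDelayMs = 2000, wait = sleep }) {
  const first = await fetchFn(url);
  if (first.status !== 403 && first.status !== 429) return first;
  await refreshCookies(url);
  await wait(retryDelayMs);
  return fetchFn(url); // the retry result is returned as-is, success or not
}

// Run searches one after another with a pause between consecutive requests.
async function searchSequentially(urls, options, delayMs = 1500) {
  const wait = options.wait || sleep;
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await wait(delayMs);
    results.push(await searchWithRetry(urls[i], options));
  }
  return results;
}
```

Passing `fetchFn` and `wait` in as parameters keeps the sketch testable without network access or real delays.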
Data Handling
- No Personal Data: Tool does not collect or transmit user personal information
- Local Execution: All operations run locally; the only outbound traffic is the search requests themselves
- Session Isolation: Cookies are session-specific and cleared after use
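Multi Search Engine
Integrates 16 search engines for web crawling without API keys.
Workflow
1. Preparation: The AI agent initializes an empty in-memory cookie store. Cookies are acquired dynamically, and only when a search operation is denied access.
2. Language Evaluation: Detect the language of the search query. If the query is in Chinese, use the domestic search engines (Baidu, Bing CN, Bing INT, 360, Sogou, WeChat, Shenma). If the query is not in Chinese, use the international search engines (Google, Google HK, DuckDuckGo, Yahoo, Startpage, Brave, Ecosia, Qwant, WolframAlpha). Select engines based on query relevance and availability.
3. Controlled Search: Use web_fetch to execute rate-limited search requests:
   - Add a 1-2 second delay between requests to limit server load
   - Send requests in batches of 3-4 engines, and run the batches sequentially
   - Include standard browser headers so that requests identify as a regular browser user agent
   - If access is denied (403/429), fetch the engine homepage to obtain fresh session cookies
4. Cookie Management:
   - Cookies are stored only in memory during runtime
   - Cookies are acquired on demand when search requests fail
   - No cookies are read from or written to config.json or any other file
   - Cookies are cleared after the search session completes
   - Only session cookies from search engine domains are captured
5. Retry Mechanism: If a search fails due to cookie/session issues, retry once with freshly acquired cookies after a 2-second delay.
6. Result Aggregation: Consolidate the successful results from all engines, then organize and summarize them into a core search report.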
Search Engines
Domestic (7)
- Baidu: `https://www.baidu.com/s?wd={keyword}`
- Bing CN: `https://cn.bing.com/search?q={keyword}&ensearch=0`
- Bing INT: `https://cn.bing.com/search?q={keyword}&ensearch=1`
- 360: `https://www.so.com/s?q={keyword}`
- Sogou: `https://sogou.com/web?query={keyword}`
- WeChat: `https://wx.sogou.com/weixin?type=2&query={keyword}`
- Shenma: `https://m.sm.cn/s?q={keyword}`
International (9)
- Google: `https://www.google.com/search?q={keyword}`
- Google HK: `https://www.google.com.hk/search?q={keyword}`
- DuckDuckGo: `https://duckduckgo.com/html/?q={keyword}`
- Yahoo: `https://search.yahoo.com/search?p={keyword}`
- Startpage: `https://www.startpage.com/sp/search?query={keyword}`
- Brave: `https://search.brave.com/search?q={keyword}`
- Ecosia: `https://www.ecosia.org/search?q={keyword}`
- Qwant: `https://www.qwant.com/?q={keyword}`
- WolframAlpha: `https://www.wolframalpha.com/input?i={keyword}`
Quick Examples
```javascript
// Basic search
web_fetch({url: "https://www.google.com/search?q=python+tutorial"})
// Restrict to a site
web_fetch({url: "https://www.google.com/search?q=site:github.com+react"})
// File type
web_fetch({url: "https://www.google.com/search?q=machine+learning+filetype:pdf"})
// Time filter (past week)
web_fetch({url: "https://www.google.com/search?q=ai+news&tbs=qdr:w"})
// Privacy search
web_fetch({url: "https://duckduckgo.com/html/?q=privacy+tools"})
// DuckDuckGo bang shortcut
web_fetch({url: "https://duckduckgo.com/html/?q=!gh+tensorflow"})
// Knowledge computation
web_fetch({url: "https://www.wolframalpha.com/input?i=100+USD+to+CNY"})
```
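Advanced Operators
| Operator | Example | Description |
|---|---|---|
| `site:` | `site:github.com python` | Search within site |
| `filetype:` | `filetype:pdf report` | Specific file type |
| `""` | `"machine learning"` | Exact match |
| `-` | `python -snake` | Exclude term |
| `OR` | `cat OR dog` | Either term |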
Time Filters
| Parameter | Description |
|---|---|
| `tbs=qdr:h` | Past hour |
| `tbs=qdr:d` | Past day |
| `tbs=qdr:w` | Past week |
| `tbs=qdr:m` | Past month |
| `tbs=qdr:y` | Past year |
Privacy Engines
- DuckDuckGo: No tracking
- Startpage: Google results + privacy protection
- Brave: Independent index
- Qwant: EU GDPR compliant
Bangs Shortcuts (DuckDuckGo)
| Bang | Destination |
|---|---|
| `!g` | Google |
| `!gh` | GitHub |
| `!so` | Stack Overflow |
| `!w` | Wikipedia |
| `!yt` | YouTube |
WolframAlpha Queries
- Math: `integrate x^2 dx`
- Conversion: `100 USD to CNY`
- Stocks: `AAPL stock`
- Weather: `weather in Beijing`
Documentation
- `references/advanced-search.md` - Domestic search guide
- `references/international-search.md` - International search guide
- `CHANGELOG.md` - Version history
License
MIT
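Security & Privacy Notice
Cookie Handling
- Purpose: Cookies are used only to maintain search session state when access is denied (403/429 errors)
- Storage: Cookies are kept strictly in memory during runtime and are never persisted to disk or config files
- Acquisition: Cookies are acquired on demand from search engine homepages, and only when search requests fail
- Scope: Only session cookies from the specific search engine domain are captured
- Lifecycle: Cookies are cleared immediately after the search session completes
- No Pre-configuration: No cookies are loaded from config.json or any external file at startup
- No API Keys: This tool uses standard web search URLs, so no authentication is required
Crawling Ethics
- Rate Limiting: Implement reasonable delays between requests (1-2 seconds recommended)
- Respect robots.txt: Honor search engine crawling policies
- Terms of Service: Users are responsible for complying with search engine terms of service
- Purpose: Designed for legitimate search aggregation, not mass data scraping
Data Handling
- No Personal Data: The tool does not collect or transmit user personal information
- Local Execution: All operations run locally; the only outbound traffic is the search requests themselves
- Session Isolation: Cookies are session-specific and cleared after use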