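MCP Health Monitor
Automated health monitoring for MCP servers and AI services. Detects failures, auto-restarts services via launchctl, and sends Telegram alerts on incidents. Silent when everything is healthy.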
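Supported Check Types
| Type | Method | Auto-Restart |
|---|---|---|
| HTTP health | `curl` endpoint check | Yes (via launchctl) |
| Process | `pgrep` process detection | Yes (via launchctl) |
| Process-only | `pgrep` detection | No (relies on an external spawner) |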
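The two check methods can be sketched as follows. This is a minimal illustration, not the code in healthcheck.sh: the function names `check_http` and `check_process` are hypothetical, and treating any 2xx status as healthy is an assumption.

```bash
#!/usr/bin/env bash
# Minimal sketch of the two check types (hypothetical function names).
HTTP_TIMEOUT="${HTTP_TIMEOUT:-5}"

# HTTP check: succeed only when the endpoint answers with a 2xx status.
check_http() {
  local url="$1" code
  # -o discards the body and -w prints just the status code;
  # curl prints 000 when no HTTP response was received.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time "$HTTP_TIMEOUT" "$url")"
  [[ "$code" =~ ^2[0-9][0-9]$ ]]
}

# Process check: succeed when any process's full command line matches the pattern.
check_process() {
  pgrep -f "$1" > /dev/null
}
```

For example, `check_process "context7-mcp"` succeeds while a Context7 MCP server is running. Because `pgrep -f` matches against full command lines, a short pattern can also match unrelated processes.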
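Quick Start
1. Install the healthcheck script
```bash
# Create the install and log directories if they do not exist yet
mkdir -p ~/.local/bin ~/.local/logs
# Copy the script to your preferred location
cp scripts/healthcheck.sh ~/.local/bin/mcp-healthcheck.sh
chmod +x ~/.local/bin/mcp-healthcheck.sh
```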
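2. Configure environment variables
Create or update your .env file (path configurable via `ENV_FILE`):
```bash
# Required for Telegram alerts (optional: the script runs without them)
TELEGRAM_BOT_TOKEN=your_bot_token
TELEGRAM_CHAT_ID=your_chat_id
```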
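One way a script can pick up this file is to source it with auto-export turned on. The sketch below assumes that approach; `load_env` is a hypothetical name, and healthcheck.sh may load the file differently.

```bash
#!/usr/bin/env bash
# Sketch: load KEY=value pairs from the .env file (load_env is a hypothetical name).
load_env() {
  local env_file="${ENV_FILE:-$HOME/.env}"
  # A missing file is not an error: Telegram settings are optional.
  [[ -f "$env_file" ]] || return 0
  set -a            # export every variable assigned while sourcing
  # shellcheck source=/dev/null
  source "$env_file"
  set +a            # stop auto-exporting
}
```

Sourcing executes the file as shell code, so point `ENV_FILE` only at a file you control, and quote any value that contains spaces.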
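3. Configure services to monitor
Edit the `SERVICES` array in healthcheck.sh. Each entry follows this format:
```
name|type|target|launchctl_label
```
Fields:
- `name`: display name used in logs and alerts
- `type`: `http` (HTTP endpoint) or `process` (pgrep pattern)
- `target`: the URL for `http` checks, or the process grep pattern for `process` checks
- `launchctl_label`: macOS launchctl service label used for auto-restart (use `none` to skip restart)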
Default configuration (each entry is quoted because `|` is a shell metacharacter and some targets contain spaces):
```bash
SERVICES=(
  "Claw-Empire|http|http://127.0.0.1:8790/api/health|com.claw-empire.server"
  "Hermes-Gateway|process|hermes_cli.main gateway|ai.hermes.gateway"
  "mem0-MCP|process|mem0_mcp/server.py|none"
  "Brave-Search-MCP|process|brave-search-mcp-server|none"
  "Context7-MCP|process|context7-mcp|none"
)
```
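The script's own parsing code is not shown here. One way to split an entry into its four fields is `read` with `IFS` set to the pipe character, as in this sketch (the loop and variable names are illustrative):

```bash
#!/usr/bin/env bash
# Sketch: split each "name|type|target|launchctl_label" entry into fields.
SERVICES=(
  "Claw-Empire|http|http://127.0.0.1:8790/api/health|com.claw-empire.server"
  "Hermes-Gateway|process|hermes_cli.main gateway|ai.hermes.gateway"
)

for entry in "${SERVICES[@]}"; do
  # IFS='|' splits only on pipes, so a target such as "hermes_cli.main gateway" stays whole.
  IFS='|' read -r name type target label <<< "$entry"
  printf '%s: type=%s target=%s label=%s\n' "$name" "$type" "$target" "$label"
done
```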
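4. Set up automated scheduling (macOS LaunchAgent)
Create ~/Library/LaunchAgents/com.mcp-health-monitor.plist, replacing `YOUR_USERNAME` with your macOS user name. launchd passes `ProgramArguments` to the program directly, without a shell, so a `~` in these paths would not be expanded:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.mcp-health-monitor</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>/Users/YOUR_USERNAME/.local/bin/mcp-healthcheck.sh</string>
    </array>
    <!-- Run every 300 seconds (5 minutes) -->
    <key>StartInterval</key>
    <integer>300</integer>
    <key>StandardOutPath</key>
    <string>/Users/YOUR_USERNAME/.local/logs/mcp-healthcheck-stdout.log</string>
    <key>StandardErrorPath</key>
    <string>/Users/YOUR_USERNAME/.local/logs/mcp-healthcheck-stderr.log</string>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>
```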
Load the agent:
```bash
launchctl load ~/Library/LaunchAgents/com.mcp-health-monitor.plist
```
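5. Linux alternative (systemd timer or cron)
For cron, open your crontab:
```bash
crontab -e
```
Then add a line that runs the check every 5 minutes:
```
*/5 * * * * /path/to/mcp-healthcheck.sh
```
For systemd, replace the launchctl stop/start calls in the script with systemctl restart.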
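A restart step that covers both platforms might look like the following. It is a sketch under stated assumptions: `restart_service` is a hypothetical name, `uname -s` is used to detect macOS, and on Linux the label is assumed to be the name of a systemd user unit (drop `--user` for system units).

```bash
#!/usr/bin/env bash
# Sketch of a cross-platform restart step (restart_service is a hypothetical name).
RESTART_DELAY="${RESTART_DELAY:-3}"

# Returns 2 for process-only services ("none"), otherwise the status of the restart.
restart_service() {
  local label="$1"
  if [[ "$label" == "none" ]]; then
    return 2
  fi
  if [[ "$(uname -s)" == "Darwin" ]]; then
    # stop may fail if the job already died; the start that follows is what matters
    launchctl stop "$label"
    sleep "$RESTART_DELAY"
    launchctl start "$label"
  else
    # Assumption: on Linux the label is a systemd user unit name
    systemctl --user restart "$label"
  fi
}
```

Returning a distinct status for `none` lets the caller log "waiting for external respawn" instead of reporting a failed restart.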
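Environment Variables
| Variable | Required | Description |
|---|---|---|
| `ENV_FILE` | No | Path to .env file (default: `$HOME/.env`) |
| `LOG_FILE` | No | Path to log file (default: `$HOME/.local/logs/mcp-healthcheck.log`) |
| `TELEGRAM_BOT_TOKEN` | No | Telegram Bot API token for alerts |
| `TELEGRAM_CHAT_ID` | No | Telegram chat/group ID for alerts |
| `HTTP_TIMEOUT` | No | HTTP check timeout in seconds (default: `5`) |
| `RESTART_DELAY` | No | Seconds to wait between stop and start (default: `3`) |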
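The defaults in this table can be applied with bash's `${VAR:=default}` expansion, which assigns only when the variable is unset or empty, so values already set in the environment take precedence. A minimal sketch, assuming this is how the defaults are resolved:

```bash
#!/usr/bin/env bash
# Sketch: fill in the documented defaults only for variables that are unset or empty.
: "${ENV_FILE:=$HOME/.env}"
: "${LOG_FILE:=$HOME/.local/logs/mcp-healthcheck.log}"
: "${HTTP_TIMEOUT:=5}"
: "${RESTART_DELAY:=3}"
# TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID get no default; without them no alert is sent.
```

For example, running the script as `HTTP_TIMEOUT=10 ./mcp-healthcheck.sh` keeps the 10-second timeout.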
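Log Format
Each log entry has the form `[timestamp] [service] [status] message`:
```
[2026-03-30 14:00:00] [Claw-Empire] [OK] Healthy (HTTP 200)
[2026-03-30 14:00:00] [mem0-MCP] [FAIL] Not running (process not found)
[2026-03-30 14:00:00] [SYSTEM] [ALERT] 1 failure(s) detected, sending notification
```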
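A logger that produces this format takes only a few lines. In the sketch below, `log` and its argument order (service, status, message) are assumptions, not the script's actual interface.

```bash
#!/usr/bin/env bash
# Sketch of a logger writing "[timestamp] [service] [status] message" lines.
LOG_FILE="${LOG_FILE:-$HOME/.local/logs/mcp-healthcheck.log}"

log() {
  local service="$1" status="$2" message="$3"
  # Make sure the log directory exists before appending.
  mkdir -p "$(dirname "$LOG_FILE")"
  printf '[%s] [%s] [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
    "$service" "$status" "$message" >> "$LOG_FILE"
}
```

Usage: `log "mem0-MCP" "FAIL" "Not running (process not found)"`.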
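Telegram Alert Format
Alerts are sent only when failures are detected; all-healthy runs produce no notification.
```
🚨 MCP Health Check Alert
2026-03-30 14:00:00
Failures detected (1):
Services with a launchctl label were restarted automatically.
Process-only services are waiting for an external respawn.
```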
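The alert can be assembled and delivered through the Telegram Bot API `sendMessage` method. In this sketch, `build_alert_message` and `send_telegram_alert` are hypothetical names, and the `•` layout of the per-service failure lines is an assumption, since the format above does not show them.

```bash
#!/usr/bin/env bash
# Sketch of alert assembly and delivery (function names are hypothetical).

# $1 is the timestamp; each further argument is one failed-service line.
build_alert_message() {
  local ts="$1"
  shift
  printf '🚨 MCP Health Check Alert\n%s\n\nFailures detected (%d):\n' "$ts" "$#"
  if (( $# > 0 )); then
    printf '• %s\n' "$@"
  fi
  printf '\nServices with a launchctl label were restarted automatically.\n'
  printf 'Process-only services are waiting for an external respawn.\n'
}

# Posts the text via the Bot API sendMessage method; returns 1 when credentials are missing.
send_telegram_alert() {
  local text="$1"
  if [[ -z "${TELEGRAM_BOT_TOKEN:-}" || -z "${TELEGRAM_CHAT_ID:-}" ]]; then
    return 1
  fi
  # --data-urlencode sends a form-encoded POST; --fail makes HTTP errors a non-zero exit.
  curl -s --fail --max-time 10 \
    --data-urlencode "chat_id=${TELEGRAM_CHAT_ID}" \
    --data-urlencode "text=${text}" \
    "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" > /dev/null
}
```

Keeping message building separate from delivery means the text can be checked without network access or a real bot token.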
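Customization
Adding a new service
Append to the `SERVICES` array (after its initial definition in healthcheck.sh):
```bash
SERVICES+=(
  "My-Custom-MCP|process|my-custom-mcp-server|com.my-custom.mcp"
)
```
HTTP check with custom port
```bash
SERVICES+=(
  "My-API|http|http://127.0.0.1:3000/health|com.my-api.server"
)
```
Disable auto-restart for a service
Set the launchctl label to `none`:
```bash
SERVICES+=(
  "My-Service|process|my-service-pattern|none"
)
```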
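Before appending an entry, it can help to check its shape. The helper below is hypothetical (it is not part of healthcheck.sh); it assumes exactly four non-empty fields and that `http` targets start with `http://` or `https://`.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check a SERVICES entry (validate_service_entry is hypothetical).
validate_service_entry() {
  local entry="$1" name type target label extra
  IFS='|' read -r name type target label extra <<< "$entry"
  # Exactly four non-empty fields are expected; a fifth field lands in $extra.
  [[ -n "$name" && -n "$target" && -n "$label" && -z "$extra" ]] || return 1
  case "$type" in
    http)    [[ "$target" == http://* || "$target" == https://* ]] ;;
    process) return 0 ;;
    *)       return 1 ;;
  esac
}
```

For example, `validate_service_entry "My-API|tcp|127.0.0.1:3000|none"` fails because `tcp` is not a supported check type.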
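Troubleshooting
| Issue | Solution |
|---|---|
| Telegram alerts not sending | Verify `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` in your `.env` file |
| `launchctl` restart fails | Check that the label matches the `Label` key in the service's `.plist` (conventionally also its file name) |
| False positives on process checks | Make the `pgrep` pattern more specific |
| Log file growing too large | Set up logrotate or a periodic cleanup via cron |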
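If logrotate is not available (macOS ships newsyslog instead), a cron-driven cleanup can be as simple as keeping the tail of the log once it passes a size limit. `trim_log`, the 1 MiB limit, and the 1000-line default are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Sketch of a size-based log cleanup (trim_log and its defaults are hypothetical).
trim_log() {
  local file="$1" max_bytes="${2:-1048576}" keep_lines="${3:-1000}" size
  [[ -f "$file" ]] || return 0
  size="$(wc -c < "$file")"
  if (( size > max_bytes )); then
    # Write to a temp file first so the original stays intact until the move.
    tail -n "$keep_lines" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  fi
}
```

Usage: `trim_log ~/.local/logs/mcp-healthcheck.log`, for example at the end of each health check run.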