System Restoration
Comprehensive guide for restoring Advantage HPE's operational intelligence systems when they fail or go down.
Investigation Workflow
1. System Status Assessment
Before fixing anything, map out what's broken:
Core Intelligence Systems:
1. Zero Revenue Alerts → #margin-alerts (every 30 min)
2. Morning Pulse → #manager-nudges (daily 6:35 AM CT)
3. Live Nudges → #manager-nudges (every 15 min)
4. Material Truth Report → #material-intel-systems (daily 7:00 AM CT)
5. Friend-Zone Reformatter → #live-ops (ServiceTitan email alerts)
Investigation Commands:
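```bash
# Check LaunchD services
launchctl list | grep ranger

# Check cron jobs (OpenClaw)
cron list

# Check running processes (quote the pattern so the shell does not parse the parentheses and pipes)
ps aux | grep -E "(keel|pulse|margin|nudge)" | grep -v grep

# Find system code (quote the glob so find receives it unexpanded)
find /Users/stephendobbins/.config/ranger -name "*.py" | grep -E "(pulse|margin|nudge)"
find /Users/stephendobbins/.openclaw/workspace -name "*.py" | grep -E "(zero|revenue)"
```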
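You can also script the LaunchD status check so that a missing or crashed job stands out. The sketch below makes two assumptions. First, each plist's `Label` matches its filename stem (for example `com.ranger.margin-alerts`). Second, `launchctl list` prints three tab-separated columns, `PID`, `Status` and `Label`, and uses `-` for an empty value. The label list and the status wording are illustrative and do not come from the ranger scripts.

```python
from __future__ import annotations

import subprocess

# Assumption: the Label inside each plist matches the plist filename stem.
EXPECTED_LABELS = (
    "com.ranger.margin-alerts",
    "com.ranger.morning-pulse",
    "com.ranger.live-nudges",
)


def parse_launchctl_list(text: str) -> dict[str, tuple[int | None, int | None]]:
    """Map each label to (pid, last exit status); a '-' column becomes None."""
    services = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or parts[0] == "PID":
            continue  # skip the header row and malformed lines
        pid, status, label = parts
        services[label] = (
            None if pid == "-" else int(pid),
            None if status == "-" else int(status),
        )
    return services


def classify(text: str, expected=EXPECTED_LABELS) -> dict[str, str]:
    """Describe each expected job as not loaded, running, failed, or idle."""
    services = parse_launchctl_list(text)
    states = {}
    for label in expected:
        if label not in services:
            states[label] = "not loaded"
            continue
        pid, status = services[label]
        if pid is not None:
            states[label] = "running"
        elif status not in (None, 0):
            states[label] = f"last exit status {status}"
        else:
            states[label] = "loaded, idle"
    return states


def live_status() -> dict[str, str]:
    """macOS only: run `launchctl list` and classify the ranger jobs."""
    out = subprocess.run(["launchctl", "list"], capture_output=True, text=True, check=True)
    return classify(out.stdout)
```

Interval jobs such as Live Nudges show as `loaded, idle` for most of the time between runs. The states that call for action are `not loaded` and a non-zero exit status.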
2. Locate Code & Determine Failure Cause
Common Locations:
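- `/Users/stephendobbins/.config/ranger/scripts/` - Main operational scripts
- `/Users/stephendobbins/.config/ranger/materials/` - Material intelligence
- `/Users/stephendobbins/.openclaw/workspace/` - Recent scripts & fixes
- `/Users/stephendobbins/Library/LaunchAgents/` - LaunchD service definitions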
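If you want the code search results as a list rather than terminal output, you can reproduce the runbook's `find | grep -E` searches in Python. In this minimal sketch, the roots and the keyword groups (pulse/margin/nudge and zero/revenue) are assumed from the runbook's search commands. Like `grep` applied to `find` output, the sketch matches keywords against the whole path, not only the filename.

```python
from __future__ import annotations

from pathlib import Path

# Roots and keywords assumed from the runbook's search commands.
SEARCHES = {
    "/Users/stephendobbins/.config/ranger": ("pulse", "margin", "nudge"),
    "/Users/stephendobbins/.openclaw/workspace": ("zero", "revenue"),
}


def find_scripts(root: str | Path, keywords: tuple[str, ...]) -> list[Path]:
    """Return .py files under root whose full path contains any keyword."""
    base = Path(root)
    if not base.is_dir():
        return []  # a missing root is itself a finding, not a crash
    return sorted(p for p in base.rglob("*.py") if any(k in str(p) for k in keywords))


def find_all() -> dict[str, list[Path]]:
    """Run every search in SEARCHES."""
    return {root: find_scripts(root, kws) for root, kws in SEARCHES.items()}
```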
Common Failure Patterns:
- LaunchD services unloaded - Emergency shutdown or system restart
- Data source broken - ServiceTitan API returning wrong data
- Scheduling missing - Functions exist but no cron/LaunchD trigger
- Script errors - Import failures, credential issues
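Static checks can distinguish three of these patterns before you rerun anything. This minimal sketch assumes you already have the set of loaded labels, for example from `launchctl list`. It uses Python's built-in `compile()` to catch syntax errors without executing the script or writing `.pyc` files. Import failures, credential issues and broken data sources only appear when the script actually runs, so they still require the manual tests in the restoration steps.

```python
from __future__ import annotations

from pathlib import Path


def triage(script: Path, plist: Path, label: str, loaded: set[str]) -> list[str]:
    """Return which Common Failure Patterns the static checks point to."""
    findings: list[str] = []
    if not script.is_file():
        findings.append(f"Script errors: {script} not found")
    else:
        try:
            # compile() only parses, so nothing executes or posts to Slack.
            compile(script.read_text(encoding="utf-8"), str(script), "exec")
        except SyntaxError as exc:
            findings.append(f"Script errors: syntax error on line {exc.lineno}")
    if not plist.is_file():
        # The job may be scheduled through cron instead, so check `cron list` too.
        findings.append("Scheduling missing: no LaunchD plist")
    elif label not in loaded:
        findings.append("LaunchD services unloaded")
    return findings
```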
System-Specific Restoration
Zero Revenue Alerts
Script: /Users/stephendobbins/.config/ranger/scripts/margin_alerts.py
Channel: #margin-alerts (C0A5L7MG60P)
Schedule: Every 30 minutes
Restoration Steps:
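1. Verify the script exists and posts to Slack
2. Load the LaunchD service: `launchctl load /Users/stephendobbins/Library/LaunchAgents/com.ranger.margin-alerts.plist`
3. Test manually: `cd /Users/stephendobbins/.config/ranger/scripts && python3 margin_alerts.py`
4. Check logs: `tail /tmp/margin_alerts.log`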
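Reading the log by eye works, but a small helper can pull only the error lines out of the tail. This sketch makes two assumptions about how the script logs, neither of which this runbook confirms. It assumes the job writes to `/tmp/margin_alerts.log`, and that failures appear as Python tracebacks or as lines containing `Error`.

```python
from __future__ import annotations

from collections import deque
from pathlib import Path

LOG_PATH = Path("/tmp/margin_alerts.log")  # assumed log location
ERROR_MARKERS = ("Traceback", "Error", "ERROR")  # assumed failure signatures


def tail(path: str | Path, n: int = 10) -> list[str]:
    """Last n lines of a file, like `tail -n`; the deque holds only n lines at a time."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


def recent_errors(path: str | Path = LOG_PATH, n: int = 50) -> list[str]:
    """Lines among the last n that contain any error marker."""
    return [line for line in tail(path, n) if any(m in line for m in ERROR_MARKERS)]
```

Call `recent_errors()` after a manual run. An empty list means only that the last 50 lines contain none of the markers; it does not confirm that the Slack post succeeded.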
Morning Pulse
Script: /Users/stephendobbins/.config/ranger/scripts/pulse_os_full.py
Channel: #manager-nudges (C0A5V9JL2KV)
Schedule: Daily 6:35 AM CT
Restoration Steps:
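1. If the API data is broken, check for a `.bak` backup with working data sources
2. Restore the backup (from the scripts directory): `cp pulse_os_full.py.bak pulse_os_full.py`
3. Fix data sources: replace API calls with browser automation (see references/browser-data-sources.md)
4. Load the LaunchD service: `launchctl load /Users/stephendobbins/Library/LaunchAgents/com.ranger.morning-pulse.plist`
5. Test: `python3 pulse_os_full.py pulse`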
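Restoring the backup is a plain file copy, but that copy can destroy the only version of the broken file, which you may still want to diff. This minimal sketch performs the same `.bak` restore. As an extra safety net that is not part of the runbook, it first saves the current file under a hypothetical `.broken` suffix.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def restore_from_backup(script: str | Path) -> Path:
    """Copy <script>.bak over <script>, keeping the current file as <script>.broken."""
    script = Path(script)
    backup = script.with_name(script.name + ".bak")
    if not backup.is_file():
        raise FileNotFoundError(f"no backup at {backup}")
    if script.exists():
        # Hypothetical safety copy so the broken version can still be diffed.
        shutil.copy2(script, script.with_name(script.name + ".broken"))
    shutil.copy2(backup, script)  # equivalent to: cp pulse_os_full.py.bak pulse_os_full.py
    return script
```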
Live Nudges
Script: /Users/stephendobbins/.config/ranger/scripts/pulse_os_full.py nudges
Channel: #manager-nudges
Schedule: Every 15 minutes
Function: run_nudges() (lines 548-617)
Features: 🚗 dispatched / 📍 arrived / ✅ completed alerts
Restoration Steps:
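1. Verify the function exists: `grep -n "def run_nudges" pulse_os_full.py`
2. Create the LaunchD service (see scripts/create-live-nudges-service.py)
3. Load the service: `launchctl load /Users/stephendobbins/Library/LaunchAgents/com.ranger.live-nudges.plist`
4. Test: `python3 pulse_os_full.py nudges`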
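`grep -n` confirms the `def` line. To check that the whole function still spans the recorded lines 548-617, you can parse the file with Python's `ast` module instead. This is a minimal sketch and needs Python 3.8+ for `end_lineno`. If the reported span differs from the recorded one, the file has changed since this runbook was written.

```python
from __future__ import annotations

import ast
from pathlib import Path


def function_span(source: str, name: str) -> tuple[int, int] | None:
    """(first line, last line) of a def with this name, or None if absent.

    ast.walk is breadth-first, so top-level defs are found before nested ones.
    """
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            return node.lineno, node.end_lineno
    return None


def check_run_nudges(path: str | Path = "pulse_os_full.py") -> tuple[int, int] | None:
    """Locate run_nudges(); the relative default assumes you are in the scripts directory."""
    return function_span(Path(path).read_text(encoding="utf-8"), "run_nudges")
```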
Material Truth Report
Script: /Users/stephendobbins/.config/ranger/materials/reconciliation_report.py
Channel: #material-intel-systems (C0A5L7RB5EK)
Schedule: Daily 7:00 AM CT
Restoration Steps:
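1. Test the script: `cd /Users/stephendobbins/.config/ranger/materials && python3 reconciliation_report.py --no-email`
2. Create a cron job with a daily 7:00 AM CT schedule
3. Verify that the report posts to #material-intel-systems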
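This runbook does not show which arguments `cron add` expects, so the following sketch simply assumes OpenClaw accepts a standard five-field cron expression. It converts the 7:00 AM schedule into that expression. It also computes the next expected run, which helps you decide whether a missing report is late or just not due yet. Standard cron reads its fields in the scheduler's local time zone, so 7:00 AM CT only matches if that zone is Central time.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def daily_cron(hour: int, minute: int = 0) -> str:
    """Five-field cron expression: minute hour day-of-month month day-of-week."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    return f"{minute} {hour} * * *"


def next_daily_run(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Next hour:minute at or after `now`, in whatever time zone `now` carries."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate < now:
        candidate += timedelta(days=1)
    return candidate
```

For example, `next_daily_run(datetime.now(ZoneInfo("America/Chicago")), 7)` returns the next 7:00 AM Central time. This requires `from zoneinfo import ZoneInfo` and Python 3.9+.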
Data Source Repair
ServiceTitan API vs UI Data
Problem: ServiceTitan API often returns test/historical data instead of real operational data.
Solution: Replace API calls with browser automation:
1. Create a browser data source module (see scripts/browserdatasources.py)
2. Import it in the main script and replace the parse functions with their browser equivalents
3. Preserve the output format - same sections, different data source
Browser Data Functions:
- `get_browser_low_margin_jobs()`
- `get_browser_stale_estimates()`
- `get_browser_revenue_leaks()`
- `get_browser_driver_incidents()`
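Preserving the output format means that only the data functions change, while the code that renders the sections stays the same. This minimal sketch shows that separation. The renderer receives a mapping from section title to fetch function, so switching from API parsers to browser functions only requires swapping the mapping. The section titles, the no-argument signatures and the `summary` field are hypothetical. The stubs stand in for browserdatasources.py and do not reproduce it.

```python
from __future__ import annotations

from typing import Callable, Dict, List

Fetcher = Callable[[], List[dict]]  # hypothetical: no arguments, list of row dicts


def render(sections: Dict[str, Fetcher]) -> str:
    """Render each section identically regardless of where its rows came from."""
    lines = []
    for title, fetch in sections.items():
        rows = fetch()
        lines.append(f"{title} ({len(rows)})")
        lines.extend(f"  - {row.get('summary', row)}" for row in rows)
    return "\n".join(lines)


# Stubs named after the runbook's browser data functions; the real ones use browser automation.
def get_browser_low_margin_jobs() -> List[dict]:
    return [{"summary": "example low-margin job"}]


def get_browser_stale_estimates() -> List[dict]:
    return []


def get_browser_revenue_leaks() -> List[dict]:
    return []


def get_browser_driver_incidents() -> List[dict]:
    return []


BROWSER_SECTIONS: Dict[str, Fetcher] = {
    "Low-margin jobs": get_browser_low_margin_jobs,
    "Stale estimates": get_browser_stale_estimates,
    "Revenue leaks": get_browser_revenue_leaks,
    "Driver incidents": get_browser_driver_incidents,
}
```

Under this design, an API-backed mapping with the same titles produces the same layout. A format change therefore points to the renderer, not the data source.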
KEEL System Issues
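Script: `/Users/stephendobbins/.config/ranger/keel/keelslackbot.py`
Safe restart with only field-tech DM handling enabled:
1. Disable operational intelligence: set `OPERATIONAL_INTELLIGENCE_ENABLED = False`
2. Restart the process: `cd /Users/stephendobbins/.config/ranger/keel && python3 keelslackbot.py &`
3. Verify it is running: `ps aux | grep keelslackbot`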
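The restart step launches a new background process with `&` and does not replace any existing one, so you should confirm that exactly one KEEL process is running. This minimal sketch parses `ps aux` output, assuming the usual layout in which the command is the eleventh column. It ignores `grep` lines, so piping `ps` through `grep` does not count the search itself.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def keel_processes(ps_output: str, needle: str = "keelslackbot.py") -> list[str]:
    """Lines of `ps aux` output whose command mentions needle, excluding grep."""
    hits = []
    for line in ps_output.splitlines()[1:]:  # the first line is the column header
        fields = line.split(None, 10)  # the COMMAND column may contain spaces
        if len(fields) < 11:
            continue
        command = fields[10]
        if needle in command and Path(command.split()[0]).name != "grep":
            hits.append(line)
    return hits


def keel_status(ps_output: str) -> str:
    """Summarize how many KEEL processes the output shows."""
    count = len(keel_processes(ps_output))
    if count == 0:
        return "not running"
    if count > 1:
        return f"{count} instances running; stop the extras"
    return "running"


def live_keel_status() -> str:
    """Run `ps aux` directly, so no grep process is involved."""
    out = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True)
    return keel_status(out.stdout)
```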
Service Management Commands
LaunchD Services
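```bash
# List services
launchctl list | grep ranger

# Load a service
launchctl load /Users/stephendobbins/Library/LaunchAgents/com.ranger.<service>.plist

# Unload a service
launchctl unload /Users/stephendobbins/Library/LaunchAgents/com.ranger.<service>.plist

# Start a service immediately
launchctl start com.ranger.<service>

# Check service logs
tail /tmp/<log-name>.log
tail /tmp/<log-name>.err
```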
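When a plist is missing, you can generate it instead of writing it by hand. This minimal sketch uses Python's standard `plistlib` and the launchd keys `Label`, `ProgramArguments`, `StartInterval`, `StartCalendarInterval`, `StandardOutPath` and `StandardErrorPath`. The interpreter path and the `/tmp` `.log`/`.err` naming are assumptions modeled on the runbook's log checks. This is not the contents of scripts/create-live-nudges-service.py. Note that launchd reads `StartCalendarInterval` in the Mac's local time, so a 6:35 AM CT schedule assumes the machine is set to Central time.

```python
from __future__ import annotations

import plistlib


def make_plist(
    label: str,
    program_args: list[str],
    *,
    interval_seconds: int | None = None,
    daily_at: tuple[int, int] | None = None,
    log_name: str | None = None,
) -> bytes:
    """Serialize a launchd job that runs every N seconds or daily at (hour, minute)."""
    if (interval_seconds is None) == (daily_at is None):
        raise ValueError("pass exactly one of interval_seconds or daily_at")
    job: dict = {"Label": label, "ProgramArguments": program_args}
    if interval_seconds is not None:
        job["StartInterval"] = interval_seconds
    else:
        hour, minute = daily_at
        job["StartCalendarInterval"] = {"Hour": hour, "Minute": minute}
    if log_name:
        job["StandardOutPath"] = f"/tmp/{log_name}.log"
        job["StandardErrorPath"] = f"/tmp/{log_name}.err"
    return plistlib.dumps(job)


# Live Nudges every 15 minutes; /usr/bin/python3 is an assumed interpreter path.
LIVE_NUDGES = make_plist(
    "com.ranger.live-nudges",
    ["/usr/bin/python3", "/Users/stephendobbins/.config/ranger/scripts/pulse_os_full.py", "nudges"],
    interval_seconds=15 * 60,
    log_name="live-nudges",
)
```

To install it, write the bytes to `/Users/stephendobbins/Library/LaunchAgents/com.ranger.live-nudges.plist`, validate the file with `plutil -lint`, and then `launchctl load` it.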
Cron Jobs (OpenClaw)
```bash
# List jobs
cron list

# Add a job
cron add

# Remove a job
cron remove
```
Emergency Shutdown Recovery
When systems are emergency-stopped due to bad data:
1. Investigate the root cause - usually ServiceTitan API data issues
2. Fix data sources - switch to browser automation or correct the API endpoints
3. Test manually - verify data accuracy before re-enabling
4. Restore services - load LaunchD services and cron jobs
5. Monitor closely at first - check logs and channel posts for accuracy
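One way to make "verify data accuracy" concrete before re-enabling is a freshness gate, because the usual failure is the API returning test or historical records. This minimal sketch assumes each record carries a `date` field. The 7-day window and 10% tolerance are illustrative values. A check like this complements comparing a few records against the ServiceTitan UI; it does not replace that comparison.

```python
from __future__ import annotations

from datetime import date, timedelta


def stale_fraction(records: list[dict], today: date, max_age_days: int = 7) -> float:
    """Share of records dated more than max_age_days before today."""
    if not records:
        return 1.0  # conservative: an empty feed is treated as unverified
    cutoff = today - timedelta(days=max_age_days)
    stale = sum(1 for record in records if record["date"] < cutoff)
    return stale / len(records)


def safe_to_reenable(records: list[dict], today: date,
                     max_age_days: int = 7, tolerance: float = 0.1) -> bool:
    """True when at most `tolerance` of the records fall outside the window."""
    return stale_fraction(records, today, max_age_days) <= tolerance
```

An empty feed returns 1.0 on purpose. If a quiet day can legitimately produce no records, handle that case separately.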
Resources
scripts/
- `create-live-nudges-service.py` - Generates the LaunchD plist for live nudges
- `browserdatasources.py` - Browser automation replacement for broken APIs
references/
- `launchd-service-templates.md` - LaunchD plist templates for different schedules
- `channel-ids.md` - Slack channel IDs for all operational intelligence channels
- `troubleshooting-checklist.md` - Step-by-step debugging guide