observability-lgtm
Set up a full local observability stack (Loki + Grafana + Tempo + Prometheus + Alloy)
for FastAPI apps on macOS (Apple Silicon) or Linux. One command to start, one import
to instrument any app. Logs → Loki, metrics → Prometheus, traces → Tempo, all
unified in Grafana.
When to use
- User is building a FastAPI web app and wants logs, metrics, and traces
- User wants a local Grafana dashboard without setting up ELK (too heavy)
- User wants to correlate logs ↔ traces ↔ metrics in one UI
- User has multiple local apps and wants universal observability
When NOT to use
- Production cloud deployments (use managed Grafana Cloud or Datadog instead)
- Non-Python apps (the Python lib only works for FastAPI; the stack itself is language-agnostic)
- When Docker is not available
Prerequisites
- Docker + Docker Compose v2 installed
- Python 3.10+ (for the instrumentation lib)
- FastAPI app to instrument
What gets installed
| Service | Port | Purpose |
|---|---|---|
| Grafana | 3000 | Dashboards (no login in dev mode) |
| Prometheus | 9091 | Metrics scraping (avoids 9090 if MinIO is running) |
| Loki | 3300 | Log storage (avoids 3100 if Langfuse is running) |
| Tempo gRPC | 4317 | OTLP trace receiver |
| Tempo HTTP | 4318 | OTLP HTTP alternative |
| Alloy UI | 12345 | Agent status |
Steps
Step 1 — Check for port conflicts
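```bash
# List listeners on the ports the stack needs (prints address:port and command)
lsof -iTCP -sTCP:LISTEN -n -P 2>/dev/null | grep -E ':(3000|3300|9091|4317|4318|12345)' | awk '{print $9, $1}'
```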
If any of the ports above are in use, update the relevant port in docker-compose.yml
and the matching url: in config/grafana/provisioning/datasources/datasources.yml.
Common conflicts: Langfuse on 3100, MinIO on 9090.
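Where lsof is not available, a similar check can be sketched in Python. The snippet below is a minimal illustration using only the standard library: it tries to connect to each port from the table above and reports those that already accept connections. The names STACK_PORTS and ports_in_use are hypothetical and not part of the skill.

```python
import socket

# Host ports the stack expects, taken from the service table above.
STACK_PORTS = [3000, 9091, 3300, 4317, 4318, 12345]


def ports_in_use(ports, host="127.0.0.1", timeout=0.5):
    """Return the subset of ports that already accept TCP connections on host."""
    busy = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 when something is listening on the port.
            if sock.connect_ex((host, port)) == 0:
                busy.append(port)
    return busy


if __name__ == "__main__":
    conflicts = ports_in_use(STACK_PORTS)
    print("In use:", conflicts if conflicts else "none")
```

Any port reported here would need to be remapped as described above before starting the stack.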
Step 2 — Copy the stack
Copy these files from the skill directory into a projects/observability/ folder
in the workspace:
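- assets/docker-compose.yml
- assets/config/ (entire directory tree)
- assets/lib/observability.py
- assets/scripts/register_app.sh

```bash
# SKILL_DIR stands for the path to the skill directory
mkdir -p projects/observability
cp -r SKILL_DIR/assets/* projects/observability/
mkdir -p projects/observability/logs
touch projects/observability/logs/.gitkeep
chmod +x projects/observability/scripts/register_app.sh
```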
Step 3 — Start the stack
```bash
cd projects/observability
docker compose up -d
```
Wait ~15 seconds for all services to start, then verify:
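```bash
# Print only the HTTP status code of each health endpoint
curl -s -o /dev/null -w "Grafana: %{http_code}\n" http://localhost:3000/api/health
curl -s -o /dev/null -w "Prometheus: %{http_code}\n" http://localhost:9091/-/healthy
curl -s -o /dev/null -w "Loki: %{http_code}\n" http://localhost:3300/ready
curl -s -o /dev/null -w "Tempo: %{http_code}\n" http://localhost:4318/ready
```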
All should return 200. If Loki or Tempo return 503, wait 10 more seconds and retry
(they have a slower startup than Grafana/Prometheus).
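The wait-and-retry advice above can be automated. The following is a sketch, not part of the skill: it polls the same four health URLs until each returns 200 or the attempts run out. The function names, the attempt count, and the delay are illustrative assumptions.

```python
import time
import urllib.error
import urllib.request

# Health endpoints from the verification step above.
HEALTH_URLS = {
    "Grafana": "http://localhost:3000/api/health",
    "Prometheus": "http://localhost:9091/-/healthy",
    "Loki": "http://localhost:3300/ready",
    "Tempo": "http://localhost:4318/ready",
}


def http_status(url, timeout=2.0):
    """Return the HTTP status code for url, or None if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while a service is still starting
    except (urllib.error.URLError, OSError):
        return None


def wait_until_ready(urls, probe=http_status, attempts=6, delay=5.0, sleep=time.sleep):
    """Poll each URL until it returns 200 or attempts run out; return name -> bool."""
    ready = {name: False for name in urls}
    for attempt in range(attempts):
        for name, url in urls.items():
            if not ready[name]:
                ready[name] = probe(url) == 200
        if all(ready.values()):
            break
        if attempt < attempts - 1:
            sleep(delay)
    return ready
```

Calling wait_until_ready(HEALTH_URLS) returns a dict such as {"Grafana": True, ...}; any False entry points at a service that is still not answering with 200.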
Step 4 — Install Python deps for the app
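```bash
# Quote each requirement so the shell does not treat ">" as a redirect
pip install \
  "prometheus-fastapi-instrumentator>=7.0.0" \
  "opentelemetry-sdk>=1.25.0" \
  "opentelemetry-exporter-otlp-proto-grpc>=1.25.0" \
  "opentelemetry-instrumentation-fastapi>=0.46b0" \
  "python-json-logger>=2.0.7"
```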
Step 5 — Instrument the FastAPI app
Add to the app's app.py (or main.py), just after app = FastAPI(...):
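```python
import sys

# Adjust this path to wherever projects/observability/lib lives
sys.path.insert(0, "path/to/projects/observability/lib")
from observability import setup_observability

logger = setup_observability(app, service_name="my-service-name")
```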
That's it. The app now:
- Exposes /metrics for Prometheus
- Writes JSON logs to projects/observability/logs/my-service-name/app.log
- Sends traces to Tempo on localhost:4317
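The source does not show how setup_observability() produces its JSON logs (the dependency list suggests python-json-logger). As a rough illustration of the idea only, the standard-library sketch below writes one JSON object per line to <log_root>/<service_name>/app.log. JsonLineFormatter and make_json_logger are hypothetical names, and the field names are assumptions.

```python
import json
import logging
from pathlib import Path


class JsonLineFormatter(logging.Formatter):
    """Hypothetical formatter: render each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Include trace_id when it was passed via extra={...}.
        trace_id = getattr(record, "trace_id", None)
        if trace_id is not None:
            payload["trace_id"] = trace_id
        # Compact separators give the "key":"value" shape with no spaces.
        return json.dumps(payload, separators=(",", ":"))


def make_json_logger(service_name, log_root):
    """Create a logger that appends JSON lines to <log_root>/<service_name>/app.log."""
    log_dir = Path(log_root) / service_name
    log_dir.mkdir(parents=True, exist_ok=True)
    handler = logging.FileHandler(log_dir / "app.log")
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Because each line is a standalone JSON object, a file-tailing agent can ship the log line by line without any multi-line parsing.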
Step 6 — Register with Prometheus
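```bash
cd projects/observability
./scripts/register_app.sh my-service-name <port>
# e.g. ./scripts/register_app.sh image-gen-studio 7860
```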
Prometheus hot-reloads the target within 30 seconds. Verify:
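```bash
# Print each active target's service label and health
curl -s http://localhost:9091/api/v1/targets | python3 -c "
import json, sys
data = json.load(sys.stdin)
for t in data['data']['activeTargets']:
    svc = t['labels'].get('service', '')
    print(svc, '->', t['health'])
"
```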
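The source does not show what register_app.sh writes. One common way to have Prometheus pick up targets without a restart is file-based service discovery (file_sd_configs), where targets live in a JSON file that Prometheus re-reads. Assuming that mechanism, a target entry could be built as sketched below. The host default and both function names are hypothetical; the service label mirrors the one the verification step reads.

```python
import json


def build_target_group(service_name, port, host="host.docker.internal"):
    """Build one file_sd target group for an app listening on host:port.

    The host default is an assumption (Docker Desktop's alias for the host machine).
    """
    return {
        "targets": [f"{host}:{port}"],
        "labels": {"service": service_name},
    }


def render_targets_file(groups):
    """Serialize target groups in the JSON list format that file_sd expects."""
    return json.dumps(groups, indent=2)


if __name__ == "__main__":
    print(render_targets_file([build_target_group("image-gen-studio", 7860)]))
```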
Step 7 — Open Grafana
Open http://localhost:3000
The FastAPI — App Overview dashboard is pre-loaded. Select your service from
the dropdown at the top. You'll see:
- Request rate (req/s)
- Error rate (%)
- Latency p50/p95/p99
- Requests by endpoint
- HTTP status codes
- Live log panel (Loki)
To jump from a log line to its trace: click the trace_id link in the log detail panel.
It opens the full trace in Tempo automatically (datasource pre-wired).
Step 8 — Import additional dashboards (optional)
In Grafana → Dashboards → Import:
- 16110 — FastAPI Observability (richer alternative to the built-in)
- 13407 — Loki Logs Overview
- 16112 — Tempo Service Graph (service dependency map)
Useful commands
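```bash
# Reload the Prometheus config after registering a new app
curl -s -X POST http://localhost:9091/-/reload

# Restart a single service without losing data
docker compose -f projects/observability/docker-compose.yml restart grafana

# Stop all services (data volumes are kept)
docker compose -f projects/observability/docker-compose.yml down

# Full reset (deletes all stored data)
docker compose -f projects/observability/docker-compose.yml down -v

# Check Alloy's log-shipping status (macOS; use xdg-open on Linux)
open http://localhost:12345
```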
Manual tracing (optional)
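```python
from observability import get_tracer

tracer = get_tracer(__name__)


@app.get("/expensive-endpoint")
async def handler():
    # Wrap the slow section in its own span so it shows up in the trace
    with tracer.start_as_current_span("db-query") as span:
        span.set_attribute("db.table", "users")
        result = await db.query(...)  # db stands in for your own database client
    return result
```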
Log/trace correlation
The OTel instrumentation injects trace_id into every log record. Grafana Loki
is pre-configured with a derived field that turns "trace_id":"abc123" into a
clickable link to the Tempo trace.
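To make the derived-field idea concrete: a derived field is driven by a regular expression that captures the trace ID from the log line. The actual regex in the provisioned datasource is not shown in the source, so the pattern below is an assumption, and extract_trace_id is a hypothetical helper for trying it out locally.

```python
import re

# Assumed pattern; the regex configured in the provisioned Loki datasource may differ.
TRACE_ID_PATTERN = re.compile(r'"trace_id"\s*:\s*"([0-9a-fA-F]+)"')


def extract_trace_id(log_line):
    """Return the trace ID captured from a JSON log line, or None if absent."""
    match = TRACE_ID_PATTERN.search(log_line)
    return match.group(1) if match else None
```

For a line such as {"message":"ok","trace_id":"abc123"} the helper returns abc123, which is the value that would be substituted into the Tempo link.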
To manually include trace context in your own log calls:
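```python
from opentelemetry import trace


def trace_ctx() -> dict:
    """Return the current trace ID as a logging extra, or {} outside a valid span."""
    ctx = trace.get_current_span().get_span_context()
    return {"trace_id": format(ctx.trace_id, "032x")} if ctx.is_valid else {}


logger.info("Processing request", extra=trace_ctx())
```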
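The "032x" format matters because OpenTelemetry holds trace IDs as 128-bit integers, while trace backends display them as 32 lowercase hex characters, so leading zeros must be preserved. The helpers below are a small standalone sketch of that conversion; their names are hypothetical.

```python
def format_trace_id(trace_id: int) -> str:
    """Render a 128-bit trace ID as 32 lowercase hex characters, zero-padded."""
    return format(trace_id, "032x")


def format_span_id(span_id: int) -> str:
    """Render a 64-bit span ID as 16 lowercase hex characters, zero-padded."""
    return format(span_id, "016x")
```

Without the zero padding, an ID with leading zero bits would come out shorter than 32 characters and would not match the ID stored with the trace.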
Notes
- Logs are written to projects/observability/logs/<service>/app.log as JSON. Alloy tails these files and ships them to Loki, so no code changes are needed beyond setup_observability().
- All observability is local; no data leaves the machine.
- data_classification: LOCAL_ONLY is the default for all traces/logs.
- The Alloy config drops DEBUG-level logs by default. Edit config/alloy/config.alloy to remove the stage.drop block if you need debug logs.
observability-lgtm
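Set up a full local observability stack (Loki + Grafana + Tempo + Prometheus + Alloy) for FastAPI apps on macOS (Apple Silicon) or Linux. One command to start, one import to instrument any app. Logs → Loki, metrics → Prometheus, traces → Tempo, all unified in Grafana.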
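When to use
- User is building a FastAPI web app and wants logs, metrics, and traces
- User wants a local Grafana dashboard without setting up ELK (too heavy)
- User wants to correlate logs ↔ traces ↔ metrics in one UI
- User has multiple local apps and wants universal observability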
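When NOT to use
- Production cloud deployments (use managed Grafana Cloud or Datadog instead)
- Non-Python apps (the Python lib only works for FastAPI; the stack itself is language-agnostic)
- When Docker is not available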
Prerequisites
- Docker + Docker Compose v2 installed
- Python 3.10+ (for the instrumentation lib)
- FastAPI app to instrument
What gets installed
| Service | Port | Purpose |
|---|---|---|
| Grafana | 3000 | Dashboards (no login in dev mode) |
| Prometheus | 9091 | Metrics scraping (avoids a clash with MinIO on 9090) |
| Loki | 3300 | Log storage (avoids a clash with Langfuse on 3100) |
| Tempo gRPC | 4317 | OTLP trace receiver |
| Tempo HTTP | 4318 | OTLP HTTP alternative |
| Alloy UI | 12345 | Agent status |
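Steps
Step 1 — Check for port conflicts
```bash
# List listeners on the ports the stack needs (prints address:port and command)
lsof -iTCP -sTCP:LISTEN -n -P 2>/dev/null | grep -E ':(3000|3300|9091|4317|4318|12345)' | awk '{print $9, $1}'
```
If any of the ports above are in use, update the relevant port in docker-compose.yml and the matching url: in config/grafana/provisioning/datasources/datasources.yml. Common conflicts: Langfuse on 3100, MinIO on 9090.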
Step 2 — Copy the stack
Copy these files from the skill directory into a projects/observability/ folder in the workspace:
- assets/docker-compose.yml
- assets/config/ (entire directory tree)
- assets/lib/observability.py
- assets/scripts/register_app.sh

```bash
# SKILL_DIR stands for the path to the skill directory
mkdir -p projects/observability
cp -r SKILL_DIR/assets/* projects/observability/
mkdir -p projects/observability/logs
touch projects/observability/logs/.gitkeep
chmod +x projects/observability/scripts/register_app.sh
```
Step 3 — Start the stack
```bash
cd projects/observability
docker compose up -d
```
Wait ~15 seconds for all services to start, then verify:
```bash
# Print only the HTTP status code of each health endpoint
curl -s -o /dev/null -w "Grafana: %{http_code}\n" http://localhost:3000/api/health
curl -s -o /dev/null -w "Prometheus: %{http_code}\n" http://localhost:9091/-/healthy
curl -s -o /dev/null -w "Loki: %{http_code}\n" http://localhost:3300/ready
curl -s -o /dev/null -w "Tempo: %{http_code}\n" http://localhost:4318/ready
```
All should return 200. If Loki or Tempo return 503, wait 10 more seconds and retry (they start more slowly than Grafana and Prometheus).
Step 4 — Install Python deps for the app
```bash
# Quote each requirement so the shell does not treat ">" as a redirect
pip install \
  "prometheus-fastapi-instrumentator>=7.0.0" \
  "opentelemetry-sdk>=1.25.0" \
  "opentelemetry-exporter-otlp-proto-grpc>=1.25.0" \
  "opentelemetry-instrumentation-fastapi>=0.46b0" \
  "python-json-logger>=2.0.7"
```
Step 5 — Instrument the FastAPI app
Add to the app's app.py (or main.py), just after app = FastAPI(...):
```python
import sys

# Adjust this path to wherever projects/observability/lib lives
sys.path.insert(0, "path/to/projects/observability/lib")
from observability import setup_observability

logger = setup_observability(app, service_name="my-service-name")
```
That's it. The app now:
- Exposes /metrics for Prometheus to scrape
- Writes JSON logs to projects/observability/logs/my-service-name/app.log
- Sends traces to Tempo on localhost:4317
Step 6 — Register with Prometheus
```bash
cd projects/observability
./scripts/register_app.sh my-service-name <port>
# e.g. ./scripts/register_app.sh image-gen-studio 7860
```
Prometheus hot-reloads the target within 30 seconds. Verify:
```bash
# Print each active target's service label and health
curl -s http://localhost:9091/api/v1/targets | python3 -c "
import json, sys
data = json.load(sys.stdin)
for t in data['data']['activeTargets']:
    svc = t['labels'].get('service', '')
    print(svc, '->', t['health'])
"
```
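Step 7 — Open Grafana
Open http://localhost:3000
The FastAPI — App Overview dashboard is pre-loaded. Select your service from the dropdown at the top. You'll see:
- Request rate (req/s)
- Error rate (%)
- Latency p50/p95/p99
- Requests by endpoint
- HTTP status codes
- Live log panel (Loki)

To jump from a log line to its trace: click the trace_id link in the log detail panel. It opens the full trace in Tempo automatically (datasource pre-wired).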
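Step 8 — Import additional dashboards (optional)
In Grafana → Dashboards → Import:
- 16110 — FastAPI Observability (richer alternative to the built-in)
- 13407 — Loki Logs Overview
- 16112 — Tempo Service Graph (service dependency map)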
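Useful commands
```bash
# Reload the Prometheus config after registering a new app
curl -s -X POST http://localhost:9091/-/reload

# Restart a single service without losing data
docker compose -f projects/observability/docker-compose.yml restart grafana

# Stop all services (data volumes are kept)
docker compose -f projects/observability/docker-compose.yml down

# Full reset (deletes all stored data)
docker compose -f projects/observability/docker-compose.yml down -v

# Check Alloy's log-shipping status (macOS; use xdg-open on Linux)
open http://localhost:12345
```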
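Manual tracing (optional)
```python
from observability import get_tracer

tracer = get_tracer(__name__)


@app.get("/expensive-endpoint")
async def handler():
    # Wrap the slow section in its own span so it shows up in the trace
    with tracer.start_as_current_span("db-query") as span:
        span.set_attribute("db.table", "users")
        result = await db.query(...)  # db stands in for your own database client
    return result
```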
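Log/trace correlation
The OTel instrumentation injects trace_id into every log record. Grafana Loki is pre-configured with a derived field that turns "trace_id":"abc123" into a clickable link to the Tempo trace.
To manually include trace context in your own log calls:
```python
from opentelemetry import trace


def trace_ctx() -> dict:
    """Return the current trace ID as a logging extra, or {} outside a valid span."""
    ctx = trace.get_current_span().get_span_context()
    return {"trace_id": format(ctx.trace_id, "032x")} if ctx.is_valid else {}


logger.info("Processing request", extra=trace_ctx())
```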
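Notes
- Logs are written to projects/observability/logs/<service>/app.log as JSON. Alloy tails these files and ships them to Loki, so no code changes are needed beyond setup_observability().
- All observability data is local; nothing leaves the machine.
- data_classification: LOCAL_ONLY is the default for all traces and logs.
- The Alloy config drops DEBUG-level logs by default. Edit config/alloy/config.alloy to remove the stage.drop block if you need debug logs.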