Reconnection (Always Forget)
- Connections drop silently: a TCP FIN may never arrive, so don't assume `onclose` fires
- Exponential backoff: 1s, 2s, 4s, 8s... cap at 30s; prevents thundering herd on server recovery
- Add jitter: `delay * (0.5 + Math.random())`; prevents synchronized reconnection storms
- Track reconnection state: queue messages during reconnect, replay after
- Max retry limit, then surface the error to the user; don't retry forever silently
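
The backoff, jitter, and retry-limit rules above can be sketched as two small functions. This is a minimal sketch, not a definitive implementation: the names `backoffDelay` and `nextReconnectDelay`, the injectable `random` option, and the default of 8 retries are assumptions made for illustration. The cap is applied before jitter here, so the longest possible wait is 1.5 times the cap.

```javascript
// Sketch only: function names and defaults are illustrative assumptions.
function backoffDelay(attempt, { baseMs = 1000, capMs = 30000, random = Math.random } = {}) {
  // 1s, 2s, 4s, 8s... capped at capMs before jitter is applied.
  const capped = Math.min(capMs, baseMs * 2 ** attempt);
  // Jitter factor in [0.5, 1.5), i.e. delay * (0.5 + Math.random()).
  return Math.round(capped * (0.5 + random()));
}

// Returns the next delay in ms, or null once maxRetries is used up,
// at which point the caller should surface an error to the user.
function nextReconnectDelay(attempt, { maxRetries = 8, ...opts } = {}) {
  if (attempt >= maxRetries) return null;
  return backoffDelay(attempt, opts);
}
```

A caller would typically pass the delay to `setTimeout` before opening a new socket, and reset `attempt` to 0 once the connection reports open.
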
Heartbeats (Critical)
- Ping/pong frames exist at the protocol level, but the browser API doesn't expose them; use an application-level ping
- Send a ping every 30s and expect a pong within 10s; no pong means the connection is dead, so reconnect
- Server should ping too: detects dead clients, cleans up resources
- Idle timeout in proxies (60-120s typical): heartbeat must be more frequent
- Don't rely on TCP keepalive: too infrequent, not reliable through proxies
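
One way to wire up the application-level ping described above is sketched below. The message shapes (`{"type":"ping"}` and `{"type":"pong"}`), the `createHeartbeat` name, and the injectable `timers` option are assumptions for illustration; the server is assumed to answer each ping with a pong.

```javascript
// Sketch only: message shapes and option names are illustrative assumptions.
function createHeartbeat(socket, { intervalMs = 30000, timeoutMs = 10000, onDead, timers = globalThis } = {}) {
  let intervalId = null;
  let timeoutId = null;

  function stop() {
    timers.clearInterval(intervalId);
    timers.clearTimeout(timeoutId);
    intervalId = null;
    timeoutId = null;
  }

  function sendPing() {
    if (timeoutId !== null) return; // a ping is already awaiting its pong
    socket.send(JSON.stringify({ type: 'ping' }));
    // No pong within timeoutMs: treat the connection as dead.
    timeoutId = timers.setTimeout(() => {
      stop();
      onDead();
    }, timeoutMs);
  }

  // Call for every incoming message; returns true if it was a pong.
  function handleMessage(data) {
    if (typeof data !== 'string') return false;
    let msg = null;
    try { msg = JSON.parse(data); } catch { return false; }
    if (!msg || msg.type !== 'pong') return false;
    timers.clearTimeout(timeoutId);
    timeoutId = null;
    return true;
  }

  function start() {
    stop();
    intervalId = timers.setInterval(sendPing, intervalMs);
  }

  return { start, stop, handleMessage };
}
```

In `onDead` the caller would close the socket and start the reconnect logic; `stop()` should also be called from the socket's close handler so timers do not outlive the connection.
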
Connection State
- `readyState`: 0=CONNECTING, 1=OPEN, 2=CLOSING, 3=CLOSED; check before sending
- Buffer messages while CONNECTING: send after OPEN
- `bufferedAmount` shows queued bytes: pause sending if backpressure is building
- Multiple tabs = multiple connections: coordinate via BroadcastChannel or SharedWorker
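
The `readyState` and `bufferedAmount` checks above can be combined in a small send queue, sketched here. `SendQueue` and `highWaterMark` are hypothetical names, and the 1 MiB threshold is an arbitrary assumption. The browser WebSocket API has no drain event, so `flush()` has to be called again from the `open` handler and from a timer while messages are pending.

```javascript
// Sketch only: class name and threshold are illustrative assumptions.
const WS_OPEN = 1; // numeric value of WebSocket.OPEN

class SendQueue {
  constructor(socket, { highWaterMark = 1024 * 1024 } = {}) {
    this.socket = socket;
    this.highWaterMark = highWaterMark;
    this.pending = [];
  }

  // Queue a message and try to send it right away.
  // Returns the number of messages still waiting.
  send(data) {
    this.pending.push(data);
    return this.flush();
  }

  // Send queued messages while the socket is OPEN and its
  // outgoing buffer is below the high-water mark.
  flush() {
    const s = this.socket;
    while (
      this.pending.length > 0 &&
      s.readyState === WS_OPEN &&
      s.bufferedAmount < this.highWaterMark
    ) {
      s.send(this.pending.shift());
    }
    return this.pending.length;
  }
}
```
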
Authentication
- Token in URL query: `wss://host/ws?token=xxx`; simple, but the token ends up in access logs
- First message auth: connect, send token, wait for ack; cleaner but more round trips
- Cookie auth: works if same origin; the browser WebSocket API cannot set custom headers
- Reauthenticate after reconnect: don't assume the previous session is still valid
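
The first-message auth flow above could look roughly like this on the server side. Everything here is a sketch under stated assumptions: the message shapes (`{type: 'auth', token}` and `{type: 'auth_ok'}`), the close code 4001, and the `createAuthGate` and `verifyToken` names are hypothetical and not part of any library.

```javascript
// Sketch only: message shapes, close code, and names are illustrative assumptions.
const AUTH_FAILED = 4001; // application-defined close code (assumed value)

// Create one gate per connection. verifyToken is supplied by the caller
// and returns true for a valid token.
function createAuthGate(verifyToken) {
  let authenticated = false;

  // Returns what the connection handler should do with this raw message.
  return function handle(raw) {
    if (authenticated) return { action: 'forward', raw };

    let msg = null;
    try { msg = JSON.parse(raw); } catch { /* handled by the rejection below */ }

    if (msg && msg.type === 'auth' && typeof msg.token === 'string' && verifyToken(msg.token)) {
      authenticated = true;
      return { action: 'reply', message: JSON.stringify({ type: 'auth_ok' }) };
    }
    // Anything other than a valid auth message before authentication is rejected.
    return { action: 'close', code: AUTH_FAILED, reason: 'authentication failed' };
  };
}
```

Because each connection gets a fresh gate, a reconnecting client has to authenticate again, which matches the last bullet above.
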
Scaling Challenges
- WebSocket connections are stateful: a connection stays pinned to one server, so per-request round-robin doesn't apply
- Sticky sessions: route by client ID to same server, or use Redis pub/sub for broadcast
- Each connection holds memory: thousands of connections = significant RAM
- Graceful shutdown: send close frame, wait for clients to reconnect elsewhere
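
Routing by client ID, as in the sticky-session bullet above, can be illustrated with a deterministic hash. This is only a sketch: `hashClientId` and `pickServer` are hypothetical names, the hash is an FNV-1a-style function over UTF-16 code units, and real deployments usually do this in the load balancer rather than in application code.

```javascript
// Sketch only: names are illustrative assumptions.
// FNV-1a-style 32-bit hash over the string's UTF-16 code units.
function hashClientId(id) {
  let h = 0x811c9dc5;
  for (let i = 0; i < id.length; i++) {
    h ^= id.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force unsigned 32-bit
}

// The same client ID always maps to the same entry while the list is unchanged.
function pickServer(clientId, servers) {
  if (servers.length === 0) throw new Error('no servers available');
  return servers[hashClientId(clientId) % servers.length];
}
```

Plain modulo hashing remaps most clients whenever the server list changes; consistent hashing reduces that churn at the cost of more code.
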
Nginx/Proxy Config
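```nginx
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_read_timeout 3600s;
```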
- Without these headers, the upgrade fails: the connection closes immediately
- `proxy_read_timeout` must exceed your ping interval: the 60s default is too short without frequent traffic
- Load balancer health checks: separate HTTP endpoint, not WebSocket
Close Codes
- 1000: normal closure; 1001: going away (page close)
- 1006: abnormal (no close frame received); usually a network issue
- 1008: policy violation; 1011: server error
- 4000-4999: application-defined; use for auth failure, rate limit, etc.
- Always send a close code and reason: helps debugging
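
A client can turn the codes listed above into a reconnect decision. The sketch below uses a hypothetical `describeClose` helper. The code meanings come from the list above; the `reconnect` flags are an assumed policy and should be adapted to the application.

```javascript
// Sketch only: the reconnect policy below is an assumption, not a standard.
function describeClose(code) {
  if (code === 1000) return { meaning: 'normal closure', reconnect: false };
  if (code === 1001) return { meaning: 'going away', reconnect: true };
  if (code === 1006) return { meaning: 'abnormal (no close frame received)', reconnect: true };
  if (code === 1008) return { meaning: 'policy violation', reconnect: false };
  if (code === 1011) return { meaning: 'server error', reconnect: true };
  if (code >= 4000 && code <= 4999) {
    // Application-defined: assumed here to need user action (e.g. auth failure).
    return { meaning: 'application-defined', reconnect: false };
  }
  return { meaning: 'other', reconnect: true };
}
```

A close handler would call `describeClose(event.code)` and only schedule a retry when `reconnect` is true.
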
Message Handling
- Text frames for JSON; binary frames for blobs/protobuf. Don't mix without framing
- TCP has no guaranteed message boundaries, but WebSocket handles framing for you
- Order preserved per connection: messages arrive in send order
- Large messages may fragment: library handles reassembly; set max message size server-side
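
The text/binary split and the size limit above might be enforced in one place, as in this sketch. The limits and the `classifyFrame` name are assumptions. Binary data is assumed to arrive as an `ArrayBuffer` or typed array, which in a browser means setting `binaryType = 'arraybuffer'` on the socket.

```javascript
// Sketch only: limits and names are illustrative assumptions.
const MAX_TEXT_CHARS = 64 * 1024;
const MAX_BINARY_BYTES = 1024 * 1024;

// Classify one received message: JSON text, binary payload, or rejected.
function classifyFrame(data) {
  if (typeof data === 'string') {
    if (data.length > MAX_TEXT_CHARS) return { kind: 'rejected', reason: 'too large' };
    try {
      return { kind: 'json', value: JSON.parse(data) };
    } catch {
      return { kind: 'rejected', reason: 'invalid JSON' };
    }
  }
  if (data instanceof ArrayBuffer || ArrayBuffer.isView(data)) {
    if (data.byteLength > MAX_BINARY_BYTES) return { kind: 'rejected', reason: 'too large' };
    return { kind: 'binary', bytes: data.byteLength };
  }
  return { kind: 'rejected', reason: 'unsupported type' };
}
```
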
Security
- Validate Origin header on handshake: prevents cross-site WebSocket hijacking
- Same-origin policy doesn't apply: any page can connect to your WebSocket server
- Rate limit per connection: one client can flood with messages
- Validate every message: malicious clients can send anything after connecting
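
Two of the checks above, the Origin allow-list and per-connection rate limiting, are sketched below. The allowed origin, the `isAllowedOrigin` and `TokenBucket` names, and the bucket parameters are hypothetical. Only browsers send a trustworthy Origin header; other clients can forge it, so this check complements authentication rather than replacing it.

```javascript
// Sketch only: origin list, names, and limits are illustrative assumptions.
const ALLOWED_ORIGINS = new Set(['https://app.example.com']);

// Check the Origin header value taken from the handshake request.
function isAllowedOrigin(originHeader) {
  return typeof originHeader === 'string' && ALLOWED_ORIGINS.has(originHeader);
}

// Token bucket: one instance per connection, one token per message.
class TokenBucket {
  constructor({ capacity = 20, refillPerSec = 10, now = Date.now } = {}) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
    this.tokens = capacity;
    this.last = now();
  }

  // Returns true if the message is allowed, false if the client is over its limit.
  tryRemove() {
    const t = this.now();
    const refill = ((t - this.last) / 1000) * this.refillPerSec;
    this.tokens = Math.min(this.capacity, this.tokens + refill);
    this.last = t;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

When `tryRemove()` returns false, the server can drop the message or close the connection with an application-defined code.
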
Common Mistakes
- No heartbeat: connection appears alive but is dead; messages go nowhere
- Reconnect without backoff: hammers server during outage, prolongs recovery
- Storing state only in connection: lost on reconnect; persist critical state externally
- Huge messages: block the event loop; stream large data via chunking
- Not handling `bufferedAmount`: memory grows unbounded if the receiver is slower than the sender
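
Chunking, suggested above for huge messages, needs a small envelope so the receiver can reassemble the pieces. The sketch below assumes string payloads and a JSON envelope with `id`, `seq`, `total`, and `data` fields; all of these names are hypothetical. A production reassembler would also validate the fields and cap `total`, since they come from an untrusted peer.

```javascript
// Sketch only: envelope fields and names are illustrative assumptions.
// Split a string payload into JSON-encoded chunks of at most chunkSize characters.
function* chunkMessage(id, payload, chunkSize = 16 * 1024) {
  const total = Math.max(1, Math.ceil(payload.length / chunkSize));
  for (let seq = 0; seq < total; seq++) {
    const data = payload.slice(seq * chunkSize, (seq + 1) * chunkSize);
    yield JSON.stringify({ id, seq, total, data });
  }
}

// Returns a function that accepts raw chunks and yields the full payload
// once every chunk for an id has arrived, or null until then.
function createReassembler() {
  const partial = new Map();
  return function accept(raw) {
    const { id, seq, total, data } = JSON.parse(raw);
    const parts = partial.get(id) || [];
    parts[seq] = data;
    partial.set(id, parts);
    // filter() skips unset slots, so this counts the chunks received so far.
    if (parts.filter((p) => p !== undefined).length < total) return null;
    partial.delete(id);
    return parts.join('');
  };
}
```

Sending the chunks through a queue that checks `bufferedAmount` between sends keeps one large transfer from starving other messages.
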