Network Log Analysis
Device-level syslog analysis and forensic timeline construction without
SIEM platforms. This skill covers raw log data from rsyslog/syslog-ng
collectors, device console output, and SNMP trap receivers — all
analysis uses standard Unix tools (grep, awk, sort, sed).
For environments with a SIEM platform, use the companion skill
siem-log-analysis, which provides the same forensic reasoning with
platform-specific query syntax.
See references/cli-reference.md for rsyslog/syslog-ng
configuration, device syslog commands, and log parsing one-liners.
See references/syslog-patterns.md for vendor-specific message
formats, the RFC 5424 facility/severity matrix, and common event
pattern catalogs.
When to Use
- No SIEM available: investigate network events using only raw
syslog files on a centralized collector or individual device logs
- Syslog infrastructure audit: verify that rsyslog or syslog-ng
is correctly receiving, routing, and retaining logs from all network
devices in scope
- Multi-device event correlation: construct a unified timeline
from separate per-device or per-facility log files using timestamp
sorting and pattern matching
- Anomaly investigation: identify deviations from normal log
volume, new message types, or authentication failure clusters
without statistical query engines
- Post-incident timeline reconstruction: assemble a chronological
evidence chain from raw logs after a network outage or security event
- Log retention compliance: verify that log rotation policies
and retention periods meet organizational or regulatory requirements
Prerequisites
- Syslog collector access: SSH or console access to the
rsyslog/syslog-ng server with read permissions on log directories
(typically /var/log/ or custom paths defined in collector config)
- Device CLI access: Read-only credentials for network devices
to verify syslog forwarding configuration and NTP synchronization
status
- Unix tool availability: grep, awk, sort, sed, wc, and date
available on the syslog collector (standard on any Linux/BSD system)
- NTP verification: Confirm time synchronization across all
network devices and the syslog collector before multi-device
correlation; skewed clocks corrupt timeline accuracy
- Log file identification: Know the log file paths, naming
conventions, and rotation schedule on the collector; rsyslog and
syslog-ng route logs differently based on facility, severity, and
source address
Procedure
Follow these six steps sequentially. The procedure builds a forensic
timeline from raw syslog evidence through pattern recognition,
correlation, anomaly detection, and chronological reconstruction.
Each step produces artifacts that feed subsequent steps.
Step 1: Log Collection Assessment
Verify that the syslog infrastructure is complete and healthy before
analyzing log content. Missing or misconfigured sources create blind
spots that invalidate investigation conclusions.
Syslog server configuration — Examine the collector configuration
to understand how logs are routed:
- [rsyslog]: Review /etc/rsyslog.conf and /etc/rsyslog.d/*.conf
for input modules (imudp, imtcp, imrelp), facility/severity routing
rules, and output file templates. Confirm that
$ActionFileDefaultTemplate
includes the source hostname for multi-device disambiguation.
- [syslog-ng]: Review /etc/syslog-ng/syslog-ng.conf for source
definitions (network listeners), filter chains (facility, severity,
host match), and destination paths. Verify that keep-hostname(yes)
preserves the originating device hostname.
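As a rough sketch of the rsyslog review above, the helper below lists the lines that load or bind the three input modules. The function name list_rsyslog_inputs is hypothetical, and the pattern assumes either the legacy $ModLoad directive or the RainerScript module()/input() forms; commented-out lines are skipped.

```shell
# Hypothetical helper: show active rsyslog input-module lines with file and line number.
# Matches legacy "$ModLoad imudp" and RainerScript "module(load=...)" / "input(type=...)".
list_rsyslog_inputs() {
    grep -HnE '^[[:space:]]*(\$ModLoad|module\(|input\()[^#]*im(udp|tcp|relp)' "$@"
}
```

A typical call would be list_rsyslog_inputs /etc/rsyslog.conf /etc/rsyslog.d/*.conf. No output suggests the collector has no network listener configured in those files.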
Device syslog verification — On each in-scope network device,
confirm syslog forwarding is active and targeting the correct collector:
- [Cisco]: show logging confirms logging host, trap level, and
facility; show logging history shows recent buffered severity counts
- [JunOS]: show system syslog shows configured host targets,
facility filters, and structured-data enablement
- [EOS]: show logging shows syslog server address, logging level,
and protocol (UDP/TCP)
NTP synchronization check — Verify on each device:
- [Cisco]: show ntp status (stratum, offset)
- [JunOS]: show system ntp (peer status, offset)
- [EOS]: show ntp status (clock state, stratum)
Devices with NTP offset exceeding 1 second require timestamp correction
before correlation.
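One way to apply the 1-second rule is to record each device's reported offset in a small text file and flag the outliers. This is only a sketch: the two-column input format (device name, offset in milliseconds) and the function name flag_ntp_skew are assumptions, not something the devices produce directly.

```shell
# Hypothetical helper. Input lines: "<device> <offset_ms>" (hand-collected from the
# NTP status commands). Prints devices whose absolute offset exceeds the limit.
flag_ntp_skew() {
    awk -v limit_ms="${1:-1000}" '
        { off = $2 + 0; if (off < 0) off = -off }   # absolute value of the offset
        off > limit_ms { print $1, $2 }
    '
}
```

For example, printf 'rtr1 12.5\nsw1 -1500\n' | flag_ntp_skew prints only the sw1 line, which marks that device for timestamp correction in Step 5.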
Log retention and rotation — Check logrotate configuration for
retention period, compression, and file size limits. Confirm the
retention window covers the investigation period. Missing rotated
files indicate evidence gaps.
Step 2: Syslog Pattern Recognition
Parse raw syslog messages into structured fields using vendor-specific
format knowledge. Pattern recognition transforms unstructured text into
correlatable evidence.
Vendor message formats:
- [Cisco]: %FACILITY-SEVERITY-MNEMONIC: message. Facility
identifies the subsystem (LINEPROTO, OSPF, SEC), severity is 0–7
(RFC 5424), mnemonic is the event identifier.
- [JunOS]: hostname process[pid]: EVENT_ID: message. When
structured-data is enabled, adds [tag value] pairs.
- [EOS]: hostname AgentName: %FACILITY-SEVERITY-message.
Agent name identifies the subsystem (Ebra, Bgp, Ospf).
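Severity classification: Extract the severity digit and map it to
RFC 5424 levels (0=Emergency through 7=Debug). Filter to
severity 0–4 (Emergency through Warning) for operationally significant
events. Add severity 5–7 (Notice through Debug) only when the more
severe events lack context, for example the severity-5 adjacency
changes (OSPF-5-ADJCHG, BGP-5-ADJCHANGE) needed for the causal chains
in Step 3.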
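A minimal sketch of the severity filter for Cisco-style messages follows. The function name filter_significant is hypothetical, and the pattern assumes the plain %FACILITY-SEVERITY-MNEMONIC layout, so messages with an extra sub-facility field would need a wider pattern.

```shell
# Hypothetical helper: keep only messages whose severity digit is 0-4.
# Reads the named files, or standard input when no file is given.
filter_significant() {
    grep -E '%[A-Z0-9_]+-[0-4]-[A-Z0-9_]+' "$@"
}
```

Changing the bracket to [0-5] would pull in the severity-5 notices when more context is needed.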
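Message frequency baseline: Count messages per hour to establish
normal volume, then break busy hours down per facility:

    awk '{ print $1, $2, substr($3, 1, 2) }' /var/log/network.log | sort | uniq -c | sort -rn

This produces a frequency table grouped by month, day, and hour (the
RFC 3164 timestamp fields, with the time truncated to the hour).
Significant deviations from the hourly mean indicate events worth
investigating.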
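The per-facility breakdown can be sketched with awk's match() function. This assumes RFC 3164 timestamps in the first three fields and Cisco-style %FACILITY-SEVERITY- tags; the function name count_by_facility_hour is hypothetical.

```shell
# Hypothetical helper: count messages per hour and per facility.
# Lines without a %FACILITY-SEVERITY- tag are ignored.
count_by_facility_hour() {
    awk 'match($0, /%[A-Z0-9_]+-[0-7]-/) {
             # Strip the leading "%" and the trailing "-N-" from the matched tag.
             facility = substr($0, RSTART + 1, RLENGTH - 4)
             print $1, $2, substr($3, 1, 2) ":00", facility
         }' "$@" | sort | uniq -c | sort -rn
}
```

Each output line carries a count followed by month, day, hour, and facility, with the busiest combination first.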
Facility-to-subsystem mapping — Map syslog facility codes to
network subsystems using references/syslog-patterns.md. Facility
local0–local7 assignments vary per organization — check the rsyslog
routing rules from Step 1 to decode local facility meanings.
Step 3: Event Correlation
Join events from multiple devices and log files by shared attributes
to build investigation threads from isolated messages.
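Multi-device timeline via grep/awk/sort: Merge per-device log
files into a single chronologically sorted stream:

    sort -k1,1M -k2,2n -k3,3 /var/log/rtr.log /var/log/sw.log > /tmp/merged-timeline.log

The M modifier sorts the month abbreviation in calendar order rather
than alphabetically, which matters when the window spans a month
boundary. If log files use different timestamp formats, normalize first
with awk before merging (see Step 5 for timestamp normalization details).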
Temporal clustering — Identify events within a configurable time
window of a known trigger event. For precision, convert timestamps to
epoch seconds and select events within the desired window using awk
numeric comparison (see references/cli-reference.md for correlation
helpers).
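The window selection can be sketched as a single awk numeric comparison once every line starts with an epoch timestamp. The input layout (epoch seconds in the first field) and the function name events_near are assumptions for illustration.

```shell
# Hypothetical helper. Usage: events_near TRIGGER_EPOCH WINDOW_SECONDS < epoch-prefixed.log
# Prints every line whose first field lies within +/- WINDOW_SECONDS of the trigger.
events_near() {
    awk -v t="$1" -v w="$2" '
        { d = $1 - t; if (d < 0) d = -d }   # absolute distance from the trigger
        d <= w
    '
}
```

For example, events_near 1705312800 30 keeps events within 30 seconds on either side of that trigger time.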
Causal chain detection — Network failures propagate through
protocol dependencies in predictable patterns: interface flap
(LINK-3-UPDOWN) → OSPF neighbor down (OSPF-5-ADJCHG) → BGP route
withdrawal (BGP-5-ADJCHANGE) → traffic reroute on alternate paths.
Search for cascading patterns by extracting events matching each stage
and verifying temporal sequence.
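A sketch of the sequence check: given a chronologically sorted log, the helper below looks for each stage in order and succeeds only if all stages appear. The function name check_cascade and the space-separated stage list are hypothetical conventions.

```shell
# Hypothetical helper. Usage: check_cascade "STAGE1 STAGE2 ..." < merged-timeline.log
# Prints the first line matching each stage, in order; exits 1 if the chain is incomplete.
check_cascade() {
    awk -v stages="$1" '
        BEGIN { n = split(stages, s, " "); want = 1 }
        want <= n && index($0, s[want]) { print "stage " want ": " $0; want++ }
        END { if (want <= n) exit 1 }
    '
}
```

Calling check_cascade "LINK-3-UPDOWN OSPF-5-ADJCHG BGP-5-ADJCHANGE" on the merged timeline reports the first event at each stage. A non-zero exit means the later stages never followed the earlier ones, so the cascade hypothesis is not supported by that log.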
SNMP trap correlation — If the collector receives SNMP traps
(via snmptrapd), correlate trap OIDs with syslog events. Interface
linkDown traps (OID 1.3.6.1.6.3.1.1.5.3) should pair with LINK-UPDOWN
syslog messages from the same device. Mismatches indicate logging gaps.
Step 4: Anomaly Detection
Compare observed log patterns against baselines to surface deviations
that warrant investigation. All detection uses grep, awk, and sort
against raw log files.
Baseline deviation — Compare current-day event counts per device
against the rolling 7-day average. A current-day count exceeding
twice the average signals a volume anomaly. Calculate per-facility
counts to pinpoint which subsystem generates excess messages.
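The comparison against the rolling average can be sketched with two small count files. Both file layouts (one "device count" line per device per day for the history, one line per device for today) and the function name volume_anomalies are assumptions.

```shell
# Hypothetical helper. Usage: volume_anomalies HISTORY_FILE TODAY_FILE
# HISTORY_FILE: "<device> <count>" lines, one per device per prior day (e.g. 7 days).
# TODAY_FILE:   "<device> <count>" lines for the current day.
volume_anomalies() {
    awk 'NR == FNR { sum[$1] += $2; days[$1]++; next }   # first file: accumulate history
         ($1 in sum) {
             avg = sum[$1] / days[$1]
             if ($2 > 2 * avg) printf "%s today=%d avg=%.1f\n", $1, $2, avg
         }' "$1" "$2"
}
```

Devices that appear only in today's file are skipped here because they have no baseline; they are worth listing separately as new log sources.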
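New or unseen message types: Extract unique mnemonics from
the investigation window and compare against a baseline file:

    grep -oP '%\S+-\d-\S+' /var/log/cisco.log | sort -u > /tmp/current.txt
    comm -23 /tmp/current.txt /tmp/baseline-mnemonics.txt

comm expects both files to be sorted the same way, so build the
baseline file with the same grep and sort -u pipeline. Mnemonics
present in current but absent from baseline are first-seen events
requiring classification.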
Authentication failure clustering — Extract auth messages and
group by source IP using grep/awk/sort (see references/cli-reference.md
for one-liners). Source IPs with failure counts exceeding 10 per hour
warrant investigation as potential brute-force attempts.
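As an illustration of the grouping, the sketch below counts Cisco login failures per source IP per hour. It assumes the %SEC_LOGIN-4-LOGIN_FAILED message with a [Source: a.b.c.d] field and RFC 3164 timestamps; other vendors need a different pattern. The function name auth_failures_by_source is hypothetical.

```shell
# Hypothetical helper. Usage: auth_failures_by_source [LIMIT] < network.log
# Prints "<count> <Mon> <day> <hour>:00 <source-ip>" for sources above LIMIT (default 10).
auth_failures_by_source() {
    awk -v limit="${1:-10}" '
        /SEC_LOGIN-4-LOGIN_FAILED/ && match($0, /\[Source: [0-9.]+\]/) {
            ip = substr($0, RSTART + 9, RLENGTH - 10)   # text between "[Source: " and "]"
            count[$1 " " $2 " " substr($3, 1, 2) ":00 " ip]++
        }
        END { for (k in count) if (count[k] > limit) print count[k], k }
    ' | sort -rn
}
```

The default limit mirrors the 10-per-hour figure above; a lower limit is useful when checking for slow, distributed attempts.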
Config changes outside maintenance windows: Filter for config
change messages (SYS-5-CONFIG_I on Cisco, UI_COMMIT on JunOS) and check
timestamps against the approved schedule. Changes outside the window
require attribution: who changed what, and whether it was authorized.
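A minimal sketch of the window check, assuming a single daily maintenance window given as start and stop hours in the same timezone as the logs. The function name changes_outside_window is hypothetical, and windows that cross midnight are not handled.

```shell
# Hypothetical helper. Usage: changes_outside_window START_HOUR STOP_HOUR < network.log
# Prints config-change messages whose hour falls outside [START_HOUR, STOP_HOUR).
changes_outside_window() {
    awk -v start="$1" -v stop="$2" '
        /SYS-5-CONFIG_I|UI_COMMIT/ {
            hour = substr($3, 1, 2) + 0          # HH from the RFC 3164 time field
            if (hour < start || hour >= stop) print
        }'
}
```

For a 02:00 to 04:00 window, changes_outside_window 2 4 leaves only the changes that need attribution.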
Login source IP anomalies: Extract management session source IPs
and compare against the authorized management subnet list. IPs outside
known ranges may indicate unauthorized access attempts and need follow-up.
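The subnet comparison can be done in plain awk by converting addresses to integers. This is a sketch only: it expects one IPv4 address per line on standard input (already extracted from the logs), takes the authorized ranges as a comma-separated CIDR list, and the function name unauthorized_sources is hypothetical.

```shell
# Hypothetical helper. Usage: unauthorized_sources "10.0.0.0/24,192.168.1.0/28" < source-ips.txt
# Prints each distinct source IP that falls outside every listed CIDR range.
unauthorized_sources() {
    awk -v cidrs="$1" '
        function ip2int(ip,    o) {
            split(ip, o, ".")
            return ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
        }
        function allowed(ip,    i, c, size) {
            for (i = 1; i <= n; i++) {
                split(list[i], c, "/")
                size = 2 ^ (32 - c[2])           # number of addresses in the range
                if (int(ip2int(ip) / size) == int(ip2int(c[1]) / size)) return 1
            }
            return 0
        }
        BEGIN { n = split(cidrs, list, ",") }
        !allowed($1) { print $1 }
    ' | sort -u
}
```

Two addresses are in the same range when dividing both by the range size gives the same whole number, which avoids the need for bitwise operators.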
Step 5: Timeline Reconstruction
Assemble a definitive chronological event sequence from all evidence
gathered in Steps 1–4. The timeline is the primary deliverable of
forensic log analysis.
Chronological assembly — Merge relevant events from multiple
log sources into a single sorted output:
CODEBLOCK3
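Use sort -s -k1,3 (stable sort) to preserve the original order of
events with identical timestamps. Note that -k1,3 compares month names
alphabetically, so for a window that spans a month boundary use
-k1,1M -k2,2n -k3,3 instead.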
NTP-aware timestamp normalization — If devices log in different
timezones or formats, normalize all timestamps to UTC epoch seconds
before sorting. Convert RFC 3164 timestamps with awk piped to date
(see references/cli-reference.md for conversion one-liners). Apply
the NTP offset correction from Step 1 to events from devices with
known clock drift. For BSD/macOS, use date -jf instead of date -d.
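A sketch of the normalization for RFC 3164 lines using GNU date. It assumes the device logs in UTC, that the year is supplied by the analyst (RFC 3164 omits it), and that the offset is the number of seconds the device clock runs ahead. The function name normalize_epoch is hypothetical.

```shell
# Hypothetical helper. Usage: normalize_epoch YEAR [OFFSET_SECONDS] < rfc3164.log
# Replaces "Mmm dd HH:MM:SS" with UTC epoch seconds, minus the device clock offset.
# Requires GNU date (-d); lines whose timestamp cannot be parsed are skipped.
normalize_epoch() {
    year="$1"; offset="${2:-0}"
    while read -r mon day time rest; do
        epoch=$(date -u -d "$day $mon $year $time" +%s) || continue
        printf '%s %s\n' "$((epoch - offset))" "$rest"
    done
}
```

The output can be merged across devices with sort -s -n -k1,1. Spawning date once per line is slow, so this is best applied after the log has been narrowed to the investigation window.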
Event-to-impact mapping — For each significant event in the
timeline, annotate the user-visible impact:
1. Identify the event (e.g., OSPF-5-ADJCHG neighbor down)
2. Determine the network impact (loss of redundant path)
3. Map to the user symptom (degraded connectivity or failover latency)
Root cause ordering — Walk the timeline backward from the
user-reported symptom to the earliest causal event. The root cause
is the first event that, if prevented, would have prevented all
downstream effects. Document the causal chain with event references
for each link.
Step 6: Report
Compile all findings into a structured deliverable. Present the event
timeline from Step 5 as the central artifact — annotate each entry
with its classification (root cause, contributing factor, symptom,
or informational). Summarize anomaly findings from Step 4 with counts
and severity assessments. Document correlation chains from Step 3
with supporting evidence. State the root cause assessment with
confidence level and the supporting evidence chain. Include an
integrity section listing evidence gaps that limit conclusions.
Threshold Tables
Log Volume Anomaly Thresholds
| Metric | Normal | Warning (>1.5×) | Alert (>2×) | Critical (>3×) |
|---|---|---|---|---|
| Messages per hour (per device) | Baseline ± 50% | 1.5–2× baseline | 2–3× baseline | >3× baseline |
| Unique mnemonics per day | Baseline count | 1–3 new mnemonics | 4–10 new mnemonics | >10 new mnemonics |
| Auth failure events (per source IP) | ≤3/hour | 4–10/hour | 11–50/hour | >50/hour |
| Config change events (per device) | ≤2/day during windows | Any outside window | 3+ outside window | >5 outside window |
| SNMP trap rate (per device) | ≤5/hour | 6–20/hour | 21–100/hour | >100/hour |
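The messages-per-hour row can be applied mechanically, as in this sketch. The function name classify_volume is hypothetical, and only the upper multiples from that row are encoded.

```shell
# Hypothetical helper. Usage: classify_volume CURRENT_COUNT BASELINE_COUNT
# Prints the ratio to baseline and the matching level from the messages-per-hour row.
classify_volume() {
    awk -v cur="$1" -v base="$2" 'BEGIN {
        if (base + 0 == 0) { print "no baseline"; exit 1 }
        r = cur / base
        if (r > 3)        s = "Critical"
        else if (r > 2)   s = "Alert"
        else if (r > 1.5) s = "Warning"
        else              s = "Normal"
        printf "%.2fx %s\n", r, s
    }'
}
```

For example, a device at 450 messages against a baseline of 200 is 2.25 times baseline, which lands in the Alert band.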
Syslog Severity Response Matrix
| Severity | RFC 5424 Level | Investigation Action |
|---|---|---|
| 0 (Emergency) | System unusable | Immediate investigation, all-hands |
| 1 (Alert) | Immediate action needed | Priority investigation within 15 minutes |
| 2 (Critical) | Critical conditions | Investigation within 1 hour |
| 3 (Error) | Error conditions | Investigation within 4 hours |
| 4 (Warning) | Warning conditions | Review within 24 hours |
| 5 (Notice) | Normal but significant | Log for trending, review weekly |
| 6 (Info) | Informational | Baseline data, no action |
| 7 (Debug) | Debug-level | Exclude from standard analysis |
Correlation Confidence Levels
| Confidence | Criteria | Action |
|---|---|---|
| Confirmed | 3+ events across 2+ devices with matching attributes and <60s window | Treat as established fact in report |
| Probable | 2 correlated events or single-device chain with supporting evidence | Include in report with qualification |
| Possible | Single event or loose time correlation (>5 min window) | Note as hypothesis, do not assert as finding |
Decision Trees
Investigation Entry Point
CODEBLOCK4
Anomaly Classification
CODEBLOCK5
Report Template
CODEBLOCK6
Troubleshooting
Missing Device Logs on Collector
Symptom: Expected device logs are absent from syslog collector files
despite the device being configured to forward syslog.
Diagnosis: Verify syslog configuration on the device (see Step 1
commands). Check the network path: firewall rules may block UDP 514 or
TCP 514 between the device and collector. On the collector, check
rsyslog/syslog-ng for dropped messages: rsyslogd logs input errors
to its own syslog facility. Verify the collector is listening on the
expected port with ss -ulnp | grep 514 for UDP or ss -tlnp | grep 514
for TCP.
Resolution: Fix the forwarding path (device config, network ACLs,
collector listener). Generate a test message from the device and confirm
receipt. Document the gap period in the investigation report.
Timestamp Format Inconsistencies
Symptom: Merged log files produce an unsortable timeline because
timestamp formats differ (RFC 3164 vs RFC 5424 vs device-specific).
Diagnosis: Inspect the first 10 lines of each source. RFC 3164
uses Mmm dd HH:MM:SS (no year); RFC 5424 uses ISO 8601 with timezone.
Resolution: Write an awk normalizer for each format (see Step 5
and references/cli-reference.md). Add the year to RFC 3164 timestamps
based on file modification date or logrotate naming convention.
Log Rotation Destroyed Evidence
Symptom: Investigation period extends beyond the oldest available
log file.
Diagnosis: Check /etc/logrotate.d/ for retention and compression
settings. Look for compressed archives (.gz, .bz2, .xz).
Resolution: Search within compressed files using zgrep or
zcat | grep for .gz archives, and bzgrep or xzgrep for .bz2 and .xz
archives. If data is permanently lost, document the gap and state
which conclusions are limited by missing evidence.
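Searching a mix of live and rotated files can be wrapped in one helper that picks the matching grep variant by file extension. The function name search_rotated is hypothetical; zgrep, bzgrep, and xzgrep ship with gzip, bzip2, and xz respectively, so they are only present when those packages are installed.

```shell
# Hypothetical helper. Usage: search_rotated PATTERN FILE...
# Runs the grep variant that matches each file's compression suffix.
search_rotated() {
    pattern="$1"; shift
    for f in "$@"; do
        case "$f" in
            *.gz)  zgrep  -h -e "$pattern" "$f" ;;
            *.bz2) bzgrep -h -e "$pattern" "$f" ;;
            *.xz)  xzgrep -h -e "$pattern" "$f" ;;
            *)     grep   -h -e "$pattern" "$f" ;;
        esac
    done
}
```

Listing the files oldest first, for example search_rotated 'LINK-3-UPDOWN' /var/log/network.log.2.gz /var/log/network.log.1 /var/log/network.log, keeps the output in rough chronological order.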
High Log Volume Makes grep Impractical
Symptom: Multi-GB log files make interactive grep analysis
impractically slow.
Resolution: Narrow to the investigation window with a date-based
grep first, redirect to a working file, then apply detailed analysis
to the smaller extract. Use LC_ALL=C grep for faster processing.
Consider GNU parallel for multi-file analysis.