Regex Patterns
Practical regular expression cookbook. Patterns for validation, parsing, extraction, and refactoring across JavaScript, Python, Go, and command-line tools.
When to Use
- Validating user input (email, URL, IP, phone, dates)
- Parsing log lines or structured text
- Extracting data from strings (IDs, numbers, tokens)
- Search-and-replace in code (rename variables, update imports)
- Filtering lines in files or command output
- Debugging regexes that don't match as expected
Quick Reference
Metacharacters
| Pattern | Matches | Example |
|---|---|---|
| `.` | Any character (except newline) | `a.c` matches `abc`, `a1c` |
| `\d` | Digit `[0-9]` | `\d{3}` matches `123` |
| `\w` | Word char `[a-zA-Z0-9_]` | `\w+` matches `hello_123` |
| `\s` | Whitespace `[ \t\n\r\f]` | `\s+` matches spaces/tabs |
| `\b` | Word boundary | `\bcat\b` matches `cat` not `scatter` |
| `^` | Start of line | `^Error` matches line starting with `Error` |
| `$` | End of line | `\.js$` matches line ending with `.js` |
| `\D`, `\W`, `\S` | Negated: non-digit, non-word, non-space | |
Quantifiers
| Pattern | Meaning |
|---|---|
| `*` | 0 or more (greedy) |
| `+` | 1 or more (greedy) |
| `?` | 0 or 1 (optional) |
| `{3}` | Exactly 3 |
| `{2,5}` | Between 2 and 5 |
| `{3,}` | 3 or more |
| `*?`, `+?` | Lazy (match as few as possible) |
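The difference between greedy and lazy quantifiers is easiest to see on a string with two candidate end points. A minimal Python sketch; the sample string and variable names are illustrative assumptions:

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* runs to the LAST "</b>", so the whole span is one match.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy: .*? stops at the FIRST "</b>" it can reach, one match per tag pair.
lazy = re.findall(r"<b>.*?</b>", html)

print(greedy)  # ['<b>one</b> and <b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```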
Groups and Alternation
| Pattern | Meaning |
|---|---|
| `(abc)` | Capture group |
| `(?:abc)` | Non-capturing group |
| `(?P<name>abc)` | Named group (Python) |
| `(?<name>abc)` | Named group (JS/Go) |
| `a\|b` | Alternation (a or b) |
| `[abc]` | Character class (a, b, or c) |
| `[^abc]` | Negated class (not a, b, or c) |
| `[a-z]` | Range |
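As a quick illustration of named and non-capturing groups, here is a small Python sketch. The `key=value` input format is a made-up example, not something this cookbook defines:

```python
import re

# (?P<name>...) captures under a name; (?:...) groups without capturing.
pattern = re.compile(r"(?P<key>[a-z]+)=(?P<value>\d+)(?:;|$)")

m = pattern.search("retries=3;timeout=30")
print(m.group("key"), m.group("value"))  # retries 3
print(m.groupdict())                     # {'key': 'retries', 'value': '3'}

# findall returns one tuple per match, containing only the capturing groups.
pairs = pattern.findall("retries=3;timeout=30")
print(pairs)  # [('retries', '3'), ('timeout', '30')]
```

Because the trailing `(?:;|$)` does not capture, it never shows up in `groupdict()` or in the `findall` tuples.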
Lookahead and Lookbehind
| Pattern | Meaning |
|---|---|
| `(?=abc)` | Positive lookahead (followed by abc) |
| `(?!abc)` | Negative lookahead (not followed by abc) |
| `(?<=abc)` | Positive lookbehind (preceded by abc) |
| `(?<!abc)` | Negative lookbehind (not preceded by abc) |
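Lookarounds assert context without consuming it, so the surrounding characters never appear in the match. A short Python sketch of all four forms; the sample text is an illustrative assumption:

```python
import re

text = "price: $40, qty: 3, discount: 15%, total: $120"

# Positive lookbehind: digits preceded by "$" (the "$" is not in the match).
dollars = re.findall(r"(?<=\$)\d+", text)

# Positive lookahead: digits followed by "%".
percents = re.findall(r"\d+(?=%)", text)

# Negative lookbehind: digit runs not preceded by "$" (or by another digit,
# which stops the engine from matching the tail of a "$" amount).
not_dollars = re.findall(r"(?<![$\d])\d+", text)

# Negative lookahead: whole numbers not followed by "%". The \b anchors stop
# the engine from backtracking to a partial number such as the "1" in "15".
not_percent = re.findall(r"\b\d+\b(?!%)", text)

print(dollars, percents, not_dollars, not_percent)
```

Go's `regexp` package (RE2) rejects all four forms, so this sketch only applies to engines with lookaround support.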
Validation Patterns
Email
CODEBLOCK0
URL
CODEBLOCK1
IP Addresses
CODEBLOCK2
Phone Numbers
CODEBLOCK3
Dates and Times
CODEBLOCK4
Passwords (Strength Check)
CODEBLOCK5
UUIDs
CODEBLOCK6
Semantic Version
CODEBLOCK7
Parsing Patterns
Log Lines
CODEBLOCK8
Code Patterns
CODEBLOCK9
Data Extraction
CODEBLOCK10
Language-Specific Usage
JavaScript
CODEBLOCK11
Python
CODEBLOCK12
Go
CODEBLOCK13
Command Line (grep/sed)
CODEBLOCK14
Search-and-Replace Patterns
Code Refactoring
CODEBLOCK15
Text Cleanup
CODEBLOCK16
Common Gotchas
Greedy vs lazy matching
CODEBLOCK17
Escaping special characters
CODEBLOCK18
Newlines and multiline
CODEBLOCK19
Backtracking and performance
CODEBLOCK20
Tips
- Start simple and add complexity. `\d+` is almost always enough; you rarely need `[0-9]+`.
- Test your regex on real data, not just the happy path. Edge cases (empty strings, special characters, Unicode) break naive patterns.
- Use non-capturing groups `(?:...)` when you don't need the captured value. It's slightly faster and cleaner.
- In JavaScript, always use the `g` flag for `matchAll` and global replace. Without it, `matchAll` throws a `TypeError` and `replace` only replaces the first match.
- Go's `regexp` package uses RE2 (no lookahead/lookbehind). If you need those, use a different approach or the third-party `regexp2` package.
- `grep -P` (PCRE) is the most powerful command-line regex. Use it over `grep -E` when you need lookahead, `\d`, or `\b`.
- For complex patterns, use verbose mode (`re.VERBOSE` in Python, `/x` in Perl) with comments explaining each part.
- Regex is the wrong tool for parsing HTML, XML, or JSON. Use a proper parser. Regex works for extracting simple values from these formats, not for structural parsing.
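The verbose-mode tip can be sketched in Python as follows. The pattern is a 24-hour time matcher chosen for illustration; the group names are assumptions:

```python
import re

# re.VERBOSE ignores unescaped whitespace in the pattern and allows # comments.
TIME_24H = re.compile(
    r"""
    (?P<hour>[01]\d|2[0-3])   # hours 00-23
    :
    (?P<minute>[0-5]\d)       # minutes 00-59
    :
    (?P<second>[0-5]\d)       # seconds 00-59
    """,
    re.VERBOSE,
)

m = TIME_24H.fullmatch("23:59:07")
print(m.groupdict())  # {'hour': '23', 'minute': '59', 'second': '07'}
print(TIME_24H.fullmatch("24:00:00"))  # None
```

To match a literal space or `#` in verbose mode, escape it or put it inside a character class.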
Validation Patterns
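Email
```
# Basic (covers 99% of real emails)
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

# Stricter (no consecutive dots, no leading/trailing dots in local part)
^[a-zA-Z0-9](?:\.?[a-zA-Z0-9_%+-])*@[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z]{2,})+$
```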
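A minimal Python sketch of how the basic email pattern might be applied. The helper name `looks_like_email` is hypothetical, and the check is purely syntactic; it says nothing about whether the mailbox exists:

```python
import re

# The basic pattern from this section, compiled once.
BASIC_EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(value: str) -> bool:
    """Return True if value passes the basic syntactic check."""
    # fullmatch (rather than match) also rejects a trailing newline,
    # which "$" alone would tolerate in Python.
    return BASIC_EMAIL.fullmatch(value) is not None

print(looks_like_email("first.last+tag@sub.example.co.uk"))  # True
print(looks_like_email("user@localhost"))                    # False
print(looks_like_email("a..b@example.com"))                  # True
```

The last call shows the limit of the basic form: it accepts consecutive dots in the local part, which is the case the stricter variant is meant to reject.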
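URL
```
# HTTP/HTTPS URL
https?://[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?)*(/[^\s]*)?

# With optional port and query parameters
https?://[^\s/]+(/[^\s?]*)?(\?[^\s#]*)?(#[^\s]*)?
```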
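As a sketch of extracting a URL from free text, the following Python snippet uses a pattern with optional path, query, and fragment groups, then hands the result to `urllib.parse.urlsplit` for structured access. The sample text is an illustrative assumption:

```python
import re
from urllib.parse import urlsplit

# Host (with optional port), then optional path, query, and fragment groups.
URL = re.compile(r"https?://[^\s/]+(/[^\s?]*)?(\?[^\s#]*)?(#[^\s]*)?")

text = "see https://example.com:8080/docs/a?x=1&y=2#top for details"
m = URL.search(text)
url = m.group(0)
print(url)         # https://example.com:8080/docs/a?x=1&y=2#top
print(m.groups())  # ('/docs/a', '?x=1&y=2', '#top')

# For anything beyond extraction, a real URL parser is more reliable.
parts = urlsplit(url)
print(parts.hostname, parts.port, parts.query)  # example.com 8080 x=1&y=2
```

The regex locates the URL; the parser is what splits it reliably.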
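IP Addresses
```
# IPv4
\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b

# IPv4 (simple, allows invalid addresses such as 999.999.999.999)
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

# IPv6 (simplified: full eight-group form only, no :: compression)
(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}
```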
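The trade-off between the strict and simple IPv4 patterns can be checked with a short Python sketch. The names `OCTET`, `STRICT_IPV4`, `SIMPLE_IPV4`, and `is_valid_ip` are illustrative; the `ipaddress` module is from the standard library and is often a better fit when you need to validate a single value rather than find addresses in text:

```python
import ipaddress
import re

OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)"  # 0-255
STRICT_IPV4 = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")
SIMPLE_IPV4 = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")

line = "from 192.168.1.10 to 999.999.999.999"
print(STRICT_IPV4.findall(line))  # ['192.168.1.10']
print(SIMPLE_IPV4.findall(line))  # ['192.168.1.10', '999.999.999.999']

def is_valid_ip(value: str) -> bool:
    """Validate one IPv4 or IPv6 address with the standard library."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False

print(is_valid_ip("::1"))  # True
```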
Phone Numbers
```
# US phone (multiple formats)
(?:\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
# Matches: +1 (555) 123-4567, 555.123.4567, 5551234567

# International (E.164)
\+[1-9]\d{6,14}
```
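One common use of the US pattern is normalizing the accepted formats to a single form. A Python sketch, assuming capture groups are added around the three digit runs (that change and the helper name `normalize_us_phone` are not part of the pattern above):

```python
import re
from typing import Optional

# The US pattern with capture groups added around the digit runs.
US_PHONE = re.compile(
    r"(?:\+1[-.\s]?)?\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})"
)

def normalize_us_phone(raw: str) -> Optional[str]:
    """Return the number in +1XXXXXXXXXX form, or None if it does not match."""
    m = US_PHONE.fullmatch(raw.strip())
    if m is None:
        return None
    return "+1" + "".join(m.groups())

for sample in ["+1 (555) 123-4567", "555.123.4567", "5551234567"]:
    print(normalize_us_phone(sample))  # +15551234567 each time
```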
Dates and Times
```
# ISO 8601 date
\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])

# ISO 8601 datetime
\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})

# US date (MM/DD/YYYY)
(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/\d{4}

# Time (HH:MM:SS, 24-hour)
(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d
```
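The date patterns check shape and ranges, not calendar validity: the ISO 8601 date pattern accepts `2023-02-31`. A Python sketch that pairs the regex with `datetime.date` to close that gap; capture groups are added for illustration and `parse_iso_date` is a hypothetical helper:

```python
import re
from datetime import date
from typing import Optional

# ISO 8601 date pattern with capture groups added around each field.
ISO_DATE = re.compile(r"(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

def parse_iso_date(text: str) -> Optional[date]:
    """Return a date if text is a well-formed, real calendar date, else None."""
    m = ISO_DATE.fullmatch(text)
    if m is None:
        return None
    year, month, day = map(int, m.groups())
    try:
        return date(year, month, day)  # rejects Feb 31, year 0, and so on
    except ValueError:
        return None

print(parse_iso_date("2024-02-29"))  # 2024-02-29
print(parse_iso_date("2023-02-31"))  # None
```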
Passwords (Strength Check)
```
# At least 8 characters, 1 uppercase, 1 lowercase, 1 digit, 1 special character
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*()_+=-]).{8,}$
```
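A single lookahead-based pattern answers yes or no, but not which rule failed. The Python sketch below shows both: the combined check, and a per-rule variant that is often easier to surface to users. The names `STRONG`, `RULES`, and `missing_rules` are illustrative assumptions:

```python
import re

# One lookahead per required character class, then the length check.
STRONG = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*()_+=-]).{8,}$"
)

# The same rules split out so a caller can report what is missing.
RULES = {
    "lowercase": r"[a-z]",
    "uppercase": r"[A-Z]",
    "digit": r"\d",
    "special": r"[!@#$%^&*()_+=-]",
}

def missing_rules(password: str) -> list:
    """Return the names of the rules the password does not satisfy."""
    missing = [name for name, pat in RULES.items() if not re.search(pat, password)]
    if len(password) < 8:
        missing.append("length")
    return missing

print(bool(STRONG.match("Passw0rd!")))  # True
print(missing_rules("password"))        # ['uppercase', 'digit', 'special']
```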
UUIDs
```
[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}
```
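A small Python sketch of pulling UUIDs out of free text with this pattern and converting them with the standard `uuid` module, which also normalizes case. The helper name `find_uuids` and the sample text are assumptions:

```python
import re
import uuid

UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

def find_uuids(text: str) -> list:
    """Return every UUID-shaped token in text as a uuid.UUID object."""
    return [uuid.UUID(token) for token in UUID_RE.findall(text)]

found = find_uuids("req 123E4567-E89B-12D3-A456-426614174000 ok")
print([str(u) for u in found])  # ['123e4567-e89b-12d3-a456-426614174000']
```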
Semantic Version
```
\bv?(\d+)\.(\d+)\.(\d+)(?:-([\w.]+))?(?:\+([\w.]+))?\b
# Captures: major, minor, patch, prerelease, build
# Matches: 1.2.3, v1.0.0-beta.1, 2.0.0+build.123
```
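The five capture groups map naturally onto a tuple. A Python sketch, with `parse_semver` as a hypothetical helper name:

```python
import re

SEMVER = re.compile(r"\bv?(\d+)\.(\d+)\.(\d+)(?:-([\w.]+))?(?:\+([\w.]+))?\b")

def parse_semver(text: str):
    """Return (major, minor, patch, prerelease, build) or None if no version is found."""
    m = SEMVER.search(text)
    if m is None:
        return None
    major, minor, patch, prerelease, build = m.groups()
    # Optional groups that did not participate in the match come back as None.
    return (int(major), int(minor), int(patch), prerelease, build)

print(parse_semver("v1.0.0-beta.1"))    # (1, 0, 0, 'beta.1', None)
print(parse_semver("2.0.0+build.123"))  # (2, 0, 0, None, 'build.123')
```

Converting the numeric parts to `int` matters when comparing versions: as strings, `"10"` sorts before `"9"`.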
Parsing Patterns
Log Lines
```bash
# Apache/Nginx access log
# Format: IP - - [date] "METHOD /path HTTP/x.x" status size
grep -oP '(\S+) - - \[([^\]]+)\] "(\w+) (\S+) \S+" (\d+) (\d+)' access.log

# Extract IP and status code
grep -oP '^\S+|"\s\K\d{3}' access.log

# Syslog format
# Format: Mon DD HH:MM:SS hostname process[pid]: message
grep -oP '^\w+\s+\d+\s[\d:]+\s(\S+)\s(\S+)\[(\d+)\]:\s(.*)' syslog

# JSON logs: extract a field
grep -oP '"level"\s*:\s*"\K[^"]+' app.log
grep -oP '"message"\s*:\s*"\K[^"]+' app.log
```
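`grep -o` prints matches but cannot return the individual fields. When the fields are needed in a program, the same access-log shape can be parsed with named groups. A Python sketch; the sample log line and the group names are made up for illustration:

```python
import re

# Same shape as the access-log pattern, with each field named.
ACCESS_LINE = re.compile(
    r'(?P<ip>\S+) - - \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<path>\S+) \S+" '
    r'(?P<status>\d+) (?P<size>\d+)'
)

# Hypothetical log line, for illustration only.
line = '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'

m = ACCESS_LINE.match(line)
fields = m.groupdict()
print(fields["ip"], fields["method"], fields["path"], int(fields["status"]))
# 203.0.113.7 GET /index.html 200
```

For JSON logs, parsing each line with `json.loads` is more robust than a regex once you need more than one field.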
Code Patterns
```bash
# Find function definitions (JavaScript/TypeScript)
grep -nP '(?:function\s+\w+|(?:const|let|var)\s+\w+\s*=\s*(?:async\s*)?\([^)]*\)\s*=>|(?:async\s+)?function\s*\()' src/*.ts

# Find class definitions
grep -nP 'class\s+\w+(?:\s+extends\s+\w+)?' src/*.ts

# Find import statements
grep -nP '^import\s+.*\s+from\s+' src
```