Regex (Regular Expressions)

Write correct, efficient regular expressions across different engines.

Author: admin | Source: ClawHub
Version: 1.0.0 | Security check: passed | Downloads: 1,164 | Price: free | Favorites: 2

Regex

Greedy vs Lazy

- .* is greedy: it matches as much as possible. .*? is lazy: it matches as little as possible.
- Greedy often overshoots: <.*> on <a>b</a> matches the entire string, not just <a>.
- The default quantifiers + * {n,} are greedy; append ? to make them lazy: +? *? {n,}?
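
The difference is easiest to see by running both forms against the same input. A minimal sketch using Python's re module (the variable names are illustrative only):

```python
import re

html = "<a>b</a>"

# Greedy: .* takes as much as it can, so the match runs to the last '>'.
greedy = re.search(r"<.*>", html).group()

# Lazy: .*? stops at the first '>' that lets the pattern succeed.
lazy = re.search(r"<.*?>", html).group()

print(greedy)  # <a>b</a>
print(lazy)    # <a>
```
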

Escaping

- Metacharacters need escaping: \. \* \+ \? \[ \] \( \) \{ \} \| \\ \^ \$
- Inside a character class []: only ], \, ^, and - need escaping (^ only at the start, - only in the middle).
- Literal backslash: \\ in the regex, but in a non-raw string literal that often becomes \\\\ (double escaping).
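
The double-escaping rule can be checked directly. The sketch below assumes Python, where raw strings (r"...") avoid the second layer of escaping; other languages have their own string-literal rules.

```python
import re

# A raw string keeps the regex readable: r"\." is a backslash followed by a dot.
dots = re.findall(r"\.", "a.b.c")

# A literal backslash is \\ in the regex; as a non-raw Python string that is "\\\\".
raw_form = r"\\"
plain_form = "\\\\"
same_pattern = raw_form == plain_form

path = "C:\\temp"  # the text C:\temp, which contains one backslash
backslashes = re.findall(raw_form, path)

# Inside a character class, . * + ? are literal and need no escaping.
symbols = re.findall(r"[.*+?]", "1+1=2? yes.*")

print(dots, same_pattern, len(backslashes), symbols)
```
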

Anchors

- ^ matches the start and $ the end, but their behavior changes with the multiline flag.
- Multiline mode: ^ and $ match at line starts and ends; without it, only at the start and end of the string.
- \A always matches the start of the string and \Z the end (not supported by all engines).
- Word boundary \b matches a position, not a character; use \bword\b for whole words.
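
A short sketch of how the anchors behave, assuming Python's re module, where the multiline flag is re.M:

```python
import re

text = "first line\nsecond line"

# Without the multiline flag, ^ matches only at the start of the string.
default_hits = re.findall(r"^\w+", text)

# With re.M, ^ also matches right after each newline.
multiline_hits = re.findall(r"^\w+", text, re.M)

# \A ignores the multiline flag and still means "start of string".
string_start_hits = re.findall(r"\A\w+", text, re.M)

# \b is a zero-width position, so "concat" does not count as the word "cat".
whole_words = re.findall(r"\bcat\b", "cat concat cat's")

print(default_hits, multiline_hits, string_start_hits, whole_words)
```
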

Character Classes

- [abc] matches one of a, b, c; [^abc] matches any character except a, b, c.
- Ranges: [a-z] [0-9]. [a-Z] is invalid because ranges follow ASCII order and a comes after Z.
- Shorthand: \d digit, \w word character, \s whitespace; uppercase negates: \D \W \S
- . matches any character except newline; use [\s\S] for truly any character, or the s flag if available.
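
The points above can be verified with a few one-liners. This is a sketch in Python; the error raised for an out-of-order range is specific to Python's re module, and other engines report it differently.

```python
import re

digits = re.findall(r"\d+", "order 66, aisle 7")

# Negated class: everything that is neither a vowel nor whitespace.
consonants = re.findall(r"[^aeiou\s]", "a b c e d")

text = "a\nb"
dot_default = re.search(r"a.b", text)       # . does not match the newline
any_char = re.search(r"a[\s\S]b", text)     # [\s\S] matches any character
dotall = re.search(r"a.b", text, re.S)      # the s (dotall) flag

# An out-of-order range is rejected when the pattern is compiled.
try:
    re.compile(r"[a-Z]")
    bad_range_rejected = False
except re.error:
    bad_range_rejected = True

print(digits, consonants, dot_default, bool(any_char), bool(dotall), bad_range_rejected)
```
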

Groups

- Capturing () vs non-capturing (?:): use (?:) when you don't need a backreference.
- Named groups: (?<name>...) or (?P<name>...), depending on the engine.
- Backreferences: \1 \2 refer to captured groups in the same pattern.
- Groups also set the scope of alternation: cat|dog vs ca(t|d)og
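
A small sketch of each group feature, assuming Python's re module (which uses the (?P<name>...) spelling); the sample strings are made up for illustration.

```python
import re

# Named groups.
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2026-04")
year, month = date.group("year"), date.group("month")

# Backreference: \1 must repeat exactly what group 1 captured.
repeated = re.search(r"\b(\w+) \1\b", "it was was fine").group(1)

# findall returns group contents when the pattern has a capturing group,
# and the whole match when the group is non-capturing.
capturing = re.findall(r"(ab)+", "ababab")
non_capturing = re.findall(r"(?:ab)+", "ababab")

# Alternation scope: without a group, | splits the whole pattern.
whole_pattern = re.fullmatch(r"cat|dog", "dog")
scoped_hit = re.fullmatch(r"ca(?:t|d)og", "cadog")
scoped_miss = re.fullmatch(r"ca(?:t|d)og", "dog")

print(year, month, repeated, capturing, non_capturing)
```
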

Lookahead & Lookbehind

- Positive lookahead (?=...): assert what follows, without consuming it.
- Negative lookahead (?!...): assert what doesn't follow.
- Positive lookbehind (?<=...): assert what precedes.
- Negative lookbehind (?<!...): assert what doesn't precede.
- Lookbehinds must be fixed-width in most engines: no * or + inside.
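
The four assertions can be compared on one sample string. A sketch in Python, whose re module is one of the engines that rejects variable-width lookbehind; the sample text is invented for the example.

```python
import re

prices = "cost: $40, tax: $3, qty 12"

# Positive lookahead: words that are followed by a colon (the colon is not consumed).
labels = re.findall(r"\w+(?=:)", prices)

# Negative lookahead: whole words that are not followed by a colon.
non_labels = re.findall(r"\b\w+\b(?!:)", prices)

# Positive lookbehind: digits preceded by a dollar sign.
dollars = re.findall(r"(?<=\$)\d+", prices)

# Negative lookbehind: digits preceded by neither '$' nor another digit.
plain_numbers = re.findall(r"(?<![\$\d])\d+", prices)

# Python's re refuses a lookbehind whose width is not fixed.
try:
    re.compile(r"(?<=a+)b")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(labels, non_labels, dollars, plain_numbers, variable_width_ok)
```
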

Flags

- i case-insensitive, m multiline (^ and $ match at line boundaries), g global (find all matches)
- s (dotall): . also matches newline; not supported everywhere.
- u unicode: enables \p{} properties and proper surrogate handling.
- Flag syntax varies: /pattern/flags (JS), (?flags) inline, or a function argument (Python re.I).
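
In Python, two of the three flag syntaxes are available: the function argument and the inline form. A brief sketch (Python has no g flag; re.findall plays that role here):

```python
import re

# Flag passed as a function argument.
by_argument = re.findall(r"hello", "Hello HELLO", re.I)

# The same flag written inline at the start of the pattern.
inline = re.findall(r"(?i)hello", "Hello HELLO")

# Flags combine with |: multiline anchors plus dotall.
combined = re.findall(r"^b.c$", "a\nb\nc", re.M | re.S)

print(by_argument, inline, combined)
```
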

Engine Differences

- JavaScript: no lookbehind before ES2018; no \A \Z; no possessive quantifiers.
- Python re: uses (?P<name>...) for named groups; no \p{} without the third-party regex module.
- PCRE (PHP, grep -P): full feature set; possessive ++ *+; recursive patterns.
- Go: RE2-style engine with no backreferences and no lookahead or lookbehind; guaranteed linear time.
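
One way to cope with engine differences is to probe for a feature instead of assuming it. The sketch below does this for the Python re entry above; supports_unicode_properties is a name invented for the example, and its result depends on the interpreter version in use.

```python
import re

# Named groups in Python's re use the (?P<name>...) spelling.
match = re.search(r"(?P<word>\w+)", "hello world")
word = match.group("word")

# Probe for \p{...} support rather than assuming it: the stdlib re module
# on the CPython versions assumed here rejects it at compile time.
try:
    re.compile(r"\p{L}+")
    supports_unicode_properties = True
except re.error:
    supports_unicode_properties = False

print(word, supports_unicode_properties)
```
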

Performance

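- Catastrophic backtracking: (a+)+$ against aaaaaaaaaab takes exponential time, because the match fails only after every way of splitting the run of a's has been tried; avoid nested quantifiers.
- Possessive quantifiers ++ *+ prevent backtracking; use them when backtracking is pointless.
- Atomic groups (?>...) don't give back characters once matched, similar to possessive quantifiers.
- Anchor patterns when possible: ^prefix is attempted only at the start of the input (O(1) start positions), while an unanchored prefix may be attempted at every position (O(n)).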
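
The backtracking blow-up can be observed on small inputs. This is a rough sketch assuming a backtracking engine such as Python's re; time_failed_match is a helper written for the example, and the timings depend on the machine, so no specific numbers are claimed.

```python
import re
import time

nested = re.compile(r"(a+)+$")  # nested quantifier: many ways to split a run of a's
flat = re.compile(r"a+$")       # accepts the same strings with a single quantifier


def time_failed_match(pattern, n):
    """Match against n a's followed by b, which forces the pattern to fail."""
    subject = "a" * n + "b"
    start = time.perf_counter()
    result = pattern.match(subject)
    return result, time.perf_counter() - start


# Input sizes are kept small on purpose; the nested form slows down quickly as n grows.
for n in (12, 15, 18):
    _, nested_seconds = time_failed_match(nested, n)
    _, flat_seconds = time_failed_match(flat, n)
    print(f"n={n}: nested {nested_seconds:.4f}s, flat {flat_seconds:.6f}s")
```

Rewriting the pattern so that each character can be matched in only one way, as flat does here, removes the problem without needing possessive quantifiers or atomic groups.
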

Common Mistakes

- Email validation: an RFC-compliant regex runs to 6000+ characters; use a simple check or a library.
- URL matching: the edge cases are endless; use a URL parser, and regex only for quick extraction.
- Don't use regex for HTML/XML; use a parser, because regex can't handle nesting.
- Forgetting to escape user input: regex injection is real; use the engine's literal-escaping function.
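
A sketch of the safer alternatives in Python, using re.escape for user input and urllib.parse for URLs. The simple_email pattern is a deliberately loose illustration, not an RFC-compliant validator, and the sample strings are made up.

```python
import re
from urllib.parse import urlparse

haystack = "proof: 1+1=2 (really?) yes"
user_input = "1+1=2 (really?)"

# Unescaped, the + ( ) ? in the input are treated as metacharacters.
naive = re.search(user_input, haystack)

# re.escape turns the input into a pattern that matches it literally.
safe = re.search(re.escape(user_input), haystack)

# Loose email sanity check: something@something.something, no spaces.
simple_email = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
looks_valid = simple_email.fullmatch("user@example.com") is not None
looks_invalid = simple_email.fullmatch("not an email") is None

# Let a URL parser handle structure instead of a regex.
parts = urlparse("https://example.com:8080/path?q=1")

print(naive, safe.group(), looks_valid, looks_invalid, parts.hostname, parts.port)
```
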

Testing

- Test edge cases: empty string, special characters, Unicode, very long input.
- Visualize with tools: regex101.com shows matches and explains the pattern.
- Check which engine's documentation you're reading; features vary significantly.

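Install via chat

This skill can be installed via chat on the following platforms:

OpenClaw, WorkBuddy, QClaw, Kimi, Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the regex-1776327798 skill

Option 2: Set SkillHub as the preferred skill source

Set SkillHub as my preferred skill installation source, then help me install the regex-1776327798 skill

Install from the command line

skillhub install regex-1776327798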

Download

Regex v1.0.0 (free)

File size: 2.39 KB | Published: 2026-4-17 15:03

v1.0.0 (latest) 2026-4-17 15:03

Initial release
