Regular Expressions

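A regular expression is a general-purpose pattern language for matching text. The core syntax is largely independent of the programming language: the constructs covered here behave almost identically in Python, JavaScript, Java, and Go. Engines do differ in some details (for example, Go's RE2-based regexp package does not support lookarounds or backreferences, and Python writes named groups as (?P<name>...)). Beyond those engine differences, two things vary in day-to-day use:

String escaping: in ordinary string literals (Java, C#, and others) every backslash must be doubled (\\d). Python's r"" raw strings, Go's backtick raw strings, C#'s @"" verbatim strings, and JavaScript's /.../ regex literals avoid this
API: each language has its own regex library functions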
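
To make the escaping point concrete, here is a minimal Python sketch. The sample string and the variable names are illustrative assumptions; the point is only that a raw string and a doubly escaped ordinary string yield the same pattern text.

```python
import re

# A raw string and a doubly escaped ordinary string hold the same pattern text.
raw_pattern = r"\d{3}"
escaped_pattern = "\\d{3}"
same_pattern = raw_pattern == escaped_pattern  # True

# Either form can be handed to the regex engine.
match = re.search(raw_pattern, "room 404")
found = match.group() if match else None
print(same_pattern, found)  # True 404
```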

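Online debugging tool

Regex101 lets you switch between language flavors, which makes it a convenient first stop for learning and debugging regular expressions.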

Core Syntax

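Ordinary Characters and Metacharacters

Ordinary characters match themselves, e.g. abc matches "abc"
Metacharacters that must be escaped with a backslash to match literally: . * + ? ^ $ \ | ( ) [ ] { }
. matches any single character except a newline (this is the default; a "dot-all" flag, where the engine offers one, lifts the restriction)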
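
The difference between an unescaped and an escaped metacharacter can be sketched in Python as follows. The sample text is an assumption chosen for illustration; `re.escape` is the standard-library helper for escaping a literal string.

```python
import re

text = "price: 3.14 or 3x14"

# An unescaped dot matches any single character, so both spellings match.
loose = re.findall(r"3.14", text)    # ['3.14', '3x14']

# An escaped dot matches only a literal ".".
strict = re.findall(r"3\.14", text)  # ['3.14']

# By default the dot does not match a newline; re.DOTALL changes that.
no_newline = re.search(r"a.b", "a\nb") is None                        # True
with_dotall = re.search(r"a.b", "a\nb", flags=re.DOTALL) is not None  # True

# re.escape() escapes the metacharacters of a literal string for us.
literal = re.escape("a+b(c)")
literal_hit = re.search(literal, "x = a+b(c)") is not None            # True
```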

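Character Classes

Syntax	Meaning
`[abc]`	Any one of a, b, or c
`[a-z]`	Any lowercase ASCII letter
`[^abc]`	Any character other than a, b, or c
`\d`	A digit; `[0-9]` in ASCII mode (some engines, such as Python 3 on str patterns, also match other Unicode digits by default)
`\D`	A non-digit
`\w`	A word character; `[a-zA-Z0-9_]` in ASCII mode
`\W`	A non-word character
`\s`	A whitespace character (space, tab, newline, etc.)
`\S`	A non-whitespace character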
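
A short Python sketch of the classes in this table, using an invented sample string. It also shows one engine-specific detail: in Python 3, `\d` on str patterns matches Unicode decimal digits unless the `re.ASCII` flag is set.

```python
import re

text = "Order A-17, qty 3, code x_9!"

digits = re.findall(r"\d", text)            # ['1', '7', '3', '9']
words = re.findall(r"\w+", text)            # ['Order', 'A', '17', 'qty', '3', 'code', 'x_9']
punctuation = re.findall(r"[^\w\s]", text)  # ['-', ',', ',', '!']

# U+0663 is ARABIC-INDIC DIGIT THREE: matched by \d only in Unicode mode.
unicode_digit = re.fullmatch(r"\d", "\u0663") is not None                # True
ascii_only = re.fullmatch(r"\d", "\u0663", flags=re.ASCII) is not None   # False
```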

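Quantifiers

Syntax	Meaning
`X?`	0 or 1 time
`X+`	1 or more times
`X*`	0 or more times
`X{n}`	Exactly n times
`X{n,}`	At least n times
`X{n,m}`	Between n and m times (inclusive)

Quantifiers are greedy by default: they match as much as possible. Appending ? to a quantifier makes it non-greedy (lazy), so it matches as little as possible, e.g. .*?.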
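
The greedy versus lazy distinction is easiest to see side by side. The following Python sketch uses an invented HTML-like string; the variable names are illustrative.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* runs to the last possible </b>, swallowing everything in between.
greedy = re.findall(r"<b>.*</b>", html)   # ['<b>one</b> and <b>two</b>']

# Lazy: .*? stops at the first </b> it can reach.
lazy = re.findall(r"<b>.*?</b>", html)    # ['<b>one</b>', '<b>two</b>']

# Bounded repetition: standalone numbers of two or three digits.
bounded = re.findall(r"\b\d{2,3}\b", "7 42 123 4567")  # ['42', '123']
```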

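Boundary Matching

Syntax	Meaning
`^`	Start of the string (start of each line in multiline mode)
`$`	End of the string (end of each line in multiline mode)
`\b`	Word boundary
`\B`	Non-word boundary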
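
As a sketch of how anchors behave, the Python snippet below contrasts `^` with and without `re.MULTILINE` and shows `\b` and `\B` on invented sample strings.

```python
import re

lines = "cat\ncatalog\nbobcat"

# Without MULTILINE, ^ anchors only at the very start of the string.
default_hits = re.findall(r"^cat", lines)                         # ['cat']

# With MULTILINE, ^ also anchors after every newline.
multiline_hits = re.findall(r"^cat", lines, flags=re.MULTILINE)   # ['cat', 'cat']

# \b requires a word boundary on each side, so only standalone "cat" matches.
whole_words = re.findall(r"\bcat\b", "cat catalog bobcat cat.")   # ['cat', 'cat']

# \B requires the absence of a boundary: "cat" strictly inside a longer word.
inner = re.findall(r"\Bcat\B", "cat concatenate")                 # ['cat']
```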

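Groups and Capturing

(abc): capturing group; the matched text is saved for later reference
(?:abc): non-capturing group; groups without saving the match
(?<name>abc): named capturing group (Python writes it (?P<name>abc), which Go also accepts)
\1, \k<name>: backreferences, which match the same text the referenced group captured (Python uses (?P=name) for named groups; Go's regexp package does not support backreferences)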
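
The following Python sketch walks through each kind of group. Note that it uses Python's own spelling for named groups, `(?P<name>...)` and `(?P=name)`; the sample strings are assumptions made for illustration.

```python
import re

# Capturing groups: each pair of parentheses saves its match.
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", "released 2024-03-15")
year, month, day = date.groups()          # ('2024', '03', '15')

# Non-capturing group: (?:ab)+ repeats "ab" but is not saved.
m = re.search(r"(?:ab)+(\d)", "ababab7")
only_digit = m.groups()                   # ('7',)

# Numbered backreference: \1 must repeat what group 1 captured.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")  # ['is', 'test']

# Named group plus named backreference: the closing tag must equal the opening tag.
tag = re.search(r"<(?P<tag>\w+)>.*?</(?P=tag)>", "<b>bold</b><i>it</i>")
tag_name = tag.group("tag")               # 'b'
tag_text = tag.group()                    # '<b>bold</b>'
```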

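Lookahead / Lookbehind

Syntax	Meaning
`(?=...)`	Positive lookahead: what follows must match the pattern
`(?!...)`	Negative lookahead: what follows must not match the pattern
`(?<=...)`	Positive lookbehind: what precedes must match the pattern
`(?<!...)`	Negative lookbehind: what precedes must not match the pattern

Lookarounds assert a condition without consuming any characters. Example: \d+(?=元) matches a run of digits immediately followed by 元 (yuan); the 元 itself is not part of the match. Go's regexp package does not support lookarounds.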
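
A minimal Python sketch of all four assertions, on invented sample strings. The negative forms include `\d` in the excluded set on purpose: without it, backtracking would let the engine match a shorter run of digits (for example the 1 in 15元).

```python
import re

prices = "apple 15元, pear 8元, stock 200 items"

# Positive lookahead: digits immediately followed by 元.
followed_by_yuan = re.findall(r"\d+(?=元)", prices)      # ['15', '8']

# Negative lookahead: digit runs NOT followed by 元 (or by another digit).
not_yuan = re.findall(r"\d+(?![\d元])", prices)          # ['200']

costs = "cost $30 and 40 units"

# Positive lookbehind: digits immediately preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", costs)          # ['30']

# Negative lookbehind: digit runs NOT preceded by "$" (or by another digit).
not_dollar = re.findall(r"(?<![\d$])\d+", costs)         # ['40']
```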

Usage by Language


In Python, write patterns as r"" raw strings so that backslashes need no double escaping:

import re

text = "Email: user@example.com, phone: 13812345678"

# Find the first match
m = re.search(r'\d{11}', text)
print(m.group())  # 13812345678

# Find all matches
emails = re.findall(r'[\w.]+@[\w.]+\.\w{2,}', text)  # ['user@example.com']

# Replace every digit with '#'
result = re.sub(r'\d', '#', text)

# Named groups (Python uses the (?P<name>...) syntax)
m = re.search(r'(?P<user>[\w.]+)@(?P<domain>[\w.]+)', text)
print(m.group('user'))    # user
print(m.group('domain'))  # example.com

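JavaScript supports regex literals of the form /pattern/flags, so backslashes need no double escaping (a literal / inside the pattern must still be written as \/):

const text = "Email: user@example.com, phone: 13812345678";

// Literal syntax (recommended)
const phone = text.match(/\d{11}/);
console.log(phone[0]); // 13812345678

// Global matching (add the g flag) returns every match
const nums = text.match(/\d+/g);

// Replace every digit with '#'
const result = text.replace(/\d/g, '#');

// Named groups
const m = text.match(/(?<user>[\w.]+)@(?<domain>[\w.]+)/);
console.log(m.groups.user);    // user
console.log(m.groups.domain);  // example.com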

In Java, backslashes inside string literals must be doubled (\d is written \\d):

import java.util.regex.*;

// RegexDemo is an illustrative class name
public class RegexDemo {
    public static void main(String[] args) {
        String text = "Email: user@example.com, phone: 13812345678";

        // Find the first match
        Pattern p = Pattern.compile("\\d{11}");
        Matcher m = p.matcher(text);
        if (m.find()) System.out.println(m.group()); // 13812345678

        // String methods (simple cases); matches() must match the entire string
        boolean valid = "abc123".matches(".*\\d+.*");
        String result = text.replaceAll("\\d", "#");

        // Named groups (Java 7+)
        Pattern p2 = Pattern.compile("(?<user>[\\w.]+)@(?<domain>[\\w.]+)");
        Matcher m2 = p2.matcher(text);
        if (m2.find()) {
            System.out.println(m2.group("user"));    // user
            System.out.println(m2.group("domain"));  // example.com
        }
    }
}

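Go uses raw string literals delimited by backticks, so backslashes need no double escaping. Its regexp package follows RE2 syntax, which does not support lookarounds or backreferences:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	text := "Email: user@example.com, phone: 13812345678"

	// Compile the pattern (MustCompile panics if the pattern is invalid)
	p := regexp.MustCompile(`\d{11}`)
	fmt.Println(p.FindString(text)) // 13812345678

	// Find all matches (-1 means no limit)
	nums := regexp.MustCompile(`\d+`).FindAllString(text, -1)
	fmt.Println(nums) // [13812345678]

	// Replace the 11-digit number
	result := p.ReplaceAllString(text, "###########")
	fmt.Println(result)

	// Named groups
	p2 := regexp.MustCompile(`(?P<user>[\w.]+)@(?P<domain>[\w.]+)`)
	m := p2.FindStringSubmatch(text)
	fmt.Println(m[p2.SubexpIndex("user")])   // user
	fmt.Println(m[p2.SubexpIndex("domain")]) // example.com
}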

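Common Patterns

These patterns are deliberately simplified and suit quick filtering rather than strict validation: the IPv4 pattern accepts out-of-range octets such as 999, the date pattern does not check month lengths, and the email pattern allows only one dot in the domain. The mobile pattern targets 11-digit mainland China numbers, and [一-龥] is the range U+4E00 to U+9FA5.

Email                [\w.+-]+@[\w-]+\.[a-zA-Z]{2,}
Mobile number        1[3-9]\d{9}
Date                 \d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])
IPv4                 (\d{1,3}\.){3}\d{1,3}
URL                  https?://[\w./-]+
Chinese characters   [一-龥]+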
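
One way to apply these patterns for whole-string validation is `re.fullmatch`, sketched below in Python. The `PATTERNS` dictionary, the `is_valid` helper, and `STRICT_IPV4` are hypothetical names introduced for this example; the stricter IPv4 pattern is one possible tightening, not part of the list above.

```python
import re

# Hypothetical lookup table holding some of the patterns listed above.
PATTERNS = {
    "email": r"[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}",
    "mobile": r"1[3-9]\d{9}",
    "date": r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])",
    "ipv4": r"(\d{1,3}\.){3}\d{1,3}",
}

def is_valid(kind: str, value: str) -> bool:
    """Return True if the whole value matches the named pattern."""
    return re.fullmatch(PATTERNS[kind], value) is not None

email_ok = is_valid("email", "user@example.com")   # True
mobile_ok = is_valid("mobile", "13812345678")      # True
mobile_bad = is_valid("mobile", "12812345678")     # False: second digit must be 3-9
month_bad = is_valid("date", "2024-13-01")         # False: month 13 is rejected
day_loose = is_valid("date", "2024-02-30")         # True: month lengths are not checked
ipv4_loose = is_valid("ipv4", "999.1.1.1")         # True: octet range is not checked

# An assumed stricter variant that limits each octet to 0-255.
OCTET = r"(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
STRICT_IPV4 = rf"({OCTET}\.){{3}}{OCTET}"
strict_ok = re.fullmatch(STRICT_IPV4, "192.168.0.1") is not None   # True
strict_bad = re.fullmatch(STRICT_IPV4, "999.1.1.1") is not None    # False
```

Using `fullmatch` rather than `search` matters here: `search` would accept any string that merely contains a valid-looking fragment.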