Regular Expressions

What are Regular Expressions?

Regular expressions (regex or RegEx for short) are a pattern-matching syntax for describing text. They are used for searching, matching, and replacing text, and are a core skill in Linux text processing.

Regex Types

Linux tools mainly use two POSIX regular expression flavors:

Type	Description	Supporting Tools
BRE	Basic Regular Expression	grep, sed
ERE	Extended Regular Expression	egrep, grep -E, sed -E

Main Differences

Meta Character	BRE	ERE
`?`	`\?`	`?`
`+`	`\+`	`+`
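`{}`	`\{\}`	`{}`
`()`	`\(\)`	`()`
`|`	`\|`	`|`

Note: in BRE, `\?`, `\+`, and `\|` are GNU extensions; POSIX BRE itself defines only `\{\}` and `\(\)`.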
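As a rough illustration of the differences above, the sketch below runs the same grouping-and-alternation pattern in both flavors. It assumes GNU grep (the BRE form relies on the GNU `\|` extension), and the sample strings are made up for the demonstration.

```shell
# Hypothetical sample input, one string per line.
sample=$(printf '%s\n' 'abd' 'acd' 'aed' 'a(b|c)d')

# ERE: ( ) and | are operators.
ere_hits=$(printf '%s\n' "$sample" | grep -E 'a(b|c)d')

# BRE: the same operators need backslashes (\| is a GNU extension).
bre_hits=$(printf '%s\n' "$sample" | grep 'a\(b\|c\)d')

# BRE without backslashes: ( | ) are ordinary characters.
literal_hits=$(printf '%s\n' "$sample" | grep 'a(b|c)d')

printf 'ERE:\n%s\nBRE escaped:\n%s\nBRE unescaped:\n%s\n' \
  "$ere_hits" "$bre_hits" "$literal_hits"
```

The two escaped/unescaped operator forms select abd and acd, while the unescaped BRE pattern only finds the line that literally contains `a(b|c)d`.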

Basic Meta Characters

Character Matching

Meta Character	Description	Example
`.`	Match any single character	`a.c` matches abc, adc
`[]`	Character class	`[abc]` matches a, b, c
`[^]`	Negated character class	`[^abc]` matches any character except a, b, or c
`\`	Escape character	`\.` matches dot

Character Class Shortcuts

Character Class	Description
`[0-9]`	Digits
`[a-z]`	Lowercase letters
`[A-Z]`	Uppercase letters
`[a-zA-Z]`	All letters
`[a-zA-Z0-9]`	Letters and numbers

POSIX Character Classes

Character Class	Description
`[:alnum:]`	Letters and numbers
`[:alpha:]`	Letters
`[:digit:]`	Numbers
`[:lower:]`	Lowercase letters
`[:upper:]`	Uppercase letters
`[:space:]`	Whitespace characters
`[:punct:]`	Punctuation marks
`[:blank:]`	Space and Tab

# Using POSIX character classes (they must appear inside a bracket expression)
$ grep '[[:digit:]]' file.txt
$ grep '[[:alpha:]]' file.txt
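A minimal sketch of how POSIX classes behave inside bracket expressions, including two classes sharing one pair of brackets. The sample lines are invented for the example.

```shell
lines=$(printf '%s\n' 'abc' '123' 'a1 b2' '!?')

# Lines containing at least one digit: "123" and "a1 b2".
with_digit=$(printf '%s\n' "$lines" | grep -c '[[:digit:]]')

# Lines made only of letters: "abc".
only_alpha=$(printf '%s\n' "$lines" | grep -c '^[[:alpha:]]*$')

# Several classes can share one bracket expression:
# lines made only of letters, digits, spaces, and tabs.
alnum_blank=$(printf '%s\n' "$lines" | grep -c '^[[:alnum:][:blank:]]*$')

# Lines containing punctuation: "!?".
with_punct=$(printf '%s\n' "$lines" | grep -c '[[:punct:]]')

printf 'digit=%s alpha-only=%s alnum/blank-only=%s punct=%s\n' \
  "$with_digit" "$only_alpha" "$alnum_blank" "$with_punct"
```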

Position Anchors

Meta Character	Description	Example
`^`	Line start	`^hello`
`$`	Line end	`world$`
`\b`	Word boundary	`\bword\b`
`\B`	Non-word boundary	`\Bword\B`

# Match lines starting with hello
$ grep '^hello' file.txt

# Match lines ending with world
$ grep 'world$' file.txt

# Match entire line
$ grep '^hello world$' file.txt

# Match empty line
$ grep '^$' file.txt

# Match complete word
$ grep '\bword\b' file.txt
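The anchors above can be tried on a small made-up input. This sketch counts matching lines with `grep -c`; it assumes GNU grep, since `\b` is a GNU extension.

```shell
lines=$(printf '%s\n' 'hello world' 'say hello world' 'hello world!' '' \
  'password' 'a word here')

starts=$(printf '%s\n' "$lines" | grep -c '^hello')          # lines starting with hello
ends=$(printf '%s\n' "$lines" | grep -c 'world$')            # lines ending with world
whole=$(printf '%s\n' "$lines" | grep -c '^hello world$')    # the entire line
empty=$(printf '%s\n' "$lines" | grep -c '^$')               # empty lines
substring=$(printf '%s\n' "$lines" | grep -c 'word')         # also hits "password"
whole_word=$(printf '%s\n' "$lines" | grep -c '\bword\b')    # only the standalone word

printf 'starts=%s ends=%s whole=%s empty=%s substring=%s whole_word=%s\n' \
  "$starts" "$ends" "$whole" "$empty" "$substring" "$whole_word"
```

Comparing `substring` with `whole_word` shows what the word-boundary anchors add: `word` alone also matches inside `password`.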

Quantifiers

Meta Character	Description	Example
`*`	Zero or more times	`ab*c` matches ac, abc, abbc, abbbc
`+`	One or more times	`ab+c` matches abc, abbc, abbbc
`?`	Zero or one time	`ab?c` matches ac, abc
`{n}`	Exactly n times	`a{3}` matches aaa
`{n,}`	At least n times	`a{2,}` matches aa, aaa, aaaa
`{n,m}`	Between n and m times	`a{2,4}` matches aa, aaa, aaaa
`{,m}`	Up to m times (GNU extension; the portable form is `{0,m}`)	`a{,3}` matches the empty string, a, aa, aaa

# Basic usage
$ grep 'ab*c' file.txt       # BRE
$ grep -E 'ab+c' file.txt    # ERE

# Specific number
$ grep 'a\{3\}' file.txt     # BRE
$ grep -E 'a{3}' file.txt    # ERE

# Range
$ grep -E 'a{2,4}' file.txt
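One detail worth seeing in practice: without anchors, a quantified pattern matches anywhere in the line, so `a{3}` also selects longer runs of `a`. The sketch below uses `grep -x` (match the whole line) to make the counts exact; the input is a made-up list of strings.

```shell
lines=$(printf '%s\n' 'a' 'aa' 'aaa' 'aaaa' 'aaaaa')

# -x requires the pattern to match the whole line.
exactly3=$(printf '%s\n' "$lines" | grep -xE 'a{3}')
two_to_four=$(printf '%s\n' "$lines" | grep -xE 'a{2,4}')
at_least4=$(printf '%s\n' "$lines" | grep -xE 'a{4,}')

# Without -x, a{3} matches any line that contains three a's in a row.
contains3=$(printf '%s\n' "$lines" | grep -cE 'a{3}')

printf 'exactly 3:\n%s\n2 to 4:\n%s\nat least 4:\n%s\nlines containing aaa: %s\n' \
  "$exactly3" "$two_to_four" "$at_least4" "$contains3"
```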

Groups and Capturing

# Grouping
$ grep -E '(ab)+' file.txt    # Matches ab, abab, ababab

# Back reference
$ grep -E '(.).\1' file.txt   # Matches aba, aca (back-references in ERE are a GNU extension)

# sed usage
$ sed 's/\(.*\)/\1/' file.txt      # BRE
$ sed -E 's/(hello) (world)/\2 \1/' file.txt   # ERE
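A short sketch of capturing and back-referencing, assuming GNU grep and GNU sed (both `\b` and back-references inside ERE are GNU extensions). The sentences are invented test data.

```shell
text=$(printf '%s\n' 'this is is wrong' 'this is fine')

# \1 inside the pattern must repeat whatever group 1 captured,
# so this finds lines with a doubled word.
doubled=$(printf '%s\n' "$text" | grep -E '\b([a-z]+) \1\b')

# In the replacement, \1 inserts the captured text once.
deduped=$(printf '%s\n' "$doubled" | sed -E 's/\b([a-z]+) \1\b/\1/')

# Two groups can be swapped by reordering their references.
swapped=$(printf '%s\n' 'hello world' | sed -E 's/(hello) (world)/\2 \1/')

printf '%s\n%s\n%s\n' "$doubled" "$deduped" "$swapped"
```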

Alternation (OR)

# Use |
$ grep -E 'cat|dog' file.txt

# With grouping
$ grep -E '(red|blue) car' file.txt
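Alternation has the lowest precedence, so anchors bind to only one branch unless the alternatives are grouped. The following sketch, on made-up words, shows the difference.

```shell
words=$(printf '%s\n' 'cat' 'dog' 'catalog' 'hotdog' 'bird')

# Parsed as (^cat) OR (dog$): also matches catalog and hotdog.
loose=$(printf '%s\n' "$words" | grep -E '^cat|dog$')

# Grouping applies both anchors to every alternative.
strict=$(printf '%s\n' "$words" | grep -E '^(cat|dog)$')

printf 'ungrouped:\n%s\ngrouped:\n%s\n' "$loose" "$strict"
```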

Common Pattern Examples

Numbers

# Integers
[0-9]+

# Floating-point
[0-9]+\.[0-9]+

# Signed numbers
-?[0-9]+

# Phone numbers (China mobile)
1[3-9][0-9]{9}
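As a sketch of how these number patterns might be used for validation, the example below checks made-up candidate strings with `grep -xE`, which requires the whole line to match. Note the `-e` before the signed-number pattern: without it, grep would read the leading `-` as an option.

```shell
candidates=$(printf '%s\n' '42' '-7' '3.14' 'abc' '13812345678')

# -e marks the next argument as a pattern even though it starts with "-".
integers=$(printf '%s\n' "$candidates" | grep -xE -e '-?[0-9]+')
floats=$(printf '%s\n' "$candidates" | grep -xE '[0-9]+\.[0-9]+')
mobiles=$(printf '%s\n' "$candidates" | grep -xE '1[3-9][0-9]{9}')

printf 'integers:\n%s\nfloats:\n%s\nmobiles:\n%s\n' "$integers" "$floats" "$mobiles"
```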

Strings

# Quoted strings
"[^"]*"

# Arbitrary word
\b\w+\b

# Email addresses
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

# URL
https?://[^ ]+

# IP address (simplified)
[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}

# Date YYYY-MM-DD
[0-9]{4}-[0-9]{2}-[0-9]{2}

# Time HH:MM:SS
[0-9]{2}:[0-9]{2}:[0-9]{2}
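The patterns above can be used to pull fields out of free text with `grep -o`, which prints only the matched part. This is a sketch on an invented sentence (the address and URL are placeholders); it also shows that the simplified IP pattern checks shape only, not the 0-255 range.

```shell
text='Contact alice@example.com on 2024-01-15 at 09:30:00 via https://example.com/help'

email=$(printf '%s\n' "$text" | grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')
date_part=$(printf '%s\n' "$text" | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}')
time_part=$(printf '%s\n' "$text" | grep -oE '[0-9]{2}:[0-9]{2}:[0-9]{2}')
url=$(printf '%s\n' "$text" | grep -oE 'https?://[^ ]+')

# The simplified IP pattern accepts out-of-range octets.
bogus_ip_ok=$(printf '%s\n' '999.999.999.999' |
  grep -cxE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')

printf '%s\n%s\n%s\n%s\nbogus IP accepted: %s\n' \
  "$email" "$date_part" "$time_part" "$url" "$bogus_ip_ok"
```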

Using grep with Regex

# Basic regex
$ grep 'pattern' file.txt

# Extended regex
$ grep -E 'pattern' file.txt

# Perl-compatible regex (PCRE; GNU grep only)
$ grep -P 'pattern' file.txt

# Only match complete word
$ grep -w 'word' file.txt

# Show line numbers
$ grep -n 'pattern' file.txt

# Case-insensitive
$ grep -i 'pattern' file.txt

# Invert match (lines not containing the pattern)
$ grep -v 'pattern' file.txt
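To see how these options combine, here is a small sketch on a made-up log excerpt.

```shell
log=$(printf '%s\n' 'ERROR disk full' 'error: retry' 'INFO ok' 'errors found')

# -n prefixes each match with its line number.
numbered=$(printf '%s\n' "$log" | grep -n 'ERROR')

# -i ignores case; -c counts matching lines instead of printing them.
any_case=$(printf '%s\n' "$log" | grep -ci 'error')

# -w additionally requires a whole word, so "errors" no longer counts.
whole_word=$(printf '%s\n' "$log" | grep -ciw 'error')

# -v keeps only the lines that do not match.
clean=$(printf '%s\n' "$log" | grep -vi 'error')

printf '%s\nany case: %s\nwhole word: %s\nwithout errors: %s\n' \
  "$numbered" "$any_case" "$whole_word" "$clean"
```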

Examples

# Find lines containing numbers
$ grep '[0-9]' file.txt

# Find lines starting with letters
$ grep '^[a-zA-Z]' file.txt

# Find email addresses
$ grep -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt

# Find IP addresses
$ grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file.txt

# Find empty or whitespace-only lines
$ grep -E '^[[:space:]]*$' file.txt
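A common use of the last pattern is the opposite direction: dropping blank lines. The sketch below combines it with `-v` on a made-up four-line input that contains one empty line and one line of spaces and a tab.

```shell
text=$(printf 'first\n\n  \t\nsecond\n')

# Count lines that are empty or contain only whitespace.
blank_count=$(printf '%s\n' "$text" | grep -cE '^[[:space:]]*$')

# -v inverts the match, keeping only lines with visible content.
non_blank=$(printf '%s\n' "$text" | grep -vE '^[[:space:]]*$')

printf 'blank lines: %s\nremaining:\n%s\n' "$blank_count" "$non_blank"
```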

Using sed with Regex

# Basic replacement
$ sed 's/old/new/' file.txt

# Use extended regex
$ sed -E 's/(hello) (world)/\2 \1/' file.txt

# Back reference
$ sed 's/\([a-z]\+\) \([a-z]\+\)/\2 \1/' file.txt   # BRE (\+ is a GNU extension)
$ sed -E 's/([a-z]+) ([a-z]+)/\2 \1/' file.txt   # ERE

# Use & to reference the entire match
$ sed -E 's/[0-9]+/【&】/' file.txt   # Wrap the first number on each line in brackets
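The `&` reference is easiest to understand on a concrete line. This sketch uses plain ASCII brackets and a made-up input; it also shows why `[0-9]*` is a risky pattern here, since it can match the empty string at the start of the line.

```shell
line='order 42 and 7'

# & stands for whatever the pattern matched; only the first match is replaced.
first=$(printf '%s\n' "$line" | sed -E 's/[0-9]+/[&]/')

# The g flag replaces every match on the line.
all=$(printf '%s\n' "$line" | sed -E 's/[0-9]+/[&]/g')

# Pitfall: [0-9]* matches zero digits at position 0, so & is empty.
empty_match=$(printf '%s\n' "$line" | sed 's/[0-9]*/[&]/')

printf '%s\n%s\n%s\n' "$first" "$all" "$empty_match"
```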

Practical Examples

# Delete HTML tags
$ sed 's/<[^>]*>//g' file.html

# Delete leading whitespace
$ sed 's/^[[:space:]]*//' file.txt

# Extract quoted content (the trailing .* also drops text after the closing quote)
$ sed -E 's/.*"([^"]*)".*/\1/' file.txt

# Format phone number (three groups, so only \1 to \3 exist)
$ sed -E 's/([0-9]{3})([0-9]{4})([0-9]{4})/\1-\2-\3/' file.txt
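A quick way to check substitutions like these is to feed them one known input each. The sketch below does that with invented sample strings; the quoted-content command includes a trailing `.*` so that text after the closing quote is removed as well.

```shell
# Strip leading whitespace.
trimmed=$(printf '   indented\n' | sed 's/^[[:space:]]*//')

# Keep only the text between the double quotes.
quoted=$(printf '%s\n' 'say "hi" now' | sed -E 's/.*"([^"]*)".*/\1/')

# Split an 11-digit number into 3-4-4 groups.
phone=$(printf '%s\n' '13812345678' |
  sed -E 's/([0-9]{3})([0-9]{4})([0-9]{4})/\1-\2-\3/')

printf '%s\n%s\n%s\n' "$trimmed" "$quoted" "$phone"
```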

Using awk with Regex

# Pattern matching
$ awk '/pattern/' file.txt

# Field matching
$ awk '$1 ~ /pattern/' file.txt
$ awk '$1 !~ /pattern/' file.txt

# Regex field separator
$ awk -F '[,:]' '{print $1}' file.csv

# gsub substitution
$ awk '{gsub(/old/, "new"); print}' file.txt

# Match extraction
$ awk 'match($0, /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/) {print substr($0, RSTART, RLENGTH)}' file.txt
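The awk features above can be combined on structured text. This is a minimal sketch on made-up records with the hypothetical layout `name,age:role`; it sticks to POSIX awk features so it should behave the same in gawk and mawk.

```shell
data=$(printf '%s\n' 'alice,30:admin' 'bob,25:user' 'carol,41:admin')

# A bracket expression as field separator splits on "," or ":".
names=$(printf '%s\n' "$data" | awk -F '[,:]' '{print $1}')

# ~ tests one field against a regex; !~ negates the test.
admins=$(printf '%s\n' "$data" | awk -F '[,:]' '$3 ~ /^admin$/ {print $1}')
others=$(printf '%s\n' "$data" | awk -F '[,:]' '$3 !~ /^admin$/ {print $1}')

# gsub replaces every match in the record.
renamed=$(printf '%s\n' "$data" | awk '{gsub(/admin/, "root"); print}')

# match sets RSTART and RLENGTH, which substr uses to extract the text.
ages=$(printf '%s\n' "$data" | awk 'match($0, /[0-9]+/) {print substr($0, RSTART, RLENGTH)}')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n---\n%s\n' \
  "$names" "$admins" "$others" "$renamed" "$ages"
```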

Greedy vs Non-greedy

By default, quantifiers are greedy: they match as much text as possible.

# Greedy match
$ echo "aaa bbb ccc ddd" | grep -oE 'a.*c'
aaa bbb ccc

# Non-greedy (Perl regex only)
$ echo "aaa bbb ccc ddd" | grep -oP 'a.*?c'
aaa bbb c
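Where `grep -P` is not available, a negated character class often gives the same shortest-match result in plain ERE. The sketch below shows this workaround on the same sample string; it applies when the stopping point is a single character.

```shell
text='aaa bbb ccc ddd'

# Greedy: .* runs to the last c on the line.
greedy=$(printf '%s\n' "$text" | grep -oE 'a.*c')

# [^c]* cannot cross a c, so the match stops at the first one.
shortest=$(printf '%s\n' "$text" | grep -oE 'a[^c]*c')

printf 'greedy:   %s\nshortest: %s\n' "$greedy" "$shortest"
```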

Common Errors

1. Forgetting to Escape

# Wrong: . matches any character
$ grep 'file.txt' file.txt

# Correct
$ grep 'file\.txt' file.txt
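The effect of the missing backslash is easy to measure. In this sketch, on made-up names, the unescaped pattern accepts three lines while the escaped one accepts only the intended name; `grep -F`, which treats the pattern as a fixed string, is another way to avoid the problem.

```shell
names=$(printf '%s\n' 'file.txt' 'file_txt' 'fileAtxt')

# Unescaped dot: any character is accepted in that position.
unescaped=$(printf '%s\n' "$names" | grep -c 'file.txt')

# Escaped dot: only a literal "." matches.
escaped=$(printf '%s\n' "$names" | grep -c 'file\.txt')

# -F disables regex interpretation entirely.
fixed=$(printf '%s\n' "$names" | grep -cF 'file.txt')

printf 'unescaped=%s escaped=%s fixed=%s\n' "$unescaped" "$escaped" "$fixed"
```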

2. BRE and ERE Confusion

# BRE: + must be escaped to act as a quantifier (GNU extension)
$ grep 'a\+' file.txt

# ERE no escaping needed
$ grep -E 'a+' file.txt
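The confusion works in both directions: a backslash turns `+` into a quantifier in BRE but into a literal plus sign in ERE. A small sketch with made-up lines, assuming GNU grep:

```shell
lines=$(printf '%s\n' 'aaa' 'a+b' 'bbb')

bre_plain=$(printf '%s\n' "$lines" | grep 'a+')        # BRE: literal "a+"
bre_escaped=$(printf '%s\n' "$lines" | grep 'a\+')     # BRE: one or more a
ere_plain=$(printf '%s\n' "$lines" | grep -E 'a+')     # ERE: one or more a
ere_escaped=$(printf '%s\n' "$lines" | grep -E 'a\+')  # ERE: literal "a+"

printf 'BRE a+:\n%s\nBRE a\\+:\n%s\nERE a+:\n%s\nERE a\\+:\n%s\n' \
  "$bre_plain" "$bre_escaped" "$ere_plain" "$ere_escaped"
```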

3. Greedy Matching Issues

# May match too much
$ sed 's/<.*>//' file.html

# Use a negated character class
$ sed 's/<[^>]*>//g' file.html
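Running both commands on one invented HTML fragment makes the difference visible: the greedy form removes everything from the first `<` to the last `>`, including the text between the tags.

```shell
html='<b>bold</b> text'

# Greedy: <.*> swallows "<b>bold</b>" in one match.
greedy=$(printf '%s\n' "$html" | sed 's/<.*>//')

# [^>]* stops at the first ">", so each tag is removed separately.
per_tag=$(printf '%s\n' "$html" | sed 's/<[^>]*>//g')

printf 'greedy:  "%s"\nper tag: "%s"\n' "$greedy" "$per_tag"
```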

Testing Tools

Online Tools

regex101.com
regexr.com

Command Line Testing

# Use grep to test
$ echo "test string" | grep -E 'pattern'
$ echo "test string" | sed -E 's/pattern/replacement/'

# Use awk to test
$ echo "test string" | awk '/pattern/'
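For repeated testing, the one-liners above can be wrapped in a small function. `try_regex` below is a hypothetical helper, not a standard command: it takes an ERE followed by any number of test strings and reports which ones match.

```shell
# try_regex PATTERN STRING...
# Hypothetical helper: prints "match" or "no match" for each string.
try_regex() {
  pattern=$1
  shift
  for s in "$@"; do
    # -q suppresses output; -e protects patterns that start with "-".
    if printf '%s\n' "$s" | grep -qE -e "$pattern"; then
      printf 'match: %s\n' "$s"
    else
      printf 'no match: %s\n' "$s"
    fi
  done
}

try_regex '^[0-9]{4}-[0-9]{2}-[0-9]{2}$' '2024-01-15' '2024-1-5' 'not a date'
```

Listing a few strings that should match alongside a few that should not is usually the fastest way to catch a pattern that is too loose or too strict.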

Summary

This chapter introduced Linux regular expressions:

Meta characters: ., *, +, ?, ^, $
Character classes: [], [^], POSIX classes
Quantifiers: {n}, {n,}, {n,m}, {,m}
Groups and capturing: (), \1, \2
Alternation: |
In tools: grep, sed, awk

Regular expressions require practice to master. Start with simple patterns and gradually learn more complex usage.
