Regular Expressions
What are Regular Expressions?
Regular expressions (regex for short) are a pattern-matching syntax for describing text. They are used for searching, matching, and replacing text, and are a core skill in Linux text processing.
Regex Types
Linux has two main regular expression types:
| Type | Description | Supporting Tools |
|---|---|---|
| BRE | Basic Regular Expression | grep, sed |
| ERE | Extended Regular Expression | egrep, grep -E, sed -E |
Main Differences
| Meta Character | BRE | ERE |
|---|---|---|
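| `?` | `\?` | `?` |
| `+` | `\+` | `+` |
| `{}` | `\{\}` | `{}` |
| `()` | `\(\)` | `()` |
| `\|` | `\\|` | `\|` |
In BRE, the escaped forms `\?`, `\+`, and the escaped pipe are GNU extensions; POSIX BRE itself defines only the escaped braces and parentheses.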
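The escaping difference can be checked directly. The sketch below assumes a POSIX-compatible grep and a made-up three-line sample; the variable names are illustrative only. It counts the lines matched by the same interval quantifier written first in BRE and then in ERE syntax.

```shell
# Hypothetical sample input: one candidate string per line.
sample=$(printf '%s\n' 'ac' 'abc' 'abbc')

# BRE (plain grep): interval braces must be backslash-escaped.
bre_hits=$(printf '%s\n' "$sample" | grep -c 'ab\{1,2\}c')

# ERE (grep -E): the same quantifier is written without backslashes.
ere_hits=$(printf '%s\n' "$sample" | grep -Ec 'ab{1,2}c')

echo "BRE matched $bre_hits lines, ERE matched $ere_hits lines"
```

Both commands select the same lines (abc and abbc); only the spelling of the quantifier changes.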
Basic Meta Characters
Character Matching
| Meta Character | Description | Example |
|---|---|---|
| . | Match any single character | a.c matches abc, adc |
| [] | Character class | [abc] matches a, b, or c |
| [^] | Negated character class | [^abc] matches any character except a, b, or c |
| \ | Escape character | \. matches a literal dot |
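As a small illustration of the table above, the following sketch counts matching lines in a made-up four-line sample. The sample strings and variable names are assumptions chosen for the example.

```shell
# Hypothetical sample: one string per line.
sample=$(printf '%s\n' 'abc' 'a.c' 'ac' 'xyz')

# '.' matches any single character, so both 'abc' and 'a.c' match.
any_char=$(printf '%s\n' "$sample" | grep -c 'a.c')

# '\.' matches only a literal dot, so only 'a.c' matches.
literal_dot=$(printf '%s\n' "$sample" | grep -c 'a\.c')

# '[^abc]' needs at least one character other than a, b, or c on the line.
not_abc=$(printf '%s\n' "$sample" | grep -c '[^abc]')

echo "any_char=$any_char literal_dot=$literal_dot not_abc=$not_abc"
```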
Character Class Shortcuts
| Character Class | Description |
|---|---|
| [0-9] | Digits |
| [a-z] | Lowercase letters |
| [A-Z] | Uppercase letters |
| [a-zA-Z] | All letters |
| [a-zA-Z0-9] | Letters and digits |
POSIX Character Classes
| Character Class | Description |
|---|---|
| [:alnum:] | Letters and numbers |
| [:alpha:] | Letters |
| [:digit:] | Numbers |
| [:lower:] | Lowercase letters |
| [:upper:] | Uppercase letters |
| [:space:] | Whitespace characters |
| [:punct:] | Punctuation marks |
| [:blank:] | Space and Tab |
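POSIX class names are only recognized inside a bracket expression, which is why they are written with doubled brackets, as in [[:digit:]]. A minimal sketch, assuming the C locale and a made-up input string:

```shell
# Force the C locale so class membership is predictable (an assumption of this sketch).
export LC_ALL=C
line='user_42: OK!'

# Keep only the digits by deleting every character outside [:digit:].
digits=$(printf '%s\n' "$line" | sed 's/[^[:digit:]]//g')

# Delete punctuation; in the C locale '_', ':' and '!' all belong to [:punct:].
no_punct=$(printf '%s\n' "$line" | sed 's/[[:punct:]]//g')

echo "$digits"
echo "$no_punct"
```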
bash
# Using POSIX character classes
$ grep '[[:digit:]]' file.txt
$ grep '[[:alpha:]]' file.txt
Position Anchors
| Meta Character | Description | Example |
|---|---|---|
| ^ | Line start | ^hello |
| $ | Line end | world$ |
| \b | Word boundary | \bword\b |
| \B | Non-word boundary | \Bword\B |
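The line anchors can be tried on a small in-memory sample. This sketch uses only ^ and $, which every POSIX grep supports; the sample text and variable names are made up for illustration.

```shell
# Hypothetical sample: four lines, the third one empty.
text=$(printf '%s\n' 'hello world' 'say hello' '' 'world')

# Lines that begin with "hello".
starts=$(printf '%s\n' "$text" | grep -c '^hello')

# Lines that end with "world".
ends=$(printf '%s\n' "$text" | grep -c 'world$')

# Empty lines: start of line immediately followed by end of line.
empty=$(printf '%s\n' "$text" | grep -c '^$')

echo "starts=$starts ends=$ends empty=$empty"
```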
bash
# Match lines starting with hello
$ grep '^hello' file.txt
# Match lines ending with world
$ grep 'world$' file.txt
# Match entire line
$ grep '^hello world$' file.txt
# Match empty line
$ grep '^$' file.txt
# Match complete word
$ grep '\bword\b' file.txt
Quantifiers
| Meta Character | Description | Example |
|---|---|---|
| * | Zero or more times | ab*c matches ac, abc, abbc, abbbc |
| + | One or more times | ab+c matches abc, abbc, abbbc |
| ? | Zero or one time | ab?c matches ac, abc |
| {n} | Exactly n times | a{3} matches aaa |
| {n,} | At least n times | a{2,} matches aa, aaa, aaaa |
| {n,m} | Between n and m times | a{2,4} matches aa, aaa, aaaa |
| {,m} | At most m times (GNU extension) | a{,3} matches the empty string, a, aa, aaa |
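To see exactly which strings each quantifier accepts, it helps to force whole-line matching with grep -x, so that a longer string cannot match merely because it contains a shorter match. The sketch below assumes a POSIX grep; the sample strings and variable names are illustrative.

```shell
# Hypothetical sample: 'a', then zero to three 'b', then 'c'.
words=$(printf '%s\n' 'ac' 'abc' 'abbc' 'abbbc')

# -x requires the pattern to match the entire line.
star=$(printf '%s\n' "$words" | grep -cx 'ab*c')         # zero or more b
plus=$(printf '%s\n' "$words" | grep -Ecx 'ab+c')        # one or more b
opt=$(printf '%s\n' "$words" | grep -Ecx 'ab?c')         # zero or one b
range=$(printf '%s\n' "$words" | grep -Ecx 'ab{2,3}c')   # two or three b

echo "star=$star plus=$plus opt=$opt range=$range"
```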
bash
# Basic usage
$ grep 'ab*c' file.txt # BRE
$ grep -E 'ab+c' file.txt # ERE
# Specific number
$ grep 'a\{3\}' file.txt # BRE
$ grep -E 'a{3}' file.txt # ERE
# Range
$ grep -E 'a{2,4}' file.txt
Groups and Capturing
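Parentheses group part of a pattern and also capture what it matched, so the captured text can be reused as \1, \2, and so on. A minimal sketch, assuming a sed that accepts -E (GNU and BSD sed both do); the back-reference test uses BRE syntax because back-references are defined there by POSIX. Sample strings and variable names are made up.

```shell
# Swap two captured words in the replacement (ERE syntax via sed -E).
swapped=$(printf '%s\n' 'hello world' | sed -E 's/(hello) (world)/\2 \1/')

# \1 must repeat whatever the first group matched, so 'aba' and 'xyx'
# match the pattern below but 'abc' does not.
repeats=$(printf '%s\n' 'aba' 'abc' 'xyx' | grep -c '^\(.\).\1$')

echo "$swapped"
echo "$repeats"
```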
bash
# Grouping
$ grep -E '(ab)+' file.txt # Matches ab, abab, ababab
# Back reference
$ grep -E '(.).\1' file.txt # Matches aba, aca
# sed usage
$ sed 's/\(.*\)/\1/' file.txt # BRE
$ sed -E 's/(hello) (world)/\2 \1/' file.txt # ERE
Alternation (OR)
bash
# Use |
$ grep -E 'cat|dog' file.txt
# With grouping
$ grep -E '(red|blue) car' file.txt
Common Pattern Examples
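The patterns in this section are written in ERE syntax. When they are used to validate a whole value rather than to search inside a line, they need to be anchored. The sketch below wraps two of them in hypothetical helper functions (is_integer and is_iso_date are names invented here) and uses grep -x for the anchoring.

```shell
# Hypothetical helpers: each returns 0 (success) when the whole argument matches.
is_integer() {
    # -x anchors the match to the whole line; -e keeps the leading '-' of the
    # pattern from being parsed as a command-line option.
    printf '%s\n' "$1" | grep -Eqx -e '-?[0-9]+'
}

is_iso_date() {
    # Checks the YYYY-MM-DD shape only, not whether the date really exists.
    printf '%s\n' "$1" | grep -Eqx '[0-9]{4}-[0-9]{2}-[0-9]{2}'
}

is_integer '-42' && echo '-42 looks like an integer'
is_iso_date '2024-13-99' && echo 'shape matches even though the date is invalid'
```

The second call shows the limit of such patterns: they check the format of the text, not the validity of the value.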
Numbers
bash
# Integers
[0-9]+
# Floating-point
[0-9]+\.[0-9]+
# Signed numbers
-?[0-9]+
# Phone numbers (China mobile)
1[3-9][0-9]{9}
Strings
bash
# Quoted strings
"[^"]*"
# Arbitrary word
\b\w+\b
# Email addresses
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
# URL
https?://[^ ]+
# IP address (simplified)
[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}
# Date YYYY-MM-DD
[0-9]{4}-[0-9]{2}-[0-9]{2}
# Time HH:MM:SS
[0-9]{2}:[0-9]{2}:[0-9]{2}
Using grep with Regex
bash
# Basic regex
$ grep 'pattern' file.txt
# Extended regex
$ grep -E 'pattern' file.txt
# Perl regex
$ grep -P 'pattern' file.txt
# Only match complete word
$ grep -w 'word' file.txt
# Show line numbers
$ grep -n 'pattern' file.txt
# Case-insensitive
$ grep -i 'pattern' file.txt
# Reverse match (not containing)
$ grep -v 'pattern' file.txt
Examples
bash
# Find lines containing numbers
$ grep '[0-9]' file.txt
# Find lines starting with letters
$ grep '^[a-zA-Z]' file.txt
# Find email addresses
$ grep -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt
# Find IP addresses
$ grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file.txt
# Find empty lines or whitespace-only
$ grep -E '^[[:space:]]*$' file.txt
Using sed with Regex
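sed applies a regex through its s command: the pattern selects text, and the replacement can reuse captured groups (\1, \2, ...) or the whole match (&). A minimal sketch on in-memory strings, assuming a sed that accepts -E; the inputs and variable names are made up.

```shell
# Reorder a YYYY-MM-DD date into DD/MM/YYYY with three capture groups.
# '#' is used as the s-command delimiter so the slashes need no escaping.
iso='2024-03-15'
dmy=$(printf '%s\n' "$iso" | sed -E 's#^([0-9]{4})-([0-9]{2})-([0-9]{2})$#\3/\2/\1#')

# '&' stands for the entire matched text; [0-9][0-9]* requires at least one digit.
tagged=$(printf '%s\n' 'id 42' | sed 's/[0-9][0-9]*/[&]/')

echo "$dmy"
echo "$tagged"
```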
bash
# Basic replacement
$ sed 's/old/new/' file.txt
# Use extended regex
$ sed -E 's/(hello) (world)/\2 \1/' file.txt
# Back reference
$ sed 's/\([a-z]\+\) \([a-z]\+\)/\2 \1/' file.txt # BRE (\+ is a GNU extension)
$ sed -E 's/([a-z]+) ([a-z]+)/\2 \1/' file.txt # ERE
# Use & to reference entire match
$ sed 's/[0-9][0-9]*/[&]/' file.txt # Wrap the first number in brackets
Practical Examples
bash
# Delete HTML tags
$ sed 's/<[^>]*>//g' file.html
# Delete leading whitespace
$ sed 's/^[[:space:]]*//' file.txt
# Extract quoted content
$ sed -E 's/.*"([^"]*)".*/\1/' file.txt
# Format phone number
$ sed -E 's/([0-9]{3})([0-9]{4})([0-9]{4})/\1-\2-\3/' file.txt
Using awk with Regex
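awk uses ERE syntax and can test a regex against the whole record or against a single field with the ~ operator. The sketch below runs on a made-up log sample; the log format and variable names are assumptions for illustration.

```shell
# Hypothetical log lines in the form "<level> <message>".
log=$(printf '%s\n' 'ERROR disk full' 'INFO started' 'WARN low memory' 'ERROR timeout')

# Count records whose first field is exactly ERROR or WARN.
problems=$(printf '%s\n' "$log" | awk '$1 ~ /^(ERROR|WARN)$/ { n++ } END { print n + 0 }')

# Print the message of every ERROR line, with the level removed by sub().
errors=$(printf '%s\n' "$log" | awk '/^ERROR / { sub(/^ERROR /, ""); print }')

echo "$problems"
echo "$errors"
```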
bash
# Pattern matching
$ awk '/pattern/' file.txt
# Field matching
$ awk '$1 ~ /pattern/' file.txt
$ awk '$1 !~ /pattern/' file.txt
# Regex as field separator
$ awk -F '[,:]' '{print $1}' file.csv
# gsub substitution
$ awk '{gsub(/old/, "new"); print}' file.txt
# Match extraction
$ awk 'match($0, /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/) {print substr($0, RSTART, RLENGTH)}' file.txt
Greedy vs Non-greedy
By default, quantifiers are greedy: they match as much text as possible.
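POSIX BRE and ERE have no non-greedy quantifier, so a negated character class is a common substitute when the match should stop at the first terminator. A minimal sketch with sed on a made-up input string:

```shell
line='aaa bbb ccc ddd'

# Greedy: .* runs to the LAST 'c', so everything up to "ccc" is replaced.
greedy=$(printf '%s\n' "$line" | sed 's/a.*c/X/')

# [^c]* cannot cross a 'c', so the match stops at the FIRST 'c'.
shortest=$(printf '%s\n' "$line" | sed 's/a[^c]*c/X/')

echo "$greedy"
echo "$shortest"
```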
bash
# Greedy match
$ echo "aaa bbb ccc ddd" | grep -oE 'a.*c'
aaa bbb ccc
# Non-greedy (Perl regex only)
$ echo "aaa bbb ccc ddd" | grep -oP 'a.*?c'
aaa bbb c
Common Errors
1. Forgetting to Escape
bash
# Wrong: . matches any character
$ grep 'file.txt' file.txt
# Correct
$ grep 'file\.txt' file.txt
2. BRE and ERE Confusion
bash
# BRE where + needs escaping
$ grep 'a\+' file.txt
# ERE no escaping needed
$ grep -E 'a+' file.txt
3. Greedy Matching Issues
bash
# May match too much
$ sed 's/<.*>//' file.html
# Use negative character class
$ sed 's/<[^>]*>//g' file.html
Testing Tools
Online Tools
- regex101.com
- regexr.com
Command Line Testing
bash
# Use grep to test
$ echo "test string" | grep -E 'pattern'
$ echo "test string" | sed -E 's/pattern/replacement/'
# Use awk to test
$ echo "test string" | awk '/pattern/'
Summary
This chapter introduced Linux regular expressions:
- Meta characters: ., *, +, ?, ^, $
- Character classes: [], [^], POSIX classes
- Quantifiers: {n}, {n,m}, {n,}, {,m}
- Groups and capturing: (), \1, \2
- Alternation: |
- In tools: grep, sed, awk
Regular expressions require practice to master. Start with simple patterns and gradually learn more complex usage.