Regular Expressions

What are Regular Expressions?

Regular expressions (regex or RegEx for short) are a pattern-matching syntax for describing text patterns. They are used for searching, matching, and replacing text, and are a core skill in Linux text processing.

Regex Types

Linux has two main regular expression types:

Type   Description                   Supporting Tools
BRE    Basic Regular Expression      grep, sed
ERE    Extended Regular Expression   egrep, grep -E, sed -E

Main Differences

Meta Character   BRE                   ERE
?                \? (GNU extension)    ?
+                \+ (GNU extension)    +
{}               \{ \}                 { }
()               \( \)                 ( )
|                \| (GNU extension)    |
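
The escaping difference can be checked directly from the command line. The following is a minimal sketch, assuming GNU grep; the sample strings are made up for illustration, and `-x` requires the whole line to match so the difference is easy to see.

```shell
# Made-up sample input, one candidate string per line.
sample=$(printf '%s\n' 'ac' 'abc' 'abbc' 'ab+c')

# BRE: an unescaped + is an ordinary character, so only the literal text ab+c matches.
bre_literal=$(printf '%s\n' "$sample" | grep -x 'ab+c')

# BRE with \+ (a GNU extension): one or more b.
bre_escaped=$(printf '%s\n' "$sample" | grep -x 'ab\+c')

# ERE: + is a quantifier without a backslash.
ere_plain=$(printf '%s\n' "$sample" | grep -xE 'ab+c')

printf 'BRE ab+c:\n%s\n' "$bre_literal"
printf 'BRE ab\\+c:\n%s\n' "$bre_escaped"
printf 'ERE ab+c:\n%s\n' "$ere_plain"
```

The last two commands select the same lines (abc and abbc); only the first treats + as a literal character.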

Basic Meta Characters

Character Matching

Meta Character   Description                  Example
.                Match any single character   a.c matches abc, adc
[]               Character class              [abc] matches a, b, or c
[^]              Negated character class      [^abc] matches any character except a, b, or c
\                Escape character             \. matches a literal dot
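
To see these four meta characters side by side, here is a small sketch that assumes a POSIX grep; the word list is invented for the example, and `-x` forces a whole-line match.

```shell
# Made-up word list, one word per line.
words=$(printf '%s\n' 'abc' 'adc' 'a.c' 'ac' 'xyz')

# . matches exactly one character of any kind, so 'ac' is too short.
any_char=$(printf '%s\n' "$words" | grep -x 'a.c')

# \. matches only a literal dot.
literal_dot=$(printf '%s\n' "$words" | grep -x 'a\.c')

# [bd] matches one character that is b or d.
class=$(printf '%s\n' "$words" | grep -x 'a[bd]c')

# [^bd] matches one character that is neither b nor d.
negated=$(printf '%s\n' "$words" | grep -x 'a[^bd]c')

printf 'a.c     -> %s\n' "$(echo $any_char)"
printf 'a\\.c    -> %s\n' "$literal_dot"
printf 'a[bd]c  -> %s\n' "$(echo $class)"
printf 'a[^bd]c -> %s\n' "$negated"
```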

Character Class Shortcuts

Character Class   Description
[0-9]             Digits
[a-z]             Lowercase letters
[A-Z]             Uppercase letters
[a-zA-Z]          All letters
[a-zA-Z0-9]       Letters and numbers

POSIX Character Classes

Character Class   Description
[:alnum:]         Letters and numbers
[:alpha:]         Letters
[:digit:]         Numbers
[:lower:]         Lowercase letters
[:upper:]         Uppercase letters
[:space:]         Whitespace characters
[:punct:]         Punctuation marks
[:blank:]         Space and Tab
bash
# Using POSIX character classes
$ grep '[[:digit:]]' file.txt
$ grep '[[:alpha:]]' file.txt
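
A POSIX class name such as [:digit:] is only valid inside a bracket expression, which is why the examples use double brackets. The sketch below, assuming a POSIX grep and made-up input lines, also shows that several classes can share one bracket expression.

```shell
# Made-up input; the fourth line contains only two spaces.
lines=$(printf '%s\n' 'abc' 'ABC' 'a1' '  ' 'x_y')

# Lines containing at least one digit.
with_digit=$(printf '%s\n' "$lines" | grep '[[:digit:]]')

# Two classes combined in one bracket expression: an uppercase letter or a digit.
upper_or_digit=$(printf '%s\n' "$lines" | grep '[[:upper:][:digit:]]')

# Count lines that are empty or contain only whitespace.
blank_count=$(printf '%s\n' "$lines" | grep -c '^[[:space:]]*$')

printf 'digit:          %s\n' "$with_digit"
printf 'upper or digit: %s\n' "$(echo $upper_or_digit)"
printf 'blank lines:    %s\n' "$blank_count"
```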

Position Anchors

Meta Character   Description         Example
^                Line start          ^hello
$                Line end            world$
\b               Word boundary       \bword\b
\B               Non-word boundary   \Bword\B
bash
# Match lines starting with hello
$ grep '^hello' file.txt

# Match lines ending with world
$ grep 'world$' file.txt

# Match entire line
$ grep '^hello world$' file.txt

# Match empty line
$ grep '^$' file.txt

# Match complete word
$ grep '\bword\b' file.txt
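
The effect of each anchor is easiest to see by counting matching lines. This is a sketch under the assumption of GNU grep (`\b` is a GNU extension); the input lines are made up.

```shell
# Made-up input lines.
text=$(printf '%s\n' 'cat' 'concatenate' 'the cat sat' 'cats')

# No anchors: every line that contains the letters c-a-t counts.
substring=$(printf '%s\n' "$text" | grep -c 'cat')

# Word boundaries: cat must stand alone as a word.
whole_word=$(printf '%s\n' "$text" | grep -c '\bcat\b')

# -w asks grep for the same whole-word behaviour without writing \b.
with_w=$(printf '%s\n' "$text" | grep -cw 'cat')

# ^ anchors at the line start; ^...$ must cover the entire line.
line_start=$(printf '%s\n' "$text" | grep -c '^cat')
whole_line=$(printf '%s\n' "$text" | grep -c '^cat$')

echo "substring=$substring whole_word=$whole_word -w=$with_w start=$line_start line=$whole_line"
```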

Quantifiers

Meta Character   Description                     Example
*                Zero or more times              ab*c matches ac, abc, abbc, abbbc
+                One or more times               ab+c matches abc, abbc, abbbc
?                Zero or one time                ab?c matches ac, abc
{n}              Exactly n times                 a{3} matches aaa
{n,}             At least n times                a{2,} matches aa, aaa, aaaa
{n,m}            Between n and m times           a{2,4} matches aa, aaa, aaaa
{,m}             Up to m times (GNU extension)   a{,3} matches the empty string, a, aa, aaa
bash
# Basic usage
$ grep 'ab*c' file.txt       # BRE
$ grep -E 'ab+c' file.txt    # ERE

# Specific number
$ grep 'a\{3\}' file.txt     # BRE
$ grep -E 'a{3}' file.txt    # ERE

# Range
$ grep -E 'a{2,4}' file.txt
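
Interval quantifiers are easier to follow when each match is printed separately. The sketch below assumes GNU grep for the `-o` option and uses made-up input.

```shell
# -o prints every match on its own line.
# A single a is too short, and the run of five is cut off after four characters.
runs=$(echo 'a aa aaa aaaaa' | grep -oE 'a{2,4}')
echo "$runs"

# Without anchors, a{3} also matches inside a longer run of a's.
unanchored=$(printf '%s\n' 'aa' 'aaa' 'aaaa' | grep -cE 'a{3}')

# -x requires the whole line to match, so only the line with exactly three a's is selected.
exact=$(printf '%s\n' 'aa' 'aaa' 'aaaa' | grep -xE 'a{3}')

echo "unanchored lines: $unanchored, exact line: $exact"
```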

Groups and Capturing

bash
# Grouping
$ grep -E '(ab)+' file.txt    # Matches ab, abab, ababab

# Back reference
$ grep -E '(.).\1' file.txt   # Matches aba, aca

# sed usage
$ sed 's/\(.*\)/\1/' file.txt      # BRE
$ sed -E 's/(hello) (world)/\2 \1/' file.txt   # ERE
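
A common use of back references is finding a word that is accidentally repeated. This is a sketch that assumes GNU grep and GNU sed (back references in ERE and `\b` are extensions there); the sample sentences are invented.

```shell
# Made-up input lines.
text=$(printf '%s\n' 'the the cat' 'the then' 'a big big dog')

# \1 must repeat exactly what the first group captured.
doubled=$(printf '%s\n' "$text" | grep -E '\b([a-z]+) \1\b')

# The same pattern in sed collapses the repeated word into one copy.
fixed=$(printf '%s\n' "$text" | sed -E 's/\b([a-z]+) \1\b/\1/')

printf 'lines with a doubled word:\n%s\n' "$doubled"
printf 'after the fix:\n%s\n' "$fixed"
```

The line "the then" is left alone because the trailing `\b` rejects a match that ends in the middle of a word.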

Alternation (OR)

bash
# Use |
$ grep -E 'cat|dog' file.txt

# With grouping
$ grep -E '(red|blue) car' file.txt
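
Alternation has the lowest precedence, so anchors bind to only one branch unless the alternatives are grouped. A minimal sketch with made-up words, assuming a grep that supports `-E`:

```shell
# Made-up word list.
words=$(printf '%s\n' 'cat' 'dog' 'catalog' 'hotdog' 'bird')

# Read as (^cat) OR (dog$): starts with cat, or ends with dog.
loose=$(printf '%s\n' "$words" | grep -cE '^cat|dog$')

# Grouping applies both anchors to each alternative: the line is exactly cat or dog.
strict=$(printf '%s\n' "$words" | grep -E '^(cat|dog)$')

echo "ungrouped pattern matched $loose lines"
printf 'grouped pattern matched:\n%s\n' "$strict"
```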

Common Pattern Examples

Numbers

bash
# Integers
[0-9]+

# Floating-point
[0-9]+\.[0-9]+

# Signed numbers
-?[0-9]+

# Phone numbers (China mobile)
1[3-9][0-9]{9}

Strings

bash
# Quoted strings
"[^"]*"

# Arbitrary word
\b\w+\b

# Email addresses
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

# URL
https?://[^ ]+

# IP address (simplified)
[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}

# Date YYYY-MM-DD
[0-9]{4}-[0-9]{2}-[0-9]{2}

# Time HH:MM:SS
[0-9]{2}:[0-9]{2}:[0-9]{2}

Using grep with Regex

bash
# Basic regex
$ grep 'pattern' file.txt

# Extended regex
$ grep -E 'pattern' file.txt

# Perl regex
$ grep -P 'pattern' file.txt

# Only match complete word
$ grep -w 'word' file.txt

# Show line numbers
$ grep -n 'pattern' file.txt

# Case-insensitive
$ grep -i 'pattern' file.txt

# Reverse match (not containing)
$ grep -v 'pattern' file.txt

Examples

bash
# Find lines containing numbers
$ grep '[0-9]' file.txt

# Find lines starting with letters
$ grep '^[a-zA-Z]' file.txt

# Find email addresses
$ grep -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt

# Find IP addresses
$ grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file.txt

# Find empty lines or whitespace-only
$ grep -E '^[[:space:]]*$' file.txt
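
The IP pattern above is deliberately simplified: it checks the shape of the address, not the range of each number. The sketch below uses made-up candidates; the stricter `octet` pattern is one possible refinement written for this example, not a standard definition.

```shell
# Made-up candidate strings.
candidates=$(printf '%s\n' '192.168.1.10' '999.999.999.999' '1.2.3' 'ip=10.0.0.1')
ip_re='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'

# Unanchored: any line that contains something shaped like an address.
found=$(printf '%s\n' "$candidates" | grep -E "$ip_re")

# -x: the whole line must be the address, but 999 is still accepted.
whole=$(printf '%s\n' "$candidates" | grep -xE "$ip_re")

# Assumed refinement: restrict each octet to the range 0-255.
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'
strict=$(printf '%s\n' "$candidates" | grep -xE "$octet(\.$octet){3}")

printf 'found:  %s\n' "$(echo $found)"
printf 'whole:  %s\n' "$(echo $whole)"
printf 'strict: %s\n' "$strict"
```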

Using sed with Regex

bash
# Basic replacement
$ sed 's/old/new/' file.txt

# Use extended regex
$ sed -E 's/(hello) (world)/\2 \1/' file.txt

# Back reference
$ sed 's/\([a-z]\+\) \([a-z]\+\)/\2 \1/' file.txt   # BRE (\+ is a GNU extension)
$ sed -E 's/([a-z]+) ([a-z]+)/\2 \1/' file.txt   # ERE

# Use & to reference entire match
$ sed 's/[0-9][0-9]*/【&】/' file.txt   # Wrap the first run of digits in brackets
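
One detail matters when combining & with the * quantifier: a pattern that can match the empty string will match at the very start of the line. The following sketch uses a made-up input line and plain ASCII brackets to show the difference.

```shell
line='abc 123 def 456'

# [0-9]* matches zero digits at position 0, so empty brackets land at the line start.
empty_match=$(echo "$line" | sed 's/[0-9]*/[&]/')

# Requiring at least one digit makes sed skip ahead to the first real number.
first_run=$(echo "$line" | sed 's/[0-9][0-9]*/[&]/')

# The g flag wraps every run of digits, not just the first.
every_run=$(echo "$line" | sed 's/[0-9][0-9]*/[&]/g')

echo "$empty_match"
echo "$first_run"
echo "$every_run"
```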

Practical Examples

bash
# Delete HTML tags
$ sed 's/<[^>]*>//g' file.html

# Delete leading whitespace
$ sed 's/^[[:space:]]*//' file.txt

# Extract quoted content
$ sed -nE 's/^[^"]*"([^"]*)".*/\1/p' file.txt   # Print the first quoted string on each line

# Format phone number
$ sed -E 's/([0-9]{3})([0-9]{4})([0-9]{4})/\1-\2-\3/' file.txt
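
These substitutions can be tried on single strings before running them on a file. A short sketch, assuming a sed that supports `-E`; all input values are made up.

```shell
# Three capture groups, three back references: 3 + 4 + 4 digits.
phone=$(echo '13812345678' | sed -E 's/([0-9]{3})([0-9]{4})([0-9]{4})/\1-\2-\3/')

# Remove leading whitespace only.
trimmed=$(echo '   indented text' | sed 's/^[[:space:]]*//')

# -n with the p flag prints only lines where the substitution succeeded.
quoted=$(printf '%s\n' 'name = "Alice" # comment' 'no quotes here' \
    | sed -nE 's/^[^"]*"([^"]*)".*/\1/p')

echo "$phone"
echo "$trimmed"
echo "$quoted"
```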

Using awk with Regex

bash
# Pattern matching
$ awk '/pattern/' file.txt

# Field matching
$ awk '$1 ~ /pattern/' file.txt
$ awk '$1 !~ /pattern/' file.txt

# Regex delimiter
$ awk -F '[,:]' '{print $1}' file.csv

# gsub substitution
$ awk '{gsub(/old/, "new"); print}' file.txt

# Match extraction
$ awk 'match($0, /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/) {print substr($0, RSTART, RLENGTH)}' file.txt
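
The awk features above can be combined on small inline inputs. This sketch sticks to constructs that should work in any POSIX awk; the records are invented for the example.

```shell
# Made-up records using two different delimiters.
records=$(printf '%s\n' 'alice:30,admin' 'bob:25,user' 'carol:31,user')

# -F takes a regex: split on either , or : and test the second field.
thirties=$(printf '%s\n' "$records" | awk -F '[,:]' '$2 ~ /^3/ {print $1}')

# gsub returns the number of replacements it made.
replaced=$(echo 'a-b-c' | awk '{n = gsub(/-/, "+"); print $0, n}')

# match sets RSTART and RLENGTH, which substr uses to extract the matched text.
digits=$(echo 'order 4521 shipped' | awk 'match($0, /[0-9]+/) {print substr($0, RSTART, RLENGTH)}')

printf 'thirties: %s\n' "$(echo $thirties)"
printf 'replaced: %s\n' "$replaced"
printf 'digits:   %s\n' "$digits"
```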

Greedy vs Non-greedy

By default, regex quantifiers are greedy: they match as much text as possible.

bash
# Greedy match
$ echo "aaa bbb ccc ddd" | grep -oE 'a.*c'
aaa bbb ccc

# Non-greedy (Perl regex only)
$ echo "aaa bbb ccc ddd" | grep -oP 'a.*?c'
aaa bbb c
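
BRE and ERE have no non-greedy quantifiers, so when `grep -P` is not available a negated character class is a common substitute. A sketch assuming GNU grep for `-o`, with made-up input:

```shell
# [^c]* cannot run past the first c, so the match stops there.
lazy_like=$(echo 'aaa bbb ccc ddd' | grep -oE 'a[^c]*c')

# Greedy .* runs from the first quote to the last one on the line.
greedy_quoted=$(echo 'say "one" and "two"' | grep -oE '".*"')

# [^"]* stops at the next quote, so each quoted string is matched separately.
each_quoted=$(echo 'say "one" and "two"' | grep -oE '"[^"]*"')

echo "$lazy_like"
echo "$greedy_quoted"
echo "$each_quoted"
```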

Common Errors

1. Forgetting to Escape

bash
# Wrong: . matches any character
$ grep 'file.txt' file.txt

# Correct
$ grep 'file\.txt' file.txt
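
The unescaped dot does not fail; it silently matches more than intended. The sketch below counts matches on made-up file names and also shows `grep -F`, which treats the whole pattern as a fixed string.

```shell
# Made-up names; only the first is the literal text file.txt.
names=$(printf '%s\n' 'file.txt' 'file_txt' 'fileAtxt')

# Unescaped: . stands for any character, so all three lines match.
unescaped=$(printf '%s\n' "$names" | grep -c 'file.txt')

# Escaped: only a literal dot is accepted.
escaped=$(printf '%s\n' "$names" | grep -c 'file\.txt')

# -F disables regex interpretation entirely.
fixed=$(printf '%s\n' "$names" | grep -cF 'file.txt')

echo "unescaped=$unescaped escaped=$escaped fixed=$fixed"
```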

2. BRE and ERE Confusion

bash
# BRE where + needs escaping
$ grep 'a\+' file.txt

# ERE no escaping needed
$ grep -E 'a+' file.txt

3. Greedy Matching Issues

bash
# May match too much
$ sed 's/<.*>//' file.html

# Use a negated character class
$ sed 's/<[^>]*>//g' file.html
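
Running both commands on one line of markup shows how much the greedy version removes. A minimal sketch with a made-up HTML fragment:

```shell
html='<b>bold</b> text'

# <.*> stretches from the first < to the last >, taking the word "bold" with it.
greedy=$(echo "$html" | sed 's/<.*>//')

# <[^>]*> stops at the first >, so only the tags themselves are removed.
safe=$(echo "$html" | sed 's/<[^>]*>//g')

printf 'greedy: "%s"\n' "$greedy"
printf 'safe:   "%s"\n' "$safe"
```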

Testing Tools

Online Tools

  • regex101.com
  • regexr.com

Command Line Testing

bash
# Use grep or sed to test
$ echo "test string" | grep -E 'pattern'
$ echo "test string" | sed -E 's/pattern/replacement/'

# Use awk to test
$ echo "test string" | awk '/pattern/'
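
For repeated testing, the echo-and-grep idiom can be wrapped in a small function. `try_regex` below is a hypothetical helper written for this example, not a standard command; it assumes a grep that supports `-E` and `-q`.

```shell
# try_regex PATTERN STRING...
# Hypothetical helper: reports whether each string matches the ERE pattern.
try_regex() {
    pattern=$1
    shift
    for candidate in "$@"; do
        # -q suppresses output; only the exit status is used.
        if printf '%s\n' "$candidate" | grep -qE -- "$pattern"; then
            printf 'match     %s\n' "$candidate"
        else
            printf 'no match  %s\n' "$candidate"
        fi
    done
}

# Check the YYYY-MM-DD pattern, anchored to the whole line.
try_regex '^[0-9]{4}-[0-9]{2}-[0-9]{2}$' '2024-01-31' '24-1-31' 'on 2024-01-31'
```

Passing both strings that should match and strings that should not makes over-matching patterns easier to spot.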

Summary

This chapter introduced Linux regular expressions:

  • Meta characters: ., *, +, ?, ^, $
  • Character classes: [], [^], POSIX classes
  • Quantifiers: *, +, ?, {n}, {n,}, {n,m}
  • Groups and capturing: (), \1, \2
  • Alternation: |
  • In tools: grep, sed, awk

Regular expressions require practice to master. Start with simple patterns and gradually learn more complex usage.

