Python Regular Expressions

Regular expressions (regex) use a specialized syntax to search for, match, and manipulate patterns in strings. Python supports them through the `re` module in the standard library.

What is a Regular Expression?

Imagine you want to find all phone numbers or email addresses in a long text. These patterns are hard to describe using simple string methods (like find() or startswith()), which only look for fixed text. A regular expression lets you define a "pattern" that describes the shape of the text, and then find every string that fits that pattern.
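
As a rough illustration of that difference, the sketch below uses a made-up sample sentence (not from any real data). `str.find()` can locate one literal substring that is known in advance, while a single pattern describes every phone number of the same shape.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "Call 555-123-4567 or 555-987-6543 after 5 pm."

# str.find() locates one literal substring that must be known in advance.
literal_pos = text.find("555-123-4567")

# A pattern describes the shape instead: 3 digits, 3 digits, 4 digits.
phone_numbers = re.findall(r"\d{3}-\d{3}-\d{4}", text)

print(literal_pos)    # 5
print(phone_numbers)  # ['555-123-4567', '555-987-6543']
```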

Core Functions of the `re` Module

`re.search(pattern, string)`

Scans the string from left to right and returns a match object for the first occurrence of the pattern, or None if the pattern is not found.

import re

text = "The rain in Spain falls mainly in the plain."

# Find the 'ai' pattern
match = re.search(r"ai", text)

if match:
    print("Match found!")
    print(f"Span: {match.span()}")   # Start and end positions of match: (5, 7)
    print(f"String: {match.string}") # Original string
    print(f"Group: {match.group()}")   # Matched string: 'ai'
else:
    print("No match found.")

What is r"..."? The r prefix marks a "raw string". Regular expressions use the backslash \ for special sequences (such as \d for a digit), and ordinary Python string literals also use it for escapes (such as \n for a newline). In a raw string, Python leaves backslashes untouched, so the pattern reaches the re module exactly as written.
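
A small sketch of why this matters, using `\b` as the example: in an ordinary string literal Python turns `\b` into a backspace character, while in a raw string it stays as the two characters that `re` reads as a word boundary. The sample text is made up for illustration.

```python
import re

# In a normal string literal, "\b" is the backspace character (one character).
normal = "\bcat\b"
# In a raw string, r"\b" stays as backslash + 'b', which re reads as a word boundary.
raw = r"\bcat\b"

print(len(normal), len(raw))  # 5 7

text = "cat concat category"
print(re.findall(raw, text))     # ['cat']  (only the standalone word)
print(re.findall(normal, text))  # []       (looks for literal backspace characters)
```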

`re.findall(pattern, string)`

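Finds all non-overlapping matches of the pattern, scanning the string from left to right, and returns them as a list of strings. If the pattern contains capturing groups, the list holds the captured group contents instead of the whole matches.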

import re

text = "The rain in Spain falls mainly in the plain."

# Find all instances of 'ai'
all_matches = re.findall(r"ai", text)

print(all_matches) # Output: ['ai', 'ai', 'ai', 'ai']
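
The shape of the returned list depends on how many capturing groups the pattern has. The following sketch reuses the same sentence; the patterns are chosen only for illustration.

```python
import re

text = "The rain in Spain falls mainly in the plain."

# No groups: each list item is the whole match.
words = re.findall(r"\w*ai\w*", text)

# One group: each item is just that group's text (the letters before 'ai').
prefixes = re.findall(r"(\w*)ai\w*", text)

# Two groups: each item is a tuple with one entry per group.
pairs = re.findall(r"(\w*)ai(\w*)", text)

print(words)     # ['rain', 'Spain', 'mainly', 'plain']
print(prefixes)  # ['r', 'Sp', 'm', 'pl']
print(pairs)     # [('r', 'n'), ('Sp', 'n'), ('m', 'nly'), ('pl', 'n')]
```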

`re.sub(pattern, repl, string)`

Replaces every non-overlapping match of the pattern with repl and returns the new string. If nothing matches, the string is returned unchanged.

import re

text = "My phone number is 123-456-7890."

# Replace phone number with [REDACTED]
redacted_text = re.sub(r"\d{3}-\d{3}-\d{4}", "[REDACTED]", text)

print(redacted_text) # Output: My phone number is [REDACTED].
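
`re.sub` also accepts backreferences in repl, a `count` argument, and a function in place of a replacement string. The sketch below shows all three on a made-up sentence; the helper name `reformat` is hypothetical.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "Office: 123-456-7890, mobile: 987-654-3210."
pattern = r"(\d{3})-(\d{3})-(\d{4})"

# Backreference: \3 inserts the text captured by the third group.
masked = re.sub(pattern, r"***-***-\3", text)

# count=1 limits the replacement to the first match only.
first_only = re.sub(pattern, "[REDACTED]", text, count=1)

# A function as repl receives each match object and returns the replacement.
def reformat(m):
    return f"({m.group(1)}) {m.group(2)}-{m.group(3)}"

reformatted = re.sub(pattern, reformat, text)

print(masked)       # Office: ***-***-7890, mobile: ***-***-3210.
print(first_only)   # Office: [REDACTED], mobile: 987-654-3210.
print(reformatted)  # Office: (123) 456-7890, mobile: (987) 654-3210.
```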

Common Metacharacters

Metacharacters are characters with special meanings in regular expressions.

Metacharacter	Description	Example	Matches
`.`	Matches any single character except newline	`a.b`	`acb`, `a_b`
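`^`	Matches the beginning of the string (with `re.MULTILINE`, also the beginning of each line)	`^Hello`	`Hello World`
`$`	Matches the end of the string, or just before a trailing newline (with `re.MULTILINE`, also the end of each line)	`World$`	`Hello World`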
`*`	Matches the preceding element (character, set, or group) 0 or more times	`ab*c`	`ac`, `abc`, `abbbc`
`+`	Matches the preceding element 1 or more times	`ab+c`	`abc`, `abbbc` (doesn't match `ac`)
`?`	Matches the preceding element 0 or 1 time	`ab?c`	`ac`, `abc`
`{m,n}`	Matches the preceding element m to n times	`a{2,4}`	`aa`, `aaa`, `aaaa`
`[]`	Character set, matches any one character in the brackets	`[aeiou]`	`a`, `e`, `i`, `o`, `u`
`\`	Escapes special characters or introduces special sequences	`\.`	`.` (the character itself)
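`\d`	Matches any decimal digit (`[0-9]`; for `str` patterns also other Unicode digits unless `re.ASCII` is set)	`\d+`	`123`, `45`
`\D`	Matches any non-digit character	`\D+`	`abc`, `--`
`\s`	Matches any whitespace character (space, tab, newline, and similar)	`a\sb`	`a b`
`\S`	Matches any non-whitespace character	`\S+`	`abc`, `a1!`
`\w`	Matches any word character: letter, digit, or underscore (`[a-zA-Z0-9_]` under `re.ASCII`; for `str` patterns also Unicode letters and digits)	`\w+`	`user_1`
`\W`	Matches any character that is not a letter, digit, or underscore	`\W`	`@`, `-`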
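
The quantifier and escape rows of the table can be checked with a short script. This is only a sketch: the `cases` list and the extra non-matching strings are chosen for illustration, and each pattern is anchored with `^` and `$` so that the whole string has to fit.

```python
import re

# Each entry: (pattern, strings that should match, strings that should not).
cases = [
    (r"^ab*c$", ["ac", "abc", "abbbc"], ["abd"]),
    (r"^ab+c$", ["abc", "abbbc"], ["ac"]),
    (r"^ab?c$", ["ac", "abc"], ["abbc"]),
    (r"^a{2,4}$", ["aa", "aaaa"], ["a", "aaaaa"]),
    (r"^a.b$", ["acb", "a_b"], ["ab", "a\nb"]),  # '.' does not match a newline
    (r"^a\.b$", ["a.b"], ["acb"]),               # '\.' matches only a literal dot
]

results = {}
for pattern, should_match, should_not in cases:
    ok = (all(re.search(pattern, s) for s in should_match)
          and not any(re.search(pattern, s) for s in should_not))
    results[pattern] = ok
    print(f"{pattern:12} behaves as described: {ok}")
```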

Grouping

Parentheses () group part of a pattern. Grouping has two main purposes:

Apply quantifiers (such as *, +, ?) to multiple characters as a whole.
Capture the matched content for later reference.

import re

text = "Email: john.doe@example.com, User: jane_doe"

# Pattern matches an email address of the form name.name@domain.tld
# The first (\w+\.\w+) captures the username part
# The second (\w+\.\w+) captures the domain part
match = re.search(r"(\w+\.\w+)@(\w+\.\w+)", text)

if match:
    print(f"Full match: {match.group(0)}") # group(0) or group() is the entire match
    print(f"Username: {match.group(1)}")   # group(1) is the content captured by the first parentheses
    print(f"Domain: {match.group(2)}")     # group(2) is the content captured by the second parentheses
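
The example above shows capturing. The sketch below illustrates the first purpose, applying a quantifier to a whole group, and also shows named groups as one possible way to make captures easier to read. The `(?:...)` and `(?P<name>...)` forms are standard `re` syntax that the text above does not cover; the group names `user` and `domain` are chosen here for illustration.

```python
import re

# Purpose 1: a quantifier after a group repeats the whole group.
# (?:...) groups without capturing, so it creates no numbered group.
repeated = re.search(r"(?:ab)+", "xxabababyy")
print(repeated.group())  # 'ababab'

# Without parentheses, + applies only to the 'b'.
single = re.search(r"ab+", "xxabababyy")
print(single.group())  # 'ab'

# Purpose 2: capture parts of the match; (?P<name>...) gives a group a name.
text = "Email: john.doe@example.com, User: jane_doe"
match = re.search(r"(?P<user>[\w.]+)@(?P<domain>[\w.]+)", text)
if match:
    print(match.group("user"))    # john.doe
    print(match.group("domain"))  # example.com
    print(match.groupdict())      # {'user': 'john.doe', 'domain': 'example.com'}
```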

Regular expressions are a large topic, and fluency comes with practice. Online tools such as regex101.com are useful for testing patterns and learning how they behave.
