Python Regular Expressions
Regular expressions (regex) use a specialized syntax to search, match, and manipulate patterns in strings. Python supports them through the re module in its standard library.
What is a Regular Expression?
Imagine you want to find all phone numbers or email addresses in a long text. These patterns are hard to describe using simple string methods (like find() or startswith()). Regular expressions let you define a "pattern" once and then find every string that fits it, however complex.
Core Functions of the re Module
re.search(pattern, string)
Scans the string from left to right and returns a match object for the first location where the pattern matches, or None if there is no match.
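A minimal sketch of re.search in use. The sample text and the digit pattern are made up for illustration and are not from the original article:

```python
import re

text = "Order 66 shipped on day 12"  # hypothetical sample text

# Look for the first run of one or more digits anywhere in the string
match = re.search(r"\d+", text)

if match:
    print(match.group())  # the matched text: 66
    print(match.span())   # (start, end) indices of the match: (6, 8)
else:
    print("no match")

# When nothing matches, re.search returns None
missing = re.search(r"\d+", "no digits here")
print(missing)  # None
```

Because the result may be None, it is usually safest to test the match object before calling methods such as group() on it.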
What is r"..."? The r prefix indicates this is a "raw string". In regular expressions, the backslash \ has a special meaning (for example, \d matches a digit). Using a raw string stops the Python interpreter from treating backslashes as the start of its own escape sequences, which makes regular expressions simpler to write.
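The difference a raw string makes can be sketched as follows. The variable names and sample text are hypothetical; \b is used because it means "backspace" to Python but "word boundary" to the regex engine:

```python
import re

# Without the r prefix, each backslash must be doubled to survive Python's own escaping
normal = "\\d+"
raw = r"\d+"
print(normal == raw)  # True: both are the three characters \ d +

# "\b" is a single backspace character; r"\b" is a backslash followed by b
print(len("\b"), len(r"\b"))  # 1 2

sample = "cat concat cat"  # hypothetical sample text

# Raw string: the regex engine sees \b and treats it as a word boundary
with_raw = re.findall(r"\bcat\b", sample)
print(with_raw)  # ['cat', 'cat']

# Normal string: the pattern contains literal backspace characters, so nothing matches
without_raw = re.findall("\bcat\b", sample)
print(without_raw)  # []
```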
re.findall(pattern, string)
Finds all non-overlapping substrings in the string that match the pattern and returns them as a list.
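A short sketch of re.findall. The phone-number format below is a simplified assumption chosen for illustration, not a general phone-number pattern:

```python
import re

text = "Call 555-1234 or 555-9876 after 5pm"  # hypothetical sample text

# Three digits, a hyphen, then four digits
numbers = re.findall(r"\d{3}-\d{4}", text)
print(numbers)  # ['555-1234', '555-9876']

# "Non-overlapping" means each character is used by at most one match
pairs = re.findall(r"aa", "aaaaa")
print(pairs)  # ['aa', 'aa']

# No matches gives an empty list, not None
nothing = re.findall(r"\d{3}-\d{4}", "no numbers here")
print(nothing)  # []
```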
re.sub(pattern, repl, string)
Replaces every non-overlapping match of the pattern in string with repl and returns the resulting string. The original string is left unchanged.
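A minimal sketch of re.sub, using made-up sample strings. The optional count argument shown at the end is part of re.sub's signature but is not mentioned in the text above:

```python
import re

text = "2023-01-15 and 2024-12-31"  # hypothetical sample text

# Replace every digit with '#'
masked = re.sub(r"\d", "#", text)
print(masked)  # ####-##-## and ####-##-##

# Collapse each run of whitespace into a single space
clean = re.sub(r"\s+", " ", "too    many   spaces")
print(clean)  # too many spaces

# count limits how many replacements are made
first_only = re.sub(r"\d", "#", text, count=1)
print(first_only)  # #023-01-15 and 2024-12-31

# The input string itself is never modified
print(text)  # 2023-01-15 and 2024-12-31
```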
Common Metacharacters
Metacharacters are characters with special meanings in regular expressions.
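As a rough illustration, the sketch below exercises a handful of widely used metacharacters. The selection and the sample strings are assumptions made for this example and are not a complete list:

```python
import re

# .  matches any single character except a newline
dot = re.findall(r"c.t", "cat cot cut ct")
print(dot)  # ['cat', 'cot', 'cut']

# \d matches a digit; + means "one or more"
digits = re.findall(r"\d+", "a1 b22 c333")
print(digits)  # ['1', '22', '333']

# \w matches a letter, digit, or underscore
words = re.findall(r"\w+", "one, two")
print(words)  # ['one', 'two']

# [...] matches any one character from the set
vowels = re.findall(r"[aeiou]", "regex")
print(vowels)  # ['e', 'e']

# ?  makes the preceding item optional
optional = re.findall(r"colou?r", "color colour")
print(optional)  # ['color', 'colour']

# ^ anchors at the start of the string, $ at the end
starts = bool(re.search(r"^Hello", "Hello world"))
ends = bool(re.search(r"world$", "Hello world"))
print(starts, ends)  # True True
```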
Grouping
Using parentheses () can group patterns. This has two main purposes:
- Apply quantifiers (such as *, +, ?) to multiple characters as a whole.
- Capture the matched content for later reference.
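Both purposes of grouping can be sketched as follows. The sample strings and the date format are hypothetical, chosen only to keep the example short:

```python
import re

# Purpose 1: the quantifier applies to the whole group
grouped = re.search(r"(ab)+", "xxababab").group()
print(grouped)  # ababab

# Without parentheses, + applies only to the preceding character 'b'
ungrouped = re.search(r"ab+", "xxababab").group()
print(ungrouped)  # ab

# Purpose 2: each group captures part of the match
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2024-03-15")
print(m.group(0))  # whole match: 2024-03-15
print(m.group(1))  # first group: 2024
print(m.groups())  # all groups: ('2024', '03', '15')

# Captured groups can be referenced in a replacement as \1, \2, ...
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)  # world hello
```

Groups are numbered from 1 by the position of their opening parenthesis; group(0) always refers to the entire match.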
Regular expressions are a broad and powerful topic, and mastering them takes regular practice. Online tools such as regex101.com are useful for testing and learning patterns.