Java Regular Expressions

Regular expressions (regex) are character sequences that define search patterns, which makes them a powerful tool for processing strings. In Java, the core regex functionality is located in the java.util.regex package.

What are Regular Expressions?

Regular expressions can be used to:

Validate: Check if a string conforms to a certain format (such as email, phone number).
Search: Find all substrings in a text that match a specific pattern.
Replace: Find matching substrings and replace them with other content.
Split: Split a string based on a pattern.

Core Classes in the `java.util.regex` Package

Pattern Class: Represents a compiled regular expression. A Pattern object has no public constructor and needs to be created through its static method Pattern.compile().
Matcher Class: A regex matching engine. It performs matching operations on an input string by interpreting a Pattern. A Matcher object is obtained through the pattern.matcher(inputString) method.
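
As a minimal sketch of how the two classes divide the work, the example below compiles a Pattern once and creates a new Matcher for each input. The class name PatternReuseDemo and the helper containsDigits are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternReuseDemo {
    // Compiled once and shared: Pattern instances are immutable and thread-safe.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns true if the input contains at least one run of digits.
    static boolean containsDigits(String input) {
        // A Matcher is bound to one input, so a fresh one is created per call.
        Matcher matcher = DIGITS.matcher(input);
        return matcher.find();
    }

    public static void main(String[] args) {
        String[] inputs = {"order 42", "no numbers here", "7 days"};
        for (String input : inputs) {
            System.out.println(input + " -> " + containsDigits(input));
        }
    }
}
```

Keeping the Pattern in a static final field avoids recompiling the same expression on every call, which is usually worthwhile when a pattern is applied to many inputs.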

Basic Matching Process

Using regular expressions typically follows these three steps:

Create a Pattern object using Pattern.compile(regex).
Create a Matcher object using pattern.matcher(input).
Use methods of the Matcher object (such as find(), matches()) to perform matching.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample {
    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog.";
        String regex = "\\b[a-zA-Z]{3}\\b"; // Match all 3-letter words

        // 1. Compile the regular expression
        Pattern pattern = Pattern.compile(regex);

        // 2. Create a matcher
        Matcher matcher = pattern.matcher(text);

        // 3. Find matches
        System.out.println("Finding all 3-letter words in the text:");
        while (matcher.find()) {
            // find() tries to find the next match
            // group() returns the currently found matching substring
            System.out.println("Found: '" + matcher.group() + "' at index " + matcher.start());
        }
    }
}
// Output:
// Finding all 3-letter words in the text:
// Found: 'The' at index 0
// Found: 'fox' at index 16
// Found: 'the' at index 31
// Found: 'dog' at index 40

Note: In Java string literals, the backslash \ is itself an escape character, so every backslash in a regular expression must be written as \\ in source code. For example, the regex \d is written as the string literal "\\d".
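
The doubling rule can be checked with a short sketch. The class EscapeDemo and its members are hypothetical names used only for this illustration; it shows that the compiler removes one level of escaping before the regex engine sees the pattern.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Regex \d{3} (three digits): the single regex backslash is doubled in source.
    static final String THREE_DIGITS = "\\d{3}";

    // Regex \\ (one literal backslash): two regex backslashes become four in source.
    static final String LITERAL_BACKSLASH = "\\\\";

    static boolean isThreeDigits(String s) {
        return Pattern.matches(THREE_DIGITS, s);
    }

    static boolean containsBackslash(String s) {
        return Pattern.compile(LITERAL_BACKSLASH).matcher(s).find();
    }

    public static void main(String[] args) {
        // The string holds 5 characters at runtime: \ d { 3 }
        System.out.println(THREE_DIGITS.length());
        System.out.println(isThreeDigits("123"));
        // "C:\\temp" is the runtime text C:\temp
        System.out.println(containsBackslash("C:\\temp"));
    }
}
```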

Common `Matcher` Methods

matches(): Attempts to match the entire input string against the pattern. Returns true only if the whole string matches.
find(): Attempts to find the next subsequence of the input string that matches the pattern. Each call continues searching from where the last match ended.
lookingAt(): Attempts to match the pattern from the beginning of the input string. Returns true if the beginning matches, without requiring the entire string to match.
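group(): Returns the substring matched by the most recent successful match operation (such as find()). Throws IllegalStateException if no match has been attempted yet or the previous attempt failed.
start() / end(): Return the start index (inclusive) and end index (exclusive) of the most recently matched substring.
replaceAll(replacement): Returns a new string in which every matching substring is replaced by the replacement; the input string itself is not modified.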
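
The differences between these methods are easiest to see side by side. The following is a minimal sketch; MatcherMethodsDemo and its helper methods are hypothetical names, and the pattern \d+ was chosen only to keep the example small.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // matches(): the whole input must be a number.
    static boolean isNumber(String input) {
        return NUMBER.matcher(input).matches();
    }

    // lookingAt(): the input must start with a number.
    static boolean startsWithNumber(String input) {
        return NUMBER.matcher(input).lookingAt();
    }

    // find() with group(), start(), end(): describe the first number, or "none".
    static String firstNumber(String input) {
        Matcher m = NUMBER.matcher(input);
        if (!m.find()) {
            return "none"; // group() would throw here, so check find() first
        }
        return m.group() + " [" + m.start() + "," + m.end() + ")";
    }

    // replaceAll(): replace every number with '#'.
    static String maskNumbers(String input) {
        return NUMBER.matcher(input).replaceAll("#");
    }

    public static void main(String[] args) {
        String input = "42 apples and 7 pears";
        System.out.println(isNumber(input));         // false
        System.out.println(startsWithNumber(input)); // true
        System.out.println(firstNumber(input));      // 42 [0,2)
        System.out.println(maskNumbers(input));      // # apples and # pears
    }
}
```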

Regex Methods in the `String` Class

For convenience, the String class also has some built-in methods that directly support regular expressions.

boolean matches(String regex): Determines if the entire string matches the given regular expression. Equivalent to Pattern.matches(regex, this).

String email = "test@example.com";
// A simple email format validation
boolean isValid = email.matches("^[\\w.-]+@[\\w.-]+\\.[a-zA-Z]{2,}$");
System.out.println("Is email format valid: " + isValid); // true

String[] split(String regex): Splits the string based on a regular expression.

String text = "apple, banana; orange";
String[] fruits = text.split("[,;\\s]+"); // Split by comma, semicolon, or whitespace
// fruits -> ["apple", "banana", "orange"]

String replaceAll(String regex, String replacement): Replaces all substrings matching the regular expression with the specified string.

String text = "My phone number is 123-456-7890.";
// Replace all digits with 'X'
String censored = text.replaceAll("\\d", "X");
// censored -> "My phone number is XXX-XXX-XXXX."

Common Regex Metacharacters

Metacharacter	Description
`.`	Matches any single character except line terminators (unless the DOTALL flag is set)
`\d`	Matches a digit, equivalent to `[0-9]`
`\D`	Matches a non-digit character
`\s`	Matches any whitespace character (space, tab, newline, etc.)
`\S`	Matches any non-whitespace character
`\w`	Matches any word character (letter, digit, underscore), equivalent to `[a-zA-Z_0-9]`
`\W`	Matches any non-word character
`\b`	Matches a word boundary
`^`	Matches the beginning of input (or of each line in MULTILINE mode)
`$`	Matches the end of input (or of each line in MULTILINE mode)
`*`	Matches the preceding element zero or more times
`+`	Matches the preceding element one or more times
`?`	Matches the preceding element zero or one time
`{n}`	Matches the preceding element exactly n times
`{n,}`	Matches the preceding element at least n times
`{n,m}`	Matches the preceding element at least n times, but no more than m times
`[]`	Character set, matches any one character in the brackets. For example, `[abc]` matches 'a', 'b', or 'c'
`()`	Grouping: treats multiple characters as a single unit and captures the matched text
`|`	OR operator: matches the expression on either side of `|`. For example, `cat|dog` matches "cat" or "dog"
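
Several of these metacharacters are usually combined in one pattern. The sketch below uses quantifiers, capturing groups, alternation, and word boundaries together. The class MetacharacterDemo, its helpers, and the sample patterns are assumptions made for illustration; group(1), which returns the text captured by the first pair of parentheses, is a Matcher method not listed earlier in this article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetacharacterDemo {
    // (\d{4})-(\d{2})-(\d{2}): three capturing groups for year, month, and day.
    private static final Pattern DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // \b(cat|dog)s?\b: "cat" or "dog", optionally plural, as a whole word.
    private static final Pattern PET = Pattern.compile("\\b(cat|dog)s?\\b");

    // Returns the year of the first yyyy-mm-dd date in the text, or null if none.
    static String extractYear(String text) {
        Matcher m = DATE.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Counts whole-word occurrences of cat, cats, dog, or dogs.
    static int countPets(String text) {
        Matcher m = PET.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(extractYear("Released on 2024-03-15."));
        // "category" is not counted: no word boundary follows "cat" there.
        System.out.println(countPets("Two cats, one dog, and a category."));
    }
}
```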
