Java Regular Expressions

Regular expressions (regex) are character sequences that define search patterns, which makes them a powerful tool for processing strings. In Java, the core regex functionality lives in the java.util.regex package.

What are Regular Expressions?

Regular expressions can be used to:

  • Validate: Check if a string conforms to a certain format (such as email, phone number).
  • Search: Find all substrings in a text that match a specific pattern.
  • Replace: Find matching substrings and replace them with other content.
  • Split: Split a string based on a pattern.

Core Classes in the java.util.regex Package

  1. Pattern Class: Represents a compiled regular expression. A Pattern object has no public constructor and needs to be created through its static method Pattern.compile().
  2. Matcher Class: A regex matching engine. It performs matching operations on an input string by interpreting a Pattern. A Matcher object is obtained through the pattern.matcher(inputString) method.
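Because a Pattern is compiled once and can then produce any number of Matcher objects, a common arrangement is to keep the Pattern in a constant and create a fresh Matcher per input. The following is a minimal sketch of that idea; the class name PatternReuseDemo, the constant FIVE_DIGITS, and the helper isFiveDigits are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternReuseDemo {
    // Compiled once and reused. Pattern instances are immutable and
    // safe to share between threads.
    private static final Pattern FIVE_DIGITS = Pattern.compile("\\d{5}");

    // Hypothetical helper: true if the whole input is exactly five digits.
    static boolean isFiveDigits(String input) {
        // A Matcher holds match state and is not thread-safe,
        // so a new one is created for each input.
        Matcher matcher = FIVE_DIGITS.matcher(input);
        return matcher.matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"12345", "1234", "12345a"};
        for (String input : inputs) {
            System.out.println(input + " -> " + isFiveDigits(input));
        }
    }
}
```

Compiling inside the helper would also work, but it would repeat the compilation on every call.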

Basic Matching Process

Using regular expressions typically follows these three steps:

  1. Create a Pattern object using Pattern.compile(regex).
  2. Create a Matcher object using pattern.matcher(input).
  3. Use methods of the Matcher object (such as find(), matches()) to perform matching.
java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample {
    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog.";
        String regex = "\\b[a-zA-Z]{3}\\b"; // Match all 3-letter words

        // 1. Compile the regular expression
        Pattern pattern = Pattern.compile(regex);

        // 2. Create a matcher
        Matcher matcher = pattern.matcher(text);

        // 3. Find matches
        System.out.println("Finding all 3-letter words in the text:");
        while (matcher.find()) {
            // find() tries to find the next match
            // group() returns the currently found matching substring
            System.out.println("Found: '" + matcher.group() + "' at index " + matcher.start());
        }
    }
}
// Output:
// Finding all 3-letter words in the text:
// Found: 'The' at index 0
// Found: 'fox' at index 16
// Found: 'the' at index 31
// Found: 'dog' at index 40

Note: In Java string literals, the backslash \ is itself an escape character, so every \ in a regular expression must be written as \\ in the source code. For example, the regex \b is written as "\\b".
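The escaping rule can be illustrated with a short sketch. The class name EscapingDemo and its helper methods are hypothetical; Pattern.matches and Pattern.quote are standard java.util.regex methods. The last helper shows Pattern.quote as one way to match text literally without escaping each metacharacter by hand.

```java
import java.util.regex.Pattern;

class EscapingDemo {
    // The Java literal "\\d+" holds the three characters \d+,
    // which the regex engine reads as "one or more digits".
    static boolean isDigits(String s) {
        return Pattern.matches("\\d+", s);
    }

    // A literal backslash is \\ in regex syntax, so the Java
    // source needs four backslashes.
    static boolean containsBackslash(String s) {
        return Pattern.compile("\\\\").matcher(s).find();
    }

    // Pattern.quote turns arbitrary text into a pattern that
    // matches that text literally ('.' no longer means "any character").
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println("\\d+".length());                        // 3
        System.out.println(isDigits("2024"));                        // true
        System.out.println(containsBackslash("C:\\temp"));           // true
        System.out.println(containsLiteral("price: 1x50", "1.50"));  // false
    }
}
```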

Common Matcher Methods

  • matches(): Attempts to match the entire input string against the pattern. Returns true only if the entire string matches completely.
  • find(): Attempts to find the next subsequence of the input string that matches the pattern. Each call continues searching from where the last match ended.
  • lookingAt(): Attempts to match the pattern from the beginning of the input string. Returns true if the beginning matches, without requiring the entire string to match.
  • group(): Returns the substring matched by the most recent successful match operation (such as find()). group(n) returns the text captured by the n-th capturing group.
  • start() / end(): Returns the start index and end index (exclusive) of the last matched substring.
  • replaceAll(replacement): Returns a new string in which every matching substring is replaced with the replacement.
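The difference between matches(), lookingAt(), and find() is easiest to see on the same input. The sketch below is illustrative only; the class name MatcherMethodsDemo and its helper methods are hypothetical, and the pattern \d+ (one or more digits) was chosen just for the demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // matches(): the entire input must match the pattern.
    static boolean wholeMatch(String s) {
        return DIGITS.matcher(s).matches();
    }

    // lookingAt(): the input must start with a match.
    static boolean prefixMatch(String s) {
        return DIGITS.matcher(s).lookingAt();
    }

    // find(): a match anywhere in the input is enough.
    static boolean anyMatch(String s) {
        return DIGITS.matcher(s).find();
    }

    // Lists each match as text[start,end), where end is exclusive.
    static String describeMatches(String s) {
        StringBuilder sb = new StringBuilder();
        Matcher m = DIGITS.matcher(s);
        while (m.find()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(m.group())
              .append('[').append(m.start()).append(',').append(m.end()).append(')');
        }
        return sb.toString();
    }

    // replaceAll(): replaces every run of digits with '#'.
    static String mask(String s) {
        return DIGITS.matcher(s).replaceAll("#");
    }

    public static void main(String[] args) {
        String input = "42 apples and 7 pears";
        System.out.println(wholeMatch(input));      // false
        System.out.println(prefixMatch(input));     // true
        System.out.println(anyMatch(input));        // true
        System.out.println(describeMatches(input)); // 42[0,2), 7[14,15)
        System.out.println(mask(input));            // # apples and # pears
    }
}
```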

Regex Methods in the String Class

For convenience, the String class also has some built-in methods that directly support regular expressions.

  • boolean matches(String regex): Determines if the entire string matches the given regular expression. Equivalent to Pattern.matches(regex, this).

    java
    String email = "test@example.com";
    // A simple email format check (matches() tests the whole string, so ^ and $ are optional here)
    boolean isValid = email.matches("^[\\w.-]+@[\\w.-]+\\.[a-zA-Z]{2,}$");
    System.out.println("Is email format valid: " + isValid); // true
  • String[] split(String regex): Splits the string based on a regular expression.

    java
    String text = "apple, banana; orange";
    String[] fruits = text.split("[,;\\s]+"); // Split by comma, semicolon, or whitespace
    // fruits -> ["apple", "banana", "orange"]
  • String replaceAll(String regex, String replacement): Replaces all substrings matching the regular expression with the specified string.

    java
    String text = "My phone number is 123-456-7890.";
    // Replace all digits with 'X'
    String censored = text.replaceAll("\\d", "X");
    // censored -> "My phone number is XXX-XXX-XXXX."
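One detail of replaceAll worth illustrating: in the replacement string, $n refers to the text captured by group n, and a literal $ or \ has to be escaped. The sketch below shows both cases, using Matcher.quoteReplacement for the literal case. The class name StringRegexDemo and the helpers toDayFirst and replaceWithLiteral are hypothetical names for this example.

```java
import java.util.regex.Matcher;

class StringRegexDemo {
    // Reorders yyyy-MM-dd dates to dd/MM/yyyy using group references:
    // $1 is the year, $2 the month, $3 the day.
    static String toDayFirst(String text) {
        return text.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1");
    }

    // Inserts the replacement text literally. Without quoteReplacement,
    // a replacement such as "$5" would be read as a reference to group 5.
    static String replaceWithLiteral(String text, String regex, String literal) {
        return text.replaceAll(regex, Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(toDayFirst("Due 2024-01-15"));               // Due 15/01/2024
        System.out.println(replaceWithLiteral("cost: N", "N", "$5"));   // cost: $5
    }
}
```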

Common Regex Metacharacters

Metacharacter   Description
.               Matches any single character except newline
\d              Matches a digit, equivalent to [0-9]
\D              Matches a non-digit character
\s              Matches any whitespace character (space, tab, newline, etc.)
\S              Matches any non-whitespace character
\w              Matches any word character (letter, digit, underscore), equivalent to [a-zA-Z_0-9]
\W              Matches any non-word character
\b              Matches a word boundary
^               Matches the beginning of input
$               Matches the end of input
*               Matches the preceding element zero or more times
+               Matches the preceding element one or more times
?               Matches the preceding element zero or one time
{n}             Matches the preceding element exactly n times
{n,}            Matches the preceding element at least n times
{n,m}           Matches the preceding element at least n times, but no more than m times
[]              Character set, matches any one character in the brackets. For example, [abc] matches 'a', 'b', or 'c'
()              Grouping, treats multiple characters as a single unit and captures the matched text
|               OR operator, matches the expression on either side of |. For example, cat|dog matches "cat" or "dog"
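
Several of these metacharacters are typically combined in one pattern. The following sketch uses anchors, quantifiers, capturing groups, alternation, and word boundaries together; the class name MetacharDemo, the two patterns, and the helpers areaCode and countPets are hypothetical and serve only as an illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetacharDemo {
    // ^ and $ anchor the match, \d{3} and \d{4} are exact-count quantifiers,
    // and each pair of parentheses captures one part of the number.
    private static final Pattern PHONE =
            Pattern.compile("^(\\d{3})-(\\d{3})-(\\d{4})$");

    // \b marks word boundaries, (cat|dog) is an alternation inside a group,
    // and s? allows an optional plural 's'.
    private static final Pattern PET = Pattern.compile("\\b(cat|dog)s?\\b");

    // Returns the first capturing group, or null if the input has another shape.
    static String areaCode(String phone) {
        Matcher m = PHONE.matcher(phone);
        return m.matches() ? m.group(1) : null;
    }

    // Counts whole-word occurrences of cat, cats, dog, or dogs.
    static int countPets(String text) {
        Matcher m = PET.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(areaCode("123-456-7890"));                       // 123
        System.out.println(countPets("Two cats, one dog, and a catalog"));  // 2
    }
}
```

In the second call, "catalog" is not counted because no word boundary follows "cat" inside that word.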
