Regular Expressions
Overview
Regular expressions are a powerful tool for matching text patterns. PHP supports them through the PCRE (Perl Compatible Regular Expressions) library and its preg_* functions. This chapter shows how to use regular expressions for text matching, replacement, splitting, and validation.
Basic Syntax
PCRE Functions Introduction
php
<?php
// Main PCRE functions
// preg_match() - Execute a match
// preg_match_all() - Execute a global match
// preg_replace() - Perform search and replace
// preg_split() - Split string using regular expression

// Basic matching example
$text = "Hello World 2024";
$pattern = '/World/';
if (preg_match($pattern, $text)) {
    echo "Match found!\n";
}

// Get match results
if (preg_match('/(\d+)/', $text, $matches)) {
    echo "Found number: " . $matches[1] . "\n";
}
?>
Basic Metacharacters
php
<?php
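// Single quotes keep the "$" literal without relying on PHP's interpolation rules
$text = 'The price is $25.99 for item #123';
// . - Match any character (except newline)
preg_match('/p.ice/', $text, $matches);
echo "Match '.': " . ($matches[0] ?? 'None') . "\n";
// * - Match the preceding element 0 or more times
// Zero repetitions are allowed, so \d* succeeds at position 0 with an empty string
preg_match('/\d*/', $text, $matches);
echo "Match '*': " . ($matches[0] ?? 'None') . "\n";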
// + - Match preceding character 1 or more times
preg_match('/\d+/', $text, $matches);
echo "Match '+': " . ($matches[0] ?? 'None') . "\n";
// ? - Match preceding character 0 or 1 time
preg_match('/\$?\d+/', $text, $matches);
echo "Match '?': " . ($matches[0] ?? 'None') . "\n";
?>
Character Classes and Predefined Character Classes
php
<?php
$text = "User ID: A123, Age: 25, Email: user@example.com";
// [abc] - Match any character in the character set
preg_match('/[AEI]/', $text, $matches);
echo "Character class [AEI]: " . ($matches[0] ?? 'None') . "\n";
// [a-z] - Match characters in range
preg_match('/[a-z]+/', $text, $matches);
echo "Character class [a-z]: " . ($matches[0] ?? 'None') . "\n";
// Predefined character classes
// \d - Match digits [0-9]
preg_match_all('/\d/', $text, $matches);
echo "Digit characters: " . implode(', ', $matches[0]) . "\n";
// \w - Match word characters [a-zA-Z0-9_]
preg_match_all('/\w+/', $text, $matches);
echo "Word characters: " . implode(', ', $matches[0]) . "\n";
// \s - Match whitespace characters
preg_match_all('/\s/', $text, $matches);
echo "Whitespace character count: " . count($matches[0]) . "\n";
?>
Common Validation Patterns
Data Validation Class
php
<?php
class Validator {
    // Email validation
    public static function validateEmail($email) {
        $pattern = '/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/';
        // preg_match() returns 1, 0, or false; compare with 1 to get a strict boolean
        return preg_match($pattern, $email) === 1;
    }

    // Phone number validation (Mainland China)
    public static function validatePhone($phone) {
        $pattern = '/^1[3-9]\d{9}$/';
        return preg_match($pattern, $phone) === 1;
    }

    // ID card validation (simplified)
    public static function validateIdCard($idCard) {
        $pattern = '/^[1-9]\d{5}(19|20)\d{2}(0[1-9]|1[0-2])(0[1-9]|[12]\d|3[01])\d{3}[\dX]$/';
        return preg_match($pattern, $idCard) === 1;
    }

    // Password strength validation
    public static function validatePassword($password) {
        // At least 8 characters, containing uppercase, lowercase, digits, and special characters
        $pattern = '/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/';
        return preg_match($pattern, $password) === 1;
    }

    // URL validation
    public static function validateUrl($url) {
        $pattern = '/^https?:\/\/(?:[-\w.])+(?:\:[0-9]+)?(?:\/(?:[\w\/_.])*(?:\?(?:[\w&=%.])*)?(?:\#(?:[\w.])*)?)?$/';
        return preg_match($pattern, $url) === 1;
    }

    // IP address validation
    public static function validateIP($ip) {
        $pattern = '/^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/';
        return preg_match($pattern, $ip) === 1;
    }
}

// Test validator
$testData = [
    'email' => 'user@example.com',
    'phone' => '13800138000',
    'password' => 'StrongP@ss123',
    'url' => 'https://www.example.com',
    'ip' => '192.168.1.1'
];
foreach ($testData as $type => $value) {
    // PHP method names are case-insensitive, so "validateIp" resolves to validateIP()
    $method = 'validate' . ucfirst($type);
    if (method_exists('Validator', $method)) {
        $isValid = Validator::$method($value);
        echo "$type ($value): " . ($isValid ? "Valid" : "Invalid") . "\n";
    }
}
?>
Text Processing and Replacement
Search and Replace
php
<?php
$text = "Contact us: Phone 010-12345678, Mobile 138-0013-8000, Email contact@example.com";

// Basic replacement: mask the mobile number
// (the landline 010-12345678 does not fit this pattern and is left unchanged)
$result = preg_replace('/\d{3}-\d{4}-\d{4}/', '***-****-****', $text);
echo "Hide phone number: $result\n";

// Replace using callback function
$result = preg_replace_callback('/(\w+)@([\w.-]+)/', function($matches) {
    // $matches[1] is the local part, $matches[2] the domain
    return $matches[1] . '@***';
}, $text);
echo "Hide email domain: $result\n";

// Advanced replacement using groups: $1 and $2 refer to the captured src and alt values
$html = '<img src="image1.jpg" alt="Image 1"><img src="image2.png" alt="Image 2">';
$result = preg_replace('/<img src="([^"]+)" alt="([^"]+)">/', '<figure><img src="$1"><figcaption>$2</figcaption></figure>', $html);
echo "HTML conversion: $result\n";
?>
Text Splitting and Extraction
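When the input has leading, trailing, or doubled delimiters, preg_split() returns empty strings for the gaps. A minimal sketch of one way to drop them, using the PREG_SPLIT_NO_EMPTY flag; the helper name splitList() and the sample input are assumptions made for illustration:

```php
<?php
// Hypothetical helper (not part of the chapter's code): split a delimited
// list and discard the empty pieces that stray delimiters would produce.
function splitList(string $text): array
{
    // A limit of -1 means "no limit"; PREG_SPLIT_NO_EMPTY removes empty pieces.
    return preg_split('/[,;|\s]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Leading space, doubled comma, and trailing "| " would each yield "" without the flag.
print_r(splitList(' Apple,,Banana; Orange| '));
```

Without the flag, the same call would start and end with an empty string, which callers would then have to filter out themselves.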
php
<?php
// Split string
$text = "Apple,Banana;Orange|Grape Strawberry";
$fruits = preg_split('/[,;|\s]+/', $text);
print_r($fruits);

// Extract log information
$log = "2024-01-15 10:30:45 [ERROR] Database connection failed";
$pattern = '/(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) \[(\w+)\] (.+)/';
if (preg_match($pattern, $log, $matches)) {
    echo "Date: " . $matches[1] . "\n";
    echo "Time: " . $matches[2] . "\n";
    echo "Level: " . $matches[3] . "\n";
    echo "Message: " . $matches[4] . "\n";
}

// Extract all email addresses
$text = "Contact: admin@example.com, support@test.org, info@company.net";
// Note: inside a character class "|" is a literal pipe, so the TLD class is [A-Za-z], not [A-Z|a-z]
preg_match_all('/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/', $text, $matches);
echo "Found email addresses:\n";
foreach ($matches[0] as $email) {
    echo "- $email\n";
}
?>
Advanced Features
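Capture groups can be given names with the (?<name>...) syntax, so results are read by key instead of by position. The sketch below reuses the log line format from the extraction example earlier in this chapter; the function name parseLogLineNamed() and the group names are assumptions chosen for illustration:

```php
<?php
// Hypothetical helper for illustration: parse one log line with named groups.
function parseLogLineNamed(string $line): ?array
{
    $pattern = '/^(?<date>\d{4}-\d{2}-\d{2}) (?<time>\d{2}:\d{2}:\d{2}) \[(?<level>\w+)\] (?<message>.+)$/';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null; // no match (or a PCRE error)
    }
    // $m contains both numeric and named keys; return only the named ones.
    return [
        'date'    => $m['date'],
        'time'    => $m['time'],
        'level'   => $m['level'],
        'message' => $m['message'],
    ];
}

$entry = parseLogLineNamed('2024-01-15 10:30:45 [ERROR] Database connection failed');
if ($entry !== null) {
    echo $entry['level'] . ': ' . $entry['message'] . "\n";
}
```

Named keys keep the calling code stable if a group is later added to or removed from the pattern, which numeric indexes do not.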
Lookahead and Lookbehind Assertions
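Both assertions also have negative forms: (?!...) succeeds when the pattern does not follow, and (?<!...) succeeds when it does not precede. A minimal sketch, with hypothetical helper names and sample data that are not part of the chapter's code:

```php
<?php
// Hypothetical helper: file names whose extension is NOT "txt".
function nonTextFiles(string $list): array
{
    // (?!txt\b) rejects the match when the extension is exactly "txt".
    preg_match_all('/\b\w+\.(?!txt\b)\w+/', $list, $m);
    return $m[0];
}

// Hypothetical helper: whole numbers that are NOT preceded by a dollar sign.
function plainNumbers(string $text): array
{
    // (?<!\$) rejects a position right after "$"; \b stops a match from
    // starting in the middle of a number such as the "00" in "$100".
    preg_match_all('/(?<!\$)\b\d+/', $text, $m);
    return $m[0];
}

echo implode(', ', nonTextFiles('report.pdf, notes.txt, photo.png, todo.txt')) . "\n";
echo implode(', ', plainNumbers('Price: $100, Qty: 3, Fee: $50, Items: 12')) . "\n";
```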
php
<?php
$text = "password123, admin456, user789, guest000";
// Positive lookahead (?=...)
// Match letters that are followed by a digit.
// Note: \w also matches digits, so \w+(?=\d+) would return "password12" rather than "password".
preg_match_all('/[a-zA-Z]+(?=\d)/', $text, $matches);
echo "Words followed by digits: " . implode(', ', $matches[0]) . "\n";
// Positive lookbehind (?<=...)
// Match digits that are preceded by a dollar sign
$prices = 'Price: $100, Fee: $50, Tax: $10';
preg_match_all('/(?<=\$)\d+/', $prices, $matches);
echo "Price numbers: " . implode(', ', $matches[0]) . "\n";
?>
Practical Application: Log Parser
php
<?php
class LogParser {
    private $patterns = [
        // Groups: 1 client IP, 2 timestamp, 3 method, 4 URL, 5 protocol, 6 status, 7 size.
        // Apache logs "-" when no body was sent, so the size group accepts it as well.
        'apache' => '/^(\S+) \S+ \S+ \[([\w:\/]+\s[+\-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-)/',
        'custom' => '/^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (\w+): (.+)/'
    ];

    public function parseLogLine($line, $type = 'custom') {
        if (!isset($this->patterns[$type])) {
            throw new InvalidArgumentException("Unsupported log type: $type");
        }
        $pattern = $this->patterns[$type];
        if (preg_match($pattern, trim($line), $matches)) {
            return $this->formatLogEntry($matches, $type);
        }
        // The line does not match the expected format
        return null;
    }

    private function formatLogEntry($matches, $type) {
        switch ($type) {
            case 'apache':
                return [
                    'ip' => $matches[1],
                    'timestamp' => $matches[2],
                    'method' => $matches[3],
                    'url' => $matches[4],
                    'protocol' => $matches[5],
                    'status' => $matches[6],
                    'size' => $matches[7]
                ];
            case 'custom':
                return [
                    'timestamp' => $matches[1],
                    'level' => $matches[2],
                    'message' => $matches[3]
                ];
            default:
                return $matches;
        }
    }
}

// Usage example
$parser = new LogParser();
$logLine = "[2024-01-15 10:30:45] ERROR: Database connection failed";
$parsed = $parser->parseLogLine($logLine);
if ($parsed) {
    echo "Time: {$parsed['timestamp']}\n";
    echo "Level: {$parsed['level']}\n";
    echo "Message: {$parsed['message']}\n";
}
?>
Best Practices and Performance Optimization
Error Handling and Safe Usage
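On PHP 8.0 and later, preg_last_error_msg() returns a ready-made description of the last PCRE error, which can stand in for a hand-written message table. A minimal sketch under that version assumption; the wrapper name matchOrFail() is hypothetical:

```php
<?php
// Hypothetical wrapper; assumes PHP 8.0+, where preg_last_error_msg() is available.
function matchOrFail(string $pattern, string $subject): array
{
    $result = preg_match($pattern, $subject, $matches);
    if ($result === false) {
        // false signals a PCRE error, which is different from 0 (no match).
        throw new RuntimeException('Regular expression error: ' . preg_last_error_msg());
    }
    return $matches; // empty array when nothing matched
}

try {
    // "\xFF" is not valid UTF-8, so matching with the u modifier fails.
    matchOrFail('/./u', "\xFF");
} catch (RuntimeException $e) {
    echo $e->getMessage() . "\n";
}
```

Keeping the strict `=== false` check matters here, because a plain truthiness test would treat an error and a failed match the same way.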
php
<?php
// Safe regular expression usage
function safeRegexMatch($pattern, $subject) {
    $result = preg_match($pattern, $subject, $matches);
    // false means a PCRE error occurred; 0 only means "no match"
    if ($result === false) {
        $error = preg_last_error();
        $errorMessages = [
            PREG_NO_ERROR => 'No error',
            PREG_INTERNAL_ERROR => 'Internal error',
            PREG_BACKTRACK_LIMIT_ERROR => 'Backtrack limit error',
            PREG_RECURSION_LIMIT_ERROR => 'Recursion limit error',
            PREG_BAD_UTF8_ERROR => 'UTF-8 error',
            PREG_BAD_UTF8_OFFSET_ERROR => 'UTF-8 offset error',
            PREG_JIT_STACKLIMIT_ERROR => 'JIT stack limit error' // PHP 7.0+
        ];
        throw new RuntimeException('Regular expression error: ' . ($errorMessages[$error] ?? 'Unknown error'));
    }
    return [$result, $matches ?? []];
}

// Validate user input
function validateUserInput($input, $type) {
    $patterns = [
        'username' => '/^[a-zA-Z0-9_]{3,20}$/',
        'email' => '/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/',
        'phone' => '/^1[3-9]\d{9}$/'
    ];
    if (!isset($patterns[$type])) {
        throw new InvalidArgumentException("Unsupported validation type: $type");
    }
    return preg_match($patterns[$type], $input) === 1;
}

// Usage example
try {
    list($result, $matches) = safeRegexMatch('/\d+/', 'test123');
    if ($result) {
        echo "Found number: " . $matches[0] . "\n";
    }
} catch (RuntimeException $e) {
    echo "Error: " . $e->getMessage() . "\n";
}
?>
Performance Optimization Tips
php
<?php
// 1. Use character classes instead of alternation
// Slow: (a|b|c|d|e)
// Fast: [a-e]

// 2. Avoid unnecessary backtracking
function optimizedEmailValidation($email) {
    // Possessive quantifiers (++ and *+) never give back what they matched,
    // so the engine does not backtrack into them. An atomic group (?>...) has the same effect.
    $pattern = '/^[a-zA-Z0-9]++(?:\.[a-zA-Z0-9]++)*+@[a-zA-Z0-9]++(?:\.[a-zA-Z0-9]++)*+$/';
    return preg_match($pattern, $email) === 1;
}

// 3. Handle UTF-8 characters
$text = "中文测试123";
// Wrong: without the u modifier the subject is treated as bytes, so \w matches only the ASCII part
$wrong = '/\w+/';
// Correct: Use u modifier for Unicode support
$correct = '/\w+/u';
preg_match_all($wrong, $text, $matches1);
preg_match_all($correct, $text, $matches2);
echo "Without u modifier: " . implode(', ', $matches1[0]) . "\n";
echo "With u modifier: " . implode(', ', $matches2[0]) . "\n";
?>
Common Errors and Solutions
Escaping Character Issues
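The same problem appears when a pattern is built from a variable: any metacharacter in the value changes the pattern's meaning. preg_quote() escapes such characters. A minimal sketch, where the helper name containsLiteral() is an assumption for illustration:

```php
<?php
// Hypothetical helper: check whether $haystack contains $needle as literal text.
function containsLiteral(string $needle, string $haystack): bool
{
    // The second argument names the delimiter so that "/" is escaped as well.
    $pattern = '/' . preg_quote($needle, '/') . '/';
    return preg_match($pattern, $haystack) === 1;
}

var_dump(containsLiteral('3.14', 'pi is 3.14')); // bool(true)
var_dump(containsLiteral('3.14', 'code 3x14'));  // bool(false): the dot is literal
var_dump(containsLiteral('a/b', 'path a/b/c'));  // bool(true): the slash is escaped
```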
php
<?php
// Wrong: Not properly escaped
$wrong = '/\d+.\d+/'; // . in regex means any character
// Correct: Properly escaped
$correct = '/\d+\.\d+/'; // \. means literal dot
// "3.14" would match both patterns, so test with a string that is not a decimal number
$input = "3x14";
echo "Wrong pattern: " . (preg_match($wrong, $input) ? "Match" : "No match") . "\n"; // Match (false positive)
echo "Correct pattern: " . (preg_match($correct, $input) ? "Match" : "No match") . "\n"; // No match
?>
Greedy Matching Issues
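Besides a lazy quantifier, a negated character class is another common way to keep a match from running past the first closing tag. The sketch below assumes a hypothetical helper divContents() and flat, non-nested markup; for real HTML a parser such as DOMDocument is usually the safer choice.

```php
<?php
// Hypothetical helper: return the text inside each flat <div>...</div> pair.
function divContents(string $html): array
{
    // [^<]* cannot cross the next "<", so each match ends at its own closing tag.
    preg_match_all('/<div>([^<]*)<\/div>/', $html, $m);
    return $m[1]; // contents of the first capture group for every match
}

print_r(divContents('<div>content1</div><div>content2</div>'));
```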
php
<?php
// Problem: Greedy matching causes unexpected results
$html = '<div>content1</div><div>content2</div>';
$greedy = '/<div>.*<\/div>/';
$nonGreedy = '/<div>.*?<\/div>/';
preg_match($greedy, $html, $matches1);
preg_match($nonGreedy, $html, $matches2);
echo "Greedy match: " . $matches1[0] . "\n"; // the whole string, up to the last </div>
echo "Non-greedy match: " . $matches2[0] . "\n"; // <div>content1</div>
?>
Summary
This chapter introduced the use of regular expressions in PHP:
- Basic Syntax: Metacharacters, character classes, quantifiers
- Advanced Features: Capture groups, lookahead and lookbehind assertions
- Practical Applications: Data validation, text processing, log parsing
- Performance Optimization: Limit backtracking, prefer character classes to alternation, use the u modifier for UTF-8 text
- Error Handling: Safe usage, exception handling
Mastering regular expressions can greatly improve the efficiency and accuracy of text processing. In the next chapter, we will learn about PHP's standard library and built-in functions.