Skip to content

MySQL Regular Expressions

Overview

MySQL provides regular expression support through the REGEXP and RLIKE operators for pattern matching in string data. Regular expressions are powerful tools for complex pattern matching.

Regular Expression Basics

  • Pattern Matching: Match specific patterns in text
  • Wildcards: Special characters for flexible matching
  • Anchors: Position of match in string
  • Character Classes: Sets of characters to match

REGEXP Operator

Basic Syntax

sql
-- REGEXP (case-sensitive by default)
SELECT 'MySQL' REGEXP 'My';         -- TRUE
SELECT 'mysql' REGEXP 'My';         -- FALSE (case-sensitive)

-- RLIKE is synonym for REGEXP
SELECT 'Hello' RLIKE 'Hell';       -- TRUE

-- NOT REGEXP for negative match
SELECT 'Hello' NOT REGEXP 'World';  -- TRUE

REGEXP vs LIKE

sql
-- LIKE uses simple wildcards
SELECT name FROM products WHERE name LIKE '%phone%';
-- Matches: iPhone, smartphone, telephone

-- REGEXP uses powerful patterns
SELECT name FROM products WHERE name REGEXP 'phone|phone';
-- Matches: iPhone, smartphone, telephone, phone

-- Multiple patterns with REGEXP
SELECT name FROM products WHERE name REGEXP 'phone|laptop|tablet';
-- Matches: iPhone, laptop, tablet

-- Equivalent with LIKE would require OR
SELECT name FROM products 
WHERE name LIKE '%phone%' 
   OR name LIKE '%laptop%' 
   OR name LIKE '%tablet%';

Regular Expression Metacharacters

Common Metacharacters

| Pattern |-----------------|------------------|----------------| | ^ | Start of string | $ | End of string | . | Any single character | | | OR operator | * | Zero or more occurrences | + | One or more occurrences | ? | Zero or one occurrence | [ ] | Character class | [^ ] | Negated character class | \ | Escape character

Examples

sql
-- Start of string
SELECT 'Hello World' REGEXP '^Hello';      -- TRUE
SELECT 'Hello World' REGEXP '^World';      -- FALSE

-- End of string
SELECT 'Hello World' REGEXP 'World$';      -- TRUE
SELECT 'Hello World' REGEXP 'Hello$';      -- FALSE

-- Any character
SELECT 'cat' REGEXP 'c.t';                -- TRUE
SELECT 'cut' REGEXP 'c.t';                -- TRUE
SELECT 'ct' REGEXP 'c.t';                 -- FALSE (needs one character)

-- OR operator
SELECT 'cat' REGEXP 'cat|dog';            -- TRUE
SELECT 'dog' REGEXP 'cat|dog';            -- TRUE
SELECT 'bird' REGEXP 'cat|dog';           -- FALSE

-- Repetition
SELECT 'ca' REGEXP 'ca*';                -- TRUE (zero a's)
SELECT 'caa' REGEXP 'ca*';               -- TRUE (two a's)
SELECT 'caaa' REGEXP 'ca+';               -- TRUE (one or more a's)
SELECT 'cb' REGEXP 'ca+';                 -- FALSE (no a)

-- Optional
SELECT 'color' REGEXP 'colou?r';          -- TRUE
SELECT 'colour' REGEXP 'colou?r';         -- TRUE

Character Classes

Predefined Classes

| Class |------------|------------------| | [[:alnum:]] | Alphanumeric | [[:alpha:]] | Alphabetic | [[:digit:]] | Digits | [[:lower:]] | Lowercase | [::upper:]] | Uppercase | [[:space:]] | Whitespace | [::punct:]] | Punctuation

Examples

sql
-- Alphanumeric
SELECT 'user123' REGEXP '[[:alnum:]]+';     -- TRUE
SELECT 'user_123' REGEXP '[[:alnum:]]+';   -- TRUE (matches part)

-- Digits
SELECT '123' REGEXP '[[:digit:]]+';          -- TRUE
SELECT 'abc' REGEXP '[[:digit:]]+';          -- FALSE

-- Alphabetic
SELECT 'hello' REGEXP '[[:alpha:]]+';         -- TRUE
SELECT 'hello123' REGEXP '[[:alpha:]]+';     -- TRUE (matches hello)

-- Custom character class
SELECT 'cat' REGEXP '[abc]at';              -- TRUE
SELECT 'dog' REGEXP '[abc]at';              -- FALSE

-- Range in character class
SELECT 'cat' REGEXP '[a-z]at';              -- TRUE
SELECT 'dog' REGEXP '[a-z]og';              -- TRUE
SELECT '123' REGEXP '[0-9]+';               -- TRUE

-- Negated character class
SELECT 'cat' REGEXP '[^x]at';              -- TRUE
SELECT 'xat' REGEXP '[^x]at';              -- FALSE

Practical Examples

Email Validation

sql
-- Simple email pattern
SELECT * FROM users 
WHERE email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$';

-- More specific pattern
SELECT * FROM users 
WHERE email REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}$';

Phone Number Matching

sql
-- Match US phone numbers
SELECT * FROM contacts 
WHERE phone REGEXP '^\\([0-9]{3}\\) [0-9]{3}-[0-9]{4}$';

-- Match various phone formats
SELECT * FROM contacts 
WHERE phone REGEXP '^[0-9]{3}-[0-9]{3}-[0-9]{4}$|^[0-9]{10}$';

Product Codes

sql
-- Match product codes like PROD-123
SELECT * FROM products 
WHERE code REGEXP '^[A-Z]{4}-[0-9]{3}$';

-- Match SKU patterns
SELECT * FROM products 
WHERE sku REGEXP '[A-Z]{2}[0-9]{4}[A-Z]';
sql
-- Find words starting with specific letters
SELECT * FROM articles 
WHERE content REGEXP '\\b[A-M][a-z]+\\b';

-- Find URLs in text
SELECT * FROM messages 
WHERE content REGEXP 'https?://[a-zA-Z0-9./?=&_-]+';

-- Find mentions (@username)
SELECT * FROM tweets 
WHERE content REGEXP '@[a-zA-Z0-9_]+';

REGEXP_REPLACE Function

MySQL 8.0+ supports REGEXP_REPLACE for substitution.

sql
-- Replace digits with X
SELECT REGEXP_REPLACE('Product123', '[0-9]', 'X');
-- Returns: ProductXXX

-- Remove non-alphanumeric characters
SELECT REGEXP_REPLACE('Item #123!', '[^a-zA-Z0-9]', '');
-- Returns: Item123

-- Format phone numbers
SELECT REGEXP_REPLACE('5551234567', '([0-9]{3})([0-9]{3})([0-9]{4})', '(\\1) \\2-\\3');
-- Returns: (555) 123-4567

-- Replace multiple spaces
SELECT REGEXP_REPLACE('Too    many    spaces', '[[:space:]]+', ' ');
-- Returns: Too many spaces

REGEXP_LIKE / REGEXP_LIKE

Simple pattern matching without full regex support.

sql
-- REGEXP_LIKE for simple patterns
SELECT name FROM products WHERE name REGEXP_LIKE 'phone%';
-- Similar to LIKE but uses REGEXP engine

-- Case-insensitive matching
SELECT name FROM products WHERE name REGEXP_LIKE 'phone%';

Performance Considerations

REGEXP Performance

sql
-- REGEXP can be slow on large datasets
-- Consider using indexed columns first

-- Efficient pattern
SELECT * FROM users 
WHERE email REGEXP '^[a-z]+@example\\.com$'
  AND email LIKE '%@example.com%';
-- LIKE uses index if possible, REGEXP refines

-- Avoid REGEXP with leading wildcards
-- Poor: (can't use index)
SELECT * FROM products WHERE name REGEXP '.*phone';

-- Better: (can use index)
SELECT * FROM products WHERE name LIKE 'phone%' OR name LIKE '%phone';

Alternative Approaches

sql
-- Use SUBSTRING for fixed patterns
SELECT * FROM products 
WHERE code LIKE 'PROD-%';

-- Use LEFT for prefix matching
SELECT * FROM products 
WHERE LEFT(code, 4) = 'PROD';

-- Use FULLTEXT for text search
SELECT * FROM articles 
WHERE MATCH(title, content) AGAINST('database' IN NATURAL LANGUAGE MODE);

Advanced Patterns

Capturing Groups

sql
-- Extract parts using REGEXP_SUBSTR (MySQL 8.0+)
SELECT 
    code,
    REGEXP_SUBSTR(code, '[A-Z]+') AS letters,
    REGEXP_SUBSTR(code, '[0-9]+') AS numbers
FROM products
WHERE code = 'PROD123';
-- Returns: letters=PROD, numbers=123

Word Boundaries

sql
-- Match whole words
SELECT * FROM articles 
WHERE content REGEXP '\\bdatabase\\b';

-- Match word at start of string
SELECT * FROM articles 
WHERE title REGEXP '^\\bMySQL\\b';

Repetition Quantifiers

sql
-- Exact match {n}
SELECT 'aaa' REGEXP 'a{3}';     -- TRUE
SELECT 'aa' REGEXP 'a{3}';      -- FALSE

-- Range match {n,m}
SELECT 'aaa' REGEXP 'a{2,4}';   -- TRUE
SELECT 'aaaaa' REGEXP 'a{2,4}';  -- FALSE (too many)

-- Minimum match {n,}
SELECT 'aaa' REGEXP 'a{2,}';    -- TRUE
SELECT 'a' REGEXP 'a{2,}';     -- FALSE

REGEXP Case Sensitivity

sql
-- Case-sensitive (default)
SELECT 'MySQL' REGEXP 'mysql';     -- FALSE

-- Case-insensitive (using BINARY)
SELECT BINARY 'MySQL' REGEXP BINARY 'mysql';  -- FALSE

-- Case-insensitive pattern
SELECT 'MySQL' REGEXP '[Mm][Yy][Ss][Qq][Ll]';  -- TRUE

-- Convert to lowercase
SELECT LOWER('MySQL') REGEXP 'mysql';  -- TRUE

Common Use Cases

Data Validation

sql
-- Validate postal codes
SELECT * FROM addresses 
WHERE postal_code REGEXP '^[0-9]{5}(-[0-9]{4})?$';

-- Validate credit card format
SELECT * FROM payments 
WHERE card_number REGEXP '^[0-9]{13,19}$';

Data Cleaning

sql
-- Remove special characters
UPDATE products 
SET name_clean = REGEXP_REPLACE(name, '[^a-zA-Z0-9 ]', '');

-- Standardize phone formats
UPDATE contacts 
SET phone = REGEXP_REPLACE(phone, '[^0-9]', '');

Pattern Extraction

sql
-- Extract domain from email
SELECT 
    email,
    REGEXP_SUBSTR(email, '@([a-zA-Z0-9.-]+)\\.[a-z]{2,}$') AS domain
FROM users;

Summary

Regular expressions in MySQL provide:

  • Powerful Pattern Matching: Complex patterns beyond LIKE
  • Flexible Matching: Wildcards, character classes, anchors
  • String Functions: REGEXP_REPLACE, REGEXP_SUBSTR (MySQL 8.0+)
  • Search Capabilities: Email, phone, URL validation

Consider performance and use appropriate indexing when working with large datasets.


Previous: NULL Values

Next: Transactions

Content is for learning and research only.