GROUP BY Grouping

The GROUP BY clause is used to group query results by one or more columns, typically used with aggregate functions. This chapter introduces how to use GROUP BY.

Basic Syntax

SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;

Simple Grouping

-- Count users per city
SELECT city, COUNT(*) AS user_count
FROM users
GROUP BY city;

Result:

cityuser_count
Beijing150
Shanghai120
Guangzhou80

With Aggregate Functions

COUNT

-- Users per city
SELECT city, COUNT(*) AS user_count
FROM users
GROUP BY city;

SUM

-- Total order amount per user
SELECT 
    user_id,
    SUM(total_amount) AS total_spent
FROM orders
GROUP BY user_id;

AVG

-- Average age per city
SELECT 
    city,
    AVG(age) AS avg_age
FROM users
GROUP BY city;

MAX/MIN

-- Most and least expensive products per category
SELECT 
    category,
    MAX(price) AS max_price,
    MIN(price) AS min_price
FROM products
GROUP BY category;

Multi-Column Grouping

-- Group by city and gender
SELECT 
    city,
    gender,
    COUNT(*) AS user_count
FROM users
GROUP BY city, gender
ORDER BY city, gender;

HAVING Clause

HAVING filters grouped results (WHERE filters rows before grouping).

Basic Usage

-- Find cities with more than 100 users
SELECT 
    city,
    COUNT(*) AS user_count
FROM users
GROUP BY city
HAVING COUNT(*) > 100;

WHERE vs HAVING

-- WHERE: Filter before grouping
SELECT city, COUNT(*) AS user_count
FROM users
WHERE age >= 18  -- Filter age first
GROUP BY city
HAVING COUNT(*) > 50;  -- Then filter groups

Execution Order:

  1. FROM - Select table
  2. WHERE - Filter rows
  3. GROUP BY - Group
  4. HAVING - Filter groups
  5. SELECT - Select columns
  6. ORDER BY - Sort

Multiple Aggregate Functions

-- Order statistics per user
SELECT 
    user_id,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_spent,
    AVG(total_amount) AS avg_order_value,
    MAX(total_amount) AS max_order,
    MIN(total_amount) AS min_order
FROM orders
GROUP BY user_id;

Group by Expression

-- Group orders by year
SELECT 
    YEAR(order_date) AS year,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_sales
FROM orders
GROUP BY YEAR(order_date)
ORDER BY year;

Practical Examples

Sales Statistics

-- Monthly sales statistics
SELECT 
    DATE_FORMAT(order_date, '%Y-%m') AS month,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_sales,
    AVG(total_amount) AS avg_order_value
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
ORDER BY month;

User Behavior Analysis

-- Analyze user purchase behavior
SELECT 
    u.id,
    u.username,
    COUNT(o.id) AS order_count,
    SUM(o.total_amount) AS total_spent,
    AVG(o.total_amount) AS avg_order_value,
    MAX(o.order_date) AS last_order_date
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.username
HAVING order_count > 0
ORDER BY total_spent DESC
LIMIT 20;

Common Errors

Error 1: Non-grouped Column in SELECT

-- Error: name not in GROUP BY
SELECT city, name, COUNT(*)
FROM users
GROUP BY city;

-- Correct: Only select grouped columns or aggregates
SELECT city, COUNT(*)
FROM users
GROUP BY city;

Error 2: Aggregate Function in WHERE

-- Error: WHERE cannot use aggregate functions
SELECT city, COUNT(*) AS user_count
FROM users
WHERE COUNT(*) > 100
GROUP BY city;

-- Correct: Use HAVING
SELECT city, COUNT(*) AS user_count
FROM users
GROUP BY city
HAVING COUNT(*) > 100;

Performance Optimization

1. Create Indexes for Group Columns

-- Create index for commonly grouped columns
CREATE INDEX idx_city ON users(city);
CREATE INDEX idx_category ON products(category);

2. Filter Before Grouping

-- Good: Filter with WHERE first
SELECT city, COUNT(*)
FROM users
WHERE status = 'active'
GROUP BY city;

Best Practices

  1. Only use grouped columns or aggregates in SELECT
  2. Use WHERE to filter rows, HAVING to filter groups
  3. Create indexes for group columns
  4. Filter before grouping for better performance
  5. Use meaningful aliases

Summary

  • GROUP BY: Group data by columns
  • Aggregate Functions: COUNT, SUM, AVG, MAX, MIN
  • HAVING: Filter grouped results
  • Multi-column Grouping: Group by multiple columns
  • Performance: Create indexes, filter before grouping

Next Step: Learn ALTER Tables