Skip to content

GROUP BY Grouping

The GROUP BY clause is used to group query results by one or more columns, typically used with aggregate functions. This chapter introduces how to use GROUP BY.

Basic Syntax

sql
SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;

Simple Grouping

sql
-- Count users per city
SELECT city, COUNT(*) AS user_count
FROM users
GROUP BY city;

Result:

cityuser_count
Beijing150
Shanghai120
Guangzhou80

With Aggregate Functions

COUNT

sql
-- Users per city
SELECT city, COUNT(*) AS user_count
FROM users
GROUP BY city;

SUM

sql
-- Total order amount per user
SELECT 
    user_id,
    SUM(total_amount) AS total_spent
FROM orders
GROUP BY user_id;

AVG

sql
-- Average age per city
SELECT 
    city,
    AVG(age) AS avg_age
FROM users
GROUP BY city;

MAX/MIN

sql
-- Most and least expensive products per category
SELECT 
    category,
    MAX(price) AS max_price,
    MIN(price) AS min_price
FROM products
GROUP BY category;

Multi-Column Grouping

sql
-- Group by city and gender
SELECT 
    city,
    gender,
    COUNT(*) AS user_count
FROM users
GROUP BY city, gender
ORDER BY city, gender;

HAVING Clause

HAVING filters grouped results (WHERE filters rows before grouping).

Basic Usage

sql
-- Find cities with more than 100 users
SELECT 
    city,
    COUNT(*) AS user_count
FROM users
GROUP BY city
HAVING COUNT(*) > 100;

WHERE vs HAVING

sql
-- WHERE: Filter before grouping
SELECT city, COUNT(*) AS user_count
FROM users
WHERE age >= 18  -- Filter age first
GROUP BY city
HAVING COUNT(*) > 50;  -- Then filter groups

Execution Order:

  1. FROM - Select table
  2. WHERE - Filter rows
  3. GROUP BY - Group
  4. HAVING - Filter groups
  5. SELECT - Select columns
  6. ORDER BY - Sort

Multiple Aggregate Functions

sql
-- Order statistics per user
SELECT 
    user_id,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_spent,
    AVG(total_amount) AS avg_order_value,
    MAX(total_amount) AS max_order,
    MIN(total_amount) AS min_order
FROM orders
GROUP BY user_id;

Group by Expression

sql
-- Group orders by year
SELECT 
    YEAR(order_date) AS year,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_sales
FROM orders
GROUP BY YEAR(order_date)
ORDER BY year;

Practical Examples

Sales Statistics

sql
-- Monthly sales statistics
SELECT 
    DATE_FORMAT(order_date, '%Y-%m') AS month,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_sales,
    AVG(total_amount) AS avg_order_value
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
ORDER BY month;

User Behavior Analysis

sql
-- Analyze user purchase behavior
SELECT 
    u.id,
    u.username,
    COUNT(o.id) AS order_count,
    SUM(o.total_amount) AS total_spent,
    AVG(o.total_amount) AS avg_order_value,
    MAX(o.order_date) AS last_order_date
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.username
HAVING order_count > 0
ORDER BY total_spent DESC
LIMIT 20;

Common Errors

Error 1: Non-grouped Column in SELECT

sql
-- Error: name not in GROUP BY
SELECT city, name, COUNT(*)
FROM users
GROUP BY city;

-- Correct: Only select grouped columns or aggregates
SELECT city, COUNT(*)
FROM users
GROUP BY city;

Error 2: Aggregate Function in WHERE

sql
-- Error: WHERE cannot use aggregate functions
SELECT city, COUNT(*) AS user_count
FROM users
WHERE COUNT(*) > 100
GROUP BY city;

-- Correct: Use HAVING
SELECT city, COUNT(*) AS user_count
FROM users
GROUP BY city
HAVING COUNT(*) > 100;

Performance Optimization

1. Create Indexes for Group Columns

sql
-- Create index for commonly grouped columns
CREATE INDEX idx_city ON users(city);
CREATE INDEX idx_category ON products(category);

2. Filter Before Grouping

sql
-- Good: Filter with WHERE first
SELECT city, COUNT(*)
FROM users
WHERE status = 'active'
GROUP BY city;

Best Practices

  1. Only use grouped columns or aggregates in SELECT
  2. Use WHERE to filter rows, HAVING to filter groups
  3. Create indexes for group columns
  4. Filter before grouping for better performance
  5. Use meaningful aliases

Summary

  • GROUP BY: Group data by columns
  • Aggregate Functions: COUNT, SUM, AVG, MAX, MIN
  • HAVING: Filter grouped results
  • Multi-column Grouping: Group by multiple columns
  • Performance: Create indexes, filter before grouping

Next Step: Learn ALTER Tables

Content is for learning and research only.