Skip to content

Aggregate Functions

Aggregate functions are used to calculate a set of values and return a single result value. They are very important in data analysis and statistics.

Common Aggregate Functions

COUNT

Count rows:

sql
-- Count all rows
SELECT COUNT(*) FROM users;

-- Count non-NULL values
SELECT COUNT(email) FROM users;

-- Count different values
SELECT COUNT(DISTINCT city) FROM users;

SUM

Calculate sum of numeric columns:

sql
-- Calculate total amount of all orders
SELECT SUM(total_amount) FROM orders;

-- Calculate total amount for specific user
SELECT SUM(total_amount) FROM orders WHERE user_id = 1;

AVG

Calculate average of numeric columns:

sql
-- Calculate average age of users
SELECT AVG(age) FROM users;

-- Calculate average amount of orders
SELECT AVG(total_amount) FROM orders;

MIN and MAX

Find minimum and maximum values:

sql
-- Find minimum age
SELECT MIN(age) FROM users;

-- Find maximum age
SELECT MAX(age) FROM users;

-- Find earliest order
SELECT MIN(order_date) FROM orders;

Grouping Aggregation

GROUP BY Basics

Group by columns for aggregation:

sql
-- Group by city, count users per city
SELECT
    city,
    COUNT(*) AS user_count
FROM users
GROUP BY city;

Result:

cityuser_count
Beijing150
Shanghai120
Guangzhou80

Multi-column Grouping

sql
-- Group by city and gender
SELECT
    city,
    gender,
    COUNT(*) AS user_count
FROM users
GROUP BY city, gender;

Using HAVING to Filter Groups

HAVING is used to filter aggregated results:

sql
-- Find cities with more than 100 users
SELECT
    city,
    COUNT(*) AS user_count
FROM users
GROUP BY city
HAVING COUNT(*) > 100;

Practical Examples

Example 1: Sales Statistics

sql
SELECT
    DATE_FORMAT(order_date, '%Y-%m') AS month,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_sales,
    AVG(total_amount) AS avg_order_value
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
ORDER BY month;

Example 2: User Behavior Analysis

sql
SELECT
    u.id AS user_id,
    u.username,
    COUNT(o.id) AS order_count,
    SUM(o.total_amount) AS total_spent,
    AVG(o.total_amount) AS avg_order_value,
    MAX(o.order_date) AS last_order_date
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at >= '2024-01-01'
GROUP BY u.id
ORDER BY total_spent DESC;

Performance Optimization

1. Use WHERE Instead of HAVING

sql
-- Not recommended
SELECT city, COUNT(*)
FROM users
GROUP BY city
HAVING city = 'Beijing';

-- Recommended
SELECT city, COUNT(*)
FROM users
WHERE city = 'Beijing'
GROUP BY city;

2. Create Indexes for Grouping Columns

sql
CREATE INDEX idx_user_city ON users(city);

Summary

Aggregate functions are foundational tools for data analysis. Mastering them is crucial for data statistics and analysis.

Next Step: Learn GROUP BY

Content is for learning and research only.