Aggregate Functions
Aggregate functions are used to calculate a set of values and return a single result value. They are very important in data analysis and statistics.
Common Aggregate Functions
COUNT
Count rows:
sql
-- Count all rows
SELECT COUNT(*) FROM users;
-- Count non-NULL values
SELECT COUNT(email) FROM users;
-- Count different values
SELECT COUNT(DISTINCT city) FROM users;SUM
Calculate sum of numeric columns:
sql
-- Calculate total amount of all orders
SELECT SUM(total_amount) FROM orders;
-- Calculate total amount for specific user
SELECT SUM(total_amount) FROM orders WHERE user_id = 1;AVG
Calculate average of numeric columns:
sql
-- Calculate average age of users
SELECT AVG(age) FROM users;
-- Calculate average amount of orders
SELECT AVG(total_amount) FROM orders;MIN and MAX
Find minimum and maximum values:
sql
-- Find minimum age
SELECT MIN(age) FROM users;
-- Find maximum age
SELECT MAX(age) FROM users;
-- Find earliest order
SELECT MIN(order_date) FROM orders;Grouping Aggregation
GROUP BY Basics
Group by columns for aggregation:
sql
-- Group by city, count users per city
SELECT
city,
COUNT(*) AS user_count
FROM users
GROUP BY city;Result:
| city | user_count |
|---|---|
| Beijing | 150 |
| Shanghai | 120 |
| Guangzhou | 80 |
Multi-column Grouping
sql
-- Group by city and gender
SELECT
city,
gender,
COUNT(*) AS user_count
FROM users
GROUP BY city, gender;Using HAVING to Filter Groups
HAVING is used to filter aggregated results:
sql
-- Find cities with more than 100 users
SELECT
city,
COUNT(*) AS user_count
FROM users
GROUP BY city
HAVING COUNT(*) > 100;Practical Examples
Example 1: Sales Statistics
sql
SELECT
DATE_FORMAT(order_date, '%Y-%m') AS month,
COUNT(*) AS order_count,
SUM(total_amount) AS total_sales,
AVG(total_amount) AS avg_order_value
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
ORDER BY month;Example 2: User Behavior Analysis
sql
SELECT
u.id AS user_id,
u.username,
COUNT(o.id) AS order_count,
SUM(o.total_amount) AS total_spent,
AVG(o.total_amount) AS avg_order_value,
MAX(o.order_date) AS last_order_date
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at >= '2024-01-01'
GROUP BY u.id
ORDER BY total_spent DESC;Performance Optimization
1. Use WHERE Instead of HAVING
sql
-- Not recommended
SELECT city, COUNT(*)
FROM users
GROUP BY city
HAVING city = 'Beijing';
-- Recommended
SELECT city, COUNT(*)
FROM users
WHERE city = 'Beijing'
GROUP BY city;2. Create Indexes for Grouping Columns
sql
CREATE INDEX idx_user_city ON users(city);Summary
Aggregate functions are foundational tools for data analysis. Mastering them is crucial for data statistics and analysis.
Next Step: Learn GROUP BY