Skip to content

聚合函数

聚合函数用于对一组值进行计算,返回单个结果值。它们在数据分析和统计中非常重要。

常用聚合函数

COUNT

计算行数:

sql
-- 计算所有行数
SELECT COUNT(*) FROM users;

-- 计算非 NULL 值的行数
SELECT COUNT(email) FROM users;

-- 计算不同的值
SELECT COUNT(DISTINCT city) FROM users;

SUM

计算数值列的总和:

sql
-- 计算所有订单的总金额
SELECT SUM(total_amount) FROM orders;

-- 计算特定用户的订单总额
SELECT SUM(total_amount) FROM orders WHERE user_id = 1;

AVG

计算数值列的平均值:

sql
-- 计算用户的平均年龄
SELECT AVG(age) FROM users;

-- 计算订单的平均金额
SELECT AVG(total_amount) FROM orders;

MIN 和 MAX

查找最小和最大值:

sql
-- 查找最小年龄
SELECT MIN(age) FROM users;

-- 查找最大年龄
SELECT MAX(age) FROM users;

-- 查找最早的订单
SELECT MIN(order_date) FROM orders;

分组聚合

GROUP BY 基础

按列分组进行聚合:

sql
-- 按城市分组,统计每个城市的用户数
SELECT 
    city,
    COUNT(*) AS user_count
FROM users
GROUP BY city;

结果:

cityuser_count
北京150
上海120
广州80

多列分组

sql
-- 按城市和性别分组
SELECT 
    city,
    gender,
    COUNT(*) AS user_count
FROM users
GROUP BY city, gender;

使用 HAVING 过滤分组

HAVING 用于过滤聚合结果:

sql
-- 查找用户数大于 100 的城市
SELECT 
    city,
    COUNT(*) AS user_count
FROM users
GROUP BY city
HAVING COUNT(*) > 100;

实战示例

示例 1:销售统计

sql
SELECT 
    DATE_FORMAT(order_date, '%Y-%m') AS month,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_sales,
    AVG(total_amount) AS avg_order_value
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
ORDER BY month;

示例 2:用户行为分析

sql
SELECT 
    u.id AS user_id,
    u.username,
    COUNT(o.id) AS order_count,
    SUM(o.total_amount) AS total_spent,
    AVG(o.total_amount) AS avg_order_value,
    MAX(o.order_date) AS last_order_date
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at >= '2024-01-01'
GROUP BY u.id
ORDER BY total_spent DESC;

性能优化

1. 使用 WHERE 而不是 HAVING

sql
-- 不推荐
SELECT city, COUNT(*) 
FROM users 
GROUP BY city 
HAVING city = '北京';

-- 推荐
SELECT city, COUNT(*) 
FROM users 
WHERE city = '北京' 
GROUP BY city;

2. 为分组列创建索引

sql
CREATE INDEX idx_user_city ON users(city);

小结

聚合函数是数据分析的基础工具,掌握它们对于数据统计和分析至关重要。

下一步: 学习 GROUP BY 分组