Skip to content

PostgreSQL HAVING 子句

什么是 HAVING 子句?

HAVING 子句用于过滤 GROUP BY 子句产生的分组结果。它类似于 WHERE 子句,但 WHERE 在分组前过滤行,而 HAVING 在分组后过滤组。

基本语法

sql
SELECT column1, aggregate_function(column2)
FROM table_name
WHERE condition
GROUP BY column1
HAVING aggregate_condition;

HAVING 与 WHERE 的区别

特性WHEREHAVING
过滤时机分组前分组后
可用函数不能使用聚合函数可以使用聚合函数
执行顺序先执行后执行
作用对象单行数据分组数据

基础示例

示例 1:过滤分组计数

sql
-- 查找订单数量超过 5 的客户
SELECT customer_id, COUNT(*) as order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5;

示例 2:过滤平均值

sql
-- 查找平均工资超过 50000 的部门
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 50000;

示例 3:过滤总和

sql
-- 查找总销售额超过 100000 的产品类别
SELECT category, SUM(sales_amount) as total_sales
FROM sales
GROUP BY category
HAVING SUM(sales_amount) > 100000;

高级用法

多个条件

sql
-- 使用多个 HAVING 条件
SELECT department, 
       COUNT(*) as emp_count,
       AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING COUNT(*) > 10 
   AND AVG(salary) > 60000;

结合 WHERE 和 HAVING

sql
-- WHERE 过滤行,HAVING 过滤组
SELECT department, AVG(salary) as avg_salary
FROM employees
WHERE hire_date >= '2020-01-01'  -- 先过滤行
GROUP BY department
HAVING AVG(salary) > 55000;      -- 再过滤组

使用表达式

sql
-- HAVING 子句中使用表达式
SELECT product_category,
       SUM(quantity * price) as total_revenue
FROM sales
GROUP BY product_category
HAVING SUM(quantity * price) > 50000
ORDER BY total_revenue DESC;

实际应用场景

场景 1:客户分析

sql
-- 查找高价值客户(总购买金额超过 10000)
SELECT customer_id,
       customer_name,
       COUNT(*) as order_count,
       SUM(order_amount) as total_spent
FROM orders
JOIN customers USING (customer_id)
GROUP BY customer_id, customer_name
HAVING SUM(order_amount) > 10000
ORDER BY total_spent DESC;

场景 2:产品性能分析

sql
-- 查找畅销产品(销量超过平均销量的 150%)
SELECT product_id,
       product_name,
       SUM(quantity) as total_sold
FROM order_items
JOIN products USING (product_id)
GROUP BY product_id, product_name
HAVING SUM(quantity) > (
    SELECT AVG(total_qty) * 1.5
    FROM (
        SELECT SUM(quantity) as total_qty
        FROM order_items
        GROUP BY product_id
    ) subquery
);

场景 3:质量控制

sql
-- 查找缺陷率高的生产批次
SELECT batch_id,
       production_date,
       COUNT(*) as total_items,
       SUM(CASE WHEN is_defective THEN 1 ELSE 0 END) as defects,
       ROUND(100.0 * SUM(CASE WHEN is_defective THEN 1 ELSE 0 END) / COUNT(*), 2) as defect_rate
FROM production_items
GROUP BY batch_id, production_date
HAVING 100.0 * SUM(CASE WHEN is_defective THEN 1 ELSE 0 END) / COUNT(*) > 5.0
ORDER BY defect_rate DESC;

与其他子句结合

HAVING + ORDER BY

sql
-- 按平均评分排序,只显示评分高的餐厅
SELECT restaurant_id,
       restaurant_name,
       AVG(rating) as avg_rating,
       COUNT(*) as review_count
FROM reviews
GROUP BY restaurant_id, restaurant_name
HAVING COUNT(*) >= 10
ORDER BY avg_rating DESC;

HAVING + LIMIT

sql
-- 获取前 5 个最活跃的用户
SELECT user_id,
       username,
       COUNT(*) as post_count
FROM posts
GROUP BY user_id, username
HAVING COUNT(*) > 5
ORDER BY post_count DESC
LIMIT 5;

性能优化技巧

1. 使用 WHERE 预过滤

sql
-- 不推荐:在 HAVING 中过滤可以在 WHERE 中过滤的条件
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING department IN ('Sales', 'Marketing');

-- 推荐:使用 WHERE 预过滤
SELECT department, AVG(salary)
FROM employees
WHERE department IN ('Sales', 'Marketing')
GROUP BY department;

2. 避免在 HAVING 中使用复杂计算

sql
-- 不推荐:重复计算
SELECT category,
       SUM(price * quantity) as revenue
FROM sales
GROUP BY category
HAVING SUM(price * quantity) > 10000;

-- 推荐:使用子查询或 CTE
WITH category_revenue AS (
    SELECT category,
           SUM(price * quantity) as revenue
    FROM sales
    GROUP BY category
)
SELECT * FROM category_revenue
WHERE revenue > 10000;

3. 使用索引优化

sql
-- 在分组列上创建索引
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- 查询将更快
SELECT customer_id, COUNT(*)
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5;

常见错误和解决方案

错误 1:在 HAVING 中引用未分组的列

sql
-- 错误
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING employee_name = 'John';  -- employee_name 未分组

-- 正确:使用 WHERE
SELECT department, AVG(salary)
FROM employees
WHERE employee_name = 'John'
GROUP BY department;

错误 2:混淆 WHERE 和 HAVING

sql
-- 错误:在 WHERE 中使用聚合函数
SELECT department, AVG(salary)
FROM employees
WHERE AVG(salary) > 50000  -- 错误!
GROUP BY department;

-- 正确:使用 HAVING
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING AVG(salary) > 50000;

复杂示例

示例 1:多级过滤

sql
-- 查找活跃且高价值的客户
SELECT c.customer_id,
       c.customer_name,
       COUNT(DISTINCT o.order_id) as order_count,
       SUM(o.total_amount) as total_spent,
       AVG(o.total_amount) as avg_order_value
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY c.customer_id, c.customer_name
HAVING COUNT(DISTINCT o.order_id) >= 5
   AND SUM(o.total_amount) > 5000
   AND AVG(o.total_amount) > 500
ORDER BY total_spent DESC;

示例 2:时间序列分析

sql
-- 查找销售增长的月份
SELECT DATE_TRUNC('month', order_date) as month,
       SUM(total_amount) as monthly_sales,
       COUNT(*) as order_count
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '2 years'
GROUP BY DATE_TRUNC('month', order_date)
HAVING SUM(total_amount) > (
    SELECT AVG(monthly_total)
    FROM (
        SELECT SUM(total_amount) as monthly_total
        FROM orders
        WHERE order_date >= CURRENT_DATE - INTERVAL '2 years'
        GROUP BY DATE_TRUNC('month', order_date)
    ) avg_calc
)
ORDER BY month;

最佳实践

  1. 优先使用 WHERE:能在 WHERE 中过滤的条件不要放在 HAVING 中
  2. 清晰的命名:为聚合结果使用有意义的别名
  3. 性能考虑:HAVING 在大数据集上可能很慢,考虑使用索引
  4. 可读性:复杂的 HAVING 条件考虑使用 CTE 或子查询
  5. 测试边界情况:确保 HAVING 条件正确处理 NULL 值和空组

总结

HAVING 子句是 SQL 中强大的分组过滤工具:

  • 用于过滤 GROUP BY 产生的分组结果
  • 可以使用聚合函数进行条件判断
  • 在 WHERE 之后、ORDER BY 之前执行
  • 与 WHERE 配合使用可以实现复杂的数据分析
  • 注意性能优化,合理使用索引和预过滤

掌握 HAVING 子句对于进行数据分析和报表生成至关重要。