Finding High-Volume Customers

easySQL

Lesson

SQL Aggregation with GROUP BY and HAVING

When working with relational databases, you often need to analyze data by grouping related records together and calculating summary statistics. This is where SQL's GROUP BY clause becomes essential.

GROUP BY allows you to group rows that have the same values in specified columns, then apply aggregate functions like COUNT(), SUM(), AVG() to each group. Think of it like sorting data into buckets - all rows with the same grouping criteria go into the same bucket, then you can perform calculations on each bucket.

HAVING is GROUP BY's companion for filtering. While WHERE filters individual rows before grouping, HAVING filters the groups after they've been formed and aggregated. This is crucial when you want to filter based on aggregate values.

The typical pattern for aggregation queries is:

  1. JOIN tables to connect related data
  2. GROUP BY to organize rows into logical groups
  3. SELECT with aggregate functions to calculate summaries
  4. HAVING to filter groups based on aggregate conditions
  5. ORDER BY to sort the final results

This pattern is fundamental for business intelligence queries, reporting, and data analysis tasks where you need to answer questions like "which customers are our biggest spenders?" or "what products sell most frequently?"

Example
1SELECT 2 department, 3 COUNT(*) as employee_count, 4 AVG(salary) as avg_salary 5FROM employees 6WHERE hire_date >= '2020-01-01' 7GROUP BY department 8HAVING COUNT(*) > 5 9ORDER BY employee_count DESC;
L4WHERE filters individual employees before grouping (only recent hires)
L5GROUP BY creates one result row per department
L6HAVING filters departments after grouping (only large departments)

Key Takeaways

  • •GROUP BY organizes rows into groups for aggregate calculations
  • •HAVING filters groups based on aggregate conditions, while WHERE filters individual rows
  • •Always include non-aggregate columns from SELECT in your GROUP BY clause
Loading...