When working with relational databases, you often need to analyze data by grouping related records together and calculating summary statistics. This is where SQL's GROUP BY clause becomes essential.
GROUP BY allows you to group rows that have the same values in specified columns, then apply aggregate functions like COUNT(), SUM(), AVG() to each group. Think of it like sorting data into buckets - all rows with the same grouping criteria go into the same bucket, then you can perform calculations on each bucket.
HAVING is GROUP BY's companion for filtering. While WHERE filters individual rows before grouping, HAVING filters the groups after they've been formed and aggregated. This is crucial when you want to filter based on aggregate values.
The typical pattern for aggregation queries is:
This pattern is fundamental for business intelligence queries, reporting, and data analysis tasks where you need to answer questions like "which customers are our biggest spenders?" or "what products sell most frequently?"
1SELECT
2 department,
3 COUNT(*) as employee_count,
4 AVG(salary) as avg_salary
5FROM employees
6WHERE hire_date >= '2020-01-01'
7GROUP BY department
8HAVING COUNT(*) > 5
9ORDER BY employee_count DESC;