GROUP BY groups rows by column values and applies aggregate functions:
SELECT user_id, COUNT(*) as order_count
FROM orders
GROUP BY user_id;
This counts orders per user, collapsing multiple order rows into one summary row per user.
Basic Aggregation
Common aggregate functions:
SELECT
category,
COUNT(*) as product_count,
AVG(price) as avg_price,
MIN(price) as min_price,
MAX(price) as max_price,
SUM(price) as total_value
FROM products
GROUP BY category;
How GROUP BY Works
Given this data:
orders:
| user_id | amount |
|---------|--------|
| 1 | 50 |
| 1 | 30 |
| 2 | 20 |
| 2 | 40 |
| 3 | 15 |
Grouped by user_id:
SELECT user_id, SUM(amount) as total
FROM orders
GROUP BY user_id;
| user_id | total |
|---------|-------|
| 1 | 80 |
| 2 | 60 |
| 3 | 15 |
Multiple Columns
Group by multiple columns:
SELECT
category,
status,
COUNT(*) as count
FROM products
GROUP BY category, status;
| category | status | count |
|-------------|--------|-------|
| Electronics | active | 50 |
| Electronics | sold | 30 |
| Clothing | active | 40 |
| Clothing | sold | 25 |
HAVING Clause
Filter aggregated results with HAVING (not WHERE):
-- Users who placed more than 5 orders
SELECT user_id, COUNT(*) as order_count
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 5;
WHERE filters before grouping. HAVING filters after:
SELECT user_id, COUNT(*) as order_count
FROM orders
WHERE amount > 10 -- Filter individual rows first
GROUP BY user_id
HAVING COUNT(*) > 2; -- Then filter groups
Non-Aggregated Columns
Every non-aggregated column must be in GROUP BY:
-- Wrong - name not in GROUP BY
SELECT user_id, name, COUNT(*)
FROM users
JOIN orders ON users.id = orders.user_id
GROUP BY user_id;
-- Correct
SELECT user_id, name, COUNT(*)
FROM users
JOIN orders ON users.id = orders.user_id
GROUP BY user_id, name;
Common Aggregate Functions
COUNT(*) - count all rows:
SELECT category, COUNT(*) FROM products GROUP BY category;
COUNT(column) - count non-NULL values:
SELECT COUNT(phone) FROM users; -- Counts users with phone numbers
SUM - total:
SELECT SUM(amount) as revenue FROM orders;
AVG - average:
SELECT AVG(price) as avg_price FROM products;
MIN/MAX - extremes:
SELECT MIN(price), MAX(price) FROM products;
GROUP BY with JOIN
SELECT
users.name,
COUNT(orders.id) as order_count,
SUM(orders.amount) as total_spent
FROM users
LEFT JOIN orders ON users.id = orders.user_id
GROUP BY users.id, users.name;
Date Grouping
Group by date parts:
-- PostgreSQL
SELECT
DATE_TRUNC('month', created_at) as month,
COUNT(*) as order_count
FROM orders
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;
-- MySQL
SELECT
DATE_FORMAT(created_at, '%Y-%m') as month,
COUNT(*) as order_count
FROM orders
GROUP BY DATE_FORMAT(created_at, '%Y-%m')
ORDER BY month;
DISTINCT vs GROUP BY
For unique values without aggregation:
-- Using DISTINCT
SELECT DISTINCT category FROM products;
-- Using GROUP BY
SELECT category FROM products GROUP BY category;
DISTINCT is simpler. Use GROUP BY when you need counts or other aggregates.
Ordering Results
SELECT category, COUNT(*) as count
FROM products
GROUP BY category
ORDER BY count DESC;
Order by aggregate values or grouped columns.
Conditional Aggregation
SELECT
category,
COUNT(*) as total,
COUNT(CASE WHEN price > 100 THEN 1 END) as expensive_count,
COUNT(CASE WHEN price <= 100 THEN 1 END) as cheap_count
FROM products
GROUP BY category;
Common Mistakes
Forgetting to include non-aggregated columns:
-- Error
SELECT user_id, name, COUNT(*)
FROM orders
GROUP BY user_id; -- Missing 'name'
Using WHERE instead of HAVING:
-- Wrong - WHERE can't use aggregates
SELECT user_id, COUNT(*) as count
FROM orders
WHERE COUNT(*) > 5 -- Error
GROUP BY user_id;
-- Correct
SELECT user_id, COUNT(*) as count
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 5;
Performance Tips
- Index columns used in GROUP BY
- Filter with WHERE before GROUP BY
- Limit groups when possible
- Consider covering indexes
Real-World Example
SELECT
DATE_TRUNC('day', created_at) as date,
status,
COUNT(*) as order_count,
SUM(amount) as revenue,
AVG(amount) as avg_order_value,
COUNT(DISTINCT user_id) as unique_customers
FROM orders
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY DATE_TRUNC('day', created_at), status
HAVING COUNT(*) >= 10
ORDER BY date DESC, revenue DESC;
Further Reading
PostgreSQL's aggregation documentation covers GROUP BY in depth.
Mode's SQL tutorial has GROUP BY examples and exercises.
GROUP BY transforms detailed data into actionable summaries.
0 comments