SQL GROUP BY: Aggregating Data Effectively

SQL GROUP BY: Aggregating Data Effectively

GROUP BY groups rows by column values and applies aggregate functions:

SELECT user_id, COUNT(*) as order_count
FROM orders
GROUP BY user_id;

This counts orders per user, collapsing multiple order rows into one summary row per user.

Basic Aggregation

Common aggregate functions:

SELECT 
    category,
    COUNT(*) as product_count,
    AVG(price) as avg_price,
    MIN(price) as min_price,
    MAX(price) as max_price,
    SUM(price) as total_value
FROM products
GROUP BY category;

How GROUP BY Works

Given this data:

orders:
| user_id | amount |
|---------|--------|
| 1       | 50     |
| 1       | 30     |
| 2       | 20     |
| 2       | 40     |
| 3       | 15     |

Grouped by user_id:

SELECT user_id, SUM(amount) as total
FROM orders
GROUP BY user_id;

| user_id | total |
|---------|-------|
| 1       | 80    |
| 2       | 60    |
| 3       | 15    |

Multiple Columns

Group by multiple columns:

SELECT 
    category,
    status,
    COUNT(*) as count
FROM products
GROUP BY category, status;

| category    | status | count |
|-------------|--------|-------|
| Electronics | active | 50    |
| Electronics | sold   | 30    |
| Clothing    | active | 40    |
| Clothing    | sold   | 25    |

HAVING Clause

Filter aggregated results with HAVING (not WHERE):

-- Users who placed more than 5 orders
SELECT user_id, COUNT(*) as order_count
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 5;

WHERE filters before grouping. HAVING filters after:

SELECT user_id, COUNT(*) as order_count
FROM orders
WHERE amount > 10          -- Filter individual rows first
GROUP BY user_id
HAVING COUNT(*) > 2;       -- Then filter groups

Non-Aggregated Columns

Every non-aggregated column must be in GROUP BY:

-- Wrong - name not in GROUP BY
SELECT user_id, name, COUNT(*)
FROM users
JOIN orders ON users.id = orders.user_id
GROUP BY user_id;

-- Correct
SELECT user_id, name, COUNT(*)
FROM users
JOIN orders ON users.id = orders.user_id
GROUP BY user_id, name;

Common Aggregate Functions

COUNT(*) - count all rows:

SELECT category, COUNT(*) FROM products GROUP BY category;

COUNT(column) - count non-NULL values:

SELECT COUNT(phone) FROM users;  -- Counts users with phone numbers

SUM - total:

SELECT SUM(amount) as revenue FROM orders;

AVG - average:

SELECT AVG(price) as avg_price FROM products;

MIN/MAX - extremes:

SELECT MIN(price), MAX(price) FROM products;

GROUP BY with JOIN

SELECT 
    users.name,
    COUNT(orders.id) as order_count,
    SUM(orders.amount) as total_spent
FROM users
LEFT JOIN orders ON users.id = orders.user_id
GROUP BY users.id, users.name;

Date Grouping

Group by date parts:

-- PostgreSQL
SELECT 
    DATE_TRUNC('month', created_at) as month,
    COUNT(*) as order_count
FROM orders
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;

-- MySQL
SELECT 
    DATE_FORMAT(created_at, '%Y-%m') as month,
    COUNT(*) as order_count
FROM orders
GROUP BY DATE_FORMAT(created_at, '%Y-%m')
ORDER BY month;

DISTINCT vs GROUP BY

For unique values without aggregation:

-- Using DISTINCT
SELECT DISTINCT category FROM products;

-- Using GROUP BY
SELECT category FROM products GROUP BY category;

DISTINCT is simpler. Use GROUP BY when you need counts or other aggregates.

Ordering Results

SELECT category, COUNT(*) as count
FROM products
GROUP BY category
ORDER BY count DESC;

Order by aggregate values or grouped columns.

Conditional Aggregation

SELECT 
    category,
    COUNT(*) as total,
    COUNT(CASE WHEN price > 100 THEN 1 END) as expensive_count,
    COUNT(CASE WHEN price <= 100 THEN 1 END) as cheap_count
FROM products
GROUP BY category;

Common Mistakes

Forgetting to include non-aggregated columns:

-- Error
SELECT user_id, name, COUNT(*)
FROM orders
GROUP BY user_id;  -- Missing 'name'

Using WHERE instead of HAVING:

-- Wrong - WHERE can't use aggregates
SELECT user_id, COUNT(*) as count
FROM orders
WHERE COUNT(*) > 5  -- Error
GROUP BY user_id;

-- Correct
SELECT user_id, COUNT(*) as count
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 5;

Performance Tips

  • Index columns used in GROUP BY
  • Filter with WHERE before GROUP BY
  • Limit groups when possible
  • Consider covering indexes

Real-World Example

SELECT 
    DATE_TRUNC('day', created_at) as date,
    status,
    COUNT(*) as order_count,
    SUM(amount) as revenue,
    AVG(amount) as avg_order_value,
    COUNT(DISTINCT user_id) as unique_customers
FROM orders
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY DATE_TRUNC('day', created_at), status
HAVING COUNT(*) >= 10
ORDER BY date DESC, revenue DESC;

Further Reading

PostgreSQL's aggregation documentation covers GROUP BY in depth.

Mode's SQL tutorial has GROUP BY examples and exercises.

GROUP BY transforms detailed data into actionable summaries.

Wear the code

Product mockup

GROUP BY Developer T-Shirt

£25.00

View product
Product mockup

LEFT JOIN Developer T-Shirt

£25.00

View product
Product mockup

SELECT * Developer T-Shirt

£25.00

View product

0 comments

Leave a comment

Please note, comments need to be approved before they are published.