Advance SQL

What are Window Functions?

Window functions in SQL are a type of analytical function that perform calculations

across a set of rows that are related to the current row, called a "window". A

window function calculates a value for each row in the result set based on a subset

of the rows that are defined by a window specification.

The window specification is defined using the OVER() clause in SQL, which specifies

the partitioning and ordering of the rows that the window function will operate

on. The partitioning divides the rows into groups based on a specific column or

expression, while the ordering defines the order in which the rows are processed

within each group.

Data Set:

CREATE TABLE marks (
  student_id INTEGER PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(255),
  branch VARCHAR(255),
  marks INTEGER
);


INSERT INTO marks (name, branch, marks) VALUES
('Nitish', 'EEE', 82),
('Rishabh', 'EEE', 91),
('Anukant', 'EEE', 69),
('Rupesh', 'EEE', 55),
('Shubham', 'CSE', 78),
('Ved', 'CSE', 43),
('Deepak', 'CSE', 98),
('Arpan', 'CSE', 95),
('Vinay', 'ECE', 95),
('Ankit', 'ECE', 88),
('Anand', 'ECE', 81),
('Rohit', 'ECE', 95),
('Prashant', 'MECH', 75),
('Amit', 'MECH', 69),
('Sunny', 'MECH', 39),
('Gautam', 'MECH', 51);


CREATE TABLE students (
  name VARCHAR(255),
  branch VARCHAR(255),
  marks INTEGER
);

INSERT INTO students (name, branch, marks) VALUES
  -- EEE branch
  ('Nitish', 'EEE', 82),
  ('Rishabh', 'EEE', 91),
  ('Anukant', 'EEE', 69),
  ('Rupesh', 'EEE', 55),

  -- CSE branch
  ('Shubham', 'CSE', 78),
  ('Ved', 'CSE', 43),
  ('Deepak', 'CSE', 98),

  -- ME branch
  ('Aman', 'ME', 72),
  ('Pratik', 'ME', 66),
  ('Harsh', 'ME', 88),

  -- IT branch
  ('Siddharth', 'IT', 81),
  ('Nikhil', 'IT', 90),
  ('Karan', 'IT', 76);
 

Aggregate Function with OVER()

Find all the students who have marks higher than the avg marks of

their respective branch

# Window functions

select *, AVG(marks) over() from student.marks

select *, AVG(marks) over(partition by branch) from student.marks

# smallest marks and highest marks students

select *, AVG(marks) over(), min(marks) over(), max(marks) over(), min(marks) over( partition by branch), max(marks) over(partition by branch) from student.marks order by student_id

# get the students whose marks is greatere than their respective branch's average marks

select * from (

select *, avg(marks) over(partition by branch) as 'branch_avg'

from student.marks) t

where marks > t.branch_avg

RANK/DENSE_RANK/ROW_NUMBER

1. Find top 2 most paying customers of each month

2. Create roll no from branch and marks

#RANK/DENSE_RANK/ROW_NUMBER

#usecase- you want to rank all the student in your branch

# 1. Find top 2 most paying customers of each month

# 2. Create roll no. from branch and marks

select *, rank() over(order by marks desc)

from student.marks

# get the rank w.r.t branch

select *, rank() over(partition by branch order by marks desc)

from student.marks

#Dence ranks-> uses 2 if two student got the same rank 1,

# where normal rank gives 3

select *,

rank() over(partition by branch order by marks desc),

dense_rank() over(partition by branch order by marks desc)

from student.marks

# Row_number

select *, row_number() over(partition by branch) from marks;

select *, concat(branch, '_', row_number() over(partition by branch))

from marks

use zomato

select * from zomato.orders

select monthname(date), user_id, sum(amount) from zomato.orders

group by monthname(date), user_id

order by month(date)

FIRST_VALUE/LAST VALUE/NTH_VALUE

These functions return values from within a window frame, which is useful for analytics and reporting.

Use cases:

1. Find the branch toppers

2. FRAME Clause

3. Find the last guy of each branch

4. Alternate way of writing Window functions

5. Find the 2nd last guy of each branch, 5th topper of each branch

Sample DataSet:

sale_id	employee	month	sales_amount
1	Alice	Jan	500
2	Alice	Feb	700
3	Alice	Mar	600
4	Bob	Jan	300
5	Bob	Feb	800
6	Bob	Mar	500

1. FIRST_VALUE()

Goal: Get each employee's first month's sales
SELECT
  employee,
  month,
  sales_amount,
  FIRST_VALUE(month) OVER (PARTITION BY employee ORDER BY month) AS first_month,
  FIRST_VALUE(sales_amount) OVER (PARTITION BY employee ORDER BY month) AS first_sales
FROM sales;

 Explanation:


PARTITION BY employee: Do the calculation per employee


ORDER BY month: Order by month to define the "first"

`2. LAST_VALUE()`

Goal: Get each employee’s last month’s sales


SELECT
  employee,
  month,
  sales_amount,
  LAST_VALUE(month) OVER (
    PARTITION BY employee ORDER BY month
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS last_month,
  LAST_VALUE(sales_amount) OVER (
    PARTITION BY employee ORDER BY month
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS last_sales
FROM sales;

⚠️ Without ROWS BETWEEN ..., LAST_VALUE() may return incorrect values because the default window frame ends at the current row.

3. `NTH_VALUE()`

Goal: Get each employee’s 2nd month’s sales


SELECT
  employee,
  month,
  sales_amount,
  NTH_VALUE(month, 2) OVER (
    PARTITION BY employee ORDER BY month
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS second_month,
  NTH_VALUE(sales_amount, 2) OVER (
    PARTITION BY employee ORDER BY month
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS second_sales
FROM sales;

Output Example:

employee	month	sales_amount	second_month	second_sales
Alice	Jan	500	Feb	700
Alice	Feb	700	Feb	700
Alice	Mar	600	Feb	700
Bob	Jan	300	Feb	800
Bob	Feb	800	Feb	800
Bob	Mar	500	Feb	800

Summary

Function	What It Returns
`FIRST_VALUE()`	First value in the window frame
`LAST_VALUE()`	Last value in the window frame
`NTH_VALUE(n)`	The n-th value in the window frame

Frames

A frame in a window function is a subset of rows within the partition that

determines the scope of the window function calculation. The frame is defined

using a combination of two clauses in the window function: ROWS and BETWEEN.

The ROWS clause specifies how many rows should be included in the frame

relative to the current row. For example, ROWS 3 PRECEDING means that the

frame includes the current row and the three rows that precede it in the partition.

The BETWEEN clause specifies the boundaries of the frame.

Examples

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW - means that the

frame includes all rows from the beginning of the partition up to and including the

current row.

•ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING: the frame includes the

current row and the row immediately before and after it.

•ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: the

frame includes all rows in the partition.

•ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING: the frame includes the

current row and the three rows before it and the two rows after it.

This is crucial in windowed aggregations (like SUM, AVG, etc.) to control how many rows are included in the calculation relative to the current row.

Sample Table: `sales`

id	employee	sales
1	Alice	100
2	Alice	200
3	Alice	150
4	Alice	300
5	Alice	250

Goal: Use SUM(sales) with different frame definitions using ROWS BETWEEN ... AND ...

1. `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`

Frame: From the first row of the partition up to the current row (cumulative sum)


SELECT 
  id,
  employee,
  sales,
  SUM(sales) OVER (
    PARTITION BY employee
    ORDER BY id
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM sales;

Result:

id	sales	running_total
1	100	100
2	200	300
3	150	450
4	300	750
5	250	1000

2. `ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING`

Frame: Current row, one before, and one after (3-row moving sum)


SELECT 
  id,
  employee,
  sales,
  SUM(sales) OVER (
    PARTITION BY employee
    ORDER BY id
    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
  ) AS moving_sum
FROM sales;

Result:

id	sales	moving_sum
1	100	300
2	200	450
3	150	650
4	300	700
5	250	550

3. `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`

🔒 Frame: Includes all rows in the partition (like a full partition sum)


SELECT 
  id,
  employee,
  sales,
  SUM(sales) OVER (
    PARTITION BY employee
    ORDER BY id
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS total_sales
FROM sales;

Result:

id	sales	total_sales
1	100	1000
2	200	1000
3	150	1000
4	300	1000
5	250	1000

4. `ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING`

Frame: 3 rows before, current row, and 2 rows after (6-row frame max)


SELECT 
  id,
  employee,
  sales,
  SUM(sales) OVER (
    PARTITION BY employee
    ORDER BY id
    ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING
  ) AS custom_frame_sum
FROM sales;

Result:

id	sales	custom_frame_sum
1	100	100 + 200 + 150 + 300 = 750
2	200	100 + 200 + 150 + 300 + 250 = 1000
3	150	100 + 200 + 150 + 300 + 250 = 1000
4	300	200 + 150 + 300 + 250 = 900
5	250	150 + 300 + 250 = 700

LEAD & LAG in SQL

LEAD(column, offset, default) → Get value after current row.
LAG(column, offset, default) → Get value before current row.

Sample Table: `sales`

id	employee	month	sales
1	Alice	Jan	100
2	Alice	Feb	150
3	Alice	Mar	200
4	Bob	Jan	120
5	Bob	Feb	180
6	Bob	Mar	160

Example: Using `LEAD` and `LAG` to Compare Sales Month-over-Month


SELECT
  employee,
  month,
  sales,
  LAG(sales) OVER (PARTITION BY employee ORDER BY month) AS previous_month_sales,
  LEAD(sales) OVER (PARTITION BY employee ORDER BY month) AS next_month_sales
FROM sales;

Output:

employee	month	sales	previous_month_sales	next_month_sales
Alice	Jan	100	NULL	150
Alice	Feb	150	100	200
Alice	Mar	200	150	NULL
Bob	Jan	120	NULL	180
Bob	Feb	180	120	160
Bob	Mar	160	180	NULL

Advanced: Calculate Difference from Previous Month


SELECT
  employee,
  month,
  sales,
  LAG(sales) OVER (PARTITION BY employee ORDER BY month) AS previous_month_sales,
  sales - LAG(sales) OVER (PARTITION BY employee ORDER BY month) AS sales_diff
FROM sales;

Output:

employee	month	sales	previous_month_ sales	sales_diff
Alice	Jan	100	NULL	NULL
Alice	Feb	150	100	50
Alice	Mar	200	150	50
Bob	Jan	120	NULL	NULL
Bob	Feb	180	120	60
Bob	Mar	160	180	-20

Parameters

You can use the offset and default parameters as well:


LEAD(sales, 2, 0) OVER (...)  -- Look 2 rows ahead, return 0 if not found
LAG(sales, 1, 0) OVER (...)   -- Look 1 row behind, return 0 if not found

Use Cases

Use Case	Use `LEAD` / `LAG`
Compare current row with previous	`LAG()`
Compare current row with next	`LEAD()`
Detect changes in a sequence	`LAG()` + comparison
Calculate growth rate	`LAG()`

Search This Blog

Blog

DS: Advance SQL

Advance SQL

What are Window Functions?

Data Set:

Aggregate Function with OVER()

RANK/DENSE_RANK/ROW_NUMBER

FIRST_VALUE/LAST VALUE/NTH_VALUE

`2. LAST_VALUE()`

Goal: Get each employee’s last month’s sales

3. `NTH_VALUE()`

Goal: Get each employee’s 2nd month’s sales

Output Example:

Summary

Frames

Sample Table: `sales`

1. `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`

Result:

2. `ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING`

Result:

3. `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`

Result:

4. `ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING`

Result:

LEAD & LAG in SQL

Sample Table: `sales`

Example: Using `LEAD` and `LAG` to Compare Sales Month-over-Month

Output:

Advanced: Calculate Difference from Previous Month

Output:

Parameters

Use Cases

Comments

Post a Comment

Popular posts from this blog

A Road-Map to Become Solution Architect

Pickle Vs M, VS ONNX vs SavedModel vs TorchScript

Module 3: Fine-Tuning and Customizing Generative AI Models

DS: Advance SQL

Advance SQL

What are Window Functions?

Data Set:

Aggregate Function with OVER()

RANK/DENSE_RANK/ROW_NUMBER

FIRST_VALUE/LAST VALUE/NTH_VALUE

2. LAST_VALUE()

Goal: Get each employee’s last month’s sales

3. NTH_VALUE()

Goal: Get each employee’s 2nd month’s sales

Output Example:

Summary

Frames

Sample Table: sales

1. ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

Result:

2. ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

Result:

3. ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

Result:

4. ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING

Result:

LEAD & LAG in SQL

Sample Table: sales

Example: Using LEAD and LAG to Compare Sales Month-over-Month

Output:

Advanced: Calculate Difference from Previous Month

Output:

Parameters

Use Cases

Comments

Post a Comment

Popular posts from this blog

A Road-Map to Become Solution Architect

Pickle Vs M, VS ONNX vs SavedModel vs TorchScript

Module 3: Fine-Tuning and Customizing Generative AI Models

`2. LAST_VALUE()`

3. `NTH_VALUE()`

Sample Table: `sales`

1. `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`

2. `ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING`

3. `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`

4. `ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING`

Sample Table: `sales`

Example: Using `LEAD` and `LAG` to Compare Sales Month-over-Month