DS: Advance SQL
Advance SQL
What are Window Functions?
Window functions in SQL are a type of analytical function that perform calculations
across a set of rows that are related to the current row, called a "window". A
window function calculates a value for each row in the result set based on a subset
of the rows that are defined by a window specification.
The window specification is defined using the OVER() clause in SQL, which specifies
the partitioning and ordering of the rows that the window function will operate
on. The partitioning divides the rows into groups based on a specific column or
expression, while the ordering defines the order in which the rows are processed
within each group.
Data Set:
CREATE TABLE marks (
student_id INTEGER PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255),
branch VARCHAR(255),
marks INTEGER
);
INSERT INTO marks (name, branch, marks) VALUES
('Nitish', 'EEE', 82),
('Rishabh', 'EEE', 91),
('Anukant', 'EEE', 69),
('Rupesh', 'EEE', 55),
('Shubham', 'CSE', 78),
('Ved', 'CSE', 43),
('Deepak', 'CSE', 98),
('Arpan', 'CSE', 95),
('Vinay', 'ECE', 95),
('Ankit', 'ECE', 88),
('Anand', 'ECE', 81),
('Rohit', 'ECE', 95),
('Prashant', 'MECH', 75),
('Amit', 'MECH', 69),
('Sunny', 'MECH', 39),
('Gautam', 'MECH', 51);
CREATE TABLE students (
name VARCHAR(255),
branch VARCHAR(255),
marks INTEGER
);
INSERT INTO students (name, branch, marks) VALUES
-- EEE branch
('Nitish', 'EEE', 82),
('Rishabh', 'EEE', 91),
('Anukant', 'EEE', 69),
('Rupesh', 'EEE', 55),
-- CSE branch
('Shubham', 'CSE', 78),
('Ved', 'CSE', 43),
('Deepak', 'CSE', 98),
-- ME branch
('Aman', 'ME', 72),
('Pratik', 'ME', 66),
('Harsh', 'ME', 88),
-- IT branch
('Siddharth', 'IT', 81),
('Nikhil', 'IT', 90),
('Karan', 'IT', 76);
Aggregate Function with OVER()
# Window functions
select *, AVG(marks) over() from student.marks
select *, AVG(marks) over(partition by branch) from student.marks
# smallest marks and highest marks students
select *, AVG(marks) over(), min(marks) over(), max(marks) over(), min(marks) over( partition by branch), max(marks) over(partition by branch) from student.marks order by student_id
# get the students whose marks is greatere than their respective branch's average marks
select * from (
select *, avg(marks) over(partition by branch) as 'branch_avg'
from student.marks) t
where marks > t.branch_avg
RANK/DENSE_RANK/ROW_NUMBER
#RANK/DENSE_RANK/ROW_NUMBER
#usecase- you want to rank all the student in your branch
# 1. Find top 2 most paying customers of each month
# 2. Create roll no. from branch and marks
select *, rank() over(order by marks desc)
from student.marks
# get the rank w.r.t branch
select *, rank() over(partition by branch order by marks desc)
from student.marks
#Dence ranks-> uses 2 if two student got the same rank 1,
# where normal rank gives 3
select *,
rank() over(partition by branch order by marks desc),
dense_rank() over(partition by branch order by marks desc)
from student.marks
# Row_number
select *, row_number() over(partition by branch) from marks;
select *, concat(branch, '_', row_number() over(partition by branch))
from marks
use zomato
select * from zomato.orders
select monthname(date), user_id, sum(amount) from zomato.orders
group by monthname(date), user_id
order by month(date)
FIRST_VALUE/LAST VALUE/NTH_VALUE
sale_id | employee | month | sales_amount |
---|---|---|---|
1 | Alice | Jan | 500 |
2 | Alice | Feb | 700 |
3 | Alice | Mar | 600 |
4 | Bob | Jan | 300 |
5 | Bob | Feb | 800 |
6 | Bob | Mar | 500 |
1. FIRST_VALUE()
Goal: Get each employee's first month's sales
SELECT
employee,
month,
sales_amount,
FIRST_VALUE(month) OVER (PARTITION BY employee ORDER BY month) AS first_month,
FIRST_VALUE(sales_amount) OVER (PARTITION BY employee ORDER BY month) AS first_sales
FROM sales;
Explanation:
-
PARTITION BY employee
: Do the calculation per employee
-
ORDER BY month
: Order by month to define the "first"
2. LAST_VALUE()
Goal: Get each employee’s last month’s sales
⚠️ Without
ROWS BETWEEN ...
,LAST_VALUE()
may return incorrect values because the default window frame ends at the current row.
3. NTH_VALUE()
Goal: Get each employee’s 2nd month’s sales
Output Example:
employee | month | sales_amount | second_month | second_sales |
---|---|---|---|---|
Alice | Jan | 500 | Feb | 700 |
Alice | Feb | 700 | Feb | 700 |
Alice | Mar | 600 | Feb | 700 |
Bob | Jan | 300 | Feb | 800 |
Bob | Feb | 800 | Feb | 800 |
Bob | Mar | 500 | Feb | 800 |
Summary
Function | What It Returns |
---|---|
FIRST_VALUE() | First value in the window frame |
LAST_VALUE() | Last value in the window frame |
NTH_VALUE(n) | The n-th value in the window frame |
Frames
SUM
, AVG
, etc.) to control how many rows are included in the calculation relative to the current row.Sample Table: sales
id | employee | sales |
---|---|---|
1 | Alice | 100 |
2 | Alice | 200 |
3 | Alice | 150 |
4 | Alice | 300 |
5 | Alice | 250 |
SUM(sales)
with different frame definitions using ROWS BETWEEN ... AND ...
1. ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
Frame: From the first row of the partition up to the current row (cumulative sum)
Result:
id | sales | running_total |
---|---|---|
1 | 100 | 100 |
2 | 200 | 300 |
3 | 150 | 450 |
4 | 300 | 750 |
5 | 250 | 1000 |
2. ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
Frame: Current row, one before, and one after (3-row moving sum)
Result:
id | sales | moving_sum |
---|---|---|
1 | 100 | 300 |
2 | 200 | 450 |
3 | 150 | 650 |
4 | 300 | 700 |
5 | 250 | 550 |
3. ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
🔒 Frame: Includes all rows in the partition (like a full partition sum)
Result:
id | sales | total_sales |
---|---|---|
1 | 100 | 1000 |
2 | 200 | 1000 |
3 | 150 | 1000 |
4 | 300 | 1000 |
5 | 250 | 1000 |
4. ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING
Frame: 3 rows before, current row, and 2 rows after (6-row frame max)
Result:
id | sales | custom_frame_sum |
---|---|---|
1 | 100 | 100 + 200 + 150 + 300 = 750 |
2 | 200 | 100 + 200 + 150 + 300 + 250 = 1000 |
3 | 150 | 100 + 200 + 150 + 300 + 250 = 1000 |
4 | 300 | 200 + 150 + 300 + 250 = 900 |
5 | 250 | 150 + 300 + 250 = 700 |
LEAD & LAG in SQL
-
LEAD(column, offset, default)
→ Get value after current row. -
LAG(column, offset, default)
→ Get value before current row.
Sample Table: sales
id | employee | month | sales |
---|---|---|---|
1 | Alice | Jan | 100 |
2 | Alice | Feb | 150 |
3 | Alice | Mar | 200 |
4 | Bob | Jan | 120 |
5 | Bob | Feb | 180 |
6 | Bob | Mar | 160 |
Example: Using LEAD
and LAG
to Compare Sales Month-over-Month
Output:
employee | month | sales | previous_month_sales | next_month_sales |
---|---|---|---|---|
Alice | Jan | 100 | NULL | 150 |
Alice | Feb | 150 | 100 | 200 |
Alice | Mar | 200 | 150 | NULL |
Bob | Jan | 120 | NULL | 180 |
Bob | Feb | 180 | 120 | 160 |
Bob | Mar | 160 | 180 | NULL |
Advanced: Calculate Difference from Previous Month
Output:
employee | month | sales | previous_month_ sales | sales_diff |
---|---|---|---|---|
Alice | Jan | 100 | NULL | NULL |
Alice | Feb | 150 | 100 | 50 |
Alice | Mar | 200 | 150 | 50 |
Bob | Jan | 120 | NULL | NULL |
Bob | Feb | 180 | 120 | 60 |
Bob | Mar | 160 | 180 | -20 |
Parameters
You can use the offset
and default
parameters as well:
Use Cases
Use Case | Use LEAD / LAG |
---|---|
Compare current row with previous | LAG() |
Compare current row with next | LEAD() |
Detect changes in a sequence | LAG() + comparison |
Calculate growth rate | LAG() |
Comments
Post a Comment