Window Functions
1. Explain Window Functions and provide examples.
- Window functions provide a way to perform calculations across rows within a result set, without creating separate subqueries.
- Examples:
AVG, SUM, MAX, MIN, RANK, DENSE_RANK, LEAD, LAG,
2. OVER
OVER
clause defines the window (set of rows) over which the window function is applied.
3. PARTITION BY
PARTITION BY
clause is used to divide a result set into partitions based on specified columns.
4. ORDER BY
ORDER BY
clause within the OVER
clause is used to specify the order of rows within each partition.
5. OVER, PARTITION & ORDER BY (Syntax and Example).
Syntax
<WindowFunction>() OVER (
PARTITION BY <expression>[{,<expression_2>...}]
ORDER BY <expression> [ASC|DESC], [{,<expression_2>...}]
)
Example
SELECT
department_id,
employee_id,
salary,
SUM(salary) OVER (PARTITION BY department_id ORDER BY employee_id) AS cumulative_salary
FROM employees;
PARTITION BY department_id:
Divides the result set into partitions based ondepartment_id
. The cumulative salary is calculated separately for each department.ORDER BY employee_id
: Orders the rows within each department byemployee_id
before calculating the cumulative sum.
6. ROW_NUMBER
ROW_NUMBER
assigns a unique row number to each row within a partition.
7. RANK
The RANK()
function assigns a unique rank to each distinct row within a partition of a result set. It produces gaps in ranking values if there are ties.
Syntax:
RANK() OVER (PARTITION BY column_name ORDER BY column_name [ASC|DESC])
8. DENSE_RANK
The DENSE_RANK()
function also assigns ranks to rows within a partition of a result set but does not produce gaps in rank values when there are ties.
Syntax:
DENSE_RANK() OVER (PARTITION BY column_name ORDER BY column_name [ASC|DESC])
9. LEAD
The LEAD()
function allows you to access data from a row that follows the current row in the result set. It provides a way to look ahead within a partition or the entire dataset.
Syntax:
LEAD(column_name, [offset], [default_value]) OVER (PARTITION BY column_name ORDER BY column_name [ASC|DESC])
column_name
: The column whose value you want to retrieve from the next row.offset
(optional): Number of rows to look ahead (default is 1 if omitted).default_value
(optional): Value to return if the lead row does not exist (e.g., if the current row is near the end of the partition).
Example
SELECT
sale_date,
sales_amount,
LEAD(sales_amount, 1, 0) OVER (ORDER BY sale_date) AS next_day_sales
FROM sales;
- Explanation: This query retrieves the
sales_amount
for the next day. If the next day’s sales data is not available, it returns0
.
10. LAG
The LAG()
function allows you to access data from a row that precedes the current row in the result set. It provides a way to look back within a partition or the entire dataset.
Syntax:
LAG(column_name, [offset], [default_value]) OVER (PARTITION BY column_name ORDER BY column_name [ASC|DESC])
column_name
: The column whose value you want to retrieve from the previous row.offset
(optional): Number of rows to look back (default is 1 if omitted).default_value
(optional): Value to return if the lag row does not exist (e.g., if the current row is near the start of the partition).
Example
SELECT
sale_date,
sales_amount,
LAG(sales_amount, 1, 0) OVER (ORDER BY sale_date) AS previous_day_sales
FROM sales;
- Explanation: This query retrieves the
sales_amount
from the previous day. If the previous day’s sales data is not available, it returns0
.
11. ROWS
ROWS
clause is used to define the window frame that specifies the range of rows that should be included in the calculation.
12. ROWS PRECEDING / FOLLOWING / UNBOUNDED / CURRENT ROW

Image Source: https://www.andreaperlato.com/theorypost/bias-variance-trade-off/
PRECEDING
: Includes rows before the current row.FOLLOWING
: Includes rows after the current row.UNBOUNDED
: Specifies an unbounded number of rows.CURRENT ROW
: Represents the current row.
13. Explain the difference between RANK and DENSE_RANK.
- Gaps in Ranking:
RANK()
: Produces gaps in ranking values if there are ties. For example, if two rows are tied for rank 1, the next rank will be 3.DENSE_RANK()
: Does not produce gaps in ranking values. The next rank after a tie is incremented by 1.
- Use-Cases:
RANK()
: Useful when the presence of gaps in ranking is acceptable or desired.DENSE_RANK()
: Useful when you need a continuous sequence of ranks without gaps.
14. List some of the common aggregate functions to apply window function.
SUM(), AVG(), MIN(), MAX(), COUNT()
: These aggregate functions can also be used as window functions.
15. Explain the difference between WINDOW Functions and GROUP BY.
- GROUP BY is used for grouping rows and aggregating data, resulting in fewer rows in the output.
- Window functions are used to calculate additional information across a set of rows while retaining all original rows.