Home > SQL Mastery for Interviews > SQL Mastery for Interviews > SQL Window Functions

SQL Window Functions

Introduction

Window functions perform ranking, aggregation, and value functions across a certain window, i.e. set of rows. This is similar to the kind of calculation that an aggregate function may perform. However, using a window function prevents rows from being aggregated into a single output row, unlike ordinary aggregate functions; instead, the rows maintain their individual identities. That means that the window function has access to more rows of the query result than just the current row. Here’s a basic visualization of what we just explained:

Observe how the three rows are combined into one row using the GROUP BY aggregate in the upper part of the figure above. Each row can have an aggregation value output thanks to the window function shown on the lower part of the same figure. By doing this, you could avoid having to do a join after the GROUP BY.

The keyword that we use with window functions is “OVER”. This clause does two things:

Divides rows into groups called sets. (The phrase "PARTITION BY" is utilized)
Puts rows inside such divisions in a certain sequence. (The phrase “ORDER BY” is utilized)

The three primary categories of window functions are aggregate, ranking, and value functions. You can see some of the names of the functions included in each category in the figure below.

The table we’ll use in all following examples throughout this tutorial can be seen below (named Table1).

Aggregate Window Functions

Aggregate window functions refer to a variety of aggregate functions including SUM(), COUNT(), AVG(), MAX(), and MIN() that are applied to a certain window (set of data).

Let us take a look at an example in order to better grasp the concept of aggregate window functions.

Let’s say we want to find the average age of our male and female customers and order both classes by the number of orders they’ve made. However, we want to see only the CustomerID, Name, Sex, NumberOfItems, and the average age of each person. Then, the query would look like this:

TEXT/X-SQL

1SELECT CustomerID, Name, Sex, NumberOfItems, 
2AVG(Age)  OVER (PARTITION BY Sex ORDER BY NumberOfItems) AS AvgAge 
3FROM Table1

The resulting table would look something like this:

As we can see, the average age of both sexes is calculated and displayed in the column named AvgAge. Moreover, both classes of customers are ordered by the number of their orders, respectively.

Ranking Window Functions

The most commonly used ranking window functions are RANK(), DENSE_RANK(), and ROW_NUMBER().

RANK() - The rank function, as its name implies, gives each row in a partition a rank. The first row is given rank 1, and subsequent rows with the same value are given the same rank. One rank value will be skipped for the rank that follows two identical rank values. Basically, we’ll skip as many rank values as there are identical rows in the column. We’ll take a look at an example later on.
DENSE_RANK() - Similar to the RANK() function, the first row is given rank 1, and subsequent rows with the same value have the same rank. The difference between RANK() and DENSE RANK() is that in DENSE RANK(), no rank is skipped when there are two identical rows, i.e. two ranks that are the same. Again, the example should make things clearer.
ROW_NUMBER() - It gives all the rows in the partition a sequence of consecutive numbers. No two rows can have the same row number within a division.

Imagine we want to calculate the row number, rank, and dense rank according to the country, by taking into consideration only the Sex and Country of the customer. Then, we have to run the following query:

TEXT/X-SQL

1SELECT 
2ROW_NUMBER() OVER (PARTITION BY Country ORDER BY Sex) 
3AS RowNumber, Sex, Country,
4RANK() OVER(PARTITION BY Country
5ORDER BY Sex) AS Rank,
6DENSE_RANK() OVER(PARTITION BY Country
7                  ORDER BY Sex) 
8                  AS DenseRank,
9FROM Table1

The resulting table would look like this:

The dots are put here because all other rows have unique countries, so all three columns will have only values of 1.

Let's test your knowledge. Click the correct answer from the options.

How many rank values above 1 we’ll have if we run the RANK() function on the table from above (Table 1) and PARTITION BY the name?

Click the option that best answers the question.

Value Window Functions

With the use of these functions, you can easily extract values from other rows that could be beneficial in a report. NTILE(), LEAD(), and LAG() are the three value functions that are most often utilized.

LAG() and LEAD() - To generate a column that can draw data from other rows, we may utilize the LAG or LEAD functions. LEAD returns values from the rows after it, whereas LAG might return values from the rows before it. These two functions might come in handy when analyzing variations over time and dealing with time series data and comparing prior or subsequent rows.
NTILE() – The NTILE() function is used for determining which percentile, quartile, or other division a given row belongs in. In this case, the ORDER BY clause chooses the columns that should be used for determining the quartiles.

Are you sure you're getting this? Is this statement true or false?

We can use LEAD() and LAG() within the same query.

Press true if you believe the statement is correct, or false otherwise.

One Pager Cheat Sheet

Window functions allow for the calculation of values over multiple rows while preserving individual row identities, with the "OVER" clause allowing for division of multiple rows into a set and ordering them sequentially.
Aggregate window functions are a set of aggregate functions such as _SUM(), COUNT(), AVG(), MAX(),_ and _MIN()_ which can be applied to a window of data and compute statistical values such as _average age_, _order by number of orders_ and more.
Ranking Window functions such as **RANK()**, **DENSE_RANK()**, and **ROW_NUMBER()** can be used to assign ranks and row numbers to partitions depending on the specified criteria, and the resulting table can then be used for further analysis.
The RANK() function can be used to assign each row a rank, where the first row is given a rank of 1 and any subsequent rows with the same value have the same rank, skipping one rank value when two identical rank values occur.
With the use of NTILE(), LEAD(), and LAG() functions, you can easily extract values from other rows to compare data over time, and allocate them into percentile, quartiles, or other divisions.
Yes, it is possible to utilize LEAD() and LAG() functions together in the same query for comparison of values from rows before and after the current row.