Exploring Advanced Window Functions in SQL

Exploring Advanced Window Functions in SQL

Before going straight to the topic. There are some things you should know before going directly to the topic:

RDBMS

RDBMS stands for Relational Database Management System. It is a Management System which manages the relational database which includes CRUD(Create, Remove, Update and Delete) of data organized in the form of tables which are in rows and columns. Some of the most well-known RDBMSs include MySQL, PostgreSQL, MariaDB, Microsoft SQL Server, and Oracle Database.

IdFirst NameLast NameageCountry
1JohnDoe31USA
2RobertLuna22USA
3DavidRobinson22UK
4JohnReinhardt25UK
5BettyDoe28UAE

SQL

Structured Query Language (SQL) is one of the most popular languages for Relational Database Management Systems (RDBMS) due to its user-friendly nature, facilitating easy data retrieval and manipulation of tables through operations like JOIN , AGGREGATE, SORT, even sub-queries , recursive queries. It offers flexible querying capabilities, allowing users to efficiently interact with the database, and is cost-effective compared to alternative solutions. There are lot of functions which are given below.

  • Aggregate Functions: These functions operate on sets of values and return a single value. Common aggregate functions include:

    • COUNT(): Returns the number of rows in a result set.

    • SUM(): Calculates the sum of values in a column.

    • AVG(): Calculates the average of values in a column.

    • MIN(): Returns the minimum value in a column.

    • MAX(): Returns the maximum value in a column.

  • Scalar Functions: These functions operate on a single value and return a single value. Common scalar functions include:

    • UPPER(): Converts a string to uppercase.

    • LOWER(): Converts a string to lowercase.

    • SUBSTRING(): Returns a substring of a string.

    • CONCAT(): Concatenates two or more strings.

    • DATE(): Extracts the date part from a datetime value.

  • Date and Time Functions: These functions are used for manipulating date and time values. Common date and time functions include:

    • CURRENT_DATE: Returns the current date.

    • CURRENT_TIME: Returns the current time.

    • DATEADD(): Adds a specified time interval to a date.

    • DATEDIFF(): Calculates the difference between two dates.

    • DATEPART(): Extracts a specific part of a date or time.

  • Mathematical Functions: These functions perform mathematical operations on numeric values. Common mathematical functions include:

    • ABS(): Returns the absolute value of a number.

    • ROUND(): Rounds a number to a specified number of decimal places.

    • CEILING(): Returns the smallest integer greater than or equal to a number.

    • FLOOR(): Returns the largest integer less than or equal to a number.

  • String Functions: These functions perform operations on string values. Common string functions include:

    • LEN(): Returns the length of a string.

    • LEFT(): Returns the leftmost characters of a string.

    • RIGHT(): Returns the rightmost characters of a string.

    • REPLACE(): Replaces occurrences of a substring within a string.

WINDOW Functions:

In SQL, a window function works on a group of rows that are connected to the current record and returns one value per row. These functions work without the need of subqueries or self-joins, calculating across a set of linked rows to the current row. The 'OVER' clause, which specifies the window or subset of rows that the function operates across, is used to provide window functions.

We will look at some of the window function and their example using table from the above customer tables.

  1. ROW_NUMBER():

    Assigns a unique sequential integer to each row within a partition of a result set. This query assigns a row number to each customer based on their age to go orderly.

     SELECT
         customer_id,
         first_name,
         last_name,
         ROW_NUMBER() OVER (ORDER BY age) AS row_number
     FROM
         Customers;
    

    output:

  1. RANK():

    Assigns a unique rank to each distinct row within a partition of a result set, with gaps in rank numbers when there are ties. In the below code, query ranks customers based on their age.

SELECT
    customer_id,
    first_name,
    last_name,
    RANK() OVER (ORDER BY age) AS rank
FROM
    Customers;

output:

There is two Rank  1 Because Robert and David have the same height.

  1. DENSE_RANK():

    Assigns a unique rank to each distinct row within a partition of a result set, without any gaps in rank numbers. This query assigns dense ranks to customers based on their age.

SELECT
    customer_id,
    first_name,
    last_name,
    DENSE_RANK() OVER (ORDER BY age) AS dense_rank
FROM
    Customers;

output:

  1. NTILE():

    Divides the rows of a result set into a specified number of equally sized groups or "buckets" and assigns each row to a bucket.

    This query divides customers into three equally sized buckets based on their age.

SELECT
    customer_id,
    first_name,
    last_name,
    NTILE(3) OVER (ORDER BY age) AS bucket
FROM
    Customers;

output:

  1. LAG():

    Accesses data from a previous row in the result set, based on a specified offset. This query retrieves the age of the previous customer in the result set for each customer, ordered by age.

SELECT
    customer_id,
    first_name,
    last_name,
    age,
    LAG(age) OVER (ORDER BY age) AS previous_age
FROM
    Customers;

output:

The first row of previois_age is blank because there is no age before 22 in the table.

  1. LEAD(): Accesses data from a subsequent row in the result set, based on a specified offset. This query retrieves the age of the next customer in the result set for each customer, ordered by age
SELECT
    customer_id,
    first_name,
    last_name,
    age,
    LEAD(age) OVER (ORDER BY age) AS next_age
FROM
    Customers;

output:

  1. FIRST_VALUE():

Returns the value of the specified expression from the first row in a window frame. This query retrieves the age of the first customer in the result set for each customer, ordered by age.

SELECT
    customer_id,
    first_name,
    last_name,
    age,
    FIRST_VALUE(age) OVER (ORDER BY age) AS first_age
FROM
    Customers;

output:

  1. LAST_VALUE():

    Returns the value of the specified expression from the last row in a window frame. This query retrieves the age of the last customer in the result set for each customer, ordered by age.

    Things to Know:

    a) UNBOUNDED PRECEDING indicates that the window frame starts from the first row of the partition (or result set) and includes all preceding rows up to the current row.

    b)UNBOUNDED FOLLOWING indicates that the window frame ends at the last row of the partition (or result set) and includes all following rows after the current row.

    So, when using ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING with an ORDER BY clause in a window function, it means that the window function considers all rows in the partition (or result set), from the first row to the last row, regardless of their order based on the age column.

SELECT
    customer_id,
    first_name,
    last_name,
    age,
    LAST_VALUE(age) OVER (ORDER BY age ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_age
FROM
    Customers;

output:

  1. NTH_VALUE():

    Returns the value of the specified expression from the nth row in a window frame. This query retrieves the age of the third customer in the result set for each customer, ordered by age.

SELECT
    customer_id,
    first_name,
    last_name,
    age,
    NTH_VALUE(age, 3) OVER (ORDER BY age ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS third_age
FROM
    Customers;

output:

Conclusion

SQL window functions are strong tools that allow for complex data manipulation and analysis within of a query. Window functions assist complex calculations and offer significant insights by enabling operations across a set of rows relevant to the current row without requiring additional joins or subqueries. Gaining knowledge of and using window functions can significantly expand the capabilities of SQL queries and make it possible to analyse data more effectively and perceptively.