Table of contents
Before going straight to the topic. There are some things you should know before going directly to the topic:
RDBMS
RDBMS stands for Relational Database Management System. It is a Management System which manages the relational database which includes CRUD(Create, Remove, Update and Delete) of data organized in the form of tables which are in rows and columns. Some of the most well-known RDBMSs include MySQL, PostgreSQL, MariaDB, Microsoft SQL Server, and Oracle Database.
Id | First Name | Last Name | age | Country |
1 | John | Doe | 31 | USA |
2 | Robert | Luna | 22 | USA |
3 | David | Robinson | 22 | UK |
4 | John | Reinhardt | 25 | UK |
5 | Betty | Doe | 28 | UAE |
SQL
Structured Query Language (SQL) is one of the most popular languages for Relational Database Management Systems (RDBMS) due to its user-friendly nature, facilitating easy data retrieval and manipulation of tables through operations like JOIN , AGGREGATE, SORT, even sub-queries , recursive queries. It offers flexible querying capabilities, allowing users to efficiently interact with the database, and is cost-effective compared to alternative solutions. There are lot of functions which are given below.
Aggregate Functions: These functions operate on sets of values and return a single value. Common aggregate functions include:
COUNT()
: Returns the number of rows in a result set.SUM()
: Calculates the sum of values in a column.AVG()
: Calculates the average of values in a column.MIN()
: Returns the minimum value in a column.MAX()
: Returns the maximum value in a column.
Scalar Functions: These functions operate on a single value and return a single value. Common scalar functions include:
UPPER()
: Converts a string to uppercase.LOWER()
: Converts a string to lowercase.SUBSTRING()
: Returns a substring of a string.CONCAT()
: Concatenates two or more strings.DATE()
: Extracts the date part from a datetime value.
Date and Time Functions: These functions are used for manipulating date and time values. Common date and time functions include:
CURRENT_DATE
: Returns the current date.CURRENT_TIME
: Returns the current time.DATEADD()
: Adds a specified time interval to a date.DATEDIFF()
: Calculates the difference between two dates.DATEPART()
: Extracts a specific part of a date or time.
Mathematical Functions: These functions perform mathematical operations on numeric values. Common mathematical functions include:
ABS()
: Returns the absolute value of a number.ROUND()
: Rounds a number to a specified number of decimal places.CEILING()
: Returns the smallest integer greater than or equal to a number.FLOOR()
: Returns the largest integer less than or equal to a number.
String Functions: These functions perform operations on string values. Common string functions include:
LEN()
: Returns the length of a string.LEFT()
: Returns the leftmost characters of a string.RIGHT()
: Returns the rightmost characters of a string.REPLACE()
: Replaces occurrences of a substring within a string.
WINDOW Functions:
In SQL, a window function works on a group of rows that are connected to the current record and returns one value per row. These functions work without the need of subqueries or self-joins, calculating across a set of linked rows to the current row. The 'OVER' clause, which specifies the window or subset of rows that the function operates across, is used to provide window functions.
We will look at some of the window function and their example using table from the above customer tables.
ROW_NUMBER():
Assigns a unique sequential integer to each row within a partition of a result set. This query assigns a row number to each customer based on their age to go orderly.
SELECT customer_id, first_name, last_name, ROW_NUMBER() OVER (ORDER BY age) AS row_number FROM Customers;
output:
RANK():
Assigns a unique rank to each distinct row within a partition of a result set, with gaps in rank numbers when there are ties. In the below code, query ranks customers based on their age.
SELECT
customer_id,
first_name,
last_name,
RANK() OVER (ORDER BY age) AS rank
FROM
Customers;
output:
DENSE_RANK():
Assigns a unique rank to each distinct row within a partition of a result set, without any gaps in rank numbers. This query assigns dense ranks to customers based on their age.
SELECT
customer_id,
first_name,
last_name,
DENSE_RANK() OVER (ORDER BY age) AS dense_rank
FROM
Customers;
output:
NTILE():
Divides the rows of a result set into a specified number of equally sized groups or "buckets" and assigns each row to a bucket.
This query divides customers into three equally sized buckets based on their age.
SELECT
customer_id,
first_name,
last_name,
NTILE(3) OVER (ORDER BY age) AS bucket
FROM
Customers;
output:
LAG():
Accesses data from a previous row in the result set, based on a specified offset. This query retrieves the age of the previous customer in the result set for each customer, ordered by age.
SELECT
customer_id,
first_name,
last_name,
age,
LAG(age) OVER (ORDER BY age) AS previous_age
FROM
Customers;
output:
- LEAD(): Accesses data from a subsequent row in the result set, based on a specified offset. This query retrieves the age of the next customer in the result set for each customer, ordered by age
SELECT
customer_id,
first_name,
last_name,
age,
LEAD(age) OVER (ORDER BY age) AS next_age
FROM
Customers;
output:
- FIRST_VALUE():
Returns the value of the specified expression from the first row in a window frame. This query retrieves the age of the first customer in the result set for each customer, ordered by age.
SELECT
customer_id,
first_name,
last_name,
age,
FIRST_VALUE(age) OVER (ORDER BY age) AS first_age
FROM
Customers;
output:
LAST_VALUE():
Returns the value of the specified expression from the last row in a window frame. This query retrieves the age of the last customer in the result set for each customer, ordered by age.
Things to Know:
a)
UNBOUNDED PRECEDING
indicates that the window frame starts from the first row of the partition (or result set) and includes all preceding rows up to the current row.b)
UNBOUNDED FOLLOWING
indicates that the window frame ends at the last row of the partition (or result set) and includes all following rows after the current row.So, when using
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
with anORDER BY
clause in a window function, it means that the window function considers all rows in the partition (or result set), from the first row to the last row, regardless of their order based on theage
column.
SELECT
customer_id,
first_name,
last_name,
age,
LAST_VALUE(age) OVER (ORDER BY age ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_age
FROM
Customers;
output:
NTH_VALUE():
Returns the value of the specified expression from the nth row in a window frame. This query retrieves the age of the third customer in the result set for each customer, ordered by age.
SELECT
customer_id,
first_name,
last_name,
age,
NTH_VALUE(age, 3) OVER (ORDER BY age ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS third_age
FROM
Customers;
output:
Conclusion
SQL window functions are strong tools that allow for complex data manipulation and analysis within of a query. Window functions assist complex calculations and offer significant insights by enabling operations across a set of rows relevant to the current row without requiring additional joins or subqueries. Gaining knowledge of and using window functions can significantly expand the capabilities of SQL queries and make it possible to analyse data more effectively and perceptively.