Understanding RANK() vs. DENSE_RANK()

Understanding RANK() vs. DENSE_RANK() in SQL

The RANK() and DENSE_RANK() window functions are powerful tools in SQL for ranking rows within a result set. This guide explains their differences, particularly in how they handle ties in ranking.

Overview

  • RANK(): Assigns a rank to each row, leaving gaps in rank values when there are ties.
  • DENSE_RANK(): Similar to RANK(), but does not leave gaps in the ranking sequence when there are ties.

Syntax


function() OVER (PARTITION BY <column> ORDER BY <column>)
  • PARTITION BY: (optional) Divides the result set into partitions for ranking.
  • ORDER BY: Specifies the order in which to rank the rows.

Example Dataset

Consider the following employees table:

Name Salary
Alice 70000
Bob 70000
Charlie 60000
David 80000
Eva 60000

Example Queries

Using RANK()


SELECT Name, Salary,
       RANK() OVER (ORDER BY Salary DESC) AS Rank
FROM employees;

Result Set with RANK()

Name Salary Rank
David 80000 1
Alice 70000 2
Bob 70000 2
Charlie 60000 4
Eva 60000 4

Using DENSE_RANK()


SELECT Name, Salary,
       DENSE_RANK() OVER (ORDER BY Salary DESC) AS DenseRank
FROM employees;

Result Set with DENSE_RANK()

Name Salary DenseRank
David 80000 1
Alice 70000 2
Bob 70000 2
Charlie 60000 3
Eva 60000 3

Key Differences

  • Gaps in Ranking:
    • RANK(): Introduces gaps in the ranking sequence when there are ties.
    • DENSE_RANK(): Provides a continuous ranking sequence without gaps.
  • Use Cases:
    • Use RANK() for identifying ranks with gaps.
    • Use DENSE_RANK() for sequential ranking without gaps.

Summary

Both RANK() and DENSE_RANK() are valuable for assigning rankings based on specified criteria. The choice depends on whether you want to maintain gaps in the ranks (using RANK()) or ensure a continuous ranking sequence (using DENSE_RANK()).

Comments

Popular posts from this blog

Databricks Auto Loader

Optimizing Spark memory - Resolving OutOfMemory (OOM)

PySpark tips for day-to-day use