PARTITION BY

 In Oracle SQL, the PARTITION BY clause is used in conjunction with window functions to divide the result set into partitions (groups of rows), and it allows you to perform calculations within each group separately.

When used with functions like ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), SUM(), AVG(), etc., the PARTITION BY clause defines how the data is grouped before the window function is applied. This allows for advanced analytics without needing to write complex joins or subqueries.

Key Points:

  1. Partitioning: It divides the data into partitions (groups) based on the specified column(s).
  2. Window Functions: The window function operates on each partition independently, and results are calculated per partition.
  3. Sorting within Partitions: Often, you may use the ORDER BY clause to control the order of rows within each partition, affecting how window functions behave.

Syntax:

<window_function> OVER (PARTITION BY <column_name> ORDER BY <column_name> [ASC/DESC])

  • PARTITION BY <column_name>: Defines how the result set is split into groups.
  • ORDER BY <column_name>: Specifies the order of rows within each partition (this is optional but important for some window functions).

Example 1: Using ROW_NUMBER() with PARTITION BY

This example demonstrates how to assign a row number to each row within a specific partition.

SELECT

    employee_id,

    department_id,

    salary,

    ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num

FROM employees;

  • Explanation: The query assigns a row number to each employee, grouped by department_id, and ordered by salary in descending order within each department. The row number is reset for each department.

Example 2: Using SUM() with PARTITION BY

This example calculates the total salary within each department, without needing to group the result set into separate rows.

SELECT

    employee_id,

    department_id,

    salary,

    SUM(salary) OVER (PARTITION BY department_id) AS department_total_salary

FROM employees;

  • Explanation: This query calculates the total salary for each department, and the result is shown next to each employee’s row. The sum of the salary is calculated within each department_id partition.

Example 3: Using RANK() with PARTITION BY

This example shows how to assign a rank to employees within each department based on their salary.

SELECT

    employee_id,

    department_id,

    salary,

    RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_rank

FROM employees;

  • Explanation: This query ranks employees within each department by their salary, in descending order. If two employees have the same salary, they will receive the same rank, and the next rank will be skipped (i.e., "dense ranking").

Example 4: Using NTILE() with PARTITION BY

This example divides the employees into 4 groups (quartiles) within each department based on their salary.

SELECT

    employee_id,

    department_id,

    salary,

    NTILE(4) OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_quartile

FROM employees;

  • Explanation: This query divides employees into 4 quartiles within each department based on their salary, ordered by salary in descending order. The NTILE(4) function evenly distributes the rows across 4 groups.

Key Window Functions Often Used with PARTITION BY:

  1. ROW_NUMBER(): Assigns a unique row number to each row within the partition.
  2. RANK(): Assigns a rank to each row within the partition. Ties are assigned the same rank, and the next rank is skipped.
  3. DENSE_RANK(): Similar to RANK(), but no gaps are left in the ranking.
  4. NTILE(n): Divides the rows into n groups and assigns each row a group number.
  5. SUM(), AVG(), COUNT(), MIN(), MAX(): These aggregation functions can be used as window functions to calculate sums, averages, counts, etc., over partitions.
  6. LEAD() and LAG(): These functions allow you to access data from subsequent or previous rows within the partition.

Example 5: Using LEAD() or LAG() with PARTITION BY

These window functions allow you to access the value of a row from the next (LEAD()) or previous (LAG()) row.

SELECT

    employee_id,

    department_id,

    salary,

    LAG(salary) OVER (PARTITION BY department_id ORDER BY salary DESC) AS prev_salary

FROM employees;

  • Explanation: This query retrieves the salary of the previous employee (by salary) in the same department. If there is no previous row, the result will be NULL.

Important Considerations:

  1. PARTITION BY without ORDER BY: If you omit ORDER BY, the window function operates on an unordered partition, which might lead to unpredictable results.
  2. Efficiency: Window functions can be computationally expensive, especially on large datasets. Indexing the partitioned and ordered columns can improve performance.
  3. Resetting Partition: The result of the window function will reset with each new partition. So, every partition gets its own set of calculations.

Example of Using Multiple Window Functions:

You can combine multiple window functions in the same query to analyze your data from different perspectives.

SELECT

    employee_id,

    department_id,

    salary,

    SUM(salary) OVER (PARTITION BY department_id) AS department_total_salary,

    RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_rank

FROM employees;

  • Explanation: This query calculates both the total salary for each department and the rank of each employee within their department based on their salary.

Conclusion:

The PARTITION BY clause in Oracle allows you to perform sophisticated analytics and calculations within subsets of data, without changing the structure of your result set. It’s particularly powerful when used with window functions like ROW_NUMBER(), RANK(), SUM(), and others, giving you the flexibility to derive insights without requiring complex subqueries or joins.

 

No comments:

Post a Comment