DISTINCT

The DISTINCT keyword in Oracle SQL is used to remove duplicate rows from the result set, ensuring that only unique (distinct) records are returned. This is especially useful when you want to get a list of unique values from one or more columns in a query.

 

Key Points About DISTINCT in Oracle

 

1. Basic Usage of DISTINCT

The DISTINCT keyword is placed directly after SELECT to eliminate duplicate rows from the result set based on the columns specified in the query.

Syntax:

SELECT DISTINCT column1, column2, ...
FROM table_name;

Example:

SELECT DISTINCT department_id
FROM employees;

In this example, the query will return only unique department_id values, even if some departments appear multiple times in the table.

 

2. DISTINCT with Multiple Columns

When DISTINCT is applied to multiple columns, the uniqueness of the result set is determined by the combination of all the specified columns. This means that only rows with a unique combination of values across all the specified columns will be returned.

Example:

SELECT DISTINCT department_id, job_id
FROM employees;

In this case, the query will return only distinct combinations of department_id and job_id. If the same pair of department_id and job_id appears in multiple rows, only one row will be returned.
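
To make the "combination of columns" rule concrete, here is a small self-contained sketch. The inline rows and the name sample_employees are hypothetical stand-ins for the employees table, built from DUAL so the query can be run anywhere.

```sql
-- Hypothetical sample rows; sample_employees is not a real table.
WITH sample_employees AS (
  SELECT 1 AS employee_id, 10 AS department_id, 'AD_ASST' AS job_id FROM dual UNION ALL
  SELECT 2, 20, 'MK_MAN' FROM dual UNION ALL
  SELECT 3, 20, 'MK_REP' FROM dual UNION ALL
  SELECT 4, 20, 'MK_REP' FROM dual   -- same (department_id, job_id) pair as employee 3
)
SELECT DISTINCT department_id, job_id
FROM sample_employees
ORDER BY department_id, job_id;
-- Returns 3 rows: (10, AD_ASST), (20, MK_MAN), (20, MK_REP)
```

Department 20 appears twice in the output because its two rows differ in job_id. Adding employee_id to the select list would bring back all four rows, since every row would then be unique.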

 

3. DISTINCT on a Single Column

You can apply DISTINCT to a single column to get unique values from that column.

Example:

SELECT DISTINCT job_id
FROM employees;

This query returns each job_id value once, no matter how many employees share it. Note that job_id is an identifier, not a job title.

 

4. DISTINCT and Aggregate Functions

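When DISTINCT is placed inside an aggregate function such as COUNT(), SUM(), or AVG(), it applies only to that function's argument: duplicate values are discarded before the aggregate is computed, and the rest of the query is unaffected.

  • COUNT(DISTINCT column): Counts only the unique, non-NULL values in a column.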

Example:

SELECT COUNT(DISTINCT department_id)
FROM employees;

This query returns the number of distinct department IDs in the employees table. Rows where department_id is NULL are not counted.
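
The difference between the three common forms of COUNT is easiest to see side by side. The sketch below uses hypothetical inline rows (sample_employees is an illustrative name, not a real table) with two duplicate values and two NULLs.

```sql
-- Hypothetical sample rows: 10, 10, 20, NULL, NULL
WITH sample_employees AS (
  SELECT 10 AS department_id FROM dual UNION ALL
  SELECT 10 FROM dual UNION ALL
  SELECT 20 FROM dual UNION ALL
  SELECT CAST(NULL AS NUMBER) FROM dual UNION ALL
  SELECT CAST(NULL AS NUMBER) FROM dual
)
SELECT COUNT(*)                      AS all_rows,      -- 5: every row
       COUNT(department_id)          AS non_null_ids,  -- 3: NULLs are skipped
       COUNT(DISTINCT department_id) AS distinct_ids   -- 2: the values 10 and 20
FROM sample_employees;
```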

 

5. DISTINCT with ORDER BY Clause

You can use DISTINCT in combination with the ORDER BY clause to return distinct values sorted in a particular order.

Example:

SELECT DISTINCT department_id
FROM employees
ORDER BY department_id;

This query will return distinct department IDs, sorted in ascending order.
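
One restriction is worth knowing: with SELECT DISTINCT, Oracle expects every ORDER BY expression to appear in the select list, and otherwise typically raises ORA-01791. The sketch below assumes the employees table has a salary column (as in Oracle's HR sample schema); the rewrite shown is only one possible workaround.

```sql
-- Typically fails with ORA-01791 ("not a SELECTed expression"),
-- because salary is not in the select list:
-- SELECT DISTINCT department_id FROM employees ORDER BY salary;

-- One possible rewrite: group instead, and sort by an aggregate of the
-- column that is not selected (here, the highest salary per department).
SELECT department_id
FROM employees
GROUP BY department_id
ORDER BY MAX(salary) DESC NULLS LAST;
```

The rewrite works because each department has exactly one MAX(salary), so the sort key is well defined for every output row.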

 

6. Performance Considerations with DISTINCT

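  • DISTINCT adds a deduplication step: Oracle has to compare the rows of the result set, typically by hashing or sorting them, before it can return the unique ones. This can be expensive on large tables or when many columns are selected.
  • An index on the DISTINCT columns can help in some cases, for example when Oracle can read the values from the index instead of the table, but it is not guaranteed to be used. Check the execution plan, and where possible avoid producing the duplicates in the first place (for example, by fixing a join that multiplies rows).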
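
A simple way to see what DISTINCT costs for a given query is to look at its execution plan. This is a minimal sketch using EXPLAIN PLAN and DBMS_XPLAN.DISPLAY; the exact plan depends on your data, statistics, and Oracle version.

```sql
-- Ask the optimizer for the plan without running the query.
EXPLAIN PLAN FOR
SELECT DISTINCT department_id
FROM employees;

-- Show the plan that was just generated.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

In the output, the deduplication step usually shows up as an operation such as HASH UNIQUE or SORT UNIQUE. Comparing the plan with and without DISTINCT gives a rough idea of how much work that step adds.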

 

7. DISTINCT and NULL Values

  • DISTINCT treats NULL values as identical. If a column has multiple NULL values, DISTINCT will consider them as a single value, meaning only one NULL will appear in the result set.

Example:

SELECT DISTINCT department_id
FROM employees;

If the department_id column contains multiple NULL values, only one NULL value will appear in the result set, even if the table contains many rows with NULL.
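
The following sketch shows this behavior with hypothetical inline rows (sample_employees is an illustrative name). It contrasts with ordinary comparisons, where NULL = NULL does not evaluate to true.

```sql
-- Hypothetical sample rows: 10, NULL, NULL
WITH sample_employees AS (
  SELECT 10 AS department_id FROM dual UNION ALL
  SELECT CAST(NULL AS NUMBER) FROM dual UNION ALL
  SELECT CAST(NULL AS NUMBER) FROM dual
)
SELECT DISTINCT department_id
FROM sample_employees
ORDER BY department_id NULLS LAST;
-- Returns 2 rows: 10, then a single NULL
```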

 

8. Using DISTINCT with JOIN

DISTINCT can also be used in queries involving JOIN operations to return unique rows based on a combination of columns from different tables.

Example:

SELECT DISTINCT e.department_id, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id;

This query will return unique combinations of department_id and department_name from the employees and departments tables.
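
When DISTINCT is only needed because the join repeats each department once per employee, an EXISTS subquery is a common alternative that avoids creating the duplicates at all. The sketch below assumes department_id is unique in the departments table (for example, its primary key); under that assumption it should return the same rows as the query above.

```sql
-- One row per department that has at least one employee.
-- No DISTINCT is needed because departments is never multiplied by a join.
SELECT d.department_id, d.department_name
FROM departments d
WHERE EXISTS (
  SELECT 1
  FROM employees e
  WHERE e.department_id = d.department_id
);
```

Which form performs better depends on the data and the optimizer, so it is worth comparing the execution plans of both.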

 

9. Using DISTINCT with GROUP BY

DISTINCT is sometimes written together with GROUP BY, but when the select list consists of the grouping columns it is unnecessary: GROUP BY already returns one row per unique combination of those columns.

Example with DISTINCT:

SELECT DISTINCT department_id
FROM employees
GROUP BY department_id;

Because GROUP BY already returns one row per department_id, DISTINCT is redundant here and can be removed without changing the result.
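
The redundancy applies when the select list contains the grouping columns. If the select list contains only an aggregate, different groups can produce the same value, and DISTINCT then does change the result. A small sketch with hypothetical inline rows (sample_employees is an illustrative name):

```sql
-- Hypothetical sample rows: department 10 and 20 have two employees each,
-- department 30 has one.
WITH sample_employees AS (
  SELECT 10 AS department_id FROM dual UNION ALL
  SELECT 10 FROM dual UNION ALL
  SELECT 20 FROM dual UNION ALL
  SELECT 20 FROM dual UNION ALL
  SELECT 30 FROM dual
)
SELECT DISTINCT COUNT(*) AS headcount
FROM sample_employees
GROUP BY department_id
ORDER BY headcount;
-- Returns 2 rows: 1 and 2 (without DISTINCT it would return 3 rows: 1, 2, 2)
```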

 

10. DISTINCT vs. GROUP BY

  • Both DISTINCT and GROUP BY can return one row per unique combination of values, but they serve different purposes:
    • DISTINCT: Removes duplicate rows from the result. It always compares every column in the select list; it cannot be limited to a subset of the selected columns.
    • GROUP BY: Groups rows by specific columns and can be used in conjunction with aggregate functions (e.g., COUNT(), SUM(), AVG()).

Example of DISTINCT:

SELECT DISTINCT department_id
FROM employees;

Example of GROUP BY:

SELECT department_id, COUNT(*)
FROM employees
GROUP BY department_id;

 

11. Common Use Cases for DISTINCT

  • Retrieving Unique Values: When you need a list of unique values from a table (e.g., unique department IDs or job IDs).
  • Removing Duplicates: When duplicates may arise in a result set due to joins or other operations.
  • Data Quality Checks: When analyzing data quality, you might use DISTINCT to check how many unique entries exist in a column.
  • Aggregating Data: When you want to count or summarize distinct values in your dataset.
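
As an illustration of the data quality use case, the two queries below are a rough sketch of how duplicates might be investigated in the employees table. The choice of columns is only an example; substitute whichever columns are supposed to be unique in your data.

```sql
-- How many rows are there, and how many different job IDs do they contain?
SELECT COUNT(*)               AS total_rows,
       COUNT(DISTINCT job_id) AS distinct_job_ids
FROM employees;

-- Which (department_id, job_id) combinations occur more than once, and how often?
SELECT department_id, job_id, COUNT(*) AS occurrences
FROM employees
GROUP BY department_id, job_id
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;
```

DISTINCT tells you how many unique values exist; the GROUP BY ... HAVING form goes one step further and shows which values are repeated.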

 

12. Limitations of DISTINCT

  • Performance: As mentioned earlier, using DISTINCT can slow down query performance, especially on large datasets.
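  • Duplication in Complex Joins: DISTINCT compares every column in the select list, so rows that differ in any one column are all returned. After a join that multiplies rows, the output can therefore still look duplicated on the columns you care about. Using DISTINCT to hide such duplicates can also mask an incorrect or incomplete join condition.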

 

Examples of DISTINCT in Queries

1. Simple Use Case

Get distinct job IDs from the employees table.

SELECT DISTINCT job_id
FROM employees;

2. Using DISTINCT with Multiple Columns

Get distinct department ID and job ID combinations.

SELECT DISTINCT department_id, job_id
FROM employees;

3. Using DISTINCT with COUNT

Count the number of unique departments.

SELECT COUNT(DISTINCT department_id)
FROM employees;

4. Using DISTINCT with ORDER BY

Get distinct department IDs sorted in ascending order.

SELECT DISTINCT department_id
FROM employees
ORDER BY department_id;

5. Combining DISTINCT with JOIN

Get distinct department IDs and department names from two tables (employees and departments).

SELECT DISTINCT e.department_id, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id;

 

 
