MINUS

The MINUS operator in Oracle SQL is used to return the difference between two result sets. It returns all the rows from the first SELECT statement that are not present in the second SELECT statement. In other words, MINUS returns the set of rows that exist in the first query but do not exist in the second query. It’s essentially the opposite of the INTERSECT operator, which returns only the rows that exist in both result sets.

1. Basic Syntax of MINUS

SELECT column1, column2, ...
FROM table1
WHERE condition1
 
MINUS
 
SELECT column1, column2, ...
FROM table2
WHERE condition2;
  • Each SELECT statement involved in a MINUS operation must return the same number of columns with compatible data types.
  • The result will contain all rows from the first SELECT that are not in the second SELECT.

2. Key Characteristics of MINUS

a. Removes Duplicates

  • MINUS removes duplicates by default. If a row exists multiple times in the first SELECT statement but not in the second, only one occurrence of that row will be included in the final result set.
  • This behavior is similar to UNION (which also removes duplicates), but different from UNION ALL (which retains duplicates).

b. Columns and Data Types

  • The number of columns returned by both SELECT statements involved in the MINUS operator must be the same.
  • Corresponding columns in both SELECT statements must have compatible data types. For example, a VARCHAR column in one SELECT can be subtracted from another VARCHAR column, but you cannot subtract a NUMBER column from a DATE column unless an implicit or explicit type conversion can be performed.

c. Order of Queries

  • The first query (the one before MINUS) is considered the "primary" query, and its rows will be compared against the rows from the second query (the one after MINUS).
  • The result will contain only the rows from the first query that are not found in the second query.

3. How MINUS Works

a. Return Rows from the First Query That Are Not in the Second

  • MINUS compares the result set of the first query with the result set of the second query and returns all rows from the first query that are not present in the second query.
  • For a row to be considered "equal" in both result sets, it must have the same values in all columns.

Example:

SELECT column1 FROM table1
MINUS
SELECT column1 FROM table2;

This will return all values in column1 from table1 that are not found in column1 of table2.

b. Handling Duplicates

  • As mentioned earlier, MINUS removes duplicates. If there are duplicate rows in the first query but not in the second, only one instance of those rows will appear in the result.

Example:

SELECT column1 FROM table1
MINUS
SELECT column1 FROM table2;

If table1 has two rows with the value 'A' and table2 has one row with 'A', the result will return 'A' only once (even though it appeared twice in table1).

4. Use Case for MINUS

a. Finding Rows in One Table but Not Another

  • MINUS is commonly used to find records in one table that are not present in another. For example, you can use MINUS to find customers in customers who have not made a purchase in the orders table:
SELECT customer_id FROM customers
MINUS
SELECT customer_id FROM orders;

This query will return the IDs of customers who are in the customers table but have not made any purchases (i.e., their customer_id is not in the orders table).

b. Data Comparison

  • You can use MINUS to compare the results of two different queries or data sources, and find what data exists in one source but not the other.

For example, comparing two lists of products from different suppliers:

SELECT product_id FROM supplier1_products
MINUS
SELECT product_id FROM supplier2_products;

This query will return a list of product IDs that are in the supplier1_products table but are not in the supplier2_products table.

c. Identifying Changes in Data

  • MINUS is also useful when comparing datasets at different points in time. For example, you could use MINUS to identify rows that were in a dataset previously but have been removed in a newer version.
SELECT id FROM dataset_version1
MINUS
SELECT id FROM dataset_version2;

This query will return the IDs that existed in dataset_version1 but are no longer in dataset_version2.

5. Performance Considerations

  • Removing Duplicates: Like UNION, MINUS removes duplicates, which can be a performance overhead, especially with large datasets. If you need to avoid this overhead and don't care about duplicates, UNION ALL might be a better choice.
  • Indexes: Performance can be improved by having appropriate indexes on the columns involved in the MINUS operation. This is particularly helpful when comparing large tables.
  • Efficient Query Design: Make sure that both SELECT statements are optimized, particularly for complex queries involving joins or subqueries.

6. Examples of Using MINUS

Example 1: Finding Missing Employees in the terminated_employees Table

Imagine you want to find employees that are no longer in the employees table but who are still listed in the terminated_employees table.

SELECT employee_id FROM terminated_employees
MINUS
SELECT employee_id FROM employees;

This query will return employee IDs that are in the terminated_employees table but have been removed from the employees table.

Example 2: Compare Two Lists of Products

You can use MINUS to compare products listed by two suppliers and see which products appear in the first supplier’s list but not the second.

SELECT product_id FROM supplier1_products
MINUS
SELECT product_id FROM supplier2_products;

Example 3: Finding Unmatched Records Between Two Tables

If you want to find records that are in one table but not in another, you can use MINUS. For instance, you could find employees who have no corresponding records in the attendance table.

SELECT employee_id FROM employees
MINUS
SELECT employee_id FROM attendance;

7. Key Points to Remember

  • Number of Columns: The SELECT statements involved in a MINUS operation must return the same number of columns.
  • Data Type Compatibility: The columns in both SELECT statements must have compatible data types.
  • Performance: MINUS removes duplicates by default, which may have performance implications for large datasets. It’s usually slower than UNION ALL but might be faster than UNION (since it doesn’t have to perform additional operations to eliminate duplicates).
  • Not for Set Operations: Unlike INTERSECT, which returns rows common to both queries, MINUS returns rows from the first query that are not in the second.

8. MINUS vs. NOT EXISTS

Both MINUS and the NOT EXISTS clause can be used to find rows in one query that do not exist in another, but they have different behaviors:

  • MINUS requires two separate SELECT statements and returns the difference between them.
  • NOT EXISTS is used in a subquery and returns rows from the outer query where the subquery condition is not met.

Example using NOT EXISTS:

SELECT employee_id FROM employees e
WHERE NOT EXISTS (
    SELECT 1 FROM attendance a WHERE a.employee_id = e.employee_id
);

While both methods can accomplish similar tasks, MINUS may be easier for simple set-based differences.

 

No comments:

Post a Comment