Correlated Subquery FAQS

1. What Is a Correlated Subquery?

A Correlated Subquery is a subquery that:

  • References one or more columns from the outer query
  • Cannot execute independently
  • Is executed once for each row processed by the outer query

It is called correlated because the inner query depends on the outer query.

2. Basic Structure of a Correlated Subquery

SELECT outer_column

FROM table1 t1

WHERE column operator (

    SELECT column

    FROM table2 t2

    WHERE t2.column = t1.column

);

Notice:

  • t1.column (outer query column)
  • Referenced inside the subquery

That reference creates the correlation.

3. How Does Oracle Execute a Correlated Subquery?

Execution Flow:

1.    Outer query fetches first row.

2.    Inner query runs using that row’s values.

3.    Condition is evaluated.

4.    Outer query moves to next row.

5.    Inner query runs again.

So if outer query returns 10,000 rows inner query runs 10,000 times.

4. Simple Example

Requirement:

Find employees earning more than the average salary of their department.

SELECT e.employee_id, e.salary, e.department_id

FROM employees e

WHERE e.salary > (

    SELECT AVG(salary)

    FROM employees

    WHERE department_id = e.department_id

);

Explanation:

  • Outer query processes one employee at a time.
  • Inner query calculates average salary for that employee’s department.
  • Comparison is done row by row.

5. How Is It Different from Non-Correlated Subquery?

Feature

Correlated

Non-Correlated

References outer query

Yes

No

Execution frequency

Per row

Once

Performance impact

Can be slower

Usually faster

Dependency

Dependent

Independent

6. When Should You Use Correlated Subqueries?

Use when:

  • Comparison depends on each row
  • Group-based filtering is required
  • EXISTS logic is needed
  • Per-row validation is required

Common use cases:

  • Top earners per department
  • Duplicate detection
  • Conditional existence checks
  • Filtering based on related table values

7. Correlated Subquery with EXISTS

Very common and efficient pattern.

Example:

Return employees who have job history.

SELECT e.employee_id

FROM employees e

WHERE EXISTS (

    SELECT 1

    FROM job_history j

    WHERE j.employee_id = e.employee_id

);

Explanation:

  • For each employee, check if at least one matching row exists.
  • Stops searching once match is found (efficient).

8. Correlated Subquery with NOT EXISTS

Find employees with no job history.

SELECT e.employee_id

FROM employees e

WHERE NOT EXISTS (

    SELECT 1

    FROM job_history j

    WHERE j.employee_id = e.employee_id

);

Safer than NOT IN (avoids NULL issues).

9. Performance Considerations

Why It Can Be Slow:

  • Inner query executes per outer row.
  • Large tables high logical I/O.
  • Missing indexes full scans repeatedly.

How to Improve Performance:

  • Add indexes on correlated columns.
  • Replace with JOIN if possible.
  • Use EXISTS instead of IN in large datasets.
  • Check execution plan.

10. Can Oracle Optimize Correlated Subqueries?

Yes.

Oracle optimizer may:

  • Transform it into a JOIN
  • Use HASH JOIN or NESTED LOOP
  • Apply subquery unnesting

So correlated subqueries are not always slow.

11. Correlated Subquery vs JOIN

Correlated version:

SELECT e.employee_id

FROM employees e

WHERE EXISTS (

    SELECT 1

    FROM departments d

    WHERE d.department_id = e.department_id

);

JOIN version:

SELECT e.employee_id

FROM employees e

JOIN departments d

ON e.department_id = d.department_id;

Both may produce same execution plan depending on optimizer.

12. Correlated Subquery in UPDATE

Example:

Give 10% raise to employees earning below department average.

UPDATE employees e

SET salary = salary * 1.10

WHERE salary < (

    SELECT AVG(salary)

    FROM employees

    WHERE department_id = e.department_id

);

Inner query runs per row being updated.

13. Correlated Subquery in DELETE

Delete employees earning below department average.

DELETE FROM employees e

WHERE salary < (

    SELECT AVG(salary)

    FROM employees

    WHERE department_id = e.department_id

);

14. Correlated Subquery with Multiple Conditions

SELECT e.employee_id

FROM employees e

WHERE EXISTS (

    SELECT 1

    FROM job_history j

    WHERE j.employee_id = e.employee_id

    AND j.end_date IS NOT NULL

);

You can add multiple filters inside subquery.

15. Real-World Business Scenario

Requirement:

Find highest paid employee in each department.

SELECT e.employee_id, e.department_id, e.salary

FROM employees e

WHERE e.salary = (

    SELECT MAX(salary)

    FROM employees

    WHERE department_id = e.department_id

);

This compares each employee salary with department max salary.

16. Common Interview Questions

Q1: What is a correlated subquery?

A subquery that references columns from the outer query and executes once per outer row.

Q2: Why can it be slow?

Because the inner query runs for every outer row.

Q3: How can you improve performance?

  • Add indexes
  • Rewrite as JOIN
  • Use EXISTS
  • Check execution plan

Q4: Difference between EXISTS and IN in correlated subqueries?

  • EXISTS checks row existence.
  • IN compares values.
  • EXISTS often better for large datasets.

17. Common Mistakes

  • Missing index on correlated column
  • Using NOT IN instead of NOT EXISTS
  • Not understanding execution frequency
  • Ignoring execution plan
  • Assuming correlated subquery always slow

18. Best Practices

  • Prefer EXISTS for correlated checks
  • Ensure indexing on correlated columns
  • Avoid complex nested correlated subqueries
  • Test with execution plan
  • Consider analytic functions as alternative

19. Alternative to Correlated Subquery (Analytic Function)

Instead of:

SELECT e.employee_id

FROM employees e

WHERE e.salary = (

    SELECT MAX(salary)

    FROM employees

    WHERE department_id = e.department_id

);

You could use:

SELECT employee_id

FROM (

    SELECT employee_id,

           department_id,

           salary,

           MAX(salary) OVER (PARTITION BY department_id) max_sal

    FROM employees

)

WHERE salary = max_sal;

Often more efficient for large datasets.

20. Short Interview Answer

“A correlated subquery is a subquery that references columns from the outer query and executes once for each outer row. It is commonly used with EXISTS for row-by-row validation. Proper indexing is important to avoid performance issues.”

 

No comments:

Post a Comment