1. What Is a Correlated Subquery?
A Correlated Subquery is a subquery that:
- References one or more columns from the outer query
- Cannot execute independently
- Is executed once for each row processed by the outer query
It is called correlated because the inner query depends on the outer query.
2. Basic Structure of a Correlated Subquery
SELECT outer_column
FROM table1 t1
WHERE column operator (
SELECT column
FROM table2 t2
WHERE t2.column = t1.column
);
Notice:
- t1.column (outer query column)
- Referenced inside the subquery
That reference creates the correlation.
3. How Does Oracle Execute a Correlated Subquery?
Execution Flow:
1. Outer query fetches first row.
2. Inner query runs using that row’s values.
3. Condition is evaluated.
4. Outer query moves to next row.
5. Inner query runs again.
So if outer query returns 10,000 rows → inner query runs 10,000 times.
4. Simple Example
Requirement:
Find employees earning more than the average salary of their department.
SELECT e.employee_id, e.salary, e.department_id
FROM employees e
WHERE e.salary > (
SELECT AVG(salary)
FROM employees
WHERE department_id = e.department_id
);
Explanation:
- Outer query processes one employee at a time.
- Inner query calculates average salary for that employee’s department.
- Comparison is done row by row.
5. How Is It Different from Non-Correlated Subquery?
|
Feature |
Correlated |
Non-Correlated |
|
References outer query |
Yes |
No |
|
Execution frequency |
Per row |
Once |
|
Performance impact |
Can be slower |
Usually faster |
|
Dependency |
Dependent |
Independent |
6. When Should You Use Correlated Subqueries?
Use when:
- Comparison depends on each row
- Group-based filtering is required
- EXISTS logic is needed
- Per-row validation is required
Common use cases:
- Top earners per department
- Duplicate detection
- Conditional existence checks
- Filtering based on related table values
7. Correlated Subquery with EXISTS
Very common and efficient pattern.
Example:
Return employees who have job history.
SELECT e.employee_id
FROM employees e
WHERE EXISTS (
SELECT 1
FROM job_history j
WHERE j.employee_id = e.employee_id
);
Explanation:
- For each employee, check if at least one matching row exists.
- Stops searching once match is found (efficient).
8. Correlated Subquery with NOT EXISTS
Find employees with no job history.
SELECT e.employee_id
FROM employees e
WHERE NOT EXISTS (
SELECT 1
FROM job_history j
WHERE j.employee_id = e.employee_id
);
Safer than NOT IN (avoids NULL issues).
9. Performance Considerations
Why It Can Be Slow:
- Inner query executes per outer row.
- Large tables → high logical I/O.
- Missing indexes → full scans repeatedly.
How to Improve Performance:
- Add indexes on correlated columns.
- Replace with JOIN if possible.
- Use EXISTS instead of IN in large datasets.
- Check execution plan.
10. Can Oracle Optimize Correlated Subqueries?
Yes.
Oracle optimizer may:
- Transform it into a JOIN
- Use HASH JOIN or NESTED LOOP
- Apply subquery unnesting
So correlated subqueries are not always slow.
11. Correlated Subquery vs JOIN
Correlated version:
SELECT e.employee_id
FROM employees e
WHERE EXISTS (
SELECT 1
FROM departments d
WHERE d.department_id = e.department_id
);
JOIN version:
SELECT e.employee_id
FROM employees e
JOIN departments d
ON e.department_id = d.department_id;
Both may produce same execution plan depending on optimizer.
12. Correlated Subquery in UPDATE
Example:
Give 10% raise to employees earning below department average.
UPDATE employees e
SET salary = salary * 1.10
WHERE salary < (
SELECT AVG(salary)
FROM employees
WHERE department_id = e.department_id
);
Inner query runs per row being updated.
13. Correlated Subquery in DELETE
Delete employees earning below department average.
DELETE FROM employees e
WHERE salary < (
SELECT AVG(salary)
FROM employees
WHERE department_id = e.department_id
);
14. Correlated Subquery with Multiple Conditions
SELECT e.employee_id
FROM employees e
WHERE EXISTS (
SELECT 1
FROM job_history j
WHERE j.employee_id = e.employee_id
AND j.end_date IS NOT NULL
);
You can add multiple filters inside subquery.
15. Real-World Business Scenario
Requirement:
Find highest paid employee in each department.
SELECT e.employee_id, e.department_id, e.salary
FROM employees e
WHERE e.salary = (
SELECT MAX(salary)
FROM employees
WHERE department_id = e.department_id
);
This compares each employee salary with department max salary.
16. Common Interview Questions
Q1: What is a correlated subquery?
A subquery that references columns from the outer query and executes once per outer row.
Q2: Why can it be slow?
Because the inner query runs for every outer row.
Q3: How can you improve performance?
- Add indexes
- Rewrite as JOIN
- Use EXISTS
- Check execution plan
Q4: Difference between EXISTS and IN in correlated subqueries?
- EXISTS checks row existence.
- IN compares values.
- EXISTS often better for large datasets.
17. Common Mistakes
- Missing index on correlated column
- Using NOT IN instead of NOT EXISTS
- Not understanding execution frequency
- Ignoring execution plan
- Assuming correlated subquery always slow
18. Best Practices
- Prefer EXISTS for correlated checks
- Ensure indexing on correlated columns
- Avoid complex nested correlated subqueries
- Test with execution plan
- Consider analytic functions as alternative
19. Alternative to Correlated Subquery (Analytic Function)
Instead of:
SELECT e.employee_id
FROM employees e
WHERE e.salary = (
SELECT MAX(salary)
FROM employees
WHERE department_id = e.department_id
);
You could use:
SELECT employee_id
FROM (
SELECT employee_id,
department_id,
salary,
MAX(salary) OVER (PARTITION BY department_id) max_sal
FROM employees
)
WHERE salary = max_sal;
Often more efficient for large datasets.
20. Short Interview Answer
“A correlated subquery is a subquery that references columns from the outer query and executes once for each outer row. It is commonly used with EXISTS for row-by-row validation. Proper indexing is important to avoid performance issues.”
No comments:
Post a Comment