Intersect and Except
Within the standard of SQL language there are clauses of SELECT statement for fulfillment of operations of intersection and subtraction of the queries. Such queries are INTERSECT [ALL] (intersection) and EXCEPT [ALL] (subtraction), which operate analogously to UNION statement. Into the resulting set only those rows are included that are in the both queries (INTERSECT) and only those rows of the first query, that are absent in the second one (EXCEPT). Thereby both of the queries, that are involved in the operation, should be characterized by the same number of columns, and the corresponding columns should have the same (or implied) data types. The titles of the columns of the resulting set are formed from the titles of the first query.
If the key word ALL is not used, then the duplicate strings should be automatically canceled during fulfillment of the operation. If ALL is indicated, then the number of duplicate rows is subject to the following rules (n1 – the number of duplicate rows of the first query, n2 – the number of duplicate rows of the second query):
- INTERSECT ALL: min(n1, n2)
- EXCEPT ALL: n1 - n2, if n1>n2.
Example 5.7.3
SELECT name FROM Ships
INTERSECT
SELECT ship FROM Outcomes;
[[ column ]] |
---|
[[ value ]] |
In relational algebra intersection operation is commutative, as it is applied to relationships with identical titles. In SQL we can also change the order of the queries. The above-cited solution will give the same result as the following:
SELECT ship FROM Outcomes
INTERSECT
SELECT name FROM Ships;
[[ column ]] |
---|
[[ value ]] |
SELECT name FROM (
SELECT ship FROM Outcomes
INTERSECT
SELECT name FROM Ships
) x;
[[ column ]] |
---|
[[ value ]] |
Example 5.7.4
The problem is easily solved by means of EXCEPT statement:
SELECT ship FROM Outcomes
EXCEPT
SELECT name FROM Ships;
[[ column ]] |
---|
[[ value ]] |
Operation of subtraction is not commutative, therefore if we change the order of the queries, then we will obtain the solution of another problem:
Pls. note, that if some ship has taken part in several battles, then it will be selected only once for the resulting set for the reason of excluding of the duplicates. We have got such a ship - California, but it is included into Ships table also, and that is why it is not selected by the aforementioned query. That is why in order to illustrate what has been said above, let’s exclude it from the result of the second query of the subtraction operation:
SELECT ship FROM Outcomes
EXCEPT
SELECT name FROM Ships WHERE name <> 'California';
[[ column ]] |
---|
[[ value ]] |
ship |
---|
Bismarck |
California |
Duke of York |
Fuso |
Hood |
King George V |
Prince of Wales |
Rodney |
Schamhorst |
West Virginia |
Yamashiro |
For the available data set we will obtain the same result while executing the following query:
SELECT ship FROM Outcomes
EXCEPT ALL
SELECT name FROM Ships;
(2 duplicates for ‘California’ in Outcomes table minus 1 - in Ships)
Accordingly the query
SELECT ship FROM Outcomes
EXCEPT ALL
SELECT name FROM Ships WHERE name<>'California';
will generate double selection of California ship in the resulting data set (2 - 0 = 2):
ship |
---|
Bismarck |
California |
California |
Duke of York |
Fuso |
Hood |
King George V |
Prince of Wales |
Rodney |
Schamhorst |
West Virginia |
Yamashiro |
It is worth mentioning that not all the DBMS support these clauses of SELECT statement. In particular, INTERSECT/EXCEPT are not supported by MySQL and has been supported by MS SQL Server beginning with 2005 version, and the key word ALL was not included into it. ALL in conjunction with INTERSECT/EXCEPT are not maintained by Oracle either. It should be noted that MINUS keyword is used in Oracle instead of standard EXCEPT.
Therefore for fulfillment of operations of intersection and subtraction other means can be used. It should be mentioned in this respect that the same result can be obtained using various wordings of SELECT statement. In the case of intersection and subtraction one can use the predicate of existence EXISTS.
In the end let’s consider the example of use of INTERSECT ALL operation.
Example 5.7.5
SELECT maker FROM (
SELECT maker FROM Product WHERE type='PC'
INTERSECT ALL
SELECT maker FROM Product WHERE type ='Printer'
) X GROUP BY maker HAVING COUNT(*)>1;
INTERSECT ALL in the subquery of this solution will leave the minimal number of duplicates, i.e. if the producer manufactures 2 models of PC and one model of the printer (or vice a versa), he will be mentioned in the resulting data set once. Further on we make the grouping in accordance with the name of producer, leaving only those of them, that are indicated in the results of the subquery more than once.
Of course, we can solve this problem without the use of intersection operation. For example, by means of one query we will select those producers, that manufacture not less than 2 PC models, and by means of another query we will select those that produce not less than 2 models of printers. The solution of the problem will lead to joining of the subqueries. Below this algorithm is achieved on the basis of one standard type of joins – natural join:
SELECT PC.maker FROM (
SELECT maker FROM Product
WHERE type='PC' GROUP BY maker HAVING COUNT(*)>1) PC
NATURAL JOIN
(
SELECT maker FROM Product
WHERE type='Printer' GROUP BY maker HAVING COUNT(*)>1) Pr;
NATURAL JOIN is equijoin within the columns with identical names. SQL Server does not support the natural join, therefore the last query may be executed, say, by means of PostgreSQL.
EXCEPT operator excludes duplicates of rows from query result. This fact gives us another way of duplicates elimination. Let’s consider available ways on the example of the following task (Painting DB is used).
- Using DISTINCT keyword
select distinct v_color from utV;
[[ column ]] |
---|
[[ value ]] |
- Using GROUP BY clause
select v_color from utV group by v_color;
[[ column ]] |
---|
[[ value ]] |
- Using EXCEPT operator
This way is based on the idea of “subtracting” a unexisting color, ex. ‘Z’ from existing colors:
select v_color from utV
except
select 'Z';
[[ column ]] |
---|
[[ value ]] |
Because v_color column does not allow NULL, the last query can be rewritten in more universal form:
select v_color from utV
except
select NULL;
[[ column ]] |
---|
[[ value ]] |
SQL Server estimates the cost of all these queries as equal. It is not surprising in view of the fact that each of the queries accomplishes scanning the table and sorting, which are the most “heavy” operations in the query execution plan.