The result of this query is, in fact, the cartesian product:
Hence, we can say that the CROSS JOIN produces the cartesian product—i.e. for every row in the left-hand set, our result contains a row for every row in the right-hand set.
Reducing the Cartesian Product
The first thing to notice about the CROSS JOIN syntax is that there is no ‘ON’ clause—we are not specifying any way to “match” the rows from the left- and right-hand sets. All other join types do have an ON clause and so we can reduce the result set based on intent. Let’s consider that we want to match the ID columns from our left and right tables—and so we might want to have a statement like this:
If we consider the ON clause, and highlight the ID values that match in our CROSS JOIN result set, we can see two rows fit the bill:
And if we actually execute the query, we can see that we do get those two rows returned:
We can say, therefore, that the INNER JOIN returns only the rows where both sides of the join can provide a matching key.
Again, I’d like to point out that this is not how the INNER JOIN result set is created physically—no database engine would create the cartesian product and then select rows from that.
Leveling Up
Going from the cartesian product to the result of an INNER JOIN is fairly simple. We can visualize it as certain combinations being ‘picked’ from the cartesian product to produce a result. Getting to the results of an OUTER JOIN is trickier. Let’s look at why.
If we think of the definition of an OUTER JOIN, it includes the rows that aren’t matched by a predicate. Because of the fact that rows are repeated in the cartesian product, it’s no longer natural to think of the JOIN as a filter.
Considering our query again, but this time using an outer join:
Let’s just highlight the keys that aren’t matched against that predicate in our diagram:
We’ve highlighted the rows from the left and right sets that wouldn’t form a part of our INNER JOIN result. However, we have three of each, which isn’t hugely helpful for our visualization because an OUTER JOIN result would not include multiple rows. Also, in an OUTER JOIN, the unmatched side is represented by NULL values—and there aren’t any pairs matched with NULL in our cartesian product.
A typical flow for this sort of operation is to get the cartesian product, reduce it, and then add outer rows. I’m not sure that adding outer rows makes as much sense in terms of explanation, so it’s perhaps easier to think of OUTER JOINs as starting with a set that includes both the matches and the unmatched rows:
With OUTER JOIN, we are simply specifying which unmatched rows we would like to keep. In a LEFT OUTER JOIN we are keeping the unmatched rows from the left only. Similarly, a RIGHT OUTER JOIN keeps those on the right only. A FULL OUTER JOIN will keep the unmatched rows from both sides.
Let’s have a look at the actual results:
We can see that the columns from the non-matching sides have NULL values, indicating that a match is missing.
Logical Versus Physical
In this post, we’ve talked about logical joins—the “what” rather than the “how” of querying. It’s worth taking a look at the syntax that T-SQL offers because join hints provide a way that the logical intent is mixed in with the physical execution. Looking at an example from the linked resource:
In this query, we’re interested in the ‘LEFT OUTER HASH JOIN’ phrase—it contains a mixture of logical intent and physical instruction. The ‘LEFT OUTER JOIN’ is the join we want to perform. When we add the ‘HASH’ we’re adding an instruction about how we want to perform that join. In this post we’ve covered the logical operation only—the physical operation will be a great blog topic in the future!
Summing It Up
Let’s summarise what we’ve covered:
- CROSS JOIN is the full cartesian product of the two sides of a JOIN.
- INNER JOIN is a reduction of the cartesian product—we specify a predicate and get a result where the predicate matches.
- OUTER JOINs are more than a simple reduction—because the cartesian product contains non-matching rows multiple times and does not contain any pairs that have one NULL side.
- LEFT, RIGHT and FULL OUTER JOINs simply change the side on which unmatched rows are kept.
So, we’ve looked at the logical implications of INNER, OUTER, and CROSS JOINs. There’s a lot of depth that we haven’t covered, including semi-joins, joins with table-valued function output using the APPLY operator, the physical execution of various JOIN types, and more. Those may well be the topic for another day!
Comments