Re: A fine point about OUTER JOIN semantics

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: A fine point about OUTER JOIN semantics
Date: 2000-09-01 20:47:14
Message-ID: 28874.967841234@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg와이즈 토토SQL

Am I right in thinking that the WHERE clause of a query must logically
be applied *after* any joins specified in the FROM clause?

For example, suppose that we have table t1 (x int) containing the
values 1, 2, 3, 4, and table t2 (y int) containing the values 1, 2, 4.
It's clear that the result of
SELECT * FROM t1 LEFT JOIN t2 ON (x = y);
should be
x y

1 1
2 2
3 NULL
4 4

But suppose we make the query
SELECT * FROM t1 LEFT JOIN t2 ON (x = y) WHERE y <> 2;
It seems to me this should yield
x y

1 1
3 NULL
4 4

and not
x y

1 1
2 NULL
3 NULL
4 4

which is what you'd get if the y=2 tuple were filtered out before
reaching the left-join stage. Does anyone read the spec differently,
or get the latter result from another implementation?

The reason this is interesting is that this example breaks a rather
fundamental assumption in our planner/optimizer, namely that WHERE
conditions can be pushed down to the lowest level at which all the
variables they mention are available. Thus the planner would normally
apply "y <> 2" during its bottom-level scan of t2, which would cause the
LEFT JOIN to decide that x = 2 is an unmatched value, and thus produce
a "2 NULL" output row.

An even more interesting example is
SELECT * FROM t1 FULL JOIN t2 ON (x = y AND y <> 2);
My interpretation is that this should produce
x y

1 1
2 NULL
NULL 2
3 NULL
4 4
since both t1's x=2 and t2's y=2 tuple will appear "unmatched".
This is *not* the same output you'd get from
SELECT * FROM t1 FULL JOIN t2 ON (x = y) WHERE y <> 2;
which I think should yield
x y

1 1
3 NULL
4 4
This shows that JOIN/ON conditions for outer joins are not semantically
interchangeable with WHERE conditions.

This is going to be a bit of work to fix, so I thought I'd better
confirm that I'm reading the spec correctly before I dive into it.

Comments?

regards, tom lane


From: Thomas Swan <tswan-lst(at)tangent(dot)ics(dot)olemiss(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: A fine point about OUTER JOIN semantics
Date: 2000-09-01 21:08:51
Message-ID: 5.0.0.13.2.20000901160825.01d03ed0@tangent.ics.olemiss.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom I'd be happy to go back through any specs for second thoughts... do you
have a URL to go to?

At 9/1/2000 04:47 PM -0400, Tom Lane wrote:
>Am I right in thinking that the WHERE clause of a query must logically
>be applied *after* any joins specified in the FROM clause?
>
>For example, suppose that we have table t1 (x int) containing the
>values 1, 2, 3, 4, and table t2 (y int) containing the values 1, 2, 4.
>It's clear that the result of
> SELECT * FROM t1 LEFT JOIN t2 ON (x = y);
>should be
> x y
>
> 1 1
> 2 2
> 3 NULL
> 4 4
>
>But suppose we make the query
> SELECT * FROM t1 LEFT JOIN t2 ON (x = y) WHERE y <> 2;
>It seems to me this should yield
> x y
>
> 1 1
> 3 NULL
> 4 4
>
>and not
> x y
>
> 1 1
> 2 NULL
> 3 NULL
> 4 4
>
>which is what you'd get if the y=2 tuple were filtered out before
>reaching the left-join stage. Does anyone read the spec differently,
>or get the latter result from another implementation?
>
>The reason this is interesting is that this example breaks a rather
>fundamental assumption in our planner/optimizer, namely that WHERE
>conditions can be pushed down to the lowest level at which all the
>variables they mention are available. Thus the planner would normally
>apply "y <> 2" during its bottom-level scan of t2, which would cause the
>LEFT JOIN to decide that x = 2 is an unmatched value, and thus produce
>a "2 NULL" output row.
>
>An even more interesting example is
> SELECT * FROM t1 FULL JOIN t2 ON (x = y AND y <> 2);
>My interpretation is that this should produce
> x y
>
> 1 1
> 2 NULL
> NULL 2
> 3 NULL
> 4 4
>since both t1's x=2 and t2's y=2 tuple will appear "unmatched".
>This is *not* the same output you'd get from
> SELECT * FROM t1 FULL JOIN t2 ON (x = y) WHERE y <> 2;
>which I think should yield
> x y
>
> 1 1
> 3 NULL
> 4 4
>This shows that JOIN/ON conditions for outer joins are not semantically
>interchangeable with WHERE conditions.
>
>This is going to be a bit of work to fix, so I thought I'd better
>confirm that I'm reading the spec correctly before I dive into it.
>
>Comments?
>
> regards, tom lane


From: Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: A fine point about OUTER JOIN semantics
Date: 2000-09-02 06:20:26
Message-ID: 39B09C2A.FEBC6B85@alumni.caltech.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Am I right in thinking that the WHERE clause of a query must logically
> be applied *after* any joins specified in the FROM clause?
...
> This shows that JOIN/ON conditions for outer joins are not semantically
> interchangeable with WHERE conditions.

Right. In some cases, an outer join with WHERE restrictions reduces to
an inner join (so the qualification clauses can be consolidated). Our
optimizer should be on the lookout for that, at least eventually.

- Thomas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: A fine point about OUTER JOIN semantics
Date: 2000-09-02 16:40:58
Message-ID: 1206.967912858@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu> writes:
> In some cases, an outer join with WHERE restrictions reduces to
> an inner join (so the qualification clauses can be consolidated).

I recall you having muttered something about that before, but I don't
see how it works. Can you give an example of an outer join that
reduces to an inner join?

regards, tom lane


From: Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: A fine point about OUTER JOIN semantics
Date: 2000-09-04 18:25:28
Message-ID: 39B3E918.2D869F2B@alumni.caltech.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> > In some cases, an outer join with WHERE restrictions reduces to
> > an inner join (so the qualification clauses can be consolidated).
> I recall you having muttered something about that before, but I don't
> see how it works. Can you give an example of an outer join that
> reduces to an inner join?

Hmm. This example is pretty silly, but afaik it reduces to an inner
join:

select i, j from t1 left join t2 using (i) where j is not null;

(where t1 has column "i" and t2 has columns "i" and "j").

- Thomas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: A fine point about OUTER JOIN semantics
Date: 2000-09-04 21:35:40
Message-ID: 1690.968103340@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg롤 토토SQL :

Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu> writes:
>>>> In some cases, an outer join with WHERE restrictions reduces to
>>>> an inner join (so the qualification clauses can be consolidated).
>> I recall you having muttered something about that before, but I don't
>> see how it works. Can you give an example of an outer join that
>> reduces to an inner join?

> Hmm. This example is pretty silly, but afaik it reduces to an inner
> join:

> select i, j from t1 left join t2 using (i) where j is not null;

> (where t1 has column "i" and t2 has columns "i" and "j").

Well, I guess so, but I can't get excited about adding machinery to
detect cases like this ... are there any less-silly examples that make
a more compelling case for expending planner cycles to see if an outer
join can be reduced to an inner join?

regards, tom lane


From: Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: A fine point about OUTER JOIN semantics
Date: 2000-09-05 14:29:22
Message-ID: 39B50342.69108852@alumni.caltech.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Well, I guess so, but I can't get excited about adding machinery to
> detect cases like this ... are there any less-silly examples that make
> a more compelling case for expending planner cycles to see if an outer
> join can be reduced to an inner join?

Well, in all cases the outer join reduces to an inner join if there is a
qualification which would eliminate nulls from any intermediate result.
It would be neat to see this happen automagically, but perhaps that
would be a gratuitously studly feature ;)

- Thomas