Quick Links

AW: A fine point about OUTER JOIN semantics

Lists:	pgsql-hackers

From:	Zeugswetter Andreas SB <ZeugswetterA(at)wien(dot)spardat(dot)at>
To:	"'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject:	AW: A fine point about OUTER JOIN semantics
Date:	2000-09-05 14:56:59
Message-ID:	11C1E6749A55D411A9670001FA687963368065@sdexcsrv1.f000.d0188.sd.spardat.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I can tell you the results Informix produces:

> Am I right in thinking that the WHERE clause of a query must logically
> be applied *after* any joins specified in the FROM clause?
>
> For example, suppose that we have table t1 (x int) containing the
> values 1, 2, 3, 4, and table t2 (y int) containing the values 1, 2, 4.
> It's clear that the result of
> SELECT * FROM t1 LEFT JOIN t2 ON (x = y);
> should be
> x y
>
> 1 1
> 2 2
> 3 NULL
> 4 4

same

>
> But suppose we make the query
> SELECT * FROM t1 LEFT JOIN t2 ON (x = y) WHERE y <> 2;
> It seems to me this should yield
> x y
>
> 1 1
> 3 NULL
> 4 4
>
> and not
> x y
>
> 1 1
> 2 NULL
> 3 NULL
> 4 4

x y

1 1
4 4

>
> which is what you'd get if the y=2 tuple were filtered out before
> reaching the left-join stage. Does anyone read the spec differently,
> or get the latter result from another implementation?
>
> The reason this is interesting is that this example breaks a rather
> fundamental assumption in our planner/optimizer, namely that WHERE
> conditions can be pushed down to the lowest level at which all the
> variables they mention are available. Thus the planner would normally
> apply "y <> 2" during its bottom-level scan of t2, which
> would cause the
> LEFT JOIN to decide that x = 2 is an unmatched value, and thus produce
> a "2 NULL" output row.
>
> An even more interesting example is
> SELECT * FROM t1 FULL JOIN t2 ON (x = y AND y <> 2);
> My interpretation is that this should produce
> x y
>
> 1 1
> 2 NULL
> NULL 2
> 3 NULL
> 4 4

x y

1 1
4 4

> since both t1's x=2 and t2's y=2 tuple will appear "unmatched".
> This is *not* the same output you'd get from
> SELECT * FROM t1 FULL JOIN t2 ON (x = y) WHERE y <> 2;
> which I think should yield
> x y
>
> 1 1
> 3 NULL
> 4 4
> This shows that JOIN/ON conditions for outer joins are not
> semantically
> interchangeable with WHERE conditions.

x y

1 1
4 4

>
> This is going to be a bit of work to fix, so I thought I'd better
> confirm that I'm reading the spec correctly before I dive into it.

No idea if they interpret correctly, but seems they hand it interchangeably.
Someone want to check Oracle and MS Sql ?

Andreas

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Zeugswetter Andreas SB <ZeugswetterA(at)wien(dot)spardat(dot)at>
Cc:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: AW: A fine point about OUTER JOIN semantics
Date:	2000-09-05 15:30:15
Message-ID:	1251.968167815@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Zeugswetter Andreas SB <ZeugswetterA(at)wien(dot)spardat(dot)at> writes:
>> But suppose we make the query
>> SELECT * FROM t1 LEFT JOIN t2 ON (x = y) WHERE y <> 2;
>> It seems to me this should yield
>> x y
>>
>> 1 1
>> 3 NULL
>> 4 4

> x y

> 1 1
> 4 4

Oh, my mistake, I forgot that the WHERE clause would filter out NULLs.
Try
SELECT * FROM t1 LEFT JOIN t2 ON (x = y) WHERE y <> 2 OR y IS NULL;

>> An even more interesting example is
>> SELECT * FROM t1 FULL JOIN t2 ON (x = y AND y <> 2);
>> My interpretation is that this should produce
>> x y
>>
>> 1 1
>> 2 NULL
>> NULL 2
>> 3 NULL
>> 4 4

> x y

> 1 1
> 4 4

Here I believe Informix is broken. Their result clearly does not
agree with the spec's definition of a FULL JOIN ... indeed it looks
exactly like an inner join.

regards, tom lane