Re: Ignored join clause

Lists: pgsql-bugs
From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Ignored join clause
Date: 2018-04-18 23:34:47
Message-ID: f8128b11-c5bf-3539-48cd-234178b2314d@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hi,

It seems to me like PostgreSQL incorrectly removes a join clause when
planning some queries. I discovered this while debugging a large query,
which I below have simplified as much as I could. I suspect the bug may
be related to the lateral join but I am not sure.

This bug appears in both 10.3 and master (at least).

This works:

CREATE TABLE t AS SELECT 1 AS x, '{10,20}'::int[] AS ys;

SELECT *
FROM t
JOIN (VALUES (1, 10), (2, 20)) AS q1 (x, y) ON q1.x = t.x
LEFT JOIN unnest(ys) q2 (y) ON q2.y = q1.y;

x | ys | x | y | y
---+---------+---+----+----
1 | {10,20} | 1 | 10 | 10
(1 row)

This does not:

SELECT *
FROM t
LEFT JOIN (VALUES (1, 10), (2, 20)) AS q1 (x, y) ON q1.x = t.x
LEFT JOIN unnest(ys) q2 (y) ON q2.y = q1.y;

x | ys | x | y | y
---+---------+---+----+----
1 | {10,20} | 1 | 10 | 10
1 | {10,20} | 2 | 20 |
(2 rows)

I expect both these queries to return the same data on this data set,.
And the second row of the result violates "q1.x = t.x".

JOIN plan:

QUERY PLAN

---------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.05..56.90 rows=13 width=48)
Output: t.x, t.ys, "*VALUES*".column1, "*VALUES*".column2, q2.y
Join Filter: (q2.y = "*VALUES*".column2)
-> Hash Join (cost=0.05..27.64 rows=13 width=44)
Output: t.x, t.ys, "*VALUES*".column1, "*VALUES*".column2
Hash Cond: (t.x = "*VALUES*".column1)
-> Seq Scan on public.t (cost=0.00..22.70 rows=1270 width=36)
Output: t.x, t.ys
-> Hash (cost=0.03..0.03 rows=2 width=8)
Output: "*VALUES*".column1, "*VALUES*".column2
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2
width=8)
Output: "*VALUES*".column1, "*VALUES*".column2
-> Function Scan on pg_catalog.unnest q2 (cost=0.00..1.00 rows=100
width=4)
Output: q2.y
Function Call: unnest(t.ys)
(15 rows)

LEFT JOIN plan:

QUERY PLAN

---------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.05..1826.15 rows=1270 width=48)
Output: t.x, t.ys, "*VALUES*".column1, "*VALUES*".column2, q2.y
-> Seq Scan on public.t (cost=0.00..22.70 rows=1270 width=36)
Output: t.x, t.ys
-> Hash Right Join (cost=0.05..1.45 rows=2 width=12)
Output: "*VALUES*".column1, "*VALUES*".column2, q2.y
Hash Cond: (q2.y = "*VALUES*".column2)
Join Filter: ("*VALUES*".column1 = t.x)
-> Function Scan on pg_catalog.unnest q2 (cost=0.00..1.00
rows=100 width=4)
Output: q2.y
Function Call: unnest(t.ys)
-> Hash (cost=0.03..0.03 rows=2 width=8)
Output: "*VALUES*".column1, "*VALUES*".column2
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2
width=8)
Output: "*VALUES*".column1, "*VALUES*".column2
(15 rows)

Andreas


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Ignored join clause
Date: 2018-04-19 02:50:52
Message-ID: 87muy0dk4a.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

>>>>> "Andreas" == Andreas Karlsson <andreas(at)proxel(dot)se> writes:

Andreas> Hi,

Andreas> It seems to me like PostgreSQL incorrectly removes a join
Andreas> clause when planning some queries. I discovered this while
Andreas> debugging a large query, which I below have simplified as much
Andreas> as I could. I suspect the bug may be related to the lateral
Andreas> join but I am not sure.

Fascinating.

What's happening here is not that the condition is being ignored, but
rather that what should be a simple filter condition (or a join filter
at the upper level) is being placed in the "Join Filter" slot of an
outer join at the inner level - where the condition's falsity doesn't
remove the whole row but causes it to be treated as unmatched.

My suspicion is that this is an interaction between lateral and join
reordering. Looking into it further.

--
Andrew (irc:RhodiumToad)


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Ignored join clause
Date: 2018-04-19 11:59:40
Message-ID: 87efjbe8rz.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

>>>>> "Andrew" == Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:

Andrew> My suspicion is that this is an interaction between lateral and
Andrew> join reordering. Looking into it further.

I think I was wrong, and that in fact this is a much more general
problem which amounts to a lack of communication between
get_joinrel_parampathinfo and extract_actual_join_clauses.

When we build the hash join path between q1 and q2,
get_joinrel_parampathinfo adds the q1.x=t.x clause to the
restrict_clauses list, but it doesn't distinguish it in any way from the
clauses that were there already.

Later when building the final Plan node for the hash join, we call
extract_actual_join_clauses to determine which clauses are join clauses
rather than filters. But this only looks at the RestrictInfo's
is_pushed_down field, and in this case that's wrong; is_pushed_down
isn't set, because the condition really was a join clause at the place
where it was originally written, but the condition has now been moved
from its original place by the parameterization and is effectively
pushed down even though it's not marked as such. So it ends up on the
wrong list.

--
Andrew (irc:RhodiumToad)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Ignored join clause
Date: 2018-04-19 15:11:32
Message-ID: 21075.1524150692@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> "Andrew" == Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> Andrew> My suspicion is that this is an interaction between lateral and
> Andrew> join reordering. Looking into it further.

> I think I was wrong, and that in fact this is a much more general
> problem which amounts to a lack of communication between
> get_joinrel_parampathinfo and extract_actual_join_clauses.

I've not got to the bottom of it yet either, but I notice that if you
replace the VALUES thingy with a plain table, the bug goes away:

regression=# create table q1(x int, y int);
CREATE TABLE
regression=# insert into q1 values (1, 10), (2, 20);
INSERT 0 2
regression=# SELECT *
FROM t
LEFT JOIN q1 ON q1.x = t.x
LEFT JOIN unnest(ys) q2 (y) ON q2.y = q1.y;
x | ys | x | y | y
---+---------+---+----+----
1 | {10,20} | 1 | 10 | 10
(1 row)

The plan's much saner-looking, too:

Nested Loop Left Join (cost=246.68..32758.05 rows=14351 width=48)
Output: t.x, t.ys, q1.x, q1.y, q2.y
Join Filter: (q2.y = q1.y)
-> Merge Left Join (cost=246.68..468.29 rows=14351 width=44)
Output: t.x, t.ys, q1.x, q1.y
Merge Cond: (t.x = q1.x)
-> Sort (cost=88.17..91.35 rows=1270 width=36)
Output: t.x, t.ys
Sort Key: t.x
-> Seq Scan on public.t (cost=0.00..22.70 rows=1270 width=36)
Output: t.x, t.ys
-> Sort (cost=158.51..164.16 rows=2260 width=8)
Output: q1.x, q1.y
Sort Key: q1.x
-> Seq Scan on public.q1 (cost=0.00..32.60 rows=2260 width=8)
Output: q1.x, q1.y
-> Function Scan on pg_catalog.unnest q2 (cost=0.00..1.00 rows=100 width=4)
Output: q2.y
Function Call: unnest(t.ys)

So I'm suspicious that the real issue here has to do with the weird
subselect representation we use for VALUES; either that's broken in
itself somehow, or more likely the planner gets confused while
flattening it.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Ignored join clause
Date: 2018-04-19 15:20:39
Message-ID: 21505.1524151239@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

I wrote:
> I've not got to the bottom of it yet either, but I notice that if you
> replace the VALUES thingy with a plain table, the bug goes away:

Oh, scratch that --- if you "ANALYZE q1", it goes right back to the
bogus plan. So knowing that q1 is small is part of the triggering
condition for picking the bogus plan, but VALUES per se isn't.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Ignored join clause
Date: 2018-04-19 16:58:47
Message-ID: 18363.1524157127@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> I think I was wrong, and that in fact this is a much more general
> problem which amounts to a lack of communication between
> get_joinrel_parampathinfo and extract_actual_join_clauses.

Yeah, I think you're right. The rules about moving clauses down into
a parameterized path may require something that had been a regular
outer join clause to be moved to a join below its syntactic level.
If we'd done that because the clause was actually degenerate (not
mentioning the outer join's LHS), we'd have marked it is_pushed_down,
but that doesn't seem practical in this situation since the RestrictInfo
probably is also referenced in other paths where it's not pushed down.
And I don't want to start making multiple RestrictInfos for the same
clause --- that'll break some other stuff.

So the only practical answer seems to be to teach
extract_actual_join_clauses to check the clause's syntactic level
along with is_pushed_down, as per the attached patch.

Out of curiosity, I put

Assert(bms_is_subset(rinfo->required_relids, joinrelids));

in there, and verified that we have no existing regression test
that hits the assertion, though of course Andreas' example does.

I've been pretty dissatisfied with the squishiness of is_pushed_down
for some time (cf comments in initsplan.c around line 1770), and this
bug seems like a sufficient reason to write it out of existence
entirely. But that's surely not going to lead to a back-patchable
change, so it's likely something for v12.

regards, tom lane

Attachment Content-Type Size
parameterization-filter-clause-fix.patch text/x-diff 3.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Ignored join clause
Date: 2018-04-20 16:32:55
Message-ID: 27781.1524241975@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

I wrote:
> So the only practical answer seems to be to teach
> extract_actual_join_clauses to check the clause's syntactic level
> along with is_pushed_down, as per the attached patch.

After further thought about this, I decided that that patch didn't go
nearly far enough: in reality, just about every single place we test
RestrictInfo.is_pushed_down is potentially wrong in the same way that
extract_actual_join_clauses was, and needs to be taught to check the
clause relids in the same way. Hence the attached follow-on patch.
There might be a few of these places that don't really need the change
because they can never be reached while considering a parameterized join
path ... but I don't care to bet that that's true and will stay true.

There are a couple of places in analyzejoins.c that were already
examining required_relids, but were using bms_equal, which now seems
overly strict; this patch makes them use bms_subset like the other
places. Thoughts about that?

Also, I wondered in commit 306d6e59f whether changing the signature of
extract_actual_join_clauses in the back branches was really a good idea.
While it's still unpleasant, I'm inclined to leave it that way, because
any code that is using that function is almost certainly broken due to
this issue and needs to be fixed anyway. I see that the reason
postgres_fdw doesn't use that function as of v10 is commit 28b047875,
which probably should have been back-patched to 9.6 anyhow.

The patch as attached adds a couple of bms_union() steps to calculate join
relids in places that didn't have them handy. In both places this could
be avoided by passing down the join relids from a caller, but it'd require
API changes to globally-visible routines, so I thought eating some cycles
would be a safer solution in the back branches. We could clean that up
in HEAD though.

regards, tom lane

Attachment Content-Type Size
is-pushed-down-followup.patch text/x-diff 20.5 KB