Quick Links

Re: Removing unneeded self joins

Lists:	pgsql-hackers

From:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Removing unneeded self joins
Date:	2018-05-16 15:43:33
Message-ID:	64486b0b-0404-e39e-322d-0801154901f3@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg사설 토토 사이트SQL

Hi hackers,

There is a join optimization we don't do -- removing inner join of a
table with itself on a unique column. Such joins are generated by
various ORMs, so from time to time our customers ask us to look into
this. Most recently, it was discussed on the list in relation to an
article comparing the optimizations that some DBMS make [1].

I started to explore what can be done about this. Attached is a proof of
concept patch. It works for some simple cases:

create table tt(a int primary key, b text);
explain select p.* from tt p join (select * from tt where b ~~ 'a%') q
on p.a = q.a;

QUERY PLAN
──────────────────────────────────────────────────────
Seq Scan on tt p (cost=0.00..25.88 rows=6 width=36)
Filter: (b ~~ 'a%'::text)

It also works for semi-joins like `explain select p.* from tt p where
exists (select * from tt where b ~~ 'a%' and a = p.a);`. This requires a
preparatory step of reducing unique semi joins to inner joins, and we
already do this (reduce_unique_semijoin).

What this patch tries to do is to remove these inner joins when a single
join is being planned (populate_joinrel_with_paths). The main entry
point is reduce_self_unique_join. First, it proves that both input
relations are uniquely constrained by the same index given the
particular join clauses. We already have a way to find such indexes
(relation_has_unique_index_for), so I was able to reuse this. What I'm
not sure about is how to properly remove the join after that. For now, I
just pretend that the join relation being built is the outer baserel,
add to it the restrictions from the inner relation, and then plan it as
usual. Maybe there is a less hacky way to do it? I've seen elsewhere a
suggestion to use an AppendPath for a similar purpose, but here we can't
just use the outer relation we've already planned because the
restriction list is different.

I'd be glad to hear your thoughts on this.

[1]
/message-id/flat/CAMjNa7cC4X9YR-vAJS-jSYCajhRDvJQnN7m2sLH1wLh-_Z2bsw%40mail(dot)gmail(dot)com#CAMjNa7cC4X9YR-vAJS-jSYCajhRDvJQnN7m2sLH1wLh-_Z2bsw(at)mail(dot)gmail(dot)com

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
remove-self-join-v1.patch	text/x-patch	18.3 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 16:08:46
Message-ID:	3295.1526486926@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg와이즈 토토SQL

Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru> writes:
> There is a join optimization we don't do -- removing inner join of a
> table with itself on a unique column. Such joins are generated by
> various ORMs, so from time to time our customers ask us to look into
> this. Most recently, it was discussed on the list in relation to an
> article comparing the optimizations that some DBMS make [1].

This is the sort of thing that I always wonder why the customers don't
ask the ORM to stop generating such damfool queries. Its *expensive*
for us to clean up after their stupidity; almost certainly, it would
take far fewer cycles, net, for them to be a bit smarter in the first
place.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 16:26:48
Message-ID:	CA+TgmoYuH_Tkdv7_fxOinJLf7gw6NyYq1Lm=gevq8aoj507g4A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, May 16, 2018 at 12:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru> writes:
>> There is a join optimization we don't do -- removing inner join of a
>> table with itself on a unique column. Such joins are generated by
>> various ORMs, so from time to time our customers ask us to look into
>> this. Most recently, it was discussed on the list in relation to an
>> article comparing the optimizations that some DBMS make [1].
>
> This is the sort of thing that I always wonder why the customers don't
> ask the ORM to stop generating such damfool queries. Its *expensive*
> for us to clean up after their stupidity; almost certainly, it would
> take far fewer cycles, net, for them to be a bit smarter in the first
> place.

The trouble, of course, is that the customer didn't write the ORM,
likely has no idea how it works, and doesn't want to run a modified
version of it even if they do. If the queries run faster on other
systems than they do on PostgreSQL, we get dinged -- not unjustly.

Also, I'm not sure that I believe that it's always easy to avoid
generating such queries. I mean, this case is trivial so it's easy to
say, well, just rewrite the query. But suppose that I have a fact
table over which I've created two views, each of which performs
various joins between the fact table and various lookup tables. My
queries are such that I normally need the joins in just one of these
two views and not the other to fetch the information I care about.
But every once in a while I need to run a report that involves pulling
every column possible. The obvious solution is to join the views on
the underlying table's primary key, but then you get this problem. Of
course there's a workaround: define a third view that does both sets
of joins-to-lookup-tables. But that starts to feel like you're
handholding the database; surely it's the database's job to optimize
queries, not the user's.

It's been about 10 years since I worked as a web developer, but I do
remember hitting this kind of problem from time to time and I'd really
like to see us do something about it. I wish we could optimize away
inner joins, too, for similar reasons.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 18:58:40
Message-ID:	20180516185840.a7crgjho7txxqmi4@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg범퍼카 토토SQL

Hi,

On 2018-05-16 12:26:48 -0400, Robert Haas wrote:
> Also, I'm not sure that I believe that it's always easy to avoid
> generating such queries.

Yea. There's obviously plenty cases where ORMs just want to make the
database hurt. But especially when building a join between a number of
tables based on various fields, it's not going to be easy for the ORM to
figure out which ones can be safely omitted. It'd need similar
optimization as we'd have to do, without having the infrastructure core
PG has. And then there's, as you say, views etc...

Greetings,

Andres Freund

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 20:04:14
Message-ID:	CANP8+jLFEQBKRz9q1c2yWK9so4K44Z24sQacoZrCew9NZhcf1A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg스포츠 토토 사이트SQL

On 16 May 2018 at 11:26, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, May 16, 2018 at 12:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru> writes:
>>> There is a join optimization we don't do -- removing inner join of a
>>> table with itself on a unique column. Such joins are generated by
>>> various ORMs, so from time to time our customers ask us to look into
>>> this. Most recently, it was discussed on the list in relation to an
>>> article comparing the optimizations that some DBMS make [1].
>>
>> This is the sort of thing that I always wonder why the customers don't
>> ask the ORM to stop generating such damfool queries. Its *expensive*
>> for us to clean up after their stupidity; almost certainly, it would
>> take far fewer cycles, net, for them to be a bit smarter in the first
>> place.
>
> The trouble, of course, is that the customer didn't write the ORM,
> likely has no idea how it works, and doesn't want to run a modified
> version of it even if they do. If the queries run faster on other
> systems than they do on PostgreSQL, we get dinged -- not unjustly.
>
> Also, I'm not sure that I believe that it's always easy to avoid
> generating such queries. I mean, this case is trivial so it's easy to
> say, well, just rewrite the query. But suppose that I have a fact
> table over which I've created two views, each of which performs
> various joins between the fact table and various lookup tables. My
> queries are such that I normally need the joins in just one of these
> two views and not the other to fetch the information I care about.
> But every once in a while I need to run a report that involves pulling
> every column possible. The obvious solution is to join the views on
> the underlying table's primary key, but then you get this problem. Of
> course there's a workaround: define a third view that does both sets
> of joins-to-lookup-tables. But that starts to feel like you're
> handholding the database; surely it's the database's job to optimize
> queries, not the user's.
>
> It's been about 10 years since I worked as a web developer, but I do
> remember hitting this kind of problem from time to time and I'd really
> like to see us do something about it. I wish we could optimize away
> inner joins, too, for similar reasons.

I agree with everything you say.

What I would add is that I've seen cases where the extra joins do NOT
hurt performance, so the extra CPU used to remove the join hurts more
than the benefit of removing it. Yes, we tried it.

More advanced optimizations should only be applied when we've assessed
that the likely run time is high enough to make it worth investing in
further optimization.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 20:10:45
Message-ID:	10951.1526501445@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg와이즈 토토SQL

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> What I would add is that I've seen cases where the extra joins do NOT
> hurt performance, so the extra CPU used to remove the join hurts more
> than the benefit of removing it. Yes, we tried it.

Interesting. The concern I had was more about the cost imposed on every
query to detect self-joins and try to prove them useless, even in queries
where no benefit ensues. It's possible that we can get that down to the
point where it's negligible; but this says that even the successful-proof
case has to be very cheap.

regards, tom lane

From:	"Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 20:36:51
Message-ID:	E238CD3A-993A-4FD1-8920-988B00C01478@postgresql.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> On May 16, 2018, at 1:58 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2018-05-16 12:26:48 -0400, Robert Haas wrote:
>> Also, I'm not sure that I believe that it's always easy to avoid
>> generating such queries.
>
> Yea. There's obviously plenty cases where ORMs just want to make the
> database hurt. But especially when building a join between a number of
> tables based on various fields, it's not going to be easy for the ORM to
> figure out which ones can be safely omitted. It'd need similar
> optimization as we'd have to do, without having the infrastructure core
> PG has. And then there's, as you say, views etc…

Are there specific examples of what the ORM code is that generated
the SQL? I’m more curious to see what people are writing that
generates such code. As earlier mentioned we could always report back
to the specific ORM maintainer(s) such examples and see if they could
tweak.

Jonathan

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 20:44:27
Message-ID:	CANP8+jJ4BRCZVHSf7E0hH6gNhqDYQC1h+X4=vNP0HvW+XfKyeA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 16 May 2018 at 15:10, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
>> What I would add is that I've seen cases where the extra joins do NOT
>> hurt performance, so the extra CPU used to remove the join hurts more
>> than the benefit of removing it. Yes, we tried it.
>
> Interesting. The concern I had was more about the cost imposed on every
> query to detect self-joins and try to prove them useless, even in queries
> where no benefit ensues. It's possible that we can get that down to the
> point where it's negligible; but this says that even the successful-proof
> case has to be very cheap.

What I was advocating was an approach that varies according to the
query cost, so we don't waste time trying to tune the heck out of OLTP
queries, but for larger queries we might take a more considered
approach.

For advanced optimizations that are costly to check for, skip the
check if we are already below a cost threshold. The threshold would be
a heuristic that varies according to the cost of the check.

I realise that in this case we wouldn't know the full query cost until
we've done join planning, so we would need some lower bound estimate
to check whether its worth trying to remove joins.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 20:45:53
Message-ID:	CAKJS1f8ySSsBfooH3bJK7OD3LBEbDb99d8J_FtqDd6w50p-eAQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 17 May 2018 at 03:43, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru> wrote:
> I'd be glad to hear your thoughts on this.

(I only glanced at the patch)

I've thought and discussed this before on this list. I think the
arguments for and against it were much the same as you've received
already. If you trawl through the archives you'll see my argument for
matches quite closely to Robert regarding the nested-views. I
personally experienced this issue in my previous job, although it was
not with PostgreSQL.

I think it's worth doing this providing that we can fast-path out
quickly enough in cases where we can't possibly remove anything.
Likely the success of this patch depends on how quick that fast-path
is.

From my experience on join removals, I imagine all this can be done
just after the left join removal code has completed. I see your patch
does it much later, which I don't think is particularly great since
Paths have already been generated by that time. I think it makes sense
to do this as early as possible to save wasting planning work for
relations that will be removed.

I think all this can be done just after left joins are removed by
remove_useless_joins. You may like to move the code that exists in
that function today into a new static function named
remove_useless_left_joins, and put this new code in new static
function named remove_useless_self_joins:

1. Allocate an array root->simple_rel_array_size in size. Populate it
with a struct which is defined as struct { Index relid; Oid oid; }
2. Populate that array by looping over the simple_rel_array. Ignore
anything that's not a baserel with relkind = 'r'
3. qsort the array on Oid.
4. Make a pass over the array (up to its size - 1) looking for
elements where the current oid is the same as the next. Build a List
of RelIds containing all relids of Oids which are duplicated.
5. If no pairs. Abort.
6. Process each combination of pairs found in each Relids in the list
made in step 1. Probably start at the lowest relid.
7. For each pair:
a. If there's a join condition, ensure all join OpExprs are equality
exprs with a mergejoinable opno (copy what left join removal check
with the opno used). Ensure Vars used in the OpExpr have the same
attrno on each side.
b. For bonus points don't reject non-Vars used in the join
condition, but ensure they're equal and there are no non-immutable
functions inside.
c. Ensure relation_has_unique_index_for returns true for the Vars
(and Exprs if doing b) used in the join condition.
d. Choose the relation with the highest relid and rewrite the parse
changing the varno of all Vars to use the one of the relation with the
lowest relid.
e. list_concat baserestictinfos from removed relation onto other relation.
f. Check for Vars in the join condition that can contain NULLs and
lappend IS NOT NULLs into the baserestrictinfo. (Removing the join
could have removed NULL filtering)
g. Mark highest relid relation as DEAD. (See what the left join
removal code does (you may need to do some extra work around
equivalence classes))

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 20:48:58
Message-ID:	CAKJS1f_x+b5FF4cEpEt3ywEpTGrQrpDYqAkEHmxtMF=usRitSA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg범퍼카 토토SQL

On 17 May 2018 at 08:44, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> What I was advocating was an approach that varies according to the
> query cost, so we don't waste time trying to tune the heck out of OLTP
> queries, but for larger queries we might take a more considered
> approach.

That's tricky. If we do this, it should be done before Path
generation, so not much is known about the costs in those case.

Perhaps something can be done by looking at the number of relpages,
but I've no idea what that would be. Perhaps we need to see how costly
this operation is first before we try to think of ways to only apply
it conditionally?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 21:31:37
Message-ID:	eba84a3d-14bb-cc37-4477-e49194445724@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

David,

Many thanks for the detailed explanation. I'll try to code it up and
measure how much overhead it introduces.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 21:41:25
Message-ID:	CAEepm=1TAUo5iJFRsX+=pHtTJFwHyGhD2cHuyF9Or95UiYmnbQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 17, 2018 at 3:43 AM, Alexander Kuzmenkov
<a(dot)kuzmenkov(at)postgrespro(dot)ru> wrote:
> There is a join optimization we don't do -- removing inner join of a table
> with itself on a unique column. Such joins are generated by various ORMs, so
> from time to time our customers ask us to look into this. Most recently, it
> was discussed on the list in relation to an article comparing the
> optimizations that some DBMS make [1].
>
> ...
>
> I'd be glad to hear your thoughts on this.

Some thoughts:

There might be some interesting corner cases involving self-joins in
UPDATE/DELETE statements, and also FOR UPDATE etc. Those can result
in some surprising results in a self-join (one side is subject to EPQ
and the other isn't) which I think might be changed by your patch
(though I didn't try it or read the patch very closely).

IIUC in DB2 (the clear winner at join elimination in the article you
mentioned), you get these sorts of things by default (optimisation
level 5 includes it), but not if you SET CURRENT QUERY OPTIMIZATION =
3 as many articles recommend for OLTP work. I think it's interesting
that they provide that knob rather than something automatic, and
interesting that there is one linear knob to classify your workload
rather than N knobs for N optimisations.

--
Thomas Munro
http://www.enterprisedb.com

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 22:02:22
Message-ID:	20180516220222.i4ei5i5575qxfyc7@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg메이저 토토 사이트SQL

HI,

On 2018-05-17 08:48:58 +1200, David Rowley wrote:
> On 17 May 2018 at 08:44, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > What I was advocating was an approach that varies according to the
> > query cost, so we don't waste time trying to tune the heck out of OLTP
> > queries, but for larger queries we might take a more considered
> > approach.
>
> That's tricky. If we do this, it should be done before Path
> generation, so not much is known about the costs in those case.
>
> Perhaps something can be done by looking at the number of relpages,
> but I've no idea what that would be. Perhaps we need to see how costly
> this operation is first before we try to think of ways to only apply
> it conditionally?

I'm also not buying that this isn't a benefit in OLTP in general. Sure,
for a single query RTT costs are going to dominate, but if you use
prepared statements the costs are going to pay of over multiple
executions. Even just avoiding initializing unnecessary executor nodes
shows up in profiles.

Greetings,

Andres Freund

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 22:13:51
Message-ID:	18757.1526508831@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg윈 토토SQL :

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> On 17 May 2018 at 08:44, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> What I was advocating was an approach that varies according to the
>> query cost, so we don't waste time trying to tune the heck out of OLTP
>> queries, but for larger queries we might take a more considered
>> approach.

> That's tricky. If we do this, it should be done before Path
> generation, so not much is known about the costs in those case.

Yeah. It'd have to be a very heuristic thing that doesn't account
for much beyond the number of relations in the query, and maybe their
sizes --- although I don't think we even know the latter at the
point where join removal would be desirable. (And note that one of
the desirable benefits of join removal is not having to find out the
sizes of removed rels ... so just swapping that around doesn't appeal.)

regards, tom lane

From:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 22:24:53
Message-ID:	CAKJS1f9UkZgezi0Uj0P+0HhJsZ2KZEPaGL8ch5w8SA3p8GFGQA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 17 May 2018 at 10:13, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Yeah. It'd have to be a very heuristic thing that doesn't account
> for much beyond the number of relations in the query, and maybe their
> sizes --- although I don't think we even know the latter at the
> point where join removal would be desirable. (And note that one of
> the desirable benefits of join removal is not having to find out the
> sizes of removed rels ... so just swapping that around doesn't appeal.)

There's probably some argument for delaying obtaining the relation
size until after join removal and probably partition pruning too, but
it's currently done well before that in build_simple_rel, where the
RelOptInfo is built.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 22:37:11
Message-ID:	19719.1526510231@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 결과SQL

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> IIUC in DB2 (the clear winner at join elimination in the article you
> mentioned), you get these sorts of things by default (optimisation
> level 5 includes it), but not if you SET CURRENT QUERY OPTIMIZATION =
> 3 as many articles recommend for OLTP work. I think it's interesting
> that they provide that knob rather than something automatic, and
> interesting that there is one linear knob to classify your workload
> rather than N knobs for N optimisations.

There's a lot to be said for that type of approach, as opposed to trying
to drive it off some necessarily-very-inexact preliminary estimate of
query cost. For example, the mere fact that you're joining giant tables
doesn't in itself suggest that extra efforts in query optimization will be
repaid. (If anything, it seems more likely that the user would've avoided
silliness like useless self-joins in such a case.)

A different line of thought is that, to me, the most intellectually
defensible rationale for efforts like const-simplification and join
removal is that opportunities for those things can arise after view
expansion, even in queries where the original query text didn't seem
to contain anything extraneous. (Robert and Andres alluded to this
upthread, but not very clearly.) So maybe we could track how much
the query got changed during rewriting, and use that to drive the
planner's decisions about how hard to work later on. But I'm not
very sure that this'd be superior to having a user-visible knob.

regards, tom lane

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 22:44:50
Message-ID:	20180516224450.qawd3iqmprujj57r@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2018-05-16 18:37:11 -0400, Tom Lane wrote:
> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> > IIUC in DB2 (the clear winner at join elimination in the article you
> > mentioned), you get these sorts of things by default (optimisation
> > level 5 includes it), but not if you SET CURRENT QUERY OPTIMIZATION =
> > 3 as many articles recommend for OLTP work. I think it's interesting
> > that they provide that knob rather than something automatic, and
> > interesting that there is one linear knob to classify your workload
> > rather than N knobs for N optimisations.
>
> There's a lot to be said for that type of approach, as opposed to trying
> to drive it off some necessarily-very-inexact preliminary estimate of
> query cost. For example, the mere fact that you're joining giant tables
> doesn't in itself suggest that extra efforts in query optimization will be
> repaid. (If anything, it seems more likely that the user would've avoided
> silliness like useless self-joins in such a case.)

For prepared statements we could also start making more expensive
optimizations after the first execution, when we know how long the query
took / how expensive it was (also, if we had a plan cache...).

Greetings,

Andres Freund

From:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 22:45:45
Message-ID:	CAKJS1f8kUwb3wsXB6_jXM_9RqXOnCskWCC=pY-x637PUo=5pwQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 17 May 2018 at 10:37, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
>> IIUC in DB2 (the clear winner at join elimination in the article you
>> mentioned), you get these sorts of things by default (optimisation
>> level 5 includes it), but not if you SET CURRENT QUERY OPTIMIZATION =
>> 3 as many articles recommend for OLTP work. I think it's interesting
>> that they provide that knob rather than something automatic, and
>> interesting that there is one linear knob to classify your workload
>> rather than N knobs for N optimisations.
>
> There's a lot to be said for that type of approach, as opposed to trying
> to drive it off some necessarily-very-inexact preliminary estimate of
> query cost. For example, the mere fact that you're joining giant tables
> doesn't in itself suggest that extra efforts in query optimization will be
> repaid. (If anything, it seems more likely that the user would've avoided
> silliness like useless self-joins in such a case.)
>
> A different line of thought is that, to me, the most intellectually
> defensible rationale for efforts like const-simplification and join
> removal is that opportunities for those things can arise after view
> expansion, even in queries where the original query text didn't seem
> to contain anything extraneous. (Robert and Andres alluded to this
> upthread, but not very clearly.) So maybe we could track how much
> the query got changed during rewriting, and use that to drive the
> planner's decisions about how hard to work later on. But I'm not
> very sure that this'd be superior to having a user-visible knob.

This seems like a good line of thought. Perhaps a knob is a good
first step, then maybe having the ability to set that knob to
"automatic" is something to aspire for later.

I don't think Alexander should work on this as part of this patch
though. Perhaps we can re-evaluate when Alexander posts some planner
benchmarks from the patch.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 22:55:41
Message-ID:	20474.1526511341@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 커뮤니티SQL

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> On 17 May 2018 at 10:13, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Yeah. It'd have to be a very heuristic thing that doesn't account
>> for much beyond the number of relations in the query, and maybe their
>> sizes --- although I don't think we even know the latter at the
>> point where join removal would be desirable. (And note that one of
>> the desirable benefits of join removal is not having to find out the
>> sizes of removed rels ... so just swapping that around doesn't appeal.)

> There's probably some argument for delaying obtaining the relation
> size until after join removal and probably partition pruning too, but
> it's currently done well before that in build_simple_rel, where the
> RelOptInfo is built.

Yeah, but that's something we ought to fix someday; IMO it's an artifact
of having wedged in remove_useless_joins without doing the extensive
refactoring that'd be needed to do it at a more desirable time. I don't
want to build user-visible behavior that's dependent on doing that wrong.

(But wait a second ... we could improve this without quite that much work:
instead of doing estimate_rel_size immediately during get_relation_info,
couldn't it be left until the set_base_rel_sizes pass? Since
RelationGetNumberOfBlocks involves kernel calls, skipping it for removed
rels seems worth doing.)

regards, tom lane

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-16 23:00:53
Message-ID:	20180516230053.upq62j47n6i543tf@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2018-05-16 18:55:41 -0400, Tom Lane wrote:
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> > On 17 May 2018 at 10:13, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Yeah. It'd have to be a very heuristic thing that doesn't account
> >> for much beyond the number of relations in the query, and maybe their
> >> sizes --- although I don't think we even know the latter at the
> >> point where join removal would be desirable. (And note that one of
> >> the desirable benefits of join removal is not having to find out the
> >> sizes of removed rels ... so just swapping that around doesn't appeal.)
>
> > There's probably some argument for delaying obtaining the relation
> > size until after join removal and probably partition pruning too, but
> > it's currently done well before that in build_simple_rel, where the
> > RelOptInfo is built.
>
> Yeah, but that's something we ought to fix someday; IMO it's an artifact
> of having wedged in remove_useless_joins without doing the extensive
> refactoring that'd be needed to do it at a more desirable time. I don't
> want to build user-visible behavior that's dependent on doing that wrong.

My patch that introduced a radix tree buffer mapping also keeps an
accurate relation size in memory, making it far cheaper to use. While I
depriorized the patchset for the moment (I'll post what I'm working on
first soon), that should address some of the cost till then.

Wonder if we shouldn't just cache an estimated relation size in the
relcache entry till then. For planning purposes we don't need to be
accurate, and usually activity that drastically expands relation size
will trigger relcache activity before long. Currently there's plenty
workloads where the lseeks(SEEK_END) show up pretty prominently.

Greetings,

Andres Freund

From:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-17 01:33:37
Message-ID:	CAKJS1f-xe30S6SeWaLMfM_G6Y6ZSryew7djsiiYvX1tSv-_UDw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg사설 토토 사이트SQL

On 17 May 2018 at 10:55, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
>> There's probably some argument for delaying obtaining the relation
>> size until after join removal and probably partition pruning too, but
>> it's currently done well before that in build_simple_rel, where the
>> RelOptInfo is built.
>
> Yeah, but that's something we ought to fix someday; IMO it's an artifact
> of having wedged in remove_useless_joins without doing the extensive
> refactoring that'd be needed to do it at a more desirable time. I don't
> want to build user-visible behavior that's dependent on doing that wrong.
>
> (But wait a second ... we could improve this without quite that much work:
> instead of doing estimate_rel_size immediately during get_relation_info,
> couldn't it be left until the set_base_rel_sizes pass? Since
> RelationGetNumberOfBlocks involves kernel calls, skipping it for removed
> rels seems worth doing.)

I did mean just obtaining the sizes, not delaying building the
RelOptInfo. I see nothing that needs RelOptInfo->pages before
set_base_rel_size apart from the code which I mentioned about moving a
couple of days ago in [1].

[1] /message-id/CAKJS1f_eUz0_h5_vU1rqE7wuxMcoENcWK2FTODz0pOyxp3_Uig%40mail.gmail.com

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-17 01:38:50
Message-ID:	CAKJS1f85vtEVqTHReduJd-n2SOOPJDB-=a302HXtUFJ7LfcoGQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 17 May 2018 at 11:00, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Wonder if we shouldn't just cache an estimated relation size in the
> relcache entry till then. For planning purposes we don't need to be
> accurate, and usually activity that drastically expands relation size
> will trigger relcache activity before long. Currently there's plenty
> workloads where the lseeks(SEEK_END) show up pretty prominently.

While I'm in favour of speeding that up, I think we'd get complaints
if we used a stale value. We could have uses pg_class.relpages all
along, but it would cause the planner to not work so well in face of
the relation changing size significantly between analyze runs.

FWIW the major case where that does show up is when generating a plan
for a partitioned table with many partitions then pruning all but a
few of them.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-17 02:11:22
Message-ID:	28694.1526523082@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 커뮤니티SQL

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> On 17 May 2018 at 11:00, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> Wonder if we shouldn't just cache an estimated relation size in the
>> relcache entry till then. For planning purposes we don't need to be
>> accurate, and usually activity that drastically expands relation size
>> will trigger relcache activity before long. Currently there's plenty
>> workloads where the lseeks(SEEK_END) show up pretty prominently.

> While I'm in favour of speeding that up, I think we'd get complaints
> if we used a stale value.

Yeah, that scares me too. We'd then be in a situation where (arguably)
any relation extension should force a relcache inval. Not good.
I do not buy Andres' argument that the value is noncritical, either ---
particularly during initial population of a table, where the size could
go from zero to something-significant before autoanalyze gets around
to noticing.

I'm a bit skeptical of the idea of maintaining an accurate relation
size in shared memory, too. AIUI, a lot of the problem we see with
lseek(SEEK_END) has to do with contention inside the kernel for access
to the single-point-of-truth where the file's size is kept. Keeping
our own copy would eliminate kernel-call overhead, which can't hurt,
but it won't improve the contention angle.

regards, tom lane

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-17 02:19:34
Message-ID:	20180517021934.m3kgn2lg3disruoi@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2018-05-16 22:11:22 -0400, Tom Lane wrote:
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> > On 17 May 2018 at 11:00, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >> Wonder if we shouldn't just cache an estimated relation size in the
> >> relcache entry till then. For planning purposes we don't need to be
> >> accurate, and usually activity that drastically expands relation size
> >> will trigger relcache activity before long. Currently there's plenty
> >> workloads where the lseeks(SEEK_END) show up pretty prominently.
>
> > While I'm in favour of speeding that up, I think we'd get complaints
> > if we used a stale value.
>
> Yeah, that scares me too. We'd then be in a situation where (arguably)
> any relation extension should force a relcache inval. Not good.
> I do not buy Andres' argument that the value is noncritical, either ---
> particularly during initial population of a table, where the size could
> go from zero to something-significant before autoanalyze gets around
> to noticing.

I don't think every extension needs to force a relcache inval. It'd
instead be perfectly reasonable to define a rule that an inval is
triggered whenever crossing a 10% relation size boundary. Which'll lead
to invalidations for the first few pages, but much less frequently
later.

> I'm a bit skeptical of the idea of maintaining an accurate relation
> size in shared memory, too. AIUI, a lot of the problem we see with
> lseek(SEEK_END) has to do with contention inside the kernel for access
> to the single-point-of-truth where the file's size is kept. Keeping
> our own copy would eliminate kernel-call overhead, which can't hurt,
> but it won't improve the contention angle.

A syscall is several hundred instructions. An unlocked read - which'll
be be sufficient in many cases, given that the value can quickly be out
of date anyway - is a few cycles. Even with a barrier you're talking a
few dozen cycles. So I can't see how it'd not improve the contention.

But the main reason for keeping it in shmem is less the lseek avoidance
- although that's nice, context switches aren't great - but to make
relation extension need far less locking.

Greetings,

Andres Freund

From:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Removing unneeded self joins
Date:	2018-05-17 08:15:51
Message-ID:	9c1a30c1-2c2d-50f4-dbd7-b4b50f6ddec4@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 꽁 머니SQL

On 17.05.2018 05:19, Andres Freund wrote:
> On 2018-05-16 22:11:22 -0400, Tom Lane wrote:
>> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
>>> On 17 May 2018 at 11:00, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>>> Wonder if we shouldn't just cache an estimated relation size in the
>>>> relcache entry till then. For planning purposes we don't need to be
>>>> accurate, and usually activity that drastically expands relation size
>>>> will trigger relcache activity before long. Currently there's plenty
>>>> workloads where the lseeks(SEEK_END) show up pretty prominently.
>>> While I'm in favour of speeding that up, I think we'd get complaints
>>> if we used a stale value.
>> Yeah, that scares me too. We'd then be in a situation where (arguably)
>> any relation extension should force a relcache inval. Not good.
>> I do not buy Andres' argument that the value is noncritical, either ---
>> particularly during initial population of a table, where the size could
>> go from zero to something-significant before autoanalyze gets around
>> to noticing.
> I don't think every extension needs to force a relcache inval. It'd
> instead be perfectly reasonable to define a rule that an inval is
> triggered whenever crossing a 10% relation size boundary. Which'll lead
> to invalidations for the first few pages, but much less frequently
> later.
>
>
>> I'm a bit skeptical of the idea of maintaining an accurate relation
>> size in shared memory, too. AIUI, a lot of the problem we see with
>> lseek(SEEK_END) has to do with contention inside the kernel for access
>> to the single-point-of-truth where the file's size is kept. Keeping
>> our own copy would eliminate kernel-call overhead, which can't hurt,
>> but it won't improve the contention angle.
> A syscall is several hundred instructions. An unlocked read - which'll
> be be sufficient in many cases, given that the value can quickly be out
> of date anyway - is a few cycles. Even with a barrier you're talking a
> few dozen cycles. So I can't see how it'd not improve the contention.
>
> But the main reason for keeping it in shmem is less the lseek avoidance
> - although that's nice, context switches aren't great - but to make
> relation extension need far less locking.
>
> Greetings,
>
> Andres Freund
>

I completely agree with Andreas. In my multithreaded Postgres prototype
file description cache (shared by all threads) becomes bottleneck
exactly because of each query execution requires
access to file system (lseek) to provide optimizer estimation of the
relation size, despite to the fact that all database fits in memory.
Well, this is certainly specific of shared descriptor's pool in my
prototype, but the fact the we have to perform lseek at each query
compilation seems to be annoying in any case.

And there is really no problem that cached relation size estimation is
not precise. It really can be invalidated even if relation size is
changed more than some threshold value (1Mb?) or lease time for cached
value is expired.
May be it is reasonable to implement specific invalidation for relation
size esimation, to avoid complete invalidation and reconstruction of
relation description and all dependent objects.
In this case time-based invalidation seems to be the easiest choice to
implement. Repeating lseek each 10 or 1 second seems to have no
noticeable impact on performance and relation size can not dramatically
changed during this time.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-05-18 19:47:42
Message-ID:	CA+TgmoZ_HbxXopVvwtrne7kh-hz+Smp1LwjrFzBrq_cGPXL_NA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, May 16, 2018 at 6:13 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
>> On 17 May 2018 at 08:44, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> What I was advocating was an approach that varies according to the
>>> query cost, so we don't waste time trying to tune the heck out of OLTP
>>> queries, but for larger queries we might take a more considered
>>> approach.
>
>> That's tricky. If we do this, it should be done before Path
>> generation, so not much is known about the costs in those case.
>
> Yeah. It'd have to be a very heuristic thing that doesn't account
> for much beyond the number of relations in the query, and maybe their
> sizes --- although I don't think we even know the latter at the
> point where join removal would be desirable. (And note that one of
> the desirable benefits of join removal is not having to find out the
> sizes of removed rels ... so just swapping that around doesn't appeal.)

As I've mentioned before, the problem we're talking about here is also
highly relevant to parallel query. We only want to bother generating
partial paths for queries that are expensive enough to justify
considering parallelism, but we don't know how expensive they are
until we finishing planning. The only way I could think of to tackle
that problem was to drive it off the relation sizes, but the presence
or absence of expensive functions in the target list can wildly change
the point at which parallelism potentially becomes useful. So we end
up sometimes wasting effort generating partial paths that are totally
useless, and at other times failing to generate partial paths that
would have been useful. (Really, I'd like to generate a lot more
partial paths than we do, trying out various numbers of workers, but
that would just make the existing problem worse.)

I have wondered about doing a preliminary pass over the tree where we
try to make a crude estimate of the amount of effort involved, and
then planning for real with that number in hand. But it seems like
there's so little information you could get at that early stage that
it would be hard to decide anything useful on that basis. You've got
to at least have relation sizes, and really you need some estimate of
the result cardinality as well. It seems like we currently can't
figure out cardinality without also computing paths, and I've wondered
if we could split those into two separate phases. But every time I
think about trying to do that I realize that my pain tolerance isn't
that high.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-06-25 15:26:28
Message-ID:	d44ef452-10b8-7680-2984-84897283fa97@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg스포츠 토토SQL

David,

I tried to implement your suggestions, here are the patches.

The first patch mostly moves some code around without changing
functionality. It modifies innerrel_is_unique to not only cache the fact
that the relation is unique, but also
cache the index that guarantees uniqueness.

The second patch adds the unique self join removal code. It goes along
the lines of your plan. I didn't even have to examine join clauses
because the constraints imposed by innerrel_is_unique are strong enough
to guarantee that when it finds the same unique index for both
relations, the join can be removed. Namely, it requires mergejoinable
equality clauses for all columns of a unique index.

As a simple benchmark, I measured the duration of query_planner and
remove_useless_self_joins with clock_gettime() on the regression tests.
The following table shows average times in microseconds, median over 11
runs. First row is with this patch, and the second row doesn't call
remove_useless_self_joins and just calls clock_gettime to measure its
overhead.

query_planner remove_useless_self_joins

with removal 39.61 0.61

no removal 39.45 0.38

So, on this workload, unique self join removal adds about 0.2 mcs, or
0.6% of total time, to query_planner. I also tried a query that joins 26
relations, remove_useless_self_joins takes about 40 mcs. Still, this
time grows quadratically with number of relations we have to process, so
in the final patch I limit it to join_collapse_limit, which brings the
time down to 15 mcs. This is negligible compared to the total
query_planner time, which for 8 relations is about 3 ms, that is, 3
orders of magnitude higher.

These benchmarks mostly measure the path where we don't remove any
joins. I didn't time the join removal itself, because it wins quite some
time by allowing to plan one relation and one join less.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0002-Remove-unique-self-joins-v2.patch	text/x-patch	18.7 KB
0001-Preparatory-refactoring-v2.patch	text/x-patch	26.9 KB

From:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-07-27 12:26:03
Message-ID:	8a17cc3d-f933-ba32-6110-f2d8481070e7@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 사이트SQL

Here is a current version of the patch, still rather experimental. Since
the previous version, I fixed some bugs and added the possibility to
remove a relation even when it is mentioned in target lists. I have to
rewrite all references to the removed relation in targetlists and the
equivalence classes, so that they point to the remaining relation. I
change RestrictInfos in place, and update attr_needed and reltarget of
the remaining relation. I also update equivalence members, and delete
equivalence classes that become single-member.

I'm posting it a single file now, because all meaningful changes are in
analyzejoins.c anyway.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001-Remove-unique-self-joins-v3.patch	text/x-patch	47.8 KB

From:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-07-29 10:42:32
Message-ID:	CAEepm=19wLyUc4o43j5WnSbYsQPoaErdoOfeXiPP0iB_zRUm_w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 커뮤니티SQL

On Sat, Jul 28, 2018 at 12:26 AM, Alexander Kuzmenkov
<a(dot)kuzmenkov(at)postgrespro(dot)ru> wrote:
> Here is a current version of the patch, still rather experimental.

Hi Alexander,

The eval-qual-plan isolation test is failing:

- checking 1050 checking 600
+ checking 600 checking 600

That's the result of a self join with EPQ on one side of the join:

SELECT * FROM accounts a1, accounts a2
WHERE a1.accountid = a2.accountid
FOR UPDATE OF a1;

I think you need to disable the optimisation when there is a locking
clause on one side. Maybe it could be allowed if it's on both sides?
Not sure.

+ Assert(is_opclause(rinfo->clause));
+ Expr *leftOp = (Expr *) get_leftop(rinfo->clause);

You can't declare a variable here in C89.

--
Thomas Munro
http://www.enterprisedb.com

From:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-08-03 16:49:47
Message-ID:	2285d5d0-f330-e8b6-9ee5-b2b90e44409b@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg무지개 토토SQL

On 07/29/2018 01:42 PM, Thomas Munro wrote:
>
> I think you need to disable the optimisation when there is a locking
> clause on one side. Maybe it could be allowed if it's on both sides?
> Not sure.

Hi Thomas,

Thanks for testing. I think it should be OK to remove the join if the
row marks for both sides have the same type. I added the code to check
for that.

> You can't declare a variable here in C89.
>

Fixed.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001-Remove-unique-self-joins-v4.patch	text/x-patch	49.8 KB

From:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-10-08 14:52:21
Message-ID:	35fd3789-bde8-f290-9b55-da3d54cd6af5@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Here is a rebased version of the patch. It includes some fixes after an
off-list review by Konstantin Knizhnik.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
remove-self-joins-v5.patch	text/x-patch	52.2 KB

From:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-10-18 12:47:09
Message-ID:	e34fc4cc-68d0-8ebf-9376-a8089a92ec2d@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Here is a version that compiles.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
remove-self-join-v6.patch	text/x-patch	52.2 KB

From:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-11-08 05:59:19
Message-ID:	CAKJS1f8-TuxfYMewFoBicGDtac1dK8RhbYzXgyzgW6Ej5U70sg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 19 October 2018 at 01:47, Alexander Kuzmenkov
<a(dot)kuzmenkov(at)postgrespro(dot)ru> wrote:
> Here is a version that compiles.

I had a quick read through this and I think its missing about a 1-page
comment section detailing when we can and when we cannot remove these
self joins, and what measures we must take when we do remove them.

Apart from that, I noted the following during my read:

1. I don't think this is the right way to do this. There are other
places where we alter the varnoold. For example:
search_indexed_tlist_for_var(). So you should likely be doing that too
rather than working around it.

@@ -166,10 +166,13 @@ _equalVar(const Var *a, const Var *b)
COMPARE_SCALAR_FIELD(vartypmod);
COMPARE_SCALAR_FIELD(varcollid);
COMPARE_SCALAR_FIELD(varlevelsup);
- COMPARE_SCALAR_FIELD(varnoold);
- COMPARE_SCALAR_FIELD(varoattno);
COMPARE_LOCATION_FIELD(location);

+ /*
+ * varnoold/varoattno are used only for debugging and may differ even
+ * when the variables are logically the same.
+ */
+

2. Surely the following loop is incorrect:

for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
{
int attno = i - toKeep->min_attr;
toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
toRemove->attr_needed[attno]);
}

What if toRemove has a lower min_attr or higher max_attr?

3. "wind" -> "find"

+ * When we wind such a join, we mark one of the participating relation as

4. I think the following shouldn't be happening:

+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)

5. I'd have thought the opposite. Surely there are more chances of
this being useful with more joins?

+ /* Limit the number of joins we process to control the quadratic behavior. */
+ if (n > join_collapse_limit)
+ break;

6. In remove_self_joins_one_level() I think you should collect the
removed relations in a Relids rather than a list.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-11-21 12:54:20
Message-ID:	23f6cf6d-32ca-faa2-f4ec-da690ba42482@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

El 08/11/18 a las 08:59, David Rowley escribió:
> On 19 October 2018 at 01:47, Alexander Kuzmenkov
> <a(dot)kuzmenkov(at)postgrespro(dot)ru> wrote:
>> Here is a version that compiles.
> I had a quick read through this and I think its missing about a 1-page
> comment section detailing when we can and when we cannot remove these
> self joins, and what measures we must take when we do remove them.

I added some explanation to the comment for remove_useless_joins. This
is probably still not clear enough, so if you have any particular
questions I'll cover them too. While improving the comments, I found
some bugs around the handling of join clauses and broken ECs, so I fixed
them and added the tests.

> Apart from that, I noted the following during my read:
>
> 1. I don't think this is the right way to do this. There are other
> places where we alter the varnoold. For example:
> search_indexed_tlist_for_var(). So you should likely be doing that too
> rather than working around it.

Fixed.

> 2. Surely the following loop is incorrect:
>
> for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
> {
> int attno = i - toKeep->min_attr;
> toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
> toRemove->attr_needed[attno]);
> }
>
> What if toRemove has a lower min_attr or higher max_attr?

This shouldn't happen because this is always the same relation, and
max_attr is its physical number of attributes. There is an assertion
about this in remove_self_joins_one_group:
/* A sanity check: the relations have the same Oid. */
Assert(root->simple_rte_array[relids[i]]->relid ==
root->simple_rte_array[relids[o]]->relid);

> 3. "wind" -> "find"
>
> + * When we wind such a join, we mark one of the participating relation as

Fixed.

> 4. I think the following shouldn't be happening:
>
> +------------------------------------------------
> Result
> One-Time Filter: false
> -(2 rows)
> + -> Index Scan using parent_pkey on parent x
> + Index Cond: (k = 1)
> +(4 rows)

This happens because for join rels, we make some effort to prove that
they are empty and not make any paths for them, and we don't do this for
base rels. When we remove the join, this difference is exposed. Compare
to this query:

postgres=# explain select * from parent where k = 1 and k = 2;
                                   QUERY PLAN
────────────────────────────────────────────────────────────────────────────────
Result (cost=0.15..8.17 rows=1 width=8)
   One-Time Filter: false
   -> Index Scan using parent_pkey on parent (cost=0.15..8.17 rows=1
width=8)
         Index Cond: (k = 1)
(4 rows)

> 5. I'd have thought the opposite. Surely there are more chances of
> this being useful with more joins?
>
> + /* Limit the number of joins we process to control the quadratic behavior. */
> + if (n > join_collapse_limit)
> + break;

That is true, but we also have to think about the overhead when we don't
find any joins to remove. Without this cutoff, we have to examine every
pair of self-joins, so the run time grows quadratically with the number
of such joins in the query. I don't have a better idea on how to control
this.

> 6. In remove_self_joins_one_level() I think you should collect the
> removed relations in a Relids rather than a list.

Done.

--
Alexander Kuzmenkov
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
remove-self-join-v7.patch	text/x-patch	56.7 KB

From:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2018-12-24 16:28:01
Message-ID:	c3ce94ab-1c1d-98ee-c067-f3e3a5444e98@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Here is a rebased version with some bugfixes.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001-Remove-self-joins-v9.patch	text/x-patch	61.0 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2019-02-22 00:25:12
Message-ID:	22996.1550795112@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 핫SQL :

Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru> writes:
> Here is a rebased version with some bugfixes.

I noticed this had bit-rotted again. I've not really reviewed it, but
I rebased it up to HEAD, and fixed a couple small things:

* My compiler was bitching about misplaced declarations, so I moved
some variable declarations accordingly. I couldn't help noticing
that many of those wouldn't have been a problem in the first place
if you were following project style for loops around list_delete_cell
calls, which usually look more like this:

prev = NULL;
for (cell = list_head(root->rowMarks); cell; cell = next)
{
PlanRowMark *rc = (PlanRowMark *) lfirst(cell);

next = lnext(cell);
if (rt_fetch(rc->rti, root->parse->rtable)->rtekind == RTE_RESULT)
root->rowMarks = list_delete_cell(root->rowMarks, cell, prev);
else
prev = cell;
}

* I saw you had a problem with an existing test in join.sql that
was being optimized away because it used an ill-advised self-join.
I've pushed a fix for that, so it's not a problem as of HEAD.

I notice though that there's one unexplained plan change remaining
in join.out:

@@ -4365,11 +4365,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)

-- bug 5255: this is not optimizable by join removal
begin;

That sure looks like a bug. I don't have time to look for the
cause right now.

I also noticed that the test results show that when a table
is successfully optimized away, the remaining reference seems
to have the alias of the second reference not the first one.
That seems a little ... weird. It's just cosmetic of course, but
why is that?

Also, I did notice that you'd stuck a declaration for
"struct UniqueIndexInfo" into paths.h, which then compelled you
to include that header in planmain.h. This seems like poor style;
I'd have been inclined to put the struct in pathnodes.h instead.
That's assuming you need it at all -- in those two usages, seems
like it'd be just about as easy to return two separate Lists.
On the other hand, given

+ * unique_for_rels - list of (Relids, UniqueIndexInfo*) lists, where Relids
+ * is a set of other rels for which this one has been proven
+ * unique, and UniqueIndexInfo* stores information about the
+ * index that makes it unique, if any.

I wonder why you didn't include the Relids into UniqueIndexInfo as well
... and maybe make it a proper Node so that unique_for_rels could be
printed by outfuncs.c. So any way I slice it, it seems like this data
structure could use more careful contemplation.

Anyway, updated patch attached.

regards, tom lane

Attachment	Content-Type	Size
0001-Remove-self-joins-v10.patch	text/x-diff	58.1 KB

From:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2019-02-28 14:09:34
Message-ID:	2b74af78-3417-167d-ad72-d54ec558d8be@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 베이SQL

Hi Tom,

Thanks for the update.

On 2/22/19 03:25, Tom Lane wrote:
> * My compiler was bitching about misplaced declarations, so I moved
> some variable declarations accordingly. I couldn't help noticing
> that many of those wouldn't have been a problem in the first place
> if you were following project style for loops around list_delete_cell
> calls, which usually look more like this:

Changed them to conform to project style. After a couple of segfaults I
remembered why I wrote them the way I did: you can't use plain
'continue' with our style of loops and also have to update prev, and I
can't seem to remember to do that.

> I notice though that there's one unexplained plan change remaining
> in join.out:
>
> @@ -4365,11 +4365,13 @@ explain (costs off)
> select p.* from
> (parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
> where p.k = 1 and p.k = 2;
> - QUERY PLAN
> ---------------------------
> + QUERY PLAN
> +------------------------------------------------
> Result
> One-Time Filter: false
> -(2 rows)
> + -> Index Scan using parent_pkey on parent x
> + Index Cond: (k = 1)
> +(4 rows)
>
> -- bug 5255: this is not optimizable by join removal
> begin;
>
> That sure looks like a bug. I don't have time to look for the
> cause right now.

David also asked about this before. This is the same plan you'd get for
'select * from parent where k = 1 and k = 2', and this plan is exposed
by join removal. So this is not a bug in join removal itself.

> I also noticed that the test results show that when a table
> is successfully optimized away, the remaining reference seems
> to have the alias of the second reference not the first one.
> That seems a little ... weird. It's just cosmetic of course, but
> why is that?

Can you point me to a query that looks wrong?

> Also, I did notice that you'd stuck a declaration for
> "struct UniqueIndexInfo" into paths.h, which then compelled you
> to include that header in planmain.h. This seems like poor style;
> I'd have been inclined to put the struct in pathnodes.h instead.

Moved.

> I wonder why you didn't include the Relids into UniqueIndexInfo as well
> ... and maybe make it a proper Node so that unique_for_rels could be
> printed by outfuncs.c.

We also prove and cache the uniqueness for subqueries, for which the
UniqueIndexInfo is not relevant, that's why it was optional and stored
in a parallel list. Now I changed it to UniqueRelInfo which always has
outerrelids and optionally the unique index.

I also fixed a bug with not updating the references in HAVING clause.
New version is attached.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
v11-0001-Remove-self-joins-on-a-unique-column.patch	text/x-patch	62.5 KB

From:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2019-03-14 11:21:29
Message-ID:	CAKJS1f8VV4YvsC_YSNHVtKfmqTAnrfcDoH21gVgAJNAFmwK6zA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 1 Mar 2019 at 03:09, Alexander Kuzmenkov
<a(dot)kuzmenkov(at)postgrespro(dot)ru> wrote:
> New version is attached.

I started looking over v11 and I'm wondering why you really need to
know which unique index proved the join unique?

I removed that check and I see it causes the following to remove the self-join:

create unique index on t1(a);
create unique index on t1(b);
explain select * from t1 inner join t1 t2 on t1.a=t2.b;
QUERY PLAN
--------------------------------------------------------
Seq Scan on t1 t2 (cost=0.00..38.25 rows=11 width=16)
Filter: (a = b)
(2 rows)

I had thought that the loop in is_unique_self_join() would have
rejected this, but looking more closely it's only checking the list of
join quals matched to each index match each other. Shouldn't it be
testing that the expressions on either side of the OpExprs are the
same after aligning the varnos? At the moment you're testing
innerrel_is_unique() for either side of the join, if you made this
change then couldn't you just test one rel and look for a unique index
on the join quals that match both sides of the join at the same time?
Probably relation_has_unique_index_for() is not the right tool for
that job, so you might need to write another function that does the
additional checks. Maybe it would be possible to split out some of the
code into helper functions so you don't have to copy the bulk of it.

The reason I mention this is that rel_is_distinct_for() also
considered subqueries. This allows the LEFT JOIN removal code to
remove joins like:

SELECT t1.* FROM t1 LEFT JOIN (SELECT DISTINCT a FROM t2) t2 on t1.a=t2.a;

In this case, the DISTINCT clause was used for unique proofs, not a
unique index.

It does not seem unreasonable that someone one day might want to
extend the self join removal code to do the same, e.g:

SELECT * FROM t1 INNER JOIN (SELECT DISTINCT a FROM t1) t2 on t1.a = t2.a;

and you're not setting it up to be very easy to do that because you're
insisting the proofs are unique indexes only.

What do you think?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2019-03-20 12:20:35
Message-ID:	552e481b-2feb-75fd-4e9f-4199bfd1c1f3@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 커뮤니티SQL

On 3/14/19 14:21, David Rowley wrote:

> What do you think?

Let's recap the conditions when we can remove a self-join. It is when
for each outer row, 1) at most one inner row matches the join clauses,
and 2) it is the same row as the outer one. I'm not sure what (2) means
precisely in a general case, but for a plain table, we can identify
these rows by ctid. So when both sides have the same unique index with
the same clauses, we conclude that we are always dealing with the same
row (as identified by ctid) on both sides, hence the join can be
replaced with a scan.

The code I wrote just checks for the above conditions. The data we need
for these checks is a byproduct of checking the relations for
uniqueness, which we do anyway, so we just cache it for a negligible cost.

I didn't write it in a more generic way because I don't understand the
conditions for generic case. In your DISTINCT example, the join can be
removed indeed. But if we select some columns from the inner side apart
from the join ones, we can't remove the join anymore:

select * from t1, (select distinct on (a) a, b from t1) tt where t1.a =
tt.a;

I think this might be a different kind of optimization, where we remove
the self-join if the inner side is unique, and no inner columns are
selected besides the join ones.

Also, reading your letter I realized that I don't commute the index
clauses correctly before comparing them in is_unique_self_join, so I
fixed this in the new version of the patch.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
v12-0001-Remove-unique-self-joins.patch	text/x-patch	62.6 KB

From:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2019-03-20 22:54:49
Message-ID:	CAKJS1f8p-KiEujr12k-oa52JNWWaQUjEjNg+o1MGZk4mHBn_Rg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 21 Mar 2019 at 01:20, Alexander Kuzmenkov
<a(dot)kuzmenkov(at)postgrespro(dot)ru> wrote:
> Let's recap the conditions when we can remove a self-join. It is when
> for each outer row, 1) at most one inner row matches the join clauses,
> and 2) it is the same row as the outer one. I'm not sure what (2) means
> precisely in a general case, but for a plain table, we can identify
> these rows by ctid. So when both sides have the same unique index with
> the same clauses, we conclude that we are always dealing with the same
> row (as identified by ctid) on both sides, hence the join can be
> replaced with a scan.
>
> The code I wrote just checks for the above conditions. The data we need
> for these checks is a byproduct of checking the relations for
> uniqueness, which we do anyway, so we just cache it for a negligible cost.
>
> I didn't write it in a more generic way because I don't understand the
> conditions for generic case. In your DISTINCT example, the join can be
> removed indeed. But if we select some columns from the inner side apart
> from the join ones, we can't remove the join anymore:
>
> select * from t1, (select distinct on (a) a, b from t1) tt where t1.a =
> tt.a;
>
> I think this might be a different kind of optimization, where we remove
> the self-join if the inner side is unique, and no inner columns are
> selected besides the join ones.
>
>
> Also, reading your letter I realized that I don't commute the index
> clauses correctly before comparing them in is_unique_self_join, so I
> fixed this in the new version of the patch.

I really just don't think checking the unique indexes match is going
to cut it. You should be looking for a unique index where the join
clauses match on either side of the join, not looking independently
and then checking the indexes are the same ones.

Here's an example of what can go wrong with your current code:

drop table abc;
create table abc(a int, b int, c int);
create unique index on abc(a);
create unique index on abc(b);
create unique index on abc(c);
explain select * from abc a1 inner join abc a2 on a1.a=a2.b and a1.c=a2.c;
QUERY PLAN
---------------------------------------------------------
Seq Scan on abc a2 (cost=0.00..35.50 rows=10 width=24)
Filter: ((c IS NOT NULL) AND (a = b))
(2 rows)

The above seems fine, but let's try again, this time change the order
that the indexes are defined.

drop table abc;
create table abc(a int, b int, c int);
create unique index on abc(a);
create unique index on abc(c);
create unique index on abc(b);
explain select * from abc a1 inner join abc a2 on a1.a=a2.b and a1.c=a2.c;
QUERY PLAN
-----------------------------------------------------------------------
Hash Join (cost=61.00..102.11 rows=1 width=24)
Hash Cond: ((a1.a = a2.b) AND (a1.c = a2.c))
-> Seq Scan on abc a1 (cost=0.00..30.40 rows=2040 width=12)
-> Hash (cost=30.40..30.40 rows=2040 width=12)
-> Seq Scan on abc a2 (cost=0.00..30.40 rows=2040 width=12)
(5 rows)

oops. I think behaviour like this that depends on the order that
indexes are created is not going to cut it. Probably you maybe could
restrict the join qual list to just quals that have the same expr on
either side of the join, that way you could still use
innerrel_is_unique() to check the inner rel is unique. You'll likely
need to pass force_cache as false though, since you don't want to
cache non-uniqueness with a subset of joinquals. Doing that could
cause unique joins not to work when the join search is done via GEQO.
I also think this way would give you the subquery GROUP BY / DISTINCT
self join removal for just about free. However, there might be more
cleanup to be done in that case... I've not thought about that too
hard.

I'm going to set this to waiting on author, as that's quite a decent
sized change.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2019-03-22 14:39:40
Message-ID:	31c2ee33-5db1-a43c-b964-8e2624020ae9@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg와이즈 토토SQL

On 3/21/19 01:54, David Rowley wrote:
> I really just don't think checking the unique indexes match is going
> to cut it. You should be looking for a unique index where the join
> clauses match on either side of the join, not looking independently
> and then checking the indexes are the same ones.

The bug you mention later is an implementation bug that can be fixed (I
will expand on that below). Besides this, do you think current self-join
detection algorithm has fundamental correctness problems? I am not aware
of such problems and this algorithm reflects my understanding of what
constitutes a removable self-join, so it would be helpful if you could
explain what exactly is wrong with it.

Your alternative idea sounds plausible, but I won't know it's correct
for sure until I implement it, which is a nontrivial amount of work. I
am also concerned that we will have to redo calculations similar to
innerrel_is_unique(), because having near-zero overhead is a hard
prerequisite for this patch, as was discussed upthread.

In short, I am reluctant to implement a new approach to detection until
I understand why the current one is fundamentally broken.

> Here's an example of what can go wrong with your current code:

This is a bug indeed. Unique index search is not exhaustive, so if many
indexes match the join quals, we might not find the same index for both
sides. I think this can be overcome if we switch to exhaustive search in
relation_has_unique_index_for, and then try to find matching indexes in
is_unique_self_join. Please see the new patch for the fix.

> I also think this way would give you the subquery GROUP BY / DISTINCT
> self join removal for just about free.

Could you explain how exactly we can generalize join removal to the
DISTINCT/GROUP BY case? I understand the removable self-joins as having
the same row on both sides of join, as identified by ctid, but I'm not
sure how to apply this to subqueries. Your earlier DISTINCT example
looked like it needed a different kind of join removal, with different
conditions for when it applies (please see my previous letter for details).

Another thing I'd like to mention is that this patch splits in two
independent parts, the detection of self-joins and their removal. While
we are looking for agreement on the detection part, could you also
review the removal part? I'm sure it has its own share of problems.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
v13-0001-Remove-unique-self-joins.patch	text/x-patch	70.7 KB

From:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2019-03-22 15:13:09
Message-ID:	CAKJS1f_cPDTRBTW4KpuD5fktVjPs2ugg704CJn=OxWXoUv-rDw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, 23 Mar 2019 at 03:39, Alexander Kuzmenkov
<a(dot)kuzmenkov(at)postgrespro(dot)ru> wrote:
> The bug you mention later is an implementation bug that can be fixed (I
> will expand on that below). Besides this, do you think current self-join
> detection algorithm has fundamental correctness problems? I am not aware
> of such problems and this algorithm reflects my understanding of what
> constitutes a removable self-join, so it would be helpful if you could
> explain what exactly is wrong with it.

I did explain what is wrong with it, and also showed an example of why
it is broken. I didn't see anything that looks fundamentally broken,
just the implementation needs more work.

> Your alternative idea sounds plausible, but I won't know it's correct
> for sure until I implement it, which is a nontrivial amount of work. I
> am also concerned that we will have to redo calculations similar to
> innerrel_is_unique(), because having near-zero overhead is a hard
> prerequisite for this patch, as was discussed upthread.

Are you worried about bypassing the unique rel cache is going to be too costly?

> > Here's an example of what can go wrong with your current code:
>
> This is a bug indeed. Unique index search is not exhaustive, so if many
> indexes match the join quals, we might not find the same index for both
> sides. I think this can be overcome if we switch to exhaustive search in
> relation_has_unique_index_for, and then try to find matching indexes in
> is_unique_self_join. Please see the new patch for the fix.

I really don't think modifying relation_has_unique_index_for to
collect details for up to 5 indexes is a good fix for this. It looks
like it's still possible to trigger this, just the example would need
to be more complex. Also, you've likely just more than doubled the
runtime of a successful match in relation_has_unique_index_for().
Previously, on average we'd have found that match halfway through the
list of indexes. Now it's most likely you'll end up looking at every
index, even after a match has been found. That does not seem well
aligned to keeping the CPU overhead for the patch low.

What is wrong with just weeding out join quals that don't have the
equivalent expression on either side before passing them to
relation_has_unique_index_for()? That'll save you from getting false
matches like I showed. To make that work it just seems mostly like
you'd mostly just need to swap the order of operations in the patch,
but you'd also likely end up needing to rip out all the UniqueRelInfo
code, since I don't think that'll be needed any longer. Likely that
means your entire patch would be limited to analyzejoins.c, although
I'm unsure what of the eclass editing code should be moved into
equivclass.c.

> > I also think this way would give you the subquery GROUP BY / DISTINCT
> > self join removal for just about free.
>
>
> Could you explain how exactly we can generalize join removal to the
> DISTINCT/GROUP BY case? I understand the removable self-joins as having
> the same row on both sides of join, as identified by ctid, but I'm not
> sure how to apply this to subqueries. Your earlier DISTINCT example
> looked like it needed a different kind of join removal, with different
> conditions for when it applies (please see my previous letter for details).

Well, by removing the requirement that the unique proofs have to come
from a unique index. I don't think you need to ensure this works for
your patch, it would be nice if it did for free. I just don't think
your implementation should block it from ever working.

> Another thing I'd like to mention is that this patch splits in two
> independent parts, the detection of self-joins and their removal. While
> we are looking for agreement on the detection part, could you also
> review the removal part? I'm sure it has its own share of problems.

I'd rather focus on the detection method before reviewing the removal
code. If there's some blocker in the detection code then the removal
code is not useful.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2019-03-25 04:07:33
Message-ID:	CAKJS1f8cJOCGyoxi7a_LG7eu+WKF9+HTff3wp1KKS5gcUg2Qfg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, 23 Mar 2019 at 04:13, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>
> On Sat, 23 Mar 2019 at 03:39, Alexander Kuzmenkov
> > This is a bug indeed. Unique index search is not exhaustive, so if many
> > indexes match the join quals, we might not find the same index for both
> > sides. I think this can be overcome if we switch to exhaustive search in
> > relation_has_unique_index_for, and then try to find matching indexes in
> > is_unique_self_join. Please see the new patch for the fix.
>
> I really don't think modifying relation_has_unique_index_for to
> collect details for up to 5 indexes is a good fix for this. It looks
> like it's still possible to trigger this, just the example would need
> to be more complex. Also, you've likely just more than doubled the
> runtime of a successful match in relation_has_unique_index_for().
> Previously, on average we'd have found that match halfway through the
> list of indexes. Now it's most likely you'll end up looking at every
> index, even after a match has been found. That does not seem well
> aligned to keeping the CPU overhead for the patch low.
>
> What is wrong with just weeding out join quals that don't have the
> equivalent expression on either side before passing them to
> relation_has_unique_index_for()? That'll save you from getting false
> matches like I showed. To make that work it just seems mostly like
> you'd mostly just need to swap the order of operations in the patch,
> but you'd also likely end up needing to rip out all the UniqueRelInfo
> code, since I don't think that'll be needed any longer. Likely that
> means your entire patch would be limited to analyzejoins.c, although
> I'm unsure what of the eclass editing code should be moved into
> equivclass.c.

I've done some minimal modifications to your v12 patch to make it work
the way I described. I ended up splitting out the joinqual list into
two lists; one that contains just self join quals that match on either
side, and the other with the remaining quals. We just pass the self
join matching quals to innerrel_is_unique() so we no longer can trick
it into matching with the wrong index.

I didn't modify remove_self_join_rel(). It could make use of the split
lists instead of checking what we've already checked in
split_selfjoin_quals(). i.e. selfjoinquals get a IS NOT NULL test
added to the basequals and the otherjoinquals have their varnos
changed and then applied to the basequals too.

There was a couple of regression test failures using your version of
the tests. One test just went back to what it was before you changed
the output and the other seems like a missed optimisation in your
version of the patch.

Namely, you didn't remove the self join in a case like:

explain select * from t1 inner join t1 t2 on t1.a=t2.a where t1.b = 1
and t2.b = 2;

but my version does. You had commented the test with:

-- If index conditions are different for each side, we won't select the same
-- row on both sides, so the join can't be removed.

but I don't quite understand why we can't remove the join in this
situation. For this particular case no rows can match, so maybe the
plan should really be the same as what happens when you do:

# explain select * from t1 where b = 1 and b = 2;
QUERY PLAN
-------------------------------------------------------------------------
Result (cost=0.15..8.17 rows=1 width=8)
One-Time Filter: false
-> Index Scan using t1_b_key on t1 (cost=0.15..8.17 rows=1 width=8)
Index Cond: (b = 1)
(4 rows)

For now, it still produces a plan like:

QUERY PLAN
-----------------------------------------------------------------------
Index Scan using t1_b_key on t1 t2 (cost=0.15..8.17 rows=1 width=16)
Index Cond: ((b = 2) AND (b = 1))
Filter: (a IS NOT NULL)
(3 rows)

My implementation of split_selfjoin_quals() still needs work. I only
wrote code to handle simple Vars. There could be Unique indexes on
exprs too, which the current code will fail to detect. I don't think
the join quals can contain vars from higher levels, but I didn't go to
the trouble of checking that in detail. If it's possible then we'll
need to reject those.

I also finished off the renaming of remove_useless_left_joins(). I
didn't go into any detail on the actual removal code.

I've attached a complete patch and also a delta against v12.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
v14-remove-unique-self-joins.patch	application/octet-stream	61.6 KB
v14-remove-unique-self-joins_delta.patch	application/octet-stream	25.2 KB

From:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2019-03-25 13:45:02
Message-ID:	89c8a17d-01f8-a03d-7fb8-0a8a070be47c@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 3/25/19 07:07, David Rowley wrote:
> You had commented the test with:
>
> -- If index conditions are different for each side, we won't select the same
> -- row on both sides, so the join can't be removed.
>
> but I don't quite understand why we can't remove the join in this
> situation.

My rationale was that we're not exactly removing the join, but replacing
it with a scan. So it is not enough to just have a single row on each
side, it must be the same physical row. In this example, before the
transformation, t1.b is not equal to t2.b, but they become equal
afterwards. This looks somewhat wrong. On the other hand, if the
conditions are different, the resulting condition is going to evaluate
to constant false and we won't get any rows, so maybe it's OK.

This brings me again to the question of what are the conditions for join
removal. If the formulation with indexes is not general enough, what do
we use instead? I guess it could be something like this:

1. Given the (btree equality) join and restriction clauses, both sides
are unique on the same set of columns. That is, if we fix the values of
these columns, both sides have at most one matching row.

a. For each of these columns, we have either

i) a join clause that equates some expression referencing the
outer column to the same expression referencing the same inner column.

ii) a clause for each relation that equates the same expression
referencing the outer and inner column to some other arbitrary
expression, possibly a different one for each side. This expression may
be a Const or some expression that references a Var of some third relation.

2. All the resulting columns can be calculated using either side of the
join. For now, just require that both sides are base relations that
refer to the same physical relation.

Two points are not clear to me here:

1. We don't handle join clauses to third relations, but can they be
treated the same way we treat Consts?

2. Can we simplify the join when we don't have any join clauses and only
have Consts? Or should we have at least one join clause that equates the
same inner and outer column? Why is one join clause enough?

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2019-03-25 15:13:24
Message-ID:	fa5ce779-2200-702f-85ec-780f1769cd66@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I noticed you lost a couple of test cases in v14, so I added them back.

I also added a case where your implementation returns a different number
of rows compared to vanilla:

select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;

Please see the attached v15.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
v15-0001-Remove-self-joins.patch	text/x-patch	54.1 KB

From:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2019-05-13 16:21:24
Message-ID:	9b226ea2-bce0-90b2-9909-7964b7f353f7@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 3/25/19 18:13, Alexander Kuzmenkov wrote:
>
> Please see the attached v15.
>

I won't be able to continue working on this because I'm changing jobs.
My colleague Arseny Sher is probably going to take over.

Here is a v16 that is a rebased v12, plus renames from v15, plus a
couple of bug fixes (use bms_is_empty instead of a NULL check, and
properly create a NullTest clause when replacing "X = X").

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001-Remove-self-joins-v16.patch	text/x-patch	65.8 KB

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2019-06-27 06:42:05
Message-ID:	8556348d-b5fa-728a-e4db-8dcad9d11e4b@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alexander Lakhin detected the bug in the 'remove self joins' patch:

test:
=====

CREATE TABLE t (a INT UNIQUE);
INSERT INTO t VALUES (1);
SELECT 1 FROM (SELECT x.* FROM t AS x, t AS y WHERE x.a = y.a) AS q,
LATERAL generate_series(1, q.a) gs(i);

Description:
============

FUNCTIONS, TABLEFUNCS and VALUES plan nodes uses direct link to the rte
table. We need to change varno references to relid which will be kept.

Version v.17 of the patch that fix the bug see in attachment.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001-Remove-Self-Joins-v17.patch	text/x-patch	66.8 KB

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2019-08-01 23:54:47
Message-ID:	CA+hUKGK_gpPm4AN7rV0RUvdFus=V_P_7DEVErRfMdeQt4Zr3vA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 27, 2019 at 6:42 PM Andrey Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> Version v.17 of the patch that fix the bug see in attachment.

While moving this to the September CF, I noticed that it needs to be
updated for the recent pg_list.h API changes.

--
Thomas Munro
https://enterprisedb.com

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2019-08-05 11:24:52
Message-ID:	9aa6421b-fa8c-b578-6f18-0174c51f0884@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg범퍼카 토토SQL

On 02/08/2019 04:54, Thomas Munro wrote:
> On Thu, Jun 27, 2019 at 6:42 PM Andrey Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> Version v.17 of the patch that fix the bug see in attachment.
>
> While moving this to the September CF, I noticed that it needs to be
> updated for the recent pg_list.h API changes.
The patch was updated:
1. Changes caused by pg_list.h API changes.
2. Fix the problem of joint clause_relids and required_relids changes [1].
3. Add eclass mentions of removed relation into the kept relation (field
eclass_indexes was introduced by commit 3373c71553).

[1]
/message-id/flat/5c21029d-81a2-c999-6744-6a898fcc9a19%40postgrespro.ru

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001-Remove-self-joins-v18.patch	text/x-patch	67.6 KB

From:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2019-08-07 16:23:34
Message-ID:	bfcdcf79-d726-3986-597e-bbd9b68a67cc@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 05.08.2019 14:24, Andrey Lepikhov wrote:
>
>
> On 02/08/2019 04:54, Thomas Munro wrote:
>> On Thu, Jun 27, 2019 at 6:42 PM Andrey Lepikhov
>> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>>> Version v.17 of the patch that fix the bug see in attachment.
>>
>> While moving this to the September CF, I noticed that it needs to be
>> updated for the recent pg_list.h API changes.
> The patch was updated:
> 1. Changes caused by pg_list.h API changes.
> 2. Fix the problem of joint clause_relids and required_relids changes
> [1].
> 3. Add eclass mentions of removed relation into the kept relation
> (field eclass_indexes was introduced by commit 3373c71553).
>
> [1]
> /message-id/flat/5c21029d-81a2-c999-6744-6a898fcc9a19%40postgrespro.ru
>

One more bug is fixed in this patch: OR clauses were not correctly
updated in case of self join removal.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001-Remove-self-joins-v19.patch	text/x-patch	67.5 KB

From:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2019-09-27 16:38:00
Message-ID:	2c08741c-3d58-f517-23de-549d6c15e14e@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Some more bug fixes in this patch.

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001-Remove-self-joins-20.patch	text/x-patch	71.8 KB

From:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2019-09-30 08:29:44
Message-ID:	cd839807-fc9e-9028-a132-a70631a1edbf@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Slightly refactored version of the patch with more comments.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001-Remove-self-joins-v20.patch	text/x-patch	72.0 KB

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2019-11-05 18:20:28
Message-ID:	f84b7d25-d99c-5dd8-c9cb-64a3731a3e2a@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

v.21 in attechment fix small bug:
Now we move all non-mergejoinable clauses from joininfo to base restrict
info because of the relation will not be joined.

On 30/09/2019 13:29, Konstantin Knizhnik wrote:
>
> Slightly refactored version of the patch with more comments.
>

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001-Remove-self-joins-v21.patch	text/x-patch	74.6 KB

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2020-01-28 04:10:07
Message-ID:	9315b1bc-f134-93aa-ed38-0d6a08719058@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Rebased version v.22.
- Added enable_self_join_removal GUC (true is default)
- The joinquals of the relation that is being removed, redistributed in
accordance with the remove_rel_from_query () machinery.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001-Remove-self-joins-v.22.patch	text/x-patch	76.2 KB

From:	David Steele <david(at)pgmasters(dot)net>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2020-04-01 15:34:54
Message-ID:	031ab7da-c943-cd65-590e-671add024e3f@pgmasters.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg무지개 토토SQL

On 1/27/20 11:10 PM, Andrey Lepikhov wrote:
> Rebased version v.22.
> - Added enable_self_join_removal GUC (true is default)
> - The joinquals of the relation that is being removed, redistributed in
> accordance with the remove_rel_from_query () machinery.

This patch no longer applies cleanly on
src/test/regress/sql/equivclass.sql:
http://cfbot.cputube.org/patch_27_1712.log

The CF entry has been updated to Waiting on Author.

Regards,
--
-David
david(at)pgmasters(dot)net

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	David Steele <david(at)pgmasters(dot)net>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2020-04-03 04:43:41
Message-ID:	8a427a9a-50cd-51e9-3c1c-2176eceb39b0@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg무지개 토토SQL

On 4/1/20 8:34 PM, David Steele wrote:
> This patch no longer applies cleanly on
> src/test/regress/sql/equivclass.sql:
> http://cfbot.cputube.org/patch_27_1712.log
>
> The CF entry has been updated to Waiting on Author.

v.23 in attachment:

1. The patch applies cleanly.
2. Add checks: two potentially self joined relations may belong to
different rules of order restriction in join_info_list.
3. Add test for item 2.

The CF entry has been updated to Needs review.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001-Remove-Self-Joins-v.23.patch	text/x-patch	79.7 KB

From:	David Rowley <dgrowleyml(at)gmail(dot)com>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2020-09-23 04:23:27
Message-ID:	CAApHDvpggnFMC4yP-jUO7PKN=fXeErW5bOxisvJ0HvkHQEY=Ww@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 3 Apr 2020 at 17:43, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> v.23 in attachment:

Hi,

This is only a partial review as I see the patch still does not
incorporate the self join detection method I mentioned in March 2019
and the one the patch only partially works.

Here's the partial review which contains the details:

1. Change to aset.c not relevant to this patch.

--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -1485,7 +1485,6 @@ AllocSetCheck(MemoryContext context)

chsize = chunk->size; /* aligned chunk size */
dsize = chunk->requested_size; /* real data */
-
/*
* Check chunk size
*/

2. Why GUC_NOT_IN_SAMPLE?

+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE

3. I'd expect this to remove the join.

create table t1 (
a int not null unique,
b int not null unique,
c int not null unique,
d int not null,
e int not null
);

explain select * from t1 inner join t1 t2 on t1.a=t2.a and t1.b=t2.c;
QUERY PLAN
----------------------------------------------------------------------
Hash Join (cost=52.50..88.42 rows=1 width=40)
Hash Cond: ((t1.a = t2.a) AND (t1.b = t2.c))
-> Seq Scan on t1 (cost=0.00..27.00 rows=1700 width=20)
-> Hash (cost=27.00..27.00 rows=1700 width=20)
-> Seq Scan on t1 t2 (cost=0.00..27.00 rows=1700 width=20)
(5 rows)

This one seems to work.

This one *does* seem to work.

explain select * from t1 inner join t1 t2 on t1.a=t2.a and t1.d=t2.e;

You should likely implement the method I wrote in the final paragraph in [1]

The basic idea here is you first process the join quals:

t1.a=t2.a and t1.b=t2.c

and keep just the ones that use the same varattno on either side
(t1.a=t2.a). Perhaps it's not worth thinking about Exprs for now, or
if you think it is you can normalise the varnos to 1 and equal()

Then just pass the remaining quals to relation_has_unique_index_for().
If it returns true then there's no need to perform the opposite check
for the other relation. It would just return the same thing.

This will allow you to get rid of using the following as proofs:

/* We must have the same unique index for both relations. */
if (outerinfo->index->indexoid != innerinfo->index->indexoid)
return false;

... as I've shown above. This only works in some cases and that's not
really good enough.

Doing thing the way I describe will allow you to get rid of all the
UniqueRelInfo stuff.

4. Should be ok for partitioned tables though:

/*
* This optimization won't work for tables that have inheritance
* children.
*/
if (rte->inh)
continue;

David

[1] /message-id/CAKJS1f8p-KiEujr12k-oa52JNWWaQUjEjNg%2Bo1MGZk4mHBn_Rg%40mail.gmail.com

From:	"Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2020-10-31 09:26:29
Message-ID:	30b38869-9430-31fb-39e1-c735074b9259@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Thank you for this partial review, I included your changes:

On 9/23/20 9:23 AM, David Rowley wrote:
> On Fri, 3 Apr 2020 at 17:43, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> Doing thing the way I describe will allow you to get rid of all the
> UniqueRelInfo stuff.
Thanks for the review and sorry for the late reply.
I fixed small mistakes, mentioned in your letter.
Also I rewrote this patch at your suggestion [1].
Because of many changes, this patch can be viewed as a sketch.

To change self-join detection algorithm I used your delta patch from
[2]. I added in the split_selfjoin_quals routine complex expressions
handling for demonstration. But, it is not very useful with current
infrastructure, i think.

Also I implemented one additional way for self-join detection algorithm:
if the join target list isn't contained vars from inner relation, then
we can detect self-join with only quals like a1.x=a2.y if check
innerrel_is_unique is true.
Analysis of the target list is contained in the new routine -
tlist_contains_rel_exprs - rewritten version of the build_joinrel_tlist
routine.

Also changes of the join_is_removable() routine is removed from the
patch. I couldn't understand why it is needed here.

Note, this patch causes change of one join.sql regression test output.
It is not a bug, but maybe fixed.

Applied over commit 4a071afbd0.

> [1]
/message-id/CAKJS1f8p-KiEujr12k-oa52JNWWaQUjEjNg%2Bo1MGZk4mHBn_Rg%40mail.gmail.com
[2]
/message-id/CAKJS1f8cJOCGyoxi7a_LG7eu%2BWKF9%2BHTff3wp1KKS5gcUg2Qfg%40mail.gmail.com

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v24-0001-Remove-self-joins.patch	text/x-patch	65.9 KB

From:	Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To:	"Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2020-11-27 09:19:20
Message-ID:	433c3093-6b94-8c68-f5e1-3a0ec1124ab9@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 사이트SQL

On 31.10.2020 12:26, Andrey V. Lepikhov wrote:
> Thank you for this partial review, I included your changes:
>
> On 9/23/20 9:23 AM, David Rowley wrote:
>> On Fri, 3 Apr 2020 at 17:43, Andrey Lepikhov
>> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> Doing thing the way I describe will allow you to get rid of all the
>> UniqueRelInfo stuff.
> Thanks for the review and sorry for the late reply.
> I fixed small mistakes, mentioned in your letter.
> Also I rewrote this patch at your suggestion [1].
> Because of many changes, this patch can be viewed as a sketch.
>
> To change self-join detection algorithm I used your delta patch from
> [2]. I added in the split_selfjoin_quals routine complex expressions
> handling for demonstration. But, it is not very useful with current
> infrastructure, i think.
>
> Also I implemented one additional way for self-join detection
> algorithm: if the join target list isn't contained vars from inner
> relation, then we can detect self-join with only quals like a1.x=a2.y
> if check innerrel_is_unique is true.
> Analysis of the target list is contained in the new routine -
> tlist_contains_rel_exprs - rewritten version of the
> build_joinrel_tlist routine.
>
> Also changes of the join_is_removable() routine is removed from the
> patch. I couldn't understand why it is needed here.
>
> Note, this patch causes change of one join.sql regression test output.
> It is not a bug, but maybe fixed.
>
> Applied over commit 4a071afbd0.
>
> > [1]
> /message-id/CAKJS1f8p-KiEujr12k-oa52JNWWaQUjEjNg%2Bo1MGZk4mHBn_Rg%40mail.gmail.com
> [2]
> /message-id/CAKJS1f8cJOCGyoxi7a_LG7eu%2BWKF9%2BHTff3wp1KKS5gcUg2Qfg%40mail.gmail.com
>
Status update for a commitfest entry.

This entry was "Waiting on author" during this CF. As I see, the latest
message contains new version of the patch. Does it need more work? Are
you going to continue working on it?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	"Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2020-11-27 16:49:57
Message-ID:	16aa5d06-d8c6-416a-cb71-21663019944e@iki.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 31/10/2020 11:26, Andrey V. Lepikhov wrote:
> + /*
> + * Process restrictlist to seperate out the self join quals from
> + * the other quals. e.g x = x goes to selfjoinquals and a = b to
> + * otherjoinquals.
> + */
> + split_selfjoin_quals(root, restrictlist, &selfjoinquals,
> + &otherjoinquals);
> +
> + if (list_length(selfjoinquals) == 0)
> + {
> + /*
> + * Have a chance to remove join if target list contains vars from
> + * the only one relation.
> + */
> + if (list_length(otherjoinquals) == 0)
> + {
> + /* Can't determine uniqueness without any quals. */
> + continue;
> +
> + }
> + else if (!tlist_contains_rel_exprs(root, joinrelids, inner))
> + {
> + if (!innerrel_is_unique(root, joinrelids, outer->relids,
> + inner, JOIN_INNER, otherjoinquals,
> + false))
> + continue;
> + }
> + else
> + /*
> + * The target list contains vars from both inner and outer
> + * relations.
> + */
> + continue;
> + }
> +
> + /*
> + * Determine if the inner table can duplicate outer rows. We must
> + * bypass the unique rel cache here since we're possibly using a
> + * subset of join quals. We can use 'force_cache' = true when all
> + * join quals are selfjoin quals. Otherwise we could end up
> + * putting false negatives in the cache.
> + */
> + else if (!innerrel_is_unique(root, joinrelids, outer->relids,
> + inner, JOIN_INNER, selfjoinquals,
> + list_length(otherjoinquals) == 0))
> + continue;

I don't understand the logic here. If 'selfjoinquals' is empty, it means
that there is no join qual between the two relations, right? How can we
ever remove the join in that case? And how does the target list affect
that? Can you give an example query of that?

> --- a/src/test/regress/expected/join.out
> +++ b/src/test/regress/expected/join.out
> @@ -4553,11 +4553,13 @@ explain (costs off)
> select p.* from
> (parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
> where p.k = 1 and p.k = 2;
> - QUERY PLAN
> ---------------------------
> + QUERY PLAN
> +------------------------------------------------
> Result
> One-Time Filter: false
> -(2 rows)
> + -> Index Scan using parent_pkey on parent x
> + Index Cond: (k = 1)
> +(4 rows)
>
> -- bug 5255: this is not optimizable by join removal
> begin;

That doesn't seem like an improvement...

My general impression of this patch is that it's a lot of code to deal
with a rare special case. At the beginning of this thread there was
discussion on the overhead that this might add to planning queries that
don't benefit, but adding a lot of code isn't nice either, even if the
performance is acceptable. That's not necessarily a show-stopper, but it
does make me much less excited about this. I'm not sure what to suggest
to do about that, except a really vague "Can you make is simpler?"

- Heikki

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2020-11-28 17:21:17
Message-ID:	b32dda0d-41dc-381f-73f0-4702265c97aa@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg배트맨 토토SQL

Thank you for the review,

On 27.11.2020 21:49, Heikki Linnakangas wrote:
> On 31/10/2020 11:26, Andrey V. Lepikhov wrote:
>> +            /*
>> +             * Process restrictlist to seperate out the self join
>> quals from
>> +             * the other quals. e.g x = x goes to selfjoinquals and a
>> = b to
>> +             * otherjoinquals.
>> +             */
>> +            split_selfjoin_quals(root, restrictlist, &selfjoinquals,
>> +                                 &otherjoinquals);
>> +
>> +            if (list_length(selfjoinquals) == 0)
>> +            {
>> +                /*
>> +                 * Have a chance to remove join if target list
>> contains vars from
>> +                 * the only one relation.
>> +                 */
>
> I don't understand the logic here. If 'selfjoinquals' is empty, it means
> that there is no join qual between the two relations, right? How can we
> ever remove the join in that case? And how does the target list affect
> that? Can you give an example query of that?

Maybe it is a problem of variable naming. Following the idea of David
Rowley, we split quals into two subsets: {x==x} and another, for example
{x=y}.
First set is an trivial case of self-join: if we have unique index on
the attribute 'x', then this join is self-join.
Second set is give us a chance: if right side is unique for right side
of the qual and no vars from right side end up in the target list of the
join, then this is a self-join case. Example:

CREATE TABLE a(x int, y int);
CREATE UNIQUE INDEX ON a(x);
SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.x; -- self-join
CREATE UNIQUE INDEX ON a(y);
SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- self-join too

>
>> --- a/src/test/regress/expected/join.out
>> +++ b/src/test/regress/expected/join.out
>> @@ -4553,11 +4553,13 @@ explain (costs off)
>> select p.* from
>>    (parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
>>    where p.k = 1 and p.k = 2;
>> -        QUERY PLAN ---------------------------
>> +                   QUERY PLAN
>> +------------------------------------------------
>> Result
>>     One-Time Filter: false
>> -(2 rows)
>> +   -> Index Scan using parent_pkey on parent x
>> +         Index Cond: (k = 1)
>> +(4 rows)
>>
>> -- bug 5255: this is not optimizable by join removal
>> begin;
>
> That doesn't seem like an improvement...

I investigated this case. It is a planner feature: it simplifies dummy
joins and dummy scans to different plans. Maybe it can cause some
discussion, but this example so rare and doesn't make a problem.

> My general impression of this patch is that it's a lot of code to deal
> with a rare special case. At the beginning of this thread there was
> discussion on the overhead that this might add to planning queries that
> don't benefit, but adding a lot of code isn't nice either, even if the
> performance is acceptable. That's not necessarily a show-stopper, but it
> does make me much less excited about this. I'm not sure what to suggest
> to do about that, except a really vague "Can you make is simpler?"

Currently planner reduces useless outer joins and unique semijoins.
Reduce self join feature continues the development of the planner in the
same direction. For example, it is needed for ORM software.
Most of the code dedicated to removing unnecessary relation and
replacing of one oid with another. We are trying to use
remove_rel_from_query() machinery. Perhaps this will allow us to make
the code shorter.

--
regards,
Andrey Lepikhov
Postgres Professional

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2020-11-29 17:10:55
Message-ID:	4bc7e3f9-d955-c213-dbcd-c41d4f83bbe6@iki.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg배트맨 토토SQL

On 28/11/2020 19:21, Andrey Lepikhov wrote:
> On 27.11.2020 21:49, Heikki Linnakangas wrote:
>> On 31/10/2020 11:26, Andrey V. Lepikhov wrote:
>>> +            /*
>>> +             * Process restrictlist to seperate out the self join
>>> quals from
>>> +             * the other quals. e.g x = x goes to selfjoinquals and a
>>> = b to
>>> +             * otherjoinquals.
>>> +             */
>>> +            split_selfjoin_quals(root, restrictlist, &selfjoinquals,
>>> +                                 &otherjoinquals);
>>> +
>>> +            if (list_length(selfjoinquals) == 0)
>>> +            {
>>> +                /*
>>> +                 * Have a chance to remove join if target list
>>> contains vars from
>>> +                 * the only one relation.
>>> +                 */
>>
>> I don't understand the logic here. If 'selfjoinquals' is empty, it means
>> that there is no join qual between the two relations, right? How can we
>> ever remove the join in that case? And how does the target list affect
>> that? Can you give an example query of that?
>
> Maybe it is a problem of variable naming. Following the idea of David
> Rowley, we split quals into two subsets: {x==x} and another, for example
> {x=y}.
> First set is an trivial case of self-join: if we have unique index on
> the attribute 'x', then this join is self-join.
> Second set is give us a chance: if right side is unique for right side
> of the qual and no vars from right side end up in the target list of the
> join, then this is a self-join case. Example:
>
> CREATE TABLE a(x int, y int);
> CREATE UNIQUE INDEX ON a(x);
> SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.x; -- self-join
> CREATE UNIQUE INDEX ON a(y);
> SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- self-join too

The latter join is not "useless". The patch is returning incorrect
result for that query:

> postgres=# insert into a values (1, 2);
> INSERT 0 1
> postgres=# insert into a values (2, 1);
> INSERT 0 1
> postgres=# SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- WRONG RESULT
> x | y
> ---+---
> (0 rows)
>
> postgres=# set enable_self_join_removal=off;
> SET
> postgres=# SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- CORRECT RESULT
> x | y
> ---+---
> 1 | 2
> 2 | 1
> (2 rows)

- Heikki

From:	"Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc:	David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2020-11-30 05:50:57
Message-ID:	3c4906d2-2825-ae88-6368-bce8f3975269@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/29/20 10:10 PM, Heikki Linnakangas wrote:
> On 28/11/2020 19:21, Andrey Lepikhov wrote:
>> On 27.11.2020 21:49, Heikki Linnakangas wrote:
>> CREATE TABLE a(x int, y int);
>> CREATE UNIQUE INDEX ON a(x);
>> SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.x; -- self-join
>> CREATE UNIQUE INDEX ON a(y);
>> SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- self-join too
>
> The latter join is not "useless". The patch is returning incorrect
> result for that query:
>
>> postgres=# insert into a values (1, 2);
>> INSERT 0 1
>> postgres=# insert into a values (2, 1);
>> INSERT 0 1
>> postgres=# SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- WRONG RESULT
>> x | y ---+---
>> (0 rows)
>>
>> postgres=# set enable_self_join_removal=off;
>> SET
>> postgres=# SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- CORRECT
>> RESULT
>> x | y ---+---
>> 1 | 2
>> 2 | 1
>> (2 rows)

Thanks, it is my fault. I tried to extend this patch with foreign key
references and made a mistake.
Currently I rollback this new option (see patch in attachment), but will
be working for a while to simplify this patch.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v25-0001-Remove-self-joins.patch	text/x-patch	65.2 KB

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	"Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Rowley <dgrowleyml(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2021-01-07 14:08:03
Message-ID:	CAD21AoBrDio7GZnKbhOS0o++eB9WkfmPk0LvfLd_3CqE97n0eA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Andrey,

On Mon, Nov 30, 2020 at 2:51 PM Andrey V. Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>
> On 11/29/20 10:10 PM, Heikki Linnakangas wrote:
> > On 28/11/2020 19:21, Andrey Lepikhov wrote:
> >> On 27.11.2020 21:49, Heikki Linnakangas wrote:
> >> CREATE TABLE a(x int, y int);
> >> CREATE UNIQUE INDEX ON a(x);
> >> SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.x; -- self-join
> >> CREATE UNIQUE INDEX ON a(y);
> >> SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- self-join too
> >
> > The latter join is not "useless". The patch is returning incorrect
> > result for that query:
> >
> >> postgres=# insert into a values (1, 2);
> >> INSERT 0 1
> >> postgres=# insert into a values (2, 1);
> >> INSERT 0 1
> >> postgres=# SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- WRONG RESULT
> >> x | y ---+---
> >> (0 rows)
> >>
> >> postgres=# set enable_self_join_removal=off;
> >> SET
> >> postgres=# SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- CORRECT
> >> RESULT
> >> x | y ---+---
> >> 1 | 2
> >> 2 | 1
> >> (2 rows)
>
> Thanks, it is my fault. I tried to extend this patch with foreign key
> references and made a mistake.
> Currently I rollback this new option (see patch in attachment), but will
> be working for a while to simplify this patch.

Are you working to simplify the patch? This patch has been "Waiting on
Author" for 1 month and doesn't seem to pass cfbot tests. Please
update the patch.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

From:	"Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Rowley <dgrowleyml(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2021-01-11 04:07:15
Message-ID:	21e15759-dba5-2d2b-dd03-06943ad40378@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg롤 토토SQL :

On 1/7/21 7:08 PM, Masahiko Sawada wrote:
> On Mon, Nov 30, 2020 at 2:51 PM Andrey V. Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> Thanks, it is my fault. I tried to extend this patch with foreign key
>> references and made a mistake.
>> Currently I rollback this new option (see patch in attachment), but will
>> be working for a while to simplify this patch.
>
> Are you working to simplify the patch? This patch has been "Waiting on
> Author" for 1 month and doesn't seem to pass cfbot tests. Please
> update the patch.

Yes, I'm working to improve this feature.
In attachment - fixed and rebased on ce6a71fa53.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v25_1-0001-Remove-self-joins.patch	text/x-patch	65.0 KB

From:	Hywel Carver <hywel(at)skillerwhale(dot)com>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc:	Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2021-03-12 09:05:06
Message-ID:	161553990603.28624.7702383984090736980.pgcf@coridan.postgresql.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

I've built and tested this, and it seems to function correctly to me. One question I have is whether the added "IS NOT NULL" filters can be omitted when they're unnecessary. Some of the resulting plans included an "IS NOT NULL" filter on a non-nullable column. To be clear, this is still an improvement (to me) without that.

Here's the simple test script I ran, on master ("before") and with the latest patch applied ("after").

CREATE TABLE users (id BIGINT PRIMARY KEY, nullable_int BIGINT UNIQUE, some_non_nullable_int BIGINT NOT NULL);
CREATE VIEW only_some_users AS (SELECT * FROM users WHERE id < 10);
CREATE VIEW some_other_users AS (SELECT * FROM users WHERE id > 3);

EXPLAIN SELECT *
FROM users u1
INNER JOIN users u2
ON u1.id = u2.id;
-- before does HASH JOIN
-- after does seq scan with "id IS NOT NULL" condition

EXPLAIN SELECT *
FROM only_some_users
INNER JOIN some_other_users
ON only_some_users.id = some_other_users.id;
-- before does HASH JOIN
-- after does no JOIN, instead does scan, with an extra "id IS NOT NULL condition" (in addition to id < 10, id > 3)

EXPLAIN SELECT *
FROM users u1
INNER JOIN users u2
ON u1.nullable_int = u2.nullable_int;
-- before does HASH JOIN
-- after does scan with (nullable_int IS NOT NULL) filter

EXPLAIN SELECT *
FROM users u1
INNER JOIN users u2
ON u1.id = u2.nullable_int;
-- before does HASH JOIN
-- after correctly unchanged

EXPLAIN SELECT *
FROM users u1
INNER JOIN users u2
ON u1.id = u2.some_non_nullable_int
INNER JOIN users u3
ON u2.some_non_nullable_int = u3.id;
-- before does 2x HASH JOIN
-- now does 1x HASH JOIN, with a sequential scan over users filtered by id IS NOT NULL

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Hywel Carver <hywel(at)skillerwhale(dot)com>
Cc:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Rowley <dgrowleyml(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Removing unneeded self joins
Date:	2021-05-07 07:23:00
Message-ID:	e241df57-9dc7-d8fb-b870-68ecead56822@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 12/3/21 14:05, Hywel Carver wrote:
> I've built and tested this, and it seems to function correctly to me. One question I have is whether the added "IS NOT NULL" filters can be omitted when they're unnecessary. Some of the resulting plans included an "IS NOT NULL" filter on a non-nullable column. To be clear, this is still an improvement (to me) without that.

New version of the feature. Deeply refactored with main goal - to reduce
the code size) and rebased on current master.

Here I didn't work on 'unnecessary IS NOT NULL filter'.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v26-0001-Remove-self-joins.patch	text/plain	61.3 KB

From:	"Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Hywel Carver <hywel(at)skillerwhale(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2021-05-27 06:51:23
Message-ID:	ef6684fb-41bc-7d63-36a0-ebcc16f6e4a6@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 핫SQL :

On 5/8/21 2:00 AM, Hywel Carver wrote:
> On Fri, May 7, 2021 at 8:23 AM Andrey Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru <mailto:a(dot)lepikhov(at)postgrespro(dot)ru>> wrote:
> Here I didn't work on 'unnecessary IS NOT NULL filter'.
>
> I've tested the new patch, and it is giving the same improved behaviour
> as the old patch.
Thank you for this efforts!

I cleaned the code of previous version, improved regression tests and
rebased on current master.

Also, I see that we could do additional optimizations for an
EC-generated selfjoin clause (See equivclass.patch for necessary
changes). Example:
explain (costs off)
select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b =
t2.a;
QUERY PLAN
-----------------------------------------------------
Seq Scan on sj t2
Filter: ((a IS NOT NULL) AND (b = a) AND (a = b))
(2 rows)

But I'm not sure that this patch need to be a part of the self-join
removal feature because of code complexity.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
equivclass.patch	text/x-patch	1.9 KB
v27-0001-Remove-self-joins.patch	text/x-patch	62.9 KB

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Hywel Carver <hywel(at)skillerwhale(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2021-06-30 11:21:15
Message-ID:	2a6e540a-4c9f-9402-2e02-71e35fcd4dd1@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 12/3/21 12:05, Hywel Carver wrote:
> I've built and tested this, and it seems to function correctly to me. One question I have is whether the added "IS NOT NULL" filters can be omitted when they're unnecessary. Some of the resulting plans included an "IS NOT NULL" filter on a non-nullable column. To be clear, this is still an improvement (to me) without that.
I think, here we could ask more general question: do we want to remove a
'IS NOT NULL' clause from the clause list if the rest of the list
implicitly implies it?

Right now we don't analyze list of clauses at all:
CREATE TABLE a (x int);

EXPLAIN (COSTS OFF)
SELECT * FROM a WHERE (x < 1) AND (X
IS NOT NULL) AND
(x < 1) AND (X IS NOT NULL);
QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on a
Filter: ((x IS NOT NULL) AND (x IS NOT NULL) AND (x < 1) AND (x < 1))

And even worse:
EXPLAIN (ANALYZE, VERBOSE)
SELECT *
FROM a WHERE (X IS NOT NULL) AND (X IS NULL);
QUERY PLAN

----------------------------------------------------------------------------------------------------
Seq Scan on public.a (cost=0.00..15.00 rows=87 width=4) (actual
time=0.136..0.136 rows=0 loops=1)
Output: x
Filter: ((a.x IS NOT NULL) AND (a.x IS NULL))
Rows Removed by Filter: 1000

It could reduce a number of selectivity mistakes, but increase CPU
consumption.
If we had such a clause analyzing machinery, we could trivially remove
this unneeded qual.

--
regards,
Andrey Lepikhov
Postgres Professional

From:	Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Hywel Carver <hywel(at)skillerwhale(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2021-06-30 15:55:21
Message-ID:	b17c18d9fd379ce2355c355cdb44787a452961ec.camel@cybertec.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2021-06-30 at 14:21 +0300, Andrey Lepikhov wrote:
> I think, here we could ask more general question: do we want to remove a
> 'IS NOT NULL' clause from the clause list if the rest of the list
> implicitly implies it?
>
> EXPLAIN (ANALYZE, VERBOSE)
> SELECT *
> FROM a WHERE (X IS NOT NULL) AND (X IS NULL);
> QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------
> Seq Scan on public.a (cost=0.00..15.00 rows=87 width=4) (actual time=0.136..0.136 rows=0 loops=1)
> Output: x
> Filter: ((a.x IS NOT NULL) AND (a.x IS NULL))
> Rows Removed by Filter: 1000
>
> It could reduce a number of selectivity mistakes, but increase CPU
> consumption.
> If we had such a clause analyzing machinery, we could trivially remove
> this unneeded qual.

On the other hand, sometimes something like that can be used to change
the optimizer's estimates to encourage certain plans.

We also don't optimize "ORDER BY x + 0" (which now you can use to prevent
an index scan) or the famous OFFSET 0, partly because it saves planning time,
partly because those can be useful tools.

Generally I have the impression that we are not too keen on spending
planning time on optimizing cases that can be trivially improved by rewriting
the query.

Yours,
Laurenz Albe

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc:	Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2021-07-01 05:20:24
Message-ID:	1b4b177a-ef90-d3b9-59b2-5cec182f0efb@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 30/6/21 18:55, Laurenz Albe wrote:
> On Wed, 2021-06-30 at 14:21 +0300, Andrey Lepikhov wrote:
>> I think, here we could ask more general question: do we want to remove a
>> 'IS NOT NULL' clause from the clause list if the rest of the list
>> implicitly implies it?
> Generally I have the impression that we are not too keen on spending
> planning time on optimizing cases that can be trivially improved by rewriting
> the query.
Thank you, It's strong argument.
Such feature would look better as a part of an enterprise edition.

--
regards,
Andrey Lepikhov
Postgres Professional

From:	Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To:	Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Hywel Carver <hywel(at)skillerwhale(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2021-07-01 06:37:46
Message-ID:	4911774.fQZKmtWmdp@aivenronan
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Generally I have the impression that we are not too keen on spending
> planning time on optimizing cases that can be trivially improved by
> rewriting the query.

Well in some cases they can't, when the query is not emitting redundant
predicates by itself but they are added by something else like a view or a RLS
policy.
Maybe it would be worth it to allow spending a bit more time planning for
those cases ?

Regards,

--
Ronan Dunklau

From:	Hywel Carver <hywel(at)skillerwhale(dot)com>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2021-07-01 22:56:37
Message-ID:	CAFcA2FauZ2jGka8utvFmRE1eLrKfmJmPnQjtf_cdewChFu0qcQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jun 30, 2021 at 12:21 PM Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
wrote:

> On 12/3/21 12:05, Hywel Carver wrote:
> > I've built and tested this, and it seems to function correctly to me.
> One question I have is whether the added "IS NOT NULL" filters can be
> omitted when they're unnecessary. Some of the resulting plans included an
> "IS NOT NULL" filter on a non-nullable column. To be clear, this is still
> an improvement (to me) without that.

I think, here we could ask more general question: do we want to remove a
> 'IS NOT NULL' clause from the clause list if the rest of the list
> implicitly implies it?

My suggestion was not to remove it, but to avoid adding it in the first
place. When your optimisation has found a join on a group of columns under
a uniqueness constraint, you would do something like this (forgive the
pseudo-code)

foreach(column, join_clause) {
if(column.nullable) { // This condition is what I'm suggesting is added
add_null_test(column, IS_NOT_NULL);
}
}

But it may be that that's not possible or practical at this point in the
code.

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Hywel Carver <hywel(at)skillerwhale(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2021-07-05 13:20:01
Message-ID:	4b69ea09-5047-cd8a-a0d7-de1f12120433@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 커뮤니티SQL

On 2/7/21 01:56, Hywel Carver wrote:
> On Wed, Jun 30, 2021 at 12:21 PM Andrey Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru <mailto:a(dot)lepikhov(at)postgrespro(dot)ru>> wrote:
> I think, here we could ask more general question: do we want to
> remove a
> 'IS NOT NULL' clause from the clause list if the rest of the list
> implicitly implies it?
>
>
> My suggestion was not to remove it, but to avoid adding it in the first
> place. When your optimisation has found a join on a group of columns
> under a uniqueness constraint, you would do something like this (forgive
> the pseudo-code)
>
> foreach(column, join_clause) {
> if(column.nullable) { // This condition is what I'm suggesting is added
> add_null_test(column, IS_NOT_NULL);
> }
> }
>
> But it may be that that's not possible or practical at this point in the
> code.
I think, such option will require to implement a new machinery to prove
that arbitrary column couldn't produce NULL value.

--
regards,
Andrey Lepikhov
Postgres Professional

From:	Hywel Carver <hywel(at)skillerwhale(dot)com>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2021-07-06 10:49:54
Message-ID:	CAFcA2FZA4MDp4wYguZtMt=5fZrvcGM4duMyriP210NKnVCETtw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 베이SQL

On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
wrote:

> On 2/7/21 01:56, Hywel Carver wrote:
> > On Wed, Jun 30, 2021 at 12:21 PM Andrey Lepikhov
> > <a(dot)lepikhov(at)postgrespro(dot)ru <mailto:a(dot)lepikhov(at)postgrespro(dot)ru>> wrote:
> > I think, here we could ask more general question: do we want to
> > remove a
> > 'IS NOT NULL' clause from the clause list if the rest of the list
> > implicitly implies it?
> >
> >
> > My suggestion was not to remove it, but to avoid adding it in the first
> > place. When your optimisation has found a join on a group of columns
> > under a uniqueness constraint, you would do something like this (forgive
> > the pseudo-code)
> >
> > foreach(column, join_clause) {
> > if(column.nullable) { // This condition is what I'm suggesting is
> added
> > add_null_test(column, IS_NOT_NULL);
> > }
> > }
> >
> > But it may be that that's not possible or practical at this point in the
> > code.
> I think, such option will require to implement a new machinery to prove
> that arbitrary column couldn't produce NULL value.
>

Got it, and it makes sense to me that this would be out of scope for this
change.

I remember in the previous conversation about this, Tomas acknowledged that
while there are some silly queries that would benefit from this change,
there are also some well-written ones (e.g. properly denormalised table
structures, with decomposed views that need joining together in some
queries). So the optimization needs to be essentially free to run to
minimise impact on other queries.

Looking through the email chain, a previous version of this patch added
~0.6% to planning time in the worst case tested - does that meet the
"essentially free" requirement?

From:	vignesh C <vignesh21(at)gmail(dot)com>
To:	"Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Hywel Carver <hywel(at)skillerwhale(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2021-07-15 12:05:23
Message-ID:	CALDaNm1FyUqAierWUrMkQ8E+c3UnFJxGC1L7xtQb3YjxLHARdw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 27, 2021 at 12:21 PM Andrey V. Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>
> On 5/8/21 2:00 AM, Hywel Carver wrote:
> > On Fri, May 7, 2021 at 8:23 AM Andrey Lepikhov
> > <a(dot)lepikhov(at)postgrespro(dot)ru <mailto:a(dot)lepikhov(at)postgrespro(dot)ru>> wrote:
> > Here I didn't work on 'unnecessary IS NOT NULL filter'.
> >
> > I've tested the new patch, and it is giving the same improved behaviour
> > as the old patch.
> Thank you for this efforts!
>
> I cleaned the code of previous version, improved regression tests and
> rebased on current master.
>
> Also, I see that we could do additional optimizations for an
> EC-generated selfjoin clause (See equivclass.patch for necessary
> changes). Example:
> explain (costs off)
> select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b =
> t2.a;
> QUERY PLAN
> -----------------------------------------------------
> Seq Scan on sj t2
> Filter: ((a IS NOT NULL) AND (b = a) AND (a = b))
> (2 rows)
>
> But I'm not sure that this patch need to be a part of the self-join
> removal feature because of code complexity.

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Hywel Carver <hywel(at)skillerwhale(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Removing unneeded self joins
Date:	2021-07-15 14:49:11
Message-ID:	a141d07f-d686-f42e-c4e5-f22dd33b40bd@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 6/7/21 13:49, Hywel Carver wrote:
> On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru <mailto:a(dot)lepikhov(at)postgrespro(dot)ru>> wrote:
> Looking through the email chain, a previous version of this patch added
> ~0.6% to planning time in the worst case tested - does that meet the
> "essentially free" requirement?
I think these tests weren't full coverage of possible use cases. It will
depend on a number of relations in the query. For the JOIN of
partitioned tables, for example, the overhead could grow. But in the
context of overall planning time this overhead will be small till the
large number of relations.
Also, we made this feature optional to solve possible problems.
Rebased on 768ea9bcf9

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v28-0001-Remove-self-joins.patch	text/plain	62.9 KB

From:	Zhihong Yu <zyu(at)yugabyte(dot)com>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2021-07-15 15:25:56
Message-ID:	CALNJ-vSr0Jxn7FnDsP2EEwGXKLdCYg1b3qJ3c+yFXtPmoSa0jw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jul 15, 2021 at 7:49 AM Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
wrote:

> On 6/7/21 13:49, Hywel Carver wrote:
> > On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov
> > <a(dot)lepikhov(at)postgrespro(dot)ru <mailto:a(dot)lepikhov(at)postgrespro(dot)ru>> wrote:
> > Looking through the email chain, a previous version of this patch added
> > ~0.6% to planning time in the worst case tested - does that meet the
> > "essentially free" requirement?
> I think these tests weren't full coverage of possible use cases. It will
> depend on a number of relations in the query. For the JOIN of
> partitioned tables, for example, the overhead could grow. But in the
> context of overall planning time this overhead will be small till the
> large number of relations.
> Also, we made this feature optional to solve possible problems.
> Rebased on 768ea9bcf9
>
> --
> regards,
> Andrey Lepikhov
> Postgres Professional
>
Hi,

bq. We can proof the uniqueness

proof -> prove

1. Collect all mergejoinable join quals looks like a.x = b.x

quals looks like -> quals which look like

For update_ec_sources(), the variable cc is not used.

Cheers

From:	Zhihong Yu <zyu(at)yugabyte(dot)com>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2021-07-15 19:28:13
Message-ID:	CALNJ-vSS-bWvNEacBhKtEcVhoBA_trPEHGMHKB_gaeeC4_=v2A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jul 15, 2021 at 8:25 AM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:

>
>
> On Thu, Jul 15, 2021 at 7:49 AM Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
> wrote:
>
>> On 6/7/21 13:49, Hywel Carver wrote:
>> > On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov
>> > <a(dot)lepikhov(at)postgrespro(dot)ru <mailto:a(dot)lepikhov(at)postgrespro(dot)ru>> wrote:
>> > Looking through the email chain, a previous version of this patch added
>> > ~0.6% to planning time in the worst case tested - does that meet the
>> > "essentially free" requirement?
>> I think these tests weren't full coverage of possible use cases. It will
>> depend on a number of relations in the query. For the JOIN of
>> partitioned tables, for example, the overhead could grow. But in the
>> context of overall planning time this overhead will be small till the
>> large number of relations.
>> Also, we made this feature optional to solve possible problems.
>> Rebased on 768ea9bcf9
>>
>> --
>> regards,
>> Andrey Lepikhov
>> Postgres Professional
>>
> Hi,
>
> bq. We can proof the uniqueness
>
> proof -> prove
>
> 1. Collect all mergejoinable join quals looks like a.x = b.x
>
> quals looks like -> quals which look like
>
> For update_ec_sources(), the variable cc is not used.
>
> Cheers
>
Hi,

+ *otherjoinquals = rjoinquals;

Maybe rename rjoinquals as ojoinquals to align with the target variable
name.

+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */

Naming k as kept, r as removed would make the code more readable (remain
starts with r but has opposite meaning).

+ if (bms_is_member(r, info->syn_righthand) &&
+ !bms_is_member(k, info->syn_righthand))
+ jinfo_check = false;
+
+ if (!jinfo_check)
+ break;

There are 4 if statements where jinfo_check is assigned false. Once
jinfo_check is assigned, we can break out of the loop - instead of checking
the remaining conditions.

+ else if (!innerrel_is_unique(root, joinrelids, outer->relids,

nit: the 'else' is not needed since the if block above it goes to next
iteration of the loop.

+ /* See for row marks. */
+ foreach (lc, root->rowMarks)

It seems once imark and omark are set, we can come out of the loop.

Cheers

From:	"Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Zhihong Yu <zyu(at)yugabyte(dot)com>
Cc:	Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2021-07-27 05:34:30
Message-ID:	41f6fa78-e4a5-bd2a-d817-8dc2060ef47c@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 7/16/21 12:28 AM, Zhihong Yu wrote:
>
>
> On Thu, Jul 15, 2021 at 8:25 AM Zhihong Yu <zyu(at)yugabyte(dot)com
> <mailto:zyu(at)yugabyte(dot)com>> wrote:
> bq. We can proof the uniqueness
>
> proof -> prove
Fixed
>
> 1. Collect all mergejoinable join quals looks like a.x = b.x
>
> quals looks like -> quals which look like
>
> For update_ec_sources(), the variable cc is not used.
Fixed
> + *otherjoinquals = rjoinquals;
>
> Maybe rename rjoinquals as ojoinquals to align with the target variable
> name.
Agree, fixed
>
> + int k; /* Index of kept relation */
> + int r = -1; /* Index of removed relation */
>
> Naming k as kept, r as removed would make the code more readable (remain
> starts with r but has opposite meaning).
I think it is correct now: k - index of inner (keeping) relation. r - of
outer (removing) relation.
>
> + if (bms_is_member(r, info->syn_righthand) &&
> + !bms_is_member(k, info->syn_righthand))
> + jinfo_check = false;
> +
> + if (!jinfo_check)
> + break;
>
> There are 4 if statements where jinfo_check is assigned false. Once
> jinfo_check is assigned, we can break out of the loop - instead of
> checking the remaining conditions.
Fixed
>
> + else if (!innerrel_is_unique(root, joinrelids, outer->relids,
>
> nit: the 'else' is not needed since the if block above it goes to next
> iteration of the loop.
Fixed
>
> + /* See for row marks. */
> + foreach (lc, root->rowMarks)
>
> It seems once imark and omark are set, we can come out of the loop.
Maybe you right. fixed.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v29-0001-Remove-self-joins.patch	text/x-patch	62.9 KB

From:	Greg Stark <stark(at)mit(dot)edu>
To:	Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc:	Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2022-02-28 22:03:25
Message-ID:	CAM-w4HP9piqhdTLzmPGBLVxKvkRxdFYNchRK3+4Tkn_Rx9p1hQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 1 Jul 2021 at 02:38, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> wrote:
>
> Well in some cases they can't, when the query is not emitting redundant
> predicates by itself but they are added by something else like a view or a RLS
> policy.
> Maybe it would be worth it to allow spending a bit more time planning for
> those cases ?

Yeah, I'm generally in favour of doing more work in the optimizer to
save query authors work writing queries.

My question is whether it handles cases like:

select b.x,c.y
from t
join t2 as b on (b.id = t.id)
join t2 as c on (c.id = t.id)

That is, if you join against the same table twice on the same qual.
Does the EC mechanism turn this into a qual on b.id = c.id and then
turn this into a self-join that can be removed?

That's the usual pattern I've seen this arise. Not so much that people
write self joins explicitly but that they add a join to check some
column but that is happening in some isolated piece of code that
doesn't know that that join is already in the query. You can easily
end up with a lot of joins against the same table this way.

It's not far different from the old chestnut

select (select x from t2 where id = t.id) as x,
(select y from t2 where id = t.id) as y
from t

which is actually pretty hard to avoid sometimes.

--
greg

From:	Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Hywel Carver <hywel(at)skillerwhale(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Removing unneeded self joins
Date:	2022-03-01 00:09:49
Message-ID:	Yh1kTUtj7+S4Wygy@ahch-to
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg범퍼카 토토SQL

On Thu, Jul 15, 2021 at 05:49:11PM +0300, Andrey Lepikhov wrote:
> On 6/7/21 13:49, Hywel Carver wrote:
> > On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov
> > <a(dot)lepikhov(at)postgrespro(dot)ru <mailto:a(dot)lepikhov(at)postgrespro(dot)ru>> wrote:
> > Looking through the email chain, a previous version of this patch added
> > ~0.6% to planning time in the worst case tested - does that meet the
> > "essentially free" requirement?
> I think these tests weren't full coverage of possible use cases. It will
> depend on a number of relations in the query. For the JOIN of partitioned
> tables, for example, the overhead could grow. But in the context of overall
> planning time this overhead will be small till the large number of
> relations.
> Also, we made this feature optional to solve possible problems.
> Rebased on 768ea9bcf9
>

I made some tests in a machine with 16 cores and 32GB of RAM.
So we can see if this is an improvement.

This is what I found:

+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| test | mode | master | enabled | % | disabled | % |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| pgbench read only | standard | 64418.13 | 63942.94 | -0.74 | 62231.38 | -3.39 |
| pgbench read only | prepared | 108463.51 | 107002.13 | -1.35 | 100960.83 | -6.92 |
| pgbench read only | extended | 55409.65 | 56427.63 | 1.84 | 55927.62 | 0.93 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| pgbench read/write | standard | 9374.91 | 9135.21 | -2.56 | 8840.68 | -5.70 |
| pgbench read/write | prepared | 11849.86 | 11672.23 | -1.50 | 11393.39 | -3.85 |
| pgbench read/write | extended | 7976.80 | 7947.07 | -0.37 | 7788.99 | -2.35 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| select non optimize 1 | standard | 80.97 | 81.29 | 0.40 | 81.30 | 0.41 |
| select non optimize 1 | prepared | 81.29 | 81.28 | -0.01 | 80.89 | -0.49 |
| select non optimize 1 | extended | 81.07 | 80.81 | -0.32 | 80.98 | -0.11 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| select optimized 1 | standard | 15.84 | 13.90 |-12.25 | 15.80 | -0.25 |
| select optimized 1 | prepared | 15.24 | 13.82 | -9.32 | 15.55 | 2.03 |
| select optimized 1 | extended | 15.38 | 13.89 | -9.69 | 15.59 | 1.37 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| select optimized 2 | standard | 10204.91 | 10818.39 | 6.01 | 10261.07 | 0.55 |
| select optimized 2 | prepared | 13284.06 | 15579.33 | 17.28 | 13116.22 | -1.26 |
| select optimized 2 | extended | 10143.43 | 10645.23 | 4.95 | 10142.77 | -0.01 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| select shoe | standard | 5645.28 | 5661.71 | 0.29 | 6180.60 | 9.48 |
| select shoe | prepared | 9660.45 | 9602.37 | -0.60 | 9894.82 | 2.43 |
| select shoe | extended | 5666.47 | 5634.10 | -0.57 | 5757.26 | 1.60 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+

Obviously the pgbench runs are from the standard script. The numbers are
not clear for me, I can see improvementes with the patch only in one
case and, for some reason, if I disable the patch
(enable_self_join_removal='off') I still see a regression in normal
cases and curiosly an improvement in one case.

I'm attaching the queries. I used the users table that is down-thread
and loaded with ~200k rows using:

insert into users
select seq, case when random() < 0.2 then null else random() * 1000 end,
random() * 10000
from generate_series(1, 1000000) seq
on conflict (nullable_int) do nothing;

for master I just dumped the data from the table and loaded it. I'm also
attaching the queries I used.

After this tests, I'm not convinced this is actually providing something
performance-wise. At least not in its current state.

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL

Attachment	Content-Type	Size
select_nonoptimize1.sql	application/sql	74 bytes
select_optimize1.sql	application/sql	142 bytes
select_optimize2.sql	application/sql	155 bytes
select_shoe.sql	application/sql	105 bytes

From:	Greg Stark <stark(at)mit(dot)edu>
To:	Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2022-03-01 02:45:19
Message-ID:	CAM-w4HMS0N9y73AzwTwU_K9=25JMs0cTUoo_8MR_83Gh+Ci7wA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I don't think the benchmarking that's needed is to check whether
pruning unnecessary joins is helpful. Obviously it's going to be hard
to measure on simple queries and small tables. But the resulting plan
is unambiguously superior and in more complex cases could extra i/o.

The benchmarking people were looking for in the past was testing the
impact of the extra planning work in cases where it doesn't end up
being applied. I'm not sure what the worst case is, perhaps a many-way
self-join where the join clauses are not suitable for pruning?

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Greg Stark <stark(at)mit(dot)edu>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc:	Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2022-03-04 10:47:47
Message-ID:	13a0026b-46eb-cbfa-6dbb-4bc2ce210c27@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 1/3/2022 03:03, Greg Stark wrote:
> On Thu, 1 Jul 2021 at 02:38, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> wrote:
>>
>> Well in some cases they can't, when the query is not emitting redundant
>> predicates by itself but they are added by something else like a view or a RLS
>> policy.
>> Maybe it would be worth it to allow spending a bit more time planning for
>> those cases ?
>
> Yeah, I'm generally in favour of doing more work in the optimizer to
> save query authors work writing queries.
>
> My question is whether it handles cases like:
>
> select b.x,c.y
> from t
> join t2 as b on (b.id = t.id)
> join t2 as c on (c.id = t.id)
>
> That is, if you join against the same table twice on the same qual.
> Does the EC mechanism turn this into a qual on b.id = c.id and then
> turn this into a self-join that can be removed?
Yes, the self-join removal machinery uses EC mechanism as usual to get
all join clauses. So, this case works (See demo in attachment).

Also, in new version of the patch I fixed one stupid bug: checking a
self-join candidate expression operator - we can remove only expressions
like F(arg1) = G(arg2).

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v30-0001-Remove-self-joins.patch	text/plain	64.0 KB
ec_usage_demo.sql	application/sql	379 bytes

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Greg Stark <stark(at)mit(dot)edu>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2022-03-22 00:58:39
Message-ID:	20220322005839.hmb34acerhmcc3ky@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg배트맨 토토SQL

Hi,

On 2022-03-04 15:47:47 +0500, Andrey Lepikhov wrote:
> Also, in new version of the patch I fixed one stupid bug: checking a
> self-join candidate expression operator - we can remove only expressions
> like F(arg1) = G(arg2).

This CF entry currently fails tests: https://cirrus-ci.com/task/4632127944785920?logs=test_world#L1938

Looks like you're missing an adjustment of postgresql.conf.sample

Marked as waiting-on-author.

Greetings,

Andres Freund

From:	"Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Greg Stark <stark(at)mit(dot)edu>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2022-03-24 05:21:54
Message-ID:	de021f36-bb56-9e2e-6d3d-94fee25534da@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 3/22/22 05:58, Andres Freund wrote:
> Hi,
>
> On 2022-03-04 15:47:47 +0500, Andrey Lepikhov wrote:
>> Also, in new version of the patch I fixed one stupid bug: checking a
>> self-join candidate expression operator - we can remove only expressions
>> like F(arg1) = G(arg2).
>
> This CF entry currently fails tests: https://cirrus-ci.com/task/4632127944785920?logs=test_world#L1938
>
> Looks like you're missing an adjustment of postgresql.conf.sample
>
> Marked as waiting-on-author.
Thanks, I fixed it.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v31-0001-Remove-self-joins.patch	text/x-patch	64.0 KB

From:	Greg Stark <stark(at)mit(dot)edu>
To:	"Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2022-04-01 15:27:56
Message-ID:	CAM-w4HMN5yax2bLWwknW5aB9Deg5A77JT1AXdVkdGVSVUpfhjQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 베이SQL

Sigh. And now there's a patch conflict in a regression test expected
output: sysviews.out

Please rebase. Incidentally, make sure to check the expected output is
actually correct. It's easy to "fix" an expected output to
accidentally just memorialize an incorrect output.

Btw, it's the last week before feature freeze so time is of the essence.

From:	"Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Greg Stark <stark(at)mit(dot)edu>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2022-04-04 08:27:45
Message-ID:	fd75d292-593b-a59f-6350-1c8944a80337@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 4/1/22 20:27, Greg Stark wrote:
> Sigh. And now there's a patch conflict in a regression test expected
> output: sysviews.out
>
> Please rebase. Incidentally, make sure to check the expected output is
> actually correct. It's easy to "fix" an expected output to
> accidentally just memorialize an incorrect output.
>
> Btw, it's the last week before feature freeze so time is of the essence.Thanks,
patch in attachment rebased on current master.
Sorry for late answer.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v32-0001-Remove-self-joins.patch	text/x-patch	64.0 KB

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Greg Stark <stark(at)mit(dot)edu>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2022-05-13 05:07:47
Message-ID:	c0f41cba-659d-c6e1-7fb4-85e703dc2395@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 베이SQL

New version of the feature.
Here a minor bug with RowMarks is fixed. A degenerated case is fixed,
when uniqueness of an inner deduced not from join quals, but from a
baserestrictinfo clauses 'x=const', where x - unique field.
Code, dedicated to solve second issue is controversial, so i attached
delta.txt for quick observation.
Maybe we should return to previous version of code, when we didn't split
restriction list into join quals and base quals?

--
Regards
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v33-0001-Remove-self-joins.patch	text/x-patch	84.3 KB
delta.txt	text/plain	25.2 KB

From:	Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To:	Greg Stark <stark(at)mit(dot)edu>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2022-05-17 14:14:46
Message-ID:	2537176.Lt9SDvczpP@aivenronan
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg메이저 토토 사이트SQL

Le vendredi 13 mai 2022, 07:07:47 CEST Andrey Lepikhov a écrit :
> New version of the feature.
> Here a minor bug with RowMarks is fixed. A degenerated case is fixed,
> when uniqueness of an inner deduced not from join quals, but from a
> baserestrictinfo clauses 'x=const', where x - unique field.
> Code, dedicated to solve second issue is controversial, so i attached
> delta.txt for quick observation.
> Maybe we should return to previous version of code, when we didn't split
> restriction list into join quals and base quals?

Hello,

I tried to find problematic cases, which would make the planning time grow
unacceptably, and couldn't devise it.

The worst case scenario I could think of was as follows:

- a query with many different self joins
- an abundance of unique indexes on combinations of this table columns to
consider
- additional predicates on the where clause on columns.

The base table I used for this was a table with 40 integers. 39 unique indexes
were defined on every combination of (c1, cX) with cX being columns c2 to c40.

I turned geqo off, set from_collapse_limit and join_collapse_limit to
unreasonably high values (30), and tried to run queries of the form:

SELECT * FROM test_table t1
JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX
...
JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX.

So no self join can be eliminated in that case.
The performance was very similar with or without the GUC enabled. I tested the
same thing without the patch, since the test for uniqueness has been slightly
altered and incurs a new allocation, but it doesn't seem to change.

One interesting side effect of this patch, is that removing any unneeded self
join cuts down the planification time very significantly, as we lower the number
of combinations to consider.

As for the code:

- Comments on relation_has_unique_index_ext and relation_has_unique_index_for
should be rewritten, as relation_has_unique_index_for is now just a special
case of relation_has_unique_index_ext. By the way, the comment would probably
be better read as: "but if extra_clauses isn't NULL".
- The whole thing about "extra_clauses", ie, baserestrictinfos which were
used to determine uniqueness, is not very clear. Most functions where the new
argument has been added have not seen an update in their comments, and the
name itself doesn't really convey the intented meaning: perhaps
required_non_join_clauses ?

The way this works should be explained a bit more thoroughly, for example in
remove_self_joins_one_group the purpose of uclauses should be explained. The
fact that degenerate_case returns true when we don't have any additional base
restrict info is also confusing, as well as the degenerate_case name.

I'll update if I think of more interesting things to add.

--
Ronan Dunklau

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Greg Stark <stark(at)mit(dot)edu>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2022-05-19 10:48:18
Message-ID:	a1d6290c-44e0-0dfc-3fca-66a68b3109ef@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 5/17/22 19:14, Ronan Dunklau wrote:
> Le vendredi 13 mai 2022, 07:07:47 CEST Andrey Lepikhov a écrit :
>> New version of the feature.
>> Here a minor bug with RowMarks is fixed. A degenerated case is fixed,
>> when uniqueness of an inner deduced not from join quals, but from a
>> baserestrictinfo clauses 'x=const', where x - unique field.
>> Code, dedicated to solve second issue is controversial, so i attached
>> delta.txt for quick observation.
>> Maybe we should return to previous version of code, when we didn't split
>> restriction list into join quals and base quals?
>
> Hello,
>
> I tried to find problematic cases, which would make the planning time grow
> unacceptably, and couldn't devise it.
>
> The worst case scenario I could think of was as follows:
>
> - a query with many different self joins
> - an abundance of unique indexes on combinations of this table columns to
> consider
> - additional predicates on the where clause on columns.
Looking into the patch I can imagine, that the most difficult case is
when a set of relations with the same OID is huge, but only small part
of them (or nothing) can be removed.
Also, removing a clause from restrictinfo list or from equivalence class
adds non-linear complexity. So, you can dig this way ).
>
> The base table I used for this was a table with 40 integers. 39 unique indexes
> were defined on every combination of (c1, cX) with cX being columns c2 to c40.
>
> I turned geqo off, set from_collapse_limit and join_collapse_limit to
> unreasonably high values (30), and tried to run queries of the form:
>
> SELECT * FROM test_table t1
> JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX
> ...
> JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX.
>
> So no self join can be eliminated in that case.
I think, you should compare t1.cX with tX.cX to eliminate self-join.
Cross-unique-index proof isn't supported now.
> The performance was very similar with or without the GUC enabled. I tested the
> same thing without the patch, since the test for uniqueness has been slightly
> altered and incurs a new allocation, but it doesn't seem to change.
>
> One interesting side effect of this patch, is that removing any unneeded self
> join cuts down the planification time very significantly, as we lower the number
> of combinations to consider.
Even more - removing a join we improve cardinality estimation.
>
> As for the code:
>
> - Comments on relation_has_unique_index_ext and relation_has_unique_index_for
> should be rewritten, as relation_has_unique_index_for is now just a special
> case of relation_has_unique_index_ext. By the way, the comment would probably
> be better read as: "but if extra_clauses isn't NULL".
> - The whole thing about "extra_clauses", ie, baserestrictinfos which were
> used to determine uniqueness, is not very clear. Most functions where the new
> argument has been added have not seen an update in their comments, and the
> name itself doesn't really convey the intented meaning: perhaps
> required_non_join_clauses ?
>
> The way this works should be explained a bit more thoroughly, for example in
> remove_self_joins_one_group the purpose of uclauses should be explained. The
> fact that degenerate_case returns true when we don't have any additional base
> restrict info is also confusing, as well as the degenerate_case name.
Agree,
but after this case thoughts wander in my head: should we make one step
back to pre-[1] approach? It looks like we have quite similar changes,
but without special function for a 'degenerate case' detection and
restrictlist splitting.
>
> I'll update if I think of more interesting things to add.
Thank you for your efforts!

See in attachment next version which fixes mistakes detected by
zyu(at)yugabyte(dot)com(dot)

[1]
/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com

--
Regards
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v34-0001-Remove-self-joins.patch	text/x-patch	83.9 KB

From:	Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To:	Greg Stark <stark(at)mit(dot)edu>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2022-05-19 11:47:36
Message-ID:	13021914.uLZWGnKmhe@aivenronan
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Le jeudi 19 mai 2022, 12:48:18 CEST Andrey Lepikhov a écrit :
> On 5/17/22 19:14, Ronan Dunklau wrote:
> > Le vendredi 13 mai 2022, 07:07:47 CEST Andrey Lepikhov a écrit :
> >> New version of the feature.
> >> Here a minor bug with RowMarks is fixed. A degenerated case is fixed,
> >> when uniqueness of an inner deduced not from join quals, but from a
> >> baserestrictinfo clauses 'x=const', where x - unique field.
> >> Code, dedicated to solve second issue is controversial, so i attached
> >> delta.txt for quick observation.
> >> Maybe we should return to previous version of code, when we didn't split
> >> restriction list into join quals and base quals?
> >
> > Hello,
> >
> > I tried to find problematic cases, which would make the planning time grow
> > unacceptably, and couldn't devise it.
> >
> > The worst case scenario I could think of was as follows:
> > - a query with many different self joins
> > - an abundance of unique indexes on combinations of this table columns
> > to
> >
> > consider
> >
> > - additional predicates on the where clause on columns.
>
> Looking into the patch I can imagine, that the most difficult case is
> when a set of relations with the same OID is huge, but only small part
> of them (or nothing) can be removed.
> Also, removing a clause from restrictinfo list or from equivalence class
> adds non-linear complexity. So, you can dig this way ).
>
> > The base table I used for this was a table with 40 integers. 39 unique
> > indexes were defined on every combination of (c1, cX) with cX being
> > columns c2 to c40.
> >
> > I turned geqo off, set from_collapse_limit and join_collapse_limit to
> > unreasonably high values (30), and tried to run queries of the form:
> >
> > SELECT * FROM test_table t1
> > JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX
> > ...
> > JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX.
> >
> > So no self join can be eliminated in that case.
>
> I think, you should compare t1.cX with tX.cX to eliminate self-join.
> Cross-unique-index proof isn't supported now.

Yes, that's the point. I wanted to try to introduce as much complexity as I
could, without actually performing any self join elimination. The idea was to
try to come up with the worst case scenario.
>
> > The performance was very similar with or without the GUC enabled. I tested
> > the same thing without the patch, since the test for uniqueness has been
> > slightly altered and incurs a new allocation, but it doesn't seem to
> > change.
> >
> > One interesting side effect of this patch, is that removing any unneeded
> > self join cuts down the planification time very significantly, as we
> > lower the number of combinations to consider.
>
> Even more - removing a join we improve cardinality estimation.
>
> > As for the code:
> > - Comments on relation_has_unique_index_ext and
> > relation_has_unique_index_for>
> > should be rewritten, as relation_has_unique_index_for is now just a
> > special
> > case of relation_has_unique_index_ext. By the way, the comment would
> > probably be better read as: "but if extra_clauses isn't NULL".
> >
> > - The whole thing about "extra_clauses", ie, baserestrictinfos which
> > were
> >
> > used to determine uniqueness, is not very clear. Most functions where the
> > new argument has been added have not seen an update in their comments,
> > and the name itself doesn't really convey the intented meaning: perhaps
> > required_non_join_clauses ?
> >
> > The way this works should be explained a bit more thoroughly, for example
> > in remove_self_joins_one_group the purpose of uclauses should be
> > explained. The fact that degenerate_case returns true when we don't have
> > any additional base restrict info is also confusing, as well as the
> > degenerate_case name.
> Agree,
> but after this case thoughts wander in my head: should we make one step
> back to pre-[1] approach? It looks like we have quite similar changes,
> but without special function for a 'degenerate case' detection and
> restrictlist splitting.

I'll take a look at that one.

>
> > I'll update if I think of more interesting things to add.
>
> Thank you for your efforts!
>
> See in attachment next version which fixes mistakes detected by
> zyu(at)yugabyte(dot)com(dot)
>
> [1]
> /message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW
> 5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com

--
Ronan Dunklau

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Greg Stark <stark(at)mit(dot)edu>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2022-06-30 14:11:51
Message-ID:	e726cc25-de80-3818-66b2-1b893bb54602@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 19/5/2022 16:47, Ronan Dunklau wrote:
> I'll take a look at that one.
New version of the patch, rebased on current master:
1. pgindent over the patch have passed.
2. number of changed files is reduced.
3. Some documentation and comments is added.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v35-0001-Remove-self-joins.patch	text/plain	83.9 KB

From:	Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To:	Greg Stark <stark(at)mit(dot)edu>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2022-07-04 13:49:59
Message-ID:	2375492.jE0xQCEvom@aivenronan
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Le jeudi 30 juin 2022, 16:11:51 CEST Andrey Lepikhov a écrit :
> On 19/5/2022 16:47, Ronan Dunklau wrote:
> > I'll take a look at that one.
>
> New version of the patch, rebased on current master:
> 1. pgindent over the patch have passed.
> 2. number of changed files is reduced.
> 3. Some documentation and comments is added.

Hello Andrey,

Thanks for the updates.

The general approach seems sensible to me, so I'm going to focus on some details.

In a very recent thread [1], Tom Lane is proposing to add infrastructure to make Var aware of their nullability by outer joins. I wonder if that would help with avoiding the need for adding is not null clauses when the column is known not null ?
If we have a precedent for adding a BitmapSet to the Var itself, maybe the whole discussion regarding keeping track of nullability can be extended to the original column nullability ?

Also, I saw it was mentioned earlier in the thread but how difficult would it be to process the transformed quals through the EquivalenceClass machinery and the qual simplification ?
For example, if the target audience of this patch is ORM, or inlined views, it wouldn't surprise me to see queries of this kind in the wild, which could be avoided altogether:

postgres=# explain (costs off) select * from sj s1 join sj s2 on s1.a = s2.a where s1.b = 2 and s2.b =3;
QUERY PLAN
-----------------------------------------------------
Seq Scan on sj s2
Filter: ((a IS NOT NULL) AND (b = 3) AND (b = 2))
(2 lignes)

+ for (counter = 0; counter < list_length(*sources);)
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
....
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);

Why don't you use foreach() and foreach_delete_current macros for iterating and removing items in the lists, both in update_ec_members and update_ec_sources ?

+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))

I think this is more compact and easier to follow than the previous version, but I'm not sure how common it is in postgres source code to use that kind of construction ?

Some review about the comments:

I see you keep using the terms "the keeping relation" and "the removing relation" in reference to the relation that is kept and the one that is removed.
Aside from the grammar (the kept relation or the removed relation), maybe it would make it clearer to call them something else. In other parts of the code, you used "the remaining relation / the removed relation" which makes sense.

/*
* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.

This could be rephrased: ", optionally replacing it with another relid. The latter is used by the self-join removing process."

[1] /message-id/flat/830269.1656693747%40sss.pgh.pa.us

--
Ronan Dunklau

From:	Zhihong Yu <zyu(at)yugabyte(dot)com>
To:	Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc:	Greg Stark <stark(at)mit(dot)edu>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2022-07-04 14:16:31
Message-ID:	CALNJ-vSeE9xzeFK9gM+kNqNF1csM1sH2epeeTZUbS6WFhXraGA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jul 4, 2022 at 6:52 AM Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> wrote:

> Le jeudi 30 juin 2022, 16:11:51 CEST Andrey Lepikhov a écrit :
> > On 19/5/2022 16:47, Ronan Dunklau wrote:
> > > I'll take a look at that one.
> >
> > New version of the patch, rebased on current master:
> > 1. pgindent over the patch have passed.
> > 2. number of changed files is reduced.
> > 3. Some documentation and comments is added.
>
> Hello Andrey,
>
> Thanks for the updates.
>
> The general approach seems sensible to me, so I'm going to focus on some
> details.
>
> In a very recent thread [1], Tom Lane is proposing to add infrastructure
> to make Var aware of their nullability by outer joins. I wonder if that
> would help with avoiding the need for adding is not null clauses when the
> column is known not null ?
> If we have a precedent for adding a BitmapSet to the Var itself, maybe the
> whole discussion regarding keeping track of nullability can be extended to
> the original column nullability ?
>
> Also, I saw it was mentioned earlier in the thread but how difficult would
> it be to process the transformed quals through the EquivalenceClass
> machinery and the qual simplification ?
> For example, if the target audience of this patch is ORM, or inlined
> views, it wouldn't surprise me to see queries of this kind in the wild,
> which could be avoided altogether:
>
> postgres=# explain (costs off) select * from sj s1 join sj s2 on s1.a =
> s2.a where s1.b = 2 and s2.b =3;
> QUERY PLAN
> -----------------------------------------------------
> Seq Scan on sj s2
> Filter: ((a IS NOT NULL) AND (b = 3) AND (b = 2))
> (2 lignes)
>
>
> + for (counter = 0; counter < list_length(*sources);)
> + {
> + ListCell *cell = list_nth_cell(*sources, counter);
> + RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
> + int counter1;
> +
> ....
> + ec->ec_members = list_delete_cell(ec->ec_members, cell);
>
>
> Why don't you use foreach() and foreach_delete_current macros for
> iterating and removing items in the lists, both in update_ec_members and
> update_ec_sources ?
>
>
> + if ((bms_is_member(k, info->syn_lefthand) ^
> + bms_is_member(r,
> info->syn_lefthand)) ||
> + (bms_is_member(k,
> info->syn_righthand) ^
> + bms_is_member(r,
> info->syn_righthand)))
>
> I think this is more compact and easier to follow than the previous
> version, but I'm not sure how common it is in postgres source code to use
> that kind of construction ?
>
> Some review about the comments:
>
>
> I see you keep using the terms "the keeping relation" and "the removing
> relation" in reference to the relation that is kept and the one that is
> removed.
> Aside from the grammar (the kept relation or the removed relation), maybe
> it would make it clearer to call them something else. In other parts of the
> code, you used "the remaining relation / the removed relation" which makes
> sense.
>
> /*
> * Remove the target relid from the planner's data structures, having
> - * determined that there is no need to include it in the query.
> + * determined that there is no need to include it in the query. Or replace
> + * with another relid.
> + * To reusability, this routine can work in two modes: delete relid from
> a plan
> + * or replace it. It is used in replace mode in a self-join removing
> process.
>
> This could be rephrased: ", optionally replacing it with another relid.
> The latter is used by the self-join removing process."
>
>
> [1]
> /message-id/flat/830269.1656693747%40sss.pgh.pa.us
>
> --
> Ronan Dunklau
>
>
> Hi,
bq. this is more compact and easier to follow than the previous version

A code comment can be added above the expression (involving XOR) to explain
the purpose of the expression.

Cheers

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Greg Stark <stark(at)mit(dot)edu>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2022-08-26 12:22:08
Message-ID:	fb94ea63-2558-232b-b7fc-8101f826e91e@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg스포츠 토토 사이트SQL

On 30/6/2022 17:11, Andrey Lepikhov wrote:
> On 19/5/2022 16:47, Ronan Dunklau wrote:
>> I'll take a look at that one.
> New version of the patch, rebased on current master:
> 1. pgindent over the patch have passed.
> 2. number of changed files is reduced.
> 3. Some documentation and comments is added.
New version rebased on new master, minor changes and tests added.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v36-0001-Remove-self-joins.patch	text/plain	86.9 KB

From:	Zhihong Yu <zyu(at)yugabyte(dot)com>
To:	Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc:	Greg Stark <stark(at)mit(dot)edu>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2022-08-26 22:02:16
Message-ID:	CALNJ-vS4PtC0e53=8fTEe8t3U5d4cTOJYL1PAQsn4=aBk9C5tQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg범퍼카 토토SQL

Hi,
For v36-0001-Remove-self-joins.patch :

bq removes inner join of plane table to itself

plane table -> plain table

For relation_has_unique_index_ext(), it seems when extra_clauses is NULL,
there is no need to compute `exprs`.

Cheers

From:	Zhihong Yu <zyu(at)yugabyte(dot)com>
To:	Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc:	Greg Stark <stark(at)mit(dot)edu>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2022-08-28 23:39:09
Message-ID:	CALNJ-vTyL-LpvSOPZxpC63Et3LJLUAFZSfRqGEhT5Rj7_EEj7w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg윈 토토SQL :

On Fri, Aug 26, 2022 at 3:02 PM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:

> Hi,
> For v36-0001-Remove-self-joins.patch :
>
> bq removes inner join of plane table to itself
>
> plane table -> plain table
>
> For relation_has_unique_index_ext(), it seems when extra_clauses is NULL,
> there is no need to compute `exprs`.
>
> Cheers
>

For remove_self_joins_recurse():

+ if (bms_num_members(relids) > join_collapse_limit)
+ break;

The above line just comes out of the switch statement. This check should be
done again between foreach and switch.
Otherwise the above check wouldn't achieve what you want.

Cheers

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Zhihong Yu <zyu(at)yugabyte(dot)com>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc:	Greg Stark <stark(at)mit(dot)edu>, Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2022-08-29 05:05:29
Message-ID:	0fd15c10-eaa5-51bd-b5c7-e635fe03a94d@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 8/29/22 04:39, Zhihong Yu wrote:
>
>
> On Fri, Aug 26, 2022 at 3:02 PM Zhihong Yu <zyu(at)yugabyte(dot)com
> <mailto:zyu(at)yugabyte(dot)com>> wrote:
>
> Hi,
> For v36-0001-Remove-self-joins.patch :
>
> bq removes inner join of plane table to itself
>
> plane table -> plain table
>
> For relation_has_unique_index_ext(), it seems when extra_clauses
> is NULL, there is no need to compute `exprs`.
>
> Cheers
Done
>
>
> For remove_self_joins_recurse():
>
> + if (bms_num_members(relids) > join_collapse_limit)
> + break;
>
> The above line just comes out of the switch statement. This check should
> be done again between foreach and switch.
> Otherwise the above check wouldn't achieve what you want.
>
> Cheers
Thanks for highlighting the problem.
I guess, usage either of join_collapse_limit or from_collapse_limit
isn't practical here.
That we really afraid here - many senseless search cycles of self-joins.
And it may have higher limit than GUCs above. So I introduced a guc,
called "self_join_search_limit" (so far undocumented) that is an
explicit limit for a set of plain relations in FROM-list to search
self-joins.

--
Regards
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v37-0001-Remove-self-joins.patch	text/x-patch	89.3 KB

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Zhihong Yu <zyu(at)yugabyte(dot)com>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc:	Greg Stark <stark(at)mit(dot)edu>, Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2022-10-05 12:25:18
Message-ID:	d9493d5a-b950-465e-f09b-62c92a23fba1@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

New version, rebased onto current master.
Nothing special, just rebase.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v38-0001-Remove-self-joins.patch	text/plain	89.3 KB

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Zhihong Yu <zyu(at)yugabyte(dot)com>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Greg Stark <stark(at)mit(dot)edu>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2022-12-06 18:46:43
Message-ID:	20221206184643.zya44h5auv6f45zp@awork3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On 2022-10-05 17:25:18 +0500, Andrey Lepikhov wrote:
> New version, rebased onto current master.
> Nothing special, just rebase.

This doesn't pass the main regression tests due to a plan difference.
https://cirrus-ci.com/task/5537518245380096
https://api.cirrus-ci.com/v1/artifact/task/5537518245380096/testrun/build/testrun/regress/regress/regression.diffs

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/join.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/join.out 2022-12-05 19:11:52.453920838 +0000
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out 2022-12-05 19:15:21.864183651 +0000
@@ -5806,7 +5806,7 @@
Nested Loop
Join Filter: (sj_t3.id = sj_t1.id)
-> Nested Loop
- Join Filter: (sj_t3.id = sj_t2.id)
+ Join Filter: (sj_t2.id = sj_t3.id)
-> Nested Loop Semi Join
-> Nested Loop
-> HashAggregate

Greetings,

Andres Freund

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Zhihong Yu <zyu(at)yugabyte(dot)com>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Greg Stark <stark(at)mit(dot)edu>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2022-12-16 06:45:30
Message-ID:	36579124-f9d9-a51e-38b3-af3511921c7e@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 12/6/22 23:46, Andres Freund wrote:
> This doesn't pass the main regression tests due to a plan difference.
> https://cirrus-ci.com/task/5537518245380096
> https://api.cirrus-ci.com/v1/artifact/task/5537518245380096/testrun/build/testrun/regress/regress/regression.diffs
>
> diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/join.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out
> --- /tmp/cirrus-ci-build/src/test/regress/expected/join.out 2022-12-05 19:11:52.453920838 +0000
> +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out 2022-12-05 19:15:21.864183651 +0000
> @@ -5806,7 +5806,7 @@
> Nested Loop
> Join Filter: (sj_t3.id = sj_t1.id)
> -> Nested Loop
> - Join Filter: (sj_t3.id = sj_t2.id)
> + Join Filter: (sj_t2.id = sj_t3.id)
> -> Nested Loop Semi Join
> -> Nested Loop
> -> HashAggregate
This change in the test behaviour is induced by the a5fc4641
"Avoid making commutatively-duplicate clauses in EquivalenceClasses."
Nothing special, as I see. Attached patch fixes this.

--
Regards
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v39-0001-Remove-self-joins.patch	text/x-patch	89.3 KB

From:	Michał Kłeczek <michal(at)kleczek(dot)org>
To:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-03-06 05:30:03
Message-ID:	88079CBD-B5BB-4C61-A68B-A8C67599E1C7@kleczek.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 사이트 순위SQL

Hi All,

I just wanted to ask about the status and plans for this patch.
I can see it being stuck at “Waiting for Author” status in several commit tests.

I think this patch would be really beneficial for us as we heavily use views to structure out code.
Each view is responsible for providing some calculated values and they are joined in a query to retrieve the full set of information.

Not sure how the process works and how I could help (I am absolutely not capable of helping with coding I am afraid - but could sponsor a (small :) ) bounty to speed things up).

Thanks,
Michal

> On 16 Dec 2022, at 07:45, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>
> On 12/6/22 23:46, Andres Freund wrote:
>> This doesn't pass the main regression tests due to a plan difference.
>> https://cirrus-ci.com/task/5537518245380096
>> https://api.cirrus-ci.com/v1/artifact/task/5537518245380096/testrun/build/testrun/regress/regress/regression.diffs
>> diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/join.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out
>> --- /tmp/cirrus-ci-build/src/test/regress/expected/join.out 2022-12-05 19:11:52.453920838 +0000
>> +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out 2022-12-05 19:15:21.864183651 +0000
>> @@ -5806,7 +5806,7 @@
>> Nested Loop
>> Join Filter: (sj_t3.id = sj_t1.id)
>> -> Nested Loop
>> - Join Filter: (sj_t3.id = sj_t2.id)
>> + Join Filter: (sj_t2.id = sj_t3.id)
>> -> Nested Loop Semi Join
>> -> Nested Loop
>> -> HashAggregate
> This change in the test behaviour is induced by the a5fc4641
> "Avoid making commutatively-duplicate clauses in EquivalenceClasses."
> Nothing special, as I see. Attached patch fixes this.
>
> --
> Regards
> Andrey Lepikhov
> Postgres Professional
> <v39-0001-Remove-self-joins.patch>

From:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>
To:	Michał Kłeczek <michal(at)kleczek(dot)org>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-04-03 20:30:52
Message-ID:	CAM-w4HMZXW+hKc8C+WLQ+gdpEkoh9378a2UGKyZRz7mZVerG+Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 6 Mar 2023 at 00:30, Michał Kłeczek <michal(at)kleczek(dot)org> wrote:
>
> Hi All,
>
> I just wanted to ask about the status and plans for this patch.
> I can see it being stuck at “Waiting for Author” status in several commit tests.

Sadly it seems to now be badly in need of a rebase. There are large
hunks failing in the guts of analyzejoins.c as well as minor failures
elsewhere and lots of offsets which need to be reviewed.

I think given the lack of activity it's out of time for this release
at this point. I'm moving it ahead to the next CF.

--
Gregory Stark
As Commitfest Manager

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-05-25 09:40:35
Message-ID:	f1eee810-ccd5-cdee-775f-80c87c74f49b@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg무지개 토토SQL

On 3/6/23 10:30, Michał Kłeczek wrote:
> Hi All,
>
> I just wanted to ask about the status and plans for this patch.
> I can see it being stuck at “Waiting for Author” status in several
> commit tests.
>
> I think this patch would be really beneficial for us as we heavily use
> views to structure out code.
> Each view is responsible for providing some calculated values and
they > are joined in a query to retrieve the full set of information.
>
> Not sure how the process works and how I could help (I am absolutely
> not capable of helping with coding I am afraid - but could sponsor a
> (small :) ) bounty to speed things up).

Yes, I am still working on this feature. Because of significant changes
in the optimizer code which Tom & Richard had been doing last months, I
didn't touch it for a while. But now this work can be continued.

Current patch is rebased on current master. Because of the nullable_rels
logic, introduced recently, ojrelids were highly spreaded across planner
bitmapsets. So, JE logic was changed.

But now, I'm less happy with the code. It seems we need to refactor it:
1. According to reports of some performance engineers, the feature can
cause overhead ~0.5% on trivial queries without joins at all. We should
discover the patch and find the way for quick and cheap return, if the
statement contains no one join or, maybe stronger, no one self join.
2. During join elimination we replace clauses like 'x=x' with 'x IS NOT
NULL'. It is a weak point because we change clause semantic
(mergejoinable to non-mergejoinable, in this example) and could forget
consistently change some RestrictInfo fields.
3. In the previous versions we changed the remove_rel_from_query routine
trying to use it in both 'Useless outer join' and 'Self join'
elimination optimizations. Now, because of the 'ojrelid' field it looks
too complicated. Do we need to split this routine again?

--
Regards
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v40-0001-Remove-self-joins.patch	text/x-patch	93.8 KB

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-06-30 10:50:19
Message-ID:	c97574c4-13e6-9107-26ab-af2bd626fc85@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 4/4/2023 02:30, Gregory Stark (as CFM) wrote:
> On Mon, 6 Mar 2023 at 00:30, Michał Kłeczek <michal(at)kleczek(dot)org> wrote:
>>
>> Hi All,
>>
>> I just wanted to ask about the status and plans for this patch.
>> I can see it being stuck at “Waiting for Author” status in several commit tests.
>
> Sadly it seems to now be badly in need of a rebase. There are large
> hunks failing in the guts of analyzejoins.c as well as minor failures
> elsewhere and lots of offsets which need to be reviewed.
>
> I think given the lack of activity it's out of time for this release
> at this point. I'm moving it ahead to the next CF.
Hi,

Version 41 is heavily remade of the feature:

1. In previous versions, I tried to reuse remove_rel_from_query() for
both left and self-join removal. But for now, I realized that it is a
bit different procedures which treat different operations. In this
patch, only common stages of the PlannerInfo fixing process are united
in one function.
2. Transferring clauses from the removing table to keeping one is more
transparent now and contains comments.
3. Equivalence classes update procedure was changed according to David's
commit 3373c71. As I see, Tom has added remove_rel_from_eclass since the
last v.40 version, and it looks pretty similar to the update_eclass
routine in this patch.

It passes regression tests, but some questions are still open:
1. Should we look for duplicated or redundant clauses (the same for
eclasses) during the clause transfer procedure? On the one side, we
reduce the length of restrict lists that can impact planning or
executing time. Additionally, we improve the accuracy of cardinality
estimation. On the other side, it is one more place that can make
planning time much longer in specific cases. It would have been better
to avoid calling the equal() function here, but it's the only way to
detect duplicated inequality expressions.
2. Could we reuse ChangeVarNodes instead of sje_walker(), merge
remove_rel_from_restrictinfo with replace_varno?
3. Also, I still don't finish with the split_selfjoin_quals: some
improvements could be made.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v41-0001-Remove-self-joins.patch	text/plain	92.7 KB

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-07-05 14:28:59
Message-ID:	bcf2f3c5-fbdc-f85b-af3e-ebec616160f6@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

During the significant code revision in v.41 I lost some replacement
operations. Here is the fix and extra tests to check this in the future.
Also, Tom added the JoinDomain structure five months ago, and I added
code to replace relids for that place too.
One more thing, I found out that we didn't replace SJs, defined by
baserestrictinfos if no one self-join clause have existed for the join.
Now, it is fixed, and the test has been added.
To understand changes readily, see the delta file in the attachment.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v41-v42-delta.txt	text/plain	6.5 KB
v42-0001-Remove-self-joins.patch	text/plain	96.6 KB

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-09-12 11:49:05
Message-ID:	05469e80-454b-41d7-d04f-421f42bbd4a9@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 5/7/2023 21:28, Andrey Lepikhov wrote:
> Hi,
>
> During the significant code revision in v.41 I lost some replacement
> operations. Here is the fix and extra tests to check this in the future.
> Also, Tom added the JoinDomain structure five months ago, and I added
> code to replace relids for that place too.
> One more thing, I found out that we didn't replace SJs, defined by
> baserestrictinfos if no one self-join clause have existed for the join.
> Now, it is fixed, and the test has been added.
> To understand changes readily, see the delta file in the attachment.
Here is new patch in attachment. Rebased on current master and some
minor gaffes are fixed.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
v43-0001-Remove-self-joins.patch	text/plain	96.6 KB

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-04 00:12:02
Message-ID:	CAPpHfdt9G3EU+vXuqexhweEQoMF-acPUUFuB6hV2Kg0gCy7AjA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg스포츠 토토 사이트SQL

Hi!

I think this is a neat optimization.

On Tue, Sep 12, 2023 at 4:58 PM Andrey Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> On 5/7/2023 21:28, Andrey Lepikhov wrote:
> > During the significant code revision in v.41 I lost some replacement
> > operations. Here is the fix and extra tests to check this in the future.
> > Also, Tom added the JoinDomain structure five months ago, and I added
> > code to replace relids for that place too.
> > One more thing, I found out that we didn't replace SJs, defined by
> > baserestrictinfos if no one self-join clause have existed for the join.
> > Now, it is fixed, and the test has been added.
> > To understand changes readily, see the delta file in the attachment.
> Here is new patch in attachment. Rebased on current master and some
> minor gaffes are fixed.

I went through the thread and I think the patch gets better shape. A
couple of notes from my side.
1) Why replace_relid() makes a copy of lids only on insert/replace of
a member, but performs deletion in-place?
2) It would be nice to skip the insertion of IS NOT NULL checks when
they are not necessary. [1] points that infrastructure from [2] might
be useful. The patchset from [2] seems committed mow. However, I
can't see it is directly helpful in this matter. Could we just skip
adding IS NOT NULL clause for the columns, that have
pg_attribute.attnotnull set?

Links
1. /message-id/2375492.jE0xQCEvom%40aivenronan
2. /message-id/flat/830269.1656693747%40sss.pgh.pa.us

------
Regards,
Alexander Korotkov

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-04 06:55:59
Message-ID:	ebe66358-428c-439f-85dd-da72c9b2b0c2@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 4/10/2023 07:12, Alexander Korotkov wrote:
> Hi!
Thanks for the review!
>
> I think this is a neat optimization.
>
> On Tue, Sep 12, 2023 at 4:58 PM Andrey Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> On 5/7/2023 21:28, Andrey Lepikhov wrote:
>>> During the significant code revision in v.41 I lost some replacement
>>> operations. Here is the fix and extra tests to check this in the future.
>>> Also, Tom added the JoinDomain structure five months ago, and I added
>>> code to replace relids for that place too.
>>> One more thing, I found out that we didn't replace SJs, defined by
>>> baserestrictinfos if no one self-join clause have existed for the join.
>>> Now, it is fixed, and the test has been added.
>>> To understand changes readily, see the delta file in the attachment.
>> Here is new patch in attachment. Rebased on current master and some
>> minor gaffes are fixed.
>
> I went through the thread and I think the patch gets better shape. A
> couple of notes from my side.
> 1) Why replace_relid() makes a copy of lids only on insert/replace of
> a member, but performs deletion in-place?

Shortly speaking, it was done according to the 'Paranoid' strategy.
The main reason for copying before deletion was the case with the rinfo
required_relids and clause_relids. They both point to the same Bitmapset
in some cases. And we feared such things for other fields.
Right now, it may be redundant because we resolved the issue mentioned
above in replace_varno_walker.

Relid replacement machinery is the most contradictory code here. We used
a utilitarian approach and implemented a simplistic variant.

> 2) It would be nice to skip the insertion of IS NOT NULL checks when
> they are not necessary. [1] points that infrastructure from [2] might
> be useful. The patchset from [2] seems committed mow. However, I
> can't see it is directly helpful in this matter. Could we just skip
> adding IS NOT NULL clause for the columns, that have
> pg_attribute.attnotnull set?
Thanks for the links, I will look into that case.
>
> Links
> 1. /message-id/2375492.jE0xQCEvom%40aivenronan
> 2. /message-id/flat/830269.1656693747%40sss.pgh.pa.us

--
regards,
Andrey Lepikhov
Postgres Professional

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-04 07:34:15
Message-ID:	CAPpHfdt-0kVV7O==aJEbjY2iGYBu+XBzTHEbPv_6sVNeC7fffQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 결과SQL

Hi!

On Wed, Oct 4, 2023 at 9:56 AM Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
wrote:
> On 4/10/2023 07:12, Alexander Korotkov wrote:
> > On Tue, Sep 12, 2023 at 4:58 PM Andrey Lepikhov
> > <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> >> On 5/7/2023 21:28, Andrey Lepikhov wrote:
> >>> During the significant code revision in v.41 I lost some replacement
> >>> operations. Here is the fix and extra tests to check this in the
future.
> >>> Also, Tom added the JoinDomain structure five months ago, and I added
> >>> code to replace relids for that place too.
> >>> One more thing, I found out that we didn't replace SJs, defined by
> >>> baserestrictinfos if no one self-join clause have existed for the
join.
> >>> Now, it is fixed, and the test has been added.
> >>> To understand changes readily, see the delta file in the attachment.
> >> Here is new patch in attachment. Rebased on current master and some
> >> minor gaffes are fixed.
> >
> > I went through the thread and I think the patch gets better shape. A
> > couple of notes from my side.
> > 1) Why replace_relid() makes a copy of lids only on insert/replace of
> > a member, but performs deletion in-place?
>
> Shortly speaking, it was done according to the 'Paranoid' strategy.
> The main reason for copying before deletion was the case with the rinfo
> required_relids and clause_relids. They both point to the same Bitmapset
> in some cases. And we feared such things for other fields.
> Right now, it may be redundant because we resolved the issue mentioned
> above in replace_varno_walker.

OK, but my point is still that you should be paranoid in all the cases or
none of the cases. Right now (newId < 0) branch doesn't copy source
relids, but bms_is_member(oldId, relids) does copy. Also, I think whether
we copy or not should be reflected in the function comment.

/*
* Substitute newId by oldId in relids.
*/
static Bitmapset *
replace_relid(Relids relids, int oldId, int newId)
{
if (oldId < 0)
return relids;

if (newId < 0)
/* Delete relid without substitution. */
return bms_del_member(relids, oldId);

if (bms_is_member(oldId, relids))
return bms_add_member(bms_del_member(bms_copy(relids), oldId),
newId);

return relids;
}

> Relid replacement machinery is the most contradictory code here. We used
> a utilitarian approach and implemented a simplistic variant.

> > 2) It would be nice to skip the insertion of IS NOT NULL checks when
> > they are not necessary. [1] points that infrastructure from [2] might
> > be useful. The patchset from [2] seems committed mow. However, I
> > can't see it is directly helpful in this matter. Could we just skip
> > adding IS NOT NULL clause for the columns, that have
> > pg_attribute.attnotnull set?
> Thanks for the links, I will look into that case.

OK, thank you.

------
Regards,
Alexander Korotkov

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-05 07:20:21
Message-ID:	5ff80147-18a0-4295-b508-638e90a4df71@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg롤 토토SQL :

On 4/10/2023 14:34, Alexander Korotkov wrote:
> Hi!
>
> On Wed, Oct 4, 2023 at 9:56 AM Andrei Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru <mailto:a(dot)lepikhov(at)postgrespro(dot)ru>> wrote:
> > On 4/10/2023 07:12, Alexander Korotkov wrote:
> > > On Tue, Sep 12, 2023 at 4:58 PM Andrey Lepikhov
> > > <a(dot)lepikhov(at)postgrespro(dot)ru <mailto:a(dot)lepikhov(at)postgrespro(dot)ru>> wrote:
> > >> On 5/7/2023 21:28, Andrey Lepikhov wrote:
> > >>> During the significant code revision in v.41 I lost some replacement
> > >>> operations. Here is the fix and extra tests to check this in the
> future.
> > >>> Also, Tom added the JoinDomain structure five months ago, and I added
> > >>> code to replace relids for that place too.
> > >>> One more thing, I found out that we didn't replace SJs, defined by
> > >>> baserestrictinfos if no one self-join clause have existed for the
> join.
> > >>> Now, it is fixed, and the test has been added.
> > >>> To understand changes readily, see the delta file in the attachment.
> > >> Here is new patch in attachment. Rebased on current master and some
> > >> minor gaffes are fixed.
> > >
> > > I went through the thread and I think the patch gets better shape. A
> > > couple of notes from my side.
> > > 1) Why replace_relid() makes a copy of lids only on insert/replace of
> > > a member, but performs deletion in-place?
> >
> > Shortly speaking, it was done according to the 'Paranoid' strategy.
> > The main reason for copying before deletion was the case with the rinfo
> > required_relids and clause_relids. They both point to the same Bitmapset
> > in some cases. And we feared such things for other fields.
> > Right now, it may be redundant because we resolved the issue mentioned
> > above in replace_varno_walker.
>
> OK, but my point is still that you should be paranoid in all the cases
> or none of the cases. Right now (newId < 0) branch doesn't copy source
> relids, but bms_is_member(oldId, relids) does copy. Also, I think
> whether we copy or not should be reflected in the function comment.
>
> /*
> * Substitute newId by oldId in relids.
> */
> static Bitmapset *
> replace_relid(Relids relids, int oldId, int newId)
> {
> if (oldId < 0)
> return relids;
>
> if (newId < 0)
> /* Delete relid without substitution. */
> return bms_del_member(relids, oldId);
>
> if (bms_is_member(oldId, relids))
> return bms_add_member(bms_del_member(bms_copy(relids), oldId),
> newId);
>
> return relids;
> }

We tried to use replace_relid() for both cases of JOIN deletion:
unneeded outer join and self-join, and the relid deletion is used only
in the first case. Such an approach was used there for a long time, and
we just didn't change it.
I am prone to removing the copy operation in the code of relid replacement.

>
> > Relid replacement machinery is the most contradictory code here. We used
> > a utilitarian approach and implemented a simplistic variant.
>
> > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
> > > they are not necessary. [1] points that infrastructure from [2] might
> > > be useful. The patchset from [2] seems committed mow. However, I
> > > can't see it is directly helpful in this matter. Could we just skip
> > > adding IS NOT NULL clause for the columns, that have
> > > pg_attribute.attnotnull set?
> > Thanks for the links, I will look into that case.
Thanks for the curious issue.
The new field Var::varnullingrels introduced in [2] doesn't make sense
here, as I see: we operate with plain relations only, and I don't know
how it can be applied to an arbitrary subtree contained OUTER JOINs.
The second option, the attnotnull flag, can be used in this code. We
haven't implemented it because the process_equivalence routine doesn't
check the attnotnull before creating NullTest.
In general, it is not a difficult operation - we just need to add a
trivial get_attnotnull() routine to lssycache.c likewise
get_attgenerated() and other functions.
But, replace_varno uses the walker to change the relid. The mentioned
replacement, like
X=X --> X IS NOT NULL
can be applied on different levels of the expression, look:
A a1 JOIN A a2 ON (a1.id=a2.id) WHERE (a1.x AND (a1.y=a2.y))
Here, we can replace id=id and y=y. It may need some 'unwanted clauses'
collection procedure and a second pass through the expression tree to
remove them. It may add some unpredictable overhead.
We can replace such a clause with a trivial 'TRUE' clause, of course.
But is it the feature you have requested?

--
regards,
Andrey Lepikhov
Postgres Professional

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-05 09:17:08
Message-ID:	48c11b23-9c23-440e-9a3c-020e05196449@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 4/10/2023 14:34, Alexander Korotkov wrote:
> > Relid replacement machinery is the most contradictory code here. We used
> > a utilitarian approach and implemented a simplistic variant.
>
> > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
> > > they are not necessary. [1] points that infrastructure from [2] might
> > > be useful. The patchset from [2] seems committed mow. However, I
> > > can't see it is directly helpful in this matter. Could we just skip
> > > adding IS NOT NULL clause for the columns, that have
> > > pg_attribute.attnotnull set?
> > Thanks for the links, I will look into that case.
To be more precise, in the attachment, you can find a diff to the main
patch, which shows the volume of changes to achieve the desired behaviour.
Some explains in regression tests shifted. So, I've made additional tests:

DROP TABLE test CASCADE;
CREATE TABLE test (a int, b int not null);
CREATE UNIQUE INDEX abc ON test(b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
CREATE UNIQUE INDEX abc1 ON test(a,b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
DROP INDEX abc1;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);

We have almost the results we wanted to have. But in the last explain
you can see that nothing happened with the OR clause. We should use the
expression mutator instead of walker to handle such clauses. But It
doesn't process the RestrictInfo node ... I'm inclined to put a solution
of this issue off for a while.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
check_attnotnull.diff	text/plain	14.2 KB

From:	Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
To:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-10 19:29:58
Message-ID:	96f66ae3-df10-4060-9844-4c9633062cd3@yandex.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 꽁 머니SQL

Hi!

I have reviewed your patch and I noticed a few things.

First of all, I think I found a bug in your latest patch version, and
this query shows it:

EXPLAIN (COSTS OFF)
SELECT c.oid, e.oid FROM pg_class c FULL JOIN (
SELECT e1.oid FROM pg_extension e1, pg_extension e2
WHERE e1.oid=e2.oid) AS e
ON c.oid=e.oid;

In the current version we get such a query plan:

               QUERY PLAN
-----------------------------------------
Hash Full Join
   Hash Cond: (c.oid = e2.oid)
   -> Seq Scan on pg_class c
   -> Hash
         -> Seq Scan on pg_extension e2
(5 rows)

But I think it should be:

QUERY PLAN
-----------------------------------------
Hash Full Join
Hash Cond: (c.oid = e2.oid)
-> Seq Scan on pg_class c
-> Hash
-> Seq Scan on pg_extension e2
*Filter: (oid IS NOT NULL)*
(6 rows)

I have looked at the latest version of the code, I assume that the error
lies in the replace_varno_walker function, especially in the place where
we check the node by type Var, and does not form any NullTest node.

if (OidIsValid(reloid) && get_attnotnull(reloid, attnum)) -- this
condition works
        {
          rinfo->clause = (Expr *) makeBoolConst(true, false);
        }
        else
        {
          NullTest   *ntest = makeNode(NullTest);

          ntest->arg = leftOp;
          ntest->nulltesttype = IS_NOT_NULL;
          ntest->argisrow = false;
          ntest->location = -1;
          rinfo->clause = (Expr *) ntest;
        }

Secondly, I added some code in some places to catch erroneous cases and
added a condition when we should not try to apply the self-join-removal
transformation due to the absence of an empty self-join list after
searching for it and in general if there are no joins in the query.
Besides, I added a query for testing and wrote about it above. I have
attached my diff file.

In addition, I found a comment for myself that was not clear to me. I
would be glad if you could explain it to me.

You mentioned superior outer join in the comment, unfortunately, I
didn't find anything about it in the PostgreSQL code, and this
explanation remained unclear to me. Could you explain in more detail
what you meant?

/*
* At this stage joininfo lists of inner and outer can contain
* only clauses, required for *a superior outer join* that can't
* influence on this optimization. So, we can avoid to call the
* build_joinrel_restrictlist() routine.
*/
restrictlist = generate_join_implied_equalities(root, joinrelids,
inner->relids,
outer, NULL);

Regards,
Alena Rybakina

Attachment	Content-Type	Size
self-join-removal.diff	text/x-patch	5.2 KB

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-11 03:51:26
Message-ID:	b2f522c6-c150-4f1f-a722-f70e9a75fe13@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 사이트SQL

On 11/10/2023 02:29, Alena Rybakina wrote:
> I have reviewed your patch and I noticed a few things.

Thanks for your efforts,

> I have looked at the latest version of the code, I assume that the error
> lies in the replace_varno_walker function, especially in the place where
> we check the node by type Var, and does not form any NullTest node.

It's not a bug, it's an optimization we discussed with Alexander above.

> Secondly, I added some code in some places to catch erroneous cases and
> added a condition when we should not try to apply the self-join-removal
> transformation due to the absence of an empty self-join list after
> searching for it and in general if there are no joins in the query.
> Besides, I added a query for testing and wrote about it above. I have
> attached my diff file.
Ok, I will look at this
> In addition, I found a comment for myself that was not clear to me. I
> would be glad if you could explain it to me.
>
> You mentioned superior outer join in the comment, unfortunately, I
> didn't find anything about it in the PostgreSQL code, and this
> explanation remained unclear to me. Could you explain in more detail
> what you meant?
I meant here that only clauses pushed by reconsider_outer_join_clauses()
can be found in the joininfo list, and they are not relevant, as you can
understand.
Having written that, I realized that it was a false statement. ;) -
joininfo can also contain results of previous SJE iterations, look:

CREATE TABLE test (oid int PRIMARY KEY);
CREATE UNIQUE INDEX ON test((oid*oid));
explain
SELECT count(*)
FROM test c1, test c2, test c3
WHERE c1.oid=c2.oid AND c1.oid*c2.oid=c3.oid*c3.oid;
explain
SELECT count(*)
FROM test c1, test c2, test c3
WHERE c1.oid=c3.oid AND c1.oid*c3.oid=c2.oid*c2.oid;
explain
SELECT count(*)
FROM test c1, test c2, test c3
WHERE c3.oid=c2.oid AND c3.oid*c2.oid=c1.oid*c1.oid;

Having executed this SQL code, you could see that in the last query, the
SJE feature didn't delete one of the JOINs because of the reason I had
written above.
It's not an one-minute fix - I will try to propose solution a bit later.

--
regards,
Andrey Lepikhov
Postgres Professional

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-12 11:32:27
Message-ID:	CAPpHfdsQkY0Gu7bQ9Ekdyzo_5Xxz74TYKbwmk1on1WAASpxXcQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> On 4/10/2023 14:34, Alexander Korotkov wrote:
> > > Relid replacement machinery is the most contradictory code here. We used
> > > a utilitarian approach and implemented a simplistic variant.
> >
> > > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
> > > > they are not necessary. [1] points that infrastructure from [2] might
> > > > be useful. The patchset from [2] seems committed mow. However, I
> > > > can't see it is directly helpful in this matter. Could we just skip
> > > > adding IS NOT NULL clause for the columns, that have
> > > > pg_attribute.attnotnull set?
> > > Thanks for the links, I will look into that case.
> To be more precise, in the attachment, you can find a diff to the main
> patch, which shows the volume of changes to achieve the desired behaviour.
> Some explains in regression tests shifted. So, I've made additional tests:
>
> DROP TABLE test CASCADE;
> CREATE TABLE test (a int, b int not null);
> CREATE UNIQUE INDEX abc ON test(b);
> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> WHERE t1.b=t2.b;
> CREATE UNIQUE INDEX abc1 ON test(a,b);
> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> WHERE t1.b=t2.b;
> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
> DROP INDEX abc1;
> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);
>
> We have almost the results we wanted to have. But in the last explain
> you can see that nothing happened with the OR clause. We should use the
> expression mutator instead of walker to handle such clauses. But It
> doesn't process the RestrictInfo node ... I'm inclined to put a solution
> of this issue off for a while.

OK. I think it doesn't worth to eliminate IS NULL quals with this
complexity (at least at this stage of work).

I made improvements over the code. Mostly new comments, grammar
corrections of existing comments and small refactoring.

Also, I found that the suggestion from David Rowley [1] to qsort
array of relations to faster find duplicates is still unaddressed.
I've implemented it. That helps to evade quadratic complexity with
large number of relations.

Also I've incorporated improvements from Alena Rybakina except one for
skipping SJ removal when no SJ quals is found. It's not yet clear for
me if this check fix some cases. But at least optimization got skipped
in some useful cases (as you can see in regression tests).

Links
1. /message-id/CAKJS1f8ySSsBfooH3bJK7OD3LBEbDb99d8J_FtqDd6w50p-eAQ%40mail.gmail.com
2. /message-id/96f66ae3-df10-4060-9844-4c9633062cd3%40yandex.ru

------
Regards,
Alexander Korotkov

Attachment	Content-Type	Size
0001-Remove-useless-self-joins-v44.patch	application/octet-stream	100.5 KB

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-13 02:55:13
Message-ID:	9eb86ca9-da3c-417f-8ce1-29c0ef5a4f50@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 12/10/2023 18:32, Alexander Korotkov wrote:
> On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
>> We have almost the results we wanted to have. But in the last explain
>> you can see that nothing happened with the OR clause. We should use the
>> expression mutator instead of walker to handle such clauses. But It
>> doesn't process the RestrictInfo node ... I'm inclined to put a solution
>> of this issue off for a while.
>
> OK. I think it doesn't worth to eliminate IS NULL quals with this
> complexity (at least at this stage of work).

Yeah. I think It would be meaningful in the case of replacing also
nested x IS NOT NULL with nothing. But it requires using a mutator
instead of the walker and may be done more accurately next time.

> I made improvements over the code. Mostly new comments, grammar
> corrections of existing comments and small refactoring.

Great!

> Also, I found that the suggestion from David Rowley [1] to qsort
> array of relations to faster find duplicates is still unaddressed.
> I've implemented it. That helps to evade quadratic complexity with
> large number of relations.

I see. The thread is too long so far, thanks for the catch.

> Also I've incorporated improvements from Alena Rybakina except one for
> skipping SJ removal when no SJ quals is found. It's not yet clear for
> me if this check fix some cases. But at least optimization got skipped
> in some useful cases (as you can see in regression tests).

Agree. I wouldn't say I like it too. But also, I suggest skipping some
unnecessary assertions proposed in that patch:
Assert(toKeep->relid != -1); - quite strange. Why -1? Why not all the
negative numbers, at least?
Assert(is_opclause(orinfo->clause)); - above we skip clauses with
rinfo->mergeopfamilies == NIL. Each mergejoinable clause is already
checked as is_opclause.
All these changes (see in the attachment) are optional.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
minor-changes-v44.diff	text/plain	1.7 KB

From:	"a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
To:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-13 08:56:37
Message-ID:	29ddec15-a83f-4f24-b069-6e93c189b169@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>> Also I've incorporated improvements from Alena Rybakina except one for
>> skipping SJ removal when no SJ quals is found. It's not yet clear for
>> me if this check fix some cases. But at least optimization got skipped
>> in some useful cases (as you can see in regression tests).
>
> Agree. I wouldn't say I like it too. But also, I suggest skipping some
> unnecessary assertions proposed in that patch:
> Assert(toKeep->relid != -1); - quite strange. Why -1? Why not all the
> negative numbers, at least?
> Assert(is_opclause(orinfo->clause)); - above we skip clauses with
> rinfo->mergeopfamilies == NIL. Each mergejoinable clause is already
> checked as is_opclause.
> All these changes (see in the attachment) are optional.
>
I don't mind about asserts, maybe I misunderstood something in the patch.

About skipping SJ removal when no SJ quals is found, I assume it is
about it:

split_selfjoin_quals(root, restrictlist, &selfjoinquals,
&otherjoinquals, inner->relid,
outer->relid);

+            if (list_length(selfjoinquals) == 0)
+             {
+                 /*
+                 * XXX:
+                 * we would detect self-join without quals like 'x==x'
if we had
+                 * an foreign key constraint on some of other quals
and this join
+                 * haven't any columns from the outer in the target list.
+                 * But it is still complex task.
+                 */
+                 continue;
+             }

as far as I remember, this is the place where it is checked that the SJ
list is empty and it is logical, in my opinion, that no transformations
should be performed if no elements are found for them.

As for the cases where SJ did not work, maybe this is just right if
there are no elements for processing these cases. I'll try to check or
come up with tests for them. If I'm wrong, write.

On 11.10.2023 06:51, Andrei Lepikhov wrote:
> On 11/10/2023 02:29, Alena Rybakina wrote:
>> I have reviewed your patch and I noticed a few things.
>
> Thanks for your efforts,
>
>> I have looked at the latest version of the code, I assume that the
>> error lies in the replace_varno_walker function, especially in the
>> place where we check the node by type Var, and does not form any
>> NullTest node.
>
> It's not a bug, it's an optimization we discussed with Alexander above.
>
>> Secondly, I added some code in some places to catch erroneous cases
>> and added a condition when we should not try to apply the
>> self-join-removal transformation due to the absence of an empty
>> self-join list after searching for it and in general if there are no
>> joins in the query. Besides, I added a query for testing and wrote
>> about it above. I have attached my diff file.
> Ok, I will look at this
>> In addition, I found a comment for myself that was not clear to me. I
>> would be glad if you could explain it to me.
>>
>> You mentioned superior outer join in the comment, unfortunately, I
>> didn't find anything about it in the PostgreSQL code, and this
>> explanation remained unclear to me. Could you explain in more detail
>> what you meant?
> I meant here that only clauses pushed by
> reconsider_outer_join_clauses() can be found in the joininfo list, and
> they are not relevant, as you can understand.
> Having written that, I realized that it was a false statement. ;) -
> joininfo can also contain results of previous SJE iterations, look:
>
> CREATE TABLE test (oid int PRIMARY KEY);
> CREATE UNIQUE INDEX ON test((oid*oid));
> explain
> SELECT count(*)
> FROM test c1, test c2, test c3
> WHERE c1.oid=c2.oid AND c1.oid*c2.oid=c3.oid*c3.oid;
> explain
> SELECT count(*)
> FROM test c1, test c2, test c3
> WHERE c1.oid=c3.oid AND c1.oid*c3.oid=c2.oid*c2.oid;
> explain
> SELECT count(*)
> FROM test c1, test c2, test c3
> WHERE c3.oid=c2.oid AND c3.oid*c2.oid=c1.oid*c1.oid;

Ok, I understood. Thank you for explanation.

--
Regards,
Alena Rybakina

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	"a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-13 09:03:50
Message-ID:	f052df9a-c714-4c60-92ba-9b5ae7e258b0@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg범퍼카 토토SQL

On 13/10/2023 15:56, a.rybakina wrote:
>
>>> Also I've incorporated improvements from Alena Rybakina except one for
>>> skipping SJ removal when no SJ quals is found. It's not yet clear for
>>> me if this check fix some cases. But at least optimization got skipped
>>> in some useful cases (as you can see in regression tests).
>>
>> Agree. I wouldn't say I like it too. But also, I suggest skipping some
>> unnecessary assertions proposed in that patch:
>> Assert(toKeep->relid != -1); - quite strange. Why -1? Why not all the
>> negative numbers, at least?
>> Assert(is_opclause(orinfo->clause)); - above we skip clauses with
>> rinfo->mergeopfamilies == NIL. Each mergejoinable clause is already
>> checked as is_opclause.
>> All these changes (see in the attachment) are optional.
>>
> I don't mind about asserts, maybe I misunderstood something in the patch.
>
> About skipping SJ removal when no SJ quals is found, I assume it is
> about it:
>
> split_selfjoin_quals(root, restrictlist, &selfjoinquals,
>                                  &otherjoinquals, inner->relid,
> outer->relid);
>
> +            if (list_length(selfjoinquals) == 0)
> +             {
> +                 /*
> +                 * XXX:
> +                 * we would detect self-join without quals like 'x==x'
> if we had
> +                 * an foreign key constraint on some of other quals
> and this join
> +                 * haven't any columns from the outer in the target list.
> +                 * But it is still complex task.
> +                 */
> +                 continue;
> +             }
>
> as far as I remember, this is the place where it is checked that the SJ
> list is empty and it is logical, in my opinion, that no transformations
> should be performed if no elements are found for them.
You forget we have "Degenerate" case, as Alexander mentioned above. What
if you have something like that:
SELECT ... FROM A a1, A a2 WHERE a1.id=1 AND a2.id=1;
In this case, uniqueness can be achieved by the baserestrictinfo
"A.id=1", if we have an unique index on this column.

--
regards,
Andrey Lepikhov
Postgres Professional

From:	"a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
To:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-13 09:42:35
Message-ID:	69a74357-102c-4591-9619-f672076135db@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 13.10.2023 12:03, Andrei Lepikhov wrote:
> On 13/10/2023 15:56, a.rybakina wrote:
>>
>>>> Also I've incorporated improvements from Alena Rybakina except one for
>>>> skipping SJ removal when no SJ quals is found. It's not yet clear for
>>>> me if this check fix some cases. But at least optimization got skipped
>>>> in some useful cases (as you can see in regression tests).
>>>
>>> Agree. I wouldn't say I like it too. But also, I suggest skipping
>>> some unnecessary assertions proposed in that patch:
>>> Assert(toKeep->relid != -1); - quite strange. Why -1? Why not all
>>> the negative numbers, at least?
>>> Assert(is_opclause(orinfo->clause)); - above we skip clauses with
>>> rinfo->mergeopfamilies == NIL. Each mergejoinable clause is already
>>> checked as is_opclause.
>>> All these changes (see in the attachment) are optional.
>>>
>> I don't mind about asserts, maybe I misunderstood something in the
>> patch.
>>
>> About skipping SJ removal when no SJ quals is found, I assume it is
>> about it:
>>
>> split_selfjoin_quals(root, restrictlist, &selfjoinquals,
>>                                   &otherjoinquals, inner->relid,
>> outer->relid);
>>
>> +            if (list_length(selfjoinquals) == 0)
>> +             {
>> +                 /*
>> +                 * XXX:
>> +                 * we would detect self-join without quals like
>> 'x==x' if we had
>> +                 * an foreign key constraint on some of other quals
>> and this join
>> +                 * haven't any columns from the outer in the target
>> list.
>> +                 * But it is still complex task.
>> +                 */
>> +                 continue;
>> +             }
>>
>> as far as I remember, this is the place where it is checked that the
>> SJ list is empty and it is logical, in my opinion, that no
>> transformations should be performed if no elements are found for them.
> You forget we have "Degenerate" case, as Alexander mentioned above.
> What if you have something like that:
> SELECT ... FROM A a1, A a2 WHERE a1.id=1 AND a2.id=1;
> In this case, uniqueness can be achieved by the baserestrictinfo
> "A.id=1", if we have an unique index on this column.
>
Yes, sorry, I missed it. thanks again for the explanation 🙂

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-16 08:28:13
Message-ID:	8800ccf2-ae82-461c-92a1-d4c0a22f8cf2@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 12/10/2023 18:32, Alexander Korotkov wrote:
> On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> On 4/10/2023 14:34, Alexander Korotkov wrote:
>>> > Relid replacement machinery is the most contradictory code here. We used
>>> > a utilitarian approach and implemented a simplistic variant.
>>>
>>> > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
>>> > > they are not necessary. [1] points that infrastructure from [2] might
>>> > > be useful. The patchset from [2] seems committed mow. However, I
>>> > > can't see it is directly helpful in this matter. Could we just skip
>>> > > adding IS NOT NULL clause for the columns, that have
>>> > > pg_attribute.attnotnull set?
>>> > Thanks for the links, I will look into that case.
>> To be more precise, in the attachment, you can find a diff to the main
>> patch, which shows the volume of changes to achieve the desired behaviour.
>> Some explains in regression tests shifted. So, I've made additional tests:
>>
>> DROP TABLE test CASCADE;
>> CREATE TABLE test (a int, b int not null);
>> CREATE UNIQUE INDEX abc ON test(b);
>> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
>> WHERE t1.b=t2.b;
>> CREATE UNIQUE INDEX abc1 ON test(a,b);
>> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
>> WHERE t1.b=t2.b;
>> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
>> WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
>> DROP INDEX abc1;
>> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
>> WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);
>>
>> We have almost the results we wanted to have. But in the last explain
>> you can see that nothing happened with the OR clause. We should use the
>> expression mutator instead of walker to handle such clauses. But It
>> doesn't process the RestrictInfo node ... I'm inclined to put a solution
>> of this issue off for a while.
>
> OK. I think it doesn't worth to eliminate IS NULL quals with this
> complexity (at least at this stage of work).
>
> I made improvements over the code. Mostly new comments, grammar
> corrections of existing comments and small refactoring.
>
> Also, I found that the suggestion from David Rowley [1] to qsort
> array of relations to faster find duplicates is still unaddressed.
> I've implemented it. That helps to evade quadratic complexity with
> large number of relations.
>
> Also I've incorporated improvements from Alena Rybakina except one for
> skipping SJ removal when no SJ quals is found. It's not yet clear for
> me if this check fix some cases. But at least optimization got skipped
> in some useful cases (as you can see in regression tests).

I would like to propose one more minor improvement (see in attachment).
The idea here is that after removing a self-join and changing clauses we
should re-probe the set of relids with the same Oid, because we can find
more removable self-joins (see the demo test in join.sql).

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
iterative_elimination.diff	text/plain	4.4 KB

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-18 18:50:37
Message-ID:	CAPpHfdvJ3jBQaJQKrf69K++kXi+Tvzn6TXO-DKXj0KVJjvMebg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Oct 16, 2023 at 11:28 AM Andrei Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> On 12/10/2023 18:32, Alexander Korotkov wrote:
> > On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
> > <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> >> On 4/10/2023 14:34, Alexander Korotkov wrote:
> >>> > Relid replacement machinery is the most contradictory code here. We used
> >>> > a utilitarian approach and implemented a simplistic variant.
> >>>
> >>> > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
> >>> > > they are not necessary. [1] points that infrastructure from [2] might
> >>> > > be useful. The patchset from [2] seems committed mow. However, I
> >>> > > can't see it is directly helpful in this matter. Could we just skip
> >>> > > adding IS NOT NULL clause for the columns, that have
> >>> > > pg_attribute.attnotnull set?
> >>> > Thanks for the links, I will look into that case.
> >> To be more precise, in the attachment, you can find a diff to the main
> >> patch, which shows the volume of changes to achieve the desired behaviour.
> >> Some explains in regression tests shifted. So, I've made additional tests:
> >>
> >> DROP TABLE test CASCADE;
> >> CREATE TABLE test (a int, b int not null);
> >> CREATE UNIQUE INDEX abc ON test(b);
> >> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> >> WHERE t1.b=t2.b;
> >> CREATE UNIQUE INDEX abc1 ON test(a,b);
> >> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> >> WHERE t1.b=t2.b;
> >> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> >> WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
> >> DROP INDEX abc1;
> >> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> >> WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);
> >>
> >> We have almost the results we wanted to have. But in the last explain
> >> you can see that nothing happened with the OR clause. We should use the
> >> expression mutator instead of walker to handle such clauses. But It
> >> doesn't process the RestrictInfo node ... I'm inclined to put a solution
> >> of this issue off for a while.
> >
> > OK. I think it doesn't worth to eliminate IS NULL quals with this
> > complexity (at least at this stage of work).
> >
> > I made improvements over the code. Mostly new comments, grammar
> > corrections of existing comments and small refactoring.
> >
> > Also, I found that the suggestion from David Rowley [1] to qsort
> > array of relations to faster find duplicates is still unaddressed.
> > I've implemented it. That helps to evade quadratic complexity with
> > large number of relations.
> >
> > Also I've incorporated improvements from Alena Rybakina except one for
> > skipping SJ removal when no SJ quals is found. It's not yet clear for
> > me if this check fix some cases. But at least optimization got skipped
> > in some useful cases (as you can see in regression tests).
>
> I would like to propose one more minor improvement (see in attachment).
> The idea here is that after removing a self-join and changing clauses we
> should re-probe the set of relids with the same Oid, because we can find
> more removable self-joins (see the demo test in join.sql).

Thank you, I've integrated this into the patch. BTW, the patch
introduces two new GUC variables: enable_self_join_removal,
self_join_search_limit. enable_self_join_removal variable turns
on/off optimization at all. self_join_search_limit variable limits
its usage by the number of joins. AFICS, self_join_search_limit is
intended to protect us from quadratic complexity self-join removal
has. I tried to reproduce the extreme case.

SELECT count(*) FROM pgbench_accounts a0, pgbench_accounts a1, ...,
pgbench_accounts a100 WHERE a0.aid = 1 AND a1.aid = a0.aid + 1 AND ...
AND a100.aid = a99.aid + 1;

This query took 3778.432 ms with self-join removal disabled, and
3756.009 ms with self-join removal enabled. So, no measurable
overhead. Similar to the higher number of joins. Can you imagine
some extreme case when self-join removal could introduce significant
overhead in comparison with other optimizer parts? If not, should we
remove self_join_search_limit GUC?

------
Regards,
Alexander Korotkov

Attachment	Content-Type	Size
0001-Remove-useless-self-joins-v45.patch	application/octet-stream	103.0 KB

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, zyu(at)yugabyte(dot)com
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-19 03:16:05
Message-ID:	1bd80083-a13e-4e9f-a7ce-73c3faab521c@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 19/10/2023 01:50, Alexander Korotkov wrote:
> On Mon, Oct 16, 2023 at 11:28 AM Andrei Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> On 12/10/2023 18:32, Alexander Korotkov wrote:
>>> On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
>>> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>>>> On 4/10/2023 14:34, Alexander Korotkov wrote:
>>>>> > Relid replacement machinery is the most contradictory code here. We used
>>>>> > a utilitarian approach and implemented a simplistic variant.
>>>>>
>>>>> > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
>>>>> > > they are not necessary. [1] points that infrastructure from [2] might
>>>>> > > be useful. The patchset from [2] seems committed mow. However, I
>>>>> > > can't see it is directly helpful in this matter. Could we just skip
>>>>> > > adding IS NOT NULL clause for the columns, that have
>>>>> > > pg_attribute.attnotnull set?
>>>>> > Thanks for the links, I will look into that case.
>>>> To be more precise, in the attachment, you can find a diff to the main
>>>> patch, which shows the volume of changes to achieve the desired behaviour.
>>>> Some explains in regression tests shifted. So, I've made additional tests:
>>>>
>>>> DROP TABLE test CASCADE;
>>>> CREATE TABLE test (a int, b int not null);
>>>> CREATE UNIQUE INDEX abc ON test(b);
>>>> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
>>>> WHERE t1.b=t2.b;
>>>> CREATE UNIQUE INDEX abc1 ON test(a,b);
>>>> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
>>>> WHERE t1.b=t2.b;
>>>> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
>>>> WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
>>>> DROP INDEX abc1;
>>>> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
>>>> WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);
>>>>
>>>> We have almost the results we wanted to have. But in the last explain
>>>> you can see that nothing happened with the OR clause. We should use the
>>>> expression mutator instead of walker to handle such clauses. But It
>>>> doesn't process the RestrictInfo node ... I'm inclined to put a solution
>>>> of this issue off for a while.
>>>
>>> OK. I think it doesn't worth to eliminate IS NULL quals with this
>>> complexity (at least at this stage of work).
>>>
>>> I made improvements over the code. Mostly new comments, grammar
>>> corrections of existing comments and small refactoring.
>>>
>>> Also, I found that the suggestion from David Rowley [1] to qsort
>>> array of relations to faster find duplicates is still unaddressed.
>>> I've implemented it. That helps to evade quadratic complexity with
>>> large number of relations.
>>>
>>> Also I've incorporated improvements from Alena Rybakina except one for
>>> skipping SJ removal when no SJ quals is found. It's not yet clear for
>>> me if this check fix some cases. But at least optimization got skipped
>>> in some useful cases (as you can see in regression tests).
>>
>> I would like to propose one more minor improvement (see in attachment).
>> The idea here is that after removing a self-join and changing clauses we
>> should re-probe the set of relids with the same Oid, because we can find
>> more removable self-joins (see the demo test in join.sql).
>
>
> Thank you, I've integrated this into the patch. BTW, the patch
> introduces two new GUC variables: enable_self_join_removal,
> self_join_search_limit. enable_self_join_removal variable turns
> on/off optimization at all. self_join_search_limit variable limits
> its usage by the number of joins. AFICS, self_join_search_limit is
> intended to protect us from quadratic complexity self-join removal
> has. I tried to reproduce the extreme case.
>
> SELECT count(*) FROM pgbench_accounts a0, pgbench_accounts a1, ...,
> pgbench_accounts a100 WHERE a0.aid = 1 AND a1.aid = a0.aid + 1 AND ...
> AND a100.aid = a99.aid + 1;
>
> This query took 3778.432 ms with self-join removal disabled, and
> 3756.009 ms with self-join removal enabled. So, no measurable
> overhead. Similar to the higher number of joins. Can you imagine
> some extreme case when self-join removal could introduce significant
> overhead in comparison with other optimizer parts? If not, should we
> remove self_join_search_limit GUC?
Thanks,
It was Zhihong Yu who worried about that case [1]. And my purpose was to
show a method to avoid such a problem if it would be needed.
I guess the main idea here is that we have a lot of self-joins, but only
few of them (or no one) can be removed.
I can't imagine a practical situation when we can be stuck in the
problems here. So, I vote to remove this GUC.

[1]
/message-id/CALNJ-vTyL-LpvSOPZxpC63Et3LJLUAFZSfRqGEhT5Rj7_EEj7w%40mail.gmail.com

--
regards,
Andrey Lepikhov
Postgres Professional

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	zyu(at)yugabyte(dot)com, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-21 22:01:49
Message-ID:	CAPpHfdsz8E085YF2VoJe2+h4YAShsh7p--t3kQfQTntiCy63Xg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 19, 2023 at 6:16 AM Andrei Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> On 19/10/2023 01:50, Alexander Korotkov wrote:
> > This query took 3778.432 ms with self-join removal disabled, and
> > 3756.009 ms with self-join removal enabled. So, no measurable
> > overhead. Similar to the higher number of joins. Can you imagine
> > some extreme case when self-join removal could introduce significant
> > overhead in comparison with other optimizer parts? If not, should we
> > remove self_join_search_limit GUC?
> Thanks,
> It was Zhihong Yu who worried about that case [1]. And my purpose was to
> show a method to avoid such a problem if it would be needed.
> I guess the main idea here is that we have a lot of self-joins, but only
> few of them (or no one) can be removed.
> I can't imagine a practical situation when we can be stuck in the
> problems here. So, I vote to remove this GUC.

I've removed the self_join_search_limit. Anyway there is
enable_self_join_removal if the self join removal algorithm causes any
problems. I also did some grammar corrections for the comments. I
think the patch is getting to the committable shape. I noticed some
failures on commitfest.cputube.org. I'd like to check how this
version will pass it.

------
Regards,
Alexander Korotkov

Attachment	Content-Type	Size
0001-Remove-useless-self-joins-v46.patch	application/octet-stream	100.1 KB

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-23 03:43:13
Message-ID:	922623af-872e-4cdc-9238-129b18bee42d@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 22/10/2023 05:01, Alexander Korotkov wrote:
> On Thu, Oct 19, 2023 at 6:16 AM Andrei Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> On 19/10/2023 01:50, Alexander Korotkov wrote:
>>> This query took 3778.432 ms with self-join removal disabled, and
>>> 3756.009 ms with self-join removal enabled. So, no measurable
>>> overhead. Similar to the higher number of joins. Can you imagine
>>> some extreme case when self-join removal could introduce significant
>>> overhead in comparison with other optimizer parts? If not, should we
>>> remove self_join_search_limit GUC?
>> Thanks,
>> It was Zhihong Yu who worried about that case [1]. And my purpose was to
>> show a method to avoid such a problem if it would be needed.
>> I guess the main idea here is that we have a lot of self-joins, but only
>> few of them (or no one) can be removed.
>> I can't imagine a practical situation when we can be stuck in the
>> problems here. So, I vote to remove this GUC.
>
> I've removed the self_join_search_limit. Anyway there is
> enable_self_join_removal if the self join removal algorithm causes any
> problems. I also did some grammar corrections for the comments. I
> think the patch is getting to the committable shape. I noticed some
> failures on commitfest.cputube.org. I'd like to check how this
> version will pass it.

I have observed the final patch. A couple of minor changes can be made
(see attachment).
Also, I see room for improvement, but it can be done later. For example,
we limit the optimization to only ordinary tables in this patch. It can
be extended at least with partitioned and foreign tables soon.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachment	Content-Type	Size
minor_changes.diff	text/plain	685 bytes

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-23 09:47:53
Message-ID:	CAPpHfduoAHTp6O7Q8PDe_w=30v0n_06pGe5M65iR4DYcfaYnRw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Oct 23, 2023 at 6:43 AM Andrei Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> On 22/10/2023 05:01, Alexander Korotkov wrote:
> > On Thu, Oct 19, 2023 at 6:16 AM Andrei Lepikhov
> > <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> >> On 19/10/2023 01:50, Alexander Korotkov wrote:
> >>> This query took 3778.432 ms with self-join removal disabled, and
> >>> 3756.009 ms with self-join removal enabled. So, no measurable
> >>> overhead. Similar to the higher number of joins. Can you imagine
> >>> some extreme case when self-join removal could introduce significant
> >>> overhead in comparison with other optimizer parts? If not, should we
> >>> remove self_join_search_limit GUC?
> >> Thanks,
> >> It was Zhihong Yu who worried about that case [1]. And my purpose was to
> >> show a method to avoid such a problem if it would be needed.
> >> I guess the main idea here is that we have a lot of self-joins, but only
> >> few of them (or no one) can be removed.
> >> I can't imagine a practical situation when we can be stuck in the
> >> problems here. So, I vote to remove this GUC.
> >
> > I've removed the self_join_search_limit. Anyway there is
> > enable_self_join_removal if the self join removal algorithm causes any
> > problems. I also did some grammar corrections for the comments. I
> > think the patch is getting to the committable shape. I noticed some
> > failures on commitfest.cputube.org. I'd like to check how this
> > version will pass it.
>
> I have observed the final patch. A couple of minor changes can be made
> (see attachment).

Thank you, Andrei! I've integrated your changes into the patch.

> Also, I see room for improvement, but it can be done later. For example,
> we limit the optimization to only ordinary tables in this patch. It can
> be extended at least with partitioned and foreign tables soon.

Yes, I think it's reasonable to postpone some improvements. It's
important to get the basic feature in, make sure it's safe and stable.
Then we can make improvements incrementally.

I think this patch makes substantial improvement to query planning.
It has received plenty of reviews. The code is currently in quite
good shape. I didn't manage to find the cases when this optimization
causes significant overhead to planning time. Even if such cases will
be spotted there is a GUC option to disable this feature. So, I'll
push this if there are no objections.

------
Regards,
Alexander Korotkov

Attachment	Content-Type	Size
0001-Remove-useless-self-joins-v47.patch	application/octet-stream	100.1 KB

From:	Alexander Lakhin <exclusion(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-23 11:00:00
Message-ID:	107682e0-137a-d0fb-2733-7fc6e9562690@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 베이SQL

Hi Alexander,

23.10.2023 12:47, Alexander Korotkov wrote:
> I think this patch makes substantial improvement to query planning.
> It has received plenty of reviews. The code is currently in quite
> good shape. I didn't manage to find the cases when this optimization
> causes significant overhead to planning time. Even if such cases will
> be spotted there is a GUC option to disable this feature. So, I'll
> push this if there are no objections.

On a quick glance, I've noticed following typos/inconsistencies in the
patch, which maybe worth fixing:
s/cadidates/candidates/
s/uniquiness/uniqueness/
s/selfjoin/self-join/
s/seperate/separate/

Also, shouldn't the reference "see generate_implied_equalities" be
"see generate_implied_equalities_for_column"?

Best regards,
Alexander

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2023-10-23 11:29:15
Message-ID:	CAPpHfduLxYm4biJrTbjBxTAW6vkxBswuQ2B=gXU+c37QJd6+Ow@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Oct 23, 2023 at 2:00 PM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> 23.10.2023 12:47, Alexander Korotkov wrote:
> > I think this patch makes substantial improvement to query planning.
> > It has received plenty of reviews. The code is currently in quite
> > good shape. I didn't manage to find the cases when this optimization
> > causes significant overhead to planning time. Even if such cases will
> > be spotted there is a GUC option to disable this feature. So, I'll
> > push this if there are no objections.
>
> On a quick glance, I've noticed following typos/inconsistencies in the
> patch, which maybe worth fixing:
> s/cadidates/candidates/
> s/uniquiness/uniqueness/
> s/selfjoin/self-join/
> s/seperate/separate/
>
> Also, shouldn't the reference "see generate_implied_equalities" be
> "see generate_implied_equalities_for_column"?

Fixed all of the above. Thank you for catching this!

------
Regards,
Alexander Korotkov

Attachment	Content-Type	Size
0001-Remove-useless-self-joins-v48.patch	application/octet-stream	100.1 KB

From:	Alexander Lakhin <exclusion(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2023-12-29 05:00:00
Message-ID:	a89f480f-8143-0965-f22d-0a892777f501@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 커뮤니티SQL

Hi Alexander,

23.10.2023 14:29, Alexander Korotkov wrote:
> Fixed all of the above. Thank you for catching this!

I've discovered that starting from d3d55ce57 the following query:
CREATE TABLE t(a int PRIMARY KEY);

WITH tt AS (SELECT * FROM t)
UPDATE t SET a = tt.a + 1 FROM tt
WHERE tt.a = t.a RETURNING t.a;

triggers an error "variable not found in subplan target lists".
(Commits 8a8ed916f and b5fb6736e don't fix this, unfortunately.)

Best regards,
Alexander

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Alexander Lakhin <exclusion(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2023-12-29 09:58:11
Message-ID:	0c5f461f-e26f-4fd9-9700-91b0e9c38fd6@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 사이트 순위SQL

On 29/12/2023 12:00, Alexander Lakhin wrote:
> Hi Alexander,
>
> 23.10.2023 14:29, Alexander Korotkov wrote:
>> Fixed all of the above. Thank you for catching this!
>
> I've discovered that starting from d3d55ce57 the following query:
> CREATE TABLE t(a int PRIMARY KEY);
>
> WITH tt AS (SELECT * FROM t)
> UPDATE t SET a = tt.a + 1 FROM tt
> WHERE tt.a = t.a RETURNING t.a;
>
> triggers an error "variable not found in subplan target lists".
> (Commits 8a8ed916f and b5fb6736e don't fix this, unfortunately.)

Thanks for the report!
The problem is with the resultRelation field. We forget to replace the
relid here.
Could you check your issue with the patch in the attachment? Does it
resolve this case?

--
regards,
Andrei Lepikhov
Postgres Professional

Attachment	Content-Type	Size
quickfix.diff	text/plain	1.1 KB

From:	Alexander Lakhin <exclusion(at)gmail(dot)com>
To:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2023-12-29 11:00:01
Message-ID:	336c748b-4ffe-5915-3d3c-ece6d19e021c@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Andrei,

29.12.2023 12:58, Andrei Lepikhov wrote:
> Thanks for the report!
> The problem is with the resultRelation field. We forget to replace the relid here.
> Could you check your issue with the patch in the attachment? Does it resolve this case?
>

Yes, with the patch applied I see no error.
Thank you!

Best regards,
Alexander

From:	Alexander Lakhin <exclusion(at)gmail(dot)com>
To:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2024-01-08 20:00:01
Message-ID:	4788f781-31bd-9796-d7d6-588a751c8787@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg사설 토토SQL

Hello Andrei and Alexander,

Please look at the following query which produces an incorrect result since
d3d55ce57:
CREATE TABLE t(a int PRIMARY KEY, b int);
INSERT INTO t VALUES (1, 1), (2, 1);
SELECT * FROM t WHERE EXISTS (SELECT * FROM t t2 WHERE t2.a = t.b AND t2.b > 0);

a | b
---+---
1 | 1
(1 row)

I think that the expected result is:
a | b
---+---
1 | 1
2 | 1
(2 rows)

Best regards,
Alexander

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2024-01-08 20:20:36
Message-ID:	CAPpHfduYKqbWWT=YuBJyoExrwsJSgjUuJN1Gsr2YZMG=nh_ctQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 8, 2024 at 10:00 PM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> Please look at the following query which produces an incorrect result since
> d3d55ce57:
> CREATE TABLE t(a int PRIMARY KEY, b int);
> INSERT INTO t VALUES (1, 1), (2, 1);
> SELECT * FROM t WHERE EXISTS (SELECT * FROM t t2 WHERE t2.a = t.b AND t2.b > 0);
>
> a | b
> ---+---
> 1 | 1
> (1 row)
>
> I think that the expected result is:
> a | b
> ---+---
> 1 | 1
> 2 | 1
> (2 rows)

Thank you for your report. I'm looking at this now.

------
Regards,
Alexander Korotkov

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2024-01-08 22:09:49
Message-ID:	CAPpHfdvu+wyazT+ujJbJ5rKZrHQoG+FCVeWENA4pyf_A1W1WAA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg롤 토토SQL :

On Mon, Jan 8, 2024 at 10:20 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> On Mon, Jan 8, 2024 at 10:00 PM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> > Please look at the following query which produces an incorrect result since
> > d3d55ce57:
> > CREATE TABLE t(a int PRIMARY KEY, b int);
> > INSERT INTO t VALUES (1, 1), (2, 1);
> > SELECT * FROM t WHERE EXISTS (SELECT * FROM t t2 WHERE t2.a = t.b AND t2.b > 0);
> >
> > a | b
> > ---+---
> > 1 | 1
> > (1 row)
> >
> > I think that the expected result is:
> > a | b
> > ---+---
> > 1 | 1
> > 2 | 1
> > (2 rows)
>
> Thank you for your report. I'm looking at this now.

Fixed in 30b4955a46.

------
Regards,
Alexander Korotkov

From:	Alexander Lakhin <exclusion(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2024-01-09 04:00:01
Message-ID:	b9e8f460-f9a6-0e9b-e8ba-60d59f0bc22c@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

09.01.2024 01:09, Alexander Korotkov wrote:
> Fixed in 30b4955a46.

Thank you for fixing that!

I've found another anomaly coined with d3d55ce57. This query:
CREATE TABLE t(a int PRIMARY KEY, b int);
INSERT INTO t VALUES (1, 1), (2, 1);

WITH t1 AS (SELECT * FROM t)
UPDATE t SET b = t1.b + 1 FROM t1
WHERE t.a = t1.a RETURNING t.a, t1.b;

gives "ERROR: variable not found in subplan target lists" on d3d55ce57, but
starting from a7928a57b it gives an incorrect result:
a | b
---+---
1 | 2
2 | 2
(2 rows)

Best regards,
Alexander

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2024-01-09 06:08:37
Message-ID:	CAPpHfdsqe6wOkegR3QDeuNCZo3YMR2Nmeg2OXmdWgrbcJRDY7Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 9, 2024 at 6:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> 09.01.2024 01:09, Alexander Korotkov wrote:
> > Fixed in 30b4955a46.
>
> Thank you for fixing that!
>
> I've found another anomaly coined with d3d55ce57. This query:
> CREATE TABLE t(a int PRIMARY KEY, b int);
> INSERT INTO t VALUES (1, 1), (2, 1);
>
> WITH t1 AS (SELECT * FROM t)
> UPDATE t SET b = t1.b + 1 FROM t1
> WHERE t.a = t1.a RETURNING t.a, t1.b;
>
> gives "ERROR: variable not found in subplan target lists" on d3d55ce57, but
> starting from a7928a57b it gives an incorrect result:
> a | b
> ---+---
> 1 | 2
> 2 | 2
> (2 rows)

I see. It seems to be not safe to apply SJE to the modify table
target relation because it could use a different snapshot for the
RETURNING clause. I think we should just forbid SJE to involve the
modify table target relation. I'm planning to fix this later today.

------
Regards,
Alexander Korotkov

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2024-01-09 08:03:28
Message-ID:	CAPpHfdstTOaZYJJ2V6WJeZcOLqoCw1TnTruGJ7S1dPYrk=0nGA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 9, 2024 at 8:08 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> On Tue, Jan 9, 2024 at 6:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> > 09.01.2024 01:09, Alexander Korotkov wrote:
> > > Fixed in 30b4955a46.
> >
> > Thank you for fixing that!
> >
> > I've found another anomaly coined with d3d55ce57. This query:
> > CREATE TABLE t(a int PRIMARY KEY, b int);
> > INSERT INTO t VALUES (1, 1), (2, 1);
> >
> > WITH t1 AS (SELECT * FROM t)
> > UPDATE t SET b = t1.b + 1 FROM t1
> > WHERE t.a = t1.a RETURNING t.a, t1.b;
> >
> > gives "ERROR: variable not found in subplan target lists" on d3d55ce57, but
> > starting from a7928a57b it gives an incorrect result:
> > a | b
> > ---+---
> > 1 | 2
> > 2 | 2
> > (2 rows)
>
> I see. It seems to be not safe to apply SJE to the modify table
> target relation because it could use a different snapshot for the
> RETURNING clause. I think we should just forbid SJE to involve the
> modify table target relation. I'm planning to fix this later today.

Fixed in 8c441c08279.

------
Regards,
Alexander Korotkov

From:	Alexander Lakhin <exclusion(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2024-02-18 13:00:00
Message-ID:	56ee4520-e9d1-d519-54fe-c8bff880ce9b@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Alexander,

09.01.2024 01:09, Alexander Korotkov wrote:
> Fixed in 30b4955a46.

Please look at the following query which fails with an error since
d3d55ce57:

create table t (i int primary key);

select t3.i from t t1
join t t2 on t1.i = t2.i,
lateral (select t1.i limit 1) t3;

ERROR: non-LATERAL parameter required by subquery

Best regards,
Alexander

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2024-02-18 15:04:17
Message-ID:	CAPpHfdsJ=MZNFw0L2T_k-shOk+4dPNx9j5RZzicy3NW+yiiQVg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Feb 18, 2024 at 3:00 PM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> 09.01.2024 01:09, Alexander Korotkov wrote:
>
> Fixed in 30b4955a46.
>
>
> Please look at the following query which fails with an error since
> d3d55ce57:
>
> create table t (i int primary key);
>
> select t3.i from t t1
> join t t2 on t1.i = t2.i,
> lateral (select t1.i limit 1) t3;
>
> ERROR: non-LATERAL parameter required by subquery

Thank you for spotting. I'm looking at this.

------
Regards,
Alexander Korotkov

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2024-02-18 16:18:21
Message-ID:	CAPpHfdtB0A1Rz8k5yPMCM6ngLt2JM4NSaG8rF0=wT-txdcEf3g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Feb 18, 2024 at 5:04 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> On Sun, Feb 18, 2024 at 3:00 PM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> > 09.01.2024 01:09, Alexander Korotkov wrote:
> >
> > Fixed in 30b4955a46.
> >
> >
> > Please look at the following query which fails with an error since
> > d3d55ce57:
> >
> > create table t (i int primary key);
> >
> > select t3.i from t t1
> > join t t2 on t1.i = t2.i,
> > lateral (select t1.i limit 1) t3;
> >
> > ERROR: non-LATERAL parameter required by subquery
>
> Thank you for spotting. I'm looking at this.

Attached is a draft patch fixing this query. Could you, please, recheck?

------
Regards,
Alexander Korotkov

Attachment	Content-Type	Size
0001-Replace-relids-in-lateral-subquery-target-list-du-v1.patch	application/octet-stream	2.8 KB

From:	Alexander Lakhin <exclusion(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2024-02-18 18:00:00
Message-ID:	dde78ada-b933-7d55-64b2-a1be31317802@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg스포츠 토토 베트맨SQL

18.02.2024 19:18, Alexander Korotkov wrote:
> Attached is a draft patch fixing this query. Could you, please, recheck?

Yes, this patch fixes the behavior for that query (I've also tried several
similar queries). Though I don't know the code well enough to judge the
code change.

Best regards,
Alexander

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2024-02-19 05:24:09
Message-ID:	5063f378-ca3c-405e-b1e8-1927ea9052e8@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg와이즈 토토SQL

On 18/2/2024 23:18, Alexander Korotkov wrote:
> On Sun, Feb 18, 2024 at 5:04 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>> On Sun, Feb 18, 2024 at 3:00 PM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>>> 09.01.2024 01:09, Alexander Korotkov wrote:
>>>
>>> Fixed in 30b4955a46.
>>>
>>>
>>> Please look at the following query which fails with an error since
>>> d3d55ce57:
>>>
>>> create table t (i int primary key);
>>>
>>> select t3.i from t t1
>>> join t t2 on t1.i = t2.i,
>>> lateral (select t1.i limit 1) t3;
>>>
>>> ERROR: non-LATERAL parameter required by subquery
>>
>> Thank you for spotting. I'm looking at this.
>
> Attached is a draft patch fixing this query. Could you, please, recheck?
I reviewed this patch. Why do you check only the target list? I guess
these links can be everywhere. See the patch in the attachment with the
elaborated test and slightly changed code.

--
regards,
Andrei Lepikhov
Postgres Professional

Attachment	Content-Type	Size
0001-Replace-relids-in-lateral-subquery-target-list-du-v2.patch	text/plain	5.2 KB

From:	Richard Guo <guofenglinux(at)gmail(dot)com>
To:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Removing unneeded self joins
Date:	2024-02-21 07:26:57
Message-ID:	CAMbWs4-=PO6Mm9gNnySbx0VHyXjgnnYYwbN9dth=TLQweZ-M+g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Feb 19, 2024 at 1:24 PM Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
wrote:

> On 18/2/2024 23:18, Alexander Korotkov wrote:
> > On Sun, Feb 18, 2024 at 5:04 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com>
> wrote:
> >> On Sun, Feb 18, 2024 at 3:00 PM Alexander Lakhin <exclusion(at)gmail(dot)com>
> wrote:
> >>> Please look at the following query which fails with an error since
> >>> d3d55ce57:
> >>>
> >>> create table t (i int primary key);
> >>>
> >>> select t3.i from t t1
> >>> join t t2 on t1.i = t2.i,
> >>> lateral (select t1.i limit 1) t3;
> >>>
> >>> ERROR: non-LATERAL parameter required by subquery
> >>
> >> Thank you for spotting. I'm looking at this.
> >
> > Attached is a draft patch fixing this query. Could you, please, recheck?
> I reviewed this patch. Why do you check only the target list? I guess
> these links can be everywhere. See the patch in the attachment with the
> elaborated test and slightly changed code.

I just noticed that this fix has been committed in 489072ab7a, but it's
just flat wrong.

* The fix walks the subquery and replaces all the Vars with a varno
equal to the relid of the removing rel, without checking the
varlevelsup. That is to say, a Var that belongs to the subquery itself
might also be replaced, which is wrong. For instance,

create table t (i int primary key);

explain (costs off)
select t3.i from t t1
join t t2 on t1.i = t2.i
join lateral (select * from (select t1.i offset 0) offset 0) t3 on true;
ERROR: no relation entry for relid 2

* The fix only traverses one level within the subquery, so Vars that
appear in subqueries with multiple levels cannot be replaced. For
instance,

explain (costs off)
select t4.i from t t1
join t t2 on t1.i = t2.i
join lateral (select t3.i from t t3, (select t1.i) offset 0) t4 on true;
ERROR: non-LATERAL parameter required by subquery

I think the right fix for these issues is to introduce a new element
'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
to 1) recurse into subselects with sublevels_up increased, and 2)
perform the replacement only when varlevelsup is equal to sublevels_up.

Attached is a patch for the fix.

While writing the fix, I noticed some outdated comments. Such as in
remove_rel_from_query, the first for loop updates otherrel's attr_needed
as well as lateral_vars, but the comment only mentions attr_needed. So
this patch also fixes some outdated comments.

While writing the test cases, I found that the test cases for SJE are
quite messy. Below are what I have noticed:

* There are several test cases using catalog tables like pg_class,
pg_stats, pg_index, etc. for testing join removal. I don't see a reason
why we need to use catalog tables, and I think this just raises the risk
of instability.

* In many test cases, a mix of uppercase and lowercase keywords is used
in one query. I think it'd better to maintain consistency by using
either all uppercase or all lowercase keywords in one query.

* In most situations, we verify the plan and the output of a query like:

explain (costs off)
select ...;
select ...;

The two select queries are supposed to be the same. But in the SJE test
cases, I have noticed instances where the two select queries differ from
each other.

This patch also includes some cosmetic tweaks for SJE test cases. It
does not change the test cases using catalog tables though. I think
that should be a seperate patch.

Thanks
Richard

Attachment	Content-Type	Size
v1-0001-Replace-lateral-references-to-removed-rels-in-subqueries.patch	application/octet-stream	39.0 KB

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Richard Guo <guofenglinux(at)gmail(dot)com>
Cc:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Removing unneeded self joins
Date:	2024-02-21 07:59:45
Message-ID:	dc465c1a-c9dc-4f57-af73-9c5f1cd0854a@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 21/2/2024 14:26, Richard Guo wrote:
> This patch also includes some cosmetic tweaks for SJE test cases. It
> does not change the test cases using catalog tables though. I think
> that should be a seperate patch.
Thanks for this catch, it is really messy thing, keeping aside the
question why we need two different subtrees for the same query.
I will look into your fix.

--
regards,
Andrei Lepikhov
Postgres Professional

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Richard Guo <guofenglinux(at)gmail(dot)com>
Cc:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Removing unneeded self joins
Date:	2024-02-22 08:51:19
Message-ID:	d6d227ff-0284-412c-aecf-603ad895873e@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg스포츠 토토 베트맨SQL

On 21/2/2024 14:26, Richard Guo wrote:
> I think the right fix for these issues is to introduce a new element
> 'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
> to 1) recurse into subselects with sublevels_up increased, and 2)
> perform the replacement only when varlevelsup is equal to sublevels_up.
This code looks good. No idea how we have lost it before.
>
> While writing the fix, I noticed some outdated comments. Such as in
> remove_rel_from_query, the first for loop updates otherrel's attr_needed
> as well as lateral_vars, but the comment only mentions attr_needed. So
> this patch also fixes some outdated comments.
Thanks, looks good.
>
> While writing the test cases, I found that the test cases for SJE are
> quite messy. Below are what I have noticed:
>
> * There are several test cases using catalog tables like pg_class,
> pg_stats, pg_index, etc. for testing join removal. I don't see a reason
> why we need to use catalog tables, and I think this just raises the risk
> of instability.
I see only one unusual query with the pg_class involved.
>
> * In many test cases, a mix of uppercase and lowercase keywords is used
> in one query. I think it'd better to maintain consistency by using
> either all uppercase or all lowercase keywords in one query.
I see uppercase -> lowercase change:
select t1.*, t2.a as ax from sj t1 join sj t2
and lowercase -> uppercase in many other cases:
explain (costs off)
I guess it is a matter of taste, so give up for the committer decision.
Technically, it's OK.
>
> * In most situations, we verify the plan and the output of a query like:
>
> explain (costs off)
> select ...;
> select ...;
>
> The two select queries are supposed to be the same. But in the SJE test
> cases, I have noticed instances where the two select queries differ from
> each other.
>
> This patch also includes some cosmetic tweaks for SJE test cases. It
> does not change the test cases using catalog tables though. I think
> that should be a seperate patch.
I can't assess the necessity of changing these dozens of lines of code
because I follow another commenting style, but technically, it's still OK.

--
regards,
Andrei Lepikhov
Postgres Professional

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Removing unneeded self joins
Date:	2024-02-23 22:36:59
Message-ID:	CAPpHfdt57GeLokML7pYaecejV-VfHVDpVqbx-hTVswWmQ6nMTQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg젠 토토SQL :

On Thu, Feb 22, 2024 at 10:51 AM Andrei Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> On 21/2/2024 14:26, Richard Guo wrote:
> > I think the right fix for these issues is to introduce a new element
> > 'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
> > to 1) recurse into subselects with sublevels_up increased, and 2)
> > perform the replacement only when varlevelsup is equal to sublevels_up.
> This code looks good. No idea how we have lost it before.

Thanks to Richard for the patch and to Andrei for review. I also find
code looking good. Pushed with minor edits from me.

------
Regards,
Alexander Korotkov

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Richard Guo <guofenglinux(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Removing unneeded self joins
Date:	2024-02-24 05:12:05
Message-ID:	20240224051205.db@rfd.leadboat.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Feb 24, 2024 at 12:36:59AM +0200, Alexander Korotkov wrote:
> On Thu, Feb 22, 2024 at 10:51 AM Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> > On 21/2/2024 14:26, Richard Guo wrote:
> > > I think the right fix for these issues is to introduce a new element
> > > 'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
> > > to 1) recurse into subselects with sublevels_up increased, and 2)
> > > perform the replacement only when varlevelsup is equal to sublevels_up.
> > This code looks good. No idea how we have lost it before.
>
> Thanks to Richard for the patch and to Andrei for review. I also find
> code looking good. Pushed with minor edits from me.

I feel this, commit 466979e, misses a few of our project standards:

- The patch makes many non-whitespace changes to existing test queries. This
makes it hard to review the consequences of the non-test part of the patch.
Did you minimize such edits? Of course, not every such edit is avoidable.

- The commit message doesn't convey whether this is refactoring or is a bug
fix. This makes it hard to write release notes, among other things. From
this mailing list thread, it gather it's a bug fix in 489072ab7a, hence
v17-specific. The commit message for 489072ab7a is also silent about that
commit's status as refactoring or as a bug fix.

- Normally, I could answer the previous question by reading the test case
diffs. However, in addition to the first point about non-whitespace
changes, the first three join.sql patch hunks just change whitespace.
Worse, since they move line breaks, "git diff -w" doesn't filter them out.

To what extent are those community standards vs. points of individual
committer preference? Please tell me where I'm wrong here.

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Richard Guo <guofenglinux(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Removing unneeded self joins
Date:	2024-02-24 11:02:01
Message-ID:	CAPpHfdtqthbfyrtWjMkVRLcUDFxQeC3_v1Q6q6F2oTmNrzdnwg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg무지개 토토SQL

Hi, Noah!

On Sat, Feb 24, 2024 at 7:12 AM Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Sat, Feb 24, 2024 at 12:36:59AM +0200, Alexander Korotkov wrote:
> > On Thu, Feb 22, 2024 at 10:51 AM Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> > > On 21/2/2024 14:26, Richard Guo wrote:
> > > > I think the right fix for these issues is to introduce a new element
> > > > 'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
> > > > to 1) recurse into subselects with sublevels_up increased, and 2)
> > > > perform the replacement only when varlevelsup is equal to sublevels_up.
> > > This code looks good. No idea how we have lost it before.
> >
> > Thanks to Richard for the patch and to Andrei for review. I also find
> > code looking good. Pushed with minor edits from me.
>
> I feel this, commit 466979e, misses a few of our project standards:
>
> - The patch makes many non-whitespace changes to existing test queries. This
> makes it hard to review the consequences of the non-test part of the patch.
> Did you minimize such edits? Of course, not every such edit is avoidable.
>
> - The commit message doesn't convey whether this is refactoring or is a bug
> fix. This makes it hard to write release notes, among other things. From
> this mailing list thread, it gather it's a bug fix in 489072ab7a, hence
> v17-specific. The commit message for 489072ab7a is also silent about that
> commit's status as refactoring or as a bug fix.
>
> - Normally, I could answer the previous question by reading the test case
> diffs. However, in addition to the first point about non-whitespace
> changes, the first three join.sql patch hunks just change whitespace.
> Worse, since they move line breaks, "git diff -w" doesn't filter them out.
>
> To what extent are those community standards vs. points of individual
> committer preference? Please tell me where I'm wrong here.

I agree that commit 466979e is my individual committer failure. I
should have written a better, more clear commit message and separate
tests refactoring from the bug fix.

I'm not so sure about 489072ab7a (except it provides a wrong fix). It
has a "Reported-by:" field meaning it's a problem reported by a
particular person. The "Discussion:" points directly to the reported
test case. And commit contains the relevant test case. The commit
message could be more wordy though.

------
Regards,
Alexander Korotkov

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Richard Guo <guofenglinux(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Removing unneeded self joins
Date:	2024-02-24 17:20:22
Message-ID:	20240224172022.c4@rfd.leadboat.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello,

On Sat, Feb 24, 2024 at 01:02:01PM +0200, Alexander Korotkov wrote:
> On Sat, Feb 24, 2024 at 7:12 AM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > On Sat, Feb 24, 2024 at 12:36:59AM +0200, Alexander Korotkov wrote:
> > > On Thu, Feb 22, 2024 at 10:51 AM Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> > > > On 21/2/2024 14:26, Richard Guo wrote:
> > > > > I think the right fix for these issues is to introduce a new element
> > > > > 'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
> > > > > to 1) recurse into subselects with sublevels_up increased, and 2)
> > > > > perform the replacement only when varlevelsup is equal to sublevels_up.
> > > > This code looks good. No idea how we have lost it before.
> > >
> > > Thanks to Richard for the patch and to Andrei for review. I also find
> > > code looking good. Pushed with minor edits from me.
> >
> > I feel this, commit 466979e, misses a few of our project standards:
> >
> > - The patch makes many non-whitespace changes to existing test queries. This
> > makes it hard to review the consequences of the non-test part of the patch.
> > Did you minimize such edits? Of course, not every such edit is avoidable.
> >
> > - The commit message doesn't convey whether this is refactoring or is a bug
> > fix. This makes it hard to write release notes, among other things. From
> > this mailing list thread, it gather it's a bug fix in 489072ab7a, hence
> > v17-specific. The commit message for 489072ab7a is also silent about that
> > commit's status as refactoring or as a bug fix.
> >
> > - Normally, I could answer the previous question by reading the test case
> > diffs. However, in addition to the first point about non-whitespace
> > changes, the first three join.sql patch hunks just change whitespace.
> > Worse, since they move line breaks, "git diff -w" doesn't filter them out.
> >
> > To what extent are those community standards vs. points of individual
> > committer preference? Please tell me where I'm wrong here.
>
> I agree that commit 466979e is my individual committer failure. I
> should have written a better, more clear commit message and separate
> tests refactoring from the bug fix.
>
> I'm not so sure about 489072ab7a (except it provides a wrong fix). It
> has a "Reported-by:" field meaning it's a problem reported by a
> particular person. The "Discussion:" points directly to the reported
> test case. And commit contains the relevant test case. The commit
> message could be more wordy though.

Agreed, the first and third points don't apply to 489072ab7a. Thanks to that,
one can deduce from its new test case query that it fixes a bug. It sounds
like we agree about commit 466979e, so that's good.

From:	Alexander Lakhin <exclusion(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-04-30 06:00:00
Message-ID:	0ed2cf02-e655-6ec7-a4fe-52fd8d572f65@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello Alexander,

I've discovered another failure, introduced by d3d55ce57.
Please try the following:
CREATE TABLE t (a int unique, b float);
SELECT * FROM t NATURAL JOIN LATERAL
(SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;

With asserts enabled, it triggers
TRAP: failed Assert("!bms_is_member(rti, lateral_relids)"), File: "initsplan.c", Line: 697, PID: 3074054
ExceptionalCondition at assert.c:52:13
create_lateral_join_info at initsplan.c:700:8
query_planner at planmain.c:257:2
grouping_planner at planner.c:1523:17
subquery_planner at planner.c:1098:2
standard_planner at planner.c:415:9
planner at planner.c:282:12
pg_plan_query at postgres.c:904:9
pg_plan_queries at postgres.c:996:11
exec_simple_query at postgres.c:1193:19
PostgresMain at postgres.c:4684:27

With no asserts, I get:
ERROR: failed to construct the join relation

Please take a look at this.

Best regards,
Alexander

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-04-30 08:53:20
Message-ID:	CAPpHfdtEde7mgjKAthWsQWvf+oSyf11qdczs7-i390C2p=W5kg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi, Alexander!

On Tue, Apr 30, 2024 at 9:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> 23.10.2023 12:47, Alexander Korotkov wrote:
> > I think this patch makes substantial improvement to query planning.
> > It has received plenty of reviews. The code is currently in quite
> > good shape. I didn't manage to find the cases when this optimization
> > causes significant overhead to planning time. Even if such cases will
> > be spotted there is a GUC option to disable this feature. So, I'll
> > push this if there are no objections.
>
> I've discovered another failure, introduced by d3d55ce57.
> Please try the following:
> CREATE TABLE t (a int unique, b float);
> SELECT * FROM t NATURAL JOIN LATERAL
> (SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;
>
> With asserts enabled, it triggers
> TRAP: failed Assert("!bms_is_member(rti, lateral_relids)"), File: "initsplan.c", Line: 697, PID: 3074054
> ExceptionalCondition at assert.c:52:13
> create_lateral_join_info at initsplan.c:700:8
> query_planner at planmain.c:257:2
> grouping_planner at planner.c:1523:17
> subquery_planner at planner.c:1098:2
> standard_planner at planner.c:415:9
> planner at planner.c:282:12
> pg_plan_query at postgres.c:904:9
> pg_plan_queries at postgres.c:996:11
> exec_simple_query at postgres.c:1193:19
> PostgresMain at postgres.c:4684:27
>
> With no asserts, I get:
> ERROR: failed to construct the join relation
>
> Please take a look at this.

I'm looking into this, thank you!

------
Regards,
Alexander Korotkov

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-04-30 10:20:54
Message-ID:	CAPpHfdsHWbdPVzUdmoKv3dpo8RHJOa=LQR-ZAT+89uvktMs9XA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 커뮤니티SQL

I think we should just forbid SJE in case when relations to be merged
have cross-references with lateral vars. The draft patch for this is
attached. I'd like to ask Alexander to test it, Richard and Andrei to
review it. Thank you!

------
Regards,
Alexander Korotkov

Attachment	Content-Type	Size
sje_skip_cross_lateral_vars.patch	application/octet-stream	1.6 KB

From:	Alexander Lakhin <exclusion(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-01 11:00:00
Message-ID:	735b5cd4-568d-fbf7-ad8c-dcfb4c3af86e@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg스포츠 토토 베트맨SQL

30.04.2024 13:20, Alexander Korotkov wrote:
> On Tue, Apr 30, 2024 at 9:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>> I've discovered another failure, introduced by d3d55ce57.
>> Please try the following:
>> CREATE TABLE t (a int unique, b float);
>> SELECT * FROM t NATURAL JOIN LATERAL
>> (SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;
> I think we should just forbid SJE in case when relations to be merged
> have cross-references with lateral vars. The draft patch for this is
> attached. I'd like to ask Alexander to test it, Richard and Andrei to
> review it. Thank you!

Beside LATERAL vars, it seems that SJR doesn't play well with TABLESAMPLE
in general. For instance:
CREATE TABLE t (a int unique);
INSERT INTO t SELECT * FROM generate_series (1,100);

SELECT COUNT(*) FROM (SELECT * FROM t TABLESAMPLE BERNOULLI(1)) t1
NATURAL JOIN (SELECT * FROM t TABLESAMPLE BERNOULLI(100)) t2;
returned 100, 100, 100 for me, though with enable_self_join_removal = off,
I got 4, 0, 1...

Best regards,
Alexander

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-01 11:59:58
Message-ID:	CAPpHfdssga9CBQQ6s32t1Xb2Js2WuPfiporbAse334CJiyBb1Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, May 1, 2024 at 2:00 PM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> 30.04.2024 13:20, Alexander Korotkov wrote:
> > On Tue, Apr 30, 2024 at 9:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> >> I've discovered another failure, introduced by d3d55ce57.
> >> Please try the following:
> >> CREATE TABLE t (a int unique, b float);
> >> SELECT * FROM t NATURAL JOIN LATERAL
> >> (SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;
> > I think we should just forbid SJE in case when relations to be merged
> > have cross-references with lateral vars. The draft patch for this is
> > attached. I'd like to ask Alexander to test it, Richard and Andrei to
> > review it. Thank you!
>
> Beside LATERAL vars, it seems that SJR doesn't play well with TABLESAMPLE
> in general. For instance:
> CREATE TABLE t (a int unique);
> INSERT INTO t SELECT * FROM generate_series (1,100);
>
> SELECT COUNT(*) FROM (SELECT * FROM t TABLESAMPLE BERNOULLI(1)) t1
> NATURAL JOIN (SELECT * FROM t TABLESAMPLE BERNOULLI(100)) t2;
> returned 100, 100, 100 for me, though with enable_self_join_removal = off,
> I got 4, 0, 1...

Right, thank you for reporting this.

BTW, I found another case where my previous fix doesn't work.

SELECT * FROM t NATURAL JOIN LATERAL (SELECT * FROM t t2 TABLESAMPLE
SYSTEM (t.b) NATURAL JOIN LATERAL(SELECT * FROM t t3 TABLESAMPLE
SYSTEM (t2.b)) t3) t2;

I think we probably could forbid SJE for the tables with TABLESAMPLE
altogether. Please, check the attached patch.

------
Regards,
Alexander Korotkov

Attachment	Content-Type	Size
sje_tablesample.patch	application/octet-stream	666 bytes

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>
Cc:	"Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-02 09:44:54
Message-ID:	8100af4a-d0aa-450d-a09f-541f26b04ac2@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 5/1/24 18:59, Alexander Korotkov wrote:
> I think we probably could forbid SJE for the tables with TABLESAMPLE
> altogether. Please, check the attached patch.
Your patch looks good to me. I added some comments and test case into
the join.sql.

One question for me is: Do we anticipate other lateral self-references
except the TABLESAMPLE case? Looking into the extract_lateral_references
implementation, I see the only RTE_SUBQUERY case to be afraid of. But we
pull up subqueries before extracting lateral references. So, if we have
a reference to a subquery, it means we will not flatten this subquery
and don't execute SJE. Do we need more code, as you have written in the
first patch?

--
regards,
Andrei Lepikhov
Postgres Professional

Attachment	Content-Type	Size
0001-Forbid-self-join-elimination-on-table-sampling-scans.patch	text/x-patch	3.7 KB

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Alexander Lakhin <exclusion(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-02 10:07:54
Message-ID:	CAPpHfdvGVcud2i00pEVH2_GS09c8UsDh6P-SEOLd6YMGOxiNEw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 커뮤니티SQL

On Thu, May 2, 2024 at 12:45 PM Andrei Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>
> On 5/1/24 18:59, Alexander Korotkov wrote:
> > I think we probably could forbid SJE for the tables with TABLESAMPLE
> > altogether. Please, check the attached patch.
> Your patch looks good to me. I added some comments and test case into
> the join.sql.

Thank you

> One question for me is: Do we anticipate other lateral self-references
> except the TABLESAMPLE case? Looking into the extract_lateral_references
> implementation, I see the only RTE_SUBQUERY case to be afraid of. But we
> pull up subqueries before extracting lateral references. So, if we have
> a reference to a subquery, it means we will not flatten this subquery
> and don't execute SJE. Do we need more code, as you have written in the
> first patch?

I think my first patch was crap anyway. Your explanation seems
reasonable to me. I'm not sure this requires any more code. Probably
it would be enough to add more comments about this.

------
Regards,
Alexander Korotkov

From:	Richard Guo <guofenglinux(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-02 13:14:20
Message-ID:	CAMbWs49Q8g1LPVCeNHYv-Y-gFo826ertrg6nYNC9LL=8=zHP3g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 2, 2024 at 6:08 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com>
wrote:

> On Thu, May 2, 2024 at 12:45 PM Andrei Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> > One question for me is: Do we anticipate other lateral self-references
> > except the TABLESAMPLE case? Looking into the extract_lateral_references
> > implementation, I see the only RTE_SUBQUERY case to be afraid of. But we
> > pull up subqueries before extracting lateral references. So, if we have
> > a reference to a subquery, it means we will not flatten this subquery
> > and don't execute SJE. Do we need more code, as you have written in the
> > first patch?
>
> I think my first patch was crap anyway. Your explanation seems
> reasonable to me. I'm not sure this requires any more code. Probably
> it would be enough to add more comments about this.

The tablesample case is not the only factor that can cause a relation to
have a lateral dependency on itself after self-join removal. It can
also happen with PHVs. As an example, consider

explain (costs off)
select * from t t1
left join lateral
(select t1.a as t1a, * from t t2) t2
on true
where t1.a = t2.a;
server closed the connection unexpectedly

This is because after self-join removal, a PlaceHolderInfo's ph_lateral
might contain rels mentioned in ph_eval_at, which we should get rid of.

For the tablesample case, I agree that we should not consider relations
with TABLESAMPLE clauses as candidates to be removed. Removing such a
relation could potentially change the syntax of the query, as shown by
Alexander's example. It seems to me that we can just check that in
remove_self_joins_recurse, while we're collecting the base relations
that are considered to be candidates for removal.

This leads to the attached patch. This patch also includes some code
refactoring for the surrounding code.

Thanks
Richard

Attachment	Content-Type	Size
v1-0001-Fix-bogus-lateral-dependency-after-self-join-removal.patch	application/octet-stream	5.2 KB

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Richard Guo <guofenglinux(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-02 23:04:40
Message-ID:	CAPpHfdsUiwbGLbBwtdL+UCT5GCUVEU+SqQaaZjtcje3znU2JgQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 핫SQL :

Hi, Richard!

On Thu, May 2, 2024 at 4:14 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> On Thu, May 2, 2024 at 6:08 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>> On Thu, May 2, 2024 at 12:45 PM Andrei Lepikhov
>> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> > One question for me is: Do we anticipate other lateral self-references
>> > except the TABLESAMPLE case? Looking into the extract_lateral_references
>> > implementation, I see the only RTE_SUBQUERY case to be afraid of. But we
>> > pull up subqueries before extracting lateral references. So, if we have
>> > a reference to a subquery, it means we will not flatten this subquery
>> > and don't execute SJE. Do we need more code, as you have written in the
>> > first patch?
>>
>> I think my first patch was crap anyway. Your explanation seems
>> reasonable to me. I'm not sure this requires any more code. Probably
>> it would be enough to add more comments about this.
>
>
> The tablesample case is not the only factor that can cause a relation to
> have a lateral dependency on itself after self-join removal. It can
> also happen with PHVs. As an example, consider
>
> explain (costs off)
> select * from t t1
> left join lateral
> (select t1.a as t1a, * from t t2) t2
> on true
> where t1.a = t2.a;
> server closed the connection unexpectedly
>
> This is because after self-join removal, a PlaceHolderInfo's ph_lateral
> might contain rels mentioned in ph_eval_at, which we should get rid of.
>
> For the tablesample case, I agree that we should not consider relations
> with TABLESAMPLE clauses as candidates to be removed. Removing such a
> relation could potentially change the syntax of the query, as shown by
> Alexander's example. It seems to me that we can just check that in
> remove_self_joins_recurse, while we're collecting the base relations
> that are considered to be candidates for removal.
>
> This leads to the attached patch. This patch also includes some code
> refactoring for the surrounding code.

Great, thank you for your work on this!

I'd like to split this into separate patches for better granularity of
git history. I also added 0001 patch, which makes first usage of the
SJE acronym in file to come with disambiguation. Also, I've added
assert that ph_lateral and ph_eval_at didn't overlap before the
changes. I think this should help from the potential situation when
the changes we do could mask another bug.

I would appreciate your review of this patchset, and review from Andrei as well.

------
Regards,
Alexander Korotkov
Supabase

Attachment	Content-Type	Size
v2-0002-Minor-refactoring-for-self-join-elimination-code.patch	application/octet-stream	3.1 KB
v2-0003-Forbid-self-join-elimination-on-table-sampling-sc.patch	application/octet-stream	3.5 KB
v2-0004-Fix-self-join-elimination-work-with-PlaceHolderIn.patch	application/octet-stream	3.0 KB
v2-0001-Clarify-the-SJE-self-join-elimination-acronym.patch	application/octet-stream	2.6 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Richard Guo <guofenglinux(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-02 23:19:34
Message-ID:	2422119.1714691974@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> I would appreciate your review of this patchset, and review from Andrei as well.

I hate to say this ... but if we're still finding bugs this
basic in SJE, isn't it time to give up on it for v17?

I might feel better about it if there were any reason to think
these were the last major bugs. But you have already committed
around twenty separate fixes for the original SJE patch, and
now here you come with several more; so it doesn't seem like
the defect rate has slowed materially. There can be no doubt
whatever that the original patch was far from commit-ready.

I think we should revert SJE for v17 and do a thorough design
review before trying again in v18.

regards, tom lane

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-03 03:19:58
Message-ID:	75de38d9-005d-4995-9ecf-a155844dcbcd@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 5/3/24 06:19, Tom Lane wrote:
> Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
>> I would appreciate your review of this patchset, and review from Andrei as well.
>
> I hate to say this ... but if we're still finding bugs this
> basic in SJE, isn't it time to give up on it for v17?
>
> I might feel better about it if there were any reason to think
> these were the last major bugs. But you have already committed
> around twenty separate fixes for the original SJE patch, and
> now here you come with several more; so it doesn't seem like
> the defect rate has slowed materially. There can be no doubt
> whatever that the original patch was far from commit-ready.
>
> I think we should revert SJE for v17 and do a thorough design
> review before trying again in v18.
I need to say I don't see any evidence of bad design.
I think this feature follows the example of 2489d76 [1], 1349d27 [2],
and partitionwise join features — we get some issues from time to time,
but these strengths and frequencies are significantly reduced.
First and foremost, this feature is highly isolated: like the PWJ
feature, you can just disable (not enable?) SJE and it guarantees you
will avoid the problems.
Secondly, this feature reflects the design decisions the optimiser has
made before. It raises some questions: Do we really control the
consistency of our paths and the plan tree? Maybe we hide our
misunderstanding of its logic by extensively copying expression trees,
sometimes without fundamental necessity. Perhaps the optimiser needs
some abstraction layers or reconstruction to reduce the quickly growing
complexity.
A good example here is [1]. IMO, the new promising feature it has
introduced isn't worth the complexity it added to the planner.
SJE, much like OR <-> ANY transformation, introduces a fresh perspective
into the planner: if we encounter a complex, redundant query, it may be
more beneficial to invest in simplifying the internal query
representation rather than adding new optimisations that will grapple
with this complexity.
Also, SJE raised questions I've never seen before, like: Could we
control the consistency of the PlannerInfo by changing something in the
logic?
Considering the current state, I don't see any concrete outcomes or
evidence that a redesign of the feature will lead us to a new path.
However, I believe the presence of SJE in the core could potentially
trigger improvements in the planner. As a result, I vote to stay with
the feature. But remember, as an author, I'm not entirely objective, so
let's wait for alternative opinions.

[1] Make Vars be outer-join-aware
[2] Improve performance of ORDER BY / DISTINCT aggregates

--
regards,
Andrei Lepikhov
Postgres Professional

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Richard Guo <guofenglinux(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-03 08:56:32
Message-ID:	CAPpHfdv6B8HCLdj8WidBryRrX0+X3F1rrR8uAuMQmp6rvPdscg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi, Tom!

On Fri, May 3, 2024 at 2:19 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> > I would appreciate your review of this patchset, and review from Andrei as well.
>
> I hate to say this ... but if we're still finding bugs this
> basic in SJE, isn't it time to give up on it for v17?
>
> I might feel better about it if there were any reason to think
> these were the last major bugs. But you have already committed
> around twenty separate fixes for the original SJE patch, and
> now here you come with several more; so it doesn't seem like
> the defect rate has slowed materially. There can be no doubt
> whatever that the original patch was far from commit-ready.

I think if we subtract from the SJE followup commits containing
improvements (extra comments, asserts) and fix for in-place Bitmapset
modification, which was there before, the number of fixes will be
closer to ten. And the number of pending fixes will be two. But I
totally get your concern that we're quite late in the release cycle
and new SJE-related issues continue to arise. This leads to a
significant risk of raising many bugs for end users.

> I think we should revert SJE for v17 and do a thorough design
> review before trying again in v18.

I agree to revert it for v17, but I'm not exactly sure the issue is
design (nevertheless design review is very welcome as any other type
of review). The experience of the bugs arising with the SJE doesn't
show me a particular weak spot in the feature. It looks more like
this patch has to revise awfully a lot planner data structures to
replace one relid with another. And I don't see the design, which
could avoid that. Somewhere in the thread I have proposed a concept
of "alias relids". However, I suspect that could leave us with more
lurking bugs instead of the bug-free code.

I suggest we should give this feature more review and testing, then
commit early v18. That would leave us enough time to fix any other
issues before v18 release.

------
Regards,
Alexander Korotkov
Supabase

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Guo <guofenglinux(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-03 13:55:22
Message-ID:	CA+TgmoYSRwjeTDeLY+OfMkKZOg0M6bBOh-L9UfZExu0xZVY_bA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, May 3, 2024 at 4:57 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> I agree to revert it for v17, but I'm not exactly sure the issue is
> design (nevertheless design review is very welcome as any other type
> of review). The experience of the bugs arising with the SJE doesn't
> show me a particular weak spot in the feature. It looks more like
> this patch has to revise awfully a lot planner data structures to
> replace one relid with another. And I don't see the design, which
> could avoid that. Somewhere in the thread I have proposed a concept
> of "alias relids". However, I suspect that could leave us with more
> lurking bugs instead of the bug-free code.

I agree that reverting it for v17 makes sense. In terms of moving
forward, whether a design review is exactly the right idea or not, I'm
not sure. However, I think that the need to replace relids in a lot of
places is something that a design review might potentially flag as a
problem. Maybe there is some other approach that can avoid the need
for this.

On the other hand, maybe there's not. But in that case, the question
becomes how the patch author(s), and committer, are going to make sure
that most of the issues get flushed out before the initial commit.
What we can't do is say - we know that we need to replace relids in a
bunch of places, so we'll change the ones we know about, and then rely
on testing to find any that we missed. There has to be some kind of
systematic plan that everyone can agree should find all of the
affected places, and then if a few slip through, that's fine, that's
how life goes.

I haven't followed the self-join elimination work very closely, and I
do quite like the idea of the feature. However, looking over all the
follow-up commits, it's pretty hard to escape the conclusion that
there were a lot of cases that weren't adequately considered in the
initial work (lateral, result relations, PHVs, etc.). And that is a
big problem -- it really creates a lot of issues for the project when
a major feature commit misses whole areas that it needs to have
considered, as plenty of previous history will show. When anybody
starts to realize that they've not just had a few goofs but have
missed some whole affected area entirely, it's time to start thinking
about a revert.

One of my most embarrassing gaffes in this area personally was
a448e49bcbe40fb72e1ed85af910dd216d45bad8. I don't know how I managed
to commit the original patch without realizing it was going to cause
an increase in the WAL size, but I can tell you that when I realized
it, my heart sank through the floor. I'd love to return to that work
if we can all ever agree on a way of addressing that problem, but in
the meantime, that patch is very dead. And ... if somebody had taken
the time to give me a really good design review of that patch, they
might well have noticed, and saved me the embarrassment of committing
something that had no shot of remaining in the tree. Unfortunately,
one of the downsides of being a committer is that you tend to get less
of that sort of review, because people assume you know what you're
doing. Which is fabulous, when you actually do know what you're doing,
and really sucks, when you don't. One of the things I'd like to see
discussed at 2024.pgconf.dev is how we can improve this aspect of how
we work together.

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-05 02:45:47
Message-ID:	24e93efd-eeac-43a6-a75a-ad68eacb32ee@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 3/5/2024 20:55, Robert Haas wrote:
> One of my most embarrassing gaffes in this area personally was
> a448e49bcbe40fb72e1ed85af910dd216d45bad8. I don't know how I managed
> to commit the original patch without realizing it was going to cause
> an increase in the WAL size, but I can tell you that when I realized
> it, my heart sank through the floor.
I discovered this feature and agree that it looks like a severe problem.
Unfortunately, in the case of the SJE patch, the committer and reviewers
don't provide negative feedback. We see the only (I'm not sure I use the
proper English phrase) 'negative feelings' from people who haven't
reviewed or analysed it at all (at least, they didn't mention it).

Considering the situation, I suggest setting the default value of
enable_self_join_removal to false in PG17 for added safety and then
changing it to true in early PG18.

Having no objective negative feedback, we have no reason to change
anything in the design or any part of the code. It looks regrettable and
unusual.

After designing the feature, fixing its bugs, and reviewing joint
patches on the commitfest, the question more likely lies in the planner
design. For example, I wonder if anyone here knows why exactly the
optimiser makes a copy of the whole query subtree in some places.
Another example is PlannerInfo. Can we really control all the
consequences of introducing, let's say, a new JoinDomain entity?

You also mentioned 2024.pgconf.dev. Considering the current migration
policy in some countries, it would be better to work through the online
presence as equivalent to offline. Without an online part of the
conference, the only way to communicate and discuss is through this
mailing list.

--
regards,
Andrei Lepikhov

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-06 14:44:33
Message-ID:	CA+TgmoZ66x=uWasH+O40a=G=fpbXN1ANEdGVunvR8BKuekqa2g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, May 4, 2024 at 10:46 PM Andrei Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> Having no objective negative feedback, we have no reason to change
> anything in the design or any part of the code. It looks regrettable and
> unusual.

To me, this sounds like you think it's someone else's job to tell you
what is wrong with the patch, or how to fix it, and if they don't,
then you should get to have the patch as part of PostgreSQL. But that
is not how we do things, nor should we. I agree that it sucks when you
need feedback and don't get it, and I've written about that elsewhere
and recently. But if you don't get feedback and as a result you can't
get the patch to an acceptable level, or if you do get feedback but
the patch fails to reach an acceptable level anyway, then the only
correct decision is for us to not ship that code. That obviously sucks
from the point of view of the patch author, and also of the committer,
but consider the alternative. Once patches get through an initial
release and become part of the product, the responsibility for fixing
problems is understood to slowly move from the original committer to
the community as a whole. In practice, that means that a lot of the
work of fixing things that are broken, after some initial period, ends
up falling on committers other than the person who did the initial
commit. Even one or two problematic commits can generate an enormous
amount of work for people who weren't involved in the original
development and may not even have agreed with the development
direction, and it is more than fair for those people to express a view
about whether they are willing to carry that burden or not. When they
aren't, I do think that's regrettable, but I don't think it's unusual.
Just in this release, we've removed at least two previously-released
features because they're in bad shape and nobody's willing to maintain
them (snapshot too old, AIX support).

> After designing the feature, fixing its bugs, and reviewing joint
> patches on the commitfest, the question more likely lies in the planner
> design. For example, I wonder if anyone here knows why exactly the
> optimiser makes a copy of the whole query subtree in some places.
> Another example is PlannerInfo. Can we really control all the
> consequences of introducing, let's say, a new JoinDomain entity?

Bluntly, if you can't control those consequences, then you aren't
allowed to make that change.

I know first-hand how difficult some of these problems are. Sometime
in the last year or three, I spent weeks getting rid of ONE global
variable (ThisTimeLineID). It took an absolutely inordinate amount of
time, and it became clear to me that I was never going to get rid of
enough global variables in that part of the code to be able to write a
patch for the feature I wanted without risk of unforeseen
consequences. So I gave up on the entire feature. Maybe I'll try again
at some point, or maybe somebody else will feel like cleaning up that
code and then I can try again with a cleaner base, but what I don't
get to do is write a buggy patch for the feature I want and commit it
anyway. I either figure out a way to do it that I believe is low-risk
and that the community judges to be acceptable, or I don't do it.

I want to go on record right now as disagreeing with the plan proposed
in the commit message for the revert commit, namely, committing this
again early in the v18 cycle. I don't think Tom would have proposed
reverting this feature unless he believed that it had more serious
problems than could be easily fixed in a short period of time. I think
that concern is well-founded, given the number of fixes that were
committed. It seems likely that the patch needs significant rework and
stabilization before it gets committed again, and I think it shouldn't
be committed again without explicit agreement from Tom or one of the
other committers who have significant experience with the query
planner. That is not to say that I don't approve generally of the idea
of committing things earlier in the release cycle: I certainly do. It
gives us more time to shake out problems with patches before we ship.
But it only makes sense if we collectively believe that the patch is
mostly correct, and only needs fine-tuning, and I think there are good
reasons to believe that we shouldn't have that level of confidence in
this case.

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-06 15:17:08
Message-ID:	Zjj0dNaeNNgJjgas@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, May 6, 2024 at 10:44:33AM -0400, Robert Haas wrote:
> I want to go on record right now as disagreeing with the plan proposed
> in the commit message for the revert commit, namely, committing this
> again early in the v18 cycle. I don't think Tom would have proposed
> reverting this feature unless he believed that it had more serious
> problems than could be easily fixed in a short period of time. I think
> that concern is well-founded, given the number of fixes that were
> committed. It seems likely that the patch needs significant rework and
> stabilization before it gets committed again, and I think it shouldn't
> be committed again without explicit agreement from Tom or one of the
> other committers who have significant experience with the query
> planner. That is not to say that I don't approve generally of the idea
> of committing things earlier in the release cycle: I certainly do. It
> gives us more time to shake out problems with patches before we ship.
> But it only makes sense if we collectively believe that the patch is
> mostly correct, and only needs fine-tuning, and I think there are good
> reasons to believe that we shouldn't have that level of confidence in
> this case.

I think what Robert is saying is that it is an unacceptable plan to just
dump the code into PG 18 and clean it up in the following months --- it
needs more research before it is re-added to git.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-06 15:54:45
Message-ID:	3622801.1715010885@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I want to go on record right now as disagreeing with the plan proposed
> in the commit message for the revert commit, namely, committing this
> again early in the v18 cycle. I don't think Tom would have proposed
> reverting this feature unless he believed that it had more serious
> problems than could be easily fixed in a short period of time. I think
> that concern is well-founded, given the number of fixes that were
> committed. It seems likely that the patch needs significant rework and
> stabilization before it gets committed again, and I think it shouldn't
> be committed again without explicit agreement from Tom or one of the
> other committers who have significant experience with the query
> planner.

FWIW I accept some of the blame here, for not having paid any
attention to the SJE work earlier. I had other things on my mind
for most of last year, and not enough bandwidth to help.

The main thing I'd like to understand before we try this again is
why SJE needed so much new query-tree-manipulation infrastructure.
I would have expected it to be very similar to the left-join
elimination we do already, and therefore to mostly just share the
existing infrastructure. (I also harbor suspicions that some of
the new code existed just because someone didn't research what
was already there --- for instance, the now-removed replace_varno
sure looks like ChangeVarNodes should have been used instead.)

Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
result relation). While I don't claim that that destroyed the entire
use case for SJE, it certainly knocked its usefulness down by many
notches, maybe even to the point where it's not worth putting in the
effort needed to get it to re-committability. So I think we need to
look harder at finding a way around that. Is the concern that
RETURNING should return either old or new values depending on which
RTE is mentioned? If so, maybe the feature Dean has proposed to
allow RETURNING to access old values [1] is a prerequisite to moving
forward. Alternatively, perhaps it'd be good enough to forbid SJE
only when the non-target relation is actually mentioned in RETURNING.

regards, tom lane

[1] /message-id/flat/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ(at)mail(dot)gmail(dot)com

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-06 16:01:39
Message-ID:	d94099ba-0a4c-4259-b621-d45923b65c73@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 6/5/2024 21:44, Robert Haas wrote:
> On Sat, May 4, 2024 at 10:46 PM Andrei Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> Having no objective negative feedback, we have no reason to change
>> anything in the design or any part of the code. It looks regrettable and
>> unusual.
>
> To me, this sounds like you think it's someone else's job to tell you
> what is wrong with the patch, or how to fix it, and if they don't,
> then you should get to have the patch as part of PostgreSQL. But that
> is not how we do things, nor should we. I agree that it sucks when you
> need feedback and don't get it, and I've written about that elsewhere
> and recently. But if you don't get feedback and as a result you can't
> get the patch to an acceptable level,
I'm really sorry that the level of my language caused a misunderstanding.
The main purpose of this work is to form a more or less certain view of
the direction of the planner's development.
Right now, it evolves extensively - new structures, variables,
alternative copies of the same node trees with slightly changed
properties ... This way allows us to quickly introduce some planning
features (a lot of changes in planner logic since PG16 is evidence of
that) and with still growing computing resources it allows postgres to
fit RAM and proper planning time. But maybe we want to be more modest?
The Ashutosh's work he has been doing this year shows how sometimes
expensive the planner is. Perhaps we want machinery that will check the
integrity of planning data except the setrefs, which fail to detect that
occasionally?
If an extensive approach is the only viable option, then it's clear that
this and many other features are simply not suitable for Postgres
Planner. It's disheartening that this patch didn't elicit such
high-level feedback.

--
regards,
Andrei Lepikhov

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-06 16:24:41
Message-ID:	CA+TgmobySyaQk5M13PZT4UXswioiXYcYW23JXFNJFMowwwAwrA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 꽁 머니SQL

On Mon, May 6, 2024 at 12:01 PM Andrei Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> Right now, it evolves extensively - new structures, variables,
> alternative copies of the same node trees with slightly changed
> properties ... This way allows us to quickly introduce some planning
> features (a lot of changes in planner logic since PG16 is evidence of
> that) and with still growing computing resources it allows postgres to
> fit RAM and proper planning time. But maybe we want to be more modest?
> The Ashutosh's work he has been doing this year shows how sometimes
> expensive the planner is. Perhaps we want machinery that will check the
> integrity of planning data except the setrefs, which fail to detect that
> occasionally?
> If an extensive approach is the only viable option, then it's clear that
> this and many other features are simply not suitable for Postgres
> Planner. It's disheartening that this patch didn't elicit such
> high-level feedback.

Well, as I said before, I think self-join elimination is a good
feature, and I believe that it belongs in PostgreSQL. However, I don't
believe that this implementation was done as well as it needed to be
done. A great deal of the work involved in a feature like this lies in
figuring out at what stage of processing certain kinds of
transformations ought to be done, and what cleanup is needed
afterward. It is difficult for anyone to get that completely right the
first time around; left join elimination also provoked a series of
after-the-fact bug fixes. However, I think those were fewer in number
and spread over a longer period of time.

Now that being said, I do also agree that the planner code is quite
hard to understand, for various reasons. I don't think the structure
of that code and the assumptions underlying it are as well-documented
as they could be, and neither do I think that all of them are optimal.
It has taken me a long time to learn as much as I know, and there is
still quite a lot that I don't know. And I also agree that the planner
does an unfortunate amount of in-place modification of existing
structures without a lot of clarity about how it all works, and an
unfortunate amount of data copying in some places, and even that the
partition-wise join code isn't all that it could be. But I do not
think that adds up to a conclusion that we should just be less
ambitious with planner changes. Indeed, I would like to see us do
more. There is certainly a lot of useful work that could be done. The
trick is figuring out how to do it without breaking too many things,
and that is not easy.

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-06 17:08:35
Message-ID:	ZjkOk6CO378thxa0@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, May 6, 2024 at 12:24:41PM -0400, Robert Haas wrote:
> Now that being said, I do also agree that the planner code is quite
> hard to understand, for various reasons. I don't think the structure
> of that code and the assumptions underlying it are as well-documented
> as they could be, and neither do I think that all of them are optimal.
> It has taken me a long time to learn as much as I know, and there is
> still quite a lot that I don't know. And I also agree that the planner
> does an unfortunate amount of in-place modification of existing
> structures without a lot of clarity about how it all works, and an
> unfortunate amount of data copying in some places, and even that the
> partition-wise join code isn't all that it could be. But I do not
> think that adds up to a conclusion that we should just be less
> ambitious with planner changes. Indeed, I would like to see us do
> more. There is certainly a lot of useful work that could be done. The
> trick is figuring out how to do it without breaking too many things,
> and that is not easy.

I agree with Robert. While writting the Postgres 17 release notes, I am
excited to see the many optimizer improvements, and removing self-joins
from that list will be unfortunate.

I did write a blog entry in 2021 that suggested we could have
optimizer aggressiveness control to allow for more expensive
optimizations:

https://momjian.us/main/blogs/pgblog/2021.html#May_14_2021

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-06 19:08:40
Message-ID:	CAPpHfdv5nOV1DjeXD4+cTMPdM2soGehCCMR2=F8WW-8h7Nwqpw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, May 6, 2024 at 6:54 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > I want to go on record right now as disagreeing with the plan proposed
> > in the commit message for the revert commit, namely, committing this
> > again early in the v18 cycle. I don't think Tom would have proposed
> > reverting this feature unless he believed that it had more serious
> > problems than could be easily fixed in a short period of time. I think
> > that concern is well-founded, given the number of fixes that were
> > committed. It seems likely that the patch needs significant rework and
> > stabilization before it gets committed again, and I think it shouldn't
> > be committed again without explicit agreement from Tom or one of the
> > other committers who have significant experience with the query
> > planner.
>
> FWIW I accept some of the blame here, for not having paid any
> attention to the SJE work earlier. I had other things on my mind
> for most of last year, and not enough bandwidth to help.
>
> The main thing I'd like to understand before we try this again is
> why SJE needed so much new query-tree-manipulation infrastructure.
> I would have expected it to be very similar to the left-join
> elimination we do already, and therefore to mostly just share the
> existing infrastructure. (I also harbor suspicions that some of
> the new code existed just because someone didn't research what
> was already there --- for instance, the now-removed replace_varno
> sure looks like ChangeVarNodes should have been used instead.)

Thank you for pointing this. This area certainly requires more investigation.

> Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
> result relation). While I don't claim that that destroyed the entire
> use case for SJE, it certainly knocked its usefulness down by many
> notches, maybe even to the point where it's not worth putting in the
> effort needed to get it to re-committability. So I think we need to
> look harder at finding a way around that. Is the concern that
> RETURNING should return either old or new values depending on which
> RTE is mentioned? If so, maybe the feature Dean has proposed to
> allow RETURNING to access old values [1] is a prerequisite to moving
> forward. Alternatively, perhaps it'd be good enough to forbid SJE
> only when the non-target relation is actually mentioned in RETURNING.

Another problem is EPQ. During EPQ, we use most recent tuples for the
target relation and snapshot-satisfying tuples for joined relations.
And that affects RETURNING as well. If we need to return values for
joined relation, that wouldn't be old values, but values of
snapshot-satisfying tuple which might be even older.

Proper support of this looks like quite amount of work for me.
Committing SJE to v18 with this looks challenging. AFICS, going this
way would require substantial help from you.

------
Regards,
Alexander Korotkov
Supabase

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-06 19:27:28
Message-ID:	CAPpHfduadUGtqCGGnTmQTSQMPhjGKokDxiWd6h++LcotH92XHg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 베이SQL

On Mon, May 6, 2024 at 5:44 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I want to go on record right now as disagreeing with the plan proposed
> in the commit message for the revert commit, namely, committing this
> again early in the v18 cycle. I don't think Tom would have proposed
> reverting this feature unless he believed that it had more serious
> problems than could be easily fixed in a short period of time. I think
> that concern is well-founded, given the number of fixes that were
> committed. It seems likely that the patch needs significant rework and
> stabilization before it gets committed again, and I think it shouldn't
> be committed again without explicit agreement from Tom or one of the
> other committers who have significant experience with the query
> planner. That is not to say that I don't approve generally of the idea
> of committing things earlier in the release cycle: I certainly do. It
> gives us more time to shake out problems with patches before we ship.
> But it only makes sense if we collectively believe that the patch is
> mostly correct, and only needs fine-tuning, and I think there are good
> reasons to believe that we shouldn't have that level of confidence in
> this case.

I agree it was a hurry to put the plan into commit message. I think
Tom already gave valuable feedback [1] and probably we will get more.
So, plan is to be decided. One way or the other I'm not going to
re-commit this without explicit Tom's consent.

Links.
1. /message-id/3622801.1715010885%40sss.pgh.pa.us

------
Regards,
Alexander Korotkov
Supabase

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-05-06 19:59:43
Message-ID:	CA+Tgmoaduj5=X3F-f6pG_TQZyUkTA9nW4djtTyOmjndQ_sn=TQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, May 6, 2024 at 3:27 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> I agree it was a hurry to put the plan into commit message. I think
> Tom already gave valuable feedback [1] and probably we will get more.
> So, plan is to be decided. One way or the other I'm not going to
> re-commit this without explicit Tom's consent.

Thanks. I hope we find a way to make it happen.

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2024-06-13 03:45:38
Message-ID:	55f680bc-756d-4dd3-ab27-3c6e663b0e4c@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 5/7/24 02:59, Robert Haas wrote:
> On Mon, May 6, 2024 at 3:27 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>> I agree it was a hurry to put the plan into commit message. I think
>> Tom already gave valuable feedback [1] and probably we will get more.
>> So, plan is to be decided. One way or the other I'm not going to
>> re-commit this without explicit Tom's consent.
>
> Thanks. I hope we find a way to make it happen.
>
Rebased onto current master. Nothing new except rebase-related changes
and some comment fixes.

--
regards,
Andrei Lepikhov
Postgres Professional

Attachment	Content-Type	Size
0001-Remove-useless-self-joins-v49.patch	text/x-patch	117.3 KB

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2024-06-16 16:42:26
Message-ID:	CAPpHfduGrHEz2rsHRYKaQdg=sk+g-B2JF0n=crEPJWaUM__QQg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg배트맨 토토SQL

Hi!

On Thu, Jun 13, 2024 at 6:45 AM Andrei Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> On 5/7/24 02:59, Robert Haas wrote:
> > On Mon, May 6, 2024 at 3:27 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> >> I agree it was a hurry to put the plan into commit message. I think
> >> Tom already gave valuable feedback [1] and probably we will get more.
> >> So, plan is to be decided. One way or the other I'm not going to
> >> re-commit this without explicit Tom's consent.
> >
> > Thanks. I hope we find a way to make it happen.
> >
> Rebased onto current master. Nothing new except rebase-related changes
> and some comment fixes.

Thank you. I've registered the patch on commitfest.

Do you plan to address the feedback from Tom Lane?

------
Regards,
Alexander Korotkov
Supabase

From:	jian he <jian(dot)universality(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-06-17 00:00:00
Message-ID:	CACJufxG3sqJKe1OskHhn7OCdtrEeeRFcD8R4TTQE+LGJEQaL9w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, May 6, 2024 at 11:55 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > I want to go on record right now as disagreeing with the plan proposed
> > in the commit message for the revert commit, namely, committing this
> > again early in the v18 cycle. I don't think Tom would have proposed
> > reverting this feature unless he believed that it had more serious
> > problems than could be easily fixed in a short period of time. I think
> > that concern is well-founded, given the number of fixes that were
> > committed. It seems likely that the patch needs significant rework and
> > stabilization before it gets committed again, and I think it shouldn't
> > be committed again without explicit agreement from Tom or one of the
> > other committers who have significant experience with the query
> > planner.
>
> FWIW I accept some of the blame here, for not having paid any
> attention to the SJE work earlier. I had other things on my mind
> for most of last year, and not enough bandwidth to help.
>
> The main thing I'd like to understand before we try this again is
> why SJE needed so much new query-tree-manipulation infrastructure.
> I would have expected it to be very similar to the left-join
> elimination we do already, and therefore to mostly just share the
> existing infrastructure. (I also harbor suspicions that some of
> the new code existed just because someone didn't research what
> was already there --- for instance, the now-removed replace_varno
> sure looks like ChangeVarNodes should have been used instead.)
>

i have looked around the code.
about replace_varno and ChangeVarNodes:

ChangeVarNodes
have
````
if (IsA(node, RangeTblRef))
{
RangeTblRef *rtr = (RangeTblRef *) node;

if (context->sublevels_up == 0 &&
rtr->rtindex == context->rt_index)
rtr->rtindex = context->new_index;
/* the subquery itself is visited separately */
return false;
}
````
if ChangeVarNodes executed the above code in remove_useless_self_joins and
remove_self_joins_recurse. the joinlist(RangeTblRef) will change from (1,2)
to (2,2). then later, remove_rel_from_joinlist cannot remove the 1,
*nremoved will be zero.
then the below code error branch will be executed.
````
joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
if (nremoved != 1)
elog(ERROR, "failed to find relation %d in joinlist", relid);
```

---------------------------------------------------------------------
replace_varno and replace_varno_walker didn't replace
Query->resultRelation, Query->mergeTargetRelation
as ChangeVarNodes did.

then replace_varno will have problems with DELETE, UPDATE, MERGE
someway.
ChangeVarNodes solved this problem.

if only SELECT, no worth to make it being committed,
do you think support DML but no support RETURNING worth the effort?

excerpt from [1] latest patch:
+/* Returning behavior for Vars in RETURNING list */
+typedef enum VarReturningType
+{
+ VAR_RETURNING_DEFAULT, /* return OLD for DELETE, else return NEW */
+ VAR_RETURNING_OLD, /* return OLD for DELETE/UPDATE, else NULL */
+ VAR_RETURNING_NEW, /* return NEW for INSERT/UPDATE, else NULL */
+} VarReturningType;
+

typedef struct Var
{
Expr xpr;
@@ -265,6 +278,9 @@ typedef struct Var
*/
Index varlevelsup;

+ /* returning type of this var (see above) */
+ VarReturningType varreturningtype;
--------------------------------------------
example. e.g.
explain(costs off)
WITH t1 AS (SELECT * FROM emp1) UPDATE emp1 SET code = t1.code + 1
FROM t1 WHERE t1.id = emp1.id RETURNING emp1.code, t1.code;

the returning (emp1.code,t1.code) these two var the VarReturningType
is VAR_RETURNING_DEFAULT.
That means the patch (support-returning-old-new-v9.patch in [1]) see
the RETURNING (emp1.code, t1.code) are two different table.column
references.
but here we need to transform it to
"RETURNING new.code, old.code", i think.
that would be way more harder.

>Alternatively, perhaps it'd be good enough to forbid SJE
> only when the non-target relation is actually mentioned in RETURNING.
i will try to explore this area. in this case would be
allow SJE apply to " RETURNING t1.code, t1.code".

I've attached 2 patches, based on the latest patch in this thread.
0001 mainly about replacing all replace_varno to ChangeVarNodes.
0002 makes SJE support for DML without RETURNING clause.

now SJE also works with updatable view. for example:
+CREATE TABLE sj_target (tid integer primary key, balance integer)
WITH (autovacuum_enabled=off);
+INSERT INTO sj_target VALUES (1, 10),(2, 20), (3, 30), (4, 40),(5,
50), (6, 60);
+create view rw_sj_target as select * from sj_target where tid >= 2;

+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING sj_target AS s ON
t.tid = s.tid
+ WHEN MATCHED AND t.balance = 20
+ THEN update set balance = t.balance + 2;
+ QUERY PLAN
+-------------------------------------------------
+ Merge on sj_target
+ -> Bitmap Heap Scan on sj_target
+ Recheck Cond: (tid >= 2)
+ -> Bitmap Index Scan on sj_target_pkey
+ Index Cond: (tid >= 2)
+(5 rows)

> [1] /message-id/flat/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ(at)mail(dot)gmail(dot)com

Attachment	Content-Type	Size
v2-0002-make-SJE-to-apply-DML.patch	text/x-patch	22.3 KB
v2-0001-Remove-useless-self-joins.patch	text/x-patch	119.6 KB

From:	jian he <jian(dot)universality(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-02 00:25:00
Message-ID:	CACJufxFNDukHU2WG7T61orXdgFJ5gEkSVWcgsiGbb8tRL5=DTQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jun 17, 2024 at 8:00 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>
> > Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
> > result relation). While I don't claim that that destroyed the entire
> > use case for SJE, it certainly knocked its usefulness down by many
> > notches, maybe even to the point where it's not worth putting in the
> > effort needed to get it to re-committability. So I think we need to
> > look harder at finding a way around that. Is the concern that
> > RETURNING should return either old or new values depending on which
> > RTE is mentioned? If so, maybe the feature Dean has proposed to
> > allow RETURNING to access old values [1] is a prerequisite to moving
> > forward. Alternatively, perhaps it'd be good enough to forbid SJE
> > only when the non-target relation is actually mentioned in RETURNING.
> >
> > regards, tom lane
> >
> > [1] /message-id/flat/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ(at)mail(dot)gmail(dot)com

attached patchset:
v3-0001-Remove-useless-self-joins.patch
v3-0002-make-SJE-to-apply-DML-MERGE-UPDATE-INSERT-DELETE.patch
v3-0003-use-SJE-conditionally-with-delete-update-merge-RE.patch

v3-0001: based on [1] patch, replace all replace_varno to ChangeVarNodes.
v3-0002: make SJE apply to UPDATE/DELETE/MERGE
v3-0003: make SJE apply to UPDATE/DELETE/MERGE RETURNING conditionally
v3-0001, v3-0002 content, reasoning is the same as the previous thread[2].
v3-0003 is new.
to make sure it's correct, I have added a lot of tests,
Some of this may be contrived, maybe some of the tests are redundant.

[1]: https://postgr.es/m/55f680bc-756d-4dd3-ab27-3c6e663b0e4c%40postgrespro.ru
[2]: https://postgr.es/m/CACJufxG3sqJKe1OskHhn7OCdtrEeeRFcD8R4TTQE%2BLGJEQaL9w%40mail.gmail.com

Attachment	Content-Type	Size
v3-0002-make-SJE-to-apply-DML-MERGE-UPDATE-INSERT-DELETE.patch	text/x-patch	24.7 KB
v3-0003-use-SJE-conditionally-with-delete-update-merge-RE.patch	text/x-patch	24.2 KB
v3-0001-Remove-useless-self-joins.patch	text/x-patch	119.5 KB

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-03 03:39:09
Message-ID:	CAPpHfduM6X82ExT0r9UzFLJ12wOYPvRw5vT2Htq0gAPBgHhKeQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jun 17, 2024 at 3:00 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> On Mon, May 6, 2024 at 11:55 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > > I want to go on record right now as disagreeing with the plan proposed
> > > in the commit message for the revert commit, namely, committing this
> > > again early in the v18 cycle. I don't think Tom would have proposed
> > > reverting this feature unless he believed that it had more serious
> > > problems than could be easily fixed in a short period of time. I think
> > > that concern is well-founded, given the number of fixes that were
> > > committed. It seems likely that the patch needs significant rework and
> > > stabilization before it gets committed again, and I think it shouldn't
> > > be committed again without explicit agreement from Tom or one of the
> > > other committers who have significant experience with the query
> > > planner.
> >
> > FWIW I accept some of the blame here, for not having paid any
> > attention to the SJE work earlier. I had other things on my mind
> > for most of last year, and not enough bandwidth to help.
> >
> > The main thing I'd like to understand before we try this again is
> > why SJE needed so much new query-tree-manipulation infrastructure.
> > I would have expected it to be very similar to the left-join
> > elimination we do already, and therefore to mostly just share the
> > existing infrastructure. (I also harbor suspicions that some of
> > the new code existed just because someone didn't research what
> > was already there --- for instance, the now-removed replace_varno
> > sure looks like ChangeVarNodes should have been used instead.)
> >
>
> i have looked around the code.
> about replace_varno and ChangeVarNodes:
>
> ChangeVarNodes
> have
> ````
> if (IsA(node, RangeTblRef))
> {
> RangeTblRef *rtr = (RangeTblRef *) node;
>
> if (context->sublevels_up == 0 &&
> rtr->rtindex == context->rt_index)
> rtr->rtindex = context->new_index;
> /* the subquery itself is visited separately */
> return false;
> }
> ````
> if ChangeVarNodes executed the above code in remove_useless_self_joins and
> remove_self_joins_recurse. the joinlist(RangeTblRef) will change from (1,2)
> to (2,2). then later, remove_rel_from_joinlist cannot remove the 1,
> *nremoved will be zero.
> then the below code error branch will be executed.
> ````
> joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
> if (nremoved != 1)
> elog(ERROR, "failed to find relation %d in joinlist", relid);
> ```

Did you manage to overcome this problem in your patch? If not, why do
regression tests pass while this seems to affect pretty much every
self-join removal? If so, how did you do that?

>
> ---------------------------------------------------------------------
> replace_varno and replace_varno_walker didn't replace
> Query->resultRelation, Query->mergeTargetRelation
> as ChangeVarNodes did.
>
> then replace_varno will have problems with DELETE, UPDATE, MERGE
> someway.
> ChangeVarNodes solved this problem.
>
>
> > Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
> > result relation). While I don't claim that that destroyed the entire
> > use case for SJE, it certainly knocked its usefulness down by many
> > notches, maybe even to the point where it's not worth putting in the
> > effort needed to get it to re-committability. So I think we need to
> > look harder at finding a way around that. Is the concern that
> > RETURNING should return either old or new values depending on which
> > RTE is mentioned? If so, maybe the feature Dean has proposed to
> > allow RETURNING to access old values [1] is a prerequisite to moving
> > forward. Alternatively, perhaps it'd be good enough to forbid SJE
> > only when the non-target relation is actually mentioned in RETURNING.
> >
> > regards, tom lane
> >
> > [1] /message-id/flat/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ(at)mail(dot)gmail(dot)com
>
> if only SELECT, no worth to make it being committed,
> do you think support DML but no support RETURNING worth the effort?

It appears you didn't try to address the EPQ problem, which seems to
me even more serious than the RETURNING problem.

See the following example.

Session 1
# create table test (id int primary key, val int);
# insert into test values (1,1);
# begin;
# update test set val = val + 1 where id = 1;

Session 2
# update test set val = t.val + 1 from test t where test.id = t.id;
(wait)

Session 1
# commit;

With v3 patch the query of session 2 fails on assert even before
starting to wait for the tuple lock. But even if we fix that, I
expect that after SJE this example would result in val = 3. Without
SJE, it would result with val = 2, because during EPQ alias t still
references the row version read according to the snapshot. In order
to overcome that we need to distinguish Var, which points to the
latest version during EPQ, and Var, which points to the snapshot
version during EPQ. Probably I'm wrong, but this change seems to have
quite terrific complexity.

The way to workaround that could be to apply SJE for target relations
only on REPEATABLE READ or SERIALIZABLE. But that would require
introducing dependency of query planning on the isolation level. I'm
not quite happy with this already. Also, this would lead to very
non-obvious user-visible gotcha that data-modification queries have
better performance on higher isolation levels.

BTW, I don't think SJE for just SELECTs doesn't make sense. For me,
it seems reasonable to take a step-by-step approach to first nail down
SJE just for SELECTs (that looks enough challenge for a single
release). And then attack the problems we have with data-modification
queries.

Links.
1. /message-id/flat/2285d5d0-f330-e8b6-9ee5-b2b90e44409b%40postgrespro.ru#4a665faeb7cca32979d7e389fe89c97c

------
Regards,
Alexander Korotkov
Supabase

From:	jian he <jian(dot)universality(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-04 02:14:00
Message-ID:	CACJufxFOuWQecXxWdp1RQnhoznqaDYo7jBRLOHmr7HG6hbugyg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jul 3, 2024 at 11:39 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> On Mon, Jun 17, 2024 at 3:00 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> > On Mon, May 6, 2024 at 11:55 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > >
> > > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > > > I want to go on record right now as disagreeing with the plan proposed
> > > > in the commit message for the revert commit, namely, committing this
> > > > again early in the v18 cycle. I don't think Tom would have proposed
> > > > reverting this feature unless he believed that it had more serious
> > > > problems than could be easily fixed in a short period of time. I think
> > > > that concern is well-founded, given the number of fixes that were
> > > > committed. It seems likely that the patch needs significant rework and
> > > > stabilization before it gets committed again, and I think it shouldn't
> > > > be committed again without explicit agreement from Tom or one of the
> > > > other committers who have significant experience with the query
> > > > planner.
> > >
> > > FWIW I accept some of the blame here, for not having paid any
> > > attention to the SJE work earlier. I had other things on my mind
> > > for most of last year, and not enough bandwidth to help.
> > >
> > > The main thing I'd like to understand before we try this again is
> > > why SJE needed so much new query-tree-manipulation infrastructure.
> > > I would have expected it to be very similar to the left-join
> > > elimination we do already, and therefore to mostly just share the
> > > existing infrastructure. (I also harbor suspicions that some of
> > > the new code existed just because someone didn't research what
> > > was already there --- for instance, the now-removed replace_varno
> > > sure looks like ChangeVarNodes should have been used instead.)
> > >
> >
> > i have looked around the code.
> > about replace_varno and ChangeVarNodes:
> >
> > ChangeVarNodes
> > have
> > ````
> > if (IsA(node, RangeTblRef))
> > {
> > RangeTblRef *rtr = (RangeTblRef *) node;
> >
> > if (context->sublevels_up == 0 &&
> > rtr->rtindex == context->rt_index)
> > rtr->rtindex = context->new_index;
> > /* the subquery itself is visited separately */
> > return false;
> > }
> > ````
> > if ChangeVarNodes executed the above code in remove_useless_self_joins and
> > remove_self_joins_recurse. the joinlist(RangeTblRef) will change from (1,2)
> > to (2,2). then later, remove_rel_from_joinlist cannot remove the 1,
> > *nremoved will be zero.
> > then the below code error branch will be executed.
> > ````
> > joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
> > if (nremoved != 1)
> > elog(ERROR, "failed to find relation %d in joinlist", relid);
> > ```
>
> Did you manage to overcome this problem in your patch? If not, why do
> regression tests pass while this seems to affect pretty much every
> self-join removal? If so, how did you do that?
>

in remove_self_join_rel, i have
```ChangeVarNodes((Node *) root->parse, toRemove->relid, toKeep->relid, 0);```
which will change the joinlist(RangeTblRef) from (1,2) to (2,2).
Immediately after this call, I wrote a function (restore_rangetblref)
to restore the joinlist as original (1,2).
then remove_rel_from_joinlist won't error out.
see remove_self_join_rel, restore_rangetblref.

Andrei Lepikhov:
+ /* Replace varno in all the query structures */
+ replace_varno((Node *) root->parse, toRemove->relid, toKeep->relid);
So Andrei Lepikhov's change didn't touch joinlist,
Query->resultRelation, Query->mergeTargetRelation.

Then in v3-0002 I tried to make SJE work with UPDATE, i thought it worked well,
because ChangeVarNodes also takes care of Query->resultRelation,
Query->mergeTargetRelation.
then later your EPQ demenonsate shows that's not enough.

so, in summary, in v3-0001, by changing all replace_varno to ChangeVarNodes
paves ways to make SJE apply to UPDATE/DELETE/MERGE.
It's just that we need to reverse some effects of ChangeVarNodes.
(restore_rangetblref)

> > > Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
> > > result relation). While I don't claim that that destroyed the entire
> > > use case for SJE, it certainly knocked its usefulness down by many
> > > notches, maybe even to the point where it's not worth putting in the
> > > effort needed to get it to re-committability. So I think we need to
> > > look harder at finding a way around that. Is the concern that
> > > RETURNING should return either old or new values depending on which
> > > RTE is mentioned? If so, maybe the feature Dean has proposed to
> > > allow RETURNING to access old values [1] is a prerequisite to moving
> > > forward. Alternatively, perhaps it'd be good enough to forbid SJE
> > > only when the non-target relation is actually mentioned in RETURNING.
> > >
>
> It appears you didn't try to address the EPQ problem, which seems to
> me even more serious than the RETURNING problem.
>
> See the following example.
>
> Session 1
> # create table test (id int primary key, val int);
> # insert into test values (1,1);
> # begin;
> # update test set val = val + 1 where id = 1;
>
> Session 2
> # update test set val = t.val + 1 from test t where test.id = t.id;
> (wait)
>
> Session 1
> # commit;
>
current mechanism, in this example context,
SJE can translate ```update test set val = t.val + 1 from test t where
test.id = t.id;``` as good as to
```update test set val = val + 1```.
if we replace it that way, then this example would result val = 3.

but without SJE,
```update test set val = t.val + 1 from test t where test.id = t.id;```
will result val = 2.

you mentioned the EPQ problem, previously i don't know what that means.
now i see, I feel like it is quite challenging to resolve it.

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-04 03:04:19
Message-ID:	CAPpHfdtpLv98ydqRhWAa9QLmh6m_cqmBy5YZT0Scax9FGZrh6Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jul 4, 2024 at 5:15 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> in remove_self_join_rel, i have
> ```ChangeVarNodes((Node *) root->parse, toRemove->relid, toKeep->relid, 0);```
> which will change the joinlist(RangeTblRef) from (1,2) to (2,2).
> Immediately after this call, I wrote a function (restore_rangetblref)
> to restore the joinlist as original (1,2).
> then remove_rel_from_joinlist won't error out.
> see remove_self_join_rel, restore_rangetblref.

Thank you, now this is clear. Could we add additional parameters to
ChangeVarNodes() instead of adding a new function which reverts part
of changes.

> current mechanism, in this example context,
> SJE can translate ```update test set val = t.val + 1 from test t where
> test.id = t.id;``` as good as to
> ```update test set val = val + 1```.
> if we replace it that way, then this example would result val = 3.
>
> but without SJE,
> ```update test set val = t.val + 1 from test t where test.id = t.id;```
> will result val = 2.
>
> you mentioned the EPQ problem, previously i don't know what that means.

Yes, I guessed so. I should have come with more detailed explanation.

> now i see, I feel like it is quite challenging to resolve it.

Yep. Glad to see we are on the same page.

This is why I think we could leave SJE for target relation of
modification queries for future. I'd like to not devalue SELECT-only
SJE, given that this is a step forward anyway.

------
Regards,
Alexander Korotkov
Supabase

From:	jian he <jian(dot)universality(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-04 08:40:00
Message-ID:	CACJufxEhyvaGQwGnVczCA52F64YcUrJBNDJ2Td7QeR1b5wUrpQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jul 4, 2024 at 11:04 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> On Thu, Jul 4, 2024 at 5:15 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> > in remove_self_join_rel, i have
> > ```ChangeVarNodes((Node *) root->parse, toRemove->relid, toKeep->relid, 0);```
> > which will change the joinlist(RangeTblRef) from (1,2) to (2,2).
> > Immediately after this call, I wrote a function (restore_rangetblref)
> > to restore the joinlist as original (1,2).
> > then remove_rel_from_joinlist won't error out.
> > see remove_self_join_rel, restore_rangetblref.
>
> Thank you, now this is clear. Could we add additional parameters to
> ChangeVarNodes() instead of adding a new function which reverts part
> of changes.
>

I didn't dare to. we have 42 occurrences of ChangeVarNodes.
adding a parameter to it only for one location seems not intuitive.

Now I have tried.
changing to
`ChangeVarNodes(Node *node, int rt_index, int new_index, int
sublevels_up, bool change_RangeTblRef)`

/* Replace varno in all the query structures */
ChangeVarNodes((Node *) root->parse, toRemove->relid, toKeep->relid, 0, false);
```

it seems to work, pass the regression test.
```ChangeVarNodes((Node *) root->parse, toRemove->relid,
toKeep->relid, 0, false);```
is in remove_self_join_rel, remove_self_joins_one_group,
remove_self_joins_recurse.
all other places are ```ChangeVarNodes((Node *) root->parse,
toRemove->relid, toKeep->relid, 0, true);```
so ChangeVarNodes add a parameter will only influence the SJE feature.

I also tried, save to a temp list, but it did not work.
original_fromlist = list_copy_deep(root->parse->jointree->fromlist);
/* Replace varno in all the query structures */
ChangeVarNodes((Node *) root->parse, toRemove->relid, toKeep->relid, 0);
root->parse->jointree->fromlist = list_copy(original_fromlist);

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-08 09:10:57
Message-ID:	CAPpHfdvBddujLDhf7TQP-djeKoG5oyFBEoLSGRsjHfGrcNFkDg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jul 4, 2024 at 11:40 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> On Thu, Jul 4, 2024 at 11:04 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> >
> > On Thu, Jul 4, 2024 at 5:15 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> > > in remove_self_join_rel, i have
> > > ```ChangeVarNodes((Node *) root->parse, toRemove->relid, toKeep->relid, 0);```
> > > which will change the joinlist(RangeTblRef) from (1,2) to (2,2).
> > > Immediately after this call, I wrote a function (restore_rangetblref)
> > > to restore the joinlist as original (1,2).
> > > then remove_rel_from_joinlist won't error out.
> > > see remove_self_join_rel, restore_rangetblref.
> >
> > Thank you, now this is clear. Could we add additional parameters to
> > ChangeVarNodes() instead of adding a new function which reverts part
> > of changes.
> >
>
> I didn't dare to. we have 42 occurrences of ChangeVarNodes.
> adding a parameter to it only for one location seems not intuitive.
>
> Now I have tried.
> changing to
> `ChangeVarNodes(Node *node, int rt_index, int new_index, int
> sublevels_up, bool change_RangeTblRef)`
>
> /* Replace varno in all the query structures */
> ChangeVarNodes((Node *) root->parse, toRemove->relid, toKeep->relid, 0, false);
> ```
>
> it seems to work, pass the regression test.
> ```ChangeVarNodes((Node *) root->parse, toRemove->relid,
> toKeep->relid, 0, false);```
> is in remove_self_join_rel, remove_self_joins_one_group,
> remove_self_joins_recurse.
> all other places are ```ChangeVarNodes((Node *) root->parse,
> toRemove->relid, toKeep->relid, 0, true);```
> so ChangeVarNodes add a parameter will only influence the SJE feature.

Good. But I think it's not necessary to to replace function signature
in all the 42 occurrences. This will make our patch unnecessarily
conflict with others. Instead we can have two functions
ChangeVarNodes(original function signature) and
ChangeVarNodesExtended(extended function signature). Then existing
occurrences can still use ChangeVarNodes(), which will be just
shortcut for ChangeVarNodesExtended().

------
Regards,
Alexander Korotkov
Supabase

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	jian he <jian(dot)universality(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-09 06:06:10
Message-ID:	96250a42-20e3-40f0-9d45-f53ae852f8ed@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 7/2/24 07:25, jian he wrote:
> to make sure it's correct, I have added a lot of tests,
> Some of this may be contrived, maybe some of the tests are redundant.
Thanks for your job!
I passed through the patches and have some notes:
1. Patch 0001 has not been applied anymore since the previous week's
changes in the core. Also, there is one place with trailing whitespace.

Looking into the 0002 and 0003 patches, I think they 1) should be merged
and 2) It makes sense to use the already existing pull_varnos_of_level
routine instead of a new walker. See the patches in the attachment as a
sketch.
Also, I'm not sure about the tests. It looks like we have a lot of new
tests.

However, the main issue mentioned above is the correctness of relid
replacement in planner structures.
We have the machinery to check and replace relids in a Query. But
PlannerInfo is a bin for different stuff that the optimisation needs to
convert the parse tree to the corresponding cloud of paths.
A good demo of the problem is the introduction of the JoinDomain structure:
It contains a relids field and has no tests for that. We haven't known
for a long time about the issue of SJE not replacing the relid in this
structure.
The approach with 'Relation Alias' mentioned by Alexander raises many
hard questions about accessing simple_rel_array directly or, much worse,
about checking the scope of some clause where we didn't touch
RelOptInfo, just compare two relids fields.
The safest decision would be to restart query planning over parse tree
with removed self-joins, but now planner code isn't ready for that yet.
But maybe we should put this problem on the shoulders of a developer and
made something like with nodes: perl script which will generate walker
switch over PlannerInfo structure?

--
regards, Andrei Lepikhov

Attachment	Content-Type	Size
v4-0001-Remove-useless-self-joins.patch	text/x-patch	121.0 KB
v4-0002-Apply-SJE-to-DML-queries-with-RETURNING-clause.patch	text/x-patch	42.3 KB

From:	jian he <jian(dot)universality(at)gmail(dot)com>
To:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-11 07:43:00
Message-ID:	CACJufxF-ubw739W+33_rFSA+YTdRTDcKwVidKwXAS8NCy4nx+Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jul 9, 2024 at 2:06 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>
> On 7/2/24 07:25, jian he wrote:
> > to make sure it's correct, I have added a lot of tests,
> > Some of this may be contrived, maybe some of the tests are redundant.
> Thanks for your job!
> I passed through the patches and have some notes:
> 1. Patch 0001 has not been applied anymore since the previous week's
> changes in the core. Also, there is one place with trailing whitespace.

thanks.
because the previous thread mentioned the EPQ problem.
in remove_useless_self_joins, i make it can only process CMD_SELECT query.

also thanks to Alexander Korotkov's tip.
I added a bool change_RangeTblRef to ChangeVarNodes_context.
default is true, so won't influence ChangeVarNodes.
After that create a function ChangeVarNodesExtended.
so now replace_varno replaced by ChangeVarNodes.

now in ChangeVarNodes_walker we've add:
```
if (IsA(node, RestrictInfo))
{
RestrictInfo *rinfo = (RestrictInfo *) node;
int relid = -1;
bool is_req_equal =
(rinfo->required_relids == rinfo->clause_relids);
bool clause_relids_is_multiple =
(bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
...
}
```
but this part, we don't have much comments, adding some comments would be good.
but I am not sure how.

static bool
match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
Index relid)
but actually we call it via:
if (!match_unique_clauses(root, inner, uclauses, outer->relid))

I am not sure whether the second argument is "inner" or "outer".
Maybe it will cause confusion.
same with innerrel_is_unique_ext.

/*
* At this stage, joininfo lists of inner and outer can contain
* only clauses, required for a superior outer join that can't
* influence this optimization. So, we can avoid to call the
* build_joinrel_restrictlist() routine.
*/
restrictlist = generate_join_implied_equalities(root, joinrelids,
inner->relids,
outer, NULL);
build_joinrel_restrictlist require joinrel, innerrel, outrel, but here
we only have innerrel, outterrel.
so i am confused with the comments.

i add following code snippets after generate_join_implied_equalities
```
if (restrictlist == NIL)
continue
```

I have some confusion with the comments.
/*
* Determine if the inner table can duplicate outer rows. We must
* bypass the unique rel cache here since we're possibly using a
* subset of join quals. We can use 'force_cache' == true when all
* join quals are self-join quals. Otherwise, we could end up
* putting false negatives in the cache.
*/
if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
outer, JOIN_INNER, selfjoinquals,
list_length(otherjoinquals) == 0,
&uclauses))
continue;

"unique rel cache", not sure the meaning, obviously, "relcache" has a
different meaning.
so i am slightly confused with
"We must bypass the unique rel cache here since we're possibly using a
subset of join quals"

i have refactored comments below
```
if (!match_unique_clauses(root, inner, uclauses, outer->relid))
```.
please check v5-0002 for comments refactor.

Attachment	Content-Type	Size
v5-0002-refactor-comments-in-remove_self_joins_one_group.patch	text/x-patch	1.6 KB
v5-0001-Remove-useless-self-joins.patch	text/x-patch	122.2 KB

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	jian he <jian(dot)universality(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-12 03:05:15
Message-ID:	5b49501c-9cb3-4c5d-9d56-49704ff08143@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 7/11/24 14:43, jian he wrote:
> On Tue, Jul 9, 2024 at 2:06 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>>
>> On 7/2/24 07:25, jian he wrote:
>>> to make sure it's correct, I have added a lot of tests,
>>> Some of this may be contrived, maybe some of the tests are redundant.
>> Thanks for your job!
>> I passed through the patches and have some notes:
>> 1. Patch 0001 has not been applied anymore since the previous week's
>> changes in the core. Also, there is one place with trailing whitespace.
>
> thanks.
> because the previous thread mentioned the EPQ problem.
> in remove_useless_self_joins, i make it can only process CMD_SELECT query.
I would like to oppose here: IMO, it is just a mishap which we made
because of a long history of patch transformations. There we lost the
case where RowMark exists for only one of candidate relations.
Also, after review I think we don't need so many new tests. Specifically
for DML we already have one:

EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;

And we should just add something to elaborate it a bit.
See the patch in attachment containing my proposal to improve v4-0001
main SJE patch. I think it resolved the issue with EPQ assertion as well
as problems with returning value.

--
regards, Andrei Lepikhov

Attachment	Content-Type	Size
v4-0002-Apply-SJE-to-DML-queries-2.patch	text/x-patch	6.9 KB

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc:	jian he <jian(dot)universality(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-12 10:30:35
Message-ID:	CAPpHfdvTHuhB5+Eohb-z7zWM6u9Y=W6vb8MMOqhfKeRkhVZGxg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi, Andrei!

On Fri, Jul 12, 2024 at 6:05 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>
> On 7/11/24 14:43, jian he wrote:
> > On Tue, Jul 9, 2024 at 2:06 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> >>
> >> On 7/2/24 07:25, jian he wrote:
> >>> to make sure it's correct, I have added a lot of tests,
> >>> Some of this may be contrived, maybe some of the tests are redundant.
> >> Thanks for your job!
> >> I passed through the patches and have some notes:
> >> 1. Patch 0001 has not been applied anymore since the previous week's
> >> changes in the core. Also, there is one place with trailing whitespace.
> >
> > thanks.
> > because the previous thread mentioned the EPQ problem.
> > in remove_useless_self_joins, i make it can only process CMD_SELECT query.
> I would like to oppose here: IMO, it is just a mishap which we made
> because of a long history of patch transformations. There we lost the
> case where RowMark exists for only one of candidate relations.
> Also, after review I think we don't need so many new tests. Specifically
> for DML we already have one:
>
> EXPLAIN (COSTS OFF)
> UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
>
> And we should just add something to elaborate it a bit.
> See the patch in attachment containing my proposal to improve v4-0001
> main SJE patch. I think it resolved the issue with EPQ assertion as well
> as problems with returning value.

I tried this. I applied 0001 from [1] and 0002 from [2]. Then I
tried the concurrent test case [3]. It still fails with assert for
me. But assert and related stuff is the least problem. The big
problem, as described in [3], is semantical change in query. When EPQ
is applied, we fetch the latest tuple of the target relation
regardless snapshot. But for the self-joined relation we should still
use the snapshot-satisfying tuple. I don't see even attempt to
address this in your patch. And as I pointed before, this appears
quite complex.

Links.
1. /message-id/96250a42-20e3-40f0-9d45-f53ae852f8ed%40gmail.com
2. /message-id/5b49501c-9cb3-4c5d-9d56-49704ff08143%40gmail.com
3. /message-id/CAPpHfduM6X82ExT0r9UzFLJ12wOYPvRw5vT2Htq0gAPBgHhKeQ%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc:	jian he <jian(dot)universality(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-12 13:59:13
Message-ID:	CAPpHfdus-O3Ozpn+tFL8r5KPiedaBYSrLK_3BsvJSC7sVBi+_g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg사설 토토 사이트SQL

On Fri, Jul 12, 2024 at 1:30 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> On Fri, Jul 12, 2024 at 6:05 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> > On 7/11/24 14:43, jian he wrote:
> > > On Tue, Jul 9, 2024 at 2:06 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> > >>
> > >> On 7/2/24 07:25, jian he wrote:
> > >>> to make sure it's correct, I have added a lot of tests,
> > >>> Some of this may be contrived, maybe some of the tests are redundant.
> > >> Thanks for your job!
> > >> I passed through the patches and have some notes:
> > >> 1. Patch 0001 has not been applied anymore since the previous week's
> > >> changes in the core. Also, there is one place with trailing whitespace.
> > >
> > > thanks.
> > > because the previous thread mentioned the EPQ problem.
> > > in remove_useless_self_joins, i make it can only process CMD_SELECT query.
> > I would like to oppose here: IMO, it is just a mishap which we made
> > because of a long history of patch transformations. There we lost the
> > case where RowMark exists for only one of candidate relations.
> > Also, after review I think we don't need so many new tests. Specifically
> > for DML we already have one:
> >
> > EXPLAIN (COSTS OFF)
> > UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
> >
> > And we should just add something to elaborate it a bit.
> > See the patch in attachment containing my proposal to improve v4-0001
> > main SJE patch. I think it resolved the issue with EPQ assertion as well
> > as problems with returning value.
>
> I tried this. I applied 0001 from [1] and 0002 from [2]. Then I
> tried the concurrent test case [3]. It still fails with assert for
> me. But assert and related stuff is the least problem. The big
> problem, as described in [3], is semantical change in query. When EPQ
> is applied, we fetch the latest tuple of the target relation
> regardless snapshot. But for the self-joined relation we should still
> use the snapshot-satisfying tuple. I don't see even attempt to
> address this in your patch. And as I pointed before, this appears
> quite complex.

Oh, sorry, I used wrong binaries during the check. My test case works
correctly, because SJE doesn't apply to the target relation.

# explain update test set val = t.val + 1 from test t where test.id = t.id;
QUERY PLAN
-----------------------------------------------------------------------------
Update on test (cost=60.85..105.04 rows=0 width=0)
-> Hash Join (cost=60.85..105.04 rows=2260 width=16)
Hash Cond: (test.id = t.id)
-> Seq Scan on test (cost=0.00..32.60 rows=2260 width=10)
-> Hash (cost=32.60..32.60 rows=2260 width=14)
-> Seq Scan on test t (cost=0.00..32.60 rows=2260 width=14)
(6 rows)

Previously, patch rejected applying SJE for result relation, which as
I see now is wrong. Andrei's patch rejects SJE for target relation on
the base of row marks, which seems correct to me as the first glance.
So, this doesn't change anything regarding my conclusions regarding
applying SJE for target relation. But the Andrei's patch yet looks
good indeed.

------
Regards,
Alexander Korotkov
Supabase

From:	jian he <jian(dot)universality(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Andrei Lepikhov <lepihov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-15 05:31:33
Message-ID:	CACJufxHBLhOD1LerM643dgh=UZFGhPWfP1027D2x1W6DhF_BaQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

hi.
Here is the latest patch (v6),
I've made the following changes.

* disallow original Query->resultRelation participate in SJE.
for SELECT, nothing is changed. for UPDATE/DELETE/MERGE
we can do:
EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = sq.b + sz.a FROM (select s1.* from sj s1 join sj
s2 on s1.a = s2.a) as sz
WHERE sz.a = sq.a;

here, only "(select s1.* from sj s1 join sj s2 on s1.a = s2.a)" can
apply to SJE.

but for now we cannot apply SJE to
EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = sq.b + sz.a FROM sj as sz WHERE sz.a = sq.a;

so the EPQ abnormality issue[1] won't happen.

* add a new function: ChangeVarNodesExtended for
address concerns in [2]

* cosmetic refactor remove_self_join_rel, match_unique_clauses,
split_selfjoin_quals functions.
changing some "foreach" to "foreach_node"

* refactor comments above (remove_self_joins_one_group ->> match_unique_clauses)
I am not 100% sure they are correct.

[1] https://postgr.es/m/flat/CAPpHfduM6X82ExT0r9UzFLJ12wOYPvRw5vT2Htq0gAPBgHhKeQ%40mail.gmail.com
[2] https://postgr.es/m/flat/CAPpHfdvBddujLDhf7TQP-djeKoG5oyFBEoLSGRsjHfGrcNFkDg%40mail.gmail.com

Attachment	Content-Type	Size
v6-0001-Remove-useless-self-joins.patch	application/x-patch	122.1 KB

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	jian he <jian(dot)universality(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-15 06:08:43
Message-ID:	6da9bdb2-3ff2-4b8b-818a-8052772f2163@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg메이저 토토 사이트SQL

On 7/15/24 12:31, jian he wrote:
> hi.
> Here is the latest patch (v6),
> I've made the following changes.
>
> * disallow original Query->resultRelation participate in SJE.
> for SELECT, nothing is changed. for UPDATE/DELETE/MERGE
> we can do:
> EXPLAIN (COSTS OFF)
> UPDATE sj sq SET b = sq.b + sz.a FROM (select s1.* from sj s1 join sj
> s2 on s1.a = s2.a) as sz
> WHERE sz.a = sq.a;
>
> here, only "(select s1.* from sj s1 join sj s2 on s1.a = s2.a)" can
> apply to SJE.
>
> but for now we cannot apply SJE to
> EXPLAIN (COSTS OFF)
> UPDATE sj sq SET b = sq.b + sz.a FROM sj as sz WHERE sz.a = sq.a;
>
> so the EPQ abnormality issue[1] won't happen.
>
>
> * add a new function: ChangeVarNodesExtended for
> address concerns in [2]
I see you still stay with the code line:
if (omark && imark && omark->markType != imark->markType)

It is definitely an error. What if omark is NULL, but imark is not? Why
not to skip this pair of relids? Or, at least, insert an assertion to
check that you filtered it earlier.

--
regards, Andrei Lepikhov

From:	jian he <jian(dot)universality(at)gmail(dot)com>
To:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Re: Removing unneeded self joins
Date:	2024-07-15 07:35:06
Message-ID:	CACJufxFWiiiSzyB3B2MeMv3wxqTShHyw4mnwwAwGKC0K6VZM+A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jul 15, 2024 at 2:08 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>
> On 7/15/24 12:31, jian he wrote:
> > hi.
> > Here is the latest patch (v6),
> > I've made the following changes.
> >
> > * disallow original Query->resultRelation participate in SJE.
> > for SELECT, nothing is changed. for UPDATE/DELETE/MERGE
> > we can do:
> > EXPLAIN (COSTS OFF)
> > UPDATE sj sq SET b = sq.b + sz.a FROM (select s1.* from sj s1 join sj
> > s2 on s1.a = s2.a) as sz
> > WHERE sz.a = sq.a;
> >
> > here, only "(select s1.* from sj s1 join sj s2 on s1.a = s2.a)" can
> > apply to SJE.
> >
> > but for now we cannot apply SJE to
> > EXPLAIN (COSTS OFF)
> > UPDATE sj sq SET b = sq.b + sz.a FROM sj as sz WHERE sz.a = sq.a;
> >
> > so the EPQ abnormality issue[1] won't happen.
> >
> >
> > * add a new function: ChangeVarNodesExtended for
> > address concerns in [2]
> I see you still stay with the code line:
> if (omark && imark && omark->markType != imark->markType)
>
> It is definitely an error. What if omark is NULL, but imark is not? Why
> not to skip this pair of relids? Or, at least, insert an assertion to
> check that you filtered it earlier.
>

i think "omark is NULL, but imark is not" case won't reach to
remove_self_joins_one_group.
In that case, omark associated RangeTblEntry->rtekind will be RTE_SUBQUERY,
and will be skipped earlier in remove_self_joins_recurse.

Still, do you think the following code is the right way to go?

if ((omark == NULL && imark != NULL) ||
(omark != NULL && imark == NULL) ||
(omark && imark && omark->markType != imark->markType))
continue;

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-15 08:31:08
Message-ID:	3d9457cc-a440-48a5-bf44-8eb3f65316f7@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 7/15/24 14:35, jian he wrote:
> On Mon, Jul 15, 2024 at 2:08 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>>
>> On 7/15/24 12:31, jian he wrote:
>>> hi.
>>> Here is the latest patch (v6),
>>> I've made the following changes.
>>>
>>> * disallow original Query->resultRelation participate in SJE.
>>> for SELECT, nothing is changed. for UPDATE/DELETE/MERGE
>>> we can do:
>>> EXPLAIN (COSTS OFF)
>>> UPDATE sj sq SET b = sq.b + sz.a FROM (select s1.* from sj s1 join sj
>>> s2 on s1.a = s2.a) as sz
>>> WHERE sz.a = sq.a;
>>>
>>> here, only "(select s1.* from sj s1 join sj s2 on s1.a = s2.a)" can
>>> apply to SJE.
>>>
>>> but for now we cannot apply SJE to
>>> EXPLAIN (COSTS OFF)
>>> UPDATE sj sq SET b = sq.b + sz.a FROM sj as sz WHERE sz.a = sq.a;
>>>
>>> so the EPQ abnormality issue[1] won't happen.
>>>
>>>
>>> * add a new function: ChangeVarNodesExtended for
>>> address concerns in [2]
>> I see you still stay with the code line:
>> if (omark && imark && omark->markType != imark->markType)
>>
>> It is definitely an error. What if omark is NULL, but imark is not? Why
>> not to skip this pair of relids? Or, at least, insert an assertion to
>> check that you filtered it earlier.
>>
>
> i think "omark is NULL, but imark is not" case won't reach to
> remove_self_joins_one_group.
> In that case, omark associated RangeTblEntry->rtekind will be RTE_SUBQUERY,
> and will be skipped earlier in remove_self_joins_recurse.
>
>
> Still, do you think the following code is the right way to go?
>
> if ((omark == NULL && imark != NULL) ||
> (omark != NULL && imark == NULL) ||
> (omark && imark && omark->markType != imark->markType))
> continue;
Sure, if query block needs RowMark it applies proper RowMark to each
base relation. All pull-up transformations executes before this code.
But it is worth to set Assert at the point to check that nothing changed
in the code above and the patch works correctly, am I wrong?

--
regards, Andrei Lepikhov

From:	Вардан Погосян <vardan(dot)pogosyn(at)yandex(dot)ru>
To:	Andrei Lepikhov <lepihov(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-16 14:26:40
Message-ID:	2581461721118083@mail.yandex.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg배트맨 토토SQL

<div><div><div>Hi!</div><div> </div><div>I did the SJE testing at Andrey's request.</div><div>To do this, I used the automatic testing tool EET (Equivalent Expression Transformation) [1] with some modifications. </div><div>EET transforms the logical conditions in a query, creating multiple queries waiting for the same number of rows.</div><div> </div><div>In order to make sure that the SJE logic is executed at all, I tried to cover the code with ereports() as much as possible.</div><div>During the testing process, I did not find any inconsistencies in the number of rows returned, as well as other critical problems.</div></div><div> </div><div> </div><div><div>[1] <a href="https://github.com/JZuming/EET" rel="noopener noreferrer" target="_blank">https://github.com/JZuming/EET</a></div><div> </div></div></div><div>-- </div><div>Regards,</div><div>Vardan Pogosyan</div><div> </div>

Attachment	Content-Type	Size
unknown_filename	text/html	902 bytes

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Вардан Погосян <vardan(dot)pogosyn(at)yandex(dot)ru>
Cc:	Andrei Lepikhov <lepihov(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-16 14:30:47
Message-ID:	CAPpHfdvs5FAmAdEryRU4MFx8EjJBNu-zBRoQpzuucM2qKYMv7g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg사설 토토SQL

Hi, Vardan!

Great, thank you!

On Tue, Jul 16, 2024 at 5:26 PM Вардан Погосян <vardan(dot)pogosyn(at)yandex(dot)ru> wrote:
> I did the SJE testing at Andrey's request.
> To do this, I used the automatic testing tool EET (Equivalent Expression Transformation) [1] with some modifications.
> EET transforms the logical conditions in a query, creating multiple queries waiting for the same number of rows.

What revision of patch did you use?

> In order to make sure that the SJE logic is executed at all, I tried to cover the code with ereports() as much as possible.

Could you share this? Probably some of these ereports() we would like to keep.

> During the testing process, I did not find any inconsistencies in the number of rows returned, as well as other critical problems.

Did you use assert-enabled build? I guess you mean no server crashes,
right? Also, could you share some statistics on how long did you run,
what number of queries did you execute etc.?

------
Regards,
Alexander Korotkov
Supabase

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-17 00:45:07
Message-ID:	CAPpHfduRB9oW3YRDpvPGcDSAbH4iu0LSRhxpRRAzBh7euC9amQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi, Tom!

I'd like to give you and update on the progress with SJE.

Jian He gave a try to ChangeVarNodes() [1]. That gives some
improvement, but the vast majority of complexity is still here. I
think the reason for complexity of SJE is that it's the first time we
remove relation, which is actually *used* and therefore might has
references in awful a lot of places. In previous cases we removed
relations, which were actually unused.

There are actually alternative designs for this feature. I've
proposed "alias relids" before [2]. But it's not clear we will
resolve more problems than create, given that it could break awfully a
lot of assumptions during query planning. Andrei also proposed that
perl script could generate us a walker over planner structures [3].
Although this method might offer a structured approach, it seems like
overengineering for the problem at hand.

I believe it's worth giving the current approach another chance.
Vardan Pogosyan has conducted some tests, and I am in the process of
clarifying the details. We could enhance the approach by adding more
comments to ensure that any changes in the planner data structure are
flagged for potential revisions in the SJE code. What do you think?

As Andrei pointed it's possible to apply SJE to result relation [4],
but where it's not a target relation. I guess the target relation
case is what you're most interested. In this case we hit problem of
joining relation having different row marks. In turn that triggers
EPQ problem [5] and probably more. In order to resolve that we need a
way to store multiple (at least two, but sure if more is needed)
tuples for relation. I still feel that we should postpone that,
because even basic SJE without target relation support is challenging.

There is probably a way to implement target relation support for PG18
after committing basic SJE. But that would require a lot of your
design work and guidance. I don't dare to design this kind of things.

Links.
1. /message-id/CACJufxHBLhOD1LerM643dgh%3DUZFGhPWfP1027D2x1W6DhF_BaQ%40mail.gmail.com
2. /message-id/CAPpHfdv6B8HCLdj8WidBryRrX0%2BX3F1rrR8uAuMQmp6rvPdscg%40mail.gmail.com
3. /message-id/96250a42-20e3-40f0-9d45-f53ae852f8ed%40gmail.com
4. /message-id/5b49501c-9cb3-4c5d-9d56-49704ff08143%40gmail.com
5. /message-id/flat/CAPpHfduM6X82ExT0r9UzFLJ12wOYPvRw5vT2Htq0gAPBgHhKeQ%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase

From:	Pogosyan Vardan <vardan(dot)pogosyn(at)yandex(dot)ru>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Andrei Lepikhov <lepihov(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-18 06:26:49
Message-ID:	65d99485-f20e-4e24-b2d6-20b18bba58c3@yandex.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 핫SQL :

On 16.07.2024 21:30, Alexander Korotkov wrote:
> Hi, Vardan!
>
> Great, thank you!
>
> On Tue, Jul 16, 2024 at 5:26 PM Вардан Погосян<vardan(dot)pogosyn(at)yandex(dot)ru> wrote:
>> I did the SJE testing at Andrey's request.
>> To do this, I used the automatic testing tool EET (Equivalent Expression Transformation) [1] with some modifications.
>> EET transforms the logical conditions in a query, creating multiple queries waiting for the same number of rows.
> What revision of patch did you use?

I used Andrey's v4 patches.

>> In order to make sure that the SJE logic is executed at all, I tried to cover the code with ereports() as much as possible.
> Could you share this? Probably some of these ereports() we would like to keep.

I'm not sure if it can be saved. Many reports are some messy and no
information that is really useful for the user.

>> During the testing process, I did not find any inconsistencies in the number of rows returned, as well as other critical problems.
> Did you use assert-enabled build? I guess you mean no server crashes,
> right? Also, could you share some statistics on how long did you run,
> what number of queries did you execute etc.?
>
> ------
> Regards,
> Alexander Korotkov
> Supabase

Of course, i used assertion-enabled build and server didn't fail.

EET ran 10 iterations with 100 random generated queries + 100
transformed queries => 2000 queries in total.

I can't calculate exactly how many clauses EET generates after
transformations, but according to rough estimates for all test time,
about 8-9% clauses detected as self join during the entire testing period.

The last query saved by EET contains 4 clauses, and after
transformations, the new query contains ~132 clauses. SJE was not used
in the original query, but it was used 5 times after the conversion.

From:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-19 08:30:07
Message-ID:	CAEZATCUeJ93gWCEzK1eRxyZakvv5PVG9LfF3PkSGMsNBC7h9kA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 17 Jul 2024 at 01:45, Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> Jian He gave a try to ChangeVarNodes() [1]. That gives some
> improvement, but the vast majority of complexity is still here. I
> think the reason for complexity of SJE is that it's the first time we
> remove relation, which is actually *used* and therefore might has
> references in awful a lot of places. In previous cases we removed
> relations, which were actually unused.
>

I had a quick look at this, and I have a couple of comments on the
rewriter changes.

The new function replace_relid() looks to be the same as adjust_relid_set().

The changes to ChangeVarNodes() look a little messy. There's a lot of
code duplicated between ChangeVarNodesExtended() and ChangeVarNodes(),
which could be avoided by having one call the other. Also, it would be
better for ChangeVarNodesExtended() to have a "flags" parameter
instead of an extra boolean parameter, to make it more extensible in
the future. However,...

I question whether ChangeVarNodesExtended() and the changes to
ChangeVarNodes() are really the right way to go about this.
ChangeVarNodes() in particular gains a lot more logic to handle
RestrictInfo nodes that doesn't really feel like it belongs there --
e.g., building NullTest nodes is really specific to SJE, and doesn't
seem like it's something ChangeVarNodes() should be doing.

A better solution might be to add a new walker function to
analyzejoins.c that does just what SJE needs, which is different from
ChangeVarNodes() in a number of ways. For Var nodes, it might
ultimately be necessary to do more than just change the varno, to
solve the RETURNING/EPQ problems. For RestrictInfo nodes, there's a
lot of SJE-specific logic. The SJE code wants to ignore RangeTblRef
nodes, but it could delegate to ChangeVarNodes() for various other
node types to avoid code duplication. At the top level, the stuff that
ChangeVarNodes() does to fields of the Query struct would be different
for SJE, I think.

Regards,
Dean

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-20 11:38:48
Message-ID:	CAPpHfdsj1WdqoCg+g4oq1XQ_=KOj0U4pXqhauBc7e+0msHu66A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg젠 토토SQL :

On Fri, Jul 19, 2024 at 11:30 AM Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
> On Wed, 17 Jul 2024 at 01:45, Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> >
> > Jian He gave a try to ChangeVarNodes() [1]. That gives some
> > improvement, but the vast majority of complexity is still here. I
> > think the reason for complexity of SJE is that it's the first time we
> > remove relation, which is actually *used* and therefore might has
> > references in awful a lot of places. In previous cases we removed
> > relations, which were actually unused.
>
> I had a quick look at this, and I have a couple of comments on the
> rewriter changes.
>
> The new function replace_relid() looks to be the same as adjust_relid_set().

They are similar, not the same. replace_relid() has handling for
negative newId, while adjust_relid_set() hasn't. One thing I'd like
to borrow from adjust_relid_set() to replace_relid() is the usage of
IS_SPECIAL_VARNO() macro.

It would be probably nice to move this logic into bms_replace_member()
residing at bitmapset.c. What do you think?

> The changes to ChangeVarNodes() look a little messy. There's a lot of
> code duplicated between ChangeVarNodesExtended() and ChangeVarNodes(),
> which could be avoided by having one call the other. Also, it would be
> better for ChangeVarNodesExtended() to have a "flags" parameter
> instead of an extra boolean parameter, to make it more extensible in
> the future. However,...

I certainly didn't mean to have duplicate functions
ChangeVarNodesExtended() and ChangeVarNodes(). I mean
ChangeVarNodes() should just call ChangeVarNodesExtended().

> I question whether ChangeVarNodesExtended() and the changes to
> ChangeVarNodes() are really the right way to go about this.
> ChangeVarNodes() in particular gains a lot more logic to handle
> RestrictInfo nodes that doesn't really feel like it belongs there --
> e.g., building NullTest nodes is really specific to SJE, and doesn't
> seem like it's something ChangeVarNodes() should be doing.
>
> A better solution might be to add a new walker function to
> analyzejoins.c that does just what SJE needs, which is different from
> ChangeVarNodes() in a number of ways. For Var nodes, it might
> ultimately be necessary to do more than just change the varno, to
> solve the RETURNING/EPQ problems. For RestrictInfo nodes, there's a
> lot of SJE-specific logic. The SJE code wants to ignore RangeTblRef
> nodes, but it could delegate to ChangeVarNodes() for various other
> node types to avoid code duplication. At the top level, the stuff that
> ChangeVarNodes() does to fields of the Query struct would be different
> for SJE, I think.

We initially didn't use ChangeVarNodes() in SJE at all. See the last
patch version without it [1]. We're trying to address Tom Lane's
proposal to re-use more of existing tree-manipulation infrastructure
[2]. I agree with you that the case with ChangeVarNodes() looks
questionable. Do you have other ideas how we can re-use some more of
existing tree-manipulation infrastructure in SJE?

Links
1. /message-id/55f680bc-756d-4dd3-ab27-3c6e663b0e4c%40postgrespro.ru
2. /message-id/3622801.1715010885%40sss.pgh.pa.us

------
Regards,
Alexander Korotkov
Supabase

Attachment	Content-Type	Size
v7-0001-Remove-useless-self-joins.patch	application/octet-stream	122.7 KB

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-22 02:13:49
Message-ID:	66a0093d-1ccb-4ab8-9675-6a5e69aa9783@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 20/7/2024 18:38, Alexander Korotkov wrote:
> On Fri, Jul 19, 2024 at 11:30 AM Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>> On Wed, 17 Jul 2024 at 01:45, Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> We initially didn't use ChangeVarNodes() in SJE at all. See the last
> patch version without it [1]. We're trying to address Tom Lane's
> proposal to re-use more of existing tree-manipulation infrastructure
> [2]. I agree with you that the case with ChangeVarNodes() looks
> questionable. Do you have other ideas how we can re-use some more of
> existing tree-manipulation infrastructure in SJE?
As I can see, ChangeVarNodes is dedicated to working with the query tree
before the planning phase. SJE works right in the middle of the planning
process. So, it may be more practical to keep it separate as a walker,
as Dean has proposed. If the optimisation stuff changes, the walker code
will be changed, too.

--
regards, Andrei Lepikhov

From:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-07-24 11:07:47
Message-ID:	CAEZATCUwwvoaRuK+UuFF+S49KzKA3uMibjj7qUCOw_iSdtrZnA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, 20 Jul 2024 at 12:39, Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> > The new function replace_relid() looks to be the same as adjust_relid_set().
>
> They are similar, not the same. replace_relid() has handling for
> negative newId, while adjust_relid_set() hasn't. One thing I'd like
> to borrow from adjust_relid_set() to replace_relid() is the usage of
> IS_SPECIAL_VARNO() macro.

Ah, that makes sense. In that case, I'd say that replace_relid()
should go in analyzejoins.c (and be a local function there), since
that's the only place that requires this special negative newId
handling.

> It would be probably nice to move this logic into bms_replace_member()
> residing at bitmapset.c. What do you think?

Maybe. It feels a little specialised though, so maybe it's not worth the effort.

I have been reviewing more of the patch, mainly focusing on the logic
in analyzejoins.c that decides when to apply SJE.

I understand broadly what the code is doing, but I still find it
somewhat hard to follow. One thing that makes it hard is that in
analyzejoins.c, "inner" and "outer" get swapped round at various
points. For example generate_join_implied_equalities() is defined like
this:

List *
generate_join_implied_equalities(PlannerInfo *root,
Relids join_relids,
Relids outer_relids,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo);

but remove_self_joins_one_group() calls it like this:

restrictlist = generate_join_implied_equalities(root, joinrelids,
inner->relids,
outer, NULL);

So you have to remember that "inner" is "outer" and "outer" is "inner"
when going into generate_join_implied_equalities() from
remove_self_joins_one_group(). And the same thing happens when calling
innerrel_is_unique_ext() and match_unique_clauses(). I think all that
could be resolved by swapping "inner" and "outer" in the variable
names and comments in remove_self_joins_one_group().

Another thing I noticed in remove_self_joins_one_group() was this:

/*
* To enable SJE for the only degenerate case without any self
* join clauses at all, add baserestrictinfo to this list. The
* degenerate case works only if both sides have the same clause.
* So doesn't matter which side to add.
*/
selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);

That appears to be pointless, because is_innerrel_unique_for() will
filter the restrictlist it is given, removing those baserestrictinfo
clauses (because I think they'll always have can_join = false). And
then relation_has_unique_index_ext() will re-add them:

/*
* Examine the rel's restriction clauses for usable var = const clauses
* that we can add to the restrictlist.
*/
foreach(ic, rel->baserestrictinfo)
{
... add suitable clauses
}

where "rel" is "innerrel" from is_innerrel_unique_for(), which is
"outer" from remove_self_joins_one_group(), so it's the same set of
baserestrictinfo clauses.

Something else that looks a little messy is this in innerrel_is_unique_ext():

/*
* innerrel_is_unique_ext
* Do the same as innerrel_is_unique(), but also set to '*extra_clauses'
* additional clauses from a baserestrictinfo list that were used to prove
* uniqueness. A non NULL 'extra_clauses' indicates that we're checking
* for self-join and correspondingly dealing with filtered clauses.
*/
bool
innerrel_is_unique_ext(PlannerInfo *root,
...
List **extra_clauses)
{
bool self_join = (extra_clauses != NULL);

[logic depending on self_join]
}

This presumes that any caller interested in knowing the extra
baserestrictinfo clauses used to prove uniqueness must be looking at a
self join. That may be true today, but it doesn't seem like a good API
design choice. I think it would be better to just add "self_join" as
an extra parameter, and also maybe have the function return the
UniqueRelInfo containing the "extra_clauses", or NULL if it's not
unique. That way, it would be more extensible, if we wanted it to
return more information in the future.

Instead of adding relation_has_unique_index_ext(), maybe it would be
OK to just change the signature of relation_has_unique_index_for(). It
looks like it's only called from one other place outside
analyzejoins.c. Perhaps the same is true for innerrel_is_unique_ext().

Should match_unique_clauses() be comparing mergeopfamilies or opnos to
ensure that the clauses are using the same equality operator?

Regards,
Dean

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-10-08 03:54:02
Message-ID:	1b69a5a4-674e-4bd2-b711-e15ed593ec7d@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 7/24/24 18:07, Dean Rasheed wrote:
> On Sat, 20 Jul 2024 at 12:39, Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>>
>>> The new function replace_relid() looks to be the same as adjust_relid_set().
>>
>> They are similar, not the same. replace_relid() has handling for
>> negative newId, while adjust_relid_set() hasn't. One thing I'd like
>> to borrow from adjust_relid_set() to replace_relid() is the usage of
>> IS_SPECIAL_VARNO() macro.
>
> Ah, that makes sense. In that case, I'd say that replace_relid()
> should go in analyzejoins.c (and be a local function there), since
> that's the only place that requires this special negative newId
> handling.
Done. See attachment.

> So you have to remember that "inner" is "outer" and "outer" is "inner"
> when going into generate_join_implied_equalities() from
> remove_self_joins_one_group(). And the same thing happens when calling
> innerrel_is_unique_ext() and match_unique_clauses(). I think all that
> could be resolved by swapping "inner" and "outer" in the variable
> names and comments in remove_self_joins_one_group().
As you can see, when you dive into the implementation of the
generate_join_implied_equalities, an inner join works symmetrically. We
don't need any other clauses here, except mentioning both inner and
outer to prove self-join. So, it doesn't matter which side to choose as
an inner - all the effect can be: instead of x1=x2, we will have clause
x2=x1.

> selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
>
> That appears to be pointless, because is_innerrel_unique_for() will
> filter the restrictlist it is given, removing those baserestrictinfo
> clauses (because I think they'll always have can_join = false). And
> then relation_has_unique_index_ext() will re-add them:
Yeah, we used to have it to work out the case, then restrictlist == NIL.
I agree, it may be a bit strange and too expensive. Let's add a flag
check inside the innerrel_is_unique_ext instead.

> This presumes that any caller interested in knowing the extra
> baserestrictinfo clauses used to prove uniqueness must be looking at a
> self join. That may be true today, but it doesn't seem like a good API
> design choice. I think it would be better to just add "self_join" as
> an extra parameter, and also maybe have the function return the
> UniqueRelInfo containing the "extra_clauses", or NULL if it's not
> unique. That way, it would be more extensible, if we wanted it to
> return more information in the future.
If more users are interested in this function, we can just rename the
flag self_join to something more universal - extra_prove, I'm not sure.
>
> Instead of adding relation_has_unique_index_ext(), maybe it would be
> OK to just change the signature of relation_has_unique_index_for(). It
> looks like it's only called from one other place outside
> analyzejoins.c. Perhaps the same is true for innerrel_is_unique_ext().
I vote against it remembering of extensions and broken APIs.
>
> Should match_unique_clauses() be comparing mergeopfamilies or opnos to
> ensure that the clauses are using the same equality operator?

Having the same variable side, same constant, unique index? I can
imagine only a self-made algebra, which can break something here, but it
looks like over-engineering, right?

Attached version is rebased onto the new master. Because of a3179ab it
may be necessary to do more detailed review.

--
regards, Andrei Lepikhov

Attachment	Content-Type	Size
v8-0001-Remove-useless-self-joins.patch	text/x-patch	122.5 KB

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-10-28 09:32:50
Message-ID:	fb18bcae-10e7-42ce-997a-6575c6f9793a@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 10/8/24 10:54, Andrei Lepikhov wrote:
> On 7/24/24 18:07, Dean Rasheed wrote:
>> On Sat, 20 Jul 2024 at 12:39, Alexander Korotkov
>> <aekorotkov(at)gmail(dot)com> wrote:
> Attached version is rebased onto the new master. Because of a3179ab it
> may be necessary to do more detailed review.
Just a rebase on the current master.

--
regards, Andrei Lepikhov

Attachment	Content-Type	Size
v9-0001-Remove-useless-self-joins.patch	text/x-patch	122.4 KB

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-10-28 21:18:27
Message-ID:	CAPpHfds=AapoZR23bU4YzT2Of_hOCr8ecPdOX6MOCK=TrAKMkg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Oct 8, 2024 at 6:54 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> On 7/24/24 18:07, Dean Rasheed wrote:
> > So you have to remember that "inner" is "outer" and "outer" is "inner"
> > when going into generate_join_implied_equalities() from
> > remove_self_joins_one_group(). And the same thing happens when calling
> > innerrel_is_unique_ext() and match_unique_clauses(). I think all that
> > could be resolved by swapping "inner" and "outer" in the variable
> > names and comments in remove_self_joins_one_group().
> As you can see, when you dive into the implementation of the
> generate_join_implied_equalities, an inner join works symmetrically. We
> don't need any other clauses here, except mentioning both inner and
> outer to prove self-join. So, it doesn't matter which side to choose as
> an inner - all the effect can be: instead of x1=x2, we will have clause
> x2=x1.

I think switching places of inner and outer still makes sense in terms
of readability. I've done so for v10. Also, added a comment
regarding symmetry of arguments for generate_join_implied_equalities()
call.

> > This presumes that any caller interested in knowing the extra
> > baserestrictinfo clauses used to prove uniqueness must be looking at a
> > self join. That may be true today, but it doesn't seem like a good API
> > design choice. I think it would be better to just add "self_join" as
> > an extra parameter, and also maybe have the function return the
> > UniqueRelInfo containing the "extra_clauses", or NULL if it's not
> > unique. That way, it would be more extensible, if we wanted it to
> > return more information in the future.
> If more users are interested in this function, we can just rename the
> flag self_join to something more universal - extra_prove, I'm not sure.
> >
> > Instead of adding relation_has_unique_index_ext(), maybe it would be
> > OK to just change the signature of relation_has_unique_index_for(). It
> > looks like it's only called from one other place outside
> > analyzejoins.c. Perhaps the same is true for innerrel_is_unique_ext().
> I vote against it remembering of extensions and broken APIs.

I spend some time using github search to find usage of
innerrel_is_unique() and relation_has_unique_index() in extensions.
Didn't find extensions actually use them. Some forks are using them
in the way PostgreSQL doesn't. But for forks it's way easier to adopt
such trivial API changes. If even there are some extensions using
this functions, it wouldn't be a terrible case of the broken API. The
current way seems to be rather bloating of our API. This is why I
decided to remove innerrel_is_unique_ext() and
relation_has_unique_index_ext(), and just add arguments to
innerrel_is_unique() and relation_has_unique_index_for().

I also did some rephrasing and grammar fixes for comments.

------
Regards,
Alexander Korotkov
Supabase

Attachment	Content-Type	Size
v10-0001-Remove-useless-self-joins.patch	application/octet-stream	123.9 KB

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-12-09 06:03:10
Message-ID:	CAPpHfds16a7bjQHv-jCLMNzwdukJ_i3da2e0OJFB9L5rTCWEEg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jul 20, 2024 at 2:38 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> We initially didn't use ChangeVarNodes() in SJE at all. See the last
> patch version without it [1]. We're trying to address Tom Lane's
> proposal to re-use more of existing tree-manipulation infrastructure
> [2]. I agree with you that the case with ChangeVarNodes() looks
> questionable. Do you have other ideas how we can re-use some more of
> existing tree-manipulation infrastructure in SJE?

I think there was one idea to be considered. Given that this feature
has fragility to changes in PlannerInfo and underlying structures,
Andrei proposed to generate a walker over PlannerInfo with a Perl
script. I spent some time analyzing this idea. I came to the
following conclusions.

It seems feasible to generate a function that would walk over all
PlannerInfo fields recursively. Perhaps some more meta-information is
needed, at least, to check whether we should visit a particular
pointer field. But that could be provided by additional
pg_node_attr(). With this approach, we would need to sacrifice some
efficiency (e.g., processing all the EquivalenceClasses instead of
picking only the ones used in the relation to be removed). Also, the
logic in remove_self_join_rel() is more complex than processing all
the Relids. That is, we would still need a massive number of
if-branches specifying how we handle each node type. It doesn't look
like we would end up with a more simple or less error-prone
implementation.

We may decide to generate not just a walker but most of the logic in
remove_self_join_rel(). This is probably possible by injecting way
more meta-information into definitions of structures. That isn't
going to be simpler than the current approach. But it is probably
less error-prone: one could realize that if you add a new field to the
structure, it should have a similar pg_node_attr() as surroundings.
But I am afraid that if we go this way, we may end up with an awkward
heap of pg_node_attr() to generate the functionality of
remove_self_join_rel(). Should we better add comments to PlannerInfo
and other relevant structures saying: if you're going to modify this,
consider how that affects remove_self_join_rel()?

Any thoughts?

Links
1. /message-id/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru

------
Regards,
Alexander Korotkov
Supabase

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Removing unneeded self joins
Date:	2024-12-11 09:02:32
Message-ID:	3c83675a-fdc5-4908-8584-fdc874f5fced@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 9/12/2024 13:03, Alexander Korotkov wrote:
> remove_self_join_rel(). Should we better add comments to PlannerInfo
> and other relevant structures saying: if you're going to modify this,
> consider how that affects remove_self_join_rel()?
>
> Any thoughts?
As I see, it is quite typical to keep two parts of the code in sync by
mentioning them in comments (see reparameterisation stuff, for example).
This would reduce the code needed to implement the feature.

--
regards, Andrei Lepikhov

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2024-12-23 08:25:32
Message-ID:	ac7d7334-7ebb-4330-a894-12ba12ffa4ae@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg범퍼카 토토SQL

On 12/9/24 13:03, Alexander Korotkov wrote:
> On Sat, Jul 20, 2024 at 2:38 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> We may decide to generate not just a walker but most of the logic in
> remove_self_join_rel(). This is probably possible by injecting way
> more meta-information into definitions of structures. That isn't
> going to be simpler than the current approach. But it is probably
> less error-prone: one could realize that if you add a new field to the
> structure, it should have a similar pg_node_attr() as surroundings.
> But I am afraid that if we go this way, we may end up with an awkward
> heap of pg_node_attr() to generate the functionality of
> remove_self_join_rel(). Should we better add comments to PlannerInfo
> and other relevant structures saying: if you're going to modify this,
> consider how that affects remove_self_join_rel()?
>
> Any thoughts?

Observing positive cases caused by the SJE feature, I resumed the work
and, following Alexander's suggestion, added developer comments on
checking remove_self_join_rel in case of the PlannerInfo changes.

I see now that it helps apply after pull-up transformations and,
sometimes, because of clauses derived from EquivalenceClasses (see the
patch comment for examples). So, it is not only about dumb ORM-generated
queries.

Also, multiple code changes since the v7 version caused corresponding
changes in the SJE code on rebase. So, the new version provided in the
attachment needs a fresh review.

--
regards, Andrei Lepikhov

Attachment	Content-Type	Size
v8-0001-Remove-useless-self-joins.patch	text/x-patch	128.2 KB

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-01-10 03:13:52
Message-ID:	CAPpHfdsuc524-BkXNZ+iz9dH=-KiU8qx9CSyE-Nip4Wjz5pGgg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Dec 23, 2024 at 10:25 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> On 12/9/24 13:03, Alexander Korotkov wrote:
> > On Sat, Jul 20, 2024 at 2:38 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> > We may decide to generate not just a walker but most of the logic in
> > remove_self_join_rel(). This is probably possible by injecting way
> > more meta-information into definitions of structures. That isn't
> > going to be simpler than the current approach. But it is probably
> > less error-prone: one could realize that if you add a new field to the
> > structure, it should have a similar pg_node_attr() as surroundings.
> > But I am afraid that if we go this way, we may end up with an awkward
> > heap of pg_node_attr() to generate the functionality of
> > remove_self_join_rel(). Should we better add comments to PlannerInfo
> > and other relevant structures saying: if you're going to modify this,
> > consider how that affects remove_self_join_rel()?
> >
> > Any thoughts?
>
> Observing positive cases caused by the SJE feature, I resumed the work
> and, following Alexander's suggestion, added developer comments on
> checking remove_self_join_rel in case of the PlannerInfo changes.
>
> I see now that it helps apply after pull-up transformations and,
> sometimes, because of clauses derived from EquivalenceClasses (see the
> patch comment for examples). So, it is not only about dumb ORM-generated
> queries.
>
> Also, multiple code changes since the v7 version caused corresponding
> changes in the SJE code on rebase. So, the new version provided in the
> attachment needs a fresh review.

I've got an off-list report from Alexander Lakhin. The failing query
is added to the regression tests in the revised patch. The query was
failing with an error "negative bitmapset member not allowed" issued
in adjust_relid_set(). In order to fix that I've to teach
adjust_relid_set() about negative newrelid, it became even more
similar to replace_relid(). Notable this error happens in
remove_leftjoinrel_from_query(). This seems to be consequence that we
made self-join removal and left-joins removal use common
infrastructure.

------
Regards,
Alexander Korotkov
Supabase

Attachment	Content-Type	Size
v9-0001-Remove-useless-self-joins.patch	application/octet-stream	128.9 KB

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-01-31 04:57:46
Message-ID:	ac34e44e-f8e1-4601-8bb8-fa5e65578832@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 1/10/25 10:13, Alexander Korotkov wrote:
> I've got an off-list report from Alexander Lakhin. The failing query
> is added to the regression tests in the revised patch. The query was
> failing with an error "negative bitmapset member not allowed" issued
> in adjust_relid_set(). In order to fix that I've to teach
> adjust_relid_set() about negative newrelid, it became even more
> similar to replace_relid(). Notable this error happens in
> remove_leftjoinrel_from_query(). This seems to be consequence that we
> made self-join removal and left-joins removal use common
> infrastructure.
I have analysed your change. It pertains to the situation where PHV
references both the outer join and its inner join in phrels but actually
utilises only the outer side of the join.

I agree with your proposed fix and have just rewritten the query to make
it simpler and more consistent with the regression tests (see new
version of the patch in attachment).

Additionally, I noticed that the remove_useless_joins code uses the term
"join removal" in the join.sql tests. I wonder if it might be more
appropriate to revert the name from "Self Join Elimination" back to
"Self Join Removal."

--
regards, Andrei Lepikhov

Attachment	Content-Type	Size
v10-0001-Remove-useless-self-joins.patch	text/x-patch	129.9 KB

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-02-09 11:41:10
Message-ID:	CAPpHfdvHzExyROoxvVQz6JR4tGVOjgNyHnu8xJmGcx-bMxWvqw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Andrei!

On Fri, Jan 31, 2025 at 6:57 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> On 1/10/25 10:13, Alexander Korotkov wrote:
> > I've got an off-list report from Alexander Lakhin. The failing query
> > is added to the regression tests in the revised patch. The query was
> > failing with an error "negative bitmapset member not allowed" issued
> > in adjust_relid_set(). In order to fix that I've to teach
> > adjust_relid_set() about negative newrelid, it became even more
> > similar to replace_relid(). Notable this error happens in
> > remove_leftjoinrel_from_query(). This seems to be consequence that we
> > made self-join removal and left-joins removal use common
> > infrastructure.
> I have analysed your change. It pertains to the situation where PHV
> references both the outer join and its inner join in phrels but actually
> utilises only the outer side of the join.
>
> I agree with your proposed fix and have just rewritten the query to make
> it simpler and more consistent with the regression tests (see new
> version of the patch in attachment).
>
> Additionally, I noticed that the remove_useless_joins code uses the term
> "join removal" in the join.sql tests. I wonder if it might be more
> appropriate to revert the name from "Self Join Elimination" back to
> "Self Join Removal."

Great, thank you.

Regarding adjust_relid_set() and replace_relid(). I think they are
now strictly equivalent, except for the case then old relid is given
and not found. In this case adjust_relid_set() returns the original
relids while replace_relid() returns a copy. The behavior of
adjust_relid_set() appears more desirable as we don't need extra
copying when no modification is done. So, I've replaced all
replace_relid() with adjust_relid_set().

Also, I did some grammar correction to your new comment in tests.

------
Regards,
Alexander Korotkov
Supabase

Attachment	Content-Type	Size
v11-0001-Remove-useless-self-joins.patch	application/octet-stream	130.1 KB

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-02-10 05:19:49
Message-ID:	5934ce1e-be02-4640-904c-f7fec3962784@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 9/2/2025 18:41, Alexander Korotkov wrote:
> Regarding adjust_relid_set() and replace_relid(). I think they are
> now strictly equivalent, except for the case then old relid is given
> and not found. In this case adjust_relid_set() returns the original
> relids while replace_relid() returns a copy. The behavior of
> adjust_relid_set() appears more desirable as we don't need extra
> copying when no modification is done. So, I've replaced all
> replace_relid() with adjust_relid_set().
Ok, I glanced into it, and it makes sense to merge these routines.
I think the comment to adjust_relid_set() should be arranged, too. See
the attachment for a short variant of such modification.
>
> Also, I did some grammar correction to your new comment in tests.
Thanks!

--
regards, Andrei Lepikhov

Attachment	Content-Type	Size
comment.diff	text/plain	833 bytes

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-02-10 19:58:54
Message-ID:	CAPpHfdt9Z=iriS1Fj0Vyo41Z4cS+zUSZM8D3qXyw5LfHr_jVjA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi!

On Mon, Feb 10, 2025 at 7:19 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> On 9/2/2025 18:41, Alexander Korotkov wrote:
> > Regarding adjust_relid_set() and replace_relid(). I think they are
> > now strictly equivalent, except for the case then old relid is given
> > and not found. In this case adjust_relid_set() returns the original
> > relids while replace_relid() returns a copy. The behavior of
> > adjust_relid_set() appears more desirable as we don't need extra
> > copying when no modification is done. So, I've replaced all
> > replace_relid() with adjust_relid_set().
> Ok, I glanced into it, and it makes sense to merge these routines.
> I think the comment to adjust_relid_set() should be arranged, too. See
> the attachment for a short variant of such modification.
> >
> > Also, I did some grammar correction to your new comment in tests.
> Thanks!

I've further revised adjust_relid_set() header comment.

Looking back to the work done since previous attempt to commit this to
pg17, I can highlight following.
1) We're now using more of existing infrastructure including
adjust_relid_set() and ChangeVarNodes(). The most of complexity is
still there though.
2) We've checked few ways to further simplify this patch. But yet the
current way still feels to be best possible.
3) For sure, several bugs were fixed.

I think we could give it another chance for pg18 after some further
polishing (at least commit message still needs to be revised). Any
thoughts on this? Tom?

------
Regards,
Alexander Korotkov
Supabase

Attachment	Content-Type	Size
v12-0001-Remove-useless-self-joins.patch	application/octet-stream	130.7 KB

From:	Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-02-11 15:31:33
Message-ID:	d6319b0c-b186-472f-8b64-41dfa41e7b5f@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 베이SQL

Hi! Thank you for the work with this subject, I think it is really
important.

On 10.02.2025 22:58, Alexander Korotkov wrote:
> Hi!
>
> On Mon, Feb 10, 2025 at 7:19 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>> On 9/2/2025 18:41, Alexander Korotkov wrote:
>>> Regarding adjust_relid_set() and replace_relid(). I think they are
>>> now strictly equivalent, except for the case then old relid is given
>>> and not found. In this case adjust_relid_set() returns the original
>>> relids while replace_relid() returns a copy. The behavior of
>>> adjust_relid_set() appears more desirable as we don't need extra
>>> copying when no modification is done. So, I've replaced all
>>> replace_relid() with adjust_relid_set().
>> Ok, I glanced into it, and it makes sense to merge these routines.
>> I think the comment to adjust_relid_set() should be arranged, too. See
>> the attachment for a short variant of such modification.
>>> Also, I did some grammar correction to your new comment in tests.
>> Thanks!
> I've further revised adjust_relid_set() header comment.
>
> Looking back to the work done since previous attempt to commit this to
> pg17, I can highlight following.
> 1) We're now using more of existing infrastructure including
> adjust_relid_set() and ChangeVarNodes(). The most of complexity is
> still there though.
> 2) We've checked few ways to further simplify this patch. But yet the
> current way still feels to be best possible.
> 3) For sure, several bugs were fixed.
>
> I think we could give it another chance for pg18 after some further
> polishing (at least commit message still needs to be revised). Any
> thoughts on this? Tom?
>
I didn't find any mistakes, I just have a refactoring remark. I think
the part where we add non-redundant expressions with the
binfo_candidates, jinfo_candidates
check can be moved to a separate function, otherwise the code is very
repetitive in this place. I did it and attached diff file

--
Regards,
Alena Rybakina
Postgres Professional

Attachment	Content-Type	Size
self-join-removal.diff	text/x-patch	3.9 KB

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-02-12 23:00:41
Message-ID:	CAPpHfdsFcde5VWCxSup7x_12ig6uynrFaOms62-M_c8DM-FTBQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi!

On Tue, Feb 11, 2025 at 5:31 PM Alena Rybakina
<a(dot)rybakina(at)postgrespro(dot)ru> wrote:
> Hi! Thank you for the work with this subject, I think it is really
> important.
>
> On 10.02.2025 22:58, Alexander Korotkov wrote:
> > Hi!
> >
> > On Mon, Feb 10, 2025 at 7:19 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> >> On 9/2/2025 18:41, Alexander Korotkov wrote:
> >>> Regarding adjust_relid_set() and replace_relid(). I think they are
> >>> now strictly equivalent, except for the case then old relid is given
> >>> and not found. In this case adjust_relid_set() returns the original
> >>> relids while replace_relid() returns a copy. The behavior of
> >>> adjust_relid_set() appears more desirable as we don't need extra
> >>> copying when no modification is done. So, I've replaced all
> >>> replace_relid() with adjust_relid_set().
> >> Ok, I glanced into it, and it makes sense to merge these routines.
> >> I think the comment to adjust_relid_set() should be arranged, too. See
> >> the attachment for a short variant of such modification.
> >>> Also, I did some grammar correction to your new comment in tests.
> >> Thanks!
> > I've further revised adjust_relid_set() header comment.
> >
> > Looking back to the work done since previous attempt to commit this to
> > pg17, I can highlight following.
> > 1) We're now using more of existing infrastructure including
> > adjust_relid_set() and ChangeVarNodes(). The most of complexity is
> > still there though.
> > 2) We've checked few ways to further simplify this patch. But yet the
> > current way still feels to be best possible.
> > 3) For sure, several bugs were fixed.
> >
> > I think we could give it another chance for pg18 after some further
> > polishing (at least commit message still needs to be revised). Any
> > thoughts on this? Tom?
> >
> I didn't find any mistakes, I just have a refactoring remark. I think
> the part where we add non-redundant expressions with the
> binfo_candidates, jinfo_candidates
> check can be moved to a separate function, otherwise the code is very
> repetitive in this place. I did it and attached diff file

Thank you. I've integrated that into a patch. However, I've to
change keep_rinfo_list to be passed by pointer to
add_non_redundant_clauses(), because it might be changed in
distribute_restrictinfo_to_rels(). Without that there is a case of
duplicated clause in regression tests.

I've changed 'inner' and 'outer' vise versa in
remove_self_joins_one_group() for better readability (I believe that
was discussed upthread but lost). Also, I did a round of improvement
for comments and commit message.

------
Regards,
Alexander Korotkov
Supabase

Attachment	Content-Type	Size
v13-0001-Implement-Self-Join-Elimination.patch	application/octet-stream	131.3 KB

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-02-15 09:19:42
Message-ID:	CAPpHfdseDr02BT+kFfW1ORWd0xnf9B=OUNmYkDxF2x-bE32UOw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Feb 13, 2025 at 1:00 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> On Tue, Feb 11, 2025 at 5:31 PM Alena Rybakina
> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
> > Hi! Thank you for the work with this subject, I think it is really
> > important.
> >
> > On 10.02.2025 22:58, Alexander Korotkov wrote:
> > > Hi!
> > >
> > > On Mon, Feb 10, 2025 at 7:19 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> > >> On 9/2/2025 18:41, Alexander Korotkov wrote:
> > >>> Regarding adjust_relid_set() and replace_relid(). I think they are
> > >>> now strictly equivalent, except for the case then old relid is given
> > >>> and not found. In this case adjust_relid_set() returns the original
> > >>> relids while replace_relid() returns a copy. The behavior of
> > >>> adjust_relid_set() appears more desirable as we don't need extra
> > >>> copying when no modification is done. So, I've replaced all
> > >>> replace_relid() with adjust_relid_set().
> > >> Ok, I glanced into it, and it makes sense to merge these routines.
> > >> I think the comment to adjust_relid_set() should be arranged, too. See
> > >> the attachment for a short variant of such modification.
> > >>> Also, I did some grammar correction to your new comment in tests.
> > >> Thanks!
> > > I've further revised adjust_relid_set() header comment.
> > >
> > > Looking back to the work done since previous attempt to commit this to
> > > pg17, I can highlight following.
> > > 1) We're now using more of existing infrastructure including
> > > adjust_relid_set() and ChangeVarNodes(). The most of complexity is
> > > still there though.
> > > 2) We've checked few ways to further simplify this patch. But yet the
> > > current way still feels to be best possible.
> > > 3) For sure, several bugs were fixed.
> > >
> > > I think we could give it another chance for pg18 after some further
> > > polishing (at least commit message still needs to be revised). Any
> > > thoughts on this? Tom?
> > >
> > I didn't find any mistakes, I just have a refactoring remark. I think
> > the part where we add non-redundant expressions with the
> > binfo_candidates, jinfo_candidates
> > check can be moved to a separate function, otherwise the code is very
> > repetitive in this place. I did it and attached diff file
>
> Thank you. I've integrated that into a patch. However, I've to
> change keep_rinfo_list to be passed by pointer to
> add_non_redundant_clauses(), because it might be changed in
> distribute_restrictinfo_to_rels(). Without that there is a case of
> duplicated clause in regression tests.
>
> I've changed 'inner' and 'outer' vise versa in
> remove_self_joins_one_group() for better readability (I believe that
> was discussed upthread but lost). Also, I did a round of improvement
> for comments and commit message.

I've corrected some spelling error reported by Alexander Lakhin
privately to me. Also, I've revised comments around ChangeVarNodes()
and ChangeVarNodesExtended(). I'm going to continue nitpicking this
patch during next couple days then push it if no objections.

------
Regards,
Alexander Korotkov
Supabase

Attachment	Content-Type	Size
v14-0001-Implement-Self-Join-Elimination.patch	application/octet-stream	131.4 KB

From:	Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-02-18 07:24:02
Message-ID:	38e8cec8-bdef-4892-8e15-7f2207e5e157@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg젠 토토SQL :

On 13.02.2025 02:00, Alexander Korotkov wrote:
> Thank you. I've integrated that into a patch. However, I've to
> change keep_rinfo_list to be passed by pointer to
> add_non_redundant_clauses(), because it might be changed in
> distribute_restrictinfo_to_rels(). Without that there is a case of
> duplicated clause in regression tests.
>
> I've changed 'inner' and 'outer' vise versa in
> remove_self_joins_one_group() for better readability (I believe that
> was discussed upthread but lost). Also, I did a round of improvement
> for comments and commit message.
On 15.02.2025 12:19, Alexander Korotkov wrote:
> On Thu, Feb 13, 2025 at 1:00 AM Alexander Korotkov<aekorotkov(at)gmail(dot)com> wrote:
>> On Tue, Feb 11, 2025 at 5:31 PM Alena Rybakina
>> <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
>>> Hi! Thank you for the work with this subject, I think it is really
>>> important.
>>>
>>> On 10.02.2025 22:58, Alexander Korotkov wrote:
>>>> Hi!
>>>>
>>>> On Mon, Feb 10, 2025 at 7:19 AM Andrei Lepikhov<lepihov(at)gmail(dot)com> wrote:
>>>>> On 9/2/2025 18:41, Alexander Korotkov wrote:
>>>>>> Regarding adjust_relid_set() and replace_relid(). I think they are
>>>>>> now strictly equivalent, except for the case then old relid is given
>>>>>> and not found. In this case adjust_relid_set() returns the original
>>>>>> relids while replace_relid() returns a copy. The behavior of
>>>>>> adjust_relid_set() appears more desirable as we don't need extra
>>>>>> copying when no modification is done. So, I've replaced all
>>>>>> replace_relid() with adjust_relid_set().
>>>>> Ok, I glanced into it, and it makes sense to merge these routines.
>>>>> I think the comment to adjust_relid_set() should be arranged, too. See
>>>>> the attachment for a short variant of such modification.
>>>>>> Also, I did some grammar correction to your new comment in tests.
>>>>> Thanks!
>>>> I've further revised adjust_relid_set() header comment.
>>>>
>>>> Looking back to the work done since previous attempt to commit this to
>>>> pg17, I can highlight following.
>>>> 1) We're now using more of existing infrastructure including
>>>> adjust_relid_set() and ChangeVarNodes(). The most of complexity is
>>>> still there though.
>>>> 2) We've checked few ways to further simplify this patch. But yet the
>>>> current way still feels to be best possible.
>>>> 3) For sure, several bugs were fixed.
>>>>
>>>> I think we could give it another chance for pg18 after some further
>>>> polishing (at least commit message still needs to be revised). Any
>>>> thoughts on this? Tom?
>>>>
>>> I didn't find any mistakes, I just have a refactoring remark. I think
>>> the part where we add non-redundant expressions with the
>>> binfo_candidates, jinfo_candidates
>>> check can be moved to a separate function, otherwise the code is very
>>> repetitive in this place. I did it and attached diff file
>> Thank you. I've integrated that into a patch. However, I've to
>> change keep_rinfo_list to be passed by pointer to
>> add_non_redundant_clauses(), because it might be changed in
>> distribute_restrictinfo_to_rels(). Without that there is a case of
>> duplicated clause in regression tests.
>>
>> I've changed 'inner' and 'outer' vise versa in
>> remove_self_joins_one_group() for better readability (I believe that
>> was discussed upthread but lost). Also, I did a round of improvement
>> for comments and commit message.
> I've corrected some spelling error reported by Alexander Lakhin
> privately to me. Also, I've revised comments around ChangeVarNodes()
> and ChangeVarNodesExtended(). I'm going to continue nitpicking this
> patch during next couple days then push it if no objections.
I agree with your corrections and for me patch looks good.

--
Regards,
Alena Rybakina
Postgres Professional

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-02-23 17:02:49
Message-ID:	914330.1740330169@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> I've corrected some spelling error reported by Alexander Lakhin
> privately to me. Also, I've revised comments around ChangeVarNodes()
> and ChangeVarNodesExtended(). I'm going to continue nitpicking this
> patch during next couple days then push it if no objections.

Coverity has another nit-pick:

/srv/coverity/git/pgsql-git/postgresql/src/backend/optimizer/plan/analyzejoins.c: 327 in remove_rel_from_query()
321 static void
322 remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
323 int subst, SpecialJoinInfo *sjinfo,
324 Relids joinrelids)
325 {
326 int relid = rel->relid;
>>> CID 1643155: Integer handling issues (INTEGER_OVERFLOW)
>>> Expression "ojrelid", which is equal to 4294967295, where "(sjinfo != NULL) ? sjinfo->ojrelid : 4294967295U" is known to be equal to 4294967295, overflows the type that receives it, a signed integer 32 bits wide.
327 int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
328 Index rti;
329 ListCell *l;
330
331 /*
332 * Update all_baserels and related relid sets.

This is unhappy because sjinfo->ojrelid, which is declared Index
(that is, unsigned int) is being converted to int. Admittedly
there are plenty of other places that do likewise, but the additional
load of assuming that -1 isn't a possible value of sjinfo->ojrelid
seems to be enough to draw its ire.

I suggest finding another way to code this function that doesn't
depend on that type pun. I think it's fairly accidental that
adjust_relid_set doesn't misbehave on -1 anyway, so personally I'd
get rid of that local variable entirely in favor of something like

if (sjinfo != NULL)
{
root->outer_join_rels = adjust_relid_set(root->outer_join_rels,
sjinfo->ojrelid, subst);
root->all_query_rels = adjust_relid_set(root->all_query_rels,
sjinfo->ojrelid, subst);
}

regards, tom lane

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-02-23 21:15:35
Message-ID:	CAPpHfdt-F2CzQEEDZjhDB0pHMP-FdWt-jv6DVsO+5JZEGpTyTA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Feb 23, 2025 at 7:02 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> > I've corrected some spelling error reported by Alexander Lakhin
> > privately to me. Also, I've revised comments around ChangeVarNodes()
> > and ChangeVarNodesExtended(). I'm going to continue nitpicking this
> > patch during next couple days then push it if no objections.
>
> Coverity has another nit-pick:
>
> /srv/coverity/git/pgsql-git/postgresql/src/backend/optimizer/plan/analyzejoins.c: 327 in remove_rel_from_query()
> 321 static void
> 322 remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
> 323 int subst, SpecialJoinInfo *sjinfo,
> 324 Relids joinrelids)
> 325 {
> 326 int relid = rel->relid;
> >>> CID 1643155: Integer handling issues (INTEGER_OVERFLOW)
> >>> Expression "ojrelid", which is equal to 4294967295, where "(sjinfo != NULL) ? sjinfo->ojrelid : 4294967295U" is known to be equal to 4294967295, overflows the type that receives it, a signed integer 32 bits wide.
> 327 int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
> 328 Index rti;
> 329 ListCell *l;
> 330
> 331 /*
> 332 * Update all_baserels and related relid sets.
>
> This is unhappy because sjinfo->ojrelid, which is declared Index
> (that is, unsigned int) is being converted to int. Admittedly
> there are plenty of other places that do likewise, but the additional
> load of assuming that -1 isn't a possible value of sjinfo->ojrelid
> seems to be enough to draw its ire.

Thank you for reporting this!

> I suggest finding another way to code this function that doesn't
> depend on that type pun. I think it's fairly accidental that
> adjust_relid_set doesn't misbehave on -1 anyway, so personally I'd
> get rid of that local variable entirely in favor of something like
>
> if (sjinfo != NULL)
> {
> root->outer_join_rels = adjust_relid_set(root->outer_join_rels,
> sjinfo->ojrelid, subst);
> root->all_query_rels = adjust_relid_set(root->all_query_rels,
> sjinfo->ojrelid, subst);
> }

There is my attempt to implement this approach. Getting rid of local
variable (and computation of the same value other way) required to
change arguments of remove_rel_from_eclass() as well. I'm going to
further polish this tomorrow.

------
Regards,
Alexander Korotkov
Supabase

Attachment	Content-Type	Size
v1-0001-Get-rid-of-ojrelid-local-variable-in-remove_rel_f.patch	application/octet-stream	8.3 KB

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-02-23 22:12:36
Message-ID:	df21bd7e-5ec8-4e45-a339-b3a59ec48d1e@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 23/2/2025 22:15, Alexander Korotkov wrote:
> There is my attempt to implement this approach. Getting rid of local
> variable (and computation of the same value other way) required to
> change arguments of remove_rel_from_eclass() as well. I'm going to
> further polish this tomorrow.
I passed through the patch. It works correctly.

Multiple ifs in a single routine is not ideal. I have thought about it
already, and it seems the remove_rel_from_query needs refactoring: when
I first reused it for self-join removal, we didn't have the 'ojrelid'
machinery, and it was implemented smoothly.
Right now, this code contains multiple places where we need to remove
the outer join relid and separating the removal code for the baserel and
outer join may simplify the logic and make it safer.
But the way to do it is not apparent now. May be if we implement a new
technique of query tree reduction, the approach will become more evident.

--
regards, Andrei Lepikhov

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-02-24 10:57:41
Message-ID:	CAPpHfdsN8xSmZ+2vKK0m0--DfDJS8oOZsb-k5Qgk6Nv0v1qk9g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi, Andrei!

Thank you for your feedback.

On Mon, Feb 24, 2025 at 12:12 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> On 23/2/2025 22:15, Alexander Korotkov wrote:
> > There is my attempt to implement this approach. Getting rid of local
> > variable (and computation of the same value other way) required to
> > change arguments of remove_rel_from_eclass() as well. I'm going to
> > further polish this tomorrow.
> I passed through the patch. It works correctly.
>
> Multiple ifs in a single routine is not ideal.

I would say there is a brachcing anyway. Even if it is inside
adjust_relid_set(). Patch makes it more explicit. Ideal or not, I
don't think it gets worse.

> I have thought about it
> already, and it seems the remove_rel_from_query needs refactoring: when
> I first reused it for self-join removal, we didn't have the 'ojrelid'
> machinery, and it was implemented smoothly.
> Right now, this code contains multiple places where we need to remove
> the outer join relid and separating the removal code for the baserel and
> outer join may simplify the logic and make it safer.
> But the way to do it is not apparent now. May be if we implement a new
> technique of query tree reduction, the approach will become more evident.

Could you, please, elaborate more on what you mean by "new technique
of query tree reduction"?

------
Regards,
Alexander Korotkov
Supabase

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-02-24 12:22:35
Message-ID:	5b0ca934-4c61-4c29-8ad4-8988bed1ca0c@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 24/2/2025 11:57, Alexander Korotkov wrote:
> Hi, Andrei!
>
> Thank you for your feedback.
>
> On Mon, Feb 24, 2025 at 12:12 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>> On 23/2/2025 22:15, Alexander Korotkov wrote:
>>> There is my attempt to implement this approach. Getting rid of local
>>> variable (and computation of the same value other way) required to
>>> change arguments of remove_rel_from_eclass() as well. I'm going to
>>> further polish this tomorrow.
>> I passed through the patch. It works correctly.
>>
>> Multiple ifs in a single routine is not ideal.
>
> I would say there is a brachcing anyway. Even if it is inside
> adjust_relid_set(). Patch makes it more explicit. Ideal or not, I
> don't think it gets worse.
>
>> I have thought about it
>> already, and it seems the remove_rel_from_query needs refactoring: when
>> I first reused it for self-join removal, we didn't have the 'ojrelid'
>> machinery, and it was implemented smoothly.
>> Right now, this code contains multiple places where we need to remove
>> the outer join relid and separating the removal code for the baserel and
>> outer join may simplify the logic and make it safer.
>> But the way to do it is not apparent now. May be if we implement a new
>> technique of query tree reduction, the approach will become more evident.
>
> Could you, please, elaborate more on what you mean by "new technique
> of query tree reduction"?
I mean any transformations and optimisations that reduce search space
for optimisation. Right now, I see the features reduce_unique_semijoins,
remove_useless_joins, and remove_useless_self_joins.
In practice, I see at least a join on a foreign key, where some cases
potentially allow the removal of the JOIN operator.

--
regards, Andrei Lepikhov

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-02-26 12:12:30
Message-ID:	CAPpHfdt=f7ZOnaH+DjkxYYgmsZE7i1KChHCqrMd9SwzOMBt__Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Feb 24, 2025 at 12:57 PM Alexander Korotkov
<aekorotkov(at)gmail(dot)com> wrote:
> On Mon, Feb 24, 2025 at 12:12 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> > On 23/2/2025 22:15, Alexander Korotkov wrote:
> > > There is my attempt to implement this approach. Getting rid of local
> > > variable (and computation of the same value other way) required to
> > > change arguments of remove_rel_from_eclass() as well. I'm going to
> > > further polish this tomorrow.
> > I passed through the patch. It works correctly.
> >
> > Multiple ifs in a single routine is not ideal.
>
> I would say there is a brachcing anyway. Even if it is inside
> adjust_relid_set(). Patch makes it more explicit. Ideal or not, I
> don't think it gets worse.

I've added a commit message to the patch. I'm going to push it if no
objections.

------
Regards,
Alexander Korotkov
Supabase

Attachment	Content-Type	Size
v2-0001-Get-rid-of-ojrelid-local-variable-in-remove_rel_f.patch	application/octet-stream	8.7 KB

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-02-26 12:14:31
Message-ID:	CAPpHfdtawXSOD2d_yVuzVqaZND218otEHiCByC1r=xvX-Y+x8Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Feb 24, 2025 at 2:22 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>
> On 24/2/2025 11:57, Alexander Korotkov wrote:
> > Hi, Andrei!
> >
> > Thank you for your feedback.
> >
> > On Mon, Feb 24, 2025 at 12:12 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> >> On 23/2/2025 22:15, Alexander Korotkov wrote:
> >>> There is my attempt to implement this approach. Getting rid of local
> >>> variable (and computation of the same value other way) required to
> >>> change arguments of remove_rel_from_eclass() as well. I'm going to
> >>> further polish this tomorrow.
> >> I passed through the patch. It works correctly.
> >>
> >> Multiple ifs in a single routine is not ideal.
> >
> > I would say there is a brachcing anyway. Even if it is inside
> > adjust_relid_set(). Patch makes it more explicit. Ideal or not, I
> > don't think it gets worse.
> >
> >> I have thought about it
> >> already, and it seems the remove_rel_from_query needs refactoring: when
> >> I first reused it for self-join removal, we didn't have the 'ojrelid'
> >> machinery, and it was implemented smoothly.
> >> Right now, this code contains multiple places where we need to remove
> >> the outer join relid and separating the removal code for the baserel and
> >> outer join may simplify the logic and make it safer.
> >> But the way to do it is not apparent now. May be if we implement a new
> >> technique of query tree reduction, the approach will become more evident.
> >
> > Could you, please, elaborate more on what you mean by "new technique
> > of query tree reduction"?
> I mean any transformations and optimisations that reduce search space
> for optimisation. Right now, I see the features reduce_unique_semijoins,
> remove_useless_joins, and remove_useless_self_joins.
> In practice, I see at least a join on a foreign key, where some cases
> potentially allow the removal of the JOIN operator.

Do you mean some generic facility, which generalizes all the
transformations you mentioned? If so, it would be cool. But how
could it look like?

------
Regards,
Alexander Korotkov
Supabase

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-03-07 06:03:53
Message-ID:	9c951cf8-6876-4f0f-9c22-44181e6fda78@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 26/2/2025 13:14, Alexander Korotkov wrote:
> On Mon, Feb 24, 2025 at 2:22 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>> On 24/2/2025 11:57, Alexander Korotkov wrote:
>>> Could you, please, elaborate more on what you mean by "new technique
>>> of query tree reduction"?
>> I mean any transformations and optimisations that reduce search space
>> for optimisation. Right now, I see the features reduce_unique_semijoins,
>> remove_useless_joins, and remove_useless_self_joins.
>> In practice, I see at least a join on a foreign key, where some cases
>> potentially allow the removal of the JOIN operator.
>
> Do you mean some generic facility, which generalizes all the
> transformations you mentioned? If so, it would be cool. But how
> could it look like?
I think we may realise what it may look like by attempting to implement
more RelOptInfo-removal features. Right now, I have only vague ideas on
that subject. Just for reference, there were other discussions on join
removal [1 - 5] and some blog posts explaining Oracle techniques in this
area - see, for example, [6].

[1] inner join removal
/message-id/flat/AANLkTinS_MlZ2F3Siwgcje--qf5nTJTpuHFnZdcA45bU%40mail.gmail.com
[2] Patch to support SEMI and ANTI join removal
/message-id/flat/CAApHDvpCBEfuc5tD%3DvniepAv0pU5m%3Dq%3DfOQZcOdMHeei7OQPgQ%40mail.gmail.com
[3] Removing INNER JOINs
/message-id/flat/CAApHDvocUEYdt1uT+DLDPs2xEu=v3qJGT6HeXKonQM4rY_OsSA(at)mail(dot)gmail(dot)com#CAApHDvocUEYdt1uT+DLDPs2xEu=v3qJGT6HeXKonQM4rY_OsSA@mail.gmail.com
[4] WIP Join Removal
/message-id/flat/1220176372.4371.118.camel%40ebony.2ndQuadrant
[5] Join Removal/ Vertical Partitioning
/message-id/flat/1214477827.3845.87.camel%40ebony.site
[6] Oracle Join Elimination
https://oracle-base.com/articles/misc/join-elimination

--
regards, Andrei Lepikhov

From:	Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-03-21 17:06:13
Message-ID:	CAEudQAo5SEB99uehUtTDcd3sxsPP3SnMCbwbyWgE5O0W9JgRRQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi.

Em sáb., 15 de fev. de 2025 às 06:20, Alexander Korotkov <
aekorotkov(at)gmail(dot)com> escreveu:

> On Thu, Feb 13, 2025 at 1:00 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com>
> wrote:
> >
> > On Tue, Feb 11, 2025 at 5:31 PM Alena Rybakina
> > <a(dot)rybakina(at)postgrespro(dot)ru> wrote:
> > > Hi! Thank you for the work with this subject, I think it is really
> > > important.
> > >
> > > On 10.02.2025 22:58, Alexander Korotkov wrote:
> > > > Hi!
> > > >
> > > > On Mon, Feb 10, 2025 at 7:19 AM Andrei Lepikhov <lepihov(at)gmail(dot)com>
> wrote:
> > > >> On 9/2/2025 18:41, Alexander Korotkov wrote:
> > > >>> Regarding adjust_relid_set() and replace_relid(). I think they are
> > > >>> now strictly equivalent, except for the case then old relid is
> given
> > > >>> and not found. In this case adjust_relid_set() returns the
> original
> > > >>> relids while replace_relid() returns a copy. The behavior of
> > > >>> adjust_relid_set() appears more desirable as we don't need extra
> > > >>> copying when no modification is done. So, I've replaced all
> > > >>> replace_relid() with adjust_relid_set().
> > > >> Ok, I glanced into it, and it makes sense to merge these routines.
> > > >> I think the comment to adjust_relid_set() should be arranged, too.
> See
> > > >> the attachment for a short variant of such modification.
> > > >>> Also, I did some grammar correction to your new comment in tests.
> > > >> Thanks!
> > > > I've further revised adjust_relid_set() header comment.
> > > >
> > > > Looking back to the work done since previous attempt to commit this
> to
> > > > pg17, I can highlight following.
> > > > 1) We're now using more of existing infrastructure including
> > > > adjust_relid_set() and ChangeVarNodes(). The most of complexity is
> > > > still there though.
> > > > 2) We've checked few ways to further simplify this patch. But yet
> the
> > > > current way still feels to be best possible.
> > > > 3) For sure, several bugs were fixed.
> > > >
> > > > I think we could give it another chance for pg18 after some further
> > > > polishing (at least commit message still needs to be revised). Any
> > > > thoughts on this? Tom?
> > > >
> > > I didn't find any mistakes, I just have a refactoring remark. I think
> > > the part where we add non-redundant expressions with the
> > > binfo_candidates, jinfo_candidates
> > > check can be moved to a separate function, otherwise the code is very
> > > repetitive in this place. I did it and attached diff file
> >
> > Thank you. I've integrated that into a patch. However, I've to
> > change keep_rinfo_list to be passed by pointer to
> > add_non_redundant_clauses(), because it might be changed in
> > distribute_restrictinfo_to_rels(). Without that there is a case of
> > duplicated clause in regression tests.
> >
> > I've changed 'inner' and 'outer' vise versa in
> > remove_self_joins_one_group() for better readability (I believe that
> > was discussed upthread but lost). Also, I did a round of improvement
> > for comments and commit message.
>
> I've corrected some spelling error reported by Alexander Lakhin
> privately to me. Also, I've revised comments around ChangeVarNodes()
> and ChangeVarNodesExtended(). I'm going to continue nitpicking this
> patch during next couple days then push it if no objections.
>
I think a small optimization is possible here.

The whole block that append *rinfo* to exprs is controlled by extra_clause
not NULL.
So It's worth moving the test to the beginning of the block and avoiding it
altogether if that's the case.

trivial patch attached.

best regards,
Ranier Vilela

Attachment	Content-Type	Size
avoid-expensive-function-indxpath.patch	application/octet-stream	954 bytes

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Cc:	Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-04-03 16:02:16
Message-ID:	CAPpHfdtTxSs+t8PXjJ_EC5roaQRLPyxt71M=AJvrGjYFC-wK5g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi all,

I've got an off-list bug report from Alexander Lakhin involving a
placeholder variable. Alena and Andrei proposed a fix. It is fairly
simple: we just shouldn't remove PHVs during self-join elimination, as
they might still be referenced from other parts of a query. The patch
is attached. I'm going to fix this if no objections.

------
Regards,
Alexander Korotkov
Supabase

Attachment	Content-Type	Size
v1-0001-Disallow-removing-placeholders-during-Self-Join-E.patch	application/octet-stream	5.2 KB

From:	Richard Guo <guofenglinux(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-04-04 02:53:54
Message-ID:	CAMbWs48cO0UUkMT6SbCVyXSx+GvNcjvkEJwt1iDNhT1D2kRPyA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Apr 4, 2025 at 1:02 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> I've got an off-list bug report from Alexander Lakhin involving a
> placeholder variable. Alena and Andrei proposed a fix. It is fairly
> simple: we just shouldn't remove PHVs during self-join elimination, as
> they might still be referenced from other parts of a query. The patch
> is attached. I'm going to fix this if no objections.

Hmm, I'm not sure about the fix. It seems to me that it simply
prevents removing any PHVs in the self-join removal case. My concern
is that this might result in PHVs that could actually be removed not
being removed in many cases.

Besides, there's the specific comment above this code explaining the
logic behind the removal of PHVs. Shouldn't that comment be updated
to reflect the changes?

Thanks
Richard

From:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
To:	Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-04-04 08:35:39
Message-ID:	28ab16e7-6ace-40aa-a76d-9a110799944d@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 4/4/25 04:53, Richard Guo wrote:
> On Fri, Apr 4, 2025 at 1:02 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>> I've got an off-list bug report from Alexander Lakhin involving a
>> placeholder variable. Alena and Andrei proposed a fix. It is fairly
>> simple: we just shouldn't remove PHVs during self-join elimination, as
>> they might still be referenced from other parts of a query. The patch
>> is attached. I'm going to fix this if no objections.
>
> Hmm, I'm not sure about the fix. It seems to me that it simply
> prevents removing any PHVs in the self-join removal case. My concern
> is that this might result in PHVs that could actually be removed not
> being removed in many cases.
Let's play with use cases:
If a PHV is needed in the inner or outer only, it means we have a clause
in the baserestrictinfo that will be transferred to the keeping
relation, and we shouldn't remove the PHV.
Another case is when the PHV is needed in a join clause of the
self-join. I may imagine such a case:

toKeep.x+toRemove.y=PHV

This clause will be transformed to "toKeep.x+toKeep.y=PHV", pushed to
baserestrictinfo of keeping relation and should be saved.
I think it is possible to invent quite a narrow case of clause like the
following:

PHV_evaluated_at_inner = PHV_evaluated_at_outer

It needs to prove reproducibility. But even if it makes sense, it seems
to have no danger for further selectivity estimation compared to the
source clause and is a too-narrow case, isn't it?
In other cases, this PHV is needed something else, and we can't remove it.

Maybe I lost the case you keep in mind? I would like to discover it.

>
> Besides, there's the specific comment above this code explaining the
> logic behind the removal of PHVs. Shouldn't that comment be updated
> to reflect the changes?
It makes sense: for now, it seems that PHV removal should be used in the
case of an outer join removal. In the case of SJE, logically we make a
replacement, not a removal, and we should not reduce the number of
entities involved.

--
regards, Andrei Lepikhov

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc:	Richard Guo <guofenglinux(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-04-05 10:09:23
Message-ID:	CAPpHfdtXzKVbz9MDdo2KD-5S+EitC-VoUJ_mm03aJ7YSpnWR7A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Apr 4, 2025 at 11:35 AM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>
> On 4/4/25 04:53, Richard Guo wrote:
> > On Fri, Apr 4, 2025 at 1:02 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> >> I've got an off-list bug report from Alexander Lakhin involving a
> >> placeholder variable. Alena and Andrei proposed a fix. It is fairly
> >> simple: we just shouldn't remove PHVs during self-join elimination, as
> >> they might still be referenced from other parts of a query. The patch
> >> is attached. I'm going to fix this if no objections.
> >
> > Hmm, I'm not sure about the fix. It seems to me that it simply
> > prevents removing any PHVs in the self-join removal case. My concern
> > is that this might result in PHVs that could actually be removed not
> > being removed in many cases.
> Let's play with use cases:
> If a PHV is needed in the inner or outer only, it means we have a clause
> in the baserestrictinfo that will be transferred to the keeping
> relation, and we shouldn't remove the PHV.
> Another case is when the PHV is needed in a join clause of the
> self-join. I may imagine such a case:
>
> toKeep.x+toRemove.y=PHV
>
> This clause will be transformed to "toKeep.x+toKeep.y=PHV", pushed to
> baserestrictinfo of keeping relation and should be saved.
> I think it is possible to invent quite a narrow case of clause like the
> following:
>
> PHV_evaluated_at_inner = PHV_evaluated_at_outer
>
> It needs to prove reproducibility. But even if it makes sense, it seems
> to have no danger for further selectivity estimation compared to the
> source clause and is a too-narrow case, isn't it?
> In other cases, this PHV is needed something else, and we can't remove it.

Should we add more regression tests covering these cases?

> >
> > Besides, there's the specific comment above this code explaining the
> > logic behind the removal of PHVs. Shouldn't that comment be updated
> > to reflect the changes?
> It makes sense: for now, it seems that PHV removal should be used in the
> case of an outer join removal. In the case of SJE, logically we make a
> replacement, not a removal, and we should not reduce the number of
> entities involved.

I have added a brief comment about that. Check the attached patch.

------
Regards,
Alexander Korotkov
Supabase

Attachment	Content-Type	Size
v2-0001-Disallow-removing-placeholders-during-Self-Join-E.patch	application/octet-stream	5.7 KB

From:	Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc:	Richard Guo <guofenglinux(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-04-05 21:01:57
Message-ID:	5309f032-e2fc-4913-a5d6-a2a2b5e22b29@postgrespro.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg메이저 토토 사이트SQL

On 05.04.2025 13:09, Alexander Korotkov wrote:
> On Fri, Apr 4, 2025 at 11:35 AM Andrei Lepikhov<lepihov(at)gmail(dot)com> wrote:
>> On 4/4/25 04:53, Richard Guo wrote:
>>> On Fri, Apr 4, 2025 at 1:02 AM Alexander Korotkov<aekorotkov(at)gmail(dot)com> wrote:
>>>> I've got an off-list bug report from Alexander Lakhin involving a
>>>> placeholder variable. Alena and Andrei proposed a fix. It is fairly
>>>> simple: we just shouldn't remove PHVs during self-join elimination, as
>>>> they might still be referenced from other parts of a query. The patch
>>>> is attached. I'm going to fix this if no objections.
>>> Hmm, I'm not sure about the fix. It seems to me that it simply
>>> prevents removing any PHVs in the self-join removal case. My concern
>>> is that this might result in PHVs that could actually be removed not
>>> being removed in many cases.
>> Let's play with use cases:
>> If a PHV is needed in the inner or outer only, it means we have a clause
>> in the baserestrictinfo that will be transferred to the keeping
>> relation, and we shouldn't remove the PHV.
>> Another case is when the PHV is needed in a join clause of the
>> self-join. I may imagine such a case:
>>
>> toKeep.x+toRemove.y=PHV
>>
>> This clause will be transformed to "toKeep.x+toKeep.y=PHV", pushed to
>> baserestrictinfo of keeping relation and should be saved.
>> I think it is possible to invent quite a narrow case of clause like the
>> following:
>>
>> PHV_evaluated_at_inner = PHV_evaluated_at_outer
>>
>> It needs to prove reproducibility. But even if it makes sense, it seems
>> to have no danger for further selectivity estimation compared to the
>> source clause and is a too-narrow case, isn't it?
>> In other cases, this PHV is needed something else, and we can't remove it.
I agree with that, and I also believe this case will be quite rare in
practice.
> Should we add more regression tests covering these cases?
I experimented with some examples like this and noticed that it does
affect cardinality estimation, though I'm not sure the impact is
significant.
I used the tables from the regression tests, so if they’re appropriate
for reproducing this case, it should be straightforward to add them.

EXPLAIN (analyze, VERBOSE)
SELECT 1
FROM tbl_phv t1
LEFT JOIN (
SELECT 1 AS extra, x, y FROM tbl_phv tl
) t3
JOIN (
SELECT y, x FROM tbl_phv tr
) t4 ON t4.y = t3.y and (t4.x*0) = (t3.x *0)
ON true
WHERE t3.extra IS NOT NULL
AND t4.x + t3.y= (t1.x * 0);

The query plan with transformation:
------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=3.27..7.16 rows=100 width=4) (actual
time=0.149..0.151 rows=0.00 loops=1)
   Output: 1
   Hash Cond: ((tr.x + tr.y) = (t1.x * 0))
   Buffers: shared hit=2
   -> Seq Scan on public.tbl_phv tr (cost=0.00..2.26 rows=100
width=8) (actual time=0.027..0.047 rows=100.00 loops=1)
         Output: tr.x, tr.y
         Filter: ((1 IS NOT NULL) AND ((tr.x * 0) IS NOT NULL))
         Rows Removed by Filter: 1
         Buffers: shared hit=1
   -> Hash (cost=2.01..2.01 rows=101 width=4) (actual
time=0.078..0.079 rows=100.00 loops=1)
         Output: t1.x
         Buckets: 1024 Batches: 1 Memory Usage: 12kB
         Buffers: shared hit=1
         -> Seq Scan on public.tbl_phv t1 (cost=0.00..2.01 rows=101
width=4) (actual time=0.005..0.038 rows=101.00 loops=1)
               Output: t1.x
               Buffers: shared hit=1
Planning Time: 0.607 ms
Execution Time: 0.194 ms
(18 rows)

The query plan without transformation:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=6.09..8.74 rows=1 width=4) (actual time=0.248..0.250
rows=0.00 loops=1)
Output:1
Hash Cond:((t1.x * 0) = (tr.x + tl.y))
Buffers:shared hit=3
-> Seq Scan on public.tbl_phv t1 (cost=0.00..2.01 rows=101 width=4)
(actual time=0.011..0.021 rows=101.00 loops=1)
Output:t1.x, t1.y
Buffers:shared hit=1
-> Hash (cost=6.08..6.08 rows=1 width=8) (actual time=0.206..0.207
rows=100.00 loops=1)
Output:tl.y, tr.x
Buckets:1024 Batches:1 Memory Usage:12kB
Buffers:shared hit=2
-> Hash Join (cost=3.51..6.08 rows=1 width=8) (actual time=0.090..0.160
rows=100.00 loops=1)
Output:tl.y, tr.x
Inner Unique:true
Hash Cond:((tr.y = tl.y) AND ((tr.x * 0) = (tl.x * 0)))
Buffers:shared hit=2
-> Seq Scan on public.tbl_phv tr (cost=0.00..2.01 rows=101 width=8)
(actual time=0.005..0.016 rows=101.00 loops=1)
Output:tr.x, tr.y
Buffers:shared hit=1
-> Hash (cost=2.01..2.01 rows=100 width=8) (actual time=0.080..0.080
rows=100.00 loops=1)
Output:tl.y, tl.x
Buckets:1024 Batches:1 Memory Usage:12kB
Buffers:shared hit=1
-> Seq Scan on public.tbl_phv tl (cost=0.00..2.01 rows=100 width=8)
(actual time=0.008..0.035 rows=101.00 loops=1)
Output:tl.y, tl.x
Filter:(1 IS NOT NULL)
Buffers:shared hit=1
Planning:
Buffers:shared hit=25
Planning Time:0.609 ms
Execution Time:0.319 ms
(31 rows)

EXPLAIN (analyze, VERBOSE)
SELECT 1
FROM tbl_phv t1
LEFT JOIN (
SELECT 1 AS extra, x, y FROM tbl_phv tl
) t3
JOIN (
SELECT y, x FROM tbl_phv tr
) t4 ON t4.y = t3.y
ON true
WHERE t3.extra IS NOT NULL
AND t4.x + t3.y= (t1.x * 0);

The query plan with transformation:
------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=3.26..6.90 rows=100 width=4) (actual
time=0.145..0.146 rows=0.00 loops=1)
   Output: 1
   Hash Cond: ((t1.x * 0) = (tr.x + tr.y))
   Buffers: shared hit=2
   -> Seq Scan on public.tbl_phv t1 (cost=0.00..2.01 rows=101
width=4) (actual time=0.030..0.040 rows=101.00 loops=1)
         Output: t1.x, t1.y
         Buffers: shared hit=1
   -> Hash (cost=2.01..2.01 rows=100 width=8) (actual
time=0.081..0.081 rows=100.00 loops=1)
         Output: tr.y, tr.x
         Buckets: 1024 Batches: 1 Memory Usage: 12kB
         Buffers: shared hit=1
         -> Seq Scan on public.tbl_phv tr (cost=0.00..2.01 rows=100
width=8) (actual time=0.012..0.040 rows=101.00 loops=1)
               Output: tr.y, tr.x
               Filter: (1 IS NOT NULL)
               Buffers: shared hit=1
Planning Time: 0.571 ms
Execution Time: 0.195 ms
(17 rows)

The query plan without transformation:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=6.53..9.94 rows=50 width=4) (actual time=0.236..0.238
rows=0.00 loops=1)
Output:1
Hash Cond:((tr.x + tl.y) = (t1.x * 0))
Buffers:shared hit=3
-> Hash Join (cost=3.26..5.55 rows=100 width=8) (actual
time=0.075..0.137 rows=101.00 loops=1)
Output:tl.y, tr.x
Inner Unique:true
Hash Cond:(tr.y = tl.y)
Buffers:shared hit=2
-> Seq Scan on public.tbl_phv tr (cost=0.00..2.01 rows=101 width=8)
(actual time=0.005..0.017 rows=101.00 loops=1)
Output:tr.x, tr.y
Buffers:shared hit=1
-> Hash (cost=2.01..2.01 rows=100 width=4) (actual time=0.064..0.065
rows=101.00 loops=1)
Output:tl.y
Buckets:1024 Batches:1 Memory Usage:12kB
Buffers:shared hit=1
-> Seq Scan on public.tbl_phv tl (cost=0.00..2.01 rows=100 width=4)
(actual time=0.006..0.033 rows=101.00 loops=1)
Output:tl.y
Filter:(1 IS NOT NULL)
Buffers:shared hit=1
-> Hash (cost=2.01..2.01 rows=101 width=4) (actual time=0.078..0.078
rows=100.00 loops=1)
Output:t1.x
Buckets:1024 Batches:1 Memory Usage:12kB
Buffers:shared hit=1
-> Seq Scan on public.tbl_phv t1 (cost=0.00..2.01 rows=101 width=4)
(actual time=0.014..0.039 rows=101.00 loops=1)
Output:t1.x
Buffers:shared hit=1
Planning:
Buffers:shared hit=12
Planning Time:0.478 ms
Execution Time:0.293 ms
(31 rows)

--
Regards,
Alena Rybakina
Postgres Professional

From:	Tender Wang <tndrwang(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Andrei Lepikhov <lepihov(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-04-06 04:02:38
Message-ID:	CAHewXN=+ghd6O6im46q7j2u6c3H6vkXtXmF=_v4CfGSnjje8PA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

I find that the postgresql.conf.sample file doesn't contain
enable_self_join_elimination guc.
If this is necessary, please see the attached patch.

--
Thanks, Tender Wang

Attachment	Content-Type	Size
0001-Put-enable_self_join_elimination-into-postgresql.con.patch	text/plain	757 bytes

From:	Daniel Gustafsson <daniel(at)yesql(dot)se>
To:	Tender Wang <tndrwang(at)gmail(dot)com>
Cc:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-04-06 11:23:24
Message-ID:	997269A6-FE8A-4E14-BBA7-150BE52BAA3C@yesql.se
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> On 6 Apr 2025, at 06:02, Tender Wang <tndrwang(at)gmail(dot)com> wrote:

> I find that the postgresql.conf.sample file doesn't contain enable_self_join_elimination guc.
> If this is necessary, please see the attached patch.

The GUC is marked as not supposed by the in the sample file, either it really
shouldn't or that needs to be removed from the GUC declaration.

--
Daniel Gustafsson

From:	Tender Wang <tndrwang(at)gmail(dot)com>
To:	Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-04-06 11:42:01
Message-ID:	CAHewXN==F2zZ9S+eNKnuLELunxhhCVafdGkr587MFEekS=qGRA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Daniel Gustafsson <daniel(at)yesql(dot)se> 于2025年4月6日周日 19:23写道：

> > On 6 Apr 2025, at 06:02, Tender Wang <tndrwang(at)gmail(dot)com> wrote:
>
> > I find that the postgresql.conf.sample file doesn't contain
> enable_self_join_elimination guc.
> > If this is necessary, please see the attached patch.
>
> The GUC is marked as not supposed by the in the sample file, either it
> really
> shouldn't or that needs to be removed from the GUC declaration.
>

Sorry, I didn't see the GUC_NOT_IN_SAMPLE flag. Thanks for the reminder.
Please ignore the noise.

--
Thanks, Tender Wang

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Tender Wang <tndrwang(at)gmail(dot)com>
Cc:	Daniel Gustafsson <daniel(at)yesql(dot)se>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-04-06 11:49:54
Message-ID:	CAPpHfdsqMTEsmxk3aQwt6xPz+KpUELO=6fzmER9ZRGrbs4uMfA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Apr 6, 2025 at 2:42 PM Tender Wang <tndrwang(at)gmail(dot)com> wrote:
> Daniel Gustafsson <daniel(at)yesql(dot)se> 于2025年4月6日周日 19:23写道：
>>
>> > On 6 Apr 2025, at 06:02, Tender Wang <tndrwang(at)gmail(dot)com> wrote:
>>
>> > I find that the postgresql.conf.sample file doesn't contain enable_self_join_elimination guc.
>> > If this is necessary, please see the attached patch.
>>
>> The GUC is marked as not supposed by the in the sample file, either it really
>> shouldn't or that needs to be removed from the GUC declaration.
>
>
> Sorry, I didn't see the GUC_NOT_IN_SAMPLE flag. Thanks for the reminder.
> Please ignore the noise.

Also, sorry for thoughtless commit of that.
Nevertheless, should we consider revisiting this flag? I see the only
other GUC simultaneously QUERY_TUNING_METHOD and GUC_NOT_IN_SAMPLE is
optimize_bounded_sort, which is not exposed in a standard build.

------
Regards,
Alexander Korotkov
Supabase

From:	Tender Wang <tndrwang(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Daniel Gustafsson <daniel(at)yesql(dot)se>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-04-06 13:09:22
Message-ID:	CAHewXNkfZinf5=Buvb8_ysKqxLbncfeQ2N4WF3TxjewyjCsEHQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 결과SQL

Alexander Korotkov <aekorotkov(at)gmail(dot)com> 于2025年4月6日周日 19:50写道：

> On Sun, Apr 6, 2025 at 2:42 PM Tender Wang <tndrwang(at)gmail(dot)com> wrote:
> > Daniel Gustafsson <daniel(at)yesql(dot)se> 于2025年4月6日周日 19:23写道：
> >>
> >> > On 6 Apr 2025, at 06:02, Tender Wang <tndrwang(at)gmail(dot)com> wrote:
> >>
> >> > I find that the postgresql.conf.sample file doesn't contain
> enable_self_join_elimination guc.
> >> > If this is necessary, please see the attached patch.
> >>
> >> The GUC is marked as not supposed by the in the sample file, either it
> really
> >> shouldn't or that needs to be removed from the GUC declaration.
> >
> >
> > Sorry, I didn't see the GUC_NOT_IN_SAMPLE flag. Thanks for the reminder.
> > Please ignore the noise.
>
> Also, sorry for thoughtless commit of that.
> Nevertheless, should we consider revisiting this flag? I see the only
> other GUC simultaneously QUERY_TUNING_METHOD and GUC_NOT_IN_SAMPLE is
> optimize_bounded_sort, which is not exposed in a standard build.
>

Agree.
Putting enable_self_join_elimination into sample file seems a good choice.

--
Thanks, Tender Wang

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Tender Wang <tndrwang(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-04-06 14:41:38
Message-ID:	2831340.1743950498@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> Nevertheless, should we consider revisiting this flag? I see the only
> other GUC simultaneously QUERY_TUNING_METHOD and GUC_NOT_IN_SAMPLE is
> optimize_bounded_sort, which is not exposed in a standard build.

enable_self_join_elimination is documented, and it has no hint that
it is any different from any other enable_foo flag. It shouldn't
be different.

regards, tom lane

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Tender Wang <tndrwang(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-04-06 16:46:36
Message-ID:	CAPpHfdvA=A2cvGoa8bHcnD+Wz6uvLFbVsDBhm+MvuYF32Cq33g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Apr 6, 2025 at 5:41 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> > Nevertheless, should we consider revisiting this flag? I see the only
> > other GUC simultaneously QUERY_TUNING_METHOD and GUC_NOT_IN_SAMPLE is
> > optimize_bounded_sort, which is not exposed in a standard build.
>
> enable_self_join_elimination is documented, and it has no hint that
> it is any different from any other enable_foo flag. It shouldn't
> be different.

Thank you for your point, Tom. Will do this later today.

------
Regards,
Alexander Korotkov
Supabase

From:	Richard Guo <guofenglinux(at)gmail(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tender Wang <tndrwang(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-04-07 03:12:24
Message-ID:	CAMbWs4_xDqpfB-8Yo-27VwyG+ooY=hTUNCcpweffq_kauN=BsA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

FWIW, I reported some issues with this commit in [1]. Any thoughts on
how to fix them?

[1] /message-id/flat/CAMbWs49PE3CvnV8vrQ0Dr%3DHqgZZmX0tdNbzVNJxqc8yg-8kDQQ%40mail.gmail.com

Thanks
Richard

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Richard Guo <guofenglinux(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tender Wang <tndrwang(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject:	Re: Removing unneeded self joins
Date:	2025-04-07 13:11:55
Message-ID:	CAPpHfdtUb547HuHxkAzSJ37_mofu5ppJXKt2D-xZ8xy5BG6NEA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi, Richard!

On Mon, Apr 7, 2025 at 6:12 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
>
> FWIW, I reported some issues with this commit in [1]. Any thoughts on
> how to fix them?
>
> [1] /message-id/flat/CAMbWs49PE3CvnV8vrQ0Dr%3DHqgZZmX0tdNbzVNJxqc8yg-8kDQQ%40mail.gmail.com

Thank you for pointing. Looking into that!

------
Regards,
Alexander Korotkov
Supabase