
Re: Skip collecting decoded changes of already-aborted transactions


From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Skip collecting decoded changes of already-aborted transactions
Date:	2023-06-09 05:16:44
Message-ID:	CAD21AoDht9Pz_DFv_R2LqBTBbO4eGrpa9Vojmt5z5sEx3XwD7A@mail.gmail.com
Lists:	pgsql-hackers

Hi,

In logical decoding, we don't need to collect decoded changes of
aborted transactions. While streaming changes, we can detect
concurrent abort of the (sub)transaction but there is no mechanism to
skip decoding changes of transactions that are known to already be
aborted. With the attached WIP patch, we check CLOG when decoding the
transaction for the first time. If it's already known to be aborted,
we skip collecting decoded changes of such transactions. That way,
when logical replication is behind or restarts, we don't need to
decode large transactions that have already aborted, which helps improve
the decoding performance.

Feedback is very welcome.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
skip_decoding_already_aborted_txn.patch	application/octet-stream	11.8 KB

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2023-06-10 20:31:17
Message-ID:	20230610203117.u7syv4zzhcekhwjk@awork3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2023-06-09 14:16:44 +0900, Masahiko Sawada wrote:
> In logical decoding, we don't need to collect decoded changes of
> aborted transactions. While streaming changes, we can detect
> concurrent abort of the (sub)transaction but there is no mechanism to
> skip decoding changes of transactions that are known to already be
> aborted. With the attached WIP patch, we check CLOG when decoding the
> transaction for the first time. If it's already known to be aborted,
> we skip collecting decoded changes of such transactions. That way,
> when logical replication is behind or restarts, we don't need to
> decode large transactions that have already aborted, which helps improve
> the decoding performance.

It's very easy to get uses of TransactionIdDidAbort() wrong. For one, it won't
return true when a transaction was implicitly aborted due to a crash /
restart. You're also supposed to use it only after a preceding
TransactionIdIsInProgress() call.

I'm not sure there are issues with not checking TransactionIdIsInProgress()
first in this case, but I'm also not sure there aren't.

A separate issue is that TransactionIdDidAbort() can end up being very slow if
a lot of transactions are in progress concurrently. As soon as the clog
buffers are extended, all time is spent copying pages from the kernel
page cache. I'd not at all be surprised if this change causes a substantial
slowdown in workloads with lots of small transactions, where most transactions
commit.

Greetings,

Andres Freund

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2023-06-13 08:35:45
Message-ID:	CAD21AoBQQtkgeLPMG+JPKHzOi5aksp4_cVnbXEHzQQjb4cOaAw@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2023-06-09 14:16:44 +0900, Masahiko Sawada wrote:
> > In logical decoding, we don't need to collect decoded changes of
> > aborted transactions. While streaming changes, we can detect
> > concurrent abort of the (sub)transaction but there is no mechanism to
> > skip decoding changes of transactions that are known to already be
> > aborted. With the attached WIP patch, we check CLOG when decoding the
> > transaction for the first time. If it's already known to be aborted,
> > we skip collecting decoded changes of such transactions. That way,
> > when logical replication is behind or restarts, we don't need to
> > decode large transactions that have already aborted, which helps improve
> > the decoding performance.
>

Thank you for the comment.

> It's very easy to get uses of TransactionIdDidAbort() wrong. For one, it won't
> return true when a transaction was implicitly aborted due to a crash /
> restart. You're also supposed to use it only after a preceding
> TransactionIdIsInProgress() call.
>
> I'm not sure there are issues with not checking TransactionIdIsInProgress()
> first in this case, but I'm also not sure there aren't.

Yeah, it seems to be better to use !TransactionIdDidCommit() with a
preceding TransactionIdIsInProgress() check.

>
> A separate issue is that TransactionIdDidAbort() can end up being very slow if
> a lot of transactions are in progress concurrently. As soon as the clog
> buffers are extended, all time is spent copying pages from the kernel
> page cache. I'd not at all be surprised if this change causes a substantial
> slowdown in workloads with lots of small transactions, where most transactions
> commit.
>

Indeed. So it should check the transaction status less frequently.
Skipping the collection of decoded changes of small transactions
doesn't bring much benefit anyway. Another idea is to check the status
of only large transactions. That is, when the size of decoded changes
of a transaction exceeds logical_decoding_work_mem, we check its
status and, if it has aborted, mark it as aborted, free its changes
decoded so far, and skip further collection.
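
A rough C sketch of this idea, under simplifying assumptions: ToyTXN, toy_add_change() and the callback type are hypothetical, the work_mem argument stands in for logical_decoding_work_mem, and evicting a transaction that is not known to be aborted is modeled by merely counting the bytes that would have been spilled or streamed. The costly status check only runs once a transaction's in-memory changes exceed the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the reorder buffer types. */
typedef struct ToyChange
{
    struct ToyChange *next;
    size_t size;
} ToyChange;

typedef struct ToyTXN
{
    ToyChange *changes;      /* decoded changes held in memory */
    size_t size;             /* total bytes of those changes */
    size_t spilled_bytes;    /* bytes that would have gone to disk/downstream */
    bool aborted;            /* abort already detected: stop collecting */
    int status_checks;       /* how often the costly status check ran */
} ToyTXN;

/* Status check callback; returns true if the transaction is known aborted. */
typedef bool (*ToyAbortCheck)(void *arg);

/* Example callback: reads a flag through the opaque pointer. */
static bool toy_flag_check(void *arg)
{
    return *(const bool *) arg;
}

/* Free all in-memory changes of the transaction. */
static void toy_free_changes(ToyTXN *txn)
{
    while (txn->changes != NULL)
    {
        ToyChange *next = txn->changes->next;

        free(txn->changes);
        txn->changes = next;
    }
    txn->size = 0;
}

/* Queue one decoded change, checking the status only for large transactions. */
static void toy_add_change(ToyTXN *txn, size_t size, size_t work_mem,
                           ToyAbortCheck is_aborted, void *arg)
{
    ToyChange *c;

    assert(txn != NULL);
    if (txn->aborted)
        return;                 /* known aborted: skip further collection */

    c = malloc(sizeof(*c));
    if (c == NULL)
        abort();
    c->size = size;
    c->next = txn->changes;
    txn->changes = c;
    txn->size += size;

    if (txn->size <= work_mem)
        return;                 /* still small: no status check */

    txn->status_checks++;
    if (is_aborted(arg))
        txn->aborted = true;    /* mark it and drop what we have */
    else
        txn->spilled_bytes += txn->size;    /* would be spilled or streamed */
    toy_free_changes(txn);
}
```

With this shape, a workload of many small committed transactions never reaches the check, which was Andres's concern; a large committed transaction pays for one check per eviction.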

Regards

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2023-06-15 10:50:12
Message-ID:	CAA4eK1+_412vXR+JwK_5_pPZm4zGr7qey_cVAS+r4OX5G_4wkg@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jun 13, 2023 at 2:06 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > A separate issue is that TransactionIdDidAbort() can end up being very slow if
> > a lot of transactions are in progress concurrently. As soon as the clog
> > buffers are extended, all time is spent copying pages from the kernel
> > page cache. I'd not at all be surprised if this change causes a substantial
> > slowdown in workloads with lots of small transactions, where most transactions
> > commit.
> >
>
> Indeed. So it should check the transaction status less frequently.
> Skipping the collection of decoded changes of small transactions
> doesn't bring much benefit anyway. Another idea is to check the status
> of only large transactions. That is, when the size of decoded changes
> of a transaction exceeds logical_decoding_work_mem, we check its
> status and, if it has aborted, mark it as aborted, free its changes
> decoded so far, and skip further collection.
>

Your idea might work for large transactions, but I have not come across
reports of this being a problem. Have you seen any such reports, and can
we measure how much large transactions benefit? I ask because we already
handle concurrent aborts during system catalog scans, and that might
sometimes help for large transactions.

--
With Regards,
Amit Kapila.

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2023-06-21 02:41:52
Message-ID:	CAD21AoCRH3k1azY1+tKX0HYi+4r=3oQ_p8fmPT0D66h0KJQs2Q@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jun 15, 2023 at 7:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Jun 13, 2023 at 2:06 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > >
> > > A separate issue is that TransactionIdDidAbort() can end up being very slow if
> > > a lot of transactions are in progress concurrently. As soon as the clog
> > > buffers are extended, all time is spent copying pages from the kernel
> > > page cache. I'd not at all be surprised if this change causes a substantial
> > > slowdown in workloads with lots of small transactions, where most transactions
> > > commit.
> > >
> >
> > Indeed. So it should check the transaction status less frequently.
> > Skipping the collection of decoded changes of small transactions
> > doesn't bring much benefit anyway. Another idea is to check the status
> > of only large transactions. That is, when the size of decoded changes
> > of a transaction exceeds logical_decoding_work_mem, we check its
> > status and, if it has aborted, mark it as aborted, free its changes
> > decoded so far, and skip further collection.
> >
>
> Your idea might work for large transactions, but I have not come across
> reports of this being a problem. Have you seen any such reports, and can
> we measure how much large transactions benefit? I ask because we already
> handle concurrent aborts during system catalog scans, and that might
> sometimes help for large transactions.

I've heard there was a case where a user had 29 million deletes in a
single transaction, each one wrapped in a savepoint and rolled back,
which led to 11TB of spill files. If decoding such a large transaction
fails for some reason (e.g., the disk fills up), it would retry
decoding the same transaction again and again.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2023-06-22 09:07:42
Message-ID:	CAA4eK1KVw_a2m82JWtd__oMGWnANM=-e4gyCRhHMAZ+XUbj+rA@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jun 21, 2023 at 8:12 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Jun 15, 2023 at 7:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Jun 13, 2023 at 2:06 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Sun, Jun 11, 2023 at 5:31 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > >
> > > > A separate issue is that TransactionIdDidAbort() can end up being very slow if
> > > > a lot of transactions are in progress concurrently. As soon as the clog
> > > > buffers are extended, all time is spent copying pages from the kernel
> > > > page cache. I'd not at all be surprised if this change causes a substantial
> > > > slowdown in workloads with lots of small transactions, where most transactions
> > > > commit.
> > > >
> > >
> > > Indeed. So it should check the transaction status less frequently.
> > > Skipping the collection of decoded changes of small transactions
> > > doesn't bring much benefit anyway. Another idea is to check the status
> > > of only large transactions. That is, when the size of decoded changes
> > > of a transaction exceeds logical_decoding_work_mem, we check its
> > > status and, if it has aborted, mark it as aborted, free its changes
> > > decoded so far, and skip further collection.
> > >
> >
> > Your idea might work for large transactions, but I have not come across
> > reports of this being a problem. Have you seen any such reports, and can
> > we measure how much large transactions benefit? I ask because we already
> > handle concurrent aborts during system catalog scans, and that might
> > sometimes help for large transactions.
>
> I've heard there was a case where a user had 29 million deletes in a
> single transaction, each one wrapped in a savepoint and rolled back,
> which led to 11TB of spill files. If decoding such a large transaction
> fails for some reason (e.g., the disk fills up), it would retry
> decoding the same transaction again and again.
>

I was wondering why the existing handling of concurrent aborts doesn't
cover such a case, and it seems that we check for them only on catalog
access. In your case, however, the user is probably accessing the same
relation without any concurrent DDL on that table, so catalog access
would just be a cache lookup. Your idea of checking for aborts whenever
the decoded changes exceed logical_decoding_work_mem should work for
such cases.

--
With Regards,
Amit Kapila.

From:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2023-06-23 03:38:57
Message-ID:	CAFiTN-vfEOGdbvOjvWSKiMuPykbF9bQBBDUfXEmFqARU+gAifQ@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> Hi,
>
> In logical decoding, we don't need to collect decoded changes of
> aborted transactions. While streaming changes, we can detect
> concurrent abort of the (sub)transaction but there is no mechanism to
> skip decoding changes of transactions that are known to already be
> aborted. With the attached WIP patch, we check CLOG when decoding the
> transaction for the first time. If it's already known to be aborted,
> we skip collecting decoded changes of such transactions. That way,
> when logical replication is behind or restarts, we don't need to
> decode large transactions that have already aborted, which helps improve
> the decoding performance.
>
+1 for the idea of checking the transaction status only when we need
to flush it to disk or send it downstream (if streaming of in-progress
transactions is enabled). Although this check is costly, since we plan
to do it only for large transactions, it is worth it if we can
occasionally avoid disk or network I/O for aborted transactions.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2023-07-03 01:45:52
Message-ID:	CAD21AoAmN0O6FUUwTEDNoGs5ejjj0wABeqR791b479J4jn4xAA@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > In logical decoding, we don't need to collect decoded changes of
> > aborted transactions. While streaming changes, we can detect
> > concurrent abort of the (sub)transaction but there is no mechanism to
> > skip decoding changes of transactions that are known to already be
> > aborted. With the attached WIP patch, we check CLOG when decoding the
> > transaction for the first time. If it's already known to be aborted,
> > we skip collecting decoded changes of such transactions. That way,
> > when logical replication is behind or restarts, we don't need to
> > decode large transactions that have already aborted, which helps improve
> > the decoding performance.
> >
> +1 for the idea of checking the transaction status only when we need
> to flush it to disk or send it downstream (if streaming of in-progress
> transactions is enabled). Although this check is costly, since we plan
> to do it only for large transactions, it is worth it if we can
> occasionally avoid disk or network I/O for aborted transactions.
>

Thanks.

I've attached the updated patch. With this patch, we check the
transaction status only for large transactions, at eviction time. For
regression test purposes, I disable this transaction status check when
logical_replication_mode is set to 'immediate'.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v2-0001-Skip-decoding-already-aborted-transactions.patch	application/octet-stream	9.1 KB

From:	vignesh C <vignesh21(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2023-10-03 10:24:17
Message-ID:	CALDaNm0b309D4vnjNYR8vwAvXvHyckPXX90t5svYcPdjdndmiw@mail.gmail.com
Lists:	pgsql-hackers

On Mon, 3 Jul 2023 at 07:16, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > Hi,
> > >
> > > In logical decoding, we don't need to collect decoded changes of
> > > aborted transactions. While streaming changes, we can detect
> > > concurrent abort of the (sub)transaction but there is no mechanism to
> > > skip decoding changes of transactions that are known to already be
> > > aborted. With the attached WIP patch, we check CLOG when decoding the
> > > transaction for the first time. If it's already known to be aborted,
> > > we skip collecting decoded changes of such transactions. That way,
> > > when logical replication is behind or restarts, we don't need to
> > > decode large transactions that have already aborted, which helps improve
> > > the decoding performance.
> > >
> > +1 for the idea of checking the transaction status only when we need
> > to flush it to disk or send it downstream (if streaming of in-progress
> > transactions is enabled). Although this check is costly, since we plan
> > to do it only for large transactions, it is worth it if we can
> > occasionally avoid disk or network I/O for aborted transactions.
> >
>
> Thanks.
>
> I've attached the updated patch. With this patch, we check the
> transaction status only for large transactions, at eviction time. For
> regression test purposes, I disable this transaction status check when
> logical_replication_mode is set to 'immediate'.

Maybe some changes are missing from the patch; compiling it gives the
following errors:
reorderbuffer.c: In function ‘ReorderBufferCheckTXNAbort’:
reorderbuffer.c:3584:22: error: ‘logical_replication_mode’ undeclared (first use in this function)
3584 | if (unlikely(logical_replication_mode == LOGICAL_REP_MODE_IMMEDIATE))
| ^~~~~~~~~~~~~~~~~~~~~~~~

Regards,
Vignesh

From:	vignesh C <vignesh21(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-02-01 15:47:59
Message-ID:	CALDaNm00kAVenyr7sMCkgFo1dhfj-rGrg4EVXHZfoie5T=gZPg@mail.gmail.com
Lists:	pgsql-hackers

On Tue, 3 Oct 2023 at 15:54, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Mon, 3 Jul 2023 at 07:16, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > In logical decoding, we don't need to collect decoded changes of
> > > > aborted transactions. While streaming changes, we can detect
> > > > concurrent abort of the (sub)transaction but there is no mechanism to
> > > > skip decoding changes of transactions that are known to already be
> > > > aborted. With the attached WIP patch, we check CLOG when decoding the
> > > > transaction for the first time. If it's already known to be aborted,
> > > > we skip collecting decoded changes of such transactions. That way,
> > > > when logical replication is behind or restarts, we don't need to
> > > > decode large transactions that have already aborted, which helps improve
> > > > the decoding performance.
> > > >
> > > +1 for the idea of checking the transaction status only when we need
> > > to flush it to disk or send it downstream (if streaming of in-progress
> > > transactions is enabled). Although this check is costly, since we plan
> > > to do it only for large transactions, it is worth it if we can
> > > occasionally avoid disk or network I/O for aborted transactions.
> > >
> >
> > Thanks.
> >
> > I've attached the updated patch. With this patch, we check the
> > transaction status only for large transactions, at eviction time. For
> > regression test purposes, I disable this transaction status check when
> > logical_replication_mode is set to 'immediate'.
>
> Maybe some changes are missing from the patch; compiling it gives the
> following errors:
> reorderbuffer.c: In function ‘ReorderBufferCheckTXNAbort’:
> reorderbuffer.c:3584:22: error: ‘logical_replication_mode’ undeclared (first use in this function)
> 3584 | if (unlikely(logical_replication_mode == LOGICAL_REP_MODE_IMMEDIATE))
> | ^~~~~~~~~~~~~~~~~~~~~~~~

With no update to the thread and the compilation still failing, I'm
marking this as returned with feedback. Please feel free to resubmit
to the next CF when there is a new version of the patch.

Regards,
Vignesh

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	vignesh C <vignesh21(at)gmail(dot)com>
Cc:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-02-15 05:49:55
Message-ID:	CAD21AoC_9scyNvpVo5cUXLCKBUQsHKvT_h+=RmHKGCZbd4d4LQ@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Feb 2, 2024 at 12:48 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Tue, 3 Oct 2023 at 15:54, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Mon, 3 Jul 2023 at 07:16, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > >
> > > > On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > In logical decoding, we don't need to collect decoded changes of
> > > > > aborted transactions. While streaming changes, we can detect
> > > > > concurrent abort of the (sub)transaction but there is no mechanism to
> > > > > skip decoding changes of transactions that are known to already be
> > > > > aborted. With the attached WIP patch, we check CLOG when decoding the
> > > > > transaction for the first time. If it's already known to be aborted,
> > > > > we skip collecting decoded changes of such transactions. That way,
> > > > > when logical replication is behind or restarts, we don't need to
> > > > > decode large transactions that have already aborted, which helps improve
> > > > > the decoding performance.
> > > > >
> > > > +1 for the idea of checking the transaction status only when we need
> > > > to flush it to disk or send it downstream (if streaming of in-progress
> > > > transactions is enabled). Although this check is costly, since we plan
> > > > to do it only for large transactions, it is worth it if we can
> > > > occasionally avoid disk or network I/O for aborted transactions.
> > > >
> > >
> > > Thanks.
> > >
> > > I've attached the updated patch. With this patch, we check the
> > > transaction status only for large transactions, at eviction time. For
> > > regression test purposes, I disable this transaction status check when
> > > logical_replication_mode is set to 'immediate'.
> >
> > Maybe some changes are missing from the patch; compiling it gives the
> > following errors:
> > reorderbuffer.c: In function ‘ReorderBufferCheckTXNAbort’:
> > reorderbuffer.c:3584:22: error: ‘logical_replication_mode’ undeclared (first use in this function)
> > 3584 | if (unlikely(logical_replication_mode == LOGICAL_REP_MODE_IMMEDIATE))
> > | ^~~~~~~~~~~~~~~~~~~~~~~~
>
> With no update to the thread and the compilation still failing, I'm
> marking this as returned with feedback. Please feel free to resubmit
> to the next CF when there is a new version of the patch.
>

I've resumed working on this item and attached a new version of the patch.

I rebased the patch onto the current HEAD and updated the comments and
commit message. The patch is straightforward and I'm somewhat
satisfied with it, but I'm thinking of adding some tests for it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v3-0001-Skip-logical-decoding-of-already-aborted-transact.patch	application/x-patch	9.4 KB

From:	Ajin Cherian <itsajin(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-03-15 04:20:51
Message-ID:	CAFPTHDbXZwj+_+ybnuZ2E7gi8ERb3Pqw5=RVUo2kAQP2xV5hnQ@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Mar 15, 2024 at 3:17 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

>
> I've resumed working on this item and attached a new version of the patch.
>
> I rebased the patch onto the current HEAD and updated the comments and
> commit message. The patch is straightforward and I'm somewhat
> satisfied with it, but I'm thinking of adding some tests for it.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com

I just had a look at the patch. It no longer applies because a recent
commit removed a header. Overall the patch looks fine, and I didn't find
any issues. Some cosmetic comments:
In ReorderBufferCheckTXNAbort():
+ /* Quick return if we've already knew the transaction status */
+ if (txn->aborted)
+ return true;

knew/know

/*
+ * If logical_replication_mode is "immediate", we don't check the
+ * transaction status so the caller always process this transaction.
+ */
+ if (debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE)
+ return false;

/process/processes

regards,
Ajin Cherian
Fujitsu Australia

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Ajin Cherian <itsajin(at)gmail(dot)com>
Cc:	vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-03-18 08:49:41
Message-ID:	CAD21AoDTMy_gUHB9s=7bG0Q7hzCd-WuqzdeYtMSFVVwBVyYdFQ@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Mar 15, 2024 at 1:21 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
>
>
> On Fri, Mar 15, 2024 at 3:17 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>
>>
>> I've resumed working on this item and attached a new version of the patch.
>>
>> I rebased the patch onto the current HEAD and updated the comments and
>> commit message. The patch is straightforward and I'm somewhat
>> satisfied with it, but I'm thinking of adding some tests for it.
>>
>> Regards,
>>
>> --
>> Masahiko Sawada
>> Amazon Web Services: https://aws.amazon.com
>
>
> I just had a look at the patch. It no longer applies because a recent commit removed a header. Overall the patch looks fine, and I didn't find any issues. Some cosmetic comments:

Thank you for your review comments.

> In ReorderBufferCheckTXNAbort():
> + /* Quick return if we've already knew the transaction status */
> + if (txn->aborted)
> + return true;
>
> knew/know

Maybe it should be "known"?

>
> /*
> + * If logical_replication_mode is "immediate", we don't check the
> + * transaction status so the caller always process this transaction.
> + */
> + if (debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE)
> + return false;
>
> /process/processes
>

Fixed.

In addition to these changes, I've made some changes to the latest
patch. Here is the summary:

- Use txn_flags field to record the transaction status instead of two
'committed' and 'aborted' flags.
- Add regression tests.
- Update commit message.
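
A minimal C sketch of recording the status as bits in one flags field, in the style of the existing txn_flags bits such as RBTXN_IS_STREAMED. The TOY_* names, their values and toy_txn_set_status() are hypothetical and not the identifiers used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real txn_flags already holds other bits. */
#define TOY_TXN_IS_STREAMED   0x0001
#define TOY_TXN_IS_COMMITTED  0x0002
#define TOY_TXN_IS_ABORTED    0x0004

typedef struct ToyTXN
{
    uint32_t txn_flags;   /* one bitmask instead of separate bool fields */
} ToyTXN;

#define toy_txn_is_committed(txn) (((txn)->txn_flags & TOY_TXN_IS_COMMITTED) != 0)
#define toy_txn_is_aborted(txn)   (((txn)->txn_flags & TOY_TXN_IS_ABORTED) != 0)

/* Record the final status once; a transaction cannot be both. */
static void toy_txn_set_status(ToyTXN *txn, bool committed)
{
    assert(!toy_txn_is_committed(txn) && !toy_txn_is_aborted(txn));
    txn->txn_flags |= committed ? TOY_TXN_IS_COMMITTED : TOY_TXN_IS_ABORTED;
}
```

A side benefit of the bitmask is that unrelated bits, like the streamed bit in the test case, are left untouched when the status is recorded, and the assertion documents that committed and aborted are mutually exclusive.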

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v4-0001-Skip-logical-decoding-of-already-aborted-transact.patch	application/octet-stream	13.3 KB

From:	Ajin Cherian <itsajin(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-03-27 11:49:14
Message-ID:	CAFPTHDZ=EA7VpjrWa0wMcXhubuOM82-g1b-0d5JZF2FjzLtDrQ@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Mar 18, 2024 at 7:50 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

>
> In addition to these changes, I've made some changes to the latest
> patch. Here is the summary:
>
> - Use txn_flags field to record the transaction status instead of two
> 'committed' and 'aborted' flags.
> - Add regression tests.
> - Update commit message.
>
> Regards,
>
>
Hi Sawada-san,

Thanks for the updated patch. Some comments:

1.
+ * already aborted, we discards all changes accumulated so far and ignore
+ * future changes, and return true. Otherwise return false.

we discards/we discard

2. In function ReorderBufferCheckTXNAbort(): I haven't tested this, but I
wonder how prepared transactions would be treated, since they are neither
committed nor in progress.

regards,
Ajin Cherian
Fujitsu Australia

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Ajin Cherian <itsajin(at)gmail(dot)com>
Cc:	vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-03-27 12:22:43
Message-ID:	CAD21AoAZJODNYq2LU9PT_Y+RKm50cV-D3m+Bpt7jT8pwgQ2Jvg@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Mar 27, 2024 at 8:49 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
>
>
> On Mon, Mar 18, 2024 at 7:50 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>
>>
>> In addition to these changes, I've made some changes to the latest
>> patch. Here is the summary:
>>
>> - Use txn_flags field to record the transaction status instead of two
>> 'committed' and 'aborted' flags.
>> - Add regression tests.
>> - Update commit message.
>>
>> Regards,
>>
>
> Hi Sawada-san,
>
> Thanks for the updated patch. Some comments:

Thank you for the review comments!

>
> 1.
> + * already aborted, we discards all changes accumulated so far and ignore
> + * future changes, and return true. Otherwise return false.
>
> we discards/we discard

Will fix it.

>
> 2. In function ReorderBufferCheckTXNAbort(): I haven't tested this, but I wonder how prepared transactions would be considered, since they are neither committed nor in progress.

The transaction that is prepared but not resolved yet is considered as
in-progress.
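A minimal sketch of the point above, assuming a simplified status model. The `ToyXactState` enum and the `toy_*` functions are hypothetical stand-ins, not PostgreSQL's `TransactionIdIsInProgress()` / `TransactionIdDidCommit()`. It only shows why a prepared but unresolved transaction never reaches the "aborted" outcome: it reports as in progress, so the abort check returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified transaction states (not PostgreSQL's real API). */
typedef enum
{
    TOY_IN_PROGRESS,
    TOY_PREPARED,       /* PREPARE TRANSACTION done, not yet resolved */
    TOY_COMMITTED,
    TOY_ABORTED
} ToyXactState;

/* A prepared transaction still counts as running until it is resolved. */
static bool
toy_is_in_progress(ToyXactState s)
{
    return s == TOY_IN_PROGRESS || s == TOY_PREPARED;
}

static bool
toy_did_commit(ToyXactState s)
{
    return s == TOY_COMMITTED;
}

/* Returns true only when the transaction is known to be aborted. */
static bool
toy_check_txn_abort(ToyXactState s)
{
    assert(s <= TOY_ABORTED);

    if (toy_is_in_progress(s))
        return false;           /* includes prepared-but-unresolved */
    if (toy_did_commit(s))
        return false;
    return true;                /* neither running nor committed */
}
```

In this model, `toy_check_txn_abort(TOY_PREPARED)` returns false and `toy_check_txn_abort(TOY_ABORTED)` returns true; any state that is neither in progress nor committed is treated as aborted.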

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-06-12 02:41:02
Message-ID:	CAHut+Pu+5PLiA57rgepPaMPc5LpEDd9JULk2TkmWh32SHuxbQw@mail.gmail.com
Lists:	pgsql-hackers

Hi, here are some review comments for your patch v4-0001.

======
contrib/test_decoding/sql/stats.sql

1.
Huh? The test fails because the "expected results" file for these new
tests is missing from the patch.

======
.../replication/logical/reorderbuffer.c

2.
static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
- bool txn_prepared);
+ bool txn_prepared, bool mark_streamed);

IIUC this new 'mark_streamed' parameter is more like a prerequisite
for the other conditions to decide to mark the tx as streamed -- i.e.
it is more like 'can_mark_streamed', so I felt the name should be
changed to be like that (everywhere it is used).

~~~

3. ReorderBufferTruncateTXN

- * 'txn_prepared' indicates that we have decoded the transaction at prepare
- * time.
+ * If mark_streamed is true, we could mark the transaction as streamed.
+ *
+ * 'streaming_txn' indicates that the given transaction is a streaming transaction.
*/
static void
-ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared)
+ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared,
+ bool mark_streamed)

What's that new comment about 'streaming_txn' for? It seemed unrelated
to the patch code.

~~~

4.
/*
* Mark the transaction as streamed.
*
* The top-level transaction, is marked as streamed always, even if it
* does not contain any changes (that is, when all the changes are in
* subtransactions).
*
* For subtransactions, we only mark them as streamed when there are
* changes in them.
*
* We do it this way because of aborts - we don't want to send aborts for
* XIDs the downstream is not aware of. And of course, it always knows
* about the toplevel xact (we send the XID in all messages), but we never
* stream XIDs of empty subxacts.
*/
if (mark_streamed && (!txn_prepared) &&
(rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;

With the patch introduction of the new parameter, I felt this code
might be better if it was refactored as follows:

/* Mark the transaction as streamed, if appropriate. */
if (can_mark_streamed)
{
/*
... large comment
*/
if ((!txn_prepared) && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;
}

~~~

5. ReorderBufferPrepare

- if (txn->concurrent_abort && !rbtxn_is_streamed(txn))
+ if (!txn_aborted && rbtxn_did_abort(txn) && !rbtxn_is_streamed(txn))
rb->prepare(rb, txn, txn->final_lsn);

Maybe I misunderstood this logic, but won't a "concurrent abort" cause
your new Assert added in ReorderBufferProcessTXN to fail?

+ /* Update transaction status */
+ Assert((curtxn->txn_flags & (RBTXN_COMMITTED | RBTXN_ABORTED)) == 0);

~~~

6. ReorderBufferCheckTXNAbort

+ /* Check the transaction status using CLOG lookup */
+ if (TransactionIdIsInProgress(txn->xid))
+ return false;
+
+ if (TransactionIdDidCommit(txn->xid))
+ {
+ /*
+ * Remember the transaction is committed so that we can skip CLOG
+ * check next time, avoiding the pressure on CLOG lookup.
+ */
+ txn->txn_flags |= RBTXN_COMMITTED;
+ return false;
+ }

IIUC the purpose of the TransactionIdDidCommit() was to avoid the
overhead of calling the TransactionIdIsInProgress(). So, shouldn't the
order of these checks be swapped? Otherwise, there might be 1 extra
unnecessary call to TransactionIdIsInProgress() next time.

======
src/include/replication/reorderbuffer.h

7.
#define RBTXN_PREPARE 0x0040
#define RBTXN_SKIPPED_PREPARE 0x0080
#define RBTXN_HAS_STREAMABLE_CHANGE 0x0100
+#define RBTXN_COMMITTED 0x0200
+#define RBTXN_ABORTED 0x0400

For consistency with the existing bitmask names, I guess these should be named:
- RBTXN_COMMITTED --> RBTXN_IS_COMMITTED
- RBTXN_ABORTED --> RBTXN_IS_ABORTED

~~~

8.
Similarly, IMO the macros should have the same names as the bitmasks,
like the other nearby ones generally seem to.

rbtxn_did_commit --> rbtxn_is_committed
rbtxn_did_abort --> rbtxn_is_aborted

======

9.
Also, attached is a top-up patch for other cosmetic nitpicks:
- comment wording
- typos in comments
- excessive or missing blank lines
- etc.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment	Content-Type	Size
20240612_PS_nitpicks_for_v4.txt	text/plain	3.7 KB

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-10-29 20:24:15
Message-ID:	CAD21AoDJE-bLdxt9T_z1rw74RN=E0n0+esYU0eo+-_P32EbuVg@mail.gmail.com
Lists:	pgsql-hackers

Sorry for the late reply.

On Tue, Jun 11, 2024 at 7:41 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi, here are some review comments for your patch v4-0001.

Thank you for reviewing the patch!

>
> ======
> contrib/test_decoding/sql/stats.sql
>
> 1.
> Huh? The test fails because the "expected results" file for these new
> tests is missing from the patch.

Fixed.

>
> ======
> .../replication/logical/reorderbuffer.c
>
> 2.
> static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
> - bool txn_prepared);
> + bool txn_prepared, bool mark_streamed);
>
> IIUC this new 'mark_streamed' parameter is more like a prerequisite
> for the other conditions to decide to mark the tx as streamed -- i.e.
> it is more like 'can_mark_streamed', so I felt the name should be
> changed to be like that (everywhere it is used).

Agreed. I think 'txn_streaming' sounds better and is consistent with
'txn_prepared'.

>
> ~~~
>
> 3. ReorderBufferTruncateTXN
>
> - * 'txn_prepared' indicates that we have decoded the transaction at prepare
> - * time.
> + * If mark_streamed is true, we could mark the transaction as streamed.
> + *
> + * 'streaming_txn' indicates that the given transaction is a streaming transaction.
> */
> static void
> -ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared)
> +ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared,
> + bool mark_streamed)
>
> ~
>
> What's that new comment about 'streaming_txn' for? It seemed unrelated
> to the patch code.

Removed.

>
> ~~~
>
> 4.
> /*
> * Mark the transaction as streamed.
> *
> * The top-level transaction, is marked as streamed always, even if it
> * does not contain any changes (that is, when all the changes are in
> * subtransactions).
> *
> * For subtransactions, we only mark them as streamed when there are
> * changes in them.
> *
> * We do it this way because of aborts - we don't want to send aborts for
> * XIDs the downstream is not aware of. And of course, it always knows
> * about the toplevel xact (we send the XID in all messages), but we never
> * stream XIDs of empty subxacts.
> */
> if (mark_streamed && (!txn_prepared) &&
> (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
> txn->txn_flags |= RBTXN_IS_STREAMED;
>
> ~~
>
> With the patch introduction of the new parameter, I felt this code
> might be better if it was refactored as follows:
>
> /* Mark the transaction as streamed, if appropriate. */
> if (can_mark_streamed)
> {
> /*
> ... large comment
> */
> if ((!txn_prepared) && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
> txn->txn_flags |= RBTXN_IS_STREAMED;
> }

I think we don't necessarily need to make nested if statements just
for comments.

>
> ~~~
>
> 5. ReorderBufferPrepare
>
> - if (txn->concurrent_abort && !rbtxn_is_streamed(txn))
> + if (!txn_aborted && rbtxn_did_abort(txn) && !rbtxn_is_streamed(txn))
> rb->prepare(rb, txn, txn->final_lsn);
>
> ~
>
> Maybe I misunderstood this logic, but won't a "concurrent abort" cause
> your new Assert added in ReorderBufferProcessTXN to fail?
>
> + /* Update transaction status */
> + Assert((curtxn->txn_flags & (RBTXN_COMMITTED | RBTXN_ABORTED)) == 0);
>

I changed txn_flags checks, which should cover your concerns.

> ~~~
>
> 6. ReorderBufferCheckTXNAbort
>
> + /* Check the transaction status using CLOG lookup */
> + if (TransactionIdIsInProgress(txn->xid))
> + return false;
> +
> + if (TransactionIdDidCommit(txn->xid))
> + {
> + /*
> + * Remember the transaction is committed so that we can skip CLOG
> + * check next time, avoiding the pressure on CLOG lookup.
> + */
> + txn->txn_flags |= RBTXN_COMMITTED;
> + return false;
> + }
>
> IIUC the purpose of the TransactionIdDidCommit() was to avoid the
> overhead of calling the TransactionIdIsInProgress(). So, shouldn't the
> order of these checks be swapped? Otherwise, there might be 1 extra
> unnecessary call to TransactionIdIsInProgress() next time.

I'm not sure I understand your comment. IIUC we should use
TransactionIdDidCommit() with a preceding TransactionIdIsInProgress()
check. Also, I think once we have found that the transaction is committed, we no
longer check the transaction status in CLOG or call
TransactionIdIsInProgress().
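The ordering described above, with the committed result cached in a flag, can be sketched as follows. `ToyTXN`, the `TOY_IS_*` bits, and the lookup counter are hypothetical; the counter exists only to make visible that once the commit flag is set, later calls perform no status lookups at all, which is the point of the reply above about why swapping the two checks would not save a call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits mirroring the names discussed in this thread. */
#define TOY_IS_COMMITTED 0x0200
#define TOY_IS_ABORTED   0x0400

typedef struct ToyTXN
{
    uint32_t flags;
    int      actual_state;      /* simulated: 0 = running, 1 = committed, 2 = aborted */
} ToyTXN;

/* Counts simulated status lookups so the effect of caching is observable. */
static int toy_lookups = 0;

static bool
toy_is_in_progress(const ToyTXN *txn)
{
    toy_lookups++;
    return txn->actual_state == 0;
}

static bool
toy_did_commit(const ToyTXN *txn)
{
    toy_lookups++;
    return txn->actual_state == 1;
}

/*
 * Returns true if the transaction is known to be aborted.  The in-progress
 * test runs before the commit test, and a known outcome is cached in the
 * flags so later calls skip both lookups.
 */
static bool
toy_check_txn_abort(ToyTXN *txn)
{
    if (txn->flags & TOY_IS_COMMITTED)
        return false;           /* cached: no lookups */
    if (txn->flags & TOY_IS_ABORTED)
        return true;            /* cached: no lookups */

    if (toy_is_in_progress(txn))
        return false;

    if (toy_did_commit(txn))
    {
        assert((txn->flags & TOY_IS_ABORTED) == 0);
        txn->flags |= TOY_IS_COMMITTED;
        return false;
    }

    txn->flags |= TOY_IS_ABORTED;
    return true;
}
```

For a committed transaction, the first call costs two lookups and every later call costs none; a still-running transaction costs one lookup per call until its outcome is known.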

>
> ======
> src/include/replication/reorderbuffer.h
>
> 7.
> #define RBTXN_PREPARE 0x0040
> #define RBTXN_SKIPPED_PREPARE 0x0080
> #define RBTXN_HAS_STREAMABLE_CHANGE 0x0100
> +#define RBTXN_COMMITTED 0x0200
> +#define RBTXN_ABORTED 0x0400
>
> For consistency with the existing bitmask names, I guess these should be named:
> - RBTXN_COMMITTED --> RBTXN_IS_COMMITTED
> - RBTXN_ABORTED --> RBTXN_IS_ABORTED

Agreed and changed.

>
> ~~~
>
> 8.
> Similarly, IMO the macros should have the same names as the bitmasks,
> like the other nearby ones generally seem to.
>
> rbtxn_did_commit --> rbtxn_is_committed
> rbtxn_did_abort --> rbtxn_is_aborted

Changed.

>
> ======
>
> 9.
> Also, attached is a top-up patch for other cosmetic nitpicks:
> - comment wording
> - typos in comments
> - excessive or missing blank lines
> - etc.
>

Applied your patch.

I've attached the updated patch. Will register it for the next commit fest.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v5-0001-Skip-logical-decoding-of-already-aborted-transact.patch	application/octet-stream	17.8 KB

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Ajin Cherian <itsajin(at)gmail(dot)com>
Cc:	vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-10-29 20:29:52
Message-ID:	CAD21AoA2HGq9hEAmc+qaP_zxzHhB=B84seE=OmQ2EqfE4e-KXw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Mar 27, 2024 at 4:49 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
>
>
> On Mon, Mar 18, 2024 at 7:50 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>
>>
>> In addition to these changes, I've made some changes to the latest
>> patch. Here is the summary:
>>
>> - Use txn_flags field to record the transaction status instead of two
>> 'committed' and 'aborted' flags.
>> - Add regression tests.
>> - Update commit message.
>>
>> Regards,
>>
>
> Hi Sawada-san,
>
> Thanks for the updated patch. Some comments:
>
> 1.
> + * already aborted, we discards all changes accumulated so far and ignore
> + * future changes, and return true. Otherwise return false.
>
> we discards/we discard

This comment is incorporated into the latest v5 patch I've just sent[1].

>
> 2. In function ReorderBufferCheckTXNAbort(): I haven't tested this, but I wonder how prepared transactions would be considered, since they are neither committed nor in progress.
>

IIUC prepared transactions are considered as in-progress.

Regards,

[1] /message-id/CAD21AoDJE-bLdxt9T_z1rw74RN%3DE0n0%2BesYU0eo%2B-_P32EbuVg%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-11-11 07:24:17
Message-ID:	CAHut+PsJBRpZXJidkanBNb-WiJpAsCVuBmYL8L1GYtFHkOv6ZA@mail.gmail.com
Lists:	pgsql-hackers

Hi Sawada-San, here are some review comments for the patch v5-0001.

======
Commit message.

1.
This commit introduces an additional check to determine if a
transaction is already aborted by a CLOG lookup, so the logical
decoding skips further change also when it doesn't touch system
catalogs.

Is that wording backwards? Is it meant to say:

This commit introduces an additional CLOG lookup check to determine if
a transaction is already aborted, so the ...

======
contrib/test_decoding/sql/stats.sql

2
+SELECT slot_name, spill_txns = 0 AS spill_txn, spill_count = 0 AS spill_count FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot_stats4_twophase';

Why do the SELECT "= 0" like this, instead of just having zeros in the
"expected" results?

======
.../replication/logical/reorderbuffer.c

3.
static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
- bool txn_prepared);
+ bool txn_prepared, bool mark_streamed);

That last parameter name ('mark_streamed') does not match the same
parameter name in this function's definition.

~~~

ReorderBufferTruncateTXN:

4.
if (txn_streaming && (!txn_prepared) &&
(rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
txn->txn_flags |= RBTXN_IS_STREAMED;

if (txn_prepared)
{
~

Since the following condition was already "if (txn_prepared)", would it
be better to remove the "(!txn_prepared)" here and instead just refactor
the code like:

if (txn_prepared)
{
...
}
else if (txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
{
...
}
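A sketch of the suggested if / else-if shape, under simplified assumptions: `ToyStreamTXN` and `TOY_IS_STREAMED` (with an arbitrary bit value) are hypothetical, and the prepared branch is left as a placeholder comment. Because the prepared case is handled first, the streamed-marking branch no longer needs its own `!txn_prepared` test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IS_STREAMED 0x0010  /* hypothetical bit value */

/* Hypothetical stand-in for the fields of ReorderBufferTXN used here. */
typedef struct ToyStreamTXN
{
    bool     is_toptxn;
    int      nentries_mem;
    uint32_t flags;
} ToyStreamTXN;

static void
toy_truncate_txn(ToyStreamTXN *txn, bool txn_prepared, bool txn_streaming)
{
    assert(txn != NULL);

    if (txn_prepared)
    {
        /* Prepared-transaction cleanup would go here; never marked streamed. */
    }
    else if (txn_streaming && (txn->is_toptxn || txn->nentries_mem != 0))
    {
        /*
         * Top-level transactions are always marked streamed; subtransactions
         * only when they had changes, so no abort is sent for an XID the
         * downstream never saw.
         */
        txn->flags |= TOY_IS_STREAMED;
    }
}
```

An empty subtransaction, a prepared transaction, and a non-streaming call all leave the flag unset; only the remaining cases set it.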

~~~

ReorderBufferProcessTXN:

5.
+
+ /* Remember the transaction is aborted */
+ Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0);
+ curtxn->txn_flags |= RBTXN_IS_ABORTED;

Missing period on comment.

~~~

ReorderBufferCheckTXNAbort:

6.
+ * If GUC 'debug_logical_replication_streaming' is "immediate", we don't
+ * check the transaction status, so the caller always processes this
+ * transaction. This is to disable this check for regression tests.
+ */
+static bool
+ReorderBufferCheckTXNAbort(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ /*
+ * If GUC 'debug_logical_replication_streaming' is "immediate", we don't
+ * check the transaction status, so the caller always processes this
+ * transaction.
+ */
+ if (unlikely(debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE))
+ return false;
+

The wording of the sentence "This is to disable..." seemed a bit
confusing. Maybe this area can be simplified by doing the following.

6a.
Change the function comment to say more like below:

When the GUC 'debug_logical_replication_streaming' is set to
"immediate", we don't check the transaction status, meaning the caller
will always process this transaction. This mode is used by regression
tests to avoid unnecessary transaction status checking.

6b.
It is not necessary for this 2nd comment to repeat everything that was
already said in the function comment. A simpler comment here might be
all you need:

SUGGESTION:
Quick return for regression tests.
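A minimal sketch of the suggested comment layout and quick return, assuming a two-valued debug setting. `ToyDebugStreaming`, `toy_debug_streaming`, and `toy_lookup_is_aborted()` are hypothetical stand-ins for `debug_logical_replication_streaming` and the real status lookup; the counter shows that the lookup is skipped entirely in "immediate" mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the debug setting's two values. */
typedef enum
{
    TOY_STREAMING_BUFFERED,
    TOY_STREAMING_IMMEDIATE
} ToyDebugStreaming;

static ToyDebugStreaming toy_debug_streaming = TOY_STREAMING_BUFFERED;

/* Counts simulated status lookups. */
static int toy_status_checks = 0;

/* 'aborted' simulates what the real status lookup would report. */
static bool
toy_lookup_is_aborted(bool aborted)
{
    toy_status_checks++;
    return aborted;
}

/*
 * When the debug setting is "immediate", we don't check the transaction
 * status, meaning the caller always processes this transaction.
 */
static bool
toy_check_txn_abort(bool aborted)
{
    /* Quick return for regression tests. */
    if (toy_debug_streaming == TOY_STREAMING_IMMEDIATE)
        return false;

    return toy_lookup_is_aborted(aborted);
}
```

The full explanation lives once in the function header, and the in-body comment stays a one-line pointer to it.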

~~~

7.
Is it worth mentioning this skipping of the transaction status
check in the docs for this GUC? [1]

======
[1] /docs/devel/runtime-config-developer.html

Kind Regards,
Peter Smith.
Fujitsu Australia.

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-11-11 18:00:06
Message-ID:	CAD21AoDWsWS-1c9BEt2mXw480+zaWqr+36AxhB_LuVZk5N1cFQ@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Nov 10, 2024 at 11:24 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Sawada-San, here are some review comments for the patch v5-0001.
>

Thank you for reviewing the patch!

> ======
> Commit message.
>
> 1.
> This commit introduces an additional check to determine if a
> transaction is already aborted by a CLOG lookup, so the logical
> decoding skips further change also when it doesn't touch system
> catalogs.
>
> ~
>
> Is that wording backwards? Is it meant to say:
>
> This commit introduces an additional CLOG lookup check to determine if
> a transaction is already aborted, so the ...

Fixed.

>
> ======
> contrib/test_decoding/sql/stats.sql
>
> 2
> +SELECT slot_name, spill_txns = 0 AS spill_txn, spill_count = 0 AS spill_count FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot_stats4_twophase';
>
> Why do the SELECT "= 0" like this, instead of just having zeros in the
> "expected" results?

Indeed. I used "=0" like other queries in the same file do, but it
makes sense to me just to have zeros in the expected file. That way,
it would make it a bit easier to investigate in case of failures.

>
> ======
> .../replication/logical/reorderbuffer.c
>
> 3.
> static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
> - bool txn_prepared);
> + bool txn_prepared, bool mark_streamed);
>
> That last parameter name ('mark_streamed') does not match the same
> parameter name in this function's definition.

Fixed.

>
> ~~~
>
> ReorderBufferTruncateTXN:
>
> 4.
> if (txn_streaming && (!txn_prepared) &&
> (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
> txn->txn_flags |= RBTXN_IS_STREAMED;
>
> if (txn_prepared)
> {
> ~
>
> Since the following condition was already "if (txn_prepared)", would it
> be better to remove the "(!txn_prepared)" here and instead just refactor
> the code like:
>
> if (txn_prepared)
> {
> ...
> }
> else if (txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
> {
> ...
> }

Good idea.

>
> ~~~
>
> ReorderBufferProcessTXN:
>
> 5.
> +
> + /* Remember the transaction is aborted */
> + Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0);
> + curtxn->txn_flags |= RBTXN_IS_ABORTED;
>
> Missing period on comment.

Fixed.

>
> ~~~
>
> ReorderBufferCheckTXNAbort:
>
> 6.
> + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't
> + * check the transaction status, so the caller always processes this
> + * transaction. This is to disable this check for regression tests.
> + */
> +static bool
> +ReorderBufferCheckTXNAbort(ReorderBuffer *rb, ReorderBufferTXN *txn)
> +{
> + /*
> + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't
> + * check the transaction status, so the caller always processes this
> + * transaction.
> + */
> + if (unlikely(debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE))
> + return false;
> +
>
> The wording of the sentence "This is to disable..." seemed a bit
> confusing. Maybe this area can be simplified by doing the following.
>
> 6a.
> Change the function comment to say more like below:
>
> When the GUC 'debug_logical_replication_streaming' is set to
> "immediate", we don't check the transaction status, meaning the caller
> will always process this transaction. This mode is used by regression
> tests to avoid unnecessary transaction status checking.
>
> ~
>
> 6b.
> It is not necessary for this 2nd comment to repeat everything that was
> already said in the function comment. A simpler comment here might be
> all you need:
>
> SUGGESTION:
> Quick return for regression tests.

Agreed with the above two comments. Fixed.

>
> ~~~
>
> 7.
> Is it worth mentioning this skipping of the transaction status
> check in the docs for this GUC? [1]

If we want to mention this optimization in the docs, we have to
explain how the optimization works too. I think it's too detailed.

I've attached the updated patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v6-0001-Skip-logical-decoding-of-already-aborted-transact.patch	application/octet-stream	18.8 KB

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-11-12 01:39:34
Message-ID:	CAHut+Psx=acCWfN9Ucs1SP5uCUcS_Y4Luek5NE=bfmp=g+qPeg@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Nov 12, 2024 at 5:00 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> I've attached the updated patch.
>

Hi, here are some review comments for the latest v6-0001.

======
contrib/test_decoding/sql/stats.sql

1.
+INSERT INTO stats_test SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) g(i);

I didn't understand the meaning of "serialize-topbig--1". My guess is
it is a typo that was supposed to say "toobig".

Perhaps there should also be some comment to explain that this
"toobig" stuff was done deliberately like this to exceed
'logical_decoding_work_mem' because that would normally (if it was not
aborted) cause a spill to disk.

~~~

2.
+-- Check stats. We should not spill anything as the transaction is already
+-- aborted.
+SELECT pg_stat_force_next_flush();
+SELECT slot_name, spill_txns AS spill_txn, spill_count AS spill_count FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot_stats4_twophase';
+

Those aliases seem unnecessary: "spill_txns AS spill_txn" and
"spill_count AS spill_count"

======
.../replication/logical/reorderbuffer.c

ReorderBufferCheckTXNAbort:

3.
Other static functions are also declared at the top of this module.
For consistency, shouldn't this be the same?

~~~

4.
+ * We don't mark the transaction as streamed since this function can be
+ * called for non-streamed transactions too.
+ */
+ ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false);
+ ReorderBufferToastReset(rb, txn);

Given the comment says "since this function can be called for
non-streamed transactions too", would it be easier to pass
rbtxn_is_streamed(txn) here instead of 'false', and then just remove
the comment?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

From:	vignesh C <vignesh21(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-11-13 03:28:55
Message-ID:	CALDaNm06VtK_9C1nLBMXNWQKtsX68--pzueHFXtdoUqfHGAX1Q@mail.gmail.com
Lists:	pgsql-hackers

On Mon, 11 Nov 2024 at 23:30, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Sun, Nov 10, 2024 at 11:24 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > Hi Sawada-San, here are some review comments for the patch v5-0001.
> >
>
> Thank you for reviewing the patch!
>
> > ======
> > Commit message.
> >
> > 1.
> > This commit introduces an additional check to determine if a
> > transaction is already aborted by a CLOG lookup, so the logical
> > decoding skips further change also when it doesn't touch system
> > catalogs.
> >
> > ~
> >
> > Is that wording backwards? Is it meant to say:
> >
> > This commit introduces an additional CLOG lookup check to determine if
> > a transaction is already aborted, so the ...
>
> Fixed.
>
> >
> > ======
> > contrib/test_decoding/sql/stats.sql
> >
> > 2
> > +SELECT slot_name, spill_txns = 0 AS spill_txn, spill_count = 0 AS spill_count FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot_stats4_twophase';
> >
> > Why do the SELECT "= 0" like this, instead of just having zeros in the
> > "expected" results?
>
> Indeed. I used "=0" like other queries in the same file do, but it
> makes sense to me just to have zeros in the expected file. That way,
> it would make it a bit easier to investigate in case of failures.
>
> >
> > ======
> > .../replication/logical/reorderbuffer.c
> >
> > 3.
> > static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
> > - bool txn_prepared);
> > + bool txn_prepared, bool mark_streamed);
> >
> > That last parameter name ('mark_streamed') does not match the same
> > parameter name in this function's definition.
>
> Fixed.
>
> >
> > ~~~
> >
> > ReorderBufferTruncateTXN:
> >
> > 4.
> > if (txn_streaming && (!txn_prepared) &&
> > (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
> > txn->txn_flags |= RBTXN_IS_STREAMED;
> >
> > if (txn_prepared)
> > {
> > ~
> >
> > Since the following condition was already "if (txn_prepared)", would it
> > be better to remove the "(!txn_prepared)" here and instead just refactor
> > the code like:
> >
> > if (txn_prepared)
> > {
> > ...
> > }
> > else if (txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
> > {
> > ...
> > }
>
> Good idea.
>
> >
> > ~~~
> >
> > ReorderBufferProcessTXN:
> >
> > 5.
> > +
> > + /* Remember the transaction is aborted */
> > + Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0);
> > + curtxn->txn_flags |= RBTXN_IS_ABORTED;
> >
> > Missing period on comment.
>
> Fixed.
>
> >
> > ~~~
> >
> > ReorderBufferCheckTXNAbort:
> >
> > 6.
> > + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't
> > + * check the transaction status, so the caller always processes this
> > + * transaction. This is to disable this check for regression tests.
> > + */
> > +static bool
> > +ReorderBufferCheckTXNAbort(ReorderBuffer *rb, ReorderBufferTXN *txn)
> > +{
> > + /*
> > + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't
> > + * check the transaction status, so the caller always processes this
> > + * transaction.
> > + */
> > + if (unlikely(debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE))
> > + return false;
> > +
> >
> > The wording of the sentence "This is to disable..." seemed a bit
> > confusing. Maybe this area can be simplified by doing the following.
> >
> > 6a.
> > Change the function comment to say more like below:
> >
> > When the GUC 'debug_logical_replication_streaming' is set to
> > "immediate", we don't check the transaction status, meaning the caller
> > will always process this transaction. This mode is used by regression
> > tests to avoid unnecessary transaction status checking.
> >
> > ~
> >
> > 6b.
> > It is not necessary for this 2nd comment to repeat everything that was
> > already said in the function comment. A simpler comment here might be
> > all you need:
> >
> > SUGGESTION:
> > Quick return for regression tests.
>
> Agreed with the above two comments. Fixed.
>
> >
> > ~~~
> >
> > 7.
> > Is it worth mentioning this skipping of the transaction status
> > check in the docs for this GUC? [1]
>
> If we want to mention this optimization in the docs, we have to
> explain how the optimization works too. I think it's too detailed.
>
> I've attached the updated patch.

A few minor suggestions:
1) Can we use rbtxn_is_committed here?
+ /* Remember the transaction is aborted. */
+ Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0);
+ curtxn->txn_flags |= RBTXN_IS_ABORTED;

2) Similarly here too:
+ /*
+ * Mark the transaction as aborted so we ignore future changes of this
+ * transaction.
+ */
+ Assert((txn->txn_flags & RBTXN_IS_COMMITTED) == 0);
+ txn->txn_flags |= RBTXN_IS_ABORTED;

3) Can we use rbtxn_is_aborted here?
+ /*
+ * Remember the transaction is committed so that we can skip CLOG
+ * check next time, avoiding the pressure on CLOG lookup.
+ */
+ Assert((txn->txn_flags & RBTXN_IS_ABORTED) == 0);
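These suggestions amount to testing flags through accessor macros rather than open-coded bit masks. A self-contained sketch, assuming a stripped-down `ToyFlagTXN` in place of `ReorderBufferTXN`; the flag names and bit values are taken from the review above, while the struct and the `toy_mark_*` helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Names as renamed in v5; bit values as posted in the v4 review. */
#define RBTXN_IS_COMMITTED 0x0200
#define RBTXN_IS_ABORTED   0x0400

/* Hypothetical stand-in for ReorderBufferTXN. */
typedef struct ToyFlagTXN
{
    uint32_t txn_flags;
} ToyFlagTXN;

/* Accessor macros in the style of the existing rbtxn_* helpers. */
#define rbtxn_is_committed(txn) (((txn)->txn_flags & RBTXN_IS_COMMITTED) != 0)
#define rbtxn_is_aborted(txn)   (((txn)->txn_flags & RBTXN_IS_ABORTED) != 0)

/* Remember the transaction is aborted. */
static void
toy_mark_aborted(ToyFlagTXN *txn)
{
    assert(!rbtxn_is_committed(txn));
    txn->txn_flags |= RBTXN_IS_ABORTED;
}

/* Remember the transaction is committed so later checks can skip lookups. */
static void
toy_mark_committed(ToyFlagTXN *txn)
{
    assert(!rbtxn_is_aborted(txn));
    txn->txn_flags |= RBTXN_IS_COMMITTED;
}
```

Using the macros in the assertions keeps the mutual-exclusion invariant readable and ensures that a future change to the bit layout only touches the definitions.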

Regards,
Vignesh

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-11-13 17:58:10
Message-ID:	CAD21AoDtMjbc8YCQiX1K8+RKeahcX2MLt3gwApm5BWGfv14i5A@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Nov 11, 2024 at 5:40 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Tue, Nov 12, 2024 at 5:00 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > I've attached the updated patch.
> >
>
> Hi, here are some review comments for the latest v6-0001.
>
> ======
> contrib/test_decoding/sql/stats.sql
>
> 1.
> +INSERT INTO stats_test SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) g(i);
>
> I didn't understand the meaning of "serialize-topbig--1". My guess is
> it is a typo that was supposed to say "toobig".

Fixed. We have another place using 'topbig', but I think we can leave it.

>
> Perhaps there should also be some comment to explain that this
> "toobig" stuff was done deliberately like this to exceed
> 'logical_decoding_work_mem' because that would normally (if it was not
> aborted) cause a spill to disk.

I think the test comment already mentions that the transaction would
normally be spilled but actually is not:

+-- Execute a transaction that is prepared and aborted. We detect that the
+-- transaction is aborted before spilling changes, and then skip collecting
+-- further changes.

>
> ~~~
>
> 2.
> +-- Check stats. We should not spill anything as the transaction is already
> +-- aborted.
> +SELECT pg_stat_force_next_flush();
> +SELECT slot_name, spill_txns AS spill_txn, spill_count AS spill_count FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot_stats4_twophase';
> +
>
> Those aliases seem unnecessary: "spill_txns AS spill_txn" and
> "spill_count AS spill_count"

Fixed.

>
> ======
> .../replication/logical/reorderbuffer.c
>
> ReorderBufferCheckTXNAbort:
>
> 3.
> Other static functions are also declared at the top of this module.
> For consistency, shouldn't this be the same?

Agreed, added.

>
> ~~~
>
> 4.
> + * We don't mark the transaction as streamed since this function can be
> + * called for non-streamed transactions too.
> + */
> + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false);
> + ReorderBufferToastReset(rb, txn);
>
> Given the comment says "since this function can be called for
> non-streamed transactions too", would it be easier to pass
> rbtxn_is_streamed(txn) here instead of 'false', and then just remove
> the comment?

Agreed.

During further testing, I found some bugs in the previous version of
the patch, so the latest patch includes some other changes in addition
to the review comments I've received so far.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v6-0001-Skip-logical-decoding-of-already-aborted-transact.patch	application/octet-stream	21.6 KB

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	vignesh C <vignesh21(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-11-13 18:00:18
Message-ID:	CAD21AoDRYWVy0R8SfcFGiWytX5PYWEceP+3QYJLFQXEguy88og@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Nov 12, 2024 at 7:29 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Mon, 11 Nov 2024 at 23:30, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Sun, Nov 10, 2024 at 11:24 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > > Hi Sawada-San, here are some review comments for the patch v5-0001.
> > >
> >
> > Thank you for reviewing the patch!
> >
> > > ======
> > > Commit message.
> > >
> > > 1.
> > > This commit introduces an additional check to determine if a
> > > transaction is already aborted by a CLOG lookup, so the logical
> > > decoding skips further change also when it doesn't touch system
> > > catalogs.
> > >
> > > ~
> > >
> > > Is that wording backwards? Is it meant to say:
> > >
> > > This commit introduces an additional CLOG lookup check to determine if
> > > a transaction is already aborted, so the ...
> >
> > Fixed.
> >
> > >
> > > ======
> > > contrib/test_decoding/sql/stats.sql
> > >
> > > 2
> > > +SELECT slot_name, spill_txns = 0 AS spill_txn, spill_count = 0 AS spill_count FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot_stats4_twophase';
> > >
> > > Why do the SELECT "= 0" like this, instead of just having zeros in the
> > > "expected" results?
> >
> > Indeed. I used "=0" like other queries in the same file do, but it
> > makes sense to me just to have zeros in the expected file. That way,
> > it would make it a bit easier to investigate in case of failures.
> >
> > >
> > > ======
> > > .../replication/logical/reorderbuffer.c
> > >
> > > 3.
> > > static void ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
> > > - bool txn_prepared);
> > > + bool txn_prepared, bool mark_streamed);
> > >
> > > That last parameter name ('mark_streamed') does not match the same
> > > parameter name in this function's definition.
> >
> > Fixed.
> >
> > >
> > > ~~~
> > >
> > > ReorderBufferTruncateTXN:
> > >
> > > 4.
> > > if (txn_streaming && (!txn_prepared) &&
> > > (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
> > > txn->txn_flags |= RBTXN_IS_STREAMED;
> > >
> > > if (txn_prepared)
> > > {
> > > ~
> > >
> > > Since the following condition was already "if (txn_prepared)" would it
> > > be better remove the "(!txn_prepared)" here and instead just refactor
> > > the code like:
> > >
> > > if (txn_prepared)
> > > {
> > > ...
> > > }
> > > else if (txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
> > > {
> > > ...
> > > }
> >
> > Good idea.
> >
> > >
> > > ~~~
> > >
> > > ReorderBufferProcessTXN:
> > >
> > > 5.
> > > +
> > > + /* Remember the transaction is aborted */
> > > + Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0);
> > > + curtxn->txn_flags |= RBTXN_IS_ABORTED;
> > >
> > > Missing period on comment.
> >
> > Fixed.
> >
> > >
> > > ~~~
> > >
> > > ReorderBufferCheckTXNAbort:
> > >
> > > 6.
> > > + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't
> > > + * check the transaction status, so the caller always processes this
> > > + * transaction. This is to disable this check for regression tests.
> > > + */
> > > +static bool
> > > +ReorderBufferCheckTXNAbort(ReorderBuffer *rb, ReorderBufferTXN *txn)
> > > +{
> > > + /*
> > > + * If GUC 'debug_logical_replication_streaming' is "immediate", we don't
> > > + * check the transaction status, so the caller always processes this
> > > + * transaction.
> > > + */
> > > + if (unlikely(debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE))
> > > + return false;
> > > +
> > >
> > > The wording of the sentence "This is to disable..." seemed a bit
> > > confusing. Maybe this area can be simplified by doing the following.
> > >
> > > 6a.
> > > Change the function comment to say more like below:
> > >
> > > When the GUC 'debug_logical_replication_streaming' is set to
> > > "immediate", we don't check the transaction status, meaning the caller
> > > will always process this transaction. This mode is used by regression
> > > tests to avoid unnecessary transaction status checking.
> > >
> > > ~
> > >
> > > 6b.
> > > It is not necessary for this 2nd comment to repeat everything that was
> > > already said in the function comment. A simpler comment here might be
> > > all you need:
> > >
> > > SUGGESTION:
> > > Quick return for regression tests.
> >
> > Agreed with the above two comments. Fixed.
> >
> > >
> > > ~~~
> > >
> > > 7.
> > > Is it worth mentioning about this skipping of the transaction status
> > > check in the docs for this GUC? [1]
> >
> > If we want to mention this optimization in the docs, we have to
> > explain how the optimization works too. I think it's too detailed.
> >
> > I've attached the updated patch.
>
> Few minor suggestions:
> 1) Can we use rbtxn_is_committed here?
> + /* Remember the transaction is aborted. */
> + Assert((curtxn->txn_flags & RBTXN_IS_COMMITTED) == 0);
> + curtxn->txn_flags |= RBTXN_IS_ABORTED;
>
> 2) Similarly here too:
> + /*
> + * Mark the transaction as aborted so we ignore future changes of this
> + * transaction.
> + */
> + Assert((txn->txn_flags & RBTXN_IS_COMMITTED) == 0);
> + txn->txn_flags |= RBTXN_IS_ABORTED;
>
> 3) Can we use rbtxn_is_aborted here?
> + /*
> + * Remember the transaction is committed so that we can skip CLOG
> + * check next time, avoiding the pressure on CLOG lookup.
> + */
> + Assert((txn->txn_flags & RBTXN_IS_ABORTED) == 0);
>

Thank you for reviewing the patch!

These comments are incorporated into the latest v6 patch I just sent[1].

Regards,

[1] /message-id/CAD21AoDtMjbc8YCQiX1K8%2BRKeahcX2MLt3gwApm5BWGfv14i5A%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-11-14 04:22:57
Message-ID:	CAHut+PuLNFv5tO4n=b6yOsnrm9e6iVCt5zvHKusQvTBX02G-Qw@mail.gmail.com
Lists:	pgsql-hackers

Hi Sawada-San,

Here are some more review comments for the latest (accidentally called
v6 again?) v6-0001 patch.

======
contrib/test_decoding/sql/stats.sql

1.
+-- Execute a transaction that is prepared and aborted. We detect that the
+-- transaction is aborted before spilling changes, and then skip collecting
+-- further changes.

You had replied (referring to the above comment):
I think we already mentioned the transaction is going to be spilled
but actually not.

Yes, spilling was already mentioned in the current comment but I felt
it assumes the reader is expected to know details of why it was going
to be spilled in the first place.

In other words, I thought the comment could include a bit more
explanatory background info:
(Also, it's not really "we detect" the abort -- it's the new postgres
code of this patch that detects it.)

SUGGESTION:
Execute a transaction that is prepared but then aborted. The INSERT
data exceeds the 'logical_decoding_work_mem' limit, which
normally would result in the transaction being spilled to disk, but
now when Postgres detects the abort it skips the spilling and also
skips collecting further changes.

~~~

2.
+-- Check if the transaction is not spilled as it's already aborted.
+SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot_stats4_twophase', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT pg_stat_force_next_flush();
+SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot_stats4_twophase';
+

/Check if the transaction is not spilled/Verify that the transaction
was not spilled/

======
.../replication/logical/reorderbuffer.c

ReorderBufferResetTXN:

3.
/* Discard the changes that we just streamed */
- ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn));
+ ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), true);

Looking at the calling code for ReorderBufferResetTXN it seems this
function can be called for streaming OR prepared. So is it OK here to be
passing hardwired 'true' as the txn_streaming parameter, or should
that be passing rbtxn_is_streamed(txn)?

~~~

ReorderBufferLargestStreamableTopTXN:

4.
if ((largest == NULL || txn->total_size > largest_size) &&
(txn->total_size > 0) && !(rbtxn_has_partial_change(txn)) &&
- rbtxn_has_streamable_change(txn))
+ rbtxn_has_streamable_change(txn) && !(rbtxn_is_aborted(txn)))
{
largest = txn;
largest_size = txn->total_size;

I felt that this increasingly complicated code would be a lot easier
to understand if you just separate the conditions into: (a) the ones
that filter out transactions you don't care about; (b) the ones that
check for the largest size. For example,

SUGGESTION:
dlist_foreach(...)
{
...

/* Don't consider these kinds of transactions for eviction. */
if (rbtxn_has_partial_change(txn) ||
!rbtxn_has_streamable_change(txn) || rbtxn_is_aborted(txn))
continue;

/* Find the largest of the eviction candidates. */
if ((largest == NULL || txn->total_size > largest_size) &&
(txn->total_size > 0))
{
largest = txn;
largest_size = txn->total_size;
}
}
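
A minimal, self-contained sketch of the filter-then-select shape suggested above. It assumes a hypothetical `DemoTXN` struct whose boolean fields stand in for the `rbtxn_*` macros, and it walks a plain array instead of using `dlist_foreach`, so it only illustrates the control flow, not the real reorderbuffer.c code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for ReorderBufferTXN. */
typedef struct DemoTXN
{
    size_t total_size;            /* size of the txn incl. subtxns */
    bool   has_partial_change;    /* stand-in for rbtxn_has_partial_change() */
    bool   has_streamable_change; /* stand-in for rbtxn_has_streamable_change() */
    bool   is_aborted;            /* stand-in for rbtxn_is_aborted() */
} DemoTXN;

/* Return the largest eviction candidate, or NULL if there is none. */
static DemoTXN *
demo_largest_streamable_txn(DemoTXN *txns, size_t ntxns)
{
    DemoTXN *largest = NULL;
    size_t   largest_size = 0;

    for (size_t i = 0; i < ntxns; i++)
    {
        DemoTXN *txn = &txns[i];

        /* Don't consider these kinds of transactions for eviction. */
        if (txn->has_partial_change || !txn->has_streamable_change ||
            txn->is_aborted)
            continue;

        /* Find the largest of the eviction candidates. */
        if ((largest == NULL || txn->total_size > largest_size) &&
            txn->total_size > 0)
        {
            largest = txn;
            largest_size = txn->total_size;
        }
    }

    return largest;
}
```

Keeping the exclusion rules in one `continue` clause means a new exclusion (such as "already aborted") can be added without touching the size comparison.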

~~~

ReorderBufferCheckMemoryLimit:

5.
+ /* skip the transaction if already aborted */
+ if (ReorderBufferCheckTXNAbort(rb, txn))
+ {
+ /* All changes should be truncated */
+ Assert(txn->size == 0 && txn->total_size == 0);
+ continue;
+ }

The "discard all changes accumulated so far" side-effect happening
here is not very apparent from the function name. Maybe a better name
for ReorderBufferCheckTXNAbort() would be something like
'ReorderBufferCleanupIfAbortedTXN()'.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-11-14 23:59:36
Message-ID:	CAD21AoAdspi98=YKRnB8bigRpVd4j-5TkOgxr702ya7UWvOA_w@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Nov 13, 2024 at 8:23 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Sawada-San,
>
> Here are some more review comments for the latest (accidentally called
> v6 again?) v6-0001 patch.

Thank you for reviewing the patch! Indeed, the previous version should
have been v7.

>
> ======
> contrib/test_decoding/sql/stats.sql
>
> 1.
> +-- Execute a transaction that is prepared and aborted. We detect that the
> +-- transaction is aborted before spilling changes, and then skip collecting
> +-- further changes.
>
> You had replied (referring to the above comment):
> I think we already mentioned the transaction is going to be spilled
> but actually not.
>
> ~
>
> Yes, spilling was already mentioned in the current comment but I felt
> it assumes the reader is expected to know details of why it was going
> to be spilled in the first place.

TBH we expect the reader, typically patch authors and reviewers, to
know it.

> In other words, I thought the comment could include a bit more
> explanatory background info:
> (Also, it's not really "we detect" the abort -- it's the new postgres
> code of this patch that detects it.)
>
> SUGGESTION:
> Execute a transaction that is prepared but then aborted. The INSERT
> data exceeds the 'logical_decoding_work_mem' limit, which
> normally would result in the transaction being spilled to disk, but
> now when Postgres detects the abort it skips the spilling and also
> skips collecting further changes.

But I'm concerned this explanation might be too detailed, and it would
feel odd to add such a comment only to the newly added tests when we
already do similar tests in the same file. For instance, we have:

-- spilling the xact
BEGIN;
INSERT INTO stats_test SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) g(i);
COMMIT;
SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot_stats1', NULL, NULL, 'skip-empty-xacts', '1');

How about rewording it to the following? I think it's better to
explain why we use a prepared transaction here:

+-- The INSERT changes are large enough to be spilled but not, because the
+-- transaction is aborted. The logical decoding skips collecting further
+-- changes too. The transaction is prepared to make sure the decoding processes
+-- the aborted transaction.

>
> ~~~
>
> 2.
> +-- Check if the transaction is not spilled as it's already aborted.
> +SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot_stats4_twophase', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> +SELECT pg_stat_force_next_flush();
> +SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot_stats4_twophase';
> +
>
> /Check if the transaction is not spilled/Verify that the transaction
> was not spilled/

How about "Verify that the decoding doesn't spill already-aborted
transaction's changes."?

>
> ======
> .../replication/logical/reorderbuffer.c
>
> ReorderBufferResetTXN:
>
> 3.
> /* Discard the changes that we just streamed */
> - ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn));
> + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), true);
>
> Looking at the calling code for ReorderBufferResetTXN it seems this
> function can be called for streaming OR prepared. So is it OK here to be
> passing hardwired 'true' as the txn_streaming parameter, or should
> that be passing rbtxn_is_streamed(txn)?

I think it should pass 'true' because otherwise the transaction won't
be marked as streamed.

After more thought, I think the name txn_streaming is the source
of confusion. The flag actually decides whether the given transaction
may be marked as streamed; it does not indicate whether the
transaction is currently being streamed, since this function can also
be called while streaming. So I renamed it to 'mark_txn_streaming' and
updated the comment.
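
To make the role of 'mark_txn_streaming' concrete, here is a rough sketch of only the flag handling at the end of a truncate routine, using the if/else-if shape suggested earlier in the thread. The `DemoTXN` struct, the `DEMO_IS_STREAMED` bit and the `ntuplecids` counter are hypothetical stand-ins; subtransaction recursion and the actual change lists are omitted, and resetting `nentries_mem` merely models discarding the in-memory changes.

```c
#include <stdbool.h>

/* Hypothetical flag bit modelled on RBTXN_IS_STREAMED. */
#define DEMO_IS_STREAMED 0x01

/* Hypothetical, heavily simplified stand-in for ReorderBufferTXN. */
typedef struct DemoTXN
{
    unsigned int flags;
    bool         is_toptxn;
    int          nentries_mem;  /* changes currently held in memory */
    int          ntuplecids;    /* stand-in for the tuplecids list */
} DemoTXN;

/*
 * 'mark_txn_streaming' only says whether the caller permits marking the
 * transaction as streamed; it says nothing about whether the transaction
 * is being streamed right now.
 */
static void
demo_truncate(DemoTXN *txn, bool txn_prepared, bool mark_txn_streaming)
{
    if (txn_prepared)
    {
        /* Decoded at PREPARE: the tuplecids are no longer needed. */
        txn->ntuplecids = 0;
    }
    else if (mark_txn_streaming &&
             (txn->is_toptxn || txn->nentries_mem != 0))
    {
        /* Top-level txns are always marked; subtxns only if non-empty. */
        txn->flags |= DEMO_IS_STREAMED;
    }

    /* Discard the in-memory changes in every case. */
    txn->nentries_mem = 0;
}
```

With this shape, a caller that truncates an aborted, never-streamed transaction can pass `false` and leave the streamed flag untouched, while the streaming path passes `true`.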

>
> ~~~
>
> ReorderBufferLargestStreamableTopTXN:
>
> 4.
> if ((largest == NULL || txn->total_size > largest_size) &&
> (txn->total_size > 0) && !(rbtxn_has_partial_change(txn)) &&
> - rbtxn_has_streamable_change(txn))
> + rbtxn_has_streamable_change(txn) && !(rbtxn_is_aborted(txn)))
> {
> largest = txn;
> largest_size = txn->total_size;
>
> I felt that this increasingly complicated code would be a lot easier
> to understand if you just separate the conditions into: (a) the ones
> that filter out transactions you don't care about; (b) the ones that
> check for the largest size. For example,
>
> SUGGESTION:
> dlist_foreach(...)
> {
> ...
>
> /* Don't consider these kinds of transactions for eviction. */
> if (rbtxn_has_partial_change(txn) ||
> !rbtxn_has_streamable_change(txn) || rbtxn_is_aborted(txn))
> continue;
>
> /* Find the largest of the eviction candidates. */
> if ((largest == NULL || txn->total_size > largest_size) &&
> (txn->total_size > 0))
> {
> largest = txn;
> largest_size = txn->total_size;
> }
> }

I like this idea.

>
> ~~~
>
> ReorderBufferCheckMemoryLimit:
>
> 5.
> + /* skip the transaction if already aborted */
> + if (ReorderBufferCheckTXNAbort(rb, txn))
> + {
> + /* All changes should be truncated */
> + Assert(txn->size == 0 && txn->total_size == 0);
> + continue;
> + }
>
> The "discard all changes accumulated so far" side-effect happening
> here is not very apparent from the function name. Maybe a better name
> for ReorderBufferCheckTXNAbort() would be something like
> 'ReorderBufferCleanupIfAbortedTXN()'.

Okay. Since we use the term "Cleanup" with a different meaning in
reorderbuffer.c (discarding all changes and deallocating the entry),
how about ReorderBufferTruncateTXNIfAborted()?
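
As an illustration of the behaviour the proposed name is meant to convey, the following is a sketch under stated assumptions, not the patch itself: a hypothetical `demo_truncate_txn_if_aborted()` that returns quickly when the status is already cached, otherwise consults a simulated CLOG, discards the collected changes on abort, and caches a definite outcome in the flags. The `DemoTXN` struct, the flag bits and `demo_clog_lookup()` are all invented for the example; the real code would consult CLOG through PostgreSQL's own transaction-status functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits modelled on RBTXN_IS_COMMITTED / RBTXN_IS_ABORTED. */
#define DEMO_IS_COMMITTED 0x01
#define DEMO_IS_ABORTED   0x02

/* What the simulated CLOG lookup can report. */
typedef enum DemoXactStatus
{
    DEMO_XACT_IN_PROGRESS,
    DEMO_XACT_COMMITTED,
    DEMO_XACT_ABORTED
} DemoXactStatus;

/* Hypothetical, simplified stand-in for ReorderBufferTXN. */
typedef struct DemoTXN
{
    unsigned int   flags;
    size_t         size;         /* bytes of decoded changes held */
    DemoXactStatus clog_status;  /* what the simulated CLOG says */
    int            clog_lookups; /* number of lookups performed */
} DemoTXN;

/* Simulated CLOG lookup; counting calls makes the caching observable. */
static DemoXactStatus
demo_clog_lookup(DemoTXN *txn)
{
    txn->clog_lookups++;
    return txn->clog_status;
}

/*
 * If the transaction is known to be aborted, discard the changes collected
 * so far and return true. A definite outcome is cached in txn->flags so
 * the next call needs no lookup.
 */
static bool
demo_truncate_txn_if_aborted(DemoTXN *txn)
{
    /* Quick return if the transaction status is already known. */
    if (txn->flags & DEMO_IS_COMMITTED)
        return false;
    if (txn->flags & DEMO_IS_ABORTED)
    {
        /* Changes were discarded when the abort was first detected. */
        assert(txn->size == 0);
        return true;
    }

    switch (demo_clog_lookup(txn))
    {
        case DEMO_XACT_IN_PROGRESS:
            /* Nothing to cache; check again next time. */
            return false;
        case DEMO_XACT_COMMITTED:
            txn->flags |= DEMO_IS_COMMITTED;
            return false;
        case DEMO_XACT_ABORTED:
            txn->size = 0;    /* stand-in for truncating the changes */
            txn->flags |= DEMO_IS_ABORTED;
            return true;
    }
    return false;             /* unreachable with a valid status */
}
```

An in-progress transaction is deliberately not cached, because its outcome can still change; only committed or aborted results are final.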

I've attached the updated patch (v8).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v8-0001-Skip-logical-decoding-of-already-aborted-transact.patch	application/octet-stream	22.8 KB

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-11-15 03:06:54
Message-ID:	CAHut+PvYYqVm13K8b-asHjD708xi611kh7p46SJAWfU9vzwdnQ@mail.gmail.com
Lists:	pgsql-hackers

Hi Sawada-San,

Here are some review comments for patch v8-0001.

======
contrib/test_decoding/sql/stats.sql

1.
+-- The INSERT changes are large enough to be spilled but not, because the
+-- transaction is aborted. The logical decoding skips collecting further
+-- changes too. The transaction is prepared to make sure the decoding processes
+-- the aborted transaction.

/to be spilled but not/to be spilled but will not be/

======
.../replication/logical/reorderbuffer.c

ReorderBufferTruncateTXN:

2.
/*
* Discard changes from a transaction (and subtransactions), either after
- * streaming or decoding them at PREPARE. Keep the remaining info -
- * transactions, tuplecids, invalidations and snapshots.
+ * streaming, decoding them at PREPARE, or detecting the transaction abort.
+ * Keep the remaining info - transactions, tuplecids, invalidations and
+ * snapshots.
*
* We additionally remove tuplecids after decoding the transaction at prepare
* time as we only need to perform invalidation at rollback or commit prepared.
*
+ * The given transaction is marked as streamed if appropriate and the caller
+ * asked it by passing 'mark_txn_streaming' being true.
+ *
* 'txn_prepared' indicates that we have decoded the transaction at prepare
* time.
*/
static void
-ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared)
+ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared,
+ bool mark_txn_streaming)

I think the function comment should describe the parameters in the
same order that they appear in the function signature.

~~~

3.
+ else if (mark_txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
+ {
...
+ txn->txn_flags |= RBTXN_IS_STREAMED;
+ }

I guess it doesn't matter much, but for the sake of readability,
should the condition also be checking !rbtxn_is_streamed(txn) to avoid
overwriting the RBTXN_IS_STREAMED bit when it was set already?

~~~

ReorderBufferTruncateTXNIfAborted:

4.
+ /*
+ * The transaction aborted. We discard the changes we've collected so far,
+ * and free all resources allocated for toast reconstruction. The full
+ * cleanup will happen as part of decoding ABORT record of this
+ * transaction.
+ *
+ * Since we don't check the transaction status while replaying the
+ * transaction, we don't need to reset toast reconstruction data here.
+ */
+ ReorderBufferTruncateTXN(rb, txn, false, false);

4a.
The first part of the comment says "... and free all resources
allocated for toast reconstruction", but the second part says "we
don't need to reset toast reconstruction data here". Is that a
contradiction?

4b.
Shouldn't this call still be passing rbtxn_prepared(txn) as the 2nd
last param, like it used to?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-11-15 18:01:45
Message-ID:	CAD21AoBFqE+LOP7Pdp+PPxAZXGZC2RJCcbLbG+M4+ermAgT1Og@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Nov 14, 2024 at 7:07 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Sawada-San,
>
> Here are some review comments for patch v8-0001.

Thank you for the comments.

>
> ======
> contrib/test_decoding/sql/stats.sql
>
> 1.
> +-- The INSERT changes are large enough to be spilled but not, because the
> +-- transaction is aborted. The logical decoding skips collecting further
> +-- changes too. The transaction is prepared to make sure the decoding processes
> +-- the aborted transaction.
>
> /to be spilled but not/to be spilled but will not be/

Fixed.

>
> ======
> .../replication/logical/reorderbuffer.c
>
> ReorderBufferTruncateTXN:
>
> 2.
> /*
> * Discard changes from a transaction (and subtransactions), either after
> - * streaming or decoding them at PREPARE. Keep the remaining info -
> - * transactions, tuplecids, invalidations and snapshots.
> + * streaming, decoding them at PREPARE, or detecting the transaction abort.
> + * Keep the remaining info - transactions, tuplecids, invalidations and
> + * snapshots.
> *
> * We additionally remove tuplecids after decoding the transaction at prepare
> * time as we only need to perform invalidation at rollback or commit prepared.
> *
> + * The given transaction is marked as streamed if appropriate and the caller
> + * asked it by passing 'mark_txn_streaming' being true.
> + *
> * 'txn_prepared' indicates that we have decoded the transaction at prepare
> * time.
> */
> static void
> -ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared)
> +ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared,
> + bool mark_txn_streaming)
>
> I think the function comment should describe the parameters in the
> same order that they appear in the function signature.

Not sure it should. We sometimes describe the overall idea of the
function first using the argument names, and then describe what the
other arguments mean.

>
> ~~~
>
> 3.
> + else if (mark_txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
> + {
> ...
> + txn->txn_flags |= RBTXN_IS_STREAMED;
> + }
>
> I guess it doesn't matter much, but for the sake of readability,
> should the condition also be checking !rbtxn_is_streamed(txn) to avoid
> overwriting the RBTXN_IS_STREAMED bit when it was set already?

Not sure it improves readability because it adds one more check there.
If it's important not to re-set RBTXN_IS_STREAMED, it makes sense to
have that check and describe in the comment. But in this case, I think
we don't necessarily need to do that.
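
The point that the extra check is not needed for correctness can be seen in a tiny sketch: OR-ing in a bit that is already set leaves the flags unchanged, so the guarded and unguarded forms always agree. `DEMO_IS_STREAMED` is a hypothetical stand-in for `RBTXN_IS_STREAMED`, and both functions are invented for illustration.

```c
/* Hypothetical stand-in for the RBTXN_IS_STREAMED bit. */
#define DEMO_IS_STREAMED 0x01u

/* Unconditional form: OR-ing in a bit that is already set changes nothing. */
static unsigned int
demo_mark_streamed(unsigned int flags)
{
    return flags | DEMO_IS_STREAMED;
}

/* Guarded form suggested in review: same result, one extra test. */
static unsigned int
demo_mark_streamed_guarded(unsigned int flags)
{
    if (!(flags & DEMO_IS_STREAMED))
        flags |= DEMO_IS_STREAMED;
    return flags;
}
```

So the choice between the two is purely about readability, which matches the reasoning above.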

> ~~~
>
> ReorderBufferTruncateTXNIfAborted:
>
> 4.
> + /*
> + * The transaction aborted. We discard the changes we've collected so far,
> + * and free all resources allocated for toast reconstruction. The full
> + * cleanup will happen as part of decoding ABORT record of this
> + * transaction.
> + *
> + * Since we don't check the transaction status while replaying the
> + * transaction, we don't need to reset toast reconstruction data here.
> + */
> + ReorderBufferTruncateTXN(rb, txn, false, false);
>
> 4a.
> The first part of the comment says "... and free all resources
> allocated for toast reconstruction", but the second part says "we
> don't need to reset toast reconstruction data here". Is that a
> contradiction?

Yes, the comment is out-of-date. Since this function is not called
while replaying the transaction, it should not have any toast
reconstruction data.

>
> ~
>
> 4b.
> Shouldn't this call still be passing rbtxn_prepared(txn) as the 2nd
> last param, like it used to?

Actually it's not necessary because it should always be false. But
thinking more, it seems better to use rbtxn_prepared(txn) since
it's consistent with other places and doesn't rely on that
assumption.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v9-0001-Skip-logical-decoding-of-already-aborted-transact.patch	application/octet-stream	22.8 KB

From:	vignesh C <vignesh21(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-11-19 07:12:44
Message-ID:	CALDaNm3COogB2_aPgFZ_pNBh7BjmwDqoBj5bE4H7D26F3hJQ=Q@mail.gmail.com
Lists:	pgsql-hackers

On Fri, 15 Nov 2024 at 23:32, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Nov 14, 2024 at 7:07 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > Hi Sawada-San,
> >
> > Here are some review comments for patch v8-0001.
>
> Thank you for the comments.
>
> >
> > ======
> > contrib/test_decoding/sql/stats.sql
> >
> > 1.
> > +-- The INSERT changes are large enough to be spilled but not, because the
> > +-- transaction is aborted. The logical decoding skips collecting further
> > +-- changes too. The transaction is prepared to make sure the decoding processes
> > +-- the aborted transaction.
> >
> > /to be spilled but not/to be spilled but will not be/
>
> Fixed.
>
> >
> > ======
> > .../replication/logical/reorderbuffer.c
> >
> > ReorderBufferTruncateTXN:
> >
> > 2.
> > /*
> > * Discard changes from a transaction (and subtransactions), either after
> > - * streaming or decoding them at PREPARE. Keep the remaining info -
> > - * transactions, tuplecids, invalidations and snapshots.
> > + * streaming, decoding them at PREPARE, or detecting the transaction abort.
> > + * Keep the remaining info - transactions, tuplecids, invalidations and
> > + * snapshots.
> > *
> > * We additionally remove tuplecids after decoding the transaction at prepare
> > * time as we only need to perform invalidation at rollback or commit prepared.
> > *
> > + * The given transaction is marked as streamed if appropriate and the caller
> > + * asked it by passing 'mark_txn_streaming' being true.
> > + *
> > * 'txn_prepared' indicates that we have decoded the transaction at prepare
> > * time.
> > */
> > static void
> > -ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared)
> > +ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared,
> > + bool mark_txn_streaming)
> >
> > I think the function comment should describe the parameters in the
> > same order that they appear in the function signature.
>
> Not sure it should. We sometimes describe the overall idea of the
> function first using the argument names, and then describe what the
> other arguments mean.
>
> >
> > ~~~
> >
> > 3.
> > + else if (mark_txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
> > + {
> > ...
> > + txn->txn_flags |= RBTXN_IS_STREAMED;
> > + }
> >
> > I guess it doesn't matter much, but for the sake of readability,
> > should the condition also be checking !rbtxn_is_streamed(txn) to avoid
> > overwriting the RBTXN_IS_STREAMED bit when it was set already?
>
> Not sure it improves readability because it adds one more check there.
> If it's important not to re-set RBTXN_IS_STREAMED, it makes sense to
> have that check and describe in the comment. But in this case, I think
> we don't necessarily need to do that.
>
> > ~~~
> >
> > ReorderBufferTruncateTXNIfAborted:
> >
> > 4.
> > + /*
> > + * The transaction aborted. We discard the changes we've collected so far,
> > + * and free all resources allocated for toast reconstruction. The full
> > + * cleanup will happen as part of decoding ABORT record of this
> > + * transaction.
> > + *
> > + * Since we don't check the transaction status while replaying the
> > + * transaction, we don't need to reset toast reconstruction data here.
> > + */
> > + ReorderBufferTruncateTXN(rb, txn, false, false);
> >
> > 4a.
> > The first part of the comment says "... and free all resources
> > allocated for toast reconstruction", but the second part says "we
> > don't need to reset toast reconstruction data here". Is that a
> > contradiction?
>
> Yes, the comment is out-of-date. Since this function is not called
> while replaying the transaction, it should not have any toast
> reconstruction data.
>
> >
> > ~
> >
> > 4b.
> > Shouldn't this call still be passing rbtxn_prepared(txn) as the 2nd
> > last param, like it used to?
>
> Actually it's not necessary because it should always be false. But
> thinking more, it seems to be better to use rbtxn_prepared(txn) since
> it's consistent with other places and it's not necessary to put
> assumptions there.

Few comments:
1) Should we have the Assert inside ReorderBufferTruncateTXNIfAborted
instead of having it at multiple callers? ReorderBufferResetTXN also
has the Assert inside the function after truncating the transaction:
@@ -3672,6 +3758,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb)
Assert(txn->total_size > 0);
Assert(rb->size >= txn->total_size);

+ /* skip the transaction if aborted */
+ if (ReorderBufferTruncateTXNIfAborted(rb, txn))
+ {
+ /* All changes should be discarded */
+ Assert(txn->size == 0 && txn->total_size == 0);
+ continue;
+ }
+
ReorderBufferStreamTXN(rb, txn);
}
else
@@ -3687,6 +3781,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb)
Assert(txn->size > 0);
Assert(rb->size >= txn->size);

+ /* skip the transaction if aborted */
+ if (ReorderBufferTruncateTXNIfAborted(rb, txn))
+ {
+ /* All changes should be discarded */
+ Assert(txn->size == 0 && txn->total_size == 0);
+ continue;
+ }

2) txn->txn_flags can be moved to the next line to keep it within 80
chars in this case:
* Check the transaction status by looking CLOG and discard all changes if
* the transaction is aborted. The transaction status is cached in txn->txn_flags
* so we can skip future changes and avoid CLOG lookups on the next call. Return

3) Is there any scenario where the Assert can fail because the toast data is not reset here?
+ * Since we don't check the transaction status while replaying the
+ * transaction, we don't need to reset toast reconstruction data here.
+ */
+ ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false);

+ if (ReorderBufferTruncateTXNIfAborted(rb, txn))
+ {
+ /* All changes should be discarded */
+ Assert(txn->size == 0 && txn->total_size == 0);
+ continue;
+ }

4) This can be changed to a single line comment:
+ /*
+ * Quick return if the transaction status is already known.
+ */
+ if (rbtxn_is_committed(txn))
+ return false;
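
Suggestion 1) can be pictured with a simplified sketch in which the post-condition Assert lives once inside the helper and each call site reduces to a plain `continue`. Everything here is hypothetical: `DemoTXN`, `demo_truncate_if_aborted()` and `demo_process_candidates()` are invented, the `aborted` field stands in for the CLOG-derived status, and the loop over an array only approximates how ReorderBufferCheckMemoryLimit picks transactions to stream or serialize.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for ReorderBufferTXN. */
typedef struct DemoTXN
{
    size_t size;
    size_t total_size;
    bool   aborted;   /* stand-in for the CLOG-derived status */
    bool   streamed;  /* set when the txn is "streamed" */
} DemoTXN;

/*
 * If the txn is aborted, discard its changes and return true. The
 * post-condition Assert is checked here once, not at every call site.
 */
static bool
demo_truncate_if_aborted(DemoTXN *txn)
{
    if (!txn->aborted)
        return false;

    txn->size = 0;
    txn->total_size = 0;

    /* All changes should be discarded. */
    assert(txn->size == 0 && txn->total_size == 0);
    return true;
}

/* Caller sketch: returns the number of transactions actually streamed. */
static int
demo_process_candidates(DemoTXN *txns, size_t ntxns)
{
    int nstreamed = 0;

    for (size_t i = 0; i < ntxns; i++)
    {
        /* Skip the transaction if aborted. */
        if (demo_truncate_if_aborted(&txns[i]))
            continue;

        txns[i].streamed = true;  /* stand-in for streaming/serializing */
        nstreamed++;
    }
    return nstreamed;
}
```

As with PostgreSQL's own Assert, the check disappears in builds without assertions, so moving it into the helper costs nothing in production.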

Regards,
Vignesh

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-11-25 04:50:14
Message-ID:	CAHut+PsraMChmh=Tthu9_YHHN_c-mOqM-0pBXHeonhT7vE8qPA@mail.gmail.com
Lists:	pgsql-hackers

Hi, Here are my review comments for patch v9-0001.

These are only trivial nits for some code comments. Everything else
looked good to me.

======
.../replication/logical/reorderbuffer.c

ReorderBufferTruncateTXN:

1.
+ * The given transaction is marked as streamed if appropriate and the caller
+ * asked it by passing 'mark_txn_streaming' being true.

/asked it/requested it/

/being true/as true/

~~~

ReorderBufferPrepare:

2.
+ /*
+ * Remember if the transaction is already aborted to check if we detect
+ * that the transaction is concurrently aborted during the replay.
+ */

SUGGESTION:
Remember if the transaction is already aborted so we can detect when
the transaction is concurrently aborted during the replay.
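Item 1 concerns when the transaction gets marked as streamed. The condition, as later quoted from the v10 patch, is `mark_txn_streaming && (rbtxn_is_toptxn(txn) || txn->nentries_mem != 0)`, and it can be sketched as a predicate.

The function name and the plain-bool parameters are hypothetical. The rationale in the comments follows the pre-existing comment in ReorderBufferTruncateTXN(): the downstream should never receive an abort for a subtransaction XID it was not told about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical predicate: should truncation mark this transaction as
 * streamed?  Mirrors: mark_txn_streaming &&
 *                     (rbtxn_is_toptxn(txn) || txn->nentries_mem != 0)
 */
static bool
sketch_should_mark_streamed(bool mark_txn_streaming, bool is_toptxn,
                            size_t nentries_mem)
{
    /*
     * Callers that only discard changes (the patch's aborted-txn path calls
     * ReorderBufferTruncateTXN(..., false)) never mark the txn as streamed.
     */
    if (!mark_txn_streaming)
        return false;

    /*
     * A top-level transaction is always marked, since the downstream always
     * knows its XID; a subtransaction only if it actually had changes.
     */
    return is_toptxn || nentries_mem != 0;
}
```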

======
Kind Regards,
Peter Smith.
Fujitsu Australia

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	vignesh C <vignesh21(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-11-25 21:28:00
Message-ID:	CAD21AoDixFrmhNaQgc-e++kAYjf+k1iwSvRwCSq2a-Sf+EgQEg@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Nov 18, 2024 at 11:12 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
>
> Few comments:

Thank you for reviewing the patch!

> 1) Should we have the Assert inside ReorderBufferTruncateTXNIfAborted
> instead of having it at multiple callers, ReorderBufferResetTXN also
> has the Assert inside the function after truncate of the transaction:
> @@ -3672,6 +3758,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb)
> Assert(txn->total_size > 0);
> Assert(rb->size >= txn->total_size);
>
> + /* skip the transaction if aborted */
> + if (ReorderBufferTruncateTXNIfAborted(rb, txn))
> + {
> + /* All changes should be discarded */
> + Assert(txn->size == 0 && txn->total_size == 0);
> + continue;
> + }
> +
> ReorderBufferStreamTXN(rb, txn);
> }
> else
> @@ -3687,6 +3781,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb)
> Assert(txn->size > 0);
> Assert(rb->size >= txn->size);
>
> + /* skip the transaction if aborted */
> + if (ReorderBufferTruncateTXNIfAborted(rb, txn))
> + {
> + /* All changes should be discarded */
> + Assert(txn->size == 0 && txn->total_size == 0);
> + continue;
> + }

Moved.

>
> 2) txn->txn_flags can be moved to the next line to keep it within 80
> chars in this case:
> * Check the transaction status by looking CLOG and discard all changes if
> * the transaction is aborted. The transaction status is cached in
> txn->txn_flags
> * so we can skip future changes and avoid CLOG lookups on the next call. Return

Fixed.

>
> 3) Is there any scenario where the Assert can fail as the toast is not reset:
> + * Since we don't check the transaction status while replaying the
> + * transaction, we don't need to reset toast reconstruction data here.
> + */
> + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false);
>
> + if (ReorderBufferTruncateTXNIfAborted(rb, txn))
> + {
> + /* All changes should be discarded */
> + Assert(txn->size == 0 && txn->total_size == 0);
> + continue;
> + }

IIUC we reconstruct TOAST data when replaying the transaction. On the
other hand, this function is called while adding a decoded change but
not when replaying the transaction. So we should not have any toast
reconstruction data at this point unless I'm missing something. Do you
have any scenario where we call ReorderBufferTruncateTXNIfAborted()
while a transaction has TOAST reconstruction data?

>
> 4) This can be changed to a single line comment:
> + /*
> + * Quick return if the transaction status is already known.
> + */
> + if (rbtxn_is_committed(txn))
> + return false;

Fixed.
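The TOAST argument in the answer to item 3 can be modeled as a toy invariant:

- TOAST reconstruction data exists only while a transaction is being replayed.
- The truncation added by the patch runs on the add-change path, never during replay.

Everything here is a hypothetical simplification, not the real TOAST handling in reorderbuffer.c. That includes SketchToastTXN, the counter standing in for the contents of txn->toast_hash, and the replaying flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct SketchToastTXN
{
    size_t  nchanges;       /* queued decoded changes */
    size_t  toast_entries;  /* stands in for the contents of txn->toast_hash */
    bool    replaying;      /* true only while the transaction is replayed */
} SketchToastTXN;

/* Decoding: changes (TOAST chunks included) are only queued here. */
static void
sketch_add_change(SketchToastTXN *txn)
{
    assert(!txn->replaying);
    txn->nchanges++;
}

static void
sketch_begin_replay(SketchToastTXN *txn)
{
    txn->replaying = true;
}

/* Replay: TOAST chunks are collected for reconstruction... */
static void
sketch_replay_toast_chunk(SketchToastTXN *txn)
{
    assert(txn->replaying);
    txn->toast_entries++;
}

/* ...and the reconstruction data is released before replay ends. */
static void
sketch_end_replay(SketchToastTXN *txn)
{
    txn->toast_entries = 0;
    txn->replaying = false;
}

/* The patch's truncation runs on the add-change path, never during replay. */
static void
sketch_truncate_if_aborted(SketchToastTXN *txn)
{
    assert(!txn->replaying);
    /* Hence there is no TOAST reconstruction data left to reset. */
    assert(txn->toast_entries == 0);
    txn->nchanges = 0;
}
```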

I'll post the updated version patch soon.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-11-25 21:32:17
Message-ID:	CAD21AoDGns-OmZKCvU_gaDAO=gS5-=t6-M2LNS1xvqZmYCcgow@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Nov 24, 2024 at 8:50 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi, Here are my review comments for patch v9-0001.
>
> These are only trivial nits for some code comments. Everything else
> looked good to me.
>
> ======
> .../replication/logical/reorderbuffer.c
>
> ReorderBufferTruncateTXN:
>
> 1.
> + * The given transaction is marked as streamed if appropriate and the caller
> + * asked it by passing 'mark_txn_streaming' being true.
>
> /asked it/requested it/
>
> /being true/as true/
>
> ~~~
>
> ReorderBufferPrepare:
>
> 2.
> + /*
> + * Remember if the transaction is already aborted to check if we detect
> + * that the transaction is concurrently aborted during the replay.
> + */
>
> SUGGESTION:
> Remember if the transaction is already aborted so we can detect when
> the transaction is concurrently aborted during the replay.
>

Thank you for the suggestions.

I've attached a new version patch that incorporates all comments I got so far.

I think the patch is in good shape but I'm considering whether we
might want to call ReorderBufferToastReset() after truncating all
changes, in ReorderBufferTruncateTXNIfAborted() just in case. Will
investigate further.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v10-0001-Skip-logical-decoding-of-already-aborted-transac.patch	application/octet-stream	22.7 KB

From:	vignesh C <vignesh21(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-11-27 06:01:30
Message-ID:	CALDaNm2A9sJzsqugorJap8Kd90FiGj69vdgPqr+7DPaXq9oxtQ@mail.gmail.com
Lists:	pgsql-hackers

On Tue, 26 Nov 2024 at 02:58, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Nov 18, 2024 at 11:12 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> >
> > Few comments:
>
> Thank you for reviewing the patch!
>
> > 1) Should we have the Assert inside ReorderBufferTruncateTXNIfAborted
> > instead of having it at multiple callers, ReorderBufferResetTXN also
> > has the Assert inside the function after truncate of the transaction:
> > @@ -3672,6 +3758,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb)
> > Assert(txn->total_size > 0);
> > Assert(rb->size >= txn->total_size);
> >
> > + /* skip the transaction if aborted */
> > + if (ReorderBufferTruncateTXNIfAborted(rb, txn))
> > + {
> > + /* All changes should be discarded */
> > + Assert(txn->size == 0 && txn->total_size == 0);
> > + continue;
> > + }
> > +
> > ReorderBufferStreamTXN(rb, txn);
> > }
> > else
> > @@ -3687,6 +3781,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb)
> > Assert(txn->size > 0);
> > Assert(rb->size >= txn->size);
> >
> > + /* skip the transaction if aborted */
> > + if (ReorderBufferTruncateTXNIfAborted(rb, txn))
> > + {
> > + /* All changes should be discarded */
> > + Assert(txn->size == 0 && txn->total_size == 0);
> > + continue;
> > + }
>
> Moved.
>
> >
> > 2) txn->txn_flags can be moved to the next line to keep it within 80
> > chars in this case:
> > * Check the transaction status by looking CLOG and discard all changes if
> > * the transaction is aborted. The transaction status is cached in
> > txn->txn_flags
> > * so we can skip future changes and avoid CLOG lookups on the next call. Return
>
> Fixed.
>
> >
> > 3) Is there any scenario where the Assert can fail as the toast is not reset:
> > + * Since we don't check the transaction status while replaying the
> > + * transaction, we don't need to reset toast reconstruction data here.
> > + */
> > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false);
> >
> > + if (ReorderBufferTruncateTXNIfAborted(rb, txn))
> > + {
> > + /* All changes should be discarded */
> > + Assert(txn->size == 0 && txn->total_size == 0);
> > + continue;
> > + }
>
> IIUC we reconstruct TOAST data when replaying the transaction. On the
> other hand, this function is called while adding a decoded change but
> not when replaying the transaction. So we should not have any toast
> reconstruction data at this point unless I'm missing something. Do you
> have any scenario where we call ReorderBufferTruncateTXNIfAborted()
> while a transaction has TOAST reconstruction data?

I have checked further regarding the toast and verified the population
of the toast hash. I agree with you on this. Overall, the patch
appears to be in good shape.

Regards,
Vignesh

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	vignesh C <vignesh21(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-12-10 01:31:09
Message-ID:	CAD21AoD38hZHiMGeXbruNomJGSdoZi4hWxPtYJ3fubhvWva3mg@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Nov 26, 2024 at 10:01 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Tue, 26 Nov 2024 at 02:58, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Nov 18, 2024 at 11:12 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > >
> > > Few comments:
> >
> > Thank you for reviewing the patch!
> >
> > > 1) Should we have the Assert inside ReorderBufferTruncateTXNIfAborted
> > > instead of having it at multiple callers, ReorderBufferResetTXN also
> > > has the Assert inside the function after truncate of the transaction:
> > > @@ -3672,6 +3758,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb)
> > > Assert(txn->total_size > 0);
> > > Assert(rb->size >= txn->total_size);
> > >
> > > + /* skip the transaction if aborted */
> > > + if (ReorderBufferTruncateTXNIfAborted(rb, txn))
> > > + {
> > > + /* All changes should be discarded */
> > > + Assert(txn->size == 0 && txn->total_size == 0);
> > > + continue;
> > > + }
> > > +
> > > ReorderBufferStreamTXN(rb, txn);
> > > }
> > > else
> > > @@ -3687,6 +3781,14 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb)
> > > Assert(txn->size > 0);
> > > Assert(rb->size >= txn->size);
> > >
> > > + /* skip the transaction if aborted */
> > > + if (ReorderBufferTruncateTXNIfAborted(rb, txn))
> > > + {
> > > + /* All changes should be discarded */
> > > + Assert(txn->size == 0 && txn->total_size == 0);
> > > + continue;
> > > + }
> >
> > Moved.
> >
> > >
> > > 2) txn->txn_flags can be moved to the next line to keep it within 80
> > > chars in this case:
> > > * Check the transaction status by looking CLOG and discard all changes if
> > > * the transaction is aborted. The transaction status is cached in
> > > txn->txn_flags
> > > * so we can skip future changes and avoid CLOG lookups on the next call. Return
> >
> > Fixed.
> >
> > >
> > > 3) Is there any scenario where the Assert can fail as the toast is not reset:
> > > + * Since we don't check the transaction status while replaying the
> > > + * transaction, we don't need to reset toast reconstruction data here.
> > > + */
> > > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false);
> > >
> > > + if (ReorderBufferTruncateTXNIfAborted(rb, txn))
> > > + {
> > > + /* All changes should be discarded */
> > > + Assert(txn->size == 0 && txn->total_size == 0);
> > > + continue;
> > > + }
> >
> > IIUC we reconstruct TOAST data when replaying the transaction. On the
> > other hand, this function is called while adding a decoded change but
> > not when replaying the transaction. So we should not have any toast
> > reconstruction data at this point unless I'm missing something. Do you
> > have any scenario where we call ReorderBufferTruncateTXNIfAborted()
> > while a transaction has TOAST reconstruction data?
>
> I have checked further regarding the toast and verified the population
> of the toast hash. I agree with you on this. Overall, the patch
> appears to be in good shape.

Thank you for the confirmation!

I thought we had already done performance tests with this patch, but
Michael-san pointed out that we had not. So I've run benchmark tests in
two scenarios:

A. Skip decoding large aborted transactions.

1. Preparation (SQL commands)

create table test (c int);
select pg_create_logical_replication_slot('s', 'test_decoding');
begin;
insert into test select generate_series(1, 1_000_000);
commit;
begin;
insert into test select generate_series(1, 1_000_000);
rollback;
begin;
insert into test select generate_series(1, 1_000_000);
rollback;

2. Performance tests (results are w/o patch vs. w/ patch)

-- causes some spill/streamed transactions
set logical_decoding_work_mem to '64MB';

select 'non-streaming', count(*) from
pg_logical_slot_peek_changes('s', null, null, 'stream-changes',
'false');
-> 2636.208 ms vs. 2070.906 ms

select 'streaming', count(*) from pg_logical_slot_peek_changes('s',
null, null, 'stream-changes', 'true');
-> 910.579 ms vs. 653.574 ms

-- no spill/streamed transactions
set logical_decoding_work_mem to '5GB';

select 'non-streaming', count(*) from
pg_logical_slot_peek_changes('s', null, null, 'stream-changes',
'false');
-> 962.863 ms vs. 956.910 ms

select 'streaming', count(*) from pg_logical_slot_peek_changes('s',
null, null, 'stream-changes', 'true');
-> 973.426 ms vs. 973.033 ms

According to the results, skipping logical decoding of already-aborted
transactions noticeably reduces the decoding time when changes are
spilled or streamed, and makes no measurable difference otherwise.
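The relative gains in scenario A can be computed directly from the timings above. This is plain arithmetic on the reported numbers; the struct and helper names are illustrative only.

```c
#include <assert.h>

/* Scenario A timings in ms: without the patch vs. with the patch. */
typedef struct BenchPair
{
    const char *label;
    double      before_ms;
    double      after_ms;
} BenchPair;

static const BenchPair scenario_a[] = {
    {"64MB, non-streaming", 2636.208, 2070.906},
    {"64MB, streaming", 910.579, 653.574},
    {"5GB, non-streaming", 962.863, 956.910},
    {"5GB, streaming", 973.426, 973.033},
};

/* Share of the original decoding time saved by the patch, in percent. */
static double
pct_time_saved(const BenchPair *p)
{
    assert(p->before_ms > 0.0);
    return (p->before_ms - p->after_ms) / p->before_ms * 100.0;
}
```

With these numbers, the patch saves about 21.4% of the decoding time without streaming and 28.2% with streaming when changes are spilled or streamed. When everything fits in logical_decoding_work_mem, the savings are about 0.6% and 0.04%.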

B. Decoding medium-size transactions to check overheads of CLOG lookups.

1. Preparation (shell script)

pgbench -i -s 1 postgres
psql -c "create table test (c int)"
psql -c "select pg_create_logical_replication_slot('s', 'test_decoding')"
echo "insert into test select generate_series(1, 100)" > /tmp/bench.sql
pgbench -t 10000 -c 10 -j 5 -f /tmp/bench.sql postgres

2. Performance tests

-- spill/streamed transactions
set logical_decoding_work_mem to '64';

select 'non-streaming', count(*) from
pg_logical_slot_peek_changes('s', null, null, 'stream-changes',
'false');
-> 7230.537 ms vs. 7154.322 ms

select 'streaming', count(*) from pg_logical_slot_peek_changes('s',
null, null, 'stream-changes', 'true');
-> 6702.438 ms vs. 6678.232 ms

Overall, I don't see any noticeable overhead from the CLOG lookups.
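The same calculation applies to scenario B, where a positive value would indicate overhead from the extra CLOG lookups. Again this is only arithmetic on the reported timings, and the helper name is illustrative.

```c
#include <assert.h>

/*
 * Relative change in decoding time, in percent.  Positive would mean the
 * patched build is slower (CLOG lookup overhead); negative means faster.
 */
static double
pct_overhead(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);
    return (after_ms - before_ms) / before_ms * 100.0;
}

/* Scenario B timings in ms: {without patch, with patch}. */
static const double scenario_b[][2] = {
    {7230.537, 7154.322},   /* non-streaming */
    {6702.438, 6678.232},   /* streaming */
};
```

Both values come out slightly negative (about -1.05% and -0.36%), so the patched build was marginally faster. That is consistent with the differences being run-to-run variation rather than a cost of the lookups.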

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-12-10 05:09:31
Message-ID:	CAA4eK1+6ptSuUZPWkgT6KCWpaa-fBbByeYCLU5-=TYhRxyq2qA@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Nov 26, 2024 at 3:03 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> I've attached a new version patch that incorporates all comments I got so far.
>

Review comments:
===============
1.
+ * The given transaction is marked as streamed if appropriate and the caller
+ * requested it by passing 'mark_txn_streaming' as true.
+ *
* 'txn_prepared' indicates that we have decoded the transaction at prepare
* time.
*/
static void
-ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
bool txn_prepared)
+ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
bool txn_prepared,
+ bool mark_txn_streaming)
{
...
}
+ else if (mark_txn_streaming && (rbtxn_is_toptxn(txn) ||
(txn->nentries_mem != 0)))
+ {
+ /*
+ * Mark the transaction as streamed, if appropriate.

The comments related to the above changes don't clarify in which cases
the 'mark_txn_streaming' should be set. Before this patch, it was
clear from the comments and code about the cases where we would decide
to mark it as streamed.

2.
+ /*
+ * Mark the transaction as aborted so we ignore future changes of this
+ * transaction.

/so we ignore/so we can ignore/

3.
* Helper function for ReorderBufferProcessTXN to handle the concurrent
- * abort of the streaming transaction. This resets the TXN such that it
- * can be used to stream the remaining data of transaction being processed.
- * This can happen when the subtransaction is aborted and we still want to
- * continue processing the main or other subtransactions data.
+ * abort of the streaming (prepared) transaction.
...

In the above comment, "... streaming (prepared)...", you added
prepared to imply that this function handles concurrent abort for both
in-progress and prepared transactions. Am I correct? If so, the
current change makes it less clear. If you see the comments at its
caller, they are clearer.

4.
+ /*
+ * Remember if the transaction is already aborted so we can detect when
+ * the transaction is concurrently aborted during the replay.
+ */
+ already_aborted = rbtxn_is_aborted(txn);
+
ReorderBufferReplay(txn, rb, xid, txn->final_lsn, txn->end_lsn,
txn->xact_time.prepare_time, txn->origin_id, txn->origin_lsn);

@@ -2832,10 +2918,10 @@ ReorderBufferPrepare(ReorderBuffer *rb,
TransactionId xid,
* when rollback prepared is decoded and sent, the downstream should be
* able to rollback such a xact. See comments atop DecodePrepare.
*
- * Note, for the concurrent_abort + streaming case a stream_prepare was
+ * Note, for the concurrent abort + streaming case a stream_prepare was
* already sent within the ReorderBufferReplay call above.
*/
- if (txn->concurrent_abort && !rbtxn_is_streamed(txn))
+ if (!already_aborted && rbtxn_is_aborted(txn) && !rbtxn_is_streamed(txn))
rb->prepare(rb, txn, txn->final_lsn);

It is not clear from the comments how the 'already_aborted' is
handled. I think after this patch we would have already truncated all
its changes. If so, why do we need to try to replay the changes of
such a xact?

5.
+/*
+ * Check the transaction status by looking CLOG and discard all changes if
+ * the transaction is aborted. The transaction status is cached in
+ * txn->txn_flags so we can skip future changes and avoid CLOG lookups on the
+ * next call. Return true if the transaction is aborted, otherwise return
+ * false.
+ *
+ * When the 'debug_logical_replication_streaming' is set to "immediate", we
+ * don't check the transaction status, meaning the caller will always process
+ * this transaction.
+ */
+static bool
+ReorderBufferTruncateTXNIfAborted(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{

I think this function is being invoked to mark a sub-transaction as
aborted. It is better to explain in comments how it interacts with
sub-transactions, why it is okay to mark them as aborted, and how the
other parts of the system interact with it.
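Item 4 turns on the new condition for sending a prepare after replay. Here is a minimal sketch of that condition as a pure function, with hypothetical names. It restates `!already_aborted && rbtxn_is_aborted(txn) && !rbtxn_is_streamed(txn)` from the hunk above, and does not settle whether replaying an already-truncated transaction is needed at all.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical restatement of the v10 condition in ReorderBufferPrepare():
 *   !already_aborted && rbtxn_is_aborted(txn) && !rbtxn_is_streamed(txn)
 */
static bool
sketch_send_prepare_after_replay(bool already_aborted,  /* before replay */
                                 bool aborted_now,      /* after replay */
                                 bool streamed)
{
    /* Only an abort first noticed during the replay counts as concurrent. */
    bool        concurrent_abort = !already_aborted && aborted_now;

    /* For streamed transactions, stream_prepare was already sent in replay. */
    return concurrent_abort && !streamed;
}
```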

--
With Regards,
Amit Kapila.

From:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-12-10 05:29:05
Message-ID:	CAFiTN-v+fpnqTdD_bmNHf7XRiQVKB+L6XN5ydOMGMZ9y7Q117w@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Nov 26, 2024 at 3:02 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

>
> I've attached a new version patch that incorporates all comments I got so far.
>
> I think the patch is in good shape but I'm considering whether we
> might want to call ReorderBufferToastReset() after truncating all
> changes, in ReorderBufferTruncateTXNIfAborted() just in case. Will
> investigate further.
>

There’s something that seems a bit odd to me. Consider the case where
the largest transaction(s) are aborted. If
ReorderBufferCanStartStreaming() returns true, the changes from this
transaction will only be discarded if it's a streamable transaction.
However, if ReorderBufferCanStartStreaming() is false, the changes
will be discarded regardless.

What seems strange to me in this patch is truncating the changes of a
large aborted transaction depending on whether we need to stream or
spill but actually that should be completely independent IMHO. My
concern is that if the largest transaction is aborted but isn’t yet
streamable, we might end up picking the next transaction, which could
be much smaller. This smaller transaction might not help us stay
within the memory limit, and we could repeat this process for a few
more transactions. In contrast, it might be more efficient to simply
discard the large aborted transaction, even if it’s not streamable, to
avoid this issue.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-12-10 05:39:28
Message-ID:	CAA4eK1JtgTYOSzmG6-xEAhbTdesxRD5fUUPENjxBJSziL0uuvQ@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Dec 10, 2024 at 10:59 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Tue, Nov 26, 2024 at 3:02 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> >
> > I've attached a new version patch that incorporates all comments I got so far.
> >
> > I think the patch is in good shape but I'm considering whether we
> > might want to call ReorderBufferToastReset() after truncating all
> > changes, in ReorderBufferTruncateTXNIfAborted() just in case. Will
> > investigate further.
> >
>
> There’s something that seems a bit odd to me. Consider the case where
> the largest transaction(s) are aborted. If
> ReorderBufferCanStartStreaming() returns true, the changes from this
> transaction will only be discarded if it's a streamable transaction.
> However, if ReorderBufferCanStartStreaming() is false, the changes
> will be discarded regardless.
>
> What seems strange to me in this patch is truncating the changes of a
> large aborted transaction depending on whether we need to stream or
> spill but actually that should be completely independent IMHO. My
> concern is that if the largest transaction is aborted but isn’t yet
> streamable, we might end up picking the next transaction, which could
> be much smaller. This smaller transaction might not help us stay
> within the memory limit, and we could repeat this process for a few
> more transactions. In contrast, it might be more efficient to simply
> discard the large aborted transaction, even if it’s not streamable, to
> avoid this issue.
>

If the largest transaction is non-streamable, won't the transaction
returned by ReorderBufferLargestTXN() in the other case already
suffice the need?

--
With Regards,
Amit Kapila.

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-12-10 05:49:47
Message-ID:	CAA4eK1J75xZ7Lw628GCbkfiSt+J7wqcQNyz0iwNwxj0zwCACWw@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Dec 10, 2024 at 10:39 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> 5.
> +/*
> + * Check the transaction status by looking CLOG and discard all changes if
> + * the transaction is aborted. The transaction status is cached in
> + * txn->txn_flags so we can skip future changes and avoid CLOG lookups on the
> + * next call. Return true if the transaction is aborted, otherwise return
> + * false.
> + *
> + * When the 'debug_logical_replication_streaming' is set to "immediate", we
> + * don't check the transaction status, meaning the caller will always process
> + * this transaction.
> + */
> +static bool
> +ReorderBufferTruncateTXNIfAborted(ReorderBuffer *rb, ReorderBufferTXN *txn)
> +{
>
> I think this function is being invoked to mark a sub-transaction as
> aborted. It is better to explain in comments how it interacts with
> sub-transactions, why it is okay to mark them as aborted, and how the
> other parts of the system interact with it.
>

The current name suggests that the main purpose is to truncate the txn,
which is okay, but wouldn't it be better to name it along the lines of
ReorderBufferCheckAndTruncateAbortedTXN()?

In the following comment, can we move 'Return ...' to the next line to
make the return values from the function clear?
+ * next call. Return true if the transaction is aborted, otherwise return
+ * false.

--
With Regards,
Amit Kapila.

From:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-12-10 06:18:44
Message-ID:	CAFiTN-un38mm2yaCYWxz=se7YEfmfu6noMag4mBaRrcihx-C7w@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Dec 10, 2024 at 11:09 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Dec 10, 2024 at 10:59 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Tue, Nov 26, 2024 at 3:02 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > >
> > > I've attached a new version patch that incorporates all comments I got so far.
> > >
> > > I think the patch is in good shape but I'm considering whether we
> > > might want to call ReorderBufferToastReset() after truncating all
> > > changes, in ReorderBufferTruncateTXNIfAborted() just in case. Will
> > > investigate further.
> > >
> >
> > There’s something that seems a bit odd to me. Consider the case where
> > the largest transaction(s) are aborted. If
> > ReorderBufferCanStartStreaming() returns true, the changes from this
> > transaction will only be discarded if it's a streamable transaction.
> > However, if ReorderBufferCanStartStreaming() is false, the changes
> > will be discarded regardless.
> >
> > What seems strange to me in this patch is truncating the changes of a
> > large aborted transaction depending on whether we need to stream or
> > spill but actually that should be completely independent IMHO. My
> > concern is that if the largest transaction is aborted but isn’t yet
> > streamable, we might end up picking the next transaction, which could
> > be much smaller. This smaller transaction might not help us stay
> > within the memory limit, and we could repeat this process for a few
> > more transactions. In contrast, it might be more efficient to simply
> > discard the large aborted transaction, even if it’s not streamable, to
> > avoid this issue.
> >
>
> If the largest transaction is non-streamable, won't the transaction
> returned by ReorderBufferLargestTXN() in the other case already
> suffice the need?

I see your point, but I don’t think it’s quite the same. When
ReorderBufferCanStartStreaming() is true, the function
ReorderBufferLargestStreamableTopTXN() looks for the largest
transaction among those that have a base_snapshot. So, if the largest
transaction is aborted but hasn’t yet received a base_snapshot, it
will instead select the largest transaction that does have a
base_snapshot, which could be significantly smaller than the largest
aborted transaction.

I’m not saying this is a very common scenario, but I do feel that the
logic behind truncating the largest transaction doesn’t seem entirely
consistent. However, maybe this isn't a major issue. We could justify
the current behavior by saying that before picking any transaction for
streaming or spilling, we first check whether it has been aborted.
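The selection behavior under discussion can be modeled roughly as below. This is a toy model with the following assumptions:

- Each entry has a single size.
- "Streamable" is reduced to having a base snapshot. The real ReorderBufferLargestStreamableTopTXN() applies further checks and considers only top-level transactions.
- The aborted flag stands in for the patch's CLOG check.
- Streaming and spilling themselves are not modeled.

All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct SketchEvictTXN
{
    size_t  size;               /* memory used by this entry */
    bool    has_base_snapshot;  /* base_snapshot != NULL */
    bool    aborted;            /* what the CLOG check would report */
} SketchEvictTXN;

/* Largest entry overall (cf. ReorderBufferLargestTXN(), spill path). */
static int
sketch_largest(const SketchEvictTXN *txns, int n)
{
    int         best = -1;

    for (int i = 0; i < n; i++)
        if (best < 0 || txns[i].size > txns[best].size)
            best = i;
    return best;
}

/* Largest entry with a base snapshot (cf. the streamable-txn lookup). */
static int
sketch_largest_streamable(const SketchEvictTXN *txns, int n)
{
    int         best = -1;

    for (int i = 0; i < n; i++)
        if (txns[i].has_base_snapshot &&
            (best < 0 || txns[i].size > txns[best].size))
            best = i;
    return best;
}

/*
 * Pick a victim and, if it is aborted, discard its changes instead of
 * streaming or spilling it.  Returns the chosen index, or -1 if none.
 */
static int
sketch_evict_one(SketchEvictTXN *txns, int n, bool can_stream)
{
    int         victim = -1;

    assert(n >= 0);
    if (can_stream)
        victim = sketch_largest_streamable(txns, n);
    if (victim < 0)
        victim = sketch_largest(txns, n);   /* fall back to spilling */

    if (victim >= 0 && txns[victim].aborted)
        txns[victim].size = 0;              /* truncate the aborted txn */
    return victim;
}
```

Consider a large aborted entry without a base snapshot next to a small streamable one. The streaming path picks the small entry and leaves the large one in memory, while the spilling path picks and discards the large one. Whether an entry like the first one can actually occur is a separate question.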

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-12-10 21:48:21
Message-ID:	CAD21AoApi6Mh6DJwyh_gmFDHAc_j_zDVcWJ-PaFknyaPjKE_SQ@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Dec 9, 2024 at 10:19 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Tue, Dec 10, 2024 at 11:09 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Dec 10, 2024 at 10:59 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Nov 26, 2024 at 3:02 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > >
> > > > I've attached a new version patch that incorporates all comments I got so far.
> > > >
> > > > I think the patch is in good shape but I'm considering whether we
> > > > might want to call ReorderBufferToastReset() after truncating all
> > > > changes, in ReorderBufferTruncateTXNIfAborted() just in case. Will
> > > > investigate further.
> > > >
> > >
> > > There’s something that seems a bit odd to me. Consider the case where
> > > the largest transaction(s) are aborted. If
> > > ReorderBufferCanStartStreaming() returns true, the changes from this
> > > transaction will only be discarded if it's a streamable transaction.
> > > However, if ReorderBufferCanStartStreaming() is false, the changes
> > > will be discarded regardless.
> > >
> > > What seems strange to me in this patch is truncating the changes of a
> > > large aborted transaction depending on whether we need to stream or
> > > spill but actually that should be completely independent IMHO. My
> > > concern is that if the largest transaction is aborted but isn’t yet
> > > streamable, we might end up picking the next transaction, which could
> > > be much smaller. This smaller transaction might not help us stay
> > > within the memory limit, and we could repeat this process for a few
> > > more transactions. In contrast, it might be more efficient to simply
> > > discard the large aborted transaction, even if it’s not streamable, to
> > > avoid this issue.
> > >
> >
> > If the largest transaction is non-streamable, won't the transaction
> > returned by ReorderBufferLargestTXN() in the other case already
> > suffice the need?
>
> I see your point, but I don’t think it’s quite the same. When
> ReorderBufferCanStartStreaming() is true, the function
> ReorderBufferLargestStreamableTopTXN() looks for the largest
> transaction among those that have a base_snapshot. So, if the largest
> transaction is aborted but hasn’t yet received a base_snapshot, it
> will instead select the largest transaction that does have a
> base_snapshot, which could be significantly smaller than the largest
> aborted transaction.

IIUC the transaction entries in reorderbuffer have the base snapshot
before decoding the first change (see SnapBuildProcessChange()). In
which cases would a transaction not have the base snapshot and yet
have the largest amount of changes? Subtransaction entries could
transfer their base snapshot to their parent transaction entry, but
such subtransactions will be picked by ReorderBufferLargestTXN().

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-12-11 02:51:27
Message-ID:	CAFiTN-tmri+1BUgT9pBsL3gqk5pMpP-g0=WBC9EBOvH3L8Vbkw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Dec 11, 2024 at 3:18 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Dec 9, 2024 at 10:19 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:

> > >
> > > If the largest transaction is non-streamable, won't the transaction
> > > returned by ReorderBufferLargestTXN() in the other case already
> > > suffice the need?
> >
> > I see your point, but I don’t think it’s quite the same. When
> > ReorderBufferCanStartStreaming() is true, the function
> > ReorderBufferLargestStreamableTopTXN() looks for the largest
> > transaction among those that have a base_snapshot. So, if the largest
> > transaction is aborted but hasn’t yet received a base_snapshot, it
> > will instead select the largest transaction that does have a
> > base_snapshot, which could be significantly smaller than the largest
> > aborted transaction.
>
> IIUC the transaction entries in reorderbuffer have the base snapshot
> before decoding the first change (see SnapBuildProcessChange()). In
> which cases would a transaction not have the base snapshot and yet
> have the largest amount of changes? Subtransaction entries could
> transfer their base snapshot to their parent transaction entry, but
> such subtransactions will be picked by ReorderBufferLargestTXN().
>
IIRC, there could be cases where reorder buffers of transactions can
grow in size without having a base snapshot; I think transactions
doing DDLs and generating a lot of INVALIDATION messages could fall
into such a category. And that was one of the reasons why we were
using txns_by_base_snapshot_lsn inside
ReorderBufferLargestStreamableTopTXN().
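The gap between the two selection paths can be modelled in a few lines. This is a simplified, hypothetical sketch, not PostgreSQL code: `ToyTXN`, `largest_txn()` and `largest_streamable_txn()` are invented names, linear scans stand in for the real data structures, the top-level/subtransaction distinction is ignored, and only the base-snapshot condition discussed above (plus skipping empty entries) is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a reorder buffer transaction entry. */
typedef struct ToyTXN
{
    size_t size;              /* bytes of decoded changes held in memory */
    bool   has_base_snapshot; /* has the entry been given a base snapshot? */
} ToyTXN;

/* Models ReorderBufferLargestTXN(): the largest entry, snapshot or not.
 * Returns the index of the chosen entry, or -1 if there is none. */
static int
largest_txn(const ToyTXN *txns, int n)
{
    int best = -1;

    assert(txns != NULL || n == 0);
    for (int i = 0; i < n; i++)
        if (best < 0 || txns[i].size > txns[best].size)
            best = i;
    return best;
}

/* Models only the base-snapshot restriction of
 * ReorderBufferLargestStreamableTopTXN(): entries without a base snapshot
 * (and empty entries) are never candidates. */
static int
largest_streamable_txn(const ToyTXN *txns, int n)
{
    int best = -1;

    assert(txns != NULL || n == 0);
    for (int i = 0; i < n; i++)
    {
        if (!txns[i].has_base_snapshot || txns[i].size == 0)
            continue;
        if (best < 0 || txns[i].size > txns[best].size)
            best = i;
    }
    return best;
}
```

With entries of 1000 bytes (no base snapshot), 10 bytes and 20 bytes (both with one), `largest_txn()` picks index 0 while `largest_streamable_txn()` picks index 2, which is the situation Dilip describes.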

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-12-12 05:38:33
Message-ID:	CAA4eK1+mL-xp4d_k1sU0m81TBcFhH54fdwbRDO1nDGUrXcM1XA@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Dec 11, 2024 at 8:21 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Wed, Dec 11, 2024 at 3:18 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Dec 9, 2024 at 10:19 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> > > >
> > > > If the largest transaction is non-streamable, won't the transaction
> > > > returned by ReorderBufferLargestTXN() in the other case already
> > > > suffice the need?
> > >
> > > I see your point, but I don’t think it’s quite the same. When
> > > ReorderBufferCanStartStreaming() is true, the function
> > > ReorderBufferLargestStreamableTopTXN() looks for the largest
> > > transaction among those that have a base_snapshot. So, if the largest
> > > transaction is aborted but hasn’t yet received a base_snapshot, it
> > > will instead select the largest transaction that does have a
> > > base_snapshot, which could be significantly smaller than the largest
> > > aborted transaction.
> >
> > IIUC the transaction entries in reorderbuffer have the base snapshot
> > before decoding the first change (see SnapBuildProcessChange()). In
> > which cases would a transaction not have the base snapshot and yet
> > have the largest amount of changes? Subtransaction entries could
> > transfer their base snapshot to their parent transaction entry, but
> > such subtransactions will be picked by ReorderBufferLargestTXN().
> >
> IIRC, there could be cases where reorder buffers of transactions can
> grow in size without having a base snapshot; I think transactions
> doing DDLs and generating a lot of INVALIDATION messages could fall
> into such a category.
>

Are we recording such changes in the reorder buffer? If so, can you
please share how? AFAICU, the main idea behind skipping aborts is to
avoid sending a lot of data to the client that later needs to be
discarded or cases where we spent resources/time spilling the changes
that later need to be discarded. In that vein, the current idea of the
patch where it truncates and skips aborted xacts before streaming or
spilling them sounds reasonable.

--
With Regards,
Amit Kapila.

From:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-12-12 06:01:03
Message-ID:	CAFiTN-us5d3yJannbb+w=0pyTJXycVT68z44mA7mjbt-JZ4tvg@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Dec 12, 2024 at 11:08 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Dec 11, 2024 at 8:21 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Wed, Dec 11, 2024 at 3:18 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Dec 9, 2024 at 10:19 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > > > >
> > > > > If the largest transaction is non-streamable, won't the transaction
> > > > > returned by ReorderBufferLargestTXN() in the other case already
> > > > > suffice the need?
> > > >
> > > > I see your point, but I don’t think it’s quite the same. When
> > > > ReorderBufferCanStartStreaming() is true, the function
> > > > ReorderBufferLargestStreamableTopTXN() looks for the largest
> > > > transaction among those that have a base_snapshot. So, if the largest
> > > > transaction is aborted but hasn’t yet received a base_snapshot, it
> > > > will instead select the largest transaction that does have a
> > > > base_snapshot, which could be significantly smaller than the largest
> > > > aborted transaction.
> > >
> > > IIUC the transaction entries in reorderbuffer have the base snapshot
> > > before decoding the first change (see SnapBuildProcessChange()). In
> > > which cases would a transaction not have the base snapshot and yet
> > > have the largest amount of changes? Subtransaction entries could
> > > transfer their base snapshot to their parent transaction entry, but
> > > such subtransactions will be picked by ReorderBufferLargestTXN().
> > >
> > IIRC, there could be cases where reorder buffers of transactions can
> > grow in size without having a base snapshot; I think transactions
> > doing DDLs and generating a lot of INVALIDATION messages could fall
> > into such a category.
> >
>
> Are we recording such changes in the reorder buffer? If so, can you
> please share how?

xact_decode() does add XLOG_XACT_INVALIDATIONS to the reorder buffer,
and for such changes we don't call SnapBuildProcessChange(), which
means it is possible to collect such changes in the reorder buffer
without setting the base_snapshot.

> AFAICU, the main idea behind skipping aborts is to
> avoid sending a lot of data to the client that later needs to be
> discarded or cases where we spent resources/time spilling the changes
> that later need to be discarded. In that vein, the current idea of the
> patch where it truncates and skips aborted xacts before streaming or
> spilling them sounds reasonable.

I believe in one of my previous responses (a few emails above), I
agreed that it's a reasonable goal to check for aborted transactions
just before spilling or streaming, and if we detect an aborted
transaction, we can avoid streaming/spilling and simply discard the
changes. However, I wanted to make a point that if we have a large
aborted transaction without a base snapshot (assuming that's
possible), we might end up streaming many small transactions to stay
under the memory limit. Even though we try to stay within the limit,
we still might not succeed because the main issue is the large aborted
transaction, which doesn't have a base snapshot.

So, instead of streaming many small transactions, if we had selected
the largest transaction first and checked if it was aborted, we could
have avoided streaming all those smaller transactions. I agree this is
a hypothetical scenario and may not be worth optimizing, and that's
completely fair. I just wanted to clarify the point I raised when I
first started reviewing this patch.

I haven't tried it myself, but I believe this scenario could be
created by starting a transaction that performs multiple DDLs and then
ultimately gets aborted.
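Dilip's hypothetical scenario can be played out with a toy simulation of the memory-limit loop. This is a rough sketch under stated assumptions, not PostgreSQL code: `SimTXN`, `pick_largest()` and `count_streamed()` are invented, sizes are arbitrary, the loop runs while total memory is above the limit, and the fallback when nothing is streamable (spill the largest entry, or discard it if aborted) is modelled only as freeing its memory. The `check_largest_first` flag models the alternative of examining the overall largest entry for abort first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transaction entry for the simulation. */
typedef struct SimTXN
{
    size_t size;
    bool   has_base_snapshot;
    bool   aborted;
} SimTXN;

static size_t
total_size(const SimTXN *txns, int n)
{
    size_t total = 0;

    for (int i = 0; i < n; i++)
        total += txns[i].size;
    return total;
}

/* Index of the largest non-empty entry, optionally only among entries
 * with a base snapshot; -1 if there is none. */
static int
pick_largest(const SimTXN *txns, int n, bool need_base_snapshot)
{
    int best = -1;

    for (int i = 0; i < n; i++)
    {
        if (txns[i].size == 0 || (need_base_snapshot && !txns[i].has_base_snapshot))
            continue;
        if (best < 0 || txns[i].size > txns[best].size)
            best = i;
    }
    return best;
}

/* Frees memory until the total is at or below 'limit' and returns how many
 * non-aborted transactions were streamed along the way. Aborted victims
 * are discarded instead of being streamed or spilled. */
static int
count_streamed(SimTXN *txns, int n, size_t limit, bool check_largest_first)
{
    int streamed = 0;

    while (total_size(txns, n) > limit)
    {
        int victim;

        if (check_largest_first)
        {
            victim = pick_largest(txns, n, false);
            if (victim >= 0 && txns[victim].aborted)
            {
                txns[victim].size = 0;  /* discard the aborted changes */
                continue;
            }
        }

        victim = pick_largest(txns, n, true);
        if (victim >= 0)
        {
            if (!txns[victim].aborted)
                streamed++;             /* changes sent downstream */
            txns[victim].size = 0;      /* memory is released either way */
            continue;
        }

        /* Nothing streamable: fall back to the largest entry overall. */
        victim = pick_largest(txns, n, false);
        assert(victim >= 0);            /* total > limit implies a non-empty entry */
        txns[victim].size = 0;          /* spilled, or discarded if aborted */
    }
    return streamed;
}
```

With a 1000-byte aborted entry that has no base snapshot, three 10-byte streamable entries and a limit of 100, the streamable-first order streams all three small transactions before the aborted one is discarded, while checking the largest entry first streams none. As noted above, whether this case is worth optimizing is an open question.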

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-12-12 21:30:29
Message-ID:	CAD21AoAW9OP-Qedgo-KHS_DFvrfxYZrZj6-fagjE2fefH_0tcg@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Dec 11, 2024 at 10:01 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Thu, Dec 12, 2024 at 11:08 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Dec 11, 2024 at 8:21 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Dec 11, 2024 at 3:18 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, Dec 9, 2024 at 10:19 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > > > >
> > > > > > If the largest transaction is non-streamable, won't the transaction
> > > > > > returned by ReorderBufferLargestTXN() in the other case already
> > > > > > suffice the need?
> > > > >
> > > > > I see your point, but I don’t think it’s quite the same. When
> > > > > ReorderBufferCanStartStreaming() is true, the function
> > > > > ReorderBufferLargestStreamableTopTXN() looks for the largest
> > > > > transaction among those that have a base_snapshot. So, if the largest
> > > > > transaction is aborted but hasn’t yet received a base_snapshot, it
> > > > > will instead select the largest transaction that does have a
> > > > > base_snapshot, which could be significantly smaller than the largest
> > > > > aborted transaction.
> > > >
> > > > IIUC the transaction entries in reorderbuffer have the base snapshot
> > > > before decoding the first change (see SnapBuildProcessChange()). In
> > > > which cases would a transaction not have the base snapshot and yet
> > > > have the largest amount of changes? Subtransaction entries could
> > > > transfer their base snapshot to their parent transaction entry, but
> > > > such subtransactions will be picked by ReorderBufferLargestTXN().
> > > >
> > > IIRC, there could be cases where reorder buffers of transactions can
> > > grow in size without having a base snapshot; I think transactions
> > > doing DDLs and generating a lot of INVALIDATION messages could fall
> > > into such a category.
> > >
> >
> > Are we recording such changes in the reorder buffer? If so, can you
> > please share how?
>
> xact_decode() does add XLOG_XACT_INVALIDATIONS to the reorder buffer,
> and for such changes we don't call SnapBuildProcessChange(), which
> means it is possible to collect such changes in the reorder buffer
> without setting the base_snapshot.

DDLs write not only XLOG_XACT_INVALIDATIONS but also system catalog
changes. I think that when decoding these system catalog changes, we
end up calling SnapBuildProcessChange(). I understand that decoding
XLOG_XACT_INVALIDATIONS doesn't call SnapBuildProcessChange() and
instead queues invalidation messages to the reorderbuffer, but I still
don't understand in which cases a transaction entry would be quite big
while containing only a lot of invalidation messages.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-12-15 05:15:17
Message-ID:	CAFiTN-sqmsCFDD_eBUAjyp7Snqbw7amO_4fxLod3xKNbHRupNA@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Dec 13, 2024 at 3:01 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> DDLs write not only XLOG_XACT_INVALIDATIONS but also system catalog
> changes. I think that when decoding these system catalog changes, we
> end up calling SnapBuildProcessChange(). I understand that decoding
> XLOG_XACT_INVALIDATIONS doesn't call SnapBuildProcessChange() and
> instead queues invalidation messages to the reorderbuffer, but I still
> don't understand in which cases a transaction entry would be quite big
> while containing only a lot of invalidation messages.

You are right that SnapBuildProcessChange() will be called when there
are changes in the system catalog. However, it is quite possible that
when the system catalog change is processed, the snapbuild state is
not yet SNAPBUILD_FULL_SNAPSHOT, and by the time we reach
XLOG_XACT_INVALIDATIONS, some concurrent transaction has committed and
the snapbuild state has changed to SNAPBUILD_FULL_SNAPSHOT. That said,
I agree that such a transaction cannot really be very large, because
it can contain invalidation messages from at most a single DDL
command. So maybe we don't need to do anything special for them, and
we can go ahead with the approach you followed in the current
patch.
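The sequence described above can be traced with a toy state machine. It is a hypothetical model, not the real decoder: the enum, struct and function names are invented, both paths are assumed to queue nothing until the builder reaches the full-snapshot state, and only the catalog-change path (standing in for SnapBuildProcessChange()) assigns a base snapshot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented names loosely mirroring the snapshot builder's states. */
typedef enum ToyBuildState
{
    TOY_BUILDING_SNAPSHOT,
    TOY_FULL_SNAPSHOT,
    TOY_CONSISTENT
} ToyBuildState;

typedef struct ToyTXN
{
    bool   has_base_snapshot;
    size_t nchanges;            /* entries queued in the reorder buffer */
} ToyTXN;

/* Stand-in for SnapBuildProcessChange(): nothing is queued before the
 * full-snapshot state; otherwise the entry gets a base snapshot. */
static bool
toy_process_change(ToyBuildState state, ToyTXN *txn)
{
    if (state < TOY_FULL_SNAPSHOT)
        return false;
    txn->has_base_snapshot = true;
    return true;
}

/* A system catalog change is queued only if the snapshot check passes. */
static void
toy_decode_catalog_change(ToyBuildState state, ToyTXN *txn)
{
    if (toy_process_change(state, txn))
        txn->nchanges++;
}

/* Invalidation messages are queued without the snapshot check. */
static void
toy_decode_invalidations(ToyBuildState state, ToyTXN *txn)
{
    if (state >= TOY_FULL_SNAPSHOT)
        txn->nchanges++;
}

/* True for an entry that holds changes but has no base snapshot. */
static bool
toy_lacks_base_snapshot(const ToyTXN *txn)
{
    assert(txn != NULL);
    return txn->nchanges > 0 && !txn->has_base_snapshot;
}
```

A catalog change decoded while still building is dropped, the invalidations decoded after the state reaches full snapshot are queued, and the entry ends up holding changes without a base snapshot. Consistent with the point above, in this model such an entry only carries the invalidations of one DDL command.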

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-12-17 11:25:07
Message-ID:	CAA4eK1K4FsYheO_sdhc55iCj3y2+DG2jACxJ374ERR-n5h_aEg@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Dec 15, 2024 at 10:45 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Fri, Dec 13, 2024 at 3:01 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > DDLs write not only XLOG_XACT_INVALIDATIONS but also system catalog
> > changes. I think that when decoding these system catalog changes, we
> > end up calling SnapBuildProcessChange(). I understand that decoding
> > XLOG_XACT_INVALIDATIONS doesn't call SnapBuildProcessChange() and
> > instead queues invalidation messages to the reorderbuffer, but I still
> > don't understand in which cases a transaction entry would be quite big
> > while containing only a lot of invalidation messages.
>
> You are right that SnapBuildProcessChange() will be called when there
> are changes in the system catalog. However, it is quite possible that
> when the system catalog change is processed, the snapbuild state is
> not yet SNAPBUILD_FULL_SNAPSHOT, and by the time we reach
> XLOG_XACT_INVALIDATIONS, some concurrent transaction has committed and
> the snapbuild state has changed to SNAPBUILD_FULL_SNAPSHOT. That said,
> I agree that such a transaction cannot really be very large, because
> it can contain invalidation messages from at most a single DDL
> command. So maybe we don't need to do anything special for them, and
> we can go ahead with the approach you followed in the current
> patch.
>

Thanks, I also think we can proceed with the current approach. So, the
pending task is to address a few comments raised by me.

--
With Regards,
Amit Kapila.

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-12-19 01:43:56
Message-ID:	CAD21AoDvgeNrubxX-iQWgUVNuYZh6N9XY3-d2YM800rPoJWmOA@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Dec 9, 2024 at 9:09 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Nov 26, 2024 at 3:03 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > I've attached a new version patch that incorporates all comments I got so far.
> >
>
> Review comments:

Thank you for reviewing the patch!

> ===============
> 1.
> + * The given transaction is marked as streamed if appropriate and the caller
> + * requested it by passing 'mark_txn_streaming' as true.
> + *
> * 'txn_prepared' indicates that we have decoded the transaction at prepare
> * time.
> */
> static void
> -ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared)
> +ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared,
> + bool mark_txn_streaming)
> {
> ...
> }
> + else if (mark_txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
> + {
> + /*
> + * Mark the transaction as streamed, if appropriate.
>
> The comments related to the above changes don't clarify in which cases
> the 'mark_txn_streaming' should be set. Before this patch, it was
> clear from the comments and code about the cases where we would decide
> to mark it as streamed.

I think we can rename it to txn_streaming for consistency with
txn_prepared. I've changed the comment for that.

>
> 2.
> + /*
> + * Mark the transaction as aborted so we ignore future changes of this
> + * transaction.
>
> /so we ignore/so we can ignore/

Fixed.

>
> 3.
> * Helper function for ReorderBufferProcessTXN to handle the concurrent
> - * abort of the streaming transaction. This resets the TXN such that it
> - * can be used to stream the remaining data of transaction being processed.
> - * This can happen when the subtransaction is aborted and we still want to
> - * continue processing the main or other subtransactions data.
> + * abort of the streaming (prepared) transaction.
> ...
>
> In the above comment, "... streaming (prepared)...", you added
> prepared to imply that this function handles concurrent abort for both
> in-progress and prepared transactions. Am I correct? If so, the
> current change makes it less clear. If you see the comments at its
> caller, they are clearer.

I think we don't need this change as the patch doesn't change what
this function does and what the caller would expect. So removed.

>
> 4.
> + /*
> + * Remember if the transaction is already aborted so we can detect when
> + * the transaction is concurrently aborted during the replay.
> + */
> + already_aborted = rbtxn_is_aborted(txn);
> +
> ReorderBufferReplay(txn, rb, xid, txn->final_lsn, txn->end_lsn,
> txn->xact_time.prepare_time, txn->origin_id, txn->origin_lsn);
>
> @@ -2832,10 +2918,10 @@ ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
> * when rollback prepared is decoded and sent, the downstream should be
> * able to rollback such a xact. See comments atop DecodePrepare.
> *
> - * Note, for the concurrent_abort + streaming case a stream_prepare was
> + * Note, for the concurrent abort + streaming case a stream_prepare was
> * already sent within the ReorderBufferReplay call above.
> */
> - if (txn->concurrent_abort && !rbtxn_is_streamed(txn))
> + if (!already_aborted && rbtxn_is_aborted(txn) && !rbtxn_is_streamed(txn))
> rb->prepare(rb, txn, txn->final_lsn);
>
> It is not clear from the comments how the 'already_aborted' is
> handled. I think after this patch we would have already truncated all
> its changes. If so, why do we need to try to replay the changes of
> such a xact?

I used ReorderBufferReplay() for convenience; it sends begin_prepare()
and prepare() appropriately, handles streaming-prepared transactions,
and updates statistics etc. But as you pointed out, it would not be
necessary to set up a historical snapshot etc. I agree that we don't
need to try replaying such aborted transactions but I'd like to
confirm we don't really need to execute invalidation messages even in
aborted transactions.

>
> 5.
> +/*
> + * Check the transaction status by looking CLOG and discard all changes if
> + * the transaction is aborted. The transaction status is cached in
> + * txn->txn_flags so we can skip future changes and avoid CLOG lookups on the
> + * next call. Return true if the transaction is aborted, otherwise return
> + * false.
> + *
> + * When the 'debug_logical_replication_streaming' is set to "immediate", we
> + * don't check the transaction status, meaning the caller will always process
> + * this transaction.
> + */
> +static bool
> +ReorderBufferTruncateTXNIfAborted(ReorderBuffer *rb, ReorderBufferTXN *txn)
> +{
>
> I think this function is being invoked to mark a sub-transaction as
> aborted. It is better to explain in comments how it interacts with
> sub-transactions, why it is okay to mark them as aborted, and how the
> other parts of the system interact with it.

This function can be called for top-level transactions and
subtransactions. IIUC there is no significant difference between
calling it for a top-level transaction and for a subtransaction. What
interaction with subtransactions are you concerned about?
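The caching described in the function's header comment (look up the status once, remember it in the flags, and discard the changes if aborted) can be sketched as follows. This is an illustrative model only: the flag values, `toy_status_lookup()` (standing in for the CLOG lookup) and its XID convention are invented, and, in line with the reply above, the sketch treats top-level transactions and subtransactions the same way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; PostgreSQL's RBTXN_* values are not reproduced. */
#define TOY_IS_COMMITTED 0x01
#define TOY_IS_ABORTED   0x02

typedef enum ToyXactStatus
{
    TOY_IN_PROGRESS,
    TOY_COMMITTED,
    TOY_ABORTED
} ToyXactStatus;

typedef struct ToyTXN
{
    unsigned xid;
    unsigned flags;
    size_t   size;              /* memory held by queued changes */
} ToyTXN;

static int status_lookups = 0;  /* counts simulated CLOG lookups */

/* Invented status oracle: XIDs below 100 are finished; odd ones aborted. */
static ToyXactStatus
toy_status_lookup(unsigned xid)
{
    status_lookups++;
    if (xid >= 100)
        return TOY_IN_PROGRESS;
    return (xid % 2 == 1) ? TOY_ABORTED : TOY_COMMITTED;
}

/* Returns true if the transaction is known to be aborted, discarding its
 * changes the first time that is detected. */
static bool
toy_truncate_txn_if_aborted(ToyTXN *txn)
{
    /* Cached results avoid repeated lookups. */
    if (txn->flags & TOY_IS_COMMITTED)
        return false;
    if (txn->flags & TOY_IS_ABORTED)
    {
        assert(txn->size == 0);   /* changes were discarded earlier */
        return true;
    }

    switch (toy_status_lookup(txn->xid))
    {
        case TOY_IN_PROGRESS:
            return false;         /* not cached: the status may still change */
        case TOY_COMMITTED:
            txn->flags |= TOY_IS_COMMITTED;
            return false;
        case TOY_ABORTED:
            break;
    }

    txn->size = 0;                /* discard all queued changes */
    txn->flags |= TOY_IS_ABORTED; /* so future changes can be ignored */
    return true;
}
```

Only the in-progress answer is left uncached, because that is the one status that can still change between calls.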

I've attached the updated patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v11-0001-Skip-logical-decoding-of-already-aborted-transac.patch	application/octet-stream	22.3 KB

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-12-19 10:56:34
Message-ID:	CAA4eK1KBRWgX-7E=m9U418V2LE9A8=JgWk3EFEbdP_xwrRz97A@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Dec 19, 2024 at 7:14 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Dec 9, 2024 at 9:09 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Nov 26, 2024 at 3:03 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > I've attached a new version patch that incorporates all comments I got so far.
> > >
> >
> > Review comments:
>
> Thank you for reviewing the patch!
>
> > ===============
> > 1.
> > + * The given transaction is marked as streamed if appropriate and the caller
> > + * requested it by passing 'mark_txn_streaming' as true.
> > + *
> > * 'txn_prepared' indicates that we have decoded the transaction at prepare
> > * time.
> > */
> > static void
> > -ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared)
> > +ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared,
> > + bool mark_txn_streaming)
> > {
> > ...
> > }
> > + else if (mark_txn_streaming && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0)))
> > + {
> > + /*
> > + * Mark the transaction as streamed, if appropriate.
> >
> > The comments related to the above changes don't clarify in which cases
> > the 'mark_txn_streaming' should be set. Before this patch, it was
> > clear from the comments and code about the cases where we would decide
> > to mark it as streamed.
>
> I think we can rename it to txn_streaming for consistency with
> txn_prepared. I've changed the comment for that.
>

@@ -2067,7 +2143,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
ReorderBufferChange *specinsert)
{
/* Discard the changes that we just streamed */
- ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn));
+ ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), true);

@@ -1924,7 +2000,7 @@ ReorderBufferStreamCommit(ReorderBuffer *rb, ReorderBufferTXN *txn)
* full cleanup will happen as part of the COMMIT PREPAREDs, so now
* just truncate txn by removing changes and tuplecids.
*/
- ReorderBufferTruncateTXN(rb, txn, true);
+ ReorderBufferTruncateTXN(rb, txn, true, true);

In both the above places, the patch unconditionally passes true for
'txn_streaming', even for prepared transactions where it wouldn't be a
streaming xact. Inside the function, the patch handles that by first
checking whether the transaction is prepared (txn_prepared). So, the
logic will work, but the function signature and the way its callers
use it make it difficult to use and extend in the future.

I think for the first case, we should get the streaming parameter in
ReorderBufferResetTXN(), and for the second case,
ReorderBufferStreamCommit(), we should pass it as false because by
that time the transaction has already been streamed and prepared; we
are invoking it only for cleanup. Even when we call
ReorderBufferTruncateTXN() from ReorderBufferCheckAndTruncateAbortedTXN(),
it would be better to write a comment at the caller about why we are
passing this parameter as false.

>
> >
> > 4.
> > + /*
> > + * Remember if the transaction is already aborted so we can detect when
> > + * the transaction is concurrently aborted during the replay.
> > + */
> > + already_aborted = rbtxn_is_aborted(txn);
> > +
> > ReorderBufferReplay(txn, rb, xid, txn->final_lsn, txn->end_lsn,
> > txn->xact_time.prepare_time, txn->origin_id, txn->origin_lsn);
> >
> > @@ -2832,10 +2918,10 @@ ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
> > * when rollback prepared is decoded and sent, the downstream should be
> > * able to rollback such a xact. See comments atop DecodePrepare.
> > *
> > - * Note, for the concurrent_abort + streaming case a stream_prepare was
> > + * Note, for the concurrent abort + streaming case a stream_prepare was
> > * already sent within the ReorderBufferReplay call above.
> > */
> > - if (txn->concurrent_abort && !rbtxn_is_streamed(txn))
> > + if (!already_aborted && rbtxn_is_aborted(txn) && !rbtxn_is_streamed(txn))
> > rb->prepare(rb, txn, txn->final_lsn);
> >
> > It is not clear from the comments how the 'already_aborted' is
> > handled. I think after this patch we would have already truncated all
> > its changes. If so, why do we need to try to replay the changes of
> > such a xact?
>
> I used ReorderBufferReplay() for convenience; it sends begin_prepare()
> and prepare() appropriately, handles streaming-prepared transactions,
> and updates statistics etc. But as you pointed out, it would not be
> necessary to set up a historical snapshot etc. I agree that we don't
> need to try replaying such aborted transactions but I'd like to
> confirm we don't really need to execute invalidation messages even in
> aborted transactions.
>

We need to execute invalidations if we have loaded any cache entries,
for example in the case of streaming. See comments in the function
ReorderBufferAbort(). However, I find both the current changes and the
previous patch a bit difficult to follow. How about if we instead
invent a flag like RBTXN_SENT_PREPARE or something like that and then
use that flag to decide whether to send prepare in
ReorderBufferPrepare(). Then add comments for the cases in which
prepare will be sent from ReorderBufferPrepare().
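The flag idea amounts to recording that the prepare callback has already been invoked, so that ReorderBufferPrepare() only has to check one bit. A minimal sketch, assuming an invented flag value and toy functions (`toy_send_prepare()`, `toy_replay()`, `toy_prepare()`); replay is reduced to "send prepare unless the transaction was found aborted".

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SENT_PREPARE 0x01   /* hypothetical flag bit */

typedef struct ToyTXN
{
    bool     aborted;           /* found aborted before or during replay */
    unsigned flags;
    int      prepares_sent;     /* how often the prepare callback ran */
} ToyTXN;

/* Invokes the (simulated) prepare callback and records that it did. */
static void
toy_send_prepare(ToyTXN *txn)
{
    assert(!(txn->flags & TOY_SENT_PREPARE));   /* at most once */
    txn->prepares_sent++;
    txn->flags |= TOY_SENT_PREPARE;
}

/* Simplified replay: a normal replay ends by sending prepare, while an
 * aborted transaction stops without sending it. */
static void
toy_replay(ToyTXN *txn)
{
    if (txn->aborted)
        return;
    toy_send_prepare(txn);
}

/* Simplified ReorderBufferPrepare(): replay, then send prepare if replay
 * did not, so the downstream can later roll the transaction back. */
static void
toy_prepare(ToyTXN *txn)
{
    toy_replay(txn);
    if (!(txn->flags & TOY_SENT_PREPARE))
        toy_send_prepare(txn);
}
```

Whichever path ran, the decision in `toy_prepare()` depends only on the flag, not on comparing abort state before and after replay.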

*
+ * Since we don't check the transaction status while replaying the
+ * transaction, we don't need to reset toast reconstruction data here.
+ */
+ ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false);
+
+ /* All changes should be discarded */
+ Assert(txn->size == 0);

Can we expect the size to be zero without resetting the toast data? In
ReorderBufferToastReset(), we call ReorderBufferReturnChange() which
reduces the change size. So, won't that size still be accounted for in
txn?
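The question is about memory accounting. A toy model, assuming (as the question implies) that changes held for TOAST reconstruction are still counted in the transaction's size; `ToyTXN`, `toy_queue()`, `toy_truncate()` and `toy_toast_reset()` are invented names and the byte counts are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyTXN
{
    size_t changes_bytes;   /* changes still in the change list */
    size_t toast_bytes;     /* changes kept for TOAST reconstruction */
    size_t size;            /* accounted total, playing the role of txn->size */
} ToyTXN;

/* Queue a change; either way it is counted in the accounted size. */
static void
toy_queue(ToyTXN *txn, size_t bytes, bool for_toast)
{
    if (for_toast)
        txn->toast_bytes += bytes;
    else
        txn->changes_bytes += bytes;
    txn->size += bytes;
}

/* Discard the change list only (the truncation step). */
static void
toy_truncate(ToyTXN *txn)
{
    assert(txn->size >= txn->changes_bytes);
    txn->size -= txn->changes_bytes;
    txn->changes_bytes = 0;
}

/* Discard TOAST reconstruction data (the toast-reset step). */
static void
toy_toast_reset(ToyTXN *txn)
{
    assert(txn->size >= txn->toast_bytes);
    txn->size -= txn->toast_bytes;
    txn->toast_bytes = 0;
}
```

After queuing 100 bytes of ordinary changes and 40 bytes held for TOAST, `toy_truncate()` leaves `size` at 40 and only `toy_toast_reset()` brings it to 0, so in this model an `Assert(txn->size == 0)` placed after truncation alone would fail.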

--
With Regards,
Amit Kapila.

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-12-19 19:11:50
Message-ID:	CAD21AoCv1Ei6Swkj-BdiFweJsr24YkeXpCXgaVNZApZs2VLnSw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Dec 19, 2024 at 2:56 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Dec 19, 2024 at 7:14 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Dec 9, 2024 at 9:09 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Nov 26, 2024 at 3:03 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > I've attached a new version patch that incorporates all comments I got so far.
> > > >
> > >
> > > Review comments:
> >
> > Thank you for reviewing the patch!
> >
> > > ===============
> > > 1.
> > > + * The given transaction is marked as streamed if appropriate and the caller
> > > + * requested it by passing 'mark_txn_streaming' as true.
> > > + *
> > > * 'txn_prepared' indicates that we have decoded the transaction at prepare
> > > * time.
> > > */
> > > static void
> > > -ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared)
> > > +ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prepared,
> > > + bool mark_txn_streaming)
> > > {
> > > ...
> > > }
> > > + else if (mark_txn_streaming && (rbtxn_is_toptxn(txn) ||
> > > (txn->nentries_mem != 0)))
> > > + {
> > > + /*
> > > + * Mark the transaction as streamed, if appropriate.
> > >
> > > The comments related to the above changes don't clarify in which cases
> > > the 'mark_txn_streaming' should be set. Before this patch, it was
> > > clear from the comments and code about the cases where we would decide
> > > to mark it as streamed.
> >
> > I think we can rename it to txn_streaming for consistency with
> > txn_prepared. I've changed the comment for that.
> >
>
> @@ -2067,7 +2143,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb,
> ReorderBufferTXN *txn,
> ReorderBufferChange *specinsert)
> {
> /* Discard the changes that we just streamed */
> - ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn));
> + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), true);
>
> @@ -1924,7 +2000,7 @@ ReorderBufferStreamCommit(ReorderBuffer *rb,
> ReorderBufferTXN *txn)
> * full cleanup will happen as part of the COMMIT PREPAREDs, so now
> * just truncate txn by removing changes and tuplecids.
> */
> - ReorderBufferTruncateTXN(rb, txn, true);
> + ReorderBufferTruncateTXN(rb, txn, true, true);
>
> In both the above places, the patch unconditionally passes the
> 'txn_streaming' even for prepared transactions when it wouldn't be a
> streaming xact. Inside the function, the patch handled that by first
> checking whether the transaction is prepared (txn_prepared). So, the
> logic will work but the function signature and the way its callers are
> using make it difficult to use and extend in the future.
>

Valid concern.

> I think for the first case, we should get the streaming parameter in
> ReorderBufferResetTXN(),

I think we cannot pass 'rbtxn_is_streamed(txn)' to
ReorderBufferTruncateTXN() in the first case. ReorderBufferResetTXN()
is called to handle the concurrent abort of the streaming transaction
but the transaction might not have been marked as streamed at that
time. Since ReorderBufferTruncateTXN() is responsible for both
discarding changes and marking the transaction as streamed, we need to
unconditionally pass txn_streaming = true in this case.

> and for the second case
> ReorderBufferStreamCommit(), we should pass it as false because by
> that time transaction is already streamed and prepared. We are
> invoking it for cleanup.

Agreed.

> Even when we call ReorderBufferTruncateTXN()
> from ReorderBufferCheckAndTruncateAbortedTXN(), it will be better to
> write a comment at the caller about why we are passing this parameter
> as false.

Agreed.

On second thought, I think the confusion related to txn_streaming
came from the fact that ReorderBufferTruncateTXN() both discards
changes and marks the transaction as streamed. If we make
the function only discard changes, we don't need to introduce
the txn_streaming function argument. Instead, we can have a
separate function to mark the transaction as streamed and call it
before ReorderBufferTruncateTXN() where appropriate. And
ReorderBufferCheckAndTruncateAbortedTXN() just calls
ReorderBufferTruncateTXN().
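A minimal sketch of the proposed split, using hypothetical toy types rather than the real ReorderBufferTXN: one function only discards changes, a separate one decides whether to mark the transaction as streamed (top-level always, subtransactions only when they hold changes, following the rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0) rule quoted above), and each caller picks the combination it needs. Calling the marking function before truncation matters in this model because truncation resets the in-memory change count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit; not the real RBTXN_IS_STREAMED value. */
#define TOY_IS_STREAMED 0x01

typedef struct ToyTXN
{
    bool        is_toptxn;      /* top-level transaction? */
    int         nentries_mem;   /* changes currently held in memory */
    unsigned    flags;
} ToyTXN;

/* Only discards changes; knows nothing about streaming. */
static void
toy_truncate_txn(ToyTXN *txn)
{
    txn->nentries_mem = 0;
}

/*
 * Mark as streamed: always for the top-level txn, and for a subtransaction
 * only if it has changes, so the downstream never receives an abort for an
 * XID it has not seen.
 */
static void
toy_maybe_mark_streamed(ToyTXN *txn)
{
    if (txn->is_toptxn || txn->nentries_mem != 0)
        txn->flags |= TOY_IS_STREAMED;
}

/* Streaming path: mark first (while nentries_mem is still set), then discard. */
static void
toy_after_stream(ToyTXN *txn)
{
    toy_maybe_mark_streamed(txn);
    toy_truncate_txn(txn);
}

/* Aborted-transaction path: discard only, never mark as streamed. */
static void
toy_discard_aborted(ToyTXN *txn)
{
    toy_truncate_txn(txn);
}
```

The prepared-transaction cleanup path discussed above would likewise call only the discard step, so no boolean argument is needed to tell the cases apart.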

>
> >
> > >
> > > 4.
> > > + /*
> > > + * Remember if the transaction is already aborted so we can detect when
> > > + * the transaction is concurrently aborted during the replay.
> > > + */
> > > + already_aborted = rbtxn_is_aborted(txn);
> > > +
> > > ReorderBufferReplay(txn, rb, xid, txn->final_lsn, txn->end_lsn,
> > > txn->xact_time.prepare_time, txn->origin_id, txn->origin_lsn);
> > >
> > > @@ -2832,10 +2918,10 @@ ReorderBufferPrepare(ReorderBuffer *rb,
> > > TransactionId xid,
> > > * when rollback prepared is decoded and sent, the downstream should be
> > > * able to rollback such a xact. See comments atop DecodePrepare.
> > > *
> > > - * Note, for the concurrent_abort + streaming case a stream_prepare was
> > > + * Note, for the concurrent abort + streaming case a stream_prepare was
> > > * already sent within the ReorderBufferReplay call above.
> > > */
> > > - if (txn->concurrent_abort && !rbtxn_is_streamed(txn))
> > > + if (!already_aborted && rbtxn_is_aborted(txn) && !rbtxn_is_streamed(txn))
> > > rb->prepare(rb, txn, txn->final_lsn);
> > >
> > > It is not clear from the comments how the 'already_aborted' is
> > > handled. I think after this patch we would have already truncated all
> > > its changes. If so, why do we need to try to replay the changes of
> > > such a xact?
> >
> > I used ReorderBufferReplay() for convenience; it sends begin_prepare()
> > and prepare() appropriately, handles streaming-prepared transactions,
> > and updates statistics etc. But as you pointed out, it would not be
> > necessary to set up a historical snapshot etc. I agree that we don't
> > need to try replaying such aborted transactions but I'd like to
> > confirm we don't really need to execute invalidation messages even in
> > aborted transactions.
> >
>
> We need to execute invalidations if we have loaded any cache entries,
> for example in the case of streaming. See comments in the function
> ReorderBufferAbort(). However, I find both the current changes and the
> previous patch a bit difficult to follow. How about if we instead
> invent a flag like RBTXN_SENT_PREPARE or something like that and then
> use that flag to decide whether to send prepare in
> ReorderBufferPrepare(). Then add comments for the cases in which
> prepare will be sent from ReorderBufferPrepare().

The idea of using RBTXN_SENT_PREPARE sounds good to me. I'll use it.
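The flag-based approach could look roughly like the toy sketch below. It assumes a simplified replay that sends the prepare itself unless it stops early on a concurrent abort; the type, the flag value, and the function names are hypothetical and only illustrate the send-once logic, not PostgreSQL's actual ReorderBufferPrepare().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit standing in for RBTXN_SENT_PREPARE. */
#define TOY_SENT_PREPARE 0x01

typedef struct ToyTXN
{
    unsigned    flags;
    bool        concurrently_aborted;   /* simulate an abort found mid-replay */
    int         prepares_sent;          /* times the prepare callback ran */
} ToyTXN;

/* Stand-in for the output plugin's prepare/stream_prepare callback. */
static void
toy_send_prepare(ToyTXN *txn)
{
    txn->prepares_sent++;
    txn->flags |= TOY_SENT_PREPARE;
}

/* Replay sends the prepare itself unless it bails out on a concurrent abort. */
static void
toy_replay(ToyTXN *txn)
{
    if (txn->concurrently_aborted)
        return;                 /* stopped early; nothing sent */
    toy_send_prepare(txn);
}

static void
toy_prepare(ToyTXN *txn)
{
    toy_replay(txn);

    /* Send a prepare only if replay did not already do so. */
    if (!(txn->flags & TOY_SENT_PREPARE))
        toy_send_prepare(txn);

    /* Either way, exactly one prepare has now been sent. */
    assert(txn->flags & TOY_SENT_PREPARE);
}
```

The caller no longer needs to remember whether the transaction was aborted before or during replay; it only asks whether a prepare has gone out.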

>
> *
> + * Since we don't check the transaction status while replaying the
> + * transaction, we don't need to reset toast reconstruction data here.
> + */
> + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false);
> +
> + /* All changes should be discarded */
> + Assert(txn->size == 0);
>
> Can we expect the size to be zero without resetting the toast data? In
> ReorderBufferToastReset(), we call ReorderBufferReturnChange() which
> reduces the change size. So, won't that size still be accounted for in
> txn?

IIUC the toast reconstruction data is created only while replaying the
transaction, but ReorderBufferCheckAndTruncateAbortedTXN() is not
called during replay. So I think no toast data should have
accumulated at that point.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-12-20 05:36:47
Message-ID:	CAA4eK1LpmvKOQP8TG0v-apw1tQeqKsS4KDS8a8Ui37swinOM9w@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Dec 20, 2024 at 12:42 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Dec 19, 2024 at 2:56 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> >
> > @@ -2067,7 +2143,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb,
> > ReorderBufferTXN *txn,
> > ReorderBufferChange *specinsert)
> > {
> > /* Discard the changes that we just streamed */
> > - ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn));
> > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), true);
> >
> > @@ -1924,7 +2000,7 @@ ReorderBufferStreamCommit(ReorderBuffer *rb,
> > ReorderBufferTXN *txn)
> > * full cleanup will happen as part of the COMMIT PREPAREDs, so now
> > * just truncate txn by removing changes and tuplecids.
> > */
> > - ReorderBufferTruncateTXN(rb, txn, true);
> > + ReorderBufferTruncateTXN(rb, txn, true, true);
> >
> > In both the above places, the patch unconditionally passes the
> > 'txn_streaming' even for prepared transactions when it wouldn't be a
> > streaming xact. Inside the function, the patch handled that by first
> > checking whether the transaction is prepared (txn_prepared). So, the
> > logic will work but the function signature and the way its callers are
> > using make it difficult to use and extend in the future.
> >
>
> Valid concern.
>
> > I think for the first case, we should get the streaming parameter in
> > ReorderBufferResetTXN(),
>
> I think we cannot pass 'rbtxn_is_streamed(txn)' to
> ReorderBufferTruncateTXN() in the first case. ReorderBufferResetTXN()
> is called to handle the concurrent abort of the streaming transaction
> but the transaction might not have been marked as streamed at that
> time. Since ReorderBufferTruncateTXN() is responsible for both
> discarding changes and marking the transaction as streamed, we need to
> unconditionally pass txn_streaming = true in this case.
>

Can't we use 'stream_started' variable available at the call site of
ReorderBufferResetTXN() for our purpose?

>
> On second thoughts, I think the confusion related to txn_streaming
> came from the fact that ReorderBufferTruncateTXN() does both
> discarding changes and marking the transaction as streamed. If we make
> the function do just discarding changes, we don't need to introduce
> the txn_streaming function argument. Instead, we need to have a
> separate function to mark the transaction as streamed and call it
> before ReorderBufferTruncateTXN() where appropriate. And
> ReorderBufferCheckAndTruncateAbortedTXN() just calls
> ReorderBufferTruncateTXN().
>

That sounds good to me. IIRC, ReorderBufferTruncateTXN() was initially
used to truncate changes only for streaming transactions. Later,
it was extended to prepared xacts, and now to xacts that we explicitly
detect as aborted. So, I think it makes sense to improve
it by following your suggestion.

>
> >
> > *
> > + * Since we don't check the transaction status while replaying the
> > + * transaction, we don't need to reset toast reconstruction data here.
> > + */
> > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false);
> > +
> > + /* All changes should be discarded */
> > + Assert(txn->size == 0);
> >
> > Can we expect the size to be zero without resetting the toast data? In
> > ReorderBufferToastReset(), we call ReorderBufferReturnChange() which
> > reduces the change size. So, won't that size still be accounted for in
> > txn?
>
> IIUC the toast reconstruction data is created only while replaying the
> transaction but the ReorderBufferCheckAndTruncateAbortedTXN() is not
> called during that. So I think any toast data should not be
> accumulated at that time.
>

What about the case where, in the first pass, we partially streamed the
transaction and reconstructed some toast data, and then, in the second
pass, when memory becomes full, the reorder buffer contains partial
data, so it tries to spill the data and finds that the transaction is
aborted? I could be wrong here because I haven't tried to test this
code path, but it seems theoretically possible.

--
With Regards,
Amit Kapila.

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2024-12-20 07:25:33
Message-ID:	CAD21AoCK-R5tg9SaL2F-6dtdvj-6JMhmkvv1MmCwCOU48_gn6g@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Dec 19, 2024 at 9:36 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Dec 20, 2024 at 12:42 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Thu, Dec 19, 2024 at 2:56 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > >
> > > @@ -2067,7 +2143,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb,
> > > ReorderBufferTXN *txn,
> > > ReorderBufferChange *specinsert)
> > > {
> > > /* Discard the changes that we just streamed */
> > > - ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn));
> > > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), true);
> > >
> > > @@ -1924,7 +2000,7 @@ ReorderBufferStreamCommit(ReorderBuffer *rb,
> > > ReorderBufferTXN *txn)
> > > * full cleanup will happen as part of the COMMIT PREPAREDs, so now
> > > * just truncate txn by removing changes and tuplecids.
> > > */
> > > - ReorderBufferTruncateTXN(rb, txn, true);
> > > + ReorderBufferTruncateTXN(rb, txn, true, true);
> > >
> > > In both the above places, the patch unconditionally passes the
> > > 'txn_streaming' even for prepared transactions when it wouldn't be a
> > > streaming xact. Inside the function, the patch handled that by first
> > > checking whether the transaction is prepared (txn_prepared). So, the
> > > logic will work but the function signature and the way its callers are
> > > using make it difficult to use and extend in the future.
> > >
> >
> > Valid concern.
> >
> > > I think for the first case, we should get the streaming parameter in
> > > ReorderBufferResetTXN(),
> >
> > I think we cannot pass 'rbtxn_is_streamed(txn)' to
> > ReorderBufferTruncateTXN() in the first case. ReorderBufferResetTXN()
> > is called to handle the concurrent abort of the streaming transaction
> > but the transaction might not have been marked as streamed at that
> > time. Since ReorderBufferTruncateTXN() is responsible for both
> > discarding changes and marking the transaction as streamed, we need to
> > unconditionally pass txn_streaming = true in this case.
> >
>
> Can't we use 'stream_started' variable available at the call site of
> ReorderBufferResetTXN() for our purpose?

Right, we can use it.

>
> >
> > On second thoughts, I think the confusion related to txn_streaming
> > came from the fact that ReorderBufferTruncateTXN() does both
> > discarding changes and marking the transaction as streamed. If we make
> > the function do just discarding changes, we don't need to introduce
> > the txn_streaming function argument. Instead, we need to have a
> > separate function to mark the transaction as streamed and call it
> > before ReorderBufferTruncateTXN() where appropriate. And
> > ReorderBufferCheckAndTruncateAbortedTXN() just calls
> > ReorderBufferTruncateTXN().
> >
>
> That sounds good to me. IIRC, initially, ReorderBufferTruncateTXN()
> was used to truncate changes only for streaming transactions. Later,
> it evolved for prepared xacts and now for xacts where we explicitly
> detect whether they are aborted. So, I think it makes sense to improve
> it by following your suggestion.

I've changed the patch accordingly.

>
> >
> > >
> > > *
> > > + * Since we don't check the transaction status while replaying the
> > > + * transaction, we don't need to reset toast reconstruction data here.
> > > + */
> > > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn), false);
> > > +
> > > + /* All changes should be discarded */
> > > + Assert(txn->size == 0);
> > >
> > > Can we expect the size to be zero without resetting the toast data? In
> > > ReorderBufferToastReset(), we call ReorderBufferReturnChange() which
> > > reduces the change size. So, won't that size still be accounted for in
> > > txn?
> >
> > IIUC the toast reconstruction data is created only while replaying the
> > transaction but the ReorderBufferCheckAndTruncateAbortedTXN() is not
> > called during that. So I think any toast data should not be
> > accumulated at that time.
> >
>
> How about the case where in the first pass, we streamed the
> transaction partially, where it has reconstructed toast data, and
> then, in the second pass, when memory becomes full, the reorder buffer
> contains some partial data, due to which it tries to spill the data
> and finds that the transaction is aborted? I could be wrong here
> because I haven't tried to test this code path, but I see that it is
> theoretically possible.

Yeah, it seems possible. I've changed the patch to reset toast data as well.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v12-0001-Skip-logical-decoding-of-already-aborted-transac.patch	application/octet-stream	21.1 KB

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-07 01:52:12
Message-ID:	CAHut+PuOAG+MV-EsWtRFy-cvxEBJFsXNY1mzGSgk2U9Rw_cZ5g@mail.gmail.com
Lists:	pgsql-hackers

Hi Sawada-San.

Here are some review comments for the patch v12-0001.

======
.../replication/logical/reorderbuffer.c

ReorderBufferCheckAndTruncateAbortedTXN:

1.
+/*
+ * Check the transaction status by looking CLOG and discard all changes if
+ * the transaction is aborted. The transaction status is cached in
+ * txn->txn_flags so we can skip future changes and avoid CLOG lookups on the
+ * next call.
+ *
+ * Return true if the transaction is aborted, otherwise return false.
+ *
+ * When the 'debug_logical_replication_streaming' is set to "immediate", we
+ * don't check the transaction status, meaning the caller will always process
+ * this transaction.
+ */

Typo "by looking CLOG".

It should be something like "by CLOG lookup".

~~~

2.
+ /* Quick return if the transaction status is already known */
+ if (rbtxn_is_committed(txn))
+ return false;
+ if (rbtxn_is_aborted(txn))
+ {
+ /* Already-aborted transactions should not have any changes */
+ Assert(txn->size == 0);
+
+ return true;
+ }
+

Consider changing that 2nd 'if' to be 'else if', because then that
will make it more obvious that the earlier single line comment "Quick
return if...", in fact applies to both these conditions.

Alternatively, make that a block comment and add some blank lines like:

+ /*
+ * Quick returns if the transaction status is already known.
+ */
+
+ if (rbtxn_is_committed(txn))
+ return false;
+
+ if (rbtxn_is_aborted(txn))
+ {
+ /* Already-aborted transactions should not have any changes */
+ Assert(txn->size == 0);
+
+ return true;
+ }

~~~

3.
+ if (TransactionIdDidCommit(txn->xid))
+ {
+ /*
+ * Remember the transaction is committed so that we can skip CLOG
+ * check next time, avoiding the pressure on CLOG lookup.
+ */
+ Assert(!rbtxn_is_aborted(txn));
+ txn->txn_flags |= RBTXN_IS_COMMITTED;
+ return false;
+ }
+
+ /*
+ * The transaction aborted. We discard the changes we've collected so far
+ * and toast reconstruction data. The full cleanup will happen as part of
+ * decoding ABORT record of this transaction.
+ */
+ ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn));
+ ReorderBufferToastReset(rb, txn);
+
+ /* All changes should be discarded */
+ Assert(txn->size == 0);
+
+ /*
+ * Mark the transaction as aborted so we can ignore future changes of this
+ * transaction.
+ */
+ Assert(!rbtxn_is_committed(txn));
+ txn->txn_flags |= RBTXN_IS_ABORTED;
+
+ return true;
+}

3a.
That whole last part related to "The transaction aborted", might be
clearer if the whole chunk of code was in an 'else' block from the
previous "if (TransactionIdDidCommit(txn->xid))".

3b.
"toast" is an acronym so it should be written in uppercase IMO.

3c.
The "and toast reconstruction data" seems to be missing a word/s. (??)
- "... and also discard TOAST reconstruction data"
- "... and reset TOAST reconstruction data"

~~~

ReorderBufferMaybeMarkTXNStreamed:

4.
+static void
+ReorderBufferMaybeMarkTXNStreamed(ReorderBuffer *rb, ReorderBufferTXN *txn)
+{
+ /*
+ * The top-level transaction, is marked as streamed always, even if it
+ * does not contain any changes (that is, when all the changes are in
+ * subtransactions).
+ *
+ * For subtransactions, we only mark them as streamed when there are
+ * changes in them.
+ *
+ * We do it this way because of aborts - we don't want to send aborts for
+ * XIDs the downstream is not aware of. And of course, it always knows
+ * about the toplevel xact (we send the XID in all messages), but we never
+ * stream XIDs of empty subxacts.
+ */
+ if (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))
+ txn->txn_flags |= RBTXN_IS_STREAMED;
+}

/the toplevel xact/the top-level xact/

~~~

5.
/*
- * We send the prepare for the concurrently aborted xacts so that later
- * when rollback prepared is decoded and sent, the downstream should be
- * able to rollback such a xact. See comments atop DecodePrepare.
- *
- * Note, for the concurrent_abort + streaming case a stream_prepare was
- * already sent within the ReorderBufferReplay call above.
+ * Send a prepare if not yet. It happens if we detected the concurrent
+ * abort while replaying the non-streaming transaction.
*/

The first sentence "if not yet" seems incomplete/missing words.

SUGGESTION
Send a prepare if not already done so. This might occur if we had
detected a concurrent abort while replaying the non-streaming
transaction.

======
src/include/replication/reorderbuffer.h

6.
#define RBTXN_PREPARE 0x0040
#define RBTXN_SKIPPED_PREPARE 0x0080
#define RBTXN_HAS_STREAMABLE_CHANGE 0x0100
+#define RBTXN_SENT_PREPARE 0x0200
+#define RBTXN_IS_COMMITTED 0x0400
+#define RBTXN_IS_ABORTED 0x0800

Something about this new RBTXN_SENT_PREPARE name seems inconsistent to me.

I feel there is now also some introduced ambiguity with these macros:

/* Has this transaction been prepared? */
#define rbtxn_prepared(txn) \
( \
((txn)->txn_flags & RBTXN_PREPARE) != 0 \
)

+/* Has a prepare or stream_prepare already been sent? */
+#define rbtxn_sent_prepare(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_SENT_PREPARE) != 0 \
+)

e.g. It's also not clear from the comments what is the distinction
between the existing macro comment "Has this transaction been
prepared?" and the new macro comment "Has a prepare or stream_prepare
already been sent?".

Indeed, I was wondering if some of the places currently calling
"rbtxn_prepared(txn)" should now strictly be calling
"rbtxn_sent_prepare(txn)" macro instead?

IMO some minor renaming of the existing constants (and also their
associated macros) might help to make all this more coherent. For
example, perhaps like:

#define RBTXN_IS_PREPARE_NEEDED 0x0040
#define RBTXN_IS_PREPARE_SKIPPED 0x0080
#define RBTXN_IS_PREPARE_SENT 0x0200

======
Kind Regards,
Peter Smith.
Fujitsu Australia

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-13 11:06:54
Message-ID:	CAA4eK1L7L0t5NzJsw3q97koh6BjDOKBPmcY7-tcXb_iEph=Y+Q@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jan 7, 2025 at 7:22 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> ======
> src/include/replication/reorderbuffer.h
>
> 6.
> #define RBTXN_PREPARE 0x0040
> #define RBTXN_SKIPPED_PREPARE 0x0080
> #define RBTXN_HAS_STREAMABLE_CHANGE 0x0100
> +#define RBTXN_SENT_PREPARE 0x0200
> +#define RBTXN_IS_COMMITTED 0x0400
> +#define RBTXN_IS_ABORTED 0x0800
>
> Something about this new RBTXN_SENT_PREPARE name seems inconsistent to me.
>
> I feel there is now also some introduced ambiguity with these macros:
>
> /* Has this transaction been prepared? */
> #define rbtxn_prepared(txn) \
> ( \
> ((txn)->txn_flags & RBTXN_PREPARE) != 0 \
> )
>
> +/* Has a prepare or stream_prepare already been sent? */
> +#define rbtxn_sent_prepare(txn) \
> +( \
> + ((txn)->txn_flags & RBTXN_SENT_PREPARE) != 0 \
> +)
>
>
> e.g. It's also not clear from the comments what is the distinction
> between the existing macro comment "Has this transaction been
> prepared?" and the new macro comment "Has a prepare or stream_prepare
> already been sent?".
>
> Indeed, I was wondering if some of the places currently calling
> "rbtxn_prepared(txn)" should now strictly be calling
> "rbtxn_sent_prepare(txn)" macro instead?
>

Right, I think after this change we should try to rename the existing
constants. One place where we could consider using the new macro is
the current call to rbtxn_prepared() in
SnapBuildDistributeNewCatalogSnapshot().

> IMO some minor renaming of the existing constants (and also their
> associated macros) might help to make all this more coherent. For
> example, perhaps like:
>
> #define RBTXN_IS_PREPARE_NEEDED 0x0040
>

The other option could be RBTXN_IS_PREPARE_REQUESTED.

--
With Regards,
Amit Kapila.

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-13 18:54:52
Message-ID:	CAD21AoBgxqFVKq1yf+NR2dHBt47xtkFQ=JtxwcAv1PSjTahoPw@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jan 13, 2025 at 3:07 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Jan 7, 2025 at 7:22 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > ======
> > src/include/replication/reorderbuffer.h
> >
> > 6.
> > #define RBTXN_PREPARE 0x0040
> > #define RBTXN_SKIPPED_PREPARE 0x0080
> > #define RBTXN_HAS_STREAMABLE_CHANGE 0x0100
> > +#define RBTXN_SENT_PREPARE 0x0200
> > +#define RBTXN_IS_COMMITTED 0x0400
> > +#define RBTXN_IS_ABORTED 0x0800
> >
> > Something about this new RBTXN_SENT_PREPARE name seems inconsistent to me.
> >
> > I feel there is now also some introduced ambiguity with these macros:
> >
> > /* Has this transaction been prepared? */
> > #define rbtxn_prepared(txn) \
> > ( \
> > ((txn)->txn_flags & RBTXN_PREPARE) != 0 \
> > )
> >
> > +/* Has a prepare or stream_prepare already been sent? */
> > +#define rbtxn_sent_prepare(txn) \
> > +( \
> > + ((txn)->txn_flags & RBTXN_SENT_PREPARE) != 0 \
> > +)
> >
> >
> > e.g. It's also not clear from the comments what is the distinction
> > between the existing macro comment "Has this transaction been
> > prepared?" and the new macro comment "Has a prepare or stream_prepare
> > already been sent?".
> >
> > Indeed, I was wondering if some of the places currently calling
> > "rbtxn_prepared(txn)" should now strictly be calling
> > "rbtxn_sent_prepare(txn)" macro instead?
> >
>
> Right, I think after this change, it appears we should try to rename
> the existing constants. One place where we can consider to use new
> macro is the current usage of rbtxn_prepared() in
> SnapBuildDistributeNewCatalogSnapshot().

I think that RBTXN_PREPARE means that the transaction needs to be
prepared, but it doesn't mean that a prepare or a stream_prepare has
already been sent. RBTXN_SENT_PREPARE, on the other hand, records the
internal detail of whether a prepare or a stream_prepare has actually
been sent. IIUC RBTXN_SENT_PREPARE is used only briefly, within
ReorderBufferPrepare(), so code outside reorderbuffer.c, such as
snapbuild.c, doesn't need to care about RBTXN_SENT_PREPARE.

>
> > IMO some minor renaming of the existing constants (and also their
> > associated macros) might help to make all this more coherent. For
> > example, perhaps like:
> >
> > #define RBTXN_IS_PREPARE_NEEDED 0x0040
> >
>
> The other option could be RBTXN_IS_PREPARE_REQUESTED.

I'm a bit concerned that these names suggest a state in which the
transaction needs to be prepared but has not been prepared yet. But
rbtxn_prepared() is widely used to check whether the transaction is a
prepared transaction, regardless of whether a prepare or a
stream_prepare has actually been sent. How about RBTXN_IS_PREPARED_TXN
and rbtxn_is_prepared_txn()? I think these names clearly indicate that
the transaction needs to be processed as a prepared transaction.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-13 18:56:13
Message-ID:	CAD21AoB_s-7J000LjdEeMWGjhR=EOYwTMe9pUpEyBoNoiE9U0w@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jan 6, 2025 at 5:52 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Sawada-San.
>
> Here are some review comments for the patch v12-0001.

Thank you for reviewing the patch!

>
> ======
> .../replication/logical/reorderbuffer.c
>
> ReorderBufferCheckAndTruncateAbortedTXN:
>
> 1.
> +/*
> + * Check the transaction status by looking CLOG and discard all changes if
> + * the transaction is aborted. The transaction status is cached in
> + * txn->txn_flags so we can skip future changes and avoid CLOG lookups on the
> + * next call.
> + *
> + * Return true if the transaction is aborted, otherwise return false.
> + *
> + * When the 'debug_logical_replication_streaming' is set to "immediate", we
> + * don't check the transaction status, meaning the caller will always process
> + * this transaction.
> + */
>
> Typo "by looking CLOG".
>
> It should be something like "by CLOG lookup".

Fixed.

>
> ~~~
>
> 2.
> + /* Quick return if the transaction status is already known */
> + if (rbtxn_is_committed(txn))
> + return false;
> + if (rbtxn_is_aborted(txn))
> + {
> + /* Already-aborted transactions should not have any changes */
> + Assert(txn->size == 0);
> +
> + return true;
> + }
> +
>
> Consider changing that 2nd 'if' to be 'else if', because then that
> will make it more obvious that the earlier single line comment "Quick
> return if...", in fact applies to both these conditions.
>
> Alternatively, make that a block comment and add some blank lines like:
>
> + /*
> + * Quick returns if the transaction status is already known.
> + */
> +
> + if (rbtxn_is_committed(txn))
> + return false;
> +
> + if (rbtxn_is_aborted(txn))
> + {
> + /* Already-aborted transactions should not have any changes */
> + Assert(txn->size == 0);
> +
> + return true;
> + }

I used a block comment.

>
> ~~~
>
> 3.
> + if (TransactionIdDidCommit(txn->xid))
> + {
> + /*
> + * Remember the transaction is committed so that we can skip CLOG
> + * check next time, avoiding the pressure on CLOG lookup.
> + */
> + Assert(!rbtxn_is_aborted(txn));
> + txn->txn_flags |= RBTXN_IS_COMMITTED;
> + return false;
> + }
> +
> + /*
> + * The transaction aborted. We discard the changes we've collected so far
> + * and toast reconstruction data. The full cleanup will happen as part of
> + * decoding ABORT record of this transaction.
> + */
> + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn));
> + ReorderBufferToastReset(rb, txn);
> +
> + /* All changes should be discarded */
> + Assert(txn->size == 0);
> +
> + /*
> + * Mark the transaction as aborted so we can ignore future changes of this
> + * transaction.
> + */
> + Assert(!rbtxn_is_committed(txn));
> + txn->txn_flags |= RBTXN_IS_ABORTED;
> +
> + return true;
> +}
>
> 3a.
> That whole last part related to "The transaction aborted", might be
> clearer if the whole chunk of code was in an 'else' block from the
> previous "if (TransactionIdDidCommit(txn->xid))".

I'm not sure it improves readability. I think it makes sense to
return false inside the 'if (TransactionIdDidCommit(txn->xid))'
block. If we add an 'else' block, readers might be confused to see an
'else' even though the 'if' block already returns. We could add a
local variable for the result and return it at the end of the
function, but I'm not sure that would improve readability either.

>
> ~
>
> 3b.
> "toast" is an acronym so it should be written in uppercase IMO.
>
> ~

Hmm, it seems we don't spell it as uppercase "TOAST" anywhere, at
least in reorderbuffer.c. I would prefer to stay consistent with that.

>
> 3c.
> The "and toast reconstruction data" seems to be missing a word/s. (??)
> - "... and also discard TOAST reconstruction data"
> - "... and reset TOAST reconstruction data"

I don't understand this comment. What words are you suggesting adding
to these sentences?

>
> ~~~
>
> ReorderBufferMaybeMarkTXNStreamed:
>
> 4.
> +static void
> +ReorderBufferMaybeMarkTXNStreamed(ReorderBuffer *rb, ReorderBufferTXN *txn)
> +{
> + /*
> + * The top-level transaction, is marked as streamed always, even if it
> + * does not contain any changes (that is, when all the changes are in
> + * subtransactions).
> + *
> + * For subtransactions, we only mark them as streamed when there are
> + * changes in them.
> + *
> + * We do it this way because of aborts - we don't want to send aborts for
> + * XIDs the downstream is not aware of. And of course, it always knows
> + * about the toplevel xact (we send the XID in all messages), but we never
> + * stream XIDs of empty subxacts.
> + */
> + if (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))
> + txn->txn_flags |= RBTXN_IS_STREAMED;
> +}
>
> /the toplevel xact/the top-level xact/

Fixed.

>
> ~~~
>
> 5.
> /*
> - * We send the prepare for the concurrently aborted xacts so that later
> - * when rollback prepared is decoded and sent, the downstream should be
> - * able to rollback such a xact. See comments atop DecodePrepare.
> - *
> - * Note, for the concurrent_abort + streaming case a stream_prepare was
> - * already sent within the ReorderBufferReplay call above.
> + * Send a prepare if not yet. It happens if we detected the concurrent
> + * abort while replaying the non-streaming transaction.
> */
>
> The first sentence "if not yet" seems incomplete/missing words.
>
> SUGGESTION
> Send a prepare if we have not already done so. This might occur if we had
> detected a concurrent abort while replaying the non-streaming
> transaction.

Fixed.

>
> ======
> src/include/replication/reorderbuffer.h
>
> 6.
> #define RBTXN_PREPARE 0x0040
> #define RBTXN_SKIPPED_PREPARE 0x0080
> #define RBTXN_HAS_STREAMABLE_CHANGE 0x0100
> +#define RBTXN_SENT_PREPARE 0x0200
> +#define RBTXN_IS_COMMITTED 0x0400
> +#define RBTXN_IS_ABORTED 0x0800
>
> Something about this new RBTXN_SENT_PREPARE name seems inconsistent to me.
>
> I feel there is now also some introduced ambiguity with these macros:
>
> /* Has this transaction been prepared? */
> #define rbtxn_prepared(txn) \
> ( \
> ((txn)->txn_flags & RBTXN_PREPARE) != 0 \
> )
>
> +/* Has a prepare or stream_prepare already been sent? */
> +#define rbtxn_sent_prepare(txn) \
> +( \
> + ((txn)->txn_flags & RBTXN_SENT_PREPARE) != 0 \
> +)
>
>
> e.g. It's also not clear from the comments what is the distinction
> between the existing macro comment "Has this transaction been
> prepared?" and the new macro comment "Has a prepare or stream_prepare
> already been sent?".
>
> Indeed, I was wondering if some of the places currently calling
> "rbtxn_prepared(txn)" should now strictly be calling
> "rbtxn_sent_prepare(txn)" macro instead?
>
> IMO some minor renaming of the existing constants (and also their
> associated macros) might help to make all this more coherent. For
> example, perhaps like:
>
> #define RBTXN_IS_PREPARE_NEEDED 0x0040
> #define RBTXN_IS_PREPARE_SKIPPED 0x0080
> #define RBTXN_IS_PREPARE_SENT 0x0200
>

Fair point. I've clarified the comments for the macros. As for renaming
the existing constants and associated macros, I sent my thoughts in an
email[1] and implemented it in a separate patch (the 0002 patch).
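To illustrate the distinction the clarified macro comments draw, here is a sketch under stated assumptions. "Should be prepared instead of committed" and "a prepare has already been sent" are independent bits, so a caller can send a prepare only when one has not already gone out (compare item 5). The bit values are the ones quoted in item 6 for RBTXN_PREPARE and RBTXN_SENT_PREPARE; the struct, the helper functions and the send counter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors RBTXN_PREPARE / RBTXN_SENT_PREPARE; values quoted in item 6 */
#define TOY_PREPARE      0x0040  /* should be prepared instead of committed */
#define TOY_SENT_PREPARE 0x0200  /* a prepare or stream_prepare was sent */

typedef struct ToyPrepTxn
{
    uint32_t flags;
    int      prepares_sent;  /* hypothetical count of prepare messages */
} ToyPrepTxn;

static bool
toy_prepared(const ToyPrepTxn *txn)
{
    return (txn->flags & TOY_PREPARE) != 0;
}

static bool
toy_sent_prepare(const ToyPrepTxn *txn)
{
    return (txn->flags & TOY_SENT_PREPARE) != 0;
}

/* Hypothetical stand-in for emitting a prepare downstream */
static void
toy_send_prepare(ToyPrepTxn *txn)
{
    assert(toy_prepared(txn));
    txn->prepares_sent++;
    txn->flags |= TOY_SENT_PREPARE;
}

/* Send a prepare only if neither a prepare nor a stream_prepare went out */
static void
toy_finish_prepare(ToyPrepTxn *txn)
{
    if (!toy_sent_prepare(txn))
        toy_send_prepare(txn);
}
```

A transaction whose stream_prepare was already sent starts with both bits set, so finishing it sends nothing further.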

Regards,

[1] /message-id/CAD21AoBgxqFVKq1yf%2BNR2dHBt47xtkFQ%3DJtxwcAv1PSjTahoPw%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v13-0002-Rename-RBTXN_XXX-constants-for-better-consistenc.patch	application/octet-stream	6.9 KB
v13-0001-Skip-logical-decoding-of-already-aborted-transac.patch	application/octet-stream	21.6 KB

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-14 01:35:45
Message-ID:	CAHut+PvCtpBrWBkNuqgu7QB+4ZRxLeY7WnZvFmCYHByzsWhfmg@mail.gmail.com
Lists:	pgsql-hackers

Hi Sawada-San. Here are some cosmetic review comments for the patch v13-0001.

======
Commit message

1.
This commit introduces an additional CLOG lookup to check the
transaction status, so the logical decoding skips further change also
when it doesn't touch system catalogs if the transaction is already
aborted. This optimization enhances logical decoding performance,
especially for large transactions that have already been rolled back,
as it avoids unnecessary disk or network I/O.

That first sentence seems confusing. How about:

This commit adds a CLOG lookup to check the transaction status,
allowing logical decoding to skip changes for non-system catalogs if
the transaction is already aborted.

On Tue, Jan 14, 2025 at 5:56 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Jan 6, 2025 at 5:52 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > Hi Sawada-San.
> >
> > Here are some review comments for the patch v12-0001.
>
> Thank you for reviewing the patch!
>
> >
> > ======
> > .../replication/logical/reorderbuffer.c
> >
> > ReorderBufferCheckAndTruncateAbortedTXN:
> >
> > ~~~
> >
> > 3.
> > + if (TransactionIdDidCommit(txn->xid))
> > + {
> > + /*
> > + * Remember the transaction is committed so that we can skip CLOG
> > + * check next time, avoiding the pressure on CLOG lookup.
> > + */
> > + Assert(!rbtxn_is_aborted(txn));
> > + txn->txn_flags |= RBTXN_IS_COMMITTED;
> > + return false;
> > + }
> > +
> > + /*
> > + * The transaction aborted. We discard the changes we've collected so far
> > + * and toast reconstruction data. The full cleanup will happen as part of
> > + * decoding ABORT record of this transaction.
> > + */
> > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn));
> > + ReorderBufferToastReset(rb, txn);
> > +
> > + /* All changes should be discarded */
> > + Assert(txn->size == 0);
> > +
> > + /*
> > + * Mark the transaction as aborted so we can ignore future changes of this
> > + * transaction.
> > + */
> > + Assert(!rbtxn_is_committed(txn));
> > + txn->txn_flags |= RBTXN_IS_ABORTED;
> > +
> > + return true;
> > +}
> >
> > 3a.
> > That whole last part related to "The transaction aborted", might be
> > clearer if the whole chunk of code was in an 'else' block from the
> > previous "if (TransactionIdDidCommit(txn->xid))".
>
> I'm not sure it improves readability. I think it makes sense to
> return false inside the 'if (TransactionIdDidCommit(txn->xid))'
> block. If we add an 'else' block, readers might be confused to see an
> 'else' even though the 'if' block already returns. We could add a
> local variable for the result and return it at the end of the
> function, but I'm not sure that would improve readability either.
>

2.
I think adding a local variable is overkill but OTOH introducing
“else” clarifies that the following code can only be reached when the
transaction is aborted. E.g. You don’t even need to read the previous
code block and see the “return false” to know that. Anyway, it’s
probably just a personal preference.

> > 3c.
> > The "and toast reconstruction data" seems to be missing a word/s. (??)
> > - "... and also discard TOAST reconstruction data"
> > - "... and reset TOAST reconstruction data"
>
> I don't understand this comment. What words are you suggesting adding
> to these sentences?
>

3.
I meant something like:

BEFORE
We discard the changes we've collected so far and toast reconstruction data.

SUGGESTION
We discard both the changes collected so far and the TOAST reconstruction data.

======
src/include/replication/reorderbuffer.h

4.
-/* Has this transaction been prepared? */
+/*
+ * Is this transaction a prepared transaction?
+ *
+ * Being true means that this transaction should be prepared instead of
+ * committed. To check whether a prepare or a stream_prepare has already
+ * been sent for this transaction, we need to use rbtxn_sent_prepare().
+ */

/Is this transaction a prepared transaction?/Is this a prepared transaction?/

======
Kind Regards,
Peter Smith.
Fujitsu Australia

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-14 02:01:46
Message-ID:	CAHut+PuoAeCJgNxGp6FWvg+JG41O9zLxqQ7Tv+8+bKf2d_pTDw@mail.gmail.com
Lists:	pgsql-hackers

Hi Sawada-San.

Some review comments for patch v13-0002.

======

I think the v12 ambiguity of RBTXN_PREPARE versus RBTXN_SENT_PREPARE
was mostly addressed already by the improved comments for the macros
in patch 0001.

Meanwhile, patch v13-0002 says it is renaming constants for better
consistency, but I don't think it went far enough.

For example, better name consistency would be achieved by changing
*all* of the constants related to prepared transactions:

#define RBTXN_IS_PREPARED 0x0040
#define RBTXN_IS_PREPARED_SKIPPED 0x0080
#define RBTXN_IS_PREPARED_SENT 0x0200

where:

RBTXN_IS_PREPARED. This means it's a prepared transaction. (but we
can't tell from this if it is skipped or sent).

RBTXN_IS_PREPARED_SKIPPED. This means it's a prepared transaction
(RBTXN_IS_PREPARED) and it's being skipped.

RBTXN_IS_PREPARED_SENT. This means it's a prepared transaction
(RBTXN_IS_PREPARED) and we've sent it.

A note about RBTXN_IS_PREPARED. Since all of these constants are
clearly about transactions (e.g. "TXN" in prefix "RBTXN_"), I felt
patch 0002 calling this RBTXN_IS_PREPARED_TXN just seemed like adding
a redundant _TXN. e.g. we don't say RBTXN_IS_COMMITTED_TXN etc.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-14 04:47:52
Message-ID:	CAA4eK1KgNmBsG=155E7QQ6TX9RoWnM4z5Z20SvsbwxSe_QXYsg@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jan 14, 2025 at 7:32 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Sawada-San.
>
> Some review comments for patch v13-0002.
>
> ======
>
> I think the v12 ambiguity of RBTXN_PREPARE versus RBTXN_SENT_PREPARE
> was mostly addressed already by the improved comments for the macros
> in patch 0001.
>
> Meanwhile, patch v13-0002 says it is renaming constants for better
> consistency, but I don't think it went far enough.
>
> For example, better name consistency would be achieved by changing
> *all* of the constants related to prepared transactions:
>
> #define RBTXN_IS_PREPARED 0x0040
> #define RBTXN_IS_PREPARED_SKIPPED 0x0080
> #define RBTXN_IS_PREPARED_SENT 0x0200
>
> where:
>
> RBTXN_IS_PREPARED. This means it's a prepared transaction. (but we
> can't tell from this if it is skipped or sent).
>
> RBTXN_IS_PREPARED_SKIPPED. This means it's a prepared transaction
> (RBTXN_IS_PREPARED) and it's being skipped.
>
> RBTXN_IS_PREPARED_SENT. This means it's a prepared transaction
> (RBTXN_IS_PREPARED) and we've sent it.
>

The first one (RBTXN_IS_PREPARED) sounds like an improvement over what
we have now. I am not convinced about the other two.

> ~
>
> A note about RBTXN_IS_PREPARED. Since all of these constants are
> clearly about transactions (e.g. "TXN" in prefix "RBTXN_"), I felt
> patch 0002 calling this RBTXN_IS_PREPARED_TXN just seemed like adding
> a redundant _TXN. e.g. we don't say RBTXN_IS_COMMITTED_TXN etc.
>

+1. I felt the same.

--
With Regards,
Amit Kapila.

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-14 20:02:36
Message-ID:	CAD21AoAn8Q8NkkYSbd1JQT+0qPsrQcfXzJczDDbMUeUwCzKRag@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jan 13, 2025 at 5:36 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Sawada-San. Here are some cosmetic review comments for the patch v13-0001.

Thank you for reviewing the patch.

>
> ======
> Commit message
>
> 1.
> This commit introduces an additional CLOG lookup to check the
> transaction status, so the logical decoding skips further change also
> when it doesn't touch system catalogs if the transaction is already
> aborted. This optimization enhances logical decoding performance,
> especially for large transactions that have already been rolled back,
> as it avoids unnecessary disk or network I/O.
>
> ~
>
> That first sentence seems confusing. How about:
>
> This commit adds a CLOG lookup to check the transaction status,
> allowing logical decoding to skip changes for non-system catalogs if
> the transaction is already aborted.

I'm concerned that the proposed sentence doesn't explain the change
enough. I think what we need to mention in the commit message is that
we will have more opportunities to check for transaction aborts, in
addition to the existing check when a transaction touches system
catalogs while being replayed in streaming mode.

>
> On Tue, Jan 14, 2025 at 5:56 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Jan 6, 2025 at 5:52 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > > Hi Sawada-San.
> > >
> > > Here are some review comments for the patch v12-0001.
> >
> > Thank you for reviewing the patch!
> >
> > >
> > > ======
> > > .../replication/logical/reorderbuffer.c
> > >
> > > ReorderBufferCheckAndTruncateAbortedTXN:
> > >
> > > ~~~
> > >
> > > 3.
> > > + if (TransactionIdDidCommit(txn->xid))
> > > + {
> > > + /*
> > > + * Remember the transaction is committed so that we can skip CLOG
> > > + * check next time, avoiding the pressure on CLOG lookup.
> > > + */
> > > + Assert(!rbtxn_is_aborted(txn));
> > > + txn->txn_flags |= RBTXN_IS_COMMITTED;
> > > + return false;
> > > + }
> > > +
> > > + /*
> > > + * The transaction aborted. We discard the changes we've collected so far
> > > + * and toast reconstruction data. The full cleanup will happen as part of
> > > + * decoding ABORT record of this transaction.
> > > + */
> > > + ReorderBufferTruncateTXN(rb, txn, rbtxn_prepared(txn));
> > > + ReorderBufferToastReset(rb, txn);
> > > +
> > > + /* All changes should be discarded */
> > > + Assert(txn->size == 0);
> > > +
> > > + /*
> > > + * Mark the transaction as aborted so we can ignore future changes of this
> > > + * transaction.
> > > + */
> > > + Assert(!rbtxn_is_committed(txn));
> > > + txn->txn_flags |= RBTXN_IS_ABORTED;
> > > +
> > > + return true;
> > > +}
> > >
> > > 3a.
> > > That whole last part related to "The transaction aborted", might be
> > > clearer if the whole chunk of code was in an 'else' block from the
> > > previous "if (TransactionIdDidCommit(txn->xid))".
> >
> > I'm not sure it improves readability. I think it makes sense to
> > return false inside the 'if (TransactionIdDidCommit(txn->xid))'
> > block. If we add an 'else' block, readers might be confused to see an
> > 'else' even though the 'if' block already returns. We could add a
> > local variable for the result and return it at the end of the
> > function, but I'm not sure that would improve readability either.
> >
>
> 2.
> I think adding a local variable is overkill, but OTOH introducing
> “else” clarifies that the following code can only be reached when the
> transaction is aborted; e.g., you don’t even need to read the previous
> code block and see the “return false” to know that. Anyway, it’s
> probably just a personal preference.

I prefer to reduce blocks where possible.
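The two styles debated here differ only in presentation. A minimal sketch with hypothetical function names, and a counter standing in for the truncate-and-mark cleanup, shows that both return the same result and run the cleanup on exactly the same inputs. It deliberately ignores the in-progress case and every other detail of the real function.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how often the hypothetical abort cleanup runs */
static int toy_cleanups = 0;

static void
toy_abort_cleanup(void)
{
    toy_cleanups++;
}

/* Early-return style, as kept in the patch */
static bool
toy_aborted_early_return(bool did_commit)
{
    if (did_commit)
        return false;

    /* Reached only when the transaction did not commit */
    toy_abort_cleanup();
    return true;
}

/* 'else' style, as suggested in review */
static bool
toy_aborted_with_else(bool did_commit)
{
    if (did_commit)
        return false;
    else
    {
        toy_abort_cleanup();
        return true;
    }
}
```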

>
> > > 3c.
> > > The "and toast reconstruction data" seems to be missing a word/s. (??)
> > > - "... and also discard TOAST reconstruction data"
> > > - "... and reset TOAST reconstruction data"
> >
> > I don't understand this comment. What words are you suggesting adding
> > to these sentences?
> >
>
> 3.
> I meant something like:
>
> BEFORE
> We discard the changes we've collected so far and toast reconstruction data.
>
> SUGGESTION
> We discard both the changes collected so far and the TOAST reconstruction data.
>

Thanks, fixed.

> ======
> src/include/replication/reorderbuffer.h
>
> 4.
> -/* Has this transaction been prepared? */
> +/*
> + * Is this transaction a prepared transaction?
> + *
> + * Being true means that this transaction should be prepared instead of
> + * committed. To check whether a prepare or a stream_prepare has already
> + * been sent for this transaction, we need to use rbtxn_sent_prepare().
> + */
>
> /Is this transaction a prepared transaction?/Is this a prepared transaction?/
>

Fixed.

I've attached the updated patch (only 0001 patch). I'll submit the
updated patch for 0002 patch once we get consensus on names.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v14-0001-Skip-logical-decoding-of-already-aborted-transac.patch	application/octet-stream	21.7 KB

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-14 21:40:44
Message-ID:	CAD21AoDgggHQOqno8_+66wdss1uZc7Zoif1O0Ve_iLEyF5-LmQ@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jan 13, 2025 at 8:48 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Jan 14, 2025 at 7:32 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > Hi Sawada-San.
> >
> > Some review comments for patch v13-0002.
> >
> > ======
> >
> > I think the v12 ambiguity of RBTXN_PREPARE versus RBTXN_SENT_PREPARE
> > was mostly addressed already by the improved comments for the macros
> > in patch 0001.
> >
> > Meanwhile, patch v13-0002 says it is renaming constants for better
> > consistency, but I don't think it went far enough.
> >
> > For example, better name consistency would be achieved by changing
> > *all* of the constants related to prepared transactions:
> >
> > #define RBTXN_IS_PREPARED 0x0040
> > #define RBTXN_IS_PREPARED_SKIPPED 0x0080
> > #define RBTXN_IS_PREPARED_SENT 0x0200
> >
> > where:
> >
> > RBTXN_IS_PREPARED. This means it's a prepared transaction. (but we
> > can't tell from this if it is skipped or sent).
> >
> > RBTXN_IS_PREPARED_SKIPPED. This means it's a prepared transaction
> > (RBTXN_IS_PREPARED) and it's being skipped.
> >
> > RBTXN_IS_PREPARED_SENT. This means it's a prepared transaction
> > (RBTXN_IS_PREPARED) and we've sent it.
> >
>
> The first one (RBTXN_IS_PREPARED) sounds like an improvement over what
> we have now. I am not convinced about the other two.

I agree with the above usage; it's more consistent to set
RBTXN_IS_PREPARED also for a skipped prepared transaction. But I'm not
sure it's better to have the RBTXN_IS_PREPARED prefix for all
constants.

> > ~
> >
> > A note about RBTXN_IS_PREPARED. Since all of these constants are
> > clearly about transactions (e.g. "TXN" in prefix "RBTXN_"), I felt
> > patch 0002 calling this RBTXN_IS_PREPARED_TXN just seemed like adding
> > a redundant _TXN. e.g. we don't say RBTXN_IS_COMMITTED_TXN etc.
> >
>
> +1. I felt the same.

I followed RBTXN_IS_SUBXACT (I think TXN and XACT have the same
meaning) but that's a fair point.

It seems we agreed on RBTXN_IS_PREPARED and rbtxn_is_prepared().
Adding 'IS' seems to clarify that a transaction with this flag *is* a
prepared transaction. The other two constants, RBTXN_SENT_PREPARE and
RBTXN_SKIPPED_PREPARE, seem fine to me. I find that the proposed
names don't increase the consistency much. Thoughts?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-15 06:49:31
Message-ID:	CAA4eK1K53eu7wdhHANPkVzidrUPVVJTbqHHsNz5sUhzRBYZfYw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jan 15, 2025 at 3:11 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> It seems we agreed on RBTXN_IS_PREPARED and rbtxn_is_prepared().
> Adding 'IS' seems to clarify that a transaction with this flag *is* a
> prepared transaction. The other two constants, RBTXN_SENT_PREPARE and
> RBTXN_SKIPPED_PREPARE, seem fine to me.
>

Agreed.

> I find that the proposed
> names don't increase the consistency much. Thoughts?
>

I also think so.

--
With Regards,
Amit Kapila.

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-16 00:43:28
Message-ID:	CAHut+PumfXqbwZuLaX_JK28KnVM_twGQEnMJp-Sa=Cdn0QJe6w@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jan 15, 2025 at 5:49 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Jan 15, 2025 at 3:11 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > It seems we agreed on RBTXN_IS_PREPARED and rbtxn_is_prepared().
> > Adding 'IS' seems to clarify that a transaction with this flag *is* a
> > prepared transaction. The other two constants, RBTXN_SENT_PREPARE and
> > RBTXN_SKIPPED_PREPARE, seem fine to me.
> >
>
> Agreed.
>
> > I find that the proposed
> > names don't increase the consistency much. Thoughts?
> >
>
> I also think so.
>

My thoughts are that any consistency improvement is a step in the
right direction so even "don't increase the consistency much" is still
better than nothing.

But if I am outvoted that's OK. It is not a big deal.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-17 17:48:44
Message-ID:	CAD21AoBGkki5ktmxO5yEEWO9ckHVbTRYiwRgc5fayFZ1PfoNBA@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jan 15, 2025 at 4:43 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Wed, Jan 15, 2025 at 5:49 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 15, 2025 at 3:11 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > It seems we agreed on RBTXN_IS_PREPARED and rbtxn_is_prepared().
> > > Adding 'IS' seems to clarify that a transaction with this flag *is* a
> > > prepared transaction. The other two constants, RBTXN_SENT_PREPARE and
> > > RBTXN_SKIPPED_PREPARE, seem fine to me.
> > >
> >
> > Agreed.
> >
> > > I find that the proposed
> > > names don't increase the consistency much. Thoughts?
> > >
> >
> > I also think so.
> >
>
> My thoughts are that any consistency improvement is a step in the
> right direction so even "don't increase the consistency much" is still
> better than nothing.

I agree that doing something is better than nothing. The proposed
idea, using the RBTXN_IS_PREPARED prefix for all related flags,
improves naming consistency, but I'm not sure this is the right
direction. For example, RBTXN_IS_PREPARED_SKIPPED is quite confusing
to me. I think this name implies "this is a prepared transaction but
is skipped", but I don't think it conveys the meaning well. In
addition, if we also set the RBTXN_IS_PREPARED flag for skipped
prepared transactions, we would end up doing something like:

txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_IS_PREPARED_SKIPPED);

That seems quite redundant. It makes more sense to me to do:

txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE);

I'd like to avoid a situation where we rename these constants just
for naming consistency and later rename them again and again for
other reasons.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-20 03:52:48
Message-ID:	CAA4eK1KDg-uwZMQ2_dj1JRBM=j_xCCuC6jN1JtedK8R2cYF3FA@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Jan 17, 2025 at 11:19 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Jan 15, 2025 at 4:43 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > My thoughts are that any consistency improvement is a step in the
> > right direction so even "don't increase the consistency much" is still
> > better than nothing.
>
> I agree that doing something is better than nothing. The proposed
> idea, using the RBTXN_IS_PREPARED prefix for all related flags,
> improves naming consistency, but I'm not sure this is the right
> direction. For example, RBTXN_IS_PREPARED_SKIPPED is quite confusing
> to me. I think this name implies "this is a prepared transaction but
> is skipped", but I don't think it conveys the meaning well. In
> addition, if we also set the RBTXN_IS_PREPARED flag for skipped
> prepared transactions, we would end up doing something like:
>
> txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_IS_PREPARED_SKIPPED);
>
> That seems quite redundant. It makes more sense to me to do:
>
> txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE);
>
> I'd like to avoid a situation where we rename these constants just
> for naming consistency and later rename them again and again for
> other reasons.
>

Sounds reasonable. We agree on just changing RBTXN_PREPARE to
RBTXN_IS_PREPARED and its corresponding macro. The next step is to
update the patch to reflect the same.
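A small sketch of the outcome agreed here, assuming the v13 bit values quoted earlier in the thread: a skipped prepared transaction carries both RBTXN_IS_PREPARED and RBTXN_SKIPPED_PREPARE, and each bit answers a separate question. Only the constant names and rbtxn_is_prepared() come from the discussion; the struct and the `toy_*` helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constant names from the discussion; bit values as quoted for v13 */
#define RBTXN_IS_PREPARED     0x0040
#define RBTXN_SKIPPED_PREPARE 0x0080
#define RBTXN_SENT_PREPARE    0x0200

/* Hypothetical minimal transaction carrying only the flag word */
typedef struct ToyFlagsTxn
{
    uint32_t txn_flags;
} ToyFlagsTxn;

/* Is this a prepared transaction (whether skipped, sent, or neither)? */
#define rbtxn_is_prepared(txn) \
    (((txn)->txn_flags & RBTXN_IS_PREPARED) != 0)

/* Hypothetical helpers for the other two questions */
static bool
toy_prepare_skipped(const ToyFlagsTxn *txn)
{
    return (txn->txn_flags & RBTXN_SKIPPED_PREPARE) != 0;
}

static bool
toy_prepare_sent(const ToyFlagsTxn *txn)
{
    return (txn->txn_flags & RBTXN_SENT_PREPARE) != 0;
}

/* A prepared transaction whose prepare is being skipped gets both bits */
static void
toy_mark_skipped_prepare(ToyFlagsTxn *txn)
{
    txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE);
}
```

With this naming, rbtxn_is_prepared() stays true for skipped prepared transactions, and the skipped/sent state is read from its own bit rather than being folded into the name.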

--
With Regards,
Amit Kapila.

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-21 18:36:11
Message-ID:	CAD21AoC2yf2i9PNfkGt2mdbjTvPcVta0-1cC5G7kuq=y5UjqQQ@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Jan 19, 2025 at 7:53 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Jan 17, 2025 at 11:19 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 15, 2025 at 4:43 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > > My thoughts are that any consistency improvement is a step in the
> > > right direction so even "don't increase the consistency much" is still
> > > better than nothing.
> >
> > I agree that doing something is better than nothing. The proposed
> > idea, using the RBTXN_IS_PREPARED prefix for all related flags,
> > improves naming consistency, but I'm not sure this is the right
> > direction. For example, RBTXN_IS_PREPARED_SKIPPED is quite confusing
> > to me. I think this name implies "this is a prepared transaction but
> > is skipped", but I don't think it conveys the meaning well. In
> > addition, if we also set the RBTXN_IS_PREPARED flag for skipped
> > prepared transactions, we would end up doing something like:
> >
> > txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_IS_PREPARED_SKIPPED);
> >
> > That seems quite redundant. It makes more sense to me to do:
> >
> > txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE);
> >
> > I'd like to avoid a situation where we rename these constants just
> > for naming consistency and later rename them again and again for
> > other reasons.
> >
>
> Sounds reasonable. We agree on just changing RBTXN_PREPARE to
> RBTXN_IS_PREPARED and its corresponding macro. The next step is to
> update the patch to reflect the same.

Right. I've attached the updated patches.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v15-0001-Skip-logical-decoding-of-already-aborted-transac.patch	application/octet-stream	21.7 KB
v15-0002-Rename-RBTXN_PREPARE-to-RBTXN_IS_PREPARE-for-bet.patch	application/octet-stream	7.8 KB

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-22 03:51:26
Message-ID:	CAHut+Pt4FniL1Lpve-jLGcFBExFdp8+eZBZH9x=ZPCpTQnc4Hg@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jan 22, 2025 at 5:36 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Sun, Jan 19, 2025 at 7:53 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Jan 17, 2025 at 11:19 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Jan 15, 2025 at 4:43 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > >
> > > > My thoughts are that any consistency improvement is a step in the
> > > > right direction so even "don't increase the consistency much" is still
> > > > better than nothing.
> > >
> > > I agree that doing something is better than nothing. The proposed
> > > idea, having an RBTXN_IS_PREPARED prefix for all related flags, improves
> > > the consistency in terms of names, but I'm not sure this is the right
> > > direction. For example, RBTXN_IS_PREPARED_SKIPPED is quite confusing
> > > to me. I think this name implies "this is a prepared transaction but
> > > is skipped", but I don't think it conveys the meaning well. In
> > > addition to that, if we also add the RBTXN_IS_PREPARED flag for skipped
> > > prepared transactions, we would end up doing something like:
> > >
> > > txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_IS_PREPARED_SKIPPED);
> > >
> > > which seems quite redundant. It makes more sense to me to do:
> > >
> > > txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE);
> > >
> > > I'd like to avoid a situation where we rename these flags just for
> > > better naming consistency and later rename them to better names for
> > > other reasons, again and again.
> > >
> >
> > Sounds reasonable. We agree on just changing RBTXN_PREPARE to
> > RBTXN_IS_PREPARED and its corresponding macro. The next step is to
> > update the patch to reflect the same.
>
> Right. I've attached the updated patches.
>

Some review comments for v15-0002.

======
Commit message

typo /RBTXN_IS_PREAPRE/RBTXN_IS_PREPARE/

======

I'm not trying to be pedantic, but there seems to be something strange
about the combination usage of these PREPARE constants, which raises
lots of questions for me...

For example:
I had thought RBTXN_SKIPPED_PREPARE meant it is a prepared tx AND it is skipped
I had thought RBTXN_SENT_PREPARE meant it is a prepared tx AND it is sent

So I was surprised that the patch makes this change:
- txn->txn_flags |= RBTXN_SKIPPED_PREPARE;
+ txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE);

because, if we cannot infer that RBTXN_SKIPPED_PREPARE *must* mean it
is a prepared transaction then why does that constant even have
"PREPARE" in its name at all instead of just being called
RBTXN_SKIPPED?

e.g., either of these makes sense to me:
txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_SKIPPED);
txn->txn_flags |= RBTXN_SKIPPED_PREPARE;

But this combination seemed odd:
txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE);

Also, this code (below) seems to be treating those macros as
unrelated, but IIUC we know that rbtxn_skip_prepared(txn) is not
possible unless rbtxn_is_prepared(txn) is true.

- if (rbtxn_prepared(txn) || rbtxn_skip_prepared(txn))
+ if (rbtxn_is_prepared(txn) || rbtxn_skip_prepared(txn))
continue;

Furthermore, if we cannot infer that RBTXN_SKIPPED_PREPARE *must* also
be a prepared transaction, then why aren't the macros changed to match
that interpretation?

e.g.

/* prepare for this transaction skipped? */
#define rbtxn_skip_prepared(txn) \
( \
(((txn)->txn_flags & RBTXN_IS_PREPARED) != 0) && \
(((txn)->txn_flags & RBTXN_SKIPPED_PREPARE) != 0) \
)

/* Has a prepare or stream_prepare already been sent? */
#define rbtxn_sent_prepare(txn) \
( \
(((txn)->txn_flags & RBTXN_IS_PREPARED) != 0) && \
(((txn)->txn_flags & RBTXN_SENT_PREPARE) != 0) \
)

~~~

I think a way to fix all this might be to enforce that the
RBTXN_SKIPPED_PREPARE and RBTXN_SENT_PREPARE constants also include
the RBTXN_IS_PREPARED bitflag, removing the ambiguity about how
exactly to interpret those two constants.

e.g. something like

#define RBTXN_IS_PREPARED 0x0040
#define RBTXN_SKIPPED_PREPARE (0x0080 | RBTXN_IS_PREPARED)
#define RBTXN_SENT_PREPARE (0x0200 | RBTXN_IS_PREPARED)

and make the appropriate macro changes

e.g.

/* prepare for this transaction skipped? */
#define rbtxn_skip_prepared(txn) \
( \
(((txn)->txn_flags & RBTXN_SKIPPED_PREPARE) == RBTXN_SKIPPED_PREPARE) \
)

/* Has a prepare or stream_prepare already been sent? */
#define rbtxn_sent_prepare(txn) \
( \
(((txn)->txn_flags & RBTXN_SENT_PREPARE) == RBTXN_SENT_PREPARE) \
)

Thoughts?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-23 03:17:14
Message-ID:	CAA4eK1+Q0UviNCSo2JMGwTSvGgBwhO2VHwK1989GuQChA3cJag@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jan 22, 2025 at 9:21 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Wed, Jan 22, 2025 at 5:36 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Sun, Jan 19, 2025 at 7:53 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Jan 17, 2025 at 11:19 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Wed, Jan 15, 2025 at 4:43 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > > >
> > > > > My thoughts are that any consistency improvement is a step in the
> > > > > right direction so even "don't increase the consistency much" is still
> > > > > better than nothing.
> > > >
> > > > I agree that doing something is better than nothing. The proposed
> > > > idea, having an RBTXN_IS_PREPARED prefix for all related flags, improves
> > > > the consistency in terms of names, but I'm not sure this is the right
> > > > direction. For example, RBTXN_IS_PREPARED_SKIPPED is quite confusing
> > > > to me. I think this name implies "this is a prepared transaction but
> > > > is skipped", but I don't think it conveys the meaning well. In
> > > > addition to that, if we also add the RBTXN_IS_PREPARED flag for skipped
> > > > prepared transactions, we would end up doing something like:
> > > >
> > > > txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_IS_PREPARED_SKIPPED);
> > > >
> > > > which seems quite redundant. It makes more sense to me to do:
> > > >
> > > > txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE);
> > > >
> > > > I'd like to avoid a situation where we rename these flags just for
> > > > better naming consistency and later rename them to better names for
> > > > other reasons, again and again.
> > > >
> > >
> > > Sounds reasonable. We agree on just changing RBTXN_PREPARE to
> > > RBTXN_IS_PREPARED and its corresponding macro. The next step is to
> > > update the patch to reflect the same.
> >
> > Right. I've attached the updated patches.
> >
>
> Some review comments for v15-0002.
>
> ======
> Commit message
>
> typo /RBTXN_IS_PREAPRE/RBTXN_IS_PREPARE/
>
> ======
>
> I'm not trying to be pedantic, but there seems to be something strange
> about the combination usage of these PREPARE constants, which raises
> lots of questions for me...
>
> For example:
> I had thought RBTXN_SKIPPED_PREPARE meant it is a prepared tx AND it is skipped
> I had thought RBTXN_SENT_PREPARE meant it is a prepared tx AND it is sent
>
> So I was surprised that the patch makes this change:
> - txn->txn_flags |= RBTXN_SKIPPED_PREPARE;
> + txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE);
>
> because, if we cannot infer that RBTXN_SKIPPED_PREPARE *must* mean it
> is a prepared transaction then why does that constant even have
> "PREPARE" in its name at all instead of just being called
> RBTXN_SKIPPED?
>
> e.g., either of these makes sense to me:
> txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_SKIPPED);
> txn->txn_flags |= RBTXN_SKIPPED_PREPARE;
>
> But this combination seemed odd:
> txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE);
>
> Also, this code (below) seems to be treating those macros as
> unrelated, but IIUC we know that rbtxn_skip_prepared(txn) is not
> possible unless rbtxn_is_prepared(txn) is true.
>
> - if (rbtxn_prepared(txn) || rbtxn_skip_prepared(txn))
> + if (rbtxn_is_prepared(txn) || rbtxn_skip_prepared(txn))
> continue;
>
> ~~
>
> Furthermore, if we cannot infer that RBTXN_SKIPPED_PREPARE *must* also
> be a prepared transaction, then why aren't the macros changed to match
> that interpretation?
>
> e.g.
>
> /* prepare for this transaction skipped? */
> #define rbtxn_skip_prepared(txn) \
> ( \
> (((txn)->txn_flags & RBTXN_IS_PREPARED) != 0) && \
> (((txn)->txn_flags & RBTXN_SKIPPED_PREPARE) != 0) \
> )
>
> /* Has a prepare or stream_prepare already been sent? */
> #define rbtxn_sent_prepare(txn) \
> ( \
> (((txn)->txn_flags & RBTXN_IS_PREPARED) != 0) && \
> (((txn)->txn_flags & RBTXN_SENT_PREPARE) != 0) \
> )
>
> ~~~
>
> I think a way to fix all this might be to enforce that the
> RBTXN_SKIPPED_PREPARE and RBTXN_SENT_PREPARE constants also include
> the RBTXN_IS_PREPARED bitflag, removing the ambiguity about how
> exactly to interpret those two constants.
>
> e.g. something like
>
> #define RBTXN_IS_PREPARED 0x0040
> #define RBTXN_SKIPPED_PREPARE (0x0080 | RBTXN_IS_PREPARED)
> #define RBTXN_SENT_PREPARE (0x0200 | RBTXN_IS_PREPARED)
>

I think the better way would be to ensure that, wherever we set
RBTXN_SENT_PREPARE or RBTXN_SKIPPED_PREPARE, the transaction is a
prepared one (RBTXN_IS_PREPARED must already be set). This should
already be the case for RBTXN_SENT_PREPARE, and we can ensure the same
for RBTXN_SKIPPED_PREPARE as well.

Will that address your concern? Does anyone else have an opinion on this matter?

--
With Regards,
Amit Kapila.

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-23 03:35:06
Message-ID:	CAHut+PvPs15FrdFqceOgdMpqgS2HG1yjs_LSdBLXSXx8HmSNUw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jan 23, 2025 at 2:17 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Jan 22, 2025 at 9:21 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 22, 2025 at 5:36 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Sun, Jan 19, 2025 at 7:53 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > On Fri, Jan 17, 2025 at 11:19 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Wed, Jan 15, 2025 at 4:43 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > My thoughts are that any consistency improvement is a step in the
> > > > > > right direction so even "don't increase the consistency much" is still
> > > > > > better than nothing.
> > > > >
> > > > > I agree that doing something is better than nothing. The proposed
> > > > > idea, having an RBTXN_IS_PREPARED prefix for all related flags, improves
> > > > > the consistency in terms of names, but I'm not sure this is the right
> > > > > direction. For example, RBTXN_IS_PREPARED_SKIPPED is quite confusing
> > > > > to me. I think this name implies "this is a prepared transaction but
> > > > > is skipped", but I don't think it conveys the meaning well. In
> > > > > addition to that, if we also add the RBTXN_IS_PREPARED flag for skipped
> > > > > prepared transactions, we would end up doing something like:
> > > > >
> > > > > txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_IS_PREPARED_SKIPPED);
> > > > >
> > > > > which seems quite redundant. It makes more sense to me to do:
> > > > >
> > > > > txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE);
> > > > >
> > > > > I'd like to avoid a situation where we rename these flags just for
> > > > > better naming consistency and later rename them to better names for
> > > > > other reasons, again and again.
> > > > >
> > > >
> > > > Sounds reasonable. We agree on just changing RBTXN_PREPARE to
> > > > RBTXN_IS_PREPARED and its corresponding macro. The next step is to
> > > > update the patch to reflect the same.
> > >
> > > Right. I've attached the updated patches.
> > >
> >
> > Some review comments for v15-0002.
> >
> > ======
> > Commit message
> >
> > typo /RBTXN_IS_PREAPRE/RBTXN_IS_PREPARE/
> >
> > ======
> >
> > I'm not trying to be pedantic, but there seems to be something strange
> > about the combination usage of these PREPARE constants, which raises
> > lots of questions for me...
> >
> > For example:
> > I had thought RBTXN_SKIPPED_PREPARE meant it is a prepared tx AND it is skipped
> > I had thought RBTXN_SENT_PREPARE meant it is a prepared tx AND it is sent
> >
> > So I was surprised that the patch makes this change:
> > - txn->txn_flags |= RBTXN_SKIPPED_PREPARE;
> > + txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE);
> >
> > because, if we cannot infer that RBTXN_SKIPPED_PREPARE *must* mean it
> > is a prepared transaction then why does that constant even have
> > "PREPARE" in its name at all instead of just being called
> > RBTXN_SKIPPED?
> >
> > e.g., either of these makes sense to me:
> > txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_SKIPPED);
> > txn->txn_flags |= RBTXN_SKIPPED_PREPARE;
> >
> > But this combination seemed odd:
> > txn->txn_flags |= (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE);
> >
> > Also, this code (below) seems to be treating those macros as
> > unrelated, but IIUC we know that rbtxn_skip_prepared(txn) is not
> > possible unless rbtxn_is_prepared(txn) is true.
> >
> > - if (rbtxn_prepared(txn) || rbtxn_skip_prepared(txn))
> > + if (rbtxn_is_prepared(txn) || rbtxn_skip_prepared(txn))
> > continue;
> >
> > ~~
> >
> > Furthermore, if we cannot infer that RBTXN_SKIPPED_PREPARE *must* also
> > be a prepared transaction, then why aren't the macros changed to match
> > that interpretation?
> >
> > e.g.
> >
> > /* prepare for this transaction skipped? */
> > #define rbtxn_skip_prepared(txn) \
> > ( \
> > (((txn)->txn_flags & RBTXN_IS_PREPARED) != 0) && \
> > (((txn)->txn_flags & RBTXN_SKIPPED_PREPARE) != 0) \
> > )
> >
> > /* Has a prepare or stream_prepare already been sent? */
> > #define rbtxn_sent_prepare(txn) \
> > ( \
> > (((txn)->txn_flags & RBTXN_IS_PREPARED) != 0) && \
> > (((txn)->txn_flags & RBTXN_SENT_PREPARE) != 0) \
> > )
> >
> > ~~~
> >
> > I think a way to fix all this might be to enforce that the
> > RBTXN_SKIPPED_PREPARE and RBTXN_SENT_PREPARE constants also include
> > the RBTXN_IS_PREPARED bitflag, removing the ambiguity about how
> > exactly to interpret those two constants.
> >
> > e.g. something like
> >
> > #define RBTXN_IS_PREPARED 0x0040
> > #define RBTXN_SKIPPED_PREPARE (0x0080 | RBTXN_IS_PREPARED)
> > #define RBTXN_SENT_PREPARE (0x0200 | RBTXN_IS_PREPARED)
> >
>
> I think the better way would be to ensure that, wherever we set
> RBTXN_SENT_PREPARE or RBTXN_SKIPPED_PREPARE, the transaction is a
> prepared one (RBTXN_IS_PREPARED must already be set). This should
> already be the case for RBTXN_SENT_PREPARE, and we can ensure the same
> for RBTXN_SKIPPED_PREPARE as well.
>
> Will that address your concern? Does anyone else have an opinion on this matter?

Yes, that would be OK, but we should also add some clarifying comments
in "reorderbuffer.h", like:

#define RBTXN_SKIPPED_PREPARE 0x0080 /* this flag can only be set
for RBTXN_IS_PREPARED transactions */
#define RBTXN_SENT_PREPARE 0x0200 /* this flag can only be set for
RBTXN_IS_PREPARED transactions */

======
Kind Regards,
Peter Smith.
Fujitsu Australia

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-23 19:07:41
Message-ID:	CAD21AoCCkb9Hu41nz_atRcxwt-Mea6WJ6erNVwiDxXNnp_ROxw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jan 22, 2025 at 7:35 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Thu, Jan 23, 2025 at 2:17 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 22, 2025 at 9:21 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > >
> > > ======
> > > Commit message
> > >
> > > typo /RBTXN_IS_PREAPRE/RBTXN_IS_PREPARE/
> > >

Will fix.

> > >
> > > Also, this code (below) seems to be treating those macros as
> > > unrelated, but IIUC we know that rbtxn_skip_prepared(txn) is not
> > > possible unless rbtxn_is_prepared(txn) is true.
> > >
> > > - if (rbtxn_prepared(txn) || rbtxn_skip_prepared(txn))
> > > + if (rbtxn_is_prepared(txn) || rbtxn_skip_prepared(txn))
> > > continue;

Right. We no longer need to check rbtxn_skip_prepared() here.

> > >
> > > ~~
> > >
> > > Furthermore, if we cannot infer that RBTXN_SKIPPED_PREPARE *must* also
> > > be a prepared transaction, then why aren't the macros changed to match
> > > that interpretation?
> > >
> > > e.g.
> > >
> > > /* prepare for this transaction skipped? */
> > > #define rbtxn_skip_prepared(txn) \
> > > ( \
> > > (((txn)->txn_flags & RBTXN_IS_PREPARED) != 0) && \
> > > (((txn)->txn_flags & RBTXN_SKIPPED_PREPARE) != 0) \
> > > )
> > >
> > > /* Has a prepare or stream_prepare already been sent? */
> > > #define rbtxn_sent_prepare(txn) \
> > > ( \
> > > (((txn)->txn_flags & RBTXN_IS_PREPARED) != 0) && \
> > > (((txn)->txn_flags & RBTXN_SENT_PREPARE) != 0) \
> > > )
> > >
> > > ~~~
> > >
> > > I think a way to fix all this might be to enforce that the
> > > RBTXN_SKIPPED_PREPARE and RBTXN_SENT_PREPARE constants also include
> > > the RBTXN_IS_PREPARED bitflag, removing the ambiguity about how
> > > exactly to interpret those two constants.
> > >
> > > e.g. something like
> > >
> > > #define RBTXN_IS_PREPARED 0x0040
> > > #define RBTXN_SKIPPED_PREPARE (0x0080 | RBTXN_IS_PREPARED)
> > > #define RBTXN_SENT_PREPARE (0x0200 | RBTXN_IS_PREPARED)
> > >
> >
> > I think the better way would be to ensure that, wherever we set
> > RBTXN_SENT_PREPARE or RBTXN_SKIPPED_PREPARE, the transaction is a
> > prepared one (RBTXN_IS_PREPARED must already be set). This should
> > already be the case for RBTXN_SENT_PREPARE, and we can ensure the same
> > for RBTXN_SKIPPED_PREPARE as well.

Since the patch already does "txn->txn_flags |= (RBTXN_IS_PREPARED |
RBTXN_SKIPPED_PREPARE);", it's already ensured, no?

I think we need to add both flags in ReorderBufferSkipPrepare(),
because there is a case where a transaction might not be marked as
RBTXN_IS_PREPARED here.

> >
> > Will that address your concern? Does anyone else have an opinion on this matter?
>
> Yes, that would be OK, but we should also add some clarifying comments
> in "reorderbuffer.h", like:
>
> #define RBTXN_SKIPPED_PREPARE 0x0080 /* this flag can only be set
> for RBTXN_IS_PREPARED transactions */
> #define RBTXN_SENT_PREPARE 0x0200 /* this flag can only be set for
> RBTXN_IS_PREPARED transactions */

I think the same is true for RBTXN_IS_SERIALIZED and
RBTXN_IS_SERIALIZED_CLEAR; RBTXN_IS_SERIALIZED_CLEAR can only be set
for RBTXN_IS_SERIALIZED transactions. Should we add some comments to
them too? But I'm concerned about over-explaining if we add
descriptions to the flags as well, since the corresponding macros
already have comments.

Another way to ensure that is to convert these macros to inline
functions and add an Assert() there, but it seems overkill.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-27 06:26:19
Message-ID:	CAA4eK1Ljpx3qsfF6_YxsqDqNEp7hMcc0QxFoqFa1_sfsC7zK6g@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Jan 24, 2025 at 12:38 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Jan 22, 2025 at 7:35 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > On Thu, Jan 23, 2025 at 2:17 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Jan 22, 2025 at 9:21 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > >
> > > >
> > > > ======
> > > > Commit message
> > > >
> > > > typo /RBTXN_IS_PREAPRE/RBTXN_IS_PREPARE/
> > > >
>
> Will fix.
>
> > > >
> > > > Also, this code (below) seems to be treating those macros as
> > > > unrelated, but IIUC we know that rbtxn_skip_prepared(txn) is not
> > > > possible unless rbtxn_is_prepared(txn) is true.
> > > >
> > > > - if (rbtxn_prepared(txn) || rbtxn_skip_prepared(txn))
> > > > + if (rbtxn_is_prepared(txn) || rbtxn_skip_prepared(txn))
> > > > continue;
>
> Right. We no longer need to check rbtxn_skip_prepared() here.
>
> > > >
> > > > ~~
> > > >
> > > > Furthermore, if we cannot infer that RBTXN_SKIPPED_PREPARE *must* also
> > > > be a prepared transaction, then why aren't the macros changed to match
> > > > that interpretation?
> > > >
> > > > e.g.
> > > >
> > > > /* prepare for this transaction skipped? */
> > > > #define rbtxn_skip_prepared(txn) \
> > > > ( \
> > > > (((txn)->txn_flags & RBTXN_IS_PREPARED) != 0) && \
> > > > (((txn)->txn_flags & RBTXN_SKIPPED_PREPARE) != 0) \
> > > > )
> > > >
> > > > /* Has a prepare or stream_prepare already been sent? */
> > > > #define rbtxn_sent_prepare(txn) \
> > > > ( \
> > > > (((txn)->txn_flags & RBTXN_IS_PREPARED) != 0) && \
> > > > (((txn)->txn_flags & RBTXN_SENT_PREPARE) != 0) \
> > > > )
> > > >
> > > > ~~~
> > > >
> > > > I think a way to fix all this might be to enforce that the
> > > > RBTXN_SKIPPED_PREPARE and RBTXN_SENT_PREPARE constants also include
> > > > the RBTXN_IS_PREPARED bitflag, removing the ambiguity about how
> > > > exactly to interpret those two constants.
> > > >
> > > > e.g. something like
> > > >
> > > > #define RBTXN_IS_PREPARED 0x0040
> > > > #define RBTXN_SKIPPED_PREPARE (0x0080 | RBTXN_IS_PREPARED)
> > > > #define RBTXN_SENT_PREPARE (0x0200 | RBTXN_IS_PREPARED)
> > > >
> > >
> > > I think the better way would be to ensure that, wherever we set
> > > RBTXN_SENT_PREPARE or RBTXN_SKIPPED_PREPARE, the transaction is a
> > > prepared one (RBTXN_IS_PREPARED must already be set). This should
> > > already be the case for RBTXN_SENT_PREPARE, and we can ensure the same
> > > for RBTXN_SKIPPED_PREPARE as well.
>
> Since the patch already does "txn->txn_flags |= (RBTXN_IS_PREPARED |
> RBTXN_SKIPPED_PREPARE);", it's already ensured, no?
>

I meant that we should add an assert to ensure the same.

> I think we need to add both flags in ReorderBufferSkipPrepare(),
> because there is a case where a transaction might not be marked as
> RBTXN_IS_PREPARED here.
>

Are you talking about the case when it is invoked from
DecodePrepare()? I thought we would set the flag in that code path.

> > >
> > > Will that address your concern? Does anyone else have an opinion on this matter?
> >
> > Yes, that would be OK, but we should also add some clarifying comments
> > in "reorderbuffer.h", like:
> >
> > #define RBTXN_SKIPPED_PREPARE 0x0080 /* this flag can only be set
> > for RBTXN_IS_PREPARED transactions */
> > #define RBTXN_SENT_PREPARE 0x0200 /* this flag can only be set for
> > RBTXN_IS_PREPARED transactions */
>
> I think the same is true for RBTXN_IS_SERIALIZED and
> RBTXN_IS_SERIALIZED_CLEAR; RBTXN_IS_SERIALIZED_CLEAR can only be set
> for RBTXN_IS_SERIALIZED transactions. Should we add some comments to
> them too? But I'm concerned about over-explaining if we add
> descriptions to the flags as well, since the corresponding macros
> already have comments.
>

Yeah, I am fine either way, especially if we decide to add asserts for
RBTXN_IS_PREPARED when we set those flags.

> Another way to ensure that is to convert these macros to inline
> functions and add an Assert() there, but it seems overkill.
>

True, but that would ensure that we don't make the coding mistakes
Peter wants to guard against by writing additional comments;
asserting is probably a better way.

--
With Regards,
Amit Kapila.

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-27 17:30:51
Message-ID:	CAD21AoCMum7X6Z2xSTDxix7f5kS8MgCC3Dy68c0YdT7iDdq-PA@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Jan 26, 2025 at 10:26 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Jan 24, 2025 at 12:38 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 22, 2025 at 7:35 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Jan 23, 2025 at 2:17 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > On Wed, Jan 22, 2025 at 9:21 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > > >
> > > > >
> > > > > ======
> > > > > Commit message
> > > > >
> > > > > typo /RBTXN_IS_PREAPRE/RBTXN_IS_PREPARE/
> > > > >
> >
> > Will fix.
> >
> > > > >
> > > > > Also, this code (below) seems to be treating those macros as
> > > > > unrelated, but IIUC we know that rbtxn_skip_prepared(txn) is not
> > > > > possible unless rbtxn_is_prepared(txn) is true.
> > > > >
> > > > > - if (rbtxn_prepared(txn) || rbtxn_skip_prepared(txn))
> > > > > + if (rbtxn_is_prepared(txn) || rbtxn_skip_prepared(txn))
> > > > > continue;
> >
> > Right. We no longer need to check rbtxn_skip_prepared() here.
> >
> > > > >
> > > > > ~~
> > > > >
> > > > > Furthermore, if we cannot infer that RBTXN_SKIPPED_PREPARE *must* also
> > > > > be a prepared transaction, then why aren't the macros changed to match
> > > > > that interpretation?
> > > > >
> > > > > e.g.
> > > > >
> > > > > /* prepare for this transaction skipped? */
> > > > > #define rbtxn_skip_prepared(txn) \
> > > > > ( \
> > > > > (((txn)->txn_flags & RBTXN_IS_PREPARED) != 0) && \
> > > > > (((txn)->txn_flags & RBTXN_SKIPPED_PREPARE) != 0) \
> > > > > )
> > > > >
> > > > > /* Has a prepare or stream_prepare already been sent? */
> > > > > #define rbtxn_sent_prepare(txn) \
> > > > > ( \
> > > > > (((txn)->txn_flags & RBTXN_IS_PREPARED) != 0) && \
> > > > > (((txn)->txn_flags & RBTXN_SENT_PREPARE) != 0) \
> > > > > )
> > > > >
> > > > > ~~~
> > > > >
> > > > > I think a way to fix all this might be to enforce that the
> > > > > RBTXN_SKIPPED_PREPARE and RBTXN_SENT_PREPARE constants also include
> > > > > the RBTXN_IS_PREPARED bitflag, removing the ambiguity about how
> > > > > exactly to interpret those two constants.
> > > > >
> > > > > e.g. something like
> > > > >
> > > > > #define RBTXN_IS_PREPARED 0x0040
> > > > > #define RBTXN_SKIPPED_PREPARE (0x0080 | RBTXN_IS_PREPARED)
> > > > > #define RBTXN_SENT_PREPARE (0x0200 | RBTXN_IS_PREPARED)
> > > > >
> > > >
> > > > I think the better way would be to ensure that, wherever we set
> > > > RBTXN_SENT_PREPARE or RBTXN_SKIPPED_PREPARE, the transaction is a
> > > > prepared one (RBTXN_IS_PREPARED must already be set). This should
> > > > already be the case for RBTXN_SENT_PREPARE, and we can ensure the same
> > > > for RBTXN_SKIPPED_PREPARE as well.
> >
> > Since the patch already does "txn->txn_flags |= (RBTXN_IS_PREPARED |
> > RBTXN_SKIPPED_PREPARE);", it's already ensured, no?
> >
>
> I meant that we should add an assert to ensure the same.
>
> > I think we need to add both flags in ReorderBufferSkipPrepare(),
> > because there is a case where a transaction might not be marked as
> > RBTXN_IS_PREPARED here.
> >
>
> Are you talking about the case when it is invoked from
> DecodePrepare()?

Yes. IIUC ReorderBufferSkipPrepare() is called only from DecodePrepare().

> I thought we would set the flag in that code path.

I agree that it makes sense to add the flag before calling
ReorderBufferSkipPrepare().

>
> > > >
> > > > Will that address your concern? Does anyone else have an opinion on this matter?
> > >
> > > Yes, that would be OK, but we should also add some clarifying comments
> > > in "reorderbuffer.h", like:
> > >
> > > #define RBTXN_SKIPPED_PREPARE 0x0080 /* this flag can only be set
> > > for RBTXN_IS_PREPARED transactions */
> > > #define RBTXN_SENT_PREPARE 0x0200 /* this flag can only be set for
> > > RBTXN_IS_PREPARED transactions */
> >
> > I think the same is true for RBTXN_IS_SERIALIZED and
> > RBTXN_IS_SERIALIZED_CLEAR; RBTXN_IS_SERIALIZED_CLEAR can only be set
> > for RBTXN_IS_SERIALIZED transactions. Should we add some comments to
> > them too? But I'm concerned about over-explaining if we add
> > descriptions to the flags as well, since the corresponding macros
> > already have comments.
> >
>
> Yeah, I am fine either way, especially if we decide to add asserts for
> RBTXN_IS_PREPARED when we set those flags.
>
> > Another way to ensure that is to convert these macros to inline
> > functions and add an Assert() there, but it seems overkill.
> >
>
> True, but that would ensure that we don't make the coding mistakes
> Peter wants to guard against by writing additional comments;
> asserting is probably a better way.

I've attached the updated patches. In the 0002 patch, I've marked the
transaction as a prepared transaction in
ReorderBufferRememberPrepareInfo() so that all prepared transactions
that have a ReorderBufferTXN entry at that time are marked properly.
I've also added some assertions to ensure that all flags related to
prepared transactions are set properly. Thoughts?

Nothing has changed in the 0001 patch since the previous version.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v16-0002-Rename-RBTXN_PREPARE-to-RBTXN_IS_PREPARE-for-bet.patch	application/octet-stream	9.0 KB
v16-0001-Skip-logical-decoding-of-already-aborted-transac.patch	application/octet-stream	21.7 KB

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-28 03:01:20
Message-ID:	CAHut+Ps_A8VEOm8VPTxu5dD-dV4WEncKzjdB5QrVt83xZaEB+w@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jan 28, 2025 at 4:31 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Sun, Jan 26, 2025 at 10:26 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Jan 24, 2025 at 12:38 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Jan 22, 2025 at 7:35 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > >
> > > > On Thu, Jan 23, 2025 at 2:17 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Wed, Jan 22, 2025 at 9:21 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > > > >
> > > > > >
> > > > > > ======
> > > > > > Commit message
> > > > > >
> > > > > > typo /RBTXN_IS_PREAPRE/RBTXN_IS_PREPARE/
> > > > > >
> > >
> > > Will fix.
> > >
> > > > > >
> > > > > > Also, this code (below) seems to be treating those macros as
> > > > > > unrelated, but IIUC we know that rbtxn_skip_prepared(txn) is not
> > > > > > possible unless rbtxn_is_prepared(txn) is true.
> > > > > >
> > > > > > - if (rbtxn_prepared(txn) || rbtxn_skip_prepared(txn))
> > > > > > + if (rbtxn_is_prepared(txn) || rbtxn_skip_prepared(txn))
> > > > > > continue;
> > >
> > > Right. We no longer need to check rbtxn_skip_prepared() here.
> > >
> > > > > >
> > > > > > ~~
> > > > > >
> > > > > > Furthermore, if we cannot infer that RBTXN_SKIPPED_PREPARE *must* also
> > > > > > be a prepared transaction, then why aren't the macros changed to match
> > > > > > that interpretation?
> > > > > >
> > > > > > e.g.
> > > > > >
> > > > > > /* prepare for this transaction skipped? */
> > > > > > #define rbtxn_skip_prepared(txn) \
> > > > > > ( \
> > > > > > (((txn)->txn_flags & RBTXN_IS_PREPARED) != 0) && \
> > > > > > (((txn)->txn_flags & RBTXN_SKIPPED_PREPARE) != 0) \
> > > > > > )
> > > > > >
> > > > > > /* Has a prepare or stream_prepare already been sent? */
> > > > > > #define rbtxn_sent_prepare(txn) \
> > > > > > ( \
> > > > > > (((txn)->txn_flags & RBTXN_IS_PREPARED) != 0) && \
> > > > > > (((txn)->txn_flags & RBTXN_SENT_PREPARE) != 0) \
> > > > > > )
> > > > > >
> > > > > > ~~~
> > > > > >
> > > > > > I think a way to fix all this might be to enforce that the RBTXN_IS_PREPARED
> > > > > > bitflag is also included in the RBTXN_SKIPPED_PREPARE and RBTXN_SENT_PREPARE
> > > > > > constants, removing the ambiguity about how exactly to interpret those
> > > > > > two constants.
> > > > > >
> > > > > > e.g. something like
> > > > > >
> > > > > > #define RBTXN_IS_PREPARED 0x0040
> > > > > > #define RBTXN_SKIPPED_PREPARE (0x0080 | RBTXN_IS_PREPARED)
> > > > > > #define RBTXN_SENT_PREPARE (0x0200 | RBTXN_IS_PREPARED)
> > > > > >
> > > > >
> > > > > I think the better way would be to ensure that where we set
> > > > > RBTXN_SENT_PREPARE or RBTXN_SKIPPED_PREPARE, the transaction is a
> > > > > prepared one (RBTXN_IS_PREPARED must be already set). It should be
> > > > > already the case for RBTXN_SENT_PREPARE but we can ensure the same for
> > > > > RBTXN_SKIPPED_PREPARE as well.
> > >
> > > Since the patch already does "txn->txn_flags |= (RBTXN_IS_PREPARED |
> > > RBTXN_SKIPPED_PREPARE);", it's already ensured, no?
> > >
> >
> > I mean to say that we add assert to ensure the same.
> >
> > > I think we need to add both flags in ReorderBufferSkipPrepare(),
> > > because there is a case where a transaction might not be marked as
> > > RBTXN_IS_PREPARED here.
> > >
> >
> > Are you talking about the case when it is invoked from
> > DecodePrepare()?
>
> Yes. IIUC ReorderBufferSkipPrepare() is called only from DecodePrepare().
>
> > I thought we would set the flag in that code path.
>
> I agree that it makes sense to add the flag before calling
> ReorderBufferSkipPrepare().
>
> >
> > > > >
> > > > > Will that address your concern? Does anyone else have an opinion on this matter?
> > > >
> > > > Yes, that would be OK, but we should also add some clarifying comments in
> > > > the "reorderbuffer.h" like:
> > > >
> > > > #define RBTXN_SKIPPED_PREPARE 0x0080 /* this flag can only be set
> > > > for RBTXN_IS_PREPARED transactions */
> > > > #define RBTXN_SENT_PREPARE 0x0200 /* this flag can only be set for
> > > > RBTXN_IS_PREPARED transactions */
> > >
> > > I think the same is true for RBTXN_IS_SERIALIZED and
> > > RBTXN_IS_SERIALIZED_CLEAR; RBTXN_IS_SERIALIZED_CLEAR can only be set
> > > for RBTXN_IS_SERIALIZED transaction. Should we add some comments to
> > > them too? But I'm concerned about having too much explanation if we
> > > add descriptions to flags too while already having comments for
> > > corresponding macros.

Hm. The RBTXN_IS_SERIALIZED / RBTXN_IS_SERIALIZED_CLEAR pair is used
differently -- it seems trickier because the RBTXN_IS_SERIALIZED flag
is turned OFF again when RBTXN_IS_SERIALIZED_CLEAR is turned ON.
(Whereas setting SKIPPED_PREPARE or SENT_PREPARE never turns off
the IS_PREPARED transaction type.)
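A simplified C model of the transition described above may help; it is a sketch, not the reorderbuffer.c code. The current-state bit RBTXN_IS_SERIALIZED is cleared when the spilled changes are cleaned up, and RBTXN_IS_SERIALIZED_CLEAR is set to remember that the transaction was serialized at some point. The bit values, the TxnModel struct and the helper names are placeholders chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for this sketch; only their distinctness matters. */
#define RBTXN_IS_SERIALIZED       0x0004
#define RBTXN_IS_SERIALIZED_CLEAR 0x0008

/* Minimal stand-in for a reorder-buffer transaction: flags only. */
typedef struct TxnModel
{
    uint32_t txn_flags;
} TxnModel;

/* Changes were spilled to disk: the "currently serialized" bit goes on. */
static void
mark_serialized(TxnModel *txn)
{
    txn->txn_flags |= RBTXN_IS_SERIALIZED;
}

/*
 * The spilled changes were cleaned up: the current-state bit is turned OFF,
 * while the history bit records that the transaction was serialized before.
 */
static void
truncate_serialized(TxnModel *txn)
{
    if (txn->txn_flags & RBTXN_IS_SERIALIZED)
    {
        txn->txn_flags &= ~RBTXN_IS_SERIALIZED;
        txn->txn_flags |= RBTXN_IS_SERIALIZED_CLEAR;
    }
}

/* Are changes currently spilled to disk? */
static bool
is_serialized(const TxnModel *txn)
{
    return (txn->txn_flags & RBTXN_IS_SERIALIZED) != 0;
}

/* Was the transaction serialized at any point? Needs both bits. */
static bool
was_ever_serialized(const TxnModel *txn)
{
    return (txn->txn_flags &
            (RBTXN_IS_SERIALIZED | RBTXN_IS_SERIALIZED_CLEAR)) != 0;
}
```

In this model, "has this transaction ever been serialized?" has to test both bits, which is part of why the CLEAR name is hard to interpret on its own.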

To be honest, I didn't understand the "CLEAR" part of that name. It
seems more like it should've been called something like
RBTXN_IS_SERIALIZED_ALREADY or RBTXN_IS_SERIALIZED_PREVIOUSLY or
whatever instead of something that appears to be saying "has the
RBTXN_IS_SERIALIZED bitflag been cleared?" I understand the reluctance
to over-comment everything but OTOH currently there is no way really
to understand what these flags mean without looking through all the
code to try to figure them out from the usage.

My recurring gripe about these flags is simply that their meanings and
how to use them should be apparent just by looking at reorderbuffer.h
and not having to guess anything or look at how they get used in the
code. It doesn't matter if that is achieved by better constant names,
by more comments or by enhanced macros/functions with asserts but
currently just looking at that file still leaves the reader with lots
of unanswered questions.

> > >
> >
> > Yeah, I am fine either way especially, if we decide to add asserts for
> > RBTXN_IS_PREPARED when we set those flags.
> >
> > > Another way to ensure that is to convert these macros to inline
> > > functions and add an Assert() there, but it seems overkill.
> > >
> >
> > True, but that would ensure, we won't make any coding mistakes which
> > Peter wants to ensure by writing additional comments but asserting is
> > probably a better way.
>

Maybe I misunderstood, but I thought Amit's reply there meant that
rewriting the macros as inline functions with asserts would be a good
way to ensure no coding mistakes. Yet, the macros are still unchanged
in v16-0002.
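For comparison, here is a minimal sketch of the inline-function approach discussed above. It is hypothetical, not what v16-0002 does: ReorderBufferTXN is reduced to its txn_flags field, the standard assert() stands in for PostgreSQL's Assert(), and the flag values are the ones quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as quoted in this thread. */
#define RBTXN_IS_PREPARED     0x0040
#define RBTXN_SKIPPED_PREPARE 0x0080
#define RBTXN_SENT_PREPARE    0x0200

/* Stripped-down stand-in for ReorderBufferTXN. */
typedef struct ReorderBufferTXN
{
    uint32_t txn_flags;
} ReorderBufferTXN;

/* Is this a prepared transaction? */
static inline bool
rbtxn_is_prepared(const ReorderBufferTXN *txn)
{
    return (txn->txn_flags & RBTXN_IS_PREPARED) != 0;
}

/*
 * Was the prepare for this transaction skipped?  Unlike a plain macro, the
 * function can check that only prepared transactions carry this flag.
 */
static inline bool
rbtxn_skip_prepared(const ReorderBufferTXN *txn)
{
    bool skipped = (txn->txn_flags & RBTXN_SKIPPED_PREPARE) != 0;

    assert(!skipped || rbtxn_is_prepared(txn));
    return skipped;
}

/* Has a prepare or stream_prepare already been sent? Same invariant. */
static inline bool
rbtxn_sent_prepare(const ReorderBufferTXN *txn)
{
    bool sent = (txn->txn_flags & RBTXN_SENT_PREPARE) != 0;

    assert(!sent || rbtxn_is_prepared(txn));
    return sent;
}
```

One trade-off of this design: the check fires only when a flag is read, whereas asserting where the flags are set catches a mistake at its source.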

> I've attached the updated patch. In the 0002 patch, I've marked the
> transaction as a prepared transaction in
> ReorderBufferRememberPrepareInfo() so that all prepared transactions
> that have a ReorderBufferTXN entry at that time can be marked properly.
> I've also added some assertions to ensure that all prepared-transaction
> related flags have been set properly. Thoughts?
>

Here are a couple of other review comments for patch v16-0002

======
Commit message

1.
The RBTXN_PREPARE flag (and its corresponding macro) have been renamed
to RBTXN_IS_PREPARE to explicitly indicate the transaction
type. Therefore, this commit also adds the RBTXN_IS_PREAPRE flag also
to the transaction that is a prepared transaction and has been
skipped, which previously had only the RBTXN_SKIPPED_PREPARE flag.

Instead of fixing the "RBTXN_IS_PREAPRE" typo, it looks like a new
problem (double "also" in the same sentence) was added in v16.

======
.../replication/logical/reorderbuffer.c

2.
if ((txn->final_lsn < two_phase_at) && is_commit)
{
- txn->txn_flags |= RBTXN_PREPARE;
+ txn->txn_flags |= RBTXN_IS_PREPARED;

Won't this flag already be set? The next code
comment ("The prepare info must have been updated ...") made me think
so.

But if it does need to be assigned here, then why aren't there the
same assertions about the existing IS_PREPARED, SKIPPED and SENT flags
as in the other place where this flag is set?

======
Kind Regards,
Peter Smith.
Fujitsu Australia.

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-28 10:25:27
Message-ID:	CAD21AoD7MNK+qNUFD3RqT7MhwNwYwEsAZbQQqRG8h9Zhqu7KoA@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jan 27, 2025 at 7:01 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Tue, Jan 28, 2025 at 4:31 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Sun, Jan 26, 2025 at 10:26 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Jan 24, 2025 at 12:38 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Wed, Jan 22, 2025 at 7:35 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Thu, Jan 23, 2025 at 2:17 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > On Wed, Jan 22, 2025 at 9:21 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > > > > >
> > > > > > >
> > > > > > > ======
> > > > > > > Commit message
> > > > > > >
> > > > > > > typo /RBTXN_IS_PREAPRE/RBTXN_IS_PREPARE/
> > > > > > >
> > > >
> > > > Will fix.
> > > >
> > > > > > >
> > > > > > > Also, this code (below) seems to be treating those macros as
> > > > > > > unrelated, but IIUC we know that rbtxn_skip_prepared(txn) is not
> > > > > > > possible unless rbtxn_is_prepared(txn) is true.
> > > > > > >
> > > > > > > - if (rbtxn_prepared(txn) || rbtxn_skip_prepared(txn))
> > > > > > > + if (rbtxn_is_prepared(txn) || rbtxn_skip_prepared(txn))
> > > > > > > continue;
> > > >
> > > > Right. We no longer need to check rbtxn_skip_prepared() here.
> > > >
> > > > > > >
> > > > > > > ~~
> > > > > > >
> > > > > > > Furthermore, if we cannot infer that RBTXN_SKIPPED_PREPARE *must* also
> > > > > > > be a prepared transaction, then why aren't the macros changed to match
> > > > > > > that interpretation?
> > > > > > >
> > > > > > > e.g.
> > > > > > >
> > > > > > > /* prepare for this transaction skipped? */
> > > > > > > #define rbtxn_skip_prepared(txn) \
> > > > > > > ( \
> > > > > > > (((txn)->txn_flags & RBTXN_IS_PREPARED) != 0) && \
> > > > > > > (((txn)->txn_flags & RBTXN_SKIPPED_PREPARE) != 0) \
> > > > > > > )
> > > > > > >
> > > > > > > /* Has a prepare or stream_prepare already been sent? */
> > > > > > > #define rbtxn_sent_prepare(txn) \
> > > > > > > ( \
> > > > > > > (((txn)->txn_flags & RBTXN_IS_PREPARED) != 0) && \
> > > > > > > (((txn)->txn_flags & RBTXN_SENT_PREPARE) != 0) \
> > > > > > > )
> > > > > > >
> > > > > > > ~~~
> > > > > > >
> > > > > > > I think a way to fix all this might be to enforce that the RBTXN_IS_PREPARED
> > > > > > > bitflag is also included in the RBTXN_SKIPPED_PREPARE and RBTXN_SENT_PREPARE
> > > > > > > constants, removing the ambiguity about how exactly to interpret those
> > > > > > > two constants.
> > > > > > >
> > > > > > > e.g. something like
> > > > > > >
> > > > > > > #define RBTXN_IS_PREPARED 0x0040
> > > > > > > #define RBTXN_SKIPPED_PREPARE (0x0080 | RBTXN_IS_PREPARED)
> > > > > > > #define RBTXN_SENT_PREPARE (0x0200 | RBTXN_IS_PREPARED)
> > > > > > >
> > > > > >
> > > > > > I think the better way would be to ensure that where we set
> > > > > > RBTXN_SENT_PREPARE or RBTXN_SKIPPED_PREPARE, the transaction is a
> > > > > > prepared one (RBTXN_IS_PREPARED must be already set). It should be
> > > > > > already the case for RBTXN_SENT_PREPARE but we can ensure the same for
> > > > > > RBTXN_SKIPPED_PREPARE as well.
> > > >
> > > > Since the patch already does "txn->txn_flags |= (RBTXN_IS_PREPARED |
> > > > RBTXN_SKIPPED_PREPARE);", it's already ensured, no?
> > > >
> > >
> > > I mean to say that we add assert to ensure the same.
> > >
> > > > I think we need to add both flags in ReorderBufferSkipPrepare(),
> > > > because there is a case where a transaction might not be marked as
> > > > RBTXN_IS_PREPARED here.
> > > >
> > >
> > > Are you talking about the case when it is invoked from
> > > DecodePrepare()?
> >
> > Yes. IIUC ReorderBufferSkipPrepare() is called only from DecodePrepare().
> >
> > > I thought we would set the flag in that code path.
> >
> > I agree that it makes sense to add the flag before calling
> > ReorderBufferSkipPrepare().
> >
> > >
> > > > > >
> > > > > > Will that address your concern? Does anyone else have an opinion on this matter?
> > > > >
> > > > > Yes, that would be OK, but we should also add some clarifying comments in
> > > > > the "reorderbuffer.h" like:
> > > > >
> > > > > #define RBTXN_SKIPPED_PREPARE 0x0080 /* this flag can only be set
> > > > > for RBTXN_IS_PREPARED transactions */
> > > > > #define RBTXN_SENT_PREPARE 0x0200 /* this flag can only be set for
> > > > > RBTXN_IS_PREPARED transactions */
> > > >
> > > > I think the same is true for RBTXN_IS_SERIALIZED and
> > > > RBTXN_IS_SERIALIZED_CLEAR; RBTXN_IS_SERIALIZED_CLEAR can only be set
> > > > for RBTXN_IS_SERIALIZED transaction. Should we add some comments to
> > > > them too? But I'm concerned about having too much explanation if we
> > > > add descriptions to flags too while already having comments for
> > > > corresponding macros.
>
> Hm. The RBTXN_IS_SERIALIZED / RBTXN_IS_SERIALIZED_CLEAR pair is used
> differently -- it seems trickier because the RBTXN_IS_SERIALIZED flag
> is turned OFF again when RBTXN_IS_SERIALIZED_CLEAR is turned ON.
> (Whereas setting SKIPPED_PREPARE or SENT_PREPARE never turns off
> the IS_PREPARED transaction type.)

You're right.

> To be honest, I didn't understand the "CLEAR" part of that name. It
> seems more like it should've been called something like
> RBTXN_IS_SERIALIZED_ALREADY or RBTXN_IS_SERIALIZED_PREVIOUSLY or
> whatever instead of something that appears to be saying "has the
> RBTXN_IS_SERIALIZED bitflag been cleared?" I understand the reluctance
> to over-comment everything but OTOH currently there is no way really
> to understand what these flags mean without looking through all the
> code to try to figure them out from the usage.
>
> My recurring gripe about these flags is simply that their meanings and
> how to use them should be apparent just by looking at reorderbuffer.h
> and not having to guess anything or look at how they get used in the
> code. It doesn't matter if that is achieved by better constant names,
> by more comments or by enhanced macros/functions with asserts but
> currently just looking at that file still leaves the reader with lots
> of unanswered questions.

I see your point. IIUC we have comments explaining what the checks on
each flag mean, but no description of the relationships among the
flags. I think we can start a new thread to clarify these flags and
their usage. We can also discuss renaming
RBTXN_IS_SERIALIZED[_CLEAR] there too.
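As an illustration of writing those relationships down in one place, here is a hypothetical consistency check that asserts could call, e.g. Assert(rbtxn_flags_are_consistent(txn->txn_flags)). The function does not exist in PostgreSQL. The rules it encodes are the ones raised in this discussion (SKIPPED_PREPARE and SENT_PREPARE only on prepared transactions, and a skipped prepare is never also sent) plus the obvious exclusivity of the committed and aborted bits; the flag values are those quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as quoted in this thread. */
#define RBTXN_IS_PREPARED     0x0040
#define RBTXN_SKIPPED_PREPARE 0x0080
#define RBTXN_SENT_PREPARE    0x0200
#define RBTXN_IS_COMMITTED    0x0400
#define RBTXN_IS_ABORTED      0x0800

/* True if every bit in mask is set in flags. */
static inline bool
has_all(uint32_t flags, uint32_t mask)
{
    return (flags & mask) == mask;
}

/* Hypothetical check encoding the relationships among the flags. */
static bool
rbtxn_flags_are_consistent(uint32_t flags)
{
    /* A skipped or sent prepare implies a prepared transaction. */
    if ((flags & (RBTXN_SKIPPED_PREPARE | RBTXN_SENT_PREPARE)) != 0 &&
        (flags & RBTXN_IS_PREPARED) == 0)
        return false;

    /* A prepare cannot be both skipped and sent. */
    if (has_all(flags, RBTXN_SKIPPED_PREPARE | RBTXN_SENT_PREPARE))
        return false;

    /* The cached transaction status is at most one of committed/aborted. */
    if (has_all(flags, RBTXN_IS_COMMITTED | RBTXN_IS_ABORTED))
        return false;

    return true;
}
```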

>
> > > >
> > >
> > > Yeah, I am fine either way especially, if we decide to add asserts for
> > > RBTXN_IS_PREPARED when we set those flags.
> > >
> > > > Another way to ensure that is to convert these macros to inline
> > > > functions and add an Assert() there, but it seems overkill.
> > > >
> > >
> > > True, but that would ensure, we won't make any coding mistakes which
> > > Peter wants to ensure by writing additional comments but asserting is
> > > probably a better way.
> >
>
> Maybe I misunderstood, but I thought Amit's reply there meant that
> rewriting the macros as inline functions with asserts would be a good
> way to ensure no coding mistakes. Yet, the macros are still unchanged
> in v16-0002.

I forgot to mention: while converting all the macros to inline
functions is a good idea, adding assertions in the appropriate places
also makes the code robust. The prepared-transaction related flags are
currently used only in specific cases, so I think what the patch does
also makes sense.

>
> > I've attached the updated patch. In the 0002 patch, I've marked the
> > transaction as a prepared transaction in
> > ReorderBufferRememberPrepareInfo() so that all prepared transactions
> > that have a ReorderBufferTXN entry at that time can be marked properly.
> > I've also added some assertions to ensure that all prepared-transaction
> > related flags have been set properly. Thoughts?
> >
>
> Here are a couple of other review comments for patch v16-0002

Thank you for the comments!

>
> ======
> Commit message
>
> 1.
> The RBTXN_PREPARE flag (and its corresponding macro) have been renamed
> to RBTXN_IS_PREPARE to explicitly indicate the transaction
> type. Therefore, this commit also adds the RBTXN_IS_PREAPRE flag also
> to the transaction that is a prepared transaction and has been
> skipped, which previously had only the RBTXN_SKIPPED_PREPARE flag.
>
> Instead of fixing the "RBTXN_IS_PREAPRE" typo, it looks like a new
> problem (double "also" in the same sentence) was added in v16.

Fixed.

>
> ======
> .../replication/logical/reorderbuffer.c
>
> 2.
> if ((txn->final_lsn < two_phase_at) && is_commit)
> {
> - txn->txn_flags |= RBTXN_PREPARE;
> + txn->txn_flags |= RBTXN_IS_PREPARED;
>
> Won't this flag already be set? The next code
> comment ("The prepare info must have been updated ...") made me think
> so.
>

Good point. The transaction must have both flags: RBTXN_IS_PREPARED
and RBTXN_SKIPPED_PREPARE, unless I'm missing something.

I've attached the updated patches.

BTW, if there are no comments on the 0001 patch, I'm going to push it this week.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v17-0001-Skip-logical-decoding-of-already-aborted-transac.patch	application/octet-stream	21.7 KB
v17-0002-Rename-RBTXN_PREPARE-to-RBTXN_IS_PREPARE-for-bet.patch	application/octet-stream	9.4 KB

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-30 05:31:40
Message-ID:	CAHut+PsQMP1RnGpqAyK+LA622GQzczNSDECbGG49qsQbm=kDYg@mail.gmail.com
Lists:	pgsql-hackers

Some comments for patch v17-0001.

======
Commit message.

1.
typo /noticeble/noticeable/

======
.../replication/logical/reorderbuffer.c

ReorderBufferCheckAndTruncateAbortedTXN:

2.
It seemed tricky that the only place that sets the
RBTXN_IS_COMMITTED flag is the function
ReorderBufferCheckAndTruncateAbortedTXN, because neither the function
name nor the function comment gives any indication that it has
this side effect.

~~~

ReorderBufferProcessTXN:

3.
if (rbtxn_prepared(txn))
+ {
rb->prepare(rb, txn, commit_lsn);
+ txn->txn_flags |= RBTXN_SENT_PREPARE;
+ }

In ReorderBufferStreamCommit there is an assertion that we are not
trying to do another prepare() if the _SENT_PREPARE flag is already
set. Should this code have a similar assert?
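A minimal sketch of the suggested guard, in a simplified model: assert that the transaction is prepared and that no prepare has been sent yet, invoke the callback, then record RBTXN_SENT_PREPARE. The Txn struct, the prepare_cb type, send_prepare() and the counting callback are hypothetical stand-ins for ReorderBufferTXN, rb->prepare() and the ReorderBufferProcessTXN() code quoted above; only the flag names and values come from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as quoted in this thread. */
#define RBTXN_IS_PREPARED  0x0040
#define RBTXN_SENT_PREPARE 0x0200

typedef struct Txn
{
    uint32_t txn_flags;
} Txn;

/* Stand-in for the output plugin's prepare callback (rb->prepare). */
typedef void (*prepare_cb) (Txn *txn);

/* Example callback for illustration: counts how many prepares were sent. */
static int prepares_sent = 0;

static void
count_prepare(Txn *txn)
{
    (void) txn;
    prepares_sent++;
}

/* Send the prepare at most once, and only for a prepared transaction. */
static void
send_prepare(Txn *txn, prepare_cb prepare)
{
    assert((txn->txn_flags & RBTXN_IS_PREPARED) != 0);
    /* Same idea as the assertion in ReorderBufferStreamCommit(). */
    assert((txn->txn_flags & RBTXN_SENT_PREPARE) == 0);

    prepare(txn);
    txn->txn_flags |= RBTXN_SENT_PREPARE;
}
```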

======
src/include/replication/reorderbuffer.h

4.
+#define RBTXN_SENT_PREPARE 0x0200
+#define RBTXN_IS_COMMITTED 0x0400
+#define RBTXN_IS_ABORTED 0x0800

IIUC, unlike the _SENT_PREPARE flag, the _IS_COMMITTED and _IS_ABORTED
flags are not quite the same as saying rb->commit() or rb->abort() was
called. Instead, those flags are only set some time later by the
ReorderBufferCheckAndTruncateAbortedTXN() function, based on the commit
log status.

The lag between the commit/abort happening and these flags getting set
seems unintuitive. Should they be named differently -- e.g. maybe
RBTXN_IS_CLOG_COMMITTED, RBTXN_IS_CLOG_ABORTED instead?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-30 07:12:22
Message-ID:	CAHut+PuM4vEeb50c0Kaon9FnsHcK4-MgTVW8bgXDfzAdxat6+w@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jan 28, 2025 at 9:26 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Jan 27, 2025 at 7:01 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
...

> > To be honest, I didn't understand the "CLEAR" part of that name. It
> > seems more like it should've been called something like
> > RBTXN_IS_SERIALIZED_ALREADY or RBTXN_IS_SERIALIZED_PREVIOUSLY or
> > whatever instead of something that appears to be saying "has the
> > RBTXN_IS_SERIALIZED bitflag been cleared?" I understand the reluctance
> > to over-comment everything but OTOH currently there is no way really
> > to understand what these flags mean without looking through all the
> > code to try to figure them out from the usage.
> >
> > My recurring gripe about these flags is simply that their meanings and
> > how to use them should be apparent just by looking at reorderbuffer.h
> > and not having to guess anything or look at how they get used in the
> > code. It doesn't matter if that is achieved by better constant names,
> > by more comments or by enhanced macros/functions with asserts but
> > currently just looking at that file still leaves the reader with lots
> > of unanswered questions.
>
> I see your point. IIUC we have comments explaining what the checks on
> each flag mean, but no description of the relationships among the
> flags. I think we can start a new thread to clarify these flags and
> their usage. We can also discuss renaming
> RBTXN_IS_SERIALIZED[_CLEAR] there too.
>

OK.

======

Some comments for patch v17-0002.

======
.../replication/logical/reorderbuffer.c

ReorderBufferSkipPrepare:

1.
+ /* txn must have been marked as a prepared transaction */
+ Assert((txn->txn_flags & RBTXN_IS_PREPARED) != 0);
+
txn->txn_flags |= RBTXN_SKIPPED_PREPARE;

Should this also assert that the _SENT_PREPARE flag is not set,
because we cannot skip the prepare if we have already sent it?
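The extra check suggested in point 1 would look roughly like this in a simplified model of ReorderBufferSkipPrepare(). The Txn struct and the skip_prepare() name are hypothetical, assert() stands in for PostgreSQL's Assert(), and the flag values are those quoted in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as quoted in this thread. */
#define RBTXN_IS_PREPARED     0x0040
#define RBTXN_SKIPPED_PREPARE 0x0080
#define RBTXN_SENT_PREPARE    0x0200

typedef struct Txn
{
    uint32_t txn_flags;
} Txn;

/* Hypothetical model of ReorderBufferSkipPrepare() with both checks. */
static void
skip_prepare(Txn *txn)
{
    /* txn must have been marked as a prepared transaction */
    assert((txn->txn_flags & RBTXN_IS_PREPARED) != 0);
    /* ... and a prepare that was already sent cannot be skipped */
    assert((txn->txn_flags & RBTXN_SENT_PREPARE) == 0);

    txn->txn_flags |= RBTXN_SKIPPED_PREPARE;
}
```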

~~~

ReorderBufferFinishPrepared:

- txn->txn_flags |= RBTXN_PREPARE;
-
/*
- * The prepare info must have been updated in txn even if we skip
- * prepare.
+ * txn must have been marked as a prepared transaction and skipped but
+ * not sent a prepare. Also, the prepare info must have been updated
+ * in txn even if we skip prepare.
*/
+ Assert((txn->txn_flags & (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE)) != 0);
+ Assert((txn->txn_flags & RBTXN_SENT_PREPARE) == 0);
Assert(txn->final_lsn != InvalidXLogRecPtr);

2a.
If it must have been prepared *and* skipped (as the comment says) then
the first assert should be written as:
Assert((txn->txn_flags & (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE))
== (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE));

or, more simply, as two asserts:
Assert(txn->txn_flags & RBTXN_IS_PREPARED);
Assert(txn->txn_flags & RBTXN_SKIPPED_PREPARE);
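The difference between the two assertion forms in 2a can be checked directly. With a multi-bit mask, (flags & mask) != 0 is true when *any* of the bits is set, whereas (flags & mask) == mask requires *all* of them. The helper names below are hypothetical; the flag values are those quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as quoted in this thread. */
#define RBTXN_IS_PREPARED     0x0040
#define RBTXN_SKIPPED_PREPARE 0x0080

/* True if at least one bit of mask is set (the form used in v17-0002). */
static bool
any_set(uint32_t flags, uint32_t mask)
{
    return (flags & mask) != 0;
}

/* True only if every bit of mask is set (what the code comment describes). */
static bool
all_set(uint32_t flags, uint32_t mask)
{
    return (flags & mask) == mask;
}
```

Both forms also need the parentheses around flags & mask, because == and != bind more tightly than & in C.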

2b.
Later in the same function there is this code:

if (is_commit)
rb->commit_prepared(rb, txn, commit_lsn);
else
rb->rollback_prepared(rb, txn, prepare_end_lsn, prepare_time);

So is it OK to do a commit_prepared/rollback_prepared even though no
prepare() has been sent?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-31 00:03:56
Message-ID:	CAD21AoBTLTh3kQDjsoO8y_4o6PndouqB3XiDGGuw6VmfVNwZcw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jan 29, 2025 at 9:32 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Some comments for patch v17-0001.

Thank you for reviewing the patch!

>
> ======
> Commit message.
>
> 1.
> typo /noticeble/noticeable/

Fixed.

>
> ======
> .../replication/logical/reorderbuffer.c
>
> ReorderBufferCheckAndTruncateAbortedTXN:
>
> 2.
> It seemed tricky that the only place that sets the
> RBTXN_IS_COMMITTED flag is the function
> ReorderBufferCheckAndTruncateAbortedTXN, because neither the function
> name nor the function comment gives any indication that it has
> this side effect.

Hmm, it doesn't seem so tricky to me that a function with the name
ReorderBufferCheckAndTruncateAbortedTXN() checks the transaction
status to truncate an aborted transaction and caches the transaction
status as a side effect.

>
> ~~~
>
> ReorderBufferProcessTXN:
>
> 3.
> if (rbtxn_prepared(txn))
> + {
> rb->prepare(rb, txn, commit_lsn);
> + txn->txn_flags |= RBTXN_SENT_PREPARE;
> + }
>
> In ReorderBufferStreamCommit there is an assertion that we are not
> trying to do another prepare() if the _SENT_PREPARE flag is already
> set. Should this code have a similar assert?

We can have a similar assert there but why do you think it's needed there?

>
> ======
> src/include/replication/reorderbuffer.h
>
> 4.
> +#define RBTXN_SENT_PREPARE 0x0200
> +#define RBTXN_IS_COMMITTED 0x0400
> +#define RBTXN_IS_ABORTED 0x0800
>
> IIUC, unlike the _SENT_PREPARE flag, the _IS_COMMITTED and _IS_ABORTED
> flags are not quite the same as saying rb->commit() or rb->abort() was
> called. Instead, those flags are only set some time later by the
> ReorderBufferCheckAndTruncateAbortedTXN() function, based on the commit
> log status.
>
> The lag between the commit/abort happening and these flags getting set
> seems unintuitive. Should they be named differently -- e.g. maybe
> RBTXN_IS_CLOG_COMMITTED, RBTXN_IS_CLOG_ABORTED instead?
>

I'm not sure these names are better.

In the logical decoding context, we neither commit nor roll back
transactions decoded from WAL records, because the transaction outcomes
come only from the WAL records. So I guess it's easy to grasp that
RBTXN_IS_COMMITTED means "this is a committed transaction", not "we
committed the transaction". This is the same reasoning as renaming
RBTXN_PREPARE to RBTXN_IS_PREPARE. Similarly, although we have
rb->commit() and rb->abort(), I would not read them as us committing or
aborting the transaction. So the lag between the ->commit()/abort()
happening and these flags getting set is not confusing (at least to
me). I think we can leave these names as they are, and if we ever need
to remember whether a commit message has been sent, we could add a
flag like RBTXN_SENT_COMMIT.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-01-31 03:06:57
Message-ID:	CAHut+PvYwACsoVSmqb=wowr+tUG5Kn8G0LY1EYjW=QtqFg4JDQ@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Jan 31, 2025 at 11:04 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Jan 29, 2025 at 9:32 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > ======
> > .../replication/logical/reorderbuffer.c
> >
> > ReorderBufferCheckAndTruncateAbortedTXN:
> >
> > 2.
> > It seemed tricky that the only place that sets the
> > RBTXN_IS_COMMITTED flag is the function
> > ReorderBufferCheckAndTruncateAbortedTXN, because neither the function
> > name nor the function comment gives any indication that it has
> > this side effect.
>
> Hmm, it doesn't seem so tricky to me that a function with the name
> ReorderBufferCheckAndTruncateAbortedTXN() checks the transaction
> status to truncate an aborted transaction and caches the transaction
> status as a side effect.
>

I was coming at this from a different perspective, asking myself:
"When can I trust the RBTXN_IS_COMMITTED bit setting?" -- i.e.
rbtxn_is_committed().

AFAICT it turns out we can only have confidence in that result when we
know ReorderBufferCheckAndTruncateAbortedTXN has already been called for
this tx. But that happens only when ReorderBufferCheckMemoryLimit()
gets called. So these bitflags are getting set as a side effect of
calling unrelated functions (e.g. the fact that we can't test whether a
tx was aborted/committed unless ReorderBufferCheckMemoryLimit has been
called seemed unusual to me). I don't know what the solution is; maybe
some more comments would be enough.
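The behaviour described here is a lazily cached status: the committed/aborted bits stay clear until something consults the commit log, so "neither bit set" means the status is unknown (not checked yet, or still in progress when last checked), not that the transaction is running. A simplified model follows. check_and_cache_status(), the clog_lookup_fn type and fake_lookup() are hypothetical stand-ins for ReorderBufferCheckAndTruncateAbortedTXN() and the CLOG lookup it performs; the truncation of aborted changes is omitted, and only the two flag values come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as quoted in this thread. */
#define RBTXN_IS_COMMITTED 0x0400
#define RBTXN_IS_ABORTED   0x0800

typedef enum
{
    XACT_IN_PROGRESS,
    XACT_COMMITTED,
    XACT_ABORTED
} XactStatus;

typedef struct Txn
{
    uint32_t txn_flags;
    uint32_t xid;
} Txn;

/* Stand-in for a commit-log (CLOG) lookup. */
typedef XactStatus (*clog_lookup_fn) (uint32_t xid);

/* Example lookup for illustration: even xids committed, odd ones aborted. */
static int lookups = 0;

static XactStatus
fake_lookup(uint32_t xid)
{
    lookups++;
    return (xid % 2 == 0) ? XACT_COMMITTED : XACT_ABORTED;
}

/*
 * Returns true if the transaction is known to be aborted.  As a side effect,
 * a final status is cached in txn_flags so later calls skip the lookup.
 */
static bool
check_and_cache_status(Txn *txn, clog_lookup_fn lookup)
{
    /* The cached status is never both committed and aborted. */
    assert((txn->txn_flags & (RBTXN_IS_COMMITTED | RBTXN_IS_ABORTED)) !=
           (RBTXN_IS_COMMITTED | RBTXN_IS_ABORTED));

    if (txn->txn_flags & RBTXN_IS_COMMITTED)
        return false;
    if (txn->txn_flags & RBTXN_IS_ABORTED)
        return true;

    switch (lookup(txn->xid))
    {
        case XACT_COMMITTED:
            txn->txn_flags |= RBTXN_IS_COMMITTED;
            return false;
        case XACT_ABORTED:
            txn->txn_flags |= RBTXN_IS_ABORTED;
            return true;
        case XACT_IN_PROGRESS:
            break;
    }
    /* Still in progress: leave both bits clear and check again later. */
    return false;
}
```

Under this model a false result from a committed-flag test says nothing definite until the check has run at least once, which is the point raised above.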

> >
> > ~~~
> >
> > ReorderBufferProcessTXN:
> >
> > 3.
> > if (rbtxn_prepared(txn))
> > + {
> > rb->prepare(rb, txn, commit_lsn);
> > + txn->txn_flags |= RBTXN_SENT_PREPARE;
> > + }
> >
> > In ReorderBufferStreamCommit there is an assertion that we are not
> > trying to do another prepare() if the _SENT_PREPARE flag is already
> > set. Should this code have a similar assert?
>
> We can have a similar assert there but why do you think it's needed there?

No particular reason, other than for consistency to have similar
assertions everywhere that the RBTXN_SENT_PREPARE flag is set.

>
> >
> > ======
> > src/include/replication/reorderbuffer.h
> >
> > 4.
> > +#define RBTXN_SENT_PREPARE 0x0200
> > +#define RBTXN_IS_COMMITTED 0x0400
> > +#define RBTXN_IS_ABORTED 0x0800
> >
> > IIUC, unlike the _SENT_PREPARE flag, the _IS_COMMITTED and _IS_ABORTED
> > flags are not quite the same as saying rb->commit() or rb->abort() was
> > called. Instead, those flags are only set some time later by the
> > ReorderBufferCheckAndTruncateAbortedTXN() function, based on the commit
> > log status.
> >
> > The lag between the commit/abort happening and these flags getting set
> > seems unintuitive. Should they be named differently -- e.g. maybe
> > RBTXN_IS_CLOG_COMMITTED, RBTXN_IS_CLOG_ABORTED instead?
> >
>
> I'm not sure these names are better.
>
> In the logical decoding context, we neither commit nor roll back
> transactions decoded from WAL records, because the transaction outcomes
> come only from the WAL records. So I guess it's easy to grasp that
> RBTXN_IS_COMMITTED means "this is a committed transaction", not "we
> committed the transaction". This is the same reasoning as renaming
> RBTXN_PREPARE to RBTXN_IS_PREPARE. Similarly, although we have
> rb->commit() and rb->abort(), I would not read them as us committing or
> aborting the transaction. So the lag between the ->commit()/abort()
> happening and these flags getting set is not confusing (at least to
> me). I think we can leave these names as they are, and if we ever need
> to remember whether a commit message has been sent, we could add a
> flag like RBTXN_SENT_COMMIT.
>

OK.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-02-03 18:41:03
Message-ID:	CAD21AoDmYZtLnPLuiERT6Cibv1Gf1DwDjzBevtqKYn0ZzMQqBQ@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jan 29, 2025 at 11:12 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Tue, Jan 28, 2025 at 9:26 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Jan 27, 2025 at 7:01 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> ...
>
> > > To be honest, I didn't understand the "CLEAR" part of that name. It
> > > seems more like it should've been called something like
> > > RBTXN_IS_SERIALIZED_ALREADY or RBTXN_IS_SERIALIZED_PREVIOUSLY or
> > > whatever instead of something that appears to be saying "has the
> > > RBTXN_IS_SERIALIZED bitflag been cleared?" I understand the reluctance
> > > to over-comment everything but OTOH currently there is no way really
> > > to understand what these flags mean without looking through all the
> > > code to try to figure them out from the usage.
> > >
> > > My recurring gripe about these flags is simply that their meanings and
> > > how to use them should be apparent just by looking at reorderbuffer.h
> > > and not having to guess anything or look at how they get used in the
> > > code. It doesn't matter if that is achieved by better constant names,
> > > by more comments or by enhanced macros/functions with asserts but
> > > currently just looking at that file still leaves the reader with lots
> > > of unanswered questions.
> >
> > I see your point. IIUC we have comments about what the checks with
> > the flags mean, but no description of the relationship
> > among the flags. I think we can start a new thread for clarifying
> > these flags and their usage. We can also discuss renaming
> > RBTXN_IS_SERIALIZED[_CLEAR] there too.
> >
>
> OK.
>
> ======
>
> Some comments for patch v17-0002.

Thank you for reviewing the patch.

>
> ======
> .../replication/logical/reorderbuffer.c
>
> ReorderBufferSkipPrepare:
>
> 1.
> + /* txn must have been marked as a prepared transaction */
> + Assert((txn->txn_flags & RBTXN_IS_PREPARED) != 0);
> +
> txn->txn_flags |= RBTXN_SKIPPED_PREPARE;
>
> Should this also be asserting that the _SENT_PREPARE flag is false,
> because we cannot be skipping it if we already sent the prepare.
>
> ~~~
>
> ReorderBufferFinishPrepared:
>
> 2.
>
> - txn->txn_flags |= RBTXN_PREPARE;
> -
> /*
> - * The prepare info must have been updated in txn even if we skip
> - * prepare.
> + * txn must have been marked as a prepared transaction and skipped but
> + * not sent a prepare. Also, the prepare info must have been updated
> + * in txn even if we skip prepare.
> */
> + Assert((txn->txn_flags & (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE)) != 0);
> + Assert((txn->txn_flags & RBTXN_SENT_PREPARE) == 0);
> Assert(txn->final_lsn != InvalidXLogRecPtr);
>
> 2a.
> If it must have been prepared *and* skipped (as the comment says) then
> the first assert should be written as:
> Assert((txn->txn_flags & (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE))
> == (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE));
>
> or easier to just have 2 asserts:
> Assert(txn->txn_flags & RBTXN_IS_PREPARED);
> Assert(txn->txn_flags & RBTXN_SKIPPED_PREPARE);
>

Agreed with all the above comments. Since checking the
prepared-transaction-related flags is getting complicated, I've
introduced RBTXN_PREPARE_STATUS_FLAGS so that we can check the desired
prepared transaction status easily.
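To make review point 2a concrete, here is a minimal C sketch of why `(flags & (A | B)) != 0` accepts a transaction with either bit set, whereas masking with a status mask and comparing with `==` pins down one exact state. Only RBTXN_SENT_PREPARE's value (0x0200) comes from the quoted patch; the other flag values and both helper functions are hypothetical, for illustration only, and not the committed code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values; only RBTXN_SENT_PREPARE matches the quoted patch. */
#define RBTXN_IS_PREPARED      0x0040
#define RBTXN_SKIPPED_PREPARE  0x0080
#define RBTXN_SENT_PREPARE     0x0200

/* Every prepare-related bit, so a state can be matched exactly. */
#define RBTXN_PREPARE_STATUS_FLAGS \
	(RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE | RBTXN_SENT_PREPARE)

/* The v17 form: true if EITHER bit is set, which is too weak. */
static bool
either_prepared_or_skipped(uint32_t flags)
{
	return (flags & (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE)) != 0;
}

/* Exact state: prepared and skipped, and the prepare was NOT sent. */
static bool
prepared_and_skipped_not_sent(uint32_t flags)
{
	return (flags & RBTXN_PREPARE_STATUS_FLAGS) ==
		(RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE);
}
```

Bits outside the mask (for example an unrelated 0x0001 flag) do not affect the exact-state check, because they are masked off before the comparison.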

> ~
>
> 2b.
> later in the same function there is code:
>
> if (is_commit)
>     rb->commit_prepared(rb, txn, commit_lsn);
> else
>     rb->rollback_prepared(rb, txn, prepare_end_lsn, prepare_time);
>
> So it is OK to do a commit_prepared/rollback_prepared even though no
> prepare() has been sent?

IIUC ReorderBufferReplay() is responsible for sending a prepare
message in this case. See the comment around there:

/*
* By this time the txn has the prepare record information and it is
* important to use that so that downstream gets the accurate
* information. If instead, we have passed commit information here
* then downstream can behave as it has already replayed commit
* prepared after the restart.
*/
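A rough C sketch of the ordering described above for the commit path: if the prepare was skipped earlier, the transaction is replayed (which emits the prepare) before commit_prepared goes out. Everything here is hypothetical. The callbacks only append to a log, the flag values are illustrative, and gating the replay on RBTXN_SKIPPED_PREPARE is a simplification; the actual condition in ReorderBufferFinishPrepared() may differ. The rollback path is not modeled.

```c
#include <assert.h>
#include <string.h>

/* Illustrative values (assumptions, except RBTXN_SENT_PREPARE). */
#define RBTXN_IS_PREPARED      0x0040
#define RBTXN_SKIPPED_PREPARE  0x0080
#define RBTXN_SENT_PREPARE     0x0200

typedef struct SketchTXN
{
	unsigned	txn_flags;
} SketchTXN;

/* Records the order in which output-plugin callbacks fire. */
static char call_log[64];

static void
log_callback(const char *name)
{
	strcat(call_log, name);
	strcat(call_log, ";");
}

/* Hypothetical stand-in for ReorderBufferReplay(): emits the prepare. */
static void
sketch_replay(SketchTXN *txn)
{
	log_callback("prepare");
	txn->txn_flags |= RBTXN_SENT_PREPARE;
}

/* Commit path only: make sure downstream sees prepare before commit. */
static void
sketch_commit_prepared(SketchTXN *txn)
{
	if ((txn->txn_flags & RBTXN_SKIPPED_PREPARE) &&
		!(txn->txn_flags & RBTXN_SENT_PREPARE))
		sketch_replay(txn);

	log_callback("commit_prepared");
}
```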

I've attached the updated patches.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v18-0001-Skip-logical-decoding-of-already-aborted-transac.patch	application/octet-stream	21.7 KB
v18-0002-Rename-RBTXN_PREPARE-to-RBTXN_IS_PREPARE-for-bet.patch	application/octet-stream	9.5 KB

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-02-03 18:55:52
Message-ID:	CAD21AoAnDFAvSF-uGSvJXn+u6fbZO7GNvXqb4NcODqTXEAR_bw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jan 30, 2025 at 7:07 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Fri, Jan 31, 2025 at 11:04 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 29, 2025 at 9:32 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > > ======
> > > .../replication/logical/reorderbuffer.c
> > >
> > > ReorderBufferCheckAndTruncateAbortedTXN:
> > >
> > > 2.
> > > It seemed tricky that the only place that sets the
> > > RBTXN_IS_COMMITTED flag is the function
> > > ReorderBufferCheckAndTruncateAbortedTXN, because neither the function
> > > name nor the function comment gives any indication that it should
> > > have this side effect.
> >
> > Hmm, it doesn't seem so tricky to me that a function with the name
> > ReorderBufferCheckAndTruncateAbortedTXN() checks the transaction
> > status to truncate an aborted transaction and caches the transaction
> > status as a side effect.
> >
>
> I was coming at this from a different perspective, asking myself the
> question "When can I know the RBTXN_IS_COMMITTED bit setting?" -- aka
> rbtxn_is_committed()?
>
> AFAICT it turns out we can only have confidence in that result when we
> know ReorderBufferCheckAndTruncateAbortedTXN has already been called for
> this tx. But this happens only when ReorderBufferCheckMemoryLimit()
> gets called. So, these bitflags are getting set as a side effect of
> calling unrelated functions (e.g. the fact that we can't test whether a
> tx was aborted/committed unless ReorderBufferCheckMemoryLimit is called
> seemed unusual to me).

I'm not sure ReorderBufferCheckMemoryLimit() is an unrelated
function, because the whole idea (also mentioned in the commit message)
is that we check the transaction status only for large transactions to
avoid CLOG lookup overheads. TBH I'm not sure why readers would expect
these transaction status flags to always be set. Also, in the function
comment we have:

* the transaction is aborted. The transaction status is cached in
* txn->txn_flags so we can skip future changes and avoid CLOG lookups on the
* next call.

which describes the side-effect of the function that it caches the
transaction status.
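A minimal C sketch of the caching side effect that comment describes: the first call probes the (simulated) commit log, and once an outcome is known it is cached in txn_flags so later calls skip the lookup. Only the two flag values come from the quoted patch. The struct, `sketch_clog_lookup()`, and `sketch_check_aborted()` are hypothetical stand-ins, and truncating the aborted transaction's changes is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RBTXN_IS_COMMITTED 0x0400	/* values from the quoted patch */
#define RBTXN_IS_ABORTED   0x0800

typedef enum
{
	XACT_IN_PROGRESS,
	XACT_COMMITTED,
	XACT_ABORTED
} SketchXactStatus;

typedef struct SketchTXN
{
	uint32_t	txn_flags;
	SketchXactStatus clog_status;	/* what a CLOG lookup would report */
} SketchTXN;

static int	clog_lookups = 0;	/* counts simulated CLOG probes */

/* Hypothetical CLOG probe; the real code consults the commit log. */
static SketchXactStatus
sketch_clog_lookup(const SketchTXN *txn)
{
	clog_lookups++;
	return txn->clog_status;
}

/* Returns true if the transaction is known to be aborted. */
static bool
sketch_check_aborted(SketchTXN *txn)
{
	/* Quick return if the outcome is already cached. */
	if (txn->txn_flags & RBTXN_IS_ABORTED)
		return true;
	if (txn->txn_flags & RBTXN_IS_COMMITTED)
		return false;

	switch (sketch_clog_lookup(txn))
	{
		case XACT_ABORTED:
			txn->txn_flags |= RBTXN_IS_ABORTED;
			return true;
		case XACT_COMMITTED:
			txn->txn_flags |= RBTXN_IS_COMMITTED;
			return false;
		case XACT_IN_PROGRESS:
			break;
	}
	/* Still running: nothing to cache, so the next call probes again. */
	return false;
}
```

In this sketch an in-progress transaction is probed on every call, while a committed or aborted one is probed at most once. That is the point of caching the outcome in txn_flags.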

> I don't know what the solution is; maybe some
> more comments would be enough.

I'm not sure how we can improve the comment TBH.

>
> > >
> > > ~~~
> > >
> > > ReorderBufferProcessTXN:
> > >
> > > 3.
> > > if (rbtxn_prepared(txn))
> > > + {
> > > rb->prepare(rb, txn, commit_lsn);
> > > + txn->txn_flags |= RBTXN_SENT_PREPARE;
> > > + }
> > >
> > > In ReorderBufferStreamCommit there is an assertion that we are not
> > > trying to do another prepare() if the _SENT_PREPARE flag is already
> > > set. Should this code have a similar assert?
> >
> > We can add a similar assert there, but why do you think it's needed there?
>
> No particular reason, other than for consistency to have similar
> assertions everywhere that the RBTXN_SENT_PREPARE flag is set.

Okay, addressed in the v18 patch I've just sent[1].
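The assertion pattern from point 3 can be sketched as follows: check that the prepare has not already been sent, invoke the callback, then record RBTXN_SENT_PREPARE. This is a hypothetical illustration. Standard `assert()` stands in for PostgreSQL's `Assert()`, `sketch_prepare_cb()` stands in for `rb->prepare()`, and the RBTXN_IS_PREPARED value is assumed.

```c
#include <assert.h>

#define RBTXN_IS_PREPARED  0x0040	/* illustrative value (assumption) */
#define RBTXN_SENT_PREPARE 0x0200	/* value from the quoted patch */

typedef struct SketchTXN
{
	unsigned	txn_flags;
	int			prepares_sent;	/* how many times the callback ran */
} SketchTXN;

/* Hypothetical stand-in for the rb->prepare() output-plugin callback. */
static void
sketch_prepare_cb(SketchTXN *txn)
{
	txn->prepares_sent++;
}

/* Send the prepare exactly once and remember that it was sent. */
static void
sketch_send_prepare(SketchTXN *txn)
{
	assert(txn->txn_flags & RBTXN_IS_PREPARED);
	assert((txn->txn_flags & RBTXN_SENT_PREPARE) == 0);

	sketch_prepare_cb(txn);
	txn->txn_flags |= RBTXN_SENT_PREPARE;
}
```

A second call on the same transaction would trip the second assertion in an assert-enabled build, which is the double-prepare the review comment wants to catch.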

Regards,

[1] /message-id/CAD21AoDmYZtLnPLuiERT6Cibv1Gf1DwDjzBevtqKYn0ZzMQqBQ%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-02-11 22:13:24
Message-ID:	CAD21AoDcDB79nhGfRPmSUZv5OVSTwQY8Z3K4DoG8zRDwvSzyaw@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Feb 3, 2025 at 10:41 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Jan 29, 2025 at 11:12 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > On Tue, Jan 28, 2025 at 9:26 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Jan 27, 2025 at 7:01 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > >
> > ...
> >
> > > > To be honest, I didn't understand the "CLEAR" part of that name. It
> > > > seems more like it should've been called something like
> > > > RBTXN_IS_SERIALIZED_ALREADY or RBTXN_IS_SERIALIZED_PREVIOUSLY or
> > > > whatever instead of something that appears to be saying "has the
> > > > RBTXN_IS_SERIALIZED bitflag been cleared?" I understand the reluctance
> > > > to over-comment everything but OTOH currently there is no way really
> > > > to understand what these flags mean without looking through all the
> > > > code to try to figure them out from the usage.
> > > >
> > > > My recurring gripe about these flags is simply that their meanings and
> > > > how to use them should be apparent just by looking at reorderbuffer.h
> > > > and not having to guess anything or look at how they get used in the
> > > > code. It doesn't matter if that is achieved by better constant names,
> > > > by more comments or by enhanced macros/functions with asserts but
> > > > currently just looking at that file still leaves the reader with lots
> > > > of unanswered questions.
> > >
> > > I see your point. IIUC we have comments about what the checks with
> > > the flags mean, but no description of the relationship
> > > among the flags. I think we can start a new thread for clarifying
> > > these flags and their usage. We can also discuss renaming
> > > RBTXN_IS_SERIALIZED[_CLEAR] there too.
> > >
> >
> > OK.
> >
> > ======
> >
> > Some comments for patch v17-0002.
>
> Thank you for reviewing the patch.
>
> >
> > ======
> > .../replication/logical/reorderbuffer.c
> >
> > ReorderBufferSkipPrepare:
> >
> > 1.
> > + /* txn must have been marked as a prepared transaction */
> > + Assert((txn->txn_flags & RBTXN_IS_PREPARED) != 0);
> > +
> > txn->txn_flags |= RBTXN_SKIPPED_PREPARE;
> >
> > Should this also be asserting that the _SENT_PREPARE flag is false,
> > because we cannot be skipping it if we already sent the prepare.
> >
> > ~~~
> >
> > ReorderBufferFinishPrepared:
> >
> > 2.
> >
> > - txn->txn_flags |= RBTXN_PREPARE;
> > -
> > /*
> > - * The prepare info must have been updated in txn even if we skip
> > - * prepare.
> > + * txn must have been marked as a prepared transaction and skipped but
> > + * not sent a prepare. Also, the prepare info must have been updated
> > + * in txn even if we skip prepare.
> > */
> > + Assert((txn->txn_flags & (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE)) != 0);
> > + Assert((txn->txn_flags & RBTXN_SENT_PREPARE) == 0);
> > Assert(txn->final_lsn != InvalidXLogRecPtr);
> >
> > 2a.
> > If it must have been prepared *and* skipped (as the comment says) then
> > the first assert should be written as:
> > Assert((txn->txn_flags & (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE))
> > == (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE));
> >
> > or easier to just have 2 asserts:
> > Assert(txn->txn_flags & RBTXN_IS_PREPARED);
> > Assert(txn->txn_flags & RBTXN_SKIPPED_PREPARE);
> >
>
> Agreed with all the above comments. Since checking the
> prepared-transaction-related flags is getting complicated, I've
> introduced RBTXN_PREPARE_STATUS_FLAGS so that we can check the desired
> prepared transaction status easily.
>
> > ~
> >
> > 2b.
> > later in the same function there is code:
> >
> > if (is_commit)
> >     rb->commit_prepared(rb, txn, commit_lsn);
> > else
> >     rb->rollback_prepared(rb, txn, prepare_end_lsn, prepare_time);
> >
> > So it is OK to do a commit_prepared/rollback_prepared even though no
> > prepare() has been sent?
>
> IIUC ReorderBufferReplay() is responsible for sending a prepare
> message in this case. See the comment around there:
>
> /*
> * By this time the txn has the prepare record information and it is
> * important to use that so that downstream gets the accurate
> * information. If instead, we have passed commit information here
> * then downstream can behave as it has already replayed commit
> * prepared after the restart.
> */
>
> I've attached the updated patches.
>

If there are no further comments on the v18 patches, I'm going to push
them tomorrow.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-02-12 05:43:12
Message-ID:	CAHut+Pu8Eqc35OmK_uaC=NFtqOn=bVhrYvVrcJ_ewBkcZx77+g@mail.gmail.com
Lists:	pgsql-hackers

Hi. Here are some minor comments for the v18* patch set.

//////////

Patch v18-0001

1.1. Commit message

A previously reported typo still exists:

/noticeble/noticeable/

//////////

Patch v18-0002

2.1
+#define RBTXN_PREPARE_STATUS_FLAGS (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE | RBTXN_SENT_PREPARE)
+

AFAICT bitmasks like this are more commonly named with a _MASK suffix.

How about something like:
- RBTXN_PREPARE_MASK
- RBTXN_PREPARE_STATUS_MASK
- RBTXN_PREPARE_FLAGS_MASK

==========
Kind Regards,
Peter Smith.
Fujitsu Australia

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-02-13 01:58:12
Message-ID:	CAD21AoAFTS65BTfyawtiQ2hhV-iMSBpToUaCd3Ja6WUwN5PNKg@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Feb 11, 2025 at 9:43 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi. Here are some minor comments for the v18* patch set.
>
> //////////
>
> Patch v18-0001
>
> 1.1. Commit message
>
> A previously reported typo still exists:
>
> /noticeble/noticeable/
>
> //////////
>
> Patch v18-0002
>
> 2.1
> +#define RBTXN_PREPARE_STATUS_FLAGS (RBTXN_IS_PREPARED | RBTXN_SKIPPED_PREPARE | RBTXN_SENT_PREPARE)
> +
>
> AFAICT bitmasks like this are more commonly named with a _MASK suffix.
>
> How about something like:
> - RBTXN_PREPARE_MASK
> - RBTXN_PREPARE_STATUS_MASK
> - RBTXN_PREPARE_FLAGS_MASK
>

Pushed both patches after addressing the above comments. Thank you for
your review!

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	"Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject:	RE: Skip collecting decoded changes of already-aborted transactions
Date:	2025-03-14 05:04:20
Message-ID:	OSCPR01MB149667A7AC89C6FEA627A02FAF5D22@OSCPR01MB14966.jpnprd01.prod.outlook.com
Lists:	pgsql-hackers

Dear hackers,

I hope I'm in the correct thread. In abfb296, rbtxn_skip_prepared() was removed from
SnapBuildDistributeNewCatalogSnapshot(). ISTM that was the only caller of the function.

Is it kept intentionally for external projects, or can it be removed as in the attached patch?

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment	Content-Type	Size
remove_func.diffs	application/octet-stream	551 bytes

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	"Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject:	Re: Skip collecting decoded changes of already-aborted transactions
Date:	2025-03-17 17:02:17
Message-ID:	CAD21AoBcsKicXiOwkwPg2_UK3E81Rz6QFjM2eaOaVcURonGfVw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Mar 13, 2025 at 10:04 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear hackers,
>
> I hope I'm in the correct thread. In abfb296, rbtxn_skip_prepared() was removed from
> SnapBuildDistributeNewCatalogSnapshot(). ISTM that was the only caller of the function.
>
> Is it kept intentionally for external projects, or can it be removed as in the attached patch?

I think we can keep it as all RBTXN_xxx flags have the corresponding
macro and the comments of these macros somewhat help understand what
the flag indicates.
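For illustration, a sketch of the one-accessor-macro-per-flag convention described above, where each macro's name documents what its bit means. `rbtxn_skip_prepared()` is the macro named in this thread. The other macro names, the struct, and all flag values except RBTXN_SENT_PREPARE (0x0200, from the quoted patch) are assumptions, not copies of reorderbuffer.h.

```c
#include <assert.h>
#include <stdbool.h>

#define RBTXN_IS_PREPARED      0x0040	/* illustrative values (assumptions) */
#define RBTXN_SKIPPED_PREPARE  0x0080
#define RBTXN_SENT_PREPARE     0x0200

typedef struct SketchTXN
{
	unsigned	txn_flags;
} SketchTXN;

/* One accessor per flag; callers never test the raw bits directly. */
#define rbtxn_is_prepared(txn) \
	(((txn)->txn_flags & RBTXN_IS_PREPARED) != 0)
#define rbtxn_skip_prepared(txn) \
	(((txn)->txn_flags & RBTXN_SKIPPED_PREPARE) != 0)
#define rbtxn_sent_prepare(txn) \
	(((txn)->txn_flags & RBTXN_SENT_PREPARE) != 0)
```

Keeping an accessor even when it has no in-tree caller keeps the set of macros parallel to the set of flags, which is the argument for not removing rbtxn_skip_prepared().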

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com