Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

Lists: pgsql-hackers
From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2021-12-23 15:05:54
Message-ID: CALj2ACWt6kqriAHrO+AJj+OmP=suwbktHT5JoYAn-nqZe2gd2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

It is useful (for debugging purposes) if the checkpoint end message
has the checkpoint LSN and REDO LSN [1]. It gives more context while
analyzing checkpoint-related issues. The pg_controldata gives the last
checkpoint LSN and REDO LSN, but having this info alongside the log
message helps analyze issues that happened previously, connect the
dots and identify the root cause.

Attaching a small patch herewith. Thoughts?

[1]
2021-12-23 14:58:54.694 UTC [1965649] LOG: checkpoint starting:
shutdown immediate
2021-12-23 14:58:54.714 UTC [1965649] LOG: checkpoint complete: wrote
0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.001 s, sync=0.001 s, total=0.025 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB;
LSN=0/14D0AD0, REDO LSN=0/14D0AD0

Regards,
Bharath Rupireddy.

Attachment Content-Type Size
v1-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch application/octet-stream 2.8 KB

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bharath(dot)rupireddyforpostgres(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2021-12-24 05:51:34
Message-ID: 20211224.145134.1164706994861031206.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Thu, 23 Dec 2021 20:35:54 +0530, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote in
> Hi,
>
> It is useful (for debugging purposes) if the checkpoint end message
> has the checkpoint LSN and REDO LSN [1]. It gives more context while
> analyzing checkpoint-related issues. The pg_controldata gives the last
> checkpoint LSN and REDO LSN, but having this info alongside the log
> message helps analyze issues that happened previously, connect the
> dots and identify the root cause.
>
> Attaching a small patch herewith. Thoughts?

A big +1 from me. I thought about proposing the same in the past.

> [1]
> 2021-12-23 14:58:54.694 UTC [1965649] LOG: checkpoint starting:
> shutdown immediate
> 2021-12-23 14:58:54.714 UTC [1965649] LOG: checkpoint complete: wrote
> 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
> write=0.001 s, sync=0.001 s, total=0.025 s; sync files=0,
> longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB;
> LSN=0/14D0AD0, REDO LSN=0/14D0AD0

I thougt about something like the following, but your proposal may be
clearer.

> WAL range=[0/14D0340, 0/14D0AD0]

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2021-12-24 12:07:51
Message-ID: CALj2ACWpaOnZrg5FT+bm19WAqBgvgj_coAMikrSMEpQofThjRA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 24, 2021 at 11:21 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Thu, 23 Dec 2021 20:35:54 +0530, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote in
> > Hi,
> >
> > It is useful (for debugging purposes) if the checkpoint end message
> > has the checkpoint LSN and REDO LSN [1]. It gives more context while
> > analyzing checkpoint-related issues. The pg_controldata gives the last
> > checkpoint LSN and REDO LSN, but having this info alongside the log
> > message helps analyze issues that happened previously, connect the
> > dots and identify the root cause.
> >
> > Attaching a small patch herewith. Thoughts?
>
> A big +1 from me. I thought about proposing the same in the past.

Thanks for the review. I've added a CF entry to not lose track -
https://commitfest.postgresql.org/36/3474/.

> > [1]
> > 2021-12-23 14:58:54.694 UTC [1965649] LOG: checkpoint starting:
> > shutdown immediate
> > 2021-12-23 14:58:54.714 UTC [1965649] LOG: checkpoint complete: wrote
> > 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
> > write=0.001 s, sync=0.001 s, total=0.025 s; sync files=0,
> > longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB;
> > LSN=0/14D0AD0, REDO LSN=0/14D0AD0
>
> I thougt about something like the following, but your proposal may be
> clearer.
>
> > WAL range=[0/14D0340, 0/14D0AD0]

Yeah the proposed in the v1 is clear saying checkpoint/restartpoint
LSN and REDO LSN.

Regards,
Bharath Rupireddy.


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2021-12-24 12:11:59
Message-ID: YcW5D2GjWlIzbbKs@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 24, 2021 at 02:51:34PM +0900, Kyotaro Horiguchi wrote:
> I thougt about something like the following, but your proposal may be
> clearer.

+ "LSN=%X/%X, REDO LSN=%X/%X",
This could be rather confusing for the average user, even if I agree
that this is some information that only an advanced user could
understand. Could it be possible to define those fields in a more
deterministic way? For one, it is hard to understand the relationship
between both fields without looking at the code, particulary if both
share the same value. It is at least rather.. Well, mostly, easy to
guess what each other field means in this context, which is not the
case of what you are proposing here. One idea could be use also
"start point" for REDO, for example.
--
Michael


From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2021-12-24 12:24:02
Message-ID: CALj2ACXzv6FJNbTnP9uYsPXFeA4UcugSWDecxr19SjFCUCxP-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 24, 2021 at 5:42 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Fri, Dec 24, 2021 at 02:51:34PM +0900, Kyotaro Horiguchi wrote:
> > I thougt about something like the following, but your proposal may be
> > clearer.
>
> + "LSN=%X/%X, REDO LSN=%X/%X",
> This could be rather confusing for the average user, even if I agree
> that this is some information that only an advanced user could
> understand. Could it be possible to define those fields in a more
> deterministic way? For one, it is hard to understand the relationship
> between both fields without looking at the code, particulary if both
> share the same value. It is at least rather.. Well, mostly, easy to
> guess what each other field means in this context, which is not the
> case of what you are proposing here. One idea could be use also
> "start point" for REDO, for example.

How about "location=%X/%X, REDO start location=%X/%X"? The entire log
message can look like below:

2021-12-24 12:20:19.140 UTC [1977834] LOG: checkpoint
complete:location=%X/%X, REDO start location=%X/%X; wrote 7 buffers
(0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s,
sync=0.007 s, total=0.192 s; sync files=5, longest=0.005 s,
average=0.002 s; distance=293 kB, estimate=56584 kB

Another variant:
2021-12-24 12:20:19.140 UTC [1977834] LOG: checkpoint completed at
location=%X/%X with REDO start location=%X/%X: wrote 7 buffers (0.0%);
0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s, sync=0.007
s, total=0.192 s; sync files=5, longest=0.005 s, average=0.002 s;
distance=293 kB, estimate=56584 kB
2021-12-24 12:20:19.140 UTC [1977834] LOG: restartpoint completed at
location=%X/%X with REDO start location=%X/%X: wrote 7 buffers (0.0%);
0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s, sync=0.007
s, total=0.192 s; sync files=5, longest=0.005 s, average=0.002 s;
distance=293 kB, estimate=56584 kB

Thoughts?

Regards,
Bharath Rupireddy.


From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2021-12-28 02:56:28
Message-ID: CALj2ACUP84UYSimewLSs8--rT9dUTA7CUYTgXTnEb-2iHkc9wA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 커뮤니티SQL

On Fri, Dec 24, 2021 at 5:54 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Fri, Dec 24, 2021 at 5:42 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >
> > On Fri, Dec 24, 2021 at 02:51:34PM +0900, Kyotaro Horiguchi wrote:
> > > I thougt about something like the following, but your proposal may be
> > > clearer.
> >
> > + "LSN=%X/%X, REDO LSN=%X/%X",
> > This could be rather confusing for the average user, even if I agree
> > that this is some information that only an advanced user could
> > understand. Could it be possible to define those fields in a more
> > deterministic way? For one, it is hard to understand the relationship
> > between both fields without looking at the code, particulary if both
> > share the same value. It is at least rather.. Well, mostly, easy to
> > guess what each other field means in this context, which is not the
> > case of what you are proposing here. One idea could be use also
> > "start point" for REDO, for example.
>
> How about "location=%X/%X, REDO start location=%X/%X"? The entire log
> message can look like below:
>
> 2021-12-24 12:20:19.140 UTC [1977834] LOG: checkpoint
> complete:location=%X/%X, REDO start location=%X/%X; wrote 7 buffers
> (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s,
> sync=0.007 s, total=0.192 s; sync files=5, longest=0.005 s,
> average=0.002 s; distance=293 kB, estimate=56584 kB
>
> Another variant:
> 2021-12-24 12:20:19.140 UTC [1977834] LOG: checkpoint completed at
> location=%X/%X with REDO start location=%X/%X: wrote 7 buffers (0.0%);
> 0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s, sync=0.007
> s, total=0.192 s; sync files=5, longest=0.005 s, average=0.002 s;
> distance=293 kB, estimate=56584 kB
> 2021-12-24 12:20:19.140 UTC [1977834] LOG: restartpoint completed at
> location=%X/%X with REDO start location=%X/%X: wrote 7 buffers (0.0%);
> 0 WAL file(s) added, 0 removed, 0 recycled; write=0.005 s, sync=0.007
> s, total=0.192 s; sync files=5, longest=0.005 s, average=0.002 s;
> distance=293 kB, estimate=56584 kB

Here are the 2 patches.

one(v2-1-XXX.patch) adding the info as:
2021-12-28 02:44:34.870 UTC [2384386] LOG: checkpoint complete:
location=0/1B03040, REDO start location=0/1B03008; wrote 466 buffers
(2.8%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.014 s,
sync=0.038 s, total=0.072 s; sync files=21, longest=0.022 s,
average=0.002 s; distance=6346 kB, estimate=6346 kB

another(v2-2-XXX.patch) adding the info as:
2021-12-28 02:52:24.464 UTC [2394396] LOG: checkpoint completed at
location=0/212FFC8 with REDO start location=0/212FF90: wrote 451
buffers (2.8%); 0 WAL file(s) added, 0 removed, 1 recycled;
write=0.012 s, sync=0.032 s, total=0.071 s; sync files=6,
longest=0.022 s, average=0.006 s; distance=6272 kB, estimate=6272 kB

attaching v1-0001-XXX from the initial mail again just for the sake of
completion:
2021-12-23 14:58:54.714 UTC [1965649] LOG: checkpoint complete: wrote
0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.001 s, sync=0.001 s, total=0.025 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB;
LSN=0/14D0AD0, REDO LSN=0/14D0AD0

Thoughts?

Regards,
Bharath Rupireddy.

Attachment Content-Type Size
v2-1-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch application/octet-stream 2.5 KB
v2-2-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch application/octet-stream 2.5 KB
v1-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch application/octet-stream 2.8 KB

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bharath(dot)rupireddyforpostgres(at)gmail(dot)com
Cc: michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-01-04 01:49:38
Message-ID: 20220104.104938.1202548681429146328.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Tue, 28 Dec 2021 08:26:28 +0530, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote in
> On Fri, Dec 24, 2021 at 5:54 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > On Fri, Dec 24, 2021 at 5:42 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> > >
> > > On Fri, Dec 24, 2021 at 02:51:34PM +0900, Kyotaro Horiguchi wrote:
> > > > I thougt about something like the following, but your proposal may be
> > > > clearer.
> > >
> > > + "LSN=%X/%X, REDO LSN=%X/%X",
> > > This could be rather confusing for the average user, even if I agree
> > > that this is some information that only an advanced user could
> > > understand. Could it be possible to define those fields in a more
> > > deterministic way? For one, it is hard to understand the relationship
> > > between both fields without looking at the code, particulary if both
> > > share the same value. It is at least rather.. Well, mostly, easy to
> > > guess what each other field means in this context, which is not the
> > > case of what you are proposing here. One idea could be use also
> > > "start point" for REDO, for example.
> >
> > How about "location=%X/%X, REDO start location=%X/%X"? The entire log
> > message can look like below:
..
> Thoughts?

It seems to me "LSN" or just "location" is more confusing or
mysterious than "REDO LSN" for the average user. If we want to avoid
being technically too detailed, we would use just "start LSN=%X/%X,
end LSN=%X/%X". And it is equivalent to "WAL range=[%X/%X, %X/%X]"..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-01-12 06:09:21
Message-ID: CAOBaU_aefLN5Gze52os5JzkNR_Fc+bJHPk=m67zov33-66MfZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On Tue, Dec 28, 2021 at 10:56 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> attaching v1-0001-XXX from the initial mail again just for the sake of
> completion:

Unfortunately this breaks the cfbot as it tries to apply this patch
too: http://cfbot.cputube.org/patch_36_3474.log.

For this kind of situation I think that the usual solution is to use a
.txt extension to make sure that the cfbot won't try to apply it.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: cfbot wrangling (was Re: Add checkpoint and redo LSN to LogCheckpointEnd log message)
Date: 2022-01-12 06:19:22
Message-ID: 2949510.1641968362@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Julien Rouhaud <rjuju123(at)gmail(dot)com> writes:
> For this kind of situation I think that the usual solution is to use a
> .txt extension to make sure that the cfbot won't try to apply it.

Yeah ... this has come up before. Is there a documented way to
attach files that the cfbot will ignore?

Two specific scenarios seem to be interesting:

1. You are attaching patch(es) plus some non-patch files

2. You are attaching some random files, and would like to not
displace the cfbot's idea of the latest patchset.

I don't know exactly how to do either of those.

regards, tom lane


From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: cfbot wrangling (was Re: Add checkpoint and redo LSN to LogCheckpointEnd log message)
Date: 2022-01-12 06:31:53
Message-ID: Yd512a0TNiE/5fAb@jrouhaud
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 12, 2022 at 01:19:22AM -0500, Tom Lane wrote:
> Julien Rouhaud <rjuju123(at)gmail(dot)com> writes:
> > For this kind of situation I think that the usual solution is to use a
> > .txt extension to make sure that the cfbot won't try to apply it.
>
> Yeah ... this has come up before. Is there a documented way to
> attach files that the cfbot will ignore?

Not that I know of unfortunately. I think the apply part is done by
https://github.com/macdice/cfbot/blob/master/cfbot_patchburner_chroot_ctl.sh#L93-L120,
which seems reasonable. So basically any extension outside of those could be
used, excluding of course anything clearly suspicious that would be rejected by
many email providers.

> Two specific scenarios seem to be interesting:
>
> 1. You are attaching patch(es) plus some non-patch files
>
> 2. You are attaching some random files, and would like to not
> displace the cfbot's idea of the latest patchset.

I'm assuming that someone wanting to send an additional patch to be applied on
top of the OP patchset is part of 2?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: cfbot wrangling (was Re: Add checkpoint and redo LSN to LogCheckpointEnd log message)
Date: 2022-01-12 06:37:05
Message-ID: 2951336.1641969425@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Julien Rouhaud <rjuju123(at)gmail(dot)com> writes:
> On Wed, Jan 12, 2022 at 01:19:22AM -0500, Tom Lane wrote:
>> 2. You are attaching some random files, and would like to not
>> displace the cfbot's idea of the latest patchset.

> I'm assuming that someone wanting to send an additional patch to be applied on
> top of the OP patchset is part of 2?

AFAIK, if you're submitting a patch then you have to attach a complete
patchset, or the cfbot will be totally lost. Getting the bot to
understand incremental patches would be useful for sure ... but it's
outside the scope of what I'm asking for now, which is just clear
documentation of what the bot can do already.

regards, tom lane


From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: cfbot wrangling (was Re: Add checkpoint and redo LSN to LogCheckpointEnd log message)
Date: 2022-01-12 06:50:36
Message-ID: CAOBaU_apUCB_o8P3x5O8HL9UuVpXxDgKa68CSPebBPCtUW+dxw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 12, 2022 at 2:37 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> AFAIK, if you're submitting a patch then you have to attach a complete
> patchset, or the cfbot will be totally lost. Getting the bot to
> understand incremental patches would be useful for sure ... but it's
> outside the scope of what I'm asking for now, which is just clear
> documentation of what the bot can do already.

Right, but the use case I'm mentioning is a bit different: provide
another patch while *not* triggering the cfbot. I've seen cases in
the past where people want to share some code to the OP and it seems
reasonable to allow that without risking the trigger the cfbot, at
least not without the OP validating or adapting the changes.


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: cfbot wrangling (was Re: Add checkpoint and redo LSN to LogCheckpointEnd log message)
Date: 2022-01-12 06:52:00
Message-ID: CA+hUKGL1zcQj0VM_-uxNUOTFv7W0LNjvaV13cYTFSTS_6xH4qg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 12, 2022 at 7:37 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Julien Rouhaud <rjuju123(at)gmail(dot)com> writes:
> > On Wed, Jan 12, 2022 at 01:19:22AM -0500, Tom Lane wrote:
> >> 2. You are attaching some random files, and would like to not
> >> displace the cfbot's idea of the latest patchset.
>
> > I'm assuming that someone wanting to send an additional patch to be applied on
> > top of the OP patchset is part of 2?
>
> AFAIK, if you're submitting a patch then you have to attach a complete
> patchset, or the cfbot will be totally lost. Getting the bot to
> understand incremental patches would be useful for sure ... but it's
> outside the scope of what I'm asking for now, which is just clear
> documentation of what the bot can do already.

By way of documentation, I've just now tried to answer these question
in the new FAQ at:

https://wiki.postgresql.org/wiki/Cfbot


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: cfbot wrangling (was Re: Add checkpoint and redo LSN to LogCheckpointEnd log message)
Date: 2022-01-12 06:58:40
Message-ID: 2953440.1641970720@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> By way of documentation, I've just now tried to answer these question
> in the new FAQ at:
> https://wiki.postgresql.org/wiki/Cfbot

Thanks!

regards, tom lane


From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: cfbot wrangling (was Re: Add checkpoint and redo LSN to LogCheckpointEnd log message)
Date: 2022-01-12 07:01:40
Message-ID: CAOBaU_ZCTuHOc+o7qxpvWmq8sVpXUH5pk2=R5n7aqVL+MfBM=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 12, 2022 at 2:52 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> By way of documentation, I've just now tried to answer these question
> in the new FAQ at:
>
> https://wiki.postgresql.org/wiki/Cfbot

Great! Thanks a lot!


From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-01-13 06:20:53
Message-ID: CALj2ACUGutu6As-9p76Ta2bV59J97fbiCc9NSzaWXqxPJdMvag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 12, 2022 at 11:39 AM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Tue, Dec 28, 2021 at 10:56 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > attaching v1-0001-XXX from the initial mail again just for the sake of
> > completion:
>
> Unfortunately this breaks the cfbot as it tries to apply this patch
> too: http://cfbot.cputube.org/patch_36_3474.log.
>
> For this kind of situation I think that the usual solution is to use a
> .txt extension to make sure that the cfbot won't try to apply it.

Thanks. IMO, the following format of logging is better, so attaching
the v2-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch as
.patch

2021-12-28 02:52:24.464 UTC [2394396] LOG: checkpoint completed at
location=0/212FFC8 with REDO start location=0/212FF90: wrote 451
buffers (2.8%); 0 WAL file(s) added, 0 removed, 1 recycled;
write=0.012 s, sync=0.032 s, total=0.071 s; sync files=6,
longest=0.022 s, average=0.006 s; distance=6272 kB, estimate=6272 kB

Others are attached as .txt files.

Regards,
Bharath Rupireddy.

Attachment Content-Type Size
v2-0002-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.txt text/plain 2.5 KB
v2-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch application/octet-stream 2.5 KB
v2-0003-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.txt text/plain 2.5 KB

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-01-14 13:22:45
Message-ID: CALj2ACVM4wjS+wiNZf4ssdNMhSR-XUhR99fYbV=pC9_-=5wCtw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 13, 2022 at 11:50 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> Thanks. IMO, the following format of logging is better, so attaching
> the v2-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch as
> .patch
>
> 2021-12-28 02:52:24.464 UTC [2394396] LOG: checkpoint completed at
> location=0/212FFC8 with REDO start location=0/212FF90: wrote 451
> buffers (2.8%); 0 WAL file(s) added, 0 removed, 1 recycled;
> write=0.012 s, sync=0.032 s, total=0.071 s; sync files=6,
> longest=0.022 s, average=0.006 s; distance=6272 kB, estimate=6272 kB

One of the test cases was failing with the above style of the log
message, changing "checkpoint complete" to "checkpoint completed at
location" doesn't seem to be a better idea. It looks like the users or
the log monitoring tools might be using the same text "checkpoint
complete", therefore I don't want to break that. Here's the v3 patch
that I think will work better. Please review.

Regards,
Bharath Rupireddy.

Attachment Content-Type Size
v3-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch application/x-patch 2.5 KB

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-01-20 00:36:32
Message-ID: BDAE2BB3-2F37-4B7C-A053-874250A47C21@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 1/3/22, 5:52 PM, "Kyotaro Horiguchi" <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> It seems to me "LSN" or just "location" is more confusing or
> mysterious than "REDO LSN" for the average user. If we want to avoid
> being technically too detailed, we would use just "start LSN=%X/%X,
> end LSN=%X/%X". And it is equivalent to "WAL range=[%X/%X, %X/%X]"..

My first instinct was that this should stay aligned with
pg_controldata, but that would mean using "location=%X/%X, REDO
location=%X/%X," which doesn't seem terribly descriptive. IIUC the
"checkpoint location" is the LSN of the WAL record for the checkpoint,
and the "checkpoint's REDO location" is the LSN where checkpoint
creation began (i.e., what you must retain for crash recovery). My
vote is for "start=%X/%X, end=%X/%X."

Nathan


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bossartn(at)amazon(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-01-20 03:00:29
Message-ID: 20220120.120029.1092463916382268628.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Thu, 20 Jan 2022 00:36:32 +0000, "Bossart, Nathan" <bossartn(at)amazon(dot)com> wrote in
> On 1/3/22, 5:52 PM, "Kyotaro Horiguchi" <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > It seems to me "LSN" or just "location" is more confusing or
> > mysterious than "REDO LSN" for the average user. If we want to avoid
> > being technically too detailed, we would use just "start LSN=%X/%X,
> > end LSN=%X/%X". And it is equivalent to "WAL range=[%X/%X, %X/%X]"..
>
> My first instinct was that this should stay aligned with
> pg_controldata, but that would mean using "location=%X/%X, REDO
> location=%X/%X," which doesn't seem terribly descriptive. IIUC the
> "checkpoint location" is the LSN of the WAL record for the checkpoint,
> and the "checkpoint's REDO location" is the LSN where checkpoint
> creation began (i.e., what you must retain for crash recovery). My
> vote is for "start=%X/%X, end=%X/%X."

+1. Works for me. %X/%X itself expresses it is an LSN.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-01-27 15:07:37
Message-ID: CALj2ACUfLrN8XGn0ZNrAou5eoDwMnFGnDE9rhcs8==jmtSd4WA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 20, 2022 at 6:06 AM Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
>
> On 1/3/22, 5:52 PM, "Kyotaro Horiguchi" <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > It seems to me "LSN" or just "location" is more confusing or
> > mysterious than "REDO LSN" for the average user. If we want to avoid
> > being technically too detailed, we would use just "start LSN=%X/%X,
> > end LSN=%X/%X". And it is equivalent to "WAL range=[%X/%X, %X/%X]"..
>
> My first instinct was that this should stay aligned with
> pg_controldata, but that would mean using "location=%X/%X, REDO
> location=%X/%X," which doesn't seem terribly descriptive. IIUC the
> "checkpoint location" is the LSN of the WAL record for the checkpoint,
> and the "checkpoint's REDO location" is the LSN where checkpoint
> creation began (i.e., what you must retain for crash recovery). My
> vote is for "start=%X/%X, end=%X/%X."

I'm still not clear how the REDO location can be treated as a start
LSN? Can someone throw some light one what this checkpoint's REDO
location is?

Regards,
Bharath Rupireddy.


From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-01-27 23:57:49
Message-ID: 20220127235749.GA558032@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 27, 2022 at 08:37:37PM +0530, Bharath Rupireddy wrote:
> I'm still not clear how the REDO location can be treated as a start
> LSN? Can someone throw some light one what this checkpoint's REDO
> location is?

It's the WAL insert location at the time the checkpoint began (i.e., where
you need to begin replaying WAL from after a crash).

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com/


From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-01-28 03:13:36
Message-ID: CALj2ACUaGJQVbQ5Hms_C2tu+6iJX-EWkEdOiXwX1WfrxQX8pHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 20, 2022 at 8:30 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Thu, 20 Jan 2022 00:36:32 +0000, "Bossart, Nathan" <bossartn(at)amazon(dot)com> wrote in
> > On 1/3/22, 5:52 PM, "Kyotaro Horiguchi" <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > > It seems to me "LSN" or just "location" is more confusing or
> > > mysterious than "REDO LSN" for the average user. If we want to avoid
> > > being technically too detailed, we would use just "start LSN=%X/%X,
> > > end LSN=%X/%X". And it is equivalent to "WAL range=[%X/%X, %X/%X]"..
> >
> > My first instinct was that this should stay aligned with
> > pg_controldata, but that would mean using "location=%X/%X, REDO
> > location=%X/%X," which doesn't seem terribly descriptive. IIUC the
> > "checkpoint location" is the LSN of the WAL record for the checkpoint,
> > and the "checkpoint's REDO location" is the LSN where checkpoint
> > creation began (i.e., what you must retain for crash recovery). My
> > vote is for "start=%X/%X, end=%X/%X."
>
> +1. Works for me. %X/%X itself expresses it is an LSN.

Thanks for the review. Here's the v4 patch, please have a look.

Let's not attempt to change how pg_controldata (tool and core
functions) emit the start and end LSN as checkpoint_lsn/redo_lsn and
checkpoint location/checkpoint's REDO location.

[1]
2022-01-28 03:06:10.213 UTC [2409486] LOG: checkpoint starting:
immediate force wait
2022-01-28 03:06:10.257 UTC [2409486] LOG: checkpoint complete:
start=0/14D9510, end=0/14D9548; wrote 4 buffers (0.0%); 0 WAL file(s)
added, 0 removed, 0 recycled; write=0.007 s, sync=0.008 s, total=0.044
s; sync files=3, longest=0.005 s, average=0.003 s; distance=0 kB,
estimate=0 kB

2022-01-28 03:06:42.254 UTC [2409486] LOG: checkpoint starting:
immediate force wait
2022-01-28 03:06:42.279 UTC [2409486] LOG: checkpoint complete:
start=0/14DBDB8, end=0/14DBDF0; wrote 2 buffers (0.0%); 0 WAL file(s)
added, 0 removed, 0 recycled; write=0.004 s, sync=0.004 s, total=0.025
s; sync files=2, longest=0.003 s, average=0.002 s; distance=10 kB,
estimate=10 kB

Regards,
Bharath Rupireddy.

Attachment Content-Type Size
v4-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch application/octet-stream 2.4 KB

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-01-28 05:46:20
Message-ID: 20220128054620.GA654866@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 28, 2022 at 08:43:36AM +0530, Bharath Rupireddy wrote:
> 2022-01-28 03:06:10.213 UTC [2409486] LOG: checkpoint starting:
> immediate force wait
> 2022-01-28 03:06:10.257 UTC [2409486] LOG: checkpoint complete:
> start=0/14D9510, end=0/14D9548; wrote 4 buffers (0.0%); 0 WAL file(s)
> added, 0 removed, 0 recycled; write=0.007 s, sync=0.008 s, total=0.044
> s; sync files=3, longest=0.005 s, average=0.003 s; distance=0 kB,
> estimate=0 kB
>
> 2022-01-28 03:06:42.254 UTC [2409486] LOG: checkpoint starting:
> immediate force wait
> 2022-01-28 03:06:42.279 UTC [2409486] LOG: checkpoint complete:
> start=0/14DBDB8, end=0/14DBDF0; wrote 2 buffers (0.0%); 0 WAL file(s)
> added, 0 removed, 0 recycled; write=0.004 s, sync=0.004 s, total=0.025
> s; sync files=2, longest=0.003 s, average=0.002 s; distance=10 kB,
> estimate=10 kB

I know I voted for "start=%X/%X, end=%X/%X," but looking at this again, I
wonder if it could be misleading. "start" is the redo location, and "end"
is the location of the checkpoint record, but I could understand why
someone might think that "start" is the location of the previous checkpoint
record and "end" is the redo location of the new one. I think your
original idea of "lsn=%X/%X, redo lsn=%X/%X" could be a good alternative.

Іn any case, this patch is small and otherwise looks reasonable to me, so I
am going to mark it as ready-for-committer.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com/


From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-01-28 06:39:28
Message-ID: CALj2ACXWyK9DujhndkuGXb7idjL9wPJfMpzRPpM-DT1LQ6MB=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 28, 2022 at 11:16 AM Nathan Bossart
<nathandbossart(at)gmail(dot)com> wrote:
>
> I know I voted for "start=%X/%X, end=%X/%X," but looking at this again, I
> wonder if it could be misleading. "start" is the redo location, and "end"
> is the location of the checkpoint record, but I could understand why
> someone might think that "start" is the location of the previous checkpoint
> record and "end" is the redo location of the new one. I think your
> original idea of "lsn=%X/%X, redo lsn=%X/%X" could be a good alternative.
>
> Іn any case, this patch is small and otherwise looks reasonable to me, so I
> am going to mark it as ready-for-committer.

Thanks for your review. In summary, we have these options to choose
checkpoint LSN and last REDO LSN:

1) "start=%X/%X, end=%X/%X" (ControlFile->checkPointCopy.redo,
ControlFile->checkPoint)
2) "lsn=%X/%X, redo lsn=%X/%X"
3) "location=%X/%X, REDO location=%X/%X" -- similar to what
pg_controldata and pg_control_checkpoint shows currently.
4) "location=%X/%X, REDO start location=%X/%X"

I will leave that decision to the committer.

Regards,
Bharath Rupireddy.


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-01-31 15:11:00
Message-ID: 20220131151100.GR10577@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greetings,

* Bharath Rupireddy (bharath(dot)rupireddyforpostgres(at)gmail(dot)com) wrote:
> On Fri, Jan 28, 2022 at 11:16 AM Nathan Bossart
> <nathandbossart(at)gmail(dot)com> wrote:
> > I know I voted for "start=%X/%X, end=%X/%X," but looking at this again, I
> > wonder if it could be misleading. "start" is the redo location, and "end"
> > is the location of the checkpoint record, but I could understand why
> > someone might think that "start" is the location of the previous checkpoint
> > record and "end" is the redo location of the new one. I think your
> > original idea of "lsn=%X/%X, redo lsn=%X/%X" could be a good alternative.
> >
> > Іn any case, this patch is small and otherwise looks reasonable to me, so I
> > am going to mark it as ready-for-committer.
>
> Thanks for your review. In summary, we have these options to choose
> checkpoint LSN and last REDO LSN:
>
> 1) "start=%X/%X, end=%X/%X" (ControlFile->checkPointCopy.redo,
> ControlFile->checkPoint)
> 2) "lsn=%X/%X, redo lsn=%X/%X"
> 3) "location=%X/%X, REDO location=%X/%X" -- similar to what
> pg_controldata and pg_control_checkpoint shows currently.
> 4) "location=%X/%X, REDO start location=%X/%X"
>
> I will leave that decision to the committer.

I'd also vote for #2. Regarding 3 and 4, I'd argue that those should
have been changed when we changed a number of other things from the
generic 'location' to be 'lsn' in system views and functions, and
therefore we should go change those to also specify 'lsn' rather than
just saying 'location'.

Thanks,

Stephen


From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-01-31 18:15:19
Message-ID: CALj2ACU5NZ142kp2Bn-PK1S2G8AEENYVBfD=Cyz=fskQ6Vtf1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 31, 2022 at 8:41 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>
> > Thanks for your review. In summary, we have these options to choose
> > checkpoint LSN and last REDO LSN:
> >
> > 1) "start=%X/%X, end=%X/%X" (ControlFile->checkPointCopy.redo,
> > ControlFile->checkPoint)
> > 2) "lsn=%X/%X, redo lsn=%X/%X"
> > 3) "location=%X/%X, REDO location=%X/%X" -- similar to what
> > pg_controldata and pg_control_checkpoint shows currently.
> > 4) "location=%X/%X, REDO start location=%X/%X"
> >
> > I will leave that decision to the committer.
>
> I'd also vote for #2. Regarding 3 and 4, I'd argue that those should
> have been changed when we changed a number of other things from the
> generic 'location' to be 'lsn' in system views and functions, and
> therefore we should go change those to also specify 'lsn' rather than
> just saying 'location'.

Thanks. Here are 2 patches, 0001 for adding checkpoint lsn and redo
lsn in the checkpoint completed message and 0002 for changing the
"location" to LSN in pg_controdata's output. With the 0002,
pg_control_checkpont, pg_controldata and checkpoint completed message
will all be in sync with the checkpoint lsn and redo lsn.

Regards,
Bharath Rupireddy.

Attachment Content-Type Size
v5-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch application/x-patch 2.4 KB
v5-0002-change-location-to-lsn-in-pg_controldata.patch application/x-patch 2.6 KB

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-01-31 18:30:09
Message-ID: 20220131183009.GA660545@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 31, 2022 at 11:45:19PM +0530, Bharath Rupireddy wrote:
> Thanks. Here are 2 patches, 0001 for adding checkpoint lsn and redo
> lsn in the checkpoint completed message and 0002 for changing the
> "location" to LSN in pg_controdata's output. With the 0002,
> pg_control_checkpont, pg_controldata and checkpoint completed message
> will all be in sync with the checkpoint lsn and redo lsn.

I think the pg_controldata change needs some extra spaces for alignment,
but otherwise these patches seem reasonable to me.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-01-31 18:53:10
Message-ID: CALj2ACUaUJGvwYi4ztcc6-o2ju48BcNvayB0OuDt0-1gYHmHRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 1, 2022 at 12:00 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Mon, Jan 31, 2022 at 11:45:19PM +0530, Bharath Rupireddy wrote:
> > Thanks. Here are 2 patches, 0001 for adding checkpoint lsn and redo
> > lsn in the checkpoint completed message and 0002 for changing the
> > "location" to LSN in pg_controdata's output. With the 0002,
> > pg_control_checkpont, pg_controldata and checkpoint completed message
> > will all be in sync with the checkpoint lsn and redo lsn.
>
> I think the pg_controldata change needs some extra spaces for alignment,
> but otherwise these patches seem reasonable to me.

Thanks. My bad it was. Changed in v6.

Regards,
Bharath Rupireddy.

Attachment Content-Type Size
v6-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch application/octet-stream 2.4 KB
v6-0002-change-location-to-lsn-in-pg_controldata.patch application/octet-stream 2.6 KB

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-01 01:44:35
Message-ID: 20220201014435.GA738413@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg젠 토토SQL :

On Tue, Feb 01, 2022 at 12:23:10AM +0530, Bharath Rupireddy wrote:
> On Tue, Feb 1, 2022 at 12:00 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>> I think the pg_controldata change needs some extra spaces for alignment,
>> but otherwise these patches seem reasonable to me.
>
> Thanks. My bad it was. Changed in v6.

LGTM

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-01 03:40:41
Message-ID: a2fc0da7-70e4-3db0-733c-0abdc1aae4ce@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2022/02/01 10:44, Nathan Bossart wrote:
> On Tue, Feb 01, 2022 at 12:23:10AM +0530, Bharath Rupireddy wrote:
>> On Tue, Feb 1, 2022 at 12:00 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>>> I think the pg_controldata change needs some extra spaces for alignment,
>>> but otherwise these patches seem reasonable to me.
>>
>> Thanks. My bad it was. Changed in v6.

- (errmsg("restartpoint complete: wrote %d buffers (%.1f%%); "
+ (errmsg("restartpoint complete: lsn=%X/%X, redo lsn=%X/%X; "
+ "wrote %d buffers (%.1f%%); "
"%d WAL file(s) added, %d removed, %d recycled; "
"write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
"sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
"distance=%d kB, estimate=%d kB",
+ LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo),
+ LSN_FORMAT_ARGS(ControlFile->checkPoint),

The order of arguments for LSN seems wrong. LSN_FORMAT_ARGS(ControlFile->checkPoint) should be specified ahead of LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo)?

Could you tell me why the information for LSN is reported earlierly in the log message? Since ordinally users would be more interested in the information about I/O by checkpoint, the information for LSN should be placed later? Sorry if this was already discussed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-01 04:01:09
Message-ID: CALj2ACXDHGfhKkwzi81BRqWft4n85AO1zmkeqf6xEg+0xTCH9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 사이트 순위SQL

On Tue, Feb 1, 2022 at 9:10 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> The order of arguments for LSN seems wrong. LSN_FORMAT_ARGS(ControlFile->checkPoint) should be specified ahead of LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo)?

Thanks. Corrected.

> Could you tell me why the information for LSN is reported earlierly in the log message? Since ordinally users would be more interested in the information about I/O by checkpoint, the information for LSN should be placed later? Sorry if this was already discussed.

It is useful (for debugging purposes) if the checkpoint end message
has the checkpoint LSN(end) and REDO LSN(start). It gives more context
while analyzing checkpoint-related issues. The pg_controldata gives
the last checkpoint LSN and REDO LSN, but having this info alongside
the log message helps analyze issues that happened previously, connect
the dots and identify the root cause.

Regards,
Bharath Rupireddy.

Attachment Content-Type Size
v7-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch application/octet-stream 2.4 KB
v7-0002-Change-location-to-lsn-in-pg_controldata.patch application/octet-stream 2.6 KB

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-01 04:19:43
Message-ID: a1de2fd3-4a88-0325-c698-580c044f38b2@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2022/02/01 13:01, Bharath Rupireddy wrote:
> On Tue, Feb 1, 2022 at 9:10 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>> The order of arguments for LSN seems wrong. LSN_FORMAT_ARGS(ControlFile->checkPoint) should be specified ahead of LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo)?
>
> Thanks. Corrected.

Thanks!

>> Could you tell me why the information for LSN is reported earlierly in the log message? Since ordinally users would be more interested in the information about I/O by checkpoint, the information for LSN should be placed later? Sorry if this was already discussed.
>
> It is useful (for debugging purposes) if the checkpoint end message
> has the checkpoint LSN(end) and REDO LSN(start). It gives more context
> while analyzing checkpoint-related issues. The pg_controldata gives
> the last checkpoint LSN and REDO LSN, but having this info alongside
> the log message helps analyze issues that happened previously, connect
> the dots and identify the root cause.

My previous comment was confusing... Probably I understand why you tried to put this information in checkpoint log message. But I was suggesting to put that information at the end of log message instead of the beginning of it. Because ordinary users would be less interested in this LSN information than other ones like the number of buffers written.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-01 04:38:04
Message-ID: CALj2ACU9ynpCwfuL0a4Bx0SDbHcjRDk49pwjDwfOzoQ1-kx1pw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 1, 2022 at 9:49 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
> My previous comment was confusing... Probably I understand why you tried to put this information in checkpoint log message. But I was suggesting to put that information at the end of log message instead of the beginning of it. Because ordinary users would be less interested in this LSN information than other ones like the number of buffers written.

Actually, there's no strong reason to put LSN info at the beginning of
the message except that LSN/REDO LSN next to the
checkpoint/restartpoint complete would make the users understand the
LSN and REDO LSN belong to the checkpoint/restartpoint. Since this
wasn't a strong reason, I agree to keep it at the end.

Modified in v8.

[1]
2022-02-01 04:34:17.657 UTC [3597073] LOG: checkpoint complete: wrote
21 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.004 s, sync=0.008 s, total=0.031 s; sync files=18,
longest=0.006 s, average=0.001 s; distance=77 kB, estimate=77 kB;
lsn=0/14D5AF0, redo lsn=0/14D5AB8

Regards,
Bharath Rupireddy.

Attachment Content-Type Size
v8-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch application/octet-stream 2.8 KB
v8-0002-Change-location-to-lsn-in-pg_controldata.patch application/octet-stream 2.6 KB

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bharath(dot)rupireddyforpostgres(at)gmail(dot)com
Cc: masao(dot)fujii(at)oss(dot)nttdata(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-01 06:27:56
Message-ID: 20220201.152756.1157327790387186473.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Tue, 1 Feb 2022 10:08:04 +0530, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote in
> On Tue, Feb 1, 2022 at 9:49 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> >
> > My previous comment was confusing... Probably I understand why you tried to put this information in checkpoint log message. But I was suggesting to put that information at the end of log message instead of the beginning of it. Because ordinary users would be less interested in this LSN information than other ones like the number of buffers written.
>
> Actually, there's no strong reason to put LSN info at the beginning of
> the message except that LSN/REDO LSN next to the
> checkpoint/restartpoint complete would make the users understand the
> LSN and REDO LSN belong to the checkpoint/restartpoint. Since this
> wasn't a strong reason, I agree to keep it at the end.
>
> Modified in v8.
>
> [1]
> 2022-02-01 04:34:17.657 UTC [3597073] LOG: checkpoint complete: wrote
> 21 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled;
> write=0.004 s, sync=0.008 s, total=0.031 s; sync files=18,
> longest=0.006 s, average=0.001 s; distance=77 kB, estimate=77 kB;
> lsn=0/14D5AF0, redo lsn=0/14D5AB8

0001 looks good to me.

I tend to agree to 0002.

FWIW, I collected other user-facing usage of "location" as LSN.

xlog.c:5965, 6128:
(errmsg("recovery stopping before WAL location (LSN) \"%X/%X\"",

xlog.c:6718:
(errmsg("control file contains invalid checkpoint location")));

xlog.c:6846:
(errmsg("starting point-in-time recovery to WAL location (LSN) \"%X/%X\"",

xlog.c:6929:
(errmsg("could not find redo location referenced by checkpoint record"),

xlog.c:11298, 11300: (in backup-label)
appendStringInfo(labelfile, "START WAL LOCATION: %X/%X (file %s)\n",
appendStringInfo(labelfile, "CHECKPOINT LOCATION: %X/%X\n",
(and corresponding reader-code)

xlog,c:11791, 11793: (in backup history file)
fprintf(fp, "START WAL LOCATION: %X/%X (file %s)\n",
fprintf(fp, "STOP WAL LOCATION: %X/%X (file %s)\n",
(and corresponding reader-code)

repl_scanner.l:151:
yyerror("invalid streaming start location");

pg_proc.dat:
many function descriptions use "location" as LSN.

pg_waldump.c:768,777,886,938,1029,1071,1083:
printf(_(" -e, --end=RECPTR stop reading at WAL location RECPTR\n"));
printf(_(" -s, --start=RECPTR start reading at WAL location RECPTR\n"));
pg_log_error("could not parse end WAL location \"%s\"",
pg_log_error("could not parse start WAL location \"%s\"",
pg_log_error("start WAL location %X/%X is not inside file \"%s\"",
pg_log_error("end WAL location %X/%X is not inside file \"%s\"",
pg_log_error("no start WAL location given");

pg_basebackup.c:476, 615: (confusing with file/directory path..)
pg_log_error("could not parse write-ahead log location \"%s\"",
pg_log_error("could not parse write-ahead log location \"%s\"",

pg_rewind.c:346:
pg_log_info("servers diverged at WAL location %X/%X on timeline %u",
pg_rewind/timeline.c:82:
pg_log_error("Expected a write-ahead log switchpoint location.");

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-01 13:03:24
Message-ID: CALj2ACUtZhTb=2ENkF3BQ3wi137uaGi__qzvXC-qFYC0XwjALw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 1, 2022 at 11:58 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > Modified in v8.
>
> 0001 looks good to me.
>
> I tend to agree to 0002.

Thanks.

> FWIW, I collected other user-facing usage of "location" as LSN.
>
> xlog.c:11298, 11300: (in backup-label)
> appendStringInfo(labelfile, "START WAL LOCATION: %X/%X (file %s)\n",
> appendStringInfo(labelfile, "CHECKPOINT LOCATION: %X/%X\n",
> (and corresponding reader-code)
>
> xlog,c:11791, 11793: (in backup history file)
> fprintf(fp, "START WAL LOCATION: %X/%X (file %s)\n",
> fprintf(fp, "STOP WAL LOCATION: %X/%X (file %s)\n",
> (and corresponding reader-code)

I tried to change the "location" to "lsn" in most of the user-facing
messages/text. I refrained from changing the bakup_label file content
(above) as it might break many applications/service layer code and
it's not good for backward compatibility.

Attaching the above changes 0003 (0001 and 0002 remain the same). If
the committer doesn't agree on the text or wording in 0003, I would
like the 0001 and 0002 to be taken here and I can start a new thread
for discussing 0003 separately.

Regards,
Bharath Rupireddy.

Attachment Content-Type Size
v8-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch application/octet-stream 2.8 KB
v8-0002-Change-location-to-lsn-in-pg_controldata.patch application/octet-stream 2.6 KB
v8-0003-Change-location-to-lsn-in-user-facing-text.patch application/octet-stream 15.1 KB

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-01 16:09:43
Message-ID: be548399-4c5e-77a7-2a95-fad224f1d24d@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2022/02/01 22:03, Bharath Rupireddy wrote:
> On Tue, Feb 1, 2022 at 11:58 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>>> Modified in v8.
>>
>> 0001 looks good to me.

I found that CreateRestartPoint() already reported the redo lsn as follows after emitting the restartpoint log message. To avoid duplicated logging of the same information, we should update this code?

ereport((log_checkpoints ? LOG : DEBUG2),
(errmsg("recovery restart point at %X/%X",
LSN_FORMAT_ARGS(lastCheckPoint.redo)),
xtime ? errdetail("Last completed transaction was at log time %s.",
timestamptz_to_str(xtime)) : 0));

This code reports lastCheckPoint.redo as redo lsn. OTOH, with the patch, LogCheckpointEnd() reports ControlFile->checkPointCopy.redo. They may be different, for example, when the current DB state is not DB_IN_ARCHIVE_RECOVERY. In this case, which lsn should we report as redo lsn?

+ "lsn=%X/%X, redo lsn=%X/%X",

Originally you proposed to use upper cases for "lsn". But the latest patch uses the lower cases. Why? It seems better to use upper cases, i.e., LSN and REDO LSN because LSN is basically used in other errmsg().

> Attaching the above changes 0003 (0001 and 0002 remain the same). If
> the committer doesn't agree on the text or wording in 0003, I would
> like the 0001 and 0002 to be taken here and I can start a new thread
> for discussing 0003 separately.

Personally I'm ok with 001, but regarding 0002 and 0003 patches, I'm not sure if it's really worth replacing "location" with "lsn" there. BTW, the similar idea was proposed at [1] before, but seems "location" was left as it was.

[1]
https://postgr.es/m/20487.1494514594@sss.pgh.pa.us

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-01 21:05:38
Message-ID: 20220201210538.GX10577@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 사이트 순위SQL

Greetings,

* Fujii Masao (masao(dot)fujii(at)oss(dot)nttdata(dot)com) wrote:
> On 2022/02/01 22:03, Bharath Rupireddy wrote:
> >On Tue, Feb 1, 2022 at 11:58 AM Kyotaro Horiguchi
> ><horikyota(dot)ntt(at)gmail(dot)com> wrote:
> >>>Modified in v8.
> >>
> >>0001 looks good to me.
>
> I found that CreateRestartPoint() already reported the redo lsn as follows after emitting the restartpoint log message. To avoid duplicated logging of the same information, we should update this code?
>
> ereport((log_checkpoints ? LOG : DEBUG2),
> (errmsg("recovery restart point at %X/%X",
> LSN_FORMAT_ARGS(lastCheckPoint.redo)),
> xtime ? errdetail("Last completed transaction was at log time %s.",
> timestamptz_to_str(xtime)) : 0));
>
> This code reports lastCheckPoint.redo as redo lsn. OTOH, with the patch, LogCheckpointEnd() reports ControlFile->checkPointCopy.redo. They may be different, for example, when the current DB state is not DB_IN_ARCHIVE_RECOVERY. In this case, which lsn should we report as redo lsn?
>
> + "lsn=%X/%X, redo lsn=%X/%X",
>
> Originally you proposed to use upper cases for "lsn". But the latest patch uses the lower cases. Why? It seems better to use upper cases, i.e., LSN and REDO LSN because LSN is basically used in other errmsg().

We do use 'lsn=' quite a bit in verify_nbtree.c already and lowercase is
also what's in the various functions and views in the catalog in the
database, of course. I don't see even one usage of "LSN=" in the tree
today. We also use 'lsn %X/%X' in replorigindesc.c, xactdesc.c,
xactdesc.c, tablesync.c, standby.c, parsexlog.c, then 'redo %X/%X' in
xactdesc.c.

xlog.c does have a number of "WAL location (LSN)", along with a bunch of
other usages. logical.c uses both "LSN" and "lsn". worker.c uses
"LSN". slot.c uses "restart_lsn". pg_rewind.c uses "WAL location" while
pg_waldump.c uses, "lsn:", "WAL location", and "WAL record".

Overall, we don't seem to be super consistent, but I'd say that 'lsn='
looks the best, to my eyes anyway, and isn't out of place in the code
base. Lowercase seems to generally be more common overall.

> >Attaching the above changes 0003 (0001 and 0002 remain the same). If
> >the committer doesn't agree on the text or wording in 0003, I would
> >like the 0001 and 0002 to be taken here and I can start a new thread
> >for discussing 0003 separately.
>
> Personally I'm ok with 001, but regarding 0002 and 0003 patches, I'm not sure if it's really worth replacing "location" with "lsn" there. BTW, the similar idea was proposed at [1] before, but seems "location" was left as it was.
>
> [1]
> https://postgr.es/m/20487.1494514594@sss.pgh.pa.us

This discussion strikes me as sufficient reason to make the change, with
the prior comment not really having all that much weight. When we're
actually pretty consistent with one term, having random places where we
are inconsistent leads people to be unsure about which way to go and
then we end up having to have this discussion. Would be great to avoid
having to have it again in the future.

Thanks,

Stephen


From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-02 14:46:35
Message-ID: CALj2ACVVMAGDvourutDvtDUOhTqS_yTtfrdCjtP75-RrQU0dFA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg범퍼카 토토SQL

On Tue, Feb 1, 2022 at 9:39 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> I found that CreateRestartPoint() already reported the redo lsn as follows after emitting the restartpoint log message. To avoid duplicated logging of the same information, we should update this code?
>
> ereport((log_checkpoints ? LOG : DEBUG2),
> (errmsg("recovery restart point at %X/%X",
> LSN_FORMAT_ARGS(lastCheckPoint.redo)),
> xtime ? errdetail("Last completed transaction was at log time %s.",
> timestamptz_to_str(xtime)) : 0));
>
> This code reports lastCheckPoint.redo as redo lsn. OTOH, with the patch, LogCheckpointEnd() reports ControlFile->checkPointCopy.redo. They may be different, for example, when the current DB state is not DB_IN_ARCHIVE_RECOVERY. In this case, which lsn should we report as redo lsn?

Do we ever reach CreateRestartPoint when ControlFile->stat !=
DB_IN_ARCHIVE_RECOVERY? Assert(ControlFile->state ==
DB_IN_ARCHIVE_RECOVERY); in CreateRestartPoint doesn't fail any
regression tests.

Here's what can happen:
lastCheckPoint.redo is 100 and ControlFile->checkPointCopy.redo is
105, so, "skipping restartpoint, already performed at %X/%X"
LogCheckpointEnd isn't reached
lastCheckPoint.redo is 105 and ControlFile->checkPointCopy.redo is 100
and ControlFile->state == DB_IN_ARCHIVE_RECOVERY, then the control
file gets updated and LogCheckpointEnd prints the right redo lsn.
lastCheckPoint.redo is 105 and ControlFile->checkPointCopy.redo is 100
and ControlFile->state != DB_IN_ARCHIVE_RECOVERY, the the control file
doesn't get updated and LogCheckpointEnd just prints the control redo
lsn. Looks like this case is rare given Assert(ControlFile->state ==
DB_IN_ARCHIVE_RECOVERY); doesn't fail any tests.

I think we should just let LogCheckpointEnd print the redo lsn from
the control file. We can just remove the above errmsg("recovery
restart point at %X/%X" message altogether or just print it only in
the rare scenario, something like below:

if (ControlFile->state != DB_IN_ARCHIVE_RECOVERY)
{
ereport((log_checkpoints ? LOG : DEBUG2),
(errmsg("performed recovery restart point at %X/%X while
the database state is %s",
LSN_FORMAT_ARGS(lastCheckPoint.redo)),
getDBState(ControlFile->state)));
}

And the last commit/abort records's timestamp will always get logged
even before we reach here in the main redo loop (errmsg("last
completed transaction was at log time %s").

Or another way is to just pass the redo lsn to LogCheckpointEnd and
pass the lastCheckPoint.redo in if (ControlFile->state !=
DB_IN_ARCHIVE_RECOVERY) cases or when control file isn't updated but
restart point happened.

Thoughts?

Regards,
Bharath Rupireddy.


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-03 04:59:03
Message-ID: 7bfad665-db9c-0c2a-2604-9f54763c5f9e@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2022/02/02 23:46, Bharath Rupireddy wrote:
> On Tue, Feb 1, 2022 at 9:39 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>> I found that CreateRestartPoint() already reported the redo lsn as follows after emitting the restartpoint log message. To avoid duplicated logging of the same information, we should update this code?
>>
>> ereport((log_checkpoints ? LOG : DEBUG2),
>> (errmsg("recovery restart point at %X/%X",
>> LSN_FORMAT_ARGS(lastCheckPoint.redo)),
>> xtime ? errdetail("Last completed transaction was at log time %s.",
>> timestamptz_to_str(xtime)) : 0));
>>
>> This code reports lastCheckPoint.redo as redo lsn. OTOH, with the patch, LogCheckpointEnd() reports ControlFile->checkPointCopy.redo. They may be different, for example, when the current DB state is not DB_IN_ARCHIVE_RECOVERY. In this case, which lsn should we report as redo lsn?
>
> Do we ever reach CreateRestartPoint when ControlFile->stat !=
> DB_IN_ARCHIVE_RECOVERY? Assert(ControlFile->state ==
> DB_IN_ARCHIVE_RECOVERY); in CreateRestartPoint doesn't fail any
> regression tests.

ISTM that CreateRestartPoint() can reach the condition ControlFile->state != DB_IN_ARCHIVE_RECOVERY. Please imagine the case where CreateRestartPoint() has already started and calls CheckPointGuts(). If the standby server is promoted while CreateRestartPoint() is flushing the data to disk at CheckPointGuts(), the state would be updated to DB_IN_PRODUCTION and CreateRestartPoint() can see the state != DB_IN_ARCHIVE_RECOVERY later.

As far as I read the code, this case seems to be able to make the server unrecoverable. If this case happens, since pg_control is not updated, pg_control still indicates the REDO LSN of last valid restartpoint. But CreateRestartPoint() seems to delete old WAL files based on its "current" REDO LSN not pg_control's REDO LSN. That is, WAL files required for the crash recovery starting from pg_control's REDO LSN would be removed.

If this understanding is right, to address this issue, probably we need to make CreateRestartPoint() do nothing (return immediately) when the state != DB_IN_ARCHIVE_RECOVERY?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-03 06:50:53
Message-ID: 20220203.155053.170368585238406731.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Thu, 3 Feb 2022 13:59:03 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>
>
> On 2022/02/02 23:46, Bharath Rupireddy wrote:
> > On Tue, Feb 1, 2022 at 9:39 PM Fujii Masao
> > <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> >> I found that CreateRestartPoint() already reported the redo lsn as
> >> follows after emitting the restartpoint log message. To avoid
> >> duplicated logging of the same information, we should update this
> >> code?
> >>
> >> ereport((log_checkpoints ? LOG : DEBUG2),
> >> (errmsg("recovery restart point at %X/%X",
> >> LSN_FORMAT_ARGS(lastCheckPoint.redo)),
> >> xtime ? errdetail("Last completed transaction was
> >> at log time %s.",
> >> timestamptz_to_str(xtime))
> >> : 0));
> >>
> >> This code reports lastCheckPoint.redo as redo lsn. OTOH, with the
> >> patch, LogCheckpointEnd() reports
> >> ControlFile->checkPointCopy.redo. They may be different, for example,
> >> when the current DB state is not DB_IN_ARCHIVE_RECOVERY. In this case,
> >> which lsn should we report as redo lsn?
> > Do we ever reach CreateRestartPoint when ControlFile->stat !=
> > DB_IN_ARCHIVE_RECOVERY? Assert(ControlFile->state ==
> > DB_IN_ARCHIVE_RECOVERY); in CreateRestartPoint doesn't fail any
> > regression tests.
>
> ISTM that CreateRestartPoint() can reach the condition
> ControlFile->state != DB_IN_ARCHIVE_RECOVERY. Please imagine the case
> where CreateRestartPoint() has already started and calls
> CheckPointGuts(). If the standby server is promoted while
> CreateRestartPoint() is flushing the data to disk at CheckPointGuts(),
> the state would be updated to DB_IN_PRODUCTION and
> CreateRestartPoint() can see the state != DB_IN_ARCHIVE_RECOVERY
> later.

By the way a comment there:
> * this is a quick hack to make sure nothing really bad happens if somehow
> * we get here after the end-of-recovery checkpoint.

now looks a bit wrong since now it's normal that a restartpoint ends
after promotion.

> As far as I read the code, this case seems to be able to make the
> server unrecoverable. If this case happens, since pg_control is not
> updated, pg_control still indicates the REDO LSN of last valid
> restartpoint. But CreateRestartPoint() seems to delete old WAL files
> based on its "current" REDO LSN not pg_control's REDO LSN. That is,
> WAL files required for the crash recovery starting from pg_control's
> REDO LSN would be removed.

Seems right. (I didn't confirm the behavior, though )

> If this understanding is right, to address this issue, probably we
> need to make CreateRestartPoint() do nothing (return immediately) when
> the state != DB_IN_ARCHIVE_RECOVERY?

On way to take. In that case should we log something like
"Restartpoint canceled" or something?

By the way, restart point should start only while recoverying, and at
the timeof the start both checkpoint.redo and checkpoint LSN are
already past. We shouldn't update minRecovery point after promotion,
but is there any reason for not updating the checkPoint and
checkPointCopy? If we update them after promotion, the
which-LSN-to-show problem would be gone.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-04 01:59:04
Message-ID: 7de7f720-666c-ea97-8926-7bc4c43f482a@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2022/02/03 15:50, Kyotaro Horiguchi wrote:
> On way to take. In that case should we log something like
> "Restartpoint canceled" or something?

+1

> By the way, restart point should start only while recoverying, and at
> the timeof the start both checkpoint.redo and checkpoint LSN are
> already past. We shouldn't update minRecovery point after promotion,
> but is there any reason for not updating the checkPoint and
> checkPointCopy? If we update them after promotion, the
> which-LSN-to-show problem would be gone.

I tried to find the reason by reading the past discussion, but have not found that yet.

If we update checkpoint and REDO LSN at pg_control in that case, we also need to update min recovery point at pg_control? Otherwise the min recovery point at pg_control still indicates the old LSN that previous restart point set.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-07 01:16:34
Message-ID: 20220207.101634.205396783432098074.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Fri, 4 Feb 2022 10:59:04 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>
>
> On 2022/02/03 15:50, Kyotaro Horiguchi wrote:
> > On way to take. In that case should we log something like
> > "Restartpoint canceled" or something?
>
> +1
>
>
> > By the way, restart point should start only while recoverying, and at
> > the timeof the start both checkpoint.redo and checkpoint LSN are
> > already past. We shouldn't update minRecovery point after promotion,
> > but is there any reason for not updating the checkPoint and
> > checkPointCopy? If we update them after promotion, the
> > which-LSN-to-show problem would be gone.
>
> I tried to find the reason by reading the past discussion, but have
> not found that yet.
>
> If we update checkpoint and REDO LSN at pg_control in that case, we
> also need to update min recovery point at pg_control? Otherwise the
> min recovery point at pg_control still indicates the old LSN that
> previous restart point set.

I had an assuption that the reason I think it shouldn't update
minRecoveryPoint is that it has been or is going to be reset to
invalid LSN by promotion and the checkpoint should refrain from
touching it.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-07 03:02:58
Message-ID: 20220207.120258.310426179780547983.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Mon, 07 Feb 2022 10:16:34 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> At Fri, 4 Feb 2022 10:59:04 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
> > On 2022/02/03 15:50, Kyotaro Horiguchi wrote:
> > > By the way, restart point should start only while recoverying, and at
> > > the timeof the start both checkpoint.redo and checkpoint LSN are
> > > already past. We shouldn't update minRecovery point after promotion,
> > > but is there any reason for not updating the checkPoint and
> > > checkPointCopy? If we update them after promotion, the
> > > which-LSN-to-show problem would be gone.
> >
> > I tried to find the reason by reading the past discussion, but have
> > not found that yet.
> >
> > If we update checkpoint and REDO LSN at pg_control in that case, we
> > also need to update min recovery point at pg_control? Otherwise the
> > min recovery point at pg_control still indicates the old LSN that
> > previous restart point set.
>
> I had an assuption that the reason I think it shouldn't update
> minRecoveryPoint is that it has been or is going to be reset to
> invalid LSN by promotion and the checkpoint should refrain from
> touching it.

Hmm.. It doesn't seem to be the case. If a server crashes just after
promotion and before requesting post-promtion checkpoint,
minRecoveryPoint stays at a valid LSN.

(Promoted at 0/7000028)
Database cluster state: in production
Latest checkpoint location: 0/6000060
Latest checkpoint's REDO location: 0/6000028
Latest checkpoint's REDO WAL file: 000000010000000000000006
Minimum recovery ending location: 0/7000090
Min recovery ending loc's timeline: 2

minRecoveryPoint/TLI are ignored in any case where a server in
in-production state is started. In other words, the values are
useless. There's no clear or written reason for unrecording the last
ongoing restartpoint after promotion.

Before fast-promotion was introduced, we shouldn't get there after
end-of-recovery checkpoint (but somehow reached sometimes?) but it is
quite normal nowadays. Or to the contrary, we're expecting it to
happen and it is regarded as a normal checkponit. So we should do
there nowadays are as the follows.

- If any later checkpoint/restartpoint has been established, just skip
remaining task then return false. (!chkpt_was_latest)
(I'm not sure this can happen, though.)

- we update control file only when archive recovery is still ongoing.

- Otherwise reset minRecoveryPoint then continue.

Do you have any thoughts or opinions?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-07 04:51:31
Message-ID: b3b23cb7-9002-d6a3-2189-fa9ca99b0969@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2022/02/07 12:02, Kyotaro Horiguchi wrote:
> - If any later checkpoint/restartpoint has been established, just skip
> remaining task then return false. (!chkpt_was_latest)
> (I'm not sure this can happen, though.)
>
> - we update control file only when archive recovery is still ongoing.

This comment seems to conflict with what your PoC patch does. Because with the patch, ControlFile->checkPoint and ControlFile->checkPointCopy seem to be updated even when ControlFile->state != DB_IN_ARCHIVE_RECOVERY.

I agree with what your PoC patch does for now. When we're not in archive recovery state, checkpoint and REDO locations in pg_control should be updated but min recovery point should be reset to invalid one (which instruments that subsequent crash recovery should replay all available WAL files).

> - Otherwise reset minRecoveryPoint then continue.
>
> Do you have any thoughts or opinions?

Regarding chkpt_was_latest, whether the state is DB_IN_ARCHIVE_RECOVERY or not, if checkpoint and redo locations in pg_control are updated, IMO we don't need to skip the "remaining tasks". Since those locations are updated and subsequent crash recovery will start from that redo location, for example, ISTM that we can safely delete old WAL files prior to the redo location as the "remaining tasks". Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-08 07:58:22
Message-ID: 20220208.165822.2155503452120234158.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Mon, 7 Feb 2022 13:51:31 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>
>
> On 2022/02/07 12:02, Kyotaro Horiguchi wrote:
> > - If any later checkpoint/restartpoint has been established, just skip
> > remaining task then return false. (!chkpt_was_latest)
> > (I'm not sure this can happen, though.)
> > - we update control file only when archive recovery is still ongoing.
>
> This comment seems to conflict with what your PoC patch does. Because
> with the patch, ControlFile->checkPoint and
> ControlFile->checkPointCopy seem to be updated even when
> ControlFile->state != DB_IN_ARCHIVE_RECOVERY.

Ah, yeah, by "update" I meant that "move forward". Sorry for confusing
wording.

> I agree with what your PoC patch does for now. When we're not in
> archive recovery state, checkpoint and REDO locations in pg_control
> should be updated but min recovery point should be reset to invalid
> one (which instruments that subsequent crash recovery should replay
> all available WAL files).

Yes. All buffers before the last recovery point's end have been
flushed out so the recovery point is valid as a checkpoint. On the
other hand minRecoveryPoint is no longer needed and actually is
ignored at the next crash recovery. We can leave it alone but it is
consistent that it is cleared.

> > - Otherwise reset minRecoveryPoint then continue.
> > Do you have any thoughts or opinions?
>
> Regarding chkpt_was_latest, whether the state is
> DB_IN_ARCHIVE_RECOVERY or not, if checkpoint and redo locations in
> pg_control are updated, IMO we don't need to skip the "remaining
> tasks". Since those locations are updated and subsequent crash
> recovery will start from that redo location, for example, ISTM that we
> can safely delete old WAL files prior to the redo location as the
> "remaining tasks". Thought?

If I read you correctly, the PoC works that way. It updates pg_control
if the restart point is latest then performs the remaining cleanup
tasks in that case. Recovery state doesn't affect this process.

I reexamined about the possibility of concurrent checkpoints.

Both CreateCheckPoint and CreateRestartPoint are called from
checkpointer loop, shutdown handler of checkpointer and standalone
process. So I can't see a possibility of concurrent checkpoints.

In the past we had a time when startup process called CreateCheckPoint
directly in the crash recovery case where checkpoint is not running
but since 7ff23c6d27 checkpoint is started before startup process
starts. So I conclude that that cannot happen.

So the attached takes away the path for the case where the restart
point is overtaken by a concurrent checkpoint.

Thus.. the attached removes the ambiguity of of the proposed patch
about the LSNs in the restartpoint-ending log message.

Thoughts?

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-08 08:40:29
Message-ID: 20220208.174029.1890340465677956009.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mmm.. checkpoint and checkpointer are quite confusing in this context..

At Tue, 08 Feb 2022 16:58:22 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> At Mon, 7 Feb 2022 13:51:31 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
> >
> >
> > On 2022/02/07 12:02, Kyotaro Horiguchi wrote:
> > > - If any later checkpoint/restartpoint has been established, just skip
> > > remaining task then return false. (!chkpt_was_latest)
> > > (I'm not sure this can happen, though.)
> > > - we update control file only when archive recovery is still ongoing.
> >
> > This comment seems to conflict with what your PoC patch does. Because
> > with the patch, ControlFile->checkPoint and
> > ControlFile->checkPointCopy seem to be updated even when
> > ControlFile->state != DB_IN_ARCHIVE_RECOVERY.
>
> Ah, yeah, by "update" I meant that "move forward". Sorry for confusing
> wording.

Please ignore the "that".

> > I agree with what your PoC patch does for now. When we're not in
> > archive recovery state, checkpoint and REDO locations in pg_control
> > should be updated but min recovery point should be reset to invalid
> > one (which instruments that subsequent crash recovery should replay
> > all available WAL files).
>
> Yes. All buffers before the last recovery point's end have been
> flushed out so the recovery point is valid as a checkpoint. On the
> other hand minRecoveryPoint is no longer needed and actually is
> ignored at the next crash recovery. We can leave it alone but it is
> consistent that it is cleared.
>
> > > - Otherwise reset minRecoveryPoint then continue.
> > > Do you have any thoughts or opinions?
> >
> > Regarding chkpt_was_latest, whether the state is
> > DB_IN_ARCHIVE_RECOVERY or not, if checkpoint and redo locations in
> > pg_control are updated, IMO we don't need to skip the "remaining
> > tasks". Since those locations are updated and subsequent crash
> > recovery will start from that redo location, for example, ISTM that we
> > can safely delete old WAL files prior to the redo location as the
> > "remaining tasks". Thought?
>
> If I read you correctly, the PoC works that way. It updates pg_control
> if the restart point is latest then performs the remaining cleanup
> tasks in that case. Recovery state doesn't affect this process.
>
> I reexamined about the possibility of concurrent checkpoints.
>
> Both CreateCheckPoint and CreateRestartPoint are called from
> checkpointer loop, shutdown handler of checkpointer and standalone
> process. So I can't see a possibility of concurrent checkpoints.
>
> In the past we had a time when startup process called CreateCheckPoint

- directly in the crash recovery case where checkpoint is not running
- but since 7ff23c6d27 checkpoint is started before startup process
+ directly in the crash recovery case where checkpointer is not running
+ but since 7ff23c6d27 checkpointer is launched before startup process

> starts. So I conclude that that cannot happen.
>
> So the attached takes away the path for the case where the restart
> point is overtaken by a concurrent checkpoint.
>
> Thus.. the attached removes the ambiguity of of the proposed patch
> about the LSNs in the restartpoint-ending log message.

Thoughts?

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-08 09:03:01
Message-ID: CALj2ACXkuMYfa1_RMtVyD+F8rg1F8fuPbku7wSyzdK79RvgdZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 8, 2022 at 2:10 PM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > Thus.. the attached removes the ambiguity of of the proposed patch
> > about the LSNs in the restartpoint-ending log message.
>
> Thoughts?

Thanks for the patch. I have few comments on the
v1-0001-Get-rid-of-unused-path-to-handle-concurrent-check.patch

1) Can we have this Assert right after "skipping restartpoint, already
performed at %X/%X" error message block? Does it make any difference?
My point is that if at all, we were to assert this, why can't we do it
before CheckPointGuts?
+ /* We mustn't have a concurrent checkpoint that advances checkpoint LSN */
+ Assert(lastCheckPoint.redo > ControlFile->checkPointCopy.redo);
+
2) Related to the above Assert, do we really need an assertion or a FATAL error?
3) Let's be consistent with "crash recovery" - replace
"archive-recovery" with "archive recovery"?
+ * We have exited from archive-recovery mode after this restartpoint
+ * started. Crash recovery ever after should always recover to the end
4) Isn't it enough to say "Crash recovery should always recover to the
end of WAL."?
+ * started. Crash recovery ever after should always recover to the end
5) Is there a reliable test case covering this code? Please point me
if the test case is shared upthread somewhere.
6) So, with this patch, the v8 patch-set posted at [1] doesn't need
any changes IIUC. If that's the case, please feel free to post all the
patches together such that they get tested in cfbot.

[1] - /message-id/CALj2ACUtZhTb%3D2ENkF3BQ3wi137uaGi__qzvXC-qFYC0XwjALw%40mail.gmail.com

Regards,
Bharath Rupireddy.


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bharath(dot)rupireddyforpostgres(at)gmail(dot)com
Cc: masao(dot)fujii(at)oss(dot)nttdata(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-09 02:52:04
Message-ID: 20220209.115204.1794224638476710282.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, Bharath.

At Tue, 8 Feb 2022 14:33:01 +0530, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote in
> On Tue, Feb 8, 2022 at 2:10 PM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > > Thus.. the attached removes the ambiguity of of the proposed patch
> > > about the LSNs in the restartpoint-ending log message.
> >
> > Thoughts?
>
> Thanks for the patch. I have few comments on the
> v1-0001-Get-rid-of-unused-path-to-handle-concurrent-check.patch

Thanks for checking this!

> 1) Can we have this Assert right after "skipping restartpoint, already
> performed at %X/%X" error message block? Does it make any difference?
> My point is that if at all, we were to assert this, why can't we do it
> before CheckPointGuts?
> + /* We mustn't have a concurrent checkpoint that advances checkpoint LSN */
> + Assert(lastCheckPoint.redo > ControlFile->checkPointCopy.redo);

No. The assertion checks if something wrong has happend while
CheckPointGuts - which may take a long time - is running. If we need
to do that, it should be after CheckPointGuts. (However, I removed it
finally. See below.)

> 2) Related to the above Assert, do we really need an assertion or a FATAL error?

It's just to make sure in case where that happens by any chance in
future. But on second thought, as I mentioned, CreateRestartPoint is
called from standalone process or from checkpointer. I added that
assertion at the beginning of CreateRestartPoint. I think that
assertion is logically equal to the old assertion.

I remember that I felt uncomfortable with the lock-less behavior on
ControlFile, which makes the code a bit complex to read. So I moved
"PriorRedoPtr = " line to within the lock section just below.

Addition to that, I feel being confused by the parallel-use of
lastCheckPoint.redo and RedoRecPtr So I replaced lastCheckPoint.redo
after assiging the former to the latter with RedoRecPtr.

> 3) Let's be consistent with "crash recovery" - replace
> "archive-recovery" with "archive recovery"?
> + * We have exited from archive-recovery mode after this restartpoint
> + * started. Crash recovery ever after should always recover to the end

That's sensible. I found several existing use of archive-recovery in
xlog.c and a few other files but the fix for them is separated as
another patch (0005).

> 4) Isn't it enough to say "Crash recovery should always recover to the
> end of WAL."?
> + * started. Crash recovery ever after should always recover to the end

I think we need to explicitly say something like "we have exited
archive recovery while *this* restartpoint is running". I simplified
the sentence as the follows.

+ * Archive recovery have ended. Crash recovery ever after should
+ * always recover to the end of WAL.

> 5) Is there a reliable test case covering this code? Please point me
> if the test case is shared upthread somewhere.

Nothing. Looking from the opposite side, the existing code for the
competing restartpoint/checkpoint case should have not been exercised
at least for these several major versions. Instead, I added an
assertion at the beginning of CreateRestartPoint that asserts that
"this should be called only under standalone process or from
checkpointer.". If that assert doesn't fire while the whole test, it
should be the proof of the premise for this patch is correct.

> 6) So, with this patch, the v8 patch-set posted at [1] doesn't need
> any changes IIUC. If that's the case, please feel free to post all the
> patches together such that they get tested in cfbot.

The two are different fixes so I don't think they are ought to be
merged together.

> [1] - /message-id/CALj2ACUtZhTb%3D2ENkF3BQ3wi137uaGi__qzvXC-qFYC0XwjALw%40mail.gmail.com

As the old 0001, though I said it'fine :p) I added a comment that
reading ControlFile->checkPoint* is safe here.

The old 0002 (attached 0003) looks file to me.

The old 0003 (attached 0004):

+++ b/src/backend/access/rmgrdesc/xlogdesc.c
- appendStringInfo(buf, "redo %X/%X; "
+ appendStringInfo(buf, "redo lsn %X/%X; "

It is shown in the context of a checkpoint record, so I think it is
not needed or rather lengthning the dump line uselessly.

+++ b/src/backend/access/transam/xlog.c
- (errmsg("request to flush past end of generated WAL; request %X/%X, current position %X/%X",
+ (errmsg("request to flush past end of generated WAL; request lsn %X/%X, current lsn %X/%X",

+++ b/src/backend/replication/walsender.c
- (errmsg("requested starting point %X/%X is ahead of the WAL flush position of this server %X/%X",
+ (errmsg("requested starting point %X/%X is ahead of the WAL flush LSN of this server %X/%X",

"WAL" is upper-cased. So it seems rather strange that the "lsn" is
lower-cased. In the first place the message doesn't look like a
user-facing error message and I feel we don't need position or lsn
there..

+++ b/src/bin/pg_rewind/pg_rewind.c
- pg_log_info("servers diverged at WAL location %X/%X on timeline %u",
+ pg_log_info("servers diverged at WAL LSN %X/%X on timeline %u",

I feel that we don't need "WAL" there.

+++ b/src/bin/pg_waldump/pg_waldump.c
- printf(_(" -e, --end=RECPTR stop reading at WAL location RECPTR\n"));
+ printf(_(" -e, --end=RECPTR stop reading at WAL LSN RECPTR\n"));

Mmm.. "WAL LSN RECPTR" looks strange to me. In the first place I
don't think "RECPTR" is a user-facing term. Doesn't something like the
follows work?

+ printf(_(" -e, --end=WAL-LSN stop reading at WAL-LSN\n"));

In some changes in this patch shorten the main message text of
fprintf-ish functions. That makes the succeeding parameters can be
inlined.

0001: The fix of CreateRestartPoint
0002: Add LSNs to checkpoint logs (the main patch)
0003: Change "location" to "LSN" of pg_controldata
0004: The same as 0003 for other uses of "location".
0005: Unhyphnate "-recovery" terms.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v9-0001-Get-rid-of-unused-path-to-handle-concurrent-check.patch text/x-patch 6.0 KB
v9-0002-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch text/x-patch 3.0 KB
v9-0003-Change-location-to-lsn-in-pg_controldata.patch text/x-patch 2.6 KB
v9-0004-Change-location-to-lsn-in-user-facing-text.patch text/x-patch 15.1 KB
v9-0005-Get-rid-of-uses-of-some-hyphenated-words.patch text/x-patch 4.2 KB

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, bharath(dot)rupireddyforpostgres(at)gmail(dot)com
Cc: nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-10 16:00:03
Message-ID: fa563db8-6f7d-ea8b-1ecf-69be07c099cc@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2022/02/09 11:52, Kyotaro Horiguchi wrote:
> 0001: The fix of CreateRestartPoint

This patch might be ok for the master branch. But since concurrent checkpoint and restartpoint can happen in v14 or before, we would need another patch based on that assumption, for the backport. How about fixing the bug all the branches at first, then apply this patch in the master to improve the code?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-14 05:40:22
Message-ID: 20220214.144022.2233054833762514893.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Fri, 11 Feb 2022 01:00:03 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>
>
> On 2022/02/09 11:52, Kyotaro Horiguchi wrote:
> > 0001: The fix of CreateRestartPoint
>
> This patch might be ok for the master branch. But since concurrent
> checkpoint and restartpoint can happen in v14 or before, we would need
> another patch based on that assumption, for the backport. How about
> fixing the bug all the branches at first, then apply this patch in the
> master to improve the code?

For backbranches, the attached for pg14 does part of the full patch.
Of the following, I think we should do (a) and (b) to make future
backpatchings easier.

a) Use RedoRecPtr and PriorRedoPtr after they are assigned.

b) Move assignment to PriorRedoPtr into the ControlFileLock section.

c) Skip udpate of minRecoveryPoint only when the checkpoint gets old.

d) Skip call to UpdateCheckPointDistanceEstimate() when RedoRecPtr <=
PriorRedoPtr.

# Mmm. The v9-0001 contains a silly mistake here..

Still I'm not sure whether that case really happens and how checkpoint
behaves *after* that happenes, but at least it protects database from
the possible unrecoverable state due to the known issue here..

It doesn't apply even on pg13 (due to LSN_FORMAT_ARGS). I will make
the per-version patches if you are fine with this.

regards.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v2-0001-Correctly-update-contfol-file-at-the-end-of-archi.patch text/x-patch 6.8 KB

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-14 05:42:18
Message-ID: 20220214.144218.213122978786686156.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Mon, 14 Feb 2022 14:40:22 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> It doesn't apply even on pg13 (due to LSN_FORMAT_ARGS). I will make
> the per-version patches if you are fine with this.

Oops! I forgot to rename the patch to avoid confusion on CF-bots.
I'll resend new version soon to avoid the confusion..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-14 05:52:15
Message-ID: 20220214.145215.1633261010375598220.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Mon, 14 Feb 2022 14:42:18 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> I'll resend new version soon to avoid the confusion..

In this version , 0001 gets one fix and two comment updates.

- * Archive recovery have ended. Crash recovery ever after should always
+ * Archive recovery has ended. Crash recovery ever after should always

- /* the second term is just in case */
- if (PriorRedoPtr != InvalidXLogRecPtr || RedoRecPtr > PriorRedoPtr)
+ /*
+ * Update the average distance between checkpoints/restartpoints if the
+ * prior checkpoint exists. The second term is just in case.
+ */
+ if (PriorRedoPtr != InvalidXLogRecPtr && RedoRecPtr > PriorRedoPtr)
UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v10-0001-Get-rid-of-unused-path-to-handle-concurrent-chec.patch text/x-patch 6.1 KB
v10-0002-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-.patch text/x-patch 3.0 KB
v10-0003-Change-location-to-lsn-in-pg_controldata.patch text/x-patch 2.6 KB
v10-0004-Change-location-to-lsn-in-user-facing-text.patch text/x-patch 15.1 KB
v10-0005-Get-rid-of-uses-of-some-hyphenated-words.patch text/x-patch 4.2 KB

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-21 16:59:45
Message-ID: 01b8ecd8-7d95-1bff-7f8c-fac3d530aab7@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2022/02/14 14:40, Kyotaro Horiguchi wrote:
> For backbranches, the attached for pg14 does part of the full patch.

Thanks for updating the patch!

> Of the following, I think we should do (a) and (b) to make future
> backpatchings easier.
>
> a) Use RedoRecPtr and PriorRedoPtr after they are assigned.
>
> b) Move assignment to PriorRedoPtr into the ControlFileLock section.

I failed to understand how (a) and (b) can make the backpatching easier. How easy to backpatch seems the same whether we apply (a) and (b) or not...

> c) Skip udpate of minRecoveryPoint only when the checkpoint gets old.

Yes.

> d) Skip call to UpdateCheckPointDistanceEstimate() when RedoRecPtr <=
> PriorRedoPtr.

But "RedoRecPtr <= PriorRedoPtr" will never happen, will it? Because a restartpoint is skipped at the beginning of CreateRestartPoint() in that case. If this understanding is right, the check of "RedoRecPtr <= PriorRedoPtr" is not necessary before calling UpdateCheckPointDistanceEstimate().

+ ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
+ ControlFile->minRecoveryPointTLI = 0;

Don't we need to update LocalMinRecoveryPoint and LocalMinRecoveryPointTLI after this? Maybe it's not necessary, but ISTM that it's safer and better to always update them whether the state is DB_IN_ARCHIVE_RECOVERY or not.

if (flags & CHECKPOINT_IS_SHUTDOWN)
ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;

Same as above. IMO it's safer and better to always update the state (whether the state is DB_IN_ARCHIVE_RECOVERY or not) if CHECKPOINT_IS_SHUTDOWN flag is passed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-22 08:44:01
Message-ID: 20220222.174401.765586897814316743.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Tue, 22 Feb 2022 01:59:45 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>
> > Of the following, I think we should do (a) and (b) to make future
> > backpatchings easier.
> > a) Use RedoRecPtr and PriorRedoPtr after they are assigned.
> > b) Move assignment to PriorRedoPtr into the ControlFileLock section.
>
> I failed to understand how (a) and (b) can make the backpatching
> easier. How easy to backpatch seems the same whether we apply (a) and
> (b) or not...

That premises that the patch applied to master contains (a) and (b).
So if it doesn't, those are not need by older branches.

> > c) Skip udpate of minRecoveryPoint only when the checkpoint gets old.
>
> Yes.
>
>
> > d) Skip call to UpdateCheckPointDistanceEstimate() when RedoRecPtr <=
> > PriorRedoPtr.
>
> But "RedoRecPtr <= PriorRedoPtr" will never happen, will it? Because a

I didn't believe that it happens. (So, it came from my
convervativeness, or laziness, or both:p) The code dates from 2009 and
StartupXLOG makes a concurrent checkpoint with bgwriter. But as of at
least 9.5, StartupXLOG doesn't directly call CreateCheckPoint. So I
think that won't happen.

So, in short, I agree to remove it or turn it into Assert().

> restartpoint is skipped at the beginning of CreateRestartPoint() in
> that case. If this understanding is right, the check of "RedoRecPtr <=
> PriorRedoPtr" is not necessary before calling
> UpdateCheckPointDistanceEstimate().
>
>
> + ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
> + ControlFile->minRecoveryPointTLI = 0;
>
> Don't we need to update LocalMinRecoveryPoint and
> LocalMinRecoveryPointTLI after this? Maybe it's not necessary, but
> ISTM that it's safer and better to always update them whether the
> state is DB_IN_ARCHIVE_RECOVERY or not.

Agree that it's safer and tidy.

> if (flags & CHECKPOINT_IS_SHUTDOWN)
> ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
>
> Same as above. IMO it's safer and better to always update the state
> (whether the state is DB_IN_ARCHIVE_RECOVERY or not) if
> CHECKPOINT_IS_SHUTDOWN flag is passed.

That means we may exit recovery mode after ShutdownXLOG called
CreateRestartPoint. I don't think that may happen. So I'd rather add
Assert ((flags&CHECKPOINT_IS_SHUTDOWN) == 0) there instaed.

I'll post the new version later.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-25 06:31:12
Message-ID: 20220225.153112.120893179127555595.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Tue, 22 Feb 2022 17:44:01 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> At Tue, 22 Feb 2022 01:59:45 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
> >
> > > Of the following, I think we should do (a) and (b) to make future
> > > backpatchings easier.
> > > a) Use RedoRecPtr and PriorRedoPtr after they are assigned.
> > > b) Move assignment to PriorRedoPtr into the ControlFileLock section.
> >
> > I failed to understand how (a) and (b) can make the backpatching
> > easier. How easy to backpatch seems the same whether we apply (a) and
> > (b) or not...
>
> That premises that the patch applied to master contains (a) and (b).
> So if it doesn't, those are not need by older branches.

I was once going to remove them. But according the discussion below,
the patch for back-patching is now quite close to that for the master
branch. So I left them alone.

> > > d) Skip call to UpdateCheckPointDistanceEstimate() when RedoRecPtr <=
> > > PriorRedoPtr.
> >
> > But "RedoRecPtr <= PriorRedoPtr" will never happen, will it? Because a
>
> I didn't believe that it happens. (So, it came from my
> convervativeness, or laziness, or both:p) The code dates from 2009 and
> StartupXLOG makes a concurrent checkpoint with bgwriter. But as of at
> least 9.5, StartupXLOG doesn't directly call CreateCheckPoint. So I
> think that won't happen.
>
> So, in short, I agree to remove it or turn it into Assert().

It was a bit out of point. If we assume RedoRecPtr is always larger
than PriorRedoPtr and then we don't need to check that there, we
should also remove the "if (PriorRedoPtr < RedoRecPtr)" branch just
above, which means the patch for back-branches gets very close to that
for the master. Do we make such a large change on back branches?
Anyways this version once takes that way.

> > if (flags & CHECKPOINT_IS_SHUTDOWN)
> > ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
> >
> > Same as above. IMO it's safer and better to always update the state
> > (whether the state is DB_IN_ARCHIVE_RECOVERY or not) if
> > CHECKPOINT_IS_SHUTDOWN flag is passed.
>
> That means we may exit recovery mode after ShutdownXLOG called
> CreateRestartPoint. I don't think that may happen. So I'd rather add
> Assert ((flags&CHECKPOINT_IS_SHUTDOWN) == 0) there instaed.

So this version for v14 gets updated in the following points.

Completely removed the code path for the case some other process runs
simultaneous checkpoint.

Removed the condition (RedoRecPtr > PriorRedoPtr) for
UpdateCheckPointDistanceEstimate() call.

Added an assertion to the recoery-end path.

# Honestly I feel this is a bit too much for back-patching, though.

While making patches for v12, I see a test failure of pg_rewind for
uncertain reason. I'm investigating that but I post this for
discussion.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-25 07:06:31
Message-ID: 20220225.160631.285270596121353805.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Fri, 25 Feb 2022 15:31:12 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> While making patches for v12, I see a test failure of pg_rewind for
> uncertain reason. I'm investigating that but I post this for
> discussion.

Hmm. Too stupid. Somehow I overly removed the latchet condition for
minRecoveryPoint. So the same patch worked for v12.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-25 07:47:01
Message-ID: 20220225.164701.2129889180141558946.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Fri, 25 Feb 2022 16:06:31 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> At Fri, 25 Feb 2022 15:31:12 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> > While making patches for v12, I see a test failure of pg_rewind for
> > uncertain reason. I'm investigating that but I post this for
> > discussion.
>
> Hmm. Too stupid. Somehow I overly removed the latchet condition for
> minRecoveryPoint. So the same patch worked for v12.

So, this is the patches for pg12-10. 11 can share the same patch with
12. 10 has differences in two points.

10 has ControlFile->prevCheckPoint.

The DETAILS of the "recovery restart point at" message is not
capitalized. But I suppose it is so close to EOL so that we don't
want to "fix" it risking existing usecases.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-25 07:52:30
Message-ID: 20220225.165230.1749382100044379365.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Fri, 25 Feb 2022 16:47:01 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> So, this is the patches for pg12-10. 11 can share the same patch with
> 12. 10 has differences in two points.
>
> 10 has ControlFile->prevCheckPoint.
>
> The DETAILS of the "recovery restart point at" message is not
> capitalized. But I suppose it is so close to EOL so that we don't
> want to "fix" it risking existing usecases.

Ugh! Wait for a moment. Something's wrong.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-02-25 08:14:12
Message-ID: 20220225.171412.1098540050124370659.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Fri, 25 Feb 2022 16:52:30 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> Ugh! Wait for a moment. Something's wrong.

Sorry, what is wrong was my working directory. It was broken by my
bogus operation. All the files apply corresponding versions correctly.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-03-04 05:10:38
Message-ID: 20220304.141038.514673958958319954.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Mon, 14 Feb 2022 14:52:15 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> In this version , 0001 gets one fix and two comment updates.

While the disucssion on back-patching of 0001 proceeding on another
branch, the main patch iself gets looks like as if rotten on
CF-App. So I rebased v10 on the current master. 0001 is replaced by
an adjusted patch based on the latest "control file update fix" patch.

If someone wants to voice on the message-fix patches (0002-0004), be
our guest. 0005 also wants opinions.

0001: Fix possible incorrect controlfile update that leads to
unrecoverable database.

0002: Add REDO/Checkpiont LSNs to checkpoinkt-end log message.
(The main patch in this thread)

0003: Replace (WAL-)location to LSN in pg_controldata.

0004: Replace (WAL-)location to LSN in user-facing texts.
(This doesn't reflect my recent comments.)

0005: Unhyphenate the word archive-recovery and similars.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v11-0001-Correctly-update-contfol-file-at-the-end-of-arch.patch text/x-patch 5.9 KB
v11-0002-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-.patch text/x-patch 3.0 KB
v11-0003-Change-location-to-lsn-in-pg_controldata.patch text/x-patch 2.6 KB
v11-0004-Change-location-to-lsn-in-user-facing-text.patch text/x-patch 15.5 KB
v11-0005-Get-rid-of-uses-of-some-hyphenated-words.patch text/x-patch 3.1 KB

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-03-15 06:49:47
Message-ID: CALj2ACX7XOuOs1dnk93vtez7ULmiPvU0YvAQ7bJnsqQBrUgGRg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Mar 4, 2022 at 10:40 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Mon, 14 Feb 2022 14:52:15 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> > In this version , 0001 gets one fix and two comment updates.
>
> While the disucssion on back-patching of 0001 proceeding on another
> branch, the main patch iself gets looks like as if rotten on
> CF-App. So I rebased v10 on the current master. 0001 is replaced by
> an adjusted patch based on the latest "control file update fix" patch.
>
> 0001: Fix possible incorrect controlfile update that leads to
> unrecoverable database.

0001 - I don't think you need to do this as UpdateControlFile
(update_controlfile) will anyway update it, no?
+ /* Update control file using current time */
+ ControlFile->time = (pg_time_t) time(NULL);

> 0002: Add REDO/Checkpiont LSNs to checkpoinkt-end log message.
> (The main patch in this thread)

0002 - If at all the intention is to say that no ControlFileLock is
required while reading ControlFile->checkPoint and
ControlFile->checkPointCopy.redo, let's say it, no? How about
something like "No ControlFileLock is required while reading
ControlFile->checkPoint and ControlFile->checkPointCopy.redo as there
can't be any other process updating them concurrently."?

+ /* we are the only updator of these variables */
+ LSN_FORMAT_ARGS(ControlFile->checkPoint),
+ LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo))));

> 0003: Replace (WAL-)location to LSN in pg_controldata.
>
> 0004: Replace (WAL-)location to LSN in user-facing texts.
> (This doesn't reflect my recent comments.)

If you don't mind, can you please put the comments here?

> 0005: Unhyphenate the word archive-recovery and similars.

0005 - How about replacing "crash-recovery" to "crash recovery" in
postgres-ref.sgml too?

Regards,
Bharath Rupireddy.


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bharath(dot)rupireddyforpostgres(at)gmail(dot)com
Cc: masao(dot)fujii(at)oss(dot)nttdata(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-03-15 08:23:40
Message-ID: 20220315.172340.1059971522574284501.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Tue, 15 Mar 2022 12:19:47 +0530, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote in
> On Fri, Mar 4, 2022 at 10:40 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> 0001 - I don't think you need to do this as UpdateControlFile
> (update_controlfile) will anyway update it, no?
> + /* Update control file using current time */
> + ControlFile->time = (pg_time_t) time(NULL);

Ugh.. Yes. It is a copy-pasto from older versions. They may have the
same copy-pasto..

> > 0002: Add REDO/Checkpiont LSNs to checkpoinkt-end log message.
> > (The main patch in this thread)
>
> 0002 - If at all the intention is to say that no ControlFileLock is
> required while reading ControlFile->checkPoint and
> ControlFile->checkPointCopy.redo, let's say it, no? How about
> something like "No ControlFileLock is required while reading
> ControlFile->checkPoint and ControlFile->checkPointCopy.redo as there
> can't be any other process updating them concurrently."?
>
> + /* we are the only updator of these variables */
> + LSN_FORMAT_ARGS(ControlFile->checkPoint),
> + LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo))));

I thought the comment explains that. But it would be better to be more
specific. It is changed as the follows.

> * ControlFileLock is not required as we are the only
> * updator of these variables.

> > 0003: Replace (WAL-)location to LSN in pg_controldata.
> >
> > 0004: Replace (WAL-)location to LSN in user-facing texts.
> > (This doesn't reflect my recent comments.)
>
> If you don't mind, can you please put the comments here?

Okay. It's the following message.

/message-id/20220209.115204.1794224638476710282.horikyota.ntt@gmail.com

> The old 0003 (attached 0004):
>
>
>
> +++ b/src/backend/access/rmgrdesc/xlogdesc.c
> - appendStringInfo(buf, "redo %X/%X; "
> + appendStringInfo(buf, "redo lsn %X/%X; "
>
>
>
> It is shown in the context of a checkpoint record, so I think it is
> not needed or rather lengthning the dump line uselessly.
>
>
>
> +++ b/src/backend/access/transam/xlog.c
> - (errmsg("request to flush past end of generated WAL; request %X/%X, current position %X/%X",
> + (errmsg("request to flush past end of generated WAL; request lsn %X/%X, current lsn %X/%X",
>
>
>
> +++ b/src/backend/replication/walsender.c
> - (errmsg("requested starting point %X/%X is ahead of the WAL flush position of this server %X/%X",
> + (errmsg("requested starting point %X/%X is ahead of the WAL flush LSN of this server %X/%X",
>
>
>
> "WAL" is upper-cased. So it seems rather strange that the "lsn" is
> lower-cased. In the first place the message doesn't look like a
> user-facing error message and I feel we don't need position or lsn
> there..
>
>
>
> +++ b/src/bin/pg_rewind/pg_rewind.c
> - pg_log_info("servers diverged at WAL location %X/%X on timeline %u",
> + pg_log_info("servers diverged at WAL LSN %X/%X on timeline %u",
>
>
>
> I feel that we don't need "WAL" there.
>
>
>
> +++ b/src/bin/pg_waldump/pg_waldump.c
> - printf(_(" -e, --end=RECPTR stop reading at WAL location RECPTR\n"));
> + printf(_(" -e, --end=RECPTR stop reading at WAL LSN RECPTR\n"));
>
>
>
> Mmm.. "WAL LSN RECPTR" looks strange to me. In the first place I
> don't think "RECPTR" is a user-facing term. Doesn't something like the
> follows work?
>
>
>
> + printf(_(" -e, --end=WAL-LSN stop reading at WAL-LSN\n"));
>
>
>
> In some changes in this patch shorten the main message text of
> fprintf-ish functions. That makes the succeeding parameters can be
> inlined.

> > 0005: Unhyphenate the word archive-recovery and similars.
>
> 0005 - How about replacing "crash-recovery" to "crash recovery" in
> postgres-ref.sgml too?

Oh, that's a left-over. Fixed. Thanks!

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v12-0001-Correctly-update-contfol-file-at-the-end-of-arch.patch text/x-patch 5.8 KB
v12-0002-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-.patch text/x-patch 3.1 KB
v12-0003-Change-location-to-lsn-in-pg_controldata.patch text/x-patch 2.6 KB
v12-0004-Change-location-to-lsn-in-user-facing-text.patch text/x-patch 15.5 KB
v12-0005-Get-rid-of-uses-of-some-hyphenated-words.patch text/x-patch 3.8 KB

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, masao(dot)fujii(at)oss(dot)nttdata(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-03-15 09:26:26
Message-ID: YjBbwiZSRsGzgbcW@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Mar 15, 2022 at 05:23:40PM +0900, Kyotaro Horiguchi wrote:
> At Tue, 15 Mar 2022 12:19:47 +0530, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote in
>> On Fri, Mar 4, 2022 at 10:40 AM Kyotaro Horiguchi
>> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>> 0001 - I don't think you need to do this as UpdateControlFile
>> (update_controlfile) will anyway update it, no?
>> + /* Update control file using current time */
>> + ControlFile->time = (pg_time_t) time(NULL);
>
> Ugh.. Yes. It is a copy-pasto from older versions. They may have the
> same copy-pasto..

This thread has shifted to an entirely different discussion,
presenting patches that touch code paths unrelated to what was first
stated. Shouldn't you create a new thread with a proper $subject to
attract a more correct audience?
--
Michael


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: michael(at)paquier(dot)xyz
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, masao(dot)fujii(at)oss(dot)nttdata(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-03-16 00:19:13
Message-ID: 20220316.091913.806120467943749797.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Tue, 15 Mar 2022 18:26:26 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in
> On Tue, Mar 15, 2022 at 05:23:40PM +0900, Kyotaro Horiguchi wrote:
> > At Tue, 15 Mar 2022 12:19:47 +0530, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote in
> >> On Fri, Mar 4, 2022 at 10:40 AM Kyotaro Horiguchi
> >> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> >> 0001 - I don't think you need to do this as UpdateControlFile
> >> (update_controlfile) will anyway update it, no?
> >> + /* Update control file using current time */
> >> + ControlFile->time = (pg_time_t) time(NULL);
> >
> > Ugh.. Yes. It is a copy-pasto from older versions. They may have the
> > same copy-pasto..
>
> This thread has shifted to an entirely different discussion,
> presenting patches that touch code paths unrelated to what was first
> stated. Shouldn't you create a new thread with a proper $subject to
> attract a more correct audience?

I felt the same since some messages ago. I thought Fujii-san thought
that he wants to fix the CreateRestartPoint issue before the
checkpoint log patch but he looks like busy these days.

Since the CreateRestartPoint issue is orthogonal to the main patch,
I'll separate that part into anther thread.

This thread is discussing one other topic, wordings in user-facing
texts. This is also orthogonal (3-dimentionally?) to the two topics.

In short, I split out the two topics other than checkpoint log to
other threads.

Thanks for cueing me to do that!

regareds.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: michael(at)paquier(dot)xyz
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, masao(dot)fujii(at)oss(dot)nttdata(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-03-16 01:29:47
Message-ID: 20220316.102947.378283819592596734.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Wed, 16 Mar 2022 09:19:13 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> In short, I split out the two topics other than checkpoint log to
> other threads.

So, this is about the main topic of this thread, adding LSNs to
checkpint log. Other topics have moved to other treads [1], [2] ,
[3].

I think this is no longer controversial alone. So this patch is now
really Read-for-Commiter and is waiting to be picked up.

regards.

[1] /message-id/20220316.102444.2193181487576617583.horikyota.ntt%40gmail.com
[2] /message-id/20220316.102900.2003692961119672246.horikyota.ntt%40gmail.com
[3] /message-id/20220316.102509.785466054344164656.horikyota.ntt%40gmail.com

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v13-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-.patch text/x-patch 3.1 KB

From: Greg Stark <stark(at)mit(dot)edu>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, masao(dot)fujii(at)oss(dot)nttdata(dot)com, nathandbossart(at)gmail(dot)com, Stephen Frost <sfrost(at)snowman(dot)net>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-03-28 19:31:16
Message-ID: CAM-w4HP6UquxPZuWr_Yh0xh+1KaUAwPnLe6NSYFx-=RxgLWC4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

This patch is currently showing up with a test failure in the CFBot
however I do *not* believe this is a bug in the patch. I think it's a
bug in that test which is being discussed elsewhere.

It's also a very short and straightforward patch that a committer
could probably make a decision about whether it's a good idea or not
and then apply it quickly if so.

Just to give people a leg up and an idea how short the patch is...
Here's the entire patch:

diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index ed16f279b1..b85c76d8f8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6121,7 +6121,8 @@ LogCheckpointEnd(bool restartpoint)
"%d WAL file(s) added, %d removed, %d recycled; "
"write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
"sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
- "distance=%d kB, estimate=%d kB",
+ "distance=%d kB, estimate=%d kB; "
+ "lsn=%X/%X, redo lsn=%X/%X",
CheckpointStats.ckpt_bufs_written,
(double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
CheckpointStats.ckpt_segs_added,
@@ -6134,14 +6135,21 @@ LogCheckpointEnd(bool restartpoint)
longest_msecs / 1000, (int) (longest_msecs % 1000),
average_msecs / 1000, (int) (average_msecs % 1000),
(int) (PrevCheckPointDistance / 1024.0),
- (int) (CheckPointDistanceEstimate / 1024.0))));
+ (int) (CheckPointDistanceEstimate / 1024.0),
+ /*
+ * ControlFileLock is not required as we are the only
+ * updator of these variables.
+ */
+ LSN_FORMAT_ARGS(ControlFile->checkPoint),
+ LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo))));
else
ereport(LOG,
(errmsg("checkpoint complete: wrote %d buffers (%.1f%%); "
"%d WAL file(s) added, %d removed, %d recycled; "
"write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
"sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
- "distance=%d kB, estimate=%d kB",
+ "distance=%d kB, estimate=%d kB; "
+ "lsn=%X/%X, redo lsn=%X/%X",
CheckpointStats.ckpt_bufs_written,
(double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
CheckpointStats.ckpt_segs_added,
@@ -6154,7 +6162,13 @@ LogCheckpointEnd(bool restartpoint)
longest_msecs / 1000, (int) (longest_msecs % 1000),
average_msecs / 1000, (int) (average_msecs % 1000),
(int) (PrevCheckPointDistance / 1024.0),
- (int) (CheckPointDistanceEstimate / 1024.0))));
+ (int) (CheckPointDistanceEstimate / 1024.0),
+ /*
+ * ControlFileLock is not required as we are the only
+ * updator of these variables.
+ */
+ LSN_FORMAT_ARGS(ControlFile->checkPoint),
+ LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo))));
}

/*
--
2.27.0


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, michael(at)paquier(dot)xyz
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-07-06 16:11:33
Message-ID: 9a762503-79c9-e7aa-191a-9bfa0d77cc38@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2022/03/16 10:29, Kyotaro Horiguchi wrote:
> At Wed, 16 Mar 2022 09:19:13 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
>> In short, I split out the two topics other than checkpoint log to
>> other threads.
>
> So, this is about the main topic of this thread, adding LSNs to
> checkpint log. Other topics have moved to other treads [1], [2] ,
> [3].
>
> I think this is no longer controversial alone. So this patch is now
> really Read-for-Commiter and is waiting to be picked up.

+1

+ * ControlFileLock is not required as we are the only
+ * updator of these variables.

Isn't it better to add "at this time" or something at the end of the comment because only we're not always updator of them? No?

Barring any objection, I'm thinking to apply the above small change and commit the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: michael(at)paquier(dot)xyz, bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-07-07 07:26:59
Message-ID: 20220707.162659.422073547959031915.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Thu, 7 Jul 2022 01:11:33 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>
>
> On 2022/03/16 10:29, Kyotaro Horiguchi wrote:
> > At Wed, 16 Mar 2022 09:19:13 +0900 (JST), Kyotaro Horiguchi
> > <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> >> In short, I split out the two topics other than checkpoint log to
> >> other threads.
> > So, this is about the main topic of this thread, adding LSNs to
> > checkpint log. Other topics have moved to other treads [1], [2] ,
> > [3].
> > I think this is no longer controversial alone. So this patch is now
> > really Read-for-Commiter and is waiting to be picked up.
>
> +1
>
> + * ControlFileLock is not required as we are the only
> + * updator of these variables.
>
> Isn't it better to add "at this time" or something at the end of the
> comment because only we're not always updator of them? No?

Excluding initialization, (I believe) checkpointer is really the only
updator of the variables/members. But I'm fine with the addition.

> Barring any objection, I'm thinking to apply the above small change
> and commit the patch.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: michael(at)paquier(dot)xyz, bharath(dot)rupireddyforpostgres(at)gmail(dot)com, nathandbossart(at)gmail(dot)com, sfrost(at)snowman(dot)net, bossartn(at)amazon(dot)com, rjuju123(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Date: 2022-07-07 13:45:39
Message-ID: 157f5f76-8337-dc29-8e3c-ea59fb56b823@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2022/07/07 16:26, Kyotaro Horiguchi wrote:
>> + * ControlFileLock is not required as we are the only
>> + * updator of these variables.
>>
>> Isn't it better to add "at this time" or something at the end of the
>> comment because only we're not always updator of them? No?
>
> Excluding initialization, (I believe) checkpointer is really the only
> updator of the variables/members. But I'm fine with the addition.

Ok, so I modified the patch slightly and pushed it. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION