Lists: | pgsql-bugs |
---|
From: | PG Bug reporting form <noreply(at)postgresql(dot)org> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Cc: | xu(dot)xw2008(at)163(dot)com |
Subject: | BUG #17903: There is a bug in the KeepLogSeg() |
Date: | 2023-04-19 10:26:13 |
Message-ID: | 17903-4288d439dee856c6@postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 17903
Logged by: xu xingwang
Email address: xu(dot)xw2008(at)163(dot)com
PostgreSQL version: 13.3
Operating system: openEuler
Description:
Hi,
I found that KeepLogSeg() has a piece of code that is not correctly.
segno may be larger than currSegNo, since the slot_keep_segs variable is of
type "uint64", in this case the code "if (currSegNo - segno >
slot_keep_segs)" is incorrect.
"if (currSegNo - segno < keep_segs)" is also the same.
Checkpoint calls the KeepLogSeg function, and there are many operations
between recptr and XLogGetReplicationSlotMinimumLSN, including updating the
pg_control file, so segno may be larger than currSegNo.
KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
{
XLogSegNo currSegNo;
XLogSegNo segno;
XLogRecPtr keep;
XLByteToSeg(recptr, currSegNo, wal_segment_size);
segno = currSegNo;
/*
* Calculate how many segments are kept by slots first, adjusting for
* max_slot_wal_keep_size.
*/
keep = XLogGetReplicationSlotMinimumLSN();
if (keep != InvalidXLogRecPtr)
{
XLByteToSeg(keep, segno, wal_segment_size);
/* Cap by max_slot_wal_keep_size ... */
if (max_slot_wal_keep_size_mb >= 0)
{
uint64 slot_keep_segs;
slot_keep_segs =
ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);
if (currSegNo - segno > slot_keep_segs)
segno = currSegNo - slot_keep_segs;
}
}
/* but, keep at least wal_keep_size if that's set */
if (wal_keep_size_mb > 0)
{
uint64 keep_segs;
keep_segs = ConvertToXSegs(wal_keep_size_mb, wal_segment_size);
if (currSegNo - segno < keep_segs)
{
/* avoid underflow, don't go below 1 */
if (currSegNo <= keep_segs)
segno = 1;
else
segno = currSegNo - keep_segs;
}
}
regards.
--
xu xingwang
From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | xu(dot)xw2008(at)163(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17903: There is a bug in the KeepLogSeg() |
Date: | 2023-04-20 03:04:17 |
Message-ID: | 20230420.120417.1609083651022565895.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
At Wed, 19 Apr 2023 10:26:13 +0000, PG Bug reporting form <noreply(at)postgresql(dot)org> wrote in
> I found that KeepLogSeg() has a piece of code that is not correctly.
>
> segno may be larger than currSegNo, since the slot_keep_segs variable is of
> type "uint64", in this case the code "if (currSegNo - segno >
> slot_keep_segs)" is incorrect.
>
> "if (currSegNo - segno < keep_segs)" is also the same.
>
> Checkpoint calls the KeepLogSeg function, and there are many operations
> between recptr and XLogGetReplicationSlotMinimumLSN, including updating the
> pg_control file, so segno may be larger than currSegNo.
Correct. Thanks for the report.
If checkpointer somehow takes a long time between inserting a
checkpoint record and removing WAL files, while replication advances a
certain distnace, it can actually happen. Although that behavior
doesn't directly affect max_slot_wal_keep_size, it does disrupt the
effect of wal_keep_size.
The thinko was that we incorrectly assumed the slot minimum LSN can't
be larger than the checkpoint record LSN. We don't need to consider
max_slot_wal_keep_size if the slot minimum LSN is already larger than
currSegNo.
The attached fix works. However, I can't come up with a reasonable
testing script.
This dates back to 13, where max_slot_wal_keep_size was introduced.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
0001-Fix-incorrect-calculation-regarding-max_slot_wal_kee.patch | text/x-patch | 1.2 KB |
From: | Junwang Zhao <zhjwpku(at)gmail(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | xu(dot)xw2008(at)163(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17903: There is a bug in the KeepLogSeg() |
Date: | 2023-04-20 07:40:14 |
Message-ID: | CAEG8a3L2BB5k9USO4fC_wQjULGX-1rqaX2JSHn1_0vco8KB28Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
If `segno` can be larger than `currSegNo`, your patch seems to miss
the following branch,
are you missing this for some particular reason?
```
/* but, keep at least wal_keep_size if that's set */
if (wal_keep_size_mb > 0)
{
uint64 keep_segs;
keep_segs = ConvertToXSegs(wal_keep_size_mb, wal_segment_size);
if (currSegNo - segno < keep_segs) <<<< see here
{
/* avoid underflow, don't go below 1 */
if (currSegNo <= keep_segs)
segno = 1;
else
segno = currSegNo - keep_segs;
}
}
```
On Thu, Apr 20, 2023 at 11:04 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Wed, 19 Apr 2023 10:26:13 +0000, PG Bug reporting form <noreply(at)postgresql(dot)org> wrote in
> > I found that KeepLogSeg() has a piece of code that is not correctly.
> >
> > segno may be larger than currSegNo, since the slot_keep_segs variable is of
> > type "uint64", in this case the code "if (currSegNo - segno >
> > slot_keep_segs)" is incorrect.
> >
> > "if (currSegNo - segno < keep_segs)" is also the same.
> >
> > Checkpoint calls the KeepLogSeg function, and there are many operations
> > between recptr and XLogGetReplicationSlotMinimumLSN, including updating the
> > pg_control file, so segno may be larger than currSegNo.
>
> Correct. Thanks for the report.
>
> If checkpointer somehow takes a long time between inserting a
> checkpoint record and removing WAL files, while replication advances a
> certain distnace, it can actually happen. Although that behavior
> doesn't directly affect max_slot_wal_keep_size, it does disrupt the
> effect of wal_keep_size.
>
> The thinko was that we incorrectly assumed the slot minimum LSN can't
> be larger than the checkpoint record LSN. We don't need to consider
> max_slot_wal_keep_size if the slot minimum LSN is already larger than
> currSegNo.
>
> The attached fix works. However, I can't come up with a reasonable
> testing script.
>
> This dates back to 13, where max_slot_wal_keep_size was introduced.
>
> regards.
>
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
--
Regards
Junwang Zhao
From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | zhjwpku(at)gmail(dot)com |
Cc: | xu(dot)xw2008(at)163(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17903: There is a bug in the KeepLogSeg() |
Date: | 2023-04-20 08:58:14 |
Message-ID: | 20230420.175814.1157027125103997227.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
At Thu, 20 Apr 2023 15:40:14 +0800, Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote in
> If `segno` can be larger than `currSegNo`, your patch seems to miss
> the following branch,
> are you missing this for some particular reason?
Oops! Sorry for the mistake and thanks for pointing it out.
I should have kept segno within the reasonable range.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Fix-incorrect-calculation-regarding-max_slot_wal_.patch | text/x-patch | 1.2 KB |
From: | Junwang Zhao <zhjwpku(at)gmail(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | xu(dot)xw2008(at)163(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17903: There is a bug in the KeepLogSeg() |
Date: | 2023-04-20 11:17:02 |
Message-ID: | CAEG8a3Kx2tO4d0kaDH8-75LthUtOc_A2HEZFpkgGcQ4NB1A=vA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
This patch looks reasonable.
+1
On Thu, Apr 20, 2023 at 4:58 PM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Thu, 20 Apr 2023 15:40:14 +0800, Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote in
> > If `segno` can be larger than `currSegNo`, your patch seems to miss
> > the following branch,
> > are you missing this for some particular reason?
>
> Oops! Sorry for the mistake and thanks for pointing it out.
>
> I should have kept segno within the reasonable range.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
--
Regards
Junwang Zhao
From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | zhjwpku(at)gmail(dot)com, xu(dot)xw2008(at)163(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17903: There is a bug in the KeepLogSeg() |
Date: | 2023-04-20 22:02:42 |
Message-ID: | 20230420220242.GB1435672@nathanxps13 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Thu, Apr 20, 2023 at 05:58:14PM +0900, Kyotaro Horiguchi wrote:
> + /*
> + * Slot minimum LSN could be greater than recptr. In such cases, no
> + * segments are protected by slots but we still need to keep segno in a
> + * reasonable range for subsequent calculations to avoid underflow.
> + */
> + if (segno > currSegNo)
> + segno = currSegNo;
> +
I wonder if it'd be better to instead change
if (currSegNo - segno > slot_keep_segs)
to
if (currSegNo > segno + slot_keep_segs)
and
if (currSegNo - segno < keep_segs)
to
if (currSegNo < segNo + keep_segs)
If segno > currSegNo, the first conditional would be false as expected, and
the second would be true as expected. We could also use
pg_sub_u64_overflow() to detect underflow, but that might be excessive in
this case.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | nathandbossart(at)gmail(dot)com |
Cc: | zhjwpku(at)gmail(dot)com, xu(dot)xw2008(at)163(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17903: There is a bug in the KeepLogSeg() |
Date: | 2023-04-21 01:42:31 |
Message-ID: | 20230421.104231.1594652765237022266.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
At Thu, 20 Apr 2023 15:02:42 -0700, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote in
> I wonder if it'd be better to instead change
>
> if (currSegNo - segno > slot_keep_segs)
> to
> if (currSegNo > segno + slot_keep_segs)
>
> and
>
> if (currSegNo - segno < keep_segs)
> to
> if (currSegNo < segNo + keep_segs)
>
> If segno > currSegNo, the first conditional would be false as expected, and
> the second would be true as expected. We could also use
> pg_sub_u64_overflow() to detect underflow, but that might be excessive in
> this case.
From what I understand, the XLogSegNo calculations are designed
without considering the actual value range. Basically, it assumes that
(XLogSegNo + <any positive int>) can overflow. If we take the actual
value range into account, we can make that change.
The choice lies on whether we assume the actual value range or not.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | zhjwpku(at)gmail(dot)com, xu(dot)xw2008(at)163(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17903: There is a bug in the KeepLogSeg() |
Date: | 2023-04-21 21:43:47 |
Message-ID: | 20230421214347.GA1441786@nathanxps13 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Fri, Apr 21, 2023 at 10:42:31AM +0900, Kyotaro Horiguchi wrote:
> From what I understand, the XLogSegNo calculations are designed
> without considering the actual value range. Basically, it assumes that
> (XLogSegNo + <any positive int>) can overflow. If we take the actual
> value range into account, we can make that change.
>
> The choice lies on whether we assume the actual value range or not.
Yeah, after staring at this some more, I think your proposed fix is the way
to go. Alternatively, we could adjust the conditional for the
max_slot_wal_keep_size block to
if (keep != InvalidXLogRecPtr && keep < recptr)
but AFAICT that yields the exact same behavior.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | zhjwpku(at)gmail(dot)com, xu(dot)xw2008(at)163(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17903: There is a bug in the KeepLogSeg() |
Date: | 2023-04-24 19:14:52 |
Message-ID: | 20230424191452.GA1698829@nathanxps13 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Here is a new version of the patch. It is fundamentally the same as v2,
but I've adjusted the comment and commit message a bit. Barring
objections, I am planning to commit this (and back-patch to v13) in the
near future.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Prevent-underflow-in-KeepLogSeg.patch | text/x-diff | 1.6 KB |
From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | nathandbossart(at)gmail(dot)com |
Cc: | zhjwpku(at)gmail(dot)com, xu(dot)xw2008(at)163(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17903: There is a bug in the KeepLogSeg() |
Date: | 2023-04-25 04:14:52 |
Message-ID: | 20230425.131452.517547289278432754.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
At Mon, 24 Apr 2023 12:14:52 -0700, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote in
> Here is a new version of the patch. It is fundamentally the same as v2,
> but I've adjusted the comment and commit message a bit. Barring
> objections, I am planning to commit this (and back-patch to v13) in the
> near future.
> When determining the oldest segment that must be kept for
> replication slots, KeepLogSeg() might calculate a segment number
> ahead of the one for the LSN given to the function. This causes
Maybe the KeepLogSeg() is a mistake of
XLogGetReplicationSlotMinimumLSN()?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | zhjwpku(at)gmail(dot)com, xu(dot)xw2008(at)163(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17903: There is a bug in the KeepLogSeg() |
Date: | 2023-04-27 21:47:46 |
Message-ID: | 20230427214746.GA1986346@nathanxps13 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Tue, Apr 25, 2023 at 01:14:52PM +0900, Kyotaro Horiguchi wrote:
> At Mon, 24 Apr 2023 12:14:52 -0700, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote in
>> When determining the oldest segment that must be kept for
>> replication slots, KeepLogSeg() might calculate a segment number
>> ahead of the one for the LSN given to the function. This causes
>
> Maybe the KeepLogSeg() is a mistake of
> XLogGetReplicationSlotMinimumLSN()?
I adjusted the commit message to call out that function explicitly. Also,
I decided to go with the "keep < recptr" approach since there's no reason
to do anything in that block otherwise.
Thanks!
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
Cc: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, zhjwpku(at)gmail(dot)com, xu(dot)xw2008(at)163(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17903: There is a bug in the KeepLogSeg() |
Date: | 2023-04-27 23:08:01 |
Message-ID: | ZEsAUeAsD6zvvKhc@paquier.xyz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Thu, Apr 27, 2023 at 02:47:46PM -0700, Nathan Bossart wrote:
> I adjusted the commit message to call out that function explicitly. Also,
> I decided to go with the "keep < recptr" approach since there's no reason
> to do anything in that block otherwise.
b726236 exists in the tree, but it seems like the message of
pgsql-committers is held on moderation?
--
Michael
From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | nathandbossart(at)gmail(dot)com |
Cc: | zhjwpku(at)gmail(dot)com, xu(dot)xw2008(at)163(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17903: There is a bug in the KeepLogSeg() |
Date: | 2023-04-28 06:21:49 |
Message-ID: | 20230428.152149.1957059339218200205.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | PostgreSQL : PostgreSQL 메일 링리스트 : 2023-04-28 이후 PGSQL 윈 토토 06:21 |
At Thu, 27 Apr 2023 14:47:46 -0700, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote in
> On Tue, Apr 25, 2023 at 01:14:52PM +0900, Kyotaro Horiguchi wrote:
> > At Mon, 24 Apr 2023 12:14:52 -0700, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote in
> >> When determining the oldest segment that must be kept for
> >> replication slots, KeepLogSeg() might calculate a segment number
> >> ahead of the one for the LSN given to the function. This causes
> >
> > Maybe the KeepLogSeg() is a mistake of
> > XLogGetReplicationSlotMinimumLSN()?
>
> I adjusted the commit message to call out that function explicitly. Also,
> I decided to go with the "keep < recptr" approach since there's no reason
> to do anything in that block otherwise.
That works, too. Thanks!
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center