Re: Don't try fetching future segment of a TLI.

Lists: pgsql-bugs pgsql-hackers
From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: psuderevsky(at)gmail(dot)com
Subject: BUG #16159: recovery requests WALs for the next timelines before timeline switch LSN has been reached
Date: 2019-12-11 12:39:20
Message-ID: 16159-f5a34a3a04dc67e0@postgresql.org

The following bug has been logged on the website:

Bug reference: 16159
Logged by: Pavel Suderevsky
Email address: psuderevsky(at)gmail(dot)com
PostgreSQL version: 11.6
Operating system: CentOS 7.6.1810 (3.10.0-957.el7.x86_64)
Description:

Reproduced 11.2,11.6.

If PostgreSQL starts recovery and finds a history file for a timeline higher
than the current one, it will first request the segment file for the future
timeline (which most likely doesn't exist yet) and only then request the
segment file for the current timeline.
If the archive is located on remote storage, it can take a long time to
discover that the segments for the future timelines do not exist yet, so
recovery can take far longer than necessary.

Example:

recovery.conf:
>restore_command = 'echo -e "Searching WAL: %f, location: %p";
/usr/bin/pgbackrest --stanza=platform archive-get %f "%p"'
>recovery_target_timeline = 'latest'
>standby_mode = 'on'

Postgres log during startup:
>
> 2019-12-06 07:11:16 CST LOG: database system was shut down in recovery
> at 2019-12-06 07:11:08 CST
> Searching WAL: 00000022.history, location: pg_wal/RECOVERYHISTORY
> 2019-12-06 07:11:16 CST LOG: restored log file "00000022.history" from
> archive
> Searching WAL: 00000023.history, location: pg_wal/RECOVERYHISTORY
> 2019-12-06 07:11:16 CST LOG: entering standby mode
> Searching WAL: 00000022.history, location: pg_wal/RECOVERYHISTORY
> 2019-12-06 07:11:16 CST LOG: restored log file "00000022.history" from
> archive
> Searching WAL: 00000022000018C60000003F, location: pg_wal/RECOVERYXLOG
> Searching WAL: 00000021000018C60000003F, location: pg_wal/RECOVERYXLOG
> 2019-12-06 07:11:20 CST LOG: restored log file
> "00000021000018C60000003F" from archive
> Searching WAL: 00000021.history, location: pg_wal/RECOVERYHISTORY
> 2019-12-06 07:11:20 CST LOG: restored log file "00000021.history" from
> archive
> Searching WAL: 00000022000018BF0000001B, location: pg_wal/RECOVERYXLOG
> Searching WAL: 00000021000018BF0000001B, location: pg_wal/RECOVERYXLOG
> 2019-12-06 07:11:27 CST LOG: restored log file
> "00000021000018BF0000001B" from archive
> 2019-12-06 07:11:27 CST LOG: redo starts at 18BF/1B311260
> Searching WAL: 00000022000018BF0000001C, location: pg_wal/RECOVERYXLOG
> Searching WAL: 00000021000018BF0000001C, location: pg_wal/RECOVERYXLOG
> 2019-12-06 07:11:34 CST LOG: restored log file
> "00000021000018BF0000001C" from archive
> Searching WAL: 00000022000018BF0000001D, location: pg_wal/RECOVERYXLOG
> Searching WAL: 00000021000018BF0000001D, location: pg_wal/RECOVERYXLOG
> 2019-12-06 07:11:40 CST LOG: restored log file
> "00000021000018BF0000001D" from archive
> Searching WAL: 00000022000018BF0000001E, location: pg_wal/RECOVERYXLOG
> Searching WAL: 00000021000018BF0000001E, location: pg_wal/RECOVERYXLOG
> 2019-12-06 07:11:46 CST LOG: restored log file
> "00000021000018BF0000001E" from archive
> Searching WAL: 00000022000018BF0000001F, location: pg_wal/RECOVERYXLOG
> Searching WAL: 00000021000018BF0000001F, location: pg_wal/RECOVERYXLOG
> 2019-12-06 07:11:53 CST LOG: restored log file
> "00000021000018BF0000001F" from archive

As you can see Postgres tries to restore 00000022* WALs before timeline
switch LSN has been reached while restoring 00000021*.
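For reference, each WAL segment file name in the log is three 8-hex-digit fields: the timeline ID, then the high and low parts of the segment number. A minimal Python sketch (assuming the default 16MB segment size) showing that the paired probes above differ only in timeline, not in WAL position:

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # default WAL segment size

def parse_wal_filename(name, seg_size=WAL_SEG_SIZE):
    """Split a 24-hex-digit WAL file name into (timeline, start LSN)."""
    tli = int(name[0:8], 16)
    log = int(name[8:16], 16)   # high 32 bits of the segment's start LSN
    seg = int(name[16:24], 16)  # segment index within that 4GB "xlogid"
    segno = log * (0x100000000 // seg_size) + seg
    return tli, segno * seg_size

def fmt_lsn(lsn):
    """Render an LSN in PostgreSQL's hi/lo hex notation."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)

# The two probes from the log above decode to the same segment position:
for name in ("00000022000018C60000003F", "00000021000018C60000003F"):
    tli, lsn = parse_wal_filename(name)
    print("TLI %d, segment starting at %s" % (tli, fmt_lsn(lsn)))
```

Both names map to the segment starting at 18C6/3F000000; only the timeline field (0x22 vs 0x21) differs, which is why the archive is probed twice per position.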


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: psuderevsky(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, noreply(at)postgresql(dot)org
Subject: Re: BUG #16159: recovery requests WALs for the next timelines before timeline switch LSN has been reached
Date: 2019-12-12 03:48:56
Message-ID: 20191212.124856.202839248708529678.horikyota.ntt@gmail.com
Lists: pgsql-bugs pgsql-hackers

Hello.

In short, it is not a bug.

At Wed, 11 Dec 2019 12:39:20 +0000, PG Bug reporting form <noreply(at)postgresql(dot)org> wrote in
> The following bug has been logged on the website:
>
> Bug reference: 16159
> Logged by: Pavel Suderevsky
> Email address: psuderevsky(at)gmail(dot)com
> PostgreSQL version: 11.6
> Operating system: CentOS 7.6.1810 (3.10.0-957.el7.x86_64)
> Description:
>
> Reproduced 11.2,11.6.
>
> If PostgreSQL starts recovery and finds a history file for a timeline that
> is higher than current one, it will request file with the segment for the
> future timeline (that most likely doesn't exist yet) and only then it will
> request file with the segment for current timeline.

The cause of the "future" timeline is that the standby has received the
history file for TLI=22 but has not yet finished replaying the first
checkpoint after the promotion. In that case, WAL files from before the
timeline switch should not exist for TLI=22, and PostgreSQL verifies that
by probing the archive for the file.

Since a standby always starts archive recovery from the REDO location of
the last checkpoint performed on the standby (or the restart point),
the amount of WAL to read is not affected by the promotion.

> If archive is located on remote storage it can take huge time to find that
> segments for the future timelines are not exist yet and therefore recovery
> can take too long.

I don't think that probing for non-existent remote files takes an amount
of time comparable to a 16MB transfer. If the problem is the amount of WAL
to transfer during recovery, I can think of three ways to make standby
startup faster.

1. For operational shutdowns/restarts, make sure that the latest
restart point is close enough to the replay location on the standby
before shutting down. If it is not, a manual checkpoint on the master
followed by one on the standby would help. The functions
pg_control_checkpoint() and pg_last_wal_replay_lsn() can be used to
check that condition.

2. PostgreSQL 11 accepts "always" for the archive_mode GUC setting,
which enables standby-side archiving.

/docs/11/runtime-config-wal.html#GUC-ARCHIVE-MODE

3. Decrease max_wal_size or checkpoint_timeout on the master, and/or
decrease checkpoint_timeout on the standby. This reduces the
amount of WAL that must be replayed during recovery.
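For suggestion 1, the distance between the last restart point and the replay position can be computed from the two functions named above; pg_lsn values print as hi/lo hex. A rough sketch of the arithmetic (the example LSNs are taken from the log earlier in the thread):

```python
def parse_lsn(text):
    """Parse a pg_lsn string such as '18BF/1B311260' into an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def restart_lag_bytes(redo_lsn, replay_lsn):
    """Bytes of WAL between the last restart point's REDO location
    (pg_control_checkpoint().redo_lsn) and the replay position
    (pg_last_wal_replay_lsn()): roughly what a restarted standby
    must re-read before catching up."""
    return parse_lsn(replay_lsn) - parse_lsn(redo_lsn)

# Example using LSNs in the style of the startup log above:
lag = restart_lag_bytes("18BF/1B311260", "18C6/3F000000")
print(lag // (1024 * 1024), "MB of WAL to re-read")
```

If that number is large just before a planned shutdown, a checkpoint on the master followed by one on the standby shrinks it, per suggestion 1.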

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Pavel Suderevsky <psuderevsky(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, noreply(at)postgresql(dot)org
Subject: Re: BUG #16159: recovery requests WALs for the next timelines before timeline switch LSN has been reached
Date: 2020-01-28 16:13:32
Message-ID: CAEBTBzsi8HrTTSC1xwL0otzrwdefRSLuwdQ7LnAs+jQeN+EJ8A@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

Hi,

Thank you very much for your explanation, and sorry for the delayed
answer.
It still seems to me that PostgreSQL has enough information to determine
that no WALs exist for the new timeline, and thereby skip searching for all
the possibly-existing WALs.

> 0000005300004AE1000000A3
> rmgr: Standby len (rec/tot): 62/ 62, tx: 0, lsn:
> 4AE1/A3000028, prev 4AE1/A28EC5F8, desc: RUNNING_XACTS nextXid 82249762
> latestCompletedXid 82249758 oldestRunningXid 82249759; 1 xacts: 82249759
> rmgr: XLOG len (rec/tot): 106/ 106, tx: 0, lsn:
> 4AE1/A3000068, prev 4AE1/A3000028, desc: CHECKPOINT_SHUTDOWN redo
> 4AE1/A3000068; tli 83; prev tli 83; fpw true; xid 0:82249762; oid 1074976;
> multi 144; offset 4568; oldest xid 562 in DB 1; oldest multi 1 in DB 1;
> oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
> rmgr: XLOG len (rec/tot): 24/ 24, tx: 0, lsn:
> 4AE1/A30000D8, prev 4AE1/A3000068, desc: SWITCH
> 0000005400004AE1000000A4
> rmgr: XLOG len (rec/tot): 106/ 106, tx: 0, lsn:
> 4AE1/A4000028, prev 4AE1/A30000D8, desc: CHECKPOINT_SHUTDOWN redo
> 4AE1/A4000028; tli 83; prev tli 83; fpw true; xid 0:82249762; oid 1074976;
> multi 144; offset 4568; oldest xid 562 in DB 1; oldest multi 1 in DB 1;
> oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
> rmgr: XLOG len (rec/tot): 42/ 42, tx: 0, lsn:
> 4AE1/A4000098, prev 4AE1/A4000028, desc: END_OF_RECOVERY tli 84; prev tli
> 83; time 2020-01-28 06:29:03.432938 CST
> 00000054.history
> 83 4AE1/A4000098 no recovery target specified
>
It can simply look through the first received WAL of the new timeline and
confirm that the timeline switch occurred within it. Finally, it can check
the archive for the only previous WAL that could possibly exist.
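The timeline history file format quoted above supports this: each line is "<parentTLI> <switch LSN> <reason>", so the switch point, and with it the earliest segment that can exist on the new timeline, is known as soon as the history file is restored. A hedged Python sketch of that parsing:

```python
def parse_history(text):
    """Map each parent timeline to the LSN at which it ended, from
    history-file lines of the form '<parentTLI> <switchLSN> <reason>'."""
    ends = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tli, lsn, _reason = line.split(None, 2)
        hi, lo = lsn.split("/")
        ends[int(tli)] = (int(hi, 16) << 32) | int(lo, 16)
    return ends

# The single line of 00000054.history quoted above:
ends = parse_history("83\t4AE1/A4000098\tno recovery target specified")
switch = ends[83]
# No segment of timeline 84 can start before the one containing `switch`,
# so probing the archive for earlier TLI-84 segments is pointless.
first_possible_segno = switch // (16 * 1024 * 1024)
```

This is only an illustration of the information available to recovery, not of PostgreSQL's internal representation.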

Regarding the impact: the issue is not the large amount of WAL to apply
but the searching for non-existing WALs on the remote storage; each such
search can take 5-10 seconds, while fetching an existing WAL takes
milliseconds.

Regards,
Pavel Suderevsky

On Thu, 12 Dec 2019 at 06:49, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:

> Hello.
>
> In short, it is not a bug.
>
> At Wed, 11 Dec 2019 12:39:20 +0000, PG Bug reporting form <
> noreply(at)postgresql(dot)org> wrote in
> > The following bug has been logged on the website:
> >
> > Bug reference: 16159
> > Logged by: Pavel Suderevsky
> > Email address: psuderevsky(at)gmail(dot)com
> > PostgreSQL version: 11.6
> > Operating system: CentOS 7.6.1810 (3.10.0-957.el7.x86_64)
> > Description:
> >
> > Reproduced 11.2,11.6.
> >
> > If PostgreSQL starts recovery and finds a history file for a timeline
> that
> > is higher than current one, it will request file with the segment for the
> > future timeline (that most likely doesn't exist yet) and only then it
> will
> > request file with the segment for current timeline.
>
> The cause of the "future" timeline is that the standby has received
> the history file for TLI=22 but has not completed replaying the first
> checkpoint after promotion. In that case, WAL files before the
> timeline switch should not exist for TLI=22 and PostgreSQL is making
> sure that by peeking the archive for the file.
>
> Since standby always starts archive recovery from the REDO location of
> the last checkpoint performed on the standby(or the restart point),
> the WAL amount to read is irrelevant to promotion.
>
> > If archive is located on remote storage it can take huge time to find
> that
> > segments for the future timelines are not exist yet and therefore
> recovery
> > can take too long.
>
> I don't think that peeking non-existent remote files takes comparable
> amount of time to 16MB transfer. If the problem is the amount of WAL
> files to transfer during recovery, I came up of three ways to make
> standby startup faster.
>
> 1. For operational shutdown/restarts, make sure that the latest
> restart point is close enough to the replay location on the standby
> before shutting down. If not, manual checkpoint on the master then
> that on the standby would help. The functions pg_control_checkpoint()
> and pg_last_wal_replay_lsn() would work for checking that condition.
>
> 2. PostgreSQL 11 accepts "always" for the archive_mode GUC setting. It
> enables standby-side archiving.
>
> /docs/11/runtime-config-wal.html#GUC-ARCHIVE-MODE
>
> 3. Decrease max_wal_size or checkpoint_timeout on the master, and/or
> decrease checkpoint_timeout on the standby. This decreases the
> amount of time needed during recovery.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: psuderevsky(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Don't try fetching future segment of a TLI.
Date: 2020-01-29 03:02:22
Message-ID: 20200129.120222.1476610231001551715.horikyota.ntt@gmail.com
Lists: pgsql-bugs pgsql-hackers

Hello, I added (moved to) -hackers.

At Tue, 28 Jan 2020 19:13:32 +0300, Pavel Suderevsky <psuderevsky(at)gmail(dot)com> wrote in
> But for me it still seems that PostgreSQL has enough information to check
> that no WALs exist for the new timeline to omit searching all the
> possibly-existing WALs.
>
> It can just look through the first received new-timeline's WAL and ensure
> timeline switch occured in this WAL. Finally, it can check archive for the
> only one possibly-existing previous WAL.

Right. The timeline history file tells where a timeline ends.

> Regading influence: issue is not about the large amount of WALs to apply
> but in searching for the non-existing WALs on the remote storage, each such
> search can take 5-10 seconds while obtaining existing WAL takes
> milliseconds.

Wow. I didn't know of a file system that takes that many seconds to
probe for non-existent files. I still think this is not a bug, but
avoiding those probes would be a big win on such systems.

After some thought, I think it is safe and practical to let
XLogFileReadAnyTLI() refrain from trying WAL segments of too-high
TLIs. Some garbage archive files outside the range of a timeline might
be seen, for example after reusing an archive directory without clearing
its files. However, fetching such garbage just to fail contributes
nothing to durability or reliability, I think.
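The idea can be sketched as follows (this is an illustration in Python of the filtering, not the attached C patch; the switch LSN is an invented example): given the timeline history as (tli, begin LSN, end LSN) ranges, only timelines whose range can overlap the requested segment are worth probing in the archive.

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # default WAL segment size

def candidate_tlis(history, segno):
    """Yield only timelines whose [begin, end) LSN range can overlap
    the requested segment; end_lsn is None for the newest timeline.
    Timelines filtered out here cannot hold the segment, so probing
    the archive for them would fail anyway."""
    seg_start = segno * WAL_SEG_SIZE
    seg_end = seg_start + WAL_SEG_SIZE
    for tli, begin_lsn, end_lsn in history:
        if end_lsn is not None and end_lsn <= seg_start:
            continue  # timeline ended before this segment begins
        if begin_lsn >= seg_end:
            continue  # timeline begins after this segment ends
        yield tli

# With the switch to TLI 0x22 far ahead of segment 00000021000018BF0000001B,
# only TLI 0x21 is worth probing for that segment:
history = [(0x21, 0x0, (0x18C6 << 32) | 0x3F000098),
           (0x22, (0x18C6 << 32) | 0x3F000098, None)]
print(list(candidate_tlis(history, 0x18BF * 256 + 0x1B)))  # [33]
```

With this filter, the repeated 00000022* probes in the startup log at the top of the thread would be skipped until replay approaches the switch LSN.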

The attached does that.

Any thoughts?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v1-0001-Don-t-try-fetching-out-of-timeline-segments.patch text/x-patch 1.9 KB

From: David Steele <david(at)pgmasters(dot)net>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: psuderevsky(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Don't try fetching future segment of a TLI.
Date: 2020-02-01 05:31:40
Message-ID: 8c831a30-a3a2-de89-2059-e7ce842d7aed@pgmasters.net
Lists: pgsql-bugs pgsql-hackers

On 1/28/20 8:02 PM, Kyotaro Horiguchi wrote:
> At Tue, 28 Jan 2020 19:13:32 +0300, Pavel Suderevsky
>> Regading influence: issue is not about the large amount of WALs to apply
>> but in searching for the non-existing WALs on the remote storage, each such
>> search can take 5-10 seconds while obtaining existing WAL takes
>> milliseconds.
>
> Wow. I didn't know of a file system that takes that much seconds to
> trying non-existent files. Although I still think this is not a bug,
> but avoiding that actually leads to a big win on such systems.

I have not tested this case but I can imagine it would be slow in
practice. It's axiomatic that it is hard to prove a negative. With
multi-region replication it might well take some time to be sure that
the file *really* doesn't exist and hasn't just been lost in a single
region.

> After a thought, I think it's safe and effectively doable to let
> XLogFileReadAnyTLI() refrain from trying WAL segments of too-high
> TLIs. Some garbage archive files out of the range of a timeline might
> be seen, for example, after reusing archive directory without clearing
> files. However, fetching such garbages just to fail doesn't
> contribute durability or reliablity at all, I think.

The patch seems sane, the trick will be testing it.

Pavel, do you have an environment where you can ensure this is a
performance benefit?

Regards,
--
-David
david(at)pgmasters(dot)net


From: Pavel Suderevsky <psuderevsky(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Don't try fetching future segment of a TLI.
Date: 2020-03-19 13:22:16
Message-ID: CAEBTBzupObE6Gep1y-ajvUh4AjuvyTmfTewshVOei12uRV=Q8Q@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

Hi,

I've tested the patch provided by Kyotaro and confirm that it fixes the issue.
Any chance it will be merged into one of the next minor releases?

Thank you very much!

On Sat, 1 Feb 2020 at 08:31, David Steele <david(at)pgmasters(dot)net> wrote:

> On 1/28/20 8:02 PM, Kyotaro Horiguchi wrote:
> > At Tue, 28 Jan 2020 19:13:32 +0300, Pavel Suderevsky
> >> Regading influence: issue is not about the large amount of WALs to apply
> >> but in searching for the non-existing WALs on the remote storage, each such
> >> search can take 5-10 seconds while obtaining existing WAL takes
> >> milliseconds.
> >
> > Wow. I didn't know of a file system that takes that much seconds to
> > trying non-existent files. Although I still think this is not a bug,
> > but avoiding that actually leads to a big win on such systems.
>
> I have not tested this case but I can imagine it would be slow in
> practice. It's axiomatic that is hard to prove a negative. With
> multi-region replication it might well take some time to be sure that
> the file *really* doesn't exist and hasn't just been lost in a single
> region.
>
> > After a thought, I think it's safe and effectively doable to let
> > XLogFileReadAnyTLI() refrain from trying WAL segments of too-high
> > TLIs. Some garbage archive files out of the range of a timeline might
> > be seen, for example, after reusing archive directory without clearing
> > files. However, fetching such garbages just to fail doesn't
> > contribute durability or reliablity at all, I think.
>
> The patch seems sane, the trick will be testing it.
>
> Pavel, do you have an environment where you can ensure this is a
> performance benefit?
>
> Regards,
> --
> -David
> david(at)pgmasters(dot)net
>


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Pavel Suderevsky <psuderevsky(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Don't try fetching future segment of a TLI.
Date: 2020-04-06 17:43:02
Message-ID: 792ea085-95c4-bca0-ae82-47fdc80e146d@oss.nttdata.com
Lists: pgsql-bugs pgsql-hackers

On 2020/03/19 22:22, Pavel Suderevsky wrote:
> Hi,
>
> I've tested patch provided by Kyotaro and do confirm it fixes the issue.

The patch looks good to me. Attached is an updated version of the patch;
I updated only the comments.

Barring any objection, I will commit this patch.

> Any chance it will be merged to one of the next minor releases?

This doesn't seem to be a bug, so I'm thinking of merging this into the
next *major* version release, i.e., v13.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment Content-Type Size
v2-0001-Don-t-try-fetching-out-of-timeline-segments.patch text/plain 1.5 KB

From: David Steele <david(at)pgmasters(dot)net>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Pavel Suderevsky <psuderevsky(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Don't try fetching future segment of a TLI.
Date: 2020-04-06 19:04:07
Message-ID: 762ffb60-df9c-0efc-2714-9859219157e3@pgmasters.net
Lists: pgsql-bugs pgsql-hackers

On 4/6/20 1:43 PM, Fujii Masao wrote:
>
>
> On 2020/03/19 22:22, Pavel Suderevsky wrote:
>> Hi,
>>
>> I've tested patch provided by Kyotaro and do confirm it fixes the issue.
>
> The patch looks good to me. Attached is the updated version of the patch.
> I updated only comments.
>
> Barring any objection, I will commit this patch.

The patch looks good to me.

>> Any chance it will be merged to one of the next minor releases?
>
> This doesn't seem a bug, so I'm thinking to merge this to next *major*
> version release, i.e., v13.

Not a bug, perhaps, but I think we do consider back-patching fixes for
performance problems. The rise in S3 usage has just exposed how poorly
this code performs in high-latency environments.

Regards,
--
-David
david(at)pgmasters(dot)net


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: psuderevsky(at)gmail(dot)com, david(at)pgmasters(dot)net, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Don't try fetching future segment of a TLI.
Date: 2020-04-07 01:29:09
Message-ID: 20200407.102909.709448322643210522.horikyota.ntt@gmail.com
Lists: pgsql-bugs pgsql-hackers

Thank you for picking this up.

At Tue, 7 Apr 2020 02:43:02 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
> On 2020/03/19 22:22, Pavel Suderevsky wrote:
> > Hi,
> > I've tested patch provided by Kyotaro and do confirm it fixes the
> > issue.
>
> The patch looks good to me. Attached is the updated version of the
> patch.
> I updated only comments.

+ * The logfile segment that doesn't belong to the timeline is
+ * older or newer than the segment that the timeline started or
+ * eneded at, respectively. It's sufficient to check only the

s/eneded/ended/ ?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: David Steele <david(at)pgmasters(dot)net>, Pavel Suderevsky <psuderevsky(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Don't try fetching future segment of a TLI.
Date: 2020-04-07 03:15:00
Message-ID: 3e0713d2-56d2-9883-4213-d5ba7f0dcabf@oss.nttdata.com
Lists: pgsql-bugs pgsql-hackers

On 2020/04/07 4:04, David Steele wrote:
> On 4/6/20 1:43 PM, Fujii Masao wrote:
>>
>>
>> On 2020/03/19 22:22, Pavel Suderevsky wrote:
>>> Hi,
>>>
>>> I've tested patch provided by Kyotaro and do confirm it fixes the issue.
>>
>> The patch looks good to me. Attached is the updated version of the patch.
>> I updated only comments.
>>
>> Barring any objection, I will commit this patch.
>
> The patch looks good to me.
>
>>> Any chance it will be merged to one of the next minor releases?
>>
>> This doesn't seem a bug, so I'm thinking to merge this to next *major*
>> version release, i.e., v13.
>
> Not a bug, perhaps, but I think we do consider back-patching performance problems. The rise in S3 usage has just exposed how poorly this performed code in high-latency environments.

I understand the situation and am fine with back-patching it. But I'm not
sure whether it's fair to do that; maybe we need to hear more opinions
about this? OTOH, feature freeze for v13 is today, so what about committing
the patch to v13 first, and then doing the back-patch after hearing
opinions and receiving many +1s?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: psuderevsky(at)gmail(dot)com, david(at)pgmasters(dot)net, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Don't try fetching future segment of a TLI.
Date: 2020-04-07 03:22:20
Message-ID: c722ac2a-e923-6179-8fc8-30dbca48f018@oss.nttdata.com
Lists: pgsql-bugs pgsql-hackers

On 2020/04/07 10:29, Kyotaro Horiguchi wrote:
> Thank you for picking this up.
>
> At Tue, 7 Apr 2020 02:43:02 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>> On 2020/03/19 22:22, Pavel Suderevsky wrote:
>>> Hi,
>>> I've tested patch provided by Kyotaro and do confirm it fixes the
>>> issue.
>>
>> The patch looks good to me. Attached is the updated version of the
>> patch.
>> I updated only comments.
>
> + * The logfile segment that doesn't belong to the timeline is
> + * older or newer than the segment that the timeline started or
> + * eneded at, respectively. It's sufficient to check only the
>
> s/eneded/ended/ ?

Yes! Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment Content-Type Size
v3-0001-Don-t-try-fetching-out-of-timeline-segments.patch text/plain 1.5 KB

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: david(at)pgmasters(dot)net, psuderevsky(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Don't try fetching future segment of a TLI.
Date: 2020-04-07 07:48:01
Message-ID: 20200407.164801.1830442317313243264.horikyota.ntt@gmail.com
Lists: pgsql-bugs pgsql-hackers

At Tue, 7 Apr 2020 12:15:00 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>
>
> On 2020/04/07 4:04, David Steele wrote:
> > On 4/6/20 1:43 PM, Fujii Masao wrote:
> >>
> >>
> >> On 2020/03/19 22:22, Pavel Suderevsky wrote:
> >>> Hi,
> >>>
> >>> I've tested patch provided by Kyotaro and do confirm it fixes the
> >>> issue.
> >>
> >> The patch looks good to me. Attached is the updated version of the
> >> patch.
> >> I updated only comments.
> >>
> >> Barring any objection, I will commit this patch.
> > The patch looks good to me.
> >
> >>> Any chance it will be merged to one of the next minor releases?
> >>
> >> This doesn't seem a bug, so I'm thinking to merge this to next *major*
> >> version release, i.e., v13.
> > Not a bug, perhaps, but I think we do consider back-patching
> > performance problems. The rise in S3 usage has just exposed how poorly
> > this performed code in high-latency environments.
>
> I understood the situation and am fine to back-patch that. But I'm not
> sure
> if it's fair to do that. Maybe we need to hear more opinions about
> this?
> OTOH, feature freeze for v13 is today, so what about committing the
> patch
> in v13 at first, and then doing the back-patch after hearing opinions
> and
> receiving many +1?

+1 for committing only to v13 today, then back-patching if people want
and/or accept it.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Pavel Suderevsky <psuderevsky(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Don't try fetching future segment of a TLI.
Date: 2020-04-07 08:17:32
Message-ID: 20200407081732.GC6655@paquier.xyz
Lists: pgsql-bugs pgsql-hackers

On Tue, Apr 07, 2020 at 12:15:00PM +0900, Fujii Masao wrote:
> I understood the situation and am fine to back-patch that. But I'm not sure
> if it's fair to do that. Maybe we need to hear more opinions about this?
> OTOH, feature freeze for v13 is today, so what about committing the patch
> in v13 at first, and then doing the back-patch after hearing opinions and
> receiving many +1?

I have not looked at the patch so I cannot say much about it, but it
is annoying to fetch segments you are not going to need anyway when
your recovery target is on a timeline older than the segments fetched,
and this has a cost when you pay for the bandwidth of your environment
with only one archive location. So a back-patch sounds like a good
thing to do even if recovery is not broken per se, only slower.

Designing a TAP test for that is tricky, but you could look at the
backend logs to make sure that only the wanted segments are fetched,
with a central archive solution and multiple timelines involved. And
costly it is.
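Outside a TAP test, and assuming a restore_command that echoes each request the way the one at the top of this thread does ("Searching WAL: <name>"), the log can be scanned for probes of higher timelines; a rough sketch:

```python
import re

def requested_wal_files(log_text):
    """WAL file names echoed by a restore_command wrapper that prints
    'Searching WAL: <name>' for every request (as in this thread).
    History files (8 hex digits + '.history') are ignored."""
    return re.findall(r"Searching WAL: ([0-9A-F]{24})", log_text)

def future_tli_probes(log_text, replaying_tli):
    """Requests for timelines higher than the one being replayed --
    exactly the probes the patch is meant to avoid."""
    return [name for name in requested_wal_files(log_text)
            if int(name[:8], 16) > replaying_tli]

# Two lines from the startup log at the top of the thread:
log = """Searching WAL: 00000022000018BF0000001B, location: pg_wal/RECOVERYXLOG
Searching WAL: 00000021000018BF0000001B, location: pg_wal/RECOVERYXLOG"""
print(future_tli_probes(log, 0x21))  # ['00000022000018BF0000001B']
```

On a patched server, that list should stay empty until replay approaches the timeline switch LSN.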
--
Michael


From: Grigory Smolkin <g(dot)smolkin(at)postgrespro(dot)ru>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Don't try fetching future segment of a TLI.
Date: 2020-04-07 09:36:18
Message-ID: c3f1c0f6-5e57-9b47-7dff-a21e760920b8@postgrespro.ru
Lists: pgsql-bugs pgsql-hackers

I've bumped into this issue recently:
/message-id/dd6690b0-ec03-6b3c-6fac-c963f91f87a7%40postgrespro.ru

On 4/6/20 8:43 PM, Fujii Masao wrote:

> The patch looks good to me. Attached is the updated version of the patch.
> I updated only comments.
>
> Barring any objection, I will commit this patch.

I've been running tests on your patch. So far so good.

On Tue, Apr 07, 2020 at 12:15:00PM +0900, Fujii Masao wrote:

> I understood the situation and am fine to back-patch that. But I'm not sure
> if it's fair to do that. Maybe we need to hear more opinions about this?
> OTOH, feature freeze for v13 is today, so what about committing the patch
> in v13 at first, and then doing the back-patch after hearing opinions and
> receiving many +1?

+1 to back-patching it.

--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: David Steele <david(at)pgmasters(dot)net>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: psuderevsky(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Don't try fetching future segment of a TLI.
Date: 2020-04-07 11:21:58
Message-ID: 4a461c7b-b90a-6644-64a6-80eac69c27bc@pgmasters.net
Lists: pgsql-bugs pgsql-hackers


On 4/7/20 3:48 AM, Kyotaro Horiguchi wrote:
> At Tue, 7 Apr 2020 12:15:00 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>>>> This doesn't seem a bug, so I'm thinking to merge this to next *major*
>>>> version release, i.e., v13.
>>> Not a bug, perhaps, but I think we do consider back-patching
>>> performance problems. The rise in S3 usage has just exposed how poorly
>>> this performed code in high-latency environments.
>>
>> I understood the situation and am fine to back-patch that. But I'm not
>> sure
>> if it's fair to do that. Maybe we need to hear more opinions about
>> this?
>> OTOH, feature freeze for v13 is today, so what about committing the
>> patch
>> in v13 at first, and then doing the back-patch after hearing opinions
>> and
>> receiving many +1?
>
> +1 for commit only v13 today, then back-patch if people wants and/or
> accepts.

Definitely +1 for a commit today to v13. I certainly was not trying to
hold that up.

--
-David
david(at)pgmasters(dot)net


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: David Steele <david(at)pgmasters(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: psuderevsky(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Don't try fetching future segment of a TLI.
Date: 2020-04-07 16:49:27
Message-ID: 21d2590f-2739-9f45-748f-7a68d667d7cd@oss.nttdata.com
Lists: pgsql-bugs pgsql-hackers

On 2020/04/07 20:21, David Steele wrote:
>
> On 4/7/20 3:48 AM, Kyotaro Horiguchi wrote:
>> At Tue, 7 Apr 2020 12:15:00 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>>>>> This doesn't seem a bug, so I'm thinking to merge this to next *major*
>>>>> version release, i.e., v13.
>>>> Not a bug, perhaps, but I think we do consider back-patching
>>>> performance problems. The rise in S3 usage has just exposed how poorly
>>>> this performed code in high-latency environments.
>>>
>>> I understood the situation and am fine to back-patch that. But I'm not
>>> sure
>>> if it's fair to do that. Maybe we need to hear more opinions about
>>> this?
>>> OTOH, feature freeze for v13 is today, so what about committing the
>>> patch
>>> in v13 at first, and then doing the back-patch after hearing opinions
>>> and
>>> receiving many +1?
>>
>> +1 for committing only to v13 today, then back-patching if people want
>> and/or accept.
>
> Definitely +1 for a commit today to v13. I certainly was not trying to hold that up.

Pushed the patch to v13, at first!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: David Steele <david(at)pgmasters(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: psuderevsky(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Back-patch is necessary? Re: Don't try fetching future segment of a TLI.
Date: 2020-04-30 14:15:51
Message-ID: af0c5065-7807-c06d-29eb-e5b4121327ee@oss.nttdata.com
Lists: pgsql-bugs pgsql-hackers

On 2020/04/08 1:49, Fujii Masao wrote:
>
>
> On 2020/04/07 20:21, David Steele wrote:
>>
>> On 4/7/20 3:48 AM, Kyotaro Horiguchi wrote:
>>> At Tue, 7 Apr 2020 12:15:00 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>>>>>> This doesn't seem a bug, so I'm thinking to merge this to next *major*
>>>>>> version release, i.e., v13.
>>>>> Not a bug, perhaps, but I think we do consider back-patching
>>>>> performance problems. The rise in S3 usage has just exposed how poorly
>>>>> this code performed in high-latency environments.
>>>>
>>>> I understood the situation and am fine to back-patch that. But I'm not
>>>> sure
>>>> if it's fair to do that. Maybe we need to hear more opinions about
>>>> this?
>>>> OTOH, feature freeze for v13 is today, so what about committing the
>>>> patch
>>>> in v13 at first, and then doing the back-patch after hearing opinions
>>>> and
>>>> receiving many +1?
>>>
>>> +1 for committing only to v13 today, then back-patching if people want
>>> and/or accept.

Please let me revisit this. Currently Grigory Smolkin, David Steele,
Michael Paquier and Pavel Suderevsky agree with the back-patch, and
there has been no objection to it. So should we do the back-patch?
Or does anyone object?

I don't think that this is a bug, because archive recovery works
fine from a functional perspective without this commit. OTOH,
I understand the complaint that, without the commit,
archive recovery may be unnecessarily slow when archival storage is
remote, e.g., on Amazon S3, and it takes a long time to fetch
a non-existent archive WAL file. So I'm OK with the back-patch unless
there is a strong objection to it.
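To make the cost under discussion concrete, here is a deliberately simplified, hypothetical model of probing the archive for one WAL segment. It is not the patched code itself (which decides per segment using the switchpoint LSNs from the timeline history file rather than the plain TLI comparison below); it only assumes that every restore_command invocation against a high-latency archive such as S3 costs the same, whether it hits or misses:

```python
# Hypothetical model (not the actual PostgreSQL code) of how archive
# recovery probes the archive for one WAL segment when following a
# timeline history.
#
# Before the fix: recovery tried every timeline in the history file,
# highest TLI first, so it requested future-TLI segment files that
# could not exist yet; each miss cost a full archive round trip.
# After the fix: timelines that had not begun at that segment are skipped.

FETCH_COST = 1  # pretend every restore_command invocation costs 1 unit


def probe_cost(history_tlis, segment_tli, skip_future_tlis):
    """Count restore_command calls needed to restore one segment.

    history_tlis: TLIs from the history file, highest first.
    segment_tli: the timeline on which the segment actually exists.
    """
    cost = 0
    for tli in history_tlis:
        if skip_future_tlis and tli > segment_tli:
            continue  # the fix: don't request future segments of a TLI
        cost += FETCH_COST  # one archive request, hit or miss
        if tli == segment_tli:
            break  # segment found; stop probing
    return cost


history = [4, 3, 2, 1]  # recovery_target_timeline = 'latest'
assert probe_cost(history, 1, skip_future_tlis=False) == 4  # 3 misses + 1 hit
assert probe_cost(history, 1, skip_future_tlis=True) == 1   # hit immediately
```

Under these assumptions, the commit removes one wasted archive request per future timeline for every segment restored, which corresponds to the delay reported against S3-backed archives.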

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, psuderevsky(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Back-patch is necessary? Re: Don't try fetching future segment of a TLI.
Date: 2020-05-02 11:40:38
Message-ID: CAA4eK1JQaL8Kk+E4DqSrYqrNLYusNk22ZLY38JkmCeVdUjotpw@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Apr 30, 2020 at 7:46 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
> On 2020/04/08 1:49, Fujii Masao wrote:
> >
> >
> > On 2020/04/07 20:21, David Steele wrote:
> >>
> >> On 4/7/20 3:48 AM, Kyotaro Horiguchi wrote:
> >>> At Tue, 7 Apr 2020 12:15:00 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
> >>>>>> This doesn't seem a bug, so I'm thinking to merge this to next *major*
> >>>>>> version release, i.e., v13.
> >>>>> Not a bug, perhaps, but I think we do consider back-patching
> >>>>> performance problems. The rise in S3 usage has just exposed how poorly
> >>>>> this code performed in high-latency environments.
> >>>>
> >>>> I understood the situation and am fine to back-patch that. But I'm not
> >>>> sure
> >>>> if it's fair to do that. Maybe we need to hear more opinions about
> >>>> this?
> >>>> OTOH, feature freeze for v13 is today, so what about committing the
> >>>> patch
> >>>> in v13 at first, and then doing the back-patch after hearing opinions
> >>>> and
> >>>> receiving many +1?
> >>>
> >>> +1 for committing only to v13 today, then back-patching if people want
> >>> and/or accept.
>
> Please let me revisit this. Currently Grigory Smolkin, David Steele,
> Michael Paquier and Pavel Suderevsky agree with the back-patch, and
> there has been no objection to it. So should we do the back-patch?
> Or does anyone object?
>
> I don't think that this is a bug, because archive recovery works
> fine from a functional perspective without this commit. OTOH,
> I understand the complaint that, without the commit,
> archive recovery may be unnecessarily slow when archival storage is
> remote, e.g., on Amazon S3, and it takes a long time to fetch
> a non-existent archive WAL file. So I'm OK with the back-patch unless
> there is a strong objection to it.
>

I don't see any obvious problem with the changed code, but we normally
don't back-patch performance improvements. The code change here
appears to be straightforward, so it might be fine to back-patch
this. Have we seen similar reports earlier as well? AFAIK, this
functionality has existed for a long time, and if people were hitting
this on a regular basis we would have seen such reports multiple
times. That is, if the chances of hitting this are low, we could even
choose not to back-patch this.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, psuderevsky(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Back-patch is necessary? Re: Don't try fetching future segment of a TLI.
Date: 2020-05-07 06:43:40
Message-ID: a8baf027-808b-3fdd-7922-0a6c4815d9a6@oss.nttdata.com
Lists: pgsql-bugs pgsql-hackers

On 2020/05/02 20:40, Amit Kapila wrote:
> On Thu, Apr 30, 2020 at 7:46 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>
>> On 2020/04/08 1:49, Fujii Masao wrote:
>>>
>>>
>>> On 2020/04/07 20:21, David Steele wrote:
>>>>
>>>> On 4/7/20 3:48 AM, Kyotaro Horiguchi wrote:
>>>>> At Tue, 7 Apr 2020 12:15:00 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>>>>>>>> This doesn't seem a bug, so I'm thinking to merge this to next *major*
>>>>>>>> version release, i.e., v13.
>>>>>>> Not a bug, perhaps, but I think we do consider back-patching
>>>>>>> performance problems. The rise in S3 usage has just exposed how poorly
>>>>>>> this code performed in high-latency environments.
>>>>>>
>>>>>> I understood the situation and am fine to back-patch that. But I'm not
>>>>>> sure
>>>>>> if it's fair to do that. Maybe we need to hear more opinions about
>>>>>> this?
>>>>>> OTOH, feature freeze for v13 is today, so what about committing the
>>>>>> patch
>>>>>> in v13 at first, and then doing the back-patch after hearing opinions
>>>>>> and
>>>>>> receiving many +1?
>>>>>
>>>>> +1 for committing only to v13 today, then back-patching if people want
>>>>> and/or accept.
>>
>> Please let me revisit this. Currently Grigory Smolkin, David Steele,
>> Michael Paquier and Pavel Suderevsky agree with the back-patch, and
>> there has been no objection to it. So should we do the back-patch?
>> Or does anyone object?
>>
>> I don't think that this is a bug, because archive recovery works
>> fine from a functional perspective without this commit. OTOH,
>> I understand the complaint that, without the commit,
>> archive recovery may be unnecessarily slow when archival storage is
>> remote, e.g., on Amazon S3, and it takes a long time to fetch
>> a non-existent archive WAL file. So I'm OK with the back-patch unless
>> there is a strong objection to it.
>>
>
> I don't see any obvious problem with the changed code, but we normally
> don't back-patch performance improvements. The code change here
> appears to be straightforward, so it might be fine to back-patch
> this. Have we seen similar reports earlier as well? AFAIK, this
> functionality has existed for a long time, and if people were hitting
> this on a regular basis we would have seen such reports multiple
> times. That is, if the chances of hitting this are low, we could even
> choose not to back-patch this.

I found the following two reports. ISTM there are not so many reports...
/message-id/16159-f5a34a3a04dc67e0@postgresql.org
/message-id/dd6690b0-ec03-6b3c-6fac-c963f91f87a7%40postgrespro.ru

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, psuderevsky(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Back-patch is necessary? Re: Don't try fetching future segment of a TLI.
Date: 2020-05-07 08:57:00
Message-ID: CAA4eK1LC_6+2Rufgen8UrjWcr1QKgaqyk3atfBHEG=0ZWw2mwQ@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Thu, May 7, 2020 at 12:13 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
> On 2020/05/02 20:40, Amit Kapila wrote:
> >
> > I don't see any obvious problem with the changed code, but we normally
> > don't back-patch performance improvements. The code change here
> > appears to be straightforward, so it might be fine to back-patch
> > this. Have we seen similar reports earlier as well? AFAIK, this
> > functionality has existed for a long time, and if people were hitting
> > this on a regular basis we would have seen such reports multiple
> > times. That is, if the chances of hitting this are low, we could even
> > choose not to back-patch this.
>
> I found the following two reports. ISTM there are not so many reports...
> /message-id/16159-f5a34a3a04dc67e0@postgresql.org
> /message-id/dd6690b0-ec03-6b3c-6fac-c963f91f87a7%40postgrespro.ru
>

The first seems to be the same report for which this bug has been fixed;
it was moved to -hackers in email [1]. Am I missing something?
Considering it has been encountered by two different people, I think
it would not be a bad idea to back-patch this.

[1] - /message-id/20200129.120222.1476610231001551715.horikyota.ntt%40gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, psuderevsky(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Back-patch is necessary? Re: Don't try fetching future segment of a TLI.
Date: 2020-05-08 05:23:32
Message-ID: 9c76bcab-1804-8c91-b834-5876ae72b9fb@oss.nttdata.com
Lists: pgsql-bugs pgsql-hackers

On 2020/05/07 17:57, Amit Kapila wrote:
> On Thu, May 7, 2020 at 12:13 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>
>> On 2020/05/02 20:40, Amit Kapila wrote:
>>>
>>> I don't see any obvious problem with the changed code, but we normally
>>> don't back-patch performance improvements. The code change here
>>> appears to be straightforward, so it might be fine to back-patch
>>> this. Have we seen similar reports earlier as well? AFAIK, this
>>> functionality has existed for a long time, and if people were hitting
>>> this on a regular basis we would have seen such reports multiple
>>> times. That is, if the chances of hitting this are low, we could even
>>> choose not to back-patch this.
>>
>> I found the following two reports. ISTM there are not so many reports...
>> /message-id/16159-f5a34a3a04dc67e0@postgresql.org
>> /message-id/dd6690b0-ec03-6b3c-6fac-c963f91f87a7%40postgrespro.ru
>>
>
> The first seems to be the same report for which this bug has been fixed;
> it was moved to -hackers in email [1].

Yes, that's the original report that led to the commit.

> Am I missing something?
> Considering it has been encountered by two different people, I think
> it would not be a bad idea to back-patch this.

+1. So I will do the back-patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, psuderevsky(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Back-patch is necessary? Re: Don't try fetching future segment of a TLI.
Date: 2020-05-09 03:26:22
Message-ID: 9d7ad280-9ce8-a538-83a3-29c57583f1ef@oss.nttdata.com
Lists: pgsql-bugs pgsql-hackers

On 2020/05/08 14:23, Fujii Masao wrote:
>
>
> On 2020/05/07 17:57, Amit Kapila wrote:
>> On Thu, May 7, 2020 at 12:13 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>>
>>> On 2020/05/02 20:40, Amit Kapila wrote:
>>>>
>>>> I don't see any obvious problem with the changed code, but we normally
>>>> don't back-patch performance improvements. The code change here
>>>> appears to be straightforward, so it might be fine to back-patch
>>>> this. Have we seen similar reports earlier as well? AFAIK, this
>>>> functionality has existed for a long time, and if people were hitting
>>>> this on a regular basis we would have seen such reports multiple
>>>> times. That is, if the chances of hitting this are low, we could even
>>>> choose not to back-patch this.
>>>
>>> I found the following two reports. ISTM there are not so many reports...
>>> /message-id/16159-f5a34a3a04dc67e0@postgresql.org
>>> /message-id/dd6690b0-ec03-6b3c-6fac-c963f91f87a7%40postgrespro.ru
>>>
>>
>> The first seems to be the same report for which this bug has been fixed;
>> it was moved to -hackers in email [1].
>
> Yes, that's the original report that led to the commit.
>
>> Am I missing something?
>> Considering it has been encountered by two different people, I think
>> it would not be a bad idea to back-patch this.
>
> +1. So I will do the back-patch.

Done. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION