Re: [HACKERS] Restricting maximum keep segments by repslots

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Restricting maximum keep segments by repslots
Date: 2017-02-28 03:27:36
Message-ID: 20170228.122736.123383594.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello.

Although a replication slot is helpful for avoiding unwanted WAL
deletion, it can also cause a disastrous situation by keeping WAL
segments without limit. Removing the offending repslot resolves the
situation, but that is not doable while the standby is active. We
would have to take rather complex and forcible steps to relieve the
situation, especially in an automated manner. (For me, specifically
in an HA cluster.)

This patch adds a GUC that limits the number of segments that
replication slots can keep. Hitting the limit during a checkpoint
emits a warning, and the segments older than the limit are
removed.

> WARNING: restart LSN of replication slots is ignored by checkpoint
> DETAIL: Some replication slots lose required WAL segments to continue.

Another measure would be automatic deletion or deactivation of
the culprit slot, but that seems too complex for the problem.

As we have already postponed some patches in the triage for the
last commitfest, this might have to be postponed to PG11.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Add-WAL-releaf-vent-for-replication-slots.patch text/x-patch 3.3 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-02-28 03:42:32
Message-ID: CAB7nPqQm0QetoShggQnn4bLFd9oXKKHG7RafBP3Krno62=ORww@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 28, 2017 at 12:27 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Although a replication slot is helpful for avoiding unwanted WAL
> deletion, it can also cause a disastrous situation by keeping WAL
> segments without limit. Removing the offending repslot resolves the
> situation, but that is not doable while the standby is active. We
> would have to take rather complex and forcible steps to relieve the
> situation, especially in an automated manner. (For me, specifically
> in an HA cluster.)
>
> This patch adds a GUC that limits the number of segments that
> replication slots can keep. Hitting the limit during a checkpoint
> emits a warning, and the segments older than the limit are
> removed.
>
>> WARNING: restart LSN of replication slots is ignored by checkpoint
>> DETAIL: Some replication slots lose required WAL segments to continue.
>
> Another measure would be automatic deletion or deactivation of
> the culprit slot, but that seems too complex for the problem.
>
>
> As we have already postponed some patches in the triage for the
> last commitfest, this might have to be postponed to PG11.

Please no. Replication slots are designed the current way because we
don't want to have to use something like wal_keep_segments, as it is a
wart, and this applies to replication slots as well, in my opinion. If
a slot is bloating WAL and you care about your Postgres instance, I
would recommend instead that you use a background worker that
monitors the situation based on max_wal_size, for example, killing
the WAL sender associated with the slot if there is something connected
but it is frozen or cannot keep up with the pace of WAL generation, and
then dropping the slot. You may want to issue a checkpoint in this
case as well to ensure that segments get recycled. But anyway, if you
reach this point of WAL bloat, perhaps that's for the best, as users
would know about it because backups would be in danger. For some
applications that is acceptable, but you could always rely on
monitoring slots and killing them on sight if needed. That's also more
flexible than having a parameter that is basically just a synonym for
max_wal_size.
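
For illustration, a rough sketch of that monitoring approach as a
plain SQL query, using the PG10 function names that appear later in
this thread (the 8GB threshold and the kill-on-sight policy are made
up for the example):

-- terminate the walsender of any slot retaining more than ~8GB of WAL
SELECT pg_terminate_backend(active_pid)
FROM pg_replication_slots
WHERE active
  AND pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) > 8589934592;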
--
Michael


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(dot)paquier(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-02-28 04:16:38
Message-ID: 20170228.131638.144031864.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Thank you for the opinion.

At Tue, 28 Feb 2017 12:42:32 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqQm0QetoShggQnn4bLFd9oXKKHG7RafBP3Krno62=ORww(at)mail(dot)gmail(dot)com>
> Please no. Replication slots are designed the current way because we
> don't want to have to use something like wal_keep_segments, as it is a
> wart, and this applies to replication slots as well, in my opinion. If
> a slot is bloating WAL and you care about your Postgres instance, I
> would recommend instead that you use a background worker that
> monitors the situation based on max_wal_size, for example, killing
> the WAL sender associated with the slot if there is something connected
> but it is frozen or cannot keep up with the pace of WAL generation, and
> then dropping the slot.

It is doable without a plugin, and currently we are planning to do it
that way (maybe such a plugin would be unacceptable..). Killing the
walsender (which one?), removing the slot, and if that fails.. These
are the 'rather complex steps', and they are fragile.

> You may want to issue a checkpoint in this
> case as well to ensure that segments get recycled. But anyway, if you
> reach this point of WAL bloat, perhaps that's for the best, as users
> would know about it because backups would be in danger.

Yes, but in the end it is better than having the server just stop
with a PANIC.

> For some applications that is acceptable, but you could always
> rely on monitoring slots and killing them on sight if
> needed.

Another solution would be for removing a slot to kill the
corresponding walsender. What do you think about this?

pg_drop_replication_slot(name, *force*)

force = true kills the walsender running on the slot.
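
For comparison, a rough sketch of getting the same effect with the
functions that exist today (the two-argument form above is only a
proposal; the slot name 's1' is illustrative):

-- hypothetical equivalent of pg_drop_replication_slot('s1', true)
SELECT pg_terminate_backend(active_pid)
FROM pg_replication_slots
WHERE slot_name = 's1' AND active;
SELECT pg_drop_replication_slot('s1');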

> That's also more flexible than having a parameter
> that is basically just a synonym for max_wal_size.

I thought of the same thing at first, max_wal_size_hard, which
limits the WAL size including the extra segments (other than those
for the two checkpoint cycles).

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-02-28 04:27:05
Message-ID: 33bcf803-4aa8-27b6-b6bd-4fb3ae950b19@2ndquadrant.com
Lists: pgsql-hackers

On 28/02/17 04:27, Kyotaro HORIGUCHI wrote:
> Hello.
>
> Although a replication slot is helpful for avoiding unwanted WAL
> deletion, it can also cause a disastrous situation by keeping WAL
> segments without limit. Removing the offending repslot resolves the
> situation, but that is not doable while the standby is active. We
> would have to take rather complex and forcible steps to relieve the
> situation, especially in an automated manner. (For me, specifically
> in an HA cluster.)
>

I agree that it should be possible to limit how much WAL a slot keeps.

> This patch adds a GUC that limits the number of segments that
> replication slots can keep. Hitting the limit during a checkpoint
> emits a warning, and the segments older than the limit are
> removed.
>
>> WARNING: restart LSN of replication slots is ignored by checkpoint
>> DETAIL: Some replication slots lose required WAL segments to continue.
>

However, this is dangerous, as a logical replication slot does not
consider it an error when a too-old LSN is requested, so we'd continue
replication, hiding the data loss.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-02-28 04:34:54
Message-ID: CAB7nPqRqhe1PgfoCEayj=bYuroEX4dP7CjWqhO_SwjTPErq6+A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 28, 2017 at 1:16 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> It is doable without a plugin, and currently we are planning to do it
> that way (maybe such a plugin would be unacceptable..). Killing the
> walsender (which one?), removing the slot, and if that fails..

The PID and restart_lsn associated with each slot offer enough
information for monitoring.

> These are the 'rather complex steps', and they are fragile.

The handling of slot drop is not complex. Ensuring that WAL
segments get recycled on time so as to avoid full-blown bloat is, though.

>> That's also more flexible than having a parameter
>> that is basically just a synonym for max_wal_size.
>
> I thought of the same thing at first, max_wal_size_hard, which
> limits the WAL size including the extra segments (other than those
> for the two checkpoint cycles).

It would make more sense to just switch max_wal_size from a soft to a
hard limit. The current behavior is not cool with activity spikes.
--
Michael


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-03-01 09:43:30
Message-ID: CA+TgmoaNJbMCxKZAAZkgcrxyg81QF9eT9CGM0iWFd2mxQ2b6sA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 28, 2017 at 10:04 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> It would make more sense to just switch max_wal_size from a soft to a
> hard limit. The current behavior is not cool with activity spikes.

Having a hard limit on WAL size would be nice, but that's a different
problem from the one being discussed here. If max_wal_size becomes a
hard limit, and a standby with a replication slot dies, then the
master eventually starts refusing all writes. I guess that's better
than a PANIC, but it's not likely to make users very happy. I think
it's entirely reasonable to want a behavior where the master is
willing to retain up to X amount of extra WAL for the benefit of some
standby, but after that the health of the master takes priority.

You can't really get that behavior today. Either you can retain as
much WAL as might be necessary through archiving or a slot, or you can
retain a fixed amount of WAL whether it's actually needed or not.
There's currently no approach that retains min(wal_needed,
configured_value).
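
For reference, the amount a slot currently retains is easy to
observe; it is only the cap that is missing. A monitoring sketch,
assuming the PG10 function names:

SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;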

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-03-01 16:06:10
Message-ID: 20170301160610.wc7ez3vihmialntd@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2017-02-28 12:42:32 +0900, Michael Paquier wrote:
> Please no. Replication slots are designed the current way because we
> don't want to have to use something like wal_keep_segments, as it is a
> wart, and this applies to replication slots as well, in my opinion.

I think a per-slot option to limit the amount of retention would make
sense.

- Andres


From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-03-01 17:17:43
Message-ID: dc7faead-61c4-402e-a6dc-534192833d77@2ndquadrant.com
Lists: pgsql-hackers

On 2/27/17 23:27, Petr Jelinek wrote:
>>> WARNING: restart LSN of replication slots is ignored by checkpoint
>>> DETAIL: Some replication slots lose required WAL segments to continue.
> However, this is dangerous, as a logical replication slot does not
> consider it an error when a too-old LSN is requested, so we'd continue
> replication, hiding the data loss.

In general, we would need a much more evident and strict way to discover
when this condition is hit. Like a "full" column in
pg_stat_replication_slot, and refusing connections to the slot until it
is cleared.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-03-01 17:18:07
Message-ID: 98538b00-42ae-6a6b-f852-50b3c937ade4@2ndquadrant.com
Lists: pgsql-hackers

On 2/27/17 22:27, Kyotaro HORIGUCHI wrote:
> This patch adds a GUC that limits the number of segments
> that replication slots can keep.

Please measure it in size, not in number of segments.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: andres(at)anarazel(dot)de
Cc: michael(dot)paquier(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-03-02 00:39:57
Message-ID: 20170302.093957.120662180.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

At Wed, 1 Mar 2017 08:06:10 -0800, Andres Freund <andres(at)anarazel(dot)de> wrote in <20170301160610(dot)wc7ez3vihmialntd(at)alap3(dot)anarazel(dot)de>
> On 2017-02-28 12:42:32 +0900, Michael Paquier wrote:
> > Please no. Replication slots are designed the current way because we
> > don't want to have to use something like wal_keep_segments, as it is a
> > wart, and this applies to replication slots as well, in my opinion.
>
> I think a per-slot option to limit the amount of retention would make
> sense.

I started from that, but I found that all slots refer to the same
location as the origin of the distance, that is, the last segment
number that KeepLogSeg returns. As a result, the whole logic
became the following. This is one reason for the proposed patch.

- Take the maximum of the per-slot maximum-retain-LSN amounts.
- Apply that maximum during the calculation in KeepLogSeg.
- (These steps run only when at least one slot exists.)

The other reason was that, as Robert restated, I thought this is
a matter of system-wide (or DB-cluster-wide) health, which works a
bit differently from what the name "max_wal_size_hard"
suggests.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: peter(dot)eisentraut(at)2ndquadrant(dot)com
Cc: petr(dot)jelinek(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-03-02 00:43:50
Message-ID: 20170302.094350.259304249.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

At Wed, 1 Mar 2017 12:17:43 -0500, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote in <dc7faead-61c4-402e-a6dc-534192833d77(at)2ndquadrant(dot)com>
> On 2/27/17 23:27, Petr Jelinek wrote:
> >>> WARNING: restart LSN of replication slots is ignored by checkpoint
> >>> DETAIL: Some replication slots lose required WAL segments to continue.
> > However, this is dangerous, as a logical replication slot does not
> > consider it an error when a too-old LSN is requested, so we'd continue
> > replication, hiding the data loss.
>
> In general, we would need a much more evident and strict way to discover
> when this condition is hit. Like a "full" column in
> pg_stat_replication_slot, and refusing connections to the slot until it
> is cleared.

Anyway, if preserving WAL for replication has priority over the
master's health, this feature does nothing as long as
'max_wal_keep_segments' is left at 0.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: peter(dot)eisentraut(at)2ndquadrant(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-03-02 00:54:14
Message-ID: 20170302.095414.264042561.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

At Wed, 1 Mar 2017 12:18:07 -0500, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote in <98538b00-42ae-6a6b-f852-50b3c937ade4(at)2ndquadrant(dot)com>
> On 2/27/17 22:27, Kyotaro HORIGUCHI wrote:
> > This patch adds a GUC that limits the number of segments
> > that replication slots can keep.
>
> Please measure it in size, not in number of segments.

It was difficult to decide which is reasonable, but I named it
after wal_keep_segments because it has a similar effect.

In bytes(or LSN)
max_wal_size
min_wal_size
wal_writer_flush_after

In segments
wal_keep_segments

But since max_slot_wal_keep_segments works to bound disk space,
bytes would be reasonable.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-03-03 19:47:20
Message-ID: ac510b45-7805-7ccc-734c-1b38a6645f3e@2ndquadrant.com
Lists: pgsql-hackers

On 3/1/17 19:54, Kyotaro HORIGUCHI wrote:
>> Please measure it in size, not in number of segments.
> It was difficult to decide which is reasonable, but I named it
> after wal_keep_segments because it has a similar effect.
>
> In bytes(or LSN)
> max_wal_size
> min_wal_size
> wal_writer_flush_after
>
> In segments
> wal_keep_segments

We have been moving away from measuring in segments. For example,
checkpoint_segments was replaced by max_wal_size.

Also, with the proposed patch that allows changing the segment size more
easily, this will become more important. (I wonder if that will require
wal_keep_segments to change somehow.)

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: peter(dot)eisentraut(at)2ndquadrant(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-03-06 09:20:06
Message-ID: 20170306.182006.172683338.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Thank you for the comment.

At Fri, 3 Mar 2017 14:47:20 -0500, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote in <ac510b45-7805-7ccc-734c-1b38a6645f3e(at)2ndquadrant(dot)com>
> On 3/1/17 19:54, Kyotaro HORIGUCHI wrote:
> >> Please measure it in size, not in number of segments.
> > It was difficult to decide which is reasonable, but I named it
> > after wal_keep_segments because it has a similar effect.
> >
> > In bytes(or LSN)
> > max_wal_size
> > min_wal_size
> > wal_writer_flush_after
> >
> > In segments
> > wal_keep_segments
>
> We have been moving away from measuring in segments. For example,
> checkpoint_segments was replaced by max_wal_size.
>
> Also, with the proposed patch that allows changing the segment size more
> easily, this will become more important. (I wonder if that will require
> wal_keep_segments to change somehow.)

Agreed. It is 'max_slot_wal_keep_size' in the new version.

wal_keep_segments might have to be removed someday.
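
Once that GUC lands, setting it should look like any other size GUC,
e.g. (the value is purely illustrative):

ALTER SYSTEM SET max_slot_wal_keep_size = '8GB';
SELECT pg_reload_conf();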

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Add-WAL-releaf-vent-for-replication-slots_20170306.patch text/x-patch 3.3 KB

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-03-07 02:01:53
Message-ID: CAMsr+YFa2TvmM+ZNWME3W9jOgAxOyu7OByQV8tsL1XbUM=Fwcw@mail.gmail.com
Lists: pgsql-hackers

On 28 February 2017 at 12:27, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:

>> This patch adds a GUC that limits the number of segments
>> that replication slots can keep. Hitting the limit during a
>> checkpoint emits a warning, and the segments older than the limit
>> are removed.
>>
>>> WARNING: restart LSN of replication slots is ignored by checkpoint
>>> DETAIL: Some replication slots lose required WAL segments to continue.
>>
>
> However, this is dangerous, as a logical replication slot does not
> consider it an error when a too-old LSN is requested, so we'd continue
> replication, hiding the data loss.

That skipping only happens if you request a startpoint older than
confirmed_flush_lsn. It doesn't apply to this situation.

The client cannot control where we start decoding; it's always
restart_lsn, and if we can't find a needed WAL segment we'll ERROR. So
this is safe, though the error will be something about being unable to
find a WAL segment, which users might not directly associate with having
set this option. It won't say "slot disabled because needed WAL has
been discarded due to [setting]" or anything.
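
Concretely, the client would just see the familiar walsender error
(segment name illustrative):

ERROR: requested WAL segment 000000010000000000000001 has already been removed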

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: peter(dot)eisentraut(at)2ndquadrant(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-08-28 09:36:14
Message-ID: 20170828.183614.200480339.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

I'll add this to CF2017-09.

At Mon, 06 Mar 2017 18:20:06 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170306(dot)182006(dot)172683338(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> Thank you for the comment.
>
> At Fri, 3 Mar 2017 14:47:20 -0500, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote in <ac510b45-7805-7ccc-734c-1b38a6645f3e(at)2ndquadrant(dot)com>
> > On 3/1/17 19:54, Kyotaro HORIGUCHI wrote:
> > >> Please measure it in size, not in number of segments.
> > > It was difficult to decide which is reasonable, but I named it
> > > after wal_keep_segments because it has a similar effect.
> > >
> > > In bytes(or LSN)
> > > max_wal_size
> > > min_wal_size
> > > wal_writer_flush_after
> > >
> > > In segments
> > > wal_keep_segments
> >
> > We have been moving away from measuring in segments. For example,
> > checkpoint_segments was replaced by max_wal_size.
> >
> > Also, with the proposed patch that allows changing the segment size more
> > easily, this will become more important. (I wonder if that will require
> > wal_keep_segments to change somehow.)
>
> Agreed. It is 'max_slot_wal_keep_size' in the new version.
>
> wal_keep_segments might have to be removed someday.

- Following min/max_wal_size, the variable was renamed to
"max_slot_wal_keep_size_mb" and is used as ConvertToXSegs(x).

- Stopped warning when a checkpoint doesn't flush segments required
by slots even if max_slot_wal_keep_size has worked.

- Avoided a subtraction that may be negative.

regards,

Attachment Content-Type Size
slot_releaf_vent_v3.patch text/x-patch 3.0 KB

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-09-02 03:49:21
Message-ID: 751e09c4-93e0-de57-edd2-e64c4950f5e3@2ndquadrant.com
Lists: pgsql-hackers

I'm still concerned about how the critical situation is handled. Your
patch just prints a warning to the log and then goes on -- doing what?
The warning rolls off the log, and then you have no idea what happened,
or how to recover.

I would like a flag in pg_replication_slots, and possibly also a
numerical column that indicates how far away from the critical point
each slot is. That would be great for a monitoring system.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: peter(dot)eisentraut(at)2ndquadrant(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-09-07 05:12:12
Message-ID: 20170907.141212.227032666.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

At Fri, 1 Sep 2017 23:49:21 -0400, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote in <751e09c4-93e0-de57-edd2-e64c4950f5e3(at)2ndquadrant(dot)com>
> I'm still concerned about how the critical situation is handled. Your
> patch just prints a warning to the log and then goes on -- doing what?
>
> The warning rolls off the log, and then you have no idea what happened,
> or how to recover.

The victims should be complaining in their log files, but, yes, I
must admit that it closely resembles /dev/null. And the
catastrophe comes suddenly.

> I would like a flag in pg_replication_slots, and possibly also a
> numerical column that indicates how far away from the critical point
> each slot is. That would be great for a monitoring system.

Great! I'll do that right now.

> --
> Peter Eisentraut http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Thanks.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: peter(dot)eisentraut(at)2ndquadrant(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-09-07 12:59:56
Message-ID: 20170907.215956.110216588.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

At Thu, 07 Sep 2017 14:12:12 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170907(dot)141212(dot)227032666(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > I would like a flag in pg_replication_slots, and possibly also a
> > numerical column that indicates how far away from the critical point
> > each slot is. That would be great for a monitoring system.
>
> Great! I'll do that right now.

Done.

In the attached patch, on top of the previous patch, I added two
columns to pg_replication_slots, "live" and "distance". The first
indicates whether the slot will "live" past the next checkpoint. The
second shows how many bytes the checkpoint LSN can advance before
the slot will "die", or how many bytes the slot has lost after
"death".

Setting wal_keep_segments = 1 and max_slot_wal_keep_size = 16MB.

=# select slot_name, restart_lsn, pg_current_wal_lsn(), live, distance from pg_replication_slots;

slot_name | restart_lsn | pg_current_wal_lsn | live | distance
-----------+-------------+--------------------+------+-----------
s1 | 0/162D388 | 0/162D3C0 | t | 0/29D2CE8

This shows that the checkpoint can advance 0x29d2ce8 bytes before
the slot dies, even if the connection stalls.

s1 | 0/4001180 | 0/6FFF2B8 | t | 0/DB8

Just before the slot loses sync.

s1 | 0/4001180 | 0/70008A8 | f | 0/FFEE80

The checkpoint after this removes some required segments.

2017-09-07 19:04:07.677 JST [13720] WARNING: restart LSN of replication slots is ignored by checkpoint
2017-09-07 19:04:07.677 JST [13720] DETAIL: Some replication slots have lost required WAL segments to continue by up to 1 segments.

If max_slot_wal_keep_size is not set (0), live is always true and
distance is NULL.

slot_name | restart_lsn | pg_current_wal_lsn | live | distance
-----------+-------------+--------------------+------+-----------
s1 | 0/4001180 | 0/73117A8 | t |

- The names (or contents) of the new columns are arguable.

- The pg_replication_slots view takes an LWLock on ControlFile and a
spinlock on XLogCtl for every slot, but it seems difficult to
reduce that..

- distance seems to mistakenly become 0/0 under certain conditions..

- The result seems almost right, but a more precise check is needed.
(Anyway, it cannot be perfectly exact.)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0002-Add-monitoring-aid-for-max_replication_slots.patch text/x-patch 7.2 KB

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: peter(dot)eisentraut(at)2ndquadrant(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-09-13 02:43:06
Message-ID: 20170913.114306.67844218.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

At Thu, 07 Sep 2017 21:59:56 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170907(dot)215956(dot)110216588(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> Hello,
>
> At Thu, 07 Sep 2017 14:12:12 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170907(dot)141212(dot)227032666(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > > I would like a flag in pg_replication_slots, and possibly also a
> > > numerical column that indicates how far away from the critical point
> > > each slot is. That would be great for a monitoring system.
> >
> > Great! I'll do that right now.
>
> Done.

The CF status of this patch turned into "Waiting on Author"
because the second patch was posted separately from the
first patch. I repost them together after rebasing to the current
master.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001_slot_releaf_vent_v4.patch text/x-patch 3.0 KB
0002_slot_behind_monitor_v2.patch text/x-patch 7.1 KB

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: peter(dot)eisentraut(at)2ndquadrant(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-09-13 08:08:16
Message-ID: 20170913.170816.148239681.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

At Wed, 13 Sep 2017 11:43:06 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170913(dot)114306(dot)67844218(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> At Thu, 07 Sep 2017 21:59:56 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170907(dot)215956(dot)110216588(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > Hello,
> >
> > At Thu, 07 Sep 2017 14:12:12 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170907(dot)141212(dot)227032666(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > > > I would like a flag in pg_replication_slots, and possibly also a
> > > > numerical column that indicates how far away from the critical point
> > > > each slot is. That would be great for a monitoring system.
> > >
> > > Great! I'll do that right now.
> >
> > Done.
>
> The CF status of this patch turned into "Waiting on Author"
> because the second patch was posted separately from the
> first patch. I repost them together after rebasing to the current
> master.

Hmm. I was unconsciously careless about the regression tests since
the patch is in a tentative shape. This version must pass the
regression tests..

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001_slot_releaf_vent_v4.patch text/x-patch 3.0 KB
0002_slot_behind_monitor_v3.patch text/x-patch 8.0 KB

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: peter(dot)eisentraut(at)2ndquadrant(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-10-31 09:43:10
Message-ID: 20171031.184310.182012625.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello, this is a rebased version.

Along with the rebasing, the meaning of the monitoring values has
changed.

In the previous version, the "live" column mysteriously predicted
whether the necessary segments would be kept or lost by the next
checkpoint, and the "distance" column offered a still more mysterious
value.

In this version, the meanings of the two columns have become clear
and informative.

pg_replication_slots
- live:
  true if the slot has not lost necessary segments.

- distance:
  how many bytes the LSN can advance before the margin defined by
  max_slot_wal_keep_size (and wal_keep_segments) is exhausted,
  or how many bytes of xlog this slot has lost from restart_lsn.

There is a case where live = t and distance = 0. The slot currently
has all the necessary segments but will start to lose them within at
most two checkpoint passes.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Add-WAL-releaf-vent-for-replication-slots.patch text/x-patch 3.9 KB
0002-Add-monitoring-aid-for-max_replication_slots.patch text/x-patch 9.5 KB

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-11-06 01:02:58
Message-ID: CAEepm=31P_f6ajnu1URqGSB1N0MjDCzePGrJmgxidemGhk7H6Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 31, 2017 at 10:43 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello, this is a rebased version.

Hello Horiguchi-san,

I think the "ddl" test under contrib/test_decoding also needs to be
updated because it looks at pg_replication_slots and doesn't expect
your new columns.

--
Thomas Munro
http://www.enterprisedb.com


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-11-06 03:07:04
Message-ID: CAMsr+YHbr-akKkz+aAN65SyWGfFivdR=NKKUmJmH7XDiWBGj=Q@mail.gmail.com
Lists: pgsql-hackers

On 31 October 2017 at 17:43, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello, this is a rebased version.
>
> Along with the rebasing, the meaning of the monitoring values has
> changed.
>
> In the previous version, the "live" column mysteriously predicted
> whether the necessary segments would be kept or lost by the next
> checkpoint, and the "distance" column offered a still more mysterious value.

Would it make sense to teach xlogreader how to fetch from WAL archive,
too? That way if there's an archive, slots could continue to be used
even after we purge from local pg_xlog, albeit at a performance cost.

I'm thinking of this mainly for logical slots.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-11-06 13:19:37
Message-ID: 20171106131937.acyi5opnxf2ixlrd@alap3.anarazel.de
Lists: pgsql-hackers

On 2017-11-06 11:07:04 +0800, Craig Ringer wrote:
> Would it make sense to teach xlogreader how to fetch from WAL archive,
> too? That way if there's an archive, slots could continue to be used
> even after we purge from local pg_xlog, albeit at a performance cost.
>
> I'm thinking of this mainly for logical slots.

That seems more like a page read callback's job than xlogreader's.

Greetings,

Andres Freund


From: Andres Freund <andres(at)anarazel(dot)de>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-11-06 13:20:50
Message-ID: 20171106132050.6apzynxrqrzghb4r@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2017-10-31 18:43:10 +0900, Kyotaro HORIGUCHI wrote:
> - distance:
>   how many bytes the LSN can advance before the margin defined by
>   max_slot_wal_keep_size (and wal_keep_segments) is exhausted,
>   or how many bytes of xlog this slot has lost from restart_lsn.

I don't think 'distance' is a good metric - that's going to continually
change. Why not store the LSN that's available and provide a function
that computes this? Or just rely on the lsn - lsn operator?
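
E.g., with the existing pg_lsn subtraction operator:

SELECT slot_name,
       pg_current_wal_lsn() - restart_lsn AS retained_bytes
FROM pg_replication_slots;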

- Andres


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: andres(at)anarazel(dot)de
Cc: peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-11-08 04:14:31
Message-ID: 20171108.131431.170534842.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

At Mon, 6 Nov 2017 05:20:50 -0800, Andres Freund <andres(at)anarazel(dot)de> wrote in <20171106132050(dot)6apzynxrqrzghb4r(at)alap3(dot)anarazel(dot)de>
> Hi,
>
> On 2017-10-31 18:43:10 +0900, Kyotaro HORIGUCHI wrote:
> > - distance:
> >   how many bytes the LSN can advance before the margin defined by
> >   max_slot_wal_keep_size (and wal_keep_segments) is exhausted,
> >   or how many bytes of xlog this slot has lost from restart_lsn.
>
> I don't think 'distance' is a good metric - that's going to continually
> change. Why not store the LSN that's available and provide a function
> that computes this? Or just rely on the lsn - lsn operator?

It seems reasonable. The 'secured minimum LSN' is common among
all slots, so showing it in the view may look a bit odd, but I
don't find another suitable place for it. distance = 0 meant the
state where the slot is living but insecured in the previous patch,
and that information is lost by changing 'distance' to
'min_secure_lsn'.

Thus I changed the 'live' column to 'status', which shows the status
in text representation.

status: secured | insecured | broken

So this looks like the following (max_slot_wal_keep_size = 8MB,
which is half the default segment size):

-- slots whose required WAL is surely available
select restart_lsn, status, min_secure_lsn, pg_current_wal_lsn() from pg_replication_slots;
restart_lsn | status | min_secure_lsn | pg_current_wal_lsn
------------+---------+----------------+--------------------
0/1A000060 | secured | 0/1A000000 | 0/1B42BC78

-- slots whose required WAL is still available but insecured
restart_lsn | status | min_secure_lsn | pg_current_wal_lsn
------------+-----------+----------------+--------------------
0/1A000060 | insecured | 0/1C000000 | 0/1D76C948

-- slots whose required WAL is lost
# We should have seen the log 'Some replication slots have lost...'

restart_lsn | status | min_secure_lsn | pg_current_wal_lsn
------------+--------+----------------+--------------------
0/1A000060 | broken | 0/1C000000 | 0/1D76C9F0

I noticed that I had dropped the segment fragment of
max_slot_wal_keep_size in the calculations in those routines. The
current patch honors the fragment part of max_slot_wal_keep_size.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Add-WAL-releaf-vent-for-replication-slots.patch text/x-patch 4.4 KB
0002-Add-monitoring-aid-for-max_replication_slots.patch text/x-patch 9.5 KB

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: andres(at)anarazel(dot)de
Cc: peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Restricting maximum keep segments by repslots
Date: 2017-11-09 08:31:28
Message-ID: 20171109.173128.220115527.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Oops! The previous patch forgets the default case and crashes.

At Wed, 08 Nov 2017 13:14:31 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20171108(dot)131431(dot)170534842(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > I don't think 'distance' is a good metric - that's going to continually
> > change. Why not store the LSN that's available and provide a function
> > that computes this? Or just rely on the lsn - lsn operator?
>
> It seems reasonable. The 'secured minimum LSN' is common among
> all slots, so showing it in the view may look a bit odd, but I
> don't find another suitable place for it. distance = 0 meant the
> state where the slot is living but insecured in the previous patch,
> and that information is lost by changing 'distance' to
> 'min_secure_lsn'.
>
> Thus I changed the 'live' column to 'status', which shows the status
> in text representation.
>
> status: secured | insecured | broken
>
> So this looks like the following (max_slot_wal_keep_size = 8MB,
> which is half the default segment size):
>
> -- slots whose required WAL is surely available
> select restart_lsn, status, min_secure_lsn, pg_current_wal_lsn() from pg_replication_slots;
> restart_lsn | status | min_secure_lsn | pg_current_wal_lsn
> ------------+---------+----------------+--------------------
> 0/1A000060 | secured | 0/1A000000 | 0/1B42BC78
>
> -- slots whose required WAL is still available but insecured
> restart_lsn | status | min_secure_lsn | pg_current_wal_lsn
> ------------+-----------+----------------+--------------------
> 0/1A000060 | insecured | 0/1C000000 | 0/1D76C948
>
> -- slots whose required WAL is lost
> # We should have seen the log 'Some replication slots have lost...'
>
> restart_lsn | status | min_secure_lsn | pg_current_wal_lsn
> ------------+--------+----------------+--------------------
> 0/1A000060 | broken | 0/1C000000 | 0/1D76C9F0
>
>
> I noticed that I had dropped the segment fragment of
> max_slot_wal_keep_size in the calculations in those routines. The
> current patch honors the fragment part of max_slot_wal_keep_size.

I changed IsLsnStillAvailable to return meaningful values
regardless of whether max_slot_wal_keep_size is set or not.

# I had been forgetting to bump the version for the last several
# patches. I give this the version '4', as the next after the last
# numbered patch.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Add-WAL-releaf-vent-for-replication-slots-4.patch text/x-patch 4.4 KB
0002-Add-monitoring-aid-for-max_replication_slots-4.patch text/x-patch 9.5 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2017-11-30 03:44:16
Message-ID: CAB7nPqS4bhSsDm_47GVjQno=iU6thx13MQVwwXXKBHQwfwwNCA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 9, 2017 at 5:31 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> # I had been forgetting to bump the version for the last several
> # patches. I give this the version '4', as the next after the last
> # numbered patch.

With all the changes that have happened in the documentation lately, I
suspect that this is going to need a rework.. Moved to next CF per
lack of reviews, with waiting on author as status.
--
Michael


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(dot)paquier(at)gmail(dot)com
Cc: andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2017-12-22 06:03:20
Message-ID: 20171222.150320.244267360.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

At Thu, 30 Nov 2017 12:44:16 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqS4bhSsDm_47GVjQno=iU6thx13MQVwwXXKBHQwfwwNCA(at)mail(dot)gmail(dot)com>
> On Thu, Nov 9, 2017 at 5:31 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > # I had been forgetting to bump the version for the last several
> > # patches. I give this the version '4', as the next after the last
> > # numbered patch.
>
> With all the changes that have happened in the documentation lately, I
> suspect that this is going to need a rework.. Moved to next CF per
> lack of reviews, with waiting on author as status.

I refactored this patch so that almost-identical code doesn't appear
twice, and added a recovery TAP test for this.

The new function GetMinSecuredSegment() calculates the segment number
considering wal_keep_segments and
max_slot_wal_keep_size. KeepLogSeg and IsLsnStillAvailable no
longer contain code blocks that have to be kept in "sync".
I think the new code is far more understandable than the previous one.

The new third patch contains a TAP test checking that
max_slot_wal_keep_size and the relevant stats view are really
working.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Add-WAL-releaf-vent-for-replication-slots.patch text/x-patch 6.9 KB
0002-Add-monitoring-aid-for-max_replication_slots.patch text/x-patch 8.3 KB
0003-TAP-test-for-the-slot-limit-feature.patch text/x-patch 5.3 KB

From: Sergei Kornilov <sk(at)zsrv(dot)org>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "michael(dot)paquier(at)gmail(dot)com" <michael(dot)paquier(at)gmail(dot)com>
Cc: "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "peter(dot)eisentraut(at)2ndquadrant(dot)com" <peter(dot)eisentraut(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2017-12-22 12:04:20
Message-ID: 337571513944260@web55j.yandex.ru
Lists: pgsql-hackers

Hello
I think limiting the WAL kept by replication slots is useful in some cases. But at first I was confused by the proposed secured/insecured/broken/unknown state terminology.

patch -p1 gives some "Stripping trailing CRs from patch" messages for me, but it applied to current HEAD and builds. After a little testing I understood the differences in the secured/insecured/broken terminology: secured means the WAL is guaranteed to be kept, insecured means the WAL may be deleted by the next checkpoint, and broken means the WAL has already been deleted.
I think we could split "secured" into "streaming" and... hmm... "waiting"? "keeping"? - according to the active flag, for a clearer and more readable "status" field.

regards, Sergei


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sk(at)zsrv(dot)org
Cc: michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-01-11 06:59:10
Message-ID: 20180111.155910.26212237.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello. Thank you for the comment.

(And sorry for the absence.)

At Fri, 22 Dec 2017 15:04:20 +0300, Sergei Kornilov <sk(at)zsrv(dot)org> wrote in <337571513944260(at)web55j(dot)yandex(dot)ru>
> Hello
> I think limiting the WAL kept by replication slots is useful in some cases. But at first I was confused by the proposed secured/insecured/broken/unknown state terminology.

I'm not confident about the terminology. Suggestions are welcome for
wording that makes more sense.

> patch -p1 gives some "Stripping trailing CRs from patch"
>> messages for me, but it applied to current HEAD and builds. After

Hmm. I wonder why I get that complaint so often. (Is it rather
common? Or caused by the MIME format of my mail?) I'd say with
confidence that it is because you retrieved the patch file with a
Windows mailer.

> a little testing I understood the differences in the
> secured/insecured/broken terminology: secured means the WAL is
> guaranteed to be kept, insecured means the WAL may be deleted by the
> next checkpoint, and broken means the WAL has already been deleted.

Right. I'm sorry that I hadn't written that down clearly anywhere
and made you confirm it. I added documentation as the fourth
patch.

> I think we could split "secured" into "streaming"
> and... hmm... "waiting"? "keeping"? - according to the active flag,
> for a clearer and more readable "status" field.

streaming / keeping and lost? (and unknown) Also, "status"
surely has a somewhat obscure meaning. Would wal_status (or
(wal_)availability) and min_keep_lsn make more sense?

The additional fields in pg_replication_slots have been changed
as follows in the attached patch.

confirmed_flush_lsn:
+ wal_status : (streaming | keeping | lost | unknown)
+ min_keep_lsn : <The oldest LSN that is available in WAL files>
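
A query against the patched view would then be something like
(column names per this version of the patch):

SELECT slot_name, wal_status, min_keep_lsn
FROM pg_replication_slots;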

The documentation changes can be seen in the following HTML files:

doc/src/sgml/html/warm-standby.html#STREAMING-REPLICATION-SLOTS
doc/src/sgml/html/runtime-config-replication.html#GUC-MAX-SLOT-WAL-KEEP-SIZE
doc/src/sgml/html/view-pg-replication-slots.html

One annoyance is that the min_keep_lsn always has the same value
among all slots.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Add-WAL-releaf-vent-for-replication-slots.patch text/x-patch 6.9 KB
0002-Add-monitoring-aid-for-max_replication_slots.patch text/x-patch 8.3 KB
0003-TAP-test-for-the-slot-limit-feature.patch text/x-patch 5.3 KB
0004-Documentation-for-slot-limit-feature.patch text/x-patch 5.2 KB

From: Sergei Kornilov <sk(at)zsrv(dot)org>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: "michael(dot)paquier(at)gmail(dot)com" <michael(dot)paquier(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "peter(dot)eisentraut(at)2ndquadrant(dot)com" <peter(dot)eisentraut(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-01-11 09:55:27
Message-ID: 2798121515664527@web40g.yandex.ru
Lists: pgsql-hackers

Hello

>> patch -p1 gives some "Stripping trailing CRs from patch"
>> messages for me, but it applied to current HEAD and builds. After
>
> Hmm. I wonder why I get that complaint so often. (Is it rather
> common? Or caused by the MIME format of my mail?) I'd say with
> confidence that it is because you retrieved the patch file with a
> Windows mailer.
I use Debian and a web-based mailer. Hm, I wget the patches from the links here /message-id/flat/20180111.155910.26212237.horiguchi.kyotaro%40lab.ntt.co.jp - they apply cleanly for both the last and previous messages. It's strange.

The updated patches build OK, but I found one failed test in make check-world: contrib/test_decoding/sql/ddl.sql at the end does SELECT * FROM pg_replication_slots; whose result of course has changed.
And I still have no better ideas for naming. I am thinking of something like
if (min_keep_lsn <= restart_lsn)
    if (active_pid != 0)
        status = "streaming";
    else
        status = "keeping";
else
    status = "may_lost";
This duplicates the existing active field, but I think it's useful as a slot status description.
The wal_status streaming/keeping/lost/unknown naming as described in the docs patch is also acceptable to me. Maybe someone else has a better idea?

Regards, Sergei


From: Greg Stark <stark(at)mit(dot)edu>
To: Sergei Kornilov <sk(at)zsrv(dot)org>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "michael(dot)paquier(at)gmail(dot)com" <michael(dot)paquier(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "peter(dot)eisentraut(at)2ndquadrant(dot)com" <peter(dot)eisentraut(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-01-11 13:56:14
Message-ID: CAM-w4HOVYZkCbCdFt8N8zwAAcuETFimwOB_Db+jgFajn-iYHEQ@mail.gmail.com
Lists: pgsql-hackers

On 11 January 2018 at 09:55, Sergei Kornilov <sk(at)zsrv(dot)org> wrote:
> if (active_pid != 0)
>     status = "streaming";
> else
>     status = "keeping";

Perhaps "idle" by analogy to a pg_stat_activity entry for a backend
that's connected but not doing anything.

> status = "may_lost";

Perhaps "stale" or "expired"?

Is this patch in bike-shed territory? Are there any questions about
whether we want the basic shape to look like this?

Fwiw I think there's a real need for this feature so I would like to
get it in for Postgres 11.

--
greg


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sk(at)zsrv(dot)org
Cc: michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-01-15 06:05:07
Message-ID: 20180115.150507.251293598.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

At Thu, 11 Jan 2018 12:55:27 +0300, Sergei Kornilov <sk(at)zsrv(dot)org> wrote in <2798121515664527(at)web40g(dot)yandex(dot)ru>
> Hello
>
> >> patch -p1 gives some "Stripping trailing CRs from patch"
> >> messages for me, but it applied to current HEAD and builds. After
> >
> > Hmm. I wonder why I get that complaint so often. (Is it rather
> > common? Or caused by the MIME format of my mail?) I'd say with
> > confidence that it is because you retrieved the patch file with a
> > Windows mailer.
> I use Debian and a web-based mailer. Hm, I wget the patches from the links here /message-id/flat/20180111.155910.26212237.horiguchi.kyotaro%40lab.ntt.co.jp - they apply cleanly for both the last and previous messages. It's strange.

Thanks for the information. The cause, I suppose, is that *I*
attached the files with a *text* MIME type. I have taught my mailer
application to use "Application/Octet-stream" instead, and that
should make most (or all) people here happy.

> The updated patches build OK, but I found one failed test in make check-world: contrib/test_decoding/sql/ddl.sql at the end does SELECT * FROM pg_replication_slots; whose result of course has changed

Mmm. Good catch. check-world (the contrib tests) had escaped my
attention. It is fixed locally.

> And i still have no better ideas for naming. I think on something like
> if (min_keep_lsn <= restart_lsn)
> if (active_pid != 0)
> status = "streaming";
> else
> status = "keeping";
> else
> status = "may_lost";
> This duplicates an existing active field, but I think it's useful as slot status description.
> wal_status streaming/keeping/lost/unknown as described in docs patch is also acceptable for me. Maybe anyone else has better idea?

I'll fix this after the discussion.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: stark(at)mit(dot)edu
Cc: sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-01-15 06:36:32
Message-ID: 20180115.153632.63046748.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

At Thu, 11 Jan 2018 13:56:14 +0000, Greg Stark <stark(at)mit(dot)edu> wrote in <CAM-w4HOVYZkCbCdFt8N8zwAAcuETFimwOB_Db+jgFajn-iYHEQ(at)mail(dot)gmail(dot)com>
> On 11 January 2018 at 09:55, Sergei Kornilov <sk(at)zsrv(dot)org> wrote:
> > if (active_pid != 0)
> > status = "streaming";
> > else
> > status = "keeping";
>
> Perhaps "idle" by analogy to a pg_stat_activity entry for a backend
> that's connected but not doing anything.

The state "keeping" means "some segments that are needed by a slot
still exist but will be removed by the next checkpoint". The
three states are analogous to the green/yellow/red of traffic
lights, so "idle" doesn't feel right.

> > status = "may_lost";
>
> Perhaps "stale" or "expired"?

Some random thoughts on this topic:

Reading the field as "the WAL record at restart_lsn is/has been
$(status)", "expired" fits there. "safe"/"critical"/("stale" or
"expired") would fit the reading "restart_lsn is $(status)".

If we merged the second state into the red side, a boolean column
named "wal_preserved" or "wal_available" might work. But I
believe the second state is crucial.

> Is this patch in bike-shed territory? Are there any questions about
> whether we want the basic shape to look like this?

FWIW the summary history of this patch follows.

- added monitoring feature,
- GUC in bytes not in segments,
- show the "min_keep_lsn" instead of "spare amount of available
WAL (distance)" (*1)
- changed the words to show the status. (still under discussion)
- added documentation.

I didn't adopt a "setting per slot" since the keep amount is not
measured from each slot's restart_lsn, but from the checkpoint LSN.
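
A minimal sketch of that point in C (illustrative names only; segment arithmetic simplified, and keep_segs would be derived from max_slot_wal_keep_size):

#include <stdint.h>

typedef uint64_t XLogSegNo;     /* stand-in for PostgreSQL's XLogSegNo */

/*
 * Sketch: the retention limit is anchored at the checkpoint's segment,
 * not at each slot's restart_lsn, so a single global limit suffices.
 */
static XLogSegNo
oldest_guaranteed_segment(XLogSegNo checkpoint_seg, XLogSegNo keep_segs)
{
    /* Never guarantee segments further back than keep_segs behind the
     * checkpoint, regardless of where individual slots stand. */
    return (checkpoint_seg > keep_segs) ? checkpoint_seg - keep_segs : 1;
}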

*1: As I mentioned upthread, I think that at least
"pg_replication_slots.min_keep_lsn" is arguable, since it shows
the same value for all slots, and I haven't found any other
appropriate place for it.

> Fwiw I think there's a real need for this feature so I would like to
> get it in for Postgres 11.

It encourages me a lot. Thanks.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Sergei Kornilov <sk(at)zsrv(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-01-16 02:34:05
Message-ID: CA+TgmoZZNa6DUaoT0Bw3C420BBrshF5FY_L9Aa9CakC=hpCqqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 15, 2018 at 1:05 AM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> >> patch -p1 gives some "Stripping trailing CRs from patch"
>> >> messages for me, but applied to current HEAD and builds. After
>> >
>> > Hmm. I wonder why I get that complaint so often. (It's rather
>> > common? or caused by the MIME format of my mail?) I'd say with
>> > confidence that it is because you retrieved the patch file on
>> > Windows mailer.
>> I use Debian and web based mailer. Hm, i wget patches from links here /message-id/flat/20180111.155910.26212237.horiguchi.kyotaro%40lab.ntt.co.jp - applies clean both last and previous messages. Its strange.
>
> Thanks for the information. The cause I suppose is that *I*
> attached the files in *text* MIME type. I taught my mailer
> application to use "Application/Octet-stream" instead and that
> should make most (or all) people here happy.

Since the "Stripping trailing CRs from patch" message is totally
harmless, I'm not sure why you should need to devote any effort to
avoiding it. Anyone who gets it should just ignore it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Sergei Kornilov <sk(at)zsrv(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-01-16 02:45:34
Message-ID: 26718.1516070734@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Since the "Stripping trailing CRs from patch" message is totally
> harmless, I'm not sure why you should need to devote any effort to
> avoiding it. Anyone who gets it should just ignore it.

Not sure, but that might be another situation in which "patch"
works and "git apply" doesn't. (Feeling too lazy to test it...)

regards, tom lane


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: robertmhaas(at)gmail(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-01-16 09:04:51
Message-ID: 20180116.180451.243269920.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'm digressing...

At Mon, 15 Jan 2018 21:45:34 -0500, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in <26718(dot)1516070734(at)sss(dot)pgh(dot)pa(dot)us>
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > Since the "Stripping trailing CRs from patch" message is totally
> > harmless, I'm not sure why you should need to devote any effort to
> > avoiding it. Anyone who gets it should just ignore it.

I know that, and I totally agree with Robert, but I still wonder
why I sometimes receive such complaints (and am annoyed by them),
or even an accusation that I sent an out-of-convention patch, and
I was afraid that my setting was not actually common.

For this reason I roughly counted up the CT/CTEs (Content-Type /
Content-Transfer-Encoding) that people here use for patches in my
mailbox, and got the following numbers. (Counted over attachments
named "*.patch" or "*.diff".)

Rank : Freq : CT/CTE
   1 : 3308 : application/octet-stream:base64
   2 : 1642 : text/x-patch;charset=us-ascii:base64
   3 : 1286 : text/x-diff;charset=us-ascii:7bit
 * 4 :  997 : text/x-patch;charset=us-ascii:7bit
   5 :  497 : text/x-diff;charset=us-ascii:base64
   6 :  406 : text/x-diff:quoted-printable
   7 :  403 : text/plain;charset=us-ascii:7bit
   8 :  389 : text/x-diff:base64
   9 :  321 : application/x-gzip:base64
  10 :  281 : text/plain;charset=us-ascii:base64
<snip>
Total: attachments=11461 / mails=158121

The most common setting is application/octet-stream:base64, but
text/x-patch;charset=us-ascii:7bit is also among the most common.

I'm convinced that my original setting is not so problematic, so
I reverted the change.

> Not sure, but that might be another situation in which "patch"
> works and "git apply" doesn't. (Feeling too lazy to test it...)

I was also afraid of that, as I wrote upthread, but it too seems
to be a needless fear.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: sk(at)zsrv(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-01-28 22:07:31
Message-ID: CAEepm=3nOUqNWyKQ83StGUeCB9LUsTw66w=Sy6H+xKfSbcRu3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 11, 2018 at 7:59 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> [new patch set]

FYI this is still broken:

test ddl ... FAILED

You could see that like this:

cd contrib/test_decoding
make check

--
Thomas Munro
http://www.enterprisedb.com


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: thomas(dot)munro(at)enterprisedb(dot)com
Cc: sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-01-29 10:26:34
Message-ID: 20180129.192634.217484965.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you for kindly notifying me of that.

At Mon, 29 Jan 2018 11:07:31 +1300, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote in <CAEepm=3nOUqNWyKQ83StGUeCB9LUsTw66w=Sy6H+xKfSbcRu3Q(at)mail(dot)gmail(dot)com>
> On Thu, Jan 11, 2018 at 7:59 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > [new patch set]
>
> FYI this is still broken:
>
> test ddl ... FAILED
>
> You could see that like this:
>
> cd contrib/test_decoding
> make check

I guess I might somehow have sent a work-in-progress version of
0002. While rechecking the patch, I fixed the message issued on
losing segments in 0001 and revised the TAP test, since I found
that it was unstable.

The attached files are the correct version of the latest patch.

Thanks.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Add-WAL-releaf-vent-for-replication-slots.patch text/x-patch 6.9 KB
0002-Add-monitoring-aid-for-max_replication_slots.patch text/x-patch 9.3 KB
0003-TAP-test-for-the-slot-limit-feature.patch text/x-patch 5.3 KB
0004-Documentation-for-slot-limit-feature.patch text/x-patch 5.2 KB

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: thomas(dot)munro(at)enterprisedb(dot)com
Cc: sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-01-29 10:40:23
Message-ID: 20180129.194023.228030941.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

At Mon, 29 Jan 2018 19:26:34 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20180129(dot)192634(dot)217484965(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> While rechecking the patch, I fixed the message issued on losing
> segments in 0001, revised the TAP test since I found that it was
> unstable.
>
> The attached files are the correct version of the latest patch.

The name of the new function GetMinKeepSegment seems to convey the
wrong meaning. I renamed it to GetOldestKeepSegment.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Add-WAL-releaf-vent-for-replication-slots.patch text/x-patch 6.9 KB
0002-Add-monitoring-aid-for-max_replication_slots.patch text/x-patch 9.4 KB
0003-TAP-test-for-the-slot-limit-feature.patch text/x-patch 5.3 KB
0004-Documentation-for-slot-limit-feature.patch text/x-patch 5.2 KB

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: thomas(dot)munro(at)enterprisedb(dot)com
Cc: sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-03-19 08:09:48
Message-ID: 20180319.170948.139803971.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Mon, 29 Jan 2018 19:40:23 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20180129(dot)194023(dot)228030941(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> Hello,
>
> At Mon, 29 Jan 2018 19:26:34 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20180129(dot)192634(dot)217484965(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > While rechecking the patch, I fixed the message issued on losing
> > segments in 0001, revised the TAP test since I found that it was
> > unstable.
> >
> > The attached files are the correct version of the latest patch.
>
> The name of the new function GetMinKeepSegment seems giving wrong
> meaning. I renamed it to GetOlestKeepSegment.

I found that fd1a421fe6 and df411e7c66 conflict with this patch.
Rebased onto the current HEAD.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Add-WAL-releaf-vent-for-replication-slots.patch text/x-patch 7.0 KB
0002-Add-monitoring-aid-for-max_replication_slots.patch text/x-patch 9.4 KB
0003-TAP-test-for-the-slot-limit-feature.patch text/x-patch 5.3 KB
0004-Documentation-for-slot-limit-feature.patch text/x-patch 5.3 KB

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-06-26 07:26:59
Message-ID: 20180626.162659.223208514.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello. This is the rebased version of the slot-limit feature.

This patch limits the maximum number of WAL segments kept by
replication slots. A replication slot is useful to avoid desync
with replicas after a temporary disconnection, but it is dangerous
when some of the replicas are lost: the WAL space can be exhausted,
and the server can PANIC in the worst case. This patch prevents
that worst case, while preserving the benefit of replication
slots, using a new GUC variable max_slot_wal_keep_size.
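
Conceptually, the checkpoint-time behavior might be sketched like this in C (illustrative only, not the patch's code; the WARNING text echoes the message quoted later in this thread, and the names are assumptions):

#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogSegNo;     /* stand-in for PostgreSQL's XLogSegNo */

/*
 * Sketch: if honoring the slots would retain more than the limit derived
 * from max_slot_wal_keep_size, the oldest segments are recycled anyway
 * and a warning is raised.  Assumes slotSeg <= currSeg.
 */
static XLogSegNo
keep_boundary(XLogSegNo currSeg, XLogSegNo slotSeg, XLogSegNo limitSegs)
{
    if (currSeg - slotSeg > limitSegs)
    {
        fprintf(stderr,
                "WARNING: some replication slots have lost required WAL segments\n");
        return currSeg - limitSegs;     /* override the slots' request */
    }

    return slotSeg;             /* the slots are still fully honored */
}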

This is a feature mentioned in the documentation.

/docs/current/static/warm-standby.html#STREAMING-REPLICATION-SLOTS

> In lieu of using replication slots, it is possible to prevent the
> removal of old WAL segments using wal_keep_segments, or by
> storing the segments in an archive using
> archive_command. However, these methods often result in retaining
> more WAL segments than required, whereas replication slots retain
> only the number of segments known to be needed. An advantage of
> these methods is that they bound the space requirement for
> pg_wal; there is currently no way to do this using replication
> slots.

The previous patch files don't have a version number, so I labeled
the attached latest version v2.

v2-0001-Add-WAL-releaf-vent-for-replication-slots.patch
The body of the limiting feature

v2-0002-Add-monitoring-aid-for-max_replication_slots.patch
Shows the status of WAL retention in the pg_replication_slots view

v2-0003-TAP-test-for-the-slot-limit-feature.patch
TAP test for this feature

v2-0004-Documentation-for-slot-limit-feature.patch
Documentation, as the name suggests.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v2-0004-Documentation-for-slot-limit-feature.patch text/x-patch 5.3 KB
v2-0003-TAP-test-for-the-slot-limit-feature.patch text/x-patch 5.3 KB
v2-0002-Add-monitoring-aid-for-max_replication_slots.patch text/x-patch 8.9 KB
v2-0001-Add-WAL-releaf-vent-for-replication-slots.patch text/x-patch 7.0 KB

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-04 08:28:38
Message-ID: 20180704.172838.258613819.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello.

At Tue, 26 Jun 2018 16:26:59 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20180626(dot)162659(dot)223208514(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> The previous patche files doesn't have version number so I let
> the attached latest version be v2.
>
>
> v2-0001-Add-WAL-releaf-vent-for-replication-slots.patch
> The body of the limiting feature
>
> v2-0002-Add-monitoring-aid-for-max_replication_slots.patch
> Shows the status of WAL rataining in pg_replication_slot view
>
> v2-0003-TAP-test-for-the-slot-limit-feature.patch
> TAP test for this feature
>
> v2-0004-Documentation-for-slot-limit-feature.patch
> Documentation, as the name.

Travis (the test_decoding test) showed that GetOldestXLogFileSegNo,
added by 0002, forgets to close the temporarily opened pg_wal
directory. This is the fixed version, v3.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v3-0001-Add-WAL-releaf-vent-for-replication-slots.patch text/x-patch 7.0 KB
v3-0002-Add-monitoring-aid-for-max_replication_slots.patch text/x-patch 8.9 KB
v3-0003-TAP-test-for-the-slot-limit-feature.patch text/x-patch 5.3 KB
v3-0004-Documentation-for-slot-limit-feature.patch text/x-patch 5.3 KB

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, sk(at)zsrv(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-05 06:43:56
Message-ID: CAD21AoDiiA4qHj0thqw80Bt=vefSQ9yGpZnr0kuLTXszbrV9iQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jul 4, 2018 at 5:28 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello.
>
> At Tue, 26 Jun 2018 16:26:59 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20180626(dot)162659(dot)223208514(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
>> The previous patche files doesn't have version number so I let
>> the attached latest version be v2.
>>
>>
>> v2-0001-Add-WAL-releaf-vent-for-replication-slots.patch
>> The body of the limiting feature
>>
>> v2-0002-Add-monitoring-aid-for-max_replication_slots.patch
>> Shows the status of WAL rataining in pg_replication_slot view
>>
>> v2-0003-TAP-test-for-the-slot-limit-feature.patch
>> TAP test for this feature
>>
>> v2-0004-Documentation-for-slot-limit-feature.patch
>> Documentation, as the name.
>
> Travis (test_decoding test) showed that GetOldestXLogFileSegNo
> added by 0002 forgets to close temporarily opened pg_wal
> directory. This is the fixed version v3.
>

Thank you for updating the patch! I looked at the v3 patches. Here
are my review comments.

---
+ {"max_slot_wal_keep_size", PGC_SIGHUP, REPLICATION_SENDING,
+ gettext_noop("Sets the maximum size of extra WALs kept by replication slots."),
+ NULL,
+ GUC_UNIT_MB
+ },
+ &max_slot_wal_keep_size_mb,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },

Maybe MAX_KILOBYTES would be better than INT_MAX, as with max_wal_size.

---
Once the following WARNING has been emitted, the message is emitted
again on every CHECKPOINT, even if nothing has changed. Is that
expected behavior? I think it would be better to emit this message
only when we remove WAL segments that are required by slots.

WARNING: some replication slots have lost required WAL segments
DETAIL: The mostly affected slot has lost 153 segments.

---
+ Assert(wal_keep_segments >= 0);
+ Assert(max_slot_wal_keep_size_mb >= 0);

These assertions are not meaningful because these parameters are
guaranteed to be >= 0 by their definitions.

---
+ /* slots aren't useful, consider only wal_keep_segments */
+ if (slotpos == InvalidXLogRecPtr)
+ {

Perhaps XLogRecPtrIsInvalid(slotpos) is better.

---
+ if (slotpos != InvalidXLogRecPtr && currSeg <= slotSeg + wal_keep_segments)
+ slotpos = InvalidXLogRecPtr;
+
+ /* slots aren't useful, consider only wal_keep_segments */
+ if (slotpos == InvalidXLogRecPtr)
+ {

This logic is hard for me to read. slotpos can be any of: valid,
valid but becoming invalid halfway through, or invalid from the
beginning of this function. Can we convert this logic to the following?

if (XLogRecPtrIsInvalid(slotpos) ||
currSeg <= slotSeg + wal_keep_segments)

---
+ keepSegs = wal_keep_segments +
+ ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);

Why do we need to keep (wal_keep_segments + max_slot_wal_keep_size) WAL
segments? I think what this feature does is: if wal_keep_segments is
not useful (that is, slotSeg < (currSeg - wal_keep_segments)), then we
normally choose slotSeg as the lower boundary, but max_slot_wal_keep_size
restricts the lower boundary so that it doesn't get lower than that
threshold. So I thought what this function should do is to calculate
min(currSeg - wal_keep_segments, max(currSeg - max_slot_wal_keep_size,
slotSeg)); I might be missing something though.
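
Spelled out as code, the suggested computation might look like this (a sketch only; both limits are assumed pre-converted to segment counts, with underflow guards added, and all names are illustrative):

#include <stdint.h>

typedef uint64_t XLogSegNo;     /* stand-in for PostgreSQL's XLogSegNo */

/* Clamped subtraction so the sketch cannot underflow near segment 0. */
static XLogSegNo
seg_sub(XLogSegNo a, XLogSegNo b)
{
    return (a > b) ? a - b : 0;
}

/*
 * Sketch of the suggested boundary:
 * min(currSeg - wal_keep_segments,
 *     max(currSeg - max_slot_keep_segs, slotSeg)).
 */
static XLogSegNo
suggested_boundary(XLogSegNo currSeg, XLogSegNo wal_keep_segments,
                   XLogSegNo max_slot_keep_segs, XLogSegNo slotSeg)
{
    XLogSegNo by_limit = seg_sub(currSeg, max_slot_keep_segs);
    XLogSegNo lower = (by_limit > slotSeg) ? by_limit : slotSeg;  /* max() */
    XLogSegNo by_keep = seg_sub(currSeg, wal_keep_segments);

    return (by_keep < lower) ? by_keep : lower;                   /* min() */
}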

---
+ SpinLockAcquire(&XLogCtl->info_lck);
+ oldestSeg = XLogCtl->lastRemovedSegNo;
+ SpinLockRelease(&XLogCtl->info_lck);

We can use XLogGetLastRemovedSegno() instead.

---
+ xldir = AllocateDir(XLOGDIR);
+ if (xldir == NULL)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open write-ahead log directory \"%s\": %m",
+ XLOGDIR)));

Looking at other code that allocates a directory, we don't check
xldir == NULL because that is detected by the ReadDir() function,
which raises an error itself. So maybe we don't need to check it
just after allocation.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-09 05:47:06
Message-ID: 20180709.144706.258526585.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello, Sawada-san.

Thank you for the comments.

At Thu, 5 Jul 2018 15:43:56 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDiiA4qHj0thqw80Bt=vefSQ9yGpZnr0kuLTXszbrV9iQ(at)mail(dot)gmail(dot)com>
> On Wed, Jul 4, 2018 at 5:28 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > Hello.
> >
> > At Tue, 26 Jun 2018 16:26:59 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20180626(dot)162659(dot)223208514(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> >> The previous patche files doesn't have version number so I let
> >> the attached latest version be v2.
> >>
> >>
> >> v2-0001-Add-WAL-releaf-vent-for-replication-slots.patch
> >> The body of the limiting feature
> >>
> >> v2-0002-Add-monitoring-aid-for-max_replication_slots.patch
> >> Shows the status of WAL rataining in pg_replication_slot view
> >>
> >> v2-0003-TAP-test-for-the-slot-limit-feature.patch
> >> TAP test for this feature
> >>
> >> v2-0004-Documentation-for-slot-limit-feature.patch
> >> Documentation, as the name.
> >
> > Travis (test_decoding test) showed that GetOldestXLogFileSegNo
> > added by 0002 forgets to close temporarily opened pg_wal
> > directory. This is the fixed version v3.
> >
>
> Thank you for updating the patch! I looked at v3 patches. Here is
> review comments.
>
> ---
> + {"max_slot_wal_keep_size", PGC_SIGHUP, REPLICATION_SENDING,
> + gettext_noop("Sets the maximum size of extra
> WALs kept by replication slots."),
> + NULL,
> + GUC_UNIT_MB
> + },
> + &max_slot_wal_keep_size_mb,
> + 0, 0, INT_MAX,
> + NULL, NULL, NULL
> + },
>
> Maybe MAX_KILOBYTES would be better instead of INT_MAX like wal_max_size.

MAX_KILOBYTES is the maximum size in kB that still fits long or
Size/size_t variables after being converted into bytes. Applying
the limit there means we assume that the _mb variable can be
converted into kB, not into bytes. So applying it to
max/min_wal_size seems somewhat wrong, but it does no harm since
they are not actually converted into bytes.

max_slot_wal_keep_size is not converted into bytes either, so
capping it with INT_MAX is no problem. However, it doesn't need to
be larger than MAX_KILOBYTES, so I followed your suggestion in
order to keep it the same as max/min_wal_size.

> ---
> Once the following WARNING emitted this message is emitted whenever we
> execute CHECKPOINT even if we don't change anything. Is that expected
> behavior? I think it would be better to emit this message only when we
> remove WAL segements that are required by slots.
>
> WARNING: some replication slots have lost required WAL segments
> DETAIL: The mostly affected slot has lost 153 segments.

I hadn't considered the situation where the number of lost
segments doesn't change. I changed the code to mute the message
when the number of lost segments has not changed.

> ---
> + Assert(wal_keep_segments >= 0);
> + Assert(max_slot_wal_keep_size_mb >= 0);
>
> These assertions are not meaningful because these parameters are
> ensured >= 0 by those definition.

Yeah, that looks a bit paranoid. Removed.

> ---
> + /* slots aren't useful, consider only wal_keep_segments */
> + if (slotpos == InvalidXLogRecPtr)
> + {
>
> Perhaps XLogRecPtrIsInvalid(slotpos) is better.

Agreed. It is changed to "slotpos != InvalidXLogRecPtr" after
reworking the function per the comments below. I think the double
negation !XLogRecPtrIsInvalid() doesn't read well.

> ---
> + if (slotpos != InvalidXLogRecPtr && currSeg <= slotSeg + wal_keep_segments)
> + slotpos = InvalidXLogRecPtr;
> +
> + /* slots aren't useful, consider only wal_keep_segments */
> + if (slotpos == InvalidXLogRecPtr)
> + {
>
> This logic is hard to read to me. The slotpos can be any of: valid,
> valid but then become invalid in halfway or invalid from beginning of
> this function. Can we convert this logic to following?
>
> if (XLogRecPtrIsInvalid(slotpos) ||
> currSeg <= slotSeg + wal_keep_segments)

Right. But it is removed.

> ---
> + keepSegs = wal_keep_segments +
> + ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);
>
> Why do we need to keep (wal_keep_segment + max_slot_wal_keep_size) WAL
> segments? I think what this feature does is, if wal_keep_segments is
> not useful (that is, slotSeg < (currSeg - wal_keep_segment) then we
> normally choose slotSeg as lower boundary but max_slot_wal_keep_size
> restrict the lower boundary so that it doesn't get lower than the
> threshold. So I thought what this function should do is to calculate
> min(currSeg - wal_keep_segment, max(currSeg - max_slot_wal_keep_size,
> slotSeg)), I might be missing something though.

You're right that wal_keep_segments should not be added; instead,
it should give the lower limit on the number of segments to keep,
as the current KeepLogSeg() does. Fixed that.

Since the amount is specified in megabytes, silently rounding it
down to segment boundaries may not be proper in general, and this
feature used to use the fragment bytes to show users something. But
there is no longer a place where the fragments are perceptible to
users, and anyway the fragments are far smaller than the expected
total WAL size.

As a result, I removed the fragment calculation altogether, as you
suggested. The code gets much smaller and simpler.

> ---
> + SpinLockAcquire(&XLogCtl->info_lck);
> + oldestSeg = XLogCtl->lastRemovedSegNo;
> + SpinLockRelease(&XLogCtl->info_lck);
>
> We can use XLogGetLastRemovedSegno() instead.

That is because I thought it was for external use, specifically by
slot.c, since CheckXLogRemoved() reads it directly. I'll leave it
alone; both places would have to be fixed at once if we decide to
use it internally.

> ---
> + xldir = AllocateDir(XLOGDIR);
> + if (xldir == NULL)
> + ereport(ERROR,
> + (errcode_for_file_access(),
> + errmsg("could not open write-ahead log directory \"%s\": %m",
> + XLOGDIR)));
>
> Looking at other code allocating a directory we don't check xldir ==
> NULL because it will be detected by ReadDir() function and raise an
> error in that function. So maybe we don't need to check it just after
> allocation.

Thanks. I found that in the comment of ReadDir(). This doesn't
need special error handling, so I leave it to ReadDir there.

In addition to that, the documentation is fixed.

Attached is the v4 files.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v4-0001-Add-WAL-releaf-vent-for-replication-slots.patch text/x-patch 6.5 KB
v4-0002-Add-monitoring-aid-for-max_replication_slots.patch text/x-patch 8.7 KB
v4-0003-TAP-test-for-the-slot-limit-feature.patch text/x-patch 5.3 KB
v4-0004-Documentation-for-slot-limit-feature.patch text/x-patch 5.2 KB

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, sk(at)zsrv(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-11 06:09:23
Message-ID: CAD21AoCFtW6+SN_eVTszDAjQeeU2sSea2VpCEx08ejNafk8H9w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 9, 2018 at 2:47 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello. Sawada-san.
>
> Thank you for the comments.
>

Thank you for updating the patch!

> At Thu, 5 Jul 2018 15:43:56 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDiiA4qHj0thqw80Bt=vefSQ9yGpZnr0kuLTXszbrV9iQ(at)mail(dot)gmail(dot)com>
>> On Wed, Jul 4, 2018 at 5:28 PM, Kyotaro HORIGUCHI
>> ---
>> + SpinLockAcquire(&XLogCtl->info_lck);
>> + oldestSeg = XLogCtl->lastRemovedSegNo;
>> + SpinLockRelease(&XLogCtl->info_lck);
>>
>> We can use XLogGetLastRemovedSegno() instead.
>
> It is because I thought that it is for external usage,
> spcifically by slot.c since CheckXLogRemoved() is reading it
> directly. I leave it alone and they would have to be fixed at
> once if we decide to use it internally.

Agreed. I noticed that after I commented.

Here are review comments on the v4 patches.

+ if (minKeepLSN)
+ {
+ XLogRecPtr slotPtr = XLogGetReplicationSlotMinimumLSN();
+ Assert(!XLogRecPtrIsInvalid(slotPtr));
+
+ tailSeg = GetOldestKeepSegment(currpos, slotPtr);
+
+ XLogSegNoOffsetToRecPtr(tailSeg, 0, *minKeepLSN, wal_segment_size);
+ }

The usage of XLogSegNoOffsetToRecPtr is wrong. Since we specify the
destination as the 4th argument, wal_segment_size will be changed
by the above expression. The regression tests by the PostgreSQL
Patch Tester seem to have passed, but I got the following assertion
failure in recovery/t/010_logical_decoding_timelines.pl

TRAP: FailedAssertion("!(XLogRecPtrToBytePos(*StartPos) ==
startbytepos)", File: "xlog.c", Line: 1277)
----
+ XLByteToSeg(restartLSN, restartSeg, wal_segment_size);
+
+
+ if (minKeepLSN)

There is an extra empty line.

----
+ /* but, keep larger than wal_segment_size if any*/
+ if (wal_keep_segments > 0 && keepSegs < wal_keep_segments)
+ keepSegs = wal_keep_segments;

You meant wal_keep_segments in the above comment rather than
wal_segment_size? Also, the above comment needs a whitespace just
after "any".

----
+bool
+IsLsnStillAvaiable(XLogRecPtr restartLSN, XLogRecPtr *minKeepLSN)
+{

I think restartLSN is a term used for replication slots. Since this
function is defined in xlog.c, it would be better to change the
argument name to a more generic one, for example recptr.

----
+ /*
+ * Calcualte keep segments by slots first. The second term of the
+ * condition is just a sanity check.
+ */

s/calcualte/calculate/

----
+ /* get minimum segment ignorig timeline ID */

s/ignorig/ignoring/

----
min_keep_lsn in pg_replication_slots currently shows the same value
in every slot, but I felt that the value is not easy for users to
understand intuitively, because users have to look at that value
and compare it with the current LSN in order to check whether a
replication slot is about to reach the "lost" status. So how about
showing values that indicate how far each individual slot is from
the point where it becomes "lost"?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-13 08:40:04
Message-ID: 20180713.174004.249224160.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello.

At Wed, 11 Jul 2018 15:09:23 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoCFtW6+SN_eVTszDAjQeeU2sSea2VpCEx08ejNafk8H9w(at)mail(dot)gmail(dot)com>
> On Mon, Jul 9, 2018 at 2:47 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
..
> Here is review comments of v4 patches.
>
> + if (minKeepLSN)
> + {
> + XLogRecPtr slotPtr = XLogGetReplicationSlotMinimumLSN();
> + Assert(!XLogRecPtrIsInvalid(slotPtr));
> +
> + tailSeg = GetOldestKeepSegment(currpos, slotPtr);
> +
> + XLogSegNoOffsetToRecPtr(tailSeg, 0, *minKeepLSN,
> wal_segment_size);
> + }
>
> The usage of XLogSegNoOffsetToRecPtr is wrong. Since we specify the
> destination at 4th argument the wal_segment_size will be changed in
> the above expression. The regression tests by PostgreSQL Patch Tester

I'm not sure I get this correctly; the definition of the macro is
as follows.

| #define XLogSegNoOffsetToRecPtr(segno, offset, dest, wal_segsz_bytes) \
| (dest) = (segno) * (wal_segsz_bytes) + (offset)

The destination is the *3rd* parameter, and the fourth is the
segment size, which is not written to.

> seem passed but I got the following assertion failure in
> recovery/t/010_logical_decoding_timelines.pl
>
> TRAP: FailedAssertion("!(XLogRecPtrToBytePos(*StartPos) ==
> startbytepos)", File: "xlog.c", Line: 1277)

Hmm. I don't see a relation to this patch, but how did you cause
the failure? The failure means an inconsistency between the
existing XLogBytePosToRecPtr and XLogRecPtrToBytePos, which doesn't
seem possible without modifying those two functions.

> ----
> + XLByteToSeg(restartLSN, restartSeg, wal_segment_size);
> +
> +
> + if (minKeepLSN)
> There is an extra empty line.
>
> ----
> + /* but, keep larger than wal_segment_size if any*/
> + if (wal_keep_segments > 0 && keepSegs < wal_keep_segments)
> + keepSegs = wal_keep_segments;
>
> You meant wal_keep_segments in the above comment rather than
> wal_segment_size? Also, the above comment need a whitespace just after
> "any".

Ouch! You're right. Fixed.

> ----
> +bool
> +IsLsnStillAvaiable(XLogRecPtr restartLSN, XLogRecPtr *minKeepLSN)
> +{
>
> I think restartLSN is a word used for replication slots. Since this
> function is defined in xlog.c it would be better to change the
> argument name to more generic name, for example recptr.

Agreed. I used "target" instead.

> ----
> + /*
> + * Calcualte keep segments by slots first. The second term of the
> + * condition is just a sanity check.
> + */
>
> s/calcualte/calculate/

Fixed.

> ----
> + /* get minimum segment ignorig timeline ID */
>
> s/ignorig/ignoring/

Fixed.

# One of my fingers is literally fatter than usual, with a bandaid on it...

> ----
> min_keep_lsn in pg_replication_slots currently shows the same value in
> every slots but I felt that the value seems not easy to understand
> intuitively for users because users will have to confirm that value
> and to compare the current LSN in order to check if replication slots
> will become the "lost" status. So how about showing values that
> indicate how far away from the point where we become "lost" for
> individual slots?

Yeah, that is what I did in the first cut of this patch, from the
same line of thought: pg_replication_slots had the two additional
columns "live" and "distance".

/message-id/20171031.184310.182012625.horiguchi.kyotaro@lab.ntt.co.jp

The current design was changed following a comment.

/message-id/20171108.131431.170534842.horiguchi.kyotaro%40lab.ntt.co.jp

> > I don't think 'distance' is a good metric - that's going to continually
> > change. Why not store the LSN that's available and provide a function
> > that computes this? Or just rely on the lsn - lsn operator?
>
> It seems reasonable.,The 'secured minimum LSN' is common among
> all slots so showing it in the view may look a bit stupid but I
> don't find another suitable place for it. distance = 0 meant the
> state that the slot is living but insecured in the previous patch
> and that information is lost by changing 'distance' to
> 'min_secure_lsn'.

As I reconsidered this, I noticed that "lsn - lsn" doesn't make
sense here. The correct formula for the value is
"max_slot_wal_keep_size * 1024 * 1024 - ((oldest LSN to keep) -
restart_lsn)". It is not a simple formula to write by hand, yet it
doesn't seem general enough to provide, so I changed my mind again
and show the "distance" there.

pg_replication_slots now has the column "remain" instead of
"min_keep_lsn", which shows an LSN when wal_status is "streaming"
and "0/0" otherwise. In a special case, "remain" can be "0/0"
while "wal_status" is "streaming"; that is the reason for the
tristate return value of IsLsnStillAvaiable().

wal_status | remain
streaming  | 0/19E3C0   -- WAL is reserved
streaming  | 0/0        -- Still reserved but on the boundary
keeping    | 0/0        -- About to be lost.
lost       | 0/0        -- Lost.
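
As a sketch, that tristate could map onto the two columns like this (illustrative names only; the 0/1/2 return codes follow the function's description given later in this thread):

#include <stdint.h>

/*
 * Sketch of mapping the tristate onto the view columns.  Return codes
 * follow the description: 0 = required WAL already removed,
 * 1 = available, 2 = available but about to be removed.
 */
static const char *
wal_status_text(int avail, uint64_t *remain)
{
    switch (avail)
    {
        case 1:
            /* keep the computed distance; it may legitimately be 0
             * right on the boundary, as in the second row above */
            return "streaming";
        case 2:
            *remain = 0;
            return "keeping";   /* about to be lost */
        default:
            *remain = 0;
            return "lost";
    }
}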

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v5-0001-Add-WAL-releaf-vent-for-replication-slots.patch text/x-patch 6.5 KB
v5-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch text/x-patch 11.9 KB
v5-0003-TAP-test-for-the-slot-limit-feature.patch text/x-patch 5.3 KB
v5-0004-Documentation-for-slot-limit-feature.patch text/x-patch 5.2 KB

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, sk(at)zsrv(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-17 04:37:59
Message-ID: CAD21AoCAdDfXNwVhoAKhBtpmrY-0tfQoQh2NiTX_Ji15msNPew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 13, 2018 at 5:40 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello.
>
> At Wed, 11 Jul 2018 15:09:23 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoCFtW6+SN_eVTszDAjQeeU2sSea2VpCEx08ejNafk8H9w(at)mail(dot)gmail(dot)com>
>> On Mon, Jul 9, 2018 at 2:47 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> ..
>> Here is review comments of v4 patches.
>>
>> + if (minKeepLSN)
>> + {
>> + XLogRecPtr slotPtr = XLogGetReplicationSlotMinimumLSN();
>> + Assert(!XLogRecPtrIsInvalid(slotPtr));
>> +
>> + tailSeg = GetOldestKeepSegment(currpos, slotPtr);
>> +
>> + XLogSegNoOffsetToRecPtr(tailSeg, 0, *minKeepLSN,
>> wal_segment_size);
>> + }
>>
>> The usage of XLogSegNoOffsetToRecPtr is wrong. Since we specify the
>> destination at 4th argument the wal_segment_size will be changed in
>> the above expression. The regression tests by PostgreSQL Patch Tester
>
> I'm not sure I get this correctly, the definition of the macro is
> as follows.
>
> | #define XLogSegNoOffsetToRecPtr(segno, offset, dest, wal_segsz_bytes) \
> | (dest) = (segno) * (wal_segsz_bytes) + (offset)
>
> The destination is the *3rd* parameter and the forth is segment
> size which is not to be written.

Please see commit a22445ff0b, which flipped the input and output
arguments. So maybe you need to rebase the patches onto the current HEAD.
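
For reference, the flip looks roughly like this; the "before" form is the definition quoted above, while the "after" form is an assumption based on the commit's description:

/* Before commit a22445ff0b (as quoted above): dest is the 3rd argument. */
#define XLogSegNoOffsetToRecPtr_before(segno, offset, dest, wal_segsz_bytes) \
    (dest) = (segno) * (wal_segsz_bytes) + (offset)

/* After the commit (assumed form): dest moves to the last position, so a
 * caller still passing wal_segment_size as the 4th argument would clobber
 * its segment-size variable instead of filling the intended target. */
#define XLogSegNoOffsetToRecPtr_after(segno, offset, wal_segsz_bytes, dest) \
    (dest) = (segno) * (wal_segsz_bytes) + (offset)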

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-17 09:58:48
Message-ID: 20180717.185848.00340514.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello.

At Tue, 17 Jul 2018 13:37:59 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoCAdDfXNwVhoAKhBtpmrY-0tfQoQh2NiTX_Ji15msNPew(at)mail(dot)gmail(dot)com>
> >> The usage of XLogSegNoOffsetToRecPtr is wrong. Since we specify the
> >> destination at 4th argument the wal_segment_size will be changed in
> >> the above expression. The regression tests by PostgreSQL Patch Tester
> >
> > I'm not sure I get this correctly, the definition of the macro is
> > as follows.
> >
> > | #define XLogSegNoOffsetToRecPtr(segno, offset, dest, wal_segsz_bytes) \
> > | (dest) = (segno) * (wal_segsz_bytes) + (offset)
> >
> > The destination is the *3rd* parameter and the forth is segment
> > size which is not to be written.
>
> Please see commit a22445ff0b which flipped input and output arguments.
> So maybe you need to rebase the patches to current HEAD.

Mmm. Thanks. I never imagined such a change had happened, but the
offending usage was accidentally removed in the latest patch.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, sk(at)zsrv(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-20 01:13:58
Message-ID: CAD21AoDayePWwu4t=VPP5P1QFDSBvks1d8j76bXp5rbXoPbZcA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 13, 2018 at 5:40 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello.
>
> At Wed, 11 Jul 2018 15:09:23 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoCFtW6+SN_eVTszDAjQeeU2sSea2VpCEx08ejNafk8H9w(at)mail(dot)gmail(dot)com>
>> On Mon, Jul 9, 2018 at 2:47 PM, Kyotaro HORIGUCHI
>> ----
>> min_keep_lsn in pg_replication_slots currently shows the same value in
>> every slots but I felt that the value seems not easy to understand
>> intuitively for users because users will have to confirm that value
>> and to compare the current LSN in order to check if replication slots
>> will become the "lost" status. So how about showing values that
>> indicate how far away from the point where we become "lost" for
>> individual slots?
>
> Yeah, that is what I did in the first cut of this patch from the
> same thought. pg_replication_slots have two additional columns
> "live" and "distance".
>
> /message-id/20171031.184310.182012625.horiguchi.kyotaro@lab.ntt.co.jp
>
> Thre current design is changed following a comment.
>
> /message-id/20171108.131431.170534842.horiguchi.kyotaro%40lab.ntt.co.jp
>
>> > I don't think 'distance' is a good metric - that's going to continually
>> > change. Why not store the LSN that's available and provide a function
>> > that computes this? Or just rely on the lsn - lsn operator?
>>
>> It seems reasonable.,The 'secured minimum LSN' is common among
>> all slots so showing it in the view may look a bit stupid but I
>> don't find another suitable place for it. distance = 0 meant the
>> state that the slot is living but insecured in the previous patch
>> and that information is lost by changing 'distance' to
>> 'min_secure_lsn'.
>
> As I reconsidered this, I noticed that "lsn - lsn" doesn't make
> sense here. The correct formula for the value is
> "max_slot_wal_keep_size * 1024 * 1024 - ((oldest LSN to keep) -
> restart_lsn). It is not a simple formula to write by hand but
> doesn't seem general enough. I re-changed my mind to show the
> "distance" there again.
>
> pg_replication_slots now has the column "remain" instaed of
> "min_keep_lsn", which shows an LSN when wal_status is "streaming"
> and otherwise "0/0". In a special case, "remain" can be "0/0"
> while "wal_status" is "streaming". It is the reason for the
> tristate return value of IsLsnStillAvaialbe().
>
> wal_status | remain
> streaming | 0/19E3C0 -- WAL is reserved
> streaming | 0/0 -- Still reserved but on the boundary
> keeping | 0/0 -- About to be lost.
> lost | 0/0 -- Lost.
>

The "remain" column still shows the same value in all rows, as
follows, because you always compare the current LSN with the
minimum LSN across replication slots. Is that what you expected? My
comment was about showing the distance from each individual slot's
restart_lsn to the critical point where it will lose WAL. That way,
we can easily find out which slot is about to be lost.

postgres(1:126712)=# select pg_current_wal_lsn(), slot_name,
restart_lsn, remain from pg_replication_slots ;
pg_current_wal_lsn | slot_name | restart_lsn | remain
--------------------+-----------+-------------+------------
0/4000108 | 5 | 0/1645CA0 | 0/3DFFFEF8
0/4000108 | 4 | 0/40000D0 | 0/3DFFFEF8
(2 rows)

Also, I'm not sure showing the distance as an LSN is a good way. An
LSN is a monotonically increasing value, but in your patch the
value of the "remain" column can decrease. As an alternative, I'd
suggest showing it as a number of segments. Attached is a patch on
top of your v5 patch that changes the column to show how many WAL
segments remain for each individual slot until it loses WAL.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
report_remaining_xlogsegs.patch application/octet-stream 9.4 KB

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, sk(at)zsrv(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-20 04:25:20
Message-ID: 20180720042520.GD7023@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 20, 2018 at 10:13:58AM +0900, Masahiko Sawada wrote:
> Also, I'm not sure it's a good way to show the distance as LSN. LSN is
> a monotone increasing value but in your patch, a value of the "remain"
> column can get decreased.

If that can happen, I think that this is a very, very bad idea. A
couple of code paths, including segment recycling and the new WAL
advancing, rely on such monotonic properties. That would also be
very confusing for any monitoring job looking at pg_replication_slots.
--
Michael


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-23 07:16:18
Message-ID: 20180723.161618.46636100.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello.

At Fri, 20 Jul 2018 10:13:58 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDayePWwu4t=VPP5P1QFDSBvks1d8j76bXp5rbXoPbZcA(at)mail(dot)gmail(dot)com>
> > As I reconsidered this, I noticed that "lsn - lsn" doesn't make
> > sense here. The correct formula for the value is
> > "max_slot_wal_keep_size * 1024 * 1024 - ((oldest LSN to keep) -
> > restart_lsn). It is not a simple formula to write by hand but
> > doesn't seem general enough. I re-changed my mind to show the
> > "distance" there again.
> >
> > pg_replication_slots now has the column "remain" instaed of
> > "min_keep_lsn", which shows an LSN when wal_status is "streaming"
> > and otherwise "0/0". In a special case, "remain" can be "0/0"
> > while "wal_status" is "streaming". It is the reason for the
> > tristate return value of IsLsnStillAvaialbe().
> >
> > wal_status | remain
> > streaming | 0/19E3C0 -- WAL is reserved
> > streaming | 0/0 -- Still reserved but on the boundary
> > keeping | 0/0 -- About to be lost.
> > lost | 0/0 -- Lost.
> >
>
> The "remain" column still shows same value at all rows as follows
> because you always compare between the current LSN and the minimum LSN
> of replication slots. Is that you expected? My comment was to show the

Ouch! Sorry for the silly mistake. GetOldestKeepSegment should
calculate restBytes based on the distance from the cutoff LSN to
restart_lsn, not to minSlotLSN. The attached fixed v6 correctly
shows the distance individually.

> Also, I'm not sure it's a good way to show the distance as LSN. LSN is
> a monotone increasing value but in your patch, a value of the "remain"
> column can get decreased. As an alternative way I'd suggest to show it

A WAL LSN won't decrease, but an LSN is just a position in the WAL
stream. Since the representation of an LSN is composed of the two
components 'file number' and 'offset', it's quite natural to show a
difference in the same unit: the distance between the points "6m"
and "10m" is "4m".

> as the number of segments. Attached patch is a patch for your v5 patch
> that changes it so that the column shows how many WAL segments of
> individual slots are remained until they get lost WAL.

The segment size varies by configuration, so a segment count is not
an intuitive way to show a distance. I think that is the most
significant reason we moved from "segments" to "bytes" for WAL
sizings like max_wal_size. More than anything, it's too coarse: the
required segments may last for the time it takes to consume a whole
segment, or may be removed just after. We could calculate the
fragment bytes, but that requires some internal knowledge.

Instead, I made the field show flat "bytes" using bigint, which can
be displayed nicely with pg_size_pretty:

=# select pg_current_wal_lsn(), restart_lsn, wal_status, pg_size_pretty(remain) as remain from pg_replication_slots ;
pg_current_wal_lsn | restart_lsn | wal_status | remain
--------------------+-------------+------------+--------
0/DD3B188 | 0/CADD618 | streaming | 19 MB
0/DD3B188 | 0/DD3B188 | streaming | 35 MB

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v6-0001-Add-WAL-releaf-vent-for-replication-slots.patch text/x-patch 6.5 KB
v6-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch text/x-patch 12.2 KB
v6-0003-TAP-test-for-the-slot-limit-feature.patch text/x-patch 5.3 KB
v6-0004-Documentation-for-slot-limit-feature.patch text/x-patch 5.2 KB

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, sk(at)zsrv(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-24 07:47:41
Message-ID: CAD21AoD0rChq7wQE=_o95quopcQGjcVG9omwdH07nT5cm81hzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 23, 2018 at 4:16 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello.
>
> At Fri, 20 Jul 2018 10:13:58 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDayePWwu4t=VPP5P1QFDSBvks1d8j76bXp5rbXoPbZcA(at)mail(dot)gmail(dot)com>
>> > As I reconsidered this, I noticed that "lsn - lsn" doesn't make
>> > sense here. The correct formula for the value is
>> > "max_slot_wal_keep_size * 1024 * 1024 - ((oldest LSN to keep) -
>> > restart_lsn). It is not a simple formula to write by hand but
>> > doesn't seem general enough. I re-changed my mind to show the
>> > "distance" there again.
>> >
>> > pg_replication_slots now has the column "remain" instaed of
>> > "min_keep_lsn", which shows an LSN when wal_status is "streaming"
>> > and otherwise "0/0". In a special case, "remain" can be "0/0"
>> > while "wal_status" is "streaming". It is the reason for the
>> > tristate return value of IsLsnStillAvaialbe().
>> >
>> > wal_status | remain
>> > streaming | 0/19E3C0 -- WAL is reserved
>> > streaming | 0/0 -- Still reserved but on the boundary
>> > keeping | 0/0 -- About to be lost.
>> > lost | 0/0 -- Lost.
>> >
>>
>> The "remain" column still shows same value at all rows as follows
>> because you always compare between the current LSN and the minimum LSN
>> of replication slots. Is that you expected? My comment was to show the
>
> Ouch! Sorry for the silly mistake. GetOldestKeepSegment should
> calculate restBytes based on the distance from the cutoff LSN to
> restart_lsn, not to minSlotLSN. The attached fixed v6 correctly
> shows the distance individually.
>
>> Also, I'm not sure it's a good way to show the distance as LSN. LSN is
>> a monotone increasing value but in your patch, a value of the "remain"
>> column can get decreased. As an alternative way I'd suggest to show it
>
> The LSN of WAL won't be decreased but an LSN is just a position
> in a WAL stream. Since the representation of LSN is composed of
> the two components 'file number' and 'offset', it's quite natural
> to show the difference in the same unit. The distance between the
> points at "6m" and "10m" is "4m".
>
>> as the number of segments. Attached is a patch on top of your v5
>> patch that changes it so that the column shows how many WAL segments
>> each slot has left until it loses required WAL.
>
> Segment size varies by configuration, so a segment count is not an
> intuitive way to show a distance. I think that is the most significant
> reason we moved from "segments" to "bytes" for WAL sizing settings
> like max_wal_size. More than anything, it's too coarse. The required
> segment may last for the time it takes to consume a whole segment, or
> may be removed just after. We could calculate the fragment bytes, but
> it requires some internal knowledge.
>
> Instead, I made the field show flat "bytes" using bigint, which can
> be displayed nicely using pg_size_pretty;

Thank you for updating. I agree with showing the remain in bytes.

Here are review comments for the v6 patch.

@@ -967,9 +969,9 @@ postgres=# SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
  node_a_slot |

postgres=# SELECT * FROM pg_replication_slots;
-  slot_name  | slot_type | datoid | database | active | xmin | restart_lsn | confirmed_flush_lsn
--------------+-----------+--------+----------+--------+------+-------------+---------------------
- node_a_slot | physical  |        |          | f      |      |             |
+  slot_name  | slot_type | datoid | database | active | xmin | restart_lsn | confirmed_flush_lsn | wal_status | min_keep_lsn
+-------------+-----------+--------+----------+--------+------+-------------+---------------------+------------+--------------
+ node_a_slot | physical  |        |          | f      |      |             |                     | unknown    | 0/1000000

This hunk should be updated.

-----
+/*
+ * Returns minimum segment number the next checktpoint must leave considering
+ * wal_keep_segments, replication slots and max_slot_wal_keep_size.
+ *
+ * If resetBytes is not NULL, returns remaining LSN bytes to advance until any
+ * slot loses reserving a WAL record.
+ */
+static XLogSegNo
+GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN,
+					 XLogRecPtr restartLSN, uint64 *restBytes)
+{

You're assuming that minSlotLSN is the minimum LSN of the replication
slots, but that's not mentioned anywhere. Since you check minSlotSeg <=
currSeg but don't enforce it, if a caller sets a wrong value for
minSlotLSN this function will return a wrong value with no complaint.
Similarly, there is no explanation of restartLSN, so you could add one.
I'm not sure the argument name restartLSN is suitable for a function in
xlog.c, but I'd defer that to the committers.

Since this function assumes that restartLSN and *restBytes are either
both valid or both invalid (and NULL), it's better to add assertions
for safety. The current code accepts even the case where only one
argument is valid.

-----
+		if (limitSegs > 0 && currSeg <= restartSeg + limitSegs)
+		{
+			/*
+			 * This slot still has all required segments. Calculate how many
+			 * LSN bytes the slot has until it loses restart_lsn.
+			 */
+			fragbytes = wal_segment_size - (currLSN % wal_segment_size);
+			*restBytes =
+				(restartSeg + limitSegs - currSeg) * wal_segment_size
+				+ fragbytes;
+		}
+	}

This code doesn't consider the case where wal_keep_segments >
max_slot_wal_keep_size. In that case I think we should use (currSeg -
wal_keep_segments) as the lower bound, in order to avoid showing
"streaming" in the wal_status although the remain is 0.

-----
+			*restBytes =
+				(restartSeg + limitSegs - currSeg) * wal_segment_size
+				+ fragbytes;

Maybe you can use XLogSegNoOffsetToRecPtr instead.
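
For reference, a sketch of what that suggestion amounts to (in the
PostgreSQL 11 sources, XLogSegNoOffsetToRecPtr in
src/include/access/xlog_internal.h combines a segment number and an
in-segment offset into a byte position; the standalone function below
is a simplified illustration, not the real macro):

#include <stdint.h>

typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;

static XLogRecPtr
seg_offset_to_recptr(XLogSegNo segno, uint32_t offset, uint32_t wal_segsz_bytes)
{
	/* dest = segno * wal_segsz_bytes + offset, matching the hunk above */
	return segno * (XLogRecPtr) wal_segsz_bytes + offset;
}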

-----
+ * 0 means that WAL record at targetLSN is alredy removed.
+ * 1 means that WAL record at tagetLSN is availble.
+ * 2 means that WAL record at tagetLSN is availble but about to be removed by

s/alredy/already/
s/tagetLSN/targetLSN/
s/availble/available/

-----
+ * If resetBytes is not NULL, returns remaining LSN bytes to advance until any
+ * slot loses reserving a WAL record.

s/resetBytes/restBytes/

-----
+ Specify the maximum size of WAL files
+ that <link linkend="streaming-replication-slots">replication
+ slots</link> are allowed to reatin in the <filename>pg_wal</filename>

s/reatin/retain/

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-31 09:11:21
Message-ID: 20180731.181121.32422923.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello.

At Tue, 24 Jul 2018 16:47:41 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoD0rChq7wQE=_o95quopcQGjcVG9omwdH07nT5cm81hzg(at)mail(dot)gmail(dot)com>
> On Mon, Jul 23, 2018 at 4:16 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > Hello.
> >
> > At Fri, 20 Jul 2018 10:13:58 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDayePWwu4t=VPP5P1QFDSBvks1d8j76bXp5rbXoPbZcA(at)mail(dot)gmail(dot)com>
..
> > Instead, I made the field show flat "bytes" using bigint, which can
> > be displayed nicely using pg_size_pretty;
>
> Thank you for updating. I agree with showing the remain in bytes.
>
> Here are review comments for the v6 patch.
>
> @@ -967,9 +969,9 @@ postgres=# SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
>   node_a_slot |
>
> postgres=# SELECT * FROM pg_replication_slots;
> -  slot_name  | slot_type | datoid | database | active | xmin | restart_lsn | confirmed_flush_lsn
> --------------+-----------+--------+----------+--------+------+-------------+---------------------
> - node_a_slot | physical  |        |          | f      |      |             |
> +  slot_name  | slot_type | datoid | database | active | xmin | restart_lsn | confirmed_flush_lsn | wal_status | min_keep_lsn
> +-------------+-----------+--------+----------+--------+------+-------------+---------------------+------------+--------------
> + node_a_slot | physical  |        |          | f      |      |             |                     | unknown    | 0/1000000
>
> This hunk should be updated.

Perhaps you need a fresh database cluster.

> -----
> +/*
> + * Returns minimum segment number the next checktpoint must leave considering
> + * wal_keep_segments, replication slots and max_slot_wal_keep_size.
> + *
> + * If resetBytes is not NULL, returns remaining LSN bytes to advance until any
> + * slot loses reserving a WAL record.
> + */
> +static XLogSegNo
> +GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN,
> +					 XLogRecPtr restartLSN, uint64 *restBytes)
> +{
>
> You're assuming that minSlotLSN is the minimum LSN of the replication
> slots, but that's not mentioned anywhere. Since you check minSlotSeg <=

I added description for parameters in the function comment.

> currSeg but don't enforce it, if a caller sets a wrong value for
> minSlotLSN this function will return a wrong value with no complaint. Similarly,

I don't think such a case can happen on a sane system. Even if that
happens, it behaves the same way as when minSlotLSN is invalid.
KeepLogSeg() also behaves the same way, and WAL recycling will be
performed as pg_replication_slots predicted. Nothing can improve the
behavior, and I think placing an assertion there would be overkill.

> there is no explanation of restartLSN, so you could add one. I'm
> not sure the argument name restartLSN is suitable for a function in
> xlog.c, but I'd defer that to the committers.

Done.

> Since this function assumes that restartLSN and *restBytes are either
> both valid or both invalid (and NULL), it's better to add assertions
> for safety. The current code accepts even the case where only one
> argument is valid.
> -----
> + if (limitSegs > 0 && currSeg <= restartSeg + limitSegs)
> + {

Even if the caller gives InvalidRecPtr as restartLSN, which is an
insane situation, the function just treats the value as zero and
returns the "correct" value for that restartLSN, which doesn't
harm anything.

> +			/*
> +			 * This slot still has all required segments. Calculate how many
> +			 * LSN bytes the slot has until it loses restart_lsn.
> +			 */
> +			fragbytes = wal_segment_size - (currLSN % wal_segment_size);
> +			*restBytes =
> +				(restartSeg + limitSegs - currSeg) * wal_segment_size
> +				+ fragbytes;
> +		}
> +	}
>
> This code doesn't consider the case where wal_keep_segments >
> max_slot_wal_keep_size. In that case I think we should use (currSeg -
> wal_keep_segments) as the lower bound, in order to avoid showing
> "streaming" in the wal_status although the remain is 0.

Thanks. It should use keepSegs instead of limitSegs. Fixed.

> -----
> +			*restBytes =
> +				(restartSeg + limitSegs - currSeg) * wal_segment_size
> +				+ fragbytes;
>
> Maybe you can use XLogSegNoOffsetToRecPtr instead.

Indeed. I'm not sure it is easier to read, though. (Maybe the
function should take wal_segment_size out of band, that is, not
passed as a parameter.)

> -----
> + * 0 means that WAL record at targetLSN is alredy removed.
> + * 1 means that WAL record at tagetLSN is availble.
> + * 2 means that WAL record at tagetLSN is availble but about to be removed by
>
> s/alredy/already/
> s/tagetLSN/targetLSN/
> s/availble/available/
> -----
> + * If resetBytes is not NULL, returns remaining LSN bytes to advance until any
> + * slot loses reserving a WAL record.
>
> s/resetBytes/restBytes/

Ugggh! Sorry, my fingers are extra fat. Fixed. I rechecked
through the whole patch and found one more typo.

> -----
> + Specify the maximum size of WAL files
> + that <link linkend="streaming-replication-slots">replication
> + slots</link> are allowed to reatin in the <filename>pg_wal</filename>
>
> s/reatin/retain/

Thank you. I also found other leftovers in catalogs.sgml and
high-availability.sgml.

# The latter file seems to need an amendment for v11.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v7-0004-Documentation-for-slot-limit-feature.patch text/x-patch 5.3 KB
v7-0003-TAP-test-for-the-slot-limit-feature.patch text/x-patch 5.3 KB
v7-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch text/x-patch 11.4 KB
v7-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 6.7 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-31 19:11:52
Message-ID: 20180731191152.GA2791@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 26, 2018 at 04:26:59PM +0900, Kyotaro HORIGUCHI wrote:
> Hello. This is the rebased version of the slot-limit feature.
>
> This patch limits the maximum number of WAL segments kept by
> replication slots. A replication slot is useful for avoiding desync
> with replicas after a temporary disconnection, but it is dangerous
> when some replicas are lost. The WAL space can be exhausted, and the
> server can PANIC in the worst case. This patch prevents that worst
> case, while keeping the benefit of replication slots, using a new
> GUC variable max_slot_wal_keep_size.

Have you considered just using a boolean to control if max_wal_size
honors WAL preserved by replication slots, rather than creating the new
GUC max_slot_wal_keep_size?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


From: Andres Freund <andres(at)anarazel(dot)de>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-31 19:14:03
Message-ID: 20180731191403.satjiy4i3ce3voqs@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2018-07-31 15:11:52 -0400, Bruce Momjian wrote:
> On Tue, Jun 26, 2018 at 04:26:59PM +0900, Kyotaro HORIGUCHI wrote:
> > Hello. This is the rebased version of the slot-limit feature.
> >
> > This patch limits the maximum number of WAL segments kept by
> > replication slots. A replication slot is useful for avoiding desync
> > with replicas after a temporary disconnection, but it is dangerous
> > when some replicas are lost. The WAL space can be exhausted, and the
> > server can PANIC in the worst case. This patch prevents that worst
> > case, while keeping the benefit of replication slots, using a new
> > GUC variable max_slot_wal_keep_size.
>
> Have you considered just using a boolean to control if max_wal_size
> honors WAL preserved by replication slots, rather than creating the new
> GUC max_slot_wal_keep_size?

That seems like a bad idea. max_wal_size influences checkpoint
scheduling - there's no good reason to conflate that with retention?

Greetings,

Andres Freund


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-31 19:21:27
Message-ID: 20180731192127.GF27724@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greetings,

* Andres Freund (andres(at)anarazel(dot)de) wrote:
> On 2018-07-31 15:11:52 -0400, Bruce Momjian wrote:
> > On Tue, Jun 26, 2018 at 04:26:59PM +0900, Kyotaro HORIGUCHI wrote:
> > > Hello. This is the rebased version of the slot-limit feature.
> > >
> > > This patch limits the maximum number of WAL segments kept by
> > > replication slots. A replication slot is useful for avoiding desync
> > > with replicas after a temporary disconnection, but it is dangerous
> > > when some replicas are lost. The WAL space can be exhausted, and the
> > > server can PANIC in the worst case. This patch prevents that worst
> > > case, while keeping the benefit of replication slots, using a new
> > > GUC variable max_slot_wal_keep_size.
> >
> > Have you considered just using a boolean to control if max_wal_size
> > honors WAL preserved by replication slots, rather than creating the new
> > GUC max_slot_wal_keep_size?
>
> That seems like a bad idea. max_wal_size influences checkpoint
> scheduling - there's no good reason to conflate that with retention?

I agree that we shouldn't conflate checkpointing and retention. What I
wonder about though is what value wal_keep_segments will have once this
new GUC exists...? I wonder if we could deprecate it... I wish we had
implemented replication slots from the start with wal_keep_segments
capping the max WAL retained, but that ship has sailed, and changing it
now would break existing configurations.

Thanks!

Stephen


From: Andres Freund <andres(at)anarazel(dot)de>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-31 19:24:13
Message-ID: 20180731192413.7lr4qbc4qbyoim5y@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2018-07-31 15:21:27 -0400, Stephen Frost wrote:
> Greetings,
>
> * Andres Freund (andres(at)anarazel(dot)de) wrote:
> > On 2018-07-31 15:11:52 -0400, Bruce Momjian wrote:
> > > On Tue, Jun 26, 2018 at 04:26:59PM +0900, Kyotaro HORIGUCHI wrote:
> > > > Hello. This is the rebased version of the slot-limit feature.
> > > >
> > > > This patch limits the maximum number of WAL segments kept by
> > > > replication slots. A replication slot is useful for avoiding desync
> > > > with replicas after a temporary disconnection, but it is dangerous
> > > > when some replicas are lost. The WAL space can be exhausted, and the
> > > > server can PANIC in the worst case. This patch prevents that worst
> > > > case, while keeping the benefit of replication slots, using a new
> > > > GUC variable max_slot_wal_keep_size.
> > >
> > > Have you considered just using a boolean to control if max_wal_size
> > > honors WAL preserved by replication slots, rather than creating the new
> > > GUC max_slot_wal_keep_size?
> >
> > That seems like a bad idea. max_wal_size influences checkpoint
> > scheduling - there's no good reason to conflate that with retention?
>
> I agree that we shouldn't conflate checkpointing and retention. What I
> wonder about though is what value will wal_keep_segments have once this
> new GUC exists..? I wonder if we could deprecate it...

Don't think that's a good idea. It's entirely conceivable to have a
wal_keep_segments much lower than max_slot_wal_keep_size. For some
throwaway things it can be annoying to have to use slots, and if you
remove wal_keep_segments there's no alternative.

Greetings,

Andres Freund


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: andres(at)anarazel(dot)de
Cc: sfrost(at)snowman(dot)net, bruce(at)momjian(dot)us, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-08-01 01:52:21
Message-ID: 20180801.105221.55858423.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Tue, 31 Jul 2018 12:24:13 -0700, Andres Freund <andres(at)anarazel(dot)de> wrote in <20180731192413(dot)7lr4qbc4qbyoim5y(at)alap3(dot)anarazel(dot)de>
> On 2018-07-31 15:21:27 -0400, Stephen Frost wrote:
> > Greetings,
> >
> > * Andres Freund (andres(at)anarazel(dot)de) wrote:
> > > On 2018-07-31 15:11:52 -0400, Bruce Momjian wrote:
> > > > On Tue, Jun 26, 2018 at 04:26:59PM +0900, Kyotaro HORIGUCHI wrote:
> > > > > Hello. This is the rebased version of the slot-limit feature.
> > > > >
> > > > > This patch limits the maximum number of WAL segments kept by
> > > > > replication slots. A replication slot is useful for avoiding desync
> > > > > with replicas after a temporary disconnection, but it is dangerous
> > > > > when some replicas are lost. The WAL space can be exhausted, and the
> > > > > server can PANIC in the worst case. This patch prevents that worst
> > > > > case, while keeping the benefit of replication slots, using a new
> > > > > GUC variable max_slot_wal_keep_size.
> > > >
> > > > Have you considered just using a boolean to control if max_wal_size
> > > > honors WAL preserved by replication slots, rather than creating the new
> > > > GUC max_slot_wal_keep_size?
> > >
> > > That seems like a bad idea. max_wal_size influences checkpoint
> > > scheduling - there's no good reason to conflate that with retention?
> >
> > I agree that we shouldn't conflate checkpointing and retention. What I
> > wonder about though is what value will wal_keep_segments have once this
> > new GUC exists..? I wonder if we could deprecate it...
>
> Don't think that's a good idea. It's entirely conceivable to have a
> wal_keep_segments much lower than max_slot_wal_keep_size. For some
> throwaway things it can be annoying to have to slots, and if you remove
> wal_keep_segments there's no alternative.

I thought it was to be deprecated for some reason, so I'm leaving
wal_keep_segments in '# of segments' even though the new GUC is
in MB. I'm a bit uneasy that the two similar settings are in
different units. If we are keeping wal_keep_segments, couldn't we
take this opportunity to turn it into MB, changing its name to
min_wal_keep_size? max_slot_wal_keep_size could then be changed to
just max_wal_keep_size along with it.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-08-02 13:05:33
Message-ID: CA+TgmoYVrKY0W0jigJymFZo0ewkQoWGfLLpiTSgJLQN3tcHGTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 31, 2018 at 9:52 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> I thought it was to be deprecated for some reason, so I'm leaving
> wal_keep_segments in '# of segments' even though the new GUC is
> in MB. I'm a bit uneasy that the two similar settings are in
> different units. If we are keeping wal_keep_segments, couldn't we
> take this opportunity to turn it into MB, changing its name to
> min_wal_keep_size? max_slot_wal_keep_size could then be changed to
> just max_wal_keep_size along with it.

This seems like it's a little bit of a separate topic from what this
thread is about, but FWIW, +1 for standardizing on MB.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: robertmhaas(at)gmail(dot)com
Cc: andres(at)anarazel(dot)de, sfrost(at)snowman(dot)net, bruce(at)momjian(dot)us, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-08-03 04:59:51
Message-ID: 20180803.135951.149443155.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Thu, 2 Aug 2018 09:05:33 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+TgmoYVrKY0W0jigJymFZo0ewkQoWGfLLpiTSgJLQN3tcHGTg(at)mail(dot)gmail(dot)com>
> On Tue, Jul 31, 2018 at 9:52 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > I thought it was to be deprecated for some reason, so I'm leaving
> > wal_keep_segments in '# of segments' even though the new GUC is
> > in MB. I'm a bit uneasy that the two similar settings are in
> > different units. If we are keeping wal_keep_segments, couldn't we
> > take this opportunity to turn it into MB, changing its name to
> > min_wal_keep_size? max_slot_wal_keep_size could then be changed to
> > just max_wal_keep_size along with it.
>
> This seems like it's a little bit of a separate topic from what this
> thread is about, but FWIW, +1 for standardizing on MB.

Thanks. OK, I'll raise this separately later.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-09-04 00:42:20
Message-ID: CAD21AoBD6h_dth1prZsEGSYsc=Y8M6BL5+TOx+BUSqCJ0zH=CQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you for updating the patch.

On Tue, Jul 31, 2018 at 6:11 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello.
>
> At Tue, 24 Jul 2018 16:47:41 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoD0rChq7wQE=_o95quopcQGjcVG9omwdH07nT5cm81hzg(at)mail(dot)gmail(dot)com>
>> On Mon, Jul 23, 2018 at 4:16 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > Hello.
>> >
>> > At Fri, 20 Jul 2018 10:13:58 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDayePWwu4t=VPP5P1QFDSBvks1d8j76bXp5rbXoPbZcA(at)mail(dot)gmail(dot)com>
> ..
>> > Instead, I made the field show flat "bytes" using bigint, which can
>> > be displayed nicely using pg_size_pretty;
>>
>> Thank you for updating. I agree with showing the remain in bytes.
>>
>> Here are review comments for the v6 patch.
>>
>> @@ -967,9 +969,9 @@ postgres=# SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
>>   node_a_slot |
>>
>> postgres=# SELECT * FROM pg_replication_slots;
>> -  slot_name  | slot_type | datoid | database | active | xmin | restart_lsn | confirmed_flush_lsn
>> --------------+-----------+--------+----------+--------+------+-------------+---------------------
>> - node_a_slot | physical  |        |          | f      |      |             |
>> +  slot_name  | slot_type | datoid | database | active | xmin | restart_lsn | confirmed_flush_lsn | wal_status | min_keep_lsn
>> +-------------+-----------+--------+----------+--------+------+-------------+---------------------+------------+--------------
>> + node_a_slot | physical  |        |          | f      |      |             |                     | unknown    | 0/1000000
>>
>> This hunk should be updated.
>
> Perhaps you need a fresh database cluster.

I meant this was a doc update in the 0004 patch, but it's fixed in the v7 patch.

While testing the v7 patch, I got the following result with
max_slot_wal_keep_size = 5GB and without setting wal_keep_segments.

=# select pg_current_wal_lsn(), slot_name, restart_lsn, confirmed_flush_lsn,
   wal_status, remain, pg_size_pretty(remain) from pg_replication_slots;
 pg_current_wal_lsn | slot_name | restart_lsn | confirmed_flush_lsn | wal_status |  remain  | pg_size_pretty
--------------------+-----------+-------------+---------------------+------------+----------+----------------
 2/A30000D8         | l1        | 1/AC000910  | 1/AC000948          | streaming  | 16777000 | 16 MB
(1 row)

The actual distance between the slot limit and the slot 'l1' is about
1GB (5GB - (2/A30000D8 - 1/AC000910)), but the system view says the
remain is only 16MB. For the calculation of restBytes in
GetOldestKeepSegment(), the current patch seems to calculate the
distance between the minSlotLSN and restartLSN when (currLSN -
max_slot_wal_keep_size) < minSlotLSN. However, I think that the actual
remaining bytes until the slot loses the required WAL is (restartLSN -
(currLSN - max_slot_wal_keep_size)) in that case.

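For illustration, a self-contained sketch of the calculation suggested
above (flat 64-bit byte positions, segment rounding ignored; a
hypothetical helper, not the patch's code):

#include <stdint.h>

typedef uint64_t XLogRecPtr;

/*
 * Once the cutoff (currLSN - max_slot_wal_keep_size) passes the
 * minimum slot LSN, the bytes a slot has left before losing WAL are
 * restart_lsn minus that cutoff.
 */
static uint64_t
remain_bytes(XLogRecPtr currLSN, XLogRecPtr restartLSN, uint64_t maxKeepBytes)
{
	XLogRecPtr	cutoff = (currLSN > maxKeepBytes) ? currLSN - maxKeepBytes : 0;

	if (restartLSN <= cutoff)
		return 0;				/* required WAL already at risk */

	/* e.g. 1/AC000910 - (2/A30000D8 - 5GB) is about 1.1GB, not 16MB */
	return restartLSN - cutoff;
}
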
Also, the 0004 patch needs to be rebased on the current HEAD.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-09-04 10:52:50
Message-ID: 20180904.195250.144186960.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Mon, 3 Sep 2018 18:14:22 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoBgCMc9bp2cADMFm40qoEXxbomdu1dtj5EaFSAS4BtAyw(at)mail(dot)gmail(dot)com>
> Thank you for updating the patch!
>
> On Tue, Jul 31, 2018 at 6:11 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > Hello.
> >
> > At Tue, 24 Jul 2018 16:47:41 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoD0rChq7wQE=_o95quopcQGjcVG9omwdH07nT5cm81hzg(at)mail(dot)gmail(dot)com>
> >> On Mon, Jul 23, 2018 at 4:16 PM, Kyotaro HORIGUCHI
> >> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >> > Hello.
> >> >
> >> > At Fri, 20 Jul 2018 10:13:58 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDayePWwu4t=VPP5P1QFDSBvks1d8j76bXp5rbXoPbZcA(at)mail(dot)gmail(dot)com>
> >> This hunk should be updated.
> >
> > Perhaps you need a fresh database cluster.
>
> I meant this was a doc update in the 0004 patch, but it's fixed in the v7 patch.

Wow..

> While testing the v7 patch, I got the following result with
> max_slot_wal_keep_size = 5GB and without setting wal_keep_segments.
>
> =# select pg_current_wal_lsn(), slot_name, restart_lsn, confirmed_flush_lsn,
>    wal_status, remain, pg_size_pretty(remain) from pg_replication_slots;
>  pg_current_wal_lsn | slot_name | restart_lsn | confirmed_flush_lsn | wal_status |  remain  | pg_size_pretty
> --------------------+-----------+-------------+---------------------+------------+----------+----------------
>  2/A30000D8         | l1        | 1/AC000910  | 1/AC000948          | streaming  | 16777000 | 16 MB
> (1 row)
>
> The actual distance between the slot limit and the slot 'l1' is about
> 1GB (5GB - (2/A30000D8 - 1/AC000910)), but the system view says the
> remain is only 16MB. For the calculation of restBytes in
> GetOldestKeepSegment(), the current patch seems to calculate the
> distance between the minSlotLSN and restartLSN when (currLSN -
> max_slot_wal_keep_size) < minSlotLSN. However, I think that the actual
> remaining bytes until the slot loses the required WAL is (restartLSN -
> (currLSN - max_slot_wal_keep_size)) in that case.

Oops! That's a silly thinko, or rather a typo. It was wrong to use
keepSegs instead of limitSegs in the calculation of restBytes. Just
using limitSegs makes it sane. It's a pity that I had removed the
remain column from the regression test.

Fixed that, and added an item for the remain calculation to the TAP
test. I expect that pg_size_pretty() adds some robustness to the
test.

> Also, 0004 patch needs to be rebased on the current HEAD.

Done. Please find the v8 attached.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v8-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 6.7 KB
v8-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch text/x-patch 11.8 KB
v8-0003-TAP-test-for-the-slot-limit-feature.patch text/x-patch 6.1 KB
v8-0004-Documentation-for-slot-limit-feature.patch text/x-patch 4.2 KB

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-09-05 05:31:10
Message-ID: CAD21AoB-HJvL+uKsv40Gb8Dymh9uBBQUXTucqv4MDtH_AGKh4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 4, 2018 at 7:52 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> At Mon, 3 Sep 2018 18:14:22 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoBgCMc9bp2cADMFm40qoEXxbomdu1dtj5EaFSAS4BtAyw(at)mail(dot)gmail(dot)com>
>> Thank you for updating the patch!
>>
>> On Tue, Jul 31, 2018 at 6:11 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > Hello.
>> >
>> > At Tue, 24 Jul 2018 16:47:41 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoD0rChq7wQE=_o95quopcQGjcVG9omwdH07nT5cm81hzg(at)mail(dot)gmail(dot)com>
>> >> On Mon, Jul 23, 2018 at 4:16 PM, Kyotaro HORIGUCHI
>> >> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> >> > Hello.
>> >> >
>> >> > At Fri, 20 Jul 2018 10:13:58 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDayePWwu4t=VPP5P1QFDSBvks1d8j76bXp5rbXoPbZcA(at)mail(dot)gmail(dot)com>
>> >> This hunk should be updated.
>> >
>> > Perhaps you need a fresh database cluster.
>>
>> I meant this was a doc update in the 0004 patch, but it's fixed in the v7 patch.
>
> Wow..
>
>> While testing the v7 patch, I got the following result with
>> max_slot_wal_keep_size = 5GB and without setting wal_keep_segments.
>>
>> =# select pg_current_wal_lsn(), slot_name, restart_lsn, confirmed_flush_lsn,
>>    wal_status, remain, pg_size_pretty(remain) from pg_replication_slots;
>>  pg_current_wal_lsn | slot_name | restart_lsn | confirmed_flush_lsn | wal_status |  remain  | pg_size_pretty
>> --------------------+-----------+-------------+---------------------+------------+----------+----------------
>>  2/A30000D8         | l1        | 1/AC000910  | 1/AC000948          | streaming  | 16777000 | 16 MB
>> (1 row)
>>
>> The actual distance between the slot limit and the slot 'l1' is about
>> 1GB (5GB - (2/A30000D8 - 1/AC000910)), but the system view says the
>> remain is only 16MB. For the calculation of restBytes in
>> GetOldestKeepSegment(), the current patch seems to calculate the
>> distance between the minSlotLSN and restartLSN when (currLSN -
>> max_slot_wal_keep_size) < minSlotLSN. However, I think that the actual
>> remaining bytes until the slot loses the required WAL is (restartLSN -
>> (currLSN - max_slot_wal_keep_size)) in that case.
>
> Oops! That's a silly thinko, or rather a typo. It was wrong to use
> keepSegs instead of limitSegs in the calculation of restBytes. Just
> using limitSegs makes it sane. It's a pity that I had removed the
> remain column from the regression test.
>
> Fixed that, and added an item for the remain calculation to the TAP
> test. I expect that pg_size_pretty() adds some robustness to the
> test.
>
>> Also, 0004 patch needs to be rebased on the current HEAD.
>
> Done. Please find the v8 attached.
>

Thank you for updating! Here are the review comments for the v8 patch.

+			/*
+			 * This slot still has all required segments. Calculate how many
+			 * LSN bytes the slot has until it loses restart_lsn.
+			 */
+			fragbytes = wal_segment_size - (currLSN % wal_segment_size);
+			XLogSegNoOffsetToRecPtr(restartSeg + limitSegs - currSeg, fragbytes,
+									wal_segment_size, *restBytes);

For the calculation of fragbytes, I think we should calculate the
fragment bytes of restartLSN instead. The formula "restartSeg +
limitSegs - currSeg" means the # of segments between restartLSN and
the limit imposed by the new parameter. I don't think that the
summation of it and the fragment bytes of currLSN is correct. As the
following result (max_slot_wal_keep_size is 128MB) shows, the remain
column shows the actual remain + 16MB (the get_bytes function returns
the value of max_slot_wal_keep_size in bytes).

postgres(1:29447)=# select pg_current_wal_lsn(), slot_name, restart_lsn,
wal_status, remain, pg_size_pretty(remain),
pg_size_pretty(get_bytes('max_slot_wal_keep_size') -
(pg_current_wal_lsn() - restart_lsn)) from pg_replication_slots ;
 pg_current_wal_lsn | slot_name | restart_lsn | wal_status |  remain   | pg_size_pretty | pg_size_pretty
--------------------+-----------+-------------+------------+-----------+----------------+----------------
 0/1D0001F0         | l1        | 0/1D0001B8  | streaming  | 150994448 | 144 MB         | 128 MB
(1 row)

---
If wal_keep_segments is greater than max_slot_wal_keep_size,
wal_keep_segments doesn't affect the value of the remain column.

postgres(1:48422)=# show max_slot_wal_keep_size ;
max_slot_wal_keep_size
------------------------
128MB
(1 row)

postgres(1:48422)=# show wal_keep_segments ;
wal_keep_segments
-------------------
5000
(1 row)

postgres(1:48422)=# select slot_name, wal_status, remain,
pg_size_pretty(remain) as remain from pg_replication_slots ;
slot_name | wal_status | remain | remain
-----------+------------+-----------+--------
l1 | streaming | 150994728 | 144 MB
(1 row)

*** After consumed over 128MB WAL ***

postgres(1:48422)=# select slot_name, wal_status, remain,
pg_size_pretty(remain) as remain from pg_replication_slots ;
slot_name | wal_status | remain | remain
-----------+------------+--------+---------
l1 | streaming | 0 | 0 bytes
(1 row)

---
For the cosmetic stuff, there is code that needs line breaks.

static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
+static XLogSegNo GetOldestKeepSegment(XLogRecPtr currpos, XLogRecPtr
minSlotPtr, XLogRecPtr restartLSN, uint64 *restBytes);
static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);

and

+static XLogSegNo
+GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN,
XLogRecPtr restartLSN, uint64 *restBytes)
+{

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-09-06 07:10:03
Message-ID: 20180906.161003.134412044.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you for the comment.

At Wed, 5 Sep 2018 14:31:10 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoB-HJvL+uKsv40Gb8Dymh9uBBQUXTucqv4MDtH_AGKh4g(at)mail(dot)gmail(dot)com>
> On Tue, Sep 4, 2018 at 7:52 PM, Kyotaro HORIGUCHI
> Thank you for updating! Here is the review comment for v8 patch.
>
> +			/*
> +			 * This slot still has all required segments. Calculate how many
> +			 * LSN bytes the slot has until it loses restart_lsn.
> +			 */
> +			fragbytes = wal_segment_size - (currLSN % wal_segment_size);
> +			XLogSegNoOffsetToRecPtr(restartSeg + limitSegs - currSeg, fragbytes,
> +									wal_segment_size, *restBytes);
>
> For the calculation of fragbytes, I think we should calculate the
> fragment bytes of restartLSN instead. The formula "restartSeg +
> limitSegs - currSeg" means the # of segments between restartLSN and
> the limit imposed by the new parameter. I don't think that the
> summation of it and the fragment bytes of currLSN is correct. As the
> following result (max_slot_wal_keep_size is 128MB) shows, the remain
> column shows the actual remain + 16MB (the get_bytes function returns
> the value of max_slot_wal_keep_size in bytes).

Since the oldest segment is removed after the current LSN moves to
the next segment, the current LSN naturally determines the fragment
bytes. Maybe you're concerned that the number of segments looks
too large by one segment.

One arguable point of the feature is how max_slot_wal_keep_size
works exactly. I assume that even though the name starts with
"max_", we actually expect that "at least that many bytes are
kept". So, for example, with a 16MB segment size and 50MB of
max_s_w_k_s, I designed this so that the size of preserved WAL
doesn't go below 50MB, actually rounding 50MB up to a multiple
of 16MB (64MB), and the oldest segment is lost when it reaches
64MB + 16MB = 80MB, as you saw.
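
A worked example of the numbers above (a compilable sketch, not the
patch's code):

#include <stdio.h>

int
main(void)
{
	int			seg_mb = 16;	/* wal_segment_size, in MB */
	int			max_mb = 50;	/* max_slot_wal_keep_size, in MB */
	int			limit_segs = (max_mb + seg_mb - 1) / seg_mb;	/* -> 4 */

	/* preserved WAL never goes below the limit rounded up to segments */
	printf("guaranteed: %d MB\n", limit_segs * seg_mb);			/* 64 MB */
	/* the oldest segment is lost once one more segment fills up */
	printf("lost at:    %d MB\n", (limit_segs + 1) * seg_mb);	/* 80 MB */
	return 0;
}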

# I believe that the difference is not so significant since we
# have around a hundred or several hundred segments in common
# cases.

Do you mean that we should define the GUC parameter literally as
"we won't have exactly that many bytes of WAL segments"? That is,
we have at most 48MB of preserved WAL records for a 50MB
max_s_w_k_s setting. This is the same way max_wal_size is counted,
but I don't think max_slot_wal_keep_size will be regarded in the
same way.

Another design would be to remove the oldest segment when WAL
reaches 64MB, reducing it to 48MB after deletion.

> ---
> For the cosmetic stuff there are code where need the line break.
>
> static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
> +static XLogSegNo GetOldestKeepSegment(XLogRecPtr currpos, XLogRecPtr
> minSlotPtr, XLogRecPtr restartLSN, uint64 *restBytes);
> static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
> static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
>
> and
>
> +static XLogSegNo
> +GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN,
> XLogRecPtr restartLSN, uint64 *restBytes)
> +{

Thanks, I folded the parameter list in my working repository.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-09-06 10:55:39
Message-ID: CAD21AoAZCdvdMN-vG4D_653vb_FN-AaMAP5+GXgF1JRjy+LeyA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 6, 2018 at 4:10 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Thank you for the comment.
>
> At Wed, 5 Sep 2018 14:31:10 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoB-HJvL+uKsv40Gb8Dymh9uBBQUXTucqv4MDtH_AGKh4g(at)mail(dot)gmail(dot)com>
>> On Tue, Sep 4, 2018 at 7:52 PM, Kyotaro HORIGUCHI
>> Thank you for updating! Here are the review comments for the v8 patch.
>>
>> +			/*
>> +			 * This slot still has all required segments. Calculate how many
>> +			 * LSN bytes the slot has until it loses restart_lsn.
>> +			 */
>> +			fragbytes = wal_segment_size - (currLSN % wal_segment_size);
>> +			XLogSegNoOffsetToRecPtr(restartSeg + limitSegs - currSeg, fragbytes,
>> +									wal_segment_size, *restBytes);
>>
>> For the calculation of fragbytes, I think we should calculate the
>> fragment bytes of restartLSN instead. The formula "restartSeg +
>> limitSegs - currSeg" means the # of segments between restartLSN and
>> the limit imposed by the new parameter. I don't think that the
>> summation of it and the fragment bytes of currLSN is correct. As the
>> following result (max_slot_wal_keep_size is 128MB) shows, the remain
>> column shows the actual remain + 16MB (the get_bytes function returns
>> the value of max_slot_wal_keep_size in bytes).
>
> Since the oldest segment is removed after the current LSN moves to
> the next segment, the current LSN naturally determines the fragment
> bytes. Maybe you're concerned that the number of segments looks
> too large by one segment.
>
> One arguable point of the feature is how max_slot_wal_keep_size
> works exactly. I assume that even though the name starts with
> "max_", we actually expect that "at least that many bytes are
> kept". So, for example, with a 16MB segment size and 50MB of
> max_s_w_k_s, I designed this so that the size of preserved WAL
> doesn't go below 50MB, actually rounding 50MB up to a multiple
> of 16MB (64MB), and the oldest segment is lost when it reaches
> 64MB + 16MB = 80MB, as you saw.
>
> # I believe that the difference is not so significant since we
> # have around a hundred or several hundred segments in common
> # cases.
>
> Do you mean that we should define the GUC parameter literally as
> "we won't have exactly that many bytes of WAL segments"? That is,
> we have at most 48MB of preserved WAL records for a 50MB
> max_s_w_k_s setting. This is the same way max_wal_size is counted,
> but I don't think max_slot_wal_keep_size will be regarded in the
> same way.

I might be missing something, but what I'm expecting from this feature
is to restrict how much WAL we can keep at a maximum for replication
slots. In other words, the distance between the current LSN and the
minimum restart_lsn of replication slots doesn't exceed the value of
max_slot_wal_keep_size. It's similar to wal_keep_segments except that
this feature affects only replication slots, and wal_keep_segments
cannot restrict WAL that replication slots are holding. For example,
with a 16MB segment size and 50MB of max_slot_wal_keep_size, we can
keep at most 50MB of WAL for replication slots. However, once we
consume more than 50MB of WAL while not advancing any restart_lsn,
the required WAL might be lost by the next checkpoint, which depends
on min_wal_size. On the other hand, if we mostly can advance
restart_lsn to approximately the current LSN, the size of preserved
WAL for replication slots can go below 50MB.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, sawada(dot)mshk(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-09-06 20:32:21
Message-ID: 29bbd79d-696b-509e-578a-0fc38a3b9405@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

This documentation

+ <para>
+ Specify the maximum size of WAL files
+ that <link linkend="streaming-replication-slots">replication
+ slots</link> are allowed to retain in the
<filename>pg_wal</filename>
+ directory at checkpoint time.
+ If <varname>max_slot_wal_keep_size</varname> is zero (the default),
+ replication slots retain unlimited size of WAL files.
+ </para>

doesn't say anything about what happens when the limit is exceeded.
Does the system halt until the WAL is fetched from the slots? Do the
slots get invalidated?

Also, I don't think 0 is a good value for the default behavior. 0 would
mean that a slot is not allowed to retain any more WAL than already
exists anyway. Maybe we don't want to support that directly, but it's a
valid configuration. So maybe use -1 for infinity.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de, peter(dot)eisentraut(at)2ndquadrant(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-09-10 10:19:21
Message-ID: 20180910.191921.82261601.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello.

At Thu, 6 Sep 2018 19:55:39 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAZCdvdMN-vG4D_653vb_FN-AaMAP5+GXgF1JRjy+LeyA(at)mail(dot)gmail(dot)com>
> On Thu, Sep 6, 2018 at 4:10 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > Thank you for the comment.
> >
> > At Wed, 5 Sep 2018 14:31:10 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoB-HJvL+uKsv40Gb8Dymh9uBBQUXTucqv4MDtH_AGKh4g(at)mail(dot)gmail(dot)com>
> >> On Tue, Sep 4, 2018 at 7:52 PM, Kyotaro HORIGUCHI
> >> Thank you for updating! Here are the review comments for the v8 patch.
> >>
> >> +			/*
> >> +			 * This slot still has all required segments. Calculate how many
> >> +			 * LSN bytes the slot has until it loses restart_lsn.
> >> +			 */
> >> +			fragbytes = wal_segment_size - (currLSN % wal_segment_size);
> >> +			XLogSegNoOffsetToRecPtr(restartSeg + limitSegs - currSeg, fragbytes,
> >> +									wal_segment_size, *restBytes);
> >>
> >> For the calculation of fragbytes, I think we should calculate the
> >> fragment bytes of restartLSN instead. The formula "restartSeg +
> >> limitSegs - currSeg" means the # of segments between restartLSN and
> >> the limit imposed by the new parameter. I don't think that the
> >> summation of it and the fragment bytes of currLSN is correct. As the
> >> following result (max_slot_wal_keep_size is 128MB) shows, the remain
> >> column shows the actual remain + 16MB (the get_bytes function returns
> >> the value of max_slot_wal_keep_size in bytes).
> >
> > Since the oldest segment is removed after the current LSN moves to
> > the next segment, the current LSN naturally determines the fragment
> > bytes. Maybe you're concerned that the number of segments looks
> > too large by one segment.
> >
> > One arguable point of the feature is how max_slot_wal_keep_size
> > works exactly. I assume that even though the name starts with
> > "max_", we actually expect that "at least that many bytes are
> > kept". So, for example, with a 16MB segment size and 50MB of
> > max_s_w_k_s, I designed this so that the size of preserved WAL
> > doesn't go below 50MB, actually rounding 50MB up to a multiple
> > of 16MB (64MB), and the oldest segment is lost when it reaches
> > 64MB + 16MB = 80MB, as you saw.
> >
> > # I believe that the difference is not so significant since we
> > # have around a hundred or several hundred segments in common
> > # cases.
> >
> > Do you mean that we should define the GUC parameter literally as
> > "we won't have exactly that many bytes of WAL segments"? That is,
> > we have at most 48MB of preserved WAL records for a 50MB
> > max_s_w_k_s setting. This is the same way max_wal_size is counted,
> > but I don't think max_slot_wal_keep_size will be regarded in the
> > same way.
>
> I might be missing something, but what I'm expecting from this feature
> is to restrict how much WAL we can keep at a maximum for replication
> slots. In other words, the distance between the current LSN and the
> minimum restart_lsn of replication slots doesn't exceed the value of
> max_slot_wal_keep_size.

Yes, that's one possible design, the same as "we won't have more
than exactly that many bytes of WAL segments" above (with "more
than" added, which is what I meant). But anyway we cannot keep the
limit strictly, since WAL segments are removed only at checkpoint
time. So if we did that, we could reach the lost state before
max_slot_wal_keep_size fills up, while WAL can still exceed the
size due to a WAL flood. The most precise definition we can give
is "WAL segments are preserved to at most around the value". So I
chose the definition so that we can describe it as "we don't
guarantee more than that many bytes".

# Uuuu. Sorry for the possibly hard-to-read sentences..

> It's similar to wal_keep_segments except that
> this feature affects only replication slots, and

It defines the *extra* segments to be kept; that is, if we set it
to 2, at least 3 segments are present. If we set
max_slot_wal_keep_size to 32MB (= 2 segments here), we have at most
3 segments, since a 32MB range before the current LSN almost always
spans 3 segments. Doesn't this behave in a way similar to
wal_keep_segments?

If the current LSN is at the very end of a segment and restart_lsn
is catching up to the current LSN, the "remain" is equal to
max_slot_wal_keep_size as the guaranteed size. If it is at the very
beginning of a segment, it gets an extra 16MB.
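
A sketch of that boundary behavior (simplified from the patch's
formula; restart_lsn has caught up, so restartSeg == currSeg; not the
patch's code):

#include <stdint.h>

typedef uint64_t XLogRecPtr;

static uint64_t
remain_caught_up(XLogRecPtr currLSN, uint64_t limitSegs, uint64_t segSize)
{
	/* unfilled part of the current segment */
	uint64_t	fragbytes = segSize - (currLSN % segSize);

	/*
	 * At a segment's very end fragbytes is ~0, so remain ~= the limit;
	 * at a segment's very start fragbytes is a full segment, so remain
	 * is the limit plus 16MB (with the default segment size).
	 */
	return limitSegs * segSize + fragbytes;
}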

> wal_keep_segments cannot restrict WAL that replication slots are
> holding. For example, with a 16MB segment size and 50MB of
> max_slot_wal_keep_size, we can keep at most 50MB of WAL for
> replication slots. However, once we consume more than 50MB of WAL
> while not advancing any restart_lsn, the required WAL might be lost
> by the next checkpoint, which depends on min_wal_size.

I don't get the last phrase. With a small min_wal_size, we don't
recycle most of the "removed" segments. If it is large, we recycle
more of them. It doesn't affect how far the checkpoint removes WAL
files. But it is right that the LSN advancing by
max_slot_wal_keep_size bytes immediately leads to breaking a slot,
and that is the intended behavior.

> On the other hand, if
> we mostly can advance restart_lsn to approximately the current LSN,
> the size of preserved WAL for replication slots can go below 50MB.

Y..eah.. That's right. It is just how this works. But I don't
understand how this is related to the interpretation of the "max"
in the GUC variable's name.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: peter(dot)eisentraut(at)2ndquadrant(dot)com
Cc: sawada(dot)mshk(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-09-10 10:52:24
Message-ID: 20180910.195224.22629595.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello.

At Thu, 6 Sep 2018 22:32:21 +0200, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote in <29bbd79d-696b-509e-578a-0fc38a3b9405(at)2ndquadrant(dot)com>
> This documentation
>
> + <para>
> + Specify the maximum size of WAL files
> + that <link linkend="streaming-replication-slots">replication
> + slots</link> are allowed to retain in the
> <filename>pg_wal</filename>
> + directory at checkpoint time.
> + If <varname>max_slot_wal_keep_size</varname> is zero (the default),
> + replication slots retain unlimited size of WAL files.
> + </para>
>
> doesn't say anything about what happens when the limit is exceeded.
> Does the system halt until the WAL is fetched from the slots? Do the
> slots get invalidated?

Thanks for pointing that out. That's a major cause of confusion. Does
the following make sense?

> Specify the maximum size of WAL files that <link
> linkend="streaming-replication-slots">replication slots</link>
> are allowed to retain in the <filename>pg_wal</filename>
> directory at checkpoint time. If
> <varname>max_slot_wal_keep_size</varname> is zero (the
> default), replication slots retain unlimited size of WAL files.
+ If the restart_lsn of a replication slot falls behind the current
+ LSN by more than that many bytes, the standby using the slot may no
+ longer be able to reconnect due to removal of required WAL records.

And the following sentence is wrong now. I'll remove it in the
coming version 9.

> <para>
> This parameter is used being rounded down to the multiples of WAL file
> size.
> </para>

> Also, I don't think 0 is a good value for the default behavior. 0 would
> mean that a slot is not allowed to retain any more WAL than already
> exists anyway. Maybe we don't want to support that directly, but it's a
> valid configuration. So maybe use -1 for infinity.

In relation to the reply just sent to Sawada-san, the remain of a
slot can be at most 16MB in the 0 case with the default segment
size. So you're right in this sense. Will fix in the coming
version. Thanks.

=# show max_slot_wal_keep_size;
 max_slot_wal_keep_size
------------------------
 0
(1 row)
=# select pg_current_wal_lsn(), restart_lsn, remain, pg_size_pretty(remain) as remain from pg_replication_slots ;
 pg_current_wal_lsn | restart_lsn |  remain  | remain
--------------------+-------------+----------+--------
 0/4000000          | 0/4000000   | 16777216 | 16 MB
(1 row)
....
=# select pg_current_wal_lsn(), restart_lsn, remain, pg_size_pretty(remain) as remain from pg_replication_slots ;
 pg_current_wal_lsn | restart_lsn | remain | remain
--------------------+-------------+--------+--------
 0/4FF46D8          | 0/4FF46D8   | 47400  | 46 kB
(1 row)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-09-12 04:45:32
Message-ID: CAD21AoAg_Rnbc_6EQJ6-Q+1kdzym9vjEZxhSAZOhjYPh5-921g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Sep 10, 2018 at 7:19 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello.
>
> At Thu, 6 Sep 2018 19:55:39 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAZCdvdMN-vG4D_653vb_FN-AaMAP5+GXgF1JRjy+LeyA(at)mail(dot)gmail(dot)com>
>> On Thu, Sep 6, 2018 at 4:10 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > Thank you for the comment.
>> >
>> > At Wed, 5 Sep 2018 14:31:10 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoB-HJvL+uKsv40Gb8Dymh9uBBQUXTucqv4MDtH_AGKh4g(at)mail(dot)gmail(dot)com>
>> >> On Tue, Sep 4, 2018 at 7:52 PM, Kyotaro HORIGUCHI
>> >> Thank you for updating! Here is the review comment for v8 patch.
>> >>
>> >> + /*
>> >> + * This slot still has all required segments. Calculate how many
>> >> + * LSN bytes the slot has until it loses restart_lsn.
>> >> + */
>> >> + fragbytes = wal_segment_size - (currLSN % wal_segment_size);
>> >> + XLogSegNoOffsetToRecPtr(restartSeg + limitSegs - currSeg, fragbytes,
>> >> + wal_segment_size, *restBytes);
>> >>
>> >> For the calculation of fragbytes, I think we should calculate the
>> >> fragment bytes of restartLSN instead. The formula "restartSeg +
>> >> limitSegs - currSeg" means the # of segments between restartLSN and the
>> >> limit set by the new parameter. I don't think that the summation of it and
>> >> the fragment bytes of the current LSN is correct. As the following result
>> >> (max_slot_wal_keep_size is 128MB) shows, the remain column shows the
>> >> actual remains + 16MB (get_bytes function returns the value of
>> >> max_slot_wal_keep_size in bytes).
>> >
>> > Since the oldest segment is removed after the current LSN moves to
>> > the next segment, the current LSN naturally determines the fragment
>> > bytes. Maybe your concern is that the number of segments looks
>> > too large by one segment.
>> >
>> > One arguable point of the feature is how max_slot_wal_keep_size
>> > works exactly. I assume that even though the name is prefixed with
>> > "max_", we actually expect that "at least that many bytes are
>> > kept". So, for example, with 16MB of segment size and 50MB of
>> > max_s_w_k_s, I designed this so that the size of preserved WAL
>> > doesn't actually go below 50MB (rounding 50MB up to a multiple of
>> > 16MB, i.e. 64MB), and the slot loses the oldest segment when it
>> > reaches 64MB + 16MB = 80MB as you saw.
>> >
>> > # I believe that the difference is not so significant since we
>> > # have around a hundred or several hundred segments in common
>> > # cases.
>> >
>> > Do you mean that we should define the GUC parameter literally as
>> > "we won't have exactly that many bytes of WAL segments"? That is,
>> > we have at most 48MB of preserved WAL records for a 50MB
>> > max_s_w_k_s setting. This is the same as how max_wal_size is
>> > counted, but I don't think max_slot_wal_keep_size will be regarded
>> > in the same way.
>>
>> I might be missing something but what I'm expecting from this feature is
>> to restrict how much WAL we can keep at a maximum for replication
>> slots. In other words, the distance between the current LSN and the
>> minimum restart_lsn of replication slots doesn't go over the value of
>> max_slot_wal_keep_size.
>
> Yes, it's one possible design, the same as "we won't have more
> than exactly that many bytes of WAL segments" above ("more than"
> is added, which I meant). But anyway we cannot keep the limit
> strictly since WAL segments are removed only at checkpoint
> time.

Agreed. It should be something like a soft limit.

> So if we do so, we can reach the lost state before
> max_slot_wal_keep_size is filled up, while WAL can still exceed the
> size during a WAL flood. The most precise definition we can give is
> "WAL segments are preserved at most around the value". So I chose
> the definition so that we can describe this as "we don't
> guarantee more than that many bytes".

Agreed.

>
> # Uuuu. Sorry for the possibly hard-to-read sentence..
>
>> It's similar to wal_keep_segments except
>> that this feature affects only replication slots. And
>
> It defines the *extra* segments to be kept, that is, if we set it
> to 2, at least 3 segments are present. If we set
> max_slot_wal_keep_size to 32MB (= 2 segs here), we have at most 3
> segments, since the 32MB range before the current LSN almost always
> spans 3 segments. Doesn't this seemingly behave in a similar way
> to wal_keep_segments?

Yeah, that's fine with me. wal_keep_segments works regardless of the
existence of replication slots. If we have replication slots and set
both settings, we can reserve extra WAL up to
max(wal_keep_segments, max_slot_wal_keep_size).
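
For illustration, the effective reservation under both settings can be
computed with something like this (a sketch; pg_size_bytes() converts the
GUC text values, and -1 for max_slot_wal_keep_size means unlimited, so the
result only makes sense when the parameter is set):

=# SELECT greatest(
            current_setting('wal_keep_segments')::int
              * pg_size_bytes(current_setting('wal_segment_size')),
            pg_size_bytes(current_setting('max_slot_wal_keep_size'))
          ) AS effective_keep_bytes;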

>
> If the current LSN is at the very end of a segment and
> restart_lsn is catching up to the current LSN, the "remain" is
> equal to max_slot_wal_keep_size as the guaranteed size. If it is at
> the very beginning of a segment, it gets an extra 16MB.

Agreed.

>
>> wal_keep_segments cannot restrict WAL that replication slots are
>> holding. For example, with 16MB of segment size and 50MB of
>> max_slot_wal_keep_size, we can keep at most 50MB WAL for replication
>> slots. However, once we consume more than 50MB of WAL without
>> advancing any restart_lsn, the required WAL might be lost at the next
>> checkpoint, depending on min_wal_size.
>
> I don't get the last phrase. With a small min_wal_size, we don't
> recycle most of the "removed" segments. If it is large, we recycle more
> of them. It doesn't affect up to where the checkpoint removes WAL
> files. But it is right that an LSN advance of
> max_slot_wal_keep_size bytes immediately leads to breaking a
> slot, and that is intended behavior.

Sorry, I was wrong. Please ignore the last sentence. What I want to say
is that there is no guarantee that the required WAL is kept once the
extra WAL reserved by replication slots exceeds the threshold.

>
>> On the other hand, if
>> we can mostly advance restart_lsn to approximately the current LSN, the
>> size of preserved WAL for replication slots can go below 50MB.
>
> Y..eah.. That's right. It is just how this works. But I don't
> understand how this is related to the interpretation of the "max"
> of the GUC variable.

When I wrote this I understood the following sentence to mean that we
regularly keep at least max_slot_wal_keep_size bytes regardless of the
progress of the minimum restart_lsn; I might have been
misunderstanding, though.

>> > I assume that even though the name is prefixed with
>> > "max_", we actually expect that "at least that many bytes are
>> > kept".

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: peter(dot)eisentraut(at)2ndquadrant(dot)com
Cc: sawada(dot)mshk(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-09-13 09:29:31
Message-ID: 20180913.182931.87638304.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello.

Thank you for the comments, Sawada-san, Peter.

At Mon, 10 Sep 2018 19:52:24 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20180910(dot)195224(dot)22629595(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> At Thu, 6 Sep 2018 22:32:21 +0200, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote in <29bbd79d-696b-509e-578a-0fc38a3b9405(at)2ndquadrant(dot)com>
> Thanks for pointing that out. That's a major cause of confusion. Does
> the following make sense?
>
> > Specify the maximum size of WAL files that <link
> > linkend="streaming-replication-slots">replication slots</link>
> > are allowed to retain in the <filename>pg_wal</filename>
> > directory at checkpoint time. If
> > <varname>max_slot_wal_keep_size</varname> is zero (the
> > default), replication slots retain unlimited size of WAL files.
> + If restart_lsn of a replication slot falls behind the current LSN
> + by more than that many bytes, the standby using the slot may not
> + be able to reconnect due to removal of required WAL records.
...
> > Also, I don't think 0 is a good value for the default behavior. 0 would
> > mean that a slot is not allowed to retain any more WAL than already
> > exists anyway. Maybe we don't want to support that directly, but it's a
> > valid configuration. So maybe use -1 for infinity.
>
> In relation to the reply just sent to Sawada-san, the remain of a
> slot can be at most 16MB in the 0 case with the default segment
> size. So you're right in this sense. Will fix it in the coming
> version. Thanks.

I did the following things in the new version.

- Changed the disable (or infinite) and default value of
max_slot_wal_keep_size to -1 from 0.
(patch 1, 2. guc.c, xlog.c: GetOldestKeepSegment())

- Fixed documentation for max_slot_wal_keep_size to mention what
happens when WAL exceeds the size, and additional rewrites.
(patch 4, catalogs.sgml, config.sgml)

- Folded parameter list of GetOldestKeepSegment().
(patch 1, 2. xlog.c)

- Provided the plural form of errdetail of checkpoint-time
warning. (patch 1, xlog.c: KeepLogSeg())

- Some cosmetic change and small refactor.
(patch 1, 2, 3)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v9-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 6.8 KB
v9-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch text/x-patch 12.0 KB
v9-0003-TAP-test-for-the-slot-limit-feature.patch text/x-patch 6.2 KB
v9-0004-Documentation-for-slot-limit-feature.patch text/x-patch 4.5 KB

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-10-22 10:35:04
Message-ID: CAD21AoBdfoLSgujPZ_TpnH5zdQz0jg-Y8OXtZ=TCO787Sey-=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 13, 2018 at 6:30 PM Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> Hello.
>
> Thank you for the comments, Sawada-san, Peter.
>
> At Mon, 10 Sep 2018 19:52:24 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20180910(dot)195224(dot)22629595(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > At Thu, 6 Sep 2018 22:32:21 +0200, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote in <29bbd79d-696b-509e-578a-0fc38a3b9405(at)2ndquadrant(dot)com>
> > Thanks for pointing that out. That's a major cause of confusion. Does
> > the following make sense?
> >
> > > Specify the maximum size of WAL files that <link
> > > linkend="streaming-replication-slots">replication slots</link>
> > > are allowed to retain in the <filename>pg_wal</filename>
> > > directory at checkpoint time. If
> > > <varname>max_slot_wal_keep_size</varname> is zero (the
> > > default), replication slots retain unlimited size of WAL files.
> > + If restart_lsn of a replication slot falls behind the current LSN
> > + by more than that many bytes, the standby using the slot may not
> > + be able to reconnect due to removal of required WAL records.
> ...
> > > Also, I don't think 0 is a good value for the default behavior. 0 would
> > > mean that a slot is not allowed to retain any more WAL than already
> > > exists anyway. Maybe we don't want to support that directly, but it's a
> > > valid configuration. So maybe use -1 for infinity.
> >
> > In relation to the reply just sent to Sawada-san, the remain of a
> > slot can be at most 16MB in the 0 case with the default segment
> > size. So you're right in this sense. Will fix it in the coming
> > version. Thanks.
>
> I did the following things in the new version.
>
> - Changed the disable (or infinite) and default value of
> max_slot_wal_keep_size to -1 from 0.
> (patch 1, 2. guc.c, xlog.c: GetOldestKeepSegment())
>
> - Fixed documentation for max_slot_wal_keep_size to mention what
> happens when WAL exceeds the size, and additional rewrites.
> (patch 4, catalogs.sgml, config.sgml)
>
> - Folded parameter list of GetOldestKeepSegment().
> (patch 1, 2. xlog.c)
>
> - Provided the plural form of errdetail of checkpoint-time
> warning. (patch 1, xlog.c: KeepLogSeg())
>
> - Some cosmetic change and small refactor.
> (patch 1, 2, 3)
>

Sorry for the late response. The patch still can be applied to the
current HEAD so I reviewed the latest patch.

The value of 'remain' and 'wal_status' might not be correct. Although
'wal_status' shows 'lost', we can still get changes from the slot. I've
tested it with the following steps.

=# alter system set max_slot_wal_keep_size to '64MB'; -- while
wal_keep_segments is 0
=# select pg_reload_conf();
=# select slot_name, wal_status, remain, pg_size_pretty(remain) as
remain_pretty from pg_replication_slots ;
slot_name | wal_status | remain | remain_pretty
-----------+------------+----------+---------------
1 | streaming | 83885648 | 80 MB
(1 row)

** consume 80MB WAL, and do CHECKPOINT **

=# select slot_name, wal_status, remain, pg_size_pretty(remain) as
remain_pretty from pg_replication_slots ;
slot_name | wal_status | remain | remain_pretty
-----------+------------+--------+---------------
1 | lost | 0 | 0 bytes
(1 row)
=# select count(*) from pg_logical_slot_get_changes('1', NULL, NULL);
count
-------
15
(1 row)
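
For reference, the "consume 80MB WAL, and do CHECKPOINT" step above can be
reproduced with something like the following (a sketch; it assumes a scratch
table t and the default 16MB segment size):

=# CREATE TABLE t (i int);
=# DO $$
BEGIN
  FOR i IN 1 .. 5 LOOP          -- 5 switches x 16MB segments = 80MB
    INSERT INTO t VALUES (i);   -- write something so the switch is not a no-op
    PERFORM pg_switch_wal();    -- force the current segment to be completed
  END LOOP;
END;
$$;
=# CHECKPOINT;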

-----
I got the following result with a setting of wal_keep_segments >
max_slot_wal_keep_size. The 'wal_status' shows 'streaming' although the
'remain' is 0.

=# select slot_name, wal_status, remain from pg_replication_slots limit 1;
slot_name | wal_status | remain
-----------+------------+--------
1 | streaming | 0
(1 row)

+ XLByteToSeg(targetLSN, restartSeg, wal_segment_size);
+ if (max_slot_wal_keep_size_mb >= 0 && currSeg <= restartSeg + limitSegs)
+ {

You use limitSegs here but shouldn't we use keepSeg instead? Actually
I've commented on this point for the v6 patch before[1], and it had been
fixed in the v7 patch. However you're using limitSegs again since the v8
patch. I might be missing something though.

Changed the status to 'Waiting on Author'.

[1] /message-id/CAD21AoD0rChq7wQE%3D_o95quopcQGjcVG9omwdH07nT5cm81hzg%40mail.gmail.com
[2] /message-id/20180904.195250.144186960.horiguchi.kyotaro%40lab.ntt.co.jp

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-10-25 12:55:18
Message-ID: 20181025.215518.189844649.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello.

At Mon, 22 Oct 2018 19:35:04 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoBdfoLSgujPZ_TpnH5zdQz0jg-Y8OXtZ=TCO787Sey-=w(at)mail(dot)gmail(dot)com>
> On Thu, Sep 13, 2018 at 6:30 PM Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Sorry for the late response. The patch still can be applied to the

It's alright. Thanks.

> current HEAD so I reviewed the latest patch.
> The value of 'remain' and 'wal_status' might not be correct. Although
> 'wal_status' shows 'lost', we can still get changes from the slot. I've
> tested it with the following steps.
>
> =# alter system set max_slot_wal_keep_size to '64MB'; -- while
> wal_keep_segments is 0
> =# select pg_reload_conf();
> =# select slot_name, wal_status, remain, pg_size_pretty(remain) as
> remain_pretty from pg_replication_slots ;
> slot_name | wal_status | remain | remain_pretty
> -----------+------------+----------+---------------
> 1 | streaming | 83885648 | 80 MB
> (1 row)
>
> ** consume 80MB WAL, and do CHECKPOINT **
>
> =# select slot_name, wal_status, remain, pg_size_pretty(remain) as
> remain_pretty from pg_replication_slots ;
> slot_name | wal_status | remain | remain_pretty
> -----------+------------+--------+---------------
> 1 | lost | 0 | 0 bytes
> (1 row)
> =# select count(*) from pg_logical_slot_get_changes('1', NULL, NULL);
> count
> -------
> 15
> (1 row)

Mmm. The function looks into the segment it already opened before
the segment was lost in the file system (precisely, its directory
entry has been deleted). So just 1 lost segment doesn't
matter. Please try losing one more segment.

=# select * from pg_logical_slot_get_changes('s1', NULL, NULL);
ERROR: unexpected pageaddr 0/29000000 in log segment 000000010000000000000023, offset 0

Or, instead, just restarting will let the opened segment be forgotten.

...
> 1 | lost | 0 | 0 bytes
(just restart)
> =# select * from pg_logical_slot_get_changes('s1', NULL, NULL);
> ERROR: requested WAL segment pg_wal/000000010000000000000029 has already been removed

I'm not sure this counts as a bug...

> -----
> I got the following result with a setting of wal_keep_segments >
> max_slot_wal_keep_size. The 'wal_status' shows 'streaming' although the
> 'remain' is 0.
>
> =# select slot_name, wal_status, remain from pg_replication_slots limit 1;
> slot_name | wal_status | remain
> -----------+------------+--------
> 1 | streaming | 0
> (1 row)
>
> + XLByteToSeg(targetLSN, restartSeg, wal_segment_size);
> + if (max_slot_wal_keep_size_mb >= 0 && currSeg <= restartSeg + limitSegs)
> + {
>
> You use limitSegs here but shouldn't we use keepSeg instead? Actually
> I've commented on this point for the v6 patch before[1], and it had been
> fixed in the v7 patch. However you're using limitSegs again since the v8
> patch. I might be missing something though.

No. keepSegs is the number of segments *actually* kept around. So
reverting it to keepSegs just resurrects the bug you pointed out
upthread. What is needed here is at most how many segments will be
kept. So raising limitSegs by wal_keep_segments fixes that.
Sorry for the sequence of silly bugs. A TAP test for the case has been
added.

> Changed the status to 'Waiting on Author'.
>
> [1] /message-id/CAD21AoD0rChq7wQE%3D_o95quopcQGjcVG9omwdH07nT5cm81hzg%40mail.gmail.com
> [2] /message-id/20180904.195250.144186960.horiguchi.kyotaro%40lab.ntt.co.jp

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v10-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 6.8 KB
v10-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch text/x-patch 12.3 KB
v10-0003-TAP-test-for-the-slot-limit-feature.patch text/x-patch 6.8 KB
v10-0004-Documentation-for-slot-limit-feature.patch text/x-patch 4.5 KB

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-10-26 02:26:36
Message-ID: 20181026.112636.147537766.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Thu, 25 Oct 2018 21:55:18 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20181025(dot)215518(dot)189844649(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > =# alter system set max_slot_wal_keep_size to '64MB'; -- while
> > wal_keep_segments is 0
> > =# select pg_reload_conf();
> > =# select slot_name, wal_status, remain, pg_size_pretty(remain) as
> > remain_pretty from pg_replication_slots ;
> > slot_name | wal_status | remain | remain_pretty
> > -----------+------------+----------+---------------
> > 1 | streaming | 83885648 | 80 MB
> > (1 row)
> >
> > ** consume 80MB WAL, and do CHECKPOINT **
> >
> > =# select slot_name, wal_status, remain, pg_size_pretty(remain) as
> > remain_pretty from pg_replication_slots ;
> > slot_name | wal_status | remain | remain_pretty
> > -----------+------------+--------+---------------
> > 1 | lost | 0 | 0 bytes
> > (1 row)
> > =# select count(*) from pg_logical_slot_get_changes('1', NULL, NULL);
> > count
> > -------
> > 15
> > (1 row)
>
> Mmm. The function looks into the segment it already opened before
> the segment was lost in the file system (precisely, its directory
> entry has been deleted). So just 1 lost segment doesn't
> matter. Please try losing one more segment.

I considered this a bit more, and the attached patch lets
XLogReadRecord() check for segment removal every time it is
called and emit the following error in that case.

> =# select * from pg_logical_slot_get_changes('s1', NULL, NULL);
> ERROR: WAL record at 0/870001B0 no longer available
> DETAIL: The segment for the record has been removed.

The reason for doing that in the function is that it can also happen
for physical replication when the walsender is active but far
behind. The removed(renamed)-but-still-open segment may be
recycled and can be overwritten while being read, and that will be
caught by page/record validation. It is substantially lost in
that sense. I don't think the strictness is useful for anything..

Thoughts?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v10-0005-Check-removal-of-in-read-segment-file.patch text/x-patch 2.0 KB

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-10-29 08:37:52
Message-ID: CAD21AoDz2nUu-JyLxp5UTO19074C7bGdYJV-vDWLkaRxLAy8+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Oct 25, 2018 at 9:56 PM Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> Hello.
>
> At Mon, 22 Oct 2018 19:35:04 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoBdfoLSgujPZ_TpnH5zdQz0jg-Y8OXtZ=TCO787Sey-=w(at)mail(dot)gmail(dot)com>
> > On Thu, Sep 13, 2018 at 6:30 PM Kyotaro HORIGUCHI
> > <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > Sorry for the late response. The patch still can be applied to the
>
> It's alright. Thanks.
>
> > current HEAD so I reviewed the latest patch.
> > The value of 'remain' and 'wal_status' might not be correct. Although
> > 'wal_status' shows 'lost', we can still get changes from the slot. I've
> > tested it with the following steps.
> >
> > =# alter system set max_slot_wal_keep_size to '64MB'; -- while
> > wal_keep_segments is 0
> > =# select pg_reload_conf();
> > =# select slot_name, wal_status, remain, pg_size_pretty(remain) as
> > remain_pretty from pg_replication_slots ;
> > slot_name | wal_status | remain | remain_pretty
> > -----------+------------+----------+---------------
> > 1 | streaming | 83885648 | 80 MB
> > (1 row)
> >
> > ** consume 80MB WAL, and do CHECKPOINT **
> >
> > =# select slot_name, wal_status, remain, pg_size_pretty(remain) as
> > remain_pretty from pg_replication_slots ;
> > slot_name | wal_status | remain | remain_pretty
> > -----------+------------+--------+---------------
> > 1 | lost | 0 | 0 bytes
> > (1 row)
> > =# select count(*) from pg_logical_slot_get_changes('1', NULL, NULL);
> > count
> > -------
> > 15
> > (1 row)
>
> Mmm. The function looks into the segment it already opened before
> the segment was lost in the file system (precisely, its directory
> entry has been deleted). So just 1 lost segment doesn't
> matter. Please try losing one more segment.
>
> =# select * from pg_logical_slot_get_changes('s1', NULL, NULL);
> ERROR: unexpected pageaddr 0/29000000 in log segment 000000010000000000000023, offset 0
>
> Or, instead, just restarting will let the opened segment be forgotten.
>
> ...
> > 1 | lost | 0 | 0 bytes
> (just restart)
> > =# select * from pg_logical_slot_get_changes('s1', NULL, NULL);
> > ERROR: requested WAL segment pg_wal/000000010000000000000029 has already been removed
>
> I'm not sure this counts as a bug...
>
>
> > -----
> > I got the following result with a setting of wal_keep_segments >
> > max_slot_wal_keep_size. The 'wal_status' shows 'streaming' although the
> > 'remain' is 0.
> >
> > =# select slot_name, wal_status, remain from pg_replication_slots limit 1;
> > slot_name | wal_status | remain
> > -----------+------------+--------
> > 1 | streaming | 0
> > (1 row)
> >
> > + XLByteToSeg(targetLSN, restartSeg, wal_segment_size);
> > + if (max_slot_wal_keep_size_mb >= 0 && currSeg <= restartSeg + limitSegs)
> > + {
> >
> > You use limitSegs here but shouldn't we use keepSeg instead? Actually
> > I've commented on this point for the v6 patch before[1], and it had been
> > fixed in the v7 patch. However you're using limitSegs again since the v8
> > patch. I might be missing something though.
>
> No. keepSegs is the number of segments *actually* kept around. So
> reverting it to keepSegs just resurrects the bug you pointed out
> upthread. What is needed here is at most how many segments will be
> kept. So raising limitSegs by wal_keep_segments fixes that.
> Sorry for the sequence of silly bugs. A TAP test for the case has been
> added.
>

Thank you for updating the patch. The 0001 - 0004 patches work fine
and look good to me except for the following comment on the code.

+ /*
+ * Calculate keep segments by slots first. The second term of the
+ * condition is just a sanity check.
+ */
+ if (minSlotLSN != InvalidXLogRecPtr && minSlotSeg <= currSeg)
+ keepSegs = currSeg - minSlotSeg;

I think that we can use an assertion for the second term of the condition
instead of just checking it. If the function gets minSlotSeg > currSeg, the
return value will be incorrect. That means that the function requires
the condition to always be true. Thoughts?

Since this comment can be deferred to committers I've marked this
patch as "Ready for Committer". For the 0005 patch, the issue I reported
is relatively rare and not critical; we can discuss it after
this patch gets committed.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-11-19 04:39:58
Message-ID: 20181119043958.GE4400@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 26, 2018 at 11:26:36AM +0900, Kyotaro HORIGUCHI wrote:
> The reason for doing that in the function is that it can also happen
> for physical replication when the walsender is active but far
> behind. The removed(renamed)-but-still-open segment may be
> recycled and can be overwritten while being read, and that will be
> caught by page/record validation. It is substantially lost in
> that sense. I don't think the strictness is useful for anything..

I was just coming by to look a bit at the patch series, and bumped
into that:

> + /*
> + * A checkpoint can remove the segment we are currently looking for. Make
> + * sure the current segment still exists. We check this only once per record.
> + */
> + XLByteToSeg(targetPagePtr, targetSegNo, state->wal_segment_size);
> + if (targetSegNo <= XLogGetLastRemovedSegno())
> + ereport(ERROR,
> + (errcode(ERRCODE_NO_DATA),
> + errmsg("WAL record at %X/%X no longer available",
> + (uint32)(RecPtr >> 32), (uint32) RecPtr),
> + errdetail("The segment for the record has been removed.")));
> +

ereport should not be called within xlogreader.c as a base rule:
* This file is compiled as both front-end and backend code, so it
* may not use ereport, server-defined static variables, etc.
--
Michael


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-11-20 05:07:44
Message-ID: 20181120050744.GJ4400@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 19, 2018 at 01:39:58PM +0900, Michael Paquier wrote:
> I was just coming by to look at bit at the patch series, and bumped
> into that:

So I have been looking at the last patch series 0001-0004 posted on this
thread, and coming from here:
https://postgr.es/m/20181025.215518.189844649.horiguchi.kyotaro@lab.ntt.co.jp

/* check that the slot is gone */
SELECT * FROM pg_replication_slots
It could be an idea to switch to the expanded mode here, not that it
matters much still..

+IsLsnStillAvaiable(XLogRecPtr targetLSN, uint64 *restBytes)
You mean Available here, not Avaiable. This function is only used when
scanning for slot information with pg_replication_slots, so wouldn't it
be better to just return the status string in this case?

Not sure I see the point of the "remain" field, which can be found with
a simple calculation using the current insertion LSN, the segment size
and the amount of WAL that the slot is retaining. It may be interesting
to document a query to do that though.
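
A sketch of such a query could look like this (it ignores wal_keep_segments,
segment rounding and the -1 case, which the reply below points out):

SELECT slot_name,
       pg_size_bytes(current_setting('max_slot_wal_keep_size'))
         - pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS naive_remain
FROM pg_replication_slots;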

GetOldestXLogFileSegNo() has race conditions if WAL recycling runs in
parallel, no? How is it safe to scan pg_wal on a process querying
pg_replication_slots while another process may manipulate its contents
(aka the checkpointer or just the startup process with an
end-of-recovery checkpoint.). This routine relies on unsafe
assumptions as this is not concurrent-safe. You can avoid problems by
making sure instead that lastRemovedSegNo is initialized correctly at
startup, which would be normally one segment older than what's in
pg_wal, which feels a bit hacky to rely on to track the oldest segment.

It seems to me that GetOldestXLogFileSegNo() should also check for
segments matching the current timeline, no?

+ if (prev_lost_segs != lost_segs)
+ ereport(WARNING,
+ (errmsg("some replication slots have lost required WAL segments"),
+ errdetail_plural("The mostly affected slot has lost %ld segment.",
+ "The mostly affected slot has lost %ld segments.",
+ lost_segs, lost_segs)));
This can become very noisy with the time, and it would be actually
useful to know which replication slot is impacted by that.

+ slot doesn't have valid restart_lsn, this field
Missing a determinant here, and restart_lsn should have a <literal>
markup.

+ many WAL segments that they fill up the space allotted
s/allotted/allocated/.

+ available. The last two states are seen only when
+ <xref linkend="guc-max-slot-wal-keep-size"/> is non-negative. If the
+ slot doesn't have valid restart_lsn, this field
+ is <literal>unknown</literal>.
I am a bit confused by this statement. The last two states are "lost"
and "keeping", but shouldn't "keeping" be the state showing up by
default as it means that all WAL segments are kept around.

+# Advance WAL by ten segments (= 160MB) on master
+advance_wal($node_master, 10);
+$node_master->safe_psql('postgres', "CHECKPOINT;");
This makes the tests very costly, which is something we should avoid as
much as possible. One trick which could be used here, on top of
reducing the number of segment switches, is to use initdb
--wal-segsize=1.
--
Michael


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(at)paquier(dot)xyz
Cc: sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-12-20 07:24:38
Message-ID: 20181220.162438.121484007.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you for picking this up, and sorry for being late.

At Mon, 19 Nov 2018 13:39:58 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in <20181119043958(dot)GE4400(at)paquier(dot)xyz>
> ereport should not be called within xlogreader.c as a base rule:

Ouch! I forgot that. Fixed it to use report_invalid_record, slightly
changing the message. The code is not required (or cannot be
used) on the frontend, so I #ifndef FRONTENDed the code.

At Tue, 20 Nov 2018 14:07:44 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in <20181120050744(dot)GJ4400(at)paquier(dot)xyz>
> On Mon, Nov 19, 2018 at 01:39:58PM +0900, Michael Paquier wrote:
> > I was just coming by to look at bit at the patch series, and bumped
> > into that:
>
> So I have been looking at the last patch series 0001-0004 posted on this
> thread, and coming from here:
> https://postgr.es/m/20181025.215518.189844649.horiguchi.kyotaro@lab.ntt.co.jp
>
> /* check that the slot is gone */
> SELECT * FROM pg_replication_slots
> It could be an idea to switch to the expanded mode here, not that it
> matters much still..

No problem doing that. Done.

The TAP test complained that it still used recovery.conf. Fixed. While
doing that I added the parameter primary_slot_name to
init_from_backup in PostgresNode.pm.

> +IsLsnStillAvaiable(XLogRecPtr targetLSN, uint64 *restBytes)
> You mean Available here, not Avaiable. This function is only used when
> scanning for slot information with pg_replication_slots, so wouldn't it
> be better to just return the status string in this case?

Mmm. Sure. Auto-completion hid it from my eyes. Fixed the name.
The suggestion sounds reasonable. The function was created as returning
a boolean and the name doesn't fit the current function. I renamed
it to GetLsnAvailability(), which returns a string.

> Not sure I see the point of the "remain" field, which can be found with
> a simple calculation using the current insertion LSN, the segment size
> and the amount of WAL that the slot is retaining. It may be interesting
> to document a query to do that though.

It's not that simple. wal_segment_size, max_slot_wal_keep_size,
wal_keep_segments and the current LSN are
involved in the calculation, which includes several conditional
branches, as you may see upthread. We could show "the largest
current LSN until WAL is lost" but the "current LSN" is not shown
there. So it is showing the "remain".
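
Even a rough SQL approximation already needs the max() of both limits and a
floor at zero, and it still omits the segment-boundary rounding (a sketch
only, not what the view computes internally):

=# SELECT slot_name,
          greatest(0,
            greatest(
              pg_size_bytes(current_setting('max_slot_wal_keep_size')),
              current_setting('wal_keep_segments')::bigint
                * pg_size_bytes(current_setting('wal_segment_size')))
            - pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
          AS approx_remain
   FROM pg_replication_slots
   WHERE current_setting('max_slot_wal_keep_size') <> '-1';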

> GetOldestXLogFileSegNo() has race conditions if WAL recycling runs in
> parallel, no? How is it safe to scan pg_wal on a process querying
> pg_replication_slots while another process may manipulate its contents
> (aka the checkpointer or just the startup process with an
> end-of-recovery checkpoint.). This routine relies on unsafe
> assumptions as this is not concurrent-safe. You can avoid problems by
> making sure instead that lastRemovedSegNo is initialized correctly at
> startup, which would be normally one segment older than what's in
> pg_wal, which feels a bit hacky to rely on to track the oldest segment.

Concurrent recycling makes the function's result vary between the
segment numbers before and after it. It is unstable but doesn't
matter so much. The reason for the timing is to avoid extra
startup time by a scan over pg_wal that is unnecessary in most
cases.

Anyway the attached patch initializes lastRemovedSegNo in
StartupXLOG().

> It seems to me that GetOldestXLogFileSegNo() should also check for
> segments matching the current timeline, no?

RemoveOldXlogFiles() ignores the timeline and the function is made to
behave the same way (in a different manner). I added a comment about
the behavior in the function.

> + if (prev_lost_segs != lost_segs)
> + ereport(WARNING,
> + (errmsg("some replication slots have lost required WAL segments"),
> + errdetail_plural("The mostly affected slot has lost %ld segment.",
> + "The mostly affected slot has lost %ld segments.",
> + lost_segs, lost_segs)));
> This can become very noisy with the time, and it would be actually
> useful to know which replication slot is impacted by that.

One message per segment doesn't seem so noisy. The reason for
not showing slot identifiers individually is just to avoid the
complexity that comes from involving slot details. DBAs will see the
details in pg_stat_replication.

Anyway I did that in the attached patch. ReplicationSlotsBehind
returns the list of the slot names that are behind the specified
LSN. With this patch the messages look like the following:

WARNING: some replication slots have lost required WAL segments
DETAIL: Slot s1 lost 8 segment(s).
WARNING: some replication slots have lost required WAL segments
DETAIL: Slots s1, s2, s3 lost at most 9 segment(s).

> + slot doesn't have valid restart_lsn, this field
> Missing a determinant here, and restart_lsn should have a <literal>
> markup.

structfield? Reworded as below:

| non-negative. If <structfield>restart_lsn</structfield> is NULL, this
| field is <literal>unknown</literal>.

I changed "the slot" with "this slot" in the two added fields
(wal_status, remain).

> + many WAL segments that they fill up the space allotted
> s/allotted/allocated/.

Fixed.

> + available. The last two states are seen only when
> + <xref linkend="guc-max-slot-wal-keep-size"/> is non-negative. If the
> + slot doesn't have valid restart_lsn, this field
> + is <literal>unknown</literal>.
> I am a bit confused by this statement. The last two states are "lost"
> and "keeping", but shouldn't "keeping" be the state showing up by
> default as it means that all WAL segments are kept around.

It's "streaming". I didn't came up with nice words to
distinguish the two states. I'm not sure "keep around" exactly
means but "keeping" here means rather "just not removed yet". The
states could be reworded as the follows:

streaming: kept/keeping/(secure, in the first version)
keeping : mortal/about to be removed
lost/unknown : (lost/unknown)

Do you have any better wording?

> +# Advance WAL by ten segments (= 160MB) on master
> +advance_wal($node_master, 10);
> +$node_master->safe_psql('postgres', "CHECKPOINT;");
> This makes the tests very costly, which is something we should avoid as
> much as possible. One trick which could be used here, on top of
> reducing the number of segment switches, is to use initdb
> --wal-segsize=1.

That sounds nice. Done. In the new version the number of segments
is reduced, and a new test item for the initial unknown state is added
as the first item.

Please find the attached new version.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v11-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 9.8 KB
v11-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch text/x-patch 12.6 KB
v11-0003-Add-primary_slot_name-to-init_from_backup-in-TAP-tes.patch text/x-patch 1.2 KB
v11-0004-TAP-test-for-the-slot-limit-feature.patch text/x-patch 7.1 KB
v11-0005-Documentation-for-slot-limit-feature.patch text/x-patch 4.5 KB
v11-0006-Check-removal-of-in-reading-segment-file.patch text/x-patch 2.3 KB

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(at)paquier(dot)xyz
Cc: sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com, andres(at)anarazel(dot)de
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2019-01-30 01:42:04
Message-ID: 20190130.104204.249058820.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Thu, 20 Dec 2018 16:24:38 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20181220(dot)162438(dot)121484007(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> Thank you for picking this up, and sorry for being late.
>
> At Mon, 19 Nov 2018 13:39:58 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in <20181119043958(dot)GE4400(at)paquier(dot)xyz>
> > ereport should not be called within xlogreader.c as a base rule:
>
> Ouch! I forgot that. Fixed it to use report_invalid_record, slightly
> changing the message. The code is not required (or cannot be
> used) on the frontend, so I #ifndef FRONTENDed the code.
>
> At Tue, 20 Nov 2018 14:07:44 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in <20181120050744(dot)GJ4400(at)paquier(dot)xyz>
> > On Mon, Nov 19, 2018 at 01:39:58PM +0900, Michael Paquier wrote:
> > > I was just coming by to look at bit at the patch series, and bumped
> > > into that:
> >
> > So I have been looking at the last patch series 0001-0004 posted on this
> > thread, and coming from here:
> > https://postgr.es/m/20181025.215518.189844649.horiguchi.kyotaro@lab.ntt.co.jp
> >
> > /* check that the slot is gone */
> > SELECT * FROM pg_replication_slots
> > It could be an idea to switch to the expanded mode here, not that it
> > matters much still..
>
> No problem doing that. Done.
>
> The TAP test complained that it still used recovery.conf. Fixed. While
> doing that I added the parameter primary_slot_name to
> init_from_backup in PostgresNode.pm.
>
> > +IsLsnStillAvaiable(XLogRecPtr targetLSN, uint64 *restBytes)
> > You mean Available here, not Avaiable. This function is only used when
> > scanning for slot information with pg_replication_slots, so wouldn't it
> > be better to just return the status string in this case?
>
> Mmm. Sure. Auto-completion hid it from my eyes. Fixed the name.
> The suggestion sounds reasonable. The function was created as returning
> a boolean and the name doesn't fit the current function. I renamed
> it to GetLsnAvailability(), which returns a string.
>
> > Not sure I see the point of the "remain" field, which can be found with
> > a simple calculation using the current insertion LSN, the segment size
> > and the amount of WAL that the slot is retaining. It may be interesting
> > to document a query to do that though.
>
> It's not that simple. wal_segment_size, max_slot_wal_keep_size,
> wal_keep_segments and the current LSN are
> involved in the calculation, which includes several conditional
> branches, as you may see upthread. We could show "the largest
> current LSN until WAL is lost" but the "current LSN" is not shown
> there. So it is showing the "remain".
>
> > GetOldestXLogFileSegNo() has race conditions if WAL recycling runs in
> > parallel, no? How is it safe to scan pg_wal on a process querying
> > pg_replication_slots while another process may manipulate its contents
> > (aka the checkpointer or just the startup process with an
> > end-of-recovery checkpoint.). This routine relies on unsafe
> > assumptions as this is not concurrent-safe. You can avoid problems by
> > making sure instead that lastRemovedSegNo is initialized correctly at
> > startup, which would be normally one segment older than what's in
> > pg_wal, which feels a bit hacky to rely on to track the oldest segment.
>
> Concurrent recycling makes the function's result vary between the
> segment numbers before and after it. It is unstable but doesn't
> matter so much. The reason for the timing is to avoid extra
> startup time by a scan over pg_wal that is unnecessary in most
> cases.
>
> Anyway the attached patch initializes lastRemovedSegNo in
> StartupXLOG().
>
> > It seems to me that GetOldestXLogFileSegNo() should also check for
> > segments matching the current timeline, no?
>
> RemoveOldXlogFiles() ignores the timeline and the function is made to
> behave the same way (in a different manner). I added a comment about
> the behavior in the function.
>
> > + if (prev_lost_segs != lost_segs)
> > + ereport(WARNING,
> > + (errmsg("some replication slots have lost required WAL segments"),
> > + errdetail_plural("The mostly affected slot has lost %ld segment.",
> > + "The mostly affected slot has lost %ld segments.",
> > + lost_segs, lost_segs)));
> > This can become very noisy with the time, and it would be actually
> > useful to know which replication slot is impacted by that.
>
> One message per segment doesn't seem so noisy. The reason for
> not showing slot identifiers individually is just to avoid the
> complexity that comes from involving slot details. DBAs will see the
> details in pg_stat_replication.
>
> Anyway I did that in the attached patch. ReplicationSlotsBehind
> returns the list of the slot names that are behind the specified
> LSN. With this patch the messages look like the following:
>
> WARNING: some replication slots have lost required WAL segments
> DETAIL: Slot s1 lost 8 segment(s).
> WARNING: some replication slots have lost required WAL segments
> DETAIL: Slots s1, s2, s3 lost at most 9 segment(s).
>
> > + slot doesn't have valid restart_lsn, this field
> > Missing a determinant here, and restart_lsn should have a <literal>
> > markup.
>
> structfield? Reworded as below:
>
> | non-negative. If <structfield>restart_lsn</structfield> is NULL, this
> | field is <literal>unknown</literal>.
>
> I changed "the slot" with "this slot" in the two added fields
> (wal_status, remain).
>
> > + many WAL segments that they fill up the space allotted
> > s/allotted/allocated/.
>
> Fixed.
>
> > + available. The last two states are seen only when
> > + <xref linkend="guc-max-slot-wal-keep-size"/> is non-negative. If the
> > + slot doesn't have valid restart_lsn, this field
> > + is <literal>unknown</literal>.
> > I am a bit confused by this statement. The last two states are "lost"
> > and "keeping", but shouldn't "keeping" be the state showing up by
> > default as it means that all WAL segments are kept around.
>
> It's "streaming". I didn't came up with nice words to
> distinguish the two states. I'm not sure "keep around" exactly
> means but "keeping" here means rather "just not removed yet". The
> states could be reworded as the follows:
>
> streaming: kept/keeping/(secure, in the first version)
> keeping : mortal/about to be removed
> lost/unknown : (lost/unknown)
>
> Do you have any better wording?
>
> > +# Advance WAL by ten segments (= 160MB) on master
> > +advance_wal($node_master, 10);
> > +$node_master->safe_psql('postgres', "CHECKPOINT;");
> > This makes the tests very costly, which is something we should avoid as
> > much as possible. One trick which could be used here, on top of
> > reducing the number of segment switches, is to use initdb
> > --wal-segsize=1.
>
> That sounds nice. Done. In the new version the number of segments
> is reduced, and a new test item for the initial unknown state is added
> as the first item.
>
> Please find the attached new version.

Rebased. No conflict found since the last version.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v12-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 9.8 KB
v12-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch text/x-patch 12.6 KB
v12-0003-Add-primary_slot_name-to-init_from_backup-in-TAP-tes.patch text/x-patch 1.2 KB
v12-0004-TAP-test-for-the-slot-limit-feature.patch text/x-patch 7.1 KB
v12-0005-Documentation-for-slot-limit-feature.patch text/x-patch 4.5 KB
v12-0006-Check-removal-of-in-reading-segment-file.patch text/x-patch 2.3 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2019-02-16 03:13:23
Message-ID: 20190216031323.t7tfrae4l6zqtseo@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2019-01-30 10:42:04 +0900, Kyotaro HORIGUCHI wrote:
> From 270aff9b08ced425b4c4e23b53193285eb2359a6 Mon Sep 17 00:00:00 2001
> From: Kyotaro Horiguchi <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> Date: Thu, 21 Dec 2017 21:20:20 +0900
> Subject: [PATCH 1/6] Add WAL relief vent for replication slots
>
> Adds a capability to limit the number of segments kept by replication
> slots by a GUC variable.

Maybe I'm missing something, but how does this prevent issues with
active slots that are currently accessing the WAL this patch now
suddenly allows to be removed? Especially for logical slots that seems
not unproblematic?

Besides that, this patch needs substantial spelling / language / comment
polishing. Horiguchi-san, it'd probably be good if you could make a
careful pass, and then maybe a native speaker could go over it?

Greetings,

Andres Freund


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: andres(at)anarazel(dot)de
Cc: michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2019-02-22 01:12:51
Message-ID: 20190222.101251.03333542.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Fri, 15 Feb 2019 19:13:23 -0800, Andres Freund <andres(at)anarazel(dot)de> wrote in <20190216031323(dot)t7tfrae4l6zqtseo(at)alap3(dot)anarazel(dot)de>
> Hi,
>
> On 2019-01-30 10:42:04 +0900, Kyotaro HORIGUCHI wrote:
> > From 270aff9b08ced425b4c4e23b53193285eb2359a6 Mon Sep 17 00:00:00 2001
> > From: Kyotaro Horiguchi <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > Date: Thu, 21 Dec 2017 21:20:20 +0900
> > Subject: [PATCH 1/6] Add WAL relief vent for replication slots
> >
> > Adds a capability to limit the number of segments kept by replication
> > slots by a GUC variable.
>
> Maybe I'm missing something, but how does this prevent issues with
> active slots that are currently accessing the WAL this patch now
> suddenly allows to be removed? Especially for logical slots that seems
> not unproblematic?

No matter whether logical or physical, when reading an
overwritten page of a recycled/renamed segment file, page
validation at read time finds that it is from a different segment
than expected. 0006 in [1] introduces more active checking on
that.

[1] /message-id/20181220.162438.121484007.horiguchi.kyotaro%40lab.ntt.co.jp

> Besides that, this patch needs substantial spelling / language / comment
> polishing. Horiguchi-san, it'd probably be good if you could make a
> careful pass, and then maybe a native speaker could go over it?

Thank you for your kind suggestion. As I did for other patches,
I'll review it by myself and come up with a new version soon.

# I often don't understand what I wrote:(

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: andres(at)anarazel(dot)de
Cc: michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2019-02-22 05:12:28
Message-ID: 20190222.141228.34263256.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Fri, 22 Feb 2019 10:12:51 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20190222(dot)101251(dot)03333542(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
horiguchi.kyotaro> At Fri, 15 Feb 2019 19:13:23 -0800, Andres Freund <andres(at)anarazel(dot)de> wrote in <20190216031323(dot)t7tfrae4l6zqtseo(at)alap3(dot)anarazel(dot)de>
> > Maybe I'm missing something, but how does this prevent issues with
> > active slots that are currently accessing the WAL this patch now
> > suddenly allows to be removed? Especially for logical slots that seems
> > not unproblematic?
>
> No matter whether logical or physical, when reading an
> overwritten page of a recycled/renamed segment file, page
> validation at read time finds that it is from a different segment
> than expected. 0006 in [1] introduces more active checking on
> that.
>
> [1] /message-id/20181220.162438.121484007.horiguchi.kyotaro%40lab.ntt.co.jp
>
> > Besides that, this patch needs substantial spelling / language / comment
> > polishing. Horiguchi-san, it'd probably be good if you could make a
> > careful pass, and then maybe a native speaker could go over it?
>
> Thank you for your kind suggestion. As I did for other patches,
> I'll review it by myself and come up with a new version soon.

I checked spelling in comments and commit messages, then
corrected and hopefully improved them. I hope they look nice..

0006 is kept separate from 0001, since I still doubt its necessity.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v13-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 10.1 KB
v13-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch text/x-patch 12.6 KB
v13-0003-Add-primary_slot_name-to-init_from_backup-in-TAP-tes.patch text/x-patch 1.2 KB
v13-0004-TAP-test-for-the-slot-limit-feature.patch text/x-patch 7.1 KB
v13-0005-Documentation-for-slot-limit-feature.patch text/x-patch 4.5 KB
v13-0006-Check-removal-of-in-reading-segment-file.patch text/x-patch 2.4 KB

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2019-06-27 14:22:56
Message-ID: 20190627162256.4f4872b8@firost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all,

Being interested by this feature, I did a patch review.

This feature adds the GUC "max_slot_wal_keep_size". This is the maximum amount
of WAL that can be kept in "pg_wal" by active slots.

If the amount of WAL exceeds this limit, the slot is deactivated and
its status (a new field in pg_replication_slots) is set to "lost".

Patching
========

The patch v13-0003 does not apply to HEAD anymore.

The patch v13-0005 applies using "git am --ignore-space-change"

The other patches apply correctly.

Please, find attached the v14 set of patches rebased on master.

Documentation
=============

The documentation explains the GUC and related columns in "pg_replication_slots".

It correctly reflects the current behavior of the patch.

Usability
=========

The patch implements what is described. It is easy to enable and disable. The
GUC name correctly describes its purpose.

This feature is useful in some HA scenarios where slots are required (eg. no
archiving possible), but where primary availability is more important than
the standbys.

In "pg_replication_slots" view, the new "wal_status" field is misleading.
Consider this sentence and the related behavior from documentation
(catalogs.sgml):

<literal>keeping</literal> means that some of them are to be removed by the
next checkpoint.

"keeping" appears when the current checkpoint will delete some WAL further than
"current_lsn - max_slot_wal_keep_size", but still required by at least one slot.
As some WAL required by some slots will be deleted quite soon, probably before
anyone can react, "keeping" status is misleading here. We are already in the
red zone.

I would expect this "wal_status" to be (see the sketch after this list):

- streaming: slot lag between 0 and "max_wal_size"
- keeping: slot lag between "max_wal_size" and "max_slot_wal_keep_size". The
slot actually protects some WALs from being deleted
- lost: slot lag greater than "max_slot_wal_keep_size". The slot couldn't
protect some WAL from deletion

Documentation follows with:

The last two states are seen only when max_slot_wal_keep_size is
non-negative

This is true with the current behavior. However, if "keeping" is set as soon as
the slot lag is greater than "max_wal_size", this status could be useful even
with "max_slot_wal_keep_size = -1". As soon as a slot is stacking WALs that
should have been removed by a previous checkpoint, it "keeps" them.

Feature tests
=============

I have played with various traffic-shaping setups between nodes, to observe how
the columns "active", "wal_status" and "remain" behave in regard to each other,
using:

while true; do
sleep 0.3;
psql -p 5432 -AXtc "
select now(), active, restart_lsn, wal_status, pg_size_pretty(remain)
from pg_replication_slots
where slot_name='slot_limit_st'"
done

The primary is created using:

initdb -U postgres -D slot_limit_pr --wal-segsize=1

cat<<EOF >>slot_limit_pr/postgresql.conf
port=5432
max_wal_size = 3MB
min_wal_size = 2MB
max_slot_wal_keep_size = 4MB
logging_collector = on
synchronous_commit = off
EOF

WAL activity is generated using a simple pgbench workload. Then, during
this activity, packets on loopback are delayed using:

tc qdisc add dev lo root handle 1:0 netem delay 140msec

Here is how the wal_status behaves. I removed the timestamps, but the
record order is the original one:

t|1/7B116898|streaming|1872 kB
t|1/7B1A0000|lost|0 bytes
t|1/7B320000|keeping|0 bytes
t|1/7B780000|lost|0 bytes
t|1/7BB00000|keeping|0 bytes
t|1/7BE00000|keeping|0 bytes
t|1/7C100000|lost|0 bytes
t|1/7C400000|keeping|0 bytes
t|1/7C700000|lost|0 bytes
t|1/7CA40000|keeping|0 bytes
t|1/7CDE0000|lost|0 bytes
t|1/7D100000|keeping|0 bytes
t|1/7D400000|keeping|0 bytes
t|1/7D7C0000|keeping|0 bytes
t|1/7DB40000|keeping|0 bytes
t|1/7DE60000|lost|0 bytes
t|1/7E180000|keeping|0 bytes
t|1/7E500000|keeping|0 bytes
t|1/7E860000|lost|0 bytes
t|1/7EB80000|keeping|0 bytes
[...x15]
t|1/80800000|keeping|0 bytes
t|1/80900000|streaming|940 kB
t|1/80A00000|streaming|1964 kB

When increasing the network delay to 145ms, the slot was lost for real.
Note that it was shown as lost but active twice (for approx 0.6s) before
being deactivated.

t|1/85700000|streaming|2048 kB
t|1/85800000|keeping|0 bytes
t|1/85940000|lost|0 bytes
t|1/85AC0000|lost|0 bytes
f|1/85C40000|lost|0 bytes

Finally, at least once, the following message appeared in the primary logs
**before** the "wal_status" changed from "keeping" to "streaming":

WARNING: some replication slots have lost required WAL segments

So the slot lost one WAL, but the standby has been able to catch up anyway.

My humble opinion about these results:

* after many different tests, the status "keeping" appears only when "remain"
equals 0. In the current implementation, "keeping" really adds no value...
* "remain" should be NULL if "max_slot_wal_keep_size = -1" or if the slot isn't
active
* the "lost" status should be a definitive status
* it seems related, but maybe the "wal_status" should be set to "lost"
only when the slot has been deactivated?
* logs should warn about a failing slot as soon as it is effectively
deactivated, not before.

Attachment Content-Type Size
v14-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 10.1 KB
v14-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch text/x-patch 12.6 KB
v14-0003-Add-primary_slot_name-to-init_from_backup-in-TAP-tes.patch text/x-patch 1.4 KB
v14-0004-TAP-test-for-the-slot-limit-feature.patch text/x-patch 7.1 KB
v14-0005-Documentation-for-slot-limit-feature.patch text/x-patch 4.5 KB
v14-0006-Check-removal-of-in-reading-segment-file.patch text/x-patch 2.4 KB

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: jgdr(at)dalibo(dot)com
Cc: andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2019-07-30 12:30:45
Message-ID: 20190730.213045.221405075.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks for reviewing!

At Thu, 27 Jun 2019 16:22:56 +0200, Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote in <20190627162256(dot)4f4872b8(at)firost>
> Hi all,
>
> Being interested by this feature, I did a patch review.
>
> This feature adds the GUC "max_slot_wal_keep_size". This is the maximum amount
> of WAL that can be kept in "pg_wal" by active slots.
>
> If the amount of WAL exceeds this limit, the slot is deactivated and
> its status (a new field in pg_replication_slots) is set to "lost".

This patch doesn't deactivate the walsender. A walsender stops by
itself when it finds that it cannot continue ongoing replication.

> Patching
> ========
>
> The patch v13-0003 does not apply on HEAD anymore.
>
> The patch v13-0005 applies using "git am --ignore-space-change".
>
> The other patches apply correctly.
>
> Please, find attached the v14 set of patches rebased on master.

Sorry for missing this for a long time. It was hit by 67b9b3ca32
again, so I repost a rebased version.

> Documentation
> =============
>
> The documentation explains the GUC and related columns in "pg_replication_slot".
>
> It reflects correctly the current behavior of the patch.
>
>
> Usability
> =========
>
> The patch implements what is described. It is easy to enable and disable. The
> GUC name correctly describes its purpose.
>
> This feature is useful in some HA scenarios where slots are required (e.g. no
> archiving possible), but where primary availability is more important than
> standbys.

Yes. Thanks for the clear explanation on the purpose.

> In "pg_replication_slots" view, the new "wal_status" field is misleading.
> Consider this sentence and the related behavior from documentation
> (catalogs.sgml):
>
> <literal>keeping</literal> means that some of them are to be removed by the
> next checkpoint.
>
> "keeping" appears when the current checkpoint will delete some WAL further than
> "current_lsn - max_slot_wal_keep_size", but still required by at least one slot.
> As some WAL required by some slots will be deleted quite soon, probably before
> anyone can react, "keeping" status is misleading here. We are already in the
> red zone.

It may be "losing", which would be less misleading.

> I would expect this "wal_status" to be:
>
> - streaming: slot lag between 0 and "max_wal_size"
> - keeping: slot lag between "max_wal_size" and "max_slot_wal_keep_size". The
> slot actually protects some WALs from being deleted
> - lost: slot lag greater than "max_slot_wal_keep_size". The slot couldn't
> protect some WAL from deletion

I agree that comparing to max_wal_size is meaningful. The revised
version behaves that way.

> Documentation follows with:
>
> The last two states are seen only when max_slot_wal_keep_size is
> non-negative
>
> This is true with the current behavior. However, if "keeping" is set as soon as
> the slot lag is greater than "max_wal_size", this status could be useful even
> with "max_slot_wal_keep_size = -1". As soon as a slot is stacking WALs that
> should have been removed by a previous checkpoint, it "keeps" them.

I revised the documentation that way. Both
view-pg-replication-slots.html and
runtime-config-replication.html are reworded.

> Feature tests
> =============
>
> I have played with various traffic-shaping setups between nodes, to observe how
> the columns "active", "wal_status" and "remain" behave in regard to each other,
> using:
>
> while true; do
>
<removed testing details>
>
> Finally, at least once, the following message appeared in the primary logs
> **before** the "wal_status" changed from "keeping" to "streaming":
>
> WARNING: some replication slots have lost required WAL segments
>
> So the slot lost one WAL, but the standby has been able to catch up anyway.

Thanks for the intensive test run. It is quite helpful.

> My humble opinion about these results:
>
> * after many different tests, the status "keeping" appears only when "remain"
> equals 0. In the current implementation, "keeping" really adds no value...

Hmm. I agree, given that the "lost" (or "losing" in the
patch) state is not a definitive state. That is, the slot may
recover from that state.

> * "remain" should be NULL if "max_slot_wal_keep_size=-1 or if the slot isn't
> active

The revised version shows the following statuses.

streaming / NULL max_slot_wal_keep_size is -1
unknown / NULL mswks >= 0 and restart_lsn is invalid
<status> / <bytes> otherwise

> * the "lost" status should be a definitive status
> * it seems related, but maybe the "wal_status" should be set as "lost"
> only when the slot has been deactivate ?

Agreed. While replication is active, if required segments seem
to be lost once, a delayed walreceiver ack can advance restart_lsn
into the "safe" zone later. So, in the revised version, if the segment
for restart_lsn has been removed, GetLsnAvailability() returns
"losing" if the walsender is active and "lost" if not.

> * logs should warn about a failing slot as soon as it is effectively
> deactivated, not before.

Agreed. Slots on which a walsender is running are excluded from the
output of ReplicationSlotsEnumerateBehinds. As a result, the "some
replication slots lost..." message is emitted after the related
walsender stops.

I attach the revised patch. I'll repost the polished version
soon.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v15-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 10.2 KB
v15-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch text/x-patch 12.4 KB
v15-0003-Add-primary_slot_name-to-init_from_backup-in-TAP-tes.patch text/x-patch 1.0 KB
v15-0004-TAP-test-for-the-slot-limit-feature.patch text/x-patch 7.9 KB
v15-0005-Documentation-for-slot-limit-feature.patch text/x-patch 4.6 KB
v15-0006-Check-removal-of-in-reading-segment-file.patch text/x-patch 2.4 KB

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: jgdr(at)dalibo(dot)com
Cc: andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2019-07-30 12:33:00
Message-ID: 20190730.213300.125800258.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Tue, 30 Jul 2019 21:30:45 +0900 (Tokyo Standard Time), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in <20190730(dot)213045(dot)221405075(dot)horikyota(dot)ntt(at)gmail(dot)com>
> I attach the revised patch. I'll repost the polished version
> soon.

(Mainly the TAP test and documentation; code comments will be revised.)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: jgdr(at)dalibo(dot)com
Cc: andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2019-07-31 07:56:16
Message-ID: 20190731.165616.261402513.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Tue, 30 Jul 2019 21:30:45 +0900 (Tokyo Standard Time), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in <20190730(dot)213045(dot)221405075(dot)horikyota(dot)ntt(at)gmail(dot)com>
> I attach the revised patch. I'll repost the polished version
> soon.

This is the revised patch.

- The status criteria have been changed.

"streaming" : restart_lsn is within max_wal_size. (and kept)

"keeping" : restart_lsn is behind max_wal_size but still kept
by max_slot_wal_keep_size or wal_keep_segments.

"losing" : The segment for restart_lsn is being lost or has
been lost, but active walsender (or session) using the
slot is still running. If the walsender caught up before
stopped, the state will transfer to "keeping" or
"streaming" again.

"lost" : The segment for restart_lsn has been lost and the
active session on the slot is gone. The standby cannot
continue replication using this slot.

null : restart_lsn is null (never activated).

- remain is null if restart_lsn is null (never activated) or
wal_status is "losing" or "lost".

- catalogs.sgml is updated.

- Refactored GetLsnAvailability, GetOldestKeepSegment and
pg_get_replication_slots.

- The TAP test is fixed. But the test for the "losing" state cannot be
done since it needs an interactive session. (I think using the
isolation tester is too much.)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v16-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 10.4 KB
v16-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch text/x-patch 13.0 KB
v16-0003-Add-primary_slot_name-to-init_from_backup-in-TAP-tes.patch text/x-patch 1.0 KB
v16-0004-TAP-test-for-the-slot-limit-feature.patch text/x-patch 8.1 KB
v16-0005-Documentation-for-slot-limit-feature.patch text/x-patch 5.0 KB
v16-0006-Check-removal-of-in-reading-segment-file.patch text/x-patch 2.4 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2019-09-17 19:58:00
Message-ID: 20190917195800.GA16694@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

I have a couple of API-level reservations about this patch series.

Firstly, "behind" when used as a noun refers to buttocks. Therefore,
the ReplicationSlotsEnumerateBehinds function name seems funny (I think
when used as a preposition you wouldn't put it in plural). I don't
suggest a substitute name, because the API itself doesn't convince me; I
think it would be sufficient to have it return a single slot name,
perhaps the one that is behind the most ... or maybe the one that is
behind the least? This simplifies a lot of code (in particular you do
away with the bunch of statics, right?), and I don't think the warning
messages loses anything, because for details the user should really look
into the monitoring view anyway.

I didn't like GetLsnAvailability() returning a string either. It seems
more reasonable to me to define an enum with possible return states, and
have the enum value be expanded to some string in
pg_get_replication_slots().
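
A minimal sketch of what that could look like, with invented enum names
(the actual patch may choose different identifiers):

/* Hypothetical enum return type for the availability check; the string
 * expansion happens only at the SQL-visible boundary. */
typedef enum WalAvailabilitySketch
{
    WALAVAIL_SKETCH_NORMAL,     /* within max_wal_size */
    WALAVAIL_SKETCH_KEEPING,    /* held by slots or wal_keep_segments */
    WALAVAIL_SKETCH_LOSING,     /* being removed, walsender still alive */
    WALAVAIL_SKETCH_LOST        /* gone for good */
} WalAvailabilitySketch;

/* Produces the display text, e.g. in pg_get_replication_slots(). */
static const char *
wal_availability_name(WalAvailabilitySketch a)
{
    switch (a)
    {
        case WALAVAIL_SKETCH_NORMAL:  return "normal";
        case WALAVAIL_SKETCH_KEEPING: return "keeping";
        case WALAVAIL_SKETCH_LOSING:  return "losing";
        case WALAVAIL_SKETCH_LOST:    return "lost";
    }
    return "unknown";           /* unreachable; keeps compilers quiet */
}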

In the same function, I think that setting restBytes to -1 when
"useless" is bad style. I would just leave that variable alone when the
returned status is not one that receives the number of bytes. So the
caller is only entitled to read the value if the returned enum value is
such-and-such ("keeping" and "streaming" I think).

I'm somewhat uncomfortable with the API change to GetOldestKeepSegment
in 0002. Can't its caller do the math itself instead?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2019-10-02 15:08:07
Message-ID: 20191002170807.5fdc8206@firost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 30 Jul 2019 21:30:45 +0900 (Tokyo Standard Time)
Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:

> Thanks for reviewing!
>
> At Thu, 27 Jun 2019 16:22:56 +0200, Jehan-Guillaume de Rorthais
> <jgdr(at)dalibo(dot)com> wrote in <20190627162256(dot)4f4872b8(at)firost>
> > Hi all,
> >
> > Being interested by this feature, I did a patch review.
> >
> > This feature adds the GUC "max_slot_wal_keep_size". This is the maximum
> > amount of WAL that can be kept in "pg_wal" by active slots.
> >
> > If the amount of WAL exceeds this limit, the slot is deactivated and
> > its status (a new field in pg_replication_slots) is set to "lost".
>
> This patch doesn't deactivate walsender. A walsender stops by
> itself when it finds that it cannot continue ongoing replication.

Sure, sorry for the confusion, I realize my sentence is ambiguous. Thanks for
the clarification.

[...]

> > In "pg_replication_slots" view, the new "wal_status" field is misleading.
> > Consider this sentence and the related behavior from documentation
> > (catalogs.sgml):
> >
> > <literal>keeping</literal> means that some of them are to be removed by
> > the next checkpoint.
> >
> > "keeping" appears when the current checkpoint will delete some WAL further
> > than "current_lsn - max_slot_wal_keep_size", but still required by at least
> > one slot. As some WAL required by some slots will be deleted quite soon,
> > probably before anyone can react, "keeping" status is misleading here. We
> > are already in the red zone.
>
> It may be "losing", which would be less misleading.

Indeed, "loosing" is a better match for this state.

However, what's the point of this state from the admin's point of view? In
various situations, the admin will have no time to react immediately and fix
whatever could help.

How useful is this specific state?

> > I would expect this "wal_status" to be:
> >
> > - streaming: slot lag between 0 and "max_wal_size"
> > - keeping: slot lag between "max_wal_size" and "max_slot_wal_keep_size". The
> > slot actually protects some WALs from being deleted
> > - lost: slot lag greater than "max_slot_wal_keep_size". The slot couldn't
> > protect some WAL from deletion
>
> I agree that comparing to max_wal_size is meaningful. The revised
> version behaves that way.

The v16-0006 patch doesn't apply anymore because of commit 709d003fbd. Here is
the fix:

--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -304,7 +304,7 @@
- XLByteToSeg(targetPagePtr, targetSegNo, state->wal_segment_size);
+ XLByteToSeg(targetPagePtr, targetSegNo, state->segcxt.ws_segsize);

I suppose you might have more refactoring to do with regard to Alvaro's
review.

I confirm the new patch behaves correctly in my tests with regard to the
"wal_status" field.

> > Documentation follows with:
> >
> > The last two states are seen only when max_slot_wal_keep_size is
> > non-negative
> >
> > This is true with the current behavior. However, if "keeping" is set as
> > soon as the slot lag is greater than "max_wal_size", this status could be
> > useful even with "max_slot_wal_keep_size = -1". As soon as a slot is
> > stacking WALs that should have been removed by a previous checkpoint, it
> > "keeps" them.
>
> I revised the documentation that way. Both
> view-pg-replication-slots.html and
> runtime-config-replication.html are reworded.

+ <entry>Availability of WAL records claimed by this
+ slot. <literal>streaming</literal>, <literal>keeping</literal>,

Slots are keeping WALs, not WAL records. Shouldn't it be "Availability of WAL
files claimed by this slot"?

+ <literal>streaming</literal> means that the claimed records are
+ available within max_wal_size. <literal>keeping</literal> means

I wonder if streaming is the appropriate name here. The WALs required might be
available for streaming, but the slot not active, thus not "streaming". What
about merging with the "active" field, in the same fashion as
pg_stat_activity.state? We would have an enum "pg_replication_slots.state" with
the following states:

* inactive: non active slot
* active: activated, required WAL within max_wal_size
* keeping: activated, max_wal_size is exceeded but still held by replication
slots or wal_keep_segments.
* lost: some WAL are definitely lost

Thoughts?

[...]
> > * "remain" should be NULL if "max_slot_wal_keep_size=-1 or if the slot isn't
> > active
>
> The revised version shows the following statuses.
>
> streaming / NULL max_slot_wal_keep_size is -1
> unknown / NULL mswks >= 0 and restart_lsn is invalid
> <status> / <bytes> otherwise

Works for me.

> > * the "lost" status should be a definitive status
> > * it seems related, but maybe the "wal_status" should be set as "lost"
> > only when the slot has been deactivate ?
>
> Agreed. While replication is active, if required segments seem
> to be lost once, a delayed walreceiver ack can advance restart_lsn
> into the "safe" zone later. So, in the revised version, if the segment
> for restart_lsn has been removed, GetLsnAvailability() returns
> "losing" if the walsender is active and "lost" if not.

ok.

> > * logs should warn about a failing slot as soon as it is effectively
> > deactivated, not before.
>
> Agreed. Slots on which a walsender is running are excluded from the
> output of ReplicationSlotsEnumerateBehinds. As a result, the "some
> replication slots lost..." message is emitted after the related
> walsender stops.

Once a slot has lost WALs and been deactivated, the following message appears
at every checkpoint:

WARNING: some replication slots have lost required WAL segments
DETAIL: Slot slot_limit_st lost 177 segment(s)

I wonder if it is useful to show these messages for slots that were already
dead before this checkpoint?

Regards,


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, andres(at)anarazel(dot)de, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2019-11-28 03:09:03
Message-ID: 20191128030903.GN237562@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 02, 2019 at 05:08:07PM +0200, Jehan-Guillaume de Rorthais wrote:
> I wonder if it is useful to show these messages for slots that were already
> dead before this checkpoint?

This thread has been waiting for input from the patch author,
Horiguchi-san, for a couple of months now, so I am switching it to
returned with feedback in the CF app.
--
Michael


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: jgdr(at)dalibo(dot)com
Cc: andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2019-12-24 12:26:14
Message-ID: 20191224.212614.633369820509385571.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'm very sorry for being late to reply.

At Wed, 2 Oct 2019 17:08:07 +0200, Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote in
> On Tue, 30 Jul 2019 21:30:45 +0900 (Tokyo Standard Time)
> Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > > In "pg_replication_slots" view, the new "wal_status" field is misleading.
> > > Consider this sentence and the related behavior from documentation
> > > (catalogs.sgml):
> > >
> > > <literal>keeping</literal> means that some of them are to be removed by
> > > the next checkpoint.
> > >
> > > "keeping" appears when the current checkpoint will delete some WAL further
> > > than "current_lsn - max_slot_wal_keep_size", but still required by at least
> > > one slot. As some WAL required by some slots will be deleted quite soon,
> > > probably before anyone can react, "keeping" status is misleading here. We
> > > are already in the red zone.
> >
> > It may be "losing", which would be less misleading.
>
> Indeed, "loosing" is a better match for this state.
>
> However, what's the point of this state from the admin's point of view? In
> various situations, the admin will have no time to react immediately and fix
> whatever could help.
>
> How useful is this specific state?

If we assume "losing" segments to be "lost", a segment once "lost" can
return to the "keeping" or "streaming" state. That is intuitively
impossible. On the other hand, if we assume it to be "keeping", it should
not be removed by the next checkpoint, but actually it can be
removed. The state "losing" means such an unstable state, different from
both "lost" and "keeping".

> > > I would expect this "wal_status" to be:
> > >
> > > - streaming: slot lag between 0 and "max_wal_size"
> > > - keeping: slot lag between "max_wal_size" and "max_slot_wal_keep_size". The
> > > slot actually protects some WALs from being deleted
> > > - lost: slot lag greater than "max_slot_wal_keep_size". The slot couldn't
> > > protect some WAL from deletion
> >
> > I agree that comparing to max_wal_size is meaningful. The revised
> > version behaves that way.
>
> The v16-0006 patch doesn't apply anymore because of commit 709d003fbd. Here is
> the fix:
>
> --- a/src/backend/access/transam/xlogreader.c
> +++ b/src/backend/access/transam/xlogreader.c
> @@ -304,7 +304,7 @@
> - XLByteToSeg(targetPagePtr, targetSegNo, state->wal_segment_size);
> + XLByteToSeg(targetPagePtr, targetSegNo, state->segcxt.ws_segsize);
>
> I suppose you might have more refactoring to do with regard to Alvaro's
> review.
>
> I confirm the new patch behaves correctly in my tests with regard to the
> "wal_status" field.

Thanks for testing. I fixed it in the attached patch.

> + <entry>Availability of WAL records claimed by this
> + slot. <literal>streaming</literal>, <literal>keeping</literal>,
>
> Slots are keeping WALs, not WAL records. Shouldn't it be "Availability of WAL
> files claimed by this slot"?

I chose "record" since a slot points to a record. I'm not sure, but I'm
fine with "file". Fixed catalogs.sgml and config.sgml that way.

> + <literal>streaming</literal> means that the claimed records are
> + available within max_wal_size. <literal>keeping</literal> means
>
> I wonder if streaming is the appropriate name here. The WALs required might be
> available for streaming, but the slot not active, thus not "streaming". What
> about merging with the "active" field, in the same fashion as
> pg_stat_activity.state? We would have an enum "pg_replication_slots.state" with
> the following states:
> * inactive: non active slot
> * active: activated, required WAL within max_wal_size
> * keeping: activated, max_wal_size is exceeded but still held by replication
> slots or wal_keep_segments.
> * lost: some WAL are definitely lost
>
> Thoughts?

In the first place, I realized that I had missed a point about the
relationship between max_wal_size and max_slot_wal_keep_size
here. Since v15 of this patch, GetLsnAvailability returns
"streaming" when the restart_lsn is within max_wal_size. That behavior
makes sense when max_slot_wal_keep_size > max_wal_size. However, in
the contrary case, restart_lsn could be lost even if it is
within max_wal_size. So we would see "streaming" (or "normal") even
though restart_lsn is already lost. That is broken.

In short, the "streaming/normal" state is useless if
max_slot_wal_keep_size < max_wal_size.

Finally I used the following wordings.

(there's no "inactive" wal_state)

* normal: required WAL within max_wal_size when max_slot_wal_keep_size
is larger than max_wal_size.
* keeping: required segments are held by replication slots or
wal_keep_segments.

* losing: required segments are about to be removed or may be already
removed but streaming is not dead yet.

* lost: cannot continue streaming using this slot.

> [...]
> > > * "remain" should be NULL if "max_slot_wal_keep_size=-1 or if the slot isn't
> > > active
> >
> > The revised version shows the following statuses.
> >
> > streaming / NULL max_slot_wal_keep_size is -1
> > unknown / NULL mswks >= 0 and restart_lsn is invalid
> > <status> / <bytes> otherwise
>
> Works for me.

Thanks.

> > > * the "lost" status should be a definitive status
> > > * it seems related, but maybe the "wal_status" should be set as "lost"
> > > only when the slot has been deactivate ?
> >
> > Agreed. While replication is active, if required segments seem
> > to be lost once, a delayed walreceiver ack can advance restart_lsn
> > into the "safe" zone later. So, in the revised version, if the segment
> > for restart_lsn has been removed, GetLsnAvailability() returns
> > "losing" if the walsender is active and "lost" if not.
>
> ok.
>
> > > * logs should warn about a failing slot as soon as it is effectively
> > > deactivated, not before.
> >
> > Agreed. Slots on which a walsender is running are excluded from the
> > output of ReplicationSlotsEnumerateBehinds. As a result, the "some
> > replication slots lost..." message is emitted after the related
> > walsender stops.
>
> Once a slot has lost WALs and been deactivated, the following message appears
> at every checkpoint:
>
> WARNING: some replication slots have lost required WAL segments
> DETAIL: Slot slot_limit_st lost 177 segment(s)
>
> I wonder if it is useful to show these messages for slots that were already
> dead before this checkpoint?

Makes sense. I changed KeepLogSeg so that it emits the message only when
the list of slot names changes.
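
One way to express "only when the list changes" is to remember the last
reported slot list. A sketch under that assumption; the static buffer,
the helper name and the use of fprintf instead of ereport are all
illustrative, not the patch's code:

#include <stdio.h>
#include <string.h>

#define SLOTLIST_MAX 1024

static char last_reported[SLOTLIST_MAX];

static void
maybe_warn_lost_slots(const char *slot_names, long lost_segs)
{
    if (strcmp(slot_names, last_reported) == 0)
        return;                 /* same culprits as the last checkpoint */

    snprintf(last_reported, sizeof(last_reported), "%s", slot_names);
    fprintf(stderr,
            "WARNING: some replication slots have lost required WAL segments\n"
            "DETAIL: Slot %s lost %ld segment(s)\n",
            slot_names, lost_segs);
}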

The attached v17 patch changes the following points.

- Rebased to the current master.

- Change KeepLogSeg not to emit the message "Slot %s lost %ld
segment(s)" if the slot list has not changed.

- Documentation is fixed following the change of state names.

- Change GetLsnAvailability to return a more correct state in more
situations. It returned a wrong status when max_slot_wal_keep_size
is smaller than max_wal_size, or when max_slot_wal_keep_size is
increased so that the new value covers the restart_lsn of a slot
that had lost required segments under the old setting.

Since it is needed by the above change, I revived
GetOldestXLogFileSegNo(), which was removed in v15, as
FindOldestXLogFileSegNo() in a slightly different shape.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v17-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 11.1 KB
v17-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch text/x-patch 15.7 KB
v17-0003-Add-primary_slot_name-to-init_from_backup-in-TAP.patch text/x-patch 1.0 KB
v17-0004-TAP-test-for-the-slot-limit-feature.patch text/x-patch 8.1 KB
v17-0005-Documentation-for-slot-limit-feature.patch text/x-patch 5.0 KB
v17-0006-Check-removal-of-in-reading-segment-file.patch text/x-patch 2.3 KB

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: jgdr(at)dalibo(dot)com
Cc: andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2019-12-26 09:08:12
Message-ID: 20191226.180812.1478956694079662301.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Tue, 24 Dec 2019 21:26:14 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> The attached v17 patch changes the following points.
>
> - Rebased to the current master.
>
> - Change KeepLogSeg not to emit the message "Slot %s lost %ld
> segment(s)" if the slot list has not changed.
>
> - Documentation is fixed following the change of state names.
>
> - Change GetLsnAvailability to return a more correct state in more
> situations. It returned a wrong status when max_slot_wal_keep_size
> is smaller than max_wal_size, or when max_slot_wal_keep_size is
> increased so that the new value covers the restart_lsn of a slot
> that had lost required segments under the old setting.
>
> Since it is needed by the above change, I revived
> GetOldestXLogFileSegNo(), which was removed in v15, as
> FindOldestXLogFileSegNo() in a slightly different shape.

I'd like to re-enter this patch into the next CF.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-01-22 16:47:23
Message-ID: 20200122174723.4a26ca12@firost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

First, it seems you did not reply to Alvaro's concerns in your new set of
patches. See:

/message-id/20190917195800.GA16694%40alvherre.pgsql

On Tue, 24 Dec 2019 21:26:14 +0900 (JST)
Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
[...]
> > Indeed, "losing" is a better match for this state.
> >
> > However, what's the point of this state from the admin's point of view? In
> > various situations, the admin will have no time to react immediately and fix
> > whatever could help.
> >
> > How useful is this specific state?
>
> If we assume "losing" segments to be "lost", a segment once "lost" can
> return to the "keeping" or "streaming" state. That is intuitively
> impossible. On the other hand, if we assume it to be "keeping", it should
> not be removed by the next checkpoint, but actually it can be
> removed. The state "losing" means such an unstable state, different from
> both "lost" and "keeping".

OK, indeed.

But I'm still uncomfortable with this "unstable" state. It would be better if
we could grab a stable state: either "keeping" or "lost".

> > + <entry>Availability of WAL records claimed by this
> > + slot. <literal>streaming</literal>, <literal>keeping</literal>,
> >
> > Slots are keeping WALs, not WAL records. Shouldn't it be "Availability of
> > WAL files claimed by this slot"?
>
> I chose "record" since a slot points to a record. I'm not sure, but I'm
> fine with "file". Fixed catalogs.sgml and config.sgml that way.

Thanks.

> > + <literal>streaming</literal> means that the claimed records are
> > + available within max_wal_size. <literal>keeping</literal> means
> >
> > I wonder if streaming is the appropriate name here. The WALs required might
> > be available for streaming, but the slot not active, thus not "streaming".
> > What about merging with the "active" field, in the same fashion as
> > pg_stat_activity.state? We would have an enum "pg_replication_slots.state"
> > with the following states:
> > * inactive: non active slot
> > * active: activated, required WAL within max_wal_size
> > * keeping: activated, max_wal_size is exceeded but still held by replication
> > slots or wal_keep_segments.
> > * lost: some WAL are definitely lost
> >
> > Thoughts?
>
> In the first place, I realized that I had missed a point about the
> relationship between max_wal_size and max_slot_wal_keep_size
> here. Since v15 of this patch, GetLsnAvailability returns
> "streaming" when the restart_lsn is within max_wal_size. That behavior
> makes sense when max_slot_wal_keep_size > max_wal_size. However, in
> the contrary case, restart_lsn could be lost even if it is
> within max_wal_size. So we would see "streaming" (or "normal") even
> though restart_lsn is already lost. That is broken.
>
> In short, the "streaming/normal" state is useless if
> max_slot_wal_keep_size < max_wal_size.

Good catch!

> Finally I used the following wordings.
>
> (there's no "inactive" wal_state)
>
> * normal: required WAL within max_wal_size when max_slot_wal_keep_size
> is larger than max_wal_size.
> * keeping: required segments are held by replication slots or
> wal_keep_segments.
>
> * losing: required segments are about to be removed or may be already
> removed but streaming is not dead yet.

As I wrote, I'm still uncomfortable with this state. Maybe we should ask
other reviewers' opinions on this.

[...]
> > WARNING: some replication slots have lost required WAL segments
> > DETAIL: Slot slot_limit_st lost 177 segment(s)
> >
> > I wonder if this is useful to show these messages for slots that were
> > already dead before this checkpoint?
>
> Makes sense. I changed KeepLogSeg so that it emits the message only when
> the list of slot names changes.

Thanks.

Below is some code review.

Regarding FindOldestXLogFileSegNo(...):

> /*
> * Return the oldest WAL segment file.
> *
> * The returned value is XLogGetLastRemovedSegno() + 1 when the function
> * returns a valid value. Otherwise this function scans over WAL files and
> * finds the oldest segment at the first time, which could be very slow.
> */
> XLogSegNo
> FindOldestXLogFileSegNo(void)

The comment is not clear to me. I suppose "at the first time" might better be
expressed as "if none has been removed since last startup"?

Moreover, what about patching XLogGetLastRemovedSegno() itself instead of
adding a function?

Regarding GetLsnAvailability(...):

> /*
> * Detect availability of the record at given targetLSN.
> *
> * targetLSN is restart_lsn of a slot.

Wrong argument name. It is called "restart_lsn" in the function
declaration.

> * restBytes is the pointer to uint64 variable, to store the remaining bytes
> * until the slot goes into "losing" state.

I'm not convinced by this argument name. What about "remainingBytes"? Note
that you use remaining_bytes elsewhere in your patch.

> * -1 is stored to restBytes if the values is useless.

What about returning a true negative value when the slot is really lost?

All in all, I feel like this function is on the fence between being generic
because of its name and being slot-only oriented because of the first parameter
name, use of "max_slot_wal_keep_size_mb", returned status and "slotPtr".

I wonder if it should be more generic and stay here or move to xlogfuncs.c with
a more specific name?

> * slot limitation is not activated, WAL files are kept unlimitedlllly

"unlimitedly"? "infinitely"? "unconditionally"?

> /* it is useless for the states below */
> *restBytes = -1;

This might be set to the real bytes kept, even if status is "losing".

> * The segment is alrady lost or being lost. If the oldest segment is just

"already"

> if (oldestSeg == restartSeg + 1 && walsender_pid != 0)
> return "losing";

I wonder if this should be "oldestSeg > restartSeg"?
Many segments can be removed by the next or running checkpoint. And a running
walsender can send more than one segment in the meantime I suppose?

Regarding GetOldestKeepSegment(...):

> static XLogSegNo
> GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN,
> XLogRecPtr targetLSN, int64 *restBytes)

I wonder if minSlotLSN is really useful as a parameter or if it should be
fetched from GetOldestKeepSegment() itself? Currently,
XLogGetReplicationSlotMinimumLSN() is always called right before
GetOldestKeepSegment() just to fill this argument.
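
For reference, the arithmetic GetOldestKeepSegment encapsulates reduces
to a subtraction in segment units. A sketch with illustrative names, not
the patch's code:

#include <stdint.h>

static uint64_t
oldest_keep_segno(uint64_t curr_lsn, uint64_t keep_size_bytes,
                  uint64_t wal_segment_size)
{
    uint64_t curr_segno = curr_lsn / wal_segment_size;
    uint64_t keep_segs  = keep_size_bytes / wal_segment_size;

    /* segment numbers start at 1; never compute below that */
    return (curr_segno > keep_segs) ? curr_segno - keep_segs : 1;
}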

> walstate =
> GetLsnAvailability(restart_lsn, active_pid, &remaining_bytes);

I agree with Alvaro: we might want to return an enum and define the related
state string here. Or, if we accept negative remaining_bytes, GetLsnAvailability
might even only return remaining_bytes and we deduce the state directly from
here.

Regards,


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: jgdr(at)dalibo(dot)com
Cc: andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-01-23 12:28:54
Message-ID: 20200123.212854.658794168913258596.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello, Jehan.

At Wed, 22 Jan 2020 17:47:23 +0100, Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote in
> Hi,
>
> First, it seems you did not reply to Alvaro's concerns in your new set of
> patches. See:
>
> /message-id/20190917195800.GA16694%40alvherre.pgsql

Mmmm. Thank you very much for noticing that, Jehan, and sorry for
overlooking it, Alvaro.

At Tue, 17 Sep 2019 16:58:00 -0300, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> suggest a substitute name, because the API itself doesn't convince me; I
> think it would be sufficient to have it return a single slot name,
> perhaps the one that is behind the most ... or maybe the one that is
> behind the least? This simplifies a lot of code (in particular you do
> away with the bunch of statics, right?), and I don't think the warning
> messages loses anything, because for details the user should really look
> into the monitoring view anyway.

Ok, I removed the funnily-named function. The message becomes more or
less the following. The DETAIL might not be needed.

| WARNING: 2 replication slots have lost required WAL segments by 5 segments
| DETAIL: Most affected slot is s1.
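
In backend terms that would be roughly the following ereport() call,
where nlost, lostsegs and worst_slot_name are hypothetical variables
used only for illustration:

/* Sketch of the revised warning; the final wording is still open. */
ereport(WARNING,
        (errmsg("%d replication slots have lost required WAL segments by %ld segments",
                nlost, lostsegs),
         errdetail("Most affected slot is %s.", worst_slot_name)));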

> I didn't like GetLsnAvailability() returning a string either. It seems
> more reasonable to me to define an enum with possible return states, and
> have the enum value be expanded to some string in
> pg_get_replication_slots().

Agreed. Done.

> In the same function, I think that setting restBytes to -1 when
> "useless" is bad style. I would just leave that variable alone when the
> returned status is not one that receives the number of bytes. So the
> caller is only entitled to read the value if the returned enum value is
> such-and-such ("keeping" and "streaming" I think).

That is the only condition. If max_slot_wal_keep_size = -1, the value
is useless for the two states. I added that explanation to the
comment of Get(Lsn)WalAvailability().

> I'm somewhat uncomfortable with the API change to GetOldestKeepSegment
> in 0002. Can't its caller do the math itself instead?

Mmm. Finally I found that I had merged two calculations that are scarcely
related. You're right here. Thanks.

The attached v18 addressed all of your (Alvaro's) comments.

> On Tue, 24 Dec 2019 21:26:14 +0900 (JST)
> Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > If we assume "losing" segments to be "lost", a segment once "lost" can
> > return to the "keeping" or "streaming" state. That is intuitively
> > impossible. On the other hand, if we assume it to be "keeping", it should
> > not be removed by the next checkpoint, but actually it can be
> > removed. The state "losing" means such an unstable state, different from
> > both "lost" and "keeping".
>
> OK, indeed.
>
> But I'm still uncomfortable with this "unstable" state. It would be better if
> we could grab a stable state: either "keeping" or "lost".

I feel the same, but the being-removed WAL segments remain until the
checkpoint runs, and even after removal replication can continue if the
walsender is reading the removed-but-already-opened file. I'll put
more thought into that.

> > In short, the "streaming/normal" state is useless if
> > max_slot_wal_keep_size < max_wal_size.
>
> Good catch!

Thanks!:)

> > Finally I used the following wordings.
> >
> > (there's no "inactive" wal_state)
> >
> > * normal: required WAL within max_wal_size when max_slot_wal_keep_size
> > is larger than max_wal_size.
> > * keeping: required segments are held by replication slots or
> > wal_keep_segments.
> >
> > * losing: required segments are about to be removed or may be already
> > removed but streaming is not dead yet.
>
> As I wrote, I'm still uncomfortable with this state. Maybe we should ask
> other reviewers' opinions on this.
>
> [...]
> > > WARNING: some replication slots have lost required WAL segments
> > > DETAIL: Slot slot_limit_st lost 177 segment(s)
> > >
> > > I wonder if this is useful to show these messages for slots that were
> > > already dead before this checkpoint?
> >
> > Makes sense. I changed KeepLogSeg so that it emits the message only when
> > the list of slot names changes.
>
> Thanks.
>
> Below is some code review.

Thank you for the review. I don't have time right now, but I will
address the comments below soon.

> Regarding FindOldestXLogFileSegNo(...):
>
> > /*
> > * Return the oldest WAL segment file.
> > *
> > * The returned value is XLogGetLastRemovedSegno() + 1 when the function
> > * returns a valid value. Otherwise this function scans over WAL files and
> > * finds the oldest segment at the first time, which could be very slow.
> > */
> > XLogSegNo
> > FindOldestXLogFileSegNo(void)
>
> The comment is not clear to me. I suppose "at the first time" might better be
> expressed as "if none has been removed since last startup"?
>
> Moreover, what about patching XLogGetLastRemovedSegno() itself instead of
> adding a function?
>
> Regarding GetLsnAvailability(...):
>
> > /*
> > * Detect availability of the record at given targetLSN.
> > *
> > * targetLSN is restart_lsn of a slot.
>
> Wrong argument name. It is called "restart_lsn" in the function
> declaration.
>
> > * restBytes is the pointer to uint64 variable, to store the remaining bytes
> > * until the slot goes into "losing" state.
>
> I'm not convinced by this argument name. What about "remainingBytes"? Note
> that you use remaining_bytes elsewhere in your patch.
>
> > * -1 is stored to restBytes if the values is useless.
>
> What about returning a true negative value when the slot is really lost?
>
> All in all, I feel like this function is on the fence between being generic
> because of its name and being slot-only oriented because of the first parameter
> name, use of "max_slot_wal_keep_size_mb", returned status and "slotPtr".
>
> I wonder if it should be more generic and stay here or move to xlogfuncs.c with
> a more specific name?
>
> > * slot limitation is not activated, WAL files are kept unlimitedlllly
>
> "unlimitedly"? "infinitely"? "unconditionally"?
>
> > /* it is useless for the states below */
> > *restBytes = -1;
>
> This might be set to the real bytes kept, even if status is "losing".
>
> > * The segment is alrady lost or being lost. If the oldest segment is just
>
> "already"
>
> > if (oldestSeg == restartSeg + 1 && walsender_pid != 0)
> > return "losing";
>
> I wonder if this should be "oldestSeg > restartSeg"?
> Many segments can be removed by the next or running checkpoint. And a running
> walsender can send more than one segment in the meantime I suppose?
>
> Regarding GetOldestKeepSegment(...):
>
> > static XLogSegNo
> > GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN,
> > XLogRecPtr targetLSN, int64 *restBytes)
>
> I wonder if minSlotLSN is really useful as a parameter or if it should be
> fetched from GetOldestKeepSegment() itself? Currently,
> XLogGetReplicationSlotMinimumLSN() is always called right before
> GetOldestKeepSegment() just to fill this argument.
>
> > walstate =
> > GetLsnAvailability(restart_lsn, active_pid, &remaining_bytes);
>
> I agree with Alvaro: we might want to return an enum and define the related
> state string here. Or, if we accept negative remaining_bytes, GetLsnAvailability
> might even only return remaining_bytes and we deduce the state directly from
> here.
>
> Regards,

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v18-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 11.1 KB
v18-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch text/x-patch 21.6 KB
v18-0003-Add-primary_slot_name-to-init_from_backup-in-TAP.patch text/x-patch 1.0 KB
v18-0004-TAP-test-for-the-slot-limit-feature.patch text/x-patch 8.1 KB
v18-0005-Documentation-for-slot-limit-feature.patch text/x-patch 5.0 KB
v18-0006-Check-removal-of-in-reading-segment-file.patch text/x-patch 2.3 KB

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: jgdr(at)dalibo(dot)com
Cc: andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-01-23 12:33:25
Message-ID: 20200123.213325.1644493868573457113.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Thu, 23 Jan 2020 21:28:54 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> > In the same function, I think that setting restBytes to -1 when
> > "useless" is bad style. I would just leave that variable alone when the
> > returned status is not one that receives the number of bytes. So the
> > caller is only entitled to read the value if the returned enum value is
> > such-and-such ("keeping" and "streaming" I think).
>
> That is the only condition. If max_slot_wal_keep_size = -1, The value
> is useless for the two states. I added that explanation to the
> comment of Get(Lsn)Walavailability().

The reply is bogus since restBytes is no longer a parameter of
GetWalAvailability following the next comment.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-03-30 23:03:27
Message-ID: 20200330230327.GA19428@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I rebased this patch; it's failing to apply due to minor concurrent
changes in PostgresNode.pm. I squashed the patches in a series that
made the most sense to me.

I have a question about static variable lastFoundOldestSeg in
FindOldestXLogFileSegNo. It may be set the first time the function
runs; if it is, the function never again does anything, it just returns
that value. In other words, the static value is never reset; it never
advances either. Isn't that strange? I think the coding is to assume
that XLogCtl->lastRemovedSegNo will always be set, so its code will
almost never run ... except when the very first wal file has not been
removed yet. This seems weird and pointless. Maybe we should think
about this differently -- example: if XLogGetLastRemovedSegno returns
zero, then the oldest file is the zeroth one. In what cases this is
wrong? Maybe we should fix those.

Regarding the PostgresNode change in 0001, I think adding a special
parameter for primary_slot_name is limited. I'd like to change the
definition so that anything that you give as a parameter that's not one
of the recognized keywords (has_streaming, etc) is tested to see if it's
a GUC; and if it is, then put it in postgresql.conf. This would have to
apply both to PostgresNode::init() as well as
PostgresNode::init_from_backup(), obviously, since it would make no
sense for the APIs to diverge on this point. So you'd be able to do
$node->init_from_backup(allow_streaming => 1, work_mem => "4MB");
without having to add code to init_from_backup to handle work_mem
specifically. This could be done by having a Perl hash with all the GUC
names, that we could read lazily from "postmaster --describe-config" the
first time we see an unrecognized keyword as an option to init() /
init_from_backup().

I edited the doc changes a bit.

I don't know what to think of 0003 yet. Has this been agreed to be a
good idea?

I also made a few small edits to the code; all cosmetic so far:

* added long_desc to the new GUC; it now reads:

{"max_slot_wal_keep_size", PGC_SIGHUP, REPLICATION_SENDING,
gettext_noop("Sets the maximum size of WAL space reserved by replication slots."),
gettext_noop("Replication slots will be marked as failed, and segments released "
"for deletion or recycling, if this much space is occupied by WAL "
"on disk."),

* updated the comment to ConvertToXSegs() which is now being used for
this purpose

* remove outdated comment to GetWalAvailability; it was talking about
restBytes parameter that no longer exists

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
v19-0001-Add-primary_slot_name-to-init_from_backup-in-TAP.patch text/x-diff 1.1 KB
v19-0002-Add-WAL-relief-vent-for-replication-slots.patch text/x-diff 35.2 KB
v19-0003-Check-removal-of-in-reading-segment-file.patch text/x-diff 2.2 KB

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-03-31 05:10:14
Message-ID: 20200331.141014.2028906248823085365.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you for looking at this and for the trouble of rebasing!

At Mon, 30 Mar 2020 20:03:27 -0300, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> I rebased this patch; it's failing to apply due to minor concurrent
> changes in PostgresNode.pm. I squashed the patches in a series that
> made the most sense to me.
>
> I have a question about static variable lastFoundOldestSeg in
> FindOldestXLogFileSegNo. It may be set the first time the function
> runs; if it is, the function never again does anything, it just returns
> that value. In other words, the static value is never reset; it never
> advances either. Isn't that strange? I think the coding is to assume
> that XLogCtl->lastRemovedSegNo will always be set, so its code will
> almost never run ... except when the very first wal file has not been
> removed yet. This seems weird and pointless. Maybe we should think
> about this differently -- example: if XLogGetLastRemovedSegno returns
> zero, then the oldest file is the zeroth one. In what cases this is
> wrong? Maybe we should fix those.

That's right, but without the static variable, every call to the
pg_replication_slots view before the fist checkpoint causes scanning
pg_xlog. XLogCtl->lastRemovedSegNo advances only at a checkpoint, so
it is actually right that the return value from
FindOldestXLogFileSegNo doesn't change until the first checkpoint.

Also, we could set XLogCtl->lastRemovedSegNo at startup, but scanning
pg_xlog is useless in most cases.

I avoided updating XLogCtl->lastRemovedSegNo directly, but a third way
would be: if XLogGetLastRemovedSegno() returned 0, then set
XLogCtl->lastRemovedSegNo by scanning the WAL directory. The attached
takes this way.
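
For illustration, that third way could look like the following minimal
sketch (ScanWalDirForOldestSegNo and XLogSetLastRemovedSegno are
hypothetical names, not functions from the posted patch):

    XLogSegNo
    FindOldestXLogFileSegNo(void)
    {
        XLogSegNo   last_removed = XLogGetLastRemovedSegno();

        if (last_removed == 0)
        {
            /* prime the shared value once by scanning the WAL directory */
            XLogSegNo   oldest = ScanWalDirForOldestSegNo(); /* hypothetical helper */

            XLogSetLastRemovedSegno(oldest - 1);             /* hypothetical setter */
            return oldest;
        }

        /* otherwise the oldest extant segment follows the last removed one */
        return last_removed + 1;
    }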

> Regarding the PostgresNode change in 0001, I think adding a special
> parameter for primary_slot_name is limited. I'd like to change the
> definition so that anything that you give as a parameter that's not one
> of the recognized keywords (has_streaming, etc) is tested to see if it's
> a GUC; and if it is, then put it in postgresql.conf. This would have to
> apply both to PostgresNode::init() as well as
> PostgresNode::init_from_backup(), obviously, since it would make no
> sense for the APIs to diverge on this point. So you'd be able to do
> $node->init_from_backup(allow_streaming => 1, work_mem => "4MB");
> without having to add code to init_from_backup to handle work_mem
> specifically. This could be done by having a Perl hash with all the GUC
> names, that we could read lazily from "postmaster --describe-config" the
> first time we see an unrecognized keyword as an option to init() /
> init_from_backup().

Done that way. We could exclude "known" parameters by explicitly
deleting each key as it is read, but I chose to enumerate the known
keywords. Although the feature could be used widely, I actually changed
only 018_repslot_limit.pl to use it.

> I edited the doc changes a bit.
>
> I don't know what to think of 0003 yet. Has this been agreed to be a
> good idea?

That's why it was a separate patch. I think it has been neither
approved nor rejected. The main objective of the patch is to prevent
pg_replication_slots.wal_status from strangely coming back from the
"lost" state to other states. However, in the first place I doubt that
it's right for logical replication to send the content of a WAL
segment that has already been recycled.

> I also made a few small edits to the code; all cosmetic so far:
>
> * added long_desc to the new GUC; it now reads:
>
> {"max_slot_wal_keep_size", PGC_SIGHUP, REPLICATION_SENDING,
> gettext_noop("Sets the maximum size of WAL space reserved by replication slots."),
> gettext_noop("Replication slots will be marked as failed, and segments released "
> "for deletion or recycling, if this much space is occupied by WAL "
> "on disk."),
>
> * updated the comment to ConvertToXSegs() which is now being used for
> this purpose
>
> * remove outdated comment to GetWalAvailability; it was talking about
> restBytes parameter that no longer exists

Thank you for the fixes. All of them look fine.

I fixed several typos. (s/requred/required/, s/devinitly/definitely/,
s/errror/error/)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v20-0001-Allow-arbitrary-GUC-parameter-setting-init-and-i.patch text/x-patch 2.7 KB
v20-0002-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 35.5 KB
v20-0003-Allow-init-and-init_from_backup-to-set-arbitrary.patch text/x-patch 2.1 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-03-31 15:07:55
Message-ID: 20200331150755.GA3858@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Mar-31, Kyotaro Horiguchi wrote:

> Thank you for looking at this and for the trouble of rebasing!
>
> At Mon, 30 Mar 2020 20:03:27 -0300, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> > I rebased this patch; it's failing to apply due to minor concurrent
> > changes in PostgresNode.pm. I squashed the patches in a series that
> > made the most sense to me.
> >
> > I have a question about static variable lastFoundOldestSeg in
> > FindOldestXLogFileSegNo. It may be set the first time the function
> > runs; if it is, the function never again does anything, it just returns
> > that value. In other words, the static value is never reset; it never
> > advances either. Isn't that strange? I think the coding is to assume
> > that XLogCtl->lastRemovedSegNo will always be set, so its code will
> > almost never run ... except when the very first wal file has not been
> > removed yet. This seems weird and pointless. Maybe we should think
> > about this differently -- example: if XLogGetLastRemovedSegno returns
> > zero, then the oldest file is the zeroth one. In what cases this is
> > wrong? Maybe we should fix those.
>
> That's right, but without the static variable, every call to the
> pg_replication_slots view before the first checkpoint causes a scan of
> pg_xlog. XLogCtl->lastRemovedSegNo advances only at a checkpoint, so
> it is actually right that the return value from
> FindOldestXLogFileSegNo doesn't change until the first checkpoint.
>
> Also, we could set XLogCtl->lastRemovedSegNo at startup, but scanning
> pg_xlog is useless in most cases.
>
> I avoided updating XLogCtl->lastRemovedSegNo directly, but a third way
> would be: if XLogGetLastRemovedSegno() returned 0, then set
> XLogCtl->lastRemovedSegNo by scanning the WAL directory. The attached
> takes this way.

I'm not sure if I explained my proposal clearly. What if
XLogGetLastRemovedSegno returning zero means that every segment is
valid? We don't need to scan pg_xlog at all.

> > Regarding the PostgresNode change in 0001, I think adding a special
> > parameter for primary_slot_name is limited. I'd like to change the
> > definition so that anything that you give as a parameter that's not one
> > of the recognized keywords (has_streaming, etc) is tested to see if it's
> > a GUC; and if it is, then put it in postgresql.conf. This would have to
> > apply both to PostgresNode::init() as well as
> > PostgresNode::init_from_backup(), obviously, since it would make no
> > sense for the APIs to diverge on this point. So you'd be able to do
> > $node->init_from_backup(allow_streaming => 1, work_mem => "4MB");
> > without having to add code to init_from_backup to handle work_mem
> > specifically. This could be done by having a Perl hash with all the GUC
> > names, that we could read lazily from "postmaster --describe-config" the
> > first time we see an unrecognized keyword as an option to init() /
> > init_from_backup().
>
> Done that way. We could exclude "known" parameters by explicitly
> deleting each key as it is read, but I chose to enumerate the known
> keywords. Although the feature could be used widely, I actually changed
> only 018_repslot_limit.pl to use it.

Hmm. I like this idea in general, but I'm not sure I want to introduce
it in this form right away. For the time being I realized while waking
up this morning we can just use $node->append_conf() in the
018_replslot_limit.pl file, like every other place that needs a special
config. There's no need to change the test infrastructure for this.

I'll go through this again. Many thanks,

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-03-31 17:18:36
Message-ID: 20200331171836.GA7973@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Mar-31, Alvaro Herrera wrote:

> I'm not sure if I explained my proposal clearly. What if
> XLogGetLastRemovedSegno returning zero means that every segment is
> valid? We don't need to scan pg_xlog at all.

I mean this:

XLogSegNo
FindOldestXLogFileSegNo(void)
{
    XLogSegNo   segno = XLogGetLastRemovedSegno();

    /* this is the only special case we need to care about */
    if (segno == 0)
        return some-value;

    return segno + 1;
}

... and at that point one can further note that a freshly initdb'd system
(no file has been removed) has "1" as the first file. So when segno is
0, you can return 1 and all should be well. That means you can reduce
the function to this:

XLogSegNo
FindOldestXLogFileSegNo(void)
{
    return XLogGetLastRemovedSegno() + 1;
}

The tests still pass with this coding.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-03-31 19:59:05
Message-ID: 20200331195905.GA20488@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Mar-31, Alvaro Herrera wrote:

> On 2020-Mar-31, Alvaro Herrera wrote:
>
> > I'm not sure if I explained my proposal clearly. What if
> > XLogGetLastRemovedSegno returning zero means that every segment is
> > valid? We don't need to scan pg_xlog at all.
>
> I mean this:

[v21 does it that way. Your typo fixes are included, but not the
LastRemoved stuff being discussed here. I also edited the shortdesc in
guc.c to better match {min,max}_wal_size.]

Hmm ... but if the user runs pg_resetwal to remove WAL segments, then
this will work badly for a time (until a segment is removed next). I'm
not very worried about that scenario, since surely the user will have to
reclone any standbys anyway. I think your v20 behaves better in that
case. But I'm not sure we should have that code to cater only to that
case ... seems to me that it will go untested 99.999% of the time.

Maybe you're aware of some other cases where lastRemovedSegNo is not
correct for the purposes of this feature?

I pushed the silly test_decoding test adjustment to get it out of the
way.

/me tries to figure out KeepLogSeg next

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
v21-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-diff 33.4 KB
v21-0002-Check-removal-of-in-reading-segment-file.patch text/x-diff 2.2 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-03-31 21:01:36
Message-ID: 20200331210136.GA6633@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I noticed some other things:

1. KeepLogSeg sends a warning message when slots fall behind. To do
this, it searches for "the most affected slot", that is, the slot that
lost the most data. But it seems to me that that's a bit pointless; if
a slot lost data, it's now useless and anything that was using that slot must
be recreated. If you only know what's the most affected slot, it's not
possible to see which *other* slots are affected. It doesn't matter if
the slot missed one segment or twenty segments or 9999 segments -- the
slot is now useless, or it is not useless. I think we should list the
slot that was *least* affected, i.e., the slot that lost the minimum
amount of segments; then the user knows that all slots that are older
than that one are *also* affected.

2. KeepLogSeg ignores slots that are active. I guess the logic here is
that if a slot is active, then it'll keep going until it catches up and
we don't need to do anything about the used disk space. But that seems
a false premise, because if a standby is so slow that it cannot keep up,
it will eventually run the master out of disk space even if it's active
all the time. So I'm not seeing the reasoning that makes it useful to
skip checking active slots.

(BTW I don't think you need to keep that many static variables in that
function. Just the slot name should be sufficient, I think ... or maybe
even the *pointer* to the slot that was last reported.)

I think if a slot is behind and it lost segments, we should kill the
walsender that's using it, and unreserve the segments. So maybe
something like

LWLockAcquire( ... );
for (i = 0; i < max_replication_slots; i++)
{
    ReplicationSlot *s =
        &ReplicationSlotCtl->replication_slots[i];
    XLogSegNo   slotSegNo;

    XLByteToSeg(s->data.restart_lsn, slotSegNo, wal_segment_size);

    if (s->in_use)
    {
        if (s->active_pid)
            pids_to_kill = lappend_int(pids_to_kill, s->active_pid);

        nslots_affected++;
        ... ; /* other stuff */
    }
}
LWLockRelease( ... );
/* release lock before syscalls */
foreach(l, pids_to_kill)
{
    kill(lfirst_int(l), SIGTERM);
}

I sense some attempt to salvage slots that are reading a segment that is
"outdated" and removed, but for which the walsender has an open file
descriptor. (This appears to be the "losing" state.) This seems
dangerous, for example the segment might be recycled and is being
overwritten with different data. Trying to keep track of that seems
doomed. And even if the walsender can still read that data, it's only a
matter of time before the next segment is also removed. So keeping the
walsender alive is futile; it only delays the inevitable.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-03-31 22:02:57
Message-ID: 20200331220257.GA9763@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Mar-31, Alvaro Herrera wrote:

> /* release lock before syscalls */
> foreach(l, pids_to_kill)
> {
>     kill(lfirst_int(l), SIGTERM);
> }
>
> I sense some attempt to salvage slots that are reading a segment that is
> "outdated" and removed, but for which the walsender has an open file
> descriptor. (This appears to be the "losing" state.) This seems
> dangerous, for example the segment might be recycled and is being
> overwritten with different data. Trying to keep track of that seems
> doomed. And even if the walsender can still read that data, it's only a
> matter of time before the next segment is also removed. So keeping the
> walsender alive is futile; it only delays the inevitable.

I think we should kill(SIGTERM) the walsender using the slot (slot->active_pid),
then acquire the slot and set it to some state indicating that it is now
useless, no longer reserving WAL; so when the walsender is restarted, it
will find the slot cannot be used any longer. Two ideas come to mind
about doing this:

1. set the LSNs and Xmins to Invalid; keep only the slot name, database,
plugin, etc. This makes monitoring harder, I think, because as soon as
the slot is gone you know nothing at all about it.

2. add a new flag to ReplicationSlotPersistentData to indicate that the
slot is dead. This preserves the LSN info for forensics, and might even
be easier to code.
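
As a rough sketch of idea 2 (the flag name "invalidated" and the error
wording are hypothetical placeholders, not taken from any posted patch):

    /* in ReplicationSlotPersistentData, alongside restart_lsn and friends */
    bool        invalidated;    /* hypothetical: required WAL was removed */

    /* checkpointer side, while invalidating the slot */
    SpinLockAcquire(&s->mutex);
    s->data.invalidated = true;
    SpinLockRelease(&s->mutex);

    /* restarted walsender side, before using the slot */
    if (MyReplicationSlot->data.invalidated)
        ereport(ERROR,
                (errmsg("replication slot \"%s\" can no longer be used",
                        NameStr(MyReplicationSlot->data.name))));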

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-03-31 22:07:49
Message-ID: 20200331220749.GA10008@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Mar-31, Alvaro Herrera wrote:

> I think we should kill(SIGTERM) the walsender using the slot (slot->active_pid),
> then acquire the slot and set it to some state indicating that it is now
> useless, no longer reserving WAL; so when the walsender is restarted, it
> will find the slot cannot be used any longer.

Ah, I see ioguix already pointed this out and the response was that the
walsender stops by itself. Hmm. I suppose this works too ... it seems
a bit fragile, but maybe I'm too sensitive. Do we have other opinions
on this point?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-01 01:18:04
Message-ID: 20200401.101804.1527965430575324083.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Tue, 31 Mar 2020 14:18:36 -0300, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> On 2020-Mar-31, Alvaro Herrera wrote:
>
> > I'm not sure if I explained my proposal clearly. What if
> > XLogGetLastRemovedSegno returning zero means that every segment is
> > valid? We don't need to scan pg_xlog at all.
>
> I mean this:
>
> XLogSegNo
> FindOldestXLogFileSegNo(void)
> {
>     XLogSegNo   segno = XLogGetLastRemovedSegno();
>
>     /* this is the only special case we need to care about */
>     if (segno == 0)
>         return some-value;
>
>     return segno + 1;
> }
>
> ... and at that point one can further note that a freshly initdb'd system
> (no file has been removed) has "1" as the first file. So when segno is
> 0, you can return 1 and all should be well. That means you can reduce
> the function to this:

If we don't scan the WAL files then, for example (somewhat
artificial), if segments cannot be removed because of a wrong setting
of archive_command, GetWalAvailability can return a false
"removed(lost)" state. If max_slot_wal_keep_size is shrunk and the
server then restarted, the function can return false "normal" or
"keeping" states.

By the way, the oldest segment of an initdb'ed cluster was the (14x)th
for me. So I think we can treat segno == 1 as an "uncertain" or
"unknown" state, but that state lasts until a checkpoint actually
removes a segment.

> XLogSegNo
> FindOldestXLogFileSegNo(void)
> {
>     return XLogGetLastRemovedSegno() + 1;
> }
>
>
> The tests still pass with this coding.

Mmm. Yeah, that makes a difference only under abnormal conditions.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-01 01:25:43
Message-ID: 20200401.102543.257983075916136729.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Tue, 31 Mar 2020 16:59:05 -0300, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> On 2020-Mar-31, Alvaro Herrera wrote:
>
> > On 2020-Mar-31, Alvaro Herrera wrote:
> >
> > > I'm not sure if I explained my proposal clearly. What if
> > > XLogGetLastRemovedSegno returning zero means that every segment is
> > > valid? We don't need to scan pg_xlog at all.
> >
> > I mean this:
>
> [v21 does it that way. Your typo fixes are included, but not the
> LastRemoved stuff being discussed here. I also edited the shortdesc in
> guc.c to better match {min,max}_wal_size.]
>
> Hmm ... but if the user runs pg_resetwal to remove WAL segments, then
> this will work badly for a time (until a segment is removed next). I'm
> not very worried for that scenario, since surely the user will have to
> reclone any standbys anyway. I think your v20 behaves better in that
> case. But I'm not sure we should have that code to cater only to that
> case ... seems to me that it will go untested 99.999% of the time.

I feel the same. If we allow a bogus or "unknown" status before the
first checkpoint, we don't need to scan the directory.

> Maybe you're aware of some other cases where lastRemovedSegNo is not
> correct for the purposes of this feature?

The cases are archive failure (false "removed") and a change of
max_slot_wal_keep_size (false "normal/kept"), mentioned in another mail.

> I pushed the silly test_decoding test adjustment to get it out of the
> way.
>
> /me tries to figure out KeepLogSeg next

Thanks.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-01 05:39:22
Message-ID: 20200401.143922.231086326869486790.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Tue, 31 Mar 2020 18:01:36 -0300, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> I noticed some other things:
>
> 1. KeepLogSeg sends a warning message when slots fall behind. To do
> this, it searches for "the most affected slot", that is, the slot that
> lost the most data. But it seems to me that that's a bit pointless; if
> a slot lost data, it's now useless and anything that was using that slot must
> be recreated. If you only know what's the most affected slot, it's not
> possible to see which *other* slots are affected. It doesn't matter if
> the slot missed one segment or twenty segments or 9999 segments -- the
> slot is now useless, or it is not useless. I think we should list the
> slot that was *least* affected, i.e., the slot that lost the minimum
> amount of segments; then the user knows that all slots that are older
> than that one are *also* affected.

Mmm. The v17-0001 patch [1] showed it as the following:

> WARNING: some replication slots have lost required WAL segments
> DETAIL: Slot s1 lost 8 segment(s).
> WARNING: some replication slots have lost required WAL segments
> DETAIL: Slots s1, s2, s3 lost at most 9 segment(s).

And it was removed following the comment at [2] :p

I restored the feature in a simpler shape in v22.

[1] /message-id/flat/20191224.212614.633369820509385571.horikyota.ntt%40gmail.com#cbc193425b95edd166a5c6d42fd579c6
[2] /message-id/20200123.212854.658794168913258596.horikyota.ntt%40gmail.com

> 2. KeepLogSeg ignores slots that are active. I guess the logic here is
> that if a slot is active, then it'll keep going until it catches up and
> we don't need to do anything about the used disk space. But that seems
> a false premise, because if a standby is so slow that it cannot keep up,
> it will eventually run the master out of disk space even if it's active
> all the time. So I'm not seeing the reasoning that makes it useful to
> skip checking active slots.

Right. I unconsciously assumed synchronous replication. It should be
removed. Fixed.

> (BTW I don't think you need to keep that many static variables in that
> function. Just the slot name should be sufficient, I think ... or maybe
> even the *pointer* to the slot that was last reported.)

Agreed. Fixed.

> I think if a slot is behind and it lost segments, we should kill the
> walsender that's using it, and unreserve the segments. So maybe
> something like

At Tue, 31 Mar 2020 19:07:49 -0300, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> > I think we should kill(SIGTERM) the walsender using the slot (slot->active_pid),
> > then acquire the slot and set it to some state indicating that it is now
> > useless, no longer reserving WAL; so when the walsender is restarted, it
> > will find the slot cannot be used any longer.
>
> Ah, I see ioguix already pointed this out and the response was that the
> walsender stops by itself. Hmm. I suppose this works too ... it seems
> a bit fragile, but maybe I'm too sensitive. Do we have other opinions
> on this point?

Yes, the check is performed after every block read, so walsender
doesn't seem to send a wrong record. The 0002 patch added the same
check on a per-record basis, so it can be considered useless. But
things get simpler by killing such walsenders instead of relying on
such a subtle condition, I think.

In the attached, 0002 is removed and walsender-kill code is added.

> I sense some attempt to salvage slots that are reading a segment that is
> "outdated" and removed, but for which the walsender has an open file
> descriptor. (This appears to be the "losing" state.) This seems
> dangerous, for example the segment might be recycled and is being
> overwritten with different data. Trying to keep track of that seems
> doomed. And even if the walsender can still read that data, it's only a
> matter of time before the next segment is also removed. So keeping the
> walsender alive is futile; it only delays the inevitable.

Agreed.

The attached is v22, only one patch file.

- 0002 is removed

- I didn't add an "unknown" status to wal_status, because it is quite
hard to explain reasonably. Instead, I added the following comment.

+ * Find the oldest extant segment file. We get 1 until checkpoint removes
+ * the first WAL segment file since startup, which causes the status to be
+ * wrong under certain abnormal conditions, but that does no actual harm.

- Changed the message in KeepLogSeg as described above.

- Don't ignore active slots in KeepLogSeg.

- Out-of-sync walsenders are killed immediately.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v22-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 33.6 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-03 23:14:03
Message-ID: 20200403231403.GA14901@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

So, the more I look at this patch, the less I like the way the slots are
handled.

* I think it's a mistake to try to do anything in KeepLogSeg itself;
that function is merely in charge of some arithmetic. I propose to
make that function aware of the new size limitation (so that it
doesn't trust the slot's LSNs completely), but to make the function
have no side effects. The attached patch does that, I hope.
To replace that responsibility, let's add another function. I named it
InvalidateObsoleteReplicationSlots(). In CreateCheckPoint and
CreateRestartPoint, we call the new function just before removing
segments. Note: the one in this patch doesn't actually work or even
compile.
The new function must:

1. mark the slot as "invalid" somehow. Maybe it would make sense to
add a new flag in the on-disk struct for this; but for now I'm just
thinking that changing the slot's restart_lsn is sufficient.
(Of course, I haven't tested this, so there might be side-effects that
mean that this idea doesn't work).

2. send SIGTERM to a walsender that's using such a slot.

3. Send the warning message. Instead of trying to construct a message
with a list of slots, send one message per slot. (I think this is
better from a translatability point of view, and also from a
monitoring PoV).

* GetWalAvailability seems too much in competition with
DistanceToWalRemoval. Which is weird, because both functions do
pretty much the same thing. I think a better design is to make the
former function return the distance as an out parameter.

* Andres complained that the "distance" column was not a great value to
expose (20171106132050(dot)6apzynxrqrzghb4r(at)alap3(dot)anarazel(dot)de). That's
right: it changes with both the insertion LSN and the slot's
consumption. Maybe we can expose the earliest live LSN (start of the
earliest segment?) as a new column. It'll be the same for all slots,
I suppose, but we don't care, do we?

I attach a rough sketch, which as I said before doesn't work and doesn't
compile. Sadly I have reached the end of my day here so I won't be able
to work on this for today anymore. I'll be glad to try again tomorrow,
but in the meantime I thought it was better to send it over and see
whether you had any thoughts about this proposed design (maybe you know
it doesn't work for some reason), or better yet, you have the chance to
actually complete the code or at least move it a little further.

Thanks

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
v23-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-diff 29.1 KB

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-06 09:50:27
Message-ID: 20200406.185027.648866525989475817.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Fri, 3 Apr 2020 20:14:03 -0300, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> So, the more I look at this patch, the less I like the way the slots are
> handled.
>
> * I think it's a mistake to try to do anything in KeepLogSeg itself;
> that function is merely in charge of some arithmetic. I propose to
> make that function aware of the new size limitation (so that it
> doesn't trust the slot's LSNs completely), but to make the function
> have no side effects. The attached patch does that, I hope.
> To replace that responsibility, let's add another function. I named it
> InvalidateObsoleteReplicationSlots(). In CreateCheckPoint and
> CreateRestartPoint, we call the new function just before removing
> segments. Note: the one in this patch doesn't actually work or even
> compile.

Agreed, and thanks for the code. The patch is enough to express the
intention. I fixed some compilation errors and cleaned up
KeepLogSeg. InvalidateObsoleteReplicationSlots requires the "oldest
preserved segment", so it should be called before _logSegNo--, not
after.
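
For illustration, the resulting call ordering in the checkpoint code is
roughly the following sketch (surrounding code omitted):

    /* in CreateCheckPoint (and similarly CreateRestartPoint), roughly: */
    XLByteToSeg(recptr, _logSegNo, wal_segment_size);
    KeepLogSeg(recptr, &_logSegNo);                 /* arithmetic only */
    InvalidateObsoleteReplicationSlots(_logSegNo);  /* before the decrement */
    _logSegNo--;
    RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr);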

> The new function must:
>
> 1. mark the slot as "invalid" somehow. Maybe it would make sense to
> add a new flag in the on-disk struct for this; but for now I'm just
> thinking that changing the slot's restart_lsn is sufficient.
> (Of course, I haven't tested this, so there might be side-effects that
> mean that this idea doesn't work).
>
> 2. send SIGTERM to a walsender that's using such a slot.
>
> 3. Send the warning message. Instead of trying to construct a message
> with a list of slots, send one message per slot. (I think this is
> better from a translatability point of view, and also from a
> monitoring PoV).
>
> * GetWalAvailability seems too much in competition with
> DistanceToWalRemoval. Which is weird, because both functions do
> pretty much the same thing. I think a better design is to make the
> former function return the distance as an out parameter.

I agree with all of the above. When a slot is invalidated, the
following message is logged.

LOG: slot rep1 is invalidated at 0/1C00000 due to exceeding max_slot_wal_keep_size

> * Andres complained that the "distance" column was not a great value to
> expose (20171106132050(dot)6apzynxrqrzghb4r(at)alap3(dot)anarazel(dot)de). That's
> right: it changes both by the insertion LSN as well as the slot's
> consumption. Maybe we can expose the earliest live LSN (start of the
> earliest segment?) as a new column. It'll be the same for all slots,
> I suppose, but we don't care, do we?

I don't mind, as long as users can calculate the "remain" of individual
slots (that is, how far the current LSN can advance before the slot
loses data). But the "earliest live LSN" (EL-LSN) is really not
relevant to the safety of each slot. The distance from EL-LSN to
restart_lsn or to the current LSN doesn't generally suggest the safety
of individual slots. The only relevance is that if the distance from
EL-LSN to the current LSN is close to max_slot_wal_keep_size, the most
lagged slot could die in the short term.

FWIW, the relationship between the values is shown below.

                                            (now)>>>
 <--- past ---------------------------------+------------- future --->
       lastRemovedSegment + 1
  "earliest_live_lsn"                                 segment X |
   |  min(restart_lsn)       restart_lsn[i]     current_lsn    | "The LSN X"
  .+...+.....................+..................+>>>...........|...+
                             <-------max_slot_wal_keep_size------->
                                                <---"remain" ----->|

So the "remain" is calculated using "restart_lsn(pg_lsn)",
max_slot_wal_keep_size(int in MB), wal_keep_segments(in segments) and
wal_segment_size (int in MB) and pg_current_wal_lsn()(pg_lsn). The
formula could be simplified by ignoring the segment size, but anyway
we don't have an arithmetic between pg_lsn and int in SQL interface.
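
As a sketch, the arithmetic amounts to roughly this (it ignores
wal_keep_segments and segment-boundary rounding; the variable names are
illustrative):

    /* how far current_lsn may advance before this slot starts losing WAL */
    XLogRecPtr  limit = restart_lsn +
        (XLogRecPtr) max_slot_wal_keep_size_mb * 1024 * 1024;
    uint64      remain = (current_lsn < limit) ? limit - current_lsn : 0;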

Anyway, in this version I added "min_safe_lsn" and adjusted the TAP
tests for that. They can use (pg_current_wal_lsn() - min_safe_lsn) as
an alternative metric, since there is only one slot during the test.

> I attach a rough sketch, which as I said before doesn't work and doesn't
> compile. Sadly I have reached the end of my day here so I won't be able
> to work on this for today anymore. I'll be glad to try again tomorrow,
> but in the meantime I thought it was better to send it over and see
> whether you had any thoughts about this proposed design (maybe you know
> it doesn't work for some reason), or better yet, you have the chance to
> actually complete the code or at least move it a little further.

WALAVAIL_BEING_REMOVED is removed, since the walsender is now actively
killed.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v24-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 22.1 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-06 16:54:56
Message-ID: 20200406165456.GA4951@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Apr-06, Kyotaro Horiguchi wrote:

> > * Andres complained that the "distance" column was not a great value to
> > expose (20171106132050(dot)6apzynxrqrzghb4r(at)alap3(dot)anarazel(dot)de). That's
> right: it changes with both the insertion LSN and the slot's
> consumption. Maybe we can expose the earliest live LSN (start of the
> > earliest segment?) as a new column. It'll be the same for all slots,
> > I suppose, but we don't care, do we?
>
> I don't mind, as long as users can calculate the "remain" of individual
> slots (that is, how far the current LSN can advance before the slot
> loses data). But the "earliest live LSN" (EL-LSN) is really not
> relevant to the safety of each slot. The distance from EL-LSN to
> restart_lsn or to the current LSN doesn't generally suggest the safety
> of individual slots. The only relevance is that if the distance from
> EL-LSN to the current LSN is close to max_slot_wal_keep_size, the most
> lagged slot could die in the short term.

Thanks for the revised version. Please note that you forgot to "git
add" the test file, to it's not in the patch.

I'm reviewing the patch now.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-06 18:58:39
Message-ID: 20200406185839.GA12500@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Apr-06, Kyotaro Horiguchi wrote:

> At Fri, 3 Apr 2020 20:14:03 -0300, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in

> Agreed and thanks for the code. The patch is enough to express the
> intention. I fixed some compilation errors and made a clean up of
> KeepLogSeg. InvalidateObsoleteReplicationSlots requires the "oldest
> preserved segment" so it should be called before _logSegNo--, not
> after.

Ah, of course, thanks.

> I agree with all of the above. When a slot is invalidated, the
> following message is logged.
>
> LOG: slot rep1 is invalidated at 0/1C00000 due to exceeding max_slot_wal_keep_size

Sounds good. Here's a couple of further adjustments to your v24. This
passes the existing tests (pg_basebackup exception noted below), but I
don't have the updated 019_replslot_limit.pl, so that still needs to be
verified.

First, cosmetic changes in xlog.c.

Second, an unrelated bugfix: ReplicationSlotsComputeLogicalRestartLSN()
is able to return InvalidXLogRecPtr if there's a slot with invalid
restart_lsn. I'm fairly certain that that's bogus. I think this needs
to be backpatched.

Third: The loop in InvalidateObsoleteReplicationSlots was reading
restart_lsn without acquiring the mutex. Split the "continue" line in two, so
in_use is checked without spinlock and restart_lsn is checked with it.
This means we also need to store restart_lsn in a local variable before
logging the message (because we don't want to log with spinlock held).
Also, use ereport() not elog() for that, and add quotes to the slot
name.

Lastly, I noticed that we're now changing the slot's restart_lsn to
Invalid without being the slot's owner, which goes counter to what is
said in slot.h:

* - Individual fields are protected by mutex where only the backend owning
* the slot is authorized to update the fields from its own slot. The
* backend owning the slot does not need to take this lock when reading its
* own fields, while concurrent backends not owning this slot should take the
* lock when reading this slot's data.

What this means is that if the slot owner walsender updates the
restart_lsn to a newer value just as we (checkpointer) are trying to set
it to Invalid, the owner's value might persist and our value would be
lost.

AFAICT if we were really stressed about getting this exactly correct,
then we would have to kill the walsender, wait for it to die, then
ReplicationSlotAcquire and *then* update
MyReplicationSlot->data.restart_lsn. But I don't think we want to do
that during checkpoint, and I'm not sure we need to be as strict anyway:
it seems to me that it suffices to check restart_lsn for being invalid
in the couple of places where the slot's owner advances (which is the
two auxiliary functions for ProcessStandbyReplyMessage). I have done so
in the attached. There are other places where the restart_lsn is set,
but those seem to be used only when the slot is created. I don't think
we need to cover for those, but I'm not 100% sure about that.
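
For illustration, the guard amounts to roughly this sketch at the top
of those functions:

    /* in PhysicalConfirmReceivedLocation() and the logical counterpart
     * called from ProcessStandbyReplyMessage: */
    if (XLogRecPtrIsInvalid(MyReplicationSlot->data.restart_lsn))
        return;     /* slot was invalidated by checkpointer; don't resurrect it */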

However, the change in PhysicalConfirmReceivedLocation() breaks
the way slots work for pg_basebackup: apparently the slot is created
with a restart_lsn of Invalid and we only advance it the first time we
process a feedback message from pg_basebackup. I have a vague feeling
that that's bogus, but I'll have to look at the involved code a little
bit more closely to be sure about this.

One last thing: I think we need to ReplicationSlotMarkDirty() and
ReplicationSlotSave() after changing the LSN. My patch doesn't do that.
I noticed that the checkpoint already saved the slot once; maybe it
would make more sense to avoid doubly-writing the files by removing
CheckPointReplicationSlots() from CheckPointGuts, and instead call it
just after doing InvalidateObsoleteReplicationSlots(). But this is not
very important, since we don't expect to be modifying slots because of
disk-space reasons very frequently anyway.
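
As a sketch, the missing step would be roughly the following; note that
these two calls act on the currently acquired slot:

    SpinLockAcquire(&s->mutex);
    s->data.restart_lsn = InvalidXLogRecPtr;
    SpinLockRelease(&s->mutex);

    /* persist the invalidation so it survives a restart */
    ReplicationSlotMarkDirty();
    ReplicationSlotSave();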

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-Further-changes.patch text/x-diff 6.0 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-06 22:15:55
Message-ID: 20200406221555.GA16211@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Apr-06, Alvaro Herrera wrote:

> Lastly, I noticed that we're now changing the slot's restart_lsn to
> Invalid without being the slot's owner, which goes counter to what is
> said in slot.h:
>
> * - Individual fields are protected by mutex where only the backend owning
> * the slot is authorized to update the fields from its own slot. The
> * backend owning the slot does not need to take this lock when reading its
> * own fields, while concurrent backends not owning this slot should take the
> * lock when reading this slot's data.
>
> What this means is that if the slot owner walsender updates the
> restart_lsn to a newer value just as we (checkpointer) are trying to set
> it to Invalid, the owner's value might persist and our value would be
> lost.
>
> AFAICT if we were really stressed about getting this exactly correct,
> then we would have to kill the walsender, wait for it to die, then
> ReplicationSlotAcquire and *then* update
> MyReplicationSlot->data.restart_lsn.

So I had cold feet about the whole business of trying to write a
non-owned replication slot, so I tried to implement the "exactly
correct" idea above. That's v25 here.

I think there's a race condition in this: if we kill a walsender and it
restarts immediately before we (checkpoint) can acquire the slot, we
will wait for it to terminate on its own. Fixing this requires changing
the ReplicationSlotAcquire API so that it knows not to wait but not
raise an error either (so we can use an infinite loop: "acquire, if busy
send signal")

I also include a separate diff for a change that might or might not be
necessary, where xmins reserved by slots with restart_lsn=invalid are
ignored. I'm not yet sure that we should include this, but we should
keep an eye on it.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
v25-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-diff 23.7 KB
ignore-xmin-invalidated.patch text/x-diff 947 bytes

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-06 23:51:48
Message-ID: 20200406235148.GA29998@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Apr-06, Alvaro Herrera wrote:

> I think there's a race condition in this: if we kill a walsender and it
> restarts immediately before we (checkpoint) can acquire the slot, we
> will wait for it to terminate on its own. Fixing this requires changing
> the ReplicationSlotAcquire API so that it knows not to wait but not
> raise an error either (so we can use an infinite loop: "acquire, if busy
> send signal")

I think this should do it, but I didn't test it super-carefully and the
usage of the condition variable is not entirely kosher.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
v25delta-walsender-kill-loop.patch text/x-diff 6.1 KB

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-07 00:12:43
Message-ID: 20200407.091243.1526068106063312556.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Mon, 6 Apr 2020 12:54:56 -0400, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> Thanks for the revised version. Please note that you forgot to "git
> add" the test file, to it's not in the patch.

Oops! I forgot that I was working on a tree made by just applying
patch -p1 to my working directory. This is the version that contains
the test script.

> I'm reviewing the patch now.

Thanks!

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v25-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-patch 30.5 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-07 00:25:02
Message-ID: 20200407002502.GA7078@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Apr-07, Kyotaro Horiguchi wrote:

> At Mon, 6 Apr 2020 12:54:56 -0400, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> > Thanks for the revised version. Please note that you forgot to "git
> > add" the test file, to it's not in the patch.
>
> Oops! I forgot that I was working on a tree made by just applying
> patch -p1 to my working directory. This is the version that contains
> the test script.

Thanks! This v26 is what I submitted last (sans the "xmin" business I
mentioned), with this test file included, adjusted for the message
wording I used. These tests all pass for me.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
v26-0001-Add-WAL-relief-vent-for-replication-slots.patch text/x-diff 36.4 KB

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-07 03:09:05
Message-ID: 20200407.120905.1507671100168805403.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Mon, 6 Apr 2020 14:58:39 -0400, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> > LOG: slot rep1 is invalidated at 0/1C00000 due to exceeding max_slot_wal_keep_size
>
> Sounds good. Here's a couple of further adjustments to your v24. This
> passes the existing tests (pg_basebackup exception noted below), but I
> don't have the updated 019_replslot_limit.pl, so that still needs to be
> verified.
>
> First, cosmetic changes in xlog.c.
>
> Second, an unrelated bugfix: ReplicationSlotsComputeLogicalRestartLSN()
> is able to return InvalidXLogRecPtr if there's a slot with invalid
> restart_lsn. I'm fairly certain that that's bogus. I think this needs
> to be backpatched.

Logical slots are not assumed to be in that state, that is, in_use but
having an invalid restart_lsn. Maybe we need to define the behavior if
restart_lsn is invalid (but confirmed_flush_lsn is valid)?

> Third: The loop in InvalidateObsoleteReplicationSlots was reading
> restart_lsn without acquiring the mutex. Split the "continue" line in two, so
> in_use is checked without spinlock and restart_lsn is checked with it.

Right. Thanks.

> This means we also need to store restart_lsn in a local variable before
> logging the message (because we don't want to log with spinlock held).
> Also, use ereport() not elog() for that, and add quotes to the slot
> name.

I omitted the quotes since slot names don't contain whitespace, but,
yes, it is quoted in other places. elog was just my mistake.

> Lastly, I noticed that we're now changing the slot's restart_lsn to
> Invalid without being the slot's owner, which goes counter to what is
> said in slot.h:
>
> * - Individual fields are protected by mutex where only the backend owning
> * the slot is authorized to update the fields from its own slot. The
> * backend owning the slot does not need to take this lock when reading its
> * own fields, while concurrent backends not owning this slot should take the
> * lock when reading this slot's data.
>
> What this means is that if the slot owner walsender updates the
> restart_lsn to a newer value just as we (checkpointer) are trying to set
> it to Invalid, the owner's value might persist and our value would be
> lost.

Right.

> AFAICT if we were really stressed about getting this exactly correct,
> then we would have to kill the walsender, wait for it to die, then
> ReplicationSlotAcquire and *then* update
> MyReplicationSlot->data.restart_lsn. But I don't think we want to do
> that during checkpoint, and I'm not sure we need to be as strict anyway:

Agreed.

> it seems to me that it suffices to check restart_lsn for being invalid
> in the couple of places where the slot's owner advances (which is the
> two auxiliary functions for ProcessStandbyReplyMessage). I have done so
> in the attached. There are other places where the restart_lsn is set,
> but those seem to be used only when the slot is created. I don't think
> we need to cover for those, but I'm not 100% sure about that.

StartLogicalReplication does
"XLogBeginRead(,MyReplicationSlot->data.restart_lsn)". If the
restart_lsn is invalid, the following call to XLogReadRecord runs into
an assertion failure. Walsender (or StartLogicalReplication) should
correctly reject reconnection from the subscriber if restart_lsn is
invalid.
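
To illustrate, the guard I have in mind would sit near the top of
StartLogicalReplication, roughly like this (only a sketch; the exact
placement and message wording are not settled):

    /* Sketch: refuse to stream from a slot whose WAL has been removed. */
    if (XLogRecPtrIsInvalid(MyReplicationSlot->data.restart_lsn))
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("replication slot \"%s\" is invalidated",
                        cmd->slotname)));

    /* Only then is it safe to start reading WAL at restart_lsn. */
    XLogBeginRead(logical_decoding_ctx->reader,
                  MyReplicationSlot->data.restart_lsn);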

> However, the change in PhysicalConfirmReceivedLocation() breaks
> the way slots work for pg_basebackup: apparently the slot is created
> with a restart_lsn of Invalid and we only advance it the first time we
> process a feedback message from pg_basebackup. I have a vague feeling
> that that's bogus, but I'll have to look at the involved code a little
> bit more closely to be sure about this.

Mmm. Couldn't we have a new member 'invalidated' in ReplicationSlot?

> One last thing: I think we need to ReplicationSlotMarkDirty() and
> ReplicationSlotSave() after changing the LSN. My patch doesn't do that.

Oops.

> I noticed that the checkpoint already saved the slot once; maybe it
> would make more sense to avoid doubly-writing the files by removing
> CheckPointReplicationSlots() from CheckPointGuts, and instead call it
> just after doing InvalidateObsoleteReplicationSlots(). But this is not
> very important, since we don't expect to be modifying slots because of
> disk-space reasons very frequently anyway.

Agreed.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-07 07:30:43
Message-ID: 20200407.163043.2050717072576572791.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Tue, 07 Apr 2020 12:09:05 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> > it seems to me that it suffices to check restart_lsn for being invalid
> > in the couple of places where the slot's owner advances (which is the
> > two auxiliary functions for ProcessStandbyReplyMessage). I have done so
> > in the attached. There are other places where the restart_lsn is set,
> > but those seem to be used only when the slot is created. I don't think
> > we need to cover for those, but I'm not 100% sure about that.
>
> StartLogicalReplication does
> "XLogBeginRead(,MyReplicationSlot->data.restart_lsn)". If the
> restart_lsn is invalid, the following call to XLogReadRecord runs into
> an assertion failure. Walsender (or StartLogicalReplication) should
> correctly reject reconnection from the subscriber if restart_lsn is
> invalid.
>
> > However, the change in PhysicalConfirmReceivedLocation() breaks
> > the way slots work for pg_basebackup: apparently the slot is created
> > with a restart_lsn of Invalid and we only advance it the first time we
> > process a feedback message from pg_basebackup. I have a vague feeling
> > that that's bogus, but I'll have to look at the involved code a little
> > bit more closely to be sure about this.
>
> Mmm. Couldn't we have a new member 'invalidated' in ReplicationSlot?

I did that in the attached. The invalidated flag is a
shared-but-not-saved member of a slot, initialized to false and then
irreversibly changed to true when the slot loses a required segment.

It is checked by the new function CheckReplicationSlotInvalidated()
when acquiring a slot and when updating a slot from a standby reply
message. This change stops the walsender without explicitly killing
it, but I didn't remove that code.
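
For concreteness, the check amounts to something like this (a sketch
of what the attached patch does; the details may differ slightly):

    /*
     * Sketch: 'invalidated' is kept in the shared ReplicationSlot struct,
     * protected by the per-slot mutex, and deliberately not written to
     * disk.
     */
    static void
    CheckReplicationSlotInvalidated(ReplicationSlot *slot)
    {
        bool        invalidated;

        SpinLockAcquire(&slot->mutex);
        invalidated = slot->invalidated;
        SpinLockRelease(&slot->mutex);

        if (invalidated)
            ereport(ERROR,
                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                     errmsg("replication slot \"%s\" is invalidated",
                            NameStr(slot->data.name))));
    }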

When a logical slot loses a segment, the publisher complains as follows:

[backend ] LOG: slot "s1" is invalidated at 0/370001C0 due to exceeding max_slot_wal_keep_size
[walsender] FATAL: terminating connection due to administrator command

The subscriber tries to reconnect and that fails as follows:

[19350] ERROR: replication slot "s1" is invalidated
[19352] ERROR: replication slot "s1" is invalidated
...

If the publisher restarts, that message is no longer seen; the
following appears instead.

[19372] ERROR: requested WAL segment 000000010000000000000037 has already been removed

The check is done in ReplicationSlotAcquire, so some slot-related SQL
functions are also affected.

=# select pg_replication_slot_advance('s1', '0/37000000');
ERROR: replication slot "s1" is invalidated

After restarting the publisher, the message changes in the same way as
for walsender.

=# select pg_replication_slot_advance('s1', '0/380001C0');
ERROR: requested WAL segment pg_wal/000000010000000000000037 has already been removed

Since I didn't touch restart_lsn at all, there is no fear of
inadvertently changing other behavior.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-further-change-type-2.patch text/x-patch 6.5 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-07 22:45:22
Message-ID: 20200407224522.GA18671@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Apr-07, Kyotaro Horiguchi wrote:

> > Mmm. Couldn't we have a new member 'invalidated' in ReplicationSlot?
>
> I did that in the attached. The invalidated flag is a
> shared-but-not-saved member of a slot, initialized to false and then
> irreversibly changed to true when the slot loses a required segment.
>
> It is checked by the new function CheckReplicationSlotInvalidated()
> when acquiring a slot and when updating a slot from a standby reply
> message. This change stops the walsender without explicitly killing
> it, but I didn't remove that code.

This change didn't work well with my proposed change to make
checkpointer acquire slots before marking them invalid. When I
incorporated your patch in the last version I posted yesterday, there
was a problem that when checkpointer attempted to acquire the slot, it
would fail with "the slot is invalidated"; also if you try to drop the
slot, it would obviously fail. I think it would work to remove the
SlotIsInvalidated check from the Acquire routine, and instead move it to
the routines that need it (ie. not the InvalidateObsolete one, and also
not the routine to drop slots).

I pushed version 26, with a few further adjustments.

I think what we have now is sufficient, but if you want to attempt this
"invalidated" flag on top of what I pushed, be my guest.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-08 00:37:10
Message-ID: 20200408.093710.447591748588426656.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you for committing this.

At Tue, 7 Apr 2020 18:45:22 -0400, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> On 2020-Apr-07, Kyotaro Horiguchi wrote:
>
> > > Mmm. Couldn't we have a new member 'invalidated' in ReplicationSlot?
> >
> > I did that in the attached. The invalidated flag is a
> > shared-but-not-saved member of a slot, initialized to false and then
> > irreversibly changed to true when the slot loses a required segment.
> >
> > It is checked by the new function CheckReplicationSlotInvalidated()
> > when acquiring a slot and when updating a slot from a standby reply
> > message. This change stops the walsender without explicitly killing
> > it, but I didn't remove that code.
>
> This change didn't work well with my proposed change to make
> checkpointer acquire slots before marking them invalid. When I
> incorporated your patch in the last version I posted yesterday, there
> was a problem that when checkpointer attempted to acquire the slot, it
> would fail with "the slot is invalidated"; also if you try to drop the
> slot, it would obviously fail. I think it would work to remove the
> SlotIsInvalidated check from the Acquire routine, and instead move it to
> the routines that need it (ie. not the InvalidateObsolete one, and also
> not the routine to drop slots).
>
> I pushed version 26, with a few further adjustments.
>
> I think what we have now is sufficient, but if you want to attempt this
> "invalidated" flag on top of what I pushed, be my guest.

I don't think the invalidation flag is essential, but it can prevent
unanticipated behavior; in other words, it makes us feel at ease :p

On current master/HEAD, the following steps cause an assertion
failure in xlogreader.c.

P(ublisher) $ vi $PGDATA/postgresql.conf
wal_level=logical
max_slot_wal_keep_size=0
^Z
(start publisher and subscriber)

P=> create table t(a int);
P=> create publication p1 for table t;
S=> create table t(a int);
P=> create table tt(); drop table tt; select pg_switch_wal(); checkpoint;
(publisher crashes)

2020-04-08 09:20:16.893 JST [9582] LOG: invalidating slot "s1" because its restart_lsn 0/1571770 exceeds max_slot_wal_keep_size
2020-04-08 09:20:16.897 JST [9496] LOG: database system is ready to accept connections
2020-04-08 09:20:21.472 JST [9597] LOG: starting logical decoding for slot "s1"
2020-04-08 09:20:21.472 JST [9597] DETAIL: Streaming transactions committing after 0/1571770, reading WAL from 0/0.
TRAP: FailedAssertion("!XLogRecPtrIsInvalid(RecPtr)", File: "xlogreader.c", Line: 235)
postgres: walsender horiguti [local] idle(ExceptionalCondition+0xa8)[0xaac4c1]
postgres: walsender horiguti [local] idle(XLogBeginRead+0x30)[0x588dbf]
postgres: walsender horiguti [local] idle[0x8c938b]
postgres: walsender horiguti [local] idle(exec_replication_command+0x311)[0x8c9c75]
postgres: walsender horiguti [local] idle(PostgresMain+0x79a)[0x92f091]
postgres: walsender horiguti [local] idle[0x87eec3]
postgres: walsender horiguti [local] idle[0x87e69a]
postgres: walsender horiguti [local] idle[0x87abc2]
postgres: walsender horiguti [local] idle(PostmasterMain+0x11cd)[0x87a48f]
postgres: walsender horiguti [local] idle[0x7852cb]
/lib64/libc.so.6(__libc_start_main+0xf3)[0x7fc190958873]
postgres: walsender horiguti [local] idle(_start+0x2e)[0x48169e]
2020-04-08 09:20:22.255 JST [9496] LOG: server process (PID 9597) was terminated by signal 6: Aborted
2020-04-08 09:20:22.255 JST [9496] LOG: terminating any other active server processes
2020-04-08 09:20:22.256 JST [9593] WARNING: terminating connection because of crash of another server process
2020-04-08 09:20:22.256 JST [9593] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

I will look at it.

On the other hand, physical replication isn't broken by invalidation.

Primary: postgres.conf
max_slot_wal_keep_size=0
Standby: postgres.conf
primary_conninfo='connect to master'
primary_slot_name='x1'

(start the primary)
P=> select pg_create_physical_replication_slot('x1');
(start the standby)
S=> create table tt(); drop table tt; select pg_switch_wal(); checkpoint;

(primary log)
2020-04-08 09:35:09.719 JST [10064] LOG: terminating walsender 10076 because replication slot "x1" is too far behind
2020-04-08 09:35:09.719 JST [10076] FATAL: terminating connection due to administrator command
2020-04-08 09:35:09.720 JST [10064] LOG: invalidating slot "x1" because its restart_lsn 0/B9F2000 exceeds max_slot_wal_keep_size
(standby)
[10075] 2020-04-08 09:35:09.723 JST FATAL: could not receive data from WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[10101] 2020-04-08 09:35:09.734 JST LOG: started streaming WAL from primary at 0/C000000 on timeline 1

It does no harm, but something's strange. I'll look into it, too.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-08 05:19:56
Message-ID: 20200408.141956.891237856186513376.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Wed, 08 Apr 2020 09:37:10 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> > I pushed version 26, with a few further adjustments.
> >
> > I think what we have now is sufficient, but if you want to attempt this
> > "invalidated" flag on top of what I pushed, be my guest.
>
> I don't think the invalidation flag is essential, but it can prevent
> unanticipated behavior; in other words, it makes us feel at ease :p
>
> On current master/HEAD, the following steps cause an assertion
> failure in xlogreader.c.
..
> I will look at it.

Just avoiding starting replication when restart_lsn is invalid is
sufficient (the attached, which is equivalent to a part of what the
invalidated flag did). I think the error message needs a HINT, but it
looks like this on the subscriber side:

[22086] 2020-04-08 10:35:04.188 JST ERROR: could not receive data from WAL stream: ERROR: replication slot "s1" is invalidated
HINT: The slot exceeds the limit by max_slot_wal_keep_size.

I don't think it is clean. Perhaps the subscriber should remove the
trailing line of the message from the publisher?

> On the other hand, physical replication isn't broken by invalidation.
>
> Primary: postgres.conf
> max_slot_wal_keep_size=0
> Standby: postgres.conf
> primary_conninfo='connect to master'
> primary_slot_name='x1'
>
> (start the primary)
> P=> select pg_create_physical_replication_slot('x1');
> (start the standby)
> S=> create table tt(); drop table tt; select pg_switch_wal(); checkpoint;

If we don't mind that the standby can reconnect after a walsender
termination due to the invalidation, we don't need to do anything for
this. Restricting max_slot_wal_keep_size to be larger than a certain
threshold would reduce the chance of seeing that behavior.

I found another issue: the following sequence on the primary freezes
when invalidation happens.

=# create table tt(); drop table tt; select pg_switch_wal();create table tt(); drop table tt; select pg_switch_wal();create table tt(); drop table tt; select pg_switch_wal(); checkpoint;

The last checkpoint command is waiting on the condition variable
CheckpointerShmem->start_cv in RequestCheckpoint(), while the
checkpointer is waiting for the next latch at the end of
CheckpointerMain. new_started doesn't advance; it stays at the same
value as old_started.

That freeze didn't happen when I removed
ConditionVariableSleep(&s->active_cv) in
InvalidateObsoleteReplicationSlots.

I'll continue investigating it.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-walsender-crash-fix.patch text/x-patch 1.0 KB

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-08 07:46:05
Message-ID: 20200408.164605.1874250940847340108.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Wed, 08 Apr 2020 14:19:56 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> I found another issue: the following sequence on the primary freezes
> when invalidation happens.
>
> =# create table tt(); drop table tt; select pg_switch_wal();create table tt(); drop table tt; select pg_switch_wal();create table tt(); drop table tt; select pg_switch_wal(); checkpoint;
>
> The last checkpoint command is waiting on the condition variable
> CheckpointerShmem->start_cv in RequestCheckpoint(), while the
> checkpointer is waiting for the next latch at the end of
> CheckpointerMain. new_started doesn't advance; it stays at the same
> value as old_started.
>
> That freeze didn't happen when I removed
> ConditionVariableSleep(&s->active_cv) in
> InvalidateObsoleteReplicationSlots.
>
> I'll continue investigating it.

I understand how it happens now.

The latch set by the checkpoint request from the CHECKPOINT command is
absorbed by ConditionVariableSleep() in
InvalidateObsoleteReplicationSlots. The attached patch allows the
checkpointer to use MyLatch for purposes other than checkpoint
requests while a checkpoint is running.
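
Compressed into a skeleton, the hazard looks like this (not the real
control flow, just the essence of how the wakeup gets lost; names
follow checkpointer.c and slot.c, with "s" being the slot under
invalidation):

    for (;;)
    {
        ResetLatch(MyLatch);

        /* ... decide whether a checkpoint needs to run, then run it;
         * while it runs, InvalidateObsoleteReplicationSlots may do: */

        /*
         * ConditionVariableSleep() waits on MyLatch and resets it, so a
         * SetLatch() arriving here from a CHECKPOINT request is silently
         * consumed.
         */
        ConditionVariableSleep(&s->active_cv,
                               WAIT_EVENT_REPLICATION_SLOT_DROP);

        /* The request is gone, so this sleeps until the next wakeup. */
        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_POSTMASTER_DEATH,
                         cur_timeout * 1000L,
                         WAIT_EVENT_CHECKPOINTER_MAIN);
    }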

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Allow-MyLatch-of-checkpointer-for-other-use.patch text/x-patch 1.8 KB

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-08 08:02:22
Message-ID: 20200408.170222.994602368402118750.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Wed, 08 Apr 2020 16:46:05 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> At Wed, 08 Apr 2020 14:19:56 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> The latch set by the checkpoint request from the CHECKPOINT command is
> absorbed by ConditionVariableSleep() in
> InvalidateObsoleteReplicationSlots. The attached patch allows the
> checkpointer to use MyLatch for purposes other than checkpoint
> requests while a checkpoint is running.

A checkpoint request that happens while waiting on the CV causes a
spurious wakeup, but that does no harm.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-10 02:59:20
Message-ID: 20200410.115920.1130303270055020682.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Wed, 08 Apr 2020 14:19:56 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
me> Just avoiding starting replication when restart_lsn is invalid is
me> sufficient (the attached, which is equivalent to a part of what the
me> invalidated flag did). I think the error message needs a HINT, but it
me> looks like this on the subscriber side:

At Wed, 08 Apr 2020 17:02:22 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
me> > At Wed, 08 Apr 2020 14:19:56 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
me> > The latch set by the checkpoint request from the CHECKPOINT command is
me> > absorbed by ConditionVariableSleep() in
me> > InvalidateObsoleteReplicationSlots. The attached patch allows the
me> > checkpointer to use MyLatch for purposes other than checkpoint
me> > requests while a checkpoint is running.
me>
me> A checkpoint request that happens while waiting on the CV causes a
me> spurious wakeup, but that does no harm.

I added the two items above to the open items page[1] so they are not forgotten.

[1] https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items#Open_Issues

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-27 22:33:42
Message-ID: 20200427223342.GA23152@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Apr-08, Kyotaro Horiguchi wrote:

> At Wed, 08 Apr 2020 09:37:10 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
>
> Just avoiding starting replication when restart_lsn is invalid is
> sufficient (the attached, which is equivalent to a part of what the
> invalidated flag did). I think the error message needs a HINT, but it
> looks like this on the subscriber side:
>
> [22086] 2020-04-08 10:35:04.188 JST ERROR: could not receive data from WAL stream: ERROR: replication slot "s1" is invalidated
> HINT: The slot exceeds the limit by max_slot_wal_keep_size.
>
> I don't think it is clean. Perhaps the subscriber should remove the
> trailing line of the message from the publisher?

Thanks for the fix! I propose two changes:

1. reword the error like this:

ERROR: replication slot "regression_slot3" cannot be advanced
DETAIL: This slot has never previously reserved WAL, or has been invalidated

2. use the same error in one other place, to wit
pg_logical_slot_get_changes() and pg_replication_slot_advance(). I
made the DETAIL part the same in all places, but the ERROR line is
adjusted to what each callsite is doing.
I do think that this change in test_decoding is a bit unpleasant:

-ERROR: cannot use physical replication slot for logical decoding
+ERROR: cannot get changes from replication slot "repl"

The test is
-- check that we're detecting a streaming rep slot used for logical decoding
SELECT 'init' FROM pg_create_physical_replication_slot('repl');
SELECT data FROM pg_logical_slot_get_changes('repl', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');

> > On the other hand, physical replication isn't broken by invalidation.
> > [...]
>
> If we don't mind that the standby can reconnect after a walsender
> termination due to the invalidation, we don't need to do anything for
> this. Restricting max_slot_wal_keep_size to be larger than a certain
> threshold would reduce the chance of seeing that behavior.

Yeah, I think you're referring to the fact that StartReplication()
doesn't verify the restart_lsn of the slot; and if we do add a check, a
few tests that rely on physical replication start to fail. This patch
only adds a comment in that spot. But I don't (yet) know what the
consequences of this are, or whether it can be fixed by setting a valid
restart_lsn ahead of time. This test in pg_basebackup fails, for
example:

# Running: pg_basebackup -D /home/alvherre/Code/pgsql-build/master/src/bin/pg_basebackup/tmp_check/tmp_test_EwIj/backupxs_sl -X stream -S slot1
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: cannot read from replication slot "slot1"
DETAIL: This slot has never previously reserved WAL, or has been invalidated
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory "/home/alvherre/Code/pgsql-build/master/src/bin/pg_basebackup/tmp_check/tmp_test_EwIj/backupxs_sl"
not ok 95 - pg_basebackup -X stream with replication slot runs

# Failed test 'pg_basebackup -X stream with replication slot runs'
# at t/010_pg_basebackup.pl line 461.

Anyway I think the current patch can be applied as is -- and if we want
physical replication to have some other behavior, we can patch for that
afterwards.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-check-slot-restart_lsn-in-a-couple-of-places.patch text/x-diff 5.1 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-27 23:40:07
Message-ID: 20200427234007.GA14631@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Apr-08, Kyotaro Horiguchi wrote:

> I understand how it happens now.
>
> The latch set by the checkpoint request from the CHECKPOINT command is
> absorbed by ConditionVariableSleep() in
> InvalidateObsoleteReplicationSlots. The attached patch allows the
> checkpointer to use MyLatch for purposes other than checkpoint
> requests while a checkpoint is running.

Hmm, that explanation makes sense, but I couldn't reproduce it with the
steps you provided. Perhaps I'm missing something.

Anyway I think this patch should fix it also -- instead of adding a new
flag, we just rely on the existing flags (since do_checkpoint must have
been set correctly from the flags earlier in that block.)

I think it'd be worth verifying this bugfix in a new test. Would you
have time to produce that? I could try in a couple of days ...

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-Don-t-freeze-on-checkpoints.patch text/x-diff 843 bytes

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-28 04:58:57
Message-ID: 20200428.135857.1243584941650464602.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Mon, 27 Apr 2020 18:33:42 -0400, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> On 2020-Apr-08, Kyotaro Horiguchi wrote:
>
> > At Wed, 08 Apr 2020 09:37:10 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> >
> > Just avoiding starting replication when restart_lsn is invalid is
> > sufficient (the attached, which is equivalent to a part of what the
> > invalidated flag did). I think the error message needs a HINT, but it
> > looks like this on the subscriber side:
> >
> > [22086] 2020-04-08 10:35:04.188 JST ERROR: could not receive data from WAL stream: ERROR: replication slot "s1" is invalidated
> > HINT: The slot exceeds the limit by max_slot_wal_keep_size.
> >
> > I don't think it is clean. Perhaps the subscriber should remove the
> > trailing line of the message from the publisher?
>
> Thanks for the fix! I propose two changes:
>
> 1. reword the error like this:
>
> ERROR: replication slot "regression_slot3" cannot be advanced
> DETAIL: This slot has never previously reserved WAL, or has been invalidated

Agreed on describing what failed rather than the cause. However,
logical replication slots are always "previously reserved" at
creation.

> 2. use the same error in one other place, to wit
> pg_logical_slot_get_changes() and pg_replication_slot_advance(). I
> made the DETAIL part the same in all places, but the ERROR line is
> adjusted to what each callsite is doing.
> I do think that this change in test_decoding is a bit unpleasant:
>
> -ERROR: cannot use physical replication slot for logical decoding
> +ERROR: cannot get changes from replication slot "repl"
>
> The test is
> -- check that we're detecting a streaming rep slot used for logical decoding
> SELECT 'init' FROM pg_create_physical_replication_slot('repl');
> SELECT data FROM pg_logical_slot_get_changes('repl', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');

The message may be understood as "No change has been made since
restart_lsn". Does something like the following work?

ERROR: replication slot "repl" is not usable to get changes

By the way, there are some other messages that describe the cause
rather than the symptom.

"cannot use physical replication slot for logical decoding"
"replication slot \"%s\" was not created in this database"

Don't they need the same amendment?

> > > On the other hand, physical replication isn't broken by invalidation.
> > > [...]
> >
> > If we don't mind that the standby can reconnect after a walsender
> > termination due to the invalidation, we don't need to do anything for
> > this. Restricting max_slot_wal_keep_size to be larger than a certain
> > threshold would reduce the chance of seeing that behavior.
>
> Yeah, I think you're referring to the fact that StartReplication()
> doesn't verify the restart_lsn of the slot; and if we do add a check, a
> few tests that rely on physical replication start to fail. This patch
> only adds a comment in that spot. But I don't (yet) know what the
> consequences of this are, or whether it can be fixed by setting a valid
> restart_lsn ahead of time. This test in pg_basebackup fails, for
> example:
>
> # Running: pg_basebackup -D /home/alvherre/Code/pgsql-build/master/src/bin/pg_basebackup/tmp_check/tmp_test_EwIj/backupxs_sl -X stream -S slot1
> pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: cannot read from replication slot "slot1"
> DETAIL: This slot has never previously reserved WAL, or has been invalidated
> pg_basebackup: error: child process exited with exit code 1
> pg_basebackup: removing data directory "/home/alvherre/Code/pgsql-build/master/src/bin/pg_basebackup/tmp_check/tmp_test_EwIj/backupxs_sl"
> not ok 95 - pg_basebackup -X stream with replication slot runs
>
> # Failed test 'pg_basebackup -X stream with replication slot runs'
> # at t/010_pg_basebackup.pl line 461.
>
>
> Anyway I think the current patch can be applied as is -- and if we want
> physical replication to have some other behavior, we can patch for that
> afterwards.

Agreed here. The false-invalidation doesn't lead to any serious
consequences.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-28 08:18:15
Message-ID: 20200428.171815.1687900483771598932.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Mon, 27 Apr 2020 19:40:07 -0400, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> On 2020-Apr-08, Kyotaro Horiguchi wrote:
>
> > I understand how it happens now.
> >
> > The latch set by the checkpoint request from the CHECKPOINT command is
> > absorbed by ConditionVariableSleep() in
> > InvalidateObsoleteReplicationSlots. The attached patch allows the
> > checkpointer to use MyLatch for purposes other than checkpoint
> > requests while a checkpoint is running.
>
> Hmm, that explanation makes sense, but I couldn't reproduce it with the
> steps you provided. Perhaps I'm missing something.

Sorry for the incomplete reproducer. A checkpoint needs to be running
simultaneously for the manual checkpoint to hang. The following is
the complete sequence.

1. Build a primary database cluster with the following setup, then start it.
max_slot_wal_keep_size=0
max_wal_size=32MB
min_wal_size=32MB

2. Build a replica from the primary creating a slot, then start it.

$ pg_basebackup -R -C -S s1 -D...

3. Try the following commands. If they succeed, repeat several times.
=# create table tt(); drop table tt; select pg_switch_wal();checkpoint;

It is evidently stochastic, but it reproduces quite reliably for me.

> Anyway I think this patch should fix it also -- instead of adding a new
> flag, we just rely on the existing flags (since do_checkpoint must have
> been set correctly from the flags earlier in that block.)

Since the added (!do_checkpoint) check is reached with
do_checkpoint=false at server start and at archive_timeout intervals,
the patch makes the checkpointer run a busy loop at those times, and
that loop lasts until a checkpoint is actually executed.

What we need to do here is to not forget that the latch has been set,
even if the latch itself gets reset before we reach WaitLatch.
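
In code form, the idea is something like this (approximating my
earlier patch; the flag name and the reset position are tentative):

    /* Sketch: remember the request independently of the latch state. */
    static volatile sig_atomic_t RequestPending = false;

    static void
    ReqCheckpointHandler(SIGNAL_ARGS)
    {
        RequestPending = true;
        SetLatch(MyLatch);
    }

    /* ... and in CheckpointerMain, just before the WaitLatch: */
    if (RequestPending)
    {
        RequestPending = false;
        continue;           /* skip the sleep; re-evaluate ckpt_flags */
    }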

> I think it'd be worth verifying this bugfix in a new test. Would you
> have time to produce that? I could try in a couple of days ...

The attached patch to 019_replslot_limit.pl runs the commands above
automatically. It sometimes succeeds but fails in most cases, at least
for me.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
TAP_checkpoint_freeze.patch text/x-patch 1.6 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-28 16:29:41
Message-ID: 20200428162941.GA6196@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Apr-28, Kyotaro Horiguchi wrote:

> At Mon, 27 Apr 2020 18:33:42 -0400, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> > On 2020-Apr-08, Kyotaro Horiguchi wrote:
> >
> > > At Wed, 08 Apr 2020 09:37:10 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in

> > Thanks for the fix! I propose two changes:
> >
> > 1. reword the error like this:
> >
> > ERROR: replication slot "regression_slot3" cannot be advanced
> > DETAIL: This slot has never previously reserved WAL, or has been invalidated
>
> Agreed on describing what failed rather than the cause. However,
> logical replication slots are always "previously reserved" at
> creation.

Bah, of course. I was thinking of making the equivalent messages all
identical in all callsites, but maybe they should be different when
slots are logical. I'll go over them again.

> > 2. use the same error in one other place, to wit
> > pg_logical_slot_get_changes() and pg_replication_slot_advance(). I
> > made the DETAIL part the same in all places, but the ERROR line is
> > adjusted to what each callsite is doing.
> > I do think that this change in test_decoding is a bit unpleasant:
> >
> > -ERROR: cannot use physical replication slot for logical decoding
> > +ERROR: cannot get changes from replication slot "repl"
> >
> > The test is
> > -- check that we're detecting a streaming rep slot used for logical decoding
> > SELECT 'init' FROM pg_create_physical_replication_slot('repl');
> > SELECT data FROM pg_logical_slot_get_changes('repl', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
>
> The message may be understood as "No change has been made since
> restart_lsn". Does something like the following work?
>
> ERROR: replication slot "repl" is not usable to get changes

That wording seems okay, but my specific point for this error message is
that we were trying to use a physical slot to get logical changes; so
the fact that the slot has been invalidated is secondary and we should
complain about the *type* of slot rather than the restart_lsn.

> By the way, there are some other messages that describe the cause
> rather than the symptom.
>
> "cannot use physical replication slot for logical decoding"
> "replication slot \"%s\" was not created in this database"
>
> Don't they need the same amendment?

Maybe, but I don't want to start rewording every single message
involving replication slots ... I prefer to only modify the ones
related to the problem at hand.

> > > > On the other hand, physical replication isn't broken by invalidation.
> > > > [...]

> > Anyway I think the current patch can be applied as is -- and if we want
> > physical replication to have some other behavior, we can patch for that
> > afterwards.
>
> Agreed here. The false-invalidation doesn't lead to any serious
> consequences.

But does it? What happens, for example, if we have a slot used to get a
pg_basebackup, then time passes before we start streaming from it, and
it is invalidated? I think this "works fine" (meaning that once we try
to stream from the slot to replay at the restored base backup, we will
raise an error immediately), but I haven't tried.

The worst situation would be producing a corrupt replica. I don't think
this is possible.

The ideal behavior I think would be that pg_basebackup aborts
immediately when the slot is invalidated, to avoid wasting more time
producing a doomed backup.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-28 23:14:10
Message-ID: 20200428231410.GA9805@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Apr-28, Kyotaro Horiguchi wrote:

> > Anyway I think this patch should fix it also -- instead of adding a new
> > flag, we just rely on the existing flags (since do_checkpoint must have
> > been set correctly from the flags earlier in that block.)
>
> Since the added (!do_checkpoint) check is reached with
> do_checkpoint=false at server start and at archive_timeout intervals,
> the patch makes the checkpointer run a busy loop at those times, and
> that loop lasts until a checkpoint is actually executed.
>
> What we need to do here is to not forget that the latch has been set,
> even if the latch itself gets reset before we reach WaitLatch.

After a few more false starts :-) I think one easy thing we can do
without the additional boolean flag is to call SetLatch there in the
main loop if we see that ckpt_flags is nonzero.

(I had two issues with the boolean flag. One is that the comment in
ReqCheckpointHandler needed an update to, essentially, say exactly the
opposite of what it was saying; such a change was making me very
uncomfortable. The other is that the place where the flag was reset in
CheckpointerMain() was ... not really appropriate; or it could have
been appropriate if the flag were called, say,
"CheckpointerMainNoSleepOnce". "RequestPending" was the wrong name to
use, because if the flag really meant a pending request, then we should
reset it inside the "if do_checkpoint" block ... but as I understand
it, that would cause the busy-loop behavior you described.)
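
In code form, the SetLatch idea is just this, placed right before the
sleep (a sketch, not a tested hunk):

    /*
     * If a request raced with our earlier ResetLatch, re-arm the latch
     * so that the WaitLatch below returns immediately.
     */
    if (((volatile CheckpointerShmemStruct *) CheckpointerShmem)->ckpt_flags)
        SetLatch(MyLatch);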

> The attached patch to 019_replslot_limit.pl runs the commands above
> automatically. It sometimes succeeds but fails in most cases, at least
> for me.

With the additional SetLatch, the test passes reproducibly for me.
Before the patch, it failed ten out of ten times I ran it.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-Fix-checkpoint-signalling.patch text/x-diff 2.9 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-29 00:47:10
Message-ID: 20200429004710.GA4742@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I pushed this one. Some closing remarks:

On 2020-Apr-28, Alvaro Herrera wrote:

> On 2020-Apr-28, Kyotaro Horiguchi wrote:

> > Agreed on describing what failed rather than the cause. However,
> > logical replication slots are always "previously reserved" at
> > creation.
>
> > Bah, of course. I was thinking of making the equivalent messages all
> > identical in all callsites, but maybe they should be different when
> > slots are logical. I'll go over them again.

I changed the ones that can only be logical slots so that they no longer
say "previously reserved WAL". The one in
pg_replication_slot_advance still uses that wording, because I didn't
think it was worth creating two separate error paths.

> > ERROR: replication slot "repl" is not usable to get changes
>
> That wording seems okay, but my specific point for this error message is
> that we were trying to use a physical slot to get logical changes; so
> the fact that the slot has been invalidated is secondary and we should
> complain about the *type* of slot rather than the restart_lsn.

I moved the check for validity to after CreateDecodingContext, so the
other errors are reported preferentially. I also chose a different wording:

    /*
     * After the sanity checks in CreateDecodingContext, make sure the
     * restart_lsn is valid. Avoid "cannot get changes" wording in this
     * errmsg because that'd be confusingly ambiguous about no changes
     * being available.
     */
    if (XLogRecPtrIsInvalid(MyReplicationSlot->data.restart_lsn))
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("can no longer get changes from replication slot \"%s\"",
                        NameStr(*name)),
                 errdetail("This slot has never previously reserved WAL, or has been invalidated.")));

I hope this is sufficiently clear, but if not, feel free to nudge me and
we can discuss it further.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-29 22:58:16
Message-ID: 20200429225816.GA18918@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Apr-28, Alvaro Herrera wrote:

> On 2020-Apr-28, Kyotaro Horiguchi wrote:
>
> > > Anyway I think this patch should fix it also -- instead of adding a new
> > > flag, we just rely on the existing flags (since do_checkpoint must have
> > > been set correctly from the flags earlier in that block.)
> >
> > Since the added (!do_checkpoint) check is reached with
> > do_checkpoint=false at server start and at archive_timeout intervals,
> > the patch makes the checkpointer run a busy loop at those times, and
> > that loop lasts until a checkpoint is actually executed.
> >
> > What we need to do here is to not forget that the latch has been set,
> > even if the latch itself gets reset before we reach WaitLatch.
>
> After a few more false starts :-) I think one easy thing we can do
> without the additional boolean flag is to call SetLatch there in the
> main loop if we see that ckpt_flags is nonzero.

I went back to "continue" instead of SetLatch, because it seems less
wasteful, but I changed the previously "do_checkpoint" condition to
rechecking ckpt_flags. We would not get in the busy loop in that case,
because the condition is true when the next loop would take action and
false otherwise. So I think this should fix the problem without causing
any other issues. But if you do see problems with this, please let us
know.
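
That spot now looks roughly like this (paraphrased, not the exact
committed hunk; cur_timeout is computed earlier in the loop):

    /*
     * If a request arrived while we were working (its wakeup may have
     * been consumed above), skip the sleep and handle it immediately.
     */
    if (((volatile CheckpointerShmemStruct *) CheckpointerShmem)->ckpt_flags)
        continue;

    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_POSTMASTER_DEATH,
                     cur_timeout * 1000L /* convert to ms */ ,
                     WAIT_EVENT_CHECKPOINTER_MAIN);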

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-30 01:25:20
Message-ID: 20200430.102520.937018442080739049.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you for polishing and committing this.

At Tue, 28 Apr 2020 20:47:10 -0400, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> I pushed this one. Some closing remarks:
>
> On 2020-Apr-28, Alvaro Herrera wrote:
>
> > On 2020-Apr-28, Kyotaro Horiguchi wrote:
>
> > > Agreed on describing what failed rather than the cause. However,
> > > logical replication slots are always "previously reserved" at
> > > creation.
> >
> > Bah, of course. I was thinking of making the equivalent messages all
> > identical in all callsites, but maybe they should be different when
> > slots are logical. I'll go over them again.
>
> I changed the ones that can only be logical slots so that they no longer
> say "previously reserved WAL". The one in
> pg_replication_slot_advance still uses that wording, because I didn't
> think it was worth creating two separate error paths.

Agreed.

> > > ERROR: replication slot "repl" is not usable to get changes
> >
> > That wording seems okay, but my specific point for this error message is
> > that we were trying to use a physical slot to get logical changes; so
> > the fact that the slot has been invalidated is secondary and we should
> > complain about the *type* of slot rather than the restart_lsn.
>
> I moved the check for validity to after CreateDecodingContext, so the
> other errors are reported preferentially. I also chose a different wording:

Yes, that is what I had in mind. The function checks the invariant
properties of the slot, and the following code then checks its
variable state.

>     /*
>      * After the sanity checks in CreateDecodingContext, make sure the
>      * restart_lsn is valid. Avoid "cannot get changes" wording in this
>      * errmsg because that'd be confusingly ambiguous about no changes
>      * being available.
>      */
>     if (XLogRecPtrIsInvalid(MyReplicationSlot->data.restart_lsn))
>         ereport(ERROR,
>                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                  errmsg("can no longer get changes from replication slot \"%s\"",
>                         NameStr(*name)),
>                  errdetail("This slot has never previously reserved WAL, or has been invalidated.")));
>
> I hope this is sufficiently clear, but if not, feel free to nudge me and
> we can discuss it further.

It sounds somewhat odd that we can "no longer" get changes from a slot
that has "never previously reserved" WAL. More to the point, I think we
don't reach that code for physical slots, since CreateDecodingContext
doesn't accept a physical slot and errors out. (That is the reason for
the location of the check.)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-30 01:32:02
Message-ID: 20200430013202.GA22210@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Apr-30, Kyotaro Horiguchi wrote:

> At Tue, 28 Apr 2020 20:47:10 -0400, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in

> >     /*
> >      * After the sanity checks in CreateDecodingContext, make sure the
> >      * restart_lsn is valid. Avoid "cannot get changes" wording in this
> >      * errmsg because that'd be confusingly ambiguous about no changes
> >      * being available.
> >      */
> >     if (XLogRecPtrIsInvalid(MyReplicationSlot->data.restart_lsn))
> >         ereport(ERROR,
> >                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >                  errmsg("can no longer get changes from replication slot \"%s\"",
> >                         NameStr(*name)),
> >                  errdetail("This slot has never previously reserved WAL, or has been invalidated.")));
> >
> > I hope this is sufficiently clear, but if not, feel free to nudge me and
> > we can discuss it further.
>
> It sounds somewhat odd that we can "no longer" get changes from a slot
> that has "never previously reserved" WAL. More to the point, I think we
> don't reach that code for physical slots, since CreateDecodingContext
> doesn't accept a physical slot and errors out. (That is the reason for
> the location of the check.)

Oh, right, so we could reword the errdetail to just "This slot has been
invalidated."

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-04-30 02:30:15
Message-ID: 20200430.113015.2265412284039773103.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At Wed, 29 Apr 2020 18:58:16 -0400, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> On 2020-Apr-28, Alvaro Herrera wrote:
>
> > On 2020-Apr-28, Kyotaro Horiguchi wrote:
> >
> > > > Anyway I think this patch should fix it also -- instead of adding a new
> > > > flag, we just rely on the existing flags (since do_checkpoint must have
> > > > been set correctly from the flags earlier in that block.)
> > >
> > > Since the added (!do_checkpoint) check is reached with
> > > do_checkpoint=false at server start and at archive_timeout intervals,
> > > the patch makes the checkpointer run a busy loop at those times, and
> > > that loop lasts until a checkpoint is actually executed.
> > >
> > > What we need to do here is to not forget that the latch has been set,
> > > even if the latch itself gets reset before we reach WaitLatch.
> >
> > After a few more false starts :-) I think one easy thing we can do
> > without the additional boolean flag is to call SetLatch there in the
> > main loop if we see that ckpt_flags is nonzero.
>
> I went back to "continue" instead of SetLatch, because it seems less
> wasteful, but I changed the previously "do_checkpoint" condition to
> rechecking ckpt_flags. We would not get in the busy loop in that case,
> because the condition is true when the next loop would take action and
> false otherwise. So I think this should fix the problem without causing
> any other issues. But if you do see problems with this, please let us
> know.

Checking ckpt_flags and then doing "continue" makes sense to me.

Thanks for committing.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, jgdr(at)dalibo(dot)com, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-05-17 01:00:05
Message-ID: 20200517010005.jzaf2245w4rrgs2o@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2020-04-29 18:58:16 -0400, Alvaro Herrera wrote:
> On 2020-Apr-28, Alvaro Herrera wrote:
>
> > On 2020-Apr-28, Kyotaro Horiguchi wrote:
> >
> > > > Anyway I think this patch should fix it also -- instead of adding a new
> > > > flag, we just rely on the existing flags (since do_checkpoint must have
> > > > been set correctly from the flags earlier in that block.)
> > >
> > > Since the added (!do_checkpoint) check is reached with
> > > do_checkpoint=false at server start and at archive_timeout intervals,
> > > the patch makes the checkpointer run a busy loop at those times, and
> > > that loop lasts until a checkpoint is actually executed.
> > >
> > > What we need to do here is to not forget that the latch has been set,
> > > even if the latch itself gets reset before we reach WaitLatch.
> >
> > After a few more false starts :-) I think one easy thing we can do
> > without the additional boolean flag is to call SetLatch there in the
> > main loop if we see that ckpt_flags is nonzero.
>
> I went back to "continue" instead of SetLatch, because it seems less
> wasteful, but I changed the previous "do_checkpoint" condition to
> recheck ckpt_flags. We would not get into a busy loop in that case,
> because the condition is true when the next iteration would take
> action and false otherwise. So I think this should fix the problem
> without causing any other issues. But if you do see problems with
> this, please let us know.

I don't think this is quite sufficient.

Independent of this patch, I added a few additional paths in which the
checkpointer's latch is reset, and I found a few shutdowns in
regression tests to be extremely slow or to time out. The reason is
that the only check for interrupts is at the top of the loop, so if
the checkpointer gets SIGUSR2 we don't see ShutdownRequestPending
until we decide to do a checkpoint for some other reason.

I also suspect that it could have harmful consequences to skip an
AbsorbSyncRequests() call if something "ate" the set latch.

I don't think it's reasonable to expect this much code between a
ResetLatch and a WaitLatch to never reset a latch. So I think we need
to make the coding more robust in the face of that, without having to
duplicate the top and the bottom of the loop.

One way to do that would be to move the WaitLatch() call much earlier,
and only do a WaitLatch() if do_checkpoint is false. Roughly like in
the attached.

Greetings,

Andres Freund

Attachment Content-Type Size
checkpointer-latch-bug.diff text/x-diff 3.3 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, jgdr(at)dalibo(dot)com, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-05-17 02:51:50
Message-ID: 20200517025150.GA12478@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-May-16, Andres Freund wrote:

> Independent of this patch, I added a few additional paths in which the
> checkpointer's latch is reset, and I found a few shutdowns in
> regression tests to be extremely slow or to time out. The reason is
> that the only check for interrupts is at the top of the loop, so if
> the checkpointer gets SIGUSR2 we don't see ShutdownRequestPending
> until we decide to do a checkpoint for some other reason.

Ah, yeah, this seems a genuine bug.

> I also suspect that it could have harmful consequences to skip an
> AbsorbSyncRequests() call if something "ate" the set latch.

I traced through this when looking over the previous fix, and given that
checkpoint execution itself calls AbsorbSyncRequests frequently, I
don't think this one qualifies as a bug.

> I don't think it's reasonable to expect this much code between a
> ResetLatch and a WaitLatch to never reset a latch. So I think we need
> to make the coding more robust in the face of that, without having to
> duplicate the top and the bottom of the loop.

That makes sense to me.

> One way to do that would be to move the WaitLatch() call much earlier,
> and only do a WaitLatch() if do_checkpoint is false. Roughly like in
> the attached.

Hm. I'd do "WaitLatch() / continue" in the "!do_checkpoint" block, and
put the checkpoint code outside the else block; that seems easier to
read to me.

While we're here, can we change CreateCheckPoint to return true so
that we can do

	ckpt_performed = do_restartpoint ? CreateRestartPoint(flags) : CreateCheckPoint(flags);

instead of the mess we have there now? (Also add a comment that
CreateCheckPoint must not return false, to avoid messing with the
schedule)
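
For anyone not looking at the code, the call site currently reads
roughly as follows, and the patch would fold it into the one-liner
above (a sketch, not a verbatim excerpt):

    /*
     * Current shape: CreateCheckPoint() returns void, so the caller
     * has to hard-code "performed" for the non-restartpoint case.
     */
    if (do_restartpoint)
        ckpt_performed = CreateRestartPoint(flags);
    else
    {
        CreateCheckPoint(flags);
        ckpt_performed = true;
    }

    /*
     * Proposed shape: CreateCheckPoint() returns true unconditionally,
     * since returning false would mess with the checkpoint schedule.
     */
    ckpt_performed = do_restartpoint ? CreateRestartPoint(flags)
                                     : CreateCheckPoint(flags);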

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, jgdr(at)dalibo(dot)com, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-05-17 03:23:01
Message-ID: 20200517032301.ddzwnqq7szkbdn7y@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2020-05-16 22:51:50 -0400, Alvaro Herrera wrote:
> On 2020-May-16, Andres Freund wrote:
>
> > Independent of this patch, I added a few additional paths in which the
> > checkpointer's latch is reset, and I found a few shutdowns in
> > regression tests to be extremely slow or to time out. The reason is
> > that the only check for interrupts is at the top of the loop, so if
> > the checkpointer gets SIGUSR2 we don't see ShutdownRequestPending
> > until we decide to do a checkpoint for some other reason.
>
> Ah, yeah, this seems a genuine bug.
>
> > I also suspect that it could have harmful consequences to skip an
> > AbsorbSyncRequests() call if something "ate" the set latch.
>
> I traced through this when looking over the previous fix, and given that
> checkpoint execution itself calls AbsorbSyncRequests frequently, I
> don't think this one qualifies as a bug.

There's no AbsorbSyncRequests() after CheckPointBuffers(), I think,
and e.g. CheckPointTwoPhase() could take a while. That would mean we'd
potentially not call AbsorbSyncRequests() until checkpoint_timeout
causes us to wake up. Am I missing something?

> > One way to do that would be to move the WaitLatch() call much earlier,
> > and only do a WaitLatch() if do_checkpoint is false. Roughly like in
> > the attached.
>
> Hm. I'd do "WaitLatch() / continue" in the "!do_checkpoint" block, and
> put the checkpoint code outside the else block; that seems easier to
> read to me.

Yea, that'd probably be better. I was also pondering whether we
shouldn't just move the checkpoint code into, gasp, its own function ;)

Greetings,

Andres Freund


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, jgdr(at)dalibo(dot)com, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-05-17 07:02:49
Message-ID: 20200517070249.GA21156@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-May-16, Andres Freund wrote:

> Hi,
>
> On 2020-05-16 22:51:50 -0400, Alvaro Herrera wrote:
> > On 2020-May-16, Andres Freund wrote:
> >
> > > Independent of this patch, I added a few additional paths in which the
> > > checkpointer's latch is reset, and I found a few shutdowns in
> > > regression tests to be extremely slow or to time out. The reason is
> > > that the only check for interrupts is at the top of the loop, so if
> > > the checkpointer gets SIGUSR2 we don't see ShutdownRequestPending
> > > until we decide to do a checkpoint for some other reason.
> >
> > Ah, yeah, this seems a genuine bug.
> >
> > > I also suspect that it could have harmful consequences to skip an
> > > AbsorbSyncRequests() call if something "ate" the set latch.
> >
> > I traced through this when looking over the previous fix, and given that
> > checkpoint execution itself calls AbsorbSyncRequests frequently, I
> > don't think this one qualifies as a bug.
>
> There's no AbsorbSyncRequests() after CheckPointBuffers(), I think,
> and e.g. CheckPointTwoPhase() could take a while. That would mean we'd
> potentially not call AbsorbSyncRequests() until checkpoint_timeout
> causes us to wake up. Am I missing something?

True. There's no delay like CheckpointWriteDelay in that code though,
so the "a while" is much smaller. My understanding of these sync
requests is that they're not for immediate processing anyway -- I mean
it's okay for the checkpointer to take a bit of time before syncing
... or am I mistaken? (If another sync request is queued and the queue
hasn't been emptied, that would set the latch again, so it's not like
this could fill the queue arbitrarily.)

> > > One way to do that would be to move the WaitLatch() call much earlier,
> > > and only do a WaitLatch() if do_checkpoint is false. Roughly like in
> > > the attached.
> >
> > Hm. I'd do "WaitLatch() / continue" in the "!do_checkpoint" block, and
> > put the checkpoint code outside the else block; that seems easier to
> > read to me.
>
> Yea, that'd probably be better. I was also pondering whether we
> shouldn't just move the checkpoint code into, gasp, its own function ;)

That might work :-)

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, jgdr(at)dalibo(dot)com, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-05-18 23:44:59
Message-ID: 20200518234459.GA1850@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

BTW while you're messing with checkpointer, I propose this patch to
simplify things.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-CreateCheckPoint-return-bool.patch text/x-diff 2.1 KB

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, jgdr(at)dalibo(dot)com, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-05-19 02:43:57
Message-ID: 20200519024357.GB11835@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, May 18, 2020 at 07:44:59PM -0400, Alvaro Herrera wrote:
> BTW while you're messing with checkpointer, I propose this patch to
> simplify things.

It seems to me that this would have a benefit if we began to have a
code path in CreateCheckpoint() where it makes sense to let the
checkpointer know that no checkpoint has happened; right now we assume
that a skipped checkpoint is a performed one. As that's not the case
now, I would vote for keeping the code as-is.
--
Michael


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, jgdr(at)dalibo(dot)com, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-05-19 03:46:49
Message-ID: 20200519034649.GA8356@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-May-19, Michael Paquier wrote:

> On Mon, May 18, 2020 at 07:44:59PM -0400, Alvaro Herrera wrote:
> > BTW while you're messing with checkpointer, I propose this patch to
> > simplify things.
>
> It seems to me that this would have a benefit if we began to have a
> code path in CreateCheckpoint() where it makes sense to let the
> checkpointer know that no checkpoint has happened; right now we
> assume that a skipped checkpoint is a performed one.

Well, my first attempt at this was returning false in that case, until I
realized that it would break the scheduling algorithm.

> As that's not the case now, I would vote for keeping the code as-is.

The presented patch doesn't have any functional impact; it just writes
the same code in a more concise way. Like you, I wouldn't change this
if we didn't have a reason to rewrite this section of code.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, jgdr(at)dalibo(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2020-06-20 23:06:27
Message-ID: 20200620230627.GA17995@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Minor language tweak:

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7050ce6e2e..08142d64cb 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3800,8 +3800,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows
slots</link> are allowed to retain in the <filename>pg_wal</filename>
directory at checkpoint time.
If <varname>max_slot_wal_keep_size</varname> is -1 (the default),
replication slots {+may+} retain {+an+} unlimited amount of WAL files. [-If-]{+Otherwise, if+}
restart_lsn of a replication slot [-gets-]{+falls+} behind {+by+} more than [-that megabytes-]{+the given size+}
from the current LSN, the standby using the slot may no longer be able
to continue replication due to removal of required WAL files. You
can see the WAL availability of replication slots
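
For readability, the passage with the word-diff applied would read:

    If max_slot_wal_keep_size is -1 (the default), replication slots
    may retain an unlimited amount of WAL files.  Otherwise, if
    restart_lsn of a replication slot falls behind by more than the
    given size from the current LSN, the standby using the slot may no
    longer be able to continue replication due to removal of required
    WAL files.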

Attachment Content-Type Size
v1-0001-Tweak-user-facing-doc-in-c6550776394e25c1620bc825.patch text/x-diff 1.3 KB