From: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz>, bungina(at)gmail(dot)com |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, cyberdemn(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Subject: | Re: pg_rewind WAL segments deletion pitfall |
Date: | 2023-08-23 09:04:18 |
Message-ID: | 41e7c31b85aad03a4a2cd14daad31acd@oss.nttdata.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs Postg토토 꽁 머니SQL |
On 2023-08-22 14:32, Michael Paquier wrote:
Thanks for your review!
> On Fri, Aug 18, 2023 at 03:40:57PM +0900, torikoshia wrote:
>> Thanks for the patch, I've marked this as ready-for-committer.
>>
>> BTW, this issue can be considered a bug, right?
>> I think it would be appropriate to provide backpatch.
>
> Hmm, I agree that there is a good argument in back-patching as we have
> the WAL files between the redo LSN and the divergence LSN, but
> pg_rewind is not smart enough to keep them around. If the archives of
> the primary were not able to catch up, the old primary is as good as
> kaput, and restore_command won't help here.
True.
I also imagine that in the typical failover scenario where the target
cluster was shut down soon after the divergence and pg_rewind was
executed without much time, we can avoid this kind of 'requested WAL
segment has already removed' error by preventing pg_rewind from deleting
necessary WALs.
> I don't like much this patch. While it takes correctly advantage of
> the backward record read logic from SimpleXLogPageRead() able to
> handle correctly timeline jumps, it creates a hidden dependency in the
> code between the hash table from filemap.c and the page callback.
> Wouldn't it be simpler to build a list of the segment names using the
> information from WALOpenSegment and build this list in
> findLastCheckpoint()? Also, I am wondering if we should be smarter
> with any potential conflict handling between the source and the
> target, rather than just enforcing a FILE_ACTION_NONE for all these
> files. In short, could it be better to copy the WAL file from the
> source if it exists there?
>
> + /*
> + * Some entries (WAL segments) already have an action assigned
> + * (see SimpleXLogPageRead()).
> + */
> + if (entry->action == FILE_ACTION_UNDECIDED)
> + entry->action = decide_file_action(entry);
>
> This change makes me a bit uneasy, per se my previous comment with the
> additional code dependencies.
>
> I think that this scenario deserves a test case. If one wants to
> emulate a delay in WAL archiving, it is possible to set
> archive_command to a command that we know will fail, for instance.
> --
> Michael
Bungina, are you going to respond to these comments?
--
Regards,
--
Atsushi Torikoshi
NTT DATA Group Corporation
From | Date | Subject | |
---|---|---|---|
Next Message | kaijian xu | 2023-08-23 11:40:32 | Re: BUG #18065: An error occurred when attempting to add a column of type "vector" to a table named "vector". |
Previous Message | Amit Kapila | 2023-08-23 08:46:32 | Re: BUG #18055: logical decoding core on AllocateSnapshotBuilder() |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2023-08-23 09:30:52 | Re: EBCDIC sorting as a use case for ICU rules |
Previous Message | vignesh C | 2023-08-23 08:57:11 | Re: persist logical slots to disk during shutdown checkpoint |