From: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz> |
Subject: | Re: [BUG] non archived WAL removed during production crash recovery |
Date: | 2020-04-01 08:27:22 |
Message-ID: | d0ef12c5-7e36-b9b3-c8ea-2e98d69d1317@oss.nttdata.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs Postg토토 커뮤니티SQL |
On 2020/04/01 0:22, Jehan-Guillaume de Rorthais wrote:
> Hello,
>
> A colleague of mine reported an expected behavior.
>
> On production cluster is in crash recovery, eg. after killing a backend, the
> WALs ready to be archived are removed before being archived.
>
> See in attachment the reproduction script "non-arch-wal-on-recovery.bash".
>
> This behavior has been introduced in 78ea8b5daab9237fd42d7a8a836c1c451765499f.
> Function XLogArchiveCheckDone() badly consider the in crashed recovery
> production cluster as a standby without archive_mode=always. So the check
> conclude the WAL can be removed safely.
Thanks for the report! Yeah, this seems a bug.
> bool inRecovery = RecoveryInProgress();
>
> /*
> * The file is always deletable if archive_mode is "off". On standbys
> * archiving is disabled if archive_mode is "on", and enabled with
> * "always". On a primary, archiving is enabled if archive_mode is "on"
> * or "always".
> */
> if (!((XLogArchivingActive() && !inRecovery) ||
> (XLogArchivingAlways() && inRecovery)))
> return true;
>
> Please find in attachment a patch that fix this issue using the following test
> instead:
>
> if (!((XLogArchivingActive() && !StandbyModeRequested) ||
> (XLogArchivingAlways() && inRecovery)))
> return true;
>
> I'm not sure if we should rely on StandbyModeRequested for the second part of
> the test as well thought. What was the point to rely on RecoveryInProgress() to
> get the recovery status from shared mem?
Since StandbyModeRequested is the startup process-local variable,
it basically cannot be used in XLogArchiveCheckDone() that other process
may call. So another approach would be necessary... One straight idea is to
add new shmem flag indicating whether we are in standby mode or not.
Another is to make the startup process remove .ready file if necessary.
If it's not easy to fix the issue, we might need to just revert the commit
that introduced the issue at first.
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
From | Date | Subject | |
---|---|---|---|
Next Message | PG Bug reporting form | 2020-04-01 08:51:56 | BUG #16331: segfault in checkpointer with full disk |
Previous Message | Michael Paquier | 2020-04-01 07:08:42 | Re: BUG #16325: Assert failure on partitioning by int for a text value with a collation |
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro Horiguchi | 2020-04-01 08:37:23 | Re: shared-memory based stats collector |
Previous Message | Michael Paquier | 2020-04-01 08:18:28 | Re: pgbench - add \aset to store results of a combined query |