From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | michael(at)paquier(dot)xyz |
Cc: | jgdr(at)dalibo(dot)com, masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: [BUG] non archived WAL removed during production crash recovery |
Date: | 2020-04-21 03:09:25 |
Message-ID: | 20200421.120925.76453535156648004.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | Postg토토 꽁 머니SQL : Postg토토 꽁 머니SQL 메일 링리스트 : 2020-04-21 이후 PGSQL-BUGS Postg사설 토토 사이트SQL |
At Tue, 21 Apr 2020 11:15:01 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in
> On Mon, Apr 20, 2020 at 02:22:35PM +0200, Jehan-Guillaume de Rorthais wrote:
> > The problem is that we would have to read the controldata file each time we
> > wonder if a segment should be archived/removed. Moreover, the controldata
> > file might not be in sync quickly enough with the real state for some other
> > code path or futur needs.
>
> I don't think that this is what Horiguchi-san meant here. What I got
> from his previous message would be to be to copy the shared value from
> the control file when necessary, and have the shared state use only a
> subset of the existing values of DBState, aka:
> - DB_IN_CRASH_RECOVERY
> - DB_IN_ARCHIVE_RECOVERY
> - DB_IN_PRODUCTION
First I thought as above, but I thought that we could use
ControlFile->state itself in this case, by regarding the symbols less
than DB_IN_CRASH_RECOVERY as RECOVERY_STATE_CRASH. I don't think
there's no problem if the update to DB_IN_ARCHIVE_RECOVERY reaches
checkpointer with some delay.
> Still, that sounds wrong to me because then somebody would be tempted
> to change the shared value thinking that things like DB_SHUTDOWNING,
> DB_SHUTDOWNED_* or DB_STARTUP are valid but we don't want that here.
That is not an issue if we just use DBState to know whether we have
started archive recovery.
> Note that there may be a case for DB_STARTUP to be used in
> XLOGShmemInit(), but I'd rather let the code use the safest default,
> DB_IN_CRASH_RECOVERY to control that we won't remove .ready files by
> default until the startup process sees fit to do the actual switch
> depending on the checkpoint record lookup, if archive recovery was
> actually requested, etc.
I'm not sure I read this correctly. But I think I agree to this.
+ if (!XLogArchivingAlways() &&
+ GetRecoveryState() == RECOVERY_STATE_ARCHIVE)
Is rewritten as
+ if (!XLogArchivingAlways() &&
+ GetDBState() > DB_IN_CRASH_RECOVERY)
FWIW, what annoyed me is there are three variables that are quite
similar but has different domains, ControlFile->state,
XLogCtl->SharedRecoveryState, and LocalRecoveryInProgress. I didn't
mind there were two, but three seems a bit too many to me.
But it may be different issue.
> > Indeed, Benoît Lobréau reported this behavior to me.
>
> Noted. Thanks for the information. I don't think that I have ever
> met Benoît in person, do I? Tell him that I owe him one beer or a
> beverage of his choice when we meet IRL, and that he had better use
> this message-id to make me keep my promise :)
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | cbw | 2020-04-21 04:07:49 | Backend stuck in tirigger.c:afterTriggerInvokeEvents forever |
Previous Message | Michael Paquier | 2020-04-21 02:15:01 | Re: [BUG] non archived WAL removed during production crash recovery |
From | Date | Subject | |
---|---|---|---|
Next Message | Dilip Kumar | 2020-04-21 04:24:29 | Re: fixing old_snapshot_threshold's time->xid mapping |
Previous Message | David Rowley | 2020-04-21 03:03:18 | Re: Parallel Append can break run-time partition pruning |