Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
Date: 2024-11-11 22:41:47
Message-ID: CAD21AoD6g1rjsbt6bKp6uAyEskKPEDehS=TyUV-cTVzHc9OkEQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 11, 2024 at 6:17 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> If this analysis is correct, I think it's rather suspicious we don't
> reset the candidate fields on restart. Can those "old" values ever be
> valid? But I haven't tried resetting them.

I had the same question. IIRC resetting them also fixes the
problem[1]. However, I got a comment from Alvaro[2]:

> Hmm, interesting -- I was studying some other bug recently involving the
> xmin of a slot that had been invalidated and I remember wondering if
> these "candidate" fields were being properly ignored when the slot is
> marked not in use; but I didn't check. Are you sure that resetting them
> when the slot is released is the appropriate thing to do? I mean,
> shouldn't they be kept set while the slot is in use, and only reset if
> we destroy it?

That made me re-investigate the issue, and I came to think that we don't
necessarily need to clear these candidate values in memory when
releasing a slot, as long as we update restart_lsn carefully. Keeping
them seems somewhat more efficient, for example when restarting from a
very old point. Of course, resetting them on releasing a slot would
also work perfectly, since that is the same as restarting logical
decoding after a server restart. I think
LogicalIncreaseRestartDecodingForSlot() should be fixed, as it doesn't
seem to work as expected, but I could not find proof that we should
either keep or reset them when releasing a slot.

Regards,

[1] /message-id/CAD21AoBG2OSDOFTtpPtQ7fx5Vt8p3dS5hPAv28CBSC6z2kHx-g%40mail.gmail.com
[2] /message-id/20230206152209.yglmntznhcmaueyn%40alvherre.pgsql

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
