Re: Data-only pg_rewind, take 2

Lists: pgsql-hackers
From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Data-only pg_rewind, take 2
Date: 2019-03-17 13:00:57
Message-ID: CAN-RpxCT4uH4nfrx4k3h+5TAJHGUDL0hqhaTAXSzmzVtLGfp8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi;

Attached is my second attempt at the pg_rewind change which allows one to
include only a minimal set. To my understanding, all past feedback has
been addressed.

The current patch does not change default behavior at present. It does add
a --data-only flag which allows pg_rewind to only rewind minimal files to
work. I believe this would generally be a good practice though maybe there
is some disagreement on that.

I have not run pg_indent because of the large number of other changes but
if that is desired at some point I can do that.

I also added test cases and some docs. I don't know if the docs are
sufficient. Feedback is appreciated. This is of course submitted for v13.

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

Attachment Content-Type Size
rewind_data_only_mode.patch application/octet-stream 15.4 KB

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Chris Travers <chris(dot)travers(at)adjust(dot)com>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Data-only pg_rewind, take 2
Date: 2019-03-18 04:09:42
Message-ID: 20190318040942.GD1885@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Mar 17, 2019 at 09:00:57PM +0800, Chris Travers wrote:
> I also added test cases and some docs. I don't know if the docs are
> sufficient. Feedback is appreciated.

To be honest, I don't think that this approach is a good idea per the
same reasons as mentioned the last time, as this can cause pg_rewind
to break if any newly-added folder in the data directory has
non-replaceable data which is needed at the beginning of recovery and
cannot be automatically rebuilt. So that's one extra maintenance
burden to worry about.

Here is the reference of the last thread about the same topic:
/message-id/CAN-RpxD8Y7hMOjzd93hOqV6n8kPEo5cmW9gYm+8JirTPiFnmmQ@mail.gmail.com
--
Michael


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Chris Travers <chris(dot)travers(at)adjust(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Data-only pg_rewind, take 2
Date: 2019-03-18 06:32:20
Message-ID: 20190318063219.GH6197@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greetings,

* Michael Paquier (michael(at)paquier(dot)xyz) wrote:
> On Sun, Mar 17, 2019 at 09:00:57PM +0800, Chris Travers wrote:
> > I also added test cases and some docs. I don't know if the docs are
> > sufficient. Feedback is appreciated.
>
> To be honest, I don't think that this approach is a good idea per the
> same reasons as mentioned the last time, as this can cause pg_rewind
> to break if any newly-added folder in the data directory has
> non-replaceable data which is needed at the beginning of recovery and
> cannot be automatically rebuilt. So that's one extra maintenance
> burden to worry about.

The right approach to deal with that is to have a canonical list of
those, isn't it? So that we have one place to update that takes care to
make sure that all of the tools realize what's actually needed.

In general, I agree completely with Chris on the reasoning behind this
patch and that we really should try to avoid copying random files and
directories that have shown up in the data directory during a pg_rewind.
Having regular expressions and other such things just strike me as a
really bad idea for a low-level tool like pg_rewind- if users have
dropped other stuff in the data directory that they want copied around
between systems then it should be on them to make that happen, not
expect pg_rewind to copy them..

Thanks!

Stephen


From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Data-only pg_rewind, take 2
Date: 2019-03-18 07:45:44
Message-ID: CAN-RpxC2Fn2eZnpa6J9DB-Bgt_8K-z816Ys7B=j3zf1Rx30GqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 18, 2019 at 4:09 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Sun, Mar 17, 2019 at 09:00:57PM +0800, Chris Travers wrote:
> > I also added test cases and some docs. I don't know if the docs are
> > sufficient. Feedback is appreciated.
>
> To be honest, I don't think that this approach is a good idea per the
> same reasons as mentioned the last time, as this can cause pg_rewind
> to break if any newly-added folder in the data directory has
> non-replaceable data which is needed at the beginning of recovery and
> cannot be automatically rebuilt. So that's one extra maintenance
> burden to worry about.
>

Actually I think this is safe. Let me go through the cases not handled in
the current behavior at all:

1. For rpms we distribute, we clobber db logs, which means we overwrite
application logs on the failed system with copes of logs from a replica.
This means that after you rewind, you lose the ability to figure out what
went wrong. This is an exquisitely bad idea unless you like data loss, and
since this location is configurable you can't just say "well we put our
logs here so we are excluding them." Making this configured per rewind run
strikes me as error-prone and something that will may lead to hidden
interference with postmortems in the future, and postmortems are vitally
important in terms of running database clusters with any sort of
reliability guarantees.

2. With the PostgreSQL.conf.auto now having recovery.conf info, you have
some very significant failure cases with regard to replication and
accidentally clobbering these.

On to the corner cases with --data-only enabled and the implications as I
see them since this preserves files on the old master but does not copy
them from the replica:

1. If the changes are not wal logged (let's say CSVs created using a file
foreign data wrapper), then deleting the files on rewind is where you can
lose data, and --data-only avoids this, so here you *avoid* data loss where
you put state files on the systems and do not rewind them because they were
not wal-logged. However the files still exist on the old master and are
not deleted, so the data can easily be restored at that point. Now, we can
say, probably, that putting data files in $PGDATA that are not wal-logged
is a bad idea. But even if you put them somewhere else, pg_rewind isn't
going to magically move them over to the replica for you.

2. If the changes *are* wal-logged, then you have a problem with
--data-only which is not present without it, namely that files can get out
of sync with their wal-logged updates. So in this case, --data-dir is
*not* safe.

So here I think we have to issue a choice. For now I don't feel
comfortable changing the default behavior, but the default behavior could
cause data loss in certain cases (including the ones I think you are
concerned about). Maybe it would be better if I document the above points?

>
> Here is the reference of the last thread about the same topic:
>
> /message-id/CAN-RpxD8Y7hMOjzd93hOqV6n8kPEo5cmW9gYm+8JirTPiFnmmQ@mail.gmail.com
> --
> Michael
> -----BEGIN PGP SIGNATURE-----
>
> iQIzBAABCgAdFiEEG72nH6vTowiyblFKnvQgOdbyQH0FAlyPGgYACgkQnvQgOdby
> QH0ekRAAiXcZRcDZwwwdbdlIpkniE/SuG5gaS7etUcAW88m8Vts5r4QoAEwUwGhg
> EZzuOb77OKvti7lmOZkBgC0VB1PmFku+mIdqJtzvdcSDdlOkABcLaw4JRrm//2/7
> jAi5Jw4um1EAz38dZXcWYwORavyo/4tR2S1PCyBA35F704w2NILAEDiq233P/ALf
> M3cOjgwiFIPf0v9PJIfYsl56sIwqW4rofPH63V6teaz5W8Qf2zHSsG5CeNqnEix0
> QZwwlzuhtAUYINab3oN3qMtF2q9vzJWCoSprzxx1qYrzPHEX8EMot0+L7YPdaAp0
> xyiUKSzy1rXtpoW0rsJ7w5bdrh1gS7HzprCEtqRZGe6NlVDcNjXfJIG9sT6hMWYS
> GTNbVH5VpKziw3byT8JpyqR38+iFqeXoLd1PEVadYjP62qOWbK8P2wokQwM+7EcK
> Hpr8jrvgV5x8IEnhR4bPyTqjORCJMBGTXCNgT99cPYpuVSasr/0IsBC/RtmQfRB9
> xhK0/qp5koQbX+mbLK11XsaFS9JAL2DNmSQg8TqICtV3bb0UTThs331XgjEjlOpm
> 1RjM6Tzwqq2is04mkkT+DtRAOclQuL8wWJWU5rr4fMKHCeFxtvUfwTyKlo2u+mI0
> x7YZhd4AFCM14ga2Ko/qiGqeOWR5Y0RvYANmnmjG5bxQGi+Dtek=
> =LNZB
> -----END PGP SIGNATURE-----
>

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Chris Travers <chris(dot)travers(at)adjust(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Data-only pg_rewind, take 2
Date: 2019-07-08 07:04:05
Message-ID: CA+hUKGKzZoZkg0EJiEdT3CewTZqgYO=jVWGPE4Y3rrTZXM8HfA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 18, 2019 at 8:46 PM Chris Travers <chris(dot)travers(at)adjust(dot)com> wrote:
> On Mon, Mar 18, 2019 at 4:09 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> On Sun, Mar 17, 2019 at 09:00:57PM +0800, Chris Travers wrote:
>> > I also added test cases and some docs. I don't know if the docs are
>> > sufficient. Feedback is appreciated.
>>
>> To be honest, I don't think that this approach is a good idea per the
>> same reasons as mentioned the last time, as this can cause pg_rewind
>> to break if any newly-added folder in the data directory has
>> non-replaceable data which is needed at the beginning of recovery and
>> cannot be automatically rebuilt. So that's one extra maintenance
>> burden to worry about.
>
> Actually I think this is safe. Let me go through the cases not handled in the current behavior at all:

Hi Chris,

Could you please post a rebase? This has fairly thoroughly bitrotted.
The Commitfest is here, so now would be an excellent time for people
to be able to apply and test the patch.

Thanks,

--
Thomas Munro
https://enterprisedb.com


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Chris Travers <chris(dot)travers(at)adjust(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Data-only pg_rewind, take 2
Date: 2019-08-01 23:07:12
Message-ID: CA+hUKGKauyfUs4=8oyZxcEv=g3sxL4ET9Ybzbf9HdQ0ce=BC7A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 8, 2019 at 7:04 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Mon, Mar 18, 2019 at 8:46 PM Chris Travers <chris(dot)travers(at)adjust(dot)com> wrote:
> > On Mon, Mar 18, 2019 at 4:09 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >> On Sun, Mar 17, 2019 at 09:00:57PM +0800, Chris Travers wrote:
> >> > I also added test cases and some docs. I don't know if the docs are
> >> > sufficient. Feedback is appreciated.
> >>
> >> To be honest, I don't think that this approach is a good idea per the
> >> same reasons as mentioned the last time, as this can cause pg_rewind
> >> to break if any newly-added folder in the data directory has
> >> non-replaceable data which is needed at the beginning of recovery and
> >> cannot be automatically rebuilt. So that's one extra maintenance
> >> burden to worry about.
> >
> > Actually I think this is safe. Let me go through the cases not handled in the current behavior at all:
>
> Hi Chris,
>
> Could you please post a rebase? This has fairly thoroughly bitrotted.
> The Commitfest is here, so now would be an excellent time for people
> to be able to apply and test the patch.

Hi Chris,

I set this to "Returned with feedback" due to lack of response. If
you'd prefer to move it to the next CF instead because you're planning
to work on it in time for the September CF, that might still be
possible, otherwise of course please create a new entry when you're
ready. Thanks!

--
Thomas Munro
https://enterprisedb.com