pgsql: Tolerate timeline switches while "pg_basebackup -X fetch" is run

From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Tolerate timeline switches while "pg_basebackup -X fetch" is run
Date: 2013-01-03 18:01:41
Message-ID: E1Tqp6r-00025f-GJ@gemulon.postgresql.org
Lists: pgsql-committers pgsql-hackers

Tolerate timeline switches while "pg_basebackup -X fetch" is running.

If you take a base backup from a standby server with "pg_basebackup -X
fetch", and the timeline switches while the backup is being taken, the
backup used to fail with an error "requested WAL segment %s has already
been removed". This is because the server-side code that sends over the
required WAL files would not construct the WAL filename with the correct
timeline after a switch.

Fix that by using readdir() to scan pg_xlog for all the WAL segments in the
range, regardless of timeline.
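
As a rough illustration of selecting segments by range regardless of timeline, here is a minimal standalone sketch. It is not the code from the commit: it uses plain POSIX opendir()/readdir() rather than the backend's own directory routines, and the names wal_name_in_range and count_wal_in_range are hypothetical. It assumes a WAL segment file name is 24 uppercase hex digits whose first 8 digits encode the timeline.

```c
#include <dirent.h>
#include <string.h>

/* Assumed layout: 8 hex digits of timeline, then 16 identifying the segment. */
#define WAL_FNAME_LEN  24
#define TLI_PREFIX_LEN 8

/*
 * Return 1 if name looks like a WAL segment whose segment part lies in
 * [first, last], ignoring the timeline prefix of all three names.
 * Fixed-width uppercase hex sorts the same way under strcmp() as numerically.
 */
int
wal_name_in_range(const char *name, const char *first, const char *last)
{
    if (strlen(name) != WAL_FNAME_LEN ||
        strspn(name, "0123456789ABCDEF") != WAL_FNAME_LEN)
        return 0;
    return strcmp(name + TLI_PREFIX_LEN, first + TLI_PREFIX_LEN) >= 0 &&
           strcmp(name + TLI_PREFIX_LEN, last + TLI_PREFIX_LEN) <= 0;
}

/* Count matching entries in dirpath; return -1 if it cannot be opened. */
int
count_wal_in_range(const char *dirpath, const char *first, const char *last)
{
    DIR *dir = opendir(dirpath);
    struct dirent *de;
    int n = 0;

    if (dir == NULL)
        return -1;
    while ((de = readdir(dir)) != NULL)
    {
        if (wal_name_in_range(de->d_name, first, last))
            n++;
    }
    closedir(dir);
    return n;
}
```

Because only the last 16 digits are compared, a segment written on timeline 2 is still picked up when the range endpoints were named using timeline 1.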

Also, include all timeline history files in the backup, if taken with
"-X fetch". That fixes another related bug: If a timeline switch happened
just before the backup was initiated in a standby, the WAL segment
containing the initial checkpoint record contains WAL from the older
timeline too. Recovery will not accept that without a timeline history file
that lists the older timeline.

Backpatch to 9.2. Versions prior to that were not affected as you could not
take a base backup from a standby before 9.2.

Branch
------
REL9_2_STABLE

Details
-------
http://git.postgresql.org/pg/commitdiff/b4c99c9af379157a6224b0a4c01da22192633adf

Modified Files
--------------
src/backend/access/transam/xlog.c    |  27 +++-
src/backend/replication/basebackup.c | 234 ++++++++++++++++++++++++++++------
src/backend/replication/walsender.c  |  15 +--
src/include/access/xlog.h            |   2 +-
4 files changed, 217 insertions(+), 61 deletions(-)


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [COMMITTERS] pgsql: Tolerate timeline switches while "pg_basebackup -X fetch" is run
Date: 2013-01-29 18:55:44
Message-ID: 20130129185544.GE3945@alvh.no-ip.org

Heikki Linnakangas wrote:
> Tolerate timeline switches while "pg_basebackup -X fetch" is running.

I just noticed that this commit introduced a few error messages that
have a file argument which is not properly quoted:

+   ereport(ERROR,
+           (errcode_for_file_access(),
+            errmsg("requested WAL segment %s has already been removed",
+                   filename)));

+   ereport(ERROR,
+           (errmsg("could not find WAL file %s", startfname)));

The first one seems to come from e57cd7f0a16, which is pretty old so
it's a bit strange that no one noticed.
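
For illustration only, the quoted form of the second message could look like the standalone sketch below. It uses snprintf() in place of ereport()/errmsg() so that it compiles outside the backend, and the helper name format_missing_wal_msg is hypothetical. The only point is the escaped double quotes around %s.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the message with the file name delimited by
 * double quotes. Returns what snprintf() returns.
 */
int
format_missing_wal_msg(char *buf, size_t buflen, const char *fname)
{
    return snprintf(buf, buflen, "could not find WAL file \"%s\"", fname);
}
```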

Not sure what to do here ... should we just update everything including
the back branches, or just leave them alone and touch master only?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [COMMITTERS] pgsql: Tolerate timeline switches while "pg_basebackup -X fetch" is run
Date: 2013-02-01 19:46:52
Message-ID: CA+TgmoZoJM+1_mzmKHczz_94bro559OrZdn0-1-dOpmos+Obiw@mail.gmail.com

On Tue, Jan 29, 2013 at 1:55 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Heikki Linnakangas wrote:
>> Tolerate timeline switches while "pg_basebackup -X fetch" is running.
>
> I just noticed that this commit introduced a few error messages that
> have a file argument which is not properly quoted:
>
> +   ereport(ERROR,
> +           (errcode_for_file_access(),
> +            errmsg("requested WAL segment %s has already been removed",
> +                   filename)));
>
> +   ereport(ERROR,
> +           (errmsg("could not find WAL file %s", startfname)));
>
> The first one seems to come from e57cd7f0a16, which is pretty old so
> it's a bit strange that no one noticed.
>
> Not sure what to do here ... should we just update everything including
> the back branches, or just leave them alone and touch master only?

-1 from me on any message changes in the back-branches. It's not
worth confusing large parsing software that's already out there, and
it's definitely not worth forcing people to make the regex contingent
on which *minor* version is in use. But +1 for making it consistent
in HEAD.
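
To make the parsing concern concrete, here is a hedged sketch of what a log-scanning tool might need if the quoting differed between versions. It uses POSIX regcomp()/regexec(); the function name extract_removed_segment and the exact pattern are assumptions for illustration, not taken from any real tool.

```c
#include <regex.h>
#include <string.h>

/*
 * Extract the 24-character segment name from either message form
 * (file name unquoted or in double quotes).
 * Returns 1 and fills out on a match, 0 otherwise.
 */
int
extract_removed_segment(const char *line, char out[25])
{
    regex_t     re;
    regmatch_t  m[2];
    int         found = 0;

    if (regcomp(&re,
                "requested WAL segment \"?([0-9A-F]{24})\"? has already been removed",
                REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, line, 2, m, 0) == 0)
    {
        /* Group 1 is exactly 24 characters by construction of the pattern. */
        memcpy(out, line + m[1].rm_so, 24);
        out[24] = '\0';
        found = 1;
    }
    regfree(&re);
    return found;
}
```

A parser written against only the unquoted form would lack the optional quote and would stop matching if a minor release changed the message.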

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company