Re: Some other things about contrib/bloom and generic_xlog.c

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-10 04:49:18
Message-ID: 19594.1460263758@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

1. It doesn't seem like generic_xlog.c has thought very carefully about
the semantics of the "hole" between pd_lower and pd_upper. The mainline
XLOG code goes to some lengths to ensure that the hole stays all-zeroes;
for example RestoreBlockImage() explicitly zeroes the hole when restoring
from a full-page image that has a hole. But generic_xlog.c's redo routine
does not do anything comparable, nor does GenericXLogFinish make any
effort to ensure that the "hole" is all-zeroes after normal application of
a generic update. The reason this is of interest is that it means the
contents of the "hole" could diverge between master and slave, or differ
between the original state of a database and what it is after a crash and
recovery. That would at least complicate forensic comparisons of pages,
and I think it might also break checksumming. We thought that this was
important enough to take the trouble of explicitly zeroing holes during
mainline XLOG replay. Shouldn't generic_xlog.c take the same trouble?

2. Unless I'm missing something, contrib/bloom is making no effort
to ensure that bloom index pages can be distinguished from other pages
by pg_filedump. That's fine if your expectation is that bloom will always
be a toy with no use in production; but otherwise, not so much. You need
to make sure that the last two bytes of the page special space contain a
uniquely identifiable code; compare spgist_page_id in SPGiST indexes.

regards, tom lane


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-11 07:44:12
Message-ID: CAB7nPqSqGb6Eo2JTGV2in9Qd430AA6Mfq0N3eJrJM7V-AY2eXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Apr 10, 2016 at 1:49 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> 1. It doesn't seem like generic_xlog.c has thought very carefully about
> the semantics of the "hole" between pd_lower and pd_upper. The mainline
> XLOG code goes to some lengths to ensure that the hole stays all-zeroes;
> for example RestoreBlockImage() explicitly zeroes the hole when restoring
> from a full-page image that has a hole.

Yes.

> But generic_xlog.c's redo routine
> does not do anything comparable, nor does GenericXLogFinish make any
> effort to ensure that the "hole" is all-zeroes after normal application of
> a generic update. The reason this is of interest is that it means the
> contents of the "hole" could diverge between master and slave, or differ
> between the original state of a database and what it is after a crash and
> recovery. That would at least complicate forensic comparisons of pages,
> and I think it might also break checksumming. We thought that this was
> important enough to take the trouble of explicitly zeroing holes during
> mainline XLOG replay. Shouldn't generic_xlog.c take the same trouble?

One look at checksum_impl.h shows up that the page hole is taken into
account in the checksum calculation, so we definitely has better zero
the page. And looking at generic_xlog.c, this code either logs in a
full page, or a delta.

Actually, as I look at this code for the first time, I find that
GenericXLogFinish() is a very confusing interface. It makes no
distinction between a page and the meta data associated to this page,
this is controlled by a flag isNew and buffer data is saved as some
delta. Actually, fullmage is not exactly true, this may refer to a
page that has a hole. It seems to me that we should not have one but
two routines: GenericXLogRegisterBuffer and
GenericXLogRegisterBufData, similar to what the normal XLOG routines
offer.
--
Michael


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-11 14:22:48
Message-ID: 570BB338.5000000@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> mainline XLOG replay. Shouldn't generic_xlog.c take the same trouble?
Yes, it should. Alexander now works on it.

>
> 2. Unless I'm missing something, contrib/bloom is making no effort
> to ensure that bloom index pages can be distinguished from other pages
> by pg_filedump. That's fine if your expectation is that bloom will always
> be a toy with no use in production; but otherwise, not so much. You need
> to make sure that the last two bytes of the page special space contain a
> uniquely identifiable code; compare spgist_page_id in SPGiST indexes.

Now contrib/bloom is a canonical example how to implement yourown index and how
to use generic WAL interface. And it is an useful toy: in some cases it could
help in production system, patch attached. I'm a little dissatisfied with patch
because I had to add unused field to BloomPageOpaqueData, in opposite case size
of BloomPageOpaqueData struct equals 6 bytes but special size is MAXALIGNED, so,
last two bytes will be unused. If unused field is not a problem, I will push
this patch.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-11 14:23:03
Message-ID: CAPpHfdu_U7rd8NZk0s5YA+QtATjVf-b9iyue2yh6hTVDPniCvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, Tom!

On Sun, Apr 10, 2016 at 7:49 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> 1. It doesn't seem like generic_xlog.c has thought very carefully about
> the semantics of the "hole" between pd_lower and pd_upper. The mainline
> XLOG code goes to some lengths to ensure that the hole stays all-zeroes;
> for example RestoreBlockImage() explicitly zeroes the hole when restoring
> from a full-page image that has a hole. But generic_xlog.c's redo routine
> does not do anything comparable, nor does GenericXLogFinish make any
> effort to ensure that the "hole" is all-zeroes after normal application of
> a generic update. The reason this is of interest is that it means the
> contents of the "hole" could diverge between master and slave, or differ
> between the original state of a database and what it is after a crash and
> recovery. That would at least complicate forensic comparisons of pages,
> and I think it might also break checksumming. We thought that this was
> important enough to take the trouble of explicitly zeroing holes during
> mainline XLOG replay. Shouldn't generic_xlog.c take the same trouble?
>
> 2. Unless I'm missing something, contrib/bloom is making no effort
> to ensure that bloom index pages can be distinguished from other pages
> by pg_filedump. That's fine if your expectation is that bloom will always
> be a toy with no use in production; but otherwise, not so much. You need
> to make sure that the last two bytes of the page special space contain a
> uniquely identifiable code; compare spgist_page_id in SPGiST indexes.
>

Thank you for spotting these issues.
I'm going to prepare patches for fixing both of them.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-11 14:29:06
Message-ID: 18988.1460384946@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> Actually, as I look at this code for the first time, I find that
> GenericXLogFinish() is a very confusing interface. It makes no
> distinction between a page and the meta data associated to this page,
> this is controlled by a flag isNew and buffer data is saved as some
> delta. Actually, fullmage is not exactly true, this may refer to a
> page that has a hole. It seems to me that we should not have one but
> two routines: GenericXLogRegisterBuffer and
> GenericXLogRegisterBufData, similar to what the normal XLOG routines
> offer.

Hmm. Maybe the documentation leaves something to be desired, but
I thought that the interface was reasonable if you grant that the
goal is to be easy to use rather than to have maximum performance.
The calling code only has to concern itself with making the actual
changes to the target pages, not with constructing WAL descriptions
of those changes. Also, the fact that the changes don't have to be
made within a critical section is a bigger help than it might sound.

So I don't really have any problems with the API as such. I tried
to improve the docs a day or two ago, but maybe that needs further
work.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-11 14:36:57
Message-ID: 19258.1460385417@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>> 2. Unless I'm missing something, contrib/bloom is making no effort
>> to ensure that bloom index pages can be distinguished from other pages
>> by pg_filedump. That's fine if your expectation is that bloom will always
>> be a toy with no use in production; but otherwise, not so much. You need
>> to make sure that the last two bytes of the page special space contain a
>> uniquely identifiable code; compare spgist_page_id in SPGiST indexes.

> Now contrib/bloom is a canonical example how to implement yourown index and how
> to use generic WAL interface. And it is an useful toy: in some cases it could
> help in production system, patch attached. I'm a little dissatisfied with patch
> because I had to add unused field to BloomPageOpaqueData, in opposite case size
> of BloomPageOpaqueData struct equals 6 bytes but special size is MAXALIGNED, so,
> last two bytes will be unused. If unused field is not a problem, I will push
> this patch.

Yes, it will mean that the special space will have to be 8 bytes not 4.
But on the other hand, that only makes a difference on 32-bit machines;
if MAXALIGN is 8 then you won't be able to fit anything into those bytes
anyway. And someday there might be a use for the other two bytes ...
so I'm not particularly concerned by the "wasted space" argument.

regards, tom lane


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 00:08:52
Message-ID: CAB7nPqTwhDhXHYXxafG436R_scgu8+rGF2_tRcx7+7BoXFe3AQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Apr 11, 2016 at 11:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
>> Actually, as I look at this code for the first time, I find that
>> GenericXLogFinish() is a very confusing interface. It makes no
>> distinction between a page and the meta data associated to this page,
>> this is controlled by a flag isNew and buffer data is saved as some
>> delta. Actually, fullmage is not exactly true, this may refer to a
>> page that has a hole. It seems to me that we should not have one but
>> two routines: GenericXLogRegisterBuffer and
>> GenericXLogRegisterBufData, similar to what the normal XLOG routines
>> offer.
>
> Hmm. Maybe the documentation leaves something to be desired, but
> I thought that the interface was reasonable if you grant that the
> goal is to be easy to use rather than to have maximum performance.
> The calling code only has to concern itself with making the actual
> changes to the target pages, not with constructing WAL descriptions
> of those changes. Also, the fact that the changes don't have to be
> made within a critical section is a bigger help than it might sound.
>
> So I don't really have any problems with the API as such. I tried
> to improve the docs a day or two ago, but maybe that needs further
> work.

Well, I am not saying that the given interface does nothing, far from
that. Though I think that we could take advantage of the rmgr
RM_GENERIC_ID introduced by this interface to be able to generate
custom WAL records and pass them through a stream.

As given out now, we are able to do the following:
- Log a full page
- Log a delta of a full page, which is actually data associated to a page.
- At recovery, replay those full pages with a delta.

What I thought we should be able to do with that should not be only
limited to indexes, aka to:
- Be able to register and log full pages
- Be able to log data associated to a block
- Be able to have record data
- Use at recovery custom routines to apply those WAL records
The first 3 ones are done via the set of routines generating WAL
records for the rmgr RM_GENERIC_ID. The 4th one needs a hook in
generic_redo. Looking at the thread where this patch has been debated
I am not seeing a discussion regarding that.
--
Michael


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 00:21:19
Message-ID: 18571.1460420479@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> As given out now, we are able to do the following:
> - Log a full page
> - Log a delta of a full page, which is actually data associated to a page.
> - At recovery, replay those full pages with a delta.

Right.

> What I thought we should be able to do with that should not be only
> limited to indexes, aka to:
> - Be able to register and log full pages
> - Be able to log data associated to a block
> - Be able to have record data
> - Use at recovery custom routines to apply those WAL records

I'm not following your distinction between a "page" and a "block",
perhaps. Also, the entire point here is that extensions DON'T
get to have custom apply routines, because we have no way for
replay to know about such apply routines. (It surely cannot look
in the catalogs for them, but there's no other suitable infrastructure
either.) In turn, that means there's little value in having any custom
data associated with the WAL records.

If we ever solve the registering-custom-replay-routines conundrum,
it'll be time to think about what the frontend API for that ought
to be. But this patch is not pretending to solve that, and indeed is
mainly showing that it's possible to have a useful WAL extension API
that doesn't require solving it.

In any case, the existence of this API doesn't foreclose adding
other APIs (perhaps backed by other RM_GENERIC_ID WAL record types)
later on. So I don't think we need to insist that it do everything
anyone will ever want.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 00:33:32
Message-ID: 18993.1460421212@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

... BTW, with respect to the documentation angle, it seems to me
that it'd be better if GenericXLogRegister were renamed to
GenericXLogRegisterBuffer, or perhaps GenericXLogRegisterPage.
I think this would make the documentation clearer, and it would
also make it easier to add other sorts of Register actions later,
if we ever think of some (which seems not unlikely, really).

Another thing to think about is whether we're going to regret
hard-wiring the third argument as a boolean. Should we consider
making it a bitmask of flags, instead? It's not terribly hard
to think of other flags we might want there in future; for example
maybe something to tell GenericXLogFinish whether it's worth trying
to identify data movement on the page rather than just doing the
byte-by-byte delta calculation.

regards, tom lane


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 00:36:41
Message-ID: CAB7nPqQruy22hBHhOFpU3tYgmRgmk7YxZo6WVUqTOSocNUGnZg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 12, 2016 at 9:21 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
>> What I thought we should be able to do with that should not be only
>> limited to indexes, aka to:
>> - Be able to register and log full pages
>> - Be able to log data associated to a block
>> - Be able to have record data
>> - Use at recovery custom routines to apply those WAL records
>
> I'm not following your distinction between a "page" and a "block",
> perhaps.

s/block/page/. Sorry for the confusion.

> Also, the entire point here is that extensions DON'T
> get to have custom apply routines, because we have no way for
> replay to know about such apply routines. (It surely cannot look
> in the catalogs for them, but there's no other suitable infrastructure
> either.) In turn, that means there's little value in having any custom
> data associated with the WAL records.

Well, yes, the startup process has no knowledge of that. That's why it
would need to go through a hook, but the startup process has no
knowledge of routines loaded via _PG_init perhaps? I recall that it
loaded them. The weakness with this interface is that one needs to be
sure that this hook is actually loaded properly.

> If we ever solve the registering-custom-replay-routines conundrum,
> it'll be time to think about what the frontend API for that ought
> to be. But this patch is not pretending to solve that, and indeed is
> mainly showing that it's possible to have a useful WAL extension API
> that doesn't require solving it.

Yep. I am not saying the contrary. This patch solves its own category
of things with its strict page-level control.

> In any case, the existence of this API doesn't foreclose adding
> other APIs (perhaps backed by other RM_GENERIC_ID WAL record types)
> later on. So I don't think we need to insist that it do everything
> anyone will ever want.

Perhaps I am just confused by the interface. Linking the idea of a
page delta with GenericXLogRegister is not that intuitive, knowing
that what it should actually be GenericXLogRegisterBuffer and we
should have GenericXLogAddDelta, that applies a page difference on top
of an existing one.
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 00:39:28
Message-ID: CAB7nPqRrAgVhcxEb50KvbOsBkeDyAVvyLFQSN8PNcq1ik2p3gg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 12, 2016 at 9:33 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> ... BTW, with respect to the documentation angle, it seems to me
> that it'd be better if GenericXLogRegister were renamed to
> GenericXLogRegisterBuffer, or perhaps GenericXLogRegisterPage.
> I think this would make the documentation clearer, and it would
> also make it easier to add other sorts of Register actions later,
> if we ever think of some (which seems not unlikely, really).

Funny thing. I just suggested the same just above :) With a second
routine to generate a delta difference from a page to keep the
knowledge of this delta in its own code path.

> Another thing to think about is whether we're going to regret
> hard-wiring the third argument as a boolean. Should we consider
> making it a bitmask of flags, instead? It's not terribly hard
> to think of other flags we might want there in future; for example
> maybe something to tell GenericXLogFinish whether it's worth trying
> to identify data movement on the page rather than just doing the
> byte-by-byte delta calculation.

Yes. Definitely this interface needs more thoughts. I'd think of
GenericXLogFinish as a more generic entry point.
--
Michael


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 01:54:37
Message-ID: CAMsr+YGRFJ3cvNbFq0o=vMnHGL5naVR84FGLVSR2kUJQyAMg4w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12 April 2016 at 08:36, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

>
> > Also, the entire point here is that extensions DON'T
> > get to have custom apply routines, because we have no way for
> > replay to know about such apply routines. (It surely cannot look
> > in the catalogs for them, but there's no other suitable infrastructure
> > either.) In turn, that means there's little value in having any custom
> > data associated with the WAL records.
>
> Well, yes, the startup process has no knowledge of that. That's why it
> would need to go through a hook, but the startup process has no
> knowledge of routines loaded via _PG_init perhaps?

The only way we really have at the moment is shared_preload_libraries.

This got discussed much earlier, possibly on a previous iteration of the
generic xlog discussions rather than this specific thread. IIRC the gist
was that we need a way to:

- Flag the need for a redo routine so that replay stops if that redo
routine isn't available
- Find out *which* redo routine the generic log message needs during redo
- Get a pointer to that routine to actually call it

... and also possibly allow the admin to force skip of redo calls for a
given extension (but not others) in case of a major problem.

If you rely on shared_preload_libraries, then "oops, I forgot to put
myextension in shared_preload_libraries on the downstream" becomes a Bad
Thing. There's no way to go back and redo the work you've passed over. You
have to recreate the standby. Worse, it's silently wrong. The server has no
idea it was meant to do anything and no way to tell the user it couldn't do
what it was meant to do.

We can't register redo routines in the catalogs because we don't have
access to the syscaches etc during redo (right?).

So how do we look at a generic log record, say "ok, the upstream that wrote
this says it needs to invoke registered generic xlog hook 42, which is
<func> from <extension> at <ptr>" ?

Personally I'd be willing to accept the limitations of the s_p_l and hook
approach to the point where I think we should add the hook and accept the
caveats of its use ... but I understand the problems with it. I understand
not having custom redo routine support in this iteration of the patch. It's
somewhat orthogonal job to this patch anyway - while handy for specific
relation generic xlog, custom redo routines make most sense when combined
with generic xlog that isn't necessarily associated with a specific
relation. This patch doesn't support that.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 02:09:32
Message-ID: CAB7nPqTt=tZopA1FRx1U0Z+GSD_+MMpDCT90soYqriB=Md3wFA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 12, 2016 at 10:54 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> On 12 April 2016 at 08:36, Michael Paquier wrote:
> The only way we really have at the moment is shared_preload_libraries.

That's exactly my point.

> If you rely on shared_preload_libraries, then "oops, I forgot to put
> myextension in shared_preload_libraries on the downstream" becomes a Bad
> Thing. There's no way to go back and redo the work you've passed over. You
> have to recreate the standby. Worse, it's silently wrong. The server has no
> idea it was meant to do anything and no way to tell the user it couldn't do
> what it was meant to do.

Well, one playing with the generic WAL record facility is

> We can't register redo routines in the catalogs because we don't have access
> to the syscaches etc during redo (right?).

Yes, the startup process does not have that knowledge.

> So how do we look at a generic log record, say "ok, the upstream that wrote
> this says it needs to invoke registered generic xlog hook 42, which is
> <func> from <extension> at <ptr>" ?
>
> Personally I'd be willing to accept the limitations of the s_p_l and hook
> approach to the point where I think we should add the hook and accept the
> caveats of its use ... but I understand the problems with it.

Honestly, so do I, and I could accept the use of a hook in
generic_redo in this code path which is really... Generic. Any other
infrastructure is going to be one brick shy of a load, and we are
actually sure that those routines are getting loaded once we reach the
first redo routine as far as I know. We could for example force the
hook to set some validation flags that are then checked in the generic
redo routine, and mark in the WAL record itself that this record
should have used the hook. If the record is replayed and this hook is
missing, then do a FATAL and prevent recovery to move on.
--
Michael


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 03:32:35
Message-ID: CAMsr+YFHp4DW9i_aQ11m2o7+NxoM+xxQoSMyVb7Thz3W7tee_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12 April 2016 at 10:09, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

> On Tue, Apr 12, 2016 at 10:54 AM, Craig Ringer <craig(at)2ndquadrant(dot)com>
> wrote:
>

> > If you rely on shared_preload_libraries, then "oops, I forgot to put
> > myextension in shared_preload_libraries on the downstream" becomes a Bad
> > Thing. There's no way to go back and redo the work you've passed over.
> You
> > have to recreate the standby. Worse, it's silently wrong. The server has
> no
> > idea it was meant to do anything and no way to tell the user it couldn't
> do
> > what it was meant to do.
>
> Well, one playing with the generic WAL record facility is
>

Yeah... that's what people say about a lot of things.

If it's there, someone will shoot their foot off with it and blame us.

There's a point where you have to say "well, that was dumb, you had to take
the safety off, remove the barrel plug attached to the giant sign saying
'do not remove', find and insert the bolt, and load the ammunition yourself
first, maybe you shouldn't have done that?"

However, that doesn't mean we should make it easy for a simple oversight to
have serious data correctness implications. I'm on the fence about whether
this counts; enough so that I wouldn't fight hard to get it in even if a
patch existed, which it doesn't, and we weren't past feature freeze, which
we are.

> Any other
> infrastructure is going to be one brick shy of a load

I disagree. It's very useful, just not for what you (and I) want to use it
for. It's still quite reasonable for custom index representations, and it
can be extended later.

> We could for example force the
> hook to set some validation flags that are then checked in the generic
> redo routine, and mark in the WAL record itself that this record
> should have used the hook. If the record is replayed and this hook is
> missing, then do a FATAL and prevent recovery to move on.
>
>
Right, but _which_ routine on the standby? How does the standby know which
hook must be called? You can't just say "any hook"; there might be multiple
ones interested in different generic WAL messages. You need a registration
system of some sort and a way for the standby and master to agree on how to
identify extensions that have redo hooks for generic WAL. Then you need to
be able to look up that registration in some manner during redo.

The simplest registration system would be something like a function called
at _PG_init time that passes a globally-unique integer identifier for the
extension that maps to some kind of central web-based registry where we
hand out IDs. Forever. Then the standby says "this xlog needs extension
with generic xlog id 42 but it's missing". But we all know how well those
generic global identifier systems work when people are testing and
prototyping stuff, want to change versions incompatibly, etc....

So I disagree it's as simple as a validation flag.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 12:14:41
Message-ID: CAPpHfdtsqZvMfHktkpwpaVNYz5cLy1B5ii3gW4e3Adz5abqKHw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 12, 2016 at 3:08 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

> What I thought we should be able to do with that should not be only
> limited to indexes, aka to:
> - Be able to register and log full pages
> - Be able to log data associated to a block
> - Be able to have record data
> - Use at recovery custom routines to apply those WAL records
> The first 3 ones are done via the set of routines generating WAL
> records for the rmgr RM_GENERIC_ID. The 4th one needs a hook in
> generic_redo. Looking at the thread where this patch has been debated
> I am not seeing a discussion regarding that.
>

I've designed generic xlog under following restrictions:
- We don't want users to register their own redo functions since it could
affect reliability. Unfortunately, I can't find original discussion now.
- API should be as simple as possible for extension author.

However, I think we should add some way for custom redo functions one day.
If we will add pluggable storage engines (I hope one day we will), then we
would face some engines which don't use our buffer manager. For such
pluggable storage engines, current generic WAL wouldn't work, but
user-defined redo functions would.

Now, my view is that we will need kind of two-phase recovery in order to
provide user-defined redo functions:
1) In the first phase, all built-in objects are recovered. After this
phase, we can use system catalog.
2) In the second phase, user-defined redo functions are called on the base
of consistent system catalog.

I think we can implement such two-phase recovery even now on the base of
generic logical messages. We can create logical slot. When extension is
loaded it starts second phase of recovery by reading generic logical
messages from this logical slot. Logical slot position should be also
advanced on checkpoint.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 12:48:09
Message-ID: CA+TgmoZeNf_-+V_vFu0pQ5jG89hiJD2TgvYRzhUMiwnLKt5kNw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Apr 11, 2016 at 9:54 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> If you rely on shared_preload_libraries, then "oops, I forgot to put
> myextension in shared_preload_libraries on the downstream" becomes a Bad
> Thing. There's no way to go back and redo the work you've passed over. You
> have to recreate the standby. Worse, it's silently wrong. The server has no
> idea it was meant to do anything and no way to tell the user it couldn't do
> what it was meant to do.

No, that's not right. If you rely on shared_preload_libraries, then
you'd just have the server die like this:

FATAL: I don't know how to replay a record for extension rmgr "craigrules".
HINT: Add the necessary library to shared_preload_libraries and
restart the server.

That seems 100% fine, although maybe we could adopt the convention
that the user-visible extension rmgr name should match the shared
library name, just as we do for logical decoding plugins (IIRC), and
the server can try to load a library by that name and continue.

> We can't register redo routines in the catalogs because we don't have access
> to the syscaches etc during redo (right?).

Correct.

> So how do we look at a generic log record, say "ok, the upstream that wrote
> this says it needs to invoke registered generic xlog hook 42, which is
> <func> from <extension> at <ptr>" ?

Record enough information in WAL that the rmgrs can have names instead
of ID numbers. Stick the list of extension rmgrs in use into each
checkpoint record and call it good.

> Personally I'd be willing to accept the limitations of the s_p_l and hook
> approach to the point where I think we should add the hook and accept the
> caveats of its use ... but I understand the problems with it. I understand
> not having custom redo routine support in this iteration of the patch. It's
> somewhat orthogonal job to this patch anyway - while handy for specific
> relation generic xlog, custom redo routines make most sense when combined
> with generic xlog that isn't necessarily associated with a specific
> relation. This patch doesn't support that.

Yeah. And while this patch may (probably does) have technical defects
that need to be cured, I don't think that's an argument against this
approach. For prototyping, this is totally fine. For
performance-critical stuff, probably not so much. But if you don't
give people a way to prototype stuff and extend the system, then they
won't be as likely to innovate and make new stuff, and then we won't
get as many new patches for core proposing new and innovative things.
I think it's great that we are finally getting close to make the
access method system truly extensible - that has been a clear need for
a long time, and I think we are going about it in the right way. We
can add more in the future.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 13:09:05
Message-ID: CAPpHfdtqe02CC-rD9CnPG0rTsXJMveMj9ztrWkkB8SkBKCDyHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Apr 10, 2016 at 7:49 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> 1. It doesn't seem like generic_xlog.c has thought very carefully about
> the semantics of the "hole" between pd_lower and pd_upper. The mainline
> XLOG code goes to some lengths to ensure that the hole stays all-zeroes;
> for example RestoreBlockImage() explicitly zeroes the hole when restoring
> from a full-page image that has a hole. But generic_xlog.c's redo routine
> does not do anything comparable, nor does GenericXLogFinish make any
> effort to ensure that the "hole" is all-zeroes after normal application of
> a generic update. The reason this is of interest is that it means the
> contents of the "hole" could diverge between master and slave, or differ
> between the original state of a database and what it is after a crash and
> recovery. That would at least complicate forensic comparisons of pages,
> and I think it might also break checksumming. We thought that this was
> important enough to take the trouble of explicitly zeroing holes during
> mainline XLOG replay. Shouldn't generic_xlog.c take the same trouble?
>

Attached patch is intended to fix this. It zeroes "hole" in both
GenericXLogFinish() and generic_redo().

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
generic-xlog-zero-gap.patch application/octet-stream 1.8 KB

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 13:33:05
Message-ID: CAPpHfduWpq22DHW+436p-rhz-xgf7HG=3nYts-j=cYjGyZ6MfA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 12, 2016 at 3:33 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> ... BTW, with respect to the documentation angle, it seems to me
> that it'd be better if GenericXLogRegister were renamed to
> GenericXLogRegisterBuffer, or perhaps GenericXLogRegisterPage.
> I think this would make the documentation clearer, and it would
> also make it easier to add other sorts of Register actions later,
> if we ever think of some (which seems not unlikely, really).
>
> Another thing to think about is whether we're going to regret
> hard-wiring the third argument as a boolean. Should we consider
> making it a bitmask of flags, instead? It's not terribly hard
> to think of other flags we might want there in future; for example
> maybe something to tell GenericXLogFinish whether it's worth trying
> to identify data movement on the page rather than just doing the
> byte-by-byte delta calculation.

I agree with both of these ideas. Patch is attached.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
generic-xlog-register.patch application/octet-stream 11.5 KB

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 13:36:24
Message-ID: CAMsr+YHBJbWGkbJAaYZvEBqyt0QYgZhbKwRwx4ZHkm7jQuTLgg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12 April 2016 at 20:48, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> > So how do we look at a generic log record, say "ok, the upstream that
> wrote
> > this says it needs to invoke registered generic xlog hook 42, which is
> > <func> from <extension> at <ptr>" ?
>
> Record enough information in WAL that the rmgrs can have names instead
> of ID numbers. Stick the list of extension rmgrs in use into each
> checkpoint record and call it good.
>

Repeating the mapping at each checkpoint sounds pretty reasonable and means
we always know what we need. There's no need to bloat each record with an
extension name and no need for any kind of ugly global registration. The
mapping table would be small and simple. I like it.

Of course, it's all maybe-in-future stuff at this point, but I think that's
a really good way to approach it.

There's no way around the fact that user defined redo functions can affect
reliability. But then, so can user-defined data types, functions,
bgworkers, plpython functions loading ctypes, plpython functions getting
creative in the datadir, and admins who paste into the wrong window. The
scope for problems is somewhat greater but not IMO prohibitively so.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 15:15:21
Message-ID: 1369.1460474121@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> writes:
> Attached patch is intended to fix this. It zeroes "hole" in both
> GenericXLogFinish() and generic_redo().

Pushed. I rewrote the comments a bit and modified GenericXLogFinish
so that it doesn't copy data only to overwrite it with zeroes.

regards, tom lane


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 15:34:24
Message-ID: 570D1580.2040504@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I agree with both of these ideas. Patch is attached.

Hmm, you accidentally forget to change flag type of GenericXLogRegister in
contrib/bloom signature.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 15:35:54
Message-ID: CAPpHfduigaVxE=hS_8YqqPPnOn5EWAB44kM8EZZeC1qcXGNwaQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 12, 2016 at 6:34 PM, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:

> I agree with both of these ideas. Patch is attached.
>>
>
> Hmm, you accidentally forget to change flag type of GenericXLogRegister in
> contrib/bloom signature.

Fixed.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
generic-xlog-register-2.patch application/octet-stream 11.5 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 15:42:46
Message-ID: 3290.1460475766@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> writes:
> I agree with both of these ideas. Patch is attached.

Pushed with minor cleanup.

regards, tom lane


From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 15:53:51
Message-ID: CAPpHfduOs+__FgC0jDnv6t0uZ5YpBvNsoE-5eVYNJuFY63dqtg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 12, 2016 at 6:42 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> writes:
> > I agree with both of these ideas. Patch is attached.
>
> Pushed with minor cleanup.
>

Great, thanks!

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some other things about contrib/bloom and generic_xlog.c
Date: 2016-04-12 16:28:10
Message-ID: CA+TgmoYgk5Z4B1u7eEp9QJDL+cC0ULuKnO4_SwyumahBAOqNvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 12, 2016 at 9:36 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> Repeating the mapping at each checkpoint sounds pretty reasonable and means
> we always know what we need. There's no need to bloat each record with an
> extension name and no need for any kind of ugly global registration. The
> mapping table would be small and simple. I like it.
>
> Of course, it's all maybe-in-future stuff at this point, but I think that's
> a really good way to approach it.
>
> There's no way around the fact that user defined redo functions can affect
> reliability. But then, so can user-defined data types, functions, bgworkers,
> plpython functions loading ctypes, plpython functions getting creative in
> the datadir, and admins who paste into the wrong window. The scope for
> problems is somewhat greater but not IMO prohibitively so.

I am in agreement with you on both the merits of this particular thing
and the general principle you are articulating regarding how to think
about these things.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company