From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: making the backend's json parser work in frontend code
Date: 2020-01-15 21:02:49
Message-ID: CA+TgmoYO8oq-iy8E02rD8eX25T-9SmyxKWqqks5OMHxKvGXpXQ@mail.gmail.com
Lists: pgsql-hackers
The discussion on the backup manifest thread has gotten bogged down on
the issue of the format that should be used to store the backup
manifest file. I want something simple and ad-hoc; David Steele and
Stephen Frost prefer JSON. That is problematic because our JSON parser
does not work in frontend code, and I want to be able to validate a
backup against its manifest, which involves being able to parse the
manifest from frontend code. The latest development over there is that
David Steele has posted the JSON parser that he wrote for pgbackrest
with an offer to try to adapt it for use in front-end PostgreSQL code,
an offer which I genuinely appreciate. I'll write more about that over
on that thread. However, I decided to spend today doing some further
investigation of an alternative approach, namely making the backend's
existing JSON parser work in frontend code as well. I did not solve
all the problems there, but I did come up with some patches which I
think would be worth committing on independent grounds, and I think
the whole series is worth posting. So here goes.
0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
missing something, this seems like an overdue cleanup. It's long been
the case that wchar.c is actually compiled and linked into both
frontend and backend code. Commit
60f11b87a2349985230c08616fa8a34ffde934c8 added code into src/common
that depends on wchar.c being available, but didn't actually make
wchar.c part of src/common, which seems like an odd decision: the
functions in the library are dependent on code that is not part of any
library but whose source files get copied around where needed. Eh?
0002 does some basic header cleanup to make it possible to include the
existing header file jsonapi.h in frontend code. The state of the JSON
headers today looks generally poor. There seems not to have been much
attempt to get the prototypes for a given source file, say foo.c, into
a header file with the same name, say foo.h. Also, dependencies
between various header files seem to have been added somewhat freely.
This patch does not come close to fixing all that, but I consider it a
modest down payment on a cleanup that probably ought to be taken
further.
0003 splits json.c into two files, json.c and jsonapi.c. All the
lexing and parsing stuff (whose prototypes are in jsonapi.h) goes into
jsonapi.c, while the stuff that pertains to the 'json' data type
remains in json.c. This also seems like a good cleanup, because to me,
at least, it's not a great idea to mix together code that is used by
both the json and jsonb data types as well as other things in the
system that want to generate or parse json together with things that
are specific to the 'json' data type.
As far as I know all three of the above patches are committable as-is;
review and contrary opinions welcome.
On the other hand, 0004, 0005, and 0006 are charitably described as
experimental or WIP. 0004 and 0005 hack up jsonapi.c so that it can
still be compiled even if #include "postgres.h" is changed to #include
"postgres-fe.h" and 0006 moves it into src/common. Note that I say
that they make it compile, not work. It's not just untested; it's
definitely broken. But it gives a feeling for what the remaining
obstacles to making this code available in a frontend environment are.
Since I wrote my very first email complaining about the difficulty of
making the backend's JSON parser work in a frontend environment, one
obstacle has been knocked down: StringInfo is now available in
front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). The
remaining problems (that I know about) have to do with error reporting
and multibyte character support; a read of the patches is suggested
for those wanting further details.
Suggestions welcome.
Thanks,
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
  0001-Move-wchar.c-to-src-common.patch (application/octet-stream, 1.9 KB)
  0002-Adjust-src-include-utils-jsonapi.h-so-it-s-not-backe.patch (application/octet-stream, 6.9 KB)
  0003-Split-JSON-lexer-parser-from-json-data-type-support.patch (application/octet-stream, 70.1 KB)
  0004-Introduce-json_error-macro.patch (application/octet-stream, 14.7 KB)
  0005-Frontendify-jsonapi.c.patch (application/octet-stream, 5.1 KB)
  0006-Move-jsonapi.c-to-src-common.patch (application/octet-stream, 1.6 KB)
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-15 22:47:54
Message-ID: 9340.1579128474@sss.pgh.pa.us
Lists: pgsql-hackers
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> ... However, I decided to spend today doing some further
> investigation of an alternative approach, namely making the backend's
> existing JSON parser work in frontend code as well. I did not solve
> all the problems there, but I did come up with some patches which I
> think would be worth committing on independent grounds, and I think
> the whole series is worth posting. So here goes.
In general, if we can possibly get to having one JSON parser in
src/common, that seems like an obviously better place to be than
having two JSON parsers. So I'm encouraged that it might be
feasible after all.
> 0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
> missing something, this seems like an overdue cleanup.
FWIW, I've been wanting to do that for awhile. I've not studied
your patch, but +1 for the idea. We might also need to take a
hard look at mbutils.c to see if any of that code can/should move.
> Since I wrote my very first email complaining about the difficulty of
> making the backend's JSON parser work in a frontend environment, one
> obstacle has been knocked down: StringInfo is now available in
> front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). The
> remaining problems (that I know about) have to do with error reporting
> and multibyte character support; a read of the patches is suggested
> for those wanting further details.
The patch I just posted at <2863(dot)1579127649(at)sss(dot)pgh(dot)pa(dot)us> probably
affects this in small ways, but not anything major.
regards, tom lane
From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-15 23:40:46
Message-ID: 20200115234046.lb4oeyvv7qf7h6vg@alap3.anarazel.de
Lists: pgsql-hackers
Hi,
On 2020-01-15 16:02:49 -0500, Robert Haas wrote:
> The discussion on the backup manifest thread has gotten bogged down on
> the issue of the format that should be used to store the backup
> manifest file. I want something simple and ad-hoc; David Steele and
> Stephen Frost prefer JSON. That is problematic because our JSON parser
> does not work in frontend code, and I want to be able to validate a
> backup against its manifest, which involves being able to parse the
> manifest from frontend code. The latest development over there is that
> David Steele has posted the JSON parser that he wrote for pgbackrest
> with an offer to try to adapt it for use in front-end PostgreSQL code,
> an offer which I genuinely appreciate. I'll write more about that over
> on that thread.
I'm not sure where I come down between using json and a simple ad-hoc
format, when the dependency for the former is making the existing json
parser work in the frontend. But if the alternative is to add a second
json parser, it very clearly shifts towards using an ad-hoc
format. Having to maintain a simple ad-hoc parser is a lot less
technical debt than having a second full-blown json parser, imo even
if an external project or three also has to have that simple parser.
If the alternative were to use that newly proposed json parser to
*replace* the backend one too, the story would again be different.
> 0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
> missing something, this seems like an overdue cleanup. It's long been
> the case that wchar.c is actually compiled and linked into both
> frontend and backend code. Commit
> 60f11b87a2349985230c08616fa8a34ffde934c8 added code into src/common
> that depends on wchar.c being available, but didn't actually make
> wchar.c part of src/common, which seems like an odd decision: the
> functions in the library are dependent on code that is not part of any
> library but whose source files get copied around where needed. Eh?
Cool.
> 0002 does some basic header cleanup to make it possible to include the
> existing header file jsonapi.h in frontend code. The state of the JSON
> headers today looks generally poor. There seems not to have been much
> attempt to get the prototypes for a given source file, say foo.c, into
> a header file with the same name, say foo.h. Also, dependencies
> between various header files seem to have been added somewhat freely.
> This patch does not come close to fixing all that, but I consider it a
> modest down payment on a cleanup that probably ought to be taken
> further.
Yea, this seems like a necessary cleanup (or well, maybe the start of
it).
> 0003 splits json.c into two files, json.c and jsonapi.c. All the
> lexing and parsing stuff (whose prototypes are in jsonapi.h) goes into
> jsonapi.c, while the stuff that pertains to the 'json' data type
> remains in json.c. This also seems like a good cleanup, because to me,
> at least, it's not a great idea to mix together code that is used by
> both the json and jsonb data types as well as other things in the
> system that want to generate or parse json together with things that
> are specific to the 'json' data type.
+1
> On the other hand, 0004, 0005, and 0006 are charitably described as
> experimental or WIP. 0004 and 0005 hack up jsonapi.c so that it can
> still be compiled even if #include "postgres.h" is changed to #include
> "postgres-fe.h" and 0006 moves it into src/common. Note that I say
> that they make it compile, not work. It's not just untested; it's
> definitely broken. But it gives a feeling for what the remaining
> obstacles to making this code available in a frontend environment are.
> Since I wrote my very first email complaining about the difficulty of
> making the backend's JSON parser work in a frontend environment, one
> obstacle has been knocked down: StringInfo is now available in
> front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). The
> remaining problems (that I know about) have to do with error reporting
> and multibyte character support; a read of the patches is suggested
> for those wanting further details.
> From d05e1fc82a51cb583a0367e72b1afc0de561dd00 Mon Sep 17 00:00:00 2001
> From: Robert Haas <rhaas(at)postgresql(dot)org>
> Date: Wed, 15 Jan 2020 10:36:52 -0500
> Subject: [PATCH 4/6] Introduce json_error() macro.
>
> ---
> src/backend/utils/adt/jsonapi.c | 221 +++++++++++++-------------------
> 1 file changed, 90 insertions(+), 131 deletions(-)
>
> diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
> index fc8af9f861..20f7f0f7ac 100644
> --- a/src/backend/utils/adt/jsonapi.c
> +++ b/src/backend/utils/adt/jsonapi.c
> @@ -17,6 +17,9 @@
> #include "miscadmin.h"
> #include "utils/jsonapi.h"
>
> +#define json_error(rest) \
> + ereport(ERROR, (rest, report_json_context(lex)))
> +
It's not obvious why the better approach here wouldn't be to just have a
very simple ereport replacement, that needs to be explicitly included
from frontend code. It'd not be meaningfully harder, imo, and it'd
require fewer adaptions, and it'd look more familiar.
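A stand-in along the lines Andres describes could be quite small. This is only a hypothetical sketch (the function name, the ERROR value, and the exit-on-error policy are all assumptions, not anything from the patches):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define ERROR 21                /* stand-in for the backend's elevel; value illustrative */

/*
 * Minimal frontend ereport() substitute: format the message to stderr
 * and exit at ERROR or above.  The real backend ereport() is a macro
 * with a much richer interface; this only mimics the overall shape.
 */
static void
fe_ereport(int elevel, const char *fmt, ...)
{
    va_list     ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);

    if (elevel >= ERROR)
        exit(1);                /* no transaction to abort in a frontend */
}
```

Whether simply exiting is the right policy is exactly the question debated downthread.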
> /* the null action object used for pure validation */
> @@ -701,7 +735,11 @@ json_lex_string(JsonLexContext *lex)
> ch = (ch * 16) + (*s - 'A') + 10;
> else
> {
> +#ifdef FRONTEND
> + lex->token_terminator = s + PQmblen(s, PG_UTF8);
> +#else
> lex->token_terminator = s + pg_mblen(s);
> +#endif
If we were to go this way, it seems like the ifdef should rather be in a
helper function, rather than all over. It seems like it should be
unproblematic to have a common interface for both frontend/backend?
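The helper shape being suggested could look something like the following, so callers stay ifdef-free. Since PQmblen()/pg_mblen() need libpq/backend headers, this standalone sketch maps both branches onto a toy UTF-8 length routine; the dispatch structure, not the byte counting, is the point, and every name here is hypothetical:

```c
#include <assert.h>

/*
 * Toy UTF-8 sequence-length routine standing in for the real
 * multibyte machinery, so this sketch is self-contained.
 */
static int
toy_utf8_mblen(const unsigned char *s)
{
    if (*s < 0x80)
        return 1;               /* ASCII */
    if ((*s & 0xE0) == 0xC0)
        return 2;               /* 2-byte sequence */
    if ((*s & 0xF0) == 0xE0)
        return 3;               /* 3-byte sequence */
    return 4;                   /* 4-byte sequence */
}

/*
 * One place to hide the frontend/backend difference; lexer callers
 * would just write json_mblen(s) with no #ifdef at the call site.
 */
#ifdef FRONTEND
#define json_mblen(s) toy_utf8_mblen((const unsigned char *) (s))  /* would be PQmblen(s, PG_UTF8) */
#else
#define json_mblen(s) toy_utf8_mblen((const unsigned char *) (s))  /* would be pg_mblen(s) */
#endif
```

With something like this, the quoted hunk would read the same in both builds: lex->token_terminator = s + json_mblen(s);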
Greetings,
Andres Freund
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-16 02:39:13
Message-ID: CA+TgmobEw4DBNWsNtJ269j8XSUngff46Kd2ow0askRzV0FccMg@mail.gmail.com
Lists: pgsql-hackers
On Wed, Jan 15, 2020 at 6:40 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> It's not obvious why the better approach here wouldn't be to just have a
> very simple ereport replacement, that needs to be explicitly included
> from frontend code. It'd not be meaningfully harder, imo, and it'd
> require fewer adaptions, and it'd look more familiar.
I agree that it's far from obvious that the hacks in the patch are
best; to the contrary, they are hacks. That said, I feel that the
semantics of throwing an error are not very well-defined in a
front-end environment. I mean, in a backend context, throwing an error
is going to abort the current transaction, with all that this implies.
If the frontend equivalent is to do nothing and hope for the best, I
doubt it will survive anything more than the simplest use cases. This
is one of the reasons I've been very reluctant to go down this
whole path in the first place.
> > +#ifdef FRONTEND
> > + lex->token_terminator = s + PQmblen(s, PG_UTF8);
> > +#else
> > lex->token_terminator = s + pg_mblen(s);
> > +#endif
>
> If we were to go this way, it seems like the ifdef should rather be in a
> helper function, rather than all over.
Sure... like I said, this is just to illustrate the problem.
> It seems like it should be
> unproblematic to have a common interface for both frontend/backend?
Not sure how. pg_mblen() and PQmblen() are both existing interfaces,
and they're not compatible with each other. I guess we could make
PQmblen() available to backend code, but given that the function name
implies an origin in libpq, that seems wicked confusing.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-16 03:35:28
Message-ID: 20200116033528.GC3117@paquier.xyz
Lists: pgsql-hackers
On Wed, Jan 15, 2020 at 09:39:13PM -0500, Robert Haas wrote:
> On Wed, Jan 15, 2020 at 6:40 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>> It's not obvious why the better approach here wouldn't be to just have a
>> very simple ereport replacement, that needs to be explicitly included
>> from frontend code. It'd not be meaningfully harder, imo, and it'd
>> require fewer adaptions, and it'd look more familiar.
>
> I agree that it's far from obvious that the hacks in the patch are
> best; to the contrary, they are hacks. That said, I feel that the
> semantics of throwing an error are not very well-defined in a
> front-end environment. I mean, in a backend context, throwing an error
> is going to abort the current transaction, with all that this implies.
> If the frontend equivalent is to do nothing and hope for the best, I
> doubt it will survive anything more than the simplest use cases. This
> is one of the reasons I've been very reluctant to go down this
> whole path in the first place.
Error handling is a well-defined concept in the backend. If connected
to a database, you know that a session has to roll back any existing
activity, etc. Clients have to be more flexible, because an error
depends a lot on how the tool is designed and how it should react to
an error. So the backend code in charge of logging an error does the
best it can: it throws an error, then lets the caller decide what to
do with it. I agree with the feeling that having a simple replacement
for ereport() in the frontend would be nice; that would mean less code
churn in parts shared by backend and frontend.
> Not sure how. pg_mblen() and PQmblen() are both existing interfaces,
> and they're not compatible with each other. I guess we could make
> PQmblen() available to backend code, but given that the function name
> implies an origin in libpq, that seems wicked confusing.
Well, the problem here is the encoding part, and both routines look at
the same table, pg_wchar_table[], in the end, so this needs some
thought. On top of that, on the client side we don't know exactly what
encoding is available (this led, for example, to several
assumptions/hiccups in the implementation of SCRAM, as it requires
UTF-8 per its RFC, when working on the libpq part).
--
Michael
From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-16 18:37:12
Message-ID: e39320c8-cfd2-9b10-255d-b3f2ffc2caaf@pgmasters.net
Lists: pgsql-hackers
Hi Robert,
On 1/15/20 2:02 PM, Robert Haas wrote:
> The discussion on the backup manifest thread has gotten bogged down on
> the issue of the format that should be used to store the backup
> manifest file. I want something simple and ad-hoc; David Steele and
> Stephen Frost prefer JSON. That is problematic because our JSON parser
> does not work in frontend code, and I want to be able to validate a
> backup against its manifest, which involves being able to parse the
> manifest from frontend code. The latest development over there is that
> David Steele has posted the JSON parser that he wrote for pgbackrest
> with an offer to try to adapt it for use in front-end PostgreSQL code,
> an offer which I genuinely appreciate. I'll write more about that over
> on that thread. However, I decided to spend today doing some further
> investigation of an alternative approach, namely making the backend's
> existing JSON parser work in frontend code as well. I did not solve
> all the problems there, but I did come up with some patches which I
> think would be worth committing on independent grounds, and I think
> the whole series is worth posting. So here goes.
I was starting to wonder if it wouldn't be simpler to go back to the
Postgres JSON parser and see if we can adapt it. I'm not sure that it
*is* simpler, but it would almost certainly be more acceptable.
> 0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
> missing something, this seems like an overdue cleanup. It's long been
> the case that wchar.c is actually compiled and linked into both
> frontend and backend code. Commit
> 60f11b87a2349985230c08616fa8a34ffde934c8 added code into src/common
> that depends on wchar.c being available, but didn't actually make
> wchar.c part of src/common, which seems like an odd decision: the
> functions in the library are dependent on code that is not part of any
> library but whose source files get copied around where needed. Eh?
This looks like an obvious improvement to me.
> 0002 does some basic header cleanup to make it possible to include the
> existing header file jsonapi.h in frontend code. The state of the JSON
> headers today looks generally poor. There seems not to have been much
> attempt to get the prototypes for a given source file, say foo.c, into
> a header file with the same name, say foo.h. Also, dependencies
> between various header files seem to have been added somewhat freely.
> This patch does not come close to fixing all that, but I consider it a
> modest down payment on a cleanup that probably ought to be taken
> further.
Agreed that these header files are fairly disorganized. In general the
names json, jsonapi, jsonfuncs don't tell me a whole lot. I feel like
I'd want to include json.h to get a json parser, but before these
patches it contains only one utility function. I can see that json.c
primarily contains SQL functions, so that's why.
So the idea here is that json.c will have the JSON SQL functions,
jsonb.c the JSONB SQL functions, and jsonapi.c the parser, and
jsonfuncs.c the utility functions?
> 0003 splits json.c into two files, json.c and jsonapi.c. All the
> lexing and parsing stuff (whose prototypes are in jsonapi.h) goes into
> jsonapi.c, while the stuff that pertains to the 'json' data type
> remains in json.c. This also seems like a good cleanup, because to me,
> at least, it's not a great idea to mix together code that is used by
> both the json and jsonb data types as well as other things in the
> system that want to generate or parse json together with things that
> are specific to the 'json' data type.
This seems like a good first step. I wonder if the remainder of the SQL
json/jsonb functions should be moved to json.c/jsonb.c respectively?
That does represent a lot of code churn though, so perhaps not worth it.
> As far as I know all three of the above patches are committable as-is;
> review and contrary opinions welcome.
Agreed, with some questions as above.
> On the other hand, 0004, 0005, and 0006 are charitably described as
> experimental or WIP. 0004 and 0005 hack up jsonapi.c so that it can
> still be compiled even if #include "postgres.h" is changed to #include
> "postgres-fe.h" and 0006 moves it into src/common. Note that I say
> that they make it compile, not work. It's not just untested; it's
> definitely broken. But it gives a feeling for what the remaining
> obstacles to making this code available in a frontend environment are.
> Since I wrote my very first email complaining about the difficulty of
> making the backend's JSON parser work in a frontend environment, one
> obstacle has been knocked down: StringInfo is now available in
> front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). The
> remaining problems (that I know about) have to do with error reporting
> and multibyte character support; a read of the patches is suggested
> for those wanting further details.
Well, with the caveat that it doesn't work, it's less than I expected.
Obviously ereport() is a pretty big deal and I agree with Michael
downthread that we should port this to the frontend code.
It would also be nice to unify functions like PQmblen() and pg_mblen()
if possible.
The next question in my mind is, given the caveat that the error handling
is questionable in the front end, can we at least render/parse valid
JSON with the code?
Regards,
--
-David
david(at)pgmasters(dot)net
From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-16 18:39:31
Message-ID: 68da9d60-fbbb-0400-0755-2d04a212f023@pgmasters.net
Lists: pgsql-hackers
Hi Robert,
On 1/16/20 11:37 AM, David Steele wrote:
>
> The next question in my mind is, given the caveat that the error handling
> is questionable in the front end, can we at least render/parse valid
> JSON with the code?
Hrm, this bit was from an earlier edit. I meant:
The next question in my mind is what will it take to get this working in
a limited form so we can at least prototype it with pg_basebackup. I
can hack on this with some static strings in front end code tomorrow to
see what works and what doesn't if that makes sense.
Regards,
--
-David
david(at)pgmasters(dot)net
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-16 18:51:34
Message-ID: CA+TgmoaZqLoFusvDcVve-gbG_ip3HYr+FqPq7k0jV0ta4t3hdg@mail.gmail.com
Lists: pgsql-hackers
On Thu, Jan 16, 2020 at 1:37 PM David Steele <david(at)pgmasters(dot)net> wrote:
> I was starting to wonder if it wouldn't be simpler to go back to the
> Postgres JSON parser and see if we can adapt it. I'm not sure that it
> *is* simpler, but it would almost certainly be more acceptable.
That is my feeling also.
> So the idea here is that json.c will have the JSON SQL functions,
> jsonb.c the JSONB SQL functions, and jsonapi.c the parser, and
> jsonfuncs.c the utility functions?
Uh, I think roughly that, yes. Although I can't claim to fully
understand everything that's here.
> This seems like a good first step. I wonder if the remainder of the SQL
> json/jsonb functions should be moved to json.c/jsonb.c respectively?
>
> That does represent a lot of code churn though, so perhaps not worth it.
I don't have an opinion on this right now.
> Well, with the caveat that it doesn't work, it's less than I expected.
>
> Obviously ereport() is a pretty big deal and I agree with Michael
> downthread that we should port this to the frontend code.
Another possibly-attractive option would be to defer throwing the
error: i.e. set some flags in the lex or parse state or something, and
then just return. The caller notices the flags and has enough
information to throw an error or whatever it wants to do. The reason I
think this might be attractive is that it dodges the whole question of
what exactly throwing an error is supposed to do in a world without
transactions, memory contexts, resource owners, etc. However, it has
some pitfalls of its own, like maybe being too much code churn or
hurting performance in non-error cases.
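To make the deferred-error idea concrete, here is a hypothetical sketch: the lexer records an error code and position in its own state and simply returns, leaving the caller to decide what to do. None of these names match the actual JsonLexContext fields; they only illustrate the control flow:

```c
#include <assert.h>
#include <stddef.h>

typedef enum
{
    JSON_SUCCESS,
    JSON_INVALID_TOKEN
} JsonParseErrorType;

typedef struct
{
    const char *input;          /* text being lexed */
    size_t      err_pos;        /* where the problem was seen */
    JsonParseErrorType err;     /* deferred error, if any */
} SketchLexContext;

/*
 * Check whether a JSON value could start at 'pos'.  On a bad byte,
 * record the error in the lex state and return it; the caller can
 * then ereport(), print to stderr, or ignore, as appropriate.
 */
static JsonParseErrorType
sketch_lex_start(SketchLexContext *lex, size_t pos)
{
    char        c = lex->input[pos];

    if (c == '{' || c == '[' || c == '"' || c == '-' ||
        (c >= '0' && c <= '9') || c == 't' || c == 'f' || c == 'n')
        return JSON_SUCCESS;

    lex->err = JSON_INVALID_TOKEN;
    lex->err_pos = pos;
    return lex->err;
}
```

The stored position is enough for the caller to rebuild a context line, which is the part ereport()'s report_json_context() currently handles inside the parser.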
> It would also be nice to unify functions like PQmblen() and pg_mblen()
> if possible.
I don't see how to do that at the moment, but I agree that it would be
nice if we can figure it out.
> The next question in my mind is, given the caveat that the error handling
> is questionable in the front end, can we at least render/parse valid
> JSON with the code?
That's a real good question. Thanks for offering to test it; I think
that would be very helpful.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: David Steele <david(at)pgmasters(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-16 18:58:55
Message-ID: dd72960a-6f10-9db6-178d-e06769454142@pgmasters.net
Lists: pgsql-hackers
On 1/15/20 4:40 PM, Andres Freund wrote:
>
> I'm not sure where I come down between using json and a simple ad-hoc
> format, when the dependency for the former is making the existing json
> parser work in the frontend. But if the alternative is to add a second
> json parser, it very clearly shifts towards using an ad-hoc
> format. Having to maintain a simple ad-hoc parser is a lot less
> technical debt than having a second full blown json parser.
Maybe at first, but it will grow and become more complex as new features
are added. This has been our experience with pgBackRest, at least.
> Imo even
> when an external project or three also has to have that simple parser.
I don't agree here. Especially if we outgrow the format and they need
two parsers, depending on the version of PostgreSQL.
To do page-level incrementals (which this feature is intended to enable),
the user will need to be able to associate full and incremental backups,
and the only way I see to do that (currently) is to read the manifests,
since the prior backup should be stored there. I think this means that
parsing the manifest is not really optional -- it will be required to do
any kind of automation with incrementals.
It's easy enough for a tool like pgBackRest to do something like that,
much harder for a user hacking together a tool in bash based on
pg_basebackup.
> If the alternative were to use that newly proposed json parser to
> *replace* the backend one too, the story would again be different.
That was certainly not my intention.
Regards,
--
-David
david(at)pgmasters(dot)net
From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-16 19:11:14
Message-ID: c0297362-9a4d-b51f-aec2-c752d98f7d12@pgmasters.net
Lists: pgsql-hackers
On 1/15/20 7:39 PM, Robert Haas wrote:
> On Wed, Jan 15, 2020 at 6:40 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>> It's not obvious why the better approach here wouldn't be to just have a
>> very simple ereport replacement, that needs to be explicitly included
>> from frontend code. It'd not be meaningfully harder, imo, and it'd
>> require fewer adaptions, and it'd look more familiar.
>
> I agree that it's far from obvious that the hacks in the patch are
> best; to the contrary, they are hacks. That said, I feel that the
> semantics of throwing an error are not very well-defined in a
> front-end environment. I mean, in a backend context, throwing an error
> is going to abort the current transaction, with all that this implies.
> If the frontend equivalent is to do nothing and hope for the best, I
> doubt it will survive anything more than the simplest use cases. This
> is one of the reasons I've been very reluctant to go down this
> whole path in the first place.
The way we handle this in pgBackRest is to put a TRY ... CATCH block in
main() to log and exit on any uncaught THROW. That seems like a
reasonable way to start here. Without memory contexts, that will almost
certainly mean memory leaks, but I'm not sure how much that matters if
the action is to exit immediately.
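The top-level catch described here can be sketched with setjmp/longjmp standing in for pgBackRest's TRY ... CATCH macros; everything below is illustrative, not pgBackRest's actual API:

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

static jmp_buf top_level_catch;
static const char *thrown_msg;

/* THROW equivalent: stash the message and unwind to the handler. */
static void
fe_throw(const char *msg)
{
    thrown_msg = msg;
    longjmp(top_level_catch, 1);
}

/*
 * main() would wrap its real work like this: run the body under a
 * catch block, log any uncaught throw, and report failure.
 */
static int
run_with_catch(void (*body)(void))
{
    if (setjmp(top_level_catch) != 0)
    {
        fprintf(stderr, "error: %s\n", thrown_msg);
        return 1;               /* main would exit(1) here */
    }
    body();
    return 0;
}

static void
ok_body(void)
{
    /* completes without throwing */
}

static void
failing_body(void)
{
    fe_throw("invalid JSON");
}
```

Since the handler exits rather than resuming, any memory allocated before the throw is simply leaked to process exit, which is the trade-off noted above.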
Regards,
--
-David
david(at)pgmasters(dot)net
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-16 19:20:28
Message-ID: 7064.1579202428@sss.pgh.pa.us
Lists: pgsql-hackers
David Steele <david(at)pgmasters(dot)net> writes:
> On 1/15/20 7:39 PM, Robert Haas wrote:
>>> I agree that it's far from obvious that the hacks in the patch are
>>> best; to the contrary, they are hacks. That said, I feel that the
>>> semantics of throwing an error are not very well-defined in a
>>> front-end environment. I mean, in a backend context, throwing an error
>>> is going to abort the current transaction, with all that this implies.
>>> If the frontend equivalent is to do nothing and hope for the best, I
>>> doubt it will survive anything more than the simplest use cases. This
>>> is one of the reasons I've been very reluctant to go down this
>>> whole path in the first place.
> The way we handle this in pgBackRest is to put a TRY ... CATCH block in
> main() to log and exit on any uncaught THROW. That seems like a
> reasonable way to start here. Without memory contexts that almost
> certainly will mean memory leaks but I'm not sure how much that matters
> if the action is to exit immediately.
If that's the expectation, we might as well replace backend ereport(ERROR)
with something that just prints a message and does exit(1).
The question comes down to whether there are use-cases where a frontend
application would really want to recover and continue processing after
a JSON syntax problem. I'm not seeing that that's a near-term
requirement, so maybe we could leave it for somebody to solve when
and if they want to do it.
regards, tom lane
From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Steele <david(at)pgmasters(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-16 19:26:24
Message-ID: 20200116192624.yo6aecandxkzf35m@alap3.anarazel.de
Hi,
On 2020-01-16 14:20:28 -0500, Tom Lane wrote:
> David Steele <david(at)pgmasters(dot)net> writes:
> > The way we handle this in pgBackRest is to put a TRY ... CATCH block in
> > main() to log and exit on any uncaught THROW. That seems like a
> > reasonable way to start here. Without memory contexts that almost
> > certainly will mean memory leaks but I'm not sure how much that matters
> > if the action is to exit immediately.
>
> If that's the expectation, we might as well replace backend ereport(ERROR)
> with something that just prints a message and does exit(1).
Well, the process might still want to do some cleanup of half-finished
work. You'd not need to be resistant against memory leaks to do so, if
followed by an exit. Obviously you can also do all the necessary
cleanup from within the ereport(ERROR) itself, but that doesn't seem
appealing to me (not composable, harder to reuse for other programs,
etc).
Greetings,
Andres Freund
From: David Steele <david(at)pgmasters(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-16 20:10:53
Message-ID: ac6a122e-17c5-c962-6e7e-73cde7ffdda6@pgmasters.net
On 1/16/20 12:26 PM, Andres Freund wrote:
> Hi,
>
> On 2020-01-16 14:20:28 -0500, Tom Lane wrote:
>> David Steele <david(at)pgmasters(dot)net> writes:
>>> The way we handle this in pgBackRest is to put a TRY ... CATCH block in
>>> main() to log and exit on any uncaught THROW. That seems like a
>>> reasonable way to start here. Without memory contexts that almost
>>> certainly will mean memory leaks but I'm not sure how much that matters
>>> if the action is to exit immediately.
>>
>> If that's the expectation, we might as well replace backend ereport(ERROR)
>> with something that just prints a message and does exit(1).
>
> Well, the process might still want to do some cleanup of half-finished
> work. You'd not need to be resistant against memory leaks to do so, if
> followed by an exit. Obviously you can also do all the necessary
> cleanup from within the ereport(ERROR) itself, but that doesn't seem
> appealing to me (not composable, harder to reuse for other programs,
> etc).
In pgBackRest we have a default handler that just logs the message to
stderr and exits (though we consider it a coding error if it gets
called). Seems like we could do the same here. Default message and
exit if no handler, but optionally allow a handler (which could RETHROW
to get to the default handler afterwards).
It seems like we've been wanting a front end version of ereport() for a
while so I'll take a look at that and see what it involves.
Regards,
--
-David
david(at)pgmasters(dot)net
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-16 20:11:15
Message-ID: 15738.1579205475@sss.pgh.pa.us
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> 0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
> missing something, this seems like an overdue cleanup.
Here's a reviewed version of 0001. You missed fixing the MSVC build,
and there were assorted comments and other things referencing wchar.c
that needed to be cleaned up.
Also, it seemed to me that if we are going to move wchar.c, we should
also move encnames.c, so that libpq can get fully out of the
symlinking-source-files business. It makes initdb less weird too.
I took the liberty of sticking proper copyright headers onto these
two files, too. (This makes the diff a lot more bulky :-(. Would
it help to add the headers in a separate commit?)
Another thing I'm wondering about is if any of the #ifndef FRONTEND
code should get moved *back* to src/backend/utils/mb. But that
could be a separate commit, too.
Lastly, it strikes me that maybe pg_wchar.h, or parts of it, should
migrate over to src/include/common. But that'd be far more invasive
to other source files, so I've not touched the issue here.
regards, tom lane
Attachment: 0001-move-wchar-and-encnames-to-src-common.patch (text/x-diff, 126.3 KB)
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-16 20:21:38
Message-ID: CA+TgmoaoO-pFD29xm0izRBYpBEqKA23ZhQ6Wdjy=tfeNtL_Pfw@mail.gmail.com
On Thu, Jan 16, 2020 at 3:11 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > 0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
> > missing something, this seems like an overdue cleanup.
>
> Here's a reviewed version of 0001. You missed fixing the MSVC build,
> and there were assorted comments and other things referencing wchar.c
> that needed to be cleaned up.
Wow, thanks.
> Also, it seemed to me that if we are going to move wchar.c, we should
> also move encnames.c, so that libpq can get fully out of the
> symlinking-source-files business. It makes initdb less weird too.
OK.
> I took the liberty of sticking proper copyright headers onto these
> two files, too. (This makes the diff a lot more bulky :-(. Would
> it help to add the headers in a separate commit?)
I wouldn't bother making it a separate commit, but please do whatever you like.
> Another thing I'm wondering about is if any of the #ifndef FRONTEND
> code should get moved *back* to src/backend/utils/mb. But that
> could be a separate commit, too.
+1 for moving that stuff to a separate backend-only file.
> Lastly, it strikes me that maybe pg_wchar.h, or parts of it, should
> migrate over to src/include/common. But that'd be far more invasive
> to other source files, so I've not touched the issue here.
I don't have a view on this.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-16 20:44:37
Message-ID: CA+TgmobZD+tfzK55_A_FepgML0H1LgNXCwQPQKs3qewqQToRWQ@mail.gmail.com
On Thu, Jan 16, 2020 at 1:58 PM David Steele <david(at)pgmasters(dot)net> wrote:
> To do page-level incrementals (which this feature is intended to enable)
> the user will need to be able to associate full and incremental backups
> and the only way I see to do that (currently) is to read the manifests,
> since the prior backup should be stored there. I think this means that
> parsing the manifest is not really optional -- it will be required to do
> any kind of automation with incrementals.
My current belief is that enabling incremental backup will require
extending the manifest format either not at all or by adding one
additional line with some LSN info.
If we could foresee a need to store a bunch of additional *per-file*
details, I'd be a lot more receptive to the argument that we ought to
be using a more structured format like JSON. And it doesn't seem
impossible that such a thing could happen, but I don't think it's at
all clear that it actually will happen, or that it will happen soon
enough that we ought to be worrying about it now.
It's possible that we're chasing a real problem here, and if there's
something we can agree on and get done I'd rather do that than argue,
but I am still quite suspicious that there's no actually serious
technical problem here.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-16 21:24:42
Message-ID: 29299.1579209882@sss.pgh.pa.us
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Jan 16, 2020 at 3:11 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Here's a reviewed version of 0001. You missed fixing the MSVC build,
>> and there were assorted comments and other things referencing wchar.c
>> that needed to be cleaned up.
> Wow, thanks.
Pushed that.
>> Another thing I'm wondering about is if any of the #ifndef FRONTEND
>> code should get moved *back* to src/backend/utils/mb. But that
>> could be a separate commit, too.
> +1 for moving that stuff to a separate backend-only file.
After a brief look, I propose the following:
* I think we should just shove the "#ifndef FRONTEND" stuff in
wchar.c into mbutils.c. It doesn't seem worth inventing a whole
new file for that code, especially when it's arguably within the
remit of mbutils.c anyway.
* Let's remove the "#ifndef FRONTEND" restriction on the ICU-related
stuff in encnames.c. Even if we don't need that stuff in frontend
today, it's hardly unlikely that we will need it tomorrow. And there's
not that much bulk there anyway.
* The one positive reason for that restriction is the ereport() in
get_encoding_name_for_icu. We could change that to be the usual
#ifdef-ereport-or-printf dance, but I think there's a better way: put
the ereport at the caller, by redefining that function to return NULL
for an unsupported encoding. There's only one caller today anyhow.
* PG_char_to_encoding() and PG_encoding_to_char() can be moved to
mbutils.c; they'd fit reasonably well beside getdatabaseencoding and
pg_client_encoding. (I also thought about utils/adt/misc.c, but
that's not obviously better.)
Barring objections I'll go make this happen shortly.
>> Lastly, it strikes me that maybe pg_wchar.h, or parts of it, should
>> migrate over to src/include/common. But that'd be far more invasive
>> to other source files, so I've not touched the issue here.
> I don't have a view on this.
If anyone is hot to do this part, please have at it. I'm not.
regards, tom lane
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-16 23:15:40
Message-ID: 15570.1579216540@sss.pgh.pa.us
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> It's possible that we're chasing a real problem here, and if there's
> something we can agree on and get done I'd rather do that than argue,
> but I am still quite suspicious that there's no actually serious
> technical problem here.
It's entirely possible that you're right. But if this is a file format
that is meant to be exposed to user tools, we need to take a very long
view of the requirements for it. Five or ten years down the road, we
might be darn glad we spent extra time now.
regards, tom lane
From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-17 01:54:57
Message-ID: CAA8=A7_Pc=ZGbedYFJ9XBi9BVsjGDBUvZ=BpAKQ=COhz7MeXVA@mail.gmail.com
On Thu, Jan 16, 2020 at 7:33 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>
> 0002 does some basic header cleanup to make it possible to include the
> existing header file jsonapi.h in frontend code. The state of the JSON
> headers today looks generally poor. There seems not to have been much
> attempt to get the prototypes for a given source file, say foo.c, into
> a header file with the same name, say foo.h. Also, dependencies
> between various header files seem to be have added somewhat freely.
> This patch does not come close to fixing all that, but I consider it a
> modest down payment on a cleanup that probably ought to be taken
> further.
>
> 0003 splits json.c into two files, json.c and jsonapi.c. All the
> lexing and parsing stuff (whose prototypes are in jsonapi.h) goes into
> jsonapi.c, while the stuff that pertains to the 'json' data type
> remains in json.c. This also seems like a good cleanup, because to me,
> at least, it's not a great idea to mix together code that is used by
> both the json and jsonb data types as well as other things in the
> system that want to generate or parse json together with things that
> are specific to the 'json' data type.
>
I'm probably responsible for a good deal of the mess, so let me say thank you.
I'll have a good look at these.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-17 17:36:26
Message-ID: f7a23062-18d6-8aa3-e168-23f5bc38652a@pgmasters.net
Hi Robert,
On 1/16/20 11:51 AM, Robert Haas wrote:
> On Thu, Jan 16, 2020 at 1:37 PM David Steele <david(at)pgmasters(dot)net> wrote:
>
>> The next question in my mind is given the caveat that the error handing
>> is questionable in the front end, can we at least render/parse valid
>> JSON with the code?
>
> That's a real good question. Thanks for offering to test it; I think
> that would be very helpful.
It seems to work just fine. I didn't stress it too hard but I did put
in one escape and a multi-byte character and checked the various data types.
Attached is a test hack on pg_basebackup which produces this output:
START
FIELD "number", null 0
SCALAR TYPE 2: 123
FIELD "string", null 0
SCALAR TYPE 1: val ue-丏
FIELD "bool", null 0
SCALAR TYPE 9: true
FIELD "null", null 1
SCALAR TYPE 11: null
END
I used the callbacks because that's the first method I found but it
seems like json_lex() might be easier to use in practice.
I think it's an issue that the entire string must be passed to the lexer
at once. That will not be great for large manifests. However, I don't
think it will be all that hard to implement an optional "want more"
callback in the lexer so JSON data can be fed in from the file in chunks.
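The "want more" idea above might look roughly like this sketch, with invented names rather than the actual lexer interface: the lexer pulls bytes through a caller-supplied refill callback instead of requiring the whole manifest string up front.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: refill buf from the file, return bytes read. */
typedef size_t (*json_refill_cb) (void *state, char *buf, size_t bufsize);

typedef struct ChunkedSource
{
    char            buf[256];
    size_t          len;        /* valid bytes in buf */
    size_t          pos;        /* next byte to consume */
    json_refill_cb  refill;
    void           *refill_state;
} ChunkedSource;

/* Next input byte, refilling via the callback as needed; -1 = end. */
static int
chunked_next_byte(ChunkedSource *src)
{
    if (src->pos >= src->len)
    {
        src->len = src->refill(src->refill_state, src->buf, sizeof(src->buf));
        src->pos = 0;
        if (src->len == 0)
            return -1;
    }
    return (unsigned char) src->buf[src->pos++];
}

/* Demo callback: feeds a string 4 bytes at a time, as a file reader might. */
typedef struct StringState
{
    const char *s;
    size_t      off;
} StringState;

static size_t
string_refill(void *state, char *buf, size_t bufsize)
{
    StringState *ss = state;
    size_t      n = strlen(ss->s + ss->off);

    if (n > 4)
        n = 4;
    if (n > bufsize)
        n = bufsize;
    memcpy(buf, ss->s + ss->off, n);
    ss->off += n;
    return n;
}
```

A real lexer would also need to handle a token that straddles a refill boundary, which is where most of the actual work would be.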
So, that just leaves ereport() as the largest remaining issue? I'll
look at that today and Tuesday and see what I can work up.
Regards,
--
-David
david(at)pgmasters(dot)net
Attachment: json-api-client-test.diff (text/plain, 1.4 KB)
From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-17 17:41:10
Message-ID: 68400cc9-ed53-2d2f-4c67-b0fa3d0aab02@pgmasters.net
Hi Robert,
On 1/16/20 11:51 AM, Robert Haas wrote:
> On Thu, Jan 16, 2020 at 1:37 PM David Steele <david(at)pgmasters(dot)net> wrote:
>
>> So the idea here is that json.c will have the JSON SQL functions,
>> jsonb.c the JSONB SQL functions, and jsonapi.c the parser, and
>> jsonfuncs.c the utility functions?
>
> Uh, I think roughly that, yes. Although I can't claim to fully
> understand everything that's here.
Now that I've spent some time with the code I see your intent was just
to isolate the JSON lexer code with 0002 and 0003. As such, I now think
they are commit-able as is.
Regards,
--
-David
david(at)pgmasters(dot)net
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-17 21:24:21
Message-ID: CA+Tgmoa7wxh+M32mWLdAsYuzYCz=_76zD3grozDy_1M1jWCp7w@mail.gmail.com
On Thu, Jan 16, 2020 at 8:55 PM Andrew Dunstan
<andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
> I'm probably responsible for a good deal of the mess, so let me say thank you.
>
> I'll have a good look at these.
Thanks, appreciated.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-17 21:33:48
Message-ID: CA+TgmoauCHfN_7+mT9rn9VSSMU55PpXEbLwMZU5cZW=5Je5J5w@mail.gmail.com
On Fri, Jan 17, 2020 at 12:36 PM David Steele <david(at)pgmasters(dot)net> wrote:
> It seems to work just fine. I didn't stress it too hard but I did put
> in one escape and a multi-byte character and checked the various data types.
Cool.
> I used the callbacks because that's the first method I found but it
> seems like json_lex() might be easier to use in practice.
Ugh, really? That doesn't seem like it would be nice at all.
> I think it's an issue that the entire string must be passed to the lexer
> at once. That will not be great for large manifests. However, I don't
> think it will be all that hard to implement an optional "want more"
> callback in the lexer so JSON data can be fed in from the file in chunks.
I thought so initially, but now I'm not so sure. The thing is, you
actually need all the manifest data in memory at once anyway, or so I
think. You're essentially doing a "full join" between the contents of
the manifest and the contents of the file system, so you've got to
scan one (probably the filesystem) and then mark entries in the other
(probably the manifest) used as you go.
But this might need more thought. The details probably depend on
exactly how you design it all.
> So, that just leaves ereport() as the largest remaining issue? I'll
> look at that today and Tuesday and see what I can work up.
PFA my work on that topic. As compared with my previous patch series,
the previous 0001 is dropped and what are now 0001 and 0002 are the
same as patches from the previous series. 0003 and 0004 are aiming
toward getting rid of ereport() and, I believe, show a plausible
strategy for so doing. There are, possibly, things not to like here,
and it's certainly incomplete, but I think I kinda like this
direction. Comments appreciated.
0003 nukes lex_accept(), inlining the logic into callers. I found that
the refactoring I wanted to do in 0004 was pretty hard without this,
and it turns out to save code, so I think this is a good idea
independently of anything else.
0004 adjusts many functions in jsonapi.c to return a new enumerated
type, JsonParseErrorType, instead of directly doing ereport(). It adds
a new function that takes this value and a lexing context and throws
an error. The JSON_ESCAPING_INVALID case is wrong and involves a gross
hack, but that's fixable with another field in the lexing context.
More work is needed to really bring this up to scratch, but the idea
is to make this code have a soft dependency on ereport() rather than a
hard one.
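The shape of that approach can be sketched as follows, with illustrative names rather than the patch's actual identifiers: parsing routines report failure through a returned enum, and only a single reporting function needs to know about ereport() (or, in a frontend, printing and exiting).

```c
#include <stddef.h>

/* Hypothetical error codes a parser routine can return. */
typedef enum JsonParseErrorType
{
    JSON_SUCCESS,
    JSON_EXPECTED_OBJECT_FIRST,
    JSON_INVALID_TOKEN
} JsonParseErrorType;

/* Hypothetical, much-reduced lexing context. */
typedef struct JsonLexContext
{
    const char *input;
    size_t      token_start;
} JsonLexContext;

/* A parser piece: reports failure via the return value only. */
static JsonParseErrorType
json_expect_object_start(JsonLexContext *lex)
{
    if (lex->input[lex->token_start] != '{')
        return JSON_EXPECTED_OBJECT_FIRST;
    return JSON_SUCCESS;
}

/*
 * The one place that turns a code into a message.  In the backend this
 * caller could ereport(); a frontend tool can print it and exit.
 */
static const char *
json_error_message(JsonParseErrorType error, JsonLexContext *lex)
{
    (void) lex;                 /* a real version would report position */
    switch (error)
    {
        case JSON_SUCCESS:
            return NULL;
        case JSON_EXPECTED_OBJECT_FIRST:
            return "expected '{'";
        default:
            return "invalid token";
    }
}
```

This is the "soft dependency" structure: the parser itself never calls into the error-reporting machinery.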
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
- v2-0001-Adjust-src-include-utils-jsonapi.h-so-it-s-not-ba.patch (application/octet-stream, 6.9 KB)
- v2-0002-Split-JSON-lexer-parser-from-json-data-type-suppo.patch (application/octet-stream, 70.1 KB)
- v2-0004-WIP-Return-errors-rather-than-using-ereport.patch (application/octet-stream, 32.4 KB)
- v2-0003-Remove-jsonapi.c-s-lex_accept.patch (application/octet-stream, 5.9 KB)
From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-20 07:02:37
Message-ID: 0B442DD9-0C4D-4BC3-9753-3EC23563415A@enterprisedb.com
> On Jan 16, 2020, at 1:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>>>
>>> Lastly, it strikes me that maybe pg_wchar.h, or parts of it, should
>>> migrate over to src/include/common. But that'd be far more invasive
>>> to other source files, so I've not touched the issue here.
>
>> I don't have a view on this.
>
> If anyone is hot to do this part, please have at it. I'm not.
I moved the file pg_wchar.h into src/include/common and split out
most of the functions you marked as being suitable for the
backend only into a new file src/include/utils/mbutils.h. That
resulted in the need to include this new “utils/mbutils.h” from
a number of .c files in the source tree.
One issue that came up was libpq/pqformat.h uses a couple
of those functions from within static inline functions, preventing
me from moving those to a backend-only include file without
making pqformat.h a backend-only include file.
I think the right thing to do here is to move references to these
functions into pqformat.c by un-inlining these functions. I have
not done that yet.
There are whitespace cleanup issues I’m not going to fix just
yet, since I’ll be making more changes anyway. What do you
think of the direction I’m taking in the attached?
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment: pg_wchar.cleanup.patch.1 (application/octet-stream, 74.8 KB)
From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-21 22:34:55
Message-ID: 3649c0bb-a4f2-e59b-3322-ea740a179b2f@pgmasters.net
Hi Robert,
On 1/17/20 2:33 PM, Robert Haas wrote:
> On Fri, Jan 17, 2020 at 12:36 PM David Steele <david(at)pgmasters(dot)net> wrote:
>
>> I used the callbacks because that's the first method I found but it
>> seems like json_lex() might be easier to use in practice.
>
> Ugh, really? That doesn't seem like it would be nice at all.
I guess it's a matter of how you want to structure the code.
>> I think it's an issue that the entire string must be passed to the lexer
>> at once. That will not be great for large manifests. However, I don't
>> think it will be all that hard to implement an optional "want more"
>> callback in the lexer so JSON data can be fed in from the file in chunks.
>
> I thought so initially, but now I'm not so sure. The thing is, you
> actually need all the manifest data in memory at once anyway, or so I
> think. You're essentially doing a "full join" between the contents of
> the manifest and the contents of the file system, so you've got to
> scan one (probably the filesystem) and then mark entries in the other
> (probably the manifest) used as you go.
Yeah, having a copy of the manifest in memory is the easiest way to do
validation, but I think you'd want it in a structured format.
We parse the file part of the manifest into a sorted struct array which
we can then do binary searches on by filename.
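That structure might be sketched like this (hypothetical names; pgBackRest's real code differs): parse the manifest's file list into an array, sort it by name once, then binary-search it for each file found on disk.

```c
#include <stdlib.h>
#include <string.h>

/* A minimal manifest entry; a real one would carry checksum, mode, etc. */
typedef struct ManifestFile
{
    const char *name;
    long        size;
} ManifestFile;

static int
manifest_file_cmp(const void *a, const void *b)
{
    return strcmp(((const ManifestFile *) a)->name,
                  ((const ManifestFile *) b)->name);
}

/* Look up a filesystem path in the sorted manifest array. */
static ManifestFile *
manifest_find(ManifestFile *files, size_t nfiles, const char *name)
{
    ManifestFile key = {name, 0};

    return bsearch(&key, files, nfiles, sizeof(ManifestFile),
                   manifest_file_cmp);
}
```

Marking each returned entry "seen" while walking the filesystem then gives the "full join" Robert describes: unmatched disk files and unmatched manifest entries are both detectable at the end.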
>> So, that just leaves ereport() as the largest remaining issue? I'll
>> look at that today and Tuesday and see what I can work up.
>
> PFA my work on that topic. As compared with my previous patch series,
> the previous 0001 is dropped and what are now 0001 and 0002 are the
> same as patches from the previous series. 0003 and 0004 are aiming
> toward getting rid of ereport() and, I believe, show a plausible
> strategy for so doing. There are, possibly, things not to like here,
> and it's certainly incomplete, but I think I kinda like this
> direction. Comments appreciated.
>
> 0003 nukes lex_accept(), inlining the logic into callers. I found that
> the refactoring I wanted to do in 0004 was pretty hard without this,
> and it turns out to save code, so I think this is a good idea
> independently of anything else.
No arguments here.
> 0004 adjusts many functions in jsonapi.c to return a new enumerated
> type, JsonParseErrorType, instead of directly doing ereport(). It adds
> a new function that takes this value and a lexing context and throws
> an error. The JSON_ESCAPING_INVALID case is wrong and involves a gross
> hack, but that's fixable with another field in the lexing context.
> More work is needed to really bring this up to scratch, but the idea
> is to make this code have a soft dependency on ereport() rather than a
> hard one.
My first reaction was that if we migrated ereport() first it would make
this all so much easier. Now I'm not so sure.
Having a general json parser in libcommon that is not tied into a
specific error handling/logging system actually sounds like a really
nice thing to have. If we do migrate ereport() the user would always
have the choice to call throw_a_json_error() if they wanted to.
There's also a bit of de-duplication of error messages, which is nice,
especially in the case JSON_ESCAPING_INVALID. And I agree that this
case can be fixed with another field in the lexer -- or at least so it
seems to me.
Though, throw_a_json_error() is not my favorite name. Perhaps
json_ereport()?
Regards,
--
-David
david(at)pgmasters(dot)net
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-22 00:23:56
Message-ID: CA+TgmoaiJZWt0mfi8MmOyrMoQf_MhNGiE6GjxFh-PSPXbFf+ng@mail.gmail.com
On Tue, Jan 21, 2020 at 5:34 PM David Steele <david(at)pgmasters(dot)net> wrote:
> Though, throw_a_json_error() is not my favorite name. Perhaps
> json_ereport()?
That name was deliberately chosen to be dumb, with the thought that
readers would understand it was to be replaced at some point before
this was final. It sounds like it wasn't quite dumb enough to make
that totally clear.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-22 18:53:08
Message-ID: CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com
On Tue, Jan 21, 2020 at 7:23 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Jan 21, 2020 at 5:34 PM David Steele <david(at)pgmasters(dot)net> wrote:
> > Though, throw_a_json_error() is not my favorite name. Perhaps
> > json_ereport()?
>
> That name was deliberately chosen to be dumb, with the thought that
> readers would understand it was to be replaced at some point before
> this was final. It sounds like it wasn't quite dumb enough to make
> that totally clear.
Here is a new version that is, I think, much closer what I would
consider a final form. 0001 through 0003 are as before, and unless
somebody says pretty soon that they see a problem with those or want
more time to review them, I'm going to commit them; David Steele has
endorsed all three, and they seem like independently sensible
cleanups.
0004 is a substantially cleaned up version of the patch to make the
JSON parser return a result code rather than throwing errors. Names
have been fixed, interfaces have been tidied up, and the thing is
better integrated with the surrounding code. I would really like
comments, if anyone has them, on whether this approach is acceptable.
0005 builds on 0004 by moving three functions from jsonapi.c to
jsonfuncs.c. With that done, jsonapi.c has minimal remaining
dependencies on the backend environment. It would still need a
substitute for elog(ERROR, "some internal thing is broken"); I'm
thinking of using pg_log_fatal() for that case. It would also need a
fix for the problem that pg_mblen() is not available in the front-end
environment. I don't know what to do about that yet exactly, but it
doesn't seem unsolvable. The frontend environment just needs to know
which encoding to use, and needs a way to call PQmblen() rather than
pg_mblen().
One problem with this whole thing that I just realized is that the
backup manifest file needs to store filenames, and we don't know that
the filenames we get from the filesystem are going to be valid in
utf-8 (or, for that matter, any other encoding we might want to
choose). So, just deciding that the backup manifest is always utf-8
doesn't seem like an option, unless we stick another level of escaping
in there somehow. Strictly as a theoretical matter, someone might
consider this a reason why using JSON for the backup manifest is not
necessarily the best fit, but since other arguments to that effect
have gotten me nowhere until now, I will instead request that someone
suggest to me how I ought to handle that problem.
Thanks,
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Adjust-src-include-utils-jsonapi.h-so-it-s-not-ba.patch | application/octet-stream | 6.9 KB |
v3-0002-Split-JSON-lexer-parser-from-json-data-type-suppo.patch | application/octet-stream | 70.1 KB |
v3-0005-Move-some-code-from-jsonapi.c-to-jsonfuncs.c.patch | application/octet-stream | 14.3 KB |
v3-0004-Adjust-pg_parse_error-so-that-it-does-not-depend-.patch | application/octet-stream | 38.6 KB |
v3-0003-Remove-jsonapi.c-s-lex_accept.patch | application/octet-stream | 5.9 KB |
From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-22 19:26:33 |
Message-ID: | 20200122192633.GA25388@alvherre.pgsql |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2020-Jan-22, Robert Haas wrote:
> Here is a new version that is, I think, much closer to what I would
> consider a final form. 0001 through 0003 are as before, and unless
> somebody says pretty soon that they see a problem with those or want
> more time to review them, I'm going to commit them; David Steele has
> endorsed all three, and they seem like independently sensible
> cleanups.
I'm not sure I see the point of keeping json.h split from jsonapi.h. It
seems to me that you could move back all the contents from jsonapi.h
into json.h, and everything would work just as well. (Evidently the
Datum in JsonEncodeDateTime's proto is problematic ... perhaps putting
that prototype in jsonfuncs.h would work.)
I don't really object to your 0001 patch as posted, though.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-22 19:57:01 |
Message-ID: | CA+TgmoYyA_=8k51AJ17BPr+-TQVS6fWDg=zseEqEpj+BET9dUQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, Jan 22, 2020 at 2:26 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> I'm not sure I see the point of keeping json.h split from jsonapi.h. It
> seems to me that you could move back all the contents from jsonapi.h
> into json.h, and everything would work just as well. (Evidently the
> Datum in JsonEncodeDateTime's proto is problematic ... perhaps putting
> that prototype in jsonfuncs.h would work.)
>
> I don't really object to your 0001 patch as posted, though.
The goal is to make it possible to use the JSON parser in the
frontend, and we can't do that if the header files that would have to
be included on the frontend side rely on things that only work in the
backend. As written, the patch series leaves json.h with a dependency
on Datum, so the stuff that it leaves in jsonapi.h (which is intended
to be the header that gets moved to src/common and included by
frontend code) can't be merged with it.
Now, we could obviously rearrange that. I don't think any of the file
naming here is great. But I think we probably want, as far as
possible, for the code in FOO.c to correspond to the prototypes in
FOO.h. What I'm thinking we should work towards is:
json.c/h - support for the 'json' data type
jsonb.c/h - support for the 'jsonb' data type
jsonfuncs.c/h - backend code that doesn't fit in either of the above
jsonapi.c/h - lexing/parsing code that can be used in either the
frontend or the backend
I'm not wedded to that. It just looks like the most natural thing from
where we are now.
Thoughts?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-22 20:11:22 |
Message-ID: | 20200122201122.GA26758@alvherre.pgsql |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2020-Jan-22, Robert Haas wrote:
> On Wed, Jan 22, 2020 at 2:26 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> > I'm not sure I see the point of keeping json.h split from jsonapi.h. It
> > seems to me that you could move back all the contents from jsonapi.h
> > into json.h, and everything would work just as well. (Evidently the
> > Datum in JsonEncodeDateTime's proto is problematic ... perhaps putting
> > that prototype in jsonfuncs.h would work.)
> >
> > I don't really object to your 0001 patch as posted, though.
>
> The goal is to make it possible to use the JSON parser in the
> frontend, and we can't do that if the header files that would have to
> be included on the frontend side rely on things that only work in the
> backend. As written, the patch series leaves json.h with a dependency
> on Datum, so the stuff that it leaves in jsonapi.h (which is intended
> to be the header that gets moved to src/common and included by
> frontend code) can't be merged with it.
Right, I agree with that goal, and as I said, I don't object to your
patch as posted.
> Now, we could obviously rearrange that. I don't think any of the file
> naming here is great. But I think we probably want, as far as
> possible, for the code in FOO.c to correspond to the prototypes in
> FOO.h. What I'm thinking we should work towards is:
>
> json.c/h - support for the 'json' data type
> jsonb.c/h - support for the 'jsonb' data type
> jsonfuncs.c/h - backend code that doesn't fit in either of the above
> jsonapi.c/h - lexing/parsing code that can be used in either the
> frontend or the backend
... it would probably require more work to make this 100% attainable,
but I don't really care all that much.
> I'm not wedded to that. It just looks like the most natural thing from
> where we are now.
Let's go with it.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-23 03:00:35 |
Message-ID: | AD733159-7FD4-42EF-B3F5-252964D37FFF@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
> On Jan 22, 2020, at 12:11 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>
> On 2020-Jan-22, Robert Haas wrote:
>
>> On Wed, Jan 22, 2020 at 2:26 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>>> I'm not sure I see the point of keeping json.h split from jsonapi.h. It
>>> seems to me that you could move back all the contents from jsonapi.h
>>> into json.h, and everything would work just as well. (Evidently the
>>> Datum in JsonEncodeDateTime's proto is problematic ... perhaps putting
>>> that prototype in jsonfuncs.h would work.)
>>>
>>> I don't really object to your 0001 patch as posted, though.
>>
>> The goal is to make it possible to use the JSON parser in the
>> frontend, and we can't do that if the header files that would have to
>> be included on the frontend side rely on things that only work in the
>> backend. As written, the patch series leaves json.h with a dependency
>> on Datum, so the stuff that it leaves in jsonapi.h (which is intended
>> to be the header that gets moved to src/common and included by
>> frontend code) can't be merged with it.
>
> Right, I agree with that goal, and as I said, I don't object to your
> patch as posted.
>
>> Now, we could obviously rearrange that. I don't think any of the file
>> naming here is great. But I think we probably want, as far as
>> possible, for the code in FOO.c to correspond to the prototypes in
>> FOO.h. What I'm thinking we should work towards is:
>>
>> json.c/h - support for the 'json' data type
>> jsonb.c/h - support for the 'jsonb' data type
>> jsonfuncs.c/h - backend code that doesn't fit in either of the above
>> jsonapi.c/h - lexing/parsing code that can be used in either the
>> frontend or the backend
>
> ... it would probably require more work to make this 100% attainable,
> but I don't really care all that much.
>
>> I'm not wedded to that. It just looks like the most natural thing from
>> where we are now.
>
> Let's go with it.
I have this done in my local repo to the point that I can build frontend tools against the json parser that is now in src/common and also run all the check-world tests without failure. I’m planning to post my work soon, possibly tonight if I don’t run out of time, but more likely tomorrow.
The main issue remaining is that my repo has a lot of stuff organized differently than Robert’s patches, so I’m trying to turn my code into a simple extension of his work rather than having my implementation compete against his.
For the curious, the code as Robert left it still relies on the database encoding through the use of GetDatabaseEncoding, pg_mblen, and similar. My patches change that so that only the backend relies on the database encoding: the code in src/common takes an explicit encoding, which the backend gets in the usual way and the frontend might get with PQenv2encoding() or whatever the frontend programmer finds appropriate. Hopefully this addresses Robert’s concern upthread about the filesystem name not necessarily being in utf8 format, though I might be misunderstanding the exact thrust of his concern. I can think of other possible interpretations of his concern as he expressed it, so I’ll wait for him to clarify.
For those who want a sneak peek, I’m attaching WIP patches to this email with all my changes, with Robert’s changes partially manually cherry-picked and the rest still unmerged. *THESE ARE NOT MEANT FOR COMMIT. THIS IS FOR ADVISORY PURPOSES ONLY.* I have some debugging cruft left in here, too, like gcc __attribute__ stuff that won’t be in the patches I submit.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-23 17:04:00 |
Message-ID: | CA+Tgmobo6N8=C3FB9q+u8_n2w23iVYNxrpOfO9na1dba+m7Udw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, Jan 22, 2020 at 10:00 PM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> Hopefully, this addresses Robert’s concern upthread about the filesystem name not necessarily being in utf8 format, though I might be misunderstanding the exact thrust of his concern. I can think of other possible interpretations of his concern as he expressed it, so I’ll wait for him to clarify.
No, that's not it. Suppose that Álvaro Herrera has some custom
settings he likes to put on all the PostgreSQL clusters that he uses,
so he creates a file álvaro.conf and uses an "include" directive in
postgresql.conf to suck in those settings. If he also likes utf-8,
then the file name will be stored in the file system as a 12-byte
value of which the first two bytes will be 0xc3 0xa1. In that case,
everything will be fine, because JSON is supposed to always be utf-8,
and the file name is utf-8, and it's all good. But suppose he instead
likes LATIN-1. Then the file name will be stored as an 11-byte value
and the first byte will be 0xe1. The second byte, representing a
lower-case 'l', will be 0x6c. But we can't put a byte sequence that
goes 0xe1 0x6c into a JSON manifest stored as utf-8, because that's
not valid in utf-8. utf-8 requires that every byte from 0xc0-0xff be
followed by one or more bytes in the range 0x80-0xbf, and our
hypothetical file name that starts with 0xe1 0x6c does not meet that
criterion.
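The byte arithmetic above can be checked with a short sketch (Python here purely for illustration; this is not PostgreSQL code):

```python
# The two on-disk byte sequences for "álvaro.conf" under the encodings
# discussed above.
name = "álvaro.conf"

utf8_bytes = name.encode("utf-8")      # 12 bytes, starts 0xc3 0xa1
latin1_bytes = name.encode("latin-1")  # 11 bytes, starts 0xe1 0x6c

assert len(utf8_bytes) == 12 and utf8_bytes[:2] == b"\xc3\xa1"
assert len(latin1_bytes) == 11 and latin1_bytes[:2] == b"\x6c"[:0] + b"\xe1\x6c"

# The Latin-1 bytes are not a valid UTF-8 sequence: 0xe1 announces a
# three-byte sequence, but the following 0x6c ('l') is not a
# continuation byte in the 0x80-0xbf range.
def is_valid_utf8(raw: bytes) -> bool:
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

assert is_valid_utf8(utf8_bytes)
assert not is_valid_utf8(latin1_bytes)
```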
Now, you might say "well, why don't we just do an encoding
conversion?", but we can't. When the filesystem tells us what the file
names are, it does not tell us what encoding the person who created
those files had in mind. We don't know that they had *any* encoding in
mind. IIUC, a file in the data directory can have a name that consists
of any sequence of bytes whatsoever, so long as it doesn't contain
prohibited characters like a path separator or \0 byte. But only some
of those possible octet sequences can be stored in a manifest that has
to be valid utf-8.
The degree to which there is a practical problem here is limited by
the fact that most filenames within the data directory are chosen by
the system, e.g. base/16384/16385, and those file names are only going
to contain ASCII characters (i.e. code points 0-127) and those are
valid in utf-8 and lots of other encodings. Moreover, most people who
create additional files in the data directory will probably use ASCII
characters for those as well, at least if they are from an
English-speaking country, and if they're not, they're likely going to
use utf-8, and then they'll still be fine. But there is no rule that
says people have to do that, and if somebody wants to use file names
based around SJIS or whatever, the backup manifest functionality
should not for that reason break.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-23 17:23:14 |
Message-ID: | 20200123172314.GA26054@alvherre.pgsql |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2020-Jan-23, Robert Haas wrote:
> No, that's not it. Suppose that Álvaro Herrera has some custom
> settings he likes to put on all the PostgreSQL clusters that he uses,
> so he creates a file álvaro.conf and uses an "include" directive in
> postgresql.conf to suck in those settings. If he also likes utf-8,
> then the file name will be stored in the file system as a 12-byte
> value of which the first two bytes will be 0xc3 0xa1. In that case,
> everything will be fine, because JSON is supposed to always be utf-8,
> and the file name is utf-8, and it's all good. But suppose he instead
> likes LATIN-1.
I do have files with Latin-1-encoded names in my filesystem, even though
my system is utf-8, so I understand the problem. I was wondering if it
would work to encode any non-UTF8-valid name using something like
base64; the encoded name will be plain ASCII and can be put in the
manifest, probably using a different field of the JSON object -- so for
a normal file you'd have { path => '1234/2345' } but for a
Latin-1-encoded file you'd have { path_base64 => '4Wx2YXJvLmNvbmYK' }.
Then it's the job of the tool to ensure it decodes the name to its
original form when creating/querying for the file.
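The convention being proposed might be sketched like this (illustrative Python, not actual backup-manifest code; the key names are the ones suggested above but nothing here is settled):

```python
import base64
import json

def manifest_path_field(raw: bytes) -> dict:
    # If the raw filename is valid UTF-8, store it directly under "path";
    # otherwise fall back to base64 under the differently named key
    # "path_base64", so a tool unaware of the fallback fails visibly
    # rather than silently treating the encoded form as the filename.
    try:
        return {"path": raw.decode("utf-8")}
    except UnicodeDecodeError:
        return {"path_base64": base64.b64encode(raw).decode("ascii")}

print(json.dumps(manifest_path_field(b"1234/2345")))
# {"path": "1234/2345"}
print(json.dumps(manifest_path_field("álvaro.conf".encode("latin-1"))))
# {"path_base64": "4Wx2YXJvLmNvbmY="}
```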
A problem I have with this idea is that this is very corner-casey, so
most tool implementors will never realize that there's a need to decode
certain file names.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-23 17:49:58 |
Message-ID: | 20200123174958.GA3138@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Jan 23, 2020 at 02:23:14PM -0300, Alvaro Herrera wrote:
> On 2020-Jan-23, Robert Haas wrote:
>
> > No, that's not it. Suppose that Álvaro Herrera has some custom
> > settings he likes to put on all the PostgreSQL clusters that he uses,
> > so he creates a file álvaro.conf and uses an "include" directive in
> > postgresql.conf to suck in those settings. If he also likes utf-8,
> > then the file name will be stored in the file system as a 12-byte
> > value of which the first two bytes will be 0xc3 0xa1. In that case,
> > everything will be fine, because JSON is supposed to always be utf-8,
> > and the file name is utf-8, and it's all good. But suppose he instead
> > likes LATIN-1.
>
> I do have files with Latin-1-encoded names in my filesystem, even though
> my system is utf-8, so I understand the problem. I was wondering if it
> would work to encode any non-UTF8-valid name using something like
> base64; the encoded name will be plain ASCII and can be put in the
> manifest, probably using a different field of the JSON object -- so for
> a normal file you'd have { path => '1234/2345' } but for a
> Latin-1-encoded file you'd have { path_base64 => '4Wx2YXJvLmNvbmYK' }.
> Then it's the job of the tool to ensure it decodes the name to its
> original form when creating/querying for the file.
>
> A problem I have with this idea is that this is very corner-casey, so
> most tool implementors will never realize that there's a need to decode
> certain file names.
Another idea is to use base64 for all non-ASCII file names, so we don't
need to check if the file name is valid UTF8 before outputting --- we
just need to check for non-ASCII, which is much easier. Another
problem, though, is how do you _flag_ file names as being
base64-encoded? Use another JSON field to specify that?
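The difference between the two triggering rules can be made concrete (illustrative Python only; neither rule is what any committed code does):

```python
# Bruce's simpler rule: base64-encode whenever any byte is >= 0x80,
# with no need to validate multi-byte UTF-8 sequences.
def needs_base64_ascii_rule(raw: bytes) -> bool:
    return any(b >= 0x80 for b in raw)

# The stricter rule discussed earlier: base64-encode only names that
# fail UTF-8 validation.
def needs_base64_utf8_rule(raw: bytes) -> bool:
    try:
        raw.decode("utf-8")
        return False
    except UnicodeDecodeError:
        return True

assert not needs_base64_ascii_rule(b"base/16384/16385")
# Tradeoff: a valid UTF-8 name with non-ASCII characters gets encoded
# under the ASCII rule even though it could be stored directly.
assert needs_base64_ascii_rule("álvaro.conf".encode("utf-8"))
assert not needs_base64_utf8_rule("álvaro.conf".encode("utf-8"))
```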
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-23 18:02:39 |
Message-ID: | CA+TgmoZm+d7YzVzLiskL5PQTPT5X-KfO8SmOonYb=5uV-fQ9CQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Jan 23, 2020 at 12:24 PM Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> I do have files with Latin-1-encoded names in my filesystem, even though
> my system is utf-8, so I understand the problem. I was wondering if it
> would work to encode any non-UTF8-valid name using something like
> base64; the encoded name will be plain ASCII and can be put in the
> manifest, probably using a different field of the JSON object -- so for
> a normal file you'd have { path => '1234/2345' } but for a
> Latin-1-encoded file you'd have { path_base64 => '4Wx2YXJvLmNvbmYK' }.
> Then it's the job of the tool to ensure it decodes the name to its
> original form when creating/querying for the file.
Right. That's what I meant, a couple of messages back, when I
mentioned an extra layer of escaping, but your explanation here is
better because it's more detailed.
> A problem I have with this idea is that this is very corner-casey, so
> most tool implementors will never realize that there's a need to decode
> certain file names.
That's a valid concern. I would not necessarily have thought that
out-of-core tools would find a lot of use in reading them, provided
PostgreSQL itself both knows how to generate them and how to validate
them, but the interest in this topic suggests that people do care
about that.
Mostly, I think this issue shows the folly of imagining that putting
everything into JSON is a good idea because it gets rid of escaping
problems. Actually, what it does is create multiple kinds of escaping
problems. With the format I proposed, you only have to worry that the
file name might contain a tab character, because in that format, tab
is the delimiter. But, if we use JSON, then we've got the same problem
with JSON's delimiter, namely a double quote, which the JSON parser
will solve for you. We then have this additional and somewhat obscure
problem with invalidly encoded data, to which JSON itself provides no
solution. We have to invent our own, probably along the lines of what
you have just proposed. I think one can reasonably wonder whether this
is really an improvement over just inventing a way to escape tabs.
That said, there are other reasons to want to go with JSON, most of
all the fact that it's easy to see how to extend the format to
additional fields. Once you decide that each file will have an object,
you can add any keys that you like to that object and things should
scale up nicely. It has been my contention that we probably will not
find the need to add much more here, but such arguments are always
suspect and have a good chance of being wrong. Also, such prophecies
can be self-fulfilling: if the format is easy to extend, then people
may extend it, whereas if it is hard to extend, they may not try, or
they may try and then give up.
At the end of the day, I'm willing to make this work either way. I do
not think that my original proposal was bad, but there were things not
to like about it. There are also things not to like about using a
JSON-based format, and this seems to me to be a fairly significant
one. However, both sets of problems are solvable, and neither design
is awful. It's just a question of which kind of warts we like better.
To be blunt, I've already spent a lot more effort on this problem than
I would have liked, and more than 90% of it has been spent on the
issue of how to format a file that only PostgreSQL needs to read and
write. While I do not think that good file formats are unimportant, I
remain unconvinced that switching to JSON is making things better. It
seems like it's just making them different, while inflating the amount
of coding required by a fairly significant multiple.
That being said, unless somebody objects in the next few days, I am
going to assume that the people who preferred JSON over a
tab-separated file are also happy with the idea of using base-64
encoding as proposed by you above to represent files whose names are
not valid utf-8 sequences; and I will then go rewrite the patch that
generates the manifests to use that format, rewrite the validator
patch to parse that format using this infrastructure, and hopefully
end up with something that can be reviewed and committed before we run
out of time to get things done for this release. If anybody wants to
vote for another plan, please vote soon.
In the meantime, any review of the new patches I posted here yesterday
would be warmly appreciated.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-23 18:05:50 |
Message-ID: | CA+Tgmoa08oskv32jtjkC3qJ56_kKVcF0E1oLYjSKsC=Hinkdaw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Jan 23, 2020 at 12:49 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Another idea is to use base64 for all non-ASCII file names, so we don't
> need to check if the file name is valid UTF8 before outputting --- we
> just need to check for non-ASCII, which is much easier.
I think that we have the infrastructure available to check in a
convenient way whether it's valid as utf-8, so this might not be
necessary, but I will look into it further unless there is a consensus
to go another direction entirely.
> Another
> problem, though, is how do you _flag_ file names as being
> base64-encoded? Use another JSON field to specify that?
Alvaro's proposed solution in the message to which you replied was to
call the field either 'path' or 'path_base64' depending on whether
base-64 escaping was used. That seems better to me than having a field
called 'path' and a separate field called 'is_path_base64' or
whatever.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-23 18:11:01 |
Message-ID: | 20200123181101.GB3138@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Jan 23, 2020 at 01:05:50PM -0500, Robert Haas wrote:
> > Another
> > problem, though, is how do you _flag_ file names as being
> > base64-encoded? Use another JSON field to specify that?
>
> Alvaro's proposed solution in the message to which you replied was to
> call the field either 'path' or 'path_base64' depending on whether
> base-64 escaping was used. That seems better to me than having a field
> called 'path' and a separate field called 'is_path_base64' or
> whatever.
Hmm, so the JSON key name is the flag --- interesting.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-23 18:20:27 |
Message-ID: | 20200123182027.GA28825@alvherre.pgsql |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2020-Jan-23, Bruce Momjian wrote:
> On Thu, Jan 23, 2020 at 01:05:50PM -0500, Robert Haas wrote:
> > > Another
> > > problem, though, is how do you _flag_ file names as being
> > > base64-encoded? Use another JSON field to specify that?
> >
> > Alvaro's proposed solution in the message to which you replied was to
> > call the field either 'path' or 'path_base64' depending on whether
> > base-64 escaping was used. That seems better to me than having a field
> > called 'path' and a separate field called 'is_path_base64' or
> > whatever.
>
> Hmm, so the JSON key name is the flag --- interesting.
Yes, because if you use the same key name, you risk a dumb tool writing
the file name as the encoded name. That's worse because it's harder to
figure out that it's wrong.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-23 18:22:19 |
Message-ID: | 20200123182219.GC3138@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Jan 23, 2020 at 03:20:27PM -0300, Alvaro Herrera wrote:
> On 2020-Jan-23, Bruce Momjian wrote:
>
> > On Thu, Jan 23, 2020 at 01:05:50PM -0500, Robert Haas wrote:
> > > > Another
> > > > problem, though, is how do you _flag_ file names as being
> > > > base64-encoded? Use another JSON field to specify that?
> > >
> > > Alvaro's proposed solution in the message to which you replied was to
> > > call the field either 'path' or 'path_base64' depending on whether
> > > base-64 escaping was used. That seems better to me than having a field
> > > called 'path' and a separate field called 'is_path_base64' or
> > > whatever.
> >
> > Hmm, so the JSON key name is the flag --- interesting.
>
> Yes, because if you use the same key name, you risk a dumb tool writing
> the file name as the encoded name. That's worse because it's harder to
> figure out that it's wrong.
Yes, good point. I think my one concern is that someone might specify
both keys in the JSON, which would be very odd.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-23 19:00:23 |
Message-ID: | CA+TgmoZ6qo_hExmDfXumTSf1QsVv4Svf9_+_qOf679ijMk_Tkg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Jan 23, 2020 at 1:22 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Yes, good point. I think my one concern is that someone might specify
> both keys in the JSON, which would be very odd.
I think that if a tool other than PostgreSQL chooses to generate a
PostgreSQL backup manifest, it must take care to do it in a manner that
is compatible with what PostgreSQL would generate. If it doesn't,
well, that sucks for them, but we can't prevent other people from
writing bad code. On a very good day, we can prevent ourselves from
writing bad code.
There is in general the question of how rigorous PostgreSQL ought to
be when validating a backup manifest. The proposal on the table is to
store four (4) fields per file: name, size, last modification time,
and checksum. So a JSON object representing a file should have four
keys, say "path", "size", "mtime", and "checksum". The "checksum" key
could perhaps be optional, in case the user disables checksums, or we
could represent that case in some other way, like "checksum" => null,
"checksum" => "", or "checksum" => "NONE". There is an almost
unlimited scope for bike-shedding here, but let's leave that to one
side for the moment.
Suppose that someone asks PostgreSQL's backup manifest verification
tool to validate a backup manifest where there's an extra key. Say, in
addition to the four keys listed in the previous paragraph, there is
an additional key, "momjian". On the one hand, our backup manifest
verification tool could take this as a sign that the manifest is
invalid, and accordingly throw an error. Or, it could assume that some
third-party backup tool generated the backup manifest and that the
"momjian" field is there to track something which is of interest to
that tool but not to PostgreSQL core, in which case it should just be
ignored.
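A lenient check along these lines might look like the sketch below. This is not PostgreSQL code; the field names are the four proposed above, and the "strict" switch is a hypothetical way of showing both behaviors being debated:

```python
def check_manifest_entry(entry: dict, strict: bool = False) -> list:
    # Returns a list of problems; an empty list means the entry passes.
    problems = []
    for key in ("path", "size", "mtime"):
        if key not in entry:
            problems.append("missing required key: " + key)
    # "checksum" is treated as optional here, standing in for the
    # checksums-disabled case discussed above.
    unknown = set(entry) - {"path", "size", "mtime", "checksum"}
    if unknown and strict:
        # The strict reading: an extra key means the manifest is invalid.
        problems.append("unexpected keys: " + str(sorted(unknown)))
    return problems

entry = {"path": "base/1/1234", "size": 8192,
         "mtime": "2020-01-23T19:00:23Z", "momjian": "extra"}
print(check_manifest_entry(entry))               # lenient: ignored
print(check_manifest_entry(entry, strict=True))  # strict: flagged
```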
Incidentally, some research seems to suggest that the problem of
filenames which don't form a valid utf-8 sequence cannot occur on
Windows. This blog post may be helpful in understanding the issues:
http://beets.io/blog/paths.html
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-23 19:04:14 |
Message-ID: | 20200123190414.GD3138@momjian.us |
Lists: | pgsql-hackers |
On Thu, Jan 23, 2020 at 02:00:23PM -0500, Robert Haas wrote:
> Incidentally, some research seems to suggest that the problem of
> filenames which don't form a valid utf-8 sequence cannot occur on
> Windows. This blog post may be helpful in understanding the issues:
>
> http://beets.io/blog/paths.html
Is there any danger of assuming a non-UTF8 sequence to be UTF8 even when
it isn't, except that it displays oddly? I am thinking of a non-UTF8
sequence that is valid UTF8.
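For what it's worth, that situation is possible. A small illustration (not from any patch; it just demonstrates the encoding collision) using Latin-1, where a two-character sequence happens to form one valid UTF-8 character:

```python
# A filename written under Latin-1: "café" is the bytes 63 61 66 e9.
latin1_cafe = "café".encode("latin-1")        # b'caf\xe9'
try:
    latin1_cafe.decode("utf-8")
except UnicodeDecodeError:
    print("a lone 0xE9 byte is not valid UTF-8")

# But the two Latin-1 characters "Ã©" are the bytes c3 a9, which *do*
# form valid UTF-8 -- and decode to a different character, "é".
ambiguous = "Ã©".encode("latin-1")            # b'\xc3\xa9'
print(ambiguous.decode("utf-8"))              # decodes to 'é'
```

So a validity check alone cannot prove the name was meant as UTF-8; a misidentified name round-trips byte-for-byte, but displays as a different string than the one the file's creator saw.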
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
From: | "Daniel Verite" <daniel(at)manitou-mail(dot)org> |
---|---|
To: | "Robert Haas" <robertmhaas(at)gmail(dot)com> |
Cc: | "Alvaro Herrera" <alvherre(at)2ndquadrant(dot)com>,"Mark Dilger" <mark(dot)dilger(at)enterprisedb(dot)com>,"David Steele" <david(at)pgmasters(dot)net>,"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-23 19:08:40 |
Message-ID: | 6513e2eb-cd9b-4b49-bdee-fe0c99f65b9f@manitou-mail.org |
Lists: | pgsql-hackers |
Robert Haas wrote:
> With the format I proposed, you only have to worry that the
> file name might contain a tab character, because in that format, tab
> is the delimiter
It could be CSV, which has this problem already solved,
is easier to parse than JSON, certainly no less popular,
and is not bound to a specific encoding.
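To illustrate the point about CSV's escaping rules (a generic sketch, not tied to any proposed manifest format), a field containing tabs, commas, quotes, and even newlines round-trips cleanly through a standard CSV writer and reader:

```python
import csv
import io

# A path containing a tab, a comma, a double quote, and a newline --
# everything that breaks naive delimiter-separated formats.
nasty_path = 'dir\twith,"odd"\nname'
row = [nasty_path, "8192", "2020-01-23T19:08:40Z"]

buf = io.StringIO()
csv.writer(buf).writerow(row)        # quoting/escaping per RFC 4180
buf.seek(0)
(read_back,) = list(csv.reader(buf))
assert read_back == row              # byte-for-byte round trip
print(repr(buf.getvalue()))
```

Note that this solves the delimiter problem, not the encoding one: a name that is raw non-UTF-8 bytes still needs some convention before it can live in a text file of any format.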
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Daniel Verite <daniel(at)manitou-mail(dot)org> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-23 19:34:11 |
Message-ID: | CA+TgmobNbuvq2pRu0GpJJ0U=8NHkGVSponm6UJ0v5ap9sGjLbw@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Jan 23, 2020 at 2:08 PM Daniel Verite <daniel(at)manitou-mail(dot)org> wrote:
> It could be CSV, which has this problem already solved,
> is easier to parse than JSON, certainly no less popular,
> and is not bound to a specific encoding.
Sure. I don't think that would look quite as nice visually as what I
proposed when inspected by humans, and our default COPY output format
is tab-separated rather than comma-separated. However, if CSV would be
more acceptable, great.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-23 20:06:37 |
Message-ID: | 20200123200637.GA30043@alvherre.pgsql |
Lists: | pgsql-hackers |
On 2020-Jan-23, Bruce Momjian wrote:
> Yes, good point. I think my one concern is that someone might specify
> both keys in the JSON, which would be very odd.
Just make that a reason to raise an error. I think it's even possible
to specify that as a JSON Schema constraint, using a "oneOf" predicate.
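The constraint Alvaro describes could be written roughly as the JSON Schema fragment below; whether a real manifest schema would look like this is speculative, and the stand-in validator just shows the "exactly one of the two keys" semantics:

```python
# A JSON Schema sketch of the constraint: exactly one of "path" or
# "path_base64" may be present (hypothetical; no such schema exists yet).
file_schema = {
    "type": "object",
    "oneOf": [
        {"required": ["path"], "not": {"required": ["path_base64"]}},
        {"required": ["path_base64"], "not": {"required": ["path"]}},
    ],
}

# A minimal stand-in for a schema validator, checking the same rule.
def exactly_one_path_key(obj: dict) -> bool:
    return ("path" in obj) != ("path_base64" in obj)

assert exactly_one_path_key({"path": "base/1/16384"})
assert exactly_one_path_key({"path_base64": "YmFzZQ=="})
assert not exactly_one_path_key({"path": "x", "path_base64": "eA=="})
```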
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-23 21:05:10 |
Message-ID: | C097E90F-2887-4CDD-BE5E-5EE1739C9C03@enterprisedb.com |
Lists: | pgsql-hackers |
> On Jan 22, 2020, at 7:00 PM, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>
> I have this done in my local repo to the point that I can build frontend tools against the json parser that is now in src/common and also run all the check-world tests without failure. I’m planning to post my work soon, possibly tonight if I don’t run out of time, but more likely tomorrow.
Ok, I finished merging with Robert’s patches. The attached follow his numbering, with my patches intended to be applied after his.
I tried not to change his work too much, but I did a bit of refactoring in 0010, as explained in the commit comment.
0011 is just for verifying the linking works ok and the json parser can be invoked from a frontend tool without error — I don’t really see the point in committing it.
I ran some benchmarks for json parsing in the backend both before and after these patches, with very slight changes in runtime. The setup for the benchmark creates an unlogged table with a single text column and loads rows of json formatted text:
CREATE UNLOGGED TABLE benchmark (
j text
);
COPY benchmark (j) FROM '/Users/mark.dilger/bench/json.csv';
FYI:
wc ~/bench/json.csv
107 34465023 503364244 /Users/mark.dilger/bench/json.csv
The benchmark itself casts the text column to jsonb, as follows:
SELECT jsonb_typeof(j::jsonb) typ, COUNT(*) FROM benchmark GROUP BY typ;
In summary, the times are:
pristine patched
————— —————
11.985 12.237
12.200 11.992
11.691 11.896
11.847 11.833
11.722 11.936
The full output for the runtimes without the patch over five iterations:
CREATE TABLE
COPY 107
typ | count
--------+-------
object | 107
(1 row)
real 0m11.985s
user 0m0.002s
sys 0m0.003s
typ | count
--------+-------
object | 107
(1 row)
real 0m12.200s
user 0m0.002s
sys 0m0.004s
typ | count
--------+-------
object | 107
(1 row)
real 0m11.691s
user 0m0.002s
sys 0m0.003s
typ | count
--------+-------
object | 107
(1 row)
real 0m11.847s
user 0m0.002s
sys 0m0.004s
typ | count
--------+-------
object | 107
(1 row)
real 0m11.722s
user 0m0.002s
sys 0m0.003s
And with the patch, also five iterations:
CREATE TABLE
COPY 107
typ | count
--------+-------
object | 107
(1 row)
real 0m12.237s
user 0m0.002s
sys 0m0.004s
typ | count
--------+-------
object | 107
(1 row)
real 0m11.992s
user 0m0.002s
sys 0m0.004s
typ | count
--------+-------
object | 107
(1 row)
real 0m11.896s
user 0m0.002s
sys 0m0.004s
typ | count
--------+-------
object | 107
(1 row)
real 0m11.833s
user 0m0.002s
sys 0m0.004s
typ | count
--------+-------
object | 107
(1 row)
real 0m11.936s
user 0m0.002s
sys 0m0.004s
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment | Content-Type | Size |
---|---|---|
0006-Relocating-pg_wchar.h.patch | application/octet-stream | 59.8 KB |
0007-Adding-utils-mbutils.h.patch | application/octet-stream | 51.5 KB |
0008-Moving-wstrcmp-and-wstrncmp-implementations.patch | application/octet-stream | 2.3 KB |
0009-Relocating-jsonapi.h.patch | application/octet-stream | 5.8 KB |
0010-Relocating-jsonapi.c.patch | application/octet-stream | 25.6 KB |
0011-Adding-src-bin-pg_test_json.patch | application/octet-stream | 5.0 KB |
From: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> |
---|---|
To: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-24 00:27:40 |
Message-ID: | CAA8=A79rNM-Am1nYjG-yU8RT7BOV+JUa0L_VgrwF9H1Oqsartw@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Jan 24, 2020 at 7:35 AM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>
>
> > On Jan 22, 2020, at 7:00 PM, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> >
> > I have this done in my local repo to the point that I can build frontend tools against the json parser that is now in src/common and also run all the check-world tests without failure. I’m planning to post my work soon, possibly tonight if I don’t run out of time, but more likely tomorrow.
>
> Ok, I finished merging with Robert’s patches. The attached follow his numbering, with my patches intended to be applied after his.
>
> I tried not to change his work too much, but I did a bit of refactoring in 0010, as explained in the commit comment.
>
> 0011 is just for verifying the linking works ok and the json parser can be invoked from a frontend tool without error — I don’t really see the point in committing it.
>
> I ran some benchmarks for json parsing in the backend both before and after these patches, with very slight changes in runtime. The setup for the benchmark creates an unlogged table with a single text column and loads rows of json formatted text:
>
> CREATE UNLOGGED TABLE benchmark (
> j text
> );
> COPY benchmark (j) FROM '/Users/mark.dilger/bench/json.csv';
>
>
> FYI:
>
> wc ~/bench/json.csv
> 107 34465023 503364244 /Users/mark.dilger/bench/json.csv
>
> The benchmark itself casts the text column to jsonb, as follows:
>
> SELECT jsonb_typeof(j::jsonb) typ, COUNT(*) FROM benchmark GROUP BY typ;
>
> In summary, the times are:
>
> pristine patched
> ————— —————
> 11.985 12.237
> 12.200 11.992
> 11.691 11.896
> 11.847 11.833
> 11.722 11.936
>
OK, nothing noticeable there.
"accept" is a common utility I've used in the past with parsers of
this kind, but inlining it seems perfectly reasonable.
I've reviewed these patches and Robert's, and they seem basically good to me.
But I don't think src/bin is the right place for the test program. I
assume we're not going to ship this program, so it really belongs in
src/test somewhere, I think. It should also have a TAP test.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
---|---|
To: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-24 14:17:31 |
Message-ID: | 48F5C477-932C-4DB7-B4F6-C46F6D5AEF36@enterprisedb.com |
Lists: | pgsql-hackers |
> On Jan 23, 2020, at 4:27 PM, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
>
> On Fri, Jan 24, 2020 at 7:35 AM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>>
>>
>>> On Jan 22, 2020, at 7:00 PM, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>>>
>>> I have this done in my local repo to the point that I can build frontend tools against the json parser that is now in src/common and also run all the check-world tests without failure. I’m planning to post my work soon, possibly tonight if I don’t run out of time, but more likely tomorrow.
>>
>> Ok, I finished merging with Robert’s patches. The attached follow his numbering, with my patches intended to be applied after his.
>>
>> I tried not to change his work too much, but I did a bit of refactoring in 0010, as explained in the commit comment.
>>
>> 0011 is just for verifying the linking works ok and the json parser can be invoked from a frontend tool without error — I don’t really see the point in committing it.
>>
>> I ran some benchmarks for json parsing in the backend both before and after these patches, with very slight changes in runtime. The setup for the benchmark creates an unlogged table with a single text column and loads rows of json formatted text:
>>
>> CREATE UNLOGGED TABLE benchmark (
>> j text
>> );
>> COPY benchmark (j) FROM '/Users/mark.dilger/bench/json.csv';
>>
>>
>> FYI:
>>
>> wc ~/bench/json.csv
>> 107 34465023 503364244 /Users/mark.dilger/bench/json.csv
>>
>> The benchmark itself casts the text column to jsonb, as follows:
>>
>> SELECT jsonb_typeof(j::jsonb) typ, COUNT(*) FROM benchmark GROUP BY typ;
>>
>> In summary, the times are:
>>
>> pristine patched
>> ————— —————
>> 11.985 12.237
>> 12.200 11.992
>> 11.691 11.896
>> 11.847 11.833
>> 11.722 11.936
>>
>
> OK, nothing noticeable there.
>
> "accept" is a common utility I've used in the past with parsers of
> this kind, but inlining it seems perfectly reasonable.
>
> I've reviewed these patches and Robert's, and they seem basically good to me.
Thanks for the review!
> But I don't think src/bin is the right place for the test program. I
> assume we're not going to ship this program, so it really belongs in
> src/test somewhere, I think. It should also have a TAP test.
Ok, I’ll go do that; thanks for the suggestion.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-24 15:05:27 |
Message-ID: | ab359258-6268-773e-046d-1ca3b5d83afe@2ndquadrant.com |
Lists: | pgsql-hackers |
On 2020-01-23 18:04, Robert Haas wrote:
> Now, you might say "well, why don't we just do an encoding
> conversion?", but we can't. When the filesystem tells us what the file
> names are, it does not tell us what encoding the person who created
> those files had in mind. We don't know that they had*any* encoding in
> mind. IIUC, a file in the data directory can have a name that consists
> of any sequence of bytes whatsoever, so long as it doesn't contain
> prohibited characters like a path separator or \0 byte. But only some
> of those possible octet sequences can be stored in a manifest that has
> to be valid utf-8.
I think it wouldn't be unreasonable to require that file names in the
database directory be consistently encoded (as defined by pg_control,
probably). After all, this information is sometimes also shown in
system views, so it's already difficult to process total junk. In
practice, this shouldn't be an onerous requirement.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-24 16:27:44 |
Message-ID: | 32663.1579883264@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> writes:
> On 2020-01-23 18:04, Robert Haas wrote:
>> Now, you might say "well, why don't we just do an encoding
>> conversion?", but we can't. When the filesystem tells us what the file
>> names are, it does not tell us what encoding the person who created
>> those files had in mind. We don't know that they had*any* encoding in
>> mind. IIUC, a file in the data directory can have a name that consists
>> of any sequence of bytes whatsoever, so long as it doesn't contain
>> prohibited characters like a path separator or \0 byte. But only some
>> of those possible octet sequences can be stored in a manifest that has
>> to be valid utf-8.
> I think it wouldn't be unreasonable to require that file names in the
> database directory be consistently encoded (as defined by pg_control,
> probably). After all, this information is sometimes also shown in
> system views, so it's already difficult to process total junk. In
> practice, this shouldn't be an onerous requirement.
I don't entirely follow why we're discussing this at all, if the
requirement is backing up a PG data directory. There are not, and
are never likely to be, any legitimate files with non-ASCII names
in that context. Why can't we just skip any such files?
regards, tom lane
From: | David Steele <david(at)pgmasters(dot)net> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-24 16:29:48 |
Message-ID: | ac9cabb7-acbd-e153-2785-a9f37721c167@pgmasters.net |
Lists: | pgsql-hackers |
On 1/23/20 11:05 AM, Robert Haas wrote:
> On Thu, Jan 23, 2020 at 12:49 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> Another idea is to use base64 for all non-ASCII file names, so we don't
>> need to check if the file name is valid UTF8 before outputting --- we
>> just need to check for non-ASCII, which is much easier.
>
> I think that we have the infrastructure available to check in a
> convenient way whether it's valid as utf-8, so this might not be
> necessary, but I will look into it further unless there is a consensus
> to go another direction entirely.
>
>> Another
>> problem, though, is how do you _flag_ file names as being
>> base64-encoded? Use another JSON field to specify that?
>
> Alvaro's proposed solution in the message to which you replied was to
> call the field either 'path' or 'path_base64' depending on whether
> base-64 escaping was used. That seems better to me than having a field
> called 'path' and a separate field called 'is_path_base64' or
> whatever.
+1. I'm not excited about this solution but don't have a better idea.
It might be nice to have a strict mode where non-ASCII/UTF8 characters
will error instead, but that can be added on later.
Regards,
--
-David
david(at)pgmasters(dot)net
From: | David Steele <david(at)pgmasters(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-24 16:36:58 |
Message-ID: | 12b96994-47c2-c87d-2c9b-710d3e052b3b@pgmasters.net |
Lists: | pgsql-hackers |
On 1/24/20 9:27 AM, Tom Lane wrote:
> Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> writes:
>> On 2020-01-23 18:04, Robert Haas wrote:
>>> Now, you might say "well, why don't we just do an encoding
>>> conversion?", but we can't. When the filesystem tells us what the file
>>> names are, it does not tell us what encoding the person who created
>>> those files had in mind. We don't know that they had*any* encoding in
>>> mind. IIUC, a file in the data directory can have a name that consists
>>> of any sequence of bytes whatsoever, so long as it doesn't contain
>>> prohibited characters like a path separator or \0 byte. But only some
>>> of those possible octet sequences can be stored in a manifest that has
>>> to be valid utf-8.
>
>> I think it wouldn't be unreasonable to require that file names in the
>> database directory be consistently encoded (as defined by pg_control,
>> probably). After all, this information is sometimes also shown in
>> system views, so it's already difficult to process total junk. In
>> practice, this shouldn't be an onerous requirement.
>
> I don't entirely follow why we're discussing this at all, if the
> requirement is backing up a PG data directory. There are not, and
> are never likely to be, any legitimate files with non-ASCII names
> in that context. Why can't we just skip any such files?
It's not uncommon in my experience for users to drop odd files into
PGDATA (usually versioned copies of postgresql.conf, etc.), but I agree
that it should be discouraged. Even so, I don't recall ever seeing any
non-ASCII filenames.
Skipping files sounds scary, I'd prefer an error or a warning (and then
base64 encode the filename).
Regards,
--
-David
david(at)pgmasters(dot)net
From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | David Steele <david(at)pgmasters(dot)net> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-24 17:00:01 |
Message-ID: | 20200124170001.GA10552@alvherre.pgsql |
Lists: | pgsql-hackers |
On 2020-Jan-24, David Steele wrote:
> It might be nice to have a strict mode where non-ASCII/UTF8 characters will
> error instead, but that can be added on later.
"your backup failed because you have a file we don't like" is not great
behavior. IIRC we already fail when a file is owned by root (or maybe
unreadable and owned by root), and it messes up severely when people
edit postgresql.conf as root. Let's not add more cases of that sort.
Maybe we can get away with *ignoring* such files, perhaps after emitting
a warning.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: | David Steele <david(at)pgmasters(dot)net> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-24 17:06:39 |
Message-ID: | 8dff8ca3-7efa-a369-ef37-efdad5d2504a@pgmasters.net |
Lists: | pgsql-hackers |
On 1/24/20 10:00 AM, Alvaro Herrera wrote:
> On 2020-Jan-24, David Steele wrote:
>
>> It might be nice to have a strict mode where non-ASCII/UTF8 characters will
>> error instead, but that can be added on later.
>
> "your backup failed because you have a file we don't like" is not great
> behavior. IIRC we already fail when a file is owned by root (or maybe
> unreadable and owned by root), and it messes up severely when people
> edit postgresql.conf as root. Let's not add more cases of that sort.
My intention was that the strict mode would not be the default, so I
don't see why it would be a big issue.
> Maybe we can get away with *ignoring* such files, perhaps after emitting
> a warning.
I'd prefer an error (or base64 encoding) rather than just skipping a
file. The latter sounds scary.
Regards,
--
-David
david(at)pgmasters(dot)net
From: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
---|---|
To: | David Steele <david(at)pgmasters(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-24 17:14:34 |
Message-ID: | A7971FA1-D8A2-4A0A-BFDD-496FEBF9DE25@enterprisedb.com |
Lists: | pgsql-hackers |
> On Jan 24, 2020, at 8:36 AM, David Steele <david(at)pgmasters(dot)net> wrote:
>
>> I don't entirely follow why we're discussing this at all, if the
>> requirement is backing up a PG data directory. There are not, and
>> are never likely to be, any legitimate files with non-ASCII names
>> in that context. Why can't we just skip any such files?
>
> It's not uncommon in my experience for users to drop odd files into PGDATA (usually versioned copies of postgresql.conf, etc.), but I agree that it should be discouraged. Even so, I don't recall ever seeing any non-ASCII filenames.
>
> Skipping files sounds scary, I'd prefer an error or a warning (and then base64 encode the filename).
I tend to agree with Tom. We know that postgres doesn’t write any such files now, and if we ever decided to change that, we could change this, too. So for now, we can assume any such files are not ours. Either the user manually scribbled in this directory, or had a tool (antivirus checksum file, vim .WHATEVER.swp file, etc) that did so. Raising an error would break any automated backup process that hit this issue, and base64 encoding the file name and backing up the file contents could grab data that the user would not reasonably expect in the backup. But this argument applies equally well to such files regardless of filename encoding. It would be odd to back them up when they happen to be valid utf-8/ASCII/whatever, but not do so when they are not valid. I would expect, therefore, that we only back up files which match our expected file name pattern and ignore (perhaps with a warning) everything else.
Quoting from Robert’s email about why we want a backup manifest seems to support this idea, at least as I see it:
> So, let's suppose we invent a backup manifest. What should it contain?
> I imagine that it would consist of a list of files, and the lengths of
> those files, and a checksum for each file. I think you should have a
> choice of what kind of checksums to use, because algorithms that used
> to seem like good choices (e.g. MD5) no longer do; this trend can
> probably be expected to continue. Even if we initially support only
> one kind of checksum -- presumably SHA-something since we have code
> for that already for SCRAM -- I think that it would also be a good
> idea to allow for future changes. And maybe it's best to just allow a
> choice of SHA-224, SHA-256, SHA-384, and SHA-512 right out of the
> gate, so that we can avoid bikeshedding over which one is secure
> enough. I guess we'll still have to argue about the default. I also
> think that it should be possible to build a manifest with no
> checksums, so that one need not pay the overhead of computing
> checksums if one does not wish. Of course, such a manifest is of much
> less utility for checking backup integrity, but you can still check
> that you've got the right files, which is noticeably better than
> nothing. The manifest should probably also contain a checksum of its
> own contents so that the integrity of the manifest itself can be
> verified. And maybe a few other bits of metadata, but I'm not sure
> exactly what. Ideas?
>
>
>
> Once we invent the concept of a backup manifest, what do we need to do
> with them? I think we'd want three things initially:
>
>
>
> (1) When taking a backup, have the option (perhaps enabled by default)
> to include a backup manifest.
> (2) Given an existing backup that has not got a manifest, construct one.
> (3) Cross-check a manifest against a backup and complain about extra
> files, missing files, size differences, or checksum mismatches.
Nothing in there sounds to me like it needs to include random cruft.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | David Steele <david(at)pgmasters(dot)net> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-24 17:42:36 |
Message-ID: | 20200124174236.GA11066@alvherre.pgsql |
Lists: | pgsql-hackers |
On 2020-Jan-24, David Steele wrote:
> On 1/24/20 10:00 AM, Alvaro Herrera wrote:
> > Maybe we can get away with *ignoring* such files, perhaps after emitting
> > a warning.
>
> I'd prefer an an error (or base64 encoding) rather than just skipping a
> file. The latter sounds scary.
Well, if the file is "invalid" then evidently Postgres cannot possibly
care about it, so why would it care if it's missing from the backup?
I prefer the encoding scheme myself. I don't see the point of the
error.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-24 17:48:30
Message-ID: 29018.1579888110@sss.pgh.pa.us
Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> I prefer the encoding scheme myself. I don't see the point of the
> error.
Yeah, if we don't want to skip such files, then storing them using
a base64-encoded name (with a different key than regular names)
seems plausible. But I don't really see why we'd go to that much
trouble, nor why we'd think it's likely that tools would correctly
handle a case that is going to have 0.00% usage in the field.
regards, tom lane
From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-24 17:53:18
Message-ID: 20200124175318.GA12101@alvherre.pgsql
On 2020-Jan-24, Mark Dilger wrote:
> I would expect, therefore, that we only back up files which match our
> expected file name pattern and ignore (perhaps with a warning)
> everything else.
That risks missing files placed in the datadir by extensions; see
discussion about pg_checksums using a whitelist[1], which does not
translate directly to this problem, because omitting to checksum a file
is not the same as failing to copy a file into the backups.
(Essentially, the backups would be incomplete.)
[1] https://postgr.es/m/20181019171747.4uithw2sjkt6msne@alap3.anarazel.de
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-24 17:56:36
Message-ID: 29379.1579888596@sss.pgh.pa.us
Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> On 2020-Jan-24, Mark Dilger wrote:
>> I would expect, therefore, that we only back up files which match our
>> expected file name pattern and ignore (perhaps with a warning)
>> everything else.
> That risks missing files placed in the datadir by extensions;
I agree that assuming we know everything that will appear in the
data directory is a pretty unsafe assumption. But no rational
extension is going to use a non-ASCII file name, either, if only
because it can't predict what the filesystem encoding will be.
regards, tom lane
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-24 17:56:53
Message-ID: CA+TgmoYbih=cNJfWAkEXFir17m7EXC2U6s7WxVT=iBOyqtYRrA@mail.gmail.com
On Fri, Jan 24, 2020 at 9:48 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> > I prefer the encoding scheme myself. I don't see the point of the
> > error.
>
> Yeah, if we don't want to skip such files, then storing them using
> a base64-encoded name (with a different key than regular names)
> seems plausible. But I don't really see why we'd go to that much
> trouble, nor why we'd think it's likely that tools would correctly
> handle a case that is going to have 0.00% usage in the field.
I mean, I gave a not-totally-unrealistic example of how this could
happen upthread. I agree it's going to be rare, but it's not usually
OK to decide that if a user does something a little unusual,
not-obviously-related features subtly break.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-24 18:03:12
Message-ID: 20200124180312.GA12664@alvherre.pgsql
On 2020-Jan-24, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> > On 2020-Jan-24, Mark Dilger wrote:
> >> I would expect, therefore, that we only back up files which match our
> >> expected file name pattern and ignore (perhaps with a warning)
> >> everything else.
>
> > That risks missing files placed in the datadir by extensions;
>
> I agree that assuming we know everything that will appear in the
> data directory is a pretty unsafe assumption. But no rational
> extension is going to use a non-ASCII file name, either, if only
> because it can't predict what the filesystem encoding will be.
I see two different arguments. One is about the file encoding. Those
files are rare and would be placed by the user manually. We can fix
that by encoding the name. We can have a debug mode that encodes all
names that way, just to ensure the tools are prepared for it.
The other is Mark's point about "expected file pattern", which seems a
slippery slope to me. If the pattern is /^[a-zA-Z0-9_.]*$/ then I'm
okay with it (maybe add a few other punctuation chars); as you say no
sane extension would use names much weirder than that. But we should
not be stricter, such as counting the number of periods/underscores
allowed or where alpha chars are expected (except maybe disallow a period
at the start of a filename), or anything too specific like that.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-24 18:16:12
Message-ID: c323af76-0f3d-02d8-bb13-f6f2d6dc5281@2ndquadrant.com
On 2020-01-24 18:56, Robert Haas wrote:
> On Fri, Jan 24, 2020 at 9:48 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
>>> I prefer the encoding scheme myself. I don't see the point of the
>>> error.
>>
>> Yeah, if we don't want to skip such files, then storing them using
>> a base64-encoded name (with a different key than regular names)
>> seems plausible. But I don't really see why we'd go to that much
>> trouble, nor why we'd think it's likely that tools would correctly
>> handle a case that is going to have 0.00% usage in the field.
>
> I mean, I gave a not-totally-unrealistic example of how this could
> happen upthread. I agree it's going to be rare, but it's not usually
> OK to decide that if a user does something a little unusual,
> not-obviously-related features subtly break.
Another example might be log files under pg_log with localized weekday
or month names. (Maybe we're not planning to back up log files, but the
routines that deal with file names should probably be prepared to at
least look at the name and decide that they don't care about it rather
than freaking out right away.)
I'm not fond of the base64 idea btw., because it seems to sort of
penalize using non-ASCII characters by making the result completely not
human readable. Something along the lines of MIME would be better in
that way. There are existing solutions to storing data with metadata
around it.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-24 18:16:42
Message-ID: D0450D6D-69C2-43AE-A9DA-239DF2AFFA60@enterprisedb.com
> On Jan 24, 2020, at 10:03 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>
> The other is Mark's point about "expected file pattern", which seems a
> slippery slope to me. If the pattern is /^[a-zA-Z0-9_.]*$/ then I'm
> okay with it (maybe add a few other punctuation chars); as you say no
> sane extension would use names much weirder than that. But we should
> not be stricter, such as counting the number of periods/underscores
> allowed or where are alpha chars expected (except maybe disallow period
> at start of filename), or anything too specific like that.
What bothered me about skipping files based only on encoding is that it creates hard-to-anticipate bugs. If an extension embeds something, like a customer name, into a filename, and that something is usually ASCII, or usually valid UTF-8, the file gets backed up; but the day it embeds something that is not ASCII/UTF-8, the file silently does not get backed up, and maybe nobody notices until they actually *need* the backup, and it’s too late.
We either need to be really strict about what gets backed up, so that nobody gets a false sense of security about what gets included in that list, or we need to be completely permissive, which would include files named in arbitrary encodings. I don’t see how it does anybody any favors to make the system appear to back up everything until you hit this unanticipated case and then it fails.
--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-24 18:43:07
Message-ID: CA+TgmoYw5_VMggwa+ima1wDJ-UY2y+GVT506D4ya1fwybDvEWw@mail.gmail.com
On Thu, Jan 23, 2020 at 1:05 PM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> Ok, I finished merging with Robert’s patches. The attached follow his numbering, with my patches intended to be applied after his.
I think it'd be a good idea to move the pg_wchar.h stuff into a new
thread. This thread is getting a bit complicated, because we've got
(1) the patches need to do $SUBJECT plus (2) additional patches that
clean up the multibyte stuff more plus (3) discussion of issues that
pertain to the backup manifest thread. To my knowledge, $SUBJECT
doesn't strictly require the pg_wchar.h changes, so I suggest we try
to segregate those.
> I tried not to change his work too much, but I did a bit of refactoring in 0010, as explained in the commit comment.
Hmm, I generally prefer to avoid these kinds of macro tricks because I
think they can be confusing to the reader. It's worth it in a case
like equalfuncs.c where so much boilerplate code is saved that the
gain in readability more than makes up for having to go check what the
macros do -- but I don't feel that's the case here. There aren't
*that* many call sites, and I think the code will be easier to
understand without "return" statements concealed within macros...
> I ran some benchmarks for json parsing in the backend both before and after these patches, with very slight changes in runtime.
Cool, thanks.
Since 0001-0003 have been reviewed by multiple people and nobody's
objected, I have committed those. But I made a hash of it: the first
one, I failed to credit any reviewers, or include a Discussion link,
and I just realized that I should have listed Alvaro's name as a
reviewer also. Sorry about that.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Steele <david(at)pgmasters(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-24 19:08:24
Message-ID: 20200124190824.GA13624@alvherre.pgsql
On 2020-Jan-24, Peter Eisentraut wrote:
> I'm not fond of the base64 idea btw., because it seems to sort of penalize
> using non-ASCII characters by making the result completely not human
> readable. Something along the lines of MIME would be better in that way.
> There are existing solutions to storing data with metadata around it.
You mean quoted-printable? That works for me.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-24 19:50:00
Message-ID: 4F0F5B2F-6907-479B-9664-21090BE42B5C@enterprisedb.com
> On Jan 24, 2020, at 10:43 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> Since 0001-0003 have been reviewed by multiple people and nobody's
> objected, I have committed those.
I think 0004-0005 have been reviewed and accepted by both me and Andrew, if I understood him correctly:
> I've reviewed these patches and Robert's, and they seem basically good to me.
Certainly, nothing in those two patches caused me any concern. I’m going to modify my patches as you suggested, get rid of the INSIST macro, and move the pg_wchar changes to their own thread. None of that should require changes in your 0004 or 0005. It won’t bother me if you commit those two. Andrew?
--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-26 19:24:39
Message-ID: 74C3FE62-3834-40C9-90FF-B77E70B69A52@enterprisedb.com
> On Jan 22, 2020, at 10:53 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> 0004 is a substantially cleaned up version of the patch to make the
> JSON parser return a result code rather than throwing errors. Names
> have been fixed, interfaces have been tidied up, and the thing is
> better integrated with the surrounding code. I would really like
> comments, if anyone has them, on whether this approach is acceptable.
>
> 0005 builds on 0004 by moving three functions from jsonapi.c to
> jsonfuncs.c. With that done, jsonapi.c has minimal remaining
> dependencies on the backend environment. It would still need a
> substitute for elog(ERROR, "some internal thing is broken"); I'm
> thinking of using pg_log_fatal() for that case. It would also need a
> fix for the problem that pg_mblen() is not available in the front-end
> environment. I don't know what to do about that yet exactly, but it
> doesn't seem unsolvable. The frontend environment just needs to know
> which encoding to use, and needs a way to call PQmblen() rather than
> pg_mblen().
I have completed the work in the attached 0006 and 0007 patches.
These are intended to apply after your 0004 and 0005; they won’t
work directly on master which, as of this writing, only contains your
0001-0003 patches.
0006 finishes moving the json parser to src/include/common and src/common.
0007 adds testing.
I would appreciate somebody looking at the portability issues for 0007
on Windows.
--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
- v4-0006-Relocating-jsonapi-to-common.patch (application/octet-stream, 11.5 KB)
- v4-0007-Adding-frontend-tests-for-json-parser.patch (application/octet-stream, 9.9 KB)
From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-27 00:20:53
Message-ID: CAA8=A7-ttNPzkS0AizKr+iJAtkx9LMhmQ3-PTg74v838mcnpuQ@mail.gmail.com
On Sat, Jan 25, 2020 at 6:20 AM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>
>
>
> > On Jan 24, 2020, at 10:43 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > Since 0001-0003 have been reviewed by multiple people and nobody's
> > objected, I have committed those.
>
> I think 0004-0005 have been reviewed and accepted by both me and Andrew, if I understood him correctly:
>
> > I've reviewed these patches and Robert's, and they seem basically good to me.
>
> Certainly, nothing in those two patches caused me any concern. I’m going to modify my patches as you suggested, get rid of the INSIST macro, and move the pg_wchar changes to their own thread. None of that should require changes in your 0004 or 0005. It won’t bother me if you commit those two. Andrew?
>
Just reviewed the latest versions of 4 and 5; they look good to me.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-27 01:09:53
Message-ID: CAA8=A7-CQNhgTqOniaEZKUe8EfNMDxwUyyn7rucYKj3-k=YQ8Q@mail.gmail.com
On Mon, Jan 27, 2020 at 5:54 AM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>
>
>
> > On Jan 22, 2020, at 10:53 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > 0004 is a substantially cleaned up version of the patch to make the
> > JSON parser return a result code rather than throwing errors. Names
> > have been fixed, interfaces have been tidied up, and the thing is
> > better integrated with the surrounding code. I would really like
> > comments, if anyone has them, on whether this approach is acceptable.
> >
> > 0005 builds on 0004 by moving three functions from jsonapi.c to
> > jsonfuncs.c. With that done, jsonapi.c has minimal remaining
> > dependencies on the backend environment. It would still need a
> > substitute for elog(ERROR, "some internal thing is broken"); I'm
> > thinking of using pg_log_fatal() for that case. It would also need a
> > fix for the problem that pg_mblen() is not available in the front-end
> > environment. I don't know what to do about that yet exactly, but it
> > doesn't seem unsolvable. The frontend environment just needs to know
> > which encoding to use, and needs a way to call PQmblen() rather than
> > pg_mblen().
>
> I have completed the work in the attached 0006 and 0007 patches.
> These are intended to apply after your 0004 and 0005; they won’t
> work directly on master which, as of this writing, only contains your
> 0001-0003 patches.
>
> 0006 finishes moving the json parser to src/include/common and src/common.
>
> 0007 adds testing.
>
> I would appreciate somebody looking at the portability issues for 0007
> on Windows.
>
We'll need at a minimum something added to src/tools/msvc to build the
test program, maybe some other stuff too. I'll take a look.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-27 01:11:11
Message-ID: 76777B68-A038-4DE3-900F-145B0BE0165D@enterprisedb.com
> On Jan 26, 2020, at 5:09 PM, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
>
> We'll need at a minimum something added to src/tools/msvc to build the
> test program, maybe some other stuff too. I'll take a look.
Thanks, much appreciated.
--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-27 01:51:58
Message-ID: CAA8=A7_4jnnbEzY55T5ZfJ4fWs-rjq4kgd-t1m02ik+-XkuECQ@mail.gmail.com
> > 0007 adds testing.
> >
> > I would appreciate somebody looking at the portability issues for 0007
> > on Windows.
> >
>
> We'll need at a minimum something added to src/tools/msvc to build the
> test program, maybe some other stuff too. I'll take a look.
Patch complains that the 0007 patch is malformed:
andrew(at)ariana:pg_head (master)*$ patch -p 1 <
~/Downloads/v4-0007-Adding-frontend-tests-for-json-parser.patch
patching file src/Makefile
patching file src/test/Makefile
patching file src/test/bin/.gitignore
patching file src/test/bin/Makefile
patching file src/test/bin/README
patching file src/test/bin/t/001_test_json.pl
patch: **** malformed patch at line 201: diff --git
a/src/test/bin/test_json.c b/src/test/bin/test_json.c
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-27 02:05:24
Message-ID: F18B5CD0-B9E8-40EC-9A87-9E7F1990171F@enterprisedb.com
> On Jan 26, 2020, at 5:51 PM, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
>
>>> 0007 adds testing.
>>>
>>> I would appreciate somebody looking at the portability issues for 0007
>>> on Windows.
>>>
>>
>> We'll need at a minimum something added to src/tools/msvc to build the
>> test program, maybe some other stuff too. I'll take a look.
>
>
> Patch complains that the 0007 patch is malformed:
>
> andrew(at)ariana:pg_head (master)*$ patch -p 1 <
> ~/Downloads/v4-0007-Adding-frontend-tests-for-json-parser.patch
> patching file src/Makefile
> patching file src/test/Makefile
> patching file src/test/bin/.gitignore
> patching file src/test/bin/Makefile
> patching file src/test/bin/README
> patching file src/test/bin/t/001_test_json.pl
> patch: **** malformed patch at line 201: diff --git
> a/src/test/bin/test_json.c b/src/test/bin/test_json.c
I manually removed a stray newline in the patch file. I shouldn’t have done that. I’ve removed the stray newline in the sources, committed (with git commit --amend) and am testing again, which is what I should have done the first time….
Ok, the tests pass. Here are those two patches again, both regenerated with a fresh invocation of ‘git format-patch’.
--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
- v5-0006-Relocating-jsonapi-to-common.patch (application/octet-stream, 11.5 KB)
- v5-0007-Adding-frontend-tests-for-json-parser.patch (application/octet-stream, 9.9 KB)
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-27 13:30:27
Message-ID: CA+TgmoZffsCGkOQkopwQhnrz5p8vVLTL+SKh+=4nN4jQCSVJOw@mail.gmail.com
On Sun, Jan 26, 2020 at 9:05 PM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> Ok, the tests pass. Here are those two patches again, both regenerated with a fresh invocation of ‘git format-patch’.
Regarding 0006:
+#ifndef FRONTEND
#include "miscadmin.h"
-#include "utils/jsonapi.h"
+#endif
I suggest
#ifdef FRONTEND
#define check_stack_depth()
#else
#include "miscadmin.h"
#endif
- lex->token_terminator = s + pg_mblen(s);
+ lex->token_terminator = s +
pg_wchar_table[lex->input_encoding].mblen((const unsigned char *) s);
Can we use pq_encoding_mblen() here? Regardless, it doesn't seem great
to add more direct references to pg_wchar_table. I think we should
avoid that.
+ return JSON_BAD_PARSER_STATE;
I don't like this, either. I'm thinking about adding some
variable-argument macros that either elog() in backend code or else
pg_log_fatal() and exit(1) in frontend code. There are some existing
precedents already (e.g. rmtree.c, pgfnames.c) which could perhaps be
generalized. I think I'll start a new thread about that.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-27 19:02:16
Message-ID: CAKYtNAr_PzXtcx8DNQg_KERiqv25O-mx2bOb_dptBbLOSVwbug@mail.gmail.com
On Mon, 27 Jan 2020 at 19:00, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Sun, Jan 26, 2020 at 9:05 PM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> > Ok, the tests pass. Here are those two patches again, both regenerated with a fresh invocation of ‘git format-patch’.
>
> Regarding 0006:
>
> +#ifndef FRONTEND
> #include "miscadmin.h"
> -#include "utils/jsonapi.h"
> +#endif
>
> I suggest
>
> #ifdef FRONTEND
> #define check_stack_depth()
> #else
> #include "miscadmin.h"
> #endif
>
> - lex->token_terminator = s + pg_mblen(s);
> + lex->token_terminator = s +
> pg_wchar_table[lex->input_encoding].mblen((const unsigned char *) s);
>
> Can we use pq_encoding_mblen() here? Regardless, it doesn't seem great
> to add more direct references to pg_wchar_table. I think we should
> avoid that.
>
> + return JSON_BAD_PARSER_STATE;
>
> I don't like this, either. I'm thinking about adding some
> variable-argument macros that either elog() in backend code or else
> pg_log_fatal() and exit(1) in frontend code. There are some existing
> precedents already (e.g. rmtree.c, pgfnames.c) which could perhaps be
> generalized. I think I'll start a new thread about that.
>
Hi,
I can see one warning on HEAD.
jsonapi.c: In function ‘json_errdetail’:
jsonapi.c:1068:1: warning: control reaches end of non-void function
[-Wreturn-type]
}
^
Attaching a patch to fix warning.
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
Attachments:
- Fixed_compiler_warning_json_errdetail.patch (application/octet-stream, 614 bytes)
From: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-27 20:05:44 |
Message-ID: | CA41881F-6DDF-4E7C-8EA2-A01925913BB7@enterprisedb.com |
> On Jan 27, 2020, at 5:30 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Sun, Jan 26, 2020 at 9:05 PM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>> Ok, the tests pass. Here are those two patches again, both regenerated with a fresh invocation of ‘git format-patch’.
>
> Regarding 0006:
>
> +#ifndef FRONTEND
> #include "miscadmin.h"
> -#include "utils/jsonapi.h"
> +#endif
>
> I suggest
>
> #ifdef FRONTEND
> #define check_stack_depth()
> #else
> #include "miscadmin.h"
> #endif
Sure, we can do it that way.
> - lex->token_terminator = s + pg_mblen(s);
> + lex->token_terminator = s +
> pg_wchar_table[lex->input_encoding].mblen((const unsigned char *) s);
>
> Can we use pq_encoding_mblen() here? Regardless, it doesn't seem great
> to add more direct references to pg_wchar_table. I think we should
> avoid that.
Yes, that looks a lot cleaner.
>
> + return JSON_BAD_PARSER_STATE;
>
> I don't like this, either. I'm thinking about adding some
> variable-argument macros that either elog() in backend code or else
> pg_log_fatal() and exit(1) in frontend code. There are some existing
> precedents already (e.g. rmtree.c, pgfnames.c) which could perhaps be
> generalized. I think I'll start a new thread about that.
Right, you started the “pg_croak, or something like it?” thread, which already looks like it might not be resolved quickly. Can we use the
#ifndef FRONTEND
#define pg_log_warning(...) elog(WARNING, __VA_ARGS__)
#else
#include "common/logging.h"
#endif
pattern here as a placeholder, and revisit it along with the other couple of instances of this pattern if/when the “pg_croak, or something like it?” thread is ready for commit? I’m calling it json_log_and_abort(…) for now, as I can’t hope to guess what the final name will be.
I’m attaching a new patch set with these three changes including Mahendra’s patch posted elsewhere on this thread.
Since you’ve committed your 0004 and 0005 patches, this v6 patch set is now based on a fresh copy of master.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment | Content-Type | Size |
---|---|---|
v6-0001-Relocating-jsonapi-to-common.patch | application/octet-stream | 11.0 KB |
v6-0002-Fixed-warning-for-json_errdetail.patch | application/octet-stream | 848 bytes |
v6-0003-Adding-frontend-tests-for-json-parser.patch | application/octet-stream | 9.9 KB |
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Mahendra Singh Thalor <mahi6run(at)gmail(dot)com> |
Cc: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 15:06:23 |
Message-ID: | CA+TgmoYz+G5xvyuaTDqkQThMZPF7h1JNeHJ1UZghnGjrYSVJ+w@mail.gmail.com |
On Mon, Jan 27, 2020 at 2:02 PM Mahendra Singh Thalor
<mahi6run(at)gmail(dot)com> wrote:
> I can see one warning on HEAD.
>
> jsonapi.c: In function ‘json_errdetail’:
> jsonapi.c:1068:1: warning: control reaches end of non-void function
> [-Wreturn-type]
> }
> ^
>
> Attaching a patch to fix warning.
Hmm, I don't get a warning there. This function is a switch over an
enum type with a case for every value of the enum, and every branch
either does a "return" or an "elog," so any code after the switch
should be unreachable. It's possible your compiler is too dumb to know
that, but I thought there were other places in the code base where we
assumed that if we handled every defined value of enum, that was good
enough.
But maybe not. I found similar coding in CreateDestReceiver(), and
that ends with:
/* should never get here */
pg_unreachable();
So perhaps we need the same thing here. Does adding that fix it for you?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
Cc: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 15:17:09 |
Message-ID: | CA+TgmoZ6exi7MoEb6_MT7JhkbBJM7D=9-HYuJk+1K9WApsiVTg@mail.gmail.com |
On Mon, Jan 27, 2020 at 3:05 PM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> I’m attaching a new patch set with these three changes including Mahendra’s patch posted elsewhere on this thread.
>
> Since you’ve committed your 0004 and 0005 patches, this v6 patch set is now based on a fresh copy of master.
OK, so I think this is getting close.
What is now 0001 manages to have four (4) conditionals on FRONTEND at
the top of the file. This seems like at least one too many. I am OK
with this being separate:
+#ifndef FRONTEND
#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
postgres(_fe).h has pride of place among includes, so it's reasonable
to put this in its own section like this.
+#ifdef FRONTEND
+#define check_stack_depth()
+#define json_log_and_abort(...) pg_log_fatal(__VA_ARGS__); exit(1);
+#else
+#define json_log_and_abort(...) elog(ERROR, __VA_ARGS__)
+#endif
OK, so here we have a section entirely devoted to our own file-local
macros. Also reasonable. But in between, you have both an #ifdef
FRONTEND and an #ifndef FRONTEND for other includes, and I really
think that should be like #ifdef FRONTEND .. #else .. #endif.
Also, the preprocessor police are on their way to your house now to
arrest you for that first one. You need to write it like this:
#define json_log_and_abort(...) \
do { pg_log_fatal(__VA_ARGS__); exit(1); } while (0)
Otherwise, hilarity ensues if somebody writes if (my_code_is_buggy)
json_log_and_abort("oops").
{
- JsonLexContext *lex = palloc0(sizeof(JsonLexContext));
+ JsonLexContext *lex;
+
+#ifndef FRONTEND
+ lex = palloc0(sizeof(JsonLexContext));
+#else
+ lex = (JsonLexContext*) malloc(sizeof(JsonLexContext));
+ memset(lex, 0, sizeof(JsonLexContext));
+#endif
Instead of this, how about making no change at all here?
- default:
- elog(ERROR, "unexpected json parse state: %d", ctx);
}
+
+ /* Not reached */
+ json_log_and_abort("unexpected json parse state: %d", ctx);
This, too, seems unnecessary.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Julien Rouhaud <rjuju123(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 15:21:05 |
Message-ID: | CAOBaU_b2yMS1WyUzMGCguCXN0ubsz6zDUPQQiuGfFBLp9KdH+g@mail.gmail.com |
On Tue, Jan 28, 2020 at 4:06 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Jan 27, 2020 at 2:02 PM Mahendra Singh Thalor
> <mahi6run(at)gmail(dot)com> wrote:
> > I can see one warning on HEAD.
> >
> > jsonapi.c: In function ‘json_errdetail’:
> > jsonapi.c:1068:1: warning: control reaches end of non-void function
> > [-Wreturn-type]
> > }
> > ^
> >
> > Attaching a patch to fix warning.
>
> Hmm, I don't get a warning there. This function is a switch over an
> enum type with a case for every value of the enum, and every branch
> either does a "return" or an "elog," so any code after the switch
> should be unreachable. It's possible your compiler is too dumb to know
> that, but I thought there were other places in the code base where we
> assumed that if we handled every defined value of enum, that was good
> enough.
>
> But maybe not. I found similar coding in CreateDestReceiver(), and
> that ends with:
>
> /* should never get here */
> pg_unreachable();
>
> So perhaps we need the same thing here. Does adding that fix it for you?
FTR this has unfortunately the same result on Thomas' automatic patch
tester, e.g. https://travis-ci.org/postgresql-cfbot/postgresql/builds/642634195#L1968
From: | Mahendra Singh Thalor <mahi6run(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 15:30:05 |
Message-ID: | CAKYtNAr1Bfv7i6ZyCuABarvj12F1++1KGmRs98j7ivomew-WyA@mail.gmail.com |
On Tue, 28 Jan 2020 at 20:36, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Jan 27, 2020 at 2:02 PM Mahendra Singh Thalor
> <mahi6run(at)gmail(dot)com> wrote:
> > I can see one warning on HEAD.
> >
> > jsonapi.c: In function ‘json_errdetail’:
> > jsonapi.c:1068:1: warning: control reaches end of non-void function
> > [-Wreturn-type]
> > }
> > ^
> >
> > Attaching a patch to fix warning.
>
> Hmm, I don't get a warning there. This function is a switch over an
> enum type with a case for every value of the enum, and every branch
> either does a "return" or an "elog," so any code after the switch
> should be unreachable. It's possible your compiler is too dumb to know
> that, but I thought there were other places in the code base where we
> assumed that if we handled every defined value of enum, that was good
> enough.
>
> But maybe not. I found similar coding in CreateDestReceiver(), and
> that ends with:
>
> /* should never get here */
> pg_unreachable();
>
> So perhaps we need the same thing here. Does adding that fix it for you?
>
Hi Robert,
Tom Lane already fixed this and committed yesterday (4589c6a2a30faba53d0655a8e).
--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Mahendra Singh Thalor <mahi6run(at)gmail(dot)com> |
Cc: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 15:31:22 |
Message-ID: | CA+TgmoYobK-JnsxOKPau_5h6VJP7M-oKTDqy19XCVjVEXu6hqw@mail.gmail.com |
On Tue, Jan 28, 2020 at 10:30 AM Mahendra Singh Thalor
<mahi6run(at)gmail(dot)com> wrote:
> Tom Lane already fixed this and committed yesterday (4589c6a2a30faba53d0655a8e).
Oops. OK, thanks.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Julien Rouhaud <rjuju123(at)gmail(dot)com> |
Cc: | Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 16:11:38 |
Message-ID: | CA+TgmoY5hfnHYYcp6FR0QqQuu+pVSt-=xW3iRSVE0hHFn_AadQ@mail.gmail.com |
On Tue, Jan 28, 2020 at 10:19 AM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> FTR this has unfortunately the same result on Thomas' automatic patch
> tester, e.g. https://travis-ci.org/postgresql-cfbot/postgresql/builds/642634195#L1968
That's unfortunate ... but presumably Tom's changes took care of this?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Julien Rouhaud <rjuju123(at)gmail(dot)com>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 16:26:29 |
Message-ID: | 12800.1580228789@sss.pgh.pa.us |
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Jan 28, 2020 at 10:19 AM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>> FTR this has unfortunately the same result on Thomas' automatic patch
>> tester, e.g. https://travis-ci.org/postgresql-cfbot/postgresql/builds/642634195#L1968
> That's unfortunate ... but presumably Tom's changes took care of this?
Probably the cfbot just hasn't retried this build since that fix.
I don't know what its algorithm is for retrying failed builds, but it
does seem to do so after awhile.
regards, tom lane
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
Cc: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 16:32:33 |
Message-ID: | CA+Tgmoa8m=w7f=EZAaJxmuQ_KX+QvjZx3GqWzZMpnkzcaRhekg@mail.gmail.com |
On Mon, Jan 27, 2020 at 3:05 PM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> Since you’ve committed your 0004 and 0005 patches, this v6 patch set is now based on a fresh copy of master.
I think the first question for 0005 is whether we want this at all.
Initially, you proposed NOT committing it, but then Andrew reviewed it
as if it were for commit. I'm not sure whether he was actually saying
that it ought to be committed, though, or whether he just missed your
remarks on the topic. Nobody else has really taken a position. I'm not
100% convinced that it's necessary to include this, but I'm also not
particularly opposed to it. It's a fairly small amount of code, which
is nice, and perhaps useful as a demonstration of how to use the JSON
parser in a frontend application, which someone also might find nice.
Anyone else want to express an opinion?
Meanwhile, here is a round of nitp^H^H^H^Hreview:
-# installcheck and install should not recurse into the subdirectory "modules".
+# installcheck and install should not recurse into the subdirectory "modules"
+# nor "bin".
I would probably have just changed this to:
# installcheck and install should not recurse into "modules" or "bin"
The details are arguable, but you definitely shouldn't say "the
subdirectory" and then list two of them.
+This directory contains a set of programs that exercise functionality declared
+in src/include/common and defined in src/common. The purpose of these programs
+is to verify that code intended to work both from frontend and backend code do
+indeed work when compiled and used in frontend code. The structure of this
+directory makes no attempt to test that such code works in the backend, as the
+backend has its own tests already, and presumably those tests sufficiently
+exercide the code as used by it.
"exercide" is not spelled correctly, but I also disagree with giving
the directory so narrow a charter. I think you should just say
something like:
This directory contains programs that are built and executed for testing
purposes, but never installed. It may be used, for example, to test that code
in src/common works in frontend environments.
+# There doesn't seem to be any easy way to get TestLib to use the binaries from
+# our directory, so we hack up a path to our binary and run that directly. This
+# seems brittle enough that some other solution should be found, if possible.
+
+my $test_json = join('/', $ENV{TESTDIR}, 'test_json');
I don't know what the right thing to do here is. Perhaps someone more
familiar with TAP testing can comment.
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_test_json"));
Do we need this? I guess we're not likely to bother with translations
for a test program.
+ /*
+ * Make stdout unbuffered to match stderr; and ensure stderr is unbuffered
+ * too, which it should already be everywhere except sometimes in Windows.
+ */
+ setbuf(stdout, NULL);
+ setbuf(stderr, NULL);
Do we need this? If so, why?
+ char *json;
+ unsigned int json_len;
+ JsonLexContext *lex;
+ int client_encoding;
+ JsonParseErrorType parse_result;
+
+ json_len = (unsigned int) strlen(str);
+ client_encoding = PQenv2encoding();
+
+ json = strdup(str);
+ lex = makeJsonLexContextCstringLen(json, strlen(json), client_encoding, true /* need_escapes */);
+ parse_result = pg_parse_json(lex, &nullSemAction);
+ fprintf(stdout, _("%s\n"), (JSON_SUCCESS == parse_result ? "VALID" : "INVALID"));
+ return;
json_len is set but not used.
Not entirely sure why we are using PQenv2encoding() here.
The trailing return is unnecessary.
I think it would be a good idea to use json_errdetail() in the failure
case, print the error, and have the tests check that we got the
expected error.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 16:34:57 |
Message-ID: | 13142.1580229297@sss.pgh.pa.us |
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Jan 28, 2020 at 10:30 AM Mahendra Singh Thalor
> <mahi6run(at)gmail(dot)com> wrote:
>> Tom Lane already fixed this and committed yesterday(4589c6a2a30faba53d0655a8e).
> Oops. OK, thanks.
Yeah, there were multiple issues here:
1. If a switch is expected to cover all values of an enum type,
we now prefer not to have a default: case, so that we'll get
compiler warnings if somebody adds an enum value and fails to
update the switch.
2. Without a default:, though, you need to have after-the-switch
code to catch the possibility that the runtime value was not a
legal enum element. Some compilers are trusting and assume that
that's not a possible case, but some are not (and Coverity will
complain about it too).
3. Some compilers still don't understand that elog(ERROR) doesn't
return, so you need a dummy return. Perhaps pg_unreachable()
would do as well, but project style has been the dummy return for
a long time ... and I'm not entirely convinced by the assumption
that every compiler understands pg_unreachable(), anyway.
(I know Robert knows all this stuff, even if he momentarily
forgot. Just summarizing for onlookers.)
regards, tom lane
From: | Julien Rouhaud <rjuju123(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 16:51:28 |
Message-ID: | CAOBaU_ZW0u7ZBUywoZ4y=9i6o0SyA7AyXE=kerKE8ozfBOO=Uw@mail.gmail.com |
On Tue, Jan 28, 2020 at 5:26 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Tue, Jan 28, 2020 at 10:19 AM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> >> FTR this has unfortunately the same result on Thomas' automatic patch
> >> tester, e.g. https://travis-ci.org/postgresql-cfbot/postgresql/builds/642634195#L1968
>
> > That's unfortunate ... but presumably Tom's changes took care of this?
>
> Probably the cfbot just hasn't retried this build since that fix.
> I don't know what its algorithm is for retrying failed builds, but it
> does seem to do so after awhile.
Yes, I seem to remember that Thomas put in some rules to avoid
rebuilding everything all the time. Patches that were rebuilt since
then are indeed starting to go back to green, so it's all good!
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 17:20:07 |
Message-ID: | CA+TgmoZwbYW2yXZD+HV-mQdg4BR2sh9JZQT12sQNOt+R+daxGw@mail.gmail.com |
On Tue, Jan 28, 2020 at 11:35 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> 3. Some compilers still don't understand that elog(ERROR) doesn't
> return, so you need a dummy return. Perhaps pg_unreachable()
> would do as well, but project style has been the dummy return for
> a long time ... and I'm not entirely convinced by the assumption
> that every compiler understands pg_unreachable(), anyway.
Is the example of CreateDestReceiver() sufficient to show that this is
not a problem in practice?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 18:16:18 |
Message-ID: | 17499.1580235378@sss.pgh.pa.us |
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Jan 28, 2020 at 11:35 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> 3. Some compilers still don't understand that elog(ERROR) doesn't
>> return, so you need a dummy return. Perhaps pg_unreachable()
>> would do as well, but project style has been the dummy return for
>> a long time ... and I'm not entirely convinced by the assumption
>> that every compiler understands pg_unreachable(), anyway.
> Is the example of CreateDestReceiver() sufficient to show that this is
> not a problem in practice?
Dunno. I don't see any warnings about that in the buildfarm, but
that's not a very large sample of non-gcc compilers.
Another angle here is that on non-gcc compilers, pg_unreachable()
is going to expand to an abort() call, which is likely to eat more
code space than a dummy "return 0".
regards, tom lane
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 18:32:26 |
Message-ID: | 18144.1580236346@sss.pgh.pa.us |
I wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Is the example of CreateDestReceiver() sufficient to show that this is
>> not a problem in practice?
> Dunno. I don't see any warnings about that in the buildfarm, but
> that's not a very large sample of non-gcc compilers.
BTW, now that I think about it, CreateDestReceiver is not up to project
standards anyway, in that it fails to provide reasonable behavior in
the case where what's passed is not a legal value of the enum.
What you'll get, if you're lucky, is a SIGABRT crash with no
indication of the cause --- or if you're not lucky, some very
hard-to-debug crash elsewhere as a result of the function returning
a garbage pointer. So independently of whether the current coding
suppresses compiler warnings reliably, I think we ought to replace it
with elog()-and-return-NULL. Admittedly, that's wasting a few bytes
on a case that should never happen ... but we haven't ever hesitated
to do that elsewhere, if it'd make the problem more diagnosable.
IOW, there's a good reason why there are exactly no other uses
of that coding pattern.
regards, tom lane
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 19:03:10 |
Message-ID: | CA+TgmoY=vJm2uG5xzu_EaO4n40gsXXdweSZcHijgV5EEJdEHcg@mail.gmail.com |
On Tue, Jan 28, 2020 at 1:32 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> BTW, now that I think about it, CreateDestReceiver is not up to project
> standards anyway, in that it fails to provide reasonable behavior in
> the case where what's passed is not a legal value of the enum.
> What you'll get, if you're lucky, is a SIGABRT crash with no
> indication of the cause --- or if you're not lucky, some very
> hard-to-debug crash elsewhere as a result of the function returning
> a garbage pointer. So independently of whether the current coding
> suppresses compiler warnings reliably, I think we ought to replace it
> with elog()-and-return-NULL. Admittedly, that's wasting a few bytes
> on a case that should never happen ... but we haven't ever hesitated
> to do that elsewhere, if it'd make the problem more diagnosable.
Well, I might be responsible for the CreateDestReceiver thing -- or I
might not, I haven't checked -- but I do think that style is a bit
cleaner and more elegant. I think it's VERY unlikely that anyone would
ever manage to call it with something that's not a legal value of the
enum, and if they do, I think the chances of surviving are basically
nil, and frankly, I'd rather die. If you asked me where you want me to
store my output and I tell you to store it in the sdklgjsdjgslkdg, you
really should refuse to do anything at all, not just stick my output
someplace-or-other and hope for the best.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 19:29:46 |
Message-ID: | 26911.1580239786@sss.pgh.pa.us |
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Jan 28, 2020 at 1:32 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> BTW, now that I think about it, CreateDestReceiver is not up to project
>> standards anyway, in that it fails to provide reasonable behavior in
>> the case where what's passed is not a legal value of the enum.
> Well, I might be responsible for the CreateDestReceiver thing -- or I
> might not, I haven't checked -- but I do think that style is a bit
> cleaner and more elegant. I think it's VERY unlikely that anyone would
> ever manage to call it with something that's not a legal value of the
> enum, and if they do, I think the chances of surviving are basically
> nil, and frankly, I'd rather die. If you asked me where you want me to
> store my output and I tell you to store it in the sdklgjsdjgslkdg, you
> really should refuse to do anything at all, not just stick my output
> someplace-or-other and hope for the best.
Well, yeah, that's exactly my point. But in my book, "refuse to do
anything" should be "elog(ERROR)", not "invoke undefined behavior".
An actual abort() call might be all right here, in that at least
we'd know what would happen and we could debug it once we got hold
of a stack trace. But pg_unreachable() is not that. Basically, if
there's *any* circumstance, broken code or not, where control could
reach a pg_unreachable() call, you did it wrong.
regards, tom lane
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 20:08:39 |
Message-ID: | CA+TgmobZXhdr83p_c5jf91vx7GFJ4w+dsO9zDZTPJniJHh=wrQ@mail.gmail.com |
On Tue, Jan 28, 2020 at 2:29 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Well, yeah, that's exactly my point. But in my book, "refuse to do
> anything" should be "elog(ERROR)", not "invoke undefined behavior".
> An actual abort() call might be all right here, in that at least
> we'd know what would happen and we could debug it once we got hold
> of a stack trace. But pg_unreachable() is not that. Basically, if
> there's *any* circumstance, broken code or not, where control could
> reach a pg_unreachable() call, you did it wrong.
I don't really agree. I think such defensive coding is more than
justified when the input is coming from a file on disk or some other
external source where it might have been corrupted. For instance, I
think the fact that the code which deforms heap tuples will cheerfully
sail off the end of the buffer or seg fault if the tuple descriptor
doesn't match the tuple is a seriously bad thing. It results in actual
production crashes that could be avoided with more defensive coding.
Admittedly, there would likely be a performance cost, which might not
be a reason to do it, but if that cost is small I would probably vote
for paying it, because this is something that actually happens to
users on a pretty regular basis.
In the case at hand, though, there are no constants of type
CommandDest that come from any place other than a constant in the
program text, and it seems unlikely that this will ever be different
in the future. So, how could we ever end up with a value that's not in
the enum? I guess the program text itself could be corrupted, but we
cannot defend against that.
Mind you, I'm not going to put up a huge stink if you're bound and
determined to go change this. I prefer it the way that it is, and I
think that preference is well-justified by facts on the ground, but I
don't think it's worth fighting about.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 20:42:08 |
Message-ID: | 4458.1580244128@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Jan 28, 2020 at 2:29 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Well, yeah, that's exactly my point. But in my book, "refuse to do
>> anything" should be "elog(ERROR)", not "invoke undefined behavior".
>> An actual abort() call might be all right here, in that at least
>> we'd know what would happen and we could debug it once we got hold
>> of a stack trace. But pg_unreachable() is not that. Basically, if
>> there's *any* circumstance, broken code or not, where control could
>> reach a pg_unreachable() call, you did it wrong.
> I don't really agree. I think such defensive coding is more than
> justified when the input is coming from a file on disk or some other
> external source where it might have been corrupted.
There's certainly an argument to be made that an elog() call is an
unjustified expenditure of code space and we should just do an abort()
(but still not pg_unreachable(), IMO). However, what I'm really on about
here is that CreateDestReceiver is out of step with nigh-universal project
practice. If it's not worth having an elog() here, then there are
literally hundreds of other elog() calls that we ought to be nuking on
the same grounds. I don't really want to run around and have a bunch
of debates about exactly which extant elog() calls are effectively
unreachable and which are not. That's not always very clear, and even
if it is clear today it might not be tomorrow. The minute somebody calls
CreateDestReceiver with a non-constant argument, the issue becomes open
again. And I'd rather not have to stop and think hard about the tradeoff
between elog() and abort() when I write such functions in future.
So basically, my problem with this is that I don't think it's a coding
style we want to encourage, because it's too fragile. And there's no
good argument (like performance) to leave it that way. I quite agree
with you that there are places like tuple deforming where we're taking
more chances than I'd like --- but there is a noticeable performance
cost to being paranoid there.
regards, tom lane
From: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 22:28:32 |
Message-ID: | DF8FAC3A-E557-4AB4-8103-B830276B8637@enterprisedb.com |
Lists: | pgsql-hackers |
> On Jan 28, 2020, at 8:32 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Jan 27, 2020 at 3:05 PM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>> Since you’ve committed your 0004 and 0005 patches, this v6 patch set is now based on a fresh copy of master.
>
> I think the first question for 0005 is whether we want this at all.
> Initially, you proposed NOT committing it, but then Andrew reviewed it
> as if it were for commit. I'm not sure whether he was actually saying
> that it ought to be committed, though, or whether he just missed your
> remarks on the topic. Nobody else has really taken a position. I'm not
> 100% convinced that it's necessary to include this, but I'm also not
> particularly opposed to it. It's a fairly small amount of code, which
> is nice, and perhaps useful as a demonstration of how to use the JSON
> parser in a frontend application, which someone also might find nice.
Once Andrew reviewed it, I started thinking about it as something that might get committed. In that context, I think there should be a lot more tests in this new src/test/bin directory for other common code, but adding those as part of this patch would just confuse it.
In addition to frontend tests for code already in src/common, the frontend versions of elog and ereport being discussed in another thread seem like candidates for tests in this location. Sure, you can add an elog into a real frontend tool, such as pg_ctl, and update the tests for that program to expect that elog’s output, but what if you just want to exhaustively test the elog infrastructure in the frontend spanning multiple locales, encodings, whatever? You’ve also recently mentioned the possibility of having memory contexts in frontend code. Testing those seems like a good fit, too.
I decided to leave this in the next version of the patch set, v7. v6 had three files, the second being something that already got committed in a different form, so this is now in v7-0002 whereas it had been in v6-0003. v6-0002 has no equivalent in v7.
> Anyone else want to express an opinion?
>
> Meanwhile, here is a round of nitp^H^H^H^Hreview:
>
> -# installcheck and install should not recurse into the subdirectory "modules".
> +# installcheck and install should not recurse into the subdirectory "modules"
> +# nor "bin".
>
> I would probably have just changed this to:
>
> # installcheck and install should not recurse into "modules" or "bin"
>
> The details are arguable, but you definitely shouldn't say "the
> subdirectory" and then list two of them.
I read that as “nor [the subdirectory] bin” with the [the subdirectory] portion elided, and it doesn’t sound anomalous to me, but your formulation is more compact. I have used it in v7 of the patch set. Thanks.
>
> +This directory contains a set of programs that exercise functionality declared
> +in src/include/common and defined in src/common. The purpose of these programs
> +is to verify that code intended to work both from frontend and backend code do
> +indeed work when compiled and used in frontend code. The structure of this
> +directory makes no attempt to test that such code works in the backend, as the
> +backend has its own tests already, and presumably those tests sufficiently
> +exercide the code as used by it.
>
> "exercide" is not spelled correctly, but I also disagree with giving
> the directory so narrow a charter. I think you should just say
> something like:
>
> This directory contains programs that are built and executed for
> testing purposes,
> but never installed. It may be used, for example, to test that code in
> src/common
> works in frontend environments.
Your formulation sounds fine, and I’ve used it in v7.
> +# There doesn't seem to be any easy way to get TestLib to use the binaries from
> +# our directory, so we hack up a path to our binary and run that
> directly. This
> +# seems brittle enough that some other solution should be found, if possible.
> +
> +my $test_json = join('/', $ENV{TESTDIR}, 'test_json');
>
> I don't know what the right thing to do here is. Perhaps someone more
> familiar with TAP testing can comment.
Yeah, I was hoping that might get a comment from Andrew. I think if it works as-is on windows, we could just use it this way until it causes a problem on some platform or other. It’s not a runtime issue, being only a build-time test, and only then when tap tests are enabled *and* running check-world, so nobody should really be adversely affected. I’ll likely get around to testing this on Windows, but I don’t have any Windows environments set up yet, as that is still on my todo list.
> + set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_test_json"));
>
> Do we need this? I guess we're not likely to bother with translations
> for a test program.
Removed.
> + /*
> + * Make stdout unbuffered to match stderr; and ensure stderr is unbuffered
> + * too, which it should already be everywhere except sometimes in Windows.
> + */
> + setbuf(stdout, NULL);
> + setbuf(stderr, NULL);
>
> Do we need this? If so, why?
For the current test setup, it is not needed. The tap test executes this program (test_json) once per json string, and exits after printing a single line. Surely the tap test wouldn’t have problems hanging on an unflushed buffer for a program that has exited. I was imagining this code might grow more complex, with the tap test communicating repeatedly with the same instance of test_json, such as if we extend the json parser to iterate over chunks of the input json string.
I’ve removed this for v7, since we don’t need it yet.
> + char *json;
> + unsigned int json_len;
> + JsonLexContext *lex;
> + int client_encoding;
> + JsonParseErrorType parse_result;
> +
> + json_len = (unsigned int) strlen(str);
> + client_encoding = PQenv2encoding();
> +
> + json = strdup(str);
> + lex = makeJsonLexContextCstringLen(json, strlen(json),
> client_encoding, true /* need_escapes */);
> + parse_result = pg_parse_json(lex, &nullSemAction);
> + fprintf(stdout, _("%s\n"), (JSON_SUCCESS == parse_result ? "VALID" :
> "INVALID"));
> + return;
>
> json_len is set but not used.
You’re right. I’ve removed it.
> Not entirely sure why we are using PQenv2encoding() here.
This program, which passes possibly-JSON-formatted strings into the parser, gets those strings from perl through the shell. If locale settings on the machine where this runs might break something about that for a real client application, then our test should break in the same way. Hard-coding “C” or “POSIX” or whatever for the locale side-steps part of the issue we’re trying to test. No?
I’m leaving it as is for v7, but if you still disagree, I’ll change it. Let me know what you want me to change it *to*, though, as there is no obvious choice that I can see.
> The trailing return is unnecessary.
Ok, I’ve removed it.
> I think it would be a good idea to use json_errdetail() in the failure
> case, print the error, and have the tests check that we got the
> expected error.
Oh, yeah, I like that idea. That works, and is included in v7.
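Combining the quoted snippet with that suggestion, the parse-and-report logic would end up looking roughly like this. This is a sketch against the in-tree common/jsonapi.h API as quoted in this thread; it only builds within the PostgreSQL source tree, and the exact json_errdetail() signature should be verified against the committed header:

```c
/*
 * Sketch only: builds against the PostgreSQL source tree
 * (common/jsonapi.h), not as a standalone program.
 */
#include "postgres_fe.h"

#include "common/jsonapi.h"
#include "mb/pg_wchar.h"

static void
report_json(char *json, int encoding)
{
	JsonLexContext *lex;
	JsonParseErrorType result;

	lex = makeJsonLexContextCstringLen(json, strlen(json),
									   encoding, true /* need_escapes */ );
	result = pg_parse_json(lex, &nullSemAction);
	if (result == JSON_SUCCESS)
		printf("VALID\n");
	else
		/* json_errdetail's exact signature may differ; check jsonapi.h */
		printf("INVALID: %s\n", json_errdetail(result, lex));
}
```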
--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment | Content-Type | Size |
---|---|---|
v7-0001-Relocating-jsonapi-to-common.patch | application/octet-stream | 11.0 KB |
v7-0002-Adding-frontend-tests-for-json-parser.patch | application/octet-stream | 9.8 KB |
From: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-28 22:35:44 |
Message-ID: | 1828A089-AFEE-4222-AEE8-D5CACFCF03C7@enterprisedb.com |
Lists: | pgsql-hackers |
Thanks for your review. I folded all of this, along with your review comments from another email, into v7, which I sent in response to that other email a few minutes ago.
> On Jan 28, 2020, at 7:17 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Jan 27, 2020 at 3:05 PM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>> I’m attaching a new patch set with these three changes including Mahendra’s patch posted elsewhere on this thread.
>>
>> Since you’ve committed your 0004 and 0005 patches, this v6 patch set is now based on a fresh copy of master.
>
> OK, so I think this is getting close.
>
> What is now 0001 manages to have four (4) conditionals on FRONTEND at
> the top of the file. This seems like at least one too many.
You are referencing this section, copied here from the patch:
> #ifndef FRONTEND
> #include "postgres.h"
> #else
> #include "postgres_fe.h"
> #endif
>
> #include "common/jsonapi.h"
>
> #ifdef FRONTEND
> #include "common/logging.h"
> #endif
>
> #include "mb/pg_wchar.h"
>
> #ifndef FRONTEND
> #include "miscadmin.h"
> #endif
I merged these a bit. See v7-0001 for details.
> Also, the preprocessor police are on their way to your house now to
> arrest you for that first one. You need to write it like this:
>
> #define json_log_and_abort(...) \
> do { pg_log_fatal(__VA_ARGS__); exit(1); } while (0)
Yes, right, I had done that and somehow didn’t get it into the patch. I’ll have coffee and donuts waiting.
> {
> - JsonLexContext *lex = palloc0(sizeof(JsonLexContext));
> + JsonLexContext *lex;
> +
> +#ifndef FRONTEND
> + lex = palloc0(sizeof(JsonLexContext));
> +#else
> + lex = (JsonLexContext*) malloc(sizeof(JsonLexContext));
> + memset(lex, 0, sizeof(JsonLexContext));
> +#endif
>
> Instead of this, how about making no change at all here?
Yes, good point. I had split that into frontend vs backend because I was using palloc0fast for the backend, which seems to me the preferred function when the size is compile-time known, like it is here, and there is no palloc0fast in fe_memutils.h for frontend use. I then changed back to palloc0 when I noticed that pretty much nowhere else similar to this in the project uses palloc0fast. I neglected to change back completely, which left what you are quoting.
Out of curiosity, why is palloc0fast not used in more places?
> - default:
> - elog(ERROR, "unexpected json parse state: %d", ctx);
> }
> +
> + /* Not reached */
> + json_log_and_abort("unexpected json parse state: %d", ctx);
>
> This, too, seems unnecessary.
This was in response to Mahendra’s report of a compiler warning, which I didn’t get on my platform. The code in master changed a bit since v6 was written, so v7 just goes with how the newer committed code does this.
--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> |
---|---|
To: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | David Steele <david(at)pgmasters(dot)net>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-29 06:02:13 |
Message-ID: | 8440ddc9-8347-ca64-1405-845d10e054cd@2ndQuadrant.com |
Lists: | pgsql-hackers |
On 1/28/20 5:28 PM, Mark Dilger wrote:
>
>
>> +# There doesn't seem to be any easy way to get TestLib to use the binaries from
>> +# our directory, so we hack up a path to our binary and run that
>> directly. This
>> +# seems brittle enough that some other solution should be found, if possible.
>> +
>> +my $test_json = join('/', $ENV{TESTDIR}, 'test_json');
>>
>> I don't know what the right thing to do here is. Perhaps someone more
>> familiar with TAP testing can comment.
> Yeah, I was hoping that might get a comment from Andrew. I think if it works as-is on windows, we could just use it this way until it causes a problem on some platform or other. It’s not a runtime issue, being only a build-time test, and only then when tap tests are enabled *and* running check-world, so nobody should really be adversely affected. I’ll likely get around to testing this on Windows, but I don’t have any Windows environments set up yet, as that is still on my todo list.
>
I think using TESTDIR is Ok, but we do need a little more on Windows,
because the executable name will be different. See attached revised
version of the test script.
We also need some extra stuff for MSVC. Something like the attached
change to src/tools/msvc/Mkvcbuild.pm. Also, the Makefile will need a
line like:
PROGRAM = test_json
I'm still not 100% sure about the location of the test. Given the way the
msvc suite works, I think this should be in its own dedicated directory,
e.g. src/test/json_parse.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
jsonapi-msvc.patch | text/x-patch | 1.5 KB |
001_test_json.pl | application/x-perl | 2.6 KB |
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
Cc: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-29 15:27:08 |
Message-ID: | CA+TgmoYz_gxm9EDEsQZ4TrdobXtUh6p82fzonVtevNA=otFZ3A@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Jan 28, 2020 at 5:35 PM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> I merged these a bit. See v7-0001 for details.
I jiggered that a bit more and committed this. I couldn't see the
point of having both the FRONTEND and non-FRONTEND code include
pg_wchar.h.
I'll wait to see what you make of Andrew's latest comments before
doing anything further.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-29 15:45:51 |
Message-ID: | 14178.1580312751@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Jan 28, 2020 at 5:35 PM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>> I merged these a bit. See v7-0001 for details.
> I jiggered that a bit more and committed this. I couldn't see the
> point of having both the FRONTEND and non-FRONTEND code include
> pg_wchar.h.
First buildfarm report is not positive:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-01-29%2015%3A30%3A26
json.obj : error LNK2019: unresolved external symbol makeJsonLexContextCstringLen referenced in function json_recv [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
jsonb.obj : error LNK2001: unresolved external symbol makeJsonLexContextCstringLen [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
jsonfuncs.obj : error LNK2001: unresolved external symbol makeJsonLexContextCstringLen [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
json.obj : error LNK2019: unresolved external symbol json_lex referenced in function json_typeof [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
json.obj : error LNK2019: unresolved external symbol IsValidJsonNumber referenced in function datum_to_json [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
json.obj : error LNK2001: unresolved external symbol nullSemAction [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
jsonfuncs.obj : error LNK2019: unresolved external symbol pg_parse_json referenced in function json_strip_nulls [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
jsonfuncs.obj : error LNK2019: unresolved external symbol json_count_array_elements referenced in function get_array_start [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
jsonfuncs.obj : error LNK2019: unresolved external symbol json_errdetail referenced in function json_ereport_error [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
.\Release\postgres\postgres.exe : fatal error LNK1120: 7 unresolved externals [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
regards, tom lane
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-29 15:48:17 |
Message-ID: | CA+TgmoZdV0yfqjc8NXtua1LzqY=p_481wi4tmhPnsNiaw1yuyA@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jan 29, 2020 at 10:45 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Tue, Jan 28, 2020 at 5:35 PM Mark Dilger
> > <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> >> I merged these a bit. See v7-0001 for details.
>
> > I jiggered that a bit more and committed this. I couldn't see the
> > point of having both the FRONTEND and non-FRONTEND code include
> > pg_wchar.h.
>
> First buildfarm report is not positive:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-01-29%2015%3A30%3A26
>
> json.obj : error LNK2019: unresolved external symbol makeJsonLexContextCstringLen referenced in function json_recv [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
> jsonb.obj : error LNK2001: unresolved external symbol makeJsonLexContextCstringLen [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
> jsonfuncs.obj : error LNK2001: unresolved external symbol makeJsonLexContextCstringLen [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
> json.obj : error LNK2019: unresolved external symbol json_lex referenced in function json_typeof [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
> json.obj : error LNK2019: unresolved external symbol IsValidJsonNumber referenced in function datum_to_json [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
> json.obj : error LNK2001: unresolved external symbol nullSemAction [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
> jsonfuncs.obj : error LNK2019: unresolved external symbol pg_parse_json referenced in function json_strip_nulls [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
> jsonfuncs.obj : error LNK2019: unresolved external symbol json_count_array_elements referenced in function get_array_start [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
> jsonfuncs.obj : error LNK2019: unresolved external symbol json_errdetail referenced in function json_ereport_error [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
> .\Release\postgres\postgres.exe : fatal error LNK1120: 7 unresolved externals [c:\pgbuildfarm\pgbuildroot\HEAD\pgsql.build\postgres.vcxproj]
Hrrm, OK. I think it must need a sprinkling of Windows-specific magic.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-29 15:55:10 |
Message-ID: | CA+Tgmobcofz2cA1VwDvwQc3i_Hegvoc1xw+akRU2cKvsh5Nf9A@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jan 29, 2020 at 10:48 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Hrrm, OK. I think it must need a sprinkling of Windows-specific magic.
I see that the patch Andrew posted earlier adjusts Mkvcbuild.pm's
@pgcommonallfiles, so I pushed that fix. The other hunks there should
go into the patch to add a test_json utility, I think. Hopefully that
will fix it, but I guess we'll see.
I was under the impression that the MSVC build gets the list of files
to build by parsing the Makefiles, but I guess that's not true at
least in the case of src/common.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> |
---|---|
To: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | David Steele <david(at)pgmasters(dot)net>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-29 21:02:08 |
Message-ID: | CAA8=A79ZBS6-xU78P1vkxJ17p1+7Q4EJUAUzM14x9xMjPMvNNw@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jan 29, 2020 at 4:32 PM Andrew Dunstan
<andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
>
>
> On 1/28/20 5:28 PM, Mark Dilger wrote:
> >
> >
> >> +# There doesn't seem to be any easy way to get TestLib to use the binaries from
> >> +# our directory, so we hack up a path to our binary and run that
> >> directly. This
> >> +# seems brittle enough that some other solution should be found, if possible.
> >> +
> >> +my $test_json = join('/', $ENV{TESTDIR}, 'test_json');
> >>
> >> I don't know what the right thing to do here is. Perhaps someone more
> >> familiar with TAP testing can comment.
> > Yeah, I was hoping that might get a comment from Andrew. I think if it works as-is on windows, we could just use it this way until it causes a problem on some platform or other. It’s not a runtime issue, being only a build-time test, and only then when tap tests are enabled *and* running check-world, so nobody should really be adversely affected. I’ll likely get around to testing this on Windows, but I don’t have any Windows environments set up yet, as that is still on my todo list.
> >
>
>
> I think using TESTDIR is Ok,
I've changed my mind, I don't think that will work for MSVC, the
executable gets built elsewhere for that. I'll try to come up with
something portable.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
---|---|
To: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-29 21:06:35 |
Message-ID: | A73F350B-8D21-49E5-95E3-5121F4A26FC6@enterprisedb.com |
Lists: | pgsql-hackers |
> On Jan 29, 2020, at 1:02 PM, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
>
> On Wed, Jan 29, 2020 at 4:32 PM Andrew Dunstan
> <andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
>>
>>
>> On 1/28/20 5:28 PM, Mark Dilger wrote:
>>>
>>>
>>>> +# There doesn't seem to be any easy way to get TestLib to use the binaries from
>>>> +# our directory, so we hack up a path to our binary and run that
>>>> directly. This
>>>> +# seems brittle enough that some other solution should be found, if possible.
>>>> +
>>>> +my $test_json = join('/', $ENV{TESTDIR}, 'test_json');
>>>>
>>>> I don't know what the right thing to do here is. Perhaps someone more
>>>> familiar with TAP testing can comment.
>>> Yeah, I was hoping that might get a comment from Andrew. I think if it works as-is on windows, we could just use it this way until it causes a problem on some platform or other. It’s not a runtime issue, being only a build-time test, and only then when tap tests are enabled *and* running check-world, so nobody should really be adversely affected. I’ll likely get around to testing this on Windows, but I don’t have any Windows environments set up yet, as that is still on my todo list.
>>>
>>
>>
>> I think using TESTDIR is Ok,
>
>
> I've changed my mind, I don't think that will work for MSVC, the
> executable gets built elsewhere for that. I'll try to come up with
> something portable.
I’m just now working on getting my Windows VMs set up with Visual Studio and whatnot, per the wiki instructions, so I don’t need to burden you with this sort of Windows task in the future. If there are any gotchas not mentioned on the wiki, I’d appreciate pointers about how to avoid them. I’ll try to help devise a solution, or test what you come up with, once I’m properly set up for that.
For no particular reason, I chose Windows Server 2019 and Windows 10 Pro.
--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> |
---|---|
To: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: making the backend's json parser work in frontend code |
Date: | 2020-01-30 06:10:44 |
Message-ID: | CAA8=A78JVLTXT=qXhBAScyO9yyK3Jcv_OUE69tERLgwxdt1G9w@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Jan 30, 2020 at 7:36 AM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>
>
>
> > On Jan 29, 2020, at 1:02 PM, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
> >
> > On Wed, Jan 29, 2020 at 4:32 PM Andrew Dunstan
> > <andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
> >>
> >>
> >> On 1/28/20 5:28 PM, Mark Dilger wrote:
> >>>
> >>>
> >>>> +# There doesn't seem to be any easy way to get TestLib to use the binaries from
> >>>> +# our directory, so we hack up a path to our binary and run that directly. This
> >>>> +# seems brittle enough that some other solution should be found, if possible.
> >>>> +
> >>>> +my $test_json = join('/', $ENV{TESTDIR}, 'test_json');
> >>>>
> >>>> I don't know what the right thing to do here is. Perhaps someone more
> >>>> familiar with TAP testing can comment.
> >>> Yeah, I was hoping that might get a comment from Andrew. I think if it works as-is on Windows, we could just use it this way until it causes a problem on some platform or other. It’s not a runtime issue, being only a build-time test, and only then when tap tests are enabled *and* running check-world, so nobody should really be adversely affected. I’ll likely get around to testing this on Windows, but I don’t have any Windows environments set up yet, as that is still on my to-do list.
> >>>
> >>
> >>
> >> I think using TESTDIR is Ok,
> >
> >
> > I've changed my mind, I don't think that will work for MSVC, the
> > executable gets built elsewhere for that. I'll try to come up with
> > something portable.
>
> I’m just now working on getting my Windows VMs set up with Visual Studio and whatnot, per the wiki instructions, so I don’t need to burden you with this sort of Windows task in the future. If there are any gotchas not mentioned on the wiki, I’d appreciate pointers about how to avoid them. I’ll try to help devise a solution, or test what you come up with, once I’m properly set up for that.
>
> For no particular reason, I chose Windows Server 2019 and Windows 10 Pro.
>
One VM should be sufficient. Either W10Pro or WS2019 would be fine. I
have buildfarm animals running on both.
Here's what I got working after a lot of trial and error. (This will
require a tiny change in the buildfarm script to make the animals test
it). Note that there is one test that I couldn't get working, so I
skipped it. If you can find out why it fails, so much the better ... it
seems to be related to how the command processor handles quotes.
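The portability problem being worked around here can be sketched as follows. This is purely an illustration in Python, not the patch's actual Perl/TestLib mechanism: the helper name and the idea of probing several candidate build directories are hypothetical, but they capture why a single hard-coded `join('/', $ENV{TESTDIR}, 'test_json')` breaks when MSVC puts the executable elsewhere and adds an `.exe` suffix.

```python
import os

def find_test_binary(name, search_dirs, is_windows=False):
    """Return the first existing candidate path for a test binary.

    Hypothetical helper for illustration only: a make build and an MSVC
    build place the executable in different directories, and Windows
    binaries carry an .exe suffix, so a portable lookup has to try a
    list of candidates rather than one hard-coded path.
    """
    exe_name = name + '.exe' if is_windows else name
    for directory in search_dirs:
        candidate = os.path.join(directory, exe_name)
        if os.path.exists(candidate):
            return candidate
    # No candidate found; the caller can then skip or fail the test.
    return None
```

The single-path hack in the quoted patch corresponds to passing a one-element `search_dirs`; the portable fix amounts to knowing, per build system, which directories to probe.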
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
v8-0002-Adding-frontend-tests-for-json-parser.patch | text/x-patch | 10.7 KB |