Re: patch: function xmltable

Lists: pgsql-hackers
From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: patch: function xmltable
Date: 2016-08-19 08:58:52
Message-ID: CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com

Hi

I am sending an implementation of the xmltable function. The code should be
close to final quality, and it is available for testing.

I invite any help with documentation and testing.

Regards

Pavel

Attachment Content-Type Size
xmltable-20160819.patch text/x-patch 84.6 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-08-23 19:00:12
Message-ID: CAFj8pRA3Of-taZLaDq6S8V6Uwm+o2RbScPCD=b+ach9PqL6CDw@mail.gmail.com

Hi

2016-08-19 10:58 GMT+02:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

> Hi
>
> I am sending an implementation of the xmltable function. The code should be
> close to final quality, and it is available for testing.
>
> I invite any help with documentation and testing.
>

New update: the handling of nodes is now much more correct.

Regards

Pavel


Attachment Content-Type Size
xmltable-20160823.patch text/x-patch 91.9 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-08-24 04:56:16
Message-ID: CAFj8pRDCd-JO_DNCscMpoSiSa3SsmTsgxpEkhWV7zfO10Xc0mA@mail.gmail.com

2016-08-23 21:00 GMT+02:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

> Hi
>
> 2016-08-19 10:58 GMT+02:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>
>> Hi
>>
>> I am sending an implementation of the xmltable function. The code should be
>> close to final quality, and it is available for testing.
>>
>> I invite any help with documentation and testing.
>>
>
> New update: the handling of nodes is now much more correct.
>

Next update: fixes a memory leak.

Pavel


Attachment Content-Type Size
xmltable-20160824.patch text/x-patch 91.9 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-04 08:06:55
Message-ID: CAFj8pRBgOHYKPfFcVBqtRaA=6Cu21NXFVFcwEL3KJW3VX-6xuA@mail.gmail.com

Hi

Minor update: uses DefElem instead of a private parser type.

Regards

Pavel

Attachment Content-Type Size
xmltable-20160904.patch text/x-patch 90.9 KB

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-06 04:54:31
Message-ID: CAMsr+YEXQJOHJQsZGLo-Y=G6m0adF3E=eeuHJK2ei=eUg+nw_Q@mail.gmail.com

On 4 September 2016 at 16:06, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> Hi
>
> Minor update: uses DefElem instead of a private parser type.

I'm really glad that you're doing this and I'll take a look at it for this CF.

It's quite a big patch so I expect this will take a few rounds of
review and updating.

Patch applies cleanly and builds cleanly on master both with and
without --with-xml .

Overall, I think this needs to be revised with appropriate comments.
Whitespace/formatting needs fixing since it's all over the place.
Documentation is insufficient (per notes below).

Re identifier naming, some of this code uses XmlTable naming patterns,
some uses TableExpr prefixes. Is that intended to indicate a boundary
between things re-usable for other structured data ingesting
functions? Do you expect a "JSONEXPR" or similar in future? That's
alluded to by

+/*----------
+ * TableExpr - used for XMLTABLE function
+ *
+ * This can be used for json_table, jsonb_table functions in future
+ *----------
+ */
+typedef struct TableExpr
+{
...

If so, should this really be two patches, one to add the table
expression infrastructure and another to add XMLTABLE that uses it?
Also, why in that case does so much of the TableExpr code call
directly into XmlTable code? It doesn't look very generic.

Overall I find identifier naming to be a bit inconsistent and think
it's necessary to make it clear that all the "TableExpr" stuff is for
XMLTABLE specifically, if that's the case, or make the delineation
clearer if not.

I'd also like to see tests that exercise the ruleutils get_rule_expr
parts of the code for the various XMLTABLE variants.

Similarly, since this seems to add a new xpath parser, that needs
comprehensive tests. Maybe re-purpose an existing xpath test data set?

More detailed comments:
====

Docs comments:

The <function>xmltable</function> produces [a] table based on
[the] passed XML value.

The docs are pretty minimal and don't explain the various clauses of
XMLTABLE. What is "BY REF" ? Is PATH an xpath expression? If so, is
there a good cross reference link available? The PASSING clause? etc.

How does XMLTABLE decide what to iterate over, and how to iterate over it?

Presumably the FOR ORDINALITY clause makes a column emit a numeric counter.

What standard, if any, does this conform to? Does it resemble
implementations elsewhere? What limitations or unsupported features
does it have relative to those standards?

execEvalTableExpr seems to be defined twice, with a difference in
case. This is probably not going to fly:

+static Datum
+execEvalTableExpr(TableExprState *tstate,
+ ExprContext *econtext,
+ bool *isNull, ExprDoneCond *isDone)
+{

+static Datum
+ExecEvalTableExpr(TableExprState *tstate,
+ ExprContext *econtext,
+ bool *isNull, ExprDoneCond *isDone)
+{

It looks like you've split the function into a "guts" and "wrapper"
part, with the error handling PG_TRY / PG_CATCH block in the wrapper.
That seems reasonable for readability, but the naming isn't.

A comment is needed to explain what ExecEvalTableExpr is / does. If
it's XMLTABLE specific (which it looks like based on the code), its
name should reflect that. This pattern is repeated elsewhere; e.g.
TableExprState is really the state for an XMLTABLE expression. But
PostgreSQL actually has TABLE statements, and in future we might want
to support table-expressions, so I don't think this naming is
appropriate. This is made worse by the lack of comments on things like
the definition of TableExprState. Please use something that makes it
clear it's for XMLTABLE and add appropriate comments.

Formatting of variables, arguments, function signatures etc is
random/haphazard and doesn't follow project convention. It's neither
aligned nor unaligned in the normal way; I don't understand the random
spacing at all. Maybe you should try to run pgindent and then extract
just the changes related to your patch? Or run your IDE/editor's
indent function on your changes? Right now it's actually kind of hard
to read. Do you edit with tabstop set to 1 normally or something like
that?

There's a general lack of comments throughout the added code.

In execEvalTableExpr, why are we looping over namespaces? What's that
for? Comment would be nice.

Typo: Path caclulation => Path calculation

What does XmlTableSetRowPath() do? It seems to copy its argument.
Nothing further is done with the row_path argument after it's called
by execEvalTableExpr, so what context is that memory in and do we have
to worry about it if it's large?

execEvalTableExpr says it's doing "path calculation". What it actually
appears to do is evaluate the path expressions, if provided, and
otherwise use the column name as the implied path expression. (The
docs should mention that).

It wasn't immediately obvious to me what the branch around
tstate->for_ordinality_col is for and what the alternate path's
purpose is in terms of XMLTABLE's behaviour, until I read the parser
definition. That's largely because the behaviour of XMLTABLE is
underspecified in the docs, since once you know ORDINALITY columns
exist it's pretty obvious what it's doing.

Similarly, for the alternate branch tstate->ncols , the
XmlTableGetRowValue call there is meant to do what exactly, and
why/under what conditions? Is it for situations where the field type
is a whole-row value? a composite type? (I'm deliberately not studying
this too deeply, these are points I'd like to see commented so it can
be understood to some reasonable degree at a skim-read).

/* result is one more columns every time */
"one or more"

/* when typmod is not valid, refresh it */
if (te->typmod == -1)

Is this a cache? How is it valid or not valid and when? The comment
(thanks!) on TableExprGetTupleDesc says:

/*
* When we skip transform stage (in view), then TableExpr's
* TupleDesc should not be valid. Refresh is necessary.
*/

but I'm not really grasping what you're trying to explain here. What
transform stage? What view? This could well be my ignorance of this
part of the code; if it should be understandable by a reader who is
appropriately familiar with the executor that's fine, but if it's
specific to how XMLTABLE works some more explanation would be good.

Good that you've got all the required node copy/in/out funcs in place.

Please don't use the name "used_dns". Anyone reading that will read it
as "domain name service" and that's actually confusing with XML
because of XML schema lookups. Maybe used_defnamespace? used_def_ns?

I haven't looked closely at keyword/parser changes yet, but it doesn't
look like you added any reserved keywords, which is good. It does add
unreserved keywords PATH and COLUMNS ; I'm not sure what policy for
unreserved keywords is or the significance of that.

New ereport() calls specify ERRCODEs, which is good.

PostgreSQL already has XPATH support in the form of xmlexists(...)
etc. Why is getXPathToken() etc needed? What re-use is possible here?
There's no explanation in the patch header or comments. Should the new
xpath parser be re-used by the existing xpath stuff? Why can't we use
libxml's facilities? etc. This at least needs explaining in the
submission, and some kind of hint as to why we have two different ways
to do it is needed in the code. If we do need a new XML parser, should
it be bundled in adt/xml.c along with a lot of user-facing
functionality, or a separate file?

How do XmlTableGetValue(...) and XmlTableGetRowValue(...) relate to
this? It doesn't look like they're intended to be called directly by
the user, and they're not documented (or commented).

I don't understand this at all:

+/*
+ * There are different requests from XMLTABLE, JSON_TABLE functions
+ * on passed data than has CREATE TABLE command. It is reason for
+ * introduction special structure instead using ColumnDef.
+ */
+typedef struct TableExprRawCol
+{
+ NodeTag type;
+ char *colname;
+ TypeName *typeName;
+ bool for_ordinality;
+ bool is_not_null;
+ Node *path_expr;
+ Node *default_expr;
+ int location;
+} TableExprRawCol;

That's my first-pass commentary. I'll return to this once you've had a
chance to take a look at these and tell me all the places I got it
wrong ;)

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-06 20:13:15
Message-ID: CAFj8pRB5NAOgzQPPZD0uXX3LBG6uE9N+PKyf0uf9NJ5VHiBs-w@mail.gmail.com

Hi

2016-09-06 6:54 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> On 4 September 2016 at 16:06, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
> > Hi
> >
> > Minor update: uses DefElem instead of a private parser type.
>
> I'm really glad that you're doing this and I'll take a look at it for this
> CF.
>
> It's quite a big patch so I expect this will take a few rounds of
> review and updating.
>

Thank you for the review.

>
>
> Patch applies cleanly and builds cleanly on master both with and
> without --with-xml .
>
> Overall, I think this needs to be revised with appropriate comments.
> Whitespace/formatting needs fixing since it's all over the place.
> Documentation is insufficient (per notes below).
>

I am not able to write documentation in English :( This function is pretty
complex, so I hope somebody with better language skills can help with this.
It follows the standard, and it also accepts Oracle's slightly different
behaviour (a different order of the DEFAULT and PATH parts).

>
> Re identifier naming, some of this code uses XmlTable naming patterns,
> some uses TableExpr prefixes. Is that intended to indicate a boundary
> between things re-usable for other structured data ingesting
> functions? Do you expect a "JSONEXPR" or similar in future? That's
> alluded to by
>

This structure should be reused by the JSON_TABLE function. It looks a little
strange now, because there is only the XMLTABLE implementation, and I have to
choose between a) using two different names now, or b) renaming some parts in
the future.

Although the XMLTABLE and JSON_TABLE functions are pretty similar and share
90% of their data (input value, path, column definitions), they have
different syntax, so only the middle-level code should be shared.

>
> +/*----------
> + * TableExpr - used for XMLTABLE function
> + *
> + * This can be used for json_table, jsonb_table functions in future
> + *----------
> + */
> +typedef struct TableExpr
> +{
> ...
>
> If so, should this really be two patches, one to add the table
> expression infrastructure and another to add XMLTABLE that uses it?
> Also, why in that case does so much of the TableExpr code call
> directly into XmlTable code? It doesn't look very generic.
>

Currently the common part is not too big, just the Node-related part, so I
am not sure two patches are necessary. I agree that some kind of
TableExpBuilder is missing, where the XML part could be better isolated.

>
> Overall I find identifier naming to be a bit inconsistent and think
> it's necessary to make it clear that all the "TableExpr" stuff is for
> XMLTABLE specifically, if that's the case, or make the delineation
> clearer if not.
>
> I'd also like to see tests that exercise the ruleutils get_rule_expr
> parts of the code for the various XMLTABLE variants.
>
> Similarly, since this seems to add a new xpath parser, that needs
> comprehensive tests. Maybe re-purpose an existing xpath test data set?
>
>
sure

>
>
>
> More detailed comments:
> ====
>
> Docs comments:
>
> The <function>xmltable</function> produces [a] table based on
> [the] passed XML value.
>
> The docs are pretty minimal and don't explain the various clauses of
> XMLTABLE. What is "BY REF" ? Is PATH an xpath expression? If so, is
> there a good cross reference link available? The PASSING clause? etc.
>
> How does XMLTABLE decide what to iterate over, and how to iterate over it?
>
> Presumably the FOR ORDINALITY clause makes a column emit a numeric counter.
>
> What standard, if any, does this conform to? Does it resemble
> implementations elsewhere? What limitations or unsupported features
> does it have relative to those standards?
>
>
>
> execEvalTableExpr seems to be defined twice, with a difference in
> case. This is probably not going to fly:
>
>
> +static Datum
> +execEvalTableExpr(TableExprState *tstate,
> + ExprContext *econtext,
> + bool *isNull, ExprDoneCond *isDone)
> +{
>
> +static Datum
> +ExecEvalTableExpr(TableExprState *tstate,
> + ExprContext *econtext,
> + bool *isNull, ExprDoneCond *isDone)
> +{
>
>
> It looks like you've split the function into a "guts" and "wrapper"
> part, with the error handling PG_TRY / PG_CATCH block in the wrapper.
> That seems reasonable for readability, but the naming isn't.
>

I welcome any ideas on how these functions should be named.

>
> A comment is needed to explain what ExecEvalTableExpr is / does. If
> it's XMLTABLE specific (which it looks like based on the code), its
> name should reflect that. This pattern is repeated elsewhere; e.g.
> TableExprState is really the state for an XMLTABLE expression. But
> PostgreSQL actually has TABLE statements, and in future we might want
> to support table-expressions, so I don't think this naming is
> appropriate. This is made worse by the lack of comments on things like
> the definition of TableExprState. Please use something that makes it
> clear it's for XMLTABLE and add appropriate comments.
>

I understand that the name TableExpr can look strange (for the XMLTABLE
function). But once we have a JSON_TABLE function, it will make sense.

"TableExprState" is consistent with "TableExpr".

Any idea how it should be changed?

>
> Formatting of variables, arguments, function signatures etc is
> random/haphazard and doesn't follow project convention. It's neither
> aligned nor unaligned in the normal way; I don't understand the random
> spacing at all. Maybe you should try to run pgindent and then extract
> just the changes related to your patch? Or run your IDE/editor's
> indent function on your changes? Right now it's actually kind of hard
> to read. Do you edit with tabstop set to 1 normally or something like
> that?
>
> There's a general lack of comments throughout the added code.
>
> In execEvalTableExpr, why are we looping over namespaces? What's that
> for? Comment would be nice.
>
> Typo: Path caclulation => Path calculation
>
> What does XmlTableSetRowPath() do? It seems to copy its argument.
> Nothing further is done with the row_path argument after it's called
> by execEvalTableExpr, so what context is that memory in and do we have
> to worry about it if it's large?
>
> execEvalTableExpr says it's doing "path calculation". What it actually
> appears to do is evaluate the path expressions, if provided, and
> otherwise use the column name as the implied path expression. (The
> docs should mention that).
>
> It wasn't immediately obvious to me what the branch around
> tstate->for_ordinality_col is for and what the alternate path's
> purpose is in terms of XMLTABLE's behaviour, until I read the parser
> definition. That's largely because the behaviour of XMLTABLE is
> underspecified in the docs, since once you know ORDINALITY columns
> exist it's pretty obvious what it's doing.
>
> Similarly, for the alternate branch tstate->ncols , the
> XmlTableGetRowValue call there is meant to do what exactly, and
> why/under what conditions? Is it for situations where the field type
> is a whole-row value? a composite type? (I'm deliberately not studying
> this too deeply, these are points I'd like to see commented so it can
> be understood to some reasonable degree at a skim-read).
>
>
> /* result is one more columns every time */
> "one or more"
>
>
>
> /* when typmod is not valid, refresh it */
> if (te->typmod == -1)
>
>
> Is this a cache? How is it valid or not valid and when? The comment
> (thanks!) on TableExprGetTupleDesc says:
>
> /*
> * When we skip transform stage (in view), then TableExpr's
> * TupleDesc should not be valid. Refresh is necessary.
> */
>
> but I'm not really grasping what you're trying to explain here. What
> transform stage? What view? This could well be my ignorance of this
> part of the code; if it should be understandable by a reader who is
> appropriately familiar with the executor that's fine, but if it's
> specific to how XMLTABLE works some more explanation would be good.
>

This is the most difficult part of this patch, and I am not sure it is fully
correctly implemented. I use a TupleDesc cache. The TupleDesc is created in
the parser/transform stage. When XMLTABLE is used in a view, the
transformed parse tree is materialized, and when the view is used in a
query, this tree is loaded and the parser/transform stage is
"skipped". I'll check this code against the implementation of the ROW
constructor and I'll try to add more comments there.
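
A toy sketch of the caching idea described above, in plain C. It assumes
(this is not stated in the patch) that the row descriptor is registered in a
per-session table and identified by an index kept in typmod, so a tree loaded
from a stored view arrives with typmod == -1 and the descriptor has to be
rebuilt on first use. RowDesc, TableExprNode, register_rowtype and
table_expr_get_desc are hypothetical names, not the patch's code.

```c
#include <stddef.h>

#define MAX_ROWTYPES 16

/* Toy stand-ins; none of these names come from the patch or PostgreSQL. */
typedef struct RowDesc
{
    int ncols;                  /* number of result columns */
} RowDesc;

typedef struct TableExprNode
{
    int ncols;                  /* part of the stored expression tree */
    int typmod;                 /* -1: no descriptor registered in this session */
} TableExprNode;

/* Session-local registry of row descriptors (assumed, for illustration). */
static RowDesc session_rowtypes[MAX_ROWTYPES];
static int     session_nrowtypes = 0;

/* Register a descriptor; return its session-local id, or -1 if full. */
static int
register_rowtype(int ncols)
{
    if (session_nrowtypes >= MAX_ROWTYPES)
        return -1;
    session_rowtypes[session_nrowtypes].ncols = ncols;
    return session_nrowtypes++;
}

/*
 * Return the descriptor for a node, rebuilding it lazily when the node was
 * loaded without going through the transform stage (typmod == -1).
 */
const RowDesc *
table_expr_get_desc(TableExprNode *node)
{
    if (node->typmod == -1)
        node->typmod = register_rowtype(node->ncols);
    if (node->typmod < 0)
        return NULL;            /* registry full */
    return &session_rowtypes[node->typmod];
}
```

Calling table_expr_get_desc() twice on the same node registers the descriptor
only once; the second call returns the cached entry.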

>
> Good that you've got all the required node copy/in/out funcs in place.
>
> Please don't use the name "used_dns". Anyone reading that will read it
> as "domain name service" and that's actually confusing with XML
> because of XML schema lookups. Maybe used_defnamespace? used_def_ns?
>

good idea

>
> I haven't looked closely at keyword/parser changes yet, but it doesn't
> look like you added any reserved keywords, which is good. It does add
> unreserved keywords PATH and COLUMNS ; I'm not sure what policy for
> unreserved keywords is or the significance of that.
>
> New ereport() calls specify ERRCODEs, which is good.
>
> PostgreSQL already has XPATH support in the form of xmlexists(...)
> etc. Why is getXPathToken() etc needed? What re-use is possible here?
> There's no explanation in the patch header or comments. Should the new
> xpath parser be re-used by the existing xpath stuff? Why can't we use
> libxml's facilities? etc. This at least needs explaining in the
> submission, and some kind of hint as to why we have two different ways
> to do it is needed in the code. If we do need a new XML parser, should
> it be bundled in adt/xml.c along with a lot of user-facing
> functionality, or a separate file?
>
>
libxml2 and our XPATH function don't support a default namespace (
http://plasmasturm.org/log/259/ ). This is a pretty useful feature, so I
implemented it. This is the major issue with the libxml2 library. Another
difference between the XPATH function and the XMLTABLE function is that
XMLTABLE uses two-phase searching and an implicit prefix ("./") and suffix
("/text()"). XMLTABLE uses two XPath expressions: one to cut out the row data,
and then one to cut the column data out of the row data. Our XPATH function
maps quite simply onto the libxml2 XPath API. That is not possible for the
XMLTABLE function, due to the design of this function in the standard (it is
more user friendly and doesn't require exactly correct XPath expressions).

I didn't find any API in libxml2 for working with parsed XPath expressions.
I need some information about the first and last tokens of the XPath
expression, which is the basis for deciding whether to add the prefix or
suffix.

This functionality (the XPath expression parser) cannot be used for our XPATH
function now; maybe it could be for the default namespace in the future.
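
The prefix/suffix decision can be illustrated with a small standalone C
sketch. The exact rules the patch applies are not spelled out in this thread,
so the ones below are assumptions: "./" is prepended unless the path already
starts with "/" or ".", and "/text()" is appended only when the last step is a
plain element name (not an attribute, function call, or predicate). A missing
path falls back to the column name, as the review observed. The function names
are hypothetical, and the sketch inspects the raw string rather than real
XPath tokens.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the last step of the path looks like a plain element name. */
static int
ends_with_element_name(const char *path)
{
    size_t len = strlen(path);
    size_t i;
    unsigned char last;

    if (len == 0)
        return 0;
    last = (unsigned char) path[len - 1];
    /* text(), predicates and "." end in ')', ']' or '.', not a name char */
    if (!(isalnum(last) || last == '_' || last == '-'))
        return 0;
    /* find the start of the last step */
    i = len;
    while (i > 0 && path[i - 1] != '/')
        i--;
    /* an attribute step (@name) already selects a value */
    return path[i] != '@';
}

/*
 * Build the effective column path into out.  When path_expr is NULL the
 * column name is used as the implied path.  Returns 0 on success, -1 if
 * the path is empty or the result does not fit.
 */
int
build_column_path(const char *colname, const char *path_expr,
                  char *out, size_t outsz)
{
    const char *path = (path_expr != NULL) ? path_expr : colname;
    const char *prefix;
    const char *suffix;
    int         n;

    if (path[0] == '\0')
        return -1;
    prefix = (path[0] == '/' || path[0] == '.') ? "" : "./";
    suffix = ends_with_element_name(path) ? "/text()" : "";
    n = snprintf(out, outsz, "%s%s%s", prefix, path, suffix);
    return (n >= 0 && (size_t) n < outsz) ? 0 : -1;
}
```

Under these assumed rules a column named "name" with no PATH becomes
"./name/text()", while an explicit "@id" becomes "./@id" and an explicit
"./a/text()" is left unchanged.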

>
> How do XmlTableGetValue(...) and XmlTableGetRowValue(...) relate to
> this? It doesn't look like they're intended to be called directly by
> the user, and they're not documented (or commented).
>

I probably used the wrong names. The XMLTABLE function runs in two different
modes: with explicitly defined columns (XmlTableGetValue is used), and
without explicitly defined columns, where the result is one XML column and
only one-step searching is used (there are no column-related XPath
expressions) (XmlTableGetRowValue is used). The function XmlTableGetValue
is used to get one column value; the function XmlTableGetRowValue also
gets one value, but for the special case when there are no other values.

>
> I don't understand this at all:
>
>
>
> +/*
> + * There are different requests from XMLTABLE, JSON_TABLE functions
> + * on passed data than has CREATE TABLE command. It is reason for
> + * introduction special structure instead using ColumnDef.
> + */
> +typedef struct TableExprRawCol
> +{
> + NodeTag type;
> + char *colname;
> + TypeName *typeName;
> + bool for_ordinality;
> + bool is_not_null;
> + Node *path_expr;
> + Node *default_expr;
> + int location;
> +} TableExprRawCol;
>
>
>
I am sorry. It is my fault. We already have a very similar node, ColumnDef.
That node is designed for use in utility commands, not for use inside a
query. I had to decide between enhancing the ColumnDef node and introducing a
new special node. Because there are more special attributes, and it is hard
to serialize the current ColumnDef, I decided to use a new node.

>
> That's my first-pass commentary. I'll return to this once you've had a
> chance to take a look at these and tell me all the places I got it
> wrong ;)
>
>
Thanks for this

Regards

Pavel

>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-06 20:21:49
Message-ID: CAFj8pRAD5kk364D7FHFu27v=YymoD9PeE5aB1EaiTzf_h_K8Zg@mail.gmail.com

> libxml2 and our XPATH function don't support a default namespace (
> http://plasmasturm.org/log/259/ ). This is a pretty useful feature, so I
> implemented it. This is the major issue with the libxml2 library. Another
> difference between the XPATH function and the XMLTABLE function is that
> XMLTABLE uses two-phase searching and an implicit prefix ("./") and suffix
> ("/text()"). XMLTABLE uses two XPath expressions: one to cut out the row data,
> and then one to cut the column data out of the row data. Our XPATH function
> maps quite simply onto the libxml2 XPath API. That is not possible for the
> XMLTABLE function, due to the design of this function in the standard (it is
> more user friendly and doesn't require exactly correct XPath expressions).
>

libxml2 doesn't support XPath 2.0, where the default namespace was introduced.


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-07 03:03:21
Message-ID: CAMsr+YFVDkngz-R5b9fNPs+Zu6AXbMZbwis9s0=GfrDVUds5GQ@mail.gmail.com

On 7 September 2016 at 04:13, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

>> Overall, I think this needs to be revised with appropriate comments.
>> Whitespace/formatting needs fixing since it's all over the place.
>> Documentation is insufficient (per notes below).
>
>
> I am not able to write documentation in English :( This function is pretty
> complex, so I hope somebody with better language skills can help with this.
> It follows the standard, and it also accepts Oracle's slightly different
> behaviour (a different order of the DEFAULT and PATH parts).

OK, no problem. It can't be committed without more comprehensive docs
though, especially for new and nontrivial functionality.

Is there some reference material you can point to so someone else can
help with docs? And can you describe what differences there are
between your implementation and the reference?

Alternately, if you document it in Czech, do you know of anyone who
could assist in translating to English for the main documentation?

>> Re identifier naming, some of this code uses XmlTable naming patterns,
>> some uses TableExpr prefixes. Is that intended to indicate a boundary
>> between things re-usable for other structured data ingesting
>> functions? Do you expect a "JSONEXPR" or similar in future? That's
>> alluded to by
>
>
> This structure should be reused by the JSON_TABLE function. It looks a little
> strange now, because there is only the XMLTABLE implementation, and I have to
> choose between a) using two different names now, or b) renaming some parts in
> the future.

OK. Are you planning on writing this JSON_TABLE or are you leaving
room for future growth? Either way is fine, just curious.

> Although the XMLTABLE and JSON_TABLE functions are pretty similar and share
> 90% of their data (input value, path, column definitions), they have
> different syntax, so only the middle-level code should be shared.

That makes sense.

I think it would be best if you separated out the TableExpr
infrastructure from the XMLTABLE implementation though, so we can
review the first level infrastructure separately and make this a
2-patch series. Most importantly, doing it that way will help you find
places where TableExpr code calls directly into XMLTABLE code. If
TableExpr is supposed to be reusable for json etc, it probably
shouldn't be calling XmlTable stuff directly.

That also means somewhat smaller simpler patches, which probably isn't bad.

I don't necessarily think this needs to be fully pluggable with
callbacks etc. It doesn't sound like you expect this to be used by
extensions or to have a lot of users, right? So it probably just needs
clearer separation of the infrastructure layer from the xmltable
layer. I think splitting the patch will make that easier to see and
make it easier to find problems.

My biggest complaint at the moment is that execEvalTableExpr calls
initXmlTableContext(...) directly, is aware of XML namespaces
directly, calls XmlTableSetRowPath() directly, calls
XmlTableFetchRow() directly, etc. It is in no way generic/reusable for
some later JSONTABLE feature. That needs to be fixed by:

* Renaming it so it's clearly only for XMLTABLE; or
* Abstracting the init context, set row path, fetch row etc operations
so json ones can be plugged in later
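
The second option can be sketched as a table of function pointers that a
generic driver calls without knowing which document format sits behind it.
This is only an illustration under stated assumptions: TableBuilderOps,
table_expr_count_rows and the toy builder are hypothetical names, and the toy
builder just counts occurrences of a substring instead of evaluating a real
row path.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical operations a format-specific builder would provide. */
typedef struct TableBuilderOps
{
    void *(*init_context) (const char *document);
    void  (*set_row_path) (void *ctx, const char *row_path);
    int   (*fetch_row) (void *ctx);     /* 1 if a row was found, 0 at end */
    void  (*destroy_context) (void *ctx);
} TableBuilderOps;

/* Generic driver: knows nothing about XML or JSON, only the ops table. */
int
table_expr_count_rows(const TableBuilderOps *ops,
                      const char *document, const char *row_path)
{
    void *ctx = ops->init_context(document);
    int   nrows = 0;

    if (ctx == NULL)
        return -1;
    ops->set_row_path(ctx, row_path);
    while (ops->fetch_row(ctx))
        nrows++;
    ops->destroy_context(ctx);
    return nrows;
}

/* Toy builder: a "row" is each occurrence of row_path in the document. */
typedef struct ToyContext
{
    const char *cursor;
    const char *needle;
} ToyContext;

static void *
toy_init(const char *document)
{
    ToyContext *ctx = malloc(sizeof(ToyContext));

    if (ctx == NULL)
        return NULL;
    ctx->cursor = document;
    ctx->needle = "";
    return ctx;
}

static void
toy_set_row_path(void *p, const char *row_path)
{
    ((ToyContext *) p)->needle = row_path;
}

static int
toy_fetch_row(void *p)
{
    ToyContext *ctx = p;
    const char *hit;

    if (ctx->needle[0] == '\0')
        return 0;
    hit = strstr(ctx->cursor, ctx->needle);
    if (hit == NULL)
        return 0;
    ctx->cursor = hit + strlen(ctx->needle);    /* continue after this row */
    return 1;
}

static void
toy_destroy(void *p)
{
    free(p);
}

const TableBuilderOps toy_builder_ops = {
    toy_init, toy_set_row_path, toy_fetch_row, toy_destroy
};
```

With this shape, table_expr_count_rows(&toy_builder_ops, doc, path) never
names the builder's functions directly, so a JSON builder could be added later
by supplying another ops table.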

> Currently the common part is not too big, just the Node-related part, so I am
> not sure two patches are necessary.

The problem is that the common part is all mixed in with the
XMLTABLE-specific part, so it's not at all clear it can be common with
something else.

> I agree that some kind of
> TableExpBuilder is missing, where the XML part could be better isolated.

Yeah, that's sort of what I'm getting at.

>> execEvalTableExpr seems to be defined twice, with a difference in
>> case. This is probably not going to fly:
>>
>>
>> +static Datum
>> +execEvalTableExpr(TableExprState *tstate,
>> + ExprContext *econtext,
>> + bool *isNull, ExprDoneCond *isDone)
>> +{
>>
>> +static Datum
>> +ExecEvalTableExpr(TableExprState *tstate,
>> + ExprContext *econtext,
>> + bool *isNull, ExprDoneCond *isDone)
>> +{
>>
>>
>> It looks like you've split the function into a "guts" and "wrapper"
>> part, with the error handling PG_TRY / PG_CATCH block in the wrapper.
>> That seems reasonable for readability, but the naming isn't.
>
>
> I welcome any ideas on how these functions should be named.

Definitely not how they are ;) . They really can't differ in a single
character's case.

I'm not sure if PostgreSQL has any formal convention for this. Some
places use _impl e.g. pg_read_barrier_impl() but that's in the
context of an interface-vs-implementation separation, which isn't the
case here.

Some places use _internal, like AlterObjectRename_internal(...), but
that's where there's an associated public/external part, which isn't
the case here.

Some places use _guts e.g. pg_logical_slot_get_changes_guts(...),
largely where there's common use by several callers.

This is a fairly arbitrary function split for readability/length. Is
it actually useful to split this function up at all?

Anyone else have an opinion?

>> A comment is needed to explain what ExecEvalTableExpr is / does. If
>> it's XMLTABLE specific (which it looks like based on the code), its
>> name should reflect that. This pattern is repeated elsewhere; e.g.
>> TableExprState is really the state for an XMLTABLE expression. But
>> PostgreSQL actually has TABLE statements, and in future we might want
>> to support table-expressions, so I don't think this naming is
>> appropriate. This is made worse by the lack of comments on things like
>> the definition of TableExprState. Please use something that makes it
>> clear it's for XMLTABLE and add appropriate comments.
>
>
> I understand that the name TableExpr can look strange (for the XMLTABLE
> function). But once we have a JSON_TABLE function, it will make sense.

It's pretty hard to review that as shared infrastructure when it's
still tangled up in xmltable specifics, though.

> "TableExprState" is consistent with "TableExpr".
>
> Any idea how it should be changed?

I think if you want it to be shareable infrastructure, you need to
write it so it can be used as shared infrastructure. Not just name it
that way but then make it XMLTABLE specific in actual functionality.

>> /* when typmod is not valid, refresh it */
>> if (te->typmod == -1)
>>
>>
>> Is this a cache? How is it valid or not valid and when? The comment
>> (thanks!) on TableExprGetTupleDesc says:
>>
>> /*
>> * When we skip transform stage (in view), then TableExpr's
>> * TupleDesc should not be valid. Refresh is necessary.
>> */
>>
>> but I'm not really grasping what you're trying to explain here. What
>> transform stage? What view? This could well be my ignorance of this
>> part of the code; if it should be understandable by a reader who is
>> appropriately familiar with the executor that's fine, but if it's
>> specific to how XMLTABLE works some more explanation would be good.
>
>
> This is most difficult part of this patch, and I am not sure if it is fully
> correctly implemented. I use TupleDesc cache. The TupleDesc is created in
> parser/transform stage. When the XMLTABLE is used in some view, then the
> transformed parser tree is materialized - and when the view is used in
> query, then this tree is loaded and the parser/transform stage is "skipped".
> I'll check this code against implementation of ROW constructor and I'll try
> to do more comments there.

Thanks. It would be good to highlight when this does and does not
happen and why. Why is it necessary at all?

What happens if XMLTABLE is used in a query directly, not part of a
view? If XMLTABLE is used in a view? If XMLTABLE is used in a prepared
statement / plpgsql statement / etc? What about a CTE term?

Not necessarily list all these cases one by one, just explain what
happens, when and why. Especially if it's complex, so other readers
can understand it and don't have to study it in detail to understand
what is going on. It does not need to be good public-facing
documentation, and details of wording, grammar etc can be fixed up
later, it's the ideas that matter.

Is this similar to other logic elsewhere? If so, reference that other
logic so readers know where to look. That way if they're
changing/bugfixing/etc one place they know there's another place that
might need changing.

I don't know this area of the code well enough to give a solid review
of the actual functionality, and I don't yet understand what it's
trying to do so it's hard to review it by studying what it actually
does vs what it claims to do. Maybe Peter E can help, he said he was
thinking of looking at this patch too. But more information on what
it's trying to do would be a big help.

>> PostgreSQL already has XPATH support in the form of xmlexists(...)
>> etc. Why is getXPathToken() etc needed? What re-use is possible here?
>> There's no explanation in the patch header or comments. Should the new
>> xpath parser be re-used by the existing xpath stuff? Why can't we use
>> libxml's facilities? etc. This at least needs explaining in the
>> submission, and some kind of hint as to why we have two different ways
>> to do it is needed in the code. If we do need a new XML parser, should
>> it be bundled in adt/xml.c along with a lot of user-facing
>> functionality, or a separate file?
>
> libxml2 and our XPATH function doesn't support default namespace (
> http://plasmasturm.org/log/259/ ). This is pretty useful feature - so I
> implemented.

OK, that makes sense.

For the purpose of getting this patch in, is it a _necessary_ feature?
Can XMLTABLE be usefully implemented without it, and if so, can it be
added in a subsequent patch? It would be nice to simplify this by
using existing libxml2 functionality in the first version rather than
adding a whole new xpath as well!

> This is the major issue of libxml2 library. Another difference
> between XPATH function and XMLTABLE function is using two phase searching
> and implicit prefix "./" and suffix ("/text()") in XMLTABLE. XMLTABLE using
> two XPATH expressions - for row data cutting and next for column data
> cutting (from row data). The our XPATH functions is pretty simple mapped to
> libxml2 XPATH API. But it is not possible with XMLTABLE function - due
> design of this function in standard (it is more user friendly and doesn't
> require exactly correct xpath expressions).

So you can't use existing libxml2 xpath support to implement XMLTABLE,
even without default namespaces?

> I didn't find any API in libxml2 for a work with parsed xpath expressions -
> I need some info about the first and last token of xpath expression - it is
> base for decision about using prefix or suffix.
>
> This functionality (xpath expression parser) cannot be used for our XPATH
> function now - maybe default namespace in future.

So we'll have two different XPATH implementations for different
places, with different limitations, different possible bugs, etc?

What would be needed to make the new XPATH work for our built-in xpath
functions too?

>> How does XmlTableGetValue(...) and XmlTableGetRowValue(...) relate to
>> this? It doesn't look like they're intended to be called directly by
>> the user, and they're not documented (or commented).
>
>
> Probably I used wrong names. XMLTABLE function is running in two different
> modes - with explicitly defined columns (XmlTableGetValue is used), and
> without explicitly defined columns - so result is one XML column and only
> one one step searching is used (there are not column related xpath
> expressions) ( XmlTableGetRowValue is used). The function XmlTableGetValue
> is used for getting one column value, the function XmlTableGetRowValue is
> used for getting one value too, but in special case, when there are not any
> other value.

So both are internal implementation of the parser-level XMLTABLE(...)
construct and are not intended to be called directly by users - right?

Comments please! A short comment on the function saying this would be
a big help.

Regarding naming, do we already have a convention for functions that
are internal implementation of something the user "spells"
differently? Where it's transformed by the parser? I couldn't find
one. I don't much care about the names so long as there are comments
explaining what calls the functions and what the user-facing interface
that matches the function is.

Is it safe for users to call these directly? What happens if they do
so incorrectly?

Why are they not in pg_proc.h? Do they need to be?

>> +/*
>> + * There are different requests from XMLTABLE, JSON_TABLE functions
>> + * on passed data than has CREATE TABLE command. It is reason for
>> + * introduction special structure instead using ColumnDef.
>> + */
>> +typedef struct TableExprRawCol
>> +{
>> + NodeTag type;
>> + char *colname;
>> + TypeName *typeName;
>> + bool for_ordinality;
>> + bool is_not_null;
>> + Node *path_expr;
>> + Node *default_expr;
>> + int location;
>> +} TableExprRawCol;
>
> I am sorry. It is my fault. Now we have very similar node ColumnDef. This
> node is designed for usage in utility commands - and it is not designed for
> usage inside a query.

Makes sense.

> I had to decide between enhancing ColumnDef node or
> introduction new special node. Because there are more special attributes and
> it is hard to serialize current ColumnDef, I decided to use new node.

Seems reasonable. The summary is "this is the parse node for a column
of an XMLTABLE expression".

Suggested comment:

/*
 * This is the parsenode for a column definition in a table-expression
 * like XMLTABLE.
 *
 * We can't re-use ColumnDef here; the utility command column definition
 * has all the wrong attributes for use in table-expressions and just
 * doesn't make sense here.
 */
typedef struct TableExprColumn
{
...
};

?

Why "RawCol" ? What does it become when it's not "raw" anymore? Is
that a reference to ColumnDef's raw_default and cooked_default for
untransformed vs transformed parse-trees?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-07 05:49:35
Message-ID: CAFj8pRB1DEu1d59eyGsc-ajCLLF+SOoUbpVRvCF=OrokEperRw@mail.gmail.com
Lists: pgsql-hackers

2016-09-07 5:03 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> On 7 September 2016 at 04:13, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
> >> Overall, I think this needs to be revised with appropriate comments.
> >> Whitespace/formatting needs fixing since it's all over the place.
> >> Documentation is insufficient (per notes below).
> >
> >
> > I am not able to write documentation in English language :( - This
> function
> > is pretty complex - so I hope so anybody with better language skills can
> > help with this. It respects standard and it respects little bit different
> > Oracle's behave too (different order of DEFAULT and PATH parts).
>
> OK, no problem. It can't be committed without more comprehensive docs
> though, especially for new and nontrivial functionality.
>
> Is there some reference material you can point to so someone else can
> help with docs? And can you describe what differences there are
> between your implementation and the reference?
>
> Alternately, if you document it in Czech, do you know of anyone who
> could assist in translating to English for the main documentation?
>
> >> Re identifier naming, some of this code uses XmlTable naming patterns,
> >> some uses TableExpr prefixes. Is that intended to indicate a boundary
> >> between things re-usable for other structured data ingesting
> >> functions? Do you expect a "JSONEXPR" or similar in future? That's
> >> alluded to by
> >
> >
> > This structure should be reused by JSON_TABLE function. Now, it is little
> > bit strange, because there is only XMLTABLE implementation - and I have
> to
> > choose between a) using two different names now, b) renaming some part in
> > future.
>
> OK. Are you planning on writing this JSON_TABLE or are you leaving
> room for future growth? Either way is fine, just curious.
>
> > And although XMLTABLE and JSON_TABLE functions are pretty similar - share
> > 90% of data (input value, path, columns definitions), these functions has
> > different syntax - so only middle level code should be shared.
>
> That makes sense.
>
> I think it would be best if you separated out the TableExpr
> infrastructure from the XMLTABLE implementation though, so we can
> review the first level infrastructure separately and make this a
> 2-patch series. Most importantly, doing it that way will help you find
> places where TableExpr code calls directly into XMLTABLE code. If
> TableExpr is supposed to be reusable for json etc, it probably
> shouldn't be calling XmlTable stuff directly.
>
> That also means somewhat smaller simpler patches, which probably isn't bad.
>
> I don't necessarily think this needs to be fully pluggable with
> callbacks etc. It doesn't sound like you expect this to be used by
> extensions or to have a lot of users, right? So it probably just needs
> clearer separation of the infrastructure layer from the xmltable
> layer. I think splitting the patch will make that easier to see and
> make it easier to find problems.
>
> My biggest complaint at the moment is that execEvalTableExpr calls
> initXmlTableContext(...) directly, is aware of XML namespaces
> directly, calls XmlTableSetRowPath() directly, calls
> XmlTableFetchRow() directly, etc. It is in no way generic/reusable for
> some later JSONTABLE feature. That needs to be fixed by:
>
> * Renaming it so it's clearly only for XMLTABLE; or
> * Abstracting the init context, set row path, fetch row etc operations
> so json ones can be plugged in later
>
>
>
> > Currently the common part is not too big - just the Node related part -
> I am
> > not sure about necessity of two patches.
>
> The problem is that the common part is all mixed in with the
> XMLTABLE-specific part, so it's not at all clear it can be common with
> something else.
>
> > I am agree, there is missing some
> > TableExpBuilder, where can be better isolated the XML part.
>
> Yeah, that's sort of what I'm getting at.
>
>
> >> execEvalTableExpr seems to be defined twice, with a difference in
> >> case. This is probably not going to fly:
> >>
> >>
> >> +static Datum
> >> +execEvalTableExpr(TableExprState *tstate,
> >> + ExprContext *econtext,
> >> + bool *isNull, ExprDoneCond *isDone)
> >> +{
> >>
> >> +static Datum
> >> +ExecEvalTableExpr(TableExprState *tstate,
> >> + ExprContext *econtext,
> >> + bool *isNull, ExprDoneCond *isDone)
> >> +{
> >>
> >>
> >> It looks like you've split the function into a "guts" and "wrapper"
> >> part, with the error handling PG_TRY / PG_CATCH block in the wrapper.
> >> That seems reasonable for readability, but the naming isn't.
> >
> >
> > I invite any idea how these functions should be named.
>
> Definitely not how they are ;) . They really can't differ in a single
> character's case.
>
> I'm not sure if PostgreSQL has any formal convention for this. Some
> places use _impl e.g. pg_read_barrier_impl() but that's in the
> context of an interface-vs-implementation separation, which isn't the
> case here.
>
> Some places use _internal, like AlterObjectRename_internal(...), but
> that's where there's an associated public/external part, which isn't
> the case here.
>
> Some places use _guts e.g. pg_logical_slot_get_changes_guts(...),
> largely where there's common use by several callers.
>
> This is a fairly arbitrary function split for readability/length. Is
> it actually useful to split this function up at all?
>
> Anyone else have an opinion?
>
> >> A comment is needed to explain what ExecEvalTableExpr is / does. If
> >> it's XMLTABLE specific (which it looks like based on the code), its
> >> name should reflect that. This pattern is repeated elsewhere; e.g.
> >> TableExprState is really the state for an XMLTABLE expression. But
> >> PostgreSQL actually has TABLE statements, and in future we might want
> >> to support table-expressions, so I don't think this naming is
> >> appropriate. This is made worse by the lack of comments on things like
> >> the definition of TableExprState. Please use something that makes it
> >> clear it's for XMLTABLE and add appropriate comments.
> >
> >
> > I understand, so using TableExpr can be strange (for XMLTABLE function).
> But
> > when we will have JSON_TABLE function, then it will have a sense.
>
> It's pretty hard to review that as shared infrastructure when it's
> still tangled up in xmltable specifics, though.
>
> > "TableExprState" is consistent with "TableExpr".
> >
> > Any idea how it should be changed?
>
> I think if you want it to be shareable infrastructure, you need to
> write it so it can be used as shared infrastructure. Not just name it
> that way but then make it XMLTABLE specific in actual functionality.
>
>
> >> /* when typmod is not valid, refresh it */
> >> if (te->typmod == -1)
> >>
> >>
> >> Is this a cache? How is it valid or not valid and when? The comment
> >> (thanks!) on TableExprGetTupleDesc says:
> >>
> >> /*
> >> * When we skip transform stage (in view), then TableExpr's
> >> * TupleDesc should not be valid. Refresh is necessary.
> >> */
> >>
> >> but I'm not really grasping what you're trying to explain here. What
> >> transform stage? What view? This could well be my ignorance of this
> >> part of the code; if it should be understandable by a reader who is
> >> appropriately familiar with the executor that's fine, but if it's
> >> specific to how XMLTABLE works some more explanation would be good.
> >
> >
> > This is most difficult part of this patch, and I am not sure if it is
> fully
> > correctly implemented. I use TupleDesc cache. The TupleDesc is created in
> > parser/transform stage. When the XMLTABLE is used in some view, then the
> > transformed parser tree is materialized - and when the view is used in
> > query, then this tree is loaded and the parser/transform stage is
> "skipped".
> > I'll check this code against implementation of ROW constructor and I'll
> try
> > to do more comments there.
>
> Thanks. It would be good to highlight when this does and does not
> happen and why. Why is it necessary at all?
>
> What happens if XMLTABLE is used in a query directly, not part of a
> view? If XMLTABLE is used in a view? If XMLTABLE is used in a prepared
> statement / plpgsql statement / etc? What about a CTE term?
>
> Not necessarily list all these cases one by one, just explain what
> happens, when and why. Especially if it's complex, so other readers
> can understand it and don't have to study it in detail to understand
> what is going on. It does not need to be good public-facing
> documentation, and details of wording, grammar etc can be fixed up
> later, it's the ideas that matter.
>
> Is this similar to other logic elsewhere? If so, reference that other
> logic so readers know where to look. That way if they're
> changing/bugfixing/etc one place they know there's another place that
> might need changing.
>
> I don't know this area of the code well enough to give a solid review
> of the actual functionality, and I don't yet understand what it's
> trying to do so it's hard to review it by studying what it actually
> does vs what it claims to do. Maybe Peter E can help, he said he was
> thinking of looking at this patch too. But more information on what
> it's trying to do would be a big help.
>
> >> PostgreSQL already has XPATH support in the form of xmlexists(...)
> >> etc. Why is getXPathToken() etc needed? What re-use is possible here?
> >> There's no explanation in the patch header or comments. Should the new
> >> xpath parser be re-used by the existing xpath stuff? Why can't we use
> >> libxml's facilities? etc. This at least needs explaining in the
> >> submission, and some kind of hint as to why we have two different ways
> >> to do it is needed in the code. If we do need a new XML parser, should
> >> it be bundled in adt/xml.c along with a lot of user-facing
> >> functionality, or a separate file?
> >
> > libxml2 and our XPATH function doesn't support default namespace (
> > http://plasmasturm.org/log/259/ ). This is pretty useful feature - so I
> > implemented.
>
> OK, that makes sense.
>
> For the purpose of getting this patch in, is it a _necessary_ feature?
> Can XMLTABLE be usefully implemented without it, and if so, can it be
> added in a subsequent patch? It would be nice to simplify this by
> using existing libxml2 functionality in the first version rather than
> adding a whole new xpath as well!
>

This is not an XPath implementation; it only preprocesses the XPath
expression. Without it, users would have to write every PATH clause with
an explicit "./" prefix and an explicit "/text()" suffix. Usability would
be significantly lower and, what is worse, the examples found on the
internet would not work. Although it is a lot of lines, this code is
necessary.
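
As a rough illustration of the kind of preprocessing meant here, the following standalone sketch adds the implicit "./" prefix and "/text()" suffix to a column path. It is not the code from the patch: demo_normalize_column_path is a hypothetical name, and the rules (look only at the first character and at the last step) are a simplifying assumption. A real implementation has to tokenize the expression properly, for example to cope with predicates and string literals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite a column path the way a user would
 * otherwise have to spell it out by hand.
 *
 * Simplified rules (assumptions, not the patch's tokenizer):
 *   - paths not starting with '/' or '.' get a "./" prefix
 *   - paths whose last step is not an attribute ("@x"), not "." or "..",
 *     and that do not end in a function call ("...()") get "/text()"
 *
 * Returns false if the path is empty or the result does not fit in out.
 */
static bool
demo_normalize_column_path(const char *path, char *out, size_t outlen)
{
	size_t		len = strlen(path);
	const char *last_step;
	bool		need_prefix;
	bool		need_suffix;
	int			written;

	if (len == 0)
		return false;

	need_prefix = (path[0] != '/' && path[0] != '.');

	/* the last step starts after the final '/', or is the whole path */
	last_step = strrchr(path, '/');
	last_step = (last_step != NULL) ? last_step + 1 : path;

	need_suffix = (last_step[0] != '@' &&
				   last_step[0] != '.' &&
				   path[len - 1] != ')');

	written = snprintf(out, outlen, "%s%s%s",
					   need_prefix ? "./" : "",
					   path,
					   need_suffix ? "/text()" : "");

	return written >= 0 && (size_t) written < outlen;
}
```

Under these assumptions a column declared with PATH 'name' is evaluated as "./name/text()", while 'size/text()' and '@id' only receive the "./" prefix. The evaluation of the rewritten expression itself stays in libxml2.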

>
> > This is the major issue of libxml2 library. Another difference
> > between XPATH function and XMLTABLE function is using two phase searching
> > and implicit prefix "./" and suffix ("/text()") in XMLTABLE. XMLTABLE
> using
> > two XPATH expressions - for row data cutting and next for column data
> > cutting (from row data). The our XPATH functions is pretty simple mapped
> to
> > libxml2 XPATH API. But it is not possible with XMLTABLE function - due
> > design of this function in standard (it is more user friendly and doesn't
> > require exactly correct xpath expressions).
>
> So you can't use existing libxml2 xpath support to implement XMLTABLE,
> even without default namespaces?
>
> > I didn't find any API in libxml2 for a work with parsed xpath
> expressions -
> > I need some info about the first and last token of xpath expression - it
> is
> > base for decision about using prefix or suffix.
> >
> > This functionality (xpath expression parser) cannot be used for our XPATH
> > function now - maybe default namespace in future.
>
> So we'll have two different XPATH implementations for different
> places, with different limitations, different possible bugs, etc?
>

It is just preprocessing. The evaluation of the XPath expression is done
by libxml2, and that part is shared.

Our XPATH function is not short, but that is because it reads the namespace
data from a 2D array. The evaluation of the XPath expression itself takes
only a few lines.

>
> What would be needed to make the new XPATH work for our built-in xpath
> functions too?
>

> >> How does XmlTableGetValue(...) and XmlTableGetRowValue(...) relate to
> >> this? It doesn't look like they're intended to be called directly by
> >> the user, and they're not documented (or commented).
> >
> >
> > Probably I used wrong names. XMLTABLE function is running in two
> different
> > modes - with explicitly defined columns (XmlTableGetValue is used), and
> > without explicitly defined columns - so result is one XML column and only
> > one one step searching is used (there are not column related xpath
> > expressions) ( XmlTableGetRowValue is used). The function
> XmlTableGetValue
> > is used for getting one column value, the function XmlTableGetRowValue is
> > used for getting one value too, but in special case, when there are not
> any
> > other value.
>
> So both are internal implementation of the parser-level XMLTABLE(...)
> construct and are not intended to be called directly by users - right?
>

No, users should not call them. They are called only from the executor
and should not be called in any other way. I have to separate them better
from the executor, and then these functions will be private.

>
> Comments please! A short comment on the function saying this would be
> a big help.
>
> Regarding naming, do we already have a convention for functions that
> are internal implementation of something the user "spells"
> differently? Where it's transformed by the parser? I couldn't find
> one. I don't much care about the names so long as there are comments
> explaining what calls the functions and what the user-facing interface
> that matches the function is.
>
> Is it safe for users to call these directly? What happens if they do
> so incorrectly?
>
> Why are they not in pg_proc.h? Do they need to be?
>
> >> +/*
> >> + * There are different requests from XMLTABLE, JSON_TABLE functions
> >> + * on passed data than has CREATE TABLE command. It is reason for
> >> + * introduction special structure instead using ColumnDef.
> >> + */
> >> +typedef struct TableExprRawCol
> >> +{
> >> + NodeTag type;
> >> + char *colname;
> >> + TypeName *typeName;
> >> + bool for_ordinality;
> >> + bool is_not_null;
> >> + Node *path_expr;
> >> + Node *default_expr;
> >> + int location;
> >> +} TableExprRawCol;
> >
> > I am sorry. It is my fault. Now we have very similar node ColumnDef. This
> > node is designed for usage in utility commands - and it is not designed
> for
> > usage inside a query.
>
> Makes sense.
>
> > I had to decide between enhancing ColumnDef node or
> > introduction new special node. Because there are more special attributes
> and
> > it is hard to serialize current ColumnDef, I decided to use new node.
>
> Seems reasonable. The summary is "this is the parse node for a column
> of an XMLTABLE expression".
>
> Suggested comment:
>
> /*
>  * This is the parsenode for a column definition in a table-expression
>  * like XMLTABLE.
>  *
>  * We can't re-use ColumnDef here; the utility command column definition
>  * has all the wrong attributes for use in table-expressions and just
>  * doesn't make sense here.
>  */
> typedef struct TableExprColumn
> {
> ...
> };
>
> ?
>
> Why "RawCol" ? What does it become when it's not "raw" anymore? Is
> that a reference to ColumnDef's raw_default and cooked_default for
> untransformed vs transformed parse-trees?
>

TableExprColumn is better

Regards

Pavel

>
>
>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-07 06:44:14
Message-ID: CAFj8pRDQ2b8sEoM-TtghRJsEub7Rn8nOK2Z3xz=qyvupZ3e96Q@mail.gmail.com
Lists: pgsql-hackers

>
>
> Suggested comment:
>
> /*
>  * This is the parsenode for a column definition in a table-expression
>  * like XMLTABLE.
>  *
>  * We can't re-use ColumnDef here; the utility command column definition
>  * has all the wrong attributes for use in table-expressions and just
>  * doesn't make sense here.
>  */
> typedef struct TableExprColumn
> {
> ...
> };
>
> ?
>
> Why "RawCol" ? What does it become when it's not "raw" anymore? Is
> that a reference to ColumnDef's raw_default and cooked_default for
> untransformed vs transformed parse-trees?
>

My previous reply was wrong: the node is used only by the parser and holds
a TypeName field. The analogy with ColumnDef's raw_default is perfect.

Regards

Pavel

>
>
>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-07 07:23:59
Message-ID: CAMsr+YE0xPpPBw8vfL0LabtL_Ry30KOPWWtR84h5DFV1qOXoHQ@mail.gmail.com
Lists: pgsql-hackers

On 7 September 2016 at 14:44, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>
>> Suggested comment:
>>
>> /*
>>  * This is the parsenode for a column definition in a table-expression
>>  * like XMLTABLE.
>>  *
>>  * We can't re-use ColumnDef here; the utility command column definition
>>  * has all the wrong attributes for use in table-expressions and just
>>  * doesn't make sense here.
>>  */
>> typedef struct TableExprColumn
>> {
>> ...
>> };
>>
>> ?
>>
>> Why "RawCol" ? What does it become when it's not "raw" anymore? Is
>> that a reference to ColumnDef's raw_default and cooked_default for
>> untransformed vs transformed parse-trees?
>
>
> My previous reply was wrong - it is used by parser only and holds TypeName
> field. The analogy with ColumnDef raw_default is perfect.

Cool, let's just comment that then.

I'll wait on an updated patch per discussion to date. Hopefully
somebody else with more of a clue than me can offer better review of
the executor/view/caching part you specifically called out as complex.
Otherwise maybe it'll be clearer in a revised version.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-09 08:35:22
Message-ID: CAFj8pRAnMM26kkeqe0rbGQWSDqRbjmKEP8tE5mB8B=z6A551aQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

I am sending a new version of this patch.

1. The generic TableExpr is now better separated from the real content
generation.
2. I removed the cached typmod and use the row type cache everywhere. This
is consistent with the few other places in Pg where dynamic types are used.
The result tupdesc is generated a few more times, but it is not on the
critical path.
3. More comments, a few more lines in the docs.
4. Reformatted by pgindent.

Regards

Pavel

Attachment Content-Type Size
xmltable-20160909.patch text/x-patch 95.4 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-09 13:44:07
Message-ID: CAFj8pRBE_+LrxFcVBdXhv0J27vix2U4Q=HgpCy0q3xUyuxjHMQ@mail.gmail.com
Lists: pgsql-hackers

2016-09-09 10:35 GMT+02:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

> Hi,
>
> I am sending new version of this patch
>
> 1. now generic TableExpr is better separated from a real content generation
> 2. I removed cached typmod - using row type cache everywhere - it is
> consistent with other few places in Pg where dynamic types are used - the
> result tupdesc is generated few times more - but it is not on critical path.
> 3. More comments, few more lines in doc.
> 4. Reformatted by pgindent
>

New update: more regression tests.

Regards

Pavel

>
> Regards
>
> Pavel
>
>
>

Attachment Content-Type Size
xmltable-20160909-2.patch.gz application/x-gzip 21.4 KB

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-12 01:58:46
Message-ID: CAMsr+YEhW0K-bEkZEGUiXb94vAODqDf2btFq-+UDRDY2FOAWaA@mail.gmail.com
Lists: pgsql-hackers

On 9 September 2016 at 21:44, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>
>
> 2016-09-09 10:35 GMT+02:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>>
>> Hi,
>>
>> I am sending new version of this patch
>>
>> 1. now generic TableExpr is better separated from a real content
>> generation
>> 2. I removed cached typmod - using row type cache everywhere - it is
>> consistent with other few places in Pg where dynamic types are used - the
>> result tupdesc is generated few times more - but it is not on critical path.
>> 3. More comments, few more lines in doc.
>> 4. Reformatted by pgindent

Thanks.

I applied this on top of the same base as your prior patch so I could
compare changes.

The new docs look good. Thanks for that, I know it's a pain. They'll
need to cover FOR ORDINALITY too, but that's not hard. I'll try to find
some time to help with the docs per the references you sent offlist.

Out of interest, should the syntax allow room for future expansion to
permit reading from file rather than just string literal / column
reference? It'd be ideal to avoid reading big documents wholly into
memory when using INSERT INTO ... SELECT XMLTABLE (...) . I don't
suggest adding that to this patch, just making sure adding it later
would not cause problems.

I see you added a builder context abstraction as discussed, so there's
no longer any direct reference to XMLTABLE specifics from TableExpr
code. Good, thanks for that. It'll make things much less messy when
adding other table expression types as you expressed the desire to do,
and means the TableExpr code now makes more sense as generic
infrastructure.
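
To make the idea concrete for other readers, here is a standalone sketch of what such a builder abstraction can look like. It is an illustration under assumptions, not the patch's actual interface: DemoTableExprBuilder, its member names and the toy "format" (rows are found by plain substring search) are all hypothetical. The generic driver only talks to the callback table and knows nothing about XML.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback table; a real one would carry more operations. */
typedef struct DemoTableExprBuilder
{
	void	   *(*CreateContext) (const char *document);
	void		(*AddNS) (void *context, const char *name, const char *uri);
	void		(*SetRowPath) (void *context, const char *path);
	bool		(*FetchRow) (void *context);
	void		(*DestroyContext) (void *context);
} DemoTableExprBuilder;

/* Toy format-specific state: "rows" are occurrences of a marker string. */
typedef struct DemoToyContext
{
	const char *cursor;			/* where the next search starts */
	const char *row_marker;		/* set by SetRowPath */
	int			ns_count;		/* namespaces registered so far */
} DemoToyContext;

static void *
toy_create(const char *document)
{
	DemoToyContext *cxt = malloc(sizeof(DemoToyContext));

	if (cxt == NULL)
		return NULL;
	cxt->cursor = document;
	cxt->row_marker = "";
	cxt->ns_count = 0;
	return cxt;
}

static void
toy_add_ns(void *context, const char *name, const char *uri)
{
	(void) name;
	(void) uri;
	((DemoToyContext *) context)->ns_count++;
}

static void
toy_set_row_path(void *context, const char *path)
{
	((DemoToyContext *) context)->row_marker = path;
}

static bool
toy_fetch_row(void *context)
{
	DemoToyContext *cxt = context;
	const char *hit;

	if (cxt->row_marker[0] == '\0')
		return false;
	hit = strstr(cxt->cursor, cxt->row_marker);
	if (hit == NULL)
		return false;
	cxt->cursor = hit + strlen(cxt->row_marker);
	return true;
}

static void
toy_destroy(void *context)
{
	free(context);
}

static const DemoTableExprBuilder demo_toy_builder = {
	toy_create, toy_add_ns, toy_set_row_path, toy_fetch_row, toy_destroy
};

/* Generic driver: no format-specific calls, only the builder callbacks. */
static int
demo_count_rows(const DemoTableExprBuilder *builder,
				const char *document, const char *row_path)
{
	void	   *cxt = builder->CreateContext(document);
	int			nrows = 0;

	if (cxt == NULL)
		return -1;
	builder->SetRowPath(cxt, row_path);
	while (builder->FetchRow(cxt))
		nrows++;
	builder->DestroyContext(cxt);
	return nrows;
}
```

A JSON_TABLE-style implementation would then supply its own callback table and reuse the driver unchanged, which is the property the separation is meant to provide.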

ExecEvalTableExprProtected and ExecEvalTableExpr are OK with me, or
better than execEvalTableExpr and ExecEvalTableExpr were anyway.
Eventual committer will probably have opinions here.

Mild nitpick: since you can have multiple namespaces, shouldn't
builder->SetNS be builder->AddNS ?

Added comments are helpful, thanks.

On first read-through this is a big improvement and addresses all the
concerns I raised. Documentation is much much better, thanks, I know
that's a pain.

I'll take a closer read-through shortly.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-12 04:28:44
Message-ID: CAMsr+YH6UW5ZxkfNx8zyVD65e7JuTmJ1FsKuX1ORk3pUprMVjQ@mail.gmail.com
Lists: pgsql-hackers

> I'll take a closer read-through shortly.

Missing file. You omitted executor/tableexpr.h from the patch, so I
can't compile.

I've expanded and copy-edited the docs. Some of it is guesswork based
on the references you sent and a glance at the code. Please check my
changes carefully. I found a few surprises, like the fact that DEFAULT
isn't a normal literal, it's an xpath expression evaluated at the same
time as the rowexpression.

Updated patch attached as XMLTABLE-v3 includes the docs changes. Note
that it's missing tableexpr.h. For convenient review or to apply to
your working tree I also attach a diff of just my docs changes as
proposed-docs-changes.diff.

Docs:

- Can you send the sample data used to generate the example output?
I'd like to include at least a cut down part of it in the docs to make
it clear how the input correlates with output, and preferably put the
whole thing in an appendix.

- How does it decide what subset of the document to iterate over?
That's presumably rowexpr, which is xpath in postgresql? (I added this
to docs).

- xmlnamespaces clause in docs needs an example for a non-default namespace.

- What effect does xmlnamespaces clause have? Does supplying it allow
you to reference qualified names in xpath? What happens if you don't
specify it for a document that has namespaces or don't define all the
namespaces? What if you reference an undefined namespace in xpath?
What about if an undefined namespace isn't referenced by xpath, but is
inside a node selected by an xpath expression?

- What are the rules for converting the matched XML node into a value?
If the matched node is not a simple text node or lacks a text node as
its single child, what happens?

- What happens if the matched node has multiple text node children? This
can happen if, for example, you have something like

<matchedNode>
some text <!-- comment splits up text node --> other text
</matchedNode>

- Is there a way to get an attribute as a value? If so, an example
should show this because it's going to be a common need. Presumably
you want node/@attrname ?

- What happens if COLUMNS is not specified at all? It looks like it
returns a single column result set with the matched entries as 'xml'
type, so added to docs, please verify.

- PASSING clause isn't really defined. You can specify one PASSING
entry as a literal/colref/expression, and it's the argument xml
document, right? The external docs you referred to say that PASSING
may have a BY VALUE keyword, alias its argument with AS, and may have
expressions, e.g.

PASSING BY VALUE '<x/>' AS a, '<y/>' AS b

Neither aliases nor multiple entries are supported by the code or
grammar. Should this be documented as a restriction? Do you know if
that's an extension by the other implementation or if it's SQL/XML
standard? (I've drafted a docs update to cover this in the updated
patch).

- What does BY REF mean? Should this just be mentioned with a "see
xmlexists(...)" since it seems to be compatibility noise? Is there a
corresponding BY VALUE or similar?

- The parser definitions re-use xmlexists_argument . I don't mind
that, but it's worth noting here in case others do.

- Why do the expression arguments take c_expr (anything allowed in
a_expr or b_expr), not b_expr (restricted expression) ?

- Column definitions are underdocumented. The grammar says they can be
NOT NULL, for example, but I don't see that in any of the references
you mailed me nor in the docs. What behaviour is expected for a NOT
NULL column? I've documented my best guess (not checked closely
against the code, please verify).

Test suggestions:

- Coverage of multiple text() node children of an element, where split
up by comment or similar

- Coverage of xpath that matches a node with child element nodes

More to come. Please review my docs changes in the meantime. I'm
spending a lot more time on this than I expected so I might have to
get onto other things for a while too.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
proposed-docs-changes.patch text/x-patch 6.3 KB
0001-XMLTABLE-v3.patch.gz application/x-gzip 22.6 KB

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-12 04:36:28
Message-ID: CAMsr+YG-LjVMfG_4+50Ec=sAPXeghK9+M10aRYwdS-FvgU6Kbg@mail.gmail.com
Lists: pgsql-hackers

On 12 September 2016 at 12:28, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>> I'll take a closer read-through shortly.

>DEFAULT
> isn't a normal literal, it's an xpath expression evaluated at the same
> time as the rowexpression.

Sorry for the spam, but turns out that's not the case as implemented
here. The docs you referenced say it should be an xpath expression,
but the implementation here is of a literal value, and examples
elsewhere on the Internet show a literal value. Unclear if the
referenced docs are wrong or what and I don't have anything to test
with.

Feel free to fix/trim the DEFAULT related changes in above docs patch as needed.

Also, tests/docs should probably cover what happens when PATH matches
more than one element, i.e. produces a list of more than one match.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-12 04:42:08
Message-ID: CAFj8pRDsBeTRR3xDXbHnR19ifCRrnq7SDwM=b-i_=5mbhOzTVA@mail.gmail.com
Lists: pgsql-hackers

2016-09-12 6:36 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> On 12 September 2016 at 12:28, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> >> I'll take a closer read-through shortly.
>
> >DEFAULT
> > isn't a normal literal, it's an xpath expression evaluated at the same
> > time as the rowexpression.
>
> Sorry for the spam, but turns out that's not the case as implemented
> here. The docs you referenced say it should be an xpath expression,
> but the implementation here is of a literal value, and examples
> elsewhere on the Internet show a literal value. Unclear if the
> referenced docs are wrong or what and I don't have anything to test
> with.
>

It is not spam. The previous comment was not correct. DEFAULT is an
expression - the result of this expression is used when data is missing.

In the standard, and in some other implementations, this is a literal only.
It is similar to the DEFAULT clause in a CREATE TABLE statement. Postgres
allows an expression there. Usually Postgres allows expressions everywhere
it makes sense, and where the bison parser allows it.
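
The fallback described above can be modelled outside the server. The
following is a minimal Python sketch, not the patch's code: the names
default_or_value and default_expr are hypothetical, and the counter only
stands in for a volatile function such as nextval(). It shows a DEFAULT
that is an expression, evaluated only for rows where the data is missing.

```python
import itertools

def default_or_value(found, default_expr):
    """Return the extracted value, or evaluate the DEFAULT expression if it is missing."""
    if found is None:
        # DEFAULT is an expression, so it is computed only when it is needed
        return default_expr()
    return found

counter = itertools.count(1)     # hypothetical stand-in for a volatile function
rows = ["a", None, "b", None]    # None marks a row where the data is missing
result = [default_or_value(v, lambda: f"generated-{next(counter)}") for v in rows]
print(result)
```

With a literal-only DEFAULT every missing row would receive the same
value; with an expression each missing row can receive its own.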

>
> Feel free to fix/trim the DEFAULT related changes in above docs patch as
> needed.
>
> Also, tests/docs should probably cover what happens when PATH matches
> more than one element, i.e. produces a list of more than one match.
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-12 05:07:18
Message-ID: CAFj8pRAHpZ6+W-BE4_3PW-_XTEHEb3bjAzayCHS80re13Mv=cQ@mail.gmail.com
Lists: pgsql-hackers

2016-09-12 3:58 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> On 9 September 2016 at 21:44, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
> >
> >
> > 2016-09-09 10:35 GMT+02:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> >>
> >> Hi,
> >>
> >> I am sending new version of this patch
> >>
> >> 1. now generic TableExpr is better separated from a real content
> >> generation
> >> 2. I removed cached typmod - using row type cache everywhere - it is
> >> consistent with other few places in Pg where dynamic types are used -
> the
> >> result tupdesc is generated few times more - but it is not on critical
> path.
> >> 3. More comments, few more lines in doc.
> >> 4. Reformatted by pgindent
>
> Thanks.
>
> I applied this on top of the same base as your prior patch so I could
> compare changes.
>
> The new docs look good. Thanks for that, I know it's a pain. It'll
> need to cover ORDINAL too, but that's not hard. I'll try to find some
> time to help with the docs per the references you sent offlist.
>
> Out of interest, should the syntax allow room for future expansion to
> permit reading from file rather than just string literal / column
> reference? It'd be ideal to avoid reading big documents wholly into
> memory when using INSERT INTO ... SELECT XMLTABLE (...) . I don't
> suggest adding that to this patch, just making sure adding it later
> would not cause problems.
>

This is a slightly different question - it is a server-side function, so
the first question is how to push content that usually lives on the client
side to the server. The next question is how to get this content to the
executor. Currently only the COPY statement is able to do that.

I think this should not be a problem, but it may require some special
keywords - fileref, local fileref - and some changes in the protocol.
Because this function has its own implementation in the parser/transform
stage, nothing will be lost in the process, and we can implement lazy
parameter evaluation. Another question is whether libxml2 is capable
enough of working with streams.

One idea - we can introduce "external (server side|client side) blobs" with
special types and special streaming IO. With these types, no changes are
necessary at the syntax level. With this, the syntactic sugar flag "BY REF"
can be useful.
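
To make the streaming idea concrete, here is a minimal sketch of
row-at-a-time processing. It uses Python's standard-library parser rather
than libxml2, so it says nothing about whether libxml2's own interfaces
would fit the executor; the function name stream_rows and the sample
document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

def stream_rows(source, row_tag):
    """Yield one dict per <row_tag> element as soon as it has been parsed."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == row_tag:
            yield {child.tag: child.text for child in elem}
            elem.clear()  # drop the children and text of the row already emitted

# A file-like object stands in for a large server-side file.
doc = io.BytesIO(b"<rows><row><id>1</id><name>a</name></row>"
                 b"<row><id>2</id><name>b</name></row></rows>")
rows = list(stream_rows(doc, "row"))
print(rows)
```

Each row is produced before the rest of the input has been read, which is
the property that matters for INSERT INTO ... SELECT over big documents.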

>
> I see you added a builder context abstraction as discussed, so there's
> no longer any direct reference to XMLTABLE specifics from TableExpr
> code. Good, thanks for that. It'll make things much less messy when
> adding other table expression types as you expressed the desire to do,
> and means the TableExpr code now makes more sense as generic
> infrastructure.
>
> ExecEvalTableExprProtected and ExecEvalTableExpr are OK with me, or
> better than execEvalTableExpr and ExecEvalTableExpr were anyway.
> Eventual committer will probably have opinions here.
>
> Mild nitpick: since you can have multiple namespaces, shouldn't
> builder->SetNS be builder->AddNS ?
>

good idea.

>
> Added comments are helpful, thanks.
>
> On first read-through this is a big improvement and addresses all the
> concerns I raised. Documentation is much much better, thanks, I know
> that's a pain.
>
> I'll take a closer read-through shortly.
>

updated patch attached - with your documentation.

Regards

Pavel

>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

Attachment Content-Type Size
xmltable-5.patch text/x-patch 130.1 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-12 05:48:35
Message-ID: CAFj8pRA=BjeNe5+Ad9WgHZ4tW=5vVvLb2qb8Og54+ye1n++E8w@mail.gmail.com
Lists: pgsql-hackers

2016-09-12 6:28 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> > I'll take a closer read-through shortly.
>
>
> Missing file. You omitted executor/tableexpr.h from the patch, so I
> can't compile.
>
> I've expanded and copy-edited the docs. Some of it is guesswork based
> on the references you sent and a glance at the code. Please check my
> changes carefully. I found a few surprises, like the fact that DEFAULT
> isn't a normal literal, it's an xpath expression evaluated at the same
> time as the rowexpression.
>
> Updated patch attached as XMLTABLE-v3 includes the docs changes. Note
> that it's missing tableexpr.h. For convenient review or to apply to
> your working tree I also attach a diff of just my docs changes as
> proposed-docs-changes.diff.
>
>
> Docs:
>
> - Can you send the sample data used to generate the example output?
> I'd like to include at least a cut down part of it in the docs to make
> it clear how the input correlates with output, and preferably put the
> whole thing in an appendix.
>

it is in regress tests.

>
> - How does it decide what subset of the document to iterate over?
> That's presumably rowexpr, which is xpath in postgresql? (I added this
> to docs).
>
> - xmlnamespaces clause in docs needs an example for a non-default
> namespace.
>
> - What effect does xmlnamespaces clause have? Does supplying it allow
> you to reference qualified names in xpath? What happens if you don't
> specify it for a document that has namespaces or don't define all the
> namespaces? What if you reference an undefined namespace in xpath?
> What about if an undefined namespace isn't referenced by xpath, but is
> inside a node selected by an xpath expression?
>

All this is under libxml2 control - when you use an undefined namespace,
libxml2 raises an error. The namespaces in the document and in XPath
queries are absolutely independent - the relation is the URI. When you use
a bad URI (referenced by name), the result will be an empty set. When you
use an undefined name, you will get an error.
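
The three cases can be reproduced outside Postgres. This is a small
sketch using Python's xml.etree.ElementTree instead of libxml2 (an
assumption made so the example is self-contained); the URIs and prefixes
are invented. It shows that the query prefix only has to map to the same
URI as the document, that a wrong URI matches nothing, and that an
undefined prefix is an error.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring('<d:rows xmlns:d="http://example.com/ns"><d:row>1</d:row></d:rows>')

# The query prefix "q" differs from the document prefix "d"; only the URI has to match.
good = doc.findall("q:row", {"q": "http://example.com/ns"})

# A defined prefix bound to a different URI simply matches nothing.
wrong_uri = doc.findall("q:row", {"q": "http://example.com/other"})

# A prefix that was never defined is reported as an error.
try:
    doc.findall("zz:row", {"q": "http://example.com/ns"})
    undefined_prefix_error = False
except SyntaxError:
    undefined_prefix_error = True

print(len(good), len(wrong_uri), undefined_prefix_error)
```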

>
> - What are the rules for converting the matched XML node into a value?
> If the matched node is not a simple text node or lacks a text node as
> its single child, what happens?
>

This process is described and controlled by "XML SQL mapping". Postgres
has a minimalistic implementation without the possibility of external
control and without schema support. My implementation is simple. When the
user doesn't specify a result target, such as by explicitly using the
text() function, then text() is used implicitly when the target type is
not XML. Then I dump the result to a string and apply the input function
of the target type.
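
A rough model of that conversion rule, written in Python for
illustration only: INPUT_FUNCTIONS is a hypothetical stand-in for the
target types' input functions, and convert is not a function from the
patch. A non-xml target takes the node's text and parses it; an xml
target keeps the serialized node.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the input functions of two target types.
INPUT_FUNCTIONS = {"int": int, "text": str}

def convert(node, target_type):
    """Map a matched node to a value of the requested target type."""
    if target_type == "xml":
        # xml targets keep the node itself, serialized
        return ET.tostring(node, encoding="unicode")
    # other targets take the text content and run it through the input function
    return INPUT_FUNCTIONS[target_type](node.text)

node = ET.fromstring("<price>42</price>")
print(convert(node, "int"), convert(node, "text"), convert(node, "xml"))
```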

>
> - What happens if the matched has multiple text node children? This
> can happen if, for example, you have something like
>
> <matchedNode>
> some text <!-- comment splits up text node --> other text
> </matchedNode>
>

It depends on the target type - it is allowed for XML, and it is
disallowed for other types. I thought about supporting arrays - but the
patch would be much more complex - there can be recursion - so I
disallowed it. When the user has to solve this issue, he can use nested
XMLTABLE functions, where the nested function works with the XML type.

Just for the record - this issue is solved in the JSON_TABLE function - it
allows nested PATHs. But XMLTABLE doesn't allow it.

>
> - Is there a way to get an attribute as a value? If so, an example
> should show this because it's going to be a common need. Presumably
> you want node/@attrname ?
>

you can use a reference to the current node "." - so "./@attname" should
work - an example is in the regress tests

>
> - What happens if COLUMNS is not specified at all? It looks like it
> returns a single column result set with the matched entries as 'xml'
> type, so added to docs, please verify.
>

sure, that is it

>
>
> - PASSING clause isn't really defined. You can specify one PASSING
> entry as a literal/colref/expression, and it's the argument xml
> document, right? The external docs you referred to say that PASSING
> may have a BY VALUE keyword, alias its argument with AS, and may have
> expressions, e.g.
>
> PASSING BY VALUE '<x/>' AS a, '<y/>' AS b
>
> Neither aliases nor multiple entries are supported by the code or
> grammar. Should this be documented as a restriction? Do you know if
> that's an extension by the other implementation or if it's SQL/XML
> standard? (I've drafted a docs update to cover this in the updated
> patch).
>

ANSI allows passing more documents - and then doing complex queries with
XQuery. Passing more than one document makes no sense in a libxml2-based
implementation, so I didn't support it. The referenced names can be
implemented later - but that needs changes in the XPATH function too.

>
>
> - What does BY REF mean? Should this just be mentioned with a "see
> xmlexists(...)" since it seems to be compatibility noise? Is there a
> corresponding BY VALUE or similar?
>

When the XML document is stored as a serialized DOM, then by ref means a
link to this DOM. It makes no sense in Postgres - because we store XML
documents by value only.

>
>
> - The parser definitions re-use xmlexists_argument . I don't mind
> that, but it's worth noting here in case others do.
>

It is one clause - see SQL/XML doc PASSING <XML table argument passing
mechanism>

>
> - Why do the expression arguments take c_expr (anything allowed in
> a_expr or b_expr), not b_expr (restricted expression) ?
>

I don't know - I expect problems with the parser - because PASSING is a
restricted keyword in ANSI/SQL and an unreserved keyword in Postgres.

>
>
> - Column definitions are underdocumented. The grammar says they can be
> NOT NULL, for example, but I don't see that in any of the references
> you mailed me nor in the docs. What behaviour is expected for a NOT
> NULL column? I've documented my best guess (not checked closely
> against the code, please verify).
>
>
yes - some other databases allow it - I think it is useful.

> -
>
>
>
>
> Test suggestions:
>
> - Coverage of multiple text() node children of an element, where split
> up by comment or similar
>
> - Coverage of xpath that matches a node with child element nodes
>

I'll do it.

>
>
>
> More to come. Please review my docs changes in the mean time. I'm
> spending a lot more time on this than I expected so I might have to
> get onto other things for a while too.
>
>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-12 05:52:55
Message-ID: CAFj8pRBJYkvwwASarwusKnTcrJjqZbz0enrOqoPnEoFrrAfwRA@mail.gmail.com
Lists: pgsql-hackers

2016-09-12 6:36 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> On 12 September 2016 at 12:28, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> >> I'll take a closer read-through shortly.
>
> >DEFAULT
> > isn't a normal literal, it's an xpath expression evaluated at the same
> > time as the rowexpression.
>
> Sorry for the spam, but turns out that's not the case as implemented
> here. The docs you referenced say it should be an xpath expression,
> but the implementation here is of a literal value, and examples
> elsewhere on the Internet show a literal value. Unclear if the
> referenced docs are wrong or what and I don't have anything to test
> with.
>
> Feel free to fix/trim the DEFAULT related changes in above docs patch as
> needed.
>
> Also, tests/docs should probably cover what happens when PATH matches
> more than one element, i.e. produces a list of more than one match.
>

It is there for the case where this is allowed. When you change the target
type to any non-XML type, an error is raised.
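
The rule can be sketched as follows. This is a Python illustration under
stated assumptions, not the patch's logic: column_value is a hypothetical
helper and the error text is invented, since the real messages are not
final.

```python
import xml.etree.ElementTree as ET

def column_value(row, path, target_is_xml):
    """Several matches are fine for an xml column and an error for any other type."""
    matches = row.findall(path)
    if target_is_xml:
        # an xml result can carry all matched nodes, serialized one after another
        return "".join(ET.tostring(m, encoding="unicode") for m in matches)
    if len(matches) > 1:
        # hypothetical message; the patch's wording is not settled
        raise ValueError("more than one value returned for a non-xml column")
    return matches[0].text if matches else None

row = ET.fromstring("<row><tag>a</tag><tag>b</tag></row>")
xml_result = column_value(row, "tag", target_is_xml=True)
print(xml_result)
try:
    column_value(row, "tag", target_is_xml=False)
    raised = False
except ValueError as exc:
    raised = True
    print("error:", exc)
```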

I won't write negative test cases until the message texts are final (or
checked by a native speaker).

Regards

Pavel

>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-12 06:02:33
Message-ID: CAMsr+YGUUzDjiO7uB0Z1v1qiqRy5ieAJn=qcYz69T9z3Oo4ySQ@mail.gmail.com
Lists: pgsql-hackers

On 12 September 2016 at 13:07, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

>> Out of interest, should the syntax allow room for future expansion to
>> permit reading from file rather than just string literal / column
>> reference? It'd be ideal to avoid reading big documents wholly into
>> memory when using INSERT INTO ... SELECT XMLTABLE (...) . I don't
>> suggest adding that to this patch, just making sure adding it later
>> would not cause problems.
>
>
> this is little bit different question - it is server side function, so first
> question is - how to push usually client side content to server? Next
> question is how to get this content to a executor. Now only COPY statement
> is able to do.

Probably start with support for server-side files. When people are
dealing with really big files they'll be more willing to copy files to
the server or bind them into the server file system over the network.

The v3 protocol doesn't really allow any way for client-to-server
streaming during a query, I think that's hopeless until we have a
protocol bump.

> updated patch attached - with your documentation.

Will take a look and a better read of the code. Likely tomorrow, I've
got work to do as well.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-12 06:02:50
Message-ID: CAFj8pRBTW7VTdYVpACOUsHA3GWy+8iU+_UutsbLMyLHgLoxTRQ@mail.gmail.com
Lists: pgsql-hackers

Hi

There are some open questions - the standard (and some other databases)
requires entering the XPath expression as a string literal.

I think this is too strict and an unnecessary limit - (it forces dynamic
queries in more cases), so I allowed expressions there.

Another question is when these expressions should be evaluated. There are
two possibilities - once per query, or once per input row. I selected
"once per input row" mode - it is simpler to implement, and it is
consistent with other "similar" generators - see the behaviour of, and the
related discussion about, "array_to_string" and the evaluation of its
separator argument. The switch to "once per query" should not be hard -
but it can be strange for users, because some of their volatile
expressions would then behave as stable.
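
The difference between the two modes can be shown with a toy model. This
is a Python sketch only; path_expr stands in for a volatile SQL
expression that supplies the XPath string, and none of these names come
from the patch.

```python
calls = {"n": 0}

def path_expr():
    """Hypothetical volatile expression; counts how often it is evaluated."""
    calls["n"] += 1
    return "/rows/row"

def once_per_input_row(documents):
    # the expression is re-evaluated for every input document
    return [(doc, path_expr()) for doc in documents]

def once_per_query(documents):
    # the expression is evaluated a single time and the result is reused
    path = path_expr()
    return [(doc, path) for doc in documents]

docs = ["<rows/>", "<rows/>", "<rows/>"]
once_per_input_row(docs)
per_row_calls = calls["n"]
calls["n"] = 0
once_per_query(docs)
per_query_calls = calls["n"]
print(per_row_calls, per_query_calls)
```

For a stable expression both modes give the same answer; only a volatile
one lets the user observe which mode is in effect.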

Regards

Pavel


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-12 06:14:49
Message-ID: CAMsr+YG2Y1jOzFPMHjyDNZWPYYV80eiAC02TGQRfGQ7p43P3Uw@mail.gmail.com
Lists: pgsql-hackers

On 12 September 2016 at 14:02, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> Hi
>
> There is some opened questions - the standard (and some other databases)
> requires entering XPath expression as string literal.
>
> I am thinking so it is too strong not necessary limit - (it enforces dynamic
> query in more cases), so I allowed the expressions there.

I agree. There's no reason not to permit expressions there, and there
are many other places where we have similar extensions.

> Another questions is when these expressions should be evaluated. There are
> two possibilities - once per query, once per input row. I selected "once per
> input row mode" - it is simpler to implement it, and it is consistent with
> other "similar" generators - see the behave and related discussion to
> "array_to_string" and evaluation of separator argument. The switch to "once
> per query" should not be hard - but it can be strange for users, because
> some his volatile expression should be stable.

I would've expected once per query. There's no way the expressions can
reference the row data, so there's no reason to evaluate them each
time.

The only use case I see for evaluating them each time is - maybe -
DEFAULT. Where maybe there's a use for nextval() or other volatile
functions. But honestly, I think that's better done explicitly in a
post-pass, i.e.

select uuid_generate_v4(), x.*
from (
xmltable(.....) x
);

in cases where that's what the user actually wants.

There's no other case I can think of where expressions as arguments to
set-returning functions are evaluated once per output row.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-12 06:46:26
Message-ID: CAMsr+YF4nv+oT6Pk38CUiD5eEoJNLY=JrAs_xRp3CPqbXO2kjQ@mail.gmail.com
Lists: pgsql-hackers

On 12 September 2016 at 13:07, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

>> Out of interest, should the syntax allow room for future expansion to
>> permit reading from file rather than just string literal / column
>> reference? It'd be ideal to avoid reading big documents wholly into
>> memory when using INSERT INTO ... SELECT XMLTABLE (...) . I don't
>> suggest adding that to this patch, just making sure adding it later
>> would not cause problems.
>
>
> this is little bit different question - it is server side function, so first
> question is - how to push usually client side content to server? Next
> question is how to get this content to a executor. Now only COPY statement
> is able to do.

Probably start with support for server-side files. When people are
dealing with really big files they'll be more willing to copy files to
the server or bind them into the server file system over the network.

The v3 protocol doesn't really allow any way for client-to-server
streaming during a query, I think that's hopeless until we have a
protocol bump.

> updated patch attached - with your documentation.

>> - Can you send the sample data used to generate the example output?
>> I'd like to include at least a cut down part of it in the docs to make
>> it clear how the input correlates with output, and preferably put the
>> whole thing in an appendix.
>
>
> it is in regress tests.

Makes sense.

It's not that verbose (for XML) and I wonder if it's just worth
including it in-line in the docs along with the XMLTABLE example. It'd
be much easier to understand how XMLTABLE works and what it does then.

>> - What effect does xmlnamespaces clause have? Does supplying it allow
>> you to reference qualified names in xpath? What happens if you don't
>> specify it for a document that has namespaces or don't define all the
>> namespaces? What if you reference an undefined namespace in xpath?
>> What about if an undefined namespace isn't referenced by xpath, but is
>> inside a node selected by an xpath expression?
>
>
> All this is under libxml2 control - when you use undefined namespace, then
> libxml2 raises a error. The namespaces in document and in XPath queries are
> absolutely independent - the relation is a URI. When you use bad URI
> (referenced by name), then the result will be empty set. When you use
> undefined name, then you will get a error.

OK, makes sense.

>> - What are the rules for converting the matched XML node into a value?
>> If the matched node is not a simple text node or lacks a text node as
>> its single child, what happens?
>
> This process is described and controlled by "XML SQL mapping". The Postgres
> has minimalistic implementation without possibility of external control and
> without schema support. The my implementation is simple. When user doesn't
> specify result target like explicit using of text() function, then the
> text() function is used implicitly when target type is not XML. Then I dump
> result to string and I enforce related input functions for target types.

OK, so a subset of the full spec functionality is provided because of
limitations in Pg and libxml2. Makes sense.

My only big concern here is that use of text() is a common mistake in
XSLT, and I think the same thing will happen here. Users expect
comments to be ignored, but in fact a comment inserts a comment node
into the XML DOM, so a comment between two pieces of text produces a
tree of

element
text()
comment()
text()

If you match element/text(), you get a 2-node result and will presumably ERROR.

There is no good way to tell this from

element
text()
element
text()

when you use an xpath expression like element/text() . So you can't
safely solve it just by concatenating all resulting text() nodes
without surprising behaviour.
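
The first tree above is easy to reproduce. The sketch below uses Python's
xml.dom.minidom rather than libxml2 (an assumption made to keep it
self-contained), but the resulting DOM has the same shape: the comment
becomes a node of its own and leaves two separate text nodes.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<matchedNode>some text <!-- comment splits up text node --> other text</matchedNode>"
)
children = doc.documentElement.childNodes

# node names in document order: text, comment, text
kinds = [child.nodeName for child in children]
# the two text nodes that an expression like matchedNode/text() would return
texts = [child.data for child in children if child.nodeType == child.TEXT_NODE]
print(kinds)
print(texts)
```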

>> - What happens if the matched has multiple text node children? This
>> can happen if, for example, you have something like
>>
>> <matchedNode>
>> some text <!-- comment splits up text node --> other text
>> </matchedNode>
>
>
> depends on target type - it is allowed in XML, and it is disallowed for
> other types. I though about support of a arrays - but the patch will be much
> more complex - there can be recursion - so I disallowed it. When the user
> have to solve this issue, then he can use nested XMLTABLE functions and
> nested function is working with XML type.

I don't really understand how that'd work.

Do you know how other implementations handle this?

I think users are going to be VERY surprised when comments in text
break their XML.

>> - Is there a way to get an attribute as a value? If so, an example
>> should show this because it's going to be a common need. Presumably
>> you want node/@attrname ?
>
>
> you can use reference to current node "." - so "./@attname" should to work -
> a example is in regress tests

cool, just needs mention in docs then.

>> - PASSING clause isn't really defined. You can specify one PASSING
>> entry as a literal/colref/expression, and it's the argument xml
>> document, right? The external docs you referred to say that PASSING
>> may have a BY VALUE keyword, alias its argument with AS, and may have
>> expressions, e.g.
>>
>> PASSING BY VALUE '<x/>' AS a, '<y/>' AS b
>>
>> Neither aliases nor multiple entries are supported by the code or
>> grammar. Should this be documented as a restriction?
>
>
> The ANSI allows to pass more documents - and then do complex queries with
> XQuery. Passing more than one document has not sense in libxml2 based
> implementation, so I didn't supported it. The referenced names can be
> implemented later - but it needs to changes in XPATH function too.

OK, so my docs addition that just says they're not supported should be fine.

>> - What does BY REF mean? Should this just be mentioned with a "see
>> xmlexists(...)" since it seems to be compatibility noise? Is there a
>> corresponding BY VALUE or similar?
>
> When the XML document is stored as serialized DOM, then by ref means link on
> this DOM. It has not sense in Postgres - because we store XML documents by
> value only.

Right. And since there's already precedent for xmlexists there's no
point worrying about whether we lose opportunities to implement it
later.

>> - Why do the expression arguments take c_expr (anything allowed in
>> a_expr or b_expr), not b_expr (restricted expression) ?
>
>
> I don't know - I expect the problems with parser - because PASSING is
> restricted keyword in ANSI/SQL and unreserved keyword in Postgres.

I mean for the rowpath argument, not the parts within
xmlexists_argument. If the rowpath doesn't need to be c_expr
presumably it should be a b_expr or even, if it doesn't cause parsing
ambiguities, an a_expr ? There doesn't seem to be the same issue here
as we have with BETWEEN etc.

>> - Column definitions are underdocumented. The grammar says they can be
>> NOT NULL, for example, but I don't see that in any of the references
>> you mailed me nor in the docs. What behaviour is expected for a NOT
>> NULL column? I've documented my best guess (not checked closely
>> against the code, please verify).
>>
>
> yes - some other databases allows it - I am thinking so it is useful.

Sure. Sounds like my docs additions are probably right then, except
for incorrect description of DEFAULT.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-15 11:31:43
Message-ID: CAFj8pRDgrHTAhXNXKg-XeyTXYyUOfqWTmvkLS8_6cOCYVimpEQ@mail.gmail.com
Lists: pgsql-hackers

2016-09-12 8:46 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> On 12 September 2016 at 13:07, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
> >> Out of interest, should the syntax allow room for future expansion to
> >> permit reading from file rather than just string literal / column
> >> reference? It'd be ideal to avoid reading big documents wholly into
> >> memory when using INSERT INTO ... SELECT XMLTABLE (...) . I don't
> >> suggest adding that to this patch, just making sure adding it later
> >> would not cause problems.
> >
> >
> > this is little bit different question - it is server side function, so first
> > question is - how to push usually client side content to server? Next
> > question is how to get this content to a executor. Now only COPY statement
> > is able to do.
>
> Probably start with support for server-side files. When people are
> dealing with really big files they'll be more willing to copy files to
> the server or bind them into the server file system over the network.
>
> The v3 protocol doesn't really allow any way for client-to-server
> streaming during a query, I think that's hopeless until we have a
> protocol bump.
>
> > updated patch attached - with your documentation.
>
>
> >> - Can you send the sample data used to generate the example output?
> >> I'd like to include at least a cut down part of it in the docs to make
> >> it clear how the input correlates with output, and preferably put the
> >> whole thing in an appendix.
> >
> >
> > it is in regress tests.
>
> Makes sense.
>
> It's not that verbose (for XML) and I wonder if it's just worth
> including it in-line in the docs along with the XMLTABLE example. It'd
> be much easier to understand how XMLTABLE works and what it does then.
>
> >> - What effect does xmlnamespaces clause have? Does supplying it allow
> >> you to reference qualified names in xpath? What happens if you don't
> >> specify it for a document that has namespaces or don't define all the
> >> namespaces? What if you reference an undefined namespace in xpath?
> >> What about if an undefined namespace isn't referenced by xpath, but is
> >> inside a node selected by an xpath expression?
> >
> >
> > All this is under libxml2 control - when you use undefined namespace, then
> > libxml2 raises a error. The namespaces in document and in XPath queries are
> > absolutely independent - the relation is a URI. When you use bad URI
> > (referenced by name), then the result will be empty set. When you use
> > undefined name, then you will get a error.
>
> OK, makes sense.
>
> >> - What are the rules for converting the matched XML node into a value?
> >> If the matched node is not a simple text node or lacks a text node as
> >> its single child, what happens?
> >
> > This process is described and controlled by "XML SQL mapping". The Postgres
> > has minimalistic implementation without possibility of external control and
> > without schema support. The my implementation is simple. When user doesn't
> > specify result target like explicit using of text() function, then the
> > text() function is used implicitly when target type is not XML. Then I dump
> > result to string and I enforce related input functions for target types.
>
> OK, so a subset of the full spec functionality is provided because of
> limitations in Pg and libxml2. Makes sense.
>
> My only big concern here is that use of text() is a common mistake in
> XSLT, and I think the same thing will happen here. Users expect
> comments to be ignored, but in fact a comment inserts a comment node
> into the XML DOM, so a comment between two pieces of text produces a
> tree of
>
> element
> text()
> comment()
> text()
>
>
> If you match element/text(), you get a 2-node result and will presumably
> ERROR.
>
> There is no good way to tell this from
>
> element
> text()
> element
> text()
>
> when you use an xpath expression like element/text() . So you can't
> safely solve it just by concatenating all resulting text() nodes
> without surprising behaviour.
>
> >> - What happens if the matched has multiple text node children? This
> >> can happen if, for example, you have something like
> >>
> >> <matchedNode>
> >> some text <!-- comment splits up text node --> other text
> >> </matchedNode>
> >
>

I fixed this case - new regression tests added

> >
> > depends on target type - it is allowed in XML, and it is disallowed for
> > other types. I though about support of a arrays - but the patch will be much
> > more complex - there can be recursion - so I disallowed it. When the user
> > have to solve this issue, then he can use nested XMLTABLE functions and
> > nested function is working with XML type.
>
> I don't really understand how that'd work.
>
> Do you know how other implementations handle this?
>
> I think users are going to be VERY surprised when comments in text
> break their XML.
>
> >> - Is there a way to get an attribute as a value? If so, an example
> >> should show this because it's going to be a common need. Presumably
> >> you want node/@attrname ?
> >
> >
> > you can use reference to current node "." - so "./@attname" should to work -
> > a example is in regress tests
>
> cool, just needs mention in docs then.
>
> >> - PASSING clause isn't really defined. You can specify one PASSING
> >> entry as a literal/colref/expression, and it's the argument xml
> >> document, right? The external docs you referred to say that PASSING
> >> may have a BY VALUE keyword, alias its argument with AS, and may have
> >> expressions, e.g.
> >>
> >> PASSING BY VALUE '<x/>' AS a, '<y/>' AS b
> >>
> >> Neither aliases nor multiple entries are supported by the code or
> >> grammar. Should this be documented as a restriction?
> >
> >
> > The ANSI allows to pass more documents - and then do complex queries with
> > XQuery. Passing more than one document has not sense in libxml2 based
> > implementation, so I didn't supported it. The referenced names can be
> > implemented later - but it needs to changes in XPATH function too.
>
> OK, so my docs addition that just says they're not supported should be
> fine.
>
> >> - What does BY REF mean? Should this just be mentioned with a "see
> >> xmlexists(...)" since it seems to be compatibility noise? Is there a
> >> corresponding BY VALUE or similar?
> >
> > When the XML document is stored as serialized DOM, then by ref means link on
> > this DOM. It has not sense in Postgres - because we store XML documents by
> > value only.
>
> Right. And since there's already precedent for xmlexists there's no
> point worrying about whether we lose opportunities to implement it
> later.
>
> >> - Why do the expression arguments take c_expr (anything allowed in
> >> a_expr or b_expr), not b_expr (restricted expression) ?
> >
> >
> > I don't know - I expect the problems with parser - because PASSING is
> > restricted keyword in ANSI/SQL and unreserved keyword in Postgres.
>
> I mean for the rowpath argument, not the parts within
> xmlexists_argument. If the rowpath doesn't need to be c_expr
> presumably it should be a b_expr or even, if it doesn't cause parsing
> ambiguities, an a_expr ? There doesn't seem to be the same issue here
> as we have with BETWEEN etc.
>

b_expr causes a shift/reduce conflict :(

>
> >> - Column definitions are underdocumented. The grammar says they can be
> >> NOT NULL, for example, but I don't see that in any of the references
> >> you mailed me nor in the docs. What behaviour is expected for a NOT
> >> NULL column? I've documented my best guess (not checked closely
> >> against the code, please verify).
> >>
> >
> > yes - some other databases allows it - I am thinking so it is useful.
>
> Sure. Sounds like my docs additions are probably right then, except
> for incorrect description of DEFAULT.
>

I found another open question - how should we translate an empty tag to an
SQL value? Oracle does not have to solve this question, but PostgreSQL does.
Some databases return an empty string.

I prefer to return an empty string - not NULL - in this case. The reason is
simple - an empty string is some information, and NULL is less information.
When necessary I can transform an empty string to NULL - the opposite
direction is not unique.

Regards

Pavel

>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

Attachment Content-Type Size
xmltable-6.patch.gz application/x-gzip 23.3 KB

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-15 23:44:38
Message-ID: CAMsr+YHX=Uzf0_9UW0uD20pBW7oMthJ=E7Yb8QxeSOa0pO8hSg@mail.gmail.com
Lists: pgsql-hackers

On 15 September 2016 at 19:31, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

> b_expr enforces shift/reduce conflict :(

No problem then. I just thought it'd be worth allowing more if it
worked to do so.

> I found other opened question - how we can translate empty tag to SQL value?
> The Oracle should not to solve this question, but PostgreSQL does. Some
> databases returns empty string.

Oracle doesn't solve the problem? it ERRORs?

> I prefer return a empty string - not null in this case.

I agree, and that's consistent with how most XML is interpreted. XSLT
for example considers <x></x> and <x/> to be pretty much the same
thing.
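To make the empty-tag point concrete, here is a small sketch in Python (standard library only, so it shows generic XML parser behaviour, not what the patch or libxml2 does). It checks that <x></x> and <x/> yield the same empty text value, and that mapping an empty string to NULL afterwards is a simple one-way step. The helper names text_value and empty_to_null are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def text_value(xml_source):
    """Concatenate all text under the root element; empty content gives ''."""
    root = ET.fromstring(xml_source)
    return "".join(root.itertext())


def empty_to_null(value):
    """Hypothetical post-processing step: map '' to None (standing in for NULL)."""
    return value if value != "" else None


long_form = text_value("<x></x>")
short_form = text_value("<x/>")

print(repr(long_form), repr(short_form))  # '' ''
print(empty_to_null(long_form))           # None
```

Going the other way is not possible: once the value is NULL, nothing says whether the tag was empty or missing, which is the argument for returning the empty string first.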

> The reason is simple
> - Empty string is some information - and NULL is less information. When it
> is necessary I can transform empty string to NULL - different direction is
> not unique.

Yep, I definitely agree. The only issue is if people want a DEFAULT to
be applied for empty tags. But that's something they can do in a
post-process pass easily enough, since XMLTABLE is callable as a
subquery / WITH expression / etc.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-16 01:22:49
Message-ID: 0dbb5659-2c1a-c171-b25b-d3f1cd42f4a9@2ndquadrant.com
Lists: pgsql-hackers

On 9/12/16 1:14 AM, Craig Ringer wrote:
> I would've expected once per query. There's no way the expressions can
> reference the row data, so there's no reason to evaluate them each
> time.
>
> The only use case I see for evaluating them each time is - maybe -
> DEFAULT. Where maybe there's a use for nextval() or other volatile
> functions. But honestly, I think that's better done explicitly in a
> post-pass, i.e.
>
> select uuid_generate_v4(), x.*
> from (
> xmltable(.....) x
> );
>
> in cases where that's what the user actually wants.
>
> There's no other case I can think of where expressions as arguments to
> set-returning functions are evaluated once per output row.

The SQL standard appears to show what the behavior ought to be:

<XML table> is equivalent to

LATERAL ( XNDC
SELECT SLI1 AS CN1, SLI2 AS CN2, ..., SLINC AS CNNC FROM XMLITERATE (
XMLQUERY ( XTRP XQAL
RETURNING SEQUENCE BY REF EMPTY ON EMPTY ) )
AS I ( V, N )
) AS CORR DCLP

and SLIj is

CASE WHEN XEj
THEN XMLCAST( XQCj AS DTj CPMj )
ELSE DEFj END

where DEFj is the default expression.

So simplified it is

LATERAL ( SELECT CASE WHEN ... ELSE DEFj END, ... FROM something )

which indicates that the default expression is evaluated for every row.

If we're not sure about all this, it might be worth restricting the
default expressions to stable or immutable expressions for the time being.
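The difference between the two readings only shows up with a volatile default. The following Python sketch is an illustration of the simplified CASE WHEN ... ELSE DEFj END form above, not of the patch's executor code; a counter stands in for something like nextval(), and all function names are hypothetical.

```python
import itertools


def make_nextval():
    """Return a callable that behaves like a volatile sequence function."""
    counter = itertools.count(1)
    return lambda: next(counter)


def project_per_row(values, default_expr):
    """Per-row reading: the default is re-evaluated for each row that needs it."""
    return [v if v is not None else default_expr() for v in values]


def project_once(values, default_expr):
    """Once-per-query reading: the default is evaluated a single time up front."""
    default_value = default_expr()
    return [v if v is not None else default_value for v in values]


rows = ["a", None, None]  # None marks a row where the column path matched nothing

per_row = project_per_row(rows, make_nextval())
once = project_once(rows, make_nextval())

print(per_row)  # ['a', 1, 2]
print(once)     # ['a', 1, 1]
```

With a stable or immutable default both functions return the same rows, which is why restricting defaults to such expressions would sidestep the question.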

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-16 07:20:23
Message-ID: CAFj8pRDdcJcsXijfBsLtK+jVHBU4+6VUsFKdba6aOpLQS2GaWA@mail.gmail.com
Lists: pgsql-hackers

2016-09-16 1:44 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> On 15 September 2016 at 19:31, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
> > b_expr enforces shift/reduce conflict :(
>
> No problem then. I just thought it'd be worth allowing more if it
> worked to do so.
>
> > I found other opened question - how we can translate empty tag to SQL value?
> > The Oracle should not to solve this question, but PostgreSQL does. Some
> > databases returns empty string.
>
> Oracle doesn't solve the problem? it ERRORs?
>

Oracle returns NULL. But in Oracle there is no difference between NULL and
an empty string.

Regards

Pavel

>
> > I prefer return a empty string - not null in this case.
>
> I agree, and that's consistent with how most XML is interpreted. XSLT
> for example considers <x></x> and <x/> to be pretty much the same
> thing.
>
> > The reason is simple
> > - Empty string is some information - and NULL is less information. When it
> > is necessary I can transform empty string to NULL - different direction is
> > not unique.
>
> Yep, I definitely agree. The only issue is if people want a DEFAULT to
> be applied for empty tags. But that's something they can do in a
> post-process pass easily enough, since XMLTABLE is callable as a
> subquery / WITH expression / etc.
>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-18 09:53:47
Message-ID: CAFj8pRAX_3ZdZAvw2qvuAAA6qt06fU+RAT73HZBDpdH_nUtWDA@mail.gmail.com
Lists: pgsql-hackers

Hi

new update:

* the docs are moved to a better place - XML processing functions
* a few more regression tests
* added the forgotten call to check_srf_call_placement

Regards

Pavel

Attachment Content-Type Size
xmltable-8.patch.gz application/x-gzip 26.0 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-21 18:31:12
Message-ID: CAFj8pRD+OSfD43EcpAtxFEjG7pHiFea125BP=KpG14VhVx16_g@mail.gmail.com
Lists: pgsql-hackers

2016-09-18 11:53 GMT+02:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

> Hi
>
> new update:
>
> * doc is moved to better place - xml processing functions
> * few more regress tests
> * call forgotten check_srf_call_placement
>

another small update - a fix for the XMLPath parser - it now supports
multibyte characters

Regards

Pavel

>
> Regards
>
> Pavel
>

Attachment Content-Type Size
xmltable-9.patch.gz application/gzip 26.9 KB

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-23 08:05:18
Message-ID: CAMsr+YHB27LPeME5UiEoujSMg0qsaaA+KQf3W62Co8f_gmfnyw@mail.gmail.com
Lists: pgsql-hackers

On 22 September 2016 at 02:31, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

> another small update - fix XMLPath parser - support multibytes characters

I'm returning for another round of review.

The code doesn't handle the 5 XML built-in entities correctly in
text-typed output. It processes &apos; and &quot; but not &amp;, &lt; or
&gt;. See added test. I have not fixed this, but I think it's clearly
broken:

+ -- XML builtin entities
+ SELECT * FROM xmltable('/x/a' PASSING
'<x><a><ent>&apos;</ent></a><a><ent>&quot;</ent></a><a><ent>&amp;</ent></a><a><ent>&lt;</ent></a><a><ent>&gt;</ent></a></x>'
COLUMNS ent text);
+ ent
+ -------
+ '
+ "
+ &amp;
+ &lt;
+ &gt;
+ (5 rows)

so I've adjusted the docs to claim that they're expanded. The code
needs fixing to avoid entity-escaping when the output column type is
not 'xml'.

&apos; and &quot; entities in xml-typed output are expanded, not
preserved. I don't know if this is intended but I suspect it is:

+ SELECT * FROM xmltable('/x/a' PASSING
'<x><a><ent>&apos;</ent></a><a><ent>&quot;</ent></a><a><ent>&amp;</ent></a><a><ent>&lt;</ent></a><a><ent>&gt;</ent></a></x>'
COLUMNS ent xml);
+ ent
+ ------------------
+ <ent>'</ent>
+ <ent>"</ent>
+ <ent>&amp;</ent>
+ <ent>&lt;</ent>
+ <ent>&gt;</ent>
+ (5 rows)

For the docs changes relevant to the above search for "The five
predefined XML entities". Adjust that bit of docs if I guessed wrong
about the intended behaviour.
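For comparison, this is how a generic XML parser treats the same document. The sketch below uses Python's standard xml.etree.ElementTree (not libxml2 and not the patch), so it only illustrates the distinction being drawn: the text value of a node has all five predefined entities expanded, while re-serialising the node as XML escapes &, < and > again but leaves the quote characters alone.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<x><a><ent>&apos;</ent></a><a><ent>&quot;</ent></a>"
    "<a><ent>&amp;</ent></a><a><ent>&lt;</ent></a>"
    "<a><ent>&gt;</ent></a></x>"
)

root = ET.fromstring(DOC)
ents = root.findall("./a/ent")

# Text-typed view: every predefined entity is expanded to its character.
as_text = [e.text for e in ents]

# XML-typed view: serialising escapes &, < and > again, but not the quotes.
as_xml = [ET.tostring(e, encoding="unicode") for e in ents]

print(as_text)  # ["'", '"', '&', '<', '>']
print(as_xml)
```

The second list matches the xml-typed output quoted above, and the first is what one would expect a text-typed column to contain once the escaping problem is fixed.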

The tests don't cover CDATA or PCDATA . I didn't try to add that, but
they should.

Did some docs copy-editing and integrated some examples. Explained how
nested elements work, that multiple top level elements is an error,
etc. Explained the time-of-evaluation stuff. Pointed out that you can
refer to prior output columns in PATH and DEFAULT, since that's weird
and unusual compared to normal SQL. Documented handling of multiple
node matches, including the surprising results of somepath/text() on
<somepath>x<!--blah-->y</somepath>. Documented handling of nested
elements. Documented that xmltable works only on XML documents, not
fragments/forests.
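The somepath/text() surprise is easy to reproduce outside the database. This Python sketch uses the standard xml.dom.minidom parser (an assumption for illustration; the patch itself goes through libxml2) to list the child nodes of the element from the example above.

```python
from xml.dom import Node, minidom

doc = minidom.parseString("<somepath>x<!--blah-->y</somepath>")
children = doc.documentElement.childNodes

# The comment is a node of its own, so the text is split into two text nodes.
kinds = [child.nodeType for child in children]
text_nodes = [c.data for c in children if c.nodeType == Node.TEXT_NODE]

print(kinds == [Node.TEXT_NODE, Node.COMMENT_NODE, Node.TEXT_NODE])  # True
print(len(text_nodes), text_nodes)  # 2 ['x', 'y']
```

A path ending in text() therefore selects two nodes here, not one node containing "xy", which is why a non-xml column sees a multiple-match case.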

Regarding evaluation time, it struck me that evaluating path
expressions once per row means the xpath must be parsed and processed
once per row. Isn't it desirable to store and re-use the preparsed
xpath? I don't think this is a major problem, since we can later
detect stable/immutable expressions including constants, evaluate only
once in that case, and cache. It's just worth thinking about.

The docs and tests don't seem to cover XML entities. What's the
behaviour there? Core XML only defines the five predefined entities, but
if a schema defines more, how are they processed? The tests need to cover
the predefined entities &quot; &amp; &apos; &lt; and &gt; at least.

I have no idea whether the current code can fetch a DTD and use any
<!ENTITY > declarations to expand entities, but I'm guessing not? If
not, external DTDs, and internal DTDs with external entities should be
documented as unsupported.

It doesn't seem to cope with internal DTDs at all (libxml2 limitation?):

SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE foo [
<!ELEMENT foo (#PCDATA)>
<!ENTITY pg "PostgreSQL">
]>
<foo>Hello &pg;.</foo>
$XML$ COLUMNS foo text);

+ ERROR: invalid XML content
+ LINE 1: SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" ...
+ ^
+ DETAIL: line 2: StartTag: invalid element name
+ <!DOCTYPE foo [
+ ^
+ line 3: StartTag: invalid element name
+ <!ELEMENT foo (#PCDATA)>
+ ^
+ line 4: StartTag: invalid element name
+ <!ENTITY pg "PostgreSQL">
+ ^
+ line 6: Entity 'pg' not defined
+ <foo>Hello &pg;.</foo>
+ ^

libxml seems to support documents with internal DTDs:

$ xmllint --valid /tmp/x
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE foo [
<!ELEMENT foo (#PCDATA)>
<!ENTITY pg "PostgreSQL">
]>
<foo>Hello &pg;.</foo>

so presumably the issue lies in the xpath stuff? Note that it's not
even ignoring the DTD and choking on the undefined entity, it's
choking on the DTD itself.
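As a reference point for what a document-mode parse of that input normally yields, here is the same document fed to Python's expat-based xml.etree.ElementTree. This is only a sketch with a different parser; it says nothing about where the patch or libxml2 goes wrong, it just shows the result one would expect when the internal DTD subset is honoured.

```python
import xml.etree.ElementTree as ET

# The same document as in the failing test, with its internal DTD subset.
DOC = (
    '<?xml version="1.0" standalone="yes" ?>\n'
    "<!DOCTYPE foo [\n"
    "<!ELEMENT foo (#PCDATA)>\n"
    '<!ENTITY pg "PostgreSQL">\n'
    "]>\n"
    "<foo>Hello &pg;.</foo>\n"
)

root = ET.fromstring(DOC)

# The internally declared entity &pg; is expanded in the element's text.
print(root.tag, repr(root.text))
```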

OK, code comments:

In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP
instead of a PG_TRY() / PG_CATCH() block?

I think the new way you handle the type stuff is much, much better,
and with comments to explain too. Thanks very much.

There's an oversight in tableexpr vs xmltable separation here:

+ case T_TableExpr:
+ *name = "xmltable";
+ return 2;

presumably you need to look at the node and decide what kind of table
expression it is or just use a generic "tableexpr".

Same problem here:

+ case T_TableExpr:
+ {
+ TableExpr *te = (TableExpr *) node;
+
+ /* c_expr shoud be closed in brackets */
+ appendStringInfoString(buf, "XMLTABLE(");

I don't have the libxml knowledge or remaining brain to usefully
evaluate the xpath and xml specifics in xpath.c today. It does strike
me that the new xpath parser should probably live in its own file,
though.

I think this is all a big improvement. Barring the notes above and my
lack of review of the guts of the xml.c parts of it, I'm pretty happy
with what I see now.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-23 08:07:24
Message-ID: CAMsr+YFHgKq0nXm4D94E+kw3-t7zpMXQrOnSrdrxY92PPgsxTQ@mail.gmail.com
Lists: pgsql-hackers

> Did some docs copy-editing and integrated some examples.

Whoops, forgot to attach.

Rather than sending a whole new copy of the patch, here's a diff
against your patched tree of my changes so you can see what I've done
and apply the parts you want.

Note that I didn't update the expected files.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
docs-and-test-updates-to-xmltable-v9.patch text/x-patch 8.7 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-23 08:29:18
Message-ID: CAFj8pRCjmsh11c+wqsAjSGA=1sXxmRAZqC86Tz-7Qcjg3-zcQw@mail.gmail.com
Lists: pgsql-hackers

2016-09-23 10:05 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> On 22 September 2016 at 02:31, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
> > another small update - fix XMLPath parser - support multibytes characters
>
> I'm returning for another round of review.
>
> The code doesn't handle the 5 XML built-in entities correctly in
> text-typed output. It processes &apos; and &quot; but not &amp, &lt or
> &gt; . See added test. I have not fixed this, but I think it's clearly
> broken:
>
>
> + -- XML builtin entities
> + SELECT * FROM xmltable('/x/a' PASSING
> '<x><a><ent>&apos;</ent></a><a><ent>&quot;</ent></a><a><ent>&amp;</ent></a><a><ent>&lt;</ent></a><a><ent>&gt;</ent></a></x>'
> COLUMNS ent text);
> + ent
> + -------
> + '
> + "
> + &amp;
> + &lt;
> + &gt;
> + (5 rows)
>
> so I've adjusted the docs to claim that they're expanded. The code
> needs fixing to avoid entity-escaping when the output column type is
> not 'xml'.
>
>
> &apos; and &quot; entities in xml-typed output are expanded, not
> preserved. I don't know if this is intended but I suspect it is:
>
> + SELECT * FROM xmltable('/x/a' PASSING
> '<x><a><ent>&apos;</ent></a><a><ent>&quot;</ent></a><a><ent>&amp;</ent></a><a><ent>&lt;</ent></a><a><ent>&gt;</ent></a></x>'
> COLUMNS ent xml);
> + ent
> + ------------------
> + <ent>'</ent>
> + <ent>"</ent>
> + <ent>&amp;</ent>
> + <ent>&lt;</ent>
> + <ent>&gt;</ent>
> + (5 rows)
>
>
> For the docs changes relevant to the above search for "The five
> predefined XML entities". Adjust that bit of docs if I guessed wrong
> about the intended behaviour.
>
> The tests don't cover CDATA or PCDATA . I didn't try to add that, but
> they should.
>
>
> Did some docs copy-editing and integrated some examples. Explained how
> nested elements work, that multiple top level elements is an error,
> etc. Explained the time-of-evaluation stuff. Pointed out that you can
> refer to prior output columns in PATH and DEFAULT, since that's weird
> and unusual compared to normal SQL. Documented handling of multiple
> node matches, including the surprising results of somepath/text() on
> <somepath>x<!--blah-->y</somepath>. Documented handling of nested
> elements. Documented that xmltable works only on XML documents, not
> fragments/forests.
>
> Regarding evaluation time, it struck me that evaluating path
> expressions once per row means the xpath must be parsed and processed
> once per row. Isn't it desirable to store and re-use the preparsed
> xpath? I don't think this is a major problem, since we can later
> detect stable/immutable expressions including constants, evaluate only
> once in that case, and cache. It's just worth thinking about.
>
> The docs and tests don't seem to cover XML entities. What's the
> behaviour there? Core XML only defines one entity, but if a schema
> defines more how are they processed? The tests need to cover the
> predefined entities &quot; &amp; &apos; &lt; and &gt; at least.
>
> I have no idea whether the current code can fetch a DTD and use any
> <!ENTITY > declarations to expand entities, but I'm guessing not? If
> not, external DTDs, and internal DTDs with external entities should be
> documented as unsupported.
>
> It doesn't seem to cope with internal DTDs at all (libxml2 limitation?):
>
> SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" standalone="yes" ?>
> <!DOCTYPE foo [
> <!ELEMENT foo (#PCDATA)>
> <!ENTITY pg "PostgreSQL">
> ]>
> <foo>Hello &pg;.</foo>
> $XML$ COLUMNS foo text);
>
> + ERROR: invalid XML content
> + LINE 1: SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" ...
> + ^
> + DETAIL: line 2: StartTag: invalid element name
> + <!DOCTYPE foo [
> + ^
> + line 3: StartTag: invalid element name
> + <!ELEMENT foo (#PCDATA)>
> + ^
> + line 4: StartTag: invalid element name
> + <!ENTITY pg "PostgreSQL">
> + ^
> + line 6: Entity 'pg' not defined
> + <foo>Hello &pg;.</foo>
> + ^
>
>
> libxml seems to support documents with internal DTDs:
>
> $ xmllint --valid /tmp/x
> <?xml version="1.0" standalone="yes"?>
> <!DOCTYPE foo [
> <!ELEMENT foo (#PCDATA)>
> <!ENTITY pg "PostgreSQL">
> ]>
> <foo>Hello &pg;.</foo>
>
>
> so presumably the issue lies in the xpath stuff? Note that it's not
> even ignoring the DTD and choking on the undefined entity, it's
> choking on the DTD its self.
>
>
> OK, code comments:
>
>
> In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP
> instead of a PG_TRY() / PG_CATCH() block?
>
>
> I think the new way you handle the type stuff is much, much better,
> and with comments to explain too. Thanks very much.
>
>
> There's an oversight in tableexpr vs xmltable separation here:
>
> + case T_TableExpr:
> + *name = "xmltable";
> + return 2;
>
> presumably you need to look at the node and decide what kind of table
> expression it is or just use a generic "tableexpr".
>
> Same problem here:
>
> + case T_TableExpr:
> + {
> + TableExpr *te = (TableExpr *) node;
> +
> + /* c_expr shoud be closed in brackets */
> + appendStringInfoString(buf, "XMLTABLE(");
>
>
This is correct, but not well commented - see the XMLEXPR node. TableExpr
is a holder, but it is invisible to the user. The user runs the XMLTABLE
function and should see XMLTABLE. It will be cleaner once we support the
JSON_TABLE function.

>
>
> I don't have the libxml knowledge or remaining brain to usefully
> evaluate the xpath and xml specifics in xpath.c today. It does strike
> me that the new xpath parser should probably live in its own file,
> though.
>

I'll try to move it to a separate file

>
> I think this is all a big improvement. Barring the notes above and my
> lack of review of the guts of the xml.c parts of it, I'm pretty happy
> with what I see now.
>

Thank you. I hope all major issues are solved. There are probably still
some XML-specific issues, but I am happy that you have good XML knowledge
and that you will test the corner cases.

Regards

Pavel

>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-23 20:22:02
Message-ID: CAFj8pRAbLOLBUA--8zEBdDTCgV=GW5A080YCceDJwkxrKo0thA@mail.gmail.com
Lists: pgsql-hackers

Hi

2016-09-23 10:07 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> > Did some docs copy-editing and integrated some examples.
>
> Whoops, forgot to attach.
>
> Rather than sending a whole new copy of the patch, here's a diff
> against your patched tree of my changes so you can see what I've done
> and apply the parts you want.
>
> Note that I didn't updated the expected files.
>

I applied your patch - there is a small misunderstanding. The PATH is
already evaluated once per input row. This is not clear from the code,
because it is an executor node that is started once and runs for all rows.
I changed it in your part of the docs.

to a simple value before calling the function. <literal>PATH</>
+ expressions are normally evaluated <emphasis>exactly once per result
row ## per input row
+ </emphasis>,

Regards

Pavel

>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-24 06:01:33
Message-ID: CAFj8pRBZyamLA=BvAtXGq5pSBEG_N9we+uKy9rtFXedYq4rZQA@mail.gmail.com
Lists: pgsql-hackers

Hi

2016-09-23 10:05 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> On 22 September 2016 at 02:31, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
> > another small update - fix XMLPath parser - support multibytes characters
>
> I'm returning for another round of review.
>
> The code doesn't handle the 5 XML built-in entities correctly in
> text-typed output. It processes &apos; and &quot; but not &amp, &lt or
> &gt; . See added test. I have not fixed this, but I think it's clearly
> broken:
>
>
> + -- XML builtin entities
> + SELECT * FROM xmltable('/x/a' PASSING
> '<x><a><ent>&apos;</ent></a><a><ent>&quot;</ent></a><a><ent>&amp;</ent></a><a><ent>&lt;</ent></a><a><ent>&gt;</ent></a></x>'
> COLUMNS ent text);
> + ent
> + -------
> + '
> + "
> + &amp;
> + &lt;
> + &gt;
> + (5 rows)
>
> so I've adjusted the docs to claim that they're expanded. The code
> needs fixing to avoid entity-escaping when the output column type is
> not 'xml'.
>
>
fixed

>
> &apos; and &quot; entities in xml-typed output are expanded, not
> preserved. I don't know if this is intended but I suspect it is:
>
> + SELECT * FROM xmltable('/x/a' PASSING
> '<x><a><ent>&apos;</ent></a><a><ent>&quot;</ent></a><a><ent>&amp;</ent></a><a><ent>&lt;</ent></a><a><ent>&gt;</ent></a></x>'
> COLUMNS ent xml);
> + ent
> + ------------------
> + <ent>'</ent>
> + <ent>"</ent>
> + <ent>&amp;</ent>
> + <ent>&lt;</ent>
> + <ent>&gt;</ent>
> + (5 rows)
>
>
> For the docs changes relevant to the above search for "The five
> predefined XML entities". Adjust that bit of docs if I guessed wrong
> about the intended behaviour.
>
> The tests don't cover CDATA or PCDATA . I didn't try to add that, but
> they should.
>
>
appended

> Did some docs copy-editing and integrated some examples. Explained how
> nested elements work, that multiple top level elements is an error,
> etc. Explained the time-of-evaluation stuff. Pointed out that you can
> refer to prior output columns in PATH and DEFAULT, since that's weird
> and unusual compared to normal SQL. Documented handling of multiple
> node matches, including the surprising results of somepath/text() on
> <somepath>x<!--blah-->y</somepath>. Documented handling of nested
> elements. Documented that xmltable works only on XML documents, not
> fragments/forests.
>

I don't understand this sentence: "It is possible for a PATH expression
to reference output columns that appear before it in the column-list, so
paths may be dynamically constructed based on other parts of the XML
document:"

> Regarding evaluation time, it struck me that evaluating path
> expressions once per row means the xpath must be parsed and processed
> once per row. Isn't it desirable to store and re-use the preparsed
> xpath? I don't think this is a major problem, since we can later
> detect stable/immutable expressions including constants, evaluate only
> once in that case, and cache. It's just worth thinking about.
>

It is probably possible, exactly as you wrote: the code would need to detect
whether the expression has changed. We can try performance optimizations
later, without compatibility issues. For now, I prefer the simplest code.

A note: the PATH expressions are evaluated once for each **input** row. At the
same moment the row path expression is processed and the XML document is
parsed into a DOM, so the overhead of evaluating and parsing the PATH
expressions should not be dominant.
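
To make the per-input-row point concrete, here is a toy model in C. It is an
illustration under simplifying assumptions, not executor code: the type
XmlTableCounters and the function count_path_evaluations are hypothetical,
and each input row is reduced to the number of result rows its document
produces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters for the toy model below. */
typedef struct
{
    int path_evaluations;   /* how often the PATH expressions were evaluated */
    int rows_emitted;       /* how many result rows were produced */
} XmlTableCounters;

/*
 * rows_per_document[i] is the number of result rows generated from the
 * i-th input row (one XML document per input row).
 */
XmlTableCounters
count_path_evaluations(const int *rows_per_document, int n_documents)
{
    XmlTableCounters c = {0, 0};
    int i;

    assert(rows_per_document != NULL || n_documents == 0);

    for (i = 0; i < n_documents; i++)
    {
        int r;

        /* Evaluated once here, together with parsing the document. */
        c.path_evaluations++;

        /* Result rows reuse the already evaluated expressions. */
        for (r = 0; r < rows_per_document[i]; r++)
            c.rows_emitted++;
    }
    return c;
}
```

With three input documents yielding 3, 0 and 2 rows, the model performs three
evaluations for five result rows, which is why the evaluation cost is expected
to be small next to DOM parsing.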

>
> The docs and tests don't seem to cover XML entities. What's the
> behaviour there? Core XML only defines one entity, but if a schema
> defines more how are they processed? The tests need to cover the
> predefined entities &quot; &amp; &apos; &lt; and &gt; at least.
>

I don't understand what you are proposing here. Can you please send some
examples?

>
> I have no idea whether the current code can fetch a DTD and use any
> <!ENTITY > declarations to expand entities, but I'm guessing not? If
> not, external DTDs, and internal DTDs with external entities should be
> documented as unsupported.
>
> It doesn't seem to cope with internal DTDs at all (libxml2 limitation?):
>
> SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0"
> standalone="yes" ?>
> <!DOCTYPE foo [
> <!ELEMENT foo (#PCDATA)>
> <!ENTITY pg "PostgreSQL">
> ]>
> <foo>Hello &pg;.</foo>
> $XML$ COLUMNS foo text);
>
> + ERROR: invalid XML content
> + LINE 1: SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" ...
> + ^
> + DETAIL: line 2: StartTag: invalid element name
> + <!DOCTYPE foo [
> + ^
> + line 3: StartTag: invalid element name
> + <!ELEMENT foo (#PCDATA)>
> + ^
> + line 4: StartTag: invalid element name
> + <!ENTITY pg "PostgreSQL">
> + ^
> + line 6: Entity 'pg' not defined
> + <foo>Hello &pg;.</foo>
> + ^
>
>
It is rejected before the XMLTABLE function is called:

postgres=# select $XML$<?xml version="1.0" standalone="yes" ?>
postgres$# <!DOCTYPE foo [
postgres$# <!ELEMENT foo (#PCDATA)>
postgres$# <!ENTITY pg "PostgreSQL">
postgres$# ]>
postgres$# <foo>Hello &pg;.</foo>
postgres$# $XML$::xml;
ERROR: invalid XML content
LINE 1: select $XML$<?xml version="1.0" standalone="yes" ?>
^
DETAIL: line 2: StartTag: invalid element name
<!DOCTYPE foo [
^
line 3: StartTag: invalid element name
<!ELEMENT foo (#PCDATA)>
^
line 4: StartTag: invalid element name
<!ENTITY pg "PostgreSQL">
^
line 6: Entity 'pg' not defined
<foo>Hello &pg;.</foo>
^
Entity substitution is disabled by default in libxml2. I found the function
xmlSubstituteEntitiesDefault: http://www.xmlsoft.org/entities.html
http://www.xmlsoft.org/html/libxml-parser.html#xmlSubstituteEntitiesDefault

The default behavior should be the same for all of PostgreSQL's libxml2-based
functions, so this is a separate topic, maybe an item for the PostgreSQL TODO
list? But I don't remember any user requests related to this issue.

>
> libxml seems to support documents with internal DTDs:
>
> $ xmllint --valid /tmp/x
> <?xml version="1.0" standalone="yes"?>
> <!DOCTYPE foo [
> <!ELEMENT foo (#PCDATA)>
> <!ENTITY pg "PostgreSQL">
> ]>
> <foo>Hello &pg;.</foo>
>

I removed these tests; they are not related to the XMLTABLE function, but to
generic XML processing/validation.

>
>
> so presumably the issue lies in the xpath stuff? Note that it's not
> even ignoring the DTD and choking on the undefined entity, it's
> choking on the DTD its self.
>
>
> OK, code comments:
>
>
> In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP
> instead of a PG_TRY() / PG_CATCH() block?
>

If I understand the docs correctly, PG_ENSURE_ERROR_CLEANUP should be used when
you want to catch FATAL errors (and when you need to clean up shared memory).
XMLTABLE doesn't use shared memory and doesn't need to catch FATAL errors.
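
As background for this distinction, the following standalone sketch models
what a PG_TRY() / PG_CATCH() block provides: local cleanup when an ERROR
unwinds the stack. It is a toy built on plain setjmp()/longjmp(), not
PostgreSQL code; the real macros use sigsetjmp() and an exception stack, and
PG_ENSURE_ERROR_CLEANUP additionally registers an exit callback so that
cleanup also runs when a FATAL error terminates the backend. The names
build_rows and eval_with_cleanup are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Toy stand-in for the exception stack used by the real macros. */
static jmp_buf error_jump;

/* Stands in for a routine that may raise an ERROR-level failure. */
static void
build_rows(int fail)
{
    if (fail)
        longjmp(error_jump, 1);
}

/*
 * Returns 0 on success and -1 if an error was caught. In both cases the
 * locally allocated state is released and *released is set to 1. Real code
 * would normally re-throw the error after cleaning up.
 */
int
eval_with_cleanup(int fail, int *released)
{
    /* volatile: the pointer must stay valid across the longjmp() */
    char *volatile parser_state = malloc(16);
    int   result = 0;

    assert(released != NULL);
    *released = 0;

    if (setjmp(error_jump) == 0)
        build_rows(fail);       /* the "PG_TRY" part */
    else
        result = -1;            /* the "PG_CATCH" part */

    /* Backend-local cleanup only; no shared state is involved. */
    free(parser_state);
    *released = 1;
    return result;
}
```

Because the resources held here are backend-local, nothing is left behind if
the process exits on a FATAL error, which matches the argument that a plain
try/catch block is enough in this case.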

>
> I think the new way you handle the type stuff is much, much better,
> and with comments to explain too. Thanks very much.
>
>
> There's an oversight in tableexpr vs xmltable separation here:
>
> + case T_TableExpr:
> + *name = "xmltable";
> + return 2;
>
> presumably you need to look at the node and decide what kind of table
> expression it is or just use a generic "tableexpr".
>
> Same problem here:
>
> + case T_TableExpr:
> + {
> + TableExpr *te = (TableExpr *) node;
> +
> + /* c_expr shoud be closed in brackets */
> + appendStringInfoString(buf, "XMLTABLE(");
>
>
commented

>
>
> I don't have the libxml knowledge or remaining brain to usefully
> evaluate the xpath and xml specifics in xpath.c today. It does strike
> me that the new xpath parser should probably live in its own file,
> though.
>

moved

>
> I think this is all a big improvement. Barring the notes above and my
> lack of review of the guts of the xml.c parts of it, I'm pretty happy
> with what I see now.
>

new version is attached

Regards

Pavel

>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

Attachment Content-Type Size
xmltable-10.patch.gz application/gzip 28.2 KB
xmltable-diff10.diff text/x-patch 34.0 KB

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-27 01:34:47
Message-ID: CAMsr+YE0aY+x5qDeQ8SPv0DBza_aATRyNWO19Gbdmvua3CxdrQ@mail.gmail.com
Lists: pgsql-hackers

On 24 September 2016 at 14:01, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

>> Did some docs copy-editing and integrated some examples. Explained how
>> nested elements work, that multiple top level elements is an error,
>> etc. Explained the time-of-evaluation stuff. Pointed out that you can
>> refer to prior output columns in PATH and DEFAULT, since that's weird
>> and unusual compared to normal SQL. Documented handling of multiple
>> node matches, including the surprising results of somepath/text() on
>> <somepath>x<!--blah-->y</somepath>. Documented handling of nested
>> elements. Documented that xmltable works only on XML documents, not
>> fragments/forests.
>
>
> I don't understand to this sentence: "It is possible for a PATH expression
> to reference output columns that appear before it in the column-list, so
> paths may be dynamically constructed based on other parts of the XML
> document:"

>> The docs and tests don't seem to cover XML entities. What's the
>> behaviour there? Core XML only defines one entity, but if a schema
>> defines more how are they processed? The tests need to cover the
>> predefined entities &quot; &amp; &apos; &lt; and &gt; at least.
>
>
> I don't understand, what you are propose here. ?? Please, can you send some
> examples.

Per below - handling of DTD <!ENTITY> declarations, and the builtin
entity tests I already added tests for.

>> It doesn't seem to cope with internal DTDs at all (libxml2 limitation?):
>>
>> SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0"
>> standalone="yes" ?>
>> <!DOCTYPE foo [
>> <!ELEMENT foo (#PCDATA)>
>> <!ENTITY pg "PostgreSQL">
>> ]>
>> <foo>Hello &pg;.</foo>
>> $XML$ COLUMNS foo text);
>>
>> + ERROR: invalid XML content
>> + LINE 1: SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" ...
>> + ^
>> + DETAIL: line 2: StartTag: invalid element name
>> + <!DOCTYPE foo [
>> + ^
>> + line 3: StartTag: invalid element name
>> + <!ELEMENT foo (#PCDATA)>
>> + ^
>> + line 4: StartTag: invalid element name
>> + <!ENTITY pg "PostgreSQL">
>> + ^
>> + line 6: Entity 'pg' not defined
>> + <foo>Hello &pg;.</foo>
>> + ^
>>
>
> It is rejected before XMLTABLE function call
>
> postgres=# select $XML$<?xml version="1.0" standalone="yes" ?>
> postgres$# <!DOCTYPE foo [
> postgres$# <!ELEMENT foo (#PCDATA)>
> postgres$# <!ENTITY pg "PostgreSQL">
> postgres$# ]>
> postgres$# <foo>Hello &pg;.</foo>
> postgres$# $XML$::xml;
> ERROR: invalid XML content
> LINE 1: select $XML$<?xml version="1.0" standalone="yes" ?>
> ^
> DETAIL: line 2: StartTag: invalid element name
> <!DOCTYPE foo [
[snip]

> It is disabled by default in libxml2. I found a function
> xmlSubstituteEntitiesDefault http://www.xmlsoft.org/entities.html
> http://www.xmlsoft.org/html/libxml-parser.html#xmlSubstituteEntitiesDefault
>
> The default behave should be common for all PostgreSQL's libxml2 based
> function - and then it is different topic - maybe part for PostgreSQL ToDo?
> But I don't remember any user requests related to this issue.

OK, so it's not xmltable specific. Fine by me.

Somebody who cares can deal with it. There's clearly nobody breaking
down the walls wanting the feature.

> I removed this tests - it is not related to XMLTABLE function, but to
> generic XML processing/validation.

Good plan.

>> In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP
>> instead of a PG_TRY() / PG_CATCH() block?
>
>
> If I understand to doc, the PG_ENSURE_ERROR_CLEANUP should be used, when you
> want to catch FATAL errors (and when you want to clean shared memory).
> XMLTABLE doesn't use shared memory, and doesn't need to catch fatal errors.

Ok, makes sense.

>> I don't have the libxml knowledge or remaining brain to usefully
>> evaluate the xpath and xml specifics in xpath.c today. It does strike
>> me that the new xpath parser should probably live in its own file,
>> though.
>
> moved

Thanks.

> new version is attached

Great.

I'm marking this ready for committer at this point.

I think the XML parser likely needs a more close reading, so I'll ping
Peter E to see if he'll have a chance to check that bit out. But by
and large I think the issues have been ironed out - in terms of
functionality, structure and clarity I think it's looking solid.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-27 03:40:30
Message-ID: CAFj8pRDSADePLnc-EiT73i7k-RNdNrnupDiNqGNr=eaHnnpz0Q@mail.gmail.com
Lists: pgsql-hackers

2016-09-27 3:34 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> On 24 September 2016 at 14:01, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
> >> Did some docs copy-editing and integrated some examples. Explained how
> >> nested elements work, that multiple top level elements is an error,
> >> etc. Explained the time-of-evaluation stuff. Pointed out that you can
> >> refer to prior output columns in PATH and DEFAULT, since that's weird
> >> and unusual compared to normal SQL. Documented handling of multiple
> >> node matches, including the surprising results of somepath/text() on
> >> <somepath>x<!--blah-->y</somepath>. Documented handling of nested
> >> elements. Documented that xmltable works only on XML documents, not
> >> fragments/forests.
> >
> >
> > I don't understand to this sentence: "It is possible for a PATH expression
> > to reference output columns that appear before it in the column-list, so
> > paths may be dynamically constructed based on other parts of the XML
> > document:"
>
>
>
> >> The docs and tests don't seem to cover XML entities. What's the
> >> behaviour there? Core XML only defines one entity, but if a schema
> >> defines more how are they processed? The tests need to cover the
> >> predefined entities &quot; &amp; &apos; &lt; and &gt; at least.
> >
> >
> > I don't understand, what you are propose here. ?? Please, can you send some
> > examples.
>
> Per below - handling of DTD <!ENTITY> declarations, and the builtin
> entity tests I already added tests for.
>
>
> >> It doesn't seem to cope with internal DTDs at all (libxml2 limitation?):
> >>
> >> SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0"
> >> standalone="yes" ?>
> >> <!DOCTYPE foo [
> >> <!ELEMENT foo (#PCDATA)>
> >> <!ENTITY pg "PostgreSQL">
> >> ]>
> >> <foo>Hello &pg;.</foo>
> >> $XML$ COLUMNS foo text);
> >>
> >> + ERROR: invalid XML content
> >> + LINE 1: SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" ...
> >> + ^
> >> + DETAIL: line 2: StartTag: invalid element name
> >> + <!DOCTYPE foo [
> >> + ^
> >> + line 3: StartTag: invalid element name
> >> + <!ELEMENT foo (#PCDATA)>
> >> + ^
> >> + line 4: StartTag: invalid element name
> >> + <!ENTITY pg "PostgreSQL">
> >> + ^
> >> + line 6: Entity 'pg' not defined
> >> + <foo>Hello &pg;.</foo>
> >> + ^
> >>
> >
> > It is rejected before XMLTABLE function call
> >
> > postgres=# select $XML$<?xml version="1.0" standalone="yes" ?>
> > postgres$# <!DOCTYPE foo [
> > postgres$# <!ELEMENT foo (#PCDATA)>
> > postgres$# <!ENTITY pg "PostgreSQL">
> > postgres$# ]>
> > postgres$# <foo>Hello &pg;.</foo>
> > postgres$# $XML$::xml;
> > ERROR: invalid XML content
> > LINE 1: select $XML$<?xml version="1.0" standalone="yes" ?>
> > ^
> > DETAIL: line 2: StartTag: invalid element name
> > <!DOCTYPE foo [
> [snip]
>
> > It is disabled by default in libxml2. I found a function
> > xmlSubstituteEntitiesDefault http://www.xmlsoft.org/entities.html
> > http://www.xmlsoft.org/html/libxml-parser.html#xmlSubstituteEntitiesDefault
> >
> > The default behave should be common for all PostgreSQL's libxml2 based
> > function - and then it is different topic - maybe part for PostgreSQL ToDo?
> > But I don't remember any user requests related to this issue.
>
>
> OK, so it's not xmltable specific. Fine by me.
>
> Somebody who cares can deal with it. There's clearly nobody breaking
> down the walls wanting the feature.
>
> > I removed this tests - it is not related to XMLTABLE function, but to
> > generic XML processing/validation.
>
>
> Good plan.
>
> >> In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP
> >> instead of a PG_TRY() / PG_CATCH() block?
> >
> >
> > If I understand to doc, the PG_ENSURE_ERROR_CLEANUP should be used, when you
> > want to catch FATAL errors (and when you want to clean shared memory).
> > XMLTABLE doesn't use shared memory, and doesn't need to catch fatal errors.
>
> Ok, makes sense.
>
>
> >> I don't have the libxml knowledge or remaining brain to usefully
> >> evaluate the xpath and xml specifics in xpath.c today. It does strike
> >> me that the new xpath parser should probably live in its own file,
> >> though.
> >
> > moved
>
> Thanks.
>
>
> > new version is attached
>
>
> Great.
>
> I'm marking this ready for committer at this point.
>

Thank you very much

Regards

Pavel

>
> I think the XML parser likely needs a more close reading, so I'll ping
> Peter E to see if he'll have a chance to check that bit out. But by
> and large I think the issues have been ironed out - in terms of
> functionality, structure and clarity I think it's looking solid.
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-27 03:53:06
Message-ID: CAMsr+YELc_=aRk2t3p0htoUw=OssZA9oKWS=e2nCGWVb1S6qLA@mail.gmail.com
Lists: pgsql-hackers

On 24 September 2016 at 14:01, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

>> Did some docs copy-editing and integrated some examples. Explained how
>> nested elements work, that multiple top level elements is an error,
>> etc. Explained the time-of-evaluation stuff. Pointed out that you can
>> refer to prior output columns in PATH and DEFAULT, since that's weird
>> and unusual compared to normal SQL. Documented handling of multiple
>> node matches, including the surprising results of somepath/text() on
>> <somepath>x<!--blah-->y</somepath>. Documented handling of nested
>> elements. Documented that xmltable works only on XML documents, not
>> fragments/forests.
>
>
> I don't understand to this sentence: "It is possible for a PATH expression
> to reference output columns that appear before it in the column-list, so
> paths may be dynamically constructed based on other parts of the XML
> document:"

This was based on a misunderstanding of something you said earlier. I
thought the idea was to allow this to work:

SELECT * FROM xmltable('/x' PASSING
'<x><elemName>a</elemName><a>value</a></x>' COLUMNS elemName text,
extractedValue text PATH elemName);

... but it doesn't:

SELECT * FROM xmltable('/x' PASSING
'<x><elemName>a</elemName><a>value</a></x>' COLUMNS elemName text,
extractedValue text PATH elemName);
ERROR: column "elemname" does not exist
LINE 1: ...' COLUMNS elemName text, extractedValue text PATH elemName);

... so please delete that text. I thought I'd tested it but the state
of my tests dir says I just got distracted by another task at the
wrong time.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-27 05:29:03
Message-ID: CAFj8pRAC=z_qPdA5PHmP_VzsnTiLU2cOyF5_DWcQSjp+cmqryA@mail.gmail.com
Lists: pgsql-hackers

2016-09-27 5:53 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> On 24 September 2016 at 14:01, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
> >> Did some docs copy-editing and integrated some examples. Explained how
> >> nested elements work, that multiple top level elements is an error,
> >> etc. Explained the time-of-evaluation stuff. Pointed out that you can
> >> refer to prior output columns in PATH and DEFAULT, since that's weird
> >> and unusual compared to normal SQL. Documented handling of multiple
> >> node matches, including the surprising results of somepath/text() on
> >> <somepath>x<!--blah-->y</somepath>. Documented handling of nested
> >> elements. Documented that xmltable works only on XML documents, not
> >> fragments/forests.
> >
> >
> > I don't understand to this sentence: "It is possible for a PATH expression
> > to reference output columns that appear before it in the column-list, so
> > paths may be dynamically constructed based on other parts of the XML
> > document:"
>
> This was based on a misunderstanding of something you said earlier. I
> thought the idea was to allow this to work:
>
> SELECT * FROM xmltable('/x' PASSING
> '<x><elemName>a</elemName><a>value</a></x>' COLUMNS elemName text,
> extractedValue text PATH elemName);
>
> ... but it doesn't:
>
>
> SELECT * FROM xmltable('/x' PASSING
> '<x><elemName>a</elemName><a>value</a></x>' COLUMNS elemName text,
> extractedValue text PATH elemName);
> ERROR: column "elemname" does not exist
> LINE 1: ...' COLUMNS elemName text, extractedValue text PATH elemName);
>
> ... so please delete that text. I thought I'd tested it but the state
> of my tests dir says I just got distracted by another task at the
> wrong time.
>

deleted

Regards

Pavel

>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

Attachment Content-Type Size
xmltable-11.patch.gz application/x-gzip 28.1 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-10-03 02:18:44
Message-ID: CAB7nPqT+tAEvS0+S5G1eeaTS+aexntdnuEAPGk4V=r7RH41YCA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 27, 2016 at 2:29 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> 2016-09-27 5:53 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:
>>
>> [...]
>> ... so please delete that text. I thought I'd tested it but the state
>> of my tests dir says I just got distracted by another task at the
>> wrong time.

Moved patch to next CF with same status: ready for committer.
--
Michael


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-10-14 04:23:45
Message-ID: CAFj8pRAVPWdVxff48yiRmf79HA3adrtR_Zh+7ed+4i2gb-aQ0Q@mail.gmail.com
Lists: pgsql-hackers

Hi

new update - only doc

+ <para>
+ Only XPath query language is supported. PostgreSQL doesn't support XQuery
+ language. Then the syntax of <function>xmltable</function> doesn't
+ allow to use XQuery related functionality - the name of xml expression
+ (clause <literal>AS</literal>) is not allowed, and only one xml expression
+ should be passed to <function>xmltable</function> function as parameter.
+ </para>

Regards

Pavel

Attachment Content-Type Size
xmltable-12.patch.gz application/x-gzip 28.2 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-17 18:22:26
Message-ID: 20161117182226.oxefipxl2rsdpyxv@alvherre.pgsql
Lists: pgsql-hackers

I've been going over this patch. I think it'd be better to restructure
the <sect2> before adding the docs for this new function; I already
split it out, so don't do anything about this.

Next, looking at struct TableExprBuilder I noticed that the comments are
already obsolete, as they talk about function params that do not exist
(missing_columns) and they fail to mention the ones that do exist.
Also, function member SetContent is not documented at all. Overall,
these comments do not convey a lot -- apparently, whoever reads them is
already supposed to know how it works: "xyz sets a row generating
filter" doesn't tell me anything. Since this is API documentation, it
needs to be much clearer.

ExecEvalTableExpr and ExecEvalTableExprProtected have no comments
whatsoever. Needs fixed.

I wonder if it'd be a good idea to install TableExpr first without
implementing XMLTABLE, so that it's clearer what is API and what is
implementation.

The number of new keywords in this patch is depressing. I suppose
there's no way around that -- as I understand, this is caused by the SQL
standard's definition of the syntax for this feature.

Have to go now for a bit -- will continue looking afterwards. Please
submit delta patches on top of your latest v12 to fix the comments I
mentioned.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-18 20:49:36
Message-ID: CAFj8pRCeRBTNin1VRt79GxersBtOjbkkgZNhy9=8AR1=jm6=-A@mail.gmail.com
Lists: pgsql-hackers

Hi

2016-11-17 19:22 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> I've been going over this patch. I think it'd be better to restructure
> the <sect2> before adding the docs for this new function; I already
> split it out, so don't do anything about this.
>
> Next, looking at struct TableExprBuilder I noticed that the comments are
> already obsolete, as they talk about function params that do not exist
> (missing_columns) and they fail to mention the ones that do exist.
> Also, function member SetContent is not documented at all. Overall,
> these comments do not convey a lot -- apparently, whoever reads them is
> already supposed to know how it works: "xyz sets a row generating
> filter" doesn't tell me anything. Since this is API documentation, it
> needs to be much clearer.
>
> ExecEvalTableExpr and ExecEvalTableExprProtected have no comments
> whatsoever. Needs fixed.
>

I am sending a patch with more comments, but it needs attention from someone
with good English skills.

>
> I wonder if it'd be a good idea to install TableExpr first without the
> implementing XMLTABLE, so that it's clearer what is API and what is
> implementation.
>

I am not sure about this step; the API is clear from its naming. At the moment
there are no tests for this API other than the XMLTABLE implementation.

>
> The number of new keywords in this patch is depressing. I suppose
> there's no way around that -- as I understand, this is caused by the SQL
> standard's definition of the syntax for this feature.
>
> Have to go now for a bit -- will continue looking afterwards. Please
> submit delta patches on top of your latest v12 to fix the comments I
> mentioned.
>
>
Regards

Pavel

> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Attachment Content-Type Size
fix-comments.patch text/x-patch 4.1 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-18 20:53:56
Message-ID: 20161118205356.xpmttzfklnjiqrfv@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:

> 2016-11-17 19:22 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
> > Next, looking at struct TableExprBuilder I noticed that the comments are
> > already obsolete, as they talk about function params that do not exist
> > (missing_columns) and they fail to mention the ones that do exist.
> > Also, function member SetContent is not documented at all. Overall,
> > these comments do not convey a lot -- apparently, whoever reads them is
> > already supposed to know how it works: "xyz sets a row generating
> > filter" doesn't tell me anything. Since this is API documentation, it
> > needs to be much clearer.
> >
> > ExecEvalTableExpr and ExecEvalTableExprProtected have no comments
> > whatsoever. Needs fixed.
>
> I am sending the patch with more comments - but it needs a care someone
> with good English skills.

Thanks, I can help with that.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-18 23:42:15
Message-ID: 20161118234215.s72fjuzedwfl7q5h@alvherre.pgsql
Lists: pgsql-hackers

The SQL standard seems to require a comma after the XMLNAMESPACES
clause:

<XML table> ::=
XMLTABLE <left paren>
[ <XML namespace declaration> <comma> ]
<XML table row pattern>
[ <XML table argument list> ]
COLUMNS <XML table column definitions> <right paren>

I don't understand the reason for that, but I have added it:

| XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList ')' ',' c_expr xmlexists_argument ')'
{
TableExpr *n = makeNode(TableExpr);
n->row_path = $8;
n->expr = $9;
n->cols = NIL;
n->namespaces = $5;
n->location = @1;
$$ = (Node *)n;
}
| XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList ')' ',' c_expr xmlexists_argument COLUMNS TableExprColList ')'
{
TableExpr *n = makeNode(TableExpr);
n->row_path = $8;
n->expr = $9;
n->cols = $11;
n->namespaces = $5;
n->location = @1;
$$ = (Node *)n;
}
;

Another thing I did was remove the TableExprColOptionsOpt production; in
its place I added a third rule in TableExprCol for "ColId Typename
IsNotNull" (i.e. no options). This seems to reduce the size of the
generated gram.c a good dozen kB.

I didn't like much the use of c_expr in all these productions. As I
understand it, c_expr is mostly an implementation artifact and we should
be using a_expr or b_expr almost everywhere. I see that XMLEXISTS
already expanded the very limited use of c_expr there was; I would
prefer to fix that one too rather than replicate it here. TBH I'm not
sure I like that XMLTABLE is re-using xmlexists_argument.

Actually, is the existing XMLEXISTS production correct? What I see in
the standard is

<XML table row pattern> ::= <character string literal>

<XML table argument list> ::=
PASSING <XML table argument passing mechanism> <XML query argument>
[ { <comma> <XML query argument> }... ]

<XML table argument passing mechanism> ::= <XML passing mechanism>

<XML table column definitions> ::= <XML table column definition> [ { <comma> <XML table column definition> }... ]

<XML table column definition> ::=
<XML table ordinality column definition>
| <XML table regular column definition>

<XML table ordinality column definition> ::= <column name> FOR ORDINALITY

<XML table regular column definition> ::=
<column name> <data type> [ <XML passing mechanism> ]
[ <default clause> ]
[ PATH <XML table column pattern> ]

<XML table column pattern> ::= <character string literal>

so I think this resolves "PASSING BY {REF,VALUE} <XML query argument>", but what
we have in gram.y is:

/* We allow several variants for SQL and other compatibility. */
xmlexists_argument:
PASSING c_expr
{
$$ = $2;
}
| PASSING c_expr BY REF
{
$$ = $2;
}
| PASSING BY REF c_expr
{
$$ = $4;
}
| PASSING BY REF c_expr BY REF
{
$$ = $4;
}
;

I'm not sure why we allow "PASSING c_expr" at all. Maybe if BY VALUE/BY
REF is not specified, we should just not have PASSING at all?

If we extended this for XMLEXISTS for compatibility with some other
product, perhaps we should look into what that product supports for
XMLTABLE; maybe XMLTABLE does not need all the same options as
XMLEXISTS.

The fourth option seems very dubious to me. This was added by commit
641459f26, submitted here:
/message-id/4C0F6DBF.9000001@mlfowler.com

... Hm, actually some perusal of the XMLEXISTS predicate in the standard
shows that it's quite a different thing from XMLTABLE. Maybe we
shouldn't reuse xmlexists_argument here at all.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-19 04:19:52
Message-ID: CAFj8pRA=2RJ7WEFBCuGQyXsyaZii+DePWsnOr4kMiQc7qkJOBw@mail.gmail.com
Lists: pgsql-hackers

2016-11-19 0:42 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> The SQL standard seems to require a comma after the XMLNAMESPACES
> clause:
>
> <XML table> ::=
> XMLTABLE <left paren>
> [ <XML namespace declaration> <comma> ]
> <XML table row pattern>
> [ <XML table argument list> ]
> COLUMNS <XML table column definitions> <right paren>
>
> I don't understand the reason for that, but I have added it:
>
> | XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList
> ')' ',' c_expr xmlexists_argument ')'
> {
> TableExpr *n = makeNode(TableExpr);
> n->row_path = $8;
> n->expr = $9;
> n->cols = NIL;
> n->namespaces = $5;
> n->location = @1;
> $$ = (Node *)n;
> }
> | XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList
> ')' ',' c_expr xmlexists_argument COLUMNS TableExprColList ')'
> {
> TableExpr *n = makeNode(TableExpr);
> n->row_path = $8;
> n->expr = $9;
> n->cols = $11;
> n->namespaces = $5;
> n->location = @1;
> $$ = (Node *)n;
> }
> ;
>
>
Yes, that looks like my oversight; your version is better.

> Another thing I did was remove the TableExprColOptionsOpt production; in
> its place I added a third rule in TableExprCol for "ColId Typename
> IsNotNull" (i.e. no options). This seems to reduce the size of the
> generated gram.c a good dozen kB.
>

If I remember correctly, this was needed for better compatibility with Oracle:

ANSI SQL: colname type DEFAULT PATH
Oracle: colname PATH DEFAULT

My implementation allows both orderings, for two reasons: 1. it is one
less issue for people porting from Oracle; 2. almost all examples of
XMLTABLE on the net are from Oracle, and it would be unfriendly if those
examples did not work on PG. This issue was discussed earlier on
this mailing list.
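
A minimal sketch of accepting both orderings, in plain C rather than bison
and not taken from the patch: the ColOptions struct and parse_col_options
function are hypothetical, and the input is assumed to be already split into
keyword/value pairs. Each option may appear once, in either order:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of parsing the options of one XMLTABLE column. */
typedef struct ColOptions
{
    const char *path;           /* value after PATH, or NULL */
    const char *defval;         /* value after DEFAULT, or NULL */
} ColOptions;

/*
 * Accept PATH and DEFAULT in either order, each at most once.
 * tok[] holds keyword/value pairs; returns 0 on success, -1 on error.
 */
int
parse_col_options(const char **tok, size_t ntok, ColOptions *out)
{
    size_t      i;

    assert(out != NULL);
    out->path = NULL;
    out->defval = NULL;

    if (ntok % 2 != 0)
        return -1;              /* every keyword needs a value */

    for (i = 0; i < ntok; i += 2)
    {
        if (strcmp(tok[i], "PATH") == 0 && out->path == NULL)
            out->path = tok[i + 1];
        else if (strcmp(tok[i], "DEFAULT") == 0 && out->defval == NULL)
            out->defval = tok[i + 1];
        else
            return -1;          /* unknown keyword or duplicate option */
    }
    return 0;
}
```

With this shape, the ANSI spelling (DEFAULT then PATH) and the Oracle
spelling (PATH then DEFAULT) produce the same ColOptions, and a repeated
option is rejected instead of silently overriding the first one.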

>
>
> I didn't like much the use of c_expr in all these productions. As I
> understand it, c_expr is mostly an implementation artifact and we should
> be using a_expr or b_expr almost everywhere. I see that XMLEXISTS
> already expanded the very limited use of c_expr there was; I would
> prefer to fix that one too rather than replicate it here. TBH I'm not
> sure I like that XMLTABLE is re-using xmlexists_argument.
>

There are two situations: c_expr as the document content, and c_expr after
the DEFAULT and PATH keywords. The first can probably be fixed; the second
cannot, because "PATH" is only an unreserved keyword.

>
> Actually, is the existing XMLEXISTS production correct? What I see in
> the standard is
>
> <XML table row pattern> ::= <character string literal>
>
> <XML table argument list> ::=
> PASSING <XML table argument passing mechanism> <XML query argument>
> [ { <comma> <XML query argument> }... ]
>
> <XML table argument passing mechanism> ::= <XML passing mechanism>
>
> <XML table column definitions> ::= <XML table column definition> [ {
> <comma> <XML table column definition> }... ]
>
> <XML table column definition> ::=
> <XML table ordinality column definition>
> | <XML table regular column definition>
>
> <XML table ordinality column definition> ::= <column name> FOR ORDINALITY
>
> <XML table regular column definition> ::=
> <column name> <data type> [ <XML passing mechanism> ]
> [ <default clause> ]
> [ PATH <XML table column pattern> ]
>
> <XML table column pattern> ::= <character string literal>
>
> so I think this resolves "PASSING BY {REF,VALUE} <XML query argument>",
> but what
> we have in gram.y is:
>
> /* We allow several variants for SQL and other compatibility. */
> xmlexists_argument:
> PASSING c_expr
> {
> $$ = $2;
> }
> | PASSING c_expr BY REF
> {
> $$ = $2;
> }
> | PASSING BY REF c_expr
> {
> $$ = $4;
> }
> | PASSING BY REF c_expr BY REF
> {
> $$ = $4;
> }
> ;
>
> I'm not sure why we allow "PASSING c_expr" at all. Maybe if BY VALUE/BY
> REF is not specified, we should just not have PASSING at all?
>
>
> If we extended this for XMLEXISTS for compatibility with some other
> product, perhaps we should look into what that product supports for
> XMLTABLE; maybe XMLTABLE does not need all the same options as
> XMLEXISTS.
>
>
The reason is compatibility with other products, namely DB2. XMLTABLE uses
the same options as XMLEXISTS. These options have no functional value for
Postgres, but they matter for compatibility and for working examples.

> The fourth option seems very dubious to me. This was added by commit
> 641459f26, submitted here:
> /message-id/4C0F6DBF.9000001@mlfowler.com
>
> ... Hm, actually some perusal of the XMLEXISTS predicate in the standard
> shows that it's quite a different thing from XMLTABLE. Maybe we
> shouldn't reuse xmlexists_argument here at all.
>

I am not sure I understand.

Regards

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-19 04:25:46
Message-ID: CAFj8pRD6QB8adTOqST_TJv6v9eFZK7twzD7Fqw93asXqh81HPg@mail.gmail.com
Lists: pgsql-hackers

2016-11-19 5:19 GMT+01:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

>
>
> 2016-11-19 0:42 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
>> The SQL standard seems to require a comma after the XMLNAMESPACES
>> clause:
>>
>> <XML table> ::=
>> XMLTABLE <left paren>
>> [ <XML namespace declaration> <comma> ]
>> <XML table row pattern>
>> [ <XML table argument list> ]
>> COLUMNS <XML table column definitions> <right paren>
>>
>> I don't understand the reason for that, but I have added it:
>>
>> | XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList
>> ')' ',' c_expr xmlexists_argument ')'
>> {
>> TableExpr *n =
>> makeNode(TableExpr);
>> n->row_path = $8;
>> n->expr = $9;
>> n->cols = NIL;
>> n->namespaces = $5;
>> n->location = @1;
>> $$ = (Node *)n;
>> }
>> | XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList
>> ')' ',' c_expr xmlexists_argument COLUMNS TableExprColList ')'
>> {
>> TableExpr *n =
>> makeNode(TableExpr);
>> n->row_path = $8;
>> n->expr = $9;
>> n->cols = $11;
>> n->namespaces = $5;
>> n->location = @1;
>> $$ = (Node *)n;
>> }
>> ;
>>
>>
> yes, looks my oversight - it is better
>
>
>
>> Another thing I did was remove the TableExprColOptionsOpt production; in
>> its place I added a third rule in TableExprCol for "ColId Typename
>> IsNotNull" (i.e. no options). This seems to reduce the size of the
>> generated gram.c a good dozen kB.
>>
>
> If I remember well - this was required by better compatibility with Oracle
>
> ANSI SQL: colname type DEFAULT PATH
> Oracle: colname PATH DEFAULT
>
> My implementation allows both combinations - there are two reasons: 1. one
> less issue when people does port from Oracle, 2. almost all examples of
> XMLTABLE on a net are from Oracle - it can be unfriendly, when these
> examples would not work on PG - there was discussion about this issue in
> this mailing list
>
>
>>
>>
>> I didn't like much the use of c_expr in all these productions. As I
>> understand it, c_expr is mostly an implementation artifact and we should
>> be using a_expr or b_expr almost everywhere. I see that XMLEXISTS
>> already expanded the very limited use of c_expr there was; I would
>> prefer to fix that one too rather than replicate it here. TBH I'm not
>> sure I like that XMLTABLE is re-using xmlexists_argument.
>>
>
> There are two situations: c_expr as document content, and c_expr after
> DEFAULT and PATH keywords. First probably can be fixed, second not, because
> "PATH" is unreserved keyword only.
>

Correction: the first is not possible either, because PASSING is an
unreserved keyword too.

Regards

Pavel

>
>
>>
>> Actually, is the existing XMLEXISTS production correct? What I see in
>> the standard is
>>
>> <XML table row pattern> ::= <character string literal>
>>
>> <XML table argument list> ::=
>> PASSING <XML table argument passing mechanism> <XML query argument>
>> [ { <comma> <XML query argument> }... ]
>>
>> <XML table argument passing mechanism> ::= <XML passing mechanism>
>>
>> <XML table column definitions> ::= <XML table column definition> [ {
>> <comma> <XML table column definition> }... ]
>>
>> <XML table column definition> ::=
>> <XML table ordinality column definition>
>> | <XML table regular column definition>
>>
>> <XML table ordinality column definition> ::= <column name> FOR ORDINALITY
>>
>> <XML table regular column definition> ::=
>> <column name> <data type> [ <XML passing mechanism> ]
>> [ <default clause> ]
>> [ PATH <XML table column pattern> ]
>>
>> <XML table column pattern> ::= <character string literal>
>>
>> so I think this resolves "PASSING BY {REF,VALUE} <XML query argument>",
>> but what
>> we have in gram.y is:
>>
>> /* We allow several variants for SQL and other compatibility. */
>> xmlexists_argument:
>> PASSING c_expr
>> {
>> $$ = $2;
>> }
>> | PASSING c_expr BY REF
>> {
>> $$ = $2;
>> }
>> | PASSING BY REF c_expr
>> {
>> $$ = $4;
>> }
>> | PASSING BY REF c_expr BY REF
>> {
>> $$ = $4;
>> }
>> ;
>>
>> I'm not sure why we allow "PASSING c_expr" at all. Maybe if BY VALUE/BY
>> REF is not specified, we should just not have PASSING at all?
>>
>>
>> If we extended this for XMLEXISTS for compatibility with some other
>> product, perhaps we should look into what that product supports for
>> XMLTABLE; maybe XMLTABLE does not need all the same options as
>> XMLEXISTS.
>>
>>
> The reason is a compatibility with other products - DB2. XMLTABLE uses
> same options like XMLEXISTS. These options has zero value for Postgres, but
> its are important - compatibility, and workable examples.
>
>
>> The fourth option seems very dubious to me. This was added by commit
>> 641459f26, submitted here:
>> /message-id/4C0F6DBF.9000001@mlfowler.com
>>
>> ... Hm, actually some perusal of the XMLEXISTS predicate in the standard
>> shows that it's quite a different thing from XMLTABLE. Maybe we
>> shouldn't reuse xmlexists_argument here at all.
>>
>
> not sure If I understand
>
> Regards
>
> Pavel
>
>>
>> --
>> Álvaro Herrera https://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-21 18:12:21
Message-ID: 20161121181221.d2veeltaaejutzxt@alvherre.pgsql
Lists: pgsql-hackers

Something I just noticed is that transformTableExpr takes a TableExpr
node and returns another TableExpr node. That's unlike what we do in
other places, where the node returned is of a different type than the
input node. I'm not real clear what happens if you try to re-transform
a node that was already transformed, but it seems worth thinking about.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-21 20:16:15
Message-ID: 834.1479759375@sss.pgh.pa.us
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> Something I just noticed is that transformTableExpr takes a TableExpr
> node and returns another TableExpr node. That's unlike what we do in
> other places, where the node returned is of a different type than the
> input node. I'm not real clear what happens if you try to re-transform
> a node that was already transformed, but it seems worth thinking about.

We're not 100% consistent on that --- there are cases such as RowExpr
and CaseExpr where the same struct type is used for pre-parse-analysis
and post-parse-analysis nodes. I think it's okay as long as the
information content isn't markedly different, ie the transformation
just consists of transforming all the sub-nodes.

Being able to behave sanely on a re-transformation used to be an
issue, but we no longer expect transformExpr to support that.
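
As a minimal sketch of that pattern (toy types only; ToyExpr and
transform_toy_expr are hypothetical and are not PostgreSQL's node
infrastructure), the same struct can describe both the raw and the analyzed
tree, with the transformation doing nothing but recursing into the sub-node:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy expression node; one struct serves both raw and analyzed trees. */
typedef struct ToyExpr
{
    int             analyzed;   /* 0 = raw parse output, 1 = after analysis */
    int             value;
    struct ToyExpr *arg;        /* single sub-node, or NULL */
} ToyExpr;

/*
 * Return a fresh node of the same type whose sub-node has been
 * transformed as well.  The input tree is left untouched.
 */
ToyExpr *
transform_toy_expr(const ToyExpr *raw)
{
    ToyExpr    *result;

    if (raw == NULL)
        return NULL;

    /* Re-transforming an analyzed node is not supported in this sketch. */
    assert(!raw->analyzed);

    result = malloc(sizeof(ToyExpr));
    if (result == NULL)
        return NULL;

    result->analyzed = 1;
    result->value = raw->value;
    result->arg = transform_toy_expr(raw->arg);
    return result;
}

/* Release a tree produced by transform_toy_expr(). */
void
free_toy_expr(ToyExpr *node)
{
    if (node == NULL)
        return;
    free_toy_expr(node->arg);
    free(node);
}
```

The assertion stands in for the point about re-transformation: the sketch
simply refuses input that was already analyzed instead of trying to handle it.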

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-22 04:53:52
Message-ID: CAFj8pRCmvP_jmQv5Yqva-3Rt-9JCTFCUXh8LGWDEt7Y+EVTs7A@mail.gmail.com
Lists: pgsql-hackers

2016-11-21 21:16 GMT+01:00 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> > Something I just noticed is that transformTableExpr takes a TableExpr
> > node and returns another TableExpr node. That's unlike what we do in
> > other places, where the node returned is of a different type than the
> > input node. I'm not real clear what happens if you try to re-transform
> > a node that was already transformed, but it seems worth thinking about.
>
> We're not 100% consistent on that --- there are cases such as RowExpr
> and CaseExpr where the same struct type is used for pre-parse-analysis
> and post-parse-analysis nodes. I think it's okay as long as the
> information content isn't markedly different, ie the transformation
> just consists of transforming all the sub-nodes.
>
> Being able to behave sanely on a re-transformation used to be an
> issue, but we no longer expect transformExpr to support that.
>

I was not sure in this case. Using a new node was clearer to me, as a
safeguard against an uninitialized or untransformed value. The memory
overhead is only a few bytes.

regards

Pavel

>
> regards, tom lane
>


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-22 20:47:30
Message-ID: 20161122204730.dgipy6gxi25j4e6a@alvherre.pgsql
Lists: pgsql-hackers

I found the whole TableExprGetTupleDesc() function a bit odd in
nodeFuncs.c, so I renamed it to ExecTypeFromTableExpr() and moved it to
execTuples.c -- but only because that's where ExecTypeFromTL and others
already live. I would have liked to move it to tupdesc.c instead, but
it requires knowledge of executor nodes, which is probably the reason
that ExecTypeFromTL is in execTuples. I think we'd eat that bit of
ugliness only because we're not the first. But anyway I quickly ran
into another problem.

I noticed that ExecTypeFromTableExpr is being called from the transform
phase, which is much earlier than the executor. I noticed because of
the warning that the above movement added to nodeFuncs.c,
src/backend/nodes/nodeFuncs.c:509:5: warning: implicit declaration of function 'ExecTypeFromTableExpr' [-Wimplicit-function-declaration]

so I thought, hm, is it okay to have parse analysis run an executor
function? (I suppose this is the reason you put it in nodeFuncs in the
first place). For fun, I tried this query under GDB, with a breakpoint
on exprTypmod():

SELECT X.*
FROM emp,
XMLTABLE ('//depts/dept/employee' passing doc
COLUMNS
empID INTEGER PATH '@id',
firstname int PATH 'name/first',
lastname VARCHAR(25) PATH 'name/last') AS X;

and sure enough, the type is resolved during parse analysis:

Breakpoint 1, exprTypmod (expr=expr@entry=0x1d23ad8)
    at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
283 if (!expr)
(gdb) print *expr
$2 = {type = T_TableExpr}
(gdb) bt
#0 exprTypmod (expr=expr@entry=0x1d23ad8)
    at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
#1 0x000000000080c500 in get_expr_result_type (expr=0x1d23ad8,
    resultTypeId=0x7ffd482bfdb4, resultTupleDesc=0x7ffd482bfdb8)
    at /pgsql/source/master/src/backend/utils/fmgr/funcapi.c:247
#2 0x000000000056de1b in expandRTE (rte=rte@entry=0x1d6b800, rtindex=2,
    sublevels_up=0, location=location@entry=7,
    include_dropped=include_dropped@entry=0 '\000',
    colnames=colnames@entry=0x7ffd482bfe10, colvars=0x7ffd482bfe18)
    at /pgsql/source/master/src/backend/parser/parse_relation.c:2052
#3 0x000000000056e131 in expandRelAttrs (pstate=pstate@entry=0x1d238a8,
    rte=rte@entry=0x1d6b800, rtindex=<optimized out>,
    sublevels_up=<optimized out>, location=location@entry=7)
    at /pgsql/source/master/src/backend/parser/parse_relation.c:2435
#4 0x000000000056fa64 in ExpandSingleTable (pstate=pstate@entry=0x1d238a8,
    rte=rte@entry=0x1d6b800, location=7,
    make_target_entry=make_target_entry@entry=1 '\001')
    at /pgsql/source/master/src/backend/parser/parse_target.c:1266
#5 0x000000000057135b in ExpandColumnRefStar (pstate=pstate@entry=0x1d238a8,
    cref=0x1d22720, make_target_entry=make_target_entry@entry=1 '\001')
    at /pgsql/source/master/src/backend/parser/parse_target.c:1158
#6 0x00000000005716f9 in transformTargetList (pstate=0x1d238a8,
    targetlist=<optimized out>, exprKind=EXPR_KIND_SELECT_TARGET)

This seems fine I guess, and it seems to say that we ought to move the
code that generates the tupdesc back to parse analysis rather than the
executor. Okay, fine. But let's find a better place than nodeFuncs.

But if I move the XMLTABLE() call to the target list instead, the type
is resolved at planner time:

SELECT
XMLTABLE ('/dept/employee' passing $$<dept bldg="114">
<employee id="903">
<name>
<first>Mary</first>
<last>Jones</last>
</name>
<office>415</office>
<phone>905-403-6112</phone>
<phone>647-504-4546</phone>
<salary currency="USD">64000</salary>
</employee>
</dept>$$
COLUMNS
empID INTEGER PATH '@id',
firstname varchar(4) PATH 'name/first',
lastname VARCHAR(25) PATH 'name/last') AS X;

Breakpoint 1, exprTypmod (expr=expr@entry=0x1d6bed8)
    at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
283 if (!expr)
(gdb) bt
#0 exprTypmod (expr=expr@entry=0x1d6bed8)
    at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
#1 0x0000000000654058 in set_pathtarget_cost_width (root=0x1d6bc68,
    target=0x1d6c728)
    at /pgsql/source/master/src/backend/optimizer/path/costsize.c:4729
#2 0x000000000066c197 in grouping_planner (root=0x1d6bc68,
    inheritance_update=40 '(', inheritance_update@entry=0 '\000',
    tuple_fraction=0.01, tuple_fraction@entry=0)
    at /pgsql/source/master/src/backend/optimizer/plan/planner.c:1745
#3 0x000000000066ef64 in subquery_planner (glob=glob@entry=0x1d6bbd0,
    parse=parse@entry=0x1d23818, parent_root=parent_root@entry=0x0,
    hasRecursion=hasRecursion@entry=0 '\000',
    tuple_fraction=tuple_fraction@entry=0)
    at /pgsql/source/master/src/backend/optimizer/plan/planner.c:795
#4 0x000000000066fe5e in standard_planner (parse=0x1d23818,
    cursorOptions=256, boundParams=<optimized out>)
    at /pgsql/source/master/src/backend/optimizer/plan/planner.c:307

This is surprising, but I'm not sure it's wrong.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-23 05:16:33
Message-ID: CAFj8pRAeFjYvNK9o06t4R26OFazNn=j2GqSB+yzqirFu-3kT0Q@mail.gmail.com
Lists: pgsql-hackers

2016-11-22 21:47 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> I found the whole TableExprGetTupleDesc() function a bit odd in
> nodeFuncs.c, so I renamed it to ExecTypeFromTableExpr() and moved it to
> execTuples.c -- but only because that's where ExecTypeFromTL and others
> already live. I would have liked to move it to tupdesc.c instead, but
> it requires knowledge of executor nodes, which is probably the reason
> that ExecTypeFromTL is in execTuples. I think we'd eat that bit of
> ugliness only because we're not the first. But anyway I quickly ran
> into another problem.
>
> I noticed that ExecTypeFromTableExpr is being called from the transform
> phase, which is much earlier than the executor. I noticed because of
> the warning that the above movement added to nodeFuncs.c,
> src/backend/nodes/nodeFuncs.c:509:5: warning: implicit declaration of
> function 'ExecTypeFromTableExpr' [-Wimplicit-function-declaration]
>

The tuple descriptor should not be serialized.

When xmltable is called directly, a live tuple descriptor is used, which is
created at transform time. The other situation is when xmltable is used from
a view, where the transform phase is skipped.

Originally I serialized the generated type, but I had problems with record
types: the current infrastructure expects only real types to be serialized.

My solution is to recheck the tuple descriptor at executor time. It is a
small overhead, once per query, but it allows xmltable to be used from views
without having to specify the returned columns explicitly.
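
A minimal sketch of that executor-time recheck, using toy types rather than
the patch's code: ToyTupleDesc, ToyTableExpr, toy_get_desc and the
toy_desc_builds counter are all hypothetical. It assumes the descriptor
pointer is simply NULL when the node was read back from a stored view:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tuple descriptor. */
typedef struct ToyTupleDesc
{
    int         natts;
} ToyTupleDesc;

/* Hypothetical stand-in for the XMLTABLE expression node. */
typedef struct ToyTableExpr
{
    int           ncols;        /* column definitions; always available */
    ToyTupleDesc *tupdesc;      /* transient; NULL when read from a view */
} ToyTableExpr;

/* Counts descriptor builds so the "once per query" cost can be observed. */
int         toy_desc_builds = 0;

static ToyTupleDesc *
build_toy_desc(const ToyTableExpr *te)
{
    ToyTupleDesc *desc = malloc(sizeof(ToyTupleDesc));

    if (desc != NULL)
    {
        desc->natts = te->ncols;
        toy_desc_builds++;
    }
    return desc;
}

/*
 * Executor-time recheck: reuse the descriptor made at transform time if
 * it is still there, otherwise rebuild it from the column definitions.
 */
ToyTupleDesc *
toy_get_desc(ToyTableExpr *te)
{
    assert(te != NULL);
    if (te->tupdesc == NULL)
        te->tupdesc = build_toy_desc(te);
    return te->tupdesc;
}
```

In the direct-call case the descriptor is already present and nothing is
rebuilt; in the view case the first call builds it and later calls reuse it.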

>
> so I thought, hm, is it okay to have parse analysis run an executor
> function? (I suppose this is the reason you put it in nodeFuncs in the
> first place). For fun, I tried this query under GDB, with a breakpoint
> on exprTypmod():
>
> SELECT X.*
> FROM emp,
> XMLTABLE ('//depts/dept/employee' passing doc
> COLUMNS
> empID INTEGER PATH '@id',
> firstname int PATH 'name/first',
> lastname VARCHAR(25) PATH 'name/last') AS X;
>
> and sure enough, the type is resolved during parse analysis:
>
> Breakpoint 1, exprTypmod (expr=expr@entry=0x1d23ad8)
> at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
> 283 if (!expr)
> (gdb) print *expr
> $2 = {type = T_TableExpr}
> (gdb) bt
> #0 exprTypmod (expr=expr@entry=0x1d23ad8)
> at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
> #1 0x000000000080c500 in get_expr_result_type (expr=0x1d23ad8,
> resultTypeId=0x7ffd482bfdb4, resultTupleDesc=0x7ffd482bfdb8)
> at /pgsql/source/master/src/backend/utils/fmgr/funcapi.c:247
> #2 0x000000000056de1b in expandRTE (rte=rte@entry=0x1d6b800, rtindex=2,
> sublevels_up=0, location=location@entry=7,
> include_dropped=include_dropped@entry=0 '\000',
> colnames=colnames@entry=0x7ffd482bfe10, colvars=0x7ffd482bfe18)
> at /pgsql/source/master/src/backend/parser/parse_relation.c:2052
> #3 0x000000000056e131 in expandRelAttrs (pstate=pstate@entry=0x1d238a8,
> rte=rte@entry=0x1d6b800, rtindex=<optimized out>,
> sublevels_up=<optimized out>, location=location@entry=7)
> at /pgsql/source/master/src/backend/parser/parse_relation.c:2435
> #4 0x000000000056fa64 in ExpandSingleTable (pstate=pstate@entry=0x1d238a8,
> rte=rte@entry=0x1d6b800, location=7,
> make_target_entry=make_target_entry@entry=1 '\001')
> at /pgsql/source/master/src/backend/parser/parse_target.c:1266
> #5 0x000000000057135b in ExpandColumnRefStar (pstate=pstate@entry=0x1d238a8,
> cref=0x1d22720, make_target_entry=make_target_entry@entry=1 '\001')
> at /pgsql/source/master/src/backend/parser/parse_target.c:1158
> #6 0x00000000005716f9 in transformTargetList (pstate=0x1d238a8,
> targetlist=<optimized out>, exprKind=EXPR_KIND_SELECT_TARGET)
>
> This seems fine I guess, and it seems to say that we ought to move the
> code that generates the tupdesc to back parse analysis rather than
> executor. Okay, fine. But let's find a better place than nodeFuncs.
>
> But if I move the XMLTABLE() call to the target list instead, the type
> is resolved at planner time:
>
> SELECT
> XMLTABLE ('/dept/employee' passing $$<dept bldg="114">
> <employee id="903">
> <name>
> <first>Mary</first>
> <last>Jones</last>
> </name>
> <office>415</office>
> <phone>905-403-6112</phone>
> <phone>647-504-4546</phone>
> <salary currency="USD">64000</salary>
> </employee>
> </dept>$$
> COLUMNS
> empID INTEGER PATH '@id',
> firstname varchar(4) PATH 'name/first',
> lastname VARCHAR(25) PATH 'name/last') AS X;
>
> Breakpoint 1, exprTypmod (expr=expr@entry=0x1d6bed8)
> at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
> 283 if (!expr)
> (gdb) bt
> #0 exprTypmod (expr=expr@entry=0x1d6bed8)
> at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
> #1 0x0000000000654058 in set_pathtarget_cost_width (root=0x1d6bc68,
> target=0x1d6c728)
> at /pgsql/source/master/src/backend/optimizer/path/costsize.c:4729
> #2 0x000000000066c197 in grouping_planner (root=0x1d6bc68,
> inheritance_update=40 '(', inheritance_update@entry=0 '\000',
> tuple_fraction=0.01, tuple_fraction@entry=0)
> at /pgsql/source/master/src/backend/optimizer/plan/planner.c:1745
> #3 0x000000000066ef64 in subquery_planner (glob=glob@entry=0x1d6bbd0,
> parse=parse@entry=0x1d23818, parent_root=parent_root@entry=0x0,
> hasRecursion=hasRecursion@entry=0 '\000',
> tuple_fraction=tuple_fraction@entry=0)
> at /pgsql/source/master/src/backend/optimizer/plan/planner.c:795
> #4 0x000000000066fe5e in standard_planner (parse=0x1d23818,
> cursorOptions=256, boundParams=<optimized out>)
> at /pgsql/source/master/src/backend/optimizer/plan/planner.c:307
>
> This is surprising, but I'm not sure it's wrong.
>

Set-returning nodes are processed differently when called from the paramlist
and when called from the tablelist. In the latter case, exprTypmod is
invoked early.

There is a different case that you didn't check:

CREATE VIEW x AS SELECT xmltable(...)
CREATE VIEW x1 AS SELECT * FROM xmltable(...)

Close the session, and in a new session run:

SELECT * FROM x;
SELECT * FROM x1;

Regards

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-23 16:54:48
Message-ID: 20161123165448.wwqwzkm7y5uyothp@alvherre.pgsql
Lists: pgsql-hackers

I tried to see if a following RTE was able to "see" the entries created by
XMLTABLE, and sure it can:

SELECT X.*, generate_series
FROM emp,
XMLTABLE ('//depts/dept/employee' passing doc
COLUMNS
empID INTEGER PATH '@id',
firstname varchar(25) PATH 'name/first',
lastname VARCHAR(25) PATH 'name/last') AS X,
generate_series(900, empid);

empid │ firstname │ lastname │ generate_series
───────┼───────────┼──────────┼─────────────────
901 │ John │ Doe │ 900
901 │ John │ Doe │ 901
902 │ Peter │ Pan │ 900
902 │ Peter │ Pan │ 901
902 │ Peter │ Pan │ 902
903 │ Mary │ Jones │ 900
903 │ Mary │ Jones │ 901
903 │ Mary │ Jones │ 902
903 │ Mary │ Jones │ 903
(9 rows)

Cool.

I'm still wondering how this works. I'll continue to explore the patch
in order to figure it out.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-23 17:49:37
Message-ID: 20161123174937.opv4zw67k63x7htb@alvherre.pgsql
Lists: pgsql-hackers

Alvaro Herrera wrote:

> I'm still wondering how this works. I'll continue to explore the patch
> in order to figure out.

Ah, so it's already parsed as a "range function". That part looks good
to me.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-23 23:13:30
Message-ID: 20161123231330.jw7c7fhw5iz7z3xh@alvherre.pgsql
Lists: pgsql-hackers

Oh my, I just noticed we have a new xpath preprocessor in this patch
too. Where did this code come from -- did you write it all from
scratch?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-23 23:29:10
Message-ID: 20161123232910.d2rmsxml24wknucl@alvherre.pgsql
Lists: pgsql-hackers

Here's another version. Not there yet: need to move back the function
to create the tupdesc, as discussed. Not clear what's the best place,
however. I modified the grammar a bit (added the missing comma, removed
PATH as an unreserved keyword and just used IDENT, removed the "Opt"
version for column options), and reworked the comments in the transform
phase (I tweaked the code here and there mostly to move things to nicer
places, but it's pretty much the same code).

In the new xpath_parser.c file I think we should tidy things up a bit.
First, it needs more commentary on what the entry function actually
does, in detail. Also, IMO that function should be at the top of the
file, not at the bottom, followed by all its helpers. I would like some
more clarity on the provenance of all this code, just to assess the
probability of bugs; mostly as it's completely undocumented.

I don't like the docs either. I think we should have a complete
reference to the syntax, followed by examples, rather than letting the
examples drive the whole thing. I fixed the synopsis so that it's not
one very long line.

If you use "PATH '/'" for a column, you get the text for all the entries
in the whole XML, rather than the text for the particular row being
processed. Isn't that rather weird, or to put it differently, completely
wrong? I didn't find a way to obtain the whole XML row when you have
the COLUMNS option (which is what I was hoping for with the "PATH '/'").

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-23 23:31:30
Message-ID: 20161123233130.oqf7jl6czehy5fiw@alvherre.pgsql
Lists: pgsql-hackers


Sorry, here's the patch. Power loss distracted me here.

By the way, the pgindent you did is slightly confused because you failed
to add the new struct types you define to typedefs.list.

I have not looked at the new xml.c code at all, yet.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
xmltable-13.patch text/plain 177.3 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-23 23:44:16
Message-ID: 84e9ac46-d205-46f4-9c9a-65081bad3716@dunslane.net
Lists: pgsql-hackers

On 11/23/2016 06:31 PM, Alvaro Herrera wrote:
> Sorry, here's the patch. Power loss distracted me here.
>
> By the way, the pgindent you did is slightly confused because you failed
> to add the new struct types you define to typedefs.list.

Tips on how to use pgindent for developers:

<http://adpgtech.blogspot.com/2015/05/running-pgindent-on-non-core-code-or.html>

cheers

andrew


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-24 03:26:20
Message-ID: 20161124032620.ejd3q74z4gxonymh@alvherre.pgsql
Lists: pgsql-hackers

Alvaro Herrera wrote:

> If you use "PATH '/'" for a column, you get the text for all the entries
> in the whole XML, rather than the text for the particular row being
> processed. Isn't that rather weird, or to put it differently, completely
> wrong? I didn't find a way to obtain the whole XML row when you have
> the COLUMNS option (which is what I was hoping for with the "PATH '/'").

Ah, apparently you need to use type XML for that column in order for
this to happen. Example:

insert into emp values ($$
<depts >
<dept bldg="102">
<employee id="905">
<name>
<first>John</first>
<last>Doew</last>
</name>
<office>344</office>
<salary currency="USD">55000</salary>
</employee>

<employee id="908">
<name>
<first>Peter</first>
<last>Panw</last>
</name>
<office>216</office>
<phone>905-416-5004</phone>
</employee>
</dept>

<dept bldg="115">
<employee id="909">
<name>
<first>Mary</first>
<last>Jonesw</last>
</name>
<office>415</office>
<phone>905-403-6112</phone>
<phone>647-504-4546</phone>
<salary currency="USD">64000</salary>
</employee>
</dept>
</depts>
$$);

Note the weird salary_amount value here:

SELECT x.*
FROM emp,
XMLTABLE ('//depts/dept/employee' passing doc
COLUMNS
i for ordinality,
empID int PATH '@id',
firstname varchar(25) PATH 'name/first' default 'FOOBAR',
lastname VARCHAR(25) PATH 'name/last',
salary xml path 'concat(salary/text(), salary/@currency)' default 'DONT KNOW', salary_amount xml path '/' )
WITH ORDINALITY
AS X (i, a, b, c) limit 1;
i │ a │ b │ c │ salary │ salary_amount │ ordinality
───┼─────┼──────┼──────┼──────────┼───────────────────────┼────────────
1 │ 905 │ John │ Doew │ 55000USD │ ↵│ 1
│ │ │ │ │ ↵│
│ │ │ │ │ ↵│
│ │ │ │ │ John ↵│
│ │ │ │ │ Doew ↵│
│ │ │ │ │ ↵│
│ │ │ │ │ 344 ↵│
│ │ │ │ │ 55000 ↵│
│ │ │ │ │ ↵│
│ │ │ │ │ ↵│
│ │ │ │ │ ↵│
│ │ │ │ │ ↵│
│ │ │ │ │ Peter ↵│
│ │ │ │ │ Panw ↵│
│ │ │ │ │ ↵│
│ │ │ │ │ 216 ↵│
│ │ │ │ │ 905-416-5004↵│
│ │ │ │ │ ↵│
│ │ │ │ │ ↵│
│ │ │ │ │ ↵│
│ │ │ │ │ ↵│
│ │ │ │ │ ↵│
│ │ │ │ │ ↵│
│ │ │ │ │ Mary ↵│
│ │ │ │ │ Jonesw ↵│
│ │ │ │ │ ↵│
│ │ │ │ │ 415 ↵│
│ │ │ │ │ 905-403-6112↵│
│ │ │ │ │ 647-504-4546↵│
│ │ │ │ │ 64000 ↵│
│ │ │ │ │ ↵│
│ │ │ │ │ ↵│
│ │ │ │ │ │
(1 row)

If you declare salary_amount to be text instead, it doesn't happen anymore.
Apparently if you put it in a namespace, it doesn't happen either.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-24 04:52:46
Message-ID: CAFj8pRA5vz20Le5EQ-5MGygWs7kh0azTRLt9=rvS8GWMWfw_jA@mail.gmail.com
Lists: pgsql-hackers

Hi

2016-11-24 0:13 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Oh my, I just noticed we have a new xpath preprocessor in this patch
> too. Where did this code come from -- did you write it all from
> scratch?
>

I wrote it from scratch. libxml2 has no API for iterating over an XPath
expression (as opposed to iterating over the result of an XPath expression),
and as far as I know, no such API is planned for libxml2.

There are two purposes:

The first is safe manipulation of XPath expression prefixes. The ANSI SQL
design implicitly expects some prefix, but a user can also write it
explicitly. The prefix must not be added twice, nor in situations where
it would break the expression.

The second goal is support for default namespaces. Once we needed a parser
for the first task, extending it for this task did not take many more lines.

This parser could also be used to enhance the current XPath function; default
namespaces are pretty nice when you have to use namespaces.

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-24 05:04:23
Message-ID: CAFj8pRCPm2Y76NtWM1z1tFyWttyp482cAgF0+PH7+zZs34NysQ@mail.gmail.com
Lists: pgsql-hackers

2016-11-24 0:29 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Here's another version. Not there yet: need to move back the function
> to create the tupdesc, as discussed. Not clear what's the best place,
> however. I modified the grammar a bit (added the missing comma, removed
> PATH as an unreserved keyword and just used IDENT, removed the "Opt"
> version for column options), and reworked the comments in the transform
> phase (I tweaked the code here and there mostly to move things to nicer
> places, but it's pretty much the same code).
>
> In the new xpath_parser.c file I think we should tidy things up a bit.
> First, it needs more commentary on what the entry function actually
> does, in detail. Also, IMO that function should be at the top of the
> file, not at the bottom, followed by all its helpers. I would like some
> more clarity on the provenance of all this code, just to assess the
> probability of bugs; mostly as it's completely undocumented.
>
> I don't like the docs either. I think we should have a complete
> reference to the syntax, followed by examples, rather than letting the
> examples drive the whole thing. I fixed the synopsis so that it's not
> one very long line.
>
> If you use "PATH '/'" for a column, you get the text for all the entries
> in the whole XML, rather than the text for the particular row being
> processed. Isn't that rather weird, or to put it differently, completely
> wrong? I didn't find a way to obtain the whole XML row when you have
> the COLUMNS option (which is what I was hoping for with the "PATH '/'").
>

This is libxml2 behavior.

The postprocessing only checks the result and tries to coerce it to the
expected types.

Regards

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-24 05:18:31
Message-ID: CAFj8pRCxdLucFpTj9GOkH8pXcM_2jcytRf-iUbU_bEhhcQbJxg@mail.gmail.com
Lists: pgsql-hackers

2016-11-24 5:52 GMT+01:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

> Hi
>
> 2016-11-24 0:13 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
>> Oh my, I just noticed we have a new xpath preprocessor in this patch
>> too. Where did this code come from -- did you write it all from
>> scratch?
>>
>
> I wrote it from scratch - libxml2 has not any API for iteration over XPath
> expression (different than iteration over XPath expression result), and
> what I have info, there will not be any new API in libxml2.
>
> There are two purposes:
>
> Safe manipulation with XPath expression prefixes - ANSI SQL design
> implicitly expects some prefix, but it can be used manually. The prefix
> should not be used twice and in some situations, when it can breaks the
> expression.
>

The implicit prefix for column PATH expressions is "./". A user can also
write it explicitly.

In my initial patches, the manipulation of XPath expressions was more
complex. It can be reduced now, but then we lose default namespace
support, which is a nice feature that other providers support.
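
A minimal sketch in C of the kind of prefix handling described here, not the code from the patch. The helper name demo_apply_implicit_prefix and its rules are assumptions made for illustration: it prepends "./" unless the path is already anchored with "/" or "." or begins with something that looks like a function call. It does not handle leading whitespace, unions, or empty input, which is part of why a real XPath tokenizer was considered.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): copy "path" into "out",
 * adding the implicit "./" prefix when it appears to be missing.
 * Returns 0 on success, -1 if "out" is too small.
 */
int
demo_apply_implicit_prefix(const char *path, char *out, size_t outlen)
{
    const char *p = path;
    int         add_prefix = 1;
    int         n;

    /* Already anchored: absolute ("/...") or explicitly relative ("./...", "../..."). */
    if (path[0] == '/' || path[0] == '.')
        add_prefix = 0;
    else
    {
        /*
         * A leading name followed by '(' is treated as a function call such
         * as concat(...); prefixing it would break the expression. Node
         * tests such as text() are also left alone, which is harmless
         * because they are relative to the context node anyway.
         */
        while (isalnum((unsigned char) *p) || *p == '_' || *p == '-' || *p == ':')
            p++;
        if (p != path && *p == '(')
            add_prefix = 0;
    }

    n = snprintf(out, outlen, "%s%s", add_prefix ? "./" : "", path);
    if (n < 0 || (size_t) n >= outlen)
        return -1;
    return 0;
}
```

Under these assumptions, "name/first" and "@id" gain the prefix, while "./name", "/" and "concat(salary/text(), salary/@currency)" are passed through unchanged, so the prefix is never applied twice.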

>
> Second goal is support default namespaces - when we needed parser for
> first task, then the enhancing for this task was not too much lines more.
>
> This parser can be used for enhancing current XPath function - default
> namespaces are pretty nice, when you have to use namespaces.
>
>
>>
>> --
>> Álvaro Herrera https://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-24 17:29:46
Message-ID: 20161124172946.6s27qr5um4q6aqy6@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:
> Hi
>
> 2016-11-24 0:13 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
> > Oh my, I just noticed we have a new xpath preprocessor in this patch
> > too. Where did this code come from -- did you write it all from
> > scratch?
>
> I wrote it from scratch - libxml2 has not any API for iteration over XPath
> expression (different than iteration over XPath expression result), and
> what I have info, there will not be any new API in libxml2.

Okay, I agree that the default namespace stuff looks worthwhile in the
long run. But I don't have enough time to review the xpath parser stuff
in the current commitfest, and I think it needs at the very least a lot
of additional code commentary.

However I think the rest of it can reasonably go in -- I mean the SQL
parse of it, analysis, executor. Let me propose this: you split the
patch, leaving the xpath_parser.c stuff out and XMLNAMESPACES DEFAULT,
and we introduce just the TableExpr stuff plus the XMLTABLE function. I
can commit that part in the current commitfest, and we leave the
xpath_parser plus associated features for the upcoming commitfest.

Deal?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-24 17:32:47
Message-ID: CAFj8pRB1gr_1W-wBMe0ySeSYf1GnhKUCweYZ4+d8Mqc4+7_ghw@mail.gmail.com
Lists: pgsql-hackers

2016-11-24 18:29 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
> > Hi
> >
> > 2016-11-24 0:13 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
> >
> > > Oh my, I just noticed we have a new xpath preprocessor in this patch
> > > too. Where did this code come from -- did you write it all from
> > > scratch?
> >
> > I wrote it from scratch - libxml2 has not any API for iteration over XPath
> > expression (different than iteration over XPath expression result), and
> > what I have info, there will not be any new API in libxml2.
>
> Okay, I agree that the default namespace stuff looks worthwhile in the
> long run. But I don't have enough time to review the xpath parser stuff
> in the current commitfest, and I think it needs at the very least a lot
> of additional code commentary.
>
> However I think the rest of it can reasonably go in -- I mean the SQL
> parse of it, analysis, executor. Let me propose this: you split the
> patch, leaving the xpath_parser.c stuff out and XMLNAMESPACES DEFAULT,
> and we introduce just the TableExpr stuff plus the XMLTABLE function. I
> can commit that part in the current commitfest, and we leave the
> xpath_parser plus associated features for the upcoming commitfest.

> Deal?
>

ok

Can you send me your latest work?

Regards

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-24 17:41:19
Message-ID: 20161124174119.7gccct4m7bfbkzjt@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:

> can me send your last work?

Sure, it's in the archives --
/message-id/20161123233130.oqf7jl6czehy5fiw@alvherre.pgsql

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-24 17:51:52
Message-ID: 21096.1480009912@sss.pgh.pa.us
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> Pavel Stehule wrote:
>> 2016-11-24 0:13 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>>> Oh my, I just noticed we have a new xpath preprocessor in this patch
>>> too. Where did this code come from -- did you write it all from
>>> scratch?

>> I wrote it from scratch - libxml2 has not any API for iteration over XPath
>> expression (different than iteration over XPath expression result), and
>> what I have info, there will not be any new API in libxml2.

> Okay, I agree that the default namespace stuff looks worthwhile in the
> long run. But I don't have enough time to review the xpath parser stuff
> in the current commitfest, and I think it needs at the very least a lot
> of additional code commentary.

contrib/xml2 has always relied on libxslt for xpath functionality.
Can we do that here instead of writing, debugging, and documenting
a pile of new code?

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-24 18:31:55
Message-ID: CAFj8pRD1O6oFt5=UzVkBBVYWpBX+zAk_MhXrCC85Ni4gkysa+A@mail.gmail.com
Lists: pgsql-hackers

2016-11-24 18:51 GMT+01:00 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> > Pavel Stehule wrote:
> >> 2016-11-24 0:13 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
> >>> Oh my, I just noticed we have a new xpath preprocessor in this patch
> >>> too. Where did this code come from -- did you write it all from
> >>> scratch?
>
> >> I wrote it from scratch - libxml2 has not any API for iteration over XPath
> >> expression (different than iteration over XPath expression result), and
> >> what I have info, there will not be any new API in libxml2.
>
> > Okay, I agree that the default namespace stuff looks worthwhile in the
> > long run. But I don't have enough time to review the xpath parser stuff
> > in the current commitfest, and I think it needs at the very least a lot
> > of additional code commentary.
>
> contrib/xml2 has always relied on libxslt for xpath functionality.
> Can we do that here instead of writing, debugging, and documenting
> a pile of new code?
>

I am sorry, but I don't see it. There is no complex manipulation of
XPath expressions.

Regards

Pavel

>
> regards, tom lane
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-24 23:33:14
Message-ID: CAFj8pRAf4wq-TCiu5vagF5Mc5DUGHdaeszoYM8dhr5Rv0EEsKw@mail.gmail.com
Lists: pgsql-hackers

2016-11-24 18:41 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
>
> > can me send your last work?
>
> Sure, it's in the archives --
> /message-id/20161123233130.oqf7jl6czehy5fiw@alvherre.pgsql

Here is an updated patch without default namespace support (and without
XPath expression transformation).

Due to the latest changes in the parser
https://github.com/postgres/postgres/commit/906bfcad7ba7cb3863fe0e2a7810be8e3cd84fbd
I had to use c_expr in other positions (the xmlnamespace definition).

I don't think this is a limitation: in 99% of cases there will be a
constant literal.

Regards

Pavel

>
>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Attachment Content-Type Size
xmltable-14.patch text/x-patch 167.0 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-25 00:45:37
Message-ID: CAB7nPqRoJ3T1kVX0zJU9DDxOF-Xr_82Z_OMnrxw=8O96=GDKHw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 25, 2016 at 3:31 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> 2016-11-24 18:51 GMT+01:00 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>> contrib/xml2 has always relied on libxslt for xpath functionality.
>> Can we do that here instead of writing, debugging, and documenting
>> a pile of new code?
>
> I am sorry - I don't see it. There is nothing complex manipulation with
> XPath expressions.

You are missing the point here, which is to make the implementation
footprint as light as possible, especially if the added functionality
is already present in a dependency that Postgres can be linked to. OK,
libxslt can only be linked with contrib/xml2/ now, but it would be at
least worth looking at how much the current patch can be simplified
for things like transformTableExpr or XmlTableGetValue by relying on
some existing routines. Nit: I did not look at the patch in detail,
but I find the size of the latest version sent, 167kB, scary as it
complicates review and increases the likelihood of bugs.
--
Michael


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-25 02:31:26
Message-ID: 20161125023126.whqdsnp6zl2obdtj@alvherre.pgsql
Lists: pgsql-hackers

Michael Paquier wrote:

> Nit: I did not look at the patch in details,
> but I find the size of the latest version sent, 167kB, scary as it
> complicates review and increases the likeliness of bugs.

Here's the stat. Note that removing the functionality as discussed
would remove all of xpath_parser.c but I think the rest of it remains
pretty much unchanged. So it's clearly a large patch, but there are
large docs and tests too, not just code.

doc/src/sgml/func.sgml | 376 ++++++++++++++++++---
src/backend/executor/execQual.c | 335 +++++++++++++++++++
src/backend/executor/execTuples.c | 42 +++
src/backend/nodes/copyfuncs.c | 66 ++++
src/backend/nodes/equalfuncs.c | 51 +++
src/backend/nodes/nodeFuncs.c | 100 ++++++
src/backend/nodes/outfuncs.c | 51 +++
src/backend/nodes/readfuncs.c | 42 +++
src/backend/optimizer/util/clauses.c | 33 ++
src/backend/parser/gram.y | 181 ++++++++++-
src/backend/parser/parse_coerce.c | 33 +-
src/backend/parser/parse_expr.c | 182 +++++++++++
src/backend/parser/parse_target.c | 7 +
src/backend/utils/adt/Makefile | 2 +-
src/backend/utils/adt/ruleutils.c | 100 ++++++
src/backend/utils/adt/xml.c | 610 +++++++++++++++++++++++++++++++++++
src/backend/utils/adt/xpath_parser.c | 337 +++++++++++++++++++
src/backend/utils/fmgr/funcapi.c | 13 +
src/include/executor/executor.h | 1 +
src/include/executor/tableexpr.h | 69 ++++
src/include/funcapi.h | 1 -
src/include/nodes/execnodes.h | 31 ++
src/include/nodes/nodes.h | 4 +
src/include/nodes/parsenodes.h | 21 ++
src/include/nodes/primnodes.h | 40 +++
src/include/parser/kwlist.h | 3 +
src/include/parser/parse_coerce.h | 4 +
src/include/utils/xml.h | 2 +
src/include/utils/xpath_parser.h | 24 ++
src/test/regress/expected/xml.out | 415 ++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 323 +++++++++++++++++++
src/test/regress/expected/xml_2.out | 414 ++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 170 ++++++++++
33 files changed, 4019 insertions(+), 64 deletions(-)

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-25 06:44:12
Message-ID: CAFj8pRBNoVu2UjHzn0QaZOLOkC08aPH0BP5-5prmi+uWZp=CcA@mail.gmail.com
Lists: pgsql-hackers

2016-11-25 3:31 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Michael Paquier wrote:
>
> > Nit: I did not look at the patch in details,
> > but I find the size of the latest version sent, 167kB, scary as it
> > complicates review and increases the likeliness of bugs.
>
> Here's the stat. Note that removing the functionality as discussed
> would remove all of xpath_parser.c but I think the rest of it remains
> pretty much unchanged. So it's clearly a large patch, but there are
> large docs and tests too, not just code.
>

Yes, a lot of it is regression tests (the expected output is there three
times), and the XMLTABLE function is not trivial.

A lot of the code is mechanical, node-related. The really complex part is
only in xml.c. There is only one longer function; its complexity comes from
mapping the libxml2 result to a PostgreSQL result (with catching exceptions
in order to release libxml2 resources).

All the changes are well isolated, so there is little risk of breaking
something else.

>
> doc/src/sgml/func.sgml | 376 ++++++++++++++++++---
> src/backend/executor/execQual.c | 335 +++++++++++++++++++
> src/backend/executor/execTuples.c | 42 +++
> src/backend/nodes/copyfuncs.c | 66 ++++
> src/backend/nodes/equalfuncs.c | 51 +++
> src/backend/nodes/nodeFuncs.c | 100 ++++++
> src/backend/nodes/outfuncs.c | 51 +++
> src/backend/nodes/readfuncs.c | 42 +++
> src/backend/optimizer/util/clauses.c | 33 ++
> src/backend/parser/gram.y | 181 ++++++++++-
> src/backend/parser/parse_coerce.c | 33 +-
> src/backend/parser/parse_expr.c | 182 +++++++++++
> src/backend/parser/parse_target.c | 7 +
> src/backend/utils/adt/Makefile | 2 +-
> src/backend/utils/adt/ruleutils.c | 100 ++++++
> src/backend/utils/adt/xml.c | 610 +++++++++++++++++++++++++++++++++++
> src/backend/utils/adt/xpath_parser.c | 337 +++++++++++++++++++
> src/backend/utils/fmgr/funcapi.c | 13 +
> src/include/executor/executor.h | 1 +
> src/include/executor/tableexpr.h | 69 ++++
> src/include/funcapi.h | 1 -
> src/include/nodes/execnodes.h | 31 ++
> src/include/nodes/nodes.h | 4 +
> src/include/nodes/parsenodes.h | 21 ++
> src/include/nodes/primnodes.h | 40 +++
> src/include/parser/kwlist.h | 3 +
> src/include/parser/parse_coerce.h | 4 +
> src/include/utils/xml.h | 2 +
> src/include/utils/xpath_parser.h | 24 ++
> src/test/regress/expected/xml.out | 415 ++++++++++++++++++++++++
> src/test/regress/expected/xml_1.out | 323 +++++++++++++++++++
> src/test/regress/expected/xml_2.out | 414 ++++++++++++++++++++++++
> src/test/regress/sql/xml.sql | 170 ++++++++++
> 33 files changed, 4019 insertions(+), 64 deletions(-)
>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-25 06:46:09
Message-ID: CAFj8pRChx-79VRjw2m2mXtTkBq48GHvC043Gp8y1Qc=7mQaUCQ@mail.gmail.com
Lists: pgsql-hackers

2016-11-25 7:44 GMT+01:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

>
>
> 2016-11-25 3:31 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
>> Michael Paquier wrote:
>>
>> > Nit: I did not look at the patch in details,
>> > but I find the size of the latest version sent, 167kB, scary as it
>> > complicates review and increases the likeliness of bugs.
>>
>> Here's the stat. Note that removing the functionality as discussed
>> would remove all of xpath_parser.c but I think the rest of it remains
>> pretty much unchanged. So it's clearly a large patch, but there are
>> large docs and tests too, not just code.
>>
>
> yes, lot of is regress tests (expected part is 3x) - and XMLTABLE function
> is not trivial.
>

The regression tests are about 50% of the patch.

>
> lot of code is mechanical - nodes related. The really complex part is only
> in xml.c. There is one longer function only - the complexity is based on
> mapping libxml2 result to PostgreSQL result (with catching exceptions due
> releasing libxml2 sources).
>
> The all changes are well isolated - there is less risk to break some
> other.
>
>
>>
>> doc/src/sgml/func.sgml | 376 ++++++++++++++++++---
>> src/backend/executor/execQual.c | 335 +++++++++++++++++++
>> src/backend/executor/execTuples.c | 42 +++
>> src/backend/nodes/copyfuncs.c | 66 ++++
>> src/backend/nodes/equalfuncs.c | 51 +++
>> src/backend/nodes/nodeFuncs.c | 100 ++++++
>> src/backend/nodes/outfuncs.c | 51 +++
>> src/backend/nodes/readfuncs.c | 42 +++
>> src/backend/optimizer/util/clauses.c | 33 ++
>> src/backend/parser/gram.y | 181 ++++++++++-
>> src/backend/parser/parse_coerce.c | 33 +-
>> src/backend/parser/parse_expr.c | 182 +++++++++++
>> src/backend/parser/parse_target.c | 7 +
>> src/backend/utils/adt/Makefile | 2 +-
>> src/backend/utils/adt/ruleutils.c | 100 ++++++
>> src/backend/utils/adt/xml.c | 610 +++++++++++++++++++++++++++++++++++
>> src/backend/utils/adt/xpath_parser.c | 337 +++++++++++++++++++
>> src/backend/utils/fmgr/funcapi.c | 13 +
>> src/include/executor/executor.h | 1 +
>> src/include/executor/tableexpr.h | 69 ++++
>> src/include/funcapi.h | 1 -
>> src/include/nodes/execnodes.h | 31 ++
>> src/include/nodes/nodes.h | 4 +
>> src/include/nodes/parsenodes.h | 21 ++
>> src/include/nodes/primnodes.h | 40 +++
>> src/include/parser/kwlist.h | 3 +
>> src/include/parser/parse_coerce.h | 4 +
>> src/include/utils/xml.h | 2 +
>> src/include/utils/xpath_parser.h | 24 ++
>> src/test/regress/expected/xml.out | 415 ++++++++++++++++++++++++
>> src/test/regress/expected/xml_1.out | 323 +++++++++++++++++++
>> src/test/regress/expected/xml_2.out | 414 ++++++++++++++++++++++++
>> src/test/regress/sql/xml.sql | 170 ++++++++++
>> 33 files changed, 4019 insertions(+), 64 deletions(-)
>>
>> --
>> Álvaro Herrera https://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-28 22:34:33
Message-ID: 20161128223433.mm55e6altkso3qx6@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:

> Here is updated patch without default namespace support (and without XPath
> expression transformation).
>
> Due last changes in parser
> https://github.com/postgres/postgres/commit/906bfcad7ba7cb3863fe0e2a7810be8e3cd84fbd
> I had to use c_expr on other positions ( xmlnamespace definition).
>
> I don't think it is limit - in 99% there will be const literal.

Argh. I can't avoid the feeling that I'm missing some parser trickery
here. We have the XMLNAMESPACES keyword and the clause-terminating
comma to protect these clauses, there must be a way to define this piece
of the grammar so that there's no conflict, without losing the freedom
in the expressions. But I don't see how. Now I agree that xml
namespace definitions are going to be string literals in almost all
cases (or in extra sophisticated cases, column names) ... it's probably
better to spend the bison-fu in the document expression or the column
options, or better yet the xmlexists_argument stuff. But I don't see any
possibility of improvement in any of those places, so let's put it
aside -- we can improve later, if the need arises.

In any case, it looks like we can change c_expr to b_expr in a few
places, which is good because then operators work (in particular, unless
I misread the grammar, foo||bar doesn't work with c_expr and does work
with b_expr, which seems the most useful in this case). Also, it makes
no sense to support (in the namespaces clause) DEFAULT a_expr if the
IDENT case uses only b_expr, so let's reduce both to just b_expr.

While I'm looking at node definitions, I see a few things that could use
some naming improvement. For example, "expr" for TableExpr is a bit
unexpressive. We could use "document_expr" there, perhaps. "row_path"
seems fixated on the XML case and on the expression being a path; let's use
"row_expr" there. And "cols" could be "column_exprs" perhaps. (All
those renames cause fall-out in various node-related files, so let's
think carefully to avoid renaming them multiple times.)

In primnodes, you kept the comment that says "xpath". Please update
that to not-just-XML reality.

Please fix the comment in XmlTableAddNs; NULL is no longer a valid value.

parse_expr.c has two unused variables; please remove them.

This test in ExecEvalTableExprProtected looks weird:
if (i != tstate->for_ordinality_col - 1)
please change to comparing "i + 1" (convert array index into attribute
number), and invert the boolean expression, leaving the for_ordinality
case on top and the rest in the "else". That seems easier to read.
Also, we customarily use post-increment (rownum++) instead of pre-incr.
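
For illustration only, the suggested loop shape might look like the
sketch below. The names DemoTableState and demo_fill_row are hypothetical
and not taken from the patch; only the comparison on "i + 1", the
for_ordinality case placed first, and the post-increment mirror the
review comment above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the executor state discussed above. */
typedef struct DemoTableState
{
    int  for_ordinality_col;  /* 1-based attribute number, 0 if none */
    long rownum;              /* next ordinality value to hand out */
} DemoTableState;

/*
 * Fill one output row of ncols columns. source[] supplies the values for
 * the ordinary columns; the ordinality column gets the row counter.
 */
void
demo_fill_row(DemoTableState *tstate, long *values,
              const long *source, int ncols)
{
    int i;

    assert(tstate != NULL);

    for (i = 0; i < ncols; i++)
    {
        /* i is an array index; i + 1 is the attribute number */
        if (i + 1 == tstate->for_ordinality_col)
            values[i] = tstate->rownum++;   /* post-increment */
        else
            values[i] = source[i];
    }
}
```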

In execQual.c I think it's neater to have ExecEvalTableExpr go before
its subroutine. Actually, I wonder whether it is really necessary to
have a subroutine in the first place; you could just move the entire
contents of that subroutine to within the PG_TRY block instead. The
only thing you lose is one indentation level. I'm not sure about this
one, but it's worth considering.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-29 21:36:18
Message-ID: CAFj8pRBt6dt+ZO-isn4wyfnhF386YoLgdA+Bc1R8Y=2j4S_5ww@mail.gmail.com
Lists: pgsql-hackers

2016-11-28 23:34 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
>
> > Here is updated patch without default namespace support (and without XPath
> > expression transformation).
> >
> > Due last changes in parser
> > https://github.com/postgres/postgres/commit/906bfcad7ba7cb3863fe0e2a7810be8e3cd84fbd
> > I had to use c_expr on other positions ( xmlnamespace definition).
> >
> > I don't think it is limit - in 99% there will be const literal.
>
> Argh. I can't avoid the feeling that I'm missing some parser trickery
> here. We have the XMLNAMESPACES keyword and the clause-terminating
> comma to protect these clauses, there must be a way to define this piece
> of the grammar so that there's no conflict, without losing the freedom
> in the expressions. But I don't see how. Now I agree that xml
> namespace definitions are going to be string literals in almost all
> cases (or in extra sophisticated cases, column names) ... it's probably
> better to spend the bison-fu in the document expression or the column
> options, or better yet the xmlexists_argument stuff. But I don't see
> possibility of improvements in any of those places, so let's put it
> aside -- we can improve later, if need arises.
>

The problem is probably the unreserved keyword "PASSING".

>
> In any case, it looks like we can change c_expr to b_expr in a few
> places, which is good because then operators work (in particular, unless
> I misread the grammar, foo||bar doesn't work with c_expr and does work
> with b_expr, which seems the most useful in this case). Also, it makes
> no sense to support (in the namespaces clause) DEFAULT a_expr if the
> IDENT case uses only b_expr, so let's reduce both to just b_expr.
>

I changed everything I could to b_expr.

>
> While I'm looking at node definitions, I see a few things that could use
> some naming improvement. For example, "expr" for TableExpr is a bit
> unexpressive. We could use "document_expr" there, perhaps. "row_path"
> seems fixated on the XML case and the expression be path; let's use
> "row_expr" there. And "cols" could be "column_exprs" perhaps. (All
> those renames cause fall-out in various node-related files, so let's
> think carefully to avoid renaming them multiple times.)
>

The columns field is a list, not just an expression, so I renamed it to
"columns". The others are renamed as you proposed.

>
> In primnodes, you kept the comment that says "xpath". Please update
> that to not-just-XML reality.
>

fixed

>
> Please fix the comment in XmlTableAddNs; NULL is no longer a valid value.
>

fixed

>
> parse_expr.c has two unused variables; please remove them.
>

fixed

>
> This test in ExecEvalTableExprProtected looks weird:
> if (i != tstate->for_ordinality_col - 1)
> please change to comparing "i + 1" (convert array index into attribute
> number), and invert the boolean expression, leaving the for_ordinality
> case on top and the rest in the "else". That seems easier to read.
> Also, we customarily use post-increment (rownum++) instead of pre-incr.
>
>
fixed

> In execQual.c I think it's neater to have ExecEvalTableExpr go before
> its subroutine. Actually, I wonder whether it is really necessary to
> have a subroutine in the first place; you could just move the entire
> contents of that subroutine to within the PG_TRY block instead. The
> only thing you lose is one indentation level. I'm not sure about this
> one, but it's worth considering.
>

done

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Regards

Pavel

Attachment Content-Type Size
xmltable-15.patch text/x-patch 165.1 KB

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-30 01:40:57
Message-ID: CAMsr+YHYneSfuAaqVt9OgFdVhQES77iNJsZggNqbWo74VEJ7Ow@mail.gmail.com
Lists: pgsql-hackers

On 30 November 2016 at 05:36, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

> The problem is in unreserved keyword "PASSING" probably.

Yeah, I think that's what I hit when trying to change it.

Can't you just parenthesize the expression to use operators like ||
etc? If so, not a big deal.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-30 05:30:37
Message-ID: CAFj8pRA3Q-aTtVhf703Hg=nM2FREvPAw5GmUQC+TW3vsb38zuQ@mail.gmail.com
Lists: pgsql-hackers

2016-11-30 2:40 GMT+01:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> On 30 November 2016 at 05:36, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
> > The problem is in unreserved keyword "PASSING" probably.
>
> Yeah, I think that's what I hit when trying to change it.
>
> Can't you just parenthesize the expression to use operators like ||
> etc? If so, not a big deal.
>
>
???

>
>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-30 12:38:04
Message-ID: 20161130123804.efba2dtzakta5h7n@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:
> 2016-11-30 2:40 GMT+01:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:
>
> > On 30 November 2016 at 05:36, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> > wrote:
> >
> > > The problem is in unreserved keyword "PASSING" probably.
> >
> > Yeah, I think that's what I hit when trying to change it.
> >
> > Can't you just parenthesize the expression to use operators like ||
> > etc? If so, not a big deal.
> >
> ???

"'(' a_expr ')'" is a c_expr; Craig suggests that we can just tell users
to manually add parens around any expressions that they want to use.
That's not necessary most of the time since we've been able to use
b_expr in most places.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-11-30 13:53:03
Message-ID: CAFj8pRBR_He-bbgrbPdu1ZvZq+MfsRoMs2T8CAHhEa9CFM2H-g@mail.gmail.com
Lists: pgsql-hackers

2016-11-30 13:38 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
> > 2016-11-30 2:40 GMT+01:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:
> >
> > > On 30 November 2016 at 05:36, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> > > wrote:
> > >
> > > > The problem is in unreserved keyword "PASSING" probably.
> > >
> > > Yeah, I think that's what I hit when trying to change it.
> > >
> > > Can't you just parenthesize the expression to use operators like ||
> > > etc? If so, not a big deal.
> > >
> > ???
>
> "'(' a_expr ')'" is a c_expr; Craig suggests that we can just tell users
> to manually add parens around any expressions that they want to use.
> That's not necessary most of the time since we've been able to use
> b_expr in most places.
>

Now I understand

Regards

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: patch: function xmltable
Date: 2016-11-30 15:21:37
Message-ID: CAFj8pRCGEwHjZPpURAo7syx8vsMO8eZE+YVnaDZFWgHhZxgeTA@mail.gmail.com
Lists: pgsql-hackers

On 30 Nov 2016 at 14:53, "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com> wrote:
>
>
>
> 2016-11-30 13:38 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>>
>> Pavel Stehule wrote:
>> > 2016-11-30 2:40 GMT+01:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:
>> >
>> > > On 30 November 2016 at 05:36, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
>> > > wrote:
>> > >
>> > > > The problem is in unreserved keyword "PASSING" probably.
>> > >
>> > > Yeah, I think that's what I hit when trying to change it.
>> > >
>> > > Can't you just parenthesize the expression to use operators like ||
>> > > etc? If so, not a big deal.
>> > >
>> > ???
>>
>> "'(' a_expr ')'" is a c_expr; Craig suggests that we can just tell users
>> to manually add parens around any expressions that they want to use.
>> That's not necessary most of the time since we've been able to use
>> b_expr in most places.
>

There is still one c_expr, but without a new reserved word there is no
chance to get rid of it.

>
> Now I understand
>
> Regards
>
> Pavel
>
>>
>>
>> --
>> Álvaro Herrera https://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>


From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: patch: function xmltable
Date: 2016-12-02 06:30:51
Message-ID: CAJrrPGf+RXm95UfyaQJKDF1go8Egf_sQiY1ZtJrpdZnmEB35dQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 1, 2016 at 2:21 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
wrote:

> On 30 Nov 2016 at 14:53, "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com> wrote:
> >
> >
> >
> > 2016-11-30 13:38 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
> >>
> >> Pavel Stehule wrote:
> >> > 2016-11-30 2:40 GMT+01:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:
> >> >
> >> > > On 30 November 2016 at 05:36, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> >> > > wrote:
> >> > >
> >> > > > The problem is in unreserved keyword "PASSING" probably.
> >> > >
> >> > > Yeah, I think that's what I hit when trying to change it.
> >> > >
> >> > > Can't you just parenthesize the expression to use operators like ||
> >> > > etc? If so, not a big deal.
> >> > >
> >> > ???
> >>
> >> "'(' a_expr ')'" is a c_expr; Craig suggests that we can just tell users
> >> to manually add parens around any expressions that they want to use.
> >> That's not necessary most of the time since we've been able to use
> >> b_expr in most places.
> >
>
> still there are one c_expr, but without new reserved word there are not
> change to reduce it.
>
>
>
Moved to next CF with the same status (ready for committer).

Regards,
Hari Babu
Fujitsu Australia


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-12-02 16:25:08
Message-ID: 20161202162508.2iehvemlttr5qgna@alvherre.pgsql
Lists: pgsql-hackers

Hm, you omitted tableexpr.h from the v15 patch ...

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-12-02 19:22:00
Message-ID: CAFj8pRDSGrnfAFKjVNy4knnNUk7bbvQeJC2882Nnd3cr2hH9bw@mail.gmail.com
Lists: pgsql-hackers

2016-12-02 17:25 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Hm, you omitted tableexpr.h from the v15 patch ...
>

I am sorry

should be ok now

Regards

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Attachment Content-Type Size
xmltable-16.patch text/x-patch 167.9 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-12-02 22:25:50
Message-ID: 20161202222550.t5v23evjddgmkfly@alvherre.pgsql
Lists: pgsql-hackers

Here's version 17. I have made significant changes here.

1. Restructure the execQual code. Instead of a PG_TRY wrapper, I have
split this code in three pieces; there's the main code, which has the PG_TRY
wrappers and is mainly in charge of the builderCxt pointer. In the
previous coding there was a shim that examined builderCxt but was not
responsible for setting it up, which was ugly. The second part is the
"initializer" which sets the row and column filters and does namespace
processing. The third part is the "FetchRow" logic. It seems to me
much cleaner this way.

2. rename the "builder" stuff to use the "routine" terminology. This is
in line with what we do for other function-pointer-filled structs, such
as FdwRoutine, IndexAmRoutine etc. I also cleaned up the names a bit
more.

3. Added a magic number to the table builder context struct, so that we
can barf appropriately. This is in line with PgXmlErrorContext --
mostly for future-proofing. I didn't test this too hard. Also, moved
the XmlTableContext struct declaration nearer the top of the file, as is
customary. (We don't really need it that way, since the functions are
all declared taking void *, but it seems cleaner to me anyway).

4. I added, edited, and fixed a large number of code comments.
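
As a rough illustration of points 2 and 3, a function-pointer "routine"
struct combined with a magic-number check on the opaque context could be
sketched as follows. Every name here (DemoTableContext, DemoTableRoutine,
DEMO_TABLE_CONTEXT_MAGIC and the functions) is hypothetical, the magic
value is arbitrary, and the real code would raise an error instead of
returning -1.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_TABLE_CONTEXT_MAGIC 46922182   /* arbitrary, hypothetical */

/* Private builder state; callers only ever see it as void *. */
typedef struct DemoTableContext
{
    int magic;      /* must equal DEMO_TABLE_CONTEXT_MAGIC */
    int rows_left;  /* rows still to be produced */
} DemoTableContext;

/* Function-pointer struct in the style of FdwRoutine / IndexAmRoutine. */
typedef struct DemoTableRoutine
{
    /* returns 1 if a row was produced, 0 at end, -1 on a bad context */
    int (*FetchRow) (void *opaque);
} DemoTableRoutine;

/* Validate the opaque pointer before trusting it; NULL means "not ours". */
DemoTableContext *
demo_get_context(void *opaque)
{
    DemoTableContext *cxt = (DemoTableContext *) opaque;

    if (cxt == NULL || cxt->magic != DEMO_TABLE_CONTEXT_MAGIC)
        return NULL;
    return cxt;
}

int
demo_fetch_row(void *opaque)
{
    DemoTableContext *cxt = demo_get_context(opaque);

    if (cxt == NULL)
        return -1;      /* the real code would report an error here */
    if (cxt->rows_left == 0)
        return 0;
    cxt->rows_left--;
    return 1;
}

const DemoTableRoutine DemoXmlRoutine = { demo_fetch_row };

/* Caller side: drive the builder only through the routine struct. */
int
demo_count_rows(const DemoTableRoutine *routine, void *opaque)
{
    int n = 0;

    assert(routine != NULL && routine->FetchRow != NULL);
    while (routine->FetchRow(opaque) == 1)
        n++;
    return n;
}
```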

This is looking much better now, but it still needs at least the
following changes.

First, we need to fix the per_rowset_memcxt thingy. I think the way
it's currently being used is rather ugly; it looks to me like the memory
context does not belong into the XmlTableContext struct at all.
Instead, the executor code should keep the memcxt pointer in a state
struct of its own, and it should be the executor's responsibility to
change to the appropriate context before calling the table builder
functions. In particular, this means that the table context can no
longer be a void * pointer; it needs to be a struct that's defined by
the executor (probably a primnodes.h one). The void * pointer is
stashed inside that struct. Also, the "routine" pointer should not be
part of the void * struct, but of the executor's struct. So the
execQual code can switch to the memory context, and destroy it
appropriately.
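
To make the ownership split concrete, here is a self-contained sketch
under loud assumptions: DemoContext is a toy stand-in for a memory
context that only counts allocations, demo_switch_to imitates the
"switch and return the previous context" convention, and none of the
names come from the patch. The point it shows is that the executor's
struct holds the routine, the context and the opaque pointer, and does
the switching itself around the method call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a memory context: it only counts allocations. */
typedef struct DemoContext
{
    int nallocs;
} DemoContext;

DemoContext  demo_top_context = {0};
DemoContext *demo_current_context = &demo_top_context;

/* Switch the current context and return the previous one. */
DemoContext *
demo_switch_to(DemoContext *cxt)
{
    DemoContext *old = demo_current_context;

    demo_current_context = cxt;
    return old;
}

/* Builder methods never pick a context; they use whatever is current. */
typedef struct DemoBuilderRoutine
{
    void (*FetchRow) (void *opaque);
} DemoBuilderRoutine;

/* Executor-owned state: routine, context, and the builder's private data. */
typedef struct DemoTableExprState
{
    const DemoBuilderRoutine *routine;
    DemoContext *per_rowset_cxt;
    void        *opaque;
} DemoTableExprState;

void
demo_builder_fetch_row(void *opaque)
{
    (void) opaque;
    demo_current_context->nallocs++;    /* pretend to allocate a row */
}

/* The executor switches context around the method call, in one place. */
void
demo_exec_fetch_row(DemoTableExprState *state)
{
    DemoContext *old;

    assert(state != NULL && state->routine != NULL);
    old = demo_switch_to(state->per_rowset_cxt);
    state->routine->FetchRow(state->opaque);
    demo_switch_to(old);
}
```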

Second, we should make gram.y set a new "function type" value in the
TableExpr it creates, so that the downstream code (transformTableExpr,
ExecInitExpr, ruleutils.c) really knows that the given function is
XmlTableExpr, instead of guessing just because it's the only implemented
case. Probably this "function type" is an enum (currently with a single
value TableExprTypeXml or something like that) in primnodes.
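
A minimal sketch of such a tag, assuming hypothetical names throughout
(DemoTableExprType, DemoTableExpr, demo_table_func_name): the parser
would set the field once, and downstream code would dispatch on it
rather than assume XMLTABLE.

```c
#include <assert.h>
#include <stddef.h>

typedef enum DemoTableExprType
{
    DemoTableExprTypeXml        /* the only value for now */
} DemoTableExprType;

typedef struct DemoTableExpr
{
    DemoTableExprType functype; /* set by the parser */
} DemoTableExpr;

/* Downstream code dispatches on the tag; NULL means "unknown type". */
const char *
demo_table_func_name(const DemoTableExpr *te)
{
    assert(te != NULL);

    switch (te->functype)
    {
        case DemoTableExprTypeXml:
            return "XMLTABLE";
    }
    return NULL;                /* unrecognized function type */
}
```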

Finally, there's the pending task of renaming and moving
ExecTypeFromTableExpr to some better place. Not sure that moving it
back to nodeFuncs is a nice idea. Looks to me like calling it from
ExprTypmod is a rather ugly idea.

Hmm, ruleutils ... not sure what to think of this one.

The typedefs.list changes are just used to pgindent the affected code
correctly. It's not for commit.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-Create-sect3-in-the-functions-xml-section.patch text/plain 6.6 KB
0002-xmltable-17.patch text/plain 166.7 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-12-03 07:28:50
Message-ID: CAFj8pRDLadjfDBS9FHki4xkquU9161XjFA7jbrrsiL244HQ4Gg@mail.gmail.com
Lists: pgsql-hackers

Hi

2016-12-02 23:25 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Here's version 17. I have made significant changes here.
>
> 1. Restructure the execQual code. Instead of a PG_TRY wrapper, I have
> split this code in three pieces; there's the main code with the PG_TRY
> wrappers and is mainly in charge of the builderCxt pointer. In the
> previous coding there was a shim that examined builderCxt but was not
> responsible for setting it up, which was ugly. The second part is the
> "initializer" which sets the row and column filters and does namespace
> processing. The third part is the "FetchRow" logic. It seems to me
> much cleaner this way.
>
> 2. rename the "builder" stuff to use the "routine" terminology. This is
> in line with what we do for other function-pointer-filled structs, such
> as FdwRoutine, IndexAmRoutine etc. I also cleaned up the names a bit
> more.
>
> 3. Added a magic number to the table builder context struct, so that we
> can barf appropriately. This is in line with PgXmlErrorContext --
> mostly for future-proofing. I didn't test this too hard. Also, moved
> the XmlTableContext struct declaration nearer the top of the file, as is
> customary. (We don't really need it that way, since the functions are
> all declared taking void *, but it seems cleaner to me anyway).
>
> 4. I added, edited, and fixed a large number of code comments.
>
> This is looking much better now, but it still needs at least the
> following changes.
>
> First, we need to fix is the per_rowset_memcxt thingy. I think the way
> it's currently being used is rather ugly; it looks to me like the memory
> context does not belong into the XmlTableContext struct at all.
> Instead, the executor code should keep the memcxt pointer in a state
> struct of its own, and it should be the executor's responsibility to
> change to the appropriate context before calling the table builder
> functions. In particular, this means that the table context can no
> longer be a void * pointer; it needs to be a struct that's defined by
> the executor (probably a primnodes.h one). The void * pointer is
> stashed inside that struct. Also, the "routine" pointer should not be
> part of the void * struct, but of the executor's struct. So the
> execQual code can switch to the memory context, and destroy it
> appropriately.
>
> Second, we should make gram.y set a new "function type" value in the
> TableExpr it creates, so that the downstream code (transformTableExpr,
> ExecInitExpr, ruleutils.c) really knows that the given function is
> XmlTableExpr, instead of guessing just because it's the only implemented
> case. Probably this "function type" is an enum (currently with a single
> value TableExprTypeXml or something like that) in primnodes.
>

That makes sense. I was not sure about it because, as you mentioned, there
is currently only one value.

>
> Finally, there's the pending task of renaming and moving
> ExecTypeFromTableExpr to some better place. Not sure that moving it
> back to nodeFuncs is a nice idea. Looks to me like calling it from
> ExprTypmod is a rather ugly idea.
>

The code is related to primnodes; it is used in more places than just the executor.

>
> Hmm, ruleutils ... not sure what to think of this one.
>

It is a little bit more complex, but that reflects the complexity of XMLTABLE.

>
> The typedefs.list changes are just used to pgindent the affected code
> correctly. It's not for commit.
>

The documentation is very precise. Nice.

+ /* XXX OK to do this? looks a bit out of place ... */
+ assign_record_type_typmod(typeInfo);

I think it is ok. It is a tupdesc without a fixed typid or typname that is
used in the returned value, so you have to register this tupdesc in the typcache.

Regards

Pavel

> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-12-03 15:03:58
Message-ID: 20161203150358.4npcgzjxcxysqlcd@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:

> 2016-12-02 23:25 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> > This is looking much better now, but it still needs at least the
> > following changes.
> >
> > First, we need to fix is the per_rowset_memcxt thingy. I think the way
> > it's currently being used is rather ugly; it looks to me like the memory
> > context does not belong into the XmlTableContext struct at all.
> > Instead, the executor code should keep the memcxt pointer in a state
> > struct of its own, and it should be the executor's responsibility to
> > change to the appropriate context before calling the table builder
> > functions. In particular, this means that the table context can no
> > longer be a void * pointer; it needs to be a struct that's defined by
> > the executor (probably a primnodes.h one). The void * pointer is
> > stashed inside that struct. Also, the "routine" pointer should not be
> > part of the void * struct, but of the executor's struct. So the
> > execQual code can switch to the memory context, and destroy it
> > appropriately.
> >
> > Second, we should make gram.y set a new "function type" value in the
> > TableExpr it creates, so that the downstream code (transformTableExpr,
> > ExecInitExpr, ruleutils.c) really knows that the given function is
> > XmlTableExpr, instead of guessing just because it's the only implemented
> > case. Probably this "function type" is an enum (currently with a single
> > value TableExprTypeXml or something like that) in primnodes.
>
> It has sense - I was not sure about it - because currently it is only one
> value, you mentioned it.

True. This is a minor point.

Are you able to do the memory context change I describe?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-12-04 22:00:17
Message-ID: CAFj8pRDJ-djtq1NbtyACp4wZxcSix7H-hkk+W6r=rG+yMO8y+w@mail.gmail.com
Lists: pgsql-hackers

2016-12-03 16:03 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
>
> > 2016-12-02 23:25 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
> > > This is looking much better now, but it still needs at least the
> > > following changes.
> > >
> > > First, we need to fix is the per_rowset_memcxt thingy. I think the way
> > > > it's currently being used is rather ugly; it looks to me like the memory
> > > > context does not belong into the XmlTableContext struct at all.
> > > Instead, the executor code should keep the memcxt pointer in a state
> > > struct of its own, and it should be the executor's responsibility to
> > > change to the appropriate context before calling the table builder
> > > functions. In particular, this means that the table context can no
> > > longer be a void * pointer; it needs to be a struct that's defined by
> > > the executor (probably a primnodes.h one). The void * pointer is
> > > stashed inside that struct. Also, the "routine" pointer should not be
> > > part of the void * struct, but of the executor's struct. So the
> > > execQual code can switch to the memory context, and destroy it
> > > appropriately.
> > >
> > > Second, we should make gram.y set a new "function type" value in the
> > > TableExpr it creates, so that the downstream code (transformTableExpr,
> > > ExecInitExpr, ruleutils.c) really knows that the given function is
> > > > XmlTableExpr, instead of guessing just because it's the only implemented
> > > > case. Probably this "function type" is an enum (currently with a single
> > > > value TableExprTypeXml or something like that) in primnodes.
> >
> > It has sense - I was not sure about it - because currently it is only one
> > value, you mentioned it.
>
> True. This is a minor point.
>
> Are you able to do the memory context change I describe?
>

I am not sure I understood your ideas correctly; please check the attached
patch.

Regards

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Attachment Content-Type Size
xmltable-18.patch text/x-patch 174.1 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-12-04 22:12:55
Message-ID: CAFj8pRDFD+W-=4oE5WjiRK4N9MtQp=Gqx=yZgfrPiCMeeFTzdw@mail.gmail.com
Lists: pgsql-hackers

2016-12-04 23:00 GMT+01:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

>
>
> 2016-12-03 16:03 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
>> Pavel Stehule wrote:
>>
>> > 2016-12-02 23:25 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>>
>> > > This is looking much better now, but it still needs at least the
>> > > following changes.
>> > >
>> > > First, we need to fix is the per_rowset_memcxt thingy. I think the way
>> > > it's currently being used is rather ugly; it looks to me like the memory
>> > > context does not belong into the XmlTableContext struct at all.
>> > > Instead, the executor code should keep the memcxt pointer in a state
>> > > struct of its own, and it should be the executor's responsibility to
>> > > change to the appropriate context before calling the table builder
>> > > functions. In particular, this means that the table context can no
>> > > longer be a void * pointer; it needs to be a struct that's defined by
>> > > the executor (probably a primnodes.h one). The void * pointer is
>> > > stashed inside that struct. Also, the "routine" pointer should not be
>> > > part of the void * struct, but of the executor's struct. So the
>> > > execQual code can switch to the memory context, and destroy it
>> > > appropriately.
>> > >
>> > > Second, we should make gram.y set a new "function type" value in the
>> > > TableExpr it creates, so that the downstream code (transformTableExpr,
>> > > ExecInitExpr, ruleutils.c) really knows that the given function is
>> > > XmlTableExpr, instead of guessing just because it's the only implemented
>> > > case. Probably this "function type" is an enum (currently with a single
>> > > value TableExprTypeXml or something like that) in primnodes.
>> >
>> > It has sense - I was not sure about it - because currently it is only one
>> > value, you mentioned it.
>>
>> True. This is a minor point.
>>
>> Are you able to do the memory context change I describe?
>>
>
> I am not sure if I understand well to your ideas - please, check attached
> patch.
>

Attached is the patch without your patch 0001.

Regards

Pavel

>
> Regards
>
> Pavel
>
>
>>
>> --
>> Álvaro Herrera https://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>

Attachment Content-Type Size
xmltable-19.patch text/x-patch 168.7 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-12-04 23:45:54
Message-ID: 20161204234554.nti6xxx2j5l7533n@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:
> 2016-12-04 23:00 GMT+01:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

> > I am not sure if I understand well to your ideas - please, check attached
> > patch.

Thanks, that's what I meant, but I think you went a bit overboard
creating new functions in execQual -- seems to me it would work just
fine to have the memory switches in the same function, rather than
having a number of separate functions just to change the context then
call the method. Please remove these shim functions.

Also, you forgot to remove the now-unused per_rowset_memcxt struct member.

Also, please rename "rc" to something more meaningful -- maybe
"rowcount" is good enough. And "doc" would perhaps be better as
"document".

I'm not completely sure the structs are really sensible yet. I may do
some more changes tomorrow.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-12-05 06:21:37
Message-ID: CAFj8pRAimQQuK4s9uhyiP0JBVZHYgiaqqdrMOcERuScQKwwiZA@mail.gmail.com
Lists: pgsql-hackers

2016-12-05 0:45 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
> > 2016-12-04 23:00 GMT+01:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>
> > > I am not sure if I understand well to your ideas - please, check
> attached
> > > patch.
>
> Thanks, that's what I meant, but I think you went a bit overboard
> creating new functions in execQual -- seems to me it would work just
> fine to have the memory switches in the same function, rather than
> having a number of separate functions just to change the context then
> call the method. Please remove these shim functions.
>

done

>
> Also, you forgot to remove the now-unused per_rowset_memcxt struct member.
>

done

>
> Also, please rename "rc" to something more meaningful -- maybe
> "rowcount" is good enough. And "doc" would perhaps be better as
> "document".
>

done

Regards

Pavel

>
> I'm not completely sure the structs are really sensible yet. I may do
> some more changes tomorrow.
>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Attachment Content-Type Size
xmltable-20.patch text/x-patch 166.0 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-12-07 07:14:39
Message-ID: 20161207071439.qhfwbszhmmgqedyf@alvherre.pgsql
Lists: pgsql-hackers

Here's v21.

* I changed the grammar by moving the NOT NULL to the column options,
and removing the IsNotNull production. It wasn't nice that "NOT NULL
DEFAULT 0" was not accepted, which it is with the new representation.

* The tuple that's returned is natively a TupleTableSlot inside the
table builder, not directly a HeapTuple. That stuff was ugly and wasn't
using the proper abstraction anyway.

* I changed the signatures of the methods so that they receive
TableExprState, and restructured the "opaque" data to be inside
TableExprState. Now we don't need to have things such as the tupdesc or
the input functions be repeated in the opaque struct. Instead they
belong to the TableExprState and the methods can read them from there.
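
A rough sketch of that shape, with invented names (`ToyTableExprState`, `ToyBuilderRoutine`, `ToyXmlPrivate` are illustrative only and do not match the structs in the patch): shared data lives once in the state struct, each method receives that state, and the opaque pointer holds only what is private to one builder.

```c
#include <assert.h>
#include <stddef.h>

struct ToyTableExprState;

/* Method table: every method receives the shared state. */
typedef struct ToyBuilderRoutine
{
    void (*init) (struct ToyTableExprState *state);
    int  (*fetch_row) (struct ToyTableExprState *state);
} ToyBuilderRoutine;

typedef struct ToyTableExprState
{
    int         ncolumns;   /* shared data: stored here once, not copied */
    const ToyBuilderRoutine *routine;
    void       *opaque;     /* builder-private data only */
} ToyTableExprState;

/* Private data of one hypothetical builder. */
typedef struct ToyXmlPrivate { int rows_left; } ToyXmlPrivate;

static ToyXmlPrivate toy_private;

static void
toy_xml_init(ToyTableExprState *state)
{
    toy_private.rows_left = 2;
    state->opaque = &toy_private;
}

static int
toy_xml_fetch_row(ToyTableExprState *state)
{
    ToyXmlPrivate *priv = state->opaque;

    /* Shared information is read from the state, not from the private data. */
    assert(state->ncolumns > 0);

    if (priv->rows_left == 0)
        return 0;
    priv->rows_left--;
    return 1;
}

static const ToyBuilderRoutine toy_xml_routine = {
    toy_xml_init,
    toy_xml_fetch_row
};
```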

I managed to break the case with no COLUMNS. Probably related to the
tupdesc changes. It now crashes the regression test. Too tired to
debug now; care to take a look? The other stuff seems to run fine,
though of course the regression test crashes in the middle, so perhaps
there are other problems.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
xmltable-21.patch text/plain 166.2 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-12-07 13:47:50
Message-ID: CAFj8pRA3nUwTM=ZSeznS=0pdYoeZ9pj-CsbK-TcLrVR-3bjahw@mail.gmail.com
Lists: pgsql-hackers

2016-12-07 8:14 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Here's v21.
>
> * I changed the grammar by moving the NOT NULL to the column options,
> and removing the IsNotNull production. It wasn't nice that "NOT NULL
> DEFAULT 0" was not accepted, which it is with the new representation.
>
> * The tuple that's returned is natively a TupleTableSlot inside the
> table builder, not directly a HeapTuple. That stuff was ugly and wasn't
> using the proper abstraction anyway.
>
> * I changed the signatures of the methods so that they receive
> TableExprState, and restructured the "opaque" data to be inside
> TableExprState. Now we don't need to have things such as the tupdesc or
> the input functions be repeated in the opaque struct. Instead they
> belong to the TableExprState and the methods can read them from there.
>
> I managed to break the case with no COLUMNS. Probably related to the
> tupdesc changes. It now crashes the regression test. Too tired to
> debug now; care to take a look? The other stuff seems to run fine,
> though of course the regression test crashes in the middle, so perhaps
> there are other problems.
>

I fixed two issues.

1. no column data was returned when no explicit column was given - fixed

2. the NOT NULL flag was set the wrong way around

all tests pass now

Regards

Pavel

> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Attachment Content-Type Size
xmltable-22.patch text/x-patch 166.4 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-12-07 17:34:59
Message-ID: 20161207173459.hv7cltspcjnvj732@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:

> I fixed two issues.
>
> 2. there was reverse setting in NOT NULL flag

Ah-hah, that was silly, thanks.

> 1. there are not columns data when there are not any explicit column - fixed

Hmm. Now that I see how this works, by having the GetValue "guess" what
is going on and have a special case for it, I actually don't like it
very much. It seems way too magical. I think we should do away with
the "if column is NULL" case in GetValue, and instead inject a column
during transformTableExpr if columns is NIL. This has implications on
ExecInitExpr too, which currently checks for an empty column list -- it
would no longer have to do so.

Maybe this means we need an additional method, which would request "the
expr that returns the whole row", so that transformExpr can work for
XmlTable (which I think would be something like "./") and the future
JsonTable stuff (I don't know how that one would work, but I assume it's
not necessarily the same thing).
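
To make the idea concrete, here is a small sketch under stated assumptions: the types and functions (`ToyTableExpr`, `toy_transform_columns`, `whole_row_path`) are made up for illustration, and the "./" path is only the guess from the paragraph above. The transform step injects one ordinary column when none was given, so later code has no empty-list special case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_COLUMNS 8

typedef struct ToyColumn
{
    const char *name;
    const char *path;
} ToyColumn;

typedef struct ToyTableExpr
{
    int         ncolumns;
    ToyColumn   columns[TOY_MAX_COLUMNS];
    /* Hypothetical extra method: "the expr that returns the whole row". */
    const char *(*whole_row_path) (void);
} ToyTableExpr;

/* What an XML builder might answer; "./" is an assumption. */
static const char *
toy_xml_whole_row_path(void)
{
    return "./";
}

/* Transform step: if no COLUMNS were given, inject one normal column. */
static void
toy_transform_columns(ToyTableExpr *expr, const char *default_name)
{
    assert(expr->ncolumns <= TOY_MAX_COLUMNS);

    if (expr->ncolumns == 0)
    {
        expr->columns[0].name = default_name;
        expr->columns[0].path = expr->whole_row_path();
        expr->ncolumns = 1;
    }
}

/* Later stages treat every column alike; no check for an empty list. */
static int
toy_count_paths(const ToyTableExpr *expr)
{
    int n = 0;

    for (int i = 0; i < expr->ncolumns; i++)
        if (expr->columns[i].path != NULL)
            n++;
    return n;
}
```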

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-12-07 19:37:23
Message-ID: CAFj8pRBsrhwR636-_3TPbqu=Fo3_DDer6_yp_afzR7qzhW1T6Q@mail.gmail.com
Lists: pgsql-hackers

2016-12-07 18:34 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
>
> > I fixed two issues.
> >
> > 2. there was reverse setting in NOT NULL flag
>
> Ah-hah, that was silly, thanks.
>
> > 1. there are not columns data when there are not any explicit column -
> fixed
>
> Hmm. Now that I see how this works, by having the GetValue "guess" what
> is going on and have a special case for it, I actually don't like it
> very much. It seems way too magical. I think we should do away with
> the "if column is NULL" case in GetValue, and instead inject a column
> during transformTableExpr if columns is NIL. This has implications on
> ExecInitExpr too, which currently checks for an empty column list -- it
> would no longer have to do so.
>

I prefer this way over the second one described. The implementation should be
in the table builder routines, not in the executor.

sending new update

Regards

Pavel

>
> Maybe this means we need an additional method, which would request "the
> expr that returns the whole row", so that transformExpr can work for
> XmlTable (which I think would be something like "./") and the future
> JsonTable stuff (I don't know how that one would work, but I assume it's
> not necessarily the same thing).
>

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Attachment Content-Type Size
xmltable-23.patch text/x-patch 167.0 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-12-07 19:50:49
Message-ID: 20161207195049.a4vpqongvh7rnhur@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:
> 2016-12-07 18:34 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> > Hmm. Now that I see how this works, by having the GetValue "guess" what
> > is going on and have a special case for it, I actually don't like it
> > very much. It seems way too magical. I think we should do away with
> > the "if column is NULL" case in GetValue, and instead inject a column
> > during transformTableExpr if columns is NIL. This has implications on
> > ExecInitExpr too, which currently checks for an empty column list -- it
> > would no longer have to do so.
>
> I prefer this way against second described. The implementation should be in
> table builder routines, not in executor.

Well, given the way you have implemented it, I would prefer the original
too. But your v23 is not what I meant. Essentially what you do in v23
is to communicate the lack of COLUMNS clause in a different way --
previously it was "ncolumns = 0", now it's "is_auto_col=true". It's
still "magic". It's not an improvement.

What I want to happen is that there is no magic at all; it's up to
transformExpr to make sure that when COLUMNS is empty, one column
appears and it must not be a magic column that makes the xml.c code act
differently, but rather to xml.c it should appear that this is just a
normal column that happens to return the entire row. If I say "COLUMNS
foo PATH '/'" I should be able to obtain a similar behavior (except that
in the current code, if I ask for "COLUMNS foo XML PATH '/'" I don't get
XML at all but rather weird text where all tags have been stripped out,
which is very strange. I would expect the tags to be preserved if the
output type is XML. Maybe the tag-stripping behavior should occur if
the output type is some type of text.)

I still have to figure out how to fix the tupledesc thing. What we have
now is not good.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-12-07 20:48:20
Message-ID: CAFj8pRBQL1atOTTzhHmNVqSuyyfP9_TpEre5d8w_Dgi_Oks3JQ@mail.gmail.com
Lists: pgsql-hackers

2016-12-07 20:50 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
> > 2016-12-07 18:34 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
> > > Hmm. Now that I see how this works, by having the GetValue "guess"
> what
> > > is going on and have a special case for it, I actually don't like it
> > > very much. It seems way too magical. I think we should do away with
> > > the "if column is NULL" case in GetValue, and instead inject a column
> > > during transformTableExpr if columns is NIL. This has implications on
> > > ExecInitExpr too, which currently checks for an empty column list -- it
> > > would no longer have to do so.
> >
> > I prefer this way against second described. The implementation should be
> in
> > table builder routines, not in executor.
>
> Well, given the way you have implemented it, I would prefer the original
> too. But your v23 is not what I meant. Essentially what you do in v23
> is to communicate the lack of COLUMNS clause in a different way --
> previously it was "ncolumns = 0", now it's "is_auto_col=true". It's
> still "magic". It's not an improvement.
>

is_auto_col is used primarily for assertions. The table builder gets the
information it needs for the decision from the path parameter, when path is NULL.

It is hard to say whether this information should be attached to the column or
to the table. Both places make sense. But there has to be a flag somewhere.

>
> What I want to happen is that there is no magic at all; it's up to
> transformExpr to make sure that when COLUMNS is empty, one column
> appears and it must not be a magic column that makes the xml.c code act
> differently, but rather to xml.c it should appear that this is just a
> normal column that happens to return the entire row. If I say "COLUMNS
> foo PATH '/'" I should be able to obtain a similar behavior (except that
> in the current code, if I ask for "COLUMNS foo XML PATH '/'" I don't get
> XML at all but rather weird text where all tags have been stripped out,
> which is very strange. I would expect the tags to be preserved if the
> output type is XML. Maybe the tag-stripping behavior should occur if
> the output type is some type of text.)
>

I am doing this. I am just using NULL for PATH.

>
>
> I still have to figure out how to fix the tupledesc thing. What we have
> now is not good.
>

Can't it be moved to nodefunc?

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-12-18 15:27:54
Message-ID: CAFj8pRBn-8MbY8xaBJ7K-qogyeLDAuqPdh6X8HssnGD-m+Pk+w@mail.gmail.com
Lists: pgsql-hackers

2016-12-07 20:37 GMT+01:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

>
>
> 2016-12-07 18:34 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
>> Pavel Stehule wrote:
>>
>> > I fixed two issues.
>> >
>> > 2. there was reverse setting in NOT NULL flag
>>
>> Ah-hah, that was silly, thanks.
>>
>> > 1. there are not columns data when there are not any explicit column -
>> fixed
>>
>> Hmm. Now that I see how this works, by having the GetValue "guess" what
>> is going on and have a special case for it, I actually don't like it
>> very much. It seems way too magical. I think we should do away with
>> the "if column is NULL" case in GetValue, and instead inject a column
>> during transformTableExpr if columns is NIL. This has implications on
>> ExecInitExpr too, which currently checks for an empty column list -- it
>> would no longer have to do so.
>>
>
> I prefer this way against second described. The implementation should be
> in table builder routines, not in executor.
>
> sending new update
>

new update - no functional changes, just fixing breakage after the latest
changes in master.

Regards

Pavel

>
> Regards
>
> Pavel
>
>
>>
>> Maybe this means we need an additional method, which would request "the
>> expr that returns the whole row", so that transformExpr can work for
>> XmlTable (which I think would be something like "./") and the future
>> JsonTable stuff (I don't know how that one would work, but I assume it's
>> not necessarily the same thing).
>>
>
>
>
>
>>
>> --
>> Álvaro Herrera https://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>

Attachment Content-Type Size
xmltable-24.patch text/x-patch 167.2 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-12-22 16:11:47
Message-ID: CAFj8pRDf66UX6807z1QPvF8Af9LEunJ=JxVpF0PPFVMMm+ND5Q@mail.gmail.com
Lists: pgsql-hackers

2016-12-18 16:27 GMT+01:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

>
>
> 2016-12-07 20:37 GMT+01:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>
>>
>>
>> 2016-12-07 18:34 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>>
>>> Pavel Stehule wrote:
>>>
>>> > I fixed two issues.
>>> >
>>> > 2. there was reverse setting in NOT NULL flag
>>>
>>> Ah-hah, that was silly, thanks.
>>>
>>> > 1. there are not columns data when there are not any explicit column -
>>> fixed
>>>
>>> Hmm. Now that I see how this works, by having the GetValue "guess" what
>>> is going on and have a special case for it, I actually don't like it
>>> very much. It seems way too magical. I think we should do away with
>>> the "if column is NULL" case in GetValue, and instead inject a column
>>> during transformTableExpr if columns is NIL. This has implications on
>>> ExecInitExpr too, which currently checks for an empty column list -- it
>>> would no longer have to do so.
>>>
>>
>> I prefer this way against second described. The implementation should be
>> in table builder routines, not in executor.
>>
>> sending new update
>>
>
> new update - no functional changes, just unbreaking after last changes in
> master.
>

another update - a lot of cleanup

Regards

Pavel

>
> Regards
>
> Pavel
>
>
>>
>> Regards
>>
>> Pavel
>>
>>
>>>
>>> Maybe this means we need an additional method, which would request "the
>>> expr that returns the whole row", so that transformExpr can work for
>>> XmlTable (which I think would be something like "./") and the future
>>> JsonTable stuff (I don't know how that one would work, but I assume it's
>>> not necessarily the same thing).
>>>
>>
>>
>>
>>
>>>
>>> --
>>> Álvaro Herrera https://www.2ndQuadrant.com/
>>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>>
>>
>>
>

Attachment Content-Type Size
xmltable-25.patch text/x-patch 162.7 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-12-22 16:27:17
Message-ID: 20161222162717.dqij7lcixjb747p5@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:

> another update - lot of cleaning

Ah, the tupledesc stuff in this one appears much more reasonable to me.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-11 21:05:12
Message-ID: 20170111210512.hccaz34s7e2jk3dk@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:

> another update - lot of cleaning

Thanks.

The more I look at this, the less I like using NamedArgExpr for
namespaces. It looks all wrong to me, and it causes ugly code all over.
Maybe I just need to look at it a bit longer.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-11 21:53:25
Message-ID: 20170111215325.4zb2arpn3efcl3en@alvherre.pgsql
Lists: pgsql-hackers

Alvaro Herrera wrote:

> The more I look at this, the less I like using NamedArgExpr for
> namespaces. It looks all wrong to me, and it causes ugly code all over.
> Maybe I just need to look at it a bit longer.

I think it'd be cleaner to use ResTarget for the namespaces, like
xml_attribute_el does, and split the names from actual exprs in the same
way. So things like ExecInitExpr become simpler because you just
recurse to initialize the list, without having to examine each element
individually. tabexprInitialize can just do forboth().

The main reason I don't like abusing NamedArgExpr is that the whole
comment that explains it becomes a lie.
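
A minimal sketch of the "names split from exprs" layout, assuming plain arrays in place of PostgreSQL Lists (all names here, such as `ToyNamespaces` and `toy_init_namespaces`, are hypothetical): the initializer just walks both sequences in lockstep, the way a forboth() loop would, without inspecting the type of each element.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyNamespaces
{
    int          count;
    const char **names;      /* a NULL entry stands for the DEFAULT namespace */
    const char **uri_exprs;  /* same length and order as names */
} ToyNamespaces;

/* Callback that registers one (name, uri) pair with the builder. */
typedef void (*toy_register_fn) (const char *name, const char *uri, void *arg);

/* Walk both arrays in lockstep; no per-element unpacking is needed. */
static void
toy_init_namespaces(const ToyNamespaces *ns, toy_register_fn reg, void *arg)
{
    assert(ns != NULL);

    for (int i = 0; i < ns->count; i++)
        reg(ns->names[i], ns->uri_exprs[i], arg);
}

/* Example callback: count the namespaces that carry an explicit name. */
static void
toy_count_named(const char *name, const char *uri, void *arg)
{
    (void) uri;
    if (name != NULL)
        (*(int *) arg)++;
}
```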

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
xmltable-26.patch text/plain 160.0 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-12 15:02:07
Message-ID: CAFj8pRBTmHsJ-mU4n75JhuS8-HK1YyDPYF6LQbPd3D1fDwNKXw@mail.gmail.com
Lists: pgsql-hackers

Hi

2017-01-11 22:53 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Alvaro Herrera wrote:
>
> > The more I look at this, the less I like using NamedArgExpr for
> > namespaces. It looks all wrong to me, and it causes ugly code all over.
> > Maybe I just need to look at it a bit longer.
>
> I think it'd be cleaner to use ResTarget for the namespaces, like
> xml_attribute_el does, and split the names from actual exprs in the same
> way. So things like ExecInitExpr become simpler because you just
> recurse to initialize the list, without having to examine each element
> individually. tabexprInitialize can just do forboth().
>
> The main reason I don't like abusing NamedArgExpr is that the whole
> comment that explains it becomes a lie.
>

I used your proposed approach based on ResTarget

Updated patch attached.

Regards

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Attachment Content-Type Size
xmltable-27.patch text/x-patch 162.6 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-13 20:32:08
Message-ID: 20170113203208.wtjnjgubhjpjulrq@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:
>
> I used your proposed way based on Restarget

Thanks. Some more tweaking to go yet before I consider this
committable, but it's much better now. Here's v28. I changed a few
things:

- make expression evaluation code more orthodox:
1. avoid PG_TRY, use a ExprContext shutdown callback instead
2. use a "Fast" evaluator, for calls past the first one
3. don't look up fmgrinfos until execution time
4. don't duplicate get_expr_result_type
- make parser accept DEFAULT namespace. Only xml implementation barfs.
(this means we lost the errposition pointer, but I don't really
care. We could fix it if we cared)
- clean up parse analysis code a little bit
- move decls/struct defs to better locations in source code
- remove leftover "namespaces" in TableExprState
- pgindent the whole mess.
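
For readers unfamiliar with the "Fast" evaluator idiom in item 2, here is a minimal sketch. The names (`ToyExprState`, `toy_eval_first`, `toy_eval_fast`) are made up and this is not the patch's code: the first call does the one-time setup and then replaces the function pointer, so later calls go straight to the fast path.

```c
#include <assert.h>

struct ToyExprState;
typedef int (*ToyEvalFunc) (struct ToyExprState *state);

typedef struct ToyExprState
{
    ToyEvalFunc evalfunc;     /* callers always go through this pointer */
    int         setup_calls;  /* how many times the one-time setup ran */
    int         value;
} ToyExprState;

/* Fast path: assumes setup has already happened. */
static int
toy_eval_fast(ToyExprState *state)
{
    assert(state->setup_calls > 0);
    return state->value;
}

/* First-call path: do the setup, then swap in the fast evaluator. */
static int
toy_eval_first(ToyExprState *state)
{
    state->setup_calls++;
    state->value = 42;
    state->evalfunc = toy_eval_fast;

    return toy_eval_fast(state);
}
```

Callers always invoke `state->evalfunc(state)`, so they do not need to know which of the two functions is currently installed.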

I don't like the xml.c code and the "evalcols" flag. That's next on my
list to fix.

I don't think to_xmlstr() is necessary, considering xml_text2xmlChar.
We could just apply a cast of the source cstring to xmlChar.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
xmltable-28.patch text/plain 160.8 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-14 13:43:26
Message-ID: CAFj8pRA2UCwquoqxx38QFLMPxDPT=h9nNFq02fYaEVqh=-RJNQ@mail.gmail.com
Lists: pgsql-hackers

Hi

2017-01-13 21:32 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
> >
> > I used your proposed way based on Restarget
>
> Thanks. Some more tweaking to go yet before I consider this
> committable, but it's much better now. Here's v28. I changed a few
> things:
>
> - make expression evaluation code more orthodox:
> 1. avoid PG_TRY, use a ExprContext shutdown callback instead
> 2. use a "Fast" evaluator, for calls past the first one
> 3. don't look up fmgrinfos until execution time
> 4. don't duplicate get_expr_result_type
> - make parser accept DEFAULT namespace. Only xml implementation barfs.
> (this means we lost the errposition pointer, but I don't really
> care. We could fix it if we cared)
> - clean up parse analysis code a little bit
> - move decls/struct defs to better locations in source code
> - remove leftover "namespaces" in TableExprState
> - pgindent the whole mess.
>
>
I checked the changes and they look correct - although for some of them I
would not have had the courage :) - like the dynamic change of exprstate->evalfunc

I fixed the test and added the forgotten header file

> I don't like the xml.c code and the "evalcols" flag. That's next on my
> list to fix.
>

You need some flag to specify if column paths are valid or not.

> I don't think to_xmlstr() is necessary, considering xml_text2xmlChar.
> We could just apply a cast of the source cstring to xmlChar.
>

is it safe? For one byte encodings?

Regards

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Attachment Content-Type Size
xmltable-29.patch text/x-patch 163.5 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-15 06:29:22
Message-ID: CAFj8pRA_KEukOBXvS4V-imoEEsXu0pD0AsHV0-MwRFDRWte8Lg@mail.gmail.com
Lists: pgsql-hackers

2017-01-14 14:43 GMT+01:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

> Hi
>
> 2017-01-13 21:32 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
>> Pavel Stehule wrote:
>> >
>> > I used your proposed way based on Restarget
>>
>> Thanks. Some more tweaking to go yet before I consider this
>> committable, but it's much better now. Here's v28. I changed a few
>> things:
>>
>> - make expression evaluation code more orthodox:
>> 1. avoid PG_TRY, use a ExprContext shutdown callback instead
>> 2. use a "Fast" evaluator, for calls past the first one
>> 3. don't look up fmgrinfos until execution time
>> 4. don't duplicate get_expr_result_type
>> - make parser accept DEFAULT namespace. Only xml implementation barfs.
>> (this means we lost the errposition pointer, but I don't really
>> care. We could fix it if we cared)
>> - clean up parse analysis code a little bit
>> - move decls/struct defs to better locations in source code
>> - remove leftover "namespaces" in TableExprState
>> - pgindent the whole mess.
>>
>>
> I checked the changes and looks correct - although for some I had not
> courage :) - like dynamic change of exprstate->evalfunc
>
> I fixed test, and append forgotten header file
>
>
>
>
>> I don't like the xml.c code and the "evalcols" flag. That's next on my
>> list to fix.
>>
>
> You need some flag to specify if column paths are valid or not.
>
>
>> I don't think to_xmlstr() is necessary, considering xml_text2xmlChar.
>> We could just apply a cast of the source cstring to xmlChar.
>>
>
> is it safe? For one byte encodings?
>

It looks like this patch breaks the regression tests

Restoring database schemas in the new cluster
\"\ cdefghijklmnopqrstuvwxyz{|}~

regression
*failure*

Consult the last few lines of "pg_upgrade_dump_16387.log" for
the probable cause of the failure.
Failure, exiting
+ rm -rf /tmp/pg_upgrade_check-wSfzCh
Makefile:39: recipe for target 'check' failed
make[2]: *** [check] Error 1
make[2]: Leaving directory '/home/pavel/src/postgresql/src/bin/pg_upgrade'

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 496; 1259 47693 VIEW
xmltableview2 pavel
pg_restore: [archiver (db)] could not execute query: ERROR: syntax error
at or near "("
LINE 15: ...XMLTABLE(XMLNAMESPACES('http://x.y'::"text" AS zz)('/zz:rows...

Fixed in attached patch

> Regards
>
> Pavel
>
>
>>
>> --
>> Álvaro Herrera https://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>

Attachment Content-Type Size
xmltable-30.patch text/x-patch 163.5 KB
diff.patch text/x-patch 454 bytes

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-16 19:30:02
Message-ID: 20170116193002.xdf333cvf4iyfex2@alvherre.pgsql
Lists: pgsql-hackers

I just realized that your new xml_xmlnodetostr is line-by-line identical
to the existing xml_xmlnodetoxmltype except for two or three lines.
I think that's wrong. I'm going to patch the existing function so that
they can share code.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-16 22:51:13
Message-ID: 20170116225113.mwmuxkfu5zhmcdcx@alvherre.pgsql
Lists: pgsql-hackers

Given
/message-id/20170116210019.a3glfwspg5lnfrnm@alap3.anarazel.de
which is going to heavily change how the executor works in this area, I
am returning this patch to you again. I would like a few rather minor
changes:

1. to_xmlstr can be replaced with calls to xmlCharStrdup.
2. don't need xml_xmlnodetostr either -- just use xml_xmlnodetoxmltype
(which returns text*) and extract the cstring from the varlena. It's
a bit more wasteful in terms of cycles, but I don't think we care.
If we do care, change the function so that it returns cstring, and
have the callers that want text wrap it in cstring_to_text.
3. have a new perValueCxt memcxt in TableExprState, child of buildercxt,
and switch to it just before GetValue() (reset it just before
switching). Then, don't worry about leaks in GetValue. This way,
the text* conversions et al don't matter.
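
Item 3 can be sketched roughly as follows, with a toy bump allocator standing in for a memory context. `ToyArena`, `toy_get_value` and `toy_fetch_all` are hypothetical names, not PostgreSQL's API. The point is that the value-fetching routine may "leak" freely, because the caller resets the scratch context before every call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a short-lived memory context. */
typedef struct ToyArena
{
    char   buf[256];
    size_t used;
} ToyArena;

/* Drop everything allocated so far in one step. */
static void
toy_arena_reset(ToyArena *a)
{
    a->used = 0;
}

static void *
toy_arena_alloc(ToyArena *a, size_t n)
{
    void *p;

    assert(a->used + n <= sizeof(a->buf));
    p = a->buf + a->used;
    a->used += n;
    return p;
}

/* Hypothetical GetValue: allocates scratch space and never frees it. */
static size_t
toy_get_value(ToyArena *per_value, const char *src)
{
    size_t len = strlen(src);
    char  *copy = toy_arena_alloc(per_value, len + 1);

    memcpy(copy, src, len + 1);
    return len;
}

/* Caller: reset the per-value arena just before each GetValue call. */
static size_t
toy_fetch_all(ToyArena *per_value, const char **values, int n)
{
    size_t total = 0;

    for (int i = 0; i < n; i++)
    {
        toy_arena_reset(per_value);
        total += toy_get_value(per_value, values[i]);
    }
    return total;
}
```

Memory use stays bounded by the largest single value, however many values are fetched.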

After that I think we're going to need to get this working on top of
Andres' changes. Which I'm afraid is going to be rather major surgery,
but I haven't looked.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-16 23:30:00
Message-ID: 20170116233000.aooak7ytomrhvyjo@alvherre.pgsql
Lists: pgsql-hackers

In case this still matters, I think GetValue should look more or less
like this (untested):

/*
 * Return the value for column number 'colnum' for the current row.  If column
 * -1 is requested, return representation of the whole row.
 *
 * This leaks memory, so be sure to reset the context in which it's called
 * often.
 */
static Datum
XmlTableGetValue(TableExprState *state, int colnum, bool *isnull)
{
#ifdef USE_LIBXML
    XmlTableBuilderData *xtCxt;
    Datum       result = (Datum) 0;
    xmlNodePtr  cur;
    char       *cstr = NULL;
    volatile xmlXPathObjectPtr xpathobj;

    xtCxt = GetXmlTableBuilderPrivateData(state, "XmlTableGetValue");

    Assert(xtCxt->xpathobj &&
           xtCxt->xpathobj->type == XPATH_NODESET &&
           xtCxt->xpathobj->nodesetval != NULL);

    /* Propagate our error context to libxml2 */
    xmlSetStructuredErrorFunc((void *) xtCxt->xmlerrcxt, xml_errorHandler);

    cur = xtCxt->xpathobj->nodesetval->nodeTab[xtCxt->row_count - 1];
    if (cur->type != XML_ELEMENT_NODE)
        elog(ERROR, "unexpected xmlNode type");

    /* Handle whole row case the easy way. */
    if (colnum == -1)
    {
        text       *txt;

        txt = xml_xmlnodetoxmltype(cur, xtCxt->xmlerrcxt);
        result = InputFunctionCall(&state->in_functions[0],
                                   text_to_cstring(txt),
                                   state->typioparams[0],
                                   -1);
        *isnull = false;

        return result;
    }

    Assert(xtCxt->xpathscomp[colnum] != NULL);

    /* Assume a non-null result until one of the cases below says otherwise */
    *isnull = false;

    xpathobj = NULL;
    PG_TRY();
    {
        Form_pg_attribute attr;

        attr = state->resultSlot->tts_tupleDescriptor->attrs[colnum];

        /* Set current node as entry point for XPath evaluation */
        xmlXPathSetContextNode(cur, xtCxt->xpathcxt);

        /* Evaluate column path */
        xpathobj = xmlXPathCompiledEval(xtCxt->xpathscomp[colnum], xtCxt->xpathcxt);
        if (xpathobj == NULL || xtCxt->xmlerrcxt->err_occurred)
            xml_ereport(xtCxt->xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
                        "could not create XPath object");

        if (xpathobj->type == XPATH_NODESET)
        {
            int         count = 0;
            Oid         targettypid = attr->atttypid;

            if (xpathobj->nodesetval != NULL)
                count = xpathobj->nodesetval->nodeNr;

            /*
             * There are four possible cases, depending on the number of
             * nodes returned by the XPath expression and the type of the
             * target column: a) XPath returns no nodes.  b) One node is
             * returned, and column is of type XML.  c) One node, column type
             * other than XML.  d) Multiple nodes are returned.
             */
            if (xpathobj->nodesetval == NULL || count == 0)
            {
                /* an empty node set also counts as "no nodes" */
                *isnull = true;
            }
            else if (count == 1 && targettypid == XMLOID)
            {
                text       *textstr;

                textstr = xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[0],
                                               xtCxt->xmlerrcxt);
                cstr = text_to_cstring(textstr);
            }
            else if (count == 1)
            {
                xmlChar    *str;

                str = xmlNodeListGetString(xtCxt->doc,
                                           xpathobj->nodesetval->nodeTab[0]->xmlChildrenNode,
                                           1);
                if (str)
                {
                    /* copy into palloc'd memory, freeing the libxml2 copy */
                    PG_TRY();
                    {
                        cstr = pstrdup((char *) str);
                    }
                    PG_CATCH();
                    {
                        xmlFree(str);
                        PG_RE_THROW();
                    }
                    PG_END_TRY();
                    xmlFree(str);
                }
                else
                    cstr = pstrdup("");
            }
            else
            {
                StringInfoData buf;
                int         i;

                Assert(count > 1);

                /*
                 * When evaluating the XPath expression returns multiple
                 * nodes, the result is the concatenation of them all.
                 * The target type must be XML.
                 */
                if (targettypid != XMLOID)
                    ereport(ERROR,
                            (errcode(ERRCODE_CARDINALITY_VIOLATION),
                             errmsg("more than one value returned by column XPath expression")));

                initStringInfo(&buf);
                for (i = 0; i < count; i++)
                    /* worth freeing the text here?  Naahh ... */
                    appendStringInfoText(&buf,
                                         xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i],
                                                              xtCxt->xmlerrcxt));
                cstr = buf.data;
            }
        }
        else if (xpathobj->type == XPATH_STRING)
        {
            cstr = (char *) xpathobj->stringval;
            *isnull = false;
        }
        else
            elog(ERROR, "unexpected XPath object type %u", xpathobj->type);

        /*
         * By here, either cstr contains the result value, or the isnull flag
         * has been set.
         */
        Assert(cstr || *isnull);

        if (!*isnull)
            result = InputFunctionCall(&state->in_functions[colnum],
                                       cstr,
                                       state->typioparams[colnum],
                                       attr->atttypmod);
    }
    PG_CATCH();
    {
        if (xpathobj != NULL)
            xmlXPathFreeObject(xpathobj);
        PG_RE_THROW();
    }
    PG_END_TRY();

    if (xpathobj)
        xmlXPathFreeObject(xpathobj);

    return result;
#else
    NO_XML_SUPPORT();
    return (Datum) 0;           /* keep compiler quiet */
#endif   /* not USE_LIBXML */
}

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-17 18:07:42
Message-ID: CAFj8pRCJs_jPMUFQunmvoXxchKM+PYy2B7FOO5H857as3MGEQQ@mail.gmail.com
Lists: pgsql-hackers

2017-01-16 23:51 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Given
> /message-id/20170116210019.a3glfwspg5lnfrnm@alap3.anarazel.de
> which is going to heavily change how the executor works in this area, I
> am returning this patch to you again. I would like a few rather minor
> changes:
>
> 1. to_xmlstr can be replaced with calls to xmlCharStrdup.
> 2. don't need xml_xmlnodetostr either -- just use xml_xmlnodetoxmltype
> (which returns text*) and extract the cstring from the varlena. It's
> a bit more wasteful in terms of cycles, but I don't think we care.
> If we do care, change the function so that it returns cstring, and
> have the callers that want text wrap it in cstring_to_text.
> 3. have a new perValueCxt memcxt in TableExprState, child of buildercxt,
> and switch to it just before GetValue() (reset it just before
> switching). Then, don't worry about leaks in GetValue. This way,
> the text* conversions et al don't matter.
>
> After that I think we're going to need to get this working on top of
> Andres' changes. Which I'm afraid is going to be rather major surgery,
> but I haven't looked.
>

I'll clean up the XML part first, and then I can adapt to the SRF changes. I
am not sure I understand all of your proposed changes here; I have to look
at them.

Regards

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-18 21:08:07
Message-ID: CAFj8pRB6Ei-0YG+15WpLLNP8-dt+-K0p6zYBG_0HTMmXha7_BA@mail.gmail.com
Lists: pgsql-hackers

Hi

2017-01-16 23:51 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Given
> /message-id/20170116210019.a3glfwspg5lnfrnm@alap3.anarazel.de
> which is going to heavily change how the executor works in this area, I
> am returning this patch to you again. I would like a few rather minor
> changes:
>
> 1. to_xmlstr can be replaced with calls to xmlCharStrdup.
>

I checked this idea, and it doesn't look good: xmlCharStrdup creates the XML
string in libxml2's own memory, which must be released explicitly with
xmlFree(). In this case it is more practical to use a PostgreSQL memory
context, because that memory is released safely when an exception is raised.
I can rename this function to the more conventional pg_xmlCharStrndup. It
can also be used in more places in the current code.

> 2. don't need xml_xmlnodetostr either -- just use xml_xmlnodetoxmltype
> (which returns text*) and extract the cstring from the varlena. It's
> a bit more wasteful in terms of cycles, but I don't think we care.
> If we do care, change the function so that it returns cstring, and
> have the callers that want text wrap it in cstring_to_text.
>

Done. This affects an uncommon use case, and any slowdown is minimal.

> 3. have a new perValueCxt memcxt in TableExprState, child of buildercxt,
> and switch to it just before GetValue() (reset it just before
> switching). Then, don't worry about leaks in GetValue. This way,
> the text* conversions et al don't matter.
>

done

>
> After that I think we're going to need to get this working on top of
> Andres' changes. Which I'm afraid is going to be rather major surgery,
> but I haven't looked.
>

I am waiting for new commits in this area. At the moment I have no idea what
will be broken.

Attached is the updated patch with the XML part cleaned up.

Regards

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Attachment Content-Type Size
xmltable-31.patch text/x-patch 161.5 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-19 07:48:32
Message-ID: CAFj8pRCoZ79X0FxtG40n9FFRJS+3LOFw0Jp+Jb2o=kCg0=RrQQ@mail.gmail.com
Lists: pgsql-hackers

Hi

New update: rebased after yesterday's changes.

What do you want to change?

Regards

Pavel

Attachment Content-Type Size
xmltable-32.patch text/x-patch 161.4 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-19 12:35:56
Message-ID: 20170119123556.hhksp33c2vwkb7ig@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:
> Hi
>
> New update - rebase after yesterday changes.
>
> What you want to change?

I think the problem might come from the still pending patch on that
series, which Andres posted in
/message-id/20170118221154.aldebi7yyjvds5qa@alap3.anarazel.de
As far as I understand, minor details of that patch might change before
commit, but it is pretty much in definitive form.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-19 12:40:22
Message-ID: CAFj8pRBgrWDVJLWYmPV_TDP06NYDiMzTOYFceozo_n8wst3ZeQ@mail.gmail.com
Lists: pgsql-hackers

2017-01-19 13:35 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
> > Hi
> >
> > New update - rebase after yesterday changes.
> >
> > What you want to change?
>
> I think the problem might come from the still pending patch on that
> series, which Andres posted in
> /message-id/20170118221154.aldebi7yyjvds5qa@alap3.anarazel.de
> As far as I understand, minor details of that patch might change before
> commit, but it is pretty much in definitive form.
>

OK, we have to wait. Please check whether the XML part looks good to you.

Regards

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-21 08:30:17
Message-ID: CAFj8pRAOi_Fy7c4Xk=d9h=PAbNoV8CbMtChp-hYgUFpEA7n+nQ@mail.gmail.com
Lists: pgsql-hackers

Hi

2017-01-19 13:35 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
> > Hi
> >
> > New update - rebase after yesterday changes.
> >
> > What you want to change?
>
> I think the problem might come from the still pending patch on that
> series, which Andres posted in
> /message-id/20170118221154.aldebi7yyjvds5qa@alap3.anarazel.de
> As far as I understand, minor details of that patch might change before
> commit, but it is pretty much in definitive form.
>

Here is a new rebased update after these changes.

Regards

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Attachment Content-Type Size
xmltable-33.patch text/x-patch 165.2 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-21 09:31:48
Message-ID: CAFj8pRAA6gQBXc+DV4SFHZT9hCqN4NQWg9bPc-aZUvj-dRf8kQ@mail.gmail.com
Lists: pgsql-hackers

2017-01-21 9:30 GMT+01:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

> Hi
>
> 2017-01-19 13:35 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
>> Pavel Stehule wrote:
>> > Hi
>> >
>> > New update - rebase after yesterday changes.
>> >
>> > What you want to change?
>>
>> I think the problem might come from the still pending patch on that
>> series, which Andres posted in
>> /message-id/20170118221154.aldebi7yyjvds5qa@alap3.anarazel.de
>> As far as I understand, minor details of that patch might change before
>> commit, but it is pretty much in definitive form.
>>
>
> new rebased update after these changes
>

This version fixes whitespace.

pavel

>
> Regards
>
> Pavel
>
>
>>
>> --
>> Álvaro Herrera https://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>

Attachment Content-Type Size
xmltable-34.patch text/x-patch 164.6 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-22 09:57:22
Message-ID: CAFj8pRApgDUJodE_=aPheACX=xAJVwr07myUeBpUMoeNUYUkMg@mail.gmail.com
Lists: pgsql-hackers

Hi

2017-01-21 10:31 GMT+01:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

>
>
> 2017-01-21 9:30 GMT+01:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>
>> Hi
>>
>> 2017-01-19 13:35 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>>
>>> Pavel Stehule wrote:
>>> > Hi
>>> >
>>> > New update - rebase after yesterday changes.
>>> >
>>> > What you want to change?
>>>
>>> I think the problem might come from the still pending patch on that
>>> series, which Andres posted in
>>> /message-id/20170118221154.aldebi7yyjvds5qa@alap3.anarazel.de
>>> As far as I understand, minor details of that patch might change before
>>> commit, but it is pretty much in definitive form.
>>>
>>
>> new rebased update after these changes
>>
>
> fix white spaces
>

A few fixes:

* SELECT (xmltable(..)).* + regress tests
* compilation and regress tests without --with-libxml

Regards

Pavel

>
> pavel
>
>
>>
>> Regards
>>
>> Pavel
>>
>>
>>>
>>> --
>>> Álvaro Herrera https://www.2ndQuadrant.com/
>>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>>
>>
>>
>

Attachment Content-Type Size
xmltable-35.patch text/x-patch 167.7 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-24 20:38:49
Message-ID: 20170124203849.2r2xzkraix6qy3i7@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:

> * SELECT (xmltable(..)).* + regress tests
> * compilation and regress tests without --with-libxml

Thanks. I just realized that this is doing more work than necessary --
I think it would be simpler to have tableexpr fill a tuplestore with the
results, instead of just expecting function execution to apply
ExecEvalExpr over and over to obtain the results. So evaluating a
tableexpr returns just the tuplestore, which function evaluation can
return as-is. That code doesn't use the value-per-call interface
anyway.

I also realized that the expr context callback is not called if there's
an error, which leaves us without shutting down libxml properly. I
added PG_TRY around the fetchrow calls, but I'm not sure that's correct
either, because there could be an error raised in other parts of the
code, after we've already emitted a few rows (for example out of
memory). I think the right way is to have PG_TRY around the execution
of the whole thing rather than just row at a time; and the tuplestore
mechanism helps us with that.

I think it would be good to have a more complex test case in regress --
let's say there is a table with some simple XML values, then we use
XMLFOREST (or maybe one of the table_to_xml functions) to generate a
large document, and then XMLTABLE uses that document as input document.

Please fix.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
xmltable-36.patch text/plain 171.1 KB
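
The error-handling point above (one PG_TRY around the whole run, with rows buffered in a tuplestore) can be illustrated with plain setjmp/longjmp, which is the mechanism PG_TRY is built on. This is a self-contained sketch under stated assumptions, not code from the patch: RowStore stands in for a tuplestore, Parser for the libxml2 state, and all names (fill_store, fetch_row, parser_open, parser_close) are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

#define MAX_ROWS 16

/* Stand-in for a tuplestore: rows are buffered here before being returned. */
typedef struct RowStore
{
    int     rows[MAX_ROWS];
    size_t  nrows;
} RowStore;

/* Stand-in for parser state that must be shut down even on error. */
typedef struct Parser
{
    int     is_open;
    jmp_buf on_error;
} Parser;

static void
parser_open(Parser *p)
{
    p->is_open = 1;
}

static void
parser_close(Parser *p)
{
    p->is_open = 0;
}

/* Produce one row; a negative input simulates an error raised mid-run. */
static int
fetch_row(Parser *p, int input)
{
    if (input < 0)
        longjmp(p->on_error, 1);
    return input * 2;
}

/*
 * Fill the store under a single error handler that covers the whole loop.
 * Returns 0 on success and -1 on error.  The parser is closed on both
 * paths, and no partial result is left in the store after a failure.
 */
static int
fill_store(Parser *p, const int *inputs, size_t n, RowStore *store)
{
    int     status;

    store->nrows = 0;
    parser_open(p);

    if (setjmp(p->on_error) == 0)
    {
        size_t  i;

        for (i = 0; i < n && i < MAX_ROWS; i++)
        {
            int     value = fetch_row(p, inputs[i]);

            store->rows[store->nrows++] = value;
        }
        status = 0;
    }
    else
    {
        /* An error escaped from somewhere in the loop: discard the rows. */
        store->nrows = 0;
        status = -1;
    }

    parser_close(p);
    return status;
}
```

With the handler wrapped around the whole loop, a failure on any row reaches the same cleanup path, and the caller only ever sees a complete result or none at all. A per-row handler would not cover errors raised between rows.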

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-24 22:37:44
Message-ID: 20170124223744.moyzln5vifjbkl7s@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote:
> +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext *econtext,
> + bool *isnull);
> +static Datum ExecEvalTableExprFast(TableExprState *exprstate, ExprContext *econtext,
> + bool *isNull);
> +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext *econtext,
> + bool *isNull);
> +static void tabexprInitialize(TableExprState *tstate, ExprContext *econtext,
> + Datum doc);
> +static void ShutdownTableExpr(Datum arg);

To me this (and a lot of the other code) hints quite strongly that
expression evaluation is the wrong approach to implementing this. What
you're essentially doing is building a volcano-style scan node. Even if
we can do this, we shouldn't double down on the bad decision to have these
magic expressions that return multiple rows. There's a historical reason
for tSRFs, but we shouldn't add more weirdness like this.

Andres


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-25 00:32:56
Message-ID: 20170125003256.cp2hwtgfuvzbcwou@alvherre.pgsql
Lists: pgsql-hackers

Andres Freund wrote:
> Hi,
>
> On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote:
> > +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext *econtext,
> > + bool *isnull);
> > +static Datum ExecEvalTableExprFast(TableExprState *exprstate, ExprContext *econtext,
> > + bool *isNull);
> > +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext *econtext,
> > + bool *isNull);
> > +static void tabexprInitialize(TableExprState *tstate, ExprContext *econtext,
> > + Datum doc);
> > +static void ShutdownTableExpr(Datum arg);
>
> To me this (and a lot of the other code) hints quite strongly that
> expression evaluation is the wrong approach to implementing this. What
> you're essentially doing is building a volcano-style scan node. Even if
> we can do this, we shouldn't double down on the bad decision to have these
> magic expressions that return multiple rows. There's a historical reason
> for tSRFs, but we shouldn't add more weirdness like this.

Thanks for giving it a look. I have long thought that this patch would
be at odds with your overall executor work.

XMLTABLE is specified by the standard to return multiple rows ... but
then as far as my reading goes, it is only supposed to be supported in
the range table (FROM clause) not in the target list. I wonder if
this would end up better if we only tried to support it in RT. I asked
Pavel to implement it like that a few weeks ago, but ...

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-25 00:35:11
Message-ID: 20170125003511.zc3kzwiio7kpntda@alap3.anarazel.de
Lists: pgsql-hackers

On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > Hi,
> >
> > On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote:
> > > +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext *econtext,
> > > + bool *isnull);
> > > +static Datum ExecEvalTableExprFast(TableExprState *exprstate, ExprContext *econtext,
> > > + bool *isNull);
> > > +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext *econtext,
> > > + bool *isNull);
> > > +static void tabexprInitialize(TableExprState *tstate, ExprContext *econtext,
> > > + Datum doc);
> > > +static void ShutdownTableExpr(Datum arg);
> >
> > To me this (and a lot of the other code) hints quite strongly that
> > expression evaluation is the wrong approach to implementing this. What
> > you're essentially doing is building a volcano-style scan node. Even if
> > we can do this, we shouldn't double down on the bad decision to have these
> > magic expressions that return multiple rows. There's a historical reason
> > for tSRFs, but we shouldn't add more weirdness like this.
>
> Thanks for giving it a look. I have long thought that this patch would
> be at odds with your overall executor work.

Not fundamentally, but it makes it harder.

> XMLTABLE is specified by the standard to return multiple rows ... but
> then as far as my reading goes, it is only supposed to be supported in
> the range table (FROM clause) not in the target list. I wonder if
> this would end up better if we only tried to support it in RT. I asked
> Pavel to implement it like that a few weeks ago, but ...

Right - it makes sense in the FROM list - but then it should be an
executor node, instead of some expression thingy.

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-25 04:45:03
Message-ID: 16622.1485319503@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
>> XMLTABLE is specified by the standard to return multiple rows ... but
>> then as far as my reading goes, it is only supposed to be supported in
>> the range table (FROM clause) not in the target list. I wonder if
>> this would end up better if we only tried to support it in RT. I asked
>> Pavel to implement it like that a few weeks ago, but ...

> Right - it makes sense in the FROM list - but then it should be an
> executor node, instead of some expression thingy.

+1 --- we're out of the business of having simple expressions that
return rowsets.

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-25 04:45:24
Message-ID: CAFj8pRBi7_Gne=mSMikZ8S7SV61myLLszfetUqFkhHpotnZ6QA@mail.gmail.com
Lists: pgsql-hackers

Hi

2017-01-25 1:35 GMT+01:00 Andres Freund <andres(at)anarazel(dot)de>:

> On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > Hi,
> > >
> > > On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote:
> > > > > +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext *econtext,
> > > > > + bool *isnull);
> > > > > +static Datum ExecEvalTableExprFast(TableExprState *exprstate, ExprContext *econtext,
> > > > > + bool *isNull);
> > > > > +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext *econtext,
> > > > > + bool *isNull);
> > > > > +static void tabexprInitialize(TableExprState *tstate, ExprContext *econtext,
> > > > > + Datum doc);
> > > > > +static void ShutdownTableExpr(Datum arg);
> > >
> > > > To me this (and a lot of the other code) hints quite strongly that
> > > > expression evaluation is the wrong approach to implementing this. What
> > > > you're essentially doing is building a volcano-style scan node. Even if
> > > > we can do this, we shouldn't double down on the bad decision to have these
> > > > magic expressions that return multiple rows. There's a historical reason
> > > > for tSRFs, but we shouldn't add more weirdness like this.
> >
> > Thanks for giving it a look. I have long thought that this patch would
> > be at odds with your overall executor work.
>
> Not fundamentally, but it makes it harder.
>

If you plan to keep support for SRFs in the target list, then nothing is
different. In the last patch, XMLTABLE is executed under nodeProjectSet.

>
>
> > XMLTABLE is specified by the standard to return multiple rows ... but
> > then as far as my reading goes, it is only supposed to be supported in
> > the range table (FROM clause) not in the target list. I wonder if
> > this would end up better if we only tried to support it in RT. I asked
> > Pavel to implement it like that a few weeks ago, but ...
>
> Right - it makes sense in the FROM list - but then it should be an
> executor node, instead of some expression thingy.
>

From both the user's perspective and the implementation's perspective,
XMLTABLE is a form of SRF. I use a dedicated executor node because fcinfo is
already complex and still not sufficient to hold all the information about
the result columns.

Implementing it as a range-table entry doesn't reduce the code; it just
moves it to a different file.

I'll try to explain my motivation. Please check it and correct me if I am
wrong. I don't insist on my implementation; I am just trying to make
XMLTABLE consistent with the behavior of other functions, so that it can be
used everywhere without surprises.

1. Any function that produces content can be used in the target list. We
support SRFs in the target list and in the FROM clause. Why should XMLTABLE
be an exception?

2. In the standard, XMLTABLE appears only in the FROM clause, but the
standard doesn't have to answer my question, because it does not allow SRFs
in the target list at all.

If the consensus is that this inconsistency (in the behavior of this kind of
function) is expected or required, then I have no problem removing this
support from XMLTABLE.

At the moment I don't see a technical reason for this step: with Andres's
latest changes, supporting XMLTABLE in the target list needs fewer than 40
lines, and there is no special path for XMLTABLE alone. Andres wrote support
for SRF functions and SRF operators; TableExpr is a third category.

Regards

Pavel

> Greetings,
>
> Andres Freund
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-25 04:48:58
Message-ID: CAFj8pRBUF2en2XyBcT+L5RVybQ_KPELnwFxEjPjc=CtPiKzqSw@mail.gmail.com
Lists: pgsql-hackers

2017-01-25 5:45 GMT+01:00 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> >> XMLTABLE is specified by the standard to return multiple rows ... but
> >> then as far as my reading goes, it is only supposed to be supported in
> >> the range table (FROM clause) not in the target list. I wonder if
> >> this would end up better if we only tried to support it in RT. I asked
> >> Pavel to implement it like that a few weeks ago, but ...
>
> > Right - it makes sense in the FROM list - but then it should be an
> > executor node, instead of some expression thingy.
>
> +1 --- we're out of the business of having simple expressions that
> return rowsets.
>

If we decide that this kind of function should behave differently from other
SRFs, then I will remove support for this.

I see no technical reasons for it (maybe I am missing something); Andres's
latest changes support my code well.

Regards

Pavel

>
> regards, tom lane
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-25 08:26:27
Message-ID: CAFj8pRCtmLPQPPHLCaD7sS4UiudTdh=b9po_U25cySDP1ThGKw@mail.gmail.com
Lists: pgsql-hackers

2017-01-24 21:38 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
>
> > * SELECT (xmltable(..)).* + regress tests
> > * compilation and regress tests without --with-libxml
>
> Thanks. I just realized that this is doing more work than necessary --
>

?? I don't understand?

> I think it would be simpler to have tableexpr fill a tuplestore with the
> results, instead of just expecting function execution to apply
> ExecEvalExpr over and over to obtain the results. So evaluating a
> tableexpr returns just the tuplestore, which function evaluation can
> return as-is. That code doesn't use the value-per-call interface
> anyway.
>

ok

> I also realized that the expr context callback is not called if there's
> an error, which leaves us without shutting down libxml properly. I
> added PG_TRY around the fetchrow calls, but I'm not sure that's correct
> either, because there could be an error raised in other parts of the
> code, after we've already emitted a few rows (for example out of
> memory). I think the right way is to have PG_TRY around the execution
> of the whole thing rather than just row at a time; and the tuplestore
> mechanism helps us with that.
>

ok.

>
> I think it would be good to have a more complex test case in regress --
> let's say there is a table with some simple XML values, then we use
> XMLFOREST (or maybe one of the table_to_xml functions) to generate a
> large document, and then XMLTABLE uses that document as input document.
>
> Please fix.
>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-25 14:07:47
Message-ID: 20170125140747.etmtac4bbhvnkjnd@alvherre.pgsql
Lists: pgsql-hackers

Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> >> XMLTABLE is specified by the standard to return multiple rows ... but
> >> then as far as my reading goes, it is only supposed to be supported in
> >> the range table (FROM clause) not in the target list. I wonder if
> >> this would end up better if we only tried to support it in RT. I asked
> >> Pavel to implement it like that a few weeks ago, but ...
>
> > Right - it makes sense in the FROM list - but then it should be an
> > executor node, instead of some expression thingy.
>
> +1 --- we're out of the business of having simple expressions that
> return rowsets.

Well, that's it. I'm not committing this patch against two other
committers' opinion, plus I was already on the fence about the
implementation anyway. I think you should just go with the flow and
implement this by creating nodeTableexprscan.c. It's not even
difficult.
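
For readers unfamiliar with the term, a scan node in the Volcano style
exposes an init / next / rescan / end interface and hands back one row per
call. The toy model below is a sketch only: the ToyScanState type and the
toy_scan_* functions are hypothetical and are not PostgreSQL's executor
API. It materializes its input once on the first call, which is roughly
the role a tuplestore would play.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_STORE_CAP 8

typedef struct ToyScanState
{
    const int *source;               /* stand-in for the input document */
    size_t     source_len;
    int        store[TOY_STORE_CAP]; /* stand-in for a tuplestore */
    size_t     nstored;
    size_t     pos;                  /* read position in the store */
    bool       filled;               /* has the store been built yet? */
} ToyScanState;

/* Set up the node; no rows are produced yet. */
static void
toy_scan_init(ToyScanState *s, const int *source, size_t source_len)
{
    assert(source_len <= TOY_STORE_CAP);
    s->source = source;
    s->source_len = source_len;
    s->nstored = 0;
    s->pos = 0;
    s->filled = false;
}

/* Return the next row, building the store on the first call. */
static bool
toy_scan_next(ToyScanState *s, int *out)
{
    if (!s->filled)
    {
        size_t i;

        /* "Parse" the whole document once; here each row is value * 10. */
        for (i = 0; i < s->source_len; i++)
            s->store[s->nstored++] = s->source[i] * 10;
        s->filled = true;
    }

    if (s->pos >= s->nstored)
        return false;       /* end of scan */

    *out = s->store[s->pos++];
    return true;
}

/* Restart the scan without rebuilding the store. */
static void
toy_scan_rescan(ToyScanState *s)
{
    s->pos = 0;
}

/* Release the node's state. */
static void
toy_scan_end(ToyScanState *s)
{
    s->nstored = 0;
    s->pos = 0;
    s->filled = false;
}
```

A caller would loop on toy_scan_next() until it returns false, call
toy_scan_rescan() to read the same rows again, and finish with
toy_scan_end().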

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-25 20:31:28
Message-ID: 20170125203128.juo2eizvkzzrvcvl@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2017-01-25 05:45:24 +0100, Pavel Stehule wrote:
> 2017-01-25 1:35 GMT+01:00 Andres Freund <andres(at)anarazel(dot)de>:
>
> > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> > > Andres Freund wrote:
> > > > Hi,
> > > >
> > > > On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote:
> > > > > +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext
> > *econtext,
> > > > > + bool *isnull);
> > > > > +static Datum ExecEvalTableExprFast(TableExprState *exprstate,
> > ExprContext *econtext,
> > > > > + bool *isNull);
> > > > > +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext
> > *econtext,
> > > > > + bool *isNull);
> > > > > +static void tabexprInitialize(TableExprState *tstate, ExprContext
> > *econtext,
> > > > > + Datum doc);
> > > > > +static void ShutdownTableExpr(Datum arg);
> > > >
> > > > To me this (and a lot of the other code) hints quite strongly that
> > > > expression evalution is the wrong approach to implementing this. What
> > > > you're essentially doing is building a vulcano style scan node. Even
> > if
> > > > we can this, we shouldn't double down on the bad decision to have these
> > > > magic expressions that return multiple rows. There's historical reason
> > > > for tSRFs, but we shouldn't add more weirdness like this.
> > >
> > > Thanks for giving it a look. I have long thought that this patch would
> > > be at odds with your overall executor work.
> >
> > Not fundamentally, but it makes it harder.
> >
>
> If you plan to hold support SRFin target list, then nothing is different.
> In last patch is executed under nodeProjectSet.

It is, because we suddenly need to call different functions - and I'm
revamping most of execQual to have an opcode dispatch based execution
model (which then also can be JITed).

> > > XMLTABLE is specified by the standard to return multiple rows ... but
> > > then as far as my reading goes, it is only supposed to be supported in
> > > the range table (FROM clause) not in the target list. I wonder if
> > > this would end up better if we only tried to support it in RT. I asked
> > > Pavel to implement it like that a few weeks ago, but ...
> >
> > Right - it makes sense in the FROM list - but then it should be an
> > executor node, instead of some expression thingy.
> >
>
> The XMLTABLE function is from user perspective, from implementation
> perspective a form of SRF function. I use own executor node, because fcinfo
> is complex already and not too enough to hold all information about result
> columns.

> The implementation as RT doesn't reduce code - it is just moving to
> different file.

You're introducing a wholly separate callback system (TableExprRoutine)
for the new functionality. And that stuff is excruciatingly close to
stuff that the normal executor already knows how to do.

> I'll try to explain my motivation. Please, check it and correct me if I am
> wrong. I don't keep on my implementation - just try to implement XMLTABLE
> be consistent with another behave and be used all time without any
> surprise.
>
> 1. Any function that produces a content can be used in target list. We
> support SRF in target list and in FROM part. Why XMLTABLE should be a
> exception?

targetlist SRFs were a big mistake. They cause a fair number of problems
code-wise. They permeated for a long while into bits of both planner and
executor, where they really shouldn't belong. Even after the recent
changes there's a fair amount of ugliness associated with them. We
can't remove tSRFs for backward compatibility reasons, but that's not
true for XMLTABLE.

Greetings,

Andres Freund


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-25 21:38:25
Message-ID: CAFj8pRBXnxrY_sAUunfhfxEw_+CSwBZMw6UTD4p3484WYrzPVA@mail.gmail.com
Lists: pgsql-hackers

> > >
> >
> > If you plan to hold support SRFin target list, then nothing is different.
> > In last patch is executed under nodeProjectSet.
>
> It is, because we suddenly need to call different functions - and I'm
> revamping most of execQual to have an opcode dispatch based execution
> model (which then also can be JITed).
>

> > > > XMLTABLE is specified by the standard to return multiple rows ... but
> > > > then as far as my reading goes, it is only supposed to be supported
> in
> > > > the range table (FROM clause) not in the target list. I wonder if
> > > > this would end up better if we only tried to support it in RT. I
> asked
> > > > Pavel to implement it like that a few weeks ago, but ...
> > >
> > > Right - it makes sense in the FROM list - but then it should be an
> > > executor node, instead of some expression thingy.
> > >
> >
> > The XMLTABLE function is from user perspective, from implementation
> > perspective a form of SRF function. I use own executor node, because
> fcinfo
> > is complex already and not too enough to hold all information about
> result
> > columns.
>
>
> > The implementation as RT doesn't reduce code - it is just moving to
> > different file.
>
> You're introducing a wholly separate callback system (TableExprRoutine)
> for the new functionality. And that stuff is excruciatingly close to
> stuff that the normal executor already knows how to do.
>

These callbacks exist to isolate the TableExpr infrastructure from the
TableExpr implementation. The design is prepared for reuse by a
JSON_TABLE function.

Wherever the TableExpr code is placed, it should not affect this callback
system (or I am completely wrong and the executor can do some work that is
hidden from me).

>
>
>
> > I'll try to explain my motivation. Please, check it and correct me if I
> am
> > wrong. I don't keep on my implementation - just try to implement XMLTABLE
> > be consistent with another behave and be used all time without any
> > surprise.
> >
> > 1. Any function that produces a content can be used in target list. We
> > support SRF in target list and in FROM part. Why XMLTABLE should be a
> > exception?
>
> targetlist SRFs were a big mistake. They cause a fair number of problems
> code-wise. They permeated for a long while into bits of both planner and
> executor, where they really shouldn't belong. Even after the recent
> changes there's a fair amount of uglyness associated with them. We
> can't remove tSRFs for backward compatibility reasons, but that's not
> true for XMLTABLE
>
>
>
ok

I am afraid that if I cannot reuse the SRF infrastructure, I will have to
partially reimplement it :( - mainly for use in "ROWS FROM ()"

> Greetings,
>
> Andres Freund
>


From: Andres Freund <andres(at)anarazel(dot)de>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-25 21:40:22
Message-ID: 20170125214022.sdpoh2lollull2yw@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

> > > I'll try to explain my motivation. Please, check it and correct me if I
> > am
> > > wrong. I don't keep on my implementation - just try to implement XMLTABLE
> > > be consistent with another behave and be used all time without any
> > > surprise.
> > >
> > > 1. Any function that produces a content can be used in target list. We
> > > support SRF in target list and in FROM part. Why XMLTABLE should be a
> > > exception?
> >
> > targetlist SRFs were a big mistake. They cause a fair number of problems
> > code-wise. They permeated for a long while into bits of both planner and
> > executor, where they really shouldn't belong. Even after the recent
> > changes there's a fair amount of uglyness associated with them. We
> > can't remove tSRFs for backward compatibility reasons, but that's not
> > true for XMLTABLE
> >
> >
> >
> ok
>
> I afraid when I cannot to reuse a SRF infrastructure, I have to reimplement
> it partially :( - mainly for usage in "ROWS FROM ()"

Huh?

Greetings,

Andres Freund


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-25 21:51:37
Message-ID: CAFj8pRDbagn03gzRQ1rGJsYkZBvcRkQU8bfFO0eTvJ+4NG8oHQ@mail.gmail.com
Lists: pgsql-hackers

2017-01-25 22:40 GMT+01:00 Andres Freund <andres(at)anarazel(dot)de>:

> Hi,
>
> > > > I'll try to explain my motivation. Please, check it and correct me
> if I
> > > am
> > > > wrong. I don't keep on my implementation - just try to implement
> XMLTABLE
> > > > be consistent with another behave and be used all time without any
> > > > surprise.
> > > >
> > > > 1. Any function that produces a content can be used in target list.
> We
> > > > support SRF in target list and in FROM part. Why XMLTABLE should be a
> > > > exception?
> > >
> > > targetlist SRFs were a big mistake. They cause a fair number of
> problems
> > > code-wise. They permeated for a long while into bits of both planner
> and
> > > executor, where they really shouldn't belong. Even after the recent
> > > changes there's a fair amount of uglyness associated with them. We
> > > can't remove tSRFs for backward compatibility reasons, but that's not
> > > true for XMLTABLE
> > >
> > >
> > >
> > ok
> >
> > I afraid when I cannot to reuse a SRF infrastructure, I have to
> reimplement
> > it partially :( - mainly for usage in "ROWS FROM ()"
>

The TableExpr implementation is currently based on SRF. You and Alvaro
propose an independent implementation as a generic executor node. I am
skeptical that FunctionScan supports reading from a generic executor node.

Regards

Pavel

> Huh?
>
> Greetings,
>
> Andres Freund
>


From: Andres Freund <andres(at)anarazel(dot)de>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-25 22:33:56
Message-ID: 20170125223356.4x5xx55f4jv5xmno@alap3.anarazel.de
Lists: pgsql-hackers

On 2017-01-25 22:51:37 +0100, Pavel Stehule wrote:
> 2017-01-25 22:40 GMT+01:00 Andres Freund <andres(at)anarazel(dot)de>:
> > > I afraid when I cannot to reuse a SRF infrastructure, I have to
> > reimplement
> > > it partially :( - mainly for usage in "ROWS FROM ()"
> >
>
> The TableExpr implementation is based on SRF now. You and Alvaro propose
> independent implementation like generic executor node. I am sceptic so
> FunctionScan supports reading from generic executor node.

Why would it need to?


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-25 22:43:22
Message-ID: CAFj8pRCrj1J6FejdgOCbWjH3U3ATGD_zUE6Q3k7ydkPWpHeGnA@mail.gmail.com
Lists: pgsql-hackers

2017-01-25 23:33 GMT+01:00 Andres Freund <andres(at)anarazel(dot)de>:

> On 2017-01-25 22:51:37 +0100, Pavel Stehule wrote:
> > 2017-01-25 22:40 GMT+01:00 Andres Freund <andres(at)anarazel(dot)de>:
> > > > I afraid when I cannot to reuse a SRF infrastructure, I have to
> > > reimplement
> > > > it partially :( - mainly for usage in "ROWS FROM ()"
> > >
> >
> > The TableExpr implementation is based on SRF now. You and Alvaro propose
> > independent implementation like generic executor node. I am sceptic so
> > FunctionScan supports reading from generic executor node.
>
> Why would it need to?
>

Simply for consistency with any other functions that can return rows.

Maybe I don't understand Alvaro's proposal well. I have an XMLTABLE
function (a TableExpr) that looks like an SRF and behaves similarly (it
returns multiple rows), but it should be implemented significantly
differently and should have different limits: it should not be usable here
and there ... It is hard for me to see the consistency in that.

Regards

Pavel


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-26 00:51:00
Message-ID: CAFj8pRAjWVnbLPOcF1cCF7UedVL1iku1H16E-X4NQDd3e34f9w@mail.gmail.com
Lists: pgsql-hackers

2017-01-25 15:07 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Tom Lane wrote:
> > Andres Freund <andres(at)anarazel(dot)de> writes:
> > > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> > >> XMLTABLE is specified by the standard to return multiple rows ... but
> > >> then as far as my reading goes, it is only supposed to be supported in
> > >> the range table (FROM clause) not in the target list. I wonder if
> > >> this would end up better if we only tried to support it in RT. I
> asked
> > >> Pavel to implement it like that a few weeks ago, but ...
> >
> > > Right - it makes sense in the FROM list - but then it should be an
> > > executor node, instead of some expression thingy.
> >
> > +1 --- we're out of the business of having simple expressions that
> > return rowsets.
>
> Well, that's it. I'm not committing this patch against two other
> committers' opinion, plus I was already on the fence about the
> implementation anyway. I think you should just go with the flow and
> implement this by creating nodeTableexprscan.c. It's not even
> difficult.
>

I am playing with this, and the patch looks about 15kB longer, just due to
implementing basic scan functionality, and I haven't touched the planner.

I am not happy with this. I still have the feeling that I am
reimplementing a reduced SRF.

Regards

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-30 19:22:04
Message-ID: CAFj8pRAbYgh3ic+-J5WneNvGbahWXRaomsnoFVNkq1TDZ-znjQ@mail.gmail.com
Lists: pgsql-hackers

Hi

2017-01-25 15:07 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Tom Lane wrote:
> > Andres Freund <andres(at)anarazel(dot)de> writes:
> > > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> > >> XMLTABLE is specified by the standard to return multiple rows ... but
> > >> then as far as my reading goes, it is only supposed to be supported in
> > >> the range table (FROM clause) not in the target list. I wonder if
> > >> this would end up better if we only tried to support it in RT. I
> asked
> > >> Pavel to implement it like that a few weeks ago, but ...
> >
> > > Right - it makes sense in the FROM list - but then it should be an
> > > executor node, instead of some expression thingy.
> >
> > +1 --- we're out of the business of having simple expressions that
> > return rowsets.
>
> Well, that's it. I'm not committing this patch against two other
> committers' opinion, plus I was already on the fence about the
> implementation anyway. I think you should just go with the flow and
> implement this by creating nodeTableexprscan.c. It's not even
> difficult.
>

I am sending a new version. It is based on its own executor scan node and
a tuplestore.

Some now-obsolete regression tests were removed, and some new ones added.

The executor code (memory context usage) should be cleaned up a little
bit, but the other code should be good.

Regards

Pavel

> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Attachment Content-Type Size
xmltable-40.patch.gz application/x-gzip 33.7 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-30 19:38:59
Message-ID: 20170130193859.67kov6mvvxmrrdhl@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:

> I am sending new version - it is based on own executor scan node and
> tuplestore.
>
> Some now obsolete regress tests removed, some new added.
>
> The executor code (memory context usage) should be cleaned little bit - but
> other code should be good.

I think you forgot nodeTableFuncscan.c.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-30 20:18:53
Message-ID: CAFj8pRBNyODWdsawRJ7PsK_+TJMU18zUPBUZ3EyXQA7sY=AsuQ@mail.gmail.com
Lists: pgsql-hackers

Hi

2017-01-30 20:38 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
>
> > I am sending new version - it is based on own executor scan node and
> > tuplestore.
> >
> > Some now obsolete regress tests removed, some new added.
> >
> > The executor code (memory context usage) should be cleaned little bit -
> but
> > other code should be good.
>
> I think you forgot nodeTableFuncscan.c.
>

true, I am sorry

attached

Regards

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Attachment Content-Type Size
xmltable-41.patch.gz application/x-gzip 37.5 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-31 06:33:05
Message-ID: CAB7nPqTC=Dr3D8=PvdTNU2O_QH68duheTADNMvXfJbD9fuyRAA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 31, 2017 at 5:18 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> true, I am sorry

Last status is a new patch and no reviews. On top of that this thread
is quite active. So moved to next CF. Pavel, please be careful about
the status of the patch on the CF app, it was set to "waiting on
author"...
--
Michael


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-31 10:26:32
Message-ID: CAFj8pRD88BHuDyRf5AhS5OudOvKuwiANwYJj_kBORN9=d9zfsg@mail.gmail.com
Lists: pgsql-hackers

2017-01-24 21:38 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
>
> > * SELECT (xmltable(..)).* + regress tests
> > * compilation and regress tests without --with-libxml
>
> Thanks. I just realized that this is doing more work than necessary --
> I think it would be simpler to have tableexpr fill a tuplestore with the
> results, instead of just expecting function execution to apply
> ExecEvalExpr over and over to obtain the results. So evaluating a
> tableexpr returns just the tuplestore, which function evaluation can
> return as-is. That code doesn't use the value-per-call interface
> anyway.
>
> I also realized that the expr context callback is not called if there's
> an error, which leaves us without shutting down libxml properly. I
> added PG_TRY around the fetchrow calls, but I'm not sure that's correct
> either, because there could be an error raised in other parts of the
> code, after we've already emitted a few rows (for example out of
> memory). I think the right way is to have PG_TRY around the execution
> of the whole thing rather than just row at a time; and the tuplestore
> mechanism helps us with that.
>
> I think it would be good to have a more complex test case in regress --
> let's say there is a table with some simple XML values, then we use
> XMLFOREST (or maybe one of the table_to_xml functions) to generate a
> large document, and then XMLTABLE uses that document as input document.
>

I have a real XML document, 16K lines long, 6.MB. We probably would not
want to add it to the regression tests.

It is really fast: the original customer implementation took 20 min, our
nested xpath implementation 10 sec, a PL/Python XML reader 5 sec, and
xmltable 400 ms.

I plan to create tests based on pg_proc and CTEs. If everything works, the
query must return an empty result:

with x as (
  select proname, proowner, procost, pronargs,
         array_to_string(proargnames,',') as proargnames,
         array_to_string(proargtypes,',') as proargtypes
    from pg_proc),
y as (
  select xmlelement(name proc,
                    xmlforest(proname, proowner, procost, pronargs,
                              proargnames, proargtypes)) as proc
    from x),
z as (
  select xmltable.*
    from y,
         lateral xmltable('/proc' passing proc
                          columns proname name,
                                  proowner oid,
                                  procost float,
                                  pronargs int,
                                  proargnames text,
                                  proargtypes text))
select * from z
except
select * from x;

>
> Please fix.
>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-31 13:57:07
Message-ID: 20170131135707.b433fy2ph2v5lp4k@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:
> 2017-01-24 21:38 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> > I think it would be good to have a more complex test case in regress --
> > let's say there is a table with some simple XML values, then we use
> > XMLFOREST (or maybe one of the table_to_xml functions) to generate a
> > large document, and then XMLTABLE uses that document as input document.
>
> I have a 16K lines long real XML 6.MB. Probably we would not to append it
> to regress tests.
>
> It is really fast - original customer implementation 20min, nested our
> xpath implementation 10 sec, PLPython xml reader 5 sec, xmltable 400ms

That's great numbers, kudos for the hard work here. That will make for
a nice headline in pg10 PR materials. But what I was getting at is that
I would like to exercise a bit more of the expression handling in
xmltable execution, to make sure it doesn't handle just string literals.

> I have a plan to create tests based on pg_proc and CTE - if all works, then
> the query must be empty
>
> with x as (select proname, proowner, procost, pronargs,
> array_to_string(proargnames,',') as proargnames,
> array_to_string(proargtypes,',') as proargtypes from pg_proc), y as (select
> xmlelement(name proc, xmlforest(proname, proowner, procost, pronargs,
> proargnames, proargtypes)) as proc from x), z as (select xmltable.* from y,
> lateral xmltable('/proc' passing proc columns proname name, proowner oid,
> procost float, pronargs int, proargnames text, proargtypes text)) select *
> from z except select * from x;

Nice one :-)

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-31 15:32:57
Message-ID: CAFj8pRDyRRR75qiCVPbikyXP8rx1AJ7pN-Xsp8jkHy2c+7pZ7g@mail.gmail.com
Lists: pgsql-hackers

2017-01-31 14:57 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
> > 2017-01-24 21:38 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
> > > I think it would be good to have a more complex test case in regress --
> > > let's say there is a table with some simple XML values, then we use
> > > XMLFOREST (or maybe one of the table_to_xml functions) to generate a
> > > large document, and then XMLTABLE uses that document as input document.
> >
> > I have a 16K lines long real XML 6.MB. Probably we would not to append it
> > to regress tests.
> >
> > It is really fast - original customer implementation 20min, nested our
> > xpath implementation 10 sec, PLPython xml reader 5 sec, xmltable 400ms
>
> That's great numbers, kudos for the hard work here. That will make for
> a nice headline in pg10 PR materials. But what I was getting at is that
> I would like to exercise a bit more of the expression handling in
> xmltable execution, to make sure it doesn't handle just string literals.
>

I'll try to write some more dynamic examples.

>
> > I have a plan to create tests based on pg_proc and CTE - if all works,
> then
> > the query must be empty
> >
> > with x as (select proname, proowner, procost, pronargs,
> > array_to_string(proargnames,',') as proargnames,
> > array_to_string(proargtypes,',') as proargtypes from pg_proc), y as
> (select
> > xmlelement(name proc, xmlforest(proname, proowner, procost, pronargs,
> > proargnames, proargtypes)) as proc from x), z as (select xmltable.* from
> y,
> > lateral xmltable('/proc' passing proc columns proname name, proowner oid,
> > procost float, pronargs int, proargnames text, proargtypes text)) select
> *
> > from z except select * from x;
>
> Nice one :-)
>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-01-31 21:20:49
Message-ID: CAFj8pRA3GE3RAHqLVxGhXtTZt0JuY8AmX1v2owFuYTwCnFLQzw@mail.gmail.com
Lists: pgsql-hackers

Hi

2017-01-31 14:57 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
> > 2017-01-24 21:38 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
> > > I think it would be good to have a more complex test case in regress --
> > > let's say there is a table with some simple XML values, then we use
> > > XMLFOREST (or maybe one of the table_to_xml functions) to generate a
> > > large document, and then XMLTABLE uses that document as input document.
> >
> > I have a 16K lines long real XML 6.MB. Probably we would not to append it
> > to regress tests.
> >
> > It is really fast - original customer implementation 20min, nested our
> > xpath implementation 10 sec, PLPython xml reader 5 sec, xmltable 400ms
>
> That's great numbers, kudos for the hard work here. That will make for
> a nice headline in pg10 PR materials. But what I was getting at is that
> I would like to exercise a bit more of the expression handling in
> xmltable execution, to make sure it doesn't handle just string literals.
>

done

>
> > I have a plan to create tests based on pg_proc and CTE - if all works,
> then
> > the query must be empty
> >
> > with x as (select proname, proowner, procost, pronargs,
> > array_to_string(proargnames,',') as proargnames,
> > array_to_string(proargtypes,',') as proargtypes from pg_proc), y as
> (select
> > xmlelement(name proc, xmlforest(proname, proowner, procost, pronargs,
> > proargnames, proargtypes)) as proc from x), z as (select xmltable.* from
> y,
> > lateral xmltable('/proc' passing proc columns proname name, proowner oid,
> > procost float, pronargs int, proargnames text, proargtypes text)) select
> *
> > from z except select * from x;
>
> Nice one :-)
>

please see attached patch

* enhanced regress tests
* clean memory context work

Regards

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Attachment Content-Type Size
xmltable-42.patch.gz application/x-gzip 38.3 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-02-16 05:38:04
Message-ID: CAFj8pRAeuLpjSEpAKBBOChx-6OVUkETwwHZXQnmTRkUtHpXTiA@mail.gmail.com
Lists: pgsql-hackers

Hi

> please see attached patch
>
> * enhanced regress tests
> * clean memory context work
>

New update:

* fix a bug in string comparison
* fix some typos and obsolete comments

Regards

Pavel

>
> Regards
>
> Pavel
>
>
>>
>> --
>> Álvaro Herrera https://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>

Attachment Content-Type Size
xmltable-43.patch text/x-patch 230.5 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-02-20 18:12:55
Message-ID: CAFj8pRBYzyGLNq7rmbrE=Jx60ku6yEdtNRwojN9qPmLvcSJFxQ@mail.gmail.com
Lists: pgsql-hackers

Hi

2017-02-16 6:38 GMT+01:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

> Hi
>
>
>> please see attached patch
>>
>> * enhanced regress tests
>> * clean memory context work
>>
>
> new update
>
> fix a bug in string compare
> fix some typo and obsolete comments
>
> Regards
>

a minor but interesting fix.

I found that some XML values imported via the recv function can have an
inconsistency between the encoding declared in the header and the encoding
actually used. Internally the header encoding is removed - see the xml_out
function.

So now, when I have to prepare data for libxml2, I don't do a direct cast;
I use xml_out_internal instead. Maybe this technique should be used
elsewhere? I see the same issue in the xpath function.

The issue probably does not occur often - the XML document has to use an
encoding other than UTF8, and it has to be loaded with the recv function.

Regards

Pavel

>
> Pavel
>
>
>>
>> Regards
>>
>> Pavel
>>
>>
>>>
>>> --
>>> Álvaro Herrera https://www.2ndQuadrant.com/
>>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>>
>>
>>
>

Attachment Content-Type Size
xmltable-44.patch text/x-patch 231.2 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-03-02 00:12:45
Message-ID: 20170302001245.klj4wtrx7v3mqrwu@alvherre.pgsql
Lists: pgsql-hackers


I've been giving this a look. I started by tweaking the docs once
again, and while verifying that the example works as expected, I
replayed what I have in sgml:

... begin SGML paste ...
<para>
For example, given the following XML document:
<screen><![CDATA[
<ROWS>
<ROW id="1">
<COUNTRY_ID>AU</COUNTRY_ID>
<COUNTRY_NAME>Australia</COUNTRY_NAME>
</ROW>
<ROW id="5">
<COUNTRY_ID>JP</COUNTRY_ID>
<COUNTRY_NAME>Japan</COUNTRY_NAME>
<PREMIER_NAME>Sinzo Abe</PREMIER_NAME>
</ROW>
<ROW id="6">
<COUNTRY_ID>SG</COUNTRY_ID>
<COUNTRY_NAME>Singapore</COUNTRY_NAME>
<SIZE unit="km">791</SIZE>
</ROW>
</ROWS>
]]></screen>

the following query produces the result shown below:

<screen><![CDATA[
SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL xmltable('//ROWS/ROW'
PASSING data
COLUMNS id int PATH '@id',
ordinality FOR ORDINALITY,
country_name text PATH 'COUNTRY_NAME',
country_id text PATH 'COUNTRY_ID',
size float PATH 'SIZE[@unit = "km"]/text()',
unit text PATH 'SIZE/@unit',
premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified');
... end SGML paste ...

But the query doesn't actually return a table, but instead it fails with
this error:
ERROR: invalid input syntax for type double precision: ""
This is because of the "size" column (if I remove SIZE from the COLUMNS
clause, the query returns correctly). Apparently, for the rows where
SIZE is not given, we try to insert an empty string instead of a NULL
value, which is what I expected.

I'm using your v44 code, but trimmed both the XML document used in SGML
as well as modified the query slightly to show additional features. But
those changes should not cause the above error ...

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-03-02 07:04:14
Message-ID: CAFj8pRCM5Fq4mShcTypcsyLtatejUf=fiO1NmQnVcWgykxJB7A@mail.gmail.com
Lists: pgsql-hackers

Hi

2017-03-02 1:12 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

>
> I've been giving this a look. I started by tweaking the docs once
> again, and while verifying that the example works as expected, I
> replayed what I have in sgml:
>
> ... begin SGML paste ...
> <para>
> For example, given the following XML document:
> <screen><![CDATA[
> <ROWS>
> <ROW id="1">
> <COUNTRY_ID>AU</COUNTRY_ID>
> <COUNTRY_NAME>Australia</COUNTRY_NAME>
> </ROW>
> <ROW id="5">
> <COUNTRY_ID>JP</COUNTRY_ID>
> <COUNTRY_NAME>Japan</COUNTRY_NAME>
> <PREMIER_NAME>Sinzo Abe</PREMIER_NAME>
> </ROW>
> <ROW id="6">
> <COUNTRY_ID>SG</COUNTRY_ID>
> <COUNTRY_NAME>Singapore</COUNTRY_NAME>
> <SIZE unit="km">791</SIZE>
> </ROW>
> </ROWS>
> ]]></screen>
>
> the following query produces the result shown below:
>
> <screen><![CDATA[
> SELECT xmltable.*
> FROM (SELECT data FROM xmldata) x,
> LATERAL xmltable('//ROWS/ROW'
> PASSING data
> COLUMNS id int PATH '@id',
> ordinality FOR ORDINALITY,
> country_name text PATH 'COUNTRY_NAME',
> country_id text PATH 'COUNTRY_ID',
> size float PATH 'SIZE[@unit = "km"]/text()',
> unit text PATH 'SIZE/@unit',
> premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified');
> ... end SGML paste ...
>
>
> But the query doesn't actually return a table, but instead it fails with
> this error:
> ERROR: invalid input syntax for type double precision: ""
> This is because of the "size" column (if I remove SIZE from the COLUMNS
> clause, the query returns correctly). Apparently, for the rows where
> SIZE is not given, we try to insert an empty string instead of a NULL
> value, which is what I expected.
>
> I'm using your v44 code, but trimmed both the XML document used in SGML
> as well as modified the query slightly to show additional features. But
> those changes should not cause the above error ...
>

The example in the docs is obsolete. The following example works without
problems:

SELECT xmltable.*

FROM (SELECT data FROM xmldata) x,
LATERAL xmltable('//ROWS/ROW'
PASSING data
COLUMNS id int PATH '@id',
ordinality FOR ORDINALITY,
country_name text PATH 'COUNTRY_NAME',
country_id text PATH 'COUNTRY_ID',
size float PATH 'SIZE[@unit = "km"]',
unit text PATH 'SIZE/@unit',
premier_name text PATH 'PREMIER_NAME'
DEFAULT 'not specified');

The failure is related to older variants of this patch, where I explicitly
mapped empty strings to NULL.

Now I don't do that - I use the libxml2 result with the following mapping:

No tag ... NULL
empty tag ... empty string
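
As a rough illustration of this mapping, here is a sketch only: the inline
sample document and the column name size_txt are made up for the example,
and the comments state what the mapping above implies rather than verified
output.

```sql
-- Hypothetical sample data: ROW 1 has an empty SIZE tag, ROW 2 has no SIZE tag.
SELECT x.id,
       x.size_txt,
       x.size_txt IS NULL AS size_is_null
FROM (VALUES (xml '<ROWS><ROW id="1"><SIZE/></ROW><ROW id="2"/></ROWS>')) AS d(data),
     XMLTABLE('/ROWS/ROW'
              PASSING d.data
              COLUMNS id int PATH '@id',
                      size_txt text PATH 'SIZE') AS x;
-- Under the mapping above, id 1 should give an empty string (size_is_null false)
-- and id 2 should give NULL (size_is_null true).
```

With a text column the application can still tell the two cases apart.
Declaring the same column as float would pass the empty string to the type's
input function, which is the kind of "invalid input syntax" error reported
above.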

The important question is how to map empty tags to Postgres. I prefer the
current behavior, because it lets the application distinguish between these
two states. If we return NULL for an empty tag, then it is not possible to
detect whether the XML has the tag (although empty) or not. The change is
simple - just one line - but I think the current behavior is better.
There is a possible risk in using /text() somewhere - it forces the
empty-tag result, with all its negative impacts.

I prefer to fix the docs to conform with the regress tests and to append a
note about mapping these corner cases from XML to relations.

What do you think about it?

Regards

Pavel

> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-03-02 08:13:02
Message-ID: CAFj8pRDmxH6xZwnFGKAcKKP=RFys+f3EC4SHHORi2=ud=tFBpg@mail.gmail.com
Lists: pgsql-hackers

2017-03-02 8:04 GMT+01:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

> Hi
>
> 2017-03-02 1:12 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
>>
>> I've been giving this a look. I started by tweaking the docs once
>> again, and while verifying that the example works as expected, I
>> replayed what I have in sgml:
>>
>> ... begin SGML paste ...
>> <para>
>> For example, given the following XML document:
>> <screen><![CDATA[
>> <ROWS>
>> <ROW id="1">
>> <COUNTRY_ID>AU</COUNTRY_ID>
>> <COUNTRY_NAME>Australia</COUNTRY_NAME>
>> </ROW>
>> <ROW id="5">
>> <COUNTRY_ID>JP</COUNTRY_ID>
>> <COUNTRY_NAME>Japan</COUNTRY_NAME>
>> <PREMIER_NAME>Sinzo Abe</PREMIER_NAME>
>> </ROW>
>> <ROW id="6">
>> <COUNTRY_ID>SG</COUNTRY_ID>
>> <COUNTRY_NAME>Singapore</COUNTRY_NAME>
>> <SIZE unit="km">791</SIZE>
>> </ROW>
>> </ROWS>
>> ]]></screen>
>>
>> the following query produces the result shown below:
>>
>> <screen><![CDATA[
>> SELECT xmltable.*
>> FROM (SELECT data FROM xmldata) x,
>> LATERAL xmltable('//ROWS/ROW'
>> PASSING data
>> COLUMNS id int PATH '@id',
>> ordinality FOR ORDINALITY,
>> country_name text PATH 'COUNTRY_NAME',
>> country_id text PATH 'COUNTRY_ID',
>> size float PATH 'SIZE[@unit = "km"]/text()',
>> unit text PATH 'SIZE/@unit',
>> premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified');
>> ... end SGML paste ...
>>
>>
>> But the query doesn't actually return a table, but instead it fails with
>> this error:
>> ERROR: invalid input syntax for type double precision: ""
>> This is because of the "size" column (if I remove SIZE from the COLUMNS
>> clause, the query returns correctly). Apparently, for the rows where
>> SIZE is not given, we try to insert an empty string instead of a NULL
>> value, which is what I expected.
>>
>> I'm using your v44 code, but trimmed both the XML document used in SGML
>> as well as modified the query slightly to show additional features. But
>> those changes should not cause the above error ...
>>
>
> The example in doc is obsolete. Following example works without problems.
>
> SELECT xmltable.*
>
> FROM (SELECT data FROM xmldata) x,
> LATERAL xmltable('//ROWS/ROW'
> PASSING data
> COLUMNS id int PATH '@id',
> ordinality FOR ORDINALITY,
> country_name text PATH 'COUNTRY_NAME',
> country_id text PATH 'COUNTRY_ID',
> size float PATH 'SIZE[@unit = "km"]',
> unit text PATH 'SIZE/@unit',
> premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified');
>
>
> It is related to older variants of this patch, where I explicitly mapped
> empty strings to NULL.
>
> Now, I don't do it - I use libxml2 result with following mapping
>
> No tag ... NULL
> empty tag ... empty string
>
> Important question is about mapping empty tags to Postgres. I prefer
> current behave, because I have a possibility to differ between these states
> on application level. If we returns NULL for empty tag, then there will not
> be possible detect if XML has tag (although empty) or not. The change is
> simple - just one row - but I am thinking so current behave is better.
> There is possible risk of using /text() somewhere - it enforce a empty tag
> with all negative impacts.
>
> I prefer to fix doc in conformance with regress tests and append note
> about mapping these corner cases from XML to relations.
>
> What do you think about it?
>

It is documented already

"If the <literal>PATH</> matches an empty tag the result is an empty string"

New patch attached:

* cleaned documentation
* regress tests are more robust
* added a comment in the source about generating an empty string for an
  empty tag

Regards

Pavel

>
> Regards
>
> Pavel
>
>
>
>
>
>
>
>
>
>> --
>> Álvaro Herrera https://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>

Attachment Content-Type Size
xmltable-45.patch.gz application/x-gzip 33.1 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2017-03-02 17:13:53
Message-ID: 20170302171353.s4celogyelzvli5x@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:

> It is documented already
>
> "If the <literal>PATH</> matches an empty tag the result is an empty string"

Hmm, okay. But what we have here is not an empty tag, but a tag that is
completely missing. I don't think those two cases should be treated in
the same way ...

> Attached new patch
>
> cleaned documentation
> regress tests is more robust
> appended comment in src related to generating empty string for empty tag

Thanks, I incorporated those changes. Here's v46. I rewrote the
documentation, and fixed a couple of incorrectly copied&pasted comments
in the new executor code; I think that one looks good. In the future we
could rewrite it to avoid the need for a tuplestore, but I think the
current approach is good enough for a pg10 implementation.

Barring serious problems, I intend to commit this later today.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
xmltable-46.patch.gz application/x-gunzip 36.8 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: patch: function xmltable
Date: 2017-03-02 17:46:23
Message-ID: CAFj8pRAnbh+jS3+zy=ZhV7bYMynHALQrwYAuGh+2-to3Grc52A@mail.gmail.com
Lists: pgsql-hackers

On 2. 3. 2017 at 18:14, "Alvaro Herrera" <alvherre(at)2ndquadrant(dot)com> wrote:

> Pavel Stehule wrote:
>
> > It is documented already
> >
> > "If the <literal>PATH</> matches an empty tag the result is an empty
> > string"
>
> Hmm, okay. But what we have here is not an empty tag, but a tag that is
> completely missing. I don't think those two cases should be treated in
> the same way ...

This information is not propagated from libxml2.

> > Attached new patch
> >
> > cleaned documentation
> > regress tests is more robust
> > appended comment in src related to generating empty string for empty tag
>
> Thanks, I incorporated those changes. Here's v46. I rewrote the
> documentation, and fixed a couple of incorrectly copied&pasted comments
> in the new executor code; I think that one looks good. In the future we
> could rewrite it to avoid the need for a tuplestore, but I think the
> current approach is good enough for a pg10 implementation.
>
> Barring serious problems, I intend to commit this later today.

Thank you very much.

Regards

Pavel

> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: patch: function xmltable
Date: 2017-03-02 18:32:05
Message-ID: 20170302183205.hgz3ekvwbfnzcymb@alvherre.pgsql
Lists: pgsql-hackers

So in the old (non-executor-node) implementation, you could attach WITH
ORDINALITY to the xmltable expression and it would count the output
rows, regardless of which XML document it comes from. With the new
implementation, the grammar no longer accepts it. To count output rows,
you still need to use row_number(). Maybe this is okay. This is the
example from the docs, and I add another XML document with two more rows
for xmltable. Look at the three numbering columns ...

CREATE TABLE xmldata AS SELECT
xml $$
<ROWS>
<ROW id="1">
<COUNTRY_ID>AU</COUNTRY_ID>
<COUNTRY_NAME>Australia</COUNTRY_NAME>
</ROW>
<ROW id="5">
<COUNTRY_ID>JP</COUNTRY_ID>
<COUNTRY_NAME>Japan</COUNTRY_NAME>
<PREMIER_NAME>Shinzo Abe</PREMIER_NAME>
<SIZE unit="sq_mi">145935</SIZE>
</ROW>
<ROW id="6">
<COUNTRY_ID>SG</COUNTRY_ID>
<COUNTRY_NAME>Singapore</COUNTRY_NAME>
<SIZE unit="sq_km">697</SIZE>
</ROW>
</ROWS>
$$ AS data;

insert into xmldata values ($$
<ROWS><ROW id="2"><COUNTRY_ID>CL</COUNTRY_ID><COUNTRY_NAME>Chile</COUNTRY_NAME></ROW>
<ROW id="3"><COUNTRY_ID>AR</COUNTRY_ID><COUNTRY_NAME>Argentina</COUNTRY_NAME></ROW></ROWS>$$);

SELECT ROW_NUMBER() OVER (), xmltable.*
FROM xmldata,
XMLTABLE('//ROWS/ROW'
PASSING data
COLUMNS id int PATH '@id',
ordinality FOR ORDINALITY,
"COUNTRY_NAME" text,
country_id text PATH 'COUNTRY_ID',
size_sq_km float PATH 'SIZE[@unit = "sq_km"]',
size_other text PATH
'concat(SIZE[@unit!="sq_km"], " ", SIZE[@unit!="sq_km"]/@unit)',
premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified')
;

 row_number │ id │ ordinality │ COUNTRY_NAME │ country_id │ size_sq_km │  size_other  │ premier_name
────────────┼────┼────────────┼──────────────┼────────────┼────────────┼──────────────┼───────────────
          1 │  1 │          1 │ Australia    │ AU         │            │              │ not specified
          2 │  5 │          2 │ Japan        │ JP         │            │ 145935 sq_mi │ Shinzo Abe
          3 │  6 │          3 │ Singapore    │ SG         │        697 │              │ not specified
          4 │  2 │          1 │ Chile        │ CL         │            │              │ not specified
          5 │  3 │          2 │ Argentina    │ AR         │            │              │ not specified

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: patch: function xmltable
Date: 2017-03-02 19:17:12
Message-ID: CAFj8pRABdOReqEYxgeDPy80BRfy3kv3xT9D1DpqJvfdx-DizkA@mail.gmail.com
Lists: pgsql-hackers

2017-03-02 19:32 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> So in the old (non-executor-node) implementation, you could attach WITH
> ORDINALITY to the xmltable expression and it would count the output
> rows, regardless of which XML document it comes from. With the new
> implementation, the grammar no longer accepts it. To count output rows,
> you still need to use row_number(). Maybe this is okay. This is the
> example from the docs, and I add another XML document with two more rows
> for xmltable. Look at the three numbering columns ...
>

This is expected - a tablefunc is no longer a special case of an SRF, so it
has lost all SRF functionality. It is not a critical loss - it internally
supports a FOR ORDINALITY column, and the classic ROW_NUMBER can be used. It
can be enhanced to support WITH ORDINALITY in the future, but I don't have
any use case for it.

Regards

Pavel

>
> CREATE TABLE xmldata AS SELECT
> xml $$
> <ROWS>
> <ROW id="1">
> <COUNTRY_ID>AU</COUNTRY_ID>
> <COUNTRY_NAME>Australia</COUNTRY_NAME>
> </ROW>
> <ROW id="5">
> <COUNTRY_ID>JP</COUNTRY_ID>
> <COUNTRY_NAME>Japan</COUNTRY_NAME>
> <PREMIER_NAME>Shinzo Abe</PREMIER_NAME>
> <SIZE unit="sq_mi">145935</SIZE>
> </ROW>
> <ROW id="6">
> <COUNTRY_ID>SG</COUNTRY_ID>
> <COUNTRY_NAME>Singapore</COUNTRY_NAME>
> <SIZE unit="sq_km">697</SIZE>
> </ROW>
> </ROWS>
> $$ AS data;
>
> insert into xmldata values ($$
> <ROWS><ROW id="2"><COUNTRY_ID>CL</COUNTRY_ID><COUNTRY_NAME>Chile</COUNTRY_NAME></ROW>
> <ROW id="3"><COUNTRY_ID>AR</COUNTRY_ID><COUNTRY_NAME>Argentina</COUNTRY_NAME></ROW></ROWS>$$);
>
> SELECT ROW_NUMBER() OVER (), xmltable.*
> FROM xmldata,
> XMLTABLE('//ROWS/ROW'
> PASSING data
> COLUMNS id int PATH '@id',
> ordinality FOR ORDINALITY,
> "COUNTRY_NAME" text,
> country_id text PATH 'COUNTRY_ID',
> size_sq_km float PATH 'SIZE[@unit = "sq_km"]',
> size_other text PATH
> 'concat(SIZE[@unit!="sq_km"], " ", SIZE[@unit!="sq_km"]/@unit)',
> premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified')
> ;
>
>  row_number │ id │ ordinality │ COUNTRY_NAME │ country_id │ size_sq_km │  size_other  │ premier_name
> ────────────┼────┼────────────┼──────────────┼────────────┼────────────┼──────────────┼───────────────
>           1 │  1 │          1 │ Australia    │ AU         │            │              │ not specified
>           2 │  5 │          2 │ Japan        │ JP         │            │ 145935 sq_mi │ Shinzo Abe
>           3 │  6 │          3 │ Singapore    │ SG         │        697 │              │ not specified
>           4 │  2 │          1 │ Chile        │ CL         │            │              │ not specified
>           5 │  3 │          2 │ Argentina    │ AR         │            │              │ not specified
>
>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: patch: function xmltable
Date: 2017-03-02 21:35:44
Message-ID: 20170302213544.ml3lgkip4rwirjnd@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:
> 2017-03-02 19:32 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
> > So in the old (non-executor-node) implementation, you could attach WITH
> > ORDINALITY to the xmltable expression and it would count the output
> > rows, regardless of which XML document it comes from. With the new
> > implementation, the grammar no longer accepts it. To count output rows,
> > you still need to use row_number(). Maybe this is okay. This is the
> > example from the docs, and I add another XML document with two more rows
> > for xmltable. Look at the three numbering columns ...
> >
>
> It is expected - now tablefunc are not special case of SRF, so it lost all
> SRF functionality. It is not critical lost - it supports internally FOR
> ORDINALITY column, and classic ROW_NUMBER can be used. It can be enhanced
> to support WITH ORDINALITY in future, but I have not any use case for it.

Fine.

After looking at the new executor code a bit, I noticed that we don't
need the resultSlot anymore; we can use the ss_ScanTupleSlot instead.
Because resultSlot was being used in the xml.c code (which already
appeared a bit dubious to me), I changed the interface so that instead
the things that it read from it are passed as parameters -- namely, in
InitBuilder we pass natts, and in GetValue we pass typid and typmod.

Secondly, I noticed we have the FetchRow routine produce a minimal
tuple, put it in a slot; then its caller takes the slot and put the
tuple in the tuplestore. This is pointless; we can just have FetchRow
put the tuple in the tuplestore directly and not bother with any slot
manipulations there. This simplifies the code a bit.

Here's v47 with those changes.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
xmltable-47.patch.gz application/x-gunzip 36.8 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: patch: function xmltable
Date: 2017-03-03 09:28:54
Message-ID: CAFj8pRCfSUihcWts78_HmcTFnAfgeCk304wZ6w+k+ADt=eQt5A@mail.gmail.com
Lists: pgsql-hackers

2017-03-02 22:35 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
> > 2017-03-02 19:32 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
> >
> > > So in the old (non-executor-node) implementation, you could attach WITH
> > > ORDINALITY to the xmltable expression and it would count the output
> > > rows, regardless of which XML document it comes from. With the new
> > > implementation, the grammar no longer accepts it. To count output rows,
> > > you still need to use row_number(). Maybe this is okay. This is the
> > > example from the docs, and I add another XML document with two more rows
> > > for xmltable. Look at the three numbering columns ...
> > >
> >
> > It is expected - now tablefunc are not special case of SRF, so it lost all
> > SRF functionality. It is not critical lost - it supports internally FOR
> > ORDINALITY column, and classic ROW_NUMBER can be used. It can be enhanced
> > to support WITH ORDINALITY in future, but I have not any use case for it.
>
> Fine.
>
> After looking at the new executor code a bit, I noticed that we don't
> need the resultSlot anymore; we can use the ss_ScanTupleSlot instead.
> Because resultSlot was being used in the xml.c code (which already
> appeared a bit dubious to me), I changed the interface so that instead
> the things that it read from it are passed as parameters -- namely, in
> InitBuilder we pass natts, and in GetValue we pass typid and typmod.
>

I had a similar feeling

>
> Secondly, I noticed we have the FetchRow routine produce a minimal
> tuple, put it in a slot; then its caller takes the slot and put the
> tuple in the tuplestore. This is pointless; we can just have FetchRow
> put the tuple in the tuplestore directly and not bother with any slot
> manipulations there. This simplifies the code a bit.
>
>
makes sense

attached update with fixed tests

Regards

Pavel

> Here's v47 with those changes.
>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Attachment Content-Type Size
xmltable-48.patch.gz application/x-gzip 37.4 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: patch: function xmltable
Date: 2017-03-03 18:15:11
Message-ID: 20170303181511.jzfffaksrn22t2dy@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:

> attached update with fixed tests

Heh, I noticed that you removed the libxml "context" lines that
differentiate xml.out from xml_2.out when doing this. My implementation
emits those lines, so it was failing for me. I restored them.

I also changed a few things to avoid copying into TableFuncScanState
things that come from the TableFunc itself, since the executor state
node can grab them from the plan node. Let's do that. So instead of
"evalcols" the code now checks that the column list is empty; and also,
read the ordinality column number from the plan node.

I have to bounce this back to you one more time, hopefully the last one.
Two things:

1. Please verify that pg_stat_statements behaves correctly. The patch
doesn't have changes to contrib/ so without testing I'm guessing that it
doesn't work. I think something very simple should do.

2. As I've complained many times, I find the way we manage an empty
COLUMNS clause pretty bad. The standard doesn't require that syntax
(COLUMNS is required), and I don't like the implementation, so why not
provide the feature in a different way? My proposal is to change the
column options in gram.y to be something like this:

xmltable_column_option_el:
IDENT b_expr
{ $$ = makeDefElem($1, $2, @1); }
| DEFAULT b_expr
{ $$ = makeDefElem("default", $2, @1); }
| FULL VALUE_P
{ $$ = makeDefElem("full_value", NULL, @1); }
| NOT NULL_P
{ $$ = makeDefElem("is_not_null", (Node *) makeInteger(true), @1); }
| NULL_P
{ $$ = makeDefElem("is_not_null", (Node *) makeInteger(false), @1); }
;

Note the FULL VALUE. Then we can process it like

else if (strcmp(defel->defname, "full_value") == 0)
{
if (fc->colexpr != NULL)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("FULL ROW may not be specified together with PATH"),
parser_errposition(defel->location)));
fc->full_row = true;
}

So if you want the full XML value of the row, you have to specify it,

.. XMLTABLE ( ... COLUMNS ..., whole_row xml FULL VALUE, ... )

This has the extra feature that you can add, say, an ORDINALITY column
together with the XML value, something that you cannot do with the
current implementation.

It doesn't have to be FULL VALUE, but I couldn't think of anything
better. (I didn't want to add any new keywords for this.) If you have
a better idea, let's discuss.

Code-wise, this completely removes the "else" block in transformRangeTableFunc
which I marked with an XXX comment. That's a good thing -- let's get
rid of that. Also, it should remove the need for the separate "if
!columns" case in tfuncLoadRows. All those cases would become part of
the normal code path instead of special cases. I think
XmlTableSetColumnFilter doesn't need any change (we just don't call it
for the FULL VALUE row); and XmlTableGetValue needs a special case that
if the column filter is NULL (i.e. SetColumnFilter wasn't called for
that column) then return the whole row.

Of course, this opens an implementation issue: how do you annotate
things from parse analysis till execution? The current TableFunc
structure doesn't help, because there are only lists of column names and
expressions; and we can't use the case of a NULL colexpr, because that
case is already used by the column filter being the column name (a
feature required by the standard). A simple way would be to have a new
"colno" struct member, to store a column number for the column marked
FULL VALUE (just like ordinalitycol). This means you can't have more
than one of those FULL VALUE columns, but that seems okay.

(Of course, this means that the two cases that have no COLUMNS in the
"xmltable" production in gram.y should go away).
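
Returning to point 1, a quick manual check might look like the following.
This is only a sketch: it assumes pg_stat_statements is listed in
shared_preload_libraries, and the inline documents and the column name v are
made up for the example.

```sql
-- Assumes pg_stat_statements is loaded via shared_preload_libraries.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
SELECT pg_stat_statements_reset();

-- Two calls that differ only in their constants.
SELECT x.* FROM XMLTABLE('/r/v' PASSING '<r><v>1</v></r>'
                         COLUMNS v int PATH '.') AS x;
SELECT x.* FROM XMLTABLE('/r/v' PASSING '<r><v>2</v><v>3</v></r>'
                         COLUMNS v int PATH '.') AS x;

-- Inspect how the statements were recorded.
SELECT query, calls
FROM pg_stat_statements
WHERE query ILIKE '%xmltable%';
```

If query normalization handles the new node, the two calls would be expected
to fold into a single entry; separate entries, or entries that ignore
differences in the column definitions, would suggest that contrib/ needs a
matching change.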

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
xmltable-49.patch.gz application/x-gunzip 36.6 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: patch: function xmltable
Date: 2017-03-03 18:42:11
Message-ID: CAFj8pRAPau_HzD1Yo_B64t6X4bv-XXXEND5aKRo-A0w47BK6KQ@mail.gmail.com
Lists: pgsql-hackers

2017-03-03 19:15 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
>
> > attached update with fixed tests
>
> Heh, I noticed that you removed the libxml "context" lines that
> differentiate xml.out from xml_2.out when doing this. My implementation
> emits those lines, so it was failing for me. I restored them.
>
> I also changed a few things to avoid copying into TableFuncScanState
> things that come from the TableFunc itself, since the executor state
> node can grab them from the plan node. Let's do that. So instead of
> "evalcols" the code now checks that the column list is empty; and also,
> read the ordinality column number from the plan node.
>
> I have to bounce this back to you one more time, hopefully the last one
> I hope. Two things:
>
> 1. Please verify that pg_stat_statements behaves correctly. The patch
> doesn't have changes to contrib/ so without testing I'm guessing that it
> doesn't work. I think something very simple should do.
>
> 2. As I've complained many times, I find the way we manage an empty
> COLUMNS clause pretty bad. The standard doesn't require that syntax
> (COLUMNS is required), and I don't like the implementation, so why not
> provide the feature in a different way? My proposal is to change the
> column options in gram.y to be something like this:

The COLUMNS clause is optional in Oracle and DB2.

So I prefer the Oracle/DB2 design. If you are strongly against it, then we
can remove it and be ANSI SQL only.

I don't think it is a good idea to introduce a third syntax.

> xmltable_column_option_el:
> IDENT b_expr
> { $$ = makeDefElem($1, $2, @1); }
> | DEFAULT b_expr
> { $$ = makeDefElem("default", $2, @1); }
> | FULL VALUE_P
> { $$ = makeDefElem("full_value", NULL, @1); }
> | NOT NULL_P
> { $$ = makeDefElem("is_not_null", (Node *) makeInteger(true), @1); }
> | NULL_P
> { $$ = makeDefElem("is_not_null", (Node *) makeInteger(false), @1); }
> ;
>
> Note the FULL VALUE. Then we can process it like
>
> else if (strcmp(defel->defname, "full_value") == 0)
> {
> if (fc->colexpr != NULL)
> ereport(ERROR,
> (errcode(ERRCODE_SYNTAX_ERROR),
> errmsg("FULL ROW may not be specified together with PATH"),
> parser_errposition(defel->location)));
> fc->full_row = true;
> }
>
> So if you want the full XML value of the row, you have to specify it,
>
> .. XMLTABLE ( ... COLUMNS ..., whole_row xml FULL VALUE, ... )
>
> This has the extra feature that you can add, say, an ORDINALITY column
> together with the XML value, something that you cannot do with the
> current implementation.
>
> It doesn't have to be FULL VALUE, but I couldn't think of anything
> better. (I didn't want to add any new keywords for this.) If you have
> a better idea, let's discuss.
>

I don't see introducing our own syntax as a necessary solution here - use
the Oracle/DB2 compatible syntax, or ANSI.

It is partly a corner case - the benefit of this case is mostly better
compatibility with the mentioned databases.

>
> Code-wise, this completely removes the "else" block in
> transformRangeTableFunc
> which I marked with an XXX comment. That's a good thing -- let's get
> rid of that. Also, it should remove the need for the separate "if
> !columns" case in tfuncLoadRows. All those cases would become part of
> the normal code path instead of special cases. I think
> XmlTableSetColumnFilter doesn't need any change (we just don't call it
> for the FULL VALUE row); and XmlTableGetValue needs a special case that
> if the column filter is NULL (i.e. SetColumnFilter wasn't called for
> that column) then return the whole row.
>
>
> Of course, this opens an implementation issue: how do you annotate
> things from parse analysis till execution? The current TableFunc
> structure doesn't help, because there are only lists of column names and
> expressions; and we can't use the case of a NULL colexpr, because that
> case is already used by the column filter being the column name (a
> feature required by the standard). A simple way would be to have a new
> "colno" struct member, to store a column number for the column marked
> FULL VALUE (just like ordinalitycol). This means you can't have more
> than one of those FULL VALUE columns, but that seems okay.
>
>
> (Of course, this means that the two cases that have no COLUMNS in the
> "xmltable" production in gram.y should go away).
>

You are the committer, so you should decide - my first preference is the
current state, my second is to remove this part - that should be good for
you too, because the code that you don't like will be gone.

I dislike introducing new syntax - this case is not important enough for that.

Regards

Pavel

> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: patch: function xmltable
Date: 2017-03-03 19:23:03
Message-ID: CAFj8pRC6nPFsBSV4pEkzvZ-OaAq=j5t0bR_D24kPd_E1tes4Qg@mail.gmail.com
Lists: pgsql-hackers

2017-03-03 19:42 GMT+01:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

>
>
> 2017-03-03 19:15 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
>> Pavel Stehule wrote:
>>
>> > attached update with fixed tests
>>
>> Heh, I noticed that you removed the libxml "context" lines that
>> differentiate xml.out from xml_2.out when doing this. My implementation
>> emits those lines, so it was failing for me. I restored them.
>>
>> I also changed a few things to avoid copying into TableFuncScanState
>> things that come from the TableFunc itself, since the executor state
>> node can grab them from the plan node. Let's do that. So instead of
>> "evalcols" the code now checks that the column list is empty; and also,
>> read the ordinality column number from the plan node.
>>
>> I have to bounce this back to you one more time, hopefully the last one
>> I hope. Two things:
>>
>> 1. Please verify that pg_stat_statements behaves correctly. The patch
>> doesn't have changes to contrib/ so without testing I'm guessing that it
>> doesn't work. I think something very simple should do.
>>
>> 2. As I've complained many times, I find the way we manage an empty
>> COLUMNS clause pretty bad. The standard doesn't require that syntax
>> (COLUMNS is required), and I don't like the implementation, so why not
>> provide the feature in a different way? My proposal is to change the
>> column options in gram.y to be something like this:
>
>
> The clause COLUMNS is optional on Oracle and DB2
>
> So I prefer a Oracle, DB2 design. If you are strongly against it, then we
> can remove it to be ANSI/SQL only.
>
> I am don't see an good idea to introduce third syntax.
>
>
>> xmltable_column_option_el:
>>             IDENT b_expr
>>                 { $$ = makeDefElem($1, $2, @1); }
>>             | DEFAULT b_expr
>>                 { $$ = makeDefElem("default", $2, @1); }
>>             | FULL VALUE_P
>>                 { $$ = makeDefElem("full_value", NULL, @1); }
>>             | NOT NULL_P
>>                 { $$ = makeDefElem("is_not_null", (Node *) makeInteger(true), @1); }
>>             | NULL_P
>>                 { $$ = makeDefElem("is_not_null", (Node *) makeInteger(false), @1); }
>>         ;
>>
>> Note the FULL VALUE. Then we can process it like
>>
>>     else if (strcmp(defel->defname, "full_value") == 0)
>>     {
>>         if (fc->colexpr != NULL)
>>             ereport(ERROR,
>>                     (errcode(ERRCODE_SYNTAX_ERROR),
>>                      errmsg("FULL ROW may not be specified together with PATH"),
>>                      parser_errposition(defel->location)));
>>         fc->full_row = true;
>>     }
>>
>> So if you want the full XML value of the row, you have to specify it,
>>
>> .. XMLTABLE ( ... COLUMNS ..., whole_row xml FULL VALUE, ... )
>>
>> This has the extra feature that you can add, say, an ORDINALITY column
>> together with the XML value, something that you cannot do with the
>> current implementation.
>>
>> It doesn't have to be FULL VALUE, but I couldn't think of anything
>> better. (I didn't want to add any new keywords for this.) If you have
>> a better idea, let's discuss.
>>
>
> I don't see a introduction own syntax as necessary solution here - use
> Oracle, DB2 compatible syntax, or ANSI.
>
> It is partially corner case - the benefit of this case is almost bigger
> compatibility with mentioned databases.
>
>
>>
>> Code-wise, this completely removes the "else" block in
>> transformRangeTableFunc
>> which I marked with an XXX comment. That's a good thing -- let's get
>> rid of that. Also, it should remove the need for the separate "if
>> !columns" case in tfuncLoadRows. All those cases would become part of
>> the normal code path instead of special cases. I think
>> XmlTableSetColumnFilter doesn't need any change (we just don't call it
>> for the FULL VALUE row); and XmlTableGetValue needs a special case that
>> if the column filter is NULL (i.e. SetColumnFilter wasn't called for
>> that column) then return the whole row.
>>
>>
>> Of course, this opens an implementation issue: how do you annotate
>> things from parse analysis till execution? The current TableFunc
>> structure doesn't help, because there are only lists of column names and
>> expressions; and we can't use the case of a NULL colexpr, because that
>> case is already used by the column filter being the column name (a
>> feature required by the standard). A simple way would be to have a new
>> "colno" struct member, to store a column number for the column marked
>> FULL VALUE (just like ordinalitycol). This means you can't have more
>> than one of those FULL VALUE columns, but that seems okay.
>>
>>
>> (Of course, this means that the two cases that have no COLUMNS in the
>> "xmltable" production in gram.y should go away).
>>
>
> You are commiter, and you should to decide - as first I prefer current
> state, as second a remove this part - it should be good for you too,
> because code that you don't like will be left.
>
> I dislike introduce new syntax - this case is not too important for this.
>
>
I am able to prepare a reduced version if we reach an agreement.

Regards

Pavel

> Regards
>
> Pavel
>
>
>> --
>> Álvaro Herrera https://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: patch: function xmltable
Date: 2017-03-03 20:04:28
Message-ID: 20170303200428.4whzced44cjkk2ru@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:
> 2017-03-03 19:15 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> > 2. As I've complained many times, I find the way we manage an empty
> > COLUMNS clause pretty bad. The standard doesn't require that syntax
> > (COLUMNS is required), and I don't like the implementation, so why not
> > provide the feature in a different way? My proposal is to change the
> > column options in gram.y to be something like this:
>
> The clause COLUMNS is optional on Oracle and DB2
>
> So I prefer a Oracle, DB2 design. If you are strongly against it, then we
> can remove it to be ANSI/SQL only.
>
> I am don't see an good idea to introduce third syntax.

OK. I think trying to be syntax compatible with DB2 or Oracle is a lost
cause, because the syntax used in the XPath expressions seems different
-- I think Oracle uses XQuery (which we don't support) and DB2 uses ...
not sure what it is, but it doesn't work in our implementation
(stuff like '$d/employees/emp' in the row expression.)

In existing applications using Oracle or DB2, is it common to omit
the COLUMNS clause? I searched for "xmltable oracle" and had a look at
the first few hits outside of the oracle docs:
http://viralpatel.net/blogs/oracle-xmltable-tutorial/
http://www.dba-oracle.com/t_xmltable.htm
http://stackoverflow.com/questions/12690868/how-to-use-xmltable-in-oracle
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:9533111800346252295
http://stackoverflow.com/questions/1222570/what-is-an-xmltable
https://community.oracle.com/thread/3955198

Not a single one of these omits the COLUMNS clause (though the second one
mentions that the clause can be omitted).

I also looked at a few samples with DB2 -- same thing; it is possible,
but is it common?

Anyway, I noticed that "xml PATH '.'" can be used to obtain the full XML
of the row, which I think is the feature I wanted, so I think we're
covered and we can omit the case with no COLUMNS, since we already have
the feature in another way. No need to implement anything further, and
we can rip out the special case I don't like. Example:

CREATE TABLE EMPLOYEES
(
id integer,
data XML
);
INSERT INTO EMPLOYEES
VALUES (1, '<Employees>
<Employee emplid="1111" type="admin">
<firstname>John</firstname>
<lastname>Watson</lastname>
<age>30</age>
<email>johnwatson(at)sh(dot)com</email>
</Employee>
<Employee emplid="2222" type="admin">
<firstname>Sherlock</firstname>
<lastname>Homes</lastname>
<age>32</age>
<email>sherlock(at)sh(dot)com</email>
</Employee>
<Employee emplid="3333" type="user">
<firstname>Jim</firstname>
<lastname>Moriarty</lastname>
<age>52</age>
<email>jim(at)sh(dot)com</email>
</Employee>
<Employee emplid="4444" type="user">
<firstname>Mycroft</firstname>
<lastname>Holmes</lastname>
<age>41</age>
<email>mycroft(at)sh(dot)com</email>
</Employee>
</Employees>');

This is with COLUMNS omitted:

alvherre=# select xmltable.* from employees, xmltable('/Employees/Employee' passing data);
xmltable
──────────────────────────────────────────
<Employee emplid="1111" type="admin"> ↵
<firstname>John</firstname> ↵
<lastname>Watson</lastname> ↵
<age>30</age> ↵
<email>johnwatson(at)sh(dot)com</email>↵
</Employee>
<Employee emplid="2222" type="admin"> ↵
<firstname>Sherlock</firstname> ↵
<lastname>Homes</lastname> ↵
<age>32</age> ↵
<email>sherlock(at)sh(dot)com</email> ↵
</Employee>
<Employee emplid="3333" type="user"> ↵
<firstname>Jim</firstname> ↵
<lastname>Moriarty</lastname> ↵
<age>52</age> ↵
<email>jim(at)sh(dot)com</email> ↵
</Employee>
<Employee emplid="4444" type="user"> ↵
<firstname>Mycroft</firstname> ↵
<lastname>Holmes</lastname> ↵
<age>41</age> ↵
<email>mycroft(at)sh(dot)com</email> ↵
</Employee>

and this is what you get with "xml PATH '.'" (I threw in ORDINALITY just
for fun):

alvherre=# select xmltable.* from employees, xmltable('/Employees/Employee' passing data columns row_number for ordinality, emp xml path '.');
row_number │ emp
────────────┼──────────────────────────────────────────
1 │ <Employee emplid="1111" type="admin"> ↵
│ <firstname>John</firstname> ↵
│ <lastname>Watson</lastname> ↵
│ <age>30</age> ↵
│ <email>johnwatson(at)sh(dot)com</email>↵
│ </Employee>
2 │ <Employee emplid="2222" type="admin"> ↵
│ <firstname>Sherlock</firstname> ↵
│ <lastname>Homes</lastname> ↵
│ <age>32</age> ↵
│ <email>sherlock(at)sh(dot)com</email> ↵
│ </Employee>
3 │ <Employee emplid="3333" type="user"> ↵
│ <firstname>Jim</firstname> ↵
│ <lastname>Moriarty</lastname> ↵
│ <age>52</age> ↵
│ <email>jim(at)sh(dot)com</email> ↵
│ </Employee>
4 │ <Employee emplid="4444" type="user"> ↵
│ <firstname>Mycroft</firstname> ↵
│ <lastname>Holmes</lastname> ↵
│ <age>41</age> ↵
│ <email>mycroft(at)sh(dot)com</email> ↵
│ </Employee>

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: patch: function xmltable
Date: 2017-03-03 21:41:57
Message-ID: CAFj8pRBrKhBaPcacv1_6xRPN=GTfQdu+co+YRZaF-9umOawaQg@mail.gmail.com
Lists: pgsql-hackers

2017-03-03 21:04 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
> > 2017-03-03 19:15 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
> > > 2. As I've complained many times, I find the way we manage an empty
> > > COLUMNS clause pretty bad. The standard doesn't require that syntax
> > > (COLUMNS is required), and I don't like the implementation, so why not
> > > provide the feature in a different way? My proposal is to change the
> > > column options in gram.y to be something like this:
> >
> > The clause COLUMNS is optional on Oracle and DB2
> >
> > So I prefer a Oracle, DB2 design. If you are strongly against it, then we
> > can remove it to be ANSI/SQL only.
> >
> > I am don't see an good idea to introduce third syntax.
>
> OK. I think trying to be syntax compatible with DB2 or Oracle is a lost
> cause, because the syntax used in the XPath expressions seems different
> -- I think Oracle uses XQuery (which we don't support) and DB2 uses ...
> not sure what it is, but it doesn't work in our implementation
> (stuff like '$d/employees/emp' in the row expression.)
>

100% compatibility is not possible - but XPath is a subset of XQuery, and in
reality full XQuery examples of XMLTABLE are not common.

Almost all the examples of XMLTABLE usage that I found in blogs use XPath
only.

>
> In existing applications using those Oracle/DB2, is it common to omit
> the COLUMNS clause? I searched for "xmltable oracle" and had a look at
> the first few hits outside of the oracle docs:
> http://viralpatel.net/blogs/oracle-xmltable-tutorial/
> http://www.dba-oracle.com/t_xmltable.htm
> http://stackoverflow.com/questions/12690868/how-to-use-xmltable-in-oracle
> https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:9533111800346252295
> http://stackoverflow.com/questions/1222570/what-is-an-xmltable
> https://community.oracle.com/thread/3955198
>
> Not a single one of these omit the COLUMNS clause (though the second one
> mentions that the clause can be omitted).
>
> I also looked at a few samples with DB2 -- same thing; it is possible,
> but is it common?
>

I don't think it is common - it is a corner case - and I can live without
it.

>
> Anyway, I noticed that "xml PATH '.'" can be used to obtain the full XML
> of the row, which I think is the feature I wanted, so I think we're
> covered and we can omit the case with no COLUMNS, since we already have
> the feature in another way. No need to implement anything further, and
> we can rip out the special case I don't like. Example:
>

yes,

>
> CREATE TABLE EMPLOYEES
> (
> id integer,
> data XML
> );
> INSERT INTO EMPLOYEES
> VALUES (1, '<Employees>
> <Employee emplid="1111" type="admin">
> <firstname>John</firstname>
> <lastname>Watson</lastname>
> <age>30</age>
> <email>johnwatson(at)sh(dot)com</email>
> </Employee>
> <Employee emplid="2222" type="admin">
> <firstname>Sherlock</firstname>
> <lastname>Homes</lastname>
> <age>32</age>
> <email>sherlock(at)sh(dot)com</email>
> </Employee>
> <Employee emplid="3333" type="user">
> <firstname>Jim</firstname>
> <lastname>Moriarty</lastname>
> <age>52</age>
> <email>jim(at)sh(dot)com</email>
> </Employee>
> <Employee emplid="4444" type="user">
> <firstname>Mycroft</firstname>
> <lastname>Holmes</lastname>
> <age>41</age>
> <email>mycroft(at)sh(dot)com</email>
> </Employee>
> </Employees>');
>
> This is with COLUMNS omitted:
>
> alvherre=# select xmltable.* from employees,
> xmltable('/Employees/Employee' passing data);
> xmltable
> ──────────────────────────────────────────
> <Employee emplid="1111" type="admin"> ↵
> <firstname>John</firstname> ↵
> <lastname>Watson</lastname> ↵
> <age>30</age> ↵
> <email>johnwatson(at)sh(dot)com</email>↵
> </Employee>
> <Employee emplid="2222" type="admin"> ↵
> <firstname>Sherlock</firstname> ↵
> <lastname>Homes</lastname> ↵
> <age>32</age> ↵
> <email>sherlock(at)sh(dot)com</email> ↵
> </Employee>
> <Employee emplid="3333" type="user"> ↵
> <firstname>Jim</firstname> ↵
> <lastname>Moriarty</lastname> ↵
> <age>52</age> ↵
> <email>jim(at)sh(dot)com</email> ↵
> </Employee>
> <Employee emplid="4444" type="user"> ↵
> <firstname>Mycroft</firstname> ↵
> <lastname>Holmes</lastname> ↵
> <age>41</age> ↵
> <email>mycroft(at)sh(dot)com</email> ↵
> </Employee>
>
> and this is what you get with "xml PATH '.'" (I threw in ORDINALITY just
> for fun):
>
> alvherre=# select xmltable.* from employees,
> xmltable('/Employees/Employee' passing data columns row_number for
> ordinality, emp xml path '.');
> row_number │ emp
> ────────────┼──────────────────────────────────────────
> 1 │ <Employee emplid="1111" type="admin"> ↵
> │ <firstname>John</firstname> ↵
> │ <lastname>Watson</lastname> ↵
> │ <age>30</age> ↵
> │ <email>johnwatson(at)sh(dot)com</email>↵
> │ </Employee>
> 2 │ <Employee emplid="2222" type="admin"> ↵
> │ <firstname>Sherlock</firstname> ↵
> │ <lastname>Homes</lastname> ↵
> │ <age>32</age> ↵
> │ <email>sherlock(at)sh(dot)com</email> ↵
> │ </Employee>
> 3 │ <Employee emplid="3333" type="user"> ↵
> │ <firstname>Jim</firstname> ↵
> │ <lastname>Moriarty</lastname> ↵
> │ <age>52</age> ↵
> │ <email>jim(at)sh(dot)com</email> ↵
> │ </Employee>
> 4 │ <Employee emplid="4444" type="user"> ↵
> │ <firstname>Mycroft</firstname> ↵
> │ <lastname>Holmes</lastname> ↵
> │ <age>41</age> ↵
> │ <email>mycroft(at)sh(dot)com</email> ↵
> │ </Employee>
>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: patch: function xmltable
Date: 2017-03-05 09:32:18
Message-ID: CAFj8pRAjrX_dtdUdfH2DjEdsh3NZkXV3PLJTjUnyXKji9ChteQ@mail.gmail.com
Lists: pgsql-hackers

Hi

I used your idea about a special column when COLUMNS is not explicitly
defined.

All the lines that you dislike are removed.

Now almost all the code related to this behavior is in the next few lines.

+	/*
+	 * Use implicit column when it is necessary. The COLUMNS clause is optional
+	 * on Oracle and DB2. In this case a result is complete row of XML type.
+	 */
+	if (rtf->columns == NIL)
+	{
+		RangeTableFuncCol *fc = makeNode(RangeTableFuncCol);
+		A_Const    *n = makeNode(A_Const);
+
+		fc->colname = "xmltable";
+		fc->typeName = makeTypeNameFromOid(XMLOID, -1);
+		n->val.type = T_String;
+		n->val.val.str = ".";
+		n->location = -1;
+
+		fc->colexpr = (Node *) n;
+		rtf->columns = list_make1(fc);
+	}

All regression tests pass.
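
For readers without the PostgreSQL tree at hand, the fallback in the hunk above can be modeled in standalone C. This is only a sketch under stated assumptions: ColumnDef, ColumnList, add_implicit_column and column_is_whole_row are hypothetical names standing in for RangeTableFuncCol and the parser's column list, and nothing here is code from the patch. It shows the one rule discussed in this message: an empty column list gets a single implicit column named "xmltable" of type xml whose path is '.', the whole row node.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for RangeTableFuncCol: one output column. */
typedef struct ColumnDef
{
	const char *colname;	/* output column name */
	const char *typname;	/* type name, e.g. "xml" */
	const char *path;		/* column path filter; NULL means "use colname" */
} ColumnDef;

/* Hypothetical stand-in for the parser's list of columns. */
typedef struct ColumnList
{
	ColumnDef	cols[8];
	size_t		ncols;
} ColumnList;

/*
 * If no COLUMNS were given, add one implicit column named "xmltable" of
 * type xml with path ".", which selects the whole row node.
 * Returns 1 if the implicit column was added, 0 if the list was left alone.
 */
int
add_implicit_column(ColumnList *list)
{
	assert(list != NULL);

	if (list->ncols != 0)
		return 0;

	list->cols[0].colname = "xmltable";
	list->cols[0].typname = "xml";
	list->cols[0].path = ".";
	list->ncols = 1;
	return 1;
}

/* Returns 1 if the column asks for the whole row (path "."), else 0. */
int
column_is_whole_row(const ColumnDef *col)
{
	assert(col != NULL);
	return col->path != NULL && strcmp(col->path, ".") == 0;
}
```

Calling add_implicit_column() on an empty list yields one whole-row column; calling it on a list that already has columns changes nothing, so an explicit COLUMNS clause always takes precedence in this model.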

Regards

Pavel

Attachment Content-Type Size
xmltable-50.patch.gz application/x-gzip 36.5 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: patch: function xmltable
Date: 2017-03-08 16:01:01
Message-ID: 20170308160101.heisrxxwntwdat56@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:
> Hi
>
> I used your idea about special columns when COLUMNS are not explicitly
> defined.
>
> All lines that you are dislike removed.

I just pushed XMLTABLE, after some additional changes. Please test it
thoroughly and report any problems.

I didn't add the change you proposed here to keep COLUMNS optional;
instead, I just made COLUMNS mandatory. I think what you propose here
is not entirely out of the question, but you left out ruleutils.c
support for it, so I decided to leave it aside for now so that I could
get this patch off my plate once and for all. If you really want
that feature, you can submit another patch for it and discuss with the
RMT whether it belongs in PG10 or not.

Some changes I made:
* I added some pg_stat_statements support. It works fine for simple
tests, but deeper testing of it would be appreciated.

* I removed the "buildercxt" memory context. It seemed mostly
pointless, and I was disturbed by the MemoryContextResetOnly().
Per-value memory still uses the per-value memory context, but the rest
of the stuff is in the per-query context, which should be pretty much
the same.

* Desultory stylistic changes
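
The lifetime split described in the second item above (per-value memory in a short-lived context, everything else in the per-query context) can be pictured with a toy arena allocator. This is a rough sketch, not PostgreSQL's MemoryContext API: Arena, arena_alloc, arena_reset and process_values are hypothetical names, the sizes are arbitrary, and alignment is ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bump allocator standing in for a memory context. */
typedef struct Arena
{
	char		buf[1024];
	size_t		used;
} Arena;

/* Allocate n bytes from the arena; returns NULL when it is full. */
void *
arena_alloc(Arena *a, size_t n)
{
	void	   *p;

	assert(a != NULL);
	if (n > sizeof(a->buf) - a->used)
		return NULL;
	p = a->buf + a->used;
	a->used += n;
	return p;
}

/* Release everything allocated so far, like resetting a context. */
void
arena_reset(Arena *a)
{
	assert(a != NULL);
	a->used = 0;
}

/*
 * Process nvalues values.  Builder state is allocated once in the
 * per-query arena; scratch space for each value comes from the per-value
 * arena, which is reset after every value so it never accumulates.
 * Returns the number of values processed, or -1 on allocation failure.
 */
int
process_values(Arena *per_query, Arena *per_value, int nvalues)
{
	int			i;

	if (arena_alloc(per_query, 64) == NULL)		/* long-lived state */
		return -1;

	for (i = 0; i < nvalues; i++)
	{
		if (arena_alloc(per_value, 256) == NULL)	/* short-lived scratch */
			return -1;
		arena_reset(per_value);
	}
	return nvalues;
}
```

With a 1024-byte per-value arena, 100 values of 256 bytes each only fit because of the reset after every value; the per-query arena keeps its single 64-byte allocation until the caller discards it.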

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: patch: function xmltable
Date: 2017-03-08 16:10:53
Message-ID: CAFj8pRCnHE1ZzY+-BqzOr2Wsq0mwF3J_8GgWMNWFx176O=6Wzg@mail.gmail.com
Lists: pgsql-hackers

2017-03-08 17:01 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
> > Hi
> >
> > I used your idea about special columns when COLUMNS are not explicitly
> > defined.
> >
> > All lines that you are dislike removed.
>
> I just pushed XMLTABLE, after some additional changes. Please test it
> thoroughly and report any problems.
>

Thank you

>
> I didn't add the change you proposed here to keep COLUMNS optional;
> instead, I just made COLUMNS mandatory. I think what you propose here
> is not entirely out of the question, but you left out ruleutils.c
> support for it, so I decided to leave it aside for now so that I could
> get this patch out of my plate once and for all. If you really want
> that feature, you can submit another patch for it and discuss with the
> RMT whether it belongs in PG10 or not.
>

It is an interesting feature - because it replaces the XPATH function - but
not important enough.

For daily work the default schema support is much more interesting.

>
> Some changes I made:
> * I added some pg_stat_statements support. It works fine for simple
> tests, but deeper testing of it would be appreciated.
>
> * I removed the "buildercxt" memory context. It seemed mostly
> pointless, and I was disturbed by the MemoryContextResetOnly().
> Per-value memory still uses the per-value memory context, but the rest
> of the stuff is in the per-query context, which should be pretty much
> the same.
>
> * Desultory stylistic changes
>

ok

Regards

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: patch: function xmltable
Date: 2017-03-08 16:32:19
Message-ID: 20170308163219.lmysss6q4hpornf4@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:
> 2017-03-08 17:01 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> > I didn't add the change you proposed here to keep COLUMNS optional;
> > instead, I just made COLUMNS mandatory. I think what you propose here
> > is not entirely out of the question, but you left out ruleutils.c
> > support for it, so I decided to leave it aside for now so that I could
> > get this patch out of my plate once and for all. If you really want
> > that feature, you can submit another patch for it and discuss with the
> > RMT whether it belongs in PG10 or not.
>
> It is interesting feature - because it replaces XPATH function, but not
> important enough.

OK.

> For daily work the default schema support is much more interesting.

Let's see that one, then. It was part of the original submission so
depending on how the patch looks we can still cram it in. But other
patches have priority for me now.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: patch: function xmltable
Date: 2017-03-08 16:46:48
Message-ID: CAFj8pRAZ0jKPCu58SQZ4r1ExCAwhw4GW_y3NAouTmUpW6OFZtQ@mail.gmail.com
Lists: pgsql-hackers

2017-03-08 17:32 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> Pavel Stehule wrote:
> > 2017-03-08 17:01 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
> > > I didn't add the change you proposed here to keep COLUMNS optional;
> > > instead, I just made COLUMNS mandatory. I think what you propose here
> > > is not entirely out of the question, but you left out ruleutils.c
> > > support for it, so I decided to leave it aside for now so that I could
> > > get this patch out of my plate once and for all. If you really want
> > > that feature, you can submit another patch for it and discuss with the
> > > RMT whether it belongs in PG10 or not.
> >
> > It is interesting feature - because it replaces XPATH function, but not
> > important enough.
>
> OK.
>
> > For daily work the default schema support is much more interesting.
>
> Let's see that one, then. It was part of the original submission so
> depending on how the patch we looks can still cram it in. But other
> patches have priority for me now.
>

It is a topic for 11.

Thank you very much

Pavel

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: patch: function xmltable
Date: 2017-03-08 17:01:37
Message-ID: 20170308170137.ae4uwi2smwkeg26o@alvherre.pgsql
Lists: pgsql-hackers

Pavel Stehule wrote:
> 2017-03-08 17:32 GMT+01:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:

> > > For daily work the default schema support is much more interesting.
> >
> > Let's see that one, then. It was part of the original submission so
> > depending on how the patch we looks can still cram it in. But other
> > patches have priority for me now.
>
> It is theme for 11

Ah, great.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services