Quick Links

Re: Set query_id for query contained in utility statement

Lists:	Postg롤 토토SQL :

From:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Set query_id for query contained in utility statement
Date:	2024-08-02 08:21:53
Message-ID:	CAO6_XqqM6S9bQ2qd=75W+yKATwoazxSNhv5sjW06fjGAtHbTUA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi all,

Some utility statements like Explain, CreateTableAs and DeclareCursor
contain a query which will be planned and executed. During post parse,
only the top utility statement is jumbled, leaving the contained query
without a query_id set. Explain does the query jumble in ExplainQuery
but for the contained query but CreateTableAs and DeclareCursor were
left with unset query_id.

This leads to extensions relying on query_id like pg_stat_statements
to not be able to track those nested queries as the query_id was 0.

This patch fixes this by recursively jumbling queries contained in
those utility statements during post_parse, setting the query_id for
those contained queries and removing the need for ExplainQuery to do
it for the Explain statements.

Regards,
Anthonin Bonnefoy

Attachment	Content-Type	Size
v1-0001-Set-query_id-for-queries-contained-in-utility-sta.patch	application/octet-stream	17.3 KB

From:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-08-05 07:19:08
Message-ID:	CAO6_XqoXFCRMviF5QOgqnmQbgbUOKdWmPo9uNob_hzP3d+-GjQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I've realised my initial approach was wrong, calling the post parse
for all nested queries in analyze.c prevents extension like pgss to
correctly track the query's nesting level.

I've changed the approach to replicate what's done in ExplainQuery to
both CreateTableAs and DeclareCursor: Jumble the query contained by
the utility statement and call the post parse hook before it is
planned or executed. Additionally, explain's nested query can itself
be a CreateTableAs or DeclareCursor which also needs to be jumbled.
The issue is visible when explaining a CreateTableAs or DeclareCursor
Query, the queryId is missing despite the verbose.

EXPLAIN (verbose) create table test_t as select 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.01 rows=1 width=4)
Output: 1

Post patch, the query id is correctly displayed.

EXPLAIN (verbose) create table test_t as select 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.01 rows=1 width=4)
Output: 1
Query Identifier: 2800308901962295548

Regards,
Anthonin Bonnefoy

Attachment	Content-Type	Size
v2-0001-Set-query_id-for-queries-contained-in-utility-sta.patch	application/octet-stream	19.1 KB

From:	jian he <jian(dot)universality(at)gmail(dot)com>
To:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-08-26 03:26:00
Message-ID:	CACJufxF_XxHZ8h5RkJtsjSyNtqdxuQg04OTaoGg-RrBH8RSHRw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Aug 5, 2024 at 3:19 PM Anthonin Bonnefoy
<anthonin(dot)bonnefoy(at)datadoghq(dot)com> wrote:
>
> I've realised my initial approach was wrong, calling the post parse
> for all nested queries in analyze.c prevents extension like pgss to
> correctly track the query's nesting level.
>
> I've changed the approach to replicate what's done in ExplainQuery to
> both CreateTableAs and DeclareCursor: Jumble the query contained by
> the utility statement and call the post parse hook before it is
> planned or executed. Additionally, explain's nested query can itself
> be a CreateTableAs or DeclareCursor which also needs to be jumbled.
> The issue is visible when explaining a CreateTableAs or DeclareCursor
> Query, the queryId is missing despite the verbose.
>
> EXPLAIN (verbose) create table test_t as select 1;
> QUERY PLAN
> ------------------------------------------
> Result (cost=0.00..0.01 rows=1 width=4)
> Output: 1
>
> Post patch, the query id is correctly displayed.
>
> EXPLAIN (verbose) create table test_t as select 1;
> QUERY PLAN
> ------------------------------------------
> Result (cost=0.00..0.01 rows=1 width=4)
> Output: 1
> Query Identifier: 2800308901962295548
>

The output is what we expect.

now after applying your patch.
SELECT pg_stat_statements_reset();
EXPLAIN (verbose) create table test_t as select 1;
EXPLAIN (verbose) create table test_t as select 2;
SELECT queryid, left(query, 60),plans, calls, rows FROM
pg_stat_statements ORDER BY calls DESC LIMIT 5;

the output is:
queryid | left
| plans | calls | rows
----------------------+--------------------------------------------------------------+-------+-------+------
2800308901962295548 | EXPLAIN (verbose) create table test_t as
select 1; | 2 | 2 | 0
2093602470903273926 | EXPLAIN (verbose) create table test_t as
select $1 | 0 | 2 | 0
-2694081619397734273 | SELECT pg_stat_statements_reset()
| 0 | 1 | 1
2643570658797872815 | SELECT queryid, left(query, $1),plans, calls,
rows FROM pg_s | 1 | 0 | 0

"EXPLAIN (verbose) create table test_t as select 1;" called twice,
is that what we expect?

should first row, the normalized query becomes

EXPLAIN (verbose) create table test_t as select $1;

From:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-08-26 08:54:56
Message-ID:	CAO6_Xqo5J-DP4aRGUc5J6pu5sSTgjFdNHQ4sWz+NbZx5VGFPOg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

pg_stat_statements reports nested queries and toplevel queries
separately. You can check with the toplevel column.
2800308901962295548 is the nested ctas part while 2093602470903273926
is the top explain utility statement (which explain why there's 0
plans since it's an utility statement). Since the explain ctas query
was called twice, it will be reported as 2 toplevel queries and 2
nested queries.

> should first row, the normalized query becomes
> EXPLAIN (verbose) create table test_t as select $1;

Good point, the issue in this case was the nested query was stored by
pg_stat_statements during the ExecutorEnd hook. This hook doesn't have
the jstate and parseState at that point so pg_stat_statements can't
normalize the query.

I've modified the patch to fix this. Instead of just jumbling the
query in ExplainQuery, I've moved jumbling in ExplainOneUtility which
already has specific code to handle ctas and dcs. Calling the post
parse hook here allows pg_stat_statements to store the normalized
version of the query for this queryid and nesting level.

Attachment	Content-Type	Size
v3-0001-Set-query_id-for-queries-contained-in-utility-sta.patch	application/octet-stream	28.3 KB

From:	jian he <jian(dot)universality(at)gmail(dot)com>
To:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-08-27 09:12:00
Message-ID:	CACJufxGo5dMZ+=g3F_xje_C-kKe8RH_vjsTaPdZNv8P===jAvw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Aug 26, 2024 at 4:55 PM Anthonin Bonnefoy
<anthonin(dot)bonnefoy(at)datadoghq(dot)com> wrote:
>

/* Evaluate parameters, if any */
if (entry->plansource->num_params)
{
- ParseState *pstate;
-
- pstate = make_parsestate(NULL);
- pstate->p_sourcetext = queryString;

you deleted the above these lines, but passed (ParseState *pstate) in
ExplainExecuteQuery
how do you make sure ExplainExecuteQuery passed (ParseState *pstate)
the p_next_resno is 1 and p_resolve_unknowns is true.
maybe we can add some Asserts like in ExplainExecuteQuery

/* Evaluate parameters, if any */
if (entry->plansource->num_params)
{
Assert(pstate->p_next_resno == 1);
Assert(pstate->p_resolve_unknowns == 1);
}

also it's ok to use passed (ParseState *pstate) for
{
estate = CreateExecutorState();
estate->es_param_list_info = params;
paramLI = EvaluateParams(pstate, entry, execstmt->params, estate);
}
?
I really don't know.

some of the change is refactoring, maybe you can put it into a separate patch.

From:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-08-30 07:37:03
Message-ID:	CAO6_XqoXPyoChZmjaWbcKt0B8jAo_0hc4+jv-ej-ZT4vGz4WRg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Aug 27, 2024 at 11:14 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> also it's ok to use passed (ParseState *pstate) for
> {
> estate = CreateExecutorState();
> estate->es_param_list_info = params;
> paramLI = EvaluateParams(pstate, entry, execstmt->params, estate);
> }
> ?
> I really don't know.
>
> some of the change is refactoring, maybe you can put it into a separate patch.

Thanks for the review. I think the parser state is mostly used for the
error callbacks and parser_errposition but I'm not 100% sure. Either
way, you're right and it probably shouldn't be in the patch. I've
modified the patch to restrict the changes to only add the necessary
query jumble and post parse hook calls.

Attachment	Content-Type	Size
v4-0001-Set-query_id-for-queries-contained-in-utility-sta.patch	application/octet-stream	28.4 KB

From:	jian he <jian(dot)universality(at)gmail(dot)com>
To:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-09-30 15:14:01
Message-ID:	CACJufxGQSpJwa4tDOx5pDy3879=-2sqHEFeWD5_a8DcCaJYuXA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

PREPARE test_prepare_pgss1(int, int) AS select generate_series($1, $2);
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
EXECUTE test_prepare_pgss1 (1,2);
EXECUTE test_prepare_pgss1 (1,3);
SELECT toplevel, calls, query, queryid, rows FROM pg_stat_statements
ORDER BY query COLLATE "C", toplevel;
SELECT pg_stat_statements_reset() IS NOT NULL AS t;

---the above works just fine. just for demo purpose

explain(verbose) EXECUTE test_prepare_pgss1(1, 2);
explain(verbose) EXECUTE test_prepare_pgss1(1, 3);

SELECT toplevel, calls, query, queryid, rows FROM pg_stat_statements
ORDER BY query COLLATE "C", toplevel;
toplevel | calls | query
| queryid | rows
----------+-------+------------------------------------------------------------------------+----------------------+------
f | 2 | PREPARE test_prepare_pgss1(int, int) AS select
generate_series($1, $2) | -3421048434214482065 | 0
t | 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
| 3366652201587963567 | 1
t | 0 | SELECT toplevel, calls, query, queryid, rows FROM
pg_stat_statements +| -6410939316132384446 | 0
| | ORDER BY query COLLATE "C", toplevel
| |
t | 1 | explain(verbose) EXECUTE test_prepare_pgss1(1, 2)
| 7618807962395633001 | 0
t | 1 | explain(verbose) EXECUTE test_prepare_pgss1(1, 3)
| -2281958002956676857 | 0

Is it possible to normalize top level utilities explain query, make
these two have the same queryid?
explain(verbose) EXECUTE test_prepare_pgss1(1, 2);
explain(verbose) EXECUTE test_prepare_pgss1(1, 3);

I guess this is a corner case.

From:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-01 07:26:40
Message-ID:	CAO6_XqqyNK7U-TtJV4hny5qCbCUY_e_4iOgT45fU7ffbwrpzFg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Sep 30, 2024 at 5:14 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> Is it possible to normalize top level utilities explain query, make
> these two have the same queryid?
> explain(verbose) EXECUTE test_prepare_pgss1(1, 2);
> explain(verbose) EXECUTE test_prepare_pgss1(1, 3);
>
> I guess this is a corner case.

This seems to be a known issue. The test_prepare_pgss1's parameters
are A_Const nodes. Those nodes have a custom query jumble which
doesn't record location[1] and thus can't be normalised by pgss.

That could be fixed by replacing the switch on nodeTag by
JUMBLE_LOCATION(location) but this will impact a lot of DDL queries
and the result doesn't look great (for example, "BEGIN TRANSACTION NOT
DEFERRABLE, READ ONLY, READ WRITE, DEFERRABLE" would be normalised as
"BEGIN TRANSACTION $1 DEFERRABLE, $2 ONLY, $3 WRITE, $4")

Looking at the commit for the A_Const's jumble, this is mentioned by Michael[2]:
> (FWIW, I'd like to think that there is an argument to normalize the
> A_Const nodes for a portion of the DDL queries, by ignoring their
> values in the query jumbling and mark a location, which would be
> really useful for some workloads, but that's a separate discussion I
> am keeping for later.)

I haven't found any recent discussion but this should live in a
different thread as this is a separate issue.

[1]: https://github.com/postgres/postgres/blob/cf4401fe6cf56811343edcad29c96086c2c66481/src/backend/nodes/queryjumblefuncs.c#L323-L355
[2]: /message-id/Y9+HuYslMAP6yyPb@paquier.xyz

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc:	jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-01 07:35:59
Message-ID:	ZvumX_yAwuXTkWBm@paquier.xyz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Oct 01, 2024 at 09:26:40AM +0200, Anthonin Bonnefoy wrote:
> This seems to be a known issue. The test_prepare_pgss1's parameters
> are A_Const nodes. Those nodes have a custom query jumble which
> doesn't record location[1] and thus can't be normalised by pgss.
>
> That could be fixed by replacing the switch on nodeTag by
> JUMBLE_LOCATION(location) but this will impact a lot of DDL queries
> and the result doesn't look great (for example, "BEGIN TRANSACTION NOT
> DEFERRABLE, READ ONLY, READ WRITE, DEFERRABLE" would be normalised as
> "BEGIN TRANSACTION $1 DEFERRABLE, $2 ONLY, $3 WRITE, $4")

Yeah, I've made peace with myself regarding the fact that adding a
location tracker to A_Const is just a very bad idea and that we should
never do that, ever. So what I proposed on this thread is a no-go. I
am not saying that there is no solution, just that this solution is a
bad one.

You can see the extent of the damages this would cause with the diffs
generated in pg_stat_statements. If there is a gap in the tests for
commands using A_Const nodes compared to what's on HEAD, we should
expand that to properly track how normalization would apply to each
one of them (see the recent one for SET queries, for example). This
is really useful when manipulating the parse nodes and query jumbling
with some pg_node_attr().
--
Michael

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc:	jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-02 04:38:43
Message-ID:	ZvzOU9BbvPCpzgec@paquier.xyz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Aug 30, 2024 at 09:37:03AM +0200, Anthonin Bonnefoy wrote:
> Thanks for the review. I think the parser state is mostly used for the
> error callbacks and parser_errposition but I'm not 100% sure. Either
> way, you're right and it probably shouldn't be in the patch. I've
> modified the patch to restrict the changes to only add the necessary
> query jumble and post parse hook calls.

So this adds four calls post_parse_analyze_hook, leading to more data
added to pgss for non-toplevel queries: one in createas.c for the CTAS
internal query, one in portalcmds.c for the inner query of DECLARE,
and two for utilities in EXPLAIN.

This is a rather old problem, trying to bring more consistency across
the board, and it comes down to this bit with EXPLAIN, that can be
seen on HEAD:
SET pg_stat_statements.track = 'all';
explain (costs off) select 1;

FWIW, I've always found this case with EXPLAIN with two entries
confusing, so what's the advantage in trying to apply this rule for
the rest? We know that EXPLAIN, DECLARE and CTAS run a query attached
to their DDL, hence isn't it sufficient to register a single entry for
the top-level query, then nothing for the internal one. The
documentation tells about inner queries when pg_stat_statements.track
= all, like the ones in PL functions, DO blocks, because we have a
clear view of the query string, creating a unique one-one mapping
between a query string and its ID. This patch maps the same query
string to more than one query ID, spreading that.

So it seems that there are arguments for not doing what this patch
proposes, but also make sure that EXPLAIN logs a single entry, not
two currently when using pg_stat_statements.track = all.

Side note. It looks like the patch is forgetting about CREATE VIEW
and CREATE MATERIALIZED VIEW, creating only a top-level entry when
running these utilities.
--
Michael

From:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-04 09:05:31
Message-ID:	CAO6_XqrA+c3QhxGEOTRMpTD2ThEd3DgtTNKDrMaDkqkjUcrYDg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Oct 2, 2024 at 6:39 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> FWIW, I've always found this case with EXPLAIN with two entries
> confusing, so what's the advantage in trying to apply this rule for
> the rest? We know that EXPLAIN, DECLARE and CTAS run a query attached
> to their DDL, hence isn't it sufficient to register a single entry for
> the top-level query, then nothing for the internal one. The
> documentation tells about inner queries when pg_stat_statements.track
> = all, like the ones in PL functions, DO blocks, because we have a
> clear view of the query string, creating a unique one-one mapping
> between a query string and its ID. This patch maps the same query
> string to more than one query ID, spreading that.
>
> So it seems that there are arguments for not doing what this patch
> proposes, but also make sure that EXPLAIN logs a single entry, not
> two currently when using pg_stat_statements.track = all.

I agree that tracking 2 identical statements with different queryIds
and nesting levels is very confusing. On the other hand, from an
extension developer point of view (not necessarily limited to
pg_stat_statements), I would like to have the queryId available and
the post_parse hook called so the query can be normalised and tracked
in a hashmap.

However, the repeated statements did bug me a lot so I took a stab at
trying to find a possible solution. I made an attempt in 0001 by
tracking the statements' locations of explainable statements (Select,
Insert, Update, Merge, Delete...) during parse and propagate them in
the generated Query during transform. With the change, we now have the
following result:

SET pg_stat_statements.track = 'all';
explain (costs off) select 1;
select 1;
select queryid, calls, query, toplevel from pg_stat_statements
where query ~'select \$1';
queryid | calls | query | toplevel
---------------------+-------+-------------------------------+----------
2800308901962295548 | 1 | select $1 | t
2800308901962295548 | 1 | select $1; | f
3797185112479763582 | 1 | explain (costs off) select $1 | t

The top level and nested select statement now share both the same
queryId and query string. The additional ';' for the nested query is
due to not having the statement length and taking the whole
statement.

> Side note. It looks like the patch is forgetting about CREATE VIEW
> and CREATE MATERIALIZED VIEW, creating only a top-level entry when
> running these utilities.

I've updated 0002 to handle CREATE MATERIALIZED VIEW, CREATE VIEW
doesn't generate nested statements. I've also realised that refresh
materialized view has a similar problem to explain. The first refresh
called when the matview is created will have the correct select
substring. Subsequent refresh call will use the refresh utility
statement as the query string:

-- Create the view
CREATE MATERIALIZED VIEW test AS SELECT * FROM generate_series(1, 1000) as id;
-- Reset pgss and refresh
select * from pg_stat_statements_reset();
REFRESH MATERIALIZED VIEW test;
select queryid, calls, query, toplevel from pg_stat_statements;
queryid | calls | query
| toplevel
----------------------+-------+------------------------------------------+----------
8227191975572355654 | 1 | REFRESH MATERIALIZED VIEW test | t
-2908407163726309935 | 1 | REFRESH MATERIALIZED VIEW test; | f
-1361326859153559975 | 1 | select * from pg_stat_statements_reset() | t

I've tried to improve this behaviour in 0003 where the view's
definition is used as query string instead of the refresh utility
statement.

Attachment	Content-Type	Size
v5-0001-Track-location-to-extract-relevant-part-in-nested.patch	application/octet-stream	18.8 KB
v5-0003-Use-view-s-definition-as-query-string-on-a-materi.patch	application/octet-stream	8.2 KB
v5-0002-Set-query_id-for-queries-contained-in-utility-sta.patch	application/octet-stream	33.1 KB

From:	jian he <jian(dot)universality(at)gmail(dot)com>
To:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-04 12:16:00
Message-ID:	CACJufxF9hqyfmKEdpiG=PbrGdKVNP2BQjHFJh4q6639sV7NmvQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Oct 4, 2024 at 5:05 PM Anthonin Bonnefoy
<anthonin(dot)bonnefoy(at)datadoghq(dot)com> wrote:
>
> I agree that tracking 2 identical statements with different queryIds
> and nesting levels is very confusing. On the other hand, from an
> extension developer point of view (not necessarily limited to
> pg_stat_statements), I would like to have the queryId available and
> the post_parse hook called so the query can be normalised and tracked
> in a hashmap.
>
> However, the repeated statements did bug me a lot so I took a stab at
> trying to find a possible solution. I made an attempt in 0001 by
> tracking the statements' locations of explainable statements (Select,
> Insert, Update, Merge, Delete...) during parse and propagate them in
> the generated Query during transform. With the change, we now have the
> following result:
>
> SET pg_stat_statements.track = 'all';
> explain (costs off) select 1;
> select 1;
> select queryid, calls, query, toplevel from pg_stat_statements
> where query ~'select \$1';
> queryid | calls | query | toplevel
> ---------------------+-------+-------------------------------+----------
> 2800308901962295548 | 1 | select $1 | t
> 2800308901962295548 | 1 | select $1; | f
> 3797185112479763582 | 1 | explain (costs off) select $1 | t
>
> The top level and nested select statement now share both the same
> queryId and query string. The additional ';' for the nested query is
> due to not having the statement length and taking the whole
> statement.
>

about v5 0001
select_with_parens:
'(' select_no_parens ')' { $$ = $2; }
| '(' select_with_parens ')' { $$ = $2; }
;

query "select $1);" looks weird. not sure how to improve it,
or this should be the expected behavior?

in gram.y
| values_clause { $$ = $1; }
| TABLE relation_expr
for TABLE relation_expr
we can add `n->location = @1;`

for values_clause we can do also,
then in transformValuesClause do the same as in transformSelectStmt.

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-07 05:39:36
Message-ID:	ZwN0GFIU6E8hUa9t@paquier.xyz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Oct 04, 2024 at 08:16:00PM +0800, jian he wrote:
> about v5 0001
> select_with_parens:
> '(' select_no_parens ')' { $$ = $2; }
> | '(' select_with_parens ')' { $$ = $2; }
> ;
>
>
> toplevel | calls | query
> ----------+-------+-------------------------------------------------------
> t | 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
> t | 0 | SELECT toplevel, calls, query FROM pg_stat_statements+
> | | ORDER BY query COLLATE "C", toplevel
> t | 1 | explain (select $1)
> f | 1 | select $1);
>
> query "select $1);" looks weird. not sure how to improve it,
> or this should be the expected behavior?

GOod point, this is confusing. The point is that having only
stmt_location is not enough to detect where the element in the query
you want to track is because it only points at its start location in
the full query string. In an ideal world, what we should have is its
start and end, pass it down to pgss_store(), and store only this
subquery between the start and end positions in the stats entry.
Making that right through the parser may be challenging, though.

This concept is something that's perhaps larger than this thread? I
think that we want the same kind of thing for values in IN() and ANY()
clauses, where we want to track an area for a single normalization
parameter, perhaps with a separate node_attr. I am not sure if using
the same trackers would make sense, so I am just waving hands a bit
here, but the concepts required are quite close.

Saying that, a patch set implemented this way would ensure a strict
1:1 mapping between a query ID and the internal query in these EXPLAIN
and CREATE commands, which would be good.

The first step should be IMO to expand the tests of pgss and track all
the behaviors we have historically in the tree about all that. Then,
it becomes much easier to track how much we want to tweak them
depending on if pgss.track is set to "top" or "all", and easier to see
how a behavior changes when manipulating the parse node structures
with location data.
--
Michael

From:	jian he <jian(dot)universality(at)gmail(dot)com>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-07 16:17:11
Message-ID:	CACJufxEXSfk4o2jHDhf50fOY6WC+dFQke2gmpcz+EHVUsmEptg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Oct 7, 2024 at 1:39 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Fri, Oct 04, 2024 at 08:16:00PM +0800, jian he wrote:
> > about v5 0001
> > select_with_parens:
> > '(' select_no_parens ')' { $$ = $2; }
> > | '(' select_with_parens ')' { $$ = $2; }
> > ;
> >
> >
> > toplevel | calls | query
> > ----------+-------+-------------------------------------------------------
> > t | 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
> > t | 0 | SELECT toplevel, calls, query FROM pg_stat_statements+
> > | | ORDER BY query COLLATE "C", toplevel
> > t | 1 | explain (select $1)
> > f | 1 | select $1);
> >
> > query "select $1);" looks weird. not sure how to improve it,
> > or this should be the expected behavior?
>
> GOod point, this is confusing. The point is that having only
> stmt_location is not enough to detect where the element in the query
> you want to track is because it only points at its start location in
> the full query string. In an ideal world, what we should have is its
> start and end, pass it down to pgss_store(), and store only this
> subquery between the start and end positions in the stats entry.
> Making that right through the parser may be challenging, though.
>

turns out UPDATE/DELETE/MERGE and other utilities stmt cannot have
arbitrary parenthesis with EXPLAIN.

attached patches can solve this specific problem.
(based on v5-0001-Track-location-to-extract-relevant-part-in-nested.patch)

the main gotcha is to add location information for the statement that
is being explained.
typedef struct ExplainStmt
{
NodeTag type;
Node *query; /* the query (see comments above) */
List *options; /* list of DefElem nodes */
ParseLoc location; /* location of the statement being explained */
} ExplainStmt;

explain select 1;
explain (select 1);
explain (((select 1)));

the above 3 explained select queries will be normalized to one select query.

Attachment	Content-Type	Size
v5-0001-exposse_explained_stmt_location.no-cfbot	application/octet-stream	2.0 KB

From:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-08 08:35:08
Message-ID:	CAO6_Xqrjr_1Ss0bRe5VFm6OsUwX2nuN_VhbhYj0LFP3acoaaWw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Oct 7, 2024 at 7:39 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> GOod point, this is confusing. The point is that having only
> stmt_location is not enough to detect where the element in the query
> you want to track is because it only points at its start location in
> the full query string. In an ideal world, what we should have is its
> start and end, pass it down to pgss_store(), and store only this
> subquery between the start and end positions in the stats entry.
> Making that right through the parser may be challenging, though.

One of the issues is that we don't track the length in the parser,
only location[1]. The only place we can have some information about
the statement length (or at least, the location of the ';') is for
multi statement query.

On Mon, Oct 7, 2024 at 6:17 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> turns out UPDATE/DELETE/MERGE and other utilities stmt cannot have
> arbitrary parenthesis with EXPLAIN.

Yes, it is also possible to get the length of the Select statement
within parenthesis through the parser by using the location of ')' for
the select_no_parens.

> the main gotcha is to add location information for the statement that
> is being explained.

I've found that there are other possible issues with not having the
statement length and including the opening parenthesis won't be
enough. On HEAD, we have the following:

explain(verbose) SELECT 1, 2, 3\; explain SELECT 1, 2, 3, 4;
SELECT toplevel, query FROM pg_stat_statements
ORDER BY toplevel desc, query;
toplevel | query
----------+-----------------------------------------------------------------
t | SELECT pg_stat_statements_reset() IS NOT NULL AS t
t | explain SELECT $1, $2, $3, $4
t | explain(verbose) SELECT $1, $2, $3
f | explain(verbose) SELECT $1, $2, $3; explain SELECT 1, 2, 3, 4;
f | explain(verbose) SELECT 1, 2, 3; explain SELECT $1, $2, $3, $4;

The nested statement will have the whole query string. To fix this, we
need to propagate the statement length from the RawStmt (probably
using the ParserState?) and adjust the nested query's location and
length when the statement is transformed. I'm still working on the
details and edge cases on this.

[1]: https://github.com/postgres/postgres/blob/REL_17_STABLE/src/backend/parser/gram.y#L69-L79

From:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-09 08:49:40
Message-ID:	CAO6_XqoPW6Kt9o_QHa92oUZhH3C59D7vXmOJNb2VzfoBqUHTYA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Here is a new version of the patchset.

0001: Add tests to cover the current behaviour: Missing nested
statements for CreateTableAs, DeclareCursor and MaterializedViews,
nested statements reported by explain including the whole string
(multi statement or whole utility statement). I've tried to be
exhaustive, testing both all and top tracking, but I may have missed
some possible corner cases.

0002: Track the location of explainable statements. We keep RawStmt's
location and length in the ParseState and use it to compute the
transformed statement's length, this is done to handle the
multi-statement query issue. For SelectStmt, we also track the
statement length when select is inside parentheses and use it when
available.
For with clauses, I've switched to directly getting the correct
location from the parser with the 'if (@$ < 0) @$ = @2;' trick. Select
is handled differently and the location is updated in
insertSelectOptions.
Tracking the statement length as the benefit of having consistent
query string between the top and nested statement. A 'Select 1' should
be reported with the same string, without the ';' in both cases.

0003: Add query jumble and post_parse calls CreateTableAs,
DeclareCursor and MaterializedViews so they would report the nested
statements, like what explain is doing.

0004: On a materialized view refresh, fetch the view definition to
report as the executed statement. By default, it would report the
Refresh utility statement while the nested statement would be the
select statement.

0005: Report the nested query for Prepare statement. When executing a
prepared query, the whole Prepare utility statement is reported for
the nested query. This patch reuses the statement's location and
ParseState's length to extract the relevant part of the query string.

Something that's not clear to me, I've added location to SelectStmt,
InsertStmt, DeleteStmt, UpdateStmt and MergeStmt in parsenodes.h.
Should the location be tagged as query_jumble_ignore? Looking at other
nodes, it doesn't seem consistent whether the location has this tag or
not.

Attachment	Content-Type	Size
v6-0004-Use-view-s-definition-as-query-string-on-a-materi.patch	application/octet-stream	6.1 KB
v6-0002-Track-location-to-extract-relevant-part-in-nested.patch	application/octet-stream	21.1 KB
v6-0001-Add-tests-covering-pgss-nested-queries.patch	application/octet-stream	47.0 KB
v6-0003-Set-query_id-for-queries-contained-in-utility-sta.patch	application/octet-stream	22.4 KB
v6-0005-Extract-nested-query-from-PrepareStmt.patch	application/octet-stream	8.8 KB

From:	jian he <jian(dot)universality(at)gmail(dot)com>
To:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-11 00:38:00
Message-ID:	CACJufxF4dq8AT2crudQBrjvhTzp42BoEiGh+KT7yZUhHDTac9A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Oct 9, 2024 at 4:49 PM Anthonin Bonnefoy
<anthonin(dot)bonnefoy(at)datadoghq(dot)com> wrote:
>
> Here is a new version of the patchset.
>
> 0001: Add tests to cover the current behaviour: Missing nested
> statements for CreateTableAs, DeclareCursor and MaterializedViews,
> nested statements reported by explain including the whole string
> (multi statement or whole utility statement). I've tried to be
> exhaustive, testing both all and top tracking, but I may have missed
> some possible corner cases.
>
> 0002: Track the location of explainable statements. We keep RawStmt's
> location and length in the ParseState and use it to compute the
> transformed statement's length, this is done to handle the
> multi-statement query issue. For SelectStmt, we also track the
> statement length when select is inside parentheses and use it when
> available.
> For with clauses, I've switched to directly getting the correct
> location from the parser with the 'if (@$ < 0) @$ = @2;' trick. Select
> is handled differently and the location is updated in
> insertSelectOptions.
> Tracking the statement length as the benefit of having consistent
> query string between the top and nested statement. A 'Select 1' should
> be reported with the same string, without the ';' in both cases.
>

hi.

Query *
transformTopLevelStmt(ParseState *pstate, RawStmt *parseTree)
{
Query *result;
/* We're at top level, so allow SELECT INTO */
result = transformOptionalSelectInto(pstate, parseTree->stmt);
result->stmt_location = parseTree->stmt_location;
result->stmt_len = parseTree->stmt_len;
return result;
}
function call chain:
transformTopLevelStmt transformOptionalSelectInto transformStmt
transformSelectStmt

in transformSelectStmt we do
makeNode(Query), assign Query's stmt_len, stmt_location value.
if in transformSelectStmt we did it wrong, then
transformTopLevelStmt

result->stmt_location = parseTree->stmt_location;
result->stmt_len = parseTree->stmt_len;

will override values we've previously assigned at transformSelectStmt.

this feel not safe?

+qry->stmt_len = pstate->p_stmt_len - (stmt->location -
pstate->p_stmt_location);
i feel like, the comments didn't explain very well the difference between
stmt->location and pstate->p_stmt_location.
i know it's related to multi-statement, possibly through \;

SELECT pg_stat_statements_reset() IS NOT NULL AS t;
explain (values (1));
explain ((values (1)));
explain table tenk1;
explain ((table tenk1));
SELECT toplevel, calls, query FROM pg_stat_statements ORDER BY query
COLLATE "C", toplevel;
final output:

I already mentioned this at the end of [1].
Can you try to also normalize these cases, since we've normalized the
nested select query in explain statement.

[1] /message-id/CACJufxF9hqyfmKEdpiG%3DPbrGdKVNP2BQjHFJh4q6639sV7NmvQ%40mail.gmail.com

From:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-14 09:20:13
Message-ID:	CAO6_Xqr-_4bupGHyVnHFNcqCw6EWnsMN5Q7ogNsy_YR+743ZfA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On Fri, Oct 11, 2024 at 2:39 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> in transformSelectStmt we do
> makeNode(Query), assign Query's stmt_len, stmt_location value.
> if in transformSelectStmt we did it wrong, then
> transformTopLevelStmt
>
> result->stmt_location = parseTree->stmt_location;
> result->stmt_len = parseTree->stmt_len;
>
> will override values we've previously assigned at transformSelectStmt.
>
> this feel not safe?

Good point. There are multiple spots in the call tree where the
location/length was set which is definitely confusing. I've updated
the patch to always set the location/length within the transformStmt
calls, near the creation of the query nodes.
This means that transformSelectStmt was only doing a call
transformOptionalSelectInto and was mostly unnecessary. I've replaced
transformSelectStmt calls by direct calls to
transformOptionalSelectInto.

> +qry->stmt_len = pstate->p_stmt_len - (stmt->location -
> pstate->p_stmt_location);
> i feel like, the comments didn't explain very well the difference between
> stmt->location and pstate->p_stmt_location.
> i know it's related to multi-statement, possibly through \;

I've added more details to the comments.

> SELECT pg_stat_statements_reset() IS NOT NULL AS t;
> explain (values (1));
> explain ((values (1)));
> explain table tenk1;
> explain ((table tenk1));
> SELECT toplevel, calls, query FROM pg_stat_statements ORDER BY query
> COLLATE "C", toplevel;
> final output:
>
> toplevel | calls | query
> ----------+-------+--------------------------------------------------------------------------------------------
> t | 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
> t | 0 | SELECT toplevel, calls, query FROM
> pg_stat_statements ORDER BY query COLLATE "C", toplevel
> t | 2 | explain (values ($1))
> f | 2 | explain (values ($1));
> f | 2 | explain table tenk1
> t | 2 | explain table tenk1
>
> I already mentioned this at the end of [1].
> Can you try to also normalize these cases, since we've normalized the
> nested select query in explain statement.

I've missed that, thanks for the reminder. Which made me realise that
the TABLE command was also not handled. I've added both TABLE and
VALUES in the tests and they should now report correctly when nested.

Attachment	Content-Type	Size
v7-0001-Add-tests-covering-pgss-nested-queries.patch	application/octet-stream	50.9 KB
v7-0003-Set-query_id-for-queries-contained-in-utility-sta.patch	application/octet-stream	22.5 KB
v7-0004-Use-view-s-definition-as-query-string-on-a-materi.patch	application/octet-stream	6.1 KB
v7-0002-Track-location-to-extract-relevant-part-in-nested.patch	application/octet-stream	34.4 KB

From:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-15 08:11:40
Message-ID:	CAO6_XqqMYOxJmHJWCKjP44T9AsW0MmKV87XUYCP3R9JZvYcVaw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I've found that COPY wasn't correctly managed. "COPY (SELECT 1) to
stdout;" nested statement would be tracked as "SELECT $1) to stdout".
Since COPY accepts a PreparableStmt, stmt_len for all types of
PreparableStmt needs to be tracked. This is done through
updatePreparableStmtEnd in CopyStmt since we know the location of the
closing parenthesis. The logic to set Query->stmt_len for statements
that have a possible stmt_len has also been moved to a dedicated
setQueryStmtLen function.

I've updated the patchset with additional tests for COPY in 0001. 0002
includes the necessary modifications to handle COPY.

Attachment	Content-Type	Size
v8-0002-Track-location-to-extract-relevant-part-in-nested.patch	application/octet-stream	39.7 KB
v8-0003-Set-query_id-for-queries-contained-in-utility-sta.patch	application/octet-stream	22.5 KB
v8-0004-Use-view-s-definition-as-query-string-on-a-materi.patch	application/octet-stream	6.1 KB
v8-0001-Add-tests-covering-pgss-nested-queries.patch	application/octet-stream	56.2 KB

From:	jian he <jian(dot)universality(at)gmail(dot)com>
To:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-15 13:22:53
Message-ID:	CACJufxEQoSsU_BN8mzHeKckP5PUOKRxvDG419-wc3w9a68BNaQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

hi.

explain(verbose) SELECT 1, 2, 3\; explain SELECT 1, 2, 3, 4;
will transformed to
explain(verbose) SELECT 1, 2, 3; explain SELECT 1, 2, 3, 4;

it seems to me your patch care about following position.
explain(verbose) SELECT 1, 2, 3; explain SELECT 1, 2, 3, 4;
^

but this patch [1] at another thread will get the top level statement
(passed the raw parse, no syntax error) beginning more effortless.
explain(verbose) SELECT 1, 2, 3; explain SELECT 1, 2, 3, 4;
^ ^

can you try to looking at [1]. it may help to resolve this patch problem.

[1] /message-id/2245576.1728418678%40sss.pgh.pa.us

From:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-16 07:12:52
Message-ID:	CAO6_Xqpmjqp3z7-k1bUiLkdPvL3L-zwWM5t-0-VkvCxawYZxeQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Oct 15, 2024 at 3:23 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> explain(verbose) SELECT 1, 2, 3\; explain SELECT 1, 2, 3, 4;
> will transformed to
> explain(verbose) SELECT 1, 2, 3; explain SELECT 1, 2, 3, 4;
>
> it seems to me your patch care about following position.
> explain(verbose) SELECT 1, 2, 3; explain SELECT 1, 2, 3, 4;
> ^
> but this patch [1] at another thread will get the top level statement
> (passed the raw parse, no syntax error) beginning more effortless.
> explain(verbose) SELECT 1, 2, 3; explain SELECT 1, 2, 3, 4;
> ^ ^
>
> can you try to looking at [1]. it may help to resolve this patch problem.
>
> [1] /message-id/2245576.1728418678%40sss.pgh.pa.us

The top level statement beginning doesn't have an impact on this
patch. The patch's focus is on the nested statement and RawStmt's
location and length are only used to get the nested statement length.
I will need the nested statement's location and length no matter what,
to handle cases like "COPY (UPDATE ...) to stdout;" and only report
the statement within parentheses.

The linked patch will allow a simplification: the "if (@$ < 0) @$ =
@2;" I've added won't be needed anymore. But I think that's the only
possible simplification? I've run the tests on top of the linked patch
and there was no change in the regression output.

From:	jian he <jian(dot)universality(at)gmail(dot)com>
To:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-17 03:58:48
Message-ID:	CACJufxFmgOMxhMpMPzSQH3cd6MJ+NFXK+4OgzqFVZQPF_7G4Ew@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

hi. Anthonin
please check attached v9-0001, v9-0002, v9-003.

v9-0001-Better-error-reporting-from-extension-scripts-Was.patch
same as v4-0001-Improve-parser-s-reporting-of-statement-start-loc.patch in [1]

v9-0002-Add-tests-covering-pgss-nested-queries.patch same as
v8-0001-Add-tests-covering-pgss-nested-queries.patch in [2]
which is your work.

v9-0003-Track-location-in-nested-explain-statement.patch
is the main change I made based on your patch.

in [3] I mentioned adding "ParseLoc location" to ExplainStmt, then you
found some problems at [4] with multi statements,
now I found a way to resolve it.
I also add "ParseLoc location;" to typedef struct CopyStmt.
copy (select 1) to stdout;
I tested my change can tracking
beginning location and length of the nested query ("select 1")

I didn't touch other nested queries cases yet, so far only ExplainStmt
and CopyStmt1
IMHO, it's more neat than your patches.
Can you give it a try?

[1] /message-id/2245576.1728418678%40sss.pgh.pa.us
[2] /message-id/CAO6_XqqMYOxJmHJWCKjP44T9AsW0MmKV87XUYCP3R9JZvYcVaw%40mail.gmail.com
[3] /message-id/CACJufxEXSfk4o2jHDhf50fOY6WC%2BdFQke2gmpcz%2BEHVUsmEptg%40mail.gmail.com
[4] /message-id/CAO6_Xqrjr_1Ss0bRe5VFm6OsUwX2nuN_VhbhYj0LFP3acoaaWw%40mail.gmail.com

--------------------------------------------
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
explain(verbose) SELECT 1, 2, 3;
explain(verbose) (SELECT 1, 2, 3);
SELECT toplevel, calls, query FROM pg_stat_statements ORDER BY query
COLLATE "C", toplevel;
will have 2 calls for "SELECT $1, $2, $3"

SELECT pg_stat_statements_reset() IS NOT NULL AS t;
explain(verbose) (SELECT 1, 2, 3);
explain(verbose) SELECT 1, 2, 3;
SELECT toplevel, calls, query FROM pg_stat_statements ORDER BY query
COLLATE "C", toplevel;
will have 2 calls for " (SELECT $1, $2, $3)"
I think that's fine.

Attachment	Content-Type	Size
v9-0001-Better-error-reporting-from-extension-scripts-Was.patch	application/x-patch	7.6 KB
v9-0003-Track-location-in-nested-explain-statement.patch	application/x-patch	19.0 KB
v9-0002-Add-tests-covering-pgss-nested-queries.patch	application/x-patch	56.2 KB

From:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-17 07:21:11
Message-ID:	CAO6_Xqq5V1z+R5pacB_JYeNR0bLP5T47aBDhrZbPR06FCLFghw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Jian,

On Thu, Oct 17, 2024 at 5:59 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> in [3] I mentioned adding "ParseLoc location" to ExplainStmt, then you
> found some problems at [4] with multi statements,
> now I found a way to resolve it.
> I also add "ParseLoc location;" to typedef struct CopyStmt.
> copy (select 1) to stdout;
> I tested my change can tracking
> beginning location and length of the nested query ("select 1")
>
> I didn't touch other nested queries cases yet, so far only ExplainStmt
> and CopyStmt1
> IMHO, it's more neat than your patches.
> Can you give it a try?

I'm not sure about this approach. It moves the responsibility of
tracking the location and length from the nested statement to the top
level statement.
- The location you added in ExplainStmt and CopyStmt has a different
meaning from all others and tracks the nested location and not the
location of the statement itself. This creates some inconsistencies.
- The work will need to be done for all statements with nested
queries: Prepare, Create table as, Declare Cursor, Materialised View.
Whereas by tracking the location of PreparableStatements, there's no
need for additional logic. For example, v8 0002 fixes the existing
behaviour for Prepare statements thanks to SelectStmt's modifications.

I feel like while it looks simpler, it will need more work to handle
all cases while introducing an outlier in how locations are tracked.

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc:	jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-18 06:21:03
Message-ID:	ZxH-T6SXZWwlgzUX@paquier.xyz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 17, 2024 at 09:21:11AM +0200, Anthonin Bonnefoy wrote:
> I'm not sure about this approach. It moves the responsibility of
> tracking the location and length from the nested statement to the top
> level statement.
> - The location you added in ExplainStmt and CopyStmt has a different
> meaning from all others and tracks the nested location and not the
> location of the statement itself. This creates some inconsistencies.
> - The work will need to be done for all statements with nested
> queries: Prepare, Create table as, Declare Cursor, Materialised View.
> Whereas by tracking the location of PreparableStatements, there's no
> need for additional logic. For example, v8 0002 fixes the existing
> behaviour for Prepare statements thanks to SelectStmt's modifications.

Hmm. And isn't tracking this information at only the top-level going
to be a problem when dealing with multiple levels of nesting, when for
example these involve pl/sql or pl/pgSQL functions? How would this
work if we're dealing with multiple levels of Nodes?
--
Michael

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc:	jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-18 06:27:29
Message-ID:	ZxH_0cK6L6SSHZqi@paquier.xyz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Oct 15, 2024 at 10:11:40AM +0200, Anthonin Bonnefoy wrote:
> I've updated the patchset with additional tests for COPY in 0001. 0002
> includes the necessary modifications to handle COPY.

Beginning with the beginning of this patch series.

+SELECT toplevel, calls, query FROM pg_stat_statements
+ ORDER BY toplevel desc, query COLLATE "C";

In v8-0001, for the tests that want to track the queries showing up,
could it be better to adjust the ORDER BY to be (query, toplevel,
calls), making the query string first? This way, it is possible to
see which are the doublons of queries showing up for toplevel as true
and false depending on the utility.
--
Michael

From:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-21 08:35:17
Message-ID:	CAO6_XqrDzrhZD_L2OZu2FYMcPFPBXY=TE0OZsqXJizBeMko9OA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On Fri, Oct 18, 2024 at 8:27 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> Beginning with the beginning of this patch series.
>
> +SELECT toplevel, calls, query FROM pg_stat_statements
> + ORDER BY toplevel desc, query COLLATE "C";
>
> In v8-0001, for the tests that want to track the queries showing up,
> could it be better to adjust the ORDER BY to be (query, toplevel,
> calls), making the query string first? This way, it is possible to
> see which are the doublons of queries showing up for toplevel as true
> and false depending on the utility.

I've updated 0001 to only use ORDER BY query. The query strings are
not exact doublons, as the nested statement has the additional ending
';' due to using the whole string instead of just the RawStmt. Thus,
the other sort expressions will never be used since there's no
equality. There's also the possibility of breaking down each statement
in individual blocks, with pgss reset and fetch for each one. However,
I feel it's gonna add a lot of noise in the test file.

Attachment	Content-Type	Size
v9-0003-Set-query_id-for-queries-contained-in-utility-sta.patch	application/octet-stream	23.0 KB
v9-0002-Track-location-to-extract-relevant-part-in-nested.patch	application/octet-stream	42.5 KB
v9-0001-Add-tests-covering-pgss-nested-queries.patch	application/octet-stream	55.6 KB
v9-0004-Use-view-s-definition-as-query-string-on-a-materi.patch	application/octet-stream	6.1 KB

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc:	jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-22 05:06:16
Message-ID:	ZxcyyGaTy_5FmORw@paquier.xyz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Oct 21, 2024 at 10:35:17AM +0200, Anthonin Bonnefoy wrote:
> I've updated 0001 to only use ORDER BY query. The query strings are
> not exact doublons, as the nested statement has the additional ending
> ';' due to using the whole string instead of just the RawStmt. Thus,
> the other sort expressions will never be used since there's no
> equality. There's also the possibility of breaking down each statement
> in individual blocks, with pgss reset and fetch for each one. However,
> I feel it's gonna add a lot of noise in the test file.

I've looked at 0001, and finished by splitting the case of all-level
tracking with the multi-statements as the resulting table was feeling
heavily bloated, particularly because of MERGE that spawned in
multiple lines even if there were less entries. The rest, except for
some styling inconsistencies, was feeling OK.

One of the multi-statement tests includes this output for HEAD, and
that's on two PGSS entries:
EXPLAIN (COSTS OFF) SELECT $1, $2 UNION SELECT $3, $4;
EXPLAIN (COSTS OFF) (SELECT 1, 2, 3) UNION SELECT 3, 4, 5;

EXPLAIN (COSTS OFF) SELECT 1, 2 UNION SELECT 3, 4;
EXPLAIN (COSTS OFF) (SELECT $1, $2, $3) UNION SELECT $4, $5, $6;

I did not notice that first, but that's really something!
Normalization is only applied partially to a portion of the string, so
we have a bunch of bloat for non-top queries that has existed for
years.

+ ParseLoc stmt_location; /* start location, or -1 if unknown */
+ ParseLoc stmt_len; /* length in bytes; 0 means "rest of string" */

I'm OK with this approach after considering a few things, mostly in
terms of consistency with the existing node structures. The existing
business with YYLLOC_DEFAULT() counts here.

-Query *
-transformTopLevelStmt(ParseState *pstate, RawStmt *parseTree)

What's the advantage of removing this routine? Is that because you're
pushing the initialization of stmt_location and stmt_len into
transformOptionalSelectInto(), making it mostly moot? Something that
worries me a bit is that this changes makes the code less clean, by
having a SELECT-INTO specific routine called in the parse-analyze
paths, while adding three individual paths in charge of setting
pstate->p_stmt_len and p_stmt_location.

+ n->stmt_len = @3 - @2;

This specific case deserves a comment. I don't have the best
understanding of this area yet (need more analysis), but With the
existing updateRawStmtEnd() and RawStmt also tracking this
information, I am wondering if we could be smarter with less paths
manipulating the start locations and lengths. And your patch adds a
new setQueryStmtLen() that does the same kind of job. Hmm.

FWIW, I don't feel strongly about 0004 that tries to make the REFRESH
handling smarter. I am not sure that it even makes sense as-is by
hacking into a wrapper of pg_get_viewdef_worker() to get the query
string, leading it to not be normalized. This has also a small
footprint in 0003. I think that the bits in ExecRefreshMatView()
should be discarded from 0003, as a result.
--
Michael

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc:	jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-22 05:09:03
Message-ID:	Zxczb8b8srjARwkE@paquier.xyz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Oct 22, 2024 at 02:06:16PM +0900, Michael Paquier wrote:
> I've looked at 0001, and finished by splitting the case of all-level
> tracking with the multi-statements as the resulting table was feeling
> heavily bloated, particularly because of MERGE that spawned in
> multiple lines even if there were less entries. The rest, except for
> some styling inconsistencies, was feeling OK.

And of course I have forgotten to attach a rebase of the remaining
patches..
--
Michael

Attachment	Content-Type	Size
v10-0001-Track-location-to-extract-relevant-part-in-neste.patch	text/x-diff	45.5 KB
v10-0002-Set-query_id-for-queries-contained-in-utility-st.patch	text/x-diff	22.8 KB
v10-0003-Use-view-s-definition-as-query-string-on-a-mater.patch	text/x-diff	6.1 KB

From:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-22 09:34:55
Message-ID:	CAO6_XqqMpRS1oD7mrnWLuRgF7NBkNzue4KymUOE6p9MeB9GHOA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Oct 22, 2024 at 7:06 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> -Query *
> -transformTopLevelStmt(ParseState *pstate, RawStmt *parseTree)
>
> What's the advantage of removing this routine? Is that because you're
> pushing the initialization of stmt_location and stmt_len into
> transformOptionalSelectInto(), making it mostly moot?

Yeah, the removal of the stmt_location and stmt_len initialization
left the function with only one thing, the call to
transformOptionalSelectInto.

> Something that
> worries me a bit is that this changes makes the code less clean, by
> having a SELECT-INTO specific routine called in the parse-analyze
> paths, while adding three individual paths in charge of setting
> pstate->p_stmt_len and p_stmt_location.

I've moved pstate's p_stmt_len and p_stmt_location assignment to
transformTopLevelStmt (and also restored transformTopLevelStmt). This
will remove the multiple assignment paths.

> + n->stmt_len = @3 - @2;
>
> This specific case deserves a comment.

I think I went over this 3 times thinking "maybe I should add a
comment here". Added.

> I don't have the best
> understanding of this area yet (need more analysis), but With the
> existing updateRawStmtEnd() and RawStmt also tracking this
> information, I am wondering if we could be smarter with less paths
> manipulating the start locations and lengths. And your patch adds a
> new setQueryStmtLen() that does the same kind of job. Hmm.

This is doable. I've moved the query's location and length assignment
to the end of transformStmt which will call setQueryLocationAndLength.
The logic of manipulating locations and lengths will only happen in a
single place. That creates an additional switch on the node's type as
a small trade off.

> FWIW, I don't feel strongly about 0004 that tries to make the REFRESH
> handling smarter. I am not sure that it even makes sense as-is by
> hacking into a wrapper of pg_get_viewdef_worker() to get the query
> string, leading it to not be normalized. This has also a small
> footprint in 0003. I think that the bits in ExecRefreshMatView()
> should be discarded from 0003, as a result.

Good point on the query not being normalised. I'm fine with dropping
the materialised view part.

Also, there was an unnecessary cast in analyze.c "result->utilityStmt
= (Node *) parseTree;" as parseTree is already a Node. I removed it in
0001.

Attachment	Content-Type	Size
v11-0001-Track-location-to-extract-relevant-part-in-neste.patch	application/octet-stream	40.3 KB
v11-0002-Set-query_id-for-queries-contained-in-utility-st.patch	application/octet-stream	16.7 KB

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc:	jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-23 06:10:19
Message-ID:	ZxiTS_v6mYpcPHsa@paquier.xyz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Oct 22, 2024 at 11:34:55AM +0200, Anthonin Bonnefoy wrote:
> On Tue, Oct 22, 2024 at 7:06 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> Something that
>> worries me a bit is that this changes makes the code less clean, by
>> having a SELECT-INTO specific routine called in the parse-analyze
>> paths, while adding three individual paths in charge of setting
>> pstate->p_stmt_len and p_stmt_location.
>
> I've moved pstate's p_stmt_len and p_stmt_location assignment to
> transformTopLevelStmt (and also restored transformTopLevelStmt). This
> will remove the multiple assignment paths.
>
>> + n->stmt_len = @3 - @2;
>>
>> This specific case deserves a comment.
>
> I think I went over this 3 times thinking "maybe I should add a
> comment here". Added.

Thanks. These changes look OK.

> This is doable. I've moved the query's location and length assignment
> to the end of transformStmt which will call setQueryLocationAndLength.
> The logic of manipulating locations and lengths will only happen in a
> single place. That creates an additional switch on the node's type as
> a small trade off.

Grouping both assignments in a single setQueryLocationAndLength() is
less confusing.

> Also, there was an unnecessary cast in analyze.c "result->utilityStmt
> = (Node *) parseTree;" as parseTree is already a Node. I removed it in
> 0001.

Indeed. It does not matter one way or another and we have plenty of
these in the tree.

I have some more minor comments.

- if (@$ < 0) /* see comments for YYLLOC_DEFAULT */
- @$ = @2;

With 14e5680eee19 now in the tree (interesting timing as this did not
exist until yesterday), it looks like we don't need these ones
anymore, no?

@@ -18943,11 +19004,13 @@ insertSelectOptions(SelectStmt *stmt,
+ /* Update SelectStmt's location to the start of the with clause */
+ stmt->stmt_location = withClause->location;

I have been wondering for a bit what this is about, and indeed this
makes the location setup easier to think about with the various SELECT
rules we need to deal with (the ones calling insertSelectOptions), and
WITH is just in a subset of them. So LGTM.

+ ParseLoc p_stmt_location; /* start location, or -1 if unknown */
+ ParseLoc p_stmt_len; /* length in bytes; 0 means "rest of string" */

So, the reason why these two fields are added to the ParseState is
that the lower layers in charge of the query transforms don't have to
RawStmt so as the location and lengths can be adjusted when queries
are between parenthesis. I was first wondering if we should push
RawStmt deeper into the argument stack, but based on the stmt_location
assignments for the DMLs and WITH, storing that in the ParseState
looks neater. The patch is lacking a description of these two fields
at the top of the ParseState structure in parse_node.h. This stuff
needs to be explained, aka we need them to be able to adjust the
locations and lengths depending on inner clauses of the queries we are
dealing with, or something like that.

While reviewing the whole, I've done some changes, mostly stylistic.
Please see the attach about them, that I have kept outside of your
v11-0001 for clarity. I still need to dive deeper into v11-0002 (not
attached here), but let's take one step at a time and conclude on
v11-0001 first..
--
Michael

Attachment	Content-Type	Size
v12-0001-Track-location-to-extract-relevant-part-in-neste.patch	text/x-diff	40.2 KB
v12-0002-Some-edits-from-me.patch	text/x-diff	6.1 KB

From:	jian he <jian(dot)universality(at)gmail(dot)com>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-23 08:36:42
Message-ID:	CACJufxFpXw2VvSXfrqFabPUwkjcE0e62xLttEwjrYFoHL=Ycmg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Oct 23, 2024 at 2:10 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Tue, Oct 22, 2024 at 11:34:55AM +0200, Anthonin Bonnefoy wrote:
> > On Tue, Oct 22, 2024 at 7:06 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >> Something that
> >> worries me a bit is that this changes makes the code less clean, by
> >> having a SELECT-INTO specific routine called in the parse-analyze
> >> paths, while adding three individual paths in charge of setting
> >> pstate->p_stmt_len and p_stmt_location.
> >
> > I've moved pstate's p_stmt_len and p_stmt_location assignment to
> > transformTopLevelStmt (and also restored transformTopLevelStmt). This
> > will remove the multiple assignment paths.
> >
> >> + n->stmt_len = @3 - @2;
> >>
> >> This specific case deserves a comment.
> >
> > I think I went over this 3 times thinking "maybe I should add a
> > comment here". Added.
>
> Thanks. These changes look OK.
>
> > This is doable. I've moved the query's location and length assignment
> > to the end of transformStmt which will call setQueryLocationAndLength.
> > The logic of manipulating locations and lengths will only happen in a
> > single place. That creates an additional switch on the node's type as
> > a small trade off.
>
> Grouping both assignments in a single setQueryLocationAndLength() is
> less confusing.
>
> > Also, there was an unnecessary cast in analyze.c "result->utilityStmt
> > = (Node *) parseTree;" as parseTree is already a Node. I removed it in
> > 0001.
>
> Indeed. It does not matter one way or another and we have plenty of
> these in the tree.
>
> I have some more minor comments.
>
> - if (@$ < 0) /* see comments for YYLLOC_DEFAULT */
> - @$ = @2;
>
> With 14e5680eee19 now in the tree (interesting timing as this did not
> exist until yesterday), it looks like we don't need these ones
> anymore, no?
>

I commented out
> - if (@$ < 0) /* see comments for YYLLOC_DEFAULT */
> - @$ = @2;
the test suite (regess, pg_stat_statements) still passed.
so i think it's not needed.
I am not sure of the meaning of "@$", though.
I do understand the meaning of "@2" meaning.
I think the 14e5680eee19 will make sure the statement start location
is (not empty, not related to comments).

/*
* transformTopLevelStmt -
* transform a Parse tree into a Query tree.
* This function is just responsible for transferring statement location data
* from the RawStmt into the finished Query.
*/
Query *
transformTopLevelStmt(ParseState *pstate, RawStmt *parseTree)
{
Query *result;
/* Store RawStmt's length and location in pstate */
pstate->p_stmt_len = parseTree->stmt_len;
pstate->p_stmt_location = parseTree->stmt_location;
/* We're at top level, so allow SELECT INTO */
result = transformOptionalSelectInto(pstate, parseTree->stmt);
return result;
}
do we need to change the comments?
since it's transformOptionalSelectInto inner function setQueryLocationAndLength
transfer location data to the finished query.

> While reviewing the whole, I've done some changes, mostly stylistic.
> Please see the attach about them, that I have kept outside of your
> v11-0001 for clarity. I still need to dive deeper into v11-0002 (not
> attached here), but let's take one step at a time and conclude on
> v11-0001 first..
> --
i manually checked out contrib/pg_stat_statements/expected/level_tracking.out
changes in v12-0001 it looks fine.

From:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-23 09:24:11
Message-ID:	CAO6_Xqqu35=0zAqjVKgQNTvVNS8iHzozBOPpdQHAGCUyGfv=Tw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Oct 23, 2024 at 8:10 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> I have some more minor comments.
>
> - if (@$ < 0) /* see comments for YYLLOC_DEFAULT */
> - @$ = @2;
>
> With 14e5680eee19 now in the tree (interesting timing as this did not
> exist until yesterday), it looks like we don't need these ones
> anymore, no?

Yes, 14e5680eee19 makes the if(@$<0) unnecessaries. I saw this
yesterday and planned to remove them but you beat me to it.

> + ParseLoc p_stmt_location; /* start location, or -1 if unknown */
> + ParseLoc p_stmt_len; /* length in bytes; 0 means "rest of string" */
>
> So, the reason why these two fields are added to the ParseState is
> that the lower layers in charge of the query transforms don't have to
> RawStmt so as the location and lengths can be adjusted when queries
> are between parenthesis. I was first wondering if we should push
> RawStmt deeper into the argument stack, but based on the stmt_location
> assignments for the DMLs and WITH, storing that in the ParseState
> looks neater. The patch is lacking a description of these two fields
> at the top of the ParseState structure in parse_node.h. This stuff
> needs to be explained, aka we need them to be able to adjust the
> locations and lengths depending on inner clauses of the queries we are
> dealing with, or something like that.

True, added comments for both fields.

On Wed, Oct 23, 2024 at 10:36 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> /*
> * transformTopLevelStmt -
> * transform a Parse tree into a Query tree.
> * This function is just responsible for transferring statement location data
> * from the RawStmt into the finished Query.
> */
> Query *
> transformTopLevelStmt(ParseState *pstate, RawStmt *parseTree)
> {
> Query *result;
> /* Store RawStmt's length and location in pstate */
> pstate->p_stmt_len = parseTree->stmt_len;
> pstate->p_stmt_location = parseTree->stmt_location;
> /* We're at top level, so allow SELECT INTO */
> result = transformOptionalSelectInto(pstate, parseTree->stmt);
> return result;
> }
> do we need to change the comments?
> since it's transformOptionalSelectInto inner function setQueryLocationAndLength
> transfer location data to the finished query.

Yes, transformTopLevelStmt's comment was outdated. I've updated it.

An issue I've realised with calling setQueryLocationAndLength in
transformStmt: it was possible for pstate's location and length to be
0, leading to a Query with negative size. This wouldn't be visible in
tests since the only time Query's locations are used (AFAIK) is during
post_parse_hook which always have pstate's location information.
However, this is definitely something to avoid. I've added an
additional Assert(qry->stmt_len >= 0); to catch that. The fix is to
not do anything when pstate doesn't have location information.

This also answers another issue I was wondering about. Should the
child's parsestate inherit the location information when
make_parsestate is called? That would be incorrect since this is used
for sub-statement, pstate should reflect the size of the whole
sub-statement. However, since this is unused, it is fine to leave the
child parser with unset location data, which would in turn leave the
statement's location unset in setQueryLocationAndLength.

Patch includes Micheal changes. I've left out 0002 for now to focus on 0001.

Attachment	Content-Type	Size
v13-0001-Track-location-to-extract-relevant-part-in-neste.patch	application/octet-stream	39.3 KB

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	jian he <jian(dot)universality(at)gmail(dot)com>
Cc:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-23 23:36:07
Message-ID:	ZxmIZ6XkimMTgrAu@paquier.xyz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Oct 23, 2024 at 04:36:42PM +0800, jian he wrote:
> I am not sure of the meaning of "@$", though.

Please feel free to look at the upstream docs about that:
https://www.gnu.org/software/bison/manual/bison.html#Locations
"the location of the whole grouping is @$".
--
Michael

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc:	jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-24 02:28:55
Message-ID:	Zxmw53XTayTgfBRO@paquier.xyz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Oct 23, 2024 at 11:24:11AM +0200, Anthonin Bonnefoy wrote:
> This also answers another issue I was wondering about. Should the
> child's parsestate inherit the location information when
> make_parsestate is called? That would be incorrect since this is used
> for sub-statement, pstate should reflect the size of the whole
> sub-statement. However, since this is unused, it is fine to leave the
> child parser with unset location data, which would in turn leave the
> statement's location unset in setQueryLocationAndLength.

Yeah, this argument sounds kind of right to me.

> Patch includes Micheal changes. I've left out 0002 for now to focus on 0001.

I've looked at this one again, and applied 0001. The final result is
really nice, thanks for all your efforts here. If this requires
tweaks in this release cycle, well, let's deal about them should they
show up. At least the set of regression tests will show us what's
going on.

Attached is the remaining piece, for DECLARE and CTAS. The
JumbleQuery() calls in ExecCreateTableAs() and ExplainOneUtility() for
CTAS queries are twins, showing the inner queries of CTAS
consistently. DECLARE is covered by one call in ExplainOneUtility()
and one in PerformCursorOpen().

This should be OK as-is. With the regression test coverage, it is
easy to see what changes. Let's keep that around for a few more days.
--
Michael

Attachment	Content-Type	Size
v14-0001-Set-query_id-for-queries-contained-in-utility-st.patch	text/x-diff	16.6 KB

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc:	jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-10-28 00:07:50
Message-ID:	Zx7V1m34dM8l3QKF@paquier.xyz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 24, 2024 at 11:28:55AM +0900, Michael Paquier wrote:
> Attached is the remaining piece, for DECLARE and CTAS. The
> JumbleQuery() calls in ExecCreateTableAs() and ExplainOneUtility() for
> CTAS queries are twins, showing the inner queries of CTAS
> consistently. DECLARE is covered by one call in ExplainOneUtility()
> and one in PerformCursorOpen().

I've come back to it with a fresher mind, and it still looked OK on a
second look, so applied after some slight tweaks. It also means that
we should be done here, so I am marking the CF entry as committed.
--
Michael

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-11-16 20:43:12
Message-ID:	202411162043.lhyqsgw72kxe@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2024-Oct-24, Michael Paquier wrote:

> Track more precisely query locations for nested statements

I just noticed that this commit broke pgaudit pretty thoroughly. I'm
not sure if this means pgaudit needs changes, or this commit needs to be
reconsidered in some way; at this point I'm just raising the alarm.

(FWIW there are a few other pgaudit-breaking changes in 18, but they
don't seem as bad as this one, to the extent that the fixes likely
belong into pgaudit.)

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"I apologize for the confusion in my previous responses.
There appears to be an error." (ChatGPT)

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Subject:	Re: Set query_id for query contained in utility statement
Date:	2024-11-18 01:24:35
Message-ID:	ZzqXU-2uTWzC38jO@paquier.xyz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg롤 토토SQL :

On Sat, Nov 16, 2024 at 09:43:12PM +0100, Alvaro Herrera wrote:
> I just noticed that this commit broke pgaudit pretty thoroughly. I'm
> not sure if this means pgaudit needs changes, or this commit needs to be
> reconsidered in some way; at this point I'm just raising the alarm.
>
> (FWIW there are a few other pgaudit-breaking changes in 18, but they
> don't seem as bad as this one, to the extent that the fixes likely
> belong into pgaudit.)

Thanks for the heads-up.

I have looked at that, and as far as I can see this is something that
I think is an improvement for pgaudit because we are able to track
more precisely the subqueries that are run as part of a CTAS or an
EXPLAIN. Well, that's the whole point of what's been done by
Anthonin.

Here are a couple of examples from regression.diffs:
-NOTICE: AUDIT: SESSION,5,1,READ,SELECT,,,CREATE TABLE tmp2 AS (SELECT * FROM tmp),<not logged>
+NOTICE: AUDIT: SESSION,5,1,READ,SELECT,,,SELECT * FROM tmp,<not logged>
[...]
-NOTICE: AUDIT: SESSION,30,1,READ,SELECT,,,explain select 1,<none>
+NOTICE: AUDIT: SESSION,30,1,READ,SELECT,,,select 1,<none>
NOTICE: AUDIT: SESSION,30,2,MISC,EXPLAIN,,,explain select 1,<none>

We still track the parent query and it is intact. Things change so
as the subquery showing in the logs is actually what's running, not
what's part of the parent query.

This one also looks less confusing to me:
-WARNING: AUDIT: OBJECT,1,1,WRITE,INSERT,TABLE,public.test4,"PREPARE testinsert(int, text) AS
- INSERT INTO test4 VALUES($1, $2)","1,<long param suppressed>"
+WARNING: AUDIT: OBJECT,1,1,WRITE,INSERT,TABLE,public.test4,"INSERT INTO test4 VALUES($1, $2)","1,<long param suppressed>"

As a whole, it seems to me that this requires a refresh of the
regression test output.

(I've found two small issues for pgaudit, will send some pull requests
in a few minutes.)
--
Michael