Re: [HACKERS] Cached plans and statement generalization

Lists: pgsql-hackers
From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Cached plans and statement generalization
Date: 2017-04-24 08:46:02
Message-ID: 8e76d8fc-8b8c-14bd-d4d1-e9cf193a74f5@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi hackers,

There were a lot of discussions about query plan caching in hackers
mailing list, but I failed to find some clear answer for my question and
the current consensus on this question in Postgres community. As far as
I understand current state is the following:
1. We have per-connection prepared statements.
2. Queries executed inside plpgsql code are implicitly prepared.

It is not always possible or convenient to use prepared statements.
For example, if pgbouncer is used to perform connection pooling.
Another use case (which is actually the problem I am trying to solve
now) is partitioning.
Efficient execution of query to partitioned table requires hardcoded
value for partitioning key.
Only in this case optimizer will be able to construct efficient query
plan which access only affected tables (partitions).

My small benchmark for distributed partitioned table based on pg_pathman
+ postgres_fdw shows 3 times degrade of performance in case of using
prepared statements.
But without prepared statements substantial amount of time is spent in
query compilation and planning. I was be able to speed up benchmark more
than two time by
sending prepared queries directly to the remote nodes.

So what I am thinking now is implicit query caching. If the same query
with different literal values is repeated many times, then we can try to
generalize this query and replace it with prepared query with
parameters. I am not considering now shared query cache: is seems to be
much harder to implement. But local caching of generalized queries seems
to be not so difficult to implement and requires not so much changes in
Postgres code. And it can be useful not only for sharding, but for many
other cases where prepared statements can not be used.

I wonder if such option was already considered and if it was for some
reasons rejected: can you point me at this reasons?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-24 10:24:23
Message-ID: CAPpHfduYkxZ9ONZgeCs_Znw1pfTSMXX17oLwJR2iex4-GfaQxQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, Konstantin!

On Mon, Apr 24, 2017 at 11:46 AM, Konstantin Knizhnik <
k(dot)knizhnik(at)postgrespro(dot)ru> wrote:

> There were a lot of discussions about query plan caching in hackers
> mailing list, but I failed to find some clear answer for my question and
> the current consensus on this question in Postgres community. As far as I
> understand current state is the following:
> 1. We have per-connection prepared statements.
> 2. Queries executed inside plpgsql code are implicitly prepared.
>
> It is not always possible or convenient to use prepared statements.
> For example, if pgbouncer is used to perform connection pooling.
> Another use case (which is actually the problem I am trying to solve now)
> is partitioning.
> Efficient execution of query to partitioned table requires hardcoded value
> for partitioning key.
> Only in this case optimizer will be able to construct efficient query plan
> which access only affected tables (partitions).
>
> My small benchmark for distributed partitioned table based on pg_pathman +
> postgres_fdw shows 3 times degrade of performance in case of using prepared
> statements.
> But without prepared statements substantial amount of time is spent in
> query compilation and planning. I was be able to speed up benchmark more
> than two time by
> sending prepared queries directly to the remote nodes.
>

I don't think it's correct to ask PostgreSQL hackers about problem which
arises with pg_pathman while pg_pathman is an extension supported by
Postgres Pro.
Since we have declarative partitioning committed to 10, I think that
community should address this issue in the context of declarative
partitioning.
However, it's unlikely we can spot this issue with declarative partitioning
because it still uses very inefficient constraint exclusion mechanism.
Thus, issues you are writing about would become visible on declarative
partitioning only when constraint exclusion would be replaced with
something more efficient.

Long story short, could you reproduce this issue without pg_pathman?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cached plans and statement generalization
Date: 2017-04-24 11:04:42
Message-ID: 7d90eab7-bccf-119e-3e61-c779a9b51197@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 24.04.2017 13:24, Alexander Korotkov wrote:
> Hi, Konstantin!
>
> On Mon, Apr 24, 2017 at 11:46 AM, Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru <mailto:k(dot)knizhnik(at)postgrespro(dot)ru>> wrote:
>
> There were a lot of discussions about query plan caching in
> hackers mailing list, but I failed to find some clear answer for
> my question and the current consensus on this question in Postgres
> community. As far as I understand current state is the following:
> 1. We have per-connection prepared statements.
> 2. Queries executed inside plpgsql code are implicitly prepared.
>
> It is not always possible or convenient to use prepared statements.
> For example, if pgbouncer is used to perform connection pooling.
> Another use case (which is actually the problem I am trying to
> solve now) is partitioning.
> Efficient execution of query to partitioned table requires
> hardcoded value for partitioning key.
> Only in this case optimizer will be able to construct efficient
> query plan which access only affected tables (partitions).
>
> My small benchmark for distributed partitioned table based on
> pg_pathman + postgres_fdw shows 3 times degrade of performance in
> case of using prepared statements.
> But without prepared statements substantial amount of time is
> spent in query compilation and planning. I was be able to speed up
> benchmark more than two time by
> sending prepared queries directly to the remote nodes.
>
>
> I don't think it's correct to ask PostgreSQL hackers about problem
> which arises with pg_pathman while pg_pathman is an extension
> supported by Postgres Pro.
> Since we have declarative partitioning committed to 10, I think that
> community should address this issue in the context of declarative
> partitioning.
> However, it's unlikely we can spot this issue with declarative
> partitioning because it still uses very inefficient constraint
> exclusion mechanism. Thus, issues you are writing about would become
> visible on declarative partitioning only when constraint exclusion
> would be replaced with something more efficient.
>
> Long story short, could you reproduce this issue without pg_pathman?
>

Sorry, I have mentioned pg_pathman just as example.
The same problems takes place with partitioning based on standard
Postgres inheritance mechanism (when I manually create derived tables
and specify constraints for them).
I didn't test yet declarative partitioning committed to 10, but I expect
the that it will also suffer from this problem (because is based on
inheritance).
But as I wrote, I think that the problem with plan caching is wider and
is not bounded just to partitioning.

> ------
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> <http://www.postgrespro.com/>
> The Russian Postgres Company
>
>

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Andres Freund <andres(at)anarazel(dot)de>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cached plans and statement generalization
Date: 2017-04-24 18:43:53
Message-ID: 20170424184353.u2krt2eiw2l6e2af@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
> So what I am thinking now is implicit query caching. If the same query with
> different literal values is repeated many times, then we can try to
> generalize this query and replace it with prepared query with
> parameters.

That's not actuall all that easy:
- You pretty much do parse analysis to be able to do an accurate match.
How much overhead is parse analysis vs. planning in your cases?
- The invalidation infrastructure for this, if not tied to to fully
parse-analyzed statements, is going to be hell.
- Migrating to parameters can actually cause significant slowdowns, not
nice if that happens implicitly.

Greetings,

Andres Freund


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 15:11:09
Message-ID: f0bfcdd6-dba3-e9a8-2108-146e1f880839@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 24.04.2017 21:43, Andres Freund wrote:
> Hi,
>
> On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
>> So what I am thinking now is implicit query caching. If the same query with
>> different literal values is repeated many times, then we can try to
>> generalize this query and replace it with prepared query with
>> parameters.
> That's not actuall all that easy:
> - You pretty much do parse analysis to be able to do an accurate match.
> How much overhead is parse analysis vs. planning in your cases?
> - The invalidation infrastructure for this, if not tied to to fully
> parse-analyzed statements, is going to be hell.
> - Migrating to parameters can actually cause significant slowdowns, not
> nice if that happens implicitly.

Well, first of all I want to share results I already get: pgbench with
default parameters, scale 10 and one connection:

protocol
TPS
simple
3492
extended
2927
prepared
6865
simple + autoprepare
6844

So autoprepare is as efficient as explicit prepare and can increase
performance almost two times.

My current implementation is replacing with parameters only string
literals in the query, i.e. select * from T where x='123'; -> select *
from T where x=$1;
It greatly simplifies matching of parameters - it is just necessary to
locate '\'' character and then correctly handle pairs of quotes.
Handling of integer and real literals is really challenged task.
One source of problems is negation: it is not so easy to correctly
understand whether minus should be treated as part of literal or as
operator:
(-1), (1-1), (1-1)-1
Another problem is caused by using integer literals in context where
parameters can not be used, for example "order by 1".

Fully correct substitution can be done by first performing parsing the
query, then transform parse tree, replacing literal nodes with parameter
nodes and finally deparse tree into generalized query. postgres_fdw
already contains such deparse code. It can be moved to postgres core and
reused for autoprepare (and may be somewhere else).
But in this case overhead will be much higher.
I still think that query parsing time is significantly smaller than time
needed for building and optimizing query execution plan.
But it should be measured if community will be interested in such approach.

There is obvious question: how I managed to get this pgbench results if
currently only substitution of string literals is supported and queries
constructed by pgbench don't contain string literals? I just made small
patch in pgbench replaceVariable method wrapping value's representation
in quotes. It has almost no impact on performance (3482 TPS vs. 3492 TPS),
but allows autoprepare to deal with pgbench queries.

I attached my patch to this mail. It is just first version of the patch
(based on REL9_6_STABLE branch) just to illustrate proposed approach.
I will be glad to receive any comments and if such optimization is
considered to be useful, I will continue work on this patch.

--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare.patch text/x-patch 18.0 KB

From: Serge Rielau <serge(at)rielau(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Doug Doole <ddoole(at)salesforce(dot)com>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 16:12:41
Message-ID: CBF37923-CDA5-48EF-9E9E-4A338D7988ED@rielau.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> Another problem is caused by using integer literals in context where parameters can not be used, for example "order by 1”.
You will also need to deal with modifiers in types such as VARCHAR(10). Not sure if there are specific functions which can only deal with literals (?) as well.

Doug Doole did this work in DB2 LUW and he may be able to point to more places to watch out for semantically.

Generally, in my experience, this feature is very valuable when dealing with (poorly designed) web apps that just glue together strings.
Protecting it under a GUC would allow to only do the work if it’s deemed likely to help.
Another rule I find useful is to abort any efforts to substitute literals if any bind variable is found in the query.
That can be used as a cue that the author of the SQL left the remaining literals in on purpose.

A follow up feature would be to formalize different flavors of peeking.
I.e. can you produce a generic plan, but still recruit the initial set of bind values/substituted literals to dos costing?

Cheers
Serge Rielau
Salesforce.com <http://salesforce.com/>

PS: FWIW, I like this feature.


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Serge Rielau <serge(at)rielau(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Doug Doole <ddoole(at)salesforce(dot)com>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 16:45:20
Message-ID: 1f4c1a22-ecbe-811d-4695-ee4701b8f6c7@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 25.04.2017 19:12, Serge Rielau wrote:
>
>> On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik
>> <k(dot)knizhnik(at)postgrespro(dot)ru <mailto:k(dot)knizhnik(at)postgrespro(dot)ru>> wrote:
>> Another problem is caused by using integer literals in context where
>> parameters can not be used, for example "order by 1”.
> You will also need to deal with modifiers in types such as
> VARCHAR(10). Not sure if there are specific functions which can only
> deal with literals (?) as well.

Sorry, I do not completely understand how presence of type modifiers can
affect string literals used in query.
Can you provide me some example?

>
> Doug Doole did this work in DB2 LUW and he may be able to point to
> more places to watch out for semantically.
>
> Generally, in my experience, this feature is very valuable when
> dealing with (poorly designed) web apps that just glue together strings.

I do not think that this optimization will be useful only for poorly
designed application.
I already pointed on two use cases where prepapred statements can not be
used:
1. pgbouncer without session-level pooling.
2. partitioning

> Protecting it under a GUC would allow to only do the work if it’s
> deemed likely to help.
> Another rule I find useful is to abort any efforts to substitute
> literals if any bind variable is found in the query.
> That can be used as a cue that the author of the SQL left the
> remaining literals in on purpose.
>
> A follow up feature would be to formalize different flavors of peeking.
> I.e. can you produce a generic plan, but still recruit the initial set
> of bind values/substituted literals to dos costing?
Here situation is the same as for explicitly prepared statements, isn't it?
Sometimes it is preferrable to use specialized plan rather than generic
plan.
I am not sure if postgres now is able to do it.

>
> Cheers
> Serge Rielau
> Salesforce.com <http://salesforce.com>
>
> PS: FWIW, I like this feature.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: David Fetter <david(at)fetter(dot)org>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 16:54:21
Message-ID: 20170425165421.GD32130@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:
> On 24.04.2017 21:43, Andres Freund wrote:
> > Hi,
> >
> > On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
> > > So what I am thinking now is implicit query caching. If the same query with
> > > different literal values is repeated many times, then we can try to
> > > generalize this query and replace it with prepared query with
> > > parameters.
> > That's not actuall all that easy:
> > - You pretty much do parse analysis to be able to do an accurate match.
> > How much overhead is parse analysis vs. planning in your cases?
> > - The invalidation infrastructure for this, if not tied to to fully
> > parse-analyzed statements, is going to be hell.
> > - Migrating to parameters can actually cause significant slowdowns, not
> > nice if that happens implicitly.
>
> Well, first of all I want to share results I already get: pgbench with
> default parameters, scale 10 and one connection:
>
> protocol
> TPS
> simple
> 3492
> extended
> 2927
> prepared
> 6865
> simple + autoprepare
> 6844

If this is string mashing on the unparsed query, as it appears to be,
it's going to be a perennial source of security issues.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Serge Rielau <serge(at)rielau(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Doug Doole <ddoole(at)salesforce(dot)com>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 17:09:37
Message-ID: e85343e6-5ab5-447c-9995-5cce992fedf3@rielau.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 25, 2017 at 9:45 AM, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> wrote: On 25.04.2017 19:12, Serge Rielau wrote:

On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik < k(dot)knizhnik(at)postgrespro(dot)ru [k(dot)knizhnik(at)postgrespro(dot)ru] > wrote: Another problem is caused by using integer literals in context where parameters can not be used, for example "order by 1”.
You will also need to deal with modifiers in types such as VARCHAR(10). Not sure if there are specific functions which can only deal with literals (?) as well. Sorry, I do not completely understand how presence of type modifiers can affect string literals used in query.
Can you provide me some example? SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;
You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.
Also some OLAP syntax like “rows preceding”
It pretty much boils down to whether you can do some shallow parsing rather than expending the effort to build the parse tree.
Cheers Serge


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: David Fetter <david(at)fetter(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 20:35:21
Message-ID: 58FFB309.5040209@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/25/2017 07:54 PM, David Fetter wrote:
> On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:
>> On 24.04.2017 21:43, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
>>>> So what I am thinking now is implicit query caching. If the same query with
>>>> different literal values is repeated many times, then we can try to
>>>> generalize this query and replace it with prepared query with
>>>> parameters.
>>> That's not actuall all that easy:
>>> - You pretty much do parse analysis to be able to do an accurate match.
>>> How much overhead is parse analysis vs. planning in your cases?
>>> - The invalidation infrastructure for this, if not tied to to fully
>>> parse-analyzed statements, is going to be hell.
>>> - Migrating to parameters can actually cause significant slowdowns, not
>>> nice if that happens implicitly.
>> Well, first of all I want to share results I already get: pgbench with
>> default parameters, scale 10 and one connection:
>>
>> protocol
>> TPS
>> simple
>> 3492
>> extended
>> 2927
>> prepared
>> 6865
>> simple + autoprepare
>> 6844
> If this is string mashing on the unparsed query, as it appears to be,
> it's going to be a perennial source of security issues.

Sorry, may be I missed something, but I can not understand how security can be violated by extracting string literals from query. I am just copying bytes from one buffer to another. I do not try to somehow interpret this parameters. What I am doing is
very similar with standard prepared statements.
And moreover query is parsed! Only query which was already parsed and executed (but with different values of parameters) can be autoprepared.

>
> Best,
> David.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Serge Rielau <serge(at)rielau(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Doug Doole <ddoole(at)salesforce(dot)com>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 20:37:37
Message-ID: 58FFB391.5080101@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/25/2017 08:09 PM, Serge Rielau wrote:
>
> On Tue, Apr 25, 2017 at 9:45 AM, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>
> On 25.04.2017 19:12, Serge Rielau wrote:
>
>
> On Apr 25, 2017, at 8:11 AM, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru <mailto:k(dot)knizhnik(at)postgrespro(dot)ru>> wrote:
> Another problem is caused by using integer literals in context where parameters can not be used, for example "order by 1”.
>
> You will also need to deal with modifiers in types such as VARCHAR(10). Not sure if there are specific functions which can only deal with literals (?) as well.
>
> Sorry, I do not completely understand how presence of type modifiers can affect string literals used in query.
> Can you provide me some example?
>
> SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;
>
> You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.

I am substituting only string literals. So the query above will be transformed to

SELECT $1::CHAR(10) || $2, 5 + 6;

What's wrong with it?

>
> Also some OLAP syntax like “rows preceding”
>
> It pretty much boils down to whether you can do some shallow parsing rather than expending the effort to build the parse tree.
>
> Cheers
> Serge

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Serge Rielau <serge(at)rielau(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Doug Doole <ddoole(at)salesforce(dot)com>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 20:40:10
Message-ID: C0064A85-7CE9-44E6-B14F-98E541B328FD@rielau.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>>
>> SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;
>>
>> You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.
>
> I am substituting only string literals. So the query above will be transformed to
>
> SELECT $1::CHAR(10) || $2, 5 + 6;
>
> What's wrong with it?

Oh, well that leaves a lot of opportunities on the table, doesn’t it?

Cheers
Serge


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Serge Rielau <serge(at)rielau(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Doug Doole <ddoole(at)salesforce(dot)com>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 20:47:44
Message-ID: 58FFB5F0.2020900@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/25/2017 11:40 PM, Serge Rielau wrote:
>
>> On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru <mailto:k(dot)knizhnik(at)postgrespro(dot)ru>> wrote:
>>>
>>>
>>> SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;
>>>
>>> You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.
>>
>> I am substituting only string literals. So the query above will be transformed to
>>
>> SELECT $1::CHAR(10) || $2, 5 + 6;
>>
>> What's wrong with it?
>
> Oh, well that leaves a lot of opportunities on the table, doesn’t it?

Well, actually my primary intention was not to make badly designed programs (not using prepared statements) work faster.
I wanted to address cases when it is not possible to use prepared statements.
If we want to substitute with parameters as much literals as possible, then parse+deparse tree seems to be the only reasonable approach.
I will try to implement it also, just to estimate parsing overhead.

>
> Cheers
> Serge
>

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Doug Doole <ddoole(at)salesforce(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Serge Rielau <serge(at)rielau(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 21:11:08
Message-ID: CAP6UvaMjiumn5-Lre_w11j=ZAOB9auO1x=-meTKsWRfcsQ7Pcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

When I did this in DB2, I didn't use the parser - it was too expensive. I
just tokenized the statement and used some simple rules to bypass the
invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
disallow replacement replacement until I hit the end of the current
subquery or statement.

There are a few limitations to this approach. For example, DB2 allowed you
to cast using function notation like VARCHAR(foo, 10). This meant I would
never replace the second parameter of any VARCHAR function. Now it's
possible that when the statement was fully compiled we'd find that
VARCHAR(foo,10) actually resolved to BOB.VARCHAR() instead of the built-in
cast function. Our thinking that these cases were rare enough that we
wouldn't worry about them. (Of course, PostgreSQL's ::VARCHAR(10) syntax
avoids this problem completely.)

Because SQL is so structured, the implementation ended up being quite
simple (a few hundred line of code) with no significant maintenance issues.
(Other developers had no problem adding in new cases where constants had to
be preserved.)

The simple tokenizer was also fairly extensible. I'd prototyped using the
same code to also normalize statements (uppercase all keywords, collapse
whitespace to a single blank, etc.) but that feature was never added to the
product.

- Doug

On Tue, Apr 25, 2017 at 1:47 PM Konstantin Knizhnik <
k(dot)knizhnik(at)postgrespro(dot)ru> wrote:

> On 04/25/2017 11:40 PM, Serge Rielau wrote:
>
>
> On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <
> k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>
>
> SELECT ‘hello’::CHAR(10) || ‘World’, 5 + 6;
>
> You can substitute ‘hello’, ‘World’, 5, and 6. But not 10.
>
>
> I am substituting only string literals. So the query above will be
> transformed to
>
> SELECT $1::CHAR(10) || $2, 5 + 6;
>
> What's wrong with it?
>
>
> Oh, well that leaves a lot of opportunities on the table, doesn’t it?
>
>
> Well, actually my primary intention was not to make badly designed
> programs (not using prepared statements) work faster.
> I wanted to address cases when it is not possible to use prepared
> statements.
> If we want to substitute with parameters as much literals as possible,
> then parse+deparse tree seems to be the only reasonable approach.
> I will try to implement it also, just to estimate parsing overhead.
>
>
>
>
> Cheers
> Serge
>
>
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


From: Andres Freund <andres(at)anarazel(dot)de>
To: Doug Doole <ddoole(at)salesforce(dot)com>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Serge Rielau <serge(at)rielau(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 21:47:46
Message-ID: 20170425214746.3mngxgbo7yndfgax@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-04-25 21:11:08 +0000, Doug Doole wrote:
> When I did this in DB2, I didn't use the parser - it was too expensive. I
> just tokenized the statement and used some simple rules to bypass the
> invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
> disallow replacement replacement until I hit the end of the current
> subquery or statement.

How did you manage plan invalidation and such?

- Andres


From: Doug Doole <ddoole(at)salesforce(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Serge Rielau <serge(at)rielau(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 22:21:22
Message-ID: CAP6UvaNvFYprWdghZ4=MFUFLczzj1br9CX+vAQFz8wAGd14yug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Plan invalidation was no different than for any SQL statement. DB2 keeps a
list of the objects the statement depends on. If any of the objects changes
in an incompatible way the plan is invalidated and kicked out of the cache.

I suspect what is more interesting is plan lookup. DB2 has something called
the "compilation environment". This is a collection of everything that
impacts how a statement is compiled (SQL path, optimization level, etc.).
Plan lookup is done using both the statement text and the compilation
environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your
path is ANDRES, MYTEAM, SYSIBM we will have different compilation
environments. If we both issue "SELECT * FROM T" we'll end up with
different cache entries even if T in both of our statements resolves to
MYTEAM.T. If I execute "SELECT * FROM T", change my SQL path and then
execute "SELECT * FROM T" again, I have a new compilation environment so
the second invocation of the statement will create a new entry in the
cache. The first entry is not kicked out - it will still be there for
re-use if I change my SQL path back to my original value (modulo LRU for
cache memory management of course).

With literal replacement, the cache entry is on the modified statement
text. Given the modified statement text and the compilation environment,
you're guaranteed to get the right plan entry.

On Tue, Apr 25, 2017 at 2:47 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> On 2017-04-25 21:11:08 +0000, Doug Doole wrote:
> > When I did this in DB2, I didn't use the parser - it was too expensive. I
> > just tokenized the statement and used some simple rules to bypass the
> > invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
> > disallow replacement replacement until I hit the end of the current
> > subquery or statement.
>
> How did you manage plan invalidation and such?
>
> - Andres
>


From: David Fetter <david(at)fetter(dot)org>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 22:24:08
Message-ID: 20170425222408.GF32130@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 25, 2017 at 11:35:21PM +0300, Konstantin Knizhnik wrote:
> On 04/25/2017 07:54 PM, David Fetter wrote:
> > On Tue, Apr 25, 2017 at 06:11:09PM +0300, Konstantin Knizhnik wrote:
> > > On 24.04.2017 21:43, Andres Freund wrote:
> > > > Hi,
> > > >
> > > > On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:
> > > > > So what I am thinking now is implicit query caching. If the same query with
> > > > > different literal values is repeated many times, then we can try to
> > > > > generalize this query and replace it with prepared query with
> > > > > parameters.
> > > > That's not actuall all that easy:
> > > > - You pretty much do parse analysis to be able to do an accurate match.
> > > > How much overhead is parse analysis vs. planning in your cases?
> > > > - The invalidation infrastructure for this, if not tied to to fully
> > > > parse-analyzed statements, is going to be hell.
> > > > - Migrating to parameters can actually cause significant slowdowns, not
> > > > nice if that happens implicitly.
> > > Well, first of all I want to share results I already get: pgbench with
> > > default parameters, scale 10 and one connection:
> > >
> > > protocol
> > > TPS
> > > simple
> > > 3492
> > > extended
> > > 2927
> > > prepared
> > > 6865
> > > simple + autoprepare
> > > 6844
> > If this is string mashing on the unparsed query, as it appears to be,
> > it's going to be a perennial source of security issues.
>
> Sorry, may be I missed something, but I can not understand how
> security can be violated by extracting string literals from query. I
> am just copying bytes from one buffer to another. I do not try to
> somehow interpret this parameters. What I am doing is very similar
> with standard prepared statements. And moreover query is parsed!
> Only query which was already parsed and executed (but with different
> values of parameters) can be autoprepared.

I don't have an exploit yet. What concerns me is attackers' access to
what is in essence the ability to poke at RULEs when they only have
privileges to read.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Andres Freund <andres(at)anarazel(dot)de>
To: Doug Doole <ddoole(at)salesforce(dot)com>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Serge Rielau <serge(at)rielau(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 22:34:48
Message-ID: 20170425223448.amnhdludwondarph@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

(FWIW, on this list we don't do top-quotes)

On 2017-04-25 22:21:22 +0000, Doug Doole wrote:
> Plan invalidation was no different than for any SQL statement. DB2 keeps a
> list of the objects the statement depends on. If any of the objects changes
> in an incompatible way the plan is invalidated and kicked out of the cache.
>
> I suspect what is more interesting is plan lookup. DB2 has something called
> the "compilation environment". This is a collection of everything that
> impacts how a statement is compiled (SQL path, optimization level, etc.).
> Plan lookup is done using both the statement text and the compilation
> environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your
> path is ANDRES, MYTEAM, SYSIBM we will have different compilation
> environments. If we both issue "SELECT * FROM T" we'll end up with
> different cache entries even if T in both of our statements resolves to
> MYTEAM.T. If I execute "SELECT * FROM T", change my SQL path and then
> execute "SELECT * FROM T" again, I have a new compilation environment so
> the second invocation of the statement will create a new entry in the
> cache. The first entry is not kicked out - it will still be there for
> re-use if I change my SQL path back to my original value (modulo LRU for
> cache memory management of course).

It's not always that simple, at least in postgres, unless you disregard
search_path. Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_patch = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much -
like similar issues could arise in DB2 too?

Greetings,

Andres Freund


From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 22:38:23
Message-ID: CAKFQuwYMzpLF_mPNsfacMptx74++aVkiBU3=iEgiTDtMb3RmBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 25, 2017 at 3:24 PM, David Fetter <david(at)fetter(dot)org> wrote:

> I don't have an exploit yet. What concerns me is attackers' access to
> what is in essence the ability to poke at RULEs when they only have
> privileges to read.
>

​If they want to see how it works they can read the source code. In terms
of runtime data it would limited to whatever the session itself created.
In most cases the presence of the cache would be invisible. I suppose it
might appear if one were to explain a query, reset the session, explain
another query and then re-explain the original. If the chosen plan in the
second pass differed because of the presence of the leading query it would
be noticeable but not revealing. Albeit I'm a far cry from a security
expert...

David J.


From: Doug Doole <ddoole(at)salesforce(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Serge Rielau <serge(at)rielau(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 22:48:16
Message-ID: CAP6UvaO9bS8+U-Upt-1Fkn7tgUyNW_BiCFbh9a4LgcHqh84kQw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
> (FWIW, on this list we don't do top-quotes)
>

I know. Forgot and just did "reply all". My bad.

It's not always that simple, at least in postgres, unless you disregard
> search_path. Consider e.g. cases like
>
> CREATE SCHEMA a;
> CREATE SCHEMA b;
> CREATE TABLE a.foobar(somecol int);
> SET search_patch = 'b,a';
> SELECT * FROM foobar;
> CREATE TABLE b.foobar(anothercol int);
> SELECT * FROM foobar; -- may not be cached plan from before!
>
> it sounds - my memory of DB2 is very faint, and I never used it much -
> like similar issues could arise in DB2 too?
>

DB2 does handle this case. Unfortunately I don't know the details of how it
worked though.

A naive option would be to invalidate anything that depends on table or
view *.FOOBAR. You could probably make it a bit smarter by also requiring
that schema A appear in the path.

- Doug


From: "Finnerty, Jim" <jfinnert(at)amazon(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Doug Doole <ddoole(at)salesforce(dot)com>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Serge Rielau <serge(at)rielau(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 22:57:15
Message-ID: B1FAE502-7CA7-4122-B37E-3CA1F174D997@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 4/25/17, 6:34 PM, "pgsql-hackers-owner(at)postgresql(dot)org on behalf of Andres Freund" <pgsql-hackers-owner(at)postgresql(dot)org on behalf of andres(at)anarazel(dot)de> wrote:

It's not always that simple, at least in postgres, unless you disregard
search_path. Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_patch = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much -
like similar issues could arise in DB2 too?

Greetings,

Andres Freund


--
Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

You may need to store the reloid of each relation in the cached query to ensure that the schema bindings are the same, and also invalidate dependent cache entries if a referenced relation is dropped.

Regards,

Jim Finnerty


From: Serge Rielau <serge(at)rielau(dot)com>
To: Doug Doole <ddoole(at)salesforce(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-25 23:06:47
Message-ID: 6e24b729-30f0-4bdb-8bc4-549266b0776f@rielau.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

via Newton Mail [https://cloudmagic.com/k/d/mailapp?ct=dx&cv=9.4.52&pv=10.11.6&source=email_footer_2]
On Tue, Apr 25, 2017 at 3:48 PM, Doug Doole <ddoole(at)salesforce(dot)com> wrote: It's not always that simple, at least in postgres, unless you disregard
search_path. Consider e.g. cases like

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE a.foobar(somecol int);
SET search_patch = 'b,a';
SELECT * FROM foobar;
CREATE TABLE b.foobar(anothercol int);
SELECT * FROM foobar; -- may not be cached plan from before!

it sounds - my memory of DB2 is very faint, and I never used it much -
like similar issues could arise in DB2 too?

DB2 does handle this case. Unfortunately I don't know the details of how it worked though.
A naive option would be to invalidate anything that depends on table or view *.FOOBAR. You could probably make it a bit smarter by also requiring that schema A appear in the path. While this specific scenario does not arise in DB2 since it uses CURRENT SCHEMA only for tables (much to my dislike) your examples holds for functions and types which are resolved by path. For encapsulated SQL (in views, functions) conservative semantics are enforced via including the timestamp. For dynamic SQL the problem you describe does exist though and I think it is handled in the way Doug describes. However, as noted by Doug the topic of plan invalidation is really orthogonal to normalizing the queries. All it does is provide more opportunities to run into any pre-existing bugs.
Cheers Serge
PS: I’m just starting to look at the plan invalidation code in PG because we are dealing with potentially 10s of thousands of cached SQL statements. So these complete wipe outs or walks of every plan in the cache don’t scale.


From: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: 'Konstantin Knizhnik' <k(dot)knizhnik(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-26 01:00:42
Message-ID: 0A3221C70F24FB45833433255569204D1F6EB258@G01JPEXMBYT05
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Konstantin
> Knizhnik
> Well, first of all I want to share results I already get: pgbench with default
> parameters, scale 10 and one connection:
>
> So autoprepare is as efficient as explicit prepare and can increase
> performance almost two times.

This sounds great.

BTW, when are the autoprepared statements destroyed? Are you considering some upper limit on the number of prepared statements?

Regards
Takayuki Tsunakawa


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cached plans and statement generalization
Date: 2017-04-26 07:00:17
Message-ID: f5d50b64-0c83-369e-3a5b-789a93ed0cbe@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26.04.2017 00:47, Andres Freund wrote:
> On 2017-04-25 21:11:08 +0000, Doug Doole wrote:
>> When I did this in DB2, I didn't use the parser - it was too expensive. I
>> just tokenized the statement and used some simple rules to bypass the
>> invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
>> disallow replacement replacement until I hit the end of the current
>> subquery or statement.
> How did you manage plan invalidation and such?

The same mechanism as for prepared statements.
Cached plans are linked in the list by SaveCachedPlan function and are
invalidated by PlanCacheRelCallback.

> - Andres
>
>

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>, Doug Doole <ddoole(at)salesforce(dot)com>
Cc: Serge Rielau <serge(at)rielau(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-26 07:39:05
Message-ID: 617f73cb-b679-81e3-8c4e-92166ad4d0fc@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26.04.2017 01:34, Andres Freund wrote:
> Hi,
>
> (FWIW, on this list we don't do top-quotes)
>
> On 2017-04-25 22:21:22 +0000, Doug Doole wrote:
>> Plan invalidation was no different than for any SQL statement. DB2 keeps a
>> list of the objects the statement depends on. If any of the objects changes
>> in an incompatible way the plan is invalidated and kicked out of the cache.
>>
>> I suspect what is more interesting is plan lookup. DB2 has something called
>> the "compilation environment". This is a collection of everything that
>> impacts how a statement is compiled (SQL path, optimization level, etc.).
>> Plan lookup is done using both the statement text and the compilation
>> environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your
>> path is ANDRES, MYTEAM, SYSIBM we will have different compilation
>> environments. If we both issue "SELECT * FROM T" we'll end up with
>> different cache entries even if T in both of our statements resolves to
>> MYTEAM.T. If I execute "SELECT * FROM T", change my SQL path and then
>> execute "SELECT * FROM T" again, I have a new compilation environment so
>> the second invocation of the statement will create a new entry in the
>> cache. The first entry is not kicked out - it will still be there for
>> re-use if I change my SQL path back to my original value (modulo LRU for
>> cache memory management of course).
> It's not always that simple, at least in postgres, unless you disregard
> search_path. Consider e.g. cases like
>
> CREATE SCHEMA a;
> CREATE SCHEMA b;
> CREATE TABLE a.foobar(somecol int);
> SET search_patch = 'b,a';
> SELECT * FROM foobar;
> CREATE TABLE b.foobar(anothercol int);
> SELECT * FROM foobar; -- may not be cached plan from before!
>
> it sounds - my memory of DB2 is very faint, and I never used it much -
> like similar issues could arise in DB2 too?

There is the same problem with explicitly prepared statements, isn't it?
Certainly in case of using prepared statements it is responsibility of
programmer to avoid such collisions.
And in case of autoprepare programmer it is hidden from programming.
But there is guc variable controlling autoprepare feature and by default
it is switched off.
So if programmer or DBA enables it, then them should take in account
effects of such decision.

By the way, isn't it a bug in PostgreSQL that altering search path is
not invalidating cached plans?
As I already mentioned, the same problem can be reproduced with
explicitly prepared statements.

>
> Greetings,
>
> Andres Freund

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-26 07:49:18
Message-ID: bda56ce2-64fa-8236-b3a9-71db71b41c6b@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26.04.2017 04:00, Tsunakawa, Takayuki wrote:
> From: pgsql-hackers-owner(at)postgresql(dot)org
>> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Konstantin
>> Knizhnik
>> Well, first of all I want to share results I already get: pgbench with default
>> parameters, scale 10 and one connection:
>>
>> So autoprepare is as efficient as explicit prepare and can increase
>> performance almost two times.
> This sounds great.
>
> BTW, when are the autoprepared statements destroyed?
Right now them are destroyed only in case of receiving invalidation
message (when catalog is changed).
Prepared statements are local to backend and are located in backend's
memory.
It is unlikely, that there will be too much different queries which
cause memory overflow.
But in theory such situation is certainly possible.

> Are you considering some upper limit on the number of prepared statements?
In this case we need some kind of LRU for maintaining cache of
autoprepared statements.
I think that it is good idea to have such limited cached - it can avoid
memory overflow problem.
I will try to implement it.

> Regards
> Takayuki Tsunakawa
>
>

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cached plans and statement generalization
Date: 2017-04-26 10:30:25
Message-ID: 7741e358-ff0e-af5d-9899-306b20a1a6e1@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26.04.2017 10:49, Konstantin Knizhnik wrote:
>
>
> On 26.04.2017 04:00, Tsunakawa, Takayuki wrote: Are you considering
> some upper limit on the number of prepared statements?
> In this case we need some kind of LRU for maintaining cache of
> autoprepared statements.
> I think that it is good idea to have such limited cached - it can
> avoid memory overflow problem.
> I will try to implement it.

I attach new patch which allows to limit the number of autoprepared
statements (autoprepare_limit GUC variable).
Also I did more measurements, now with several concurrent connections
and read-only statements.
Results of pgbench with 10 connections, scale 10 and read-only
statements are below:

Protocol
TPS
extended
87k
prepared
209k
simple+autoprepare
206k

As you can see, autoprepare provides more than 2 times speed improvement.

Also I tried to measure overhead of parsing (to be able to substitute
all literals, not only string literals).
I just added extra call of pg_parse_query. Speed is reduced to 181k.
So overhead is noticeable, but still making such optimization useful.
This is why I want to ask question: is it better to implement slower
but safer and more universal solution?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare.patch text/x-patch 19.4 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-26 10:46:12
Message-ID: CAFj8pRDRfva3b4+xL7s1jUEOCKVZ4Q14NnajVjwpoR80XJdaVA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2017-04-26 12:30 GMT+02:00 Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>:

>
>
> On 26.04.2017 10:49, Konstantin Knizhnik wrote:
>
>
>
> On 26.04.2017 04:00, Tsunakawa, Takayuki wrote: Are you considering some
> upper limit on the number of prepared statements?
> In this case we need some kind of LRU for maintaining cache of
> autoprepared statements.
> I think that it is good idea to have such limited cached - it can avoid
> memory overflow problem.
> I will try to implement it.
>
>
> I attach new patch which allows to limit the number of autoprepared
> statements (autoprepare_limit GUC variable).
> Also I did more measurements, now with several concurrent connections and
> read-only statements.
> Results of pgbench with 10 connections, scale 10 and read-only statements
> are below:
>
> Protocol
> TPS
> extended
> 87k
> prepared
> 209k
> simple+autoprepare
> 206k
>
> As you can see, autoprepare provides more than 2 times speed improvement.
>
> Also I tried to measure overhead of parsing (to be able to substitute all
> literals, not only string literals).
> I just added extra call of pg_parse_query. Speed is reduced to 181k.
> So overhead is noticeable, but still making such optimization useful.
> This is why I want to ask question: is it better to implement slower but
> safer and more universal solution?
>

Unsafe solution has not any sense, and it is dangerous (80% of database
users has not necessary knowledge). If somebody needs the max possible
performance, then he use explicit prepared statements.

Regards

Pavel

>
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


From: Doug Doole <ddoole(at)salesforce(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Serge Rielau <serge(at)rielau(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-26 17:08:18
Message-ID: CAP6UvaOHL2mzKT4nX=5XcsY8mRqV0vjzXc+4dJsYx+uMsAQ6Ow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
> A naive option would be to invalidate anything that depends on table or
> view *.FOOBAR. You could probably make it a bit smarter by also requiring
> that schema A appear in the path.
>

This has been rumbling around in my head. I wonder if you could solve this
problem by registering dependencies on objects which don't yet exist.
Consider:

CREATE TABLE C.T1(...);
CREATE TABLE C.T2(...);
SET search_path='A,B,C,D';
SELECT * FROM C.T1, T2;

For T1, you'd register a hard dependency on C.T1 and no virtual
dependencies since the table is explicitly qualified.

For T2, you'd register a hard dependency on C.T2 since that is the table
that was selected for the query. You'd also register virtual dependencies
on A.T2 and B.T2 since if either of those tables (or views) are created you
need to recompile the statement. (Note that no virtual dependency is
created on D.T2() since that table would never be selected by the compiler.)

The catch is that virtual dependencies would have to be recorded and
searched as strings, not OIDs since the objects don't exist. Virtual
dependencies only have to be checked during CREATE processing though, so
that might not be too bad.

But this is getting off topic - I just wanted to capture the idea while it
was rumbling around.


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Doug Doole <ddoole(at)salesforce(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Serge Rielau <serge(at)rielau(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-26 19:35:21
Message-ID: 5900F679.8000100@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/26/2017 08:08 PM, Doug Doole wrote:
>
> A naive option would be to invalidate anything that depends on table or view *.FOOBAR. You could probably make it a bit smarter by also requiring that schema A appear in the path.
>
>
> This has been rumbling around in my head. I wonder if you could solve this problem by registering dependencies on objects which don't yet exist. Consider:
>
> CREATE TABLE C.T1(...);
> CREATE TABLE C.T2(...);
> SET search_path='A,B,C,D';
> SELECT * FROM C.T1, T2;
>
> For T1, you'd register a hard dependency on C.T1 and no virtual dependencies since the table is explicitly qualified.
>
> For T2, you'd register a hard dependency on C.T2 since that is the table that was selected for the query. You'd also register virtual dependencies on A.T2 and B.T2 since if either of those tables (or views) are created you need to recompile the
> statement. (Note that no virtual dependency is created on D.T2() since that table would never be selected by the compiler.)
>
> The catch is that virtual dependencies would have to be recorded and searched as strings, not OIDs since the objects don't exist. Virtual dependencies only have to be checked during CREATE processing though, so that might not be too bad.
>
> But this is getting off topic - I just wanted to capture the idea while it was rumbling around.

I think that it will be enough to handle modification of search path and invalidate prepared statements cache in this case.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-04-28 10:01:31
Message-ID: 304dfa09-6135-cbcf-d6e5-1ea6c36e6eec@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26.04.2017 13:46, Pavel Stehule wrote:
>
> I attach new patch which allows to limit the number of
> autoprepared statements (autoprepare_limit GUC variable).
> Also I did more measurements, now with several concurrent
> connections and read-only statements.
> Results of pgbench with 10 connections, scale 10 and read-only
> statements are below:
>
> Protocol
> TPS
> extended
> 87k
> prepared
> 209k
> simple+autoprepare
> 206k
>
>
> As you can see, autoprepare provides more than 2 times speed
> improvement.
>
> Also I tried to measure overhead of parsing (to be able to
> substitute all literals, not only string literals).
> I just added extra call of pg_parse_query. Speed is reduced to 181k.
> So overhead is noticeable, but still making such optimization useful.
> This is why I want to ask question: is it better to implement
> slower but safer and more universal solution?
>
>
> Unsafe solution has not any sense, and it is dangerous (80% of
> database users has not necessary knowledge). If somebody needs the max
> possible performance, then he use explicit prepared statements.
>

I attached new patch to this mail. I completely reimplement my original
approach and now use parse tree transformation.
New pgbench (-S -c 10) results are the following:

Protocol
TPS
extended
87k
prepared
209k
simple+autoprepare
185k

So there is some slowdown comparing with my original implementation and
explicitly prepared statements, but still it provide more than two times
speed-up comparing with unprepared queries. And it doesn't require to
change existed applications.
As far as most of real production application are working with DBMS
through some connection pool (pgbouncer,...), I think that such
optimization will be useful.
Isn't it interesting if If we can increase system throughput almost two
times by just setting one parameter in configuration file?

I also tried to enable autoprepare by default and run regression tests.
7 tests are not passed because of the following reasons:
1. Slightly different error reporting (for example error location is not
always identically specified).
2. Difference in query behavior caused by changed local settings
(Andres gives an example with search_path, and date test is failed
because of changing datestyle).
3. Problems with indirect dependencies (when table is altered only
cached plans directly depending on this relation and invalidated, but
not plans with indirect dependencies).
4. Not performing domain checks for null values.

I do not think that this issues can cause problems for real application.

Also it is possible to limit number of autoprepared statements using
autoprepare_limit parameter, avoid possible backend memory overflow in
case of larger number of unique queries sent by application. LRU
discipline is used to drop least recently used plans.

Any comments and suggestions for future improvement of this patch are
welcome.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare.patch text/x-patch 42.9 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-01 15:52:37
Message-ID: CA+TgmoZAHc4wqL3HvVmTd87s=bobp_8_7R9H=B_kH6q4GcA=Yg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Apr 28, 2017 at 6:01 AM, Konstantin Knizhnik <
k(dot)knizhnik(at)postgrespro(dot)ru> wrote:

>
> Any comments and suggestions for future improvement of this patch are
> welcome.
>

+ PG_TRY();
+ {
+ query = parse_analyze_varparams(parse_tree,
+ query_string,
+ &param_types,
+ &num_params);
+ }
+ PG_CATCH();
+ {
+ /*
+ * In case of analyze errors revert back to original query
processing
+ * and disable autoprepare for this query to avoid such
problems in future.
+ */
+ FlushErrorState();
+ if (snapshot_set) {
+ PopActiveSnapshot();
+ }
+ entry->disable_autoprepare = true;
+ undo_query_plan_changes(parse_tree, const_param_list);
+ MemoryContextSwitchTo(old_context);
+ return false;
+ }
+ PG_END_TRY();

This is definitely not a safe way of using TRY/CATCH.

+
+ /* Convert literal value to parameter value */
+ switch (const_param->literal->val.type)
+ {
+ /*
+ * Convert from integer literal
+ */
+ case T_Integer:
+ switch (ptype) {
+ case INT8OID:
+ params->params[paramno].value =
Int64GetDatum((int64)const_param->literal->val.val.ival);
+ break;
+ case INT4OID:
+ params->params[paramno].value =
Int32GetDatum((int32)const_param->literal->val.val.ival);
+ break;
+ case INT2OID:
+ if (const_param->literal->val.val.ival < SHRT_MIN
+ || const_param->literal->val.val.ival > SHRT_MAX)
+ {
+ ereport(ERROR,
+
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("smallint out of range")));
+ }
+ params->params[paramno].value =
Int16GetDatum((int16)const_param->literal->val.val.ival);
+ break;
+ case FLOAT4OID:
+ params->params[paramno].value =
Float4GetDatum((float)const_param->literal->val.val.ival);
+ break;
+ case FLOAT8OID:
+ params->params[paramno].value =
Float8GetDatum((double)const_param->literal->val.val.ival);
+ break;
+ case INT4RANGEOID:
+ sprintf(buf, "[%ld,%ld]",
const_param->literal->val.val.ival, const_param->literal->val.val.ival);
+ getTypeInputInfo(ptype, &typinput, &typioparam);
+ params->params[paramno].value =
OidInputFunctionCall(typinput, buf, typioparam, -1);
+ break;
+ default:
+ pg_lltoa(const_param->literal->val.val.ival, buf);
+ getTypeInputInfo(ptype, &typinput, &typioparam);
+ params->params[paramno].value =
OidInputFunctionCall(typinput, buf, typioparam, -1);
+ }
+ break;
+ case T_Null:
+ params->params[paramno].isnull = true;
+ break;
+ default:
+ /*
+ * Convert from string literal
+ */
+ getTypeInputInfo(ptype, &typinput, &typioparam);
+ params->params[paramno].value =
OidInputFunctionCall(typinput, const_param->literal->val.val.str,
typioparam, -1);
+ }

I don't see something with a bunch of hard-coded rules for particular type
OIDs having any chance of being acceptable.

This patch seems to duplicate a large amount of existing code. That would
be a good thing to avoid.

It could use a visit from the style police and a spell-checker, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-02 09:50:17
Message-ID: 4bd78cd4-1e64-5294-2eb8-3d061b20288f@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01.05.2017 18:52, Robert Haas wrote:
> On Fri, Apr 28, 2017 at 6:01 AM, Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru <mailto:k(dot)knizhnik(at)postgrespro(dot)ru>> wrote:
>
>
> Any comments and suggestions for future improvement of this patch
> are welcome.
>
>
> + PG_TRY();
> + {
> + query = parse_analyze_varparams(parse_tree,
> + query_string,
> + &param_types,
> + &num_params);
> + }
> + PG_CATCH();
> + {
> + /*
> + * In case of analyze errors revert back to original
> query processing
> + * and disable autoprepare for this query to avoid such
> problems in future.
> + */
> + FlushErrorState();
> + if (snapshot_set) {
> + PopActiveSnapshot();
> + }
> + entry->disable_autoprepare = true;
> + undo_query_plan_changes(parse_tree, const_param_list);
> + MemoryContextSwitchTo(old_context);
> + return false;
> + }
> + PG_END_TRY();
>
> This is definitely not a safe way of using TRY/CATCH.
>
> +
> + /* Convert literal value to parameter value */
> + switch (const_param->literal->val.type)
> + {
> + /*
> + * Convert from integer literal
> + */
> + case T_Integer:
> + switch (ptype) {
> + case INT8OID:
> + params->params[paramno].value =
> Int64GetDatum((int64)const_param->literal->val.val.ival);
> + break;
> + case INT4OID:
> + params->params[paramno].value =
> Int32GetDatum((int32)const_param->literal->val.val.ival);
> + break;
> + case INT2OID:
> + if (const_param->literal->val.val.ival < SHRT_MIN
> + || const_param->literal->val.val.ival > SHRT_MAX)
> + {
> + ereport(ERROR,
> + (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> + errmsg("smallint out of range")));
> + }
> + params->params[paramno].value =
> Int16GetDatum((int16)const_param->literal->val.val.ival);
> + break;
> + case FLOAT4OID:
> + params->params[paramno].value =
> Float4GetDatum((float)const_param->literal->val.val.ival);
> + break;
> + case FLOAT8OID:
> + params->params[paramno].value =
> Float8GetDatum((double)const_param->literal->val.val.ival);
> + break;
> + case INT4RANGEOID:
> + sprintf(buf, "[%ld,%ld]",
> const_param->literal->val.val.ival, const_param->literal->val.val.ival);
> + getTypeInputInfo(ptype, &typinput, &typioparam);
> + params->params[paramno].value =
> OidInputFunctionCall(typinput, buf, typioparam, -1);
> + break;
> + default:
> + pg_lltoa(const_param->literal->val.val.ival, buf);
> + getTypeInputInfo(ptype, &typinput, &typioparam);
> + params->params[paramno].value =
> OidInputFunctionCall(typinput, buf, typioparam, -1);
> + }
> + break;
> + case T_Null:
> + params->params[paramno].isnull = true;
> + break;
> + default:
> + /*
> + * Convert from string literal
> + */
> + getTypeInputInfo(ptype, &typinput, &typioparam);
> + params->params[paramno].value =
> OidInputFunctionCall(typinput, const_param->literal->val.val.str,
> typioparam, -1);
> + }
>
> I don't see something with a bunch of hard-coded rules for particular
> type OIDs having any chance of being acceptable.
>

Well, what I need is to convert literal value represented in Value
struct to parameter datum value.
Struct "value" contains union with integer literal and text.
So this peace of code is just provides efficient handling of most common
cases (integer parameters) and uses type's input function in other cases.

> This patch seems to duplicate a large amount of existing code. That
> would be a good thing to avoid.

Yes, I have to copy a lot of code from exec_parse_message +
exec_bind_message + exec_execute_message functions.
Definitely copying of code is bad flaw. It will be much better and
easier just to call three original functions instead of mixing gathering
their code into the new function.
But I failed to do it because
1. Autoprepare should be integrated into exec_simple_query. Before
executing query in normal way, I need to perform cache lookup for
previously prepared plan for this generalized query.
And generalization of query requires building of query tree (query
parsing). In other words, parsing should be done before I can call
exec_parse_message.
2. exec_bind_message works with parameters passed by client though
libpwq protocol, while autoprepare deals with values of parameters
extracted from literals.
3. I do not want to generate dummy name for autoprepared query to handle
it as normal prepared statement.
And I can not use unnamed statements because I want lifetime of
autoprepared statements will be larger than one statement.
4. I have to use slightly different memory context policy than named or
unnamed prepared statements.

Also this three exec_* functions contain prolog/epilog code which is
needed because them are serving separate libpq requests.
But in case of autoprepared statements them need to be executed in the
context of single libpq message, so most of this code is redundant.

> It could use a visit from the style police and a spell-checker, too.

I will definitely fix style and misspelling - I have not submitted yet
this patch to commit fest and there is long enough time to next commitfest.
My primary intention of publishing this patch is receive feedback on the
proposed approach.
I already got two very useful advices: limit number of cached statements
and pay more attention to safety.
This is why I have reimplemented my original approach with substituting
string literals with parameters without building parse tree.

Right now I am mostly thinking about two problems:
1. Finding out best criteria of detecting literals which need to be
replaced with parameters and which not. It is clear that replacing
"limit 10" with "limit $10"
will have not so much sense and can cause worse execution plan. So right
now I just ignore sort, group by and limit parts. But may be it is
possible to find some more flexible approach.
2. Which type to chose for parameters. I can try to infer type from
context (current solution), or try to use type of literal.
The problem with first approach is that query compiler is not always
able to do it and even if type can be determined, it may be too generic
(for example numeric instead of real
or range instead of integer). The problem with second approach is
opposite: type inferred from literal type can be too restrictive - quite
often integer literals are used to specify values of floating point
constant. The best solution is first try to determine parameter type
from context and then refine it based on literal type. But it will
require repeat of query analysis.
Not sure if it is possible.

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Thanks for your feedback.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-02 18:26:05
Message-ID: CA+TgmoaCAoSrU9TL8t5J3YP9xp1P-0f5vySy74Rd5kPKLNuqfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 2, 2017 at 5:50 AM, Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>> I don't see something with a bunch of hard-coded rules for particular type
>> OIDs having any chance of being acceptable.
>
> Well, what I need is ...

Regarding this...

> Definitely copying of code is bad flaw. It will be much better and easier
> just to call three original functions instead of mixing gathering their code
> into the new function.
> But I failed to do it because ...

...and also this:

I am sympathetic to the fact that this is a hard problem to solve.
I'm just telling you that the way you've got it is not acceptable and
nobody's going to commit it like that (or if they do, they will end up
having to revert it). If you want to have a technical discussion
about what might be a way to change the patch to be more acceptable,
cool, but I don't want to get into a long debate about whether what
you have is acceptable or not; I've already said what I think about
that and I believe that opinion will be widely shared. I am not
trying to beat you up here, just trying to be clear.

> Thanks for your feedback.

Sure thing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-10 16:11:07
Message-ID: 8eed9c23-19ba-5404-7a9e-0584b836b3f3@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02.05.2017 21:26, Robert Haas wrote:
>
> I am sympathetic to the fact that this is a hard problem to solve.
> I'm just telling you that the way you've got it is not acceptable and
> nobody's going to commit it like that (or if they do, they will end up
> having to revert it). If you want to have a technical discussion
> about what might be a way to change the patch to be more acceptable,
> cool, but I don't want to get into a long debate about whether what
> you have is acceptable or not; I've already said what I think about
> that and I believe that opinion will be widely shared. I am not
> trying to beat you up here, just trying to be clear.
>
>

Based on the Robert's feedback and Tom's proposal I have implemented two
new versions of autoprepare patch.

First version is just refactoring of my original implementation: I have
extracted common code into prepare_cached_plan and exec_prepared_plan
function to avoid code duplication. Also I rewrote assignment of values
to parameters. Now types of parameters are inferred from types of
literals, so there may be several
prepared plans which are different only by types of parameters. Due to
the problem with type coercion for parameters, I have to catch errors in
parse_analyze_varparams.

Robert, can you please explain why using TRY/CATCH is not safe here:
> This is definitely not a safe way of using TRY/CATCH.

Second version is based on Tom's suggestion:
> Personally I'd think about
> replacing the entire literal-with-cast construct with a Param having
> already-known type.
So here I first patch raw parse tree, replacing A_Const with ParamRef.
Such plan is needed to perform cache lookup.
Then I restore original raw parse tree and call pg_analyze_and_rewrite.
Then I mutate analyzed tree, replacing Const with Param nodes.
In this case type coercion is already performed and I know precise types
which should be used for parameters.
It seems to be more sophisticated approach. And I can not extract common
code in prepare_cached_plan,
because preparing of plan is currently mix of steps done in
exec_simple_query and exec_parse_message.
But there is no need to catch analyze errors.

Finally performance of both approaches is the same: at pgbench it is
180k TPS on read-only queries, comparing with 80k TPS for not prepared
queries.
In both cases 7 out of 177 regression tests are not passed (mostly
because session environment is changed between subsequent query execution).

I am going to continue work on this patch I will be glad to receive any
feedback and suggestions for its improvement.
In most cases, applications are not accessing Postgres directly, but
using some connection pooling layer and so them are not able to use
prepared statements.
But at simple OLTP Postgres spent more time on building query plan than
on execution itself. And it is possible to speedup Postgres about two
times at such workload!
Another alternative is true shared plan cache. May be it is even more
perspective approach, but definitely much more invasive and harder to
implement.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare-1.patch text/x-patch 46.7 KB
autoprepare-2.patch text/x-patch 40.6 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-11 15:12:33
Message-ID: 20170511151233.GE17200@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 10, 2017 at 07:11:07PM +0300, Konstantin Knizhnik wrote:
> I am going to continue work on this patch I will be glad to receive any
> feedback and suggestions for its improvement.
> In most cases, applications are not accessing Postgres directly, but using
> some connection pooling layer and so them are not able to use prepared
> statements.
> But at simple OLTP Postgres spent more time on building query plan than on
> execution itself. And it is possible to speedup Postgres about two times at
> such workload!
> Another alternative is true shared plan cache. May be it is even more
> perspective approach, but definitely much more invasive and harder to
> implement.

Can we back up and get an overview of what you are doing and how you are
doing it? Our TODO list suggests this order for successful patches:

Desirability -> Design -> Implement -> Test -> Review -> Commit

You kind of started at the Implementation/patch level, which makes it
hard to evaluate.

I think everyone agrees on the Desirability of the feature, but the
Design is the tricky part. I think the design questions are:

* What information is stored about cached plans?
* How are the cached plans invalidated?
* How is a query matched against a cached plan?

Looking at the options, ideally the plan would be cached at the same
query stage as the stage where the incoming query is checked against the
cache. However, caching and checking at the same level offers no
benefit, so they are going to be different. For example, caching a
parse tree at the time it is created, then checking at the same point if
the incoming query is the same doesn't help you because you already had
to create the parse tree get to that point.

A more concrete example is prepared statements. They are stored at the
end of planning and matched in the parser. However, you can easily do
that since the incoming query specifies the name of the prepared query,
so there is no trick to matching.

The desire is to cache as late as possible so you cache more work and
you have more detail about the referenced objects, which helps with
cache invalidation. However, you also want to do cache matching as
early as possible to improve performance.

So, let's look at some options. One interesting idea from Doug Doole
was to do it between the tokenizer and parser. I think they are glued
together so you would need a way to run the tokenizer separately and
compare that to the tokens you stored for the cached plan. The larger
issue is that prepared plans already are checked after parsing, and we
know they are a win, so matching any earlier than that just seems like
overkill and likely to lead to lots of problems.

So, you could do it after parsing but before parse-analysis, which is
kind of what prepared queries do. One tricky problem is that we don't
bind the query string tokens to database objects until after parse
analysis.

Doing matching before parse-analysis is going to be tricky, which is why
there are so many comments about the approach. Changing search_path can
certainly affect it, but creating objects in earlier-mentioned schemas
can also change how an object reference in a query is resolved. Even
obscure things like the creation of a new operator that has higher
precedence in the query could change the plan, though am not sure if
our prepared query system even handles that properly.

Anyway, that is my feedback. I would like to get an overview of what
you are trying to do and the costs/benefits of each option so we can
best guide you.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


From: Douglas Doole <dougdoole(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-11 17:39:58
Message-ID: CADE5jYJ3g-c6R45cD7KUmfpiOPRzhmf9t820d4cQzmnsnaU6eA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
> One interesting idea from Doug Doole was to do it between the tokenizer
> and parser. I think they are glued together so you would need a way to run
> the tokenizer separately and compare that to the tokens you stored for the
> cached plan.
>

When I did this, we had the same problem that the tokenizer and parser were
tightly coupled. Fortunately, I was able to do as you suggest and run the
tokenizer separately to do my analysis.

So my model was to do statement generalization before entering the compiler
at all. I would tokenize the statement to find the literals and generate a
new statement string with placeholders. The new string would the be passed
to the compiler which would then tokenize and parse the reworked statement.

This means we incurred the cost of tokenizing twice, but the tokenizer was
lightweight enough that it wasn't a problem. In exchange I was able to do
statement generalization without touching the compiler - the compiler saw
the generalized statement text as any other statement and handled it in the
exact same way. (There was just a bit of new code around variable binding.)


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Douglas Doole <dougdoole(at)gmail(dot)com>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-11 18:15:05
Message-ID: 20170511181505.GA25894@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, May 11, 2017 at 05:39:58PM +0000, Douglas Doole wrote:
> One interesting idea from Doug Doole was to do it between the tokenizer and
> parser.  I think they are glued together so you would need a way to run the
> tokenizer separately and compare that to the tokens you stored for the
> cached plan.
>
>
> When I did this, we had the same problem that the tokenizer and parser were
> tightly coupled. Fortunately, I was able to do as you suggest and run the
> tokenizer separately to do my analysis. 
>
> So my model was to do statement generalization before entering the compiler at
> all. I would tokenize the statement to find the literals and generate a new
> statement string with placeholders. The new string would the be passed to the
> compiler which would then tokenize and parse the reworked statement.
>
> This means we incurred the cost of tokenizing twice, but the tokenizer was
> lightweight enough that it wasn't a problem. In exchange I was able to do
> statement generalization without touching the compiler - the compiler saw the
> generalized statement text as any other statement and handled it in the exact
> same way. (There was just a bit of new code around variable binding.)

Good point. I think we need to do some measurements to see if the
parser-only stage is actually significant. I have a hunch that
commercial databases have much heavier parsers than we do.

This split would also not work if the scanner feeds changes back into
the parser. I know C does that for typedefs but I don't think we do.

Ideally I would like to see percentage-of-execution numbers for typical
queries for scan, parse, parse-analysis, plan, and execute to see where
the wins are.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Douglas Doole <dougdoole(at)gmail(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-11 18:31:02
Message-ID: 22696.1494527462@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Good point. I think we need to do some measurements to see if the
> parser-only stage is actually significant. I have a hunch that
> commercial databases have much heavier parsers than we do.

FWIW, gram.y does show up as significant in many of the profiles I take.
I speculate that this is not so much that it eats many CPU cycles, as that
the constant tables are so large as to incur lots of cache misses. scan.l
is not quite as big a deal for some reason, even though it's also large.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>,Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Douglas Doole <dougdoole(at)gmail(dot)com>,Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>,Robert Haas <robertmhaas(at)gmail(dot)com>,Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>,PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-11 18:32:26
Message-ID: D04FA46E-AD8C-418C-8348-C6DC5C49F18D@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On May 11, 2017 11:31:02 AM PDT, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> Good point. I think we need to do some measurements to see if the
>> parser-only stage is actually significant. I have a hunch that
>> commercial databases have much heavier parsers than we do.
>
>FWIW, gram.y does show up as significant in many of the profiles I
>take.
>I speculate that this is not so much that it eats many CPU cycles, as
>that
>the constant tables are so large as to incur lots of cache misses.
>scan.l
>is not quite as big a deal for some reason, even though it's also
>large.

And that there's a lot of unpredictable branches, leading to a lot if pipeline stalls.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-11 19:41:45
Message-ID: 5914BE79.5000502@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/11/2017 06:12 PM, Bruce Momjian wrote:
> On Wed, May 10, 2017 at 07:11:07PM +0300, Konstantin Knizhnik wrote:
>> I am going to continue work on this patch I will be glad to receive any
>> feedback and suggestions for its improvement.
>> In most cases, applications are not accessing Postgres directly, but using
>> some connection pooling layer and so them are not able to use prepared
>> statements.
>> But at simple OLTP Postgres spent more time on building query plan than on
>> execution itself. And it is possible to speedup Postgres about two times at
>> such workload!
>> Another alternative is true shared plan cache. May be it is even more
>> perspective approach, but definitely much more invasive and harder to
>> implement.
> Can we back up and get an overview of what you are doing and how you are
> doing it? Our TODO list suggests this order for successful patches:
>
> Desirability -> Design -> Implement -> Test -> Review -> Commit
>
> You kind of started at the Implementation/patch level, which makes it
> hard to evaluate.
>
> I think everyone agrees on the Desirability of the feature, but the
> Design is the tricky part. I think the design questions are:
>
> * What information is stored about cached plans?
> * How are the cached plans invalidated?
> * How is a query matched against a cached plan?
>
> Looking at the options, ideally the plan would be cached at the same
> query stage as the stage where the incoming query is checked against the
> cache. However, caching and checking at the same level offers no
> benefit, so they are going to be different. For example, caching a
> parse tree at the time it is created, then checking at the same point if
> the incoming query is the same doesn't help you because you already had
> to create the parse tree get to that point.
>
> A more concrete example is prepared statements. They are stored at the
> end of planning and matched in the parser. However, you can easily do
> that since the incoming query specifies the name of the prepared query,
> so there is no trick to matching.
>
> The desire is to cache as late as possible so you cache more work and
> you have more detail about the referenced objects, which helps with
> cache invalidation. However, you also want to do cache matching as
> early as possible to improve performance.
>
> So, let's look at some options. One interesting idea from Doug Doole
> was to do it between the tokenizer and parser. I think they are glued
> together so you would need a way to run the tokenizer separately and
> compare that to the tokens you stored for the cached plan. The larger
> issue is that prepared plans already are checked after parsing, and we
> know they are a win, so matching any earlier than that just seems like
> overkill and likely to lead to lots of problems.
>
> So, you could do it after parsing but before parse-analysis, which is
> kind of what prepared queries do. One tricky problem is that we don't
> bind the query string tokens to database objects until after parse
> analysis.
>
> Doing matching before parse-analysis is going to be tricky, which is why
> there are so many comments about the approach. Changing search_path can
> certainly affect it, but creating objects in earlier-mentioned schemas
> can also change how an object reference in a query is resolved. Even
> obscure things like the creation of a new operator that has higher
> precedence in the query could change the plan, though am not sure if
> our prepared query system even handles that properly.
>
> Anyway, that is my feedback. I would like to get an overview of what
> you are trying to do and the costs/benefits of each option so we can
> best guide you.
>
Sorry, for luck of overview.
I have started with small prototype just to investigate if such optimization makes sense or not.
When I get more than two time advantage in performance on standard pgbench, I come to conclusion that this
optimization can be really very useful and now try to find the best way of its implementation.

I have started with simplest approach when string literals are replaced with parameters. It is done before parsing.
And can be done very fast - just need to locate data in quotes.
But this approach is not safe and universal: you will not be able to speedup most of the existed queries without rewriting them.

This is why I have provided second implementation which replace literals with parameters after raw parsing.
Certainly it is slower than first approach. But still provide significant advantage in performance: more than two times at pgbench.
Then I tried to run regression tests and find several situations where type analysis is not correctly performed in case of replacing literals with parameters.

So my third attempt is to replace constant nodes with parameters in already analyzed tree.

Now answering your questions:

* What information is stored about cached plans?

Key to locate cached plan is raw parse tree. Value is saved CachedPlanSource.

* How are the cached plans invalidated?

In the same way as plans for explicitly prepared statements.

* How is a query matched against a cached plan?

Hash function is calculated for raw parse tree and equal() function is used to compare raw plans with literal nodes replaced with parameters.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Douglas Doole <dougdoole(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-11 19:48:26
Message-ID: 5914C00A.9030908@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/11/2017 09:31 PM, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> Good point. I think we need to do some measurements to see if the
>> parser-only stage is actually significant. I have a hunch that
>> commercial databases have much heavier parsers than we do.
> FWIW, gram.y does show up as significant in many of the profiles I take.
> I speculate that this is not so much that it eats many CPU cycles, as that
> the constant tables are so large as to incur lots of cache misses. scan.l
> is not quite as big a deal for some reason, even though it's also large.
>
> regards, tom lane
Yes, my results shows that pg_parse_query adds not so much overhead:
206k TPS for my first variant with string literal substitution and modified query text used as hash key vs.
181k. TPS for version with patching raw parse tree constructed by pg_parse_query.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Andres Freund <andres(at)anarazel(dot)de>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Douglas Doole <dougdoole(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-11 19:52:26
Message-ID: 20170511195226.4zqwzakiv7gpd64i@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-05-11 22:48:26 +0300, Konstantin Knizhnik wrote:
> On 05/11/2017 09:31 PM, Tom Lane wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > > Good point. I think we need to do some measurements to see if the
> > > parser-only stage is actually significant. I have a hunch that
> > > commercial databases have much heavier parsers than we do.
> > FWIW, gram.y does show up as significant in many of the profiles I take.
> > I speculate that this is not so much that it eats many CPU cycles, as that
> > the constant tables are so large as to incur lots of cache misses. scan.l
> > is not quite as big a deal for some reason, even though it's also large.
> >
> > regards, tom lane
> Yes, my results shows that pg_parse_query adds not so much overhead:
> 206k TPS for my first variant with string literal substitution and modified query text used as hash key vs.
> 181k. TPS for version with patching raw parse tree constructed by pg_parse_query.

Those numbers and your statement seem to contradict each other?

- Andres


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Douglas Doole <dougdoole(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-11 20:30:46
Message-ID: 5914C9F6.9050904@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/11/2017 10:52 PM, Andres Freund wrote:
> On 2017-05-11 22:48:26 +0300, Konstantin Knizhnik wrote:
>> On 05/11/2017 09:31 PM, Tom Lane wrote:
>>> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>>>> Good point. I think we need to do some measurements to see if the
>>>> parser-only stage is actually significant. I have a hunch that
>>>> commercial databases have much heavier parsers than we do.
>>> FWIW, gram.y does show up as significant in many of the profiles I take.
>>> I speculate that this is not so much that it eats many CPU cycles, as that
>>> the constant tables are so large as to incur lots of cache misses. scan.l
>>> is not quite as big a deal for some reason, even though it's also large.
>>>
>>> regards, tom lane
>> Yes, my results shows that pg_parse_query adds not so much overhead:
>> 206k TPS for my first variant with string literal substitution and modified query text used as hash key vs.
>> 181k. TPS for version with patching raw parse tree constructed by pg_parse_query.
> Those numbers and your statement seem to contradict each other?

Oops, my parse error:( I incorrectly read Tom's statement.
Actually, I also was afraid that price of parsing is large enough and this is why my first attempt was to avoid parsing.
But then I find out that most of the time is spent in analyze and planning (see attached profile):

pg_parse_query: 4.23%
pg_analyze_and_rewrite 8.45%
pg_plan_queries: 15.49%

>
> - Andres

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
pgbench.svg image/svg+xml 69.9 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-12 00:58:15
Message-ID: 20170512005815.GA2635@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, May 11, 2017 at 10:41:45PM +0300, Konstantin Knizhnik wrote:
> This is why I have provided second implementation which replace
> literals with parameters after raw parsing. Certainly it is slower
> than first approach. But still provide significant advantage in
> performance: more than two times at pgbench. Then I tried to run
> regression tests and find several situations where type analysis is
> not correctly performed in case of replacing literals with parameters.

So the issue is that per-command output from the parser, SelectStmt,
only has strings for identifers, e.g. table and column names, so you
can't be sure it is the same as the cached entry you matched. I suppose
if you cleared the cache every time someone created an object or changed
search_path, it might work.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-12 07:50:41
Message-ID: 857b4487-5f44-07eb-0aa7-aa152ee8ff98@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12.05.2017 03:58, Bruce Momjian wrote:
> On Thu, May 11, 2017 at 10:41:45PM +0300, Konstantin Knizhnik wrote:
>> This is why I have provided second implementation which replace
>> literals with parameters after raw parsing. Certainly it is slower
>> than first approach. But still provide significant advantage in
>> performance: more than two times at pgbench. Then I tried to run
>> regression tests and find several situations where type analysis is
>> not correctly performed in case of replacing literals with parameters.
> So the issue is that per-command output from the parser, SelectStmt,
> only has strings for identifers, e.g. table and column names, so you
> can't be sure it is the same as the cached entry you matched. I suppose
> if you cleared the cache every time someone created an object or changed
> search_path, it might work.
>
Definitely changing session context (search_path, date/time format, ...)
may cause incorrect behavior of cached statements.
Actually you may get the same problem with explicitly prepared
statements (certainly, in the last case, you better understand what
going on and it is your choice whether to use or not to use prepared
statement).

The fact of failure of 7 regression tests means that autoprepare can
really change behavior of existed program. This is why my suggestion is
to switch off this feature by default.
But in 99.9% real cases (my estimation plucked out of thin air:) there
will be no such problems with autoprepare. And it can significantly
improve performance of OLTP applications
which are not able to use prepared statements (because of working
through pgbouncer or any other reasons).

Can autoprepare slow down the system?
Yes, it can. It can happen if application perform larger number of
unique queries and autoprepare cache size is not limited.
In this case large (and infinitely growing) number of stored plans can
consume a lot of memory and, what is even worse, slowdown cache lookup.
This is why I by default limit number of cached statements
(autoprepare_limit parameter) by 100.

I am almost sure that there will be some other issues with autoprepare
which I have not encountered yet (because I mostly tested it on pgbench
and Postgres regression tests).
But I am also sure that benefit of doubling system performance is good
motivation to continue work in this direction.

My main concern is whether to continue to improve current approach with
local (per-backend) cache of prepared statements.
Or create shared cache (as in Oracle). It is much more difficult to
implement shared cache (the same problem with session context, different
catalog snapshots, cache invalidation,...)
but it also provides more opportunities for queries optimization and
tuning.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-12 15:23:18
Message-ID: 20170512152318.GB2635@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, May 12, 2017 at 10:50:41AM +0300, Konstantin Knizhnik wrote:
> Definitely changing session context (search_path, date/time format, ...) may
> cause incorrect behavior of cached statements.

I wonder if we should clear the cache whenever any SET command is
issued.

> Actually you may get the same problem with explicitly prepared statements
> (certainly, in the last case, you better understand what going on and it is
> your choice whether to use or not to use prepared statement).
>
> The fact of failure of 7 regression tests means that autoprepare can really
> change behavior of existed program. This is why my suggestion is to switch
> off this feature by default.

I would like to see us target something that can be enabled by default.
Even if it only improves performance by 5%, it would be better overall
than a feature that improves performance by 90% but is only used by 1%
of our users.

> But in 99.9% real cases (my estimation plucked out of thin air:) there will
> be no such problems with autoprepare. And it can significantly improve
> performance of OLTP applications
> which are not able to use prepared statements (because of working through
> pgbouncer or any other reasons).

Right, but we can't ship something that is 99.9% accurate when the
inaccuracy is indeterminate. The bottom line is that Postgres has a
very high bar, and I realize you are just prototyping at this point, but
the final product is going to have to address all the intricacies of the
issue for it to be added.

> Can autoprepare slow down the system?
> Yes, it can. It can happen if application perform larger number of unique
> queries and autoprepare cache size is not limited.
> In this case large (and infinitely growing) number of stored plans can
> consume a lot of memory and, what is even worse, slowdown cache lookup.
> This is why I by default limit number of cached statements
> (autoprepare_limit parameter) by 100.

Yes, good idea.

> I am almost sure that there will be some other issues with autoprepare which
> I have not encountered yet (because I mostly tested it on pgbench and
> Postgres regression tests).
> But I am also sure that benefit of doubling system performance is good
> motivation to continue work in this direction.

Right, you are still testing to see where the edges are.

> My main concern is whether to continue to improve current approach with
> local (per-backend) cache of prepared statements.
> Or create shared cache (as in Oracle). It is much more difficult to
> implement shared cache (the same problem with session context, different
> catalog snapshots, cache invalidation,...)
> but it also provides more opportunities for queries optimization and tuning.

I would continue in the per-backend cache direction at this point
because we don't even have that solved yet. The global cache is going
to be even more complex.

Ultimately I think we are going to want global and local caches because
the plans of the local cache are much more likely to be accurate.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-12 17:35:26
Message-ID: 982224ec-ac65-23ee-dd1b-4b8484f88245@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12.05.2017 18:23, Bruce Momjian wrote:
> On Fri, May 12, 2017 at 10:50:41AM +0300, Konstantin Knizhnik wrote:
>> Definitely changing session context (search_path, date/time format, ...) may
>> cause incorrect behavior of cached statements.
> I wonder if we should clear the cache whenever any SET command is
> issued.

Well, with autoprepare cache disabled on each set variable, alter system
and any slow utility statement
only one regression test is not passed. And only because of different
error message context:

*** /home/knizhnik/postgresql.master/src/test/regress/expected/date.out
2017-04-11 18:07:56.497461208 +0300
--- /home/knizhnik/postgresql.master/src/test/regress/results/date.out
2017-05-12 20:21:19.767566302 +0300
***************
*** 1443,1452 ****
--
SELECT EXTRACT(MICROSEC FROM DATE 'infinity'); -- ERROR:
timestamp units "microsec" not recognized
ERROR: timestamp units "microsec" not recognized
- CONTEXT: SQL function "date_part" statement 1
SELECT EXTRACT(UNDEFINED FROM DATE 'infinity'); -- ERROR:
timestamp units "undefined" not supported
ERROR: timestamp units "undefined" not supported
- CONTEXT: SQL function "date_part" statement 1
-- test constructors
select make_date(2013, 7, 15);
make_date
--- 1443,1450 ----

======================================================================

>> Actually you may get the same problem with explicitly prepared statements
>> (certainly, in the last case, you better understand what going on and it is
>> your choice whether to use or not to use prepared statement).
>>
>> The fact of failure of 7 regression tests means that autoprepare can really
>> change behavior of existed program. This is why my suggestion is to switch
>> off this feature by default.
> I would like to see us target something that can be enabled by default.
> Even if it only improves performance by 5%, it would be better overall
> than a feature that improves performance by 90% but is only used by 1%
> of our users.

I have to agree with you here.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Andres Freund <andres(at)anarazel(dot)de>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>,Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>,Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>,PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-12 17:38:26
Message-ID: 9D2E3022-1DCE-45DB-92AF-B2B5C4842992@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On May 12, 2017 12:50:41 AM PDT, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:

>Definitely changing session context (search_path, date/time format,
>...)
>may cause incorrect behavior of cached statements.
>Actually you may get the same problem with explicitly prepared
>statements (certainly, in the last case, you better understand what
>going on and it is your choice whether to use or not to use prepared
>statement).
>
>The fact of failure of 7 regression tests means that autoprepare can
>really change behavior of existed program. This is why my suggestion is
>
>to switch off this feature by default.

I can't see us agreeing with this feature unless it actually reliably works, even if disabled by default.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-12 17:43:44
Message-ID: 20170512174344.GE6721@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, May 12, 2017 at 08:35:26PM +0300, Konstantin Knizhnik wrote:
>
>
> On 12.05.2017 18:23, Bruce Momjian wrote:
> >On Fri, May 12, 2017 at 10:50:41AM +0300, Konstantin Knizhnik wrote:
> >>Definitely changing session context (search_path, date/time format, ...) may
> >>cause incorrect behavior of cached statements.
> >I wonder if we should clear the cache whenever any SET command is
> >issued.
>
> Well, with autoprepare cache disabled on each set variable, alter system and
> any slow utility statement
> only one regression test is not passed. And only because of different error
> message context:

Great. We have to think of how we would keep this reliable when future
changes happen. Simple is often better because it limits the odds of
breakage.

> >>Actually you may get the same problem with explicitly prepared statements
> >>(certainly, in the last case, you better understand what going on and it is
> >>your choice whether to use or not to use prepared statement).
> >>
> >>The fact of failure of 7 regression tests means that autoprepare can really
> >>change behavior of existed program. This is why my suggestion is to switch
> >>off this feature by default.
> >I would like to see us target something that can be enabled by default.
> >Even if it only improves performance by 5%, it would be better overall
> >than a feature that improves performance by 90% but is only used by 1%
> >of our users.
>
> I have to agree with you here.

Good. I know there are database systems that will try to get the most
performance possible no matter how complex it is for users to configure
or to maintain, but that isn't us.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-15 15:31:46
Message-ID: CA+TgmoaAY+nSoRaiosu_yey89pg2pXw_cBt4r=x=yXxh5LBAzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 10, 2017 at 12:11 PM, Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> Robert, can you please explain why using TRY/CATCH is not safe here:
>>
>> This is definitely not a safe way of using TRY/CATCH.

This has been discussed many, many times on this mailing list before,
and I don't really want to go over it again here. We really need a
README or some documentation about this so that we don't have to keep
answering this same question again and again.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-18 08:57:57
Message-ID: 8ad33057-c29c-3f96-1676-f29ee2873c91@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15.05.2017 18:31, Robert Haas wrote:
> On Wed, May 10, 2017 at 12:11 PM, Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>> Robert, can you please explain why using TRY/CATCH is not safe here:
>>> This is definitely not a safe way of using TRY/CATCH.
> This has been discussed many, many times on this mailing list before,
> and I don't really want to go over it again here. We really need a
> README or some documentation about this so that we don't have to keep
> answering this same question again and again.
>
First of all I want to notice that new version of my patch is not using
PG_TRY/PG_CATCH.
But I still want to clarify for myself whats wrong with this constructions.
I searched both hackers mailing list archive and world-wide using google
but failed to find any references except of
sharing non-volatilie variables between try and catch blocks.
Can you please point me at the thread where this problem was discussed
or just explain in few words the source of the problem?

From my own experience I found out that PG_TRY/PG_CATCH mechanism is
not providing proper cleanup (unlike C++ exceptions).
If there are opened relations, catalog cache entries,... then throwing
error will not release them.
It will cause no problems if error is handled in PostgresMain which
aborts current transaction and releases all resources in any case.
But if I want to ignore this error and continue query execution, then
warnings about resources leaks can be reported.
Is it want you mean by unsafety of PG_TRY/PG_CATCH constructions?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Andres Freund <andres(at)anarazel(dot)de>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-05-18 16:47:17
Message-ID: 20170518164717.jotsex4wilkzwyg5@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-05-18 11:57:57 +0300, Konstantin Knizhnik wrote:
> From my own experience I found out that PG_TRY/PG_CATCH mechanism is not
> providing proper cleanup (unlike C++ exceptions).

Right, simply because there's no portable way to transparently do so.
Would be possible on elf glibc platforms, but ...

> If there are opened relations, catalog cache entries,... then throwing error
> will not release them.
> It will cause no problems if error is handled in PostgresMain which aborts
> current transaction and releases all resources in any case.
> But if I want to ignore this error and continue query execution, then
> warnings about resources leaks can be reported.
> Is it want you mean by unsafety of PG_TRY/PG_CATCH constructions?

There's worse than just leaking resources. Everything touching the
database might cause persistent corruption if you don't roll back.
Consider an insert that failed with a foreign key exception, done from
some function. If you ignore that error, the row will still be visible,
but the foreign key will be violated. If you want to continue after a
PG_CATCH you have to use a subtransaction/savepoint for the PG_TRY
contents, like several PLs do.

- Andres


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cached plans and statement generalization
Date: 2017-05-25 15:54:45
Message-ID: 283dfb96-48c2-1a87-90ab-97ca87b237fe@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10.05.2017 19:11, Konstantin Knizhnik wrote:
>
> Based on the Robert's feedback and Tom's proposal I have implemented
> two new versions of autoprepare patch.
>
> First version is just refactoring of my original implementation: I
> have extracted common code into prepare_cached_plan and
> exec_prepared_plan
> function to avoid code duplication. Also I rewrote assignment of
> values to parameters. Now types of parameters are inferred from types
> of literals, so there may be several
> prepared plans which are different only by types of parameters. Due to
> the problem with type coercion for parameters, I have to catch errors
> in parse_analyze_varparams.
>

Attached please find rebased version of the autoprepare patch based on
Tom's proposal (perform analyze for tree with constant literals and then
replace them with parameters).
Also I submitted this patch for the Autum commitfest.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare.patch text/x-patch 42.8 KB

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-09-09 03:35:40
Message-ID: CAEepm=2rvscv5kQppLsO6SGzEpKD6v+WB7ikO+uGniQXqQcRjQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, May 26, 2017 at 3:54 AM, Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> Attached please find rebased version of the autoprepare patch based on Tom's
> proposal (perform analyze for tree with constant literals and then replace
> them with parameters).
> Also I submitted this patch for the Autum commitfest.

The patch didn't survive the Summer bitrotfest. Could you please rebase it?

--
Thomas Munro
http://www.enterprisedb.com


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached plans and statement generalization
Date: 2017-09-11 09:24:45
Message-ID: b542f0ee-02be-01cc-d8ee-af1a5a1718c3@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09.09.2017 06:35, Thomas Munro wrote:
> On Fri, May 26, 2017 at 3:54 AM, Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>> Attached please find rebased version of the autoprepare patch based on Tom's
>> proposal (perform analyze for tree with constant literals and then replace
>> them with parameters).
>> Also I submitted this patch for the Autum commitfest.
> The patch didn't survive the Summer bitrotfest. Could you please rebase it?
>
Attached please find rebased version of the patch.
There are the updated performance results (pgbench -s 100 -c 1):

protocol (-M)
read-write
read-only (-S)
simple
3327
19325
extended
2256
16908
prepared
6145
39326
simple+autoprepare
4950
34841

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare-3.patch text/x-patch 42.7 KB

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cached plans and statement generalization
Date: 2017-09-12 17:11:27
Message-ID: e6ced412-75ee-b4a8-d61d-68fed2cfe8cc@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11.09.2017 12:24, Konstantin Knizhnik wrote:
> Attached please find rebased version of the patch.
> There are the updated performance results (pgbench -s 100 -c 1):
>
> protocol (-M)
> read-write
> read-only (-S)
> simple
> 3327
> 19325
> extended
> 2256
> 16908
> prepared
> 6145
> 39326
> simple+autoprepare
> 4950
> 34841
>
>

One more patch passing all regression tests with autoprepare_threshold=1.
I still do not think that it should be switch on by default...

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare-4.patch text/x-patch 71.6 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2017-11-30 01:59:12
Message-ID: CAB7nPqS+2ZRcZZGMJOAYON2h_GOL8G2QhjmbZT__pPuZ8feA0A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Sep 13, 2017 at 2:11 AM, Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> One more patch passing all regression tests with autoprepare_threshold=1.
> I still do not think that it should be switch on by default...

This patch does not apply, and did not get any reviews. So I am moving
it to next CF with waiting on author as status. Please provide a
rebased version. Tsunakawa-san, you are listed as a reviewer of this
patch. If you are not planning to look at it anymore, you may want to
remove your name from the related CF entry
https://commitfest.postgresql.org/16/1150/.
--
Michael


From: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: 'Michael Paquier' <michael(dot)paquier(at)gmail(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: [HACKERS] Cached plans and statement generalization
Date: 2017-11-30 02:07:11
Message-ID: 0A3221C70F24FB45833433255569204D1F8424FC@G01JPEXMBYT05
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

From: Michael Paquier [mailto:michael(dot)paquier(at)gmail(dot)com]
> This patch does not apply, and did not get any reviews. So I am moving it
> to next CF with waiting on author as status. Please provide a rebased version.
> Tsunakawa-san, you are listed as a reviewer of this patch. If you are not
> planning to look at it anymore, you may want to remove your name from the
> related CF entry https://commitfest.postgresql.org/16/1150/.

Sorry, but I'm planning to review this next month because this proposal could alleviate some problem we faced.

Regards
Takayuki Tsunakawa


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2017-11-30 02:10:26
Message-ID: CAB7nPqQAa0YPqacRBkhcVKNkNPVj7haJMKW3eYHhnOuYDhM3Bg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Nov 30, 2017 at 11:07 AM, Tsunakawa, Takayuki
<tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> wrote:
> From: Michael Paquier [mailto:michael(dot)paquier(at)gmail(dot)com]
>> This patch does not apply, and did not get any reviews. So I am moving it
>> to next CF with waiting on author as status. Please provide a rebased version.
>> Tsunakawa-san, you are listed as a reviewer of this patch. If you are not
>> planning to look at it anymore, you may want to remove your name from the
>> related CF entry https://commitfest.postgresql.org/16/1150/.
>
> Sorry, but I'm planning to review this next month because this proposal could alleviate some problem we faced.

No problem. Thanks for the update.
--
Michael


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2017-12-04 15:46:07
Message-ID: a47db8e0-16c0-32ef-60b1-6ee3ebac5f3e@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 30.11.2017 04:59, Michael Paquier wrote:
> On Wed, Sep 13, 2017 at 2:11 AM, Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>> One more patch passing all regression tests with autoprepare_threshold=1.
>> I still do not think that it should be switch on by default...
> This patch does not apply, and did not get any reviews. So I am moving
> it to next CF with waiting on author as status. Please provide a
> rebased version. Tsunakawa-san, you are listed as a reviewer of this
> patch. If you are not planning to look at it anymore, you may want to
> remove your name from the related CF entry
> https://commitfest.postgresql.org/16/1150/.

Updated version of the patch is attached.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare-5.patch text/x-patch 71.8 KB

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2018-01-06 22:51:18
Message-ID: 20180106225118.GS2416@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greetings,

* Konstantin Knizhnik (k(dot)knizhnik(at)postgrespro(dot)ru) wrote:
> On 30.11.2017 04:59, Michael Paquier wrote:
> >On Wed, Sep 13, 2017 at 2:11 AM, Konstantin Knizhnik
> ><k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> >>One more patch passing all regression tests with autoprepare_threshold=1.
> >>I still do not think that it should be switch on by default...
> >This patch does not apply, and did not get any reviews. So I am moving
> >it to next CF with waiting on author as status. Please provide a
> >rebased version. Tsunakawa-san, you are listed as a reviewer of this
> >patch. If you are not planning to look at it anymore, you may want to
> >remove your name from the related CF entry
> >https://commitfest.postgresql.org/16/1150/.
>
> Updated version of the patch is attached.

This patch appears to apply with just a bit of fuzz and make check
passes, so I'm not sure why this is currently marked as 'Waiting for
author'.

I've updated it to be 'Needs review'. If that's incorrect, feel free to
change it back with an explanation.

Thanks!

Stephen


From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2018-01-12 00:40:57
Message-ID: CAEepm=1MyFoM7VS+ed3BS3PWvSz_oU_3_uMESdQCozku1+K1SQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jan 7, 2018 at 11:51 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Konstantin Knizhnik (k(dot)knizhnik(at)postgrespro(dot)ru) wrote:
>> Updated version of the patch is attached.
>
> This patch appears to apply with just a bit of fuzz and make check
> passes, so I'm not sure why this is currently marked as 'Waiting for
> author'.
>
> I've updated it to be 'Needs review'. If that's incorrect, feel free to
> change it back with an explanation.

Hi Konstantin,

/home/travis/build/postgresql-cfbot/postgresql/src/backend/tcop/postgres.c:5249:
undefined reference to `PortalGetHeapMemory'

That's because commit 0f7c49e85518dd846ccd0a044d49a922b9132983 killed
PortalGetHeapMemory. Looks like it needs to be replaced with
portal->portalContext.

--
Thomas Munro
http://www.enterprisedb.com


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2018-01-12 12:53:11
Message-ID: 20638bbb-2744-a4ba-116d-084e9c6520bf@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12.01.2018 03:40, Thomas Munro wrote:
> On Sun, Jan 7, 2018 at 11:51 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> * Konstantin Knizhnik (k(dot)knizhnik(at)postgrespro(dot)ru) wrote:
>>> Updated version of the patch is attached.
>> This patch appears to apply with just a bit of fuzz and make check
>> passes, so I'm not sure why this is currently marked as 'Waiting for
>> author'.
>>
>> I've updated it to be 'Needs review'. If that's incorrect, feel free to
>> change it back with an explanation.
> Hi Konstantin,
>
> /home/travis/build/postgresql-cfbot/postgresql/src/backend/tcop/postgres.c:5249:
> undefined reference to `PortalGetHeapMemory'
>
> That's because commit 0f7c49e85518dd846ccd0a044d49a922b9132983 killed
> PortalGetHeapMemory. Looks like it needs to be replaced with
> portal->portalContext.
>
Hi  Thomas,

Thank you very much for reporting the problem.
Rebased version of the patch is attached.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare-6.patch text/x-patch 71.8 KB

From: David Steele <david(at)pgmasters(dot)net>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Subject: Re: Re: [HACKERS] Cached plans and statement generalization
Date: 2018-03-02 14:26:25
Message-ID: be239334-8570-fc17-d32d-ffcb39923b58@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Konstantin,

On 1/12/18 7:53 AM, Konstantin Knizhnik wrote:
>
>
> On 12.01.2018 03:40, Thomas Munro wrote:
>> On Sun, Jan 7, 2018 at 11:51 AM, Stephen Frost <sfrost(at)snowman(dot)net>
>> wrote:
>>> * Konstantin Knizhnik (k(dot)knizhnik(at)postgrespro(dot)ru) wrote:
>>>> Updated version of the patch is attached.
>>> This patch appears to apply with just a bit of fuzz and make check
>>> passes, so I'm not sure why this is currently marked as 'Waiting for
>>> author'.
>>>
>>> I've updated it to be 'Needs review'.  If that's incorrect, feel free to
>>> change it back with an explanation.
>> Hi Konstantin,
>>
>> /home/travis/build/postgresql-cfbot/postgresql/src/backend/tcop/postgres.c:5249:
>>
>> undefined reference to `PortalGetHeapMemory'
>>
>> That's because commit 0f7c49e85518dd846ccd0a044d49a922b9132983 killed
>> PortalGetHeapMemory.  Looks like it needs to be replaced with
>> portal->portalContext.
>>
> Hi  Thomas,
>
> Thank you very much for reporting the problem.
> Rebased version of the patch is attached.

This patch has received no review or comments since last May and appears
too complex and invasive for the final CF of PG11.

I don't think it makes sense to keep pushing a patch through CFs when it
is not getting reviewed. I'm planning to mark this as Returned with
Feedback unless there are solid arguments to the contrary.

Thanks,
--
-David
david(at)pgmasters(dot)net


From: David Steele <david(at)pgmasters(dot)net>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Subject: Re: Re: Re: [HACKERS] Cached plans and statement generalization
Date: 2018-03-06 15:07:39
Message-ID: ca3fee0c-3833-c2a9-6ab7-e3e395518139@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3/2/18 9:26 AM, David Steele wrote:
> On 1/12/18 7:53 AM, Konstantin Knizhnik wrote:
>>
>>
>> On 12.01.2018 03:40, Thomas Munro wrote:
>>> On Sun, Jan 7, 2018 at 11:51 AM, Stephen Frost <sfrost(at)snowman(dot)net>
>>> wrote:
>>>> * Konstantin Knizhnik (k(dot)knizhnik(at)postgrespro(dot)ru) wrote:
>>>>> Updated version of the patch is attached.
>>>> This patch appears to apply with just a bit of fuzz and make check
>>>> passes, so I'm not sure why this is currently marked as 'Waiting for
>>>> author'.
>>>>
>>>> I've updated it to be 'Needs review'.  If that's incorrect, feel free to
>>>> change it back with an explanation.
>>> Hi Konstantin,
>>>
>>> /home/travis/build/postgresql-cfbot/postgresql/src/backend/tcop/postgres.c:5249:
>>>
>>> undefined reference to `PortalGetHeapMemory'
>>>
>>> That's because commit 0f7c49e85518dd846ccd0a044d49a922b9132983 killed
>>> PortalGetHeapMemory.  Looks like it needs to be replaced with
>>> portal->portalContext.
>>>
>> Hi  Thomas,
>>
>> Thank you very much for reporting the problem.
>> Rebased version of the patch is attached.
>
> This patch has received no review or comments since last May and appears
> too complex and invasive for the final CF of PG11.
>
> I don't think it makes sense to keep pushing a patch through CFs when it
> is not getting reviewed. I'm planning to mark this as Returned with
> Feedback unless there are solid arguments to the contrary.

Marked as Returned with Feedback.

Regards,
--
-David
david(at)pgmasters(dot)net


From: "Yamaji, Ryo" <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: [HACKERS] Cached plans and statement generalization
Date: 2018-07-31 09:12:21
Message-ID: 9A6E5062D5D4DB458C80C2B2920BD71D5C2FA8@g01jpexmbkw23
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> -----Original Message-----
> From: Konstantin Knizhnik [mailto:k(dot)knizhnik(at)postgrespro(dot)ru]
> Sent: Friday, January 12, 2018 9:53 PM
> To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>; Stephen Frost
> <sfrost(at)snowman(dot)net>
> Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>; PostgreSQL mailing
> lists <pgsql-hackers(at)postgresql(dot)org>; Tsunakawa, Takayuki/綱川 貴之
> <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
> Subject: Re: [HACKERS] Cached plans and statement generalization
>
> Thank you very much for reporting the problem.
> Rebased version of the patch is attached.

Hi Konstantin.

I think that this patch excel very much. Because the customer of our
company has the demand that migrates from other DB to PostgreSQL, and
the problem to have to modify the application program to do prepare in
that case occurs. It is possible to solve it by the problem's using this
patch. I want to be helping this patch to be committed. Will you participate
in the following CF?

To review this patch, I verified it. The verified environment is
PostgreSQL 11beta2. It is necessary to add "executor/spi.h" and "jit/jit.h"
to postgres.c of the patch by the updating of PostgreSQL. Please rebase.

1. I confirmed the influence on the performance by having applied this patch.
The result showed the tendency similar to Konstantin.
-s:100 -c:8 -t: 10000 read-only
simple: 20251 TPS
prepare: 29502 TPS
simple(autoprepare): 28001 TPS

2. I confirmed the influence on the memory utilization by the length of query that did
autoprepare. Short queries have 1 constant. Long queries have 100 constants.
This result was shown that preparing long query used the memory more.
before prepare:plan cache context: 1032 used
prepare 10 short query statement:plan cache context: 15664 used
prepare 10 long query statement:plan cache context: 558032 used

In this patch, the maximum number of query that can do prepare can be set to autoprepare_limit.
However, is it good in this? I think that I can assume the scene in the following.
- Application side user: To elicit the performance, they want to specify the number of prepared
query.
- Operation side user: To prevent the memory from overflowing, they want to set the maximum value
of the memory utilization.
Therefore, I propose to add the parameter to specify the maximum memory utilization.

3. I confirmed the transition of the amount of the memory when it tried to prepare query
of the number that exceeded the value specified for autoprepare_limit.
[autoprepare_limit=1 and execute 10 different queries]
plan cache context: 1032 used
plan cache context: 39832 used
plan cache context: 78552 used
plan cache context: 117272 used
plan cache context: 155952 used
plan cache context: 194632 used
plan cache context: 233312 used
plan cache context: 272032 used
plan cache context: 310712 used
plan cache context: 349392 used
plan cache context: 388072 used

I feel the doubt in an increase of the memory utilization when I execute a lot of
query though cached query is one (autoprepare_limit=1).
This behavior is correct?

Best regards, Yamaji


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: "Yamaji, Ryo" <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2018-07-31 21:30:27
Message-ID: e9cc0486-4a64-2fc8-3eb7-58d92f5a0604@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Yamaji,

On 31.07.2018 12:12, Yamaji, Ryo wrote:
>> -----Original Message-----
>> From: Konstantin Knizhnik [mailto:k(dot)knizhnik(at)postgrespro(dot)ru]
>> Sent: Friday, January 12, 2018 9:53 PM
>> To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>; Stephen Frost
>> <sfrost(at)snowman(dot)net>
>> Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>; PostgreSQL mailing
>> lists <pgsql-hackers(at)postgresql(dot)org>; Tsunakawa, Takayuki/綱川 貴之
>> <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
>> Subject: Re: [HACKERS] Cached plans and statement generalization
>>
>> Thank you very much for reporting the problem.
>> Rebased version of the patch is attached.
> Hi Konstantin.
>
> I think that this patch excel very much. Because the customer of our
> company has the demand that migrates from other DB to PostgreSQL, and
> the problem to have to modify the application program to do prepare in
> that case occurs. It is possible to solve it by the problem's using this
> patch. I want to be helping this patch to be committed. Will you participate
> in the following CF?

This patch will be included in next release of PgPro EE.
Concerning next commit fest - I am not sure.
At previous commitfest it was returned with feedback that it "has
received no review or comments since last May".
May be your review will help to change this situation.

>
> To review this patch, I verified it. The verified environment is
> PostgreSQL 11beta2. It is necessary to add "executor/spi.h" and "jit/jit.h"
> to postgres.c of the patch by the updating of PostgreSQL. Please rebase.
>
> 1. I confirmed the influence on the performance by having applied this patch.
> The result showed the tendency similar to Konstantin.
> -s:100 -c:8 -t: 10000 read-only
> simple: 20251 TPS
> prepare: 29502 TPS
> simple(autoprepare): 28001 TPS
>
> 2. I confirmed the influence on the memory utilization by the length of query that did
> autoprepare. Short queries have 1 constant. Long queries have 100 constants.
> This result was shown that preparing long query used the memory more.
> before prepare:plan cache context: 1032 used
> prepare 10 short query statement:plan cache context: 15664 used
> prepare 10 long query statement:plan cache context: 558032 used
>
> In this patch, the maximum number of query that can do prepare can be set to autoprepare_limit.
> However, is it good in this? I think that I can assume the scene in the following.
> - Application side user: To elicit the performance, they want to specify the number of prepared
> query.
> - Operation side user: To prevent the memory from overflowing, they want to set the maximum value
> of the memory utilization.
> Therefore, I propose to add the parameter to specify the maximum memory utilization.

I agree it may be more useful to limit amount of memory used by prepare
queries, rather than number of prepared statements.
But it is just more difficult to calculate and maintain (I am not sure
that just looking at CacheMemoryContext is enough for it).
Also, if working set of queries (frequently repeated queries) doesn't
fir in memory, then autoprepare will be almost useless (because with
high probability
prepared query will be thrown away from the cache before it can be
reused). So limiting queries from "application side" seems to be more
practical.

> 3. I confirmed the transition of the amount of the memory when it tried to prepare query
> of the number that exceeded the value specified for autoprepare_limit.
> [autoprepare_limit=1 and execute 10 different queries]
> plan cache context: 1032 used
> plan cache context: 39832 used
> plan cache context: 78552 used
> plan cache context: 117272 used
> plan cache context: 155952 used
> plan cache context: 194632 used
> plan cache context: 233312 used
> plan cache context: 272032 used
> plan cache context: 310712 used
> plan cache context: 349392 used
> plan cache context: 388072 used
>
> I feel the doubt in an increase of the memory utilization when I execute a lot of
> query though cached query is one (autoprepare_limit=1).
> This behavior is correct?

I will check it.


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: "Yamaji, Ryo" <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2018-08-01 07:32:14
Message-ID: d54a4291-6b80-f413-8be9-d4b13aef08a6@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01.08.2018 00:30, Konstantin Knizhnik wrote:
> Hi Yamaji,
>
>
> On 31.07.2018 12:12, Yamaji, Ryo wrote:
>>> -----Original Message-----
>>> From: Konstantin Knizhnik [mailto:k(dot)knizhnik(at)postgrespro(dot)ru]
>>> Sent: Friday, January 12, 2018 9:53 PM
>>> To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>; Stephen Frost
>>> <sfrost(at)snowman(dot)net>
>>> Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>; PostgreSQL mailing
>>> lists <pgsql-hackers(at)postgresql(dot)org>; Tsunakawa, Takayuki/綱川 貴之
>>> <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
>>> Subject: Re: [HACKERS] Cached plans and statement generalization
>>>
>>> Thank you very much for reporting the problem.
>>> Rebased version of the patch is attached.
>> Hi Konstantin.
>>
>> I think that this patch excel very much. Because the customer of our
>> company has the demand that migrates from other DB to PostgreSQL, and
>> the problem to have to modify the application program to do prepare in
>> that case occurs. It is possible to solve it by the problem's using this
>> patch. I want to be helping this patch to be committed. Will you
>> participate
>> in the following CF?
>
> This patch will be included in next release of PgPro EE.
> Concerning next commit fest - I am not sure.
> At previous commitfest it was returned with feedback that it "has
> received no review or comments since last May".
> May be your review will help to change this situation.
>
>
>>
>> To review this patch, I verified it. The verified environment is
>> PostgreSQL 11beta2. It is necessary to add "executor/spi.h" and
>> "jit/jit.h"
>> to postgres.c of the patch by the updating of PostgreSQL. Please rebase.

New version of the patch is attached.

Attachment Content-Type Size
autoprepare-9.patch text/x-patch 88.3 KB

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: "Yamaji, Ryo" <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2018-08-01 07:53:03
Message-ID: 157e01d2-30de-0f61-bfa0-d9e1d7dbe66e@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 31.07.2018 12:12, Yamaji, Ryo wrote:
>
> 3. I confirmed the transition of the amount of the memory when it tried to prepare query
> of the number that exceeded the value specified for autoprepare_limit.
> [autoprepare_limit=1 and execute 10 different queries]
> plan cache context: 1032 used
> plan cache context: 39832 used
> plan cache context: 78552 used
> plan cache context: 117272 used
> plan cache context: 155952 used
> plan cache context: 194632 used
> plan cache context: 233312 used
> plan cache context: 272032 used
> plan cache context: 310712 used
> plan cache context: 349392 used
> plan cache context: 388072 used
>
> I feel the doubt in an increase of the memory utilization when I execute a lot of
> query though cached query is one (autoprepare_limit=1).
> This behavior is correct?

I failed to reproduce the problem.
I used the following non-default configuration parameters:

autoprepare_limit=1
autoprepare_threshold=1

create dummy database:

create table foo(x integer primary key, y integer);
insert into foo values (generate_series(1,10000), 0);

and run different queries,  like:

postgres=# select *  from foo where x=1;
postgres=# select *  from foo where x+x=1;
postgres=# select *  from foo where x+x+x=1;
postgres=# select *  from foo where x+x+x+x=1;
...

and check size of CacheMemoryContext using gdb - it is not increased.

Can you please send me your test?


From: "Yamaji, Ryo" <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: [HACKERS] Cached plans and statement generalization
Date: 2018-08-02 05:25:53
Message-ID: 9A6E5062D5D4DB458C80C2B2920BD71D5C471C@g01jpexmbkw23
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> -----Original Message-----
> From: Konstantin Knizhnik [mailto:k(dot)knizhnik(at)postgrespro(dot)ru]
> Sent: Wednesday, August 1, 2018 4:53 PM
> To: Yamaji, Ryo/山地 亮 <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
> Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
> Subject: Re: [HACKERS] Cached plans and statement generalization
>
> I failed to reproduce the problem.
> I used the following non-default configuration parameters:
>
> autoprepare_limit=1
> autoprepare_threshold=1
>
> create dummy database:
>
> create table foo(x integer primary key, y integer); insert into foo values
> (generate_series(1,10000), 0);
>
> and run different queries,  like:
>
> postgres=# select *  from foo where x=1; postgres=# select *  from foo
> where x+x=1; postgres=# select *  from foo where x+x+x=1; postgres=#
> select *  from foo where x+x+x+x=1; ...
>
> and check size of CacheMemoryContext using gdb - it is not increased.
>
> Can you please send me your test?

I checked not CacheMemoryContext but "plan cache context".
Because I think that the memory context that autoprepare mainly uses is "plan cache context".

Non-default configuration parameter was used as well as Konstantin.
autoprepare_limit=1
autoprepare_threshold=1

The procedure of the test that I did is shown below.

create dummy database
create table test (key1 integer, key2 integer, ... , key100 integer);
insert into test values (1, 2, ... , 100);

And, different queries are executed.
select key1 from test where key1=1 and key2=2 and ... and key100=100;
select key2 from test where key1=1 and key2=2 and ... and key100=100;
select key3 from test where key1=1 and key2=2 and ... and key100=100;...

And, "plan cache context" was confirmed by using gdb.


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: "Yamaji, Ryo" <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2018-08-02 22:02:19
Message-ID: b34902a7-c87c-e6ba-032e-bdf67c56e774@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02.08.2018 08:25, Yamaji, Ryo wrote:
>> -----Original Message-----
>> From: Konstantin Knizhnik [mailto:k(dot)knizhnik(at)postgrespro(dot)ru]
>> Sent: Wednesday, August 1, 2018 4:53 PM
>> To: Yamaji, Ryo/山地 亮 <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
>> Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
>> Subject: Re: [HACKERS] Cached plans and statement generalization
>>
>> I failed to reproduce the problem.
>> I used the following non-default configuration parameters:
>>
>> autoprepare_limit=1
>> autoprepare_threshold=1
>>
>> create dummy database:
>>
>> create table foo(x integer primary key, y integer); insert into foo values
>> (generate_series(1,10000), 0);
>>
>> and run different queries,  like:
>>
>> postgres=# select *  from foo where x=1; postgres=# select *  from foo
>> where x+x=1; postgres=# select *  from foo where x+x+x=1; postgres=#
>> select *  from foo where x+x+x+x=1; ...
>>
>> and check size of CacheMemoryContext using gdb - it is not increased.
>>
>> Can you please send me your test?
>
> I checked not CacheMemoryContext but "plan cache context".
> Because I think that the memory context that autoprepare mainly uses is "plan cache context".
>
> Non-default configuration parameter was used as well as Konstantin.
> autoprepare_limit=1
> autoprepare_threshold=1
>
> The procedure of the test that I did is shown below.
>
> create dummy database
> create table test (key1 integer, key2 integer, ... , key100 integer);
> insert into test values (1, 2, ... , 100);
>
> And, different queries are executed.
> select key1 from test where key1=1 and key2=2 and ... and key100=100;
> select key2 from test where key1=1 and key2=2 and ... and key100=100;
> select key3 from test where key1=1 and key2=2 and ... and key100=100;...
>
> And, "plan cache context" was confirmed by using gdb.
>
>

Thank you.
Unfortunately expression_tree_walker is not consistent with copyObject:
I tried to use this walker to destroy raw parse tree of autoprepared
statement, but looks
like some nodes are not visited by expression_tree_walker. I have to
create separate memory context for each autoprepared statement.
New version  of the patch is attached.

Attachment Content-Type Size
autoprepare-10.patch text/x-patch 87.9 KB

From: "Yamaji, Ryo" <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: [HACKERS] Cached plans and statement generalization
Date: 2018-08-07 10:02:20
Message-ID: 9A6E5062D5D4DB458C80C2B2920BD71D5C5111@g01jpexmbkw23
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> -----Original Message-----
> From: Konstantin Knizhnik [mailto:k(dot)knizhnik(at)postgrespro(dot)ru]
> Sent: Friday, August 3, 2018 7:02 AM
> To: Yamaji, Ryo/山地 亮 <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
> Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
> Subject: Re: [HACKERS] Cached plans and statement generalization
>
> Thank you.
> Unfortunately expression_tree_walker is not consistent with copyObject:
> I tried to use this walker to destroy raw parse tree of autoprepared
> statement, but looks like some nodes are not visited by
> expression_tree_walker. I have to create separate memory context for each
> autoprepared statement.
> New version  of the patch is attached.

Thank you for attaching the patch.
The improvement to plan cache context by the new patch was confirmed.

> This patch will be included in next release of PgPro EE.
> Concerning next commit fest - I am not sure.
> At previous commitfest it was returned with feedback that it "has received no review or comments since last May".
> May be your review will help to change this situation.

I want to confirm one point.
If I will have reviewed the autoprepare patch, then are you ready to register
the patch at commit fest in the near future? I fear that autoprepare patch do
not registered at commit fest in the future (for example, you are so busy), and
do not applied to PostgreSQL. If you are not ready to register the patch, I think
I want to register at commit fest instead of you.

> I agree it may be more useful to limit amount of memory used by prepare
> queries, rather than number of prepared statements.
> But it is just more difficult to calculate and maintain (I am not sure
> that just looking at CacheMemoryContext is enough for it).
> Also, if working set of queries (frequently repeated queries) doesn't
> fir in memory, then autoprepare will be almost useless (because with
> high probability
> prepared query will be thrown away from the cache before it can be
> reused). So limiting queries from "application side" seems to be more
> practical.

I see. But I fear that autoprepare process uses irregularity amount of memory
when autoprepare_limit is specified number of prepared statements. I think
that there is scene that autoprepare process use a lot of memory (ex. it
need to prepare a lot of long queries), then other processes (ex. other
backend process in PostgreSQL or process other than PostgreSQL) cannot use
memory. I hope to specify limit amount of memory in the future.


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: "Yamaji, Ryo" <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2018-08-07 12:36:16
Message-ID: 3a054e51-448e-a66d-074e-099a3f3f62b2@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07.08.2018 13:02, Yamaji, Ryo wrote:
>
> I want to confirm one point.
> If I will have reviewed the autoprepare patch, then are you ready to register
> the patch at commit fest in the near future? I fear that autoprepare patch do
> not registered at commit fest in the future (for example, you are so busy), and
> do not applied to PostgreSQL. If you are not ready to register the patch, I think
> I want to register at commit fest instead of you.

I have registered the patch for next commitfest.
For some reasons it doesn't find the latest autoprepare-10.patch and
still refer to autoprepare-6.patch as the latest attachement.

>
>
>> I agree it may be more useful to limit amount of memory used by prepare
>> queries, rather than number of prepared statements.
>> But it is just more difficult to calculate and maintain (I am not sure
>> that just looking at CacheMemoryContext is enough for it).
>> Also, if working set of queries (frequently repeated queries) doesn't
>> fir in memory, then autoprepare will be almost useless (because with
>> high probability
>> prepared query will be thrown away from the cache before it can be
>> reused). So limiting queries from "application side" seems to be more
>> practical.
> I see. But I fear that autoprepare process uses irregularity amount of memory
> when autoprepare_limit is specified number of prepared statements. I think
> that there is scene that autoprepare process use a lot of memory (ex. it
> need to prepare a lot of long queries), then other processes (ex. other
> backend process in PostgreSQL or process other than PostgreSQL) cannot use
> memory. I hope to specify limit amount of memory in the future.

Right now each prepared statement has two memory contexts: one for raw
parse tree used as hash table key and another for cached plan itself.
May be it will be possible to combine them. To calculate memory consumed
by cached plans, it will be necessary to calculate memory usage
statistic for all this contexts (which requires traversal of all
context's chunks) and sum them. It is much more complex and expensive
than current check: (++autoprepare_cached_plans > autoprepare_limit)
although I so not think that it will have measurable impact on
performance...
May be there should be some faster way to estimate memory consumed by
prepared statements.

So, the current autoprepare_limit allows to limit number of autoprepared
statements and prevent memory overflow caused by execution of larger
number of different statements.
The question is whether we need more precise mechanism which will take
in account difference between small and large queries. Definitely simple
query can require 10-100 times less memory than complex query. But
memory contexts themselves (even with small block size) somehow minimize
difference in memory footprint of different queries, because of chunked
allocation.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: "Yamaji, Ryo" <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: [HACKERS] Cached plans and statement generalization
Date: 2018-08-22 04:54:18
Message-ID: 9A6E5062D5D4DB458C80C2B2920BD71D5C6B5C@g01jpexmbkw23
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tuesday, August 7, 2018 at 0:36 AM, Konstantin Knizhnik wrote:

> I have registered the patch for next commitfest.
> For some reasons it doesn't find the latest autoprepare-10.patch and still
> refer to autoprepare-6.patch as the latest attachement.

I am sorry for the long delay in my response.
I confirmed it and become reviewer the patch.
I am going to review.

> Right now each prepared statement has two memory contexts: one for raw
> parse tree used as hash table key and another for cached plan itself.
> May be it will be possible to combine them. To calculate memory consumed
> by cached plans, it will be necessary to calculate memory usage statistic
> for all this contexts (which requires traversal of all context's chunks)
> and sum them. It is much more complex and expensive than current check:
> (++autoprepare_cached_plans > autoprepare_limit) although I so not think
> that it will have measurable impact on performance...
> May be there should be some faster way to estimate memory consumed by
> prepared statements.
>
> So, the current autoprepare_limit allows to limit number of autoprepared
> statements and prevent memory overflow caused by execution of larger
> number of different statements.
> The question is whether we need more precise mechanism which will take
> in account difference between small and large queries. Definitely simple
> query can require 10-100 times less memory than complex query. But memory
> contexts themselves (even with small block size) somehow minimize
> difference in memory footprint of different queries, because of chunked
> allocation.

Thank you for telling me how to implement and its problem.
In many cases when actually designing a system, we estimate the amount of memory
to use and prepare a memory accordingly. Therefore, if we need to estimate memory
usage in both patterns that are limit amount of memory used by prepare queries or
number of prepared statements, I think that there is no problem with the current
implementation.

Apart from the patch, if it can implement an application that outputs estimates of
memory usage when we enter a query, you may be able to use autoprepare more comfortably.
But it sounds difficult..


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: "Yamaji, Ryo" <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2018-08-24 15:36:52
Message-ID: 221b66e7-fc50-0e40-8152-f448f00c6e09@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 22.08.2018 07:54, Yamaji, Ryo wrote:
> On Tuesday, August 7, 2018 at 0:36 AM, Konstantin Knizhnik wrote:
>
>> I have registered the patch for next commitfest.
>> For some reasons it doesn't find the latest autoprepare-10.patch and still
>> refer to autoprepare-6.patch as the latest attachement.
> I am sorry for the long delay in my response.
> I confirmed it and become reviewer the patch.
> I am going to review.
>
>
>> Right now each prepared statement has two memory contexts: one for raw
>> parse tree used as hash table key and another for cached plan itself.
>> May be it will be possible to combine them. To calculate memory consumed
>> by cached plans, it will be necessary to calculate memory usage statistic
>> for all this contexts (which requires traversal of all context's chunks)
>> and sum them. It is much more complex and expensive than current check:
>> (++autoprepare_cached_plans > autoprepare_limit) although I so not think
>> that it will have measurable impact on performance...
>> May be there should be some faster way to estimate memory consumed by
>> prepared statements.
>>
>> So, the current autoprepare_limit allows to limit number of autoprepared
>> statements and prevent memory overflow caused by execution of larger
>> number of different statements.
>> The question is whether we need more precise mechanism which will take
>> in account difference between small and large queries. Definitely simple
>> query can require 10-100 times less memory than complex query. But memory
>> contexts themselves (even with small block size) somehow minimize
>> difference in memory footprint of different queries, because of chunked
>> allocation.
> Thank you for telling me how to implement and its problem.
> In many cases when actually designing a system, we estimate the amount of memory
> to use and prepare a memory accordingly. Therefore, if we need to estimate memory
> usage in both patterns that are limit amount of memory used by prepare queries or
> number of prepared statements, I think that there is no problem with the current
> implementation.
>
> Apart from the patch, if it can implement an application that outputs estimates of
> memory usage when we enter a query, you may be able to use autoprepare more comfortably.
> But it sounds difficult..

I have added autoprepare_memory_limit parameter which allows to limit
memory used by  autoprepared statements rather than number of such
statements.
If value of this parameter is non zero, then  total size of all memory
contexts used for autoprepared statement is maintained (can be
innsoected in debugger in autoprepare_used_memory variable).
As I already noticed, calculating memory context size adds extra
overhead but I do not notice any influence in performance. New version
of the patch is attached.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare-11.patch text/x-patch 89.4 KB

From: "Yamaji, Ryo" <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
To: 'Konstantin Knizhnik' <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: [HACKERS] Cached plans and statement generalization
Date: 2018-09-28 08:45:22
Message-ID: 9A6E5062D5D4DB458C80C2B2920BD71D5ECEB5@g01jpexmbkw23
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tuesday, August 7, 2018 at 0:36 AM, Konstantin Knizhnik wrote:

> I have registered the patch for next commitfest.
> For some reasons it doesn't find the latest autoprepare-10.patch and still
> refer to autoprepare-6.patch as the latest attachement.

I'm sorry for the late reply. I'm currently reviewing autoprepare.
I could not make it in time for the commit fests in September,
but I will continue to review for the next commitfest.

Performance tests are good results. The results are shown below.
pgbench -s 100 -c 8 -t 10000 -S (average of 3 trials)
- all autoprepare statements use same memory context.
18052.22706 TPS
- each autoprepare statement use separate memory context.
18607.95889 TPS
- calculate memory usage (autoprepare_memory_limit)
19171.60457 TPS

From the above results, I think that adding/changing functions
will not affect performance. I am currently checking the behavior
when autoprepare_memory_limit is specified.


From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2018-11-29 15:47:47
Message-ID: CA+q6zcW4bL=pr3_YjkKtVhHUN5qfd7kxTFUWqCiavA0ROjBu0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On Fri, Sep 28, 2018 at 10:46 AM Yamaji, Ryo <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com> wrote:
>
> On Tuesday, August 7, 2018 at 0:36 AM, Konstantin Knizhnik wrote:
>
> > I have registered the patch for next commitfest.
> > For some reasons it doesn't find the latest autoprepare-10.patch and still
> > refer to autoprepare-6.patch as the latest attachement.
>
> I'm sorry for the late reply. I'm currently reviewing autoprepare.
> I could not make it in time for the commit fests in September,
> but I will continue to review for the next commitfest.
>
> Performance tests are good results. The results are shown below.
> pgbench -s 100 -c 8 -t 10000 -S (average of 3 trials)
> - all autoprepare statements use same memory context.
> 18052.22706 TPS
> - each autoprepare statement use separate memory context.
> 18607.95889 TPS
> - calculate memory usage (autoprepare_memory_limit)
> 19171.60457 TPS
>
> From the above results, I think that adding/changing functions
> will not affect performance. I am currently checking the behavior
> when autoprepare_memory_limit is specified.

Hi,

Thanks for reviewing. Since another CF is about to end, maybe you can post the
full review feedback?

> On Tue, Jul 31, 2018 at 11:30 PM Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>
> Concerning next commit fest - I am not sure.
> At previous commitfest it was returned with feedback that it "has
> received no review or comments since last May".
> May be your review will help to change this situation.

Well, I think that wasn't the only reason, and having some balance here would
be nice. Anyway, thanks for working on this patch!

Unfortunately, the patch needs to be rebased one more time. Also, I see there
were concerns about how reliable this patch is. Just out of curiosity, can you
tell now that from v4 to v11 reliability is not a concern anymore?

(If you don't mind, I'll also CC some of the reviewers who saw previous
versions).


From: "Yamaji, Ryo" <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
To: 'Dmitry Dolgov' <9erthalion6(at)gmail(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: RE: [HACKERS] Cached plans and statement generalization
Date: 2018-11-30 02:06:12
Message-ID: 9A6E5062D5D4DB458C80C2B2920BD71D610148@g01jpexmbkw23
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 30, 2018 at 3:48 AM, Dmitry Dolgov wrote:

> Hi,
>
> Thanks for reviewing. Since another CF is about to end, maybe you can
> post the full review feedback?

Since I had been busy with my private work, I couldn't review.
I want to review next commit fest. I am sorry I postponed many times.


From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2018-11-30 09:57:34
Message-ID: CA+q6zcWKsaiFz6oWA1JXdJWNwSSk+wUyuOVO2nryCCSxrr3KRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On Fri, Nov 30, 2018 at 3:06 AM Yamaji, Ryo <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com> wrote:
>
> On Fri, Nov 30, 2018 at 3:48 AM, Dmitry Dolgov wrote:
>
> > Hi,
> >
> > Thanks for reviewing. Since another CF is about to end, maybe you can
> > post the full review feedback?
>
> Since I had been busy with my private work, I couldn't review.
> I want to review next commit fest. I am sorry I postponed many times.

No worries, just don't forget to post a review when it will be ready.


From: "Nagaura, Ryohei" <nagaura(dot)ryohei(at)jp(dot)fujitsu(dot)com>
To: 'Dmitry Dolgov' <9erthalion6(at)gmail(dot)com>, "Yamaji, Ryo" <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Andres Freund" <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: RE: [HACKERS] Cached plans and statement generalization
Date: 2019-01-29 01:38:46
Message-ID: EDA4195584F5064680D8130B1CA91C453D0182@G01JPEXMBYT04
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Although I became your reviewer, it seems to be difficult to feedback in this CF.
I continue to review, so would you update your patch please?
Until then I review your current patch.

There is one question.
date_1.out which maybe is copy of date.out includes trailing space and gaps of indent
e.g., line 3368 and 3380 in your current patch have
space in each end of line
different indent.
This is also seen in date.out.
I'm not sure whether it is ok to add new file including the above features just because a existing file includes.
Is it ok?

Best regards,
---------------------
Ryohei Nagaura


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: "Nagaura, Ryohei" <nagaura(dot)ryohei(at)jp(dot)fujitsu(dot)com>, 'Dmitry Dolgov' <9erthalion6(at)gmail(dot)com>, "Yamaji, Ryo" <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-01-29 10:45:35
Message-ID: 6ea1db86-b300-b57c-0f9d-a55dd3966175@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 29.01.2019 4:38, Nagaura, Ryohei wrote:
> Hi,
>
> Although I became your reviewer, it seems to be difficult to feedback in this CF.
> I continue to review, so would you update your patch please?
> Until then I review your current patch.
>
> There is one question.
> date_1.out which maybe is copy of date.out includes trailing space and gaps of indent
> e.g., line 3368 and 3380 in your current patch have
> space in each end of line
> different indent.
> This is also seen in date.out.
> I'm not sure whether it is ok to add new file including the above features just because a existing file includes.
> Is it ok?
>
> Best regards,
> ---------------------
> Ryohei Nagaura
Rebased version of the patch is attached.
Concerning spaces in date.out and date_1.out - it is ok.
This files are just output of test execution and it is not possible to
expect lack of trailing spaces in output of test scripts execution.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare-12.patch text/x-patch 89.3 KB

From: "Nagaura, Ryohei" <nagaura(dot)ryohei(at)jp(dot)fujitsu(dot)com>
To: 'Konstantin Knizhnik' <k(dot)knizhnik(at)postgrespro(dot)ru>, 'Dmitry Dolgov' <9erthalion6(at)gmail(dot)com>, "Yamaji, Ryo" <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: RE: [HACKERS] Cached plans and statement generalization
Date: 2019-01-30 03:04:13
Message-ID: EDA4195584F5064680D8130B1CA91C453D169A@G01JPEXMBYT04
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On Mon, Jan 28, 2019 at 10:46 PM, Konstantin Knizhnik wrote:
> Rebased version of the patch is attached.
Thank you for your quick rebase.

> This files are just output of test execution and it is not possible to expect lack of
> trailing spaces in output of test scripts execution.
I understand this is because regression tests use exact matching.
I learned a lot. Thanks.

Best regards,
---------------------
Ryohei Nagaura


From: "Yamaji, Ryo" <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>
To: 'Konstantin Knizhnik' <k(dot)knizhnik(at)postgrespro(dot)ru>, "Nagaura, Ryohei" <nagaura(dot)ryohei(at)jp(dot)fujitsu(dot)com>, 'Dmitry Dolgov' <9erthalion6(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: RE: [HACKERS] Cached plans and statement generalization
Date: 2019-03-19 02:56:38
Message-ID: 9A6E5062D5D4DB458C80C2B2920BD71D6820CB@g01jpexmbkw23
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Tue, Jan 29, 2019 at 10:46 AM, Konstantin Knizhnik wrote:
> Rebased version of the patch is attached.

I'm sorry for the late review.
I confirmed behavior of autoprepare-12.patch. It is summarized below.

・parameter
Expected behavior was shown according to the set value.
However, I think that it is be more kind to describe that autoprepare
hold infinite statements when the setting value of
autoprepare_ (memory_) limit is 0 in the manual.
There is no problem as operation.

・pg_autoprepared_statements
I confirmed that I could refer properly.

・autoprepare cache retention period
I confirmed that autoprepared statements were deleted when the set
statement or the DDL statement was executed. Although it differs from
the explicit prepare statements, it does not matter as a autoprepare.

・performance
This patch does not confirm the basic performance of autoprepare because
it confirmed that there was no performance problem with the previous
patch (autoprepare-11.patch). However, because it was argued that
performance degradation would occur when prepared statements execute to
a partition table, I expected that autoprepare might exhibit similar
behavior, and measured the performance. I also predicted that the
plan_cache_mode setting does not apply to autoprepare, and we also
measured the plan_cache_mode by conditions.
Below results (this result is TPS)

plan_cache_mode simple simple(autoprepare) prepare
auto 130 121 121.5
force_custom_plan 132.5 90.7 122.7
force_generic_plan 126.7 14.7 24.7

Performance degradation was observed when plan_cache_mode was specified
for autoprepare. Is this behavior correct? I do not know why this is the
results.

Below performance test procedure

drop table if exists rt;
create table rt (a int, b int, c int) partition by range (a);
\o /dev/null
select 'create table rt' || x::text || ' partition of rt for values from (' ||
(x)::text || ') to (' || (x+1)::text || ');' from generate_series(0, 1024) x;
\gexec
\o

pgbench -p port -T 60 -c 1 -n -f test.sql (-M prepared) postgres

test.sql
\set a random (0, 1023)
select * from rt where a = :a;


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: "Yamaji, Ryo" <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>, "Nagaura, Ryohei" <nagaura(dot)ryohei(at)jp(dot)fujitsu(dot)com>, 'Dmitry Dolgov' <9erthalion6(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-03-19 15:03:21
Message-ID: 554d0730-f6fc-d1be-fe78-e63973ec621f@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you very much for the review!

On 19.03.2019 5:56, Yamaji, Ryo wrote:
> On Tue, Jan 29, 2019 at 10:46 AM, Konstantin Knizhnik wrote:
>> Rebased version of the patch is attached.
> I'm sorry for the late review.
> I confirmed behavior of autoprepare-12.patch. It is summarized below.
>
> ・parameter
> Expected behavior was shown according to the set value.
> However, I think that it is be more kind to describe that autoprepare
> hold infinite statements when the setting value of
> autoprepare_ (memory_) limit is 0 in the manual.
> There is no problem as operation.

Sorry, I do not completely understand your concern.
Description of autoprepare_ (memory_) limit includes explanation of zero
value:

autoprepare_limit:
0 means unlimited number of autoprepared queries. Too large number of
prepared queries can cause backend memory overflow and slowdown
execution speed (because of increased lookup time.

autoprepare_memory_limit:
0 means that there is no memory limit. Calculating memory used by
prepared queries adds some extra overhead,
so non-zero value of this parameter may cause some slowdown.
autoprepare_limit is much faster way to limit number of autoprepared
statements

Do you think that this descriptions are unclear and should be rewritten?

> ・pg_autoprepared_statements
> I confirmed that I could refer properly.
>
> ・autoprepare cache retention period
> I confirmed that autoprepared statements were deleted when the set
> statement or the DDL statement was executed. Although it differs from
> the explicit prepare statements, it does not matter as a autoprepare.
>
> ・performance
> This patch does not confirm the basic performance of autoprepare because
> it confirmed that there was no performance problem with the previous
> patch (autoprepare-11.patch). However, because it was argued that
> performance degradation would occur when prepared statements execute to
> a partition table, I expected that autoprepare might exhibit similar
> behavior, and measured the performance. I also predicted that the
> plan_cache_mode setting does not apply to autoprepare, and we also
> measured the plan_cache_mode by conditions.
> Below results (this result is TPS)
>
> plan_cache_mode simple simple(autoprepare) prepare
> auto 130 121 121.5
> force_custom_plan 132.5 90.7 122.7
> force_generic_plan 126.7 14.7 24.7
>
> Performance degradation was observed when plan_cache_mode was specified
> for autoprepare. Is this behavior correct? I do not know why this is the
> results.
>
> Below performance test procedure
>
> drop table if exists rt;
> create table rt (a int, b int, c int) partition by range (a);
> \o /dev/null
> select 'create table rt' || x::text || ' partition of rt for values from (' ||
> (x)::text || ') to (' || (x+1)::text || ');' from generate_series(0, 1024) x;
> \gexec
> \o
>
> pgbench -p port -T 60 -c 1 -n -f test.sql (-M prepared) postgres
>
> test.sql
> \set a random (0, 1023)
> select * from rt where a = :a;
Autoprepare is using the same functions from plancache.c so
plan_cache_mode settings affect
autoprepare as well as explicitly preprepared statements.

Below are my results of select-only pgbench:

plan_cache_mode simple simple(autoprepare) prepare
auto 23k 42k 50k
force_custom_plan 23k 24k 26k
force_generic_plan 23k 44k 50k

As you can see force_custom_plan slowdowns both explicitly and
autoprepared statements.
Unfortunately generic plans are not working well with partitioned table
because disabling partition pruning.
At my system result of your query execution is the following:

plan_cache_mode simple simple(autoprepare) prepare
auto 232 220 219
force_custom_plan 234 175 211
 force_generic_plan 230 48 48

The conclusion is that forcing generic plan can cause slowdown of
queries on partitioned tables.
If plan cache mode is not enforced, then standard Postgres strategy of
comparing efficiency of generic and custom plans
works well.

Attached please find rebased version of the patch.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare-13.patch text/x-patch 91.8 KB

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: "Yamaji, Ryo" <yamaji(dot)ryo(at)jp(dot)fujitsu(dot)com>, "Nagaura, Ryohei" <nagaura(dot)ryohei(at)jp(dot)fujitsu(dot)com>, 'Dmitry Dolgov' <9erthalion6(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-04-03 11:36:22
Message-ID: 35be0423-36bc-cadc-bef8-02d475751f7d@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

New version of the patch with fixed error of autopreparing CTE queries.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare-14.patch text/x-patch 92.0 KB

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-04-09 12:52:13
Message-ID: 7603a101-6b79-2328-2b4b-3bc3217992e5@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

New version of the patching disabling autoprepare for rules and handling
planner error.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare-15.patch text/x-patch 93.1 KB

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-07-01 09:51:38
Message-ID: CA+hUKGJK2GZewMumkXtPjyATMu2tyEmg53R23WCETY=y48RxQw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Apr 10, 2019 at 12:52 AM Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> New version of the patching disabling autoprepare for rules and handling
> planner error.

Hi Konstantin,

This doesn't apply. Could we please have a fresh rebase for the new Commitfest?

Thanks,

--
Thomas Munro
https://enterprisedb.com


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-07-01 15:29:25
Message-ID: f71ca2cb-b2c0-d6ca-4532-822b25f1f81e@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01.07.2019 12:51, Thomas Munro wrote:
> On Wed, Apr 10, 2019 at 12:52 AM Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>> New version of the patching disabling autoprepare for rules and handling
>> planner error.
> Hi Konstantin,
>
> This doesn't apply. Could we please have a fresh rebase for the new Commitfest?
>
> Thanks,
>
Attached please find rebased version of the patch.
Also this version can be found in autoprepare branch of this repository
https://github.com/postgrespro/postgresql.builtin_pool.git
on github.

Attachment Content-Type Size
autoprepare-16.patch text/x-patch 93.1 KB

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-07-07 23:23:39
Message-ID: CA+hUKGKuxM94eVyBii=0e1bWpa-A05f2L6H-JSij0H5o7t+ZqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 2, 2019 at 3:29 AM Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> Attached please find rebased version of the patch.
> Also this version can be found in autoprepare branch of this repository
> https://github.com/postgrespro/postgresql.builtin_pool.git
> on github.

Thanks. I haven't looked at the code but this seems like interesting
work and I hope it will get some review. I guess this is bound to use
a lot of memory. I guess we'd eventually want to figure out how to
share the autoprepared plan cache between sessions, which is obviously
a whole can of worms.

A couple of trivial comments with my CF manager hat on:

1. Can you please fix the documentation? It doesn't build.
Obviously reviewing the goals, design and implementation are more
important than the documentation at this point, but if that is fixed
then the CF bot will be able to run check-world every day and we might
learn something about the code.
2. Accidental editor junk included: src/include/catalog/pg_proc.dat.~1~

--
Thomas Munro
https://enterprisedb.com


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-07-08 19:32:38
Message-ID: 3304c34b-4a9c-909c-4ee1-d8e7f6dcb2de@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08.07.2019 2:23, Thomas Munro wrote:
> On Tue, Jul 2, 2019 at 3:29 AM Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>> Attached please find rebased version of the patch.
>> Also this version can be found in autoprepare branch of this repository
>> https://github.com/postgrespro/postgresql.builtin_pool.git
>> on github.
> Thanks. I haven't looked at the code but this seems like interesting
> work and I hope it will get some review. I guess this is bound to use
> a lot of memory. I guess we'd eventually want to figure out how to
> share the autoprepared plan cache between sessions, which is obviously
> a whole can of worms.
>
> A couple of trivial comments with my CF manager hat on:
>
> 1. Can you please fix the documentation? It doesn't build.
> Obviously reviewing the goals, design and implementation are more
> important than the documentation at this point, but if that is fixed
> then the CF bot will be able to run check-world every day and we might
> learn something about the code.
> 2. Accidental editor junk included: src/include/catalog/pg_proc.dat.~1~

Sorry, are you tests autoprepare-16.patch I have sent in the last e-mail?
I can not reproduce the problem with building documentation:

knizhnik(at)xps:~/postgresql/doc$ make
make -C ../src/backend generated-headers
make[1]: Entering directory '/home/knizhnik/postgresql/src/backend'
make -C catalog distprep generated-header-symlinks
make[2]: Entering directory '/home/knizhnik/postgresql/src/backend/catalog'
make[2]: Nothing to be done for 'distprep'.
make[2]: Nothing to be done for 'generated-header-symlinks'.
make[2]: Leaving directory '/home/knizhnik/postgresql/src/backend/catalog'
make -C utils distprep generated-header-symlinks
make[2]: Entering directory '/home/knizhnik/postgresql/src/backend/utils'
make[2]: Nothing to be done for 'distprep'.
make[2]: Nothing to be done for 'generated-header-symlinks'.
make[2]: Leaving directory '/home/knizhnik/postgresql/src/backend/utils'
make[1]: Leaving directory '/home/knizhnik/postgresql/src/backend'
make -C src all
make[1]: Entering directory '/home/knizhnik/postgresql/doc/src'
make -C sgml all
make[2]: Entering directory '/home/knizhnik/postgresql/doc/src/sgml'
/usr/bin/xmllint --path . --noout --valid postgres.sgml
/usr/bin/xsltproc --path . --stringparam pg.version '12devel'
stylesheet.xsl postgres.sgml
cp ./stylesheet.css html/
touch html-stamp
/usr/bin/xmllint --path . --noout --valid postgres.sgml
/usr/bin/xsltproc --path . --stringparam pg.version '12devel'
stylesheet-man.xsl postgres.sgml
touch man-stamp
make[2]: Leaving directory '/home/knizhnik/postgresql/doc/src/sgml'
make[1]: Leaving directory '/home/knizhnik/postgresql/doc/src'

Also autoporepare-16.patch doesn't include any junk

src/include/catalog/pg_proc.dat.~1~


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-07-09 12:16:49
Message-ID: CA+hUKGL9_ohxMBq04N8qwn-vEaaan2N-06FaXsTz9+seDddN3g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 9, 2019 at 7:32 AM Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> Sorry, are you tests autoprepare-16.patch I have sent in the last e-mail?
> I can not reproduce the problem with building documentation:

+ <term><varname>autoprepare_threshold</varname> (<type>integer/type>)

The problem is that "<type>integer/type>". (Missing "<"). There more
than one of those. Not sure why that doesn't fail for you.

> Also autoporepare-16.patch doesn't include any junk
>
> src/include/catalog/pg_proc.dat.~1~

Oops, yeah, sorry for the noise. I did something stupid in my apply script.

--
Thomas Munro
https://enterprisedb.com


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-07-09 20:59:11
Message-ID: 66f47595-1b93-ef67-fdd6-c5414e0f94c3@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09.07.2019 15:16, Thomas Munro wrote:
> On Tue, Jul 9, 2019 at 7:32 AM Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>> Sorry, are you tests autoprepare-16.patch I have sent in the last e-mail?
>> I can not reproduce the problem with building documentation:
> + <term><varname>autoprepare_threshold</varname> (<type>integer/type>)
>
> The problem is that "<type>integer/type>". (Missing "<"). There more
> than one of those. Not sure why that doesn't fail for you.
> or the noise. I did something stupid in my apply script.
>

Sorry, there were really several syntax problems.
I didn't understand why I have not noticed them before.
Fixed patch version of the path is attached.

Attachment Content-Type Size
autoprepare-17.patch text/x-patch 93.1 KB

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-07-31 16:56:55
Message-ID: e9cce21f-942f-31f4-54e4-ace7f2de1a5f@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/07/2019 23:59, Konstantin Knizhnik wrote:
> Fixed patch version of the path is attached.

Much of the patch and the discussion has been around the raw parsing,
and guessing which literals are actually parameters that have been
inlined into the SQL text. Do we really need to do that, though? The
documentation mentions pgbouncer and other connection poolers, where you
don't have session state, as a use case for this. But such connection
poolers could and should still be using the extended query protocol,
with Parse+Bind messages and query parameters, even if they don't use
named prepared statements. I'd want to encourage applications and
middleware to use out-of-band query parameters, to avoid SQL injection
attacks, regardless of whether they use prepared statements or cache
query plans. So how about dropping all the raw parse tree stuff, and
doing the automatic caching of plans based on the SQL string, somewhere
in the exec_parse_message? Check the autoprepare cache in
exec_parse_message(), if it was an "unnamed" prepared statement, i.e. if
the prepared statement name is an empty string.

(I'm actually not sure if exec_parse/bind_message is the right place for
this, but I saw that your current patch did it in exec_simple_query, and
exec_parse/bind_message are the equivalent of that for the extended
query protocol).

- Heikki


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-07-31 19:48:16
Message-ID: 5538dbd6-fcbb-33a7-49d8-28c5cd2df435@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 31.07.2019 19:56, Heikki Linnakangas wrote:
> On 09/07/2019 23:59, Konstantin Knizhnik wrote:
>> Fixed patch version of the path is attached.
>
> Much of the patch and the discussion has been around the raw parsing,
> and guessing which literals are actually parameters that have been
> inlined into the SQL text. Do we really need to do that, though? The
> documentation mentions pgbouncer and other connection poolers, where
> you don't have session state, as a use case for this. But such
> connection poolers could and should still be using the extended query
> protocol, with Parse+Bind messages and query parameters, even if they
> don't use named prepared statements. I'd want to encourage
> applications and middleware to use out-of-band query parameters, to
> avoid SQL injection attacks, regardless of whether they use prepared
> statements or cache query plans. So how about dropping all the raw
> parse tree stuff, and doing the automatic caching of plans based on
> the SQL string, somewhere in the exec_parse_message? Check the
> autoprepare cache in exec_parse_message(), if it was an "unnamed"
> prepared statement, i.e. if the prepared statement name is an empty
> string.
>
> (I'm actually not sure if exec_parse/bind_message is the right place
> for this, but I saw that your current patch did it in
> exec_simple_query, and exec_parse/bind_message are the equivalent of
> that for the extended query protocol).

It will significantly simplify this patch and eliminate all complexity
and troubles  caused by replacing string literals with parameters
if assumption that all client applications are using extended query
protocol is true.
But I am not sure that we can expect it. At least I myself see many
applications which are constructing queries by embedding literal values.
May be it is not so good and safe (SQL injection attack), but many
applications are doing it. And the idea was to improve execution speed
of existed application without changing them.

Also please notice that extended protocol requires passing more message
which has negative effect on performance.
At my  notebook I get about 21k TPS on "pgbench -S" and 18k TPS on
"pgbench -M extended -S".
And it is with unix socket connection! I think that in case of remote
connections difference will be even larger.

So may be committing simple version of this patch which do not need to
solve any challenged problems is good idea.
But I afraid that it will significantly reduce positive effect of this
patch.


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-08-01 16:56:53
Message-ID: 6c91d4ba-1a77-ba64-39ee-fda26dcfc9bd@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 31.07.2019 19:56, Heikki Linnakangas wrote:
> On 09/07/2019 23:59, Konstantin Knizhnik wrote:
>> Fixed patch version of the path is attached.
>
> Much of the patch and the discussion has been around the raw parsing,
> and guessing which literals are actually parameters that have been
> inlined into the SQL text. Do we really need to do that, though? The
> documentation mentions pgbouncer and other connection poolers, where
> you don't have session state, as a use case for this. But such
> connection poolers could and should still be using the extended query
> protocol, with Parse+Bind messages and query parameters, even if they
> don't use named prepared statements. I'd want to encourage
> applications and middleware to use out-of-band query parameters, to
> avoid SQL injection attacks, regardless of whether they use prepared
> statements or cache query plans. So how about dropping all the raw
> parse tree stuff, and doing the automatic caching of plans based on
> the SQL string, somewhere in the exec_parse_message? Check the
> autoprepare cache in exec_parse_message(), if it was an "unnamed"
> prepared statement, i.e. if the prepared statement name is an empty
> string.
>
> (I'm actually not sure if exec_parse/bind_message is the right place
> for this, but I saw that your current patch did it in
> exec_simple_query, and exec_parse/bind_message are the equivalent of
> that for the extended query protocol).
>
> - Heikki

I decided to implement your proposal. Much simple version of autoprepare
patch is attached.
At my computer I got the following results:

 pgbench -M simple -S         22495 TPS
 pgbench -M extended -S    47633 TPS
 pgbench -M prepared -S    47683 TPS

So autoprepare speedup execution of queries sent using extended protocol
more than twice and it is almost the same as with explicitly prepared
statements.
I failed to create regression test for it because I do not know how to
force psql to use extended protocol. Any advice is welcome.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare-extended-1.patch text/x-patch 23.1 KB

From: Daniel Migowski <dmigowski(at)ikoffice(dot)de>
To: <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-08-02 08:25:30
Message-ID: 3ab03f54-b2bd-4b19-d8fb-19579df67caa@ikoffice.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Am 01.08.2019 um 18:56 schrieb Konstantin Knizhnik:
> I decided to implement your proposal. Much simple version of
> autoprepare patch is attached.
> At my computer I got the following results:
>
>  pgbench -M simple -S         22495 TPS
>  pgbench -M extended -S    47633 TPS
>  pgbench -M prepared -S    47683 TPS
>
>
> So autoprepare speedup execution of queries sent using extended
> protocol more than twice and it is almost the same as with explicitly
> prepared statements.
> I failed to create regression test for it because I do not know how to
> force psql to use extended protocol. Any advice is welcome.

I am very interested in such a patch, because currently I use the same
functionality within my JDBC driver and having this directly in
PostgreSQL would surely speed things up.

I have two suggestions however:

1. Please allow to gather information about the autoprepared statements
by returning them in pg_prepared_statements view. I would love to
monitor usage of them as well as the memory consumption that occurs. I
suggested a patch to display that in
/message-id/41ED3F5450C90F4D8381BC4D8DF6BBDCF02E10B4@EXCHANGESERVER.ikoffice.de

2. Please not only use a LRU list, but maybe something which would
prefer statements that get reused at all. We often create ad hoc
statements with parameters which don't really need to be kept. Maybe I
can suggest an implementation of an LRU list where a reusal of a
statement not only pulls it to the top of the list but also increases a
reuse counter. When then a statement would drop off the end of the list
one checks if the reusal count is non-zero, and only really drops it if
the resual count is zero. Else the reusal count is decremented (or
halved) and the element is also placed at the start of the LRU list, so
it is kept a bit longer.

Thanks,

Daniel Migowski


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Daniel Migowski <dmigowski(at)ikoffice(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-08-02 08:57:31
Message-ID: bd7fce7b-7734-d72e-1809-2d0d073e4034@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02.08.2019 11:25, Daniel Migowski wrote:
> Am 01.08.2019 um 18:56 schrieb Konstantin Knizhnik:
>> I decided to implement your proposal. Much simple version of
>> autoprepare patch is attached.
>> At my computer I got the following results:
>>
>>  pgbench -M simple -S         22495 TPS
>>  pgbench -M extended -S    47633 TPS
>>  pgbench -M prepared -S    47683 TPS
>>
>>
>> So autoprepare speedup execution of queries sent using extended
>> protocol more than twice and it is almost the same as with explicitly
>> prepared statements.
>> I failed to create regression test for it because I do not know how
>> to force psql to use extended protocol. Any advice is welcome.
>
> I am very interested in such a patch, because currently I use the same
> functionality within my JDBC driver and having this directly in
> PostgreSQL would surely speed things up.
>
> I have two suggestions however:
>
> 1. Please allow to gather information about the autoprepared
> statements by returning them in pg_prepared_statements view. I would
> love to monitor usage of them as well as the memory consumption that
> occurs. I suggested a patch to display that in
> /message-id/41ED3F5450C90F4D8381BC4D8DF6BBDCF02E10B4@EXCHANGESERVER.ikoffice.de
>
Sorry, but there is pg_autoprepared_statements view. I think that it
will be better to distinguish explicitly and implicitly prepared
statements, will not it?
Do you want to add more information to this view? Right now it shows
query string, types of parameters and number of this query execution.

> 2. Please not only use a LRU list, but maybe something which would
> prefer statements that get reused at all. We often create ad hoc
> statements with parameters which don't really need to be kept. Maybe I
> can suggest an implementation of an LRU list where a reusal of a
> statement not only pulls it to the top of the list but also increases
> a reuse counter. When then a statement would drop off the end of the
> list one checks if the reusal count is non-zero, and only really drops
> it if the resual count is zero. Else the reusal count is decremented
> (or halved) and the element is also placed at the start of the LRU
> list, so it is kept a bit longer.
>
There are many variations of LRU and alternatives: clock, segmented LRU, ...
I agree that classical LRU may be not the best replacement policy for
autoprepare.
Application is either use static queries, either constructing them
dynamically. It will be better to distinguish queries executed at least
twice from one-shot queries.
So may be SLRU will be the best choice. But actually situation you have
described (We often create ad hoc statements with parameters which don't
really need to be kept)
is not applicable to atutoprepare. Autoprepare deals only with
statements which are actually executed.

We have another patch which limits number of stored prepared plans to
avoid memory overflow (in case of using stored procedures which
implicitly prepare all issued queries).
Here choice of replacement policy is more critical.

> Thanks,
>
> Daniel Migowski
>
>
>
>

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Daniel Migowski <dmigowski(at)ikoffice(dot)de>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-08-02 09:01:47
Message-ID: b68e36eb-a0ea-3b28-62c4-a72f06fe4dfb@ikoffice.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Am 02.08.2019 um 10:57 schrieb Konstantin Knizhnik:
> On 02.08.2019 11:25, Daniel Migowski wrote:
>>
>> I have two suggestions however:
>>
>> 1. Please allow to gather information about the autoprepared
>> statements by returning them in pg_prepared_statements view. I would
>> love to monitor usage of them as well as the memory consumption that
>> occurs. I suggested a patch to display that in
>> /message-id/41ED3F5450C90F4D8381BC4D8DF6BBDCF02E10B4@EXCHANGESERVER.ikoffice.de
>>
>>
> Sorry, but there is pg_autoprepared_statements view. I think that it
> will be better to distinguish explicitly and implicitly prepared
> statements, will not it?
> Do you want to add more information to this view? Right now it shows
> query string, types of parameters and number of this query execution.
My sorry, I didn't notice it in the patch. If there is another view for
those that's perfectly fine.
>> 2. Please not only use a LRU list, but maybe something which would
>> prefer statements that get reused at all. We often create ad hoc
>> statements with parameters which don't really need to be kept. Maybe
>> I can suggest an implementation of an LRU list where a reusal of a
>> statement not only pulls it to the top of the list but also increases
>> a reuse counter. When then a statement would drop off the end of the
>> list one checks if the reusal count is non-zero, and only really
>> drops it if the resual count is zero. Else the reusal count is
>> decremented (or halved) and the element is also placed at the start
>> of the LRU list, so it is kept a bit longer.
>>
> There are many variations of LRU and alternatives: clock, segmented
> LRU, ...
> I agree that classical LRU may be not the best replacement policy for
> autoprepare.
> Application is either use static queries, either constructing them
> dynamically. It will be better to distinguish queries executed at
> least twice from one-shot queries.
> So may be SLRU will be the best choice. But actually situation you
> have described (We often create ad hoc statements with parameters
> which don't really need to be kept)
> is not applicable to atutoprepare. Autoprepare deals only with
> statements which are actually executed.
>
> We have another patch which limits number of stored prepared plans to
> avoid memory overflow (in case of using stored procedures which
> implicitly prepare all issued queries).
> Here choice of replacement policy is more critical.

Alright, just wanted to have something better than a LRU, so it is not a
stepback from the implementation in the JDBC driver for us.

Kind regards,
Daniel Migowski


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-08-05 17:02:09
Message-ID: e228a05c-501d-8b66-67b5-bae212cb7b70@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01.08.2019 19:56, Konstantin Knizhnik wrote:
>
>
> On 31.07.2019 19:56, Heikki Linnakangas wrote:
>> On 09/07/2019 23:59, Konstantin Knizhnik wrote:
>>> Fixed patch version of the path is attached.
>>
>> Much of the patch and the discussion has been around the raw parsing,
>> and guessing which literals are actually parameters that have been
>> inlined into the SQL text. Do we really need to do that, though? The
>> documentation mentions pgbouncer and other connection poolers, where
>> you don't have session state, as a use case for this. But such
>> connection poolers could and should still be using the extended query
>> protocol, with Parse+Bind messages and query parameters, even if they
>> don't use named prepared statements. I'd want to encourage
>> applications and middleware to use out-of-band query parameters, to
>> avoid SQL injection attacks, regardless of whether they use prepared
>> statements or cache query plans. So how about dropping all the raw
>> parse tree stuff, and doing the automatic caching of plans based on
>> the SQL string, somewhere in the exec_parse_message? Check the
>> autoprepare cache in exec_parse_message(), if it was an "unnamed"
>> prepared statement, i.e. if the prepared statement name is an empty
>> string.
>>
>> (I'm actually not sure if exec_parse/bind_message is the right place
>> for this, but I saw that your current patch did it in
>> exec_simple_query, and exec_parse/bind_message are the equivalent of
>> that for the extended query protocol).
>>
>> - Heikki
>
> I decided to implement your proposal. Much simple version of
> autoprepare patch is attached.
> At my computer I got the following results:
>
>  pgbench -M simple -S         22495 TPS
>  pgbench -M extended -S    47633 TPS
>  pgbench -M prepared -S    47683 TPS
>
>
> So autoprepare speedup execution of queries sent using extended
> protocol more than twice and it is almost the same as with explicitly
> prepared statements.
> I failed to create regression test for it because I do not know how to
> force psql to use extended protocol. Any advice is welcome.
>

Slightly improved and rebased version of the patch.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare-extended-2.patch text/x-patch 25.3 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-09-25 20:06:11
Message-ID: 20190925200611.GA22668@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

This patch fails the "rules" tests. Please fix.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-09-26 07:23:38
Message-ID: 2b7c0b2d-ef0e-d2c9-4c7d-6fa485a60f63@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 25.09.2019 23:06, Alvaro Herrera wrote:
> This patch fails the "rules" tests. Please fix.
>
>
Sorry,
New version of the patch with corrected expected output for rules test
is attached.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare-extended-3.patch text/x-patch 26.0 KB

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-12-01 03:26:03
Message-ID: 20191201032603.GE2355@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 26, 2019 at 10:23:38AM +0300, Konstantin Knizhnik wrote:
> Sorry,
> New version of the patch with corrected expected output for rules test is
> attached.

It looks like the documentation is failing to build. Could you fix
that? There may be other issues as well. I have moved the patch to
next CF, waiting on author.
--
Michael


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2019-12-02 10:07:51
Message-ID: 953a617e-bd6b-3099-2d03-7cf152d68efa@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01.12.2019 6:26, Michael Paquier wrote:
> On Thu, Sep 26, 2019 at 10:23:38AM +0300, Konstantin Knizhnik wrote:
>> Sorry,
>> New version of the patch with corrected expected output for rules test is
>> attached.
> It looks like the documentation is failing to build. Could you fix
> that? There may be other issues as well. I have moved the patch to
> next CF, waiting on author.
> --
> Michael

Sorry, issues with documentation are fixed.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
autoprepare-extended-4.patch text/x-patch 26.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2020-03-01 19:26:16
Message-ID: 29529.1583090776@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:
> [ autoprepare-extended-4.patch ]

The cfbot is showing that this doesn't apply anymore; there's
some doubtless-trivial conflict in prepare.c.

However ... TBH I've been skeptical of this whole proposal from the
beginning, on the grounds that it would probably hurt more use-cases
than it helps. The latest approach doesn't really do anything to
assuage that fear, because now that you've restricted it to extended
query mode, the feature amounts to nothing more than deliberately
overriding the application's choice to use unnamed rather than named
(prepared) statements. How often is that going to be a good idea?
And when it is, wouldn't it be better to fix the app? The client is
likely to have a far better idea of which statements would benefit
from this treatment than the server will; and in this context,
the client-side changes needed would really be trivial. The original
proposal, scary as it was, at least supported the argument that you
could use it to improve applications that were too dumb/hoary to
parameterize commands for themselves.

I'm also unimpressed by benchmark testing that relies entirely on
pgbench's default scenario, because that scenario consists entirely
of queries that are perfectly adapted to plan-only-once treatment.
In the real world, we constantly see complaints about cases where the
plancache mistakenly decides that a generic plan is acceptable. I think
that extending that behavior to more cases is going to be a net loss,
until we find some way to make it smarter and more reliable. At the
very least, you need to show some worst-case numbers alongside these
best-case numbers --- what happens with queries where we conclude we
must replan every time, so that the plancache becomes pure overhead?

The pgbench test case is laughably unrealistic in another way, in that
there are so few distinct queries it issues, so that there's no
stress at all on the cache-management aspect of this problem.

In short, I really think we ought to reject this patch and move on.
Maybe it could be resurrected sometime in the future when we have a
better handle on when to cache plans and when not.

If you want to press forward with it anyway, certainly the lack of
any tests in this patch is another big objection. Perhaps you
could create a pgbench TAP script that exercises the logic.

regards, tom lane


From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2020-07-01 09:18:52
Message-ID: 83A439D3-ADEB-4F9D-8051-A4EC230F3C25@yesql.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 1 Mar 2020, at 20:26, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> In short, I really think we ought to reject this patch and move on.
> Maybe it could be resurrected sometime in the future when we have a
> better handle on when to cache plans and when not.
>
> If you want to press forward with it anyway, certainly the lack of
> any tests in this patch is another big objection. Perhaps you
> could create a pgbench TAP script that exercises the logic.

Based on Tom's review, and that nothing has been submitted since, I've marked
this entry as returned with feedback. Feel to open a new entry if you want to
address Tom's comments and take this further.

cheers ./daniel


From: Юрий Соколов <funny(dot)falcon(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cached plans and statement generalization
Date: 2021-04-25 18:57:10
Message-ID: CAL-rCA1r+Pvcg4Y6hm00UDtT0_DhjyFjvswaronv7N7Yb+iPhA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

вс, 1 мар. 2020 г. в 22:26, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>
> Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:
> > [ autoprepare-extended-4.patch ]
>
> The cfbot is showing that this doesn't apply anymore; there's
> some doubtless-trivial conflict in prepare.c.
>
> However ... TBH I've been skeptical of this whole proposal from the
> beginning, on the grounds that it would probably hurt more use-cases
> than it helps. The latest approach doesn't really do anything to
> assuage that fear, because now that you've restricted it to extended
> query mode, the feature amounts to nothing more than deliberately
> overriding the application's choice to use unnamed rather than named
> (prepared) statements. How often is that going to be a good idea?
> And when it is, wouldn't it be better to fix the app? The client is
> likely to have a far better idea of which statements would benefit
> from this treatment than the server will; and in this context,
> the client-side changes needed would really be trivial. The original
> proposal, scary as it was, at least supported the argument that you
> could use it to improve applications that were too dumb/hoary to
> parameterize commands for themselves.

The theme of this thread:
- it is not possible to reliably "fix the app" when pgbouncer or internal
driver connection multiplexing are used.
- another widely spread case is frameworks (ORM or other).
There is no way to prepare a concrete query because it is buried under
levels of abstractions.

Whole "autoprepare" thing is a workaround for absence of "really trivial
client-side changes" in these conditions.

regards,
Yura