Re: Parallel sec scan in plpgsql

Lists: Postg롤 토토SQL :
From: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Parallel sec scan in plpgsql
Date: 2016-09-15 15:45:12
Message-ID: 538ca16c-dc52-237b-a79a-8fc8752f6967@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello!
Does parallel secscan works in plpgsql?

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel sec scan in plpgsql
Date: 2016-09-16 04:27:08
Message-ID: CAFjFpRc9W6__zPNkwr-4sczGbPHD4w5YJ3wv3hOBQ8Y+v+SN7g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 15, 2016 at 9:15 PM, Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru> wrote:
> Hello!
> Does parallel secscan works in plpgsql?
>

Parallel seq scan is a query optimization that will work independent
of the source of the query - i.e whether it comes directly from a
client or a procedural language like plpgsql. So, I guess, answer to
your question is yes. If you are expecting something else, more
context will help.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


From: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel sec scan in plpgsql
Date: 2016-09-16 13:27:41
Message-ID: 8163e56a-c137-4dd9-5d3d-4505a6a9ac4e@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg롤 토토SQL :

No it doesn't.
Paralleling neither sql function nor plpgsql:
Here is example :

ipdr=> show max_worker_processes ;
max_worker_processes
----------------------
128
(1 row)
ipdr=> set max_parallel_workers_per_gather to 128;
SET
ipdr=> set force_parallel_mode=on;
SET
ipdr=> set min_parallel_relation_size =0;
SET
ipdr=> set parallel_tuple_cost=0;
SET

ipdr=> create table test as select (random ()*1000)::int % 3 as a,
ipdr-> (random ()*1000)::int % 5 as b,
ipdr-> (random ()*1000)::int % 7 as c,
ipdr-> (random ()*1000)::int % 11 as d,
ipdr-> (random ()*1000)::int % 13 as e,
ipdr-> (random ()*1000)::int % 17 as bytes
ipdr-> from generate_series(1,10*1000*1000);
SELECT 10000000

ipdr=> create or replace function parallel_test_plpgsql() returns bigint as
ipdr-> $$
ipdr$> declare
ipdr$> cnt int:=0;
ipdr$> begin
ipdr$> select count(*) into cnt from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t;
ipdr$> return cnt;
ipdr$> end;
ipdr$> $$ language plpgsql PARALLEL SAFE STRICT;
CREATE FUNCTION

ipdr=>
ipdr=> create or replace function parallel_test_sql() returns bigint as
ipdr-> $$
ipdr$> select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t;
ipdr$> $$ language sql PARALLEL SAFE STRICT;
CREATE FUNCTION

ipdr=> analyze test;
ANALYZE
ipdr=> explain (analyze,buffers) select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t;

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=87702.33..87702.34 rows=1 width=8) (actual time=723.792..723.792 rows=1 loops=1)
Buffers: shared hit=65015
-> Finalize HashAggregate (cost=87364.49..87514.64 rows=15015 width=28) (actual time=720.496..722.589 rows=15015 loops=1)
Group Key: test.a, test.b, test.c, test.d, test.e
Buffers: shared hit=65015
-> Gather (cost=85149.78..85299.93 rows=165165 width=20) (actual time=502.607..665.039 rows=180180 loops=1)
Workers Planned: 11
Workers Launched: 11
Buffers: shared hit=65015
-> Partial HashAggregate (cost=84149.78..84299.93 rows=15015 width=20) (actual time=497.106..501.170 rows=15015 loops=12)
Group Key: test.a, test.b, test.c, test.d, test.e
Buffers: shared hit=63695
-> Parallel Seq Scan on test (cost=0.00..72786.01 rows=909101 width=20) (actual time=0.018..166.556 rows=833333 loops=12)
Buffers: shared hit=63695
Planning time: 0.250 ms
Execution time: 724.293 ms
(16 rows)

ipdr=> explain (analyze,buffers) select parallel_test_plpgsql();
QUERY PLAN
------------------------------------------------------------------------------------------------
Gather (cost=1000.00..1000.26 rows=1 width=8) (actual time=4088.952..4088.956 rows=1 loops=1)
Workers Planned: 1
Workers Launched: 1
Single Copy: true
Buffers: shared hit=64186
-> Result (cost=0.00..0.26 rows=1 width=8) (actual time=4084.997..4084.999 rows=1 loops=1)
Buffers: shared hit=64149
Planning time: 0.025 ms
Execution time: 4100.026 ms
(9 rows)

Log from auto_explain:
2016-09-16 16:05:11 MSK [28209]: [1-1] user=,db=,app=,client= LOG: duration: 4082.517 ms plan:
Query Text: select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t
Aggregate (cost=289035.43..289035.44 rows=1 width=8)
-> HashAggregate (cost=288697.59..288847.74 rows=15015 width=28)
Group Key: test.a, test.b, test.c, test.d, test.e
-> Seq Scan on test (cost=0.00..163696.15 rows=10000115 width=20)

ipdr=> explain (analyze,buffers) select parallel_test_sql();
QUERY PLAN
------------------------------------------------------------------------------------------------
Gather (cost=1000.00..1000.26 rows=1 width=8) (actual time=4256.830..4256.837 rows=1 loops=1)
Workers Planned: 1
Workers Launched: 1
Single Copy: true
Buffers: shared hit=64132
-> Result (cost=0.00..0.26 rows=1 width=8) (actual time=4252.401..4252.403 rows=1 loops=1)
Buffers: shared hit=64095
Planning time: 0.151 ms
Execution time: 4267.959 ms
(9 rows)

Log from auto_explain:
2016-09-16 16:22:03 MSK [731]: [1-1] user=,db=,app=,client= LOG: duration: 4249.851 ms plan:
Query Text:
select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t;

Aggregate (cost=289035.43..289035.44 rows=1 width=8)
-> HashAggregate (cost=288697.59..288847.74 rows=15015 width=28)
Group Key: test.a, test.b, test.c, test.d, test.e
-> Seq Scan on test (cost=0.00..163696.15 rows=10000115 width=20)

So as we can see parallel secscan doesn't works in plpgsql and sql functions.
Can somebody explains me where I was wrong?

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On 16.09.2016 07:27, Ashutosh Bapat wrote:
> On Thu, Sep 15, 2016 at 9:15 PM, Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru> wrote:
>> Hello!
>> Does parallel secscan works in plpgsql?
>>
>
> Parallel seq scan is a query optimization that will work independent
> of the source of the query - i.e whether it comes directly from a
> client or a procedural language like plpgsql. So, I guess, answer to
> your question is yes. If you are expecting something else, more
> context will help.
>


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel sec scan in plpgsql
Date: 2016-09-16 13:50:07
Message-ID: CAA4eK1LzNPdYxQsG6bBXjZ=8rX2c8XyUKQy+OA+QaZ9_AV0+Sw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 16, 2016 at 6:57 PM, Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru> wrote:
> No it doesn't.
> Paralleling neither sql function nor plpgsql:
> Here is example :
>
> ipdr=> show max_worker_processes ;
> max_worker_processes
> ----------------------
> 128
> (1 row)
> ipdr=> set max_parallel_workers_per_gather to 128;
> SET
> ipdr=> set force_parallel_mode=on;
> SET
> ipdr=> set min_parallel_relation_size =0;
> SET
> ipdr=> set parallel_tuple_cost=0;
> SET
>

Can you try by setting force_parallel_mode = off;? I think it is
sending the whole function execution to worker due to
force_parallel_mode.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel sec scan in plpgsql
Date: 2016-09-16 15:18:04
Message-ID: 95c75bb0-a418-30cd-ff40-53eef1b27f48@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 사이트 순위SQL


On 16.09.2016 16:50, Amit Kapila wrote:
> On Fri, Sep 16, 2016 at 6:57 PM, Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru> wrote:
>> No it doesn't.
>> Paralleling neither sql function nor plpgsql:
>> Here is example :
>>
>> ipdr=> show max_worker_processes ;
>> max_worker_processes
>> ----------------------
>> 128
>> (1 row)
>> ipdr=> set max_parallel_workers_per_gather to 128;
>> SET
>> ipdr=> set force_parallel_mode=on;
>> SET
>> ipdr=> set min_parallel_relation_size =0;
>> SET
>> ipdr=> set parallel_tuple_cost=0;
>> SET
>>
>
> Can you try by setting force_parallel_mode = off;? I think it is
> sending the whole function execution to worker due to
> force_parallel_mode.
>
>

No changes:

ipdr=> set max_parallel_workers_per_gather to 128;
SET
ipdr=> set min_parallel_relation_size =0;
SET
ipdr=> set parallel_tuple_cost=0;
SET
ipdr=> set force_parallel_mode = off;
SET
ipdr=> select name,setting from pg_settings where name in('max_parallel_workers_per_gather',
ipdr(> 'min_parallel_relation_size',
ipdr(> 'parallel_tuple_cost',
ipdr(> 'force_parallel_mode');
name | setting
---------------------------------+---------
force_parallel_mode | off
max_parallel_workers_per_gather | 128
min_parallel_relation_size | 0
parallel_tuple_cost | 0
(4 rows)

ipdr=> explain (analyze,buffers) select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=87702.33..87702.34 rows=1 width=8) (actual time=709.643..709.643 rows=1 loops=1)
Buffers: shared hit=65015
-> Finalize HashAggregate (cost=87364.49..87514.64 rows=15015 width=28) (actual time=706.382..708.456 rows=15015 loops=1)
Group Key: test.a, test.b, test.c, test.d, test.e
Buffers: shared hit=65015
-> Gather (cost=85149.78..85299.93 rows=165165 width=20) (actual time=478.626..645.209 rows=180180 loops=1)
Workers Planned: 11
Workers Launched: 11
Buffers: shared hit=65015
-> Partial HashAggregate (cost=84149.78..84299.93 rows=15015 width=20) (actual time=473.890..478.309 rows=15015 loops=12)
Group Key: test.a, test.b, test.c, test.d, test.e
Buffers: shared hit=63695
-> Parallel Seq Scan on test (cost=0.00..72786.01 rows=909101 width=20) (actual time=0.021..163.120 rows=833333 loops=12)
Buffers: shared hit=63695
Planning time: 0.318 ms
Execution time: 710.600 ms
(16 rows)

ipdr=> explain (analyze,buffers) select parallel_test_plpgsql();
QUERY PLAN
------------------------------------------------------------------------------------------
Result (cost=0.00..0.26 rows=1 width=8) (actual time=4003.719..4003.720 rows=1 loops=1)
Buffers: shared hit=63869
Planning time: 0.021 ms
Execution time: 4003.769 ms
(4 rows)

auto_explain:
2016-09-16 18:02:29 MSK [29353]: [53-1] user=ipdr,db=ipdr,app=psql,client=[local] LOG: duration: 4001.275 ms plan:
Query Text: select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t
Aggregate (cost=289035.43..289035.44 rows=1 width=8)
-> HashAggregate (cost=288697.59..288847.74 rows=15015 width=28)
Group Key: test.a, test.b, test.c, test.d, test.e
-> Seq Scan on test (cost=0.00..163696.15 rows=10000115 width=20)
2016-09-16 18:02:29 MSK [29353]: [54-1] user=ipdr,db=ipdr,app=psql,client=[local] CONTEXT: SQL statement "select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t"
PL/pgSQL function parallel_test_plpgsql() line 5 at SQL statement

ipdr=> explain (analyze,buffers) select parallel_test_plpgsql();
QUERY PLAN
------------------------------------------------------------------------------------------
Result (cost=0.00..0.26 rows=1 width=8) (actual time=4497.820..4497.822 rows=1 loops=1)
Buffers: shared hit=63695
Planning time: 0.023 ms
Execution time: 4497.872 ms
(4 rows)

auto_explain:
2016-09-16 18:03:23 MSK [29353]: [57-1] user=ipdr,db=ipdr,app=psql,client=[local] LOG: duration: 4497.050 ms plan:
Query Text: select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t
Aggregate (cost=289035.43..289035.44 rows=1 width=8)
-> HashAggregate (cost=288697.59..288847.74 rows=15015 width=28)
Group Key: test.a, test.b, test.c, test.d, test.e
-> Seq Scan on test (cost=0.00..163696.15 rows=10000115 width=20)
2016-09-16 18:03:23 MSK [29353]: [58-1] user=ipdr,db=ipdr,app=psql,client=[local] CONTEXT: SQL statement "select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t"
PL/pgSQL function parallel_test_plpgsql() line 5 at SQL statement


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel sec scan in plpgsql
Date: 2016-09-18 03:54:48
Message-ID: CAA4eK1La719C5oGH0nD75EqZSRzRs50zKZer4KTfO1O+D94oiQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 16, 2016 at 8:48 PM, Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru> wrote:
>
> On 16.09.2016 16:50, Amit Kapila wrote:
>>
>>
>> Can you try by setting force_parallel_mode = off;? I think it is
>> sending the whole function execution to worker due to
>> force_parallel_mode.
>>
>>
>
> No changes:
>

Okay, it just skipped from my mind that we don't support parallel
queries for SQL statement execution (or statements executed via
exec_stmt_execsql) from plpgsql. For detailed explanation of why that
is not feasible you can refer one of my earlier e-mails [1] on similar
topic. I think if we can somehow get the results via Perform
statement, then it could be possible to use parallelism via plpgsql.

However, you can use it via SQL functions, an example is below:

set min_parallel_relation_size =0;
set parallel_tuple_cost=0;
set parallel_setup_cost=0;

Load 'auto_explain';
set auto_explain.log_min_duration = 0;
set auto_explain.log_analyze = true;
set auto_explain.log_nested_statements = true;

create table test_plpgsql(c1 int, c2 char(1000));
insert into test_plpgsql values(generate_series(1,100000),'aaa');

create or replace function parallel_test_set_sql() returns
setof bigint as $$
select count(*) from test_plpgsql;
$$language sql PARALLEL SAFE STRICT STABLE;

Then execute function as: select * from parallel_test_set_sql(); You
can see below plan if auto_explain module is loaded.

Finalize Aggregate (cost=14806.85..14806.86 rows=1 width=8) (actual tim
e=1094.966..1094.967 rows=1 loops=1)
-> Gather (cost=14806.83..14806.84 rows=2 width=8) (actual time=472.
216..1094.943 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=14806.83..14806.84 rows=1 width=8)
(actual time=177.867..177.868 rows=1 loops=3)
-> Parallel Seq Scan on test_plpgsql (cost=0.00..14702.6
7 rows=41667 width=0) (actual time=0.384..142.565 rows=33333 loops=3)
CONTEXT: SQL function "parallel_test_set_sql" statement 1
LOG: duration: 2965.040 ms plan:
Query Text: select * from parallel_test_set_sql();
Function Scan on parallel_test_set_sql (cost=0.25..10.25 rows=1000 widt
h=8) (actual time=2538.620..2776.955 rows=1 loops=1)

In general, I think we should support the cases as required (or
written) by you from plpgsql or sql functions. We need more work to
support such cases. There are probably two ways of supporting such
cases, we can build some intelligence in plpgsql execution such that
it can recognise such queries and allow to use parallelism or we need
to think of enabling parallelism for cases where we don't run the plan
to completion. Most of the use cases from plpgsql or sql function
fall into later category as they don't generally run the plan to
completion.

[1] - /message-id/CAA4eK1K8kaO_jRk42-o2rmhSRbKV-3mR%2BiNVcONLdbcSXW5TfQ%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel sec scan in plpgsql
Date: 2016-09-19 18:09:00
Message-ID: CA+TgmobF7_f_x8KiwRyNA0Lj-d7RxKNBQ6KUL-sAQNuF_B1_5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Sep 17, 2016 at 11:54 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> In general, I think we should support the cases as required (or
> written) by you from plpgsql or sql functions. We need more work to
> support such cases. There are probably two ways of supporting such
> cases, we can build some intelligence in plpgsql execution such that
> it can recognise such queries and allow to use parallelism or we need
> to think of enabling parallelism for cases where we don't run the plan
> to completion. Most of the use cases from plpgsql or sql function
> fall into later category as they don't generally run the plan to
> completion.

I think there's certainly more work that could be done to teach
PL/pgsql about cases where the query will run to completion. I didn't
work very hard to make sure we covered all of those; there are
probably several different cases where parallelism could be safely
enabled but currently isn't. Also, I didn't do anything at all to
update the other PLs, and that would be good, too.

However, I think the chances of supporting parallel query for queries
not executed to completion any time in the near future are very poor.
Once the query is suspended, the user can do anything they like,
including stuff that's parallel-unsafe, and then we have no choice but
to error out, and that's not what we want to happen. If we had
absolutely no parallel-unsafe operations - which would be quite a feat
- then we might be able to get there. But even then you have the
problem that while the query is suspended, you are still nailing down
workers that are sitting around idle, waiting for the leader to resume
the query, and that's not very good either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel sec scan in plpgsql
Date: 2016-09-20 11:15:45
Message-ID: 6d1ff632-f0b5-079f-24c6-c4981b65ae8e@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 18.09.2016 06:54, Amit Kapila wrote:
> On Fri, Sep 16, 2016 at 8:48 PM, Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru> wrote:
>>
>> On 16.09.2016 16:50, Amit Kapila wrote:
>>>
>>>
>>> Can you try by setting force_parallel_mode = off;? I think it is
>>> sending the whole function execution to worker due to
>>> force_parallel_mode.
>>>
>>>
>>
>> No changes:
>>
>
> Okay, it just skipped from my mind that we don't support parallel
> queries for SQL statement execution (or statements executed via
> exec_stmt_execsql) from plpgsql. For detailed explanation of why that
> is not feasible you can refer one of my earlier e-mails [1] on similar
> topic. I think if we can somehow get the results via Perform
> statement, then it could be possible to use parallelism via plpgsql.
>
> However, you can use it via SQL functions, an example is below:
>
> set min_parallel_relation_size =0;
> set parallel_tuple_cost=0;
> set parallel_setup_cost=0;
>
> Load 'auto_explain';
> set auto_explain.log_min_duration = 0;
> set auto_explain.log_analyze = true;
> set auto_explain.log_nested_statements = true;
>
> create table test_plpgsql(c1 int, c2 char(1000));
> insert into test_plpgsql values(generate_series(1,100000),'aaa');
>
> create or replace function parallel_test_set_sql() returns
> setof bigint as $$
> select count(*) from test_plpgsql;
> $$language sql PARALLEL SAFE STRICT STABLE;
>
> Then execute function as: select * from parallel_test_set_sql(); You
> can see below plan if auto_explain module is loaded.
>
> Finalize Aggregate (cost=14806.85..14806.86 rows=1 width=8) (actual tim
> e=1094.966..1094.967 rows=1 loops=1)
> -> Gather (cost=14806.83..14806.84 rows=2 width=8) (actual time=472.
> 216..1094.943 rows=3 loops=1)
> Workers Planned: 2
> Workers Launched: 2
> -> Partial Aggregate (cost=14806.83..14806.84 rows=1 width=8)
> (actual time=177.867..177.868 rows=1 loops=3)
> -> Parallel Seq Scan on test_plpgsql (cost=0.00..14702.6
> 7 rows=41667 width=0) (actual time=0.384..142.565 rows=33333 loops=3)
> CONTEXT: SQL function "parallel_test_set_sql" statement 1
> LOG: duration: 2965.040 ms plan:
> Query Text: select * from parallel_test_set_sql();
> Function Scan on parallel_test_set_sql (cost=0.25..10.25 rows=1000 widt
> h=8) (actual time=2538.620..2776.955 rows=1 loops=1)
>
>
> In general, I think we should support the cases as required (or
> written) by you from plpgsql or sql functions. We need more work to
> support such cases. There are probably two ways of supporting such
> cases, we can build some intelligence in plpgsql execution such that
> it can recognise such queries and allow to use parallelism or we need
> to think of enabling parallelism for cases where we don't run the plan
> to completion. Most of the use cases from plpgsql or sql function
> fall into later category as they don't generally run the plan to
> completion.
>
>
> [1] - /message-id/CAA4eK1K8kaO_jRk42-o2rmhSRbKV-3mR%2BiNVcONLdbcSXW5TfQ%40mail.gmail.com
>

Thank you for you sugestion! That works.

But what we can do with this function:
create or replace function parallel_test_sql(t int) returns setof bigint as
$$
select count(*) from (select a,b,c,d,e,sum(bytes) from test where a>= $1 group by a,b,c,d,e)t;
$$ language sql PARALLEL SAFE STRICT STABLE;

explain (analyze,buffers) select * from parallel_test_sql(2);

"Function Scan on parallel_test_sql (cost=0.25..10.25 rows=1000 width=8) (actual time=2410.789..2410.790 rows=1 loops=1)"
" Buffers: shared hit=63696"
"Planning time: 0.082 ms"
"Execution time: 2410.841 ms"

2016-09-20 14:09:04 MSK [13037]: [75-1] user=ipdr,db=ipdr,app=pgAdmin III - Query Tool,client=127.0.0.1 LOG: duration: 2410.135 ms plan:
Query Text:
select count(*) from (select a,b,c,d,e,sum(bytes) from test where a>= $1 group by a,b,c,d,e)t;

Aggregate (cost=230701.42..230701.43 rows=1 width=8)
-> HashAggregate (cost=230363.59..230513.74 rows=15015 width=28)
Group Key: test.a, test.b, test.c, test.d, test.e
-> Seq Scan on test (cost=0.00..188696.44 rows=3333372 width=20)
Filter: (a >= $1)

No parallelism again. Looks like that Filter: (a >= $1) breaks parallelism

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel sec scan in plpgsql
Date: 2016-09-20 13:24:03
Message-ID: CAA4eK1J=U4=8AGJ-2ujOSkpM28i17N7g33CqQRWScT-ijds8FQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg사설 토토 사이트SQL

On Mon, Sep 19, 2016 at 11:39 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, Sep 17, 2016 at 11:54 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> In general, I think we should support the cases as required (or
>> written) by you from plpgsql or sql functions. We need more work to
>> support such cases. There are probably two ways of supporting such
>> cases, we can build some intelligence in plpgsql execution such that
>> it can recognise such queries and allow to use parallelism or we need
>> to think of enabling parallelism for cases where we don't run the plan
>> to completion. Most of the use cases from plpgsql or sql function
>> fall into later category as they don't generally run the plan to
>> completion.
>
> I think there's certainly more work that could be done to teach
> PL/pgsql about cases where the query will run to completion. I didn't
> work very hard to make sure we covered all of those; there are
> probably several different cases where parallelism could be safely
> enabled but currently isn't. Also, I didn't do anything at all to
> update the other PLs, and that would be good, too.
>
> However, I think the chances of supporting parallel query for queries
> not executed to completion any time in the near future are very poor.
>

I think here point is that for any case where there is count of rows
to be selected, we disable parallelism. There are many genuine cases
like select count(*) into cnt ... which will run to completion, but as
plpgsql passes row count to be 1 or 2, it doesn't enter parallel mode.
There are couple other cases like that. Do you see a reason for not
enabling parallelism for such cases?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel sec scan in plpgsql
Date: 2016-09-20 14:58:17
Message-ID: 20525.1474383497@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> On Mon, Sep 19, 2016 at 11:39 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> However, I think the chances of supporting parallel query for queries
>> not executed to completion any time in the near future are very poor.

> I think here point is that for any case where there is count of rows
> to be selected, we disable parallelism. There are many genuine cases
> like select count(*) into cnt ... which will run to completion, but as
> plpgsql passes row count to be 1 or 2, it doesn't enter parallel mode.
> There are couple other cases like that. Do you see a reason for not
> enabling parallelism for such cases?

The other problem that would have to be confronted here is nesting,
ie it would only be OK for a plpgsql function to invoke a parallel
query if it wasn't itself being executed by a parallel worker ---
or maybe even if it's being executed by the leader process but there's
an active Gather somewhere else in the calling query's plan tree.
(Not sure about the implementation's properties for that case.)
We'd have to decide whether we want plancache to track both parallel
and nonparallel plans for plpgsql queries. Do-able no doubt but
pretty ugly. Maybe it would be better to fix the rule against workers
invoking their own parallel queries.

However that's probably moot unless the not-executing-to-completion
issue can be solved.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel sec scan in plpgsql
Date: 2016-09-20 15:01:17
Message-ID: CA+TgmoY_uNECNPX1BaC-zOpFocaEoSThaeqVMmGvDV18k2j+eA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg스포츠 토토 베트맨SQL

On Tue, Sep 20, 2016 at 9:24 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> I think here point is that for any case where there is count of rows
> to be selected, we disable parallelism. There are many genuine cases
> like select count(*) into cnt ... which will run to completion, but as
> plpgsql passes row count to be 1 or 2, it doesn't enter parallel mode.
> There are couple other cases like that. Do you see a reason for not
> enabling parallelism for such cases?

If we can somehow know that the rowcount which is passed is greater
than or equal to the actual number of rows which will be generated,
then it's fine to enable parallelism.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel sec scan in plpgsql
Date: 2016-09-20 15:02:53
Message-ID: CA+TgmoYegNkjRt=WuGc5r+dfBh8gr+dn7t9mrg3Nnu9MeLhO4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 20, 2016 at 10:58 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Maybe it would be better to fix the rule against workers
> invoking their own parallel queries.

That rule does have the advantage of preventing us from having one
user backend launch N^2 workers. I don't think it would be that much
work to fix it, but the results might be pretty exciting.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel sec scan in plpgsql
Date: 2016-09-22 12:36:42
Message-ID: CAA4eK1+deWHDvWPvBXAXPpGH=dhRWgt485A_HvLEYaWU3JBpKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg젠 토토SQL :

On Tue, Sep 20, 2016 at 8:31 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Sep 20, 2016 at 9:24 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> I think here point is that for any case where there is count of rows
>> to be selected, we disable parallelism. There are many genuine cases
>> like select count(*) into cnt ... which will run to completion, but as
>> plpgsql passes row count to be 1 or 2, it doesn't enter parallel mode.
>> There are couple other cases like that. Do you see a reason for not
>> enabling parallelism for such cases?
>
> If we can somehow know that the rowcount which is passed is greater
> than or equal to the actual number of rows which will be generated,
> then it's fine to enable parallelism.
>

I think for certain cases like into clause, the rows passed will be
equal to actual number of rows, otherwise it will generate error. So
we can pass that information in executor layer. Another kind of cases
which are worth considering are when from plpgsql we fetch limited
rows at-a-time, but we fetch till end like the case of
exec_stmt_return_query().

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel sec scan in plpgsql
Date: 2016-09-22 14:02:18
Message-ID: CA+TgmobWiu2rx+u7XL+jLjv2h2iNzGMpqaFmp1VdPmi8=SUKFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 22, 2016 at 8:36 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> I think for certain cases like into clause, the rows passed will be
> equal to actual number of rows, otherwise it will generate error. So
> we can pass that information in executor layer. Another kind of cases
> which are worth considering are when from plpgsql we fetch limited
> rows at-a-time, but we fetch till end like the case of
> exec_stmt_return_query().

Yes, I think that those cases can be considered. Some careful code
inspection will be needed to make sure the cases we want to enable are
safe, and some testing will be needed to make sure they behave
properly. But it doesn't sound like an unsolvable problem. I hope
someone who isn't me will decide to work on it. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel sec scan in plpgsql
Date: 2016-09-23 03:34:13
Message-ID: CAA4eK1JTaqOBVQW+jxUR+66OoAR=v84=YwsT2QGhU3+WyfL6nw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg스포츠 토토 결과SQL

On Thu, Sep 22, 2016 at 7:32 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Sep 22, 2016 at 8:36 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> I think for certain cases like into clause, the rows passed will be
>> equal to actual number of rows, otherwise it will generate error. So
>> we can pass that information in executor layer. Another kind of cases
>> which are worth considering are when from plpgsql we fetch limited
>> rows at-a-time, but we fetch till end like the case of
>> exec_stmt_return_query().
>
> Yes, I think that those cases can be considered. Some careful code
> inspection will be needed to make sure the cases we want to enable are
> safe, and some testing will be needed to make sure they behave
> properly. But it doesn't sound like an unsolvable problem. I hope
> someone who isn't me will decide to work on it. :-)
>

makes sense. I think along with that we can also evaluate, if we can
enable parallel query from other pl languages. I think if we can
enable parallelism from all pl languages in 10.0, that will be a good
step forward.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com