From: | Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel Seq Scan |
Date: | 2015-02-20 21:48:23 |
Message-ID: | CAJrrPGd28BLMhD_yQTWdRcap8TW_Nf=yJKEJF+RS3GWRm0cfrQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Sat, Feb 21, 2015 at 12:57 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Feb 18, 2015 at 6:44 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
> wrote:
>> On 2015-02-18 16:59:26 +0530, Amit Kapila wrote:
>>
>> > There could be some cases where it could be beneficial for worker
>> > to process a sub-tree, but I think there will be more cases where
>> > it will just work on a part of node and send the result back to either
>> > master backend or another worker for further processing.
>>
>> I think many parallelism projects start out that way, and then notice
>> that it doesn't parallelize very efficiently.
>>
>> The most extreme example, but common, is aggregation over large amounts
>> of data - unless you want to ship huge amounts of data between processes
>> to parallelize it you have to do the sequential scan and the
>> pre-aggregate step (that e.g. selects count() and sum() to implement an
>> avg over all the workers) inside one worker.
>>
>
> OTOH if someone wants to parallelize scan (including expensive qual) and
> sort then it will be better to perform scan (or part of scan by one worker)
> and sort by other worker.

There is a performance problem if we perform the SCAN in one worker
and the SORT in another worker, because every tuple then has to be
transferred twice: once from the scan worker to the sort worker, and
again from the sort worker to the backend. Tuple transfer between
processes is a costly operation.
It is better to combine the SCAN and SORT operations into a single
worker job. This can be targeted once the parallel scan code is stable.
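
As a rough illustration of the transfer count, here is a back-of-the-envelope
sketch in Python. It is not PostgreSQL code: the function names and the
selectivity parameter are made up for the example, and it assumes that every
tuple passing the qual is shipped individually across each process boundary.

```python
# Illustrative cost model only; not PostgreSQL code.
# All names here are hypothetical and exist just for this example.

def transfers_split(n_tuples: int, selectivity: float) -> int:
    """SCAN in one worker, SORT in another.

    Each qualifying tuple crosses a process boundary twice:
    scan worker -> sort worker, then sort worker -> backend.
    """
    qualifying = int(n_tuples * selectivity)
    return 2 * qualifying


def transfers_combined(n_tuples: int, selectivity: float) -> int:
    """SCAN and SORT in the same worker.

    Each qualifying tuple crosses a process boundary once:
    worker -> backend.
    """
    qualifying = int(n_tuples * selectivity)
    return qualifying


if __name__ == "__main__":
    n, sel = 1_000_000, 0.25
    print("split plan transfers:   ", transfers_split(n, sel))
    print("combined plan transfers:", transfers_combined(n, sel))
```

Under these assumptions the split plan always moves twice as many tuples
as the combined plan, whatever the selectivity of the qual.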
Regards,
Hari Babu
Fujitsu Australia