Re: Parallel Seq Scan

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-02-20 21:48:23
Message-ID: CAJrrPGd28BLMhD_yQTWdRcap8TW_Nf=yJKEJF+RS3GWRm0cfrQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 21, 2015 at 12:57 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Feb 18, 2015 at 6:44 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
> wrote:
>> On 2015-02-18 16:59:26 +0530, Amit Kapila wrote:
>>
>> > There could be some cases where it could be beneficial for worker
>> > to process a sub-tree, but I think there will be more cases where
>> > it will just work on a part of node and send the result back to either
>> > master backend or another worker for further processing.
>>
>> I think many parallelism projects start out that way, and then notice
>> that it doesn't parallelize very efficiently.
>>
>> The most extreme example, but common, is aggregation over large amounts
>> of data - unless you want to ship huge amounts of data between processes
>> to parallelize it you have to do the sequential scan and the
>> pre-aggregate step (that e.g. selects count() and sum() to implement an
>> avg over all the workers) inside one worker.
>>
>
> OTOH if someone wants to parallelize scan (including expensive qual) and
> sort then it will be better to perform scan (or part of scan by one worker)
> and sort by other worker.

There is a performance problem if we perform the SCAN in one worker
and the SORT in another worker, because each tuple has to be
transferred twice: once from the scan worker to the sort worker, and
again from the sort worker to the backend. This is a costly operation.
It is better to combine the SCAN and SORT operations into a single
worker job. This can be targeted once the parallel scan code is stable.
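
To make the transfer count concrete, here is a rough sketch in Python.
It is only an illustration of the argument above, not executor code:
the function names are made up, and it assumes the only cost that
matters is the number of tuples crossing a process boundary, with
every tuple that passes the qual being sent on.

```python
# Illustrative only: count inter-process tuple transfers for two plan shapes.
# Function names and the "one transfer per tuple" cost model are assumptions.

def transfers_split(tuples_passing_qual):
    """SCAN in one worker, SORT in another worker."""
    scan_to_sort = tuples_passing_qual     # scan worker -> sort worker
    sort_to_backend = tuples_passing_qual  # sort worker -> backend
    return scan_to_sort + sort_to_backend

def transfers_combined(tuples_passing_qual):
    """SCAN and SORT done inside the same worker."""
    return tuples_passing_qual             # worker -> backend only

if __name__ == "__main__":
    n = 1_000_000
    print("split plan:   ", transfers_split(n))
    print("combined plan:", transfers_combined(n))
```

Under these assumptions the split plan always moves twice as many
tuples as the combined plan, whatever the table size.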

Regards,
Hari Babu
Fujitsu Australia
