Re: Damn slow query

Lists: pgsql-hackers
From: "Magnus Naeslund(f)" <mag(at)fbab(dot)net>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Damn slow query
Date: 2002-10-09 21:34:16
Message-ID: 메이저 토토 사이트 PostgreSQL : Damn Slow Query
Views: Raw Message | 토토 PostgreSQL | Download mbox | Resend email
Lists: Postg토토 캔SQL :

Hello, i've got this query that's really slow...
Figure this:

testdb=> select now() ; select gid from bs where gid not in ( select x
from z2test ); select now();
now
-------------------------------
2002-10-09 22:37:21.234627+02
(1 row)

gid
----------
<lotsa rows>
(524 rows)
now
-------------------------------
2002-10-09 23:20:53.227844+02
(1 row)

That's 45 minutes i don't wanna spend in there...
I got indexes:

testdb=> \d bs_gid_idx
Index "bs_gid_idx"
Column | Type
--------+-----------------------
gid | character varying(16)
online | smallint
btree

testdb=> \d z2test_x_idx;
Index "z2test_x_idx"
Column | Type
--------+-----------------------
x | character varying(16)
btree

Rowcounts are:

testdb=> select count(*) from bs ; select count(*) from z2test ;
count
-------
25376
(1 row)

count
-------
19329
(1 row)

The bs table have many other columns besides the gid one, the z2test
table only has the x column.

How can i speed this query up?
It never scans by the indexes.
I know it's a lot of iterations anyway i do it, but this is too damn
slow.

I can't profile anything at this box, because it's in production state,
but if you really want me to, i'll do it tomorrow on another box.

Magnus

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Programmer/Networker [|] Magnus Naeslund
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-


From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: "Magnus Naeslund(f)" <mag(at)fbab(dot)net>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Damn slow query
Date: 2002-10-09 22:08:33
Message-ID: 20021009150709.L7789-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Wed, 9 Oct 2002, Magnus Naeslund(f) wrote:

> Hello, i've got this query that's really slow...
> Figure this:
>
> testdb=> select now() ; select gid from bs where gid not in ( select x
> from z2test ); select now();

Per FAQ suggestion, try something like
select gid from bs where not exists (select * from z2test where
z2test.x=bs.gid);
to see if it is faster.


From: Joe Conway <mail(at)joeconway(dot)com>
To: "Magnus Naeslund(f)" <mag(at)fbab(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Damn slow query
Date: 2002-10-09 22:08:47
Message-ID: 3DA4A8EF.3040004@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Naeslund(f) wrote:
> Hello, i've got this query that's really slow...
> Figure this:
>
> testdb=> select now() ; select gid from bs where gid not in ( select x
> from z2test ); select now();

"IN (subselect)" is notoriously slow (in fact it is an FAQ). Can you rewrite
this as:

select b.gid from bs b where not exists (select 1 from z2test z where z.x =
b.gid);

or possibly:

select b.gid from bs b left join z2test z on z.x = b.gid where z.x IS NULL;

HTH,

Joe


From: "Magnus Naeslund(f)" <mag(at)fbab(dot)net>
To: "Joe Conway" <mail(at)joeconway(dot)com>, "Stephan Szabo" <sszabo(at)megazone23(dot)bigpanda(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Damn slow query
Date: 2002-10-09 22:30:08
Message-ID: 042001c26fe3c798dc0$f80c0a0a@mnd
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joe Conway <mail(at)joeconway(dot)com> wrote:
> "IN (subselect)" is notoriously slow (in fact it is an FAQ). Can you
> rewrite this as:
>

...

Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> wrote:
> Per FAQ suggestion, try something like

...

Thanks alot, below are the results on your suggestions, quite an
dramatic differance (but this is another box, faster, and running 7.3b2
so the 45 minutes doesn't hold here, but it took more than 10 minutes
before i stopped the original query).

Is this an todo item, or should every user figure this out (yeah i know
i should have read the FAQ when it went so totally bad).
The NOT IN it seems quite natural here, but then again, i don't think as
the db as you do :)

mag=> \timing
Timing is on.
mag=> explain analyze select count(gid) from bs where not exists (
select * from z2test where z2test.x=bs.gid );
Aggregate (cost=129182.18..129182.18 rows=1 width=9) (actual
time=590.90..590.90 rows=1 loops=1)
-> Seq Scan on bs (cost=0.00..129150.46 rows=12688 width=9) (actual
time=42.57..590.46 rows=524 loops=1)
Filter: (NOT (subplan))
SubPlan
-> Index Scan using z2temp_x_idx on z2test (cost=0.00..5.07
rows=1 width=9) (actual time=0.02..0.02 rows=1 loops=25376)
Index Cond: (x = $0)
Total runtime: 591.01 msec

Time: 592.25 ms

mag=> EXPLAIN analyze select count(b.gid) from bs b left join z2test z
on z.x = b.gid where z.x IS NULL;
Aggregate (cost=1703.65..1703.65 rows=1 width=18) (actual
time=370.31..370.31 rows=1 loops=1)
-> Hash Join (cost=346.61..1640.21 rows=25376 width=18) (actual
time=75.45..369.91 rows=524 loops=1)
Hash Cond: ("outer".gid = "inner".x)
Filter: ("inner".x IS NULL)
-> Seq Scan on bs b (cost=0.00..595.76 rows=25376 width=9)
(actual time=0.01..34.20 rows=25376 loops=1)
-> Hash (cost=298.29..298.29 rows=19329 width=9) (actual
time=43.82..43.82 rows=0 loops=1)
-> Seq Scan on z2test z (cost=0.00..298.29 rows=19329
width=9) (actual time=0.02..22.69 rows=19329 loops=1)
Total runtime: 370.42 msec

Time: 371.90 ms
mag=>

Magnus


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: "Magnus Naeslund(f)" <mag(at)fbab(dot)net>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Damn slow query
Date: 2002-10-09 22:33:40
Message-ID: 200210092233.g99MXeH07489@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Naeslund(f) wrote:
> Joe Conway <mail(at)joeconway(dot)com> wrote:
> > "IN (subselect)" is notoriously slow (in fact it is an FAQ). Can you
> > rewrite this as:
> >
>
> ...
>
> Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> wrote:
> > Per FAQ suggestion, try something like
>
> ...
>
> Thanks alot, below are the results on your suggestions, quite an
> dramatic differance (but this is another box, faster, and running 7.3b2
> so the 45 minutes doesn't hold here, but it took more than 10 minutes
> before i stopped the original query).
>
> Is this an todo item, or should every user figure this out (yeah i know
> i should have read the FAQ when it went so totally bad).
> The NOT IN it seems quite natural here, but then again, i don't think as
> the db as you do :)

We already have a TODO item:

* Allow Subplans to use efficient joins(hash, merge) with upper variable

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: "Magnus Naeslund(f)" <mag(at)fbab(dot)net>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "Joe Conway" <mail(at)joeconway(dot)com>, "Stephan Szabo" <sszabo(at)megazone23(dot)bigpanda(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Damn slow query
Date: 2002-10-09 22:39:44
Message-ID: 043101c26fe4$c3eb4e80$f80c0a0a@mnd
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
>
> We already have a TODO item:
>
> * Allow Subplans to use efficient joins(hash, merge) with upper
> variable

Cool.
One thing to note here is that the JOIN query that Joe suggested is both
faster than the subselect thing (no suprise) but also don't care if
z2test has an index on it or not.
The subselect query started taking huge amount of time again if i
dropped the z2test_x_idx ...

So if the todo could somehow figure out that that subselect should be an
JOIN instead of an NOT EXISTS query, that would be great, because the
index on z2test isn't that super-obvious (i think, because i know the
data is tiny).

Magnus

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Programmer/Networker [|] Magnus Naeslund
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-


From: Joe Conway <mail(at)joeconway(dot)com>
To: "Magnus Naeslund(f)" <mag(at)fbab(dot)net>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Damn slow query
Date: 2002-10-09 22:45:41
Message-ID: 3DA4B195.9020503@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Naeslund(f) wrote:
> One thing to note here is that the JOIN query that Joe suggested is both
> faster than the subselect thing (no suprise) but also don't care if
> z2test has an index on it or not.

It's worth noting though that JOIN is not always the fastest method. I've
found situations where NOT EXISTS was significantly faster than the LEFT JOIN
method (although both are usually orders of magnatude faster than NOT IN).

Joe