possible bug in 8.4

Lists: pgsql-hackers
From: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
To: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: possible bug in 8.4
Date: 2008-12-18 17:15:22
Message-ID: E34BD4AD-E249-47F3-B0EB-5EF1A7C9C3ED@pointblue.com.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

create table a(a int not null);
insert into a(a) select generate_series(1,6000000);
create table b as select * from a where a%10 <> 0;
create index fooa on a(a);
alter table b alter column a set not null;
create index foob on b(a);

vacuum analyze verbose;

gj=# explain select a.a from a where a not in (select a from b);
QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on a (cost=99035.00..257874197565.00 rows=3000000 width=4)
Filter: (NOT (subplan))
SubPlan
-> Materialize (cost=99035.00..171493.00 rows=5400000 width=4)
-> Seq Scan on b (cost=0.00..75177.00 rows=5400000
width=4)
(5 rows)

that's absolutely humongous cost, and it really does take ages before
this thing finishes (had to kill it after an hour).

For change, same type of query (should return same thing)

gj=# explain analyze select a.a from a left join b on a.a=b.a where
b.a is null;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Merge Anti Join (cost=0.00..350302.50 rows=600000 width=4) (actual
time=0.534..10851.995 rows=600000 loops=1)
Merge Cond: (a.a = b.a)
-> Index Scan using fooa on a (cost=0.00..166209.78 rows=6000000
width=4) (actual time=0.232..3128.438 rows=6000000 loops=1)
-> Index Scan using foob on b (cost=0.00..149592.72 rows=5400000
width=4) (actual time=0.161..2778.569 rows=5400000 loops=1)
Total runtime: 10938.592 ms
(5 rows)

Time: 10939,414 ms


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: possible bug in 8.4
Date: 2008-12-18 17:45:42
Message-ID: 87y6ydl6m1.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl> writes:

> gj=# explain select a.a from a where a not in (select a from b);
> QUERY PLAN
> -------------------------------------------------------------------------
> Seq Scan on a (cost=99035.00..257874197565.00 rows=3000000 width=4)
> Filter: (NOT (subplan))
> SubPlan
> -> Materialize (cost=99035.00..171493.00 rows=5400000 width=4)
> -> Seq Scan on b (cost=0.00..75177.00 rows=5400000 width=4)
> (5 rows)
>
>
> that's absolutely humongous cost, and it really does take ages before this
> thing finishes (had to kill it after an hour).

I think Postgres can't do better because there could be a NULL in the
subquery. If there's a NULL in the subquery then no record would match.

Now your column is NOT NULL so Postgres could do better but AFAIK we don't
look at column constraints like NOT NULL when planning. Historically we
couldn't because we didn't have plan invalidation -- and the plan you posted
below with the Anti-Join is brand new in 8.4 -- so there is room for
improvement but it's not exactly a bug.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: possible bug in 8.4
Date: 2008-12-18 19:13:38
Message-ID: 18343.1229627618@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> I think Postgres can't do better because there could be a NULL in the
> subquery. If there's a NULL in the subquery then no record would match.

Yeah. NOT IN does not have the right semantics to become an antijoin.

FWIW, you can use a NOT EXISTS:

select a.a from a where not exists (select 1 from b where a.a = b.a);

In HEAD this should give the identical plan to the leftjoin/is null
hack.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: possible bug in 8.4
Date: 2008-12-18 19:32:06
Message-ID: 87tz91l1op.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> I think Postgres can't do better because there could be a NULL in the
>> subquery. If there's a NULL in the subquery then no record would match.
>
> Yeah. NOT IN does not have the right semantics to become an antijoin.

If we noticed that the columns in the subquery are all guaranteed to be not
null could we do it then?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: possible bug in 8.4
Date: 2008-12-18 19:46:04
Message-ID: 24646.1229629564@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Yeah. NOT IN does not have the right semantics to become an antijoin.

> If we noticed that the columns in the subquery are all guaranteed to be not
> null could we do it then?

I think you'd also have to know that the outer-query value isn't null,
plus assume that the comparison operator can't return null for two
non-nulls (but we already assume that for btree/hash equality I think).

As you said, this would never have been safe before plan invalidation,
but it might be doable now.

regards, tom lane


From: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: possible bug in 8.4
Date: 2008-12-18 22:17:32
Message-ID: E6D44CAD-A60F-46AB-A942-30577AB4509D@pointblue.com.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

all I know, is that the same query will work on 8.3 in reasonably
acceptable time frame.


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: possible bug in 8.4
Date: 2008-12-18 22:28:47
Message-ID: 87prjpkti8.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl> writes:

> all I know, is that the same query will work on 8.3 in reasonably acceptable
> time frame.

What plan do you get in 8.3?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: possible bug in 8.4
Date: 2008-12-18 22:36:21
Message-ID: 87k59xkt5m.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl> writes:

> all I know, is that the same query will work on 8.3 in reasonably acceptable
> time frame.
>

Because I see the exact same plan -- in fact with the exact same cost:

Welcome to psql 8.3.3, the PostgreSQL interactive terminal.

Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit

postgres=# explain select a.a from a where a not in (select a from b);
QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on a (cost=99035.00..257874197565.00 rows=3000000 width=4)
Filter: (NOT (subplan))
SubPlan
-> Materialize (cost=99035.00..171493.00 rows=5400000 width=4)
-> Seq Scan on b (cost=0.00..75177.00 rows=5400000 width=4)
(5 rows)

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Grzegorz Jaskiewicz" <gj(at)pointblue(dot)com(dot)pl>
Cc: "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: possible bug in 8.4
Date: 2008-12-18 23:10:57
Message-ID: 494A8421.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>> Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
> Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl> writes:
>
>> all I know, is that the same query will work on 8.3 in reasonably
> acceptable
>> time frame.
>>
>
> Because I see the exact same plan -- in fact with the exact same
cost:
>
>
> Welcome to psql 8.3.3, the PostgreSQL interactive terminal.

I'm seeing the exact same plan on 8.3.5, too.

-Kevin


From: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: possible bug in 8.4
Date: 2008-12-19 09:57:42
Message-ID: 24C7D056-28F5-406A-A58B-DC0F0BF4B538@pointblue.com.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


true, but as soon as I drop indices on both tables:

root=# explain analyze select a.a from a where a not in (select a from
b);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Seq Scan on a (cost=88677.00..187207.00 rows=3000000 width=4)
(actual time=22803.470..26473.039 rows=600000 loops=1)
Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on b (cost=0.00..75177.00 rows=5400000 width=4)
(actual time=9.232..15828.904 rows=5400000 loops=1)
Total runtime: 29357.267 ms
(5 rows)

root=# select version();
version
--------------------------------------------------------------------------------------------------------
PostgreSQL 8.3.5 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC)
3.4.6 20060404 (Red Hat 3.4.6-10)
(1 row)

I thought planner should be able to choose plan that costs less.


From: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
To: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: possible bug in 8.4
Date: 2008-12-19 10:50:32
Message-ID: CCFD1996-3933-4F8B-A3CE-1E9881DEDAA7@pointblue.com.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

and the plan on 8.4:
gj=# explain select a.a from a where a not in (select a from b);
QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on a (cost=99035.00..257874197565.00 rows=3000000 width=4)
Filter: (NOT (subplan))
SubPlan
-> Materialize (cost=99035.00..171493.00 rows=5400000 width=4)
-> Seq Scan on b (cost=0.00..75177.00 rows=5400000
width=4)
(5 rows)

This looks like a regression to me, or am I wrong ?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: possible bug in 8.4
Date: 2008-12-19 13:07:58
Message-ID: 6501.1229692078@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl> writes:
> Filter: (NOT (hashed subplan))
^^^^^^^^^^^^^^

If 8.3 does that, and 8.4 doesn't, it's most likely because you are
using different work_mem settings.

regards, tom lane


From: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: possible bug in 8.4
Date: 2008-12-19 14:51:25
Message-ID: 77D255AA-8E66-4D2E-AEDF-664FC36F7FC6@pointblue.com.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 2008-12-19, at 13:07, Tom Lane wrote:

> Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl> writes:
>> Filter: (NOT (hashed subplan))
> ^^^^^^^^^^^^^^
>
> If 8.3 does that, and 8.4 doesn't, it's most likely because you are
> using different work_mem settings.

you're right, as always :)

My point is, why is planner choosing plan on 8.4 that's obviously more
expensive ? even without sort_mem (work_mem) set to higher value ?
Obviously the plan is quite expensive, so probably sorting it on disc
- would be still cheaper.
The example is quite unrealistic, but I am trying different simple
things to test planner differences between these two versions, and see
if there's any improvement/regression.
This time I will set both work_mem to same value :)