Re: BUG #11500: PRIMARY KEY index not being used

Lists: Postg범퍼카 토토SQL : Postg범퍼카 토토SQL 메일 링리스트 : 2014-09-26 이후 PGSQL-BUGS 17:44
From: marko(at)joh(dot)to
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #11500: PRIMARY KEY index not being used
Date: 2014-09-26 08:02:11
Message-ID: 20140926080211.7639.68358@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 11500
Logged by: Marko Tiikkaja
Email address: marko(at)joh(dot)to
PostgreSQL version: 9.1.12
Operating system: Linux
Description:

Hi,

We've been observing a performance problem where a PRIMARY KEY index is not
being used. The problem looks like this:

pg2=#* explain analyze select * from events where eventid = 132685185 and
processed = 0;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
Index Scan using index_events_processed on events (cost=0.00..7.73 rows=1
width=106) (actual time=31.808..31.808 rows=0 loops=1)
Index Cond: (processed = 0)
Filter: (eventid = 132685185)
Total runtime: 31.852 ms
(4 rows)

pg2=#* explain analyze select * from events where eventid = 132685185 and
processed+0 = 0;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------
Index Scan using events_pkey on events (cost=0.00..12.38 rows=1 width=106)
(actual time=0.071..0.071 rows=0 loops=1)
Index Cond: (eventid = 132685185)
Filter: ((processed + 0) = 0)
Total runtime: 0.109 ms
(4 rows)

I'm guessing that's happening because the index on processed is smaller
(7GB, relpages=900880 vs 3.7GB, relpages=478225). The statistics say that
there are no rows where processed=0 (and it's not far from the truth), but
it's still a risky plan compared to the PK lookup. The index
index_events_processed is an index on events(processed), which should
probably be a partial index on WHERE processed = 0, but I thought I'd
report this plan anyway.

Any thoughts?


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Marko Tiikkaja <marko(at)joh(dot)to>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #11500: PRIMARY KEY index not being used
Date: 2014-09-26 12:02:43
Message-ID: CABRT9RB0_AhrhYQqDZtuvsqi22weHP9GcCKxLiwbOCR+Ymm1XQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Fri, Sep 26, 2014 at 11:02 AM, <marko(at)joh(dot)to> wrote:
> The statistics say that
> there are no rows where processed=0 (and it's not far from the truth), but
> it's still a risky plan compared to the PK lookup.

> Any thoughts?

PostgreSQL 9.0 introduced this optimization for greater/less operators:

> When looking up statistics for greater/less-than comparisons, if the
> comparison value is in the first or last histogram bucket, use an index
> (if available) to fetch the current actual column minimum or maximum.
> This greatly improves the accuracy of estimates for comparison values
> near the ends of the data range, particularly if the range is constantly
> changing due to addition of new data.

Not sure whether it's a good idea a bad idea, but perhaps a solution
is to expand this to equality lookups too?

Does using "WHERE processed <= 0" work around the problem? (Assuming
you don't have any negative numbers in this column).

> The index
> index_events_processed is an index on events(processed), which should
> probably be a partial index on WHERE processed = 0, but I thought I'd
> report this plan anyway.

I guess you would still have this problem, unless your new index
contains the eventid column.

Regards,
Marti


From: Marko Tiikkaja <marko(at)joh(dot)to>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #11500: PRIMARY KEY index not being used
Date: 2014-09-26 12:54:19
Message-ID: 542561FB.3050900@joh.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On 9/26/14 2:02 PM, Marti Raudsepp wrote:
> On Fri, Sep 26, 2014 at 11:02 AM, <marko(at)joh(dot)to> wrote:
>> The statistics say that
>> there are no rows where processed=0 (and it's not far from the truth), but
>> it's still a risky plan compared to the PK lookup.
>
>> Any thoughts?
>
> PostgreSQL 9.0 introduced this optimization for greater/less operators:
>
>> When looking up statistics for greater/less-than comparisons, if the
>> comparison value is in the first or last histogram bucket, use an index
>> (if available) to fetch the current actual column minimum or maximum.
>> This greatly improves the accuracy of estimates for comparison values
>> near the ends of the data range, particularly if the range is constantly
>> changing due to addition of new data.
>
> Not sure whether it's a good idea a bad idea, but perhaps a solution
> is to expand this to equality lookups too?

I'm not sure that's the right idea to be honest. The problem is that
the planner is taking a risk by using an index which could contain
(theoretically) any number of matching rows, instead of using the
primary key which is guaranteed to only contain 0 or 1 rows. Sure,
peeking into the index to see that there are indeed some processed=0
rows would probably discourage the planner from using it, but why bother?

> Does using "WHERE processed <= 0" work around the problem? (Assuming
> you don't have any negative numbers in this column).

I unfortunately already dropped the problematic index, so I can't answer
that.

.marko


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: marko(at)joh(dot)to
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #11500: PRIMARY KEY index not being used
Date: 2014-09-26 17:44:39
Message-ID: 8043.1411753479@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg범퍼카 토토SQL : Postg범퍼카 토토SQL 메일 링리스트 : 2014-09-26 이후 PGSQL-BUGS 17:44

marko(at)joh(dot)to writes:
> PostgreSQL version: 9.1.12

> We've been observing a performance problem where a PRIMARY KEY index is not
> being used. The problem looks like this:

In 9.2 and again in 9.3, we significantly changed the planner's modeling
of index access costs, with a view towards making better choices when
there are multiple plausible indexes to use. If you can reproduce this
misbehavior in 9.3 or later it would be worth looking into; but we're not
going to change 9.1's estimation rules at this late date. People running
on back branches tend to want plan stability in my experience.

regards, tom lane