Re: BUG #17618: unnecessary filter column <> text even after adding index

Lists: pgsql-bugs
From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: sindysenorita(at)gmail(dot)com
Subject: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-09-19 14:28:23
Message-ID: 17618-7a2240bfaa7e84ae@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 17618
Logged by: Sindy Senorita
Email address: sindysenorita(at)gmail(dot)com
PostgreSQL version: 13.7
Operating system: Ubuntu
Description:

Hi, I'm not sure if this is a bug or feature, but definitely not what I've
expected

So I have a table with "status" column which can contains 'valid',
'invalid', 'pending', 'unknown'.
A very simple table

CREATE TABLE public.test (
id varchar NOT NULL,
status varchar NOT NULL,
CONSTRAINT test__pkey PRIMARY KEY (id)
)
CREATE INDEX pending_test_4 ON public.test USING btree ((((status)::text <>
'invalid'::text)));

notice that I've created an index to guide statuses that is not 'invalid
my query is:
SELECT * FROM test WHERE status != 'invalid'

When I run explain analyze on that with SET enable_seqscan = off, I got
QUERY PLAN
|
------------------------------------------------------------------------------------------------------------------------+
Bitmap Heap Scan on test (cost=4.62..8.37 rows=120 width=160) (actual
time=0.088..0.134 rows=117 loops=1) |
Filter: ((status)::text <> 'invalid'::text)
|
Heap Blocks: exact=3
|
-> Bitmap Index Scan on pending_test_4 (cost=0.00..4.59 rows=60 width=0)
(actual time=0.073..0.073 rows=117 loops=1)|
Index Cond: (((status)::text <> 'invalid'::text) = true)
|
Planning Time: 0.222 ms
|
Execution Time: 0.172 ms
|

The plan has used the index condition just right, but it still perform
aditional bitmap heap scan just to filter for a clause that exactly match
the index. And worse, it double the query cost
My questions are:
1. Is this a bug? or intended feature by design? If it is by design, I'd be
very happy to learn the rationale behind it.
2. Is there any way to skip/avoid the additional bitmap scan?
3. Could there be a better solution for my query. Suppose that the variants
of the status is unknown so query SELECT .. WHERE STATUS IN (all status
beside 'invalid') is not possible

Many thanks!
Sindy


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: sindysenorita(at)gmail(dot)com
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-09-19 15:24:12
Message-ID: 349136.1663601052@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> When I run explain analyze on that with SET enable_seqscan = off, I got
> QUERY PLAN
> |
> ------------------------------------------------------------------------------------------------------------------------+
> Bitmap Heap Scan on test (cost=4.62..8.37 rows=120 width=160) (actual
> time=0.088..0.134 rows=117 loops=1) |
> Filter: ((status)::text <> 'invalid'::text)
> |
> Heap Blocks: exact=3
> |
> -> Bitmap Index Scan on pending_test_4 (cost=0.00..4.59 rows=60 width=0)
> (actual time=0.073..0.073 rows=117 loops=1)|
> Index Cond: (((status)::text <> 'invalid'::text) = true)
> |
> Planning Time: 0.222 ms
> |
> Execution Time: 0.172 ms
> |

This is exactly what is expected; it's not a bug.

> The plan has used the index condition just right, but it still perform
> aditional bitmap heap scan just to filter for a clause that exactly match
> the index. And worse, it double the query cost

The filter condition is required because the bitmap produced by the index
can be lossy, ie it might identify more rows than actually satisfy the
condition. BitmapHeapNext will only actually apply the condition if
the index reports that that happened, so in practice for this sort of
query the filter condition probably never gets rechecked.

The "doubled cost" has nothing whatever to do with the filter condition;
most of that is concerned with the number of disk pages touched. It
might help you to read

/docs/current/using-explain.html

regards, tom lane


From: Sindy Senorita <sindysenorita(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-09-19 15:45:56
Message-ID: CAKU5B4FLa9UNdKFfdkLJYdZ2BtZT=AAN1Q6P9PtcDHjEvKu5eQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

I see, quick google search takes me to BitmapHeapNext implementation here
https://doxygen.postgresql.org/nodeBitmapHeapscan_8c_source.html#l00072. I
hope this is what you mean
Noted. Thanks for the explanation

Cheers

On Mon, Sep 19, 2022 at 10:24 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> > When I run explain analyze on that with SET enable_seqscan = off, I got
> > QUERY PLAN
>
> > |
> >
> ------------------------------------------------------------------------------------------------------------------------+
> > Bitmap Heap Scan on test (cost=4.62..8.37 rows=120 width=160) (actual
> > time=0.088..0.134 rows=117 loops=1) |
> > Filter: ((status)::text <> 'invalid'::text)
>
> > |
> > Heap Blocks: exact=3
>
> > |
> > -> Bitmap Index Scan on pending_test_4 (cost=0.00..4.59 rows=60
> width=0)
> > (actual time=0.073..0.073 rows=117 loops=1)|
> > Index Cond: (((status)::text <> 'invalid'::text) = true)
>
> > |
> > Planning Time: 0.222 ms
>
> > |
> > Execution Time: 0.172 ms
>
> > |
>
>
> This is exactly what is expected; it's not a bug.
>
> > The plan has used the index condition just right, but it still perform
> > aditional bitmap heap scan just to filter for a clause that exactly match
> > the index. And worse, it double the query cost
>
> The filter condition is required because the bitmap produced by the index
> can be lossy, ie it might identify more rows than actually satisfy the
> condition. BitmapHeapNext will only actually apply the condition if
> the index reports that that happened, so in practice for this sort of
> query the filter condition probably never gets rechecked.
>
> The "doubled cost" has nothing whatever to do with the filter condition;
> most of that is concerned with the number of disk pages touched. It
> might help you to read
>
> /docs/current/using-explain.html
>
> regards, tom lane
>


From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: sindysenorita(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-09-19 15:53:29
Message-ID: CAKFQuwaWi8MLBWazkXwM1gp6_e2G7LoNee_Za8ZKxfo-1K+2kA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Mon, Sep 19, 2022 at 8:15 AM PG Bug reporting form <
noreply(at)postgresql(dot)org> wrote:

>
> CREATE TABLE public.test (
> id varchar NOT NULL,
> status varchar NOT NULL,
> CONSTRAINT test__pkey PRIMARY KEY (id)
> )
>

> CREATE INDEX pending_test_4 ON public.test USING btree ((((status)::text <>
> 'invalid'::text)));
>
> notice that I've created an index to guide statuses that is not 'invalid
> my query is:
> SELECT * FROM test WHERE status != 'invalid'
>

Your index contains none of the fields in the original table so the system
can never answer your inquiry using only the index.

You may find this to be informative:

/docs/devel/indexes-index-only-scans.html

Usually on a "status" field doing a few partial indexes gets you the best
result. The more statuses you need to be concerned about the more likely
just scanning the table is going to win out in performance. But if you do
only care about a few the smaller index size will be of benefit to keep
them in memory. A covering index may be of use as well though for rapidly
changing statuses tuple visibility is going to be a challenge. In short,
you seem to be providing a non-real situation and asking for advice that is
situational in nature.

David J.


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: sindysenorita(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-09-20 17:12:03
Message-ID: CAMkU=1wqvsMwnQTrQ0zoRo2AkYhuEY-RuzpaXrvz7iNu38CVqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Mon, Sep 19, 2022 at 11:24 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>
> > The plan has used the index condition just right, but it still perform
> > aditional bitmap heap scan just to filter for a clause that exactly match
> > the index. And worse, it double the query cost
>
> The filter condition is required because the bitmap produced by the index
> can be lossy, ie it might identify more rows than actually satisfy the
> condition. BitmapHeapNext will only actually apply the condition if
> the index reports that that happened, so in practice for this sort of
> query the filter condition probably never gets rechecked.
>

You are describing a Recheck, but attributing its properties to a Filter.
I think that the filter condition always gets checked. It does so if I
create a tattler function which raises a notice every time it is called and
then build an index on a Boolean expression over that function (but of
course that inevitably does change the code paths a bit).

I don't know about being a bug, but it is at least a mild mal-feature that
boolean index columns/expressions can't be dealt with better, and have to
be handed in a filter rather than in a recheck.

Cheers,

Jeff


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: sindysenorita(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-09-20 19:14:34
Message-ID: 739953.1663701274@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 핫SQL : Postg토토 핫SQL 메일 링리스트 : 2022-09-20 이후 PGSQL-BUGS 19:14

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> You are describing a Recheck, but attributing its properties to a Filter.

You're right of course, momentary brain fade on my part.

> I don't know about being a bug, but it is at least a mild mal-feature that
> boolean index columns/expressions can't be dealt with better, and have to
> be handed in a filter rather than in a recheck.

Yeah ... looking at create_bitmap_scan_plan, I see that it does this:

/*
* When dealing with special operators, we will at this point have
* duplicate clauses in qpqual and bitmapqualorig. We may as well drop
* 'em from bitmapqualorig, since there's no point in making the tests
* twice.
*/
bitmapqualorig = list_difference_ptr(bitmapqualorig, qpqual);

I wonder if that isn't backwards, ie we should prefer to put duplicates
in bitmapqualorig (the recheck condition) instead of qpqual (the filter).
If my head is screwed on correctly today, that should allow us to skip
checking the condition much of the time, and the skip would be safe
if the index is correctly asserting that no recheck is needed.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: sindysenorita(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-09-20 23:55:59
Message-ID: 946195.1663718159@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

I wrote:
> I wonder if that isn't backwards, ie we should prefer to put duplicates
> in bitmapqualorig (the recheck condition) instead of qpqual (the filter).
> If my head is screwed on correctly today, that should allow us to skip
> checking the condition much of the time, and the skip would be safe
> if the index is correctly asserting that no recheck is needed.

Flipping the removal around has the effect I expected on the plan shape,
but some of the regression test queries now give the wrong answer, so
there's something faulty about that analysis.

regards, tom lane


From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, sindysenorita(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-09-23 13:29:39
Message-ID: CAMbWs4-o-VLk3gXqz7W7ZS384S5mvW5noB8f_BKzU_c77DBjXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Wed, Sep 21, 2022 at 7:56 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Flipping the removal around has the effect I expected on the plan shape,
> but some of the regression test queries now give the wrong answer, so
> there's something faulty about that analysis.

I think we may have a minor mistake when constructing the qpqual list in
create_bitmap_scan_plan. The qpqual list is supposed to be scan_clauses
minus indexquals. So we check each scan clause to see if it is redundant
with any indexqual, by using equal, checking EC or using
predicate_implied_by.

Note that the indexqual here may not be the form that has been going
through constant folding. Such as in this case with a boolean index, the
indexqual would be converted to 'indexkey expression = TRUE' by
match_boolean_index_clause. And that may make us fail to tell the scan
clause is redundant.

The comment of predicate_implied_by() says

* The top-level List structure of each list corresponds to an AND list.
* We assume that eval_const_expressions() has been applied and so there
* are no un-flattened ANDs or ORs (e.g., no AND immediately within an AND,
* including AND just below the top-level List structure).

So I think we need to run eval_const_expressions on indexquals before we
check for duplicate clauses, something like attached.

Thanks
Richard

Attachment Content-Type Size
v1-0001-constant-folding-for-indexquals-in-bitmap-scan.patch application/octet-stream 1015 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, sindysenorita(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-09-23 14:10:09
Message-ID: 2100290.1663942209@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> So I think we need to run eval_const_expressions on indexquals before we
> check for duplicate clauses, something like attached.

[ squint... ] Surely that was done long before we ever get here?

regards, tom lane


From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, sindysenorita(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-09-23 23:54:02
Message-ID: CAMbWs4-5CtCGx_nbzfPFTtDCXjmtCOR3vb-OfK4eNEAKwLQPSQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Fri, Sep 23, 2022 at 10:10 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> > So I think we need to run eval_const_expressions on indexquals before we
> > check for duplicate clauses, something like attached.
>
> [ squint... ] Surely that was done long before we ever get here?

We should have already done that long before. It seems afterwards we may
do additional transformation on indexquals. In this case with a boolean
index, I can see we convert the indexqual to form 'indexkey = TRUE' in
match_boolean_index_clause.

Thanks
Richard


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, sindysenorita(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-09-24 00:04:51
Message-ID: 2174559.1663977891@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> On Fri, Sep 23, 2022 at 10:10 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
>>> So I think we need to run eval_const_expressions on indexquals before we
>>> check for duplicate clauses, something like attached.

>> [ squint... ] Surely that was done long before we ever get here?

> We should have already done that long before. It seems afterwards we may
> do additional transformation on indexquals. In this case with a boolean
> index, I can see we convert the indexqual to form 'indexkey = TRUE' in
> match_boolean_index_clause.

Of course, but what about that transformation would introduce something
that eval_const_expressions could simplify? (Actually, now that I think
about it, I think eval_const_expressions would break it completely because
it'd re-canonicalize the expression as just 'indexkey', exactly what we
don't want here.) In any case, if there's something between the
eval_const_expressions pass and createplan.c that introduces simplifiable
expressions, I think it's on that something's head to re-simplify; we
don't want to do something so expensive in a main code path if it's
usually going to be a complete waste.

regards, tom lane


From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, sindysenorita(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-09-24 00:06:06
Message-ID: CAMbWs4878nvTT=D6eECLRJqrz3+FifKr6XRSonrP0pyqmSVCLw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Fri, Sep 23, 2022 at 9:29 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:

> So I think we need to run eval_const_expressions on indexquals before we
> check for duplicate clauses, something like attached.
>

BTW, (revise to the v1 patch), if this is the right way to go, we should
do that before the foreach loop, so that we need to run
eval_const_expressions on indexquals only once rather than for each scan
clause.

Thanks
Richard


From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, sindysenorita(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-09-24 00:41:48
Message-ID: CAMbWs498iR-5gVbsS0DCmoeK4YNs0nny6xmW-eLvm=C-KXm-Ag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Sat, Sep 24, 2022 at 8:04 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> > We should have already done that long before. It seems afterwards we may
> > do additional transformation on indexquals. In this case with a boolean
> > index, I can see we convert the indexqual to form 'indexkey = TRUE' in
> > match_boolean_index_clause.
>
> Of course, but what about that transformation would introduce something
> that eval_const_expressions could simplify? (Actually, now that I think
> about it, I think eval_const_expressions would break it completely because
> it'd re-canonicalize the expression as just 'indexkey', exactly what we
> don't want here.) In any case, if there's something between the
> eval_const_expressions pass and createplan.c that introduces simplifiable
> expressions, I think it's on that something's head to re-simplify; we
> don't want to do something so expensive in a main code path if it's
> usually going to be a complete waste.

Yeah, I agree that running eval_const_expressions here is expensive.
Maybe we can just do the reverse transformation in
create_bitmap_scan_plan against what we do for boolean index in
match_boolean_index_clause?

I think it's necessary to re-simplify the indexquals here, otherwise we
may fail to compare scan_clauses to indexquals correctly.

Thanks
Richard


From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, sindysenorita(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-09-26 11:42:05
Message-ID: CAMbWs4_AmLq9ssUh+fu8KZk8QPhx21=+o3whspLNAhxY6T0Xzw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Sat, Sep 24, 2022 at 8:41 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:

>
> On Sat, Sep 24, 2022 at 8:04 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
>> > We should have already done that long before. It seems afterwards we may
>> > do additional transformation on indexquals. In this case with a boolean
>> > index, I can see we convert the indexqual to form 'indexkey = TRUE' in
>> > match_boolean_index_clause.
>>
>> Of course, but what about that transformation would introduce something
>> that eval_const_expressions could simplify? (Actually, now that I think
>> about it, I think eval_const_expressions would break it completely because
>> it'd re-canonicalize the expression as just 'indexkey', exactly what we
>> don't want here.) In any case, if there's something between the
>> eval_const_expressions pass and createplan.c that introduces simplifiable
>> expressions, I think it's on that something's head to re-simplify; we
>> don't want to do something so expensive in a main code path if it's
>> usually going to be a complete waste.
>
>
> Yeah, I agree that running eval_const_expressions here is expensive.
> Maybe we can just do the reverse transformation in
> create_bitmap_scan_plan against what we do for boolean index in
> match_boolean_index_clause?
>

Following this idea, I come up with v2 patch. Is this the right
direction to go?

Thanks
Richard

Attachment Content-Type Size
v2-0001-constant-folding-for-indexquals-in-bitmap-scan.patch application/octet-stream 4.4 KB

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, sindysenorita(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-11-02 07:46:30
Message-ID: CAMbWs49oBk4EfHC+44xkuAwQYnGH5pyKP40=FwZv9Zecx9=3Mg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Mon, Sep 26, 2022 at 7:42 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:

> On Sat, Sep 24, 2022 at 8:41 AM Richard Guo <guofenglinux(at)gmail(dot)com>
> wrote:
>
>> Yeah, I agree that running eval_const_expressions here is expensive.
>> Maybe we can just do the reverse transformation in
>> create_bitmap_scan_plan against what we do for boolean index in
>> match_boolean_index_clause?
>>
>
> Following this idea, I come up with v2 patch. Is this the right
> direction to go?
>

Update with v3 patch, nothing changes except fixes a test failure
spotted by cfbot.

Thanks
Richard

Attachment Content-Type Size
v3-0001-constant-folding-for-indexquals-in-bitmap-scan.patch application/octet-stream 5.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, sindysenorita(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-11-05 17:07:44
Message-ID: 2616649.1667668064@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> Update with v3 patch, nothing changes except fixes a test failure
> spotted by cfbot.

I think this is pretty close to usable, except that I don't believe
reusing simplify_boolean_equality this way is a great idea.
It does more than we need (surely the LHS-is-Const case cannot occur here)
and it has assumptions that I'm not sure hold --- particularly the bit
about !constisnull. I'd be inclined to just copy-and-paste the three or
four lines we need.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: sindysenorita(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-11-05 17:23:20
Message-ID: 2619371.1667669000@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

I wrote:
>> I wonder if that isn't backwards, ie we should prefer to put duplicates
>> in bitmapqualorig (the recheck condition) instead of qpqual (the filter).
>> If my head is screwed on correctly today, that should allow us to skip
>> checking the condition much of the time, and the skip would be safe
>> if the index is correctly asserting that no recheck is needed.

> Flipping the removal around has the effect I expected on the plan shape,
> but some of the regression test queries now give the wrong answer, so
> there's something faulty about that analysis.

BTW, after looking more closely I see my mistake. An example of the
sort of plan that fails with that change is

Sort
Sort Key: proname
-> Bitmap Heap Scan on pg_proc
- Filter: (proname ~~ 'RI\_FKey%del'::text)
+ Recheck Cond: (proname ~~ 'RI\_FKey%del'::text)
-> Bitmap Index Scan on pg_proc_proname_args_nsp_index
Index Cond: ((proname >= 'RI_FKey'::text) AND (proname < 'RI_FKez'::text))
(6 rows)

The difficulty here is pretty obvious: the original clause is stricter
than the index conditions generated from it. So even if the index
enforces the index conditions exactly, we still need to check the
original clause, and so it can't be relegated to the recheck field.
To improve this, we'd need to track which elements of bitmapqualorig
correspond exactly to index conditions, which we don't do ATM.

regards, tom lane


From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, sindysenorita(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-11-07 09:01:19
Message-ID: CAMbWs48DuHbqtAgMjpj5Jq_ubxR0+SS-urLuVi=9Dx5FTkJieA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Sun, Nov 6, 2022 at 1:07 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> > Update with v3 patch, nothing changes except fixes a test failure
> > spotted by cfbot.
>
> I think this is pretty close to usable, except that I don't believe
> reusing simplify_boolean_equality this way is a great idea.
> It does more than we need (surely the LHS-is-Const case cannot occur here)
> and it has assumptions that I'm not sure hold --- particularly the bit
> about !constisnull. I'd be inclined to just copy-and-paste the three or
> four lines we need.

Thanks for the suggestion. Yes, simplify_boolean_equality is doing more
than we need. I think here we just intend to handle indexquals of form
"indexkey = true/false", which seems can only come out from function
match_boolean_index_clause. From what this function does, we are sure
the constant input can be only on right (as you pointed out), and the
operator can only be BooleanEqualOperator. Also it seems the assumption
about !constisnull holds, as match_boolean_index_clause would not make a
clause with a constant-NULL input.

I've updated the patch according to the suggestions as in v4. Thanks
for reviewing this patch!

Thanks
Richard

Attachment Content-Type Size
v4-0001-constant-folding-for-indexquals-in-bitmap-scan.patch application/octet-stream 3.1 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, sindysenorita(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-11-07 21:06:33
Message-ID: 3495354.1667855193@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> I've updated the patch according to the suggestions as in v4. Thanks
> for reviewing this patch!

I was about ready to commit this when I re-read your initial comment
and realized that there's a second way to fix it. We can improve
predtest.c so that it understands that "x = true" implies "x" and
so on, whereupon the existing logic in create_bitmap_scan_plan
handles the case correctly. This is pretty nearly the same code as
in your v4, except that it's in a considerably less hot code path, plus
there's at least some chance that it could be useful for other purposes.
So I think I like this way better. Thoughts?

regards, tom lane

Attachment Content-Type Size
v5-0001-constant-folding-for-indexquals-in-bitmap-scan.patch text/x-diff 4.4 KB

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, sindysenorita(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-11-08 01:59:34
Message-ID: CAMbWs48XWRvRcUSRXVA0ciyP9-RWo=J755khaxiXifivy3zdcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Tue, Nov 8, 2022 at 5:06 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> > I've updated the patch according to the suggestions as in v4. Thanks
> > for reviewing this patch!
>
> I was about ready to commit this when I re-read your initial comment
> and realized that there's a second way to fix it. We can improve
> predtest.c so that it understands that "x = true" implies "x" and
> so on, whereupon the existing logic in create_bitmap_scan_plan
> handles the case correctly. This is pretty nearly the same code as
> in your v4, except that it's in a considerably less hot code path, plus
> there's at least some chance that it could be useful for other purposes.
> So I think I like this way better. Thoughts?

It works for me. predtest.c is a more common place so that there may be
other cases that can benefit from this change. Thanks for the new
patch!

Thanks
Richard


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, sindysenorita(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17618: unnecessary filter column <> text even after adding index
Date: 2022-11-08 15:37:13
Message-ID: 3758104.1667921833@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> It works for me. predtest.c is a more common place so that there may be
> other cases that can benefit from this change. Thanks for the new
> patch!

Pushed then.

regards, tom lane