Re: Fixing matching of boolean index columns to sort ordering

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Fixing matching of boolean index columns to sort ordering
Date: 2016-12-13 05:08:04
Message-ID: 1788.1481605684@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The attached patch addresses the complaint raised in
/message-id/CAHt_Luuao4gd6De61GryK=2ff-MTgHzjqffdjz02uSdVqYmKKQ@mail.gmail.com

namely, that if you have an index on, say, integer columns i and j,
then the planner will figure out that it can use an indexscan with
no additional sort for a query like
select * from tab where i = 42 order by j;
but the same sort of thing does not work when the first column is
a boolean. You would think that this query is entirely isomorphic
to the one above:
select * from tab where b = true order by j;
but it isn't, because in expression preprocessing we simplify that to
select * from tab where b order by j;
Now, there's no equality condition so no EquivalenceClass is created
containing "b", and it's the presence of the EquivalenceClass that
drives the code that recognizes that the first index column can be
ignored while deciding what sort order the index produces.

The patch fixes that through the expedient of matching boolean index
columns to the restriction clauses for "tab", and when it finds a
match, acting as though we'd found a match to an EquivalenceClass
containing a constant. This is pretty ugly, but no more so than
several other existing special cases for boolean index columns.

Those special cases would largely go away if we were to canonicalize
"WHERE b" into "WHERE b = true" rather than the reverse, so you might
reasonably ask why we don't do that. I've asked myself that every time
I had to add another one of these special cases :-(, but the answer is
the same every time: where would you stop? Every WHERE clause is a
boolean expression, so there's no principled reason why such a rule
wouldn't result in canonicalizing e.g. "i = 42" into "(i = 42) = true",
wreaking havoc on every other optimization we have. Restricting it
to only apply to simple boolean columns is no answer because (a) indexes
can be on boolean expressions not just simple columns, and (b) part
of the point of the canonicalization is to make logically-equivalent
expressions look alike, so declining to apply it in some cases would
ruin that.

So, for better or worse, our approach is to simplify out "= true"
and then do whatever pushups we have to do at lower levels to keep
useful cases working nicely. This is another such pushup.

I'll add this to the next commitfest --- it could use some review
to see if I missed anything.

regards, tom lane

Attachment Content-Type Size
improve-boolean-index-matching.patch text/x-diff 7.4 KB

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing matching of boolean index columns to sort ordering
Date: 2016-12-13 15:18:38
Message-ID: CAKFQuwacaXxBj3SMAaLAgdK6VoYfDnp87zd2QbuZHV8iKU47_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 12, 2016 at 10:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Every WHERE clause is a
>
boolean expression, so there's no principled reason why such a rule
> wouldn't result in canonicalizing e.g. "i = 42" into "(i = 42) = true",
> wreaking havoc on every other optimization we have. Restricting it
> to only apply to simple boolean columns is no answer because (a) indexes
> can be on boolean expressions not just simple columns, and (b) part
> of the point of the canonicalization is to make logically-equivalent
> expressions look alike, so declining to apply it in some cases would
> ruin that.
>

​Given our reliance on operators in indexing a canonicalization ​rule that
says all boolean yielding expressions must contain an operator would yield
the desired result. "i = 42" already has an operator so no further
normalization is necessary to make it conform to such a rule.

The indexing concern may still come into play here, I'm not familiar enough
with indexes on column lists versus indexes on expressions to know off the
top of my head. The definition of "looking alike" seems like it would be
met since all such expression would look alike in having an operator.

That said, not adding "= true" is more visually appealing - though given
all of the other things we do in ruleutils this seems like a minor addition.

David J.


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing matching of boolean index columns to sort ordering
Date: 2017-01-13 07:40:13
Message-ID: CAB7nPqROquR64Ao3CT7Qcac71bizqy6ut0b_C+NTWm28rEE-pA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 14, 2016 at 12:18 AM, David G. Johnston
<david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> On Mon, Dec 12, 2016 at 10:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>> Every WHERE clause is a
>>
>> boolean expression, so there's no principled reason why such a rule
>> wouldn't result in canonicalizing e.g. "i = 42" into "(i = 42) = true",
>> wreaking havoc on every other optimization we have. Restricting it
>> to only apply to simple boolean columns is no answer because (a) indexes
>> can be on boolean expressions not just simple columns, and (b) part
>> of the point of the canonicalization is to make logically-equivalent
>> expressions look alike, so declining to apply it in some cases would
>> ruin that.
>
> Given our reliance on operators in indexing a canonicalization rule that
> says all boolean yielding expressions must contain an operator would yield
> the desired result. "i = 42" already has an operator so no further
> normalization is necessary to make it conform to such a rule.
>
> The indexing concern may still come into play here, I'm not familiar enough
> with indexes on column lists versus indexes on expressions to know off the
> top of my head. The definition of "looking alike" seems like it would be
> met since all such expression would look alike in having an operator.
>
> That said, not adding "= true" is more visually appealing - though given all
> of the other things we do in ruleutils this seems like a minor addition.

I have spent a couple of hours eye-balling this patch. There are not
that many users with indexes including a boolean column in its
definition... Still as this patch pushes down an index scan and avoids
an order by relying on the index scan to get things in the right order
it looks definitely right to make things better if we can. So that
seems worth it to me, even if that's adding a new extra
boolean-related optimization.

And actually, contrary to what is mentioned upthread, the planner is
not able to avoid a sort phase if other data types are used, say:
=# create table foo (a int, b int);
CREATE TABLE
=# create index on foo (a, b);
CREATE INDEX
=# explain (costs off) select * from foo where a = 1 order by b limit 10;
QUERY PLAN
----------------------------------------------------
Limit
-> Sort
Sort Key: b
-> Bitmap Heap Scan on foo
Recheck Cond: (a = 1)
-> Bitmap Index Scan on foo_a_b_idx
Index Cond: (a = 1)
(7 rows)
In this case, the sorting on column b should not be necessary as it
could be possible to rely on the index scan to get the elements
already sorted. Maybe I am missing something?
--
Michael


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing matching of boolean index columns to sort ordering
Date: 2017-01-13 13:29:43
Message-ID: 31684.1484314183@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> And actually, contrary to what is mentioned upthread, the planner is
> not able to avoid a sort phase if other data types are used, say:
> =# create table foo (a int, b int);
> CREATE TABLE
> =# create index on foo (a, b);
> CREATE INDEX
> =# explain (costs off) select * from foo where a = 1 order by b limit 10;

No, there's a difference between "not able to" and "chooses not to".
In this example case, it just thinks a bitmap scan is cheaper than
an ordered scan:

regression=# explain select * from foo where a = 1 order by b limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------
Limit (cost=15.10..15.13 rows=10 width=8)
-> Sort (cost=15.10..15.13 rows=11 width=8)
Sort Key: b
-> Bitmap Heap Scan on foo (cost=4.24..14.91 rows=11 width=8)
Recheck Cond: (a = 1)
-> Bitmap Index Scan on foo_a_b_idx (cost=0.00..4.24 rows=11 width=0)
Index Cond: (a = 1)
(7 rows)

regression=# set enable_bitmapscan to 0;
SET
regression=# explain select * from foo where a = 1 order by b limit 10;
QUERY PLAN
------------------------------------------------------------------------------------
Limit (cost=0.15..33.06 rows=10 width=8)
-> Index Only Scan using foo_a_b_idx on foo (cost=0.15..36.35 rows=11 width=8)
Index Cond: (a = 1)
(3 rows)

The problem with the boolean-column case is it fails to recognize
that the index matches at all.

regards, tom lane


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing matching of boolean index columns to sort ordering
Date: 2017-01-14 07:02:17
Message-ID: CAB7nPqSwSuA2n9EJKpEsyaDJcMMXckCpyvQwRUedA5fTq2RgKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 13, 2017 at 10:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
>> And actually, contrary to what is mentioned upthread, the planner is
>> not able to avoid a sort phase if other data types are used, say:
>> =# create table foo (a int, b int);
>> CREATE TABLE
>> =# create index on foo (a, b);
>> CREATE INDEX
>> =# explain (costs off) select * from foo where a = 1 order by b limit 10;
>
> No, there's a difference between "not able to" and "chooses not to".
> In this example case, it just thinks a bitmap scan is cheaper than
> an ordered scan:
>
> The problem with the boolean-column case is it fails to recognize
> that the index matches at all.

Bah. I was sure I was missing something, still I would have thought
that the index scan is cheaper than the bitmap index scan with ORDER
BY. As far as I can see, this patch is not the most elegant thing, but
it has value. So marked as "ready for committer".
--
Michael


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing matching of boolean index columns to sort ordering
Date: 2017-01-15 19:11:14
Message-ID: 15548.1484507474@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> Bah. I was sure I was missing something, still I would have thought
> that the index scan is cheaper than the bitmap index scan with ORDER
> BY. As far as I can see, this patch is not the most elegant thing, but
> it has value. So marked as "ready for committer".

Pushed, thanks for the review.

regards, tom lane