Lists: | pgsql-bugs |
---|
From: | Jan-Ivar Mellingen <jan-ivar(dot)mellingen(at)alreg(dot)no> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Huge speed penalty using <>TRUE instead of =FALSE |
Date: | 2009-07-17 08:12:41 |
Message-ID: | 4A603279.3030205@alreg.no |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
One of our customers discovered that by replacing <>TRUE with =FALSE in
a query of a table containing 750.000 records reduced the query time
from about 12 seconds to about 60 milliseconds!
The problematic query looks like this:
SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE AND
Alarm_status='X' ORDER BY ID DESC
If it is changed to this it works as expected:
SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE AND
Alarm_status='X' ORDER BY ID DESC
After investigation (on a smaller dataset on my own database) I found
that the query was resulting in a sequential scan:
"explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE
AND Alarm_status='X' ORDER BY ID DESC"
"Sort (cost=49936.96..49936.96 rows=1 width=405) (actual
time=837.793..837.793 rows=0 loops=1)"
" Sort Key: id"
" Sort Method: quicksort Memory: 17kB"
" -> Seq Scan on alarmlogg (cost=0.00..49936.95 rows=1 width=405)
(actual time=837.782..837.782 rows=0 loops=1)"
" Filter: ((logg_avsluttet <> true) AND ((alarm_status)::text =
'X'::text))"
"Total runtime: 837.896 ms"
The modified query gave this result:
"explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE
AND Alarm_status='X' ORDER BY ID DESC"
"Sort (cost=8.36..8.37 rows=1 width=405) (actual time=0.032..0.032
rows=0 loops=1)"
" Sort Key: id"
" Sort Method: quicksort Memory: 17kB"
" -> Index Scan using i_alarmlogg_logg_avsluttet on alarmlogg
(cost=0.00..8.35 rows=1 width=405) (actual time=0.024..0.024 rows=0
loops=1)"
" Index Cond: (logg_avsluttet = false)"
" Filter: ((NOT logg_avsluttet) AND ((alarm_status)::text =
'X'::text))"
"Total runtime: 0.123 ms"
This is a dramatical difference, but I cannot understand why. In my head
"<>TRUE" should behave exactly the same as "=FALSE". This looks like a
bug to me, or am I overlooking something?
This was verified on PostgreSQL 8.3.7, both on Windows Xp and Ubuntu 8.10.
Some relevant details from the table definition:
CREATE TABLE alarmlogg
(
id serial NOT NULL,
alarm_status character varying(1) DEFAULT ''::character varying,
logg_avsluttet boolean DEFAULT false,
...
CONSTRAINT alarmlogg_pkey PRIMARY KEY (id)
)
CREATE INDEX i_alarmlogg_alarm_status
ON alarmlogg
USING btree
(alarm_status);
CREATE INDEX i_alarmlogg_logg_avsluttet
ON alarmlogg
USING btree
(logg_avsluttet);
Regards,
Jan-Ivar Mellingen
Securinet AS
From: | Mikael Krantz <mk(at)zigamorph(dot)se> |
---|---|
To: | Jan-Ivar Mellingen <jan-ivar(dot)mellingen(at)alreg(dot)no> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: Huge speed penalty using <>TRUE instead of =FALSE |
Date: | 2009-07-17 09:45:47 |
Message-ID: | 726863a30907170245p38f75a20rc594e77021750015@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
It might be that your column may be NULL as well as TRUE or FALSE. I
am no expert in this matter though.
/M
On Fri, Jul 17, 2009 at 10:12 AM, Jan-Ivar
Mellingen<jan-ivar(dot)mellingen(at)alreg(dot)no> wrote:
> One of our customers discovered that by replacing <>TRUE with =FALSE in
> a query of a table containing 750.000 records reduced the query time
> from about 12 seconds to about 60 milliseconds!
>
> The problematic query looks like this:
> SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE AND
> Alarm_status='X' ORDER BY ID DESC
>
> If it is changed to this it works as expected:
> SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE AND
> Alarm_status='X' ORDER BY ID DESC
>
> After investigation (on a smaller dataset on my own database) I found
> that the query was resulting in a sequential scan:
>
> "explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE
> AND Alarm_status='X' ORDER BY ID DESC"
> "Sort (cost=49936.96..49936.96 rows=1 width=405) (actual
> time=837.793..837.793 rows=0 loops=1)"
> " Sort Key: id"
> " Sort Method: quicksort Memory: 17kB"
> " -> Seq Scan on alarmlogg (cost=0.00..49936.95 rows=1 width=405)
> (actual time=837.782..837.782 rows=0 loops=1)"
> " Filter: ((logg_avsluttet <> true) AND ((alarm_status)::text =
> 'X'::text))"
> "Total runtime: 837.896 ms"
>
> The modified query gave this result:
> "explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE
> AND Alarm_status='X' ORDER BY ID DESC"
> "Sort (cost=8.36..8.37 rows=1 width=405) (actual time=0.032..0.032
> rows=0 loops=1)"
> " Sort Key: id"
> " Sort Method: quicksort Memory: 17kB"
> " -> Index Scan using i_alarmlogg_logg_avsluttet on alarmlogg
> (cost=0.00..8.35 rows=1 width=405) (actual time=0.024..0.024 rows=0
> loops=1)"
> " Index Cond: (logg_avsluttet = false)"
> " Filter: ((NOT logg_avsluttet) AND ((alarm_status)::text =
> 'X'::text))"
> "Total runtime: 0.123 ms"
>
> This is a dramatical difference, but I cannot understand why. In my head
> "<>TRUE" should behave exactly the same as "=FALSE". This looks like a
> bug to me, or am I overlooking something?
>
> This was verified on PostgreSQL 8.3.7, both on Windows Xp and Ubuntu 8.10.
>
> Some relevant details from the table definition:
> CREATE TABLE alarmlogg
> (
> id serial NOT NULL,
> alarm_status character varying(1) DEFAULT ''::character varying,
> logg_avsluttet boolean DEFAULT false,
> ...
> CONSTRAINT alarmlogg_pkey PRIMARY KEY (id)
> )
>
> CREATE INDEX i_alarmlogg_alarm_status
> ON alarmlogg
> USING btree
> (alarm_status);
>
> CREATE INDEX i_alarmlogg_logg_avsluttet
> ON alarmlogg
> USING btree
> (logg_avsluttet);
>
> Regards,
> Jan-Ivar Mellingen
> Securinet AS
>
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs
>
From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Cc: | "Jan-Ivar Mellingen" <jan-ivar(dot)mellingen(at)alreg(dot)no> |
Subject: | Re: Huge speed penalty using <>TRUE instead of =FALSE |
Date: | 2009-07-17 10:36:44 |
Message-ID: | 200907171336.44198.peter_e@gmx.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Friday 17 July 2009 11:12:41 Jan-Ivar Mellingen wrote:
> One of our customers discovered that by replacing <>TRUE with =FALSE in
> a query of a table containing 750.000 records reduced the query time
> from about 12 seconds to about 60 milliseconds!
> This is a dramatical difference, but I cannot understand why. In my head
> "<>TRUE" should behave exactly the same as "=FALSE". This looks like a
> bug to me, or am I overlooking something?
The planner just isn't that smart. The boolean type is a special case where
<> some_value implies = some_other_value, but this doesn't generalize well to
other data types. And the planner doesn't have a whole lot of data type
specific knowledge.
I think a better index definition might actually be on alarm_status, with a
partial index predicate on logg_avsluttet = false.
From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Cc: | Mikael Krantz <mk(at)zigamorph(dot)se>, "Jan-Ivar Mellingen" <jan-ivar(dot)mellingen(at)alreg(dot)no> |
Subject: | Re: Huge speed penalty using <>TRUE instead of =FALSE |
Date: | 2009-07-17 12:35:29 |
Message-ID: | 200907171535.29504.peter_e@gmx.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Friday 17 July 2009 12:45:47 Mikael Krantz wrote:
> It might be that your column may be NULL as well as TRUE or FALSE. I
> am no expert in this matter though.
Nulls also need to be considered when attempting to substitute purportedly
equivalent clauses. But in this case it wouldn't actually matter, because
WHERE foo <> TRUE
and
WHERE foo = false
would both omit the row if foo is null. Both expressions only return true if
foo has the value "false". But again, this is data type specific knowledge.
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jan-Ivar Mellingen <jan-ivar(dot)mellingen(at)alreg(dot)no> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: Huge speed penalty using <>TRUE instead of =FALSE |
Date: | 2009-07-17 14:11:49 |
Message-ID: | 29115.1247839909@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Jan-Ivar Mellingen <jan-ivar(dot)mellingen(at)alreg(dot)no> writes:
> One of our customers discovered that by replacing <>TRUE with =FALSE in
> a query of a table containing 750.000 records reduced the query time
> from about 12 seconds to about 60 milliseconds!
This is not a bug. The set of operators that are indexable is well
documented, and <> is not one of them.
regards, tom lane
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
Cc: | pgsql-bugs(at)postgresql(dot)org, Mikael Krantz <mk(at)zigamorph(dot)se>, "Jan-Ivar Mellingen" <jan-ivar(dot)mellingen(at)alreg(dot)no> |
Subject: | Re: Huge speed penalty using <>TRUE instead of =FALSE |
Date: | 2009-07-17 14:21:43 |
Message-ID: | 29292.1247840503@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> ... But again, this is data type specific knowledge.
Actually, now that I think about it, the planner already has
datatype-specific knowledge about boolean equality (see
simplify_boolean_equality). It would take just a few more lines of code
there to recognize "x <> true" and "x <> false" as additional variant
spellings of the generic "x" or "NOT x" constructs. Not sure if it's
worth the trouble though; how many people really write such things?
If you really wanted to take it to extremes, you could also reduce
cases like "x > false", but that's starting to get a bit silly.
regards, tom lane
From: | Greg Stark <stark(at)mit(dot)edu> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Peter Eisentraut <peter_e(at)gmx(dot)net>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>, Mikael Krantz <mk(at)zigamorph(dot)se>, Jan-Ivar Mellingen <jan-ivar(dot)mellingen(at)alreg(dot)no> |
Subject: | Re: Huge speed penalty using <>TRUE instead of =FALSE |
Date: | 2009-07-17 16:12:28 |
Message-ID: | FCA5FE46-3527-4068-987B-E07773818627@mit.edu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
I don't think it even has to be so specific. We should just always
rewrite bool <> bool into bool = NOT bool.
Hmm. That only has a 50/50 chance of creating an indexable clause.
Perhaps we could even rewrite it as "a = NOT b AND NOT a = b".
--
Greg
On 2009-07-17, at 3:21 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
>> ... But again, this is data type specific knowledge.
>
> Actually, now that I think about it, the planner already has
> datatype-specific knowledge about boolean equality (see
> simplify_boolean_equality). It would take just a few more lines of
> code
> there to recognize "x <> true" and "x <> false" as additional variant
> spellings of the generic "x" or "NOT x" constructs. Not sure if it's
> worth the trouble though; how many people really write such things?
>
> If you really wanted to take it to extremes, you could also reduce
> cases like "x > false", but that's starting to get a bit silly.
>
> regards, tom lane
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs
From: | Jan-Ivar Mellingen <jan-ivar(dot)mellingen(at)alreg(dot)no> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: Huge speed penalty using <>TRUE instead of =FALSE |
Date: | 2009-08-10 08:00:51 |
Message-ID: | 4A7FD3B3.70307@alreg.no |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg젠 토토SQL : Postg젠 토토SQL 메일 링리스트 : 2009-08-10 08:00 이후의 PGSQL-BUGS |
Jan-Ivar Mellingen skrev:
> One of our customers discovered that by replacing <>TRUE with =FALSE in
> a query of a table containing 750.000 records reduced the query time
> from about 12 seconds to about 60 milliseconds!
>
> The problematic query looks like this:
> SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE AND
> Alarm_status='X' ORDER BY ID DESC
>
> If it is changed to this it works as expected:
> SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE AND
> Alarm_status='X' ORDER BY ID DESC
>
> After investigation (on a smaller dataset on my own database) I found
> that the query was resulting in a sequential scan:
>
> "explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet <> TRUE
> AND Alarm_status='X' ORDER BY ID DESC"
> "Sort (cost=49936.96..49936.96 rows=1 width=405) (actual
> time=837.793..837.793 rows=0 loops=1)"
> " Sort Key: id"
> " Sort Method: quicksort Memory: 17kB"
> " -> Seq Scan on alarmlogg (cost=0.00..49936.95 rows=1 width=405)
> (actual time=837.782..837.782 rows=0 loops=1)"
> " Filter: ((logg_avsluttet <> true) AND ((alarm_status)::text =
> 'X'::text))"
> "Total runtime: 837.896 ms"
>
> The modified query gave this result:
> "explain analyze SELECT * FROM AlarmLogg WHERE Logg_avsluttet = FALSE
> AND Alarm_status='X' ORDER BY ID DESC"
> "Sort (cost=8.36..8.37 rows=1 width=405) (actual time=0.032..0.032
> rows=0 loops=1)"
> " Sort Key: id"
> " Sort Method: quicksort Memory: 17kB"
> " -> Index Scan using i_alarmlogg_logg_avsluttet on alarmlogg
> (cost=0.00..8.35 rows=1 width=405) (actual time=0.024..0.024 rows=0
> loops=1)"
> " Index Cond: (logg_avsluttet = false)"
> " Filter: ((NOT logg_avsluttet) AND ((alarm_status)::text =
> 'X'::text))"
> "Total runtime: 0.123 ms"
>
> This is a dramatical difference, but I cannot understand why. In my head
> "<>TRUE" should behave exactly the same as "=FALSE". This looks like a
> bug to me, or am I overlooking something?
>
> This was verified on PostgreSQL 8.3.7, both on Windows Xp and Ubuntu 8.10.
>
> Some relevant details from the table definition:
> CREATE TABLE alarmlogg
> (
> id serial NOT NULL,
> alarm_status character varying(1) DEFAULT ''::character varying,
> logg_avsluttet boolean DEFAULT false,
> ...
> CONSTRAINT alarmlogg_pkey PRIMARY KEY (id)
> )
>
> CREATE INDEX i_alarmlogg_alarm_status
> ON alarmlogg
> USING btree
> (alarm_status);
>
> CREATE INDEX i_alarmlogg_logg_avsluttet
> ON alarmlogg
> USING btree
> (logg_avsluttet);
>
> Regards,
> Jan-Ivar Mellingen
> Securinet AS
>
>
>
Thanks to all who answered my question and helped me discover that I
forgot to consider the NULL values.
After a cup of coffee and a little bit of thinking it became clear that
<>TRUE is not the same as FALSE, and the NULLS are not in the index.
PostgreSQL is a great database, but it does not hurt to think a little
when using it...
Thank You all!
/Jan-Ivar
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-bugs(at)postgresql(dot)org, Mikael Krantz <mk(at)zigamorph(dot)se>, Jan-Ivar Mellingen <jan-ivar(dot)mellingen(at)alreg(dot)no> |
Subject: | Re: Huge speed penalty using <>TRUE instead of =FALSE |
Date: | 2009-08-10 14:30:58 |
Message-ID: | 603c8f070908100730x795fa9d2saf394fd327a10069@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Fri, Jul 17, 2009 at 10:21 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
>> ... But again, this is data type specific knowledge.
>
> Actually, now that I think about it, the planner already has
> datatype-specific knowledge about boolean equality (see
> simplify_boolean_equality). It would take just a few more lines of code
> there to recognize "x <> true" and "x <> false" as additional variant
> spellings of the generic "x" or "NOT x" constructs. Not sure if it's
> worth the trouble though; how many people really write such things?
I don't know, but there's probably somebody. I probably did it myself
a few times, when I was just starting out. If it's easy, it seems
worth doing. The problem with these things is that no matter how lame
it seems to do whatever-it-is, the pain when someone does is really
large... so adding a little bit of code to avoid that seems
worthwhile, at least to me.
> If you really wanted to take it to extremes, you could also reduce
> cases like "x > false", but that's starting to get a bit silly.
Probably that one is beyond even my tolerance.
...Robert
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-bugs(at)postgresql(dot)org, Mikael Krantz <mk(at)zigamorph(dot)se>, Jan-Ivar Mellingen <jan-ivar(dot)mellingen(at)alreg(dot)no> |
Subject: | Re: Huge speed penalty using <>TRUE instead of =FALSE |
Date: | 2009-08-10 15:10:02 |
Message-ID: | 2279.1249917002@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg사설 토토SQL : Postg사설 토토SQL 메일 링리스트 : 2009-08-10 15:10 이후의 PGSQL-BUGS |
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Jul 17, 2009 at 10:21 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Actually, now that I think about it, the planner already has
>> datatype-specific knowledge about boolean equality (see
>> simplify_boolean_equality). It would take just a few more lines of code
>> there to recognize "x <> true" and "x <> false" as additional variant
>> spellings of the generic "x" or "NOT x" constructs. Not sure if it's
>> worth the trouble though; how many people really write such things?
> I don't know, but there's probably somebody. I probably did it myself
> a few times, when I was just starting out. If it's easy, it seems
> worth doing.
http://archives.postgresql.org/pgsql-committers/2009-07/msg00164.php
regards, tom lane
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-bugs(at)postgresql(dot)org, Mikael Krantz <mk(at)zigamorph(dot)se>, Jan-Ivar Mellingen <jan-ivar(dot)mellingen(at)alreg(dot)no> |
Subject: | Re: Huge speed penalty using <>TRUE instead of =FALSE |
Date: | 2009-08-10 21:44:17 |
Message-ID: | 603c8f070908101444p1905bff9tf7cef419559e94c3@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Mon, Aug 10, 2009 at 11:10 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Jul 17, 2009 at 10:21 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Actually, now that I think about it, the planner already has
>>> datatype-specific knowledge about boolean equality (see
>>> simplify_boolean_equality). It would take just a few more lines of code
>>> there to recognize "x <> true" and "x <> false" as additional variant
>>> spellings of the generic "x" or "NOT x" constructs. Not sure if it's
>>> worth the trouble though; how many people really write such things?
>
>> I don't know, but there's probably somebody. I probably did it myself
>> a few times, when I was just starting out. If it's easy, it seems
>> worth doing.
>
> http://archives.postgresql.org/pgsql-committers/2009-07/msg00164.php
>
> regards, tom lane
Oh, cool. Sorry, I missed the fact that that email was almost a month old.
...Robert