Lists: | pgsql-hackers |
---|
From: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
---|---|
To: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | "Lena(dot)ribackina(at)yandex(dot)ru" <Lena(dot)ribackina(at)yandex(dot)ru>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br> |
Subject: | Re: POC, WIP: OR-clause support for indexes |
Date: | 2023-06-27 18:55:23 |
Message-ID: | CAEudQAq8K=mK-Rm5F1v6csKbghbyyfkanQkWL5k5oUGfjJ6GnQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
>I finished writing the code patch for transformation "Or" expressions to
>"Any" expressions. I didn't see any problems in regression tests, even
>when I changed the constant at which the minimum or expression is
>replaced by any at 0. I ran my patch on sqlancer and so far the code has
>never fallen.
Thanks for working on this.
I took the liberty of making some modifications to the patch.
I didn't compile or test it.
Please feel free to use them.
regards,
Ranier Vilela
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Replace-clause-X-N1-OR-X-N2-.-with-X-ANY-N1-N2-on.patch.txt | text/plain | 10.4 KB |
From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | "Lena(dot)ribackina(at)yandex(dot)ru" <Lena(dot)ribackina(at)yandex(dot)ru>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br> |
Subject: | Re: POC, WIP: OR-clause support for indexes |
Date: | 2023-06-28 21:45:36 |
Message-ID: | cc145d1a-a245-3a14-c9d6-c6300b2872a3@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 6/27/23 20:55, Ranier Vilela wrote:
> Hi,
>
>>I finished writing the code patch for transformation "Or" expressions to
>>"Any" expressions. I didn't see any problems in regression tests, even
>>when I changed the constant at which the minimum or expression is
>>replaced by any at 0. I ran my patch on sqlancer and so far the code has
>>never fallen.
> Thanks for working on this.
>
> I took the liberty of making some modifications to the patch.
> I didn't compile or test it.
> Please feel free to use them.
>
I don't want to be rude, but this doesn't seem very helpful.
- You made some changes, but you don't even attempt to explain what you
changed or why you changed it.
- You haven't even tried to compile the code, nor tested it. If it
happens to compile, wow could others even know it actually behaves the
way you wanted?
- You responded in a way that breaks the original thread, so it's not
clear which message you're responding to.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, "Lena(dot)ribackina(at)yandex(dot)ru" <Lena(dot)ribackina(at)yandex(dot)ru>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br> |
Subject: | Re: POC, WIP: OR-clause support for indexes |
Date: | 2023-06-29 01:36:48 |
Message-ID: | CAEudQArdUav++f6u4ZjwUvM-DeManENDW9jhzmmDA7ZAQgCD9Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Em qua., 28 de jun. de 2023 às 18:45, Tomas Vondra <
tomas(dot)vondra(at)enterprisedb(dot)com> escreveu:
> On 6/27/23 20:55, Ranier Vilela wrote:
> > Hi,
> >
> >>I finished writing the code patch for transformation "Or" expressions to
> >>"Any" expressions. I didn't see any problems in regression tests, even
> >>when I changed the constant at which the minimum or expression is
> >>replaced by any at 0. I ran my patch on sqlancer and so far the code has
> >>never fallen.
> > Thanks for working on this.
> >
> > I took the liberty of making some modifications to the patch.
> > I didn't compile or test it.
> > Please feel free to use them.
> >
>
> I don't want to be rude, but this doesn't seem very helpful.
>
Sorry, It was not my intention to cause interruptions.
> - You made some changes, but you don't even attempt to explain what you
> changed or why you changed it.
>
1. Reduce scope
2. Eliminate unnecessary variables
3. Eliminate unnecessary expressions
>
> - You haven't even tried to compile the code, nor tested it. If it
> happens to compile, wow could others even know it actually behaves the
> way you wanted?
>
Attached v2 with make check pass all tests.
Ubuntu 64 bits
gcc 64 bits
> - You responded in a way that breaks the original thread, so it's not
> clear which message you're responding to.
>
It was a pretty busy day.
Sorry for the noise, I hope I was of some help.
regards,
Ranier Vilela
P.S.
0001-Replace-clause-X-N1-OR-X-N2-.-with-X-ANY-N1-N2-on.patch fails with 4
tests.
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Replace-clause-X-N1-OR-X-N2-.-with-X-ANY-N1-N2-on.patch.txt | text/plain | 9.1 KB |
0001-make-check.txt | text/plain | 16.4 KB |
From: | Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru> |
---|---|
To: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br> |
Subject: | Re: POC, WIP: OR-clause support for indexes |
Date: | 2023-06-29 05:50:21 |
Message-ID: | 2a359f41-85f4-6846-cf1b-1e3855b29aa7@yandex.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi!
On 29.06.2023 04:36, Ranier Vilela wrote:
>
> I don't want to be rude, but this doesn't seem very helpful.
>
> Sorry, It was not my intention to cause interruptions.
>
>
> - You made some changes, but you don't even attempt to explain
> what you
> changed or why you changed it.
>
> 1. Reduce scope
> 2. Eliminate unnecessary variables
> 3. Eliminate unnecessary expressions
>
>
> - You haven't even tried to compile the code, nor tested it. If it
> happens to compile, wow could others even know it actually behaves the
> way you wanted?
>
Sorry I didn't answer right away. I will try not to do this in the
future thank you for your participation and help.
Yes, the scope of this patch may be small, but I am sure that it will
solve the worst case of memory consumption with large numbers of "or"
expressions or reduce execution and planning time. As I have already
said, I conducted a launch on a database with 20 billion data generated
using a benchmark. Unfortunately, at that time I sent a not quite
correct picture: the execution time, not the planning time, increases
with the number of "or" expressions (execution_time.png). x is the
number of or expressions, y is the execution/scheduling time.
I also throw memory consumption at 50,000 "or" expressions collected by
HeapTrack (where memory consumption was recorded already at the
initialization stage of the 1.27GB pic3.png). I think such a
transformation will allow just the same to avoid such a worst case,
since in comparison with ANY memory is much less and takes little time.
SELECT FORMAT('prepare x %s AS SELECT * FROM pgbench_accounts a WHERE %s',
'(' || string_agg('int',',') ||')',
string_agg(FORMAT('aid = $%s', g.id),' or ')
) AS cmd
FROM generate_series(1, 50000) AS g(id)
\gexec
SELECT FORMAT('execute x %s;','(' || string_agg(g.id::text,',') ||')') AS cmd
FROM generate_series(1, 50000) AS g(id)
\gexec
I tried to add a transformation at the path formation stage before we
form indexes (set_plain_rel_pathlist function) and at the stage when we
have preprocessing of "or" expressions (getting rid of duplicates or
useless conditions), but everywhere there was a problem of incorrect
selectivity estimation.
CREATE TABLE tenk1 (unique1int, unique2int, tenint, hundredint);
insert into tenk1 SELECT x,x,x,x FROM generate_series(1,50000) as x;
CREATE INDEX a_idx1 ON tenk1(unique1);
CREATE INDEX a_idx2 ON tenk1(unique2);
CREATE INDEX a_hundred ON tenk1(hundred);
postgres=# explain analyze
select * from tenk1 a join tenk1 b on
((a.unique2 = 3 or a.unique2 = 7)) or (a.unique1 = 1);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..140632434.50 rows=11250150000 width=32) (actual time=0.077..373.279 rows=1350000 loops=1)
-> Seq Scan on tenk1 b (cost=0.00..2311.00 rows=150000 width=16) (actual time=0.037..13.941 rows=150000 loops=1)
-> Materialize (cost=0.00..3436.01 rows=75001 width=16) (actual time=0.000..0.001 rows=9 loops=150000)
-> Seq Scan on tenk1 a (cost=0.00..3061.00 rows=75001 width=16) (actual time=0.027..59.174 rows=9 loops=1)
Filter: ((unique2 = ANY (ARRAY[3, 7])) OR (unique1 = 1))
Rows Removed by Filter: 149991
Planning Time: 0.438 ms
Execution Time: 407.144 ms
(8 rows)
Only by converting the expression at this stage, we do not encounter
this problem.
postgres=# set enable_bitmapscan ='off';
SET
postgres=# explain analyze
select * from tenk1 a join tenk1 b on
a.unique2 = 3 or a.unique2 = 7 or a.unique1 = 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..22247.02 rows=1350000 width=32) (actual time=0.094..373.627 rows=1350000 loops=1)
-> Seq Scan on tenk1 b (cost=0.00..2311.00 rows=150000 width=16) (actual time=0.051..14.667 rows=150000 loops=1)
-> Materialize (cost=0.00..3061.05 rows=9 width=16) (actual time=0.000..0.001 rows=9 loops=150000)
-> Seq Scan on tenk1 a (cost=0.00..3061.00 rows=9 width=16) (actual time=0.026..42.389 rows=9 loops=1)
Filter: ((unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = 1))
Rows Removed by Filter: 149991
Planning Time: 0.414 ms
Execution Time: 409.154 ms
(8 rows)
I compiled my original patch and there were no problems with regression
tests. The only time there was a problem when I set the
const_transform_or_limit variable to 0 (I have 15), as you have in the
patch. To be honest, diff appears there because you had a different
plan, specifically the expressions "or" are replaced by ANY (see
regression.diffs).
Unfortunately, your patch version did not apply immediately, I did not
understand the reasons, I applied it manually.
At the moment, I'm not sure that the constant is the right number for
applying transformations, so I'm in search of it, to be honest. I will
post my observations on this issue later. If you don't mind, I'll leave
the constant equal to 15 for now.
Sorry, I don't understand well enough what is meant by points "Eliminate
unnecessary variables" and "Eliminate unnecessary expressions". Can you
explain in more detail?
Regarding the patch, there was a Warning at the compilation stage.
In file included from ../../../src/include/nodes/bitmapset.h:21,
from ../../../src/include/nodes/parsenodes.h:26,
from ../../../src/include/catalog/objectaddress.h:17,
from ../../../src/include/catalog/pg_aggregate.h:24,
from parse_expr.c:18:
parse_expr.c: In function ‘transformBoolExprOr’:
../../../src/include/nodes/nodes.h:133:66: warning: ‘expr’ is used uninitialized [-Wuninitialized]
133 | #define nodeTag(nodeptr) (((const Node*)(nodeptr))->type)
| ^~
parse_expr.c:116:29: note: ‘expr’ was declared here
116 | BoolExpr *expr;
| ^~~~
I couldn't figure out how to fix it and went back to my original
version. To be honest, I don't think anything needs to be changed here.
Unfortunately, I didn't understand the reasons why, with the available
or expressions, you don't even try to convert to ANY by calling
transformBoolExpr, as I saw. I went back to my version.
I think it's worth checking whether the or_statement variable is positive.
I think it's worth leaving the use of the or_statement variable in its
original form.
switch (expr->boolop)
{
case AND_EXPR:
opname = "AND";
break;
case OR_EXPR:
opname = "OR";
or_statement = true;
break;
case NOT_EXPR:
opname = "NOT";
break;
default:
elog(ERROR, "unrecognized boolop: %d", (int) expr->boolop);
opname = NULL; /* keep compiler quiet */
break;
}
if (!or_statement || list_length(expr->args) < const_transform_or_limit)
return transformBoolExpr(pstate, (BoolExpr *)expr_orig);
The current version of the patch also works and all tests pass.
--
Regards,
Alena Rybakina
Postgres Professional
Attachment | Content-Type | Size |
---|---|---|
regression.diffs | text/plain | 11.3 KB |
0001-Replace-clause-X-N1-OR-X-N2-.-with-X-ANY-N1-N2-on.patch | text/x-patch | 10.6 KB |
pic3.png | image/png | 106.0 KB |
![]() |
image/png | 53.6 KB |
From: | Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru> |
---|---|
To: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br> |
Subject: | Re: POC, WIP: OR-clause support for indexes |
Date: | 2023-06-29 06:10:42 |
Message-ID: | 8b040a42-863a-6660-6bdf-c023c40e8927@yandex.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Sorry for the possible duplicate. I have a suspicion that the previous
email was not sent.
Hi!
On 29.06.2023 04:36, Ranier Vilela wrote:
> Em qua., 28 de jun. de 2023 às 18:45, Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> escreveu:
>
> On 6/27/23 20:55, Ranier Vilela wrote:
> > Hi,
> >
> >>I finished writing the code patch for transformation "Or"
> expressions to
> >>"Any" expressions. I didn't see any problems in regression
> tests, even
> >>when I changed the constant at which the minimum or expression is
> >>replaced by any at 0. I ran my patch on sqlancer and so far the
> code has
> >>never fallen.
> > Thanks for working on this.
> >
> > I took the liberty of making some modifications to the patch.
> > I didn't compile or test it.
> > Please feel free to use them.
> >
>
> I don't want to be rude, but this doesn't seem very helpful.
>
> Sorry, It was not my intention to cause interruptions.
>
>
> - You made some changes, but you don't even attempt to explain
> what you
> changed or why you changed it.
>
> 1. Reduce scope
> 2. Eliminate unnecessary variables
> 3. Eliminate unnecessary expressions
>
>
> - You haven't even tried to compile the code, nor tested it. If it
> happens to compile, wow could others even know it actually behaves the
> way you wanted?
>
> Attached v2 with make check pass all tests.
> Ubuntu 64 bits
> gcc 64 bits
>
>
> - You responded in a way that breaks the original thread, so it's not
> clear which message you're responding to.
>
> It was a pretty busy day.
>
> Sorry for the noise, I hope I was of some help.
>
> regards,
> Ranier Vilela
>
> P.S.
> 0001-Replace-clause-X-N1-OR-X-N2-.-with-X-ANY-N1-N2-on.patch fails
> with 4 tests.
Sorry I didn't answer right away. I will try not to do this in the
future thank you for your participation and help.
Yes, the scope of this patch may be small, but I am sure that it will
solve the worst case of memory consumption with large numbers of "or"
expressions or reduce execution and planning time. As I have already
said, I conducted a launch on a database with 20 billion data generated
using a benchmark. Unfortunately, at that time I sent a not quite
correct picture: the execution time, not the planning time, increases
with the number of "or" expressions
(https://www.dropbox.com/s/u7gt81blbv2adpi/execution_time.png?dl=0) x
is the number of or expressions, y is the execution/scheduling time.
I also throw memory consumption at 50,000 "or" expressions collected by
HeapTrack (where memory consumption was recorded already at the
initialization stage of the 1.27GB
https://www.dropbox.com/s/vb827ya0193dlz0/pic3.png?dl=0) I think such a
transformation will allow just the same to avoid such a worst case,
since in comparison with ANY memory is much less and takes little time.
SELECT FORMAT('prepare x %s AS SELECT * FROM pgbench_accounts a WHERE %s',
'(' || string_agg('int',',') ||')',
string_agg(FORMAT('aid = $%s', g.id),' or ')
) AS cmd
FROM generate_series(1, 50000) AS g(id)
\gexec
SELECT FORMAT('execute x %s;','(' || string_agg(g.id::text,',') ||')') AS cmd
FROM generate_series(1, 50000) AS g(id)
\gexec
I tried to add a transformation at the path formation stage before we
form indexes (set_plain_rel_pathlist function) and at the stage when we
have preprocessing of "or" expressions (getting rid of duplicates or
useless conditions), but everywhere there was a problem of incorrect
selectivity estimation.
CREATE TABLE tenk1 (unique1int, unique2int, tenint, hundredint);
insert into tenk1 SELECT x,x,x,x FROM generate_series(1,50000) as x;
CREATE INDEX a_idx1 ON tenk1(unique1);
CREATE INDEX a_idx2 ON tenk1(unique2);
CREATE INDEX a_hundred ON tenk1(hundred);
postgres=# explain analyze
select * from tenk1 a join tenk1 b on
((a.unique2 = 3 or a.unique2 = 7)) or (a.unique1 = 1);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..140632434.50 rows=11250150000 width=32) (actual time=0.077..373.279 rows=1350000 loops=1)
-> Seq Scan on tenk1 b (cost=0.00..2311.00 rows=150000 width=16) (actual time=0.037..13.941 rows=150000 loops=1)
-> Materialize (cost=0.00..3436.01 rows=75001 width=16) (actual time=0.000..0.001 rows=9 loops=150000)
-> Seq Scan on tenk1 a (cost=0.00..3061.00 rows=75001 width=16) (actual time=0.027..59.174 rows=9 loops=1)
Filter: ((unique2 = ANY (ARRAY[3, 7])) OR (unique1 = 1))
Rows Removed by Filter: 149991
Planning Time: 0.438 ms
Execution Time: 407.144 ms
(8 rows)
Only by converting the expression at this stage, we do not encounter
this problem.
postgres=# set enable_bitmapscan ='off';
SET
postgres=# explain analyze
select * from tenk1 a join tenk1 b on
a.unique2 = 3 or a.unique2 = 7 or a.unique1 = 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..22247.02 rows=1350000 width=32) (actual time=0.094..373.627 rows=1350000 loops=1)
-> Seq Scan on tenk1 b (cost=0.00..2311.00 rows=150000 width=16) (actual time=0.051..14.667 rows=150000 loops=1)
-> Materialize (cost=0.00..3061.05 rows=9 width=16) (actual time=0.000..0.001 rows=9 loops=150000)
-> Seq Scan on tenk1 a (cost=0.00..3061.00 rows=9 width=16) (actual time=0.026..42.389 rows=9 loops=1)
Filter: ((unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = 1))
Rows Removed by Filter: 149991
Planning Time: 0.414 ms
Execution Time: 409.154 ms
(8 rows)
I compiled my original patch and there were no problems with regression
tests. The only time there was a problem when I set the
const_transform_or_limit variable to 0 (I have 15), as you have in the
patch. To be honest, diff appears there because you had a different
plan, specifically the expressions "or" are replaced by ANY (see
regression.diffs).
Unfortunately, your patch version did not apply immediately, I did not
understand the reasons, I applied it manually.
At the moment, I'm not sure that the constant is the right number for
applying transformations, so I'm in search of it, to be honest. I will
post my observations on this issue later. If you don't mind, I'll leave
the constant equal to 15 for now.
Sorry, I don't understand well enough what is meant by points "Eliminate
unnecessary variables" and "Eliminate unnecessary expressions". Can you
explain in more detail?
Regarding the patch, there was a Warning at the compilation stage.
In file included from ../../../src/include/nodes/bitmapset.h:21,
from ../../../src/include/nodes/parsenodes.h:26,
from ../../../src/include/catalog/objectaddress.h:17,
from ../../../src/include/catalog/pg_aggregate.h:24,
from parse_expr.c:18:
parse_expr.c: In function ‘transformBoolExprOr’:
../../../src/include/nodes/nodes.h:133:66: warning: ‘expr’ is used uninitialized [-Wuninitialized]
133 | #define nodeTag(nodeptr) (((const Node*)(nodeptr))->type)
| ^~
parse_expr.c:116:29: note: ‘expr’ was declared here
116 | BoolExpr *expr;
| ^~~~
I couldn't figure out how to fix it and went back to my original
version. To be honest, I don't think anything needs to be changed here.
Unfortunately, I didn't understand the reasons why, with the available
or expressions, you don't even try to convert to ANY by calling
transformBoolExpr, as I saw. I went back to my version.
I think it's worth checking whether the or_statement variable is positive.
I think it's worth leaving the use of the or_statement variable in its
original form.
switch (expr->boolop)
{
case AND_EXPR:
opname = "AND";
break;
case OR_EXPR:
opname = "OR";
or_statement = true;
break;
case NOT_EXPR:
opname = "NOT";
break;
default:
elog(ERROR, "unrecognized boolop: %d", (int) expr->boolop);
opname = NULL; /* keep compiler quiet */
break;
}
if (!or_statement || list_length(expr->args) < const_transform_or_limit)
return transformBoolExpr(pstate, (BoolExpr *)expr_orig);
The current version of the patch also works and all tests pass.
--
Regards,
Alena Rybakina
Postgres Professional
Attachment | Content-Type | Size |
---|---|---|
0001-Replace-clause-X-N1-OR-X-N2-.-with-X-ANY-N1-N2-on.patch | text/x-patch | 10.6 KB |
regression.diffs | text/plain | 11.3 KB |
From: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
---|---|
To: | Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br> |
Subject: | Re: POC, WIP: OR-clause support for indexes |
Date: | 2023-06-29 12:25:20 |
Message-ID: | CAEudQAqW3p0R5NMjOjTuCbs=HTQR5YO8WDgCjcSuArbLdDJVtw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Em qui., 29 de jun. de 2023 às 02:50, Alena Rybakina <
lena(dot)ribackina(at)yandex(dot)ru> escreveu:
> Hi!
> On 29.06.2023 04:36, Ranier Vilela wrote:
>
> I don't want to be rude, but this doesn't seem very helpful.
>>
> Sorry, It was not my intention to cause interruptions.
>
>
>> - You made some changes, but you don't even attempt to explain what you
>> changed or why you changed it.
>>
> 1. Reduce scope
> 2. Eliminate unnecessary variables
> 3. Eliminate unnecessary expressions
>
>
>>
>> - You haven't even tried to compile the code, nor tested it. If it
>> happens to compile, wow could others even know it actually behaves the
>> way you wanted?
>
> Sorry I didn't answer right away. I will try not to do this in the future
> thank you for your participation and help.
>
There's no need to apologize.
> Yes, the scope of this patch may be small, but I am sure that it will
> solve the worst case of memory consumption with large numbers of "or"
> expressions or reduce execution and planning time.
>
Yeah, I also believe it will help performance.
> As I have already said, I conducted a launch on a database with 20 billion
> data generated using a benchmark. Unfortunately, at that time I sent a not
> quite correct picture: the execution time, not the planning time, increases
> with the number of "or" expressions (execution_time.png). x is the number
> of or expressions, y is the execution/scheduling time.I also throw memory
> consumption at 50,000 "or" expressions collected by HeapTrack (where memory
> consumption was recorded already at the initialization stage of the 1.27GB
> pic3.png). I think such a transformation will allow just the same to avoid
> such a worst case, since in comparison with ANY memory is much less and
> takes little time.
>
SELECT FORMAT('prepare x %s AS SELECT * FROM pgbench_accounts a WHERE %s',
> '(' || string_agg('int', ',') || ')',
> string_agg(FORMAT('aid = $%s', g.id), ' or ')
> ) AS cmd
> FROM generate_series(1, 50000) AS g(id)
> \gexec
>
> SELECT FORMAT('execute x %s;', '(' || string_agg(g.id::text, ',') || ')') AS cmd
> FROM generate_series(1, 50000) AS g(id)
> \gexec
>
> I tried to add a transformation at the path formation stage before we form
> indexes (set_plain_rel_pathlist function) and at the stage when we have
> preprocessing of "or" expressions (getting rid of duplicates or useless
> conditions), but everywhere there was a problem of incorrect selectivity
> estimation.
>
> CREATE TABLE tenk1 (unique1 int, unique2 int, ten int, hundred int);
> insert into tenk1 SELECT x,x,x,x FROM generate_series(1,50000) as x;
> CREATE INDEX a_idx1 ON tenk1(unique1);
> CREATE INDEX a_idx2 ON tenk1(unique2);
> CREATE INDEX a_hundred ON tenk1(hundred);
>
> postgres=# explain analyze
> select * from tenk1 a join tenk1 b on
> ((a.unique2 = 3 or a.unique2 = 7)) or (a.unique1 = 1);
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------
> Nested Loop (cost=0.00..140632434.50 rows=11250150000 width=32) (actual time=0.077..373.279 rows=1350000 loops=1)
> -> Seq Scan on tenk1 b (cost=0.00..2311.00 rows=150000 width=16) (actual time=0.037..13.941 rows=150000 loops=1)
> -> Materialize (cost=0.00..3436.01 rows=75001 width=16) (actual time=0.000..0.001 rows=9 loops=150000)
> -> Seq Scan on tenk1 a (cost=0.00..3061.00 rows=75001 width=16) (actual time=0.027..59.174 rows=9 loops=1)
> Filter: ((unique2 = ANY (ARRAY[3, 7])) OR (unique1 = 1))
> Rows Removed by Filter: 149991
> Planning Time: 0.438 ms
> Execution Time: 407.144 ms
> (8 rows)
>
> Only by converting the expression at this stage, we do not encounter this
> problem.
>
> postgres=# set enable_bitmapscan ='off';
> SET
> postgres=# explain analyze
> select * from tenk1 a join tenk1 b on
> a.unique2 = 3 or a.unique2 = 7 or a.unique1 = 1;
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------
> Nested Loop (cost=0.00..22247.02 rows=1350000 width=32) (actual time=0.094..373.627 rows=1350000 loops=1)
> -> Seq Scan on tenk1 b (cost=0.00..2311.00 rows=150000 width=16) (actual time=0.051..14.667 rows=150000 loops=1)
> -> Materialize (cost=0.00..3061.05 rows=9 width=16) (actual time=0.000..0.001 rows=9 loops=150000)
> -> Seq Scan on tenk1 a (cost=0.00..3061.00 rows=9 width=16) (actual time=0.026..42.389 rows=9 loops=1)
> Filter: ((unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = 1))
> Rows Removed by Filter: 149991
> Planning Time: 0.414 ms
> Execution Time: 409.154 ms
> (8 rows)
>
> I compiled my original patch and there were no problems with regression
> tests. The only time there was a problem when I set the
> const_transform_or_limit variable to 0 (I have 15), as you have in the
> patch. To be honest, diff appears there because you had a different plan,
> specifically the expressions "or" are replaced by ANY (see
> regression.diffs).
>
You are right. The v3 attached shows the same diff.
Unfortunately, your patch version did not apply immediately, I did not
> understand the reasons, I applied it manually.
>
Sorry.
> At the moment, I'm not sure that the constant is the right number for
> applying transformations, so I'm in search of it, to be honest. I will post
> my observations on this issue later. If you don't mind, I'll leave the
> constant equal to 15 for now.
>
It's hard to predict. Perhaps accounting for time on each benchmark could
help decide.
> Sorry, I don't understand well enough what is meant by points "Eliminate
> unnecessary variables" and "Eliminate unnecessary expressions". Can you
> explain in more detail?
>
One example is array_type.
As you can see in v2 and v3 it no longer exists.
>
> Regarding the patch, there was a Warning at the compilation stage.
>
> In file included from ../../../src/include/nodes/bitmapset.h:21,
>
> from ../../../src/include/nodes/parsenodes.h:26,
>
> from ../../../src/include/catalog/objectaddress.h:17,
>
> from ../../../src/include/catalog/pg_aggregate.h:24,
>
> from parse_expr.c:18:
>
> parse_expr.c: In function ‘transformBoolExprOr’:
>
> ../../../src/include/nodes/nodes.h:133:66: warning: ‘expr’ is used uninitialized [-Wuninitialized]
>
> 133 | #define nodeTag(nodeptr) (((const Node*)(nodeptr))->type)
>
> | ^~
>
> parse_expr.c:116:29: note: ‘expr’ was declared here
>
> 116 | BoolExpr *expr;
>
> | ^~~~
>
> Sorry, this error did not appear in my builds
> I couldn't figure out how to fix it and went back to my original version.
> To be honest, I don't think anything needs to be changed here.
>
> Unfortunately, I didn't understand the reasons why, with the available or
> expressions, you don't even try to convert to ANY by calling
> transformBoolExpr, as I saw. I went back to my version.
>
> I think it's worth checking whether the or_statement variable is positive.
>
> I think it's worth leaving the use of the or_statement variable in its
> original form.
>
> switch (expr->boolop)
> {
> case AND_EXPR:
> opname = "AND";
> break;
> case OR_EXPR:
> opname = "OR";
> or_statement = true;
> break;
> case NOT_EXPR:
> opname = "NOT";
> break;
> default:
> elog(ERROR, "unrecognized boolop: %d", (int) expr->boolop);
> opname = NULL; /* keep compiler quiet */
> break;
> }
>
> if (!or_statement || list_length(expr->args) < const_transform_or_limit)
> return transformBoolExpr(pstate, (BoolExpr *)expr_orig);
>
You are right, the v3 this way.
As I said earlier, these are just suggestions.
But thinking about it now, I think they can be classified as bad early
optimizations.
regards,
Ranier Vilela
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Replace-clause-X-N1-OR-X-N2-.-with-X-ANY-N1-N2-on.patch.txt | text/plain | 9.1 KB |
v3-regression.diffs | application/octet-stream | 11.2 KB |
From: | Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru> |
---|---|
To: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, teodor(at)sigaev(dot)ru, Peter Geoghegan <pg(at)bowt(dot)ie> |
Subject: | Re: POC, WIP: OR-clause support for indexes |
Date: | 2023-06-29 15:15:05 |
Message-ID: | f6a67bf2-c3bf-330e-9a3c-ed03eec123be@yandex.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi!
>
> At the moment, I'm not sure that the constant is the right number
> for applying transformations, so I'm in search of it, to be
> honest. I will post my observations on this issue later. If you
> don't mind, I'll leave the constant equal to 15 for now.
>
> It's hard to predict. Perhaps accounting for time on each benchmark
> could help decide.
>
I will try to test on JOB [1] (because queries are difficult for the
optimizer due to the significant number of joins and correlations
contained in the dataset) and
tpcds [2] (the benchmark I noticed contains a sufficient number of
queries with "or" expressions).
> Sorry, I don't understand well enough what is meant by points
> "Eliminate unnecessary variables" and "Eliminate unnecessary
> expressions". Can you explain in more detail?
>
> One example is array_type.
> As you can see in v2 and v3 it no longer exists.
>
I get it. Honestly, I was guided by the example of converting "IN" to
"ANY" (transformAExprIn), at least the part of the code when we
specifically convert the expression to ScalarArrayOpExpr.
Both there and here, we first look for a common type for the collected
constants, and if there is one, then we try to find the type for the
array structure.
Only I think in my current patch it is also worth returning to the
original version in this place, since if it is not found, the
ScalarArrayOpExpr generation function will be processed incorrectly and
the request may not be executed at all, referring to the error that it
is impossible to determine the type of node (ERROR: unrecognized node
type. )
At the same time we are trying to do this transformation for each group.
The group here implies that these are combined "or" expressions on the
common left side, and at the same time we consider
only expressions that contain a constant and only equality.
What else should be taken into account is that we are trying to do this
processing before forming a BoolExpr expression (if you notice, then
after any outcome we call the makeBoolExpr function,
which just forms the "Or" expression, as in the original version,
regardless of what type of expressions it combines.
>
> As I said earlier, these are just suggestions.
> But thinking about it now, I think they can be classified as bad early
> optimizations.
Thank you again for your interest in this problem and help. Yes, I think
so too)
1. https://github.com/gregrahn/join-order-benchmark
2.
https://github.com/Alena0704/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds
--
Regards,
Alena Rybakina
Postgres Professional
Attachment | Content-Type | Size |
---|---|---|
0001-Replace-OR-clause-to-ANY-expressions.patch | text/x-patch | 10.0 KB |