Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: alexey(dot)ermakov(at)dataegret(dot)com
Subject: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE
Date: 2018-04-17 09:40:50
Message-ID: 152395805004.19366.3107109716821067806@wrigleys.postgresql.org
Lists: pgsql-bugs pgsql-hackers

The following bug has been logged on the website:

Bug reference: 15160
Logged by: Alexey Ermakov
Email address: alexey(dot)ermakov(at)dataegret(dot)com
PostgreSQL version: 10.3
Operating system: Linux
Description:

Hello,

I'm wondering how planner estimates number of rows in that case:

create table test_in (id int primary key);
insert into test_in select id from generate_series(1,1000000) gs(id);
analyze test_in;

explain analyze with ids as (select id from generate_series(1,1000) gs(id)
limit 200)
select * from test_in where id in (select id from ids);
-------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=6.93..139.79 rows=200 width=4) (actual time=0.129..0.400 rows=200 loops=1)
   CTE ids
     ->  Limit  (cost=0.00..2.00 rows=200 width=4) (actual time=0.050..0.066 rows=200 loops=1)
           ->  Function Scan on generate_series gs  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.050..0.057 rows=200 loops=1)
   ->  HashAggregate  (cost=4.50..6.50 rows=200 width=4) (actual time=0.117..0.133 rows=200 loops=1)
         Group Key: ids.id
         ->  CTE Scan on ids  (cost=0.00..4.00 rows=200 width=4) (actual time=0.051..0.086 rows=200 loops=1)
   ->  Index Only Scan using test_in_pkey on test_in  (cost=0.42..0.66 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=200)
         Index Cond: (id = ids.id)
         Heap Fetches: 200
 Planning time: 0.128 ms
 Execution time: 0.434 ms

explain analyze with ids as (select id from generate_series(1,1000) gs(id)
limit 201)
select * from test_in where id in (select id from ids);
-------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=6.96..132.78 rows=500000 width=4) (actual time=0.119..0.389 rows=201 loops=1)
   CTE ids
     ->  Limit  (cost=0.00..2.01 rows=201 width=4) (actual time=0.048..0.064 rows=201 loops=1)
           ->  Function Scan on generate_series gs  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.048..0.056 rows=201 loops=1)
   ->  HashAggregate  (cost=4.52..6.52 rows=200 width=4) (actual time=0.113..0.130 rows=201 loops=1)
         Group Key: ids.id
         ->  CTE Scan on ids  (cost=0.00..4.02 rows=201 width=4) (actual time=0.049..0.083 rows=201 loops=1)
   ->  Index Only Scan using test_in_pkey on test_in  (cost=0.42..0.66 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=201)
         Index Cond: (id = ids.id)
         Heap Fetches: 201
 Planning time: 0.068 ms
 Execution time: 0.417 ms

Please note that in the first example we got a correct estimate of the total number
of rows (200), but in the last one, where the CTE returned 201 rows instead of 200,
we estimate the total number of rows as 500000 (half of the table test_in),
which is way off and could lead to a non-optimal plan and poor performance.
I get the same estimate if I replace the IN clause with an equivalent EXISTS subquery;
a normal join estimates the number of rows fine (but it's not equivalent in the
general case when test_in.id is not unique).
Reproduced in 9.5, 9.6 and 10. Interestingly, in PostgreSQL 10 the threshold is
200 rows but in previous versions it's 199.
I suspect the selectivity of 0.5 is somehow produced inside the
compute_semi_anti_join_factors function in costsize.c, but I'm not sure.

Thanks,
Alexey Ermakov


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: alexey(dot)ermakov(at)dataegret(dot)com
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE
Date: 2018-04-17 14:15:38
Message-ID: 15035.1523974538@sss.pgh.pa.us
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> I'm wondering how planner estimates number of rows in that case:

See eqjoinsel_semi, particularly the change in behavior when it thinks
nd2 is or is not a default estimate.

Given the lack of statistics about the output of the WITH clause,
it's hard to see how we'd ever get trustworthy estimates here.
I think the fact that your first example yields an accurate
estimate is mostly luck.

regards, tom lane
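
To make the mechanism Tom points at concrete, here is a small standalone C sketch
(not the actual selfuncs.c code) of the behaviour discussed in this thread: with no
statistics for the subquery output the planner falls back to a default ndistinct of
200, and the semijoin estimator punts to a selectivity of 0.5 whenever it believes
it is working with such a default. The toy_* names are invented for illustration,
and the assumption that clamping nd2 to the inner relation's size also clears the
"default" flag is a guess used to reproduce the 200/201 threshold, not a statement
about the real code.

/*
 * toy_semijoin_estimate.c -- standalone toy model of the estimates in this
 * thread; compile with "cc toy_semijoin_estimate.c" and run.
 */
#include <stdio.h>
#include <stdbool.h>

#define DEFAULT_NUM_DISTINCT 200.0

/* With no statistics, small inputs get ndistinct = ntuples; larger ones
 * fall back to the default and are flagged as such. */
static double
toy_numdistinct(double ntuples, bool *isdefault)
{
    *isdefault = false;
    if (ntuples < DEFAULT_NUM_DISTINCT)
        return ntuples;
    *isdefault = true;
    return DEFAULT_NUM_DISTINCT;
}

/* No-MCV semijoin path as summarized later in this thread: nd2/nd1
 * (capped at 1) when both ndistincts are trusted, 0.5 otherwise. */
static double
toy_semijoin_selec(double nd1, bool isdefault1, double nd2, bool isdefault2)
{
    if (isdefault1 || isdefault2)
        return 0.5;
    return (nd1 <= nd2) ? 1.0 : nd2 / nd1;
}

int
main(void)
{
    const double outer_rows = 1000000.0;   /* test_in, with statistics */
    const double nd1 = 1000000.0;          /* test_in.id is unique */
    const double limits[] = {199.0, 200.0, 201.0};

    for (int i = 0; i < 3; i++)
    {
        double  inner_rows = limits[i];    /* rows coming out of the CTE */
        bool    isdefault2;
        double  nd2 = toy_numdistinct(inner_rows, &isdefault2);

        /* Clamp nd2 to the inner rel's size, as discussed downthread;
         * assume (for illustration only) that the clamp clears the flag. */
        if (nd2 >= inner_rows)
        {
            nd2 = inner_rows;
            isdefault2 = false;
        }

        double  selec = toy_semijoin_selec(nd1, false, nd2, isdefault2);

        printf("LIMIT %3.0f: nd2=%3.0f default=%d selec=%.6f est. rows=%.0f\n",
               inner_rows, nd2, isdefault2, selec, outer_rows * selec);
    }
    return 0;
}

Run, the sketch prints estimated row counts of 199, 200 and 500000 for LIMIT 199,
200 and 201 respectively, mirroring Alexey's plans above.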


From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alexey Ermakov <alexey(dot)ermakov(at)dataegret(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE
Date: 2018-09-25 11:24:39
Message-ID: CAK-MWwR_G0NLLB9TwRhs+3cHDa9zZnYg7AgpAfeYJn_hgpZBsQ@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Apr 17, 2018 at 5:15 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> > I'm wondering how planner estimates number of rows in that case:
>
> See eqjoinsel_semi, particularly the change in behavior when it thinks
> nd2 is or is not a default estimate.
>
> Given the lack of statistics about the output of the WITH clause,
> it's hard to see how we'd ever get trustworthy estimates here.
> I think the fact that your first example yields an accurate
> estimate is mostly luck.
>
> regards, tom lane
>
>

There is a similar issue without a CTE which looks pretty weird:

Good case with LIMIT 199 and adequate estimation:
hh=# explain SELECT * FROM resume WHERE resume_id IN (select id from
generate_series(1, 1000) gs(id) LIMIT 199);
QUERY PLAN
---------------------------------------------------------------------------------------------------
 Nested Loop  (cost=21.53..108.98 rows=199 width=519)
   ->  Unique  (cost=21.42..21.62 rows=199 width=4)
         ->  Sort  (cost=21.42..21.52 rows=199 width=4)
               Sort Key: gs.id
               ->  Limit  (cost=0.00..9.95 rows=199 width=4)
                     ->  Function Scan on generate_series gs  (cost=0.00..50.00 rows=1000 width=4)
   ->  Index Scan using resume_pk on resume  (cost=0.11..0.39 rows=1 width=519)
         Index Cond: (resume_id = gs.id)

Very bad case with awful estimation (the only difference is LIMIT 200 vs LIMIT 199):
explain SELECT * FROM resume WHERE resume_id IN (select id from
generate_series(1, 1000) gs(id) LIMIT 200);
QUERY PLAN
---------------------------------------------------------------------------------------------------
 Nested Loop  (cost=21.64..109.53 rows=45860504 width=519)
   ->  Unique  (cost=21.53..21.73 rows=200 width=4)
         ->  Sort  (cost=21.53..21.63 rows=200 width=4)
               Sort Key: gs.id
               ->  Limit  (cost=0.00..10.00 rows=200 width=4)
                     ->  Function Scan on generate_series gs  (cost=0.00..50.00 rows=1000 width=4)
   ->  Index Scan using resume_pk on resume  (cost=0.11..0.39 rows=1 width=519)
         Index Cond: (resume_id = gs.id)

It's not a problem by itself, but once you start using this query with more
joined tables, a lot of bad things happen because of the 5-orders-of-magnitude
error in the selectivity estimation.
PS: in practice it forces us to use a LIMIT of no more than 199 in complex joins
for batch operations, or the database starts generating funny plans.

Regards,
Maxim

--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk

"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно
когда я так делаю ещё раз?"


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
Cc: Alexey Ermakov <alexey(dot)ermakov(at)dataegret(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE
Date: 2018-09-25 20:11:15
Message-ID: 18632.1537906275@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> writes:
> On Tue, Apr 17, 2018 at 5:15 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
>>> I'm wondering how planner estimates number of rows in that case:

>> See eqjoinsel_semi, particularly the change in behavior when it thinks
>> nd2 is or is not a default estimate.

> There is a similar issue without a CTE which looks pretty weird:

Yeah, this is exactly the same case as Alexey's: as soon as eqjoinsel_semi
decides it's dealing with a default ndistinct estimate, it chickens out
and delivers a very middle-of-the-road selectivity (0.5, it looks like).
It's somewhat luck that the non-default path is giving you an accurate
estimate, but certainly there's no surprise in the default case being
way off.

I don't particularly want to make that logic more aggressive about
assuming it's calculating something real. The existing behavior was
put in to fix a clear bug in the other direction, see
/message-id/flat/201104112029.14738.uwe%40oss4u.com

However, while looking at this I had a bit of an epiphany. The inner-join
selectivity in the same cases is pretty much on-target, so is there any
way we could factor that in? Yes, there is: the size of the semijoin
output could not be more than the output of a plain inner join of the
same two relations. So it'd be legitimate to clamp our selectivity
estimate for the semijoin case to make it not more than the inner-join
estimate.

A little bit of hacking later, I have the attached patch. The bulk of
the patch is just refactoring to avoid repetitive information lookup
when we call both eqjoinsel_semi and eqjoinsel_inner. The actual
change is just to clamp eqjoinsel_semi's result, like this:

/*
* We should never estimate the output of a semijoin to be more
* rows than the equivalent inner join; it's obviously impossible
* for that to happen. The former is N1 * Psemi while the latter
* is N1 * N2 * Pinner, so we may clamp Psemi <= N2 * Pinner.
* Doing this is worthwhile because of the shakier estimation
* rules we use in eqjoinsel_semi, particularly in cases where it
* has to punt entirely.
*/
selec = Min(selec, inner_rel->rows * selec_inner);

That makes the funny behavior go away in both test cases shown in this
thread. I find one plan change in the regression tests, but it looks
reasonable enough (and checking the actual row counts shows that the
estimate moved closer to reality, not further away).

Now, there's a certain amount of garbage-in-garbage-out to this: if for
some reason the innerjoin selectivity is way off, this could do more to
hurt the semijoin estimate than to help it. But I think that generally
the semijoin numbers are much less reliable than the innerjoin numbers,
so mostly it ought to be no-change or a win.

I'll queue this up for review in the next CF.

regards, tom lane

Attachment Content-Type Size
clamp-semijoin-selectivity-1.patch text/x-diff 21.8 KB
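
As a quick sanity check of the proposed clamp against the numbers from Alexey's
LIMIT 201 example, here is a rough arithmetic sketch. It assumes the no-MCV
inner-join approximation of 1/max(nd1, nd2) that is quoted later in this thread,
not the full eqjoinsel_inner calculation, so the numbers are only illustrative.

#include <stdio.h>

#define Min(a,b) ((a) < (b) ? (a) : (b))

int
main(void)
{
    double  outer_rows = 1000000.0; /* test_in */
    double  inner_rows = 201.0;     /* rows coming out of the CTE */
    double  nd1 = 1000000.0;        /* test_in.id is unique */
    double  nd2 = 200.0;            /* default ndistinct for the CTE output */

    double  selec_inner = 1.0 / (nd1 > nd2 ? nd1 : nd2);   /* ~1e-6 */
    double  selec_semi = 0.5;       /* the punt value for a default nd2 */

    /* the patch's clamp: Psemi <= N2 * Pinner */
    double  selec = Min(selec_semi, inner_rows * selec_inner);

    printf("semijoin estimate: %.0f rows (was %.0f before the clamp)\n",
           outer_rows * selec, outer_rows * selec_semi);
    return 0;
}

With nd1 = 1,000,000 the inner-join selectivity is about 1e-6, so the clamp caps
the semijoin selectivity at 201 * 1e-6 = 0.000201 and the estimate drops from
500000 rows to 201.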

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE
Date: 2018-11-16 00:51:45
Message-ID: 154232950503.1400.7562962199586713542.pgcf@coridan.postgresql.org
Lists: pgsql-bugs pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

This patch applies cleanly and works for the case described in the original
email. All existing regression tests pass with the addition of the explain plan
update included in the patch.

I could not devise an example in which the previous method of calculating
selectivity would have produced a better estimate. However, one question I have
after thinking through the optimization is the following:

This new selectivity calculation (targeted at semi-joins though in eqjoinsel) is:
selectivity = Min(semi-join selectivity, ntuples inner * inner-join selectivity);

For a join in which you do not have access to MCVs and assuming no NULLs, the
inner-join selectivity used to compare to the calculated semi-join selectivity*
is:

selectivity = ntuples2 * 1 / max(nd1, nd2) = ntuples2 / max(nd1, nd2)

if nd1 <= nd2:
selectivity = ntuples2 / nd2
else:
selectivity = ntuples2 / nd1

To summarize:

Selectivity Type                   | if nd1 <= nd2  | if nd1 > nd2   |
-----------------------------------|----------------|----------------|
inner-join selectivity * ntuples2  | ntuples2 / nd2 | ntuples2 / nd1 |
semi-join selectivity              | 1              | nd2 / nd1      |

Notice that ntuples2 >= nd2 so no matter what nd1 and nd2 are:

inner-join selectivity * ntuples2 >= semi-join selectivity

So, it seems like, unless you are missing NDVs of one of the sides of the join,
inner join selectivity can never be less than semi-join selectivity. If this is
true, why not use the default NDVs number in the semi-join selectivity
calculation?

* based on these summaries of the formulas for calculating selectivity

inner-join selectivity when you don't have access to MCVs and assuming no NULLs is
~ 1 / max(nd1,nd2)
if either nd1 or nd2 was default, nd[1,2] = min(ntuples[1,2], 200)

semi-join selectivity when you don't have access to MCVs and assuming no NULLs is
0.5 when either nd1 or nd2 is default
when neither are default
if nd2 < 0 || nd1 <= nd2
1
else
nd2/nd1

If there is a reason to keep the existing formula, then I have an additional
question about the proposed selectivity calculation:

selec = Min(selec, nd2 * selec_inner);

When would it be incorrect to instead multiply by inner side NDVs?

Besides the actual logic of the code added, Ekta and I did a code review and
had some feedback on the structure and clarity of the code and comments.

In the function eqjoinsel_semi, on line 2759 of the patched, rebased code,
could you not move the else condition:

uncertainfrac = 0.5;

up to the top of the if statement which starts on line 2663:

if (have_mcvs1 && have_mcvs2 && OidIsValid(opfuncoid))

It seems like you already know and do not further modify the value of
isdefault1 and isdefault2 and could exit faster before looping through all the
MCVs in this case.

For the function eqjoinsel_inner, why pass in vardata1 and vardata2, as they
appear not to be used? Neither are the isdefault flags.

This is in the existing code, however, I thought I would ask here:

In eqjoinsel_semi, on line 2691 of the patched, rebased code, why is this the
min of the number of MCVs captured and the distinct values? It seems like if
clamping resulted in an NDVs that is too low (i.e. impossibly low since the
number of distinct values cannot be less than the number of MCVs), then you
should bump it up to at least the number of MCVs:

clamped_nvalues2 = Min(sslot2->nvalues, nd2);

I also found the new comment added above the new selectivity calculation to be
a little bit confusing:
/*
* We should never estimate the output of a semijoin to be more
* rows than the equivalent inner join; it's obviously impossible
* for that to happen. The former is N1 * Psemi while the latter
* is N1 * N2 * Pinner, so we may clamp Psemi <= N2 * Pinner.
* Doing this is worthwhile because of the shakier estimation
* rules we use in eqjoinsel_semi, particularly in cases where it
* has to punt entirely.
*/
selec = Min(selec, inner_rel->rows * selec_inner);

After re-reading it several times, I understood what it
was doing, however, it would be ideal if somehow the relationship between
selectivity and cardinality were more clear.

I don't have any great ideas for additional wording, however, maybe it would
help to clarify that in order to clamp the cardinality correctly, we must clamp
the selectivity using this formula. Basically, specify that we are not clamping
the selectivity of semi-join to the selectivity of inner join, but, rather,
that we are clamping the cardinality of semi-join to consider only the matching
rows of the inner side (also, if that sentence is actually not correct, then
some description that avoids confusion like that would be helpful).

The new status of this patch is: Waiting on Author


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE
Date: 2018-11-16 16:18:41
Message-ID: 27925.1542385121@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Melanie Plageman <melanieplageman(at)gmail(dot)com> writes:
> This patch applies cleanly and works for the case described in the original
> email. All existing regression tests pass with the addition of the explain plan
> update included in the patch.

Thanks for reviewing!

> I could not devise an example in which the previous method of calculating
> selectivity would have produced a better estimate. However, one question I have
> after thinking through the optimization is the following:
> ...
> To summarize:
> Selectivity Type | if nd1 <= nd2 | if nd1 > nd2 |
> ----------------------------------|----------------|-----------------
> inner-join selectivity * ntuples2 | ntuples2 / nd2 | ntuples2 / nd1 |
> semi-join selectivity | 1 | nd2 / nd1 |

Um, mumble. Those functions could be using different values of nd2
thanks to the clamping logic near the head of eqjoinsel_semi, so I'm
not sure that the comparison you're making really holds.

> ... why not use the default NDVs number in the semi-join selectivity
> calculation?

Practical experience says that it doesn't work very well; see the thread
I referred to before,
/message-id/flat/201104112029(dot)14738(dot)uwe(at)oss4u(dot)com
particularly my comment

:: While I don't have your specific example to try, I did some
:: experimenting with queries of this form, and I noticed that 8.4's
:: heuristic in eqjoinsel_semi() was going completely nuts and estimating
:: that all rows in the lefthand side have join partners (thus, no rows out
:: of the antijoin). This is because it has stats for one side of the
:: comparison operator but not the other side (the one with the
:: sub-select). But it's taking the totally-made-up ndistinct estimate for
:: the sub-select at face value. It needs to be a bit warier I think.

That experience is what led to the "isdefault" checks that exist now
in eqjoinsel_semi. I don't think that applying a clamp based on
eqjoinsel_inner is sufficient reason to remove those sanity checks.
In particular, even if the clamp removes the possibility of the semijoin
estimate being too high, it doesn't do anything to prevent a too-low
estimate due to using baseless numbers.

> If there is a reason to keep the existing formula, then I have an additional
> question about the proposed selectivity calculation:
> selec = Min(selec, nd2 * selec_inner);
> When would it be incorrect to instead multiply by inner side NDVs?

I'm confused ... isn't that exactly what this is doing?

> In the function eqjoinsel_semi, on line 2759 of the patched, rebased code,
> could you not move the else condition:
> uncertainfrac = 0.5;
> Up to the top of the if statement which starts on line 2663:
> if (have_mcvs1 && have_mcvs2 && OidIsValid(opfuncoid))
> It seems like you already know and do not further modify the value of
> isdefault1 and isdefault2 and could exit faster before looping through all the
> MCVs in this case.

Not really, because we still need to know matchfreq1, so we still have
to do all the comparisons. It's true that nmatches won't be used in
this case, but I don't see that we can get any meaningful savings by
not computing that.

> For the function eqjoinsel_inner, why pass in vardata1 and vardata2, as they
> appear not to be used? Neither are the isdefault flags.

It just seemed saner to keep the parameter lists similar for
eqjoinsel_inner and eqjoinsel_semi (a judgment call, I admit).
In practice, since there's only one call site for eqjoinsel_inner,
I'd expect it to get inlined so that there's no runtime cost anyway.

> This is in the existing code, however, I thought I would ask here:
> In eqjoinsel_semi, on line 2691 of the patched, rebased code, Why is this the
> min of the number of MCVs captured and the distinct values? It seems like if
> clamping resulted in an NDVs that is too low (i.e. impossibly low since the
> number of distinct values cannot be less than the number of MCVs), then you
> should bump it up to at least the number of MCVs:
> clamped_nvalues2 = Min(sslot2->nvalues, nd2);

No, because sslot2->nvalues is the number of distinct values that ANALYZE
found in the whole table. If nd2 got clamped to less than that, it's
because we have a WHERE clause that is filtering the table down to fewer
rows than there are distinct values in the table, so we can be sure
(at least, up to the reliability of the WHERE estimate) that not all of
the recorded MCVs are going to be present in the rows being joined.
We don't know which ones will be present, but it seems like a reasonable
bet that the most common ones will be present. Since the list is already
ordered by decreasing frequency, just taking the first N of them gets us
that. You could imagine trying to refine that, say by reducing the
number further to allow for some of the join input rows containing
non-MCVs, but I don't see an argument for increasing it.

> I also found the new comment added above the new selectivity calculation to be
> a little bit confusing:
> /*
> * We should never estimate the output of a semijoin to be more
> * rows than the equivalent inner join; it's obviously impossible
> * for that to happen. The former is N1 * Psemi while the latter
> * is N1 * N2 * Pinner, so we may clamp Psemi <= N2 * Pinner.
> * Doing this is worthwhile because of the shakier estimation
> * rules we use in eqjoinsel_semi, particularly in cases where it
> * has to punt entirely.
> */
> selec = Min(selec, inner_rel->rows * selec_inner);

> After re-reading it several times, I understood what it
> was doing, however, it would be ideal if somehow the relationship between
> selectivity and cardinality were more clear.

Hm. Maybe the "Psemi" and "Pinner" notation is not helpful ... would
"Ssemi" and "Sinner" be better?

regards, tom lane


From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE
Date: 2018-11-16 18:31:47
Message-ID: CAAKRu_Y+d3SKkofNK7znhwgLauPUCVyZU=H0mvJH1LG7+tMyXg@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

Thanks for the quick responses. I've put some inline follow-up questions.

On a separate note, I had one additional piece of code-clarity feedback. I felt that
eqjoinsel could be reorganized a bit for readability/clarity for the reader.
For example, eqjoinsel_inner uses only the AttStatsSlots up to this point and then
suddenly uses the original stats object and the NDVs which we passed in:

    else
    {
        ...

        double nullfrac1 = stats1 ? stats1->stanullfrac : 0.0;
        double nullfrac2 = stats2 ? stats2->stanullfrac : 0.0;

        selec = (1.0 - nullfrac1) * (1.0 - nullfrac2);
        if (nd1 > nd2)
            selec /= nd1;
        else
            selec /= nd2;
    }

It would make the process of calculating selectivity for an equijoin clearer
to the reader if the null-fraction calculation were pulled out into the main
eqjoinsel function.

Having a clear set of steps in eqjoinsel would be helpful. Basically, my
understanding of an overview of the steps is the following:

1) get NDVs
2) get nullfrac
3) get MCVs
4) calculate selectivity

Based on this assumption, I've attached a patch with a rough idea for an
alternative structure that I think would be more clear to the reader.

> > I could not devise an example in which the previous method of calculating
> > selectivity would have produced a better estimate. However, one question
> I have
> > after thinking through the optimization is the following:
> > ...
> > To summarize:
> > Selectivity Type | if nd1 <= nd2 | if nd1 > nd2 |
> > ----------------------------------|----------------|-----------------
> > inner-join selectivity * ntuples2 | ntuples2 / nd2 | ntuples2 / nd1 |
> > semi-join selectivity | 1 | nd2 / nd1 |
>
> Um, mumble. Those functions could be using different values of nd2
> thanks to the clamping logic near the head of eqjoinsel_semi, so I'm
> not sure that the comparison you're making really holds.
>

That's a good point. Taking another look at that clamping logic, I realized
that I don't really understand why that clamping would be done for a semi-join
and not for an inner join. It seems like for an inner join it is also true
that nd1 cannot be greater than the outer rel's estimated tuples and nd2
cannot be greater than the inner rel's estimated tuples.

Also, I don't understand when vardata2->rel->rows and inner_rel->rows would be
different. I thought the point of doing this clamping was that, if you have a
restriction, like the predicate in this subquery:
select * from foo where a in (select b from bar where b > 10);
your row estimate for bar and your row estimate for the rows coming out of that
subquery would be different. However, I looked at the RelOptInfos for
vardata2->rel and inner_rel for this query and it seems like they are
referencing the same relation and have the same rows estimate, so I'm confused
about when the rows would be different.

> > If there is a reason to keep the existing formula, then I have an additional
> > question about the proposed selectivity calculation:
> > selec = Min(selec, nd2 * selec_inner);
> > When would it be incorrect to instead multiply by inner side NDVs?
>
> I'm confused ... isn't that exactly what this is doing?
>

Sorry, typo, I was asking why
selec = Min(selec, nd2 * selec_inner);
could not be used instead of what is in the patch
selec = Min(selec, inner_rel->rows * selec_inner);

Thanks,
Melanie

Attachment Content-Type Size
suggested_semijoin_selec_refactor.patch application/octet-stream 6.4 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE
Date: 2018-11-17 20:22:20
Message-ID: 1224.1542486140@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Melanie Plageman <melanieplageman(at)gmail(dot)com> writes:
> On a separate note, I had one additional code clarity feedback. I felt that
> eqjoinsel could be reorganized a bit for readability/clarity for the reader.
> ...
> Based on this assumption, I've attached a patch with a rough idea for an
> alternative structure that I think would be more clear to the reader.

Hmm, that doesn't really seem like an improvement to me. As things stand,
all the actual calculations are in eqjoinsel_inner/semi; eqjoinsel itself
is only responsible for some preliminary information lookup that's needed
in all cases. My patch expands the amount of "preliminary information"
but doesn't fundamentally change that division of responsibility. It
seems like what you want to do here does change that, and I don't see
the value of breaking down the division. I also don't like the fact
that we might calculate a value that won't be used; admittedly, it's a
pretty cheap calculation so that doesn't matter much, but by the same
token we'd not be saving a lot of code by moving it.

> That's a good point. Taking another look at that clamping logic, I
> realized that I don't really understand why that clamping would be done
> for a semi-join and not for an inner join. It seems like for an inner
> join it is also true that nd1 cannot be greater than the outer rel's
> estimated tuples and nd2 cannot be greater than the inner rel's estimated
> tuples.

The main way that eqjoinsel_inner uses those values is this bit:

* We do not have MCV lists for both sides. Estimate the join
* selectivity as MIN(1/nd1,1/nd2)*(1-nullfrac1)*(1-nullfrac2). This
* is plausible if we assume that the join operator is strict and the
* non-null values are about equally distributed: a given non-null
* tuple of rel1 will join to either zero or N2*(1-nullfrac2)/nd2 rows
* of rel2, so total join rows are at most
* N1*(1-nullfrac1)*N2*(1-nullfrac2)/nd2 giving a join selectivity of
* not more than (1-nullfrac1)*(1-nullfrac2)/nd2.

In the expression N2*(1-nullfrac2)/nd2, all three values are meant to be
measured across the whole of the rel2 relation; if we were to decrease nd2
to reflect the effects of earlier filtering, we'd get an incorrect
selectivity. The same applies to eqjoinsel_semi's calculations about the
outer rel, but *not* to the inner rel, as explained here:

* We clamp nd2 to be not more than what we estimate the inner relation's
* size to be. This is intuitively somewhat reasonable since obviously
* there can't be more than that many distinct values coming from the
* inner rel. The reason for the asymmetry (ie, that we don't clamp nd1
* likewise) is that this is the only pathway by which restriction clauses
* applied to the inner rel will affect the join result size estimate,
* since set_joinrel_size_estimates will multiply SEMI/ANTI selectivity by
* only the outer rel's size. If we clamped nd1 we'd be double-counting
* the selectivity of outer-rel restrictions.

(Here, both "outer rel's size" and "inner rel's size" mean the size after
earlier filtering steps.) So that's why we only clamp nd2 and only do so
in eqjoinsel_semi: in the other three cases, we'd be double-counting the
selectivity of earlier filters if we did that.

So basically the inconsistency here comes from the fact that we define
the meaning of join selectivity differently for inner and semi joins.
I've occasionally wondered if that was a bad choice and we should just
say that selectivity should always be calculated so that the join size
is outer size times inner size times selectivity. But that would
certainly not make for any less need for the selectivity estimator to
do things differently for inner and semi joins, so I am not seeing much
upside to changing it.

> Also, I don't understand when vardata2->rel->rows and inner_rel->rows would
> be different.

vardata2->rel->rows is going to reflect the size (post restriction quals)
of the base relation that var2 came from. inner_rel->rows will be the
same if the inner side of the current join is just that one base relation;
but if the inner side is a lower join of several base relations,
inner_rel->rows will be the size of that join.

>> I'm confused ... isn't that exactly what this is doing?

> Sorry, typo, I was asking why
> selec = Min(selec, nd2 * selec_inner);
> could not be used instead of what is in the patch
> selec = Min(selec, inner_rel->rows * selec_inner);

Because the selectivity is defined as something you multiply the relation
size with, not the number of distinct values within the rel.

regards, tom lane
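
The asymmetry Tom describes falls straight out of the size formulas. Here is a
minimal sketch of the two cases, using the formulas Melanie quotes below and
made-up numbers (fkselec is shown as 1.0 only to mirror those formulas).

#include <stdio.h>

int
main(void)
{
    double  outer_rows = 1000.0;    /* outer rel after its own restrictions */
    double  inner_rows = 50.0;      /* inner rel after its own restrictions */
    double  fkselec = 1.0;          /* no FK-based adjustment in this sketch */
    double  jselec_inner = 0.002;   /* hypothetical inner-join selectivity */
    double  jselec_semi = 0.10;     /* hypothetical semi-join selectivity */

    /* inner join: both input sizes appear in the product */
    printf("inner join rows: %.0f\n",
           outer_rows * inner_rows * fkselec * jselec_inner);

    /* semi join: only the outer size appears, so filtering that shrinks
     * inner_rows can reach this estimate only through jselec_semi
     * (hence the nd2 clamp in eqjoinsel_semi) */
    printf("semi join rows:  %.0f\n",
           outer_rows * fkselec * jselec_semi);
    return 0;
}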


From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE
Date: 2018-11-20 20:24:59
Message-ID: CAAKRu_bkmaBkjyCM3bKCrHek9+WeMt+XgRengy42SCphnBVhxw@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

Given that you have addressed all of my feedback and that it's a pretty
low-risk change, I will change the status to "ready for committer".

There are a couple of minor follow-up clarifications inline that relate
mostly
to the questions that I asked in previous emails.

I did have one other question:
Has there been discussion in the past about adding a planner test extension
similar to those in src/test/modules for cardinality estimation? I am
imagining
something that is a "soft" check that either the rows estimation that comes
out
of calc_joinrel_size_estimate is within an expected range (given differing
estimates across machines) or that the selectivity estimate that comes out
of
eqjoinsel is within an expected range. The former seems relatively easy to
do
in a manner similar to the test_predtest extension and the latter seems
like it
could be done even more trivially.

On Sat, Nov 17, 2018 at 12:22 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> (Here, both "outer rel's size" and "inner rel's size" mean the size after
> earlier filtering steps.) So that's why we only clamp nd2 and only do so
> in eqjoinsel_semi: in the other three cases, we'd be double-counting the
> selectivity of earlier filters if we did that.
>

I just want to make sure I am understanding what the comment is saying: So,
after we calculate the selectivity for inner join, when we return from
calc_joinrel_size_estimate we do this math:

nrows = outer_rows * inner_rows * fkselec * jselec;

and in that equation, the outer and inner rows have been adjusted to account
for any restrictions on the tables, so we don't clamp the ndvs for inner
join
in eqjoinsel_inner. However, for semi-join, that calculation is

nrows = outer_rows * fkselec * jselec;

Which means that we have to adjust the rows of the inner side before we get
here?

> So basically the inconsistency here comes from the fact that we define
> the meaning of join selectivity differently for inner and semi joins.
> I've occasionally wondered if that was a bad choice and we should just
> say that selectivity should always be calculated so that the join size
> is outer size times inner size times selectivity. But that would
> certainly not make for any less need for the selectivity estimator to
> do things differently for inner and semi joins, so I am not seeing much
> upside to changing it.
>

I see what you are saying. I got tangled up in this part of the code, so I
am
inclined to say that it could stand to be more clear. Selectivity is a
ratio,
and, even if you calculate the two sides of the ratio differently, that
doesn't
mean the definition of the ratio should be different.

Also, I wanted to address a question you asked in an earlier email:
You wrote:
> Hm. Maybe the "Psemi" and "Pinner" notation is not helpful ... would
> "Ssemi" and "Sinner" be better?

I think Ssemi and Sinner might be more clear--mostly because we haven't used
P/predicate here or in most of the other selectivity estimation comments
that I
read. Also, in some cases when we have super limited information and make a
guess, the selectivity feels pretty detached from the join predicate.

Thanks!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE
Date: 2018-11-23 17:17:34
Message-ID: 4905.1542993454@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Melanie Plageman <melanieplageman(at)gmail(dot)com> writes:
> Given that you have addressed all of my feedback and that it's a pretty
> low-risk change, I will change the status to "ready for committer".

Thanks for reviewing!

> Has there been discussion in the past about adding a planner test
> extension similar to those in src/test/modules for cardinality
> estimation? I am imagining something that is a "soft" check that either
> the rows estimation that comes out of calc_joinrel_size_estimate is
> within an expected range (given differing estimates across machines) or
> that the selectivity estimate that comes out of eqjoinsel is within an
> expected range. The former seems relatively easy to do in a manner
> similar to the test_predtest extension and the latter seems like it
> could be done even more trivially.

No, I don't recall any discussion about that. The regression tests in
general embody a lot of checking that the planner makes expected plan
choices: obviously the cases where we do an explicit EXPLAIN do that,
but even where we don't, we'd be likely to get artifacts such as varying
row order if an unexpected plan were chosen. Perhaps there's a use-case
for a lower-level test harness such as you suggest, but I haven't really
felt a need for it.

> On Sat, Nov 17, 2018 at 12:22 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> (Here, both "outer rel's size" and "inner rel's size" mean the size after
>> earlier filtering steps.) So that's why we only clamp nd2 and only do so
>> in eqjoinsel_semi: in the other three cases, we'd be double-counting the
>> selectivity of earlier filters if we did that.

> I just want to make sure I am understanding what the comment is saying: So,
> after we calculate the selectivity for inner join, when we return from
> calc_joinrel_size_estimate we do this math:
> nrows = outer_rows * inner_rows * fkselec * jselec;
> and in that equation, the outer and inner rows have been adjusted to account
> for any restrictions on the tables, so we don't clamp the ndvs for inner
> join in eqjoinsel_inner. However, for semi-join, that calculation is
> nrows = outer_rows * fkselec * jselec;
> Which means that we have to adjust the rows of the inner side before we get
> here?

Yeah. Basically the point is that if we have some WHERE clause that
eliminates rows from the inner side of a semijoin, we can expect that
that means the size of the semijoin result will be smaller than if the
WHERE clause hadn't been there --- because some of the outer-rel rows
only had matches among those excluded rows. But the equation in
calc_joinrel_size_estimate provides no way to factor that in, except
by adjusting the selectivity value, so that's what we do.

> You wrote:
>> Hm. Maybe the "Psemi" and "Pinner" notation is not helpful ... would
>> "Ssemi" and "Sinner" be better?

> I think Ssemi and Sinner might be more clear--mostly because we haven't used
> P/predicate here or in most of the other selectivity estimation comments
> that I
> read. Also, in some cases when we have super limited information and make a
> guess, the selectivity feels pretty detached from the join predicate.

OK, thanks --- I'll have another go at writing that comment.

regards, tom lane