Re: Rules and Views

Lists: pgsql-hackers
From: "Zeugswetter Andreas SB SD" <ZeugswetterA(at)spardat(dot)at>
To: "Curt Sampson" <cjs(at)cynic(dot)net>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Hannu Krosing" <hannu(at)tm(dot)ee>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rules and Views
Date: 2002-08-01 10:23:03
Message-ID: 46C15C39FEB2C44BA555E356FBCD6FA48879A6@m0114.s-mxs.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> But the message I was replying to was a similar union query, and I was
> thinking that that person might be having a similar initial intuitive
> reaction, "well, it looks kinda the same." I just wanted to note that
> you need to check this stuff with explain, rather than
> blindly assuming
> you know what's going on.

I had a "union all" view, which is actually a quite different animal than
a "union" view which needs to eliminate duplicates before further processing.

Andreas


From: Curt Sampson <cjs(at)cynic(dot)net>
To: Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hannu Krosing <hannu(at)tm(dot)ee>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rules and Views
Date: 2002-08-01 10:29:04
Message-ID: Pine.NEB.4.44.0208011928380.539-100000@angelic.cynic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 1 Aug 2002, Zeugswetter Andreas SB SD wrote:

> I had a "union all" view, which is actually a quite different animal than
> a "union" view which needs to eliminate duplicates before further processing.

I had the same problem with UNION ALL.

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC


From: Hannu Krosing <hannu(at)tm(dot)ee>
To: Curt Sampson <cjs(at)cynic(dot)net>
Cc: Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Rules and Views
Date: 2002-08-01 13:58:45
Message-ID: 1028210325.12593.5.camel@taru.tm.ee
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2002-08-01 at 12:29, Curt Sampson wrote:
> On Thu, 1 Aug 2002, Zeugswetter Andreas SB SD wrote:
>
> > I had a "union all" view, which is actually a quite different animal than
> > a "union" view which needs to eliminate duplicates before further processing.
>
> I had the same problem with UNION ALL.
>

Could someone give an example where it is not safe to push the WHERE
clause down to individual parts of UNION (or UNION ALL) wher these parts
are simple (non-aggregate) queries?

I can see that it has to be made into HAVING in subquery if UNION's
subqueries are aggregate (GROUP BY) queries, but can anyone give an
example where the meaning of the query changes for non-aggregate
subqueries.

---------------
Hannu


From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: Hannu Krosing <hannu(at)tm(dot)ee>
Cc: Curt Sampson <cjs(at)cynic(dot)net>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rules and Views
Date: 2002-08-01 14:43:57
Message-ID: 20020801073659.T28695-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 1 Aug 2002, Hannu Krosing wrote:

> On Thu, 2002-08-01 at 12:29, Curt Sampson wrote:
> > On Thu, 1 Aug 2002, Zeugswetter Andreas SB SD wrote:
> >
> > > I had a "union all" view, which is actually a quite different animal than
> > > a "union" view which needs to eliminate duplicates before further processing.
> >
> > I had the same problem with UNION ALL.
> >
>
> Could someone give an example where it is not safe to push the WHERE
> clause down to individual parts of UNION (or UNION ALL) wher these parts
> are simple (non-aggregate) queries?

For union, queries that want to do something like use a temporary
sequence to act sort of like rownum and do row limiting. Admittedly
that's already pretty much unspecified behavior, but it does change
the behavior in the place of duplicate removal. In addition, I think
using bits of the spec we don't completely support you can have the
same issue with the undefined behavior of which duplicate is returned
for values that aren't the same but are equal, for example where the
duplicate removal is in one collation but the outer comparison has
a different explicitly given one.

I haven't come up with any useful examples, and not really any for
union all, however.


From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: Hannu Krosing <hannu(at)tm(dot)ee>
Cc: Curt Sampson <cjs(at)cynic(dot)net>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rules and Views
Date: 2002-08-01 15:04:03
Message-ID: 20020801075045.X28826-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Thu, 1 Aug 2002, Stephan Szabo wrote:

> On 1 Aug 2002, Hannu Krosing wrote:
>
> > On Thu, 2002-08-01 at 12:29, Curt Sampson wrote:
> > > On Thu, 1 Aug 2002, Zeugswetter Andreas SB SD wrote:
> > >
> > > > I had a "union all" view, which is actually a quite different animal than
> > > > a "union" view which needs to eliminate duplicates before further processing.
> > >
> > > I had the same problem with UNION ALL.
> > >
> >
> > Could someone give an example where it is not safe to push the WHERE
> > clause down to individual parts of UNION (or UNION ALL) wher these parts
> > are simple (non-aggregate) queries?
>
> For union, queries that want to do something like use a temporary
> sequence to act sort of like rownum and do row limiting. Admittedly
> that's already pretty much unspecified behavior, but it does change
> the behavior in the place of duplicate removal. In addition, I think
> using bits of the spec we don't completely support you can have the
> same issue with the undefined behavior of which duplicate is returned
> for values that aren't the same but are equal, for example where the
> duplicate removal is in one collation but the outer comparison has
> a different explicitly given one.

Replying to myself, you can do this right now with char columns if you
just push the conditions down blindly, something like:

create table t1(a char(5));
create table t2(a char(6));

insert into t1 values ('aaaaa');
insert into t2 values ('aaaaa');

select * from (select * from t2 union select * from t1) as f where
a::text='aaaaa';
select * from (select * from t2 where a::text='aaaaa' union
select * from t1 where a::text='aaaaa') as f;

The first select gives no rows, the second gives one. We'd have
to transform the second where clause to something like
cast(a as char(6))::text='aaaaa' in order to get the same effect
I think.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
Cc: Hannu Krosing <hannu(at)tm(dot)ee>, Curt Sampson <cjs(at)cynic(dot)net>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Rules and Views
Date: 2002-08-01 16:02:30
Message-ID: 7905.1028217750@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> writes:
> For union, queries that want to do something like use a temporary
> sequence to act sort of like rownum and do row limiting. Admittedly
> that's already pretty much unspecified behavior, but it does change
> the behavior in the place of duplicate removal. In addition, I think
> using bits of the spec we don't completely support you can have the
> same issue with the undefined behavior of which duplicate is returned
> for values that aren't the same but are equal, for example where the
> duplicate removal is in one collation but the outer comparison has
> a different explicitly given one.

Hmm. I think this consideration boils down to whether the WHERE clause
can give different results for rows that appear equal under the rules of
UNION/EXCEPT/INTERSECT. If it gives the same result for any two such
rows, then it's safe to push down; otherwise not.

It's not too difficult to come up with examples. I invite you to play
with

select z,length(z) from
(select 'abc '::char(7) as z intersect
select 'abc '::char(8) as z) ss;

and contemplate the effects of pushing down a qual involving length(z).

Whether this particular case is very important in the real world is hard
to say. But there might be more-important cases out there.

And yet, I think we can do it anyway. The score card looks like this to
me:

UNION ALL: always safe to push down, since the rows will be passed
independently to the outer WHERE anyway.

UNION: it's unspecified which of a set of "equal" rows will be returned,
and therefore the behavior would be unspecified anyway if the outer
WHERE can distinguish the rows - you might get 1 row of the set out or
none. If we push down, then we create a situation where the returned
row will always be one that passes the outer WHERE, but that is a legal
behavior.

INTERSECT: again it's unspecified which of a set of "equal" rows will be
returned, and so you might get 1 row out or none. If we push down then
it's still unspecified whether you get a row out (example: if the outer
WHERE will pass only for rows of the left table and not the right, then
push down will result in no rows of the "equal" set being emitted, but
that's a legal behavior).

INTERSECT ALL: if a set of "equal" rows contains M rows from the left
table and N from the right table, you're supposed to get min(M,N) rows
of the set out of the INTERSECT ALL. Again you can't say which of the
set you will get, so the outer WHERE might let anywhere between 0 and
min(M,N) rows out. With push down, M and N will be reduced by the WHERE
before we do the intersection, so you still have 0 to min(M,N) rows out.
The behavior will change, but it's still legal per spec AFAICT.

EXCEPT, EXCEPT ALL: the same sort of analysis seems to hold.

In short, it looks to me like the spec was carefully designed to allow
push down. Pushing down a condition of this sort *does* change the
behavior, but the new behavior is still within spec.

The above analysis assumes that the WHERE condition is "stable", ie its
results for a row don't depend on the order in which the rows are tested
or anything as weird as that. But we're assuming that already when we
push down a qual in a non-set-operation case, I think.

Comments? Are there any other considerations to worry about?

regards, tom lane


From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hannu Krosing <hannu(at)tm(dot)ee>, Curt Sampson <cjs(at)cynic(dot)net>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rules and Views
Date: 2002-08-01 16:27:58
Message-ID: 20020801091507.Y29244-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Thu, 1 Aug 2002, Tom Lane wrote:

> Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> writes:
> > For union, queries that want to do something like use a temporary
> > sequence to act sort of like rownum and do row limiting. Admittedly
> > that's already pretty much unspecified behavior, but it does change
> > the behavior in the place of duplicate removal. In addition, I think
> > using bits of the spec we don't completely support you can have the
> > same issue with the undefined behavior of which duplicate is returned
> > for values that aren't the same but are equal, for example where the
> > duplicate removal is in one collation but the outer comparison has
> > a different explicitly given one.
>
> Hmm. I think this consideration boils down to whether the WHERE clause
> can give different results for rows that appear equal under the rules of
> UNION/EXCEPT/INTERSECT. If it gives the same result for any two such
> rows, then it's safe to push down; otherwise not.
>
> It's not too difficult to come up with examples. I invite you to play
> with
>
> select z,length(z) from
> (select 'abc '::char(7) as z intersect
> select 'abc '::char(8) as z) ss;
>
> and contemplate the effects of pushing down a qual involving length(z).
>
> Whether this particular case is very important in the real world is hard
> to say. But there might be more-important cases out there.
>
> And yet, I think we can do it anyway. The score card looks like this to
> me:
>
> UNION ALL: always safe to push down, since the rows will be passed
> independently to the outer WHERE anyway.
>
> UNION: it's unspecified which of a set of "equal" rows will be returned,
> and therefore the behavior would be unspecified anyway if the outer
> WHERE can distinguish the rows - you might get 1 row of the set out or
> none. If we push down, then we create a situation where the returned
> row will always be one that passes the outer WHERE, but that is a legal
> behavior.
>
> INTERSECT: again it's unspecified which of a set of "equal" rows will be
> returned, and so you might get 1 row out or none. If we push down then
> it's still unspecified whether you get a row out (example: if the outer
> WHERE will pass only for rows of the left table and not the right, then
> push down will result in no rows of the "equal" set being emitted, but
> that's a legal behavior).
>
> INTERSECT ALL: if a set of "equal" rows contains M rows from the left
> table and N from the right table, you're supposed to get min(M,N) rows
> of the set out of the INTERSECT ALL. Again you can't say which of the
> set you will get, so the outer WHERE might let anywhere between 0 and
> min(M,N) rows out. With push down, M and N will be reduced by the WHERE
> before we do the intersection, so you still have 0 to min(M,N) rows out.
> The behavior will change, but it's still legal per spec AFAICT.
>

> EXCEPT, EXCEPT ALL: the same sort of analysis seems to hold.

Actually I think in except you may only push down to the left, since in
this case you know that any duplicate from the right will not be
returned (since there must be none). So, you can't potentially drop
a row from the right side that may have been a duplicate of a left
side row that does match the condition.

If we assume two collations one case sensitive one not with the
except in the non-sensitive and the where in the sensitive and
a left with 'A' and right with 'a', it'd be incorrect to push a
case sensitive where foo='A' down to the right since that'd change the
output from zero rows to one.

Something similar for except all since lowering the number of rows
on the right can increase the number of returned rows above
m-n (if say all m dups match the condition and none of n do)

> The above analysis assumes that the WHERE condition is "stable", ie its
> results for a row don't depend on the order in which the rows are tested
> or anything as weird as that. But we're assuming that already when we
> push down a qual in a non-set-operation case, I think.

In which case we don't have to worry about the nextval() case.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
Cc: Hannu Krosing <hannu(at)tm(dot)ee>, Curt Sampson <cjs(at)cynic(dot)net>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Rules and Views
Date: 2002-08-01 16:42:01
Message-ID: 8129.1028220121@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> writes:
> Actually I think in except you may only push down to the left, since in
> this case you know that any duplicate from the right will not be
> returned (since there must be none). So, you can't potentially drop
> a row from the right side that may have been a duplicate of a left
> side row that does match the condition.

But we *want* to push down --- the point is to get some selectivity
into the bottom queries. You're right that in a plain EXCEPT it would
be possible to push only to the left, but that doesn't give the
performance improvement we want.

> If we assume two collations one case sensitive one not with the
> except in the non-sensitive and the where in the sensitive and
> a left with 'A' and right with 'a', it'd be incorrect to push a
> case sensitive where foo='A' down to the right since that'd change the
> output from zero rows to one.

You missed my point. Per spec, either zero or one rows out of the whole
thing is okay, because either the 'A' or the 'a' row might be returned
as the representative row for the group by the EXCEPT. Yes, the
behavior may change, but it's still within spec.

> In which case we don't have to worry about the nextval() case.

Yeah, I think nextval() and random() and so forth can be ignored;
the transformations we already do will confuse the results for such
cases, so one more isn't gonna make it worse.

regards, tom lane


From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hannu Krosing <hannu(at)tm(dot)ee>, Curt Sampson <cjs(at)cynic(dot)net>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rules and Views
Date: 2002-08-01 17:19:07
Message-ID: 20020801100955.K29686-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 1 Aug 2002, Tom Lane wrote:

> Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> writes:

> > If we assume two collations one case sensitive one not with the
> > except in the non-sensitive and the where in the sensitive and
> > a left with 'A' and right with 'a', it'd be incorrect to push a
> > case sensitive where foo='A' down to the right since that'd change the
> > output from zero rows to one.
>
> You missed my point. Per spec, either zero or one rows out of the whole
> thing is okay, because either the 'A' or the 'a' row might be returned
> as the representative row for the group by the EXCEPT. Yes, the
> behavior may change, but it's still within spec.

Except can't return 'A' or 'a', there is no representative row because
n>0. That's the difference with UNION and INTERSECT.

"If EXCEPT is specified, then
Case:
A) If m>0 and n=0, then T contains exactly one duplicate of R.
B) Otherwise, T contains no duplicate of R."

So if T1 has a #dups>0 and T2 has a #dups>0 we should get
no rows, but what if T1' (with the clause) has a #dups>0 but
T2' has a #dups=0?


From: Hannu Krosing <hannu(at)tm(dot)ee>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>, Curt Sampson <cjs(at)cynic(dot)net>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Rules and Views
Date: 2002-08-01 17:58:29
Message-ID: 1028224710.12592.40.camel@taru.tm.ee
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2002-08-01 at 18:02, Tom Lane wrote:
> Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> writes:
> > For union, queries that want to do something like use a temporary
> > sequence to act sort of like rownum and do row limiting. Admittedly
> > that's already pretty much unspecified behavior, but it does change
> > the behavior in the place of duplicate removal. In addition, I think
> > using bits of the spec we don't completely support you can have the
> > same issue with the undefined behavior of which duplicate is returned
> > for values that aren't the same but are equal, for example where the
> > duplicate removal is in one collation but the outer comparison has
> > a different explicitly given one.
>
> Hmm. I think this consideration boils down to whether the WHERE clause
> can give different results for rows that appear equal under the rules of
> UNION/EXCEPT/INTERSECT.

Yes. I originally started to ponder this when trying to draw up a plan
for automatic generation of ON UPDATE DO INSTEAD rules for views. While
pushing down the WHERE clause is just a performance thing for SELECT it
is essential for ON UPDATE rules.

> If it gives the same result for any two such
> rows, then it's safe to push down; otherwise not.
>
> It's not too difficult to come up with examples. I invite you to play
> with
>
> select z,length(z) from
> (select 'abc '::char(7) as z intersect
> select 'abc '::char(8) as z) ss;
>
> and contemplate the effects of pushing down a qual involving length(z).

I guess the pushdown must also push implicit conversions done to parts
of union.

if that conversion were applied to z's in both parts of UNION then the
result should be the same.

select z,length(z) from
(
select 'abc '::char(7) as z
union
select 'abc '::char(8) as z
) ss where length(z) = 7;

becomes:

select z,length(z) from
(
select 'abc '::char(7) as z
where length(cast('abc '::char(7) as char(7))) = 7
union
select 'abc '::char(8) as z
where length(cast('abc '::char(8) as char(7))) = 7
) ss ;

which both return 'abc ', 7

Of course it is beneficial to detect when the conversion is not needed,
so that indexes will be used if available.

---------------
Hannu


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
Cc: Hannu Krosing <hannu(at)tm(dot)ee>, Curt Sampson <cjs(at)cynic(dot)net>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Rules and Views
Date: 2002-08-01 18:59:40
Message-ID: 13424.1028228380@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> writes:
> So if T1 has a #dups>0 and T2 has a #dups>0 we should get
> no rows, but what if T1' (with the clause) has a #dups>0 but
> T2' has a #dups=0?

Um, you're right --- pushing down into the right-hand side would reduce
N, thereby possibly *increasing* the number of output rows not reducing
it. My mistake ... should have worked out the EXCEPT case in more
detail.

This says that we can't push down at all in the EXCEPT ALL case, I
think, and I'm leery about whether we should push for EXCEPT. But
the UNION and INTERSECT cases are probably the important ones anyway.

regards, tom lane


From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hannu Krosing <hannu(at)tm(dot)ee>, Curt Sampson <cjs(at)cynic(dot)net>, Zeugswetter Andreas SB SD <ZeugswetterA(at)spardat(dot)at>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rules and Views
Date: 2002-08-01 20:01:00
Message-ID: 20020801120226.X30842-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Thu, 1 Aug 2002, Tom Lane wrote:

> Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> writes:
> > So if T1 has a #dups>0 and T2 has a #dups>0 we should get
> > no rows, but what if T1' (with the clause) has a #dups>0 but
> > T2' has a #dups=0?
>
> Um, you're right --- pushing down into the right-hand side would reduce
> N, thereby possibly *increasing* the number of output rows not reducing
> it. My mistake ... should have worked out the EXCEPT case in more
> detail.
>
> This says that we can't push down at all in the EXCEPT ALL case, I
> think, and I'm leery about whether we should push for EXCEPT. But
> the UNION and INTERSECT cases are probably the important ones anyway.

I think that we can push to the left in both (should is a separate issue).

If the condition is true for all of the left hand dups, we can
choose to have emitted such rows as the output of the EXCEPT ALL in
the theoretical case so that the output is the same, max(0, m-n) rows.
If the condition is false for any of the left hand dups, we can safely
return any number of rows between 0 and max(0,m-n) rows since we can
say that the difference were rows that failed the where clause. If
we push the condition down, we'll get some number m1 rows that succeed
the condition (with m1<m), so returning max(0, m1-n) should be safe.
If the condition is false for all of the rows, m1=0 so we'll correctly
return no rows.

I think.