Quick Links

Re: [HACKERS] Some progress on INSERT/SELECT/GROUP BY bugs

Lists:	pgsql-hackers

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: [HACKERS] Some progress on INSERT/SELECT/GROUP BY bugs
Date:	1999-07-07 13:54:11
Message-ID:	25813.931355651@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> writes:
> Tom, was this done?

This is not done --- I wasn't willing to try to do such a thing by
myself when we were already in 6.5 beta. It's on my todo list for 6.6.

6.5 fails in a different way than 6.4 did, for reasons that I don't
recall offhand, but the only real fix is to restructure the analyzer.

regards, tom lane

>> I believe I've identified the main cause of the peculiar behavior we
>> are seeing with INSERT ... SELECT ... GROUP/ORDER BY: it's a subtle
>> parser bug.
>>
>> Here is the test case I'm looking at:
>>
>> CREATE TABLE si_tmpverifyaccountbalances (
>> type int4 NOT NULL,
>> memberid int4 NOT NULL,
>> categoriesid int4 NOT NULL,
>> amount numeric);
>>
>> CREATE TABLE invoicelinedetails (
>> invoiceid int4,
>> memberid int4,
>> totshippinghandling numeric,
>> invoicelinesid int4);
>>
>> INSERT INTO si_tmpverifyaccountbalances SELECT invoiceid+3,
>> memberid, 1, totshippinghandling FROM invoicelinedetails
>> GROUP BY invoiceid+3, memberid, totshippinghandling;
>>
>> ERROR: INSERT has more expressions than target columns
>>
>> The reason this is coming out is that the matching of GROUP BY (also
>> ORDER BY) items to targetlist entries is fundamentally broken in this
>> context. The GROUP BY items "memberid" and "totshippinghandling" are
>> simply unvarnished Ident nodes when they arrive at findTargetlistEntry()
>> in parse_clause.c; what findTargetlistEntry() does with them is to try
>> to match them against the resdom names of the existing targetlist items.
>> I think that's correct behavior in the plain SELECT case (but note it
>> means "SELECT a AS b, b AS c GROUP BY b" will really group by a not b
>> --- is that per spec??). But it fails miserably in the INSERT/SELECT
>> case, because by the time control gets here, the targetlist items have
>> been given resdom names *corresponding to the column names of the target
>> table*.
>>
>> So, in the example at hand, "memberid" is matched to the correct column
>> by pure luck (because it has the same name in the destination table),
>> and then "totshippinghandling" is not recognized as one of the existing
>> TLEs because it does not match any destination column name.
>>
>> Now, call me silly, but it seems to me that SELECT ... GROUP BY ought
>> to mean the same thing no matter whether there is an INSERT in front of
>> it or not, and thus that letting target column names affect the meaning
>> of GROUP BY items is dead wrong. (Don't have a spec to check this with,
>> however.)
>>
>> I believe the most reasonable fix for this is to postpone relabeling
>> of the targetlist entries with destination column names until after
>> analysis of the SELECT's subsidiary clauses is complete. In particular,
>> it should *not* be done instantly when each TLE is made, which is what
>> MakeTargetEntryIdent currently does. The TLEs should have the same
>> resnames as in the SELECT case until after subsidiary clause processing
>> is done.
>>
>> (MakeTargetEntryIdent is broken anyway because it tries to associate
>> a destination column with every TLE, even the resjunk ones. The reason
>> we see the quoted error message in this situation is that after
>> findTargetlistEntry fails to detect that totshippinghandling is already
>> a TLE, it calls MakeTargetEntryIdent to make a junk TLE for
>> totshippinghandling, and then MakeTargetEntryIdent tries to find a
>> target column to go with the junk TLE. So the revised code should only
>> assign dest column names to non-junk TLEs.)
>>
>> I'm not really familiar enough with the parser to want to tackle this
>> size of change by myself --- Thomas, do you want to do it? I think it's
>> largely a matter of moving code around, but I'm not sure where is the
>> right place for it...
>>
>> regards, tom lane
>>
>>

> --
> Bruce Momjian | http://www.op.net/~candle
> maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
> + If your life is a hard drive, | 830 Blythe Avenue
> + Christ can be your backup. | Drexel Hill, Pennsylvania 19026

From:	Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: [HACKERS] Some progress on INSERT/SELECT/GROUP BY bugs
Date:	1999-07-07 16:36:33
Message-ID:	199907071636.MAA01188@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Tom, was this done?
>
> This is not done --- I wasn't willing to try to do such a thing by
> myself when we were already in 6.5 beta. It's on my todo list for 6.6.

On your list. Good. I can't possibly figure out how to describe this
bug.

>
> 6.5 fails in a different way than 6.4 did, for reasons that I don't
> recall offhand, but the only real fix is to restructure the analyzer.
>
> regards, tom lane
>
>
> >> I believe I've identified the main cause of the peculiar behavior we
> >> are seeing with INSERT ... SELECT ... GROUP/ORDER BY: it's a subtle
> >> parser bug.
> >>
> >> Here is the test case I'm looking at:
> >>
> >> CREATE TABLE si_tmpverifyaccountbalances (
> >> type int4 NOT NULL,
> >> memberid int4 NOT NULL,
> >> categoriesid int4 NOT NULL,
> >> amount numeric);
> >>
> >> CREATE TABLE invoicelinedetails (
> >> invoiceid int4,
> >> memberid int4,
> >> totshippinghandling numeric,
> >> invoicelinesid int4);
> >>
> >> INSERT INTO si_tmpverifyaccountbalances SELECT invoiceid+3,
> >> memberid, 1, totshippinghandling FROM invoicelinedetails
> >> GROUP BY invoiceid+3, memberid, totshippinghandling;
> >>
> >> ERROR: INSERT has more expressions than target columns
> >>
> >> The reason this is coming out is that the matching of GROUP BY (also
> >> ORDER BY) items to targetlist entries is fundamentally broken in this
> >> context. The GROUP BY items "memberid" and "totshippinghandling" are
> >> simply unvarnished Ident nodes when they arrive at findTargetlistEntry()
> >> in parse_clause.c; what findTargetlistEntry() does with them is to try
> >> to match them against the resdom names of the existing targetlist items.
> >> I think that's correct behavior in the plain SELECT case (but note it
> >> means "SELECT a AS b, b AS c GROUP BY b" will really group by a not b
> >> --- is that per spec??). But it fails miserably in the INSERT/SELECT
> >> case, because by the time control gets here, the targetlist items have
> >> been given resdom names *corresponding to the column names of the target
> >> table*.
> >>
> >> So, in the example at hand, "memberid" is matched to the correct column
> >> by pure luck (because it has the same name in the destination table),
> >> and then "totshippinghandling" is not recognized as one of the existing
> >> TLEs because it does not match any destination column name.
> >>
> >> Now, call me silly, but it seems to me that SELECT ... GROUP BY ought
> >> to mean the same thing no matter whether there is an INSERT in front of
> >> it or not, and thus that letting target column names affect the meaning
> >> of GROUP BY items is dead wrong. (Don't have a spec to check this with,
> >> however.)
> >>
> >> I believe the most reasonable fix for this is to postpone relabeling
> >> of the targetlist entries with destination column names until after
> >> analysis of the SELECT's subsidiary clauses is complete. In particular,
> >> it should *not* be done instantly when each TLE is made, which is what
> >> MakeTargetEntryIdent currently does. The TLEs should have the same
> >> resnames as in the SELECT case until after subsidiary clause processing
> >> is done.
> >>
> >> (MakeTargetEntryIdent is broken anyway because it tries to associate
> >> a destination column with every TLE, even the resjunk ones. The reason
> >> we see the quoted error message in this situation is that after
> >> findTargetlistEntry fails to detect that totshippinghandling is already
> >> a TLE, it calls MakeTargetEntryIdent to make a junk TLE for
> >> totshippinghandling, and then MakeTargetEntryIdent tries to find a
> >> target column to go with the junk TLE. So the revised code should only
> >> assign dest column names to non-junk TLEs.)
> >>
> >> I'm not really familiar enough with the parser to want to tackle this
> >> size of change by myself --- Thomas, do you want to do it? I think it's
> >> largely a matter of moving code around, but I'm not sure where is the
> >> right place for it...
> >>
> >> regards, tom lane
> >>
> >>
>
>
> > --
> > Bruce Momjian | http://www.op.net/~candle
> > maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
> > + If your life is a hard drive, | 830 Blythe Avenue
> > + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
>

--
Bruce Momjian | http://www.op.net/~candle
maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026