Lists: | pgsql-hackers |
---|
From: | Keith Parks <emkxp01(at)mtcc(dot)demon(dot)co(dot)uk> |
---|---|
To: | szybist(at)boxhill(dot)com |
Cc: | maillist(at)candle(dot)pha(dot)pa(dot)us, hackers(at)postgresql(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-08-31 19:19:29 |
Message-ID: | 199808311919.UAA01013@mtcc.demon.co.uk |
Lists: | pgsql-hackers |
Thomas A. Szybist <szybist(at)boxhill(dot)com>
>
> >
> > If I compile backend/catalog with -O2 then the table creation is
> ^^^
> > OK. So it looks like it may be indexing.c, even with Bruce's
> > recent fixes.
>
> Do you mean -O0 here?
>
Yes, a typo, I used -O0 for this dir.
>
> I managed to get this running on a Solaris box. -O2 was not included
> by default (wonder why :)). I got a core dump when running initdb
> with -O2. I recompiled indexing.c without -O2, and it is much better.
> (I basically get the same results as under Linux.) I get the same
> core dumps that Keith is seeing with create function.
>
> So, both my Sparc boxes are behaving the same.
>
I've not got round to trying a build on my Solaris 2.6 box yet. I was
hoping that someone with something faster than a SPARC 2 would do
the biz and get the same results.
So we have at least two problems, some code that is tickling a gcc
optimiser bug (gcc 2.7.2.1 in my case) and an alignment bug in our
code that affects SPARC architecture.
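A generic sketch of the kind of alignment bug meant here (not code from the PostgreSQL tree; read_int32_at() and write_int32_at() are hypothetical helpers): SPARC requires a 4-byte integer to be loaded from a 4-byte-aligned address, while x86 tolerates misaligned loads, so the same pointer cast can crash on one machine and run fine on the other. Copying the bytes through memcpy() avoids the question entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * The tempting version,
 *     return *(const int32_t *) (buf + off);
 * can fault (SIGBUS) on strict-alignment CPUs such as SPARC whenever
 * buf + off is not a multiple of 4, yet appear to work on x86.
 * Copying the bytes with memcpy() makes no alignment assumption.
 */
int32_t read_int32_at(const char *buf, size_t off)
{
    int32_t v;

    assert(buf != NULL);
    memcpy(&v, buf + off, sizeof v);
    return v;
}

/* The matching store, equally safe at any byte offset. */
void write_int32_at(char *buf, size_t off, int32_t v)
{
    assert(buf != NULL);
    memcpy(buf + off, &v, sizeof v);
}
```

For example, after write_int32_at(buf, 1, 123456789) on a plain char buffer, read_int32_at(buf, 1) returns 123456789 on either architecture.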
I've half a mind to see if there is a later version of gcc that
does the optimisation correctly (in RPM format for Red Hat 4.2).
For the "create function" problem it is a little harder for me to see
a way forward (my debugging skills are very limited).
Keith.
From: | "Thomas G(dot) Lockhart" <lockhart(at)alumni(dot)caltech(dot)edu> |
---|---|
To: | Keith Parks <emkxp01(at)mtcc(dot)demon(dot)co(dot)uk>, maillist(at)candle(dot)pha(dot)pa(dot)us |
Cc: | szybist(at)boxhill(dot)com, hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-01 04:35:37 |
Message-ID: | 35EB7999.ACB12121@alumni.caltech.edu |
Lists: | pgsql-hackers |
> For the "create function" problem it is a little harder for me to see
> a way forward (my debugging skills are very limited).
Hmm. Bruce's most recent patches didn't fix my problems on Linux/i686
reported earlier. So I figured I'd try a full build with -O0 just to see
if it helped. Not only did it not help, but I got several other
regression tests failing, some with core dumps which did not crash with
-O2. Weird.
- Tom
From: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | emkxp01(at)mtcc(dot)demon(dot)co(dot)uk |
Cc: | szybist(at)boxhill(dot)com, hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-01 06:06:40 |
Message-ID: | 199809010606.CAA28809@candle.pha.pa.us |
Lists: | pgsql-hackers |
> Thomas A. Szybist <szybist(at)boxhill(dot)com>
> >
> > >
> > > If I compile backend/catalog with -O2 then the table creation is
> > ^^^
> > > OK. So it looks like it may be indexing.c, even with Bruce's
> > > recent fixes.
> >
> > Do you mean -O0 here?
> >
>
> Yes, a typo, I used -O0 for this dir.
>
Can you try:
select * from pg_index;
Crashes here. Not good.
--
Bruce Momjian | 830 Blythe Avenue
maillist(at)candle(dot)pha(dot)pa(dot)us | Drexel Hill, Pennsylvania 19026
+ If your life is a hard drive, | (610) 353-9879(w)
+ Christ can be your backup. | (610) 853-3000(h)
From: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | emkxp01(at)mtcc(dot)demon(dot)co(dot)uk |
Cc: | hackers(at)postgreSQL(dot)org (PostgreSQL-development) |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-01 06:23:55 |
Message-ID: | 199809010623.CAA00709@candle.pha.pa.us |
Lists: | pgsql-hackers |
> Thomas A. Szybist <szybist(at)boxhill(dot)com>
> >
> > >
> > > If I compile backend/catalog with -O2 then the table creation is
> > ^^^
> > > OK. So it looks like it may be indexing.c, even with Bruce's
> > > recent fixes.
> >
> > Do you mean -O0 here?
> >
>
> Yes, a typo, I used -O0 for this dir.
One idea is to track heapDescriptor from CatalogIndexInsert() all the
way down into the lower functions.
Compile with assert checking, which I assume you are already doing.
Add "Assert(heapDescriptor->natts != 0)" to that function, and in the
lower functions use whatever name the descriptor takes there as a
function parameter in place of heapDescriptor.
When the assert fails, we can see where it is getting messed up.
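Bruce's suggestion amounts to repeating one sanity check at each level of the call chain, so that the first level where it fails shows where the descriptor gets clobbered. A minimal sketch of that pattern, assuming a stripped-down descriptor: DescSketch, check_desc() and the *_sketch functions are hypothetical stand-ins, not backend code, and check_desc() records the first failing function rather than aborting the way the backend's Assert() does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the backend's tuple descriptor. */
typedef struct
{
    int natts;                  /* attribute count; 0 means clobbered */
} DescSketch;

/* Name of the first function whose check failed, or NULL if none has. */
static const char *first_failure = NULL;

/*
 * Same condition as Assert(heapDescriptor->natts != 0), but it records
 * where it first failed instead of aborting, so a whole call chain can
 * be checked in one run.
 */
static void check_desc(const DescSketch *desc, const char *where)
{
    if (desc->natts == 0 && first_failure == NULL)
    {
        first_failure = where;
        fprintf(stderr, "descriptor clobbered before %s\n", where);
    }
}

/* Returns 1 if the first failed check was in the named function. */
int failed_in(const char *name)
{
    return first_failure != NULL && strcmp(first_failure, name) == 0;
}

/* Lowest level: the parameter has a new name, so the check uses it. */
static void form_tuple_sketch(DescSketch *tupdesc)
{
    check_desc(tupdesc, __func__);
}

/* Middle level: zeroes natts when corrupt is set, to simulate the bug. */
static void insert_helper_sketch(DescSketch *desc, int corrupt)
{
    check_desc(desc, __func__);
    if (corrupt)
        desc->natts = 0;
    form_tuple_sketch(desc);
}

/* Top level, playing the role of CatalogIndexInsert(). */
void catalog_insert_sketch(DescSketch *heapDescriptor, int corrupt)
{
    assert(heapDescriptor != NULL);
    check_desc(heapDescriptor, __func__);
    insert_helper_sketch(heapDescriptor, corrupt);
}
```

Calling catalog_insert_sketch(&d, 1) with d.natts = 5 leaves failed_in("form_tuple_sketch") true: the checks on entry to the two upper levels pass, and the first check after the simulated corruption fails.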
From: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | emkxp01(at)mtcc(dot)demon(dot)co(dot)uk |
Cc: | szybist(at)boxhill(dot)com, hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-01 16:19:48 |
Message-ID: | 199809011619.MAA09113@candle.pha.pa.us |
Lists: | pgsql-hackers |
> Thomas A. Szybist <szybist(at)boxhill(dot)com>
> >
> > >
> > > If I compile backend/catalog with -O2 then the table creation is
> > ^^^
> > > OK. So it looks like it may be indexing.c, even with Bruce's
> > > recent fixes.
> >
> > Do you mean -O0 here?
> >
>
> Yes, a typo, I used -O0 for this dir.
Can we try a simple -O rather than just -O2 and -O0. Could this be some
type of optimizer bug in gcc2/Solaris?
Everything is pointing to indexing.c, from both the initdb failure and
the create function failure. But I can't see anything wrong in there,
and other platforms seem to be OK.
Someone mentioned that Solaris does not use -O2 by default, and there
may be a good reason for this.
The new code from the megapatch is more streamlined, and perhaps the
optimizer is now able to do too much optimizing.
I don't want to go casting blame elsewhere, but I think gcc may
be the cause, and if it is, we just need to lower the default
optimization for those platforms.
Let's try adding Assert checking with the configure --enable-cassert
option, and compiling with -O rather than -O2 and see what happens.
From: | "Thomas G(dot) Lockhart" <lockhart(at)alumni(dot)caltech(dot)edu> |
---|---|
To: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
Cc: | emkxp01(at)mtcc(dot)demon(dot)co(dot)uk, szybist(at)boxhill(dot)com, hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-01 16:55:05 |
Message-ID: | 35EC26E9.15E93657@alumni.caltech.edu |
Lists: | pgsql-hackers |
> Can we try a simple -O rather than just -O2 and -O0. Could this be
> some type of optimizer bug in gcc2/Solaris?
> Everything is pointing to indexing.c, from both the initdb failure and
> the create function failure. But I can't see anything wrong in there,
> and other platforms seem to be OK.
Uh, no, Linux/i686 is showing trouble too, but not in the initdb stage.
The Sparc platforms will be more sensitive to byte alignment problems,
especially within C structures, so this may be illustrating a
cross-platform problem more clearly.
There is a repeatable indexing and (perhaps) caching problem I see in
the regression tests. Annoyingly, the problems get slightly worse at the
moment when compiling with -O0.
There is a fundamental problem lurking somewhere, and there may not be
much point in going beta unless you think that more testers will help to
track down the problem.
- Tom
From: | David Hartwig <daveh(at)insightdist(dot)com> |
---|---|
To: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>, Andreas Zeugswetter <andreas(dot)zeugswetter(at)telecom(dot)at> |
Cc: | hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-01 17:33:49 |
Message-ID: | 35EC2FFC.F5E92F36@insightdist.com |
Lists: | pgsql-hackers |
Bruce Momjian wrote:
> > Thomas A. Szybist <szybist(at)boxhill(dot)com>
> > >
> > > >
> > > > If I compile backend/catalog with -O2 then the table creation is
> > > ^^^
> > > > OK. So it looks like it may be indexing.c, even with Bruce's
> > > > recent fixes.
> > >
> > > Do you mean -O0 here?
> > >
> >
> > Yes, a typo, I used -O0 for this dir.
>
> Can we try a simple -O rather than just -O2 and -O0. Could this be some
> type of optimizer bug in gcc2/Solaris?
>
> Everything is pointing to indexing.c, from both the initdb failure and
> the create function failure. But I can't see anything wrong in there,
> and other platforms seem to be OK.
>
Bruce,
I do not know if this problem is related in any way, but I have a serious
problem on AIX 4.1. I am jumping in here because there is a chance the two
are related and just manifest differently.
If I add an index to a table I can no longer use the table. In essence,
the lookup on relname in pg_class fails. If I:
SELECT * FROM pg_class WHERE relname = 'table_i_just_indexed'
-- or \d table_i_just_indexed
I get no results.
SELECT * FROM pg_class
Displays it perfectly. So does:
SELECT * FROM pg_class WHERE relname like '%table_i_just_indexed'
If I manually correct relname:
UPDATE pg_class SET relname = 'table_i_just_indexed' WHERE relname like
'%table_i_just_indexed'
Everything seems to function normally again.
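One hypothetical way both observations can hold at once: relname is a fixed-width name field, so stray leading bytes in the stored value would make an exact comparison fail while a suffix match such as LIKE '%table_i_just_indexed' still succeeds. The sketch only illustrates that mechanism; NAME_WIDTH and the helper functions are assumptions, and nothing in this thread establishes that this is the actual cause.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NAME_WIDTH 32           /* assumed width of the fixed-size name field */

/* Stores s into a zero-padded fixed-width field, as a name column would. */
void set_name(char field[NAME_WIDTH], const char *s)
{
    size_t len = strlen(s);

    assert(len < NAME_WIDTH);
    memset(field, 0, NAME_WIDTH);
    memcpy(field, s, len);
}

/* Exact comparison over the whole field, like relname = '...'. */
bool name_equals(const char field[NAME_WIDTH], const char *s)
{
    char probe[NAME_WIDTH];

    set_name(probe, s);
    return memcmp(field, probe, NAME_WIDTH) == 0;
}

/* Suffix comparison on the stored string, like relname LIKE '%...'. */
bool name_ends_with(const char field[NAME_WIDTH], const char *suffix)
{
    const char *nul = memchr(field, '\0', NAME_WIDTH);
    size_t n = nul ? (size_t) (nul - field) : NAME_WIDTH;
    size_t m = strlen(suffix);

    return m <= n && memcmp(field + n - m, suffix, m) == 0;
}
```

If the stored value were "Xtable_i_just_indexed", name_equals() would report no match for "table_i_just_indexed" while name_ends_with() would still find it, which mirrors the = versus LIKE behaviour described above.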
I cannot reproduce it on my Linux box. Assertions show nothing. This can't be
good.
Andreas, are you having any success?
From: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | daveh(at)insightdist(dot)com (David Hartwig) |
Cc: | andreas(dot)zeugswetter(at)telecom(dot)at, hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-01 18:40:19 |
Message-ID: | 199809011840.OAA05599@candle.pha.pa.us |
Lists: | pgsql-hackers |
> > Can we try a simple -O rather than just -O2 and -O0. Could this be some
> > type of optimizer bug in gcc2/Solaris?
> >
> > Everything is pointing to indexing.c, from both the initdb failure and
> > the create function failure. But I can't see anything wrong in there,
> > and other platforms seem to be OK.
> >
>
> Bruce,
>
> I do not know if this problem is related in any way, but I have a serious
> problem on AIX 4.1. I am jumping in here because there is a chance the two
> are related and just manifest differently.
>
> If I add an index to a table I can no longer use the table. In essence,
> the lookup on relname in pg_class fails. If I:
> SELECT * FROM pg_class WHERE relname = 'table_i_just_indexed'
> -- or \d table_i_just_indexed
> I get no results.
> SELECT * FROM pg_class
> Displays it perfectly. So does:
> SELECT * FROM pg_class WHERE relname like '%table_i_just_indexed'
> If I manually correct relname:
> UPDATE pg_class SET relname = 'table_i_just_indexed' WHERE relname like
> '%table_i_just_indexed'
> Everything seems to function normally again.
>
> I cannot reproduce it on my Linux box. Assertions show nothing. This can't be
> good.
Wow, this is terribly frustrating. All these problems, and I can't
reproduce any of them here.
I sure hope they all have one cause, and I hope we find the cause soon.
From: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | lockhart(at)alumni(dot)caltech(dot)edu (Thomas G(dot) Lockhart) |
Cc: | emkxp01(at)mtcc(dot)demon(dot)co(dot)uk, szybist(at)boxhill(dot)com, hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-02 01:32:33 |
Message-ID: | 199809020132.VAA09102@candle.pha.pa.us |
Lists: | pgsql-hackers |
> > Can we try a simple -O rather than just -O2 and -O0. Could this be
> > some type of optimizer bug in gcc2/Solaris?
> > Everything is pointing to indexing.c, from both the initdb failure and
> > the create function failure. But I can't see anything wrong in there,
> > and other platforms seem to be OK.
>
> Uh, no, Linux/i686 is showing trouble too, but not in the initdb stage.
> The Sparc platforms will be more sensitive to byte alignment problems,
> especially within C structures, so this may be illustrating a
> cross-platform problem more clearly.
>
> There is a repeatable indexing and (perhaps) caching problem I see in
> the regression tests. Annoyingly, the problems get slightly worse at the
> moment when compiling with -O0.
>
> There is a fundamental problem lurking somewhere, and there may not be
> much point in going beta unless you think that more testers will help to
> track down the problem.
OK, let me send you my regression output, and perhaps you can see the
problem in there so I can debug it.
We have a configure problem too, that Tatsuo pointed out. I removed my
config.cache, and tried to run configure, and got this. I added 'set
-x' to help.
+ pwd
+ test -z /usr/ucb:/sbin:/usr/sbin:/bin:/usr/bin:/usr/contrib/bin:/usr/X11/bin:/usr/local/postgres/bin:/usr/local/bin:/usr/local/sbin:.:/usr/local/sbin:.:/usr/local/src/pgsql/pgsql/src
+ test -f /usr/ucb /sbin /usr/sbin /bin /usr/bin /usr/contrib/bin /usr/X11/bin /usr/local/postgres/bin /usr/local/bin /usr/local/sbin . /usr/local/sbin . /usr/local/src/pgsql/pgsql/src/install-sh
test: syntax error: Undefined error: 0
+ IFS=
+ INSTALL=
+ test -n
+ echo no
no
+ test -n
+ test -n
+ INSTALL=NONE
+ test NONE = NONE
+ echo - No Install Script found - aborting.
- No Install Script found - aborting.
+ exit 0
From: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | maillist(at)candle(dot)pha(dot)pa(dot)us (Bruce Momjian) |
Cc: | lockhart(at)alumni(dot)caltech(dot)edu, emkxp01(at)mtcc(dot)demon(dot)co(dot)uk, szybist(at)boxhill(dot)com, hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-02 02:50:23 |
Message-ID: | 199809020250.WAA10638@candle.pha.pa.us |
Lists: | pgsql-hackers |
> OK, let me send you my regression output, and perhaps you can see the
> problem in there so I can debug it.
As far as the beta, I am totally confused about the problems we are
having, and I don't know what to suggest. Perhaps you can see the
problems in my regression output.
>
> We have a configure problem too, that Tatsuo pointed out. I removed my
> config.cache, and tried to run configure, and got this. I added 'set
> -x' to help.
>
OK, I have fixed the configure problem. The search path has to list
directories separated by spaces, not colons. It works now.
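Bruce's fix implies that the search loop in configure iterates over whitespace-separated words, so a colon-separated PATH has to be split on ':' first; in the trace above, test -f receives a list of words instead of a single file name. The same search in C, as a sketch: find_in_path() is a hypothetical helper (not part of configure) that splits on ':' and probes each directory with POSIX access().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Searches each ':'-separated directory in path for file.  On success,
 * writes "dir/file" into out and returns 0; returns -1 if not found.
 * An empty entry means the current directory, as in the shell.
 */
int find_in_path(const char *path, const char *file, char *out, size_t outlen)
{
    const char *p = path;

    assert(path != NULL && file != NULL && out != NULL);
    for (;;)
    {
        const char *colon = strchr(p, ':');
        size_t len = colon ? (size_t) (colon - p) : strlen(p);
        int n;

        if (len == 0)
            n = snprintf(out, outlen, "./%s", file);
        else
            n = snprintf(out, outlen, "%.*s/%s", (int) len, p, file);

        /* Only probe if the candidate path fit in the buffer. */
        if (n > 0 && (size_t) n < outlen && access(out, F_OK) == 0)
            return 0;
        if (colon == NULL)
            return -1;
        p = colon + 1;          /* move on to the next directory */
    }
}
```

For example, find_in_path(getenv("PATH"), "install-sh", buf, sizeof buf) returns 0 and leaves the first match in buf, or returns -1 if no directory on the path contains the file.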
From: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | lockhart(at)alumni(dot)caltech(dot)edu (Thomas G(dot) Lockhart) |
Cc: | emkxp01(at)mtcc(dot)demon(dot)co(dot)uk, szybist(at)boxhill(dot)com, hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-02 03:43:23 |
Message-ID: | 199809020343.XAA17256@candle.pha.pa.us |
Lists: | pgsql-hackers |
> > Can we try a simple -O rather than just -O2 and -O0. Could this be
> > some type of optimizer bug in gcc2/Solaris?
> > Everything is pointing to indexing.c, from both the initdb failure and
> > the create function failure. But I can't see anything wrong in there,
> > and other platforms seem to be OK.
>
> Uh, no, Linux/i686 is showing trouble too, but not in the initdb stage.
> The Sparc platforms will be more sensitive to byte alignment problems,
> especially within C structures, so this may be illustrating a
> cross-platform problem more clearly.
>
> There is a repeatable indexing and (perhaps) caching problem I see in
> the regression tests. Annoyingly, the problems get slightly worse at the
> moment when compiling with -O0.
>
OK, here is my regression output. Do you see anything strange in there?
Attachment | Content-Type | Size |
---|---|---|
unknown_filename | text/plain | 33.5 KB |
From: | "Thomas G(dot) Lockhart" <lockhart(at)alumni(dot)caltech(dot)edu> |
---|---|
To: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
Cc: | emkxp01(at)mtcc(dot)demon(dot)co(dot)uk, szybist(at)boxhill(dot)com, hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-02 05:41:08 |
Message-ID: | 35ECDA74.E804E222@alumni.caltech.edu |
Lists: | pgsql-hackers |
> > Uh, no, Linux/i686 is showing trouble too, but not in the initdb
> > stage. The Sparc platforms will be more sensitive to byte alignment
> > problems, especially within C structures, so this may be
> > illustrating a cross-platform problem more clearly.
> > There is a repeatable indexing and (perhaps) caching problem I see
> > in the regression tests. Annoyingly, the problems get slightly worse
> > at the moment when compiling with -O0.
> OK, here is my regression output. Do you see anything strange in
> there?
Well, yes, just not as strange as my tests :) You don't have int8
enabled, and if your compiler and libc allow it I'd like to get that
going. But that isn't a problem.
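The int8 diff quoted at the end of this message is consistent with that: every value outside the 32-bit range comes back as exactly 2147483647 or -2147483648, i.e. it saturates, as strtol() does on a platform whose long is 32 bits wide. The sketch below models only that symptom; parse_int8_as_int32() and parse_int8_as_int64() are hypothetical helpers, not the backend's int8 input code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical model of int8 input when no 64-bit integer type is
 * available: the value is parsed and then saturated into 32 bits, the
 * way strtol() returns LONG_MAX/LONG_MIN on overflow when long is
 * 32 bits wide.
 */
int32_t parse_int8_as_int32(const char *text)
{
    long long v;

    assert(text != NULL);
    v = strtoll(text, NULL, 10);
    if (v > INT32_MAX)
        return INT32_MAX;
    if (v < INT32_MIN)
        return INT32_MIN;
    return (int32_t) v;
}

/* With a real 64-bit type, the same text round-trips intact. */
int64_t parse_int8_as_int64(const char *text)
{
    assert(text != NULL);
    return (int64_t) strtoll(text, NULL, 10);
}
```

Under this model "4567890123456789" becomes 2147483647 and "-4567890123456789" becomes -2147483648, matching the result rows in the diff, while "123" and "456" pass through unchanged.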
You have a core dump from the "having" test. Is that a known problem
with someone working on a solution? The test worked on my ~month-old
development tree (I could probably figure out the vintage of that tree
to more precision if it would be helpful), so something has happened in
the meantime.
I suspect that (possibly) more than one thing is going on, since there
were some changes directly related to removing the oddball OID types as
well as perhaps cleanup changes made while traversing the code.
Something may have crept in there.
If these tests are the only ones showing problems on your machine, then
consider yourself lucky. I've got several more failures, including the
one where I can't create indices on a table until after terminating and
restarting the session. The Sparc contingent sees more problems than I do,
but they are on a RISC machine, so they will see alignment problems if any
are present.
- Tom
> ====== int8 ======
> --- expected/int8.out Sun Aug 23 11:09:38 1998
> +++ results/int8.out Tue Sep 1 23:40:10 1998
> @@ -6,110 +6,110 @@
> QUERY: INSERT INTO INT8_TBL VALUES('4567890123456789','-4567890123456789');
> QUERY: SELECT * FROM INT8_TBL;
> q1| q2
> -----------------+-----------------
> +----------+-----------
> 123| 456
> - 123| 4567890123456789
> -4567890123456789| 123
> -4567890123456789| 4567890123456789
> -4567890123456789|-4567890123456789
> + 123| 2147483647
> +2147483647| 123
> +2147483647| 2147483647
> +2147483647|-2147483648
> (5 rows)
> ====== select_having ======
> --- expected/select_having.out Sat Aug 29 00:10:03 1998
> +++ results/select_having.out Tue Sep 1 23:41:51 1998
> @@ -11,27 +11,6 @@
> QUERY: INSERT INTO test_having VALUES (9, 4, 'CCCC', 'j');
> QUERY: SELECT max(a) FROM test_having
> GROUP BY lower(c) HAVING count(*) > 2 OR min(b) = 3;
<snip>
> -QUERY: DROP TABLE test_having;
> +pqReadData() -- backend closed the channel unexpectedly.
> + This probably means the backend terminated abnormally before or while processing the request.
> +We have lost the connection to the backend, so further processing is impossible. Terminating.
OK, this test did not fail on my development tree from a month ago. What
changed? I'm seeing it fail here also.
- Tom
From: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | lockhart(at)alumni(dot)caltech(dot)edu (Thomas G(dot) Lockhart) |
Cc: | emkxp01(at)mtcc(dot)demon(dot)co(dot)uk, szybist(at)boxhill(dot)com, hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-02 14:39:58 |
Message-ID: | 199809021439.KAA01159@candle.pha.pa.us |
Lists: | pgsql-hackers |
> > > Uh, no, Linux/i686 is showing trouble too, but not in the initdb
> > > stage. The Sparc platforms will be more sensitive to byte alignment
> > > problems, especially within C structures, so this may be
> > > illustrating a cross-platform problem more clearly.
> > > There is a repeatable indexing and (perhaps) caching problem I see
> > > in the regression tests. Annoyingly, the problems get slightly worse
> > > at the moment when compiling with -O0.
> > OK, here is my regression output. Do you see anything strange in
> > there?
>
> Well, yes, just not as strange as my tests :) You don't have int8
> enabled, and if your compiler and libc allow it I'd like to get that
> going. But that isn't a problem.
>
> You have a core dump from the "having" test. Is that a known problem
> with someone working on a solution? The test worked on my ~month-old
> development tree (I could probably figure out the vintage of that tree
> to more precision if it would be helpful), so something has happened in
> the meantime.
>
> I suspect that (possibly) more than one thing is going on, since there
> were some changes directly related to removing the oddball OID types as
> well as perhaps cleanup changes made while traversing the code.
> Something may have crept in there.
>
> If these tests are the only ones showing problems on your machine, then
> consider yourself lucky. I've got several more failures, including the
> one where I can't create indices on a table until after terminating and
> restarting the session. The Sparc contingent sees more problems than I do,
> but they are on a RISC machine, so they will see alignment problems if any
> are present.
Yes, very strange.
>
> - Tom
>
> > ====== int8 ======
> > --- expected/int8.out Sun Aug 23 11:09:38 1998
> > +++ results/int8.out Tue Sep 1 23:40:10 1998
> > @@ -6,110 +6,110 @@
> > QUERY: INSERT INTO INT8_TBL VALUES('4567890123456789','-4567890123456789');
> > QUERY: SELECT * FROM INT8_TBL;
> > q1| q2
> > -----------------+-----------------
> > +----------+-----------
> > 123| 456
> > - 123| 4567890123456789
> > -4567890123456789| 123
> > -4567890123456789| 4567890123456789
> > -4567890123456789|-4567890123456789
> > + 123| 2147483647
> > +2147483647| 123
> > +2147483647| 2147483647
> > +2147483647|-2147483648
> > (5 rows)
> > ====== select_having ======
> > --- expected/select_having.out Sat Aug 29 00:10:03 1998
> > +++ results/select_having.out Tue Sep 1 23:41:51 1998
> > @@ -11,27 +11,6 @@
> > QUERY: INSERT INTO test_having VALUES (9, 4, 'CCCC', 'j');
> > QUERY: SELECT max(a) FROM test_having
> > GROUP BY lower(c) HAVING count(*) > 2 OR min(b) = 3;
> <snip>
> > -QUERY: DROP TABLE test_having;
> > +pqReadData() -- backend closed the channel unexpectedly.
> > + This probably means the backend terminated abnormally before or while processing the request.
> > +We have lost the connection to the backend, so further processing is impossible. Terminating.
>
> OK, this test did not fail on my development tree from a month ago. What
> changed? I'm seeing it fail here also.
I believe David Hartwig has claimed this problem, and knows the cause.
He posted something recently.
From: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | lockhart(at)alumni(dot)caltech(dot)edu (Thomas G(dot) Lockhart) |
Cc: | emkxp01(at)mtcc(dot)demon(dot)co(dot)uk, szybist(at)boxhill(dot)com, hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-02 14:58:11 |
Message-ID: | 199809021458.KAA01738@candle.pha.pa.us |
Lists: | pgsql-hackers |
> > Can we try a simple -O rather than just -O2 and -O0. Could this be
> > some type of optimizer bug in gcc2/Solaris?
> > Everything is pointing to indexing.c, from both the initdb failure and
> > the create function failure. But I can't see anything wrong in there,
> > and other platforms seem to be OK.
>
> Uh, no, Linux/i686 is showing trouble too, but not in the initdb stage.
> The Sparc platforms will be more sensitive to byte alignment problems,
> especially within C structures, so this may be illustrating a
> cross-platform problem more clearly.
>
> There is a repeatable indexing and (perhaps) caching problem I see in
> the regression tests. Annoyingly, the problems get slightly worse at the
> moment when compiling with -O0.
Can you see whether compiling indexing.c with different optimization levels
changes the output? If so, could someone else also look in indexing.c
for something stupid I did? I can't see it, but EVERYTHING is
pointing to that file.
>
> There is a fundamental problem lurking somewhere, and there may not be
> much point in going beta unless you think that more testers will help to
> track down the problem.
Not sure how to find the problem. Without being able to debug it here,
I am left staring at the code over and over again.
From: | "Thomas G(dot) Lockhart" <lockhart(at)alumni(dot)caltech(dot)edu> |
---|---|
To: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
Cc: | emkxp01(at)mtcc(dot)demon(dot)co(dot)uk, szybist(at)boxhill(dot)com, hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-02 15:40:44 |
Message-ID: | 35ED66FC.15B0CA82@alumni.caltech.edu |
Lists: | pgsql-hackers |
Bruce Momjian wrote:
>
> Not sure how to find the problem. Without being able to debug it here,
> I am left staring at the code over and over again.
That sounds like *so* much fun ;-)
OK, I've been ignoring the problem until now, and of course may not be
of any help even if I were trying to help. I'm also not quite up on the
history and sequence of events which led to our current state, and I'm
not sure we have a strong regression history yet to pin this down.
At the moment we have Bruce and the two sparc guys working on it, and
Bruce can't reproduce the problems on his machine and Keith's machine is
so dog-slow that he can't do much testing that involves full rebuilds.
Boy, it hasn't been long since that model Sparc was the best thing
going, eh?
Does anyone else have a machine (Linux x86, for example) which exhibits
problems with the current development tree, and who has an interest in
helping to track this down?
I hate to step away from docs, but could for a while if that would be
helpful. I've got a fairly fast machine and can try pinning down when
the problems started by doing full builds on fresh trees. Or since we've
been making steady incremental improvements to the tree, maybe we should
focus on a particular problem; my "can't create another index in the
current session" problem is probably related to whatever else is going
on with indices.
In general the problems persist across different "-O" compiler settings
and across different architectures and compilers, though with different
symptoms, so I would think that this is not a compiler bug per se.
- Tom
From: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | lockhart(at)alumni(dot)caltech(dot)edu (Thomas G(dot) Lockhart) |
Cc: | emkxp01(at)mtcc(dot)demon(dot)co(dot)uk, szybist(at)boxhill(dot)com, hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-02 15:42:13 |
Message-ID: | 199809021542.LAA02771@candle.pha.pa.us |
Lists: | pgsql-hackers |
> Bruce Momjian wrote:
> >
> > Not sure how to find the problem. Without being able to debug it here,
> > I am left staring at the code over and over again.
>
> That sounds like *so* much fun ;-)
>
> OK, I've been ignoring the problem until now, and of course may not be
> of any help even if I were trying to help. I'm also not quite up on the
> history and sequence of events which led to our current state, and I'm
> not sure we have a strong regression history yet to pin this down.
>
> At the moment we have Bruce and the two sparc guys working on it, and
> Bruce can't reproduce the problems on his machine and Keith's machine is
> so dog-slow that he can't do much testing that involves full rebuilds.
> Boy, it hasn't been long since that model Sparc was the best thing
> going, eh?
I was on Thomas A. Szybist's machine for two hours, and even though I was
telnet'ed across three Internet machines to get there, it made my PP200
look like it was sitting still. Amazing speed. It is a Sparc running
Solaris.
From: | David Hartwig <daveh(at)insightdist(dot)com> |
---|---|
To: | "Thomas G(dot) Lockhart" <lockhart(at)alumni(dot)caltech(dot)edu> |
Cc: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>, emkxp01(at)mtcc(dot)demon(dot)co(dot)uk, szybist(at)boxhill(dot)com, hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-02 16:17:36 |
Message-ID: | 35ED6F9F.81934B69@insightdist.com |
Lists: | pgsql-hackers |
Thomas G. Lockhart wrote:
> > > Uh, no, Linux/i686 is showing trouble too, but not in the initdb
> > > stage. The Sparc platforms will be more sensitive to byte alignment
> > > problems, especially within C structures, so this may be
> > > illustrating a cross-platform problem more clearly.
> > > There is a repeatable indexing and (perhaps) caching problem I see
> > > in the regression tests. Annoyingly, the problems get slightly worse
> > > at the moment when compiling with -O0.
> > OK, here is my regression output. Do you see anything strange in
> > there?
>
> Well, yes, just not as strange as my tests :) You don't have int8
> enabled, and if your compiler and libc allow it I'd like to get that
> going. But that isn't a problem.
>
> You have a core dump from the "having" test. Is that a known problem
> with someone working on a solution? The test worked on my ~month-old
> development tree (I could probably figure out the vintage of that tree
> to more precision if it would be helpful), so something has happened in
> the meantime.
>
I submitted two patches to fix the select_having test. The first patch addressed problems caused by
a machine dependency on the degree of accuracy of datetime. CVS is currently showing this first patch.
The second patch was to fix my first patch. It has NOT been applied yet. The problem with the first
patch, which you are seeing now, is that the test case demonstrates another bug which has nothing to do
with HAVING. It has to do with GROUPing by a function and the argument of the function not appearing
elsewhere in the target list. Weird! In any case the latest patch will fix the regression.
BTW, I have also sent one other patch that I am waiting to see in CVS. This one is an interim AND/OR
memory exhaustion fix.
From: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | daveh(at)insightdist(dot)com (David Hartwig) |
Cc: | lockhart(at)alumni(dot)caltech(dot)edu, emkxp01(at)mtcc(dot)demon(dot)co(dot)uk, szybist(at)boxhill(dot)com, hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-02 17:54:03 |
Message-ID: | 199809021754.NAA06604@candle.pha.pa.us |
Lists: | pgsql-hackers |
> I submitted two patches to fix the select_having test. The first patch addressed problems caused by
> a machine dependency on the degree of accuracy of datetime. CVS is currently showing this first patch.
>
> The second patch was to fix my first patch. It has NOT been applied yet. The problem with the first
> patch, which you are seeing now, is that the test case demonstrates another bug which has nothing to do
> with HAVING. It has to do with GROUPing by a function and the argument of the function not appearing
> elsewhere in the target list. Weird! In any case the latest patch will fix the regression.
>
> BTW, I have also sent one other patch that I am waiting to see in CVS. This one is an interim AND/OR
> memory exhaustion fix.
Yes, I am behind on the patch applications. Marc probably stopped while
I did my mega-cleanup, and I have been scratching my head on the
platform failures. Hopefully one of us will get it soon.
From: | David Hartwig <daybee(at)bellatlantic(dot)net> |
---|---|
To: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
Cc: | hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-02 20:33:55 |
Message-ID: | 35EDABB2.7AD2B56D@bellatlantic.net |
Lists: | pgsql-hackers |
Bruce Momjian wrote:
> > I submitted two patches to fix the select_having test. The first patch addressed problems caused by
> > a machine dependency on the degree of accuracy of datetime. CVS is currently showing this first patch.
> >
> > The second patch was to fix my first patch. It has NOT been applied yet. The problem with the first
> > patch, which you are seeing now, is that the test case demonstrates another bug which has nothing to do
> > with HAVING. It has to do with GROUPing by a function and the argument of the function not appearing
> > elsewhere in the target list. Weird! In any case the latest patch will fix the regression.
> >
> > BTW, I have also sent one other patch that I am waiting to see in CVS. This one is an interim AND/OR
> > memory exhaustion fix.
>
> Yes, I am behind on the patch applications. Marc probably stopped while
> I did my mega-cleanup, and I have been scratching my head on the
> platform failures. Hopefully one of us will get it soon.
> bug.
No rush, I can see your synapses are all firing on this index bug. Just a reminder so it doesn't fall
through the cracks.
More importantly, I am relieved (in advance) that you have found the problem in the index code.
Can I assume there will be a snapshot I can get tomorrow from my business location to try out on my AIX box? I
am anxious to know this is behind us.
Appreciation :)
From: | Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | daybee(at)bellatlantic(dot)net (David Hartwig) |
Cc: | hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Core dump in regression tests. |
Date: | 1998-09-03 02:25:06 |
Message-ID: | 199809030225.WAA22200@candle.pha.pa.us |
Lists: | pgsql-hackers |
>
> No rush, I can see your synapses are all firing on this index bug. Just a reminder so it doesn't fall
> through the cracks.
Doing it now.
>
> More importantly, I am relieved (in advance) that you have found the problem in the index code.
>
> Can I assume there will be a snapshot I can get tomorrow from my business location to try out on my AIX box? I
> am anxious to know this is behind us.
>
> Appreciation :)
Last I heard, on Monday, Marc is making snapshots every day now, and
will continue through the beta period.
Thanks to all the bug reports and tracebacks. They clearly pointed to
the problem. I just couldn't see it until today.