Re: Cygwin PostgreSQL Regression Test Problems (Revisited)

Lists:	pgsql-ports

From:	Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com>
To:	pgsql-ports(at)postgresql(dot)org
Subject:	Cygwin PostgreSQL Regression Test Problems (Revisited)
Date:	2001-03-28 18:36:45
Message-ID:	20010328133645.F465@dothill.com
Lists:	pgsql-ports

On Mon, Jan 15, 2001 at 11:37:55PM -0500, Jason Tishler wrote:
> 2. I am unable to successfully run the regression tests on an NT 4.0 SP5
> machine with only 64 MB of physical memory and about 175 MB of swap space.
> Other than lacking RAM and swap space, this machine is the "same" as other
> NT/2000 machines which can successfully run the regression tests.
>
> The tests usually hang during the "parallel group (18 tests)" test
> right after numerology. By "hang," I mean that the original postmaster
> is still running, but there are no postmaster children, and there are
> some number of psql processes hanging around. Using NT's TaskManager,
> I can see that the machine is running out of memory. I have even seen
> the "Windows is running low on virtual memory" dialog a few times.
> Should I expect this behavior from such a lame machine?

I previously reported the above problem with the parallel version of
the regression test (i.e., make check) on a machine with limited memory.
Unfortunately, I am seeing similar problems on a machine with 192 MB of
physical memory and about 208 MB of swap space. So, now I feel that my
initial conclusion that limited memory was the root cause is erroneous.

My current WAG is that there is a race condition in Cygwin that is
causing some back-end postgres processes to abort. This in turn
causes the associated front-end psql processes to hang which in turn
causes the regression test to hang.

What is the best way to "catch" this problem? What is the best set of
options to pass to postmaster that will be in turn passed to the back-end
postgres processes to hopefully shed some light on this situation? Can I
get the individual back-end postgres processes to log to separate files?

There is so much going on during a parallel regression test that it's
hard to figure out what is really happening. Any help would be greatly
appreciated.

Thanks,
Jason

--
Jason Tishler
Director, Software Engineering Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp. Fax: +1 (732) 264-8798
82 Bethany Road, Suite 7 Email: Jason(dot)Tishler(at)dothill(dot)com
Hazlet, NJ 07730 USA WWW: http://www.dothill.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com>
Cc:	pgsql-ports(at)postgresql(dot)org
Subject:	Re: Cygwin PostgreSQL Regression Test Problems (Revisited)
Date:	2001-03-28 18:57:33
Message-ID:	19071.985805853@sss.pgh.pa.us
Lists:	pgsql-ports

Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com> writes:
> I previously reported the above problem with the parallel version of
> the regression test (i.e., make check) on a machine with limited memory.
> Unfortunately, I am seeing similar problems on a machine with 192 MB of
> physical memory and about 208 MB of swap space. So, now I feel that my
> initial conclusion that limited memory was the root cause is erroneous.

Not necessarily. 18 parallel tests imply 54 concurrent processes
(a shell, a psql, and a backend per test). Depending on whether Windoze
is any good about sharing sharable pages across processes, it's not hard
at all to believe that each process might chew up a few meg of memory
and/or swap. You don't have a whole lot of headroom there if so.

Try modifying the parallel_schedule file to break the largest set of
concurrent tests down into two sets of nine tests.
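
A minimal sketch of that split, assuming the schedule file lists each parallel group on a single line of the form "test: name name ...". The helper name split_parallel_group is hypothetical and not part of pg_regress; the test names are the 18 reported for the group that hangs.

```python
def split_parallel_group(line, max_tests=9):
    """Split one 'test: a b c ...' line into groups of at most max_tests."""
    prefix, sep, names = line.partition(":")
    if not sep or prefix.strip() != "test":
        return [line]  # leave comments, blank lines, and other directives alone
    tests = names.split()
    if len(tests) <= max_tests:
        return [line]
    return ["test: " + " ".join(tests[i:i + max_tests])
            for i in range(0, len(tests), max_tests)]


# The 18-test group from this thread (assumed to sit on one schedule line).
group = ("test: point lseg box path polygon circle date time timestamp "
         "interval abstime reltime tinterval inet comments oidjoins "
         "type_sanity opr_sanity")

for new_line in split_parallel_group(group):
    print(new_line)
```

Each output line then runs as its own parallel group, so at most nine tests (and their shell, psql, and backend processes) are active at once.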

Considering that we've seen people run into maxuprc limits on some Unix
versions, I wonder whether we ought to just do that across-the-board.

> What is the best way to "catch" this problem? What is the best set of
> options to pass to postmaster that will be in turn passed to the back-end
> postgres processes to hopefully shed some light on this situation?

I'd use -d1 which should be enough to see backends starting and exiting.
Any more will clutter the log with individual queries, which probably
would be more detail than you really want...

regards, tom lane

From:	Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-ports(at)postgresql(dot)org
Subject:	Re: Cygwin PostgreSQL Regression Test Problems (Revisited)
Date:	2001-03-28 21:29:28
Message-ID:	20010328162928.D510@dothill.com
Lists:	pgsql-ports

Tom,

On Wed, Mar 28, 2001 at 01:57:33PM -0500, Tom Lane wrote:
> Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com> writes:
> > I previously reported the above problem with the parallel version of
> > the regression test (i.e., make check) on a machine with limited memory.
> > Unfortunately, I am seeing similar problems on a machine with 192 MB of
> > physical memory and about 208 MB of swap space. So, now I feel that my
> > initial conclusion that limited memory was the root cause is erroneous.
>
> Not necessarily. 18 parallel tests imply 54 concurrent processes
> (a shell, a psql, and a backend per test). Depending on whether Windoze
> is any good about sharing sharable pages across processes, it's not hard
> at all to believe that each process might chew up a few meg of memory
> and/or swap. You don't have a whole lot of headroom there if so.

I just increased the swap space (i.e., pagefile.sys) to 384 MB and I
still get hangs. Watching memory usage via the NT Task Manager, Windows
tells me that the memory usage during the regression test is <= 80 MB
which is significantly less than my physical memory.

I wonder if I'm bucking up against some Cygwin limitations. On the
cygwin-developers list, there was a recent discussion that indicated
that a Cygwin process can only have a max of 64 children. Maybe there
is a limit like that which is causing backends to abort?

> Try modifying the parallel_schedule file to break the largest set of
> concurrent tests down into two sets of nine tests.

I'm sure that will work (at least most of the time) since I only get one
or two psql processes to hang in any given run. But, "fixing" the
problem this way just doesn't feel right to me.

> Considering that we've seen people run into maxuprc limits on some Unix
> versions, I wonder whether we ought to just do that across-the-board.

Of course, this solution is much better. :,)

> > What is the best way to "catch" this problem? What is the best set of
> > options to pass to postmaster that will be in turn passed to the back-end
> > postgres processes to hopefully shed some light on this situation?
>
> I'd use -d1 which should be enough to see backends starting and exiting.
> Any more will clutter the log with individual queries, which probably
> would be more detail than you really want...

I've done the above and it seems to indicate that all backends exited
with a status of 0. So, I still don't know why some backends "aborted."

Any other suggestions? Such as somehow specifying an individual log
file for each backend.

Thanks,
Jason

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com>
Cc:	pgsql-ports(at)postgresql(dot)org
Subject:	Re: Cygwin PostgreSQL Regression Test Problems (Revisited)
Date:	2001-03-28 21:40:30
Message-ID:	19886.985815630@sss.pgh.pa.us
Lists:	pgsql-ports

Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com> writes:
> I've done the above and it seems to indicate that all backends exited
> with a status of 0. So, I still don't know why some backends "aborted."

Hm. So what exactly is the failure mode? Do the psql processes report
any errors? Have they produced (any/all of) the expected output?

regards, tom lane

From:	Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-ports(at)postgresql(dot)org
Subject:	Re: Cygwin PostgreSQL Regression Test Problems (Revisited)
Date:	2001-03-28 22:34:49
Message-ID:	20010328173449.E510@dothill.com
Lists:	pgsql-ports

Tom,

On Wed, Mar 28, 2001 at 04:40:30PM -0500, Tom Lane wrote:
> Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com> writes:
> > I've done the above and it seems to indicate that all backends exited
> > with a status of 0. So, I still don't know why some backends "aborted."
>
> Hm. So what exactly is the failure mode? Do the psql processes report
> any errors? Have they produced (any/all of) the expected output?

The failure mode is always something like the following:

The regression test proceeds normally until one of the larger parallel
groups is running. Then it will hang after output such as:

parallel group (18 tests): point lseg box path circle date polygon time abstime inet interval reltime type_sanity oidjoins opr_sanity timestamp...

If I do a ps, I will see the postmaster process and one or more psql
processes. The corresponding postgres processes are no longer running.
(Were they ever running?) The NT Task Manager shows essentially 100% idle.

I usually kill the psql processes with the following command:

kill $(ps | fgrep psql | awk '{print $1}')

Then the regression test will continue with output like the following:

...Signal 15
Signal 15
comments tinterval
point ... ok
lseg ... ok
box ... ok
path ... ok
polygon ... ok
circle ... ok
date ... ok
time ... ok
timestamp ... ok
interval ... ok
abstime ... ok
reltime ... ok
tinterval ... FAILED
inet ... ok
comments ... FAILED
oidjoins ... ok
type_sanity ... ok
opr_sanity ... ok
test geometry ... ok
..

I believe that the "failures" above correspond to the psql processes
that I killed.

Sometimes the regression test will run to completion without any more
hangs. Sometimes it will hang at one or more large parallel groups. If
I continue to kill the psql processes as above, the regression test will
eventually complete (with more "failures").

I've tried another experiment of killing a postgres backend to see if
the psql process notices the backend dying. It does, but I was only able
to kill -9 the postgres backend. Otherwise, postgres ignored the
signal. So, I don't know if my experiment was valid. If a backend
exits normally while a psql is connected, will the psql process notice
this event?
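
As general socket behavior (not PostgreSQL-specific), a client blocked in a read on an established connection sees end-of-file once the peer closes its end, whether the peer exited normally or was killed. A minimal sketch, with socketpair() standing in for the psql-to-backend connection; the variable names are illustrative only:

```python
import socket

# socketpair() gives two connected sockets; one plays psql, one the backend.
client, backend = socket.socketpair()

backend.close()           # the "backend" goes away and its socket is closed
data = client.recv(1024)  # returns at once with b"" (EOF) instead of blocking

print("client saw EOF:", data == b"")
client.close()
```

This only covers a client that is already connected and waiting to read; it says nothing about a client that is still waiting for its connection to be established.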

Any other suggestions? Or, should I just run the serial_schedule and
stop my head banging?

Thanks,
Jason
--
Jason Tishler
Director, Software Engineering Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp. Fax: +1 (732) 264-8798
82 Bethany Road, Suite 7 Email: Jason(dot)Tishler(at)dothill(dot)com
Hazlet, NJ 07730 USA WWW: http://www.dothill.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com>
Cc:	pgsql-ports(at)postgresql(dot)org
Subject:	Re: Cygwin PostgreSQL Regression Test Problems (Revisited)
Date:	2001-03-28 22:38:18
Message-ID:	20033.985819098@sss.pgh.pa.us
Lists:	pgsql-ports

Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com> writes:
> Then the regression test will continue with output like the following:

> ...Signal 15
> Signal 15
> comments tinterval
> point ... ok
> lseg ... ok
> box ... ok
> path ... ok
> polygon ... ok
> circle ... ok
> date ... ok
> time ... ok
> timestamp ... ok
> interval ... ok
> abstime ... ok
> reltime ... ok
> tinterval ... FAILED
> inet ... ok
> comments ... FAILED
> oidjoins ... ok
> type_sanity ... ok
> opr_sanity ... ok
> test geometry ... ok
> ..

This doesn't tell us much. What shows up in the output files of the
failed tests --- what are the *diffs*, not just the summary display?

regards, tom lane

From:	Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com>
Cc:	pgsql-ports(at)postgresql(dot)org
Subject:	Re: Cygwin PostgreSQL Regression Test Problems (Revisited)
Date:	2001-03-29 06:20:59
Message-ID:	3AC2D44B.2A009384@tpf.co.jp
Lists:	pgsql-ports

Tom Lane wrote:
>
> Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com> writes:
> > Then the regression test will continue with output like the following:
>
> > ...Signal 15
> > Signal 15
> > comments tinterval
> > point ... ok
> > lseg ... ok
> > box ... ok
> > path ... ok
> > polygon ... ok
> > circle ... ok
> > date ... ok
> > time ... ok
> > timestamp ... ok
> > interval ... ok
> > abstime ... ok
> > reltime ... ok
> > tinterval ... FAILED
> > inet ... ok
> > comments ... FAILED
> > oidjoins ... ok
> > type_sanity ... ok
> > opr_sanity ... ok
> > test geometry ... ok
> > ..
>
> This doesn't tell us much. What shows up in the output files of the
> failed tests --- what are the *diffs*, not just the summary display?
>

Hmmm, the *diffs* show very little.
psql hangs at PQsetdbLogin()(select() in the
first pqWait() in connectDBComplete()).

regards,
Hiroshi Inoue

From:	Yutaka tanida <yutaka(at)hi-net(dot)zaq(dot)ne(dot)jp>
To:	Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com>
Cc:	pgsql-ports(at)postgresql(dot)org
Subject:	Re: Cygwin PostgreSQL Regression Test Problems (Revisited)
Date:	2001-03-29 07:03:26
Message-ID:	20010329155502.1057.YUTAKA@hi-net.zaq.ne.jp
Lists:	pgsql-ports

Jason,

On Wed, 28 Mar 2001 13:36:45 -0500
Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com> wrote:

> On Mon, Jan 15, 2001 at 11:37:55PM -0500, Jason Tishler wrote:
> > The tests usually hang during the "parallel group (18 tests)" test
> > right after numerology. By "hang," I mean that the original postmaster
> > is still running, but there are no postmaster children, and there are
> > some number of psql processes hanging around. Using NT's TaskManager,
> > I can see that the machine is running out of memory. I have even seen
> > the "Windows is running low on virtual memory" dialog a few times.
> > Should I expect this behavior from such a lame machine?
> I previously reported the above problem with the parallel version of
> the regression test (i.e., make check) on a machine with limited memory.
> Unfortunately, I am seeing similar problems on a machine with 192 MB of
> physical memory and about 208 MB of swap space. So, now I feel that my
> initial conclusion that limited memory was the root cause is erroneous.

I can't reproduce it. The parallel regression test works perfectly and
returns "All 76 tests passed." There is no hang.

Environment:
PIII-550, 256 MB RAM PC
NT4.0 + SP6

PostgreSQL 7.1Beta6
cygipc 1.08 + my 2 patches
Cygwin1.dll 010215 snapshot

--
Yutaka tanida <yutaka(at)hi-net(dot)zaq(dot)ne(dot)jp>

From:	Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-ports(at)postgresql(dot)org
Subject:	Re: Cygwin PostgreSQL Regression Test Problems (Revisited)
Date:	2001-03-29 15:17:45
Message-ID:	20010329101744.A467@dothill.com
Lists:	pgsql-ports

Tom,

On Wed, Mar 28, 2001 at 06:06:22PM -0500, Tom Lane wrote:
> Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com> writes:
> So no queries get executed at all before the backend exits. Given that
> the backend seems to be exiting normally, one would suppose that the
> backend thinks it is seeing an EOF from the client. Is there anything
> about "unexpected EOF on client connection" in the postmaster log?

I grep-ed for EOF in postmaster.log but came up empty. Did I need to
run with debugging turned on to see this error message? I was running
*without* debugging turned on.

> Another possibility is that the failing psqls are never managing to
> connect in the first place. Can you attach to one of the stuck psqls
> with gdb and get a backtrace to see where it is?

I get the following backtrace for one of the hung psql processes:

(gdb) bt
#0 0x77f682cb in ?? ()
#1 0x77f1cd76 in ?? ()
#2 0x6103deee in _size_of_stack_reserve__ ()
#3 0x6103d84e in _size_of_stack_reserve__ ()
#4 0x67989978 in pqWait (forRead=0, forWrite=1, conn=0xa010258)
    at fe-misc.c:738
#5 0x6798287c in connectDBComplete (conn=0xa010258) at fe-connect.c:1103
#6 0x67981fb1 in PQsetdbLogin (pghost=0x0, pgport=0x0, pgoptions=0x0,
    pgtty=0x0, dbName=0x1a0260e8 "regression", login=0x0, pwd=0x0)
    at fe-connect.c:524
#7 0x40e43f in main (argc=6, argv=0x1a021ad8) at startup.c:178

On Thu, Mar 29, 2001 at 03:20:59PM +0900, Hiroshi Inoue wrote:
> psql hangs at PQsetdbLogin()(select() in the
> first pqWait() in connectDBComplete()).

Note that my hang seems identical to the one reported by Hiroshi Inoue.

Thanks,
Jason

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com>
Cc:	pgsql-ports(at)postgresql(dot)org
Subject:	Re: Cygwin PostgreSQL Regression Test Problems (Revisited)
Date:	2001-03-29 15:43:49
Message-ID:	22328.985880629@sss.pgh.pa.us
Lists:	pgsql-ports

Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com> writes:
> I get the following backtrace for one of the hung psql processes:

> (gdb) bt
> #0 0x77f682cb in ?? ()
> #1 0x77f1cd76 in ?? ()
> #2 0x6103deee in _size_of_stack_reserve__ ()
> #3 0x6103d84e in _size_of_stack_reserve__ ()
> #4 0x67989978 in pqWait (forRead=0, forWrite=1, conn=0xa010258)
> at fe-misc.c:738
> #5 0x6798287c in connectDBComplete (conn=0xa010258) at fe-connect.c:1103
> #6 0x67981fb1 in PQsetdbLogin (pghost=0x0, pgport=0x0, pgoptions=0x0,
> pgtty=0x0, dbName=0x1a0260e8 "regression", login=0x0, pwd=0x0)
> at fe-connect.c:524
> #7 0x40e43f in main (argc=6, argv=0x1a021ad8) at startup.c:178

It would be helpful to see the contents of the conn object ("f 5" then
"p *conn" should do it).

If Hiroshi is correct that this is the *first* call to pqWait in
connectDBComplete, then I think we are looking at a kernel bug (or more
likely a cygwin bug). psql has opened a TCP connection socket and is
now waiting for the socket to show as write-ready before it will send
a connection request. If select() never reports the socket as
write-ready, you have a hang ... and it's not possible to blame the hang
on anything else but the kernel. Both ends of the connection are on the
same machine, so there's no network problem or anything like that.
There is not anything else that we should need to do at the application
level before we should be allowed to send data.

regards, tom lane

From:	Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-ports(at)postgresql(dot)org
Subject:	Re: Cygwin PostgreSQL Regression Test Problems (Revisited)
Date:	2001-03-29 16:31:19
Message-ID:	20010329113119.A209@dothill.com
Lists:	pgsql-ports

Tom,

On Thu, Mar 29, 2001 at 10:43:49AM -0500, Tom Lane wrote:
> Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com> writes:
> > I get the following backtrace for one of the hung psql processes:
>
> > (gdb) bt
> > #0 0x77f682cb in ?? ()
> > #1 0x77f1cd76 in ?? ()
> > #2 0x6103deee in _size_of_stack_reserve__ ()
> > #3 0x6103d84e in _size_of_stack_reserve__ ()
> > #4 0x67989978 in pqWait (forRead=0, forWrite=1, conn=0xa010258)
> > at fe-misc.c:738
> > #5 0x6798287c in connectDBComplete (conn=0xa010258) at fe-connect.c:1103
> > #6 0x67981fb1 in PQsetdbLogin (pghost=0x0, pgport=0x0, pgoptions=0x0,
> > pgtty=0x0, dbName=0x1a0260e8 "regression", login=0x0, pwd=0x0)
> > at fe-connect.c:524
> > #7 0x40e43f in main (argc=6, argv=0x1a021ad8) at startup.c:178
>
> It would be helpful to see the contents of the conn object ("f 5" then
> "p *conn" should do it).

I did as you suggested above and got the following:

(gdb) f 5
#5 0x6798287c in connectDBComplete (conn=0xa010258) at fe-connect.c:1103
1103 if (pqWait(0, 1, conn))
(gdb) p *conn
$1 = {pghost = 0x0, pghostaddr = 0x0, pgport = 0xa016610 "65432",
pgunixsocket = 0x0, pgtty = 0xa016620 "", pgoptions = 0xa016630 "",
dbName = 0xa017170 "regression", pguser = 0xa017150 "jt",
pgpass = 0xa017160 "", Pfdebug = 0x0,
noticeHook = 0x67984e8c <defaultNoticeProcessor>, noticeArg = 0x0,
status = CONNECTION_STARTED, asyncStatus = PGASYNC_IDLE,
notifyList = 0xa0103e0, sock = 3, laddr = {sa = {sa_family = 0,
sa_data = '\000' <repeats 13 times>}, in = {sin_family = 0,
sin_port = 0, sin_addr = {s_addr = 0},
__pad = "\000\000\000\000\000\000\000"}, un = {sun_family = 0,
sun_path = '\000' <repeats 107 times>}}, raddr = {sa = {sa_family = 1,
sa_data = "/tmp/.s.PGSQL."}, in = {sin_family = 1, sin_port = 29743,
sin_addr = {s_addr = 774860909}, __pad = "s.PGSQL."}, un = {
sun_family = 1,
sun_path = "/tmp/.s.PGSQL.65432", '\000' <repeats 88 times>}},
raddr_len = 21, be_pid = 0, be_key = 0, salt = "\000", lobjfuncs = 0x0,
inBuffer = 0xa0103f0 "", inBufSize = 16384, inStart = 0, inCursor = 0,
inEnd = 0, nonblocking = 0, outBuffer = 0xa0143f8 "", outBufSize = 8192,
outCount = 0, result = 0x0, curTuple = 0x0,
setenv_state = SETENV_STATE_IDLE, next_eo = 0x0, errorMessage = {
data = 0xa016400 "", len = 0, maxlen = 256}, workBuffer = {
data = 0xa016508 "", len = 0, maxlen = 256}, client_encoding = 0}

> If Hiroshi is correct that this is the *first* call to pqWait in
> connectDBComplete, then I think we are looking at a kernel bug (or more
> likely a cygwin bug). psql has opened a TCP connection socket and is
> now waiting for the socket to show as write-ready before it will send
> a connection request. If select() never reports the socket as
> write-ready, you have a hang ... and it's not possible to blame the hang
> on anything else but the kernel. Both ends of the connection are on the
> same machine, so there's no network problem or anything like that.
> There is not anything else that we should need to do at the application
> level before we should be allowed to send data.

Do the details reported above support your hypothesis? If so, can you
assist me in formulating a minimal test case that I can take back to
the Cygwin community?

Thanks,
Jason

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jason(dot)Tishler(at)dothill(dot)com
Cc:	pgsql-ports(at)postgresql(dot)org
Subject:	Re: Cygwin PostgreSQL Regression Test Problems (Revisited)
Date:	2001-03-29 16:40:08
Message-ID:	22626.985884008@sss.pgh.pa.us
Lists:	pgsql-ports

Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com> writes:
> status = CONNECTION_STARTED, asyncStatus = PGASYNC_IDLE,

Oh-ho, that's interesting! If you look at fe-connect.c you'll see that
CONNECTION_STARTED must indicate that connect() returned EINPROGRESS
rather than a success indication. The socket is supposed to go
write-ready when the connection is finished --- for example HPUX's
connect man page sez

[EINPROGRESS] Nonblocking I/O is enabled using
O_NONBLOCK, O_NDELAY, or FIOSNBIO, and
the connection cannot be completed
immediately. This is not a failure.
Make the connect() call again a few
seconds later. Alternatively, wait for
completion by calling select() and
selecting for write.

But, evidently, it never is coming ready for write.
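
The sequence described here (non-blocking connect(), then select() for write, then a check of the pending socket error) can be sketched outside libpq roughly as follows. This is an illustration in Python rather than the fe-connect.c code; nonblocking_connect and the 5-second timeout are assumptions made for the example.

```python
import errno
import select
import socket


def nonblocking_connect(address, timeout=5.0):
    """Return "connected", "timeout", or "error N" for one connection attempt."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        rc = sock.connect_ex(address)  # usually EINPROGRESS on a non-blocking socket
        if rc not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            return "error %d" % rc
        # Wait for the socket to become write-ready, as the man page suggests.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            return "timeout"  # the symptom discussed in this thread
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        return "connected" if err == 0 else "error %d" % err
    finally:
        sock.close()


# Throwaway local listener on any free loopback port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(5)
print(nonblocking_connect(listener.getsockname()))
listener.close()
```

On a platform where select() behaves as documented, a loopback connection like this should report "connected" almost immediately; a "timeout" result would correspond to the hang seen in psql.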

BTW, I note that we are trying to use Unix sockets here. Does the bug
still appear if you force pg_regress to use TCP connections?

regards, tom lane

From:	Jason Tishler <Jason(dot)Tishler(at)dothill(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-ports(at)postgresql(dot)org
Subject:	Re: Cygwin PostgreSQL Regression Test Problems (Revisited)
Date:	2001-03-29 17:10:26
Message-ID:	20010329121026.B209@dothill.com
Lists:	pgsql-ports

Tom,

On Thu, Mar 29, 2001 at 11:40:08AM -0500, Tom Lane wrote:
> BTW, I note that we are trying to use Unix sockets here. Does the bug
> still appear if you force pg_regress to use TCP connections?

I'm not sure if you already know this, but Cygwin Unix sockets are
actually implemented as TCP/IP sockets. Anyway, I forced TCP connections
and got the same psql hang:

(gdb) p *conn
$1 = {pghost = 0xa016618 "localhost", pghostaddr = 0x0,
pgport = 0xa016628 "65432", pgunixsocket = 0x0, pgtty = 0xa016638 "",
pgoptions = 0xa016648 "", dbName = 0xa017188 "regression",
pguser = 0xa017168 "jt", pgpass = 0xa017178 "", Pfdebug = 0x0,
noticeHook = 0x67984e8c <defaultNoticeProcessor>, noticeArg = 0x0,
status = CONNECTION_STARTED, asyncStatus = PGASYNC_IDLE,
notifyList = 0xa0103e8, sock = 3, laddr = {sa = {sa_family = 0,
sa_data = '\000' <repeats 13 times>}, in = {sin_family = 0,
sin_port = 0, sin_addr = {s_addr = 0},
__pad = "\000\000\000\000\000\000\000"}, un = {sun_family = 0,
sun_path = '\000' <repeats 107 times>}}, raddr = {sa = {sa_family = 2,
sa_data = "ÿ\230\177\000\000\001\000\000\000\000\000\000\000"}, in = {
sin_family = 2, sin_port = 39167, sin_addr = {s_addr = 16777343},
__pad = "\000\000\000\000\000\000\000"}, un = {sun_family = 2,
sun_path = "ÿ\230\177\000\000\001", '\000' <repeats 101 times>}},
raddr_len = 16, be_pid = 0, be_key = 0, salt = "\000", lobjfuncs = 0x0,
inBuffer = 0xa0103f8 "", inBufSize = 16384, inStart = 0, inCursor = 0,
inEnd = 0, nonblocking = 0, outBuffer = 0xa014400 "", outBufSize = 8192,
outCount = 0, result = 0x0, curTuple = 0x0,
setenv_state = SETENV_STATE_IDLE, next_eo = 0x0, errorMessage = {
data = 0xa016408 "", len = 0, maxlen = 256}, workBuffer = {
data = 0xa016510 "", len = 0, maxlen = 256}, client_encoding = 0}
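
The raddr fields in the dump above can be checked by hand. gdb prints sin_port and s_addr as host-order integers, while the bytes in memory are in network order. A small sketch, assuming the dump was taken on a little-endian x86 machine:

```python
import socket
import struct

# Values copied from the gdb dump (assumed little-endian host).
sin_port = 39167
s_addr = 16777343

# Re-create the in-memory bytes, then read them in network (big-endian) order.
port = struct.unpack(">H", struct.pack("<H", sin_port))[0]
host = socket.inet_ntoa(struct.pack("<I", s_addr))

print(host, port)  # 127.0.0.1 65432
```

That matches pghost = "localhost" and pgport = "65432", so the socket is indeed aimed at the regression-test postmaster over TCP.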

> But, evidently, it never is coming ready for write.

Any ideas on how to demonstrate this to the Cygwin community without
all of the PostgreSQL baggage? Sorry, but I'm not very experienced
with sockets.
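
One possible starting point for a PostgreSQL-free test case, offered as a rough sketch rather than a confirmed reproduction: open many non-blocking connections to a local listener at once and count how many never become write-ready. The function name, the default of 18 clients (the size of the largest parallel group), and the timeout are all assumptions. The listener never calls accept(); it relies on the kernel completing connections that fit in the listen backlog.

```python
import select
import socket
import time


def count_stalled_connects(n_clients=18, timeout=5.0):
    """Start n_clients non-blocking connects; return how many never
    became write-ready within timeout seconds."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free loopback port
    listener.listen(n_clients)
    address = listener.getsockname()

    clients = []
    for _ in range(n_clients):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setblocking(False)
        s.connect_ex(address)  # expected: 0, EINPROGRESS, or EWOULDBLOCK
        clients.append(s)

    pending = list(clients)
    deadline = time.monotonic() + timeout
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        _, writable, _ = select.select([], pending, [], remaining)
        pending = [s for s in pending if s not in writable]

    stalled = len(pending)
    for s in clients:
        s.close()
    listener.close()
    return stalled


print("stalled connections:", count_stalled_connects())
```

A result of 0 is the expected outcome. Unlike the regression test, this runs everything in one process, so it may not trigger a problem that depends on forking a backend per connection.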

Thanks,
Jason