Re: PG 7.3.1 with ssl on linux hangs (testcase available)

Lists: pgsql-bugs
From: Hans-Jürgen Hay <hjh(at)alterras(dot)de>
To: pgsql-bugs(at)postgresql(dot)org
Cc: hjh(at)alterras(dot)de
Subject: PG 7.3.1 with ssl on linux hangs (testcase available)
Date: 2003-07-26 16:36:09
Message-ID: 1059237369.19643.77.camel@hera.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

since I moved from postgres 7.1 to 7.3 I get hangs when executing a
select on certaint rows with binary data using psql

The problem is reproducible and affects only single rows in the table .
The hangs happen only when I connect via network unsing psql -h
localhost template1

And the problem is gone, when ssl is switched off in postgresql.conf
But the problem seems to be on the client side, cause its the same using
a 7.1 server.

I have generated a testcase with 2 rows, one works the other not:

The table looks like this

create table testy (
id integer,
data bytea
);

(I can send a dump of the table with the offending rows to anybody
interessted by mail -- its about 130 K )

select * from testy where id=28162 --> works
select * from testy where id=28163 --> hangs + process goes up to
99% utilisation

The offending row is much larger, that might trigger.

I use a debian system with:
postgresql 7.3.3-1 client and server

--
Hans-Jürgen Hay <hjh(at)alterras(dot)de>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hans-Jürgen Hay <hjh(at)alterras(dot)de>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: PG 7.3.1 with ssl on linux hangs (testcase available)
Date: 2003-08-01 16:56:04
Message-ID: 16433.1059756964@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

=?ISO-8859-1?Q?Hans-J=FCrgen?= Hay <hjh(at)alterras(dot)de> writes:
> The hangs happen only when I connect via network unsing psql -h
> localhost template1
> And the problem is gone, when ssl is switched off in postgresql.conf

We have heard a couple of reports of problems with SSL connections,
but so far I've not been able to reproduce any trouble here. Details
about your SSL installation might help.

regards, tom lane


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: PG 7.3.1 with ssl on linux hangs (testcase available)
Date: 2003-08-01 19:21:56
Message-ID: 3F2ABDD4.6030603@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:

>=?ISO-8859-1?Q?Hans-J=FCrgen?= Hay <hjh(at)alterras(dot)de> writes:
>
>
>>The hangs happen only when I connect via network unsing psql -h
>>localhost template1
>>And the problem is gone, when ssl is switched off in postgresql.conf
>>
>>
>
>We have heard a couple of reports of problems with SSL connections,
>but so far I've not been able to reproduce any trouble here. Details
>about your SSL installation might help.
>
>
>

Hi Tom,

good to hear that's not a singular problem with pgadmin3...
I currently don't have a system to reproduce this, because I didn't
succeed making my 7.4 server accepting ssl-connections. I posted for
installation help on pgsql-admin but didn't get a reaction so far. RTFM
up and down I couldn't see what I configured wrong (--with-openssl,
ssl=true). Can you give me a hint? If I can reproduce the problem, I
might be able to trace down that ssl thing.

Regards,
Andreas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: PG 7.3.1 with ssl on linux hangs (testcase available)
Date: 2003-08-01 19:24:02
Message-ID: 4937.1059765842@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Andreas Pflug <pgadmin(at)pse-consulting(dot)de> writes:
> I currently don't have a system to reproduce this, because I didn't
> succeed making my 7.4 server accepting ssl-connections. I posted for
> installation help on pgsql-admin but didn't get a reaction so far. RTFM
> up and down I couldn't see what I configured wrong (--with-openssl,
> ssl=true). Can you give me a hint?

No, I didn't see what you did wrong either. Try looking in the
postmaster log to see if any useful error messages appear.

regards, tom lane


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: PG 7.3.1 with ssl on linux hangs (testcase available)
Date: 2003-08-01 19:37:36
Message-ID: 3F2AC180.2020503@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:

>Andreas Pflug <pgadmin(at)pse-consulting(dot)de> writes:
>
>
>>I currently don't have a system to reproduce this, because I didn't
>>succeed making my 7.4 server accepting ssl-connections. I posted for
>>installation help on pgsql-admin but didn't get a reaction so far. RTFM
>>up and down I couldn't see what I configured wrong (--with-openssl,
>>ssl=true). Can you give me a hint?
>>
>>
>
>No, I didn't see what you did wrong either. Try looking in the
>postmaster log to see if any useful error messages appear.
>

I pushed client_min_messages and log_min_messages to debug5, and the
only suspicious message at startup is

LOG: could not load root cert file "/usr/data/pgsql-7.4/root.crt": No
such file or directory
DETAIL: Will not verify client certificates.

which shouldn't be a problem (pg_hba.conf is configured to trust).

Any connect attempt will log
DEBUG: forked new backend, pid=1826 socket=8
DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)
DEBUG: reaping dead processes
DEBUG: child process (pid 1826) exited with exit code 0

with the client side message "server does not support SSL, but SSL was
required". OpenSSL is 0.9.6g

Regards,
Andreas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: PG 7.3.1 with ssl on linux hangs (testcase available)
Date: 2003-08-01 21:37:27
Message-ID: 13863.1059773847@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Andreas Pflug <pgadmin(at)pse-consulting(dot)de> writes:
> I pushed client_min_messages and log_min_messages to debug5, and the
> only suspicious message at startup is
> LOG: could not load root cert file "/usr/data/pgsql-7.4/root.crt": No
> such file or directory
> DETAIL: Will not verify client certificates.
> which shouldn't be a problem (pg_hba.conf is configured to trust).

That's expected if you don't provide a root.crt file. (I don't, and it
still works for me.)

> Any connect attempt will log
> DEBUG: forked new backend, pid=1826 socket=8
> DEBUG: proc_exit(0)
> DEBUG: shmem_exit(0)
> DEBUG: exit(0)
> DEBUG: reaping dead processes
> DEBUG: child process (pid 1826) exited with exit code 0

> with the client side message "server does not support SSL, but SSL was
> required". OpenSSL is 0.9.6g

Bizarre. I looked through the recent sslmode patch, and while I found
some things I didn't like, none of them explain this. Could you perhaps
get a trace of the server-to-client interaction? Either strace'ing psql
or watching the IP traffic with a packet sniffer should do --- but make
sure you get the full contents of each packet sent and received.

regards, tom lane


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: PG 7.3.1 with ssl on linux hangs (testcase available)
Date: 2003-08-01 22:17:34
Message-ID: 3F2AE6FE.8010600@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:

>Bizarre. I looked through the recent sslmode patch, and while I found
>some things I didn't like, none of them explain this. Could you perhaps
>get a trace of the server-to-client interaction? Either strace'ing psql
>or watching the IP traffic with a packet sniffer should do --- but make
>sure you get the full contents of each packet sent and received.
>
>
Hi Tom,

the attached file is a tcpdump of a connect attempt.

Regards,
Andreas

Attachment Content-Type Size
ssl.tcpdump text/plain 3.0 KB

From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: PG 7.3.1 with ssl on linux hangs (testcase available)
Date: 2003-08-01 22:36:28
Message-ID: 3F2AEB6C.6050808@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hi Tom,

reducing the traced conversation to its essence it's just a ssl-request
to a non-ssl server :-(

>localhost.32822 > localhost.postgresql: 0000 0008 04d2 162f
>
>
>localhost.postgresql > localhost.32822: 4e
>
And no hints from serverlog.
Are there some more debugging options that I could enable in the backend
to be a bit more verbose about the communication process?

Regards,
Andreas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: PG 7.3.1 with ssl on linux hangs (testcase available)
Date: 2003-08-01 23:01:21
Message-ID: 14408.1059778881@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Andreas Pflug <pgadmin(at)pse-consulting(dot)de> writes:
> the attached file is a tcpdump of a connect attempt.

Hm. The postmaster is sending back 'N' indicating that it does not want
to do SSL.

Are you sure you are connecting to an SSL-enabled postmaster?

Also, is your connection by any chance IPV6? It doesn't look like it
from the tcpdump, but I'm not sure I know how to tell.

The relevant bit of code in the postmaster is

#ifdef USE_SSL
/* No SSL when disabled or on Unix sockets */
if (!EnableSSL || port->laddr.addr.ss_family != AF_INET)
SSLok = 'N';
else
SSLok = 'S'; /* Support for SSL */
#else
SSLok = 'N'; /* No support for SSL */
#endif

so one of the paths that chooses 'N' is being followed. The test on
AF_INET is broken, since it should allow for IPV6, but otherwise I
don't see what's getting you.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: PG 7.3.1 with ssl on linux hangs (testcase available)
Date: 2003-08-01 23:27:05
Message-ID: 14745.1059780425@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Andreas Pflug <pgadmin(at)pse-consulting(dot)de> writes:
> reducing the traced conversation to its essence it's just a ssl-request
> to a non-ssl server :-(

Yeah, I had just come to the same conclusion. Either you're connecting
to the wrong server (check those port numbers?) or there's something
wrong with the bit of postmaster code I just showed you.

regards, tom lane


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: PG 7.3.1 with ssl on linux hangs (testcase available)
Date: 2003-08-01 23:29:21
Message-ID: 3F2AF7D1.80801@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:

>Hm. The postmaster is sending back 'N' indicating that it does not want
>to do SSL.
>
>Are you sure you are connecting to an SSL-enabled postmaster?
>
>Also, is your connection by any chance IPV6? It doesn't look like it
>from the tcpdump, but I'm not sure I know how to tell.
>
Ah, that's it! My code examination just had reached
ProcessStartupPacket, when your mail arrived.
The kernel is IPV6, and incoming IPV4 connection will be handled IPV6; I
can tell this from pg_hba.conf experiments that my installation requires
the ::ffff:192.168.0.0/24 entry.

>The relevant bit of code in the postmaster is
>
>#ifdef USE_SSL
> /* No SSL when disabled or on Unix sockets */
> if (!EnableSSL || port->laddr.addr.ss_family != AF_INET)
> SSLok = 'N';
> else
> SSLok = 'S'; /* Support for SSL */
>#else
> SSLok = 'N'; /* No support for SSL */
>#endif
>

According to the comment, it should be

if (!EnableSSL || port->laddr.addr.ss_family == AF_UNIX)

and after changing this it works! Quite hard to find, probably most
users you'd ask "do you use IPV6" would have answered "no". I didn't opt
vor IPV6 either, it's a stock SuSE 8.1.

Immediately, I checked the reported ssl hangs, and found both cases
working. Dave reported this failing on snake.pgadmin.org, which uses a
different openssl (0.9.7a), I'll see if I can check that version (won't
be before monday, need some sleep now).

Regards,
Andreas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: PG 7.3.1 with ssl on linux hangs (testcase available)
Date: 2003-08-01 23:31:48
Message-ID: 14801.1059780708@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Andreas Pflug <pgadmin(at)pse-consulting(dot)de> writes:
> According to the comment, it should be
> if (!EnableSSL || port->laddr.addr.ss_family == AF_UNIX)
> and after changing this it works!

Right. I've committed the fix. One bug less, now to find the next one
...

> Dave reported this failing on snake.pgadmin.org, which uses a
> different openssl (0.9.7a), I'll see if I can check that version (won't
> be before monday, need some sleep now).

Sounds good, see you then.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hans-Jürgen Hay <hjh(at)alterras(dot)de>
Cc: pgsql-bugs(at)postgreSQL(dot)org
Subject: Re: PG 7.3.1 with ssl on linux hangs (testcase available)
Date: 2003-08-04 17:29:01
Message-ID: 7841.1060018141@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Ah-hah, the problem is timing-dependent. I was able to reproduce it
after I tried connecting from one machine to another.

The patch against 7.3.4 is attached.

regards, tom lane

Index: fe-misc.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/interfaces/libpq/fe-misc.c,v
retrieving revision 1.85
diff -c -r1.85 fe-misc.c
*** fe-misc.c 24 Oct 2002 23:35:55 -0000 1.85
--- fe-misc.c 4 Aug 2003 17:22:19 -0000
***************
*** 553,559 ****
--- 553,570 ----
* file is ready. Grumble. Fortunately, we don't expect this path to
* be taken much, since in normal practice we should not be trying to
* read data unless the file selected for reading already.
+ *
+ * In SSL mode it's even worse: SSL_read() could say WANT_READ and then
+ * data could arrive before we make the pqReadReady() test. So we must
+ * play dumb and assume there is more data, relying on the SSL layer to
+ * detect true EOF.
*/
+
+ #ifdef USE_SSL
+ if (conn->ssl)
+ return 0;
+ #endif
+
switch (pqReadReady(conn))
{
case 0:
Index: fe-secure.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/interfaces/libpq/fe-secure.c,v
retrieving revision 1.15.2.5
diff -c -r1.15.2.5 fe-secure.c
*** fe-secure.c 10 Apr 2003 23:03:13 -0000 1.15.2.5
--- fe-secure.c 4 Aug 2003 17:22:19 -0000
***************
*** 278,298 ****
libpq_gettext("SSL SYSCALL error: %s\n"),
SOCK_STRERROR(SOCK_ERRNO));
else
printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("SSL SYSCALL error: EOF detected\n"));
break;
case SSL_ERROR_SSL:
printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("SSL error: %s\n"), SSLerrmessage());
/* fall through */
case SSL_ERROR_ZERO_RETURN:
- pqsecure_close(conn);
SOCK_ERRNO = ECONNRESET;
n = -1;
break;
default:
printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("Unknown SSL error code\n"));
break;
}
}
--- 278,302 ----
libpq_gettext("SSL SYSCALL error: %s\n"),
SOCK_STRERROR(SOCK_ERRNO));
else
+ {
printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("SSL SYSCALL error: EOF detected\n"));
+ SOCK_ERRNO = ECONNRESET;
+ n = -1;
+ }
break;
case SSL_ERROR_SSL:
printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("SSL error: %s\n"), SSLerrmessage());
/* fall through */
case SSL_ERROR_ZERO_RETURN:
SOCK_ERRNO = ECONNRESET;
n = -1;
break;
default:
printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("Unknown SSL error code\n"));
+ n = -1;
break;
}
}
***************
*** 334,354 ****
libpq_gettext("SSL SYSCALL error: %s\n"),
SOCK_STRERROR(SOCK_ERRNO));
else
printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("SSL SYSCALL error: EOF detected\n"));
break;
case SSL_ERROR_SSL:
printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("SSL error: %s\n"), SSLerrmessage());
/* fall through */
case SSL_ERROR_ZERO_RETURN:
- pqsecure_close(conn);
SOCK_ERRNO = ECONNRESET;
n = -1;
break;
default:
printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("Unknown SSL error code\n"));
break;
}
}
--- 338,362 ----
libpq_gettext("SSL SYSCALL error: %s\n"),
SOCK_STRERROR(SOCK_ERRNO));
else
+ {
printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("SSL SYSCALL error: EOF detected\n"));
+ SOCK_ERRNO = ECONNRESET;
+ n = -1;
+ }
break;
case SSL_ERROR_SSL:
printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("SSL error: %s\n"), SSLerrmessage());
/* fall through */
case SSL_ERROR_ZERO_RETURN:
SOCK_ERRNO = ECONNRESET;
n = -1;
break;
default:
printfPQExpBuffer(&conn->errorMessage,
libpq_gettext("Unknown SSL error code\n"));
+ n = -1;
break;
}
}