Re: Unsafe threading in syslogger on Windows

Lists: pgsql-hackers
From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Unsafe threading in syslogger on Windows
Date: 2010-04-08 14:59:28
Message-ID: 4BBDEF50.5060408@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Windows, syslogger uses two threads. The main thread loops and polls
if any SIGHUPs have been received or if the log file needs to be
rotated. Another thread, "pipe thread", does ReadFile() on the pipe that
other processes send their log messages to. ReadFile() blocks, and
whenever new data arrives, it is processed in the pipe thread.

Both threads use palloc()/pfree(), which are not thread-safe :-(.

It's hard to trigger a crash because the main thread mostly just sleeps,
and the pipe thread only uses palloc()/pfree() when it receives chunked
messages, larger than 512 bytes. Browsing the CVS history, this was made
visibly broken by the patch that introduced the message chunking. Before
that the pipe thread just read from the pipe and wrote to the log file,
which was safe. It has always used ereport() to report read errors,
though, which can do palloc(), but there shouldn't normally be any read
errors.

I chatted with Magnus about this, and he suggested using a Windows
critical section to make sure that only one of the threads is active at
a time. That seems suitable for back-porting, but I'd like to get rid of
this threading in CVS head, it seems too error-prone.

The reason it uses threads like this on Windows is explained in the
comments:
> /*
> * Worker thread to transfer data from the pipe to the current logfile.
> *
> * We need this because on Windows, WaitForSingleObject does not work on
> * unnamed pipes: it always reports "signaled", so the blocking ReadFile won't
> * allow for SIGHUP; and select is for sockets only.
> */

But Magnus pointed out that our pgpipe() implementation on Windows
actually creates a pair of sockets instead of pipes, for exactly that
reason, so that you can use select() on the returned file descriptor.
For some reason syslogger explicitly doesn't use pgpipe() on Windows,
though, but calls CreatePipe(). I don't see any explanation why.

I'm going to see what happens if I remove all the #ifdef WIN32 blocks in
syslogger, and let it use pgpipe() and select() instead of the extra thread.

Thoughts?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unsafe threading in syslogger on Windows
Date: 2010-04-08 15:52:34
Message-ID: 4BBDFBC2.9050108@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> On Windows, syslogger uses two threads. The main thread loops and polls
> if any SIGHUPs have been received or if the log file needs to be
> rotated. Another thread, "pipe thread", does ReadFile() on the pipe that
> other processes send their log messages to. ReadFile() blocks, and
> whenever new data arrives, it is processed in the pipe thread.
>
> Both threads use palloc()/pfree(), which are not thread-safe :-(.
>
> It's hard to trigger a crash because the main thread mostly just sleeps,
> and the pipe thread only uses palloc()/pfree() when it receives chunked
> messages, larger than 512 bytes. Browsing the CVS history, this was made
> visibly broken by the patch that introduced the message chunking. Before
> that the pipe thread just read from the pipe and wrote to the log file,
> which was safe. It has always used ereport() to report read errors,
> though, which can do palloc(), but there shouldn't normally be any read
> errors.
>
> I chatted with Magnus about this, and he suggested using a Windows
> critical section to make sure that only one of the threads is active at
> a time. That seems suitable for back-porting, but I'd like to get rid of
> this threading in CVS head, it seems too error-prone.
>
> The reason it uses threads like this on Windows is explained in the
> comments:
>
>> /*
>> * Worker thread to transfer data from the pipe to the current logfile.
>> *
>> * We need this because on Windows, WaitForSingleObject does not work on
>> * unnamed pipes: it always reports "signaled", so the blocking ReadFile won't
>> * allow for SIGHUP; and select is for sockets only.
>> */
>>
>
> But Magnus pointed out that our pgpipe() implementation on Windows
> actually creates a pair of sockets instead of pipes, for exactly that
> reason, so that you can use select() on the returned file descriptor.
> For some reason syslogger explicitly doesn't use pgpipe() on Windows,
> though, but calls CreatePipe(). I don't see any explanation why.
>
> I'm going to see what happens if I remove all the #ifdef WIN32 blocks in
> syslogger, and let it use pgpipe() and select() instead of the extra thread.
>
>
>

Sounds reasonable. Let's see how big the changes are on HEAD. I'm not
sure it's worth creating a different smaller fix for the back branches.

cheers

andrew


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unsafe threading in syslogger on Windows
Date: 2010-04-08 22:14:12
Message-ID: 4BBE5534.4010106@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
> Heikki Linnakangas wrote:
>> I'm going to see what happens if I remove all the #ifdef WIN32 blocks in
>> syslogger, and let it use pgpipe() and select() instead of the extra
>> thread.
>
> Sounds reasonable. Let's see how big the changes are on HEAD. I'm not
> sure it's worth creating a different smaller fix for the back branches.

I tried that, and got a crash somewhere in the code that inherits the
syslogger pipe/socket to the child process. I don't understand why, and
I don't feel like debugging any deeper into that right now. If you or
someone else wants to give it a shot, that would be good. If not, I
might try again some other day after sleeping over it.

Anyway, here's the patch I had in mind for back-branches.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
fix-syslogger-threading-1.patch text/x-diff 4.1 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unsafe threading in syslogger on Windows
Date: 2010-04-16 09:54:17
Message-ID: 4BC833C9.3010804@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Andrew Dunstan wrote:
>> Heikki Linnakangas wrote:
>>> I'm going to see what happens if I remove all the #ifdef WIN32 blocks in
>>> syslogger, and let it use pgpipe() and select() instead of the extra
>>> thread.
>> Sounds reasonable. Let's see how big the changes are on HEAD. I'm not
>> sure it's worth creating a different smaller fix for the back branches.
>
> I tried that, and got a crash somewhere in the code that inherits the
> syslogger pipe/socket to the child process. I don't understand why, and
> I don't feel like debugging any deeper into that right now. If you or
> someone else wants to give it a shot, that would be good. If not, I
> might try again some other day after sleeping over it.
>
> Anyway, here's the patch I had in mind for back-branches.

I've committed that patch to all branches now. In the long run it would
be good to get rid of the extra thread, but I'll leave that to someone
else with more Windows skills.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com