From: | Dmitry Tkach <dmitry(at)openratings(dot)com> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org |
Subject: | Weird postmaster crashes |
Date: | 2003-06-10 18:03:37 |
Message-ID: | 3EE61D79.4000708@openratings.com |
Lists: | pgsql-bugs pgsql-general |
I am experiencing database server crashes quite frequently (sometimes,
*daily*), and I am having a hard time identifying what could possibly be
causing them :-(
They seem to be happening kinda randomly, I was unable to attribute
them to any specific database activity going on at the time...
The postgres log looks like:
2003-06-10 13:53:32 [14522] DEBUG: pq_recvbuf: unexpected EOF on
client connection
2003-06-10 13:53:32 [16915] DEBUG: pq_recvbuf: unexpected EOF on
client connection
2003-06-10 13:53:32 [14523] DEBUG: pq_recvbuf: unexpected EOF on
client connection
2003-06-10 13:53:32 [17095] DEBUG: pq_recvbuf: unexpected EOF on
client connection
2003-06-10 13:53:32 [14551] FATAL 1: LWLockAcquire: can't wait without
a PROC structure
2003-06-10 13:53:32 [14551] FATAL 1: LWLockAcquire: can't wait without
a PROC structure
2003-06-10 13:53:32 [14527] DEBUG: pq_recvbuf: unexpected EOF on
client connection
2003-06-10 13:53:32 [14685] DEBUG: pq_recvbuf: unexpected EOF on
client connection
2003-06-10 13:53:32 [17093] DEBUG: pq_recvbuf: unexpected EOF on
client connection
2003-06-10 13:53:32 [17092] DEBUG: pq_recvbuf: unexpected EOF on
client connection
.... <snip a few identical messages (with different pids)>
2003-06-10 13:53:33 [14072] DEBUG: server process (pid 14551) exited
with exit code 1
2003-06-10 13:53:33 [14072] DEBUG: terminating any other active server
processes
2003-06-10 13:53:33 [1609] NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
.....
It does not even produce a core file after this - just silently exits,
and restarts itself.
Could somebody please point me to any clue what could possibly be wrong
with it?
This is 7.2.1 - I know, I need to upgrade.
Working on it, but it is going to take a while, and for the time being I
would greatly appreciate any ideas on what I can do about this thing.
Thanks a lot!
Dima
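A log excerpt like the one above can be checked mechanically to see which backend the postmaster reported as dead and what that backend logged beforehand. The following is only a minimal Python sketch, assuming the log layout shown in this message (timestamp, bracketed pid, level, message) with long lines wrapped onto a continuation line; the names `parse_entries` and `crashed_backends` are hypothetical helpers, not PostgreSQL tools.

```python
import re

# Assumed layout, taken from the excerpt above:
#   2003-06-10 13:53:32 [14551] FATAL 1: message text
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\d+)\] ([A-Z]+(?: \d+)?): (.*)$"
)
EXIT_RE = re.compile(r"server process \(pid (\d+)\) exited with exit code (\d+)")


def parse_entries(text):
    """Return (pid, level, message) tuples, joining wrapped continuation lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        match = LINE_RE.match(line)
        if match:
            entries.append([int(match.group(2)), match.group(3), match.group(4).strip()])
        elif entries and line:
            # A line without a timestamp is treated as the tail of the previous entry.
            entries[-1][2] += " " + line
    return [tuple(entry) for entry in entries]


def crashed_backends(text):
    """Map each pid reported as exited to (exit_code, FATAL messages it logged)."""
    entries = parse_entries(text)
    report = {}
    for _pid, _level, message in entries:
        match = EXIT_RE.search(message)
        if match:
            dead_pid = int(match.group(1))
            fatals = [
                msg for pid, level, msg in entries
                if pid == dead_pid and level.startswith("FATAL")
            ]
            report[dead_pid] = (int(match.group(2)), fatals)
    return report
```

Calling `crashed_backends(open("postgres.log").read())` (the file name is an assumption) would return a dictionary keyed by the pid of each exited backend, which makes it easier to separate the one failing backend from the "unexpected EOF" noise of the clients that were disconnected afterwards.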
From: | Dennis Gearon <gearond(at)cvc(dot)net> |
---|---|
To: | Dmitry Tkach <dmitry(at)openratings(dot)com> |
Cc: | pgsql-bugs(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org |
Subject: | Re: [GENERAL] Weird postmaster crashes |
Date: | 2003-06-10 18:14:07 |
Message-ID: | 3EE61FEF.6070707@cvc.net |
Lists: | pgsql-bugs pgsql-general |
The mantra is to always check hardware first. Do a disk and memory check.
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Dmitry Tkach <dmitry(at)openratings(dot)com> |
Cc: | pgsql-bugs(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org |
Subject: | Re: [GENERAL] Weird postmaster crashes |
Date: | 2003-06-10 19:31:19 |
Message-ID: | 4243.1055273479@sss.pgh.pa.us |
Lists: | pgsql-bugs pgsql-general |
Dmitry Tkach <dmitry(at)openratings(dot)com> writes:
> I am experiencing database server crashes quite frequently
> 2003-06-10 13:53:32 [14551] FATAL 1: LWLockAcquire: can't wait without
> a PROC structure
> This is 7.2.1 - I know, I need to upgrade.
Yes, you do. This is a known bug that was fixed in .3 or .4.
regards, tom lane
From: | Dmitry Tkach <dmitry(at)openratings(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-bugs(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org |
Subject: | Re: [GENERAL] Weird postmaster crashes |
Date: | 2003-06-10 20:05:25 |
Message-ID: | 3EE63A05.8080009@openratings.com |
Lists: | pgsql-bugs pgsql-general |
Tom Lane wrote:
> Dmitry Tkach <dmitry(at)openratings(dot)com> writes:
>> I am experiencing database server crashes quite frequently
>> 2003-06-10 13:53:32 [14551] FATAL 1: LWLockAcquire: can't wait without
>> a PROC structure
>> This is 7.2.1 - I know, I need to upgrade.
>
> Yes, you do. This is a known bug that was fixed in .3 or .4.
>
> regards, tom lane
Thanks, Tom!
That's kinda what I suspected....
Could you give me some idea on what circumstances cause this to happen?
Thanks again!
Dima
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Dmitry Tkach <dmitry(at)openratings(dot)com> |
Cc: | pgsql-bugs(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org |
Subject: | Re: [GENERAL] Weird postmaster crashes |
Date: | 2003-06-10 20:36:37 |
Message-ID: | 4602.1055277397@sss.pgh.pa.us |
Lists: | pgsql-bugs pgsql-general |
Dmitry Tkach <dmitry(at)openratings(dot)com> writes:
> 2003-06-10 13:53:32 [14551] FATAL 1: LWLockAcquire: can't wait without
> a PROC structure
> Could you give me some idea on what circumstances cause this to happen?
IIRC, it's an order-of-operations mistake during backend shutdown: the
proc structure is deallocated while it's still possible to receive an
interrupt from another backend --- and if you get such an interrupt, you
need the proc. So from the user's point of view it's pretty
unpredictable.
Short answer: upgrade. This is not the only nasty bug in 7.2.1.
regards, tom lane
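The shutdown ordering problem described in this message can be illustrated with a small simulation. This is only a sketch in Python, not PostgreSQL source: the `Backend` class, its methods, and the "fixed" ordering (disable interrupts before releasing the proc) are assumptions made for illustration, and the actual patch in 7.2.3 or 7.2.4 may differ.

```python
class FatalError(Exception):
    """Stands in for a FATAL-level error raised inside a backend."""


class Backend:
    """Toy model of a backend process; all names here are hypothetical."""

    def __init__(self):
        self.proc = object()            # stand-in for the per-backend proc structure
        self.interrupts_enabled = True

    def lwlock_acquire(self):
        # Waiting on a lightweight lock needs the proc structure.
        if self.proc is None:
            raise FatalError("LWLockAcquire: can't wait without a PROC structure")
        return "acquired"

    def deliver_interrupt(self):
        # An interrupt from another backend; handling it takes a lock.
        if not self.interrupts_enabled:
            return "ignored"
        return self.lwlock_acquire()

    def shutdown_buggy(self, interrupt_arrives):
        # Wrong order: the proc is released while interrupts can still arrive.
        self.proc = None
        result = self.deliver_interrupt() if interrupt_arrives else None
        self.interrupts_enabled = False
        return result

    def shutdown_fixed(self, interrupt_arrives):
        # Assumed fix: stop accepting interrupts first, then release the proc.
        self.interrupts_enabled = False
        result = self.deliver_interrupt() if interrupt_arrives else None
        self.proc = None
        return result
```

In this model `shutdown_buggy` only fails when an interrupt happens to land inside the window between releasing the proc and disabling interrupts, which matches the observation that the crashes look random and cannot be tied to any particular database activity.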
From: | Dmitry Tkach <dmitry(at)openratings(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-bugs(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org |
Subject: | Re: [GENERAL] Weird postmaster crashes |
Date: | 2003-06-10 21:55:46 |
Message-ID: | 3EE653E2.9050806@openratings.com |
Lists: | pgsql-bugs pgsql-general |
Makes sense. Thanks!
One more thing to clarify - when you said it was fixed in .3 and .4 did
you mean 7.3 or 7.2.3?
Thanks!
Dima
Tom Lane wrote:
> Dmitry Tkach <dmitry(at)openratings(dot)com> writes:
>> 2003-06-10 13:53:32 [14551] FATAL 1: LWLockAcquire: can't wait without
>> a PROC structure
>> Could you give me some idea on what circumstances cause this to happen?
>
> IIRC, it's an order-of-operations mistake during backend shutdown: the
> proc structure is deallocated while it's still possible to receive an
> interrupt from another backend --- and if you get such an interrupt, you
> need the proc. So from the user's point of view it's pretty
> unpredictable.
>
> Short answer: upgrade. This is not the only nasty bug in 7.2.1.
>
> regards, tom lane
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Dmitry Tkach <dmitry(at)openratings(dot)com> |
Cc: | pgsql-bugs(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org |
Subject: | Re: [GENERAL] Weird postmaster crashes |
Date: | 2003-06-10 22:00:31 |
Message-ID: | 5801.1055282431@sss.pgh.pa.us |
Lists: | pgsql-bugs pgsql-general |
Dmitry Tkach <dmitry(at)openratings(dot)com> writes:
> One more thing to clarify - when you said it was fixed in .3 and .4 did
> you mean 7.3 or 7.2.3?
I meant I couldn't remember whether it was first fixed in 7.2.3 or 7.2.4.
Doesn't matter for your purposes --- as long as you're updating, you
should go to 7.2.4.
7.3.* has the fix also of course, but updating to 7.3 is a much bigger
task.
regards, tom lane