Lists: | pgsql-bugs |
---|
From: | "Doran L(dot) Barton" <fozz(at)iodynamics(dot)com> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | 7.1.3 w/ Perl/DBI application hangs |
Date: | 2001-11-12 19:06:56 |
Message-ID: | 20011112120656.A2531@iodynamics.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
I've got a fairly mission-critical Perl/mod_perl application built on top of
PostgreSQL that is causing me grief. I've been battling it for a week or so
and have determined I think the problem is a PostgreSQL bug.
The application is a web-based event-scheduling system for an office of
accountants. The server is an Athlon 1.0Ghz CPU running Red Hat 7.2 and Red
Hat's PostgreSQL 7.1.3 build. The hard disks are UW SCSI disks connected
via software RAID level 1 (complements of Red Hat Linux). There is 256M of
RAM available. I've tweaked the max_connections, shared_buffers, and
sort_mem variables with no change in behavior resulting.
I recently revised the application to make better use of complex queries
in PostgreSQL instead of doing most of the work in Perl - which was making
things quite slow.
The problem manifests itself when these more-complex queries seem to "hang" -
never returning any kind of result whatsoever. When this happens, a 'ps' on
the server shows SELECT queries spinning - taking up CPU and memory. When
frustrated users refresh their browsers, this creates more of these
processes - until someone pages me and I log in and the load average is
something like 12.2.
I do VACUUM ANALYZE daily - even though the data in the database doesn't
change that much. There are approximately 16,000 records in the events
database which comprises about three years of events data (this application
has been running for a couple years). Doing a VACUUM ANALYZE when the
database is "hanging" doesn't change anything. The VACUUM ANALYZE does
return a successful result.
Stopping the postgresql service and restarting also does not fix the
proble. It does kill off the "hanging" processes, but once the database
server is running again, new queries hang just as before.
The ONLY (temporary) solution I've found is:
(1) shut down Apache
(2) restart PostgreSQL
(3) pg_dump the database
(4) stop Postgresql
(5) rename the data/ directory and create a new data/ directory
(6) Initalize the database in the new data/ directory
(7) recreate users and my database
(8) restore the data from the previous dump
(9) restart Apache
After I do the above steps, the application works fine for about 24 hours.
Then the problem begins manifesting itself again. Another thing I've
noticed is that this problem devlops independently of actual application
usage. For example, I will go through the above steps on a Saturday evening
and users will complain of not being able to use the application on Monday
morning. The application is not used at all on Sundays.
Now, I'm not familiar with some more the more down-and-dirty tools of
debugging PostgreSQL (although I've been using PG for various web-based
application projects since 1995), so please bear with me. I'll run
whatever commands I am instructed. If a developer needs ssh access to
the machine to witness first-hand what is going on, that can be arranged.
This is a serious bug and it is affecting a potentially cool application.
Please help me. :) I'm a PostgreSQL poster child.
BTW: The queries I'm submitting look like this:
-- Find weekly/biweekly recurring events for user
SELECT event.* FROM
event, u_participant WHERE
event.event_id = u_participant.event_id AND
u_participant.username = 'rld' AND
(
event.recur = 'Bi-Weekly' OR
event.recur = 'Weekly' ) AND
event.t_begin < '2001-08-05' AND
event.recur_end >= '2001-07-29'
UNION
-- Find weekly/biweekly recurring events for groups user is in
SELECT event.* FROM
event, g_participant, group_user WHERE
event.event_id = g_participant.event_id AND
group_user.group_id = g_participant.group_id AND
group_user.username = 'rld' AND
(
event.recur = 'Bi-Weekly' OR
event.recur = 'Weekly' ) AND
event.t_begin < '2001-08-05' AND
event.recur_end >= '2001-07-29'
UNION
-- Find monthly recurring events for user
SELECT event.* FROM
event, u_participant WHERE
event.event_id = u_participant.event_id AND
u_participant.username = 'rld' AND
event.recur = 'Monthly' AND
(
(
date_part('month', DATE '2001-07-29') =
date_part('month', DATE '2001-08-05') AND
(
date_part('day', event.t_begin) >=
date_part('day', DATE '2001-07-29') AND
date_part('day', event.t_begin) <
date_part('day', DATE '2001-08-05'))) OR
(
(
date_part('month', DATE '2001-07-29') =
date_part('month', event.t_begin) AND
date_part('day', event.t_begin) >=
date_part('day', DATE '2001-07-29')) OR
(
date_part('month', DATE '2001-08-05') =
date_part('month', event.t_begin) AND
date_part('day', event.t_begin) <
date_part('day', DATE '2001-08-05')))) AND
event.t_begin < '2001-08-05' AND
event.recur_end >= '2001-07-29'
UNION ...
... You get the picture.
-=Fozz
--
-------------------------------------------------------------------------
Doran L. Barton <fozz(at)iodynamics(dot)com> - Chief Super Hero - Iodynamics LLC
< http://www.iodynamics.com/ > - Linux solutions and dynamic websites
"Cars will not have intercourse on this bridge."
-- Seen in a Tokyo traffic handbook
From: | "Doran L(dot) Barton" <fozz(at)iodynamics(dot)com> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 7.1.3 w/ Perl/DBI application hangs |
Date: | 2001-11-13 08:13:48 |
Message-ID: | 20011113011348.A5207@iodynamics.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
I have made an interesting observation that SHOULD have been made before
(yeah, stupid of me not to have made this connection before):
The problem I'm having with the queries "hanging" is apparently happening
as soon as a VACUUM ANALYZE is done on the database. Until a VACUUM ANALYZE
is done, the database is speedy and responsive. As soon as a VACUUM ANALYZE
is performed, the same queries executed before spin indefinitely.
A VACUUM by itself doesn't bring on this behavior- so it must be the
ANALYZE portion.
Again, this is 7.1.3 built by Red Hat for Red Hat 7.2 running on an Athlon
1.0Ghz system. Please let me know what information could be helpful in
tracking down this "bug." Is it a bug?
-=Fozz
--
-------------------------------------------------------------------------
Doran L. Barton <fozz(at)iodynamics(dot)com> - Chief Super Hero - Iodynamics LLC
< http://www.iodynamics.com/ > - Linux solutions and dynamic websites
"We take your bags and send them in all directions."
-- Seen in a Copenhagen airline ticket office
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Doran L(dot) Barton" <fozz(at)iodynamics(dot)com> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 7.1.3 w/ Perl/DBI application hangs |
Date: | 2001-11-13 18:05:29 |
Message-ID: | 2674.1005674729@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
"Doran L. Barton" <fozz(at)iodynamics(dot)com> writes:
> The problem I'm having with the queries "hanging" is apparently happening
> as soon as a VACUUM ANALYZE is done on the database. Until a VACUUM ANALYZE
> is done, the database is speedy and responsive. As soon as a VACUUM ANALYZE
> is performed, the same queries executed before spin indefinitely.
> A VACUUM by itself doesn't bring on this behavior- so it must be the
> ANALYZE portion.
Interesting. VACUUM ANALYZE essentially does the following:
begin transaction;
vacuum table; -- slow
commit transaction;
[ stop here if no ANALYZE requested ]
begin transaction;
delete rows for table in pg_statistic;
scan table to construct new statistics; -- slow
add rows for table to pg_statistic;
commit transaction;
I am wondering if queries started while the ANALYZE scan is in progress
somehow fail to see any statistics available for the table, and end up
choosing stupid query plans. That shouldn't happen, since the deletion
of the old rows is part of a not-yet-committed transaction --- other
transactions should see the old rows as still valid. But it'd be worth
your while to launch a few EXPLAINs to see if the plans for the problem
queries change while the ANALYZE phase is in progress.
regards, tom lane