Re: What does this tell me?

Lists: pgsql-performance
From: "Josh Berkus" <josh(at)agliodbs(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: What does this tell me?
Date: 2002-10-09 03:46:12
Message-ID: web-1776486@davinci.ethosmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Folks,

I'm still having trouble with my massive data transformation procedures
taking forever to finish. Particularly, many of them will get about
1/2 way through, and then I will start seeing this in the log:

2002-10-08 20:37:13 DEBUG: recycled transaction log file
0000000A000000E4
2002-10-08 20:37:13 DEBUG: recycled transaction log file
0000000A000000E5
2002-10-08 20:37:13 DEBUG: recycled transaction log file
0000000A000000E6
2002-10-08 20:37:13 DEBUG: recycled transaction log file
0000000A000000E7
2002-10-08 20:37:13 DEBUG: recycled transaction log file
0000000A000000E8
2002-10-08 20:37:13 DEBUG: recycled transaction log file
0000000A000000E9
2002-10-08 20:37:13 DEBUG: recycled transaction log file
0000000A000000EA
2002-10-08 20:37:13 DEBUG: recycled transaction log file
0000000A000000EB
2002-10-08 20:37:13 DEBUG: recycled transaction log file
0000000A000000EC
2002-10-08 20:37:13 DEBUG: recycled transaction log file
0000000A000000ED
2002-10-08 20:37:13 DEBUG: recycled transaction log file
0000000A000000EE
2002-10-08 20:37:13 DEBUG: recycled transaction log file
0000000A000000EF
2002-10-08 20:37:13 DEBUG: reaping dead processes
2002-10-08 20:37:13 DEBUG: child process (pid 15270) exited with exit
code 0

... repeat ad nauseum. The problem is, each "recycle transaction log
... reaping dead child process" cycle takes about 4-7 minutes ...
meaning that the procedure can take up to 1/2 hour to finish, and
sometimes not finish at all.

Obviously, the system is telling me that it is running out of resources
somehow. But I'm at my wit's end to figure out what resources,
exactly. Suggestions?

-Josh Berkus


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: What does this tell me?
Date: 2002-10-09 03:49:36
Message-ID: 200210090349.g993nab27932@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


I think all it means is that is doesn't need some of the pg_clog files
and is reusing them, basically meaning you are pushing through lots of
transactions. I don't see it as a problem.

---------------------------------------------------------------------------

Josh Berkus wrote:
> Folks,
>
> I'm still having trouble with my massive data transformation procedures
> taking forever to finish. Particularly, many of them will get about
> 1/2 way through, and then I will start seeing this in the log:
>
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000E4
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000E5
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000E6
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000E7
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000E8
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000E9
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000EA
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000EB
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000EC
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000ED
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000EE
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000EF
> 2002-10-08 20:37:13 DEBUG: reaping dead processes
> 2002-10-08 20:37:13 DEBUG: child process (pid 15270) exited with exit
> code 0
>
> ... repeat ad nauseum. The problem is, each "recycle transaction log
> ... reaping dead child process" cycle takes about 4-7 minutes ...
> meaning that the procedure can take up to 1/2 hour to finish, and
> sometimes not finish at all.
>
> Obviously, the system is telling me that it is running out of resources
> somehow. But I'm at my wit's end to figure out what resources,
> exactly. Suggestions?
>
> -Josh Berkus
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Sean Chittenden <sean(at)chittenden(dot)org>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: What does this tell me?
Date: 2002-10-09 03:50:43
Message-ID: 20021009035043.GC34365@perrin.int.nxad.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000E4
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000E5
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000E6
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000E7
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000E8
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000E9
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000EA
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000EB
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000EC
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000ED
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000EE
> 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> 0000000A000000EF
> 2002-10-08 20:37:13 DEBUG: reaping dead processes
> 2002-10-08 20:37:13 DEBUG: child process (pid 15270) exited with exit
> code 0
>
> ... repeat ad nauseum. The problem is, each "recycle transaction
> log ... reaping dead child process" cycle takes about 4-7 minutes
> ... meaning that the procedure can take up to 1/2 hour to finish,
> and sometimes not finish at all.
>
> Obviously, the system is telling me that it is running out of
> resources somehow. But I'm at my wit's end to figure out what
> resources, exactly. Suggestions?

You're running out of WAL log space, iirc. Increase the number of WAL
logs available and you should be okay. If you're experiencing this
halfway through, I'd increase the size by 50%, say maybe 60-70% for
good measure. -sc

--
Sean Chittenden


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Sean Chittenden <sean(at)chittenden(dot)org>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: What does this tell me?
Date: 2002-10-09 03:55:10
Message-ID: 200210090355.g993tAU02565@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Sean Chittenden wrote:
> > 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> > 0000000A000000E4
> > 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> > 0000000A000000E5
> > 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> > 0000000A000000E6
> > 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> > 0000000A000000E7
> > 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> > 0000000A000000E8
> > 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> > 0000000A000000E9
> > 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> > 0000000A000000EA
> > 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> > 0000000A000000EB
> > 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> > 0000000A000000EC
> > 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> > 0000000A000000ED
> > 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> > 0000000A000000EE
> > 2002-10-08 20:37:13 DEBUG: recycled transaction log file
> > 0000000A000000EF
> > 2002-10-08 20:37:13 DEBUG: reaping dead processes
> > 2002-10-08 20:37:13 DEBUG: child process (pid 15270) exited with exit
> > code 0
> >
> > ... repeat ad nauseum. The problem is, each "recycle transaction
> > log ... reaping dead child process" cycle takes about 4-7 minutes
> > ... meaning that the procedure can take up to 1/2 hour to finish,
> > and sometimes not finish at all.
> >
> > Obviously, the system is telling me that it is running out of
> > resources somehow. But I'm at my wit's end to figure out what
> > resources, exactly. Suggestions?
>
> You're running out of WAL log space, iirc. Increase the number of WAL
> logs available and you should be okay. If you're experiencing this
> halfway through, I'd increase the size by 50%, say maybe 60-70% for
> good measure. -sc

Oh, yes, you are right. My hardware tuning guide mentions it. Strange
it is called the transaction log file:

http://www.ca.postgresql.org/docs/momjian/hw_performance/

Unless you are seeing this more freqently than every minute, it should
be fine.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: "Josh Berkus" <josh(at)agliodbs(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Sean Chittenden <sean(at)chittenden(dot)org>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: What does this tell me?
Date: 2002-10-09 04:01:58
Message-ID: web-1776515@davinci.ethosmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Bruce, Sean,

> Oh, yes, you are right. My hardware tuning guide mentions it.
> Strange
> it is called the transaction log file:
>
> http://www.ca.postgresql.org/docs/momjian/hw_performance/
>
> Unless you are seeing this more freqently than every minute, it
> should
> be fine.

Actually, it's apparently a real problem, because the function never
completes. Each cycle of "recycling transaction logs" takes longer
and longer, and eventually locks up completely.

What the function is doing is a succession of data cleanup procedures,
updating the same table about 50 times. I will be very thankful for
the day when I can commit within a procedure.

Unfortunately, I am already at the maximum number of WAL files (64).
What do I do now?

-Josh Berkus


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Sean Chittenden <sean(at)chittenden(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: What does this tell me?
Date: 2002-10-09 04:07:24
Message-ID: 200210090407.g9947O904553@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Josh Berkus wrote:
> Bruce, Sean,
>
> > Oh, yes, you are right. My hardware tuning guide mentions it.
> > Strange
> > it is called the transaction log file:
> >
> > http://www.ca.postgresql.org/docs/momjian/hw_performance/
> >
> > Unless you are seeing this more freqently than every minute, it
> > should
> > be fine.
>
> Actually, it's apparently a real problem, because the function never
> completes. Each cycle of "recycling transaction logs" takes longer
> and longer, and eventually locks up completely.
>
> What the function is doing is a succession of data cleanup procedures,
> updating the same table about 50 times. I will be very thankful for
> the day when I can commit within a procedure.
>
> Unfortunately, I am already at the maximum number of WAL files (64).
> What do I do now?

Wow, that is interesting. I thought one big transaction wouldn't lock
up the WAL records. I figured there would be a CHECKPOINT, and then the
WAL records could be recycled, even though the transaction is still
open.

Where do you see 64 as the maximum number of WAL segments. What is your
checkpoint_segments value? The actual number of files shouldn't be much
more than twice that value. What PostgreSQL version are you using?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Sean Chittenden <sean(at)chittenden(dot)org>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: What does this tell me?
Date: 2002-10-09 04:17:50
Message-ID: 20021009041750.GB56492@perrin.int.nxad.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> > Oh, yes, you are right. My hardware tuning guide mentions it.
> > Strange it is called the transaction log file:
> >
> > http://www.ca.postgresql.org/docs/momjian/hw_performance/
> >
> > Unless you are seeing this more freqently than every minute, it
> > should
> > be fine.
>
> Actually, it's apparently a real problem, because the function never
> completes. Each cycle of "recycling transaction logs" takes longer
> and longer, and eventually locks up completely.
>
> What the function is doing is a succession of data cleanup
> procedures, updating the same table about 50 times. I will be very
> thankful for the day when I can commit within a procedure.
>
> Unfortunately, I am already at the maximum number of WAL files (64).
> What do I do now?

Isn't it possible to increase the size of your wal logs? I seem to
remember a tunable existing, but I can't find it in the default
config. Someone else know how off the top of their head? -sc

--
Sean Chittenden


From: Joe Conway <mail(at)joeconway(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Sean Chittenden <sean(at)chittenden(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: What does this tell me?
Date: 2002-10-09 04:55:21
Message-ID: 3DA3B6B9.4010207@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Josh Berkus wrote:
> What the function is doing is a succession of data cleanup procedures,
> updating the same table about 50 times. I will be very thankful for
> the day when I can commit within a procedure.

If that's the case, can you split the work up into multiple functions, and
execute them all from a shell script? Or perhaps even offload some of the data
massaging to perl or something? (It would be easier to recommend alternate
approaches with more details.)

Joe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Josh Berkus" <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Sean Chittenden <sean(at)chittenden(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: What does this tell me?
Date: 2002-10-09 05:22:26
Message-ID: 17558.1034140946@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

"Josh Berkus" <josh(at)agliodbs(dot)com> writes:
> Actually, it's apparently a real problem, because the function never
> completes. Each cycle of "recycling transaction logs" takes longer
> and longer, and eventually locks up completely.
> What the function is doing is a succession of data cleanup procedures,
> updating the same table about 50 times. I will be very thankful for
> the day when I can commit within a procedure.

I think you are barking up the wrong tree.

The messages you show are perfectly normal operation, and prove nothing
much except that you pumped a lot of database updates through the
system. I think there's something wrong with your data transformation
application logic; or perhaps you are pumping so many updates through
your tables that you need some intermediate VACUUMs to get rid of
dead tuples. But messing with the WAL log parameters isn't going to
do a darn thing for you ... IMHO anyway.

regards, tom lane


From: "Josh Berkus" <josh(at)agliodbs(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Sean Chittenden <sean(at)chittenden(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: What does this tell me?
Date: 2002-10-09 05:22:37
Message-ID: web-1776618@davinci.ethosmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Joe,

> If that's the case, can you split the work up into multiple
> functions, and execute them all from a shell script? Or perhaps even
> offload some of the data massaging to perl or something? (It would be
> easier to recommend alternate approaches with more details.)

I've already split it up into 11 functions, which are being managed
through Perl with ANALYZE statements between. Breaking it down
further would be really unmanageable.

Not to be mean or anything (after all, I just joined pgsql-advocacy),
I'm getting *much* worse performance on large data transformations from
PostgreSQL 7.2.1, than I get from SQL Server 7.0 on inferior hardware
(at least, except where SQL Server 7.0 crashes). I really am determined
to prove that it's because I've misconfigured it, and I thank all of
you for your help in doing so.

PGBench Results:
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 100
number of transactions per client: 10
number of transactions actually processed: 1000/1000
tps = 93.206356(including connections establishing)
tps = 103.237007(excluding connections establishing)

Of course, I don't have much to compare these to, so I don't know if
that's good or bad.

-Josh Berkus


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Sean Chittenden <sean(at)chittenden(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: What does this tell me?
Date: 2002-10-09 13:57:18
Message-ID: 1034171838.11703.11.camel@camel
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, 2002-10-09 at 01:22, Josh Berkus wrote:
> Joe,
>
> > If that's the case, can you split the work up into multiple
> > functions, and execute them all from a shell script? Or perhaps even
> > offload some of the data massaging to perl or something? (It would be
> > easier to recommend alternate approaches with more details.)
>
> I've already split it up into 11 functions, which are being managed
> through Perl with ANALYZE statements between. Breaking it down
> further would be really unmanageable.
>

If I read Tom's suggestion correctly, you should probably change these
to vacuum analyze instead of analyze.

> Not to be mean or anything (after all, I just joined pgsql-advocacy),
> I'm getting *much* worse performance on large data transformations from
> PostgreSQL 7.2.1, than I get from SQL Server 7.0 on inferior hardware
> (at least, except where SQL Server 7.0 crashes).

what?? that's blasphamy!! revoke this mans advocacy membership right
now!! ;-)

I really am determined
> to prove that it's because I've misconfigured it, and I thank all of
> you for your help in doing so.
>

FWIW I just ran into a similar situation where I was doing 6
simultaneous pg_restores of our production database on my local
workstation. Apparently this pumps a lot of data through the wal logs.
I did kick up the number of wal files, but I also ended up kicking up
the number of wal_buffers as well and that seemed to help.

Robert Treat