Re: Issue with Linux+Pentium SMP Context Switching

Lists: pgsql-hackers
From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Issue with Linux+Pentium SMP Context Switching
Date: 2003-12-19 18:30:13
Message-ID: 200312191030.13499.josh@agliodbs.com

Folks,

I brought up this issue a couple of weeks ago on the Performance list. Since
then, I've gotten e-mail confirmation from a few other users seeing this
problem. Here's the shape of the problem; we just don't know what causes it.
I've been trying to do some profiling, but since I only have production
systems to work with it's been really slow -- I have to wait for weekly
downtime for each test. I'm hoping that someone with a greater knowledge
of Linux Kernel internals and a good test machine can help out.

Linux Versions Reported: RH and Gentoo, kernels 2.4.18 to 2.4.22.
    Not tested on other distros/kernels. Kernels are SMP-enabled.

Hardware: Intel Pentium III and 4 dual-processor systems. 5 of the 6
    reported machines are made by Dell; the other is a home-built machine.
    Demonstrated on both hyperthreaded and non-hyperthreaded Xeons;
    cannot be reproduced on Athlons.

Description of the Problem:
When a query is made against a table with millions of rows that requires a
seq scan, large hash join, per-row calculations or other intensive operation,
the system climbs to tens or hundreds of thousands of context switches per
second (contrast with, for example, 5000 cs/second on an AthlonMP). This hurts
performance significantly, possibly up to doubling query execution time.
Initial debug logging of a test on one Xeon system demonstrating this issue
showed a very large number of unattributed semop() calls. We are still
following up on this.
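
For anyone who wants to put a number on this on their own machine, the
system-wide rate can be sampled from the cumulative "ctxt" counter in
/proc/stat, which is the kind of figure the "cs" column of vmstat reports.
The following is only a minimal sketch under that assumption; the function
names parse_ctxt, read_ctxt, and ctxt_rate are made up for illustration and
are not part of PostgreSQL or any tool mentioned in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return the cumulative context-switch count found in /proc/stat-style
 * text, or -1 if the text contains no "ctxt" line. (Hypothetical helper.) */
long long parse_ctxt(const char *stat_text)
{
    assert(stat_text != NULL);
    const char *p = stat_text;
    while (p != NULL && *p != '\0') {
        long long n;
        /* Each pass starts at the beginning of a line. */
        if (strncmp(p, "ctxt ", 5) == 0 && sscanf(p + 5, "%lld", &n) == 1)
            return n;
        p = strchr(p, '\n');   /* advance to the next line, if any */
        if (p != NULL)
            p++;
    }
    return -1;
}

/* Read /proc/stat once and return its ctxt counter, or -1 on failure.
 * Assumes the part of the file up to the ctxt line fits in 64 kB. */
long long read_ctxt(void)
{
    char buf[65536];
    FILE *f = fopen("/proc/stat", "r");
    if (f == NULL)
        return -1;
    size_t len = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[len] = '\0';
    return parse_ctxt(buf);
}

/* Average context switches per second over the given interval,
 * or -1.0 if the interval is zero or the counter cannot be read. */
double ctxt_rate(unsigned seconds)
{
    if (seconds == 0)
        return -1.0;
    long long before = read_ctxt();
    sleep(seconds);            /* may return early if a signal arrives */
    long long after = read_ctxt();
    if (before < 0 || after < 0)
        return -1.0;
    return (double)(after - before) / (double)seconds;
}
```

Calling ctxt_rate(1) in a loop while two or more heavy queries are running
should show whether a machine climbs into the range described above. Note
that the counter is system-wide, so it includes switches caused by every
process, not just the PostgreSQL backends.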

In discussions with Linux kernel hackers online, they blame the way that
PostgreSQL uses shared memory. Whether or not they are correct, the effect
of the issue is to harm PostgreSQL's performance and make us look bad on one
of the major "enterprise" systems of choice: the multi-processor Xeon system.

Ideas, anyone?

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Kurt Roeckx <Q(at)ping(dot)be>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Issue with Linux+Pentium SMP Context Switching
Date: 2003-12-19 19:07:27
Message-ID: 20031219190727.GA26648@ping.be

On Fri, Dec 19, 2003 at 10:30:13AM -0800, Josh Berkus wrote:
>
> Linux Versions Reported: RH and Gentoo reported, Kernels 2.4.18 to 2.4.22
> Not tested on other distros/kernels. Kernels are SMP-enabled.

Does the same problem show with an SMP kernel on a UP system?

> When a query is made against a table with millions of rows that requires a
> seq scan, large hash join, per-row calculations or other intensive operation,
> the system climbs to tens or hundreds of thousands of context switches per
> second (contrast with, for example, 5000 cs/second on an AthlonMP).

This is without any other query running, right? I even find 5000
cs/s rather large if there isn't any other process that wants
some CPU.

> In discussions with Linux kernel hackers online, they blame the way that
> PostgreSQL uses shared memory.

To me this can only make sense if there is another backend
trying to use the same memory, and it needs to be moved from one
CPU to another.

Kurt


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Kurt Roeckx <Q(at)ping(dot)be>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Issue with Linux+Pentium SMP Context Switching
Date: 2003-12-19 19:17:31
Message-ID: 200312191117.31743.josh@agliodbs.com

Kurt,

> This is without any other query running, right? I even find 5000
> cs/s rather large if there isn't any other process that wants
> some CPU.

Sorry! Darn!

Important fact left out of the problem description: The issue happens when
*two or more* intensive queries are running simultaneously.

> To me this can only make sense if there is another backend
> trying to use the same memory, and it needs to be moved from one
> CPU to another.

Yes. See above.

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Manfred Spraul <manfred(at)colorfullife(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Issue with Linux+Pentium SMP Context Switching
Date: 2003-12-19 20:12:02
Message-ID: 3FE35B92.1000804@colorfullife.com

Josh Berkus wrote:

> Initial debug logging of a test on one Xeon system demonstrating this issue
> showed a very large number of unattributed semop() calls. We are still
> following up on this.

Postgres has its own user-space spinlock and semaphore implementation.
Both fall back to semop if there is contention.

Hmm. You wrote that the problem is Xeon-specific, and that AthlonMPs are
unaffected. Perhaps Xeon CPUs do not like the s_lock implementation? It
doesn't follow Intel's recommendations:
- no pause instructions.
- always TAS. The recommended approach is nonatomic tests until the
value is 0, then an atomic TAS.

Attached is a gross hack that adds pause instructions. If this doesn't
magically fix your problem, then we must figure out what causes the
semop calls, and avoid them.
Could you ask your Linux hackers why they blame the shared memory
implementation in postgres? I don't see any link between shared memory
and lock contention.

--
Manfred

Attachment: patch-spinlock-i386 (text/plain, 497 bytes)

From: Kurt Roeckx <Q(at)ping(dot)be>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Issue with Linux+Pentium SMP Context Switching
Date: 2003-12-20 00:23:54
Message-ID: 20031220002354.GA27173@ping.be

On Fri, Dec 19, 2003 at 11:17:31AM -0800, Josh Berkus wrote:
>
> Important fact left out of the problem description: The issue happens when
> *two or more* intensive queries are running simultaneously.

So two queries are enough to get this problem?

I assume the tables are so big that they don't fit in shared
memory and it needs to go read in the data? So that the problem
only shows itself when it needs to replace buffers?

If it doesn't have to go read, do you still have the problem?

Kurt


From: Shridhar Daithankar <shridhar_daithankar(at)myrealbox(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Issue with Linux+Pentium SMP Context Switching
Date: 2003-12-20 06:50:08
Message-ID: 200312201220.08675.shridhar_daithankar@myrealbox.com

On Saturday 20 December 2003 00:00, Josh Berkus wrote:
> In discussions with Linux kernel hackers online, they blame the way that
> PostgreSQL uses shared memory. Whether or not they are correct, the
> effect of the issue is to harm PostgreSQL's performance and make us look
> bad on one of the major "enterprise" systems of choice: the multi-processor
> Xeon system.

Two suggestions..

1. Patch the Linux kernel for an HT-aware scheduler.
2. Try running the Xeons with HT disabled.

See if that helps. I would say using 2.6 on it is recommended anyways.. If
possible of course..

Shridhar


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Shridhar Daithankar <shridhar_daithankar(at)myrealbox(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Issue with Linux+Pentium SMP Context Switching
Date: 2003-12-20 18:19:33
Message-ID: 3FE492B5.40304@commandprompt.com


> Two suggestions..
>
> 1. Patch the Linux kernel for an HT-aware scheduler.
> 2. Try running the Xeons with HT disabled.
>
> See if that helps. I would say using 2.6 on it is recommended anyways.. If
> possible of course..

I would avoid 2.6 on a production machine. 2.6 breaks a lot of things
(not as in a bad thing, but as in not really being compatible with them).
Wait until distribution vendors are shipping production releases with it.

Sincerely,

Joshua Drake

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-222-2783 - jd(at)commandprompt(dot)com - http://www.commandprompt.com
Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org