BUG #14101: Postgres Service Crashes With Memory Error And Does Not Recover

Lists: pgsql-bugs
From: nathanmascitelli(at)geotab(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #14101: Postgres Service Crashes With Memory Error And Does Not Recover
Date: 2016-04-19 12:30:34
Message-ID: 20160419123034.22924.9159@wrigleys.postgresql.org

The following bug has been logged on the website:

Bug reference: 14101
Logged by: Nathan Mascitelli
Email address: nathanmascitelli(at)geotab(dot)com
PostgreSQL version: 9.4.5
Operating system: Windows Server 2012 R2
Description:

Hello,

I am running Postgres 9.4 on Windows Server 2012. I have had postgres crash
a few times with the following error:

FATAL: could not reattach to shared memory (key=00000000000000D0,
addr=00000001405E0000): error code 1455

Looking around
(http://stackoverflow.com/questions/24614314/postgresql-8-3-7-fatal-could-not-reattach-to-shared-memory-and-warning-wor)
it looks like this error was supposed to be fixed back in 8.3.

Around the time of the crash the server appears to have had free RAM available.

My server specs:

OS: Windows Server 2012 x64
CPU: Intel Xeon CPU 2.30GHz (x2)
RAM: 52GB
Postgres: 9.4.5

Some settings from postgres:

shared_buffers = 512MB
effective_cache_size = 39GB
work_mem = 90MB
maintenance_work_mem = 2000MB

Can anyone suggest what might be causing the crash? If you need more info
please let me know.

Thanks,

Nathan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: nathanmascitelli(at)geotab(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #14101: Postgres Service Crashes With Memory Error And Does Not Recover
Date: 2016-04-20 03:57:00
Message-ID: 25458.1461124620@sss.pgh.pa.us
Lists: pgsql-bugs

nathanmascitelli(at)geotab(dot)com writes:
> I am running Postgres 9.4 on Windows Server 2012. I have had postgres crash
> a few times with the following error:
> FATAL: could not reattach to shared memory (key=00000000000000D0,
> addr=00000001405E0000): error code 1455

Hm, that's an odd one. According to
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681385(v=vs.85).aspx
that is

ERROR_COMMITMENT_LIMIT
1455 (0x5AF)
The paging file is too small for this operation to complete.

which suggests that this is a resource-exhaustion type of problem.
How many backend processes are you trying to run concurrently,
and what else is running on the machine?

regards, tom lane
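
For readers triaging the same log line, here is a minimal Python sketch of pulling the Win32 error code out of the FATAL message and naming it. The helper `decode_reattach_failure` and the lookup table are hypothetical, not part of PostgreSQL. Only code 1455 comes from this thread; codes 8 and 487 are added for illustration.

```python
import re

# Win32 error codes relevant to shared-memory failures. 1455 is the one
# quoted in this thread; 8 and 487 are added here for illustration only.
WIN32_ERRORS = {
    8: "ERROR_NOT_ENOUGH_MEMORY",
    487: "ERROR_INVALID_ADDRESS",
    1455: "ERROR_COMMITMENT_LIMIT",
}

# The log line may be wrapped after the comma, so allow any whitespace there.
REATTACH_RE = re.compile(
    r"could not reattach to shared memory \(key=([0-9A-Fa-f]+),\s*"
    r"addr=([0-9A-Fa-f]+)\): error code (\d+)"
)


def decode_reattach_failure(log_text):
    """Return key, address, code and symbolic name, or None if no match."""
    match = REATTACH_RE.search(log_text)
    if match is None:
        return None
    code = int(match.group(3))
    return {
        "key": match.group(1),
        "addr": match.group(2),
        "code": code,
        "name": WIN32_ERRORS.get(code, "UNKNOWN"),
    }


log_line = (
    "FATAL: could not reattach to shared memory "
    "(key=00000000000000D0,\naddr=00000001405E0000): error code 1455"
)
result = decode_reattach_failure(log_line)
print(result["code"], result["name"])
```

A name of ERROR_COMMITMENT_LIMIT points at the system commit limit (physical RAM plus page files) rather than at free physical RAM, which is why free RAM at the time of the crash does not rule it out.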


From: Nathan Mascitelli <nathanmascitelli(at)geotab(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #14101: Postgres Service Crashes With Memory Error And Does Not Recover
Date: 2016-04-20 12:03:44
Message-ID: CAJddYyii+-5fDO2QKP3n30MgtOREq8jOgMb25vKabDGRSDdKcQ@mail.gmail.com
Lists: pgsql-bugs

Hi Tom,

The machine has the postgres server and our application server on it. There
are around 300 databases and typically ~1000 connections. At the time of
the crash there was free RAM and disk space on the server.

Is there any other information I can give you?

Thanks,

*Nathan Mascitelli*
Geotab Inc.
Software Developer | B. Eng Engineering Physics

[Direct] *+1 (289) 681-1005*
[Toll-Free] *+1 (877) 436-8221*
[Visit] www.geotab.com
Twitter <http://twitter.com/geotab> | Facebook
<http://www.facebook.com/geotab> | YouTube <http://www.youtube.com/mygeotab>
| LinkedIn
<http://www.linkedin.com/company/102661?trk=tyah&trkInfo=tarId:1403199250031,tas:geotab,idx:2-1-3>



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Nathan Mascitelli <nathanmascitelli(at)geotab(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #14101: Postgres Service Crashes With Memory Error And Does Not Recover
Date: 2016-04-20 14:09:20
Message-ID: 21783.1461161360@sss.pgh.pa.us
Lists: pgsql-bugs

Nathan Mascitelli <nathanmascitelli(at)geotab(dot)com> writes:
> The machine has the postgres server and our application server on it. There
> are around 300 databases and typically ~1000 connections. At the time of
> the crash there was free RAM and disk space on the server.

You're a braver man than I, to trust Windows with a 1000-connection
server. But anyway, if the backend count is that high it's far from
surprising that you hit some Windows resource limit or other. It's
widely considered best practice to use a connection pooler to limit
the number of backends to something a lot less than that, regardless
of platform.

regards, tom lane
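
To see why a high backend count can run into a commit limit even with free RAM, here is a rough worst-case estimate in Python using the settings posted at the top of the thread. It is only an upper-bound illustration under stated assumptions: work_mem is a per-operation ceiling, not an up-front allocation, so real usage is normally far lower, and a single complex query can also use several multiples of it. The function name and the per-backend overhead parameter are hypothetical.

```python
def worst_case_commit_mb(connections, work_mem_mb, shared_buffers_mb,
                         per_backend_overhead_mb=0):
    """Crude upper bound: every backend uses one full work_mem at once.

    Assumption: shared_buffers is counted once because it is shared,
    and per_backend_overhead_mb is a guess supplied by the caller.
    """
    per_backend = work_mem_mb + per_backend_overhead_mb
    return shared_buffers_mb + connections * per_backend


# Values taken from the original report.
connections = 1000
work_mem_mb = 90
shared_buffers_mb = 512
ram_mb = 52 * 1024

total_mb = worst_case_commit_mb(connections, work_mem_mb, shared_buffers_mb)
print(f"worst case: {total_mb} MB, physical RAM: {ram_mb} MB")
print("exceeds RAM" if total_mb > ram_mb else "fits in RAM")
```

Whether that bound matters in practice depends on the page file size, which the report does not give, and on what the application server on the same machine commits.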


From: John R Pierce <pierce(at)hogranch(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Cc: Nathan Mascitelli <nathanmascitelli(at)geotab(dot)com>
Subject: Re: BUG #14101: Postgres Service Crashes With Memory Error And Does Not Recover
Date: 2016-04-20 14:31:28
Message-ID: 571792C0.3060503@hogranch.com
Lists: pgsql-bugs

On 4/20/2016 7:09 AM, Tom Lane wrote:
> Nathan Mascitelli <nathanmascitelli(at)geotab(dot)com> writes:
>> The machine has the postgres server and our application server on it. There
>> are around 300 databases and typically ~1000 connections. At the time of
>> the crash there was free RAM and disk space on the server.
> You're a braver man than I, to trust Windows with a 1000-connection
> server. But anyway, if the backend count is that high it's far from
> surprising that you hit some Windows resource limit or other. It's
> widely considered best practice to use a connection pooler to limit
> the number of backends to something a lot less than that, regardless
> of platform.

with 300 databases in use, a pooler would not be much help. 1000
connections is an average of 3 connections per database.

--
john r pierce, recycling bits in santa cruz


From: Nathan Mascitelli <nathanmascitelli(at)geotab(dot)com>
To: John R Pierce <pierce(at)hogranch(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #14101: Postgres Service Crashes With Memory Error And Does Not Recover
Date: 2016-04-20 15:17:23
Message-ID: CAJddYyguxXFLj-w8xs1gn1WKm-htBYWU3FYdU_A-K1f0TypCow@mail.gmail.com
Lists: pgsql-bugs

We are using a connection pooler. On average we see 2-5 connections per
database. But it sounds like we would need to either collect connections
more aggressively or lower the number of databases per server, correct?
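
The trade-off between connections per database and databases per server can be put in numbers with a small Python sketch. The figures of 300 databases and 2-5 connections each come from this thread; the target of 200 backends and the function names are purely hypothetical.

```python
def total_backends(databases, connections_per_database):
    """Backends on the server if every database holds this many connections."""
    return databases * connections_per_database


def max_databases(backend_target, connections_per_database):
    """Largest number of databases that stays within a backend target."""
    return backend_target // connections_per_database


databases = 300
low = total_backends(databases, 2)
high = total_backends(databases, 5)
print(f"{databases} databases at 2-5 connections each: {low} to {high} backends")

# Hypothetical target: keep the server under 200 backends.
print("databases that fit at 2 connections each:", max_databases(200, 2))
```

Because pooled connections are tied to a specific database, the number of databases sets a floor on the backend count that a pooler alone cannot remove.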




From: Nathan Mascitelli <nathanmascitelli(at)geotab(dot)com>
To: John R Pierce <pierce(at)hogranch(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #14101: Postgres Service Crashes With Memory Error And Does Not Recover
Date: 2016-04-20 16:37:55
Message-ID: CAJddYyjFC3FNP5CsSg8FpH66CC6Dej+46_AZ1WfGmJNueY9QFQ@mail.gmail.com
Lists: pgsql-bugs

That's a good idea, I'll have to play around and see what we can do.

In your opinion would a Linux server be able to handle this setup? Would
1000 connections/processes be a problem on Linux?

Thanks,


On Wed, Apr 20, 2016 at 11:21 AM, John R Pierce <pierce(at)hogranch(dot)com> wrote:

> On 4/20/2016 8:17 AM, Nathan Mascitelli wrote:
>
>> We are using a connection pooler. On average we see 2-5 connections per
>> database. But it sounds like we would need to either collect connections
>> more aggressively or lower the number of databases/server correct?
>>
>
>
> depending on the use case for these 300 different databases, perhaps they
> could be consolidated into 'schemas' within a smaller number of databases.
>
>
> --
> john r pierce, recycling bits in santa cruz
>
>


From: David Gould <daveg(at)sonic(dot)net>
To: Nathan Mascitelli <nathanmascitelli(at)geotab(dot)com>
Cc: John R Pierce <pierce(at)hogranch(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #14101: Postgres Service Crashes With Memory Error And Does Not Recover
Date: 2016-04-22 05:13:08
Message-ID: 20160421221308.258c4fb0@engels
Lists: pgsql-bugs

On Wed, 20 Apr 2016 12:37:55 -0400
Nathan Mascitelli <nathanmascitelli(at)geotab(dot)com> wrote:

> That's a good idea, I'll have to play around and see what we can do.
>
> In your opinion would a Linux server be able to handle this setup? Would
> 1000 connections/processes be a problem on Linux?

As long as the machine has the resources, this is not a problem. I have
clients with 1800 to 2000 connections. It will help a bit if you configure
huge pages and you will possibly need to configure more semaphores depending
on what Linux version you are on. You can contact me for details if you
actually do this.

-dg

--
David Gould daveg(at)sonic(dot)net
If simplicity worked, the world would be overrun with insects.
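
As a starting point for the huge pages suggestion, here is a minimal Python sketch of estimating how many huge pages the shared memory segment would need. It assumes the 2 MB huge page size common on x86-64 Linux and uses shared_buffers from the original report as a lower bound. The real segment is larger than shared_buffers, so the result is a floor rather than a recommended value for vm.nr_hugepages.

```python
import math


def huge_pages_needed(shared_memory_mb, huge_page_mb=2):
    """Round the shared memory size up to a whole number of huge pages.

    Assumption: huge_page_mb defaults to 2, the usual x86-64 size;
    check Hugepagesize in /proc/meminfo on the actual machine.
    """
    return math.ceil(shared_memory_mb / huge_page_mb)


shared_buffers_mb = 512  # from the original report
pages = huge_pages_needed(shared_buffers_mb)
print(f"at least {pages} huge pages of 2 MB for {shared_buffers_mb} MB")
```

The semaphore limits mentioned above scale with max_connections as well, so they are worth rechecking whenever the connection ceiling changes.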