Re: 9.6 and fsync=off

Lists: pgsql-hackers
From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: 9.6 and fsync=off
Date: 2016-04-27 09:58:08
Message-ID: CAMsr+YGechDvrFj4xwoBWhJ7rW25Avqo+UqcyqbnS=LTFdGJ4g@mail.gmail.com
Lists: pgsql-hackers

Hi all

After helping clean up the mess from another user who turned fsync=off
because they read (bad) tuning advice that it was faster, I'd really like
to change the config file comment.

Really.

#fsync = on # turns forced synchronization on or off

Now, we can't rename fsync to disable_crash_safety=on or
corrupt_my_database=on. But the comment needs changing.

How about:

#fsync = on # force disk flushes required for crash safety

or, preferably something like:

"Enable forced disk flushes when they are required for crash safety.
Disabling fsync can lead to unrecoverable database corruption in a crash of
the host system."

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.6 and fsync=off
Date: 2016-04-27 10:43:07
Message-ID: 20160427104307.GA4861@toroid.org
Lists: pgsql-hackers

At 2016-04-27 17:58:08 +0800, craig(at)2ndquadrant(dot)com wrote:
>
> #fsync = on # turns forced synchronization on or off

I suggest: # provide crash safety by flushing disk writes
           # (Disabling this can lead to unrecoverable data
           # loss if the system crashes.)

-- Abhijit


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.6 and fsync=off
Date: 2016-04-27 10:53:04
Message-ID: CABUevExGL-FWNP0pyCH_9nY==i1ogP-v8XNXpQyDaEp7u4ypCA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 27, 2016 at 12:43 PM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
wrote:

> At 2016-04-27 17:58:08 +0800, craig(at)2ndquadrant(dot)com wrote:
> >
> > #fsync = on # turns forced synchronization on or off
>
> I suggest: # provide crash safety by flushing disk writes
>            # (Disabling this can lead to unrecoverable data
>            # loss if the system crashes.)
>

+1 for the change. I suggest shortening it to just "disabling this can lead
to unrecoverable data corruption" (I think corruption is better than loss,
mainly because too many people equate loss with "I may lose my last 10
updates, and I'm fine with that").

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.6 and fsync=off
Date: 2016-04-27 10:56:39
Message-ID: 57209AE7.9020906@2ndquadrant.com
Lists: pgsql-hackers

On 27/04/16 12:53, Magnus Hagander wrote:
>
>
> On Wed, Apr 27, 2016 at 12:43 PM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com> wrote:
>
> At 2016-04-27 17:58:08 +0800, craig(at)2ndquadrant(dot)com wrote:
> >
> > #fsync = on # turns forced synchronization on or off
>
> I suggest: # provide crash safety by flushing disk writes
>            # (Disabling this can lead to unrecoverable data
>            # loss if the system crashes.)
>
>
> +1 for the change. I suggest shortening it to just "disabling this can
> lead to unrecoverable data corruption" (I think corruption is better
> than loss, mainly because too many people equate loss with "I may lose
> my last 10 updates, and I'm fine with that").
>

+1 (Abhijit's wording with data loss changed to data corruption)

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>
Subject: Re: 9.6 and fsync=off
Date: 2016-04-27 10:57:57
Message-ID: 20160427105756.GB4861@toroid.org
Lists: pgsql-hackers

Here's a patch just to help things along.

-- Abhijit

Attachment Content-Type Size
fsync.diff text/x-diff 714 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.6 and fsync=off
Date: 2016-04-27 13:44:26
Message-ID: 24748.1461764666@sss.pgh.pa.us
Lists: pgsql-hackers

Petr Jelinek <petr(at)2ndquadrant(dot)com> writes:
> +1 (Abhijit's wording with data loss changed to data corruption)

I'd suggest something like

#fsync = on # flush data to disk for crash safety
            # (turning this off can cause
            # unrecoverable data corruption!)

regards, tom lane


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.6 and fsync=off
Date: 2016-04-27 15:04:37
Message-ID: CAMsr+YEy9zj2Nb_n8OFMAf2HpX-RVpb12sWYD2uakPy0Um9kag@mail.gmail.com
Lists: pgsql-hackers

On 27 April 2016 at 21:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Petr Jelinek <petr(at)2ndquadrant(dot)com> writes:
> > +1 (Abhijit's wording with data loss changed to data corruption)
>
> I'd suggest something like
>
> #fsync = on # flush data to disk for crash safety
>             # (turning this off can cause
>             # unrecoverable data corruption!)
>
>
Looks good.

The docs on fsync are already good, it's just a matter of making people
think twice and actually look at them.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.6 and fsync=off
Date: 2016-04-27 17:48:27
Message-ID: CA+TgmoYXP_jePW_KR+xwzMWqcqdT44FxL-tgBz9+zkZbwtWjZw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 27, 2016 at 11:04 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>> I'd suggest something like
>>
>> #fsync = on # flush data to disk for crash safety
>>             # (turning this off can cause
>>             # unrecoverable data corruption!)
>>
>
> Looks good.
>
> The docs on fsync are already good, it's just a matter of making people
> think twice and actually look at them.

Committed that way. Thanks for suggesting this, Craig.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Stark <stark(at)mit(dot)edu>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.6 and fsync=off
Date: 2016-04-28 18:53:50
Message-ID: CAM-w4HOJmMzR04OOHh6mk-rPfcbgGyVKfmgPWXsOJnmR_0sC7Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 27, 2016 at 10:58 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> Now, we can't rename fsync to disable_crash_safety=on or
> corrupt_my_database=on. But the comment needs changing.

Fwiw we've done similar things in the past. We can provide
backwards-compatibility support for "fsync" but make the setting
appear as "crash_safety" or whatever in pg_settings and in the default
postgresql.conf. The only downside is that tools or scripts that
retrieve all the settings might break or miss that setting.
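A minimal C sketch of the aliasing described here: a deprecated name is mapped to a canonical one before lookup, so existing configuration files keep working while only the new name is shown. `crash_safety` is just the example name from this message, and the table and function are hypothetical, not PostgreSQL's actual GUC code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical alias table: deprecated name -> canonical name.
 * "crash_safety" is only the example name floated in this thread;
 * no such setting exists. */
static const char *const old_guc_names[][2] = {
    {"fsync", "crash_safety"},
};

/* Map a user-supplied setting name to its canonical name.
 * Setting names are matched case-insensitively. */
static const char *canonical_guc_name(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(old_guc_names) / sizeof(old_guc_names[0]); i++)
    {
        if (strcasecmp(name, old_guc_names[i][0]) == 0)
            return old_guc_names[i][1];   /* old name: redirect */
    }
    return name;                          /* already canonical */
}
```

Anything that lists settings back (a consumer of SHOW ALL, say) would then only ever see the canonical name, which is where the script breakage mentioned above would come from.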

--
greg


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.6 and fsync=off
Date: 2016-04-28 19:32:37
Message-ID: CANP8+jLAph=Gk1hQy_+o8CvgxuHahZpgh7kqSy2LMq1+OsJS1A@mail.gmail.com
Lists: pgsql-hackers

On 27 April 2016 at 17:04, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> On 27 April 2016 at 21:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Petr Jelinek <petr(at)2ndquadrant(dot)com> writes:
>> > +1 (Abhijit's wording with data loss changed to data corruption)
>>
>> I'd suggest something like
>>
>> #fsync = on # flush data to disk for crash safety
>>             # (turning this off can cause
>>             # unrecoverable data corruption!)
>>
>>
> Looks good.
>
> The docs on fsync are already good, it's just a matter of making people
> think twice and actually look at them.
>

If fsync=off and you turn it on, does it fsync anything at that point?

Or does it mean only that future fsyncs will occur?
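A minimal model of the question, not PostgreSQL's actual code: assume the setting is consulted only at the moment each file would be flushed. `enable_fsync` and `flush_file()` are hypothetical names. In this model, switching the setting back on affects only flushes issued afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for the fsync setting; it may be flipped at
 * runtime (e.g. on a configuration reload). */
static bool enable_fsync = true;

/* Number of real fsync() calls issued, to make the effect visible. */
static int fsync_calls = 0;

/* Flush one file, but only if the setting is on at the moment of the
 * call. With it off, the data is left in the kernel's page cache and
 * the function still reports success. */
static int flush_file(int fd)
{
    assert(fd >= 0);
    if (!enable_fsync)
        return 0;
    fsync_calls++;
    return fsync(fd);
}
```

Nothing here remembers which flushes were skipped, so writes made while the setting was off stay at the mercy of the kernel's own writeback until something flushes them explicitly.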

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.6 and fsync=off
Date: 2016-04-28 20:30:37
Message-ID: CAKFQuwbDxCxiTGa6wqA1Mx5Y44c6pBg3t6jmRqDg-6J1beG8YQ@mail.gmail.com
Lists: pgsql-hackers

On Thursday, April 28, 2016, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On 27 April 2016 at 17:04, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>
>> On 27 April 2016 at 21:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>>> Petr Jelinek <petr(at)2ndquadrant(dot)com> writes:
>>> > +1 (Abhijit's wording with data loss changed to data corruption)
>>>
>>> I'd suggest something like
>>>
>>> #fsync = on # flush data to disk for crash safety
>>>             # (turning this off can cause
>>>             # unrecoverable data corruption!)
>>>
>>>
>> Looks good.
>>
>> The docs on fsync are already good, it's just a matter of making people
>> think twice and actually look at them.
>>
>
> If fsync=off and you turn it on, does it fsync anything at that point?
>
> Or does it mean only that future fsyncs will occur?
>
>
http://www.postgresql.org/docs/current/static/runtime-config-wal.html

4th paragraph in the fsync section.

David J.


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.6 and fsync=off
Date: 2016-04-28 20:38:31
Message-ID: CANP8+jKtR+MCL3YAvT5b_aP3g-E5Mq_Bq5G4bh1oMcijnrRVcw@mail.gmail.com
Lists: pgsql-hackers

On 28 April 2016 at 22:30, David G. Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
wrote:

> On Thursday, April 28, 2016, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
>> On 27 April 2016 at 17:04, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>>
>>> On 27 April 2016 at 21:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>
>>>> Petr Jelinek <petr(at)2ndquadrant(dot)com> writes:
>>>> > +1 (Abhijit's wording with data loss changed to data corruption)
>>>>
>>>> I'd suggest something like
>>>>
>>>> #fsync = on # flush data to disk for crash safety
>>>>             # (turning this off can cause
>>>>             # unrecoverable data corruption!)
>>>>
>>>>
>>> Looks good.
>>>
>>> The docs on fsync are already good, it's just a matter of making people
>>> think twice and actually look at them.
>>>
>>
>> If fsync=off and you turn it on, does it fsync anything at that point?
>>
>> Or does it mean only that future fsyncs will occur?
>>
>>
> http://www.postgresql.org/docs/current/static/runtime-config-wal.html
>
> 4th paragraph in the fsync section.
>

Thanks. I've never touched that parameter! But I could have read the docs.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.6 and fsync=off
Date: 2016-04-28 20:44:23
Message-ID: 20160428204423.sq2l675d2hamgr6m@alap3.anarazel.de
Lists: pgsql-hackers

On 2016-04-28 21:32:37 +0200, Simon Riggs wrote:
> On 27 April 2016 at 17:04, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>
> > On 27 April 2016 at 21:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> >> Petr Jelinek <petr(at)2ndquadrant(dot)com> writes:
> >> > +1 (Abhijit's wording with data loss changed to data corruption)
> >>
> >> I'd suggest something like
> >>
> >> #fsync = on # flush data to disk for crash safety
> >>             # (turning this off can cause
> >>             # unrecoverable data corruption!)
> >>
> >>
> > Looks good.
> >
> > The docs on fsync are already good, it's just a matter of making people
> > think twice and actually look at them.
> >
>
> If fsync=off and you turn it on, does it fsync anything at that point?
>
> Or does it mean only that future fsyncs will occur?

Abhijit had a patch implementing automatically running fsync whenever
reenabled IIRC. Abhijit?

Andres


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.6 and fsync=off
Date: 2016-04-29 06:35:14
Message-ID: 20160429063514.GA630@toroid.org
Lists: pgsql-hackers

At 2016-04-28 13:44:23 -0700, andres(at)anarazel(dot)de wrote:
>
> Abhijit had a patch implementing automatically running fsync whenever
> reenabled IIRC. Abhijit?

The patch I had written is attached, and it's not quite the same thing.
Here's how I originally described it in response to a question from
Robert:

«In 20150115133245(dot)GG5245(at)awork2(dot)anarazel(dot)de, Andres explained his
rationale as follows:

«What I am thinking of is that, currently, if you start the
server for initial loading with fsync=off, and then restart it,
you're open to data loss. So when the current config file
setting is changed from off to on, we should fsync the data
directory. Even if there was no crash restart.»

That's what I tried to implement.»
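The behaviour Andres describes (flush the whole data directory once the setting goes from off to on) amounts to walking the tree and fsyncing every file and directory. A minimal POSIX sketch using nftw(); it is not the code from the attached patch, and `fsync_datadir()` is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <ftw.h>
#include <sys/stat.h>
#include <unistd.h>

/* Entries that could not be visited or flushed during the last walk. */
static int sync_failures = 0;

/* nftw() callback: fsync every regular file and directory.
 * Directories are flushed too, so newly created entries are durable;
 * fsync on a directory opened O_RDONLY works on Linux but not on every
 * platform. Always returns 0 so the walk continues past failures. */
static int fsync_entry(const char *path, const struct stat *sb,
                       int typeflag, struct FTW *ftwbuf)
{
    (void) sb;
    (void) ftwbuf;

    if (typeflag == FTW_SL || typeflag == FTW_SLN)
        return 0;               /* symlinks are not followed */
    if (typeflag == FTW_DNR || typeflag == FTW_NS)
    {
        sync_failures++;        /* unreadable directory or stat failure */
        return 0;
    }

    int fd = open(path, O_RDONLY);
    if (fd < 0)
    {
        sync_failures++;
        return 0;
    }
    if (fsync(fd) != 0)
        sync_failures++;
    close(fd);
    return 0;
}

/* Recursively fsync everything under datadir.
 * Returns 0 if every entry was flushed, -1 otherwise. */
static int fsync_datadir(const char *datadir)
{
    assert(datadir != NULL);
    sync_failures = 0;

    /* FTW_PHYS: don't follow symlinks. FTW_DEPTH: visit a directory's
     * contents before the directory itself. 16 = max open descriptors. */
    if (nftw(datadir, fsync_entry, 16, FTW_PHYS | FTW_DEPTH) != 0)
        return -1;
    return sync_failures == 0 ? 0 : -1;
}
```

A real implementation would also need to descend into tablespaces, which are reached through symlinks under pg_tblspc and are therefore skipped by this sketch.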

I remember there was some subsequent discussion about it being better to
issue fsync during a checkpoint when we see that its value has changed,
but if I did any work on it (which I have a vague memory of), I can't
find it now. Sorry.

Do you want a patch along those lines now, or is it too late?

-- Abhijit

Attachment Content-Type Size
0002-Recursively-fsync-PGDATA-on-the-next-restart-after-f.patch text/x-diff 3.5 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Craig Ringer <craig(at)2ndQuadrant(dot)com>, Petr Jelinek <petr(at)2ndQuadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.6 and fsync=off
Date: 2016-04-29 13:49:07
Message-ID: 12906.1461937747@sss.pgh.pa.us
Lists: pgsql-hackers

Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com> writes:
> Do you want a patch along those lines now, or is it too late?

We're certainly not going to consider fooling with this in 9.6.
The situation for manual fsync-twiddling is no worse than it was in
any prior release, and we are long past feature freeze.

If you want to put it on your to-do queue for 9.7, feel free.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.6 and fsync=off
Date: 2016-05-02 14:07:50
Message-ID: CA+TgmobDk9pg3mvZLSwrnR3x022QmJyGPh4PsERx23iG9RcdVg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 29, 2016 at 9:49 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com> writes:
>> Do you want a patch along those lines now, or is it too late?
>
> We're certainly not going to consider fooling with this in 9.6.
> The situation for manual fsync-twiddling is no worse than it was in
> any prior release, and we are long past feature freeze.
>
> If you want to put it on your to-do queue for 9.7, feel free.

Agreed.

I also think that it would be a swell idea to detect whether a system
has ever crashed with fsync=off, and do something about that, like
maybe bleat on every subsequent startup for the lifetime of the
cluster. I think Andres may have even proposed a patch for this sort
of thing before, although I don't remember for sure and I think he and
I disagreed on the details. Sketch:

- Keep a copy of the fsync status in pg_control.
- If we ever enter recovery while it's turned off, say:
    WARNING: Entering recovery with fsync=off; this cluster may be
    irretrievably corrupted.
  ...and also set a separate flag indicating that we've done at least
  one recovery with fsync=off.
- If that flag is set on a subsequent startup, say:
    WARNING: Recovery was previously performed with fsync=off; this
    cluster may be irretrievably corrupted.
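Robert's sketch maps onto very little state. A minimal C model, assuming two hypothetical fields in a pg_control-like struct (not the real ControlFileData layout) and assuming the stored copy, i.e. the value in effect before the crash, is the one that decides; the warning texts are the ones proposed in the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical fields for a pg_control-like struct. */
typedef struct FsyncHistory
{
    bool fsync_setting;            /* fsync as of the last startup */
    bool recovered_with_fsync_off; /* sticky: set once, never cleared */
} FsyncHistory;

/* Run at startup. in_recovery says whether crash recovery is about to
 * start; fsync_now is the value from the current configuration.
 * Returns the number of warnings printed. */
static int check_fsync_history(FsyncHistory *ctl, bool fsync_now, bool in_recovery)
{
    int warnings = 0;

    assert(ctl != NULL);

    /* Nag about an earlier unsafe recovery on every startup. */
    if (ctl->recovered_with_fsync_off)
    {
        fprintf(stderr, "WARNING: Recovery was previously performed with "
                "fsync=off; this cluster may be irretrievably corrupted.\n");
        warnings++;
    }

    /* The stored value is what was in effect when the crash happened. */
    if (in_recovery && !ctl->fsync_setting)
    {
        fprintf(stderr, "WARNING: Entering recovery with fsync=off; "
                "this cluster may be irretrievably corrupted.\n");
        ctl->recovered_with_fsync_off = true;
        warnings++;
    }

    ctl->fsync_setting = fsync_now;
    return warnings;
}
```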

While I'm kvetching, it might also be a good idea to have a timestamp
in pg_control indicating the date and time at which pg_resetxlog was
last run (and maybe the cluster creation time, too). I run across way
too many clusters where the customer can't convincingly vouch for the
proposition that nothing evil has been done, and having some forensic
evidence available would make it easier to figure out where the blame
lies.
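The forensic timestamps could be as simple as two extra fields. A hypothetical sketch; the field and function names are invented for illustration, and a WAL-reset tool such as pg_resetxlog would be expected to stamp its field before rewriting the control file.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical forensic fields for a pg_control-like struct.
 * 0 means the event never happened. */
typedef struct ControlForensics
{
    time_t cluster_created;   /* set once, at initdb time */
    time_t last_resetxlog;    /* updated by every WAL reset */
} ControlForensics;

/* Record a WAL reset at time now. */
static void note_resetxlog(ControlForensics *f, time_t now)
{
    assert(f != NULL);
    f->last_resetxlog = now;
}

/* Format a timestamp for a pg_controldata-style report, in UTC so the
 * output does not depend on the local time zone. */
static void format_stamp(char *buf, size_t len, time_t t)
{
    if (t == 0)
        snprintf(buf, len, "never");
    else
        strftime(buf, len, "%Y-%m-%d %H:%M:%S UTC", gmtime(&t));
}
```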

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.6 and fsync=off
Date: 2016-05-02 14:27:08
Message-ID: CAMsr+YF4JUTP6pOsCBMd9QTw9uCAfS8tzfku1O=73wqTU7Y9Aw@mail.gmail.com
Lists: pgsql-hackers

On 2 May 2016 at 22:07, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>
> I also think that it would be a swell idea to detect whether a system
> has ever crashed with fsync=off, and do something about that, like
> maybe bleat on every subsequent startup for the lifetime of the
> cluster.

Yes. Very, very yes.

That would've made my life considerably easier on a few occasions now.

It shouldn't take much more than a new pg_control field and a test during
recovery.

Should TODO this, but since that's sometimes where ideas go to die, I'm
going to see if I can hack this out soon as well.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.6 and fsync=off
Date: 2016-05-02 14:32:44
Message-ID: 20364.1462199564@sss.pgh.pa.us
Lists: pgsql-hackers

Craig Ringer <craig(at)2ndquadrant(dot)com> writes:
> On 2 May 2016 at 22:07, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I also think that it would be a swell idea to detect whether a system
>> has ever crashed with fsync=off, and do something about that, like
>> maybe bleat on every subsequent startup for the lifetime of the
>> cluster.

> Yes. Very, very yes.

+1 for tracking this in pg_control (maybe even with a counter, not
just a flag). I'm less convinced that we need to bleat on every
subsequent startup though --- that seems like just nagging.
Having the info available from pg_controldata seems sufficient for
forensics.
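Tom's variant swaps the sticky flag for a counter and drops the startup warning, leaving the information for pg_controldata to report. A sketch under the same assumptions as before: the struct fields are hypothetical and the output label is invented, not an existing pg_controldata line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical pg_control-like fields. */
typedef struct ControlCounters
{
    bool     fsync_setting;              /* fsync as of the last startup */
    uint32_t recoveries_with_fsync_off;  /* never reset */
} ControlCounters;

/* Startup hook: count unsafe recoveries, but print nothing. */
static void note_startup(ControlCounters *ctl, bool fsync_now, bool in_recovery)
{
    assert(ctl != NULL);
    if (in_recovery && !ctl->fsync_setting)
        ctl->recoveries_with_fsync_off++;
    ctl->fsync_setting = fsync_now;
}

/* pg_controldata-style report line (label is illustrative). */
static void print_counters(FILE *out, const ControlCounters *ctl)
{
    fprintf(out, "Crash recoveries with fsync off: %u\n",
            (unsigned) ctl->recoveries_with_fsync_off);
}
```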

The timestamp ideas aren't bad either.

BTW, how would this work in a standby server?

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.6 and fsync=off
Date: 2016-05-02 15:59:34
Message-ID: 20160502155934.zohrkzzo6kyzd6hy@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2016-05-02 10:07:50 -0400, Robert Haas wrote:
> I also think that it would be a swell idea to detect whether a system
> has ever crashed with fsync=off, and do something about that, like
> maybe bleat on every subsequent startup for the lifetime of the
> cluster. I think Andres may have even proposed a patch for this sort
> of thing before, although I don't remember for sure and I think he and
> I disagreed on the details. Sketch:

Hm, I can't remember doing that.

> - Keep a copy of the fsync status in pg_control.
> - If we ever enter recovery while it's turned off, say:
> WARNING: Entering recovery with fsync=off; this cluster may be
> irretrievably corrupted.
> ...and also set a separate flag indicating that we've done at least
> one recovery with fsync=off.
> - If that flag is set on a subsequent startup, say:
> WARNING: Recovery was previously performed with fsync=off; this
> cluster may be irretrievably corrupted.

Well, the problem with that is that postgres crashes are actually
harmless with regard to fsync=on/off. It's just OS crashes that are a
problem. So it seems quite likely that the false-positive rate here
would be high enough to make people ignore it.

Andres


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.6 and fsync=off
Date: 2016-05-02 16:04:29
Message-ID: 1035.1462205069@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2016-05-02 10:07:50 -0400, Robert Haas wrote:
>> - If that flag is set on a subsequent startup, say:
>> WARNING: Recovery was previously performed with fsync=off; this
>> cluster may be irretrievably corrupted.

> Well, the problem with that is that postgres crashes are actually
> harmless with regard to fsync=on/off. It's just OS crashes that are a
> problem. So it seems quite likely that the false-positive rate here
> would be high enough to make people ignore it.

That's a pretty good point. Also, as sketched, I believe this would
start bleating after a crash recovery performed because a backend
died --- which is a case where we know for certain there was no OS
crash. So this idea needs some more thought.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.6 and fsync=off
Date: 2016-05-02 16:21:07
Message-ID: CA+TgmoYnMPCAg7iYchAxhQqD-iJH0Npii3ueXVpwWn07uozNGg@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 2, 2016 at 12:04 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
>> On 2016-05-02 10:07:50 -0400, Robert Haas wrote:
>>> - If that flag is set on a subsequent startup, say:
>>> WARNING: Recovery was previously performed with fsync=off; this
>>> cluster may be irretrievably corrupted.
>
>> Well, the problem with that is that postgres crashes are actually
>> harmless with regard to fsync=on/off. It's just OS crashes that are a
>> problem. So it seems quite likely that the false-positive rate here
>> would be high enough to make people ignore it.
>
> That's a pretty good point. Also, as sketched, I believe this would
> start bleating after a crash recovery performed because a backend
> died --- which is a case where we know for certain there was no OS
> crash. So this idea needs some more thought.

That's true. I think that we could arrange to ignore
postmaster-initiated crash-and-restart cycles in deciding whether to set the
flag. Now, somebody could still do an immediate shutdown, or the
postmaster could go boom, but I don't think those are common enough
scenarios to justify not tracking this. If you are using fsync=off
and running an immediate shutdown and then setting fsync=on and
restarting the server ... yeah, that could hypothetically be safe.
But I think you are playing with fire. If you are using fsync=off for
the initial data load, it's not too much to ask that you shut the
cluster down cleanly when you are done.
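Combining Andres's point (only OS-level crashes can lose writes) with this refinement, the decision to set the flag could look like the following. The enum and its two-way classification are hypothetical simplifications, not PostgreSQL's actual recovery code.

```c
#include <assert.h>
#include <stdbool.h>

/* Why recovery is running (hypothetical classification). */
typedef enum RecoveryCause
{
    RECOVERY_BACKEND_CRASH,   /* postmaster reinitialized after a child died */
    RECOVERY_AFTER_STARTUP    /* server start found an unclean shutdown:
                                 OS crash, immediate shutdown, or the
                                 postmaster itself died */
} RecoveryCause;

/* Should this recovery set the "recovered with fsync=off" flag?
 * fsync_was_on is the setting in effect before the crash. */
static bool recovery_is_suspect(bool fsync_was_on, RecoveryCause cause)
{
    assert(cause == RECOVERY_BACKEND_CRASH || cause == RECOVERY_AFTER_STARTUP);
    if (fsync_was_on)
        return false;       /* writes were flushed when required */
    /* A backend crash leaves the OS page cache intact, so writes already
     * handed to the kernel are not lost. */
    return cause != RECOVERY_BACKEND_CRASH;
}
```

The second case deliberately stays suspect even when it was only an immediate shutdown; that is the false-positive trade-off accepted in this message.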

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company