Lists: pgsql-hackers
From: Christoph Berg <myon(at)debian(dot)org>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Enable data checksums by default
Date: 2019-03-22 15:16:54
Message-ID: 20190322151653.GD32388@msg.df7cb.de
Lately, PostgreSQL has moved many defaults from the "bare minimum" side
toward "user friendly by default", e.g. hot_standby & replication in
the default configuration, parallelism, and generally higher defaults
for resource knobs like *_mem, autovacuum_* and so on.
I think, the next step in that direction would be to enable data
checksums by default. They make sense in most setups, and people who
plan to run very performance-critical systems where checksums might be
too much need to tune many knobs anyway, and can as well choose to
disable them manually, instead of having everyone else have to enable
them manually. Also, disabling is much easier than enabling.
One argument against checksums used to be that we lack tools to fix
problems with them. But ignore_checksum_failure and the pg_checksums
tool fix that.
The attached patch flips the default in initdb. It also adds a new
option -K, --no-data-checksums, that wasn't present previously. Docs are
updated to say what the new default is, and the testsuite exercises
the -K option.
Christoph
Attachment: 0001-Enable-data-checksums-by-default.patch (text/x-diff, 6.6 KB)
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christoph Berg <myon(at)debian(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-22 16:07:22
Message-ID: 4368.1553270842@sss.pgh.pa.us
Christoph Berg <myon(at)debian(dot)org> writes:
> I think, the next step in that direction would be to enable data
> checksums by default. They make sense in most setups,
Well, that is exactly the point that needs some proof, not just
an unfounded assertion.
IMO, the main value of checksums is that they allow the Postgres
project to deflect blame. That's nice for us but I'm not sure
that it's a benefit for users. I've seen little if any data to
suggest that checksums actually catch enough problems to justify
the extra CPU costs and the risk of false positives.
regards, tom lane
From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-22 16:10:40
Message-ID: 20190322161040.jqbdo3t3tab5qci6@alap3.anarazel.de
Hi,
On 2019-03-22 12:07:22 -0400, Tom Lane wrote:
> Christoph Berg <myon(at)debian(dot)org> writes:
> > I think, the next step in that direction would be to enable data
> > checksums by default. They make sense in most setups,
>
> Well, that is exactly the point that needs some proof, not just
> an unfounded assertion.
>
> IMO, the main value of checksums is that they allow the Postgres
> project to deflect blame. That's nice for us but I'm not sure
> that it's a benefit for users. I've seen little if any data to
> suggest that checksums actually catch enough problems to justify
> the extra CPU costs and the risk of false positives.
IDK, being able to verify in some form that backups aren't corrupted on
an IO level is mighty nice. That often does allow to detect the issue
while one still has older backups around.
My problem is more that I'm not confident the checks are mature
enough. The basebackup checks are atm not able to detect random data,
and neither basebackup nor backend checks detect zeroed out files/file
ranges.
Greetings,
Andres Freund
From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-22 16:32:10
Message-ID: 20ce4ea9-47f3-9903-01b5-00eaf349b09d@2ndquadrant.com
On 3/22/19 5:10 PM, Andres Freund wrote:
> Hi,
>
> On 2019-03-22 12:07:22 -0400, Tom Lane wrote:
>> Christoph Berg <myon(at)debian(dot)org> writes:
>>> I think, the next step in that direction would be to enable data
>>> checksums by default. They make sense in most setups,
>>
>> Well, that is exactly the point that needs some proof, not just
>> an unfounded assertion.
>>
>> IMO, the main value of checksums is that they allow the Postgres
>> project to deflect blame. That's nice for us but I'm not sure
>> that it's a benefit for users. I've seen little if any data to
>> suggest that checksums actually catch enough problems to justify
>> the extra CPU costs and the risk of false positives.
>
I'm not sure about checksums being an effective tool to deflect blame.
Considering the recent fsync retry issues - due to the assumption that
we can just retry fsync we might have lost some of the writes, resulting
in torn pages and checksum failures. I'm sure we could argue about how
much sense the fsync behavior makes, but I doubt checksum failures are
enough to deflect blame here.
> IDK, being able to verify in some form that backups aren't corrupted on
> an IO level is mighty nice. That often does allow to detect the issue
> while one still has older backups around.
>
Yeah, I agree that's a valuable capability. I think the question is how
effective it actually is considering how much the storage changed over
the past few years (which necessarily affects the type of failures
people have to deal with).
> My problem is more that I'm not confident the checks are mature
> enough. The basebackup checks are atm not able to detect random data,
> and neither basebackup nor backend checks detect zeroed out files/file
> ranges.
>
Yep :-( The pg_basebackup vulnerability to random garbage in a page
header is unfortunate; we'd better improve that.
It's not clear to me what checksums can do about zeroed pages (and/or
truncated files), though.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-22 16:41:17
Message-ID: 20190322164117.g2wmstoy6hbkyzfp@alap3.anarazel.de
Hi,
On 2019-03-22 17:32:10 +0100, Tomas Vondra wrote:
> On 3/22/19 5:10 PM, Andres Freund wrote:
> > IDK, being able to verify in some form that backups aren't corrupted on
> > an IO level is mighty nice. That often does allow to detect the issue
> > while one still has older backups around.
> >
>
> Yeah, I agree that's a valuable capability. I think the question is how
> effective it actually is considering how much the storage changed over
> the past few years (which necessarily affects the type of failures
> people have to deal with).
I'm not sure I understand? How do the changes around storage
meaningfully affect the need to have some trust in backups and
benefiting from earlier detection?
> It's not clear to me what can checksums do about zeroed pages (and/or
> truncated files) though.
Well, there's nothing fundamental about needing added pages be
zeroes. We could expand them to be initialized with actual valid
checksums instead of
/* new buffers are zero-filled */
MemSet((char *) bufBlock, 0, BLCKSZ);
/* don't set checksum for all-zero page */
smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);
the problem is that it's hard to do so safely without adding a lot of
additional WAL logging. A lot of filesystems will journal metadata
changes (like the size of the file), but not contents. So after a crash
the tail end might appear zeroed out, even if we never wrote
zeroes. That's obviously solvable by WAL logging, but that's not cheap.
It might still be a good idea to just write a page with an initialized
header / checksum at that point, as that ought to still detect a number
of problems we can't detect right now.
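To make the blind spot concrete, here is a toy sketch of the situation (Python, not PostgreSQL's actual C implementation; the CRC-based hash, field offset, and constants are illustrative only). Like PostgreSQL, it mixes the block number into the checksum and must accept all-zero pages as valid, because legitimate file extension produces zero-filled pages, which is exactly why a zeroed-out tail after a crash cannot be flagged:

```python
import zlib

PAGE_SIZE = 8192
CKSUM_OFF = 8  # hypothetical offset of a 2-byte checksum field in the page header

def compute_checksum(page: bytes, block_num: int) -> int:
    # Compute over the page with the checksum field itself zeroed out;
    # mixing in the block number also catches pages written to the
    # wrong location (PostgreSQL's checksum does the same).
    masked = page[:CKSUM_OFF] + b"\x00\x00" + page[CKSUM_OFF + 2:]
    return (zlib.crc32(masked) ^ block_num) & 0xFFFF or 1  # never store 0

def write_page(payload: bytes, block_num: int) -> bytes:
    page = bytearray(payload.ljust(PAGE_SIZE, b"\x00"))
    cksum = compute_checksum(bytes(page), block_num)
    page[CKSUM_OFF:CKSUM_OFF + 2] = cksum.to_bytes(2, "little")
    return bytes(page)

def verify_page(page: bytes, block_num: int) -> bool:
    if page == b"\x00" * PAGE_SIZE:
        return True  # all-zero pages count as "new" and are never checked
    stored = int.from_bytes(page[CKSUM_OFF:CKSUM_OFF + 2], "little")
    return stored == compute_checksum(page, block_num)

page = write_page(b"some tuple data", block_num=7)
assert verify_page(page, 7)                 # intact page passes
corrupt = page[:100] + b"\xff" + page[101:]
assert not verify_page(corrupt, 7)          # random bit flip is caught
assert verify_page(b"\x00" * PAGE_SIZE, 7)  # a zeroed page sails through
```

The last assertion is the problem being discussed: a storage failure that surfaces as zeroes is indistinguishable from a legitimately extended-but-unwritten page unless extension writes initialized, checksummed pages, which in turn needs the WAL logging described above.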
Greetings,
Andres Freund
From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-22 17:01:32
Message-ID: 12110cc1-3729-9e5e-b6bb-62151a68af29@2ndquadrant.com
On 3/22/19 5:41 PM, Andres Freund wrote:
> Hi,
>
> On 2019-03-22 17:32:10 +0100, Tomas Vondra wrote:
>> On 3/22/19 5:10 PM, Andres Freund wrote:
>>> IDK, being able to verify in some form that backups aren't corrupted on
>>> an IO level is mighty nice. That often does allow to detect the issue
>>> while one still has older backups around.
>>>
>>
>> Yeah, I agree that's a valuable capability. I think the question is how
>> effective it actually is considering how much the storage changed over
>> the past few years (which necessarily affects the type of failures
>> people have to deal with).
>
> I'm not sure I understand? How do the changes around storage
> meaningfully affect the need to have some trust in backups and
> benefiting from earlier detection?
>
Having trust in backups is still desirable - nothing changes that,
obviously. The question I was posing was rather "Are checksums still
effective on current storage systems?"
I'm wondering if the storage systems people use nowadays may be failing
in ways that are not reliably detectable by checksums. I don't have any
data to either support or reject that hypothesis, though.
>
>> It's not clear to me what can checksums do about zeroed pages (and/or
>> truncated files) though.
>
> Well, there's nothing fundamental about needing added pages be
> zeroes. We could expand them to be initialized with actual valid
> checksums instead of
> /* new buffers are zero-filled */
> MemSet((char *) bufBlock, 0, BLCKSZ);
> /* don't set checksum for all-zero page */
> smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);
>
> the problem is that it's hard to do so safely without adding a lot of
> additional WAL logging. A lot of filesystems will journal metadata
> changes (like the size of the file), but not contents. So after a crash
> the tail end might appear zeroed out, even if we never wrote
> zeroes. That's obviously solvable by WAL logging, but that's not cheap.
>
Hmmm. I'd say a filesystem that does not guarantee having all the data
after an fsync is outright broken, but maybe that's what checksums are
meant to protect against.
> It might still be a good idea to just write a page with an initialized
> header / checksum at that point, as that ought to still detect a number
> of problems we can't detect right now.
>
Sounds reasonable.
cheers
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-22 17:07:15
Message-ID: 20190322170715.tafjitaldyhwfl2u@alap3.anarazel.de
On 2019-03-22 18:01:32 +0100, Tomas Vondra wrote:
> On 3/22/19 5:41 PM, Andres Freund wrote:
> > Hi,
> >
> > On 2019-03-22 17:32:10 +0100, Tomas Vondra wrote:
> >> On 3/22/19 5:10 PM, Andres Freund wrote:
> >>> IDK, being able to verify in some form that backups aren't corrupted on
> >>> an IO level is mighty nice. That often does allow to detect the issue
> >>> while one still has older backups around.
> >>>
> >>
> >> Yeah, I agree that's a valuable capability. I think the question is how
> >> effective it actually is considering how much the storage changed over
> >> the past few years (which necessarily affects the type of failures
> >> people have to deal with).
> >
> > I'm not sure I understand? How do the changes around storage
> > meaningfully affect the need to have some trust in backups and
> > benefiting from earlier detection?
> >
>
> Having trusted in backups is still desirable - nothing changes that,
> obviously. The question I was posing was rather "Are checksums still
> effective on current storage systems?"
>
> I'm wondering if the storage systems people use nowadays may be failing
> in ways that are not reliably detectable by checksums. I don't have any
> data to either support or reject that hypothesis, though.
I don't think it's useful to paint unsubstantiated doom-and-gloom
pictures.
> >> It's not clear to me what can checksums do about zeroed pages (and/or
> >> truncated files) though.
> >
> > Well, there's nothing fundamental about needing added pages be
> > zeroes. We could expand them to be initialized with actual valid
> > checksums instead of
> > /* new buffers are zero-filled */
> > MemSet((char *) bufBlock, 0, BLCKSZ);
> > /* don't set checksum for all-zero page */
> > smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);
> >
> > the problem is that it's hard to do so safely without adding a lot of
> > additional WAL logging. A lot of filesystems will journal metadata
> > changes (like the size of the file), but not contents. So after a crash
> > the tail end might appear zeroed out, even if we never wrote
> > zeroes. That's obviously solvable by WAL logging, but that's not cheap.
> >
>
> Hmmm. I'd say a filesystem that does not guarantee having all the data
> after an fsync is outright broken, but maybe that's what checksums are
> meant to protect against.
There's no fsync here. smgrextend(with-valid-checksum);crash; - the OS
will probably have journalled the file size change, but not the
contents. After a crash it's thus likely that the data page will appear
zeroed. Which prevents us from erroring out when encountering a zeroed
page, even though that'd be very good for error detection capabilities,
because storage systems will show corrupted data as zeroes in a number
of cases.
Greetings,
Andres Freund
From: Christoph Berg <myon(at)debian(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-26 15:14:46
Message-ID: 20190326151446.GG3829@msg.df7cb.de
Re: Tom Lane 2019-03-22 <4368(dot)1553270842(at)sss(dot)pgh(dot)pa(dot)us>
> Christoph Berg <myon(at)debian(dot)org> writes:
> > I think, the next step in that direction would be to enable data
> > checksums by default. They make sense in most setups,
>
> Well, that is exactly the point that needs some proof, not just
> an unfounded assertion.
I ran a benchmark with checksums disabled/enabled. shared_buffers is
512kB to make sure almost any read will fetch the page from the OS
cache; scale factor is 50 (~750MB) to make sure the whole cluster fits
into RAM.
model name: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (8 threads)
alter system set shared_buffers = '512kB';
pgbench -s 50 -i
pgbench -P 5 -M prepared -c 8 -j 8 -T 60 --select-only
without checksums:
tps = 96893.627255 (including connections establishing)
tps = 97570.587793 (including connections establishing)
tps = 97455.484419 (including connections establishing)
tps = 97533.668801 (including connections establishing)
average: 97363
with checksums:
tps = 91942.502487 (including connections establishing)
tps = 92390.556925 (including connections establishing)
tps = 92956.923271 (including connections establishing)
tps = 92914.205047 (including connections establishing)
average: 92551
select 92551.0/97363;
0.9506
So the cost is 5% in this very contrived case. In almost any other
setting, the cost would be lower, I'd think.
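The averages and the ratio quoted above follow directly from the per-run tps numbers; a few lines of Python reproduce the arithmetic:

```python
# tps figures from the four pgbench runs above
no_checksums = [96893.627255, 97570.587793, 97455.484419, 97533.668801]
checksums    = [91942.502487, 92390.556925, 92956.923271, 92914.205047]

avg_no = sum(no_checksums) / len(no_checksums)  # ~97363
avg_ck = sum(checksums) / len(checksums)        # ~92551

ratio = avg_ck / avg_no
print(f"avg without: {avg_no:.0f}, with: {avg_ck:.0f}, ratio: {ratio:.4f}")
# i.e. checksums cost roughly 1 - 0.9506, about 5%, in this worst case
```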
Christoph
From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-26 19:17:01
Message-ID: CAH2-WzkTXAQbm6sxQkshUCRGfkKT1i9xV7wU0xLrKJ8Rb1r9fQ@mail.gmail.com
On Fri, Mar 22, 2019 at 9:07 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> IMO, the main value of checksums is that they allow the Postgres
> project to deflect blame. That's nice for us but I'm not sure
> that it's a benefit for users. I've seen little if any data to
> suggest that checksums actually catch enough problems to justify
> the extra CPU costs and the risk of false positives.
I share your concern.
Some users have a peculiar kind of cognitive dissonance around
corruption, at least in my experience. It's very difficult for them to
make a choice on whether or not to fail hard. Perhaps that needs to be
taken into account, without being indulged.
--
Peter Geoghegan
From: Christoph Berg <myon(at)debian(dot)org>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-27 13:56:58
Message-ID: 20190327135658.GK12804@msg.df7cb.de
Re: To Tom Lane 2019-03-26 <20190326151446(dot)GG3829(at)msg(dot)df7cb(dot)de>
> I run a benchmark with checksums disabled/enabled. shared_buffers is
> 512kB to make sure almost any read will fetch the page from the OS
> cache; scale factor is 50 (~750MB) to make sure the whole cluster fits
> into RAM.
[...]
> So the cost is 5% in this very contrived case. In almost any other
> setting, the cost would be lower, I'd think.
(That was on 12devel, btw.)
That was about the most extreme OLTP read-only workload. After
thinking about it some more, I realized that exercising large seqscans
might be an even better way to test it because of less per-query
overhead.
Same setup again, shared_buffers = 16 (128kB), jit = off,
max_parallel_workers_per_gather = 0:
select count(bid) from pgbench_accounts;
no checksums: ~456ms
with checksums: ~489ms
456.0/489 = 0.9325
The cost of checksums is about 6.75% here.
Christoph
From: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
To: Christoph Berg <myon(at)debian(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-27 20:51:16
Message-ID: CA+CSw_twXdRzDN2XsSZBxEj63DeZ+f6_hs3Qf7hmXfenxSq+jg@mail.gmail.com
On Wed, Mar 27, 2019, 15:57 Christoph Berg <myon(at)debian(dot)org> wrote:
> Re: To Tom Lane 2019-03-26 <20190326151446(dot)GG3829(at)msg(dot)df7cb(dot)de>
> > I run a benchmark with checksums disabled/enabled. shared_buffers is
> > 512kB to make sure almost any read will fetch the page from the OS
> > cache; scale factor is 50 (~750MB) to make sure the whole cluster fits
> > into RAM.
> [...]
> > So the cost is 5% in this very contrived case. In almost any other
> > setting, the cost would be lower, I'd think.
>
> (That was on 12devel, btw.)
>
> That was about the most extreme OLTP read-only workload. After
> thinking about it some more, I realized that exercising large seqscans
> might be an even better way to test it because of less per-query
> overhead.
>
> Same setup again, shared_buffers = 16 (128kB), jit = off,
> max_parallel_workers_per_gather = 0:
>
> select count(bid) from pgbench_accounts;
>
> no checksums: ~456ms
> with checksums: ~489ms
>
> 456.0/489 = 0.9325
>
> The cost of checksums is about 6.75% here.
>
Can you try with postgres compiled with CFLAGS="-O2 -march=native"? There's
a bit of low hanging fruit there to use a runtime CPU check to pick a
better optimized checksum function.
Regards,
Ants Aasma
From: Christoph Berg <myon(at)debian(dot)org>
To: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-28 08:38:16
Message-ID: 20190328083816.GA31223@msg.df7cb.de
Re: Ants Aasma 2019-03-27 <CA+CSw_twXdRzDN2XsSZBxEj63DeZ+f6_hs3Qf7hmXfenxSq+jg(at)mail(dot)gmail(dot)com>
> Can you try with postgres compiled with CFLAGS="-O2 -march=native"? There's
> a bit of low hanging fruit there to use a runtime CPU check to pick a
> better optimized checksum function.
Frankly, no. This is with the apt.pg.o packages which are supposed to
be usable by everyone. If there is a better per-CPU checksum function,
PG should pick it at runtime. Special compiler flags are a no-go here.
CPPFLAGS = -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/mit-krb5
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer
Christoph
From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-28 12:03:44
Message-ID: 5942cae7-3676-7cad-8d49-85ac53f321de@2ndquadrant.com
On 2019-03-22 16:16, Christoph Berg wrote:
> I think, the next step in that direction would be to enable data
> checksums by default. They make sense in most setups, and people who
> plan to run very performance-critical systems where checksums might be
> too much need to tune many knobs anyway, and can as well choose to
> disable them manually, instead of having everyone else have to enable
> them manually. Also, disabling is much easier than enabling.
It would also enable pg_rewind to work by default.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: Bernd Helmle <mailings(at)oopsware(dot)de>
To: Christoph Berg <myon(at)debian(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-29 10:16:11
Message-ID: c5d3e07dcb09d4f3f52e0eea691268cfb7f9cc71.camel@oopsware.de
On Tuesday 2019-03-26 at 16:14 +0100, Christoph Berg wrote:
> select 92551.0/97363;
> 0.9506
>
> So the cost is 5% in this very contrived case. In almost any other
> setting, the cost would be lower, I'd think.
Well, my machine (Intel(R) Core(TM) i7-6770HQ CPU @ 2.60GHz, 32 GByte
RAM) tells me this:
pgbench -s 50 -i pgbench
pg_ctl -o "--shared-buffers=128kB" restart
pgbench -r -P4 -Mprepared -T60 -c $clients -j $clients -n -S
...prewarm...
Clients | checksums | no checksums | ratio
      1 |     20110 |        21716 | 0.9260
      2 |     35338 |        38543 | 0.9168
      4 |     67207 |        72118 | 0.9319
      8 |     96627 |       117545 | 0.8220
     16 |    110091 |       121415 | 0.9067
So checksums cost between ~7% and ~18% in this specific case here.
Bernd
From: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
To: Christoph Berg <myon(at)debian(dot)org>, Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-29 11:18:14
Message-ID: CA+CSw_s8NT_crvXCKiTb80hxD=YzZDfzyMt3=kBM6AP0TLxeVQ@mail.gmail.com
On Thu, Mar 28, 2019 at 10:38 AM Christoph Berg <myon(at)debian(dot)org> wrote:
> Re: Ants Aasma 2019-03-27 <CA+CSw_twXdRzDN2XsSZBxEj63DeZ+f6_hs3Qf7hmXfenxSq+jg(at)mail(dot)gmail(dot)com>
> > Can you try with postgres compiled with CFLAGS="-O2 -march=native"? There's
> > a bit of low hanging fruit there to use a runtime CPU check to pick a
> > better optimized checksum function.
>
> Frankly, no. This is with the apt.pg.o packages which are supposed to
> be usable by everyone. If there is a better per-CPU checksum function,
> PG should pick it at runtime. Special compiler flags are a no-go here.
>
I went ahead and tested it on the count(*) test, same settings as upthread.
Median of 5 runs of 20txs on Intel i5-2500k @ 4GHz.
No checksum: 344ms
Checksums: 384ms (+12%)
No checksum march=native: 344ms
Checksums march=native: 369ms (+7%)
The checksum code was written to be easily auto-vectorized by the compiler.
So if we just compile the same function with different compiler flags and
pick between them at runtime the overhead can be approximately halved. Not
saying that this needs to be done before enabling checksums by default,
just that when considering overhead, we can foresee it being much lower in
future versions.
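The auto-vectorization-friendly structure mentioned here can be sketched in a few lines. This is a behavioral model in Python, loosely based on the algorithm in PostgreSQL's checksum_impl.h (an FNV-1a variant over many independent lanes), not the exact constants or final reduction; the point is that the lanes carry no cross-iteration dependency, which is what lets a compiler map the inner loop onto SIMD registers:

```python
FNV_PRIME = 16777619
N_LANES = 32  # PostgreSQL uses 32 parallel sums so the loop vectorizes

def checksum_block(data: bytes) -> int:
    # Treat the block as rows of N_LANES 32-bit words; each lane is an
    # independent FNV-1a-style accumulator, so the per-row updates are
    # mutually independent and SIMD-friendly.
    assert len(data) % (4 * N_LANES) == 0
    sums = [0] * N_LANES
    words = [int.from_bytes(data[i:i + 4], "little")
             for i in range(0, len(data), 4)]
    for row in range(0, len(words), N_LANES):
        for lane in range(N_LANES):
            tmp = sums[lane] ^ words[row + lane]
            sums[lane] = ((tmp * FNV_PRIME) ^ (tmp >> 17)) & 0xFFFFFFFF
    result = 0
    for s in sums:  # fold the lanes into one value at the end
        result ^= s
    return result  # PostgreSQL further reduces this to a 16-bit header field

page = bytes(range(256)) * 32  # an 8192-byte dummy page
c1 = checksum_block(page)
flipped = page[:500] + bytes([page[500] ^ 1]) + page[501:]
assert checksum_block(flipped) != c1  # a single bit flip changes the result
```

A runtime dispatcher would compile this same loop with and without wide vector instructions and pick between the two at startup, which is the "halved overhead" being foreseen above.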
Regards,
Ants Aasma
From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Bernd Helmle <mailings(at)oopsware(dot)de>
Cc: Christoph Berg <myon(at)debian(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-29 14:10:14
Message-ID: 20190329141014.GJ1954@paquier.xyz
On Fri, Mar 29, 2019 at 11:16:11AM +0100, Bernd Helmle wrote:
> So between ~7% to 18% impact with checksums in this specific case here.
I can't really believe that many people set up shared_buffers at 128kB
which would cause such a large number of page evictions, but I can
believe that many users have shared_buffers set to its default value
and that we are going to get complaints about "performance drop after
upgrade to v12" if we switch data checksums to on by default.
--
Michael
From: Bernd Helmle <mailings(at)oopsware(dot)de>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Christoph Berg <myon(at)debian(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-29 19:25:41
Message-ID: 3586bb9345a59bfc8d13a50a7c729be1ee6759fd.camel@oopsware.de
On Friday 2019-03-29 at 23:10 +0900, Michael Paquier wrote:
>
> I can't really believe that many people set up shared_buffers at
> 128kB
> which would cause such a large number of page evictions, but I can
> believe that many users have shared_buffers set to its default value
> and that we are going to get complains about "performance drop after
> upgrade to v12" if we switch data checksums to on by default.
Yeah, I think Christoph's benchmark is based on this thinking. I assume
this very unrealistic scenario is meant to emulate the worst case (many
buffer reads, high checksum calculation load).
Bernd
From: Christoph Berg <myon(at)debian(dot)org>
To: Bernd Helmle <mailings(at)oopsware(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-29 19:35:26
Message-ID: 20190329193526.GB19154@msg.df7cb.de
Re: Bernd Helmle 2019-03-29 <3586bb9345a59bfc8d13a50a7c729be1ee6759fd(dot)camel(at)oopsware(dot)de>
> On Friday 2019-03-29 at 23:10 +0900, Michael Paquier wrote:
> >
> > I can't really believe that many people set up shared_buffers at
> > 128kB
> > which would cause such a large number of page evictions, but I can
> > believe that many users have shared_buffers set to its default value
> > and that we are going to get complains about "performance drop after
> > upgrade to v12" if we switch data checksums to on by default.
>
> Yeah, i think Christoph's benchmark is based on this thinking. I assume
> this very unrealistic scenery should emulate the worst case (many
> buffer_reads, high checksum calculation load).
It's not unrealistic to have large seqscans that are all buffer
misses, the table just has to be big enough. The idea in my benchmark
was that if I make shared buffers really small, and the table still
fits into RAM, I should be seeing only buffer misses, but without any
delay for actually reading from disk.
Christoph
From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Christoph Berg <myon(at)debian(dot)org>, Bernd Helmle <mailings(at)oopsware(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-30 19:25:43
Message-ID: 20190330192543.GH4719@development
On Fri, Mar 29, 2019 at 08:35:26PM +0100, Christoph Berg wrote:
>Re: Bernd Helmle 2019-03-29 <3586bb9345a59bfc8d13a50a7c729be1ee6759fd(dot)camel(at)oopsware(dot)de>
>> On Friday 2019-03-29 at 23:10 +0900, Michael Paquier wrote:
>> >
>> > I can't really believe that many people set up shared_buffers at
>> > 128kB
>> > which would cause such a large number of page evictions, but I can
>> > believe that many users have shared_buffers set to its default value
>> > and that we are going to get complains about "performance drop after
>> > upgrade to v12" if we switch data checksums to on by default.
>>
>> Yeah, i think Christoph's benchmark is based on this thinking. I assume
>> this very unrealistic scenery should emulate the worst case (many
>> buffer_reads, high checksum calculation load).
>
>It's not unrealistic to have large seqscans that are all buffer
>misses, the table just has to be big enough. The idea in my benchmark
>was that if I make shared buffers really small, and the table still
>fits in to RAM, I should be seeing only buffer misses, but without any
>delay for actually reading from disk.
>
>Christoph
>
FWIW I think it's a mistake to focus solely on CPU utilization, which
all the benchmarks performed on this thread do because they look at tps
of in-memory read-only workloads. Checksums have other costs too, not
just the additional CPU time. Most importantly, they require wal_log_hints
to be set (which people may or may not want anyway).
I've done a simple benchmark that does read-only (-S) and read-write
(-N) pgbench runs with different scales, but also measures duration of
the pgbench init and amount of WAL produced during the tests.
On a small machine (i5, 8GB RAM, SSD RAID) the results are these:
scale config | init tps wal
=========================|==================================
ro 10 no-hints | 2 117038 130
hints | 2 116378 146
checksums | 2 115619 147
-------------------|----------------------------------
200 no-hints | 32 88340 2407
hints | 37 86154 2628
checksums | 36 83336 2624
-------------------|----------------------------------
2000 no-hints | 365 38680 1967
hints | 423 38670 2123
checksums | 504 37510 2046
-------------------------|----------------------------------
rw 10 no-hints | 2 19691 437
hints | 2 19712 437
checksums | 2 19654 437
-------------------|----------------------------------
200 no-hints | 32 15839 2745
hints | 37 15735 2783
checksums | 36 15646 2775
-------------------|----------------------------------
2000 no-hints | 365 5371 3721
hints | 423 5270 3671
checksums | 504 5094 3574
The no-hints config is default (wal_log_hints=off, data_checksums=off),
hints sets wal_log_hints=on and checksums enables data checksums. All
the configs were somewhat tuned (1GB shared buffers, max_wal_size high
enough not to hit checkpoints very often, etc.).
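For reference, the shape of such a run can be sketched in shell. The pgbench invocation below is guarded (it only runs when a cluster is reachable), and the scale, duration, and client count here are illustrative, not the exact values behind the numbers above:

```shell
#!/bin/sh
# Convert a pg_lsn like "1/9A8B7C6D" to a byte offset, so WAL volume can be
# measured as the difference of pg_current_wal_lsn() before and after a run.
lsn_to_bytes() {
  hi=${1%/*}; lo=${1#*/}
  echo $(( 0x$hi * 4294967296 + 0x$lo ))
}

# Only attempt the benchmark if pgbench exists and a server is up.
if command -v pgbench >/dev/null 2>&1 && pg_isready -q 2>/dev/null; then
  scale=200                                   # illustrative scale
  pgbench -i -s "$scale" bench                # timed separately for "init"
  start=$(psql -Atc "SELECT pg_current_wal_lsn()" bench)
  pgbench -S -T 300 -c 8 bench                # read-only run
  pgbench -N -T 300 -c 8 bench                # read-write, skipping tellers/branches updates
  end=$(psql -Atc "SELECT pg_current_wal_lsn()" bench)
  echo "WAL bytes: $(( $(lsn_to_bytes "$end") - $(lsn_to_bytes "$start") ))"
fi
```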
I've also done the tests on a larger machine (2x E5-2620v4, 32GB of
RAM, NVMe SSD), and the general pattern is about the same - while the
tps and the amount of WAL (not counting the init) do not change, the time
for initialization increases significantly (by 20-40%).
This effect is even clearer when using slower storage (SATA-based RAID).
The results then look like this:
scale config | init tps wal
=========================|==================================
ro 100 no-hints | 49 229459 122
hints | 101 167983 190
checksums | 103 156307 190
-------------------|----------------------------------
1000 no-hints | 580 152167 109
hints | 1047 122814 142
checksums | 1080 118586 141
-------------------|----------------------------------
6000 no-hints | 4035 508 1
hints | 11193 502 1
checksums | 11376 506 1
-------------------------|----------------------------------
rw 100 no-hints | 49 279 192
hints | 101 275 190
checksums | 103 275 190
-------------------|----------------------------------
1000 no-hints | 580 237 210
hints | 1047 225 201
checksums | 1080 224 200
-------------------|----------------------------------
6000 no-hints | 4035 135 123
hints | 11193 133 122
checksums | 11376 132 121
and when expressed as relative to no-hints:
scale config | init tps wal
============================|===============================
ro 100 hints | 206% 73% 155%
checksums | 210% 68% 155%
-------------------|--------------------------------
1000 hints | 181% 81% 131%
checksums | 186% 78% 129%
-------------------|--------------------------------
6000 hints | 277% 99% 100%
checksums | 282% 100% 104%
----------------------------|--------------------------------
rw 100 hints | 206% 99% 99%
checksums | 210% 99% 99%
-------------------|--------------------------------
1000 hints | 181% 95% 96%
checksums | 186% 95% 95%
-------------------|--------------------------------
6000 hints | 277% 99% 99%
checksums | 282% 98% 98%
I have not investigated the exact reasons, but my hypothesis is that it's about
the amount of WAL generated during the initial CREATE INDEX (because it
probably ends up setting the hint bits), which puts additional pressure
on the storage.
Unfortunately, this additional cost is unlikely to go away :-(
Now, maybe we want to enable checksums by default anyway, but we should
not pretend that the only cost related to checksums is CPU usage.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org,Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>,Christoph Berg <myon(at)debian(dot)org>,Bernd Helmle <mailings(at)oopsware(dot)de>,Michael Paquier <michael(at)paquier(dot)xyz>,Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>,PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Enable data checksums by default |
Date: | 2019-03-30 20:17:20 |
Message-ID: | 23278C72-65BD-4EAE-85D4-15CC51E77DC9@anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On March 30, 2019 3:25:43 PM EDT, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>[... benchmark message quoted in full, snipped ...]
Thanks for running these, very helpful.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
From: | Christoph Berg <myon(at)debian(dot)org> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | Bernd Helmle <mailings(at)oopsware(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Enable data checksums by default |
Date: | 2019-04-01 08:16:47 |
Message-ID: | 20190401081647.GB13726@msg.df7cb.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Re: Tomas Vondra 2019-03-30 <20190330192543(dot)GH4719(at)development>
> I have not investigated the exact reasons, but my hypothesis it's about
> the amount of WAL generated during the initial CREATE INDEX (because it
> probably ends up setting the hint bits), which puts additional pressure
> on the storage.
>
> Unfortunately, this additional cost is unlikely to go away :-(
If WAL volume is a problem, would wal_compression help?
> Now, maybe we want to enable checksums by default anyway, but we should
> not pretent the only cost related to checksums is CPU usage.
Thanks for doing these tests. The point I'm trying to make is: why do
we run without data checksums by default? For example, we checksum
the WAL all the time, and there's not even an option to disable that,
even though disabling it might make things faster. Why don't we enable
data checksums by default as well?
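To illustrate the kind of computation at stake, here is a toy per-page checksum in C, an FNV-1a-style fold mixed with the block number. To be clear, this is not PostgreSQL's actual pg_checksum_page() algorithm (which uses a vectorizable FNV-1a variant); toy_page_checksum is an invented name and the code only sketches the concept:

```c
#include <stdint.h>
#include <stddef.h>

#define BLCKSZ 8192

/*
 * Toy per-page checksum: FNV-1a over the page bytes, mixed with the block
 * number so identical pages at different locations get different checksums.
 * Illustrative only; not the on-disk pg_checksum_page() algorithm.
 */
static uint16_t
toy_page_checksum(const unsigned char *page, uint32_t blkno)
{
    uint32_t    hash = 2166136261u;     /* FNV offset basis */

    for (size_t i = 0; i < BLCKSZ; i++)
    {
        hash ^= page[i];
        hash *= 16777619u;              /* FNV prime */
    }
    hash ^= blkno;

    /* fold to 16 bits, avoiding 0 so "no checksum" stays distinguishable */
    return (uint16_t) ((hash % 65535) + 1);
}
```

Even this naive loop touches every byte of every page read into or written out of shared buffers, which is where the CPU cost in the benchmarks comes from.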
Christoph
From: | Magnus Hagander <magnus(at)hagander(dot)net> |
---|---|
To: | Christoph Berg <myon(at)debian(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Enable data checksums by default |
Date: | 2019-04-01 08:25:57 |
Message-ID: | CABUevEzpZO7_fu9ihPZH2wQykSBPOne0oAC9EqZLLB_u=xwEnw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Mon, Apr 1, 2019 at 10:17 AM Christoph Berg <myon(at)debian(dot)org> wrote:
> Re: Tomas Vondra 2019-03-30 <20190330192543(dot)GH4719(at)development>
> > I have not investigated the exact reasons, but my hypothesis it's about
> > the amount of WAL generated during the initial CREATE INDEX (because it
> > probably ends up setting the hint bits), which puts additional pressure
> > on the storage.
> >
> > Unfortunately, this additional cost is unlikely to go away :-(
>
> If WAL volume is a problem, would wal_compression help?
>
> > Now, maybe we want to enable checksums by default anyway, but we should
> > not pretent the only cost related to checksums is CPU usage.
>
> Thanks for doing these tests. The point I'm trying to make is, why do
> we run without data checksums by default? For example, we do checksum
> the WAL all the time, and there's not even an option to disable it,
> even if that might make things faster. Why don't we enable data
> checksums by default as well?
>
I think one of the often-overlooked original reasons was that checksums
require logging hint bits, the same as when wal_log_hints is set.
Of course, if we consider it today, you have to do that in order to use
pg_rewind as well, so a lot of people who want to run any form of HA setup
will have it turned on anyway. I think that has turned out to be a much
weaker reason than it was originally thought to be.
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Christoph Berg <myon(at)debian(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Enable data checksums by default |
Date: | 2019-04-10 03:09:21 |
Message-ID: | 20190410030921.aragfr54c4rhloix@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Mon, Apr 1, 2019 at 10:16:47AM +0200, Christoph Berg wrote:
> Re: Tomas Vondra 2019-03-30 <20190330192543(dot)GH4719(at)development>
> > I have not investigated the exact reasons, but my hypothesis it's about
> > the amount of WAL generated during the initial CREATE INDEX (because it
> > probably ends up setting the hint bits), which puts additional pressure
> > on the storage.
> >
> > Unfortunately, this additional cost is unlikely to go away :-(
>
> If WAL volume is a problem, would wal_compression help?
>
> > Now, maybe we want to enable checksums by default anyway, but we should
> > not pretent the only cost related to checksums is CPU usage.
>
> Thanks for doing these tests. The point I'm trying to make is, why do
> we run without data checksums by default? For example, we do checksum
> the WAL all the time, and there's not even an option to disable it,
> even if that might make things faster. Why don't we enable data
> checksums by default as well?
We checksum WAL because partial WAL writes are likely to happen during a
power failure in the middle of a write. Data pages have pre-images (GUC
full_page_writes) stored in WAL, so they are protected from partial
writes, and hence are less likely to need checksum protection to detect
corruption.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Enable data checksums by default |
Date: | 2019-04-10 03:11:03 |
Message-ID: | 20190410031103.ql4dubhkll65lcdq@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, Mar 22, 2019 at 12:07:22PM -0400, Tom Lane wrote:
> Christoph Berg <myon(at)debian(dot)org> writes:
> > I think, the next step in that direction would be to enable data
> > checksums by default. They make sense in most setups,
>
> Well, that is exactly the point that needs some proof, not just
> an unfounded assertion.
>
> IMO, the main value of checksums is that they allow the Postgres
> project to deflect blame. That's nice for us but I'm not sure
> that it's a benefit for users. I've seen little if any data to
> suggest that checksums actually catch enough problems to justify
> the extra CPU costs and the risk of false positives.
Enabling checksums by default will require anyone using pg_upgrade to
run initdb with checksums disabled for the new cluster before running
pg_upgrade, for one release. We could add checksums during non-link
pg_upgrade runs, but we don't have code to do that yet, and most people
use link mode anyway.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Enable data checksums by default |
Date: | 2019-04-11 16:58:12 |
Message-ID: | 20190411165812.fda2lbtka5fgesdf@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
On 2019-04-09 23:11:03 -0400, Bruce Momjian wrote:
> Enabling checksums by default will require anyone using pg_upgrade to
> run initdb to disable checksums before running pg_upgrade, for one
> release. We could add checksums for non-link pg_upgrade runs, but we
> don't have code to do that yet, and most people use link anyway.
Hm. We could just have pg_upgrade run pg_checksums --enable/disable,
based on the old cluster, and print a warning on mismatches. Not sure if
that's worth it, but ...
Greetings,
Andres Freund
From: | Daniel Gustafsson <daniel(at)yesql(dot)se> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Enable data checksums by default |
Date: | 2019-04-11 18:15:41 |
Message-ID: | pab_LcKKbQtxu9JUmu_Rwe2ugslnCPDqcJTpSvhVDY7S-FMOIvUHUsiXqfyq_bCvvgNXURDNRDEuCM3yTXFwMPG3DDp4QANe5NFcohFz7-k=@yesql.se |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thursday, April 11, 2019 6:58 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2019-04-09 23:11:03 -0400, Bruce Momjian wrote:
>
> > Enabling checksums by default will require anyone using pg_upgrade to
> > run initdb to disable checksums before running pg_upgrade, for one
> > release. We could add checksums for non-link pg_upgrade runs, but we
> > don't have code to do that yet, and most people use link anyway.
>
> Hm. We could just have pg_upgrade run pg_checksums --enable/disable,
> based on the old cluster, and print a warning on mismatches. Not sure if
> that's worth it, but ...
That would work for link mode; for copy mode you'd have to initdb with
checksums turned off and then run pg_checksums on the new cluster, or else
the non-destructive nature of copy mode would be lost.
Another option would be to teach pg_upgrade to checksum the cluster on the
fly during the upgrade. That would however be a big conceptual change for
pg_upgrade, as it currently doesn't modify the cluster data. In Greenplum we
have done this, but it was an easier choice there as we are rewriting all the
pages anyway. It would also create yet another utility that can checksum an
offline cluster, but I wanted to bring the idea to the table.
cheers ./daniel
From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Daniel Gustafsson <daniel(at)yesql(dot)se> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Enable data checksums by default |
Date: | 2019-04-11 18:56:41 |
Message-ID: | 20190411185641.czkxd36iiouib5dd@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
On 2019-04-11 18:15:41 +0000, Daniel Gustafsson wrote:
> On Thursday, April 11, 2019 6:58 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > On 2019-04-09 23:11:03 -0400, Bruce Momjian wrote:
> >
> > > Enabling checksums by default will require anyone using pg_upgrade to
> > > run initdb to disable checksums before running pg_upgrade, for one
> > > release. We could add checksums for non-link pg_upgrade runs, but we
> > > don't have code to do that yet, and most people use link anyway.
> >
> > Hm. We could just have pg_upgrade run pg_checksums --enable/disable,
> > based on the old cluster, and print a warning on mismatches. Not sure if
> > that's worth it, but ...
>
> That would be for link mode, for copy-mode you'd have to initdb with checksums
> turned off and run pg_checksums on the new cluster, else the non-destructive
> nature of copy mode would be lost.
I don't think so? But I think we might just have misunderstood each
other. What I was suggesting is that pg_upgrade could take over the
burden of matching the old cluster's checksum enabled/disabled setting
when initdb'ing the new cluster, by changing the new cluster instead of
erroring out with:
if (oldctrl->data_checksum_version == 0 &&
    newctrl->data_checksum_version != 0)
    pg_fatal("old cluster does not use data checksums but the new one does\n");
else if (oldctrl->data_checksum_version != 0 &&
         newctrl->data_checksum_version == 0)
    pg_fatal("old cluster uses data checksums but the new one does not\n");
else if (oldctrl->data_checksum_version != newctrl->data_checksum_version)
    pg_fatal("old and new cluster pg_controldata checksum versions do not match\n");
As the new cluster at that time isn't yet related to the old cluster, I
don't see why that'd influence the non-destructive nature?
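Sketched with stubbed types (ControlStub, checksum_action and the CHECKSUM_* names are invented for illustration; this is not pg_upgrade code), the idea would be to compute an action instead of failing:

```c
#include <stdint.h>

/* Stub loosely modeled on pg_upgrade's control data; illustration only. */
typedef struct
{
    uint32_t    data_checksum_version;  /* 0 = checksums disabled */
} ControlStub;

typedef enum { CHECKSUM_NOOP, CHECKSUM_ENABLE, CHECKSUM_DISABLE } ChecksumAction;

/*
 * Instead of pg_fatal() on a mismatch, decide how to bring the new cluster
 * in line with the old one; the actual pg_checksums --enable/--disable run
 * against the new cluster is elided here.
 */
static ChecksumAction
checksum_action(const ControlStub *oldctrl, const ControlStub *newctrl)
{
    if (oldctrl->data_checksum_version == newctrl->data_checksum_version)
        return CHECKSUM_NOOP;
    if (oldctrl->data_checksum_version == 0)
        return CHECKSUM_DISABLE;    /* old has no checksums, new one does */
    return CHECKSUM_ENABLE;         /* old has checksums, new one doesn't */
}
```

Since the new cluster is freshly initdb'd and empty at that point, flipping its checksum setting is cheap, which is what makes the idea plausible at all.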
Greetings,
Andres Freund
From: | Daniel Gustafsson <daniel(at)yesql(dot)se> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Enable data checksums by default |
Date: | 2019-04-11 21:08:07 |
Message-ID: | pcs5Fq7m2Vfdl3HnxjzhjTJdpN7NrW_cRzwYqow4ekLXfPA8sjbuNRWHmHohoeVr9Zk_bxazXz-Q-t6zV5xUpI0pE4dooicPnp9Johc_-5c=@yesql.se |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thursday, April 11, 2019 8:56 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2019-04-11 18:15:41 +0000, Daniel Gustafsson wrote:
>
> > On Thursday, April 11, 2019 6:58 PM, Andres Freund andres(at)anarazel(dot)de wrote:
> >
> > > On 2019-04-09 23:11:03 -0400, Bruce Momjian wrote:
> > >
> > > > Enabling checksums by default will require anyone using pg_upgrade to
> > > > run initdb to disable checksums before running pg_upgrade, for one
> > > > release. We could add checksums for non-link pg_upgrade runs, but we
> > > > don't have code to do that yet, and most people use link anyway.
> > >
> > > Hm. We could just have pg_upgrade run pg_checksums --enable/disable,
> > > based on the old cluster, and print a warning on mismatches. Not sure if
> > > that's worth it, but ...
> >
> > That would be for link mode, for copy-mode you'd have to initdb with checksums
> > turned off and run pg_checksums on the new cluster, else the non-destructive
> > nature of copy mode would be lost.
>
> I don't think so? But I think we might just have misunderstood each
> other. What I was suggesting is that we could take the burden of having
> to match the old cluster's checksum enabled/disabled setting when
> initdb'ing the new cluster, by changing the new cluster instead of
> erroring out with:
> if (oldctrl->data_checksum_version == 0 &&
>     newctrl->data_checksum_version != 0)
>     pg_fatal("old cluster does not use data checksums but the new one does\n");
> else if (oldctrl->data_checksum_version != 0 &&
>          newctrl->data_checksum_version == 0)
>     pg_fatal("old cluster uses data checksums but the new one does not\n");
> else if (oldctrl->data_checksum_version != newctrl->data_checksum_version)
>     pg_fatal("old and new cluster pg_controldata checksum versions do not match\n");
>
>
> As the new cluster at that time isn't yet related to the old cluster, I
> don't see why that'd influence the non-destructive nature?
Right, now I see what you mean, and I indeed misunderstood you. Thanks for
clarifying.
cheers ./daniel