Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-12 14:12:47
Message-ID: 30896.1492006367@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres mentioned, and I've confirmed locally, that a large chunk of
initdb's runtime goes into regprocin's brute-force lookups of function
OIDs from function names. The recent discussion about cutting TAP test
time prompted me to look into that question again. We had had some
grand plans for getting genbki.pl to perform the name-to-OID conversion
as part of a big rewrite, but since that project is showing few signs
of life, I'm thinking that a more localized performance fix would be
a good thing to look into. There seem to be a couple of plausible
routes to a fix:

1. The best thing would still be to make genbki.pl do the conversion,
and write numeric OIDs into postgres.bki. The core stumbling block
here seems to be that for most catalogs, Catalog.pm and genbki.pl
never really break down a DATA line into fields --- and we certainly
have got to do that, if we're going to replace the values of regproc
fields. The places that do need to do that approximate it like this:

# To construct fmgroids.h and fmgrtab.c, we need to inspect some
# of the individual data fields. Just splitting on whitespace
# won't work, because some quoted fields might contain internal
# whitespace. We handle this by folding them all to a simple
# "xxx". Fortunately, this script doesn't need to look at any
# fields that might need quoting, so this simple hack is
# sufficient.
$row->{bki_values} =~ s/"[^"]*"/"xxx"/g;
@{$row}{(at)attnames} = split /\s+/, $row->{bki_values};

We would need a bullet-proof, non-hack, preferably not too slow way to
split DATA lines into fields properly. I'm one of the world's worst
Perl programmers, but surely there's a way?

2. Alternatively, we could teach bootstrap mode to build a hashtable
mapping proname to OID while it reads pg_proc data from postgres.bki,
and then make regprocin's bootstrap path consult the hashtable instead
of looking directly at the pg_proc file. That I'm quite sure is do-able,
but it seems like it's leaving money on the table compared to doing
the transformation earlier.

Thoughts?

regards, tom lane


From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-12 15:00:06
Message-ID: a15e94ac-c346-5816-69ae-ec6811e32af1@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/12/2017 04:12 PM, Tom Lane wrote:
> 1. The best thing would still be to make genbki.pl do the conversion,
> and write numeric OIDs into postgres.bki. The core stumbling block
> here seems to be that for most catalogs, Catalog.pm and genbki.pl
> never really break down a DATA line into fields --- and we certainly
> have got to do that, if we're going to replace the values of regproc
> fields. The places that do need to do that approximate it like this:
>
> # To construct fmgroids.h and fmgrtab.c, we need to inspect some
> # of the individual data fields. Just splitting on whitespace
> # won't work, because some quoted fields might contain internal
> # whitespace. We handle this by folding them all to a simple
> # "xxx". Fortunately, this script doesn't need to look at any
> # fields that might need quoting, so this simple hack is
> # sufficient.
> $row->{bki_values} =~ s/"[^"]*"/"xxx"/g;
> @{$row}{(at)attnames} = split /\s+/, $row->{bki_values};
>
> We would need a bullet-proof, non-hack, preferably not too slow way to
> split DATA lines into fields properly. I'm one of the world's worst
> Perl programmers, but surely there's a way?
>
> 2. Alternatively, we could teach bootstrap mode to build a hashtable
> mapping proname to OID while it reads pg_proc data from postgres.bki,
> and then make regprocin's bootstrap path consult the hashtable instead
> of looking directly at the pg_proc file. That I'm quite sure is do-able,
> but it seems like it's leaving money on the table compared to doing
> the transformation earlier.
>
> Thoughts?

Looked at this an option 1 seems simple enough if I am not missing
something. I might hack something up later tonight. Either way I think
this improvement can be done separately from the proposed replacement of
the catalog header files. Trying to fix everything at once often leads
to nothing being fixed at all.

Andreas


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-12 17:34:37
Message-ID: 20170412173437.qfqfnl6k3icpfczx@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-04-12 10:12:47 -0400, Tom Lane wrote:
> Andres mentioned, and I've confirmed locally, that a large chunk of
> initdb's runtime goes into regprocin's brute-force lookups of function
> OIDs from function names. The recent discussion about cutting TAP test
> time prompted me to look into that question again. We had had some
> grand plans for getting genbki.pl to perform the name-to-OID conversion
> as part of a big rewrite, but since that project is showing few signs
> of life, I'm thinking that a more localized performance fix would be
> a good thing to look into. There seem to be a couple of plausible
> routes to a fix:
>
> 1. The best thing would still be to make genbki.pl do the conversion,
> and write numeric OIDs into postgres.bki. The core stumbling block
> here seems to be that for most catalogs, Catalog.pm and genbki.pl
> never really break down a DATA line into fields --- and we certainly
> have got to do that, if we're going to replace the values of regproc
> fields. The places that do need to do that approximate it like this:
>
> # To construct fmgroids.h and fmgrtab.c, we need to inspect some
> # of the individual data fields. Just splitting on whitespace
> # won't work, because some quoted fields might contain internal
> # whitespace. We handle this by folding them all to a simple
> # "xxx". Fortunately, this script doesn't need to look at any
> # fields that might need quoting, so this simple hack is
> # sufficient.
> $row->{bki_values} =~ s/"[^"]*"/"xxx"/g;
> @{$row}{(at)attnames} = split /\s+/, $row->{bki_values};
>
> We would need a bullet-proof, non-hack, preferably not too slow way to
> split DATA lines into fields properly. I'm one of the world's worst
> Perl programmers, but surely there's a way?

I've done something like 1) before:
http://archives.postgresql.org/message-id/20150221230839.GE2037%40awork2.anarazel.de

I don't think the speeds matters all that much, because we'll only do it
when generating the .bki file - a couple ms more or less won't matter
much.

I IIRC spent some more time to also load the data files from a different
format:
https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/sane-catalog
although that's presumably heavily outdated now.

- Andres


From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-12 19:15:53
Message-ID: 972d541d-7285-53ff-ea71-dcbfd886570a@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/12/2017 05:00 PM, Andreas Karlsson wrote:
> Looked at this an option 1 seems simple enough if I am not missing
> something. I might hack something up later tonight. Either way I think
> this improvement can be done separately from the proposed replacement of
> the catalog header files. Trying to fix everything at once often leads
> to nothing being fixed at all.

Here is my proof of concept patch. It does basically the same thing as
Andres's patch except that it handles quoted values a bit better and
does not try to support anything other than the regproc type.

The patch speeds up initdb without fsync from 0.80 seconds to 0.55
seconds, which is a nice speedup, while adding a negligible amount of
extra work on compilation.

Two things which could be improved in this patch if people deem it
important:

- Refactor the code to be more generic, like Andres patch, so it is
easier to add lookups for other data types.

- Write something closer to a real .bki parser to actually understand
quoted values and _null_ when parsing the data. Right now this patch
only splits each row into its fields (while being aware of quotes) but
does not attempt to remove quotes. The PoC patch treats "foo" and foo as
different.

Andreas

Attachment Content-Type Size
bki-regproc-oids-v1.patch text/x-patch 5.1 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: pgsql-hackers(at)postgreSQL(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-13 16:13:30
Message-ID: 6312.1492100010@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andreas Karlsson <andreas(at)proxel(dot)se> writes:
> Here is my proof of concept patch. It does basically the same thing as
> Andres's patch except that it handles quoted values a bit better and
> does not try to support anything other than the regproc type.

> The patch speeds up initdb without fsync from 0.80 seconds to 0.55
> seconds, which is a nice speedup, while adding a negligible amount of
> extra work on compilation.

I've pushed this with some mostly-cosmetic adjustments:

* created a single subroutine that understands how to split DATA lines,
rather than having several copies of the regex

* rearranged the code so that the data structure returned by
Catalog::Catalogs() isn't scribbled on (which was already
happening before your patch, but it seemed pretty ugly to me)

* stripped out the bootstrap-time name lookup code from all of reg*
not just regproc.

There's certainly lots more that could be done in the genbki code,
but I think all we can justify at this stage of the development
cycle is to get the low-hanging fruit for testing speedups.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-13 16:41:32
Message-ID: 20170413164132.3nhobp4z4wnnvhca@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-04-13 12:13:30 -0400, Tom Lane wrote:
> Andreas Karlsson <andreas(at)proxel(dot)se> writes:
> > Here is my proof of concept patch. It does basically the same thing as
> > Andres's patch except that it handles quoted values a bit better and
> > does not try to support anything other than the regproc type.
>
> > The patch speeds up initdb without fsync from 0.80 seconds to 0.55
> > seconds, which is a nice speedup, while adding a negligible amount of
> > extra work on compilation.
>
> I've pushed this with some mostly-cosmetic adjustments:

Cool. I wonder if we also should remove AtEOXact_CatCache()'s
cross-checks - the resowner replacement has been in place for a while,
and seems robust enough. They're now the biggest user of time.

- Andres


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-13 16:56:14
Message-ID: 27409.1492102574@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> Cool. I wonder if we also should remove AtEOXact_CatCache()'s
> cross-checks - the resowner replacement has been in place for a while,
> and seems robust enough. They're now the biggest user of time.

Hm, biggest user of time in what workload? I've not noticed that
function particularly.

I agree that it doesn't seem like we need to spend a lot of time
cross-checking there, though. Maybe keep the code but #ifdef it
under some nondefault debugging symbol.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-13 17:06:14
Message-ID: 20170413170614.6pv2mpqiq7s6huio@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-04-13 12:56:14 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > Cool. I wonder if we also should remove AtEOXact_CatCache()'s
> > cross-checks - the resowner replacement has been in place for a while,
> > and seems robust enough. They're now the biggest user of time.
>
> Hm, biggest user of time in what workload? I've not noticed that
> function particularly.

Just initdb. I presume it's because the catcaches will frequently be
relatively big there.

> I agree that it doesn't seem like we need to spend a lot of time
> cross-checking there, though. Maybe keep the code but #ifdef it
> under some nondefault debugging symbol.

Hm, if we want to keep it, maybe tie it to CLOBBER_CACHE_ALWAYS or such,
so it gets compiled at least sometimes? Not a great fit, but ...

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-13 18:05:43
Message-ID: 9244.1492106743@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2017-04-13 12:56:14 -0400, Tom Lane wrote:
>> Andres Freund <andres(at)anarazel(dot)de> writes:
>>> Cool. I wonder if we also should remove AtEOXact_CatCache()'s
>>> cross-checks - the resowner replacement has been in place for a while,
>>> and seems robust enough. They're now the biggest user of time.

>> Hm, biggest user of time in what workload? I've not noticed that
>> function particularly.

> Just initdb. I presume it's because the catcaches will frequently be
> relatively big there.

Hm. That ties into something I was looking at yesterday. The only
reason that function is called so much is that bootstrap mode runs a
separate transaction for *each line of the bki file* (cf do_start,
do_end in bootparse.y). Which seems pretty silly. I experimented
with collapsing all the transactions for consecutive DATA lines into
one transaction, but couldn't immediately make it work due to memory
management issues. I didn't try collapsing the entire run into a
single transaction, but maybe that would actually be easier, though
no doubt more wasteful of memory.

>> I agree that it doesn't seem like we need to spend a lot of time
>> cross-checking there, though. Maybe keep the code but #ifdef it
>> under some nondefault debugging symbol.

> Hm, if we want to keep it, maybe tie it to CLOBBER_CACHE_ALWAYS or such,
> so it gets compiled at least sometimes? Not a great fit, but ...

Don't like that, because CCA is by definition not the normal cache
behavior. It would make a bit of sense to tie it to CACHEDEBUG,
but as you say, it'd never get tested normally if we do that.

On the whole, though, we may be looking at diminishing returns here.
I just did some "perf" measurement of the overall "initdb" cycle,
and what I'm seeing suggests that bootstrap mode as such is now a
pretty small fraction of the overall cycle:

+ 51.07% 0.01% 28 postgres postgres [.] PostgresMain #
...
+ 13.52% 0.00% 0 postgres postgres [.] AuxiliaryProcessMain #

That says that the post-bootstrap steps are now the bulk of the time,
which agrees with naked-eye observation.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-13 20:42:23
Message-ID: 20170413204223.7znfwllhqmglzaam@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-04-13 14:05:43 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2017-04-13 12:56:14 -0400, Tom Lane wrote:
> >> Andres Freund <andres(at)anarazel(dot)de> writes:
> >>> Cool. I wonder if we also should remove AtEOXact_CatCache()'s
> >>> cross-checks - the resowner replacement has been in place for a while,
> >>> and seems robust enough. They're now the biggest user of time.
>
> >> Hm, biggest user of time in what workload? I've not noticed that
> >> function particularly.
>
> > Just initdb. I presume it's because the catcaches will frequently be
> > relatively big there.
>
> Hm. That ties into something I was looking at yesterday. The only
> reason that function is called so much is that bootstrap mode runs a
> separate transaction for *each line of the bki file* (cf do_start,
> do_end in bootparse.y). Which seems pretty silly.

Indeed.

> On the whole, though, we may be looking at diminishing returns here.
> I just did some "perf" measurement of the overall "initdb" cycle,
> and what I'm seeing suggests that bootstrap mode as such is now a
> pretty small fraction of the overall cycle:
>
> + 51.07% 0.01% 28 postgres postgres [.] PostgresMain #
> ...
> + 13.52% 0.00% 0 postgres postgres [.] AuxiliaryProcessMain #
>
> That says that the post-bootstrap steps are now the bulk of the time,
> which agrees with naked-eye observation.

The AtEOXact_CatCache weren't only in bootstrap mode, but yea, it's by
far smaller wins in comparison to the regprocin thing.

Greetings,

Andres Freund


From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-14 18:51:39
Message-ID: 81484b13-4c27-caa0-f97d-519a836b55dd@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/13/2017 06:13 PM, Tom Lane wrote:
> I've pushed this with some mostly-cosmetic adjustments:

Thanks! I like your adjustments.

> There's certainly lots more that could be done in the genbki code,
> but I think all we can justify at this stage of the development
> cycle is to get the low-hanging fruit for testing speedups.

Yeah, I also noticed that the genbki code seems to have gotten little
love and that much more can be done here.

Andreas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-14 21:54:19
Message-ID: 12002.1492206859@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2017-04-13 14:05:43 -0400, Tom Lane wrote:
>> Hm. That ties into something I was looking at yesterday. The only
>> reason that function is called so much is that bootstrap mode runs a
>> separate transaction for *each line of the bki file* (cf do_start,
>> do_end in bootparse.y). Which seems pretty silly.

> Indeed.

I failed to resist the temptation to poke at this, and found that
indeed nothing seems to break if we just use one transaction for the
whole processing of postgres.bki. So I've pushed a patch that does
that. We're definitely down to the point where worrying about the
speed of bootstrap mode, per se, is useless; the other steps in
initdb visibly take a lot more time.

regards, tom lane


From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-15 01:44:18
Message-ID: b549c8ad-f12e-aad1-9a59-b24cb3e55a17@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/14/2017 11:54 PM, Tom Lane wrote:
> I failed to resist the temptation to poke at this, and found that
> indeed nothing seems to break if we just use one transaction for the
> whole processing of postgres.bki. So I've pushed a patch that does
> that. We're definitely down to the point where worrying about the
> speed of bootstrap mode, per se, is useless; the other steps in
> initdb visibly take a lot more time.

Looked some at this and what take time now for me seems to mainly be
these four things (out of a total runtime of 560 ms).

1. setup_conversion: 140 ms
2. select_default_timezone: 90 ms
3. bootstrap_template1: 80 ms
4. setup_schema: 65 ms

These four take up about two thirds of the total runtime, so it seems
likely that we may still have relatively low hanging fruit (but not
worth committing for PostgreSQL 10).

I have not done profiling of these functions yet, so am not sure how
they best would be fixed but maybe setup_conversion could be converted
into bki entries to speed it up.

Andreas


From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Andreas Karlsson <andreas(at)proxel(dot)se>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-15 02:08:39
Message-ID: bf1defc7-40a5-6e43-0474-c68820b6c9f8@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15/04/17 13:44, Andreas Karlsson wrote:
> On 04/14/2017 11:54 PM, Tom Lane wrote:
>> I failed to resist the temptation to poke at this, and found that
>> indeed nothing seems to break if we just use one transaction for the
>> whole processing of postgres.bki. So I've pushed a patch that does
>> that. We're definitely down to the point where worrying about the
>> speed of bootstrap mode, per se, is useless; the other steps in
>> initdb visibly take a lot more time.
>
> Looked some at this and what take time now for me seems to mainly be
> these four things (out of a total runtime of 560 ms).
>
> 1. setup_conversion: 140 ms
> 2. select_default_timezone: 90 ms
> 3. bootstrap_template1: 80 ms
> 4. setup_schema: 65 ms
>
> These four take up about two thirds of the total runtime, so it seems
> likely that we may still have relatively low hanging fruit (but not
> worth committing for PostgreSQL 10).
>
> I have not done profiling of these functions yet, so am not sure how
> they best would be fixed but maybe setup_conversion could be converted
> into bki entries to speed it up.
>
> Andreas
>
>
How much could be done concurrently?

Cheers.
Gavin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-15 04:07:33
Message-ID: 5807.1492229253@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andreas Karlsson <andreas(at)proxel(dot)se> writes:
> Looked some at this and what take time now for me seems to mainly be
> these four things (out of a total runtime of 560 ms).

> 1. setup_conversion: 140 ms
> 2. select_default_timezone: 90 ms
> 3. bootstrap_template1: 80 ms
> 4. setup_schema: 65 ms

FWIW, you can bypass select_default_timezone by setting environment
variable TZ to a valid timezone name before calling initdb. On
my workstation, that currently cuts the time for "initdb --no-sync"
from about 1.25 sec to just about exactly 1.0 sec.

I doubt that we want all the buildfarm members to switch over to
doing that, since then we'd lose coverage of select_default_timezone.
But it's interesting to speculate about having the buildfarm script
extract the selected timezone name out of postgresql.conf after the
first initdb it does, and then set TZ for remaining tests. We surely
don't need to test select_default_timezone more than once per
buildfarm run.

regards, tom lane


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-15 13:29:53
Message-ID: f9e89f74-ceae-e94c-7cc1-1435233a2f2f@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/15/2017 12:07 AM, Tom Lane wrote:
> Andreas Karlsson <andreas(at)proxel(dot)se> writes:
>> Looked some at this and what take time now for me seems to mainly be
>> these four things (out of a total runtime of 560 ms).
>> 1. setup_conversion: 140 ms
>> 2. select_default_timezone: 90 ms
>> 3. bootstrap_template1: 80 ms
>> 4. setup_schema: 65 ms
> FWIW, you can bypass select_default_timezone by setting environment
> variable TZ to a valid timezone name before calling initdb. On
> my workstation, that currently cuts the time for "initdb --no-sync"
> from about 1.25 sec to just about exactly 1.0 sec.
>
> I doubt that we want all the buildfarm members to switch over to
> doing that, since then we'd lose coverage of select_default_timezone.
> But it's interesting to speculate about having the buildfarm script
> extract the selected timezone name out of postgresql.conf after the
> first initdb it does, and then set TZ for remaining tests. We surely
> don't need to test select_default_timezone more than once per
> buildfarm run.
>
>

Yeah. The obvious place to do this would be the "make check" stage,
which runs very early. Unfortunately, pg_regress carefully removes its
data directory on success - kind of a pity that's not switchable.

Probably for now the simplest thing for buildfarm animals whose owners
are trying to squeeze every last second would be to put something like

TZ => 'Us/Eastern',

in the build_env section of the config file. But as you say we don't
want everyone doing that.

Alternatively, we could have an initdb TAP test that explicitly removed
the environment setting so we'd get coverage of select_default_timezone,
and have the buildfarm set TZ to something if it's not already set.

cheers

andrew

--

Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-15 15:44:40
Message-ID: 20170415154440.3eg3dltxs5zybmv2@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:

> Alternatively, we could have an initdb TAP test that explicitly removed
> the environment setting so we'd get coverage of select_default_timezone,
> and have the buildfarm set TZ to something if it's not already set.

What about having an initdb option that runs select_default_timezone
only and reports the result, so that it can be used in the buildfarm
script to set TZ in all the regular initdb calls?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-15 16:04:22
Message-ID: 389c14b5-3678-e617-450f-07bf29197249@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/15/2017 11:44 AM, Alvaro Herrera wrote:
> Andrew Dunstan wrote:
>
>> Alternatively, we could have an initdb TAP test that explicitly removed
>> the environment setting so we'd get coverage of select_default_timezone,
>> and have the buildfarm set TZ to something if it's not already set.
> What about having an initdb option that runs select_default_timezone
> only and reports the result, so that it can be used in the buildfarm
> script to set TZ in all the regular initdb calls?
>

Seems like more work.

What I had in mind was the attached plus roughly this in the buildfarm
client:

$ENV{TZ} ||= 'US/Eastern';

or whatever zone we choose to use.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
initdb-test-no-tz.patch text/x-diff 835 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-15 16:13:15
Message-ID: 29572.1492272795@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> writes:
> What I had in mind was the attached plus roughly this in the buildfarm
> client:
> $ENV{TZ} ||= 'US/Eastern';
> or whatever zone we choose to use.

How about letting the first "make check" run with whatever is in the
environment, and then forcing TZ to become set (much as above) for
all the remaining tests? I'm afraid what you've got here might
encourage a certain sameness of the test environments.

regards, tom lane


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-15 16:33:54
Message-ID: 4dda7d5c-b7f7-f89a-792e-a52e72be71d3@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/15/2017 12:13 PM, Tom Lane wrote:
> Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> writes:
>> What I had in mind was the attached plus roughly this in the buildfarm
>> client:
>> $ENV{TZ} ||= 'US/Eastern';
>> or whatever zone we choose to use.
> How about letting the first "make check" run with whatever is in the
> environment, and then forcing TZ to become set (much as above) for
> all the remaining tests? I'm afraid what you've got here might
> encourage a certain sameness of the test environments.
>
>

Sure. Just means putting this code a bit later in the file. "make check"
is only one initdb, so it won't cost much. I'm still inclined to force a
TAP test for initdb with no TZ set, though.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-15 18:13:21
Message-ID: 8602.1492280001@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> writes:
> Sure. Just means putting this code a bit later in the file. "make check"
> is only one initdb, so it won't cost much. I'm still inclined to force a
> TAP test for initdb with no TZ set, though.

I'd like that to be an additional tweak for some existing test case
rather than expending a whole initdb run just for that --- but otherwise,
no objection.

regards, tom lane


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-15 20:43:17
Message-ID: e6a6d107-2d51-a17d-2758-9b35e802c2bc@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/15/2017 02:13 PM, Tom Lane wrote:
> Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> writes:
>> Sure. Just means putting this code a bit later in the file. "make check"
>> is only one initdb, so it won't cost much. I'm still inclined to force a
>> TAP test for initdb with no TZ set, though.
> I'd like that to be an additional tweak for some existing test case
> rather than expending a whole initdb run just for that --- but otherwise,
> no objection.
>
>

That's what my patch did.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-16 23:43:15
Message-ID: 1d15be07-f625-ce11-d4a9-2379cd0399a8@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 4/15/17 12:33, Andrew Dunstan wrote:
> Sure. Just means putting this code a bit later in the file. "make check"
> is only one initdb, so it won't cost much. I'm still inclined to force a
> TAP test for initdb with no TZ set, though.

How much is this going to buy overall? Is it worth the complications?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-16 23:48:36
Message-ID: cb56e19f-1e3e-00ba-cb98-af370167eb4b@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/16/2017 07:43 PM, Peter Eisentraut wrote:
> On 4/15/17 12:33, Andrew Dunstan wrote:
>> Sure. Just means putting this code a bit later in the file. "make check"
>> is only one initdb, so it won't cost much. I'm still inclined to force a
>> TAP test for initdb with no TZ set, though.
> How much is this going to buy overall? Is it worth the complications?
>

I don't know, but it's a very small change.

I am going to spend some time instrumenting where the time goes in
various tests.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: ilmari(at)ilmari(dot)org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=)
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers(at)postgreSQL(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-17 16:49:54
Message-ID: d8jpogbq8r1.fsf@dalvik.ping.uio.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> There's certainly lots more that could be done in the genbki code,
> but I think all we can justify at this stage of the development
> cycle is to get the low-hanging fruit for testing speedups.

I threw Devel::NYTProf at it and picked some more low-hanging fruit.
Attached are separate patches for each change, and here are the runtimes
of genbki.pl and Gen_fmgrtab.pl, respectively, after each patch
(averages of 5 runs, in millseconds):

master (b6dd1271): 355, 182

1: Avoid unnecessary regex captures: 349, 183
2: Avoid repeated calls to SplitDataLine: 316, 158
3: Inline SplitDataLine: 291, 141
4: Inline check_natts: 287, 141

Together they shave 68ms or 19.2% off the runtime of genbki.pl and 41ms
or 22.5% off the runtime of Gen_fmgrtab.pl

Finally, one non-performance patch, which just removes the use of
Exporter in Catalog.pm, since none of the users actually import anything
from it.

- ilmari
--
"I use RMS as a guide in the same way that a boat captain would use
a lighthouse. It's good to know where it is, but you generally
don't want to find yourself in the same spot." - Tollef Fog Heen

Attachment Content-Type Size
0001-Avoid-unnecessary-regex-captures-in-Catalog.pm.patch text/x-diff 1.2 KB
0002-Avoid-repeated-calls-to-Catalog-SplitDataLine.patch text/x-diff 3.2 KB
0003-Inline-SplitDataLines-into-its-only-caller.patch text/x-diff 2.8 KB
0004-Inline-check_nattrs-into-its-only-caller.patch text/x-diff 1.8 KB
0005-Remove-pointless-Exporter-usage-in-Catalog.pm.patch text/x-diff 988 bytes

From: Andres Freund <andres(at)anarazel(dot)de>
To: Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-17 16:53:15
Message-ID: 20170417165315.7jtyajt7unjp56ii@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-04-17 17:49:54 +0100, Dagfinn Ilmari Mannsåker wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> > There's certainly lots more that could be done in the genbki code,
> > but I think all we can justify at this stage of the development
> > cycle is to get the low-hanging fruit for testing speedups.
>
> I threw Devel::NYTProf at it and picked some more low-hanging fruit.
> Attached are separate patches for each change, and here are the runtimes
> of genbki.pl and Gen_fmgrtab.pl, respectively, after each patch
> (averages of 5 runs, in millseconds):
>
> master (b6dd1271): 355, 182
>
> 1: Avoid unnecessary regex captures: 349, 183
> 2: Avoid repeated calls to SplitDataLine: 316, 158
> 3: Inline SplitDataLine: 291, 141
> 4: Inline check_natts: 287, 141
>
> Together they shave 68ms or 19.2% off the runtime of genbki.pl and 41ms
> or 22.5% off the runtime of Gen_fmgrtab.pl

I'm a bit doubtful about improving the performance of genbki at the cost
of any sort of complication - it's only executed during the actual
build, not during initdb... I don't see much point in doing things like
3) and 4), it's just not worth it imo.

- Andres


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-17 17:09:33
Message-ID: 7985.1492448973@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2017-04-17 17:49:54 +0100, Dagfinn Ilmari Mannsåker wrote:
>> I threw Devel::NYTProf at it and picked some more low-hanging fruit.

> I'm a bit doubtful about improving the performance of genbki at the cost
> of any sort of complication - it's only executed during the actual
> build, not during initdb... I don't see much point in doing things like
> 3) and 4), it's just not worth it imo.

Yeah, it's only run once per build, or probably even less often than that
considering we only re-run it when the input files change. I could get
interested in another 20% reduction in initdb's time, but not genbki's.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-17 17:14:12
Message-ID: 20170417171412.hg6ap7vriaqulbcq@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017-04-17 17:49:54 +0100, Dagfinn Ilmari Mannsåker wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> > There's certainly lots more that could be done in the genbki code,
> > but I think all we can justify at this stage of the development
> > cycle is to get the low-hanging fruit for testing speedups.
>
> I threw Devel::NYTProf at it and picked some more low-hanging fruit.
> Attached are separate patches for each change, and here are the runtimes
> of genbki.pl and Gen_fmgrtab.pl, respectively, after each patch
> (averages of 5 runs, in millseconds):

Btw, I think Tom's "more that could be done" was referring more to doing
more upfront work, checking, easier input format, whatnot in the genbki,
not so much performance work... Tom, correct me if I'm wrong.

- Andres


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-17 17:20:55
Message-ID: 8478.1492449655@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> Btw, I think Tom's "more that could be done" was referring more to doing
> more upfront work, checking, easier input format, whatnot in the genbki,
> not so much performance work... Tom, correct me if I'm wrong.

Yeah, as per what we were just saying, performance of that code isn't
really an issue right now. Supporting a nicer input format is probably
the biggest step forward that could be taken, but that will require some
major work :-(. We've also talked about things like supporting
regprocedure rather than only regproc (ie disambiguating overloaded
functions), likewise regoperator, and so on, with an eye to reducing
the number of places where people have to write numeric OIDs in the
input. There's some threads on this in the archives, but I'm too
lazy to look.

regards, tom lane


From: ilmari(at)ilmari(dot)org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=)
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-18 09:19:52
Message-ID: d8jinm2qdhj.fsf@dalvik.ping.uio.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Andres Freund <andres(at)anarazel(dot)de> writes:
>> On 2017-04-17 17:49:54 +0100, Dagfinn Ilmari Mannsåker wrote:
>>> I threw Devel::NYTProf at it and picked some more low-hanging fruit.
>
>> I'm a bit doubtful about improving the performance of genbki at the cost
>> of any sort of complication - it's only executed during the actual
>> build, not during initdb... I don't see much point in doing things like
>> 3) and 4), it's just not worth it imo.
>
> Yeah, it's only run once per build, or probably even less often than that
> considering we only re-run it when the input files change. I could get
> interested in another 20% reduction in initdb's time, but not genbki's.

Fair enough. I still think 1), 2) and 5) are worthwile code cleanups
regardless of the performance impact. In fact, if we don't care that
much about the performance, we can move the duplicated code in
Gen_fmgrmtab.pl and genbki.pl that turns bki_values into a hash into
Catalogs.pm. That regresses genbki.pl time by ~30ms on my machine.

Revised patch series attached.

--
- Twitter seems more influential [than blogs] in the 'gets reported in
the mainstream press' sense at least. - Matt McLeod
- That'd be because the content of a tweet is easier to condense down
to a mainstream media article. - Calle Dybedahl

Attachment Content-Type Size
0001-Avoid-unnecessary-regex-captures-in-Catalog.pm.patch text/x-diff 1.2 KB
0002-Avoid-repeated-calls-to-Catalog-SplitDataLine.patch text/x-diff 3.2 KB
0003-Hashify-bki-values-in-Catlog-Catalogs.patch text/x-diff 4.8 KB
0004-Remove-pointless-Exporter-usage-in-Catalog.pm.patch text/x-diff 988 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ilmari(at)ilmari(dot)org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=)
Cc: John Naylor <jcnaylor(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-05-08 20:48:48
Message-ID: 7408.1525812528@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Resurrecting this old thread ...

I decided it'd be interesting to re-examine where initdb's runtime
is going, seeing that we just got done with a lot of bootstrap data
restructuring. I stuck some timing code into initdb, and got
results like this:

creating directory /home/postgres/testversion/data ... ok
elapsed = 0.256 msec
creating subdirectories ... ok
elapsed = 2.385 msec
selecting default max_connections ... 100
elapsed = 13.528 msec
selecting default shared_buffers ... 128MB
elapsed = 13.699 msec
selecting dynamic shared memory implementation ... posix
elapsed = 0.129 msec
elapsed = 281.335 msec in select_default_timezone
creating configuration files ... ok
elapsed = 1.319 msec
running bootstrap script ... ok
elapsed = 162.143 msec
performing post-bootstrap initialization ... ok
elapsed = 832.569 msec

Sync to disk skipped.

real 0m1.316s
user 0m0.941s
sys 0m0.395s

(I'm using "initdb -N" because the cost of the sync step is so
platform-dependent, and it's not interesting anyway for buildfarm
or make check-world testing. Also, I rearranged the code slightly
so that select_default_timezone could be timed separately from the
rest of the "creating configuration files" step.)

In trying to break down the "post-bootstrap initialization" step
a bit further, I soon realized that trying to time the sub-steps from
initdb is useless. initdb is just shoving bytes down the pipe as
fast as the kernel will let it; it has no idea how long it's taking
the backend to do any one query or queries. So I ended up adding
"-c log_statement=all -c log_min_duration_statement=0" to the
backend_options, and digging query durations out of the log output.
I got these totals for the major steps in the post-boot run:

pg_authid setup: 0.909 ms
pg_depend setup: 64.980 ms
system views: 106.221 ms
pg_description: 39.665 ms
pg_collation: 65.162 ms
conversions: 72.024 ms
text search: 29.454 ms
init-acl hacking: 14.339 ms
information schema: 188.497 ms
plpgsql: 2.531 ms
analyze/vacuum/additional db creation: 171.762 ms

So the conversions don't look nearly as interesting as Andreas
suggested upthread. Pushing them into .bki format would at best
save ~ 70 ms out of 1300. Which is not nothing, but it's not
going to change the world either.

Really the only thing here that jumps out as being unduly expensive for
what it's doing is select_default_timezone. That is, and always has been,
a brute-force algorithm; I wonder if there's a way to do better? We can
probably guess that every non-Windows platform is using the IANA timezone
data these days. If there were some way to extract the name of the active
timezone setting directly, we wouldn't have to try to reverse-engineer it.
But I don't know of any portable way :-(

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, John Naylor <jcnaylor(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-05-09 15:09:02
Message-ID: CA+TgmoYGgrLazC_eHfBsH2N2YDMhUi6nK_hsXgM5LFmgoSSYVg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 8, 2018 at 4:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Really the only thing here that jumps out as being unduly expensive for
> what it's doing is select_default_timezone. That is, and always has been,
> a brute-force algorithm; I wonder if there's a way to do better? We can
> probably guess that every non-Windows platform is using the IANA timezone
> data these days. If there were some way to extract the name of the active
> timezone setting directly, we wouldn't have to try to reverse-engineer it.
> But I don't know of any portable way :-(

Who says we need a portable way? If we had something that worked on
Linux and macOS, it would cover most developer environments. I wonder
if readlink("/etc/localtime", buf, sz) might be a viable approach.
You could, at least, try any time zone you can derive that way first,
before you try any others.

Also, how about having a --timezone option for initdb? Then you could
determine the time zone just once and pass it down to subsequent
initdb invocations. Or just hardcode the regression tests to use some
fixed time zone, like Antarctica/Troll. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, John Naylor <jcnaylor(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-05-09 15:18:10
Message-ID: 7191.1525879090@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, May 8, 2018 at 4:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Really the only thing here that jumps out as being unduly expensive for
>> what it's doing is select_default_timezone. That is, and always has been,
>> a brute-force algorithm; I wonder if there's a way to do better?

> Who says we need a portable way? If we had something that worked on
> Linux and macOS, it would cover most developer environments. I wonder
> if readlink("/etc/localtime", buf, sz) might be a viable approach.

I wondered about that, but I'm afraid it's often a hardlink not a
symlink. Still, we could try it.

> Also, how about having a --timezone option for initdb?

We already have that, it's called the TZ environment variable.

There was an effort awhile back to try to speed up the buildfarm
by having that get set automatically, but it failed for reasons
I don't recall ATM.

regards, tom lane


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, John Naylor <jcnaylor(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-05-09 15:39:24
Message-ID: 20180509153924.32kvslqeqdyco7c5@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Tue, May 8, 2018 at 4:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Really the only thing here that jumps out as being unduly expensive for
> >> what it's doing is select_default_timezone. That is, and always has been,
> >> a brute-force algorithm; I wonder if there's a way to do better?
>
> > Who says we need a portable way? If we had something that worked on
> > Linux and macOS, it would cover most developer environments. I wonder
> > if readlink("/etc/localtime", buf, sz) might be a viable approach.
>
> I wondered about that, but I'm afraid it's often a hardlink not a
> symlink. Still, we could try it.

In Debian systems, it's a symlink. Apparently in RHEL6 and older it's a
copy or hardlink, and the file /etc/sysconfig/clock contains a ZONE
variable that points to the right zone. Maybe if we add enough
platform-dependent hacks, we would use the slow fallback only for rare
cases. (Maybe have initdb emit a warning when the fallback is used, so
that we know what else to look for.)

This comment is insightful:
https://github.com/HowardHinnant/date/issues/269#issuecomment-353792132

It's talking about this code:
https://github.com/HowardHinnant/date/blob/master/src/tz.cpp#L3652

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, John Naylor <jcnaylor(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-05-09 15:44:47
Message-ID: CA+TgmoY0AtWhu6MO3qWJrbah1SeLwDqUNixCObkuwRYTi4u_JQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 9, 2018 at 11:39 AM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> In Debian systems, it's a symlink. Apparently in RHEL6 and older it's a
> copy or hardlink, and the file /etc/sysconfig/clock contains a ZONE
> variable that points to the right zone. Maybe if we add enough
> platform-dependent hacks, we would use the slow fallback only for rare
> cases. (Maybe have initdb emit a warning when the fallback is used, so
> that we know what else to look for.)

I just checked a couple of RHEL7 systems and it seems to be a symlink
there. It's also a symlink on my laptop (macOS 10.13.3).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, John Naylor <jcnaylor(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-05-09 15:56:13
Message-ID: 8873.1525881373@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> Tom Lane wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> Who says we need a portable way? If we had something that worked on
>>> Linux and macOS, it would cover most developer environments. I wonder
>>> if readlink("/etc/localtime", buf, sz) might be a viable approach.

>> I wondered about that, but I'm afraid it's often a hardlink not a
>> symlink. Still, we could try it.

> In Debian systems, it's a symlink. Apparently in RHEL6 and older it's a
> copy or hardlink, and the file /etc/sysconfig/clock contains a ZONE
> variable that points to the right zone.

Yeah, on my RHEL6 box,

$ ls -l /etc/localtime
-rw-r--r--. 1 root root 3519 May 4 2010 /etc/localtime

The mod date's a little astonishing, considering this system was built
in 2013. It is *not* a hardlink, or even an exact match, to anything
under /usr/share/zoneinfo, though perhaps it was originally. Also:

$ cat /etc/sysconfig/clock
# The time zone of the system is defined by the contents of /etc/localtime.
# This file is only for evaluation by system-config-date, do not rely on its
# contents elsewhere.
ZONE="America/New York"

I'm inclined to respect the comment, especially since I see they are not
spelling the zone name canonically anyway (note space not underscore);
so looking at this wouldn't work without some ill-defined heuristics.

However, more modern Red Hat systems seem to have /etc/localtime as
a symlink, so checking it would work there, and also macOS seems to
do it that way for as far back as I can check.

> This comment is insightful:
> https://github.com/HowardHinnant/date/issues/269#issuecomment-353792132
> It's talking about this code:
> https://github.com/HowardHinnant/date/blob/master/src/tz.cpp#L3652

Interesting. Not sure if we want to try all those files. But I'll
take a look at whipping up something that checks /etc/localtime.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, John Naylor <jcnaylor(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-05-10 16:18:19
Message-ID: 21138.1525969099@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> ... I'll
> take a look at whipping up something that checks /etc/localtime.

Here's a draft patch. It seems to do what I expect on a couple of
different macOS releases as well as recent Fedora. Googling finds
some suggestions that there might be other locations for the symlink,
for instance it's alleged that glibc can be configured to look at
/usr/local/etc/localtime instead of /etc/localtime. But AFAICS none
of the alternative locations are used on enough systems that it'd be
worth the cycles to check them. Likewise, there seem to be some
systems with conventions about storing the zone name in a text file
someplace, but not really enough of them to make it worth our time
to support that.

The initdb speedup on the other boxes I checked seems to be in the
vicinity of 50 ms out of circa-1-sec total, which is less than I'd have
expected from my measurements on my RHEL6 box :-(. Still, I think it's
worth doing, not least because (if it works) this fixes the problem that
you may get an unexpected alias for your system timezone, as in this
complaint awhile back:

/message-id/flat/4D7016AA.2090609%40agliodbs.com

I also changed initdb so that it reports what zone name it picked.
I have a vague recollection that it might've done that long ago, and
we suppressed it in a burst of enthusiasm about making initdb less
chatty. That seems like something to undo, since we're tweaking what
are still basically heuristic rules for choosing the zone.

Next question is what to do with this. Do we want to sit on it till
v12, or sneak it in now?

regards, tom lane

Attachment Content-Type Size
check-etc-localtime-symlink-for-zone-name-1.patch text/x-diff 9.3 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, John Naylor <jcnaylor(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-05-10 17:40:58
Message-ID: 20180510174057.lxvpwbgbbghz2zrm@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2018-05-10 12:18:19 -0400, Tom Lane wrote:
> Next question is what to do with this. Do we want to sit on it till
> v12, or sneak it in now?

Is there a decent argument for sneaking it in? I don't really have an
opinion. I don't think it'd really be arguable that this'll make testing
meaningfully faster. OTOH, it's fresh in your mind (which can be said
about a lot of patches obviously).

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, John Naylor <jcnaylor(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-05-10 19:37:04
Message-ID: 373.1525981024@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2018-05-10 12:18:19 -0400, Tom Lane wrote:
>> Next question is what to do with this. Do we want to sit on it till
>> v12, or sneak it in now?

> Is there a decent argument for sneaking it in? I don't really have an
> opinion. I don't think it'd really be arguable that this'll make testing
> meaningfully faster. OTOH, it's fresh in your mind (which can be said
> about a lot of patches obviously).

Yeah, I had hoped that this might make a noticeable difference on slower
buildfarm animals, but some testing shows that it's more likely to be
barely above the noise floor.

OTOH, in view of Josh's old gripe, maybe it could be argued to be a bug
fix, at least on platforms where it does anything.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, John Naylor <jcnaylor(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-05-10 20:10:18
Message-ID: CA+TgmoZzSJpxjteW=+2HD9SKtmMZJs-VqX+scaMg2HG1qA26pA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, May 10, 2018 at 3:37 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Yeah, I had hoped that this might make a noticeable difference on slower
> buildfarm animals, but some testing shows that it's more likely to be
> barely above the noise floor.

Any idea why?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, John Naylor <jcnaylor(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-05-10 21:08:37
Message-ID: 10623.1525986517@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, May 10, 2018 at 3:37 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Yeah, I had hoped that this might make a noticeable difference on slower
>> buildfarm animals, but some testing shows that it's more likely to be
>> barely above the noise floor.

> Any idea why?

Well, maybe the noise floor is too high. I experimented with "make check"
in src/bin/scripts (TAP tests enabled), which does 13 initdb's as of HEAD.
On dromedary's host (an older macOS release), 3 runs each:

HEAD: 1m47.017s 1m49.548s 1m46.805s
+patch: 1m47.188s 1m45.611s 1m51.520s

If you blame that last run on some background task chewing cycles, which
is certainly possible considering I'd neglected to turn off dromedary's
cron job, then there's some gain but not much. There *is* a clearly
visible win if you measure "initdb -N" in isolation,

HEAD: 0m5.244s 0m5.025s 0m4.930s
+patch: 0m4.785s 0m4.706s 0m4.824s

but it's just not much compared to the other overhead of a TAP test.
These numbers would support the idea that we're saving about 250 ms
per initdb, which times 13 is circa 3 sec, which is comparable to the
inter-run variation I'm seeing in the scripts test.

I have no doubt that it'd add up to something over a lot of buildfarm
runs, but it's certainly no sort of game-changer.

Running on my RHEL6 dev machine, where I customarily use PROVE_FLAGS='-j4'
for check-world runs, src/bin/scripts takes maybe 14.9s on average. The
patch itself does nothing for that machine of course, but if I simulate
its effect by setting the TZ environment variable, the runtime drops to
14.2s or so. So again, there'd be some win for developer testing, but
not enough to get anybody excited.

regards, tom lane


From: "Jonathan S(dot) Katz" <jonathan(dot)katz(at)excoventures(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, John Naylor <jcnaylor(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-05-10 21:37:07
Message-ID: DF42863C-BB66-48C6-BE68-CE6C928F23F3@excoventures.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> On May 10, 2018, at 12:37 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Andres Freund <andres(at)anarazel(dot)de> writes:
>> On 2018-05-10 12:18:19 -0400, Tom Lane wrote:
>>> Next question is what to do with this. Do we want to sit on it till
>>> v12, or sneak it in now?
>
>> Is there a decent argument for sneaking it in? I don't really have an
>> opinion. I don't think it'd really be arguable that this'll make testing
>> meaningfully faster. OTOH, it's fresh in your mind (which can be said
>> about a lot of patches obviously).
>
> Yeah, I had hoped that this might make a noticeable difference on slower
> buildfarm animals, but some testing shows that it's more likely to be
> barely above the noise floor.
>
> OTOH, in view of Josh's old gripe, maybe it could be argued to be a bug
> fix, at least on platforms where it does anything.

Read back to get some history/context on this, and from my vantage
point it sounds like this is fixing a bug (i.e. incorrect behavior). It also sounds
like based on the changes the earliest we’d be able to commit is is 11 and
not any further because people could be expecting the incorrect behavior to
happen, and thus we may break existing systems.

Jonathan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jonathan S(dot) Katz" <jonathan(dot)katz(at)excoventures(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, John Naylor <jcnaylor(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-05-10 21:42:48
Message-ID: 13499.1525988568@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Jonathan S. Katz" <jonathan(dot)katz(at)excoventures(dot)com> writes:
>> On May 10, 2018, at 12:37 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> OTOH, in view of Josh's old gripe, maybe it could be argued to be a bug
>> fix, at least on platforms where it does anything.

> Read back to get some history/context on this, and from my vantage
> point it sounds like this is fixing a bug (i.e. incorrect behavior). It also sounds
> like based on the changes the earliest we’d be able to commit is is 11 and
> not any further because people could be expecting the incorrect behavior to
> happen, and thus we may break existing systems.

Yeah, given the small number of complaints, I doubt back-patching would
be a good idea.

Seems like we should just leave this for v12.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-08-10 18:31:25
Message-ID: 25428.1533925885@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
>> ... I'll
>> take a look at whipping up something that checks /etc/localtime.

> Here's a draft patch. It seems to do what I expect on a couple of
> different macOS releases as well as recent Fedora.

The cfbot points out that this has suffered bit-rot, so here's a rebased
version --- no substantive changes.

regards, tom lane

Attachment Content-Type Size
check-etc-localtime-symlink-for-zone-name-2.patch text/x-diff 11.8 KB

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-09-12 07:38:30
Message-ID: 20180912073701.GM25160@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Aug 10, 2018 at 02:31:25PM -0400, Tom Lane wrote:
> The cfbot points out that this has suffered bit-rot, so here's a rebased
> version --- no substantive changes.

+ /*
+ * Try to read the symlink. If not there, not a symlink, etc etc, just
+ * quietly fail; the precise reason needn't concern us.
+ */
+ len = readlink(linkname, link_target, sizeof(link_target));

One thing that I can see changing with this patch is how timezone is set
in postgresql.conf. For example, on HEAD I get 'Japan' while this patch
gives back 'Asia/Tokyo'. Could it be an issue for countries with
multiple timezones? I am not sure how Russian systems would react on
that for example.

The time cut for initdb is interesting for buildfarm members anyway, and
the patch looks in nice shape to me.
--
Michael


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-09-12 14:00:21
Message-ID: 949.1536760821@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Paquier <michael(at)paquier(dot)xyz> writes:
> One thing that I can see changing with this patch is how timezone is set
> in postgresql.conf. For example, on HEAD I get 'Japan' while this patch
> gives back 'Asia/Tokyo'. Could it be an issue for countries with
> multiple timezones? I am not sure how Russian systems would react on
> that for example.

Interesting --- what platform were you testing on?

I believe that this patch will never make for any functional change,
it will only give you some other alias for the zone it would have
selected anyway. This could only fail to be true if there are
distinct timezones that score_timezone() is failing to tell apart,
which would be a bug in score_timezone, not this patch. (Presumably,
we could fix any such bug by increasing the number of dates that
score_timezone tests.)

Since the tzdb database is rather full of aliases, for instance the
one you mentioned

$ grep ^Li src/timezone/data/tzdata.zi
...
Li Asia/Tokyo Japan
...

there is certainly plenty of opportunity for this to change the
apparent value of TimeZone. But I think it's for the better:
instead of choosing an alias that happens to be first according
to some unspecified search order, it will choose the alias that
somebody actually configured the operating system with.

regards, tom lane


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-09-12 22:44:08
Message-ID: 20180912224408.GB1387@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Sep 12, 2018 at 10:00:21AM -0400, Tom Lane wrote:
> Michael Paquier <michael(at)paquier(dot)xyz> writes:
>> One thing that I can see changing with this patch is how timezone is set
>> in postgresql.conf. For example, on HEAD I get 'Japan' while this patch
>> gives back 'Asia/Tokyo'. Could it be an issue for countries with
>> multiple timezones? I am not sure how Russian systems would react on
>> that for example.
>
> Interesting --- what platform were you testing on?

Debian SID, with Asia/Tokyo used for /etc/localtime.

> I believe that this patch will never make for any functional change,
> it will only give you some other alias for the zone it would have
> selected anyway. This could only fail to be true if there are
> distinct timezones that score_timezone() is failing to tell apart,
> which would be a bug in score_timezone, not this patch. (Presumably,
> we could fix any such bug by increasing the number of dates that
> score_timezone tests.)

Looking at the list of aliases, I am not seeing listed countries running
across multiple timezones, so that may be fine.. Anyway, your change,
and particularly the cut of time in running initdb, which matters for
TAP tests, are definitely good things in my opinion. So I am switching
that as ready for committer.
--
Michael


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-09-12 23:04:53
Message-ID: 20180912230453.mpm6qtvaj2p5des5@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2018-Sep-13, Michael Paquier wrote:

> Looking at the list of aliases, I am not seeing listed countries running
> across multiple timezones, so that may be fine..

Well, mine is there, and it's correct:

Li America/Santiago Chile/Continental
Li Pacific/Easter Chile/EasterIsland

Laugh all you want about Chile of all countries having multiple
timezones ...

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-09-12 23:20:33
Message-ID: 29346.1536794433@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Paquier <michael(at)paquier(dot)xyz> writes:
> On Wed, Sep 12, 2018 at 10:00:21AM -0400, Tom Lane wrote:
>> I believe that this patch will never make for any functional change,
>> it will only give you some other alias for the zone it would have
>> selected anyway.

> Looking at the list of aliases, I am not seeing listed countries running
> across multiple timezones, so that may be fine.

Not sure what you're worried about. "Linked" time zones are the *same
data*. In an installed tzdb tree, the Japan file is either a hardlink or
symlink to the Asia/Tokyo one, so they can't differ. What you seem to be
speculating about is actual errors in the tzdb data, ie not describing
the facts on the ground in particular places. That's possible I suppose
but it's hardly our problem if it happens; it'd be theirs to fix.

regards, tom lane


From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-09-13 02:55:02
Message-ID: CAEepm=2ryg8rH-nrkCxV9Gy+=8FvGY_DRe9t_GT2NTcuO2qXQg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 13, 2018 at 11:20 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Michael Paquier <michael(at)paquier(dot)xyz> writes:
> > On Wed, Sep 12, 2018 at 10:00:21AM -0400, Tom Lane wrote:
> >> I believe that this patch will never make for any functional change,
> >> it will only give you some other alias for the zone it would have
> >> selected anyway.
>
> > Looking at the list of aliases, I am not seeing listed countries running
> > across multiple timezones, so that may be fine.
>
> Not sure what you're worried about. "Linked" time zones are the *same
> data*. In an installed tzdb tree, the Japan file is either a hardlink or
> symlink to the Asia/Tokyo one, so they can't differ. What you seem to be
> speculating about is actual errors in the tzdb data, ie not describing
> the facts on the ground in particular places. That's possible I suppose
> but it's hardly our problem if it happens; it'd be theirs to fix.

I tested this on a system where /etc/localtime is not a symlink
(FreeBSD) and it worked fine, falling back to the old behaviour and
finding my timezone. (Apparently the argument against a symlink
/etc/localtime -> /usr/share/tzinfo/... is that new processes would
effectively switch timezone after /usr is mounted so your boot logs
would be mixed up. Or something. I bet that actually happens on
Linux too, but maybe no one does /usr as a mount point anymore...?)

I noticed that the patch does a bunch of s/Olson/IANA/. That leaves
only one place in the tree that still refers to the "Olson" database:
dt_common.c. Might want to change that too?

--
Thomas Munro
http://www.enterprisedb.com


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-09-13 03:59:36
Message-ID: 20180913035936.GG3578@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Sep 12, 2018 at 08:04:53PM -0300, Alvaro Herrera wrote:
> Well, mine is there, and it's correct:
>
> Li America/Santiago Chile/Continental
> Li Pacific/Easter Chile/EasterIsland
>
> Laugh all you want about Chile of all countries having multiple
> timezones ...

Impressed is a better word. Really impressed seeing the shape of the
country ;)
--
Michael


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-09-13 14:18:49
Message-ID: 11007.1536848329@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> I noticed that the patch does a bunch of s/Olson/IANA/. That leaves
> only one place in the tree that still refers to the "Olson" database:
> dt_common.c. Might want to change that too?

Ah, I didn't realize we were that close to wiping out the old
terminology. Yeah, I'll get that one too. Thanks for noticing.

regards, tom lane