Re: move collation import to backend

Lists: Postg메이저 토토 사이트SQL
From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: move collation import to backend
Date: 2016-10-28 01:56:53
Message-ID: b78a8524-0412-5605-54d1-972b481eb16d@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Currently, initdb parses locale -a output to populate pg_collation. If
additional collations are installed in the operating system, it is not
possible to repeat this process, only by doing each step manually. So I
propose to move this to a backend function that can be called
separately, and have initdb call that. Running this logic in the
backend instead of initdb also makes the code simpler. If we add other
collation providers such as ICU, initdb doesn't need to know about that
at all, because all the logic would be contained in the backend.

Here is an example:

select pg_import_system_collations(if_not_exists => false, schema =>
'test');

(Specifying the schema also allows testing this without overwriting
pg_catalog.)

I thought about making this a top-level command (IMPORT COLLATIONS ...
?) but decided against it for now, to keep it simple. Right now, this
is more of a refactoring. Documentation could be added if we decide so.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-Add-function-to-import-operation-system-collations.patch text/x-patch 13.6 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: move collation import to backend
Date: 2016-11-12 15:38:30
Message-ID: 20161112153830.hy4pbjcf4jbrbr6q@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2016-10-27 21:56:53 -0400, Peter Eisentraut wrote:
> Currently, initdb parses locale -a output to populate pg_collation. If
> additional collations are installed in the operating system, it is not
> possible to repeat this process, only by doing each step manually. So I
> propose to move this to a backend function that can be called
> separately, and have initdb call that. Running this logic in the
> backend instead of initdb also makes the code simpler. If we add other
> collation providers such as ICU, initdb doesn't need to know about that
> at all, because all the logic would be contained in the backend.

That generally sounds like a good idea. There's some questions imo:
E.g. what if previously present collations are now unavailable?

> I thought about making this a top-level command (IMPORT COLLATIONS ...
> ?) but decided against it for now, to keep it simple.

Seems ok to me.

>
> /*
> * Also forbid matching an any-encoding entry. This test of course is not
> * backed up by the unique index, but it's not a problem since we don't
> * support adding any-encoding entries after initdb.
> */

Note that this isn't true anymore...

> +
> +Datum pg_import_system_collations(PG_FUNCTION_ARGS);
> +
> +Datum
> +pg_import_system_collations(PG_FUNCTION_ARGS)
> +{

Uh?

> + bool if_not_exists = PG_GETARG_BOOL(0);
> + Oid nspid = PG_GETARG_OID(1);
> +
> + FILE *locale_a_handle;
> + char localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
> + int count = 0;
> +
> + locale_a_handle = OpenPipeStream("locale -a", "r");
> + if (locale_a_handle == NULL)
> + ereport(ERROR,
> + (errcode_for_file_access(),
> + errmsg("could not execute command \"%s\": %m",
> + "locale -a")));

This function needs to have !superuser permissions revoked, which it
afaics currently hasn't.

Greetings,

Andres Freund


From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: move collation import to backend
Date: 2016-11-29 17:16:37
Message-ID: 5e72e41b-4f62-4bab-f367-7e26c82add70@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/12/16 10:38 AM, Andres Freund wrote:
> E.g. what if previously present collations are now unavailable?

You get an error message when you try to use the collation. I think
that is a different class of problems.

>>
>> /*
>> * Also forbid matching an any-encoding entry. This test of course is not
>> * backed up by the unique index, but it's not a problem since we don't
>> * support adding any-encoding entries after initdb.
>> */
>
> Note that this isn't true anymore...

I think this is still correct, because the collation import does not
produce any any-encoding entries (collencoding = -1).

>> +
>> +Datum pg_import_system_collations(PG_FUNCTION_ARGS);
>> +
>> +Datum
>> +pg_import_system_collations(PG_FUNCTION_ARGS)
>> +{
>
> Uh?

Required to avoid compiler warning about missing prototype.

> This function needs to have !superuser permissions revoked, which it
> afaics currently hasn't.

Done.

New patch attached (includes OID change because of conflict).

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
v2-0001-Add-function-to-import-operation-system-collation.patch text/x-patch 13.7 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: move collation import to backend
Date: 2016-11-29 19:53:35
Message-ID: 20161129195335.veosnf5rbrdmh346@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2016-11-29 12:16:37 -0500, Peter Eisentraut wrote:
> On 11/12/16 10:38 AM, Andres Freund wrote:
> >> /*
> >> * Also forbid matching an any-encoding entry. This test of course is not
> >> * backed up by the unique index, but it's not a problem since we don't
> >> * support adding any-encoding entries after initdb.
> >> */
> >
> > Note that this isn't true anymore...
>
> I think this is still correct, because the collation import does not
> produce any any-encoding entries (collencoding = -1).

Well, the comment "don't support adding any-encoding entries after
initdb." is now wrong.

> >> +
> >> +Datum pg_import_system_collations(PG_FUNCTION_ARGS);
> >> +
> >> +Datum
> >> +pg_import_system_collations(PG_FUNCTION_ARGS)
> >> +{
> >
> > Uh?
>
> Required to avoid compiler warning about missing prototype.

It seems not to be project style to have prototypes in the middle of the
file...

- Andres


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: move collation import to backend
Date: 2016-11-29 20:10:26
Message-ID: 12023.1480450226@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2016-11-29 12:16:37 -0500, Peter Eisentraut wrote:
>> Required to avoid compiler warning about missing prototype.

> It seems not to be project style to have prototypes in the middle of the
> file...

I agree. Please put that in builtins.h, if you can't find any better
header for it.

regards, tom lane


From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: move collation import to backend
Date: 2016-11-30 13:18:20
Message-ID: 089bb119-cd29-9aa3-8271-8a7ceead5554@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/29/16 2:53 PM, Andres Freund wrote:
> On 2016-11-29 12:16:37 -0500, Peter Eisentraut wrote:
>> On 11/12/16 10:38 AM, Andres Freund wrote:
>>>> /*
>>>> * Also forbid matching an any-encoding entry. This test of course is not
>>>> * backed up by the unique index, but it's not a problem since we don't
>>>> * support adding any-encoding entries after initdb.
>>>> */
>>>
>>> Note that this isn't true anymore...
>>
>> I think this is still correct, because the collation import does not
>> produce any any-encoding entries (collencoding = -1).
>
> Well, the comment "don't support adding any-encoding entries after
> initdb." is now wrong.

I think there is a misunderstanding. The comment says that we don't
support adding encodings that have collencoding = -1 after initdb. That
is still true. Note that the original comment as two "any"'s. With
this patch, we would now support adding collations with collencoding <>
-1 after initdb.

>
>>>> +
>>>> +Datum pg_import_system_collations(PG_FUNCTION_ARGS);
>>>> +
>>>> +Datum
>>>> +pg_import_system_collations(PG_FUNCTION_ARGS)
>>>> +{
>>>
>>> Uh?
>>
>> Required to avoid compiler warning about missing prototype.
>
> It seems not to be project style to have prototypes in the middle of the
> file...

OK, will fix.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: move collation import to backend
Date: 2016-12-05 05:33:27
Message-ID: CAJrrPGe8PB4xzOxvL_JR10fD7m+3CDKLJQLJbKYr3G6dbxQumQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 1, 2016 at 12:18 AM, Peter Eisentraut <
peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:

>
> >
> >>>> +
> >>>> +Datum pg_import_system_collations(PG_FUNCTION_ARGS);
> >>>> +
> >>>> +Datum
> >>>> +pg_import_system_collations(PG_FUNCTION_ARGS)
> >>>> +{
> >>>
> >>> Uh?
> >>
> >> Required to avoid compiler warning about missing prototype.
> >
> > It seems not to be project style to have prototypes in the middle of the
> > file...
>
> OK, will fix.
>

Moved to next CF with "waiting on author" status.

Regards,
Hari Babu
Fujitsu Australia


From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: move collation import to backend
Date: 2016-12-18 21:30:51
Message-ID: 2d35a7b7-9918-5dde-8fe3-f505f5557bc2@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/30/16 8:18 AM, Peter Eisentraut wrote:
>> It seems not to be project style to have prototypes in the middle of the
>> file...
>
> OK, will fix.

Updated patch with that fix.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
v3-0001-Add-function-to-import-operation-system-collation.patch text/x-patch 14.2 KB

From: Euler Taveira <euler(at)timbira(dot)com(dot)br>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: move collation import to backend
Date: 2017-01-10 03:04:25
Message-ID: 2f9dcce1-e97b-ca51-6181-10b19013fcd5@timbira.com.br
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 18-12-2016 18:30, Peter Eisentraut wrote:
> Updated patch with that fix.
>
Peter, I reviewed and improved your patch.

* I document the new function. Since collation is a database object, I
chose "Database Object Management Functions" section.
* I've added a check to any-encoding database because I got 'FATAL:
collation "C" already exists' on Debian 8, although, I didn't get on
CentOS 7. The problem is that Debian has two locales for C (C and
C.utf-8) and CentOS has just one (C).
* I've added OidIsValid to test the new collation row.
* I've changed the parameter order. Schema seems more important than
if_not_exists. Also, we generally leave those boolean parameters for the
end of list. I don't turn if_not_exists optional but IMO it would be a
good idea (default = true).
* You removed some #if and #ifdef while moving things around. I put it back.
* You didn't pgident some lines of code but I'm sure you didn't for a
small patch footprint.
* I didn't test on Windows.
* As a last comment, you set cost = 100 and it seems reasonable because
it lasts 411 ms to scan/load 923 collations in my slow VM. However,
successive executions takes ~1200 ms.

I'm attaching the complete and also a patch at the top of your last patch.

--
Euler Taveira Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Attachment Content-Type Size
collation-import-fixes.diff text/x-patch 5.0 KB
collation-import-complete.diff text/x-patch 15.6 KB

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Euler Taveira <euler(at)timbira(dot)com(dot)br>, Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: move collation import to backend
Date: 2017-01-17 17:05:33
Message-ID: 08ef05ba-d8c5-192f-9c59-17d6b1ec9b10@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 1/9/17 10:04 PM, Euler Taveira wrote:
> On 18-12-2016 18:30, Peter Eisentraut wrote:
>> Updated patch with that fix.
>>
> Peter, I reviewed and improved your patch.
>
> * I document the new function. Since collation is a database object, I
> chose "Database Object Management Functions" section.

OK

> * I've added a check to any-encoding database because I got 'FATAL:
> collation "C" already exists' on Debian 8, although, I didn't get on
> CentOS 7. The problem is that Debian has two locales for C (C and
> C.utf-8) and CentOS has just one (C).

OK

> * I've added OidIsValid to test the new collation row.

OK

> * I've changed the parameter order. Schema seems more important than
> if_not_exists. Also, we generally leave those boolean parameters for the
> end of list. I don't turn if_not_exists optional but IMO it would be a
> good idea (default = true).

I put them that way because in an SQL command the "IF NOT EXISTS" comes
before the schema, but I can see how it is weird that way in a function.

> * You removed some #if and #ifdef while moving things around. I put it back.
> * You didn't pgident some lines of code but I'm sure you didn't for a
> small patch footprint.

I had left the #if in initdb, but I think your changes are better.

> I'm attaching the complete and also a patch at the top of your last patch.

Thanks. If there are no more comments, I will proceed with that.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Euler Taveira <euler(at)timbira(dot)com(dot)br>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: move collation import to backend
Date: 2017-01-18 15:56:56
Message-ID: CAMkU=1z26qS-j9jfBPUzaz4xGPbaUiUypy_6-7RozExOxG4Byg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg메이저 토토 사이트SQL

On Tue, Jan 17, 2017 at 9:05 AM, Peter Eisentraut <
peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:

> On 1/9/17 10:04 PM, Euler Taveira wrote:
> > On 18-12-2016 18:30, Peter Eisentraut wrote:
> >> Updated patch with that fix.
> >>
> > Peter, I reviewed and improved your patch.
> >
> > * I document the new function. Since collation is a database object, I
> > chose "Database Object Management Functions" section.
>
> OK
>
> > * I've added a check to any-encoding database because I got 'FATAL:
> > collation "C" already exists' on Debian 8, although, I didn't get on
> > CentOS 7. The problem is that Debian has two locales for C (C and
> > C.utf-8) and CentOS has just one (C).
>
> OK
>
> > * I've added OidIsValid to test the new collation row.
>
> OK
>
> > * I've changed the parameter order. Schema seems more important than
> > if_not_exists. Also, we generally leave those boolean parameters for the
> > end of list. I don't turn if_not_exists optional but IMO it would be a
> > good idea (default = true).
>
> I put them that way because in an SQL command the "IF NOT EXISTS" comes
> before the schema, but I can see how it is weird that way in a function.
>
> > * You removed some #if and #ifdef while moving things around. I put it
> back.
> > * You didn't pgident some lines of code but I'm sure you didn't for a
> > small patch footprint.
>
> I had left the #if in initdb, but I think your changes are better.
>
> > I'm attaching the complete and also a patch at the top of your last
> patch.
>
> Thanks. If there are no more comments, I will proceed with that.
>
>
With this commit, I'm getting 'make check' fail at initdb with the error:

2017-01-18 07:47:50.565 PST [43691] FATAL: collation "aa_ER(at)saaho" for
encoding "UTF8" already exists
2017-01-18 07:47:50.565 PST [43691] STATEMENT: SELECT
pg_import_system_collations(if_not_exists => false, schema => 'pg_catalog');

My system:

CentOS release 6.8 (Final)
gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC)

./configure > /dev/null # no options

$ locale -a|fgrep aa_ER
aa_ER
aa_ER.utf8
aa_ER(dot)utf8(at)saaho
aa_ER(at)saaho

I have no idea what @ means in a locale, I'm just relaying the information.

Cheers,

Jeff


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Euler Taveira <euler(at)timbira(dot)com(dot)br>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: move collation import to backend
Date: 2017-01-18 16:12:13
Message-ID: 16636.1484755933@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> With this commit, I'm getting 'make check' fail at initdb with the error:

> 2017-01-18 07:47:50.565 PST [43691] FATAL: collation "aa_ER(at)saaho" for
> encoding "UTF8" already exists

Yeah, so are large chunks of the buildfarm. Having now read the patch,
I see that the problem is that it simply ignored the de-duplication
logic that existed in initdb's implementation. That was put there
on the basis of bitter experience, as I recall.

The new code seems to think it's sufficient to do an "if not exists"
insertion when generating abbreviated names, but that's wrong, and
even if it avoided outright failures, it would be nondeterministic
(I doubt "locale -a" is guaranteed to emit locale names in any
particular order).

I think this needs to be reverted pending redesign of the de-duplication
coding.

regards, tom lane