Re: [BUGS] Re: [BUGS] Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.

Lists: pgsql-bugs
From: asimantov(at)tableausoftware(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2014-09-15 23:04:27
Message-ID: 20140915230427.2486.29437@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 11431
Logged by: Alon Siman-Tov
Email address: asimantov(at)tableausoftware(dot)com
PostgreSQL version: 9.3.5
Operating system: Windows 7 , Windows 2008
Description:

I am failing to backup/restore a Norwegian-Bokmål database on Windows,
whether the cluster was initialized with English US or Norwegian Bokmål
locale.

The database created within this cluster, has the following locale
settings:
LC_COLLATE = ''
LC_CTYPE = ''

I have backed-up the database using the pg_dump command:
pg_dump.exe -h localhost -p 5432 -U pgadmin -F c -Z 0 -f
"C:/test/bokmaldb.pg_dump" bokmaldb

Then try restoring the pg_dump file using:
pg_restore.exe -h localhost -p 5432 -U pgadmin -C -d
postgres -j 4 "C:/test/bokmaldb.pg_dump"

Fails with this error:
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 1928; 1262
16384 DATABASE bokmaldb pgadmin
pg_restore: [archiver (db)] could not execute query: ERROR:
invalid byte sequence for encoding "UTF8": 0xe5 0x6c 0x29
Command was: CREATE DATABASE bokmaldb WITH TEMPLATE = template0 ENCODING
= 'UTF8' LC_COLLATE = 'Norwegian (Bokmål)_Norway.1252' LC_CTYPE...

I have tested and failed using the following Windows postgres versions:
9.3.5
9.1.14
9.0.18

Please investigate why Norwegian-Bokmål database cannot be backed-up and
restored.

Set of commands I ran:
initdb.exe -D c:/test/pgsql/data -E UTF8
--locale="Norwegian (Bokmål)_Norway.1252" -A trust -U pgadmin
pg_ctl -D "c:/test/pgsql/data" -l
c:/test/bokmalLocalenblocalePG.log start
psql.exe -h localhost -p 5432 -d postgres -U pgadmin -c
"CREATE DATABASE bokmaldb;"
pg_dump.exe -h localhost -p 5432 -U pgadmin -F c -Z 0 -f
"C:/test/bokmaldb.pg_dump" bokmaldb
psql.exe -h localhost -p 5432 -d postgres -U pgadmin -c
"DROP DATABASE bokmaldb;"
pg_restore.exe -h localhost -p 5432 -U pgadmin -C -d
postgres -j 4 "C:/test/bokmaldb.pg_dump"

Output of running restore:

C:\Program Files\PostgreSQL\9.3\bin>pg_restore.exe -h
localhost -p 5432 -U pgadmin -C -d postgres -j 4 "C:/test/bokmaldb.pg_dump
"
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 1928; 1262
16384 DATABASE bokmaldb pgadmin
pg_restore: [archiver (db)] could not execute query: ERROR:
invalid byte sequence for encoding "UTF8": 0xe5 0x6c 0x29
Command was: CREATE DATABASE bokmaldb WITH
TEMPLATE = template0 ENCODING = 'UTF8' LC_COLLATE = 'Norwegian
(Bokmål)_Norway.1252' LC_CTYPE...
pg_restore: [archiver (db)] could not execute query: ERROR:
database "bokmaldb" does not exist
Command was: ALTER DATABASE bokmaldb OWNER
TO pgadmin;

pg_restore: [archiver (db)] could not reconnect to database:
FATAL: database "bokmaldb" does not exist

NOTE: running the following set of commands, that restore database with
Swedish locale works fine:
initdb.exe -D c:/test/pgsql/data -E UTF8 --locale="Swedish_SWeden.1252" -A
trust -U pgadmin
pg_ctl -D "c:/test/pgsql/data" -l c:/test/swdb.log start
psql.exe -h localhost -p 5432 -d postgres -U pgadmin -c "CREATE DATABASE
swdb;"
pg_dump.exe -h localhost -p 5432 -U pgadmin -F c -Z 0 -f
"C:/test/swdb.pg_dump" swdb
psql.exe -h localhost -p 5432 -d postgres -U pgadmin -c "DROP DATABASE
swdb;"
pg_restore.exe -h localhost -p 5432 -U pgadmin -C -d postgres -j 4
"C:/test/swdb.pg_dump"

** The database is restored with following settings:

CREATE DATABASE swdb
WITH OWNER = pgadmin
ENCODING = 'UTF8'
TABLESPACE = pg_default
LC_COLLATE = 'Swedish_Sweden.1252'
LC_CTYPE = 'Swedish_Sweden.1252'
CONNECTION LIMIT = -1;


From: Alon <asimantov(at)tableausoftware(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2014-09-19 22:15:53
Message-ID: 1411164953375-5819745.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

The pg_dump file contains this command:
CREATE DATABASE workgroup WITH TEMPLATE = template0 ENCODING = 'UTF8'
LC_COLLATE = 'Norwegian (Bokmål)_Norway.1252' LC_CTYPE = 'Norwegian
(Bokmål)_Norway.1252';

The UTF16 encoding for ål) [a-ring l parenthesis] is
00e5 006c 0029

In UTF8 this set of characters encoded as:
c3 a5 6c 29

The a-ring is converted to two bytes while the others are one.

Based on the ERROR:
invalid byte sequence for encoding "UTF8": 0xe5 0x6c 0x29

It appears the set of characters is getting passed as:
e5 6c 29

In UTF8, e5 is always the start of a three byte character,possibly
pg_restore, ceratedb or else, tries to read these bytes as a single
character.
However, 6c and 29 can only be single byte characters, they can't be the
next two bytes in a three byte character. Hence the failure.
Seems like in the code, the 00xe5 is converted to e5 instead of 'c3 a5' when
passing the LC_COLLATE and LC_CTYPE values.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/BUG-11431-Failing-to-backup-and-restore-a-Windows-postgres-database-with-Norwegian-Bokm-l-locale-tp5819260p5819745.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.


From: Noah Misch <noah(at)leadboat(dot)com>
To: Alon <asimantov(at)tableausoftware(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2014-09-21 05:18:46
Message-ID: 20140921051846.GA1565935@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Fri, Sep 19, 2014 at 03:15:53PM -0700, Alon wrote:
> The pg_dump file contains this command:
> CREATE DATABASE workgroup WITH TEMPLATE = template0 ENCODING = 'UTF8'
> LC_COLLATE = 'Norwegian (Bokmål)_Norway.1252' LC_CTYPE = 'Norwegian
> (Bokmål)_Norway.1252';
>
> The UTF16 encoding for ål) [a-ring l parenthesis] is
> 00e5 006c 0029
>
> In UTF8 this set of characters encoded as:
> c3 a5 6c 29
>
> The a-ring is converted to two bytes while the others are one.
>
> Based on the ERROR:
> invalid byte sequence for encoding "UTF8": 0xe5 0x6c 0x29
>
> It appears the set of characters is getting passed as:
> e5 6c 29
>
> In UTF8, e5 is always the start of a three byte character,possibly
> pg_restore, ceratedb or else, tries to read these bytes as a single
> character.
> However, 6c and 29 can only be single byte characters, they can't be the
> next two bytes in a three byte character. Hence the failure.
> Seems like in the code, the 00xe5 is converted to e5 instead of 'c3 a5' when
> passing the LC_COLLATE and LC_CTYPE values.

In WIN1252, "e5 6c 29" is "ål)". We're likely failing to set client_encoding
at some essential point in the process.


From: John R Pierce <pierce(at)hogranch(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: [BUGS] Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2014-09-21 05:31:36
Message-ID: 541E62B8.2050607@hogranch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg범퍼카 토토SQL : Postg범퍼카 토토SQL 메일 링리스트 : 2014-09-21 이후 PGSQL-BUGS. 05:31

On 9/20/2014 10:18 PM, Noah Misch wrote:
> On Fri, Sep 19, 2014 at 03:15:53PM -0700, Alon wrote:
>> >The pg_dump file contains this command:
>> >CREATE DATABASE workgroup WITH TEMPLATE = template0 ENCODING = 'UTF8'
>> >LC_COLLATE = 'Norwegian (Bokmål)_Norway.1252' LC_CTYPE = 'Norwegian
>> >(Bokmål)_Norway.1252';...
> In WIN1252, "e5 6c 29" is "ål)". We're likely failing to set client_encoding
> at some essential point in the process.

is it a problem that ENCODING=utf8 is being used with the collation and
char typing of Norwegian windows-1252 ?

--
john r pierce 37N 122W
somewhere on the middle of the left coast


From: Alon <asimantov(at)tableausoftware(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2014-09-21 13:37:06
Message-ID: 1411306626409-5819836.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 커뮤니티SQL : Postg토토 커뮤니티SQL 메일 링리스트 : 2014-09-21 이후 PGSQL-BUGS 13:37

Yes, the Encoding might be the source of the problem, once initialized the
cluster with WINDOWS1252 as following:

initdb.exe -D c:/test/pgsql/data -E WINDOWS1252
--locale="Norwegian (Bokmål)_Norway.1252" -A trust -U pgadmin
pg_ctl -D "c:/test/pgsql/data" -l
c:/test/bokmalLocalenblocalePG.log start

The cluster created with these settings:
ENCODING = 'WIN1252'
TABLESPACE = pg_default
LC_COLLATE = 'Norwegian (Bokmål)_Norway.1252'
LC_CTYPE = 'Norwegian (Bokmål)_Norway.1252'

Then pg_backup will create a proper pg_dump file, that can be restored into
database with a Norwegian (Bokmål)_Norway.1252 locale.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/BUG-11431-Failing-to-backup-and-restore-a-Windows-postgres-database-with-Norwegian-Bokm-l-locale-tp5819260p5819836.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.


From: Alon <asimantov(at)tableausoftware(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: RE: Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2014-09-21 13:38:39
Message-ID: 1854234292D9BF4AABF6CCFF34ED59C02734B449@S1P5DAG8E.EXCHPROD.USA.NET
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Yes, the Encoding might be the source of the problem, once initialized the cluster with WINDOWS1252 as following:

initdb.exe -D c:/test/pgsql/data -E WINDOWS1252
--locale="Norwegian (Bokmål)_Norway.1252" -A trust -U pgadmin
pg_ctl -D "c:/test/pgsql/data" -l
c:/test/bokmalLocalenblocalePG.log start

The cluster created with these settings:
ENCODING = 'WIN1252'
TABLESPACE = pg_default
LC_COLLATE = 'Norwegian (Bokmål)_Norway.1252'
LC_CTYPE = 'Norwegian (Bokmål)_Norway.1252'

Then pg_backup will create a proper pg_dump file, that can be restored into database with a Norwegian (Bokmål)_Norway.1252 locale.

Thank you,
Alon

From: John R Pierce [via PostgreSQL] [mailto:ml-node+s1045698n5819819h77(at)n5(dot)nabble(dot)com]
Sent: Saturday, September 20, 2014 10:32 PM
To: Alon Siman Tov
Subject: Re: [BUGS] Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.

On 9/20/2014 10:18 PM, Noah Misch wrote:
> On Fri, Sep 19, 2014 at 03:15:53PM -0700, Alon wrote:
>> >The pg_dump file contains this command:
>> >CREATE DATABASE workgroup WITH TEMPLATE = template0 ENCODING = 'UTF8'
>> >LC_COLLATE = 'Norwegian (Bokmål)_Norway.1252' LC_CTYPE = 'Norwegian
>> >(Bokmål)_Norway.1252';...
> In WIN1252, "e5 6c 29" is "ål)". We're likely failing to set client_encoding
> at some essential point in the process.

is it a problem that ENCODING=utf8 is being used with the collation and
char typing of Norwegian windows-1252 ?

--
john r pierce 37N 122W
somewhere on the middle of the left coast

--
Sent via pgsql-bugs mailing list ([hidden email]</user/SendEmail.jtp?type=node&node=5819819&i=0>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

________________________________
If you reply to this email, your message will be added to the discussion below:
http://postgresql.1045698.n5.nabble.com/BUG-11431-Failing-to-backup-and-restore-a-Windows-postgres-database-with-Norwegian-Bokm-l-locale-tp5819260p5819819.html
To unsubscribe from BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale., click here<http://postgresql.1045698.n5.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=5819260&code=YXNpbWFudG92QHRhYmxlYXVzb2Z0d2FyZS5jb218NTgxOTI2MHwtMTAzMDQzNjQwOA==>.
NAML<http://postgresql.1045698.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>

--
View this message in context: http://postgresql.1045698.n5.nabble.com/BUG-11431-Failing-to-backup-and-restore-a-Windows-postgres-database-with-Norwegian-Bokm-l-locale-tp5819260p5819837.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Alon <asimantov(at)tableausoftware(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2014-09-21 16:13:25
Message-ID: 7107.1411316005@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Noah Misch <noah(at)leadboat(dot)com> writes:
> On Fri, Sep 19, 2014 at 03:15:53PM -0700, Alon wrote:
>> The pg_dump file contains this command:
>> CREATE DATABASE workgroup WITH TEMPLATE = template0 ENCODING = 'UTF8'
>> LC_COLLATE = 'Norwegian (Bokml)_Norway.1252' LC_CTYPE = 'Norwegian
>> (Bokml)_Norway.1252';

> In WIN1252, "e5 6c 29" is "l)". We're likely failing to set client_encoding
> at some essential point in the process.

The level of stupidity needed to use non-ASCII characters in a locale name
is breathtaking. What were Microsoft thinking? How are we supposed to
guess what encoding applies when setting the encoding?

regards, tom lane


From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alon <asimantov(at)tableausoftware(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2014-09-21 18:04:13
Message-ID: 20140921180413.GA1585212@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Sun, Sep 21, 2014 at 12:13:25PM -0400, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > On Fri, Sep 19, 2014 at 03:15:53PM -0700, Alon wrote:
> >> The pg_dump file contains this command:
> >> CREATE DATABASE workgroup WITH TEMPLATE = template0 ENCODING = 'UTF8'
> >> LC_COLLATE = 'Norwegian (Bokmål)_Norway.1252' LC_CTYPE = 'Norwegian
> >> (Bokmål)_Norway.1252';
>
> > In WIN1252, "e5 6c 29" is "ål)". We're likely failing to set client_encoding
> > at some essential point in the process.
>
> The level of stupidity needed to use non-ASCII characters in a locale name
> is breathtaking. What were Microsoft thinking?

Windows Vista did deprecate that locale name style in favor of "nb-NO".
setlocale(LC_x, "") still returns the old style, though. You need PostgreSQL
built with VS2012 or later to use "nb-NO" style; see IsoLocaleName().

> How are we supposed to
> guess what encoding applies when setting the encoding?

Windows functions with "char *" arguments where encoding matters typically
expect the argument to be in the Windows ANSI code page.


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alon <asimantov(at)tableausoftware(dot)com>, <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: [BUGS] Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2014-09-22 15:15:37
Message-ID: 54203D19.5020202@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On 09/21/2014 09:04 PM, Noah Misch wrote:
> On Sun, Sep 21, 2014 at 12:13:25PM -0400, Tom Lane wrote:
>> Noah Misch <noah(at)leadboat(dot)com> writes:
>>> On Fri, Sep 19, 2014 at 03:15:53PM -0700, Alon wrote:
>>>> The pg_dump file contains this command:
>>>> CREATE DATABASE workgroup WITH TEMPLATE = template0 ENCODING = 'UTF8'
>>>> LC_COLLATE = 'Norwegian (Bokmål)_Norway.1252' LC_CTYPE = 'Norwegian
>>>> (Bokmål)_Norway.1252';
>>
>>> In WIN1252, "e5 6c 29" is "ål)". We're likely failing to set client_encoding
>>> at some essential point in the process.
>>
>> The level of stupidity needed to use non-ASCII characters in a locale name
>> is breathtaking. What were Microsoft thinking?
>
> Windows Vista did deprecate that locale name style in favor of "nb-NO".
> setlocale(LC_x, "") still returns the old style, though. You need PostgreSQL
> built with VS2012 or later to use "nb-NO" style; see IsoLocaleName().

Older versions support "norwegian-bokmal" as an alias, so we can use
that. Unfortunately, the user can't use the alias as a work-around, as
we always use the canonical string returned by setlocale(), regardless
of what alias the user used. But we could use it as a work-around in our
code.

We have a similar mapping for a few country names that have dots in the
name, "Hong Kong S.A.R." , "U.A.E.", and "Macau S.A.R.". This is
slightly different, though. With the dots, the problem is that
setlocale() doesn't accept the country name as argument. With Bokmål,
the problem is that we don't store the locale name in the catalogs
correctly, and hence don't pass correctly to setlocale(). (even if we
stored it correctly when a single encoding is used throughout the
cluster, things will go wrong as soon as you create a database that uses
a non-default encoding).

I think we should map "Norwegian (Bokmål)" to "norwegian-bokmal", so
that the mapping is applied to the return value of setlocale(). So when
initdb does setlocale(NULL, "") to get the locale, we'll return
"norwegian-bokmal", and that gets stored in the catalogs and
postgresql.conf.

Patch for that attached. pg_upgrade canonicalizes locale names by
passing them through setlocale(), before comparing them, so it should
still work. I'm a bit wary of back-patching, though. I think this would
work with existing clusters (as far as they work currently, with the
non-ASCII characters stored in pg_database), but would need some more
testing to be confident.

- Heikki

Attachment Content-Type Size
map-bokmal-1.patch text/x-diff 7.3 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Alon <asimantov(at)tableausoftware(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Re: [BUGS] Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2014-09-22 16:10:45
Message-ID: 15763.1411402245@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> I think we should map "Norwegian (Bokml)" to "norwegian-bokmal", so
> that the mapping is applied to the return value of setlocale(). So when
> initdb does setlocale(NULL, "") to get the locale, we'll return
> "norwegian-bokmal", and that gets stored in the catalogs and
> postgresql.conf.

That seems like a reasonable hack if there's only the one case ...
how sure are we about that?

regards, tom lane


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Alon <asimantov(at)tableausoftware(dot)com>, <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: [BUGS] Re: [BUGS] Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2014-09-22 16:38:20
Message-ID: 5420507C.3050604@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On 09/22/2014 07:10 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
>> I think we should map "Norwegian (Bokmål)" to "norwegian-bokmal", so
>> that the mapping is applied to the return value of setlocale(). So when
>> initdb does setlocale(NULL, "") to get the locale, we'll return
>> "norwegian-bokmal", and that gets stored in the catalogs and
>> postgresql.conf.
>
> That seems like a reasonable hack if there's only the one case ...
> how sure are we about that?

That's the only one I can see in MSDN documentation, and in the dropdown
box in Windows 7. New languages and countries pop up every now and then,
but overall the lists are fairly stable. And we could live with a few of
these if we have to.

- Heikki


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alon <asimantov(at)tableausoftware(dot)com>, <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: [BUGS] Re: [BUGS] Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2014-09-25 12:22:38
Message-ID: 5424090E.9060700@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On 09/22/2014 06:15 PM, Heikki Linnakangas wrote:
> Patch for that attached. pg_upgrade canonicalizes locale names by
> passing them through setlocale(), before comparing them, so it should
> still work. I'm a bit wary of back-patching, though. I think this would
> work with existing clusters (as far as they work currently, with the
> non-ASCII characters stored in pg_database), but would need some more
> testing to be confident.

This seems the best way to fix this in master, but there's a problem if
we backpatch this. If existing databases in the cluster already have
"Norwegian (Bokmål)" as the locale, and you update the binaries and try
to create a new database:

postgres=# create database foodb;
ERROR: new collation (norwegian-bokmal_Norway.1252) is incompatible
with the co llation of the template database (Norwegian
(Bokmål)_Norway.1252)
HINT: Use the same collation as in the template database, or use
template0 as t emplate.

That's straightforward to fix; instead of doing a straight strcmp() to
check if the locales are the same, canonicalize them by calling
check_locale first. Attached patch does that.

After this, it's a bit strange that newly created databases use
"norwegian-bokmal" as the locale, while old ones use "Norwegian (Bokmål)":

foodb=# select datname, encoding, datcollate from pg_database;
datname | encoding | datcollate
-----------+----------+--------------------------------
template1 | 24 | Norwegian (Bokmål)_Norway.1252
template0 | 24 | Norwegian (Bokmål)_Norway.1252
postgres | 24 | Norwegian (Bokmål)_Norway.1252
foodb | 24 | norwegian-bokmal_Norway.1252
utf8db | 6 | norwegian-bokmal_Norway.1252
(5 rows)

But we know those non-ASCII characters are problematic, so I think this
is an improvement even in old clusters. At least you won't get any more
of them. You could also UPDATE pg_database manually to fix that in an
existing cluster.

One more problem: pg_upgrade doesn't canonicalize locale names either,
so you get:

> lc_collate cluster values do not match: old "Norwegian (BokmÕl)_Norway.1252", n
> ew "norwegian-bokmal_Norway.1252"
>
> Failure, exiting

Bruce: do you think it would be OK to canonicalize the locale names
before comparing? pg_upgrade already has a function to canonicalize, but
it's only used when upgrading from a pre-9.2 server; locale names on
newer versions are assumed to be already in canonical form.

Alternatively, we could not bother with changing pg_upgrade or CREATE
DATABASE, and instead instruct Bokmål users to do the manual UPDATE of
pg_database in the release notes. That might be the most robust
solution, if there are more cases where we compare locales that I've missed.

- Heikki

Attachment Content-Type Size
map-bokmal-2.patch text/x-diff 10.7 KB

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alon <asimantov(at)tableausoftware(dot)com>, <pgsql-bugs(at)postgresql(dot)org>, "Bruce Momjian" <bruce(at)momjian(dot)us>
Subject: Re: [BUGS] Re: [BUGS] Re: [BUGS] Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2014-10-10 12:45:01
Message-ID: 5437D4CD.8020909@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On 09/25/2014 03:22 PM, Heikki Linnakangas wrote:
> Alternatively, we could not bother with changing pg_upgrade or CREATE
> DATABASE, and instead instruct Bokmål users to do the manual UPDATE of
> pg_database in the release notes. That might be the most robust
> solution, if there are more cases where we compare locales that I've missed.

I just realized that pg_upgrade doesn't currently do the
canonicalization when upgrading from a pre-8.4 server. So AFAICS,
upgrading from 8.3 to 9.2-9.4 will fail, if the locale name in the old
cluster doesn't happen to match the canonical form.

I committed the pg_upgrade patch discussed on the other thread to
master. For the 9.2->9.4 back-branches, I think we should do the
attached
(0001-Make-the-locale-comparison-in-pg_upgrade-more-lenien.patch). This
fixes the problem with 8.3 -> 9.2 upgrades, and prepares pg_upgrade for
the "Norwegian (Bokmål)" -> "norwegian-bokmal" mapping.

Now, for the mapping. I thought about it some more, and I think we
really need to provide instructions to fix existing clusters, as part of
the release notes. Even if we do the change to CREATE DATABASE that I
included in my earlier patch, you'll continue to have problems with
different encodings. And initdb writes the values in postgresql.conf
too, for lc_messages et all. Those really should be fixed too.

So I don't think there's much point in the CREATE DATABASE part. Let's
backpatch the attached patch
(0001-Work-around-Windows-locale-name-with-non-ASCII-chara.patch) and
add the attached instructions to the release notes.

The UPDATE statements in the instructions are specially crafted to avoid
"invalid byte sequence" errors, if they're run from a database that uses
a different encoding than the one used in the existing rows. I tried
using regexp_replace at first, but tries to decode all the characters,
and fails. I'm not 100% this would work with every encoding, but at
least it works with a mixture of LATIN1 and utf-8, which are the most
likely encodings to be used with Norwegian.

These three things together, the pg_upgrade patch, the setlocale()
mapping patch and the release note changes, resolves the issue.

- Heikki

Attachment Content-Type Size
0001-Make-the-locale-comparison-in-pg_upgrade-more-lenien.patch text/x-diff 5.8 KB
0001-Work-around-Windows-locale-name-with-non-ASCII-chara.patch text/x-diff 8.6 KB
fix-bokmal-pg_database.txt text/plain 1.3 KB

From: Alon <asimantov(at)tableausoftware(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Re: [BUGS] Re: [BUGS] Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2014-10-10 18:32:48
Message-ID: 1412965968588-5822615.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Looks like great progress has been made to address this issue. Any idea which
release the patch might go in?

--
View this message in context: http://postgresql.1045698.n5.nabble.com/BUG-11431-Failing-to-backup-and-restore-a-Windows-postgres-database-with-Norwegian-Bokm-l-locale-tp5819260p5822615.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Alon <asimantov(at)tableausoftware(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: [BUGS] Re: [BUGS] Re: [BUGS] Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2014-10-11 12:57:19
Message-ID: CAB7nPqQ=a+fWPRf7JSNCJHBknsiL20RViVuFraH83xgmgf3zYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Sat, Oct 11, 2014 at 3:32 AM, Alon <asimantov(at)tableausoftware(dot)com> wrote:
> Looks like great progress has been made to address this issue. Any idea which
> release the patch might go in?
I imagine that it will be in for 9.3.6, the next minor release of hte
9.3 series.
Regards,
--
Michael


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alon <asimantov(at)tableausoftware(dot)com>, <pgsql-bugs(at)postgresql(dot)org>, "Bruce Momjian" <bruce(at)momjian(dot)us>
Subject: Re: [BUGS] Re: [BUGS] Re: [BUGS] Re: [BUGS] Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2014-10-24 18:15:41
Message-ID: 544A974D.9000002@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On 10/10/2014 03:45 PM, Heikki Linnakangas wrote:
> On 09/25/2014 03:22 PM, Heikki Linnakangas wrote:
>> Alternatively, we could not bother with changing pg_upgrade or CREATE
>> DATABASE, and instead instruct Bokmål users to do the manual UPDATE of
>> pg_database in the release notes. That might be the most robust
>> solution, if there are more cases where we compare locales that I've missed.
>
> I just realized that pg_upgrade doesn't currently do the
> canonicalization when upgrading from a pre-8.4 server. So AFAICS,
> upgrading from 8.3 to 9.2-9.4 will fail, if the locale name in the old
> cluster doesn't happen to match the canonical form.
>
> I committed the pg_upgrade patch discussed on the other thread to
> master. For the 9.2->9.4 back-branches, I think we should do the
> attached
> (0001-Make-the-locale-comparison-in-pg_upgrade-more-lenien.patch). This
> fixes the problem with 8.3 -> 9.2 upgrades, and prepares pg_upgrade for
> the "Norwegian (Bokmål)" -> "norwegian-bokmal" mapping.
>
>
> Now, for the mapping. I thought about it some more, and I think we
> really need to provide instructions to fix existing clusters, as part of
> the release notes. Even if we do the change to CREATE DATABASE that I
> included in my earlier patch, you'll continue to have problems with
> different encodings. And initdb writes the values in postgresql.conf
> too, for lc_messages et all. Those really should be fixed too.
>
> So I don't think there's much point in the CREATE DATABASE part. Let's
> backpatch the attached patch
> (0001-Work-around-Windows-locale-name-with-non-ASCII-chara.patch) and
> add the attached instructions to the release notes.
>
> The UPDATE statements in the instructions are specially crafted to avoid
> "invalid byte sequence" errors, if they're run from a database that uses
> a different encoding than the one used in the existing rows. I tried
> using regexp_replace at first, but tries to decode all the characters,
> and fails. I'm not 100% this would work with every encoding, but at
> least it works with a mixture of LATIN1 and utf-8, which are the most
> likely encodings to be used with Norwegian.
>
> These three things together, the pg_upgrade patch, the setlocale()
> mapping patch and the release note changes, resolves the issue.

Ok, I've committed these patches now.

The release note entry needs to be written, with the instructions on
updating pg_database. We don't have anywhere to stash release note
entries for minor releases, until we actually write the release notes,
do we?

- Heikki


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Alon <asimantov(at)tableausoftware(dot)com>, pgsql-bugs(at)postgresql(dot)org, "Bruce Momjian" <bruce(at)momjian(dot)us>
Subject: Re: Re: [BUGS] Re: [BUGS] Re: [BUGS] Re: [BUGS] Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2014-10-24 18:52:27
Message-ID: 4893.1414176747@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> The release note entry needs to be written, with the instructions on
> updating pg_database. We don't have anywhere to stash release note
> entries for minor releases, until we actually write the release notes,
> do we?

The normal procedure is that you put sufficient information into the
commit message for the release notes to be written from that. If you
didn't, tut tut ... but you could add an SGML comment at the front of
release-9.4.sgml where the next section will get added.

regards, tom lane


From: Alon <asimantov(at)tableausoftware(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Re: [BUGS] Re: [BUGS] Re: [BUGS] Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2015-05-08 21:05:10
Message-ID: 1431119110460-5848514.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Any time estimate in which release/revision the fix might be?

--
View this message in context: http://postgresql.nabble.com/BUG-11431-Failing-to-backup-and-restore-a-Windows-postgres-database-with-Norwegian-Bokm-l-locale-tp5819260p5848514.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alon <asimantov(at)tableausoftware(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Re: Re: [BUGS] Re: [BUGS] Re: [BUGS] Re: BUG #11431: Failing to backup and restore a Windows postgres da =?utf-8?Q?tabase, _with_Norwegian_Bokm=C3=A5l_locale.?=
Date: 2015-05-08 21:09:01
Message-ID: 22254.1431119341@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Alon <asimantov(at)tableausoftware(dot)com> writes:
> Any time estimate in which release/revision the fix might be?

The last ones, no? Or are you claiming this was not fixed in 9.4.1 et al?

http://www.postgresql.org/docs/9.4/static/release-9-4-1.html

regards, tom lane


From: Alon <asimantov(at)tableausoftware(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Re: Re: [BUGS] Re: [BUGS] Re: [BUGS] Re: BUG #11431: Failing to backup and restore a Windows postgres database, with Norwegian Bokmål locale.
Date: 2015-05-08 21:21:57
Message-ID: 1431120117818-5848517.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"The 9.4.1, 9.3.6, and 9.2.10 minor releases contain changes to the way
Postgres handles the Norwegian (Bokmål) locale on Windows. "

So based on the link, 9.3.6 should contain the fix. Right?
If so, I will test it and inform.

--
View this message in context: http://postgresql.nabble.com/BUG-11431-Failing-to-backup-and-restore-a-Windows-postgres-database-with-Norwegian-Bokm-l-locale-tp5819260p5848517.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.