BUG #4451: initcap() function capitalizes incorrectly

Lists: pgsql-bugs
From: "Scott V" <datagenic(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #4451: initcap() function capitalizes incorrectly
Date: 2008-10-06 04:01:09
Message-ID: 200810060401.m96419cn021991@wwwmaster.postgresql.org


The following bug has been logged online:

Bug reference: 4451
Logged by: Scott V
Email address: datagenic(at)gmail(dot)com
PostgreSQL version: 8.3.1
Operating system: Mac OS X 10.5.4
Description: initcap() function capitalizes incorrectly
Details:

initcap() capitalizes incorrectly when passing strings containing certain
two-byte utf-8 characters. E.g., when argument = 'mātūrāte', initcap
returns 'MāTūRāTe'. Correct result should be 'Mātūrāte'.

The function appears to be incorrectly interpreting the two-byte chars as
non-alphanumeric characters. They are in fact alphanumerics; they just have
diacritical markings.


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Scott V <datagenic(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4451: initcap() function capitalizes incorrectly
Date: 2008-10-06 08:03:23
Message-ID: 48E9C64B.7080306@hagander.net

Scott V wrote:
> The following bug has been logged online:
>
> Bug reference: 4451
> Logged by: Scott V
> Email address: datagenic(at)gmail(dot)com
> PostgreSQL version: 8.3.1
> Operating system: Mac OS X 10.5.4
> Description: initcap() function capitalizes incorrectly
> Details:
>
> initcap() capitalizes incorrectly when passing strings containing certain
> two-byte utf-8 characters. E.g., when argument = 'mātūrāte', initcap
> returns 'MāTūRāTe'. Correct result should be 'Mātūrāte'.
>
> The function appears to be incorrectly interpreting the two-byte chars as
> non-alphanumeric characters. They are in fact alphanumerics; they just have
> diacritical markings.

What's your setting for lc_collate?

//Magnus


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Scott V <datagenic(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4451: initcap() function capitalizes incorrectly
Date: 2008-10-06 12:37:08
Message-ID: 18902.1223296628@sss.pgh.pa.us

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Scott V wrote:
>> PostgreSQL version: 8.3.1
>> Operating system: Mac OS X 10.5.4

>> initcap() capitalizes incorrectly when passing strings containing certain
>> two-byte utf-8 characters. E.g., when argument = 'mātūrāte', initcap
>> returns 'MāTūRāTe'. Correct result should be 'Mātūrāte'.

> What's your setting for lc_collate?

I think actually it's lc_ctype that determines case-folding. But the
current theory is that Apple's locale support is simply broken for
utf-8:
http://archives.postgresql.org/pgsql-general/2008-02/msg01072.php
which means that even if Scott had all his settings right, it wouldn't
work :-( A quick test on OS X here seems to confirm this.

regards, tom lane
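
The symptom reported above can be reproduced outside the database with a
small sketch. This is not PostgreSQL's implementation; it is a hypothetical
Python model that assumes the character classification only recognizes ASCII
letters and digits, which is roughly what a C ctype locale (or a broken UTF-8
locale) provides. The function names initcap_ascii_only and
initcap_unicode_aware are invented for illustration.

```python
def initcap_ascii_only(text: str) -> str:
    """Model initcap() when only ASCII characters count as alphanumeric.

    Assumption: any non-ASCII character is treated as a word separator
    and is passed through unchanged, as a C ctype locale would do.
    """
    out = []
    prev_alnum = False
    for ch in text:
        is_alnum = ch.isascii() and ch.isalnum()
        if is_alnum:
            # Uppercase at the start of a "word", lowercase inside it.
            out.append(ch.lower() if prev_alnum else ch.upper())
        else:
            out.append(ch)
        prev_alnum = is_alnum
    return "".join(out)


def initcap_unicode_aware(text: str) -> str:
    """Same loop, but using Unicode-aware classification and case mapping."""
    out = []
    prev_alnum = False
    for ch in text:
        is_alnum = ch.isalnum()
        if is_alnum:
            out.append(ch.lower() if prev_alnum else ch.upper())
        else:
            out.append(ch)
        prev_alnum = is_alnum
    return "".join(out)


if __name__ == "__main__":
    word = "m\u0101t\u016br\u0101te"  # 'mātūrāte'
    print(initcap_ascii_only(word))     # accented letters act as separators
    print(initcap_unicode_aware(word))  # accented letters are part of the word
```

Under the ASCII-only assumption each accented letter ends the current word,
so the letter that follows it is capitalized again, which matches the
'MāTūRāTe' output in the report.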


From: "Scott Vanderbilt" <datagenic(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Magnus Hagander" <magnus(at)hagander(dot)net>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4451: initcap() function capitalizes incorrectly
Date: 2008-10-06 15:50:11
Message-ID: cac40f10810060850i66b74557ob07aad3315f98857@mail.gmail.com

Not sure what the correct settings should be, but output from SHOW
ALL in psql says:

lc_collate C
lc_ctype C

On Mon, Oct 6, 2008 at 5:37 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> Scott V wrote:
>>> PostgreSQL version: 8.3.1
>>> Operating system: Mac OS X 10.5.4
>
>>> initcap() capitalizes incorrectly when passing strings containing certain
>>> two-byte utf-8 characters. E.g., when argument = 'mātūrāte', initcap
>>> returns 'MāTūRāTe'. Correct result should be 'Mātūrāte'.
>
>> What's your setting for lc_collate?
>
> I think actually it's lc_ctype that determines case-folding. But the
> current theory is that Apple's locale support is simply broken for
> utf-8:
> http://archives.postgresql.org/pgsql-general/2008-02/msg01072.php
> which means that even if Scott had all his settings right, it wouldn't
> work :-( A quick test on OS X here seems to confirm this.
>
> regards, tom lane
>


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Scott Vanderbilt <datagenic(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4451: initcap() function capitalizes incorrectly
Date: 2008-10-06 16:01:46
Message-ID: 48EA366A.8050601@enterprisedb.com

Scott Vanderbilt wrote:
> Not sure what the correct settings should be, but output from SHOW
> ALL in psql says:
>
> lc_collate C
> lc_ctype C

There's a chapter on locale support in the user manual:

http://www.postgresql.org/docs/8.3/interactive/locale.html

The right setting depends on what language's collation rules you want to
follow. "locale -a" in a shell should list the available options.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com