Re: BUG #13636: psql numericlocale adds comma where it ought not

Lists: pgsql-bugs
From: jeff(dot)janes(at)gmail(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #13636: psql numericlocale adds comma where it ought not
Date: 2015-09-24 20:25:45
Message-ID: 20150924202545.26913.85050@wrigleys.postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 13636
Logged by: Jeff Janes
Email address: jeff(dot)janes(at)gmail(dot)com
PostgreSQL version: 9.4.4
Operating system: Linux
Description:

\pset numericlocale on
select 1000000::real;
float4
--------
1e,+06
(1 row)

There should not be a comma added between e and +.

Same with other versions.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: jeff(dot)janes(at)gmail(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #13636: psql numericlocale adds comma where it ought not
Date: 2015-09-24 23:37:20
Message-ID: 21049.1443137840@sss.pgh.pa.us
Lists: pgsql-bugs

jeff(dot)janes(at)gmail(dot)com writes:
> \pset numericlocale on
> select 1000000::real;
> float4
> --------
> 1e,+06
> (1 row)

> There should not be a comma added between e and +.

Indeed. It looks like the author of format_numeric_locale() never
heard of e-format output. There's some other pretty crummy code in
there, but that's the core problem ...

regards, tom lane


From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13636: psql numericlocale adds comma where it ought not
Date: 2015-09-24 23:46:42
Message-ID: CAEepm=1nTC__rwR78OE5NRW_AK4XYU6RsQxwxRv7m16=C5mUsg@mail.gmail.com
Lists: pgsql-bugs

On Fri, Sep 25, 2015 at 11:37 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> jeff(dot)janes(at)gmail(dot)com writes:
>> \pset numericlocale on
>> select 1000000::real;
>> float4
>> --------
>> 1e,+06
>> (1 row)
>
>> There should not be a comma added between e and +.
>
> Indeed. It looks like the author of format_numeric_locale() never
> heard of e-format output. There's some other pretty crummy code in
> there, but that's the core problem ...

Does this look reasonable?

--
Thomas Munro
http://www.enterprisedb.com

Attachment: numeric-locale-scientific.patch (application/octet-stream, 5.9 KB)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13636: psql numericlocale adds comma where it ought not
Date: 2015-09-25 02:19:11
Message-ID: 27190.1443147551@sss.pgh.pa.us
Lists: pgsql-bugs

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> On Fri, Sep 25, 2015 at 11:37 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Indeed. It looks like the author of format_numeric_locale() never
>> heard of e-format output. There's some other pretty crummy code in
>> there, but that's the core problem ...

> Does this look reasonable?

I thought it needed a rather more thoroughgoing revision: there is no need
for it to assume so much about what is in the column, and good reason for
it not to. (For instance, I note that psql will try to apply this code to
"money" columns, which may be a bad idea, but there can definitely be
stuff in there that doesn't look like a regular number.) It should muck
with digits immediately following the sign, and nothing else. There was
some other useless inefficiency too. I came up with the attached.
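
The rule described above (touch only the run of digits immediately after an optional sign, and leave everything else alone) can be sketched roughly as follows. This is a minimal illustration in Python, not the attached patch: the function name, the fixed group size of 3, and the default separators are assumptions made for the example.

```python
DIGITS = "0123456789"


def format_numeric_locale_sketch(value, thousands_sep=",", decimal_point=".", group=3):
    """Group only the leading integer digits of `value` (hypothetical helper).

    Anything after the leading digit run (exponent, fraction, trailing text)
    is copied through, except that a '.' directly after the digits is
    replaced by the locale's decimal point.
    """
    # Skip an optional leading sign.
    pos = 1 if value[:1] in ("+", "-") else 0

    # Find the end of the digit run that immediately follows the sign.
    end = pos
    while end < len(value) and value[end] in DIGITS:
        end += 1
    digits = value[pos:end]

    # Split the digit run into groups, counting from the right.
    first = len(digits) % group or group
    parts = [digits[:first]]
    parts += [digits[i:i + group] for i in range(first, len(digits), group)]

    # Only a '.' directly after the digits is treated as the decimal point.
    rest = value[end:]
    if rest.startswith("."):
        rest = decimal_point + rest[1:]

    return value[:pos] + thousands_sep.join(parts) + rest


print(format_numeric_locale_sketch("1e+06"))        # exponent is left untouched
print(format_numeric_locale_sketch("1234567.891"))  # only the integer part is grouped
```

Because the scan stops at the first non-digit, e-format output such as 1e+06 passes through unchanged in this sketch.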

I would have borrowed your regression test additions, except I'm afraid
they will fail if the prevailing locale isn't C.

regards, tom lane

Attachment: numeric-locale-fixes.patch (text/x-diff, 6.0 KB)

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13636: psql numericlocale adds comma where it ought not
Date: 2015-09-25 03:33:04
Message-ID: CAEepm=0x_-CpXc2VctwvKBn2eJ-HK04eZWFG1tAwr7d7kxJAyw@mail.gmail.com
Lists: pgsql-bugs

On Fri, Sep 25, 2015 at 2:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
>> On Fri, Sep 25, 2015 at 11:37 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Indeed. It looks like the author of format_numeric_locale() never
>>> heard of e-format output. There's some other pretty crummy code in
>>> there, but that's the core problem ...
>
>> Does this look reasonable?
>
> I thought it needed a rather more thoroughgoing revision: there is no need
> for it to assume so much about what is in the column, and good reason for
> it not to. (For instance, I note that psql will try to apply this code to
> "money" columns, which may be a bad idea, but there can definitely be
> stuff in there that doesn't look like a regular number.) It should muck
> with digits immediately following the sign, and nothing else. There was
> some other useless inefficiency too. I came up with the attached.
>
> I would have borrowed your regression test additions, except I'm afraid
> they will fail if the prevailing locale isn't C.

Oops, right. (I suppose there could be a schedule of optional extra
tests that somehow run with C locale for psql, but perhaps not worth
setting up just for this.)

--
Thomas Munro
http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13636: psql numericlocale adds comma where it ought not
Date: 2015-09-25 04:06:31
Message-ID: 21952.1443153991@sss.pgh.pa.us
Lists: pgsql-bugs

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> On Fri, Sep 25, 2015 at 2:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I would have borrowed your regression test additions, except I'm afraid
>> they will fail if the prevailing locale isn't C.

> Oops, right. (I suppose there could be a schedule of optional extra
> tests that somehow run with C locale for psql, but perhaps not worth
> setting up just for this.)

Yeah, I thought about that too, and likewise decided it probably wasn't
worth the trouble, not yet anyway.

regards, tom lane


From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13636: psql numericlocale adds comma where it ought not
Date: 2015-09-25 11:59:32
Message-ID: CAEepm=2tm4wSgoTj6TBxZqj9f8EaLRmFYTbdey6f2EDy7GLcfA@mail.gmail.com
Lists: pgsql-bugs

On Fri, Sep 25, 2015 at 4:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
>> On Fri, Sep 25, 2015 at 2:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I would have borrowed your regression test additions, except I'm afraid
>>> they will fail if the prevailing locale isn't C.
>
>> Oops, right. (I suppose there could be a schedule of optional extra
>> tests that somehow run with C locale for psql, but perhaps not worth
>> setting up just for this.)
>
> Yeah, I thought about that too, and likewise decided it probably wasn't
> worth the trouble, not yet anyway.

About your follow-up commit 6325527d845b629243fb3f605af6747a7a4ac45f,
I noticed that glibc localedata has some grouping values of 0 (no
grouping at all), for example nl_NL, el_GR, hr_HR, it_IT, pl_PL,
es_CU, pt_PT, and we don't honour that: if it's 0 we use 3. All the
rest begin with 3, except for unm_US which uses 2;2;2;3 (apparently a
Delaware language), and I confirmed that now produces strings like "12
34 56", so I guess that obscure locale may be the only case that the
commit actually changes on a glibc system.
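
For reference, a grouping list such as 2;2;2;3 can be applied to a digit string along these lines. This is a sketch in Python that follows the convention of the list returned by `locale.localeconv()["grouping"]` (sizes counted from the right, a trailing 0 meaning "repeat the last size", `CHAR_MAX` meaning "stop grouping"). It is not what psql does; `group_digits` is a hypothetical helper written for this illustration.

```python
import locale
from typing import Sequence


def group_digits(digits: str, grouping: Sequence[int], sep: str) -> str:
    """Insert `sep` into `digits` according to a localeconv()-style grouping list."""
    groups = []   # collected right-to-left
    rest = digits
    size = 0      # most recent non-zero group size
    for g in grouping:
        if g == locale.CHAR_MAX:
            break                      # no further grouping
        if g == 0:
            # Repeat the previous size for all remaining digits.
            while size and len(rest) > size:
                groups.append(rest[-size:])
                rest = rest[:-size]
            break
        size = g
        if len(rest) <= size:
            break                      # not enough digits left to split
        groups.append(rest[-size:])
        rest = rest[:-size]
    groups.append(rest)
    return sep.join(reversed(groups))


print(group_digits("123456", [2, 2, 2, 3, 0], " "))
print(group_digits("1234567", [3, 0], ","))
print(group_digits("1234567", [], ","))  # empty list: no grouping
```

With the list [2, 2, 2, 3, 0] and a space separator, a six-digit input comes out in pairs, matching the "12 34 56" shape mentioned above.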

--
Thomas Munro
http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13636: psql numericlocale adds comma where it ought not
Date: 2015-09-25 15:16:44
Message-ID: 6391.1443194204@sss.pgh.pa.us
Lists: pgsql-bugs

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> About your follow-up commit 6325527d845b629243fb3f605af6747a7a4ac45f,
> I noticed that glibc localedata has some grouping values of 0 (no
> grouping at all), for example nl_NL, el_GR, hr_HR, it_IT, pl_PL,
> es_CU, pt_PT, and we don't honour that: if it's 0 we use 3. All the
> rest begin with 3, except for unm_US which uses 2;2;2;3 (apparently a
> Delaware language), and I confirmed that now produces strings like "12
> 34 56", so I guess that obscure locale may be the only case that the
> commit actually changes on a glibc system.

Yeah, the locales where grouping isn't just 3 are so obscure that
I don't particularly care. If someone from one of those areas
wants to submit a feature patch to implement grouping more fully,
more power to 'em ...

However, I checked this morning and found that the MONEY case that
was niggling me yesterday is indeed a problem, for instance in de_DE:

$ LC_NUMERIC=de_DE psql regression
psql (9.6devel)
Type "help" for help.

regression=# set lc_monetary = 'de_DE';
SET
regression=# select '123456.78'::money;
money
-------------------
12.345.678,00 EUR
(1 row)

regression=# \pset numericlocale on
Locale-adjusted numeric output is on.
regression=# select '123456.78'::money;
money
-------------------
12,345.678,00 EUR
(1 row)

So we're gonna have to do something about that. I considered
a few fixes:

* Remove CASHOID from the set of datatypes that printQuery will choose
to right-justify. This seems likely to annoy people who are used to
having money amounts right-justified.

* Separate "use locale formatting" from "right justify", and apply only
the latter to CASHOID. This would be the cleanest fix but by far the
most invasive. I don't particularly want to do that much work and I
definitely wouldn't want to back-patch it.

* Put a hack into format_numeric_locale() so that it won't mess with
monetary output. This seems feasible because cash_out() always insists
on using a non-empty currency symbol. For example, we could check that
the string includes no characters outside "0123456789+-.eE" and feel
pretty safe that no money value would pass the check.

So I'm inclined to do the third one. Objections, better ideas?
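
The check proposed in the third option might look something like this. It is a sketch in Python rather than the eventual C change; the helper name is hypothetical, and treating an empty string as "not a number" is an assumption of this example.

```python
# Characters allowed in plain numeric output, as listed in the third option.
NUMERIC_CHARS = frozenset("0123456789+-.eE")


def looks_like_plain_number(value: str) -> bool:
    """Return True only if every character of `value` is in NUMERIC_CHARS."""
    return bool(value) and all(ch in NUMERIC_CHARS for ch in value)


print(looks_like_plain_number("1e+06"))              # e-format float output
print(looks_like_plain_number("12.345.678,00 EUR"))  # money output from the de_DE session
```

The money string fails the check because of its comma, space, and currency symbol, so a formatter guarded this way would leave it alone.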

regards, tom lane