Re: TRIM_ARRAY

Lists: pgsql-hackers
From: Vik Fearing <vik(at)postgresfriends(dot)org>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: TRIM_ARRAY
Date: 2021-02-16 17:54:11
Message-ID: fc92ce17-9655-8ff1-c62a-4dc4c8ccd815@postgresfriends.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The SQL standard defines a function called TRIM_ARRAY that surprisingly
has syntax that looks like a function! So I implemented it using a thin
wrapper around our array slice syntax. It is literally just ($1)[1:$2].

An interesting case that I decided to handle by explaining it in the
docs is that this won't give you the first n elements if your lower
bound is not 1. My justification for this is 1) non-standard lower
bounds are so rare in the wild that 2) people using them can just not
use this function. The alternative is to go through the unnest dance
(or write it in C) which defeats inlining.

Patch attached.
--
Vik Fearing

Attachment Content-Type Size
0001-implement-trim_array.patch text/x-patch 4.2 KB

From: Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
To: Vik Fearing <vik(at)postgresfriends(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: TRIM_ARRAY
Date: 2021-02-16 18:32:47
Message-ID: CAMsGm5eseX1Sc1u=d5Prv_44CiMuhvtLnYFa2mFeic0oktqDWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 16 Feb 2021 at 12:54, Vik Fearing <vik(at)postgresfriends(dot)org> wrote:

> The SQL standard defines a function called TRIM_ARRAY that surprisingly
> has syntax that looks like a function! So I implemented it using a thin
> wrapper around our array slice syntax. It is literally just ($1)[1:$2].
>
> An interesting case that I decided to handle by explaining it in the
> docs is that this won't give you the first n elements if your lower
> bound is not 1. My justification for this is 1) non-standard lower
> bounds are so rare in the wild that 2) people using them can just not
> use this function. The alternative is to go through the unnest dance
> (or write it in C) which defeats inlining.
>

I don't recall ever seeing non-default lower bounds, so I actually think
it's OK to just rule out that scenario, but why not something like this:

($1)[:array_lower ($1, 1) + $2 - 1]

Note that I've used the 9.6 feature that allows omitting the lower bound.


From: Vik Fearing <vik(at)postgresfriends(dot)org>
To: Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: TRIM_ARRAY
Date: 2021-02-16 22:38:30
Message-ID: d6429c8e-1811-6456-b3f2-20ca5776076b@postgresfriends.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2/16/21 7:32 PM, Isaac Morland wrote:
> On Tue, 16 Feb 2021 at 12:54, Vik Fearing <vik(at)postgresfriends(dot)org> wrote:
>
>> The SQL standard defines a function called TRIM_ARRAY that surprisingly
>> has syntax that looks like a function! So I implemented it using a thin
>> wrapper around our array slice syntax. It is literally just ($1)[1:$2].
>>
>> An interesting case that I decided to handle by explaining it in the
>> docs is that this won't give you the first n elements if your lower
>> bound is not 1. My justification for this is 1) non-standard lower
>> bounds are so rare in the wild that 2) people using them can just not
>> use this function. The alternative is to go through the unnest dance
>> (or write it in C) which defeats inlining.
>>
>
> I don't recall ever seeing non-default lower bounds, so I actually think
> it's OK to just rule out that scenario, but why not something like this:
>
> ($1)[:array_lower ($1, 1) + $2 - 1]

I'm kind of embarrassed that I didn't think about doing that; it is a
much better solution. You lose the non-standard bounds but I don't
think there is any way besides C to keep the lower bound regardless of
how you trim it.

V2 attached.
--
Vik Fearing

Attachment Content-Type Size
0001-implement-trim_array.v2.patch text/x-patch 4.0 KB

From: Vik Fearing <vik(at)postgresfriends(dot)org>
To: Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: TRIM_ARRAY
Date: 2021-02-17 00:25:52
Message-ID: bf054104-8c21-672a-f694-ed38fdec5cbc@postgresfriends.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2/16/21 11:38 PM, Vik Fearing wrote:
> On 2/16/21 7:32 PM, Isaac Morland wrote:
>> On Tue, 16 Feb 2021 at 12:54, Vik Fearing <vik(at)postgresfriends(dot)org> wrote:
>>
>>> The SQL standard defines a function called TRIM_ARRAY that surprisingly
>>> has syntax that looks like a function! So I implemented it using a thin
>>> wrapper around our array slice syntax. It is literally just ($1)[1:$2].
>>>
>>> An interesting case that I decided to handle by explaining it in the
>>> docs is that this won't give you the first n elements if your lower
>>> bound is not 1. My justification for this is 1) non-standard lower
>>> bounds are so rare in the wild that 2) people using them can just not
>>> use this function. The alternative is to go through the unnest dance
>>> (or write it in C) which defeats inlining.
>>>
>>
>> I don't recall ever seeing non-default lower bounds, so I actually think
>> it's OK to just rule out that scenario, but why not something like this:
>>
>> ($1)[:array_lower ($1, 1) + $2 - 1]
>
> I'm kind of embarrassed that I didn't think about doing that; it is a
> much better solution. You lose the non-standard bounds but I don't
> think there is any way besides C to keep the lower bound regardless of
> how you trim it.

I've made a bit of a mess out of this, but I partly blame the standard
which is very unclear. It actually describes trimming the right n
elements instead of the left n like I've done here. I'll be back later
with a better patch that does what it's actually supposed to.
--
Vik Fearing


From: Vik Fearing <vik(at)postgresfriends(dot)org>
To: Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: TRIM_ARRAY
Date: 2021-02-21 02:09:05
Message-ID: 931d94fa-9238-47ba-6f17-01600a845417@postgresfriends.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2/17/21 1:25 AM, Vik Fearing wrote:

> I've made a bit of a mess out of this, but I partly blame the standard
> which is very unclear. It actually describes trimming the right n
> elements instead of the left n like I've done here. I'll be back later
> with a better patch that does what it's actually supposed to.

And here is that patch.

Since the only justification I have for such a silly function is that
it's part of the standard, I decided to also issue the errors that the
standard describes which means the new function is now in C.
--
Vik Fearing

Attachment Content-Type Size
0001-implement-trim_array.v3.patch text/x-patch 6.7 KB

From: Dian Fay <dian(dot)m(dot)fay(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Vik Fearing <vikreykja(at)gmail(dot)com>
Subject: Re: TRIM_ARRAY
Date: 2021-03-01 23:14:39
Message-ID: 161464047991.29967.8270411748023500393.pgcf@coridan.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, failed
Documentation: tested, failed

This basically does what it says, and the code looks good. The
documentation is out of alphabetical order (trim_array should appear
under cardinality rather than over)) but good otherwise. I was able to
"break" the function with an untyped null in psql:

select trim_array(null, 2);
ERROR: could not determine polymorphic type because input has type unknown

I don't know whether there are any circumstances other than manual entry
in psql where this could happen, since column values and variables will
always be typed. I don't have access to the standard, but DB2's docs[1]
note "if any argument is null, the result is the null value", so an
up-front null check might be preferable to a slightly arcane user-facing
error, even if it's a silly invocation of a silly function :)

[1] https://www.ibm.com/support/knowledgecenter/en/SSEPEK_12.0.0/sqlref/src/tpc/db2z_bif_trimarray.html

The new status of this patch is: Waiting on Author


From: Vik Fearing <vik(at)postgresfriends(dot)org>
To: Dian Fay <dian(dot)m(dot)fay(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Vik Fearing <vikreykja(at)gmail(dot)com>
Subject: Re: TRIM_ARRAY
Date: 2021-03-01 23:53:04
Message-ID: b5ee1ac8-e403-6cdf-fcc3-dd5ccd5ceaf8@postgresfriends.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3/2/21 12:14 AM, Dian Fay wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world: tested, passed
> Implements feature: tested, passed
> Spec compliant: tested, failed
> Documentation: tested, failed

Thank you for looking at my patch!

> This basically does what it says, and the code looks good. The
> documentation is out of alphabetical order (trim_array should appear
> under cardinality rather than over)) but good otherwise.

Hmm. It appears between cardinality and unnest in the source code and
also my compiled html. Can you say more about where you're seeing the
wrong order?

> I was able to
> "break" the function with an untyped null in psql:
>
> select trim_array(null, 2);
> ERROR: could not determine polymorphic type because input has type unknown
>
> I don't know whether there are any circumstances other than manual entry
> in psql where this could happen, since column values and variables will
> always be typed. I don't have access to the standard, but DB2's docs[1]
> note "if any argument is null, the result is the null value", so an
> up-front null check might be preferable to a slightly arcane user-facing
> error, even if it's a silly invocation of a silly function :)
>
> [1] https://www.ibm.com/support/knowledgecenter/en/SSEPEK_12.0.0/sqlref/src/tpc/db2z_bif_trimarray.html

The standard also says that if either argument is null, the result is
null. The problem here is that postgres needs to know what the return
type is and it can only determine that from the input.

If you give the function a typed null, it returns null as expected.

> The new status of this patch is: Waiting on Author

I put it back to Needs Review without a new patch because I don't know
what I would change.
--
Vik Fearing


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dian Fay <dian(dot)m(dot)fay(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Vik Fearing <vikreykja(at)gmail(dot)com>
Subject: Re: TRIM_ARRAY
Date: 2021-03-01 23:53:41
Message-ID: 865850.1614642821@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dian Fay <dian(dot)m(dot)fay(at)gmail(dot)com> writes:
> This basically does what it says, and the code looks good. The
> documentation is out of alphabetical order (trim_array should appear
> under cardinality rather than over)) but good otherwise. I was able to
> "break" the function with an untyped null in psql:

> select trim_array(null, 2);
> ERROR: could not determine polymorphic type because input has type unknown

That's a generic parser behavior for polymorphic functions, not something
this particular function could or should dodge.

regards, tom lane


From: "Dian M Fay" <dian(dot)m(dot)fay(at)gmail(dot)com>
To: "Vik Fearing" <vik(at)postgresfriends(dot)org>, <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: "Vik Fearing" <vikreykja(at)gmail(dot)com>
Subject: Re: TRIM_ARRAY
Date: 2021-03-02 00:02:50
Message-ID: C9MFB04FWKBB.13JGHNUNS5Z4R@lamia
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon Mar 1, 2021 at 6:53 PM EST, Vik Fearing wrote:
> > This basically does what it says, and the code looks good. The
> > documentation is out of alphabetical order (trim_array should appear
> > under cardinality rather than over)) but good otherwise.
>
> Hmm. It appears between cardinality and unnest in the source code and
> also my compiled html. Can you say more about where you're seeing the
> wrong order?

I applied the patch to the latest commit, ffd3944ab9. Table 9.52 is
ordered:

array_to_string
array_upper
trim_array
cardinality
unnest

> The problem here is that postgres needs to know what the return
> type is and it can only determine that from the input.
>
> If you give the function a typed null, it returns null as expected.
>
> > The new status of this patch is: Waiting on Author
>
> I put it back to Needs Review without a new patch because I don't know
> what I would change.

I'd thought that checking v and returning null instead of raising the
error would be more friendly, should it be possible to pass an untyped
null accidentally instead of on purpose, and I couldn't rule that out.
I've got no objections other than the docs having been displaced.


From: Vik Fearing <vik(at)postgresfriends(dot)org>
To: Dian M Fay <dian(dot)m(dot)fay(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: TRIM_ARRAY
Date: 2021-03-02 01:31:06
Message-ID: 6565ca98-6733-532a-2130-752545bab6b5@postgresfriends.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3/2/21 1:02 AM, Dian M Fay wrote:
> On Mon Mar 1, 2021 at 6:53 PM EST, Vik Fearing wrote:
>>> This basically does what it says, and the code looks good. The
>>> documentation is out of alphabetical order (trim_array should appear
>>> under cardinality rather than over)) but good otherwise.
>>
>> Hmm. It appears between cardinality and unnest in the source code and
>> also my compiled html. Can you say more about where you're seeing the
>> wrong order?
>
> I applied the patch to the latest commit, ffd3944ab9. Table 9.52 is
> ordered:
>
> array_to_string
> array_upper
> trim_array
> cardinality
> unnest

So it turns out I must have fixed it locally after I posted the patch
and then forgot I did that. Attached is a new patch with the order
correct. Thanks for spotting it!

>> The problem here is that postgres needs to know what the return
>> type is and it can only determine that from the input.
>>
>> If you give the function a typed null, it returns null as expected.
>>
>>> The new status of this patch is: Waiting on Author
>>
>> I put it back to Needs Review without a new patch because I don't know
>> what I would change.
>
> I'd thought that checking v and returning null instead of raising the
> error would be more friendly, should it be possible to pass an untyped
> null accidentally instead of on purpose, and I couldn't rule that out.

As Tom said, that is something that does not belong in this patch.
--
Vik Fearing

Attachment Content-Type Size
0001-implement-trim_array.v4.patch text/x-patch 6.7 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Vik Fearing <vik(at)postgresfriends(dot)org>
Cc: Dian M Fay <dian(dot)m(dot)fay(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: TRIM_ARRAY
Date: 2021-03-03 21:47:17
Message-ID: 1249311.1614808037@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Vik Fearing <vik(at)postgresfriends(dot)org> writes:
> On 3/2/21 1:02 AM, Dian M Fay wrote:
>> I'd thought that checking v and returning null instead of raising the
>> error would be more friendly, should it be possible to pass an untyped
>> null accidentally instead of on purpose, and I couldn't rule that out.

> As Tom said, that is something that does not belong in this patch.

Yeah, the individual function really doesn't have any way to affect
this, since the error happens on the way to identifying which function
we should call in the first place.

I had the same problem as Dian of the func.sgml hunk winding up in
the wrong place. I think this is practically inevitable unless the
submitter uses more than 3 lines of context for the diff, because
otherwise the context is just boilerplate that looks the same
everywhere in the function tables. Unless the diff is 100% up to date
so that the line numbers are exactly right, patch is likely to guess
wrong about where to insert the new hunk. We'll just have to be
vigilant about that.

I fooled with your test case a bit ... I didn't think it was really
necessary to create and drop a table, when we could just use a VALUES
clause as source of test data. Also you'd forgotten to update the
"descr" description of the function to match the final understanding
of the semantics.

Looks good otherwise, so pushed.

regards, tom lane


From: Vik Fearing <vik(at)postgresfriends(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dian M Fay <dian(dot)m(dot)fay(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: TRIM_ARRAY
Date: 2021-03-03 22:03:46
Message-ID: 5d4074b7-f844-e52c-2b3b-833f2d334a69@postgresfriends.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3/3/21 10:47 PM, Tom Lane wrote:
>
> I had the same problem as Dian of the func.sgml hunk winding up in
> the wrong place. I think this is practically inevitable unless the
> submitter uses more than 3 lines of context for the diff, because
> otherwise the context is just boilerplate that looks the same
> everywhere in the function tables. Unless the diff is 100% up to date
> so that the line numbers are exactly right, patch is likely to guess
> wrong about where to insert the new hunk. We'll just have to be
> vigilant about that.

Noted.

> I fooled with your test case a bit ... I didn't think it was really
> necessary to create and drop a table, when we could just use a VALUES
> clause as source of test data. Also you'd forgotten to update the
> "descr" description of the function to match the final understanding
> of the semantics.

Thank you.

> Looks good otherwise, so pushed.

Thanks!
--
Vik Fearing