Quick Links

Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

Lists:	pgsql-hackers

From:	Gilles Darold <gilles(at)darold(dot)net>
To:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	[PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-03-03 09:15:57
Message-ID:	fc160ee0-c843-b024-29bb-97b5da61971f@darold.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Oracle:
https://docs.oracle.com/en/database/oracle/oracle-database/18/adfns/regexp.html#GUID-F14733F3-B943-4BAD-8489-F9704986386B
IBM:
https://www.ibm.com/support/producthub/db2/docs/content/SSEPGG_11.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0061494.html?pos=2
Z/OS:
https://www.ibm.com/support/knowledgecenter/SSEPEK_12.0.0/sqlref/src/tpc/db2z_bif_regexplike.html
EDB:
https://www.enterprisedb.com/edb-docs/d/edb-postgres-advanced-server/reference/database-compatibility-for-oracle-developers-reference-guide/9.6/Database_Compatibility_for_Oracle_Developers_Reference_Guide.1.098.html

Hi,

I would like to suggest adding the $subject functions to PostgreSQL. We
can do lot of things using regexp_matches() and regexp_replace() but
some time it consist on building complex queries that these functions
can greatly simplify.

Look like all RDBMS that embedded a regexp engine implement these
functions (Oracle, DB2, MySQL, etc) but I don't know if they are part of
the SQL standard. Probably using regexp_matches() can be enough even if
it generates more complex statements but having these functions in
PostgreSQL could be useful for users and code coming from theses RDBMS.

- REGEXP_COUNT( string text, pattern text, [, position int] [, flags
text ] ) -> integer

Return the number of times a pattern occurs in a source string
after a certain position, default from beginning.

It can be implemented in PostgreSQL as a subquery using:

SELECT count(*) FROM regexp_matches('A1B2C3', '[A-Z][0-9]',
'g'); -> 3

To support positioning we have to use substr(), for example
starting at position 2:

SELECT count(*) FROM regexp_matches(substr('A1B2C3', 2),
'[A-Z][0-9]'); -> 2

With regexp_count() we can simply use it like this:

SELECT regexp_count('A1B2C3', '[A-Z][0-9]'); -> 3
SELECT regexp_count('A1B2C3', '[A-Z][0-9]', 2); -> 2

- REGEXP_INSTR( string text, pattern text, [, position int] [,
occurrence int] [, return_opt int ] [, flags text ] [, group int] ) ->
integer

Return the position in a string for a regular expression
pattern. It returns an integer indicating the beginning or ending
position of the matched substring, depending on the value of the
return_opt argument (default beginning). If no match is found, then the
function returns 0.

            * position: indicates the character where the search should
begin.
            * occurrence: indicates which occurrence of pattern found in
string should be search.
            * return_opt: 0 mean returns the position of the first
character of the occurrence, 1 mean returns the position of the
character following the occurrence.
            * flags: regular expression modifiers.
            * group: indicates which subexpression in pattern is the
target of the function.

Example:

SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1,
0, 'i', 4); -> 7

to obtain a PostgreSQL equivalent:

SELECT position((SELECT (regexp_matches('1234567890',
'(123)(4(56)(78))', 'ig'))[4] offset 0 limit 1) IN '1234567890');

- REGEXP_SUBSTR( string text, pattern text, [, position int] [,
occurrence int] [, flags text ] [, group int] ) -> text

It is similar to regexp_instr(), but instead of returning the
position of the substring, it returns the substring itself.

Example:

SELECT regexp_substr('500 gilles''s street, 38000 Grenoble,
FR', ',[^,]+,'); -> , 38000 Grenoble,

or with a more complex extraction:

SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1,
'i', 4); -> 78
SELECT regexp_substr('1234567890 1234557890',
'(123)(4(5[56])(78))', 1, 2, 'i', 3); -> 55

To obtain the same result for the last example we have to use:

SELECT (SELECT * FROM regexp_matches('1234567890
1234557890', '(123)(4(5[56])(78))', 'g') offset 1 limit 2)[3];

I have not implemented the regexp_like() function, it is quite similar
than the ~ and ~* operators except that it can also support other
modifiers than 'i'. I can implement it easily and add it to the patch if
we want to supports all those common functions.

- REGEXP_LIKE( string text, pattern text, [, flags text ] ) -> boolean

Similar to the LIKE condition, except that it performs regular
expression matching instead of the simple pattern matching performed by
LIKE.

Example:

SELECT * FROM t1 WHERE regexp_like(col1, '^d$', 'm');

to obtain a PostgreSQL equivalent:

SELECT * FROM t1 WHERE regexp_match (col1, '^d$', 'm' ) IS
NOT NULL;

There is also a possible extension to regexp_replace() that I have not
implemented yet because it need more work than the previous functions.

- REGEXP_REPLACE( string text, pattern text, replace_string text, [,
position int] [, occurrence int] [, flags text ] )

Extend PostgreSQL regexp_replace() by adding position and occurrence
capabilities.

The patch is ready for testing with documentation and regression tests.

Best regards,

--
Gilles Darold
LzLabs GmbH

From:	Gilles Darold <gillesdarold(at)gmail(dot)com>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-03-03 09:22:16
Message-ID:	79fadb31-3f06-3efc-bc53-1c4b015d5088@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

My apologies for the links in the head, the email formatting and the
missing patch, I accidently send the email too early.

Gilles

Attachment	Content-Type	Size
regexp_functions-v1.patch	text/x-patch	42.9 KB

From:	Gilles Darold <gilles(at)darold(dot)net>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc:	Chapman Flack <chap(at)anastigmatix(dot)net>
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-03-20 18:48:48
Message-ID:	2292303d-766d-2172-2942-b076a3bb6505@darold.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

This is a new version of the patch that now implements all the XQUERY
regexp functions as described in the standard, minus the differences of
PostgerSQL regular expression explain in [1].

The standard SQL describe functions like_regex(), occurrences_regex(),
position_regex(), substring_regex() and translate_regex() which
correspond to the commonly named functions regexp_like(),
regexp_count(), regexp_instr(), regexp_substr() and regexp_replace() as
reported by Chapman Flack in [2]. All these function are implemented in
the patch. Syntax of the functions are:

- regexp_like(string, pattern [, flags ])

- regexp_count( string, pattern [, position ] [, flags ])

- regexp_instr( string, pattern [, position ] [, occurrence ] [,
returnopt ] [, flags ] [, group ])

- regexp_substr( string, pattern [, position ] [, occurrence ] [, flags
] [, group ])

- regexp_replace(source, pattern, replacement [, position ] [,
occurrence ] [, flags ])

In addition to previous patch version I have added the regexp()_like
function and extended the existsing regex_replace() function. The patch
documents these functions and adds regression tests for all functions. I
will add it to the commitfest.

An other regexp functions regexp_positions() that returns all
occurrences that matched a POSIX regular expression is also developped
by Joel Jacobson, see [2]. This function expands the list of regexp
functions described in XQUERY.

[1]
/docs/13/functions-matching.html#FUNCTIONS-POSIX-REGEXP

[2]
/message-id/flat/bf2222d5-909d-408b-8531-95b32f18d4ab%40www.fastmail.com#3ec8ba658eeabcae2ac6ccca33bd1aed

--
Gilles Darold
LzLabs GmbH
http://www.lzlabs.com/

Attachment	Content-Type	Size
v2-0001-xquery-regexp-functions.patch	text/x-patch	71.4 KB

From:	Gilles Darold <gilles(at)darold(dot)net>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-03-21 09:21:17
Message-ID:	a508c358-5661-c97c-7edd-0ca39e0dfbb6@darold.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Le 20/03/2021 à 19:48, Gilles Darold a écrit :
>
> Hi,
>
>
> This is a new version of the patch that now implements all the XQUERY
> regexp functions as described in the standard, minus the differences
> of PostgerSQL regular expression explain in [1].
>
>
> The standard SQL describe functions like_regex(), occurrences_regex(),
> position_regex(), substring_regex() and translate_regex() which
> correspond to the commonly named functions regexp_like(),
> regexp_count(), regexp_instr(), regexp_substr() and regexp_replace()
> as reported by Chapman Flack in [2]. All these function are
> implemented in the patch. Syntax of the functions are:
>
>
> - regexp_like(string, pattern [, flags ])
>
> - regexp_count( string, pattern [, position ] [, flags ])
>
> - regexp_instr( string, pattern [, position ] [, occurrence ] [,
> returnopt ] [, flags ] [, group ])
>
> - regexp_substr( string, pattern [, position ] [, occurrence ] [,
> flags ] [, group ])
>
> - regexp_replace(source, pattern, replacement [, position ] [,
> occurrence ] [, flags ])
>
>
> In addition to previous patch version I have added the regexp()_like
> function and extended the existsing regex_replace() function. The
> patch documents these functions and adds regression tests for all
> functions. I will add it to the commitfest.
>
>
> An other regexp functions regexp_positions() that returns all
> occurrences that matched a POSIX regular expression is also developped
> by Joel Jacobson, see [2]. This function expands the list of regexp
> functions described in XQUERY.
>
>
> [1]
> /docs/13/functions-matching.html#FUNCTIONS-POSIX-REGEXP
>
> [2]
> /message-id/flat/bf2222d5-909d-408b-8531-95b32f18d4ab%40www.fastmail.com#3ec8ba658eeabcae2ac6ccca33bd1aed
>
>

I would like to see these functions in PG 14 but it is a bit too late,
added to commitfest 2021-07.

--
Gilles Darold
LzLabs GmbH
http://www.lzlabs.com/

From:	er(at)xs4all(dot)nl
To:	Gilles Darold <gilles(at)darold(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc:	Chapman Flack <chap(at)anastigmatix(dot)net>
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-03-21 11:07:37
Message-ID:	2122577845.234888.1616324857283@webmailclassic.xs4all.nl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> On 2021.03.20. 19:48 Gilles Darold <gilles(at)darold(dot)net> wrote:
>
> This is a new version of the patch that now implements all the XQUERY
> regexp functions as described in the standard, minus the differences of
> PostgerSQL regular expression explain in [1].
>
> The standard SQL describe functions like_regex(), occurrences_regex(),
> position_regex(), substring_regex() and translate_regex() which
> correspond to the commonly named functions regexp_like(),
> regexp_count(), regexp_instr(), regexp_substr() and regexp_replace() as
> reported by Chapman Flack in [2]. All these function are implemented in

> [v2-0001-xquery-regexp-functions.patch]

Hi,

Apply, compile and (world)check are fine. I haven't found errors in functionality.

I went through the docs, and came up with these changes in func.sgml, and pg_proc.dat.

Useful functions - thanks!

Erik Rijkers

Attachment	Content-Type	Size
func.sgml.20210321.diff	text/x-patch	7.2 KB
pg_proc.dat.20210321.diff	text/x-patch	1.8 KB

From:	Gilles Darold <gilles(at)darold(dot)net>
To:	er(at)xs4all(dot)nl, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc:	Chapman Flack <chap(at)anastigmatix(dot)net>
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-03-21 13:19:13
Message-ID:	9f91e4e7-19c8-d8e0-6955-fefaebd4c8fe@darold.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Le 21/03/2021 à 12:07, er(at)xs4all(dot)nl a écrit :
>> On 2021.03.20. 19:48 Gilles Darold <gilles(at)darold(dot)net> wrote:
>>
>> This is a new version of the patch that now implements all the XQUERY
>> regexp functions as described in the standard, minus the differences of
>> PostgerSQL regular expression explain in [1].
>>
>> The standard SQL describe functions like_regex(), occurrences_regex(),
>> position_regex(), substring_regex() and translate_regex() which
>> correspond to the commonly named functions regexp_like(),
>> regexp_count(), regexp_instr(), regexp_substr() and regexp_replace() as
>> reported by Chapman Flack in [2]. All these function are implemented in
>> [v2-0001-xquery-regexp-functions.patch]
> Hi,
>
> Apply, compile and (world)check are fine. I haven't found errors in functionality.
>
> I went through the docs, and came up with these changes in func.sgml, and pg_proc.dat.
>
> Useful functions - thanks!
>
> Erik Rijkers

Thanks a lot Erik, here is a version of the patch with your corrections.

--
Gilles Darold
LzLabs GmbH
http://www.lzlabs.com/

Attachment	Content-Type	Size
v3-0001-xquery-regexp-functions.patch	text/x-patch	71.6 KB

From:	Chapman Flack <chap(at)anastigmatix(dot)net>
To:	Gilles Darold <gilles(at)darold(dot)net>, er(at)xs4all(dot)nl, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-03-21 14:42:34
Message-ID:	60575B5A.4080807@anastigmatix.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 03/21/21 09:19, Gilles Darold wrote:
>>> On 2021.03.20. 19:48 Gilles Darold <gilles(at)darold(dot)net> wrote:
>>>
>>> This is a new version of the patch that now implements all the XQUERY
>>> regexp functions as described in the standard, minus the differences of
>>> PostgerSQL regular expression explain in [1].
>>>
>>> The standard SQL describe functions like_regex(), occurrences_regex(),
>>> position_regex(), substring_regex() and translate_regex() which
>>> correspond to the commonly named functions regexp_like(),
>>> regexp_count(), regexp_instr(), regexp_substr() and regexp_replace() as
>>> reported by Chapman Flack in [2]. All these function are implemented in
>>> [v2-0001-xquery-regexp-functions.patch]

I quickly looked over this patch preparing to object if it actually
purported to implement the ISO foo_regex() named functions without
the ISO semantics, but a quick grep reassured me that it doesn't
implement any of those functions. It only supplies functions in
the alternative, apparently common de facto naming scheme regexp_foo().

To be clear, I think that's the right call. I do not think it would be
a good idea to supply functions that have the ISO names but not the
specified regex dialect.

A set of functions analogous to the ISO ones but differently named and
with a different regex dialect seems fine to me, especially if these
different names are de facto common, and as far as I can tell, that is
what this patch provides. So I have no objection to that. :)

It might then be fair to say that the /description/ of the patch as
implementing the XQuery-based foo_regex functions isn't quite right,
or at least carries a risk of jarring some readers into hasty
double-takes on Sunday mornings before coffee.

It might be clearer to just mention the close correspondence between
the functions in this differently-named set and the corresponding ISO ones.

If this turns out to be a case of "attached the wrong patch, here's
the one that does implement foo_regex functions!" then I reserve an
objection to that. :)

Regards,
-Chap

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Chapman Flack <chap(at)anastigmatix(dot)net>
Cc:	Gilles Darold <gilles(at)darold(dot)net>, er(at)xs4all(dot)nl, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-03-21 14:53:17
Message-ID:	345476.1616338397@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Chapman Flack <chap(at)anastigmatix(dot)net> writes:
> If this turns out to be a case of "attached the wrong patch, here's
> the one that does implement foo_regex functions!" then I reserve an
> objection to that. :)

+1 to that. Just to add a note, I do have some ideas about extending
our regex parser so that it could duplicate the XQuery syntax --- none
of the points we mention in 9.7.3.8 seem insurmountable. I'm not
planning to work on that in the near future, mind you, but I definitely
think that we don't want to paint ourselves into a corner where we've
already implemented the XQuery regex functions with the wrong behavior.

regards, tom lane

From:	Gilles Darold <gilles(at)darold(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Chapman Flack <chap(at)anastigmatix(dot)net>
Cc:	er(at)xs4all(dot)nl, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-03-21 16:40:45
Message-ID:	5e3a2df7-ba00-a633-f015-f6c97e5e86e8@darold.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Le 21/03/2021 à 15:53, Tom Lane a écrit :
> Chapman Flack <chap(at)anastigmatix(dot)net> writes:
>> If this turns out to be a case of "attached the wrong patch, here's
>> the one that does implement foo_regex functions!" then I reserve an
>> objection to that. :)
> +1 to that. Just to add a note, I do have some ideas about extending
> our regex parser so that it could duplicate the XQuery syntax --- none
> of the points we mention in 9.7.3.8 seem insurmountable. I'm not
> planning to work on that in the near future, mind you, but I definitely
> think that we don't want to paint ourselves into a corner where we've
> already implemented the XQuery regex functions with the wrong behavior.
>
> regards, tom lane

I apologize for confusing with the words and phrases I have used. This
patch implements the regexp_foo () functions which are available in most
RDBMS with the behavior described in the documentation. I have modified
the title of the patch in the commitfest to removed wrong use of XQUERY.

I don't know too if the other RDBMS respect the XQUERY behavior but for
what I've seen for Oracle they are using limited regexp modifiers with
sometime not the same letter than PostgreSQL for the same behavior. I
have implemented these functions with the Oracle behavior in Orafce [1]
with a function that checks the modifiers used. This patch doesn't mimic
the Oracle behavior, it use the PostgreSQL behavior with regexp, the one
used by regex_replace() and regex_matches(). All regexp modifiers can be
used.

[1] https://github.com/orafce/orafce/blob/master/orafce--3.14--3.15.sql

--
Gilles Darold
http://www.darold.net/

From:	Gilles Darold <gillesdarold(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Chapman Flack <chap(at)anastigmatix(dot)net>
Cc:	Gilles Darold <gilles(at)darold(dot)net>, er(at)xs4all(dot)nl, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-03-21 16:46:47
Message-ID:	45fe2893-1efc-1666-3ea5-b5fb1918c7fa@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

And the patch renamed.

Attachment	Content-Type	Size
v4-0001-regexp-foo-functions.patch	text/x-patch	71.6 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gilles Darold <gillesdarold(at)gmail(dot)com>
Cc:	Chapman Flack <chap(at)anastigmatix(dot)net>, Gilles Darold <gilles(at)darold(dot)net>, er(at)xs4all(dot)nl, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-07-26 19:56:01
Message-ID:	212635.1627329361@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gilles Darold <gillesdarold(at)gmail(dot)com> writes:
> [ v4-0001-regexp-foo-functions.patch ]

I started to work through this and was distressed to realize that
it's trying to redefine regexp_replace() in an incompatible way.
We already have

The patch proposes to add (among other alternatives)

+{ oid => '9608', descr => 'replace text using regexp',
+ proname => 'regexp_replace', prorettype => 'text',
+ proargtypes => 'text text text int4', prosrc => 'textregexreplace_extended_no_occurrence' },

which is going to be impossibly confusing for both humans and machines.
I don't think we should go there. Even if you managed to construct
examples that didn't result in "ambiguous function" failures, that
doesn't mean that ordinary mortals won't get bit that way.

I'm inclined to just drop the regexp_replace additions. I don't think
that the extra parameters Oracle provides here are especially useful.
They're definitely not useful enough to justify creating compatibility
hazards for.

regards, tom lane

From:	Gilles Darold <gilles(at)darold(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gilles Darold <gillesdarold(at)gmail(dot)com>
Cc:	Chapman Flack <chap(at)anastigmatix(dot)net>, er(at)xs4all(dot)nl, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-07-27 09:38:36
Message-ID:	2204c4a4-8ee8-abdd-d5ca-d0b6893f96ab@darold.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Le 26/07/2021 à 21:56, Tom Lane a écrit :
> Gilles Darold <gillesdarold(at)gmail(dot)com> writes:
>> [ v4-0001-regexp-foo-functions.patch ]
> I started to work through this and was distressed to realize that
> it's trying to redefine regexp_replace() in an incompatible way.
> We already have
>
> regression=# \df regexp_replace
> List of functions
> Schema | Name | Result data type | Argument data types | Type
> ------------+----------------+------------------+------------------------+------
> pg_catalog | regexp_replace | text | text, text, text | func
> pg_catalog | regexp_replace | text | text, text, text, text | func
> (2 rows)
>
> The patch proposes to add (among other alternatives)
>
> +{ oid => '9608', descr => 'replace text using regexp',
> + proname => 'regexp_replace', prorettype => 'text',
> + proargtypes => 'text text text int4', prosrc => 'textregexreplace_extended_no_occurrence' },
>
> which is going to be impossibly confusing for both humans and machines.
> I don't think we should go there. Even if you managed to construct
> examples that didn't result in "ambiguous function" failures, that
> doesn't mean that ordinary mortals won't get bit that way.
>
> I'm inclined to just drop the regexp_replace additions. I don't think
> that the extra parameters Oracle provides here are especially useful.
> They're definitely not useful enough to justify creating compatibility
> hazards for.

I would not say that being able to replace the Nth occurrence of a
pattern matching is not useful but i agree that this is not a common
case with replacement. Both Oracle [1] and IBM DB2 [2] propose this form
and I have though that we can not have compatibility issues because of
the different data type at the 4th parameter. Anyway, maybe we can just
rename the function even if I would prefer that regexp_replace() be
extended. For example:

regexp_replace(source, pattern, replacement [, flags ]);

regexp_substitute(source, pattern, replacement [, position ] [,
occurrence ] [, flags ]);

of course with only 3 parameters the two functions are the same.

What do you think about the renaming proposal instead of simply drop the
extended form of the function?

Best regards,

[1] https://docs.oracle.com/database/121/SQLRF/functions163.htm#SQLRF06302

[2] https://www.ibm.com/docs/en/db2oc?topic=functions-regexp-replace

--
Gilles Darold
http://www.darold.net/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gilles Darold <gilles(at)darold(dot)net>
Cc:	Gilles Darold <gillesdarold(at)gmail(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, er(at)xs4all(dot)nl, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-07-30 21:38:24
Message-ID:	1124806.1627681104@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gilles Darold <gilles(at)darold(dot)net> writes:
> Le 26/07/2021 à 21:56, Tom Lane a écrit :
>> I'm inclined to just drop the regexp_replace additions. I don't think
>> that the extra parameters Oracle provides here are especially useful.
>> They're definitely not useful enough to justify creating compatibility
>> hazards for.

> I would not say that being able to replace the Nth occurrence of a
> pattern matching is not useful but i agree that this is not a common
> case with replacement. Both Oracle [1] and IBM DB2 [2] propose this form
> and I have though that we can not have compatibility issues because of
> the different data type at the 4th parameter.

Well, here's an example of the potential issues:

regression=# create function rr(text,text,text,text) returns text
regression-# language sql as $$select 'text'$$;
CREATE FUNCTION
regression=# create function rr(text,text,text,int4) returns text
language sql as $$select 'int4'$$;
CREATE FUNCTION
regression=# select rr('a','b','c','d');
rr
------
text
(1 row)

regression=# select rr('a','b','c',42);
rr
------
int4
(1 row)

So far so good, but:

regression=# prepare rr as select rr('a','b','c',$1);
PREPARE
regression=# execute rr(12);
rr
------
text
(1 row)

So somebody trying to use the 4-parameter Oracle form from, say, JDBC
would get bit if they were sloppy about specifying parameter types.

The one saving grace is that digits aren't valid regexp flags,
so the outcome would be something like

regression=# select regexp_replace('a','b','c','12');
ERROR: invalid regular expression option: "1"

which'd be less difficult to debug than silent misbehavior.
Conversely, if you thought you were passing flags but it somehow
got interpreted as a start position, that would fail too:

regression=# prepare rri as select rr('a','b','c', $1::int);
PREPARE
regression=# execute rri('gi');
ERROR: invalid input syntax for type integer: "gi"
LINE 1: execute rri('gi');
^

Still, I bet a lot that we'd see periodic bug reports complaining
that it doesn't work.

> Anyway, maybe we can just
> rename the function even if I would prefer that regexp_replace() be
> extended. For example:
> regexp_replace(source, pattern, replacement [, flags ]);
> regexp_substitute(source, pattern, replacement [, position ] [,
> occurrence ] [, flags ]);

Hmm. Of course the entire selling point of this patch seems to be
bug-compatibility with Oracle, so using different names is largely
defeating the point :-(

Maybe we should just hold our noses and do it. The point that
you'd get a recognizable failure if the wrong function were chosen
reassures me a little bit. We've seen a lot of cases where this
sort of ambiguity results in the system just silently doing something
different from what you expected, and I was afraid that that could
happen here.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gilles Darold <gilles(at)darold(dot)net>
Cc:	Gilles Darold <gillesdarold(at)gmail(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, er(at)xs4all(dot)nl, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-08-01 17:23:13
Message-ID:	1456640.1627838593@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I've been working through this patch, and trying to verify
compatibility against Oracle and DB2, and I see some points that need
discussion or at least recording for the archives.

* In Oracle, while the documentation for regexp_instr says that
return_option should only be 0 or 1, experimentation with sqlfiddle
shows that any nonzero value is silently treated as 1. The patch
raises an error for other values, which I think is a good idea.
(IBM's docs say that DB2 raises an error too, though I can't test
that.) We don't need to be bug-compatible to that extent.

* What should happen when the subexpression/capture group number of
regexp_instr or regexp_substr exceeds the number of parenthesized
subexpressions of the regexp? Oracle silently returns a no-match
result (0 or NULL), as does this patch. However, IBM's docs say
that DB2 raises an error. I'm inclined to think that this is
likewise taking bug-compatibility too far, and that we should
raise an error like DB2. There are clearly cases where throwing
an error would help debug a faulty call, while I'm less clear on
a use-case where not throwing an error would be useful.

* IBM's docs say that both regexp_count and regexp_like have
arguments "string, pattern [, start] [, flags]" --- that is,
each of start and flags can be independently specified or omitted.
The patch follows Oracle, which has no start option for
regexp_like, and where you can't write flags for regexp_count
without writing start. This is fine by me, because doing these
like DB2 would introduce the same which-argument-is-this issues
as we're being forced to cope with for regexp_replace. I don't
think we need to accept ambiguity in these cases too. But it's
worth memorializing this decision in the thread.

* The patch has most of these functions silently ignoring the 'g'
flag, but I think they should raise errors instead. Oracle doesn't
accept a 'g' flag for these, so why should we? The only case where
that logic doesn't hold is regexp_replace, because depending on which
syntax you use the 'g' flag might or might not be meaningful. So
for regexp_replace, I'd vote for silently ignoring 'g' if the
occurrence-number parameter is given, while honoring it if not.

I've already made changes in my local copy per the last item,
but I've not done anything about throwing errors for out-of-range
subexpression numbers. Anybody have an opinion about that one?

regards, tom lane

From:	Gilles Darold <gilles(at)darold(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Chapman Flack <chap(at)anastigmatix(dot)net>, er(at)xs4all(dot)nl, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-08-01 19:22:41
Message-ID:	b0e7ffa2-e776-2b6b-d0b1-bf3289439790@darold.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Le 30/07/2021 à 23:38, Tom Lane a écrit :
> Gilles Darold <gilles(at)darold(dot)net> writes:
>> Le 26/07/2021 à 21:56, Tom Lane a écrit :
>>> I'm inclined to just drop the regexp_replace additions. I don't think
>>> that the extra parameters Oracle provides here are especially useful.
>>> They're definitely not useful enough to justify creating compatibility
>>> hazards for.
>> I would not say that being able to replace the Nth occurrence of a
>> pattern matching is not useful but i agree that this is not a common
>> case with replacement. Both Oracle [1] and IBM DB2 [2] propose this form
>> and I have though that we can not have compatibility issues because of
>> the different data type at the 4th parameter.
> Well, here's an example of the potential issues:
>
> [...]

Thanks for pointing me this case, I did not think that the prepared
statement could lead to this confusion.

>> Anyway, maybe we can just
>> rename the function even if I would prefer that regexp_replace() be
>> extended. For example:
>> regexp_replace(source, pattern, replacement [, flags ]);
>> regexp_substitute(source, pattern, replacement [, position ] [,
>> occurrence ] [, flags ]);
> Hmm. Of course the entire selling point of this patch seems to be
> bug-compatibility with Oracle, so using different names is largely
> defeating the point :-(
>
> Maybe we should just hold our noses and do it. The point that
> you'd get a recognizable failure if the wrong function were chosen
> reassures me a little bit. We've seen a lot of cases where this
> sort of ambiguity results in the system just silently doing something
> different from what you expected, and I was afraid that that could
> happen here.

I join a new version of the patch that include a check of the option
parameter in the basic form of regexp_replace() and return an error in
ambiguous cases.

PREPARE rr AS SELECT regexp_replace('healthy, wealthy, and
wise','(\w+)thy', '\1ish', $1);
EXECUTE rr(1);
ERROR: ambiguous use of the option parameter in regex_replace(),
value: 1
HINT: you might set the occurrence parameter to force the use of
the extended form of regex_replace()

This is done by checking if the option parameter value is an integer and
throw the error in this case. I don't think of anything better.

Best regards,

--
Gilles Darold

Attachment	Content-Type	Size
v5-0001-regexp-foo-functions.patch	text/x-patch	73.1 KB

From:	Gilles Darold <gilles(at)darold(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Chapman Flack <chap(at)anastigmatix(dot)net>, er(at)xs4all(dot)nl, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-08-01 19:48:00
Message-ID:	366af4de-b78e-ef76-5c7a-3d8a16567168@darold.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Le 01/08/2021 à 19:23, Tom Lane a écrit :
> I've been working through this patch, and trying to verify
> compatibility against Oracle and DB2, and I see some points that need
> discussion or at least recording for the archives.
>
> * In Oracle, while the documentation for regexp_instr says that
> return_option should only be 0 or 1, experimentation with sqlfiddle
> shows that any nonzero value is silently treated as 1. The patch
> raises an error for other values, which I think is a good idea.
> (IBM's docs say that DB2 raises an error too, though I can't test
> that.) We don't need to be bug-compatible to that extent.
>
> * What should happen when the subexpression/capture group number of
> regexp_instr or regexp_substr exceeds the number of parenthesized
> subexpressions of the regexp? Oracle silently returns a no-match
> result (0 or NULL), as does this patch. However, IBM's docs say
> that DB2 raises an error. I'm inclined to think that this is
> likewise taking bug-compatibility too far, and that we should
> raise an error like DB2. There are clearly cases where throwing
> an error would help debug a faulty call, while I'm less clear on
> a use-case where not throwing an error would be useful.
>
> * IBM's docs say that both regexp_count and regexp_like have
> arguments "string, pattern [, start] [, flags]" --- that is,
> each of start and flags can be independently specified or omitted.
> The patch follows Oracle, which has no start option for
> regexp_like, and where you can't write flags for regexp_count
> without writing start. This is fine by me, because doing these
> like DB2 would introduce the same which-argument-is-this issues
> as we're being forced to cope with for regexp_replace. I don't
> think we need to accept ambiguity in these cases too. But it's
> worth memorializing this decision in the thread.
>
> * The patch has most of these functions silently ignoring the 'g'
> flag, but I think they should raise errors instead. Oracle doesn't
> accept a 'g' flag for these, so why should we? The only case where
> that logic doesn't hold is regexp_replace, because depending on which
> syntax you use the 'g' flag might or might not be meaningful. So
> for regexp_replace, I'd vote for silently ignoring 'g' if the
> occurrence-number parameter is given, while honoring it if not.
>
> I've already made changes in my local copy per the last item,
> but I've not done anything about throwing errors for out-of-range
> subexpression numbers. Anybody have an opinion about that one?

I thought about this while I was implementing the functions and chose to
not throw an error because of the Oracle behavior and also with others
regular expression implementation. For example in Perl there is no error:

$ perl -e '$str="hello world"; $str =~ s/(l)/$20/; print "$str\n";'
helo world

Usually a regular expression is always tested by its creator to be sure
that this the right one and that it does what is expected. But I agree
that it could help the writer to debug its RE.

Also if I recall well Oracle and DB2 limit the number of capture groups
back references from \1 to \9 for Oracle and \0 to \9 for DB2. I have
chosen to not apply this limit, I don't see the interest of such a
limitation.

--
Gilles Darold
http://www.darold.net/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gilles Darold <gilles(at)darold(dot)net>
Cc:	Chapman Flack <chap(at)anastigmatix(dot)net>, er(at)xs4all(dot)nl, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-08-01 23:21:55
Message-ID:	1567465.1627860115@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gilles Darold <gilles(at)darold(dot)net> writes:
> [ v5-0001-regexp-foo-functions.patch ]

I've gone through this whole patch now, and found quite a lot that I did
not like. In no particular order:

* Wrapping parentheses around the user's regexp doesn't work. It can
turn an invalid regexp into a valid one: for example 'a)(b' should draw
a syntax error. With this patch, no error would be thrown, but the
"outer" parens wouldn't do what you expected. Worse, it can turn a
valid regexp into an invalid one: the metasyntax options described in
9.7.3.4 only work at the start of the regexp. So we have to handle
whole-regexp cases honestly rather than trying to turn them into an
instance of the parenthesized-subexpression case.

* You did a lot of things quite inefficiently, apparently to avoid
touching any existing code. I think it's better to extend
setup_regexp_matches() and replace_text_regexp() a little bit so that
they can support the behaviors these new functions need. In both of
them, it's absolutely trivial to allow a search start position to be
passed in; and it doesn't take much to teach replace_text_regexp()
to replace only the N'th match.

* Speaking of N'th, there is not much of anything that I like
about Oracle's terminology for the function arguments, and I don't
think we ought to adopt it. If we're documenting the functions as
processing the "N'th match", it seems to me to be natural to call
the parameter "N" not "occurrence". Speaking of the "occurrence'th
occurrence" is just silly, not to mention long and easy to misspell.
Likewise, "position" is a horribly vague term for the search start
position; it could be interpreted to mean several other things.
"start" seems much better. "return_opt" is likewise awfully unclear.
I went with "endoption" below, though I could be talked into something
else. The only one of Oracle's choices that I like is "subexpr" for
subexpression number ... but you went with DB2's rather vague "group"
instead. I don't want to use their "capture group" terminology,
because that appears nowhere else in our documentation. Our existing
terminology is "parenthesized subexpression", which seems fine to me
(and also agrees with Oracle's docs).

* I spent a lot of time on the docs too. A lot of the syntax specs
were wrong (where you put the brackets matters), many of the examples
seemed confusingly overcomplicated, and the text explanations needed
copy-editing.

* Also, the regression tests seemed misguided. This patch is not
responsible for testing the regexp engine as such; we have tests
elsewhere that do that. So I don't think we need complex regexps
here. We just need to verify that the parameters of these functions
act properly, and check their error cases. That can be done much
more quickly and straightforwardly than what you had.

So here's a revised version that I like better. I think this
is pretty nearly committable, aside from the question of whether
a too-large subexpression number should be an error or not.

regards, tom lane

Attachment	Content-Type	Size
v6-0001-regexp-foo-functions.patch	text/x-diff	60.7 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gilles Darold <gilles(at)darold(dot)net>
Cc:	Chapman Flack <chap(at)anastigmatix(dot)net>, er(at)xs4all(dot)nl, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-08-02 01:02:20
Message-ID:	1572284.1627866140@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I wrote:
> ... aside from the question of whether
> a too-large subexpression number should be an error or not.

Oh ... poking around some more, I noticed a very nearby precedent.
regexp_replace's replacement string can include \1 to \9 to insert
the substring matching the N'th parenthesized subexpression. But
if there is no such subexpression, you don't get an error, just
an empty insertion. So that seems like an argument for not
throwing an error for an out-of-range subexpr parameter.

regards, tom lane

From:	Gilles Darold <gilles(at)darold(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Chapman Flack <chap(at)anastigmatix(dot)net>, er(at)xs4all(dot)nl, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-08-02 21:22:04
Message-ID:	672983ad-86de-9063-acc1-a1f85ef0e14d@darold.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Le 02/08/2021 à 01:21, Tom Lane a écrit :
> Gilles Darold <gilles(at)darold(dot)net> writes:
>> [ v5-0001-regexp-foo-functions.patch ]
> I've gone through this whole patch now, and found quite a lot that I did
> not like. In no particular order:
>
> * Wrapping parentheses around the user's regexp doesn't work. It can
> turn an invalid regexp into a valid one: for example 'a)(b' should draw
> a syntax error. With this patch, no error would be thrown, but the
> "outer" parens wouldn't do what you expected. Worse, it can turn a
> valid regexp into an invalid one: the metasyntax options described in
> 9.7.3.4 only work at the start of the regexp. So we have to handle
> whole-regexp cases honestly rather than trying to turn them into an
> instance of the parenthesized-subexpression case.
>
> * You did a lot of things quite inefficiently, apparently to avoid
> touching any existing code. I think it's better to extend
> setup_regexp_matches() and replace_text_regexp() a little bit so that
> they can support the behaviors these new functions need. In both of
> them, it's absolutely trivial to allow a search start position to be
> passed in; and it doesn't take much to teach replace_text_regexp()
> to replace only the N'th match.
>
> * Speaking of N'th, there is not much of anything that I like
> about Oracle's terminology for the function arguments, and I don't
> think we ought to adopt it. If we're documenting the functions as
> processing the "N'th match", it seems to me to be natural to call
> the parameter "N" not "occurrence". Speaking of the "occurrence'th
> occurrence" is just silly, not to mention long and easy to misspell.
> Likewise, "position" is a horribly vague term for the search start
> position; it could be interpreted to mean several other things.
> "start" seems much better. "return_opt" is likewise awfully unclear.
> I went with "endoption" below, though I could be talked into something
> else. The only one of Oracle's choices that I like is "subexpr" for
> subexpression number ... but you went with DB2's rather vague "group"
> instead. I don't want to use their "capture group" terminology,
> because that appears nowhere else in our documentation. Our existing
> terminology is "parenthesized subexpression", which seems fine to me
> (and also agrees with Oracle's docs).
>
> * I spent a lot of time on the docs too. A lot of the syntax specs
> were wrong (where you put the brackets matters), many of the examples
> seemed confusingly overcomplicated, and the text explanations needed
> copy-editing.
>
> * Also, the regression tests seemed misguided. This patch is not
> responsible for testing the regexp engine as such; we have tests
> elsewhere that do that. So I don't think we need complex regexps
> here. We just need to verify that the parameters of these functions
> act properly, and check their error cases. That can be done much
> more quickly and straightforwardly than what you had.
>
>
> So here's a revised version that I like better. I think this
> is pretty nearly committable, aside from the question of whether
> a too-large subexpression number should be an error or not.

Thanks a lot for the patch improvement and the guidance. I have read the
patch and I agree with your choices I think I was too much trying to
mimic the oraclisms. I don't think we should take care of the too-large
subexpression number, the regexp writer should always test its regular
expression and also this will not prevent him to chose the wrong capture
group number but just a non existing one.

Best regards,

--
Gilles Darold

From:	Gilles Darold <gilles(at)darold(dot)net>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-08-03 09:45:09
Message-ID:	d042905c-f0b3-be60-44b8-bd3229cca66e@darold.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Le 02/08/2021 à 23:22, Gilles Darold a écrit :
> Le 02/08/2021 à 01:21, Tom Lane a écrit :
>> Gilles Darold <gilles(at)darold(dot)net> writes:
>>> [ v5-0001-regexp-foo-functions.patch ]
>> I've gone through this whole patch now, and found quite a lot that I did
>> not like. In no particular order:
>>
>> * Wrapping parentheses around the user's regexp doesn't work. It can
>> turn an invalid regexp into a valid one: for example 'a)(b' should draw
>> a syntax error. With this patch, no error would be thrown, but the
>> "outer" parens wouldn't do what you expected. Worse, it can turn a
>> valid regexp into an invalid one: the metasyntax options described in
>> 9.7.3.4 only work at the start of the regexp. So we have to handle
>> whole-regexp cases honestly rather than trying to turn them into an
>> instance of the parenthesized-subexpression case.
>>
>> * You did a lot of things quite inefficiently, apparently to avoid
>> touching any existing code. I think it's better to extend
>> setup_regexp_matches() and replace_text_regexp() a little bit so that
>> they can support the behaviors these new functions need. In both of
>> them, it's absolutely trivial to allow a search start position to be
>> passed in; and it doesn't take much to teach replace_text_regexp()
>> to replace only the N'th match.
>>
>> * Speaking of N'th, there is not much of anything that I like
>> about Oracle's terminology for the function arguments, and I don't
>> think we ought to adopt it. If we're documenting the functions as
>> processing the "N'th match", it seems to me to be natural to call
>> the parameter "N" not "occurrence". Speaking of the "occurrence'th
>> occurrence" is just silly, not to mention long and easy to misspell.
>> Likewise, "position" is a horribly vague term for the search start
>> position; it could be interpreted to mean several other things.
>> "start" seems much better. "return_opt" is likewise awfully unclear.
>> I went with "endoption" below, though I could be talked into something
>> else. The only one of Oracle's choices that I like is "subexpr" for
>> subexpression number ... but you went with DB2's rather vague "group"
>> instead. I don't want to use their "capture group" terminology,
>> because that appears nowhere else in our documentation. Our existing
>> terminology is "parenthesized subexpression", which seems fine to me
>> (and also agrees with Oracle's docs).
>>
>> * I spent a lot of time on the docs too. A lot of the syntax specs
>> were wrong (where you put the brackets matters), many of the examples
>> seemed confusingly overcomplicated, and the text explanations needed
>> copy-editing.
>>
>> * Also, the regression tests seemed misguided. This patch is not
>> responsible for testing the regexp engine as such; we have tests
>> elsewhere that do that. So I don't think we need complex regexps
>> here. We just need to verify that the parameters of these functions
>> act properly, and check their error cases. That can be done much
>> more quickly and straightforwardly than what you had.
>>
>>
>> So here's a revised version that I like better. I think this
>> is pretty nearly committable, aside from the question of whether
>> a too-large subexpression number should be an error or not.
>
> Thanks a lot for the patch improvement and the guidance. I have read the
> patch and I agree with your choices I think I was too much trying to
> mimic the oraclisms. I don't think we should take care of the too-large
> subexpression number, the regexp writer should always test its regular
> expression and also this will not prevent him to chose the wrong capture
> group number but just a non existing one.

Actually I just found that the regexp_like() function doesn't support
the start parameter which is something we should support. I saw that
Oracle do not support it but DB2 does and I think we should also support
it. I will post a new version of the patch once it is done.

Best regards,

--
Gilles Darold

From:	Gilles Darold <gilles(at)darold(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, chap(at)anastigmatix(dot)net, er(at)xs4all(dot)nl
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-08-03 11:26:48
Message-ID:	ab3f3424-4969-906d-0932-800900b4647a@darold.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Le 03/08/2021 à 11:45, Gilles Darold a écrit :
> Actually I just found that the regexp_like() function doesn't support
> the start parameter which is something we should support. I saw that
> Oracle do not support it but DB2 does and I think we should also
> support it. I will post a new version of the patch once it is done.

Here is a new version of the patch that adds the start parameter to
regexp_like() function but while I'm adding support to this parameter it
become less obvious for me that we should implement it. However feel
free to not use this version if you think that adding the start
parameter has no real interest.

Best regards,

--
Gilles Darold

Attachment	Content-Type	Size
v7-0001-regexp-foo-functions.patch	text/x-patch	62.0 KB

From:	Erik Rijkers <er(at)xs4all(dot)nl>
To:	Gilles Darold <gilles(at)darold(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, chap(at)anastigmatix(dot)net
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-08-03 13:02:43
Message-ID:	c004ff29-506c-18df-851d-08b6494c0f40@xs4all.nl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 8/3/21 1:26 PM, Gilles Darold wrote:
> Le 03/08/2021 à 11:45, Gilles Darold a écrit :
>> Actually I just found that the regexp_like() function doesn't support
>> the start parameter which is something we should support. I saw that
>> Oracle do not support it but DB2 does and I think we should also
>> support it. I will post a new version of the patch once it is done.
>

I for one am in favor of this 'start'-argument addition. Slightly
harder usage, but more precise manipulation.

Erik Rijkers

>
> Here is a new version of the patch that adds the start parameter to
> regexp_like() function but while I'm adding support to this parameter it
> become less obvious for me that we should implement it. However feel
> free to not use this version if you think that adding the start
> parameter has no real interest.
>
>
> Best regards,
>

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Erik Rijkers <er(at)xs4all(dot)nl>
Cc:	Gilles Darold <gilles(at)darold(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org, chap(at)anastigmatix(dot)net
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-08-03 13:39:50
Message-ID:	1702731.1627997990@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Erik Rijkers <er(at)xs4all(dot)nl> writes:
> On 8/3/21 1:26 PM, Gilles Darold wrote:
>> Le 03/08/2021 à 11:45, Gilles Darold a écrit :
>>> Actually I just found that the regexp_like() function doesn't support
>>> the start parameter which is something we should support. I saw that
>>> Oracle do not support it but DB2 does and I think we should also
>>> support it. I will post a new version of the patch once it is done.

> +1

> I for one am in favor of this 'start'-argument addition. Slightly
> harder usage, but more precise manipulation.

As I said upthread, I am *not* in favor of making those DB2 additions.
We do not need to create ambiguities around those functions like the
one we have for regexp_replace. If Oracle doesn't have those options,
why do we need them?

regards, tom lane

From:	Gilles Darold <gilles(at)darold(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Erik Rijkers <er(at)xs4all(dot)nl>
Cc:	Gilles Darold <gilles(at)darold(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org, chap(at)anastigmatix(dot)net
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-08-03 14:11:24
Message-ID:	57cdabdb-4d63-e6ee-66ee-1ffc462e7149@darold.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Le 03/08/2021 à 15:39, Tom Lane a écrit :
> Erik Rijkers <er(at)xs4all(dot)nl> writes:
>> On 8/3/21 1:26 PM, Gilles Darold wrote:
>>> Le 03/08/2021 à 11:45, Gilles Darold a écrit :
>>>> Actually I just found that the regexp_like() function doesn't support
>>>> the start parameter which is something we should support. I saw that
>>>> Oracle do not support it but DB2 does and I think we should also
>>>> support it. I will post a new version of the patch once it is done.
>> +1
>> I for one am in favor of this 'start'-argument addition. Slightly
>> harder usage, but more precise manipulation.
> As I said upthread, I am *not* in favor of making those DB2 additions.
> We do not need to create ambiguities around those functions like the
> one we have for regexp_replace. If Oracle doesn't have those options,
> why do we need them?

Sorry I have missed that, but I'm fine with this implemenation so let's
keep the v6 version of the patch and drop this one.

--
Gilles Darold

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gilles Darold <gilles(at)darold(dot)net>
Cc:	Erik Rijkers <er(at)xs4all(dot)nl>, pgsql-hackers(at)lists(dot)postgresql(dot)org, chap(at)anastigmatix(dot)net
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-08-03 17:10:24
Message-ID:	1759344.1628010624@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gilles Darold <gilles(at)darold(dot)net> writes:
> Sorry I have missed that, but I'm fine with this implemenation so let's
> keep the v6 version of the patch and drop this one.

Pushed, then. There's still lots of time to tweak the behavior of course.

regards, tom lane

From:	Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gilles Darold <gilles(at)darold(dot)net>
Cc:	Erik Rijkers <er(at)xs4all(dot)nl>, pgsql-hackers(at)lists(dot)postgresql(dot)org, chap(at)anastigmatix(dot)net
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-12-15 12:41:16
Message-ID:	b7988566-daa2-80ed-2fdc-6f6630462d26@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 03.08.21 19:10, Tom Lane wrote:
> Gilles Darold <gilles(at)darold(dot)net> writes:
>> Sorry I have missed that, but I'm fine with this implemenation so let's
>> keep the v6 version of the patch and drop this one.
>
> Pushed, then. There's still lots of time to tweak the behavior of course.

I have a documentation follow-up to this. It seems that these new
functions are almost a de facto standard, whereas the SQL-standard
functions are not implemented anywhere. I propose the attached patch to
update the subsection in the pattern-matching section to give more
detail on this and suggest equivalent functions among these newly added
ones. What do you think?

Attachment	Content-Type	Size
0001-doc-More-documentation-on-regular-expressions-and-SQ.patch	text/plain	6.4 KB

From:	Gilles Darold <gilles(at)darold(dot)net>
To:	Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Erik Rijkers <er(at)xs4all(dot)nl>, pgsql-hackers(at)lists(dot)postgresql(dot)org, chap(at)anastigmatix(dot)net
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-12-15 13:15:04
Message-ID:	170bf229-0d4a-eaa1-4fb0-802442964474@darold.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Le 15/12/2021 à 13:41, Peter Eisentraut a écrit :
> On 03.08.21 19:10, Tom Lane wrote:
>> Gilles Darold <gilles(at)darold(dot)net> writes:
>>> Sorry I have missed that, but I'm fine with this implemenation so let's
>>> keep the v6 version of the patch and drop this one.
>>
>> Pushed, then. There's still lots of time to tweak the behavior of
>> course.
>
> I have a documentation follow-up to this. It seems that these new
> functions are almost a de facto standard, whereas the SQL-standard
> functions are not implemented anywhere. I propose the attached patch
> to update the subsection in the pattern-matching section to give more
> detail on this and suggest equivalent functions among these newly
> added ones. What do you think?

I'm in favor to apply your changes to documentation. It is a good thing
to precise the relation between this implementation of the regex_*
functions and the SQL stardard.

--
Gilles Darold

From:	Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To:	Gilles Darold <gilles(at)darold(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Erik Rijkers <er(at)xs4all(dot)nl>, pgsql-hackers(at)lists(dot)postgresql(dot)org, chap(at)anastigmatix(dot)net
Subject:	Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date:	2021-12-20 09:43:20
Message-ID:	81423aa1-154f-7cc7-5709-d6c040261ceb@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 15.12.21 14:15, Gilles Darold wrote:
> Le 15/12/2021 à 13:41, Peter Eisentraut a écrit :
>> On 03.08.21 19:10, Tom Lane wrote:
>>> Gilles Darold <gilles(at)darold(dot)net> writes:
>>>> Sorry I have missed that, but I'm fine with this implemenation so let's
>>>> keep the v6 version of the patch and drop this one.
>>>
>>> Pushed, then. There's still lots of time to tweak the behavior of
>>> course.
>>
>> I have a documentation follow-up to this. It seems that these new
>> functions are almost a de facto standard, whereas the SQL-standard
>> functions are not implemented anywhere. I propose the attached patch
>> to update the subsection in the pattern-matching section to give more
>> detail on this and suggest equivalent functions among these newly
>> added ones. What do you think?
>
>
> I'm in favor to apply your changes to documentation. It is a good thing
> to precise the relation between this implementation of the regex_*
> functions and the SQL stardard.

ok, done