Re: PostgreSQL POSIX regular expression problem

Lists: pgus-general
From: "FBurgess(at)Radiantblue(dot)com" <FBurgess(at)Radiantblue(dot)com>
To: "pgus-general(at)postgresql(dot)org" <pgus-general(at)postgresql(dot)org>
Subject: PostgreSQL POSIX regular expression problem
Date: 2016-04-06 18:31:42
Message-ID: 3BBE635F64E28D4C899377A61DAA9FE04530581C@NBSVR-MAIL01.radiantblue.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgus-general

* Trying to build a regular expression syntax that doesn't drop the last digit (5) at the very end of the phone number.

testdb=# select substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695 ' from 'PhoneNumber: \[commercial](.*?)(,|(\d\s+))');
substring
--------------
865-999-569
(1 row)

* Removing the space at the end returns empty, but I need the complete phone number: 865-999-5695

testdb=# select substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695' from 'PhoneNumber: \[commercial](.*?)(,|(\d\s+))');
substring
-----------
(1 row)

* These three examples work fine using a slight variation of the regex clause.

testdb=# select substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695 abc' from 'PhoneNumber: \[commercial](.*?)(,|(\s(\w+)$))');
substring
---------------
865-999-5695
(1 row)

testdb=# select substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695, abc' from 'PhoneNumber: \[commercial](.*?)(,|(\s(\w+)$))');
substring
---------------
865-999-5695
(1 row)

testdb=# select substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695,' from 'PhoneNumber: \[commercial](.*?)(,|(\s(\w+)$))');
substring
---------------
865-999-5695
(1 row)

* but if I remove the last word and the comma. again I get null, when I need the phone number. The phone number may be the last value at the end of the string or it may not be.

testdb=# select substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695' from 'PhoneNumber: \[commercial](.*?)(,|(\s(\w+)$))');
substring
-----------

(1 row)

thanks

Fred


From: Greg Sabino Mullane <greg(at)endpoint(dot)com>
To: "FBurgess(at)Radiantblue(dot)com" <FBurgess(at)Radiantblue(dot)com>
Cc: "pgus-general(at)postgresql(dot)org" <pgus-general(at)postgresql(dot)org>
Subject: Re: PostgreSQL POSIX regular expression problem
Date: 2016-04-07 15:42:57
Message-ID: 20160407154255.GD4082@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgus-general

On Wed, Apr 06, 2016 at 06:31:42PM +0000, FBurgess(at)Radiantblue(dot)com wrote:
> * Trying to build a regular expression syntax that doesn't drop the last digit (5) at the very end of the phone number.
>
> testdb=# select substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695 ' from 'PhoneNumber: \[commercial](.*?)(,|(\d\s+))');

You have two parens in there - the reason you are not getting the last digit is that the first parens (.*?) scoops
up everything - including a leading space - up to the second parens (,|(\d\s+)), which ensures that the final digit
plus whitespace combination is always excluded from the first match.

That's a very specific regex. Can you simply slurp a phone number directly, like so?:

substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695 ' from '([\d\-]+)')

--
Greg Sabino Mullane greg(at)endpoint(dot)com
End Point Corporation
PGP Key: 0x14964AC8


From: "FBurgess(at)Radiantblue(dot)com" <FBurgess(at)Radiantblue(dot)com>
To: Greg Sabino Mullane <greg(at)endpoint(dot)com>
Cc: "pgus-general(at)postgresql(dot)org" <pgus-general(at)postgresql(dot)org>
Subject: Re: PostgreSQL POSIX regular expression problem
Date: 2016-04-07 16:16:41
Message-ID: 3BBE635F64E28D4C899377A61DAA9FE045305942@NBSVR-MAIL01.radiantblue.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgus-general

HI Greg, your solution will work for all my variations except for parentheses around the area_code.

pfdb=# select substring('OfficeName: abc office supply PhoneNumber:[commercial] (865)-999-5695, emailaddress: abc(at)office(dot)com' from '([\d\-]+)');
substring
-----------
865
(1 row)

thanks!!
________________________________________
From: Greg Sabino Mullane [greg(at)endpoint(dot)com]
Sent: Thursday, April 07, 2016 11:42 AM
To: Burgess, Freddie
Cc: pgus-general(at)postgresql(dot)org
Subject: Re: [pgus-general] PostgreSQL POSIX regular expression problem

On Wed, Apr 06, 2016 at 06:31:42PM +0000, FBurgess(at)Radiantblue(dot)com wrote:
> * Trying to build a regular expression syntax that doesn't drop the last digit (5) at the very end of the phone number.
>
> testdb=# select substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695 ' from 'PhoneNumber: \[commercial](.*?)(,|(\d\s+))');

You have two parens in there - the reason you are not getting the last digit is that the first parens (.*?) scoops
up everything - including a leading space - up to the second parens (,|(\d\s+)), which ensures that the final digit
plus whitespace combination is always excluded from the first match.

That's a very specific regex. Can you simply slurp a phone number directly, like so?:

substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695 ' from '([\d\-]+)')

--
Greg Sabino Mullane greg(at)endpoint(dot)com
End Point Corporation
PGP Key: 0x14964AC8


From: "FBurgess(at)Radiantblue(dot)com" <FBurgess(at)Radiantblue(dot)com>
To: Greg Sabino Mullane <greg(at)endpoint(dot)com>
Cc: "pgus-general(at)postgresql(dot)org" <pgus-general(at)postgresql(dot)org>
Subject: Re: PostgreSQL POSIX regular expression problem
Date: 2016-04-07 22:15:20
Message-ID: 3BBE635F64E28D4C899377A61DAA9FE04530599B@NBSVR-MAIL01.radiantblue.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgus-general

Thanks Greg and others,

Here is my workaround to pull out just the corporate phone number in this case. Some of the data has phone numbers in this valid format: (703) 450-5300. This solution doesn't work for these, but all other variations are addressed.

select substring(regexp_replace(regexp_replace('OfficeName: abc office supply PhoneNumber: [commercial] (703)-999-5695, PhoneNumber: [Corporate] (703)-450-5300','[\)]','','g'),'[\(]','','g') from 'PhoneNumber:\s*\[Corporate]\s*([\d\-]+)');

Fred
________________________________________
From: Burgess, Freddie
Sent: Thursday, April 07, 2016 12:16 PM
To: Greg Sabino Mullane
Cc: pgus-general(at)postgresql(dot)org
Subject: RE: [pgus-general] PostgreSQL POSIX regular expression problem

HI Greg, your solution will work for all my variations except for parentheses around the area_code.

pfdb=# select substring('OfficeName: abc office supply PhoneNumber:[commercial] (865)-999-5695, emailaddress: abc(at)office(dot)com' from '([\d\-]+)');
substring
-----------
865
(1 row)

thanks!!
________________________________________
From: Greg Sabino Mullane [greg(at)endpoint(dot)com]
Sent: Thursday, April 07, 2016 11:42 AM
To: Burgess, Freddie
Cc: pgus-general(at)postgresql(dot)org
Subject: Re: [pgus-general] PostgreSQL POSIX regular expression problem

On Wed, Apr 06, 2016 at 06:31:42PM +0000, FBurgess(at)Radiantblue(dot)com wrote:
> * Trying to build a regular expression syntax that doesn't drop the last digit (5) at the very end of the phone number.
>
> testdb=# select substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695 ' from 'PhoneNumber: \[commercial](.*?)(,|(\d\s+))');

You have two parens in there - the reason you are not getting the last digit is that the first parens (.*?) scoops
up everything - including a leading space - up to the second parens (,|(\d\s+)), which ensures that the final digit
plus whitespace combination is always excluded from the first match.

That's a very specific regex. Can you simply slurp a phone number directly, like so?:

substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695 ' from '([\d\-]+)')

--
Greg Sabino Mullane greg(at)endpoint(dot)com
End Point Corporation
PGP Key: 0x14964AC8


From: Greg Sabino Mullane <greg(at)endpoint(dot)com>
To: "FBurgess(at)Radiantblue(dot)com" <FBurgess(at)Radiantblue(dot)com>
Cc: "pgus-general(at)postgresql(dot)org" <pgus-general(at)postgresql(dot)org>
Subject: Re: PostgreSQL POSIX regular expression problem
Date: 2016-04-07 23:44:06
Message-ID: 20160407234406.GE4082@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgus-general

If you want to just pull out the numbers, do a direct regexp_replace with
some (\d\d\d)\D* like so:

select regexp_replace('my phone: (904)-867 5309 ', '.*(\d\d\d)\D*(\d\d\d)\D*(\d\d\d\d).*$', '\1-\2-\3');

To grab something after a specific string appears (e.g. "Corporate"), add it in but
make sure the first dot-star becomes non-greedy with a question mark:

select regexp_replace('my phone: 904 867 5309 [Corporate] (703)-555-1234',
'.*?Corporate.*(\d\d\d)\D*(\d\d\d)\D*(\d\d\d\d).*$', '\1-\2-\3');

--
Greg Sabino Mullane greg(at)endpoint(dot)com
End Point Corporation
PGP Key: 0x14964AC8