Lists: | pgus-general |
---|
From: | "FBurgess(at)Radiantblue(dot)com" <FBurgess(at)Radiantblue(dot)com> |
---|---|
To: | "pgus-general(at)postgresql(dot)org" <pgus-general(at)postgresql(dot)org> |
Subject: | PostgreSQL POSIX regular expression problem |
Date: | 2016-04-06 18:31:42 |
Message-ID: | 3BBE635F64E28D4C899377A61DAA9FE04530581C@NBSVR-MAIL01.radiantblue.local |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgus-general |
* Trying to build a regular expression syntax that doesn't drop the last digit (5) at the very end of the phone number.
testdb=# select substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695 ' from 'PhoneNumber: \[commercial](.*?)(,|(\d\s+))');
substring
--------------
865-999-569
(1 row)
* Removing the space at the end returns empty, but I need the complete phone number: 865-999-5695
testdb=# select substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695' from 'PhoneNumber: \[commercial](.*?)(,|(\d\s+))');
substring
-----------
(1 row)
* These three examples work fine using a slight variation of the regex clause.
testdb=# select substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695 abc' from 'PhoneNumber: \[commercial](.*?)(,|(\s(\w+)$))');
substring
---------------
865-999-5695
(1 row)
testdb=# select substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695, abc' from 'PhoneNumber: \[commercial](.*?)(,|(\s(\w+)$))');
substring
---------------
865-999-5695
(1 row)
testdb=# select substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695,' from 'PhoneNumber: \[commercial](.*?)(,|(\s(\w+)$))');
substring
---------------
865-999-5695
(1 row)
* but if I remove the last word and the comma. again I get null, when I need the phone number. The phone number may be the last value at the end of the string or it may not be.
testdb=# select substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695' from 'PhoneNumber: \[commercial](.*?)(,|(\s(\w+)$))');
substring
-----------
(1 row)
thanks
Fred
From: | Greg Sabino Mullane <greg(at)endpoint(dot)com> |
---|---|
To: | "FBurgess(at)Radiantblue(dot)com" <FBurgess(at)Radiantblue(dot)com> |
Cc: | "pgus-general(at)postgresql(dot)org" <pgus-general(at)postgresql(dot)org> |
Subject: | Re: PostgreSQL POSIX regular expression problem |
Date: | 2016-04-07 15:42:57 |
Message-ID: | 20160407154255.GD4082@localhost.localdomain |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgus-general |
On Wed, Apr 06, 2016 at 06:31:42PM +0000, FBurgess(at)Radiantblue(dot)com wrote:
> * Trying to build a regular expression syntax that doesn't drop the last digit (5) at the very end of the phone number.
>
> testdb=# select substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695 ' from 'PhoneNumber: \[commercial](.*?)(,|(\d\s+))');
You have two parens in there - the reason you are not getting the last digit is that the first parens (.*?) scoops
up everything - including a leading space - up to the second parens (,|(\d\s+)), which ensures that the final digit
plus whitespace combination is always excluded from the first match.
That's a very specific regex. Can you simply slurp a phone number directly, like so?:
substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695 ' from '([\d\-]+)')
--
Greg Sabino Mullane greg(at)endpoint(dot)com
End Point Corporation
PGP Key: 0x14964AC8
From: | "FBurgess(at)Radiantblue(dot)com" <FBurgess(at)Radiantblue(dot)com> |
---|---|
To: | Greg Sabino Mullane <greg(at)endpoint(dot)com> |
Cc: | "pgus-general(at)postgresql(dot)org" <pgus-general(at)postgresql(dot)org> |
Subject: | Re: PostgreSQL POSIX regular expression problem |
Date: | 2016-04-07 16:16:41 |
Message-ID: | 3BBE635F64E28D4C899377A61DAA9FE045305942@NBSVR-MAIL01.radiantblue.local |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgus-general |
HI Greg, your solution will work for all my variations except for parentheses around the area_code.
pfdb=# select substring('OfficeName: abc office supply PhoneNumber:[commercial] (865)-999-5695, emailaddress: abc(at)office(dot)com' from '([\d\-]+)');
substring
-----------
865
(1 row)
thanks!!
________________________________________
From: Greg Sabino Mullane [greg(at)endpoint(dot)com]
Sent: Thursday, April 07, 2016 11:42 AM
To: Burgess, Freddie
Cc: pgus-general(at)postgresql(dot)org
Subject: Re: [pgus-general] PostgreSQL POSIX regular expression problem
On Wed, Apr 06, 2016 at 06:31:42PM +0000, FBurgess(at)Radiantblue(dot)com wrote:
> * Trying to build a regular expression syntax that doesn't drop the last digit (5) at the very end of the phone number.
>
> testdb=# select substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695 ' from 'PhoneNumber: \[commercial](.*?)(,|(\d\s+))');
You have two parens in there - the reason you are not getting the last digit is that the first parens (.*?) scoops
up everything - including a leading space - up to the second parens (,|(\d\s+)), which ensures that the final digit
plus whitespace combination is always excluded from the first match.
That's a very specific regex. Can you simply slurp a phone number directly, like so?:
substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695 ' from '([\d\-]+)')
--
Greg Sabino Mullane greg(at)endpoint(dot)com
End Point Corporation
PGP Key: 0x14964AC8
From: | "FBurgess(at)Radiantblue(dot)com" <FBurgess(at)Radiantblue(dot)com> |
---|---|
To: | Greg Sabino Mullane <greg(at)endpoint(dot)com> |
Cc: | "pgus-general(at)postgresql(dot)org" <pgus-general(at)postgresql(dot)org> |
Subject: | Re: PostgreSQL POSIX regular expression problem |
Date: | 2016-04-07 22:15:20 |
Message-ID: | 3BBE635F64E28D4C899377A61DAA9FE04530599B@NBSVR-MAIL01.radiantblue.local |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgus-general |
Thanks Greg and others,
Here is my workaround to pull out just the corporate phone number in this case. Some of the data has phone numbers in this valid format: (703) 450-5300. This solution doesn't work for these, but all other variations are addressed.
select substring(regexp_replace(regexp_replace('OfficeName: abc office supply PhoneNumber: [commercial] (703)-999-5695, PhoneNumber: [Corporate] (703)-450-5300','[\)]','','g'),'[\(]','','g') from 'PhoneNumber:\s*\[Corporate]\s*([\d\-]+)');
Fred
________________________________________
From: Burgess, Freddie
Sent: Thursday, April 07, 2016 12:16 PM
To: Greg Sabino Mullane
Cc: pgus-general(at)postgresql(dot)org
Subject: RE: [pgus-general] PostgreSQL POSIX regular expression problem
HI Greg, your solution will work for all my variations except for parentheses around the area_code.
pfdb=# select substring('OfficeName: abc office supply PhoneNumber:[commercial] (865)-999-5695, emailaddress: abc(at)office(dot)com' from '([\d\-]+)');
substring
-----------
865
(1 row)
thanks!!
________________________________________
From: Greg Sabino Mullane [greg(at)endpoint(dot)com]
Sent: Thursday, April 07, 2016 11:42 AM
To: Burgess, Freddie
Cc: pgus-general(at)postgresql(dot)org
Subject: Re: [pgus-general] PostgreSQL POSIX regular expression problem
On Wed, Apr 06, 2016 at 06:31:42PM +0000, FBurgess(at)Radiantblue(dot)com wrote:
> * Trying to build a regular expression syntax that doesn't drop the last digit (5) at the very end of the phone number.
>
> testdb=# select substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695 ' from 'PhoneNumber: \[commercial](.*?)(,|(\d\s+))');
You have two parens in there - the reason you are not getting the last digit is that the first parens (.*?) scoops
up everything - including a leading space - up to the second parens (,|(\d\s+)), which ensures that the final digit
plus whitespace combination is always excluded from the first match.
That's a very specific regex. Can you simply slurp a phone number directly, like so?:
substring('OfficeName: abc office supply PhoneNumber: [commercial] 865-999-5695 ' from '([\d\-]+)')
--
Greg Sabino Mullane greg(at)endpoint(dot)com
End Point Corporation
PGP Key: 0x14964AC8
From: | Greg Sabino Mullane <greg(at)endpoint(dot)com> |
---|---|
To: | "FBurgess(at)Radiantblue(dot)com" <FBurgess(at)Radiantblue(dot)com> |
Cc: | "pgus-general(at)postgresql(dot)org" <pgus-general(at)postgresql(dot)org> |
Subject: | Re: PostgreSQL POSIX regular expression problem |
Date: | 2016-04-07 23:44:06 |
Message-ID: | 20160407234406.GE4082@localhost.localdomain |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgus-general |
If you want to just pull out the numbers, do a direct regexp_replace with
some (\d\d\d)\D* like so:
select regexp_replace('my phone: (904)-867 5309 ', '.*(\d\d\d)\D*(\d\d\d)\D*(\d\d\d\d).*$', '\1-\2-\3');
To grab something after a specific string appears (e.g. "Corporate"), add it in but
make sure the first dot-star becomes non-greedy with a question mark:
select regexp_replace('my phone: 904 867 5309 [Corporate] (703)-555-1234',
'.*?Corporate.*(\d\d\d)\D*(\d\d\d)\D*(\d\d\d\d).*$', '\1-\2-\3');
--
Greg Sabino Mullane greg(at)endpoint(dot)com
End Point Corporation
PGP Key: 0x14964AC8