Lists: | pgsql-hackers |
---|
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Conflict handling for COPY FROM |
Date: | 2018-08-04 13:09:51 |
Message-ID: | CALAY4q8PcAcTUVgjvKkNxenoAcydiwUuYhwNwad4kcZQ9Ewdbw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hellow hackers,
A few commitfest ago there was same effort to add errors handling to COPY
FROM[1] and i see there that we already have infrastructure for supporting
handling of unique violation or exclusion constraint violation error and I
think it is independently useful too. Attached is a patch to do that.
In order to prevent extreme condition the patch also add a new GUC variable
called copy_max_error_limit that control the amount of error to swallow
before start to error and new failed record file options for copy to write
a failed record so the user can examine it.
With the new option COPY FROM can be specified like:
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | PROGRAM 'command' | STDIN }[ON CONFLICT IGNORE
failed_record_filename] [ [ WITH ] ( option [, ...] ) ]
[1].
/message-id/flat/7179F2FD-49CE-4093-AE14-1B26C5DFB0DA(at)gmail(dot)com
Comment?
Regards
Surafel
Attachment | Content-Type | Size |
---|---|---|
conflict-handling-onCopy-from-v1.patch | text/x-patch | 20.3 KB |
From: | Karen Huddleston <khuddleston(at)pivotal(dot)io> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Cc: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2018-08-18 05:17:20 |
Message-ID: | 153456944015.1410.12495599902221553748.pgcf@coridan.postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg사설 토토 사이트SQL |
Hi Surafel,
Andrew and I began reviewing your patch. It applied cleanly and seems to mostly have the functionality you describe. We did have some comments/questions.
1. It sounded like you added the copy_max_error_limit GUC as part of this patch to allow users to specify how many errors they want to swallow with this new functionality. The GUC didn't seem to be defined and we saw no mention of it in the code. My guess is this might be good to address one of the concerns mentioned in the initial email thread of this generating too many transaction IDs so it would probably be good to have it.
2. I was curious why you only have support for skipping errors on UNIQUE and EXCLUSION constraints and not other kinds of constraints? I'm not sure how difficult it would be to add support for those, but it seems they could also be useful.
3. We think the wording "ON CONFLICT IGNORE" may not be the clearest description of what this is doing since it is writing the failed rows to a file for a user to process later, but they are not being ignored. We considered things like STASH or LOG as alternatives to IGNORE. Andrew may have some other suggestions for wording.
4. We also noticed this has no tests and thought it would be good to add some to ensure this functionality works how you intend it and continues to work. We started running some SQL to validate this, but haven't gotten the chance to put it into a clean test yet. We can send you what we have so far, or we are also willing to put a little time in to turn it into tests ourselves that we could contribute to this patch.
Thanks for writing this patch!
Karen
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Cc: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2018-08-20 06:37:05 |
Message-ID: | 153474702574.1410.1544274037026819695.pgcf@coridan.postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg윈 토토SQL : |
Thanks for looking at it
1. It sounded like you added the copy_max_error_limit GUC as part of this patch to allow users to specify how many errors they want to swallow with this new functionality. The GUC didn't seem to be defined and we saw no mention of it in the code. My guess is this might be good to address one of the concerns mentioned in the initial email thread of this generating too many transaction IDs so it would probably be good to have it.
By mistake I write it copy_max_error_limit here but in the patch it is copy_maximum_error_limit with a default value of 100 sorry for mismatch
2. I was curious why you only have support for skipping errors on UNIQUE and EXCLUSION constraints and not other kinds of constraints? I'm not sure how difficult it would be to add support for those, but it seems they could also be useful.
I see it now that most of formatting error can be handle safely I will attache the patch for it this week
3. We think the wording "ON CONFLICT IGNORE" may not be the clearest description of what this is doing since it is writing the failed rows to a file for a user to process later, but they are not being ignored. We considered things like STASH or LOG as alternatives to IGNORE. Andrew may have some other suggestions for wording.
I agree.I will change it to ON CONFLICT LOG if we can’t find better naming
4. We also noticed this has no tests and thought it would be good to add some to ensure this functionality works how you intend it and continues to work. We started running some SQL to validate this, but haven't gotten the chance to put it into a clean test yet. We can send you what we have so far, or we are also willing to put a little time in to turn it into tests ourselves that we could contribute to this patch.
okay
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2018-08-20 12:14:34 |
Message-ID: | CA+TgmoYo2fLSHOaM-3jsV2ONM8-_tuoU2RPme9dBpbggC2==yg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg토토 꽁 머니SQL |
On Sat, Aug 4, 2018 at 9:10 AM Surafel Temesgen <surafel3000(at)gmail(dot)com>
wrote:
> In order to prevent extreme condition the patch also add a new GUC
> variable called copy_max_error_limit that control the amount of error to
> swallow before start to error and new failed record file options for copy
> to write a failed record so the user can examine it.
>
Why should this be a GUC rather than a COPY option?
In fact, try doing all of this by adding more options to COPY rather than
new syntax.
COPY ... WITH (ignore_conflicts 1000, ignore_logfile '...')
It kind of stinks to use a log file written by the server as the
dup-reporting channel though. That would have to be superuser-only.
...Robert
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2018-08-23 14:11:04 |
Message-ID: | CALAY4q-hWwbg-P1=oJcVT1jTHj_g+JxJ_0JNe4MdYKz8RvUF=g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg토토 결과SQL |
Hello,
The attached patch add error handling for
Extra data
missing data
invalid oid
null oid and
row count mismatch
And the record that field on the above case write to the file with appended
error message in it and in case of unique violation or exclusion constraint
violation error the failed record write as it is because the case of the
error can not be identified specifically
The new syntax became :
COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';
Regards
Surafel
Attachment | Content-Type | Size |
---|---|---|
conflict-handling-onCopy-from-v2.patch | text/x-patch | 14.7 KB |
From: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com> |
---|---|
To: | surafel3000(at)gmail(dot)com |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2018-11-29 12:15:54 |
Message-ID: | CA+q6zcWofzh+v+R4g62qzOtpuw75yx+LzkRHjCfKQk4iR409ig@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
> On Thu, Aug 23, 2018 at 4:16 PM Surafel Temesgen <surafel3000(at)gmail(dot)com> wrote:
>
> The attached patch add error handling for
> Extra data
>
> missing data
>
> invalid oid
>
> null oid and
>
> row count mismatch
Hi,
Unfortunately, the patch conflict-handling-onCopy-from-v2.patch has some
conflicts now, could you rebase it? I'm moving it to the next CF as "Waiting on
Author". Also I would appreciate if someone from the reviewers (Karen
Huddleston ?) could post a full patch review.
From: | "Nasby, Jim" <nasbyj(at)amazon(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Surafel Temesgen <surafel3000(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2018-12-04 19:15:24 |
Message-ID: | C85CD39D-65D8-4CBF-955C-AEEF731E648B@amazon.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg토토 베이SQL |
On Aug 20, 2018, at 5:14 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Sat, Aug 4, 2018 at 9:10 AM Surafel Temesgen <surafel3000(at)gmail(dot)com> wrote:
> In order to prevent extreme condition the patch also add a new GUC variable called copy_max_error_limit that control the amount of error to swallow before start to error and new failed record file options for copy to write a failed record so the user can examine it.
>
> Why should this be a GUC rather than a COPY option?
>
> In fact, try doing all of this by adding more options to COPY rather than new syntax.
>
> COPY ... WITH (ignore_conflicts 1000, ignore_logfile '...')
>
> It kind of stinks to use a log file written by the server as the dup-reporting channel though. That would have to be superuser-only.
Perhaps a better option would be to allow the user to specify a name for a cursor, and have COPY do the moral equivalent of DECLARE name? Calling a function for each bad row would be another option.
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2018-12-19 11:48:14 |
Message-ID: | CALAY4q_wJPmhSbkAG8a_QAQwbVajgU1+jTYCwA+QzzXLo75tjA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg범퍼카 토토SQL |
On Thu, Nov 29, 2018 at 3:15 PM Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
>
> Unfortunately, the patch conflict-handling-onCopy-from-v2.patch has some
> conflicts now, could you rebase it?
>
Thank you for informing, attach is rebased patch against current master
Regards
Surafel
Attachment | Content-Type | Size |
---|---|---|
conflict-handling-onCopy-from-v3.patch | text/x-patch | 14.5 KB |
From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Cc: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-02-04 06:06:34 |
Message-ID: | 20190204060634.GD4092@paquier.xyz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, Dec 19, 2018 at 02:48:14PM +0300, Surafel Temesgen wrote:
> Thank you for informing, attach is rebased patch against current
> master
copy.c conflicts on HEAD, please rebase. I am moving the patch to
next CF, waiting on author.
--
Michael
From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-02-16 05:24:57 |
Message-ID: | 20190216052457.v2nzb56sq3yd2vip@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg스포츠 토토 결과SQL |
Hi,
On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:
> COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';
This doesn't seem to address Robert's point that a log file requires to
be super user only, which seems to restrict the feature more than
necessary?
- Andres
From: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-02-17 13:58:22 |
Message-ID: | e5a225a0-26d4-cc1b-4144-7305ed928b8d@2ndQuadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | 503 젠 토토 페치 실패 |
On 2/16/19 12:24 AM, Andres Freund wrote:
> Hi,
>
> On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:
>> COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';
> This doesn't seem to address Robert's point that a log file requires to
> be super user only, which seems to restrict the feature more than
> necessary?
>
I liked Jim Nasby's idea of having it call a function rather than
writing to a log file.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-02-19 10:51:53 |
Message-ID: | CALAY4q98ViUEivELAJY697UkSANRhvhBe3qnUL_Nk+vC+r+jKg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Mon, Feb 4, 2019 at 9:06 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Wed, Dec 19, 2018 at 02:48:14PM +0300, Surafel Temesgen wrote:
> > Thank you for informing, attach is rebased patch against current
> > master
>
> copy.c conflicts on HEAD, please rebase. I am moving the patch to
> next CF, waiting on author.
> --
>
Thank you, here is a rebased patch against current master
regards
Surafel
Attachment | Content-Type | Size |
---|---|---|
conflict-handling-onCopy-from-v4.patch | text/x-patch | 14.5 KB |
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-02-19 11:05:37 |
Message-ID: | CALAY4q8vD-FY--nWDVv509-95NgTEbL6UGY-MEY0+-J540fSpA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Sat, Feb 16, 2019 at 8:24 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:
> > COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';
>
> This doesn't seem to address Robert's point that a log file requires to
> be super user only, which seems to restrict the feature more than
> necessary?
>
> - Andres
>
I think having write permission on specified directory is enough.
we use out put file name in COPY TO similarly.
regards
Surafel
From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org,Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>,PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-02-19 12:47:23 |
Message-ID: | 92E58BB6-E05A-4F59-8C65-AF0392C253FF@anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg젠 토토SQL : |
On February 19, 2019 3:05:37 AM PST, Surafel Temesgen <surafel3000(at)gmail(dot)com> wrote:
>On Sat, Feb 16, 2019 at 8:24 AM Andres Freund <andres(at)anarazel(dot)de>
>wrote:
>
>> Hi,
>>
>> On 2018-08-23 17:11:04 +0300, Surafel Temesgen wrote:
>> > COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';
>>
>> This doesn't seem to address Robert's point that a log file requires
>to
>> be super user only, which seems to restrict the feature more than
>> necessary?
>>
>> - Andres
>>
>
>
>I think having write permission on specified directory is enough.
>we use out put file name in COPY TO similarly.
Err, what? Again, that requires super user permissions (in contrast to copy from/to stdin/out). Backends run as the user postgres runs under - it will always have write permissions to at least the entire data directory. I think not addressing this just about guarantees the feature will be rejected.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-02-20 13:01:23 |
Message-ID: | CALAY4q-7GEpHZROhgGCTpKB7aanFCev63UkStavoWq4d_J6xGw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>
> Err, what? Again, that requires super user permissions (in contrast to
> copy from/to stdin/out). Backends run as the user postgres runs under
>
okay i see it now and modified the patch similarly
regards
Surafel
Attachment | Content-Type | Size |
---|---|---|
conflict-handling-onCopy-from-v5.patch | text/x-patch | 15.0 KB |
From: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> |
---|---|
To: | Surafel Temesgen <surafel3000(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-02-20 14:05:53 |
Message-ID: | 62827e86-ded2-a68e-dfdc-1454a38a704d@2ndQuadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2/20/19 8:01 AM, Surafel Temesgen wrote:
>
>
> On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <andres(at)anarazel(dot)de
> <mailto:andres(at)anarazel(dot)de>> wrote:
>
>
>
> Err, what? Again, that requires super user permissions (in
> contrast to copy from/to stdin/out). Backends run as the user
> postgres runs under
>
>
>
> okay i see it now and modified the patch similarly
>
>
Why log to a file at all? We do have, you know, a database handy, where
we might more usefully log errors. You could usefully log the offending
row as an array of text, possibly.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org,Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>,Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>,PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-02-20 16:03:54 |
Message-ID: | 82B3531E-082D-4882-A15E-25293F779569@anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
>
>On 2/20/19 8:01 AM, Surafel Temesgen wrote:
>>
>>
>> On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <andres(at)anarazel(dot)de
>> <mailto:andres(at)anarazel(dot)de>> wrote:
>>
>>
>>
>> Err, what? Again, that requires super user permissions (in
>> contrast to copy from/to stdin/out). Backends run as the user
>> postgres runs under
>>
>>
>>
>> okay i see it now and modified the patch similarly
>>
>>
>
>
>Why log to a file at all? We do have, you know, a database handy, where
>we might more usefully log errors. You could usefully log the offending
>row as an array of text, possibly.
Or even just return it as a row. CopyBoth is relatively widely supported these days.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
From: | David Steele <david(at)pgmasters(dot)net> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org, Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Re: Conflict handling for COPY FROM |
Date: | 2019-03-25 08:50:13 |
Message-ID: | e0e1cffd-25ac-f0bb-31c1-3807c63edcc5@pgmasters.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi Surafel,
On 2/20/19 8:03 PM, Andres Freund wrote:
> On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
>>
>> Why log to a file at all? We do have, you know, a database handy, where
>> we might more usefully log errors. You could usefully log the offending
>> row as an array of text, possibly.
>
> Or even just return it as a row. CopyBoth is relatively widely supported these days.
This patch no longer applies so marked Waiting on Author.
Also, it appears that you have some comments from Andrew and Andres that
you should reply to.
Regards,
--
-David
david(at)pgmasters(dot)net
From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | David Steele <david(at)pgmasters(dot)net> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, Surafel Temesgen <surafel3000(at)gmail(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-04-03 20:04:11 |
Message-ID: | 20190403200411.3pxx5rufo7kvnqoi@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
On 2019-03-25 12:50:13 +0400, David Steele wrote:
> On 2/20/19 8:03 PM, Andres Freund wrote:
> > On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
> > >
> > > Why log to a file at all? We do have, you know, a database handy, where
> > > we might more usefully log errors. You could usefully log the offending
> > > row as an array of text, possibly.
> >
> > Or even just return it as a row. CopyBoth is relatively widely supported these days.
>
> This patch no longer applies so marked Waiting on Author.
>
> Also, it appears that you have some comments from Andrew and Andres that you
> should reply to.
As nothing has happened the last weeks, I've now marked this as
returned with feedback.
- Andres
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-06-28 10:56:43 |
Message-ID: | CALAY4q-4+p3j7dD-a1rPjCw5u9k8XCXxSu6dJwLe3iCRiQR6Pw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | 503 스포츠 토토 결과 |
On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>
> On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <
> andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
> >
> >On 2/20/19 8:01 AM, Surafel Temesgen wrote:
> >>
> >>
> >> On Tue, Feb 19, 2019 at 3:47 PM Andres Freund <andres(at)anarazel(dot)de
> >> <mailto:andres(at)anarazel(dot)de>> wrote:
> >>
> >>
> >>
> >> Err, what? Again, that requires super user permissions (in
> >> contrast to copy from/to stdin/out). Backends run as the user
> >> postgres runs under
> >>
> >>
> >>
> >> okay i see it now and modified the patch similarly
> >>
> >>
> >
> >
> >Why log to a file at all? We do have, you know, a database handy, where
> >we might more usefully log errors. You could usefully log the offending
> >row as an array of text, possibly.
>
> Or even just return it as a row. CopyBoth is relatively widely supported
> these days.
>
>
hello,
i think generating warning about it also sufficiently meet its propose of
notifying user about skipped record with existing logging facility
and we use it for similar propose in other place too. The different
i see is the number of warning that can be generated
In addition to the above change in the attached patch i also change
the syntax to ERROR LIMIT because it is no longer only skip
unique and exclusion constrain violation
regards
Surafel
Attachment | Content-Type | Size |
---|---|---|
conflict-handling-onCopy-from-v6.patch | text/x-patch | 14.0 KB |
From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-06-28 13:12:21 |
Message-ID: | 20190628131221.GA30180@alvherre.pgsql |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2019-Jun-28, Surafel Temesgen wrote:
> On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On February 20, 2019 6:05:53 AM PST, Andrew Dunstan <
> > andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
> > >Why log to a file at all? We do have, you know, a database handy, where
> > >we might more usefully log errors. You could usefully log the offending
> > >row as an array of text, possibly.
> >
> > Or even just return it as a row. CopyBoth is relatively widely supported
> > these days.
>
> i think generating warning about it also sufficiently meet its propose of
> notifying user about skipped record with existing logging facility
> and we use it for similar propose in other place too. The different
> i see is the number of warning that can be generated
Warnings seem useless for this purpose. I'm with Andres: returning rows
would make this a fine feature. If the user wants the rows in a table
as Andrew suggests, she can use wrap the whole thing in an insert.
That would make the feature much more usable because you can do further
processing with the rows that conflict, if any is necessary (or throw
them away if not). Putting them in warnings will just make the screen
scroll fast.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-07-02 16:57:21 |
Message-ID: | f55afd63-8d64-e8df-0d54-13c116b8d23e@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 28.06.2019 16:12, Alvaro Herrera wrote:
>> On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>>> Or even just return it as a row. CopyBoth is relatively widely supported
>>> these days.
>> i think generating warning about it also sufficiently meet its propose of
>> notifying user about skipped record with existing logging facility
>> and we use it for similar propose in other place too. The different
>> i see is the number of warning that can be generated
> Warnings seem useless for this purpose. I'm with Andres: returning rows
> would make this a fine feature. If the user wants the rows in a table
> as Andrew suggests, she can use wrap the whole thing in an insert.
I agree with previous commentators that returning rows will make this
feature more versatile. Though, having a possibility to simply skip
conflicting/malformed rows is worth of doing from my perspective.
However, pushing every single skipped row to the client as a separated
WARNING will be too much for a bulk import. So maybe just overall stats
about skipped rows number will be enough?
Also, I would prefer having an option to ignore all errors, e.g. with
option ERROR_LIMIT set to -1. Because it is rather difficult to estimate
a number of future errors if you are playing with some badly structured
data, while always setting it to 100500k looks ugly.
Anyway, below are some issues with existing code after a brief review of
the patch:
1) Calculation of processed rows isn't correct (I've checked). You do it
in two places, and
- processed++;
+ if (!cstate->error_limit)
+ processed++;
is never incremented if ERROR_LIMIT is specified and no errors
occurred/no constraints exist, so the result will always be 0. However,
if primary column with constraints exists, then processed is calculated
correctly, since another code path is used:
+ if (specConflict)
+ {
+ ...
+ }
+ else
+ processed++;
I would prefer this calculation in a single place (as it was before
patch) for simplicity and in order to avoid such problems.
2) This ExecInsertIndexTuples call is only executed now if ERROR_LIMIT
is specified and was exceeded, which doesn't seem to be correct, does it?
- if (resultRelInfo->ri_NumIndices > 0)
+ if (resultRelInfo->ri_NumIndices > 0 &&
cstate->error_limit == 0)
recheckIndexes = ExecInsertIndexTuples(myslot,
3) Trailing whitespaces added to error messages and tests for some reason:
+ ereport(WARNING,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("skipping \"%s\" --- missing data
for column \"%s\" ",
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("missing data for column \"%s\" ",
-ERROR: missing data for column "e"
+ERROR: missing data for column "e"
CONTEXT: COPY x, line 1: "2000 230 23 23"
-ERROR: missing data for column "e"
+ERROR: missing data for column "e"
CONTEXT: COPY x, line 1: "2001 231 \N \N"
Otherwise, the patch applies/compiles cleanly and regression tests are
passed.
Regards
--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
Russian Postgres Company
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-07-03 11:42:00 |
Message-ID: | CALAY4q8rs26AOHf48mC67bGgN4yHfrycnq5BgihyVpUe7hypyQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi Alexey,
Thank you for looking at it
On Tue, Jul 2, 2019 at 7:57 PM Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
wrote:
> On 28.06.2019 16:12, Alvaro Herrera wrote:
> >> On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <andres(at)anarazel(dot)de>
> wrote:
> >>> Or even just return it as a row. CopyBoth is relatively widely
> supported
> >>> these days.
> >> i think generating warning about it also sufficiently meet its propose
> of
> >> notifying user about skipped record with existing logging facility
> >> and we use it for similar propose in other place too. The different
> >> i see is the number of warning that can be generated
> > Warnings seem useless for this purpose. I'm with Andres: returning rows
> > would make this a fine feature. If the user wants the rows in a table
> > as Andrew suggests, she can use wrap the whole thing in an insert.
>
> I agree with previous commentators that returning rows will make this
> feature more versatile.
I agree. am looking at the options
Also, I would prefer having an option to ignore all errors, e.g. with
> option ERROR_LIMIT set to -1. Because it is rather difficult to estimate
> a number of future errors if you are playing with some badly structured
> data, while always setting it to 100500k looks ugly.
>
>
Good idea
> 1) Calculation of processed rows isn't correct (I've checked). You do it
> in two places, and
>
> - processed++;
> + if (!cstate->error_limit)
> + processed++;
>
> is never incremented if ERROR_LIMIT is specified and no errors
> occurred/no constraints exist, so the result will always be 0. However,
> if primary column with constraints exists, then processed is calculated
> correctly, since another code path is used:
>
>
Correct. i will fix
+ if (specConflict)
> + {
> + ...
> + }
> + else
> + processed++;
>
> I would prefer this calculation in a single place (as it was before
> patch) for simplicity and in order to avoid such problems.
>
>
ok
> 2) This ExecInsertIndexTuples call is only executed now if ERROR_LIMIT
> is specified and was exceeded, which doesn't seem to be correct, does it?
>
> - if (resultRelInfo->ri_NumIndices > 0)
> + if (resultRelInfo->ri_NumIndices > 0 &&
> cstate->error_limit == 0)
> recheckIndexes =
> ExecInsertIndexTuples(myslot,
>
>
No it alwase executed . I did it this way to avoid
inserting index tuple twice but i see its unlikely
> 3) Trailing whitespaces added to error messages and tests for some reason:
>
> + ereport(WARNING,
> + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> + errmsg("skipping \"%s\" --- missing data
> for column \"%s\" ",
>
> + ereport(ERROR,
> + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> + errmsg("missing data for column \"%s\" ",
>
> -ERROR: missing data for column "e"
> +ERROR: missing data for column "e"
> CONTEXT: COPY x, line 1: "2000 230 23 23"
>
> -ERROR: missing data for column "e"
> +ERROR: missing data for column "e"
> CONTEXT: COPY x, line 1: "2001 231 \N \N"
>
>
regards
Surafel
From: | Anthony Nowocien <anowocien(at)gmail(dot)com> |
---|---|
To: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Cc: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-07-03 13:24:55 |
Message-ID: | CAH5RRoO_2n1PYuONuA4auhYa1xofv+AzvKVDU94XTDZ=+EiwvQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
I'm very interested in this patch and would like to give a review within a
week. On the feature side, how about simply using the less verbose "ERRORS"
instead of "ERROR LIMIT" ?
On Wed, Jul 3, 2019 at 1:42 PM Surafel Temesgen <surafel3000(at)gmail(dot)com>
wrote:
> Hi Alexey,
> Thank you for looking at it
>
> On Tue, Jul 2, 2019 at 7:57 PM Alexey Kondratov <
> a(dot)kondratov(at)postgrespro(dot)ru> wrote:
>
>> On 28.06.2019 16:12, Alvaro Herrera wrote:
>> >> On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <andres(at)anarazel(dot)de>
>> wrote:
>> >>> Or even just return it as a row. CopyBoth is relatively widely
>> supported
>> >>> these days.
>> >> i think generating warning about it also sufficiently meet its propose
>> of
>> >> notifying user about skipped record with existing logging facility
>> >> and we use it for similar propose in other place too. The different
>> >> i see is the number of warning that can be generated
>> > Warnings seem useless for this purpose. I'm with Andres: returning rows
>> > would make this a fine feature. If the user wants the rows in a table
>> > as Andrew suggests, she can use wrap the whole thing in an insert.
>>
>> I agree with previous commentators that returning rows will make this
>> feature more versatile.
>
>
> I agree. am looking at the options
>
> Also, I would prefer having an option to ignore all errors, e.g. with
>> option ERROR_LIMIT set to -1. Because it is rather difficult to estimate
>> a number of future errors if you are playing with some badly structured
>> data, while always setting it to 100500k looks ugly.
>>
>>
> Good idea
>
> I also +1 having an option to ignore all errors. Other RDBMS might use a
large number, but "-1" seems cleaner so far.
>
>
>> 1) Calculation of processed rows isn't correct (I've checked). You do it
>> in two places, and
>>
>> - processed++;
>> + if (!cstate->error_limit)
>> + processed++;
>>
>> is never incremented if ERROR_LIMIT is specified and no errors
>> occurred/no constraints exist, so the result will always be 0. However,
>> if primary column with constraints exists, then processed is calculated
>> correctly, since another code path is used:
>>
>>
> Correct. i will fix
>
> + if (specConflict)
>> + {
>> + ...
>> + }
>> + else
>> + processed++;
>>
>> I would prefer this calculation in a single place (as it was before
>> patch) for simplicity and in order to avoid such problems.
>>
>>
> ok
>
>
>> 2) This ExecInsertIndexTuples call is only executed now if ERROR_LIMIT
>> is specified and was exceeded, which doesn't seem to be correct, does it?
>>
>> - if (resultRelInfo->ri_NumIndices > 0)
>> + if (resultRelInfo->ri_NumIndices > 0 &&
>> cstate->error_limit == 0)
>> recheckIndexes =
>> ExecInsertIndexTuples(myslot,
>>
>>
> No it alwase executed . I did it this way to avoid
> inserting index tuple twice but i see its unlikely
>
>
>> 3) Trailing whitespaces added to error messages and tests for some reason:
>>
>> + ereport(WARNING,
>> + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
>> + errmsg("skipping \"%s\" --- missing data
>> for column \"%s\" ",
>>
>> + ereport(ERROR,
>> + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
>> + errmsg("missing data for column \"%s\" ",
>>
>> -ERROR: missing data for column "e"
>> +ERROR: missing data for column "e"
>> CONTEXT: COPY x, line 1: "2000 230 23 23"
>>
>> -ERROR: missing data for column "e"
>> +ERROR: missing data for column "e"
>> CONTEXT: COPY x, line 1: "2001 231 \N \N"
>>
>>
>
> regards
> Surafel
>
Thanks,
Anthony
From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-07-05 04:10:19 |
Message-ID: | CA+hUKGLatHZfyQsFOXEPJR7X_DT3nko5XY7W2mxg1t9782mR=Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, Jun 28, 2019 at 10:57 PM Surafel Temesgen <surafel3000(at)gmail(dot)com> wrote:
> In addition to the above change in the attached patch i also change
> the syntax to ERROR LIMIT because it is no longer only skip
> unique and exclusion constrain violation
Hi Surafel,
FYI copy.sgml has some DTD validity problems.
https://travis-ci.org/postgresql-cfbot/postgresql/builds/554350168
--
Thomas Munro
https://enterprisedb.com
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-07-11 13:42:12 |
Message-ID: | CALAY4q-V8BPWjCb2fqBHGpiUCCsWL-nbMYD_x-tpSDq_D_N+0w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi
>
> Also, I would prefer having an option to ignore all errors, e.g. with
> option ERROR_LIMIT set to -1. Because it is rather difficult to estimate
> a number of future errors if you are playing with some badly structured
> data, while always setting it to 100500k looks ugly.
>
Here are the patch that contain all the comment given except adding a way
to specify
to ignoring all error because specifying a highest number can do the work
and may be
try to store such badly structure data is a bad idea
regards
Surafel
Attachment | Content-Type | Size |
---|---|---|
conflict-handling-onCopy-from-v7.patch | text/x-patch | 13.4 KB |
From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Cc: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-07-14 01:47:59 |
Message-ID: | CA+hUKGLfd1x3XkHpN09WRUCvr591pHLS2TvJvzOa60Tuz+LBtg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg토토 베이SQL |
On Fri, Jul 12, 2019 at 1:42 AM Surafel Temesgen <surafel3000(at)gmail(dot)com> wrote:
> Here are the patch that contain all the comment given except adding a way to specify
> to ignoring all error because specifying a highest number can do the work and may be
> try to store such badly structure data is a bad idea
Hi Surafel,
FYI GCC warns:
copy.c: In function ‘CopyFrom’:
copy.c:3383:8: error: ‘dest’ may be used uninitialized in this
function [-Werror=maybe-uninitialized]
(void) dest->receiveSlot(myslot, dest);
^
copy.c:2702:16: note: ‘dest’ was declared here
DestReceiver *dest;
^
--
Thomas Munro
https://enterprisedb.com
From: | Anthony Nowocien <anowocien(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Surafel Temesgen <surafel3000(at)gmail(dot)com>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-07-14 08:15:17 |
Message-ID: | CAH5RRoNctxFUupW8EpxCu+_nz6vda1xFZqLv-f-8MK+YQmkZ9Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
sorry for answering a bit later than I hoped. Here is my review so far:
Contents
======
This patch starts to address in my opinion one of COPY's shortcoming, which
is error handling. PK and exclusion errors are taken care of, but not
(yet?) other types of errors.
Documentation is updated, "\h copy" also and some regression tests are
added.
Initial Run
=======
Patch applies (i've tested v6) cleanly.
make: OK
make install: OK
make check: OK
make installcheck: OK
Performance
========
I've tested the patch on a 1.1G file with 10 000 000 lines. Each test was
done 15 times on a small local VM. Table is without constraints.
head: 38,93s
head + patch: 38,76s
Another test was one a 0.1GB file with 1 000 000 lines. Each test done 10
times on a small local VM and the table has a pk.
COPY 4,550s
COPY CONFLICT 4,595s
COPY CONFLICT with only one pk error 10,529s
COPY CONFLICT pk error every 100 lines 10,859s
COPY CONFLICT pk error every 1000 lines 10,879s
I did not test exclusions so far.
Thoughts
======
I find the feature useful in itself. One big question for me is can it be
improved later on to handle other types of errors (like check constraints
for example) ? A "-1" for the error limit would be very useful in my
opinion.
I am also afraid that the name "error_limit" might mislead users into
thinking that all error types are handled. But I do not have a better
suggestion without making this clause much longer...
I've had a short look at the code, but this will need review by someone
else.
Anyway, thanks a lot for taking the time to work on it.
Anthony
On Sun, Jul 14, 2019 at 3:48 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Fri, Jul 12, 2019 at 1:42 AM Surafel Temesgen <surafel3000(at)gmail(dot)com>
> wrote:
> > Here are the patch that contain all the comment given except adding a
> way to specify
> > to ignoring all error because specifying a highest number can do the
> work and may be
> > try to store such badly structure data is a bad idea
>
> Hi Surafel,
>
> FYI GCC warns:
>
> copy.c: In function ‘CopyFrom’:
> copy.c:3383:8: error: ‘dest’ may be used uninitialized in this
> function [-Werror=maybe-uninitialized]
> (void) dest->receiveSlot(myslot, dest);
> ^
> copy.c:2702:16: note: ‘dest’ was declared here
> DestReceiver *dest;
> ^
>
> --
> Thomas Munro
> https://enterprisedb.com
>
>
>
From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Cc: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-07-14 16:40:01 |
Message-ID: | 20190714164001.GA21895@alvherre.pgsql |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
I think making ERROR a reserved word is a terrible idea, and we don't
need it for this feature anyway. Use a new option in the parenthesized
options list instead.
error_limit being an integer, please don't use it as a boolean:
if (cstate->error_limit)
...
Add an explicit comparison to zero instead, for code readability.
Also, since each error decrements the same variable, it becomes hard to
reason about the state: at the end, are we ending with the exact number
of errors, or did we start with the feature disabled? I suggest that
it'd make sense to have a boolean indicating whether this feature has
been requested, and the integer is just the remaining allowed problems.
Line 3255 or thereabouts contains an excess " char
The "warn about it" comment is obsolete, isn't it? There's no warning
there.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-07-16 07:08:37 |
Message-ID: | CALAY4q89O2f1+5RMc6nq5QpV7br8VZM9FK_zOh9jUS_AkuOGqw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Sun, Jul 14, 2019 at 7:40 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
wrote:
>
> error_limit being an integer, please don't use it as a boolean:
>
> if (cstate->error_limit)
>
...
>
> Add an explicit comparison to zero instead, for code readability.
> Also, since each error decrements the same variable, it becomes hard to
> reason about the state: at the end, are we ending with the exact number
> of errors, or did we start with the feature disabled? I suggest that
> it'd make sense to have a boolean indicating whether this feature has
> been requested, and the integer is just the remaining allowed problems.
>
>
done
Line 3255 or thereabouts contains an excess " char
>
>
fixed
> The "warn about it" comment is obsolete, isn't it? There's no warning
> there.
>
>
fixed
i also add an option to ignore all errors in ERROR set to -1
Attachment | Content-Type | Size |
---|---|---|
conflict-handling-onCopy-from-v8.patch | text/x-patch | 13.3 KB |
From: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru> |
---|---|
To: | Surafel Temesgen <surafel3000(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-09-20 13:16:32 |
Message-ID: | 03431db3-7b69-b7a4-43be-f31efb6b44f3@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi Surafel,
On 16.07.2019 10:08, Surafel Temesgen wrote:
> i also add an option to ignore all errors in ERROR set to -1
Great!
The patch still applies cleanly (tested on e1c8743e6c), but I've got
some problems using more elaborated tests.
First of all, there is definitely a problem with grammar. In docs ERROR
is defined as option and
COPY test FROM '/path/to/copy-test-simple.csv' ERROR -1;
works just fine, but if modern 'WITH (...)' syntax is used:
COPY test FROM '/path/to/copy-test-simple.csv' WITH (ERROR -1);
ERROR: option "error" not recognized
while 'WITH (error_limit -1)' it works again.
It happens, since COPY supports modern and very-very old syntax:
* In the preferred syntax the options are comma-separated
* and use generic identifiers instead of keywords. The pre-9.0
* syntax had a hard-wired, space-separated set of options.
So I see several options here:
1) Everything is left as is, but then docs should be updated and
reflect, that error_limit is required for modern syntax.
2) However, why do we have to support old syntax here? I guess it exists
for backward compatibility only, but this is a completely new feature.
So maybe just 'WITH (error_limit 42)' will be enough?
3) You also may simply change internal option name from 'error_limit' to
'error' or SQL keyword from 'ERROR' tot 'ERROR_LIMIT'.
I would prefer the second option.
Next, you use DestRemoteSimple for returning conflicting tuples back:
+ dest = CreateDestReceiver(DestRemoteSimple);
+ dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
However, printsimple supports very limited subset of built-in types, so
CREATE TABLE large_test (id integer primary key, num1 bigint, num2
double precision);
COPY large_test FROM '/path/to/copy-test.tsv';
COPY large_test FROM '/path/to/copy-test.tsv' ERROR 3;
fails with following error 'ERROR: unsupported type OID: 701', which
seems to be very confusing from the end user perspective. I've tried to
switch to DestRemote, but couldn't figure it out quickly.
Finally, I simply cannot get into this validation:
+ else if (strcmp(defel->defname, "error_limit") == 0)
+ {
+ if (cstate->ignore_error)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+ cstate->error_limit = defGetInt64(defel);
+ cstate->ignore_error = true;
+ if (cstate->error_limit == -1)
+ cstate->ignore_all_error = true;
+ }
If cstate->ignore_error is defined, then we have already processed
options list, since this is the only one place, where it's set. So we
should never get into this ereport, doesn't it?
Regards
--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
Russian Postgres Company
Attachment | Content-Type | Size |
---|---|---|
copy-test.tsv | text/tab-separated-values | 24 bytes |
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-11-11 13:00:46 |
Message-ID: | CALAY4q_LwbCgLmQ-n7380sZZRkv8DzynW47w=57ODn0CXji2kw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, Sep 20, 2019 at 4:16 PM Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
wrote:
>
> First of all, there is definitely a problem with grammar. In docs ERROR
> is defined as option and
>
> COPY test FROM '/path/to/copy-test-simple.csv' ERROR -1;
>
> works just fine, but if modern 'WITH (...)' syntax is used:
>
> COPY test FROM '/path/to/copy-test-simple.csv' WITH (ERROR -1);
> ERROR: option "error" not recognized
>
> while 'WITH (error_limit -1)' it works again.
>
> It happens, since COPY supports modern and very-very old syntax:
>
> * In the preferred syntax the options are comma-separated
> * and use generic identifiers instead of keywords. The pre-9.0
> * syntax had a hard-wired, space-separated set of options.
>
> So I see several options here:
>
> 1) Everything is left as is, but then docs should be updated and
> reflect, that error_limit is required for modern syntax.
>
> 2) However, why do we have to support old syntax here? I guess it exists
> for backward compatibility only, but this is a completely new feature.
> So maybe just 'WITH (error_limit 42)' will be enough?
>
> 3) You also may simply change internal option name from 'error_limit' to
> 'error' or SQL keyword from 'ERROR' tot 'ERROR_LIMIT'.
>
> I would prefer the second option.
>
agreed and Done
>
>
> Next, you use DestRemoteSimple for returning conflicting tuples back:
>
> + dest = CreateDestReceiver(DestRemoteSimple);
> + dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
>
> However, printsimple supports very limited subset of built-in types, so
>
> CREATE TABLE large_test (id integer primary key, num1 bigint, num2
> double precision);
> COPY large_test FROM '/path/to/copy-test.tsv';
> COPY large_test FROM '/path/to/copy-test.tsv' ERROR 3;
>
> fails with following error 'ERROR: unsupported type OID: 701', which
> seems to be very confusing from the end user perspective. I've tried to
> switch to DestRemote, but couldn't figure it out quickly.
>
>
fixed
>
> Finally, I simply cannot get into this validation:
>
> + else if (strcmp(defel->defname, "error_limit") == 0)
> + {
> + if (cstate->ignore_error)
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("conflicting or redundant options"),
> + parser_errposition(pstate, defel->location)));
> + cstate->error_limit = defGetInt64(defel);
> + cstate->ignore_error = true;
> + if (cstate->error_limit == -1)
> + cstate->ignore_all_error = true;
> + }
>
> If cstate->ignore_error is defined, then we have already processed
> options list, since this is the only one place, where it's set. So we
> should never get into this ereport, doesn't it?
>
>
yes the check only needed for modern syntax
regards
Surafel
Attachment | Content-Type | Size |
---|---|---|
conflict-handling-onCopy-from-v9.patch | text/x-patch | 12.1 KB |
From: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru> |
---|---|
To: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-11-15 15:24:41 |
Message-ID: | 23ebc407-4c51-b549-00f0-75c3c7f1fc05@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 11.11.2019 16:00, Surafel Temesgen wrote:
>
>
> Next, you use DestRemoteSimple for returning conflicting tuples back:
>
> + dest = CreateDestReceiver(DestRemoteSimple);
> + dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
>
> However, printsimple supports very limited subset of built-in
> types, so
>
> CREATE TABLE large_test (id integer primary key, num1 bigint, num2
> double precision);
> COPY large_test FROM '/path/to/copy-test.tsv';
> COPY large_test FROM '/path/to/copy-test.tsv' ERROR 3;
>
> fails with following error 'ERROR: unsupported type OID: 701', which
> seems to be very confusing from the end user perspective. I've
> tried to
> switch to DestRemote, but couldn't figure it out quickly.
>
>
> fixed
Thanks, now it works with my tests.
1) Maybe it is fine, but now I do not like this part:
+ portal = GetPortalByName("");
+ dest = CreateDestReceiver(DestRemote);
+ SetRemoteDestReceiverParams(dest, portal);
+ dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
Here you implicitly use the fact that portal with a blank name is always
created in exec_simple_query before we get to this point. Next, you
create new DestReceiver and set it to this portal, but it is also
already created and set in the exec_simple_query.
Would it be better if you just explicitly pass ready DestReceiver to
DoCopy (similarly to how it is done for T_ExecuteStmt / ExecuteQuery),
as it may be required by COPY now?
2) My second concern is that you use three internal flags to track
errors limit:
+ int error_limit; /* total number of error to ignore */
+ bool ignore_error; /* is ignore error specified? */
+ bool ignore_all_error; /* is error_limit -1 (ignore all
error)
+ * specified? */
Though it seems that we can just leave error_limit as a user-defined
constant and track errors with something like errors_count. In that case
you do not need auxiliary ignore_all_error flag. But probably it is a
matter of personal choice.
Regards
--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
Russian Postgres Company
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-11-18 06:42:14 |
Message-ID: | CALAY4q_Yjeh6=TuroMsxTb6rm2iTxycCQmEqkD6aohX7y2_bTw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, Nov 15, 2019 at 6:24 PM Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
wrote:
> On 11.11.2019 16:00, Surafel Temesgen wrote:
> >
> >
> > Next, you use DestRemoteSimple for returning conflicting tuples back:
> >
> > + dest = CreateDestReceiver(DestRemoteSimple);
> > + dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
> >
> > However, printsimple supports very limited subset of built-in
> > types, so
> >
> > CREATE TABLE large_test (id integer primary key, num1 bigint, num2
> > double precision);
> > COPY large_test FROM '/path/to/copy-test.tsv';
> > COPY large_test FROM '/path/to/copy-test.tsv' ERROR 3;
> >
> > fails with following error 'ERROR: unsupported type OID: 701', which
> > seems to be very confusing from the end user perspective. I've
> > tried to
> > switch to DestRemote, but couldn't figure it out quickly.
> >
> >
> > fixed
>
> Thanks, now it works with my tests.
>
> 1) Maybe it is fine, but now I do not like this part:
>
> + portal = GetPortalByName("");
> + dest = CreateDestReceiver(DestRemote);
> + SetRemoteDestReceiverParams(dest, portal);
> + dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
>
> Here you implicitly use the fact that portal with a blank name is always
> created in exec_simple_query before we get to this point. Next, you
> create new DestReceiver and set it to this portal, but it is also
> already created and set in the exec_simple_query.
>
> Would it be better if you just explicitly pass ready DestReceiver to
> DoCopy (similarly to how it is done for T_ExecuteStmt / ExecuteQuery),
>
>
Good idea .Thank you
>
> 2) My second concern is that you use three internal flags to track
> errors limit:
>
> + int error_limit; /* total number of error to ignore */
> + bool ignore_error; /* is ignore error specified? */
> + bool ignore_all_error; /* is error_limit -1 (ignore all
> error)
> + * specified? */
>
> Though it seems that we can just leave error_limit as a user-defined
> constant and track errors with something like errors_count. In that case
> you do not need auxiliary ignore_all_error flag. But probably it is a
> matter of personal choice.
>
>
using bool flags will save as from using integer type as a boolean and hold
the fact
error limit was specified even if it became zero and it seems to me it is
straightforward
to treat ignore_all_error separately.
Attache is the patch that use already created DestReceiver
regards
Surafel
Attachment | Content-Type | Size |
---|---|---|
conflict-handling-onCopy-from-v10.patch | text/x-patch | 14.9 KB |
From: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru> |
---|---|
To: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-11-21 13:22:29 |
Message-ID: | 5e6abe50-d74c-291a-e294-3f56604bc3c7@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 18.11.2019 9:42, Surafel Temesgen wrote:
> On Fri, Nov 15, 2019 at 6:24 PM Alexey Kondratov
> <a(dot)kondratov(at)postgrespro(dot)ru <mailto:a(dot)kondratov(at)postgrespro(dot)ru>> wrote:
>
>
> 1) Maybe it is fine, but now I do not like this part:
>
> + portal = GetPortalByName("");
> + dest = CreateDestReceiver(DestRemote);
> + SetRemoteDestReceiverParams(dest, portal);
> + dest->rStartup(dest, (int) CMD_SELECT, tupDesc);
>
> Here you implicitly use the fact that portal with a blank name is
> always
> created in exec_simple_query before we get to this point. Next, you
> create new DestReceiver and set it to this portal, but it is also
> already created and set in the exec_simple_query.
>
> Would it be better if you just explicitly pass ready DestReceiver to
> DoCopy (similarly to how it is done for T_ExecuteStmt /
> ExecuteQuery),
>
>
> Good idea .Thank you
Now the whole patch works exactly as expected for me and I cannot find
any new technical flaws. However, the doc is rather vague, especially
these places:
+ specifying it to -1 returns all error record.
Actually, we return only rows with constraint violation, but malformed
rows are ignored with warning. I guess that we simply cannot return
malformed rows back to the caller in the same way as with constraint
violation, since we cannot figure out (in general) which column
corresponds to which type if there are extra or missing columns.
+ and same record formatting error is ignored.
I can get it, but it definitely should be reworded.
What about something like this?
+ <varlistentry>
+ <term><literal>ERROR_LIMIT</literal></term>
+ <listitem>
+ <para>
+ Enables ignoring of errored out rows up to <replaceable
+ class="parameter">limit_number</replaceable>. If <replaceable
+ class="parameter">limit_number</replaceable> is set
+ to -1, then all errors will be ignored.
+ </para>
+
+ <para>
+ Currently, only unique or exclusion constraint violation
+ and rows formatting errors are ignored. Malformed
+ rows will rise warnings, while constraint violating rows
+ will be returned back to the caller.
+ </para>
+
+ </listitem>
+ </varlistentry>
Regards
--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
Russian Postgres Company
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-11-25 05:55:52 |
Message-ID: | CALAY4q9TZ1cZc04A0_KOhBKAZe3jWeqEXSqJgVtQymD8v9u1BQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg와이즈 토토SQL |
On Thu, Nov 21, 2019 at 4:22 PM Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
wrote:
>
> Now the whole patch works exactly as expected for me and I cannot find
> any new technical flaws. However, the doc is rather vague, especially
> these places:
>
> + specifying it to -1 returns all error record.
>
> Actually, we return only rows with constraint violation, but malformed
> rows are ignored with warning. I guess that we simply cannot return
> malformed rows back to the caller in the same way as with constraint
> violation, since we cannot figure out (in general) which column
> corresponds to which type if there are extra or missing columns.
>
> + and same record formatting error is ignored.
>
> I can get it, but it definitely should be reworded.
>
> What about something like this?
>
> + <varlistentry>
> + <term><literal>ERROR_LIMIT</literal></term>
> + <listitem>
> + <para>
> + Enables ignoring of errored out rows up to <replaceable
> + class="parameter">limit_number</replaceable>. If <replaceable
> + class="parameter">limit_number</replaceable> is set
> + to -1, then all errors will be ignored.
> + </para>
> +
> + <para>
> + Currently, only unique or exclusion constraint violation
> + and rows formatting errors are ignored. Malformed
> + rows will rise warnings, while constraint violating rows
> + will be returned back to the caller.
> + </para>
> +
> + </listitem>
> + </varlistentry>
>
>
>
It is better so changed
regards
Surafel
Attachment | Content-Type | Size |
---|---|---|
conflict-handling-onCopy-from-v11.patch | text/x-patch | 15.1 KB |
From: | "asaba(dot)takanori(at)fujitsu(dot)com" <asaba(dot)takanori(at)fujitsu(dot)com> |
---|---|
To: | 'Surafel Temesgen' <surafel3000(at)gmail(dot)com> |
Cc: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, 'Anthony Nowocien' <anowocien(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | RE: Conflict handling for COPY FROM |
Date: | 2019-12-12 04:51:48 |
Message-ID: | OSBPR01MB4728B3714100D5E3979B446B8C550@OSBPR01MB4728.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hello Surafel,
I'm very interested in this patch.
Although I'm a beginner,I would like to participate in the development of PostgreSQL.
1. I want to suggest new output format.
In my opinion, it's kind to display description of output and add "line number" and "error" to output.
For example,
error lines
line number | first | second | third | error
------------+-------+--------+-------+------------
1 | 1 | 10 | 0.5 | UNIQUE
2 | 2 | 42 | 0.1 | CHECK
3 | 3 | NULL | 0 | NOT NULL
(3 rows)
Although only unique or exclusion constraint violation returned back to the caller currently,
I think that column "error" will be useful when it becomes possible to handle other types of errors(check, not-null and so on).
If you assume that users re-execute COPY FROM with the output lines as input, these columns are obstacles.
Therefore I think that this output format should be displayed only when we set new option(for example ERROR_VERBOSE) like "COPY FROM ... ERROR_VERBOSE;".
2. I have a question about copy meta-command.
When I executed copy meta-command, output wasn't displayed.
Does it correspond to copy meta-command?
Regards
--
Asaba Takanori
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | "asaba(dot)takanori(at)fujitsu(dot)com" <asaba(dot)takanori(at)fujitsu(dot)com> |
Cc: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Nowocien <anowocien(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-12-13 05:49:20 |
Message-ID: | CALAY4q87BzLMUbc00btGU=CWNXgJxQv-ppEpwE6DwEEwZYmgOg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi Asaba,
On Thu, Dec 12, 2019 at 7:51 AM asaba(dot)takanori(at)fujitsu(dot)com <
asaba(dot)takanori(at)fujitsu(dot)com> wrote:
> Hello Surafel,
>
> I'm very interested in this patch.
> Although I'm a beginner,I would like to participate in the development of
> PostgreSQL.
>
>
> 1. I want to suggest new output format.
> In my opinion, it's kind to display description of output and add "line
> number" and "error" to output.
> For example,
>
> error lines
>
> line number | first | second | third | error
> ------------+-------+--------+-------+------------
> 1 | 1 | 10 | 0.5 | UNIQUE
> 2 | 2 | 42 | 0.1 | CHECK
> 3 | 3 | NULL | 0 | NOT NULL
> (3 rows)
>
> Although only unique or exclusion constraint violation returned back to
> the caller currently,
> I think that column "error" will be useful when it becomes possible to
> handle other types of errors(check, not-null and so on).
>
currently we can't get violation kind in speculative insertion
> If you assume that users re-execute COPY FROM with the output lines as
> input, these columns are obstacles.
> Therefore I think that this output format should be displayed only when we
> set new option(for example ERROR_VERBOSE) like "COPY FROM ...
> ERROR_VERBOSE;".
>
>
i agree adding optional feature for this is useful in same scenario but
i think its a material for future improvement after basic feature done.
>
> 2. I have a question about copy meta-command.
> When I executed copy meta-command, output wasn't displayed.
> Does it correspond to copy meta-command?
>
>
okay . i will look at it
thank you
regards
Surafel
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | "asaba(dot)takanori(at)fujitsu(dot)com" <asaba(dot)takanori(at)fujitsu(dot)com> |
Cc: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Nowocien <anowocien(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2019-12-16 13:23:59 |
Message-ID: | CALAY4q8x8dR+Mu7dgmwuEPNX-1uVywy3oReBHP8+nM1phzj1RA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Dec 12, 2019 at 7:51 AM asaba(dot)takanori(at)fujitsu(dot)com <
asaba(dot)takanori(at)fujitsu(dot)com> wrote:
> 2. I have a question about copy meta-command.
> When I executed copy meta-command, output wasn't displayed.
> Does it correspond to copy meta-command?
>
>
Fixed
regards
Surafel
Attachment | Content-Type | Size |
---|---|---|
conflict-handling-onCopy-from-v12.patch | text/x-patch | 16.0 KB |
From: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> |
---|---|
To: | surafel3000(at)gmail(dot)com |
Cc: | asaba(dot)takanori(at)fujitsu(dot)com, a(dot)kondratov(at)postgrespro(dot)ru, alvherre(at)2ndquadrant(dot)com, andres(at)anarazel(dot)de, andrew(dot)dunstan(at)2ndquadrant(dot)com, robertmhaas(at)gmail(dot)com, anowocien(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2020-02-14 07:46:52 |
Message-ID: | 20200214.164652.377076550947349056.t-ishii@sraoss.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg토토 캔SQL : |
In your patch for copy.sgml:
ERROR_LIMIT '<replaceable class="parameter">limit_number</replaceable>'
I think this should be:
ERROR_LIMIT <replaceable class="parameter">limit_number</replaceable>
(no single quote)
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
From: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> |
---|---|
To: | surafel3000(at)gmail(dot)com |
Cc: | asaba(dot)takanori(at)fujitsu(dot)com, a(dot)kondratov(at)postgrespro(dot)ru, alvherre(at)2ndquadrant(dot)com, andres(at)anarazel(dot)de, andrew(dot)dunstan(at)2ndquadrant(dot)com, robertmhaas(at)gmail(dot)com, anowocien(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2020-02-17 02:37:10 |
Message-ID: | 20200217.113710.1462561830892693526.t-ishii@sraoss.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg롤 토토SQL : |
> In your patch for copy.sgml:
>
> ERROR_LIMIT '<replaceable class="parameter">limit_number</replaceable>'
>
> I think this should be:
>
> ERROR_LIMIT <replaceable class="parameter">limit_number</replaceable>
>
> (no single quote)
More comments:
- I think the document should stat that if limit_number = 0, all
errors are immediately raised (behaves same as current befavior without the patch).
- "constraint violating rows will be returned back to the caller."
This does explains the current implementation. I am not sure if it's
intended or not though:
cat /tmp/a
1 1
2 2
3 3
3 4
psql test
$ psql test
psql (13devel)
Type "help" for help.
test=# select * from t1;
i | j
---+---
1 | 1
2 | 2
3 | 3
(3 rows)
test=# copy t1 from '/tmp/a' with (error_limit 1);
ERROR: duplicate key value violates unique constraint "t1_pkey"
DETAIL: Key (i)=(2) already exists.
CONTEXT: COPY t1, line 2: "2 2"
So if the number of errors raised exceeds error_limit, no constaraint
violating rows (in this case i=1, j=1) are returned.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> |
Cc: | asaba(dot)takanori(at)fujitsu(dot)com, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Nowocien <anowocien(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2020-02-17 06:26:27 |
Message-ID: | CALAY4q_ue=XqBgnJjq=ntxv=LtPX-8Dnng7=EqLswwAnRXHb_w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
> > ERROR_LIMIT '<replaceable
> class="parameter">limit_number</replaceable>'
> >
> > I think this should be:
> >
> > ERROR_LIMIT <replaceable class="parameter">limit_number</replaceable>
> >
> > (no single quote)
>
>
Thank you .Fixed
> More comments:
>
> - I think the document should stat that if limit_number = 0, all
> errors are immediately raised (behaves same as current befavior without
> the patch).
>
>
if we want all error to be raised error limit_number not need to be
specified.
but if it is specified like limit_number = 0 i think it is self-explanatory
> - "constraint violating rows will be returned back to the caller."
> This does explains the current implementation. I am not sure if it's
> intended or not though:
>
> cat /tmp/a
> 1 1
> 2 2
> 3 3
> 3 4
>
> psql test
> $ psql test
> psql (13devel)
> Type "help" for help.
>
> test=# select * from t1;
> i | j
> ---+---
> 1 | 1
> 2 | 2
> 3 | 3
> (3 rows)
>
> test=# copy t1 from '/tmp/a' with (error_limit 1);
> ERROR: duplicate key value violates unique constraint "t1_pkey"
> DETAIL: Key (i)=(2) already exists.
> CONTEXT: COPY t1, line 2: "2 2"
>
> So if the number of errors raised exceeds error_limit, no constaraint
> violating rows (in this case i=1, j=1) are returned.
>
error_limit is specified to dictate the number of error allowed in copy
operation
to precede. If it exceed the number the operation is stopped. there may
be more conflict afterward and returning limited number of conflicting rows
have no much use
regards
Surafel
Attachment | Content-Type | Size |
---|---|---|
conflict-handling-q-from-v13.patch | text/x-patch | 16.0 KB |
From: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> |
---|---|
To: | surafel3000(at)gmail(dot)com |
Cc: | asaba(dot)takanori(at)fujitsu(dot)com, a(dot)kondratov(at)postgrespro(dot)ru, alvherre(at)2ndquadrant(dot)com, andres(at)anarazel(dot)de, andrew(dot)dunstan(at)2ndquadrant(dot)com, robertmhaas(at)gmail(dot)com, anowocien(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2020-02-17 07:00:00 |
Message-ID: | 20200217.160000.1889738947788169392.t-ishii@sraoss.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
>> test=# copy t1 from '/tmp/a' with (error_limit 1);
>> ERROR: duplicate key value violates unique constraint "t1_pkey"
>> DETAIL: Key (i)=(2) already exists.
>> CONTEXT: COPY t1, line 2: "2 2"
>>
>> So if the number of errors raised exceeds error_limit, no constaraint
>> violating rows (in this case i=1, j=1) are returned.
>>
>
> error_limit is specified to dictate the number of error allowed in copy
> operation
> to precede. If it exceed the number the operation is stopped. there may
> be more conflict afterward and returning limited number of conflicting rows
> have no much use
Still I see your explanation differs from what the document patch says.
+ Currently, only unique or exclusion constraint violation
+ and rows formatting errors are ignored. Malformed
+ rows will rise warnings, while constraint violating rows
+ will be returned back to the caller.
I am afraid once this patch is part of next version of PostgreSQL, we
get many complains/inqueires from users. What about changing like this:
Currently, only unique or exclusion constraint violation and
rows formatting errors are ignored. Malformed rows will rise
warnings, while constraint violating rows will be returned back
to the caller unless any error is raised; i.e. if any error is
raised due to error_limit exceeds, no rows will be returned back
to the caller.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> |
Cc: | asaba(dot)takanori(at)fujitsu(dot)com, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Nowocien <anowocien(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2020-02-17 08:07:48 |
Message-ID: | CALAY4q9GtfScUoafzin8zFQ-UjXB65hByFy1jsS=VNbX9dvuXw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | 503 스포츠 토토 사이트 |
On Mon, Feb 17, 2020 at 10:00 AM Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> wrote:
> >> test=# copy t1 from '/tmp/a' with (error_limit 1);
> >> ERROR: duplicate key value violates unique constraint "t1_pkey"
> >> DETAIL: Key (i)=(2) already exists.
> >> CONTEXT: COPY t1, line 2: "2 2"
> >>
> >> So if the number of errors raised exceeds error_limit, no constaraint
> >> violating rows (in this case i=1, j=1) are returned.
> >>
> >
> > error_limit is specified to dictate the number of error allowed in copy
> > operation
> > to precede. If it exceed the number the operation is stopped. there may
> > be more conflict afterward and returning limited number of conflicting
> rows
> > have no much use
>
> Still I see your explanation differs from what the document patch says.
>
> + Currently, only unique or exclusion constraint violation
> + and rows formatting errors are ignored. Malformed
> + rows will rise warnings, while constraint violating rows
> + will be returned back to the caller.
>
> I am afraid once this patch is part of next version of PostgreSQL, we
> get many complains/inqueires from users. What about changing like this:
>
> Currently, only unique or exclusion constraint violation and
> rows formatting errors are ignored. Malformed rows will rise
> warnings, while constraint violating rows will be returned back
> to the caller unless any error is raised; i.e. if any error is
> raised due to error_limit exceeds, no rows will be returned back
> to the caller.
>
Its better so amended .
regards
Surafel
Attachment | Content-Type | Size |
---|---|---|
conflict-handling-q-from-v14.patch | text/x-patch | 16.2 KB |
From: | "asaba(dot)takanori(at)fujitsu(dot)com" <asaba(dot)takanori(at)fujitsu(dot)com> |
---|---|
To: | 'Surafel Temesgen' <surafel3000(at)gmail(dot)com> |
Cc: | 'Tatsuo Ishii' <ishii(at)sraoss(dot)co(dot)jp>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Nowocien <anowocien(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | RE: Conflict handling for COPY FROM |
Date: | 2020-03-06 08:30:03 |
Message-ID: | OSBPR01MB47287299909583B6E26F39A28CE30@OSBPR01MB4728.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hello Surafel,
Sorry for my late reply.
From: Surafel Temesgen <surafel3000(at)gmail(dot)com>
>On Thu, Dec 12, 2019 at 7:51 AM mailto:asaba(dot)takanori(at)fujitsu(dot)com <mailto:asaba(dot)takanori(at)fujitsu(dot)com> wrote:
>>2. I have a question about copy meta-command.
>>When I executed copy meta-command, output wasn't displayed.
>>Does it correspond to copy meta-command?
>
>Fixed
Thank you.
I think we need regression test that constraint violating row is returned back to the caller.
How about this?
・ /src/test/regress/expected/copy2.out
@@ -1,5 +1,5 @@
CREATE TEMP TABLE x (
- a serial,
+ a serial UNIQUE,
b int,
c text not null default 'stuff',
d text,
@@ -55,6 +55,16 @@ LINE 1: COPY x TO stdout WHERE a = 1;
^
COPY x from stdin WHERE a = 50004;
COPY x from stdin WHERE a > 60003;
+COPY x from stdin WITH(ERROR_LIMIT 5);
+WARNING: skipping "70001 22 32" --- missing data for column "d"
+WARNING: skipping "70002 23 33 43 53 54" --- extra data after last expected column
+WARNING: skipping "70003 24 34 44" --- missing data for column "e"
+
+ a | b | c | d | e
+-------+----+----+----+----------------------
+ 70005 | 27 | 37 | 47 | before trigger fired
+(1 row)
+
COPY x from stdin WHERE f > 60003;
ERROR: column "f" does not exist
・ src/test/regress/sql/copy2.sql
@@ -1,5 +1,5 @@
CREATE TEMP TABLE x (
- a serial,
+ a serial UNIQUE,
b int,
c text not null default 'stuff',
d text,
@@ -110,6 +110,15 @@ COPY x from stdin WHERE a > 60003;
60005 26 36 46 56
\.
+COPY x from stdin WITH(ERROR_LIMIT 5);
+70001 22 32
+70002 23 33 43 53 54
+70003 24 34 44
+70004 25 35 45 55
+70005 26 36 46 56
+70005 27 37 47 57
+\.
+
COPY x from stdin WHERE f > 60003;
COPY x from stdin WHERE a = max(x.b);
Regards,
--
Takanori Asaba
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | "asaba(dot)takanori(at)fujitsu(dot)com" <asaba(dot)takanori(at)fujitsu(dot)com> |
Cc: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Nowocien <anowocien(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2020-03-09 12:34:54 |
Message-ID: | CALAY4q-TY-qv73e0pPiAAu+8BXzFVUFu+LRhnXcCS1spjE0B3w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, Mar 6, 2020 at 11:30 AM asaba(dot)takanori(at)fujitsu(dot)com <
asaba(dot)takanori(at)fujitsu(dot)com> wrote:
> Hello Surafel,
>
> Sorry for my late reply.
>
> From: Surafel Temesgen <surafel3000(at)gmail(dot)com>
> >On Thu, Dec 12, 2019 at 7:51 AM mailto:asaba(dot)takanori(at)fujitsu(dot)com
> <mailto:asaba(dot)takanori(at)fujitsu(dot)com> wrote:
> >>2. I have a question about copy meta-command.
> >>When I executed copy meta-command, output wasn't displayed.
> >>Does it correspond to copy meta-command?
> >
> >Fixed
> Thank you.
>
> I think we need regression test that constraint violating row is returned
> back to the caller.
> How about this?
>
>
okay attached is a rebased patch with it
regards
Surafel
Attachment | Content-Type | Size |
---|---|---|
conflict-handling-copy-from-v15.patch | text/x-patch | 16.4 KB |
From: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru> |
---|---|
To: | Surafel Temesgen <surafel3000(at)gmail(dot)com>, "asaba(dot)takanori(at)fujitsu(dot)com" <asaba(dot)takanori(at)fujitsu(dot)com> |
Cc: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Nowocien <anowocien(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2020-03-10 17:05:24 |
Message-ID: | 9ffdd63f-a279-99b6-30be-9defe51ea6a1@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg메이저 토토 사이트SQL |
On 09.03.2020 15:34, Surafel Temesgen wrote:
>
> okay attached is a rebased patch with it
>
+ Portal portal = NULL;
...
+ portal = GetPortalByName("");
+ SetRemoteDestReceiverParams(dest, portal);
I think that you do not need this, since you are using a ready
DestReceiver. The whole idea of passing DestReceiver down to the
CopyFrom was to avoid that code. This unnamed portal is created in the
exec_simple_query [1] and has been already set to the DestReceiver there
[2].
Maybe I am missing something, but I have just removed this code and
everything works just fine.
[1]
https://github.com/postgres/postgres/blob/0a42a2e9/src/backend/tcop/postgres.c#L1178
[2]
https://github.com/postgres/postgres/blob/0a42a2e9/src/backend/tcop/postgres.c#L1226
Regards
--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
Russian Postgres Company
From: | "asaba(dot)takanori(at)fujitsu(dot)com" <asaba(dot)takanori(at)fujitsu(dot)com> |
---|---|
To: | 'Surafel Temesgen' <surafel3000(at)gmail(dot)com> |
Cc: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Nowocien <anowocien(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | RE: Conflict handling for COPY FROM |
Date: | 2020-03-13 03:09:25 |
Message-ID: | OSBPR01MB4728AFAA9A172AD1D2B98A548CFA0@OSBPR01MB4728.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hello Surafel,
From: Surafel Temesgen <surafel3000(at)gmail(dot)com>
>>On Fri, Mar 6, 2020 at 11:30 AM mailto:asaba(dot)takanori(at)fujitsu(dot)com <mailto:asaba(dot)takanori(at)fujitsu(dot)com> wrote:
>>I think we need regression test that constraint violating row is returned back to the caller.
>>How about this?
>
>okay attached is a rebased patch with it
Thank you very much.
Although it is a small point, it may be better like this:
+70005 27 36 46 56 -> 70005 27 37 47 57
I want to discuss about copy from binary file.
It seems that this patch tries to avoid the error that number of field is different .
+ {
+ if (cstate->error_limit > 0 || cstate->ignore_all_error)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("skipping \"%s\" --- row field count is %d, expected %d",
+ cstate->line_buf.data, (int) fld_count, attr_count)));
+ cstate->error_limit--;
+ goto next_line;
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("row field count is %d, expected %d",
+ (int) fld_count, attr_count)));
+
+ }
I checked like this:
postgres=# CREATE TABLE x (
postgres(# a serial UNIQUE,
postgres(# b int,
postgres(# c text not null default 'stuff',
postgres(# d text,
postgres(# e text
postgres(# );
CREATE TABLE
postgres=# COPY x from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 70004 25 35 45 55
>> 70005 26 36 46 56
>> \.
COPY 2
postgres=# SELECT * FROM x;
a | b | c | d | e
-------+----+----+----+----
70004 | 25 | 35 | 45 | 55
70005 | 26 | 36 | 46 | 56
(2 rows)
postgres=# COPY x TO '/tmp/copyout' (FORMAT binary);
COPY 2
postgres=# CREATE TABLE y (
postgres(# a serial UNIQUE,
postgres(# b int,
postgres(# c text not null default 'stuff',
postgres(# d text
postgres(# );
CREATE TABLE
postgres=# COPY y FROM '/tmp/copyout' WITH (FORMAT binary,ERROR_LIMIT -1);
2020-03-12 16:55:55.457 JST [2319] WARNING: skipping "" --- row field count is 5, expected 4
2020-03-12 16:55:55.457 JST [2319] CONTEXT: COPY y, line 1
2020-03-12 16:55:55.457 JST [2319] WARNING: skipping "" --- row field count is 0, expected 4
2020-03-12 16:55:55.457 JST [2319] CONTEXT: COPY y, line 2
2020-03-12 16:55:55.457 JST [2319] ERROR: unexpected EOF in COPY data
2020-03-12 16:55:55.457 JST [2319] CONTEXT: COPY y, line 3, column a
2020-03-12 16:55:55.457 JST [2319] STATEMENT: COPY y FROM '/tmp/copyout' WITH (FORMAT binary,ERROR_LIMIT -1);
WARNING: skipping "" --- row field count is 5, expected 4
WARNING: skipping "" --- row field count is 0, expected 4
ERROR: unexpected EOF in COPY data
CONTEXT: COPY y, line 3, column a
It seems that the error isn't handled.
'WARNING: skipping "" --- row field count is 5, expected 4' is correct,
but ' WARNING: skipping "" --- row field count is 0, expected 4' is not correct.
Also, is it needed to skip the error that happens when input is binary file?
Is the case that each row has different number of field and only specific rows are copied occurred?
Regards,
--
Takanori Asaba
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | "asaba(dot)takanori(at)fujitsu(dot)com" <asaba(dot)takanori(at)fujitsu(dot)com> |
Cc: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Nowocien <anowocien(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2020-03-26 07:44:36 |
Message-ID: | CALAY4q-jbfDicjbCUC4piQS=hQUR0Xrjb2M4dxwh+sccYdvRTw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi Takanori Asaba,
>
>
> Although it is a small point, it may be better like this:
> +70005 27 36 46 56 -> 70005 27 37 47 57
>
>
done
> I want to discuss about copy from binary file.
> It seems that this patch tries to avoid the error that number of field is
> different .
>
> + {
> + if (cstate->error_limit > 0 ||
> cstate->ignore_all_error)
> + {
> + ereport(WARNING,
> +
> (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> + errmsg("skipping \"%s\"
> --- row field count is %d, expected %d",
> +
> cstate->line_buf.data, (int) fld_count, attr_count)));
> + cstate->error_limit--;
> + goto next_line;
> + }
> + else
> + ereport(ERROR,
> +
> (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> + errmsg("row field count
> is %d, expected %d",
> + (int)
> fld_count, attr_count)));
> +
> + }
>
> I checked like this:
>
> postgres=# CREATE TABLE x (
> postgres(# a serial UNIQUE,
> postgres(# b int,
> postgres(# c text not null default 'stuff',
> postgres(# d text,
> postgres(# e text
> postgres(# );
> CREATE TABLE
> postgres=# COPY x from stdin;
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF signal.
> >> 70004 25 35 45 55
> >> 70005 26 36 46 56
> >> \.
> COPY 2
> postgres=# SELECT * FROM x;
> a | b | c | d | e
> -------+----+----+----+----
> 70004 | 25 | 35 | 45 | 55
> 70005 | 26 | 36 | 46 | 56
> (2 rows)
>
> postgres=# COPY x TO '/tmp/copyout' (FORMAT binary);
> COPY 2
> postgres=# CREATE TABLE y (
> postgres(# a serial UNIQUE,
> postgres(# b int,
> postgres(# c text not null default 'stuff',
> postgres(# d text
> postgres(# );
> CREATE TABLE
> postgres=# COPY y FROM '/tmp/copyout' WITH (FORMAT binary,ERROR_LIMIT -1);
> 2020-03-12 16:55:55.457 JST [2319] WARNING: skipping "" --- row field
> count is 5, expected 4
> 2020-03-12 16:55:55.457 JST [2319] CONTEXT: COPY y, line 1
> 2020-03-12 16:55:55.457 JST [2319] WARNING: skipping "" --- row field
> count is 0, expected 4
> 2020-03-12 16:55:55.457 JST [2319] CONTEXT: COPY y, line 2
> 2020-03-12 16:55:55.457 JST [2319] ERROR: unexpected EOF in COPY data
> 2020-03-12 16:55:55.457 JST [2319] CONTEXT: COPY y, line 3, column a
> 2020-03-12 16:55:55.457 JST [2319] STATEMENT: COPY y FROM '/tmp/copyout'
> WITH (FORMAT binary,ERROR_LIMIT -1);
> WARNING: skipping "" --- row field count is 5, expected 4
> WARNING: skipping "" --- row field count is 0, expected 4
> ERROR: unexpected EOF in COPY data
> CONTEXT: COPY y, line 3, column a
>
> It seems that the error isn't handled.
> 'WARNING: skipping "" --- row field count is 5, expected 4' is correct,
> but ' WARNING: skipping "" --- row field count is 0, expected 4' is not
> correct.
>
>
Thank you for the detailed example
> Also, is it needed to skip the error that happens when input is binary
> file?
> Is the case that each row has different number of field and only specific
> rows are copied occurred?
>
>
An error that can be surly handled without transaction rollback can
be included in error handling but i will like to proceed without binary file
errors handling for the time being
regards
Surafel
Attachment | Content-Type | Size |
---|---|---|
conflict-handling-copy-from-v16.patch | application/octet-stream | 15.3 KB |
From: | "asaba(dot)takanori(at)fujitsu(dot)com" <asaba(dot)takanori(at)fujitsu(dot)com> |
---|---|
To: | 'Surafel Temesgen' <surafel3000(at)gmail(dot)com> |
Cc: | Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Nowocien <anowocien(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | RE: Conflict handling for COPY FROM |
Date: | 2020-03-27 00:20:07 |
Message-ID: | OSBPR01MB4728C0FFD431FCC0600A1F568CCC0@OSBPR01MB4728.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hello Surafel,
From: Surafel Temesgen <surafel3000(at)gmail(dot)com>
>An error that can be surly handled without transaction rollback can
>be included in error handling but i will like to proceed without binary file
>errors handling for the time being
Thank you.
Also it seems that you apply Alexey's comment.
So I'll mark this patch as ready for commiter.
Regards,
--
Takanori Asaba
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Cc: | "asaba(dot)takanori(at)fujitsu(dot)com" <asaba(dot)takanori(at)fujitsu(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Nowocien <anowocien(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2020-03-27 22:27:32 |
Message-ID: | 28444.1585348052@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg토토 커뮤니티SQL |
Surafel Temesgen <surafel3000(at)gmail(dot)com> writes:
> [ conflict-handling-copy-from-v16.patch ]
I took a quick look at this patch, since it was marked "ready for
committer", but I don't see how it can possibly be considered committable.
1. Covering only the errors that are thrown in DoCopy itself doesn't
seem to me to pass the smell test. Yes, I'm sure there's some set of
use-cases for which that'd be helpful, but I think most people would
expect a "skip errors" option to be able to handle cases like malformed
numbers or bad encoding. I understand the implementation reasons that
make it impractical to cover other errors, but do we really want a
feature that most people will see as much less than half-baked? I fear
it'll be an embarrassment.
2. If I'm reading the patch correctly, (some) rejected rows are actually
sent back to the client. This is a wire protocol break of the first
magnitude, and can NOT be accepted. At least not without some provisions
for not doing it with a client that isn't prepared for it. I also am
fairly worried about the possibilities for deadlock (ie, both ends stuck
waiting for the other one to absorb data) if the return traffic volume is
high enough.
3. I don't think enough thought has been put into the reporting, either.
+ ereport(WARNING,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("skipping \"%s\" --- extra data after last expected column",
+ cstate->line_buf.data)));
That's not going to be terribly helpful if the input line runs to many
megabytes. Or even if no individual line is very long, but you get
millions of such warnings. It's pretty much at variance with our
message style guidelines (among other things, those caution you to keep
the primary error message short); and it's randomly different from
COPY's existing practice, which is to show the faulty line as CONTEXT.
Plus it seems plenty weird that some errors are reported this way while
others are reported by sending back the bad tuple (with, it looks like,
no mention of what the specific problem is ... what if you have a lot of
unique indexes?).
BTW, while I don't know much about the ON CONFLICT (speculative
insertion) infrastructure, I wonder how well it really works to
not specify an arbiter index. I see that you're getting away with
it in a trivial test case that has exactly one index, but that's
not stressing the point very hard.
On the whole, I feel like adding this sort of functionality to COPY
itself is a dead end. COPY is meant for fast bulk transfer and not
much else; trying to load more functionality onto it can only end
in serving multiple masters poorly. What we normally recommend
if you have data that needs to be cleaned is to import it into a
permissively-defined staging table (eg, with all columns declared
as text) and then transfer cleaned data to your tables-of-record.
Looking at this patch in terms of whether the functionality is
available in that approach, it seems like you might want two parts
of it:
1. A COPY option to be flexible about the number of columns in the
input, say by filling omitted columns with NULLs.
2. INSERT ... ON CONFLICT can be used to transfer data to permanent
tables with rejection of duplicate keys, but it doesn't have much
flexibility about just what to do with duplicates. Maybe we could
add more ON CONFLICT actions? Sending conflicted rows to some other
table, or updating the source table to show which rows were copied
and which not, might be useful things to think about.
regards, tom lane
From: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "asaba(dot)takanori(at)fujitsu(dot)com" <asaba(dot)takanori(at)fujitsu(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Nowocien <anowocien(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2020-03-30 12:46:13 |
Message-ID: | CALAY4q_5y3TZYkC5nCYJr1Lufbb3kykd6KAedYyW3J3rJ_XsAA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, Mar 27, 2020 at 3:27 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Surafel Temesgen <surafel3000(at)gmail(dot)com> writes:
> > [ conflict-handling-copy-from-v16.patch ]
>
> I took a quick look at this patch, since it was marked "ready for
> committer", but I don't see how it can possibly be considered committable.
>
> 1. Covering only the errors that are thrown in DoCopy itself doesn't
> seem to me to pass the smell test. Yes, I'm sure there's some set of
> use-cases for which that'd be helpful, but I think most people would
> expect a "skip errors" option to be able to handle cases like malformed
> numbers or bad encoding. I understand the implementation reasons that
> make it impractical to cover other errors, but do we really want a
> feature that most people will see as much less than half-baked? I fear
> it'll be an embarrassment.
>
> I did small research and most major database management system didn't
claim
they handle every problem in loading file at least in every usage scenario.
> 2. If I'm reading the patch correctly, (some) rejected rows are actually
> sent back to the client. This is a wire protocol break of the first
> magnitude, and can NOT be accepted. At least not without some provisions
> for not doing it with a client that isn't prepared for it. I also am
> fairly worried about the possibilities for deadlock (ie, both ends stuck
> waiting for the other one to absorb data) if the return traffic volume is
> high enough.
>
>
if my understanding is correct copy_in occur in sub protocol inside simple
or
extended query protocol that know and handle the responds
> 3. I don't think enough thought has been put into the reporting, either.
>
> + ereport(WARNING,
> + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> + errmsg("skipping \"%s\" --- extra data after
> last expected column",
> + cstate->line_buf.data)));
>
> That's not going to be terribly helpful if the input line runs to many
> megabytes. Or even if no individual line is very long, but you get
> millions of such warnings. It's pretty much at variance with our
> message style guidelines (among other things, those caution you to keep
> the primary error message short); and it's randomly different from
> COPY's existing practice, which is to show the faulty line as CONTEXT.
> Plus it seems plenty weird that some errors are reported this way while
> others are reported by sending back the bad tuple (with, it looks like,
> no mention of what the specific problem is ... what if you have a lot of
> unique indexes?).
>
> Currently we can’t get problem description in speculative insertion
infrastructure and am afraid adding problem description to return tuple
will make the data less usable without further processing.Warning raised
for error that happen before tuple contracted. Other option is to skip those
record silently but reporting to user give user the chance to correct it.
> BTW, while I don't know much about the ON CONFLICT (speculative
> insertion) infrastructure, I wonder how well it really works to
> not specify an arbiter index. I see that you're getting away with
> it in a trivial test case that has exactly one index, but that's
> not stressing the point very hard.
>
> If arbiter index is not specified that means all indexes as the comment in
ExecCheckIndexConstraints stated
/* ----------------------------------------------------------------
* ExecCheckIndexConstraints
*
* This routine checks if a tuple violates any unique or
* exclusion constraints. Returns true if there is no conflict.
* Otherwise returns false, and the TID of the conflicting
* tuple is returned in *conflictTid.
*
* If 'arbiterIndexes' is given, only those indexes are checked.
* NIL means all indexes.
regards
Surafel
From: | David Steele <david(at)pgmasters(dot)net> |
---|---|
To: | Surafel Temesgen <surafel3000(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "asaba(dot)takanori(at)fujitsu(dot)com" <asaba(dot)takanori(at)fujitsu(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Nowocien <anowocien(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2020-04-08 15:16:57 |
Message-ID: | d2222495-a7da-d81a-9058-9f5cea6d763d@pgmasters.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
> On Fri, Mar 27, 2020 at 3:27 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us
> <mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us>> wrote:
>
> I took a quick look at this patch, since it was marked "ready for
> committer", but I don't see how it can possibly be considered
> committable.
Based on Tom's review I've marked this patch Returned with Feedback.
If there's an acceptable implementation, it seems that it would be very
different from what we have here.
Regards,
--
-David
david(at)pgmasters(dot)net
From: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Surafel Temesgen <surafel3000(at)gmail(dot)com>, "asaba(dot)takanori(at)fujitsu(dot)com" <asaba(dot)takanori(at)fujitsu(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Nowocien <anowocien(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Conflict handling for COPY FROM |
Date: | 2020-04-08 16:41:52 |
Message-ID: | CAKFQuway9O6kzLCTDgO4fS6pFyk75a1HqtvTfzKfC9diJ+473g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg토토 사이트 추천SQL |
On Fri, Mar 27, 2020 at 3:27 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Surafel Temesgen <surafel3000(at)gmail(dot)com> writes:
> > [ conflict-handling-copy-from-v16.patch ]
>
> I took a quick look at this patch, since it was marked "ready for
> committer", but I don't see how it can possibly be considered committable.
>
[...]
>
> Looking at this patch in terms of whether the functionality is
> available in that approach, it seems like you might want two parts
> of it:
>
> 1. A COPY option to be flexible about the number of columns in the
> input, say by filling omitted columns with NULLs.
>
> 2. INSERT ... ON CONFLICT can be used to transfer data to permanent
> tables with rejection of duplicate keys, but it doesn't have much
> flexibility about just what to do with duplicates. Maybe we could
> add more ON CONFLICT actions? Sending conflicted rows to some other
> table, or updating the source table to show which rows were copied
> and which not, might be useful things to think about.
>
tl/dr: patch 1: change the option to something like "processing_mode = row
{default = file}" and relay existing errors across all rows in the table in
the error detail message.
tl/dr patch 2: add an "IGNORE_CONFLICTS = {-1, 0 default, 1+}" option that
converts the errors that appear for speculative insertions into warnings
(in the error detail message) and either aborts the copy (cleanly, no
inserted rows) if the count exceeds the user specified value) or commits if
it is able (i.e., no errors in the error message detail - which would come
from other problems besides conflicts). I don't really get why a number is
needed here though - its not like ON CONFLICT DO NOTHING has that ability
and all this seems to be wanting to do is enable a subset of ON CONFLICT DO
NOTHING for COPY - in which case no warning or error would appear in the
first place.
Sorry for the rambling below, a decent chuck of the material is stream of
consciousness but it all generally relates to the behaviors that this patch
is touching.
My desired take-away from this would be to have COPY's error response
include one line for each input line that fails to be inserted with the
line number and the underlying error message passed along verbatim.
Importing, fixing one error, trying again, fixing another error, repeat is
a first level goal for me. Including at most the first 30 characters of
the problematic line would facilitate the common case where the first field
is an identifier which is more useful that a pure line number. But
including the entire line doesn't seem worth the risk. Though it should be
considered whether to include the error detail of the underlying error
message - probably as a subsequent line.
After that, consider having a facility for telling COPY that you are OK if
it ignores the problem rows and inserts the ones it is able - and
converting the final exit code to be a warning instead of an error
(whatever that means, if anything, in this context).
The speculative insertion stuff is outside my experience but are we trying
to just make it work, give the COPY user control of whether its used or
not, have an independent facility just for copy, or something else?
Teaching copy to use speculative insertion infrastructure that already
exists is a patch in its own right and seems to be fixing a current POLA
violation - this other stuff about reporting and ignoring errors is besides
the point. The basic inter-operability fix seems like it could be made to
work under today's copy error handling and report framework just fine.
If there is value in someone augmenting the error handling and response
framework to work in a tabular format instead of a free-form error message
text body that too could be presented and discussed as its own commit.
Granted it may be the case that such a verbose error message from COPY as
described above is undesirable but would be palatable in some other
presentation format. I'm inclined to believe, however, that even if COPY
is made an exception to the general error message rules (and the detail
information would be part of the errdetail, not the main message, it could
just include a summary count) that the added functionality would make the
exception acceptable.
To be honest I haven't dived into the assumptions baked into the regression
tests to know whether the tabular stuff that was discussed and that is
shown in the regression expected outputs is normal or not. I assume the
part in question is:
+COPY x from stdin WITH(ERROR_LIMIT 5);
+WARNING: skipping "70001 22 32" --- missing data for column "d"
+WARNING: skipping "70002 23 33 43 53 54" --- extra data after last
expected column
+WARNING: skipping "70003 24 34 44" --- missing data for column "e"
+
+ a | b | c | d | e
+-------+----+----+----+----------------------
+ 70005 | 27 | 37 | 47 | before trigger fired
+(1 row)
Right now COPY would just fail. What would be nice is for it to say:
ERROR: Line 1? missing data for column "d"
ERROR: Line 2? --- extra data after last expected column
ERROR: Line 3? --- missing data for column "e"
Making process go as far along as possible, to get better (more specific)
messages from deeper in the system, but still just saying "ERROR: Line #
failed"
Then,
WARNING: ...
WARNING: ...
WARNING: ...
COPY 1
But I'm not even sure why the tabular representation is showing up in that
stanza...
From the OP:
"If you assume that users re-execute COPY FROM with the output lines as
input, these columns are obstacles."
I really don't care right now to make that assumption nor is facilitating
it necessarily a goal. While this may be the current motivation there are
other parts that are useful in their own right whether or not we proceed
toward this final outcome. As Tom said its unclear whether its realistic
to make this work directly and there is no documentation that describes how
this patch would support that use case. I haven't delved that deeply into
this potential aspect of the patch but it is something worth of its own
patch on a thread with an appropriate title.
Given the current thread's title (conflict handling - not clear on whether
"make it work" or "response alterations when it doesn't" is the goal) all
this should be doing is making it so errors don't happen when all rows
would be inserted correctly if existing speculative insertion routines were
being used for the exact same data coming from a table and moved via INSERT
(ON CONFLICT DO NOTHING). That implies that any issues in populating said
equivalent table are out of scope for this patch - as is changing the error
handling mechanics. I don't know if, with those pieces cut out, whether
the core of the patch with speculative insertions is ready to commit.
Looking only at the patch it seems as if speculative insertion is only
attempted if we are willing to ignore errors. Thus ignoring errors is an
indirect way of saying that you want to bypass the bulk insertion
performance trade-offs and once you do that you might as well check for
conflicts and let them resolve. It is desirable, though, to be able to
turn off those performance trade-offs simply in order to attempt to insert
every single row and be informed of all the errors in the input in one
pass. Future flags like ERROR_LIMIT would then simply imply that state.
With the two tl/dr patches in place features like making actions besides
"DO NOTHING" (which is what "ignore" basically does but you can choose a
maximum count to accept a do nothing outcome) can be considered. Also,
adding an option to "IGNORE_PARSING_ERRORS" or IGNORE_STRUCTURE_ERRORS" (or
creating an enum and a multi-valued option...). I don't have a problem
with limited scope here but the naming should be chosen to match. You name
something "ERRORS" and it should cover everything. In addition, being more
precise, besides aiding development scope, gives the user the ability to
choose which kinds of errors that are willing to ignore and which they
really want to be fatal.
David J.