Lists: | Postg토토SQL : Postg토토SQL 메일 링리스트 : 2022-09-25 이후 PGSQL-BUGS 20:34 |
---|
From: | PG Bug reporting form <noreply(at)postgresql(dot)org> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Cc: | thomas(dot)d(dot)mckay(at)gmail(dot)com |
Subject: | BUG #17607: Server process crashes when PLpgSQL function raises error in subtransaction |
Date: | 2022-09-05 11:09:27 |
Message-ID: | 17607-bd8ccc81226f7f80@postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 17607
Logged by: Thomas Mc Kay
Email address: thomas(dot)d(dot)mckay(at)gmail(dot)com
PostgreSQL version: 13.8
Operating system: Debian in docker
Description:
Hello,
I've encountered a bug while doing some SQL migrations on a PostgreSQL 13.8
server. My initial use-case is a Django app running migrations, a test setup
and a test, but that code was more than 10K lines of DDL, so I've reduced
the code to a minimal reproducible example (as best I could).
I also tested the same script in 14.0 and 14.5 and in both, the bug is
fixed. I haven't seen anything related to it in the changelog for 14.0, so I
don't know whether the bug was fixed purposefully and forgotten in the
changelog or if it was fixed by accident, maybe as a corollary of another
bugfix/feature. I also don't know if this is something the devs would want
to fix in the 13.x line. Either way, I figured it might make a good
regression test case for 14.x.
Environment (Docker):
- PostgreSQL 13.8 (Debian 13.8-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled
by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
Test code:
```
-- Migration 1
CREATE TABLE "tbl" ("id" serial NOT NULL PRIMARY KEY);
-- Migration 2
ALTER TABLE "tbl" ADD COLUMN "value" int DEFAULT 0 NOT NULL;
CREATE FUNCTION on_tbl_parent_id_change_fn() RETURNS TRIGGER AS
$on_tbl_parent_id_change$
BEGIN
RAISE EXCEPTION 'BOOM !';
END
$on_tbl_parent_id_change$ LANGUAGE PLpgSQL VOLATILE;
CREATE TRIGGER on_tbl_parent_id_update
AFTER UPDATE ON tbl
REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
FOR EACH STATEMENT
EXECUTE PROCEDURE on_tbl_parent_id_change_fn();
-- Test setup
INSERT INTO "tbl" ("value") VALUES (2);
-- Test
BEGIN;
SAVEPOINT svp;
UPDATE tbl SET value=1; -- Fails here
ROLLBACK TO SAVEPOINT svp;
ROLLBACK;
```
Output (verbose):
```
-- CREATE TABLE
-- ALTER TABLE
-- CREATE FUNCTION
-- CREATE TRIGGER
-- INSERT 0 1
-- BEGIN
-- SAVEPOINT
-- WARNING: 01000: AbortSubTransaction while in ABORT state
-- LOCATION: AbortSubTransaction, xact.c:4999
-- WARNING: 01000: AbortSubTransaction while in ABORT state
-- LOCATION: AbortSubTransaction, xact.c:4999
-- WARNING: 01000: AbortSubTransaction while in ABORT state
-- LOCATION: AbortSubTransaction, xact.c:4999
-- ERROR: P0001: BOOM !
-- CONTEXT: PL/pgSQL function on_tbl_parent_id_change_fn() line 3 at
RAISE
-- LOCATION: exec_stmt_raise, pl_exec.c:3854
-- ERROR: XX000: tupdesc reference 0x7fc02873fbc8 is not owned by resource
owner SubTransaction
-- LOCATION: ResourceOwnerForgetTupleDesc, resowner.c:1188
-- ERROR: XX000: tupdesc reference 0x7fc02873fbc8 is not owned by resource
owner SubTransaction
-- LOCATION: ResourceOwnerForgetTupleDesc, resowner.c:1188
-- ERROR: XX000: tupdesc reference 0x7fc02873fbc8 is not owned by resource
owner SubTransaction
-- LOCATION: ResourceOwnerForgetTupleDesc, resowner.c:1188
-- ERROR: XX000: tupdesc reference 0x7fc02873fbc8 is not owned by resource
owner SubTransaction
-- LOCATION: ResourceOwnerForgetTupleDesc, resowner.c:1188
-- PANIC: XX000: ERRORDATA_STACK_SIZE exceeded
-- LOCATION: errstart, elog.c:359
-- server closed the connection unexpectedly
-- This probably means the server terminated abnormally
-- before or while processing the request.
-- The connection to the server was lost. Attempting reset: Failed.
-- You are currently not connected to a database.
-- You are currently not connected to a database.
```
Expected output:
```
CREATE TABLE
ALTER TABLE
CREATE FUNCTION
CREATE TRIGGER
INSERT 0 1
BEGIN
SAVEPOINT
ERROR: BOOM !
CONTEXT: PL/pgSQL function on_tbl_parent_id_change_fn() line 3 at RAISE
ROLLBACK
ROLLBACK
```
The bug disappears if any of the following is true:
- value is added to the table definition and not with ALTER TABLE.
- value is declared without a default.
- `REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows` is removed from
the trigger.
- a `VACUUM FULL tbl` is done between the trigger creation and the insert.
I don't know what to make of it, though.
Cheers,
Thomas
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | thomas(dot)d(dot)mckay(at)gmail(dot)com |
Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17607: Server process crashes when PLpgSQL function raises error in subtransaction |
Date: | 2022-09-05 17:16:19 |
Message-ID: | 2452250.1662398179@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg스포츠 토토SQL : Postg스포츠 토토SQL 메일 링리스트 : 2022-09-05 이후 PGSQL-BUGS 17:16 |
PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> I've encountered a bug while doing some SQL migrations on a PostgreSQL 13.8
> server. My initial use-case is a Django app running migrations, a test setup
> and a test, but that code was more than 10K lines of DDL, so I've reduced
> the code to a minimal reproducible example (as best I could).
Thanks for the minimal reproducer! I confirm that this problem is visible
in the v12 and v13 branches, but not before or after. On the master
branch, it was introduced at
Author: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Branch: master Release: REL_14_BR [25936fd46] 2021-02-27 18:09:15 -0300
Fix use-after-free bug with AfterTriggersTableData.storeslot
and repaired, seemingly accidentally, by
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Branch: master Release: REL_14_BR [c5b7ba4e6] 2021-04-06 15:57:11 -0400
Postpone some stuff out of ExecInitModifyTable.
25936fd46 was back-patched into v12 and v13, which is why they show
the bug. c5b7ba4e6 was not back-patched, both because it'd be way too
invasive and because we weren't aware that it was fixing anything.
I'm not really sure that it *did* fix anything --- perhaps there are
variants of this example that still fail?
cc'ing Alvaro for comment.
regards, tom lane
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | thomas(dot)d(dot)mckay(at)gmail(dot)com |
Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17607: Server process crashes when PLpgSQL function raises error in subtransaction |
Date: | 2022-09-25 18:24:07 |
Message-ID: | 3065565.1664130247@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
I wrote:
> PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
>> I've encountered a bug while doing some SQL migrations on a PostgreSQL 13.8
>> server. My initial use-case is a Django app running migrations, a test setup
>> and a test, but that code was more than 10K lines of DDL, so I've reduced
>> the code to a minimal reproducible example (as best I could).
> Thanks for the minimal reproducer! I confirm that this problem is visible
> in the v12 and v13 branches, but not before or after.
I got a chance to look at this today, and what I find is that we're making
a tuple slot that will be freed at end of subtransaction, but it has a
refcount on a refcounted tuple descriptor that belongs to the Portal's
resowner. While trying to free the slot, we get a complaint because the
subtransaction's resowner isn't holding the appropriate tupdesc reference.
It seems to me there are probably other hazards here, because the tupdesc
could possibly also go away before the slot does. I think what we ought
to do is copy the tupdesc, so that we have a non-refcounted descriptor
that we know has exactly the right lifespan. As attached.
Not sure about whether to bother with a test case. The submitted
reproducer doesn't work in >= v14, and even in v12/v13 it seems awfully
corner-case-ish. (I find for instance that you need the ALTER TABLE
step rather than just making the table in one step, and I'm a tad
baffled as to why.) In any case, I've got zero faith that there are
not other ways to trigger the same problem, so I think we need to apply
this up to HEAD.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
fix-trigger-tupdesc-refcount-issue.patch | text/x-diff | 968 bytes |
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Cc: | thomas(dot)d(dot)mckay(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17607: Server process crashes when PLpgSQL function raises error in subtransaction |
Date: | 2022-09-25 19:01:15 |
Message-ID: | 3092360.1664132475@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg토토 캔SQL : Postg토토 캔SQL 메일 링리스트 : 2022-09-25 이후 PGSQL-BUGS 19:01 |
I wrote:
> It seems to me there are probably other hazards here, because the tupdesc
> could possibly also go away before the slot does. I think what we ought
> to do is copy the tupdesc, so that we have a non-refcounted descriptor
> that we know has exactly the right lifespan. As attached.
Actually, after looking around some more, I realize that there is a
second mistake in 25936fd46: it ignored the comment on
AfterTriggerFreeQuery that
* Note: it's important for this to be safe if interrupted by an error
* and then called again for the same query level.
This is the reason why we end up in a recursive error and PANIC:
we keep trying to free the tupdesc again after the previous error.
If I fix that but omit the CreateTupleDescCopy step, then the
reproducer behaves much more sanely:
psql:bug17607.sql:29: WARNING: AbortSubTransaction while in ABORT state
psql:bug17607.sql:29: ERROR: BOOM !
CONTEXT: PL/pgSQL function on_tbl_parent_id_change_fn() line 3 at RAISE
ERROR: tupdesc reference 0x7fdef236a1b8 is not owned by resource owner SubTransaction
ROLLBACK
ROLLBACK
So what we actually need here is more like the attached.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
fix-trigger-tupdesc-refcount-issue-2.patch | text/x-diff | 1.3 KB |
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Cc: | thomas(dot)d(dot)mckay(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17607: Server process crashes when PLpgSQL function raises error in subtransaction |
Date: | 2022-09-25 20:34:05 |
Message-ID: | 3136668.1664138045@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg토토SQL : Postg토토SQL 메일 링리스트 : 2022-09-25 이후 PGSQL-BUGS 20:34 |
... and after yet more poking at it, I see why c5b7ba4e6 masked
the problem: it added an optimization such that we don't use the
storeslot at all unless a tuple mapping conversion is required.
That led me to the test case shown in the attached, which exhibits
this symptom in all branches since v12.
I had been planning to let this wait until after 15rc1 wrap,
but now that I have evidence that the bug is still live in v15,
I'm going to go ahead and push. Both components of the fix
seem like extremely safe changes, even for the day before wrap.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
fix-trigger-tupdesc-refcount-issue-3.patch | text/x-diff | 7.7 KB |