Quick Links

BUG #16184: Segmentation Fault during update

Lists:	pgsql-bugs

From:	PG Bug reporting form <noreply(at)postgresql(dot)org>
To:	pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc:	piotrwlodarczyk89(at)gmail(dot)com
Subject:	BUG #16184: Segmentation Fault during update
Date:	2020-01-02 15:16:45
Message-ID:	16184-2f718844be10c863@postgresql.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

The following bug has been logged on the website:

Bug reference: 16184
Logged by: Piotr Wlodarczyk
Email address: piotrwlodarczyk89(at)gmail(dot)com
PostgreSQL version: 12.1
Operating system: Linux dev-g1-db-synchro.gnb.adm 3.10.0-1062.7.1.el
Description:

Hello geeks,

We're preparing to upgrade postgresql from 11.5 to 12.1 but still we have
some bugs and issues. Most of them are our fault, but now we have problem we
don't know how to play with it.

Log:
2020-01-02 15:42:26 CET [12729]: [95-1] user=,db=,xid=0,state=00000 LOG:
server process (PID 28879) was terminated by signal 11: Segmentation fault
2020-01-02 15:42:26 CET [12729]: [96-1] user=,db=,xid=0,state=00000 DETAIL:
Failed process was running: UPDATE job_abstract a
SET job_status = job_status
WHERE (a.xmax :: TEXT :: BIGINT < (SELECT

txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT

txid_snapshot_xmin(

txid_current_snapshot())

<= (SELECT

txid_snapshot_xmax(

txid_current_snapshot()))))
AND job_status = 0 AND NOT xmax = 0;

Please don't ask me why this update is made in this way - I suppose to set
xmax to 0 but why? I don't know. Its legacy code, being part of core of our
application.

Reproducting this bug is not easy and I don't know which conditions has to
be met to replicate this bug. But during application is working and we made
this update, segmentation fault occurs. But sometimes it works.

Column job_status is telling application that this row should be preceded
one more time. Up to 11.5 this bug doesn't exists.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	piotrwlodarczyk89(at)gmail(dot)com
Cc:	pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #16184: Segmentation Fault during update
Date:	2020-01-02 15:33:15
Message-ID:	17312.1577979195@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg토토 핫SQL : Postg토토 핫SQL 메일 링리스트 : 2020-01-02 이후 PGSQL-BUGS 15:33

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> Log:
> 2020-01-02 15:42:26 CET [12729]: [95-1] user=,db=,xid=0,state=00000 LOG:
> server process (PID 28879) was terminated by signal 11: Segmentation fault
> 2020-01-02 15:42:26 CET [12729]: [96-1] user=,db=,xid=0,state=00000 DETAIL:
> Failed process was running: UPDATE job_abstract a
> SET job_status = job_status
> WHERE (a.xmax :: TEXT :: BIGINT < (SELECT
> txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT

Clearly a bug, but there's not much we can do with this amount of
information. If you can provide a stack trace, that'd be helpful.

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

> Reproducting this bug is not easy and I don't know which conditions has to
> be met to replicate this bug. But during application is working and we made
> this update, segmentation fault occurs. But sometimes it works.

My gut instinct is that this might be a concurrent-update problem,
which would explain the lack of reproducibility. Try to get
multiple sessions to update the same table row(s) at the same time.

Also, we had some bugs of that ilk on 12.0, please make sure you
really are updated to 12.1.

regards, tom lane

From:	Piotr Włodarczyk <piotrwlodarczyk89(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #16184: Segmentation Fault during update
Date:	2020-01-31 12:59:11
Message-ID:	CAP-dhMo+cYuDtRvcwq_MziauaJrJw-_8VJpoy_MG1eH548Ym5g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

Hi,

sorry for the late reply, but it was not so easy to get more logs. So what
we have now is listed below:

(gdb) bt
#0 0x0000000000616912 in EvalPlanQualBegin (epqstate=epqstate(at)entry=0x2ae9520)
at execMain.c:2735
#1 0x0000000000616cfb in EvalPlanQual (epqstate=epqstate(at)entry=0x2ae9520,
relation=relation(at)entry=0x7f694c76f838, rti=14,
inputslot=inputslot(at)entry=0x31bc408)
at execMain.c:2454
#2 0x0000000000638bdf in ExecUpdate (mtstate=mtstate(at)entry=0x2ae9428,
tupleid=0x7ffc94e3cf40, oldtuple=0x0, slot=<optimized out>,
planSlot=0x312fc28, epqstate=epqstate(at)entry=0x2ae9520,
estate=estate(at)entry=0x2aff450,
canSetTag=true) at nodeModifyTable.c:1387
#3 0x000000000063914d in ExecModifyTable (pstate=0x2ae9428) at
nodeModifyTable.c:2223
#4 0x0000000000614302 in ExecProcNode (node=0x2ae9428) at
../../../src/include/executor/executor.h:239
#5 ExecutePlan (execute_once=<optimized out>, dest=0x2f82d88,
direction=<optimized out>, numberTuples=0, sendTuples=false,
operation=CMD_UPDATE, use_parallel_mode=<optimized out>,
planstate=0x2ae9428, estate=0x2aff450) at execMain.c:1646
#6 standard_ExecutorRun (queryDesc=0x2c7c460, direction=<optimized out>,
count=0, execute_once=<optimized out>) at execMain.c:364
#7 0x000000000076365a in ProcessQuery (plan=<optimized out>,
sourceText=0x2a9d7a0 " UPDATE job_abstract a\n SET job_status =
job_status\n WHERE (a.xmax :: TEXT :: BIGINT < (SELECT\n", ' ' <repeats 39
times>, "txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT\n
"..., params=0x0, queryEnv=0x0,
dest=0x2f82d88, completionTag=0x7ffc94e3d250 "") at pquery.c:161
#8 0x0000000000763882 in PortalRunMulti (portal=portal(at)entry=0x2be30a0,
isTopLevel=isTopLevel(at)entry=true, setHoldSnapshot=setHoldSnapshot(at)entry=false,
dest=dest(at)entry=0x2f82d88, altdest=altdest(at)entry=0x2f82d88,
completionTag=completionTag(at)entry=0x7ffc94e3d250 "") at pquery.c:1283
#9 0x00000000007643b5 in PortalRun (portal=portal(at)entry=0x2be30a0,
count=count(at)entry=9223372036854775807, isTopLevel=isTopLevel(at)entry=true,
run_once=run_once(at)entry=true, dest=dest(at)entry=0x2f82d88,
altdest=altdest(at)entry=0x2f82d88,
completionTag=completionTag(at)entry=0x7ffc94e3d250
"") at pquery.c:796
#10 0x00000000007604b5 in exec_simple_query (query_string=0x2a9d7a0 "
UPDATE job_abstract a\n SET job_status = job_status\n WHERE (a.xmax ::
TEXT :: BIGINT < (SELECT\n", ' ' <repeats 39 times>,
"txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT\n "...) at
postgres.c:1215
#11 0x0000000000761782 in PostgresMain (argc=<optimized out>,
argv=argv(at)entry=0x2ac50a8, dbname=0x2ac4f88 "synchro_test",
username=<optimized out>) at postgres.c:4236
#12 0x0000000000483d47 in BackendRun (port=<optimized out>, port=<optimized
out>) at postmaster.c:4437
#13 BackendStartup (port=0x2abd4c0) at postmaster.c:4128
#14 ServerLoop () at postmaster.c:1704
#15 0x00000000006f106f in PostmasterMain (argc=argc(at)entry=7,
argv=argv(at)entry=0x2a98330)
at postmaster.c:1377
#16 0x0000000000484cd3 in main (argc=7, argv=0x2a98330) at main.c:228
(gdb)

Is it enought?

On Thu, Jan 2, 2020 at 4:33 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> > Log:
> > 2020-01-02 15:42:26 CET [12729]: [95-1] user=,db=,xid=0,state=00000 LOG:
> > server process (PID 28879) was terminated by signal 11: Segmentation
> fault
> > 2020-01-02 15:42:26 CET [12729]: [96-1] user=,db=,xid=0,state=00000
> DETAIL:
> > Failed process was running: UPDATE job_abstract a
> > SET job_status = job_status
> > WHERE (a.xmax :: TEXT :: BIGINT < (SELECT
> > txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT
>
> Clearly a bug, but there's not much we can do with this amount of
> information. If you can provide a stack trace, that'd be helpful.
>
>
> https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend
>
> > Reproducting this bug is not easy and I don't know which conditions has
> to
> > be met to replicate this bug. But during application is working and we
> made
> > this update, segmentation fault occurs. But sometimes it works.
>
> My gut instinct is that this might be a concurrent-update problem,
> which would explain the lack of reproducibility. Try to get
> multiple sessions to update the same table row(s) at the same time.
>
> Also, we had some bugs of that ilk on 12.0, please make sure you
> really are updated to 12.1.
>
> regards, tom lane
>

Pozdrawiam
Piotr Włodarczyk

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Piotr Włodarczyk <piotrwlodarczyk89(at)gmail(dot)com>
Cc:	pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #16184: Segmentation Fault during update
Date:	2020-01-31 17:41:18
Message-ID:	11317.1580492478@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

=?utf-8?Q?Piotr_W=C5=82odarczyk?= <piotrwlodarczyk89(at)gmail(dot)com> writes:
> sorry for the late reply, but it was not so easy to get more logs. So what
> we have now is listed below:

> (gdb) bt
> #0 0x0000000000616912 in EvalPlanQualBegin (epqstate=epqstate(at)entry=0x2ae9520)
> at execMain.c:2735
> #1 0x0000000000616cfb in EvalPlanQual (epqstate=epqstate(at)entry=0x2ae9520,
> relation=relation(at)entry=0x7f694c76f838, rti=14,
> inputslot=inputslot(at)entry=0x31bc408)
> at execMain.c:2454
> #2 0x0000000000638bdf in ExecUpdate (mtstate=mtstate(at)entry=0x2ae9428,
> tupleid=0x7ffc94e3cf40, oldtuple=0x0, slot=<optimized out>,
> planSlot=0x312fc28, epqstate=epqstate(at)entry=0x2ae9520,
> estate=estate(at)entry=0x2aff450,
> canSetTag=true) at nodeModifyTable.c:1387
> #3 0x000000000063914d in ExecModifyTable (pstate=0x2ae9428) at
> nodeModifyTable.c:2223

OK, so that confirms my suspicion that it was an EvalPlanQual issue
(hence, you need concurrent updates on the same row to trigger it).

I believe this might be the same bug we isolated and fixed just a
couple of days ago:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=87fed2a197abc1397b63ee74b3fa7eb20471fff5

The stack trace looks the same as what Oleksii showed in that thread
for a non-assert build. As far as I know, that bug only triggers if
a single ModifyTable plan node has to run EPQ checks for more than
one target table --- so, is this query updating a partitioned table
or inheritance tree, and is it plausible that it had to update rows
in more than one sub-table, and that some other transaction(s) were
concurrently updating those same rows?

regards, tom lane

From:	Piotr Włodarczyk <piotrwlodarczyk89(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #16184: Segmentation Fault during update
Date:	2020-01-31 22:25:44
Message-ID:	CAP-dhMpT8g4gRwoOBmx6V21up_Q7wvXa3RoSVX=nTD9XQtgKpQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

Hi,

I don't know what is EvalPlanQual but conditions you wrote about are met.

Table which is updated is main table and it name is job_abstract. The're a
lot of another tables for example job_bca, job_mod, job_smth which extends
job_abstract.

In my case situation is as follows: in the same time it's possible that
some different transactions are in progress (started but not
committed yet). The're some legacy (looks like some kind of guard) that
update under some conditions table job_abstract. In another transaction(s)
table job_bca (which extends job_abstract - inheritance) is updated too.
This two (or more) transactions are updating the same rows, but different
set of fields. I think that author of this update tried to force change
xmax of job_abstract making dumb update job_status = 0 where ... job_status
= 0. But everything works fine to postgres 11.5. When we upgreade to 12.1
segmentation fault occurs. But i saw in commit you linked to description:
"Back-patch to v12 where the bug was introduced".

On Fri, Jan 31, 2020 at 6:41 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> =?utf-8?Q?Piotr_W=C5=82odarczyk?= <piotrwlodarczyk89(at)gmail(dot)com> writes:
> > sorry for the late reply, but it was not so easy to get more logs. So
> what
> > we have now is listed below:
>
> > (gdb) bt
> > #0 0x0000000000616912 in EvalPlanQualBegin (epqstate=epqstate(at)entry
> =0x2ae9520)
> > at execMain.c:2735
> > #1 0x0000000000616cfb in EvalPlanQual (epqstate=epqstate(at)entry
> =0x2ae9520,
> > relation=relation(at)entry=0x7f694c76f838, rti=14,
> > inputslot=inputslot(at)entry=0x31bc408)
> > at execMain.c:2454
> > #2 0x0000000000638bdf in ExecUpdate (mtstate=mtstate(at)entry=0x2ae9428,
> > tupleid=0x7ffc94e3cf40, oldtuple=0x0, slot=<optimized out>,
> > planSlot=0x312fc28, epqstate=epqstate(at)entry=0x2ae9520,
> > estate=estate(at)entry=0x2aff450,
> > canSetTag=true) at nodeModifyTable.c:1387
> > #3 0x000000000063914d in ExecModifyTable (pstate=0x2ae9428) at
> > nodeModifyTable.c:2223
>
> OK, so that confirms my suspicion that it was an EvalPlanQual issue
> (hence, you need concurrent updates on the same row to trigger it).
>
> I believe this might be the same bug we isolated and fixed just a
> couple of days ago:
>
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=87fed2a197abc1397b63ee74b3fa7eb20471fff5
>
> The stack trace looks the same as what Oleksii showed in that thread
> for a non-assert build. As far as I know, that bug only triggers if
> a single ModifyTable plan node has to run EPQ checks for more than
> one target table --- so, is this query updating a partitioned table
> or inheritance tree, and is it plausible that it had to update rows
> in more than one sub-table, and that some other transaction(s) were
> concurrently updating those same rows?
>
> regards, tom lane
>

Pozdrawiam
Piotr Włodarczyk

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Piotr Włodarczyk <piotrwlodarczyk89(at)gmail(dot)com>
Cc:	pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #16184: Segmentation Fault during update
Date:	2020-01-31 22:30:16
Message-ID:	23897.1580509816@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

=?utf-8?Q?Piotr_W=C5=82odarczyk?= <piotrwlodarczyk89(at)gmail(dot)com> writes:
> I don't know what is EvalPlanQual but conditions you wrote about are met.

OK, then I'm pretty confident that patch will fix your problem.
It'll be in 12.2 which is due out in two weeks.

regards, tom lane

From:	Piotr Włodarczyk <piotrwlodarczyk89(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #16184: Segmentation Fault during update
Date:	2020-01-31 22:32:17
Message-ID:	CAP-dhMq-ddfd8iywr1YirHz=aXMT=JCDeVUiFDexLMEw5z4cVw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	Postg롤 토토SQL : Postg롤 토토SQL 메일 링리스트 : 2020-01-31 이후 PGSQL-BUGS 22:32

Ok,

thanks.

On Fri, Jan 31, 2020 at 11:30 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> =?utf-8?Q?Piotr_W=C5=82odarczyk?= <piotrwlodarczyk89(at)gmail(dot)com> writes:
> > I don't know what is EvalPlanQual but conditions you wrote about are met.
>
> OK, then I'm pretty confident that patch will fix your problem.
> It'll be in 12.2 which is due out in two weeks.
>
> regards, tom lane
>

Pozdrawiam
Piotr Włodarczyk