Lists: | pgsql-bugs |
---|
From: | Christoph Berg <christoph(dot)berg(at)credativ(dot)de> |
---|---|
To: | PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de> |
Subject: | Bus error in pg_logical_slot_get_changes (9.4.7, sparc) |
Date: | 2016-04-13 09:41:17 |
Message-ID: | 20160413094117.GC21485@msg.credativ.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Hi,
I've just uploaded 9.4.7 for Debian's wheezy-backports suite.
Unfortunately it's now crashing twice in the contrib/test_decoding
regression tests on sparc. I could reproduce the error on the porter
box:
LOG: server process (PID 22534) was terminated by signal 10: Bus error
DETAIL: Failed process was running: SELECT data
FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')
WHERE data ~ 'UPDATE';
LOG: server process (PID 22558) was terminated by signal 10: Bus error
DETAIL: Failed process was running: SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
Reading symbols from /home/myon/postgresql-9.4-9.4.7/build/src/backend/postgres...done.
[New LWP 22558]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/sparc-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: myon regression [local] SELECT '.
Program terminated with signal 10, Bus error.
#0 ReorderBufferRestoreChange (data=0x5de0f7 "", txn=0x622660, rb=0x6225d8)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/replication/logical/reorderbuffer.c:2327
2327 Size tuplelen = ((HeapTuple) data)->t_len;
(gdb) l
2322 data += tuplelen;
2323 }
2324
2325 if (change->data.tp.newtuple)
2326 {
2327 Size tuplelen = ((HeapTuple) data)->t_len;
2328
2329 change->data.tp.newtuple =
2330 ReorderBufferGetTupleBuf(rb, tuplelen - offsetof(HeapTupleHeaderData, t_bits));
2331
(gdb) bt full
#0 ReorderBufferRestoreChange (data=0x5de0f7 "", txn=0x622660, rb=0x6225d8)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/replication/logical/reorderbuffer.c:2327
tuplelen = <optimized out>
ondisk = <optimized out>
change = 0x11ea888
#1 ReorderBufferRestoreChanges (rb=0x6225d8, txn=0x622660, fd=0x79860910, segno=0x79860918)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/replication/logical/reorderbuffer.c:2269
ondisk = <optimized out>
restored = <optimized out>
last_segno = 1
cleanup_iter = {cur = <optimized out>, next = <optimized out>, end = 0x6226b0}
__func__ = "ReorderBufferRestoreChanges"
#2 0x00261790 in ReorderBufferIterTXNNext (state=0x798608f0, rb=0x6225d8)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/replication/logical/reorderbuffer.c:981
change = 0x7985f858
entry = 0x79860900
off = 0
#3 ReorderBufferCommit (rb=0x6225d8, xid=<optimized out>, commit_lsn=27808752, end_lsn=<optimized out>,
commit_time=513854161898478)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/replication/logical/reorderbuffer.c:1355
change = 0x7985f858
save_exception_stack = 0xffbc6538
save_context_stack = 0x0
local_sigjmp_buf = {{__jmpbuf = {-1279391982, -1279392038, 1277457654}, __mask_was_saved = 0,
__saved_mask = {__val = {0, 4290536232, 2500668, 1296, 6195616, 4290536240, 3926116, 4290536336, 1,
5595136, 5598208, 0, 0, 1, 26206208, 0, 0, 0, 0, 0, 0, 1296, 1295, 1, 15213480, 0, 1, 1296, 0, 0,
1296, 0}}}}
txn = 0x622660
snapshot_now = 0x5ddfa0
command_id = 1
iterstate = 0x798608f0
__func__ = "ReorderBufferCommit"
#4 0x0025c9bc in DecodeXactOp (buf=0xffbc6460, ctx=0x6145c0)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/replication/logical/decode.c:226
xlrec = <optimized out>
subxacts = <optimized out>
invals = <optimized out>
builder = <optimized out>
reorder = <optimized out>
info = <optimized out>
r = 0xffbc6470
#5 LogicalDecodingProcessRecord (ctx=0x6145c0, record=0x6179a0)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/replication/logical/decode.c:111
buf = {origptr = 27808752, endptr = 27809568, record = {xl_tot_len = 812, xl_xid = 1296, xl_len = 780,
xl_info = 0 '\000', xl_rmid = 1 '\001', xl_prev = 27808688, xl_crc = 1979174325},
record_data = 0x6179c0 ""}
__func__ = "LogicalDecodingProcessRecord"
#6 0x0025e69c in pg_logical_slot_get_changes_guts (fcinfo=0xffbc6694, confirm=<optimized out>,
binary=<optimized out>)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/replication/logical/logicalfuncs.c:440
record = <optimized out>
errm = 0x0
save_exception_stack = 0xffbc6b28
save_context_stack = 0x0
local_sigjmp_buf = {{__jmpbuf = {-1279393398, -1279393542, 1277657190}, __mask_was_saved = 0,
__saved_mask = {__val = {4290536824, 1264536, 1314, 5446656, 5447680, 0, 5447680, 0, 0, 0, 0, 1, 25, 0,
0, 0, 0, 1, 3782, 1, 4290537052, 1, 0, 4294967295, 1, 0, 4290536944, 2309772, 0, 0, 0, 4290537540}}}}
name = 0x0
upto_lsn = 0
upto_nchanges = 0
rsinfo = <optimized out>
per_query_ctx = <optimized out>
oldcontext = 0x5e8320
end_of_wal = 28052656
startptr = 0
ctx = 0x6145c0
old_resowner = 0x5e7df8
arr = <optimized out>
ndim = <optimized out>
options = 0x60dfd0
p = 0x60e598
__func__ = "pg_logical_slot_get_changes_guts"
#7 0x001ab830 in ExecMakeTableFunctionResult (funcexpr=0x60ccd8, econtext=0x60c6a0, argContext=<optimized out>,
expectedDesc=0x60d548, randomAccess=0 '\000')
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/executor/execQual.c:2196
result = 4290537288
tupstore = 0x0
tupdesc = 0x0
funcrettype = 2249
returnsTuple = <optimized out>
returnsSet = 1 '\001'
fcinfo = {flinfo = 0x60cce8, context = 0x0, resultinfo = 0xffbc6640, fncollation = 100, isnull = 0 '\000',
nargs = 4, arg = {5960656, 0, 0, 5959720, 2, 5734184, 31808, 15213480, 4290537200, 2212260, 5732864,
5595136, 0, 0, 1072693248, 0, 5959856, 5598208, 0, 5959856, 5961824, 2, 700, 1884286976, 5732864,
5732032, 5733440, 0, 4294836227, 0, 0, 15213480, 184, 131328, 33554432, 0, 5638448, 0, 25, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 15213480, 0, 0, 0, 0, 0, 0, 0, 0, 4294836227, 0, 0, 3314144, 184, 131328, 33554432, 0,
5638448, 0, 25, 0, 0, 0, 0, 15213480, 0 <repeats 26 times>},
argnull = "\000\001\001\000\000\000\000\000\000\000\000\000\000\062\221\340\000\000\000\270\000\002\001\000\002\000\000\000\000\000\000\000\000V\t0\000\000\000\000\000\000\000\031", '\000' <repeats 55 times>}
fcusage = {fs = 0x0, save_f_total_time = {tv_sec = 5891064, tv_usec = 1}, save_total = {tv_sec = 4039680,
tv_usec = 1}, f_start = {tv_sec = 0, tv_usec = 1072693248}}
rsinfo = {type = T_ReturnSetInfo, econtext = 0x60c6a0, expectedDesc = 0x60d548, allowedModes = 11,
returnMode = SFRM_Materialize, isDone = ExprSingleResult, setResult = 0x60e068, setDesc = 0x60e768}
tmptup = {t_len = 0, t_self = {ip_blkid = {bi_hi = 61, bi_lo = 47104}, ip_posid = 89}, t_tableOid = 0,
t_data = 0x79820c1c}
callerContext = 0x5e8298
direct_function_call = <optimized out>
first_time = 1 '\001'
__func__ = "ExecMakeTableFunctionResult"
#8 0x001bdf90 in FunctionNext (node=0x60c618)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/executor/nodeFunctionscan.c:94
tstore = 0x0
estate = <optimized out>
direction = ForwardScanDirection
scanslot = 0x60c780
alldone = <optimized out>
oldpos = <optimized out>
funcno = <optimized out>
att = <optimized out>
#9 0x001ad7f0 in ExecScanFetch (recheckMtd=0x1bdca0 <FunctionRecheck>, accessMtd=0x1bdcc0 <FunctionNext>,
node=0x60c618) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/executor/execScan.c:82
estate = <optimized out>
#10 ExecScan (node=0x60c618, accessMtd=0x1bdcc0 <FunctionNext>, recheckMtd=0x1bdca0 <FunctionRecheck>)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/executor/execScan.c:167
slot = 0x60c618
econtext = 0x60c6a0
qual = 0x0
projInfo = 0x60daa8
isDone = 5732760
resultSlot = <optimized out>
#11 0x001a6790 in ExecProcNode (node=0x60c618)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/executor/execProcnode.c:426
result = <optimized out>
__func__ = "ExecProcNode"
#12 0x001a35dc in ExecutePlan (dest=0x578840, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>,
operation=CMD_SELECT, planstate=0x60c618, estate=0x60c590)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/executor/execMain.c:1490
slot = <optimized out>
current_tuple_count = 0
#13 standard_ExecutorRun (queryDesc=0x5face0, direction=<optimized out>, count=0)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/executor/execMain.c:319
estate = 0x60c590
operation = CMD_SELECT
dest = 0x578840
sendTuples = <optimized out>
oldcontext = 0x5e8ad8
#14 0x002ab0d4 in PortalRunSelect (portal=0x5ebb60, forward=1 '\001', count=0, dest=0x578840)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/tcop/pquery.c:942
queryDesc = 0x5face0
direction = ForwardScanDirection
nprocessed = <optimized out>
__func__ = "PortalRunSelect"
#15 0x002ac45c in PortalRun (portal=0x5ebb60, count=2147483647, isTopLevel=1 '\001', dest=0x578840,
altdest=0x578840, completionTag=0xffbc6d20 "")
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/tcop/pquery.c:786
save_exception_stack = 0xffbc6c64
save_context_stack = 0x0
local_sigjmp_buf = {{__jmpbuf = {-1279389822, -1279390062, 1277732494}, __mask_was_saved = 0,
__saved_mask = {__val = {4290538424, 5639264, 16777218, 0, 2, 1314, 0, 1276, 5595424, 5598208, 5444608,
5595136, 5381120, 161, 5640592, 0, 5732840, 0, 0, 0, 0, 5732824, 4290538424, 2786944, 102, 6192632,
2, 1, 0, 1, 1024, 0}}}}
result = <optimized out>
nprocessed = <optimized out>
saveTopTransactionResourceOwner = 0x560c60
saveTopTransactionContext = 0x560bd8
saveActivePortal = 0x0
saveResourceOwner = 0x560c60
savePortalContext = 0x0
saveMemoryContext = 0x560bd8
__func__ = "PortalRun"
#16 0x002a8558 in exec_simple_query (
query_string=0x5ae0d0 "SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');")
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/tcop/postgres.c:1072
parsetree = 0x5aef78
portal = 0x5ebb60
snapshot_set = <optimized out>
commandTag = <optimized out>
completionTag = "\000OPY 203\000LE\000\000CE", '\000' <repeats 32 times>, "\024\000\000\000\024\377\377\377\377\000S\024\204\000S\023\204"
plantree_list = 0x578828
receiver = 0x578840
format = 0
dest = DestRemote
parsetree_list = 0x5af010
save_log_statement_stats = 0 '\000'
was_logged = 0 '\000'
msec_str = "\000\274m\210\000\005\365D\000\000\000\000\000W\335\f0\n\000\034\000\000\000 \000\000\000\000\000\000\000\002"
parsetree_item = 0x5af000
isTopLevel = <optimized out>
#17 PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=0x55fb00 "regression", username=<optimized out>)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/tcop/postgres.c:4079
query_string = 0x5ae0d0 "SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');"
firstchar = 5959544
input_message = {
data = 0x5ae0d0 "SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');", len = 139, maxlen = 1024, cursor = 139}
local_sigjmp_buf = {{__jmpbuf = {-1279390062, -1279391582, 1277696902}, __mask_was_saved = 1,
__saved_mask = {__val = {0, 0, 0, 0, 1882843284, 1879202304, 1881990312, 0, 1, 0, 0, 0, 1881277836, 4,
0, 0, 0, 0, 0, 0, 1886277632, 1886289096, 1886289096, 5758488, 0, 0, 0, 4290538792, 1885242328,
4290538768, 2359384, 0}}}}
send_ready_for_query = 0 '\000'
__func__ = "PostgresMain"
#18 0x0005f724 in BackendRun (port=0x57dc80)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/postmaster/postmaster.c:4285
ac = 1
secs = 513854165
usecs = 348281
i = 1
av = 0x55fc70
maxac = <optimized out>
#19 BackendStartup (port=0x57dc80)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/postmaster/postmaster.c:3948
bn = 0x10b3
pid = 0
#20 ServerLoop () at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/postmaster/postmaster.c:1679
port = 0x57dc80
rmask = {fds_bits = {16, 0 <repeats 31 times>}}
selres = <optimized out>
now = <optimized out>
readmask = {fds_bits = {16, 0 <repeats 31 times>}}
nSockets = 5
last_lockfile_recheck_time = 1460538957
last_touch_time = 1460538957
__func__ = "ServerLoop"
#21 0x0024c04c in PostmasterMain (argc=8, argv=<optimized out>)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/postmaster/postmaster.c:1287
opt = <optimized out>
status = <optimized out>
userDoption = <optimized out>
listen_addr_saved = <optimized out>
i = <optimized out>
output_config_variable = <optimized out>
__func__ = "PostmasterMain"
#22 0x00060750 in main (argc=8, argv=0x55ef80)
at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/main/main.c:233
No locals.
Original build log (but doesn't have much useful content):
https://buildd.debian.org/status/fetch.php?pkg=postgresql-9.4&arch=sparc&ver=9.4.7-0%2Bdeb8u1~bpo70%2B1&stamp=1460482241
Mit freundlichen Grüßen,
Christoph Berg
--
Senior Berater, Tel.: +49 (0)21 61 / 46 43-187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970 87C6 4C5A 6BAB 12D2 A7AE
From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Christoph Berg <christoph(dot)berg(at)credativ(dot)de>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Bus error in pg_logical_slot_get_changes (9.4.7, sparc) |
Date: | 2016-04-13 16:19:06 |
Message-ID: | 20160413161906.aykti6iwdc7zv7l6@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Hi,
On 2016-04-13 11:41:17 +0200, Christoph Berg wrote:
> I've just uploaded 9.4.7 for Debian's wheezy-backports suite.
> Unfortunately it's now crashing twice in the contrib/test_decoding
> regression tests on sparc. I could reproduce the error on the porter
> box:
>
> LOG: server process (PID 22534) was terminated by signal 10: Bus error
> DETAIL: Failed process was running: SELECT data
> FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')
> WHERE data ~ 'UPDATE';
>
> LOG: server process (PID 22558) was terminated by signal 10: Bus error
> DETAIL: Failed process was running: SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
>
>
> Reading symbols from /home/myon/postgresql-9.4-9.4.7/build/src/backend/postgres...done.
> [New LWP 22558]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/sparc-linux-gnu/libthread_db.so.1".
> Core was generated by `postgres: myon regression [local] SELECT '.
> Program terminated with signal 10, Bus error.
> #0 ReorderBufferRestoreChange (data=0x5de0f7 "", txn=0x622660, rb=0x6225d8)
> at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/replication/logical/reorderbuffer.c:2327
> 2327 Size tuplelen = ((HeapTuple) data)->t_len;
> (gdb) l
> 2322 data += tuplelen;
> 2323 }
> 2324
> 2325 if (change->data.tp.newtuple)
> 2326 {
> 2327 Size tuplelen = ((HeapTuple) data)->t_len;
Hrmpf, I guess this is an alignment issue. We should probably replace
that ->t_len lookup with a memcpy. IIRC it's not entirely trivial to
access porter boxes anymore - could you perhaps just replace those two
t_len lookups with a memcpy (i.e. memcpy(&tuplelen, &(HeapTuple)
data)->t_len, sizeof(uint32))?
Greetings,
Andres Freund
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Christoph Berg <christoph(dot)berg(at)credativ(dot)de>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Bus error in pg_logical_slot_get_changes (9.4.7, sparc) |
Date: | 2016-04-13 17:54:10 |
Message-ID: | 13165.1460570050@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2016-04-13 11:41:17 +0200, Christoph Berg wrote:
>> I've just uploaded 9.4.7 for Debian's wheezy-backports suite.
>> Unfortunately it's now crashing twice in the contrib/test_decoding
>> regression tests on sparc. I could reproduce the error on the porter
>> box:
> Hrmpf, I guess this is an alignment issue.
Maybe, but then why didn't the buildfarm detect it? We certainly
have several critters that would be sticky about this.
regards, tom lane
From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Christoph Berg <christoph(dot)berg(at)credativ(dot)de>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Bus error in pg_logical_slot_get_changes (9.4.7, sparc) |
Date: | 2016-04-13 17:58:27 |
Message-ID: | 20160413175827.dmlbtdf7c3mgmnex@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
On 2016-04-13 13:54:10 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2016-04-13 11:41:17 +0200, Christoph Berg wrote:
> >> I've just uploaded 9.4.7 for Debian's wheezy-backports suite.
> >> Unfortunately it's now crashing twice in the contrib/test_decoding
> >> regression tests on sparc. I could reproduce the error on the porter
> >> box:
>
> > Hrmpf, I guess this is an alignment issue.
>
> Maybe, but then why didn't the buildfarm detect it? We certainly
> have several critters that would be sticky about this.
I'm wondering the same. My theory is that it might only actually matter
when built optimized; which few animals do. The instruction selection
definitely can influence whether an alignment issue becomes aparent or
not. Unless that idea pans out I guess I'll have to go through the
motions to get access to that porter box. I had that at some point in
the past (debugging something turning out to be a compiler bug).
Andres
From: | Christoph Berg <christoph(dot)berg(at)credativ(dot)de> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Bus error in pg_logical_slot_get_changes (9.4.7, sparc) |
Date: | 2016-04-13 21:11:52 |
Message-ID: | 20160413211152.GA21711@msg.credativ.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Re: Andres Freund 2016-04-13 <20160413161906(dot)aykti6iwdc7zv7l6(at)alap3(dot)anarazel(dot)de>
> > 2327 Size tuplelen = ((HeapTuple) data)->t_len;
>
> Hrmpf, I guess this is an alignment issue. We should probably replace
> that ->t_len lookup with a memcpy. IIRC it's not entirely trivial to
> access porter boxes anymore - could you perhaps just replace those two
> t_len lookups with a memcpy (i.e. memcpy(&tuplelen, &(HeapTuple)
> data)->t_len, sizeof(uint32))?
That fixed the contrib/test_decoding checks, thanks.
Will do another full build over night to double check.
Mit freundlichen Grüßen,
Christoph Berg
--
Senior Berater, Tel.: +49 (0)21 61 / 46 43-187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970 87C6 4C5A 6BAB 12D2 A7AE
Attachment | Content-Type | Size |
---|---|---|
pglogical-alignment | text/plain | 870 bytes |
From: | Christoph Berg <myon(at)debian(dot)org> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Bus error in pg_logical_slot_get_changes (9.4.7, sparc) |
Date: | 2016-04-14 09:29:20 |
Message-ID: | 20160414092919.GA11804@msg.df7cb.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Re: Andres Freund 2016-04-13 <20160413175827(dot)dmlbtdf7c3mgmnex(at)alap3(dot)anarazel(dot)de>
> Unless that idea pans out I guess I'll have to go through the
> motions to get access to that porter box. I had that at some point in
> the past (debugging something turning out to be a compiler bug).
Your account is still there, but these guest accounts are always only
unlocked for a limited time, so it's not functional now but could be
re-enabled easily. But ...
Re: To Andres Freund 2016-04-13 <20160413211152(dot)GA21711(at)msg(dot)credativ(dot)de>
> That fixed the contrib/test_decoding checks, thanks.
> Will do another full build over night to double check.
... the patch worked indeed, thanks!
Also, the sparc architecture will be officially dead at the end of the
month because wheezy is EOL and wheezy-lts + wheezy-backports will be
amd64+i386 only. (We'll still have sparc64, but it's not a release
architecture. "sparc" is not even on debian-ports.org...)
Christoph
From: | Christoph Berg <myon(at)debian(dot)org> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Bus error in pg_logical_slot_get_changes (9.4.7, sparc) |
Date: | 2016-04-14 11:25:18 |
Message-ID: | 20160414112518.GE11804@msg.df7cb.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Re: To Andres Freund 2016-04-14 <20160414092919(dot)GA11804(at)msg(dot)df7cb(dot)de>
> > That fixed the contrib/test_decoding checks, thanks.
> > Will do another full build over night to double check.
>
> ... the patch worked indeed, thanks!
Duh, the patch does work on sparc, but it breaks amd64:
*** /home/cbe/projects/postgresql/debian/9.4/build/../contrib/test_decoding/expected/ddl.out 2016-03-28 22:09:55.000000000 +0200
--- /home/cbe/projects/postgresql/debian/9.4/build/contrib/test_decoding/./regression_output/results/ddl.out 2016-04-14 13:12:45.310632488 +0200
***************
*** 189,201 ****
FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')
GROUP BY substring(data, 1, 24)
ORDER BY 1,2;
! count | min | max
! -------+-------------------------------------------------+------------------------------------------------------------------------
! 1 | BEGIN | BEGIN
! 1 | COMMIT | COMMIT
! 20467 | table public.tr_etoomuch: DELETE: id[integer]:1 | table public.tr_etoomuch: UPDATE: id[integer]:9999 data[integer]:-9999
! (3 rows)
!
-- check updates of primary keys work correctly
BEGIN;
CREATE TABLE spoolme AS SELECT g.i FROM generate_series(1, 5000) g(i);
--- 189,195 ----
FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')
GROUP BY substring(data, 1, 24)
ORDER BY 1,2;
! ERROR: invalid memory alloc request size 94596654694480
-- check updates of primary keys work correctly
BEGIN;
CREATE TABLE spoolme AS SELECT g.i FROM generate_series(1, 5000) g(i);
Christoph
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Christoph Berg <myon(at)debian(dot)org> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Bus error in pg_logical_slot_get_changes (9.4.7, sparc) |
Date: | 2016-04-14 17:11:43 |
Message-ID: | 423.1460653903@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Christoph Berg <myon(at)debian(dot)org> writes:
>> ... the patch worked indeed, thanks!
> Duh, the patch does work on sparc, but it breaks amd64:
Well, yeah, because if Size is wider than int32 then that memcpy isn't
enough to initialize it.
But looking around a bit, I think that ReorderBufferRestoreChange is
almost completely broken on alignment-picky machines, if the expectation
is that the input data could be aligned arbitrarily. Aside from this
particular problem, it contains multiple places where we cast "data"
to something other than char*, and that is sufficient to allow a
spec-compliant compiler to decide that "data" is aligned on more than
a char boundary and hence generate code that depends on such alignment.
I also noticed that the INTERNAL_SNAPSHOT case doesn't bother to advance
"data" past the data read, where all the other switch cases do. In the
attached, I made it do likewise, but it looks like you could equally
easily just drop the final update of "data" in each switch case,
because nothing is looking at it after the switch.
Proposed patch attached, but I'm still wondering why the alignment-picky
buildfarm members aren't all failing on this. It seems to strongly
suggest that the regression tests' coverage is lacking here.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
alignment-fixup.patch | text/x-diff | 3.6 KB |
From: | Christoph Berg <myon(at)debian(dot)org> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Bus error in pg_logical_slot_get_changes (9.4.7, sparc) |
Date: | 2016-04-14 18:36:02 |
Message-ID: | 20160414183602.GA22486@msg.df7cb.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Re: Tom Lane 2016-04-14 <423(dot)1460653903(at)sss(dot)pgh(dot)pa(dot)us>
> Proposed patch attached, but I'm still wondering why the alignment-picky
> buildfarm members aren't all failing on this. It seems to strongly
> suggest that the regression tests' coverage is lacking here.
This patch works both on sparc and amd64. Thanks!
(Tested on 9.4.7 only.)
Christoph
From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Bus error in pg_logical_slot_get_changes (9.4.7, sparc) |
Date: | 2016-04-14 18:55:52 |
Message-ID: | 20160414185552.szg5zzwf4q7afz2q@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Hi,
On 2016-04-14 13:11:43 -0400, Tom Lane wrote:
> Well, yeah, because if Size is wider than int32 then that memcpy isn't
> enough to initialize it.
Yes, I wasn't suggesting this is a proper patch, just wanted to know
whether this actually is the issue.
> But looking around a bit, I think that ReorderBufferRestoreChange is
> almost completely broken on alignment-picky machines, if the expectation
> is that the input data could be aligned arbitrarily.
'data' cannot be aligned arbitrarily. It's palloc aligned, and the
buffer is used separately for everything is loaded from disk. I.e. the
data pointer passed to ReorderBufferRestoreChange is aligned to a
MAXALIGN boundary. The alignment issues come into play for data *after*
the ReorderBufferDiskChange.
> Aside from this particular problem, it contains multiple places where
> we cast "data" to something other than char*, and that is sufficient
> to allow a spec-compliant compiler to decide that "data" is aligned on
> more than a char boundary and hence generate code that depends on such
> alignment.
Afaics all cases but the ->newtuple branch showing the errors ought to
be ok because of the above.
> I also noticed that the INTERNAL_SNAPSHOT case doesn't bother to advance
> "data" past the data read, where all the other switch cases do. In the
> attached, I made it do likewise, but it looks like you could equally
> easily just drop the final update of "data" in each switch case,
> because nothing is looking at it after the switch.
Yea, I wasn't sure about keeping that.
> Proposed patch attached, but I'm still wondering why the alignment-picky
> buildfarm members aren't all failing on this. It seems to strongly
> suggest that the regression tests' coverage is lacking here.
Well, then Christoph's build wouldn't have caught the issue either?
Coverage isn't perfect, but it's not bad:
http://coverage.postgresql.org/src/backend/replication/logical/reorderbuffer.c.gcov.html
Greetings,
Andres Freund
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Christoph Berg <myon(at)debian(dot)org> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Bus error in pg_logical_slot_get_changes (9.4.7, sparc) |
Date: | 2016-04-14 19:14:37 |
Message-ID: | 15040.1460661277@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Christoph Berg <myon(at)debian(dot)org> writes:
> Re: Tom Lane 2016-04-14 <423(dot)1460653903(at)sss(dot)pgh(dot)pa(dot)us>
>> Proposed patch attached, but I'm still wondering why the alignment-picky
>> buildfarm members aren't all failing on this. It seems to strongly
>> suggest that the regression tests' coverage is lacking here.
> This patch works both on sparc and amd64. Thanks!
> (Tested on 9.4.7 only.)
I've been looking around and testing some more. I noticed that the only
caller of ReorderBufferRestoreChange always passes a maxaligned buffer,
so most of the changes I suggested are unnecessary. AFAICT, it's only
the second fetch of t_len that is at any risk. Even there, at least in
HEAD, I cannot construct a test case in which the first tuple's t_len is
not a multiple of 4. It's hard to tell what is causing that, because
surely heap_form_tuple guarantees no such thing. I have a suspicion that
something in the impenetrable thicket that passes for prefix/suffix
optimization in log_heap_update is forcing the WAL-logged tuple length to
be rounded off to sizeof(int) (*not* MAXALIGN).
It might be that 9.4 acts differently here, but the lack of crashes in our
buildfarm suggests otherwise. The best theory I can come up with is that
somehow, your Sparc compiler is deciding that "data" is 8-aligned at this
point and generating a code sequence that depends on that, even though
it's only being asked to fetch a uint32. If the machine were a 64-bit
machine, the compiler could legitimately make such a deduction because of
the presence of a pointer field in HeapTupleData; but I dunno how it gets
to that conclusion if the architecture is 32-bit. Anyway, there are few
enough architectures where it could make sense to use an 8-aligned
instruction to fetch a 4-byte value that it's not so surprising we've not
seen it crash elsewhere.
So I now think my previous patch is overkill, and we should instead use
something like the attached.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
alignment-fix-2.patch | text/x-diff | 1.9 KB |
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Bus error in pg_logical_slot_get_changes (9.4.7, sparc) |
Date: | 2016-04-14 19:18:36 |
Message-ID: | 15176.1460661516@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2016-04-14 13:11:43 -0400, Tom Lane wrote:
>> But looking around a bit, I think that ReorderBufferRestoreChange is
>> almost completely broken on alignment-picky machines, if the expectation
>> is that the input data could be aligned arbitrarily.
> 'data' cannot be aligned arbitrarily. It's palloc aligned, and the
> buffer is used separately for everything is loaded from disk. I.e. the
> data pointer passed to ReorderBufferRestoreChange is aligned to a
> MAXALIGN boundary.
Right, I figured that out awhile later; it would have been better if it
was documented. See my second patch attempt.
regards, tom lane
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Bus error in pg_logical_slot_get_changes (9.4.7, sparc) |
Date: | 2016-04-14 19:24:40 |
Message-ID: | 15413.1460661880@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2016-04-14 13:11:43 -0400, Tom Lane wrote:
>> Proposed patch attached, but I'm still wondering why the alignment-picky
>> buildfarm members aren't all failing on this. It seems to strongly
>> suggest that the regression tests' coverage is lacking here.
> Well, then Christoph's build wouldn't have caught the issue either?
> Coverage isn't perfect, but it's not bad:
> http://coverage.postgresql.org/src/backend/replication/logical/reorderbuffer.c.gcov.html
BTW, so far as that goes, the question is whether we ever come through
here with two tuples of which the first has a t_len that's not aligned
at least on a 4-byte boundary. gcov can't tell you that.
I put in an Assert and satisfied myself that the current tests do *not*
ever create such a case. I tried modifying the test by adding updates
that set a trailing text field to different-length strings, and still
couldn't make it happen. So something is rounding off the tuple length,
though only to sizeof(int) not to maxalign, which is very curious.
regards, tom lane
From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Bus error in pg_logical_slot_get_changes (9.4.7, sparc) |
Date: | 2016-04-14 22:22:42 |
Message-ID: | 20160414222242.ec4m74aqsh4eshde@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
On 2016-04-14 15:14:37 -0400, Tom Lane wrote:
> I've been looking around and testing some more. I noticed that the only
> caller of ReorderBufferRestoreChange always passes a maxaligned buffer,
> so most of the changes I suggested are unnecessary. AFAICT, it's only
> the second fetch of t_len that is at any risk.
I think so too. And I agree that a comment about this would have been
rather helpful...
> Even there, at least in
> HEAD, I cannot construct a test case in which the first tuple's t_len is
> not a multiple of 4.
That's really kinda weird. It's a tuple from WAL, and the length's from
the record; the same way wal replay gets it. Wonder if this might not
indicate a relevant bug somewhere...
> I have a suspicion that
> something in the impenetrable thicket that passes for prefix/suffix
> optimization in log_heap_update is forcing the WAL-logged tuple length to
> be rounded off to sizeof(int) (*not* MAXALIGN).
I don't think so, because the prefix optimization is essentially
disabled for logical...
> So I now think my previous patch is overkill, and we should instead use
> something like the attached.
Looks good to me. Thanks!
- Andres
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Bus error in pg_logical_slot_get_changes (9.4.7, sparc) |
Date: | 2016-04-14 23:20:12 |
Message-ID: | 31502.1460676012@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2016-04-14 15:14:37 -0400, Tom Lane wrote:
>> Even there, at least in
>> HEAD, I cannot construct a test case in which the first tuple's t_len is
>> not a multiple of 4.
> That's really kinda weird. It's a tuple from WAL, and the length's from
> the record; the same way wal replay gets it. Wonder if this might not
> indicate a relevant bug somewhere...
Got it: it's a testing deficiency. What I was not aware of is that this
code path (with two tuples to be restored) seems to only be entered for
updates that change the primary key. And guess what, there are only a
couple of instances of that in the test_decoding tests, and all of them
have old tuples that are a multiple of 4 long. I'd been modifying the
tests in a way that didn't produce any new exercise of the target code
path at all :-(
I have a test addition now that produces a bus error on gaur, and probably
on every alignment-picky platform:
diff --git a/contrib/test_decoding/sql/ddl.sql b/contrib/test_decoding/sql/ddl.sql
index b1f7bf6..e99b2d3 100644
*** a/contrib/test_decoding/sql/ddl.sql
--- b/contrib/test_decoding/sql/ddl.sql
*************** INSERT INTO tr_etoomuch(data) SELECT g.i
*** 116,121 ****
--- 116,123 ----
SELECT 'tx logical msg' FROM pg_logical_emit_message(true, 'test', 'tx logical msg');
DELETE FROM tr_etoomuch WHERE id < 5000;
UPDATE tr_etoomuch SET data = - data WHERE id > 5000;
+ CREATE TABLE tr_oddlength (id text primary key, data text);
+ INSERT INTO tr_oddlength VALUES('ab', 'foo');
COMMIT;
/* display results, but hide most of the output */
*************** ORDER BY 1,2;
*** 128,133 ****
--- 130,137 ----
BEGIN;
CREATE TABLE spoolme AS SELECT g.i FROM generate_series(1, 5000) g(i);
UPDATE tr_etoomuch SET id = -id WHERE id = 5000;
+ UPDATE tr_oddlength SET id = 'x', data = 'quux';
+ UPDATE tr_oddlength SET id = 'yy', data = 'a';
DELETE FROM spoolme;
DROP TABLE spoolme;
COMMIT;
Will commit this along with the fix.
Meanwhile, there's still the question of why Christoph's sparc machine got
a bus error on a case where other picky platforms don't. But I'm prepared
to write that off as some code-generation vagary.
regards, tom lane