Re: BUG #18503: Reproducible 'Segmentation fault' in 16.3 on ARM64

Lists: pgsql-bugs
From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: github(dot)stheine(at)heine7(dot)de
Subject: BUG #18503: Reproducible 'Segmentation fault' in 16.3 on ARM64
Date: 2024-06-11 09:59:39
Message-ID: 18503-6e0f5ab2f9c319c1@postgresql.org

The following bug has been logged on the website:

Bug reference: 18503
Logged by: Stefan Heine
Email address: github(dot)stheine(at)heine7(dot)de
PostgreSQL version: 16.3
Operating system: Ubuntu 24.04, Debian bookworm
Description:

This is a followup of
/message-id/flat/18471-4e01d7601cedf1b0%40postgresql.org
and maybe related to
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059476

The query described in
/message-id/flat/18471-4e01d7601cedf1b0%40postgresql.org
is causing a reproducible 'Segmentation fault'.
I have tried various versions of postgresql on different OS versions, trying
to find one that works fine, but this happens in 14.8, 14.12, 16.3 on Debian
bookworm.
It also happens in 16.3 on Ubuntu 24.04 when installing the standard
OS-provided version of postgresql.
I also tried installing the 16.3 on Ubuntu 24.04 from
https://wiki.postgresql.org/wiki/Apt, and it's still failing.

The issue is clearly related to JIT, since it only reproduces if JIT is
enabled and forced to kick in (jit_above_cost = 1, jit_inline_above_cost = 1,
jit_optimize_above_cost = 1). Disabling JIT makes the query run fine.
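
For anyone trying to reproduce this, the settings named above could be applied per session roughly as follows. This is only a sketch: the query itself lives in the linked thread and is not reproduced here, so `repro.sql` is a hypothetical file name, and the `psql` connection options are left to the reader.

```shell
# Emit session-level settings that force JIT compilation, inlining and
# optimization for even the cheapest plans (the GUC names are standard
# PostgreSQL settings; the value 1 is taken from the report above).
jit_settings() {
  cat <<'EOF'
SET jit = on;
SET jit_above_cost = 1;
SET jit_inline_above_cost = 1;
SET jit_optimize_above_cost = 1;
EOF
}

# Usage (assumption: repro.sql holds the failing query and psql can connect):
#   { jit_settings; cat repro.sql; } | psql
```

Replacing the first statement with `SET jit = off;` gives the comparison run in which the query is reported to complete normally.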

In https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059476 there was a
similar issue that pointed to LLVM 14, but the postgresql version from
https://wiki.postgresql.org/wiki/Apt mentions `libllvm17t64`, so this seems
to include a newer version and still aborts.

That situation is clearly reproducible, so we can help troubleshooting in
case you want to look into details.


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: github(dot)stheine(at)heine7(dot)de, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18503: Reproducible 'Segmentation fault' in 16.3 on ARM64
Date: 2024-06-12 00:59:08
Message-ID: CA+hUKGJNtO=wZ2X1nDpFDrF_J7Xo7LnFinaTh-qT2vmqoHSJKQ@mail.gmail.com
Lists: pgsql-bugs

On Tue, Jun 11, 2024 at 10:08 PM PG Bug reporting form
<noreply(at)postgresql(dot)org> wrote:
> The query described in
> /message-id/flat/18471-4e01d7601cedf1b0%40postgresql.org
> is causing a reproducible 'Segmentation fault'.

Also on ARM?

That other report said that memory is leaking and implies (?) the OOM
killer, but you're talking about a segmentation fault, so it seems
like a different symptom (even if the root cause is same), right?

> That situation is clearly reproducible, so we can help troubleshooting in
> case you want to look into details.

Can you get a core file, and gdb backtrace?


From: Stefan Heine <github(dot)stheine(at)heine7(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18503: Reproducible 'Segmentation fault' in 16.3 on ARM64
Date: 2024-06-12 13:47:44
Message-ID: e35fafdb-9b2c-4259-aae5-0e8312a2a8f6@heine7.de
Lists: pgsql-bugs

On 2024-06-12 02:59, Thomas Munro wrote:
> On Tue, Jun 11, 2024 at 10:08 PM PG Bug reporting form
> <noreply(at)postgresql(dot)org> wrote:
>> The query described in
>> /message-id/flat/18471-4e01d7601cedf1b0%40postgresql.org
>> is causing a reproducible 'Segmentation fault'.
> Also on ARM?

yes, this is ARM (aarch64).

> That other report said that memory is leaking and implies (?) the OOM
> killer, but you're talking about a segmentation fault, so it seems
> like a different symptom (even if the root cause is same), right?

when jit is disabled, the query runs ok in < 2 seconds.
when the issue happens, it takes > 30 seconds and then results in a 6GB
core.

>> That situation is clearly reproducible, so we can help troubleshooting in
>> case you want to look into details.
> Can you get a core file, and gdb backtrace?

find a core here:
https://my.hidrive.com/share/bz2zb2alkp

do you have instructions for the gdb backtrace?


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Stefan Heine <github(dot)stheine(at)heine7(dot)de>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18503: Reproducible 'Segmentation fault' in 16.3 on ARM64
Date: 2024-06-12 21:41:05
Message-ID: CA+hUKG+8a_NOUbNFuboqh6DLBb5m3AnmaqB4fg4w4SZHi3Hs4g@mail.gmail.com
Lists: pgsql-bugs

On Thu, Jun 13, 2024 at 1:47 AM Stefan Heine <github(dot)stheine(at)heine7(dot)de> wrote:
> do you have instructions for the gdb backtrace?

gdb /path/to/executable -c /path/to/core
... loads stuff ...
(gdb) bt
... prints out function call stack ...

It will probably just show some library names and addresses, but so
far we don't even know if this is crashing in LLVM or in PostgreSQL
code so that'd be a clue. Maybe function names would appear if you
set up DEBUGINFOD_URLS, depending on where you got your packages from:

https://wiki.debian.org/HowToGetABacktrace
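
The same steps can be run non-interactively, which makes the output easier to paste into a reply. A minimal sketch, assuming a Debian-based system: the debuginfod URL below is Debian's public server and is an assumption about where the packages came from, `core_backtrace` is a hypothetical helper name, and whether symbols are actually fetched depends on how gdb was built.

```shell
# Print the backtrace of a core file without entering the gdb prompt.
# $1: path to the executable, $2: path to the core file.
core_backtrace() {
  # Assumption: Debian's debuginfod server; honour DEBUGINFOD_URLS if the
  # caller has already set it. GDB can be overridden to point at another gdb.
  DEBUGINFOD_URLS="${DEBUGINFOD_URLS:-https://debuginfod.debian.net}" \
    ${GDB:-gdb} -batch -ex "bt" "$1" -c "$2"
}

# Usage:
#   core_backtrace /path/to/executable /path/to/core
```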

Hoping to find time to repro this later on a cloud host. If this is a
cloud host, can you tell me which cloud, instance type, memory size
etc? I had already been trying on some local ARM hardware with no
luck (same versions but different OS, so going to try making more
things match your case)...


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Stefan Heine <github(dot)stheine(at)heine7(dot)de>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18503: Reproducible 'Segmentation fault' in 16.3 on ARM64
Date: 2024-06-12 21:54:13
Message-ID: CA+hUKGJ-e2-UnxAQybK0xyLHvcpKfDX_gZ0wdLAVdpZ4n1W2Wg@mail.gmail.com
Lists: pgsql-bugs

On Thu, Jun 13, 2024 at 9:41 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> cloud host, can you tell me which cloud, instance type, memory size
> etc?

(I realise that the email from the other thread gives an AWS instance
type that I can try, but that report is about memory usage and yours
has a segfault so I'm curious to know what conditions are different
for you..)


From: Stefan Heine <github(dot)stheine(at)heine7(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18503: Reproducible 'Segmentation fault' in 16.3 on ARM64
Date: 2024-06-13 06:19:12
Message-ID: d3effbb4-236d-4e26-ace6-9df3c8f3af0f@heine7.de
Lists: pgsql-bugs

On 2024-06-12 23:41, Thomas Munro wrote:
> On Thu, Jun 13, 2024 at 1:47 AM Stefan Heine<github(dot)stheine(at)heine7(dot)de> wrote:
>> do you have instructions for the gdb backtrace?
> gdb /path/to/executable -c /path/to/core
> ... loads stuff ...
> (gdb) bt
> ... prints out function call stack ...
>
> It will probably just show some library names and addresses, but so
> far we don't even know if this is crashing in LLVM or in PostgreSQL
> code so that'd be a clue. Maybe function names would appear if you
> set up DEBUGINFOD_URLS, depending on where you got your packages from:
>
> https://wiki.debian.org/HowToGetABacktrace

# gdb /usr/lib/postgresql/16/bin/postgres -c core.19
GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
   <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/postgresql/16/bin/postgres...
(No debugging symbols found in /usr/lib/postgresql/16/bin/postgres)
warning: Can't open file /dev/shm/PostgreSQL.384567174 during
file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.2343312096 during
file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.1247406204 during
file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.50860586 during file-backed
mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.4136010652 during
file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.2304500154 during
file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.817475720 during
file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.526004662 during
file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.1223723046 during
file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.4190931822 during
file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.3836724180 during
file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.1707942452 during
file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.4107375064 during
file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.2885303254 during
file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.4136268764 during
file-backed mapping note processing
warning: Can't open file /dev/zero (deleted) during file-backed mapping
note processing
warning: Can't open file /dev/shm/PostgreSQL.3153232120 during
file-backed mapping note processing
warning: Can't open file /SYSV03e40001 (deleted) during file-backed
mapping note processing
[New LWP 19]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: sa postgres 164.99.242.100(57456)
EXPLAIN                           '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000fffe0fb635b8 in ?? ()
(gdb) bt
#0  0x0000fffe0fb635b8 in ?? ()
#1  0x0000aaaaefd84330 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) quit

> Hoping to find time to repro this later on a cloud host. If this is a
> cloud host, can you tell me which cloud, instance type, memory size
> etc? I had already been trying on some local ARM hardware with no
> luck (same versions but different OS, so going to try making more
> things match your case)...
> (I realise that the email from the other thread gives an AWS instance
> type that I can try, but that report is about memory usage and yours
> has a segfault so I'm curious to know what conditions are different
> for you..)

it's running on AWS, t4g.large, 8GB RAM. this server is running Ubuntu
22.04.3 LTS and hosting docker.
inside docker, there is a container running postgres, based on the
official postgres:16.3 (based on Debian Bookworm) from
https://hub.docker.com/_/postgres .


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Stefan Heine <github(dot)stheine(at)heine7(dot)de>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18503: Reproducible 'Segmentation fault' in 16.3 on ARM64
Date: 2024-08-27 02:47:29
Message-ID: CA+hUKGKUSWO7+FYKPd2W=Wuc6P5vizH7DAYJO2z=dDzwrD=8RQ@mail.gmail.com
Lists: pgsql-bugs

Hi,

FYI, Anthonin Bonnefoy has diagnosed an issue that could well be the
one you're seeing:

/message-id/flat/CAO6_Xqr63qj%3DSx7HY6ZiiQ6R_JbX%2B-p6sTPwDYwTWZjUmjsYBg%40mail.gmail.com