From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: "Wood, Dan" <hexpert(at)amazon(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VM map freeze corruption
Date: 2018-04-18 10:49:12
Message-ID: CABOikdOecOFE--y0i1wO0CONr54VmyyoP2Er35CVrPYOCN8hZw@mail.gmail.com
Lists: pgsql-hackers
On Wed, Apr 18, 2018 at 7:37 AM, Wood, Dan <hexpert(at)amazon(dot)com> wrote:
>
>
> My analysis is that heap_prepare_freeze_tuple() -> FreezeMultiXactId()
> returns FRM_NOOP if the MultiXact-locking transactions haven't committed.
> This results in changed=false and totally_frozen=true (as initialized).
> When this returns to lazy_scan_heap(), no rows are added to the frozen[]
> array, yet tuple_totally_frozen is true. This means the page is marked
> frozen in the VM, even though the MultiXact on the row was left untouched.
>
> A fix to heap_prepare_freeze_tuple() that seems to do the trick is:
>
>         else
>         {
>             Assert(flags & FRM_NOOP);
> +           totally_frozen = false;
>         }
>
That's a great find! This can definitely lead to various problems and could
be one of the reasons behind the issue reported here [1]. For example, if
we extend the script slightly at the end, we can reproduce the same error
reported in the bug report:
sleep 4; # Wait for share locks to be released

# See if another vacuum freeze advances relminmxid beyond xmax present in the
# heap
echo "vacuum (verbose, freeze) t;" | $p
echo "select pg_check_frozen('t');" | $p

# See if a vacuum freeze scanning all pages corrects the problem
echo "vacuum (verbose, freeze, disable_page_skipping) t;" | $p
echo "select pg_check_frozen('t');" | $p
Thanks,
Pavan
[1]
/message-id/CAGewt-ujGpMLQ09gXcUFMZaZsGJC98VXHEFbF-tpPB0fB13K%2BA%40mail.gmail.com
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services