From: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
To: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, reshkekirill <reshkekirill(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Slow standby snapshot
Date: 2022-11-16 12:23:46
Message-ID: CANtu0oiPoSdQsjRd6Red5WMHi1E83d2+-bM9J6dtWR3c5Tap9g@mail.gmail.com
Lists: pgsql-hackers
Hello everyone.
> However ... I tried to reproduce the original complaint, and
> failed entirely. I do see KnownAssignedXidsGetAndSetXmin
> eating a bit of time in the standby backends, but it's under 1%
> and doesn't seem to be rising over time. Perhaps we've already
> applied some optimization that ameliorates the problem? But
> I tested v13 as well as HEAD, and got the same results.
> Hmm. I wonder if my inability to detect a problem is because the startup
> process does keep ahead of the workload on my machine, while it fails
> to do so on the OP's machine. I've only got a 16-CPU machine at hand,
> which probably limits the ability of the primary to saturate the standby's
> startup process.
Yes, the optimization by Andres Freund made things much better, but the
impact is still noticeable.
I was also using 16-CPU machines, but two of them: one for the primary and
one for the standby.
Here are the scripts I was using for the benchmark [1] - maybe they could help.
> Nowadays we've *got* those primitives. Can we get rid of
> known_assigned_xids_lck, and if so would it make a meaningful
> difference in this scenario?
I have already tried that, but was unable to find any real benefit from it.
A WIP patch is attached.
Hmm, I see I sent it to the list, but it is absent from the archives... So,
just quoting from it:
> The first potential positive effect I could see is the
> (TransactionIdIsInProgress -> KnownAssignedXidsSearch) locking, but
> it seems like it is not on the standby hot path.
> The second one is the locking for KnownAssignedXidsGetAndSetXmin (building
> a snapshot). But I was unable to measure any impact. It wasn't visible
> separately in test (3).
> Maybe someone knows a scenario that causes known_assigned_xids_lck or
> TransactionIdIsInProgress to become a bottleneck on standby?
That last question is still open :)
> I think it might be a bigger effect than one might immediately think. Because
> the spinlock will typically be on the same cacheline as head/tail, and because
> every spinlock acquisition requires the cacheline to be modified (and thus
> owned exclusively) by the current core, uses of head/tail will very commonly
> be cache misses even in workloads without a lot of KAX activity.
I tried to find a way to produce any noticeable impact here in practice,
but without success.
>> But yeah, it does feel like the proposed
>> approach is only going to be optimal over a small range of conditions.
> In particular, it doesn't adapt at all to workloads that don't replay all that
> much, but do compute a lot of snapshots.
The approach in [2] was designed so that all the additional work falls on the
startup process and none on the other backends (it maintains offsets that are
used to skip gaps while building a snapshot).
[1]: https://gist.github.com/michail-nikolaev/e1dfc70bdd7cfd1b902523dbb3db2f28
[2]: /message-id/flat/CANtu0ogzo4MsR7My9%2BNhu3to5%3Dy7G9zSzUbxfWYOn9W5FfHjTA%40mail.gmail.com#341a3c3b033f69b260120b3173a66382
--
Michail Nikolaev
Attachment: 0001-memory-barrier-instead-of-spinlock.patch (text/plain, 5.4 KB)