Re: Mailing list subscription's mail delivery delays?

Lists: pgsql-www
From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: pgsql-www(at)postgresql(dot)org
Subject: Mailing list subscription's mail delivery delays?
Date: 2023-09-28 22:46:43
Message-ID: CAEze2Wi08Zw4BFfWaVMR1ufe9jhsbqYZtnBhOCyDsZLp-accXg@mail.gmail.com

Hi,

For lack of a better place to ask:

I've recently noticed that in several of the email threads I follow
on -hackers@, some messages have a very high time-to-delivery, so
mails from the same thread arrive out of order.
I've seen several occurrences of this with very long delays of over 10
hours, with at least one longer than 19 hours, assuming mail server
clocks are accurate and receipt dates are correctly recorded in the
mail headers.

I'm not sure if the issue is on my side (mail servers are gmail's) or
on the mailing list server - all traces I've checked indicate that the
delay is somewhere in the delivery from postgres' last mail server to
the first gmail mail server.

I've only really noticed this sometime in the past few weeks. After
sampling my mails, I found other examples of significant delays (>1h)
for mails from well-respected hackers dating back to at least
2023-08-28.

Would you happen to know why this could be the case, and what I can do
to fix it if it's something on my side?

I've attached three recently received mails from -hackers as .eml, to
help with any debugging: one was delivered relatively quickly (91s),
one for which the delivery took a long time (11h+) and one more with a
very long delivery time (19h+). I haven't yet noticed any specific
differences or commonalities between fast and slow mails.

Kind regards,

Matthias van de Meent.


From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: pgsql-www(at)postgresql(dot)org
Subject: Re: Mailing list subscription's mail delivery delays?
Date: 2023-09-28 22:53:32
Message-ID: CAKFQuwZtPAoXi1OgeQL19ybix8fPe3u379C0S_8mAsH02PvjMg@mail.gmail.com

On Thu, Sep 28, 2023 at 3:48 PM Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> wrote:

> I'm not sure if the issue is on my side (mail servers are gmail's) or
> on the mailing list server - all traces I've checked indicate that the
> delay is somewhere in the delivery from postgres' last mail server to
> the first gmail mail server.
>
> I've only really noticed this sometime in the past few weeks. After
> sampling my mails, I found other examples of significant delays (>1h)
> for mails from well-respected hackers dating back to at least
> 2023-08-28.
>

I have noticed the same thing happening for the Gmail account that I use.

David J.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-www(at)postgresql(dot)org
Subject: Re: Mailing list subscription's mail delivery delays?
Date: 2023-09-28 23:10:51
Message-ID: 1112882.1695942651@sss.pgh.pa.us

"David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes:
> On Thu, Sep 28, 2023 at 3:48 PM Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> wrote:
>> I'm not sure if the issue is on my side (mail servers are gmail's) or
>> on the mailing list server - all traces I've checked indicate that the
>> delay is somewhere in the delivery from postgres' last mail server to
>> the first gmail mail server.
>>
>> I've only really noticed this sometime in the past few weeks. After
>> sampling my mails, I found other examples of significant delays (>1h)
>> for mails from well-respected hackers dating back to at least
>> 2023-08-28.

> I have noticed the same thing happening for the Gmail account that I use.

I have been seeing the same thing for a few days now, on my
definitely-not-gmail personal server. Something's flaky in the
PG mail infrastructure. It's gotten better since yesterday's
outage, though I'm not convinced it's totally fixed.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-www(at)postgresql(dot)org
Subject: Re: Mailing list subscription's mail delivery delays?
Date: 2023-09-29 07:13:37
Message-ID: CABUevEwGPj-UXWM=WZGd5C3oDXvCv4KVHJA111PSdka=zvbJqQ@mail.gmail.com

On Fri, Sep 29, 2023 at 1:11 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> "David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes:
> > On Thu, Sep 28, 2023 at 3:48 PM Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> wrote:
> >> I'm not sure if the issue is on my side (mail servers are gmail's) or
> >> on the mailing list server - all traces I've checked indicate that the
> >> delay is somewhere in the delivery from postgres' last mail server to
> >> the first gmail mail server.
> >>
> >> I've only really noticed this sometime in the past few weeks. After
> >> sampling my mails, I found other examples of significant delays (>1h)
> >> for mails from well-respected hackers dating back to at least
> >> 2023-08-28.
>
> > I have noticed the same thing happening for the Gmail account that I use.
>
> I have been seeing the same thing for a few days now, on my
> definitely-not-gmail personal server. Something's flaky in the
> PG mail infrastructure. It's gotten better since yesterday's
> outage, though I'm not convinced it's totally fixed.

There have been some pretty bad issues with gmail recently. Some
changes have been deployed that will hopefully help mitigate those and
make things better, but it takes time to recover.

The massive backlogs caused by gmail have been enough to spill over
and affect other destinations as well simply due to the load created
since we have such a huge number of gmail subscribers. But we're
slowly seeing the backlogs shrink now and the load come down so
hopefully the changes made will continue to have effect and let us be
back to normal soon.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/


From: Ray O'Donnell <ray(at)rodonnell(dot)ie>
To: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-www(at)postgresql(dot)org
Subject: Re: Mailing list subscription's mail delivery delays?
Date: 2023-10-01 18:42:05
Message-ID: ea0c73c2-b0e3-44fa-8e42-0d6863a8ae99@rodonnell.ie

On 29/09/2023 08:13, Magnus Hagander wrote:
>
> There have been some pretty bad issues with gmail recently. Some

Just curious - what sort of issues? I don't use gmail myself.

Ray.

--
Raymond O'Donnell // Galway // Ireland
ray(at)rodonnell(dot)ie


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-www(at)postgresql(dot)org
Subject: Re: Mailing list subscription's mail delivery delays?
Date: 2023-10-02 20:52:48
Message-ID: 2158268.1696279968@sss.pgh.pa.us

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Fri, Sep 29, 2023 at 1:11 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I have been seeing the same thing for a few days now, on my
>> definitely-not-gmail personal server. Something's flaky in the
>> PG mail infrastructure. It's gotten better since yesterday's
>> outage, though I'm not convinced it's totally fixed.

> There have been some pretty bad issues with gmail recently. Some
> changes have been deployed that will hopefully help mitigate those and
> make things better, but it takes time to recover.

> The massive backlogs caused by gmail have been enough to spill over
> and affect other destinations as well simply due to the load created
> since we have such a huge number of gmail subscribers. But we're
> slowly seeing the backlogs shrink now and the load come down so
> hopefully the changes made will continue to have effect and let us be
> back to normal soon.

I'm still seeing multi-hour delivery delays on a subset of traffic,
like maybe half a dozen instances today.

Looking at the Received: timestamps shows pretty conclusively that
the delays are within PG infra, for example this recent message from
Heikki got hung up at two separate jumps:

Return-Path: <pgsql-hackers-owner+M15-507066(at)lists(dot)postgresql(dot)org>
Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
    by sss.pgh.pa.us (8.15.2/8.15.2) with ESMTPS id 392HruLZ2135620
    (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT)
    for <tgl(at)sss(dot)pgh(dot)pa(dot)us>; Mon, 2 Oct 2023 13:53:57 -0400
Received: from localhost ([127.0.0.1] helo=malur.postgresql.org)
    by malur.postgresql.org with esmtp (Exim 4.94.2)
    (envelope-from <pgsql-hackers-owner+M15-507066(at)lists(dot)postgresql(dot)org>)
    id 1qnN7D-00GbGd-FB
    for tgl(at)sss(dot)pgh(dot)pa(dot)us; Mon, 02 Oct 2023 17:53:55 +0000
Received: from makus.postgresql.org ([2001:4800:3e1:1::229])
    by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    (Exim 4.94.2)
    (envelope-from <hlinnaka(at)iki(dot)fi>)
    id 1qnGcb-00AqOg-Ti
    for pgsql-hackers(at)lists(dot)postgresql(dot)org; Mon, 02 Oct 2023 10:57:53 +0000
Received: from meesny.iki.fi ([195.140.195.201])
    by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    (Exim 4.94.2)
    (envelope-from <hlinnaka(at)iki(dot)fi>)
    id 1qnF5S-007kvc-AQ
    for pgsql-hackers(at)postgresql(dot)org; Mon, 02 Oct 2023 09:19:35 +0000
Received: from [192.168.1.115] (dsl-hkibng22-54f8db-125.dhcp.inet.fi [84.248.219.125])
    (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
    key-exchange X25519 server-signature RSA-PSS (2048 bits))
    (No client certificate requested)
    (Authenticated sender: hlinnaka)
    by meesny.iki.fi (Postfix) with ESMTPSA id 4Rzb4d51FBzydx;
    Mon, 2 Oct 2023 12:19:29 +0300 (EEST)
Message-ID: <fe32d2a0-0998-d866-d6ee-2aed70b9be00(at)iki(dot)fi>
Date: Mon, 2 Oct 2023 12:19:29 +0300
...
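
[Editorial note, not part of the original message: the hop-by-hop gaps in a trace like the one above can be computed mechanically. What follows is a minimal Python sketch, assuming the raw message (for example a saved .eml file) is available with its folded header lines still indented. The function name hop_delays and the abbreviated SAMPLE text are illustrative only; the hosts and timestamps in SAMPLE are copied from the trace above, with the TLS, envelope, and recipient details trimmed.]

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Abbreviated copy of the trace quoted above (hypothetical trimming: only the
# "from"/"by" hosts, queue ids, and timestamps are kept).
SAMPLE = """\
Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
    by sss.pgh.pa.us (8.15.2/8.15.2) with ESMTPS id 392HruLZ2135620;
    Mon, 2 Oct 2023 13:53:57 -0400
Received: from localhost ([127.0.0.1] helo=malur.postgresql.org)
    by malur.postgresql.org with esmtp (Exim 4.94.2)
    id 1qnN7D-00GbGd-FB; Mon, 02 Oct 2023 17:53:55 +0000
Received: from makus.postgresql.org ([2001:4800:3e1:1::229])
    by malur.postgresql.org with esmtps (Exim 4.94.2)
    id 1qnGcb-00AqOg-Ti; Mon, 02 Oct 2023 10:57:53 +0000
Received: from meesny.iki.fi ([195.140.195.201])
    by makus.postgresql.org with esmtps (Exim 4.94.2)
    id 1qnF5S-007kvc-AQ; Mon, 02 Oct 2023 09:19:35 +0000
Received: from [192.168.1.115] (dsl-hkibng22-54f8db-125.dhcp.inet.fi [84.248.219.125])
    by meesny.iki.fi (Postfix) with ESMTPSA id 4Rzb4d51FBzydx;
    Mon, 2 Oct 2023 12:19:29 +0300 (EEST)
"""


def hop_delays(raw_headers):
    """Return (receiving_host, seconds_since_previous_hop), oldest hop first."""
    msg = message_from_string(raw_headers)
    hops = []
    # Each server prepends its Received header, so reverse to get time order.
    for value in reversed(msg.get_all("Received", [])):
        # The timestamp is whatever follows the last ';' in the header.
        info, _, stamp = value.rpartition(";")
        when = parsedate_to_datetime(" ".join(stamp.split()))
        words = info.split()
        # Assumption: the first bare "by" token introduces the receiving host.
        host = words[words.index("by") + 1] if "by" in words else "?"
        hops.append((host, when))
    return [
        (host, (when - hops[i - 1][1]).total_seconds() if i else 0.0)
        for i, (host, when) in enumerate(hops)
    ]


for host, seconds in hop_delays(SAMPLE):
    print(f"{host:24} +{seconds:8.0f} s")
```

[For the timestamps above, the two large gaps are 5898 s (about 1h38m) on arrival at malur.postgresql.org and 24962 s (about 6h56m) on the second malur.postgresql.org hop; the remaining hops take 6 s and 2 s. The sketch trusts each server's clock and assumes every timestamp carries a UTC offset.]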

Also, my own message <2154347(dot)1696278028(at)sss(dot)pgh(dot)pa(dot)us> went
out to -hackers about 25 minutes ago and hasn't come back,
so based on other recent examples I'm betting I won't see it
for hours.

Plenty of other traffic *is* coming through in normal-ish time,
so I'm not sure I buy that there's still a massive logjam.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-www(at)postgresql(dot)org
Subject: Re: Mailing list subscription's mail delivery delays?
Date: 2023-10-03 18:31:44
Message-ID: CABUevEx=Nswe8OLg7kdr4A3UNSTG_rTO4P8fs_VVpeJamzfp3w@mail.gmail.com

On Mon, Oct 2, 2023 at 4:52 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
> > On Fri, Sep 29, 2023 at 1:11 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> I have been seeing the same thing for a few days now, on my
> >> definitely-not-gmail personal server. Something's flaky in the
> >> PG mail infrastructure. It's gotten better since yesterday's
> >> outage, though I'm not convinced it's totally fixed.
>
> > There have been some pretty bad issues with gmail recently. Some
> > changes have been deployed that will hopefully help mitigate those and
> > make things better, but it takes time to recover.
>
> > The massive backlogs caused by gmail have been enough to spill over
> > and affect other destinations as well simply due to the load created
> > since we have such a huge number of gmail subscribers. But we're
> > slowly seeing the backlogs shrink now and the load come down so
> > hopefully the changes made will continue to have effect and let us be
> > back to normal soon.
>
> I'm still seeing multi-hour delivery delays on a subset of traffic,
> like maybe half a dozen instances today.
>
> Looking at the Received: timestamps shows pretty conclusively that
> the delays are within PG infra, for example this recent message from
> Heikki got hung up at two separate jumps:
>
> Return-Path: <pgsql-hackers-owner+M15-507066(at)lists(dot)postgresql(dot)org>
> Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
> by sss.pgh.pa.us (8.15.2/8.15.2) with ESMTPS id 392HruLZ2135620
> (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT)
> for <tgl(at)sss(dot)pgh(dot)pa(dot)us>; Mon, 2 Oct 2023 13:53:57 -0400
> Received: from localhost ([127.0.0.1] helo=malur.postgresql.org)
> by malur.postgresql.org with esmtp (Exim 4.94.2)
> (envelope-from <pgsql-hackers-owner+M15-507066(at)lists(dot)postgresql(dot)org>)
> id 1qnN7D-00GbGd-FB
> for tgl(at)sss(dot)pgh(dot)pa(dot)us; Mon, 02 Oct 2023 17:53:55 +0000
> Received: from makus.postgresql.org ([2001:4800:3e1:1::229])
> by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
> (Exim 4.94.2)
> (envelope-from <hlinnaka(at)iki(dot)fi>)
> id 1qnGcb-00AqOg-Ti
> for pgsql-hackers(at)lists(dot)postgresql(dot)org; Mon, 02 Oct 2023 10:57:53 +0000
> Received: from meesny.iki.fi ([195.140.195.201])
> by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
> (Exim 4.94.2)
> (envelope-from <hlinnaka(at)iki(dot)fi>)
> id 1qnF5S-007kvc-AQ
> for pgsql-hackers(at)postgresql(dot)org; Mon, 02 Oct 2023 09:19:35 +0000
> Received: from [192.168.1.115] (dsl-hkibng22-54f8db-125.dhcp.inet.fi [84.248.219.125])
> (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
> key-exchange X25519 server-signature RSA-PSS (2048 bits))
> (No client certificate requested)
> (Authenticated sender: hlinnaka)
> by meesny.iki.fi (Postfix) with ESMTPSA id 4Rzb4d51FBzydx;
> Mon, 2 Oct 2023 12:19:29 +0300 (EEST)
> Message-ID: <fe32d2a0-0998-d866-d6ee-2aed70b9be00(at)iki(dot)fi>
> Date: Mon, 2 Oct 2023 12:19:29 +0300
> ...
>
>
> Also, my own message <2154347(dot)1696278028(at)sss(dot)pgh(dot)pa(dot)us> went
> out to -hackers about 25 minutes ago and hasn't come back,
> so based on other recent examples I'm betting I won't see it
> for hours.
>
> Plenty of other traffic *is* coming through in normal-ish time,
> so I'm not sure I buy that there's still a massive logjam.

There is still definitely a problem, but it is slowly recovering. It
is *mostly* hitting gmail at this point, but there can be spillover
to others in some cases (for example, there's a general throttling
when the load on the server gets too high). In this particular case,
it coincides timing-wise with our old friend the oom-killer nuking
postgres on the machine, thereby stopping all incoming email for a
while before it got moving again. That particular problem should have
been taken care of completely by now. The general backlog/queueing
problem is still ongoing, but it has been improving.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-www(at)postgresql(dot)org
Subject: Re: Mailing list subscription's mail delivery delays?
Date: 2023-10-04 16:30:52
Message-ID: CABUevEwnM3H0pPs0aDQB36n6PzzqYi38LgSnJZRRugQ_CWxHkA@mail.gmail.com

On Tue, Oct 3, 2023 at 2:31 PM Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
> On Mon, Oct 2, 2023 at 4:52 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Magnus Hagander <magnus(at)hagander(dot)net> writes:
> > > On Fri, Sep 29, 2023 at 1:11 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > >> I have been seeing the same thing for a few days now, on my
> > >> definitely-not-gmail personal server. Something's flaky in the
> > >> PG mail infrastructure. It's gotten better since yesterday's
> > >> outage, though I'm not convinced it's totally fixed.
> >
> > > There have been some pretty bad issues with gmail recently. Some
> > > changes have been deployed that will hopefully help mitigate those and
> > > make things better, but it takes time to recover.
> >
> > > The massive backlogs caused by gmail have been enough to spill over
> > > and affect other destinations as well simply due to the load created
> > > since we have such a huge number of gmail subscribers. But we're
> > > slowly seeing the backlogs shrink now and the load come down so
> > > hopefully the changes made will continue to have effect and let us be
> > > back to normal soon.
> >
> > I'm still seeing multi-hour delivery delays on a subset of traffic,
> > like maybe half a dozen instances today.
> >
> > Looking at the Received: timestamps shows pretty conclusively that
> > the delays are within PG infra, for example this recent message from
> > Heikki got hung up at two separate jumps:
> >
> > Return-Path: <pgsql-hackers-owner+M15-507066(at)lists(dot)postgresql(dot)org>
> > Received: from malur.postgresql.org (malur.postgresql.org [217.196.149.56])
> > by sss.pgh.pa.us (8.15.2/8.15.2) with ESMTPS id 392HruLZ2135620
> > (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT)
> > for <tgl(at)sss(dot)pgh(dot)pa(dot)us>; Mon, 2 Oct 2023 13:53:57 -0400
> > Received: from localhost ([127.0.0.1] helo=malur.postgresql.org)
> > by malur.postgresql.org with esmtp (Exim 4.94.2)
> > (envelope-from <pgsql-hackers-owner+M15-507066(at)lists(dot)postgresql(dot)org>)
> > id 1qnN7D-00GbGd-FB
> > for tgl(at)sss(dot)pgh(dot)pa(dot)us; Mon, 02 Oct 2023 17:53:55 +0000
> > Received: from makus.postgresql.org ([2001:4800:3e1:1::229])
> > by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
> > (Exim 4.94.2)
> > (envelope-from <hlinnaka(at)iki(dot)fi>)
> > id 1qnGcb-00AqOg-Ti
> > for pgsql-hackers(at)lists(dot)postgresql(dot)org; Mon, 02 Oct 2023 10:57:53 +0000
> > Received: from meesny.iki.fi ([195.140.195.201])
> > by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
> > (Exim 4.94.2)
> > (envelope-from <hlinnaka(at)iki(dot)fi>)
> > id 1qnF5S-007kvc-AQ
> > for pgsql-hackers(at)postgresql(dot)org; Mon, 02 Oct 2023 09:19:35 +0000
> > Received: from [192.168.1.115] (dsl-hkibng22-54f8db-125.dhcp.inet.fi [84.248.219.125])
> > (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
> > key-exchange X25519 server-signature RSA-PSS (2048 bits))
> > (No client certificate requested)
> > (Authenticated sender: hlinnaka)
> > by meesny.iki.fi (Postfix) with ESMTPSA id 4Rzb4d51FBzydx;
> > Mon, 2 Oct 2023 12:19:29 +0300 (EEST)
> > Message-ID: <fe32d2a0-0998-d866-d6ee-2aed70b9be00(at)iki(dot)fi>
> > Date: Mon, 2 Oct 2023 12:19:29 +0300
> > ...
> >
> >
> > Also, my own message <2154347(dot)1696278028(at)sss(dot)pgh(dot)pa(dot)us> went
> > out to -hackers about 25 minutes ago and hasn't come back,
> > so based on other recent examples I'm betting I won't see it
> > for hours.
> >
> > Plenty of other traffic *is* coming through in normal-ish time,
> > so I'm not sure I buy that there's still a massive logjam.
>
> There is still definitely a problem, but it is slowly recovering. It
> is *mostly* hitting gmail at this point, but there can be spillover
> to others in some cases (for example, there's a general throttling
> when the load on the server gets too high). In this particular case,
> it coincides timing-wise with our old friend the oom-killer nuking
> postgres on the machine, thereby stopping all incoming email for a
> while before it got moving again. That particular problem should have
> been taken care of completely by now. The general backlog/queueing
> problem is still ongoing, but it has been improving.

We *think* this issue has now been mostly resolved. We are still
seeing some extra delays in deliveries to gmail right now but that's
due to *us* slowing down the deliveries to not trigger things. But we
are now talking delays of minutes or tens of minutes, and not hours or
tens of hours. Non-gmail recipients should now be back to being mostly
unaffected.

We're continuing to monitor the situation of course, and to make
careful modifications to bring us back to quicker delivery times.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/