Re: Broken linkparsing in archives

Lists: pgsql-www
From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: PostgreSQL WWW <pgsql-www(at)postgresql(dot)org>
Subject: Broken linkparsing in archives
Date: 2022-11-02 12:31:08
Message-ID: D0DB017E-F605-4158-BB12-6D5C20636DF8@yesql.se

Looking at past announcements I noticed that Markdown links were parsed and/or
rendered incorrectly in the archives. The example email that I noticed it on
was this:

/message-id/163724833494.26187.1931723451787420391@wrigleys.postgresql.org

..but it happens on all it seems, a more recent example:

/message-id/166472941958.662.2706300812023074847%40wrigleys.postgresql.org

The rendered links follow the same pattern, the last word in the markdown text
block is prepended to the url block and all of it added as the href:

[call for papers](https://2022.nordicpgday.org/cfp/)

becomes:

[call for <a href="http://papers](https://2022.nordicpgday.org/cfp/)" rel="nofollow">papers](https://2022.nordicpgday.org/cfp/)</a>

Is this a known issue?

--
Daniel Gustafsson https://vmware.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: PostgreSQL WWW <pgsql-www(at)postgresql(dot)org>
Subject: Re: Broken linkparsing in archives
Date: 2022-11-02 12:39:38
Message-ID: CABUevEy-p2HS4Qzi4_QcK4W0ccFStZE-r+_wpd7UJSyzRZa9Ng@mail.gmail.com

On Wed, Nov 2, 2022 at 1:31 PM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:

> Looking at past announcements I noticed that Markdown links were parsed and/or
> rendered incorrectly in the archives. The example email that I noticed it on
> was this:
>
>
> /message-id/163724833494.26187.1931723451787420391@wrigleys.postgresql.org
>
> ..but it happens on all it seems, a more recent example:
>
>
> /message-id/166472941958.662.2706300812023074847%40wrigleys.postgresql.org
>
> The rendered links follow the same pattern, the last word in the markdown text
> block is prepended to the url block and all of it added as the href:
>
> [call for papers](https://2022.nordicpgday.org/cfp/)
>
> becomes:
>
> [call for <a href="http://papers](https://2022.nordicpgday.org/cfp/)" rel="nofollow">papers](https://2022.nordicpgday.org/cfp/)</a>
>
> Is this a known issue?
>
>
Well, there is no markdown support at all :) So what happens comes out as a
result of trying to extract links out of plaintext. This in turn is handled
by the django urlize filter:
https://docs.djangoproject.com/en/3.2/ref/templates/builtins/#urlize

Thus:

>>> from django.utils.html import urlize
>>> urlize('[call for papers](https://2022.nordicpgday.org/cfp/)')
'[call for <a href="http://papers](https://2022.nordicpgday.org/cfp/)">papers](https://2022.nordicpgday.org/cfp/)</a>'

And I'm not sure they *should* be considered, since the mime type of the
body isn't markdown...
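
To illustrate why plaintext link extraction yields the urlize output above, here is a minimal standard-library sketch. It is not Django's implementation: `BARE_DOMAIN` and `naive_linkify` are hypothetical, simplified stand-ins that only assume the text is split on spaces and that any word containing a well-known TLD is treated as a bare domain.

```python
import re

# Hypothetical, simplified stand-in for a plaintext linkifier (NOT Django's code).
# A word counts as a bare-domain link when it starts with a word character,
# does not start with "http", and contains a well-known TLD followed by
# either the end of the word or a slash.
BARE_DOMAIN = re.compile(
    r"^(?!http)\w[^@\s]+\.(com|edu|gov|int|mil|net|org)($|/.*)$"
)


def naive_linkify(text):
    """Wrap every space-delimited word that looks like a bare domain in <a>."""
    out = []
    for word in text.split(" "):
        if BARE_DOMAIN.match(word):
            # The whole word becomes both the href (with http:// prepended)
            # and the link text, brackets and parentheses included.
            out.append('<a href="http://{0}">{0}</a>'.format(word))
        else:
            out.append(word)
    return " ".join(out)


markdown = "[call for papers](https://2022.nordicpgday.org/cfp/)"
linked = naive_linkify(markdown)
```

Since the Markdown link contains no space between "papers]" and "(https://...", the last word of the link text and the URL form a single word, and that word happens to look like a bare domain ending in ".org/...".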

//Magnus


From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL WWW <pgsql-www(at)postgresql(dot)org>
Subject: Re: Broken linkparsing in archives
Date: 2022-11-02 12:52:40
Message-ID: 9FE3BC56-7A7D-4036-8DF0-E94B8A28FD61@yesql.se

> On 2 Nov 2022, at 13:39, Magnus Hagander <magnus(at)hagander(dot)net> wrote:

> And I'm not sure they *should* be considered, since the mime type of the body isn't markdown...

For emails sent as text to -announce, sure. But. Since we support markdown
formatting in news postings that go out to -announce, it seems a bit unhelpful
to generate broken links for all those posts.

If I can come up with a filter that converts a broken link from urlize for the
known case of markdown links, would that be an accepted solution?
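
A rough sketch of what such a filter could look like, assuming the broken markup always has the exact shape shown earlier in this thread. The names `BROKEN_MD_LINK` and `repair_markdown_links` are hypothetical, and this has not been checked against HTML escaping or the rest of the site's filter chain.

```python
import re

# Assumption: the linkifier output looks exactly like
#   [lead <a href="http://WORD](URL)" ATTRS>WORD](URL)</a>
# where WORD is the last word of the Markdown link text and URL is the target.
BROKEN_MD_LINK = re.compile(
    r'\[(?P<lead>[^\[\]<>]*?)'                      # link text before the last word
    r'<a href="http://(?P<word>[^\]\s"]+)'          # last word, glued into the href
    r'\]\((?P<url>https?://[^\s")]+)\)"'            # the real target URL
    r'(?P<attrs>[^>]*)>'                            # e.g. rel="nofollow"
    r'(?P=word)\]\((?P=url)\)</a>'                  # same word and URL as link text
)


def repair_markdown_links(html):
    """Rewrite mis-linked Markdown links into plain anchors on the real URL."""
    def _fix(match):
        return '<a href="{url}"{attrs}>{lead}{word}</a>'.format(
            **match.groupdict()
        )
    return BROKEN_MD_LINK.sub(_fix, html)


broken = (
    '[call for <a href="http://papers](https://2022.nordicpgday.org/cfp/)"'
    ' rel="nofollow">papers](https://2022.nordicpgday.org/cfp/)</a>'
)
fixed = repair_markdown_links(broken)
```

The backreferences require the word and URL in the href to match the link text exactly, so ordinary links that were rendered correctly are left untouched.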

--
Daniel Gustafsson https://vmware.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: PostgreSQL WWW <pgsql-www(at)postgresql(dot)org>
Subject: Re: Broken linkparsing in archives
Date: 2022-11-03 13:17:43
Message-ID: CABUevEyu4tT7Eqe7Ph8enZ9VDU1V1M1i+0m6pN1cdv0Too5hSQ@mail.gmail.com

On Wed, Nov 2, 2022 at 1:52 PM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:

> > On 2 Nov 2022, at 13:39, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
> > And I'm not sure they *should* be considered, since the mime type of the
> > body isn't markdown...
>
> For emails sent as text to -announce, sure. But. Since we support markdown
> formatting in news postings that go out to -announce, it seems a bit unhelpful
> to generate broken links for all those posts.
>

I agree with the principle, but the question is how reliable we can make it.
(One could also argue we *should* post those as text/markdown, but I fear
that will break even more MUAs).

> If I can come up with a filter that converts a broken link from urlize for the
> known case of markdown links, would that be an accepted solution?
>

If it can be made reliable, I think that would be acceptable. It needs to
be validated that it works in the full chain that we use on the site (we
also include the silly obfuscation of email addresses in the filter chain),
but as long as that's done I think we can and should do it.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/