Re: cannot download mbox with python

Lists: pgsql-www
From: Pierre Forstmann <pierre(dot)forstmann(at)gmail(dot)com>
To: pgsql-www(at)postgresql(dot)org
Subject: cannot download mbox with python
Date: 2023-12-07 15:34:26
Message-ID: CAM-sOH8qZxmza4xm29jbNNpfXYDu7eUUZSXHgJ9dCoUbHK86OQ@mail.gmail.com

Hello,

I'm trying to download
/list/pgsql-bugs/mbox/pgsql-bugs.202312
with the following code, using my postgresql.org account:

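import requests

# URL as shown in the archive (host part not included)
url = '/list/pgsql-bugs/mbox/pgsql-bugs.202312'
print(url + '... ')
response = requests.get(url, auth=('xxx','yyy'))
print('status: ' + str(response.status_code))
print('... done')
print(response.text)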

But I get:

/list/pgsql-bugs/mbox/pgsql-bugs.202312...
status: 200
... done
<!doctype html>
<html lang="en">
<head>
<title>PostgreSQL: </title>
<meta name="viewport" content="width=device-width, initial-scale=1,
shrink-to-fit=no">
<meta http-equiv="Content-Type" content="text/xhtml; charset=utf-8" />

<meta name="theme-color" content="#336791"/>
<meta name="copyright" content="The PostgreSQL Global Development Group"
/>
<link href="/media/css/fontawesome.css?97a426bd" rel="stylesheet">
<link rel="stylesheet" href="/media/css/bootstrap-4.4.1.min.css">
<link rel="shortcut icon" href="/favicon.ico" />

<link rel="stylesheet" type="text/css" href="/dyncss/base.css?97a426bd">

<script src="/media/js/theme.js?97a426bd"></script>

</head>
<body>
<div class="container-fluid">
<div class="row justify-content-md-center">
<div class="col">
<!-- Header -->
<nav class="navbar navbar-expand-lg navbar-light bg-light">
<a class="navbar-brand p-0" href="/">
<img class="logo" src="/media/img/about/press/elephant.png"
alt="PostgreSQL Elephant Logo">
</a>
<button class="navbar-toggler" type="button"
data-toggle="collapse" data-target="#pgNavbar" aria-controls="pgNavbar"
aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="pgNavbar">
<ul class="navbar-nav mr-auto">
<li class="nav-item p-2"><a href="/"
title="Home">Home</a></li>
<li class="nav-item p-2"><a href="/about/"
title="About">About</a></li>
<li class="nav-item p-2"><a href="/download/"
title="Download">Download</a></li>
<li class="nav-item p-2"><a href="/docs/"
title="Documentation">Documentation</a></li>
<li class="nav-item p-2"><a href="/community/"
title="Community">Community</a></li>
<li class="nav-item p-2"><a href="/developer/"
title="Developers">Developers</a></li>
<li class="nav-item p-2"><a href="/support/"
title="Support">Support</a></li>
<li class="nav-item p-2"><a href="/about/donate/"
title="Donate">Donate</a></li>
<li class="nav-item p-2"><a href="/account/" title="Your
account">Your account</a></li>
</ul>
<form role="search" method="get" action="/search/">
<div class="input-group">
<input id="q" name="q" type="text" size="20"
maxlength="255" accesskey="s" class="form-control" placeholder="Search
for...">
<span class="input-group-btn">
<button class="btn btn-default" type="submit"><i
class="fas fa-search"></i></button>
</span>
</div><!-- /input-group -->
</form>
<form id="form-theme" class="form-inline d-none">
<button id="btn-theme" class="btn btn-default ml-1"
type="button"></button>
</form>
</div>
</nav>
</div>
</div>
<div class="row justify-content-center pg-shout-box">
<div class="col text-white text-center">9th November 2023: <a
href="/about/news/postgresql-161-155-1410-1313-1217-and-1122-released-2749/">
PostgreSQL 16.1, 15.5, 14.10, 13.13, 12.17, and 11.22 Released!
</a>

</div>
</div>
</div>

<div class="container-fluid margin">
<div class="row">
<div class="col-lg-2">
<div id="pgSideWrap">

</div> <!-- pgSideWrap -->
</div>
<div class="col-lg-10">
<div id="pgContentWrap">

<h1>Sign in <i class="fas fa-sign-in-alt"></i></h1>
<p>

The website you are trying to log in to (List archives) is using the
postgresql.org community login system. In this system you create a
central account that is used to log into most postgresql.org services.
Once you are logged into this account, you will automatically be
logged in to the associated postgresql.org services.

</p>
<p>
If you do not already have an account,
you can either <a href="/account/signup/">create</a>
a dedicated account, or use one of the third party sign-in systems below.
</p>

<h2>Community account sign-in</h2>
<p>
If you have a postgresql.org community account with a password, please
use the form below to sign in. If you have one but have lost your
password, you can use the <a href="/account/reset/">password reset</a> form.
</p>

<form action="." method="post" id="login-form"><input type="hidden"
name="csrfmiddlewaretoken"
value="ZbN9nbSTSpSGxNueOJvp06wbuCTIVUeJuQfvb2e5VdjrhTZg6TZa1O7Atsd1a3vr">
<div class="form-group">
<input type="text" class="form-control" name="username"
id="id_username" placeholder="Username or email address" autofocus />
</div>
<div class="form-group">
<input type="password" class="form-control" name="password"
id="id_password" placeholder="Password"/>
<input type="hidden" name="this_is_the_login_form" value="1" />
<input type="hidden" name="next"
value="/account/auth/21/?d=0KoEb9W1FaSjHeKQ643Htw==$Y4_M5XrobAlSIRZiHNqXOdS6ELWZdvdkgSBEEImQ70RTp41NCqkB68suwxlddO27rD1jjNiolGg-U172YeyXEw=="
/>
</div>
<div class="submit-row">
<input class="btn btn-primary" type="submit" value="Community Sign-In">
</div>
</form>

<h2>Third party sign in</h2>

<p><a
href="/account/login/facebook/?next=/account/auth/21/?d=0KoEb9W1FaSjHeKQ643Htw==$Y4_M5XrobAlSIRZiHNqXOdS6ELWZdvdkgSBEEImQ70RTp41NCqkB68suwxlddO27rD1jjNiolGg-U172YeyXEw=="><img
src="/media/img/misc/btn_login_facebook.png" alt="Sign in with Facebook"
/></a></p>

<p><a
href="/account/login/github/?next=/account/auth/21/?d=0KoEb9W1FaSjHeKQ643Htw==$Y4_M5XrobAlSIRZiHNqXOdS6ELWZdvdkgSBEEImQ70RTp41NCqkB68suwxlddO27rD1jjNiolGg-U172YeyXEw=="><img
src="/media/img/misc/btn_login_github.png" alt="Sign in with Github"
/></a></p>

<p><a
href="/account/login/google/?next=/account/auth/21/?d=0KoEb9W1FaSjHeKQ643Htw==$Y4_M5XrobAlSIRZiHNqXOdS6ELWZdvdkgSBEEImQ70RTp41NCqkB68suwxlddO27rD1jjNiolGg-U172YeyXEw=="><img
src="/media/img/misc/btn_login_google.png" alt="Sign in with Google"
/></a></p>

<p><a
href="/account/login/microsoft/?next=/account/auth/21/?d=0KoEb9W1FaSjHeKQ643Htw==$Y4_M5XrobAlSIRZiHNqXOdS6ELWZdvdkgSBEEImQ70RTp41NCqkB68suwxlddO27rD1jjNiolGg-U172YeyXEw=="><img
src="/media/img/misc/btn_login_microsoft.png" alt="Sign in with Microsoft"
/></a></p>

</div> <!-- pgContentWrap -->
</div>
</div>
</div>

<!-- Footer -->
<footer id="footer">
<div class="container">
<div class="row">
<div class="col-md-12">
<ul>
<li><a href="https://twitter.com/postgresql"><img
src="/media/img/atpostgresql.png" alt="@postgresql"></a></li>
<li><a href="
https://git.postgresql.org/gitweb/?p=postgresql.git"><img
src="/media/img/git.png" alt="Git"></a></li>
</ul>
</div>
</div>
</div>
<!-- Copyright -->
<div class="container">
<a href="/about/policies/">Policies</a> |
<a href="/about/policies/coc/">Code of Conduct</a> |
<a href="/about/">About PostgreSQL</a> |
<a href="/about/contact/">Contact</a><br/>
<p>Copyright &copy; 1996-2023 The PostgreSQL Global Development
Group</p>
</div>
</footer>
<script src="/media/js/jquery-3.4.1.slim.min.js"></script>
<script src="/media/js/popper-1.16.0.min.js"></script>
<script src="/media/js/bootstrap-4.4.1.min.js"></script>
<script src="/media/js/main.js?97a426bd"></script>

</body>
</html>

I don't understand what is wrong here: I get status 200, but the HTML
response says that I must sign in with the community account, which is
the account I'm already using.

Thanks


From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Pierre Forstmann <pierre(dot)forstmann(at)gmail(dot)com>
Cc: pgsql-www(at)postgresql(dot)org
Subject: Re: cannot download mbox with python
Date: 2023-12-07 16:06:11
Message-ID: EB2B3047-FAB1-4F5A-A535-6C3A82DFD945@yesql.se
Lists: pgsql-www

> On 7 Dec 2023, at 16:34, Pierre Forstmann <pierre(dot)forstmann(at)gmail(dot)com> wrote:
>
> Hello,
>
> I'm trying to download
> /list/pgsql-bugs/mbox/pgsql-bugs.202312
> with following code using my postgresql.org account:
>
> print(url + '... ')
> response = requests.get(url, auth=('xxx','yyy'))

I'm not very well versed in Python, but isn't this for doing plain HTTP auth?
The postgresql.org account does not support HTTP auth; you need to log in and
create a session.
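
To illustrate the point, here is a minimal sketch of what HTTP Basic auth amounts to, using only the standard library. The helper name basic_auth_header is hypothetical, and this approximates what an HTTP client does with auth=('xxx','yyy') rather than reproducing any library's exact code. The credentials are simply encoded into a request header, so a site that expects a form login and a session cookie will ignore them and serve its sign-in page instead.

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header value that HTTP Basic auth sends."""
    # Basic auth is just "username:password", base64-encoded.
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # The placeholder credentials from the post above.
    print("Authorization:", basic_auth_header("xxx", "yyy"))
```

No session is created at any point here, which would be consistent with the status 200 plus sign-in page seen in the original post.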

--
Daniel Gustafsson


From: Pierre Forstmann <pierre(dot)forstmann(at)gmail(dot)com>
To: daniel(at)yesql(dot)se
Cc: pgsql-www(at)postgresql(dot)org
Subject: Re: cannot download mbox with python
Date: 2023-12-07 16:39:18
Message-ID: CAM-sOH_TOtSDVHbC76hyr3akFt2P6o8myFcFdOiGpq6GCDc9JA@mail.gmail.com
Lists: pgsql-www

I've tried this:

import requests
from urllib.parse import urlparse

url = '/list/pgsql-bugs/mbox/pgsql-bugs.202312'
#response = requests.get(url, auth=('xxx','yyy'))
session = requests.session()
session.auth = ('xxx','yyy')
response = session.get(url)
print('status: ' + str(response.status_code))
print('... done')
print(response.content)

But I get the same behaviour:

status: 200
... done
b'<!doctype html>\n<html lang="en">\n <head>\n <title>PostgreSQL:
</title>\n <meta name="viewport" content="width=device-width,
initial-scale=1, shrink-to-fit=no">\n <meta http-equiv="Content-Type"
content="text/xhtml; charset=utf-8" />\n \n <meta name="theme-color"
content="#336791"/>\n <meta name="copyright" content="The PostgreSQL
Global Development Group" />\n <link
href="/media/css/fontawesome.css?97a426bd" rel="stylesheet">\n <link
rel="stylesheet" href="/media/css/bootstrap-4.4.1.min.css">\n <link
rel="shortcut icon" href="/favicon.ico" />\n \n <link rel="stylesheet"
type="text/css" href="/dyncss/base.css?97a426bd">\n\n <script
src="/media/js/theme.js?97a426bd"></script>\n\n \n </head>\n <body>\n
<div class="container-fluid">\n <div class="row
justify-content-md-center">\n <div class="col">\n <!--
Header -->\n <nav class="navbar navbar-expand-lg navbar-light
bg-light">\n <a class="navbar-brand p-0" href="/">\n
<img class="logo" src="/media/img/about/press/elephant.png"
alt="PostgreSQL Elephant Logo">\n </a>\n <button
class="navbar-toggler" type="button" data-toggle="collapse"
data-target="#pgNavbar" aria-controls="pgNavbar" aria-expanded="false"
aria-label="Toggle navigation">\n <span
class="navbar-toggler-icon"></span>\n </button>\n
<div class="collapse navbar-collapse" id="pgNavbar">\n <ul
class="navbar-nav mr-auto">\n <li class="nav-item p-2"><a
href="/" title="Home">Home</a></li>\n <li class="nav-item
p-2"><a href="/about/" title="About">About</a></li>\n <li
class="nav-item p-2"><a href="/download/"
title="Download">Download</a></li>\n <li class="nav-item
p-2"><a href="/docs/" title="Documentation">Documentation</a></li>\n
<li class="nav-item p-2"><a href="/community/"
title="Community">Community</a></li>\n <li class="nav-item
p-2"><a href="/developer/" title="Developers">Developers</a></li>\n
<li class="nav-item p-2"><a href="/support/"
title="Support">Support</a></li>\n <li class="nav-item
p-2"><a href="/about/donate/" title="Donate">Donate</a></li>\n
<li class="nav-item p-2"><a href="/account/" title="Your account">Your
account</a></li>\n </ul>\n <form role="search"
method="get" action="/search/">\n <div
class="input-group">\n <input id="q" name="q" type="text"
size="20" maxlength="255" accesskey="s" class="form-control"
placeholder="Search for...">\n <span
class="input-group-btn">\n <button class="btn
btn-default" type="submit"><i class="fas fa-search"></i></button>\n
</span>\n </div><!-- /input-group -->\n
</form>\n <form id="form-theme" class="form-inline d-none">\n
<button id="btn-theme" class="btn btn-default ml-1"
type="button"></button>\n </form>\n </div>\n
</nav>\n </div>\n </div>\n <div class="row
justify-content-center pg-shout-box">\n <div class="col text-white
text-center">9th November 2023: <a
href="/about/news/postgresql-161-155-1410-1313-1217-and-1122-released-2749/">\n
PostgreSQL 16.1, 15.5, 14.10, 13.13, 12.17, and 11.22
Released!\n</a>\n\n</div>\n </div>\n </div>\n \n<div
class="container-fluid margin">\n <div class="row">\n <div
class="col-lg-2">\n <div id="pgSideWrap">\n \n </div> <!--
pgSideWrap -->\n </div>\n <div class="col-lg-10">\n <div
id="pgContentWrap">\n \n<h1>Sign in <i class="fas
fa-sign-in-alt"></i></h1>\n<p>\n\nThe website you are trying to log in to
(List archives) is using the\npostgresql.org community login system. In
this system you create a\ncentral account that is used to log into most
postgresql.org services.\nOnce you are logged into this account, you will
automatically be\nlogged in to the associated postgresql.org
services.\n\n</p>\n<p>\nIf you do not already have an account,\nyou can
either <a href="/account/signup/">create</a>\na dedicated account, or use
one of the third party sign-in systems below.\n</p>\n\n<h2>Community
account sign-in</h2>\n<p>\nIf you have a postgresql.org community account
with a password, please\nuse the form below to sign in. If you have one but
have lost your\npassword, you can use the <a
href="/account/reset/">password reset</a> form.\n</p>\n\n\n<form action="."
method="post" id="login-form"><input type="hidden"
name="csrfmiddlewaretoken"
value="74NVUzwZ2xyfKCjEKU55vymmVDYvbgwiZHKhjCpIvdoYT7zPBvHYk9KOcVXgJng3">\n
<div class="form-group">\n <input type="text" class="form-control"
name="username" id="id_username" placeholder="Username or email address"
autofocus />\n </div>\n <div class="form-group">\n <input
type="password" class="form-control" name="password" id="id_password"
placeholder="Password"/>\n <input type="hidden"
name="this_is_the_login_form" value="1" />\n <input type="hidden"
name="next"
value="/account/auth/21/?d=4LgkpffGQZ-w7rNu0SrH0A==$UcDKaTr8VxPom_YBpCfbgbv_aWh1WKTFT5lWX_XFUObGzyUqbaMnLclP3VMUfEDQaHEbrzaMs74Py2lrQ0atyw=="
/>\n </div>\n <div class="submit-row">\n <input class="btn
btn-primary" type="submit" value="Community Sign-In">\n
</div>\n</form>\n\n\n<h2>Third party sign in</h2>\n\n<p><a
href="/account/login/facebook/?next=/account/auth/21/?d=4LgkpffGQZ-w7rNu0SrH0A==$UcDKaTr8VxPom_YBpCfbgbv_aWh1WKTFT5lWX_XFUObGzyUqbaMnLclP3VMUfEDQaHEbrzaMs74Py2lrQ0atyw=="><img
src="/media/img/misc/btn_login_facebook.png" alt="Sign in with Facebook"
/></a></p>\n\n<p><a
href="/account/login/github/?next=/account/auth/21/?d=4LgkpffGQZ-w7rNu0SrH0A==$UcDKaTr8VxPom_YBpCfbgbv_aWh1WKTFT5lWX_XFUObGzyUqbaMnLclP3VMUfEDQaHEbrzaMs74Py2lrQ0atyw=="><img
src="/media/img/misc/btn_login_github.png" alt="Sign in with Github"
/></a></p>\n\n<p><a
href="/account/login/google/?next=/account/auth/21/?d=4LgkpffGQZ-w7rNu0SrH0A==$UcDKaTr8VxPom_YBpCfbgbv_aWh1WKTFT5lWX_XFUObGzyUqbaMnLclP3VMUfEDQaHEbrzaMs74Py2lrQ0atyw=="><img
src="/media/img/misc/btn_login_google.png" alt="Sign in with Google"
/></a></p>\n\n<p><a
href="/account/login/microsoft/?next=/account/auth/21/?d=4LgkpffGQZ-w7rNu0SrH0A==$UcDKaTr8VxPom_YBpCfbgbv_aWh1WKTFT5lWX_XFUObGzyUqbaMnLclP3VMUfEDQaHEbrzaMs74Py2lrQ0atyw=="><img
src="/media/img/misc/btn_login_microsoft.png" alt="Sign in with Microsoft"
/></a></p>\n\n\n\n\n </div> <!-- pgContentWrap -->\n </div>\n
</div>\n</div>\n\n <!-- Footer -->\n <footer id="footer">\n
<div class="container">\n <div class="row">\n <div
class="col-md-12">\n <ul>\n <li><a href="
https://twitter.com/postgresql"><img src="/media/img/atpostgresql.png"
alt="@postgresql"></a></li>\n <li><a href="
https://git.postgresql.org/gitweb/?p=postgresql.git"><img
src="/media/img/git.png" alt="Git"></a></li>\n </ul>\n
</div>\n </div>\n </div>\n <!-- Copyright -->\n <div
class="container">\n <a href="/about/policies/">Policies</a> |\n
<a href="/about/policies/coc/">Code of Conduct</a> |\n <a
href="/about/">About PostgreSQL</a> |\n <a
href="/about/contact/">Contact</a><br/>\n <p>Copyright &copy;
1996-2023 The PostgreSQL Global Development Group</p>\n </div>\n
</footer>\n <script
src="/media/js/jquery-3.4.1.slim.min.js"></script>\n <script
src="/media/js/popper-1.16.0.min.js"></script>\n <script
src="/media/js/bootstrap-4.4.1.min.js"></script>\n <script
src="/media/js/main.js?97a426bd"></script>\n\n </body>\n</html>\n'

Thanks

On Thu, 7 Dec 2023 at 17:06, Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:

> > On 7 Dec 2023, at 16:34, Pierre Forstmann <pierre(dot)forstmann(at)gmail(dot)com>
> wrote:
> >
> > Hello,
> >
> > I'm trying to download
> > /list/pgsql-bugs/mbox/pgsql-bugs.202312
> > with following code using my postgresql.org account:
> >
> > print(url + '... ')
> > response = requests.get(url, auth=('xxx','yyy'))
>
> I'm not very well versed in Python, but isn't this for doing plain HTTP
> auth?
> The postgresql.org account does not support http auth, you need to login
> and
> create a session.
>
> --
> Daniel Gustafsson
>
>


From: Andreas 'ads' Scherbaum <ads(at)pgug(dot)de>
To: Pierre Forstmann <pierre(dot)forstmann(at)gmail(dot)com>, daniel(at)yesql(dot)se
Cc: pgsql-www(at)postgresql(dot)org
Subject: Re: cannot download mbox with python
Date: 2023-12-07 16:49:46
Message-ID: 25b24513-cf6f-406e-a04e-b3d834ad5398@pgug.de
Lists: pgsql-www

On 07/12/2023 17:39, Pierre Forstmann wrote:
> I've tried this:
>
> import requests
> from urllib.parse import urlparse
>
> url = '/list/pgsql-bugs/mbox/pgsql-bugs.202312'
> #response = requests.get(url, auth=('xxx','yyy'))
> session = requests.session()
> session.auth = ('xxx','yyy')
> response = session.get(url)
> print('status: ' + str(response.status_code))
> print('... done')
> print(response.content)

Setting session.auth still does HTTP Basic auth, which is not what you need here.

Try opening your link in a browser in an anonymous window:

/list/pgsql-bugs/mbox/pgsql-bugs.202312

It redirects you to the login page. You need to emulate that path in your
script: log in to the website, and then you can retrieve the mbox.
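
A minimal sketch of emulating that path, under several assumptions: the names LoginFormParser, parse_login_form and download_mbox are hypothetical, the session argument is expected to behave like a requests.Session, and the form layout (id="login-form", hidden csrfmiddlewaretoken and next fields, action=".") is taken from the HTML pasted earlier in this thread. The Referer header is included because the csrfmiddlewaretoken field name suggests a Django site, and Django checks the Referer on HTTPS form posts. Whether the redirects after sign-in end at the mbox file is also an assumption, not something confirmed in the thread.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormParser(HTMLParser):
    """Collect the hidden <input> fields of the form with id="login-form"."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.action = None  # stays None when no login form is present
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == "login-form":
            self.in_form = True
            self.action = attrs.get("action", "")
        elif tag == "input" and self.in_form and attrs.get("type") == "hidden":
            self.fields[attrs.get("name")] = attrs.get("value", "")

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def parse_login_form(html):
    """Return (action, hidden_fields); action is None if there is no form."""
    parser = LoginFormParser()
    parser.feed(html)
    return parser.action, parser.fields


def download_mbox(session, url, username, password):
    """Fetch url, signing in through the login form if one comes back."""
    response = session.get(url)  # redirects lead to the sign-in page
    action, fields = parse_login_form(response.text)
    if action is None:
        return response.content  # no sign-in form: already authenticated
    # Send back the hidden fields (CSRF token, next, ...) plus credentials.
    fields.update({"username": username, "password": password})
    login_url = urljoin(response.url, action)
    response = session.post(login_url, data=fields,
                            headers={"Referer": response.url})
    response.raise_for_status()
    if parse_login_form(response.text)[0] is not None:
        raise RuntimeError("login failed: the sign-in form was returned again")
    return response.content
```

With the requests library installed, this would be called as download_mbox(requests.Session(), url, 'xxx', 'yyy'), where url is the full mbox address. Reusing the same session object matters, since it carries the cookies set during sign-in into the later requests.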

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project