Re: Making plpython 2 and 3 coexist a bit better

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Making plpython 2 and 3 coexist a bit better
Date: 2016-01-11 15:44:42
Message-ID: 13858.1452527082@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Commit 803716013dc1350f installed a safeguard against loading plpython2
and plpython3 at the same time, stating

+ It is not allowed to use PL/Python based on Python 2 and PL/Python
+ based on Python 3 in the same session, because the symbols in the
+ dynamic modules would clash, which could result in crashes of the
+ PostgreSQL server process. There is a check that prevents mixing
+ Python major versions in a session, which will abort the session if
+ a mismatch is detected. It is possible, however, to use both
+ PL/Python variants in the same database, from separate sessions.

It turns out though that the freedom promised in the last sentence
is fairly illusory, because if you have functions in both languages
in one DB and you try a dump-and-restore, it will fail.

But it gets worse: a report today in pgsql-general points out that
even if you have the two languages in use *in different databases*,
pg_upgrade will fail, because pg_upgrade rather overenthusiastically
loads every .so mentioned anywhere in the source installation into
one session.

I think we might be able to improve this if we don't do the check in
plpython's _PG_init(), but delay it until we have a need to call into
libpython; which we could skip doing at _PG_init() time, at the cost
of perhaps one more flag check per plpython function call to see if
we've initialized libpython yet.

The behavior would then be that if you load both language .so's
into one session, neither one would be willing to do anything
anymore --- not run a function, and not syntax-check one either,
because any attempted call into either libpython might crash.
But restoring a dump has no need to do either of those things;
it just wants to be able to LOAD the .so. Also, the error for
not being able to do something wouldn't have to be FATAL.

Comments? Am I being too optimistic about whether it's safe to
have both libpythons loaded if we're not calling either of them?

(If I am, we should at least try to improve pg_upgrade so that
it's not imposing a greater restriction than anything else in
the system.)

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Making plpython 2 and 3 coexist a bit better
Date: 2016-01-11 17:51:45
Message-ID: 28712.1452534705@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> I think we might be able to improve this if we don't do the check in
> plpython's _PG_init(), but delay it until we have a need to call into
> libpython; which we could skip doing at _PG_init() time, at the cost
> of perhaps one more flag check per plpython function call to see if
> we've initialized libpython yet.

> The behavior would then be that if you load both language .so's
> into one session, neither one would be willing to do anything
> anymore --- not run a function, and not syntax-check one either,
> because any attempted call into either libpython might crash.
> But restoring a dump has no need to do either of those things;
> it just wants to be able to LOAD the .so. Also, the error for
> not being able to do something wouldn't have to be FATAL.

Attached is a draft patch along this line. It seems to work as intended.

I can think of at least a couple of ways in which this test might be
inadequate. One is that a plpython2 function might call a plpython3
function or vice versa. The inner plpython_call_handler would correctly
throw an error, but if the outer function trapped the error and continued
execution, it would be at risk. Another, more hypothetical risk is that
once we've executed anything out of one libpython, it might have changed
process state (eg, installed hooks) in such a way that it can get control
back even without an explicit call to plpython. In that case, a
subsequent load of another Python version would put things at risk for
such execution in the first libpython.

We could ameliorate the first of these cases by putting the can't-run-
with-two-pythons error back up to FATAL rather than ERROR, but I'm not
sure that that would be a net improvement --- FATAL errors aren't very
friendly. In any case errors of the second type seem unpreventable
unless we stick with the immediate-FATAL-error approach.

Since plpython only comes in an untrusted (hence superuser-only) flavor,
holes like this wouldn't be security issues but only "well, you shouldn't
do that" problems. I'm inclined to think it's a good tradeoff for being
able to dump/restore/upgrade mixed installations, which are surely
likely to get more popular.

Thoughts?

regards, tom lane

Attachment Content-Type Size
relax-python-version-conflict-test.patch text/x-diff 6.0 KB

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Making plpython 2 and 3 coexist a bit better
Date: 2016-01-11 18:50:00
Message-ID: 5693F958.2080708@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg스포츠 토토 사이트SQL

On 1/11/16 11:51 AM, Tom Lane wrote:
> We could ameliorate the first of these cases by putting the can't-run-
> with-two-pythons error back up to FATAL rather than ERROR, but I'm not
> sure that that would be a net improvement --- FATAL errors aren't very
> friendly. In any case errors of the second type seem unpreventable
> unless we stick with the immediate-FATAL-error approach.

Something that's always concerned me about functions in other languages
is that any kind of snafu in the function/language can hose the backend,
which you may or may not detect. I've used other databases that (by
default) spin up a separate process for executing functions, maybe we
could do something like that? If we treated 2 and 3 as different
languages you could actually use both at the same time in a single
backend. The only thing that's not clear to me is how you'd be able to
re-enter the process during recursive/nested calls.

Obviously this is a lot more work than what you're proposing though. :(
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Making plpython 2 and 3 coexist a bit better
Date: 2016-01-11 19:00:05
Message-ID: 31659.1452538805@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg사설 토토SQL

Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com> writes:
> Something that's always concerned me about functions in other languages
> is that any kind of snafu in the function/language can hose the backend,
> which you may or may not detect. I've used other databases that (by
> default) spin up a separate process for executing functions, maybe we
> could do something like that? If we treated 2 and 3 as different
> languages you could actually use both at the same time in a single
> backend. The only thing that's not clear to me is how you'd be able to
> re-enter the process during recursive/nested calls.

There's at least one PL/Java implementation that does that. The
interprocess communication overhead is pretty awful, IIRC. Don't know
what they do about nested calls.

> Obviously this is a lot more work than what you're proposing though. :(

Yeah. I think what I'm suggesting is a back-patchable fix, which that
certainly wouldn't be.

I realized that the patch I put up before would do the Wrong Thing if
an updated library were loaded followed by a not-updated library for
the other python version. Ideally that couldn't happen, but considering
that the two plpythons are likely to get packaged in separate subpackages
(certainly Red Hat does things that way), it's not too hard to envision
cases where it would. So the attached update corrects that and fixes the
docs too. We can lose the whole test in HEAD, but I think it's necessary
in the back branches.

The question of whether to do ERROR or FATAL remains open. I'm not sure
I have a strong preference either way.

regards, tom lane

Attachment Content-Type Size
relax-python-version-conflict-test-2.patch text/x-diff 7.3 KB

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Making plpython 2 and 3 coexist a bit better
Date: 2016-01-11 19:17:20
Message-ID: 5693FFC0.8030801@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 1/11/16 1:00 PM, Tom Lane wrote:
> There's at least one PL/Java implementation that does that. The
> interprocess communication overhead is pretty awful, IIRC. Don't know
> what they do about nested calls.

You'd think that pipes wouldn't be that much overhead...

>> >Obviously this is a lot more work than what you're proposing though.:(
> Yeah. I think what I'm suggesting is a back-patchable fix, which that
> certainly wouldn't be.

Yeah, and it sounds like we need one.

> The question of whether to do ERROR or FATAL remains open. I'm not sure
> I have a strong preference either way.

If they both get loaded is there risk of bad data happening? Personally,
I'll take a traceable FATAL (or even PANIC) over data corruption every
time. But I'm guessing that if you tried to use both you'd pretty
immediately end up crashing the backend.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Making plpython 2 and 3 coexist a bit better
Date: 2016-01-11 19:46:58
Message-ID: 865.1452541618@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> writes:
> On 1/11/16 1:00 PM, Tom Lane wrote:
>> The question of whether to do ERROR or FATAL remains open. I'm not sure
>> I have a strong preference either way.

> If they both get loaded is there risk of bad data happening? Personally,
> I'll take a traceable FATAL (or even PANIC) over data corruption every
> time. But I'm guessing that if you tried to use both you'd pretty
> immediately end up crashing the backend.

Yeah, the conservative solution would definitely be to use FATAL.
It's still a usability improvement over what we have now.

In further experimentation, I found out that even with this patch in
place, the plpython transform extensions still present a dump/reload
hazard:

test=# create extension plpython2u;
CREATE EXTENSION
test=# create extension ltree_plpython3u cascade;
NOTICE: installing required extension "ltree"
NOTICE: installing required extension "plpython3u"
ERROR: multiple Python libraries are present in session
DETAIL: Only one Python major version can be used in one session.

AFAICS, the reason for that is this code in the transform extension
scripts:

-- make sure the prerequisite libraries are loaded
DO '1' LANGUAGE plpython3u;

It does appear that we need something for this, because if you just
remove it you get failures like

Symbol not found: _PLyUnicode_FromStringAndSize

But I wonder why we couldn't make it do

LOAD 'plpython3';

instead. If I change the script like that, it seems to go through fine
even with both Python libraries loaded, because CREATE TRANSFORM does not
actually call into the language implementation.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Making plpython 2 and 3 coexist a bit better
Date: 2016-01-11 23:33:19
Message-ID: 9717.1452555199@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> In further experimentation, I found out that even with this patch in
> place, the plpython transform extensions still present a dump/reload
> hazard:
> ...
> But I wonder why we couldn't make it do
> LOAD 'plpython3';

I've verified that the attached additional patch makes everything behave
nicely in this corner of the world. AFAICS this adds no assumptions
that aren't already built into pg_pltemplate.

Last call for objections ...

regards, tom lane

Attachment Content-Type Size
transform-load-fix.patch text/x-diff 4.1 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Making plpython 2 and 3 coexist a bit better
Date: 2016-01-18 17:55:24
Message-ID: 20160118175524.GL31313@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 11, 2016 at 10:44:42AM -0500, Tom Lane wrote:
> Commit 803716013dc1350f installed a safeguard against loading plpython2
> and plpython3 at the same time, stating
>
> + It is not allowed to use PL/Python based on Python 2 and PL/Python
> + based on Python 3 in the same session, because the symbols in the
> + dynamic modules would clash, which could result in crashes of the
> + PostgreSQL server process. There is a check that prevents mixing
> + Python major versions in a session, which will abort the session if
> + a mismatch is detected. It is possible, however, to use both
> + PL/Python variants in the same database, from separate sessions.
>
> It turns out though that the freedom promised in the last sentence
> is fairly illusory, because if you have functions in both languages
> in one DB and you try a dump-and-restore, it will fail.
>
> But it gets worse: a report today in pgsql-general points out that
> even if you have the two languages in use *in different databases*,
> pg_upgrade will fail, because pg_upgrade rather overenthusiastically
> loads every .so mentioned anywhere in the source installation into
> one session.

The fix for that would be for pg_upgrade to change
check_loadable_libraries() to start a new session for each LOAD command.
Would you like that done?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Making plpython 2 and 3 coexist a bit better
Date: 2016-01-18 18:10:05
Message-ID: 2342.1453140605@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> On Mon, Jan 11, 2016 at 10:44:42AM -0500, Tom Lane wrote:
>> But it gets worse: a report today in pgsql-general points out that
>> even if you have the two languages in use *in different databases*,
>> pg_upgrade will fail, because pg_upgrade rather overenthusiastically
>> loads every .so mentioned anywhere in the source installation into
>> one session.

> The fix for that would be for pg_upgrade to change
> check_loadable_libraries() to start a new session for each LOAD command.
> Would you like that done?

Not necessary anymore, at least not for PL/Python; and that solution
sounds a tad expensive.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Making plpython 2 and 3 coexist a bit better
Date: 2016-01-19 01:31:50
Message-ID: 20160119013150.GW31313@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 18, 2016 at 01:10:05PM -0500, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > On Mon, Jan 11, 2016 at 10:44:42AM -0500, Tom Lane wrote:
> >> But it gets worse: a report today in pgsql-general points out that
> >> even if you have the two languages in use *in different databases*,
> >> pg_upgrade will fail, because pg_upgrade rather overenthusiastically
> >> loads every .so mentioned anywhere in the source installation into
> >> one session.
>
> > The fix for that would be for pg_upgrade to change
> > check_loadable_libraries() to start a new session for each LOAD command.
> > Would you like that done?
>
> Not necessary anymore, at least not for PL/Python; and that solution
> sounds a tad expensive.

Yes, it would certainly be inefficient.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +