Re: check for interrupts in set_rtable_names

Lists: pgsql-hackers
From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: check for interrupts in set_rtable_names
Date: 2015-11-13 22:51:08
Message-ID: CAMkU=1zq5a74E9JSftTDuyonqnSnSOxja8961qW+qq7Ua74u5w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 캔SQL :

Someone sent my server a deranged query, it tripped my
auto_explain.log_min_duration setting, that hit some kind of
pathological case while assigning aliases, and now it sits
uninterruptibly in set_rtable_names for hours.

Is there any reason we can't check for interrupts in set_rtable_names,
like the attached?

I've tried it on a deranged query and it seems to do the job.

A deranged query:

perl -le 'print "explain"; print "select * from pgbench_accounts where
aid=5 union" foreach 1..5000; print "select * from pgbench_accounts
where aid=5 ;"'|psql

Cheers,

Jeff

Attachment Content-Type Size
rtable_names_interrupts.patch application/octet-stream 423 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: check for interrupts in set_rtable_names
Date: 2015-11-13 23:13:52
Message-ID: 8792.1447456432@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> Someone sent my server a deranged query, it tripped my
> auto_explain.log_min_duration setting, that hit some kind of
> pathological case while assigning aliases, and now it sits
> uninterruptibly in set_rtable_names for hours.

> Is there any reason we can't check for interrupts in set_rtable_names,
> like the attached?

There's probably no reason not to do that, but I'd be much more interested
in eliminating the slowness to begin with ...

regards, tom lane


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: check for interrupts in set_rtable_names
Date: 2015-11-13 23:36:53
Message-ID: CAMkU=1zdqb1a990faVY5POB3J6kbFQBBVd6z0aDsH-=e1eFf1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 13, 2015 at 3:13 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>> Someone sent my server a deranged query, it tripped my
>> auto_explain.log_min_duration setting, that hit some kind of
>> pathological case while assigning aliases, and now it sits
>> uninterruptibly in set_rtable_names for hours.
>
>> Is there any reason we can't check for interrupts in set_rtable_names,
>> like the attached?
>
> There's probably no reason not to do that, but I'd be much more interested
> in eliminating the slowness to begin with ...

I was thinking about that as well, but I don't think that would be
back-patchable, at least not the way I was envisioning it.

I was thinking of detecting bad cases (had to count to over 10 before
finding a novel name, more than 10 times) and then switching from an
object-local count, to a global count, for the numbers to add to the
name. But that would be a behavior change under some conditions.

I think the code says it clearer than prose, so crude first attempt at
that attached.

Cheers,

Jeff

Attachment Content-Type Size
rtable_names_faster.patch application/octet-stream 1.5 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: check for interrupts in set_rtable_names
Date: 2015-11-16 00:14:12
Message-ID: 16882.1447632852@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> On Fri, Nov 13, 2015 at 3:13 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> There's probably no reason not to do that, but I'd be much more interested
>> in eliminating the slowness to begin with ...

> I was thinking about that as well, but I don't think that would be
> back-patchable, at least not the way I was envisioning it.

> I was thinking of detecting bad cases (had to count to over 10 before
> finding a novel name, more than 10 times) and then switching from an
> object-local count, to a global count, for the numbers to add to the
> name. But that would be a behavior change under some conditions.

I experimented with using a hash table to avoid the O(N^2) behavior.
This seems to work quite well, and I think it doesn't change the results
(leastwise, it does not break the regression tests).

It would be possible to do something similar to dodge the O(N^2) costs
in make_colname_unique/colname_is_unique; but it would be a lot messier,
and the tests I did suggest that it's fairly hard to get into a regime
where those functions are a huge part of the runtime anyway, because
we have limits on the numbers of columns. So I'm inclined to leave that
alone.

regards, tom lane

Attachment Content-Type Size
speed-up-set-rtable-names.patch text/x-diff 5.9 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: check for interrupts in set_rtable_names
Date: 2015-11-16 16:35:13
Message-ID: 16068.1447691713@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> I experimented with using a hash table to avoid the O(N^2) behavior.
> This seems to work quite well, and I think it doesn't change the results
> (leastwise, it does not break the regression tests).

Looking at this again in the light of morning, it occurred to me that it's
pretty broken in the face of long (approaching NAMEDATALEN) input
identifiers. If the de-duplication digits are beyond the NAMEDATALEN
threshold, it would actually get into an infinite loop because hash_search
would ignore them as not part of the key. That can be fixed by truncating
the name appropriately.

However: actually, this code had a problem already with long identifiers.
What happened before was that it would blithely add digits and thereby
produce an overlength identifier, which indeed looks distinct from the
other ones in the query --- but if we're dumping a view/rule, the
identifier won't be distinct after truncation, which means that
dump/reload could fail, or even worse, silently produce something with
different semantics than intended.

So the attached updated patch takes care of that problem, not only for
relation names but also for column names.

I had been leaning towards not back-patching this, but I now feel that
we must back-patch at least the truncation fix, and we probably might as
well back-patch it in toto. Comments?

regards, tom lane

Attachment Content-Type Size
set-rtable-names-speedup-and-truncation-fix.patch text/x-diff 8.2 KB

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: check for interrupts in set_rtable_names
Date: 2015-11-16 23:52:30
Message-ID: CAMkU=1xgAGXL7z-iqH5JameU4DkqccN_6M7drBqt6Tr9cr+EHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 16, 2015 at 8:35 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> I experimented with using a hash table to avoid the O(N^2) behavior.
>> This seems to work quite well, and I think it doesn't change the results
>> (leastwise, it does not break the regression tests).
>
> Looking at this again in the light of morning, it occurred to me that it's
> pretty broken in the face of long (approaching NAMEDATALEN) input
> identifiers. If the de-duplication digits are beyond the NAMEDATALEN
> threshold, it would actually get into an infinite loop because hash_search
> would ignore them as not part of the key. That can be fixed by truncating
> the name appropriately.
>
> However: actually, this code had a problem already with long identifiers.
> What happened before was that it would blithely add digits and thereby
> produce an overlength identifier, which indeed looks distinct from the
> other ones in the query --- but if we're dumping a view/rule, the
> identifier won't be distinct after truncation, which means that
> dump/reload could fail, or even worse, silently produce something with
> different semantics than intended.
>
> So the attached updated patch takes care of that problem, not only for
> relation names but also for column names.
>
> I had been leaning towards not back-patching this, but I now feel that
> we must back-patch at least the truncation fix, and we probably might as
> well back-patch it in toto. Comments?

As-committed it has solved the problem, as best as I can tell.

Thanks,

Jeff