From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | John Naylor <john(dot)naylor(at)2ndquadrant(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: benchmarking Flex practices |
Date: | 2019-11-26 15:32:29 |
Message-ID: | 30156.1574782349@sss.pgh.pa.us |
Lists: | pgsql-hackers |
John Naylor <john(dot)naylor(at)2ndquadrant(dot)com> writes:
> It seems something is not quite right in v9 with the error position reporting:
> SELECT U&'wrong: +0061' UESCAPE '+';
> ERROR: invalid Unicode escape character at or near "'+'"
> LINE 1: SELECT U&'wrong: +0061' UESCAPE '+';
> - ^
> + ^
> The caret is not pointing to the third token, or the second for that
> matter.
Interesting. For me it points at the third token with or without
your fix ... some flex version discrepancy maybe? Anyway, I have
no objection to your fix; it's probably cleaner than what I had.
>> * I did not do more with ecpg than get it to compile, using the
>> same hacks as in your v7. It still fails its regression tests,
>> but now the reason is that what we've done in parser/parser.c
>> needs to be transposed into the identical functionality in
>> ecpg/preproc/parser.c. Or at least some kind of functionality
>> there. A problem with this approach is that it presumes we can
>> reduce a UIDENT sequence to a plain IDENT, but to do so we need
>> assumptions about the target encoding, and I'm not sure that
>> ecpg should make any such assumptions. Maybe ecpg should just
>> reject all cases that produce non-ASCII identifiers? (Probably
>> it could be made to do something smarter with more work, but
>> it's not clear to me that it's worth the trouble.)
> Hmm, I thought we only allowed Unicode escapes in the first place if
> the server encoding was utf-8. Or did you mean something else?
Well, yeah, but the problem here is that ecpg would have to assume
that the client encoding that its output program will be executed
with is utf-8. That seems pretty action-at-a-distance-y.
I haven't looked closely at what ecpg does with the processed
identifiers. If it just spits them out as-is, a possible solution
is to not do anything about de-escaping, but pass the sequence
U&"..." (plus UESCAPE ... if any), just like that, on to the grammar
as the value of the IDENT token.
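The two options for ecpg mentioned in this thread can be made concrete with a minimal C sketch: de-escape only when the result is pure ASCII (rejecting everything else), or pass the original `U&"..."` spelling through to the grammar untouched. Everything here is hypothetical. The names `deescape_uident_ascii` and `passthrough_uident` are invented for illustration and are not ecpg or PostgreSQL functions. The sketch assumes the escape syntax described in the PostgreSQL documentation: the escape character (backslash by default) is followed by four hex digits, or by `+` and six hex digits, and a doubled escape character stands for itself. It also assumes the identifier body contains no embedded double quotes. The whitespace check uses C `isspace()`, which only approximates the scanner's own notion of whitespace.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Outcomes of the hypothetical ASCII-only de-escaping helper below. */
enum uident_result {
    UIDENT_ASCII_OK,        /* fully de-escaped, result is plain ASCII */
    UIDENT_NON_ASCII,       /* well-formed, but decodes to a non-ASCII char */
    UIDENT_BAD_ESCAPE_CHAR, /* UESCAPE character is not allowed */
    UIDENT_BAD_SEQUENCE,    /* malformed escape, or U+0000 */
    UIDENT_TOO_LONG         /* output buffer too small */
};

/* Documented rule: not a hex digit, '+', quote, double quote or whitespace. */
static int valid_uescape_char(char c)
{
    return !(isxdigit((unsigned char) c) || c == '+' || c == '\'' ||
             c == '"' || isspace((unsigned char) c));
}

/* Read exactly n hex digits starting at s; stops safely at the terminating NUL. */
static int parse_hex(const char *s, int n, unsigned long *cp)
{
    unsigned long v = 0;

    for (int i = 0; i < n; i++) {
        int c = (unsigned char) s[i];

        if (!isxdigit(c))
            return 0;
        v = v * 16 + (unsigned long) (isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    }
    *cp = v;
    return 1;
}

/*
 * Hypothetical: de-escape the body of U&"..." using escape character esc,
 * succeeding only if every resulting character is ASCII, so no assumption
 * about the client encoding is needed.
 */
static enum uident_result
deescape_uident_ascii(const char *body, char esc, char *out, size_t outlen)
{
    size_t n = 0;

    assert(out != NULL && outlen > 0);
    if (!valid_uescape_char(esc))
        return UIDENT_BAD_ESCAPE_CHAR;

    for (const char *p = body; *p != '\0';) {
        unsigned long cp;

        if (*p != esc) {                /* ordinary character */
            cp = (unsigned char) *p;
            p += 1;
        } else if (p[1] == esc) {       /* doubled escape: literal esc */
            cp = (unsigned char) esc;
            p += 2;
        } else if (p[1] == '+') {       /* esc + six hex digits */
            if (!parse_hex(p + 2, 6, &cp))
                return UIDENT_BAD_SEQUENCE;
            p += 8;
        } else {                        /* esc followed by four hex digits */
            if (!parse_hex(p + 1, 4, &cp))
                return UIDENT_BAD_SEQUENCE;
            p += 5;
        }
        if (cp == 0)                    /* keep the output a valid C string */
            return UIDENT_BAD_SEQUENCE;
        if (cp > 0x7F)                  /* the "reject non-ASCII" option */
            return UIDENT_NON_ASCII;
        if (n + 1 >= outlen)
            return UIDENT_TOO_LONG;
        out[n++] = (char) cp;
    }
    out[n] = '\0';
    return UIDENT_ASCII_OK;
}

/* Hypothetical pass-through: rebuild the original spelling for the grammar. */
static int passthrough_uident(const char *body, char esc, int have_uescape,
                              char *out, size_t outlen)
{
    int len = have_uescape
        ? snprintf(out, outlen, "U&\"%s\" UESCAPE '%c'", body, esc)
        : snprintf(out, outlen, "U&\"%s\"", body);

    return len >= 0 && (size_t) len < outlen;
}
```

Under these rules `U&"d\0061t\+000061"` de-escapes to `data`, an escape such as `\00e9` is refused as non-ASCII, and `UESCAPE '+'` is rejected up front, which is the same condition behind the "invalid Unicode escape character" error quoted at the top of this message. The pass-through variant makes no encoding decision at all and simply hands the original text on as the IDENT value.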
BTW, in the back of my mind here is Chapman's point that it'd be
a large step forward in usability if we allowed Unicode escapes
when the backend encoding is *not* utf-8. I think I see how to
get there once this patch is done, so I definitely would not like
to introduce some comparable restriction in ecpg.
regards, tom lane