From: Thomas Heller <info(at)zilence(dot)net>
To: pgsql-interfaces(at)postgresql(dot)org
Subject: Protocol Question
Date: 2014-08-11 14:03:58
Message-ID: CAGTxmOvGwPoBC01-XnS-1cTNE29T2+QRuheQBPSrKNpbgVpmXQ@mail.gmail.com
Lists: pgsql-interfaces
Hello,
I wrote a new PostgreSQL client in Java that talks to the server directly,
without the mess that is JDBC.
Anyway, I have a fairly low-level question:
The first 16 bits (well, after skipping the first 40) in every DataRow ('D')
packet give the number of columns that follow. Is there any way this number
can differ from the number received in the RowDescription ('T')? I only
plan to support the Extended Query flow, so I assume I'll always have a
RowDescription first.
Currently I use an assert since I assume it always is the same, however I
only have tested with a very limited dataset so far.
Just curious if there are possible exceptions? I'd assume not, just sanity
checking.
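For concreteness, here is a minimal Java sketch of the check being discussed.
The message layout comes from the protocol docs ('D' type byte, int32 length,
int16 field count); the method shape and names are hypothetical, not taken
from any existing client:

    // Reads one DataRow ('D') body, after the type byte has been consumed.
    // expectedFields is the count from the preceding RowDescription ('T').
    void readDataRow(DataInputStream in, int expectedFields) throws IOException {
        int msgLen = in.readInt();          // int32 message length (includes itself)
        short fieldCount = in.readShort();  // the int16 after the first 40 bits
        if (fieldCount != expectedFields) {
            throw new IOException("unexpected field count: " + fieldCount
                    + ", expected " + expectedFields);
        }
        // ... then read fieldCount (int32 length, byte[]) pairs ...
    }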
Thanks,
/thomas
PS: Is there some sort of test-suite or dataset I could use to test my
client implementation?
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Heller <info(at)zilence(dot)net>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Protocol Question
Date: 2014-08-11 15:06:19
Message-ID: 22826.1407769579@sss.pgh.pa.us
Lists: pgsql-interfaces
Thomas Heller <info(at)zilence(dot)net> writes:
> The first 16 bits (well, after skipping the first 40) in every DataRow ('D')
> packet give the number of columns that follow. Is there any way this number
> can differ from the number received in the RowDescription ('T')?
A quick look into libpq says that it doesn't support them being different:
    /* Get the field count and make sure it's what we expect */
    if (pqGetInt(&tupnfields, 2, conn))
    {
        /* We should not run out of data here, so complain */
        errmsg = libpq_gettext("insufficient data in \"D\" message");
        goto advance_and_error;
    }

    if (tupnfields != nfields)
    {
        errmsg = libpq_gettext("unexpected field count in \"D\" message");
        goto advance_and_error;
    }
> Currently I use an assert since I assume it always is the same, however I
> only have tested with a very limited dataset so far.
An assert seems overly optimistic. libpq has an explicit test with an
error recovery path; you should too.
regards, tom lane
From: Thomas Heller <info(at)zilence(dot)net>
To: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Protocol Question
Date: 2014-08-12 15:58:45
Message-ID: CAGTxmOuW94eHab4==QxzPFxVkBB16jQ2UXpYH7CRjH3r+vimNg@mail.gmail.com
Lists: pgsql-interfaces
Thanks, explicit test included.
Another quick question about the request flow.
In an Extended Query lifecycle, in order to prepare a query I send the
commands
Parse('P') / Describe('D') / Sync('S')
read 1/t/T/Z, then to execute
Bind('B') / Execute('E') / Flush('H')
then read until either CommandComplete('C') or PortalSuspended('s').
On 's' I send another 'E' (this only happens when I executed with a limit);
once I arrive at 'C' I send the final 'S'.
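In code form, the flow above might look roughly like this (a sketch only:
the send*/readMessage helpers and the Message type are hypothetical, and
Tom's reply below explains why terminating with Sync instead of Flush is
safer):

    sendParse("stmt", sql, new int[0]);        // 'P'
    sendDescribeStatement("stmt");             // 'D'
    sendSync();                                // 'S'
    readUntilReadyForQuery();                  // consume 1/t/T/Z

    sendBind("portal", "stmt", params);        // 'B'
    sendExecute("portal", 100);                // 'E' with a row limit of 100
    sendFlush();                               // 'H'
    Message msg;
    while ((msg = readMessage()).type != 'C') {    // CommandComplete
        if (msg.type == 's') {                 // PortalSuspended
            sendExecute("portal", 100);        // ask for the next batch
            sendFlush();
        } else if (msg.type == 'D') {
            handleRow(msg);                    // DataRow
        }
    }
    sendSync();                                // final 'S', then read 'Z'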
That all works perfectly fine; I am unsure, however, about the 'H' (Flush)
part. I looked at a tcpdump of a Ruby and a pgjdbc query flow but couldn't
figure out how to execute with a limit. Anyway, they send:
P/B/D/E/S
"The Flush message does not cause any specific output to be generated, but
forces the backend to deliver any data pending in its output buffers. A
Flush must be sent after any extended-query command except Sync, if the
frontend wishes to examine the results of that command before issuing more
commands. Without Flush, messages returned by the backend will be combined
into the minimum possible number of packets to minimize network overhead."
If I skip the Flush after Execute I receive no data; if I Execute and Sync
I receive the limit of rows and a ReadyForQuery('Z').
Is it OK to send a Flush, or should I always Sync? I'm leaning towards Sync,
but I don't really need the ReadyForQuery because I'm still reading. (This
only relates to Execute with a limit.)
Hope that wasn't too confusing, just trying to cover an edge case.
Regards,
/thomas
On Mon, Aug 11, 2014 at 5:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Heller <info(at)zilence(dot)net> writes:
> > The first 16 bits (well, after skipping the first 40) in every DataRow
> > ('D') packet give the number of columns that follow. Is there any way this
> > number can differ from the number received in the RowDescription ('T')?
>
> A quick look into libpq says that it doesn't support them being different:
>
>     /* Get the field count and make sure it's what we expect */
>     if (pqGetInt(&tupnfields, 2, conn))
>     {
>         /* We should not run out of data here, so complain */
>         errmsg = libpq_gettext("insufficient data in \"D\" message");
>         goto advance_and_error;
>     }
>
>     if (tupnfields != nfields)
>     {
>         errmsg = libpq_gettext("unexpected field count in \"D\" message");
>         goto advance_and_error;
>     }
>
> > Currently I use an assert since I assume it always is the same, however I
> > only have tested with a very limited dataset so far.
>
> An assert seems overly optimistic. libpq has an explicit test with an
> error recovery path; you should too.
>
> regards, tom lane
>
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Heller <info(at)zilence(dot)net>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Protocol Question
Date: 2014-08-12 22:08:48
Message-ID: 9844.1407881328@sss.pgh.pa.us
Lists: pgsql-interfaces
Thomas Heller <info(at)zilence(dot)net> writes:
> In an Extended Query Lifecycle, in order to prepare a query I send the
> Commands
> Parse('P') / Describe('D') / Sync('S')
> read 1/t/T/Z then to execute
> Bind('B') / Execute('E') / Flush('H')
This is not a good idea. You *need* to use Sync to terminate a command
sequence in order to be sure of proper error recovery (because if there's
an error during the Execute, the backend will discard subsequent messages
until it sees Sync).
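(In client terms, the recovery Tom describes could look like the following
sketch, with hypothetical readMessage/parseError helpers: after a batch
ending in Sync, always drain to ReadyForQuery, and only then surface any
error.)

    Message msg;
    ServerError error = null;                    // hypothetical exception type
    while ((msg = readMessage()).type != 'Z') {  // 'Z' = ReadyForQuery
        if (msg.type == 'E') {                   // ErrorResponse
            error = parseError(msg);             // remember it, keep draining
        } else if (msg.type == 'D' && error == null) {
            handleRow(msg);                      // DataRow
        }
        // ... other message types as appropriate ...
    }
    if (error != null) throw error;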
> If I skip the Flush after Execute I receive no data; if I Execute and Sync
> I receive the limit of rows and a ReadyForQuery('Z').
That's probably because you're not wrapping this in a transaction so the
Sync implicitly does a commit, discarding the open portal. If you want
to read from a portal in multiple steps then you should issue a BEGIN
first and a COMMIT (or ROLLBACK) after you're done. However, have you
considered just processing the data on-the-fly instead of using a limit?
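(A sketch of that transaction-wrapped variant, again with hypothetical
helpers: inside BEGIN ... COMMIT the portal survives each Sync, so the
client gets batched reads and proper error recovery at the same time.)

    simpleQuery("BEGIN");               // 'Q', read CommandComplete + 'Z'
    sendParse("", sql, new int[0]);     // 'P'
    sendBind("p1", "", params);         // 'B'
    sendExecute("p1", 100);             // 'E', first 100 rows
    sendSync();                         // safe: the open transaction keeps p1 alive
    readBatch();                        // D*, then 's' (suspended), then 'Z'

    sendExecute("p1", 100);             // next batch
    sendSync();
    readBatch();                        // repeat until 'C' arrives instead of 's'

    sendClosePortal("p1");              // frontend Close ('C') for the portal
    sendSync();
    simpleQuery("COMMIT");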
regards, tom lane
From: Thomas Heller <info(at)zilence(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Protocol Question
Date: 2014-08-13 00:09:40
Message-ID: CAGTxmOv3V3wa_yQO+RfXdXXkOw5qQ5gvHoiMhaL_8RG0kyyr8A@mail.gmail.com
Lists: pgsql-interfaces
I tried sending P/D/S...B/E/S...E/S... but the second Execute fails because
the portal no longer exists.
So I need a BEGIN/COMMIT for a SELECT in order to read in multiple steps?
I can process the data on-the-fly; my motivation for using Execute with a
limit was to introduce some kind of control over how fast things travel over
the wire. I'm not sure how the backend handles large result sets and
clients that can't keep up. Say I have a SELECT that returns 100k rows, and
after 60k rows the client runs into some sort of resource limit. Since the
backend just dumps everything on the wire, do I now have to continue reading
and discarding the remaining rows? I doubt I can Close the statement early,
since the buffers on the backend should be full with more rows to send to
the frontend. I don't expect to run into these kinds of situations very
often; I just want the client to "do the right thing". If I run into some
sort of hard limit I can just terminate the connection, since I'd probably
have other things to worry about than reusing a connection.
Anyway, I might be overthinking things. The "standard" flow of
P/D/S.../B/E/S is probably the way to go then.
Regards,
/thomas
PS: The whole thing is available at https://github.com/thheller/shadow-pgsql
but be warned, it's very alpha.
On Wed, Aug 13, 2014 at 12:08 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Heller <info(at)zilence(dot)net> writes:
> > In an Extended Query Lifecycle, in order to prepare a query I send the
> > Commands
>
> > Parse('P') / Describe('D') / Sync('S')
>
> > read 1/t/T/Z then to execute
>
> > Bind('B') / Execute('E') / Flush('H')
>
> This is not a good idea. You *need* to use Sync to terminate a command
> sequence in order to be sure of proper error recovery (because if there's
> an error during the Execute, the backend will discard subsequent messages
> until it sees Sync).
>
> > If I skip the Flush after Execute I receive no data; if I Execute and Sync
> > I receive the limit of rows and a ReadyForQuery('Z').
>
> That's probably because you're not wrapping this in a transaction so the
> Sync implicitly does a commit, discarding the open portal. If you want
> to read from a portal in multiple steps then you should issue a BEGIN
> first and a COMMIT (or ROLLBACK) after you're done. However, have you
> considered just processing the data on-the-fly instead of using a limit?
>
> regards, tom lane
>
From: Alistair Bayley <alistair(at)abayley(dot)org>
To: Thomas Heller <info(at)zilence(dot)net>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Protocol Question
Date: 2014-08-14 04:44:50
Message-ID: CAKYODbh7uOohLUbJoEYNKLuqN=AMWtVYRNfSHhntD2wO-jBTDA@mail.gmail.com
Lists: pgsql-interfaces
Why wouldn't TCP be sufficient to throttle the data rate from the server?
If the client stops/slows consuming bytes from its socket, TCP will make
the server slow its sending.
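(In other words, a client that reads and processes one row at a time gets
flow control for free. A sketch, with hypothetical helpers:)

    // Process rows as they arrive instead of buffering the whole result set.
    // If handleRow() is slow, the socket's receive buffer fills, the TCP
    // window closes, and the server blocks until the client catches up.
    Message msg;
    while ((msg = readMessage()).type != 'C') {   // until CommandComplete
        if (msg.type == 'D') {
            handleRow(msg);                       // DataRow
        }
    }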
On 13 August 2014 12:09, Thomas Heller <info(at)zilence(dot)net> wrote:
> I tried sending P/D/S...B/E/S...E/S... but the second Execute fails
> because the portal no longer exists.
>
> So I need a BEGIN/COMMIT for a SELECT in order to read in multiple steps?
>
> I can process the data on-the-fly, my motivation for using Execute with a
> limit was to introduce some kind of control on how fast things travel over
> the wire. I'm not sure how the backend handles large result sets and
> clients that can't keep up. Say I have a SELECT that returns 100k rows, and
> after 60k rows the client runs into some sort of resource limit. Since the
> backend just dumps everything on the wire, do I now have to continue reading
> and discarding the remaining rows? I doubt I can Close the statement early,
> since the buffers on the backend should be full with more rows to send to
> the frontend. I don't expect to run into these kinds of situations very
> often; I just want the client to "do the right thing". If I run into some sort of
> hard limit I can just terminate the connection since I probably have other
> things to worry about than reusing a connection.
>
>
> Anyways, I might be overthinking things. The "standard" flow of
> P/D/S.../B/E/S is probably the way to go then.
>
> Regards,
> /thomas
>
> PS: The whole thing is available at
> https://github.com/thheller/shadow-pgsql but be warned, it's very alpha.
>
>
>
>
>
> On Wed, Aug 13, 2014 at 12:08 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Thomas Heller <info(at)zilence(dot)net> writes:
>> > In an Extended Query Lifecycle, in order to prepare a query I send the
>> > Commands
>>
>> > Parse('P') / Describe('D') / Sync('S')
>>
>> > read 1/t/T/Z then to execute
>>
>> > Bind('B') / Execute('E') / Flush('H')
>>
>> This is not a good idea. You *need* to use Sync to terminate a command
>> sequence in order to be sure of proper error recovery (because if there's
>> an error during the Execute, the backend will discard subsequent messages
>> until it sees Sync).
>>
>> > If I skip the Flush after Execute I receive no data; if I Execute and Sync
>> > I receive the limit of rows and a ReadyForQuery('Z').
>>
>> That's probably because you're not wrapping this in a transaction so the
>> Sync implicitly does a commit, discarding the open portal. If you want
>> to read from a portal in multiple steps then you should issue a BEGIN
>> first and a COMMIT (or ROLLBACK) after you're done. However, have you
>> considered just processing the data on-the-fly instead of using a limit?
>>
>> regards, tom lane
>>
>
>
From: Thomas Heller <info(at)zilence(dot)net>
To: Alistair Bayley <alistair(at)abayley(dot)org>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Protocol Question
Date: 2014-08-14 08:00:27
Message-ID: CAGTxmOug+0mheOYfxiZX65Vno3E641EBwubRCmhezBD7LKnQbg@mail.gmail.com
Lists: pgsql-interfaces
Yeah, I went with that and let the server send me everything. If more
control is required, a cursor can be used.
Thanks for the feedback.
On to implementing the types. I assume the only place the binary encoding is
documented is the source? I can't find anything anywhere else.
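(The usual reference is the per-type send/recv functions in
src/backend/utils/adt/, e.g. int4send/int4recv. As one concrete, hedged
example: binary-format int4 is simply four bytes in network byte order, so
in Java:)

    import java.nio.ByteBuffer;

    // Decode a binary-format int4 field; the field length comes from the
    // DataRow message itself.
    static int decodeInt4(byte[] field) {
        if (field.length != 4) {
            throw new IllegalArgumentException(
                    "int4 must be 4 bytes, got " + field.length);
        }
        return ByteBuffer.wrap(field).getInt();  // ByteBuffer is big-endian by default
    }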
Regards,
/thomas
On Thu, Aug 14, 2014 at 6:44 AM, Alistair Bayley <alistair(at)abayley(dot)org>
wrote:
> Why wouldn't TCP be sufficient to throttle the data rate from the server?
> If the client stops/slows consuming bytes from its socket, TCP will make
> the server slow its sending.
>
>
> On 13 August 2014 12:09, Thomas Heller <info(at)zilence(dot)net> wrote:
>
>> I tried sending P/D/S...B/E/S...E/S... but the second Execute fails
>> because the portal no longer exists.
>>
>> So I need a BEGIN/COMMIT for a SELECT in order to read in multiple steps?
>>
>> I can process the data on-the-fly, my motivation for using Execute with
>> a limit was to introduce some kind of control on how fast things travel
>> over the wire. I'm not sure how the backend handles large result sets and
>> clients that can't keep up. Say I have a SELECT that returns 100k rows, and
>> after 60k rows the client runs into some sort of resource limit. Since the
>> backend just dumps everything on the wire, do I now have to continue reading
>> and discarding the remaining rows? I doubt I can Close the statement early,
>> since the buffers on the backend should be full with more rows to send to
>> the frontend. I don't expect to run into these kinds of situations very
>> often; I just want the client to "do the right thing". If I run into some sort of
>> hard limit I can just terminate the connection since I probably have other
>> things to worry about than reusing a connection.
>>
>>
>> Anyways, I might be overthinking things. The "standard" flow of
>> P/D/S.../B/E/S is probably the way to go then.
>>
>> Regards,
>> /thomas
>>
>> PS: The whole thing is available at
>> https://github.com/thheller/shadow-pgsql but be warned, it's very alpha.
>>
>>
>>
>>
>>
>> On Wed, Aug 13, 2014 at 12:08 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>>> Thomas Heller <info(at)zilence(dot)net> writes:
>>> > In an Extended Query Lifecycle, in order to prepare a query I send the
>>> > Commands
>>>
>>> > Parse('P') / Describe('D') / Sync('S')
>>>
>>> > read 1/t/T/Z then to execute
>>>
>>> > Bind('B') / Execute('E') / Flush('H')
>>>
>>> This is not a good idea. You *need* to use Sync to terminate a command
>>> sequence in order to be sure of proper error recovery (because if there's
>>> an error during the Execute, the backend will discard subsequent messages
>>> until it sees Sync).
>>>
>>> > If I skip the Flush after Execute I receive no data; if I Execute and Sync
>>> > I receive the limit of rows and a ReadyForQuery('Z').
>>>
>>> That's probably because you're not wrapping this in a transaction so the
>>> Sync implicitly does a commit, discarding the open portal. If you want
>>> to read from a portal in multiple steps then you should issue a BEGIN
>>> first and a COMMIT (or ROLLBACK) after you're done. However, have you
>>> considered just processing the data on-the-fly instead of using a limit?
>>>
>>> regards, tom lane
>>>
>>
>>
>
From: Thomas Heller <info(at)zilence(dot)net>
To: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Protocol Question
Date: 2014-08-16 10:01:13
Message-ID: CAGTxmOt3TO6WsUtB0+jt20O_FzvOPw47q2bn=tO3oRYq9M09XA@mail.gmail.com
Lists: pgsql-interfaces
I just encountered a weird protocol error while stress testing my client.
I'm receiving unexpected 'n'/NoData messages while reading 'D'/DataRow
messages.
Basically I connect, prepare a query, execute, close, terminate, repeat.
The command flow is:
-> P/D/S
<- 1/t/T/Z
-> B/E/S
<- 2/D+/C/Z
-> C/S
<- 3/Z
-> X, close socket
What's weird is that I normally expect 2/D+/C/Z as the result of the
execute, which is what I get most of the time. But sometimes I receive an
interleaved 'n'/NoData. According to the protocol documentation, NoData is
only sent in response to a Describe of a statement that returns no rows. If
I just ignore the message and continue reading, the query completes normally.
Can someone explain why I get an 'n'/NoData here? Not a problem really, just
curious.
Thanks,
/thomas
On Thu, Aug 14, 2014 at 10:00 AM, Thomas Heller <info(at)zilence(dot)net> wrote:
> Yeah, I went with that and let the server send me everything. If more
> control is required, a cursor can be used.
>
> Thanks for the feedback.
>
> On to implementing the types. I assume the only place the binary encoding
> is documented is the source? I can't find anything anywhere else.
>
> Regards,
> /thomas
>
>
>
> On Thu, Aug 14, 2014 at 6:44 AM, Alistair Bayley <alistair(at)abayley(dot)org>
> wrote:
>
>> Why wouldn't TCP be sufficient to throttle the data rate from the server?
>> If the client stops/slows consuming bytes from its socket, TCP will make
>> the server slow its sending.
>>
>>
>> On 13 August 2014 12:09, Thomas Heller <info(at)zilence(dot)net> wrote:
>>
>>> I tried sending P/D/S...B/E/S...E/S... but the second Execute fails
>>> because the portal no longer exists.
>>>
>>> So I need a BEGIN/COMMIT for a SELECT in order to read in multiple steps?
>>>
>>> I can process the data on-the-fly, my motivation for using Execute
>>> with a limit was to introduce some kind of control on how fast things
>>> travel over the wire. I'm not sure how the backend handles large result
>>> sets and clients that can't keep up. Say I have a SELECT that returns 100k
>>> rows, and after 60k rows the client runs into some sort of resource limit.
>>> Since the backend just dumps everything on the wire, do I now have to
>>> continue reading and discarding the remaining rows? I doubt I can Close the
>>> statement early, since the buffers on the backend should be full with more
>>> rows to send to the frontend. I don't expect to run into these kinds of
>>> situations very often; I just want the client to "do the right thing". If I run into
>>> some sort of hard limit I can just terminate the connection since I
>>> probably have other things to worry about than reusing a connection.
>>>
>>>
>>> Anyways, I might be overthinking things. The "standard" flow of
>>> P/D/S.../B/E/S is probably the way to go then.
>>>
>>> Regards,
>>> /thomas
>>>
>>> PS: The whole thing is available at
>>> https://github.com/thheller/shadow-pgsql but be warned, it's very alpha.
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Aug 13, 2014 at 12:08 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>
>>>> Thomas Heller <info(at)zilence(dot)net> writes:
>>>> > In an Extended Query Lifecycle, in order to prepare a query I send the
>>>> > Commands
>>>>
>>>> > Parse('P') / Describe('D') / Sync('S')
>>>>
>>>> > read 1/t/T/Z then to execute
>>>>
>>>> > Bind('B') / Execute('E') / Flush('H')
>>>>
>>>> This is not a good idea. You *need* to use Sync to terminate a command
>>>> sequence in order to be sure of proper error recovery (because if there's
>>>> an error during the Execute, the backend will discard subsequent messages
>>>> until it sees Sync).
>>>>
>>>> > If I skip the Flush after Execute I receive no data; if I Execute and Sync
>>>> > I receive the limit of rows and a ReadyForQuery('Z').
>>>>
>>>> That's probably because you're not wrapping this in a transaction so the
>>>> Sync implicitly does a commit, discarding the open portal. If you want
>>>> to read from a portal in multiple steps then you should issue a BEGIN
>>>> first and a COMMIT (or ROLLBACK) after you're done. However, have you
>>>> considered just processing the data on-the-fly instead of using a limit?
>>>>
>>>> regards, tom lane
>>>>
>>>
>>>
>>
>
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Heller <info(at)zilence(dot)net>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Protocol Question
Date: 2014-08-18 05:55:35
Message-ID: 27176.1408341335@sss.pgh.pa.us
Lists: pgsql-interfaces
Thomas Heller <info(at)zilence(dot)net> writes:
> I just encountered a weird protocol error while stress testing my client.
> I'm receiving unexpected 'n'/NoData messages while reading 'D'/DataRow
> messages.
AFAICS, NoData can only be returned in response to Describe Statement
or Describe Portal issued against a command that cannot return rows.
If you can provide a counterexample for this claim, I'd be interested
to see it. But my money is on confusion on the client side.
regards, tom lane
From: Thomas Heller <info(at)zilence(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Protocol Question
Date: 2014-08-18 09:12:14
Message-ID: CAGTxmOsMraAf=aX0bSQtTLEHtTfzOC6uF1=wjqaU+TWB3qY-YQ@mail.gmail.com
Lists: pgsql-interfaces
Thanks Tom, you were in fact correct.
The client tripped over itself and forgot to finish reading some bytes when
reading a binary text field. How that ended up looking like a NoData message
is beyond me, but I have an extra check in place now that ensures every type
decoder consumes exactly the bytes received for its field. So that's another
potential source of errors fixed.
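(A sketch of that kind of framing check, with hypothetical names: read
exactly the declared field length up front and hand the decoder only those
bytes, so a buggy decoder can never desynchronize the message stream.)

    // Each DataRow field is an int32 length followed by exactly that many
    // bytes. Consuming too few or too many bytes desynchronizes the framing,
    // and stray bytes can then be misread as a message type such as 'n'/NoData.
    Object readField(DataInputStream in, TypeDecoder decoder) throws IOException {
        int len = in.readInt();            // -1 means NULL
        if (len == -1) return null;
        byte[] field = new byte[len];
        in.readFully(field);               // consume exactly len bytes
        return decoder.decode(field);      // decoder sees only this field's bytes
    }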
Thanks again,
/thomas
On Mon, Aug 18, 2014 at 7:55 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Heller <info(at)zilence(dot)net> writes:
> > I just encountered a weird protocol error while stress testing my client.
> > I'm receiving unexpected 'n'/NoData messages while reading 'D'/DataRow
> > messages.
>
> AFAICS, NoData can only be returned in response to Describe Statement
> or Describe Portal issued against a command that cannot return rows.
>
> If you can provide a counterexample for this claim, I'd be interested
> to see it. But my money is on confusion on the client side.
>
> regards, tom lane
>