[Pljava-dev] UDT send and receive

Lists:	pljava-dev

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] readBytes() / writeBytes()
Date:	2006-09-21 20:28:20
Message-ID:	4512F5E4.9060600@logix-tt.com
Lists:	pljava-dev

Hi,

I just stumbled upon the readBytes() / writeBytes() methods from
SQLInputFromChunk / SQLOutputFromChunk classes.

They seem to assume that every array has a 2-byte length header.

Where does this assumption come from? Are there some internal data types
that use this format?

I ask because I'm currently writing a pljava interface for the PostGIS
datatypes, and my hope was that readBytes() just reads the whole
(variable-length) datatype into the byte array.

Another issue is the endianness. For double, I assume that, at least on
IEEE floating point platforms, the layout is endian free. But readInt()
etc. seem to use big endian regardless of the underlying platform.

So it seems that I have to use readByte() and put everything together in
the right order by myself, right?
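A minimal sketch of that byte-by-byte approach, assuming the stored values are little-endian (as on i386). `ByteSource` is a hypothetical stand-in for the `readByte()` method of an SQLInput, not a PL/Java type.

```java
/** Hypothetical stand-in for SQLInput.readByte(); not part of PL/Java. */
interface ByteSource {
    byte readByte();
}

/** Reassembles little-endian values from single bytes. */
final class LittleEndianReader {
    private LittleEndianReader() {}

    /** Builds a 32-bit int from four bytes, least significant byte first. */
    static int readIntLE(ByteSource in) {
        int value = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            // Mask with 0xFF so a negative byte does not sign-extend into the int.
            value |= (in.readByte() & 0xFF) << shift;
        }
        return value;
    }

    /** Builds a 64-bit long from eight bytes, least significant byte first. */
    static long readLongLE(ByteSource in) {
        long value = 0L;
        for (int shift = 0; shift < 64; shift += 8) {
            value |= (long) (in.readByte() & 0xFF) << shift;
        }
        return value;
    }

    /** An IEEE 754 double is reassembled through its 64-bit bit pattern. */
    static double readDoubleLE(ByteSource in) {
        return Double.longBitsToDouble(readLongLE(in));
    }
}
```

Each value costs one readByte() call per byte, which is the per-byte overhead discussed later in the thread.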

Btw, does the stream include the 4-byte length header for variable-length
datatypes? The examples seem to cover only fixed-size types.

Thanks,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 252 bytes
Desc: OpenPGP digital signature
URL: <http://lists.pgfoundry.org/pipermail/pljava-dev/attachments/20060921/fb5f2f43/attachment.bin>

From:	thomas at tada(dot)se (Thomas Hallgren)
To:
Subject:	[Pljava-dev] readBytes() / writeBytes()
Date:	2006-09-22 08:17:27
Message-ID:	45139C17.3050006@tada.se
Lists:	pljava-dev

Markus Schaber wrote:
> Hi,
>
> I just stumbled upon the readBytes() / writeBytes() methods from
> SQLInputFromChunk / SQLOutputFromChunk classes.
>
> They seem to assume that every array has a 2-byte length header.
>
Those classes are part of the JNI interface. They map to the internal PostgreSQL StringInfo
structure. They are *not* intended for any other use.

Kind Regards,
Thomas Hallgren

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] readBytes() / writeBytes()
Date:	2006-09-22 08:38:31
Message-ID:	4513A107.4060809@logix-tt.com
Lists:	pljava-dev

Hi, Thomas,

Thomas Hallgren wrote:

>> I just stumbled upon the readBytes() / writeBytes() methods from
>> SQLInputFromChunk / SQLOutputFromChunk classes.
>>
>> They seem to assume that every array has a 2-byte length header.
>>
> Those classes are part of the JNI interface. They map to the internal
> PostgreSQL StringInfo structure. They are *not* intended for any other use.

Ok, it seems I've been misled by reading the docs and source, sorry.
Please tell me, what's their use, if not implementing datatype mappings?

As far as I can see, they're the only implementations of the
SQLInput/ SQLOutput interfaces one uses to implement a custom data type.

So, when mapping a datatype (like the ComplexTuple example delivered
with pljava), one is actually using those classes.

And as the storage format of the PostGIS datatypes is determined by the
C implementation in liblwgeom.so, I have to adhere to that format.

I have to understand how SQLInputFromChunk and SQLOutputFromChunk
work, to find the most efficient implementation while still being
compatible with that predefined binary format.

As it looks now, my only possibility seems to find out the native
endianness of the machine, and then read the data byte-for-byte via
readByte(), and then put it together for myself. Most of the code should
be copyable from the WKB parsers I've already written, so no real
effort, but I'm still afraid that calling a native function per byte
will not perform best.

Thanks for your patience,
Markus

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] readBytes() / writeBytes()
Date:	2006-09-22 08:48:49
Message-ID:	4513A371.1050401@logix-tt.com
Lists:	pljava-dev

Hi, Thomas,

Markus Schaber wrote:
> So, when mapping a datatype (like the ComplexTuple example delivered
> with pljava), one is actually using those classes.

Sorry, I meant the ComplexScalar interface, as the PostGIS geometries
are not composite types from PostgreSQL's point of view.

Sadly, there are no examples for custom datatypes with variable size, so
I have to extrapolate from the given fixed-length examples, and the
details I know about the C implementation of variable-sized datatypes.

Thanks,
Markus

From:	thomas at tada(dot)se (Thomas Hallgren)
To:
Subject:	[Pljava-dev] readBytes() / writeBytes()
Date:	2006-09-22 08:50:53
Message-ID:	4513A3ED.3050401@tada.se
Lists:	pljava-dev

Markus Schaber wrote:
> Hi, Thomas,
>
> Thomas Hallgren wrote:
>
>
>>> I just stumbled upon the readBytes() / writeBytes() methods from
>>> SQLInputFromChunk / SQLOutputFromChunk classes.
>>>
>>> They seem to assume that every array has a 2-byte length header.
>>>
>>>
>> Those classes are part of the JNI interface. They map to the internal
>> PostgreSQL StringInfo structure. They are *not* intended for any other use.
>>
>
> Ok, it seems I've been misled by reading the docs and source, sorry.
> Please tell me, what's their use, if not implementing datatype mappings?
>
>
Their use *is* to implement datatype mappings. PostgreSQL's receive/send
functions use the StringInfo to pass datastructures.

> As far as I can see, they're the only implementations of the
> SQLInput/ SQLOutput interfaces one uses to implement a custom data type.
>
>
You use SQLInput/SQLOutput to map all datatypes but the pre-mapped ones
(i.e. int, long, float, etc.) with PL/Java.

> So, when mapping a datatype (like the ComplexTuple example delivered
> with pljava), one is actually using those classes.
>
>
Correct. But there's more to it. Please read:
http://wiki.tada.se/wiki/display/pljava/Mapping+an+SQL+type+to+a+Java+class.
There's an example mapping the PostgreSQL geometric point type to a Java
class.

> And as the storage format of the PostGIS datatypes is determined by the
> C implementation in liblwgeom.so, I have to adhere to that format.
>
>
Yes you do. But that should not be a problem.

> I have to understand how SQLInputFromChunk and SQLOutputFromChunk
> work, to find the most efficient implementation while still being
> compatible with that predefined binary format.
>
> As it looks now, my only possibility seems to find out the native
> endianness of the machine, and then read the data byte-for-byte via
> readByte(), and then put it together for myself. Most of the code should
> be copyable from the WKB parsers I've already written, so no real
> effort, but I'm still afraid that calling a native function per byte
> will not perform best.
>
>
You should not bother with endianness. The implementation deals with that.

Kind Regards,
Thomas Hallgren

From:	thomas at tada(dot)se (Thomas Hallgren)
To:
Subject:	[Pljava-dev] readBytes() / writeBytes()
Date:	2006-09-22 08:54:46
Message-ID:	4513A4D6.7060900@tada.se
Lists:	pljava-dev

Markus Schaber wrote:
> Sadly, there are no examples for custom datatypes with variable size, so
> I have to extrapolate from the given fixed-length examples, and the
> details I know about the C implementation of variable-sized datatypes.
>
AFAIK, they all start with a 4 byte header denoting the length. So your mapping starts by
doing a readInt() and then you base the rest on that.
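The readInt()-first pattern, sketched over a plain byte[] rather than an SQLInput. It assumes the length word is a native-order int32 that counts its own four bytes, as PostgreSQL's VARSIZE does; the class name is hypothetical. (The next message reports that PL/Java's SQLInput actually hid this header at the time.)

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper; not part of PL/Java. */
final class VarlenaReader {
    private VarlenaReader() {}

    /** Size of the int32 length word that precedes a varlena value. */
    static final int HEADER_SIZE = 4;

    /**
     * Returns everything after the length word. Assumes the length word is
     * in native byte order and includes its own four bytes.
     */
    static byte[] payload(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.nativeOrder());
        int totalLength = buf.getInt(); // the "readInt() first" step
        if (totalLength < HEADER_SIZE || totalLength > raw.length) {
            throw new IllegalArgumentException("bad varlena length: " + totalLength);
        }
        byte[] data = new byte[totalLength - HEADER_SIZE];
        buf.get(data); // the rest is read based on that length
        return data;
    }
}
```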

Regards,
Thomas Hallgren

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] readBytes() / writeBytes()
Date:	2006-09-22 13:18:56
Message-ID:	4513E2C0.3050804@logix-tt.com
Lists:	pljava-dev

Hi, Thomas,

Thomas Hallgren wrote:

> AFAIK, they all start with a 4 byte header denoting the length. So
> your mapping starts by doing a readInt() and then you base the rest
> on that.

From my tests, that 4-byte header is cut off; the SQLInput starts at the
first "real" data byte.

>> As it looks now, my only possibility seems to find out the native
>> endianness of the machine, and then read the data byte-for-byte via
>> readByte(), and then put it together for myself. Most of the code
>> should be copyable from the WKB parsers I've already written, so no
>> real effort, but I'm still afraid that calling a native function
>> per byte will not perform best.
>>
> You should not bother with endianness. The implementation deals with
> that.

At least on my machine (i386 architecture), the doubles I read in get
scrambled when using readDouble(). However, when I swap the bytes, I get
the correct results. From looking at the code, readDouble() & co. seem to
assume network byte order instead of using the native endianness of the
machine. But for reading PostGIS geometry data, I have to parse the native
format, as that is what the C code uses.

So I will resort to reading byte-for-byte.
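A possibly cheaper alternative to byte-by-byte reading, sketched under the assumption that the stream decodes multi-byte values in big-endian order as described above: read the eight bytes with a single readLong() and reverse them when the platform is little-endian. Swapping the raw long before converting avoids pushing a misread bit pattern through a double. `ByteSwap` is a hypothetical helper.

```java
import java.nio.ByteOrder;

/** Hypothetical helper; not part of PL/Java. */
final class ByteSwap {
    private ByteSwap() {}

    /**
     * Takes a long that was decoded in big-endian (network) order from bytes
     * that were actually stored in the platform's native order, and returns
     * the double those bytes represent.
     */
    static double nativeDoubleFromBigEndianLong(long bigEndianBits) {
        long bits = ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN
                ? Long.reverseBytes(bigEndianBits)
                : bigEndianBits;
        return Double.longBitsToDouble(bits);
    }
}
```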

Thanks for your patience,
Markus

From:	thomas at tada(dot)se (Thomas Hallgren)
To:
Subject:	[Pljava-dev] readBytes() / writeBytes()
Date:	2006-09-22 13:29:15
Message-ID:	4513E52B.5080108@tada.se
Lists:	pljava-dev

Hi Markus,
I should have checked before I spoke. I see now that the implementation
is incorrect. The intended functionality should of course be that the 4
byte integer is intact when present and that all endianness is taken
care of. This is a bug of course and I'll fix that eventually. Sorry for
the confusion.

Regards,
Thomas Hallgren

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] readBytes() / writeBytes()
Date:	2006-09-22 14:22:24
Message-ID:	4513F1A0.8020800@logix-tt.com
Lists:	pljava-dev

Hi, Thomas,

Thomas Hallgren wrote:

> I should have checked before I spoke. I see now that the implementation
> is incorrect. The intended functionality should of course be that the 4
> byte integer is intact when present and that all endianness is taken
> care of. This is a bug of course and I'll fix that eventually. Sorry for
> the confusion.

No problem.

Fixing that now will break all datatypes that already use the interface
(forcing a dump/reload of the database when they upgrade).

At least, if there are any. :-)

What's the best way to fix it? Using native functions for reading the
values?

I can try to contribute when you like.

Thanks,
Markus

From:	thomas at tada(dot)se (Thomas Hallgren)
To:
Subject:	[Pljava-dev] readBytes() / writeBytes()
Date:	2006-09-22 14:42:01
Message-ID:	4513F639.3000108@tada.se
Lists:	pljava-dev

Markus Schaber wrote:
> What's the best way to fix it? Using native functions for reading the
> values?
>
>
It's two fixes basically. The endian stuff can be fixed in Java and
directly in the SQLOutputToChunk and SQLInputFromChunk classes. The
methods should mimic what java.nio.DirectByteBuffer does.
java.nio.ByteOrder.nativeOrder() gives the native ordering.
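A sketch of what native-order decoding inside the chunk reader could look like. The byte order is looked up once via ByteOrder.nativeOrder(); a plain byte[] stands in for the native StringInfo buffer, and the class name is hypothetical rather than the actual SQLInputFromChunk code.

```java
import java.nio.ByteOrder;

/** Hypothetical chunk reader that decodes multi-byte values in native byte order. */
final class NativeOrderChunkReader {
    // Decided once per JVM, in the spirit of the java.nio direct-buffer code.
    private static final boolean BIG_ENDIAN =
            ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN;

    private final byte[] chunk; // stands in for the native StringInfo buffer
    private int position;

    NativeOrderChunkReader(byte[] chunk) {
        this.chunk = chunk;
    }

    /** Returns the next byte as an unsigned value 0..255. */
    private int read() {
        if (position >= chunk.length) {
            throw new IllegalStateException("read past end of chunk");
        }
        return chunk[position++] & 0xFF;
    }

    short readShort() {
        int b0 = read(), b1 = read();
        return (short) (BIG_ENDIAN ? (b0 << 8) | b1 : (b1 << 8) | b0);
    }

    int readInt() {
        int b0 = read(), b1 = read(), b2 = read(), b3 = read();
        return BIG_ENDIAN
                ? (b0 << 24) | (b1 << 16) | (b2 << 8) | b3
                : (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }

    long readLong() {
        long first = readInt() & 0xFFFFFFFFL;  // first four bytes in stream order
        long second = readInt() & 0xFFFFFFFFL; // next four bytes
        return BIG_ENDIAN ? (first << 32) | second : (second << 32) | first;
    }

    double readDouble() {
        return Double.longBitsToDouble(readLong());
    }
}
```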

The four byte offset stuff must be fixed in the C file that corresponds
to the SQLInputFromChunk. There's no need to fix for the output since
such a fix will introduce an ambiguity. The length is determined by the
number of bytes written anyway, so why introduce a need to go back and
add it. Important to document this behavior though.

I'm not too concerned with backward compatibility on this. The current
behavior is severely broken. It must be fixed.

> I can try to contribute when you like.
>
>
Yes, please do! Give it a try and submit a patch. My time is very
limited these days.

Regards,
Thomas Hallgren

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] readBytes() / writeBytes()
Date:	2006-09-22 15:11:29
Message-ID:	4513FD21.5080608@logix-tt.com
Lists:	pljava-dev

Hi, Thomas,

Thomas Hallgren wrote:

> Markus Schaber wrote:
>> What's the best way to fix it? Using native functions for reading the
>> values?
>
> It's two fixes basically. The endian stuff can be fixed in Java and
> directly in the SQLOutputToChunk and SQLInputFromChunk classes. The
> methods should mimic what java.nio.DirectByteBuffer does.
> java.nio.ByteOrder.nativeOrder() gives the native ordering.

Ah, great. I'll look into it.

> The four byte offset stuff must be fixed in the C file that corresponds
> to the SQLInputFromChunk. There's no need to fix for the output since
> such a fix will introduce an ambiguity. The length is determined by the
> number of bytes written anyway, so why introduce a need to go back and
> add it. Important to document this behavior though.

It's asymmetric when we read the length on input but don't write it on
output; I would hesitate to rely on this asymmetry.

Maybe the code should check whether the length header is equal to the
real length, and raise a warning otherwise?

Additionally, the underlying code could intercept a writeInt() at
offset 0, and allocate the underlying StringInfo to the right size. This
avoids repeated reallocation on larger values. (Geometries representing
country borders can easily get several MB large.) But, on the other
hand, this would slow down all writeInt() calls, so I don't know whether
it's worth the effort.
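A sketch of that pre-allocation idea on a hypothetical growable buffer (a byte[] standing in for the StringInfo). The extra branch in writeInt() is the per-call cost mentioned above; the reallocation counter exists only to make the trade-off visible.

```java
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical output buffer; not PL/Java's SQLOutputToChunk. */
final class PresizingOutput {
    private static final boolean LITTLE =
            ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN;

    private byte[] buf = new byte[16];
    private int size;
    private int reallocations;

    private void ensureCapacity(int needed) {
        if (needed > buf.length) {
            // Grow at least geometrically so n appends cost O(n) copying in total.
            buf = Arrays.copyOf(buf, Math.max(needed, buf.length * 2));
            reallocations++;
        }
    }

    void writeByte(byte b) {
        ensureCapacity(size + 1);
        buf[size++] = b;
    }

    void writeInt(int v) {
        // The interception: an int written at offset 0 is taken as the total length.
        if (size == 0 && v > 0) {
            ensureCapacity(v);
        }
        ensureCapacity(size + 4);
        for (int i = 0; i < 4; i++) {
            int shift = LITTLE ? 8 * i : 8 * (3 - i); // native byte order
            buf[size++] = (byte) (v >>> shift);
        }
    }

    int size() {
        return size;
    }

    int reallocations() {
        return reallocations;
    }
}
```

With a known length of 1000 bytes this sketch reallocates once; without it, the buffer grows six times from its initial 16 bytes. For small values the check is pure overhead, which matches the reply below.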

> I'm not too concerned with backward compatibility on this. The current
> behavior is severely broken. It must be fixed.

Ok, I fully agree.

Thanks,
Markus

From:	thomas at tada(dot)se (Thomas Hallgren)
To:
Subject:	[Pljava-dev] readBytes() / writeBytes()
Date:	2006-09-22 15:25:18
Message-ID:	4514005E.2010703@tada.se
Lists:	pljava-dev

Markus Schaber wrote:
> Hi, Thomas,
>
> Thomas Hallgren wrote:
>
>
>> Markus Schaber wrote:
>>
>>> What's the best way to fix it? Using native functions for reading the
>>> values?
>>>
>> It's two fixes basically. The endian stuff can be fixed in Java and
>> directly in the SQLOutputToChunk and SQLInputFromChunk classes. The
>> methods should mimic what java.nio.DirectByteBuffer does.
>> java.nio.ByteOrder.nativeOrder() gives the native ordering.
>>
>
> Ah, great. I'll look into it.
>
>
>> The four byte offset stuff must be fixed in the C file that corresponds
>> to the SQLInputFromChunk. There's no need to fix for the output since
>> such a fix will introduce an ambiguity. The length is determined by the
>> number of bytes written anyway, so why introduce a need to go back and
>> add it. Important to document this behavior though.
>>
>
> It's asymmetric when we read the length on input but don't write it on
> output; I would hesitate to rely on this asymmetry.
>
> Maybe the code should check whether the length header is equal to the
> real length, and raise a warning otherwise?
>
>
The problem is that you often don't know the length when you start
writing. And there's no way to do a seek and go back and rewrite once
you're done. But as you mention, asymmetry is not good and far from
everyone will consult the documentation.

A good compromise is perhaps to introduce the check that you suggest
but also allow the length to be zero? That way you will always need to
write the length and if it's not known at that time, you just write zero
and PL/Java will assign it once you're done writing.

> Additionally, the underlying code could intercept a writeInt() at
> offset 0, and allocate the underlying StringInfo to the right size. This
> avoids repeated reallocation on larger values. (Geometries representing
> country borders can easily get several MB large.) But, on the other
> hand, this would slow down all writeInt() calls, so I don't know whether
> it's worth the effort.
>
>
In most cases (for small datatypes) you'll only lose with that
approach. If it's very common that you write long sequences of data, it
will be a win (reallocation is not cheap). I'd leave it for now and
perhaps make it configurable sometime in the future.

Regards,
Thomas Hallgren

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] Patch for SQLInput implementation
Date:	2006-09-22 15:46:42
Message-ID:	45140562.9070003@logix-tt.com
Lists:	pljava-dev

Hi, Thomas,

Here's my first attempt at the SQLInput side. (I did not live-test it,
and I possibly broke your formatting w.r.t. tabs/spaces.)

It fetches the endianness from ByteOrder, and changes readShort(),
readInt() and readLong() accordingly.

I also refactored the readBytes() method to use readShort() internally
for the length tag (or should we change the length to int, while we're
at it?)

Additionally, I added some missing m_handle checks, and moved the
m_position check into the synchronized block at read() (which is
necessary from my understanding of the java memory model, as position
and chunkSize are both not volatile).
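A sketch of the locking pattern described above, on a hypothetical reader class: because the position and size fields are ordinary (non-volatile) fields, both the bounds check and the increment happen while holding the same lock, so no thread can act on a stale position between the check and the read.

```java
/** Hypothetical reader; shows only the locking pattern, not PL/Java's code. */
final class SynchronizedChunkReader {
    private final byte[] chunk;
    private int position; // guarded by "this"; deliberately not volatile

    SynchronizedChunkReader(byte[] chunk) {
        this.chunk = chunk;
    }

    /** Returns the next byte as 0..255, or -1 at the end of the chunk. */
    int read() {
        synchronized (this) {
            // Check and advance under one lock: a check outside the block could
            // see a stale position or race with another thread's increment.
            if (position >= chunk.length) {
                return -1;
            }
            return chunk[position++] & 0xFF;
        }
    }
}
```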

Could you have a quick look at it?

The SQLOutput side will follow soon. :-)

Markus
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: SQLInput.diff
URL: <http://lists.pgfoundry.org/pipermail/pljava-dev/attachments/20060922/96f99770/attachment.ksh>

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] Patch for SQLOutput implementation
Date:	2006-09-22 16:07:45
Message-ID:	45140A51.8030605@logix-tt.com
Lists:	pljava-dev

Hi,

Here's the patch attempt for the SQLOutput side.

First, it seems that the writeBytes() method was broken: it wrote
nothing (not even the length count) for a zero-length array, which led
to random garbage being read, or worse, on readBytes().
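A sketch of the corrected behavior, assuming the 2-byte length header the current format uses (treated here as unsigned, so at most 65535 bytes): the header is written even for an empty array, so the reader always finds a length where it expects one. The class name is hypothetical, and a ByteBuffer stands in for the chunk.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper; not PL/Java's writeBytes()/readBytes(). */
final class LengthPrefixedBytes {
    private LengthPrefixedBytes() {}

    static byte[] encode(byte[] value) {
        if (value.length > 0xFFFF) {
            throw new IllegalArgumentException("too long for a 2-byte length: " + value.length);
        }
        ByteBuffer out = ByteBuffer.allocate(2 + value.length).order(ByteOrder.nativeOrder());
        out.putShort((short) value.length); // written even when the length is 0
        out.put(value);
        return out.array();
    }

    /** Reads one length-prefixed array; the buffer must use native order too. */
    static byte[] decode(ByteBuffer in) {
        int length = in.getShort() & 0xFFFF; // unsigned 16-bit length
        byte[] value = new byte[length];
        in.get(value);
        return value;
    }
}
```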

I also added some handle checks.

As I also adopted your coding style w.r.t. {}s, I included the input
side again.

I have to quit now, my schedule is tight this afternoon, but I'll see what
I can do over the weekend and on Monday.

Regards,
Markus

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: SQLIO.diff
URL: <http://lists.pgfoundry.org/pipermail/pljava-dev/attachments/20060922/d8a733ab/attachment.ksh>

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] readBytes() / writeBytes()
Date:	2006-09-22 16:10:37
Message-ID:	45140AFD.9070500@logix-tt.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pljava-dev

Hi, Thomas,

Thomas Hallgren wrote:

> A good compromise is perhaps to introduce the check that you suggest
> but also allow the length to be zero? That way you will always need to
> write the length and if it's not known at that time, you just write zero
> and PL/Java will assign it once you're done writing.

Yes, that sounds good. In that case, we should make the check mandatory,
and throw an exception when an explicit size is mismatching.
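A sketch of the rule agreed on here, on a hypothetical writer (not PL/Java's SQLOutputToChunk): the first int written is the length word; if it is 0, the real size is filled in when writing finishes, and any other value must match the actual size or an exception is thrown. It assumes the length counts its own four bytes and is stored in native byte order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical writer illustrating the 0-or-exact length-word rule. */
final class VarlenaWriter {
    private byte[] buf = new byte[64];
    private int size;

    private void append(byte[] bytes) {
        if (size + bytes.length > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(size + bytes.length, buf.length * 2));
        }
        System.arraycopy(bytes, 0, buf, size, bytes.length);
        size += bytes.length;
    }

    void writeInt(int v) {
        append(ByteBuffer.allocate(4).order(ByteOrder.nativeOrder()).putInt(v).array());
    }

    void writeRaw(byte[] bytes) {
        append(bytes);
    }

    /** Validates or fills in the length word and returns the finished value. */
    byte[] finish() {
        if (size < 4) {
            throw new IllegalStateException("no length word written");
        }
        ByteBuffer view = ByteBuffer.wrap(buf).order(ByteOrder.nativeOrder());
        int declared = view.getInt(0);
        if (declared == 0) {
            view.putInt(0, size); // length unknown up front: assign it now
        } else if (declared != size) {
            throw new IllegalStateException(
                    "declared length " + declared + " but wrote " + size + " bytes");
        }
        return Arrays.copyOf(buf, size);
    }
}
```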

Thanks,
Markus


From:	thomas at tada(dot)se (Thomas Hallgren)
To:
Subject:	[Pljava-dev] Patch for SQLOutput implementation
Date:	2006-09-22 20:56:11
Message-ID:	45144DEB.2050107@tada.se
Lists:	pljava-dev

Hi Markus,
I looked at it and it looks fine. If you want to do some more work on
it, then I'll wait until Monday before applying it. Are you planning to look at
the C-code as well? You seem to know how to keep your bits in order :-)

Regards,
Thomas Hallgren

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] Patch for SQLOutput implementation
Date:	2006-09-23 10:36:00
Message-ID:	45150E10.8030907@logix-tt.com
Lists:	pljava-dev

Hi, Thomas,

Thomas Hallgren wrote:

> I looked at it and it looks fine. If you want to do some more work on
> it, then I'll wait until Monday applying it. Are you planning to look at
> the C-code as well? You seem to know how to keep your bits in order :-)

I'd like to build and test it, if you don't have time to do so. At
least, I want to make sure to not have confused big and little endian,
as I tend to do from time to time. :-)

I'm definitely planning to look into the C code. However, I might need
some time to understand the "inner workings", and there's some higher
priority work on my table, so it might take a few days.

Thanks,
Markus

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] UDT send and receive
Date:	2006-09-25 15:05:29
Message-ID:	4517F039.6050709@logix-tt.com
Lists:	pljava-dev

Hi, Thomas,

Markus Schaber wrote:

> I'm definitely planning to look into the C code. However, I might need
> some time to understand the "inner workings", and there's some higher
> priority work on my table, so it might take a few days.

I'm currently stuck in the C code in UDT.h.

From looking at the code, it seems that the send and receive
code assumes that the internal (on-disk) representation is the same as
the one used for binary I/O, rather than relying on the send and receive
functions. Also, there are CREATE FUNCTION calls in the example on
http://wiki.tada.se/wiki/display/pljava/Creating+a+Scalar+UDT+in+Java
but the java code does not actually define them.

Is that assumption correct?

At least PostGIS currently uses an internal representation that differs
slightly from what send/receive use. The
internal on-disk format is optimized, the external representation is an
upwards compatible extension to the OpenGIS standardized WKB format.
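For reference, a plain OpenGIS WKB point is one byte-order flag (0 = big-endian/XDR, 1 = little-endian/NDR), a 4-byte geometry type (1 = Point), and two 8-byte doubles, all in the flagged byte order. A minimal parser for just that case follows; PostGIS's extended type flags (SRID, Z, M) are rejected rather than handled, and the class is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical parser for a plain 2D WKB point only. */
final class WkbPoint {
    final double x;
    final double y;

    private WkbPoint(double x, double y) {
        this.x = x;
        this.y = y;
    }

    static WkbPoint parse(byte[] wkb) {
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        byte orderFlag = buf.get();
        // The data announces its own byte order, independent of the platform.
        if (orderFlag == 0) {
            buf.order(ByteOrder.BIG_ENDIAN);    // XDR
        } else if (orderFlag == 1) {
            buf.order(ByteOrder.LITTLE_ENDIAN); // NDR
        } else {
            throw new IllegalArgumentException("bad byte-order flag " + orderFlag);
        }
        int type = buf.getInt();
        if (type != 1) {
            throw new IllegalArgumentException("not a plain 2D point: type " + type);
        }
        double x = buf.getDouble();
        double y = buf.getDouble();
        return new WkbPoint(x, y);
    }
}
```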

So, from my understanding of the code, it's currently impossible to
implement a 1:1 replacement for PostGIS in pljava. (Not that I
seriously plan to do this.)

Btw, I have the impression that I'm the first one actually trying VARLEN
UDT mapping with pljava :-)

Thanks,
Markus

From:	thomas at tada(dot)se (Thomas Hallgren)
To:
Subject:	[Pljava-dev] UDT send and receive
Date:	2006-09-25 15:58:31
Message-ID:	4517FCA7.8050208@tada.se
Lists:	pljava-dev

Markus Schaber wrote:
> Hi, Thomas,
>
> Markus Schaber wrote:
>
>
>> I'm definitely planning to look into the C code. However, I might need
>> some time to understand the "inner workings", and there's some higher
>> priority work on my table, so it might take a few days.
>>
>
> I'm currently stuck in the C code in UDT.h.
>
> From looking at the code, it seems that the send and receive
> code assumes that the internal (on-disk) representation is the same as
> the one used for binary I/O, rather than relying on the send and receive
> functions.
Not sure what you mean. The UDT functions *are* the send/receive
functions (with an added UDT parameter). They don't care much about the
representation as such. There's a length (-2, -1 or a verbatim length)
and then there are bytes of data. The send function uses byteasend,
unknownsend, or a StringInfo depending on the length. The receive
performs the corresponding read. Totally representation agnostic.
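In PostgreSQL's pg_type, typlen -1 marks a varlena type (a 4-byte length word followed by data) and -2 a NUL-terminated C string; positive values are fixed sizes. A hypothetical Java sketch of how the number of bytes in a datum follows from that length, assuming the length word is in native order and, as in PostgreSQL 8.x, carries flag bits in its top two bits (masked with VARATT_MASK_SIZE, 0x3fffffff).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper; UDT.c does the equivalent in C. */
final class DatumSize {
    private DatumSize() {}

    /** PostgreSQL 8.x: the top two bits of a varlena length word are flags. */
    static final int VARATT_MASK_SIZE = 0x3fffffff;

    /** Number of bytes the datum in raw occupies, given the type's typlen. */
    static int of(int typlen, byte[] raw) {
        if (typlen > 0) {
            return typlen; // fixed-size type
        }
        if (typlen == -1) {
            // varlena: the length word counts itself and the data that follows
            int word = ByteBuffer.wrap(raw).order(ByteOrder.nativeOrder()).getInt();
            return word & VARATT_MASK_SIZE;
        }
        if (typlen == -2) {
            // C string: everything up to and including the terminating NUL
            for (int i = 0; i < raw.length; i++) {
                if (raw[i] == 0) {
                    return i + 1;
                }
            }
            throw new IllegalArgumentException("unterminated C string");
        }
        throw new IllegalArgumentException("invalid typlen " + typlen);
    }
}
```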

> Also, there are CREATE FUNCTION calls in the example on
> http://wiki.tada.se/wiki/display/pljava/Creating+a+Scalar+UDT+in+Java
> but the java code does not actually define them.
>
> Is that assumption correct?
>
>
Yes. They will all be redirected to the UDT_input, UDT_output, UDT_send,
and UDT_receive that you'll find in UDT.c

> At least PostGIS currently uses a slightly different internal
> representation internally compared to what send/receive use. The
> internal on-disk format is optimized, the external representation is an
> upwards compatible extension to the OpenGIS standardized WKB format.
>
OK, so if you want to read that in Java, I assume your SQLData
input/output methods must deal with that. Perhaps I miss the point
altogether here. Who converts between the internal format and the
on-disk format?

> So, from my understanding of the code, it's currently impossible to
> implement a 1:1 replacement for PostGIS in pljava. (Not that I
> seriously plan to do this.)
>
>
Not sure I understand why. Why can the data conversion not take place in
Java, should you choose to do that?

> Btw, I have the impression that I'm the first one actually trying VARLEN
> UDT mapping with pljava :-)
>
>
You are. As you've discovered, it's broken.

Regards,
Thomas Hallgren

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] VARLEN Patch for UDT.c
Date:	2006-09-25 17:06:43
Message-ID:	45180CA3.2040908@logix-tt.com
Lists:	pljava-dev

Hi, Thomas,

Here's my try for a VARLEN patch for UDT.c, as we discussed on Friday.

I implemented the length word so that 0 is allowed, with a strict check
that raises an error if a non-zero value does not match, as discussed. I
use the same error code as the fixed-length path, but a slightly different
error message (to ease debugging).

One caveat still applies: for the coerceScalarDatum case, the Java code
has to apply the bitmask VARATT_MASK_SIZE (aka 0x3fffffff) itself, as I'm
not confident that I'm allowed to change the data in place.

What do you think about that?

I also changed dataLen to be int32, as Datums can be larger than 65535
bytes on PostgreSQL. Btw, the old code contained:
appendBinaryStringInfo(&buffer, (char*)&dataLen, sizeof(int32));
In my eyes, this would have produced garbage, as well as the error
message using %d formatting for int16 dataLen.

I also added the fixed versions of the .java files. It seems that I
really did switch the LE and BE branches in some cases. :-(

The C->Java side of the patch was tested quick'n'dirty, but not the other
way round, due to time restrictions.

I'm going to implement at least the POINT mapping for PostGIS, and use
that as test case for both directions tomorrow.

Thanks,
Markus

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: UDT.diff
URL: <http://lists.pgfoundry.org/pipermail/pljava-dev/attachments/20060925/f6a70697/attachment.ksh>

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] UDT send and receive
Date:	2006-09-25 17:45:51
Message-ID:	451815CF.9010507@logix-tt.com
Lists:	pljava-dev

Hi, Thomas,

Thomas Hallgren wrote:

>> From looking at the code, it seems that the send and receive
>> code assumes that the internal (on-disk) representation is the same as
>> the one used for binary I/O, rather than relying on the send and receive
>> functions.
> Not sure what you mean. The UDT functions *are* the send/receive
> functions (with an added UDT parameter). They don't care much about the
> representation as such. There's a length (-2, -1 or a verbatim length)
> and then there are bytes of data. The send function uses byteasend,
> unknownsend, or a StringInfo depending on the length. The receive
> performs the corresponding read. Totally representation agnostic.

Yes. And, as I said, they send the on-disk data 1:1.
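Thomas's three length cases correspond to PostgreSQL's typlen convention: a positive value is a fixed size, -1 marks a varlena whose 4-byte header holds the total size including the header itself, and -2 marks a NUL-terminated C string. Below is a hypothetical Java sketch of how many bytes a datum occupies under each case. Like the message, it ignores TOAST and the flag bits stored in the varlena header, and the caller must supply the server's byte order:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: size of a datum under PostgreSQL's typlen convention.
// Simplified: TOAST and varlena flag bits are ignored.
final class DatumSize {
    private DatumSize() {}

    static int sizeOf(int typlen, byte[] raw, ByteOrder serverOrder) {
        if (typlen > 0) {
            return typlen;                      // fixed-length type
        }
        if (typlen == -1) {
            // varlena: 4-byte header with the total size, header included
            return ByteBuffer.wrap(raw, 0, 4).order(serverOrder).getInt();
        }
        if (typlen == -2) {
            // C string: bytes up to and including the terminating NUL
            for (int i = 0; i < raw.length; i++) {
                if (raw[i] == 0) {
                    return i + 1;
                }
            }
            throw new IllegalArgumentException("unterminated C string");
        }
        throw new IllegalArgumentException("unexpected typlen " + typlen);
    }
}
```

Note that the varlena header is read in the server's order: a raw copy of it is exactly the kind of on-disk data that is not portable across endianness.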

PostGIS internally uses a different storage format compared to binary
input/output, for abstraction purposes, and the send and receive
functions perform some conversions (including adding the varlena header
and adapting the byte order).

In my view, your current approach will fail when someone makes a binary
dump of the data via COPY and then reloads it on a platform with
different endianness. And clients using the binary V3 protocol need to
know the server's endianness.
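The failure mode can be reproduced in a few lines of Java. This hypothetical sketch (names invented for illustration) compares a raw dump of native-order bytes with a canonical, always big-endian form. IEEE 754 doubles are stored in host byte order too, so they are affected just like integers:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration: raw native-order dumps vs. a canonical wire format.
final class DumpOrder {
    private DumpOrder() {}

    // Raw dump: bytes as laid out by a server with the given native order.
    static byte[] rawDump(double value, ByteOrder serverOrder) {
        return ByteBuffer.allocate(8).order(serverOrder).putDouble(value).array();
    }

    // Reload of a raw dump on a server whose native order may differ.
    static double rawReload(byte[] dump, ByteOrder reloadingServerOrder) {
        return ByteBuffer.wrap(dump).order(reloadingServerOrder).getDouble();
    }

    // Canonical form: always network (big-endian) order, independent of the host.
    static byte[] canonicalSend(double value) {
        return ByteBuffer.allocate(8).order(ByteOrder.BIG_ENDIAN).putDouble(value).array();
    }

    static double canonicalReceive(byte[] wire) {
        return ByteBuffer.wrap(wire).order(ByteOrder.BIG_ENDIAN).getDouble();
    }
}
```

A raw little-endian dump of 1.5 reloaded on a big-endian host decodes to a different number, while the canonical form round-trips on any host.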

>> At least PostGIS currently uses a slightly different internal
>> representation compared to what send/receive use. The
>> internal on-disk format is optimized, the external representation is an
>> upwards compatible extension to the OpenGIS standardized WKB format.
>>
> OK, so if you want to read that in Java, I assume your SQLData
> input/output methods must deal with that. Perhaps I miss the point
> altogether here. Who converts between the internal format and the
> on-disk format?

For PostGIS: the send and receive functions that are given in CREATE
TYPE.

http://svn.refractions.net/postgis/trunk/lwgeom/lwgeom_inout.c has the C
code for those functions, LWGEOM_recv() and LWGEOM_send() which call the
conversion routines WKBFromLWGEOM and LWGEOMFromWKB under the hood.
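For orientation, here is a minimal Java sketch of the plain OGC WKB encoding of a 2D point, the base format that the PostGIS external representation extends: one byte-order flag (0 = big-endian/XDR, 1 = little-endian/NDR), a uint32 geometry type (1 = Point) and two float8 coordinates, all in the flagged order. It is not a port of LWGEOM_recv()/LWGEOM_send(); the PostGIS extension flags for SRID, Z and M are deliberately not handled, and the class name is hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Minimal sketch of OGC WKB for a plain 2D Point only.
// PostGIS extended-WKB flag bits (SRID, Z, M) are not handled.
final class WkbPoint {
    static final int WKB_POINT = 1;
    final double x, y;

    WkbPoint(double x, double y) { this.x = x; this.y = y; }

    static WkbPoint fromWkb(byte[] wkb) {
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        byte flag = buf.get();                  // single byte, order-independent
        if (flag == 0) buf.order(ByteOrder.BIG_ENDIAN);          // XDR
        else if (flag == 1) buf.order(ByteOrder.LITTLE_ENDIAN);  // NDR
        else throw new IllegalArgumentException("bad byte-order flag " + flag);
        int type = buf.getInt();
        if (type != WKB_POINT) {
            throw new IllegalArgumentException("not a plain 2D point: type " + type);
        }
        return new WkbPoint(buf.getDouble(), buf.getDouble());
    }

    byte[] toWkb(ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + 8 + 8).order(order);
        buf.put((byte) (order == ByteOrder.LITTLE_ENDIAN ? 1 : 0));
        buf.putInt(WKB_POINT).putDouble(x).putDouble(y);
        return buf.array();
    }
}
```

Because the flag travels with the data, a reader on either kind of host can decode both variants, which is what makes WKB suitable as a binary I/O format.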

>> So, from my understanding of the code, it's currently impossible to
>> implement a 1:1 replacement for PostGIS in pljava. (Not that I
>> seriously plan to do this.)
>
> Not sure I understand why. Why can the data conversion not take place in
> Java, should you choose to do that?

Because there is nobody that calls the appropriate Java functions.

When an SQL COPY reads binary data, it is passed through the RECEIVE
function in UDT.c, and then put on the platters.

I hope that I managed to explain the problem; if not, please ask.

Thanks,
Markus


From:	thomas at tada(dot)se (Thomas Hallgren)
To:
Subject:	[Pljava-dev] UDT send and receive
Date:	2006-09-25 18:58:48
Message-ID:	451826E8.1000404@tada.se

Markus Schaber wrote:
> Hi, Thomas,
>
> Thomas Hallgren wrote:
>
>
>>> From looking at the code, it seems that the send and receive code
>>> assumes that the internal (on-disk) representation is the same as
>>> the one used for binary I/O, rather than relying on the send and receive
>>> functions.
>>>
>> Not sure what you mean. The UDT functions *are* the send/receive
>> functions (with an added UDT parameter). They don't care much about the
>> representation as such. There's a length (-2, -1 or a verbatim length)
>> and then there are bytes of data. The send function uses byteasend,
>> unknownsend, or a StringInfo depending on the length. The receive
>> performs the corresponding read. Totally representation agnostic.
>>
>
> Yes. And, as I said, they send the on-disk data 1:1.
>
> PostGIS internally uses a different storage format compared to binary
> input/output, for abstraction purposes, and the send and receive
> functions perform some conversions (including adding the varlena header
> and adapting the byte order).
>
> In my view, your current approach will fail when someone makes a binary
> dump of the data via COPY and then reloads it on a platform with
> different endianness. And clients using the binary V3 protocol need to
> know the server's endianness.
>
>
We're talking past each other here. The send/receive functions are the
ones that perform the conversion. If you do it in Java, you write
send/receive functions in Java. What you see is just the middle man,
passing data to/from such functions. It pays no specific attention to
representation. That is something you must do yourself in the
corresponding methods of the SQLData implementation.
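A minimal sketch of what handling the representation yourself in the SQLData implementation could look like, assuming a hypothetical type whose internal form is two float8 values in the server's native byte order. Since Markus observed that readInt() and friends appear to be big-endian regardless of platform, the sketch reads raw bytes with readByte() and assembles them itself. java.sql.SQLData, SQLInput and SQLOutput are the standard JDBC interfaces; the class name and layout are assumptions:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.sql.SQLData;
import java.sql.SQLException;
import java.sql.SQLInput;
import java.sql.SQLOutput;

// Hypothetical mapping for a type stored as two native-order float8 values.
class NativePoint implements SQLData {
    // Assumption: the JVM runs on the same host as the backend,
    // so its native order is the server's byte order.
    private static final ByteOrder SERVER_ORDER = ByteOrder.nativeOrder();

    private String typeName;
    double x, y;

    @Override
    public String getSQLTypeName() { return typeName; }

    @Override
    public void readSQL(SQLInput in, String typeName) throws SQLException {
        this.typeName = typeName;
        byte[] raw = new byte[16];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = in.readByte();             // bytes only, no implied byte order
        }
        ByteBuffer buf = ByteBuffer.wrap(raw).order(SERVER_ORDER);
        x = buf.getDouble();
        y = buf.getDouble();
    }

    @Override
    public void writeSQL(SQLOutput out) throws SQLException {
        ByteBuffer buf = ByteBuffer.allocate(16).order(SERVER_ORDER);
        buf.putDouble(x).putDouble(y);
        for (byte b : buf.array()) {
            out.writeByte(b);
        }
    }
}
```

Under that assumption readSQL() and writeSQL() are exact inverses, so a value survives a round trip through the internal form unchanged.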

>>> At least PostGIS currently uses a slightly different internal
>>> representation compared to what send/receive use. The
>>> internal on-disk format is optimized, the external representation is an
>>> upwards compatible extension to the OpenGIS standardized WKB format.
>>>
>>>
>> OK, so if you want to read that in Java, I assume your SQLData
>> input/output methods must deal with that. Perhaps I miss the point
>> altogether here. Who converts between the internal format and the
>> on-disk format?
>>
>
> For PostGIS: the send and receive functions that are given in CREATE
> TYPE.
>
>
Well, yes. But if you want to use the data types in Java, you'll need to
have corresponding functions there. Are you suggesting that when reading,
the receive function given in CREATE TYPE should be called first, and
then another function should be called that would create the actual Java
object from the output of that receive? IMHO, that's just pushing the
problem one step ahead. The SQLData implementation might just as well
mimic what the receive does in the first place, i.e. why not manage the
input to the receive rather than the output and save one roundtrip?

> http://svn.refractions.net/postgis/trunk/lwgeom/lwgeom_inout.c has the C
> code for those functions, LWGEOM_recv() and LWGEOM_send() which call the
> conversion routines WKBFromLWGEOM and LWGEOMFromWKB under the hood.
>
>
OK, so it would be fully possible to create special-purpose Java/C JNI
mappings for that. But what we are discussing here is a *generic* way to
map virtually everything. You can do that by mimicking in Java what
lwgeom_inout is doing. If you don't want to mimic that, and if you don't
want special-purpose JNI functions, then I have a hard time understanding
how you'd go about doing the mapping.

>
>>> So, from my understanding of the code, it's currently impossible to
>>> implement a 1:1 replacement for PostGIS in pljava. (Not that I
>>> seriously plan to do this.)
>>>
>> Not sure I understand why. Why can the data conversion not take place in
>> Java, should you choose to do that?
>>
>
> Because there is nobody that calls the appropriate Java functions.
>
>
On the contrary. If you map a PostGIS datatype to an SQLData
implementation class, PL/Java will see to it that an instance of that
class is created and that it is fed with raw data from the PostGIS data type.
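To make the "middle man" role concrete, here is a hypothetical Java model of it: it wraps raw bytes in a minimal SQLInput and lets the mapped SQLData class interpret them. This is not PL/Java's implementation (the real glue lives in UDT.c and SQLInputFromChunk), and all names are invented; the point is only that the middle man never looks at the representation itself:

```java
import java.lang.reflect.Proxy;
import java.nio.ByteBuffer;
import java.sql.SQLData;
import java.sql.SQLException;
import java.sql.SQLInput;
import java.sql.SQLOutput;

// Hypothetical model of the "middle man": raw bytes in, SQLData instance out.
// It never inspects the bytes; interpreting them is the SQLData class's job.
final class MiddleMan {
    private MiddleMan() {}

    // Minimal SQLInput over a byte array; only a few read methods are supported.
    // Multi-byte reads are big-endian, matching what Markus observed for readInt().
    static SQLInput inputOver(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw);
        return (SQLInput) Proxy.newProxyInstance(
                MiddleMan.class.getClassLoader(),
                new Class<?>[] { SQLInput.class },
                (proxy, method, args) -> {
                    switch (method.getName()) {
                        case "readByte":   return buf.get();
                        case "readInt":    return buf.getInt();
                        case "readDouble": return buf.getDouble();
                        default: throw new UnsupportedOperationException(method.getName());
                    }
                });
    }

    // Instantiates the mapped class and feeds it the raw data.
    static <T extends SQLData> T feed(Class<T> type, byte[] raw, String typeName)
            throws ReflectiveOperationException, SQLException {
        T instance = type.getDeclaredConstructor().newInstance();
        instance.readSQL(inputOver(raw), typeName);
        return instance;
    }
}

// Hypothetical mapped type: two int4 values.
final class IntPair implements SQLData {
    private String typeName;
    int first, second;

    @Override
    public String getSQLTypeName() { return typeName; }

    @Override
    public void readSQL(SQLInput in, String typeName) throws SQLException {
        this.typeName = typeName;
        first = in.readInt();
        second = in.readInt();
    }

    @Override
    public void writeSQL(SQLOutput out) throws SQLException {
        out.writeInt(first);
        out.writeInt(second);
    }
}
```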

> When an SQL COPY reads binary data, it is passed through the RECEIVE
> function in UDT.c, and then put on the platters.
>
>
Right, and in that case Java is never involved. Not unless the type is a
UDT that is fully Java, of course (with a CREATE TYPE that actually
designates the functions in UDT.c).

> I hope that I managed to explain the problem; if not, please ask.
>
>
Still unclear, I'm afraid :-)

Regards,
Thomas Hallgren

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] Sorry, wrong patch...
Date:	2006-09-25 19:05:01
Message-ID:	4518285D.5070200@logix-tt.com

Hi, Thomas,

It seems that I attached the wrong patch; the correct version is here (I
hope :-).

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: UDT2.diff
URL: <http://lists.pgfoundry.org/pipermail/pljava-dev/attachments/20060925/3afd59c2/attachment.ksh>

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] UDT send and receive
Date:	2006-09-25 19:51:31
Message-ID:	45183343.8000105@logix-tt.com

Hi, Thomas,

[Foreword: Currently, I don't intend to define the PostGIS datatype as
a full pljava-defined UDT (the C code handles this fine). A "simple"
mapping (and thus the readSQL/writeSQL methods) is enough for me for
now. I just want to ensure that my understanding of the whole issue is
correct, and that we remove all limitations that would prevent others
(or me in the future) from implementing full UDT mappings.]

Thomas Hallgren wrote:

> We're talking past each other here. The send/receive functions are the
> ones that perform the conversion. If you do it in Java, you write
> send/receive functions in Java. What you see is just the middle man,
> passing data to/from such functions. It pays no specific attention to
> representation. That is something you must do yourself in the
> corresponding methods of the SQLData implementation.

Ah, now I understand.

I erroneously assumed the same magic as for the input/output functions,
which map to parse()/toString().

You suggest implementing send and receive as normal static functions in
Java, and then declaring them with CREATE FUNCTION without the special
"UDT[foo] output" syntax.

The problem I see here is that the receive function takes a parameter
of type "internal", which currently has no mapping in pljava, at least
according to
http://wiki.tada.se/wiki/display/pljava/Default+Type+Mapping

So I can define the send function (which returns bytea) the way you
suggest, but not the receive function.

> Well, yes. But if you want to use the data types in Java, you'll need to
> have corresponding functions there. Are you suggesting that when reading,
> the receive function given in CREATE TYPE should be called first, and
> then another function should be called that would create the actual Java
> object from the output of that receive?

Yes, at least that's how I understand PostgreSQL under the hood.

Let me clarify my view of the things:

We have (at least) 4 different representations:

A) the "canonical text representation"
B) the "canonical binary representation"
C) the "internal" representation
D) Java Objects.

C) is what PostgreSQL passes around (to PostGIS C functions as well as
to the PLJava glue code) and stores on disk. The size is defined as
"internallength" in the datatype, and is contained in a 4-byte varlena
header for variable-length datatypes (those with internallength set to
-1). (Let's ignore TOAST and 0-terminated strings for simplification.)

D) is what's passed around in "user functions" in pljava land, obviously.

A) is used in psql, pg_dump, non-binary COPY, the V2 protocol and the
text mode of the V3 protocol.

B) is used in binary COPY and the binary mode of the V3 protocol.

The pljava UDT mapping converts between C and D via the readSQL() and
writeSQL() methods.

PostgreSQL uses the input and output functions defined for the type to
convert between A and C.

The send and receive functions for the datatype convert between B and C.

http://www.postgresql.org/docs/8.1/interactive/xtypes.html contains an
example with some less-sophisticated, but explicitly coded send and
receive functions.
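As a rough Java analogue of that documentation example (the complex type with complex_in/complex_out and complex_recv/complex_send), the hypothetical class below converts A) and B) to D) directly. The text side uses the parse()/toString() pairing mentioned above, with the parse(String, String) signature assumed rather than taken from PL/Java's documentation; the binary side writes two float8 values in network byte order, as pq_sendfloat8() does. Java's Double.toString() also formats numbers differently from the %g used in the C example:

```java
import java.nio.ByteBuffer;

// Hypothetical Java analogue of the "complex" example type from the PostgreSQL docs.
final class Complex {
    final double re, im;

    Complex(double re, double im) { this.re = re; this.im = im; }

    // A) text -> D) object, accepting "(re,im)" like complex_in.
    static Complex parse(String text, String typeName) {
        String s = text.trim();
        if (!s.startsWith("(") || !s.endsWith(")")) {
            throw new IllegalArgumentException("invalid input syntax for " + typeName + ": " + text);
        }
        String[] parts = s.substring(1, s.length() - 1).split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("invalid input syntax for " + typeName + ": " + text);
        }
        return new Complex(Double.parseDouble(parts[0].trim()), Double.parseDouble(parts[1].trim()));
    }

    // D) object -> A) text, like complex_out (but with Java's number formatting).
    @Override
    public String toString() { return "(" + re + "," + im + ")"; }

    // D) object -> B) binary: two float8 in network (big-endian) order, like complex_send.
    byte[] send() {
        return ByteBuffer.allocate(16).putDouble(re).putDouble(im).array();
    }

    // B) binary -> D) object, like complex_recv.
    static Complex receive(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        return new Complex(buf.getDouble(), buf.getDouble());
    }
}
```

The internal form C) never appears here: in this sketch both external forms are converted straight to and from the Java object.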

I hope it is clear what I am trying to explain.

Thanks for your patience,
Markus

From:	thomas at tada(dot)se (Thomas Hallgren)
To:
Subject:	[Pljava-dev] UDT send and receive
Date:	2006-09-25 20:47:09
Message-ID:	4518404D.2050407@tada.se

Markus Schaber wrote:
> I erroneously assumed the same magic as for the input/output functions,
> which map to parse()/toString().
>
Nope, no magic :-)

> You suggest implementing send and receive as normal static functions in
> Java, and then declaring them with CREATE FUNCTION without the special
> "UDT[foo] output" syntax.
>
Unless you already have a type with send/receive written in C, then yes.
If you want to map a pre-existing type, you can still do that, but then
you need to duplicate the send/receive functionality in Java in order to
access the internals of that type.

> The problem I see here is that the receive function takes a parameter
> of type "internal", which currently has no mapping in pljava, at least
> according to
> http://wiki.tada.se/wiki/display/pljava/Default+Type+Mapping
>
In some sense, that's true. The mapping is in itself internal. It's
there though...

> So I can define the send function (which returns bytea) the way you
> suggest, but not the receive function.
>
Yes, you can. The internal receive function will map the internal bytea
that it receives to an SQLInput (the SQLInputFromChunk) and pass it to
an SQLData implementation (which the user must provide).

> We have (at least) 4 different representations:
>
> A) the "canonical text representation"
> B) the "canonical binary representation"
> C) the "internal" representation
> D) Java Objects.
>
>
> C) is what PostgreSQL passes around (to PostGIS C functions as well as
> to the PLJava glue code) and stores on disk. The size is defined as
> "internallength" in the datatype, and is contained in a 4-byte varlena
> header for variable-length datatypes (those with internallength set to
> -1). (Let's ignore TOAST and 0-terminated strings for simplification.)
>
> D) is what's passed around in "user functions" in pljava land, obviously.
>
> A) is used in psql, pg_dump, non-binary COPY, the V2 protocol and the
> text mode of the V3 protocol.
>
> B) is used in binary COPY and the binary mode of the V3 protocol.
>
>
> The pljava UDT mapping converts between C and D via the readSQL() and
> writeSQL() methods.
>
> PostgreSQL uses the input and output functions defined for the type to
> convert between A and C.
>
> The send and receive functions for the datatype convert between B and C.
>
> http://www.postgresql.org/docs/8.1/interactive/xtypes.html contains an
> example with some less-sophisticated, but explicitly coded send and
> receive functions.
>
> I hope it is clear what I am trying to explain.
>
Yes, this makes sense to me. Do you still find this approach limiting or
is it in line with what you would like to have?

Regards,
Thomas Hallgren