[Pljava-dev] VarLenTuple example code

Lists:	pljava-dev

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] VarLenTuple example code
Date:	2006-09-28 13:40:01
Message-ID:	451BD0B1.8000408@logix-tt.com

Hi, Thomas,

Here's a small example I implemented for variable length data types.

It's not useful for anything but an example.

I included the source, as well as the diff for the examples.ddr file, so
please regard it as a contribution.

It all seems to work fine, apart from one small problem:

> [local]:pljavatest=# copy foo to '/tmp/foo' binary;
> COPY
> [local]:pljavatest=# copy foo from '/tmp/foo' binary;
> ERROR: Unable to find static method
> org.postgresql.pljava.example.VarLenTuple.receive
> with signature
> (Ljava/lang/String;)Lorg/postgresql/pljava/example/VarLenTuple;
> CONTEXT: COPY foo, line 1, column t

From our discussion, I had the impression that the receive method
will get a VarLenTuple instance (as I created a type mapping)
and not a String instance.
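
The signature in the error message is a JVM method descriptor. As a small sketch of how such a descriptor relates to a Java parameter type, the following uses java.lang.invoke.MethodType; the class DescriptorDemo and its helper are hypothetical names used only for illustration, and Object stands in for the VarLenTuple return type so that the example is self-contained.

```java
import java.lang.invoke.MethodType;
import java.sql.SQLInput;

// Illustration only: derives a JVM method descriptor from a parameter type.
// DescriptorDemo is a hypothetical name, not part of PL/Java.
final class DescriptorDemo {
    private DescriptorDemo() {}

    /** Descriptor of a method taking one parameter of the given type and returning Object. */
    static String descriptorFor(Class<?> parameterType) {
        return MethodType.methodType(Object.class, parameterType)
                         .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // The lookup in the error message asked for a String parameter.
        System.out.println(descriptorFor(String.class));
        // The posted class declares receive with an SQLInput parameter instead.
        System.out.println(descriptorFor(SQLInput.class));
    }
}
```

The two descriptors differ in the parameter part, which is consistent with the lookup failing for a receive method declared with an SQLInput argument.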

Thanks,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: varlentuple.diff
URL: <http://lists.pgfoundry.org/pipermail/pljava-dev/attachments/20060928/db64ac7a/attachment.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: VarLenTuple.java
Type: text/x-java
Size: 6542 bytes
Desc: not available
URL: <http://lists.pgfoundry.org/pipermail/pljava-dev/attachments/20060928/db64ac7a/attachment.bin>

From:	thomas at tada(dot)se (Thomas Hallgren)
To:
Subject:	[Pljava-dev] VarLenTuple example code
Date:	2006-09-28 13:49:40
Message-ID:	451BD2F4.60202@tada.se

Hmm, perhaps just an oversight, but you don't have UDT syntax on the
send/receive methods, i.e. this line:

+ AS 'org.postgresql.pljava.example.VarLenTuple.receive'

should read

+ AS 'UDT[org.postgresql.pljava.example.VarLenTuple] receive'

Regards,
Thomas Hallgren

Markus Schaber wrote:
> Hi, Thomas,
>
> Here's a small example I implemented for variable length data types.
>
> It's not useful for anything but an example.
>
> I included the source, as well as the diff for the examples.ddr file, so
> regard it as contribution.
>
> It all seems to work fine, apart from one small problem:
>
>
>> [local]:pljavatest=# copy foo to '/tmp/foo' binary;
>> COPY
>> [local]:pljavatest=# copy foo from '/tmp/foo' binary;
>> ERROR: Unable to find static method
>> org.postgresql.pljava.example.VarLenTuple.receive
>> with signature
>> (Ljava/lang/String;)Lorg/postgresql/pljava/example/VarLenTuple;
>> CONTEXT: COPY foo, line 1, column t
>>
>
> From our discussion, I had the impression that the receive method
> will get a VarLenTuple instance (as I created a type mapping)
> and not a String instance.
>
> Thanks,
> Markus
>
> ------------------------------------------------------------------------
>
> Index: src/java/examples/deployment/examples.ddr
> ===================================================================
> RCS file: /usr/local/cvsroot/pljava/org.postgresql.pljava/src/java/examples/deployment/examples.ddr,v
> retrieving revision 1.35
> diff -u -r1.35 examples.ddr
> --- src/java/examples/deployment/examples.ddr 13 May 2006 09:18:38 -0000 1.35
> +++ src/java/examples/deployment/examples.ddr 28 Sep 2006 13:29:26 -0000
> @@ -430,6 +430,57 @@
> AS 'org.postgresql.pljava.example.ComplexTuple.logAndReturn'
> LANGUAGE java IMMUTABLE STRICT;
>
> + /*
> + * Shell Type Definition for VarLenTuple:
> + */
> + CREATE FUNCTION javatest.varlentuple(cstring) RETURNS javatest.varlentuple AS 'lower' LANGUAGE INTERNAL;
> +
> + /* The scalar input function */
> + CREATE FUNCTION varlentuple_in(cstring)
> + RETURNS javatest.varlentuple
> + AS 'UDT[org.postgresql.pljava.example.VarLenTuple] input'
> + LANGUAGE java IMMUTABLE STRICT;
> +
> + /* The scalar output function */
> + CREATE FUNCTION varlentuple_out(javatest.varlentuple)
> + RETURNS cstring
> + AS 'UDT[org.postgresql.pljava.example.VarLenTuple] output'
> + LANGUAGE java IMMUTABLE STRICT;
> +
> + /* The scalar receive function */
> + CREATE FUNCTION varlentuple_recv(internal)
> + RETURNS javatest.varlentuple
> + AS 'org.postgresql.pljava.example.VarLenTuple.receive'
> + LANGUAGE java IMMUTABLE STRICT;
> +
> + /* The scalar send function */
> + CREATE FUNCTION varlentuple_send(javatest.varlentuple)
> + RETURNS bytea
> + AS 'org.postgresql.pljava.example.VarLenTuple.send'
> + LANGUAGE java IMMUTABLE STRICT;
> +
> + /* The scalar type declaration */
> + CREATE TYPE javatest.varlentuple (
> + internallength = -1,
> + input = javatest.varlentuple_in,
> + output = javatest.varlentuple_out,
> + receive = javatest.varlentuple_recv,
> + send = javatest.varlentuple_send,
> + alignment = int4
> + );
> +
> + /* A test function that just logs and returns its argument.
> + */
> + CREATE FUNCTION javatest.logvarlentuple(javatest.varlentuple)
> + RETURNS javatest.varlentuple
> + AS 'org.postgresql.pljava.example.VarLenTuple.logAndReturn'
> + LANGUAGE java IMMUTABLE STRICT;
> +
> + /* Install the actual type mapping.
> + */
> + SELECT sqlj.add_type_mapping('javatest.varlentuple', 'org.postgresql.pljava.example.VarLenTuple');
> +
> +
> /*
> * An example using the ANY type
> */
>
> ------------------------------------------------------------------------
>
> /*
> * Copyright (c) 2004, 2005, 2006 TADA AB - Taby Sweden
> * Copyright (c) 2006 Markus Schaber <schabi at logix-tt.com>
> * Distributed under the terms shown in the file COPYRIGHT
> * found in the root directory of this distribution or at
> * http://eng.tada.se/osprojects/COPYRIGHT.html
> */
> package org.postgresql.pljava.example;
>
> import java.io.ByteArrayOutputStream;
> import java.io.DataOutputStream;
> import java.io.IOException;
> import java.nio.ByteOrder;
> import java.sql.SQLData;
> import java.sql.SQLException;
> import java.sql.SQLInput;
> import java.sql.SQLOutput;
> import java.util.logging.Logger;
>
>
> /** Datatype implementing a variable length Tuple of Doubles.
>  *
>  * Not really useful except as an example and test for implementing fully fledged VARLENA datatypes
>  * including custom binary representation via Java. Code is intentionally kept simple, for the sake
>  * of efficiency, and based on the ComplexTuple example.
>  * @author schabi
>  *
>  */
> public class VarLenTuple implements SQLData
> {
>     private static Logger s_logger = Logger.getAnonymousLogger();
>
>     private static final String m_typeName = "javatest.varlentuple";
>
>     private static final double[] EMPTY = {};
>
>     private double[] arr = EMPTY;
>
>     public static VarLenTuple parse(String input, String typeName) throws SQLException
>     {
>         checkType(typeName);
>         if ((input == null) || (input.length()<2) || (input.charAt(0)!='(') || (input.charAt(input.length()-1) != ')') ) {
>             throw new SQLException("Illegal value for "+m_typeName+": "+input);
>         }
>
>         if (input.length()==2) {
>             return new VarLenTuple();
>         }
>         String[] parts = input.substring(1, input.length()-1).split(",");
>
>         double[] temparr = new double[parts.length];
>
>         for (int i=0; i < parts.length; i++) {
>             try {
>                 temparr[i] = Double.parseDouble(parts[i]);
>             } catch (NumberFormatException e) {
>                 throw new SQLException("Bad double '"+parts[i]+"' while parsing "+m_typeName+": "+e.getMessage());
>             }
>         }
>
>         return new VarLenTuple(temparr);
>
>     }
>
>     private static void checkType(String typeName) throws SQLException {
>         if (!m_typeName.equalsIgnoreCase(typeName)) {
>             throw new SQLException("parser for "+m_typeName+" cannot deserialize type "+typeName+"!");
>         }
>     }
>
>     public VarLenTuple()
>     {
>     }
>
>     public VarLenTuple(double[] arr)
>     {
>         this.arr = arr;
>     }
>
>     public String getSQLTypeName()
>     {
>         return m_typeName;
>     }
>
>     public void readSQL(SQLInput stream, String typeName) throws SQLException
>     {
>         s_logger.info(typeName + " from SQLInput");
>         checkType(typeName);
>
>         // fetch PostgreSQL VarLenA Header
>         int bin_size = stream.readInt() & 0x3fffffff;
>         int count = (bin_size - 4)/8;
>
>         if (bin_size != (count*8 +4))
>         {
>             throw new SQLException("Bad Size Header!");
>         }
>
>         double[] temparr = new double[count];
>
>         for (int i=0; i < count; i++)
>         {
>             temparr[i] = stream.readDouble();
>         }
>
>         // all went well, we can assign the value
>         arr = temparr;
>     }
>
>     public void writeSQL(SQLOutput stream) throws SQLException
>     {
>         s_logger.info(m_typeName + " to SQLOutput");
>         int count = arr.length;
>
>         stream.writeInt(count*8 +4);
>         for (int i=0; i < count; i++)
>         {
>             stream.writeDouble(arr[i]);
>         }
>     }
>
>     public String toString()
>     {
>         s_logger.info(m_typeName + " toString");
>         StringBuffer sb = new StringBuffer();
>         sb.append('(');
>         if (arr.length>0)
>         {
>             sb.append(arr[0]);
>             for (int i=1; i < arr.length; i++)
>             {
>                 sb.append(',');
>                 sb.append(arr[i]);
>             }
>         }
>         sb.append(')');
>         return sb.toString();
>     }
>
>     public static VarLenTuple logAndReturn(VarLenTuple cpl)
>     {
>         s_logger.info(cpl.getSQLTypeName() + cpl);
>         return cpl;
>     }
>
>     public static VarLenTuple receive(SQLInput input) throws SQLException {
>         int count = input.readInt();
>         // we have to swap, as we use network byte order for external representation
>         if (ByteOrder.nativeOrder()==ByteOrder.LITTLE_ENDIAN) {
>             count = Integer.reverseBytes(count);
>         }
>
>         if (count==0) {
>             return new VarLenTuple();
>         }
>         double[] temparr = new double[count];
>
>         if (ByteOrder.nativeOrder()==ByteOrder.LITTLE_ENDIAN) {
>             for (int i=0; i < count; i++) {
>                 temparr[i] = Double.longBitsToDouble(Long.reverseBytes(input.readLong()));
>             }
>         } else {
>             for (int i=0; i < count; i++) {
>                 temparr[i] = input.readDouble();
>             }
>
>         }
>         return new VarLenTuple(temparr);
>     }
>
>     public static byte[] send(VarLenTuple input) throws SQLException {
>         double[] temparr = input.arr;
>         int count = temparr.length;
>
>         ByteArrayOutputStream bos = new ByteArrayOutputStream(count*8+4);
>
>         //DataOutputStream always uses Network Byte Order (Big Endian)
>         DataOutputStream os = new DataOutputStream(bos);
>         try {
>             os.writeInt(count);
>
>             for (int i=0; i < count; i++) {
>                 os.writeDouble(temparr[i]);
>             }
>             os.flush();
>         } catch (IOException e) {
>             throw new SQLException("Error writing temporary buffer: "+e.getMessage());
>         }
>         return bos.toByteArray();
>
>     }
> }
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Pljava-dev mailing list
> Pljava-dev at gborg.postgresql.org
> http://gborg.postgresql.org/mailman/listinfo/pljava-dev
>

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] VarLenTuple example code
Date:	2006-09-28 14:04:00
Message-ID:	451BD650.2080201@logix-tt.com

Hi, Thomas,

Thomas Hallgren wrote:
> Hmm, perhaps just an oversight, but you don't have UDT syntax on the
> send/receive methods, i.e. this line:
>
> + AS 'org.postgresql.pljava.example.VarLenTuple.receive'
>
> should read
>
> + AS 'UDT[org.postgresql.pljava.example.VarLenTuple] receive'

That was intentional, as I wanted the custom-written send and receive
methods to be called (and not those from UDT.c), because I want the
send/receive binary format [type B)] to be a platform-independent
big-endian format, different from the internal one [type C)].

Declaring those methods without UDT[] magic was the way I hoped to
achieve that, according to how I understood our discussion.

Btw, I don't know whether my actual implementations of send() and
receive() are correct, as I could not really test them, but from a quick
look at the hexdump of the COPYed file, at least the send() way seems to
work.

Thanks,
Markus

From:	thomas at tada(dot)se (Thomas Hallgren)
To:
Subject:	[Pljava-dev] VarLenTuple example code
Date:	2006-09-28 14:25:34
Message-ID:	451BDB5E.1020107@tada.se

Markus Schaber wrote:
> Hi, Thomas,
>
> Thomas Hallgren wrote:
>
>> Hmm, perhaps just an oversight, but you don't have UDT syntax on the
>> send/receive methods, i.e. this line:
>>
>> + AS 'org.postgresql.pljava.example.VarLenTuple.receive'
>>
>> should read
>>
>> + AS 'UDT[org.postgresql.pljava.example.VarLenTuple] receive'
>>
>
> That was intentional, as I wanted the custom-written send and receive
> methods to be called (and not those from UDT.c) as I want the
> send/receive binary format [type B)] in a platform independent
> big-endian format, different from the internal one [type C)].
>
> Declaring those methods without UDT[] magic was the way I hoped to
> achieve that, according to how I understood our discussion.
>
>
No, in that case you must have misunderstood. When you do that, you
remove the middleman that sets up the correct SQLData semantics. The
send/receive functions in PostgreSQL map to the readSQL/writeSQL
methods in SQLData. readSQL is the method that is supposed to populate
your VarLenTuple from the native representation byte buffer (wrapped in
the SQLInput). writeSQL creates the native representation byte buffer
(wrapped in the SQLOutput) from the VarLenTuple. There's absolutely no
need for you to create some other send/receive mechanism in Java.

Regards,
Thomas Hallgren

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] VarLenTuple example code
Date:	2006-09-28 14:50:19
Message-ID:	451BE12B.6060809@logix-tt.com

Hi, Thomas,

Thomas Hallgren wrote:

> No, in that case you must have misunderstood. When you do that, you
> remove the middleman that sets up the correct SQLData semantics. The
> send/receive functions in PostgreSQL maps to the readSQL/writeSQL
> methods in SQLData. The readSQL is the method that supposedly populate
> your VarLenTuple from the byte buffer native presentation (wrapped in
> the SQLInput). The writeSQL creates the native presentation byte buffer
> (wrapped in the SQLOutput) from the VarLenTuple. There's absolutely no
> need for you to create some other send/receive mechanism in Java.

I just re-read your last mail, and I now see that I misread it.

But I still don't grasp how the SQLData implementation (readSQL/
writeSQL) knows whether it's supposed to read or write the internal
on-disk representation (type C) or the external binary (type B) format?

Do I need three different SQLData implementations, one for send (read C,
write B), one for receiving (read B, write C), and one for the type
mapping (read C, write C)? And which of them are supposed to have the
toString()/parse() methods?

Anyhow, I'm going to re-read our discussion and the source tomorrow,
maybe a little more sleep helps. :-)

Thanks,
Markus

From:	thomas at tada(dot)se (Thomas Hallgren)
To:
Subject:	[Pljava-dev] VarLenTuple example code
Date:	2006-09-28 16:07:06
Message-ID:	451BF32A.7000405@tada.se

Markus Schaber wrote:
> Hi, Thomas,
>
> Thomas Hallgren wrote:
>
>> No, in that case you must have misunderstood. When you do that, you
>> remove the middleman that sets up the correct SQLData semantics. The
>> send/receive functions in PostgreSQL maps to the readSQL/writeSQL
>> methods in SQLData. The readSQL is the method that supposedly populate
>> your VarLenTuple from the byte buffer native presentation (wrapped in
>> the SQLInput). The writeSQL creates the native presentation byte buffer
>> (wrapped in the SQLOutput) from the VarLenTuple. There's absolutely no
>> need for you to create some other send/receive mechanism in Java.
>
> I just re-read your last mail, and I now see that I misread it.
>
> But I still don't grasp how the SQLData implementation (readSQL/
> writeSQL) knows whether it's supposed to read or write the internal
> on-disk representation (type C) or the external binary (type B) format?
>
> Do I need three different SQLData implementations, one for send (read C,
> write B), one for receiving (read B, write C), and one for the type
> mapping (read C, write C)?
>
No. Just one SQLData implementation per type. The parameter to readSQL should be an SQLInput
wrapping the exact bytes that you would expect to be the parameter to the receive function.
writeSQL will create the bytes that would otherwise be created by the send function.

> And which of them are supposed to have the toString()/parse() methods?
>
The parse()/toString() methods are direct replacements for the mandatory PostgreSQL
input/output functions.

Normally when you create a type in PostgreSQL you'd create four functions in C. Now you
create four methods in Java instead:

input -> parse
output -> toString
receive -> readSQL
send -> writeSQL
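
As a minimal sketch of that mapping, assuming a fixed-length type holding a single double: the class name Celsius and the SQL type name javatest.celsius are hypothetical and exist only for this illustration. The parse signature follows the one used in the VarLenTuple example earlier in the thread.

```java
import java.sql.SQLData;
import java.sql.SQLException;
import java.sql.SQLInput;
import java.sql.SQLOutput;

// Hypothetical example type, not part of PL/Java.
final class Celsius implements SQLData {
    // Assumed SQL type name, for illustration only.
    private static final String TYPE_NAME = "javatest.celsius";

    private double degrees;

    // input -> parse: text representation to Java object
    public static Celsius parse(String input, String typeName) throws SQLException {
        try {
            Celsius c = new Celsius();
            c.degrees = Double.parseDouble(input.trim());
            return c;
        } catch (NumberFormatException e) {
            throw new SQLException("Bad value for " + typeName + ": " + input);
        }
    }

    // output -> toString: Java object to text representation
    @Override
    public String toString() {
        return Double.toString(degrees);
    }

    // receive -> readSQL: binary representation to Java object
    @Override
    public void readSQL(SQLInput stream, String typeName) throws SQLException {
        degrees = stream.readDouble();
    }

    // send -> writeSQL: Java object to binary representation
    @Override
    public void writeSQL(SQLOutput stream) throws SQLException {
        stream.writeDouble(degrees);
    }

    @Override
    public String getSQLTypeName() {
        return TYPE_NAME;
    }
}
```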

> Anyhow, I'm going to re-read our discussion and the source tomorrow,
> maybe a little more sleep helps. :-)
>
:-)

Regards,
Thomas Hallgren

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] VarLenTuple example code
Date:	2006-09-29 10:14:41
Message-ID:	451CF211.7000000@logix-tt.com

Hi, Thomas,

Let's recall the 4 representations I enumerated:

A) the "canonical text representation"
B) the "canonical binary representation"
C) the "internal" representation
D) Java Objects.

Thomas Hallgren wrote:

> No. Just one SQLData implementation per type. The parameter to
> readSQL should be an SQLInput wrapping the exact bytes that you would
> expect to be the parameter to the receive function. The writeSQL will
> create the bytes that would otherwise be created by the send
> function.

Ok, so then the writeSQL and readSQL are meant to convert between the
representations B and D.

From the examples and some points of our discussion, I had the
impression they should convert between C and D.

> Normally when you create a type in PostgreSQL you'd create four
> functions in C. Now you create four methods in Java instead:
>
> input -> parse
> output -> toString
> receive -> readSQL
> send -> writeSQL

Yes. Those four functions, as I understand it, do the following
representation conversions:

input: A -> C
output: C -> A
receive: B -> C
send: C -> B

The java methods should do the following tasks:

parse: A -> D
toString: D -> A
readSQL: B -> A & C -> A
writeSQL: A -> B & A -> D

The whole thing only makes sense for me under the assumption that the
representations B and C are identical.

But having them different is the whole point of why explicit send/
receive functions were invented in PostgreSQL, IIRC.

They are not equal for any of the PostgreSQL numerical datatypes I checked in
the source; those convert to network byte order in send(), and back to
platform byte order in receive().

The same holds for PostGIS, which uses a compatible extension of the
OpenGIS-defined "Well Known Binary" standard format for B, and an
optimized format in platform endianness that tends to change between
major releases for C.
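
To illustrate why B and C can differ even for a single number, here is a small sketch using java.nio.ByteBuffer. The class ByteOrderDemo and its methods are hypothetical names; the code only demonstrates byte order and does not reproduce any actual PostgreSQL or PostGIS format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class: the same double encoded in two byte orders.
final class ByteOrderDemo {
    private ByteOrderDemo() {}

    /** Encodes a double as 8 bytes in the given byte order. */
    static byte[] encode(double value, ByteOrder order) {
        return ByteBuffer.allocate(Double.BYTES).order(order).putDouble(value).array();
    }

    /** Decodes 8 bytes in the given byte order back into a double. */
    static double decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getDouble();
    }

    private static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "B"-style: network byte order, identical on every platform.
        System.out.println("network: " + hex(encode(1.0, ByteOrder.BIG_ENDIAN)));
        // "C"-style: platform byte order, which depends on the machine.
        System.out.println("native:  " + hex(encode(1.0, ByteOrder.nativeOrder())));
    }
}
```

On a big-endian machine both lines print the same bytes; on a little-endian machine the second line is byte-reversed, which is the conversion that send() and receive() have to perform.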

Thanks,
Markus

From:	thomas at tada(dot)se (Thomas Hallgren)
To:
Subject:	[Pljava-dev] VarLenTuple example code
Date:	2006-09-29 10:51:25
Message-ID:	451CFAAD.9030603@tada.se

Markus Schaber wrote:
> Hi, Thomas,
>
> Let's recall the 4 representations I enumerated:
>
> A) the "canonical text representation"
> B) the "canonical binary representation"
> C) the "internal" representation
> D) Java Objects.
>
>
> Ok, so then the writeSQL and readSQL are meant to convert between the
> representations B and D.
>
>
> Yes. Those four functions, as I understand it, do the following
> representation conversions:
>
> input: A -> C
> output: C -> A
> receive: B -> C
> send: C -> B
>
> The java methods should do the following tasks:
>
> parse: A -> D
> toString: D -> A

So far, I agree.

> readSQL: B -> A & C -> A
> writeSQL: A -> B & A -> D
>
Where did A come into the picture? readSQL/writeSQL have nothing
whatsoever to do with the text representation. They perform the work of
send/receive and work with binary representations only. The only difference
is that they bypass C and instead go directly to D.

readSQL: B -> D
writeSQL: D -> B

> The whole thing only makes sense for me under the assumption that the
> representations B and C are identical.
>
> But having them different is the whole point of why explicit send/
> receive functions were invented in PostgreSQL, IIRC.
>
Yes, I agree with that. B and C are different.

For the sake of the argument, let's assume that we create a Java type
that corresponds to the internal format C. That Java type is essentially
a byte[] but we wrap it in a SQLData implementation in order to use it
as a type. For this type, we can claim that C == D since its
representation *is* the internal representation.

The readSQL of this type will then perform *exactly* the same work as
the receive function does, i.e. B -> C. Its corresponding writeSQL method
will perform *exactly* the same work as send, i.e. C -> B.

In Java, the above object cannot be used for much. It's just an
opaque sequence of bytes. So instead of just wrapping the byte[], you
probably add some more semantics to it. When you do that, you also
rewrite the readSQL to go directly from B -> D.

In essence, C is the internal representation used in C-functions
(structures etc.) whereas D is the internal representation used by Java
methods (real objects). They both stem from B in case of send/receive or
from A in case of input/output.

Regards,
Thomas Hallgren

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] VarLenTuple example code
Date:	2006-09-29 11:37:12
Message-ID:	451D0568.60105@logix-tt.com

Hi, Thomas,

Markus Schaber wrote:

> readSQL: B -> A & C -> A
> writeSQL: A -> B & A -> D

I obviously messed that one up.

It should read:

readSQL: B->D & C->D
writeSQL: D->B & D->C

Sorry,
Markus

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] VarLenTuple example code
Date:	2006-09-29 12:01:15
Message-ID:	451D0B0B.4010008@logix-tt.com

Hi, Thomas,

Thomas Hallgren wrote:

>> readSQL: B -> A & C -> A
>> writeSQL: A -> B & A -> D

Sorry, I messed that one up. Should read:

readSQL: B->D & C->D
writeSQL: D->B & D->C

> Where did A come into the picture? readSQL/writeSQL has nothing
> whatsoever to do with text representation. They perform the work of
> send/receive and work with binary representations only. Only difference
> is that they bypass the C and instead go directly to D.

But C is the one that's stored on disk by PostgreSQL, neither D nor B.

>> The whole thing only makes sense for me under the assumption that the
>> representations B and C are identical.
>>
>> But having them different is the whole point of why explicit send/
>> receive functions were invented in PostgreSQL, IIRC.
>>
> Yes, I agree with that. B and C are different.
>
> For the sake of the argument, lets assume that we create a Java type
> that corresponds to the internal format C. That Java type is essentially
> a byte[] but we wrap it in a SQLData implementation in order to use it
> as a type. For this type, we can claim that C == D since its
> representation *is* the internal representation.

But C!=D, as PostgreSQL does not know how to serialize Java Objects, AFAIK.

And when you call writeSQL() on the java object, you set B==C and
effectively store B on disk.

This may be OK for new datatypes developed purely in Java, although
possibly suboptimal. And for them, it makes sense to have the
SQLInput/SQLOutput always use network byte order, as they did before
my patch.

But it does not work for someone creating a drop-in replacement for an
existing datatype that was previously implemented in C, as they have to
keep both B and C as they were before, to stay compatible.

> In essence, C is the internal representation used in C-functions
> (structures etc.) whereas D is the internal representation used by Java
> methods (real objects). They both stem from B in case of send/receive or
> from A in case of input/output.

But C is the representation that's stored on disk and passed around
internally on the PostgreSQL side, into functions implemented in other
procedural languages like C or plpython/plperl. So we cannot simply
ignore it, IMHO.

For my plans regarding PostGIS, I currently only want a type mapping, so
I need conversions between C and D. That works fine, AFAICS, at least
with the endianness patch applied.

But should we ever want to replace the C implementation with a Java
implementation for some reason, and stay drop-in compatible w.r.t. both
clients seeing B and 3rd-party extensions (like a plpython mapping)
seeing C, that would be impossible using the current pljava design, correct?

Note: I don't want any changes or fixes for that, I only want to be sure
that I understood the internals correctly, and have it documented
properly (I'm willing to write and submit that piece of documentation).

Thanks,
Markus

From:	thomas at tada(dot)se (Thomas Hallgren)
To:
Subject:	[Pljava-dev] VarLenTuple example code
Date:	2006-09-29 12:26:08
Message-ID:	451D10E0.8020003@tada.se

Markus Schaber wrote:
>
> But C is the one that's stored on disk by PostgreSQL, neither D nor B.
>

Now I'm confused. I thought you said C was the internal representation?

Regards,
Thomas Hallgren

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] VarLenTuple example code
Date:	2006-09-29 13:37:28
Message-ID:	451D2198.7050202@logix-tt.com

Hi, Thomas,

Thomas Hallgren wrote:

>> But C is the one that's stored on disk by PostgreSQL, neither D nor B.
> Now I'm confused. I thought you said C was the internal representation?

Yes, as I described it in Message-ID: <45183343.8000105 at logix-tt.com>
(http://gborg.postgresql.org/pipermail/pljava-dev/2006/000925.html);
at least that's how I thought things work in PostgreSQL:

> C) is what PostgreSQL passes around (to PostGIS C functions as well as
> to the PLJava glue code), and stores on disk. The size is defined as
                            ^^^^^^^^^^^^^^^^^^
> "internallength" in the datatype, and contained in a 4-byte VARLEN
> header for variable length datatypes which have internallength set to
> -1. (let's ignore TOAST and 0-terminated Strings for simplification.)

It's "internal" in the sense that it is not visible to clients outside
of the Postmaster.

Clients (including pg_dump/pg_restore and the COPY formats) see what
output/input (A) and send/receive (B) produce and consume, depending on
whether they use text or binary mode.

But as far as I can tell, PostgreSQL does not use send/receive before
writing the tuple data to disk. At least that is what I believed so far;
I'm rather confused now, and I'm going to double- and triple-check that
immediately.

Thanks,
Markus

From:	thomas at tada(dot)se (Thomas Hallgren)
To:
Subject:	[Pljava-dev] VarLenTuple example code
Date:	2006-09-29 14:06:01
Message-ID:	451D2849.8030109@tada.se

Hi Markus,
OK, now I think I finally understand what it is you are trying to hammer
through my thick skull :-)

You are saying that two mechanisms are not enough. We need:

- The text (canonical) to disc conversion (the current input/output).
- The external binary (canonical) format (used in various protocols) to
  disc conversion (the send/receive).
- A way to create Java objects from the canonical binary format.
- A way to create the canonical binary format from Java objects.

Then it all falls into place. This is what I think needs to be done:

1. We need two new methods. One that does the C -> B conversion and one
that does the B -> C. I.e. verbatim Java implementations of the
send/receive. Those methods should be static since this is a byte[] to
byte[] conversion only.
2. The readSQL methods should be called with an SQLInput that wraps the
output of the C -> B converter.
3. The data written by the writeSQL method should be passed to the B ->
C converter.

And of course, Java really comes into play when a COPY is performed
since the B -> C and C ->B must be executed.
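
A rough sketch of the two static byte[]-to-byte[] converters from item 1, under assumed layouts taken from the VarLenTuple example in this thread: C is a 4-byte size header (counting the header itself) followed by doubles in platform byte order, and B is a 4-byte element count followed by doubles in network byte order. The class VarLenTupleCodec and its method names are hypothetical; PL/Java provides no such hook.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not PL/Java API. Layouts are assumptions (see lead-in).
final class VarLenTupleCodec {
    private VarLenTupleCodec() {}

    /** The work of send: internal (C) bytes to external (B) bytes. */
    static byte[] internalToExternal(byte[] internal) {
        if (internal.length < 4) {
            throw new IllegalArgumentException("Missing size header");
        }
        ByteBuffer in = ByteBuffer.wrap(internal).order(ByteOrder.nativeOrder());
        // Mask the two high bits, as readSQL in the posted example does.
        int size = in.getInt() & 0x3fffffff;
        int count = (size - 4) / 8;
        if (size != count * 8 + 4 || size != internal.length) {
            throw new IllegalArgumentException("Bad size header");
        }
        ByteBuffer out = ByteBuffer.allocate(4 + count * 8); // big-endian by default
        out.putInt(count);
        for (int i = 0; i < count; i++) {
            out.putDouble(in.getDouble());
        }
        return out.array();
    }

    /** The work of receive: external (B) bytes to internal (C) bytes. */
    static byte[] externalToInternal(byte[] external) {
        if (external.length < 4) {
            throw new IllegalArgumentException("Missing element count");
        }
        ByteBuffer in = ByteBuffer.wrap(external); // big-endian by default
        int count = in.getInt();
        // Long arithmetic avoids overflow for hostile counts.
        if (count < 0 || external.length != 4L + 8L * count) {
            throw new IllegalArgumentException("Bad element count");
        }
        ByteBuffer out = ByteBuffer.allocate(4 + count * 8).order(ByteOrder.nativeOrder());
        out.putInt(count * 8 + 4);
        for (int i = 0; i < count; i++) {
            out.putDouble(in.getDouble());
        }
        return out.array();
    }
}
```

Both methods are pure functions over byte arrays, which matches the observation that they can be static.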

Sorry for being so thick-headed.

Kind Regards,
Thomas Hallgren

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] VarLenTuple example code
Date:	2006-09-29 15:08:50
Message-ID:	451D3702.5070802@logix-tt.com

Hi, Thomas,

Thomas Hallgren wrote:

> You are saying that two mechanisms are not enough. We need:
>
> The text (canonical) to disc conversion (the current input/output)
> The external binary (canonical) format (used in various protocols) to
> disc conversion (the send/receive).

Yes, correct.

> A way to create Java objects from canonical binary format.
> A way to create the canonical binary format from Java objects.

No, I think we should convert between java objects and the disc format.
The disc format (aka C, "internal") is what we get handed when
PostgreSQL calls SQL functions, and what we have to give back to
PostgreSQL as result (and when calling via SPI). I don't see any value
in the C->B->D roundtrip.

Conversions between C and D are the most common ones, so they should be
fast.

> Then it all falls into place. This is what I think needs to be done:
>
> 1. We need two new methods. One that does the C -> B conversion and one
> that does the B -> C. I.e. verbatim Java implementations of the
> send/receive. Those methods should be static since this is a byte[] to
> byte[] conversion only.

Yes. That's what I wanted to implement with my static receive and send
methods in the VarLenTuple class.

> 2. The readSQL methods should be called with an SQLInput that wraps the
> output of the C -> B converter.
> 3. The data written by the writeSQL method should be passed to the B ->
> C converter.

No, the SQLInput/SQLOutput should wrap the on-disk format directly (what
it does now, AFAICS) for datatype mappings, without calling any converters.

> And of course, Java really comes into play when a COPY is performed
> since the B -> C and C -> B must be executed.

Yes.

[Warning: Direct brain-dump follows. Anticipate confusion.]

So my idea is:

- Use the readSQL/writeSQL format to read and write C (internal / disk
format) for the "type mapping". This means all the example code using
type mapping should continue to work; no code has to be changed for it.

- Leave the UDT[] magic for input/output mapping as it is now, as it
works fine, and using the toString() semantics is nice.

- Drop the UDT[] magic for send/receive, and allow the users to define
them as "normal" static methods.

The only problem when implementing the send/receive functions as static
methods was that the pseudo type "internal" parameter is not mapped
usefully. In my eyes, mapping the "internal" parameter from receive
(which is a StringInfo in PG_GETARG_POINTER(0) internally) to an
SQLInput object should be the easiest way to solve that problem.

(It's a bit misleading to call that pseudo type "internal" despite the
fact that it carries the external representation in this case. One might
think the PostgreSQL core hackers celebrate some kind of cynicism. :-)

This way, it's possible to implement both send() and receive() as static
methods in plain java, each implementing one way of a clean conversion
between the Java object and the serialized, external B form.

Actually, these are C->D->B and B->D->C conversions. The C->D and D->B are
handled by the existing type mapping. Just look how my VarLenTuple is
coded, that's what I have in mind.

While meditating, I came to the following conclusion: AFAICS, there's
already a mapping in place for CString to Java Strings, so input/output
functions seem to be implementable with static methods instead of UDT[]
magic, if wanted. And keeping the UDT[] magic code in place for send/
receive will not hurt either, so users can use it when they think it
fits their needs (having equal B and C formats).

Advanced optimization:
For efficiency reasons, it might be useful to have send() and receive()
convert directly between C and B, without the intermediate D step. This
could be made by having both methods working on a pair of SQLInput/
SQLOutput, but we'd need some additional magic to be able to declare
such methods, I think. Maybe we could introduce some declaration that
allows a function to receive and send SQLInput/SQLOutput directly,
without calling the type mapping code, thus they could work with the
on-disk representation directly, a kinda "high speed" path. But I'm
afraid that would be highly PostgreSQL specific and not portable in any way.
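
A minimal sketch of what such a direct C <-> B path could look like,
written against plain byte arrays rather than SQLInput/SQLOutput. It
assumes, purely for illustration, that the payload is a run of 4-byte
integers; the class name DirectConvert and the explicit machineOrder
parameter are hypothetical and not part of PL/Java. In real use the
machine order would presumably be ByteOrder.nativeOrder().

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not PL/Java API: converts a payload made only of
// 4-byte integers between machine order (C) and network order (B).
final class DirectConvert {

    // C -> B: internal bytes in machine order to network byte order.
    static byte[] send(byte[] internal, ByteOrder machineOrder) {
        return reorder(internal, machineOrder, ByteOrder.BIG_ENDIAN);
    }

    // B -> C: network byte order back to machine order.
    static byte[] receive(byte[] external, ByteOrder machineOrder) {
        return reorder(external, ByteOrder.BIG_ENDIAN, machineOrder);
    }

    // Reads each int in the source order and writes it in the target order.
    private static byte[] reorder(byte[] src, ByteOrder from, ByteOrder to) {
        if (src.length % Integer.BYTES != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        ByteBuffer in = ByteBuffer.wrap(src).order(from);
        ByteBuffer out = ByteBuffer.allocate(src.length).order(to);
        while (in.hasRemaining()) {
            out.putInt(in.getInt());
        }
        return out.array();
    }
}
```

No Java object of the mapped type (D) is created on this path, which is
the point of the optimization; the cost is that the byte layout is
handled by hand.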

Phew.
I think I'll need another night or two to think about all that, but at
least at the moment, it seems to make sense for me. :-)

Thanks,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

From:	thomas at tada(dot)se (Thomas Hallgren)
To:
Subject:	[Pljava-dev] VarLenTuple example code
Date:	2006-10-01 11:53:52
Message-ID:	451FAC50.9060100@tada.se
Lists:	pljava-dev

Hi Markus,
I started inlining some responses to your ideas but gave it up. I like
your ideas and I think it all comes down to two different approaches;
either we do the static approach and let the send/receive and
input/output be pure static methods and deal with B<->C and A<->C
conversions or we do the more puristic OO approach and let the
send/receive and input/output take the overhead of creating Java objects
and then go from there.

Regardless of what method we choose, we will also need something that
does the C<->D conversion (i.e. SQLData).
I think we must choose one or the other and go all the way. If we go the
static approach (your advanced optimization), then we must do that for
both the input/output and the send/receive pairs. It doesn't make much
sense to do it with only one of them. The more I think about it, the
more I'm leaning towards the static approach. There are three major reasons:

1. Less overhead. Instances are only created when really needed.
2. Very easy to explain since our input, output, send, and receive will
perform the exact same work as would be done by corresponding
implementations written in C.
3. There is no such thing as parse/toString in the SQLData interface so
that's a somewhat contrived construction anyway.

So, this is what I would suggest that we do:

1. We keep the UDT[] syntax for all four methods. They stipulate the
static "interface" and should not be made arbitrary methods. If you want
a type, you have to make it adhere to this "interface" (within quotes
since it's about static methods and not a Java interface per se).
2. We still require that a type implements the SQLData interface. This
will then be used exclusively for C<->D conversions and those methods
have nothing to do with the UDT[] methods. They will be called by the
coerceObject/coerceData methods in PL/Java UDT.c (like today).

I.e. a full-blown type implementation would then have 6 methods (4
static) and look like this:

// The new static interface
//
public static void input(String input, SQLData output) throws
SQLException { ... }
public static String output(SQLData input) throws SQLException { ... }
public static void receive(SQLData input, SQLData output) throws
SQLException { ... }
public static void send(SQLData input, SQLData output) throws
SQLException { ... }

// The SQLData interface
//
public void readSQL(SQLData input, String typename) throws SQLException
{ ... }
public void writeSQL(SQLData output) throws SQLException { ... }

What do you think?

Regards,
Thomas Hallgren

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] VarLenTuple example code
Date:	2006-10-01 12:38:37
Message-ID:	451FB6CD.7070808@logix-tt.com
Lists:	pljava-dev

Hi, Thomas,

Thomas Hallgren wrote:

> I.e. a full-blown type implementation would then have 6 methods (4
> static) and look like this:
>
> // The new static interface
> //
> public static void input(String input, SQLData output) throws
> SQLException { ... }
> public static String output(SQLData input) throws SQLException { ... }

Using a String in input/output looks like a good compromise between
usability and speed.

I'd prefer to have a CharSequence in both places (especially as we could
use a StringBuilder in output then), but as they're @since 1.5, String
seems the better choice.

> public static void receive(SQLData input, SQLData output) throws
> SQLException { ... }
> public static void send(SQLData input, SQLData output) throws
> SQLException { ... }

Did you mean SQLInput/SQLOutput instead of SQLData here?

> // The SQLData interface
> //
> public void readSQL(SQLData input, String typename) throws SQLException
> { ... }
> public void writeSQL(SQLData output) throws SQLException { ... }

SQLInput/SQLOutput here, too, I guess. :-)

And a simple mapping for an existing Datatype still needs to implement
only those two, excellent.
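
A minimal sketch of such a plain mapping, using the standard
java.sql.SQLData signatures (readSQL takes an SQLInput, writeSQL takes
an SQLOutput). The class IntPair, its two int fields, and the type name
"intpair" are hypothetical and chosen only to keep the example short.

```java
import java.sql.SQLData;
import java.sql.SQLException;
import java.sql.SQLInput;
import java.sql.SQLOutput;

// Hypothetical mapped type: two ints, read and written in a fixed order.
final class IntPair implements SQLData {
    private int x;
    private int y;
    private String typeName = "intpair"; // assumed SQL type name

    IntPair() {
    }

    IntPair(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int getX() {
        return x;
    }

    int getY() {
        return y;
    }

    @Override
    public String getSQLTypeName() {
        return typeName;
    }

    // C -> D: populate this object from the stream handed in by the caller.
    @Override
    public void readSQL(SQLInput stream, String typeName) throws SQLException {
        this.typeName = typeName;
        this.x = stream.readInt();
        this.y = stream.readInt();
    }

    // D -> C: write the fields back in the same order they are read.
    @Override
    public void writeSQL(SQLOutput stream) throws SQLException {
        stream.writeInt(x);
        stream.writeInt(y);
    }
}
```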

> What do you think?

I think it's a good idea. Someone who wants to have the object
representation in input/output and/or send/receive can still use the
readSQL/writeSQL methods of an instance he created himself. I'll recode
my VarLenTuple example to work this way, and explain the alternatives
using comments.

There's one little problem I currently see: We'll need 2 implementations
of SQLInput and SQLOutput. One for machine endianness (to parse C), and
one for network byte order (to parse D).

Or maybe we have both a Big and Little endian implementation, and the
UDT[] syntax contains a hint on which byte order to use (Big=Network,
Little, Native=whatever the machine has), with the defaults of Network
for B and Native for C.

Having two fixed-endian implementations will even be more efficient
compared to the current one that has an if() in every method, in case
the Jit does not grasp it.
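
To illustrate the point, here is a rough sketch of two fixed-endian
readers where the byte order is decided by the class, not by a test
inside each method. The names FixedEndianReaders, IntReader, BigEndian
and LittleEndian are hypothetical; the existing PL/Java classes are not
shown here.

```java
// Hypothetical sketch: one reader class per byte order, so readInt
// contains no per-call branch on endianness.
final class FixedEndianReaders {

    interface IntReader {
        int readInt(byte[] buf, int offset);
    }

    // Network byte order: most significant byte first.
    static final class BigEndian implements IntReader {
        @Override
        public int readInt(byte[] buf, int offset) {
            return (buf[offset] & 0xFF) << 24
                 | (buf[offset + 1] & 0xFF) << 16
                 | (buf[offset + 2] & 0xFF) << 8
                 | (buf[offset + 3] & 0xFF);
        }
    }

    // Least significant byte first.
    static final class LittleEndian implements IntReader {
        @Override
        public int readInt(byte[] buf, int offset) {
            return (buf[offset] & 0xFF)
                 | (buf[offset + 1] & 0xFF) << 8
                 | (buf[offset + 2] & 0xFF) << 16
                 | (buf[offset + 3] & 0xFF) << 24;
        }
    }
}
```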

Btw, do you have any idea if and how other databases implement the UDT
mapping for java?

Thanks,
Markus

From:	thomas at tada(dot)se (Thomas Hallgren)
To:
Subject:	[Pljava-dev] VarLenTuple example code
Date:	2006-10-01 13:31:54
Message-ID:	451FC34A.5070803@tada.se
Lists:	pljava-dev

Markus Schaber wrote:
> I'd prefer to have a CharSequence in both places (especially as we could
> use a StringBuilder in output then), but as they're @since 1.5, String
> seems the better choice.
>
>
Yes, I thought about that too. Another thing that could be used is the
java.io.Reader/Writer but it would only be beneficial for very large
types. For smaller types we'd probably lose performance and definitely
lose simplicity.

>> public static void receive(SQLData input, SQLData output) throws
>> SQLException { ... }
>> public static void send(SQLData input, SQLData output) throws
>> SQLException { ... }
>>
>
> Did you mean SQLInput/SQLOutput instead of SQLData here?
>
>
Doh, of course!

>> // The SQLData interface
>> //
>> public void readSQL(SQLData input, String typename) throws SQLException
>> { ... }
>> public void writeSQL(SQLData output) throws SQLException { ... }
>>
>
> SQLInput/SQLOutput here, too, I guess. :-)
>
>
Yes.

> There's one little problem I currently see: We'll need 2 implementations
> of SQLInput and SQLOutput. One for machine endianness (to parse C), and
> one for network byte order (to parse D).
>
>
You mean B here, right?

> Or maybe we have both a Big and Little endian implementation, and the
> UDT[] syntax contains a hint on which byte order to use (Big=Network,
> Little, Native=whatever the machine has), with the defaults of Network
> for B and Native for C.
>
>
I agree that we need two implementations but I think the choice of which
one to use should be made by PL/Java at all times. The end user doesn't
need to be exposed to this and PL/Java will always know what it wraps.
Why introduce the complexity?

> Having two fixed-endian implementations will even be more efficient
> compared to the current one that has an if() in every method, in case
> the Jit does not grasp it.
>
>
Right. And another thing that struck me is that since all calls originate
from C code, and since the backend is inherently single-threaded, it
would be OK to use one singleton instance of each type. If we do that,
no objects need to be created when doing send/receive.

> Btw, do you have any idea if and how other databases implement the UDT
> mapping for java?
>
>
I've done some experimenting with Oracle. They are much closer to the
SQL 2003 standard where you actually define types along with attributes,
methods, and constructors. I've brought it up with the PostgreSQL
community a couple of times but they haven't shown much interest so far.
I should mention that the Oracle experiments were performed five years
ago so a lot might have happened since.

Regards,
Thomas Hallgren

From:	schabi at logix-tt(dot)com (Markus Schaber)
To:
Subject:	[Pljava-dev] VarLenTuple example code
Date:	2006-10-01 18:16:28
Message-ID:	452005FC.9070104@logix-tt.com
Lists:	pljava-dev

Hi, Thomas,

Thomas Hallgren wrote:

>> There's one little problem I currently see: We'll need 2 implementations
>> of SQLInput and SQLOutput. One for machine endianness (to parse C), and
>> one for network byte order (to parse D).
>>
> You mean B here, right?

Yes. Replace D with B.

>> Or maybe we have both a Big and Little endian implementation, and the
>> UDT[] syntax contains a hint on which byte order to use (Big=Network,
>> Little, Native=whatever the machine has), with the defaults of Network
>> for B and Native for C.
>>
> I agree that we need two implementations but I think the choice of which
> one to use should be made by PL/Java at all times. The end user doesn't
> need to be exposed to this and PL/Java will always know what it wraps.
> Why introduce the complexity?

I agree.

Currently, I don't have any use case for the non-default
mappings. And should someone really need a different endianness, pljava
can still be adapted, or he/she can put it together byte for byte, or
use Integer.reverseBytes or such.
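
For the hand-rolled case, the JDK method that swaps the byte order of
an int is Integer.reverseBytes (available since Java 1.5). A small
sketch follows; the ByteSwap class and swapAll method are hypothetical
helpers, not part of PL/Java.

```java
// Hypothetical helper: flips the byte order of every int in an array
// using the standard Integer.reverseBytes method.
final class ByteSwap {

    static int[] swapAll(int[] values) {
        int[] swapped = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            swapped[i] = Integer.reverseBytes(values[i]);
        }
        return swapped;
    }
}
```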

>> Having two fixed-endian implementations will even be more efficient
>> compared to the current one that has an if() in every method, in case
>> the Jit does not grasp it.
>>
> Right. And another thing that struck me is that since all calls originate
> from C code, and since the backend is inherently single-threaded, it
> would be OK to use one singleton instance of each type. If we do that,
> no objects need to be created when doing send/receive.

Yes, I agree. So we need 4 static variables:

- SQLInput Network Byte Order / Big Endian
- SQLInput Machine Byte Order

- SQLOutput Network Byte Order
- SQLOutput Machine Byte Order

On big endian machines, both input variables can even point to the same
instance, and the same for the output ones, but I think that's
overoptimization.
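
A rough sketch of the four shared instances, assuming a single-threaded
caller as discussed. The Input and Output classes below are simplified,
hypothetical stand-ins for the real SQLInput/SQLOutput implementations,
and they only handle ints. Note that this sketch still allocates a
small ByteBuffer on every reset; only the stream objects themselves are
reused.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch: one reusable stream object per direction and
// byte order. Safe only because the caller is single-threaded.
final class SharedStreams {

    static final class Input {
        private final ByteOrder order;
        private ByteBuffer buf;

        Input(ByteOrder order) {
            this.order = order;
        }

        // Point the shared instance at a new chunk of data.
        Input reset(byte[] data) {
            buf = ByteBuffer.wrap(data).order(order);
            return this;
        }

        int readInt() {
            return buf.getInt();
        }
    }

    static final class Output {
        private final ByteOrder order;
        private ByteBuffer buf;

        Output(ByteOrder order) {
            this.order = order;
        }

        // Start a fresh output of at most the given number of bytes.
        Output reset(int capacity) {
            buf = ByteBuffer.allocate(capacity).order(order);
            return this;
        }

        Output writeInt(int value) {
            buf.putInt(value);
            return this;
        }

        // Returns only the bytes written so far.
        byte[] toBytes() {
            return Arrays.copyOf(buf.array(), buf.position());
        }
    }

    static final Input NETWORK_INPUT = new Input(ByteOrder.BIG_ENDIAN);
    static final Input MACHINE_INPUT = new Input(ByteOrder.nativeOrder());
    static final Output NETWORK_OUTPUT = new Output(ByteOrder.BIG_ENDIAN);
    static final Output MACHINE_OUTPUT = new Output(ByteOrder.nativeOrder());
}
```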

>> Btw, do you have any idea if and how other databases implement the UDT
>> mapping for java?
>
> I've done some experimenting with Oracle. They are much closer to the
> SQL 2003 standard where you actually define types along with attributes,
> methods, and constructors. I've brought it up with the PostgreSQL
> community a couple of times but they haven't shown much interest so far.
> I should mention that the Oracle experiments were performed five years
> ago so a lot might have happened since.

So I guess they use their own mapping between the on-disk and java
representations, possibly along the paths of Java serialization.

Thanks,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org