
[Pljava-dev] advice needed

Lists:	pljava-dev

From:	info at wyse-systems(dot)ltd(dot)uk (WYSE Systems Limited (Information))
To:
Subject:	[Pljava-dev] advice needed
Date:	2005-02-16 21:09:55
Message-ID:	200502162114.j1GLEta04996@eagle.cqhost.net

On Wed, 16 Feb 2005 17:53:46 +0100, Thomas Hallgren wrote:

>Keep in mind that the PostgreSQL backend is inherently single threaded.
>You have one JVM per session. This means two things. 1) you hardly ever
>need to worry about synchronization. The only time you'll need it is
>when you expect finalizers to do something that might conflict with the
>main thread. 2) Since you will hardly ever encounter contended
>synchronization locks, the overhead of using synchronizers will be
>almost none at all. A modern JVM is extremely fast when it comes to
>uncontended synchronizations.
>
>Don't rely on finalizers to do cleanup. Do that just prior to returning
>false in the assignRowValues.

Frankly, the whole idea of having a method called each time to set a ResultSet containing one row does not inspire me at all - it has to be said!

Wouldn't it be better to have a single method like:

// client interface method
public ResultSet getResult();

The receiver presented in the assignRowValues is already a ResultSet consisting of a single row. If you are worried about type matching you can easily check for that by gathering the appropriate meta data (as you do with a normal ResultSet) or throw an exception if type constraints have been violated (see below).
Why is it a problem to supplement it with a ResultSet object containing not one row, but the entire set and then iterate over it *internally*, using next?
In other words, I am suggesting that the call to 'next', instead of being made in assignRowValues as you suggested, be made internally by pljava after a single call to getResult() has returned the entire set, i.e.:

current approach:
~~~~~~~~~~~~
1. set a one-row ResultSet object;
2. set row number value;
3. call assignRowValues;
4. process result;

new approach:
~~~~~~~~~~
1. call getResult();
2. process result (iterate over each row, if necessary, otherwise store the ResultSet internally for later processing);

If there is a requirement/constraint from PostgreSQL to process a single row at a time, then call getResult once, store the result internally and use it in another private get-one-resultset-row method (probably similar to assignRowValues) within pljava.
That, I think, would be much easier for developers like myself to handle: all I have to worry about then is preparing the ResultSet the way I want it, without getting bogged down in implementing iterations and messing about with row numbers and the like.

Better still, you can define and 'fire' different events throughout the entire process to give developers more control over what is being done. If adopted, I have a suggestion of (at least) three such events (I suspect these methods will be in addition to the ones controlling the pool behaviour, like 'make', 'activate', 'passivate' and 'destroy'):

public void initialise(); // fired before the getResult() interface method is actually called to give the client class a chance to initialise itself
public void lastRowProcessed(); // fired after the last row of the ResultSet has been processed;
public void processException(Exception e); // fired when a pljava/processing exception has occurred

>
>>Wouldn't it be better for pljava to create a pool of instances and present
>>them to the caller each time a call to the function is made? If you adopt
>>this approach and extend the ResultSetProvider class to include at least
>>two more methods for managing class 'activate' and 'deactivate' events
>>while keeping/recycling the class instance, that would bring a performance
>>boost (not to mention the memory management improvements). You can then add
>>a few more options to the postgresql.conf file to configure the object
>>pool. I am using a similar pool here on our system (it deals with between
>>30 and 120 different class instances existing in the pool at any point in
>>time, each of which has an independent network connection to a client) and
>>using a pool of objects (as opposed to class instance
>>creation/destruction upon every call) brings a performance boost
>>of between 17 and 28% for the entire system.
>>
>>
>Yes, using a pool is an excellent idea. Stay tuned for next release ;-)

I can't wait 'til the next release - I want it *NOW* (;

Seriously though, supplementing the assignRowValues thingy with one swift ResultSet getter is going to be a lot better, methinks.

>Meanwhile, try something like this (replace PooledProvider with a name
>of choice):

Nah! When I have time (probably around Friday time by the looks of it) I'll post the code of our pool framework based on Jakarta's Generic Object Pool which has been used by us for the past 7 months - all that without a single glitch (I can hardly recall stopping the service or bringing down the servers more than 2-3 times since it was launched).
The framework is very robust and, as I pointed out earlier, has dealt with a lot of pounding in the past.

>>2. I am no expert in PostgreSQL internals (in fact I don't know anything
>>about that at all), but with the above class (DetailsView) wouldn't it be
>>wiser to call 'a method' once and get the result in one go, instead of
>>calling assignRowValues for each individual row? I think I know the answer
>>to that one, but it is worth asking and giving it a go anyway (;
>>
>>
>The reason is a combination of factors:
>1. The PostgreSQL backend function interface stipulates that you
>implement a function that returns one row at a time.
>2. Returning all in one go implies that you either build everything up
>in memory (not an option for you), or that you create some kind of
>implementation that in turn can return one row at a time. Then you've
>gained nothing since that's exactly what the ResultSetProvider does.

There is one single, most important and precious benefit to it all: *TIME*. A single call saves developers a great deal of development time that would otherwise be spent/wasted on implementing assignRowValues in each class. With a single call (getResult() as I suggested above) I wouldn't have to worry too much about what goes on underneath and could concentrate on other important tasks.
This may not sound like much, but when you have 800+ classes in over 54 packages to deal with, it becomes a real issue, and time is a precious commodity indeed.

>Having said that, perhaps PLJava should provide a variant that allows
>you to return a ResultSet and thereby bypass the transfer between
>ResultSets that has to take place today. We are currently adding
>DatabaseMetaData support to PLJava and in that addition a new
>SyntheticResultSet is included that would make such an approach even
>more valuable.

Eureka! As if you have read my mind!

>
>>Once I clarify the above questions I am willing to contribute.
>>
>>
>Super!

As I said I'll post the pool framework we use here to start with. Will see how it all goes after that.

Regards,

George

From:	thhal at mailblocks(dot)com (Thomas Hallgren)
To:
Subject:	[Pljava-dev] advice needed
Date:	2005-02-16 22:29:04
Message-ID:	thhal-0s2HvAi+OxicoAri/IE5Q6XehocNEMg@mailblocks.com

George,
I think we agree that it would be beneficial to add an alternative way
of doing things such as the getResult() you suggest. I'd like to clarify
a couple of things with the current design though.

>// client interface method
>public ResultSet getResult();
>
>The receiver presented in the assignRowValues is already a ResultSet consisting of a single row. If you are worried about type matching you can easily check for that by gathering the appropriate meta data (as you do with a normal ResultSet) or throw an exception if type constraints have been violated (see below).
>Why is it a problem to supplement it with a ResultSet object containing not one row, but the entire set and then iterate over it *internally*, using next?
>
>
Let's assume that you want to return a set that cannot be expressed as a
single query. Each row in your result contains data that you create from
one or more sources. The source might be a socket, a file, a query
combined with other sources, etc. Point is, you don't have a ResultSet
to return. Using the current approach you have no problem doing this.
You simply update the row that is passed to you with data from your
sources, once for each row, and you're all set. The tailor-made,
single-row ResultSet object that is passed to this method is of course
reused for each call.

You must look at this ResultSet object, not as a "set" per se, but as a
single Tuple. There's no way to position within this set or add a row.
The one and only row is always present and the set is always positioned
on this row. If there was a standard interface in java.sql representing
a single Tuple, that interface would have been used instead. But there
is no such interface.

So current approach:
~~~~~~~~~~~~
on first call:
1. create a single-row ResultSet object.
2. call assignRowValues.
3. process result (extract the tuple data from the ResultSet object).

on each subsequent call:
1. call assignRowValues using the same single-row ResultSet object.
2. process result (extract the tuple data from the ResultSet object).
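
For illustration, the call sequence above can be sketched in plain Java.
This is only a sketch under stated assumptions: RowProvider is a local
stand-in for the provider contract discussed here (the real PL/Java
interface may differ in detail), LineProvider and CallerSketch are
hypothetical names, the row counter is assumed to start at 0, and the
single-row receiver is imitated with a java.lang.reflect.Proxy that only
understands updateInt and updateString.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Local stand-in for the provider contract discussed in this thread
// (assumption: the real PL/Java interface may differ in detail).
interface RowProvider {
    boolean assignRowValues(ResultSet receiver, int currentRow) throws SQLException;
}

// Builds each row from a non-query source; an in-memory iterator stands in
// for a socket or a file here.
class LineProvider implements RowProvider {
    private final Iterator<String> source;
    boolean cleanedUp = false;

    LineProvider(List<String> lines) {
        this.source = lines.iterator();
    }

    @Override
    public boolean assignRowValues(ResultSet receiver, int currentRow) throws SQLException {
        if (!source.hasNext()) {
            cleanedUp = true; // release resources just before returning false
            return false;
        }
        receiver.updateInt(1, currentRow);
        receiver.updateString(2, source.next());
        return true;
    }
}

// Hypothetical driver imitating the caller's side of the protocol.
class CallerSketch {
    static List<String> drain(RowProvider provider) {
        Object[] tuple = new Object[2];
        // One single-row receiver, created once and reused for every call.
        ResultSet receiver = (ResultSet) Proxy.newProxyInstance(
            ResultSet.class.getClassLoader(),
            new Class<?>[] { ResultSet.class },
            (proxy, method, args) -> {
                String name = method.getName();
                if (name.equals("updateInt") || name.equals("updateString")) {
                    tuple[((Integer) args[0]) - 1] = args[1];
                    return null;
                }
                throw new UnsupportedOperationException(name);
            });
        List<String> rows = new ArrayList<>();
        try {
            for (int row = 0; provider.assignRowValues(receiver, row); row++) {
                rows.add(tuple[0] + ":" + tuple[1]); // extract the tuple data
            }
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(drain(new LineProvider(Arrays.asList("alpha", "beta"))));
    }
}
```

The provider never builds the whole set in memory: it holds only its
source and fills in the one tuple it is handed on each call.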

Now, with your suggested approach you have two choices:
1. Build a SyntheticResultSet in memory and return it. A fair amount of
code and the result might consume an unacceptable amount of memory.
2. Create your own implementation of ResultSet where you are the
implementor of the next() method. This is a great deal of work. A lot
more than just implementing the assignRowValues method.

New approach:
~~~~~~~~~~~~
on first call:
1. call getResult()
2. call next() on the obtained ResultSet
3. process result (extract the tuple from the ResultSet object).

on each subsequent call:
1. call next()
2. process result (extract the tuple from the ResultSet object).

Same amount of work for both approaches. The first approach doesn't
suffer from any of the disadvantages that the second approach has so
there's a good motivation to keep it.

>That, I think, would be much easier for developers like myself to handle: all I have to worry about then is preparing the ResultSet the way I want it, without getting bogged down in implementing iterations and messing about with row numbers and the like.
>
>
Yes. The use case you have, where a function actually executes a query
and wants to return the result of that query, is a good motivation to add
the new approach.
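
As a rough sketch of how the two approaches could coexist, a complete
ResultSet (such as a query result) can be fed to a row-at-a-time caller
through a small adapter. Everything named here is hypothetical and not
part of PL/Java: ResultSetAdapter, AdapterDemo and the in-memory stub
ResultSet, which is built with java.lang.reflect.Proxy purely so the
example runs without a database. The column count is passed in by hand;
real code would more likely read it from the ResultSet meta data.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical adapter: exposes a complete ResultSet through the
// row-at-a-time assignRowValues contract.
class ResultSetAdapter {
    private final ResultSet source;
    private final int columnCount;

    ResultSetAdapter(ResultSet source, int columnCount) {
        this.source = source;
        this.columnCount = columnCount;
    }

    boolean assignRowValues(ResultSet receiver, int currentRow) throws SQLException {
        if (!source.next()) {
            source.close(); // clean up before reporting the end of the set
            return false;
        }
        for (int col = 1; col <= columnCount; col++) {
            receiver.updateObject(col, source.getObject(col));
        }
        return true;
    }
}

class AdapterDemo {
    // In-memory ResultSet stub, for this demo only. It supports next(),
    // getObject(int), updateObject(int, Object) and close().
    static ResultSet stub(List<Object[]> data, int startRow) {
        int[] cursor = { startRow };
        return (ResultSet) Proxy.newProxyInstance(
            ResultSet.class.getClassLoader(),
            new Class<?>[] { ResultSet.class },
            (proxy, method, args) -> {
                switch (method.getName()) {
                    case "next":
                        return ++cursor[0] < data.size();
                    case "getObject":
                        return data.get(cursor[0])[((Integer) args[0]) - 1];
                    case "updateObject":
                        data.get(cursor[0])[((Integer) args[0]) - 1] = args[1];
                        return null;
                    case "close":
                        return null;
                    default:
                        throw new UnsupportedOperationException(method.getName());
                }
            });
    }

    static List<String> run() {
        List<Object[]> queryResult = new ArrayList<>();
        queryResult.add(new Object[] { 1, "alice" });
        queryResult.add(new Object[] { 2, "bob" });
        ResultSet source = stub(queryResult, -1); // positioned before the first row
        Object[] tuple = new Object[2];
        // The receiver is always positioned on its one and only row.
        ResultSet receiver = stub(Collections.singletonList(tuple), 0);

        ResultSetAdapter adapter = new ResultSetAdapter(source, 2);
        List<String> rows = new ArrayList<>();
        try {
            for (int row = 0; adapter.assignRowValues(receiver, row); row++) {
                rows.add(tuple[0] + "=" + tuple[1]);
            }
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The per-column copy in assignRowValues is the transfer between ResultSets
that a built-in getResult() variant would avoid.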

>Better still, you can define and 'fire' different events throughout the entire process to give developers more control over what is being done. If adopted, I have a suggestion of (at least) three such events (I suspect these methods will be in addition to the ones controlling the pool behaviour, like 'make', 'activate', 'passivate' and 'destroy'):
>
>public void initialise(); // fired before the getResult() interface method is actually called to give the client class a chance to initialise itself
>public void lastRowProcessed(); // fired after the last row of the ResultSet has been processed;
>public void processException(Exception e); // fired when a pljava/processing exception has occurred
>
>
I assume that 'activate' is the same as 'initialise' and 'passivate' is
the same as 'lastRowProcessed'? If so, I must say I like the original
names better. 'initialise' sounds like a constructor. 'activate' is a
well known term for patterns that use pooling, and if we use 'activate',
then 'passivate' comes naturally.

'destroy' is fine, but I don't think we need 'make' since that's the same
as the constructor. I'm also opposed to a special processException method.
Let the implementor decide how and when he wants to deal with exceptions.
It's easy enough to implement and several patterns are possible.
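
To make the naming concrete, here is a minimal sketch of a pool that
drives 'activate', 'passivate' and 'destroy' and leaves 'make' to the
constructor. All names (Poolable, SimplePool, TracingProvider) are
hypothetical; this is neither the Jakarta Generic Object Pool API nor
anything PL/Java ships. It assumes the single-threaded use described
earlier in the thread, so it does no synchronization.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical lifecycle contract; 'make' is simply the constructor.
interface Poolable {
    void activate();  // called each time the instance is handed out
    void passivate(); // called each time the instance is returned
    void destroy();   // called when the pool discards the instance
}

// Minimal unsynchronized pool, assuming one thread per JVM session.
class SimplePool<T extends Poolable> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxIdle;

    SimplePool(Supplier<T> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    T borrow() {
        // Reuse an idle instance when one exists, otherwise construct one.
        T instance = idle.isEmpty() ? factory.get() : idle.pop();
        instance.activate();
        return instance;
    }

    void release(T instance) {
        instance.passivate();
        if (idle.size() < maxIdle) {
            idle.push(instance);
        } else {
            instance.destroy(); // pool is full, so discard the instance
        }
    }
}

// Records lifecycle events so the call order can be inspected.
class TracingProvider implements Poolable {
    final List<String> events = new ArrayList<>();

    @Override public void activate()  { events.add("activate"); }
    @Override public void passivate() { events.add("passivate"); }
    @Override public void destroy()   { events.add("destroy"); }
}

class PoolDemo {
    public static void main(String[] args) {
        SimplePool<TracingProvider> pool = new SimplePool<>(TracingProvider::new, 1);
        TracingProvider first = pool.borrow();
        pool.release(first);
        TracingProvider second = pool.borrow();
        System.out.println(first == second); // the same instance is handed out again
        System.out.println(second.events);
    }
}
```

Exception handling is deliberately left out of the contract, so each
implementor can deal with failures in whatever way suits the class.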

>Nah! When I have time (probably around Friday time by the looks of it) I'll post the code of our pool framework based on Jakarta's Generic Object Pool which has been used by us for the past 7 months - all that without a single glitch (I can hardly recall stopping the service or bringing down the servers more than 2-3 times since it was launched).
>The framework is very robust and, as I pointed out earlier, has dealt with a lot of pounding in the past.
>
>
Great. I look forward to reviewing it.

Regards,
Thomas Hallgren