[Pljava-dev] advice needed

Lists: pljava-dev
From: info at wyse-systems(dot)ltd(dot)uk (info at wyse-systems(dot)ltd(dot)uk)
To:
Subject: [Pljava-dev] advice needed
Date: 2005-02-16 15:43:20
Message-ID: 309710-220052316154320451@M2W052.mail2web.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pljava-dev

>pljava approach:
>~~~~~~~~~~~~~~~~
>
>>1. same as above (assume the table is properly populated);
>>2.
>> - CREATE TYPE details_fn AS (name text);
>> - CREATE FUNCTION details_fn()
>> RETURNS SETOF details_v
>> AS 'whatever-package.DetailsView'
>> IMMUTABLE LANGUAGE java;
>>
>> - now, my understanding is that in the Java class DetailsView I have to
>>implement a method
>>
>> public boolean assignRowValues(ResultSet receiver, int currentRow)
>>throws SQLException {
>> }
>>
>> which is called until the return value is false (i.e. no more rows are
>>to be returned) and this needs to populate the one-row resultset which is
>>due to be displayed to the client, i.e.
>> select * from details_fn() should, in theory, get the desired results.
>>
>> public class DetailsView implements ResultSetProvider{
>> private ResultSet theResult;
>> public DetailsView() {
>> Connection con =
>>DriverManager.getConnection("jdbc:default:connection");
>> Statement st = con.createStatement();
>> this.theResult = con.executeQuery("select forename, surname from
>>details_tbl where title ilike 'mr'");
>> }
>> }
>>
>> The difficulty I am having is that even though I can determine the
>>entire rowset at the constructor, when the 'select * from details_fn()'
>>statement is run the assignRowValues method gets called to populate a
>>single row of the receiver. So, in order to build the entire set I would
>>need to do something like:
>>
>> public boolean assignRowValues(ResultSet receiver, int currentRow)
>>throws SQLException {
>> if (this.theResult != null && this.theResult.absolute(currentRow)
{
>> receiver.updateString(1, this.theResult.getString(1) + " " +
>>this.theResult.getString(2));
>> return true;
>> else return false;
>> }
>>
>>If the above is correct as I already pointed out this would be very
>>inneficient, because of all the 'absolute' method calls (and subsequent
>>scans of the entire result set object).
>>
>>If I am not correct I welcome suggestions as to how this can be
implemented?
>>
>>
>Firs of all, I'm sorry I misread you question. There are two ResultSet's
>involved in your scenario, not just the one I thought you where
>referring to, namely the parameter. I missed the whole point :-)

>You are guaranteed that the currentRow will be increased with exactly
>one for each call, so there's no need to do absolute positioning. A
>simple next will be sufficient.

Nice to know. In this case a couple more questions:

1. I assume the main class is created (and the class constructor is called)
upon a direct call to the function and then instance is left to be
destroyed by the gc.
If that is the case this wouldn't be the most wise thing to do because of
(at least) 2 reasons: 1. Class cleanup (in my example I have to perform
connection shutdown as well as Resultset and Statement close upon
'completion' or exception event of each task), and 2: there is no way on
Earth you can manage multiple instances and synchorization (unless you
synchronise all methods - way too slow).

Wouldn't it be better for pljava to create a pool of instances and present
them to the caller each time a call to the function is made. If you adopt
this approach and extend the ResultSetProvider class to include at least
two more methods for managing class 'activate' and 'deactivate' events
while keeping/recycle the class instance that would bring a performance
boost (not to mention the memory management improvements). You can then add
a few more set of options in postgresql.conf file to configure the object
pool. I am using a similar pool here on our system (it deals with between
30 and 120 different class instances existing in the pool at any point in
time, each of which has an independent network connection to a client) and
by using a pool of objects (as oppose to class instances
creation/destruction upon every call) this brings a performance boost
between 17 and 28% of the entire system.

2. I am no expert in PostgreSQL internals (in fact I don't know anything
about that at all), but with the above class (DetailsView) wouldn't be
wiser to call 'a method' once and get the result in one go, instead of
calling assignRowValues for each individual row. I think I know the answer
to that one, but it is worth asking and give it a go anyway (;

>
>Another approach could be to build an ArrayList of objects (some class
>that contains the two strings) in the constructor or on the first call
>to assignRowValues. It all depends on what you are after (I assume its
>more then just implementing a view), how much data that is expected to
>build up, etc.
>

This is worth considering providing I work with small amount of data, which
I don't - it ranges in the 100K-740K records mark and about 10% of it is
bytea (hence my delight that the new version of the postgresql.jar has been
enchanced to include proper treatment of bytea objects without eating
3-times as much memory as before - it was a real nightmare). To answer you
question - no, it is better to utilise ResultSet as I would get the result
on a per-required basis with little memory footprint.

>>The reason I've said that the examples are rather daft (and I stick with
>>what I've said) is that, yes, even though I *did* look at the code and all
>>the examples, in all of them the SETOF returned is determined without any
>>involvement of datasets or database iteraction in general, i.e. in
>>assignRowValues the receiver set row is determined purely on the basis of
>>simple integer additions and the current timestamp - no database
iteraction
>>whatsoever as shown in my example above. In other words, if I want to
>>implement a view result set based on *real* data (and *not* just simple
>>integer additions) there is nowhere to look at!
>>
>>Do you get my point?
>>
>>
>I do. And if I succeed helping you implement what you need then you're
>more then welcome to submit additional examples.

Once I clarify the above questions I am willing to contribute.

Regards,

George

--------------------------------------------------------------------
mail2web - Check your email from the web at
http://mail2web.com/ .


From: thhal at mailblocks(dot)com (Thomas Hallgren)
To:
Subject: [Pljava-dev] advice needed
Date: 2005-02-16 16:53:46
Message-ID: thhal-0IRPvAjWJxicEk4lsNP3uqVJU2q8J7u@mailblocks.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pljava-dev

info at wyse-systems.ltd.uk wrote:

>1. I assume the main class is created (and the class constructor is called)
>upon a direct call to the function and then instance is left to be
>destroyed by the gc.
>If that is the case this wouldn't be the most wise thing to do because of
>(at least) 2 reasons: 1. Class cleanup (in my example I have to perform
>connection shutdown as well as Resultset and Statement close upon
>'completion' or exception event of each task), and 2: there is no way on
>Earth you can manage multiple instances and synchorization (unless you
>synchronise all methods - way too slow).
>
>
Keep in mind that the PostgreSQL backend is inherently single threaded.
You have one JVM per session. This means two things. 1) you hardly ever
need to worry about synchronization. The only time you'll need it is
when you expect finalizers to do something that might conflict with the
main thread. 2) Since you hardly ever will encounter a contended
synchronization locks, the overhead of using synchronizers will be
almost none at all. A modern JVM is extremely fast when it comes to
uncontendend synchronizations.

Don't rely on finalizers to do cleanup. Do that just prior to returning
false in the assignRowValues.

>Wouldn't it be better for pljava to create a pool of instances and present
>them to the caller each time a call to the function is made. If you adopt
>this approach and extend the ResultSetProvider class to include at least
>two more methods for managing class 'activate' and 'deactivate' events
>while keeping/recycle the class instance that would bring a performance
>boost (not to mention the memory management improvements). You can then add
>a few more set of options in postgresql.conf file to configure the object
>pool. I am using a similar pool here on our system (it deals with between
>30 and 120 different class instances existing in the pool at any point in
>time, each of which has an independent network connection to a client) and
>by using a pool of objects (as oppose to class instances
>creation/destruction upon every call) this brings a performance boost
>between 17 and 28% of the entire system.
>
>
Yes, using a pool is an excellent idea. Stay tuned for next release ;-)
Meanwhile, try something like this (replace PooledProvider with a name
of choice):

public class PooledProvider implements ResultSetProvider
{
private static PooledProvider s_pool;
private PooledProvider m_next;

public boolean assignRowValues(ResultSet receiver, int currentRow)
throws SQLException
{
// Code that returns true omitted. This happens just before you
return false.
this.deactivate();
m_next = s_pool;
s_pool = this;
return false;
}

public static ResultSetProvider getPooledProvider()
throws SQLException
{
PooledProvider ret = s_pool;
if(ret != null)
s_pool = ret.m_next;
else
ret = new PooledProvider();
ret.activate();
return ret;
}

void activate() {}
void deactivate() {}
}

>2. I am no expert in PostgreSQL internals (in fact I don't know anything
>about that at all), but with the above class (DetailsView) wouldn't be
>wiser to call 'a method' once and get the result in one go, instead of
>calling assignRowValues for each individual row. I think I know the answer
>to that one, but it is worth asking and give it a go anyway (;
>
>
The reason is a combination of factors:
1. The PostgreSQL backend function interface stipulates that you
implement a function that returns one row at a time.
2. Returning all in one go implies that you either build everything up
in memory (not an option for you), or that you create some kind of
implementation that in turn can return one row at a time. Then you've
gained nothing since that's exactly what the ResultSetProvider does.
3. A lot of implementations will return data that doesn't origin from an
SQL query.

Having said that, perhaps PLJava should provide a variant that allowes
you to return a ResultSet and thereby bypass the transfer between
ResultSet's that has to take place today. We are currently adding
DatabaseMetaData support to PLJava and in that addition a new
SyntheticResultSet is included that would make such an approach even
more valuable.

>Once I clarify the above questions I am willing to contribute.
>
>
Super!

Regards,
Thomas Hallgren