Re: Optimizing the implementation of an optimized tree

Lists: pgsql-sql
From: Christian Rishoej <chrris(at)mail(dot)dk>
To: pgsql-sql(at)postgresql(dot)org
Cc: Anders Johannsen <anders(at)johannsen(dot)com>
Subject: Optimizing the implementation of an optimized tree
Date: 2002-05-07 01:02:16
Message-ID: 1020733336.892.79.camel@penelope.dyndns.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Hi,

Recently I needed to store a hiearchy of pages on a website in Postgres
in a way that would allow for fast extraction of parts of the tree for
menu generation and "path" specification (which nodes in the tree make
up the path from the root to my position?).

This can be accomplished by letting each node in the tree have an l and
r value with values determined by traversing the edge of the tree and
assigning the value of integer that is incremented at each node visit to
l and doing the same thing for r, this time traversing the edge of the
tree backwards.

The tree then becomes:

CREATE TABLE tree (
id serial PRIMARY KEY,
parent int REFERENCES tree,
l int DEFAULT NULL,
r int DEFAULT NULL,
);

The parent id strictly not needed, but I chose to include it for
convenience.

I can easily extract a complete branch like this:

SELECT treeNode.id
FROM tree treeNode, tree treeParent
WHERE treeNode.l BETWEEN treeParent.l AND treeParent.r
AND treeParent.id = $1
ORDER BY treeNode.l

And the "path" from the root to a particular node like this:

SELECT treeNode.id
FROM tree treeNode, tree currentNode
WHERE currentNode.r BETWEEN treeNode.l AND treeNode.r
AND currentNode.id = $1
ORDER BY treeNode.l;

Now, in order to avoid having to maintain the values of l and r for each
node when the tree is modified I created triggers in PL/PGSQL that
handle INSERTs of new nodes into the tree, DELETEs of existing nodes and
UPDATEs of a node's position in the tree (i.e. the parant field).

The implementation is included as an attachment.

I am very interested in comments on the implementation - especially
hints on possible optimizations.

It seems to me that the PL/PGSQL-definition of each function is parsed
at each call. Is this true? Is it possible to avoid this?

It is my understanding that Postgres does not yet support triggers that
fire on the update of a paricular column. Therefore I have a check in
the UPDATE trigger that checks if the parent-field was modified. Is
there any was to do this in a nicer manner?

Regards,
Christian

Attachment Content-Type Size
triggers.sql text/x-sql 2.8 KB

From: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
To: "Christian Rishoej" <chrris(at)mail(dot)dk>, <pgsql-sql(at)postgresql(dot)org>
Cc: "Anders Johannsen" <anders(at)johannsen(dot)com>
Subject: Re: Optimizing the implementation of an optimized tree
Date: 2002-05-07 02:00:24
Message-ID: GNELIHDDFBOCMGBFGEFOGEHJCCAA.chriskl@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Why not try Oleg and Teodor's tree module?

http://cddb.sai.msu.su/~megera/postgres/gist/

You have to expend a little effort in implementing it as the README's in
Russian :) Still has examples tho.

Chris

> -----Original Message-----
> From: pgsql-sql-owner(at)postgresql(dot)org
> [mailto:pgsql-sql-owner(at)postgresql(dot)org]On Behalf Of Christian Rishoej
> Sent: Tuesday, 7 May 2002 9:02 AM
> To: pgsql-sql(at)postgresql(dot)org
> Cc: Anders Johannsen
> Subject: [SQL] Optimizing the implementation of an optimized tree
>
>
>
>
> Hi,
>
> Recently I needed to store a hiearchy of pages on a website in Postgres
> in a way that would allow for fast extraction of parts of the tree for
> menu generation and "path" specification (which nodes in the tree make
> up the path from the root to my position?).
>
> This can be accomplished by letting each node in the tree have an l and
> r value with values determined by traversing the edge of the tree and
> assigning the value of integer that is incremented at each node visit to
> l and doing the same thing for r, this time traversing the edge of the
> tree backwards.
>
> The tree then becomes:
>
> CREATE TABLE tree (
> id serial PRIMARY KEY,
> parent int REFERENCES tree,
> l int DEFAULT NULL,
> r int DEFAULT NULL,
> );
>
> The parent id strictly not needed, but I chose to include it for
> convenience.
>
> I can easily extract a complete branch like this:
>
> SELECT treeNode.id
> FROM tree treeNode, tree treeParent
> WHERE treeNode.l BETWEEN treeParent.l AND treeParent.r
> AND treeParent.id = $1
> ORDER BY treeNode.l
>
> And the "path" from the root to a particular node like this:
>
> SELECT treeNode.id
> FROM tree treeNode, tree currentNode
> WHERE currentNode.r BETWEEN treeNode.l AND treeNode.r
> AND currentNode.id = $1
> ORDER BY treeNode.l;
>
> Now, in order to avoid having to maintain the values of l and r for each
> node when the tree is modified I created triggers in PL/PGSQL that
> handle INSERTs of new nodes into the tree, DELETEs of existing nodes and
> UPDATEs of a node's position in the tree (i.e. the parant field).
>
> The implementation is included as an attachment.
>
> I am very interested in comments on the implementation - especially
> hints on possible optimizations.
>
> It seems to me that the PL/PGSQL-definition of each function is parsed
> at each call. Is this true? Is it possible to avoid this?
>
> It is my understanding that Postgres does not yet support triggers that
> fire on the update of a paricular column. Therefore I have a check in
> the UPDATE trigger that checks if the parent-field was modified. Is
> there any was to do this in a nicer manner?
>
> Regards,
> Christian
>


From: Christian Rishoej <chrris(at)mail(dot)dk>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-sql(at)postgresql(dot)org
Subject: Re: Optimizing the implementation of an optimized tree
Date: 2002-05-07 11:03:19
Message-ID: 1020769399.889.10.camel@penelope.dyndns.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Tue, 2002-05-07 at 04:00, Christopher Kings-Lynne wrote:

> Why not try Oleg and Teodor's tree module?
>
> http://cddb.sai.msu.su/~megera/postgres/gist/
>
> You have to expend a little effort in implementing it as the README's in
> Russian :) Still has examples tho.

As far as I can tell Oleg and Teotor are doing *balanced* trees, and not
trees usable for containing a hiearchy of nodes where each node can have
0-N children.

It is my understanding that I cannot rely on the organization of a
balanced tree - hence the name "balanced" (nodes are inserted where some
algotithm thinks it fits in, not where I want it. Imagine if your
directory-hiearchy on your filesystem was a balanced tree...).

Does anyone have a clue on how to optimize triggers? Can I use another
language than PL/PGSQL? Would C be crazy when I do lots of SQL's and no
calculation in the trigger?

Christian


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Christian Rishoej <chrris(at)mail(dot)dk>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Optimizing the implementation of an optimized tree
Date: 2002-05-07 11:54:01
Message-ID: Pine.GSO.4.44.0205071443490.12738-100000@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On 7 May 2002, Christian Rishoej wrote:

> On Tue, 2002-05-07 at 04:00, Christopher Kings-Lynne wrote:
>
> > Why not try Oleg and Teodor's tree module?
> >
> > http://cddb.sai.msu.su/~megera/postgres/gist/
> >
> > You have to expend a little effort in implementing it as the README's in
> > Russian :) Still has examples tho.
>
> As far as I can tell Oleg and Teotor are doing *balanced* trees, and not
> trees usable for containing a hiearchy of nodes where each node can have
> 0-N children.
>
> It is my understanding that I cannot rely on the organization of a
> balanced tree - hence the name "balanced" (nodes are inserted where some
> algotithm thinks it fits in, not where I want it. Imagine if your
> directory-hiearchy on your filesystem was a balanced tree...).

Hmm, you decide where to insert new node. It's my fault we don't have
english documentation for our module. But it was designed to work with
catalog like structure. Here is the link to my original posting:
http://fts.postgresql.org/db/mw/msg.html?mid=1352122

Test data (snapshot DMOZ) is available from
http://www.sai.msu.su/~megera/postgres/gist/tree/dmoz-full.sql.gz

There are functions
entree_level(entree) and bitree_level(bitree) which returns next level of
the node. To add node:

select entree_next(tid) from dmoz where tid <* '1.2.3.*.0' order by tid
desc limit 1;

but you're free to add node by hand:

test=# create table tree ( node bitree );
CREATE
test=# insert into tree values ('22.9.15.19.12');
INSERT 41728161 1
est=# select * from tree;
node
---------------
22.9.15.19.12
(1 row)

>
> Does anyone have a clue on how to optimize triggers? Can I use another
> language than PL/PGSQL? Would C be crazy when I do lots of SQL's and no
> calculation in the trigger?
>
> Christian
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83