Re: contrib/tree

Lists: Postg토토 핫SQL :
From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: <dhogaza(at)pacifier(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: contrib/tree
Date: 2002-01-25 19:17:34
Message-ID: Pine.GSO.4.33.0201252213530.19023-100000@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Don,

does your approach handle directed graphs ( DAG ) ?
Actually our module is just a result of our research for new
data type which could handle DAGs ( yahoo, dmoz -like hierarchies)
effectively in PostgreSQL.
While we didn't find a solution we decided to release this module
because 64 children would quite ok for many people.
Of course, 128 would be better :-)

How about 'move' operation in your approach ?

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: dhogaza(at)pacifier(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: contrib/tree
Date: 2002-01-25 19:38:50
Message-ID: 1011987530.2371.2.camel@rh72.home.ee
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 2002-01-26 at 00:17, Oleg Bartunov wrote:
> Don,
>
> does your approach handle directed graphs ( DAG ) ?
> Actually our module is just a result of our research for new
> data type which could handle DAGs ( yahoo, dmoz -like hierarchies)
> effectively in PostgreSQL.

Why not use intarray's instead of (n=6)bit-arrays?

Is it just space savings ( 64(0) of anything is enough ;) ) or something
more fundamental ?

> While we didn't find a solution we decided to release this module
> because 64 children would quite ok for many people.
> Of course, 128 would be better :-)

4294967296 would be enough for almost everybody :)

> How about 'move' operation in your approach ?

I have not looked at his code long enough but it seems to still need
replacing all child nodes bitarray tails ...

--------------
Hannu


From: Don Baccus <dhogaza(at)pacifier(dot)com>
To:
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: contrib/tree
Date: 2002-01-26 00:19:35
Message-ID: 3C51F617.6050701@pacifier.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hannu Krosing wrote:

>>How about 'move' operation in your approach ?
>>
>
> I have not looked at his code long enough but it seems to still need
> replacing all child nodes bitarray tails ...

Yes, it does. But moving items around is a rare event in our environment.

--
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: <dhogaza(at)pacifier(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: contrib/tree
Date: 2002-01-26 19:07:36
Message-ID: Pine.LNX.4.30.0201261406000.688-100000@peter.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Oleg Bartunov writes:

> does your approach handle directed graphs ( DAG ) ?
> Actually our module is just a result of our research for new
> data type which could handle DAGs ( yahoo, dmoz -like hierarchies)
> effectively in PostgreSQL.
> While we didn't find a solution we decided to release this module
> because 64 children would quite ok for many people.
> Of course, 128 would be better :-)

I was under the impression that the typical way to handle tree structures
in relational databases was with recursive unions. It's probably
infinitely slower than stuffing everything into one datum, but it gets you
all the flexibility that the DBMS has to offer.

--
Peter Eisentraut peter_e(at)gmx(dot)net


From: Don Baccus <dhogaza(at)pacifier(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: contrib/tree
Date: 2002-01-26 19:25:13
Message-ID: 3C530299.30709@pacifier.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut wrote:

> Oleg Bartunov writes:
>
>
>>does your approach handle directed graphs ( DAG ) ?
>>Actually our module is just a result of our research for new
>>data type which could handle DAGs ( yahoo, dmoz -like hierarchies)
>>effectively in PostgreSQL.
>>While we didn't find a solution we decided to release this module
>>because 64 children would quite ok for many people.
>>Of course, 128 would be better :-)
>>
>
> I was under the impression that the typical way to handle tree structures
> in relational databases was with recursive unions. It's probably
> infinitely slower than stuffing everything into one datum, but it gets you
> all the flexibility that the DBMS has to offer.

As I explained to Oleg privately (I think it was privately, at least) a
key-based approach doesn't work well for DAGs because in essence you
need a set of keys, one for each path that can reach the node. One of
my websites tracks bird sightings for people in the Pacific NW and our
geographical database is a DAG, not a tree (we have wildlife refuges
that overlap states, counties etc). In that system I use a
parent-child table to track the relationships.

My impression is that there's no single "best way" to handle trees or
graphs in an RDBMS that doesn't provide internal support (such as Oracle
with its "CONNECT BY" extension).

The method we use in OpenACS works very well for us. Insertion and
selection are fast, and these are the common operations in *our*
environment. YMMV, of course. Key-based approaches are fairly well
known, at least none of us claim to have invented anything here. The
only novelty, if you will, is our taking advantage of the fact that PG's
implementation of BIT VARYING just happens to work really well as a
datatype for storing keys. Full indexing support, substring, position,
etc ... very slick.

Someone asked about using an integer array to store the hierarchical
information. I looked at that a few months back but it would require
providing custom operators, so rejected it in favor of the approach
we're now using. It is important to us that users be able to fire up
OpenACS 4 on a vanilla PG, such as the one installed by their Linux or
BSD distribution. That rules out special operators that require contrib
code or the like.

We do use Oleg's full-text search stuff, but searching's optional and
can be added in after the user's more comfortable with our toolkit,
PostgreSQL, etc. A lot of our users are new to Postgres, or at least
have a lot more Oracle experience than PG experience.

But the integer array approach might well work for folks who don't mind
the need to build in special operators.

--
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Don Baccus <dhogaza(at)pacifier(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: contrib/tree
Date: 2002-01-26 20:29:56
Message-ID: 1012076996.2410.0.camel@rh72.home.ee
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 2002-01-27 at 00:25, Don Baccus wrote:
> Peter Eisentraut wrote:
> > I was under the impression that the typical way to handle tree structures
> > in relational databases was with recursive unions. It's probably
> > infinitely slower than stuffing everything into one datum, but it gets you
> > all the flexibility that the DBMS has to offer.

I see no reason why WITH RECURSIVE should be inherently slower than other
approaches except that checks for infinite recursion could be pricey.

Other than that getting rows by index should be more or less equal in both
cases.

> As I explained to Oleg privately (I think it was privately, at least) a
> key-based approach doesn't work well for DAGs because in essence you
> need a set of keys, one for each path that can reach the node. One of
> my websites tracks bird sightings for people in the Pacific NW and our
> geographical database is a DAG, not a tree (we have wildlife refuges
> that overlap states, counties etc). In that system I use a
> parent-child table to track the relationships.
>
> My impression is that there's no single "best way" to handle trees or
> graphs in an RDBMS that doesn't provide internal support (such as Oracle
> with its "CONNECT BY" extension).

The full SQL3 syntax for it is much more powerful and complex (see
attachment).

I think that this is what should eventually go into postgresql.

> Someone asked about using an integer array to store the hierarchical
> information. I looked at that a few months back but it would require
> providing custom operators, so rejected it in favor of the approach
> we're now using. It is important to us that users be able to fire up
> OpenACS 4 on a vanilla PG, such as the one installed by their Linux or
> BSD distribution. That rules out special operators that require contrib
> code or the like.
>
> We do use Oleg's full-text search stuff, but searching's optional and
> can be added in after the user's more comfortable with our toolkit,
> PostgreSQL, etc. A lot of our users are new to Postgres, or at least
> have a lot more Oracle experience than PG experience.
>
> But the integer array approach might well work for folks who don't mind
> the need to build in special operators.

I'll try if I can build the operators in PL/PGSL so one would not
"really" need to build special operators ;)

Tell me if this is something impossible.

------------------
Hannu

Attachment Content-Type Size
image/gif 14.6 KB

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: contrib/tree
Date: 2002-01-26 20:46:39
Message-ID: m3665ozvps.fsf@chvatal.cbbrowne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

dhogaza(at)pacifier(dot)com (Don Baccus) writes:
> Peter Eisentraut wrote:
> > I was under the impression that the typical way to handle tree
> > structures in relational databases was with recursive unions.
> > It's probably infinitely slower than stuffing everything into one
> > datum, but it gets you all the flexibility that the DBMS has to
> > offer.

> As I explained to Oleg privately (I think it was privately, at
> least) a key-based approach doesn't work well for DAGs because in
> essence you need a set of keys, one for each path that can reach the
> node. One of my websites tracks bird sightings for people in the
> Pacific NW and our geographical database is a DAG, not a tree (we
> have wildlife refuges that overlap states, counties etc). In that
> system I use a parent-child table to track the relationships.

... Where parent/child has the unfortunate demerit that walking the
tree requires (more-or-less; it could get _marginally_ hidden in a
stored procedure) a DB query for each node that gets explored.

> My impression is that there's no single "best way" to handle trees
> or graphs in an RDBMS that doesn't provide internal support (such as
> Oracle with its "CONNECT BY" extension).

> The method we use in OpenACS works very well for us. Insertion and
> selection are fast, and these are the common operations in *our*
> environment. YMMV, of course. Key-based approaches are fairly well
> known, at least none of us claim to have invented anything here.
> The only novelty, if you will, is our taking advantage of the fact
> that PG's implementation of BIT VARYING just happens to work really
> well as a datatype for storing keys. Full indexing support,
> substring, position, etc ... very slick.

Have you a URL for this? (A link to a relevant source code file would
be acceptable...)

> Someone asked about using an integer array to store the hierarchical
> information. I looked at that a few months back but it would
> require providing custom operators, so rejected it in favor of the
> approach we're now using. It is important to us that users be able
> to fire up OpenACS 4 on a vanilla PG, such as the one installed by
> their Linux or BSD distribution. That rules out special operators
> that require contrib code or the like.

Are you referring to the "nested tree" model (particularly promoted by
Joe Celko; I don't know of a seminal source behind him)? It
unfortunately doesn't work with graphs...
--
(concatenate 'string "cbbrowne" "@ntlug.org")
http://www3.sympatico.ca/cbbrowne/nonrdbms.html
"Did you ever walk in a room and forget why you walked in? I think
that's how dogs spend their lives." -- Sue Murphy


From: Don Baccus <dhogaza(at)pacifier(dot)com>
To: Hannu Krosing <hannu(at)krosing(dot)net>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: contrib/tree
Date: 2002-01-27 01:06:13
Message-ID: 3C535285.1050003@pacifier.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hannu Krosing wrote:

> On Sun, 2002-01-27 at 00:25, Don Baccus wrote:
>
>>Peter Eisentraut wrote:
>>
>>>I was under the impression that the typical way to handle tree structures
>>>in relational databases was with recursive unions. It's probably
>>>infinitely slower than stuffing everything into one datum, but it gets you
>>>all the flexibility that the DBMS has to offer.
>>>
>
> I see no reason why WITH RECURSIVE should be inherently slower than other
> approaches except that checks for infinite recursion could be pricey.
>
> Other than that getting rows by index should be more or less equal in both
> cases.

We use Oracle's "CONNECT BY", not a key-oriented approach, in the Oracle
version of the toolkit. There are some awkwardnesses involved in their
implementation. You can't join with the table you're "connecting". If
you do it in a subselect then join against the result, you get the right
rows (of course) but Oracle's free to join in any order. So you can't
get a "tree walk" output order if you need to join against your tree,
meaning you have to fall back on a sort key anyway (a simpler one, though).

I haven't looked at "WITH RECURSIVE" to see if it's defined to be more
useful than Oracle's "CONNECT BY" since I don't use any RDBMS that
implements it.

>>My impression is that there's no single "best way" to handle trees or
>>graphs in an RDBMS that doesn't provide internal support (such as Oracle
>>with its "CONNECT BY" extension).
>>
>
> The full SQL3 syntax for it is much more powerful and complex (see
> attachment).
>
> I think that this is what should eventually go into postgresql.

Yes, indeed.

> I'll try if I can build the operators in PL/PGSL so one would not
> "really" need to build special operators ;)
>
> Tell me if this is something impossible.

There's the speed issue, of course ... and the extra space, which for
deep trees could be significant.

Our solution suits our needs very well, and we're happy with it. We do
get 2 billion plus immediate children per node and a one-byte per level
key for trees that aren't big and flat. The intarray approach is just a
different storage technique for the same method, I don't see that moving
nodes is any easier (you have to touch the same number of nodes if you
move a subtree around). It takes more storage and the necessary
comparisons (even if written in C) will be slower unless the tree's big
and flat (because you're using four bytes for every level, while our BIT
VARYING scheme, in practice, uses one byte for each level in a very
large majority of cases).

--
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Don Baccus <dhogaza(at)pacifier(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: contrib/tree
Date: 2002-01-27 11:31:34
Message-ID: 1012131094.9785.0.camel@rh72.home.ee
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 핫SQL :

On Sun, 2002-01-27 at 06:06, Don Baccus wrote:
> Hannu Krosing wrote:
>
>
> > I'll try if I can build the operators in PL/PGSL so one would not
> > "really" need to build special operators ;)

Ok, I've done most of it (the comparison functions and operators), but
now I'm stuck with inability to find any arrayconstructing functionality
in postgres - the only way seems to be the text-to-type functions .

Also arrays seem to be read only -- a[i] := 1 is a syntax error.

And get/set slice operators are defined static in source ;(

> > Tell me if this is something impossible.
>
>
> There's the speed issue, of course ... and the extra space, which for
> deep trees could be significant.
>
> Our solution suits our needs very well, and we're happy with it. We do
> get 2 billion plus immediate children per node and a one-byte per level
> key for trees that aren't big and flat. The intarray approach is just a
> different storage technique for the same method, I don't see that moving
> nodes is any easier (you have to touch the same number of nodes if you
> move a subtree around).

Is there a simple query for getting all ancestors of a node ?

The intarray approach has all them already encoded .

> It takes more storage and the necessary
> comparisons (even if written in C) will be slower unless the tree's big
> and flat (because you're using four bytes for every level, while our BIT
> VARYING scheme, in practice, uses one byte for each level in a very
> large majority of cases).

I'm inclining more and more towards using your approach. I just even
figured out that I don't rreally need to get the ancestors for my needs.

-------------
Hannu


From: Don Baccus <dhogaza(at)pacifier(dot)com>
To: Hannu Krosing <hannu(at)krosing(dot)net>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: contrib/tree
Date: 2002-01-27 15:16:40
Message-ID: 3C5419D8.6010001@pacifier.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hannu Krosing wrote:

> Is there a simple query for getting all ancestors of a node ?

Yes, a recursive SQL function that returns a rowset of ancestor keys.
It works off the key directly, doesn't need to touch any tables, so is
quite fast.

--
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org