Re: is GiST still alive?

Lists: pgsql-chatpgsql-hackers
From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org, zeitling(at)informatik(dot)hu-berlin(dot)de
Subject: Re: is GiST still alive?
Date: 2003-10-22 19:19:43
Message-ID: 200310221219.43179.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-chat pgsql-hackers

Gregor,

> I'm developing a native XML database (C++) (which is supposed to become
> open source one day) and I'm wondering wheather I could use GiST for it's
> indexes. Is GiST still alive?

Don't know, sorry.

> Would PostgreSQL fit that requirement? And are you interested in having a
> fast, scalable XML access method?

We would welcome good XML tools for XML stored in databases.

However, I'm not sure that PostgreSQL is the appropriate platform for an
all-XML database; we're pretty dogmatically a relational database ... perhaps
the grandfather of most relational databases.

--
-Josh Berkus
Aglio Database Solutions
San Francisco


From: Christopher Browne <cbbrowne(at)libertyrms(dot)info>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: is GiST still alive?
Date: 2003-10-22 20:48:46
Message-ID: 60ptgpf175.fsf@dev6.int.libertyrms.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-chat pgsql-hackers

josh(at)agliodbs(dot)com (Josh Berkus) writes:
> Gregor,
>> I'm developing a native XML database (C++) (which is supposed to become
>> open source one day) and I'm wondering wheather I could use GiST for it's
>> indexes. Is GiST still alive?
>
> Don't know, sorry.
>
>> Would PostgreSQL fit that requirement? And are you interested in having a
>> fast, scalable XML access method?
>
> We would welcome good XML tools for XML stored in databases.
>
> However, I'm not sure that PostgreSQL is the appropriate platform for an
> all-XML database; we're pretty dogmatically a relational database ... perhaps
> the grandfather of most relational databases.

It leaves open the question of what is the appropriate way of
expressing XML entities and attributes and CDATA in database form.

If the point of the exercise is to have some way of storing XML data
in a database, then you may readily declare it thus:

create table some_table (
-- key columns omitted
xml character varying,
-- additional columns omitted
);

And that's perfectly good at storing XML information.

But I think back to the XML generator I wrote for GnuCash; it has the
notion of building up a hierarchy of entities and attributes, each of
which is visible as an identifyable object of some sort. Mapping that
onto a set of PostgreSQL relations wouldn't work terribly well.
--
let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: is GiST still alive?
Date: 2003-10-22 21:43:31
Message-ID: 3F96FA03.6040009@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-chat pgsql-hackers

Christopher Browne wrote:

>But I think back to the XML generator I wrote for GnuCash; it has the
>notion of building up a hierarchy of entities and attributes, each of
>which is visible as an identifyable object of some sort. Mapping that
>onto a set of PostgreSQL relations wouldn't work terribly well.
>
>

*nod* I have tried this several times - it just doesn't work well,
because the maps are too different.

You could do something like this:
. a table for each element type, fields being the attributes, plus
the node id.
. a table to tie everything together (parent_id, child_id,
child_order, child_type).
In theory you could even generate the DB schema from an XML schema and
evaluate it with XPath-like expressions.

But why put yourself to such bother? I have never found a good reason to
do this sort of thing.

cheers

andrew


From: Gregor Zeitlinger <zeitling(at)informatik(dot)hu-berlin(dot)de>
To: Christopher Browne <cbbrowne(at)libertyrms(dot)info>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: is GiST still alive?
Date: 2003-10-22 22:16:21
Message-ID: Pine.LNX.4.33.0310230013500.24570-100000@mitte.informatik.hu-berlin.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-chat pgsql-hackers

On Wed, 22 Oct 2003, Christopher Browne wrote:
> It leaves open the question of what is the appropriate way of
> expressing XML entities and attributes and CDATA in database form.
<snip>

Thanx for your advise, but that's not what I had in mind. The original
idea to have a native xml database was that I doesn't work too well in a
relational database.
I was just wondering wheater I have to reinvent the wheel of database
technology when it comes to transaction processing, ACID, and Indexes,
which a native xml database ought to have as well.

--
Gregor Zeitlinger
gregor(at)zeitlinger(dot)de


From: Gregor Zeitlinger <zeitling(at)informatik(dot)hu-berlin(dot)de>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: is GiST still alive?
Date: 2003-10-22 22:22:55
Message-ID: Pine.LNX.4.33.0310230019220.24570-100000@mitte.informatik.hu-berlin.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-chat pgsql-hackers

On Wed, 22 Oct 2003, Andrew Dunstan wrote:
> But why put yourself to such bother? I have never found a good reason to
> do this sort of thing.
I think there is a huge potential for XML databases once there are good
ones and people start using them more extensively.
But for having real fast xml databases it's too slow to store them in
relational tables. After all, if you were designing a database for xml
data only - would you come up with tables to store them?

--
Gregor Zeitlinger
gregor(at)zeitlinger(dot)de


From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: is GiST still alive?
Date: 2003-10-23 01:39:54
Message-ID: m37k2wu3yt.fsf@wolfe.cbbrowne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-chat pgsql-hackers

In the last exciting episode, zeitling(at)informatik(dot)hu-berlin(dot)de (Gregor Zeitlinger) wrote:
> On Wed, 22 Oct 2003, Andrew Dunstan wrote:
>> But why put yourself to such bother? I have never found a good reason to
>> do this sort of thing.
> I think there is a huge potential for XML databases once there are good
> ones and people start using them more extensively.
> But for having real fast xml databases it's too slow to store them in
> relational tables. After all, if you were designing a database for xml
> data only - would you come up with tables to store them?

No, "tables" wouldn't be the right way to do it.

But it's going to be troubled, in any case, because of the
every-popular mixtures of:

a) Often weird declarations of what character sets are in use;

b) Pointers to other parts of a document;

c) What's a "database" going to consist of? One XML document? Or
many? And if many, then then how do you have a centralized
reference point to navigate from to find the document that you
want?

And "navigate" was a carefully chosen word; what you then have is
essentially a network database system, and have to then start making
up ways of describing queries. XQuery may be better than CODASYL of
yesteryear, but you're still left writing a lot of recursive code.
(Thus making those that understand the Lambda Nature more powerful...)

At the end, do you have a "database?" Or just a set of documents?
It's hard to tell, a priori.

And do you think this is likely to be useful because:

a) You have some clear notion as to why this ought to be useful?

or merely because

b) XML is a big buzzword, and people have been able to succesfully
attract "research funds" or "vulture capital" on the basis of
having that acronym in a proposal?

The latter seems to have been really common the last few years, as
people seem often to add "Java" and "XML" to proposals in order to
attract unwary investors in much the same way that a "lady of the
evening" wears scanty clothing in order to attract her customers.
--
wm(X,Y):-write(X),write('@'),write(Y). wm('aa454','freenet.carleton.ca').
http://www3.sympatico.ca/cbbrowne/lisp.html
Eagles may soar, but weasels don't get sucked into jet engines.


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: is GiST still alive?
Date: 2003-10-23 01:53:30
Message-ID: 3F97349A.5040300@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-chat pgsql-hackers

Do this:

1. Create a new type called 'xml', based on text.

2. The xmlin function for that type will validate what you are
enterering is XML

3. Create new functions to implement XPath, SAX, etc. on the xml type.

4. Extend the contrib/ltree gist-based tree indexing scheme to work on
xml and hence the operations in no.3 above are really fast...

Chris

Andrew Dunstan wrote:

> Christopher Browne wrote:
>
>> But I think back to the XML generator I wrote for GnuCash; it has the
>> notion of building up a hierarchy of entities and attributes, each of
>> which is visible as an identifyable object of some sort. Mapping that
>> onto a set of PostgreSQL relations wouldn't work terribly well.
>>
>>
>
> *nod* I have tried this several times - it just doesn't work well,
> because the maps are too different.
>
> You could do something like this:
> . a table for each element type, fields being the attributes, plus the
> node id.
> . a table to tie everything together (parent_id, child_id,
> child_order, child_type).
> In theory you could even generate the DB schema from an XML schema and
> evaluate it with XPath-like expressions.
>
> But why put yourself to such bother? I have never found a good reason to
> do this sort of thing.
>
> cheers
>
> andrew
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings


From: Kurt at DBC <kurtw(at)dbc(dot)co(dot)nz>
To: Christopher Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-chat(at)postgresql(dot)org
Subject: Re: [HACKERS] is GiST still alive?
Date: 2003-10-23 03:37:24
Message-ID: 3F974CF4.90509@dbc.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-chat pgsql-hackers

Christopher Browne wrote:

> No, "tables" wouldn't be the right way to do it.
>
> But it's going to be troubled, in any case, because of the
> every-popular mixtures of:
> a) Often weird declarations of what character sets are in use;
> b) Pointers to other parts of a document;
> c) What's a "database" going to consist of? One XML document? Or
> many? And if many, then then how do you have a centralized
> reference point to navigate from to find the document that you
> want?
> And "navigate" was a carefully chosen word; what you then have is
> essentially a network database system, and have to then start making
> up ways of describing queries. XQuery may be better than CODASYL of
> yesteryear, but you're still left writing a lot of recursive code.
> (Thus making those that understand the Lambda Nature more powerful...)
>
> At the end, do you have a "database?" Or just a set of documents?
> It's hard to tell, a priori.
>
[Trimmed comments on utility of such features]

transferred from pgsql-hackers to pgsql-chat.

Coincidentally, just recently looked at Fongs "The design
and implementation of the POSTGRES query optimizer2" (referenced in the
Developers Guide) which implies that one of the initial goals of
postquel and postgres was to handle similar structures :

"The basic problem here is that the relational model is not well-suited
for representing hierarchical relationships. As a solution, Stonebraker
has proposed embedding queries within data fields and using these
queries to express the hierarchical relationship between the
corresponding tuple and information elsewhere in the database [STON84].
Using this idea, which POSTGRES supports, our complex object example is
now represented as shown in figure 2.5. To retrieve information executed
by the queries embedded within this data field, the user would issue the
following query:

execute (COBJECT.components) where COBJECT.coid = 1.

Thus, n join queries reduce to a single execute query. In addition,
users can selectively retrieve information
linked through these hierarchies by nesting attributes"

Go's around, comes around.
Cheers, Kurt.


From: Gregor Zeitlinger <zeitling(at)informatik(dot)hu-berlin(dot)de>
To: Christopher Browne <cbbrowne(at)acm(dot)org>
Cc: PostgreSQL Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: is GiST still alive?
Date: 2003-10-23 11:03:17
Message-ID: Pine.LNX.4.33.0310231254210.28617-100000@mitte.informatik.hu-berlin.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-chat pgsql-hackers

On Wed, 22 Oct 2003, Christopher Browne wrote:
> No, "tables" wouldn't be the right way to do it.
>
> But it's going to be troubled, in any case, because of the
> every-popular mixtures of:
>
> a) Often weird declarations of what character sets are in use;
I gotta admit that I haven't spend too much attention on that specific
part. But couln't you just store it in the character set that was
originally used to populate the document?

> b) Pointers to other parts of a document;
do you mean to the parent element and the child elements?
This is specifially what my custom format is designed for.

> c) What's a "database" going to consist of? One XML document? Or
> many?
many, each of which can be up to about 1TB

> And if many, then then how do you have a centralized
> reference point to navigate from to find the document that you
> want?
This one could be a table, or another xml document.

> And "navigate" was a carefully chosen word; what you then have is
> essentially a network database system, and have to then start making
> up ways of describing queries. XQuery may be better than CODASYL of
> yesteryear, but you're still left writing a lot of recursive code.
> (Thus making those that understand the Lambda Nature more powerful...)
I don't get your point? XQuery works on one document, IIRC.

> At the end, do you have a "database?" Or just a set of documents?
> It's hard to tell, a priori.
OK, know waht you mean. I'd say it's a database, because the information
is stored not plain - but in pages and in an optimized format for
insertion, deletion and querying.

> And do you think this is likely to be useful because:
>
> a) You have some clear notion as to why this ought to be useful?
yes. Modyfing and querying plain xml files sucks performancewise once your
documents get a little larger (100 MB+)

> b) XML is a big buzzword, and people have been able to succesfully
> attract "research funds" or "vulture capital" on the basis of
> having that acronym in a proposal?
That time's over anyways, isn't it?

--
Gregor Zeitlinger
gregor(at)zeitlinger(dot)de


From: Gregor Zeitlinger <zeitling(at)informatik(dot)hu-berlin(dot)de>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: is GiST still alive?
Date: 2003-10-23 11:06:03
Message-ID: Pine.LNX.4.33.0310231304480.28617-100000@mitte.informatik.hu-berlin.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-chat pgsql-hackers

On Thu, 23 Oct 2003, Christopher Kings-Lynne wrote:
> 4. Extend the contrib/ltree gist-based tree indexing scheme to work on
> xml and hence the operations in no.3 above are really fast...
but then, the plain xml data is still stored in a database colum, if I
understand correctly?

--
Gregor Zeitlinger
gregor(at)zeitlinger(dot)de


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: is GiST still alive?
Date: 2003-10-23 11:40:33
Message-ID: 3F97BE31.1030605@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-chat pgsql-hackers


Storing the XML text has problems - you have to parse it every time you
want something - that has to cause a huge performance hit.

I use XML a lot for all sorts of purposes, but it is appropriate for
data transfer rather than data storage, IMNSHO.

cheers

andrew

Christopher Kings-Lynne wrote:

> Do this:
>
> 1. Create a new type called 'xml', based on text.
>
> 2. The xmlin function for that type will validate what you are
> enterering is XML
>
> 3. Create new functions to implement XPath, SAX, etc. on the xml type.
>
> 4. Extend the contrib/ltree gist-based tree indexing scheme to work on
> xml and hence the operations in no.3 above are really fast...
>
> Chris
>
>
> Andrew Dunstan wrote:
>
>> Christopher Browne wrote:
>>
>>> But I think back to the XML generator I wrote for GnuCash; it has the
>>> notion of building up a hierarchy of entities and attributes, each of
>>> which is visible as an identifyable object of some sort. Mapping that
>>> onto a set of PostgreSQL relations wouldn't work terribly well.
>>>
>>>
>>
>> *nod* I have tried this several times - it just doesn't work well,
>> because the maps are too different.
>>
>> You could do something like this:
>> . a table for each element type, fields being the attributes, plus
>> the node id.
>> . a table to tie everything together (parent_id, child_id,
>> child_order, child_type).
>> In theory you could even generate the DB schema from an XML schema
>> and evaluate it with XPath-like expressions.
>>
>> But why put yourself to such bother? I have never found a good reason
>> to do this sort of thing.
>>
>> cheers
>>
>> andrew
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 7: don't forget to increase your free space map settings
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>


From: Gregor Zeitlinger <zeitling(at)informatik(dot)hu-berlin(dot)de>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: is GiST still alive?
Date: 2003-10-23 11:51:52
Message-ID: Pine.LNX.4.33.0310231349250.28617-100000@mitte.informatik.hu-berlin.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-chat pgsql-hackers

On Thu, 23 Oct 2003, Andrew Dunstan wrote:
> Storing the XML text has problems - you have to parse it every time you
> want something - that has to cause a huge performance hit.
You couldn't have said better what I meant.
I store the xml already parsed. You can navigate right along. To the
parent, the previous, the next elemnt or the first or last child.

> I use XML a lot for all sorts of purposes, but it is appropriate for
> data transfer rather than data storage, IMNSHO.
Right now, you're quite right. But I want to change that.

--
Gregor Zeitlinger
gregor(at)zeitlinger(dot)de


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Gregor Zeitlinger <gregor(at)zeitlinger(dot)de>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: is GiST still alive?
Date: 2003-10-23 13:09:11
Message-ID: 3F97D2F7.20601@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-chat pgsql-hackers

>>4. Extend the contrib/ltree gist-based tree indexing scheme to work on
>>xml and hence the operations in no.3 above are really fast...
>
> but then, the plain xml data is still stored in a database colum, if I
> understand correctly?

Yep - which to me seems to be the most useful way to store it :)

Chris


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Gregor Zeitlinger <gregor(at)zeitlinger(dot)de>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: is GiST still alive?
Date: 2003-10-23 13:10:46
Message-ID: 3F97D356.1070502@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-chat pgsql-hackers

> You couldn't have said better what I meant.
> I store the xml already parsed. You can navigate right along. To the
> parent, the previous, the next elemnt or the first or last child.

Which is the whole point of indexing it...

>>I use XML a lot for all sorts of purposes, but it is appropriate for
>>data transfer rather than data storage, IMNSHO.
>
> Right now, you're quite right. But I want to change that.

No point, it's a data exchange format, it's not usefull for data storage.

Chris


From: Makoto Yui <yuin(at)bb(dot)din(dot)or(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: is GiST still alive?
Date: 2003-10-23 13:38:09
Message-ID: 20031023223809.7de8804e.yuin@bb.din.or.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-chat pgsql-hackers

> *nod* I have tried this several times - it just doesn't work well,
> because the maps are too different.
>
> You could do something like this:
> . a table for each element type, fields being the attributes, plus
> the node id.
> . a table to tie everything together (parent_id, child_id,
> child_order, child_type).
> In theory you could even generate the DB schema from an XML schema and
> evaluate it with XPath-like expressions.

Although bits separated from a topic, I am developing XML database enviroment
based on PostgreSQL, XpSQL.
http://gborg.postgresql.org/project/xpsql/

The enviroment decompose XML documents into fragments and uses PostgreSQL
to store them in a set of relations.
Users do not have to know that they are stored as relations.

It provides a variety of means to access XML Documents;
(a) DOM functions to build and traverse XML Documents
(b) XPath engine to extract infomation from documents
(c) update functions to modify documents

Although the development is on going,
the performance is better than Apache Xindice.
#if you require, I'll send you the result after.

- points -
* node labeling is based on dewey decimal classification,
so updating performance is well.
* The index method is general (postgres) B-tree.

this is a programing image.
http://gborg.postgresql.org/project/xpsql/cvs/co.php/XpSQL/doc/USAGE?r=1.4

+-------------------------------------------------------------------+
Makoto Yui <yuin(at)bb(dot)din(dot)or(dot)jp>
Key fingerprint = 6462 E285 97D8 1323 40C4 F9E5 EB0F 9DE6 1713 219E
+-------------------------------------------------------------------+


From: Gregor Zeitlinger <zeitling(at)informatik(dot)hu-berlin(dot)de>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: is GiST still alive?
Date: 2003-10-23 13:59:49
Message-ID: Pine.LNX.4.33.0310231549560.28617-100000@mitte.informatik.hu-berlin.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg사설 토토SQL : Postg사설 토토SQL 메일 링리스트 : 2003-10-23 이후 PGSQL-Chat 13:59 pgsql-hackers

On Thu, 23 Oct 2003, Christopher Kings-Lynne wrote:
> > You couldn't have said better what I meant.
> > I store the xml already parsed. You can navigate right along. To the
> > parent, the previous, the next elemnt or the first or last child.
>
> Which is the whole point of indexing it...
not quite. Indexing can solve some of the problems, not all.
1) You have to update the index every time you modify the data. My custom
format serves as an index for some queries.

2) The page format is designed in such a way that modifications
(insertion, deletion) are as fast as the original parsing. I'm not sure
how that stacks up to modifying data in a column. I guess it depens on the
strategy to store very large strings in columns.

> >>I use XML a lot for all sorts of purposes, but it is appropriate for
> >>data transfer rather than data storage, IMNSHO.
> >
> > Right now, you're quite right. But I want to change that.
>
> No point, it's a data exchange format, it's not usefull for data storage.
Well, neither one is a data exchange format only or a data storage format
only.
Rather, the difference is that relations are designed to store structued
data while xml is desinged to store semi-structued (not so regular) data.

Which is better suited for data exchange is a matter of convention (where
xml seems to be good), while efficiency and other features of an
implementation determine, which one is suited for data storage.

If your point is that currently xml is not suited for storage, because
there are more efficent RDBMS than xml databases, I agree. Otherwise, I
don't see your point.

--
Gregor Zeitlinger
gregor(at)zeitlinger(dot)de