From: | "Tille, Andreas" <TilleA(at)rki(dot)de> |
---|---|
To: | "Claus, Hermann" <ClausH(at)rki(dot)de> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Serious performance problem |
Date: | 2001-10-30 10:44:16 |
Message-ID: | Pine.LNX.4.33.0110301114580.6117-100000@wr-linux02.rki.ivbb.bund.de |
Lists: | pgsql-hackers |
On Mon, 29 Oct 2001, Vsevolod Lobko wrote:
> Seems that problem is very simple :))
> MSSql can do queries from indexes, without using actual table at all.
> Postgresql doesn't.
>
> So mssql avoids sequental scanning of big table, and simply does scan of
> index which is already in needed order and has very much less size.
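The "very much less size" point can be made concrete with a back-of-the-envelope page count. This is a hypothetical sketch, not data from the thread: the row and index-entry widths are assumed, and 8 kB is PostgreSQL's default page size.

```python
# Rough sizing sketch: pages a sequential scan must read vs. the pages of
# a narrow index. All entry widths are assumed, for illustration only.
PAGE_BYTES = 8192  # PostgreSQL's default page size

def pages_needed(nrows: int, bytes_per_entry: int) -> int:
    per_page = PAGE_BYTES // bytes_per_entry
    return -(-nrows // per_page)  # ceiling division

n = 1_000_000
heap_pages = pages_needed(n, 100)   # assumed 100-byte heap tuples
index_pages = pages_needed(n, 16)   # assumed 16-byte index entries
print(heap_pages, index_pages)      # the index is several times smaller
```

With these assumed widths the index occupies roughly a sixth of the pages of the heap, which is why scanning only the index can win even before any ordering benefit is considered.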
I forwarded this information to my colleague and he replied the following
(I'm translating from German into English):
hc> I expected this problem. But what is the purpose of an index? Not
hc> having to look into the table itself. Moreover this means that the expense
hc> grows linearly with the table size - no good prospect at all (the
hc> good thing is it is not exponential :-)).
I have to explain that we are at the *beginning* of the production process.
We expect a lot more data.
hc> In case of real index usage the expense grows only with log(n).
hc> Leaving aside which database server has the better philosophy, MS-SQL-Server
hc> makes consistent use of indexes and so it is very fast at many queries.
hc> When performing a query to a field without index, I get a slow
hc> table scan. This is like measuring the speed of the harddisk and
hc> the cleverness of the cache.
The consequence for my problem is now: if it is technically possible
to implement index scans without table lookups, please implement it. If
not, we just have to look for another database engine which does so,
because our application really needs the speed on this type of queries.
I repeat from my initial posting: The choice of the server for our
application could have importance for many projects in the field of
medicine in Germany. I really hope that there is a reasonable solution
which perhaps could give a balance between safety and speed. For
example, I can assure in my application that the index, once created,
will be valid, because I just want to read in a new set of data once
a day (from the MS-SQL Server which collects data over the day). So
I could recreate all indices after the import and the database is
read-only until the next cron job. Is there any chance to speed up
those applications?
Kind regards
Andreas.
From: | Brent Verner <brent(at)rcfile(dot)org> |
---|---|
To: | "Tille, Andreas" <TilleA(at)rki(dot)de> |
Cc: | "Claus, Hermann" <ClausH(at)rki(dot)de>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Serious performance problem |
Date: | 2001-10-30 11:48:40 |
Message-ID: | 20011030064840.A58106@rcfile.org |
Lists: | pgsql-hackers |
On 30 Oct 2001 at 11:44 (+0100), Tille, Andreas wrote:
| On Mon, 29 Oct 2001, Vsevolod Lobko wrote:
| > MSSql can do queries from indexes, without using actual table at all.
| > Postgresql doesn't.
| [...]
| The consequence for my problem is now: If it is technically possible
| to implement index scans without table lookups please implement it.
| [...] So
| I could recreate all indices after the import and the database is
| readonly until the next cron job. Is there any chance to speed up
| those applications?
CREATE INDEX idx_meldekategorie_hauptdaten_f
ON hauptdaten_fall(meldekategorie);
CLUSTER idx_meldekategorie_hauptdaten_f ON hauptdaten_fall;
Aggregate  (cost=5006.02..5018.90 rows=258 width=16)
  ->  Group  (cost=5006.02..5012.46 rows=2575 width=16)
        ->  Sort  (cost=5006.02..5006.02 rows=2575 width=16)
              ->  Seq Scan on hauptdaten_fall  (cost=0.00..4860.12 rows=2575 width=16)
This looks much nicer, but is still quite slow. I'm quite sure the
slowness is in the sort(), since all queries that don't sort return
quickly. I hoped the clustered index would speed up the sort, but
that is not the case.
It _seems_ a simple optimization would be to not (re)sort the tuples
when using a clustered index.
if( the_column_to_order_by_is_clustered ){
if( order_by_is_DESC )
// scan the tuples in reverse order instead of sorting
}
I haven't looked at the code to see if this is even feasible, but I
do imagine there is enough info available to avoid an unnecessary
sort on the CLUSTERED index. The only problem I see with this is
if the CLUSTERed index is not kept in a CLUSTERed state as more
records are added to this table.
brent
--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing." -- Duane Allman
From: | Antonio Fiol Bonnín <fiol(at)w3ping(dot)com> |
---|---|
To: | Brent Verner <brent(at)rcfile(dot)org> |
Cc: | "Tille, Andreas" <TilleA(at)rki(dot)de>, "Claus, Hermann" <ClausH(at)rki(dot)de>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Serious performance problem |
Date: | 2001-10-30 13:53:29 |
Message-ID: | 3BDEB0D9.A277E82C@w3ping.com |
Lists: | pgsql-hackers |
AFAIK, sorting is necessary even when you have CLUSTERed a table using an index.
Somewhere in the docs I read something like "CLUSTER reorders the table on disk so that entries
close together in the index are close together on disk" (obviously written in better English ;-)
But if you INSERT a single row later, it will NOT get inserted to the right place. So
SORT is still necessary.
MAYBE, though I am not sure at all, the sort may take less "real" time than when
the table was not CLUSTERed, as the table is "nearly" sorted.
Hackers, is the sorting algorithm capable of exiting at the very moment the table is
sorted, or are some extra passes always calculated?
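Whether nearly-sorted input helps depends on the sort being adaptive. As a toy illustration (Python's built-in adaptive sort standing in for the database's sort routine; this says nothing about what PostgreSQL 7.x actually does), counting comparisons shows that already-ordered input costs only a single linear pass:

```python
# Toy experiment: an adaptive sort does far less work on input that is
# already in order, which is the freshly-CLUSTERed-table case.
import random
from functools import total_ordering

comparisons = 0

@total_ordering
class Key:
    """Wrapper that counts every comparison the sort performs."""
    def __init__(self, v):
        self.v = v
    def __eq__(self, other):
        return self.v == other.v
    def __lt__(self, other):
        global comparisons
        comparisons += 1
        return self.v < other.v

def count_comparisons(data):
    global comparisons
    comparisons = 0
    sorted(data, key=Key)
    return comparisons

n = 10_000
in_order = list(range(n))
shuffled = list(range(n))
random.seed(42)
random.shuffle(shuffled)
print(count_comparisons(in_order))  # n-1: one pass detects the sorted run
print(count_comparisons(shuffled))  # on the order of n log n
```

So an adaptive sort effectively "exits at the very moment the table is sorted"; whether the server's sort does so is a separate question.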
Good luck!
Antonio
Brent Verner wrote:
> On 30 Oct 2001 at 11:44 (+0100), Tille, Andreas wrote:
> | [...]
>
> CREATE INDEX idx_meldekategorie_hauptdaten_f
>        ON hauptdaten_fall(meldekategorie);
> CLUSTER idx_meldekategorie_hauptdaten_f ON hauptdaten_fall;
>
> Aggregate  (cost=5006.02..5018.90 rows=258 width=16)
>   ->  Group  (cost=5006.02..5012.46 rows=2575 width=16)
>         ->  Sort  (cost=5006.02..5006.02 rows=2575 width=16)
>               ->  Seq Scan on hauptdaten_fall  (cost=0.00..4860.12 rows=2575 width=16)
>
> This looks much nicer, but is still quite slow. I'm quite sure the
> slowness is in the sort(), since all queries that don't sort, return
> quickly. I hoped the clustered index would speed up the sort, but
> that is not the case.
>
> It _seems_ a simple optimization would be to not (re)sort the tuples
> when using a clustered index.
> [...]
From: | "Tille, Andreas" <TilleA(at)rki(dot)de> |
---|---|
To: | |
Cc: | "Claus, Hermann" <ClausH(at)rki(dot)de>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Serious performance problem |
Date: | 2001-10-30 14:09:00 |
Message-ID: | Pine.LNX.4.33.0110301501110.6117-100000@wr-linux02.rki.ivbb.bund.de |
Lists: | pgsql-hackers |
On Tue, 30 Oct 2001, Antonio Fiol Bonnín wrote:
> AFAIK, sorting is necessary even when you have CLUSTERed a table using an index.
Sorting is not the performance constraint in my example. Just leave out
the sorting and see what happens ...
> But if you INSERT a single row later, it will NOT get inserted to the right place. So
> SORT is still necessary.
Well, rearranging the database in a cron job after inserting new data once a day
overnight would be possible - but I doubt that it makes a big difference.
Kind regards
Andreas.
From: | Alex Pilosov <alex(at)pilosoft(dot)com> |
---|---|
To: | Antonio Fiol Bonnín <fiol(at)w3ping(dot)com> |
Cc: | Brent Verner <brent(at)rcfile(dot)org>, "Tille, Andreas" <TilleA(at)rki(dot)de>, "Claus, Hermann" <ClausH(at)rki(dot)de>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Serious performance problem |
Date: | 2001-10-30 15:24:17 |
Message-ID: | Pine.BSO.4.10.10110301012530.2632-100000@spider.pilosoft.com |
Lists: | pgsql-hackers |
On Tue, 30 Oct 2001, Antonio Fiol Bonnín wrote:
> > | > Seems that problem is very simple :))
> > | > MSSql can do queries from indexes, without using actual table at all.
> > | > Postgresql doesn't.
> > | >
> > | > So mssql avoids sequental scanning of big table, and simply does scan of
> > | > index which is already in needed order and has very much less size.
<snip>
> > | The consequence for my problem is now: If it is technically possible
> > | to implement index scans without table lookups please implement it. If
The feature you are looking for is called 'index coverage'. Unfortunately,
it is not easy to implement with Postgresql, and it is one of the few
outstanding 'nasties'. The reason you can't do it is as follows: Postgres
uses MVCC, and stores 'when' the tuple is alive inside the tuple. So, even
if the index contains all the information you need, you still need to access
the main table to check if the tuple is valid.
Possible workaround: store tuple validity in the index. That way, a lot more
space is wasted (16 more bytes/tuple/index), and you will need to update
all indices when the base table is updated, even if the indexed information
has not changed.
Fundamentally, this may be necessary anyway, to make index handlers aware
of transactions and tuple validity (currently, if you have unique index,
you may have conflicts when different transactions attempt to insert
conflicting data, _at the time of insert, not at time of commit_).
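A toy model of why the heap visit is unavoidable (an illustration of the idea, not PostgreSQL's actual code; real visibility rules also handle in-progress and aborted transactions, hint bits, etc.): the xmin/xmax validity fields live in the heap tuple, so an index entry alone cannot say whether its row exists for the current snapshot.

```python
# Toy MVCC model: field names mimic PostgreSQL's xmin/xmax, but the
# visibility rule here is deliberately simplified for illustration.
from dataclasses import dataclass
from typing import Optional

@dataclass
class HeapTuple:
    value: str
    xmin: int            # transaction that inserted the tuple
    xmax: Optional[int]  # transaction that deleted it, if any

def visible(tup: HeapTuple, snapshot_xid: int) -> bool:
    # Visible if inserted at or before our snapshot and not deleted,
    # or deleted only by a transaction later than our snapshot.
    if tup.xmin > snapshot_xid:
        return False
    return tup.xmax is None or tup.xmax > snapshot_xid

# The index stores only (key, heap_pointer) - no validity information -
# so both the dead and the live version of key "k" appear in it.
heap = [HeapTuple("old row", xmin=5, xmax=8),   # deleted by xact 8
        HeapTuple("new row", xmin=8, xmax=None)]
index = [("k", 0), ("k", 1)]

rows = [heap[ptr].value for _, ptr in index if visible(heap[ptr], 10)]
print(rows)  # only the still-live version survives the heap check
```

An index-only scan would have returned both entries for "k"; only the heap lookup can filter out the dead version.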
-alex
From: | "Tille, Andreas" <TilleA(at)rki(dot)de> |
---|---|
To: | |
Cc: | "Claus, Hermann" <ClausH(at)rki(dot)de>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Serious performance problem |
Date: | 2001-10-30 16:13:46 |
Message-ID: | Pine.LNX.4.33.0110301706000.6117-100000@wr-linux02.rki.ivbb.bund.de |
Lists: | pgsql-hackers |
On Tue, 30 Oct 2001, Alex Pilosov wrote:
> The feature you are looking for is called 'index coverage'. Unfortunately,
> it is not easy to implement with Postgresql, and it is one of few
> outstanding 'nasties'. The reason you can't do it is follows: Postgres
> uses MVCC, and stores 'when' the tuple is alive inside the tuple. So, even
> if index contains all the information you need, you still need to access
> main table to check if the tuple is valid.
Well, I do not fully understand that stuff, but I get a feeling of the
problem. Thanks for the explanation.
> Possible workaround: store tuple validity in index, that way, a lot more
> space is wasted (16 more bytes/tuple/index), and you will need to update
> all indices when the base table is updated, even if indexed information
> have not changed.
This would be acceptable for *my* special application but I'm afraid
this could be a problem for others.
> Fundamentally, this may be necessary anyway, to make index handlers aware
> of transactions and tuple validity (currently, if you have unique index,
> you may have conflicts when different transactions attempt to insert
> conflicting data, _at the time of insert, not at time of commit_).
As I said, all this wouldn't be a problem for my application. I just
run a sequential insert of data each night. Then the database is read only.
Does anybody see chances that 'index coverage' will be implemented in
7.2? This would be a crucial feature for my application. If it does
not happen in a reasonable time frame I will have to look for an
alternative database server. Does anybody know something about MySQL or
Interbase?
Kind regards
Andreas.
From: | Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> |
---|---|
To: | "Tille, Andreas" <TilleA(at)rki(dot)de> |
Cc: | "Claus, Hermann" <ClausH(at)rki(dot)de>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Serious performance problem |
Date: | 2001-10-30 16:51:50 |
Message-ID: | 20011030084404.F14322-100000@megazone23.bigpanda.com |
Lists: | pgsql-hackers |
> Does anybody see chances that 'index coverage' would be implemented into
> 7.2. This would be a cruxial feature for my application. If it will
> not happen in a reasonable time frame I would have to look for
> alternative database server. Anybody knows something about MySQL or
> Interbase?
Since I don't remember anyone mentioning working on it here and 7.2 just
went into beta, I don't think it's likely. If you want to push, you may
be able to convince someone for 7.3.
From: | Horst Herb <hherb(at)malleenet(dot)net(dot)au> |
---|---|
To: | "Tille, Andreas" <TilleA(at)rki(dot)de> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Serious performance problem |
Date: | 2001-10-31 03:37:29 |
Message-ID: | 20011031033311.4792.qmail@gnumed.dhs.org |
Lists: | pgsql-hackers |
On Wednesday 31 October 2001 03:13, you wrote:
> On Tue, 30 Oct 2001, Alex Pilosov wrote:
> As I said all this wouln´t be a problem for my application. I just
> run a sequential insert of data each night. Then the database is read
> only.
>
> Does anybody see chances that 'index coverage' would be implemented into
> 7.2. This would be a cruxial feature for my application. If it will
Andreas,
I have the feeling that your problem is solved best by taking a different
approach.
As A. Pilosov's posting pointed out, index coverage is a problem intrinsic to
the MVCC implementation (IMHO a small price to pay for a priceless feature).
I can't see why much effort should go into a brute force method to implement
index coverage if your problem can be solved more elegantly in a different way.
With the example you posted, it is essentially only simple statistics you
want to run on tables where the *majority* of records would qualify in your
query.
Why not create an extra "statistics" table which is updated automatically
through triggers in your original table? That way, you will always get
up-to-date INSTANT query results no matter how huge your original table is.
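The trigger idea can be sketched in a few lines. This uses an in-memory SQLite database purely as a stand-in for PostgreSQL (trigger syntax differs slightly between the two); the table and column names are borrowed from Brent's example earlier in the thread, and the category values are invented:

```python
# Trigger-maintained summary table: per-category counts are kept current
# on every INSERT, so the statistics query never scans the big table.
# SQLite stands in for PostgreSQL here; the idea carries over.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE hauptdaten_fall (id INTEGER PRIMARY KEY, meldekategorie TEXT);
CREATE TABLE kategorie_stats (meldekategorie TEXT PRIMARY KEY, n INTEGER);

CREATE TRIGGER bump AFTER INSERT ON hauptdaten_fall
BEGIN
    INSERT OR IGNORE INTO kategorie_stats VALUES (NEW.meldekategorie, 0);
    UPDATE kategorie_stats SET n = n + 1
     WHERE meldekategorie = NEW.meldekategorie;
END;
""")

db.executemany("INSERT INTO hauptdaten_fall(meldekategorie) VALUES (?)",
               [("Salmonella",), ("Salmonella",), ("Influenza",)])
print(db.execute("SELECT * FROM kategorie_stats ORDER BY n DESC").fetchall())
# → [('Salmonella', 2), ('Influenza', 1)]
```

Queries then read the tiny kategorie_stats table instead of aggregating over hauptdaten_fall, so answers stay instant however large the base table grows; the trigger cost is paid at insert time, which suits a nightly batch load.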
And, don't forget that the only way MS SQL can achieve the better performance
here is through mercilessly hogging resources. In a complex database
environment with even larger tables, the performance gain in MS SQL would be
minimal (my guess).
Horst
From: | Jean-Michel POURE <jm(dot)poure(at)freesurf(dot)fr> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Serious performance problem |
Date: | 2001-10-31 05:41:56 |
Message-ID: | 4.2.0.58.20011031064030.00d91620@pop.freesurf.fr |
Lists: | pgsql-hackers |
>Why not create an extra "statistics" table which is updated automatically
>through triggers in your original table? That way, you will always get
>up-to-date INSTANT query results no matter how huge your original table is.
>
>And, don't forget that the only way MS SQL can achieve the better performance
>here is through mercilessly hogging ressources. In a complex database
>environment with even larger tables, the performance gain in MS SQL would be
>minimal (my guess).
Definitely. This is a design optimization problem not an index problem.
From: | Hannu Krosing <hannu(at)tm(dot)ee> |
---|---|
To: | Alex Pilosov <alex(at)pilosoft(dot)com> |
Cc: | Antonio Fiol Bonnín <fiol(at)w3ping(dot)com>, Brent Verner <brent(at)rcfile(dot)org>, Tille(at)taru(dot)tm(dot)ee, Andreas <TilleA(at)rki(dot)de>, Claus(at)taru(dot)tm(dot)ee, Hermann <ClausH(at)rki(dot)de>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Serious performance problem |
Date: | 2001-10-31 08:34:32 |
Message-ID: | 3BDFB798.6760CD90@tm.ee |
Lists: | pgsql-hackers |
Alex Pilosov wrote:
>
> On Tue, 30 Oct 2001, Antonio Fiol Bonnín wrote:
>
> <snip>
> The feature you are looking for is called 'index coverage'. Unfortunately,
> it is not easy to implement with Postgresql, and it is one of few
> outstanding 'nasties'. The reason you can't do it is follows: Postgres
> uses MVCC, and stores 'when' the tuple is alive inside the tuple. So, even
> if index contains all the information you need, you still need to access
> main table to check if the tuple is valid.
>
> Possible workaround: store tuple validity in index, that way, a lot more
> space is wasted (16 more bytes/tuple/index), and you will need to update
> all indices when the base table is updated, even if indexed information
> have not changed.
AFAIK you will need to update all indexes anyway, as MVCC changes the
location of the new tuple.
-------------
Hannu
From: | Hannu Krosing <hannu(at)tm(dot)ee> |
---|---|
To: | "Tille, Andreas" <TilleA(at)rki(dot)de> |
Cc: | "Claus, Hermann" <ClausH(at)rki(dot)de>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Serious performance problem |
Date: | 2001-10-31 08:47:00 |
Message-ID: | 3BDFBA84.9D4C67C8@tm.ee |
Lists: | pgsql-hackers |
"Tille, Andreas" wrote:
>
> [...]
> Does anybody see chances that 'index coverage' would be implemented into
> 7.2. This would be a cruxial feature for my application. If it will
> not happen in a reasonable time frame I would have to look for
> alternative database server. Anybody knows something about MySQL or
> Interbase?
If it is static data and simple queries then there is a fairly good chance
that MySQL is a good choice.
As for the other two open-source databases, Interbase and SAPDB (a modified
version of ADABAS released under the GPL by SAP - http://www.sapdb.com/), I
have no direct experience.
I occasionally read the sapdb mailing list, and I've got the impression that
it is quite a usable and stable DB once you have set it up. Setting it up seems
order(s) of magnitude harder than for PostgreSQL or MySQL.
Whether it actually runs full-table aggregates faster than PG is a thing
I can't comment on, but you could get some of their people to do the
benchmarking for you if you send them an advocacy-urging request, like
"I'd switch if you show me that your db is fast enough" ;)
-------------------
Hannu
From: | Denis Perchine <dyp(at)perchine(dot)com> |
---|---|
To: | Alex Pilosov <alex(at)pilosoft(dot)com>, Antonio Fiol Bonn?n <fiol(at)w3ping(dot)com> |
Cc: | Brent Verner <brent(at)rcfile(dot)org>, "Tille, Andreas" <TilleA(at)rki(dot)de>, "Claus, Hermann" <ClausH(at)rki(dot)de>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Serious performance problem |
Date: | 2001-10-31 18:26:10 |
Message-ID: | 200110311907.f9VJ7lP85231@postgresql.org |
Lists: | pgsql-hackers |
On Tuesday 30 October 2001 21:24, Alex Pilosov wrote:
> > > | The consequence for my problem is now: If it is technically possible
> > > | to implement index scans without table lookups please implement it.
>
> The feature you are looking for is called 'index coverage'. [...]
>
> Possible workaround: store tuple validity in index, that way, a lot more
> space is wasted (16 more bytes/tuple/index), and you will need to update
> all indices when the base table is updated, even if indexed information
> have not changed.
What is the problem with implementing this as a special index type for
people who need it? Just add a flag keyword to the index creation clause.
Actually I would like to hear Tom's opinion on this issue. This issue is of
interest to me too.
Also, I saw some time ago on -hackers that there is a patch implementing this...
or am I wrong here?
--
Denis
From: | "Tille, Andreas" <TilleA(at)rki(dot)de> |
---|---|
To: | |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Serious performance problem |
Date: | 2001-11-01 15:24:48 |
Message-ID: | Pine.LNX.4.33.0111011603540.21752-100000@wr-linux02.rki.ivbb.bund.de |
Lists: | pgsql-hackers |
On Wed, 31 Oct 2001, Horst Herb wrote:
> I have the feeling that your problem is solved best by taking a different
> approach.
> As A. Pilosovs posting pointed out, index coverage is a problem intrinsic to
> the MVCC implementation (IMHO a small price to pay for a priceless feature).
Could somebody explain MVCC to such an uneducated man like me? Is this a
feature which perhaps MS SQL doesn't have and which might be
important in the future?
> I can't see why much effort should go into a brute force method to implement
> index coverage, if your problem can be solved more elegant in a different way.
>
> With the example you posted, it is essentially only simple statistics you
> want to run on tables where the *majority* of records would qualify in your
> query.
> Why not create an extra "statistics" table which is updated automatically
> through triggers in your original table? That way, you will always get
> up-to-date INSTANT query results no matter how huge your original table is.
My problem is to convince my colleague. I'm afraid that he would consider
this optimization stuff as "tricks" to work around constraints of the
database server. He might argue that once the MS SQL server also needs
some speed improvement and he does the same performance tuning, MS SQL
will outperform PostgreSQL again and we
are at the end of our wisdom. I repeat: I for myself see the strength
of Open Source (Horst, you know me ;-) ) and I would really love to use
PostgreSQL. But how to prove those arguing wrong? *This* is my problem.
We have to make a design decision. My colleague is a mathematician who
preferred MS SQL Server over Oracle some years ago and had certain
reasons for it based on estimates of our needs. He has no problems
with UNIX or anything else, and theoretically he is on my side that Open Source
is the better way; he would accept it if it gave the same results
as his stuff.
But he has never had performance problems with his databases and
knows people who claim to fill zillions of megabytes into MS SQL Server.
So he doubts the quality of the PostgreSQL server if it has problems in
the first run. I have to admit that his point of view is easy to
understand. I would have to prove (!) that we wouldn't have trouble
with bigger databases and that those things are no "dirty workarounds"
for a weak server.
> And, don't forget that the only way MS SQL can achieve the better performance
> here is through mercilessly hogging ressources. In a complex database
> environment with even larger tables, the performance gain in MS SQL would be
> minimal (my guess).
Unfortunately it is not enough to guess. He has enough experience to
know that the MS SQL server is fit for the task he wants to solve. If
I tell him "*Perhaps* you could run into trouble", he would just laugh
at me, because I'm in trouble *now* and can't prove that I won't be
again.
Kind regards
Andreas.
From: | Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> |
---|---|
To: | "Tille, Andreas" <TilleA(at)rki(dot)de> |
Cc: | <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Serious performance problem |
Date: | 2001-11-01 16:37:12 |
Message-ID: | 20011101083129.O22509-100000@megazone23.bigpanda.com |
Lists: | pgsql-hackers |
> My problem is to convince my colleague. [...]
> Unfortunately it is not enough to guess. [...]
The only way to know for certain is to try both at various sizes and see.
Getting numbers for one type of query on one size of database tells you very
little. Load a test set that's 100, 1000, whatever times the current size
and see what happens. ISTM anything short of this is fairly meaningless.
At what point does the other person expect to run into problems? How would
he solve them? How does Postgres run at that point, with and without
special optimization?
It's perfectly possible that for the particular queries and load you're
running, MSSQL will be better; there's nothing wrong with that. Conversely,
it's entirely possible that one could find workloads that Postgres is better at.
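The scaling experiment described above can be sketched in a few lines. The snippet below uses SQLite's in-memory engine purely as a stand-in (the real test would of course run against PostgreSQL and MS SQL with production-like data), and the table and column names are invented for illustration:

```python
# Sketch of the "load 100x/1000x the data and re-time the query" test.
# SQLite is only a stand-in engine here; names are hypothetical.
import sqlite3
import time

def timed_query(scale):
    """Load `scale` synthetic rows and time a grouped, indexed lookup."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cases (id INTEGER, region TEXT, week INTEGER)")
    con.executemany(
        "INSERT INTO cases VALUES (?, ?, ?)",
        ((i, "region%d" % (i % 50), i % 52) for i in range(scale)),
    )
    con.execute("CREATE INDEX cases_region ON cases (region)")
    start = time.perf_counter()
    rows = con.execute(
        "SELECT week, COUNT(*) FROM cases"
        " WHERE region = 'region7' GROUP BY week"
    ).fetchall()
    return time.perf_counter() - start, len(rows)

# Run the same query at two sizes and compare how the timing grows.
for scale in (1_000, 100_000):
    elapsed, groups = timed_query(scale)
    print("%7d rows: %.4fs, %d groups" % (scale, elapsed, groups))
```

The point is not the absolute numbers but the growth curve: roughly log(n) behavior suggests the index is doing its job, roughly linear growth suggests table scans.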
From: | Doug McNaught <doug(at)wireboard(dot)com> |
---|---|
To: | "Tille, Andreas" <TilleA(at)rki(dot)de> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Serious performance problem |
Date: | 2001-11-01 17:11:05 |
Message-ID: | m3ofmmxvpy.fsf@belphigor.mcnaught.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
"Tille, Andreas" <TilleA(at)rki(dot)de> writes:
> Could somebody explain MVCC to such an uneducated man like me? Is this a
> feature which perhaps MS SQL doesn't have and which might be
> important in the future?
http://www.us.postgresql.org/users-lounge/docs/7.1/postgres/mvcc.html
(Or substitute your favorite mirror)
Only Oracle has anything like it AFAIK.
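In a nutshell, the idea behind MVCC can be modeled like this (a toy simplification, not PostgreSQL's actual tuple header or visibility rules, which also involve commit status and in-progress transaction lists): every row version carries the id of the transaction that created it and, once deleted, the one that removed it, and each snapshot decides visibility for itself.

```python
# Toy model of MVCC tuple visibility. Field names follow PostgreSQL's
# xmin/xmax convention, but the visibility rule is deliberately simplified.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TupleVersion:
    value: str
    xmin: int                  # transaction that inserted this version
    xmax: Optional[int] = None # transaction that deleted it, if any

def visible(t: TupleVersion, snapshot_xid: int) -> bool:
    """A version is visible if it was inserted by this snapshot's
    transaction or earlier, and not deleted by such a transaction."""
    inserted = t.xmin <= snapshot_xid
    deleted = t.xmax is not None and t.xmax <= snapshot_xid
    return inserted and not deleted

# Transaction 5 updates a row: the old version gets xmax=5,
# and a new version is written with xmin=5.
old = TupleVersion("v1", xmin=2, xmax=5)
new = TupleVersion("v2", xmin=5)

# A snapshot taken before transaction 5 still sees the old version,
# while a later snapshot sees only the new one: readers never block writers.
assert visible(old, 4) and not visible(new, 4)
assert not visible(old, 5) and visible(new, 5)
```

This is also why the earlier index-coverage discussion matters: the xmin/xmax information lives in the heap tuple, not in the index entry.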
-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863
From: | "Dave Cramer" <Dave(at)micro-automation(dot)net> |
---|---|
To: | <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Licensing issues including another projects source code into the jdbc driver |
Date: | 2001-11-01 17:58:47 |
Message-ID: | 018f01c162fe$db26c190$c201a8c0@inspiron |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
I am contemplating including log4jme source code into the jdbc driver.
Who would be the best person to contact wrt ironing out the licensing
issues?
Dave
From: | "Dave Cramer" <dave(at)fastcrypt(dot)com> |
---|---|
To: | <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Licensing issues including another projects source code into the jdbc driver |
Date: | 2001-11-01 19:13:28 |
Message-ID: | 01a701c16309dd4d20$c201a8c0@inspiron |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
I am contemplating including log4jme source code into the jdbc driver.
Who would be the best person to contact wrt ironing out the licensing
issues?
Dave
From: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | dave(at)fastcrypt(dot)com |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Licensing issues including another projects source code |
Date: | 2001-11-01 19:48:39 |
Message-ID: | 200111011948.fA1Jmdn19684@candle.pha.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
>
> I am contemplating including log4jme source code into the jdbc driver.
> Who would be the best person to contact wrt ironing out the licensing
> issues?
Can you tell us what license it uses?
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
From: | "Dave Cramer" <dave(at)fastcrypt(dot)com> |
---|---|
To: | "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us> |
Cc: | <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Licensing issues including another projects source code into the jdbc driver |
Date: | 2001-11-01 20:14:28 |
Message-ID: |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
It is using the apache licence
http://www.qos.ch/log4jME/LICENSE.txt
It appears that they allow the code to be used in either binary or
source as long as their licence remains intact
Dave
-----Original Message-----
From: Bruce Momjian [mailto:pgman(at)candle(dot)pha(dot)pa(dot)us]
Sent: November 1, 2001 2:49 PM
To: dave(at)fastcrypt(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Licensing issues including another projects
source code into the jdbc driver
>
> I am contemplating including log4jme source code into the jdbc driver.
> Who would be the best person to contact wrt ironing out the licensing
> issues?
Can you tell us what license it uses?
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 190
From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | Dave Cramer <dave(at)fastcrypt(dot)com> |
Cc: | "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Licensing issues including another projects source |
Date: | 2001-11-04 13:04:54 |
Message-ID: | Pine.LNX.4.30.0111022100300.874-100000@peter.localdomain |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Dave Cramer writes:
> It is using the apache licence
>
> http://www.qos.ch/log4jME/LICENSE.txt
>
> It appears that they allow the code to be used in either binary or
> source as long as their licence remains intact
The apache license has an advertising clause, which is not acceptable.
Getting someone to relicense the software is difficult to impossible if
there is a multitude of outside contributors.
--
Peter Eisentraut peter_e(at)gmx(dot)net http://funkturm.homeip.net/~peter
From: | "Dave Cramer" <dave(at)fastcrypt(dot)com> |
---|---|
To: | "'Peter Eisentraut'" <peter_e(at)gmx(dot)net> |
Cc: | "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Licensing issues including another projects source code into the jdbc driver |
Date: | 2001-11-04 14:21:20 |
Message-ID: |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Peter,
I presume you are referring to the 3rd clause? What is the issue with
this clause?
Dave
-----Original Message-----
From: Peter Eisentraut [mailto:peter_e(at)gmx(dot)net]
Sent: November 4, 2001 8:05 AM
To: Dave Cramer
Cc: 'Bruce Momjian'; pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Licensing issues including another projects
source code into the jdbc driver
Dave Cramer writes:
> It is using the apache licence
>
> http://www.qos.ch/log4jME/LICENSE.txt
>
> It appears that they allow the code to be used in either binary or
> source as long as their licence remains intact
The apache license has an advertising clause, which is not acceptable.
Getting someone to relicense the software is difficult to impossible if
there is a multitude of outside contributors.
--
Peter Eisentraut peter_e(at)gmx(dot)net http://funkturm.homeip.net/~peter
From: | "Andrea Aime" <aaime(at)comune(dot)modena(dot)it> |
---|---|
To: | |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Serious performance problem |
Date: | 2001-11-05 11:03:23 |
Message-ID: | 3BE671FB.F9D07D60@comune.modena.it |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Alex Pilosov wrote:
>
> > On Tue, 30 Oct 2001, Antonio Fiol Bonnín wrote:
>
> > > | > Seems that problem is very simple :))
> > > | > MSSql can do queries from indexes, without using actual table at all.
> > > | > Postgresql doesn't.
> > > | >
> > > | > So mssql avoids sequental scanning of big table, and simply does scan of
> > > | > index which is already in needed order and has very much less size.
> <snip>
> > > | The consequence for my problem is now: If it is technically possible
> > > | to implement index scans without table lookups please implement it. If
> The feature you are looking for is called 'index coverage'. Unfortunately,
> it is not easy to implement in PostgreSQL, and it is one of the few
> outstanding 'nasties'. The reason you can't do it is as follows: Postgres
> uses MVCC, and stores 'when' the tuple is alive inside the tuple. So, even
> if the index contains all the information you need, you still need to access
> the main table to check if the tuple is valid.
>
> Possible workaround: store tuple validity in the index; that way, a lot more
> space is wasted (16 more bytes/tuple/index), and you will need to update
> all indices when the base table is updated, even if the indexed information
> has not changed.
>
Maybe just a silly idea, but wouldn't it be possible (and useful)
to store tuple validity in a separate bitmap file, where each bit reports
the validity of the corresponding tuple? It would grow linearly, but
at least it would be very small compared to the actual data...
Best regards
Andrea Aime
From: | Hannu Krosing <hannu(at)tm(dot)ee> |
---|---|
To: | andrea(dot)aime(at)comune(dot)modena(dot)it |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Serious performance problem |
Date: | 2001-11-05 11:51:05 |
Message-ID: | 3BE67D29.29322DDC@tm.ee |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Andrea Aime wrote:
>
> Alex Pilosov wrote:
> >
> > > On Tue, 30 Oct 2001, Antonio Fiol Bonnín wrote:
> >
> > > > | > Seems that problem is very simple :))
> > > > | > MSSql can do queries from indexes, without using actual table at all.
> > > > | > Postgresql doesn't.
> > > > | >
> > > > | > So mssql avoids sequental scanning of big table, and simply does scan of
> > > > | > index which is already in needed order and has very much less size.
> > <snip>
> > > > | The consequence for my problem is now: If it is technically possible
> > > > | to implement index scans without table lookups please implement it. If
> > The feature you are looking for is called 'index coverage'. Unfortunately,
> > it is not easy to implement in PostgreSQL, and it is one of the few
> > outstanding 'nasties'. The reason you can't do it is as follows: Postgres
> > uses MVCC, and stores 'when' the tuple is alive inside the tuple. So, even
> > if the index contains all the information you need, you still need to access
> > the main table to check if the tuple is valid.
> >
> > Possible workaround: store tuple validity in the index; that way, a lot more
> > space is wasted (16 more bytes/tuple/index), and you will need to update
> > all indices when the base table is updated, even if the indexed information
> > has not changed.
> >
>
> Maybe just a silly idea, but wouldn't it be possible (and useful)
> to store tuple validity in a separate bitmap file, where each bit reports
> the validity of the corresponding tuple? It would grow linearly, but
> at least it would be very small compared to the actual data...
I see two problems with this approach:
1. Tuple validity is different for different transactions running
concurrently. We could still cache death-transaction ids of tuples
_in_memory_ quite cheaply time-wise, but I'm not sure how big a win it
would be in general.
2. There is no easy way to know which bit corresponds to which tuple, as
each database page can contain an arbitrary number of tuples (this one is
easier, as we can use a somewhat sparse bitmap that is less
space-efficient).
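The "somewhat sparse bitmap" idea from point 2 can be sketched as follows. This is a hypothetical layout, not anything PostgreSQL does: each page gets a fixed-size slot of bits, one per possible line pointer, so a bit position can be computed without knowing how many tuples each page actually holds. Note it addresses only the addressing problem, not point 1 (per-transaction validity).

```python
# Hypothetical sparse validity bitmap: reserve a fixed number of bits per
# page so (page, line) maps to a bit position by arithmetic alone. This
# wastes bits on partially filled pages, trading space for simplicity.
MAX_TUPLES_PER_PAGE = 64  # assumed fixed upper bound of tuples per page

class VisibilityBitmap:
    def __init__(self, num_pages: int):
        self.bits = bytearray(num_pages * MAX_TUPLES_PER_PAGE // 8)

    def _offset(self, page: int, line: int):
        bit = page * MAX_TUPLES_PER_PAGE + line
        return bit // 8, bit % 8

    def set_valid(self, page: int, line: int) -> None:
        byte, shift = self._offset(page, line)
        self.bits[byte] |= 1 << shift

    def is_valid(self, page: int, line: int) -> bool:
        byte, shift = self._offset(page, line)
        return bool(self.bits[byte] & (1 << shift))

vm = VisibilityBitmap(num_pages=128)
vm.set_valid(page=3, line=17)
assert vm.is_valid(3, 17) and not vm.is_valid(3, 18)
```

At 64 bits per 8 KB page this is under 0.1% of the table size, which matches the "very small compared to the actual data" intuition above.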
------------
Hannu
From: | "Dave Cramer" <dave(at)fastcrypt(dot)com> |
---|---|
To: | "'Peter Eisentraut'" <peter_e(at)gmx(dot)net> |
Cc: | "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Licensing issues including another projects source code into the jdbc driver |
Date: | 2001-11-05 19:59:43 |
Message-ID: |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Umm, ok.
This is disappointing; it kind of defeats the purpose of being able to
leverage open source code for the good of all open source projects. It
isn't that big a deal; we can write our own logging package.
Dave
-----Original Message-----
From: Peter Eisentraut [mailto:peter_e(at)gmx(dot)net]
Sent: November 5, 2001 3:05 PM
To: Dave Cramer
Cc: 'Bruce Momjian'; pgsql-hackers(at)postgresql(dot)org
Subject: RE: [HACKERS] Licensing issues including another projects
source code into the jdbc driver
Dave Cramer writes:
> I presume you are referring to the 3rd clause? What is the issue with
> this clause?
It would require everyone that ships a product based on the JDBC driver
to mention this acknowledgement in advertisements, which is annoying and
impractical. More generally, it would introduce a divergence in licensing
in PostgreSQL, which should be avoided.
--
Peter Eisentraut peter_e(at)gmx(dot)net http://funkturm.homeip.net/~peter
From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | Dave Cramer <dave(at)fastcrypt(dot)com> |
Cc: | "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Licensing issues including another projects source |
Date: | 2001-11-05 20:04:46 |
Message-ID: | Pine.LNX.4.30.0111051714510.617-100000@peter.localdomain |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Dave Cramer writes:
> I presume you are referring to the 3rd clause? What is the issue with
> this clause?
It would require everyone that ships a product based on the JDBC driver to
mention this acknowledgement in advertisements, which is annoying and
impractical. More generally, it would introduce a divergence in licensing
in PostgreSQL, which should be avoided.
--
Peter Eisentraut peter_e(at)gmx(dot)net http://funkturm.homeip.net/~peter