From: "Tille, Andreas" <TilleA(at)rki(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Serious performance problem
Date: 2001-11-04 16:24:04
Message-ID: Pine.LNX.4.33.0111041723090.15449-100000@wr-linux02.rki.ivbb.bund.de
Lists: pgsql-hackers
On Fri, 2 Nov 2001, Zeugswetter Andreas SB SD wrote:
> This is not true, since the index scan also needs to read the leaf pages
> in MS SQL. The number of leaf pages grows linearly with the number of
> rows that qualify the WHERE restriction.
>
> R = number of rows that qualify
> --> O(R + log(R))
>
> The pg measurements showed that PostgreSQL query performance can be
> expected to stay nearly the same regardless of the number of rows in the
> table, as long as the number of rows that qualify the WHERE restriction
> stays constant. The response time is linear in the number of rows that
> qualify the WHERE restriction, but that linear behaviour is also expected
> with MS SQL.
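[To put rough, purely illustrative numbers on that O(R + log(R)) estimate:
with R = 10,000 qualifying rows and on the order of 100 index entries per
leaf page, the B-tree descent (the log term) costs only a handful of page
reads, while scanning the leaf entries costs roughly 100 pages - so the
linear term dominates in both systems.]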
Well, maybe you are right here, but I talked once more with my colleague
about the specifications. We can be sure that the data input is about 1 GB
per year, because what has to be stored is fixed by the German law on
infectious diseases. We have no online shop system or anything like that.
<sarcastic>If the recent anthrax problem grew exponentially we could be in
trouble, but chances are low.</sarcastic> So we can estimate the amount of
data quite well: it is linear growth of about 1 GB per year. If MS SQL
Server is fast enough now, we can keep up with normal hardware performance
increases over the years. This is a fact I have to accept.
An additional constraint is that the underlying data model is also used by
an Access application run by about 170 clients, each holding about 100 - 500
data sets which they export once a week to our central server. The
developers tried hard to keep the Access application and the MS SQL Server
solution in sync, and maintaining a third application (by rewriting some
500 queries) would be a lot of work. (I'm not afraid of this work, but I
must be sure it would make sense before I start, so I hope for advice from
people who have perhaps done this.)
I discussed the issue of using statistics tables to speed up certain
queries. He told me that this technique is known as OLAP cubes in
MS SQL Server and that there are tools to build such things. Is this
a valid comparison? He did not use it because it would break the
Access solution used by our clients. Are there any tools for PostgreSQL
for this kind of thing besides manually creating the tables and triggers?
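To sketch what such a statistics table buys (the table and column names
below are hypothetical, only to illustrate the idea): instead of
aggregating over the full case table for every report,

    SELECT diagnosis, count(*) AS n_cases
      FROM cases
     GROUP BY diagnosis;

the report would read a small, pre-aggregated table that triggers keep
current:

    SELECT diagnosis, n_cases
      FROM case_counts;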
Currently I see two ways to solve my problem:
1. Hope that 'index coverage' is implemented (perhaps by a patch ...
   somebody asked about it but got no response) in 7.2 or at least 7.3;
   see the sketch after this list for the kind of query I mean.
   In the meantime I would do my best with the statistics tables, but I
   would not want to have to rework all of that if at some stage our
   data model changes.
2. Give MySQL a trial, because I expect it to solve my problem in the
   fashion I need. (Well - read-only access is OK for us, and without
   features like MVCC the index scans are perhaps faster.) I would
   definitely come back to PostgreSQL once 'index coverage' or some
   other method to speed up index searches is implemented.
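A sketch of the kind of query where 'index coverage' would help (table,
column, and index names are hypothetical):

    CREATE INDEX cases_diag_date_idx ON cases (diagnosis, report_date);

    SELECT count(*)
      FROM cases
     WHERE diagnosis = 'influenza'
       AND report_date >= '2001-01-01';

    -- With index coverage the count could be answered from the index
    -- alone; currently PostgreSQL still has to visit every matching heap
    -- row to check its MVCC visibility, which is where the extra I/O
    -- comes from.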
Could somebody give me advice on what the best strategy would be? (Perhaps
I should switch back to pgsql-general for this question, but I definitely
want to hear a statement from the hackers about future implementation
plans!)
By the way, in my former postings I forgot to mention a further problem
that stayed unanswered in my questions on pgsql-general: watching "top"
while a query runs (over some 30 seconds), the memory used by PostgreSQL
increases heavily during query execution. I wonder if it could help if
there were some mechanism to keep part of the database resident in memory.
I know that memory handling on Linux/UNIX is different from Windows (and
this is a great feature ;-) ), but if I have plenty of free memory (2 GB)
and my box was never swapping, I wonder whether it shouldn't be possible
to hold some data in memory instead of simply relying on the disk cache
of the OS. Any opinions?
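For reference, the amount of data PostgreSQL itself keeps resident is set
by shared_buffers in postgresql.conf; a sketch with a purely illustrative
value (not a recommendation, and the kernel's SHMMAX limit usually has to
be raised to match):

    # postgresql.conf -- illustrative value only
    shared_buffers = 65536    # 8 kB buffers, i.e. about 512 MB kept in
                              # PostgreSQL's own shared buffer cache

Everything not held there is still served from the OS disk cache, so the
two caches complement each other rather than compete.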
Kind regards
Andreas.

From: Horst Herb <horst(at)hherb(dot)com>
To: "Tille, Andreas" <TilleA(at)rki(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Serious performance problem
Date: 2001-11-05 15:05:42
Message-ID: 20011105150108.8399.qmail@gnumed.dhs.org
Lists: pgsql-hackers
On Monday 05 November 2001 03:24, Tille, Andreas wrote:
> I discussed the issue of using statistics tables to speed up certain
> queries. He told me that this technique is known as OLAP cubes in
> MS SQL Server and that there are tools to build such things. Is this
> a valid comparison? He did not use it because it would break the
> Access solution used by our clients. Are there any tools for PostgreSQL
> for this kind of thing besides manually creating the tables and triggers?
I still don't understand your colleague. Knowing that the table (and with
it the performance demands) will grow, it is quite stubborn and certainly
not elegant to insist on the blunt query instead of a smart solution. The
smart solution as outlined always returns results instantly and needs next
to no memory or other resources compared to the blunt query, regardless of
the growth of your database. It would only impact the growth *rate* because
of the fired triggers, but then, your application does not seem to have a
heavy insert load anyway, and you could always queue the inserts in
middleware since you have no realtime demands.

Btw, what is wrong with creating a few tables and a few trigger functions
"manually"? Writing, testing, and debugging them should not cost more than
a couple of days. Why would I want a tool for that? I might spend a couple
of hours writing a Python script if I needed similar triggers for many
tables over and over again, but your problem does not seem to call for that.
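A minimal sketch of what such a "manual" solution could look like, assuming
a hypothetical cases table with a diagnosis column (only INSERTs are
handled, matching the weekly export workflow described above; on 7.x the
function would have to be declared RETURNS opaque instead of RETURNS
trigger):

    CREATE TABLE case_counts (
        diagnosis  text PRIMARY KEY,
        n_cases    integer NOT NULL DEFAULT 0
    );

    CREATE FUNCTION bump_case_count() RETURNS trigger AS '
    BEGIN
        UPDATE case_counts SET n_cases = n_cases + 1
         WHERE diagnosis = NEW.diagnosis;
        IF NOT FOUND THEN
            -- first case with this diagnosis; fine for a weekly batch
            -- import, concurrent inserters would have to handle a
            -- possible duplicate-key error here
            INSERT INTO case_counts (diagnosis, n_cases)
            VALUES (NEW.diagnosis, 1);
        END IF;
        RETURN NEW;
    END;
    ' LANGUAGE 'plpgsql';

    CREATE TRIGGER cases_count_trg
        AFTER INSERT ON cases
        FOR EACH ROW EXECUTE PROCEDURE bump_case_count();

Reports then read case_counts instead of aggregating over cases.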
Horst