Re: [DOCS] Please Help: PostgreSQL Query Optimizer

From: "Anjan Kumar(dot) A(dot)" <anjankumar(at)cse(dot)iitb(dot)ac(dot)in>
To: pgsql-hackers(at)postgresql(dot)org, pgsql-chat(at)postgresql(dot)org, pgsql-benchmarks(at)postgresql(dot)org, pgsql-docs(at)postgresql(dot)org
Cc: Amirishetty Anjan Kumar <anjankumar(at)cse(dot)iitb(dot)ac(dot)in>
Subject: Please Help: PostgreSQL Query Optimizer
Date: 2005-12-11 10:45:26
Message-ID: Pine.LNX.4.61.0512111610320.4525@nsl-33.cse.iitb.ac.in
Lists: pgsql-benchmarks pgsql-chat pgsql-docs pgsql-hackers

I'm working on a project whose implementation deals with PostgreSQL. A brief description of the project is given below.

Project Description:
--------------------
In a Main Memory Database (MMDB), the entire on-disk database is loaded into main memory during initial startup of the system. Thereafter, all references are made to the database in main memory. When the system is about to shut down, we write the in-memory database back to disk. For the sake of recovery, we write log records to disk during transaction execution.

We want to implement an MMDB by modifying PostgreSQL. We implemented our own Main Memory File System to store the primary copy of the database in main memory, and modified PostgreSQL to access the data in the Main Memory File System.

Now, in our implementation, disk access is completely avoided during normal transaction execution. So, we need to modify the query optimizer of PostgreSQL so that it won't consider disk-related costs when calculating query costs. The query optimizer should try to minimize the processing cost. The criteria for cost can be taken as the number of tuples that have to be read from or written to main memory, the number of comparisons, etc.

Can anyone tell me what modifications need to be incorporated into PostgreSQL so that it considers only processing costs during optimization of a query?

In PostgreSQL, path costs are measured in units of disk accesses. One sequential page fetch has cost 1. I think the following parameters are used in PostgreSQL when calculating the cost of a query path:

#random_page_cost = 4 # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01 # (same)
#cpu_index_tuple_cost = 0.001 # (same)
#cpu_operator_cost = 0.0025 # (same)
#effective_cache_size = 1000 # typically 8KB each

In our case we are reading pages from the Main Memory File System, not from disk. Will it be sufficient if we change the default values of the above parameters in src/include/optimizer/cost.h and src/backend/utils/misc/postgresql.conf.sample as follows:

random_page_cost = 4;
cpu_tuple_cost = 2;
cpu_index_tuple_cost = 0.2;
cpu_operator_cost = 0.05;
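
Before hardcoding anything in cost.h, a minimal sketch of how these values can be tried out per-session in psql (the SET names are the real GUCs; the table in the EXPLAIN is only a hypothetical placeholder):

SET random_page_cost = 4;
SET cpu_tuple_cost = 2;
SET cpu_index_tuple_cost = 0.2;
SET cpu_operator_cost = 0.05;
-- then compare the plan chosen under these settings with the default plan:
EXPLAIN SELECT * FROM some_table WHERE some_column < 50;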

Please help us in this regard. I request all of you to give comments/suggestions on this. We await your kind help.

--
Thanks.

Anjan Kumar A.
MTech2, Comp Sci.,
www.cse.iitb.ac.in/~anjankumar
______________________________________________________________
May's Law:
The quality of correlation is inversely proportional to the density
of control. (The fewer the data points, the smoother the curves.)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Anjan Kumar(dot) A(dot)" <anjankumar(at)cse(dot)iitb(dot)ac(dot)in>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [DOCS] Please Help: PostgreSQL Query Optimizer
Date: 2005-12-11 17:19:13
Message-ID: 17378.1134321553@sss.pgh.pa.us
Lists: pgsql-chat pgsql-docs pgsql-hackers

[ trimming cc list to something sane ]

"Anjan Kumar. A." <anjankumar(at)cse(dot)iitb(dot)ac(dot)in> writes:
> In a Main Memory Database (MMDB), the entire on-disk database is loaded into main memory during initial startup of the system. Thereafter, all references are made to the database in main memory. When the system is about to shut down, we write the in-memory database back to disk. For the sake of recovery, we write log records to disk during transaction execution.

Don't you get 99.9% of this for free with Postgres' normal behavior?
Just increase shared_buffers.
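
As a sketch of what that means in practice (the unit and default below are assumptions about an 8.x-era build, where shared_buffers is counted in 8KB buffers):

SHOW shared_buffers;         -- default is 1000, i.e. 1000 * 8KB = 8MB
-- in postgresql.conf, something like the following would let the buffer
-- cache hold a few GB of the database, assuming the machine has the RAM:
-- shared_buffers = 262144   -- 262144 * 8KB = 2GB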

> Can anyone tell me what modifications need to be incorporated into PostgreSQL so that it considers only processing costs during optimization of a query?

Assuming that a page fetch costs zero is wrong even in an all-in-memory
environment. So I don't see any reason you can't maintain the
convention that a page fetch costs 1.0 unit, and just adjust the other
cost parameters in the light of a different idea about what that
actually means.

> Will it be sufficient if we change the default values of the above parameters in src/include/optimizer/cost.h and src/backend/utils/misc/postgresql.conf.sample as follows:

> random_page_cost = 4;
> cpu_tuple_cost = 2;
> cpu_index_tuple_cost = 0.2;
> cpu_operator_cost = 0.05;

You'd want random_page_cost = 1 since there is presumably no penalty for
random access in this context. Also, I think you'd want
cpu_operator_cost a lot higher than that (maybe you dropped a decimal
place? You scaled the others up by 200 but this one only by 20).
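
Spelling out that arithmetic (a quick check runnable in psql, dividing the proposed values by the stock defaults listed upthread):

SELECT 2 / 0.01       AS tuple_scale,     -- 200
       0.2 / 0.001    AS index_scale,     -- 200
       0.05 / 0.0025  AS operator_scale;  -- 20; a 200x scaling would be 0.5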

It's entirely possible that the ratios of the cpu_xxx_cost values
aren't very good and will need work. In the past we've never had
occasion to study them very carefully, since they were only marginal
contributions anyway.

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Anjan Kumar(dot) A(dot)" <anjankumar(at)cse(dot)iitb(dot)ac(dot)in>, pgsql-chat(at)postgresql(dot)org, pgsql-benchmarks(at)postgresql(dot)org, pgsql-docs(at)postgresql(dot)org
Subject: Re: [HACKERS] Please Help: PostgreSQL Query Optimizer
Date: 2005-12-11 19:26:18
Message-ID: 200512111126.18566.josh@agliodbs.com
Lists: pgsql-benchmarks pgsql-chat pgsql-docs pgsql-hackers

Anjan,

> In our case we are reading pages from the Main Memory File System, not from
> disk. Will it be sufficient if we change the default values of the above
> parameters in src/include/optimizer/cost.h and
> src/backend/utils/misc/postgresql.conf.sample as follows:
>
> random_page_cost = 4;

This should be dramatically lowered. It's supposed to represent the ratio of
seek-fetches to seq scans on disk. Since there's no disk, it should be a
flat 1.0. However, we are aware that there are flaws in our calculations
involving random_page_cost, such that the actual number for a system where
there is no disk cost would be lower than 1.0. Your research will hopefully
help us find these flaws.

> cpu_tuple_cost = 2;
> cpu_index_tuple_cost = 0.2;
> cpu_operator_cost = 0.05;

I don't see why you're increasing the various cpu_* costs. CPU costs would be
unaffected by the database being in memory. In general, I lower these by a
divisor based on the cpu speed; for example, on a dual-opteron system I lower
the defaults by /6. However, that's completely unrelated to using an MMDB.
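
For illustration only, that divisor applied to the stock defaults (hypothetical numbers for a fast-CPU box, not a recommendation for the MMDB case):

SELECT 0.01 / 6    AS cpu_tuple_cost,        -- ~0.0017
       0.001 / 6   AS cpu_index_tuple_cost,  -- ~0.00017
       0.0025 / 6  AS cpu_operator_cost;     -- ~0.0004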

So, other than random_page_cost, I don't know of other existing GUCs that
would be directly related to using a disk/not using a disk. How are you
handling shared memory and work memory?

I look forward to hearing more about your test!

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, "Anjan Kumar(dot) A(dot)" <anjankumar(at)cse(dot)iitb(dot)ac(dot)in>, pgsql-chat(at)postgresql(dot)org, pgsql-benchmarks(at)postgresql(dot)org, pgsql-docs(at)postgresql(dot)org
Subject: Re: [DOCS] [HACKERS] Please Help: PostgreSQL Query Optimizer
Date: 2005-12-11 20:41:36
Message-ID: 18706.1134333696@sss.pgh.pa.us
Lists: pgsql-benchmarks pgsql-chat pgsql-docs pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> I don't see why you're increasing the various cpu_* costs.

You missed the point Josh --- these numbers are relative to the cost of
a page fetch, so if page fetch is measured in microseconds instead of
milliseconds, then you *do* want to bump the CPU costs up.

regards, tom lane


From: "Anjan Kumar(dot) A(dot)" <anjankumar(at)cse(dot)iitb(dot)ac(dot)in>
To: pgsql-hackers(at)postgresql(dot)org
Cc: pgsql-chat(at)postgresql(dot)org, pgsql-benchmarks(at)postgresql(dot)org, pgsql-docs(at)postgresql(dot)org
Subject: Re: [HACKERS] Please Help: PostgreSQL Query Optimizer
Date: 2005-12-11 22:06:01
Message-ID: Pine.LNX.4.61.0512120139080.6920@nsl-33.cse.iitb.ac.in
Lists: pgsql-benchmarks pgsql-chat pgsql-docs pgsql-hackers

Since sequential access is not significantly faster than random access in an MMDB, random_page_cost will be approximately the same as the sequential page fetch cost.

As everything is present in main memory, we need to give approximately the same cost to main-memory reads/writes and to CPU-related operations.

But in PostgreSQL all costs are scaled relative to a page fetch. If we set both the sequential page fetch cost and random_page_cost to 1, then we need to increase the various cpu_* parameters by multiplying their default values by an appropriate scaling factor. Now, we need to determine this scaling factor.
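
In other words, the relation we are assuming is:

scaled cpu_x_cost = default cpu_x_cost * S,
where S = (time for a page fetch from disk) / (time for a page fetch from main memory)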

Still, I want to confirm whether this approach is the correct one.

On Sun, 11 Dec 2005, Josh Berkus wrote:

> Anjan,
>
>> In our case we are reading pages from the Main Memory File System, not from
>> disk. Will it be sufficient if we change the default values of the above
>> parameters in src/include/optimizer/cost.h and
>> src/backend/utils/misc/postgresql.conf.sample as follows:
>>
>> random_page_cost = 4;
>
> This should be dramatically lowered. It's supposed to represent the ratio of
> seek-fetches to seq scans on disk. Since there's no disk, it should be a
> flat 1.0. However, we are aware that there are flaws in our calculations
> involving random_page_cost, such that the actual number for a system where
> there is no disk cost would be lower than 1.0. Your research will hopefully
> help us find these flaws.
>
>> cpu_tuple_cost = 2;
>> cpu_index_tuple_cost = 0.2;
>> cpu_operator_cost = 0.05;
>
> I don't see why you're increasing the various cpu_* costs. CPU costs would be
> unaffected by the database being in memory. In general, I lower these by a
> divisor based on the cpu speed; for example, on a dual-opteron system I lower
> the defaults by /6. However, that's completely unrelated to using an MMDB.
>
> So, other than random_page_cost, I don't know of other existing GUCs that
> would be directly related to using a disk/not using a disk. How are you
> handling shared memory and work memory?
>
> I look forward to hearing more about your test!
>
>

--
Regards.

Anjan Kumar A.
MTech2, Comp Sci.,
www.cse.iitb.ac.in/~anjankumar
______________________________________________________________
Do not handicap your children by making their lives easy.
-- Robert Heinlein


From: Carlos Moreno <moreno(at)mochima(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Please Help: PostgreSQL Query Optimizer
Date: 2005-12-11 22:55:09
Message-ID: 439CAE4D.2030208@mochima.com
Lists: pgsql-benchmarks pgsql-chat pgsql-docs pgsql-hackers


Hi,

I'm very new to this list -- I've been using and advocating PostgreSQL for
no less than 4 or 5 years now, and have participated in some of the other
mailing lists, but never on this one.

My question is (short version): how would one go about adding a new
(built-in) function to PostgreSQL?

Long-ish version:

I know the answer "in theory" -- one goes through the source code, finds
out how it all works, and modifies/adds the code to add or fix whatever
feature we want.

I guess my point here is really a "feature request" -- except
that I'd find it pretty exciting to implement it myself, and then propose
the new feature by volunteering the implementation that I already wrote
(seems like the spirit of open-source communities, right?) -- then of
course, it would be subject to consensus, whether or not the feature
makes sense and the implementation is good enough.

I'm interested in adding additional hash functions -- PG supports, as part
of the built-in SQL functions, MD5 hashing. So, for instance, I can simply
type, at a psql console, the following:

select md5('abc');

My "feature request" (which again, I'd like to implement it myself) would
be the ability to do:

select sha1('xyz'), sha256('etc');

(At least these two -- maybe for completeness it would be good to have
sha224, 384, and 512, but I guess SHA1 and SHA-256 would be a very good
and sound starting point)

So, can you offer some advice or pointers on how to go about that?

I started by doing a search for the string md5 through all the source
code -- the problem is, md5 shows up in many, many places (it is
part of the authentication protocol, among other things), so I got a
little bit lost searching through it all.

I wonder if you have some documents specifically aimed at providing
advice and documentation for prospective developers (or for people
that want to "tweak" the source code to fix/tuneup or add functionality),
I guess that would be great for me in this case.

Thanks!

Carlos
--


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Carlos Moreno <moreno(at)mochima(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Adding functions to postgresql (Not - )e: Please Help: PostgreSQL Query Optimizer
Date: 2005-12-11 23:45:26
Message-ID: 1134344727.3567.53.camel@localhost.localdomain
Lists: pgsql-benchmarks pgsql-chat pgsql-docs pgsql-hackers

On one fine day, Sun, 2005-12-11 at 17:55, Carlos Moreno wrote:
> Hi,
>
> I'm very new to this list -- I've been using and advocating PostgreSQL for
> no less than 4 or 5 years now, and have participated in some of the other
> mailing lists, but never on this one.
>
> My question is (short version): how would one go about adding a new
> (built-in) function to PostgreSQL?

Ask your question as a separate post, not as an answer to another
thread :)

> I'm interested in adding additional hash functions -- PG supports, as part
> of the built-in SQL functions, MD5 hashing. So, for instance, I can simply
> type, at a psql console, the following:
>
> select md5('abc');
>
> My "feature request" (which again, I'd like to implement it myself) would
> be the ability to do:
>
> select sha1('xyz'), sha256('etc');
>
> (At least these two -- maybe for completeness it would be good to have
> sha224, 384, and 512, but I guess SHA1 and SHA-256 would be a very good
> and sound starting point)

Take a look at the pgcrypto contrib module (in src/contrib/pgcrypto )

Not only does it show how to add functions, but it also provides many of
the ones you need.
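
For instance, a minimal sketch assuming the pgcrypto SQL script has been loaded into the database (digest() returns a bytea, hence the encode(); sha256 availability can depend on how pgcrypto was built):

select encode(digest('xyz', 'sha1'), 'hex');
select encode(digest('etc', 'sha256'), 'hex');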

----------------
Hannu


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Carlos Moreno <moreno(at)mochima(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Please Help: PostgreSQL Query Optimizer
Date: 2005-12-11 23:52:33
Message-ID: 439CBBC1.2030700@dunslane.net
Lists: pgsql-benchmarks pgsql-chat pgsql-docs pgsql-hackers


Actually, there is probably comparatively little to gain from making it
a builtin. And SHA1 is already there in the pgcrypto contrib module.
Presumably if we wanted a builtin we would start from that code base.

cheers

andrew

Carlos Moreno wrote:

>
> Hi,
>
> I'm very new to this list -- I've been using and advocating PostgreSQL
> for
> no less than 4 or 5 years now, and have participated in some of the other
> mailing lists, but never on this one.
>
> My question is (short version): how would one go about adding a new
> (built-in) function to PostgreSQL?
>
> Long-ish version:
>
> I know the answer "in theory" -- one goes through the source code, finds
> out how it all works, and modifies/adds the code to add or fix whatever
> feature we want.
>
> I guess my point here is really a "feature request" -- except
> that I'd find it pretty exciting to implement it myself, and then propose
> the new feature by volunteering the implementation that I already wrote
> (seems like the spirit of open-source communities, right?) -- then of
> course, it would be subject to consensus, whether or not the feature
> makes sense and the implementation is good enough.
>
> I'm interested in adding additional hash functions -- PG supports, as
> part
> of the built-in SQL functions, MD5 hashing. So, for instance, I can
> simply
> type, at a psql console, the following:
>
> select md5('abc');
>
> My "feature request" (which again, I'd like to implement it myself) would
> be the ability to do:
>
> select sha1('xyz'), sha256('etc');
>
> (At least these two -- maybe for completeness it would be good to have
> sha224, 384, and 512, but I guess SHA1 and SHA-256 would be a very good
> and sound starting point)
>
> So, can you offer some advice or pointers on how to go about that?
>
> I started by doing a search for the string md5 through all the source
> code -- the problem is, md5 shows up in many many many places (it is
> part of the authentication protocol, among other things), so I got a
> little bit lost searching through it all.
>
> I wonder if you have some documents specifically aimed at providing
> advice and documentation for prospective developers (or for people
> that want to "tweak" the source code to fix/tuneup or add functionality),
> I guess that would be great for me in this case.
>
>


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Hannu Krosing <hannu(at)skype(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Adding functions to postgresql (Not - )e: Please Help:
Date: 2005-12-12 00:02:08
Message-ID: 439CBE00.4000002@dunslane.net
Lists: pgsql-benchmarks pgsql-chat pgsql-docs pgsql-hackers

Hannu Krosing wrote:

>Ask your question as a separate post, not as an answer to another
>thread :)
>
>
>

Also, if you post to a mailing list, you should have the courtesy to
arrange it so your spam filter does not reject replies.

cheers

andrew


From: Carlos Moreno <moreno(at)mochima(dot)com>
To: Hannu Krosing <hannu(at)skype(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Adding functions to postgresql (Not - )e: Please Help:
Date: 2005-12-12 02:36:29
Message-ID: 439CE22D.70609@mochima.com
Lists: pgsql-benchmarks pgsql-chat pgsql-docs pgsql-hackers

Hannu Krosing wrote:

>On one fine day, Sun, 2005-12-11 at 17:55, Carlos Moreno wrote:
>
>>Hi,
>>
>>I'm very new to this list -- I've been using and advocating PostgreSQL for
>>no less than 4 or 5 years now, and have participated in some of the other
>>mailing lists, but never on this one.
>>
>>My question is (short version): how would one go about adding a new
>>(built-in) function to PostgreSQL?
>>
>
>Ask your question as a separate post, not as an answer to another
>thread :)
>

I know... Sorry -- My brain must have temporarily shut down when I was
posting. Wrong subject, wrong link to the rest of the messages... :-(

>>My "feature request" (which again, I'd like to implement it myself) would
>>be the ability to do:
>>
>>select sha1('xyz'), sha256('etc');
>>
>>(At least these two -- maybe for completeness it would be good to have
>>sha224, 384, and 512, but I guess SHA1 and SHA-256 would be a very good
>>and sound starting point)
>>
>
>Take a look at the pgcrypto contrib module (in src/contrib/pgcrypto )
>
>Not only does it show how to add functions, but it also provides many of
>the ones you need.
>

Ok. Will do.

Thanks for the pointer!

Carlos
--


From: Tino Wildenhain <tino(at)wildenhain(dot)de>
To: Carlos Moreno <moreno(at)mochima(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Please Help: PostgreSQL Query Optimizer
Date: 2005-12-12 06:35:35
Message-ID: 1134369335.4975.74.camel@Andrea.peacock.de
Lists: pgsql-benchmarks pgsql-chat pgsql-docs pgsql-hackers

On Sunday, 2005-12-11, at 17:55 -0500, Carlos Moreno wrote:
...
> I'm interested in adding additional hash functions -- PG supports, as part
> of the built-in SQL functions, MD5 hashing. So, for instance, I can simply
> type, at a psql console, the following:
>
> select md5('abc');
>
> My "feature request" (which again, I'd like to implement it myself) would
> be the ability to do:
>
> select sha1('xyz'), sha256('etc');
>
> (At least these two -- maybe for completeness it would be good to have
> sha224, 384, and 512, but I guess SHA1 and SHA-256 would be a very good
> and sound starting point)
>
> So, can you offer some advice or pointers on how to go about that?

You might want to check out contrib/pgcrypto

More often than not, if you want something, it's
already done ;)

Not sure if this will ever be included in the core,
since not many people need these advanced hash functions.

HTH
Tino Wildenhain


From: "Anjan Kumar(dot) A(dot)" <anjankumar(at)cse(dot)iitb(dot)ac(dot)in>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Please Help: PostgreSQL Query Optimizer
Date: 2005-12-12 13:09:42
Message-ID: Pine.LNX.4.61.0512121833490.11179@nsl-33.cse.iitb.ac.in
Lists: pgsql-benchmarks pgsql-chat pgsql-docs pgsql-hackers

Default values of various parameters in PostgreSQL:

#random_page_cost = 4 # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01 # (same)
#cpu_index_tuple_cost = 0.001 # (same)
#cpu_operator_cost = 0.0025 # (same)
#effective_cache_size = 1000 # typically 8KB each

Since sequential access is not significantly faster than random access in an MMDB, random_page_cost will be approximately the same as the sequential page fetch cost.

If we set both the sequential page fetch cost and random_page_cost to 1, then we need to increase the various cpu_* parameters by multiplying their default values by an appropriate scaling factor. Now, we need to determine this scaling factor.

Through googling, I found that a normal disk has an external data transfer rate of around 40 MBps,
whereas main memory has a data transfer rate ranging from 1.6 GBps to 2.8 GBps.
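
As a quick sanity check on that ratio (both rates expressed in MBps, runnable in psql):

SELECT 1600 / 40.0 AS low_ratio,   -- 40
       2800 / 40.0 AS high_ratio;  -- 70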

As we can see, the ratio between disk and main-memory data transfer rates is around 50. Then, if we multiply all cpu_* parameters by 50, the resulting values will be:

random_page_cost = 1;
cpu_tuple_cost = 0.5;
cpu_index_tuple_cost = 0.05;
cpu_operator_cost = 0.0125;

Would it be a suitable approach? We request all of you to give comments/suggestions on these calculations. Thank you.

On Sun, 11 Dec 2005, Tom Lane wrote:

> [ trimming cc list to something sane ]
>
> "Anjan Kumar. A." <anjankumar(at)cse(dot)iitb(dot)ac(dot)in> writes:
>> In a Main Memory Database (MMDB), the entire on-disk database is loaded into main memory during initial startup of the system. Thereafter, all references are made to the database in main memory. When the system is about to shut down, we write the in-memory database back to disk. For the sake of recovery, we write log records to disk during transaction execution.
>
> Don't you get 99.9% of this for free with Postgres' normal behavior?
> Just increase shared_buffers.
>
>> Can anyone tell me what modifications need to be incorporated into PostgreSQL so that it considers only processing costs during optimization of a query?
>
> Assuming that a page fetch costs zero is wrong even in an all-in-memory
> environment. So I don't see any reason you can't maintain the
> convention that a page fetch costs 1.0 unit, and just adjust the other
> cost parameters in the light of a different idea about what that
> actually means.
>
>> Will it be sufficient if we change the default values of the above parameters in src/include/optimizer/cost.h and src/backend/utils/misc/postgresql.conf.sample as follows:
>
>> random_page_cost = 4;
>> cpu_tuple_cost = 2;
>> cpu_index_tuple_cost = 0.2;
>> cpu_operator_cost = 0.05;
>
> You'd want random_page_cost = 1 since there is presumably no penalty for
> random access in this context. Also, I think you'd want
> cpu_operator_cost a lot higher than that (maybe you dropped a decimal
> place? You scaled the others up by 200 but this one only by 20).
>
> It's entirely possible that the ratios of the cpu_xxx_cost values
> aren't very good and will need work. In the past we've never had
> occasion to study them very carefully, since they were only marginal
> contributions anyway.
>
> regards, tom lane
>

--
Regards.

Anjan Kumar A.
MTech2, Comp Sci.,
www.cse.iitb.ac.in/~anjankumar
______________________________________________________________
A woman physician has made the statement that smoking is neither
physically defective nor morally degrading, and that nicotine, even
when indulged in to excess, is less harmful than excessive petting.
-- Purdue Exponent, Jan 16, 1925


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: "Anjan Kumar(dot) A(dot)" <anjankumar(at)cse(dot)iitb(dot)ac(dot)in>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Please Help: PostgreSQL Query Optimizer
Date: 2005-12-12 20:40:03
Message-ID: 20051212204003.GY54639@pervasive.com
Lists: pgsql-benchmarks pgsql-chat pgsql-docs pgsql-hackers

On Mon, Dec 12, 2005 at 06:39:42PM +0530, Anjan Kumar. A. wrote:
> Through googling, I found that a normal disk has an external data transfer
> rate of around 40 MBps,
> whereas main memory has a data transfer rate ranging from 1.6 GBps to 2.8 GBps.

I think 40MB/s is a burst speed. You should do some testing to verify.

In any case, PostgreSQL doesn't come close to the theoretical maximum
disk bandwidth even on a sequential scan. There's been discussion about
this on various lists in the past. For a single drive, expect something
more in the range of 4-6MB/s (depending on the drive).
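
For comparison, redoing the upthread transfer-rate ratio with a mid-range figure from that estimate (arithmetic only; 5 MBps is an assumed midpoint of the 4-6MB/s range, and the memory rates are the ones quoted earlier):

SELECT 1600 / 5.0 AS low_ratio,   -- 320
       2800 / 5.0 AS high_ratio;  -- 560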

More important than throughput, though, is latency. Because the latency
of memory is much closer to 0 (it's not truly 0, due to L1/L2 caching),
you can serve concurrent requests a lot faster.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Anjan Kumar(dot) A(dot)" <anjankumar(at)cse(dot)iitb(dot)ac(dot)in>, pgsql-chat(at)postgresql(dot)org, pgsql-benchmarks(at)postgresql(dot)org, pgsql-docs(at)postgresql(dot)org
Subject: Re: [HACKERS] Please Help: PostgreSQL Query Optimizer
Date: 2005-12-13 06:47:21
Message-ID: 200512122247.21560.josh@agliodbs.com
Lists: pgsql-benchmarks pgsql-chat pgsql-docs pgsql-hackers

Anjan,

> But in PostgreSQL all costs are scaled relative to a page fetch. If we set
> both the sequential page fetch cost and random_page_cost to 1, then we need
> to increase the various cpu_* parameters by multiplying their default values
> by an appropriate scaling factor. Now, we need to determine this scaling
> factor.

I see, so you're saying that because the real cost of a page fetch has
decreased, the cpu_* costs should increase proportionally, since relative to
the cost of a page fetch they are now higher? That makes a sort of
sense.

The problem that you're going to run into is that currently we have no
particular reason to believe that the various cpu_* costs are more than
very approximately correct as rules of thumb. So I think you'd be a lot
better off trying to come up with some means of computing the real cpu costs
of each operation, rather than trying to calculate a multiple of numbers
which may be wrong in the first place.

I know that someone on this list was working on a tool to digest EXPLAIN
ANALYZE results and run statistics on them. Can't remember who, though.

Also, I'm still curious about how you're handling shared_mem, work_mem and
maintenance_mem. You didn't answer last time.

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: "Anjan Kumar(dot) A(dot)" <anjankumar(at)cse(dot)iitb(dot)ac(dot)in>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Please Help: PostgreSQL Query Optimizer
Date: 2006-01-12 23:05:45
Message-ID: Pine.LNX.4.61.0601130435010.24920@nsl-22.cse.iitb.ac.in
Lists: pgsql-benchmarks pgsql-chat pgsql-docs pgsql-hackers

> Through googling, I found that a normal disk has an external data transfer
> rate of around 40 MBps,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Does this include seek and rotational latency?

> whereas main memory has a data transfer rate ranging from 1.6 GBps to 2.8 GBps.

>
> As we can see, the ratio between disk and main-memory data transfer rates is
> around 50. Then, if we multiply all cpu_* parameters by 50, the resulting
> values will be:
>
> random_page_cost = 1;
> cpu_tuple_cost = 0.5;
> cpu_index_tuple_cost = 0.05;
> cpu_operator_cost = 0.0125;
>
>
> Would it be a suitable approach? We request all of you to give
> comments/suggestions on these calculations. Thank you.
>
> On Sun, 11 Dec 2005, Tom Lane wrote:
>
>> [ trimming cc list to something sane ]
>>
>> "Anjan Kumar. A." <anjankumar(at)cse(dot)iitb(dot)ac(dot)in> writes:
>>> In a Main Memory Database (MMDB), the entire on-disk database is loaded
>>> into main memory during initial startup of the system. Thereafter,
>>> all references are made to the database in main memory. When the
>>> system is about to shut down, we write the in-memory database back to
>>> disk. For the sake of recovery, we write log records to disk during
>>> transaction execution.
>>
>> Don't you get 99.9% of this for free with Postgres' normal behavior?
>> Just increase shared_buffers.
>>
>>> Can anyone tell me what modifications need to be incorporated into
>>> PostgreSQL so that it considers only processing costs during
>>> optimization of a query?
>>
>> Assuming that a page fetch costs zero is wrong even in an all-in-memory
>> environment. So I don't see any reason you can't maintain the
>> convention that a page fetch costs 1.0 unit, and just adjust the other
>> cost parameters in the light of a different idea about what that
>> actually means.
>>
>>> Will it be sufficient if we change the default values of the above
>>> parameters in src/include/optimizer/cost.h and
>>> src/backend/utils/misc/postgresql.conf.sample as follows:
>>
>>> random_page_cost = 4;
>>> cpu_tuple_cost = 2;
>>> cpu_index_tuple_cost = 0.2;
>>> cpu_operator_cost = 0.05;
>>
>> You'd want random_page_cost = 1 since there is presumably no penalty for
>> random access in this context. Also, I think you'd want
>> cpu_operator_cost a lot higher than that (maybe you dropped a decimal
>> place? You scaled the others up by 200 but this one only by 20).
>>
>> It's entirely possible that the ratios of the cpu_xxx_cost values
>> aren't very good and will need work. In the past we've never had
>> occasion to study them very carefully, since they were only marginal
>> contributions anyway.
>>
>> regards, tom lane
>>
>
>

--
Regards.

Anjan Kumar A.
MTech2, Comp Sci.,
www.cse.iitb.ac.in/~anjankumar
______________________________________________________________
Bradley's Bromide:
If computers get too powerful, we can organize
them into a committee -- that will do them in.


From: "Anjan Kumar(dot) A(dot)" <anjankumar(at)cse(dot)iitb(dot)ac(dot)in>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Problem with the Planner
Date: 2006-01-15 23:35:04
Message-ID: Pine.LNX.4.61.0601160500250.13979@nsl-33.cse.iitb.ac.in
Lists: pgsql-benchmarks pgsql-chat pgsql-docs pgsql-hackers

Please observe the following queries. Why is PostgreSQL favouring a merge join even though it leads to higher execution times than a nested-loop join? Any suggestions to fix this problem?

bench=# EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 50 AND t1.unique2 = t2.unique2;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=665.09..4704.60 rows=166701 width=488) (actual time=10.128..40.843 rows=50 loops=1)
Merge Cond: ("outer".unique2 = "inner".unique2)
-> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.00..1514.00 rows=10000 width=244) (actual time=0.031..20.520 rows=10000 loops=1)
-> Sort (cost=665.09..673.42 rows=3334 width=244) (actual time=9.601..9.646 rows=50 loops=1)
Sort Key: t1.unique2
-> Seq Scan on tenk1 t1 (cost=0.00..470.00 rows=3334 width=244) (actual time=0.154..9.140 rows=50 loops=1)
Filter: (unique1 < 50)
Total runtime: 41.101 ms
(8 rows)

bench=# SET enable_mergejoin = off;
SET
bench=# EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 50 AND t1.unique2 = t2.unique2;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Hash Join (cost=588.34..11841.35 rows=166701 width=488) (actual time=9.028..70.453 rows=50 loops=1)
Hash Cond: ("outer".unique2 = "inner".unique2)
-> Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=10000 width=244) (actual time=0.007..11.846 rows=10000 loops=1)
-> Hash (cost=470.00..470.00 rows=3334 width=244) (actual time=8.378..8.378 rows=0 loops=1)
-> Seq Scan on tenk1 t1 (cost=0.00..470.00 rows=3334 width=244) (actual time=0.135..8.093 rows=50 loops=1)
Filter: (unique1 < 50)
Total runtime: 70.659 ms
(7 rows)

bench=# SET enable_hashjoin = off;
SET
bench=# EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 50 AND t1.unique2 = t2.unique2;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..633218.15 rows=166701 width=488) (actual time=0.178..9.389 rows=50 loops=1)
-> Seq Scan on tenk1 t1 (cost=0.00..470.00 rows=3334 width=244) (actual time=0.135..8.349 rows=50 loops=1)
Filter: (unique1 < 50)
-> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.00..189.16 rows=50 width=244) (actual time=0.009..0.011 rows=1 loops=50)
Index Cond: ("outer".unique2 = t2.unique2)
Total runtime: 9.552 ms
(6 rows)

--
Regards.

Anjan Kumar A.
MTech2, Comp Sci.,
www.cse.iitb.ac.in/~anjankumar
______________________________________________________________


From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: "Anjan Kumar(dot) A(dot)" <anjankumar(at)cse(dot)iitb(dot)ac(dot)in>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Problem with the Planner
Date: 2006-01-16 00:20:16
Message-ID: Pine.LNX.4.58.0601161119230.30653@linuxworld.com.au
Lists: pgsql-benchmarks pgsql-chat pgsql-docs pgsql-hackers

On Mon, 16 Jan 2006, Anjan Kumar. A. wrote:

> Please observe the following queries. Why is PostgreSQL favouring a merge join even though it leads to higher execution times than a nested-loop join? Any suggestions to fix this problem?
>
>
> bench=# EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 50 AND t1.unique2 = t2.unique2;
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------
> Merge Join (cost=665.09..4704.60 rows=166701 width=488) (actual time=10.128..40.843 rows=50 loops=1)
> Merge Cond: ("outer".unique2 = "inner".unique2)
> -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.00..1514.00 rows=10000 width=244) (actual time=0.031..20.520 rows=10000 loops=1)
> -> Sort (cost=665.09..673.42 rows=3334 width=244) (actual time=9.601..9.646 rows=50 loops=1)
> Sort Key: t1.unique2
> -> Seq Scan on tenk1 t1 (cost=0.00..470.00 rows=3334 width=244) (actual time=0.154..9.140 rows=50 loops=1)
> Filter: (unique1 < 50)
> Total runtime: 41.101 ms
> (8 rows)

Your statistics are way off. The seqscan on tenk1 estimates 3334 rows but
gets only 50. Run ANALYZE and try again.
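
That is, something along these lines, then rerun the original query:

ANALYZE tenk1;
ANALYZE tenk2;
EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 50 AND t1.unique2 = t2.unique2;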

Thanks,

Gavin