Quick Links

Re: Index usage vs large repetitions of key

Lists:	pgsql-general

From:	Francisco Reyes <lists(at)natserv(dot)com>
To:	felix(at)crowfix(dot)com
Cc:	pgsql General List <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Index usage vs large repetitions of key
Date:	2002-05-07 13:48:13
Message-ID:	20020507092310.U3527-100000@zoraida.natserv.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Sun, 5 May 2002 felix(at)crowfix(dot)com wrote:

> This sequential scan bugged the heck out of me, until I finally
> understood what is going on. Usually records on disk are scattered
> all over, so many per data page. I think the standard is 8K. Suppose
> your table has 10 records per data page. Roughly 1/20 of the records
> will be selected, so it is going to hit about half the data pages. In
> this case, it is better to simply march through all data pages
> sequentially than to read half the data pages randomly, and also hit a
> lot of index pages, also randomly scattered around the disk. It's
> better for the OS to read twice as much sequentially compared to half
> as much randomly. Throw in more random reading for index pages, and
> the sequential scan is a big win.

Thanks for the explanation. So I guess it is a factor of how big each row
is, the percentage of records to be selected and the page size.
The particular queries I am doing would return about %10 of the records
and using 8K pages there would be about 200 rows per page. Now I better
understand why the index is not been used.

> I think there is some way to force an indexed read, but I have
> forgotten what little I knew about that. If there is, you could try
> both ways and compare timings.

Based on this info it may make sense to let it do the sequential scan.

In the coming months the table in question is going to grow 3 to 4 times
it's number of records so at that point the index may make more sense.

Is there a drawback on having the index right now?
I guess it would make the optimizer's work more even though it would
likely not choose the index anyway.

From:	Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>
To:	"Francisco Reyes" <lists(at)natserv(dot)com>
Cc:	felix(at)crowfix(dot)com, pgsql-general(at)postgresql(dot)org
Subject:	Re: Index usage vs large repetitions of key
Date:	2002-05-07 17:29:46
Message-ID:	20020507132946.52fe57e9.nconway@klamath.dyndns.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Tue, 7 May 2002 09:48:13 -0400 (EDT)
"Francisco Reyes" <lists(at)natserv(dot)com> wrote:
> On Sun, 5 May 2002 felix(at)crowfix(dot)com wrote:
> > I think there is some way to force an indexed read, but I have
> > forgotten what little I knew about that. If there is, you could try
> > both ways and compare timings.
>
> Based on this info it may make sense to let it do the sequential scan.

You can easily test this hypothesis by disabling sequential scans (SET
enable_seqscan = off;), and using EXPLAIN ANALYZE to compare the performance
of the resulting query plan with the one chosen by the planner to
begin with.

> In the coming months the table in question is going to grow 3 to 4 times
> it's number of records so at that point the index may make more sense.
>
> Is there a drawback on having the index right now?

Yes; inserts and updates will need to update the index. Depending on
your queries, this can be a significant performance hit.

> I guess it would make the optimizer's work more even though it would
> likely not choose the index anyway.

My guess would be that this wouldn't be a very significant factor.

Cheers,

Neil

--
Neil Conway <neilconway(at)rogers(dot)com>
PGP Key ID: DB3C29FC

From:	Francisco Reyes <lists(at)natserv(dot)com>
To:	Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>
Cc:	Francisco Reyes <lists(at)natserv(dot)com>, <felix(at)crowfix(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Index usage vs large repetitions of key
Date:	2002-05-11 01:41:51
Message-ID:	20020510214022.U8234-100000@zoraida.natserv.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Tue, 7 May 2002, Neil Conway wrote:

> On Tue, 7 May 2002 09:48:13 -0400 (EDT)
> "Francisco Reyes" <lists(at)natserv(dot)com> wrote:
> > On Sun, 5 May 2002 felix(at)crowfix(dot)com wrote:
> > > I think there is some way to force an indexed read, but I have
> > > forgotten what little I knew about that. If there is, you could try
> > > both ways and compare timings.
> >
> > Based on this info it may make sense to let it do the sequential scan.
>
> You can easily test this hypothesis by disabling sequential scans (SET
> enable_seqscan = off;), and using EXPLAIN ANALYZE to compare the performance
> of the resulting query plan with the one chosen by the planner to
> begin with.

I tried to set enable_seqscan = off and it still did a sequential scan.

> > Is there a drawback on having the index right now?
>
> Yes; inserts and updates will need to update the index. Depending on
> your queries, this can be a significant performance hit.

This is a "reporting" server and I do a set of "copy" jobs once a day,
followed by a vacuum analyze.. and a nightly "vacuum full"