Re: Access statistics

Lists: Postg배트맨 토토SQL
From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: PostgreSQL PATCHES <pgsql-patches(at)postgreSQL(dot)org>
Subject: Access statistics
Date: 2001-06-02 16:05:22
Message-ID: 200106021605.f52G5M318206@jupiter.us.greatbridge.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Here it comes.

The views created for now are:

pg_stat_activity

Current connections to the entire installation including
actual querystrings.

pg_stat_database

Per database IO summary.

pg_stat_{all|sys|user}_tables

Per table scan statistics.

pg_statio_{all|sys|user}_tables

Per table buffer cache statistics.

pg_stat_{all|sys|user}_indexes

Per index scan statistics.

pg_statio_{all|sys|user}_indexes

Per index buffer cache statistics.

pg_statio_{all|sys|user}_indexes

Per sequence buffer cache statistics.

Of course, an initdb is required (this patch up to now
doesn't bump catversion).

The rules regression test is broken due to the new views from
initdb.

The existing functionality to reset stats isn't callable yet.
Should become a utility command someday.

Collecting must be configurable per database (catalog
change).

Due to all the changes that have happened (ever tried to
apply a 130K patch after a pgindent run?) I must verify all
this patch touches once more.

Have fun.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #

Attachment Content-Type Size
access_stats.diff application/diff 135.5 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: PostgreSQL PATCHES <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Access statistics
Date: 2001-06-04 17:58:59
Message-ID: 6521.991677539@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> Here it comes.

Hm. I really don't like the way you've uglified the heap_fetch
interface; can't that be done better? It's just plain bizarre that
heap_fetch is now responsible for counting stats for index scans.
And surely this bit of code is wrong:

*userbuf = buffer;
+
+ if (iscan != NULL)
+ pgstat_count_heap_fetch(&iscan->xs_pgstat_info);
+ else
+ pgstat_count_heap_fetch(&relation->pgstat_info);
}

You can't count heapscan info via the relcache entry --- what if there's
more than one heapscan in progress on the same rel? What if the
relcache entry gets rebuilt by SI invalidation? I don't think the stats
stuff should touch the relcache at all.

There's been talk in the past of trying to unify the heapscan and
indexscan APIs, so that we wouldn't need ugliness like the two sets of
coding in AttrDefaultFetch and similar places. Maybe it's time to bite
that bullet, if it'd provide a cleaner place to put the stats calls.

Also, I do not like the pgStatPmPipe mechanism for detecting postmaster
shutdown. That eats two file descriptors in every backend, which is a
pretty substantial waste of resources. Why not just have the postmaster
send a signal to the stats collector when it's time to shut down?

BTW, PG_FUNCTION_INFO_V1() is not needed for built-in functions.

regards, tom lane


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jan Wieck <JanWieck(at)Yahoo(dot)com>, PostgreSQL PATCHES <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Access statistics
Date: 2001-06-04 20:25:23
Message-ID: 200106042025.f54KPNb10019@jupiter.us.greatbridge.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Tom Lane wrote:
> Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> > Here it comes.
>
> Hm. I really don't like the way you've uglified the heap_fetch
> interface; can't that be done better? It's just plain bizarre that
> heap_fetch is now responsible for counting stats for index scans.
> And surely this bit of code is wrong:
>
> *userbuf = buffer;
> +
> + if (iscan != NULL)
> + pgstat_count_heap_fetch(&iscan->xs_pgstat_info);
> + else
> + pgstat_count_heap_fetch(&relation->pgstat_info);
> }
>
> You can't count heapscan info via the relcache entry --- what if there's
> more than one heapscan in progress on the same rel? What if the
> relcache entry gets rebuilt by SI invalidation? I don't think the stats
> stuff should touch the relcache at all.

The counting end's up in one and the same global slot of the
tabstat messages. These slots are collected per frontend
command and are identified by the tables oid. Does it matter
how often the pointer to that slot is updated to the same
value?

>
> There's been talk in the past of trying to unify the heapscan and
> indexscan APIs, so that we wouldn't need ugliness like the two sets of
> coding in AttrDefaultFetch and similar places. Maybe it's time to bite
> that bullet, if it'd provide a cleaner place to put the stats calls.

Exactly that's the problem. Only heap_fetch() knows if the
tid passed in is visible or not, so right now the counting
could be done there or in all the places where heap_fetch()
is called.

>
> Also, I do not like the pgStatPmPipe mechanism for detecting postmaster
> shutdown. That eats two file descriptors in every backend, which is a
> pretty substantial waste of resources. Why not just have the postmaster
> send a signal to the stats collector when it's time to shut down?

Don't kill -9 the postmaster :-)

Well, we could do the signal *plus* having the child checking
from time to time with a zero kill if the postmaster died to
suizide in case.

>
> BTW, PG_FUNCTION_INFO_V1() is not needed for built-in functions.

Survived somehow from the time where they resided in a
loadable module.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)yahoo(dot)com>
Cc: PostgreSQL PATCHES <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Access statistics
Date: 2001-06-04 20:29:17
Message-ID: 7001.991686557@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Jan Wieck <JanWieck(at)yahoo(dot)com> writes:
>> You can't count heapscan info via the relcache entry --- what if there's
>> more than one heapscan in progress on the same rel? What if the
>> relcache entry gets rebuilt by SI invalidation? I don't think the stats
>> stuff should touch the relcache at all.

> The counting end's up in one and the same global slot of the
> tabstat messages. These slots are collected per frontend
> command and are identified by the tables oid. Does it matter
> how often the pointer to that slot is updated to the same
> value?

In that case, forget about the fields in the heapscan and indexscan
structs. You can easily keep both the heap and index access counts
in (or referenced by) the relcache entry, no? It just bothers me
that the treatment of heap and index counts is so inconsistent.
One or the other is wrong, or at least inefficient.

BTW, does it really matter whether we are counting an index-driven
or heap-driven fetch? Seems to me that if you count indextuple and
heaptuple fetches, you can figure out what's going on without having
to have heap_fetch know the difference.

>> Also, I do not like the pgStatPmPipe mechanism for detecting postmaster
>> shutdown. That eats two file descriptors in every backend, which is a
>> pretty substantial waste of resources. Why not just have the postmaster
>> send a signal to the stats collector when it's time to shut down?

> Don't kill -9 the postmaster :-)

What's your point? If the postmaster crashes, whether the stats
collector goes away without help seems like the least of your worries.
You'll end up killing backends manually anyhow, so why not the stats
process too? (For that matter, do you care if the stats process doesn't
die? Once there is no one left to send messages to it, it'll never do
anything, no?)

regards, tom lane


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jan Wieck <JanWieck(at)Yahoo(dot)com>, PostgreSQL PATCHES <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Access statistics
Date: 2001-06-04 20:46:17
Message-ID: 200106042046.f54KkHH10169@jupiter.us.greatbridge.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg배트맨 토토SQL

Tom Lane wrote:
> Jan Wieck <JanWieck(at)yahoo(dot)com> writes:
> >> You can't count heapscan info via the relcache entry --- what if there's
> >> more than one heapscan in progress on the same rel? What if the
> >> relcache entry gets rebuilt by SI invalidation? I don't think the stats
> >> stuff should touch the relcache at all.
>
> > The counting end's up in one and the same global slot of the
> > tabstat messages. These slots are collected per frontend
> > command and are identified by the tables oid. Does it matter
> > how often the pointer to that slot is updated to the same
> > value?
>
> In that case, forget about the fields in the heapscan and indexscan
> structs. You can easily keep both the heap and index access counts
> in (or referenced by) the relcache entry, no? It just bothers me
> that the treatment of heap and index counts is so inconsistent.
> One or the other is wrong, or at least inefficient.

heap tuples fetched due to an index scan are counted as
tuples_fetched on the indexes entry. Only heap_fetch() calls
not done due to an index scan count into the tuples_fetched
count for the table (e.g. tuples refetched for deferred
trigger invocation and the like).

>
> BTW, does it really matter whether we are counting an index-driven
> or heap-driven fetch? Seems to me that if you count indextuple and
> heaptuple fetches, you can figure out what's going on without having
> to have heap_fetch know the difference.
>
> >> Also, I do not like the pgStatPmPipe mechanism for detecting postmaster
> >> shutdown. That eats two file descriptors in every backend, which is a
> >> pretty substantial waste of resources. Why not just have the postmaster
> >> send a signal to the stats collector when it's time to shut down?
>
> > Don't kill -9 the postmaster :-)
>
> What's your point? If the postmaster crashes, whether the stats
> collector goes away without help seems like the least of your worries.
> You'll end up killing backends manually anyhow, so why not the stats
> process too? (For that matter, do you care if the stats process doesn't
> die? Once there is no one left to send messages to it, it'll never do
> anything, no?)

Except for generating another FAQ. We haven't been able to
stop ppl from killing the postmaster up to now. Why do you
think they'll ever learn? They are human ...

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com