Lists: | pgsql-hackers |
---|
From: | "Dann Corbit" <DCorbit(at)connx(dot)com> |
---|---|
To: | "Luke Lonergan" <llonergan(at)greenplum(dot)com>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, "Jie Zhang" <jzhang(at)greenplum(dot)com> |
Cc: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: On-disk bitmap index patch |
Date: | 2006-07-28 20:02:32 |
Message-ID: | D425483C2C5C9F49B5B7A41F8944154757DAC7@postal.corporate.connx.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of Luke Lonergan
> Sent: Friday, July 28, 2006 12:18 PM
> To: Jim C. Nasby; Jie Zhang
> Cc: Tom Lane; Mark Kirkwood; Josh Berkus; Gavin Sherry; pgsql-
> hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] On-disk bitmap index patch
>
> Jim,
>
> On 7/28/06 10:17 AM, "Jim C. Nasby" <jnasby(at)pervasive(dot)com> wrote:
>
> > If the usefulness of bitmap indexes is still in doubt, could someone
at
> > Greenplum provide data from actual data warehouses from actual
> > customers?
>
> First, is anyone in doubt?
Others have looked into the usefulness of bitmap indexes. Here is what
they found:
http://www.oracle.com/technology/pub/articles/sharma_indexes.html
http://citeseer.ist.psu.edu/stockinger02bitmap.html
Oracle, IBM, and even Microsoft[1] supports them. Probably not just to
be trendy.
[1] Microsoft SQL Server creates temporary bitmap indexes during some
queries, though you cannot declaratively create a bitmap index.
> - Luke
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Dann Corbit" <DCorbit(at)connx(dot)com> |
Cc: | "Luke Lonergan" <llonergan(at)greenplum(dot)com>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, "Jie Zhang" <jzhang(at)greenplum(dot)com>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: On-disk bitmap index patch |
Date: | 2006-07-28 20:18:53 |
Message-ID: | 809.1154117933@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
"Dann Corbit" <DCorbit(at)connx(dot)com> writes:
> Others have looked into the usefulness of bitmap indexes. Here is what
> they found:
> http://www.oracle.com/technology/pub/articles/sharma_indexes.html
I like this guy's style of argument: he admits a bitmap index on a
unique column will be much bigger than a btree, and then airily
dismisses it as not a problem. Not real convincing.
> http://citeseer.ist.psu.edu/stockinger02bitmap.html
Both of these pages say up front that they are considering read-only
data. So one of the questions that has to be answered (and the
submitters have been entirely mum about) is exactly how bad is the
update performance? If it's really awful that's going to constrain
the use cases quite a lot, whereas merely "a bit slower than btree"
wouldn't be such a problem.
In any case, arguing that other DBs find it's a win will cut no ice
with me. See adjacent discussion about hash indexes --- those *ought*
to be a win, but they aren't in Postgres, for reasons that are still
guesses. The translation gap between other DBs' experience and ours
can be large.
regards, tom lane
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Dann Corbit <DCorbit(at)connx(dot)com>, Luke Lonergan <llonergan(at)greenplum(dot)com>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Jie Zhang <jzhang(at)greenplum(dot)com>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, Josh Berkus <josh(at)agliodbs(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: On-disk bitmap index patch |
Date: | 2006-07-28 20:25:23 |
Message-ID: | 200607282025.k6SKPNB08324@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
What we don't want to happen is for us to release bitmapped indexes, and
find out later that btree is better in all cases. Then we have to tell
people not to use bitmapped indexes until we fix it in the next major
releasse. FYI, that is basically where we are right now with hash
indexes.
---------------------------------------------------------------------------
Tom Lane wrote:
> "Dann Corbit" <DCorbit(at)connx(dot)com> writes:
> > Others have looked into the usefulness of bitmap indexes. Here is what
> > they found:
> > http://www.oracle.com/technology/pub/articles/sharma_indexes.html
>
> I like this guy's style of argument: he admits a bitmap index on a
> unique column will be much bigger than a btree, and then airily
> dismisses it as not a problem. Not real convincing.
>
> > http://citeseer.ist.psu.edu/stockinger02bitmap.html
>
> Both of these pages say up front that they are considering read-only
> data. So one of the questions that has to be answered (and the
> submitters have been entirely mum about) is exactly how bad is the
> update performance? If it's really awful that's going to constrain
> the use cases quite a lot, whereas merely "a bit slower than btree"
> wouldn't be such a problem.
>
> In any case, arguing that other DBs find it's a win will cut no ice
> with me. See adjacent discussion about hash indexes --- those *ought*
> to be a win, but they aren't in Postgres, for reasons that are still
> guesses. The translation gap between other DBs' experience and ours
> can be large.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
From: | "Luke Lonergan" <llonergan(at)greenplum(dot)com> |
---|---|
To: | "Bruce Momjian" <bruce(at)momjian(dot)us>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "Dann Corbit" <DCorbit(at)connx(dot)com>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, "Jie Zhang" <jzhang(at)greenplum(dot)com>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: On-disk bitmap index patch |
Date: | 2006-07-28 21:43:23 |
Message-ID: | C0EFD30B.2B307%llonergan@greenplum.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Bruce,
On 7/28/06 1:25 PM, "Bruce Momjian" <bruce(at)momjian(dot)us> wrote:
> What we don't want to happen is for us to release bitmapped indexes, and
> find out later that btree is better in all cases. Then we have to tell
> people not to use bitmapped indexes until we fix it in the next major
> releasse. FYI, that is basically where we are right now with hash
> indexes.
On this thread people have presented results that show clear and irrefutable
evidence that there are use cases where bitmap indexes outperform Btree for
many datatypes on realistic problems, including the TPC-H benchmark.
In many cases the bitmap indexes outperform BTREE by a factor of 50 and are
a tiny fraction of the size and also take dramatically less time to build.
Of the cases presented, we need to have someone specifically address them
and point out why they aren't proof of bitmap index performance. So far
this has not been done, rather there are some unsupported opinions about
other cases that might be problematic.
- Luke
From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Dann Corbit <DCorbit(at)connx(dot)com>, Luke Lonergan <llonergan(at)greenplum(dot)com>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Jie Zhang <jzhang(at)greenplum(dot)com>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, Josh Berkus <josh(at)agliodbs(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: On-disk bitmap index patch |
Date: | 2006-07-28 22:12:20 |
Message-ID: | 44CA8BC4.2080204@dunslane.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Tom Lane wrote:
>
> Both of these pages say up front that they are considering read-only
> data. So one of the questions that has to be answered (and the
> submitters have been entirely mum about) is exactly how bad is the
> update performance? If it's really awful that's going to constrain
> the use cases quite a lot, whereas merely "a bit slower than btree"
> wouldn't be such a problem.
>
> In any case, arguing that other DBs find it's a win will cut no ice
> with me. See adjacent discussion about hash indexes --- those *ought*
> to be a win, but they aren't in Postgres, for reasons that are still
> guesses. The translation gap between other DBs' experience and ours
> can be large.
>
Notwithstanding that, I have a couple of non-postgres data points /
anecdotes on this.
Back in my days as an Ingres DBA in the mid 90s, our fairly highly tuned
system used hash organised tables only for small fairly static
lookup-type tables (state codes, postcodes, etc). Everything that was
more dynamic was done with btree indexed tables.
A few years later, I was producing very large tables of email addresses
using BDB. I quickly stopped using hash tables when I found that the
reorganisation penalty was huge. Switching to btree worked just fine,
with no sudden performance blip. This might not be directly relevant,
but clearly the bucket size is.
I guess what we need to demonstrate is that the better hash performance
will actually persist to a scale where it is actually worth it - surely
for very small tables the index method won't matter much anyway.
cheers
andrew
From: | Hannu Krosing <hannu(at)skype(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Dann Corbit <DCorbit(at)connx(dot)com>, Luke Lonergan <llonergan(at)greenplum(dot)com>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Jie Zhang <jzhang(at)greenplum(dot)com>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, Josh Berkus <josh(at)agliodbs(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: On-disk bitmap index patch |
Date: | 2006-07-28 22:47:30 |
Message-ID: | 1154126850.2961.35.camel@localhost.localdomain |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Ühel kenal päeval, R, 2006-07-28 kell 16:18, kirjutas Tom Lane:
> "Dann Corbit" <DCorbit(at)connx(dot)com> writes:
> > Others have looked into the usefulness of bitmap indexes. Here is what
> > they found:
> > http://www.oracle.com/technology/pub/articles/sharma_indexes.html
>
> I like this guy's style of argument: he admits a bitmap index on a
> unique column will be much bigger than a btree, and then airily
> dismisses it as not a problem. Not real convincing.
This problem can be easyly avoided by not creating bitmap indexes on
unique columns. So I think it is ok to dismiss it.
> > http://citeseer.ist.psu.edu/stockinger02bitmap.html
>
> Both of these pages say up front that they are considering read-only
> data. So one of the questions that has to be answered (and the
> submitters have been entirely mum about) is exactly how bad is the
> update performance? If it's really awful that's going to constrain
> the use cases quite a lot, whereas merely "a bit slower than btree"
> wouldn't be such a problem.
May be.
OTOH, in OLAP databases you may be better off dropping the indexes
before data loading and rebuilding them after. And it has been shown
that bitmap indexes build a lot faster than btrees.
> In any case, arguing that other DBs find it's a win will cut no ice
> with me.
How about a more general argument. I claim that an index that is small
and fits in RAM is faster than a big one that does not fit in RAM.
> See adjacent discussion about hash indexes --- those *ought*
> to be a win, but they aren't in Postgres, for reasons that are still
> guesses. The translation gap between other DBs' experience and ours
> can be large.
IIRC the tests showing bitmap indexes being much faster on TPC-H were
done on postgresql, no ?
You pointed out that btree indexes are more bloated in this case as they
store padding spaces for all CHAR(N) fields whereas bitmap index stores
padding spaces only once for each distinct value.
Are there any plans to start optimising btree storage model in
forseeable future ?
--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia
Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
From: | Hannu Krosing <hannu(at)skype(dot)net> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dann Corbit <DCorbit(at)connx(dot)com>, Luke Lonergan <llonergan(at)greenplum(dot)com>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Jie Zhang <jzhang(at)greenplum(dot)com>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, Josh Berkus <josh(at)agliodbs(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: On-disk bitmap index patch |
Date: | 2006-07-28 22:54:08 |
Message-ID: | 1154127248.2961.39.camel@localhost.localdomain |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Ühel kenal päeval, R, 2006-07-28 kell 16:25, kirjutas Bruce Momjian:
> What we don't want to happen is for us to release bitmapped indexes, and
> find out later that btree is better in all cases.
Actually I'd love it if adding bitmap indexes to core pg would magically
make btree several times faster for the cases where bitmap indexes are
faster now :)
--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia
Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
From: | mark(at)mark(dot)mielke(dot)cc |
---|---|
To: | Luke Lonergan <llonergan(at)greenplum(dot)com> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dann Corbit <DCorbit(at)connx(dot)com>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Jie Zhang <jzhang(at)greenplum(dot)com>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, Josh Berkus <josh(at)agliodbs(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: On-disk bitmap index patch |
Date: | 2006-07-29 04:26:23 |
Message-ID: | 20060729042622.GA22145@mark.mielke.cc |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, Jul 28, 2006 at 02:43:23PM -0700, Luke Lonergan wrote:
> On 7/28/06 1:25 PM, "Bruce Momjian" <bruce(at)momjian(dot)us> wrote:
> > What we don't want to happen is for us to release bitmapped indexes, and
> > find out later that btree is better in all cases. Then we have to tell
> > people not to use bitmapped indexes until we fix it in the next major
> > releasse. FYI, that is basically where we are right now with hash
> > indexes.
> On this thread people have presented results that show clear and irrefutable
> evidence that there are use cases where bitmap indexes outperform Btree for
> many datatypes on realistic problems, including the TPC-H benchmark.
Irrefutable is a little optimistic, don't you think? :-)
There is reason to believe that a bitmap index is useful in some
scenarios. We're not yet clear on what these are, whether they apply
to production use scenarios, or whether b-tree could not be optimized
to be better.
I support you - I want to see these great things for myself.
But irrefutable? Irrefutable is not true. :-)
Cheers,
mark
--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
From: | Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: On-disk bitmap index patch |
Date: | 2006-08-02 18:38:09 |
Message-ID: | 44D0F111.60100@cheapcomplexdevices.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Tom Lane wrote:
> Both of these pages say up front that they are considering read-only
> data.
Can I assume read-mostly partitions could use the read-I/O
efficient indexes on update-intensive partitions of the same
table could use b-tree indexes?
All of my larger (90GB+) tables can have partitions that are either
almost read-only (spatial data updated one state/country at
a time, about quarterly) or are totally read-only (previous
months and years historical data).
> So one of the questions that has to be answered (and the
> submitters have been entirely mum about) is exactly how bad is the
> update performance? If it's really awful that's going to constrain
> the use cases quite a lot, whereas merely "a bit slower than btree"
> wouldn't be such a problem.
Once we have easy-to-use partitioning, would it be the case that
many larger tables will have near-read-only partitions that could
use I/O friendlier indexes (GIN? Hash? Bitmap?)? Any examples
of a large data set that can't be partitioned into hot areas and
read-mostly areas?