Re: Using bitwise operator vs. mapping table

Lists: sfpug
From: Brian Ghidinelli <brian(at)pukkasoft(dot)com>
To: sfpug(at)postgresql(dot)org
Subject: Using bitwise operator vs. mapping table
Date: 2007-10-14 00:37:05
Message-ID: 471164B1.8020909@pukkasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: sfpug


Gurus,

I'm looking for some guidance on deciding between using a bitwise scheme
or a mapping table. I have an "events" table with "type" and a
corresponding "types" definition table. I want to change from one type
per event to many. I am considering:

1. Storing the types as bits in the "events" table. I can join them
with "where types.type & events.type = types.type"

2. Using a mapping table like "eventsToTypes" to connect the two tables.

I am more familiar with the second method but I like the bits approach
because it lets me do things like search for "any of the following event
types" very easily and the number of types of events is limited to what
will fit in a 32-bit integer.

Any thoughts? How does that type of bitwise operation perform in
comparison to the mapping table (where the key may be either an integer
or a UUID)?

Thanks,

Brian


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: sfpug(at)postgresql(dot)org
Subject: Re: Using bitwise operator vs. mapping table
Date: 2007-10-14 21:07:19
Message-ID: 200710141407.19905.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: sfpug

Brian,

> I am more familiar with the second method but I like the bits approach
> because it lets me do things like search for "any of the following event
> types" very easily and the number of types of events is limited to what
> will fit in a 32-bit integer.
>
> Any thoughts? How does that type of bitwise operation perform in
> comparison to the mapping table (where the key may be either an integer
> or a UUID)?

Actually, for this approach you want to use an INTARRAY and not a bitmap,
becuase we have special indexes (based on GIST or GIN) for INTARRAY. In
theory, one could create a GIST index for bitmaps, but nobody's done it yet.

Whether or not to use such an approach is the usual question of
denormalization. That is, using the normalized approach (an attributes child
table) will make data maintenance, validation, and future schema extensions
easier. Using the denormailzed approach (intarrays) will make specific
queries faster.

So the first question to ask before even considering denormalization is, are
your search queries, in fact, slow?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco


From: Brian Ghidinelli <brian(at)pukkasoft(dot)com>
To: sfpug(at)postgresql(dot)org
Subject: Re: Using bitwise operator vs. mapping table
Date: 2007-10-15 21:19:15
Message-ID: 4713D953.9@pukkasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: sfpug


Josh Berkus wrote:
> Actually, for this approach you want to use an INTARRAY and not a bitmap,
> becuase we have special indexes (based on GIST or GIN) for INTARRAY. In
> theory, one could create a GIST index for bitmaps, but nobody's done it yet.

I've never used the array types before; I'll read up on it.

> So the first question to ask before even considering denormalization is, are
> your search queries, in fact, slow?

They aren't terrible today and I suspect there is some low hanging
optimization that could be collected by a real DBA. However, we're
growing and our users run quite a lot of reports which already do 5 or
6-way joins. I'm considering bit/intarray to avoid making it an 8-way
join (there are two fields we're going many-to-many with on this central
"registration" table).

> easier. Using the denormailzed approach (intarrays) will make specific
> queries faster.

Are there any benefits other than speed in the denormalized approach?
One potential I see is that we can determine the various member types
from a single record which I can see being convenient for reporting and
decision making.

Once I get through this PCI DSS effort I may hire someone to get that
low-hanging fruit but I am trying to keep this other dev work moving
forward in the mean time.

Brian


From: David Fetter <david(at)fetter(dot)org>
To: SF Postgres <sfpug(at)postgresql(dot)org>
Subject: Re: Using bitwise operator vs. mapping table
Date: 2007-10-15 22:14:27
Message-ID: 20071015221427.GN27311@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: sfpug

On Mon, Oct 15, 2007 at 02:19:15PM -0700, Brian Ghidinelli wrote:
> Josh Berkus wrote:
> >Actually, for this approach you want to use an INTARRAY and not a
> >bitmap, becuase we have special indexes (based on GIST or GIN) for
> >INTARRAY. In theory, one could create a GIST index for bitmaps,
> >but nobody's done it yet.
>
> I've never used the array types before; I'll read up on it.
>
> >So the first question to ask before even considering
> >denormalization is, are your search queries, in fact, slow?
>
> They aren't terrible today and I suspect there is some low hanging
> optimization that could be collected by a real DBA.

The word "optimize," is usually a synonym for, "give up on getting the
whole thing right." Don't do it unless forced to.

> However, we're growing and our users run quite a lot of reports
> which already do 5 or 6-way joins.

Is this actually causing a problem?

> I'm considering bit/intarray to avoid making it an 8-way join (there
> are two fields we're going many-to-many with on this central
> "registration" table).
>
> >easier. Using the denormailzed approach (intarrays) will make
> >specific queries faster.
>
> Are there any benefits other than speed in the denormalized
> approach?

No, and there are a lot of maintenance and data integrity problems
inherent in it.

> One potential I see is that we can determine the various member
> types from a single record which I can see being convenient for
> reporting and decision making.
>
> Once I get through this PCI DSS effort I may hire someone to get
> that low-hanging fruit but I am trying to keep this other dev work
> moving forward in the mean time.

Premature optimization is the root of all evil.
Donald E. Knuth,
ACM Journal Computing Surveys
Vol 6, No. 4, Dec. 1974. p.268

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Brian Ghidinelli <brian(at)pukkasoft(dot)com>
To: SF Postgres <sfpug(at)postgresql(dot)org>
Subject: Re: Using bitwise operator vs. mapping table
Date: 2007-10-16 02:02:26
Message-ID: 47141BB2.3030708@pukkasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: sfpug


David Fetter wrote:
>> However, we're growing and our users run quite a lot of reports
>> which already do 5 or 6-way joins.
>
> Is this actually causing a problem?

EXPLAIN ANALYZE isn't too excited. This is where I think a real DBA
could tweak the queries and indexes a bit to improve the times.

> Premature optimization is the root of all evil.
> Donald E. Knuth,
> ACM Journal Computing Surveys
> Vol 6, No. 4, Dec. 1974. p.268

And optimization permanently left undone is the root of all slow
applications! :) Ok, maybe that's not true either. This application
is 4 years old; it's due for some optimization so I pinged the list to
see if this approach had any advantages.

Brian


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: sfpug(at)postgresql(dot)org
Subject: Re: Using bitwise operator vs. mapping table
Date: 2007-10-16 15:08:06
Message-ID: 200710160808.07024.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: sfpug

Brian,

> They aren't terrible today and I suspect there is some low hanging
> optimization that could be collected by a real DBA. However, we're
> growing and our users run quite a lot of reports which already do 5 or
> 6-way joins.

Do you expect that the number of joins would actually increase, though?

> Are there any benefits other than speed in the denormalized approach?

No, so you only want to do it if you have a real performance problem which
you've already tried to solve other ways.

If you want the "rollup" view of the attributes table, you can always create a
custom aggregate which displays a list.

> Once I get through this PCI DSS effort I may hire someone to get that
> low-hanging fruit but I am trying to keep this other dev work moving
> forward in the mean time.

Good idea.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco