Re: data model help, was Re: Using null date fields to indicate

Lists: sfpug
From: Brian Ghidinelli <brian(at)pukkasoft(dot)com>
To: sfpug(at)postgresql(dot)org
Subject: Using null date fields to indicate active/expired records
Date: 2004-08-06 11:31:12
Message-ID: 41136C00.1020107@pukkasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: sfpug


Does anyone have any experience using a start/end date pair to indicate
active/expired records where a NULL end date indicates an active record?

I'm looking for performance comparisons to using a simple boolean field
to indicate the same. The extra information about start to end would be
useful but is not strictly required.

I would be grateful for any advice. Thanks,

Brian


From: David Fetter <david(at)fetter(dot)org>
To: Brian Ghidinelli <brian(at)pukkasoft(dot)com>
Cc: sfpug(at)postgresql(dot)org
Subject: Re: Using null date fields to indicate active/expired records
Date: 2004-08-06 15:34:01
Message-ID: 20040806153401.GF27306@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: sfpug

On Fri, Aug 06, 2004 at 01:31:12PM +0200, Brian Ghidinelli wrote:

> Does anyone have any experience using a start/end date pair to
> indicate active/expired records where a NULL end date indicates an
> active record?

NULL end date really means "end date is unknown." An expired record
would be one with an end date in the past.

> I'm looking for performance comparisons to using a simple boolean
> field to indicate the same.

Premature optimization is a Bad Thing(TM). Don't do it!
Micro-optimization on this level is just silly.

Cheers,
D
--
David Fetter david(at)fetter(dot)org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: David Fetter <david(at)fetter(dot)org>, Brian Ghidinelli <brian(at)pukkasoft(dot)com>
Cc: sfpug(at)postgresql(dot)org
Subject: Re: Using null date fields to indicate active/expired records
Date: 2004-08-06 16:15:47
Message-ID: 200408060915.47045.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: sfpug

Brian,

> Premature optimization is a Bad Thing(TM). Don't do it!
> Micro-optimization on this level is just silly.

I agree with David on this one. How many billion records do you have that
adding a boolean field is a concern?

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Brian Ghidinelli <brian(at)pukkasoft(dot)com>
Cc: sfpug(at)postgresql(dot)org
Subject: Re: Using null date fields to indicate active/expired records
Date: 2004-08-06 17:27:17
Message-ID: 200408061027.17012.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: sfpug

Brian,

> Approximately 14 billion. Ok, I'm kidding, but can anyone answer my
> question? Although this might be micro-optimization, does it hurt to
> know more about date performance?

No, there's nothing wrong with knowing. What David and I are asserting is
that your decision should be determined by having a correct data model, and
not by counting bits, unless you're in a really extreme situation where you
have already tested and know you have a problem.

Josh's Database Rules #3: the performance loss for slow queries is generally
exceeded by the downtime caused by a bad data model by a factor of 100 or
more.

> There are two parts to my question; one, which is faster, and two, does
> anyone have any advice about using either a single boolean flag or using
> a 1-n type date?

Which is faster depends on your query structure and the distribution of your
data; that's the other reason not to make decisions on this basis now. If
you want a serious answer on this, please post your current table structure
and an explanation of what kind of data is kept in the various fields and how
you query it.

--
-Josh Berkus
Aglio Database Solutions
San Francisco


From: David Fetter <david(at)fetter(dot)org>
To: SF Postgres <sfpug(at)postgresql(dot)org>
Subject: Re: Using null date fields to indicate active/expired records
Date: 2004-08-06 19:08:53
Message-ID: 20040806190853.GM27306@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: sfpug

On Fri, Aug 06, 2004 at 10:27:17AM -0700, Josh Berkus wrote:
> Brian,
>
> > Approximately 14 billion. Ok, I'm kidding, but can anyone answer
> > my question? Although this might be micro-optimization, does it
> > hurt to know more about date performance?
>
> No, there's nothing wrong with knowing. What David and I are
> asserting is that your decision should be determined by having a
> correct data model, and not by counting bits, unless you're in a
> really extreme situation where you have already tested and know you
> have a problem.
>
> Josh's Database Rules #3: the performance loss for slow queries is
> generally exceeded by the downtime caused by a bad data model by a
> factor of 100 or more.

A corollary, Dave's Database Rule #187, is that Josh's Database Rule
#3 wildly underestimates this factor.

> > There are two parts to my question; one, which is faster, and two,
> > does anyone have any advice about using either a single boolean
> > flag or using a 1-n type date?
>
> Which is faster depends on your query structure and the distribution
> of your data; that's the other reason not to make decisions on this
> basis now. If you want a serious answer on this, please post your
> current table structure and an explanation of what kind of data is
> kept in the various fields and how you query it.

Yep.

Cheers,
D
--
David Fetter david(at)fetter(dot)org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!


From: Brian Ghidinelli <brian(at)pukkasoft(dot)com>
To: josh(at)agliodbs(dot)com
Cc: sfpug(at)postgresql(dot)org
Subject: Re: Using null date fields to indicate active/expired records
Date: 2004-08-10 12:50:46
Message-ID: 4118C4A6.9030408@pukkasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: sfpug


Josh Berkus wrote:
> Which is faster depends on your query structure and the distribution of your
> data; that's the other reason not to make decisions on this basis now. If
> you want a serious answer on this, please post your current table structure
> and an explanation of what kind of data is kept in the various fields and how
> you query it.

My database for this project is about 40 tables. I don't know where the
line is drawn on this list between postgres specific discussion and data
model review.

I am making a lot of changes right now including this archived/deleted
indicator so I would welcome the opportunity to review my plans if it's
acceptable, but I don't want to abuse the list.

Brian


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Brian Ghidinelli <brian(at)pukkasoft(dot)com>
Cc: sfpug(at)postgresql(dot)org
Subject: Re: Using null date fields to indicate active/expired records
Date: 2004-08-10 22:48:40
Message-ID: 200408101548.40370.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: sfpug

Brian,

> My database for this project is about 40 tables. I don't know where the
> line is drawn on this list between postgres specific discussion and data
> model review.

I don't think that line exists.

> I am making a lot of changes right now including this archived/deleted
> indicator so I would welcome the opportunity to review my plans if it's
> acceptable, but I don't want to abuse the list.

Well, just show us the immediately relevant table(s); I'm not sure I'll read
an entire schema dump.

--
-Josh Berkus
"A developer of Very Little Brain"
Aglio Database Solutions
San Francisco


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Brian Ghidinelli <brian(at)pukkasoft(dot)com>, sfpug(at)postgresql(dot)org
Subject: Re: data model help, was Re: Using null date fields to indicate active/expired records
Date: 2004-08-11 16:19:01
Message-ID: 200408110919.01110.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: sfpug

Brian,

> Alrighty... attached is a small (32k) image generated from Erwin that
> shows the relevant parts of the model. The tables are not fleshed out
> to save space; they mostly show keys.

How about a SQL dump of just the tables we're dealing with? I can't speak
for others, but I'm not going to do a general data model critique.
Particularly since I'd have to take a machete to it, and I charge for that
;-)

> I use a psuedo-hungarian notation
> to indicate what the datatype is (uid = uuid, vch = varchar, dte = date,
> dtm = datetime, int/tnt/snt = int, boo = boolean).

Hungarian notation! Aiieeeeeeeeeeeeeeeeeee!
For the sanity of your successors, you may want to consider giving up HU in
database design. Actually, you should give up HU entirely; it's a relic of
the "bad early '90's" and visual basic "programming."

> A challenge I'm facing is balancing a very strict data model versus a
> more flexible data model that will allow the organizers to "bend the
> rules" a bit (e.g., whether or not to require tblMapAttendeeClasses to
> use classes available in tblMapEventClasses).

Flexible data models should be accomodated through your database and
application design, not created by omission. For example, if you want
organizers to be able to add new "classes" on the fly, then design for it;
don't just leave the constraint out.

Now, how does this relate to your question about using NULL dates?

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Brian Ghidinelli <brian(at)pukkasoft(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: sfpug(at)postgresql(dot)org
Subject: Re: data model help, was Re: Using null date fields to indicate
Date: 2004-08-12 09:39:17
Message-ID: 411B3AC5.6000305@pukkasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: sfpug


Josh Berkus wrote:
> How about a SQL dump of just the tables we're dealing with? I can't speak

These tables don't exist yet, they're in the design stage.

> For the sanity of your successors, you may want to consider giving up HU in

Since it is intelligible and provides useful information at the
application development and debugging level, I don't consider it a relic.

> Now, how does this relate to your question about using NULL dates?

As asked, I provided examples of how the data is queried and described
my current questions:

1. Using NULL dates versus a boolean indicator, for example, in the
tblMapAttendeeFlags. The extra information about start and end is not
really necessary but may be a "nice to have". My original inquiry
regarding performance, if it was determined the date approach was
considerably slower, would have swung me to use boolean here.
Membership type, however, in tblMapMemberType requires the tenure of a
position so that will use the date approach, regardless of performance.

2. I'm not sure about feeding the PK of tblMapAttendeeClasses from to
tblMapEventClasses. The image I sent doesn't have it this way but this
requirement would force the attendee classes to use those classes that
have been "offered" explicitly. I can't decide if tblMapEventClasses is
completely unnecessary or should be more important?

3. tblMapAttendeeClasses stores 1-n records of a tblLookupAttendee
record to specify which event parts the attendee participates in (along
with class, vehicle, instructor, grid pos, etc). I've currently not
referenced tblLookupAttendee (uidClubMember, uidEvent) but I'm wondering
if I'm better off doing so?

4. Does my approach for storing arbitrary custom questions (that may be
of several data types) in tblLookupCustomQ and tblMapCustomQ seem OK?
I'm sure others have come across a similar requirement; thoughts?

Brian


From: David Fetter <david(at)fetter(dot)org>
To: SF Postgres <sfpug(at)postgresql(dot)org>
Subject: Re: data model help, was Re: Using null date fields to indicate
Date: 2004-08-12 15:21:30
Message-ID: 20040812152130.GG3515@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: sfpug

On Thu, Aug 12, 2004 at 10:39:17AM +0100, Brian Ghidinelli wrote:
>
> Josh Berkus wrote:
> >How about a SQL dump of just the tables we're dealing with? I can't
> >speak
>
> These tables don't exist yet, they're in the design stage.

Well, that part's a blessing anyhow.

> >For the sanity of your successors, you may want to consider giving
> >up HU in
>
> Since it is intelligible and provides useful information at the
> application development and debugging level, I don't consider it a
> relic.

You seriously missed Josh's point. What *you* understand is not
relevant from a maintenance point of view. What your successors are
likely to understand *is*.

> >Now, how does this relate to your question about using NULL dates?
>
> As asked, I provided examples of how the data is queried and described
> my current questions:
>
> 1. Using NULL dates versus a boolean indicator, for example, in the
> tblMapAttendeeFlags. The extra information about start and end is
> not really necessary but may be a "nice to have". My original
> inquiry regarding performance,

You are <a href="http://fetter.org/optimization.html">optimizing</a>.
You are optimizing prematurely. Stop it now.

> 2. I'm not sure about feeding the PK of tblMapAttendeeClasses from
> to tblMapEventClasses. The image I sent doesn't have it this way
> but this requirement would force the attendee classes to use those
> classes that have been "offered" explicitly. I can't decide if
> tblMapEventClasses is completely unnecessary or should be more
> important?
>
> 3. tblMapAttendeeClasses stores 1-n records of a tblLookupAttendee
> record to specify which event parts the attendee participates in
> (along with class, vehicle, instructor, grid pos, etc). I've
> currently not referenced tblLookupAttendee (uidClubMember, uidEvent)
> but I'm wondering if I'm better off doing so?
>
> 4. Does my approach for storing arbitrary custom questions (that may
> be of several data types) in tblLookupCustomQ and tblMapCustomQ seem
> OK? I'm sure others have come across a similar requirement;
> thoughts?

Here's mine. Hire somebody to help with your design, and take their
advice. You're in way over your head here.

Cheers,
D
--
David Fetter david(at)fetter(dot)org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Brian Ghidinelli <brian(at)pukkasoft(dot)com>
Cc: sfpug(at)postgresql(dot)org
Subject: Re: data model help, was Re: Using null date fields to indicate active/expired records
Date: 2004-08-12 16:49:43
Message-ID: 200408120949.43766.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: sfpug

Brian,

> Since it is intelligible and provides useful information at the
> application development and debugging level, I don't consider it a relic.

It's your app. Just let me know who the client is in case they ever call me.
I charge extra to deal with HU ;-)

> 1. Using NULL dates versus a boolean indicator, for example, in the
> tblMapAttendeeFlags. The extra information about start and end is not
> really necessary but may be a "nice to have". My original inquiry
> regarding performance, if it was determined the date approach was
> considerably slower, would have swung me to use boolean here.
> Membership type, however, in tblMapMemberType requires the tenure of a
> position so that will use the date approach, regardless of performance.

Can you describe the purpose of each field in this table? I think that's far
more crucial than any performance criteria.

> 2. I'm not sure about feeding the PK of tblMapAttendeeClasses from to
> tblMapEventClasses. The image I sent doesn't have it this way but this
> requirement would force the attendee classes to use those classes that
> have been "offered" explicitly. I can't decide if tblMapEventClasses is
> completely unnecessary or should be more important?

I'm afraid that I don't understand your application well enough to answer that
sort of question. Particularly, you appear to be doing a fair amount of
object-relational mapping here and that always requires a thorough
understanding of the application which I don't have.

> 3. tblMapAttendeeClasses stores 1-n records of a tblLookupAttendee
> record to specify which event parts the attendee participates in (along
> with class, vehicle, instructor, grid pos, etc). I've currently not
> referenced tblLookupAttendee (uidClubMember, uidEvent) but I'm wondering
> if I'm better off doing so?

See above.

> 4. Does my approach for storing arbitrary custom questions (that may be
> of several data types) in tblLookupCustomQ and tblMapCustomQ seem OK?
> I'm sure others have come across a similar requirement; thoughts?

Define "question".

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Brian Ghidinelli <brian(at)pukkasoft(dot)com>
To: sfpug(at)postgresql(dot)org
Subject: Re: data model help, was Re: Using null date fields to indicate
Date: 2004-08-15 08:19:30
Message-ID: 411F1C92.40102@pukkasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: sfpug


David Fetter wrote:
> You are <a href="http://fetter.org/optimization.html">optimizing</a>.
> You are optimizing prematurely. Stop it now.
>
> Here's mine. Hire somebody to help with your design, and take their
> advice. You're in way over your head here.

Thanks for the constructive input.

Although I'm no more entitled to help here than anyone else,
condescending attitudes such as yours foster feifdom over community.
It's no surprise to me now why this list has such low traffic.

I hope it doesn't drive people away from Postgres because it is a great
product. It will, however, drive people away from this mailing list.

Gurgle, gurgle,

Brian