Lists: | pdxpug |
---|
From: | Rich Shepard <rshepard(at)appl-ecosys(dot)com> |
---|---|
To: | pdxpug(at)postgresql(dot)org |
Subject: | Entering '<1' and 'BDL' |
Date: | 2010-04-19 23:12:06 |
Message-ID: | alpine.LNX.2.00.1004191608060.22750@salmo.appl-ecosys.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pdxpug |
For a scientific database holding observed and measured values there are
two non-numeric (NAN) legitimate entries: <1 and 'below detection limits.'
Is anyone aware of a standard way of expressing these?
The problem is that the column needs to be of a numeric type because the
numbers will be manipulated mathematically and summarized statistically. So
a NAN such as '<1' or 'BDL' doesn't cut it.
Any thoughts?
Rich
--
Richard B. Shepard, Ph.D. | Integrity Credibility
Applied Ecosystem Services, Inc. | Innovation
<http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
From: | Rich Shepard <rshepard(at)appl-ecosys(dot)com> |
---|---|
To: | pdxpug(at)postgresql(dot)org |
Subject: | Re: Entering '<1' and 'BDL' -- RESOLVED |
Date: | 2010-04-20 02:09:08 |
Message-ID: | alpine.LNX.2.00.1004191908250.22750@salmo.appl-ecosys.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pdxpug |
On Mon, 19 Apr 2010, Rich Shepard wrote:
> Any thoughts?
NULL does the trick. After all, if the value is below detection limits or
less than 1 the actual value is unknown.
Rich
From: | "Melissa Hollingsworth" <melissa(at)fastanimals(dot)com> |
---|---|
To: | pdxpug(at)postgresql(dot)org |
Subject: | Re: Entering '<1' and 'BDL' -- RESOLVED |
Date: | 2010-04-20 16:16:02 |
Message-ID: | op.vbg0w0emoen9tq@satin.mshome.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pdxpug |
On Mon, 19 Apr 2010 20:09:08 -0600, Rich Shepard
<rshepard(at)appl-ecosys(dot)com> wrote:
> On Mon, 19 Apr 2010, Rich Shepard wrote:
>
>> Any thoughts?
>
> NULL does the trick. After all, if the value is below detection
> limits or
> less than 1 the actual value is unknown.
Might you ever need to distinguish between something less than one
and something which was actually undetectably low? I don't like
losing information, personally.
Using NULL for both loses the distinction between them. If you'll
never need to distinguish them, why are there two valid NAN values
in the first place?
--
Melissa Hollingsworth
melissa(at)fastanimals(dot)com
+1 503 841 5240
From: | "Ewan, Michael" <michael(dot)ewan(at)intel(dot)com> |
---|---|
To: | Rich Shepard <rshepard(at)appl-ecosys(dot)com>, "pdxpug(at)postgresql(dot)org" <pdxpug(at)postgresql(dot)org> |
Subject: | Re: Entering '<1' and 'BDL' |
Date: | 2010-04-20 17:38:10 |
Message-ID: | 66C9C47441840949A66773F8F0D9D6A7B01A6583@rrsmsx503.amr.corp.intel.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pdxpug |
As you noted, NULL will fit the bill for your NAN values, but make sure the table allows NULL's in that column and that your code handles NULL's either by skipping them (a self join can often help) or by converting them to 0 if you want them to count in the statistics.
> -----Original Message-----
> From: pdxpug-owner(at)postgresql(dot)org [mailto:pdxpug-owner(at)postgresql(dot)org]
> On Behalf Of Rich Shepard
> Sent: Monday, April 19, 2010 4:12 PM
> To: pdxpug(at)postgresql(dot)org
> Subject: [pdxpug] Entering '<1' and 'BDL'
>
> For a scientific database holding observed and measured values there
> are
> two non-numeric (NAN) legitimate entries: <1 and 'below detection
> limits.'
> Is anyone aware of a standard way of expressing these?
>
> The problem is that the column needs to be of a numeric type because
> the
> numbers will be manipulated mathematically and summarized
> statistically. So
> a NAN such as '<1' or 'BDL' doesn't cut it.
>
> Any thoughts?
>
> Rich
>
> --
> Richard B. Shepard, Ph.D. | Integrity
> Credibility
> Applied Ecosystem Services, Inc. | Innovation
> <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-
> 8863
>
> --
> Sent via pdxpug mailing list (pdxpug(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pdxpug
From: | Rich Shepard <rshepard(at)appl-ecosys(dot)com> |
---|---|
To: | "pdxpug(at)postgresql(dot)org" <pdxpug(at)postgresql(dot)org> |
Subject: | Re: Entering '<1' and 'BDL' |
Date: | 2010-04-20 17:43:01 |
Message-ID: | alpine.LNX.2.00.1004201039550.4238@salmo.appl-ecosys.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pdxpug |
On Tue, 20 Apr 2010, Ewan, Michael wrote:
> As you noted, NULL will fit the bill for your NAN values, but make sure
> the table allows NULL's in that column and that your code handles NULL's
> either by skipping them (a self join can often help) or by converting them
> to 0 if you want them to count in the statistics.
Michael,
There are always valid reasons for missing data: stream channel is dry or
water is frozen; mill not operating; lab equipment malfunction. While
regulators might be interested in how many values are missing, they tend to
be more interested in values that exceed statutory thresholds. Regardless,
we will be able to provide them with the information they want.
Thanks,
Rich
From: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> |
---|---|
To: | "Ewan, Michael" <michael(dot)ewan(at)intel(dot)com> |
Cc: | Rich Shepard <rshepard(at)appl-ecosys(dot)com>, "pdxpug(at)postgresql(dot)org" <pdxpug(at)postgresql(dot)org> |
Subject: | Re: Entering '<1' and 'BDL' |
Date: | 2010-04-20 18:00:57 |
Message-ID: | 1271786457.25602.10.camel@jd-desktop.unknown.charter.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pdxpug |
On Tue, 2010-04-20 at 11:38 -0600, Ewan, Michael wrote:
> As you noted, NULL will fit the bill for your NAN values, but make sure the table allows NULL's in that column and that your code handles NULL's either by skipping them (a self join can often help) or by converting them to 0 if you want them to count in the statistics.
I missed most of this discussion but, why aren't you using numeric which
actually supports NaN?
Joshua D. Drake
>
>
> > -----Original Message-----
> > From: pdxpug-owner(at)postgresql(dot)org [mailto:pdxpug-owner(at)postgresql(dot)org]
> > On Behalf Of Rich Shepard
> > Sent: Monday, April 19, 2010 4:12 PM
> > To: pdxpug(at)postgresql(dot)org
> > Subject: [pdxpug] Entering '<1' and 'BDL'
> >
> > For a scientific database holding observed and measured values there
> > are
> > two non-numeric (NAN) legitimate entries: <1 and 'below detection
> > limits.'
> > Is anyone aware of a standard way of expressing these?
> >
> > The problem is that the column needs to be of a numeric type because
> > the
> > numbers will be manipulated mathematically and summarized
> > statistically. So
> > a NAN such as '<1' or 'BDL' doesn't cut it.
> >
> > Any thoughts?
> >
> > Rich
> >
> > --
> > Richard B. Shepard, Ph.D. | Integrity
> > Credibility
> > Applied Ecosystem Services, Inc. | Innovation
> > <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-
> > 8863
> >
> > --
> > Sent via pdxpug mailing list (pdxpug(at)postgresql(dot)org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pdxpug
>
--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
From: | Rich Shepard <rshepard(at)appl-ecosys(dot)com> |
---|---|
To: | "pdxpug(at)postgresql(dot)org" <pdxpug(at)postgresql(dot)org> |
Subject: | Re: Entering '<1' and 'BDL' |
Date: | 2010-04-20 19:08:30 |
Message-ID: | alpine.LNX.2.00.1004201207070.4238@salmo.appl-ecosys.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pdxpug |
On Tue, 20 Apr 2010, Joshua D. Drake wrote:
> I missed most of this discussion but, why aren't you using numeric which
> actually supports NaN?
Joshua,
The database we're building uses floating point numbers, where
appropriate. Spreadsheets use whatever someone wants to put in a cell. I
need to look at how NAN is expressed.
Thanks,
Rich
From: | Rich Shepard <rshepard(at)appl-ecosys(dot)com> |
---|---|
To: | "pdxpug(at)postgresql(dot)org" <pdxpug(at)postgresql(dot)org> |
Subject: | Re: Entering '<1' and 'BDL' |
Date: | 2010-04-20 19:38:31 |
Message-ID: | alpine.LNX.2.00.1004201236300.4238@salmo.appl-ecosys.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pdxpug |
On Tue, 20 Apr 2010, Rich Shepard wrote:
> The database we're building uses floating point numbers, where
> appropriate. Spreadsheets use whatever someone wants to put in a cell. I
> need to look at how NAN is expressed.
However, this does raise a question that I need to answer: how to
distinguish between a sample taken, analyzed, and with a value below
detection limits and a sample not taken. There may be legitimate reasons for
not taking a sample, or it might have been missed inadvertently. Or, the
sameple might have been taken and analyzed but the results not entered in
the system.
We need to ponder this.
Rich
From: | Tom Keller <kellert(at)ohsu(dot)edu> |
---|---|
To: | Postgresql PDX_Users <pdxpug(at)postgresql(dot)org> |
Subject: | Re: Entering '<1' and 'BDL' |
Date: | 2010-04-20 19:49:50 |
Message-ID: | 02BF391A-F3C9-480A-8C9B-C108C15AA425@ohsu.edu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pdxpug |
Why distinguish "below detection limit" at the data level? I would use the value measured and enforce the ramification of the detection limit in subsequent analysis. Then use NULL for missing values only.
... or isn't that feasible?
Tom
kellert(at)ohsu(dot)edu<mailto:kellert(at)ohsu(dot)edu>
503-494-2442
On Apr 20, 2010, at 12:38 PM, Rich Shepard wrote:
On Tue, 20 Apr 2010, Rich Shepard wrote:
The database we're building uses floating point numbers, where
appropriate. Spreadsheets use whatever someone wants to put in a cell. I
need to look at how NAN is expressed.
However, this does raise a question that I need to answer: how to
distinguish between a sample taken, analyzed, and with a value below
detection limits and a sample not taken. There may be legitimate reasons for
not taking a sample, or it might have been missed inadvertently. Or, the
sameple might have been taken and analyzed but the results not entered in
the system.
We need to ponder this.
Rich
--
Sent via pdxpug mailing list (pdxpug(at)postgresql(dot)org<mailto:pdxpug(at)postgresql(dot)org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pdxpug
From: | Rich Shepard <rshepard(at)appl-ecosys(dot)com> |
---|---|
To: | Postgresql PDX_Users <pdxpug(at)postgresql(dot)org> |
Subject: | Re: Entering '<1' and 'BDL' |
Date: | 2010-04-20 20:25:10 |
Message-ID: | alpine.LNX.2.00.1004201318470.4238@salmo.appl-ecosys.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pdxpug |
On Tue, 20 Apr 2010, Tom Keller wrote:
> Why distinguish "below detection limit" at the data level? I would use the
> ... value measured and enforce the ramification of the detection limit in
> subsequent analysis. Then use NULL for missing values only. or isn't that
> feasible?
Tom,
There is no measured value when it falls below the detection limit of the
instrument. Consider: you have a 6' measuring tape and you use that to
measure the length of your car. The car extends beyond the length of the
tape. What value do you enter for car length?
There are actually three situations that may occur with permit compliance
monitoring, and each must be recorded as distinct from the others. One, the
sample was not taken, or analyzed, or entered in the system. Two, the sample
was taken and analyzed, but the value falls outside the measuring range of
the instrument (or analytical method) used. Three, the sample could not be
taken because the stream channel was dry (ergo, no water to sample), the
pond frozen (ditto), or the machinery shut down and the plant not operating.
The latter two are expected and reasonable occurrences; the first one is a
no-no.
Rich
From: | Tom Keller <kellert(at)ohsu(dot)edu> |
---|---|
To: | Postgresql PDX_Users <pdxpug(at)postgresql(dot)org> |
Subject: | Re: Entering '<1' and 'BDL' |
Date: | 2010-04-20 20:49:10 |
Message-ID: | DABCCC8C-CC30-4BF5-AD5B-BAB118839696@ohsu.edu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pdxpug |
Rich,
I was thinking of something like a UV absorbance of a sample too dilute to accurately measure. You would get a reading of 0.04 or something, but since your spectrophotometer is only accurate between say, 0.4 - 3.0, it's essentially noise. Nonetheless, I can tell that by any value below my detection limit, so the value, though not accurate, would distinguish between not measured and measured but below the detection limit.
Similarly, I should think, if you measured something beyond, or off-scale, you could tell that from the value entered too. So for the spectrophotometer example, below 0.4 is noise and above 3.0 is off-scale. So you don't know what their accurate values are, you know they were "too low" to detect, or "too high" to measure with your instrument. But you know something. You then need to filter the records that can't be used for quantitative analysis. But retain them when calculating work done or other qualitative stuff.
That would allow you to use your current schema. Otherwise you need to complicate your schema, don't you?
Tom
kellert(at)ohsu(dot)edu<mailto:kellert(at)ohsu(dot)edu>
503-494-2442
On Apr 20, 2010, at 1:25 PM, Rich Shepard wrote:
On Tue, 20 Apr 2010, Tom Keller wrote:
Why distinguish "below detection limit" at the data level? I would use the
... value measured and enforce the ramification of the detection limit in
subsequent analysis. Then use NULL for missing values only. or isn't that
feasible?
Tom,
There is no measured value when it falls below the detection limit of the
instrument. Consider: you have a 6' measuring tape and you use that to
measure the length of your car. The car extends beyond the length of the
tape. What value do you enter for car length?
There are actually three situations that may occur with permit compliance
monitoring, and each must be recorded as distinct from the others. One, the
sample was not taken, or analyzed, or entered in the system. Two, the sample
was taken and analyzed, but the value falls outside the measuring range of
the instrument (or analytical method) used. Three, the sample could not be
taken because the stream channel was dry (ergo, no water to sample), the
pond frozen (ditto), or the machinery shut down and the plant not operating.
The latter two are expected and reasonable occurrences; the first one is a
no-no.
Rich
--
Sent via pdxpug mailing list (pdxpug(at)postgresql(dot)org<mailto:pdxpug(at)postgresql(dot)org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pdxpug
From: | Rich Shepard <rshepard(at)appl-ecosys(dot)com> |
---|---|
To: | Postgresql PDX_Users <pdxpug(at)postgresql(dot)org> |
Subject: | Re: Entering '<1' and 'BDL' |
Date: | 2010-04-20 20:59:49 |
Message-ID: | alpine.LNX.2.00.1004201351370.4238@salmo.appl-ecosys.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pdxpug |
On Tue, 20 Apr 2010, Tom Keller wrote:
> I was thinking of something like a UV absorbance of a sample too dilute to
> accurately measure.
Tom,
Our current client has this situation: the amount of SO2 (a surrogate for
Hg) coming out the roaster stacks is so low that the calibration is at the
low end of the measuring instrument. Therefore, calibrations fail on a
frequent basis. However, the EPA defines the scale to be used so they're
caught in a bind.
> You would get a reading of 0.04 or something, but since your
> spectrophotometer is only accurate between say, 0.4 - 3.0, it's
> essentially noise. Nonetheless, I can tell that by any value below my
> detection limit, so the value, though not accurate, would distinguish
> between not measured and measured but below the detection limit.
So, if you have results you know are below 0.4, what do you enter as a
value in the database? And, if the meter pegs at 3.0 and is trying to bend
around the pin, what value to you enter into the table for that reading? The
answers have to be understood by the techs who will be entering the data via
a Web browser form.
> Similarly, I should think, if you measured something beyond, or off-scale,
> you could tell that from the value entered too. So for the
> spectrophotometer example, below 0.4 is noise and above 3.0 is off-scale.
> So you don't know what their accurate values are, you know they were "too
> low" to detect, or "too high" to measure with your instrument.
Gimme' a f'instance of how you would record these readings in a table
where that attribute is of type NUMERIC. That's what we're looking for. :-)
We need a way to record such instances that make sense, are consistent and
easy for techs and mill workers to use, don't confuse data analyses (such as
averages and trends), and would be acceptable to regulators.
Rich