Re: Finding overlapping records

Lists: Postg토토 캔SQL : Postg토토 캔SQL 메일 링리스트 : 2009-12-10 이후 AustinPug 17:15
From: Frank Sheiness <frank(at)dough(dot)net>
To: austinpug(at)postgresql(dot)org
Subject: Finding overlapping records
Date: 2009-12-10 08:30:14
Message-ID: 20091210083014.GA31938@forbidden.texas.rr.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: austinpug


Hello,

I had a question that I meant to bring up during the meeting but forgot
until the end..

I have these tables representing DHCP leases:

CREATE TABLE ips (
ip serial PRIMARY KEY,
address inet NOT NULL
);

CREATE TABLE leases (
lease serial PRIMARY KEY,
ip integer NOT NULL REFERENCES ips (ip) ON UPDATE CASCADE ON DELETE RESTRICT,
mac macaddr NOT NULL,
starts timestamp NOT NULL,
ends timestamp NOT NULL,
server integer NOT NULL,
UNIQUE (ip, mac, starts, ends)
);

To optimize the number of records, I would like to make it so when I insert a
new lease, it checks to see if there's a relevant overlapping lease for that
mac/ip combination. If so, I want to update the end time of the old record
with the end time of the new record instead of doing the insert.

Currently, I'm accomplishing this with a trigger that calls this function:

CREATE FUNCTION optimize_lease_fx() RETURNS trigger AS $optimize_lease_fx$
DECLARE
id integer;
BEGIN
SELECT lease INTO id FROM leases WHERE mac = NEW.mac and
ip = NEW.ip and NEW.starts between starts AND ends AND
NEW.ends >= ends;
IF NOT FOUND THEN
RETURN NEW;
ELSE
EXECUTE 'UPDATE leases SET ends = '
|| quote_literal(NEW.ends)
|| ' where lease = '
|| quote_literal(id);
RETURN NULL;
END IF;
END
$optimize_lease_fx$ LANGUAGE plpgsql;

It works, but I'm wondering if there's a better way to do this. I'll be
flushing data when it's older than say.. 3 months. Also, keep in mind that
some pre-optimized data may look like this (below). The new rows will
overlap with each other and/or existing rows.

lease | ip | mac | starts | ends | server
--------+-------+-------------------+---------------------+---------------------+---------
255646 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 04:21:02 | 2009-12-10 06:21:02 | 400001
256948 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:13:38 | 2009-12-10 08:13:38 | 400001
256951 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:13:42 | 2009-12-10 08:13:42 | 400001
256954 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:13:50 | 2009-12-10 08:13:50 | 400001
257073 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:17:34 | 2009-12-10 08:17:34 | 400001
257077 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:17:38 | 2009-12-10 08:17:38 | 400001
257084 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:17:47 | 2009-12-10 08:17:47 | 400001
257157 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:19:33 | 2009-12-10 08:19:33 | 400001
257158 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:19:37 | 2009-12-10 08:19:37 | 400001
257168 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:19:44 | 2009-12-10 08:19:44 | 400001
257212 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:20:31 | 2009-12-10 08:20:31 | 400001
257215 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:20:34 | 2009-12-10 08:20:34 | 400001
257217 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:20:43 | 2009-12-10 08:20:43 | 400001
257230 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:20:59 | 2009-12-10 08:20:59 | 400001
257234 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:21:02 | 2009-12-10 08:21:02 | 400001
257236 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:21:03 | 2009-12-10 08:21:03 | 400001

Thanks,
Frank


From: Jon Erdman <postgresql(at)thewickedtribe(dot)net>
To: Frank Sheiness <frank(at)dough(dot)net>
Cc: austinpug(at)postgresql(dot)org
Subject: Re: Finding overlapping records
Date: 2009-12-10 17:12:32
Message-ID: 4B212C00.4010608@thewickedtribe.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg윈 토토SQL : Postg윈 토토SQL 메일 링리스트 : 2009-12-10 17:12 이후 AustinPug

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Frank,

First of all, you've got a big hole in your overlap check. You're only
checking for (new span is --- existing is +++):

*+++++++*
*-------*

when you really need to check for:

*++++++++*
*---------*
*-------*
*--------------*
*---*

/me pulls out Celko's SQL For Smarties...

So what you would naturally write is perhaps (s1 and e1 are start and
end of existing span, s2 and e2 are the new span):

WHERE
s2 between s1 and e1
OR e2 between s1 and e1
OR s1 between s2 and e2
OR e1 between s2 and e2;

which is a bit long and ugly. There's a shortcut you can take, here's
how you would search for things that *don't* overlap:

*+++++*
*----*
*-----*

so you can write it as:

WHERE NOT ((e2 < s1) OR (s2 > e1));

which is *much* cleaner, no? ;)

Credit goes to Joe Celko, SQL for Smarties, Chapter 13: Between and
Overlaps Predicate, 13.2 Overlaps Predicate, page 279.

Postgres actually has OVERLAPS, so you can just say:

WHERE (s2, e2) OVERLAPS (s1, e1);

however, at least in 8.1, that doesn't use the indexes on the start_date
and end_date. The shortcut above does use those indexes and is nice and
fast.

You should test and see if 8.3 or 8.4 will use the indexes for OVERLAPS
though...
- --

Jon T Erdman

Chief Information Officer voice: (210) 400-5717
Progressive Practice, Inc. jon(at)progressivepractice(dot)com
P.O. Box 17288 www.progressivepractice.com
Rochester, NY 14617

Frank Sheiness wrote:
> Hello,
>
> I had a question that I meant to bring up during the meeting but forgot
> until the end..
>
> I have these tables representing DHCP leases:
>
> CREATE TABLE ips (
> ip serial PRIMARY KEY,
> address inet NOT NULL
> );
>
> CREATE TABLE leases (
> lease serial PRIMARY KEY,
> ip integer NOT NULL REFERENCES ips (ip) ON UPDATE CASCADE ON DELETE RESTRICT,
> mac macaddr NOT NULL,
> starts timestamp NOT NULL,
> ends timestamp NOT NULL,
> server integer NOT NULL,
> UNIQUE (ip, mac, starts, ends)
> );
>
> To optimize the number of records, I would like to make it so when I insert a
> new lease, it checks to see if there's a relevant overlapping lease for that
> mac/ip combination. If so, I want to update the end time of the old record
> with the end time of the new record instead of doing the insert.
>
> Currently, I'm accomplishing this with a trigger that calls this function:
>
> CREATE FUNCTION optimize_lease_fx() RETURNS trigger AS $optimize_lease_fx$
> DECLARE
> id integer;
> BEGIN
> SELECT lease INTO id FROM leases WHERE mac = NEW.mac and
> ip = NEW.ip and NEW.starts between starts AND ends AND
> NEW.ends >= ends;
> IF NOT FOUND THEN
> RETURN NEW;
> ELSE
> EXECUTE 'UPDATE leases SET ends = '
> || quote_literal(NEW.ends)
> || ' where lease = '
> || quote_literal(id);
> RETURN NULL;
> END IF;
> END
> $optimize_lease_fx$ LANGUAGE plpgsql;
>
> It works, but I'm wondering if there's a better way to do this. I'll be
> flushing data when it's older than say.. 3 months. Also, keep in mind that
> some pre-optimized data may look like this (below). The new rows will
> overlap with each other and/or existing rows.
>
>
> lease | ip | mac | starts | ends | server
> --------+-------+-------------------+---------------------+---------------------+---------
> 255646 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 04:21:02 | 2009-12-10 06:21:02 | 400001
> 256948 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:13:38 | 2009-12-10 08:13:38 | 400001
> 256951 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:13:42 | 2009-12-10 08:13:42 | 400001
> 256954 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:13:50 | 2009-12-10 08:13:50 | 400001
> 257073 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:17:34 | 2009-12-10 08:17:34 | 400001
> 257077 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:17:38 | 2009-12-10 08:17:38 | 400001
> 257084 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:17:47 | 2009-12-10 08:17:47 | 400001
> 257157 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:19:33 | 2009-12-10 08:19:33 | 400001
> 257158 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:19:37 | 2009-12-10 08:19:37 | 400001
> 257168 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:19:44 | 2009-12-10 08:19:44 | 400001
> 257212 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:20:31 | 2009-12-10 08:20:31 | 400001
> 257215 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:20:34 | 2009-12-10 08:20:34 | 400001
> 257217 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:20:43 | 2009-12-10 08:20:43 | 400001
> 257230 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:20:59 | 2009-12-10 08:20:59 | 400001
> 257234 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:21:02 | 2009-12-10 08:21:02 | 400001
> 257236 | 22625 | 00:0d:36:e8:4b:d2 | 2009-12-10 06:21:03 | 2009-12-10 08:21:03 | 400001
>
>
>
> Thanks,
> Frank
>

-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAkshLAAACgkQRAk1+p0GhSHjEACfRVbMyUIOxAQCfFpsjwe+Yp5f
XREAn1GEUsrriKu9z3giBmxGL/5SxtOH
=rm2l
-----END PGP SIGNATURE-----


From: Jon Erdman <postgresql(at)thewickedtribe(dot)net>
To: Frank Sheiness <frank(at)dough(dot)net>
Cc: austinpug(at)postgresql(dot)org
Subject: Re: Finding overlapping records
Date: 2009-12-10 17:15:38
Message-ID: 4B212CBA.9010602@thewickedtribe.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 캔SQL : Postg토토 캔SQL 메일 링리스트 : 2009-12-10 이후 AustinPug 17:15

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Doh. Just read your whole message, so I see why you were only checking
for the one case. I think the full discussion is still worthwhile though.

Jon Erdman wrote:
>
> Frank,
>
> First of all, you've got a big hole in your overlap check. You're only
> checking for (new span is --- existing is +++):
>
> *+++++++*
> *-------*
>
> when you really need to check for:
>
> *++++++++*
> *---------*
> *-------*
> *--------------*
> *---*
>
> /me pulls out Celko's SQL For Smarties...
>
> So what you would naturally write is perhaps (s1 and e1 are start and
> end of existing span, s2 and e2 are the new span):
>
> WHERE
> s2 between s1 and e1
> OR e2 between s1 and e1
> OR s1 between s2 and e2
> OR e1 between s2 and e2;
>
> which is a bit long and ugly. There's a shortcut you can take, here's
> how you would search for things that *don't* overlap:
>
> *+++++*
> *----*
> *-----*
>
> so you can write it as:
>
> WHERE NOT ((e2 < s1) OR (s2 > e1));
>
> which is *much* cleaner, no? ;)
>
> Credit goes to Joe Celko, SQL for Smarties, Chapter 13: Between and
> Overlaps Predicate, 13.2 Overlaps Predicate, page 279.
>
> Postgres actually has OVERLAPS, so you can just say:
>
> WHERE (s2, e2) OVERLAPS (s1, e1);
>
> however, at least in 8.1, that doesn't use the indexes on the start_date
> and end_date. The shortcut above does use those indexes and is nice and
> fast.
>
> You should test and see if 8.3 or 8.4 will use the indexes for OVERLAPS
> though...

- --

Jon T Erdman

Chief Information Officer voice: (210) 400-5717
Progressive Practice, Inc. jon(at)progressivepractice(dot)com
P.O. Box 17288 www.progressivepractice.com
Rochester, NY 14617

-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAkshLLoACgkQRAk1+p0GhSFVoQCePh1qJeljm6M294ItqKmO36a9
mvoAn2qo1uzd0keVZe8XfH6Zg5DI6XS1
=0/8a
-----END PGP SIGNATURE-----


From: Frank Sheiness <frank(at)korcett(dot)com>
To: austinpug(at)postgresql(dot)org
Cc: Jon Erdman <postgresql(at)thewickedtribe(dot)net>
Subject: Re: Finding overlapping records
Date: 2009-12-10 20:42:56
Message-ID: 20091210204256.GA44979@forbidden.texas.rr.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: austinpug

I think you are right about the other cases. Especially about the case
where the newer lease resides entirely inside the older lease. I noticed
that one right after I sent the email last night and updated my trigger for
it.

As for the other ones, I was just depending on the order in the lease file
to protect me.

We're still on 8.2 for now. I started to look at the period data type from
John Davis and will play with it today.

On Thu, Dec 10, 2009 at 11:15:38AM -0600, Jon Erdman wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> Doh. Just read your whole message, so I see why you were only checking
> for the one case. I think the full discussion is still worthwhile though.
>
> Jon Erdman wrote:
> >
> > Frank,
> >
> > First of all, you've got a big hole in your overlap check. You're only
> > checking for (new span is --- existing is +++):
> >
> > *+++++++*
> > *-------*
> >
> > when you really need to check for:
> >
> > *++++++++*
> > *---------*
> > *-------*
> > *--------------*
> > *---*
> >
> > /me pulls out Celko's SQL For Smarties...
> >
> > So what you would naturally write is perhaps (s1 and e1 are start and
> > end of existing span, s2 and e2 are the new span):
> >
> > WHERE
> > s2 between s1 and e1
> > OR e2 between s1 and e1
> > OR s1 between s2 and e2
> > OR e1 between s2 and e2;
> >
> > which is a bit long and ugly. There's a shortcut you can take, here's
> > how you would search for things that *don't* overlap:
> >
> > *+++++*
> > *----*
> > *-----*
> >
> > so you can write it as:
> >
> > WHERE NOT ((e2 < s1) OR (s2 > e1));
> >
> > which is *much* cleaner, no? ;)
> >
> > Credit goes to Joe Celko, SQL for Smarties, Chapter 13: Between and
> > Overlaps Predicate, 13.2 Overlaps Predicate, page 279.
> >
> > Postgres actually has OVERLAPS, so you can just say:
> >
> > WHERE (s2, e2) OVERLAPS (s1, e1);
> >
> > however, at least in 8.1, that doesn't use the indexes on the start_date
> > and end_date. The shortcut above does use those indexes and is nice and
> > fast.
> >
> > You should test and see if 8.3 or 8.4 will use the indexes for OVERLAPS
> > though...
>
> - --
>
> Jon T Erdman
>
> Chief Information Officer voice: (210) 400-5717
> Progressive Practice, Inc. jon(at)progressivepractice(dot)com
> P.O. Box 17288 www.progressivepractice.com
> Rochester, NY 14617
>
>
> -----BEGIN PGP SIGNATURE-----
>
> iEYEARECAAYFAkshLLoACgkQRAk1+p0GhSFVoQCePh1qJeljm6M294ItqKmO36a9
> mvoAn2qo1uzd0keVZe8XfH6Zg5DI6XS1
> =0/8a
> -----END PGP SIGNATURE-----


From: Jon Erdman <postgresql(at)thewickedtribe(dot)net>
To: Frank Sheiness <frank(at)korcett(dot)com>
Cc: austinpug(at)postgresql(dot)org
Subject: Re: Finding overlapping records
Date: 2009-12-10 20:52:08
Message-ID: 4B215F78.9030505@thewickedtribe.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: austinpug

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hmm...reading my message now, I realize this bit might have been
unclear. To find things that *don't* overlap is straightforward, you can
say:

WHERE (e2 < s1) OR (s2 > e1);

so the trick is, you can simply invert that to find things that *do*
overlap:

WHERE NOT ((e2 < s1) OR (s2 > e1));

and it uses indexes and catches all cases! ;)

Frank Sheiness wrote:
> I think you are right about the other cases. Especially about the case
> where the newer lease resides entirely inside the older lease. I noticed
> that one right after I sent the email last night and updated my trigger for
> it.
>
> As for the other ones, I was just depending on the order in the lease file
> to protect me.
>
> We're still on 8.2 for now. I started to look at the period data type from
> John Davis and will play with it today.
>
> On Thu, Dec 10, 2009 at 11:15:38AM -0600, Jon Erdman wrote:
>
> Doh. Just read your whole message, so I see why you were only checking
> for the one case. I think the full discussion is still worthwhile though.
>
> Jon Erdman wrote:
>>>> Frank,
>>>>
>>>> First of all, you've got a big hole in your overlap check. You're only
>>>> checking for (new span is --- existing is +++):
>>>>
>>>> *+++++++*
>>>> *-------*
>>>>
>>>> when you really need to check for:
>>>>
>>>> *++++++++*
>>>> *---------*
>>>> *-------*
>>>> *--------------*
>>>> *---*
>>>>
>>>> /me pulls out Celko's SQL For Smarties...
>>>>
>>>> So what you would naturally write is perhaps (s1 and e1 are start and
>>>> end of existing span, s2 and e2 are the new span):
>>>>
>>>> WHERE
>>>> s2 between s1 and e1
>>>> OR e2 between s1 and e1
>>>> OR s1 between s2 and e2
>>>> OR e1 between s2 and e2;
>>>>
>>>> which is a bit long and ugly. There's a shortcut you can take, here's
>>>> how you would search for things that *don't* overlap:
>>>>
>>>> *+++++*
>>>> *----*
>>>> *-----*
>>>>
>>>> so you can write it as:
>>>>
>>>> WHERE NOT ((e2 < s1) OR (s2 > e1));
>>>>
>>>> which is *much* cleaner, no? ;)
>>>>
>>>> Credit goes to Joe Celko, SQL for Smarties, Chapter 13: Between and
>>>> Overlaps Predicate, 13.2 Overlaps Predicate, page 279.
>>>>
>>>> Postgres actually has OVERLAPS, so you can just say:
>>>>
>>>> WHERE (s2, e2) OVERLAPS (s1, e1);
>>>>
>>>> however, at least in 8.1, that doesn't use the indexes on the start_date
>>>> and end_date. The shortcut above does use those indexes and is nice and
>>>> fast.
>>>>
>>>> You should test and see if 8.3 or 8.4 will use the indexes for OVERLAPS
>>>> though...

- --

Jon T Erdman

Chief Information Officer voice: (210) 400-5717
Progressive Practice, Inc. jon(at)progressivepractice(dot)com
P.O. Box 17288 www.progressivepractice.com
Rochester, NY 14617

-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAkshX3gACgkQRAk1+p0GhSHy4QCdH7jvcQUVlaATLdD2GXeqSEsC
gsAAn1KkrHcfNuyBrQqWONWEFNYM3c12
=Ux3W
-----END PGP SIGNATURE-----


From: Jon Erdman <postgresql(at)thewickedtribe(dot)net>
To:
Cc: austinpug(at)postgresql(dot)org
Subject: Re: Finding overlapping records
Date: 2010-02-26 04:41:16
Message-ID: 4B8750EC.8030308@thewickedtribe.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 사이트 순위SQL : Postg토토 사이트 순위SQL 메일 링리스트 : 2010-02-26 이후 AustinPug 04:41

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Some additional clarification from a email directly to me...
- --

Jon T Erdman (aka StuckMojo)
PostgreSQL Zealot

- -------- Original Message --------
Subject: Re: in response to finding overlapping records
Date: Wed, 30 Dec 2009 16:46:42 -0500 (EST)
From: Michael Alaimo <malaimo(at)sesda2(dot)com>
To: Jon Erdman <jon(at)progressivepractice(dot)com>
References: <48435(dot)128(dot)183(dot)191(dot)6(dot)1262201660(dot)squirrel(at)mail(dot)sesda2(dot)com>
<4B3BC174(dot)3040206(at)progressivepractice(dot)com>

> Michael Alaimo wrote:
>> Hello Jon,
>>
>> I was reading the mail archives located here:
>> http://archives.postgresql.org/austinpug/2009-12/msg00016.php.
>>
>> I cannot seem to get the same results with your query as with overlaps.
>> Are you sure thats what was in the book SQL for smarties?
>>
>> Just checking. I am having a really difficult time with overlaps and no
>> index support..... My queries are dragging.
>
> Ummm...I may have done it slightly different than OVERLAPS as far as
> inclusive vs exclusive (i.e. >= vs >)...let me check...actually it looks
> like I did it right. Could you send me your table definitions and the
> two versions of the queries you're using? As well as an example of the
> differing results?
>
> Hmm, upon further inspection, he does mention that: "please remember
> that the BETWEEN predicate will include the end point of an interval,
> and the OVERLAPS predicate will not."
>
> He also says: The result of the <overlaps predicate> is formally defined
> as the result of the following expression:
>
> (s1 > s2 AND NOT (s1 >= t2 AND t1 >= t2))
> OR (s2 > s1 AND NOT (s2 >= t1 AND t2 >= t1))
> OR (s1 = s2 AND (t1 <> t2 OR t1 = t2))
>
> where s1 and s2 are the starting times of the two time periods, and t1
> and t2 are their termination times.
> --
>
> Jon T Erdman
>
> Chief Information Officer voice: (210) 400-5717
> Progressive Practice, Inc. jon(at)progressivepractice(dot)com
> P.O. Box 17288 www.progressivepractice.com
> Rochester, NY 14617
>
>
>
>

Hello Jon,

Thanks for the reply! Your more in depth solution helped me use your
solution posted on the web. It works like a charm now. Its all fast just
like you said :)

Much thanks.

Mike

-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAkuHUOwACgkQRAk1+p0GhSFd7ACfQdILgyC6gRYLP2lfK/n6VwWl
zrAAniABEmsS636nkujwrBVca/zMSFiZ
=PneT
-----END PGP SIGNATURE-----