Re: [PROPOSAL] Improvements of Hunspell dictionaries support

Lists: pgsql-hackers
From: Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: [PATCH] we have added support for box type in SP-GiST index
Date: 2015-10-31 20:49:54
Message-ID: 56352972.9020608@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 핫SQL :

Hello, Hacker.

* [PATCH] add a box index to sp-gist

We have extended sp-gist with an index that keeps track of boxes

We have used ideas underlying sp-gist range implementation to
represent 2D boxes as points in 4D space. We use quad tree
analogue, but in 4-dimensional space. We call this tree q4d. Each
node of this tree is a box (a point in 4D space) which splits space
in 16 hyperrectangles.

Rationale: r-tree assumes that boxes we're trying to index don't
overlap much. When this assumption fails, r-tree performs badly,
while our proposal to represent a rectangle as a point in 4D space
solves this problem.

NB: the index uses traversalValue introduced in a separate patch.

* [PATCH] add traversalValue in sp-gist

During implementation of box index for sp-gist we saw that we only
keep rectangles, but to determine traversal direction we may need
to know boundaries of a hyperrectangle. So we calculate them
on the fly and store them in traversalValue, available
when traversing child nodes, because otherwise we would't be able to
calculate them from inside the inner_consistent function call.

This patch was written by Teodor Sigaev.

* [PATCH] change reconstructValue -> traversalValue in range_spgist.c

During traversal, local context is
insufficient to pick traversal direction. As of now, this is worked
around with the help of reconstructValue. reconstructValue only
stores data of the same type as a tree node, that is, a range.

We have written a patch that calculates auxillary values and makes
them accessible during traversal.

We then use traversalValue in a new box index and change
range_spgist.c to use it in place of reconstructValue.

NB: apply this patch after traversalValue patch.

Attachment Content-Type Size
traversalValue.patch text/plain 6.4 KB
q4d.patch text/plain 31.8 KB
range.patch text/plain 2.5 KB

From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2015-10-31 21:15:40
Message-ID: CAF4Au4zy7S_bvFcU+W_e_R9UZKJ7kzRQUNHix8xFYYmzjMY_xQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: 503 토토 베이 페치 실패

On Sat, Oct 31, 2015 at 9:49 PM, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru
> wrote:

> Hello, Hacker.
>
> * [PATCH] add a box index to sp-gist
>
> We have extended sp-gist with an index that keeps track of boxes
>
> We have used ideas underlying sp-gist range implementation to
> represent 2D boxes as points in 4D space. We use quad tree
> analogue, but in 4-dimensional space. We call this tree q4d. Each
> node of this tree is a box (a point in 4D space) which splits space
> in 16 hyperrectangles.
>
> Rationale: r-tree assumes that boxes we're trying to index don't
> overlap much. When this assumption fails, r-tree performs badly,
> while our proposal to represent a rectangle as a point in 4D space
> solves this problem.
>
> NB: the index uses traversalValue introduced in a separate patch.
>
> * [PATCH] add traversalValue in sp-gist
>
> During implementation of box index for sp-gist we saw that we only
> keep rectangles, but to determine traversal direction we may need
> to know boundaries of a hyperrectangle. So we calculate them
> on the fly and store them in traversalValue, available
> when traversing child nodes, because otherwise we would't be able to
> calculate them from inside the inner_consistent function call.
>
> This patch was written by Teodor Sigaev.
>
> * [PATCH] change reconstructValue -> traversalValue in range_spgist.c
>
> During traversal, local context is
> insufficient to pick traversal direction. As of now, this is worked
> around with the help of reconstructValue. reconstructValue only
> stores data of the same type as a tree node, that is, a range.
>
> We have written a patch that calculates auxillary values and makes
> them accessible during traversal.
>
> We then use traversalValue in a new box index and change
> range_spgist.c to use it in place of reconstructValue.
>
> NB: apply this patch after traversalValue patch.
>
>
Did you forget to show us performance numbers ?

>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-01-04 19:10:52
Message-ID: 20160104191052.GA187050@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

CC'ing Teodor because he's author of one of the patches.

Alexander Lebedev wrote:
> Hello, Hacker.

So, can we have a more thorough explanation of what is the point of this
patch? If it is supposed to improve the performance of searching for
boxes, can we see measurements from some benchmark? Do the patches
still apply, or do you need to rebase? If so, please post an updated
version.

I'm setting this patch as Waiting on Author.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>, Oleg Bartunov <obartunov(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PROPOSAL] Improvements of Hunspell dictionaries support
Date: 2016-01-09 02:38:35
Message-ID: 20160109023835.GA670563@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Artur Zakirov wrote:

> Now almost all dictionaries are loaded into PostgreSQL. But the da_dk
> dictionary does not load. I see the following error:
>
> ERROR: invalid regular expression: quantifier operand invalid
> CONTEXT: line 439 of configuration file
> "/home/artur/progs/pgsql/share/tsearch_data/da_dk.affix": "SFX 55 0 s
> +GENITIV
>
> If you open the affix file in editor you can see that there is incorrect
> format of the affix 55 in 439 line (screen1.png):

[ another email ]

> I also had implemented a patch that fixes an error from the e-mail
> http://www.postgresql.org/message-id/562E1073.8030805@postgrespro.ru
> This patch just ignore that error.

I think it's a bad idea to just ignore these syntax errors. This affix
file is effectively corrupt, after all, so it seems a bad idea that we
need to cope with it. I think it would be better to raise the error
normally and instruct the user to fix the file; obviously it's better if
the upstream provider of the file fixes it.

Now, if there is proof somewhere that the file is correct, then the code
must cope in some reasonable way. But in any case I don't think this
change is acceptable ... it can only cause pain, in the long run.

> *** 429,443 **** NIAddAffix(IspellDict *Conf, int flag, char flagflags, const char *mask, const c
> err = pg_regcomp(&(Affix->reg.regex), wmask, wmasklen,
> REG_ADVANCED | REG_NOSUB,
> DEFAULT_COLLATION_OID);
> if (err)
> ! {
> ! char errstr[100];
> !
> ! pg_regerror(err, &(Affix->reg.regex), errstr, sizeof(errstr));
> ! ereport(ERROR,
> ! (errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
> ! errmsg("invalid regular expression: %s", errstr)));
> ! }
> }
>
> Affix->flagflags = flagflags;
> --- 429,437 ----
> err = pg_regcomp(&(Affix->reg.regex), wmask, wmasklen,
> REG_ADVANCED | REG_NOSUB,
> DEFAULT_COLLATION_OID);
> + /* Ignore regular expression error and do not add wrong affix */
> if (err)
> ! return;
> }
>
> Affix->flagflags = flagflags;

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PROPOSAL] Improvements of Hunspell dictionaries support
Date: 2016-01-09 17:42:05
Message-ID: 5691466D.4020807@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09.01.2016 05:38, Alvaro Herrera wrote:
> Artur Zakirov wrote:
>
>> Now almost all dictionaries are loaded into PostgreSQL. But the da_dk
>> dictionary does not load. I see the following error:
>>
>> ERROR: invalid regular expression: quantifier operand invalid
>> CONTEXT: line 439 of configuration file
>> "/home/artur/progs/pgsql/share/tsearch_data/da_dk.affix": "SFX 55 0 s
>> +GENITIV
>>
>> If you open the affix file in editor you can see that there is incorrect
>> format of the affix 55 in 439 line (screen1.png):
>
> [ another email ]
>
>> I also had implemented a patch that fixes an error from the e-mail
>> http://www.postgresql.org/message-id/562E1073.8030805@postgrespro.ru
>> This patch just ignore that error.
> I think it's a bad idea to just ignore these syntax errors. This affix
> file is effectively corrupt, after all, so it seems a bad idea that we
> need to cope with it. I think it would be better to raise the error
> normally and instruct the user to fix the file; obviously it's better if
> the upstream provider of the file fixes it.
>
> Now, if there is proof somewhere that the file is correct, then the code
> must cope in some reasonable way. But in any case I don't think this
> change is acceptable ... it can only cause pain, in the long run.
This error is raised in Danish dictionary because of erroneous entry in
the .affix file. I sent a bug-report to developer. He fixed this bug.
Corrected dictionary can be downloaded from LibreOffice site.

I undo the changes and the error will be raised. I will update the patch
soon.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-01-28 00:04:48
Message-ID: 20160128000448.GA712457@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 커뮤니티SQL

Alexander Lebedev wrote:
> Hello, Hacker.
>
> * [PATCH] add a box index to sp-gist

I closed this patch as "returned with feedback" because the author
didn't reply in three months.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
Cc: Oleg Bartunov <obartunov(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PROPOSAL] Improvements of Hunspell dictionaries support
Date: 2016-01-28 11:19:02
Message-ID: 20160128111902.GA725449@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg롤 토토SQL :

Artur Zakirov wrote:

> I undo the changes and the error will be raised. I will update the patch
> soon.

I don't think you ever did this. I'm closing it now, but it sounds
useful stuff so please do resubmit for 2016-03.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Oleg Bartunov <obartunov(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PROPOSAL] Improvements of Hunspell dictionaries support
Date: 2016-01-28 11:50:24
Message-ID: 56AA0080.70606@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 28.01.2016 14:19, Alvaro Herrera wrote:
> Artur Zakirov wrote:
>
>> I undo the changes and the error will be raised. I will update the patch
>> soon.
>
> I don't think you ever did this. I'm closing it now, but it sounds
> useful stuff so please do resubmit for 2016-03.
>

I'm working on the patch. I wanted to send this changes after all changes.

This version of the patch has a top-level comment. Another comments I
will provides soon.

Also this patch has some changes with ternary operators.

> I don't think you ever did this. I'm closing it now, but it sounds
> useful stuff so please do resubmit for 2016-03.

Moved to next CF.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment Content-Type Size
hunspell_dict_v5.patch text/x-patch 29.0 KB

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-02-15 15:29:13
Message-ID: 56C1EEC9.2090406@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

It's very pity but author is not able to continue work on this patch, and I
would like to raise this flag.

I'd like to add some comments about patches:

traversalValue patch adds arbitrary value assoсiated with branch in SP-GiST tree
walk. Unlike to recostructedValue it could be just pointer, it havn't to be a
regular pgsql type. Also, it could be used independly from reconstructedValue.
This patch is used both following attached patches.

range patch just switchs using reconstructedValue to traversalValue in range
opclasses. reconstructedValue was used just because it was an acceptable
workaround in case of range type. Range opclase stores a full value in leafs and
doesn't need to use reconstructedValue to return tuple in index only scan.
See http://www.postgresql.org/message-id/5399.1343512250@sss.pgh.pa.us

q4d patch implements index over boxes using SP-GiST. Basic idea was an
observation, range opclass thinks about one-dimensional ranges as 2D points.
Following this idea, we can think that 2D box (what is 2 points or 4 numbers)
could represent a 4D point. We hoped that this approach will be much more
effective than traditional R-Tree in case of many overlaps in box's collection.
Performance results are coming shortly.

In near future (say, tommorrow) I hope to suggest a opclass over polygons based
on this approach.

Alexander Lebedev wrote:
> Hello, Hacker.
>
> * [PATCH] add a box index to sp-gist
>
> We have extended sp-gist with an index that keeps track of boxes
>
> We have used ideas underlying sp-gist range implementation to
> represent 2D boxes as points in 4D space. We use quad tree
> analogue, but in 4-dimensional space. We call this tree q4d. Each
> node of this tree is a box (a point in 4D space) which splits space
> in 16 hyperrectangles.
>
> Rationale: r-tree assumes that boxes we're trying to index don't
> overlap much. When this assumption fails, r-tree performs badly,
> while our proposal to represent a rectangle as a point in 4D space
> solves this problem.
>
> NB: the index uses traversalValue introduced in a separate patch.
>
> * [PATCH] add traversalValue in sp-gist
>
> During implementation of box index for sp-gist we saw that we only
> keep rectangles, but to determine traversal direction we may need
> to know boundaries of a hyperrectangle. So we calculate them
> on the fly and store them in traversalValue, available
> when traversing child nodes, because otherwise we would't be able to
> calculate them from inside the inner_consistent function call.
>
> This patch was written by Teodor Sigaev.
>
> * [PATCH] change reconstructValue -> traversalValue in range_spgist.c
>
> During traversal, local context is
> insufficient to pick traversal direction. As of now, this is worked
> around with the help of reconstructValue. reconstructValue only
> stores data of the same type as a tree node, that is, a range.
>
> We have written a patch that calculates auxillary values and makes
> them accessible during traversal.
>
> We then use traversalValue in a new box index and change
> range_spgist.c to use it in place of reconstructValue.
>
> NB: apply this patch after traversalValue patch.
>
>
>
>

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
traversalValue-1.patch.gz application/x-gzip 1.9 KB
range-1.patch.gz application/x-gzip 1.1 KB
q4d-1.patch.gz application/x-gzip 6.9 KB

From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-02-29 20:17:39
Message-ID: CAF4Au4zxd2XOV0A__FU7xoHxSiwJzm1z2xhs-FFaT1DzB9ub3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 캔SQL :

On Mon, Feb 15, 2016 at 6:29 PM, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:

> It's very pity but author is not able to continue work on this patch, and
> I would like to raise this flag.
>
> I'd like to add some comments about patches:
>
> traversalValue patch adds arbitrary value assoсiated with branch in
> SP-GiST tree walk. Unlike to recostructedValue it could be just pointer, it
> havn't to be a regular pgsql type. Also, it could be used independly from
> reconstructedValue. This patch is used both following attached patches.
>
> range patch just switchs using reconstructedValue to traversalValue in
> range opclasses. reconstructedValue was used just because it was an
> acceptable workaround in case of range type. Range opclase stores a full
> value in leafs and doesn't need to use reconstructedValue to return tuple
> in index only scan.
> See http://www.postgresql.org/message-id/5399.1343512250@sss.pgh.pa.us
>
> q4d patch implements index over boxes using SP-GiST. Basic idea was an
> observation, range opclass thinks about one-dimensional ranges as 2D points.
> Following this idea, we can think that 2D box (what is 2 points or 4
> numbers) could represent a 4D point. We hoped that this approach will be
> much more effective than traditional R-Tree in case of many overlaps in
> box's collection.
> Performance results are coming shortly.
>

I made some benchmarks using two real-life datasets. Streets data set is
consists of 3691620 MBRs of russian streets with not much overlaps. As
expected, spgist is slower than rtree (gist), because of much bigger
number of blocks readed. Build time, however, is faster for spgist than
rtree (~ 1.3 times).

Bounds data set is consists of 7803499 MBRs of some objects and are highly
overlapped, see attached picture (crop with center in Moscow) of boxes.
Create index time (ms):
spgist: 47036.051
rtree: 68491.242

Size:
spgist: 505 MB
rtree: 663 MB
heap: 871 MB

Let's consider a small box in downtown of Moscow: box
'(37.607164,55.756489), (37.607797,55.756725)'.
&& оператор (returns 23958 boxes) :
spgist: 14.649
rtree: 23.715
seqscan: 924.344

@> operator (returns 23868 boxes):

spgist: 14.113
rtree: 26.853
seqscan: 909.147

<@ operator (returns 0 boxes):
spgist: 0..268 ms
rtree: 12.563

My conclusion is that for "spagetti" data new opclass for spgist is good to
have - it's smaller and faster (both, created index and search) that rtree
(gist based).

>
> In near future (say, tommorrow) I hope to suggest a opclass over polygons
> based on this approach.
>
>
Yes, it'd be interested.

>
> Alexander Lebedev wrote:
>
>> Hello, Hacker.
>>
>> * [PATCH] add a box index to sp-gist
>>
>> We have extended sp-gist with an index that keeps track of boxes
>>
>> We have used ideas underlying sp-gist range implementation to
>> represent 2D boxes as points in 4D space. We use quad tree
>> analogue, but in 4-dimensional space. We call this tree q4d. Each
>> node of this tree is a box (a point in 4D space) which splits space
>> in 16 hyperrectangles.
>>
>> Rationale: r-tree assumes that boxes we're trying to index don't
>> overlap much. When this assumption fails, r-tree performs badly,
>> while our proposal to represent a rectangle as a point in 4D space
>> solves this problem.
>>
>> NB: the index uses traversalValue introduced in a separate patch.
>>
>> * [PATCH] add traversalValue in sp-gist
>>
>> During implementation of box index for sp-gist we saw that we only
>> keep rectangles, but to determine traversal direction we may need
>> to know boundaries of a hyperrectangle. So we calculate them
>> on the fly and store them in traversalValue, available
>> when traversing child nodes, because otherwise we would't be able to
>> calculate them from inside the inner_consistent function call.
>>
>> This patch was written by Teodor Sigaev.
>>
>> * [PATCH] change reconstructValue -> traversalValue in range_spgist.c
>>
>> During traversal, local context is
>> insufficient to pick traversal direction. As of now, this is worked
>> around with the help of reconstructValue. reconstructValue only
>> stores data of the same type as a tree node, that is, a range.
>>
>> We have written a patch that calculates auxillary values and makes
>> them accessible during traversal.
>>
>> We then use traversalValue in a new box index and change
>> range_spgist.c to use it in place of reconstructValue.
>>
>> NB: apply this patch after traversalValue patch.
>>
>>
>>
>>
>>
> --
> Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
> WWW:
> http://www.sigaev.ru/
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

Attachment Content-Type Size
11148387_10207592076787467_3791931596034486026_o.jpg image/jpeg 488.8 KB

From: David Steele <david(at)pgmasters(dot)net>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <obartunov(at)gmail(dot)com>
Cc: Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-14 18:01:52
Message-ID: 56E6FC90.5000800@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2/15/16 10:29 AM, Teodor Sigaev wrote:

> It's very pity but author is not able to continue work on this patch,
> and I would like to raise this flag.
>
> I'd like to add some comments about patches:
>
> traversalValue patch adds arbitrary value assoсiated with branch in
> SP-GiST tree walk. Unlike to recostructedValue it could be just pointer,
> it havn't to be a regular pgsql type. Also, it could be used independly
> from reconstructedValue. This patch is used both following attached
> patches.
>
> range patch just switchs using reconstructedValue to traversalValue in
> range opclasses. reconstructedValue was used just because it was an
> acceptable workaround in case of range type. Range opclase stores a full
> value in leafs and doesn't need to use reconstructedValue to return
> tuple in index only scan.
> See http://www.postgresql.org/message-id/5399.1343512250@sss.pgh.pa.us
>
> q4d patch implements index over boxes using SP-GiST. Basic idea was an
> observation, range opclass thinks about one-dimensional ranges as 2D
> points.
> Following this idea, we can think that 2D box (what is 2 points or 4
> numbers) could represent a 4D point. We hoped that this approach will be
> much more effective than traditional R-Tree in case of many overlaps in
> box's collection.
> Performance results are coming shortly.

It appears that the issues raised in this thread have been addressed but
the patch still has not gone though a real review.

Anybody out there willing to take a crack at a review? All three
patches apply (with whitespace issues).

Thanks,
--
-David
david(at)pgmasters(dot)net


From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>, Emre Hasegeli <emre(at)hasegeli(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-15 08:13:13
Message-ID: CAF4Au4yG=QHzDDK-5fJR1zSD8_J03LuiPvGT67Ao9e42Mpt9Og@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg범퍼카 토토SQL

On Mon, Mar 14, 2016 at 9:01 PM, David Steele <david(at)pgmasters(dot)net> wrote:

> On 2/15/16 10:29 AM, Teodor Sigaev wrote:
>
> It's very pity but author is not able to continue work on this patch,
>> and I would like to raise this flag.
>>
>> I'd like to add some comments about patches:
>>
>> traversalValue patch adds arbitrary value assoсiated with branch in
>> SP-GiST tree walk. Unlike to recostructedValue it could be just pointer,
>> it havn't to be a regular pgsql type. Also, it could be used independly
>> from reconstructedValue. This patch is used both following attached
>> patches.
>>
>> range patch just switchs using reconstructedValue to traversalValue in
>> range opclasses. reconstructedValue was used just because it was an
>> acceptable workaround in case of range type. Range opclase stores a full
>> value in leafs and doesn't need to use reconstructedValue to return
>> tuple in index only scan.
>> See http://www.postgresql.org/message-id/5399.1343512250@sss.pgh.pa.us
>>
>> q4d patch implements index over boxes using SP-GiST. Basic idea was an
>> observation, range opclass thinks about one-dimensional ranges as 2D
>> points.
>> Following this idea, we can think that 2D box (what is 2 points or 4
>> numbers) could represent a 4D point. We hoped that this approach will be
>> much more effective than traditional R-Tree in case of many overlaps in
>> box's collection.
>> Performance results are coming shortly.
>>
>
> It appears that the issues raised in this thread have been addressed but
> the patch still has not gone though a real review.
>
> Anybody out there willing to take a crack at a review? All three patches
> apply (with whitespace issues).
>

Emre Hasegeli will review the patch.

>
> Thanks,
> --
> -David
> david(at)pgmasters(dot)net
>


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Oleg Bartunov <obartunov(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-18 15:47:36
Message-ID: CAE2gYzx+rwkH94ODU3nDDRPCMnu96-skQZO+=EMTvofXxU79WQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg스포츠 토토 베트맨SQL

Here are my first comments. I haven't read the actual index
implementation, yet.

I think traversal value is a useful addition. It is nice that the
implementation for the range types is also changed. My questions
about them are:

Do reconstructedValues is required now? Wouldn't it be cleaner to use
the new field on the prefix tree implementation, too?

We haven't had specific memory context for reconstructedValues. Why
is it required for the new field?

> + Memory for <structfield>traversalValues</> should be allocated in
> + the default context, but make sure to switch to
> + <structfield>traversalMemoryContext</> before allocating memory
> + for pointers themselves.

This sentence is not understandable. Shouldn't it say "the memory
should *not* be allocated in the default context"? What are the
"pointers themselves"?

The box opclass is broken because of the floating point arithmetics of
the box type. You can reproduce it with the attached script. We
really need to fix the geometric types, before building more on them.
They fail to serve the only purpose they are useful for, demonstrating
features.

It think the opclass should support the cross data type operators like
box @> point. Ideally we should do it by using multiple opclasses in
an opfamily. The floating problem will bite us again in here, because
some of the operators are not using FP macros.

The tests check not supported operator "@". It should be "<@". It is
nice that the opclass doesn't support long deprecated operators.

We needs tests for the remaining operators and for adding new values
to the index. I am not sure create_index.sql is the best place for
them. Maybe we should use the test scripts of "box".

> + #define LT -1
> + #define GT 1
> + #define EQ 0

Using these numbers is a very well established pattern. I don't think
macros make the code any more readable.

> + /* SP-GiST API functions */
> + Datum spg_box_quad_config(PG_FUNCTION_ARGS);
> + Datum spg_box_quad_choose(PG_FUNCTION_ARGS);
> + Datum spg_box_quad_picksplit(PG_FUNCTION_ARGS);
> + Datum spg_box_quad_inner_consistent(PG_FUNCTION_ARGS);
> + Datum spg_box_quad_leaf_consistent(PG_FUNCTION_ARGS);

I guess they should go to the header file.

Attachment Content-Type Size
box-index-test.sql application/octet-stream 1.8 KB

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: emre(at)hasegeli(dot)com, Oleg Bartunov <obartunov(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-18 17:24:35
Message-ID: 56EC39D3.30708@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Do reconstructedValues is required now? Wouldn't it be cleaner to use
> the new field on the prefix tree implementation, too?
reconstructedValues is needed to reconctruct full indexed value and should match
to type info indexed column

>
> We haven't had specific memory context for reconstructedValues. Why
> is it required for the new field?
because pg knows type of reconstructedValues (see above why) but traversalValue
just a void*, SP-GiST core doesn't knnow how to free memory of allocated for it.

>
>> + Memory for <structfield>traversalValues</> should be allocated in
>> + the default context, but make sure to switch to
>> + <structfield>traversalMemoryContext</> before allocating memory
>> + for pointers themselves.
>
> This sentence is not understandable. Shouldn't it say "the memory
> should *not* be allocated in the default context"? What are the
> "pointers themselves"?
Clarified, I hope

>
> The box opclass is broken because of the floating point arithmetics of
> the box type. You can reproduce it with the attached script. We
hope, fixed. At least, seems, syncronized with seqscan

> really need to fix the geometric types, before building more on them.
> They fail to serve the only purpose they are useful for, demonstrating
> features.
agree, but it's not a deal for this patch

>
> It think the opclass should support the cross data type operators like
> box @> point. Ideally we should do it by using multiple opclasses in
> an opfamily. The floating problem will bite us again in here, because
> some of the operators are not using FP macros.
Again, agree. But I suggest to do it by separate patch.

>
> The tests check not supported operator "@". It should be "<@". It is
> nice that the opclass doesn't support long deprecated operators.
fixed

>> + #define LT -1
>> + #define GT 1
>> + #define EQ 0
>
> Using these numbers is a very well established pattern. I don't think
> macros make the code any more readable.
fixed

>
>> + /* SP-GiST API functions */
>> + Datum spg_box_quad_config(PG_FUNCTION_ARGS);
>> + Datum spg_box_quad_choose(PG_FUNCTION_ARGS);
>> + Datum spg_box_quad_picksplit(PG_FUNCTION_ARGS);
>> + Datum spg_box_quad_inner_consistent(PG_FUNCTION_ARGS);
>> + Datum spg_box_quad_leaf_consistent(PG_FUNCTION_ARGS);
>
> I guess they should go to the header file.
Remove them because they are presented in auto-generated file
./src/include/utils/builtins.h

range patch is unchanged, but still attached to keep them altogether.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
q4d-2.patch.gz application/x-gzip 7.1 KB
traversalValue-2.patch.gz application/x-gzip 1.9 KB
range-1.patch.gz application/x-gzip 1.1 KB

From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Oleg Bartunov <obartunov(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-21 09:20:19
Message-ID: CAE2gYzzue-bQihDe1WDhqjf0_DkOnRFD1aC1f1gypOnNEQQR5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 사이트 추천SQL

Here are my comments about the operator class implementation:

> + * implementation of quad-4d tree over boxes for SP-GiST.

Isn't the whole thing actually 3D?

> + * For example consider the case of intersection.

No need for a new line after this.

> + * A quadrant has bounds, but sp-gist keeps only 4-d point (box) in inner nodes.

I couldn't get the term 4D point. Maybe it means that we are using
box datatype as the prefix, but we are not treating them as boxes.

> + * We use traversalValue to calculate quadrant bounds from parent's quadrant
> + * bounds.

I couldn't understand this sentence. We are using the parent's prefix
and the quadrant to generate the traversalValue. I think this is the
crucial part of the opclass. It would be nice to explain it more
clearly.

> + * Portions Copyright (c) 1996-2015, 토토 사이트

2016.

> + * src/backend/utils/adt/boxtype_spgist.c

Maybe we should name this file as geo_spgist.c to support other types
in the future.

> + #include "utils/builtins.h";

Extra ;.

> + #include "utils/datum.h"

I think this is unnecessary.

> + /* InfR type implements doubles and +- infinity */
> + typedef struct
> + {
> + int infFlag;
> + double val;
> + } InfR;

Why do we need this? Can't we store infinity in "double"?

> + /* wrap double to InfR */
> + static InfR
> + toInfR(double v, InfR * r)
> + {
> + r->infFlag = NotInf;
> + r->val = v;
> + }

This isn't returning the value.

> + typedef struct
> + {
> + Range range_x;
> + Range range_y;
> + } Rectangle;

Wouldn't it be simpler to just using BOX instead of this?

> + static void
> + evalIRangeBox(const IRangeBox *range_box, const Range *range, const int half1,
> + const int half2, IRangeBox *new_range_box)
>
> + static void
> + evalIRectBox(const IRectBox *rect_box, const Rectangle *centroid,
> + const uint8 quadrant, IRectBox * new_rect_box)
>
> + inline static void
> + initializeUnboundedBox(IRectBox * rect_box)

Wouldn't it be simpler to palloc and return the value on those functions?

> + static int
> + intersect2D(const Range * range, const IRangeBox * range_box)

Wouldn't it be better to return "bool" on those functions.

> + return ((p1 >= 0) && (p2 <= 0));

Extra parentheses.

> + i const spgChooseIn *in = (spgChooseIn *) PG_GETARG_POINTER(0);
> + i spgChooseOut *out = (spgChooseOut *) PG_GETARG_POINTER(1);

Many variables are defined with "const". I am not sure they are all
that doesn't change. If it applies to the pointer, "out" should also
not change in here. Am I wrong?

> + /*
> + * Begin. This block evaluates the median of coordinates of boxes
> + */

I would rather explain what the function does on the function header.

> + memcpy(new_rect_box, rect_box, sizeof(IRectBox));

Do we really need to copy the traversalValues on allTheSame case.
Wouldn't it work if just the same value is passed for all of them.
The search shouldn't continue after allTheSame case.

> + for (i = 0; flag && i < in->nkeys && flag; i++)

It checks flag two times.

> + boxPointerToRectangle(
> + DatumGetBoxP(in->scankeys[i].sk_argument),
> + p_query_rect
> + );

I don't think this matches the project coding style.

> + int flag = 1,

Wouldn't it be better as "bool"?

The regression tests for the remaining operators are still missing.


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: emre(at)hasegeli(dot)com
Cc: Oleg Bartunov <obartunov(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-21 16:57:10
Message-ID: 56F027E6.2070303@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>> + * implementation of quad-4d tree over boxes for SP-GiST.
> Isn't the whole thing actually 3D?
No. The idea if this work is a representation of 2d box as 4d point. Quad means
quadrant of 2d plane. Originally such kind of tree was developed for 2d point,
and each node of tree splits plane on 4 quadrant. For 3d tree each node splits
space for 8 "quadrants", 4d - 16 ones.

>> + * For example consider the case of intersection.
> No need for a new line after this.
fixed

>> + * A quadrant has bounds, but sp-gist keeps only 4-d point (box) in inner nodes.
> I couldn't get the term 4D point. Maybe it means that we are using
> box datatype as the prefix, but we are not treating them as boxes.
exactly, we treat boxes as 4-dimentional points.

>
>> + * We use traversalValue to calculate quadrant bounds from parent's quadrant
>> + * bounds.
> I couldn't understand this sentence. We are using the parent's prefix
> and the quadrant to generate the traversalValue. I think this is the
> crucial part of the opclass. It would be nice to explain it more
> clearly.

I'll try to explain with two-dimensional example over points. ASCII-art:
|
|
1 | 2
|
-----------+-------------
|P
3 | 4
|
|

'+' with 'A' represents a centroid or, other words, point which splits plane for
4 quadrants and it strorend in parent node. 1,2,3,4 are labels of quadrants,
each labeling will be the same for all pictures and all centriods, and i will
not show them in pictures later to prevent too complicated images. Let we add C
- child node (and again, it will split plane for 4 quads):

| |
----+---------+---
X |B |C
| |
-----------+---------+---
|P |A
| |
|
|

A and B are points of intersection of lines. So, box PBCAis a bounding box for
points contained in 3-rd (see labeling above). For example X labeled point is
not a descendace of child node with centroid C because it must be in branch of
1-st quad of parent node. So, each node (except root) will have a limitation in
its quadrant. To transfer that limitation the traversalValue is used.

To keep formatting I attached a txt file with this fragment.

>
>> + * Portions Copyright (c) 1996-2015, 토토 사이트
> 2016.
fixed

>
>> + * src/backend/utils/adt/boxtype_spgist.c
> Maybe we should name this file as geo_spgist.c to support other types
> in the future.
done

>> + #include "utils/builtins.h";
> Extra ;.
fixed

>
>> + #include "utils/datum.h"
> I think this is unnecessary.
removed

>
>> + /* InfR type implements doubles and +- infinity */
>> + typedef struct
>> + {
>> + int infFlag;
>> + double val;
>> + } InfR;
> Why do we need this? Can't we store infinity in "double"?
Hmm, you are right. fixed.

> This isn't returning the value.
removed

>
>> + typedef struct
>> + {
>> + Range range_x;
>> + Range range_y;
>> + } Rectangle;
>
> Wouldn't it be simpler to just using BOX instead of this?
Too much the same names in RectBox

>
>> + static void
>> + evalIRangeBox(const IRangeBox *range_box, const Range *range, const int half1,
>> + const int half2, IRangeBox *new_range_box)
>>
>> + static void
>> + evalIRectBox(const IRectBox *rect_box, const Rectangle *centroid,
>> + const uint8 quadrant, IRectBox * new_rect_box)
>>
>> + inline static void
>> + initializeUnboundedBox(IRectBox * rect_box)
>
> Wouldn't it be simpler to palloc and return the value on those functions?
evalRangeBox() initializes part of RectBox, so, it could not palloc its result.
Other methods use the same signature just for consistency.

>
>> + static int
>> + intersect2D(const Range * range, const IRangeBox * range_box)
>
> Wouldn't it be better to return "bool" on those functions.
done

>
>> + return ((p1 >= 0) && (p2 <= 0));
> Extra parentheses.
removed

>
>> + i const spgChooseIn *in = (spgChooseIn *) PG_GETARG_POINTER(0);
>> + i spgChooseOut *out = (spgChooseOut *) PG_GETARG_POINTER(1);
>
> Many variables are defined with "const". I am not sure they are all
> that doesn't change. If it applies to the pointer, "out" should also
> not change in here. Am I wrong?
No, changes

>
>> + /*
>> + * Begin. This block evaluates the median of coordinates of boxes
>> + */
> I would rather explain what the function does on the function header.
fixed

>
>> + memcpy(new_rect_box, rect_box, sizeof(IRectBox));
> Do we really need to copy the traversalValues on allTheSame case.
> Wouldn't it work if just the same value is passed for all of them.
> The search shouldn't continue after allTheSame case.
Seems, yes. spgist tree could contain a locng branches with allTheSame.

>
>> + for (i = 0; flag && i < in->nkeys && flag; i++)
> It checks flag two times.
fixed

>
>> + boxPointerToRectangle(
>> + DatumGetBoxP(in->scankeys[i].sk_argument),
>> + p_query_rect
>> + );
> I don't think this matches the project coding style.
fixed

>
>> + int flag = 1,
> Wouldn't it be better as "bool"?
fixed.

> The regression tests for the remaining operators are still missing.
Ooops, will be soon.

Attached all patches to simplify review. Thank you very much!

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
art.txt text/plain 1.1 KB
traversalValue-2.patch.gz application/x-gzip 1.9 KB
q4d-3.patch.gz application/x-gzip 6.9 KB
range-1.patch.gz application/x-gzip 1.1 KB

From: Kevin Grittner <kgrittn(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: emre(at)hasegeli(dot)com, Oleg Bartunov <obartunov(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-21 19:38:49
Message-ID: CACjxUsMWTL1Zef1y+zQ8SHcKZApbk3vrCvu0f1tsEEjh66anSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 21, 2016 at 11:57 AM, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:

>> I couldn't get the term 4D point. Maybe it means that we are
>> using box datatype as the prefix, but we are not treating them
>> as boxes.
>
> exactly, we treat boxes as 4-dimentional points.

I'm not entirely sure I understand the terminology either, since I
tend to think of a *point* as having *zero* dimensions. Would it
perhaps be more accurate to say we are treating a 2-dimensional box
as a point in 4-dimensional space?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
To: Kevin Grittner <kgrittn(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, emre(at)hasegeli(dot)com, Oleg Bartunov <obartunov(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-21 21:54:53
Message-ID: 6C555E82-AB18-4983-BEDC-08AA16C58ABD@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> On 21 Mar 2016, at 22:38, Kevin Grittner <kgrittn(at)gmail(dot)com> wrote:
>
> On Mon, Mar 21, 2016 at 11:57 AM, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:
>
>>> I couldn't get the term 4D point. Maybe it means that we are
>>> using box datatype as the prefix, but we are not treating them
>>> as boxes.
>>
>> exactly, we treat boxes as 4-dimentional points.
>
> I'm not entirely sure I understand the terminology either, since I
> tend to think of a *point* as having *zero* dimensions. Would it
> perhaps be more accurate to say we are treating a 2-dimensional box
> as a point in 4-dimensional space?

Or just say 4-d vector instead of 4-d point. Look like it will be enough rigorous.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>, <emre(at)hasegeli(dot)com>
Cc: Oleg Bartunov <obartunov(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-21 22:47:16
Message-ID: 56F079F4.3070804@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3/21/16 11:57 AM, Teodor Sigaev wrote:
> A and B are points of intersection of lines. So, box PBCAis a bounding
> box for points contained in 3-rd (see labeling above). For example X
> labeled point is not a descendace of child node with centroid C because
> it must be in branch of 1-st quad of parent node. So, each node (except
> root) will have a limitation in its quadrant. To transfer that
> limitation the traversalValue is used.

Isn't this basically the same thing that the cube contrib module does?
(Which has the added benefit of kNN-capable operators).

If that's true then ISTM it'd be better to work on pulling cube's
features into box?

If it's not true, I'm still wondering if there's enough commonality here
that we should pull cube into core...
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, emre(at)hasegeli(dot)com, Oleg Bartunov <obartunov(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-22 00:41:58
Message-ID: D4148745-6912-4DFC-9278-F0B7666E37A2@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> On 22 Mar 2016, at 01:47, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com> wrote:
>
> On 3/21/16 11:57 AM, Teodor Sigaev wrote:
>> A and B are points of intersection of lines. So, box PBCAis a bounding
>> box for points contained in 3-rd (see labeling above). For example X
>> labeled point is not a descendace of child node with centroid C because
>> it must be in branch of 1-st quad of parent node. So, each node (except
>> root) will have a limitation in its quadrant. To transfer that
>> limitation the traversalValue is used.
>
> Isn't this basically the same thing that the cube contrib module does? (Which has the added benefit of kNN-capable operators).

Cube and postgres own geometric types are indexed by building R-tree. While this is one of the most popular solutions it
degrades almost to linear search when bounding boxes for each index node overlaps a lot. This problem will arise when one will
try to index streets in the city with grid system — a lot of street's bounding boxes will just coincide with bounding box for whole city.

But that patch use totally different strategy for building index and do not suffer from above problem.

>
> If that's true then ISTM it'd be better to work on pulling cube's features into box?
>
> If it's not true, I'm still wondering if there's enough commonality here that we should pull cube into core…

There is also intarray module with very common functionality, but it also addressing different data pattern.

Optimal indexing and kNN strategy are very sensible to the data itself and if we want some general multidimensional search routines,
then I think none of the mentioned extensions is general enough. Cube barely applicable when dimensions number is higher than 10-20,
intarray performs bad on data with big sets of possible coordinates, this patch is also intended to help with specific, niche problem.

While people tends to use machine learning and regressions models more and more it is interesting to have some general n-dim indexing with kNN,
but I think it is different problem and should be solved in a different way.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, <emre(at)hasegeli(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-22 15:40:22
Message-ID: 56F16766.1090305@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg스포츠 토토SQL

On 3/21/16 7:41 PM, Stas Kelvich wrote:
> While people tends to use machine learning and regressions models more and more it is interesting to have some general n-dim indexing with kNN,
> but I think it is different problem and should be solved in a different way.

I think one of the issues here is it's not very clear to someone without
a good amount of ML knowledge how these things relate. I hear "box' and
'cube' and think it's just a 2D vs 3D issue, and intarray isn't even on
the radar.

Maybe what's needed are actual vector and matrix types?

In any case, if you've got a good reason why box and cube should stay
separate then further discussion should happen in another thread.

BTW, if you haven't seen it, take a look at http://madlib.apache.org/
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Kevin Grittner <kgrittn(at)gmail(dot)com>
Cc: emre(at)hasegeli(dot)com, Oleg Bartunov <obartunov(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-22 15:58:07
Message-ID: 56F16B8F.1060908@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg롤 토토SQL :

> tend to think of a *point* as having *zero* dimensions. Would it
> perhaps be more accurate to say we are treating a 2-dimensional box
> as a point in 4-dimensional space?
Exactly, sorry for ambiguity.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, emre(at)hasegeli(dot)com
Cc: Oleg Bartunov <obartunov(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-22 16:08:10
Message-ID: 56F16DEA.2030905@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg와이즈 토토SQL


> Isn't this basically the same thing that the cube contrib module does? (Which
> has the added benefit of kNN-capable operators).
No, cube module introduces new type - N-dimensional box. And adds an index
support for it.

Current patch suggests non-traditional indexing technique for 2D boxes by
treating them as point in 4D space. With such representation it's became
possible to use quad-tree technique which:
1 supports only points to index
2 already supported by SP-GiST

Such technique provides some benefits in comparance with traditional R-Tree
which implemented in pg with a help GiST. See Oleg's message.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Oleg Bartunov <obartunov(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-24 14:56:59
Message-ID: CAE2gYzxwWbTcVrAgz_BGw2iXGWtWveJ-fZURwgmF8GMis0uTDg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토SQL : Postg토토SQL

> + * boxtype_spgist.c

The names on the file header need to be changed, too.

> I'll try to explain with two-dimensional example over points. ASCII-art:
> |
> |
> 1 | 2
> |
> -----------+-------------
> |P
> 3 | 4
> |
> |
>
> '+' with 'A' represents a centroid or, other words, point which splits plane
> for 4 quadrants and it strorend in parent node. 1,2,3,4 are labels of
> quadrants, each labeling will be the same for all pictures and all
> centriods, and i will not show them in pictures later to prevent too
> complicated images. Let we add C - child node (and again, it will split
> plane for 4 quads):
>
>
> | |
> ----+---------+---
> X |B |C
> | |
> -----------+---------+---
> |P |A
> | |
> |
> |
>
> A and B are points of intersection of lines. So, box PBCAis a bounding box
> for points contained in 3-rd (see labeling above). For example X labeled
> point is not a descendace of child node with centroid C because it must be
> in branch of 1-st quad of parent node. So, each node (except root) will have
> a limitation in its quadrant. To transfer that limitation the traversalValue
> is used.
>
> To keep formatting I attached a txt file with this fragment.

Thank you for the explanation. Should we incorporate this with the patch.

>>> + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development
>>> Group
>>
>> 2016.
>
> fixed

Not on the patch.

> + cmp_double(const double a, const double b)

Does this function necessary? We can just compare the doubles.

> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)

The "rectangle" variable in here should be renamed.

> + Assert(is_infinite(b) == 0);

This is failing on my test. You can reproduce with the script I have sent.

>> Wouldn't it be simpler to palloc and return the value on those functions?
>
> evalRangeBox() initializes part of RectBox, so, it could not palloc its
> result.
> Other methods use the same signature just for consistency.

I couldn't understand it. evalRangeBox() can palloc and return the
result. evalRectBox() can return the result palloc'ed by
evalRangeBox(). The caller wouldn't need to palloc anything.

>> Many variables are defined with "const". I am not sure they are all
>> that doesn't change. If it applies to the pointer, "out" should also
>> not change in here. Am I wrong?
>
> No, changes

I now read about "const". I am sorry for not being experienced in C.
The new version of the patch marks them as "const" by mistake.

I went through all other variables:

> + int r = is_infinite(a);
>
> + double x = *(double *) a;
> + double y = *(double *) b;
>
> + spgInnerConsistentIn *in = (spgInnerConsistentIn *) PG_GETARG_POINTER(0);
>
> + spgLeafConsistentIn *in = (spgLeafConsistentIn *) PG_GETARG_POINTER(0);
>
> + BOX *leafBox = DatumGetBoxP(in->leafDatum);

Shouldn't they be "const", too?

>>> + /*
>>> + * Begin. This block evaluates the median of coordinates of boxes
>>> + */
>>
>> I would rather explain what the function does on the function header.
>
> fixed

The "end" part of it is still there.

>> Do we really need to copy the traversalValues on allTheSame case.
>> Wouldn't it work if just the same value is passed for all of them.
>> The search shouldn't continue after allTheSame case.
>
> Seems, yes. spgist tree could contain a locng branches with allTheSame.

Does SP-GiST allows any node under allTheSame to not being allTheSame?
Not setting traversalValues for allTheSame worked fine with my test.

> + if (in->allTheSame)

Most of the things happening before this check is not used in there.
Shouldn't we move this to the top of the function?

> + out->nodeNumbers = (int *) palloc(sizeof(int) * in->nNodes);

This could go before allTheSame block.


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: emre(at)hasegeli(dot)com
Cc: Oleg Bartunov <obartunov(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-24 17:45:40
Message-ID: 56F427C4.2020004@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>> + * boxtype_spgist.c
> The names on the file header need to be changed, too.
Oops. fixed

>> I'll try to explain with two-dimensional example over points. ASCII-art:
> Thank you for the explanation. Should we incorporate this with the patch.
added

>> + cmp_double(const double a, const double b)
> Does this function necessary? We can just compare the doubles.
Yes, it compares with limited precision as it does by geometry operations.
Renamed to point actual arguments.

>
>> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)
> The "rectangle" variable in here should be renamed.
fixed

>
>> + Assert(is_infinite(b) == 0);
> This is failing on my test. You can reproduce with the script I have sent.
I didn't know:
# select '(1,inf)'::point;
point
---------
(1,inf)

fixed

>
>>> Wouldn't it be simpler to palloc and return the value on those functions?
>> evalRangeBox() initializes part of RectBox, so, it could not palloc its
>> result.
>> Other methods use the same signature just for consistency.
>
> I couldn't understand it. evalRangeBox() can palloc and return the
> result. evalRectBox() can return the result palloc'ed by
> evalRangeBox(). The caller wouldn't need to palloc anything.
evalRangeBox() is used to initialize fields of RangeBox in evalRectBox(). If
evalRangeBox() will palloc its result then we need to copy its result into
RangeBox struct and free. Let both fucntion use the same interface.

> I went through all other variables:
>> + int r = is_infinite(a);
>> + double x = *(double *) a;
>> + double y = *(double *) b;
>> + spgInnerConsistentIn *in = (spgInnerConsistentIn *) PG_GETARG_POINTER(0);
>> + spgLeafConsistentIn *in = (spgLeafConsistentIn *) PG_GETARG_POINTER(0);
>> + BOX *leafBox = DatumGetBoxP(in->leafDatum);
> Shouldn't they be "const", too?
They could. But it doesn't required. To be in consistent state I've removed
const modifier where possible

>
>>>> + /*
>>>> + * Begin. This block evaluates the median of coordinates of boxes
>>>> + */
>>>
>>> I would rather explain what the function does on the function header.
>>
>> fixed
> The "end" part of it is still there.
Oops again, fixed

>
>>> Do we really need to copy the traversalValues on allTheSame case.
>>> Wouldn't it work if just the same value is passed for all of them.
>>> The search shouldn't continue after allTheSame case.
>>
>> Seems, yes. spgist tree could contain a locng branches with allTheSame.
>
> Does SP-GiST allows any node under allTheSame to not being allTheSame?
No, SP-GiST doesn't allow that
> Not setting traversalValues for allTheSame worked fine with my test.
it works until allthesame branch contains only one inner node. Otherwise
traversalValue will be freeed several times, see spgWalk().
# select i as id, '1,2,3,4'::box as b into x from generate_series(1,1000000) i;
# create index ix on i using spgist (b);
# select count(*) from x where b && '1,2,3,4'::box; -- coredump
gdb:
#0 0x000000080143564a in thr_kill () from /lib/libc.so.7
#1 0x0000000801435636 in raise () from /lib/libc.so.7
#2 0x00000008014355b9 in abort () from /lib/libc.so.7
#3 0x0000000000a80739 in in_error_recursion_trouble () at elog.c:199
#4 0x0000000000abb748 in pfree (pointer=0x801e90868) at mcxt.c:1016
#5 0x000000000053330c in freeScanStackEntry (so=0x801e8d358,
stackEntry=0x801e935d8) at spgscan.c:47
#6 0x0000000000532cdb in spgWalk (index=0x801f1c588, so=0x801e8d358,
scanWholeIndex=1 '\001', storeRes=0x532d10 <storeBitm
...

>
>> + if (in->allTheSame)
> Most of the things happening before this check is not used in there.
> Shouldn't we move this to the top of the function?
yep, fixed

>
>> + out->nodeNumbers = (int *) palloc(sizeof(int) * in->nNodes);
> This could go before allTheSame block.
yep, fixed

I've attached all patches again. Thank you very much!

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
q4d-4.patch.gz application/x-gzip 7.4 KB
traversalValue-2.patch.gz application/x-gzip 1.9 KB
range-1.patch.gz application/x-gzip 1.1 KB

From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Oleg Bartunov <obartunov(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-27 12:35:29
Message-ID: CAE2gYzxkO2KDHB3y5WTmz8Rbok=POGiUMFcK08chXapHPB_AYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: 503 배트맨 토토 페치

>>> I'll try to explain with two-dimensional example over points. ASCII-art:
>>
>> Thank you for the explanation. Should we incorporate this with the patch.
>
> added

I have worked on the comments of the patch. It is attached. I hope
it looks more clear than it was before.

>>> + cmp_double(const double a, const double b)
>>
>> Does this function necessary? We can just compare the doubles.
>
> Yes, it compares with limited precision as it does by geometry operations.
> Renamed to point actual arguments.

I meant that we could use FP macros directly instead of this function.
Doing so is also more correct. I haven't tried to produce the
problem, but this function is not same as using the macros directly.
For differences smaller that the epsilon, it can return different
results. I have removed it on the attached version.

>>> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)
>>
>> The "rectangle" variable in here should be renamed.
>
> fixed

I found a bunch of those too and renamed for clarity. I have also
reordered the arguments of the helper functions to keep them
consistent.

> evalRangeBox() is used to initialize fields of RangeBox in evalRectBox(). If
> evalRangeBox() will palloc its result then we need to copy its result into
> RangeBox struct and free. Let both fucntion use the same interface.

I found it nicer to copy and edit the existing value than creating it
in two steps like this. It is also attached.

> it works until allthesame branch contains only one inner node. Otherwise
> traversalValue will be freeed several times, see spgWalk().

It just works, when traversalValues is not set. It is also attached.

I have also added the missing regression tests. While doing that I
noticed that some operators are missing and also added support for
them. It is also attached with the updated version of my test script.

I hope I haven't changed the patch too much. I have also pushed the
changes here:

https://github.com/hasegeli/postgres/commits/box-spgist

Attachment Content-Type Size
q4d-5.patch application/octet-stream 44.5 KB
box-index-test.sql application/octet-stream 2.8 KB

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: emre(at)hasegeli(dot)com
Cc: Oleg Bartunov <obartunov(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-30 16:00:36
Message-ID: 56FBF824.8000501@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg스포츠 토토SQL

Thank you, pushed

Emre Hasegeli wrote:
>>>> I'll try to explain with two-dimensional example over points. ASCII-art:
>>>
>>> Thank you for the explanation. Should we incorporate this with the patch.
>>
>> added
>
> I have worked on the comments of the patch. It is attached. I hope
> it looks more clear than it was before.
>
>>>> + cmp_double(const double a, const double b)
>>>
>>> Does this function necessary? We can just compare the doubles.
>>
>> Yes, it compares with limited precision as it does by geometry operations.
>> Renamed to point actual arguments.
>
> I meant that we could use FP macros directly instead of this function.
> Doing so is also more correct. I haven't tried to produce the
> problem, but this function is not same as using the macros directly.
> For differences smaller that the epsilon, it can return different
> results. I have removed it on the attached version.
>
>>>> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)
>>>
>>> The "rectangle" variable in here should be renamed.
>>
>> fixed
>
> I found a bunch of those too and renamed for clarity. I have also
> reordered the arguments of the helper functions to keep them
> consistent.
>
>> evalRangeBox() is used to initialize fields of RangeBox in evalRectBox(). If
>> evalRangeBox() will palloc its result then we need to copy its result into
>> RangeBox struct and free. Let both fucntion use the same interface.
>
> I found it nicer to copy and edit the existing value than creating it
> in two steps like this. It is also attached.
>
>> it works until allthesame branch contains only one inner node. Otherwise
>> traversalValue will be freeed several times, see spgWalk().
>
> It just works, when traversalValues is not set. It is also attached.
>
> I have also added the missing regression tests. While doing that I
> noticed that some operators are missing and also added support for
> them. It is also attached with the updated version of my test script.
>
> I hope I haven't changed the patch too much. I have also pushed the
> changes here:
>
> https://github.com/hasegeli/postgres/commits/box-spgist
>

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Oleg Bartunov <obartunov(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-31 15:46:41
Message-ID: CAE2gYzz_HKmStLJR+XUVo-_hLALiJ4UGDq-aVM1ZZmZf86XGdQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Thank you, pushed

Thank you for working on this.

I noticed that have overlooked this:

static RectBox *
-initRectBox()
+initRectBox(void)
{


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Emre Hasegeli <emre(at)hasegeli(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <obartunov(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-04-23 14:41:18
Message-ID: 20160423144118.GA7280@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토SQL : Postg토토SQL

On Thu, Mar 31, 2016 at 05:46:41PM +0200, Emre Hasegeli wrote:
> > Thank you, pushed
>
> Thank you for working on this.
>
> I noticed that have overlooked this:
>
> static RectBox *
> -initRectBox()
> +initRectBox(void)
> {

Done, thanks.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +