Re: Result set stability in queries sorted by column with non-unique values

Lists: pgsql-in-general
From: Michele Bosi <michele(dot)bosi(at)gmail(dot)com>
To: pgsql-in-general(at)postgresql(dot)org
Subject: Result set stability in queries sorted by column with non-unique values
Date: 2015-04-15 11:40:30
Message-ID: CACmQ0_Kd76jtCF09UYTf=zVSvzsbizW+y=DKTWif0rpLKmduDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-in-general

Hi all,

I would like to know if postgresql is guaranteed to return the rows
always in the same order in the following case:

table "boxes"
-------------------
width | height
1 | 5
1 | 3
1 | 7
1 | 2
1 | 4

SELECT * FROM boxes ORDER BY width;

result:
1 | 5
1 | 3
1 | 7
1 | 2
1 | 4

In practice we've seen that postgresql returns the rows in some kind
of stably arbitrary order (which is expected because "width" contains
non unique values) but the subtle ambiguity here (which is important
for our application) is whether postresql is guaranteed to always
return the same arbitrary order in a stable fashion or if it can (even
theoretically) return an always different aribtrary order every time I
perform the query, say for example:

SELECT * FROM boxes ORDER BY width;
1 | 3
1 | 5
1 | 7
1 | 4
1 | 2

SELECT * FROM boxes ORDER BY width;
1 | 2
1 | 5
1 | 7
1 | 4
1 | 3

etc.

The only two relevant lines I've found in the docs
(http://www.postgresql.org/docs/current/interactive/queries-order.html)
are:

"A particular output ordering can only be guaranteed if the sort step
is explicitly chosen." which does not tell us in what order rows with
same sorting-key value are returned.

and

"If sorting is not chosen, the rows will be returned in an unspecified
order." which I would expect to be the case also when sorting is
chosen BUT the sorting-key values are always the same. However this
still does not tell us if the "unspecified order" is always the same
or not, ie. if it's stable. Another way to put this last point would
be: is "SELECT * FROM boxes LIMIT 3" guaranteed to always return the
same 3 rows (assuming I don't insert or delete new rows) or can
theoretically return any different 3 every time I query?

Thank you.


From: Niranjan <niranjan81(at)gmail(dot)com>
To: Michele Bosi <michele(dot)bosi(at)gmail(dot)com>
Cc: pgsql-in-general(at)postgresql(dot)org
Subject: Re: Result set stability in queries sorted by column with non-unique values
Date: 2015-04-15 14:54:07
Message-ID: CAMYjFn-BJrF_PCBDWyGzsPLLwQv2vpfW3NWVDd-+_7ydCxsr+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-in-general

Hi,
I am sure the variation depends on the query plan decided by the engine at
the time of execution, in short, AFAIK (as far as I know) the answer is No.
There is no guarantee that the rows will be returned in the exact order.

Some of the possible answers to be further explored would be
A) if we want the data to be always randomly ordered explore function
random(), or is it rand() I guess? to be included in the query.
B) if we want the data to be arbitrarily ordered but maintain the order for
each query execution, save a randomly ordered same number of records in a
different table and then write a joined query.

I hope it helps.

Thanks & Regards
Niranjan D. Pandit
On 15-Apr-2015 5:11 pm, "Michele Bosi" <michele(dot)bosi(at)gmail(dot)com> wrote:

> Hi all,
>
> I would like to know if postgresql is guaranteed to return the rows
> always in the same order in the following case:
>
> table "boxes"
> -------------------
> width | height
> 1 | 5
> 1 | 3
> 1 | 7
> 1 | 2
> 1 | 4
>
> SELECT * FROM boxes ORDER BY width;
>
> result:
> 1 | 5
> 1 | 3
> 1 | 7
> 1 | 2
> 1 | 4
>
> In practice we've seen that postgresql returns the rows in some kind
> of stably arbitrary order (which is expected because "width" contains
> non unique values) but the subtle ambiguity here (which is important
> for our application) is whether postresql is guaranteed to always
> return the same arbitrary order in a stable fashion or if it can (even
> theoretically) return an always different aribtrary order every time I
> perform the query, say for example:
>
> SELECT * FROM boxes ORDER BY width;
> 1 | 3
> 1 | 5
> 1 | 7
> 1 | 4
> 1 | 2
>
> SELECT * FROM boxes ORDER BY width;
> 1 | 2
> 1 | 5
> 1 | 7
> 1 | 4
> 1 | 3
>
> etc.
>
> The only two relevant lines I've found in the docs
> (http://www.postgresql.org/docs/current/interactive/queries-order.html)
> are:
>
> "A particular output ordering can only be guaranteed if the sort step
> is explicitly chosen." which does not tell us in what order rows with
> same sorting-key value are returned.
>
> and
>
> "If sorting is not chosen, the rows will be returned in an unspecified
> order." which I would expect to be the case also when sorting is
> chosen BUT the sorting-key values are always the same. However this
> still does not tell us if the "unspecified order" is always the same
> or not, ie. if it's stable. Another way to put this last point would
> be: is "SELECT * FROM boxes LIMIT 3" guaranteed to always return the
> same 3 rows (assuming I don't insert or delete new rows) or can
> theoretically return any different 3 every time I query?
>
> Thank you.
>
>
> --
> Sent via pgsql-in-general mailing list (pgsql-in-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-in-general
>


From: Michele Bosi <michele(dot)bosi(at)gmail(dot)com>
To: Niranjan <niranjan81(at)gmail(dot)com>
Cc: pgsql-in-general(at)postgresql(dot)org
Subject: Re: Result set stability in queries sorted by column with non-unique values
Date: 2015-04-15 15:54:11
Message-ID: CACmQ0_LzvOEj+YSsMCu=ztM+U57vfU__Vub-MmDpwHxP17kWLw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-in-general

Thanks Niranjan,

In our case we require the rows returned to be in the same consistent
order across several queries as we need to paginate through a large
set of results by doing a combination of ORDER BY x, LIMIT, OFFSET.
However the sorting key "x" can contain non-unique values, thus we
need to be sure that rows with the same values end up in the same
position across the different queries we do per each result "page".

As I understand, unless we add a secondary sorting key (like the "id"
of the row) which is guaranteed to be unique, we might end up with
rows that can potentially "fluctuate" across the query result, is it
correct?

Thank you

On 15 April 2015 at 15:54, Niranjan <niranjan81(at)gmail(dot)com> wrote:
> Hi,
> I am sure the variation depends on the query plan decided by the engine at
> the time of execution, in short, AFAIK (as far as I know) the answer is No.
> There is no guarantee that the rows will be returned in the exact order.
>
> Some of the possible answers to be further explored would be
> A) if we want the data to be always randomly ordered explore function
> random(), or is it rand() I guess? to be included in the query.
> B) if we want the data to be arbitrarily ordered but maintain the order for
> each query execution, save a randomly ordered same number of records in a
> different table and then write a joined query.
>
> I hope it helps.
>
> Thanks & Regards
> Niranjan D. Pandit
>
> On 15-Apr-2015 5:11 pm, "Michele Bosi" <michele(dot)bosi(at)gmail(dot)com> wrote:
>>
>> Hi all,
>>
>> I would like to know if postgresql is guaranteed to return the rows
>> always in the same order in the following case:
>>
>> table "boxes"
>> -------------------
>> width | height
>> 1 | 5
>> 1 | 3
>> 1 | 7
>> 1 | 2
>> 1 | 4
>>
>> SELECT * FROM boxes ORDER BY width;
>>
>> result:
>> 1 | 5
>> 1 | 3
>> 1 | 7
>> 1 | 2
>> 1 | 4
>>
>> In practice we've seen that postgresql returns the rows in some kind
>> of stably arbitrary order (which is expected because "width" contains
>> non unique values) but the subtle ambiguity here (which is important
>> for our application) is whether postresql is guaranteed to always
>> return the same arbitrary order in a stable fashion or if it can (even
>> theoretically) return an always different aribtrary order every time I
>> perform the query, say for example:
>>
>> SELECT * FROM boxes ORDER BY width;
>> 1 | 3
>> 1 | 5
>> 1 | 7
>> 1 | 4
>> 1 | 2
>>
>> SELECT * FROM boxes ORDER BY width;
>> 1 | 2
>> 1 | 5
>> 1 | 7
>> 1 | 4
>> 1 | 3
>>
>> etc.
>>
>> The only two relevant lines I've found in the docs
>> (http://www.postgresql.org/docs/current/interactive/queries-order.html)
>> are:
>>
>> "A particular output ordering can only be guaranteed if the sort step
>> is explicitly chosen." which does not tell us in what order rows with
>> same sorting-key value are returned.
>>
>> and
>>
>> "If sorting is not chosen, the rows will be returned in an unspecified
>> order." which I would expect to be the case also when sorting is
>> chosen BUT the sorting-key values are always the same. However this
>> still does not tell us if the "unspecified order" is always the same
>> or not, ie. if it's stable. Another way to put this last point would
>> be: is "SELECT * FROM boxes LIMIT 3" guaranteed to always return the
>> same 3 rows (assuming I don't insert or delete new rows) or can
>> theoretically return any different 3 every time I query?
>>
>> Thank you.
>>
>>
>> --
>> Sent via pgsql-in-general mailing list (pgsql-in-general(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-in-general


From: Michele Bosi <michele(dot)bosi(at)gmail(dot)com>
To: "Podrigal, Aron" <aronp(at)guaranteedplus(dot)com>, pgsql-in-general(at)postgresql(dot)org
Subject: Re: Result set stability in queries sorted by column with non-unique values
Date: 2015-04-15 17:31:40
Message-ID: CACmQ0_J6f+muxkBfeyWyrFJBHyiWECmY=QPiA3-nmsx=_oR1tg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-in-general

That's what I heard but I could not find any convincing reference in
the docs or elsewhere, do you know of any?

On 15 April 2015 at 17:54, Podrigal, Aron <aronp(at)guaranteedplus(dot)com> wrote:
> I think the non-unique cols are usually returned in the order they're
> stored on disk.
>
>
> Aron
>
> On 4/15/15, Michele Bosi <michele(dot)bosi(at)gmail(dot)com> wrote:
>> Thanks Niranjan,
>>
>> In our case we require the rows returned to be in the same consistent
>> order across several queries as we need to paginate through a large
>> set of results by doing a combination of ORDER BY x, LIMIT, OFFSET.
>> However the sorting key "x" can contain non-unique values, thus we
>> need to be sure that rows with the same values end up in the same
>> position across the different queries we do per each result "page".
>>
>> As I understand, unless we add a secondary sorting key (like the "id"
>> of the row) which is guaranteed to be unique, we might end up with
>> rows that can potentially "fluctuate" across the query result, is it
>> correct?
>>
>> Thank you
>>
>> On 15 April 2015 at 15:54, Niranjan <niranjan81(at)gmail(dot)com> wrote:
>>> Hi,
>>> I am sure the variation depends on the query plan decided by the engine
>>> at
>>> the time of execution, in short, AFAIK (as far as I know) the answer is
>>> No.
>>> There is no guarantee that the rows will be returned in the exact order.
>>>
>>> Some of the possible answers to be further explored would be
>>> A) if we want the data to be always randomly ordered explore function
>>> random(), or is it rand() I guess? to be included in the query.
>>> B) if we want the data to be arbitrarily ordered but maintain the order
>>> for
>>> each query execution, save a randomly ordered same number of records in a
>>> different table and then write a joined query.
>>>
>>> I hope it helps.
>>>
>>> Thanks & Regards
>>> Niranjan D. Pandit
>>>
>>> On 15-Apr-2015 5:11 pm, "Michele Bosi" <michele(dot)bosi(at)gmail(dot)com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I would like to know if postgresql is guaranteed to return the rows
>>>> always in the same order in the following case:
>>>>
>>>> table "boxes"
>>>> -------------------
>>>> width | height
>>>> 1 | 5
>>>> 1 | 3
>>>> 1 | 7
>>>> 1 | 2
>>>> 1 | 4
>>>>
>>>> SELECT * FROM boxes ORDER BY width;
>>>>
>>>> result:
>>>> 1 | 5
>>>> 1 | 3
>>>> 1 | 7
>>>> 1 | 2
>>>> 1 | 4
>>>>
>>>> In practice we've seen that postgresql returns the rows in some kind
>>>> of stably arbitrary order (which is expected because "width" contains
>>>> non unique values) but the subtle ambiguity here (which is important
>>>> for our application) is whether postresql is guaranteed to always
>>>> return the same arbitrary order in a stable fashion or if it can (even
>>>> theoretically) return an always different aribtrary order every time I
>>>> perform the query, say for example:
>>>>
>>>> SELECT * FROM boxes ORDER BY width;
>>>> 1 | 3
>>>> 1 | 5
>>>> 1 | 7
>>>> 1 | 4
>>>> 1 | 2
>>>>
>>>> SELECT * FROM boxes ORDER BY width;
>>>> 1 | 2
>>>> 1 | 5
>>>> 1 | 7
>>>> 1 | 4
>>>> 1 | 3
>>>>
>>>> etc.
>>>>
>>>> The only two relevant lines I've found in the docs
>>>> (http://www.postgresql.org/docs/current/interactive/queries-order.html)
>>>> are:
>>>>
>>>> "A particular output ordering can only be guaranteed if the sort step
>>>> is explicitly chosen." which does not tell us in what order rows with
>>>> same sorting-key value are returned.
>>>>
>>>> and
>>>>
>>>> "If sorting is not chosen, the rows will be returned in an unspecified
>>>> order." which I would expect to be the case also when sorting is
>>>> chosen BUT the sorting-key values are always the same. However this
>>>> still does not tell us if the "unspecified order" is always the same
>>>> or not, ie. if it's stable. Another way to put this last point would
>>>> be: is "SELECT * FROM boxes LIMIT 3" guaranteed to always return the
>>>> same 3 rows (assuming I don't insert or delete new rows) or can
>>>> theoretically return any different 3 every time I query?
>>>>
>>>> Thank you.
>>>>
>>>>
>>>> --
>>>> Sent via pgsql-in-general mailing list (pgsql-in-general(at)postgresql(dot)org)
>>>> To make changes to your subscription:
>>>> http://www.postgresql.org/mailpref/pgsql-in-general
>>
>>
>> --
>> Sent via pgsql-in-general mailing list (pgsql-in-general(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-in-general
>>


From: "Podrigal, Aron" <aronp(at)guaranteedplus(dot)com>
To: Michele Bosi <michele(dot)bosi(at)gmail(dot)com>
Cc: pgsql-in-general(at)postgresql(dot)org
Subject: Re: Result set stability in queries sorted by column with non-unique values
Date: 2015-04-16 01:34:47
Message-ID: CANJp-yjxbei8iwAMXJsp04Q2X_1iGzpapkLxnu3X7dYsrQ0DwQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-in-general

I think this stack answer would give you some more info
http://dba.stackexchange.com/a/38718
On Apr 15, 2015 1:32 PM, "Michele Bosi" <michele(dot)bosi(at)gmail(dot)com> wrote:

> That's what I heard but I could not find any convincing reference in
> the docs or elsewhere, do you know of any?
>
> On 15 April 2015 at 17:54, Podrigal, Aron <aronp(at)guaranteedplus(dot)com>
> wrote:
> > I think the non-unique cols are usually returned in the order they're
> > stored on disk.
> >
> >
> > Aron
> >
> > On 4/15/15, Michele Bosi <michele(dot)bosi(at)gmail(dot)com> wrote:
> >> Thanks Niranjan,
> >>
> >> In our case we require the rows returned to be in the same consistent
> >> order across several queries as we need to paginate through a large
> >> set of results by doing a combination of ORDER BY x, LIMIT, OFFSET.
> >> However the sorting key "x" can contain non-unique values, thus we
> >> need to be sure that rows with the same values end up in the same
> >> position across the different queries we do per each result "page".
> >>
> >> As I understand, unless we add a secondary sorting key (like the "id"
> >> of the row) which is guaranteed to be unique, we might end up with
> >> rows that can potentially "fluctuate" across the query result, is it
> >> correct?
> >>
> >> Thank you
> >>
> >> On 15 April 2015 at 15:54, Niranjan <niranjan81(at)gmail(dot)com> wrote:
> >>> Hi,
> >>> I am sure the variation depends on the query plan decided by the engine
> >>> at
> >>> the time of execution, in short, AFAIK (as far as I know) the answer is
> >>> No.
> >>> There is no guarantee that the rows will be returned in the exact
> order.
> >>>
> >>> Some of the possible answers to be further explored would be
> >>> A) if we want the data to be always randomly ordered explore function
> >>> random(), or is it rand() I guess? to be included in the query.
> >>> B) if we want the data to be arbitrarily ordered but maintain the order
> >>> for
> >>> each query execution, save a randomly ordered same number of records
> in a
> >>> different table and then write a joined query.
> >>>
> >>> I hope it helps.
> >>>
> >>> Thanks & Regards
> >>> Niranjan D. Pandit
> >>>
> >>> On 15-Apr-2015 5:11 pm, "Michele Bosi" <michele(dot)bosi(at)gmail(dot)com> wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I would like to know if postgresql is guaranteed to return the rows
> >>>> always in the same order in the following case:
> >>>>
> >>>> table "boxes"
> >>>> -------------------
> >>>> width | height
> >>>> 1 | 5
> >>>> 1 | 3
> >>>> 1 | 7
> >>>> 1 | 2
> >>>> 1 | 4
> >>>>
> >>>> SELECT * FROM boxes ORDER BY width;
> >>>>
> >>>> result:
> >>>> 1 | 5
> >>>> 1 | 3
> >>>> 1 | 7
> >>>> 1 | 2
> >>>> 1 | 4
> >>>>
> >>>> In practice we've seen that postgresql returns the rows in some kind
> >>>> of stably arbitrary order (which is expected because "width" contains
> >>>> non unique values) but the subtle ambiguity here (which is important
> >>>> for our application) is whether postresql is guaranteed to always
> >>>> return the same arbitrary order in a stable fashion or if it can (even
> >>>> theoretically) return an always different aribtrary order every time I
> >>>> perform the query, say for example:
> >>>>
> >>>> SELECT * FROM boxes ORDER BY width;
> >>>> 1 | 3
> >>>> 1 | 5
> >>>> 1 | 7
> >>>> 1 | 4
> >>>> 1 | 2
> >>>>
> >>>> SELECT * FROM boxes ORDER BY width;
> >>>> 1 | 2
> >>>> 1 | 5
> >>>> 1 | 7
> >>>> 1 | 4
> >>>> 1 | 3
> >>>>
> >>>> etc.
> >>>>
> >>>> The only two relevant lines I've found in the docs
> >>>> (
> http://www.postgresql.org/docs/current/interactive/queries-order.html)
> >>>> are:
> >>>>
> >>>> "A particular output ordering can only be guaranteed if the sort step
> >>>> is explicitly chosen." which does not tell us in what order rows with
> >>>> same sorting-key value are returned.
> >>>>
> >>>> and
> >>>>
> >>>> "If sorting is not chosen, the rows will be returned in an unspecified
> >>>> order." which I would expect to be the case also when sorting is
> >>>> chosen BUT the sorting-key values are always the same. However this
> >>>> still does not tell us if the "unspecified order" is always the same
> >>>> or not, ie. if it's stable. Another way to put this last point would
> >>>> be: is "SELECT * FROM boxes LIMIT 3" guaranteed to always return the
> >>>> same 3 rows (assuming I don't insert or delete new rows) or can
> >>>> theoretically return any different 3 every time I query?
> >>>>
> >>>> Thank you.
> >>>>
> >>>>
> >>>> --
> >>>> Sent via pgsql-in-general mailing list (
> pgsql-in-general(at)postgresql(dot)org)
> >>>> To make changes to your subscription:
> >>>> http://www.postgresql.org/mailpref/pgsql-in-general
> >>
> >>
> >> --
> >> Sent via pgsql-in-general mailing list (pgsql-in-general(at)postgresql(dot)org
> )
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-in-general
> >>
>


From: Niranjan <niranjan81(at)gmail(dot)com>
To: "Podrigal, Aron" <aronp(at)guaranteedplus(dot)com>
Cc: pgsql-in-general(at)postgresql(dot)org, Michele Bosi <michele(dot)bosi(at)gmail(dot)com>
Subject: Re: Result set stability in queries sorted by column with non-unique values
Date: 2015-04-16 02:02:00
Message-ID: CAMYjFn-oocF5WKbccB+GHMFb+8WviaOGVQRNa7yv2KeBLFyh0A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-in-general

Hi, when I said, may be we can get non repetitiveness in the results, the
probability of it is more dependent on the non repetitive nature of the
"data" being queried, and less dependent on the index logic. Hence the need
for heavy testing.
But if we can not be certain on the data and its consistency, then it's
best not to depend on the index, but on a separate unique column.

Thanks & Regards
Niranjan D. Pandit
On 16-Apr-2015 7:05 am, "Podrigal, Aron" <aronp(at)guaranteedplus(dot)com> wrote:

> I think this stack answer would give you some more info
> http://dba.stackexchange.com/a/38718
> On Apr 15, 2015 1:32 PM, "Michele Bosi" <michele(dot)bosi(at)gmail(dot)com> wrote:
>
>> That's what I heard but I could not find any convincing reference in
>> the docs or elsewhere, do you know of any?
>>
>> On 15 April 2015 at 17:54, Podrigal, Aron <aronp(at)guaranteedplus(dot)com>
>> wrote:
>> > I think the non-unique cols are usually returned in the order they're
>> > stored on disk.
>> >
>> >
>> > Aron
>> >
>> > On 4/15/15, Michele Bosi <michele(dot)bosi(at)gmail(dot)com> wrote:
>> >> Thanks Niranjan,
>> >>
>> >> In our case we require the rows returned to be in the same consistent
>> >> order across several queries as we need to paginate through a large
>> >> set of results by doing a combination of ORDER BY x, LIMIT, OFFSET.
>> >> However the sorting key "x" can contain non-unique values, thus we
>> >> need to be sure that rows with the same values end up in the same
>> >> position across the different queries we do per each result "page".
>> >>
>> >> As I understand, unless we add a secondary sorting key (like the "id"
>> >> of the row) which is guaranteed to be unique, we might end up with
>> >> rows that can potentially "fluctuate" across the query result, is it
>> >> correct?
>> >>
>> >> Thank you
>> >>
>> >> On 15 April 2015 at 15:54, Niranjan <niranjan81(at)gmail(dot)com> wrote:
>> >>> Hi,
>> >>> I am sure the variation depends on the query plan decided by the
>> engine
>> >>> at
>> >>> the time of execution, in short, AFAIK (as far as I know) the answer
>> is
>> >>> No.
>> >>> There is no guarantee that the rows will be returned in the exact
>> order.
>> >>>
>> >>> Some of the possible answers to be further explored would be
>> >>> A) if we want the data to be always randomly ordered explore function
>> >>> random(), or is it rand() I guess? to be included in the query.
>> >>> B) if we want the data to be arbitrarily ordered but maintain the
>> order
>> >>> for
>> >>> each query execution, save a randomly ordered same number of records
>> in a
>> >>> different table and then write a joined query.
>> >>>
>> >>> I hope it helps.
>> >>>
>> >>> Thanks & Regards
>> >>> Niranjan D. Pandit
>> >>>
>> >>> On 15-Apr-2015 5:11 pm, "Michele Bosi" <michele(dot)bosi(at)gmail(dot)com>
>> wrote:
>> >>>>
>> >>>> Hi all,
>> >>>>
>> >>>> I would like to know if postgresql is guaranteed to return the rows
>> >>>> always in the same order in the following case:
>> >>>>
>> >>>> table "boxes"
>> >>>> -------------------
>> >>>> width | height
>> >>>> 1 | 5
>> >>>> 1 | 3
>> >>>> 1 | 7
>> >>>> 1 | 2
>> >>>> 1 | 4
>> >>>>
>> >>>> SELECT * FROM boxes ORDER BY width;
>> >>>>
>> >>>> result:
>> >>>> 1 | 5
>> >>>> 1 | 3
>> >>>> 1 | 7
>> >>>> 1 | 2
>> >>>> 1 | 4
>> >>>>
>> >>>> In practice we've seen that postgresql returns the rows in some kind
>> >>>> of stably arbitrary order (which is expected because "width" contains
>> >>>> non unique values) but the subtle ambiguity here (which is important
>> >>>> for our application) is whether postresql is guaranteed to always
>> >>>> return the same arbitrary order in a stable fashion or if it can
>> (even
>> >>>> theoretically) return an always different aribtrary order every time
>> I
>> >>>> perform the query, say for example:
>> >>>>
>> >>>> SELECT * FROM boxes ORDER BY width;
>> >>>> 1 | 3
>> >>>> 1 | 5
>> >>>> 1 | 7
>> >>>> 1 | 4
>> >>>> 1 | 2
>> >>>>
>> >>>> SELECT * FROM boxes ORDER BY width;
>> >>>> 1 | 2
>> >>>> 1 | 5
>> >>>> 1 | 7
>> >>>> 1 | 4
>> >>>> 1 | 3
>> >>>>
>> >>>> etc.
>> >>>>
>> >>>> The only two relevant lines I've found in the docs
>> >>>> (
>> http://www.postgresql.org/docs/current/interactive/queries-order.html)
>> >>>> are:
>> >>>>
>> >>>> "A particular output ordering can only be guaranteed if the sort step
>> >>>> is explicitly chosen." which does not tell us in what order rows with
>> >>>> same sorting-key value are returned.
>> >>>>
>> >>>> and
>> >>>>
>> >>>> "If sorting is not chosen, the rows will be returned in an
>> unspecified
>> >>>> order." which I would expect to be the case also when sorting is
>> >>>> chosen BUT the sorting-key values are always the same. However this
>> >>>> still does not tell us if the "unspecified order" is always the same
>> >>>> or not, ie. if it's stable. Another way to put this last point would
>> >>>> be: is "SELECT * FROM boxes LIMIT 3" guaranteed to always return the
>> >>>> same 3 rows (assuming I don't insert or delete new rows) or can
>> >>>> theoretically return any different 3 every time I query?
>> >>>>
>> >>>> Thank you.
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Sent via pgsql-in-general mailing list (
>> pgsql-in-general(at)postgresql(dot)org)
>> >>>> To make changes to your subscription:
>> >>>> http://www.postgresql.org/mailpref/pgsql-in-general
>> >>
>> >>
>> >> --
>> >> Sent via pgsql-in-general mailing list (
>> pgsql-in-general(at)postgresql(dot)org)
>> >> To make changes to your subscription:
>> >> http://www.postgresql.org/mailpref/pgsql-in-general
>> >>
>>
>


From: Deepak <mail(dot)dktyagi(at)gmail(dot)com>
To: Niranjan <niranjan81(at)gmail(dot)com>
Cc: Michele Bosi <michele(dot)bosi(at)gmail(dot)com>, pgsql-in-general(at)postgresql(dot)org, "Podrigal, Aron" <aronp(at)guaranteedplus(dot)com>
Subject: Re: Result set stability in queries sorted by column with non-unique values
Date: 2015-04-16 05:04:18
Message-ID: CALz1kS2reQJ+H2_fMVS=UF-HkywQgE=Gpm7mRqtf8hnJbya7Rg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-in-general

Hi,

why don't you include "height" in order by clause like below?
SELECT * FROM boxes ORDER BY width,height;

If you want to test if postgres returns result in particular order create
multiple copies of "Boxes" table and run your SQL query on each table until
you get variation in result.

Thanks,
Deepak
On Apr 16, 2015 7:32 AM, "Niranjan" <niranjan81(at)gmail(dot)com> wrote:

Hi, when I said, may be we can get non repetitiveness in the results, the
probability of it is more dependent on the non repetitive nature of the
"data" being queried, and less dependent on the index logic. Hence the need
for heavy testing.
But if we can not be certain on the data and its consistency, then it's
best not to depend on the index, but on a separate unique column.

Thanks & Regards
Niranjan D. Pandit
On 16-Apr-2015 7:05 am, "Podrigal, Aron" <aronp(at)guaranteedplus(dot)com> wrote:

> I think this stack answer would give you some more info
> http://dba.stackexchange.com/a/38718
> On Apr 15, 2015 1:32 PM, "Michele Bosi" <michele(dot)bosi(at)gmail(dot)com> wrote:
>
>> That's what I heard but I could not find any convincing reference in
>> the docs or elsewhere, do you know of any?
>>
>> On 15 April 2015 at 17:54, Podrigal, Aron <aronp(at)guaranteedplus(dot)com>
>> wrote:
>> > I think the non-unique cols are usually returned in the order they're
>> > stored on disk.
>> >
>> >
>> > Aron
>> >
>> > On 4/15/15, Michele Bosi <michele(dot)bosi(at)gmail(dot)com> wrote:
>> >> Thanks Niranjan,
>> >>
>> >> In our case we require the rows returned to be in the same consistent
>> >> order across several queries as we need to paginate through a large
>> >> set of results by doing a combination of ORDER BY x, LIMIT, OFFSET.
>> >> However the sorting key "x" can contain non-unique values, thus we
>> >> need to be sure that rows with the same values end up in the same
>> >> position across the different queries we do per each result "page".
>> >>
>> >> As I understand, unless we add a secondary sorting key (like the "id"
>> >> of the row) which is guaranteed to be unique, we might end up with
>> >> rows that can potentially "fluctuate" across the query result, is it
>> >> correct?
>> >>
>> >> Thank you
>> >>
>> >> On 15 April 2015 at 15:54, Niranjan <niranjan81(at)gmail(dot)com> wrote:
>> >>> Hi,
>> >>> I am sure the variation depends on the query plan decided by the
>> engine
>> >>> at
>> >>> the time of execution, in short, AFAIK (as far as I know) the answer
>> is
>> >>> No.
>> >>> There is no guarantee that the rows will be returned in the exact
>> order.
>> >>>
>> >>> Some of the possible answers to be further explored would be
>> >>> A) if we want the data to be always randomly ordered explore function
>> >>> random(), or is it rand() I guess? to be included in the query.
>> >>> B) if we want the data to be arbitrarily ordered but maintain the
>> order
>> >>> for
>> >>> each query execution, save a randomly ordered same number of records
>> in a
>> >>> different table and then write a joined query.
>> >>>
>> >>> I hope it helps.
>> >>>
>> >>> Thanks & Regards
>> >>> Niranjan D. Pandit
>> >>>
>> >>> On 15-Apr-2015 5:11 pm, "Michele Bosi" <michele(dot)bosi(at)gmail(dot)com>
>> wrote:
>> >>>>
>> >>>> Hi all,
>> >>>>
>> >>>> I would like to know if postgresql is guaranteed to return the rows
>> >>>> always in the same order in the following case:
>> >>>>
>> >>>> table "boxes"
>> >>>> -------------------
>> >>>> width | height
>> >>>> 1 | 5
>> >>>> 1 | 3
>> >>>> 1 | 7
>> >>>> 1 | 2
>> >>>> 1 | 4
>> >>>>
>> >>>> SELECT * FROM boxes ORDER BY width;
>> >>>>
>> >>>> result:
>> >>>> 1 | 5
>> >>>> 1 | 3
>> >>>> 1 | 7
>> >>>> 1 | 2
>> >>>> 1 | 4
>> >>>>
>> >>>> In practice we've seen that postgresql returns the rows in some kind
>> >>>> of stably arbitrary order (which is expected because "width" contains
>> >>>> non unique values) but the subtle ambiguity here (which is important
>> >>>> for our application) is whether postresql is guaranteed to always
>> >>>> return the same arbitrary order in a stable fashion or if it can
>> (even
>> >>>> theoretically) return an always different aribtrary order every time
>> I
>> >>>> perform the query, say for example:
>> >>>>
>> >>>> SELECT * FROM boxes ORDER BY width;
>> >>>> 1 | 3
>> >>>> 1 | 5
>> >>>> 1 | 7
>> >>>> 1 | 4
>> >>>> 1 | 2
>> >>>>
>> >>>> SELECT * FROM boxes ORDER BY width;
>> >>>> 1 | 2
>> >>>> 1 | 5
>> >>>> 1 | 7
>> >>>> 1 | 4
>> >>>> 1 | 3
>> >>>>
>> >>>> etc.
>> >>>>
>> >>>> The only two relevant lines I've found in the docs
>> >>>> (
>> http://www.postgresql.org/docs/current/interactive/queries-order.html)
>> >>>> are:
>> >>>>
>> >>>> "A particular output ordering can only be guaranteed if the sort step
>> >>>> is explicitly chosen." which does not tell us in what order rows with
>> >>>> same sorting-key value are returned.
>> >>>>
>> >>>> and
>> >>>>
>> >>>> "If sorting is not chosen, the rows will be returned in an
>> unspecified
>> >>>> order." which I would expect to be the case also when sorting is
>> >>>> chosen BUT the sorting-key values are always the same. However this
>> >>>> still does not tell us if the "unspecified order" is always the same
>> >>>> or not, ie. if it's stable. Another way to put this last point would
>> >>>> be: is "SELECT * FROM boxes LIMIT 3" guaranteed to always return the
>> >>>> same 3 rows (assuming I don't insert or delete new rows) or can
>> >>>> theoretically return any different 3 every time I query?
>> >>>>
>> >>>> Thank you.
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Sent via pgsql-in-general mailing list (
>> pgsql-in-general(at)postgresql(dot)org)
>> >>>> To make changes to your subscription:
>> >>>> http://www.postgresql.org/mailpref/pgsql-in-general
>> >>
>> >>
>> >> --
>> >> Sent via pgsql-in-general mailing list (
>> pgsql-in-general(at)postgresql(dot)org)
>> >> To make changes to your subscription:
>> >> http://www.postgresql.org/mailpref/pgsql-in-general
>> >>
>>
>


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Michele Bosi <michele(dot)bosi(at)gmail(dot)com>
Cc: Niranjan <niranjan81(at)gmail(dot)com>, "pgsql-in-general(at)postgresql(dot)org" <pgsql-in-general(at)postgresql(dot)org>
Subject: Re: Result set stability in queries sorted by column with non-unique values
Date: 2015-04-16 05:23:29
Message-ID: CABOikdMA0KJOnCQadURPdF6ssk3JwP8Ez-HQqdEMjcRm-1ATBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-in-general

On Wed, Apr 15, 2015 at 9:24 PM, Michele Bosi <michele(dot)bosi(at)gmail(dot)com>
wrote:

> Thanks Niranjan,
>
> In our case we require the rows returned to be in the same consistent
> order across several queries as we need to paginate through a large
> set of results by doing a combination of ORDER BY x, LIMIT, OFFSET.
> However the sorting key "x" can contain non-unique values, thus we
> need to be sure that rows with the same values end up in the same
> position across the different queries we do per each result "page".
>
> As I understand, unless we add a secondary sorting key (like the "id"
> of the row) which is guaranteed to be unique, we might end up with
> rows that can potentially "fluctuate" across the query result, is it
> correct?
>
>
There are two reasons why the ordering may differ between queries when
ORDER BY is on a non-unique column:

1. Different query plans being used for different queries
2. Physical ordering of rows has varied because of UPDATE/DELETES and
either VACUUM or HOT recycling dead space in the table.

If you control both these factors or somehow know that they are going to be
consistent across all queries, which pretty much means that all queries use
the same scan method for the table and the table itself is not undergoing
any DELETE/UPDATE, you could get the same row ordering. But clearly you
can't build your application under those assumptions and it will be far
better to add another column to the ORDER BY to guarantee same ordering.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Michele Bosi <michele(dot)bosi(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Niranjan <niranjan81(at)gmail(dot)com>, "pgsql-in-general(at)postgresql(dot)org" <pgsql-in-general(at)postgresql(dot)org>, aronp(at)guaranteedplus(dot)com
Subject: Re: Result set stability in queries sorted by column with non-unique values
Date: 2015-04-16 10:12:29
Message-ID: CACmQ0_+1pPC6eKK8Hw+RVuekhkQZMOZ1Sesny3uw0WRbAty3=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-in-general

Thank you guys, I think it's clear now that in order to ensure a
deterministically 100% consistent result across queries I should add a
secondary sorting key with unique values. Since we have an "id"
column, this seems the perfect candidate.
Best regards,
Mic

On 16 April 2015 at 06:23, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
>
> On Wed, Apr 15, 2015 at 9:24 PM, Michele Bosi <michele(dot)bosi(at)gmail(dot)com>
> wrote:
>>
>> Thanks Niranjan,
>>
>> In our case we require the rows returned to be in the same consistent
>> order across several queries as we need to paginate through a large
>> set of results by doing a combination of ORDER BY x, LIMIT, OFFSET.
>> However the sorting key "x" can contain non-unique values, thus we
>> need to be sure that rows with the same values end up in the same
>> position across the different queries we do per each result "page".
>>
>> As I understand, unless we add a secondary sorting key (like the "id"
>> of the row) which is guaranteed to be unique, we might end up with
>> rows that can potentially "fluctuate" across the query result, is it
>> correct?
>>
>
> There are two reasons why the ordering may differ between queries when ORDER
> BY is on a non-unique column:
>
> 1. Different query plans being used for different queries
> 2. Physical ordering of rows has varied because of UPDATE/DELETES and either
> VACUUM or HOT recycling dead space in the table.
>
> If you control both these factors or somehow know that they are going to be
> consistent across all queries, which pretty much means that all queries use
> the same scan method for the table and the table itself is not undergoing
> any DELETE/UPDATE, you could get the same row ordering. But clearly you
> can't build your application under those assumptions and it will be far
> better to add another column to the ORDER BY to guarantee same ordering.
>
> Thanks,
> Pavan
>
> --
> Pavan Deolasee
> http://www.linkedin.com/in/pavandeolasee