Re: pgbench - add pseudo-random permutation function

Lists: Postg배트맨 토토SQL
From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: pgbench - add pseudo-random permutation function
Date: 2018-07-28 14:03:49
Message-ID: alpine.DEB.2.21.1807280944370.5142@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 베이SQL


Hello,

This patch adds a pseudo-random permutation function to pgbench. It allows
to mix non uniform random keys to avoid trivial correlations between
neighboring values, hence between pages.

The function is a simplistic form of encryption adapted to any size, using
a few iterations of scramble and scatter phases. The result is not
cryptographically convincing, nor even statistically, but it is quite
inexpensive and achieves the desired result. A computation costs 0.22 µs
per call on my laptop, about three times the cost of a simple function.

Alternative designs, such as iterating over an actual encryption function
or using some sbox, would lead to much more costly solutions and complex
code.

I also join a few scripts I used for testing.

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-1.patch text/plain 11.1 KB
prp_init.sql application/x-sql 245 bytes
prp_analyse.sql application/x-sql 264 bytes
prp_gen_vals.sql application/x-sql 348 bytes
prp_test_2.sql application/x-sql 243 bytes
prp_test_3.sql application/x-sql 245 bytes
prp_test_4.sql application/x-sql 284 bytes
prp_test_5.sql application/x-sql 324 bytes
prp_test_6.sql application/x-sql 381 bytes
prp_test_n.sql application/x-sql 274 bytes
prp_perf.sql application/x-sql 303 bytes

From: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-09-09 22:35:30
Message-ID: f30bc717-5532-c2d8-e513-b4912c10ddbe@interdb.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Fabian,

I reviewed `pgbench-prp-func-1.patch` and found an incomplete
implementation.

In the pseudorandom_perm function, I confirmed that the scramble and
scatter operations are mathematically bijections. Therefore, this
function is mathematically correct.

However, the implementation of the scatter operation in this patch
overflows in many cases if the variable:size is 38 bit integer or
greater. Because the variable:size and the item of the array:primes[]
which stores 27-29 bit integers are multiplicated. If overflow occurs,
the scatter operation does not satisfy bijective.

I did a sampling inspection, whose conditions are the variable:size is
1099511627773 (= 40 bit integer) and the variable:seed is 5432. Even
with very few samples, I found a huge number of collisions as shown below:

pr_perm(409749816, 1099511627773, 5432) = pr_perm(491041141,
1099511627773, 5432) = pr_perm(729171766, 1099511627773, 5432) =
pr_perm(849775914, 1099511627773, 5432) = 22445482629,
pr_perm(45609644, 1099511627773, 5432) = pr_perm(174005122,
1099511627773, 5432) = pr_perm(811754941, 1099511627773, 5432) =
pr_perm( 1131176891, 1099511627773, 5432) = 10626826534,
and so on.

There are two ways to resolve this issue. The first one is to reduce the
maximum value can be set for the variable:size. The second one is to add
a modular multiplication function to avoid overflow. I made a patch
including the function that can be calculated modular multiplication
without overflow, and attached this mail.
(I also attached a simple test suite of the function I added.)

In the others parts of the original patch, I could apply the patch and
did tests without trouble; the documentations and comments are well
except one comment in the function shown below:

>> (1) scramble: partial xors on power-or-2 subsets

I could not understand this meaning when I read it at the first time.

Best regards,

On 2018/07/28 15:03, Fabien COELHO wrote:
>
> Hello,
>
> This patch adds a pseudo-random permutation function to pgbench. It
> allows to mix non uniform random keys to avoid trivial correlations
> between neighboring values, hence between pages.
>
> The function is a simplistic form of encryption adapted to any size,
> using a few iterations of scramble and scatter phases. The result is not
> cryptographically convincing, nor even statistically, but it is quite
> inexpensive and achieves the desired result. A computation costs 0.22 µs
> per call on my laptop, about three times the cost of a simple function.
>
> Alternative designs, such as iterating over an actual encryption
> function or using some sbox, would lead to much more costly solutions
> and complex code.
>
> I also join a few scripts I used for testing.
>

Attachment Content-Type Size
pgbench-prp-func-2.patch text/plain 13.0 KB
modular_multiplicate.c text/plain 2.6 KB
modular_multiplicate.dat text/plain 1.8 KB
modular_multiplicate_test.sh application/x-sh 416 bytes

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-09-11 14:57:33
Message-ID: alpine.DEB.2.21.1809111652390.30244@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg윈 토토SQL :


Hello Hironobu-san,

> I reviewed `pgbench-prp-func-1.patch` and found an incomplete implementation.

Indeed, thanks a lot for the debug, and the proposed fix!

I'm going to work a little bit more on the patch based on your proposed
changes, and submit a v3, hopefully soon.

--
Fabien.


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-09-12 09:26:53
Message-ID: alpine.DEB.2.21.1809121104160.13887@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hello Hironobu-san,

> However, the implementation of the scatter operation in this patch overflows
> in many cases if the variable:size is 38 bit integer or greater. Because the
> variable:size and the item of the array:primes[] which stores 27-29 bit
> integers are multiplicated. If overflow occurs, the scatter operation does
> not satisfy bijective.

Indeed. Again, thanks for the debug! As you contributed some code, I added
you as a co-author in the CF entry.

Attached a v3, based on your fix, plus some additional changes:
- explicitly declare unsigned variables where appropriate, to avoid casts
- use smaller 24 bits primes instead of 27-29 bits
- add a shortcut for multiplier below 24 bits and y value below 40 bits,
which should avoid the manually implemented multiplication in most
practical cases (tables with over 2^40 rows are pretty rare...).
- change the existing shortcut to look a the number of bits instead of
using 32 limits.
- add a test for minimal code coverage with over 40 bits sizes
- attempt to improve the documentation
- some comments were updates, hopefully for the better

The main idea behind the smaller primes is to avoid the expensive modmul
implementation on most realistic cases.

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-3.patch text/x-diff 15.0 KB

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-09-16 20:05:28
Message-ID: alpine.DEB.2.21.1809162200180.1834@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: 503 스포츠 토토 베트맨


Hello Hironobu-san,

Here is a v4, based on our out-of-list discussion:
- the mask function is factored out
- the popcount builtin is used if available

> Attached a v3, based on your fix, plus some additional changes:
> - explicitly declare unsigned variables where appropriate, to avoid casts
> - use smaller 24 bits primes instead of 27-29 bits
> - add a shortcut for multiplier below 24 bits and y value below 40 bits,
> which should avoid the manually implemented multiplication in most
> practical cases (tables with over 2^40 rows are pretty rare...).
> - change the existing shortcut to look a the number of bits instead of
> using 32 limits.
> - add a test for minimal code coverage with over 40 bits sizes
> - attempt to improve the documentation
> - some comments were updates, hopefully for the better

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-4.patch text/x-diff 18.0 KB

From: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-09-18 17:15:51
Message-ID: 5e049bd6-fceb-958e-2a3b-165f8b7261a9@interdb.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg롤 토토SQL :

Hi Fabian-san,

I reviewed 'pgbench-prp-func/pgbench-prp-func-4.patch'.

I could apply it and did the TAP test successfully. I could not find
typo in the documentations and comments.

To make sure, I checked the new routine which contains the
__builtin_popcountll() function on Linux + gcc 7.3.1 and I confirmed
that it works well.

I thinks this patch is fine.

Best regards,

On 2018/09/16 21:05, Fabien COELHO wrote:
>
> Hello Hironobu-san,
>
> Here is a v4, based on our out-of-list discussion:
>  - the mask function is factored out
>  - the popcount builtin is used if available
>
>> Attached a v3, based on your fix, plus some additional changes:
>> - explicitly declare unsigned variables where appropriate, to avoid casts
>> - use smaller 24 bits primes instead of 27-29 bits
>> - add a shortcut for multiplier below 24 bits and y value below 40 bits,
>>   which should avoid the manually implemented multiplication in most
>>   practical cases (tables with over 2^40 rows are pretty rare...).
>> - change the existing shortcut to look a the number of bits instead of
>>   using 32 limits.
>> - add a test for minimal code coverage with over 40 bits sizes
>> - attempt to improve the documentation
>> - some comments were updates, hopefully for the better
>


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-09-18 19:07:34
Message-ID: alpine.DEB.2.21.1809182059330.8683@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> I reviewed 'pgbench-prp-func/pgbench-prp-func-4.patch'. [...] I thinks
> this patch is fine.

Thanks!

You may consider turning it "ready" in the CF app, so as to see whether
some committer agrees with you.

--
Fabien.


From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: hironobu(at)interdb(dot)jp, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-09-26 04:49:09
Message-ID: CAEepm=2Z7yuLTNuoo3FdSD+y-Hj6pCJGknXqT6a6URx4Sc1u5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Sep 19, 2018 at 7:07 AM Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
> > I reviewed 'pgbench-prp-func/pgbench-prp-func-4.patch'. [...] I thinks
> > this patch is fine.

modular_multiply() is an interesting device. I will leave this to
committers with a stronger mathematical background than me, but I have
a small comment in passing:

+#ifdef HAVE__BUILTIN_POPCOUNTLL
+ return __builtin_popcountll(n);
+#else /* use clever no branching bitwise operator version */

I think it is not enough for the compiler to have
__builtin_popcountll(). The CPU that this is eventually executed on
must actually have the POPCNT instruction[1] (or equivalent on ARM,
POWER etc), or the program will die with SIGILL. There are CPUs in
circulation produced in this decade that don't have it. I have
previously considered something like this[2], but realised you would
therefore need a runtime check. There are a couple of ways to do
that: see commit a7a73875 for one example, also
__builtin_cpu_supports(), and direct CPU ID bit checks (some
platforms). There is also the GCC "multiversion" system that takes
care of that for you but that worked only for C++ last time I checked.

[1] https://en.wikipedia.org/wiki/SSE4#POPCNT_and_LZCNT
[2] /message-id/CAEepm%3D3g1_fjJGp38QGv%2B38BC2HHVkzUq6s69nk3mWLgPHqC3A%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: hironobu(at)interdb(dot)jp, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-09-26 08:26:41
Message-ID: alpine.DEB.2.21.1809260717020.22248@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 결과SQL


Hello Thomas,

> modular_multiply() is an interesting device. I will leave this to
> committers with a stronger mathematical background than me, but I have
> a small comment in passing:

For testing this function, I have manually commented out the various
shortcuts so that the manual multiplication code is always used, and the
tests passed. I just did it again.

> +#ifdef HAVE__BUILTIN_POPCOUNTLL
> + return __builtin_popcountll(n);
> +#else /* use clever no branching bitwise operator version */
>
> I think it is not enough for the compiler to have
> __builtin_popcountll(). The CPU that this is eventually executed on
> must actually have the POPCNT instruction[1] (or equivalent on ARM,
> POWER etc), or the program will die with SIGILL.

Hmmm, I'd be pretty surprised: The point of the builtin is to delegate to
the compiler the hassle of selecting the best option available, depending
on target hardware class: whether it issues a cpu/sse4 instruction is
not/should not be our problem.

> There are CPUs in circulation produced in this decade that don't have
> it.

Then the compiler, when generating code that is expected to run for a
large class of hardware which include such old ones, should not use a
possibly unavailable instruction, or the runtime should take
responsability for dynamically selecting a viable option.

My understanding is that it should always be safe to call __builtin_XYZ
functions when available. Then if you compile saying that you want code
specific to cpu X and then run it on cpu Y and it fails, basically you
just shot yourself in the foot.

> I have previously considered something like this[2], but realised you
> would therefore need a runtime check. There are a couple of ways to do
> that: see commit a7a73875 for one example, also
> __builtin_cpu_supports(), and direct CPU ID bit checks (some platforms).
> There is also the GCC "multiversion" system that takes care of that for
> you but that worked only for C++ last time I checked.

ISTM that the purpose of a dynamic check would be to have the better
hardware benefit even when compiling for a generic class of hardware which
may or may not have the better option.

This approach is fine for reaching a better performance/portability
compromise, but I do not think that it is that useful for pgbench to go to
this level of optimization, unless someone else takes care, hence the
compiler builtin.

An interesting side effect of your mail is that while researching a bit on
the subject I noticed __builtin_clzll() which helps improve the nbits code
compared to using popcount. Patch attached uses CLZ insted of POPCOUNT.

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-5.patch text/x-diff 17.8 KB

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: hironobu(at)interdb(dot)jp, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-09-26 10:25:09
Message-ID: CAEepm=0FMnWm9miM3Tkuq5j3Q31WQJ7vvniy6rinCfkrsByQRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg스포츠 토토 결과SQL

On Wed, Sep 26, 2018 at 8:26 PM Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
> > modular_multiply() is an interesting device. I will leave this to
> > committers with a stronger mathematical background than me, but I have
> > a small comment in passing:
>
> For testing this function, I have manually commented out the various
> shortcuts so that the manual multiplication code is always used, and the
> tests passed. I just did it again.
>
> > +#ifdef HAVE__BUILTIN_POPCOUNTLL
> > + return __builtin_popcountll(n);
> > +#else /* use clever no branching bitwise operator version */
> >
> > I think it is not enough for the compiler to have
> > __builtin_popcountll(). The CPU that this is eventually executed on
> > must actually have the POPCNT instruction[1] (or equivalent on ARM,
> > POWER etc), or the program will die with SIGILL.
>
> Hmmm, I'd be pretty surprised: The point of the builtin is to delegate to
> the compiler the hassle of selecting the best option available, depending
> on target hardware class: whether it issues a cpu/sse4 instruction is
> not/should not be our problem.

True, it selects based on what you tell it:

$ cat x.c
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
printf("%d\n", __builtin_popcount(atoi(argv[1])));
}
$ gdb -q a.out
...
(gdb) disass main
...
0x00000000004005a4 <+39>: callq 0x4005c0 <__popcountdi2>
...
$ gcc -mpopcnt x.c
$ gdb -q a.out
...
(gdb) disass main
...
0x000000000040059f <+34>: popcnt %eax,%eax
...

I'd forgotten one detail... we figure out if need to tell it that we
have SSE4.2 instructions explicitly in the configure script:

checking which CRC-32C implementation to use... SSE 4.2 with runtime check

We enable -msse4.2 just on the one file that needs it, in src/port/Makefile:

pg_crc32c_sse42.o: CFLAGS+=$(CFLAGS_SSE42)

On this CentOS machine I see:

gcc ... -msse4.2 ... -c pg_crc32c_sse42.c -o pg_crc32c_sse42_srv.o

That is necessary because most people consume PostgreSQL through
packages from distributions that have to work on an Athlon II or
whatever, so we can't just use -msse4.2 for every translation unit.
So it becomes our job to isolate small bits of code that use newer
instructions, if it's really worth the effort to do that, and supply
our own runtime checks and provide a fallback.

> > There are CPUs in circulation produced in this decade that don't have
> > it.
>
> Then the compiler, when generating code that is expected to run for a
> large class of hardware which include such old ones, should not use a
> possibly unavailable instruction, or the runtime should take
> responsability for dynamically selecting a viable option.

Right. Our problem is that if we didn't do our own runtime testing,
users of (say) Debian and RHEL packages (= most users of PostgreSQL)
would effectively never be able to use things like CRC32 instructions.
None of that seems worth it for something like this.

> An interesting side effect of your mail is that while researching a bit on
> the subject I noticed __builtin_clzll() which helps improve the nbits code
> compared to using popcount. Patch attached uses CLZ insted of POPCOUNT.

It's annoying that fls() didn't make it into POSIX along with ffs().
On my system it compiles to a BSR instruction, just like
__builtin_clz().

--
Thomas Munro
http://www.enterprisedb.com


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: hironobu(at)interdb(dot)jp, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-09-26 11:20:49
Message-ID: alpine.DEB.2.21.1809261305270.22248@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 사이트 순위SQL


Hello,

> That is necessary because most people consume PostgreSQL through
> packages from distributions that have to work on an Athlon II or
> whatever, so we can't just use -msse4.2 for every translation unit.
> So it becomes our job to isolate small bits of code that use newer
> instructions, if it's really worth the effort to do that, and supply
> our own runtime checks and provide a fallback.

Ok. That was my understanding so as to improve the portability/performance
compromise. I do not think that pgbench is worth the effort on this
particular point.

> [...] None of that seems worth it for something like this.

Indeed.

So, am I right to deducing that you are satisfied with the current status
of the patch, with the nbits implementation either based on popcount (v4)
or clz (v5) compiler intrinsics? I think that the clz option is better.

--
Fabien.


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, hironobu(at)interdb(dot)jp, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-10-01 06:40:04
Message-ID: 20181001064004.GE11712@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 사이트SQL

On Wed, Sep 26, 2018 at 01:20:49PM +0200, Fabien COELHO wrote:
> So, am I right to deducing that you are satisfied with the current status of
> the patch, with the nbits implementation either based on popcount (v4) or
> clz (v5) compiler intrinsics? I think that the clz option is better.

Fabien, please note that v5 does not apply, so I moved this patch to
next CF, waiting on author.
--
Michael


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, hironobu(at)interdb(dot)jp, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-10-01 11:19:42
Message-ID: alpine.DEB.2.21.1810010927520.4827@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
> Subject: Re: pgbench - add pseudo-random permutation function
>
> On Wed, Sep 26, 2018 at 01:20:49PM +0200, Fabien COELHO wrote:
>> So, am I right to deducing that you are satisfied with the current status of
>> the patch, with the nbits implementation either based on popcount (v4) or
>> clz (v5) compiler intrinsics? I think that the clz option is better.
>
> Fabien, please note that v5 does not apply,

Indeed, tests interact with 92a0342 committed on Thursday.

Here is a rebase of the latest version based on CLZ. According to basic
test I made, the underlying hardware instruction seems to be more often
available.

> so I moved this patch to next CF, waiting on author.

I'm going to change its state to "Needs review".

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-6.patch text/x-diff 17.8 KB

From: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-10-15 02:03:04
Message-ID: 655df530-b233-e3d8-4c08-9edf853934a2@interdb.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

I reviewed pgbench-prp-func-6.patch.

(1) modular_multiply()
In modular_multiply(), the numbers of digits of inputs are checked in
order to determine using the interleaved modular multiplication
algorithm or not. However, the check condition absolutely depends on the
implementation of pseudorandom_perm() even though modular_multiply() is
a general function.

I think it is better that pseudorandom_perm() directly checks the
condition because it can be worked efficiently and modular_multiply()
can be used in general purpose.

(2) pseudorandom_perm() and 001_pgbench_with_server.pl
Reading the discussion of __builtin_xxx functions, I remembered another
overflow issue.

pseudorandom_perm() uses the Donald Knuth's linear congruential
generator method to obtain pseudo-random numbers. This method, not only
this but also all linear congruential generator methods, always overflows.

If we do not need to guarantee the portability of this code, we do not
care about the result of this method because we just use *pseudo* random
numbers. (In fact, I ignored it before.) However, since we have to
guarantee the portability, we have to calculate it accurately. I
therefore implemented the function dk_lcg() and rewrote pseudorandom_perm().

Just to be sure, I made a python code to check the algorithm of
pseudorandom_perm() and run it.
Fortunately, my python code and pseudorandom_perm() which I rewrote
returned the same answers, so I rewrote the answers in
001_pgbench_with_server.pl.

I attached the new patch `pgbench-prp-func-7.patch`, the python code
`pseudorandom_perm.py`, and the pr_perm check script file
`pr_perm_check.sql`.

Best regards,

On 2018/10/01 12:19, Fabien COELHO wrote:
>>     PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
>> Subject: Re: pgbench - add pseudo-random permutation function
>>
>> On Wed, Sep 26, 2018 at 01:20:49PM +0200, Fabien COELHO wrote:
>>> So, am I right to deducing that you are satisfied with the current
>>> status of
>>> the patch, with the nbits implementation either based on popcount
>>> (v4) or
>>> clz (v5) compiler intrinsics? I think that the clz option is better.
>>
>> Fabien, please note that v5 does not apply,
>
> Indeed, tests interact with 92a0342 committed on Thursday.
>
> Here is a rebase of the latest version based on CLZ. According to basic
> test I made, the underlying hardware instruction seems to be more often
> available.
>
>> so I moved this patch to next CF, waiting on author.
>
> I'm going to change its state to "Needs review".
>

Attachment Content-Type Size
pr_perm_check.sql text/plain 346 bytes
pseudorandom_perm.py text/x-python-script 1.6 KB
pgbench-prp-func-7.patch text/plain 17.6 KB

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-10-15 13:02:52
Message-ID: alpine.DEB.2.21.1810150957290.28507@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hello Hironobu-san,

> I reviewed pgbench-prp-func-6.patch.

Thanks.

> (1) modular_multiply()
> In modular_multiply(), the numbers of digits of inputs are checked in order
> to determine using the interleaved modular multiplication algorithm or not.
> However, the check condition absolutely depends on the implementation of
> pseudorandom_perm() even though modular_multiply() is a general function.
>
> I think it is better that pseudorandom_perm() directly checks the condition
> because it can be worked efficiently and modular_multiply() can be used in
> general purpose.

You moved the shortcut to the caller. Why not, it removes one modulo
operation and avoids the call altogether.

> (2) pseudorandom_perm() and 001_pgbench_with_server.pl
> Reading the discussion of __builtin_xxx functions, I remembered another
> overflow issue.
>
> pseudorandom_perm() uses the Donald Knuth's linear congruential generator
> method to obtain pseudo-random numbers. This method, not only this but also
> all linear congruential generator methods, always overflows.

> If we do not need to guarantee the portability of this code,

ISTM that we do not need such changes: the code is already portable
because standard C unsigned operations overflow consistently and this is
what it really expected for the linear congruential generator.

If an alternate implementation should be provided, given the heavy cost of
the modular multiplication function, it would be only for those
architectures which fails, detected at configure time. I would not go into
this without an actual failing architecture & C compiler.

Also, the alternate implementation should not change the result, so
something looks amiss in your version. Moreover, I'm unclear how to
implement an overflow multiply with the safe no overflow version.

Here is a v8 which:
- moves the shortcut to the caller
- changes the r formula as you did
- does nothing about Knuth's formula, as nothing should be needed
- fixes tests after Peter's exit status changes

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-8.patch text/x-diff 17.9 KB

From: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-10-23 10:11:32
Message-ID: 29821e16-2877-4681-c9fd-3b3fdbfe089d@interdb.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

> Also, the alternate implementation should not change the result, so
> something looks amiss in your version. Moreover, I'm unclear how to
> implement an overflow multiply with the safe no overflow version.
(snip)

I made an honest mistake. I had assumed the modulo number of Knuth's LCG
is (2 ^ 64 - 1).

BTW, I found other overflow issue.
In pseudorandom_perm(), `modular_multiply() + (key >> LCG_SHIFT)` may
overflow if the result of modular_multiply() is large. Therefore, I've
improved it.

Also, I've simplified Step 5 in modular_multiply().

I attached pgbench-prp-func-9.patch.

Best regards,

Attachment Content-Type Size
pgbench-prp-func-9.patch text/plain 17.6 KB

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-10-24 11:55:46
Message-ID: alpine.DEB.2.21.1810241154210.30118@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg메이저 토토 사이트SQL


Hello Hironobu-san,

> In pseudorandom_perm(), `modular_multiply() + (key >> LCG_SHIFT)` may
> overflow if the result of modular_multiply() is large. Therefore, I've
> improved it.

> Also, I've simplified Step 5 in modular_multiply().

Attached is a v10, where I have:
- updated some comments
- the + cannot overflow because size is taken from a signed int
and the added value is small thanks to the shift.
I have put back the simple formula and added a comment about it.
- added a few test cases, and fix the associated checks

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-10.patch text/x-diff 18.8 KB

From: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-10-24 14:55:57
Message-ID: 1680e8eb-13e8-63f8-ee41-9444112d4523@interdb.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Fabian-san,

I reviewed 'pgbench-prp-func/pgbench-prp-func-10.patch'.

On 2018/10/24 12:55, Fabien COELHO wrote:
>
> Hello Hironobu-san,
>
>> In pseudorandom_perm(), `modular_multiply() + (key >> LCG_SHIFT)` may
>> overflow if the result of modular_multiply() is large. Therefore, I've
>> improved it.
>
>> Also, I've simplified Step 5 in modular_multiply().
>
> Attached is a v10, where I have:
>  - updated some comments
>  - the + cannot overflow because size is taken from a signed int
>    and the added value is small thanks to the shift.
>    I have put back the simple formula and added a comment about it.
>  - added a few test cases, and fix the associated checks
>

I agree your discussion before.

I checked the tests you added in this patch and I confirmed that there
is no problem.

I thinks this patch is fine.

Best regards,


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-10-24 15:08:00
Message-ID: alpine.DEB.2.21.1810241706420.26778@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> I thinks this patch is fine.

Thanks!

Hopefully some committer will pick it up at some point.

--
Fabien.


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-10-24 16:59:04
Message-ID: 20181024165904.4cz72a4ln5s6v4bl@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: 503 스포츠 토토 사이트

Can you please pgindent this?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-10-24 20:43:02
Message-ID: alpine.DEB.2.21.1810242235170.26778@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg스포츠 토토 사이트SQL


Hello Alvaro,

> Can you please pgindent this?

Hmmm. After some investigation, I installed some "pg_bsd_indent" and ran
the "pgindent" script, which reindented far more than the patch... So I
picked up the patch-related changes and integrated them manually, although
not comment changes which break the logic of the algorithm descriptions. I
have not found how to tell pgindent to let comments indentation alone.

Here is the result for the code, and for part of comments.

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-11.patch text/x-diff 18.9 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-10-24 21:00:08
Message-ID: 20181024210008.bzskfwidbm2fazbi@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2018-Oct-24, Fabien COELHO wrote:

>
> Hello Alvaro,
>
> > Can you please pgindent this?
>
> Hmmm. After some investigation, I installed some "pg_bsd_indent" and ran the
> "pgindent" script, which reindented far more than the patch... So I picked
> up the patch-related changes and integrated them manually,

Cool, thanks.

> although not comment changes which break the logic of the algorithm
> descriptions. I have not found how to tell pgindent to let comments
> indentation alone.

You can use /*----- for such comments.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-10-25 01:12:52
Message-ID: 20181025011252.GC5902@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: 503 롤 토토 페치 실패

On Wed, Oct 24, 2018 at 06:00:08PM -0300, Alvaro Herrera wrote:
> On 2018-Oct-24, Fabien COELHO wrote:
>> although not comment changes which break the logic of the algorithm
>> descriptions. I have not found how to tell pgindent to let comments
>> indentation alone.
>
> You can use /*----- for such comments.

A recent example of that is what d55241af has done in
pg_verify_checksums.c.
--
Michael


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2018-10-25 06:21:27
Message-ID: alpine.DEB.2.21.1810250755090.26778@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hello Alvaro,

>> although not comment changes which break the logic of the algorithm
>> descriptions. I have not found how to tell pgindent to let comments
>> indentation alone.
>
> You can use /*----- for such comments.

Thanks for the hint. Here is an updated patch using this marker.

I noticed that some instances use a closing "*-----" sequence, but it does
not seem useful, so I left it out.

I have also tried to improve a few comments, and moved a declaration into
a loop because I did not like the pg-indented version much.

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-12.patch text/x-diff 18.8 KB

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2019-02-02 01:16:56
Message-ID: 20190202011656.GD11299@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Oct 25, 2018 at 08:21:27AM +0200, Fabien COELHO wrote:
> Thanks for the hint. Here is an updated patch using this marker.
>
> I noticed that some instances use a closing "*-----" sequence, but it does
> not seem useful, so I left it out.
>
> I have also tried to improve a few comments, and moved a declaration into a
> loop because I did not like the pg-indented version much.

This patch is marked as ready for committer for some time now, and it
has rotten, so I am moving it to next CF, waiting on author.
--
Michael


From: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2019-02-10 17:46:15
Message-ID: 230cef37-bb93-94bc-e654-f558de4cace1@interdb.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I updated the patch. And also I added some explanations and simple
examples in the modular_multiply function.

Fabian-san,

To make easily understanding, I think it is a good idea to add a brief
explanation of outline the pseudorandom_perm function and bijective
functions/transformations. What do you think about?

Best regards,

On 2019/02/02 1:16, Michael Paquier wrote:
> On Thu, Oct 25, 2018 at 08:21:27AM +0200, Fabien COELHO wrote:
>> Thanks for the hint. Here is an updated patch using this marker.
>>
>> I noticed that some instances use a closing "*-----" sequence, but it does
>> not seem useful, so I left it out.
>>
>> I have also tried to improve a few comments, and moved a declaration into a
>> loop because I did not like the pg-indented version much.
>
> This patch is marked as ready for committer for some time now, and it
> has rotten, so I am moving it to next CF, waiting on author.
> --
> Michael
>

Attachment Content-Type Size
pgbench-prp-func-13.patch text/plain 19.6 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2019-02-14 17:59:58
Message-ID: 20190214175958.6jh6gjcfh5iovghc@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2019-02-10 17:46:15 +0000, Hironobu SUZUKI wrote:
> I updated the patch. And also I added some explanations and simple examples
> in the modular_multiply function.

It'd be good to update the commitfest entry to say 'needs review' the
next time.

> +# PGAC_C_BUILTIN_CLZLL
> +# -------------------------
> +# Check if the C compiler understands __builtin_clzll(),
> +# and define HAVE__BUILTIN_CLZLL if so.
> +# Both GCC & CLANG seem to have one.
> +AC_DEFUN([PGAC_C_BUILTIN_CLZLL],
> +[AC_CACHE_CHECK(for __builtin_clzll, pgac_cv__builtin_clzll,
> +[AC_COMPILE_IFELSE([AC_LANG_SOURCE(
> +[static unsigned long int x = __builtin_clzll(0xaabbccddeeff0011);]
> +)],
> +[pgac_cv__builtin_clzll=yes],
> +[pgac_cv__builtin_clzll=no])])
> +if test x"$pgac_cv__builtin_clzll" = xyes ; then
> +AC_DEFINE(HAVE__BUILTIN_CLZLL, 1,
> + [Define to 1 if your compiler understands __builtin_clzll.])
> +fi])# PGAC_C_BUILTIN_CLZLL

I think this has been partially superceded by

commit 711bab1e4d19b5c9967328315a542d93386b1ac5
Author: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Date: 2019-02-13 16:10:06 -0300

Add basic support for using the POPCNT and SSE4.2s LZCNT opcodes

could you make sur eit's integrated appropriately?

> <para>
> + Function <literal>pr_perm</literal> implements a pseudo-random permutation.
> + It allows to mix the output of non uniform random functions so that
> + values drawn more often are not trivially correlated.
> + It permutes integers in [0, size) using a seed by applying rounds of
> + simple invertible functions, similarly to an encryption function,
> + although beware that it is not at all cryptographically secure.
> + Compared to <literal>hash</literal> functions discussed above, the function
> + ensures that a perfect permutation is applied: there are no collisions
> + nor holes in the output values.
> + Values outside the interval are interpreted modulo the size.
> + The function errors if size is not positive.
> + If no seed is provided, <literal>:default_seed</literal> is used.
> + For a given size and seed, the function is fully deterministic: if two
> + permutations on the same size must not be correlated, use distinct seeds
> + as outlined in the previous example about hash functions.
> + </para>

This doesn't really explain why we want this in pgbench.

Greetings,

Andres Freund


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2019-02-14 21:17:53
Message-ID: alpine.DEB.2.21.1902142155050.20189@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hello Andres,

>> +# PGAC_C_BUILTIN_CLZLL
>
> I think this has been partially superceded by
>
> commit 711bab1e4d19b5c9967328315a542d93386b1ac5
> Author: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
> Date: 2019-02-13 16:10:06 -0300

Indeed, the patch needs a rebase & conflit resolution. I'll do it. Later.

>> <para>
>> + Function <literal>pr_perm</literal> implements a pseudo-random permutation.
>> + It allows to mix the output of non uniform random functions so that
>> + values drawn more often are not trivially correlated.
>> + It permutes integers in [0, size) using a seed by applying rounds of
>> + simple invertible functions, similarly to an encryption function,
>> + although beware that it is not at all cryptographically secure.
>> + Compared to <literal>hash</literal> functions discussed above, the function
>> + ensures that a perfect permutation is applied: there are no collisions
>> + nor holes in the output values.
>> + Values outside the interval are interpreted modulo the size.
>> + The function errors if size is not positive.
>> + If no seed is provided, <literal>:default_seed</literal> is used.
>> + For a given size and seed, the function is fully deterministic: if two
>> + permutations on the same size must not be correlated, use distinct seeds
>> + as outlined in the previous example about hash functions.
>> + </para>
>
> This doesn't really explain why we want this in pgbench.

Who is "we"?

If someone runs non uniform tests, ie with random_exp/zipf/gauss, close
values are drawn with a similar frequency, thus correlated, inducing an
undeserved correlation at the page level (eg for read) and better
performance that would be the case if relative frequencies were not
correlated to key values.

So the function allows having more realistic non uniform test, whereas
currently we can only have non uniform test with very unrealistically
correlated values at the key level and possibly at the page level, meaning
non representative performances because of these induced bias.

This is under the assumption that pgbench should allow more realistic
performance test scenarii, which I believe is a desirable purpose. If
someone disagree with this purpose, then they would consider both non
uniform random functions and this proposed pseudo-random permutation
function as useless, as probably most other additions to pgbench.

--
Fabien.


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2019-03-03 08:55:58
Message-ID: alpine.DEB.2.21.1903030947200.8095@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 사이트 순위SQL


> Indeed, the patch needs a rebase & conflit resolution. I'll do it. Later.

Here is an update:

- take advantage of pg_bitutils (although I noted that the "slow"
popcount there could be speeded-up and shorten with a bitwise operator
implementation that I just removed from pgbench).

- add comments about the bijective transformations in the code.

As already stated, this function makes sense for people who want to test
performance with pgbench using non uniform rands. If you do not want to do
that, you will probably find the function pretty useless. I can't help it.

Also, non uniform rands is also a way to test pg lock contention behavior.

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-14.patch text/x-diff 17.8 KB

From: David Steele <david(at)pgmasters(dot)net>
To: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: Re: pgbench - add pseudo-random permutation function
Date: 2019-03-21 17:27:49
Message-ID: be567349-f2ea-436f-faf0-bb5cc3b01c4a@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 사이트 순위SQL

Hi Hironobu,

On 3/3/19 12:55 PM, Fabien COELHO wrote:
>
>> Indeed, the patch needs a rebase & conflit resolution. I'll do it. Later.
>
> Here is an update:
>
>  - take advantage of pg_bitutils (although I noted that the "slow"
>    popcount there could be speeded-up and shorten with a bitwise operator
>    implementation that I just removed from pgbench).
>
>  - add comments about the bijective transformations in the code.
>
> As already stated, this function makes sense for people who want to test
> performance with pgbench using non uniform rands. If you do not want to
> do that, you will probably find the function pretty useless. I can't
> help it.
>
> Also, non uniform rands is also a way to test pg lock contention behavior.

You have signed up as a reviewer for this patch. Do you know when
you'll have time to do the review?

Regards,
--
-David
david(at)pgmasters(dot)net


From: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: David Steele <david(at)pgmasters(dot)net>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2019-03-30 04:33:17
Message-ID: 307a532c-bc16-8b00-9395-15ca8f5129db@interdb.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2019/03/21 17:27, David Steele wrote:
> Hi Hironobu,
>

Sorry for the late reply. I reviewed this patch.

Function nbits(), which was previously discussed, has been simplified by
using the function pg_popcount64().

By adding the mathematical explanation, it has been easier to understand
the operation of this function.

I believe that these improvements will have a positive impact on
maintenance.

The patch could be applied successfully and the tests passed without
problems.

So, I think the latest patch is fine.

Best regards,

> On 3/3/19 12:55 PM, Fabien COELHO wrote:
>>
>>> Indeed, the patch needs a rebase & conflit resolution. I'll do it.
>>> Later.
>>
>> Here is an update:
>>
>>   - take advantage of pg_bitutils (although I noted that the "slow"
>>     popcount there could be speeded-up and shorten with a bitwise
>> operator
>>     implementation that I just removed from pgbench).
>>
>>   - add comments about the bijective transformations in the code.
>>
>> As already stated, this function makes sense for people who want to
>> test performance with pgbench using non uniform rands. If you do not
>> want to do that, you will probably find the function pretty useless. I
>> can't help it.
>>
>> Also, non uniform rands is also a way to test pg lock contention
>> behavior.
>
> You have signed up as a reviewer for this patch.  Do you know when
> you'll have time to do the review?
>
> Regards,


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2019-05-23 14:45:43
Message-ID: alpine.DEB.2.21.1905231641450.28482@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hello Hironobu-san,

Here is a v15 which is a rebase, plus a large simplification of the modmul
function if an int128 type is available, which is probably always…

Could you have a look and possibly revalidate?

> Sorry for the late reply. I reviewed this patch.
>
> Function nbits(), which was previously discussed, has been simplified by
> using the function pg_popcount64().
>
> By adding the mathematical explanation, it has been easier to understand the
> operation of this function.
>
> I believe that these improvements will have a positive impact on maintenance.
>
> The patch could be applied successfully and the tests passed without
> problems.
>
> So, I think the latest patch is fine.
>
>
> Best regards,
>
>
>
>> On 3/3/19 12:55 PM, Fabien COELHO wrote:
>>>
>>>> Indeed, the patch needs a rebase & conflit resolution. I'll do it. Later.
>>>
>>> Here is an update:
>>>
>>>   - take advantage of pg_bitutils (although I noted that the "slow"
>>>     popcount there could be speeded-up and shorten with a bitwise operator
>>>     implementation that I just removed from pgbench).
>>>
>>>   - add comments about the bijective transformations in the code.
>>>
>>> As already stated, this function makes sense for people who want to test
>>> performance with pgbench using non uniform rands. If you do not want to do
>>> that, you will probably find the function pretty useless. I can't help it.
>>>
>>> Also, non uniform rands is also a way to test pg lock contention behavior.
>>
>> You have signed up as a reviewer for this patch.  Do you know when you'll
>> have time to do the review?
>>
>> Regards,
>
>

--
Fabien Coelho - CRI, MINES ParisTech

Attachment Content-Type Size
pgbench-prp-func-15.patch text/x-diff 17.9 KB

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2019-07-16 04:47:55
Message-ID: CA+hUKGJexK+xNCyY6xmg+NcMjHBX-+eoW7bWzzv3c6xb=EYL6Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg무지개 토토SQL

On Fri, May 24, 2019 at 2:46 AM Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
> Here is a v15 which is a rebase, plus a large simplification of the modmul
> function if an int128 type is available, which is probably always…

> > Function nbits(), which was previously discussed, has been simplified by
> > using the function pg_popcount64().

Hi Fabien, Suzuki-san,

I am not smart enough to commit this or judge its value for
benchmarking, but I have a few trivial comments on the language:

+ It allows to mix the output of non uniform random functions so that

"It allows the output of non-uniform random functions to be mixed so that"

+ ensures that a perfect permutation is applied: there are no collisions
+ nor holes in the output values.

"neither collisions nor holes", or "no collisions or holes"

+ The function errors if size is not positive.

"raises an error"

+ * 24 bits mega primes from https://primes.utm.edu/lists/small/millions/

"24 bit mega primes"

+/* length of n binary representation */
+static int
+nbits(uint64 n)
+{
+ /* set lower bits to 1 and count them */
+ return pg_popcount64(compute_mask(n));
+}

I suppose you could use n == 0 ? 0 : pg_leftmost_one_pos64(n) + 1, and then...

+/* return smallest mask holding n */
+static uint64
+compute_mask(uint64 n)
+{
+ n |= n >> 1;
+ n |= n >> 2;
+ n |= n >> 4;
+ n |= n >> 8;
+ n |= n >> 16;
+ n |= n >> 32;
+ return n;
+}

... here you could use 1 << nbits(n)) - 1. I have no idea if doing it
that way around is better, it's just a thought and removes a few lines
of bit-swizzling code.

--
Thomas Munro
https://enterprisedb.com


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2019-07-23 07:44:09
Message-ID: alpine.DEB.2.21.1907230730150.7144@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hello Thomas,

>>> Function nbits(), which was previously discussed, has been simplified by
>>> using the function pg_popcount64().
>
> Hi Fabien, Suzuki-san,
>
> I am not smart enough to commit this

I'm in no hurry:-)

> or judge its value for benchmarking,

I think that it is valuable given that we have non uniform random
generators in pgbench.

I'm wondering about the modular_multiply manual implementation which adds
quite a few lines of non trivial code. If int128 is available on all/most
platforms, it could be removed and marked as not supported on these,
although it would create an issue with non regression tests.

> but I have a few trivial comments on the language:
>
> + It allows to mix the output of non uniform random functions so that
>
> "It allows the output of non-uniform random functions to be mixed so that"

Fixed.

> + ensures that a perfect permutation is applied: there are no collisions
> + nor holes in the output values.
>
> "neither collisions nor holes", or "no collisions or holes"

I choose the first.

> + The function errors if size is not positive.
>
> "raises an error"

Fixed.

> + * 24 bits mega primes from https://primes.utm.edu/lists/small/millions/
>
> "24 bit mega primes"

Fixed.

> +/* length of n binary representation */
> +static int
> +nbits(uint64 n)
> +{
> + /* set lower bits to 1 and count them */
> + return pg_popcount64(compute_mask(n));
> +}
>
> I suppose you could use n == 0 ? 0 : pg_leftmost_one_pos64(n) + 1, and then...

It would create a branch, that I'm trying to avoid.

> +/* return smallest mask holding n */
> +static uint64
> +compute_mask(uint64 n)
> +{
> + n |= n >> 1;
> + n |= n >> 2;
> + n |= n >> 4;
> + n |= n >> 8;
> + n |= n >> 16;
> + n |= n >> 32;
> + return n;
> +}
>
> ... here you could use 1 << nbits(n)) - 1. I have no idea if doing it
> that way around is better, it's just a thought and removes a few lines
> of bit-swizzling code.

This would create a infinite recursion as nbits currently uses
compute_mask. The 6 bitfield operation above is pretty efficient, I'd let
it at that.

Attached a v16.

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-16.patch text/x-diff 18.0 KB

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2020-01-04 22:11:15
Message-ID: 20200104221115.exveitoo5r7zcdre@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

This patch was marked as RFC on 2019-03-30, but since then there have
been a couple more issues pointed out in a review by Thomas Munro, and
it went through 2019-09 and 2019-11 without any attention. Is the RFC
status still appropriate?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2020-01-05 09:02:29
Message-ID: alpine.DEB.2.21.2001050953530.1648@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> This patch was marked as RFC on 2019-03-30, but since then there have
> been a couple more issues pointed out in a review by Thomas Munro, and
> it went through 2019-09 and 2019-11 without any attention. Is the RFC
> status still appropriate?

Thomas review was about comments/documentation wording and asking for
explanations, which I think I addressed, and the code did not actually
change, so I'm not sure that the "needs review" is really needed, but do
as you feel.

--
Fabien


From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2020-01-30 17:43:41
Message-ID: ade43cd5-b786-7ad9-1b5c-ccfffd5b7fa3@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-01-05 10:02, Fabien COELHO wrote:
>
>> This patch was marked as RFC on 2019-03-30, but since then there have
>> been a couple more issues pointed out in a review by Thomas Munro, and
>> it went through 2019-09 and 2019-11 without any attention. Is the RFC
>> status still appropriate?
>
> Thomas review was about comments/documentation wording and asking for
> explanations, which I think I addressed, and the code did not actually
> change, so I'm not sure that the "needs review" is really needed, but do
> as you feel.

I read the whole thread, I still don't know what this patch is supposed
to do. I know what the words in the subject line mean, but I don't know
how this helps a pgbench user run better benchmarks. I feel this is
also the sentiment expressed by others earlier in the thread. You
indicated that this functionality makes sense to those who want this
functionality, but so far only two people, namely the patch author and
the reviewer, have participated in the discussion on the substance of
this patch. So either the feature is extremely niche, or nobody
understands it. I think you ought to take about three steps back and
explain this in more basic terms, even just in email at first so that we
can then discuss what to put into the documentation.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2020-01-30 18:03:46
Message-ID: 20200130180346.GA29922@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Jan-30, Peter Eisentraut wrote:

> I read the whole thread, I still don't know what this patch is supposed to
> do. I know what the words in the subject line mean, but I don't know how
> this helps a pgbench user run better benchmarks. I feel this is also the
> sentiment expressed by others earlier in the thread. You indicated that
> this functionality makes sense to those who want this functionality, but so
> far only two people, namely the patch author and the reviewer, have
> participated in the discussion on the substance of this patch. So either
> the feature is extremely niche, or nobody understands it. I think you ought
> to take about three steps back and explain this in more basic terms, even
> just in email at first so that we can then discuss what to put into the
> documentation.

After re-reading one more time, it dawned on me that the point of this
is similar in spirit to this one:
https://wiki.postgresql.org/wiki/Pseudo_encrypt

The idea seems to be to map the int4 domain into itself, so you can use
a sequence to generate numbers that will not look like a sequence,
allowing the user to hide some properties (such as the generation rate)
that might be useful to an eavesdropper/attacker. In terms of writing
benchmarks, it seems useful to destroy all locality of access, which
changes the benchmark completely. (I'm not sure if this is something
benchmark writers really want to have.)

If I'm right, then I agree that the documentation provided with the
patch does a pretty bad job at explaining it, because until now I didn't
at all realize this is what it was.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2020-02-01 09:07:13
Message-ID: alpine.DEB.2.21.2002010916510.20752@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hello Peter,

>>> This patch was marked as RFC on 2019-03-30, but since then there have
>>> been a couple more issues pointed out in a review by Thomas Munro, and
>>> it went through 2019-09 and 2019-11 without any attention. Is the RFC
>>> status still appropriate?
>>
>> Thomas review was about comments/documentation wording and asking for
>> explanations, which I think I addressed, and the code did not actually
>> change, so I'm not sure that the "needs review" is really needed, but do
>> as you feel.
>
> I read the whole thread, I still don't know what this patch is supposed to
> do. I know what the words in the subject line mean, but I don't know how
> this helps a pgbench user run better benchmarks. I feel this is also the
> sentiment expressed by others earlier in the thread. You indicated that this
> functionality makes sense to those who want this functionality, but so far
> only two people, namely the patch author and the reviewer, have participated
> in the discussion on the substance of this patch. So either the feature is
> extremely niche, or nobody understands it. I think you ought to take about
> three steps back and explain this in more basic terms, even just in email at
> first so that we can then discuss what to put into the documentation.

Here is the motivation for this feature:

When you design a benchmark to test performance, you want to avoid
unwanted correlation which may impact performance unduly, one way or
another. For the default pgbench benchmarks, accounts are chosen
uniformly, no problem. However, if you want to test a non uniform
distribution, which is what most people would encounter in practice,
things are different.

For instance, you would replace random by random_exp in the default
benchmarks. If you do that, then all accounts with lower ids would come
out more often. However this creates an unwanted correlation and
performance effect: why frequent accounts would just happen to be those
with small ids? which just happen to reside together in the first few
pages of the table? In order to avoid these effects, you need to shuffle
the chosen account ids, so that frequent accounts are not specifically
those at the beginning of the table.

Basically, as soon as you have a non uniform random generator selecting
step you want to add some shuffle, otherwise you are going to skew your
benchmark measures. Nobody should use a non-uniform generator for
selecting rows without some shuffling, ever. I have no doubt that many
people do without realizing that they are skewing their performance data.

Once you realize that, you can try to invent your own shuffling, but
frankly this is not as easy as it looks to have something non trivial
which would not generate unwanted correlation. I had a lot of (very) bad
design before coming up with the one in the patch: You want something
fast, you want good steering, which are contradictory objectives. There is
also the question of being able to change the shuffling for a given size,
for instance to check that there is no unwanted effects, hence the
seeding. Good seeded shuffling is what an encryption algorithm do, but for
these it is somehow simpler because they would usually work on power of
two sizes.

In conclusion, using a seeded shuffle step is needed as soon as you start
using non random generators. Providing one in pgbench, a tool designed to
run possibly non-uniform benchmarks against postgres, looks like common
courtesy. Not providing one is helping people to design bad benchmarks,
possibly without realizing it, so is outright thoughtlessness.

I have no doubt that the documentation should be improved to explain that
in a concise but clear way.

--
Fabien.


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2020-02-01 10:12:25
Message-ID: alpine.DEB.2.21.2002011007340.20752@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hello Alvaro,

>> I read the whole thread, I still don't know what this patch is supposed to
>> do. I know what the words in the subject line mean, but I don't know how
>> this helps a pgbench user run better benchmarks. I feel this is also the
>> sentiment expressed by others earlier in the thread. You indicated that
>> this functionality makes sense to those who want this functionality, but so
>> far only two people, namely the patch author and the reviewer, have
>> participated in the discussion on the substance of this patch. So either
>> the feature is extremely niche, or nobody understands it. I think you ought
>> to take about three steps back and explain this in more basic terms, even
>> just in email at first so that we can then discuss what to put into the
>> documentation.
>
> After re-reading one more time, it dawned on me that the point of this
> is similar in spirit to this one:
> https://wiki.postgresql.org/wiki/Pseudo_encrypt

Indeed. The one in the wiki is useless because it is on all integers,
whereas in a benchmark you want it for a given size and you want seeding,
but otherwise the same correlation-avoidance problem is addressed.

> The idea seems to be to map the int4 domain into itself, so you can use
> a sequence to generate numbers that will not look like a sequence,
> allowing the user to hide some properties (such as the generation rate)
> that might be useful to an eavesdropper/attacker. In terms of writing
> benchmarks, it seems useful to destroy all locality of access, which
> changes the benchmark completely.

Yes.

> (I'm not sure if this is something benchmark writers really want to
> have.)

I do not get this sentence. I'm sure that a benchmark writer should really
want to avoid unrealistic correlations that have a performance impact.

> If I'm right, then I agree that the documentation provided with the
> patch does a pretty bad job at explaining it, because until now I didn't
> at all realize this is what it was.

The documentation is improvable, no doubt.

Attached is an attempt at improving things. I have added a explicit note
and hijacked an existing example to better illustrate the purpose of the
function.

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-18.patch text/x-diff 20.7 KB

From: David Steele <david(at)pgmasters(dot)net>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2020-04-01 16:03:35
Message-ID: 68c1cb47-1bf3-2284-8647-8296e93a1830@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg와이즈 토토SQL

Hi Fabien,

On 2/1/20 5:12 AM, Fabien COELHO wrote:
>
> Attached is an attempt at improving things. I have added a explicit note
> and hijacked an existing example to better illustrate the purpose of the
> function.

This patch does not build on Linux due to some unused functions and
variables: http://cfbot.cputube.org/patch_27_1712.log

The CF entry has been updated to Waiting on Author.

A few committers have expressed doubts about whether this patch is
needed and it doesn't make sense to keep moving it from CF to CF.

I'm planning to mark this patch RwF on April 8 and I suggest you
resubmit if you are able to get some consensus.

Regards,
--
-David
david(at)pgmasters(dot)net


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2020-04-02 06:42:52
Message-ID: alpine.DEB.2.21.2004020809050.16227@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg배트맨 토토SQL


Hello David,

>> Attached is an attempt at improving things. I have added a explicit note
>> and hijacked an existing example to better illustrate the purpose of the
>> function.
>
> This patch does not build on Linux due to some unused functions and
> variables: http://cfbot.cputube.org/patch_27_1712.log

This link is about some other patch, but indeed there is something amiss
in v18. Attached a v19 which fixes that.

> The CF entry has been updated to Waiting on Author.

I put it back to "needs review".

> A few committers have expressed doubts about whether this patch is needed

Yep.

The key point is that if you (think that you) do not need it, it is
by definition useless.

If you finally figure out that you need it (IMHO you must for any
benchmark with non uniform randoms, otherwise performance result are
biased and thus invalid) and it is not available, then you are just stuck.

> and it doesn't make sense to keep moving it from CF to CF.

You do as you feel. IMO such a feature is useful and consistent with
providing non-uniform random functions.

> I'm planning to mark this patch RwF on April 8 and I suggest you resubmit if
> you are able to get some consensus.

People interested in non-uniform benchmarks would see the point. Why many
people would be happy with uniform benchmarks only while life is not
uniform at all fails me.

--
Fabien.


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2020-04-02 07:01:56
Message-ID: 20200402070156.GA16377@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2020-Apr-02, Fabien COELHO wrote:

> > I'm planning to mark this patch RwF on April 8 and I suggest you
> > resubmit if you are able to get some consensus.
>
> People interested in non-uniform benchmarks would see the point. Why many
> people would be happy with uniform benchmarks only while life is not uniform
> at all fails me.

I don't think we should boot this patch. I don't think I would be able
to get this over the commit line in this CF, but let's not discard it.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2020-04-05 12:45:42
Message-ID: alpine.DEB.2.21.2004051422020.16227@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg젠 토토SQL :


> Attached is an attempt at improving things. I have added a explicit note and
> hijacked an existing example to better illustrate the purpose of the
> function.

A significant part of the complexity of the patch is the overflow-handling
implementation of (a * b % c) for 64 bits integers.

However this implementation is probably never used because int128 bits are
available and the one line implementation takes precedence, or the size is
small enough (less than 48 bits) so that there are no overflows with the
primes involved, thus the operation is done directly on 64 bits integers.

I could remove the implementation and replace it with a "not available on
this architecture" message, which would seldom be triggered: you would
have to use a 32 bits arch (?) and test against a table with about 262
Trows, which I guess does not exists anywhere. This approach would remove
about 40% of the code & comments.

Thougths?

--
Fabien.


From: David Steele <david(at)pgmasters(dot)net>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2020-04-09 12:46:48
Message-ID: 31062b94-fc63-9179-24e9-bedeedd7d054@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg젠 토토SQL :

On 4/2/20 3:01 AM, Alvaro Herrera wrote:
> On 2020-Apr-02, Fabien COELHO wrote:
>
>>> I'm planning to mark this patch RwF on April 8 and I suggest you
>>> resubmit if you are able to get some consensus.
>>
>> People interested in non-uniform benchmarks would see the point. Why many
>> people would be happy with uniform benchmarks only while life is not uniform
>> at all fails me.
>
> I don't think we should boot this patch. I don't think I would be able
> to get this over the commit line in this CF, but let's not discard it.

Understood. I have moved the patch to the 2020-07 CF in Needs Review state.

Regards,
--
-David
david(at)pgmasters(dot)net


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2020-07-09 06:57:01
Message-ID: alpine.DEB.2.22.394.2007090856140.417896@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토SQL : Postg토토SQL


>> I don't think we should boot this patch. I don't think I would be able
>> to get this over the commit line in this CF, but let's not discard it.
>
> Understood. I have moved the patch to the 2020-07 CF in Needs Review state.

Attached v19 is a rebase, per cfbot.

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-19.patch text/x-diff 20.2 KB

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2020-07-10 13:36:07
Message-ID: alpine.DEB.2.22.394.2007101534200.928291@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> Attached v19 is a rebase, per cfbot.

Attached v20 fixes a doc xml typo, per cfbot again.

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-20.patch text/x-diff 20.2 KB

From: Muhammad Usama <m(dot)usama(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Fabien Coelho <postgresql(dot)org(at)coelho(dot)net>, Hironobu Suzuki <hironobu(at)interdb(dot)jp>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2020-10-26 10:52:46
Message-ID: 160370956615.1204.14514967356384507379.pgcf@coridan.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

There are few "Stripping trailing CRs from the patch" and one "Hunk succeeded offset -2 lines" warning.
other than that the patch applies successfully and works as it claims.

The new status of this patch is: Ready for Committer


From: David Steele <david(at)pgmasters(dot)net>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-05 14:11:47
Message-ID: cf552f8b-8835-886b-842f-e96dcac52dbc@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Álvaro,

On 4/2/20 3:01 AM, Alvaro Herrera wrote:
> On 2020-Apr-02, Fabien COELHO wrote:
>
>>> I'm planning to mark this patch RwF on April 8 and I suggest you
>>> resubmit if you are able to get some consensus.
>>
>> People interested in non-uniform benchmarks would see the point. Why many
>> people would be happy with uniform benchmarks only while life is not uniform
>> at all fails me.
>
> I don't think we should boot this patch. I don't think I would be able
> to get this over the commit line in this CF, but let's not discard it.

As far as I can see you are the only committer who has shown real
interest in this patch. It's been sitting idle for the last year.

What are your current thoughts?

Regards,
--
-David
david(at)pgmasters(dot)net


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-05 16:30:15
Message-ID: 20210305163015.GA24899@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: 503 스포츠 토토 결과

Hi David

On 2021-Mar-05, David Steele wrote:

> On 4/2/20 3:01 AM, Alvaro Herrera wrote:
> > On 2020-Apr-02, Fabien COELHO wrote:
> >
> > > > I'm planning to mark this patch RwF on April 8 and I suggest you
> > > > resubmit if you are able to get some consensus.
> > >
> > > People interested in non-uniform benchmarks would see the point. Why many
> > > people would be happy with uniform benchmarks only while life is not uniform
> > > at all fails me.
> >
> > I don't think we should boot this patch. I don't think I would be able
> > to get this over the commit line in this CF, but let's not discard it.
>
> As far as I can see you are the only committer who has shown real interest
> in this patch. It's been sitting idle for the last year.
>
> What are your current thoughts?

Thanks for prodding. I still think it's a useful feature. However I
don't think I'll have to time to get it done on the current commitfest.
I suggest to let it sit in the commitfest to see if somebody else will
pick it up -- and if not, we move it to the next one, with apologies to
author and reviewers.

I may have time to become familiar or at least semi-comfortable with all
that weird math in it by then.

--
Álvaro Herrera 39°49'30"S 73°17'W
"At least to kernel hackers, who really are human, despite occasional
rumors to the contrary" (LWN.net)


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-08 10:50:43
Message-ID: alpine.DEB.2.22.394.2103081136490.2754793@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>> What are your current thoughts?
>
> Thanks for prodding. I still think it's a useful feature. However I
> don't think I'll have to time to get it done on the current commitfest.
> I suggest to let it sit in the commitfest to see if somebody else will
> pick it up -- and if not, we move it to the next one, with apologies to
> author and reviewers.
>
> I may have time to become familiar or at least semi-comfortable with all
> that weird math in it by then.

Yep.

Generating a parametric good-quality low-cost (but not
cryptographically-secure) pseudo-random permutations on arbitrary sizes
(not juste power of two sizes) is not a trivial task, I had to be quite
creative to achieve it, hence the "weird" maths. I had a lot of bad
not-really-working ideas before the current status of the patch.

The code could be simplified if we assume that PG_INT128_TYPE will be
available on all relevant architectures, and accept the feature not to be
available if not.

--
Fabien.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-11 00:58:39
Message-ID: 20210311005839.GC11261@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 8, 2021 at 11:50:43AM +0100, Fabien COELHO wrote:
>
> > > What are your current thoughts?
> >
> > Thanks for prodding. I still think it's a useful feature. However I
> > don't think I'll have to time to get it done on the current commitfest.
> > I suggest to let it sit in the commitfest to see if somebody else will
> > pick it up -- and if not, we move it to the next one, with apologies to
> > author and reviewers.
> >
> > I may have time to become familiar or at least semi-comfortable with all
> > that weird math in it by then.
>
> Yep.
>
> Generating a parametric good-quality low-cost (but not
> cryptographically-secure) pseudo-random permutations on arbitrary sizes (not
> juste power of two sizes) is not a trivial task, I had to be quite creative
> to achieve it, hence the "weird" maths. I had a lot of bad
> not-really-working ideas before the current status of the patch.
>
> The code could be simplified if we assume that PG_INT128_TYPE will be
> available on all relevant architectures, and accept the feature not to be
> available if not.

Maybe Dean Rasheed can help because of his math background --- CC'ing him.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-11 16:31:44
Message-ID: CAEZATCWa1P7hVwJCxCDEDOFvVERO2J51XBz_6gbsozR_ton1fw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 11 Mar 2021 at 00:58, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> Maybe Dean Rasheed can help because of his math background --- CC'ing him.
>

Reading the thread I can see how such a function might be useful to
scatter non-uniformly random values.

The implementation looks plausible too, though it adds quite a large
amount of new code. The main thing that concerns me is justifying the
code. With this kind of thing, it's all too easy to overlook corner
cases and end up with trivial sequences in certain special cases. I'd
feel better about that if we were implementing a standard algorithm
with known pedigree.

Thinking about the use case for this, it seems that it's basically
designed to turn a set of non-uniform random numbers (produced by
random_exponential() et al.) into another set of non-uniform random
numbers, where the non-uniformity is scattered so that the more/less
common values aren't all clumped together.

I'm wondering if that's something that can't be done more simply by
passing the non-uniform random numbers through the uniform random
number generator to scatter them uniformly across some range -- e.g.,
given an integer n, return the n'th value from the sequence produced
by random(), starting from some initial seed -- i.e., implement
nth_random(lb, ub, seed, n). That would actually be pretty
straightforward to implement using O(log(n)) time to execute (see the
attached python example), though it wouldn't generate a permutation,
so it'd need a bit of thought to see if it met the requirements.

Regards,
Dean

Attachment Content-Type Size
erand48.py text/x-python 926 bytes

From: David Bowen <dmb0317(at)gmail(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-11 19:06:06
Message-ID: CANO09fgLiT3BfAfjTSrBeg+eQGXbe_AtTVMEBwrqraQMbB=+8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg사설 토토SQL

The algorithm for generating a random permutation with a uniform
distribution across all permutations is easy:
for (i=0; i<n; i++) {
swap a[n-i] with a[rand(n-i+1)]
}

where 0 <= rand(x) < x and a[i] is initially i (see Knuth, Section 3.4.2
Algorithm P)

David Bowen

On Thu, Mar 11, 2021 at 9:32 AM Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
wrote:

> On Thu, 11 Mar 2021 at 00:58, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >
> > Maybe Dean Rasheed can help because of his math background --- CC'ing
> him.
> >
>
> Reading the thread I can see how such a function might be useful to
> scatter non-uniformly random values.
>
> The implementation looks plausible too, though it adds quite a large
> amount of new code. The main thing that concerns me is justifying the
> code. With this kind of thing, it's all too easy to overlook corner
> cases and end up with trivial sequences in certain special cases. I'd
> feel better about that if we were implementing a standard algorithm
> with known pedigree.
>
> Thinking about the use case for this, it seems that it's basically
> designed to turn a set of non-uniform random numbers (produced by
> random_exponential() et al.) into another set of non-uniform random
> numbers, where the non-uniformity is scattered so that the more/less
> common values aren't all clumped together.
>
> I'm wondering if that's something that can't be done more simply by
> passing the non-uniform random numbers through the uniform random
> number generator to scatter them uniformly across some range -- e.g.,
> given an integer n, return the n'th value from the sequence produced
> by random(), starting from some initial seed -- i.e., implement
> nth_random(lb, ub, seed, n). That would actually be pretty
> straightforward to implement using O(log(n)) time to execute (see the
> attached python example), though it wouldn't generate a permutation,
> so it'd need a bit of thought to see if it met the requirements.
>
> Regards,
> Dean
>


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: David Bowen <dmb0317(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-11 19:34:57
Message-ID: CAEZATCVmqTUOhdhiLP7X9g7ZL-j0XGRY6z+TrnZeq0UfpCSG1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 11 Mar 2021 at 19:06, David Bowen <dmb0317(at)gmail(dot)com> wrote:
>
> The algorithm for generating a random permutation with a uniform distribution across all permutations is easy:
> for (i=0; i<n; i++) {
> swap a[n-i] with a[rand(n-i+1)]
> }
>
> where 0 <= rand(x) < x and a[i] is initially i (see Knuth, Section 3.4.2 Algorithm P)
>

True, but n can be very large, so that might use a lot of memory and
involve a lot of pre-processing.

Regards,
Dean


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-12 09:43:59
Message-ID: alpine.DEB.2.22.394.2103121031420.599618@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hello Dean,

> The implementation looks plausible too, though it adds quite a large
> amount of new code.

A significant part of this new code the the multiply-modulo
implementation, which can be dropped if we assume that the target has
int128 available, and accept that the feature is not available otherwise.
Also, there are quite a lot of comments which add to the code length.

> The main thing that concerns me is justifying the code. With this kind
> of thing, it's all too easy to overlook corner cases and end up with
> trivial sequences in certain special cases. I'd feel better about that
> if we were implementing a standard algorithm with known pedigree.

Yep. I did not find anything convincing with the requirements: generate a
permutation, can be parametric, low constant cost, good quality, work on
arbitrary sizes…

> Thinking about the use case for this, it seems that it's basically
> designed to turn a set of non-uniform random numbers (produced by
> random_exponential() et al.) into another set of non-uniform random
> numbers, where the non-uniformity is scattered so that the more/less
> common values aren't all clumped together.

Yes.

> I'm wondering if that's something that can't be done more simply by
> passing the non-uniform random numbers through the uniform random
> number generator to scatter them uniformly across some range -- e.g.,
> given an integer n, return the n'th value from the sequence produced
> by random(), starting from some initial seed -- i.e., implement
> nth_random(lb, ub, seed, n). That would actually be pretty
> straightforward to implement using O(log(n)) time to execute (see the
> attached python example), though it wouldn't generate a permutation,
> so it'd need a bit of thought to see if it met the requirements.

Indeed, this violates two requirements: constant cost & permutation.

--
Fabien.


From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-12 21:06:48
Message-ID: CA+hUKGJJskxpyAev3jcFq-owkTkZLHwC_b71uMFNh+wWuTZzVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 8, 2021 at 11:50 PM Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
> > I may have time to become familiar or at least semi-comfortable with all
> > that weird math in it by then.
>
> Yep.
>
> Generating a parametric good-quality low-cost (but not
> cryptographically-secure) pseudo-random permutations on arbitrary sizes
> (not juste power of two sizes) is not a trivial task, I had to be quite
> creative to achieve it, hence the "weird" maths. I had a lot of bad
> not-really-working ideas before the current status of the patch.
>
> The code could be simplified if we assume that PG_INT128_TYPE will be
> available on all relevant architectures, and accept the feature not to be
> available if not.

That doesn't sound like a bad option to me, if it makes this much
simpler. The main modern system without it seems to be MSVC. The
Linux, BSD, Apple, illumos, AIX systems using Clang/GCC with
Intel/AMD/ARM/PowerPC CPUs have it, and the Windows systems using open
source compilers have it.


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-12 21:12:22
Message-ID: 20210312211222.GA30770@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2021-Mar-13, Thomas Munro wrote:

> That doesn't sound like a bad option to me, if it makes this much
> simpler. The main modern system without it seems to be MSVC. The
> Linux, BSD, Apple, illumos, AIX systems using Clang/GCC with
> Intel/AMD/ARM/PowerPC CPUs have it, and the Windows systems using open
> source compilers have it.

Hmm. Can we go a bit further, and say that if you don't have 128-bit
ints, then you can use pr_perm but only to a maximum of 32-bit ints?
Then you can do the calculations in 64-bit ints. That's much less bad
than desupporting the feature altogether.

--
Álvaro Herrera 39°49'30"S 73°17'W
"No necesitamos banderas
No reconocemos fronteras" (Jorge González)


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-13 08:49:53
Message-ID: alpine.DEB.2.22.394.2103130910470.771005@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hello Alvaro,

>> That doesn't sound like a bad option to me, if it makes this much
>> simpler. The main modern system without it seems to be MSVC. The
>> Linux, BSD, Apple, illumos, AIX systems using Clang/GCC with
>> Intel/AMD/ARM/PowerPC CPUs have it, and the Windows systems using open
>> source compilers have it.
>
> Hmm. Can we go a bit further, and say that if you don't have 128-bit
> ints, then you can use pr_perm but only to a maximum of 32-bit ints?
> Then you can do the calculations in 64-bit ints. That's much less bad
> than desupporting the feature altogether.

This looks like a good compromise, which can even be a little better
because we still have 64 bits ints.

As there is already a performance shortcut in the code relying on the fact
that x is 24 bits and that no overflow occurs if y & m are up to 48 bits
(we are using unsigned int), the simplification is just to abort if int128
is not available, because we would have called the function only if it was
really needed.

Attached a simplified patch which does that, removing 1/3 of the code.
What do you think?

Note that this creates one issue though: tests now fail if 128 bits ints
are not available. I'm not sure how to detect & skip that from the tap
tests. I'm not keen on removing the tests either. Will give it some
thoughts if you want to proceed in that direction.

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-21.patch text/x-diff 16.9 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-13 14:44:00
Message-ID: 20210313144400.GA23514@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

On 2021-Mar-13, Fabien COELHO wrote:

> This looks like a good compromise, which can even be a little better because
> we still have 64 bits ints.

Yeah, we rely on those being available elsewhere.

> Attached a simplified patch which does that, removing 1/3 of the code. What
> do you think?

Clearly we got rid of a lot of code. About the tests, maybe the easiest
thing to do is "skip ... if $Config{'osname'} eq 'MSWin32'".

One comment in pseudorandom_perm talks about "whitening" while the other
talks about "scramble". I think they should both use the same term to
ease understanding.

You kept routine nbits() which is unneeded now.

pgbench.c gains too many includes for my taste. Can we put
pseudorandom_perm() in a separate file?

The function name pr_perm() is much too short; I think we should use a
more descriptive name; maybe \prand_perm is sufficient? Though I admit
the abbreviation "perm" evokes "permission" more than "permutation" to
me, so maybe \prand_permutation is better? (If you're inclined to
abbreviate permutation, maybe \prand_permut?)

I think the reference docs are not clear enough. What do you think of
something like this?

> + <row>
> + <entry role="func_table_entry"><para role="func_signature">
> + <function>pr_perm</function> ( <parameter>i</parameter>, <parameter>size</parameter> [, <parameter>seed</parameter> ] )
> + <returnvalue>integer</returnvalue>
> + </para>
> + <para>
> + i'th pseudo-random permutation in <literal>[0,size)</literal>
> + <!-- "The i'th value of the pseudo-random permutation in the interval [0,size)"? -->
> + </para>
> + <para>
> + <literal>pr_perm(2, 4)</literal>
> + <returnvalue>the 2nd integer between 0 and (4-1)</returnvalue>
> + </para></entry>
> + </row>

--
Álvaro Herrera Valdivia, Chile


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-13 15:42:18
Message-ID: alpine.DEB.2.22.394.2103131548550.771005@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hello Alvaro,

> Clearly we got rid of a lot of code. About the tests, maybe the easiest
> thing to do is "skip ... if $Config{'osname'} eq 'MSWin32'".

I tried that.

> One comment in pseudorandom_perm talks about "whitening" while the other
> talks about "scramble". I think they should both use the same term to
> ease understanding.

Good point.

> You kept routine nbits() which is unneeded now.

Indeed.

> pgbench.c gains too many includes for my taste. Can we put
> pseudorandom_perm() in a separate file?

The refactoring I was thinking about for some time now is to separate
expression evaluation entirely, not just one function, and possibly other
parts as well. I suggest that this should wait for another submission.

> The function name pr_perm() is much too short; I think we should use a
> more descriptive name; maybe \prand_perm is sufficient? Though I admit
> the abbreviation "perm" evokes "permission" more than "permutation" to
> me, so maybe \prand_permutation is better? (If you're inclined to
> abbreviate permutation, maybe \prand_permut?)

What about simply "permute"? Pgbench is unlikely to get another permute
function anyway.

> I think the reference docs are not clear enough. What do you think of
> something like this?

I agree that the documentation is not clear enough. The proposal would not
help me to understand what it does, though. I've tried to improve the
explanation based on wikipedia explanations about permutations.

See attached v22.

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-22.patch text/x-diff 17.5 KB

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-14 10:17:56
Message-ID: alpine.DEB.2.22.394.2103141116220.164573@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> See attached v22.

v23:
- replaces remaining occurences of "pr_perm" in the documentation
- removes duplicated includes

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-23.patch text/x-diff 17.2 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-14 14:54:46
Message-ID: 20210314145446.GA31638@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2021-Mar-14, Fabien COELHO wrote:

> + /*-----
> + * Apply 4 rounds of bijective transformations using key updated
> + * at each stage:
> + *
> + * (1) whiten: partial xors on overlapping power-of-2 subsets
> + * for instance with v in 0 .. 14 (i.e. with size == 15):
> + * if v is in 0 .. 7 do v = (v ^ k) % 8
> + * if v is in 7 .. 14 do v = 14 - ((14-v) ^ k) % 8
> + * note that because of the overlap (here 7), v may be changed twice.
> + * this transformation if bijective because the condition to apply it
> + * is still true after applying it, and xor itself is bijective on a
> + * power-of-2 size.
> + *
> + * (2) scatter: linear modulo
> + * v = (v * p + k) % size
> + * this transformation is bijective is p & size are prime, which is
> + * ensured in the code by the while loop which discards primes when
> + * size is a multiple of it.
> + *
> + */

My main question on this now is, do you have a scholar reference for
this algorithm?

--
Álvaro Herrera Valdivia, Chile
"Someone said that it is at least an order of magnitude more work to do
production software than a prototype. I think he is wrong by at least
an order of magnitude." (Brian Kernighan)


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-14 16:08:30
Message-ID: alpine.DEB.2.22.394.2103141647020.339707@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hello Alvaro,

>> + /*-----
>> + * Apply 4 rounds of bijective transformations using key updated
>> + * at each stage:
>> + *
>> + * (1) whiten: partial xors on overlapping power-of-2 subsets
>> + * for instance with v in 0 .. 14 (i.e. with size == 15):
>> + * if v is in 0 .. 7 do v = (v ^ k) % 8
>> + * if v is in 7 .. 14 do v = 14 - ((14-v) ^ k) % 8
>> + * note that because of the overlap (here 7), v may be changed twice.
>> + * this transformation if bijective because the condition to apply it
>> + * is still true after applying it, and xor itself is bijective on a
>> + * power-of-2 size.
>> + *
>> + * (2) scatter: linear modulo
>> + * v = (v * p + k) % size
>> + * this transformation is bijective is p & size are prime, which is
>> + * ensured in the code by the while loop which discards primes when
>> + * size is a multiple of it.
>> + *
>> + */
>
> My main question on this now is, do you have a scholar reference for
> this algorithm?

Nope, otherwise I would have put a reference. I'm a scholar though, if
it helps:-)

I could not find any algorithm that fitted the bill. The usual approach
(eg benchmarking designs) is too use some hash function and assume that it
is not a permutation, too bad.

Basically the algorithm mimics a few rounds of cryptographic encryption
adapted to any size and simple operators, whereas encryption function are
restricted to power of two blocks (eg the Feistel network). The structure
is the same AES with its AddRoundKey the xor-ing stage (adapted to non
power of two in whiten above), MixColumns which does the scattering, and
for key expansion, I used Donald Knuth generator. Basically one could say
that permute is an inexpensive and insecure AES:-)

We could add a reference to AES for the structure of the algorithm itself,
but otherwise I just iterated designs till I was satisfied with the
result (again: inexpensive and constant cost, any size, permutation…).

--
Fabien.


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-22 13:43:53
Message-ID: CAEZATCVbotfL4WOiwDshBn2CYo+-gKv45x74jM4A8qFFFiRALQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 14 Mar 2021 at 16:08, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
> > My main question on this now is, do you have a scholar reference for
> > this algorithm?
>
> Nope, otherwise I would have put a reference. I'm a scholar though, if
> it helps:-)
>
> I could not find any algorithm that fitted the bill. The usual approach
> (eg benchmarking designs) is too use some hash function and assume that it
> is not a permutation, too bad.
>
> Basically the algorithm mimics a few rounds of cryptographic encryption
> adapted to any size and simple operators, whereas encryption function are
> restricted to power of two blocks (eg the Feistel network). The structure
> is the same AES with its AddRoundKey the xor-ing stage (adapted to non
> power of two in whiten above), MixColumns which does the scattering, and
> for key expansion, I used Donald Knuth generator. Basically one could say
> that permute is an inexpensive and insecure AES:-)
>

I spent a little time playing around with this, trying to come up with
a reasonable way to test it. It's apparent from the code and
associated comments that the transformation is bijective and so will
produce a permutation, but it's less obvious how random that
permutation will be. Obviously the Fisher-Yates algorithm would give
the most random permutation, but that's not really practical in this
context. Any other algorithm will, in some sense, be less random than
that, so I think it's really just a question of testing that it's
random enough to satisfy the intended use case.

The approach to testing I tried was to use the Kolmogorov-Smirnov test
to test how uniformly random a couple of quantities are:

1). For a given size and fixed input value, and a large number of
randomly chosen seed values, is the output uniformly random?

2). For a given size and a pair of nearby input values, are the two
output values uniformly randomly spaced apart?

This second test seems desirable to ensure sufficient shuffling so
that inputs coming from some non-uniform random distribution are
scattered sufficiently to distribute the maxima/minima (x -> x + rand
mod size would pass test 1, so that by itself is insufficient).

I tested this using the attached (somewhat ugly) stand-alone test C
program, and for the most part these 2 tests seemed to pass. The
program also tests that the output really is a permutation, just to be
sure, and this was confirmed in all cases.

However, I noticed that the results are less good when size is a power
of 2. In this case, the results seem significantly less uniform,
suggesting that for such sizes, there is an increased chance that the
permuted output might still be skewed.

Based on the above, I also experimented with a different permutation
algorithm (permute2() in the test), which attempts to inject more
randomness into the bijection, and between pairs of inputs, to make
the output less predictable and less likely to be accidentally
non-uniform. It's possible that it suffers from other problems, but it
did do better in these 2 tests.

Maybe some other tests might be useful, but I really don't know. As
noted above, any algorithm is likely to lead to some pattern in the
output (e.g., it looks like both these algorithms cause adjacent
inputs to always end up separated by an odd number), so a judgement
call will be required to decide what is random enough.

Regards,
Dean

Attachment Content-Type Size
permute.c text/x-csrc 6.8 KB

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-30 09:55:12
Message-ID: CAEZATCXxmmgjEFFkB4-N-xMrbMit89oGhef4iLiusVvWzDnqcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 22 Mar 2021 at 13:43, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>
> On Sun, 14 Mar 2021 at 16:08, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
> >
> > > My main question on this now is, do you have a scholar reference for
> > > this algorithm?
> >
> > Nope, otherwise I would have put a reference. I'm a scholar though, if
> > it helps:-)
> >
> > I could not find any algorithm that fitted the bill. The usual approach
> > (eg benchmarking designs) is too use some hash function and assume that it
> > is not a permutation, too bad.
> >
> > Basically the algorithm mimics a few rounds of cryptographic encryption
> > adapted to any size and simple operators, whereas encryption function are
> > restricted to power of two blocks (eg the Feistel network). The structure
> > is the same AES with its AddRoundKey the xor-ing stage (adapted to non
> > power of two in whiten above), MixColumns which does the scattering, and
> > for key expansion, I used Donald Knuth generator. Basically one could say
> > that permute is an inexpensive and insecure AES:-)
> >
>
> I spent a little time playing around with this, trying to come up with
> a reasonable way to test it.

I spent more time testing this -- the previous testing was mostly for
large values of size, so I decided to also test it for small sizes.
The theoretical number of possible permutations grows rapidly with
size (size!), and the actual number of permutations observed grew
almost as quickly:

size size! distinct perms
2 2 2
3 6 6
4 24 24
5 120 120
6 720 720
7 5040 5040
8 40320 24382
9 362880 181440
10 3628800 3627355

My test script stopped once the first permutation had been seen 100
times, so it might have seen more permutations had it kept going for
longer.

However, looking at the actual output, there is a very significant
non-uniformity in the probability of particular permutations being
chosen, especially at certain sizes like 8 and 10. For example, at
size = 8, the identity permutation is significantly more likely than
almost any other permutation (roughly 13 times more likely than it
would be by random chance). Additionally, the signs are that this
non-uniformity tends to increase with size. At size = 10, the average
number of occurrences of each permutation in the test was 148, but
there were some that didn't appear at all, many that only appeared 2
or 3 times, and then some that appeared nearly 5000 times.

Also, there appears to be an unfortunate dependence on the seed -- if
the seed is even and the size is a power of 2, it looks like the
number of distinct permutations produced is just size/2, which is a
tiny fraction of size!.

This, together with the results from the previous K-S uniformity tests
at larger sizes, suggests that the current algorithm may not be random
enough to scatter values and remove correlations from a non-uniform
distribution.

In an attempt to do better, I tweaked the algorithm in the attached
patch, which makes use of erand48() to inject more randomness into the
permutation choice. Repeating the tests with the updated algorithm
shows that it has a number of advantages:

1). For small sizes (2-10), each of the size! possible permutations is
now produced with roughly equal probability. No single permutation was
much more likely than expected.

2). The loss of randomness for even seeds is gone.

3). For any given input, the output is more uniformly randomly
distributed for different seeds.

4). For any pair of nearby inputs, the corresponding outputs are more
uniformly randomly separated from one another.

5). The updated algorithm no longer uses modular_multiply(), so it
works the same on all platforms.

I still feel slightly uneasy about inventing our own algorithm here,
but I wasn't able to find anything else suitable, and I've now tested
this quite extensively. All the indications are that this updated
algorithm works well at all sizes and produces sufficiently random
results, though if anyone else can think of other approaches to
testing the core algorithm, that would be useful. For reference, I
attach 2 standalone test C programs I used for testing, which include
copies of the old and new algorithms.

I also did a quick copy editing pass over the docs, and I think the
patch is in pretty good shape. Barring objections, I'll see about
committing it later this week.

Regards,
Dean

Attachment Content-Type Size
pgbench-prp-func-24.patch text/x-patch 15.4 KB
permute.c text/x-csrc 7.4 KB
permute_small.c text/x-csrc 4.3 KB

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-30 18:26:22
Message-ID: alpine.DEB.2.22.394.2103301920340.305215@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hello Dean,

Thanks a lot for this work. This version looks much better than the
previous one you sent, and has significant advantages over the one I sent,
in particular avoiding the prime number stuff and large modular multiply.
So this looks good!

I'm happy that halves-xoring is back because it was an essential part of
the design. ISTM that the previous version you sent only had linear/affine
transforms which was a bad idea.

The linear transform is moved inside halves, why not, and the
any-odd-number multiplication is prime with power-of-two trick is neat on
these.

However I still have some reservations:

First, I have a thing against erand48. The erand 48 bits design looks like
something that made sense when computer architectures where 16/32 bits and
large integers were 32 bits, so a 16 bits margin looked good enough. Using
a int16[3] type now is really depressing and probably costly on modern
architectures. With 64 bits architecture, and given that we are
manipulating 64 bits integers (well 63 really), ISTM that the minimal
state size for a PRNG should be at least 64 bits. It there a good reason
NOT to use a 64 bits state prn generator? I took Knuth's because it is
cheap and 64 bits. I'd accept anything which is at least 64 bits. I looked
at xor-shift things, but there was some controversy around these so I kept
it simple and just shifted small values to get rid of the obvious cycles
on Knuth's generator.

Second, I have a significant reservation about the very structure of the
transformation in this version:

loop 4 times :

// FIRST HALF STEER
m/r = pseudo randoms
if v in first "half"
v = ((v * m) ^ r) & mask;
rotate1(v)

// FULL SHIFT 1
r = pseudo random
v = (v + r) % size

// SECOND HALF STEER
m/r = pseudo randoms
if v in second "half"
same as previous on second half

// FULL SHIFT 2
r = pseudo random
v = (v + r) % size

I'm really at odds with FULL SHIFT 1, because it means that up to 1/256 of
values are kept out of STEERING. Whole chunks of values could be kept
unshuffled because they would only have SHIFTS apply to them and each time
fall in the not steered half. It should be an essential part of the design
that at least one steer is applied on a value at each round, and if two
are applied then fine, but certainly not zero. So basically I think that
the design would be significantly improved by removing "FULL SHIFT 1".

Third, I think that the rotate code can be simplified, in particular the
?: should be avoided because it may induce branches quite damaging to
processor performance.

I'll give it some more thoughts.

--
Fabien.


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-30 19:31:13
Message-ID: CAEZATCWMUEdoxkC-R9-Uq5q5N7oWbW_vFy8N1Ph3aBN9f=ORHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 30 Mar 2021 at 19:26, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
> First, I have a thing against erand48.

Yeah, that's probably a fair point. However, all the existing pgbench
random functions are using it, so I think it's fair enough for
permute() to do the same (and actually 2^48 is pretty huge). Switching
to a 64-bit PRNG might not be a bad idea, but I think that's something
we'd want to do across the board, and so I think it should be out of
scope for this patch.

> Second, I have a significant reservation about the very structure of the
> transformation in this version:
>
> loop 4 times :
>
> // FIRST HALF STEER
> m/r = pseudo randoms
> if v in first "half"
> v = ((v * m) ^ r) & mask;
> rotate1(v)
>
> // FULL SHIFT 1
> r = pseudo random
> v = (v + r) % size
>
> // SECOND HALF STEER
> m/r = pseudo randoms
> if v in second "half"
> same as previous on second half
>
> // FULL SHIFT 2
> r = pseudo random
> v = (v + r) % size
>
> I'm really at odds with FULL SHIFT 1, because it means that up to 1/256 of
> values are kept out of STEERING. Whole chunks of values could be kept
> unshuffled because they would only have SHIFTS apply to them and each time
> fall in the not steered half. It should be an essential part of the design
> that at least one steer is applied on a value at each round, and if two
> are applied then fine, but certainly not zero. So basically I think that
> the design would be significantly improved by removing "FULL SHIFT 1".

Ah, that's a good point. Something else that also concerned me there
was that it might lead to 2 consecutive full shifts with nothing in
between, which would lead to less uniform randomness (like the
Irwin-Hall distribution).

I just did a quick test without the first full shift, and the results
do appear to be better, so removing that looks like a good idea.

> Third, I think that the rotate code can be simplified, in particular the
> ?: should be avoided because it may induce branches quite damaging to
> processor performance.

Yeah, I wondered about that. Perhaps there's a "trick" that can be
used to simplify it. Pre-computing the number of bits in the mask
would probably help. I'll give it some thought.

Regards,
Dean


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-30 19:44:05
Message-ID: CAEZATCUEkKMFPWZWgS7c0Fe=XxD3GgPKoXK+aoyq_Bt=sNVQBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 30 Mar 2021 at 20:31, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>
> Yeah, that's probably a fair point. However, all the existing pgbench
> random functions are using it, so I think it's fair enough for
> permute() to do the same (and actually 2^48 is pretty huge). Switching
> to a 64-bit PRNG might not be a bad idea, but I think that's something
> we'd want to do across the board, and so I think it should be out of
> scope for this patch.
>

Of course the immediate counter-argument to changing the existing
random functions would be that doing so would break lots of people's
tests, and no one would thank us for that. Still, I think that, since
the existing random functions use a 48-bit PRNG, it's not unreasonable
for permute() to do the same.

Regards,
Dean


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-31 08:02:42
Message-ID: alpine.DEB.2.22.394.2103310849380.411399@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hello Dean,

>> First, I have a thing against erand48.
>
> Yeah, that's probably a fair point. However, all the existing pgbench
> random functions are using it, so I think it's fair enough for permute()
> to do the same (and actually 2^48 is pretty huge). Switching to a 64-bit
> PRNG might not be a bad idea, but I think that's something we'd want to
> do across the board, and so I think it should be out of scope for this
> patch.

But less likely to pass, whereas here we have an internal function that
we can set as we want.

Also, there is a 64 bits seed provided to the function which instantly
ignores 16 of them, which looks pretty silly to me.

Also, the function is named everywhere erand48 with its hardcoded int16[3]
state, which makes a poor abstraction.

At least, I suggest that two 48-bits prng could be initialized with parts
of the seed and used in different places, eg for r & m.

Also, the seed could be used to adjust the rotation, maybe.

>> I'm really at odds with FULL SHIFT 1, because it means that up to 1/256 of
>> values are kept out of STEERING. [...]
>
> Ah, that's a good point. Something else that also concerned me there was
> that it might lead to 2 consecutive full shifts with nothing in between,
> which would lead to less uniform randomness (like the Irwin-Hall
> distribution). I just did a quick test without the first full shift, and
> the results do appear to be better,

Indeed, it makes sense to me.

> so removing that looks like a good idea.

>> Third, I think that the rotate code can be simplified, in particular
>> the ?: should be avoided because it may induce branches quite damaging
>> to processor performance.
>
> Yeah, I wondered about that. Perhaps there's a "trick" that can be
> used to simplify it. Pre-computing the number of bits in the mask
> would probably help.

See pg_popcount64().

--
Fabien.


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-31 13:06:13
Message-ID: CAEZATCUMbvg0xOsi1+J1LCV0yz+r8MNESf4sEy++Z9V8GPEO3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 31 Mar 2021 at 09:02, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
> >> First, I have a thing against erand48.
> >
> Also, there is a 64 bits seed provided to the function which instantly
> ignores 16 of them, which looks pretty silly to me.
>

Yeah, that was copied from set_random_seed().

> At least, I suggest that two 48-bits prng could be initialized with parts
> of the seed and used in different places, eg for r & m.
>

That could work. I'd certainly feel better about that than
implementing a whole new PRNG.

> Also, the seed could be used to adjust the rotation, maybe.
>

Perhaps. I'm not sure it's really necessary though.

> >> I'm really at odds with FULL SHIFT 1, because it means that up to 1/256 of
> >> values are kept out of STEERING. [...]
> >
> > Ah, that's a good point. Something else that also concerned me there was
> > that it might lead to 2 consecutive full shifts with nothing in between,
> > which would lead to less uniform randomness (like the Irwin-Hall
> > distribution). I just did a quick test without the first full shift, and
> > the results do appear to be better,
>
> Indeed, it makes sense to me.
>

OK, attached is an update making this change and simplifying the
rotate code, which hopefully just leaves the question of what (if
anything) to do with pg_erand48().

> >> Third, I think that the rotate code can be simplified, in particular
> >> the ?: should be avoided because it may induce branches quite damaging
> >> to processor performance.
> >
> > Yeah, I wondered about that. Perhaps there's a "trick" that can be
> > used to simplify it. Pre-computing the number of bits in the mask
> > would probably help.
>
> See pg_popcount64().
>

Actually, I used pg_leftmost_one_pos64() to calculate the mask length,
allowing the mask to be computed from that, so there is no longer a
need for compute_mask(), which seems like a neat little
simplification.

Regards,
Dean

Attachment Content-Type Size
pgbench-prp-func-25.patch text/x-patch 15.2 KB

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-31 17:53:24
Message-ID: alpine.DEB.2.22.394.2103311933490.620883@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hello Dean,

> OK, attached is an update making this change and simplifying the rotate
> code, which hopefully just leaves the question of what (if anything) to
> do with pg_erand48().

Yep. While looking at it, I have some doubts on this part:

m = (uint64) (pg_erand48(random_state.xseed) * (mask + 1)) | 1;
r = (uint64) (pg_erand48(random_state.xseed) * (mask + 1));
r = (uint64) (pg_erand48(random_state.xseed) * size);

I do not understand why the random values are multiplied by anything in
the first place…

This one looks like a no-op :

r = (uint64) (pg_erand48(random_state.xseed) * size);
v = (v + r) % size;

v = (v + r) % size
= (v + rand * size) % size
=? (v % size + rand * size % size) % size
=? (v % size + 0) % size
= v % size
= v

I'm also skeptical about this one:

r = (uint64) (pg_erand48(random_state.xseed) * (mask + 1));
if (v <= mask)
v = ((v * m) ^ r) & mask;

v = ((v * m) ^ r) & mask
= ((v * m) ^ r) % (mask+1)
= ((v * m) ^ (rand * (mask+1))) % (mask+1)
=? ((v * m) % (mask+1)) ^ (rand * (mask+1) % (mask+1))
=? ((v * m) % (mask+1)) ^ (0)
= (v * m) & mask

Or possibly I'm missing something obvious and I'm wrong with my
arithmetic?

--
Fabien.


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-04-01 08:46:42
Message-ID: CAEZATCVgW-hT2L_eyighcBB3+pujuRmihXAE73_vwNF4WtDLVQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 31 Mar 2021 at 18:53, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
> While looking at it, I have some doubts on this part:
>
> m = (uint64) (pg_erand48(random_state.xseed) * (mask + 1)) | 1;
> r = (uint64) (pg_erand48(random_state.xseed) * (mask + 1));
> r = (uint64) (pg_erand48(random_state.xseed) * size);
>
> I do not understand why the random values are multiplied by anything in
> the first place…
>

These are just random integers in the range [0,mask] and [0,size-1],
formed in exactly the same way as getrand().

> This one looks like a no-op :
>
> r = (uint64) (pg_erand48(random_state.xseed) * size);
> v = (v + r) % size;
>
> v = (v + r) % size
> = (v + rand * size) % size
> =? (v % size + rand * size % size) % size
> =? (v % size + 0) % size
> = v % size
> = v
>

rand * size % size is not zero because rand is a floating point number
in the range [0,1), so actually rand * size % size = rand * size.
Similarly in the other case, you're forgetting that rand is not an
integer.

Thinking more about our use of erand48(), the only real impact it has
is to limit the number of possible permutations produced, and actually
2^48 is so huge (roughly 281 million million) that I can't ever see
that being an issue in practice. (In a quick dummy test, replacing
erand48() with a silly "erand8()" function that only returned one of
256 distinct values, permute() still worked fine at any size, except
for the fact that only up to 256 distinct permutations were produced.
In other words, limitations on the source of randomness don't prevent
it from producing permutations of any size, they just limit the number
of distinct permutations possible. And since 2^48 is so big, that
shouldn't be an issue.)

Also, I think the source of the input seed is most likely to be either
manually hand-picked integers or pgbench's own random() function, so
the only real issue I can see is that by ignoring the upper 16-bits,
there's a very small chance of always using the same random sequence
if some hand-picked numbers only vary in the top 16 bits, though I
think that's highly unlikely in practice.

Nonetheless, it's not much more effort to make another random state
and use those remaining bits of the seed and get more internal random
states, so here's an update doing that. I intentionally chose to reuse
the lower 16 bits of the seed in the second random function (in a
different slot of the random state), since those are probably the ones
most likely to vary in practice.

This doesn't actually make any measurable difference to any of the
tests, but it closes that potential loophole of ignoring part of the
seed. In all my tests, the biggest improvement was between v23 and v24
of the patch. By comparison, the later versions have been relatively
small improvements, and it's probably now "random enough" for the
intended purposes.

Regards,
Dean

Attachment Content-Type Size
pgbench-prp-func-26.patch text/x-patch 15.4 KB

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-04-02 05:38:57
Message-ID: alpine.DEB.2.22.394.2104011103460.702551@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: Postg토토 사이트SQL


>> r = (uint64) (pg_erand48(random_state.xseed) * size);
>>
>> I do not understand why the random values are multiplied by anything in
>> the first place…
>
> These are just random integers in the range [0,mask] and [0,size-1],
> formed in exactly the same way as getrand().

Indeed, erand returns a double, this was the part I was missing. I did not
realize that you had switched to doubles in your approach.

I think that permute should only use integer operations. I'd suggest to
use one of the integer variants instead of going through a double
computation and casting back to int. The internal state is based on
integers, I do not see the added value of going through floats, possibly
enduring floating point issues (undeflow, rounding, normalization,
whatever) on the way, whereas from start to finish we just need ints.

See attached v27 proposal.

I still think that *rand48 is a poor (relatively small state) and
inefficient (the implementation includes packing and unpacking 16 bits
ints to build a 64 bits int) choice.

--
Fabien.


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-04-02 05:43:55
Message-ID: alpine.DEB.2.22.394.2104020743030.968531@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> See attached v27 proposal.

As usual, it is easier to see with the actual attachement:-)

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-27.patch text/x-diff 15.9 KB

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-04-04 18:21:49
Message-ID: CAEZATCXBZqSBr5iBhH-jQXjo2k04twnscX4Wi4JdMX8=2DdLSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2 Apr 2021 at 06:38, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
> >> r = (uint64) (pg_erand48(random_state.xseed) * size);
> >>
> >> I do not understand why the random values are multiplied by anything in
> >> the first place…
> >
> > These are just random integers in the range [0,mask] and [0,size-1],
> > formed in exactly the same way as getrand().
>
> Indeed, erand returns a double, this was the part I was missing. I did not
> realize that you had switched to doubles in your approach.
>
> I think that permute should only use integer operations. I'd suggest to
> use one of the integer variants instead of going through a double
> computation and casting back to int. The internal state is based on
> integers, I do not see the added value of going through floats, possibly
> enduring floating point issues (undeflow, rounding, normalization,
> whatever) on the way, whereas from start to finish we just need ints.
>

This is the already-established coding pattern used in getrand() to
pick a random number uniformly in some range that's not necessarily a
power of 2.

Floating point underflow and normalisation issues are not possible
because erand48() takes a 48-bit integer N and uses ldexp() to store
N/2^48 in a double, which is an exact operation since IEEE doubles
have something like 56-bit mantissas. This is then turned back into an
integer in the required range by multiplying by the desired maximum
value, so there's never any risk of underflow or normalisation issues.
If you didn't do it that way, you'd need to rely on some larger
integer datatype, such as 128-bit integers.

I guess that there may be rounding variations once the required
maximum value exceeds something like 2^56 (although the comment in
getrand() is much more conservative than that), so it's possible that
a pgbench script calling random() with (ub-lb) larger than that might
give different results on different platforms. For the non-uniform
random functions, that effect might well kick in sooner. I'm not aware
of any field complaints about that though, possibly because real-world
data sizes are significantly smaller than that.

In practice, permute() is likely to take its input from one of the
non-uniform random functions, so it won't be permute() that first
introduces rounding issues.

> See attached v27 proposal.
>

This update has a number of flaws. For example, this:

+static uint64
+randu64(RandomState * state)
+{
+ uint64 r1 = pg_jrand48((*state).xseed),
+ r2 = pg_jrand48((*state).xseed);
+ return r1 << 51 | r2 << 13 | r1 >> 13;
+}

It still uses a 48-bit RandomState, so it doesn't improve on getrand()
in that respect.

It replaces a single erand48() call with 2 jrand48() calls, which
comes at a cost in terms of performance. (In fact, changing the number
of rounds in the previous version of permute() from 4 to 6 has a
smaller performance impact than this -- more about that below.)

jrand48() returns a signed 32-bit integer, which has a 50% chance of
being negative, so when that is cast to a uint64, there is a 50%
chance that the 32 most significant bits will be 1. When the various
parts are OR'ed together, that will then mask out any randomness from
the other parts. For example, 50% of the time, the jrand48() value
used for r1 will be negative, and so 32 bits in the middle of the
final result will all be set.

There is essentially no randomness at all in bits 45..50, and the r1
and r2 values overlap in bits 13..18, giving them a 75% chance of
being set.

So overall, the results will be highly non-uniform, with less
randomness and poorer performance than erand48().

In addition, it returns a result in the range [0,2^64), which is not
really what's wanted. For example:

+ /* Random offset */
+ r = randu64(&random_state2);
+ v = (v + r) % size;

The previous code used r = getrand(0, size-1), which gave a uniformly
random offset. However, the new code risks introducing additional
non-uniformity when size is not a power of 2.

Finally, worst of all, this random offset is no longer bijective, due
to 64-bit integer wrap-around. For example, suppose that size=100 and
r=(2^64-10), then the following 2 values both map to the same result:

v = 20 -> (v + r) % size
= (20 + (2^64 - 10)) % 100
= (2^64 + 10) % 100
= (10) % 100
= 10

v = 4 -> (v + r) % size
= (4 + (2^64 - 10)) % 100
= (2^64 - 6) % 100
= (18446744073709551610) % 100
= 10

So not only are the results no longer uniformly random, they might not
even form a permutation.

I did some more testing of the previous version (v26), this time
looking at much larger sizes, all the way up to the maximum, which is
2^63-1 since it comes from a signed int64. In general, the results
were very good, though I did notice some slight non-uniformity in the
way adjacent inputs were separated from another when the size was just
under a power of 2. I think that's the hardest case for this
algorithm, because there's very little overlap between the 2 halves.
Increasing the number of rounds from 4 to 6 ironed out that
non-uniformity (and as mentioned above, is still cheaper than using
randu64() with 4 rounds), so I think we should go with that.

> I still think that *rand48 is a poor (relatively small state) and
> inefficient (the implementation includes packing and unpacking 16 bits
> ints to build a 64 bits int) choice.
>

I can somewhat understand your dislike of *rand48(), but in the
context of pgbench I think that it is perfectly adequate. Since we now
use 2 RandomStates, I don't think the state size is an issue anymore,
if it ever really was. Using erand48() gives better performance than
jrand48() because it returns 48 bits rather than 32, so fewer calls
are needed, which allows more rounds for the same cost. Additionally,
following the same pattern as existing code reduces the risk of bugs,
and builds on proven, tested code.

You may wish to submit a separate patch to replace pgbench's use of
*rand48() with something else, and that would be discussed on its own
merits, but I don't see why that should hold up adding permute().

Regards,
Dean


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-04-05 12:07:25
Message-ID: alpine.DEB.2.22.394.2104051009040.2033293@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hello Dean,

>> I think that permute should only use integer operations. I'd suggest to
>> use one of the integer variants instead of going through a double
>> computation and casting back to int. The internal state is based on
>> integers, I do not see the added value of going through floats, possibly
>> enduring floating point issues (undeflow, rounding, normalization,
>> whatever) on the way, whereas from start to finish we just need ints.
>
> This is the already-established coding pattern used in getrand() to
> pick a random number uniformly in some range that's not necessarily a
> power of 2.

Indeed. I'm not particularly happy with that one either.

> Floating point underflow and normalisation issues are not possible
> because erand48() takes a 48-bit integer N and uses ldexp() to store
> N/2^48 in a double, which is an exact operation since IEEE doubles
> have something like 56-bit mantissas.

Double mantissa size is 52 bits.

> This is then turned back into an integer in the required range by
> multiplying by the desired maximum value, so there's never any risk of
> underflow or normalisation issues.

ISTM that there are significant issues when multiplying with an integer,
because the integer is cast to a double before multiplying, so if the int
is over 52 bits then it is coldly truncated and some values are just lost
in the process and will never be drawn. Probably not too many of them, but
some of them anyway.

> I guess that there may be rounding variations once the required
> maximum value exceeds something like 2^56 (although the comment in
> getrand() is much more conservative than that), so it's possible that
> a pgbench script calling random() with (ub-lb) larger than that might
> give different results on different platforms.

Dunno. This may be the same issue I'm pointing out above.

> For the non-uniform random functions, that effect might well kick in
> sooner. I'm not aware of any field complaints about that though,
> possibly because real-world data sizes are significantly smaller than
> that.
>
> In practice, permute() is likely to take its input from one of the
> non-uniform random functions, so it won't be permute() that first
> introduces rounding issues.

Sure. I'd like permute to be immune to that.

>> See attached v27 proposal.
>
> This update has a number of flaws. For example, this:

Indeed:-)

> +static uint64
> +randu64(RandomState * state)
> +{
> + uint64 r1 = pg_jrand48((*state).xseed),
> + r2 = pg_jrand48((*state).xseed);
> + return r1 << 51 | r2 << 13 | r1 >> 13;
> +}
>
> It still uses a 48-bit RandomState, so it doesn't improve on getrand()
> in that respect.

Sure. I'm pretty unhappy with that one, but I was not trying to address
that. My idea that randu64 would be replace with something better at some
point. My intention was "64-bits pseudo-random", my implementation does
not work, ok:-)

> It replaces a single erand48() call with 2 jrand48() calls, which
> comes at a cost in terms of performance. (In fact, changing the number
> of rounds in the previous version of permute() from 4 to 6 has a
> smaller performance impact than this -- more about that below.)

Sure, same remark as above, I was not trying to address that pointB.

> jrand48() returns a signed 32-bit integer, which has a 50% chance of
> being negative, so when that is cast to a uint64, there is a 50%
> chance that the 32 most significant bits will be 1.

Argh.

> When the various parts are OR'ed together, that will then mask out any
> randomness from the other parts. For example, 50% of the time, the
> jrand48() value used for r1 will be negative, and so 32 bits in the
> middle of the final result will all be set.

Argh. I hesitated to use xor. I should not have:-)

> So overall, the results will be highly non-uniform, with less
> randomness and poorer performance than erand48().

Indeed, bad choice. I wanted to used the unsigned version but it is not
implemented, and swichted to the signed version without thinking of some
of the implications.

> In addition, it returns a result in the range [0,2^64), which is not
> really what's wanted. For example:
>
> + /* Random offset */
> + r = randu64(&random_state2);
> + v = (v + r) % size;
>
> The previous code used r = getrand(0, size-1), which gave a uniformly
> random offset. However, the new code risks introducing additional
> non-uniformity when size is not a power of 2.

ISTM that the overall non uniformity is worse with the float approach as
opposed to the int approach.

Conceptually, the same kind of bias is expected whether you get through
floats or through ints, because the underlying power-of-two maths is the
same, so what makes the difference in reducing non-uniformity is using
more bits. Basically, when enough bits are used the same number of values
should appear n vs n+1 times.

When not enough bits are provided, things get ugly: for instance, with
size = 2^53, even if the floats were fully the 52-bit float pseudo-random
mantissa (they are really 48 with erand48) would result in only even
numbers to be produced, whereas with ints all numbers are produced. With
erand48, when size is above 48 bits ISTM that last bits are always zeros
with the double approach. I'm not counting lost values because of size
truncation when converting it to double.

> Finally, worst of all, this random offset is no longer bijective, due
> to 64-bit integer wrap-around. For example, suppose that size=100 and
> r=(2^64-10), then the following 2 values both map to the same result:
>
> v = 20 -> (v + r) % size
> = (20 + (2^64 - 10)) % 100
> = (2^64 + 10) % 100
> = (10) % 100
> = 10
>
> v = 4 -> (v + r) % size
> = (4 + (2^64 - 10)) % 100
> = (2^64 - 6) % 100
> = (18446744073709551610) % 100
> = 10
>
> So not only are the results no longer uniformly random, they might not
> even form a permutation.

Indeed, this one is pretty fun! Probably the right formula for this
approach is "(v + r % size) % size", which is kind of a mouthful.

I fully agree that my v27 implementation is butched on many dimensions,
some of them intentional and temporary (use jrand48 twice) and some of
them accidental (not considering int overflows, being optimistic on signed
to unsigned casts…).

I still disagree though that going through floating point is the right
thing to do, because of some of the issues I outlined above (eg truncation
and rounding for above 48/52-bits sizes). Basically I think that an
algorithm dealing with integers should not have to resort to floating
point computations unless it is actually required. This is not the case
for permute, were v26 is using doubles as glorified 48-bit integers, that
could be extended to 52-bit integers, but no more. The only benefit I see
is using implicitly the internal 104-bit rounding by truncation on
multiply, but I do not think that implicitely reducing the practical int
values to 52 bits is worth it, and that the same quality (bias) can be
achieved for 63 bits integers by keeping them as ints are writing the
right formula, which I fully failed to demonstrate in v27.

> I did some more testing of the previous version (v26), this time
> looking at much larger sizes, all the way up to the maximum, which is
> 2^63-1 since it comes from a signed int64. In general, the results
> were very good, though I did notice some slight non-uniformity in the
> way adjacent inputs were separated from another when the size was just
> under a power of 2. I think that's the hardest case for this
> algorithm, because there's very little overlap between the 2 halves.

Yes, less values are steered twice per round. However, as for adjacent
values for large sizes, I'm wondering whether this may have more to do
with the 48 bit limitations, so that lower bits are not really xored for
instance. Not sure.

> Increasing the number of rounds from 4 to 6 ironed out that
> non-uniformity (and as mentioned above, is still cheaper than using
> randu64() with 4 rounds), so I think we should go with that.

There is a quality-cost tradeoff. With the previous version I convinced
myself that 4 rounds were a good compromise (not perfect, but ok for
keeping the cost low on practical sizes).

With this version, I'll admit that I do not have an opinion.

> You may wish to submit a separate patch to replace pgbench's use of
> *rand48() with something else, and that would be discussed on its own
> merits, but I don't see why that should hold up adding permute().

I'll see.

Attached a v28 which I hope fixes the many above issues and stays with
ints. The randu64 is still some kind of a joke, I artificially reduced the
cost by calling jrand48 once and extending it to 64 bits, so it could give
an idea of the cost endured if a 64-bit prng was used.

Now you are the committer, you can do as you please, I'm just stating my
(mathematical) opinions about using floating point computations for that.
I think that apart from this point of principle/philosophy the permute
performance and implementation are reasonable, and better than my initial
version because it avoids int128 computations and the large prime number
business.

--
Fabien.

Attachment Content-Type Size
pgbench-prp-func-28.patch text/x-diff 16.2 KB

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-04-06 11:19:26
Message-ID: CAEZATCU4V6FXtUREHEpejdeBvNb3geidLsojC09JZch6z=nEvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: 503 사설 토토 페치 실패

On Mon, 5 Apr 2021 at 13:07, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
> Attached a v28 which I hope fixes the many above issues and stays with
> ints. The randu64 is still some kind of a joke, I artificially reduced the
> cost by calling jrand48 once and extending it to 64 bits, so it could give
> an idea of the cost endured if a 64-bit prng was used.
>
> Now you are the committer, you can do as you please, I'm just stating my
> (mathematical) opinions about using floating point computations for that.
> I think that apart from this point of principle/philosophy the permute
> performance and implementation are reasonable, and better than my initial
> version because it avoids int128 computations and the large prime number
> business.
>

Pushed.

I decided not to go with the "joke" randu64() function, but instead
used getrand() directly. This at least removes any *direct* use of
doubles in permute() (though of course they're still used indirectly),
and means that if getrand() is improved in the future, permute() will
benefit too.

Regards,
Dean


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-04-06 21:31:31
Message-ID: alpine.DEB.2.22.394.2104062329130.2555247@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hello Dean,

> Pushed.
>
> I decided not to go with the "joke" randu64() function, but instead
> used getrand() directly. This at least removes any *direct* use of
> doubles in permute() (though of course they're still used indirectly),
> and means that if getrand() is improved in the future, permute() will
> benefit too.

Good idea, at least it is hidden and in one place.

Thanks for the push!

--
Fabien.