From: | Krasiyan Andreev <krasiyan(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | [PATCH] distinct aggregates within a window function WIP |
Date: | 2020-01-13 09:17:02 |
Message-ID: | CAN1Pwonf4waD+PWkEFK8ANLua8fPjZ4DmV+hixO62+LiR8gwaA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | Postg스포츠 토토 베트맨SQL |
Hi hackers,
I want to propose to you an old patch for Postgres 11, off-site developed
by Oliver Ford,
but I have permission from him to publish it and to continue it's
development,
that allow distinct aggregates, like select sum(distinct nums) within a
window function.
I have rebased it for current git master branch and have made necessary
changes to it to work with Postgres 13devel.
It's a WIP, because it doesn't have tests yet (I will add them later) and
also, it works for a int, float, and numeric types,
but probably distinct check can be rewritten for possible performance
improvement,
with storing the distinct elements in a hash table which should give a
performance improvement.
If you find the implementation of patch acceptable from committers
perspective,
I will answer to all yours design and review notes and will try to go ahead
with it,
also, I will add this patch to the March commit fest.
For example usage of a patch, if you have time series data, with current
Postgres you will get an error:
postgres=# CREATE TABLE t_demo AS
postgres-# SELECT ordinality, day, date_part('week', day) AS week
postgres-# FROM generate_series('2020-01-02', '2020-01-15', '1
day'::interval)
postgres-# WITH ORDINALITY AS day;
SELECT 14
postgres=# SELECT * FROM t_demo;
ordinality | day | week
------------+------------------------+------
1 | 2020-01-02 00:00:00+02 | 1
2 | 2020-01-03 00:00:00+02 | 1
3 | 2020-01-04 00:00:00+02 | 1
4 | 2020-01-05 00:00:00+02 | 1
5 | 2020-01-06 00:00:00+02 | 2
6 | 2020-01-07 00:00:00+02 | 2
7 | 2020-01-08 00:00:00+02 | 2
8 | 2020-01-09 00:00:00+02 | 2
9 | 2020-01-10 00:00:00+02 | 2
10 | 2020-01-11 00:00:00+02 | 2
11 | 2020-01-12 00:00:00+02 | 2
12 | 2020-01-13 00:00:00+02 | 3
13 | 2020-01-14 00:00:00+02 | 3
14 | 2020-01-15 00:00:00+02 | 3
(14 rows)
postgres=# SELECT *,
postgres-# array_agg(DISTINCT week) OVER (ORDER BY day ROWS
postgres(# BETWEEN 2 PRECEDING AND 2
FOLLOWING)
postgres-# FROM t_demo;
ERROR: DISTINCT is not implemented for window functions
LINE 2: array_agg(DISTINCT week) OVER (ORDER BY day ROWS
^
So you will need to write something like this:
postgres=# SELECT *, (SELECT array_agg(DISTINCT unnest) FROM unnest(x)) AS
b
postgres-# FROM
postgres-# (
postgres(# SELECT *,
postgres(# array_agg(week) OVER (ORDER BY day ROWS
postgres(# BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS x
postgres(# FROM t_demo
postgres(# ) AS a;
ordinality | day | week | x | b
------------+------------------------+------+-------------+-------
1 | 2020-01-02 00:00:00+02 | 1 | {1,1,1} | {1}
2 | 2020-01-03 00:00:00+02 | 1 | {1,1,1,1} | {1}
3 | 2020-01-04 00:00:00+02 | 1 | {1,1,1,1,2} | {1,2}
4 | 2020-01-05 00:00:00+02 | 1 | {1,1,1,2,2} | {1,2}
5 | 2020-01-06 00:00:00+02 | 2 | {1,1,2,2,2} | {1,2}
6 | 2020-01-07 00:00:00+02 | 2 | {1,2,2,2,2} | {1,2}
7 | 2020-01-08 00:00:00+02 | 2 | {2,2,2,2,2} | {2}
8 | 2020-01-09 00:00:00+02 | 2 | {2,2,2,2,2} | {2}
9 | 2020-01-10 00:00:00+02 | 2 | {2,2,2,2,2} | {2}
10 | 2020-01-11 00:00:00+02 | 2 | {2,2,2,2,3} | {2,3}
11 | 2020-01-12 00:00:00+02 | 2 | {2,2,2,3,3} | {2,3}
12 | 2020-01-13 00:00:00+02 | 3 | {2,2,3,3,3} | {2,3}
13 | 2020-01-14 00:00:00+02 | 3 | {2,3,3,3} | {2,3}
14 | 2020-01-15 00:00:00+02 | 3 | {3,3,3} | {3}
(14 rows)
With attached version, you will get the desired results:
postgres=# SELECT *,
postgres-# array_agg(DISTINCT week) OVER (ORDER BY day ROWS
postgres(# BETWEEN 2 PRECEDING AND 2
FOLLOWING)
postgres-# FROM t_demo;
ordinality | day | week | array_agg
------------+------------------------+------+-----------
1 | 2020-01-02 00:00:00+02 | 1 | {1}
2 | 2020-01-03 00:00:00+02 | 1 | {1}
3 | 2020-01-04 00:00:00+02 | 1 | {1,2}
4 | 2020-01-05 00:00:00+02 | 1 | {1,2}
5 | 2020-01-06 00:00:00+02 | 2 | {1,2}
6 | 2020-01-07 00:00:00+02 | 2 | {1,2}
7 | 2020-01-08 00:00:00+02 | 2 | {2}
8 | 2020-01-09 00:00:00+02 | 2 | {2}
9 | 2020-01-10 00:00:00+02 | 2 | {2}
10 | 2020-01-11 00:00:00+02 | 2 | {2,3}
11 | 2020-01-12 00:00:00+02 | 2 | {2,3}
12 | 2020-01-13 00:00:00+02 | 3 | {2,3}
13 | 2020-01-14 00:00:00+02 | 3 | {2,3}
14 | 2020-01-15 00:00:00+02 | 3 | {3}
(14 rows)
Attachment | Content-Type | Size |
---|---|---|
pg13-distinct-window.patch | text/x-patch | 11.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Sergei Kornilov | 2020-01-13 09:21:58 | Re: [HACKERS] Block level parallel vacuum |
Previous Message | Michael Paquier | 2020-01-13 08:59:38 | Re: Question regarding heap_multi_insert documentation |