
Lists: pgsql-general
From: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: how to use trace_lock_oidmin config correctly
Date: 2024-10-14 11:00:37
Message-ID: CAM+6J95roDtn05fcobKJjWhG1YdJYwzoUFhjaj-dVOCw+9m6OQ@mail.gmail.com
Lists: pgsql-general

I do not see log output for the locks taken while attaching and detaching
partitions when I set the
trace_lock_oidmin config to an OID below that of the table of concern; basically,
locking on objects above the threshold OID is not logged at all.

From the documentation:
PostgreSQL: Documentation: 17: 19.17. Developer Options
</docs/current/runtime-config-developer.html#GUC-TRACE-LOCK-OIDMIN>

trace_lock_oidmin (integer)

If set, do not trace locks for tables below this OID (used to avoid output
on system tables).
This parameter is only available if the LOCK_DEBUG macro was defined when
PostgreSQL was compiled.

I compiled my Postgres with the LOCK_DEBUG macro:

/*
postgres(at)ubuntu:/tmp$ pg_config
BINDIR = /opt/postgresql/bin
DOCDIR = /opt/postgresql/share/doc
HTMLDIR = /opt/postgresql/share/doc
INCLUDEDIR = /opt/postgresql/include
PKGINCLUDEDIR = /opt/postgresql/include
INCLUDEDIR-SERVER = /opt/postgresql/include/server
LIBDIR = /opt/postgresql/lib
PKGLIBDIR = /opt/postgresql/lib
LOCALEDIR = /opt/postgresql/share/locale
MANDIR = /opt/postgresql/share/man
SHAREDIR = /opt/postgresql/share
SYSCONFDIR = /opt/postgresql/etc
PGXS = /opt/postgresql/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/opt/postgresql' '--with-openssl' '--enable-debug'
'--enable-profiling' '--enable-cassert' '--enable-tap-tests' 'CFLAGS=-ggdb
-Og -g3 -fno-omit-frame-pointer -DLOCK_DEBUG -DBTREE_BUILD_STATS
-DWAL_DEBUG '
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
-Wshadow=compatible-local -Wformat-security -fno-strict-aliasing -fwrapv
-fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation
-g -pg -DLINUX_PROFILE -ggdb -Og -g3 -fno-omit-frame-pointer -DLOCK_DEBUG
-DBTREE_BUILD_STATS -DWAL_DEBUG
CFLAGS_SL = -fPIC
LDFLAGS = -Wl,--as-needed
-Wl,-rpath,'/opt/postgresql/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lssl -lcrypto -lz -lreadline -lm
VERSION = PostgreSQL 18devel

postgres(at)ubuntu:/tmp$ psql
psql (18devel)
Type "help" for help.

postgres=# show client_min_messages;
client_min_messages
---------------------
log
(1 row)

postgres=# show trace_lock_oidmin;
trace_lock_oidmin
-------------------
16400
(1 row)

postgres=# select 't'::regclass::oid;
oid
-------
16401
(1 row)

postgres=# select 't1'::regclass::oid;
oid
-------
16404
(1 row)

postgres=# alter table t detach partition t1;
ALTER TABLE
postgres=# alter table t attach partition t1 for values in (0);
ALTER TABLE
*/

But if I set trace_lock_table to the OID of one table, it logs the locking
fine.

/*
postgres=# alter system set trace_lock_table = 16401;
ALTER SYSTEM
postgres=# select pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)

postgres=# alter table t detach partition t1;
LOG: LockAcquire: lock [5,16401] AccessExclusiveLock
LOG: LockAcquire: new: lock(0x708b6d12ae78) id(5,16401,0,0,0,1)
grantMask(0) req(0,0,0,0,0,0,0)=0 grant(0,0,0,0,0,0,0)=0 wait(0)
type(AccessExclusiveLock)
LOG: LockAcquire: new: proclock(0x708b6d1da680) lock(0x708b6d12ae78)
method(1) proc(0x708b6d4b8250) hold(0)
LOG: LockCheckConflicts: no conflict: proclock(0x708b6d1da680)
lock(0x708b6d12ae78) method(1) proc(0x708b6d4b8250) hold(0)
LOG: GrantLock: lock(0x708b6d12ae78) id(5,16401,0,0,0,1) grantMask(100)
req(0,0,0,0,0,0,0)=1 grant(0,0,0,0,0,0,0)=1 wait(0)
type(AccessExclusiveLock)
LOG: LockAcquire: lock [5,16401] AccessExclusiveLock
LOG: LockReleaseAll: proclock(0x708b6d1da680) lock(0x708b6d12ae78)
method(1) proc(0x708b6d4b8250) hold(100)
LOG: LockReleaseAll: lock(0x708b6d12ae78) id(5,16401,0,0,0,1)
grantMask(100) req(0,0,0,0,0,0,0)=1 grant(0,0,0,0,0,0,0)=1 wait(0)
type(INVALID)
LOG: UnGrantLock: updated: lock(0x708b6d12ae78) id(5,16401,0,0,0,1)
grantMask(0) req(0,0,0,0,0,0,0)=0 grant(0,0,0,0,0,0,0)=0 wait(0)
type(AccessExclusiveLock)
LOG: UnGrantLock: updated: proclock(0x708b6d1da680) lock(0x708b6d12ae78)
method(1) proc(0x708b6d4b8250) hold(0)
LOG: LockReleaseAll: updated: lock(0x708b6d12ae78) id(5,16401,0,0,0,1)
grantMask(0) req(0,0,0,0,0,0,0)=0 grant(0,0,0,0,0,0,0)=0 wait(0)
type(INVALID)
LOG: CleanUpLock: deleting: proclock(0x708b6d1da680) lock(0x708b6d12ae78)
method(1) proc(0x708b6d4b8250) hold(0)
LOG: CleanUpLock: deleting: lock(0x708b6d12ae78) id(5,16401,0,0,0,1)
grantMask(0) req(0,0,0,0,0,0,0)=0 grant(0,0,0,0,0,0,0)=0 wait(0)
type(INVALID)
ALTER TABLE
*/

--
Thanks,
Vijay

Open to work
Resume - Vijaykumar Jain <https://github.com/cabecada>


From: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: how to use trace_lock_oidmin config correctly
Date: 2024-10-14 11:12:50
Message-ID: CAM+6J96QhAqcZpRkP7NdOCDbvsCd=QB2DXS0afDAK+b3HPvs4Q@mail.gmail.com
Lists: pgsql-general

OK, please ignore.

This flag has to be used along with trace_locks = on.
Now it works.

PostgreSQL: Documentation: 17: 19.17. Developer Options
</docs/current/runtime-config-developer.html#GUC-TRACE-LOCKS>

sorry for the noise.
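For anyone landing on this thread later, the working combination looks roughly like this (the OID threshold is illustrative; both settings only exist on a server compiled with the LOCK_DEBUG macro):

```sql
-- Sketch of the two developer GUCs used together (threshold value illustrative).
-- trace_locks turns lock tracing on; trace_lock_oidmin suppresses output for
-- system tables by ignoring objects below the given OID.
ALTER SYSTEM SET trace_locks = on;
ALTER SYSTEM SET trace_lock_oidmin = 16384;
SELECT pg_reload_conf();
```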

/*
postgres=# show trace_lock_oidmin;
LOG: LockReleaseAll: lockmethod=1
LOG: LockReleaseAll done
trace_lock_oidmin
-------------------
16406
(1 row)

postgres=# show trace_locks;
LOG: LockReleaseAll: lockmethod=1
LOG: LockReleaseAll done
trace_locks
-------------
on
(1 row)

postgres=# select 't'::regclass::oid;
LOG: LockReleaseAll: lockmethod=1
LOG: LockReleaseAll done
oid
-------
16401
(1 row)

postgres=# drop table t;
LOG: LockReleaseAll: lockmethod=1
LOG: LockReleaseAll done
DROP TABLE
*/

On Mon, 14 Oct 2024 at 16:30, Vijaykumar Jain <
vijaykumarjain(dot)github(at)gmail(dot)com> wrote:

> [snip -- full message quoted above]

--
Thanks,
Vijay

Open to work
Resume - Vijaykumar Jain <https://github.com/cabecada>


From: "Wong, Kam Fook (TR Technology)" <kamfook(dot)wong(at)thomsonreuters(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: How to Copy/Load 1 Billion Rows into a Partitioned Table Fast
Date: 2024-10-14 17:59:22
Message-ID: CH0PR03MB6100337240436C3FF72FDE37FE442@CH0PR03MB6100.namprd03.prod.outlook.com
Lists: pgsql-general

I am trying to copy a table (Postgres) that is close to 1 billion rows into a partitioned table (Postgres) within the same DB. What is the fastest way to copy the data? This table has 37 columns, some of which are text data types.

Thank you
Kam Fook Wong

This e-mail is for the sole use of the intended recipient and contains information that may be privileged and/or confidential. If you are not an intended recipient, please notify the sender by return e-mail and delete this e-mail and any attachments. Certain required legal entity disclosures can be accessed on our website: https://www.thomsonreuters.com/en/resources/disclosures.html


From: Durgamahesh Manne <maheshpostgres9(at)gmail(dot)com>
To: "Wong, Kam Fook (TR Technology)" <kamfook(dot)wong(at)thomsonreuters(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to Copy/Load 1 Billion Rows into a Partitioned Table Fast
Date: 2024-10-14 20:43:31
Message-ID: CAJCZkoLbAjY5me5jdF-AzPanyfpY8na5RX2NUGvnaO4YNPExEw@mail.gmail.com
Lists: pgsql-general

On Mon, 14 Oct, 2024, 23:29 Wong, Kam Fook (TR Technology), <
kamfook(dot)wong(at)thomsonreuters(dot)com> wrote:

> I am trying to copy a table (Postgres) that is close to 1 billion rows
> into a Partition table (Postgres) within the same DB. What is the fastest
> way to copy the data? This table has 37 columns where some of which are
> text data types.
>
> Thank you
> Kam Fook Wong
>

Hi Kam Fook Wong,

You can achieve this with the pg_bulkload utility.

Regards,
Durga Mahesh



From: Muhammad Usman Khan <usman(dot)k(at)bitnine(dot)net>
To: "Wong, Kam Fook (TR Technology)" <kamfook(dot)wong(at)thomsonreuters(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to Copy/Load 1 Billion Rows into a Partitioned Table Fast
Date: 2024-10-15 03:39:19
Message-ID: CAPnRvGt12Kf4aBaU8kHKG-0w0BW-o0LnNgw8ZFeY4F+wvzcCWA@mail.gmail.com
Lists: pgsql-general

Hi,
There are many methods to achieve this. One of them is the pg_bulkload
utility, as described in the previous email, but I have always preferred
using Python multiprocessing, which I think is more efficient. Below is code
you can modify as per your requirements:

import multiprocessing

import psycopg2


def insert_partition(date_range):
    # Each worker process opens its own connection; psycopg2 connections
    # cannot be shared across processes.
    conn = psycopg2.connect("dbname=your_db user=your_user password=your_password")
    cur = conn.cursor()
    query = f"""
        INSERT INTO partitioned_table (column1, column2, ...)
        SELECT column1, column2, ...
        FROM source_table
        WHERE partition_key BETWEEN '{date_range[0]}' AND '{date_range[1]}';
    """
    cur.execute(query)
    conn.commit()
    cur.close()
    conn.close()


if __name__ == "__main__":
    ranges = [
        ('2024-01-01', '2024-03-31'),
        ('2024-04-01', '2024-06-30'),
        # Add more ranges as needed
    ]
    pool = multiprocessing.Pool(processes=4)  # Adjust based on CPU cores
    pool.map(insert_partition, ranges)
    pool.close()
    pool.join()

On Mon, 14 Oct 2024 at 22:59, Wong, Kam Fook (TR Technology) <
kamfook(dot)wong(at)thomsonreuters(dot)com> wrote:

> I am trying to copy a table (Postgres) that is close to 1 billion rows
> into a Partition table (Postgres) within the same DB. What is the fastest
> way to copy the data? This table has 37 columns where some of which are
> text data types.
>
> Thank you
> Kam Fook Wong
>


From: Juan Rodrigo Alejandro Burgos Mella <rodrigoburgosmella(at)gmail(dot)com>
To: "Wong, Kam Fook (TR Technology)" <kamfook(dot)wong(at)thomsonreuters(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to Copy/Load 1 Billion Rows into a Partitioned Table Fast
Date: 2024-10-15 04:27:57
Message-ID: CAHbZ42wy7dNouZkmOOc8-pbZ++W+7Ez2bSPkcsM3QD9JHVtNyg@mail.gmail.com
Lists: pgsql-general

Hi Wong,

On one occasion I had to upload 600 million records, and the most viable
and safest option was to generate plans and load them through a massively
parallelized process (for each process we audited that everything was
correct).
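A runnable sketch of that load-then-audit pattern (using stdlib sqlite3 as a stand-in for Postgres; every table and column name here is invented for illustration):

```python
import sqlite3

def load_batch(conn, rows, table="target"):
    """Insert one batch, then audit that the table grew by exactly len(rows)."""
    before = conn.execute(f"SELECT count(*) FROM {table}").fetchone()[0]
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    conn.commit()
    after = conn.execute(f"SELECT count(*) FROM {table}").fetchone()[0]
    # Per-batch audit: fail loudly rather than silently losing rows.
    assert after - before == len(rows), "audit failed: row count mismatch"
    return after

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, payload TEXT)")

# Three batches of three rows each, loaded and audited one by one.
batches = [[(i, f"row{i}") for i in range(s, s + 3)] for s in (0, 3, 6)]
total = 0
for batch in batches:
    total = load_batch(conn, batch)
print(total)  # 9
```

In the real massively parallel version, each batch would run in its own process with its own connection, but the audit step stays the same.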

Atte.
JRBM

El lun, 14 oct 2024 a las 14:59, Wong, Kam Fook (TR Technology) (<
kamfook(dot)wong(at)thomsonreuters(dot)com>) escribió:

> I am trying to copy a table (Postgres) that is close to 1 billion rows
> into a Partition table (Postgres) within the same DB. What is the fastest
> way to copy the data? This table has 37 columns where some of which are
> text data types.
>
> Thank you
> Kam Fook Wong
>


From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: "Wong, Kam Fook (TR Technology)" <kamfook(dot)wong(at)thomsonreuters(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to Copy/Load 1 Billion Rows into a Partitioned Table Fast
Date: 2024-10-15 04:57:46
Message-ID: CAApHDvre3nuCLA4DVfYSW_EAiuV9FKaZV6k7_dPMSa9d6_Mg2Q@mail.gmail.com
Lists: pgsql-general

On Tue, 15 Oct 2024 at 06:59, Wong, Kam Fook (TR Technology)
<kamfook(dot)wong(at)thomsonreuters(dot)com> wrote:
> I am trying to copy a table (Postgres) that is close to 1 billion rows into a Partition table (Postgres) within the same DB. What is the fastest way to copy the data? This table has 37 columns where some of which are text data types.

Is the purpose of this question because you're partitioning an
existing table? If so, you might want to consider if speed is the
biggest consideration to the requirements. It is possible, for
example, to partition a table "online" by using table inheritance as
an intermediate way to partition the table and migrate the rows in
smaller batches into inheritance child tables with CHECK constraints
backing up the partition constraint. You can use a CTE with a DELETE
.. WHERE <clause to some small batch of rows> RETURNING with an INSERT
INTO new_table SELECT * FROM cte;. Once the inheritance parent table
is empty, you can then consider rearranging the inheritance hierarchy
into a partitioned table and its partitions. The CHECK constraint will
allow the tables to be ATTACHed as partitions to a new partitioned
table without having to scan each partition to ensure no rows violate
the partition constraint. If done correctly, the only blocking
operation done is some DDL which includes renaming a table and
attaching all the partitions. All of that should be metadata-only
operations. You'll want to rehearse the migration a few times away
from production to help ensure it'll run smoothly on the day.
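As a rough illustration of the batched-move step (every table name and the batch predicate below are made up for the sketch, not taken from the thread):

```sql
-- Move one small batch from the inheritance parent into a child table whose
-- CHECK constraint matches the future partition bound (names illustrative).
WITH moved AS (
    DELETE FROM ONLY original_table
    WHERE id >= 0 AND id < 10000      -- "some small batch of rows"
    RETURNING *
)
INSERT INTO child_2024_q1 SELECT * FROM moved;
```

Repeated per batch, this keeps each transaction short; once the parent is empty, the CHECK constraints let the children be attached as partitions without a validation scan.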

I'm not familiar with pg_bulkload so can't comment on the other
suggestions, however, I'd be surprised if exporting the data out of
and back into PostgreSQL would be faster than having it remain inside
PostgreSQL. Not exporting/importing means you don't need to call
output and input functions for every row and column. If you didn't
want to go down the inheritance table as an intermediate step, then
you might find it's quite fast to start up a series of parallel jobs
to INSERT INTO partition_name SELECT * FROM original_table WHERE <rows
for this partition>;

David