time taking deletion on large tables

From: Atul Kumar <akumar14871(at)gmail(dot)com>
To: pgsql-admin(at)lists(dot)postgresql(dot)org, pgsql-in-general(at)lists(dot)postgresql(dot)org, pgsql-performance(at)lists(dot)postgresql(dot)org, pgsql-docs(at)lists(dot)postgresql(dot)org, pgadmin-support(at)lists(dot)postgresql(dot)org
Subject: time taking deletion on large tables
Date: 2020-12-03 08:49:24
Message-ID: CA+ONtZ541OzW3dzrUKcXscTYAK8G_3DEtUhFz+i2+HsD3-5ffQ@mail.gmail.com
Lists: pgsql-admin pgsql-docs pgsql-in-general pgsql-performance

Hi,

The feed_posts table has over 50 million rows.

I am deleting all rows of a certain type that are over 60 days old.

When I try a delete like this, it hangs for an entire day, so I
have to kill it with pg_terminate_backend(pid).

DELETE FROM feed_posts
WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
AND created_at > '2020-05-11 00:00:00'
AND created_at < '2020-05-12 00:00:00';

So I need help figuring out how to do large deletes on a
production database during normal hours.

The EXPLAIN plan is given below:

"Delete on feed_posts (cost=1156.57..195748.88 rows=15534 width=6)"
" -> Bitmap Heap Scan on feed_posts (cost=1156.57..195748.88
rows=15534 width=6)"
" Recheck Cond: ((created_at >= '2020-05-11 00:00:00'::timestamp
without time zone) AND (created_at <= '2020-05-12 00:00:00'::timestamp
without time zone))"
" Filter: (feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'::uuid)"
" -> Bitmap Index Scan on feed_posts_created_at (cost=0.00..1152.68
rows=54812 width=0)"
" Index Cond: ((created_at >= '2020-05-11 00:00:00'::timestamp without
time zone) AND (created_at <= '2020-05-12 00:00:00'::timestamp without
time zone))"

Please help me with deleting these rows. Do I need to change anything
in the Postgres configuration, or in the table structure?

Regards,
Atul


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Atul Kumar <akumar14871(at)gmail(dot)com>
Cc: pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: time taking deletion on large tables
Date: 2020-12-03 14:51:26
Message-ID: 20201203145126.GA4726@depesz.com
Lists: pgsql-admin pgsql-docs pgsql-in-general pgsql-performance

On Thu, Dec 03, 2020 at 08:15:06PM +0530, Atul Kumar wrote:
> Hi,
>
> The feed_posts table has over 50 million rows.
>
> I am deleting all rows of a certain type that are over 60 days old.

The common solution to this problem is to partition the table and then,
instead of deleting rows, drop old partitions.
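
A minimal sketch of that approach, assuming declarative partitioning
(PostgreSQL 10+); the new table and partition names below are illustrative:

create table feed_posts_new (
    feed_post_id       bigint,
    feed_definition_id uuid,
    created_at         timestamp not null
) partition by range (created_at);

-- One partition per month, for example:
create table feed_posts_2020_05 partition of feed_posts_new
    for values from ('2020-05-01') to ('2020-06-01');

-- Purging a month of data is then a cheap metadata operation
-- instead of millions of row deletes:
drop table feed_posts_2020_05;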

depesz


From: Ravikumar Reddy <urravikumarreddy(at)gmail(dot)com>
To: Atul Kumar <akumar14871(at)gmail(dot)com>
Cc: pgsql-performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: time taking deletion on large tables
Date: 2020-12-03 15:13:57
Message-ID: CANMO9LBud=vzUc7Pa4B7Ebe3QvMPJi=bQtqtwTPCNFVEitsH+Q@mail.gmail.com
Lists: pgsql-admin pgsql-docs pgsql-in-general pgsql-performance

Hi Atul,

Please try the code below. Execute all the statements in one transaction.

-- Copy the rows to keep into a scratch table, empty the original, reload:
select * into new_table from old_table where type = 'abcz';
truncate table old_table;
insert into old_table select * from new_table;

On Thu, Dec 3, 2020 at 8:16 PM Atul Kumar <akumar14871(at)gmail(dot)com> wrote:

> Hi,
>
> The feed_posts table has over 50 million rows.
>
> I am deleting all rows of a certain type that are over 60 days old.
>
> When I try a delete like this, it hangs for an entire day, so I
> have to kill it with pg_terminate_backend(pid).
>
> DELETE FROM feed_posts
> WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
> AND created_at > '2020-05-11 00:00:00'
> AND created_at < '2020-05-12 00:00:00';
>
> So I need help figuring out how to do large deletes on a
> production database during normal hours.

--
Regards,
Ravikumar S
Ph: 8106741263


From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Ravikumar Reddy <urravikumarreddy(at)gmail(dot)com>
Cc: Atul Kumar <akumar14871(at)gmail(dot)com>, pgsql-performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: time taking deletion on large tables
Date: 2020-12-03 15:36:12
Message-ID: 20201203153612.GM24052@telsasoft.com
Lists: pgsql-admin pgsql-docs pgsql-in-general pgsql-performance

On Thu, Dec 03, 2020 at 08:43:57PM +0530, Ravikumar Reddy wrote:
> Please try the code below. Execute all the statements in one transaction.
>
> select * into new_table from old_table where type = 'abcz';
> truncate table old_table;
> insert into old_table select * from new_table;

This looks like advice for when most of the rows are being deleted, but I don't
think that's true here. It'd need to LOCK old_table first, right? Also,
TRUNCATE isn't MVCC-safe.

Atul: What server version? Do you have an index on feed_definition_id?
https://wiki.postgresql.org/wiki/Slow_Query_Questions

If EXPLAIN (ANALYZE, BUFFERS) runs in a reasonable time for the
corresponding SELECT, include its output.
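
For instance, using the same predicates as the problem DELETE:

explain (analyze, buffers)
select * from feed_posts
where feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
and created_at > '2020-05-11 00:00:00'
and created_at < '2020-05-12 00:00:00';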

On Thu, Dec 3, 2020 at 8:16 PM Atul Kumar <akumar14871(at)gmail(dot)com> wrote:
> The feed_posts table has over 50 million rows.
>
> I am deleting all rows of a certain type that are over 60 days old.
>
> When I try a delete like this, it hangs for an entire day, so I
> have to kill it with pg_terminate_backend(pid).
>
> DELETE FROM feed_posts
> WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
> AND created_at > '2020-05-11 00:00:00'
> AND created_at < '2020-05-12 00:00:00';


From: Ron <ronljohnsonjr(at)gmail(dot)com>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: time taking deletion on large tables
Date: 2020-12-03 15:48:46
Message-ID: 5f37b65c-4b3a-9f5f-85fb-2fc3c7a6f530@gmail.com
Lists: pgsql-admin pgsql-docs pgsql-in-general pgsql-performance

On 12/3/20 8:45 AM, Atul Kumar wrote:
> Hi,
>
> The feed_posts table has over 50 million rows.
>
> I am deleting all rows of a certain type that are over 60 days old.
>
> When I try a delete like this, it hangs for an entire day, so I
> have to kill it with pg_terminate_backend(pid).
>
> DELETE FROM feed_posts
> WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
> AND created_at > '2020-05-11 00:00:00'
> AND created_at < '2020-05-12 00:00:00';
>
> So I need help figuring out how to do large deletes on a
> production database during normal hours.

Presumably there is an index on created_at?

What about feed_definition_id?
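
If not, a composite index covering both predicates is one option (a
sketch; whether it pays off depends on the workload):

create index concurrently on feed_posts (feed_definition_id, created_at);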

> The EXPLAIN plan is given below:
>
> Delete on feed_posts  (cost=1156.57..195748.88 rows=15534 width=6)
>   ->  Bitmap Heap Scan on feed_posts  (cost=1156.57..195748.88 rows=15534 width=6)
>         Recheck Cond: ((created_at >= '2020-05-11 00:00:00'::timestamp without time zone) AND (created_at <= '2020-05-12 00:00:00'::timestamp without time zone))
>         Filter: (feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'::uuid)
>         ->  Bitmap Index Scan on feed_posts_created_at  (cost=0.00..1152.68 rows=54812 width=0)
>               Index Cond: ((created_at >= '2020-05-11 00:00:00'::timestamp without time zone) AND (created_at <= '2020-05-12 00:00:00'::timestamp without time zone))

Have you recently analyzed the table?

--
Angular momentum makes the world go 'round.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Ravikumar Reddy <urravikumarreddy(at)gmail(dot)com>, Atul Kumar <akumar14871(at)gmail(dot)com>, pgsql-performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: time taking deletion on large tables
Date: 2020-12-03 16:16:16
Message-ID: 1735185.1607012176@sss.pgh.pa.us
Lists: pgsql-admin pgsql-docs pgsql-in-general pgsql-performance

Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
> On Thu, Dec 03, 2020 at 08:43:57PM +0530, Ravikumar Reddy wrote:
>> When I try a delete like this, it hangs for an entire day, so I
>> have to kill it with pg_terminate_backend(pid).
>>
>> DELETE FROM feed_posts
>> WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
>> AND created_at > '2020-05-11 00:00:00'
>> AND created_at < '2020-05-12 00:00:00';

90% of the "delete takes forever" complaints that we hear trace down to
having a foreign key reference to the deletion-target table that's not
backed by an index on the referencing column. Then you end up getting
a seqscan on the referencing table to look for rows referencing a
row-to-be-deleted. And then another one for the next row. Etc.

You could try "explain analyze" on a query deleting just a single
one of these rows and see if an RI enforcement trigger is what's
eating the time.
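
A minimal way to check, assuming feed_post_id is the table's primary key
(the literal id and the feed_post_comments table below are purely
illustrative):

begin;
explain analyze
delete from feed_posts where feed_post_id = 12345;
rollback;  -- undo the test delete

-- If the output spends most of its time under a line like
-- "Trigger for constraint <fk_name>", index the referencing column
-- (CREATE INDEX CONCURRENTLY cannot run inside a transaction block):
create index concurrently on feed_post_comments (feed_post_id);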

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Ravikumar Reddy <urravikumarreddy(at)gmail(dot)com>, Atul Kumar <akumar14871(at)gmail(dot)com>, pgsql-performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: time taking deletion on large tables
Date: 2020-12-03 17:00:44
Message-ID: ff79272c-eaf0-23dc-ae35-086995bc56a0@dunslane.net
Lists: pgsql-admin pgsql-docs pgsql-in-general pgsql-performance


On 12/3/20 11:16 AM, Tom Lane wrote:
> 90% of the "delete takes forever" complaints that we hear trace down to
> having a foreign key reference to the deletion-target table that's not
> backed by an index on the referencing column. Then you end up getting
> a seqscan on the referencing table to look for rows referencing a
> row-to-be-deleted. And then another one for the next row. Etc.
>
> You could try "explain analyze" on a query deleting just a single
> one of these rows and see if an RI enforcement trigger is what's
> eating the time.

Yeah. IIRC some other RDBMSs actually create such an index if it
doesn't already exist. Maybe we should emit a warning when setting up an
FK constraint if the referencing columns aren't usefully indexed.
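
A rough catalog query in that spirit (a heuristic sketch: it flags FK
constraints whose columns aren't covered by the leading columns of any
index on the referencing table):

select c.conrelid::regclass as referencing_table,
       c.conname            as fk_name
from pg_constraint c
where c.contype = 'f'
  and not exists (
        select 1
        from pg_index i
        where i.indrelid = c.conrelid
          -- leading index columns must include all FK columns
          and (i.indkey::int2[])[0:cardinality(c.conkey)-1] @> c.conkey
      );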

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


From: Rui DeSousa <rui(at)crazybean(dot)net>
To: Atul Kumar <akumar14871(at)gmail(dot)com>
Cc: pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: time taking deletion on large tables
Date: 2020-12-03 18:45:33
Message-ID: 35E33498-8CD9-4D81-A641-42343342D337@crazybean.net
Lists: pgsql-admin pgsql-docs pgsql-in-general pgsql-performance

> On Dec 3, 2020, at 9:45 AM, Atul Kumar <akumar14871(at)gmail(dot)com> wrote:
>
> The feed_posts table has over 50 million rows.
>
> I am deleting all rows of a certain type that are over 60 days old.
>
> When I try a delete like this, it hangs for an entire day, so I
> have to kill it with pg_terminate_backend(pid).

Delete the records in batches. I have used this approach many times
successfully on large, highly active tables in live production systems.

You'll have to find the right batch size for your dataset, one that keeps
each run short (e.g., 30 seconds). Then call the function repeatedly from
a script; I've used a Perl script with the DBI module for this.

For example:

create or replace function purge_feed_post(_purge_date date, _limit int default 5000)
returns int
as
$$
declare
    _rowcnt int;
begin
    create temp table if not exists purge_feed_post_set (
        feed_post_id int
    );

    /* Identify records to be purged */
    insert into purge_feed_post_set (
        feed_post_id
    )
    select feed_post_id
    from feed_posts
    where created_at < _purge_date
    order by created_at
    limit _limit;

    /* Remove old records */
    delete from feed_posts
    using purge_feed_post_set
    where feed_posts.feed_post_id = purge_feed_post_set.feed_post_id;

    get diagnostics _rowcnt = ROW_COUNT;

    delete from purge_feed_post_set;

    return _rowcnt;
end;
$$ language plpgsql
set search_path = public;
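
An illustrative way to drive it from psql (each call works through one
batch in its own transaction; stop once it returns 0):

select purge_feed_post((now() - interval '60 days')::date, 5000);
\watch 2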