Lists: | pgsql-hackers |
---|
From: | by Yang <mobile(dot)yang(at)outlook(dot)com> |
---|---|
To: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | memory leak in pgoutput |
Date: | 2024-11-16 08:37:07 |
Message-ID: | DM3PR84MB3442E14B340E553313B5C816E3252@DM3PR84MB3442.NAMPRD84.PROD.OUTLOOK.COM |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hello,
I recently noticed a unusually large memory consumption in pgoutput where
"RelationSyncCache" is maintaining approximately 3 GB of memory while
handling 15,000 tables.
Upon investigating the cause, I found that "tts_tupleDescriptor" in both
"old_slot" and "new_slot" wouldn't be freed before the reusing the entry.
The refcount of the tuple descriptor is initialized as -1 in init_tuple_slot(),
which means it will not be refcounted. So, when cleaning the outdated slots,
ExecDropSingleTupleTableSlot() would omit tuple descriptors.
To address this issue, calling FreeTupleDesc() to release the tuple descriptor
before ExecDropSingleTupleTableSlot() might be a suitable solution. However,
I am uncertain whether this approach is appropriate.
Reproduct
=========
It's easy to reproduce the issue with sysbench:
1. Create emtpy tables on primary.
```
sysbench oltp_read_only --db-driver=pgsql \
--pgsql-port=5432 \
--pgsql-db=postgres \
--pgsql-user=postgres \
--tables=15000 --table_size=0 \
--report-interval=1 \
--threads=10 prepare
```
2. Create a standby and promote it.
3. Create publication on primary and subscription on standby.
```
CREATE PUBLICATION pub1 FOR ALL TABLES;
CREATE SUBSCRIPTION sub1
CONNECTION 'port=5432 user=postgres dbname=postgres'
PUBLICATION pub1
WITH (enabled, create_slot, slot_name='pub1_to_sub1', copy_data=false);
```
4. Bench on primary.
```
sysbench oltp_write_only --db-driver=pgsql \
--pgsql-port=5432 \
--pgsql-db=postgres \
--pgsql-user=postgres \
--tables=15000 --table_size=100 \
--report-interval=1 \
--threads=10 run \
--time=180
```
Tested in PostgreSQL 17, the memory consumption of walsender is 804MB after the
benchmark, and it decreases to 441MB after applying the patch.
Thanks for your feedback.
Kind Regards,
Boyu Yang
Attachment | Content-Type | Size |
---|---|---|
0001-Fix-memory-leak-in-pgoutput.patch | application/octet-stream | 1.2 KB |
From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
---|---|
To: | by Yang <mobile(dot)yang(at)outlook(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | RE: memory leak in pgoutput |
Date: | 2024-11-18 02:53:52 |
Message-ID: | OS0PR01MB5716606345806CA8F43B5EF794272@OS0PR01MB5716.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Saturday, November 16, 2024 4:37 PM by Yang <mobile(dot)yang(at)outlook(dot)com> wrote:
> I recently noticed a unusually large memory consumption in pgoutput where
> "RelationSyncCache" is maintaining approximately 3 GB of memory while
> handling 15,000 tables.
>
> Upon investigating the cause, I found that "tts_tupleDescriptor" in both
> "old_slot" and "new_slot" wouldn't be freed before the reusing the entry.
> The refcount of the tuple descriptor is initialized as -1 in init_tuple_slot(),
> which means it will not be refcounted. So, when cleaning the outdated slots,
> ExecDropSingleTupleTableSlot() would omit tuple descriptors.
Thanks for reporting the issue. I also confirmed that the bug exists and
your analysis is correct.
>
> To address this issue, calling FreeTupleDesc() to release the tuple descriptor
> before ExecDropSingleTupleTableSlot() might be a suitable solution. However,
> I am uncertain whether this approach is appropriate.
I think the proposed change is reasonable.
I considered another approach which is to mark tupledesc reference-counted
instead. But to make that work, we lack a global resource owner which is
required by IncrTupleDescRefCount/DecrTupleDescRefCount. (pgoutput doesn't
create its own resowner, only the toptransaction's resowner is available.)
Also, pgoutput doesn't reference the tupledesc in other places so it doesn't
seems useful to mark them reference-counted.
But I think there is an issue in the attached patch:
+ FreeTupleDesc(entry->old_slot->tts_tupleDescriptor);
ExecDropSingleTupleTableSlot(entry->old_slot);
Here, after freeing the tupledesc, the ExecDropSingleTupleTableSlot will still
access the freed tupledesc->tdrefcount which is an illegal memory access.
I think we can do something like below instead:
+ TupleDesc desc = entry->old_slot->tts_tupleDescriptor;
+
+ Assert(desc->tdrefcount == -1);
+
ExecDropSingleTupleTableSlot(entry->old_slot);
+ FreeTupleDesc(desc);
Best Regards,
Hou zj
From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
Cc: | by Yang <mobile(dot)yang(at)outlook(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: memory leak in pgoutput |
Date: | 2024-11-18 03:43:27 |
Message-ID: | Zzq334VG-l2vCAw5@paquier.xyz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Mon, Nov 18, 2024 at 02:53:52AM +0000, Zhijie Hou (Fujitsu) wrote:
> But I think there is an issue in the attached patch:
>
> + FreeTupleDesc(entry->old_slot->tts_tupleDescriptor);
> ExecDropSingleTupleTableSlot(entry->old_slot);
>
> Here, after freeing the tupledesc, the ExecDropSingleTupleTableSlot will still
> access the freed tupledesc->tdrefcount which is an illegal memory access.
>
> I think we can do something like below instead:
>
> + TupleDesc desc = entry->old_slot->tts_tupleDescriptor;
> +
> + Assert(desc->tdrefcount == -1);
> +
> ExecDropSingleTupleTableSlot(entry->old_slot);
> + FreeTupleDesc(desc);
Yep, obviously.
I was first surprised that the DecrTupleDescRefCount() done in
ExecDropSingleTupleTableSlot() would not be enough to free the
TupleDesc.
We do some allocations for dynamically-allocated resources like what
pgoutput does in the SRF code, if I recall correctly, as these need
are a problem across multiple calls and the query-level memory context
would not do this cleanup..
--
Michael
From: | by Yang <mobile(dot)yang(at)outlook(dot)com> |
---|---|
To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: memory leak in pgoutput |
Date: | 2024-11-18 07:00:57 |
Message-ID: | DM3PR84MB3442C5736F1207066CFBCC88E3272@DM3PR84MB3442.NAMPRD84.PROD.OUTLOOK.COM |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
> Here, after freeing the tupledesc, the ExecDropSingleTupleTableSlot will still
> access the freed tupledesc->tdrefcount which is an illegal memory access.
Yes, I overlooked that.
> I think we can do something like below instead:
>
> + TupleDesc desc = entry->old_slot->tts_tupleDescriptor;
> +
> + Assert(desc->tdrefcount == -1);
> +
> ExecDropSingleTupleTableSlot(entry->old_slot);
> + FreeTupleDesc(desc);
It seems a bit odd because "entry->old_slot->tts_tupleDescriptor" is accessed
after "entry->old_slot" has been freed. I think we can avoid this by assigning
"desc" to NULL before ExecDropSingleTupleTableSlot().
```
+ TupleDesc desc = entry->old_slot->tts_tupleDescriptor;
+
+ Assert(desc->tdrefcount == -1);
+
+ FreeTupleDesc(desc);
+ desc = NULL;
ExecDropSingleTupleTableSlot(entry->old_slot);
```
By the way, this issue is introduced in 52e4f0cd472d39d. Therefore, we may need
to backport the patch to v15.
Best Regards,
Boyu Yang
From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | by Yang <mobile(dot)yang(at)outlook(dot)com> |
Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: memory leak in pgoutput |
Date: | 2024-11-18 07:58:38 |
Message-ID: | ZzrzrpXJTgYWM_eg@paquier.xyz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Mon, Nov 18, 2024 at 07:00:57AM +0000, by Yang wrote:
> By the way, this issue is introduced in 52e4f0cd472d39d. Therefore, we may need
> to backport the patch to v15.
Yes. Note that nothing can happen on stable branches for a few days
as a release is planned for this week. See here:
/message-id/cd54868c-3162-41f9-bba6-16e6b1449ff3@postgresql.org
--
Michael
From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | by Yang <mobile(dot)yang(at)outlook(dot)com> |
Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: memory leak in pgoutput |
Date: | 2024-11-19 00:21:17 |
Message-ID: | ZzvZ_SwGdEWMFYWZ@paquier.xyz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Mon, Nov 18, 2024 at 04:58:38PM +0900, Michael Paquier wrote:
> On Mon, Nov 18, 2024 at 07:00:57AM +0000, by Yang wrote:
>> By the way, this issue is introduced in 52e4f0cd472d39d. Therefore, we may need
>> to backport the patch to v15.
>
> Yes. Note that nothing can happen on stable branches for a few days
> as a release is planned for this week. See here:
> /message-id/cd54868c-3162-41f9-bba6-16e6b1449ff3@postgresql.org
By the way, if possible, could you send an updated version of the
patch to show what you have in mind?
--
Michael
From: | by Yang <mobile(dot)yang(at)outlook(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
Subject: | Re: memory leak in pgoutput |
Date: | 2024-11-19 07:08:26 |
Message-ID: | DM3PR84MB344267C77FE5CF0612E59C6FE3202@DM3PR84MB3442.NAMPRD84.PROD.OUTLOOK.COM |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
> By the way, if possible, could you send an updated version of the
> patch to show what you have in mind?
Yeah, here is the new patch:
I have verifed that this patch works for REL_[15-17]_STABLE and master.
The memory consumption of "logical replication cache context" remains
consistently at 112 MB during the benchmark mentioned above.
> sysbench oltp_write_only --db-driver=pgsql \
> --pgsql-port=5432 \
> --pgsql-db=postgres \
> --pgsql-user=postgres \
> --tables=15000 --table_size=100 \
> --report-interval=1 \
> --threads=10 run \
> --time=180
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Fix-memory-leak-in-pgoutput.patch | application/octet-stream | 1.2 KB |
From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | by Yang <mobile(dot)yang(at)outlook(dot)com> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
Subject: | Re: memory leak in pgoutput |
Date: | 2024-11-20 00:39:55 |
Message-ID: | Zz0v26PbMkukcizA@paquier.xyz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Tue, Nov 19, 2024 at 07:08:26AM +0000, by Yang wrote:
> I have verifed that this patch works for REL_[15-17]_STABLE and master.
> The memory consumption of "logical replication cache context" remains
> consistently at 112 MB during the benchmark mentioned above.
You should be more careful with the amount of tests you are doing
here. This fails while waiting for some changes to be streamed when
creating a subscription:
cd src/test/subscription/ && PROVE_TESTS=t/017_stream_ddl.pl make check
--
Michael
From: | by Yang <mobile(dot)yang(at)outlook(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: memory leak in pgoutput |
Date: | 2024-11-20 10:41:50 |
Message-ID: | DM3PR84MB3442BCD53125120366645CF5E3212@DM3PR84MB3442.NAMPRD84.PROD.OUTLOOK.COM |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
> You should be more careful with the amount of tests you are doing
> here. This fails while waiting for some changes to be streamed when
> creating a subscription:
> cd src/test/subscription/ && PROVE_TESTS=t/017_stream_ddl.pl make check
I apologize for the obvious error in the previous patch. I have corrected it
in the new patch(v3) and pass the regression testing.
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Fix-memory-leak-in-pgoutput.patch | application/octet-stream | 1.2 KB |
From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | by Yang <mobile(dot)yang(at)outlook(dot)com> |
Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: memory leak in pgoutput |
Date: | 2024-11-21 06:25:09 |
Message-ID: | Zz7SRcBXxhOYngOr@paquier.xyz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, Nov 20, 2024 at 10:41:50AM +0000, by Yang wrote:
> I apologize for the obvious error in the previous patch. I have corrected it
> in the new patch(v3) and pass the regression testing.
It took me quite a bit of time to evaluate the amount of the damage,
and indeed sysbench has been quite good at showing the problem. It is
not that much though depending on the number of tables. With map
laptop and 500 tables, the cleanup of the slots showed up in sudden
burts that increased the memory footprint of the WAL sender.
I think that there is at least one more leak, which is even smaller
than the one you have reported here. It takes a longer run time to
show up with sysbench, but it's here with the WAL sender memory
growing slowly over time. We should be much more careful with this
area of the code in terms of memory handling. Perhaps with a broader
memory context associated to the data of the cached entries, reset
each time an entry is validated? This current coding style is quite
dangerous to rely on.
Anyway, this patch fixes a portion of the damage and Hou's method was
a bit cleaner, so I have used it and applied it down to v15.
--
Michael