From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Noah Misch <noah(at)leadboat(dot)com>, Jeff Davis <jdavis(at)postgresql(dot)org>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: pgsql: Add contrib/pg_walinspect. |
Date: | 2022-04-27 10:22:06 |
Message-ID: | CALj2ACWLV+g=a7ojcLo-R52HeAUZtGdG6XPG6nTFoo7S7e17fw@mail.gmail.com |
Lists: | pgsql-committers pgsql-hackers |
On Wed, Apr 27, 2022 at 1:47 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > >
> > > I've now done several runs with your patch and not seen the test failure.
> > > However, I think we ought to rethink this API a bit rather than just
> > > apply the patch as-is. Even if it were documented, relying on
> > > errormsg = NULL to mean something doesn't seem like a great plan.
> >
> > Sorry for being late in the game, occupied with other stuff.
> >
> > How about using private_data of XLogReaderState for
> > read_local_xlog_page_no_wait, something like this?
> >
> > typedef struct ReadLocalXLogPageNoWaitPrivate
> > {
> >     bool end_of_wal;
> > } ReadLocalXLogPageNoWaitPrivate;
> >
> > In read_local_xlog_page_no_wait:
> >
> >     /* If asked, let's not wait for future WAL. */
> >     if (!wait_for_wal)
> >     {
> >         private_data->end_of_wal = true;
> >         break;
> >     }
> >
> > /*
> > * Opaque data for callbacks to use. Not used by XLogReader.
> > */
> > void *private_data;
>
> I found an easy way to reproduce this consistently (I think on any server):
>
> I basically generated a huge WAL record (I used a fun extension that I
> wrote, https://github.com/BRupireddy/pg_synthesize_wal, but one can
> use pg_logical_emit_message as well):
> session 1:
> select * from pg_synthesize_wal_record(1*1024*1024); --> generate 1 MB
> of WAL record first and make a note of the output lsn (lsn1)
>
> session 2:
> select * from pg_get_wal_records_info_till_end_of_wal(lsn1);
> \watch 1
>
> session 1:
> select * from pg_synthesize_wal_record(1000*1024*1024); --> generate
> ~1 GB of WAL record and we see ERROR: could not read WAL at XXXXX in
> session 2.
>
> Delay the checkpoint (set checkpoint_timeout to 1hr) so that the WAL
> files are not recycled while we run the pg_walinspect functions; no other
> changes are required from the default initdb settings on the server.
>
> And, Thomas's patch fixes the issue.
Here's the v2 patch (on top of Thomas's v1 at [1]), which uses private_data
to set the end-of-WAL flag. Please have a look at it.
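
To make the idea concrete, here is a minimal standalone sketch of the
pattern, not the patch itself. MockLsn, MockReaderState,
mock_read_page_no_wait and mock_read_record are made-up stand-ins (they are
not PostgreSQL APIs); only the private_data field and the
ReadLocalXLogPageNoWaitPrivate struct mirror what is proposed above. The
callback sets the flag when it would otherwise have to wait for future WAL,
and the caller checks the flag to tell "end of WAL" apart from a real read
failure, instead of relying on errormsg being NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for XLogRecPtr and XLogReaderState. */
typedef uint64_t MockLsn;

typedef struct MockReaderState
{
    void *private_data;     /* opaque data for callbacks to use */
} MockReaderState;

/* Private data the no-wait page-read callback uses to report end of WAL. */
typedef struct ReadLocalXLogPageNoWaitPrivate
{
    bool end_of_wal;
} ReadLocalXLogPageNoWaitPrivate;

typedef enum
{
    MOCK_READ_OK,
    MOCK_READ_END_OF_WAL,
    MOCK_READ_ERROR
} MockReadResult;

/*
 * Mock page-read callback: returns a byte count on success, -1 otherwise.
 * "corrupt" simulates a genuine read failure (an assumption made only for
 * this illustration).
 */
static int
mock_read_page_no_wait(MockReaderState *state, MockLsn target,
                       MockLsn flushed, bool corrupt)
{
    ReadLocalXLogPageNoWaitPrivate *priv = state->private_data;

    assert(priv != NULL);

    if (corrupt)
        return -1;              /* real failure, flag stays false */

    if (target >= flushed)
    {
        /* Asked not to wait for future WAL: record that and give up. */
        priv->end_of_wal = true;
        return -1;
    }

    return 8192;                /* pretend one full page was read */
}

/* Caller side: consult the flag to classify a failed read. */
static MockReadResult
mock_read_record(MockLsn target, MockLsn flushed, bool corrupt)
{
    ReadLocalXLogPageNoWaitPrivate priv = {.end_of_wal = false};
    MockReaderState state = {.private_data = &priv};

    if (mock_read_page_no_wait(&state, target, flushed, corrupt) >= 0)
        return MOCK_READ_OK;

    return priv.end_of_wal ? MOCK_READ_END_OF_WAL : MOCK_READ_ERROR;
}
```

With this shape, a caller such as the pg_walinspect functions could stop
quietly on MOCK_READ_END_OF_WAL and raise an error only on MOCK_READ_ERROR.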
[1] /message-id/CA%2BhUKGLtswFk9ZO3WMOqnDkGs6dK5kCdQK9gxJm0N8gip5cpiA%40mail.gmail.com
Regards,
Bharath Rupireddy.
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Fix-pg_walinspect-race-against-flush-LSN.patch | application/octet-stream | 7.3 KB |