From: | Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: block-level incremental backup |
Date: | 2019-09-09 11:30:33 |
Message-ID: | CAM2+6=VHTQZjL9=f9YmjiLYyyhbJNwDQCL8pKSZR8ZaZSa+AHQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Aug 27, 2019 at 11:59 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Aug 16, 2019 at 6:23 AM Jeevan Chalke
> <jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
> > [ patches ]
>
> Reviewing 0002 and 0003:
>
> - Commit message for 0003 claims magic number and checksum are 0, but
> that (fortunately) doesn't seem to be the case.
>
Oops, updated commit message.
>
> - looks_like_rel_name actually checks whether it looks like a
> *non-temporary* relation name; suggest adjusting the function name.
>
> - The names do_full_backup and do_incremental_backup are quite
> confusing because you're really talking about what to do with one
> file. I suggest sendCompleteFile() and sendPartialFile().
>
Changed function names.
>
> - Is there any good reason to have 'refptr' as a global variable, or
> could we just pass the LSN around via function arguments? I know it's
> just mimicking startptr, but storing startptr in a global variable
> doesn't seem like a great idea either, so if it's not too annoying,
> let's pass it down via function arguments instead. Also, refptr is a
> crappy name (even worse than startptr); whether we end up with a
> global variable or a bunch of local variables, let's make the name(s)
> clear and unambiguous, like incremental_reference_lsn. Yeah, I know
> that's long, but I still think it's better than being unclear.
>
Renamed variable.
However, I have kept that as global only as it needs many functions to
change their signature, like, sendFile(), sendDir(), sendTablspeace() etc.
> - do_incremental_backup looks like it can never report an error from
> fread(), which is bad. But I see that this is just copied from the
> existing code which has the same problem, so I started a separate
> thread about that.
>
> - I think that passing cnt and blkindex to verify_page_checksum()
> doesn't look very good from an abstraction point of view. Granted,
> the existing code isn't great either, but I think this makes the
> problem worse. I suggest passing "int backup_distance" to this
> function, computed as cnt - BLCKSZ * blkindex. Then, you can
> fseek(-backup_distance), fread(BLCKSZ), and then fseek(backup_distance
> - BLCKSZ).
>
Yep. Done these changes in the refactoring patch.
>
> - While I generally support the use of while and for loops rather than
> goto for flow control, a while (1) loop that ends with a break is
> functionally a goto anyway. I think there are several ways this could
> be revised. The most obvious one is probably to use goto, but I vote
> for inverting the sense of the test: if (PageIsNew(page) ||
> PageGetLSN(page) >= startptr) break; This approach also saves a level
> of indentation for more than half of the function.
>
I have used this new inverted condition, but we still need a while(1) loop.
> - I am not sure that it's a good idea for sendwholefile = true to
> result in dumping the entire file onto the wire in a single CopyData
> message. I don't know of a concrete problem in typical
> configurations, but someone who increases RELSEG_SIZE might be able to
> overflow CopyData's length word. At 2GB the length word would be
> negative, which might break, and at 4GB it would wrap around, which
> would certainly break. See CopyData in
> /docs/12/protocol-message-formats.html To
> avoid this issue, and maybe some others, I suggest defining a
> reasonably large chunk size, say 1MB as a constant in this file
> someplace, and sending the data as a series of chunks of that size.
>
OK. Done as per the suggestions.
>
> - I don't think that the way concurrent truncation is handled is
> correct for partial files. Right now it just falls through to code
> which appends blocks of zeroes in either the complete-file or
> partial-file case. I think that logic should be moved into the
> function that handles the complete-file case. In the partial-file
> case, the blocks that we actually send need to match the list of block
> numbers we promised to send. We can't just send the promised blocks
> and then tack a bunch of zero-filled blocks onto the end that the file
> header doesn't know about.
>
Well, in partial file case we won't end up inside that block. So we are
never sending zeroes at the end in case of partial file.
> - For reviewer convenience, please use the -v option to git
> format-patch when posting and reposting a patch series. Using -v2,
> -v3, etc. on successive versions really helps.
>
Sure. Thanks for letting me know about this option.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
Thanks
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Jeevan Chalke | 2019-09-09 11:42:39 | Re: block-level incremental backup |
Previous Message | Jeevan Chalke | 2019-09-09 11:21:34 | Re: block-level incremental backup |