Re: Incorrect number of rows inserted into partitioned table

Lists: pgsql-bugs
From: Роман Осипов <osipovromanvladimirovich(at)gmail(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Incorrect number of rows inserted into partitioned table
Date: 2023-04-10 18:42:28
Message-ID: CAAqk126uv_J-T8MhOvMqmUUD6MPsxFpq=zojDnWeftZokDYyEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Good afternoon

When reloading data from a simple table into a partitioned (through
inheritance) using a query like:* insert into [new_partition_table] select
* from [old_table] limit xxxxxxx*;

There is an insertion not of the amount specified in *limit*,, but a little
more or less than it.

SQL queries in attachment

Attachment Content-Type Size
Test-limit.sql application/octet-stream 1.2 KB

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Роман Осипов <osipovromanvladimirovich(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Incorrect number of rows inserted into partitioned table
Date: 2023-04-10 21:36:16
Message-ID: CAApHDvrY3UJN5xfZ6pACx3zdctXw7jCUaP9GdvfRpUW5vxUENQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Tue, 11 Apr 2023 at 08:50, Роман Осипов
<osipovromanvladimirovich(at)gmail(dot)com> wrote:
> When reloading data from a simple table into a partitioned (through inheritance) using a query like: insert into [new_partition_table] select * from [old_table] limit xxxxxxx;
>
> There is an insertion not of the amount specified in limit,, but a little more or less than it.

It would be good to see what your output actually is. I don't
currently see a way for the number of rows inserted to be *more* than
the LIMIT. If it's less, then that'll be because of the rule you have
defined to insert into another table instead.

With your example SQLs, you'll probably find that the reported number
of inserted rows matches: select count(*) from ONLY book_inherit_test;

Also, you might want to look at declarative partitioning [1] and use
that instead.

David

[1] /docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE


From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Роман Осипов <osipovromanvladimirovich(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Incorrect number of rows inserted into partitioned table
Date: 2023-04-10 21:39:07
Message-ID: 4561fe9b-e303-dc28-1a3e-1b5aab626395@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On 4/10/23 20:42, Роман Осипов wrote:
> Good afternoon
>
> When reloading data from a simple table into a partitioned (through
> inheritance) using a query like:*insert into [new_partition_table]
> select * from [old_table] limit xxxxxxx*;
>
> There is an insertion not of the amount specified in *limit*,, but a
> little more or less than it.
>

Which version? I did try this on master, and I can't reproduce it - the
count at the end returns the correct value (1M).

The rule affects the insert status, which looks e.g. like this:

INSERT 0 611532

instead of 1M, but that's expected because of the rule, I think.

FWIW I wonder why you use rules, it's rather tricky to get that right,
which is why the usual recommendation is not to use this if there's an
alternative way to do stuff (which for partitioning there is).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Роман Осипов <osipovromanvladimirovich(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Incorrect number of rows inserted into partitioned table
Date: 2023-04-10 21:58:15
Message-ID: CAAqk124SXFPhY-6En1YYq1EB3=QYk8bUBj6XT4HBDjis8fBSVg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Good afternoon David.

Thanks for the quick response.
I know about declarative partitioning and just use it. But unfortunately
that doesn't fix the problem.

In this case, the problem is not how many returns *insert *with *limit*.
And the fact that the tables always have a different number of rows. For
example:

1) truncate table book_inherit_test;

2) insert into book_inherit_test select * from book_test limit 1000000;

3) select count(*) from book_inherit_test;

4) select count(*) from only book_inherit_test;

5) select count(*) from only book_inherit_test_before_1950;
Attempt 1

1) Updated Rows 0

2) Updated Rows 612528

3) count(*) 1001227

4) count(*) 612528

5) count(*) 388699

(4)+(5)=(3)

612528 + 388699 = 1001227 That's right, but expected 1000000

Attempt 2

1) Updated Rows 0

2) Updated Rows 612602

3) count(*) 1001320

4) count(*) 612602

5) count(*) 388718

(4)+(5)=(3)

612602 + 388718 = 1001320 That's right, but expected 1000000

вт, 11 апр. 2023 г. в 00:39, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>:

> On 4/10/23 20:42, Роман Осипов wrote:
> > Good afternoon
> >
> > When reloading data from a simple table into a partitioned (through
> > inheritance) using a query like:*insert into [new_partition_table]
> > select * from [old_table] limit xxxxxxx*;
> >
> > There is an insertion not of the amount specified in *limit*,, but a
> > little more or less than it.
> >
>
> Which version? I did try this on master, and I can't reproduce it - the
> count at the end returns the correct value (1M).
>
> The rule affects the insert status, which looks e.g. like this:
>
> INSERT 0 611532
>
> instead of 1M, but that's expected because of the rule, I think.
>
> FWIW I wonder why you use rules, it's rather tricky to get that right,
> which is why the usual recommendation is not to use this if there's an
> alternative way to do stuff (which for partitioning there is).
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Роман Осипов <osipovromanvladimirovich(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Incorrect number of rows inserted into partitioned table
Date: 2023-04-10 22:04:39
Message-ID: 23169.1681164279@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

=?utf-8?B?0KDQvtC80LDQvSDQntGB0LjQv9C+0LI=?= <osipovromanvladimirovich(at)gmail(dot)com> writes:
> There is an insertion not of the amount specified in *limit*,, but a little
> more or less than it.

I think you're getting burnt by the fact that a rule is a macro,
combined with the fact that your query is underspecified:

insert into book_inherit_test select * from book_test limit 1000000;

This doesn't constrain *which* 1000000 rows of book_test get inserted.
If we EXPLAIN it we get:

=# explain insert into book_inherit_test select * from book_test limit 1000000;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Insert on book_inherit_test (cost=0.00..32353.00 rows=0 width=0)
-> Subquery Scan on "*SELECT*" (cost=0.00..32353.00 rows=617176 width=31)
Filter: ((("*SELECT*".releaseyear >= 1900) AND ("*SELECT*".releaseyear < 1951)) IS NOT TRUE)
-> Limit (cost=0.00..17353.00 rows=1000000 width=31)
-> Seq Scan on book_test (cost=0.00..34706.00 rows=2000000 width=31)

Insert on book_inherit_test_before_1950 (cost=0.00..32353.00 rows=0 width=0)
-> Subquery Scan on "*SELECT*" (cost=0.00..32353.00 rows=382824 width=31)
Filter: (("*SELECT*".releaseyear >= 1900) AND ("*SELECT*".releaseyear < 1951))
-> Limit (cost=0.00..17353.00 rows=1000000 width=31)
-> Seq Scan on book_test (cost=0.00..34706.00 rows=2000000 width=31)
(11 rows)

from which we can see that the insertions into book_inherit_test and
those into book_inherit_test_before_1950 are driven off completely
separate scans of book_test. And that table is big enough that
the synchronize_seqscans feature kicks in, meaning that indeed this
will scan two different million-row subsets of book_test, producing
results fundamentally unlike what you expected.

I get stable results after disabling synchronize_seqscans, but
a more correct way would be to add ORDER BY to fully determine which
rows of book_test are considered.

Advice: don't use a RULE for this sort of thing. If you really want to,
you can get the same effects more predictably with a trigger.
But as David said, the whole thing looks like a poor reimplementation
of partitioning.

regards, tom lane


From: Роман Осипов <osipovromanvladimirovich(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Incorrect number of rows inserted into partitioned table
Date: 2023-04-11 05:48:23
Message-ID: CAAqk126zKVpD1A+i0vU_WSLFD4vM0aT+=+YaEtMrXcuMJpb-ow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Good afternoon Thomas.

I am using the PostgreSQL 15.2, compiled by Visual C++ build 1914, 64-bit.

If you answer the question why I use this option.
That answer will be simple. This is just the result of experiments in which
I stumbled upon this behavior.

вт, 11 апр. 2023 г. в 00:58, Роман Осипов <
osipovromanvladimirovich(at)gmail(dot)com>:

> Good afternoon David.
>
> Thanks for the quick response.
> I know about declarative partitioning and just use it. But unfortunately
> that doesn't fix the problem.
>
> In this case, the problem is not how many returns *insert *with *limit*.
> And the fact that the tables always have a different number of rows. For
> example:
>
> 1) truncate table book_inherit_test;
>
> 2) insert into book_inherit_test select * from book_test limit 1000000;
>
> 3) select count(*) from book_inherit_test;
>
> 4) select count(*) from only book_inherit_test;
>
> 5) select count(*) from only book_inherit_test_before_1950;
> Attempt 1
>
> 1) Updated Rows 0
>
> 2) Updated Rows 612528
>
> 3) count(*) 1001227
>
> 4) count(*) 612528
>
> 5) count(*) 388699
>
> (4)+(5)=(3)
>
> 612528 + 388699 = 1001227 That's right, but expected 1000000
>
> Attempt 2
>
> 1) Updated Rows 0
>
> 2) Updated Rows 612602
>
> 3) count(*) 1001320
>
> 4) count(*) 612602
>
> 5) count(*) 388718
>
> (4)+(5)=(3)
>
> 612602 + 388718 = 1001320 That's right, but expected 1000000
>
> вт, 11 апр. 2023 г. в 00:39, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>:
>
>> On 4/10/23 20:42, Роман Осипов wrote:
>> > Good afternoon
>> >
>> > When reloading data from a simple table into a partitioned (through
>> > inheritance) using a query like:*insert into [new_partition_table]
>> > select * from [old_table] limit xxxxxxx*;
>> >
>> > There is an insertion not of the amount specified in *limit*,, but a
>> > little more or less than it.
>> >
>>
>> Which version? I did try this on master, and I can't reproduce it - the
>> count at the end returns the correct value (1M).
>>
>> The rule affects the insert status, which looks e.g. like this:
>>
>> INSERT 0 611532
>>
>> instead of 1M, but that's expected because of the rule, I think.
>>
>> FWIW I wonder why you use rules, it's rather tricky to get that right,
>> which is why the usual recommendation is not to use this if there's an
>> alternative way to do stuff (which for partitioning there is).
>>
>>
>> regards
>>
>> --
>> Tomas Vondra
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>