Lists: | pgsql-hackers |
---|
From: | jesper(at)krogh(dot)cc |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | pg_upgrade - link mode and transaction-wraparound data loss |
Date: | 2010-05-18 13:11:40 |
Message-ID: | 47c04e8c42e50ad2c63975cbf9c3cfb4.squirrel@shrek.krogh.cc |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi
I tried running pg_upgrade from the current snapshot of postgresql and
upgrading from 8.4.4 to the snapshot version. Everything seem to look fine
in the process and all that came out was only "ok's" but when I tried a
simple query on the databse it keeps throwing these message out of the back
side.
DETAIL: You might have already suffered transaction-wraparound data loss.
WARNING: some databases have not been vacuumed in over 2 billion
transactions
The database was around 600GB and it took a couple of minutes to run
pg_upgrade after I had all the binaries in the correct place.
It is not really an easy task to throw around 600GB of data, so I cannot
gaurantee that the above is reproducible, but I'll see if I can get time
and try to reproduce it.
Jesper
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | jesper(at)krogh(dot)cc |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pg_upgrade - link mode and transaction-wraparound data loss |
Date: | 2010-05-18 16:57:43 |
Message-ID: | 201005181657.o4IGvhk05043@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
jesper(at)krogh(dot)cc wrote:
> Hi
>
> I tried running pg_upgrade from the current snapshot of postgresql and
> upgrading from 8.4.4 to the snapshot version. Everything seem to look fine
> in the process and all that came out was only "ok's" but when I tried a
> simple query on the databse it keeps throwing these message out of the back
> side.
>
> DETAIL: You might have already suffered transaction-wraparound data loss.
> WARNING: some databases have not been vacuumed in over 2 billion
> transactions
>
>
> The database was around 600GB and it took a couple of minutes to run
> pg_upgrade after I had all the binaries in the correct place.
>
> It is not really an easy task to throw around 600GB of data, so I cannot
> gaurantee that the above is reproducible, but I'll see if I can get time
> and try to reproduce it.
This certainly should never have happened, so I am guessing it is a bug.
pg_upgrade tries hard to make sure all your datfrozenxid and
relfrozenxid are properly migrated from the old server, and the
transaction id is set properly. Unfortunately this is the first time I
have heard of such a problem, so I am unclear on its cause.
The warning is issued from vacuum.c::vac_truncate_clog(). Can you run
this query and show us the output:
SELECT datname, datfrozenxid FROM pg_database;
It would be good to see these numbers on both the old and new servers.
I would also like to see:
SELECT txid_current();
on the old and new servers, but if you can only provide these values on
one of the two servers, it is still useful. Thanks.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
From: | jesper(at)krogh(dot)cc |
---|---|
To: | "Bruce Momjian" <bruce(at)momjian(dot)us> |
Cc: | jesper(at)krogh(dot)cc, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pg_upgrade - link mode and transaction-wraparound data loss |
Date: | 2010-05-18 17:22:12 |
Message-ID: | b1c2322f3425a870df1bfcb1f2319a33.squirrel@shrek.krogh.cc |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
> jesper(at)krogh(dot)cc wrote:
>> Hi
>>
>> I tried running pg_upgrade from the current snapshot of postgresql and
>> upgrading from 8.4.4 to the snapshot version. Everything seem to look
>> fine
>> in the process and all that came out was only "ok's" but when I tried a
>> simple query on the databse it keeps throwing these message out of the
>> back
>> side.
>>
>> DETAIL: You might have already suffered transaction-wraparound data
>> loss.
>> WARNING: some databases have not been vacuumed in over 2 billion
>> transactions
>>
>>
>> The database was around 600GB and it took a couple of minutes to run
>> pg_upgrade after I had all the binaries in the correct place.
>>
>> It is not really an easy task to throw around 600GB of data, so I cannot
>> gaurantee that the above is reproducible, but I'll see if I can get time
>> and try to reproduce it.
>
> This certainly should never have happened, so I am guessing it is a bug.
> pg_upgrade tries hard to make sure all your datfrozenxid and
> relfrozenxid are properly migrated from the old server, and the
> transaction id is set properly. Unfortunately this is the first time I
> have heard of such a problem, so I am unclear on its cause.
>
> The warning is issued from vacuum.c::vac_truncate_clog(). Can you run
> this query and show us the output:
>
> SELECT datname, datfrozenxid FROM pg_database;
>
> It would be good to see these numbers on both the old and new servers.
> I would also like to see:
>
> SELECT txid_current();
>
> on the old and new servers, but if you can only provide these values on
> one of the two servers, it is still useful. Thanks.
Hi Bruce, thanks for your prompt response.
First the new one..
jk(at)pal:~$ psql -p 5433
psql (9.0beta1)
Type "help" for help.
data=# SELECT datname, datfrozenxid FROM pg_database;
datname | datfrozenxid
-----------+--------------
template0 | 654
postgres | 2374592801
data | 2023782337
jk | 2023822188
template1 | 2374592801
workqueue | 2023822188
(6 rows)
data=# SELECT txid_current();
txid_current
--------------
2375384556
(1 row)
data=# \q
Then the old one.
jk(at)pal:~$ psql data
psql (9.0beta1, server 8.4.1)
WARNING: psql version 9.0, server version 8.4.
Some psql features might not work.
Type "help" for help.
data# SELECT datname, datfrozenxid FROM pg_database;
datname | datfrozenxid
-----------+--------------
template0 | 2073823552
postgres | 2023820521
data | 2023782337
jk | 2023822188
template1 | 2073823552
workqueue | 2023822188
(6 rows)
data=# SELECT txid_current();
txid_current
--------------
2390524243
(1 row)
The old database has been "copied" over using rsync and
pg_start_backup()/pg_stop_backup() procecures and started up
using a recovery.conf file.
Jesper
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | jesper(at)krogh(dot)cc |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pg_upgrade - link mode and transaction-wraparound data loss |
Date: | 2010-05-18 18:52:17 |
Message-ID: | 201005181852.o4IIqHW05061@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
jesper(at)krogh(dot)cc wrote:
> > on the old and new servers, but if you can only provide these values on
> > one of the two servers, it is still useful. Thanks.
>
> Hi Bruce, thanks for your prompt response.
>
> First the new one..
Great.
> jk(at)pal:~$ psql -p 5433
> psql (9.0beta1)
> Type "help" for help.
>
> data=# SELECT datname, datfrozenxid FROM pg_database;
> datname | datfrozenxid
> -----------+--------------
> template0 | 654
> postgres | 2374592801
> data | 2023782337
> jk | 2023822188
> template1 | 2374592801
> workqueue | 2023822188
> (6 rows)
>
> data=# SELECT txid_current();
> txid_current
> --------------
> 2375384556
> (1 row)
>
> data=# \q
>
> Then the old one.
>
> jk(at)pal:~$ psql data
> psql (9.0beta1, server 8.4.1)
> WARNING: psql version 9.0, server version 8.4.
> Some psql features might not work.
> Type "help" for help.
>
> data# SELECT datname, datfrozenxid FROM pg_database;
> datname | datfrozenxid
> -----------+--------------
> template0 | 2073823552
This line above looks very odd because I didn't think the template0
datfrozenxid could be advanced. Can I see the output of this query:
SELECT datname, datfrozenxid, datallowconn FROM pg_database;
I am wondering if you set datallowconn for template0 to 'true'.
> postgres | 2023820521
> data | 2023782337
> jk | 2023822188
> template1 | 2073823552
> workqueue | 2023822188
> (6 rows)
>
> data=# SELECT txid_current();
> txid_current
> --------------
> 2390524243
> (1 row)
>
>
> The old database has been "copied" over using rsync and
> pg_start_backup()/pg_stop_backup() procecures and started up
> using a recovery.conf file.
My other idea is that somehow recovery touches datallowconn for
template0.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
From: | Jesper Krogh <jesper(at)krogh(dot)cc> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pg_upgrade - link mode and transaction-wraparound data loss |
Date: | 2010-05-18 19:21:46 |
Message-ID: | 4BF2E8CA.7090407@krogh.cc |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg토토 사이트 순위SQL |
On 2010-05-18 20:52, Bruce Momjian wrote:
> This line above looks very odd because I didn't think the template0
> datfrozenxid could be advanced. Can I see the output of this query:
>
> SELECT datname, datfrozenxid, datallowconn FROM pg_database;
>
>
Only from the "old" database:
data=# SELECT datname, datfrozenxid, datallowconn FROM pg_database;
datname | datfrozenxid | datallowconn
-----------+--------------+--------------
template0 | 2073823552 | f
postgres | 2023820521 | t
data | 2023782337 | t
jk | 2023822188 | t
template1 | 2073823552 | t
workqueue | 2023822188 | t
(6 rows)
> I am wondering if you set datallowconn for template0 to 'true'.
From this database, I cannot give any more results, I ran some other
queries and then restarted postgres, subsequently it seemed
totally broken. I'm in the process of running the test over again, but
it'll take a while before data is in. I'll report back.
--
Jesper
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Jesper Krogh <jesper(at)krogh(dot)cc> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pg_upgrade - link mode and transaction-wraparound data loss |
Date: | 2010-05-18 19:56:48 |
Message-ID: | 201005181956.o4IJumw18267@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg토토 베이SQL |
Jesper Krogh wrote:
> On 2010-05-18 20:52, Bruce Momjian wrote:
> > This line above looks very odd because I didn't think the template0
> > datfrozenxid could be advanced. Can I see the output of this query:
> >
> > SELECT datname, datfrozenxid, datallowconn FROM pg_database;
> >
> >
>
> Only from the "old" database:
> data=# SELECT datname, datfrozenxid, datallowconn FROM pg_database;
> datname | datfrozenxid | datallowconn
> -----------+--------------+--------------
> template0 | 2073823552 | f
> postgres | 2023820521 | t
> data | 2023782337 | t
> jk | 2023822188 | t
> template1 | 2073823552 | t
> workqueue | 2023822188 | t
> (6 rows)
OK, datallowconn = false is right for template0, but I am still confused
how it got set to that high value.
> > I am wondering if you set datallowconn for template0 to 'true'.
>
> From this database, I cannot give any more results, I ran some other
> queries and then restarted postgres, subsequently it seemed
> totally broken. I'm in the process of running the test over again, but
> it'll take a while before data is in. I'll report back.
OK, thanks. This does seem odd. Frankly, having template0's
datfrozenxid be wrong would not cause any kind of instability because
template0 is used only by pg_dump, so I am wondering if something else
is seriously wrong.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | jesper(at)krogh(dot)cc |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pg_upgrade - link mode and transaction-wraparound data loss |
Date: | 2010-05-18 20:17:51 |
Message-ID: | 201005182017.o4IKHp325194@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
jesper(at)krogh(dot)cc wrote:
> First the new one..
>
> jk(at)pal:~$ psql -p 5433
> psql (9.0beta1)
> Type "help" for help.
>
> data=# SELECT datname, datfrozenxid FROM pg_database;
> datname | datfrozenxid
> -----------+--------------
> template0 | 654
> postgres | 2374592801
> data | 2023782337
> jk | 2023822188
> template1 | 2374592801
> workqueue | 2023822188
> (6 rows)
>
> data=# SELECT txid_current();
> txid_current
> --------------
> 2375384556
> (1 row)
I just ran a test and all the datfrozenxids are less than the current
xid, so the only database that could be generating a wraparound warning
is 'template0'. But, again, I though that template0 was not touched for
wraparound protection --- I am starting to think I am wrong.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
---|---|
To: | jesper <jesper(at)krogh(dot)cc> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pg_upgrade - link mode and transaction-wraparound data loss |
Date: | 2010-05-18 20:20:17 |
Message-ID: | 1274213979-sup-6080@alvh.no-ip.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Excerpts from jesper's message of mar may 18 13:22:12 -0400 2010:
> jk(at)pal:~$ psql data
> psql (9.0beta1, server 8.4.1)
> WARNING: psql version 9.0, server version 8.4.
> Some psql features might not work.
> Type "help" for help.
>
> data# SELECT datname, datfrozenxid FROM pg_database;
> datname | datfrozenxid
> -----------+--------------
> template0 | 2073823552
> postgres | 2023820521
> data | 2023782337
> jk | 2023822188
> template1 | 2073823552
> workqueue | 2023822188
> (6 rows)
Does the old server have pg_database.datallowconn = true for template0?
--
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Cc: | jesper <jesper(at)krogh(dot)cc>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pg_upgrade - link mode and transaction-wraparound data loss |
Date: | 2010-05-18 20:32:30 |
Message-ID: | 201005182032.o4IKWUc27569@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Alvaro Herrera wrote:
> Excerpts from jesper's message of mar may 18 13:22:12 -0400 2010:
>
> > jk(at)pal:~$ psql data
> > psql (9.0beta1, server 8.4.1)
> > WARNING: psql version 9.0, server version 8.4.
> > Some psql features might not work.
> > Type "help" for help.
> >
> > data# SELECT datname, datfrozenxid FROM pg_database;
> > datname | datfrozenxid
> > -----------+--------------
> > template0 | 2073823552
> > postgres | 2023820521
> > data | 2023782337
> > jk | 2023822188
> > template1 | 2073823552
> > workqueue | 2023822188
> > (6 rows)
>
> Does the old server have pg_database.datallowconn = true for template0?
The user reported back that it did not:
Only from the "old" database:
data=# SELECT datname, datfrozenxid, datallowconn FROM pg_database;
datname | datfrozenxid | datallowconn
-----------+--------------+--------------
template0 | 2073823552 | f
postgres | 2023820521 | t
data | 2023782337 | t
jk | 2023822188 | t
template1 | 2073823552 | t
workqueue | 2023822188 | t
(6 rows)
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
From: | Jesper Krogh <jesper(at)krogh(dot)cc> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pg_upgrade - link mode and transaction-wraparound data loss |
Date: | 2010-05-18 21:08:01 |
Message-ID: | 4BF301B1.4070206@krogh.cc |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2010-05-18 21:56, Bruce Momjian wrote:
> Jesper Krogh wrote:
>
>> On 2010-05-18 20:52, Bruce Momjian wrote:
>>
>>> This line above looks very odd because I didn't think the template0
>>> datfrozenxid could be advanced. Can I see the output of this query:
>>>
>>> SELECT datname, datfrozenxid, datallowconn FROM pg_database;
>>>
>>>
>>>
>> Only from the "old" database:
>> data=# SELECT datname, datfrozenxid, datallowconn FROM pg_database;
>> datname | datfrozenxid | datallowconn
>> -----------+--------------+--------------
>> template0 | 2073823552 | f
>> postgres | 2023820521 | t
>> data | 2023782337 | t
>> jk | 2023822188 | t
>> template1 | 2073823552 | t
>> workqueue | 2023822188 | t
>> (6 rows)
>>
> OK, datallowconn = false is right for template0, but I am still confused
> how it got set to that high value.
>
This is the "production system". I have absolutely no indications that
anything should be wrong in there. It has run rock-solid since it got
migrated (dump/restore) to 8.4 for about 7 months now. So I am a bit
scared about you telling that it seems wrong. (but that cannot be
attributed to pg_upgrade)
> OK, thanks. This does seem odd. Frankly, having template0's
> datfrozenxid be wrong would not cause any kind of instability because
> template0 is used only by pg_dump, so I am wondering if something else
> is seriously wrong.
>
I also think that something was seriously wrong with the pg_upgrade'd
version. I'll try to reproduce and be a bit more carefull in tracking
the steps
this time.
--
Jesper
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Jesper Krogh <jesper(at)krogh(dot)cc> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pg_upgrade - link mode and transaction-wraparound data loss |
Date: | 2010-05-18 21:16:24 |
Message-ID: | 201005182116.o4ILGOQ09352@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Jesper Krogh wrote:
> On 2010-05-18 21:56, Bruce Momjian wrote:
> > Jesper Krogh wrote:
> >
> >> On 2010-05-18 20:52, Bruce Momjian wrote:
> >>
> >>> This line above looks very odd because I didn't think the template0
> >>> datfrozenxid could be advanced. Can I see the output of this query:
> >>>
> >>> SELECT datname, datfrozenxid, datallowconn FROM pg_database;
> >>>
> >>>
> >>>
> >> Only from the "old" database:
> >> data=# SELECT datname, datfrozenxid, datallowconn FROM pg_database;
> >> datname | datfrozenxid | datallowconn
> >> -----------+--------------+--------------
> >> template0 | 2073823552 | f
> >> postgres | 2023820521 | t
> >> data | 2023782337 | t
> >> jk | 2023822188 | t
> >> template1 | 2073823552 | t
> >> workqueue | 2023822188 | t
> >> (6 rows)
> >>
> > OK, datallowconn = false is right for template0, but I am still confused
> > how it got set to that high value.
> >
>
> This is the "production system". I have absolutely no indications that
> anything should be wrong in there. It has run rock-solid since it got
> migrated (dump/restore) to 8.4 for about 7 months now. So I am a bit
> scared about you telling that it seems wrong. (but that cannot be
> attributed to pg_upgrade)
I am on chat with Alvaro now and it seems we do somehow connect to
template0 for transaction id wraparound. I think Alvaro will post
shortly on this.
> > OK, thanks. This does seem odd. Frankly, having template0's
> > datfrozenxid be wrong would not cause any kind of instability because
> > template0 is used only by pg_dump, so I am wondering if something else
> > is seriously wrong.
> >
> I also think that something was seriously wrong with the pg_upgrade'd
> version. I'll try to reproduce and be a bit more carefull in tracking
> the steps
> this time.
Thanks, but I think the entire problem might be this template0 xid issue
that Alvaro and I are researching. I can now see how invalid template0
xids could cause the instability you saw in the new database. Odd no
one has seen this bug before.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
From: | Jesper Krogh <jesper(at)krogh(dot)cc> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pg_upgrade - link mode and transaction-wraparound data loss |
Date: | 2010-05-19 17:29:28 |
Message-ID: | 4BF41FF8.2080001@krogh.cc |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2010-05-18 18:57, Bruce Momjian wrote:
> jesper(at)krogh(dot)cc wrote:
>
>> Hi
>>
>> I tried running pg_upgrade from the current snapshot of postgresql and
>> upgrading from 8.4.4 to the snapshot version. Everything seem to look fine
>> in the process and all that came out was only "ok's" but when I tried a
>> simple query on the databse it keeps throwing these message out of the back
>> side.
>>
>> DETAIL: You might have already suffered transaction-wraparound data loss.
>> WARNING: some databases have not been vacuumed in over 2 billion
>> transactions
>>
>>
>> The database was around 600GB and it took a couple of minutes to run
>> pg_upgrade after I had all the binaries in the correct place.
>>
>> It is not really an easy task to throw around 600GB of data, so I cannot
>> gaurantee that the above is reproducible, but I'll see if I can get time
>> and try to reproduce it.
>>
> This certainly should never have happened, so I am guessing it is a bug.
> pg_upgrade tries hard to make sure all your datfrozenxid and
> relfrozenxid are properly migrated from the old server, and the
> transaction id is set properly. Unfortunately this is the first time I
> have heard of such a problem, so I am unclear on its cause.
>
Other people are typically way faster than I am looking into it.
Depesz has produced a full trace to reproduce the problem here:
http://www.depesz.com/index.php/2010/05/19/waiting-for-9-0-pg_upgrade/
Jesper
--
Jesper
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Jesper Krogh <jesper(at)krogh(dot)cc>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pg_upgrade - link mode and transaction-wraparound data loss |
Date: | 2010-05-19 18:29:30 |
Message-ID: | 201005191829.o4JITUr04829@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Bruce Momjian wrote:
> > This is the "production system". I have absolutely no indications that
> > anything should be wrong in there. It has run rock-solid since it got
> > migrated (dump/restore) to 8.4 for about 7 months now. So I am a bit
> > scared about you telling that it seems wrong. (but that cannot be
> > attributed to pg_upgrade)
>
> I am on chat with Alvaro now and it seems we do somehow connect to
> template0 for transaction id wraparound. I think Alvaro will post
> shortly on this.
>
> > > OK, thanks. This does seem odd. Frankly, having template0's
> > > datfrozenxid be wrong would not cause any kind of instability because
> > > template0 is used only by pg_dump, so I am wondering if something else
> > > is seriously wrong.
> > >
> > I also think that something was seriously wrong with the pg_upgrade'd
> > version. I'll try to reproduce and be a bit more carefull in tracking
> > the steps
> > this time.
>
> Thanks, but I think the entire problem might be this template0 xid issue
> that Alvaro and I are researching. I can now see how invalid template0
> xids could cause the instability you saw in the new database. Odd no
> one has seen this bug before.
OK, after talking to Alvaro and Heikki, the problem is that while you
cannot connect to template0, it is accessed by autovacuum for vacuum
freeze, even if autovacuum is turned off. I think the reason you are
seeing this bug is that your xid counter is near 2 billion (50% to
wraparound) and the original template0 xids are the maximum distance
from your counter.
I am attaching the newest patch which fixes this issue. I did modify
this code yesterday with another patch, and I am unclear exactly if you
need that patch as well. CVS now has all these changes.
If you could test with this and the earlier patch, I think it will now
work fine. Thanks for the valuable testing, and quick feedback.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
/pgpatches/pg_upgrade | text/x-diff | 4.5 KB |
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Jesper Krogh <jesper(at)krogh(dot)cc> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pg_upgrade - link mode and transaction-wraparound data loss |
Date: | 2010-05-19 18:45:43 |
Message-ID: | 201005191845.o4JIjhx08147@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Jesper Krogh wrote:
> On 2010-05-18 18:57, Bruce Momjian wrote:
> > jesper(at)krogh(dot)cc wrote:
> >
> >> Hi
> >>
> >> I tried running pg_upgrade from the current snapshot of postgresql and
> >> upgrading from 8.4.4 to the snapshot version. Everything seem to look fine
> >> in the process and all that came out was only "ok's" but when I tried a
> >> simple query on the databse it keeps throwing these message out of the back
> >> side.
> >>
> >> DETAIL: You might have already suffered transaction-wraparound data loss.
> >> WARNING: some databases have not been vacuumed in over 2 billion
> >> transactions
> >>
> >>
> >> The database was around 600GB and it took a couple of minutes to run
> >> pg_upgrade after I had all the binaries in the correct place.
> >>
> >> It is not really an easy task to throw around 600GB of data, so I cannot
> >> gaurantee that the above is reproducible, but I'll see if I can get time
> >> and try to reproduce it.
> >>
> > This certainly should never have happened, so I am guessing it is a bug.
> > pg_upgrade tries hard to make sure all your datfrozenxid and
> > relfrozenxid are properly migrated from the old server, and the
> > transaction id is set properly. Unfortunately this is the first time I
> > have heard of such a problem, so I am unclear on its cause.
> >
>
> Other people are typically way faster than I am looking into it.
> Depesz has produced a full trace to reproduce the problem here:
> http://www.depesz.com/index.php/2010/05/19/waiting-for-9-0-pg_upgrade/
Thanks. I have commented on the blog to mention that we have fixed the
bug reported there.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com