Lists: | pgsql-bugs |
---|
From: | Radoslaw Zielinski <radek(at)pld-linux(dot)org> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-21 15:04:36 |
Message-ID: | 20090821150436.GA5835@bzium |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Hello,
8.4.0 installation on i686 modern Linux, a simple production app with ~4
users entering data; normalised, no weird stuff happening there. Recently:
ERROR: missing chunk number 0 for toast value 120741 in pg_toast_29644
This table has table_log and a "before update" trigger setting
"NEW.last_modified = now()". The broken toast is a HTML blob, stored in
a "text" column.
This particular row currently has last_modified="2009-08-11", but table_log
says it has been updated on "2009-08-12" -- so, data loss.
Please see the attachment for the xmin,xmax,cmin,cmax,ctid values of
these rows. Andrew Gireth has asked for the hexdump of page 539 of the
"promocje" table: http://radek.cc/promocje.hexdump
About a week after I've upgraded to 8.4, the users of this application
have started to complain, that some rows are disappearing.
--
Radosław Zieliński <radek(at)pld-linux(dot)org>
Attachment | Content-Type | Size |
---|---|---|
pg-84-hot-bug.txt | text/plain | 3.8 KB |
From: | Greg Stark <gsstark(at)mit(dot)edu> |
---|---|
To: | Radoslaw Zielinski <radek(at)pld-linux(dot)org> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-21 16:03:00 |
Message-ID: | 407d949e0908210903x3316f504g4d1cd8219b66ad69@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Fri, Aug 21, 2009 at 4:04 PM, Radoslaw Zielinski<radek(at)pld-linux(dot)org> wrote:
> This particular row currently has last_modified="2009-08-11", but table_log
> says it has been updated on "2009-08-12" -- so, data loss.
These are all very low transaction ids. you say it was a week of
production before this happened?
How did you migrate your data? Was it a normal pg_dump restore? Or did
you do anything funny with slony or pg_migrate?
> Please see the attachment for the xmin,xmax,cmin,cmax,ctid values of
> these rows. Andrew Gireth has asked for the hexdump of page 539 of the
> "promocje" table: http://radek.cc/promocje.hexdump
Here's a dump of that page btw.
postgres=# select (h).* from (select heap_page_items(p) as h from p) as x;
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 |
t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits
| t_oid
----+--------+----------+--------+--------+--------+----------+----------+-------------+------------+--------+------------------------------------------+-------
1 | 6896 | 1 | 1296 | 3747 | 0 | 0 |
(539,1) | 15 | 10499 | 28 |
1011111000001110000000000000000000000000 |
2 | 5 | 2 | 0 | | | |
| | | |
|
3 | 1164 | 1 | 504 | 3842 | 3843 | 0 |
(539,3) | 15 | 2439 | 28 |
1011111000001110000000000000000000000000 |
4 | 6396 | 1 | 500 | 3806 | 0 | 0 |
(539,4) | 15 | 10503 | 28 |
1011111000001110000000000000000000000000 |
5 | 5928 | 1 | 468 | 3810 | 0 | 0 |
(539,5) | -32753 | 10503 | 28 |
1011111000001110000000000000000000000000 |
6 | 5176 | 1 | 752 | 3812 | 3813 | 0 |
(539,6) | 15 | 2439 | 28 |
1011111000001110000000000000000000000000 |
7 | 4600 | 1 | 576 | 3819 | 6179 | 0 |
(539,12) | 16399 | 10503 | 28 |
1011101000001110000000000000000000000000 |
8 | 4076 | 1 | 524 | 3823 | 3825 | 0 |
(539,8) | 15 | 2439 | 28 |
1011111000001110000000000000000000000000 |
9 | 2872 | 1 | 1204 | 3830 | 0 | 0 |
(539,9) | 15 | 10499 | 28 |
1011111000001110000000000000000000000000 |
10 | 1668 | 1 | 1204 | 3831 | 3832 | 0 |
(539,10) | 15 | 2435 | 28 |
1011111000001110000000000000000000000000 |
11 | 0 | 3 | 0 | | | |
| | | |
|
12 | 0 | 0 | 0 | | | |
| | | |
|
(12 rows)
From: | Greg Stark <gsstark(at)mit(dot)edu> |
---|---|
To: | Radoslaw Zielinski <radek(at)pld-linux(dot)org> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-21 16:12:50 |
Message-ID: | 407d949e0908210912yc8679d9j8d369015ab67f0e5@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | Postg롤 토토SQL : Postg롤 토토SQL 메일 링리스트 : 2009-08-21 이후 PGSQL-BUGS 16:12 |
On Fri, Aug 21, 2009 at 5:03 PM, Greg Stark<gsstark(at)mit(dot)edu> wrote:
> On Fri, Aug 21, 2009 at 4:04 PM, Radoslaw Zielinski<radek(at)pld-linux(dot)org> wrote:
>
> Here's a dump of that page btw.
... [crap word-wrapped by gmail]
Sorry, clearly gmail sucks for posting things that you actually care
about how they're formatted. I've attached the page dump.
It looks like the row *was* updated by transaction 6179 and the new
version was stored in line pointer 12. However it's marked
XMAX_INVALID which means at least somebody at some point thought 6179
had aborted and marked that hint bit.
Do we have any tools to look up a transaction id status in the clog?
Or alternatively do you have all the wal logs and could you run
xlogdump on them looking for transaction id 6179?
Attachment | Content-Type | Size |
---|---|---|
promocje.page | application/octet-stream | 2.6 KB |
From: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
---|---|
To: | Radoslaw Zielinski <radek(at)pld-linux(dot)org> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-21 16:15:26 |
Message-ID: | 20090821161526.GE5487@alvh.no-ip.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Radoslaw Zielinski wrote:
> Hello,
>
> 8.4.0 installation on i686 modern Linux, a simple production app with ~4
> users entering data; normalised, no weird stuff happening there. Recently:
>
> ERROR: missing chunk number 0 for toast value 120741 in pg_toast_29644
> bug=# select id, last_modified, xmin,xmax,cmin,cmax,ctid from public.promocje where id = 4300;
> id | last_modified | xmin | xmax | cmin | cmax | ctid
> ------+----------------------------+------+------+------+------+---------
> 4300 | 2009-08-11 09:54:11.007351 | 3819 | 6179 | 0 | 0 | (539,7)
> (1 row)
I wonder if this could be explained by xid=6179 not marked as committed
in clog. I'd try flipping that bit and see what happens ...
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
From: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
---|---|
To: | Greg Stark <gsstark(at)mit(dot)edu> |
Cc: | Radoslaw Zielinski <radek(at)pld-linux(dot)org>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-21 16:17:26 |
Message-ID: | 20090821161726.GF5487@alvh.no-ip.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Greg Stark wrote:
> It looks like the row *was* updated by transaction 6179 and the new
> version was stored in line pointer 12. However it's marked
> XMAX_INVALID which means at least somebody at some point thought 6179
> had aborted and marked that hint bit.
Hmm, but LP 12 shouldn't have len=0 then, no? Unless it has been
clobbered by vacuum or HOT pruning, I guess ...
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
Cc: | Radoslaw Zielinski <radek(at)pld-linux(dot)org>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-21 16:33:03 |
Message-ID: | 5696.1250872383@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> I wonder if this could be explained by xid=6179 not marked as committed
> in clog. I'd try flipping that bit and see what happens ...
Once the row's been marked XMAX_INVALID, changing clog won't change that
marking ...
regards, tom lane
From: | Greg Stark <gsstark(at)mit(dot)edu> |
---|---|
To: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
Cc: | Radoslaw Zielinski <radek(at)pld-linux(dot)org>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-21 16:38:17 |
Message-ID: | 407d949e0908210938u50556a57t1f59993aa6de590b@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Fri, Aug 21, 2009 at 5:15 PM, Alvaro
Herrera<alvherre(at)commandprompt(dot)com> wrote:
> I wonder if this could be explained by xid=6179 not marked as committed
> in clog. I'd try flipping that bit and see what happens ...
Well nothing's going to help much now. Firstly, once the hint bit gets
set nothing second-guesses that and checks the clog anyways. And
secondly the new version of the tuple is already vacuumed.
Either of two things are true.
Either transaction 6179 committed, which would explain why the toast
tuples are missing. In which case sometime later this hint bit became
set and the new version pruned. I don't know if bad memory could cause
all that to happen, would the HOT pruning logic see the hint bit and
decide to prune based on that? I suppose a bad bit hitting the clog
could cause everything though.
Alternatively 6179 aborted but somebody along the way got that wrong
and marked the toast tuples dead (and maybe vacuumed them) thinking it
had committed. That's going to be harder to tell if that's what
happened because we don't have any pointers to the specific page in
the toast table. Not unless you can dump the whole index and find
pointers in there or can find the details in the wal log.
From: | Radoslaw Zielinski <radek(at)pld-linux(dot)org> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Cc: | Greg Stark <gsstark(at)mit(dot)edu> |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-21 16:38:30 |
Message-ID: | 20090821163830.GA2645@bzium |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Greg Stark <gsstark(at)mit(dot)edu> [2009-08-21 18:03]:
> On Fri, Aug 21, 2009 at 4:04 PM, Radoslaw Zielinski<radek(at)pld-linux(dot)org> wrote:
>> This particular row currently has last_modified="2009-08-11", but table_log
>> says it has been updated on "2009-08-12" -- so, data loss.
> These are all very low transaction ids. you say it was a week of
> production before this happened?
Upgraded around:
$ rpm -q postgresql --qf '%{INSTALLTIME}' | xargs perl -le 'print~~localtime shift'
Fri Aug 7 10:06:44 2009
The first mention of the error in the app's output:
[info] *** Request 5532 (0.008/s) [7283] [Wed Aug 19 13:06:35 2009] ***
User has reported "disappearing rows" at 12 Aug 2009, 16:58.
> How did you migrate your data? Was it a normal pg_dump restore? Or did
> you do anything funny with slony or pg_migrate?
Plain regular dump/restore (from text dump). There's no slony or
anything on this database except for table_log and the trigger to update
last_modified.
In case it might be relevant: locale C, encoding utf8.
--
Radosław Zieliński <radek(at)pld-linux(dot)org>
From: | Radoslaw Zielinski <radek(at)pld-linux(dot)org> |
---|---|
To: | Greg Stark <gsstark(at)mit(dot)edu> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-21 16:40:44 |
Message-ID: | 20090821164043.GB2645@bzium |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Greg Stark <gsstark(at)mit(dot)edu> [2009-08-21 18:12]:
[...]
> Or alternatively do you have all the wal logs and could you run
> xlogdump on them looking for transaction id 6179?
Nope, I don't have the wal logs.
--
Radosław Zieliński <radek(at)pld-linux(dot)org>
From: | Greg Stark <gsstark(at)mit(dot)edu> |
---|---|
To: | Radoslaw Zielinski <radek(at)pld-linux(dot)org> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-21 16:55:44 |
Message-ID: | 407d949e0908210955n7deb2946p3257ab21e30e0054@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
2009/8/21 Radoslaw Zielinski <radek(at)pld-linux(dot)org>:
> Greg Stark <gsstark(at)mit(dot)edu> [2009-08-21 18:03]:
>> On Fri, Aug 21, 2009 at 4:04 PM, Radoslaw Zielinski<radek(at)pld-linux(dot)org> wrote:
>>> This particular row currently has last_modified="2009-08-11", but table_log
>>> says it has been updated on "2009-08-12" -- so, data loss.
>
>> These are all very low transaction ids. you say it was a week of
>> production before this happened?
>
> Upgraded around:
> $ rpm -q postgresql --qf '%{INSTALLTIME}' | xargs perl -le 'print~~localtime shift'
> Fri Aug 7 10:06:44 2009
Is this, perchance, new hardware? Did you test the memory in it?
Do you have other records that have "disappeared"?
From: | Radoslaw Zielinski <radek(at)pld-linux(dot)org> |
---|---|
To: | Greg Stark <gsstark(at)mit(dot)edu> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-21 17:38:00 |
Message-ID: | 20090821173800.GC2645@bzium |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Greg Stark <gsstark(at)mit(dot)edu> [2009-08-21 18:55]:
[...]
> Is this, perchance, new hardware? Did you test the memory in it?
It's a "virtual private server"; the hosting provider is swearing
everything's fine. I can't vouch for it myself, obviously.
> Do you have other records that have "disappeared"?
I haven't found any (yet). So far, verified with:
create function exec(text) returns void language plpgsql as $f$ begin execute $1; end; $f$;
select exec(
'select log_id, id from ( select distinct ''' || relname || ''', case first_value(trigger_mode) over w when ''DELETE'' then null else id end as log_id, first_value(last_modified) over w as lm from log.' || relname || ' window w as (partition by id order by id desc,trigger_id desc ) )x left join public.' || relname || ' on id=log_id where ( id is null or last_modified <> x.lm ) and log_id is not null'
) from pg_stat_user_tables where schemaname = 'log';
(Looks at the log.* table, finds the largest last_modified of the
non-deleted rows, joins the public.* table on id, compares
last_modified).
--
Radosław Zieliński <radek(at)pld-linux(dot)org>
From: | Radoslaw Zielinski <radek(at)pld-linux(dot)org> |
---|---|
To: | Greg Stark <gsstark(at)mit(dot)edu> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-21 17:46:07 |
Message-ID: | 20090821174607.GA3411@bzium |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Greg Stark <gsstark(at)mit(dot)edu> [2009-08-21 18:38]:
[...]
> the toast table. Not unless you can dump the whole index and find
> pointers in there or can find the details in the wal log.
You mean the primary key index? How do I dump it?
--
Radosław Zieliński <radek(at)pld-linux(dot)org>
From: | Greg Stark <gsstark(at)mit(dot)edu> |
---|---|
To: | Radoslaw Zielinski <radek(at)pld-linux(dot)org> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-21 18:30:11 |
Message-ID: | 407d949e0908211130t440b4caar476172a64cc249d7@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
2009/8/21 Radoslaw Zielinski <radek(at)pld-linux(dot)org>:
> Greg Stark <gsstark(at)mit(dot)edu> [2009-08-21 18:38]:
> [...]
>> the toast table. Not unless you can dump the whole index and find
>> pointers in there or can find the details in the wal log.
>
> You mean the primary key index? How do I dump it?
Actually I mean the key for the toast table.
Let me ask firstly do you get anything if you just do select * from
pg_toast.pg_toast_29644 where chunk_id = 120741 ?
And secondly, what do you get if you do "select length(htmlblob) from
tab where id = ?" which should work even if the toast data is broken.
To dump the index -- this will be tedious, but:
Load the pageinspect contrib module and run this:
select (x).* from (select bt_page_items('t_pkey',1) as x ) as y;
select (x).* from (select bt_page_items('t_pkey',2) as x ) as y;
...
Until you get an error. Then look through the output for any pointers
The hex at the end is the index key which in the case of a toast table
starts with the chunk id, you're looking for 120741 which is 0001D7A5
in hex or A5D70100 in little-endian...
If you find any such rows send them and they might tell us what's
going on or where to look for the missing toast records to see what's
going on.
However I'm kind of skeptical that you'll find any. IIRC Btree marks
pointers that it finds point to dead records and reuses them as soon
as possible. Still, maybe nothing's been inserted on that page yet.
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Radoslaw Zielinski <radek(at)pld-linux(dot)org> |
Cc: | Greg Stark <gsstark(at)mit(dot)edu>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-21 18:35:18 |
Message-ID: | 7971.1250879718@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Radoslaw Zielinski <radek(at)pld-linux(dot)org> writes:
> Greg Stark <gsstark(at)mit(dot)edu> [2009-08-21 18:55]:
>> Is this, perchance, new hardware? Did you test the memory in it?
> It's a "virtual private server"; the hosting provider is swearing
> everything's fine. I can't vouch for it myself, obviously.
Hmmm ... have you had any crashes? I'd wonder how trustworthy fsync is
on such a setup.
regards, tom lane
From: | Radoslaw Zielinski <radek(at)pld-linux(dot)org> |
---|---|
To: | Greg Stark <gsstark(at)mit(dot)edu> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-22 16:27:09 |
Message-ID: | 20090822162709.GA2492@bzium |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Greg Stark <gsstark(at)mit(dot)edu> [2009-08-21 20:30]:
[...]
> Actually I mean the key for the toast table.
> Let me ask firstly do you get anything if you just do select * from
> pg_toast.pg_toast_29644 where chunk_id = 120741 ?
0 rows.
> And secondly, what do you get if you do "select length(htmlblob) from
> tab where id = ?" which should work even if the toast data is broken.
bug=# select length(opis_szczeg) from promocje where id = 4300;
ERROR: missing chunk number 0 for toast value 120741 in pg_toast_29644
> To dump the index -- this will be tedious, but:
$ seq 11 | xargs -ti psql bug -c "select {}, (x).* from (select bt_page_items('promocje_pkey',{}) as x ) as y"
[...]
psql bug -c select 11, (x).* from (select bt_page_items('promocje_pkey',11) as x ) as y
ERROR: block number out of range
Output: http://radek.cc/promocje_bpi.txt.gz
[...]
> If you find any such rows send them and they might tell us what's
> going on or where to look for the missing toast records to see what's
> going on.
No luck.
--
Radosław Zieliński <radek(at)pld-linux(dot)org>
From: | Radoslaw Zielinski <radek(at)pld-linux(dot)org> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Greg Stark <gsstark(at)mit(dot)edu>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-22 16:30:43 |
Message-ID: | 20090822163043.GB2492@bzium |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> [2009-08-21 20:35]:
> Radoslaw Zielinski <radek(at)pld-linux(dot)org> writes:
>> Greg Stark <gsstark(at)mit(dot)edu> [2009-08-21 18:55]:
>>> Is this, perchance, new hardware? Did you test the memory in it?
>> It's a "virtual private server"; the hosting provider is swearing
>> everything's fine. I can't vouch for it myself, obviously.
> Hmmm ... have you had any crashes? I'd wonder how trustworthy fsync is
> on such a setup.
Nope (at least nothing I have noticed). BTW: it's OpenVZ.
--
Radosław Zieliński <radek(at)pld-linux(dot)org>
From: | Greg Stark <gsstark(at)mit(dot)edu> |
---|---|
To: | Radoslaw Zielinski <radek(at)pld-linux(dot)org> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-23 01:40:19 |
Message-ID: | 407d949e0908221840r2197fa0bj207cb3701e1e8548@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
2009/8/22 Radoslaw Zielinski <radek(at)pld-linux(dot)org>:
> bug=# select length(opis_szczeg) from promocje where id = 4300;
> ERROR: missing chunk number 0 for toast value 120741 in pg_toast_29644
Sorry, what datatype is this again? And what encoding?
Perhaps I should have said octet_length() instead of length().
From: | Greg Stark <gsstark(at)mit(dot)edu> |
---|---|
To: | Radoslaw Zielinski <radek(at)pld-linux(dot)org> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-23 01:44:13 |
Message-ID: | 407d949e0908221844xa9e7b97sbbb01c54a66d18b2@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
2009/8/22 Radoslaw Zielinski <radek(at)pld-linux(dot)org>:
> $ seq 11 | xargs -ti psql bug -c "select {}, (x).* from (select bt_page_items('promocje_pkey',{}) as x ) as y"
> [...]
> psql bug -c select 11, (x).* from (select bt_page_items('promocje_pkey',11) as x ) as y
> ERROR: block number out of range
Sorry, I forgot to adjust the index name in the example. We don't want
the primary key index of your table we want the index of the toast
table.
Try the same thing with bt_page_items('pg_toast.pg_toast_29644_index',...)
From: | Radoslaw Zielinski <radek(at)pld-linux(dot)org> |
---|---|
To: | Greg Stark <gsstark(at)mit(dot)edu> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-23 08:00:25 |
Message-ID: | 20090823080025.GA2592@bzium |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Greg Stark <gsstark(at)mit(dot)edu> [2009-08-23 03:40]:
> 2009/8/22 Radoslaw Zielinski <radek(at)pld-linux(dot)org>:
>> bug=# select length(opis_szczeg) from promocje where id = 4300;
>> ERROR: missing chunk number 0 for toast value 120741 in pg_toast_29644
> Sorry, what datatype is this again? And what encoding?
text, utf-8 (locale C).
> Perhaps I should have said octet_length() instead of length().
bug=# select octet_length(opis_szczeg) from promocje where id = 4300;
octet_length
--------------
3081
(1 row)
--
Radosław Zieliński <radek(at)pld-linux(dot)org>
From: | Radoslaw Zielinski <radek(at)pld-linux(dot)org> |
---|---|
To: | Greg Stark <gsstark(at)mit(dot)edu> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-23 08:44:43 |
Message-ID: | 20090823084443.GB2592@bzium |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Greg Stark <gsstark(at)mit(dot)edu> [2009-08-23 03:44]:
> 2009/8/22 Radoslaw Zielinski <radek(at)pld-linux(dot)org>:
>> $ seq 11 | xargs -ti psql bug -c "select {}, (x).* from (select bt_page_items('promocje_pkey',{}) as x ) as y"
>> [...]
>> psql bug -c select 11, (x).* from (select bt_page_items('promocje_pkey',11) as x ) as y
>> ERROR: block number out of range
> Sorry, I forgot to adjust the index name in the example. We don't want
> the primary key index of your table we want the index of the toast
> table.
Sorry, I got confused by the index not being shown in \d pg_toast.pg_toast_29644.
> Try the same thing with bt_page_items('pg_toast.pg_toast_29644_index',...)
bug=# select * from bt_page_items('pg_toast.pg_toast_29644_index', 8) where data ~ 'a5 d7 01 00';
-[ RECORD 1 ]-----------------------
itemoffset | 121
ctid | (458,3)
itemlen | 16
nulls | f
vars | f
data | a5 d7 01 00 00 00 00 00
Full output, without the WHERE: http://radek.cc/promocje_bpi2.txt.gz
get_raw_page( 'pg_toast.pg_toast_29644', 458 ): http://radek.cc/toast_29644_page_458.txt
bug=# select * from heap_page_items( get_raw_page( 'pg_toast.pg_toast_29644', 458 ) );
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
----+--------+----------+--------+--------+--------+----------+---------+-------------+------------+--------+--------+-------
1 | 6156 | 1 | 2036 | 3816 | 0 | 1 | (458,1) | 3 | 2306 | 24 | |
2 | 4176 | 1 | 1979 | 3816 | 0 | 1 | (458,2) | 3 | 2306 | 24 | |
3 | 0 | 3 | 0 | | | | | | | | |
4 | 2140 | 1 | 2036 | 3823 | 0 | 0 | (458,4) | 3 | 2306 | 24 | |
(4 rows)
bug=# select * from bt_page_stats('pg_toast.pg_toast_29644_index', 8);
-[ RECORD 1 ]-+-----
blkno | 8
type | l
live_items | 366
dead_items | 1
avg_item_size | 16
page_size | 8192
free_size | 808
btpo_prev | 7
btpo_next | 9
btpo | 0
btpo_flags | 65
--
Radosław Zieliński <radek(at)pld-linux(dot)org>
From: | Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> |
---|---|
To: | gsstark(at)mit(dot)edu (Greg Stark), Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Radoslaw Zielinski <radek(at)pld-linux(dot)org>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-23 08:53:30 |
Message-ID: | 87ws4u7vxx.fsf@news-spur.riddles.org.uk |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
>>>>> "Greg" == Greg Stark <gsstark(at)mit(dot)edu> writes:
Greg> Either of two things are true.
Greg> Either transaction 6179 committed,
[snip]
This is all missing the point. The row should have been killed by
transaction 4971, NOT 6179. By the time transaction 6179 tried to
do anything with it, it was almost certainly already broken (or
possibly 6179 broke it).
Notice that in the log table, the log entry that records the most
recent update to the row is the one with xmin=4971. There is no
entry in the log table corresponding to 6179.
--
Andrew (irc:RhodiumToad)
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> |
Cc: | gsstark(at)mit(dot)edu (Greg Stark), Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Radoslaw Zielinski <radek(at)pld-linux(dot)org>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: 8.4.0 data loss / HOT-related bug |
Date: | 2009-08-24 15:24:16 |
Message-ID: | 12095.1251127456@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> "Greg" == Greg Stark <gsstark(at)mit(dot)edu> writes:
> Greg> Either of two things are true.
> Greg> Either transaction 6179 committed,
> [snip]
> This is all missing the point. The row should have been killed by
> transaction 4971, NOT 6179. By the time transaction 6179 tried to
> do anything with it, it was almost certainly already broken (or
> possibly 6179 broke it).
If there have been any crashes in this database, the problem might be
a variant of Jeff Ross' issue --- to wit, the row killed by
transaction 4971 actually is dead (and its associated toast items
have been removed), but that row is still found by seqscans because
of an incorrect PD_ALL_VISIBLE flag.
regards, tom lane