Lists: | pgsql-bugs |
---|
From: | Alexandra Nitzschke <an(at)clickware(dot)de> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Bug (#3484) - Missing pg_clog/0AE6 |
Date: | 2008-02-18 15:04:20 |
Message-ID: | 47B99E74.8010806@clickware.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
We did some analyzing and found out some mysterious things.
First we used the pg_check tool to analyze the table.
The result is:
# select pgcheck_page( 211593798 );
WARNING: relation 'adresse_080103', tuple (46,11): start of tuple is not aligned properly
WARNING: relation 'adresse_080103', tuple (46,11): has t_xmax < t_xmin
pgcheck_page
--------------
9808
(1 row)
# select pgcheck_index_full( 211593798 );
NOTICE: name of tested relation: 'adresse_080103':
NOTICE: relation 'adresse_080103' doesn't have index
pgcheck_index_full
--------------------
211593798
(1 row)
So we looked up the row: SELECT * FROM public.adresse_080103 where ctid = '(46,10)'::tid;
and did an pg_filedump on the database file and searched for the data of that recordset.
After that recordset we found corrupted data!
It seems to be corrupted in the same way as the error before.
The dump of the corrupted data is appended. ( We replaced some sensitive data with '*' at the beinning and the end )
We went on with analyzing:
- the table was created at 2008/01/03 17:56h
- the nightly dump started at 2008/01/03 22:00h
- it tried to copy the table 'adresse_080103' at 22:00:08
- the dump crashed at 22:32:10 ( because of the error we reported 2007/12/14; we repaired the database not till 2008/01/11 )
The stat of the database file returns this:
File: "/postgres/database/data/base/23144/211593798"
Size: 1835008 Blocks: 3592 IO Block: 4096 reguläre Datei
Device: 811h/2065d Inode: 18121638 Links: 1
Access: (0600/-rw-------) Uid: ( 1001/postgres) Gid: ( 2/ daemon)
Access: 2008-02-15 18:19:44.000000000 +0100
Modify: 2008-01-03 22:00:34.000000000 +0100
Change: 2008-01-03 22:00:34.000000000 +0100
We are wondering, that the pg_dump seems to have modified the file.
But we think, that the file couldn't be corrupted at this time.
Each night from 2008/01/11 (the repair day) up to 2008/02/01 the nightly dump ran successful.
Suddenly at 2008/02/02 it crashed.
It seems that something corrupted the data without changing the modification time of the file.
At the moment we have no idea, how this situation comes.
Could you please answer, if the pg_dump modifies the access timestamp in some cases?
Thanks in advance,
Alex
From: | Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> |
---|---|
To: | Alexandra Nitzschke <an(at)clickware(dot)de> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: Bug (#3484) - Missing pg_clog/0AE6 |
Date: | 2008-02-18 15:19:37 |
Message-ID: | 47B9A209.9090406@sun.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Alexandra Nitzschke napsal(a):
<snip>
>
> We went on with analyzing:
> - the table was created at 2008/01/03 17:56h
> - the nightly dump started at 2008/01/03 22:00h
> - it tried to copy the table 'adresse_080103' at 22:00:08
> - the dump crashed at 22:32:10 ( because of the error we reported
> 2007/12/14; we repaired the database not till 2008/01/11 )
>
> The stat of the database file returns this:
> File: "/postgres/database/data/base/23144/211593798"
> Size: 1835008 Blocks: 3592 IO Block: 4096 reguläre Datei
> Device: 811h/2065d Inode: 18121638 Links: 1
> Access: (0600/-rw-------) Uid: ( 1001/postgres) Gid: ( 2/ daemon)
> Access: 2008-02-15 18:19:44.000000000 +0100
> Modify: 2008-01-03 22:00:34.000000000 +0100
> Change: 2008-01-03 22:00:34.000000000 +0100
>
> We are wondering, that the pg_dump seems to have modified the file.
>
<snip>
>
> Could you please answer, if the pg_dump modifies the access timestamp in
> some cases?
>
Just a idea that pg_dump invoked checkpoint but I don't expect that
table data spent four hour in a buffer cache. Especially in case when
max checkpoint_timeout is one hour.
Zdenek