Lists: | pgsql-general |
---|
From: | Catalin BOIE <cboie-pgsql(at)66(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | PANIC: corrupted item pointer: 32766 |
Date: | 2010-05-14 06:32:01 |
Message-ID: | 4BECEE61.8070608@66.com |
Hello!
I have a serious problem with one of my tables.
Version: postgresql-server-8.4.3-1.fc12.x86_64
Kernel: kernel-2.6.32.11-99.fc12.x86_64
I reindexed all indexes on that table, but I still cannot work around
this problem.
Memory is ECC and the storage is RAID10 (BIOS reported it OK).
How can I fix this problem?
Thank you!
--
Catalin BOIE
ROUTE 66
From: | Catalin BOIE <cboie-pgsql(at)66(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: PANIC: corrupted item pointer: 32766 |
Date: | 2010-05-14 09:43:02 |
Message-ID: | 4BED1B26.3020909@66.com |
Some more info.
The PANIC happens several times per minute, so it is really bad for me.
I tried to narrow it down based on a field (timestamp) and found some bad
"points", but I cannot delete them (the same PANIC message appears).
Do you have any idea how I can correct those entries?
The worrying part is: how did this happen?!
Thank you!
RAM is 16GiB, 16 "cpus" (including hyperthreading).
--
Catalin BOIE
From: | Emanuel Calvo Franco <postgres(dot)arg(at)gmail(dot)com> |
---|---|
To: | Catalin BOIE <cboie-pgsql(at)66(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: PANIC: corrupted item pointer: 32766 |
Date: | 2010-05-14 16:16:52 |
Message-ID: | AANLkTimQmi4_c2xHVIl4D1wH-VI5Bj6GIzA6ny6jrhPX@mail.gmail.com |
> The PANIC happens several times per minute, so it is really bad for me.
>
> I tried to narrow it down based on a field (timestamp) and found some bad
> "points", but I cannot delete them (the same PANIC message appears).
>
> Do you have any idea how I can correct those entries?
>
> The worrying part is: how did this happen?!
>
Do you have fsync turned off?
--
Emanuel Calvo Franco
www.emanuelcalvofranco.com.ar
Join: http://www.thevenusproject.com/
From: | "Igor Neyman" <ineyman(at)perceptron(dot)com> |
---|---|
To: | "Catalin BOIE" <cboie-pgsql(at)66(dot)com>, <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: PANIC: corrupted item pointer: 32766 |
Date: | 2010-05-14 17:49:17 |
Message-ID: | F4C27E77F7A33E4CA98C19A9DC6722A205F90917@EXCHANGE.corp.perceptron.com |
If you can read other (good) records from this table, then:
1. create an "intermediate" table
2. copy all "good" records from the original table into the new table
3. drop the original table
4. rename the "intermediate" table to the "original" name
5. re-create the required indexes (and any other objects dependent on this
table)
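The five steps above can be sketched as a small statement generator. This is only an illustration, not part of the original message: the function name, the `_intermediate` suffix, the example table name and the predicate are all hypothetical, and the SQL should be reviewed before running it against a real database. Note that the copy step only helps if the predicate really excludes the damaged rows; a statement that touches them may still fail.

```python
def rebuild_statements(table, good_rows_predicate, index_statements=()):
    """Return the SQL for the create/copy/drop/rename/reindex sequence.

    `table` and `good_rows_predicate` are supplied by the caller and are
    hypothetical here; nothing is executed, the statements are only built.
    """
    tmp = f"{table}_intermediate"  # hypothetical name for the new table
    return [
        # 1. create the intermediate table with the same columns
        f"CREATE TABLE {tmp} (LIKE {table} INCLUDING DEFAULTS)",
        # 2. copy only the rows that can still be read
        f"INSERT INTO {tmp} SELECT * FROM {table} WHERE {good_rows_predicate}",
        # 3. drop the damaged original
        f"DROP TABLE {table}",
        # 4. give the intermediate table the original name
        f"ALTER TABLE {tmp} RENAME TO {table}",
        # 5. re-create indexes and other dependent objects
        *index_statements,
    ]


if __name__ == "__main__":
    for stmt in rebuild_statements(
        "events",                       # hypothetical table name
        "uid NOT IN (101, 102)",        # hypothetical list of unreadable keys
        ["CREATE INDEX events_ts_idx ON events (ts)"],
    ):
        print(stmt + ";")
```

Keeping the statements as plain strings makes it easy to inspect them, or to run them inside a single transaction, before anything is dropped.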
Igor Neyman
From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
---|---|
To: | Catalin BOIE <cboie-pgsql(at)66(dot)com> |
Cc: | pgsql-general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: PANIC: corrupted item pointer: 32766 |
Date: | 2010-05-14 19:42:42 |
Message-ID: | 1273864846-sup-5312@alvh.no-ip.org |
Excerpts from Catalin BOIE's message of Fri May 14 02:32:01 -0400 2010:
> Hello!
>
> I have a serious problem with one of my tables.
>
> Version: postgresql-server-8.4.3-1.fc12.x86_64
> Kernel: kernel-2.6.32.11-99.fc12.x86_64
Hmm, it's pretty unfortunate that those buffer checks are inside
PageRepairFragmentation, because that means they are being called when
in a critical section, turning harmless ERRORs into PANICs. I think
those checks could be copied to another routine to be called outside the
critical sections.
The value 32766 is 0x7ffe, which is a pretty suspicious value. It'd be
good to see a copy of the problem block (though it'll prove difficult to
determine _which_ is the problem block ...)
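
One way to see why the value stands out is to check it against the sizes involved. The sketch below is an illustration only and rests on two assumptions that are not stated in this message: that the reported number is a byte offset held in a 15-bit field of the line pointer, and that the server uses the default 8192-byte block size. The function name is hypothetical.

```python
BLCKSZ = 8192      # default PostgreSQL block size in bytes (assumed for this server)
LP_OFF_BITS = 15   # assumed width of the offset field in a line pointer


def describe_offset(value, block_size=BLCKSZ):
    """Summarise how a reported offset relates to the field and page limits."""
    field_max = (1 << LP_OFF_BITS) - 1  # largest value a 15-bit field can hold
    return {
        "hex": hex(value),
        "fits_in_field": value <= field_max,
        "inside_page": value < block_size,
        "distance_from_field_max": field_max - value,
    }


if __name__ == "__main__":
    print(describe_offset(32766))
```

Under those assumptions, 32766 is one less than the largest value the field could hold and lies far beyond the end of an 8192-byte page, which fits a pattern of mostly-set bits rather than a plausible real offset.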
--
From: | Catalin BOIE <cboie-pgsql(at)66(dot)com> |
---|---|
To: | Emanuel Calvo Franco <postgres(dot)arg(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: PANIC: corrupted item pointer: 32766 |
Date: | 2010-05-17 05:58:03 |
Message-ID: | 4BF0DAEB.2000702@66.com |
On 05/14/2010 07:16 PM, Emanuel Calvo Franco wrote:
>> The PANIC happens several times per minute, so it is really bad for me.
>>
>> I tried to narrow it down based on a field (timestamp) and found some bad
>> "points", but I cannot delete them (the same PANIC message appears).
>>
>> Do you have any idea how I can correct those entries?
>>
>> The worrying part is: how did this happen?!
>>
>
> Do you have fsync turned off?
I left it at the default value, so I assume it was on.
--
Catalin BOIE
From: | Catalin BOIE <cboie-pgsql(at)66(dot)com> |
---|---|
To: | Igor Neyman <ineyman(at)perceptron(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: PANIC: corrupted item pointer: 32766 |
Date: | 2010-05-17 06:09:03 |
Message-ID: | 4BF0DD7F.1070503@66.com |
On 05/14/2010 08:49 PM, Igor Neyman wrote:
> If you can read other (good) records from this table, then:
>
> 1. create an "intermediate" table
> 2. copy all "good" records from the original table into the new table
> 3. drop the original table
> 4. rename the "intermediate" table to the "original" name
> 5. re-create the required indexes (and any other objects dependent on this
> table)
>
>
> Igor Neyman
>
I have only now seen your message, but this is exactly what I did.
I built a script that did this:
for (uid = 1; uid <= (SELECT MAX(uid) FROM table); uid++)
    INSERT INTO new_table SELECT * FROM table WHERE uid = $uid
uid is the primary key and is a sequence.
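
A minimal sketch of such a row-by-row copy, written against the generic Python DB-API, might look like the following. It is not the script that was actually used: the function name and its parameters are hypothetical, and it assumes that a failing row surfaces as an ordinary exception on a connection that stays usable. With a real PANIC the server restarts and the connection is lost, so a working script would also need to reconnect before continuing.

```python
def salvage_rows(conn, src, dst, key, max_key):
    """Copy rows from `src` to `dst` one key at a time.

    Returns the list of key values whose copy raised an error, so they
    can be inspected later. Table and column names are trusted input.
    """
    failed = []
    for k in range(1, max_key + 1):  # inclusive of max_key
        cur = conn.cursor()
        try:
            cur.execute(
                f"INSERT INTO {dst} SELECT * FROM {src} WHERE {key} = {int(k)}"
            )
            conn.commit()            # keep every row that copied cleanly
        except Exception:
            conn.rollback()          # discard the failed statement only
            failed.append(k)
        finally:
            cur.close()
    return failed
```

Committing after every row is slow for millions of rows, but it means one unreadable row never rolls back rows that were already copied, and the returned list identifies exactly which keys could not be read.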
I managed to recover all rows (around 5 million), except 22 that were
inserted at the same time.
It seems that the rows themselves were not bad, but rather the link to
the page where they were stored. I am not familiar with the PostgreSQL
code, nor with the in-memory/on-disk structure of the database, so
please have mercy with my explanations.
I am very sure that the rows were inserted one after another.
Having no more ideas, I dropped all the indexes on the bad table. No change.
ANALYZE worked.
VACUUM generated the same error message.
fsync was at the default value (I assume it is on).
For now I have fixed the problem, but I am worried that it will come
back, because I have no idea what could have caused this corruption.
Maybe it would be nice for this message to dump more data, to help
somebody debug the problem.
I searched for this error and saw that there are only around 5 reported
cases. So it is rare, but it is a bad one.
Thanks to everybody for their support!
P.S. I still have the bad table. If you want me to debug something, just
ask.
--
Catalin BOIE