From: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Paul Lindner <lindner(at)inuus(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Upcoming PG re-releases |
Date: | 2005-12-06 20:25:13 |
Message-ID: | 200512062025.jB6KPDK02212@candle.pha.pa.us |
Lists: | pgsql-hackers pgsql-www |
Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > I have added your suggestions to the 8.1.X release notes.
> >
> > Did you read the followup discussion? Recommending -c without a large
> > warning seems a very bad idea.
>
> Well, I said it would remove invalid sequences. What else should we
> say?
>
> This will remove invalid character sequences.
>
> I saw no clear solution that allowed sequences to be corrected.
The release note text is:
Some users are having problems loading <literal>UTF8</> data into 8.1.X.
This is because previous versions allowed invalid <literal>UTF8</>
sequences to be entered into the database, and this release properly
accepts only valid <literal>UTF8</> sequences. One way to correct a
dumpfile is to use <command>iconv -c -f utf-8 -t utf-8</>. This will
remove invalid character sequences. <command>iconv</> reads the entire
input file into memory so it might be necessary to <command>split</> the
dump into multiple smaller files for processing.
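The text above notes that iconv reads the whole input into memory. As a hedged alternative (not part of PostgreSQL or iconv), here is a minimal Python 3 sketch of a roughly similar cleanup that streams the dump in fixed-size chunks, so no split step is needed. It assumes that Python's UTF-8 codec with `errors="ignore"` drops approximately the same bytes as `iconv -c`. The two may differ in edge cases. The names `strip_invalid_utf8` and `clean_dump` are hypothetical.

```python
import codecs
from typing import Iterable, Iterator


def strip_invalid_utf8(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield UTF-8 output with invalid byte sequences dropped (hypothetical helper)."""
    # errors="ignore" silently discards undecodable bytes, similar in spirit
    # to `iconv -c`; exact behavior on edge cases may differ from iconv.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="ignore")
    for chunk in chunks:
        # An incomplete multibyte character at the end of a chunk is buffered
        # and completed by the next chunk instead of being discarded.
        yield decoder.decode(chunk).encode("utf-8")
    # final=True flushes the decoder; a truncated sequence at EOF is dropped.
    yield decoder.decode(b"", final=True).encode("utf-8")


def clean_dump(in_path: str, out_path: str, chunk_size: int = 1 << 20) -> None:
    """Stream in_path to out_path in chunk_size-byte pieces (hypothetical helper)."""
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        # iter(callable, sentinel) keeps calling src.read until it returns b"".
        for piece in strip_invalid_utf8(iter(lambda: src.read(chunk_size), b"")):
            dst.write(piece)
```

For example, `clean_dump("dumpfile.sql", "cleanfile.sql")` writes a cleaned copy (the file names are placeholders). The incremental decoder is the key design choice: a plain per-chunk `bytes.decode` would wrongly discard a valid multibyte character that happens to straddle two chunks.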
It would be nice if iconv reported the lines with errors so you could
correct them, but I see no way to do that. The only option is to diff
the old and new files to see the problems. Is that helpful? Here is the
new text I have used:
Some users are having problems loading <literal>UTF8</> data into 8.1.X.
This is because previous versions allowed invalid <literal>UTF8</>
sequences to be entered into the database, and this release properly
accepts only valid <literal>UTF8</> sequences. One way to correct a
dumpfile is to use <command>iconv -c -f utf-8 -t utf-8 -o cleanfile.sql
dumpfile.sql</>. The <literal>-c</> option removes invalid character
sequences. A diff of the two files will show the lines that contained
invalid sequences. <command>iconv</> reads the entire input file into memory so
it might be necessary to <command>split</> the dump into multiple
smaller files for processing.
It highlights the 'diff' idea.
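The message notes that there seems to be no way to make iconv itself report the lines with errors. Outside iconv, a short script can do this directly. The following is a minimal Python 3 sketch, not an existing PostgreSQL tool. `find_invalid_utf8_lines` and `report_invalid_lines` are hypothetical names. It reads the dump one line at a time and reports the line number and byte offset of the first invalid sequence on each bad line, so the problems can be fixed by hand instead of being silently removed. Splitting on newline bytes is safe because UTF-8 continuation bytes lie in the range 0x80 to 0xBF and can never be the newline byte 0x0A.

```python
from typing import Iterable, Iterator, Tuple


def find_invalid_utf8_lines(lines: Iterable[bytes]) -> Iterator[Tuple[int, int]]:
    """Yield (line_number, byte_offset) for each line that is not valid UTF-8.

    Only the first invalid sequence on each line is reported.
    """
    for lineno, line in enumerate(lines, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first offending byte in this line.
            yield lineno, exc.start


def report_invalid_lines(path: str) -> None:
    """Print path:line messages (hypothetical helper, not a PostgreSQL tool)."""
    # Iterating over a file opened in binary mode yields one line at a time,
    # so memory use is bounded by the longest line rather than the whole dump.
    with open(path, "rb") as f:
        for lineno, offset in find_invalid_utf8_lines(f):
            print(f"{path}:{lineno}: invalid UTF-8 at byte {offset} of the line")


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python report_utf8.py dumpfile.sql
    for dump_path in sys.argv[1:]:
        report_invalid_lines(dump_path)
```

Because only the first bad sequence per line is reported, a line may need a second pass after it has been edited.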
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073