The backup was created from a database with the UTF8 encoding using pg_dump. The backup is in the tar format.
I then created a new database on another server running the same version of postgreSQL (8.2.4) using this command:
createdb -E utf8 db1
When running pg_restore I get the following error:
pg_restore: [archiver (db)] Error from TOC entry 1667; 0 14758638 TABLE DATA table1 db1 pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence for encoding "UTF8": 0xc520
The original databse is no longer available.
How can I restore this data or find the byte sequence that is causing the problem?
-
Older versions of Postgres would allow invalid byte sequences to be entered into a database. There was a note about this and a suggsted fix in a recent releasenote:
Some users are having problems loading UTF-8 data into 8.1.X. This is because previous versions allowed invalid UTF-8 byte sequences to be entered into the database, and this release properly accepts only valid UTF-8 sequences. One way to correct a dumpfile is to run the command iconv -c -f UTF-8 -t UTF-8 -o cleanfile.sql dumpfile.sql. The -c option removes invalid character sequences. A diff of the two files will show the sequences that are invalid. iconv reads the entire input file into memory so it might be necessary to use split to break up the dump into multiple smaller files for processing.
If the database is not very large or complex it might be easier to locate the offending text in the original database and correct it before doing a new dump. A field that has user-entered input or contains data imported from other sources might be a culprit.
From Console -
It's probably the same issue I had once, when migratting from a 7.4 to a 8.2 db. I used the instructions on this web article to solve the problem. This presuppose that you still have access to the original database. Otherwise, you can probably restore it in an older version of Postgresql and give a try with that procedure.
From edomaur -
This little Perl script may save you : Repairing Broken documents that mix UTF-8 and ISO-8859-1
Redirect the script output to a new file. All illegal characters should have been replaced with their correct UTF-8 incarnation. The script reads the input line by line, too, so it shouldn't need too much memory.
From wazoox -
I solved this problem with the following steps:
pg_restore -f db1.sql-v db1.tar
I then removed everything from the db1.sql file except for the table1 copy command. Then ran:
psql -d db1 < db1.sql
This then gave me the exact line number within the file where the error was occuring. I then opened the file and removed the problem character and re ran the script.
From Simon
0 comments:
Post a Comment