Sidebar

How to read from a in Postgresql database that uses the SQL_ASCII encoding

0 votes
1.9K views
asked Apr 3, 2020 by mike-r-7535 (13,830 points)

I'm trying to query a Postgres database, but I hitting the following error:

org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0xa3

After some discovery, the database uses SQL_ASCII as its default encoding (I have no control of this).  My client connection's encoding is UTF8 (I could not change this on the JDBC connection URL).  How can I resolve this exception?

1 Answer

0 votes
 
Best answer

There is no exact conversion between SQL_ASCII and UTF8.   SQL_ASCII is a generic text field, but no values are mapped to UTF8.  If you know the encoding of the string, you will need to use that.  If you are trying to guess the encoding of the string, I would start with WIN1252.  See https://www.postgresql.org/docs/9.4/functions-string.html for a list of the available encodings.

Within QIE, we often use 'windows-1252' encoding to view raw bytes as a string.  Postgres also can use the WIN1252 encoding.  The good news with WIN1252 is that it is also a single byte encoding (so SQL_ASCII can be cast to this without any byte modification) and almost* all of its characters are mapped to UTF8.

So, we can cast the field of type SQL_ASCII to WIN1252, and then convert from WIN1252 to UTF8 like this:

convert(convert_to(Address1, 'WIN1252'), 'WIN1252', 'UTF8')

* There are 5 unused and therefore unmapped characters in the win1252 encoding.  If your SQL_ASCII data contains some of these values, then you will need to either replace those bytes, or find a different encoding that better fits the data used.

 

answered Apr 3, 2020 by mike-r-7535 (13,830 points)
edited Apr 3, 2020 by mike-r-7535
...