How do I determine the unicode value of a character?

Question

1 Answer

Best answer

The Word document contains a non-ASCII character, which UTF-8 encodes as more than one byte. When the text is extracted, those bytes can show up as several separate characters in your conversion. Once you know the character's Unicode value (its code point), you can replace it with a plain single-byte character or string.
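To illustrate how one character can turn into several, here is a minimal sketch. It assumes the extraction step reads each UTF-8 byte as its own character, which may not be exactly what your tool does. The function name `misreadUtf8Bytes` is made up for this example.

```javascript
// Hypothetical helper: simulate a converter that treats every UTF-8 byte
// as a separate character (an assumption about how the text got garbled).
function misreadUtf8Bytes(text) {
  // Encode the text to its UTF-8 bytes.
  const bytes = new TextEncoder().encode(text);
  // Turn each byte into the character with the same numeric value.
  return String.fromCharCode(...bytes);
}

// Omega (U+03A9) is stored as the two bytes CE A9 in UTF-8,
// so it comes back as two characters instead of one.
const garbled = misreadUtf8Bytes("\u03A9");
console.log(garbled, garbled.length);
```

If the garbled output in your conversion looks like this, the original was most likely a single multi-byte character.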

1. Select the character in the Word document and copy it to your clipboard.
2. Go to a Unicode converter tool such as this one: http://www.endmemo.com/unicode/unicodeconverter.php
3. Paste the character into the "Unicode Character" section.
4. Click Convert.
5. Use the "Escaped Unicode" value in your replaceAll call.

For more information, search the web for the "UTF-8 Code" value together with the keyword "unicode" (for example: unicode CE A9).
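If you would rather not rely on an online tool, the same lookup can be sketched in a few lines of script. This assumes a JavaScript environment that provides `codePointAt`, `padStart`, and `TextEncoder` (a current browser or Node.js). The helper name `describeChar` is made up for this example, and the field names only loosely mirror the converter's labels.

```javascript
// Hypothetical helper: report the Unicode value, escape sequence, and
// UTF-8 bytes of the first character in a string.
function describeChar(ch) {
  const codePoint = ch.codePointAt(0);
  const hex = codePoint.toString(16).toUpperCase().padStart(4, "0");
  // Characters above U+FFFF need the braced escape form.
  const escaped = codePoint <= 0xffff ? "\\u" + hex : "\\u{" + hex + "}";
  // Encode just that one character and format each byte as two hex digits.
  const bytes = new TextEncoder().encode(String.fromCodePoint(codePoint));
  const utf8 = Array.from(bytes, (b) =>
    b.toString(16).toUpperCase().padStart(2, "0")
  ).join(" ");
  return { codePoint: "U+" + hex, escaped: escaped, utf8: utf8 };
}

// Paste the character copied from the document between the quotes.
console.log(describeChar("Ω"));
```

For the Greek capital omega this reports U+03A9, the escape \u03A9, and the UTF-8 bytes CE A9, which matches the values used in this answer.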

You can then replace the multi-byte character (here, Greek Capital Letter Omega) with a single-byte character, or with a description if you like:

var textWithNormalSemicolon = textWithSpecialSemicolon.replaceAll("\u03A9", "Omega");

answered Aug 30, 2013 by mike-r-7535 (13.8k points)
selected Aug 30, 2013 by mike-r-7535
