by lewis-s-9714 (1.0k points)
When sending messages to another service, we occasionally get errors because a character isn't recognized as valid UTF-8. In the message detail window the character is displayed as a red dot, but it was probably originally an em dash or an en dash that didn't get encoded correctly. Is there an elegant way to programmatically find and remove (or replace with white space) all characters that aren't recognized? Right now, I don't even know how to find a specific bad character. When I copy the text into Notepad++, it's interpreted as "ETB". I'm concerned there might be other characters, such as smart quotes, that also need to be handled.

1 Answer

Best answer

ETB (End of Transmission Block) is a control character with the hex value 17. To remove it, use the following script, which matches hex 17:

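// Read the whole message as a string, strip every ETB (hex 17) character,
// and write the cleaned string back to the message.
var tempMessage = String(message.getNode('/'));
var newMessage = tempMessage.replace(/[\x17]/g, '');
message.setNode('/', newMessage);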

To remove other characters, identify their hex values and then modify the script accordingly.
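If you don't yet know which characters are causing the problem, a small helper can list them for you. The sketch below is plain JavaScript; the function name `findSuspectChars` is hypothetical, and treating everything outside printable ASCII (other than tab, LF and CR) as "suspect" is an assumption you may want to relax. It reports UTF-16 code units, so a character outside the Basic Multilingual Plane (such as an emoji) shows up as two entries.

```javascript
// Hypothetical helper: list every character outside printable ASCII
// (0x20-0x7E), except tab, LF and CR, with its position and hex value.
function findSuspectChars(text) {
  var found = [];
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i); // UTF-16 code unit
    // Tab (0x09), LF (0x0A) and CR (0x0D) are usually legitimate.
    var isAllowedWhitespace = code === 0x09 || code === 0x0a || code === 0x0d;
    if ((code < 0x20 && !isAllowedWhitespace) || code > 0x7e) {
      var hex = code.toString(16).toUpperCase();
      if (hex.length < 2) {
        hex = '0' + hex; // pad single-digit values, e.g. "5" -> "05"
      }
      found.push({ index: i, hex: '0x' + hex });
    }
  }
  return found;
}
```

For example, `findSuspectChars('A\x17B\u2014C')` reports the ETB at index 1 as `0x17` and the em dash at index 3 as `0x2014`. You can then add the reported values to the regular expression in the script above.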

Here is a link to the Windows-1252 character set, which can help you identify the hex value:

https://en.wikipedia.org/wiki/Windows-1252
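To handle several characters at once, the single hex value in the regular expression can be widened to a character class. The sketch below is plain JavaScript and makes two assumptions that are not part of the answer above. First, it maps the common "smart" punctuation (curly quotes, en dash, em dash) to ASCII look-alikes rather than deleting it. Second, it then strips the remaining C0 control characters except tab, LF and CR (CR is kept because, for example, HL7 uses it as the segment separator), plus DEL and the C1 control range. The names `cleanText` and `REPLACEMENTS` are hypothetical.

```javascript
// Hypothetical mapping of common "smart" punctuation to ASCII equivalents.
var REPLACEMENTS = {
  '\u2018': "'", // left single quotation mark
  '\u2019': "'", // right single quotation mark
  '\u201C': '"', // left double quotation mark
  '\u201D': '"', // right double quotation mark
  '\u2013': '-', // en dash
  '\u2014': '-'  // em dash
};

function cleanText(text) {
  // 1. Replace known smart punctuation with its ASCII look-alike.
  var result = text.replace(/[\u2018\u2019\u201C\u201D\u2013\u2014]/g, function (ch) {
    return REPLACEMENTS[ch];
  });
  // 2. Remove remaining control characters: C0 controls except tab (0x09),
  //    LF (0x0A) and CR (0x0D), plus DEL (0x7F) and C1 controls (0x80-0x9F).
  result = result.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]/g, '');
  return result;
}
```

For example, `cleanText('A\x17B \u201Chi\u201D \u2014 ok')` returns `AB "hi" - ok`. In a channel script, it could be applied the same way as the ETB example, by passing `String(message.getNode('/'))` to `cleanText` and writing the result back with `message.setNode('/', ...)`. If you prefer to replace unrecognized characters with white space instead of removing them, change the `''` in step 2 to `' '`.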

by brandon-w-8204 (34.1k points)
selected by lewis-s-9714