Is there a way to remove all unrecognized characters from an HL7 message encoded in UTF-8?

asked Apr 22, 2020 by lewis-s-9714 (1,020 points)
When sending messages to another service, we occasionally get errors because a character isn't recognized as valid UTF-8. In the message detail window the character is displayed as a red dot, but it was probably originally an em dash or an en dash that didn't get encoded correctly. Is there an elegant way to programmatically find and remove (or replace with white space) all characters that aren't recognized? Right now, I don't even know how to find a specific bad character at all. When I copy the text into Notepad++, it's interpreted as "ETB". I'm concerned there might be other characters, such as smart quotes, that also need to be handled.

1 Answer

Best answer

ETB (End of Transmission Block) is a control character with the hex value 17. To remove the ETB character, use the following script, which specifies hex 17:

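// Read the whole message as a string
var tempMessage = String(message.getNode('/'));
// Remove every ETB (hex 17) character; use ' ' instead of '' to replace with a space
var newMessage = tempMessage.replace(/[\x17]/g, '');
// Write the cleaned text back to the message
message.setNode('/', newMessage);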

To remove other characters, identify the hex value of each one and add it to the character class in the regular expression (for example, /[\x17\x1A]/g).
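If the goal is to catch every unexpected control character rather than one hex value at a time, the same replace() approach can take a range-based character class. The following is only a sketch: stripControlChars and replaceSmartPunctuation are hypothetical helper names (not part of any engine API), and it assumes tab, line feed, and carriage return must be kept, since carriage return is the HL7 segment terminator. It also assumes smart quotes and dashes should become their plain ASCII equivalents, which may not suit every receiving system.

```javascript
// Hypothetical helper: replaces control characters with `replacement`.
// Covers C0 controls except tab (\x09), LF (\x0A), and CR (\x0D),
// plus DEL (\x7F) and the C1 range (\x80-\x9F).
function stripControlChars(text, replacement) {
  if (replacement === undefined) {
    replacement = '';
  }
  return String(text).replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]/g, replacement);
}

// Hypothetical helper: maps common "smart" punctuation to plain ASCII.
function replaceSmartPunctuation(text) {
  return String(text)
    .replace(/[\u2018\u2019]/g, "'")   // curly single quotes
    .replace(/[\u201C\u201D]/g, '"')   // curly double quotes
    .replace(/[\u2013\u2014]/g, '-');  // en dash and em dash
}
```

Inside the script above, this would be applied as newMessage = replaceSmartPunctuation(stripControlChars(tempMessage, ' ')) before calling message.setNode.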

Here is a link to the Windows-1252 character set, which can help identify the hex value:

https://en.wikipedia.org/wiki/Windows-1252
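To find out which hex values are actually present in a message before looking them up, a small scan along these lines may help. This is a sketch under assumptions: findControlChars is a hypothetical helper name, and it only reports control characters (it treats tab, line feed, and carriage return as legitimate).

```javascript
// Hypothetical helper: lists each control character found in `text`
// with its position and two-digit hex value.
function findControlChars(text) {
  var found = [];
  var pattern = /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]/g;
  var match;
  while ((match = pattern.exec(text)) !== null) {
    var hex = match[0].charCodeAt(0).toString(16).toUpperCase();
    if (hex.length < 2) {
      hex = '0' + hex; // pad single-digit values such as 7 to 07
    }
    found.push({ index: match.index, hex: '0x' + hex });
  }
  return found;
}
```

Calling findControlChars(String(message.getNode('/'))) would return one entry per offending character, which can then be added to the character class in the removal script.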

answered Apr 22, 2020 by brandon-w-8204 (33,270 points)
selected Apr 22, 2020 by lewis-s-9714