This approach works with legacy .doc Word documents, not .docx, because it uses the HWPF classes in Apache POI, which read only the older binary format.
Apache POI is a library that can convert a Word document to HTML. It depends on Apache Commons Collections 4. To use these libraries, you will need to download them and install them in QIE.
Step 1) Download the following libraries and save them in the C:\ProgramData\QIE\Libs\ directory:
- Apache POI: https://mvnrepository.com/artifact/org.apache.poi/poi/3.16
- Apache POI excelant: https://mvnrepository.com/artifact/org.apache.poi/poi-excelant/3.16
- Apache POI ooxml: https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml/3.16
- Apache POI scratchpad: https://mvnrepository.com/artifact/org.apache.poi/poi-scratchpad/3.16
- Apache Commons Collections 4: https://mvnrepository.com/artifact/org.apache.commons/commons-collections4/4.1
Step 2) Navigate to System Configuration, scroll down to the 'External Libraries' section, and click 'Manage External Libraries'. Make sure that all five libraries are checked: 'poi-3.16.jar', 'poi-excelant-3.16.jar', 'poi-ooxml-3.16.jar', 'poi-scratchpad-3.16.jar', and 'commons-collections4-4.1.jar'. When you click the 'Update' button, you will be prompted to restart the service. Click 'Yes'.
Step 3) Create a mapping function that performs the conversion:
// read the Word document to be converted from disk
var wordDocument = org.apache.poi.hwpf.converter.WordToHtmlUtils.loadDoc(new java.io.FileInputStream("C:\\temp\\YN.doc"));
// alternatively, comment out the line above and un-comment the next line to convert a byte array that you already have
// var wordDocument = org.apache.poi.hwpf.converter.WordToHtmlUtils.loadDoc(new java.io.ByteArrayInputStream(myByteArray));
// the conversion builds its output as a DOM tree, so create a new, empty DOM document for it
var newDocument = javax.xml.parsers.DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
// create a new WordToHtmlConverter object (Apache POI) that writes into that DOM document
var wordToHtmlConverter = new org.apache.poi.hwpf.converter.WordToHtmlConverter(newDocument);
// load the word document into the converter
wordToHtmlConverter.processDocument(wordDocument);
// extract the new html document
var htmlDocument = wordToHtmlConverter.getDocument();
// convert the poi html document object to a string
var out = new java.io.ByteArrayOutputStream();
var domSource = new javax.xml.transform.dom.DOMSource(htmlDocument);
var streamResult = new javax.xml.transform.stream.StreamResult(out);
var tf = javax.xml.transform.TransformerFactory.newInstance();
var serializer = tf.newTransformer();
serializer.setOutputProperty(javax.xml.transform.OutputKeys.ENCODING, "UTF-8");
serializer.setOutputProperty(javax.xml.transform.OutputKeys.INDENT, "yes");
serializer.setOutputProperty(javax.xml.transform.OutputKeys.METHOD, "html");
serializer.transform(domSource, streamResult);
out.close();
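// decode with the same charset the serializer wrote (UTF-8) instead of the platform default
var result = new java.lang.String(out.toByteArray(), "UTF-8");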
// place the result where it needs to go
message = qie.createTextMessage(result, 'UTF-8');
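
Because the conversion above handles only .doc input, it can help to check the bytes before handing them to the converter. The following is a minimal sketch, not part of QIE or Apache POI: `sniffContainerFormat` is a hypothetical helper name, and it assumes you already have the document as a byte array (such as `myByteArray` above). It looks only at the leading signature bytes: legacy .doc files are stored in an OLE2 compound file, while .docx files are ZIP packages. The same OLE2 signature is also used by .xls and .ppt files, so a match means "could be a .doc", not "is definitely a .doc".

```javascript
// Hypothetical helper (not a QIE or Apache POI function): inspects the first
// bytes of a document and reports which container format they indicate.
function sniffContainerFormat(bytes) {
    // OLE2 compound file signature, used by legacy .doc (also .xls and .ppt)
    var OLE2_MAGIC = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];
    // ZIP local file header signature, used by .docx packages
    var ZIP_MAGIC = [0x50, 0x4B, 0x03, 0x04];

    function startsWith(magic) {
        if (bytes == null || bytes.length < magic.length) {
            return false;
        }
        for (var i = 0; i < magic.length; i++) {
            // Java bytes are signed, so mask to compare as unsigned values
            if ((bytes[i] & 0xFF) !== magic[i]) {
                return false;
            }
        }
        return true;
    }

    if (startsWith(OLE2_MAGIC)) {
        return 'ole2';
    }
    if (startsWith(ZIP_MAGIC)) {
        return 'zip';
    }
    return 'unknown';
}

// Possible usage before calling loadDoc (assumes myByteArray holds the file):
// if (sniffContainerFormat(myByteArray) !== 'ole2') {
//     throw 'Input is not an OLE2 file, so it cannot be a .doc document';
// }
```

Returning a short label rather than a boolean lets the calling script give a more specific error, for example telling the sender that a .docx file was received.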