
Can I use QIE to convert a word document to HTML?

0 votes
asked Jul 7, 2017 by ben-s-7515 (12,640 points)
I need to take a word document and convert it to HTML.  Can this be done using QIE?

2 Answers

+1 vote
Best answer

This will work with .doc type word documents, not .docx.

Apache POI is a library that can convert a Word document to HTML.  It depends on Apache Commons Collections 4.  To use these libraries you will need to download them and install them in QIE.

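Step 1) Download the following libraries and save them in the C:\ProgramData\QIE\Libs\ directory
   - Apache POI (poi-3.16.jar)
   - Apache POI excelant (poi-excelant-3.16.jar)
   - Apache POI ooxml (poi-ooxml-3.16.jar)
   - Apache POI scratchpad (poi-scratchpad-3.16.jar)
   - Apache Commons Collections 4 (commons-collections4-4.1.jar)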

Step 2) Navigate to System Configuration, and scroll down to the 'External Libraries' section, then click on 'Manage External Libraries'.  Make sure that you check all five jars: 'poi-3.16.jar', 'poi-excelant-3.16.jar', 'poi-ooxml-3.16.jar', 'poi-scratchpad-3.16.jar', and 'commons-collections4-4.1.jar'.  When you select the 'Update' button you will be prompted to restart the service.  Click 'Yes'.
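
If the conversion later fails with a class-not-found error, a missing jar is a likely cause. The sketch below is only an illustration: it takes a list of file names (for example, a listing of the Libs directory that you obtain yourself) and reports which of the five jars named above are absent. The names REQUIRED_JARS and findMissingJars are made up for this example and are not part of QIE.

```javascript
// Jar file names taken from Step 2 above.
var REQUIRED_JARS = [
    'poi-3.16.jar',
    'poi-excelant-3.16.jar',
    'poi-ooxml-3.16.jar',
    'poi-scratchpad-3.16.jar',
    'commons-collections4-4.1.jar'
];

// Hypothetical helper: returns the required jar names that do not
// appear in presentFileNames (an array of file name strings).
function findMissingJars(presentFileNames) {
    var missing = [];
    for (var i = 0; i < REQUIRED_JARS.length; i++) {
        var found = false;
        for (var j = 0; j < presentFileNames.length; j++) {
            // String(...) so that Java strings and JavaScript strings compare equally
            if (String(presentFileNames[j]) === REQUIRED_JARS[i]) {
                found = true;
                break;
            }
        }
        if (!found) {
            missing.push(REQUIRED_JARS[i]);
        }
    }
    return missing;
}
```

An empty array back means all five names were found in the list you passed in.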

Step 3) Create a mapping function that will do the work for you.

// read the document that will be converted from disk.
var wordDocument = org.apache.poi.hwpf.converter.WordToHtmlUtils.loadDoc(new java.io.FileInputStream("C:\\temp\\YN.doc"));

// alternatively you can comment the above line and un-comment the next line to convert a set of bytes that you already have.
// 'docBytes' is a placeholder name for your own byte array.
// var wordDocument = org.apache.poi.hwpf.converter.WordToHtmlUtils.loadDoc(new java.io.ByteArrayInputStream(docBytes));

// the conversion will use the DOM, so we will new up a document
var newDocument = javax.xml.parsers.DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
// create a new wordtohtmlconverter object using apache poi
var wordToHtmlConverter = new org.apache.poi.hwpf.converter.WordToHtmlConverter(newDocument);
// load the word document into the converter
wordToHtmlConverter.processDocument(wordDocument);
// extract the new html document
var htmlDocument = wordToHtmlConverter.getDocument();

// convert the poi html document object to a string
var out = new java.io.ByteArrayOutputStream();
var domSource = new javax.xml.transform.dom.DOMSource(htmlDocument);
var streamResult = new javax.xml.transform.stream.StreamResult(out);
var tf = javax.xml.transform.TransformerFactory.newInstance();
var serializer = tf.newTransformer();
serializer.setOutputProperty(javax.xml.transform.OutputKeys.ENCODING, "UTF-8");
serializer.setOutputProperty(javax.xml.transform.OutputKeys.INDENT, "yes");
serializer.setOutputProperty(javax.xml.transform.OutputKeys.METHOD, "html");
serializer.transform(domSource, streamResult);

// decode with the same encoding the serializer was told to write
var result = new java.lang.String(out.toByteArray(), "UTF-8");

// place the result where it needs to go
message = qie.createTextMessage(result, 'UTF-8');
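
Since this approach only handles .doc, it may help to check what you actually have before converting, especially when working from bytes rather than a file name. Legacy .doc files are OLE2 compound files and begin with the bytes D0 CF 11 E0 A1 B1 1A E1, while .docx files are ZIP archives and begin with 50 4B 03 04. The sketch below checks those leading bytes. The function names are made up for this example, and the check only identifies the container type: other OLE2 or ZIP files (for example .xls or a plain .zip) will match too.

```javascript
// Leading bytes of an OLE2 compound file (legacy .doc) and of a ZIP archive (.docx).
var OLE2_MAGIC = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];
var ZIP_MAGIC = [0x50, 0x4B, 0x03, 0x04];

// Hypothetical helper: true when 'bytes' begins with the values in 'magic'.
function startsWithBytes(bytes, magic) {
    if (bytes.length < magic.length) {
        return false;
    }
    for (var i = 0; i < magic.length; i++) {
        // mask to 0..255 so that signed Java bytes compare correctly
        if ((bytes[i] & 0xFF) !== magic[i]) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: returns 'doc' for an OLE2 container, 'docx' for a
// ZIP container, and 'unknown' for anything else.
function detectWordFormat(bytes) {
    if (startsWithBytes(bytes, OLE2_MAGIC)) {
        return 'doc';
    }
    if (startsWithBytes(bytes, ZIP_MAGIC)) {
        return 'docx';
    }
    return 'unknown';
}
```

A result of 'doc' suggests the Apache POI approach in this answer applies. A result of 'docx' means the document needs a different converter.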

answered Jul 7, 2017 by ben-s-7515 (12,640 points)
selected Jul 7, 2017 by ben-s-7515
0 votes

This will work with .docx type word documents, not .doc.

NOTE: This moves all of the text to the HTML document, but you will not get any graphics.

Using Mammoth, a third-party library from zwobble, you can convert a .docx document to HTML.  This jar will need to be downloaded.

Step 1) Download jar from

Step 2) Navigate to System Configuration, and scroll down to the 'External Libraries' section, then click on 'Manage External Libraries'.  Make sure that you check 'mammoth-1.3.1'.  When you select the 'Update' button you will be prompted to restart the service.  Click 'Yes'.

Step 3) Create a mapping function that will do the work for you.

// this requires the 'Zwobble Mammoth' library

// read the document that will be converted from disk.
var wordDocument = new java.io.File("C:\\temp\\YN.docx");

// alternatively you can comment the above line and un-comment the next line to convert a set of bytes that you already have.
// 'docBytes' is a placeholder name for your own byte array.
// var wordDocument = new java.io.ByteArrayInputStream(docBytes);

var converter = new org.zwobble.mammoth.DocumentConverter();
var result = converter.convertToHtml(wordDocument);

// place the result where it needs to go
message = qie.createTextMessage(result.getValue(), 'UTF-8');
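
Besides getValue(), the Mammoth result also exposes getWarnings(), which returns the warnings generated during conversion. If content seems to be missing from the HTML, looking at those warnings may help. The sketch below collects them into a plain JavaScript array alongside the HTML. The function name summarizeConversion is made up for this example, and it assumes the result object behaves like the one returned by convertToHtml above.

```javascript
// Hypothetical helper: pulls the HTML and any warnings out of a Mammoth
// result object (anything with getValue() and getWarnings()).
function summarizeConversion(result) {
    var warnings = [];
    // getWarnings() returns a Java Set, so walk it with its iterator
    var it = result.getWarnings().iterator();
    while (it.hasNext()) {
        warnings.push(String(it.next()));
    }
    return {
        html: String(result.getValue()),
        warnings: warnings
    };
}
```

In the mapping function you could call summarizeConversion(result) after convertToHtml and decide what to do with a non-empty warnings array before creating the message.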

answered Jul 7, 2017 by ben-s-7515 (12,640 points)