
Can I use QIE to convert a Word document to HTML?

0 votes
794 views
asked Jul 7, 2017 by ben-s-7515 (12,640 points)
I need to take a Word document and convert it to HTML. Can this be done using QIE?

2 Answers

+1 vote
 
Best answer

This approach works with .doc Word documents, not .docx.

Apache POI is a library that can convert a Word document to HTML.  It has a dependency on Apache Commons Collections 4.  To use these libraries you will need to download and install them in QIE.

Step 1) Download the following libraries and save them in the C:\ProgramData\QIE\Libs\ directory
   - Apache POI: https://mvnrepository.com/artifact/org.apache.poi/poi/3.16
   - Apache POI excelant: https://mvnrepository.com/artifact/org.apache.poi/poi-excelant/3.16
   - Apache POI ooxml: https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml/3.16
   - Apache POI scratchpad: https://mvnrepository.com/artifact/org.apache.poi/poi-scratchpad/3.16
   - Apache Commons Collections 4: https://mvnrepository.com/artifact/org.apache.commons/commons-collections4/4.1

Step 2) Navigate to System Configuration, scroll down to the 'External Libraries' section, then click on 'Manage External Libraries'.  Make sure that you check all five jars: 'poi-3.16.jar', 'poi-excelant-3.16.jar', 'poi-ooxml-3.16.jar', 'poi-scratchpad-3.16.jar', and 'commons-collections4-4.1.jar'.  When you select the 'Update' button you will be prompted to restart the service.  Click 'Yes'.
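
If a jar does not show up in the list, it may help to compare the file names in the Libs directory against the five expected names before restarting. The following is only a sketch in plain JavaScript: `missingJars` is a hypothetical helper (not part of QIE), and it assumes you already have the directory's file names as an array of strings.

```javascript
// Jar file names expected from Step 1 (versions as linked above).
var REQUIRED_JARS = [
  'poi-3.16.jar',
  'poi-excelant-3.16.jar',
  'poi-ooxml-3.16.jar',
  'poi-scratchpad-3.16.jar',
  'commons-collections4-4.1.jar'
];

// Hypothetical helper: returns the required jars that are not in installedNames.
// The comparison ignores case because Windows file names are case-insensitive.
function missingJars(installedNames) {
  var present = {};
  for (var i = 0; i < installedNames.length; i++) {
    present[String(installedNames[i]).toLowerCase()] = true;
  }
  var missing = [];
  for (var j = 0; j < REQUIRED_JARS.length; j++) {
    if (!present[REQUIRED_JARS[j].toLowerCase()]) {
      missing.push(REQUIRED_JARS[j]);
    }
  }
  return missing;
}
```

An empty array means every expected jar was found by name; it says nothing about whether the files themselves are intact.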

Step 3) Create a mapping function that will do the work for you.

// read the document that will be converted from disk.
var wordDocument = org.apache.poi.hwpf.converter.WordToHtmlUtils.loadDoc(new java.io.FileInputStream("C:\\temp\\YN.doc"));

// alternatively, comment out the line above and uncomment the next line to convert bytes that you already have.
// var wordDocument = org.apache.poi.hwpf.converter.WordToHtmlUtils.loadDoc(new java.io.ByteArrayInputStream(myByteArray));

// the conversion uses the DOM, so create a new, empty DOM document
var newDocument = javax.xml.parsers.DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
// create a new wordtohtmlconverter object using apache poi
var wordToHtmlConverter = new org.apache.poi.hwpf.converter.WordToHtmlConverter(newDocument);
// load the word document into the converter
wordToHtmlConverter.processDocument(wordDocument);
// extract the new html document
var htmlDocument = wordToHtmlConverter.getDocument();

// convert the poi html document object to a string
var out = new java.io.ByteArrayOutputStream();
var domSource = new javax.xml.transform.dom.DOMSource(htmlDocument);
var streamResult = new javax.xml.transform.stream.StreamResult(out);
var tf = javax.xml.transform.TransformerFactory.newInstance();
var serializer = tf.newTransformer();
serializer.setOutputProperty(javax.xml.transform.OutputKeys.ENCODING, "UTF-8");
serializer.setOutputProperty(javax.xml.transform.OutputKeys.INDENT, "yes");
serializer.setOutputProperty(javax.xml.transform.OutputKeys.METHOD, "html");
serializer.transform(domSource, streamResult);
out.close();

// decode with the same encoding the serializer wrote (UTF-8), not the platform default
var result = new java.lang.String(out.toByteArray(), "UTF-8");

// place the result where it needs to go
message = qie.createTextMessage(result, 'UTF-8');
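
One thing the script above does not do is close the FileInputStream it opens. A small wrapper along these lines could handle that. This is a sketch, not a QIE feature: `withStream` is a hypothetical helper name, and it assumes the stream's contents are fully read before `use` returns (which, as far as I know, is the case for loadDoc).

```javascript
// Hypothetical helper: opens a stream, hands it to `use`, and always closes it,
// even if `use` throws.
function withStream(open, use) {
  var stream = open();
  try {
    return use(stream);
  } finally {
    stream.close();
  }
}

// In the mapping function it might be used like this:
// var wordDocument = withStream(
//   function () { return new java.io.FileInputStream("C:\\temp\\YN.doc"); },
//   function (input) { return org.apache.poi.hwpf.converter.WordToHtmlUtils.loadDoc(input); }
// );
```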

answered Jul 7, 2017 by ben-s-7515 (12,640 points)
selected Jul 7, 2017 by ben-s-7515
0 votes

This approach works with .docx Word documents, not .doc.

NOTE: This moves all of the text to the HTML document, but you will not get any graphics.

Using Mammoth, a third-party library from zwobble, you can convert a .docx document to HTML.  This jar will need to be downloaded.

Step 1) Download the jar from http://search.maven.org/#search%7Cga%7C1%7Corg.zwobble.mammoth

Step 2) Navigate to System Configuration, and scroll down to the 'External Libraries' section, then click on 'Manage External Libraries'.  Make sure that you check 'mammoth-1.3.1'.  When you select the 'Update' button you will be prompted to restart the service.  Click 'Yes'.

Step 3) Create a mapping function that will do the work for you.

// this requires 'Zwobble Mammoth' http://search.maven.org/#search%7Cga%7C1%7Corg.zwobble.mammoth


// read the document that will be converted from disk.
var wordDocument = new java.io.FileInputStream("C:\\temp\\YN.docx");

// alternatively, comment out the line above and uncomment the next line to convert bytes that you already have.
// var wordDocument = new java.io.ByteArrayInputStream(myByteArray);

var converter = new org.zwobble.mammoth.DocumentConverter();
var result = converter.convertToHtml(wordDocument);

// place the result where it needs to go
message = qie.createTextMessage(result.getValue(), 'UTF-8');
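
Since this answer only handles .docx and the POI answer only handles .doc, a channel that receives both could look at the first few bytes to decide which converter to call. The sketch below is an assumption-laden illustration: `detectWordFormat` and `startsWith` are hypothetical helpers, and a match only identifies the container (OLE2 or ZIP), not that the file is really a Word document.

```javascript
// Leading bytes of each container format.
var OLE2_MAGIC = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1]; // legacy .doc (OLE2)
var ZIP_MAGIC = [0x50, 0x4B, 0x03, 0x04];                          // .docx (ZIP package)

// Hypothetical helper: true if `bytes` begins with the values in `magic`.
function startsWith(bytes, magic) {
  if (bytes.length < magic.length) {
    return false;
  }
  for (var i = 0; i < magic.length; i++) {
    // Mask to 0..255 because Java byte values are signed.
    if ((bytes[i] & 0xFF) !== magic[i]) {
      return false;
    }
  }
  return true;
}

// Hypothetical helper: returns 'doc', 'docx', or 'unknown'.
function detectWordFormat(bytes) {
  if (startsWith(bytes, OLE2_MAGIC)) {
    return 'doc';
  }
  if (startsWith(bytes, ZIP_MAGIC)) {
    return 'docx';
  }
  return 'unknown';
}
```

With a byte array already in hand (myByteArray in the scripts above), 'doc' would route to the POI script and 'docx' to the Mammoth script.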

answered Jul 7, 2017 by ben-s-7515 (12,640 points)