java - Docx4j difference between two Word docs
I need to check the differences between two Word .docx files. I am using docx4j. At first I had to alter SmartXMLFormatter:
    public SmartXMLFormatter(Writer w) throws IOException {
        this.xml = new XMLWriterNSImpl(w, false);
        if (this.writeXMLDeclaration) {
            this.xml.xmlDecl();
            this.writeXMLDeclaration = false;
        }
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "w");
        this.xml.setPrefixMapping("http://schemas.microsoft.com/office/word/2010/wordml", "w14");
        this.xml.setPrefixMapping("http://schemas.microsoft.com/office/word/2012/wordml", "w15");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/officeDocument/2006/relationships", "r");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing", "wp");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/main", "a");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/picture", "pic");
        this.xml.setPrefixMapping(Constants.BASE_NS_URI, "dfx");
        this.xml.setPrefixMapping(Constants.DELETE_NS_URI, "del");
        this.xml.setPrefixMapping(Constants.INSERT_NS_URI, "ins");
    }
After I changed the code, it works fine for documents without Russian letters. But when I diff two .docx documents containing Russian characters, the following exception is raised:
    org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; The prefix "w14" for attribute "w14:paraId" associated with an element type "w:p" is not bound.
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(Unknown Source)
        at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(Unknown Source)
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
        at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:381)
        at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:361)
        at docx4jdiff.CompareDocumentsUsingDriver.main(CompareDocumentsUsingDriver.java:88)
    Exception in thread "main" javax.xml.bind.UnmarshalException
     - with linked exception:
    [org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; The prefix "w14" for attribute "w14:paraId" associated with an element type "w:p" is not bound.]
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(Unknown Source)
        at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.createUnmarshalException(Unknown Source)
        at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(Unknown Source)
        at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(Unknown Source)
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
        at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:381)
        at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:361)
        at docx4jdiff.CompareDocumentsUsingDriver.main(CompareDocumentsUsingDriver.java:88)
    Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; The prefix "w14" for attribute "w14:paraId" associated with an element type "w:p" is not bound.
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        ... 7 more
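To see what the parser is complaining about, here is a minimal sketch that uses only the JDK (no docx4j or diffx). The class name `UnboundPrefixDemo` and the `paraId` value are made up for illustration. It feeds the JDK's built-in namespace-aware SAX parser, the same Xerces parser that appears in the trace, two tiny documents: one where `w14:paraId` is used but `w14` is never declared, and one where the root element declares it.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class UnboundPrefixDemo {

    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String W14_NS = "http://schemas.microsoft.com/office/word/2010/wordml";

    // w14:paraId is used, but only the w prefix is declared on the root
    static final String MISSING =
            "<w:body xmlns:w=\"" + W_NS + "\"><w:p w14:paraId=\"1A2B3C4D\"/></w:body>";

    // Same content, with xmlns:w14 declared on the root element
    static final String DECLARED =
            "<w:body xmlns:w=\"" + W_NS + "\" xmlns:w14=\"" + W14_NS + "\">"
            + "<w:p w14:paraId=\"1A2B3C4D\"/></w:body>";

    /** Returns null if the XML parses, otherwise the parser's (locale-dependent) error message. */
    static String parseError(String xml) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // unbound prefixes are only detected in namespace-aware mode
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return String.valueOf(e.getMessage());
        } catch (Exception e) {
            throw new IllegalStateException("Parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("missing:  " + parseError(MISSING));
        System.out.println("declared: " + parseError(DECLARED));
    }
}
```

The first call should report the unbound `w14` prefix (in the JVM's locale, so German on the machine above); the second should return `null`. In other words, the diff output at column 10510 contains a `w14:` attribute on a `w:p` with no `xmlns:w14` declaration in scope at that point.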
So please, can you help me?
Here is my main code:
    public class CompareDocumentsUsingDriver {

        public static JAXBContext context = org.docx4j.jaxb.Context.jc;

        /**
         * @param args
         */
        public static void main(String[] args) throws Exception {

            System.setProperty("file.encoding", "UTF-8");

            String newerfilepath = "b.docx";
            String olderfilepath = "a.docx";

            // 1. Load the packages
            WordprocessingMLPackage newerPackage = WordprocessingMLPackage
                    .load(new java.io.File(newerfilepath));
            WordprocessingMLPackage olderPackage = WordprocessingMLPackage
                    .load(new java.io.File(olderfilepath));

            Body newerBody = ((Document) newerPackage.getMainDocumentPart()
                    .getJaxbElement()).getBody();
            Body olderBody = ((Document) olderPackage.getMainDocumentPart()
                    .getJaxbElement()).getBody();

            System.out.println("Differencing..");

            // 2. Do the differencing
            StringWriter sw = new StringWriter();
            Docx4jDriver.diff(XmlUtils.marshaltoW3CDomDocument(newerBody)
                    .getDocumentElement(), XmlUtils.marshaltoW3CDomDocument(olderBody)
                    .getDocumentElement(), sw);
            // The signature which takes Reader objects appears to be broken

            // 3. Get the result
            String contentStr = sw.toString();
            System.out.println("Result: \n\n " + contentStr);
            Body newBody = (Body) XmlUtils.unwrap(XmlUtils.unmarshalString(contentStr));

            // In the general case, you need to handle relationships. Not done here!
            // RelationshipsPart rp =
            //     newerPackage.getMainDocumentPart().getRelationshipsPart();
            // handleRels(pd, rp);

            // Put the differenced body into the package; without this,
            // the saved file is just an unchanged copy of b.docx
            ((Document) newerPackage.getMainDocumentPart().getJaxbElement()).setBody(newBody);

            newerPackage.setFontMapper(new IdentityPlusMapper());

            newerPackage.save(new java.io.File("compared.docx"));
        }

        /**
         * In the general case, you need to handle relationships. Although not
         * necessary in this simple example, we do it anyway for purposes of
         * illustration.
         */
        private static void handleRels(Differencer pd, RelationshipsPart rp) {

            // Since we are going to add rels appropriate to the docs being
            // compared, for neatness and to avoid duplication
            // (duplication of internal part names is fatal in Word,
            // and since the export XSLT makes images internal, we can't avoid
            // duplicating the part),
            // remove any existing rels which point to images.
            // Collect them first so the list is not modified while iterating.
            List<Relationship> relsToRemove = new ArrayList<Relationship>();
            for (Relationship r : rp.getRelationships().getRelationship()) {
                // Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
                if (r.getType().equals(Namespaces.IMAGE)) {
                    relsToRemove.add(r);
                }
            }
            for (Relationship r : relsToRemove) {
                rp.removeRelationship(r);
            }

            // Now add the rels we composed
            List<Relationship> newRels = pd.getComposedRels();
            for (Relationship nr : newRels) {
                rp.addRelationship(nr);
            }
        }
    }
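Because the whole diff result is printed on a single line, `columnNumber: 10510` is hard to locate by eye. One way to narrow it down, before `XmlUtils.unmarshalString(contentStr)` is called, is to scan the result for prefixes that are used without an `xmlns:` declaration in scope. The sketch below is a hypothetical diagnostic helper (`UndeclaredPrefixFinder` is not part of docx4j or diffx). It parses with namespace awareness turned off, so the unbound prefix does not abort parsing and `xmlns:*` declarations arrive as ordinary attributes, and it tracks declarations per element scope. It assumes the input is otherwise well-formed and ignores default (unprefixed) namespaces.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class UndeclaredPrefixFinder extends DefaultHandler {

    // One set of declared prefixes per open element (innermost on top)
    private final Deque<Set<String>> scopes = new ArrayDeque<>();
    private final Set<String> undeclared = new TreeSet<>();

    /** Returns the prefixes used in xml without an in-scope xmlns:prefix declaration. */
    public static Set<String> find(String xml) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(false); // report xmlns:* as plain attributes instead of failing
        UndeclaredPrefixFinder handler = new UndeclaredPrefixFinder();
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return handler.undeclared;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Declarations on this element are in scope for the element itself
        Set<String> declaredHere = new HashSet<>();
        for (int i = 0; i < atts.getLength(); i++) {
            String name = atts.getQName(i);
            if (name.startsWith("xmlns:")) {
                declaredHere.add(name.substring("xmlns:".length()));
            }
        }
        scopes.push(declaredHere);
        check(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            String name = atts.getQName(i);
            if (!name.equals("xmlns") && !name.startsWith("xmlns:")) {
                check(name);
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        scopes.pop();
    }

    private void check(String qName) {
        int colon = qName.indexOf(':');
        if (colon < 0) {
            return; // unprefixed name: nothing to bind
        }
        String prefix = qName.substring(0, colon);
        if (prefix.equals("xml")) {
            return; // the xml prefix is bound implicitly
        }
        for (Set<String> scope : scopes) {
            if (scope.contains(prefix)) {
                return;
            }
        }
        undeclared.add(prefix);
    }
}
```

Used in the main code, something like `System.out.println(UndeclaredPrefixFinder.find(contentStr));` right after printing the result would list the offending prefixes (for the exception above, `w14` should be among them).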
Best regards,
Tim
Edit:
    public static void openResult(String nodename, Writer out) throws IOException {
        // In general, we need to avoid writing directly to Writer out...
        // since it can happen before the formatter output gets there,
        // and then the namespaces are not declared.
        // 4 options:
        // 1:
        // OpenElementEvent containerOpen = new OpenElementEventNSImpl(xml1.getNamespaceURI(), rootNodeName);
        // formatter.format(containerOpen);
        // // AttributeEvent wNS = new AttributeEventNSImpl("http://www.w3.org/2000/xmlns/", "w",
        // //     "http://schemas.openxmlformats.org/wordprocessingml/2006/main");
        // // formatter.format(wNS);
        // but the AttributeEvent comes too late in the process to set the mapping,
        // so you can comment it out.
        // You still have to add the w: and other namespaces in the
        // SmartXMLFormatter constructor, so you may as well do 2.:
        // 2: stick the known namespaces on our root element above
        // 3: fix SmartXMLFormatter
        // Go with option 2 .. since it is clear
        out.append("<" + nodename
                + " xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\"" // w: namespace
                + " xmlns:a=\"http://schemas.openxmlformats.org/drawingml/2006/main\""
                + " xmlns:pic=\"http://schemas.openxmlformats.org/drawingml/2006/picture\""
                + " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\""
                + " xmlns:v=\"urn:schemas-microsoft-com:vml\""
                + " xmlns:w14=\"http://schemas.microsoft.com/office/word/2010/wordml\""
                + " xmlns:w15=\"http://schemas.microsoft.com/office/word/2012/wordml\""
                + " xmlns:w10=\"urn:schemas-microsoft-com:office:word\""
                + " xmlns:wp=\"http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing\""
                // Add these, since SmartXMLFormatter only writes them on the first fragment
                + " xmlns:dfx=\"" + Constants.BASE_NS_URI + "\""
                + " xmlns:del=\"" + Constants.DELETE_NS_URI + "\""
                // ins: must map to the insert namespace, matching the SmartXMLFormatter mapping
                + " xmlns:ins=\"" + Constants.INSERT_NS_URI + "\""
                + " >"
        );
    }
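The namespace list now lives in two places, the `SmartXMLFormatter` constructor and `openResult`, and a prefix such as `w14` missing from either one can be enough to produce the "not bound" error. A sketch of one way to keep them in sync, assuming a single prefix-to-URI map is acceptable in both places: build the root start tag from that map. `RootElementWriter`, `WORD_NAMESPACES` and `openTag` are hypothetical names. The diffx `dfx`/`del`/`ins` entries would be added from `Constants.BASE_NS_URI`, `Constants.DELETE_NS_URI` and `Constants.INSERT_NS_URI`; they are left out here only so the example runs without diffx on the classpath.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public final class RootElementWriter {

    /** Prefix -> namespace URI for the Word namespaces declared in openResult (insertion order kept). */
    static final Map<String, String> WORD_NAMESPACES;
    static {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");
        m.put("a", "http://schemas.openxmlformats.org/drawingml/2006/main");
        m.put("pic", "http://schemas.openxmlformats.org/drawingml/2006/picture");
        m.put("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
        m.put("v", "urn:schemas-microsoft-com:vml");
        m.put("w14", "http://schemas.microsoft.com/office/word/2010/wordml");
        m.put("w15", "http://schemas.microsoft.com/office/word/2012/wordml");
        m.put("w10", "urn:schemas-microsoft-com:office:word");
        m.put("wp", "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing");
        WORD_NAMESPACES = Collections.unmodifiableMap(m);
    }

    private RootElementWriter() {}

    /** Builds the start tag "<nodeName xmlns:p="uri" ...>" declaring every prefix in the map. */
    static String openTag(String nodeName, Map<String, String> prefixToUri) {
        StringBuilder sb = new StringBuilder("<").append(nodeName);
        for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
            sb.append(" xmlns:").append(e.getKey())
              .append("=\"").append(escapeAttr(e.getValue())).append('"');
        }
        return sb.append('>').toString();
    }

    /** Escapes the characters that may not appear raw in a double-quoted attribute value. */
    private static String escapeAttr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(openTag("w:body", WORD_NAMESPACES));
    }
}
```

`openResult` could then call `out.append(RootElementWriter.openTag(nodename, namespaces))`, and the same map could drive the `this.xml.setPrefixMapping(uri, prefix)` calls in the constructor (that method takes the URI first, then the prefix), so both places always declare the same set.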