Breeding: java - Docx4j difference between two Word docs

I need to check the differences between two Word .docx files. I am using docx4j. At first I had to alter SmartXMLFormatter:

    public SmartXMLFormatter(Writer w) throws IOException {
        this.xml = new XMLWriterNSImpl(w, false);
        if (this.writeXMLDeclaration) {
            this.xml.xmlDecl();
            this.writeXMLDeclaration = false;
        }

        // Prefix mappings added for the WordprocessingML and DrawingML namespaces
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "w");
        this.xml.setPrefixMapping("http://schemas.microsoft.com/office/word/2010/wordml", "w14");
        this.xml.setPrefixMapping("http://schemas.microsoft.com/office/word/2012/wordml", "w15");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/officeDocument/2006/relationships", "r");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing", "wp");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/main", "a");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/picture", "pic");

        // Prefix mappings for the diff markup itself
        this.xml.setPrefixMapping(Constants.BASE_NS_URI, "dfx");
        this.xml.setPrefixMapping(Constants.DELETE_NS_URI, "del");
        this.xml.setPrefixMapping(Constants.INSERT_NS_URI, "ins");
    }

After I changed the code, documents without Russian letters work fine. When I diff two .docx documents that contain Russian characters, the following exception is raised:

    org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; The prefix "w14" for attribute "w14:paraId" associated with an element type "w:p" is not bound.
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(Unknown Source)
        at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(Unknown Source)
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
        at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:381)
        at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:361)
        at docx4jdiff.CompareDocumentsUsingDriver.main(CompareDocumentsUsingDriver.java:88)
    Exception in thread "main" javax.xml.bind.UnmarshalException
     - with linked exception:
    [org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; The prefix "w14" for attribute "w14:paraId" associated with an element type "w:p" is not bound.]
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(Unknown Source)
        at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.createUnmarshalException(Unknown Source)
        at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(Unknown Source)
        at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(Unknown Source)
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
        at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:381)
        at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:361)
        at docx4jdiff.CompareDocumentsUsingDriver.main(CompareDocumentsUsingDriver.java:88)
    Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; The prefix "w14" for attribute "w14:paraId" associated with an element type "w:p" is not bound.
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        ... 7 more
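The unbound-prefix error in the trace above can probably be reproduced without docx4j at all, which may help narrow down where the `w14` declaration goes missing. The following is a minimal sketch using only the JDK's XML parser; the class name `UnboundPrefixDemo` and the two sample strings are made up for illustration and are not taken from the documents being compared.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class UnboundPrefixDemo {

    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String W14_NS = "http://schemas.microsoft.com/office/word/2010/wordml";

    // Hypothetical fragment: w14:paraId is used, but xmlns:w14 is never declared.
    static final String UNBOUND =
            "<w:body xmlns:w=\"" + W_NS + "\">"
            + "<w:p w14:paraId=\"1A2B3C4D\"/>"
            + "</w:body>";

    // Same fragment with the w14 prefix declared on the root element.
    static final String BOUND =
            "<w:body xmlns:w=\"" + W_NS + "\" xmlns:w14=\"" + W14_NS + "\">"
            + "<w:p w14:paraId=\"1A2B3C4D\"/>"
            + "</w:body>";

    /** Returns true if the XML is accepted by a namespace-aware parser. */
    static boolean parses(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // JAXB unmarshalling is namespace-aware too
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors and keeps the parser from printing to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("unbound prefix parses: " + parses(UNBOUND));
        System.out.println("bound prefix parses:   " + parses(BOUND));
    }
}
```

If the string passed to `XmlUtils.unmarshalString` looks like the first sample, the diff output is missing an `xmlns:w14` declaration somewhere above the `w:p` element at column 10510.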

Can anyone please help me?

Here is the main code:

    public class CompareDocumentsUsingDriver {

        public static JAXBContext context = org.docx4j.jaxb.Context.jc;

        /**
         * @param args
         */
        public static void main(String[] args) throws Exception {
            System.setProperty("file.encoding", "UTF-8");

            String newerFilePath = "b.docx";
            String olderFilePath = "a.docx";

            // 1. Load the packages
            WordprocessingMLPackage newerPackage = WordprocessingMLPackage
                    .load(new java.io.File(newerFilePath));
            WordprocessingMLPackage olderPackage = WordprocessingMLPackage
                    .load(new java.io.File(olderFilePath));

            Body newerBody = ((Document) newerPackage.getMainDocumentPart()
                    .getJaxbElement()).getBody();
            Body olderBody = ((Document) olderPackage.getMainDocumentPart()
                    .getJaxbElement()).getBody();

            System.out.println("Differencing..");

            // 2. Do the differencing
            StringWriter sw = new StringWriter();

            Docx4jDriver.diff(XmlUtils.marshaltoW3CDomDocument(newerBody)
                    .getDocumentElement(),
                    XmlUtils.marshaltoW3CDomDocument(olderBody)
                            .getDocumentElement(), sw);
            // The signature which takes Reader objects appears to be broken

            // 3. Get the result
            String contentStr = sw.toString();
            System.out.println("Result: \n\n " + contentStr);

            Body newBody = (Body) XmlUtils.unwrap(XmlUtils.unmarshalString(contentStr));

            // In the general case, you need to handle relationships. Not done here!
            // RelationshipsPart rp =
            // newerPackage.getMainDocumentPart().getRelationshipsPart();
            // handleRels(pd, rp);
            newerPackage.setFontMapper(new IdentityPlusMapper());
            newerPackage.save(new java.io.File("compared.docx"));
        }

        /**
         * In the general case, you need to handle relationships. Although not
         * necessary in this simple example, we do it anyway for the purposes of
         * illustration.
         */
        private static void handleRels(Differencer pd, RelationshipsPart rp) {
            // Since we are going to add rels appropriate to the docs being
            // compared, for neatness and to avoid duplication
            // (duplication of internal part names is fatal in Word,
            // and the export XSLT makes images internal, though it does avoid
            // duplicating a part),
            // remove any existing rels which point to images
            List<Relationship> relsToRemove = new ArrayList<Relationship>();
            for (Relationship r : rp.getRelationships().getRelationship()) {
                // Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
                if (r.getType().equals(Namespaces.IMAGE)) {
                    relsToRemove.add(r);
                }
            }
            for (Relationship r : relsToRemove) {
                rp.removeRelationship(r);
            }

            // Now add the rels we composed
            List<Relationship> newRels = pd.getComposedRels();
            for (Relationship nr : newRels) {
                rp.addRelationship(nr);
            }
        }
    }

Best regards,

Tim

Edit:

    public static void openResult(String nodename, Writer out) throws IOException {
        // In general, we need to avoid writing directly to Writer out...
        // since it can happen before the formatter output gets there

        // Namespaces not declared:
        // 4 options:
        // 1:
        // OpenElementEvent containerOpen = new OpenElementEventNSImpl(xml1.getNamespaceURI(), rootNodeName);
        // formatter.format(containerOpen);
        // // AttributeEvent wNS = new AttributeEventNSImpl("http://www.w3.org/2000/xmlns/" , "w",
        // //       "http://schemas.openxmlformats.org/wordprocessingml/2006/main");
        // // formatter.format(wNS);
        // But AttributeEvent is too late in the process to set the mapping,
        // so you can comment that out.
        // You still have to add w: and the other namespaces in the
        // SmartXMLFormatter constructor. So you may as well do 2.:
        // 2: stick all known namespaces on our root element above
        // 3: fix SmartXMLFormatter
        // Go with option 2 .. since it is clear
        out.append("<" + nodename
                + " xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""  // w: namespace
                + " xmlns:a=\"http://schemas.openxmlformats.org/drawingml/2006/main\""
                + " xmlns:pic=\"http://schemas.openxmlformats.org/drawingml/2006/picture\""
                + " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\""
                + " xmlns:v=\"urn:schemas-microsoft-com:vml\""
                + " xmlns:w14=\"http://schemas.microsoft.com/office/word/2010/wordml\""
                + " xmlns:w15=\"http://schemas.microsoft.com/office/word/2012/wordml\""
                + " xmlns:w10=\"urn:schemas-microsoft-com:office:word\""
                + " xmlns:wp=\"http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing\""
                + " xmlns:dfx=\"" + Constants.BASE_NS_URI + "\""  // Add these, since SmartXMLFormatter only writes them on the first fragment
                + " xmlns:del=\"" + Constants.DELETE_NS_URI + "\""
                + " xmlns:ins=\"" + Constants.BASE_NS_URI + "\""
                + " >");
    }
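To see which prefixes the diff result actually uses without declaring them, one option is to scan the result string with a parser that is not namespace-aware, so that it does not stop at the first unbound prefix. This is only a diagnostic sketch under stated assumptions: `UndeclaredPrefixFinder` and `undeclaredPrefixes` are hypothetical names, it relies on JDK classes only, and it ignores namespace scoping (a prefix declared anywhere in the document counts as declared).

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class UndeclaredPrefixFinder {

    /** Returns the prefixes used on elements or attributes but never declared via xmlns:prefix. */
    static Set<String> undeclaredPrefixes(String xml) throws Exception {
        final Set<String> used = new TreeSet<String>();
        final Set<String> declared = new TreeSet<String>();
        declared.add("xml"); // the xml prefix is bound implicitly

        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Without namespace processing, xmlns:* shows up as ordinary attributes
        // and unbound prefixes do not abort the parse.
        factory.setNamespaceAware(false);

        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                addPrefix(qName, used);
                for (int i = 0; i < atts.getLength(); i++) {
                    String name = atts.getQName(i);
                    if (name.startsWith("xmlns:")) {
                        declared.add(name.substring("xmlns:".length()));
                    } else if (!name.equals("xmlns")) {
                        addPrefix(name, used);
                    }
                }
            }
        });

        used.removeAll(declared);
        return used;
    }

    /** Adds the prefix of a qualified name such as "w14:paraId" to the target set, if it has one. */
    private static void addPrefix(String qName, Set<String> target) {
        int colon = qName.indexOf(':');
        if (colon > 0) {
            target.add(qName.substring(0, colon));
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample; in the question's code this would be contentStr.
        String sample = "<w:body xmlns:w=\"urn:example:w\"><w:p w14:paraId=\"1\"/></w:body>";
        System.out.println("Undeclared prefixes: " + undeclaredPrefixes(sample));
    }
}
```

Calling `undeclaredPrefixes(contentStr)` just before `XmlUtils.unmarshalString(contentStr)` would list any prefix that still needs a declaration on the root element written by `openResult`.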

java ms-word diff docx4j

Monday, 15 June 2015
