C# - HtmlAgilityPack using my own tags
I need to parse a few HTML elements into a list using Html Agility Pack and remove them from the document. I wrote the following code:

    HtmlDocument doc = new HtmlDocument();
    doc.Load(tempFileHtml);
    doc.OptionSupportOptionalEndTags = true;
    doc.OptionWriteEmptyNodes = true;
    List<HtmlNode> tagResolver = doc.DocumentNode.Descendants("link").ToList();
    for (int i = 0; i < tagResolver.Count; i++)
    {
        elements.Add(tagResolver[i].OuterHtml);
        tagResolver[i].Remove();
    }
    doc.Save(tempFileHtml, Encoding.GetEncoding(HtmlToPdf.DefaultEncoding));

The problem is that the starting HTML file looks like this:

    <table>
      <loop>
        <tr>
          <td>{code}</td>
        </tr>
      </loop>
    </table>

and after doc.Save() the file looks like this:

    <table>
      <loop>
      </loop>
      <tr>
        <td>{code}</td>
      </tr>
    </table>

Is there a way to save the document correctly?
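There is specific logic in Html Agility Pack to enforce the right structure. That code targets li, ul, table, tr and so on, so you might be hitting it. See the HtmlDocument.GetResetters method. Turning off OptionFixNestedTags with doc.OptionFixNestedTags = false should circumvent this behavior. Set it before calling Load, because the option is applied while the document is being parsed.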
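A minimal sketch of that suggestion, assuming the HtmlAgilityPack NuGet package is referenced. The class and method names (FixNestedTagsSketch, RoundTrip) are made up for illustration, and LoadHtml on an in-memory string stands in for loading the temp file. It only prints whatever the library writes back, so the result can be compared with the original markup.

```csharp
using System;
using HtmlAgilityPack;

static class FixNestedTagsSketch
{
    // Parses the markup with nested-tag fixing switched off and
    // returns the markup Html Agility Pack would write back.
    public static string RoundTrip(string html)
    {
        var doc = new HtmlDocument();
        // Set the option before loading: it is consulted during parsing,
        // so changing it after Load/LoadHtml has no effect on the tree.
        doc.OptionFixNestedTags = false;
        doc.LoadHtml(html);
        return doc.DocumentNode.OuterHtml;
    }

    static void Main()
    {
        const string html =
            "<table> <loop> <tr> <td>{code}</td> </tr> </loop> </table>";
        Console.WriteLine(RoundTrip(html));
    }
}
```

If the output still shows the rows moved out of <loop>, the nested-tag fixer is probably not the cause and one of the other approaches in this thread is worth trying.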
You should register your tag(s) using HtmlNode.ElementsFlags.Add. Off the top of my head, the right syntax is:

    HtmlNode.ElementsFlags.Add("loop", HtmlElementFlag.Empty | HtmlElementFlag.Closed);

That way you can define how you expect HtmlAgilityPack to parse your markers.
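Registering the tag could be wired in roughly as follows. This is a sketch assuming the HtmlAgilityPack NuGet package; CustomTagSketch and RoundTrip are hypothetical names, and the flag combination is simply the one quoted above, not a verified recommendation. The dictionary indexer is used in place of Add so that running the registration twice does not throw on a duplicate key.

```csharp
using System;
using HtmlAgilityPack;

static class CustomTagSketch
{
    // Registers the custom <loop> tag, parses the markup and
    // returns what Html Agility Pack would write back.
    public static string RoundTrip(string html)
    {
        // ElementsFlags is a static dictionary shared by every HtmlDocument,
        // so the registration has to happen before the document is parsed.
        HtmlNode.ElementsFlags["loop"] =
            HtmlElementFlag.Empty | HtmlElementFlag.Closed;

        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.OuterHtml;
    }

    static void Main()
    {
        const string html =
            "<table> <loop> <tr> <td>{code}</td> </tr> </loop> </table>";
        Console.WriteLine(RoundTrip(html));
    }
}
```

Because <loop> wraps table rows here, it is worth comparing the printed markup for a few flag combinations (HtmlElementFlag also defines CData and CanOverlap) before settling on one.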
Also: there is a MixedCodeDocument class that you can utilize as well. It requires you to specify a token for your own tags, so you could write your marker as <%loop%>, which might provide an escape for you. You can specify the TokenCodeStart and TokenCodeEnd on the document before parsing.
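A rough sketch of the MixedCodeDocument route, assuming the HtmlAgilityPack NuGet package. To my knowledge the token settings mentioned above are exposed as the TokenCodeStart and TokenCodeEnd fields, with "<%" and "%>" as their usual values. MixedCodeSketch, Parse and the <%/loop%> closing marker are hypothetical choices for illustration. The program lists the fragments the parser produces so the code/text split can be inspected.

```csharp
using System;
using HtmlAgilityPack;

static class MixedCodeSketch
{
    // Splits the text into code fragments (between the tokens)
    // and plain text fragments (everything else).
    public static MixedCodeDocument Parse(string text)
    {
        var doc = new MixedCodeDocument();
        // Tokens must be set before parsing. They are assigned explicitly
        // here only to show where custom delimiters would go.
        doc.TokenCodeStart = "<%";
        doc.TokenCodeEnd = "%>";
        doc.LoadHtml(text);
        return doc;
    }

    static void Main()
    {
        var doc = Parse(
            "<table><%loop%><tr><td>{code}</td></tr><%/loop%></table>");

        // Each fragment reports whether it was recognized as code or text.
        foreach (MixedCodeDocumentFragment fragment in doc.Fragments)
        {
            Console.WriteLine(fragment.FragmentType + ": " + fragment.FragmentText);
        }
    }
}
```

With this approach the markers never reach the HTML parser as elements, so the text fragments could be handed to HtmlDocument separately if the rows themselves still need processing.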
c# html-agility-pack