VB.NET - HtmlAgilityPack not finding nodes from HttpWebRequest's returned HTML
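I am a little new to HtmlAgilityPack. I want to use my HttpWebRequest, which can return the HTML of a webpage, and then parse that HTML with HtmlAgilityPack. I want to find all divs with a specific class and then get the inner text of what is inside those divs. This is what I have so far. My request successfully returns the webpage HTML: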
Imports System.IO
Imports System.Net

Public Function MyGetReq(ByVal myURL As String, ByRef theCookie As CookieContainer) As String
    Dim getReq As HttpWebRequest = DirectCast(HttpWebRequest.Create(myURL), HttpWebRequest)
    getReq.Method = "GET"
    getReq.KeepAlive = True
    getReq.CookieContainer = theCookie
    getReq.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0"
    Dim getResponse As HttpWebResponse
    getResponse = DirectCast(getReq.GetResponse, HttpWebResponse)
    Dim getReqReader As New StreamReader(getResponse.GetResponseStream())
    Dim thePage = getReqReader.ReadToEnd
    'Clean up the streams and the response.
    getReqReader.Close()
    getResponse.Close()
    Return thePage
End Function
This function returns the HTML. I then put the HTML into this:
'The HTML shows up in the RichTextBox
RichTextBox1.Text = MyGetReq("http://someurl.com", theCookie)
Dim htmlDoc = New HtmlAgilityPack.HtmlDocument()
htmlDoc.LoadHtml(RichTextBox1.Text)
Dim htmlNodes As HtmlNodeCollection
htmlNodes = htmlDoc.DocumentNode.SelectNodes("//div[@class='someclass']")
If htmlNodes IsNot Nothing Then
    For Each node In htmlNodes
        MessageBox.Show(node.InnerText())
    Next
End If
The problem is that htmlNodes is coming back null (Nothing), so the final If ... Then block won't run. It finds nothing, but I know for a fact that this div and class exist in the HTML page, because I can see the HTML in RichTextBox1:
<div class="someclass"> inner text </div>
What is the problem here? Does htmlDoc.LoadHtml not like the type of string that MyGetReq returns the page HTML as? Does this have anything to do with HTML entities? thePage contains literal < and > brackets. They are not converted to entities.
I saw a post here (in C#) that uses the HtmlWeb class, but I am not sure how I would set that up. Most of my code is already written with HttpWebRequest.
Thanks for reading and for helping.
If you are willing to switch, you could use CsQuery, something along these lines:
Dim q As New CQ(MyGetReq("http://someurl.com", theCookie))
For Each node In q("div.someclass")
    Console.WriteLine(node.InnerText)
Next
You may want to add some error handling, but overall this should get you started.
You can add CsQuery to your project via NuGet:
Install-Package CsQuery
And don't forget to put Imports CsQuery at the top of your code file.
This may not directly solve your problem, but it should make it easier to experiment with your data (via the Immediate window, for example).
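If you would rather stay with HtmlAgilityPack while you experiment, here is a minimal sketch of one thing worth checking. It is only a guess at the cause: the XPath test @class='someclass' matches only when the attribute is exactly that string, so a div carrying extra classes or stray whitespace would be skipped. The sample markup below is hypothetical (the real page may look different), and the string is parsed directly rather than being read back from a RichTextBox.

```vbnet
Imports System
Imports HtmlAgilityPack

Module ClassTokenSketch
    Sub Main()
        ' Hypothetical stand-in for the string returned by MyGetReq.
        ' Assumption: the real div may carry more than one class.
        Dim html As String = "<div class=""someclass extra""> inner text </div>"

        Dim htmlDoc As New HtmlDocument()
        htmlDoc.LoadHtml(html)

        ' Treat the class attribute as a space-separated token list, so
        ' "someclass" matches even when other classes are present.
        Dim xpath As String =
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' someclass ')]"
        Dim nodes As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes(xpath)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        If nodes IsNot Nothing Then
            For Each node As HtmlNode In nodes
                Console.WriteLine(node.InnerText.Trim())
            Next
        Else
            Console.WriteLine("No matching nodes")
        End If
    End Sub
End Module
```

If this version finds the div and the exact-match XPath does not, the class attribute in the downloaded HTML is probably not exactly "someclass". If both come back empty, compare the raw string from MyGetReq with what the RichTextBox displays.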
Interesting read (performance comparison):
CsQuery Performance vs. Html Agility Pack and Fizzler