Thursday, 15 January 2015

javascript - C# filter JS files from HttpWebRequest/WebResponse

I searched, but did not find anything that worked for me.

A while ago I started with C#, and my first personal project is a simple web crawler. It should check the source code for special strings to identify whether, for example, Google Analytics or something similar is included.
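The "check the source code for special strings" step can be a plain case-insensitive substring search over the downloaded HTML. A minimal sketch; the marker strings are assumed examples (hosts that Google Analytics scripts are commonly loaded from), not a complete list, and `TrackerDetector`/`FindMarkers` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class TrackerDetector
{
    // Assumed example markers; extend the array for other trackers.
    private static readonly string[] Markers =
    {
        "google-analytics.com",
        "googletagmanager.com",
    };

    // Returns every marker that occurs somewhere in the page source,
    // ignoring case so "Google-Analytics.com" is found as well.
    public static List<string> FindMarkers(string html)
    {
        var found = new List<string>();
        foreach (var marker in Markers)
        {
            if (html.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                found.Add(marker);
            }
        }
        return found;
    }
}
```

This only sees markers that appear in the HTML itself; a tracker pulled in by an external script or an iframe stays invisible until that file is downloaded as well.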

So far it works fine, but of course I'm missing the JS files and iframes, since HttpWebRequest doesn't render the website, as you know.
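Because HttpWebRequest only returns the raw HTML and never executes it, every external script or iframe has to be requested on its own. Their `src` values are often relative to the page, so they need to be turned into absolute URLs first. A minimal sketch, assuming you already have the page URL and one `src` value; `LinkResolver.Resolve` is a hypothetical helper built on `System.Uri`:

```csharp
using System;

public static class LinkResolver
{
    // Turns a src attribute value (possibly relative) into an absolute URL,
    // using the page the attribute was found on as the base.
    public static string Resolve(string pageUrl, string src)
    {
        var baseUri = new Uri(pageUrl);
        // Uri(Uri, string) applies the standard relative-reference rules,
        // so both "../js/app.js" and "https://cdn.example.org/x.js" work.
        var resolved = new Uri(baseUri, src.Trim());
        return resolved.AbsoluteUri;
    }
}
```

The resolved URL can then go into another HttpWebRequest, and the downloaded script can be searched for the same special strings as the page.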

So I wanted to look for "<script src=", for example, and get the URL with Split. That doesn't work as expected, and I don't think it's a clean way to do it anyway.

Since I'm checking for exact strings, it breaks as soon as the markup changes, for example "<script" written as "< script", and I have no idea how to get a specific string out of a big string.

I found regular expressions (Regex) and Split, but I'm not sure whether to use Regex or Split, since there are several ways to write "src=", or something like Split("\"", "\"", text).
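Both problems, the "< script" variant and the different ways of writing `src=`, can be handled by one regular expression that allows optional whitespace, ignores case, and accepts double-quoted, single-quoted and unquoted values. A minimal sketch using `System.Text.RegularExpressions`; `ScriptSrcExtractor` is a hypothetical name, and a regex over HTML stays fragile (commented-out tags, scripts that write other script tags), so a real HTML parser is the more robust option:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ScriptSrcExtractor
{
    // <\s*script    accepts "< script" as well as "<script"
    // [^>]*?\s      other attributes first, but never past the end of the tag
    // src\s*=\s*    the attribute name with optional spaces around '='
    // then one of: "double-quoted", 'single-quoted' or unquoted value
    private static readonly Regex ScriptSrc = new Regex(
        @"<\s*script\b[^>]*?\ssrc\s*=\s*(?:""([^""]*)""|'([^']*)'|([^\s>]+))",
        RegexOptions.IgnoreCase);

    public static List<string> Extract(string html)
    {
        var urls = new List<string>();
        foreach (Match m in ScriptSrc.Matches(html))
        {
            // Exactly one of the three capture groups took part in the match.
            string value = m.Groups[1].Success ? m.Groups[1].Value
                         : m.Groups[2].Success ? m.Groups[2].Value
                         : m.Groups[3].Value;
            urls.Add(value);
        }
        return urls;
    }
}
```

Inline scripts such as `<script>var x = 1;</script>` have no `src` and are skipped.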

I don't want a "here you go" answer, of course; I want to understand it and do it myself, but I have no idea where to go from here.

Sorry for the long text and for not including examples; I have no access to my code at the moment, and there isn't much in it except Regex and Split calls.

Edit: I think I'll create a class that checks every char for a special sequence "
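The character-scanning idea from the edit can work as well. A minimal sketch, assuming all that is needed is the value following an attribute name such as `src`; `AttributeScanner.ScanValues` is a hypothetical helper, and because it does not track which tag it is in, it also picks up `src` on `img` or `iframe` tags (and `data-src`):

```csharp
using System;
using System.Collections.Generic;

public static class AttributeScanner
{
    // Walks through the text and collects the value of every "name=" it finds,
    // accepting "double-quoted", 'single-quoted' or unquoted values.
    public static List<string> ScanValues(string text, string name)
    {
        var values = new List<string>();
        int i = 0;
        while (i < text.Length)
        {
            int hit = text.IndexOf(name, i, StringComparison.OrdinalIgnoreCase);
            if (hit < 0) break;

            int pos = hit + name.Length;
            // Skip whitespace between the name and '='.
            while (pos < text.Length && char.IsWhiteSpace(text[pos])) pos++;
            if (pos >= text.Length || text[pos] != '=')
            {
                // Not an attribute (e.g. "srcset"); keep searching after this hit.
                i = hit + 1;
                continue;
            }
            pos++;
            // Skip whitespace between '=' and the value.
            while (pos < text.Length && char.IsWhiteSpace(text[pos])) pos++;
            if (pos >= text.Length) break;

            int start, end;
            char quote = text[pos];
            if (quote == '"' || quote == '\'')
            {
                // Quoted value: read up to the matching closing quote.
                start = pos + 1;
                end = text.IndexOf(quote, start);
                if (end < 0) break; // unterminated value
            }
            else
            {
                // Unquoted value: read until whitespace or the end of the tag.
                start = pos;
                end = start;
                while (end < text.Length && !char.IsWhiteSpace(text[end]) && text[end] != '>') end++;
            }
            values.Add(text.Substring(start, end - start));
            i = end + 1;
        }
        return values;
    }
}
```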

Best, Mike

Try the HTML Agility Pack.

I haven't used it personally, but this should work (I haven't tested it):

```csharp
using System.IO;
using System.Linq;
using System.Net;

string url = "some/url";
var request = (HttpWebRequest)WebRequest.Create(url);
using (var webResponse = (HttpWebResponse)request.GetResponse())
using (var streamReader = new StreamReader(webResponse.GetResponseStream()))
{
    // Parse the raw HTML into a DOM instead of searching the string by hand.
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(streamReader.ReadToEnd());

    // Every <script> element, whether inline or loaded via src.
    var scripts = doc.DocumentNode.Descendants()
        .Where(n => n.Name == "script");
}
```

This should give you the script nodes, and from them you can get what you want. =)
