java - Jsoup returns an incorrect children size
Jsoup counts the number of children incorrectly:

Document document = Jsoup.parse(testString);
Element div = document.select("div").first();
Elements divChildren = div.children();
System.out.println(divChildren.size());

For example, if testString =
<div><div><p>text1</p></div><p>text2</p></div>
or
<div><h1><p>text1</p></h1><p>text2</p></div>
then divChildren.size() = 2.
If testString =
<div><p><p>text1</p></p><p>text2</p></div>
then divChildren.size() = 4.
What am I doing wrong?
If you take a look at the document Jsoup is holding after parsing

String testString = "<div><p><p>text1</p></p><p>text2</p></div>";

you will see

<html>
 <head></head>
 <body>
  <div>
   <p></p>
   <p>text1</p>
   <p></p>
   <p>text2</p>
  </div>
 </body>
</html>

As @rejesh pointed out, a p can't contain other block-level elements, including another p. Jsoup prevents this by closing such wrongly nested outer p elements: the outer opening tag and the outer closing tag are each closed separately, so each one becomes its own empty p. In this case
<p><p>text1</p></p>
will become
<p></p><p>text1</p><p></p>
So the div
<div><p><p>text1</p></p><p>text2</p></div>
will be parsed as

<div>
 <p></p>
 <p>text1</p>
 <p></p>
 <p>text2</p>
</div>

and, as you can see, there are 4 children (two empty p and two p with text).
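To make the counting easier to follow, here is a rough sketch that simulates only the two rules involved: a p start tag closes a p that is still open, and a stray </p> with no open p becomes an empty p. This is not Jsoup's implementation; the class name ParagraphNestingSketch, the method countRootChildren and the small set of tags that close an open p are all made up for this illustration, and the code needs nothing beyond the JDK.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified illustration only: this is NOT Jsoup's parser.
class ParagraphNestingSketch {

    // Start tags that implicitly close an open <p> (a small subset chosen for this example).
    private static final Set<String> CLOSES_OPEN_P =
            new HashSet<>(Arrays.asList("p", "div", "h1", "h2", "h3", "ul", "ol", "table"));

    // Matches plain start and end tags without attributes, e.g. <p> or </div>.
    private static final Pattern TAG = Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)>");

    // Counts the element children the outermost element ends up with.
    static int countRootChildren(String html) {
        Deque<String> open = new ArrayDeque<>(); // stack of open elements, head = current node
        int rootChildren = 0;
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            boolean isEndTag = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase();
            if (!isEndTag) {
                if (CLOSES_OPEN_P.contains(name) && open.contains("p")) {
                    popThrough(open, "p"); // <p><p>: the second start tag closes the first p
                }
                if (open.size() == 1) {
                    rootChildren++; // the new element is a direct child of the root
                }
                open.push(name);
            } else if (open.contains(name)) {
                popThrough(open, name); // ordinary closing tag
            } else if (name.equals("p") && open.size() == 1) {
                rootChildren++; // stray </p>: an empty <p></p> is inserted under the root
            }
            // Any other stray end tag is ignored.
        }
        return rootChildren;
    }

    // Pops elements until the named element itself has been removed.
    private static void popThrough(Deque<String> open, String name) {
        while (!open.isEmpty() && !open.pop().equals(name)) {
            // keep popping
        }
    }

    public static void main(String[] args) {
        System.out.println(countRootChildren("<div><p><p>text1</p></p><p>text2</p></div>"));      // 4
        System.out.println(countRootChildren("<div><div><p>text1</p></div><p>text2</p></div>")); // 2
        System.out.println(countRootChildren("<div><h1><p>text1</p></h1><p>text2</p></div>"));   // 2
    }
}
```

The first input gives 4 because the outer opening p and the outer closing p each turn into an empty p, while the other two inputs nest the p inside a div or h1, where no implicit closing happens.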
If you want to turn off this validating mechanism, you can use the XML parser instead of the standard HTML parser:

// Parser is org.jsoup.parser.Parser
String testString = "<div><p><p>text1</p></p><p>text2</p></div>";
Document document = Jsoup.parse(testString, "", Parser.xmlParser());
System.out.println(document);
Element div = document.select("div").first();
Elements divChildren = div.children();
System.out.println(divChildren.size());

The last statement will print 2.
java html parsing dom jsoup