python - Web (weird) wrapped text to plain text string
I'm trying to convert the wrapped text below into a plain text string with no line endings at all. The wrapping is of a weird kind I have never seen before. The text comes from a CDATA section in an XML file:
<font color="#bfffffff" size="12"></font><font color="#ff00ff00" size="12">my fellow muppets,<br><br>we sorry devilish intetions not going work out muppet brigade sorry guys not active ebough how ever extend arm players leave , bring together dynacorp. if of intrested drop me mail service , best of luck in future endevors. <br><br>o7 <br><br><br/></br></br></br></br></br></br></font><font color="#ff007fff" size="14">john milbroc<br/></font><font color="#bfffffff" size="14">--------------------------<br/></font><font color="#ff007fff" size="14">the muppet brigade ceo</font>
I've tried the following, though:
z = BeautifulSoup(string)
z.get_text()
However, BeautifulSoup does not seem to be doing anything. I'm rather new to Python, so sorry if this is a really easy problem.
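One likely reason nothing seemed to happen: `get_text()` does not modify the soup in place; it returns a new string, so in a script the result has to be assigned or printed. The sketch below also covers the CDATA step. It assumes the markup sits in a CDATA section of an element named `message` under a `<root>` element (both hypothetical names, since the real XML layout isn't shown), and that the HTML is a shortened excerpt of the sample above. It uses the standard library's `xml.etree.ElementTree`, which exposes CDATA content as ordinary element text; `cdata_to_plain_text` is a hypothetical helper, not a library function.

```python
import xml.etree.ElementTree as ET

from bs4 import BeautifulSoup

# Hypothetical XML layout: the real file's element names are not known.
XML_DOC = """<root>
<message><![CDATA[<font color="#ff00ff00" size="12">my fellow muppets,<br><br>o7 <br/></font><font color="#ff007fff" size="14">john milbroc<br/></font>]]></message>
</root>"""


def cdata_to_plain_text(xml_doc: str, tag: str = "message") -> str:
    """Return the visible text of the HTML stored in the CDATA of <tag>."""
    root = ET.fromstring(xml_doc)
    # ElementTree hands back CDATA content as the element's plain .text.
    html = root.findtext(tag, default="")
    soup = BeautifulSoup(html, "html.parser")
    # get_text() returns a new string; the soup itself is left unchanged.
    return soup.get_text()


text = cdata_to_plain_text(XML_DOC)
print(text)
```

The key difference from the snippet in the question is that the return value of `get_text()` is kept (here in `text`) rather than discarded.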
I thought maybe the BeautifulSoup module was broken, because when I run:
from bs4 import BeautifulSoup
html_doc = """ hi.<br><br>this message.<br><br> """
print(html_doc)
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.txt)
it prints:
hi.<br><br>this message.<br><br>
None
After trying and messing around with other stuff, I found that if I do
soup.get_text()
instead of
soup.txt
it prints the parsed text. Weird, but it worked. Thanks for the encouragement and for keeping me on the right track.
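The `None` has a concrete cause. In BeautifulSoup, accessing an unknown attribute on a tag is shorthand for `find()` with that name, so `soup.txt` searches for a `<txt>` element, finds none, and returns `None`. `soup.text`, by contrast, is a property that returns the same string as `soup.get_text()`. A small check using the same sample string (the variable names are illustrative):

```python
from bs4 import BeautifulSoup

html_doc = """ hi.<br><br>this message.<br><br> """
soup = BeautifulSoup(html_doc, "html.parser")

# Attribute access means soup.find("txt"): there is no <txt> tag, so None.
missing = soup.txt
# .text is a property backed by get_text(), so both give the same string.
via_property = soup.text
via_method = soup.get_text()

print(missing)
print(via_property == via_method)
print(repr(via_method))
```

So `soup.txt` was a typo for `soup.text`, not a broken module.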
Why not parse the HTML using BeautifulSoup? For example:
html_doc = """ ## re-create the HTML text here """
Then parse it:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, "html.parser")
Then you can extract the text:
print(soup.text)
which prints:
my fellow muppets,we sorry devilish intetions not going work out muppet brigade sorry guys not active ebough how ever extend arm players leave , bring together dynacorp. if of intrested drop me mail service , best of luck in future endevors. o7 john milbroc--------------------------the muppet brigade ceo
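The output above runs some fragments together (for example `muppets,we`) because `get_text()` joins the text nodes with an empty string by default. Passing `separator=" "` and `strip=True` is one way to get a cleaner single line. This is a sketch, and the string below is a shortened, illustrative excerpt of the question's markup rather than the full message:

```python
from bs4 import BeautifulSoup

# Shortened excerpt of the question's markup (illustrative only).
html_doc = (
    '<font color="#ff00ff00" size="12">my fellow muppets,<br><br>'
    "best of luck in future endevors. <br><br>o7 <br><br><br/></font>"
    '<font color="#ff007fff" size="14">john milbroc<br/></font>'
    '<font color="#bfffffff" size="14">--------------------------<br/></font>'
    '<font color="#ff007fff" size="14">the muppet brigade ceo</font>'
)

soup = BeautifulSoup(html_doc, "html.parser")

# separator is placed between text fragments; strip=True trims each
# fragment and drops fragments that are only whitespace.
plain = soup.get_text(separator=" ", strip=True)
print(plain)
```

Because the `<br>` tags carry no text of their own, they simply disappear, and the separator keeps words from adjacent fragments apart.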