Python - Web (weird) wrapped text to plain text string
I'm trying to convert wrapped text to a plain text string, line endings and all. The wrapping is of a weird kind that I have never seen before. The text was obtained from the CDATA section of an XML file:
<font color="#bfffffff" size="12"></font><font color="#ff00ff00" size="12">my fellow muppets,<br><br>we sorry devilish intetions not going work out muppet brigade sorry guys not active ebough how ever extend arm players leave , bring together dynacorp. if of intrested drop me mail service , best of luck in future endevors. <br><br>o7 <br><br><br/></br></br></br></br></br></br></font><font color="#ff007fff" size="14">john milbroc<br/></font><font color="#bfffffff" size="14">--------------------------<br/></font><font color="#ff007fff" size="14">the muppet brigade ceo</font>
I've tried the following, though:
    z = BeautifulSoup(string)
    z.get_text()
However, BeautifulSoup does not seem to be doing anything. I'm rather new to Python, so sorry if this is a really easy problem.
I think maybe my BeautifulSoup module is broken, because when I run:
    from bs4 import BeautifulSoup

    html_doc = """ hi.<br><br>this message.<br><br> """
    print(html_doc)
    soup = BeautifulSoup(html_doc)
    print(soup.text)
It prints:
     hi.<br><br>this message.<br><br> 
    None
After messing around with some other stuff, I found that if I do
soup.get_text()
instead of
soup.txt
it will print the parsed text. Weird, but it worked. Thanks for the encouragement and for keeping me on the right track.
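One likely explanation for the `None`, assuming a current bs4 release: attribute names that bs4 does not define itself, such as `.txt`, are treated as a search for a tag with that name, so `soup.txt` behaves like `soup.find("txt")` and returns `None` when the document has no `<txt>` tag. A minimal sketch of the difference follows; the explicit `"html.parser"` argument and the variable names are choices made for this illustration, not part of the original post.

```python
from bs4 import BeautifulSoup

html_doc = "hi.<br><br>this message.<br><br>"
# Naming the parser explicitly avoids the "no parser specified" warning.
soup = BeautifulSoup(html_doc, "html.parser")

# Unknown attribute names are tag lookups: soup.txt is soup.find("txt"),
# and this document contains no <txt> tag.
no_such_tag = soup.txt

# get_text() concatenates every text node in the document.
plain = soup.get_text()

print(no_such_tag)  # None
print(plain)        # hi.this message.
```

Note that the `<br>` tags contribute no text of their own, so the two sentences run together in the result.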
Why not parse the HTML using BeautifulSoup? For example:
    html_doc = """ ## re-create the HTML text here """
Then parse it:
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html_doc)
You can then extract the text:
    print(soup.text)
which prints:
    my fellow muppets,we sorry devilish intetions not going work out muppet brigade sorry guys not active ebough how ever extend arm players leave , bring together dynacorp. if of intrested drop me mail service , best of luck in future endevors. o7 john milbroc--------------------------the muppet brigade ceo
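The extracted text above runs together because the `<br>` tags carry no text. If the line breaks matter (the question asks for line endings "and all"), one option is to emit a newline for every `<br>` while collecting the text. The sketch below swaps in Python's standard-library `html.parser` instead of BeautifulSoup; the class name `BrAwareTextExtractor`, the function `html_to_text`, and the short sample string are hypothetical, made up for this illustration.

```python
from html.parser import HTMLParser


class BrAwareTextExtractor(HTMLParser):
    """Collect text content, turning each <br> into a newline."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Both <br> and <br/> arrive here; other tags add nothing.
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = BrAwareTextExtractor()
    parser.feed(html)
    parser.close()  # flush any text the parser is still buffering
    return "".join(parser.parts)


sample = 'hi.<br><br>this message.<br/><font color="#ff007fff">john</font>'
text = html_to_text(sample)
print(repr(text))  # 'hi.\n\nthis message.\njohn'
```

The same idea should carry over to the full CDATA snippet from the question: the `<font>` wrappers are dropped and only the `<br>` tags become line breaks.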