Python: How to use BeautifulSoup to deal with encoding issues?
This is my first time using BeautifulSoup.
Basically, I want to use BeautifulSoup to extract data. I am trying to build a table in a CSV file based on a web table. As an illustration, a row of the table looks like this:
[<td>1</td>, <td> chief executives , senior officials</td>, <td>£120,830</td>,<td>-3.8</td>]
Now, the problem is that when I use .text.encode('utf8'), the output becomes:
('1', ' chief executives , senior officials', '\xc2\xa3120,830', '-3.8')
The figure £120,830 becomes \xc2\xa3120,830, and I have no idea what kind of encoding this is. Is there a way I can get the proper output £120,830 rather than this crazy encoding?
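A minimal sketch of where the escapes come from, assuming Python 3 (the output above appears to come from Python 2, where encoded byte strings print without a b prefix). The cell values are hard-coded in place of the scraped .text results. Displaying a tuple shows the repr() of each element, and the repr of encoded bytes spells each non-ASCII byte as a \xNN escape; the underlying text is unchanged.

```python
# Cell texts as .text would return them (hard-coded here for illustration).
cells = ('1', ' chief executives , senior officials', '£120,830', '-3.8')

# .encode('utf8') turns every str into bytes; the repr of a bytes object
# shows each non-ASCII byte as a \xNN escape.
encoded = tuple(c.encode('utf8') for c in cells)
print(encoded)
# (b'1', b' chief executives , senior officials', b'\xc2\xa3120,830', b'-3.8')

# The encoded amount is one byte longer, because '£' takes two bytes in UTF-8.
print(len(encoded[2]))  # 9

# Skipping .encode() keeps the values as str, so '£' stays a single character.
amount = cells[2]
print(len(amount))  # 8
```

In other words, the "crazy encoding" is only how the bytes are displayed; keeping the values as str avoids it.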
Alternatively, is there a way to convert the crazy encoded thing \xc2\xa3120,830 back into £120,830 in the CSV? Does anyone know how to deal with this kind of problem?
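For the CSV itself, a common Python 3 approach is to skip .encode() entirely, keep the cell text as str, and let the file object do the encoding. A minimal sketch, assuming Python 3 and a hypothetical output file named table.csv:

```python
import csv

# One scraped row, kept as str values (no manual .encode call).
row = ['1', ' chief executives , senior officials', '£120,830', '-3.8']

# Open in text mode with an explicit encoding; newline='' is what the
# csv module documentation recommends for files passed to csv.writer.
with open('table.csv', 'w', encoding='utf-8', newline='') as f:
    csv.writer(f).writerow(row)  # cells containing commas get quoted

# The raw bytes on disk hold the two-byte UTF-8 form of '£'.
with open('table.csv', 'rb') as f:
    raw = f.read()

# Reading back with the same encoding restores the original text.
with open('table.csv', encoding='utf-8', newline='') as f:
    restored = next(csv.reader(f))
```

The file still contains the bytes C2 A3, because that is how UTF-8 stores £; a program that reads the file as UTF-8 shows £120,830.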
Another alternative is to remove the <td> tags and keep only the content. How can I do that in Python? Is there an efficient way of getting rid of these tags? Any help is appreciated. Thanks.
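In BeautifulSoup, calling get_text() on each <td> tag (or reading its .text, as in the question) already returns the content without the tags, so no separate stripping step is needed. For comparison, here is a dependency-free sketch that swaps in the standard library's html.parser.HTMLParser instead of BeautifulSoup; the class name CellCollector is hypothetical.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text inside each <td> element, dropping the tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &pound; etc. become text
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self._in_td = True
            self.cells.append('')  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data  # text may arrive in more than one chunk


html_row = ('<td>1</td>, <td> chief executives , senior officials</td>, '
            '<td>£120,830</td>,<td>-3.8</td>')

parser = CellCollector()
parser.feed(html_row)
parser.close()  # flush any buffered text
# parser.cells == ['1', ' chief executives , senior officials', '£120,830', '-3.8']
```

Because the parser produces str values, they can go straight to csv.writer without any .encode() call.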
That is how £ comes out when you encode it as UTF-8. If that's not what you want, why are you encoding it?
In more detail, UTF-8 encodes U+00A3 as the byte sequence 0xC2 0xA3 (two bytes), which Python displays in a byte string as '\xc2\xa3'.
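The byte-level claim can be checked directly in Python 3. This sketch also shows how decoding the same bytes with the wrong codec (Latin-1 here, chosen only for illustration) produces the familiar 'Â£' mojibake.

```python
pound = '\u00a3'  # U+00A3 POUND SIGN, i.e. '£'

encoded = pound.encode('utf-8')
print(encoded)        # b'\xc2\xa3'
print(len(encoded))   # 2
print(encoded.hex())  # c2a3

# Decoding the same bytes with the same codec gives the original character back.
assert encoded.decode('utf-8') == pound

# Decoding with the wrong codec yields two characters, U+00C2 and U+00A3 ('Â£').
assert encoded.decode('latin-1') == '\u00c2\u00a3'
```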
If you are writing it to a file and you want the file to be UTF-8 encoded, there is nothing wrong, except perhaps with whatever you are using to look at the file.
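One common example of the viewer being the problem: Microsoft Excel often opens a CSV file with the system's legacy code page unless the file starts with a UTF-8 byte order mark (BOM). Assuming that is the viewer in question, Python's 'utf-8-sig' codec writes the mark for you; the file name table_excel.csv is hypothetical.

```python
import csv

row = ['1', ' chief executives , senior officials', '£120,830', '-3.8']

# 'utf-8-sig' writes the UTF-8 byte order mark (EF BB BF) before the data.
with open('table_excel.csv', 'w', encoding='utf-8-sig', newline='') as f:
    csv.writer(f).writerow(row)

with open('table_excel.csv', 'rb') as f:
    raw = f.read()

print(raw[:3])  # b'\xef\xbb\xbf'

# Reading with 'utf-8-sig' skips the BOM again, so the first cell is just '1'.
with open('table_excel.csv', encoding='utf-8-sig', newline='') as f:
    restored = next(csv.reader(f))
```

Plain 'utf-8' is usually the better choice when the file is consumed by other programs, since some tools treat the BOM as data.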