How to reformat strings to not include accented letters in Python?
This question already has an answer here:
What is the best way to remove accents in a Python unicode string? (7 answers)

I'm trying to create a list of locations from a column of a CSV file in Python.
This is one entry in the column:
rio balira del orien,riu valira d'orient,riu valira d’orient,río balira del orien
This is the corresponding list in its current state:
locs = ['rio balira del orien', "riu valira d'orient", 'riu valira d\xe2\x80\x99orient', 'r\xc3\xado balira del orien']
In my program, I need to check whether a given word is in the list, so I'm trying to remove the odd byte escapes for accented letters, apostrophes, etc. (e.g. \xc3\xad = í) and have each location in simple lowercase ASCII. When I try to use the code
loclist = [x.encode('ascii').lower() for x in locs]
it throws an error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 12: ordinal not in range(128)
What should I use instead?
Thanks!
locs = ['rio balira del orien', "riu valira d'orient", 'riu valira d\xe2\x80\x99orient', 'r\xc3\xado balira del orien']
To remove the non-ASCII bytes completely:
print [unicode(x, errors="ignore") for x in locs]
[u'rio balira del orien', u"riu valira d'orient", u'riu valira dorient', u'ro balira del orien']
To encode to ASCII, keeping the base letters:
import unicodedata
print [unicodedata.normalize('NFD', x.decode('utf-8')).encode('ascii', 'ignore') for x in locs]
['rio balira del orien', "riu valira d'orient", 'riu valira dorient', 'rio balira del orien']
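
The snippets above are Python 2. As a rough sketch of the same NFD approach in Python 3, assuming the CSV data arrives as UTF-8 encoded bytes (in Python 3 the escaped literals from the question must be written as bytes objects), something like the following should work. The helper name to_ascii is hypothetical and not part of the original answer.

```python
import unicodedata

# Assumption: the raw values are UTF-8 bytes, as in the question's list.
locs = [
    b"rio balira del orien",
    b"riu valira d'orient",
    b"riu valira d\xe2\x80\x99orient",
    b"r\xc3\xado balira del orien",
]


def to_ascii(raw):
    """Hypothetical helper: decode UTF-8 bytes and reduce to lowercase ASCII."""
    text = raw.decode("utf-8")
    # NFD splits an accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Ignoring non-ASCII code points drops the combining marks, along with any
    # other non-ASCII character such as the curly apostrophe U+2019.
    return decomposed.encode("ascii", "ignore").decode("ascii").lower()


loclist = [to_ascii(x) for x in locs]
print(loclist)
print("rio balira del orien" in loclist)
```

Note that the curly apostrophe is dropped rather than converted, so "d’orient" becomes "dorient" and will not match "d'orient". If those should compare equal, replacing U+2019 with a plain apostrophe before normalizing is one option.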