Tuesday, 15 May 2012

How to reformat strings to not include accented letters in Python?


This question already has an answer here:

What is the best way to remove accents in a Python unicode string? (7 answers)

I'm trying to create a list of locations from a column of a CSV file in Python.

This is one entry in the column:

rio balira del orien,riu valira d'orient,riu valira d’orient,río balira del orien

This is the corresponding list in its current state:

locs = ['rio balira del orien', "riu valira d'orient", 'riu valira d\xe2\x80\x99orient', 'r\xc3\xado balira del orien']

In my program, I need to check whether a given word is in the list, so I'm trying to remove the crazy string formatting (e.g. \xc3\xad = í), accented letters, apostrophes, etc., and have each location in simple lowercase ASCII. When I try to use the code

loclist = [x.encode('ascii').lower() for x in locs]

it throws an error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 12: ordinal not in range(128)

What command should I use instead?

Thanks!

locs = ['rio balira del orien', "riu valira d'orient", 'riu valira d\xe2\x80\x99orient', 'r\xc3\xado balira del orien']

To remove the non-ASCII characters completely:

print [unicode(x, errors="ignore") for x in locs]
[u'rio balira del orien', u"riu valira d'orient", u'riu valira dorient', u'ro balira del orien']
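The error in the question comes from the implicit ASCII decode that Python 2 performs when encode() is called on a byte string, and errors="ignore" simply drops the bytes that fail to decode. A minimal sketch of the same behaviour, written in Python 3 syntax (an assumption, since the snippets here are Python 2) with an explicit bytes literal standing in for one list entry:

```python
# Python 3 sketch; the bytes literal stands in for a Python 2 str from the list.
raw = b'riu valira d\xe2\x80\x99orient'

# Strict ASCII decoding fails on the first non-ASCII byte (0xe2 at index 12).
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    failed_at = exc.start
    print(exc)

# errors='ignore' silently drops every byte outside the ASCII range.
stripped = raw.decode('ascii', errors='ignore')
print(stripped)
```

This is why 'río' comes out as 'ro' with this approach: both bytes of the UTF-8 encoded í are discarded.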

To encode to ASCII, keeping the base letters of accented characters:

import unicodedata
print [unicodedata.normalize('NFD', x.decode('utf-8')).encode('ascii', 'ignore') for x in locs]
['rio balira del orien', "riu valira d'orient", 'riu valira dorient', 'rio balira del orien']
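Note that the typographic apostrophe (’) is dropped in the output, so 'riu valira dorient' no longer matches "riu valira d'orient". If the two spellings should compare equal, one option is to map U+2019 to a plain apostrophe before stripping accents. A rough sketch in Python 3 syntax (an assumption, as the answer is Python 2); the helper name to_plain_ascii is hypothetical:

```python
import unicodedata

def to_plain_ascii(text):
    """Return a lowercase ASCII approximation of text (hypothetical helper)."""
    # Replace the typographic apostrophe with a plain one so it survives encoding.
    text = text.replace('\u2019', "'")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFD', text)
    # Encoding with 'ignore' drops the combining marks and any other non-ASCII characters.
    return decomposed.encode('ascii', 'ignore').decode('ascii').lower()

# The bytes mirror the list from the question; decode them from UTF-8 first.
locs = [b'rio balira del orien', b"riu valira d'orient",
        b'riu valira d\xe2\x80\x99orient', b'r\xc3\xado balira del orien']
loclist = [to_plain_ascii(x.decode('utf-8')) for x in locs]
print(loclist)
```

With this, a membership test such as "riu valira d'orient" in loclist works for either apostrophe style.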

