Sunday, 15 March 2015

Remove invalid non-ASCII characters in Bash




In Bash (on Ubuntu), is there a command that removes invalid multibyte (non-ASCII) characters?

I've tried perl -pe 's/[^[:print:]]//g', but it also removes valid non-ASCII characters.

I can use sed, awk, or similar utilities if needed.

The problem is that Perl does not realize your input is UTF-8; it assumes it is operating on a stream of bytes. You can use the -CI flag to tell it to interpret the input as UTF-8. And, since you will then have multibyte characters in the output, you also need to tell Perl to use UTF-8 when writing to standard output, which you can do with the -CO flag. So:

perl -CIO -pe 's/[^[:print:]]//g'

