Remove invalid non-ASCII characters in Bash
In Bash (on Ubuntu), is there a command that removes invalid multibyte (non-ASCII) characters?
I've tried perl -pe 's/[^[:print:]]//g', but it also removes valid non-ASCII characters.
I can use sed, awk, or similar utilities if needed.
The problem is that Perl does not realize your input is UTF-8; it assumes it's operating on a stream of bytes. You can use the -CI flag to tell it to interpret the input as UTF-8. And, since you will then have multibyte characters in your output, you also need to tell Perl to use UTF-8 when writing to standard output, which you can do using the -CO flag. So:
perl -CIO -pe 's/[^[:print:]]//g'