Sunday, 15 August 2010

perl - Split function returns weird characters -




I am facing a problem with a script I want to make. In short, I am connecting to a local database with DBI and executing some queries. While this works fine, and I can print out the returned values from SELECT queries and so on, when I split, say, $firstname into an array and print out the array, I get weird characters. Note that the fields in the table I am working with contain Greek characters and are utf8_general_ci. I played around with use utf8, use encoding, binmode, encode etc., but the split function still returns weird characters, while before the split the whole Greek word is printed fine. I suppose this is due to a missing pragma about string encoding or something similar, but I can't find a solution. Thanks in advance. Here is the piece of code I am describing. My Perl version is v5.14.2.

@query = &databasesubs::getstringfromdb();
print "$query[1]\n";    # prints the Greek name fine
@chars = split('', $query[1]);
foreach $chr (@chars) {
    print "$chr \n";    # prints weird characters
}

And here is the output of the print and the foreach, respectively.

By default, Perl assumes that you are working with single-byte characters. But you aren't: in UTF-8, the Greek characters you are using are two bytes in size. Hence split is splitting your characters in half and you're getting strange characters.
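To make the byte-splitting concrete, here is a minimal sketch. It assumes the database hands back raw UTF-8 bytes, which it simulates by encoding a two-letter Greek string; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# "\x{3b1}\x{3b2}" is the two-letter Greek string alpha, beta.
# Encoding it yields raw UTF-8 bytes, standing in for what an
# undecoded database result might look like (an assumption).
my $bytes = encode_utf8("\x{3b1}\x{3b2}");

my @pieces = split //, $bytes;

# Each Greek letter occupies two bytes, so split yields four
# single-byte pieces rather than two letters.
printf "%d pieces:%s\n", scalar(@pieces),
    join '', map { sprintf ' %02X', ord($_) } @pieces;
```

This should report four pieces (CE B1 CE B2): each letter has been cut into its two bytes, and neither half is printable on its own.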

You need to decode your bytes into characters as they come into your program. One way to do this is like this.

use Encode;
@query = map { decode_utf8($_) } databasesubs::getstringfromdb();

(I've removed the unnecessary and potentially confusing '&' from the subroutine call.)

Now @query contains decoded character strings and split will split them into individual characters correctly(*).

But if you print one of these characters, you'll get a "Wide character" warning. That's because Perl's I/O layer expects single-byte characters. You need to tell it to expect UTF-8. You can do it like this:

binmode STDOUT, ':utf8';
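Putting the two steps together, a self-contained sketch might look like the following. The fetch_names subroutine is a hypothetical stand-in for the database call, assumed to return raw UTF-8 bytes; it is not part of the original code.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# Hypothetical stand-in for the database call: returns one Greek
# name (five letters) as raw UTF-8 bytes.
sub fetch_names {
    return map { encode_utf8($_) } ("\x{39d}\x{3af}\x{3ba}\x{3bf}\x{3c2}");
}

# Tell the output layer to encode characters as UTF-8.
binmode STDOUT, ':utf8';

# Decode bytes into characters on the way in, then split.
my @query = map { decode_utf8($_) } fetch_names();
my @chars = split //, $query[0];

print "$_\n" for @chars;    # one Greek letter per line
```

With decoding on input and an encoding layer on output, split sees five characters rather than ten bytes, and printing them should no longer trigger the "Wide character" warning.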

There are other improvements to consider. For example, you could put the decoding into the getstringfromdb subroutine. I recommend reading perldoc perluniintro and perldoc perlunicode for more details.
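As a rough sketch of moving the decoding into the subroutine, the example below decodes at the boundary so callers only ever see character strings. Both raw_rows and get_strings are hypothetical names; the real getstringfromdb body is not shown in the question. If the driver happens to be DBD::mysql, its mysql_enable_utf8 connection attribute is another option worth checking in that driver's documentation.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# Hypothetical low-level fetch: returns raw UTF-8 bytes, as an
# undecoded database result might.
sub raw_rows {
    return (encode_utf8("\x{3b1}\x{3b2}\x{3b3}"));
}

# Decode once, at the boundary, so every caller receives
# character strings and never has to think about bytes.
sub get_strings {
    return map { decode_utf8($_) } raw_rows();
}

my @query = get_strings();
print length($query[0]), "\n";    # counts characters, not bytes
```

Decoding in one place keeps the rest of the program free of encoding concerns and avoids accidentally decoding the same string twice.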

(*) Yes, there's a whole other level of pain lurking when you get two-character graphemes, but let's ignore that for now.
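For the curious, here is a small illustration of that footnote, assuming a string in which an accent is stored as a separate combining character. Splitting on the empty pattern separates the accent from its letter, whereas matching with \X (an extended grapheme cluster) keeps them together.

```perl
use strict;
use warnings;

# Alpha followed by U+0301 COMBINING ACUTE ACCENT, then beta:
# three code points, but only two user-perceived characters.
my $str = "\x{3b1}\x{301}\x{3b2}";

my @code_points = split //, $str;
my @graphemes   = $str =~ /(\X)/g;

printf "%d code points, %d graphemes\n",
    scalar(@code_points), scalar(@graphemes);
```

This should report 3 code points and 2 graphemes. Whether this matters depends on how the data was entered; precomposed accented letters are single code points and split handles them without trouble.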

perl
