Friday, 15 June 2012

java - Collections.sort() isn't sorting in the right order




I have this code in Java:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    List<String> unsorted = new ArrayList<String>();
    List<String> beforehash = new ArrayList<String>();
    String[] unsortedaux, beforehashaux;
    String line = null;
    BufferedReader reader = new BufferedReader(new FileReader("c:\\cpd\\temp0.txt"));
    while ((line = reader.readLine()) != null) {
        unsorted.add(line);                  // the whole line
        beforehash.add(line.split("#")[0]);  // only the name before "#"
    }
    reader.close();
    Collections.sort(beforehash);
    beforehashaux = beforehash.toArray(new String[beforehash.size()]);
    unsortedaux = unsorted.toArray(new String[unsorted.size()]);
    System.out.println(Arrays.toString(beforehashaux));
    System.out.println(Arrays.toString(unsortedaux));

It reads a file named temp0.txt, which contains:

    carlos magno#261
    mateus carl#12
    analise soares#151
    giancarlo tobias#150

My goal is to sort the names, ignoring the part of each line after "#". I'm using beforehash.add(line.split("#")[0]); for this. The problem is that it reads the file correctly but sorts in the wrong order. The corresponding outputs are:

    [analise soares, giancarlo tobias, mateus carl, carlos magno]
    [carlos magno#261, mateus carl#12, analise soares#151, giancarlo tobias#150]

The first result is the "sorted" one; note that "carlos magno" comes after "mateus carl". I cannot find the problem in the code.

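The problem is that "carlos magno", the first line of the file, starts with a Unicode byte-order mark (BOM). Collections.sort() orders strings by comparing their char values, and the BOM's value (65279) is greater than that of any ASCII letter, so that entry sorts after all the others.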

If you re-create the problem by pasting the sample text ([analise ... carlos magno]) into a Unicode explorer, you'll see that before the "c" of "carlos magno" there is a U+FEFF.
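A small sketch that reproduces the effect without a file: the BOM is prepended by hand to one name, standing in for the first line of temp0.txt (an assumption), and the class and method names (BomSortDemo, sortedCopy) are hypothetical. It shows that the invisible prefix alone is enough to push the name to the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BomSortDemo {

    // Returns a sorted copy using Collections.sort (natural String order).
    static List<String> sortedCopy(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical input: the first name carries an invisible U+FEFF prefix,
        // as the first line of a file saved with a byte-order mark would.
        List<String> names = Arrays.asList(
                "\uFEFFcarlos magno", "mateus carl", "analise soares", "giancarlo tobias");

        // String.compareTo compares char values; 0xFEFF is far larger than 'm' (0x6D),
        // so the BOM-prefixed name is printed last.
        System.out.println(sortedCopy(names));

        // The hidden character is still there even though it does not print visibly.
        System.out.println((int) names.get(0).charAt(0)); // prints 65279 (0xFEFF)
    }
}
```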

Basically, you'll need to strip it when reading the file. The easiest way is to use:

line = line.replace("\ufeff", "");

... or check for it first:

if (line.startsWith("\ufeff")) { line = line.substring(1); }
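Here is a sketch of the question's reading loop with the startsWith check applied. It reads from a StringReader instead of temp0.txt so it runs anywhere, and the names StripBomWhileReading and readSortedNames are hypothetical. One design choice differs from the one-liner: the check runs only on the first line, since a BOM can only appear at the very start of the stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class StripBomWhileReading {

    private static final String BOM = "\uFEFF";

    // Reads "name#number" lines, keeps the part before '#', drops a leading BOM
    // from the first line, and returns the names sorted.
    static List<String> readSortedNames(BufferedReader reader) throws IOException {
        List<String> names = new ArrayList<>();
        String line;
        boolean firstLine = true;
        while ((line = reader.readLine()) != null) {
            if (firstLine && line.startsWith(BOM)) {
                line = line.substring(1);
            }
            firstLine = false;
            names.add(line.split("#")[0]);
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Simulated file contents (an assumption); the first line starts with a BOM.
        String contents = BOM + "carlos magno#261\nmateus carl#12\n"
                + "analise soares#151\ngiancarlo tobias#150\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(contents))) {
            // prints [analise soares, carlos magno, giancarlo tobias, mateus carl]
            System.out.println(readSortedNames(reader));
        }
    }
}
```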

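Note that you should also specify the encoding you want to use when opening the file, since the FileReader(String) constructor used above always uses the platform's default encoding: use a FileInputStream wrapped in an InputStreamReader instead.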
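A minimal sketch of that advice, assuming the file was saved as UTF-8 (the answer does not say which encoding); the class and method names ReadWithExplicitEncoding and readLines are hypothetical. Even with UTF-8 specified, Java's UTF-8 decoder does not discard a leading BOM: it comes through as the character U+FEFF, so the strip is still needed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadWithExplicitEncoding {

    // Decodes the stream as UTF-8 (an assumption about how the file was saved)
    // and returns its lines, removing a leading U+FEFF from the first line.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // The UTF-8 decoder passes the BOM through instead of skipping it.
                if (lines.isEmpty() && line.startsWith("\uFEFF")) {
                    line = line.substring(1);
                }
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        InputStream in;
        if (args.length > 0) {
            // Path of the file to read, e.g. the question's temp0.txt.
            in = new FileInputStream(args[0]);
        } else {
            // Demo bytes: U+FEFF encodes to EF BB BF in UTF-8, followed by two lines.
            byte[] data = "\uFEFFcarlos magno#261\nmateus carl#12\n"
                    .getBytes(StandardCharsets.UTF_8);
            in = new ByteArrayInputStream(data);
        }
        System.out.println(readLines(in));
    }
}
```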
