Tuesday, 15 April 2014

python - Print encoded string -




I'm developing my own MP3 decoder in Python, but I'm a little bit stuck decoding the ID3 tag. I don't want to use existing libraries such as mutagen or eyeD3; I want to follow the ID3v2 specification myself.

The problem is that the frame data is encoded in a format that I cannot print. Using the debugger, I can see that the value is "hideaway", but it's preceded by some unusual characters, as you can see here:

'data': '\x00hideaway'

I have the following questions: What kind of encoding is that? How can I decode and print the string? Do you think other MP3 files use different encodings in their ID3 tags?

By the way, I'm using the UTF-8 declaration at the top of the file:

# -*- coding: utf-8 -*-

and I'm reading the file using the normal I/O methods in Python (read()).

The characters \x00 indicate that a single byte with the value 0 precedes the h. So the string looks like this:

zero - h - i - d - e ...
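To check that layout for yourself, you can look at the raw byte values instead of the printed string. This is only a small sketch: the literal below stands in for the frame data shown in the question, and the variable names are made up for illustration.

```python
# Stand-in for the frame data from the question (an assumption for illustration).
data = b"\x00hideaway"

# bytearray yields integer byte values on both Python 2 and Python 3.
byte_values = list(bytearray(data))

# The first value is the zero byte, followed by the codes for h, i, d, e.
print(byte_values[:5])  # [0, 104, 105, 100, 101]
```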

Usually character strings have letters or numbers in them, not zero bytes. Perhaps this usage is specific to ID3v2?

Considering the ID3v2 standard (http://id3.org/id3v2.4.0-structure), I see that it is:

Frames that allow different types of text encoding contains a text encoding description byte. Possible encodings:

    $00   ISO-8859-1 [ISO-8859-1]. Terminated with $00.
    $01   UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All
          strings in the same frame SHALL have the same byteorder.
          Terminated with $00 00.
    $02   UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM.
          Terminated with $00 00.
    $03   UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.

So, we see that the leading zero byte indicates ISO-8859-1 encoding, and the text that follows runs up to the next zero byte.

Your program might deal with it like so:

    title = fp.read(number_of_bytes)
    # Compare a one-byte slice so the test behaves the same on Python 2 and 3.
    if title[:1] == b'\x00':
        title = title[1:].decode('iso8859-1')
    elif title[:1] == ...:  # other encoding bytes go here
        title = title[1:].decode('some-other-encoding')
    ...
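A slightly fuller version could map all four encoding bytes from the specification to Python codecs. Treat the following as a sketch under assumptions rather than a finished decoder: the function name `decode_text_frame` and the table `ID3V2_TEXT_ENCODINGS` are hypothetical, the input is assumed to be the raw frame body starting at the encoding byte, and trailing terminators are simply stripped after decoding.

```python
# Hypothetical lookup table: ID3v2.4 text encoding byte -> Python codec name.
ID3V2_TEXT_ENCODINGS = {
    0x00: "iso-8859-1",
    0x01: "utf-16",     # the BOM in the data selects the byte order
    0x02: "utf-16-be",  # big-endian, no BOM
    0x03: "utf-8",
}


def decode_text_frame(data):
    """Decode a frame body whose first byte is the text encoding byte."""
    if not data:
        return u""
    # bytearray gives an integer for the first byte on Python 2 and 3 alike.
    encoding_byte = bytearray(data)[0]
    if encoding_byte not in ID3V2_TEXT_ENCODINGS:
        raise ValueError("unknown ID3v2 text encoding byte: %r" % encoding_byte)
    text = data[1:].decode(ID3V2_TEXT_ENCODINGS[encoding_byte])
    # Drop the terminator(s), which decode to NUL characters.
    return text.rstrip(u"\x00")


print(decode_text_frame(b"\x00hideaway"))
```

Decoding first and stripping NUL characters afterwards avoids having to handle the one-byte and two-byte terminators separately. An unknown encoding byte raises `ValueError` here instead of guessing a codec.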

python encoding mp3
