For a while now I’ve noticed that certain songs’ ID3 tags show up garbled when played on my Android phone. I use Black Player, which in turn relies on Android’s built-in music libraries, but I’ve checked a number of other players just in case, and the problem persists in them too.

My music library is extremely multilingual (and I don’t even speak most of those languages), so it’s full of non-ASCII characters. Some of them seem to break Android’s encoding detection. Sadly, some of these triggers are pretty common, which results in borked last.fm scrobbles. (And of course last.fm can’t be expected to be smart enough to fix all of those automatically.)

So far I’ve noticed that the Hungarian characters ű and ő cause problems. If a field contains either of them, all the other non-ASCII characters in that field are rendered broken too; if neither is present, other accented characters (such as á or ö) render correctly. Certain Japanese characters seem to trigger it as well, but I haven’t been able to pinpoint exactly which ones (there are just so many).

It’s my Macross Delta Walküre albums that get messed up all the time, yet, weirdly, I haven’t noticed problems with any other Japanese artists. I’ve triple-checked and force-re-encoded all of my library’s ID3 tags, so that shouldn’t be the issue. Music players on my computer (Clementine, foobar2000) read the tags properly, too.
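Since desktop players cope fine, it might still be worth confirming which encoding byte the tagger actually wrote into each text frame, because that isn’t visible in a player’s UI. Below is a minimal, stdlib-only Python sketch that walks an ID3v2.3/2.4 tag and reports each text frame’s encoding. It assumes the tag sits at the very start of the file and ignores unsynchronisation and extended headers; `audit_text_frames` is a hypothetical helper, not part of any library.

```python
import sys

# Text-encoding byte at the start of every ID3v2 text frame (T*** frames).
# $00 and $01 exist in both v2.3 and v2.4; $02 and $03 were added in v2.4.
ENCODINGS = {0: "ISO-8859-1", 1: "UTF-16 with BOM", 2: "UTF-16BE", 3: "UTF-8"}


def syncsafe(b: bytes) -> int:
    """Decode a 4-byte syncsafe integer (7 significant bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def audit_text_frames(data: bytes) -> tuple[int, list[tuple[str, str]]]:
    """Return (major_version, [(frame_id, encoding_name), ...]).

    Simplifying assumptions: tag at offset 0, no unsynchronisation,
    no extended header.
    """
    if data[:3] != b"ID3":
        raise ValueError("no ID3v2 header at start of data")
    major = data[3]
    if major not in (3, 4):
        raise ValueError(f"unsupported ID3v2.{major}")
    end = 10 + syncsafe(data[6:10])  # tag size excludes the 10-byte header
    pos, frames = 10, []
    while pos + 10 <= end:
        frame_id = data[pos:pos + 4]
        if frame_id == b"\x00\x00\x00\x00":
            break  # reached padding
        raw_size = data[pos + 4:pos + 8]
        # v2.4 frame sizes are syncsafe; v2.3 frame sizes are plain big-endian
        size = syncsafe(raw_size) if major == 4 else int.from_bytes(raw_size, "big")
        body = data[pos + 10:pos + 10 + size]
        if frame_id.startswith(b"T") and body:
            enc = body[0]
            name = ENCODINGS.get(enc, f"unknown ({enc})")
            if major == 3 and enc > 1:
                name += " (not valid in v2.3)"
            frames.append((frame_id.decode("latin-1"), name))
        pos += 10 + size
    return major, frames


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()  # whole file; fine for a quick audit
        try:
            major, frames = audit_text_frames(data)
        except ValueError as exc:
            print(f"{path}: {exc}")
            continue
        for frame_id, encoding in frames:
            print(f"{path}: ID3v2.{major} {frame_id} {encoding}")
```

Run it as, say, `python id3audit.py *.mp3` (the file name is arbitrary). If the broken fields turn out to be the ones stored as UTF-16 while the working ones are ISO-8859-1, that would at least line up with the ű/ő pattern.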

I suspected this might be related to ID3v2.3 supporting only UTF-16 for Unicode text (its only other option is ISO-8859-1), so I tried switching to ID3v2.4 tags encoded in UTF-8, but Android couldn’t read those tags at all.
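For reference, here is the difference between the two versions expressed as a small decoder for a text frame’s payload (one encoding byte followed by the string). It’s a sketch based on my reading of the ID3v2.3 and ID3v2.4 specs; `decode_text_frame` is a hypothetical name, and it doesn’t split v2.4’s null-separated multi-value strings.

```python
# Python codec for each ID3v2 text-encoding byte, per major tag version.
# v2.3 defines only $00 and $01; v2.4 adds $02 (UTF-16BE, no BOM) and $03 (UTF-8).
CODECS = {
    3: {0: "latin-1", 1: "utf-16"},
    4: {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"},
}


def decode_text_frame(payload: bytes, major_version: int) -> str:
    """Decode a text frame payload: one encoding byte, then the encoded string."""
    try:
        codec = CODECS[major_version][payload[0]]
    except KeyError:
        raise ValueError(
            f"encoding byte {payload[0]} not allowed in ID3v2.{major_version}"
        ) from None
    # The "utf-16" codec consumes the BOM and takes the byte order from it.
    text = payload[1:].decode(codec)
    # Drop the optional trailing null terminator(s).
    return text.rstrip("\x00")
```

In other words, a v2.3 tag simply has no way to store UTF-8; UTF-8 text is only legal inside a v2.4 tag.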

At this point I’m out of ideas for how to fix this, other than actually tracking down the problematic code in Android’s music source and fixing it myself. Let’s just say I’m not thrilled by the idea of having to hack Android’s Java code just to get my music tags to render correctly.