File name encoding

January 05, 2009

Back in college one of my hobbies was UTF-8 on Linux. I spent hours tinkering around and reading, and from that era I retained a sense of awe for Jungshik for his gnarly posts demonstrating knowledge of pretty much everything Unicode. It took me a while to realize, years later, that it was the same Jungshik who was contributing to Chrome.

Which takes us to r7455, a revert of a change that added a conversion from a FilePath (representing a platform-native path: UTF-16 on Windows, UTF-8 on OS X) to UTF-8. Why is it bad? Like a good student, I know that Linux paths are bytes, with no encoding. But despite my impression that UTF-8 is a safe assumption these days, Jungshik swears that it is not. And I know enough to know that Jungshik knows more than me. (And also that the Asian situation wrt Unicode is complicated.)
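To make the "paths are bytes" point concrete, here is a minimal Python sketch (not Chrome's code; the file name and the helper names `to_utf8_text` and `round_trip` are made up for illustration). It takes a name that is perfectly legal on Linux but was written in Latin-1, shows that a strict conversion to UTF-8 fails, and shows one lossless alternative, Python's `surrogateescape` error handler.

```python
# A file name as Linux might store it: raw bytes, here "café.txt" in Latin-1.
# (Hypothetical example name; the kernel does not care what encoding it is.)
raw_name = b"caf\xe9.txt"


def to_utf8_text(name: bytes):
    """Return the name decoded as UTF-8, or None if the bytes are not valid UTF-8."""
    try:
        return name.decode("utf-8")
    except UnicodeDecodeError:
        return None


def round_trip(name: bytes) -> bytes:
    """Decode with surrogateescape, then encode back.

    Bytes that are not valid UTF-8 are smuggled through as lone surrogates,
    so the original byte string comes back unchanged.
    """
    text = name.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


print(to_utf8_text(raw_name))            # None: not valid UTF-8
print(round_trip(raw_name) == raw_name)  # True: no bytes were lost
```

The takeaway is only that "convert the path to UTF-8" is not a total function on Linux: some valid names have no UTF-8 reading at all, so any code that assumes the conversion always succeeds can mangle or reject real files.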