Invalid conversion of non-BMP characters to UTF-8
When a string containing non-BMP characters (supplementary characters, i.e. code points above U+FFFF) is written to the constant pool, the conversion to UTF-8 is incorrect, and the generated class fails to load (ClassFormatError: Illegal UTF-8 string in constant pool). The method ByteVector.putUTF8() makes no attempt to recognize surrogate pairs in the supplied string; instead it converts each of the two surrogates separately. To quote from the Unicode FAQ (http://unicode.org/faq/utf_bom.html#utf8-4), "The definition of UTF-8 requires that supplementary characters (those using surrogate pairs in UTF-16) be encoded with a single four byte sequence. However, there is a widespread practice of generating pairs of three byte sequences in older software, especially software which pre-dates the introduction of UTF-16 or that is interoperating with UTF-16 environments under particular constraints. Such an encoding is not conformant to UTF-8 as defined."
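
To make the difference described in the FAQ concrete, here is a minimal, self-contained sketch that encodes the same supplementary character (U+1F600) both ways. The class SurrogateUtf8Demo and its helpers encodePerCodeUnit, encodePerCodePoint and hex are hypothetical names invented for this illustration; they are not ASM code and are not a proposed patch for ByteVector.putUTF8(). The sketch only assumes that the per-surrogate behavior described above amounts to encoding each UTF-16 code unit on its own.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of ASM.
class SurrogateUtf8Demo {

    // Encodes every UTF-16 code unit separately, so a surrogate pair
    // becomes two three-byte sequences (the pattern the report describes).
    static byte[] encodePerCodeUnit(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.write(c);
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // Walks the string by code point, so a valid surrogate pair is
    // combined and emitted as one four-byte sequence.
    // Note: an unpaired surrogate is still written as three bytes here;
    // this sketch does not validate its input.
    static byte[] encodePerCodePoint(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 0x80) {
                out.write(cp);
            } else if (cp < 0x800) {
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // Formats bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00"; // U+1F600 as a UTF-16 surrogate pair
        System.out.println(hex(encodePerCodeUnit(s)));   // ED A0 BD ED B8 80
        System.out.println(hex(encodePerCodePoint(s)));  // F0 9F 98 80
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8))); // F0 9F 98 80
    }
}
```

The per-code-unit output is the "pair of three byte sequences" from the FAQ quote, while the per-code-point output matches what the JDK's own StandardCharsets.UTF_8 encoder produces for the same string.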