Chapter 3 discusses this in detail. Here's a very informal version:
-
Unicode characters don't fit in 8 bits; deal with it.
-
Byte order is only an issue in I/O.
-
If you don't know, assume big-endian.
-
Loose surrogates have no meaning.
-
Neither do U+FFFE and U+FFFF.
-
Leave the unassigned codepoints alone.
-
It's OK to be ignorant about a character, but not plain wrong.
-
Subsets are strictly up to you.
-
Canonical equivalence matters.
-
Don't garble what you don't understand.
-
Process UTF-* by the book.
-
Ignore illegal encodings.
-
Right-to-left scripts have to go by bidi rules.
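
A few of these rules are easy to illustrate in code. What follows is only a rough sketch in Python, not anything taken from the standard itself: the helper names (is_surrogate, is_noncharacter, decode_utf16, canonically_equal) are made up for this example. The noncharacter check also covers U+FDD0..U+FDEF and the last two code points of every plane, which goes beyond the two code points named in the list above.

```python
import unicodedata


def is_surrogate(cp: int) -> bool:
    """True for code points in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    """True for U+FFFE, U+FFFF and the other noncharacters.

    Covers U+FDD0..U+FDEF plus the last two code points of each plane.
    """
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes: honor a BOM if present, else assume big-endian."""
    if data[:2] == b"\xff\xfe":
        return data[2:].decode("utf-16-le")
    if data[:2] == b"\xfe\xff":
        return data[2:].decode("utf-16-be")
    # No BOM: "if you don't know, assume big-endian".
    # The strict decoder raises UnicodeDecodeError on a loose surrogate.
    return data.decode("utf-16-be")


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)
```

With these helpers, canonically_equal("\u00e9", "e\u0301") treats the precomposed and decomposed forms of "é" as the same text, and decode_utf16(b"\x00A") reads BOM-less input as big-endian.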