Character set policy BOF, 39th IETF
               Tuesday 0900-1000

Written by Roland Hedberg

Summary:

Harald presented the proposed IETF policy for the handling of character
sets, which briefly consists of the following: all information in the form
of strings for human consumption, transported using IETF protocols, must
have the character set and language declared. The default character set
should be ISO 10646 (Unicode) with UTF-8 as the transport encoding.
Language tags according to RFC 1766 should be used. A short discussion
followed, which made it quite obvious that there was rough consensus among
the people in the room that this was a good thing.
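
As an illustration only, the policy summarized above could be sketched
as follows. The LabeledText type, the header layout and the helper names
are assumptions made for this example; they were not part of the proposal
and do not correspond to any IETF protocol.

```python
from dataclasses import dataclass

# Policy default: ISO 10646 (Unicode) carried as UTF-8.
DEFAULT_CHARSET = "utf-8"


@dataclass(frozen=True)
class LabeledText:
    """Hypothetical container: human-readable text plus its declarations."""
    text: str
    language: str                     # RFC 1766 language tag, e.g. "no" or "en-US"
    charset: str = DEFAULT_CHARSET    # declared character set

    def to_wire(self) -> bytes:
        # Made-up wire layout: one ASCII header line, then the encoded text.
        header = f"charset={self.charset}; language={self.language}\n"
        return header.encode("ascii") + self.text.encode(self.charset)


def from_wire(data: bytes) -> LabeledText:
    # Split the header line from the body, then decode the body with the
    # character set that the header declares.
    header, _, body = data.partition(b"\n")
    params = dict(
        part.strip().split("=", 1) for part in header.decode("ascii").split(";")
    )
    return LabeledText(
        text=body.decode(params["charset"]),
        language=params["language"],
        charset=params["charset"],
    )
```

A receiver never has to guess how to decode or present the text, since
both declarations travel with it: from_wire(LabeledText("Blåbærsyltetøy",
"no").to_wire()) gives back the original text and its tags.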

Minutes:

chair: Harald Tveit Alvestrand, Harald.T.Alvestrand@uninett.no

After a run-through of the reasoning behind the IETF adopting a character
set policy, some of the points were discussed. It was concluded that the
policy should not deal with glyphs, since that is the business of the
application client; neither should the IETF deal with problems inside ISO
10646. Undefined issues that should be dealt with within the IETF are
things like character set registration and how to define comparison
between strings. It was concluded that normalization is a very hard thing
to do; it is really a research topic. So is ordering, since it is language
dependent. Therefore we should initially only deal with comparison between
strings. Furthermore, a proposal was made that protocol element names
should be in ASCII as long as we don't have rules for name comparisons.
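
To illustrate why comparison needs defined rules, here is a minimal
sketch. It assumes Python's standard unicodedata module and the NFC
normalization form; neither was discussed at the meeting, and the helper
names are invented for this example.

```python
import unicodedata

# Two encodings of the same visible character "é":
composed = "\u00e9"      # single code point, LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def equal_code_points(a: str, b: str) -> bool:
    # Plain code point comparison, with no normalization applied.
    return a == b


def equal_normalized(a: str, b: str) -> bool:
    # Comparison after bringing both strings to one normalization form.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def is_ascii_name(name: str) -> bool:
    # Check for the proposal that protocol element names stay in ASCII.
    return all(ord(ch) < 128 for ch in name)
```

The two strings look identical to a reader but differ under plain
comparison and match only after normalization. Names restricted to ASCII
avoid the question entirely, which is the point of the proposal above.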

Regarding language tags, we do not know what language tags we will need,
but we do need one tag with the meaning "the language is unknown".
It was also discussed whether we should look to either ISO or the Unicode
Consortium for maintenance of the language tags.

A straw poll among the people present in the room showed that there was
rough consensus about the four bullets in Harald's proposal.


