[uClibc 0000687]: utf-8 mbrtowc accepts invalid bytes
bugs at busybox.net
Sat Feb 18 19:49:55 UTC 2006
The following issue has been UPDATED.
======================================================================
http://busybox.net/bugs/view.php?id=687
======================================================================
Reported By: rfelker
Assigned To: uClibc
======================================================================
Project: uClibc
Issue ID: 687
Category: Internationalization / Localization
Reproducibility: always
Severity: minor
Priority: normal
Status: assigned
======================================================================
Date Submitted: 02-06-2006 00:59 PST
Last Modified: 02-18-2006 11:49 PST
======================================================================
Summary: utf-8 mbrtowc accepts invalid bytes
Description:
According to section 3.9 of the Unicode Standard, UTF-8 is a mapping
between byte sequences and "Unicode scalar values", which are integers in
one of the ranges 0-0xd7ff or 0xe000-0x10ffff. The standard is clear that
UTF-8 sequences are one to four bytes in length. uClibc's mbrtowc accepts
the illegal lead bytes 0xf5-0xfd, which allows out-of-range 4-byte
sequences as well as 5- and 6-byte sequences, for code points up to
0x7fffffff.
Although the two standards conflicted in the past, my understanding is
that ISO/IEC 10646 now agrees that UCS code points extend only through
0x10ffff and that UTF-8 is a 1-4 byte encoding, not 1-6 byte.
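
For illustration only, here is a minimal sketch of the strict check the
description asks for. It is not uClibc code: utf8_decode_strict is a
hypothetical helper name, and returning 0 for any invalid or truncated
input is an assumption made for this example (mbrtowc itself reports
errors as (size_t)-1 and (size_t)-2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of uClibc.
 * Decodes one UTF-8 sequence from s[0..n) into *out.
 * Returns the number of bytes consumed (1-4), or 0 if the input is
 * empty, truncated, or not a well-formed encoding of a scalar value. */
size_t utf8_decode_strict(const unsigned char *s, size_t n, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    if (n == 0)
        return 0;

    unsigned char b = s[0];
    size_t len;
    uint32_t cp, min;

    if (b < 0x80) {                       /* ASCII, single byte */
        *out = b;
        return 1;
    } else if (b >= 0xc2 && b <= 0xdf) {  /* 2-byte lead */
        len = 2; cp = b & 0x1f; min = 0x80;
    } else if (b >= 0xe0 && b <= 0xef) {  /* 3-byte lead */
        len = 3; cp = b & 0x0f; min = 0x800;
    } else if (b >= 0xf0 && b <= 0xf4) {  /* 4-byte lead */
        len = 4; cp = b & 0x07; min = 0x10000;
    } else {
        /* 0x80-0xc1 and 0xf5-0xff never start a valid sequence. */
        return 0;
    }

    if (n < len)
        return 0;                         /* truncated input */

    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xc0) != 0x80)
            return 0;                     /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3f);
    }

    if (cp < min)
        return 0;                         /* overlong encoding */
    if (cp > 0x10ffff)
        return 0;                         /* beyond the Unicode range */
    if (cp >= 0xd800 && cp <= 0xdfff)
        return 0;                         /* surrogate, not a scalar value */

    *out = cp;
    return len;
}
```

With this sketch, the bytes f4 8f bf bf decode to 0x10ffff, while
f4 90 80 80 (0x110000) and any sequence starting with 0xf5 or higher
are rejected.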
======================================================================
Issue History
Date Modified Username Field Change
======================================================================
02-06-06 00:59 rfelker New Issue
02-06-06 00:59 rfelker Status new => assigned
02-06-06 00:59 rfelker Assigned To => uClibc
02-18-06 11:49 vapier Category Standards Compliance => Internationalization / Localization
======================================================================