ISO/IEC 10646:2012 specifies the Universal Character Set (UCS). It is applicable to the representation, transmission, interchange, processing, storage, input and presentation of the written form of the languages of the world as well as additional symbols. It covers 110 181 characters from the world's scripts.
ISO/IEC 10646:2012:
- specifies the architecture of ISO/IEC 10646.
- defines terms used in ISO/IEC 10646.
- describes the general structure of the UCS codespace.
- specifies the Basic Multilingual Plane (BMP) of the UCS.
- specifies supplementary planes of the UCS: the Supplementary Multilingual Plane (SMP), the Supplementary Ideographic Plane (SIP), the Tertiary Ideographic Plane (TIP), and the Supplementary Special-purpose Plane (SSP).
- defines a set of graphic characters used in scripts and the written form of languages on a world-wide scale.
- specifies the names for the graphic characters and format characters of the BMP, SMP, SIP, SSP and their coded representations within the UCS codespace. (Note: TIP is currently empty).
- specifies the coded representations for control characters and private use characters.
- specifies three encoding forms of the UCS: UTF-8, UTF-16, and UTF-32.
- specifies seven encoding schemes of the UCS: UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, and UTF-32LE.
- specifies the management of future additions to this coded character set.
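The difference between the encoding forms and the encoding schemes listed above can be illustrated with a short Python sketch. It relies only on Python's built-in codecs and is not taken from the standard; the helper names `code_unit_counts` and `serialize` are hypothetical. The unmarked UTF-16 and UTF-32 schemes are left out because Python's `utf-16` and `utf-32` codecs prepend a byte order mark and use the platform's byte order, so their output varies between machines.

```python
def code_unit_counts(ch: str) -> dict:
    """Count the code units one character needs in each encoding form."""
    return {
        "UTF-8": len(ch.encode("utf-8")),            # 8-bit code units
        "UTF-16": len(ch.encode("utf-16-be")) // 2,  # 16-bit code units
        "UTF-32": len(ch.encode("utf-32-be")) // 4,  # 32-bit code units
    }


def serialize(ch: str) -> dict:
    """Serialize one character in the schemes that fix the byte order."""
    return {
        "UTF-8": ch.encode("utf-8"),
        "UTF-16BE": ch.encode("utf-16-be"),
        "UTF-16LE": ch.encode("utf-16-le"),
        "UTF-32BE": ch.encode("utf-32-be"),
        "UTF-32LE": ch.encode("utf-32-le"),
    }


if __name__ == "__main__":
    # U+20AC EURO SIGN (BMP) and U+1F600 (a supplementary-plane character)
    for ch in ("\u20ac", "\U0001F600"):
        print(f"U+{ord(ch):04X}", code_unit_counts(ch))
        for scheme, data in serialize(ch).items():
            print(f"  {scheme}: {data.hex(' ')}")
```

An encoding form determines how many code units a character occupies, while an encoding scheme additionally fixes the order of the bytes within each code unit. This is why UTF-16BE and UTF-16LE produce the same bytes in a different order.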
The charts of the ideographic characters are now in multi-column format.
The UCS is an encoding system different from that specified in ISO/IEC 2022. The method to designate UCS from ISO/IEC 2022 is specified in 12.2.
A graphic character will be assigned only one code point in the standard, located either in the BMP or in one of the supplementary planes.
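As a rough illustration of how a single code point maps to exactly one plane, the following Python sketch derives the plane number from a code point. It assumes the 17-plane codespace (U+0000 to U+10FFFF), in which the plane is the code point divided by 65 536; the function names and the lookup table are hypothetical and not part of the standard.

```python
# Plane numbers of the named planes (assumed: 0, 1, 2, 3 and 14).
PLANE_NAMES = {
    0: "BMP",   # Basic Multilingual Plane
    1: "SMP",   # Supplementary Multilingual Plane
    2: "SIP",   # Supplementary Ideographic Plane
    3: "TIP",   # Tertiary Ideographic Plane
    14: "SSP",  # Supplementary Special-purpose Plane
}


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that contains the code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"{code_point:#x} is outside the UCS codespace")
    return code_point >> 16  # each plane holds 0x10000 code points


def plane_name(code_point: int) -> str:
    """Return the abbreviated plane name, or 'other' for unnamed planes."""
    return PLANE_NAMES.get(plane_of(code_point), "other")


if __name__ == "__main__":
    for ch in ("A", "\u20ac", "\U0001F600", "\U00020000"):
        cp = ord(ch)  # a Python str of length 1 holds one code point
        print(f"U+{cp:04X} is in plane {plane_of(cp)} ({plane_name(cp)})")
```

Because the plane is computed from the code point alone, a character cannot belong to the BMP and a supplementary plane at the same time.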
By defining a consistent way of encoding multilingual text, ISO/IEC 10646:2012 enables the exchange of data internationally. The information technology industry gains data stability, greater global interoperability and data interchange. ISO/IEC 10646 has been widely adopted on the World Wide Web and implemented in modern operating systems and computer languages.
Current status: Withdrawn
Publication date: 2012-06
Edition: 3
Technical committee: Coded character sets