The Unicode Blog: Extension G

Tuesday, March 10, 2020

Announcing The Unicode® Standard, Version 13.0

Version 13.0 of the Unicode Standard is now available, including the core specification, annexes, and data files. This version adds 5,930 characters, for a total of 143,859 characters. These additions include four new scripts, for a total of 154 scripts, as well as 55 new emoji characters.

The new scripts and characters in Version 13.0 add support for modern language groups in Africa, Pakistan, South Asia, and China:

Arabic script additions to write Hausa, Wolof, and other languages in Africa, and other additions to write Hindko and Punjabi in Pakistan
A character for Syloti Nagri in South Asia
Bopomofo additions for Cantonese

Support for scholarly work was extended worldwide, including:

Yezidi, historically used in Iraq and Georgia for liturgical purposes, with some modern revival of usage
Chorasmian, historically used in Central Asia across Uzbekistan, Kazakhstan, and Turkmenistan to write an extinct Eastern Iranian language
Dives Akuru, historically used in the Maldives until the 20th century
Khitan Small Script, historically used in northern China

Popular symbol additions include:

55 emoji characters, including several new emoji for smileys, gender neutral people, animals, and the potted plant. For the full list of new emoji characters, see emoji additions for Unicode 13.0, and Emoji Counts. For a detailed description of support for emoji characters by the Unicode Standard, see UTS #51, Unicode Emoji.
Six Creative Commons license symbols that are used to describe functions, permissions, and concepts related to intellectual property that have widespread use on the web
Two Vietnamese reading marks that mark ideographs as having a distinct, colloquial reading
214 graphic characters that provide compatibility with various home computers from the mid-1970s to the mid-1980s and with early teletext broadcasting standards

Support for Chinese, Japanese, and Korean (CJK) unified ideographs was enhanced in Version 13.0 by the addition of 4,939 characters in Extension G, which is the first block to be encoded in Plane 3, as well as by significant corrections and improvements to the Unihan database. Changes to Unihan include updated regular expressions for many properties, the addition of several new properties, and the removal of three obsolete provisional properties. See UAX #38, Unicode Han Database (Unihan) for more information on the updates.
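To illustrate what "encoded in Plane 3" means in practice, here is a minimal Python sketch. It assumes the assigned Extension G range is U+30000 through U+3134A, a detail taken from the Unicode 13.0 code charts rather than from this post. The names `plane_of`, `EXT_G_FIRST`, and `EXT_G_LAST` are illustrative only.

```python
# Assumption: assigned range of CJK Unified Ideographs Extension G
# in Unicode 13.0 (from the code charts, not from this announcement).
EXT_G_FIRST = 0x30000
EXT_G_LAST = 0x3134A


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that contains a code point.

    Each plane holds 0x10000 code points, so the plane is simply the
    code point shifted right by 16 bits.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16


# Number of code points in the assumed Extension G range.
ext_g_count = EXT_G_LAST - EXT_G_FIRST + 1

print(plane_of(EXT_G_FIRST))  # 3 (Tertiary Ideographic Plane)
print(plane_of(0x4E00))       # 0 (Basic Multilingual Plane)
print(ext_g_count)            # 4939
```

Under that assumed range, the count agrees with the 4,939 characters stated above. Software that still assumes ideographs live only in Planes 0 and 2 may need a similar range check updated.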

Important chart font updates include:

An update to the code charts for the Adlam script, now using the Ebrima font. That font has an improved design and has gained widespread acceptance in the user community.
A completely updated font for the CJK Radicals Supplement and the Kangxi Radicals blocks. This font is also used to show the radicals in the CJK unified ideographs code charts, as well as in the radical-stroke indexes.

Additional support for lesser-used languages and scholarly work was extended, including:

A character used in Sinhala to write Sanskrit

Unicode properties and specifications determine the behavior of text on computers and phones. The following Unicode Standard Annexes and Technical Standards have notable modifications in Version 13.0:

Five important Unicode annexes updated for Version 13.0:

Three important Unicode specifications updated for Version 13.0:

UTS #10, Unicode Collation Algorithm — sorting Unicode text
UTS #39, Unicode Security Mechanisms — reducing Unicode spoofing
UTS #46, Unicode IDNA Compatibility Processing — compatible processing of non-ASCII URLs

The Unicode Standard is the foundation for all modern software and communications around the world, including operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases.

Over 140,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.

Tuesday, November 19, 2019

Unicode 13.0 Beta Review

The beta review period for Unicode 13.0 has started. The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 13.0 includes a number of changes and 5,930 new characters. Some of the Unicode Standard Annexes have modifications for Unicode 13.0, often in coordination with changes to character properties. For the first time, a CJK extension has been encoded in Plane 3, the Tertiary Ideographic Plane. Four new scripts have been added in Unicode 13.0. There are also 55 additional emoji characters and many new emoji sequences, including the transgender flag and polar bear.
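The transgender flag and polar bear are not single code points. Each is a sequence of existing characters joined by U+200D ZERO WIDTH JOINER. The sketch below shows how such sequences look to code that counts code points. The sequences are written out as listed in the emoji data files, and the helper name `code_points` is hypothetical.

```python
ZWJ = "\u200D"   # ZERO WIDTH JOINER
VS16 = "\uFE0F"  # VARIATION SELECTOR-16 (requests emoji presentation)

# Polar bear: BEAR FACE + ZWJ + SNOWFLAKE + VS16
polar_bear = "\U0001F43B" + ZWJ + "\u2744" + VS16

# Transgender flag: WAVING WHITE FLAG + VS16 + ZWJ + TRANSGENDER SYMBOL + VS16
transgender_flag = "\U0001F3F3" + VS16 + ZWJ + "\u26A7" + VS16


def code_points(text: str) -> list[str]:
    """Format each code point of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(polar_bear))
# ['U+1F43B', 'U+200D', 'U+2744', 'U+FE0F']
print(len(transgender_flag))
# 5
```

Code that truncates, reverses, or counts text per code point can split such a sequence, so testing against the beta data files with whole sequences is worthwhile.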

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by January 6, 2020. Feedback instructions are on the beta page.

See http://unicode.org/versions/beta-13.0.0.html for more information about testing the 13.0.0 beta.

See http://unicode.org/versions/Unicode13.0.0/ for the current draft summary of Unicode 13.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, Emojipedia, Facebook, Google, Government of Bangladesh, Government of India, Huawei, IBM, Microsoft, Monotype Imaging, Netflix, Sultanate of Oman MARA, Oracle, SAP, Tamil Virtual University, The University of California (Berkeley), plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to https://home.unicode.org/membership/members/.

Over 130,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.
