Working towards consensus for IDN Han characters

Published on 9 Dec 2016

Category: Tech matters

Internet and linguistics experts met at IETF 97 to discuss the Han characters used in Internationalized Domain Names.

Formed in May 2000, the Chinese Domain Name Consortium (CDNC) is an independent non-profit organization that coordinates and regulates the use of Traditional and Simplified Chinese Internationalized Domain Names (IDNs).

The CDNC also plays a role in promoting the sustainable development of ICANN, in both technical and policy aspects, including Internationalized Domain Names in Applications (IDNA), new generic top-level domains (gTLDs), and Root Zone Label Generation Rules (LGRs).

CDNC’s overall solution for registering Traditional or Simplified Chinese IDNs is to allow a single user to register an IDN together with its variant combinations, so that the variants cannot be registered separately and cause confusion. This follows the Traditional and Simplified Chinese IDN table and the accompanying registration policy.
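As a rough illustration of what "an IDN and its variant combinations" means, the sketch below enumerates every label that can be formed by swapping each character for its variants. The table `TOY_VARIANTS` and the function `variant_labels` are hypothetical names made up for this example; the table holds one well-known Simplified/Traditional pair and is not the CDNC IDN table.

```python
from itertools import product

# Toy variant table: an assumption for illustration only, not the CDNC table.
# Each character maps to the full set of characters treated as its variants.
TOY_VARIANTS = {
    "国": {"国", "國"},
    "國": {"国", "國"},
}


def variant_labels(label, table):
    """Return every label formed by replacing characters with their variants."""
    # A character missing from the table has only itself as a choice.
    choices = [sorted(table.get(ch, {ch})) for ch in label]
    return sorted("".join(combo) for combo in product(*choices))


bundle = variant_labels("中国", TOY_VARIANTS)
print(bundle)
```

Under a policy like the one described above, all labels in `bundle` would belong to the same registrant. The number of combinations grows multiplicatively with the number of characters that have variants, which is one reason the size of each variant group matters.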

Among the new gTLD applications, there are 116 IDN strings, of which 73 are Chinese. The Chinese Generation Panel (CGP), which includes representatives of CDNC, has played a significant role in ICANN’s plan to develop and maintain the LGRs for the Root Zone with respect to IDN variants.

In recent years, the CGP, as part of the China-Japan-Korea (CJK) working group, has collaborated with its neighbouring Generation Panels in Japan (JGP) and Korea (KGP) and made significant progress towards consensus on the Han character variants that the three economies share in the Maximal Starting Repertoire (MSR). Following the most recent CJK meeting, held in Taipei, Taiwan, in September 2016, only 60 sets of Han characters remained unresolved.

Internet and linguistics experts have been working together to reduce the number of unresolved IDN Han character sets.

Chinese experts, following the principle that Traditional and Simplified Chinese characters are equivalent, place certain variant characters from each of the unresolved sets in the same variant group. Korean experts take a different view, particularly where simplification causes characters with different meanings to be placed in the same variant group.
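The concern about different meanings landing in one group can be sketched with a small grouping routine. The example below is only an illustration: `build_variant_groups` is a hypothetical helper, and the pairs are well-known Simplified/Traditional mappings chosen for clarity, not necessarily any of the sets under discussion. Because 发 is the Simplified form of both 發 ("to emit") and 髮 ("hair"), treating variant relations as transitive puts the two Traditional characters in the same group even though their meanings differ.

```python
def build_variant_groups(pairs):
    """Group characters so that any two linked by a chain of pairs share a group."""
    parent = {}

    def find(ch):
        # Follow parent links to the group representative, shortening the path.
        parent.setdefault(ch, ch)
        while parent[ch] != ch:
            parent[ch] = parent[parent[ch]]
            ch = parent[ch]
        return ch

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for ch in list(parent):
        groups.setdefault(find(ch), set()).add(ch)
    return sorted(sorted(group) for group in groups.values())


# Illustrative pairs only (assumed for this example).
pairs = [("发", "發"), ("发", "髮"), ("国", "國")]
groups = build_variant_groups(pairs)
for group in groups:
    print(group)
```

Whether two characters belong in the same group is therefore a policy decision as much as a technical one: a grouping that is natural from the Simplified Chinese side can merge characters that readers of another language treat as distinct.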

During IETF 97 in Seoul, 17 Internet and linguistics experts, including Shian-Shyong Tseng, Ai-Chin Lu, Kenny Huang, Nai-Wen Hsu (.tw), Wesley Wang (.cn), Hiro Hotta (.jp), Kim Kyongsok, Dongman Lee and Min Jung Park (.kr), met to review the 60 unresolved sets of characters.

During the eight-hour meeting, the experts reduced the number of unresolved character sets from 60 to six by considering how frequently the variants are used in daily life and in domain name registration, and whether the characters and their variants share the same meaning.

The experts met again on 17 November, further reducing the number of unresolved character sets from six to three.

Chinese and Korean linguistics experts still hold different views on these unresolved character sets. If consensus is not reached, there will be no Han character strings in the next round of new gTLD applications.

Shian-Shyong Tseng is a Chair Professor at Asia University, Taiwan, and a Co-chair of the Chinese Domain Name Consortium. He previously served as Chairman of the board of directors of TWNIC.

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.
