An International Character Set Specification

Intended for ASN.1 Encoding

February 1998

by Thierry Moreau

CONNOTECH Experts-conseils Inc.


Overview

In the spirit of ASN.1 encoding, a standards-based and non-proprietary encoding method is needed for international characters and punctuation marks beyond those offered by the 94-position basic ASCII character set.

It seems too demanding to seek strict compliance with the standards ISO 2022 and ISO 2375, which are referred to in the ASN.1 specification. In practice, these international standards received little attention in the computing community and had little impact on the solutions adopted by various hardware and software manufacturers. The key advantages of the proposed international character set are listed below.

The multinational coding goal can be described as follows:

In this document, to describe the character encoding, the column/row position in numeric format is used, where the actual character code value is 16 × column + row. For instance, 2/0 is the space character (character code 32), and 4/3 is the upper case letter C.
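The column/row notation can be sketched in a few lines of Python. The function names code_value and position are hypothetical, chosen only for this illustration; the arithmetic is the 16 × column + row rule stated above.

```python
def code_value(column, row):
    """Return the character code for a column/row position (16 * column + row)."""
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError("column and row must each be in the range 0..15")
    return 16 * column + row


def position(code):
    """Return the (column, row) pair for an 8-bit character code."""
    if not 0 <= code <= 255:
        raise ValueError("code must be in the range 0..255")
    return divmod(code, 16)


print(code_value(2, 0))       # 32, the space character
print(chr(code_value(4, 3)))  # C
print(position(67))           # (4, 3)
```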

The proposed encoding is based on 8-bit encoded characters with provision for "nonspacing characters." Here is the breakdown of the 256 available positions. Character positions 0/0 to 1/15 are not specified herein (some may keep their usual interpretation, such as the Line Feed character at position 0/10). Character positions 2/0 to 7/14 are the ASCII table. Character positions 7/15 to 10/0 are not specified herein. Character positions 10/1 to 15/14, except for nine positions whose usage is not recommended, constitute a "supplementary character set." Finally, character position 15/15 is not defined herein.
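As a rough illustration of this breakdown, the following Python sketch labels a character code by the range it falls in. The function name classify and the label strings are hypothetical. The sketch also assumes that the nine positions whose usage is not recommended are the nine box-drawing positions listed in note 1 of the supplementary character set section (12/0, 12/9, 13/6 to 13/11, and 14/5).

```python
# Assumption: the nine "not recommended" positions are the box-drawing
# positions 12/0, 12/9, 13/6 to 13/11, and 14/5 named in note 1.
NOT_RECOMMENDED = {0xC0, 0xC9, 0xE5} | set(range(0xD6, 0xDC))


def classify(code):
    """Return a label describing where an 8-bit code falls in the layout."""
    if not 0 <= code <= 0xFF:
        raise ValueError("code must be in the range 0..255")
    if code <= 0x1F:            # 0/0 to 1/15
        return "unspecified"
    if code <= 0x7E:            # 2/0 to 7/14
        return "ascii"
    if code <= 0xA0:            # 7/15 to 10/0
        return "unspecified"
    if code == 0xFF:            # 15/15
        return "unspecified"
    if code in NOT_RECOMMENDED:
        return "not recommended"
    if code >> 4 == 12:         # column 12 holds the nonspacing characters
        return "supplementary, nonspacing"
    return "supplementary"      # remainder of 10/1 to 15/14
```

For example, classify(0x0A) gives "unspecified" for Line Feed, classify(0x43) gives "ascii", and classify(0xC2) gives "supplementary, nonspacing".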

Supplementary character set

The supplementary character set comprises punctuation marks, currency symbols, single-character international symbols, and other symbols in columns 10, 11, 13, 14, and 15. Column 12 contains nonspacing characters, mainly diacritical marks used in the two-character encoding of accented international symbols. Such a symbol is encoded as the nonspacing diacritical mark followed by the letter to which the mark applies.


NAPLPS1.GIF

Subset of the supplementary character set

Notes:

  1. The characters in positions 12/0, 12/9, 13/6 to 13/11, and 14/5 from [1] are omitted because they are box-drawing characters. According to [2], these characters "are not widely accepted in other telematic services."
  2. The characters in column 12 are nonspacing. The diacritical marks (positions 12/1 to 12/8, 12/10, 12/11, and 12/13 to 12/15) are meant to precede a space character (position 2/0) or a letter according to the following table.
  3. The nonspacing underline character (position 12/12) is meant to precede the character to which it applies and the diacritical mark, if any.
  4. The character lower case n with apostrophe, position 14/15, has no upper-case equivalent.
  5. The character at position 15/0 is Greenlandic lower case k.
  6. The character at position 15/5 is lower case i without dot.

Suggested minimal conversion process

The suggested conversion process from multi-national strings to straight ASCII is to drop any character above 7/15 (especially in the range 10/1 to 15/14), except for the following ones that should be translated into sequences of one to three ASCII characters (positions 2/1 to 7/14) according to the following list.

10/4  --> 2/4 ($)
10/6  --> 2/3 (#)
10/9  --> 2/7 (')
10/10 --> 2/2 (")
10/11 --> 2/2 (")
11/1  --> 2/11 2/15 2/13 (+/-)
11/2  --> 3/2 (2)
11/3  --> 3/3 (3)
11/4  --> 2/10 (*)
11/7  --> 2/14 (.)
11/8  --> 2/15 (/)
11/9  --> 2/7 (')
11/10 --> 2/2 (")
11/11 --> 2/2 (")
11/12 --> 3/1 2/15 3/4 (1/4)
11/13 --> 3/1 2/15 3/2 (1/2)
11/14 --> 3/3 2/15 3/4 (3/4)
13/0  --> 2/13 (-)
13/1  --> 3/1 (1)
13/2  --> 2/8 5/2 2/9 ((R))
13/3  --> 2/8 4/3 2/9 ((C))
13/4  --> 5/4 4/13 (TM)
13/12 --> 3/1 2/15 3/8 (1/8)
13/13 --> 3/3 2/15 3/8 (3/8)
13/14 --> 3/5 2/15 3/8 (5/8)
13/15 --> 3/7 2/15 3/8 (7/8)
14/1  --> 4/1 4/5 (AE)
14/2  --> 4/4 (D)
14/4  --> 4/8 (H)
14/6  --> 4/9 4/10 (IJ)
14/7  --> 4/12 (L)
14/8  --> 4/12 (L)
14/9  --> 4/15 (O)
14/10 --> 4/15 4/5 (OE)
14/12 --> 5/4 6/8 (Th)
14/13 --> 5/4 (T)
14/14 --> 4/14 6/10 (Nj)
14/15 --> 6/14 (n)
15/0  --> 6/11 (k)
15/1  --> 6/1 6/5 (ae)
15/2  --> 6/4 (d)
15/3  --> 6/4 (d)
15/4  --> 6/8 (h)
15/5  --> 6/9 (i)
15/6  --> 6/9 6/10 (ij)
15/7  --> 6/12 (l)
15/8  --> 6/12 (l)
15/9  --> 6/15 (o)
15/10 --> 6/15 6/5 (oe)
15/11 --> 7/3 (s)
15/12 --> 7/4 6/8 (th)
15/13 --> 7/4 (t)
15/14 --> 6/14 6/10 (nj)
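The conversion process above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the names TRANSLATIONS and to_ascii are hypothetical, the table stores the ASCII text shown in parentheses in the list, and the sketch assumes that every character up to and including 7/15 is passed through unchanged.

```python
# Replacement text keyed by (column, row), taken from the list above.
TRANSLATIONS = {
    (10, 4): "$", (10, 6): "#", (10, 9): "'", (10, 10): '"', (10, 11): '"',
    (11, 1): "+/-", (11, 2): "2", (11, 3): "3", (11, 4): "*", (11, 7): ".",
    (11, 8): "/", (11, 9): "'", (11, 10): '"', (11, 11): '"',
    (11, 12): "1/4", (11, 13): "1/2", (11, 14): "3/4",
    (13, 0): "-", (13, 1): "1", (13, 2): "(R)", (13, 3): "(C)", (13, 4): "TM",
    (13, 12): "1/8", (13, 13): "3/8", (13, 14): "5/8", (13, 15): "7/8",
    (14, 1): "AE", (14, 2): "D", (14, 4): "H", (14, 6): "IJ", (14, 7): "L",
    (14, 8): "L", (14, 9): "O", (14, 10): "OE", (14, 12): "Th", (14, 13): "T",
    (14, 14): "Nj", (14, 15): "n",
    (15, 0): "k", (15, 1): "ae", (15, 2): "d", (15, 3): "d", (15, 4): "h",
    (15, 5): "i", (15, 6): "ij", (15, 7): "l", (15, 8): "l", (15, 9): "o",
    (15, 10): "oe", (15, 11): "s", (15, 12): "th", (15, 13): "t",
    (15, 14): "nj",
}


def to_ascii(data):
    """Convert a multinational byte string to a plain ASCII str."""
    out = []
    for code in data:
        if code <= 0x7F:
            # Assumption: positions up to 7/15 are kept as they are.
            out.append(chr(code))
        else:
            # Translate listed positions; drop everything else, which
            # includes the nonspacing characters of column 12.
            out.append(TRANSLATIONS.get(divmod(code, 16), ""))
    return "".join(out)


print(to_ascii(b"caf" + bytes([0xC2]) + b"e " + bytes([0xBD])))  # cafe 1/2
```

Because the nonspacing marks of column 12 are simply dropped, an accented letter degrades to its base letter: the sequence 12/2, 6/5 becomes a plain e.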

Usage of diacritical marks

A diacritical mark should precede an upper case letter (positions 4/1 to 5/10), a lower case letter (positions 6/1 to 7/10), or a space character (position 2/0). For instance, the representation of é (e with acute accent) is characters 12/2, 6/5. Any diacritical mark may be turned into a spacing character by prefixing it to the space character (position 2/0).
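The two-character encoding can be illustrated with a short Python sketch. The name accented and the set DIACRITICAL_ROWS are hypothetical; the rows are those listed as diacritical marks in note 2 of the supplementary character set section. The sketch only checks that the mark precedes an ASCII letter or a space. It does not check the table of allowed diacritical marks, so it accepts combinations that the table may not list.

```python
# Rows of column 12 that hold diacritical marks (12/1 to 12/8, 12/10,
# 12/11, and 12/13 to 12/15). Row 12 is the nonspacing underline.
DIACRITICAL_ROWS = {1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 13, 14, 15}


def accented(mark_row, base):
    """Encode a diacritical mark at 12/mark_row followed by a base character."""
    if mark_row not in DIACRITICAL_ROWS:
        raise ValueError("not a diacritical mark position in column 12")
    is_letter = len(base) == 1 and base.isascii() and base.isalpha()
    if not (is_letter or base == " "):
        raise ValueError("base must be an ASCII letter or a space")
    return bytes([16 * 12 + mark_row, ord(base)])


print(accented(2, "e").hex())  # c265, positions 12/2 and 6/5
print(accented(2, " ").hex())  # c220, the acute accent as a spacing character
```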


NAPLPS2.GIF

Table of allowed diacritical marks

Notes:

  1. Letters G and I have explicit, separate rows for upper and lower case. For other letters (e.g. A), only the upper case representation appears, and an identical row for the lower case letter is implied.
  2. The upper case equivalent of lower case g with acute accent (12/2, 6/7) is upper case G with cedilla (12/11, 4/7), and vice-versa.
  3. The above table is based on [1], and expanded to include international characters defined in [3]. Characters between brackets are those found in [3] but not found in [1].

The English names of diacritical marks are

References:

[1] ANSI and CSA, Videotext/Teletext Presentation Level Protocol Syntax, North American PLPS, ANSI X3.110-1983 or CSA T500-1983, American National Standards Institute, Inc., and Canadian Standards Association, 1983

[2] Bell Canada, Corporate Business Development, Information Coding Specification for the Bell Videotext System, Issue 3, November 1987

[3] WordPerfect Corporation, WordPerfect version 6.0 (DOS) Reference, WordPerfect Corporation, 1993



CONNOTECH Experts-conseils Inc.
9130 Place de Montgolfier
Montréal, Québec, Canada, H2M 2A1
Tél.: +1-514-385-5691 Fax: +1-514-385-5900