In the spirit of ASN.1 encoding, a standards-based and non-proprietary encoding method is needed for international characters and punctuation marks beyond those offered by the 94-position basic ASCII character set.
It seems too demanding to seek strict compliance with the standards ISO 2022 and ISO 2375, which are referred to in the ASN.1 specification. In practice, these international standards received little attention in the computing community and had little impact on the solutions adopted by various hardware and software manufacturers. The key advantages of the proposed international character set are listed below.
The multinational coding goal can be described as follows:
In this document, to describe the character encoding, the column/row position in numeric format is used, where the actual character code value is 16 × column + row. For instance, 2/0 is the space character (character code 32), and 4/3 is the upper case letter C.
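As an illustration of this notation, the following minimal Python sketch converts between column/row positions and character code values. The function names (to_code, to_position, parse_position) are chosen here for illustration only and are not part of the proposal.

```python
from typing import Tuple


def to_code(column: int, row: int) -> int:
    """Return the 8-bit character code for a column/row position."""
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError("column and row must each be in 0..15")
    return 16 * column + row


def to_position(code: int) -> Tuple[int, int]:
    """Return the (column, row) position of an 8-bit character code."""
    if not 0 <= code <= 255:
        raise ValueError("code must be in 0..255")
    return divmod(code, 16)


def parse_position(text: str) -> int:
    """Parse a position written as 'column/row', such as '4/3', into a code."""
    column, row = text.split("/")
    return to_code(int(column), int(row))
```

With these helpers, to_code(2, 0) gives 32 (the space character) and chr(parse_position("4/3")) gives the upper case letter C, matching the two examples above.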
The proposed encoding is based on 8-bit encoded characters with provision for "nonspacing characters." Here is the breakdown of the 256 available positions. Character positions 0/0 to 1/15 are not specified herein (some may keep their usual interpretation, such as Line Feed character at position 0/10). Character positions 2/0 to 7/14 are the ASCII table. Character positions 7/15 to 10/0 are not specified herein. Character positions 10/1 to 15/14, except for nine positions whose usage is not recommended, constitute a "supplementary character set." Finally, character position 15/15 is not defined herein.
The supplementary character set comprises punctuation marks, currency symbols, single-character international symbols, and other symbols in columns 10, 11, 13, 14, and 15. Column 12 contains nonspacing characters, mainly diacritical marks, which are used in a two-character encoding of accented international symbols. Such a symbol is encoded as the nonspacing diacritical mark followed by the letter to which the mark is applied to render the symbol.
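The breakdown of the 256 positions can be summarized with a small Python sketch. The function name classify and the region labels are hypothetical, chosen for illustration. The nine supplementary positions whose usage is not recommended are not identified in the text above, so this sketch does not distinguish them.

```python
def classify(code: int) -> str:
    """Return the region of the 8-bit code table that a code falls in."""
    if not 0 <= code <= 0xFF:
        raise ValueError("code must be in 0..255")
    if code <= 0x1F:            # positions 0/0 to 1/15
        return "unspecified"
    if code <= 0x7E:            # positions 2/0 to 7/14
        return "ascii"
    if code <= 0xA0:            # positions 7/15 to 10/0
        return "unspecified"
    if code == 0xFF:            # position 15/15
        return "undefined"
    if code >> 4 == 12:         # column 12: nonspacing characters
        return "nonspacing"
    # Columns 10, 11, 13, 14, and 15 (10/1 to 15/14 outside column 12).
    # Assumption: not-recommended positions are reported like any other.
    return "supplementary"
```

For example, classify(0x0A) reports the Line Feed position as "unspecified", while classify(0xC2) reports position 12/2 as "nonspacing".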
Subset of the supplementary character set
Notes:
The suggested conversion process from multi-national strings to straight ASCII is to drop any character above 7/15 (especially in the range 10/1 to 15/14), except for the following ones that should be translated into sequences of one to three ASCII characters (positions 2/1 to 7/14) according to the following list.
10/4 --> 2/4 ($)
10/6 --> 2/3 (#)
10/9 --> 2/7 (')
10/10 --> 2/2 (")
10/11 --> 2/2 (")
11/1 --> 2/11 2/15 2/13 (+/-)
11/2 --> 3/2 (2)
11/3 --> 3/3 (3)
11/4 --> 2/10 (*)
11/7 --> 2/14 (.)
11/8 --> 2/15 (/)
11/9 --> 2/7 (')
11/10 --> 2/2 (")
11/11 --> 2/2 (")
11/12 --> 3/1 2/15 3/4 (1/4)
11/13 --> 3/1 2/15 3/2 (1/2)
11/14 --> 3/3 2/15 3/4 (3/4)
13/0 --> 2/13 (-)
13/1 --> 3/1 (1)
13/2 --> 2/8 5/2 2/9 ((R))
13/3 --> 2/8 4/3 2/9 ((C))
13/4 --> 5/4 4/13 (TM)
13/12 --> 3/1 2/15 3/8 (1/8)
13/13 --> 3/3 2/15 3/8 (3/8)
13/14 --> 3/5 2/15 3/8 (5/8)
13/15 --> 3/7 2/15 3/8 (7/8)
14/1 --> 4/1 4/5 (AE)
14/2 --> 4/4 (D)
14/4 --> 4/8 (H)
14/6 --> 4/9 4/10 (IJ)
14/7 --> 4/12 (L)
14/8 --> 4/12 (L)
14/9 --> 4/15 (O)
14/10 --> 4/15 4/5 (OE)
14/12 --> 5/4 6/8 (Th)
14/13 --> 5/4 (T)
14/14 --> 4/14 6/10 (Nj)
14/15 --> 6/14 (n)
15/0 --> 6/11 (k)
15/1 --> 6/1 6/5 (ae)
15/2 --> 6/4 (d)
15/3 --> 6/4 (d)
15/4 --> 6/8 (h)
15/5 --> 6/9 (i)
15/6 --> 6/9 6/10 (ij)
15/7 --> 6/12 (l)
15/8 --> 6/12 (l)
15/9 --> 6/15 (o)
15/10 --> 6/15 6/5 (oe)
15/11 --> 7/3 (s)
15/12 --> 7/4 6/8 (th)
15/13 --> 7/4 (t)
15/14 --> 6/14 6/10 (nj)
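The conversion process described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names ASCII_FALLBACK and to_ascii are hypothetical, the table keys are (column, row) positions, and the values are the ASCII characters shown in parentheses in the list. The sketch assumes that characters at or below position 7/15 are passed through unchanged.

```python
# Replacement strings for supplementary characters, keyed by (column, row).
ASCII_FALLBACK = {
    (10, 4): "$", (10, 6): "#", (10, 9): "'", (10, 10): '"', (10, 11): '"',
    (11, 1): "+/-", (11, 2): "2", (11, 3): "3", (11, 4): "*", (11, 7): ".",
    (11, 8): "/", (11, 9): "'", (11, 10): '"', (11, 11): '"',
    (11, 12): "1/4", (11, 13): "1/2", (11, 14): "3/4",
    (13, 0): "-", (13, 1): "1", (13, 2): "(R)", (13, 3): "(C)", (13, 4): "TM",
    (13, 12): "1/8", (13, 13): "3/8", (13, 14): "5/8", (13, 15): "7/8",
    (14, 1): "AE", (14, 2): "D", (14, 4): "H", (14, 6): "IJ", (14, 7): "L",
    (14, 8): "L", (14, 9): "O", (14, 10): "OE", (14, 12): "Th", (14, 13): "T",
    (14, 14): "Nj", (14, 15): "n",
    (15, 0): "k", (15, 1): "ae", (15, 2): "d", (15, 3): "d", (15, 4): "h",
    (15, 5): "i", (15, 6): "ij", (15, 7): "l", (15, 8): "l", (15, 9): "o",
    (15, 10): "oe", (15, 11): "s", (15, 12): "th", (15, 13): "t",
    (15, 14): "nj",
}


def to_ascii(data: bytes) -> str:
    """Convert a multinational string to ASCII, dropping unmapped characters."""
    out = []
    for code in data:
        if code <= 0x7F:
            # At or below position 7/15: kept as is (assumption stated above).
            out.append(chr(code))
        else:
            # Above 7/15: translate if listed, otherwise drop the character.
            out.append(ASCII_FALLBACK.get(divmod(code, 16), ""))
    return "".join(out)
```

Because the nonspacing characters of column 12 have no entry in the table, they are dropped and the letter that follows is kept: to_ascii(bytes([0x63, 0x61, 0x66, 0xC2, 0x65])) returns "cafe".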
A diacritical mark should precede an upper case letter (positions 4/1 to 5/10), a lower case letter (positions 6/1 to 7/10), or a space character (position 2/0). For instance, the representation of é (e with acute accent) is the character sequence 12/2, 6/5. Any diacritical mark may be turned into a spacing character by placing it before the space character (position 2/0).
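The two-character rule can be checked mechanically. The Python sketch below splits a byte string into units, pairing each nonspacing character with the letter or space that follows it. The function names are hypothetical, and the sketch assumes that every position in column 12 is a nonspacing mark; an implementation would restrict this to the marks actually allowed.

```python
from typing import List, Optional, Tuple


def is_nonspacing(code: int) -> bool:
    """Assumption: any position in column 12 is a nonspacing mark."""
    return code >> 4 == 12


def is_valid_base(code: int) -> bool:
    """True for space (2/0), A-Z (4/1 to 5/10), and a-z (6/1 to 7/10)."""
    return code == 0x20 or 0x41 <= code <= 0x5A or 0x61 <= code <= 0x7A


def split_units(data: bytes) -> List[Tuple[Optional[int], int]]:
    """Split data into (mark, base) units; mark is None for plain characters."""
    units: List[Tuple[Optional[int], int]] = []
    i = 0
    while i < len(data):
        code = data[i]
        if is_nonspacing(code):
            # A mark must be followed by a letter or a space.
            if i + 1 >= len(data) or not is_valid_base(data[i + 1]):
                raise ValueError(
                    f"nonspacing character at offset {i} is not followed "
                    "by a letter or a space"
                )
            units.append((code, data[i + 1]))
            i += 2
        else:
            units.append((None, code))
            i += 1
    return units
```

For the example above, split_units(bytes([0xC2, 0x65])) yields the single unit (0xC2, 0x65), that is, mark 12/2 applied to the letter at 6/5. A mark followed by a digit, or a mark at the very end of the string, raises ValueError.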
Table of allowed diacritical marks
Notes:
The English names of diacritical marks are
References:
[1] ANSI and CSA, Videotext/Teletext Presentation Level Protocol Syntax, North American PLPS, ANSI X3.110-1983 or CSA T500-1983, American National Standards Institute, Inc., and Canadian Standards Association, 1983
[2] Bell Canada, Corporate Business Development, Information Coding Specification for the Bell Videotext System, Issue 3, November 1987
[3] WordPerfect Corporation, WordPerfect version 6.0 (DOS) Reference, WordPerfect Corporation, 1993
CONNOTECH Experts-conseils Inc.
9130 Place de Montgolfier
Montréal, Québec, Canada, H2M 2A1
Tél.: +1-514-385-5691
Fax: +1-514-385-5900