NAME
	  ctab - Locale	character classification, case conversion, and
	  collating input file

     DESCRIPTION
	  A locale character classification, case conversion and col-
	  lating input file consists of	records	separated by newline
	  characters. Each record consists of one character or colla-
	  tion element in the locale, where a collation	element	is a
	  sequence of two or more characters that collate as a single
	  unit.	 These files are not directly accessed by user pro-
	  grams: the ctab command reads	them to	produce	binary files
	  loaded by the	setlocale() function.

	  The ordering of the records determines the order of the
	  locale's characters.	Records	marked with the	translate or
	  ignore indicator (see	KEYWORDS) do not reflect this order-
	  ing.	The ordering of	characters in a	locale may also	be
	  referred to as their collation weights.

	  Several characters may have the same primary collation
	  weight but different secondary weights.  In French, for
	  example, the plain and accented forms of the letter a all
	  sort to the same primary location.  If there is a tie
	  between a plain and an accented character, however, a
	  secondary sort is applied.  A group of characters with the
	  same primary collation value is said to belong to the same
	  equivalence class.  If a character is not part of an
	  equivalence class, it has identical primary and secondary
	  collation weights.
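
	  As a rough illustration of the two-level comparison described
	  above, the following Python sketch sorts strings by primary
	  weight and breaks ties with the secondary weight.  The weight
	  table and the name collation_key are hypothetical and written
	  by hand for this example; they are not produced by ctab.

```python
# Hypothetical weights: character -> (primary, secondary).
# The secondary weight is the record's position; characters in one
# equivalence class share the primary weight of the first member.
A_GRAVE = "\u00e0"       # a with grave accent
A_CIRCUMFLEX = "\u00e2"  # a with circumflex

WEIGHTS = {
    "a": (1, 1),
    A_GRAVE: (1, 2),       # same primary weight as "a"
    A_CIRCUMFLEX: (1, 3),  # same primary weight as "a"
    "b": (4, 4),           # not in a class: identical weights
    "c": (5, 5),
}


def collation_key(s):
    """Build a sort key: all primary weights, then all secondary weights."""
    primary = tuple(WEIGHTS[ch][0] for ch in s)
    secondary = tuple(WEIGHTS[ch][1] for ch in s)
    return (primary, secondary)


words = [A_GRAVE + "b", "b", "ab", "ac", A_CIRCUMFLEX + "b"]
sorted_words = sorted(words, key=collation_key)
```

	  Under these assumed weights, the three spellings that differ
	  only by accent stay together (plain, grave, circumflex) and
	  all sort before "ac", because the secondary weights are
	  consulted only when the primary weights tie.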

	  This primary and secondary collation weight information is
	  used in applications,	such as	grep, which use	ctab informa-
	  tion to determine string sequence.

	  The ctab input file describes	the collating weights for an
	  assumed code set and a particular language.  If a character
	  is encountered which does not	appear in the ctab file
	  corresponding	to the current locale, the character's colla-
	  tion weight will be based on its relative position in	the
	  current code set.

	  Records in the locale ctab input files have fields separated
	  by a separator character.  By default, this separator is a :
	  (colon), but the user can change this; see KEYWORDS.  The
	  records have the following fields:

	  subject character
	       The subject character field is actually the collating
	       element, which may consist of more than one character.
	       If the subject character is a multicharacter
	       collating element, the first character in the element
	       must also be defined as a subject character elsewhere
	       in the input file.  If the character or collating ele-
	       ment is followed	by the equivalence class character,
	       which is	a ^ (circumflex) by default, it	is given the
	       same primary collating weight as	the character
	       represented by the preceding record.  The secondary
	       collation weight	is unique.  Characters can be speci-
	       fied using octal	escape sequences consisting of a \
	       (backslash) followed by one or more octal digits.  Any
	       backslash not followed by an octal digit	is an escape
	       character.  The subject character field must be ter-
	       minated by a separator character	even if	there are no
	       other fields in the record.

	  case conversion
	       The case	conversion field specifies the character that
	       is the inverse case of the character in the first
	       field.  For example, if the first field is p, the
	       second field is P.  If the third	field, the character
	       classification field (see below), contains an l or L
	       (for lowercase),	the second field specifies the upper-
	       case equivalent of the subject character.  If the char-
	       acter classification field contains a u or U (for
	       uppercase), the case conversion field specifies the
	       lowercase equivalent of the subject character.  Any
	       character with a	nonempty case conversion field can
	       specify the corresponding uppercase or lowercase
	       letter.	Characters classified as alphabetic do not
	       require a corresponding case; that is, the second field
	       can be empty.  The second field currently is not	used
	       for SJIS	characters when	Japanese Language Support is
	       installed.

	  character classification
	       The character classification field values assume	the
	       following classes and values:

	       u or U	      Uppercase	letter

	       l or L	      Lowercase	letter

	       a or A	      Alphabetic character

	       n or N	      Digits

	       x or X	      Hexadecimal digits

	       p or P	      Punctuation characters

	       s or S	      Whitespace characters

	       c or C	      Control characters

	       g or G	      Graphic

	       -	      No type

	       Characters can belong to	more than one character	class,
	       subject to certain rules.  The difference between
	       graphic and printable characters	is that	the set	of
	       graphic characters does not include the space charac-
	       ter, but	the set	of printable characters	does include
	       the space character.  The ASCII code set	is predefined
	       as follows:

	       A through Z    Uppercase	letters

	       a through z    Lowercase	letters

	       A through Z, and	a through z
			      Alphabetic characters

	       0 through 9    Digits

	       Alphabetic characters and digits
			      Alphanumeric characters

	       0 through 9, A through F, and a through f
			      Hexadecimal digits

	       Any character below the Space character and the Delete character
			      Control characters

	       Space, formfeed, newline, carriage-return, horizontal tab,
	       and vertical tab
			      Whitespace characters

	       Any character except the	above
			      Punctuation characters

	       Characters not defined as alphabetic are	automatically
	       defined as punctuation.
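
	  The classification letters listed above can be decoded with
	  a small lookup table.  The sketch below is illustrative only:
	  the function name classify is hypothetical, and it assumes
	  that a character belonging to several classes lists the
	  letters one after another in the field, which this page does
	  not state explicitly.

```python
# Class letters from the table above, keyed by their lowercase form.
CLASS_NAMES = {
    "u": "uppercase letter",
    "l": "lowercase letter",
    "a": "alphabetic character",
    "n": "digit",
    "x": "hexadecimal digit",
    "p": "punctuation character",
    "s": "whitespace character",
    "c": "control character",
    "g": "graphic",
}


def classify(field):
    """Return the set of class names named by a classification field.

    Assumption: multiple classes are written as adjacent letters.
    An empty field or "-" yields an empty set (no type given).
    """
    field = field.strip()
    if field in ("", "-"):
        return set()
    classes = set()
    for letter in field:
        name = CLASS_NAMES.get(letter.lower())
        if name is None:
            raise ValueError(f"unknown classification letter: {letter!r}")
        classes.add(name)
    return classes
```

	  For example, classify("c") returns a set containing only
	  "control character", which matches the \177::c record shown
	  in EXAMPLES.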

     KEYWORDS
	  A line beginning with	the word "option" serves to change one
	  or more of the default conditions or metacharacters built
	  into the collating table.  The word "option" is followed by
	  one or more keyword/value pairs.  Keywords and values	are
	  separated by tab or space characters.	 The following key-
	  words	are recognized:

	  comment
	       Uses the	assigned value as the comment character.  The
	       default value is	the # (number sign).  Anything on a
	       line that follows the comment character is ignored.

	  sep  Uses the	assigned value as the field separator charac-
	       ter.  The default value is a : (colon).	Tabs or	spaces
	       can surround fields or separators.

	  ignore
	       Uses the	assigned value as the ignore character indica-
	       tor.  The default value is the @	(at sign).  A charac-
	       ter marked with the ignore indicator is ignored for
	       collation purposes.

	  repeat
	       Uses the	assigned value as the equivalence class	indi-
	       cator.  The default value is the	^ (circumflex) charac-
	       ter.  A character marked	with the equivalence class
	       indicator has the same primary collation	value as the
	       preceding character.

	  trans
	       Uses the	assigned character as the translate indicator.
	       The default value is the	| (vertical bar).  A collation
	       element marked with the translate indicator is
	       translated to the collation element(s) following	the
	       indicator.  For example, to treat the German eszet (ß)
	       element as the two characters ss, the first field of
	       the line	would be:
	       \337|ss:
	       The unique collation weight is used in regular expres-
	       sions (see grep).  Characters being translated cannot
	       be followed by an equivalence character.	 The subject
	       character cannot	be contained in	its own	substitution
	       collation element(s); for example, o|oe is not allowed.  The translation
	       mechanism completes in one pass:	none of	the characters
	       in the substitution collation elements can in turn be
	       the subject of further translation, so the following
	       example is illegal:
	       q|r:
	       x|pq:
	       Characters being translated have no primary collating
	       weight of their own, but have a unique collation
	       weight, which is based on the position of the record
	       in the input file.
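
	  The two restrictions on translations (a subject may not
	  appear in its own substitution, and no substituted character
	  may itself be translated) can be checked mechanically.  The
	  following Python sketch is one possible check, not the logic
	  of the ctab command; the function name check_translations is
	  hypothetical, and simple substring containment is used as an
	  approximation for multicharacter collating elements.

```python
def check_translations(translations):
    """Return a list of error messages for a set of translations.

    translations maps each subject element to its substitution
    string, for example {"\u00df": "ss"} for the record \\337|ss:
    """
    errors = []
    for subject, substitution in translations.items():
        # Rule 1: the subject may not occur in its own substitution.
        if subject in substitution:
            errors.append(f"{subject}|{substitution}: subject appears "
                          "in its own substitution")
        # Rule 2: translation is one pass, so the substitution may not
        # contain anything that is itself the subject of a translation.
        for other in translations:
            if other != subject and other in substitution:
                errors.append(f"{subject}|{substitution}: contains "
                              f"translated element {other}")
    return errors


eszet_errors = check_translations({"\u00df": "ss"})
self_errors = check_translations({"o": "oe"})
chain_errors = check_translations({"q": "r", "x": "pq"})
```

	  Here eszet_errors is empty, while self_errors and
	  chain_errors each contain one message, matching the o|oe and
	  q|r, x|pq cases described above.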

     EXAMPLES
	  The following	line is	interpreted as a field containing a
	  backslash and	a colon	followed by a field separator:
	  \\\::
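
	  To make the escape rules concrete, the following Python
	  sketch splits a record into fields.  It is a simplified
	  illustration rather than the parser used by ctab: the name
	  split_fields is hypothetical, an octal escape is assumed to
	  be at most three digits long, and comment characters and
	  whitespace around fields are not handled.

```python
OCTAL_DIGITS = "01234567"


def split_fields(line, sep=":"):
    """Split one record into fields, decoding backslash escapes."""
    fields = []
    current = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            nxt = line[i + 1]
            if nxt in OCTAL_DIGITS:
                # Octal escape: consume up to three octal digits
                # (the three-digit limit is an assumption).
                j = i + 1
                while j < len(line) and j < i + 4 and line[j] in OCTAL_DIGITS:
                    j += 1
                current.append(chr(int(line[i + 1:j], 8)))
                i = j
            else:
                # Any other backslash escapes the next character.
                current.append(nxt)
                i += 2
        elif ch == sep:
            # An unescaped separator ends the current field.
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


example = split_fields(r"\\\::")
delete_record = split_fields(r"\177::c")
```

	  With this sketch, the first field of example holds a
	  backslash followed by a colon, and delete_record holds the
	  Delete character, an empty case conversion field, and the
	  classification c.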

	  Here are the first three and last three lines of a sample
	  C.ctab file:
	  \000:
	  \001:
	  \002:

	  }:
	  ~:
	  \177::c

     FILES
	  /usr/lib/nls/loc/<locale>
	       Binary character	classification,	case conversion	and
	       collating output	file for locale	<locale>.

	  /etc/nls/loc/<locale>
	       Binary locale classification, case conversion and col-
	       lating output file.  This is only used as a default
	       during single-user mode operation.

     RELATED INFORMATION
	  Commands: ctab(1)

	  Functions: setlocale(3)

	  "Using Internationalization Features"	in the OSF/1 User's
	  Guide



























Acknowledgement and Disclaimer