CHAR signature
signature CHAR
structure Char :> CHAR
  where type char = char
  where type string = String.string
structure WideChar :> CHAR  (* OPTIONAL *)
  where type string = WideString.string
The CHAR signature defines a type char of characters, and provides basic operations and predicates on values of that type. There is a linear ordering defined on characters. In addition, there is an encoding of characters into a contiguous range of non-negative integers which preserves the linear ordering.
There are two structures matching the CHAR signature. The Char structure provides the extended ASCII 8-bit character set and locale-independent operations on them. For this structure, Char.maxOrd = 255.
The optional WideChar structure defines wide characters, which are represented by a fixed number of 8-bit words (bytes). If the WideChar structure is provided, it is distinct from the Char structure.
eqtype char
eqtype string
val minChar : char
val maxChar : char
val maxOrd : int
val ord : char -> int
val chr : int -> char
val succ : char -> char
val pred : char -> char
val compare : char * char -> order
val < : char * char -> bool
val <= : char * char -> bool
val > : char * char -> bool
val >= : char * char -> bool
val contains : string -> char -> bool
val notContains : string -> char -> bool
val isAscii : char -> bool
val toLower : char -> char
val toUpper : char -> char
val isAlpha : char -> bool
val isAlphaNum : char -> bool
val isCntrl : char -> bool
val isDigit : char -> bool
val isGraph : char -> bool
val isHexDigit : char -> bool
val isLower : char -> bool
val isPrint : char -> bool
val isSpace : char -> bool
val isPunct : char -> bool
val isUpper : char -> bool
val toString : char -> String.string
val scan : (Char.char, 'a) StringCvt.reader
             -> (char, 'a) StringCvt.reader
val fromString : String.string -> char option
val toCString : char -> String.string
val fromCString : String.string -> char option
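val minChar : char
The least character in the ordering. It always equals chr 0.
val maxChar : char
The greatest character in the ordering. It equals chr maxOrd.
val maxOrd : int
The greatest character code. It equals ord maxChar.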
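As a quick check of these relationships, the following sketch evaluates the bounds for the Char structure. The binding names (lowest, highest, howMany) are illustrative only and are not part of the Basis Library.

```sml
(* Bounds of the Char structure; for WideChar the values would differ. *)
val lowest = Char.ord Char.minChar       (* always 0 *)
val highest = Char.ord Char.maxChar      (* equals Char.maxOrd *)
val howMany = Char.maxOrd + 1            (* number of distinct characters *)
```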
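ord c
returns the (non-negative) integer code of the character c.
chr i
returns the character whose code is i. It raises Chr if i < 0 or i > maxOrd.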
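A minimal sketch of converting between characters and codes. The helper safeChr is hypothetical (not part of the Basis Library); it shows one way to turn the Chr exception into an option value.

```sml
(* ord and chr are inverses on the range [0, maxOrd]. *)
val code = Char.ord #"A"                 (* 65 in the Char structure *)
val letter = Char.chr (code + 1)         (* the character B *)

(* safeChr is a hypothetical helper: it maps the Chr exception to NONE. *)
fun safeChr (i : int) : char option =
  SOME (Char.chr i) handle Chr => NONE

val inRange = safeChr 97                 (* SOME of the character a *)
val tooBig = safeChr (Char.maxOrd + 1)   (* NONE *)
val negative = safeChr ~1                (* NONE *)
```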
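succ c
returns the character immediately following c in the ordering. It raises Chr if c = maxChar. When defined, succ c is equivalent to chr (ord c + 1).
pred c
returns the character immediately preceding c in the ordering. It raises Chr if c = minChar. When defined, pred c is equivalent to chr (ord c - 1).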
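The following sketch illustrates succ and pred. The helpers nextChar and range are hypothetical names introduced here; range stops before calling succ on its upper bound, so it cannot raise Chr even when that bound is maxChar.

```sml
val b = Char.succ #"a"     (* the character b *)
val y = Char.pred #"z"     (* the character y *)

(* nextChar is a hypothetical wrapper that avoids the Chr exception. *)
fun nextChar (c : char) : char option =
  if c = Char.maxChar then NONE else SOME (Char.succ c)

(* Collect the characters from lo to hi inclusive by repeated succ. *)
fun range (lo : char, hi : char) : char list =
  if lo > hi then []
  else if lo = hi then [lo]
  else lo :: range (Char.succ lo, hi)

val digits = String.implode (range (#"0", #"9"))
```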
compare (c, d)
returns LESS, EQUAL, or GREATER, depending on whether c precedes, equals, or follows d in the character ordering.
val < : char * char -> bool
val <= : char * char -> bool
val > : char * char -> bool
val >= : char * char -> bool
These compare characters in the character ordering. Note that the functions ord and chr preserve orderings. For example, if we have x < y for characters x and y, then it is also true that ord x < ord y.
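A short sketch of the comparison functions and of the order-preservation property. The helpers sameOrder and maxCharOf are hypothetical and exist only for illustration.

```sml
val r1 = Char.compare (#"a", #"b")    (* LESS *)
val r2 = Char.compare (#"b", #"b")    (* EQUAL *)

(* For one pair, check that ord preserves the character ordering. *)
fun sameOrder (x : char, y : char) : bool =
  Char.compare (x, y) = Int.compare (Char.ord x, Char.ord y)

(* Greatest character occurring in a string, or NONE for the empty string. *)
fun maxCharOf (s : string) : char option =
  case String.explode s of
    [] => NONE
  | c :: cs =>
      SOME (List.foldl (fn (x, m) => if Char.> (x, m) then x else m) c cs)
```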
contains s c
returns true if character c occurs in the string s; otherwise it returns false.
Implementation note:
In some implementations, the partial application of contains to s may build a table, which is used by the resulting function to decide whether a given character is in the string or not. Hence val p = contains s may be expensive to compute, but p c might be fast for any given character c.
notContains s c
returns true if character c does not occur in the string s; it returns false otherwise. It is equivalent to not (contains s c).
Implementation note:
As with contains, notContains may be implemented via table lookup.
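Because the partial application may do the expensive work, a reasonable pattern is to bind the partially applied function once and reuse it. The sketch below assumes nothing about whether a table is actually built; the names isVowel, notSeparator, countVowels, and fields are illustrative only.

```sml
(* Build each predicate once; any table construction would happen here. *)
val isVowel = Char.contains "aeiouAEIOU"
val notSeparator = Char.notContains ",;"

(* countVowels is a hypothetical helper that reuses the shared predicate. *)
fun countVowels (s : string) : int =
  List.length (List.filter isVowel (String.explode s))

(* Split on the separator characters by negating notSeparator. *)
val fields = String.tokens (not o notSeparator) "a,b;c"

val vowels = countVowels "Standard ML"
```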
isAscii c
returns true if c is a (seven-bit) ASCII character, i.e., 0 <= ord c <= 127. Note that this function is independent of locale.
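toLower c
toUpper c
These return the lowercase (respectively, uppercase) letter corresponding to c if c is a letter; otherwise they return c.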
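A small sketch of case conversion applied over whole strings with String.map. The function equalIgnoringCase is a hypothetical helper and only handles the locale-independent Char structure.

```sml
(* Non-letters such as digits, commas, and spaces pass through unchanged. *)
val shouted = String.map Char.toUpper "hello, world 42"

(* Hypothetical helper: compare two strings after lowercasing both. *)
fun equalIgnoringCase (a : string, b : string) : bool =
  String.map Char.toLower a = String.map Char.toLower b
```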
isAlpha c
returns true if c is a letter (lowercase or uppercase).
isAlphaNum c
returns true if c is alphanumeric (a letter or a decimal digit).
isCntrl c
returns true if c is a control character.
isDigit c
returns true if c is a decimal digit [0-9].
isGraph c
returns true if c is a graphical character, that is, it is printable and not a whitespace character.
isHexDigit c
returns true if c is a hexadecimal digit [0-9a-fA-F].
isLower c
returns true if c is a lowercase letter.
isPrint c
returns true if c is a printable character (space or visible), i.e., not a control character.
isSpace c
returns true if c is a whitespace character (space, newline, tab, carriage return, vertical tab, formfeed).
isPunct c
returns true if c is a punctuation character: graphical but not alphanumeric.
isUpper c
returns true if c is an uppercase letter.
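The predicates overlap (a tab is both whitespace and a control character, for instance), so the order of tests matters when classifying. The sketch below uses a hypothetical function classify with one arbitrary choice of precedence.

```sml
(* classify is a hypothetical helper; the order of the tests is a design
   choice, e.g. whitespace is reported before control characters. *)
fun classify (c : char) : string =
  if Char.isDigit c then "digit"
  else if Char.isAlpha c then "letter"
  else if Char.isSpace c then "space"
  else if Char.isPunct c then "punctuation"
  else if Char.isCntrl c then "control"
  else "other"

(* A string is all hexadecimal digits when every character passes isHexDigit. *)
fun isHexString (s : string) : bool =
  s <> "" andalso List.all Char.isHexDigit (String.explode s)
```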
toString c
returns a printable string representation of the character, using SML escape sequences where necessary. Printable characters, except for #"\\" and #"\"", are left unchanged. Backslash #"\\" becomes "\\\\"; double quote #"\"" becomes "\\\"". The common control characters are converted to two-character escape sequences:
Alert (ASCII 0x07) | "\\a" |
Backspace (ASCII 0x08) | "\\b" |
Horizontal tab (ASCII 0x09) | "\\t" |
Linefeed or newline (ASCII 0x0A) | "\\n" |
Vertical tab (ASCII 0x0B) | "\\v" |
Form feed (ASCII 0x0C) | "\\f" |
Carriage return (ASCII 0x0D) | "\\r" |
The remaining characters whose codes are less than 32 are represented by three-character strings in "control character" notation, e.g., #"\000" maps to "\\^@", #"\001" maps to "\\^A", etc. For characters whose codes are greater than 999, the character is mapped to a six-character string of the form "\\uxxxx", where xxxx are the four hexadecimal digits corresponding to a character's code. All other characters (i.e., those whose codes are greater than 126 but less than 1000) are mapped to four-character strings of the form "\\ddd", where ddd are the three decimal digits corresponding to a character's code.
To convert a character to a length-one string containing the character, use the function String.str.
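The following sketch contrasts toString with String.str on a few characters. The binding names are illustrative; the comments describe the result strings in words to avoid confusion between the escape sequence and its SML source spelling.

```sml
val plain = Char.toString #"a"       (* unchanged: a one-character string *)
val newline = Char.toString #"\n"    (* two characters: backslash, n *)
val quote = Char.toString #"\""      (* two characters: backslash, double quote *)
val ctrlA = Char.toString #"\001"    (* three characters: backslash, caret, A *)
val del = Char.toString #"\127"      (* four characters: backslash, 1, 2, 7 *)
val raw = String.str #"\n"           (* one character: the newline itself *)
```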
scan getc strm
fromString s
These scan a character (including possibly a space) from a prefix of a character stream or a string of printable characters, as allowed in an SML program. After a successful conversion, scan returns the remainder of the stream along with the character, whereas fromString ignores any additional characters in s and just returns the character. If the first character is non-printable (i.e., not in the ASCII range [0x20,0x7E]) or starts an illegal escape sequence (e.g., "\q"), no conversion is possible and NONE is returned. The function fromString is equivalent to StringCvt.scanString scan.
The allowable escape sequences are:
\a | Alert (ASCII 0x07) |
\b | Backspace (ASCII 0x08) |
\t | Horizontal tab (ASCII 0x09) |
\n | Linefeed or newline (ASCII 0x0A) |
\v | Vertical tab (ASCII 0x0B) |
\f | Form feed (ASCII 0x0C) |
\r | Carriage return (ASCII 0x0D) |
\\ | Backslash |
\" | Double quote |
\^c | A control character whose encoding is ord c - 64, with the character c having ord c in the range [64,95]. For example, \^H (control-H) is the same as \b (backspace). |
\ddd | The character whose encoding is the number ddd, three decimal digits denoting an integer in the range [0,255]. |
\uxxxx | The character whose encoding is the number xxxx, four hexadecimal digits denoting an integer in the ordinal range of the alphabet. |
\f...f\ | This sequence is ignored, where f...f stands for a sequence of one or more formatting (space, newline, tab, etc.) characters. |
In the escape sequences involving decimal or hexadecimal digits, if the resulting value cannot be represented in the character set, NONE is returned. As the table indicates, escaped formatting sequences (\f...f\) are passed over during scanning. Such sequences are successfully scanned, so that the remaining stream returned by scan will never have a valid escaped formatting sequence as its prefix.
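To illustrate the difference between scan and fromString, the sketch below scans from a list of characters. The reader listReader is a hypothetical helper written for this example; any function of type (char, 'a) StringCvt.reader would serve.

```sml
(* A reader over a list: the stream state is simply the remaining list. *)
fun listReader [] = NONE
  | listReader (c :: cs) = SOME (c, cs)

(* The input holds a backslash, the letter n, and then the letters of rest.
   scan consumes the two-character escape and returns the unread remainder. *)
val scanned = Char.scan listReader (String.explode "\\nrest")

(* fromString discards the remainder and returns only the character. *)
val viaFromString = Char.fromString "\\nrest"
val viaScanString = StringCvt.scanString Char.scan "\\nrest"

val illegal = Char.fromString "\\q"     (* illegal escape sequence *)
val empty = Char.fromString ""          (* nothing to scan *)
```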
Here are some sample conversions:
Input string s | fromString s |
---|---|
"\\q" | NONE |
"a\^D" | SOME #"a" |
"a\\ \\\q" | SOME #"a" |
"\\ \\" | NONE |
"" | NONE |
"\\ \\\^D" | NONE |
"\\ a" | NONE |
toCString c
returns a printable string corresponding to c, with non-printable characters replaced by C escape sequences. Specifically, printable characters, except for #"\\", #"\"", #"?", and #"'", are left unchanged. Backslash (#"\\") becomes "\\\\", double quote (#"\"") becomes "\\\"", question mark (#"?") becomes "\\?", and single quote (#"'") becomes "\\'". The common control characters are converted to two-character escape sequences:
Alert (ASCII 0x07) | "\\a" |
Backspace (ASCII 0x08) | "\\b" |
Horizontal tab (ASCII 0x09) | "\\t" |
Linefeed or newline (ASCII 0x0A) | "\\n" |
Vertical tab (ASCII 0x0B) | "\\v" |
Form feed (ASCII 0x0C) | "\\f" |
Carriage return (ASCII 0x0D) | "\\r" |
All other characters are represented by three octal digits, corresponding to a character's code, preceded by a backslash.
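The sketch below compares the C-style and SML-style renderings of the same control character: toCString uses three octal digits where toString uses control-character notation. The binding names are illustrative only.

```sml
val tabC = Char.toCString #"\t"       (* two characters: backslash, t *)
val ctrlA_C = Char.toCString #"\001"  (* backslash followed by octal 001 *)
val ctrlA_SML = Char.toString #"\001" (* backslash, caret, A *)
val delC = Char.toCString #"\127"     (* 127 decimal is 177 octal *)
val letterC = Char.toCString #"a"     (* printable, so left unchanged *)
```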
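fromCString s
scans a character (including possibly a space) from a prefix of a string s of printable characters, as allowed in a C program. After a successful conversion, fromCString ignores any additional characters in s. If no conversion is possible, e.g., if the first character is non-printable (i.e., not in the ASCII range [0x20,0x7E]) or starts an illegal escape sequence, NONE is returned.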
The allowable escape sequences are given below (cf. Section 6.1.3.4 of the ISO C standard ISO/IEC 9899:1990).
\a | Alert (ASCII 0x07) |
\b | Backspace (ASCII 0x08) |
\t | Horizontal tab (ASCII 0x09) |
\n | Linefeed or newline (ASCII 0x0A) |
\v | Vertical tab (ASCII 0x0B) |
\f | Form feed (ASCII 0x0C) |
\r | Carriage return (ASCII 0x0D) |
\? | Question mark |
\\ | Backslash |
\" | Double quote |
\' | Single quote |
\^c | A control character whose encoding is ord c - 64, with the character c having ord c in the range [64,95]. For example, \^H (control-H) is the same as \b (backspace). |
\ooo | The character whose encoding is the number ooo, where ooo consists of one to three octal digits. |
\xhh | The character whose encoding is the number hh, where hh is a sequence of hexadecimal digits. |
fromCString accepts an unescaped single quote character, but does not accept an unescaped double quote character.
In the escape sequences involving octal or hexadecimal digits, the sequence of digits is taken to be the longest sequence of such characters. If the resulting value cannot be represented in the character set, NONE is returned.
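A minimal sketch of scanning C escape sequences. The helper roundTrips is hypothetical; it checks that fromCString recovers a character from its toCString rendering, which is expected for the characters tried here but is not claimed for every character or implementation.

```sml
val fromOctal = Char.fromCString "\\101"   (* octal 101 is 65 decimal *)
val fromHex = Char.fromCString "\\x41"     (* hexadecimal 41 is 65 decimal *)
val fromNewline = Char.fromCString "\\n"
val rejected = Char.fromCString "\\q"      (* illegal escape sequence *)

(* Hypothetical helper: does toCString followed by fromCString recover c? *)
fun roundTrips (c : char) : bool =
  Char.fromCString (Char.toCString c) = SOME c
```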
See also: STRING, TEXT
In WideChar, the functions toLower, toUpper, isAlpha,..., isUpper and, in general, the definition of a "letter" are locale-dependent. In Char, these functions are locale-independent, with the following semantics:
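isUpper c | #"A" <= c andalso c <= #"Z" |
isLower c | #"a" <= c andalso c <= #"z" |
isDigit c | #"0" <= c andalso c <= #"9" |
isAlpha c | isUpper c orelse isLower c |
isAlphaNum c | isAlpha c orelse isDigit c |
isHexDigit c | isDigit c orelse (#"a" <= c andalso c <= #"f") orelse (#"A" <= c andalso c <= #"F") |
isGraph c | #"!" <= c andalso c <= #"~" |
isPrint c | isGraph c orelse c = #" " |
isPunct c | isGraph c andalso not (isAlphaNum c) |
isCntrl c | isAscii c andalso not (isPrint c) |
isSpace c | (#"\t" <= c andalso c <= #"\r") orelse c = #" " |
isAscii c | 0 <= ord c andalso ord c <= 127 |
toLower c | if isUpper c then chr (ord c + 32) else c |
toUpper c | if isLower c then chr (ord c - 32) else c |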
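The locale-independent semantics of Char can be checked mechanically. The sketch below writes several of the predicates directly from the usual ASCII ranges (the primed names, allChars, and agrees are hypothetical) and compares them with the library functions on every character of the Char structure. It is a consistency check under those assumptions, not a definition of the library.

```sml
(* Predicates written directly from ASCII ranges; primed names are hypothetical. *)
fun isUpper' c = #"A" <= c andalso c <= #"Z"
fun isLower' c = #"a" <= c andalso c <= #"z"
fun isDigit' c = #"0" <= c andalso c <= #"9"
fun isAlpha' c = isUpper' c orelse isLower' c
fun isAlphaNum' c = isAlpha' c orelse isDigit' c
fun isGraph' c = #"!" <= c andalso c <= #"~"
fun isPrint' c = isGraph' c orelse c = #" "
fun isPunct' c = isGraph' c andalso not (isAlphaNum' c)
fun isSpace' c = (#"\t" <= c andalso c <= #"\r") orelse c = #" "
fun toLower' c = if isUpper' c then Char.chr (Char.ord c + 32) else c
fun toUpper' c = if isLower' c then Char.chr (Char.ord c - 32) else c

(* Every character of the Char structure, in increasing order. *)
val allChars = List.tabulate (Char.maxOrd + 1, Char.chr)

(* agrees (f, g) holds when f and g give equal results on every character. *)
fun agrees (f, g) = List.all (fn c => f c = g c) allChars

val consistent =
  agrees (isUpper', Char.isUpper) andalso agrees (isLower', Char.isLower)
  andalso agrees (isDigit', Char.isDigit) andalso agrees (isAlpha', Char.isAlpha)
  andalso agrees (isGraph', Char.isGraph) andalso agrees (isPrint', Char.isPrint)
  andalso agrees (isPunct', Char.isPunct) andalso agrees (isSpace', Char.isSpace)
  andalso agrees (toLower', Char.toLower) andalso agrees (toUpper', Char.toUpper)
```

If consistent evaluates to false on some implementation, that points to a difference between these hand-written ranges and that implementation's Char structure.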
Generated October 02, 2003
Last Modified May 27, 2000
Comments to John Reppy.
This document may be distributed freely over the internet as long as the copyright notice and license terms below are prominently displayed within every machine-readable copy.
Copyright © 2003 AT&T and Lucent Technologies. All rights reserved.
Permission is granted for internet users to make one paper copy for their
own personal use. Further hardcopy reproduction is strictly prohibited.
Permission to distribute the HTML document electronically on any medium
other than the internet must be requested from the copyright holders by
contacting the editors.
Printed versions of the SML Basis Manual are available from Cambridge
University Press.
To order, please visit
www.cup.org (North America) or
www.cup.cam.ac.uk (outside North America).