SUBSTRING
signature
signature SUBSTRING
structure Substring
:> SUBSTRING
where type substring = CharVectorSlice.slice
where type string = String.string
where type char = Char.char
structure WideSubstring
:> SUBSTRING (* OPTIONAL *)
where type substring = WideCharVectorSlice.slice
where type string = WideString.string
where type char = WideChar.char
The SUBSTRING
signature specifies manipulations on an abstract representation of a sequence of contiguous characters in a string. A substring
value can be modeled as a triple (s, i, n)
, where s is the underlying string, i is the starting index, and n is the size of the substring, with the constraint that 0 <= i <= i + n <= |s|.
The substring
type and its attendant functions provide a convenient abstraction for performing a variety of common analyses of strings, such as finding the leftmost occurrence, if any, of a character in a string. In addition, using the substring
functions avoids much of the copying and bounds checking that occur if similar operations are implemented solely in terms of strings.
The SUBSTRING
signature is matched by two structures, the required Substring
and the optional WideSubstring
. The former is a companion structure to the Char
and String
structures, which are based on the extended ASCII 8-bit character set. The structure WideSubstring
is related in the same way to the structures WideChar
and WideString
, which are based on characters of some size greater than or equal to 8 bits. In particular, the types Substring.string
and Substring.char
are identical to those types in the structure String
and, when WideSubstring
is defined, the types WideSubstring.string
and WideSubstring.char
are identical to those types in the structure WideString
.
All of these connections are made explicit in the Text
and WideText
structures, which match the TEXT
signature. In the exposition below, references to a String
structure refers to the substructure of that name defined in either the Text
or the WideText
structure, which ever is appropriate.
The design of the SUBSTRING
interface was influenced by the paper ``Subsequence References: First-Class Values for Substrings,'' by Wilfred J. Hansen[CITE].
type substring
eqtype char
eqtype string
val sub : substring * int -> char
val size : substring -> int
val base : substring -> string * int * int
val extract : string * int * int option -> substring
val substring : string * int * int -> substring
val full : string -> substring
val string : substring -> string
val isEmpty : substring -> bool
val getc : substring -> (char * substring) option
val first : substring -> char option
val triml : int -> substring -> substring
val trimr : int -> substring -> substring
val slice : substring * int * int option -> substring
val concat : substring list -> string
val concatWith : string -> substring list -> string
val explode : substring -> char list
val isPrefix : string -> substring -> bool
val isSubstring : string -> substring -> bool
val isSuffix : string -> substring -> bool
val compare : substring * substring -> order
val collate : (char * char -> order)
-> substring * substring -> order
val splitl : (char -> bool)
-> substring -> substring * substring
val splitr : (char -> bool)
-> substring -> substring * substring
val splitAt : substring * int -> substring * substring
val dropl : (char -> bool) -> substring -> substring
val dropr : (char -> bool) -> substring -> substring
val takel : (char -> bool) -> substring -> substring
val taker : (char -> bool) -> substring -> substring
val position : string -> substring -> substring * substring
val span : substring * substring -> substring
val translate : (char -> string) -> substring -> string
val tokens : (char -> bool) -> substring -> substring list
val fields : (char -> bool) -> substring -> substring list
val app : (char -> unit) -> substring -> unit
val foldl : (char * 'a -> 'a) -> 'a -> substring -> 'a
val foldr : (char * 'a -> 'a) -> 'a -> substring -> 'a
sub (s, i)
String.sub
(string
s, i)
. The exception Subscript
is raised unless 0 <= i < |s|.
size s
#3 o base
and String.size
o string
.
base ss
(s, i, n)
giving a concrete representation of the substring. s is the underlying string, i is the starting index, and n is the size of the substring. It will always be the case that 0 <= i <= i + n <= |s| .
extract (s, i, NONE)
extract (s, i, SOME j)
substring (s, i, j)
Subscript
unless 0 <= i <= |s|. The second form returns the substring of size j starting at index i, i.e., the string s[i..i+j-1]
. It raises Subscript
if i < 0 or j < 0 or |s| < i + j. Note that, if defined, extract
returns the empty substring when i = |s|.
The third form returns the substring s[i..i+j-1]
, i.e., the substring of size j starting at index i. This is equivalent to
.
extract
(s, i, SOME
j)
We require that
be the identity function on valid arguments.
base
o substring
Implementation note:
Implementations of these functions must perform bounds checking in such a way that the
Overflow
exception is not raised.
full s
substring
(s, 0, String.size
s)
.
string s
String.substring
o base
for the corresponding String
structure.
isEmpty s
true
if s has size 0.
getc s
NONE
if s is empty.
first s
NONE
if s is empty.
triml k s
trimr k s
ss = substring
(s, i, j)
and k <= j, we have:
triml k ss = substring(s, i+k, j-k) trimr k ss = substring(s, i, j-k)The exception
Subscript
is raised if k < 0
. This exception is raised when triml k
or trimr k
is evaluated.
slice (s, i, SOME m)
slice (s, i, NONE)
Subscript
is raised.
concat l
String.concat
o (List.map
string
)
. This raises Size
if the sum of all the sizes is greater than the corresponding maxSize
for the string
type.
concatWith s l
Size
if the size of the resulting string would be greater than maxSize
for the string
type.
explode s
String.explode
(string
s)
.
isPrefix s ss
isSubstring s ss
isSuffix s ss
true
if the string s is a prefix, substring, or suffix (respectively) of the substring ss. The functions are equivalent to their versions from STRING
. For example, isPrefix s ss
is the same as String.isPrefix s (string
ss)
.
compare (s, t)
String.compare (string s, string t)
collate f (s, t)
String.collate f (string s, string t)
splitl f s
splitr f s
(ls, rs)
giving the split of the substring into the span up to that character and the rest. ls is the left side of the split, and rs is the right side. For example, if the characters a
and c
satisfy the predicate, but character X
does not, then these functions work as follows on the substring aaaXbbbbXccc
:
splitl : aaa XbbbbXccc splitr : aaaXbbbbX ccc
splitAt (s, i)
(ss, ss')
, where ss contains the first i characters of s and ss' contains the rest, assuming 0 <= i <= size
s. Otherwise, it raises Subscript
.
dropl f s
dropr f s
takel f s
taker f s
dropl
and takel
scan left to right (i.e., increasing character indices), while dropr
and taker
scan from the right. The drop functions drop the maximal substring consisting of characters satisfying the predicate, while the take functions return the maximal such substring. These can be defined in terms of the split operations:
takel p s = #1(splitl p s) dropl p s = #2(splitl p s) taker p s = #2(splitr p s) dropr p s = #1(splitr p s)
position s ss
(pref, suff)
of substrings, where suff is the longest suffix of ss that has s as a prefix and pref is the prefix of ss preceding suff. More precisely, let m be the size of s and let ss correspond to the substring (s', i, n)
. If there is a least index k >= i such that s = s'[k..k+m-1]
, then suff corresponds to (s', k, n+i-k)
and pref corresponds to (s', i, k-i)
. If there is no such k, then suff is the empty substring corresponding to (s', i+n, 0)
and pref corresponds to (s', i, n)
, i.e., all of ss.
span (ss, ss')
Span
if ss and ss' are not substrings of the same underlying string or if the start of ss is to the right of the end of ss'. More precisely, if we have
val (s, i, n) = base ss val (s', i', n') = base ss'then
span
returns substring(s, i, (i'+n')-i)
unless s <> s'
or i'+n' < i
, in which case it raises Span
. Note that this does not preclude ss' from beginning to the left of ss, or ss from ending to the right of ss'.
This function allows one to scan for a substring using multiple pieces and then coalescing the pieces. For example, given a URL string such as
"http://www.standardml.org/Basis/overview.html"to scan the protocol and host (
"http://www.standardml.org"
), one could write:
local open Substring in fun protoAndHost url = let fun notc (c : char) = fn c' => c <> c' val (proto,rest) = splitl (notc #":") (full url) val host = takel (notc #"/") (triml 3 rest) in span (proto, host) end end
Implementation note:
When applied to substrings derived from the identical base string, the string equality test should be constant time. This can be achieved by first doing a pointer test and, only if that fails, then checking the strings character by character.
translate f s
String.concat
(List.map
f (explode
s))
.
tokens f s
fields f s
Two tokens may be separated by more than one delimiter, whereas two fields are separated by exactly one delimiter. For example, if the only delimiter is the character #"|"
, then the substring "|abc||def"
contains two tokens "abc"
and "def"
, whereas it contains the four fields ""
, "abc"
, ""
and "def"
.
app f s
List.app
f (explode
s)
.
foldl f a s
foldr f a s
List
. In particular, they are respectively equivalent to:
List.foldl f a (explode s) List.foldr f a (explode s)
CHAR
,List
,STRING
,StringCvt
,TEXT
Implementation note:
Functions that extract pieces of a substring, such as
splitl
ortokens
must return substrings with the same base string. This requirement is particularly important ifspan
is to be used to put the pieces back together again.
Generated October 02, 2003
Last Modified October 17, 2000
Comments to John Reppy.
This document may be distributed freely over the internet as long as the copyright notice and license terms below are prominently displayed within every machine-readable copy.
Copyright © 2003 AT&T and Lucent Technologies. All rights reserved.
Permission is granted for internet users to make one paper copy for their
own personal use. Further hardcopy reproduction is strictly prohibited.
Permission to distribute the HTML document electronically on any medium
other than the internet must be requested from the copyright holders by
contacting the editors.
Printed versions of the SML Basis Manual are available from Cambridge
University Press.
To order, please visit
www.cup.org (North America) or
www.cup.cam.ac.uk (outside North America). |