Lex Module
signature LEX
structure Lex :> LEX
This module implements Forlan's lexical analyzer. Most users of Forlan won't need to use it directly.
The lexical analyzer first strips its input of all whitespace and comments, where a comment consists of a "#" plus the rest of the line on which it occurs.
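The stripping step can be pictured with a small, self-contained Standard ML sketch. It is only an illustration of the rule stated above, not Forlan's implementation: `dropComment` and `stripInput` are hypothetical names, and unlike the real lexical analyzer this sketch does not track line numbers.

```sml
(* Illustrative model only; not part of the Lex module. *)

(* Keep only the part of a line that precedes the first "#". *)
fun dropComment line =
  Substring.string (Substring.takel (fn c => c <> #"#") (Substring.full line))

(* Remove comments line by line, then delete all whitespace. *)
fun stripInput s =
  let
    val lines = String.fields (fn c => c = #"\n") s
    val uncommented = String.concat (map dropComment lines)
  in
    String.translate (fn c => if Char.isSpace c then "" else String.str c)
                     uncommented
  end
```

For instance, under this model the two-line input `a b # comment` followed by `-> c` is reduced to `ab->c`.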
type basic
val charToBasic : char -> basic
val basicToChar : basic -> char
type sym
val symToString : sym -> string
val sizeSym : sym -> int
val compareSym : sym Sort.total_ordering
val symToPP : sym -> PP.pp
datatype sym_top
= BasicSymTop of basic
| CompoundSymTop of sym option list
val symTopToSym : sym_top -> sym
val symToSymTop : sym -> sym_top
datatype tok
= Bar
| Comma
| Dollar
| Perc
| Plus
| Semicolon
| Star
| Tilde
| OpenPar
| ClosPar
| SingArr
| DoubArr
| Sym of sym
| Heading of string
| EOF
val equalTok : tok * tok -> bool
val errorNotEOFTerminated : unit -> 'a
val expectedTok : int * tok -> 'a
val expectedDigit : int -> 'a
val expectedLetter : int -> 'a
val expectedLetterOrDigit : int -> 'a
val unexpectedTok : int * tok -> 'a
val checkInLabToks : tok * (int * tok) list -> (int * tok) list
val error : int * int * PP.pp list -> 'a
val lexString : string -> (int * tok) list
val lexFile : string -> (int * tok) list
type basic
charToBasic c
If c is a lowercase letter, an uppercase letter or a digit, then charToBasic returns c. Otherwise, it issues an error message.
basicToChar c
returns c.
type sym
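The type of Forlan symbols: the least set of lists of digits, letters, commas (,) and angle brackets (< and >) such that:
for all digits and letters c, [c] is a symbol;
for all natural numbers n, and all x1 ... xn that are symbols or [,],
[<] @ x1 @ ... @ xn @ [>] is a symbol.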
The concrete syntax for a symbol [c1, ..., cn] is c1 ... cn. E.g., [<, i, d, >] is written as <id>.
The type could be implemented using lists, but is actually implemented in a way that makes the construction and destruction of symbols more efficient. See symTopToSym and symToSymTop.
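The inductive definition above can be checked mechanically. The following self-contained sketch models symbols as strings in their concrete syntax and decides whether a string is a symbol. It is an assumption-laden illustration, not the Lex implementation: `isBasic`, `parseSym`, `parseBody` and `isSymbol` are hypothetical names.

```sml
(* Illustrative model only; not part of the Lex module. *)

(* Digits and ASCII letters, i.e. the characters c for which [c] is a symbol. *)
fun isBasic (c : char) =
  (#"0" <= c andalso c <= #"9") orelse
  (#"a" <= c andalso c <= #"z") orelse
  (#"A" <= c andalso c <= #"Z")

(* parseSym cs consumes one symbol from the front of cs and returns the
   remaining characters, or NONE if cs does not begin with a symbol. *)
fun parseSym (#"<" :: cs) = parseBody cs
  | parseSym (c :: cs)    = if isBasic c then SOME cs else NONE
  | parseSym nil          = NONE

(* parseBody cs consumes commas and symbols up to the matching ">". *)
and parseBody (#">" :: cs) = SOME cs
  | parseBody (#"," :: cs) = parseBody cs
  | parseBody cs           =
      (case parseSym cs of
         SOME rest => parseBody rest
       | NONE      => NONE)

(* A string is a symbol exactly when one symbol consumes all of it. *)
fun isSymbol s = (parseSym (String.explode s) = SOME nil)
```

Under this model, `isSymbol "<id>"` and `isSymbol "<>"` (the case n = 0) hold, while `isSymbol "ab"` and `isSymbol ","` do not.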
symToString a
returns the string whose characters are the elements of a, listed in order.
sizeSym a
returns the size (length) of a.
compareSym(a, b)
compares a and b, first according to length, and then lexicographically, using the ordering in which the comma (,) comes first, followed by the digits (in ascending order), followed by the lowercase letters (in ascending order), followed by the uppercase letters (in ascending order), followed by the open angle bracket (<), followed by the close angle bracket (>).
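To make the ordering concrete, here is a self-contained sketch that applies the same rule to symbols written as strings. It is a model under stated assumptions rather than Forlan's code: `rank`, `compareChars` and `compareSymModel` are hypothetical names, and the result type is assumed to be the Basis Library's `order`.

```sml
(* Illustrative model only; not part of the Lex module. *)

fun between (lo : char, c, hi) = lo <= c andalso c <= hi

(* Position of a character in the ordering:
   comma, digits, lowercase letters, uppercase letters, "<", ">". *)
fun rank c =
  if c = #"," then 0
  else if between (#"0", c, #"9") then 1 + Char.ord c - Char.ord #"0"
  else if between (#"a", c, #"z") then 11 + Char.ord c - Char.ord #"a"
  else if between (#"A", c, #"Z") then 37 + Char.ord c - Char.ord #"A"
  else if c = #"<" then 63
  else if c = #">" then 64
  else raise Fail "not a symbol character"

(* Lexicographic comparison by rank. *)
fun compareChars (nil, nil) = EQUAL
  | compareChars (nil, _)   = LESS
  | compareChars (_, nil)   = GREATER
  | compareChars (c :: cs, d :: ds) =
      (case Int.compare (rank c, rank d) of
         EQUAL => compareChars (cs, ds)
       | other => other)

(* Shorter symbols come first; ties are broken lexicographically. *)
fun compareSymModel (a, b) =
  (case Int.compare (String.size a, String.size b) of
     EQUAL => compareChars (String.explode a, String.explode b)
   | other => other)
```

In this model `"b"` precedes `"<a>"` because it is shorter, and `"a"` precedes `"B"` because lowercase letters come before uppercase letters.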
symToPP a
returns a pretty-printing expression for a, formatted so as to make the nesting of brackets in a clear.
datatype sym_top
= BasicSymTop of basic
| CompoundSymTop of sym option list
symTopToSym top
returns the symbol whose top-level structure is described by top. If top is BasicSymTop b, then symTopToSym returns [b]. Otherwise, top looks like CompoundSymTop xs, in which case the symbol returned by symTopToSym consists of the result of appending an initial [<], followed by the lists corresponding to xs, followed by a closing [>]. Each occurrence of NONE in xs is turned into [,], and each occurrence of the form SOME a is turned into a.
symToSymTop a
returns the value of type sym_top describing the top-level structure of a. If the only element of a is a digit or letter, then symToSymTop returns BasicSymTop b, where b is that digit or letter. Otherwise a is the result of appending the elements of a list of lists xs, where the first and last elements of xs are [<] and [>], respectively, and each of the remaining elements is either [,] or a single symbol. In this case, symToSymTop returns CompoundSymTop ys, where ys is the value of type sym option list corresponding to all but the first and last elements ([<] and [>]) of xs, in the following way: [,] is turned into NONE, and a symbol x is turned into SOME x.
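The correspondence between a top-level description and the symbol it denotes can be sketched with a stand-alone model in which symbols are represented by their concrete syntax (strings). The datatype `symTopModel` and the function `symTopToString` are hypothetical stand-ins for sym_top and symTopToSym, introduced only for illustration.

```sml
(* Illustrative model only; not part of the Lex module. *)

(* Mirrors sym_top, with strings standing in for the abstract type sym. *)
datatype symTopModel
  = BasicModel of char
  | CompoundModel of string option list

(* NONE becomes a comma, SOME a becomes a, and the whole is wrapped in
   angle brackets, as described for symTopToSym. *)
fun symTopToString (BasicModel c) = String.str c
  | symTopToString (CompoundModel xs) =
      "<" ^ String.concat (map (fn NONE => "," | SOME a => a) xs) ^ ">"
```

For example, `symTopToString (CompoundModel [SOME "a", NONE, SOME "<b>"])` yields `"<a,<b>>"` in this model.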
datatype tok
= Bar
| Comma
| Dollar
| Perc
| Plus
| Semicolon
| Star
| Tilde
| OpenPar
| ClosPar
| SingArr
| DoubArr
| Sym of sym
| Heading of string
| EOF
| String | Token |
|---|---|
| "\|" | Bar |
| "," | Comma |
| "$" | Dollar |
| "%" | Perc |
| "+" | Plus |
| ";" | Semicolon |
| "*" | Star |
| "~" | Tilde |
| "(" | OpenPar |
| ")" | ClosPar |
| "->" | SingArr |
| "=>" | DoubArr |
| symbol a | Sym a |
| heading s | Heading s |
| end of file (end of string) | EOF |
A labeled token consists of a token plus the line number at which it was found, and a labeled token list consists of a list of labeled tokens.
equalTok(tok1, tok2)
tests whether tok1 and tok2 are equal, meaning that they have the same constructor and the same argument, if any.
errorNotEOFTerminated()
issues an error message saying that a labeled token list isn't EOF-terminated.
expectedTok(n, tok)
issues an error message saying that, on line n, an occurrence of (the string corresponding to) tok was expected.
expectedDigit n
issues an error message saying that, on line n, a digit was expected.
expectedLetter n
issues an error message saying that, on line n, a letter was expected.
expectedLetterOrDigit n
issues an error message saying that, on line n, a letter or digit was expected.
unexpectedTok(n, tok)
issues an error message saying that, on line n, an occurrence of tok was unexpected.
checkInLabToks(tok, lts)
If lts begins with tok, labeled by a line number n, then checkInLabToks returns all but this first element of lts. Otherwise, checkInLabToks issues an error message, either complaining that lts wasn't EOF-terminated, or saying that tok was expected on the line that's the label of the first element of lts.
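The behavior of checkInLabToks can be sketched with a small stand-alone model. Everything here is an assumption made for illustration: `tokModel` is a cut-down token type with three constructors, `LexError` is a hypothetical exception standing in for Forlan's error reporting, and the message texts are invented.

```sml
(* Illustrative model only; not part of the Lex module. *)

datatype tokModel = CommaModel | SemicolonModel | EOFModel

(* Hypothetical stand-in for "issues an error message". *)
exception LexError of string

(* Consume the expected token from the front of a labeled token list. *)
fun checkInLabToksModel (_, nil) =
      raise LexError "labeled token list isn't EOF-terminated"
  | checkInLabToksModel (tok : tokModel, (n, tok') :: lts) =
      if tok = tok'
      then lts
      else raise LexError ("line " ^ Int.toString n ^ ": token expected")
```

With this model, checking for `CommaModel` at the front of `[(1, CommaModel), (1, EOFModel)]` returns `[(1, EOFModel)]`, whereas checking for `SemicolonModel` raises `LexError`.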
error(n, m, pps)
If n = m, then error issues the error message obtained by pretty-printing the result of annotating pps to say the error occurred on line n. If n <> m, then error issues the error message obtained by pretty-printing the result of annotating pps to say the error occurred between line n and line m.
lexString s
lexically analyzes s, returning a labeled token list, in which each token is labeled with the line number at which it was found. Issues an error message if the lexical analysis fails.
lexFile fil
behaves like lexString, except that it works on the contents of the file named by fil.
Forlan Version 4.15
Copyright © 2022 Alley Stoughton