Lex Module
signature LEX
structure Lex :> LEX
This module implements Forlan's lexical analyzer. Most users of Forlan won't need to use it directly.
The lexical analyzer first strips its input of all whitespace and comments, where a comment consists of a "#" plus the rest of the line on which it occurs.
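The Standard ML sketch below illustrates the stripping step. It is not Forlan's implementation. The function name strip is hypothetical. The sketch also pairs each surviving character with its line number, which is an assumption: it is done so that tokens can later be labeled with line numbers.

```sml
(* Hypothetical sketch, not Forlan's code: drop whitespace and "#"
   comments, pairing each surviving character with its line number. *)
fun strip (s : string) : (int * char) list =
  let
    (* line: current line number; inComment: skipping a "#" comment;
       acc: surviving (line, char) pairs in reverse order *)
    fun go (_, _, [], acc) = rev acc
      | go (line, inComment, c :: cs, acc) =
          if c = #"\n" then go (line + 1, false, cs, acc)
          else if inComment orelse c = #"#" then go (line, true, cs, acc)
          else if Char.isSpace c then go (line, false, cs, acc)
          else go (line, false, cs, (line, c) :: acc)
  in
    go (1, false, String.explode s, [])
  end
```

For instance, strip "a b # comment\n c" evaluates to [(1, #"a"), (1, #"b"), (2, #"c")]. The comment and the spaces disappear, and c is recorded as being on line 2.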
type basic
val charToBasic : char -> basic
val basicToChar : basic -> char
type sym
val symToString : sym -> string
val sizeSym : sym -> int
val compareSym : sym Sort.total_ordering
val symToPP : sym -> PP.pp
datatype sym_top
= BasicSymTop of basic
| CompoundSymTop of sym option list
val symTopToSym : sym_top -> sym
val symToSymTop : sym -> sym_top
datatype tok
= Bar
| Comma
| Dollar
| Perc
| Plus
| Semicolon
| Star
| Tilde
| OpenPar
| ClosPar
| SingArr
| DoubArr
| Sym of sym
| Heading of string
| EOF
val equalTok : tok * tok -> bool
val errorNotEOFTerminated : unit -> 'a
val expectedTok : int * tok -> 'a
val expectedDigit : int -> 'a
val expectedLetter : int -> 'a
val expectedLetterOrDigit : int -> 'a
val unexpectedTok : int * tok -> 'a
val checkInLabToks : tok * (int * tok) list -> (int * tok) list
val error : int * int * PP.pp list -> 'a
val lexString : string -> (int * tok) list
val lexFile : string -> (int * tok) list
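type basic
The abstract type of basic characters: the subset of char consisting of the lowercase letters, the uppercase letters and the digits.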
charToBasic c
If c is a lowercase letter, an uppercase letter or a digit, then charToBasic returns c. Otherwise, issues an error message.
basicToChar c
Returns c, viewed as a value of type char.
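Here is a minimal sketch of charToBasic and basicToChar. It assumes that basic is modeled directly as char. It also replaces the error message with a hypothetical exception, NotBasic. In Forlan, basic is abstract, and errors are reported as error messages rather than as an exception of this form.

```sml
(* Hypothetical model: basic characters are plain chars, and a
   non-basic argument raises NotBasic instead of printing an error. *)
exception NotBasic of char

(* Char.isAlphaNum accepts the ASCII letters and digits. *)
fun charToBasic (c : char) : char =
  if Char.isAlphaNum c then c else raise NotBasic c

(* Every basic character is already a char. *)
fun basicToChar (b : char) : char = b
```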
type sym
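The abstract type of symbols: the least set of lists of basic characters, commas (,) and angle brackets (< and >) such that:
- for all basic characters c, [c] is a symbol; and
- for all natural numbers n, and all x1, ..., xn that are symbols or [,], [<] @ x1 @ ... @ xn @ [>] is a symbol.
The concrete syntax for a symbol [c1, ..., cn] is c1 ... cn. E.g., [<, i, d, >] is written as <id>.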
The type could be implemented using lists, but is actually implemented in a way that makes the construction and destruction of symbols more efficient. See symTopToSym and symToSymTop.
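The inductive definition of symbols can be checked mechanically. The following Standard ML sketch models a symbol as a plain char list. This is an assumption for illustration only; Forlan's representation is different. The sketch tests membership by parsing one symbol off the front of the list. A symbol is either a basic character, or a "<" followed by symbols and commas up to its matching ">". The names isBasic and isSym are hypothetical.

```sml
(* Hypothetical model: a symbol is a char list such as [#"<", #"i", #"d", #">"].
   Forlan's real sym type is abstract and represented more efficiently. *)
fun isBasic (c : char) : bool = Char.isAlphaNum c

(* isSym xs holds when xs is a symbol, i.e. [c] for a basic c, or
   [<] @ x1 @ ... @ xn @ [>] where each xi is a symbol or [,]. *)
fun isSym (xs : char list) : bool =
  let
    (* parseSym removes exactly one symbol from the front of its input,
       returning SOME of the remaining input, or NONE if there is none *)
    fun parseSym (c :: rest) =
          if isBasic c then SOME rest
          else if c = #"<" then parseItems rest
          else NONE
      | parseSym [] = NONE
    (* parseItems removes commas and symbols up to and including the
       closing bracket of the current level *)
    and parseItems (#">" :: rest) = SOME rest
      | parseItems (#"," :: rest) = parseItems rest
      | parseItems cs =
          (case parseSym cs of
             SOME rest => parseItems rest
           | NONE => NONE)
  in
    (* xs must be exactly one symbol, with nothing left over *)
    case parseSym xs of
      SOME [] => true
    | _ => false
  end
```

For example, isSym (String.explode "<a,<b>>") is true. So is isSym (String.explode "<>"), which is the case n = 0. By contrast, isSym (String.explode "ab") and isSym (String.explode "<a") are false.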
symToString a
Returns the string consisting of the elements of a, listed in order, i.e., the concrete syntax of a.
sizeSym a
Returns the size of a.
compareSym(a, b)
Compares a and b, first according to length, and then lexicographically. The lexicographic comparison uses the following order: the comma (,) comes first, followed by the digits (in ascending order), followed by the lowercase letters (in ascending order), followed by the uppercase letters (in ascending order), followed by the open angle bracket (<), followed by the close angle bracket (>).
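The following Standard ML sketch shows one way to realize this ordering on a char list model of symbols, which is used here for illustration only. The names rank and compareSymList are hypothetical. Forlan's compareSym works on its abstract sym type instead.

```sml
(* Hypothetical sketch of the ordering used by compareSym, over a
   char list model of symbols (Forlan's sym type is abstract). *)

(* rank puts the comma first, then digits, lowercase letters,
   uppercase letters, the open angle bracket and the close one. *)
fun rank (c : char) : int =
  if c = #"," then 0
  else if Char.isDigit c then 1 + (ord c - ord #"0")     (* 1 .. 10 *)
  else if Char.isLower c then 11 + (ord c - ord #"a")    (* 11 .. 36 *)
  else if Char.isUpper c then 37 + (ord c - ord #"A")    (* 37 .. 62 *)
  else if c = #"<" then 63
  else if c = #">" then 64
  else raise Fail ("rank: not a symbol character: " ^ String.str c)

(* Compare by length first; equal-length symbols are compared
   lexicographically using rank. *)
fun compareSymList (xs : char list, ys : char list) : order =
  case Int.compare (length xs, length ys) of
    EQUAL => List.collate (fn (a, b) => Int.compare (rank a, rank b)) (xs, ys)
  | other => other
```

In ASCII, "Z" comes before "a". Under this ordering, however, compareSymList ([#"a"], [#"Z"]) is LESS, because all lowercase letters precede all uppercase letters.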
symToPP a
Returns a value of type PP.pp for pretty-printing a, so as to make the nesting of brackets in a clear.
datatype sym_top
= BasicSymTop of basic
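| CompoundSymTop of sym option list
The datatype describing the top-level structure of a symbol, as consumed by symTopToSym and produced by symToSymTop.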
symTopToSym top
Returns the symbol corresponding to top. If top is BasicSymTop b, then symTopToSym returns [b]. Otherwise, top looks like CompoundSymTop xs. In that case, symTopToSym returns the result of appending three parts: an initial [<], then the lists corresponding to the elements of xs, then a closing [>]. Each occurrence of NONE in xs is turned into [,], and each occurrence of the form SOME a is turned into a.
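This behavior can be sketched in Standard ML using a char list model of symbols. The model is an assumption for illustration, not Forlan's representation. The hypothetical datatype symTop mirrors sym_top, with char standing in for basic and char list standing in for sym.

```sml
(* Hypothetical mirror of sym_top over a char list model: char stands
   in for basic and char list for sym (both abstract in Forlan). *)
datatype symTop =
    BasicSymTop of char
  | CompoundSymTop of char list option list

(* BasicSymTop b becomes [b]; CompoundSymTop xs becomes [<], then NONE
   as [,] and SOME a as a for each element of xs, then [>]. *)
fun symTopToSym (BasicSymTop b) = [b]
  | symTopToSym (CompoundSymTop xs) =
      [#"<"] @ List.concat (map (fn NONE => [#","] | SOME a => a) xs) @ [#">"]
```

For example, symTopToSym (CompoundSymTop [SOME [#"a"], NONE, SOME (String.explode "<b>")]) is String.explode "<a,<b>>".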
symToSymTop a
Returns the value of type sym_top describing the top-level structure of a.

If a consists of a single digit or letter, then symToSymTop returns BasicSymTop b, where b is that digit or letter.

Otherwise, a is the result of appending the elements of a list of lists xs. The first and last elements of xs are [<] and [>], respectively, and each of the remaining elements is either [,] or a single symbol. In this case, symToSymTop returns CompoundSymTop ys. Here ys is the value of type sym option list obtained from all but the first and last elements ([<] and [>]) of xs: each [,] is turned into NONE, and each symbol x is turned into SOME x.
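Using a char list model of symbols, the inverse direction can be sketched in Standard ML. The sketch splits the inside of the outer brackets into commas and complete symbols, tracking bracket depth. It is a hypothetical illustration: symTop mirrors sym_top, and the helper takeSym and the exception NotSym are invented. It also assumes that the contents of nested brackets are well formed rather than validating them.

```sml
(* Hypothetical mirror of sym_top over a char list model; NotSym is an
   invented exception for malformed input. *)
datatype symTop =
    BasicSymTop of char
  | CompoundSymTop of char list option list

exception NotSym

(* takeSym cs splits one symbol off the front of cs: either a single
   basic character, or an open bracket together with everything up to
   its matching close bracket. Bracket contents are assumed well formed. *)
fun takeSym (c :: rest) =
      if Char.isAlphaNum c then ([c], rest)
      else if c = #"<" then
        let
          (* depth: unmatched open brackets so far; acc: symbol, reversed *)
          fun go (0, acc, cs) = (rev acc, cs)
            | go (depth, acc, x :: xs) =
                go (if x = #"<" then depth + 1
                    else if x = #">" then depth - 1
                    else depth,
                    x :: acc, xs)
            | go (_, _, []) = raise NotSym
        in
          go (1, [c], rest)
        end
      else raise NotSym
  | takeSym [] = raise NotSym

(* symToSymTop strips the outer brackets and turns each comma into
   NONE and each inner symbol x into SOME x. *)
fun symToSymTop [c] =
      if Char.isAlphaNum c then BasicSymTop c else raise NotSym
  | symToSymTop (#"<" :: cs) =
      let
        fun items [#">"] = []
          | items (#"," :: rest) = NONE :: items rest
          | items rest =
              let val (x, rest') = takeSym rest
              in SOME x :: items rest' end
      in
        CompoundSymTop (items cs)
      end
  | symToSymTop _ = raise NotSym
```

For example, symToSymTop (String.explode "<a,<b>>") yields CompoundSymTop [SOME [#"a"], NONE, SOME (String.explode "<b>")]. This is the inverse of what symTopToSym does.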
datatype tok
= Bar
| Comma
| Dollar
| Perc
| Plus
| Semicolon
| Star
| Tilde
| OpenPar
| ClosPar
| SingArr
| DoubArr
| Sym of sym
| Heading of string
| EOF
The datatype of tokens. Each token corresponds to a string, as follows:

String | Token |
---|---|
"\|" | Bar |
"," | Comma |
"$" | Dollar |
"%" | Perc |
"+" | Plus |
";" | Semicolon |
"*" | Star |
"~" | Tilde |
"(" | OpenPar |
")" | ClosPar |
"->" | SingArr |
"=>" | DoubArr |
symbol a | Sym a |
heading s | Heading s |
end of file (end of string) | EOF |
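The simplified Standard ML sketch below shows how the table drives tokenization. It works on (line, char) pairs from which whitespace and comments have already been removed. It is an illustration built on several assumptions, not Forlan's lexer:

- symbols are represented as strings, and only single basic characters are recognized as symbols;
- headings are not handled;
- failures raise a hypothetical exception LexError;
- EOF is labeled with the line of the last character seen, or line 1 for empty input.

```sml
(* Hypothetical simplified token type: symbols are strings and
   headings are omitted, unlike Forlan's tok. *)
datatype tok =
    Bar | Comma | Dollar | Perc | Plus | Semicolon | Star | Tilde
  | OpenPar | ClosPar | SingArr | DoubArr
  | Sym of string
  | EOF

exception LexError of int

(* lex turns stripped (line, char) pairs into labeled tokens ending in
   EOF; only single basic characters become symbols in this sketch. *)
fun lex (cs : (int * char) list) : (int * tok) list =
  let
    (* one-character tokens from the table *)
    fun single #"|" = SOME Bar
      | single #"," = SOME Comma
      | single #"$" = SOME Dollar
      | single #"%" = SOME Perc
      | single #"+" = SOME Plus
      | single #";" = SOME Semicolon
      | single #"*" = SOME Star
      | single #"~" = SOME Tilde
      | single #"(" = SOME OpenPar
      | single #")" = SOME ClosPar
      | single _ = NONE
    (* acc: labeled tokens in reverse; last: line of the last character *)
    fun go ([], acc, last) = rev ((last, EOF) :: acc)
      | go ((n, #"-") :: (_, #">") :: rest, acc, _) =
          go (rest, (n, SingArr) :: acc, n)
      | go ((n, #"=") :: (_, #">") :: rest, acc, _) =
          go (rest, (n, DoubArr) :: acc, n)
      | go ((n, c) :: rest, acc, _) =
          (case single c of
             SOME t => go (rest, (n, t) :: acc, n)
           | NONE =>
               if Char.isAlphaNum c
               then go (rest, (n, Sym (String.str c)) :: acc, n)
               else raise LexError n)
  in
    go (cs, [], 1)
  end
```

For example, lex [(1, #"a"), (1, #"-"), (1, #">"), (2, #"b")] returns [(1, Sym "a"), (1, SingArr), (2, Sym "b"), (2, EOF)].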
A labeled token consists of a token plus the line number at which it was found, and a labeled token list consists of a list of labeled tokens.
equalTok(tok1, tok2)
Tests whether tok1 and tok2 are equal, meaning that they have the same constructor and the same argument, if any.
errorNotEOFTerminated()
Issues an error message saying that a labeled token list wasn't EOF-terminated.
expectedTok(n, tok)
Issues an error message saying that, on line n, an occurrence of (the string corresponding to) tok was expected.
expectedDigit n
Issues an error message saying that, on line n, a digit was expected.
expectedLetter n
Issues an error message saying that, on line n, a letter was expected.
expectedLetterOrDigit n
Issues an error message saying that, on line n, a letter or digit was expected.
unexpectedTok(n, tok)
Issues an error message saying that, on line n, an occurrence of tok was unexpected.
checkInLabToks(tok, lts)
If lts begins with tok, labeled by some line number n, then checkInLabToks returns all but this first element of lts. Otherwise, checkInLabToks issues an error message. The message either complains that lts wasn't EOF-terminated, or says that tok was expected on the line that labels the first element of lts.
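Here is a minimal Standard ML sketch of this behavior. It uses a cut-down token type and hypothetical exceptions in place of Forlan's error messages. The sketch treats an empty list as the not-EOF-terminated case; this is an assumption about when that complaint arises.

```sml
(* Hypothetical cut-down token type; Forlan's tok has more constructors. *)
datatype tok = Comma | Semicolon | Sym of string | EOF

(* Stand-ins for Forlan's error messages. *)
exception NotEOFTerminated
exception Expected of int * tok

(* If lts starts with tok (on any line), drop that element; otherwise
   complain. An empty list is treated as not EOF-terminated. *)
fun checkInLabToks (tok : tok, lts : (int * tok) list) : (int * tok) list =
  case lts of
    [] => raise NotEOFTerminated
  | (n, t) :: rest => if t = tok then rest else raise Expected (n, tok)
```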
error(n, m, pps)
If n = m, then error issues the error message obtained by pretty-printing the result of annotating pps to say that the error occurred on line n. If n <> m, then error issues the error message obtained by pretty-printing the result of annotating pps to say that the error occurred between lines n and m.
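Here is a minimal sketch of the line annotation, with plain strings standing in for PP.pp values. The function name annotate and the exact wording of the messages are assumptions, not Forlan's actual output.

```sml
(* Hypothetical sketch: strings stand in for PP.pp values, and the
   message wording is invented. *)
fun annotate (n : int, m : int, msg : string) : string =
  if n = m
  then "line " ^ Int.toString n ^ ": " ^ msg
  else "between lines " ^ Int.toString n ^ " and " ^ Int.toString m ^ ": " ^ msg
```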
lexString s
Lexically analyzes s, returning a labeled token list in which each token is labeled with the line number at which it was found. Issues an error message if the lexical analysis fails.
lexFile fil
Behaves like lexString, except that it works on the contents of the file named by fil.
Forlan Version 4.15
Copyright © 2022 Alley Stoughton