Forlan Manual


The Lex Module


Synopsis

signature LEX
structure Lex :> LEX

This module implements Forlan's lexical analyzer. Most users of Forlan won't need to use it directly.

The lexical analyzer first strips its input of all whitespace and comments, where a comment consists of a "#" plus the rest of the line on which it occurs.
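Because tokens are later labeled with line numbers, the stripping phase has to remember which line each surviving character came from. The following is a minimal sketch of that phase in Standard ML. The function name strip and its output format (each remaining character paired with its line) are hypothetical, and the sketch assumes that lines are numbered from 1 and that a newline ends a comment.

```sml
(* Hypothetical sketch: drop whitespace and "#" comments, pairing each
   remaining character with the (1-based) line on which it occurred. *)
fun strip (s : string) : (int * char) list =
  let
    (* Skip a comment body, stopping just before the newline (if any)
       so that the line counter is still advanced. *)
    fun skipComment [] = []
      | skipComment (cs as #"\n" :: _) = cs
      | skipComment (_ :: cs) = skipComment cs

    fun go (_, []) = []
      | go (n, #"\n" :: cs) = go (n + 1, cs)
      | go (n, #"#" :: cs) = go (n, skipComment cs)
      | go (n, c :: cs) =
          if Char.isSpace c then go (n, cs)
          else (n, c) :: go (n, cs)
  in
    go (1, String.explode s)
  end
```

For example, strip "a -> b # note\nc" yields [(1, #"a"), (1, #"-"), (1, #">"), (1, #"b"), (2, #"c")]. Since whitespace is removed before tokens are recognized, "- >" would be treated the same as "->".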


Interface

type basic
val charToBasic : char -> basic
val basicToChar : basic -> char
type sym
val symToString : sym -> string
val sizeSym : sym -> int
val compareSym : sym Sort.total_ordering
val symToPP : sym -> PP.pp
datatype sym_top
  = BasicSymTop of basic
  | CompoundSymTop of sym option list
val symTopToSym : sym_top -> sym
val symToSymTop : sym -> sym_top
datatype tok
  = Bar
  | Comma
  | Dollar
  | Perc
  | Plus
  | Semicolon
  | Star
  | Tilde
  | OpenPar
  | ClosPar
  | SingArr
  | DoubArr
  | Sym of sym
  | Heading of string
  | EOF
val equalTok : tok * tok -> bool
val errorNotEOFTerminated : unit -> 'a
val expectedTok : int * tok -> 'a
val expectedDigit : int -> 'a
val expectedLetter : int -> 'a
val expectedLetterOrDigit : int -> 'a
val unexpectedTok : int * tok -> 'a
val checkInLabToks : tok * (int * tok) list -> (int * tok) list
val error : int * int * PP.pp list -> 'a
val lexString : string -> (int * tok) list
val lexFile : string -> (int * tok) list

Description

type basic
The abstract type of characters that are lowercase letters, uppercase letters or digits.

charToBasic c
If c is a lowercase letter, an uppercase letter or a digit, then charToBasic returns c. Otherwise, it issues an error message.

basicToChar c
returns c.
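One plausible way to get such an abstract type in Standard ML is opaque signature ascription, the same mechanism used in structure Lex :> LEX. In the sketch below, the signature BASIC and structure Basic are hypothetical, and raising Fail stands in for Forlan's error-message mechanism.

```sml
signature BASIC =
sig
  type basic
  val charToBasic : char -> basic
  val basicToChar : basic -> char
end

(* Hypothetical: basic is represented by char, but opaque ascription
   (:>) hides this, so clients can only build values via charToBasic. *)
structure Basic :> BASIC =
struct
  type basic = char

  (* Char.isAlphaNum holds exactly for ASCII letters and digits. *)
  fun charToBasic c =
    if Char.isAlphaNum c then c
    else raise Fail ("not a letter or digit: " ^ Char.toString c)

  fun basicToChar c = c
end
```

Because the representation is hidden, the check in charToBasic is the only way into the type, so every value of type basic is guaranteed to be a letter or digit.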

type sym
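The abstract type of Forlan symbols, consisting of the least set of lists of digits, lowercase and uppercase letters, commas (,) and angle brackets (< and >) such that:

for all digits and letters c, [c] is a symbol; and
for all natural numbers n and all x1, ..., xn, each of which is either [,] or a symbol, the result of appending [<], x1, ..., xn and [>] is a symbol.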

The concrete syntax for a symbol [c1, ..., cn] is c1 ... cn. E.g., [<, i, d, >] is written as <id>.

The type could be implemented using lists, but is actually implemented in a way that makes the construction and destruction of symbols more efficient. See symTopToSym and symToSymTop.
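Since a symbol is either a single letter or digit, or a "<" followed by a sequence of commas and symbols and then a ">", membership can be decided by a small recursive-descent recognizer. This sketch works on ordinary strings rather than Forlan's abstract sym type, and the names readSym, readBody and isSym are hypothetical.

```sml
(* readSym cs: if some prefix of cs is a symbol, return SOME of the
   characters after that prefix; otherwise NONE. *)
fun readSym [] = NONE
  | readSym (c :: cs) =
      if Char.isAlphaNum c then SOME cs
      else if c = #"<" then readBody cs
      else NONE

(* readBody cs: read commas and symbols up to and including the ">"
   that closes the current bracket. *)
and readBody (#">" :: cs) = SOME cs
  | readBody (#"," :: cs) = readBody cs
  | readBody cs =
      (case readSym cs of
         SOME rest => readBody rest
       | NONE => NONE)

(* s is a symbol iff reading one symbol consumes all of s. *)
fun isSym s =
  case readSym (String.explode s) of
    SOME [] => true
  | _ => false
```

With this, isSym "<id>", isSym "<>" and isSym "<<a,>,b>" hold, while isSym "ab", isSym "<a" and isSym "," do not.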

symToString a
returns the string whose characters are the elements of a, listed in order.

sizeSym a
returns the length of a.

compareSym(a, b)
compares a and b, first according to length, and then lexicographically. The lexicographic comparison uses the following ordering on characters: the comma (,) comes first, followed by the digits (in ascending order), then the lowercase letters (in ascending order), then the uppercase letters (in ascending order), then the open angle bracket (<), and finally the close angle bracket (>).
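The ordering can be made concrete by assigning each symbol character a rank. The sketch below compares symbols written as strings and returns a standard Basis order value; the names rank and compareSymStrings are hypothetical, and the string representation is an assumption made for illustration.

```sml
(* Hypothetical: position of a symbol character in the ordering
   ",", "0".."9", "a".."z", "A".."Z", "<", ">". *)
fun rank #"," = 0
  | rank #"<" = 63
  | rank #">" = 64
  | rank c =
      if Char.isDigit c then 1 + (ord c - ord #"0")
      else if Char.isLower c then 11 + (ord c - ord #"a")
      else if Char.isUpper c then 37 + (ord c - ord #"A")
      else raise Fail ("not a symbol character: " ^ Char.toString c)

(* Compare symbols written as strings: by length first, then
   lexicographically with respect to rank. *)
fun compareSymStrings (a, b) =
  case Int.compare (size a, size b) of
    EQUAL =>
      List.collate (fn (c, d) => Int.compare (rank c, rank d))
        (String.explode a, String.explode b)
  | r => r
```

For instance, b comes before <a> because it is shorter, and Z comes before < because the uppercase letters precede the angle brackets.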

symToPP a
returns a pretty-printing expression that, when pretty-printed, produces the string consisting of the elements of a, with spaces added as necessary to make the nesting of brackets in a clear.

datatype sym_top
  = BasicSymTop of basic
  | CompoundSymTop of sym option list
A datatype describing the top-level structure of a symbol.

symTopToSym top
returns the symbol whose top-level structure is described by top. If top is BasicSymTop b, then symTopToSym returns [b]. Otherwise, top has the form CompoundSymTop xs, and symTopToSym returns the result of appending [<], the lists corresponding to the elements of xs (in order), and [>], where each occurrence of NONE in xs corresponds to [,], and each occurrence of the form SOME a corresponds to a.

symToSymTop a
returns the value of type sym_top describing the top-level structure of a. If a consists of a single digit or letter b, then symToSymTop returns BasicSymTop b. Otherwise, a is the result of appending the elements of a list of lists xs, where the first and last elements of xs are [<] and [>], respectively, and each remaining element is either [,] or a symbol. In this case, symToSymTop returns CompoundSymTop ys, where ys is the value of type sym option list obtained from all but the first and last elements of xs by turning each [,] into NONE and each symbol x into SOME x.
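The two conversions can be sketched over a string representation of symbols. This representation is an assumption made for illustration only: Forlan's actual sym type is implemented differently for efficiency, and here BasicSymTop carries a char rather than a basic. The key step in symToSymTop is splitting the body of a bracketed symbol at top-level commas while tracking bracket depth.

```sml
(* Stand-in for Forlan's sym_top, with symbols as strings and a char
   in place of basic. *)
datatype sym_top =
    BasicSymTop of char
  | CompoundSymTop of string option list

(* NONE contributes ",", SOME a contributes a; the whole is bracketed. *)
fun symTopToSym (BasicSymTop b) = String.str b
  | symTopToSym (CompoundSymTop xs) =
      let
        fun part NONE = ","
          | part (SOME a) = a
      in
        "<" ^ String.concat (map part xs) ^ ">"
      end

(* Assumes a is a well-formed symbol. *)
fun symToSymTop a =
  if size a = 1 then BasicSymTop (String.sub (a, 0))
  else
    let
      (* Collect characters up to and including the ">" that brings the
         bracket depth d back to 0; return them and the leftover input. *)
      fun grab ([], _, seen) = (rev seen, [])  (* malformed input only *)
        | grab (c :: cs, d, seen) =
            let
              val d' = if c = #"<" then d + 1
                       else if c = #">" then d - 1
                       else d
            in
              if d' = 0 then (rev (c :: seen), cs)
              else grab (cs, d', c :: seen)
            end

      (* Split the body into top-level commas and sub-symbols. *)
      fun split ([], acc) = rev acc
        | split (#"," :: cs, acc) = split (cs, NONE :: acc)
        | split (#"<" :: cs, acc) =
            let val (inner, rest) = grab (cs, 1, [#"<"])
            in split (rest, SOME (String.implode inner) :: acc) end
        | split (c :: cs, acc) = split (cs, SOME (String.str c) :: acc)

      (* Characters strictly between the outer "<" and ">". *)
      val body = String.explode (String.substring (a, 1, size a - 2))
    in
      CompoundSymTop (split (body, []))
    end
```

For example, symToSymTop "<<a,>,b>" gives CompoundSymTop [SOME "<a,>", NONE, SOME "b"], and applying symTopToSym to that value gives back "<<a,>,b>".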

datatype tok
  = Bar
  | Comma
  | Dollar
  | Perc
  | Plus
  | Semicolon
  | Star
  | Tilde
  | OpenPar
  | ClosPar
  | SingArr
  | DoubArr
  | Sym of sym
  | Heading of string
  | EOF
The datatype of tokens (lexical items). The following translation table is used by the lexical analyzer:

String                        Token
"|"                           Bar
","                           Comma
"$"                           Dollar
"%"                           Perc
"+"                           Plus
";"                           Semicolon
"*"                           Star
"~"                           Tilde
"("                           OpenPar
")"                           ClosPar
"->"                          SingArr
"=>"                          DoubArr
symbol a                      Sym a
heading s                     Heading s
end of file (end of string)   EOF

A heading consists of an initial "{", followed by a sequence of uppercase and lowercase letters, followed by a trailing "}".
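As an illustration of the translation table, the sketch below recognizes the punctuation tokens, the two arrows and headings at the front of a character list from which whitespace and comments have already been removed. It is hypothetical in several respects: Sym carries a string rather than a sym, the helper names single, takeLetters and punct are invented, and it assumes that the string carried by Heading is just the letters between the braces. Symbols, which begin with a letter, digit or "<", are not handled.

```sml
(* Stand-in for Forlan's tok: Sym carries a string here, not a sym. *)
datatype tok =
    Bar | Comma | Dollar | Perc | Plus | Semicolon | Star | Tilde
  | OpenPar | ClosPar | SingArr | DoubArr
  | Sym of string | Heading of string | EOF

(* Single-character tokens, following the translation table. *)
fun single #"|" = SOME Bar
  | single #"," = SOME Comma
  | single #"$" = SOME Dollar
  | single #"%" = SOME Perc
  | single #"+" = SOME Plus
  | single #";" = SOME Semicolon
  | single #"*" = SOME Star
  | single #"~" = SOME Tilde
  | single #"(" = SOME OpenPar
  | single #")" = SOME ClosPar
  | single _ = NONE

(* Split off the longest prefix of letters. *)
fun takeLetters (c :: cs, acc) =
      if Char.isAlpha c then takeLetters (cs, c :: acc)
      else (rev acc, c :: cs)
  | takeLetters ([], acc) = (rev acc, [])

(* Recognize an arrow, a heading or a single-character token at the
   front of the input, returning it with the remaining characters. *)
fun punct (#"-" :: #">" :: cs) = SOME (SingArr, cs)
  | punct (#"=" :: #">" :: cs) = SOME (DoubArr, cs)
  | punct (#"{" :: cs) =
      (case takeLetters (cs, []) of
         (letters, #"}" :: rest) => SOME (Heading (String.implode letters), rest)
       | _ => NONE)
  | punct (c :: cs) = Option.map (fn t => (t, cs)) (single c)
  | punct [] = NONE
```

The two-character arrows are tried before the single-character cases, so "->" is never misread as an unknown "-" followed by something else.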

A labeled token consists of a token plus the line number at which it was found, and a labeled token list consists of a list of labeled tokens.

equalTok(tok1, tok2)
tests whether tok1 and tok2 are equal, meaning that they have the same constructor and the same argument, if any.

errorNotEOFTerminated()
issues an error message saying that a labeled token list isn't EOF-terminated.

expectedTok(n, tok)
issues an error message saying that, on line n, an occurrence of (the string corresponding to) tok was expected.

expectedDigit n
issues an error message saying that, on line n, a digit was expected.

expectedLetter n
issues an error message saying that, on line n, a letter was expected.

expectedLetterOrDigit n
issues an error message saying that, on line n, a letter or digit was expected.

unexpectedTok(n, tok)
issues an error message saying that, on line n, an occurrence of tok was unexpected.

checkInLabToks(tok, lts)
If lts begins with tok, labeled by a line number n, then checkInLabToks returns all but this first element of lts. Otherwise, checkInLabToks issues an error message, either complaining that lts wasn't EOF-terminated, or saying that tok was expected on the line that's the label of the first element of lts.
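The control flow of checkInLabToks can be sketched as follows. To keep the sketch self-contained, it is hypothetical in two ways: it is polymorphic in the token type, taking the equality test (the role played by equalTok) as an argument, and it raises the invented exceptions NotEOFTerminated and Expected instead of issuing Forlan's error messages. It also assumes that running out of labeled tokens is what signals a list that isn't EOF-terminated.

```sml
exception NotEOFTerminated
exception Expected of int   (* line on which the token was expected *)

(* Hypothetical sketch: consume tok from the front of lts, or fail. *)
fun checkInLabToks (eq : 'a * 'a -> bool) (tok : 'a, lts : (int * 'a) list)
    : (int * 'a) list =
  case lts of
    [] => raise NotEOFTerminated
  | (n, t) :: rest => if eq (tok, t) then rest else raise Expected n
```

A parser would typically call such a function whenever the grammar requires a specific token, for example an arrow between the two halves of a transition, and continue with the returned tail.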

error(n, m, pps)
If n = m, then error issues the error message obtained by pretty-printing the result of annotating pps to say the error occurred on line n. If n <> m, then error issues the error message obtained by pretty-printing the result of annotating pps to say the error occurred between line n and line m.
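The exact wording of these messages is not specified here, so the following sketch of the line annotation is hypothetical: it uses plain strings in place of PP.pp values, and the names lineAnnotation and errorMessage, as well as the phrasing, are invented for illustration.

```sml
(* Hypothetical: the annotation distinguishes a single line from a range. *)
fun lineAnnotation (n, m) =
  if n = m then "line " ^ Int.toString n
  else "between lines " ^ Int.toString n ^ " and " ^ Int.toString m

(* Join the message parts with spaces and prefix the annotation. *)
fun errorMessage (n, m, parts) =
  lineAnnotation (n, m) ^ ": " ^ String.concatWith " " parts
```

For example, errorMessage (3, 3, ["unexpected", "token"]) gives "line 3: unexpected token", while errorMessage (2, 5, ["bad", "transition"]) gives "between lines 2 and 5: bad transition".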

lexString s
lexically analyzes s, returning a labeled token list, in which each token is labeled with the line number at which it was found. It issues an error message if the lexical analysis fails.

lexFile fil
behaves like lexString, except that it works on the contents of the file named by fil.



Forlan Version 4.15
Copyright © 2022 Alley Stoughton