Boston University — CS 591 S2 — Formal Language Theory: Integrating Experimentation and Proof — Fall 2019


Instructor Alley Stoughton
Email stough@bu.edu
Personal Website alleystoughton.us
Course Website alleystoughton.us/591-s2
Class Sessions Tuesdays/Thursdays 12:30-1:45pm MCS B33
Problem Solving Sessions Tuesdays 6:30-7:20pm CAS 222
Office Hours Tuesdays/Thursdays 2:30-4pm, MCS 122
Course Piazza piazza.com/bu/fall2019/cs591s2

Thinking about Taking this Course?

You might want to take this course if:

Please email me (stough@bu.edu) if you have questions or for further information.

Overview

Since the 1930s, the subject of formal language theory, also known as automata theory, has been developed by computer scientists, linguists and mathematicians. Formal languages (or simply languages) are sets of strings over finite sets of symbols, called alphabets, and various ways of describing such languages have been developed and studied, including regular expressions (which "generate" languages), finite automata (which "accept" languages), grammars (which "generate" languages) and Turing machines (which "accept" languages). For example, the set of identifiers of a given programming language is a formal language — one that can be described by a regular expression or a finite automaton. And, the set of all strings of tokens that are generated by a programming language's grammar is another example of a formal language.

Formal language theory has applications to many areas of computer science. Perhaps the best known applications are to compiler construction. For example, regular expressions and finite automata are used when specifying and implementing lexical analyzers, and grammars are used to specify and implement parsers. Finite automata are used when designing hardware and network protocols. And Turing machines — or other machines/programs of equivalent power — are used to formalize the notion of algorithm, which in turn makes possible the study of what is, and is not, computable.

Formal language theory is traditionally taught as a paper-and-pencil mathematics course, even though it is largely concerned with algorithms on the subject's formalisms: regular expressions, automata, grammars and Turing machines. In contrast, this course approaches the subject in a way that balances proof with experimentation, carried out using the instructor's Forlan toolset, which implements the algorithms of formal language theory. Forlan is embedded in the functional programming language Standard ML (SML), a language whose notation and concepts are similar to those of mathematics. SML is strongly typed and interactive, properties that make sophisticated experimentation robust and enjoyable.

Here is an example deterministic finite automaton that was synthesized using Forlan:

Example Deterministic Finite Automaton

Here is a description of what the automaton does, i.e., what language it accepts.

The course features a rigorous approach to carrying out mathematical proofs, focusing on basic set theory, definitional techniques including various forms of recursion, and proof techniques including different kinds of induction.

Formal prerequisites (not enforced for grad students): CAS CS 320 (Concepts of Programming Languages); and either CAS CS 330 (Introduction to Analysis of Algorithms) or CAS CS 332 (Elements of the Theory of Computation).

Informal prerequisites: you should have enough familiarity with the concepts of programming languages so that learning basic functional programming won't be too much of a stretch; and you should have reached an intermediate level of mathematical maturity, and be competent using proof techniques like mathematical induction, and proof by contradiction.

Resources

Textbook

We will be using the Fall 2019 draft of my textbook on formal language theory, entitled Formal Language Theory: Integrating Experimentation and Proof. I'll be making revisions as the semester progresses.

You may find it useful to also study the related sections of another book on formal language (automata) theory. Note, however, that the notation and definitions used by any particular book won't be identical to the ones we'll be using.

Forlan Toolset

Many of the algorithms that we will study are implemented as part of the Forlan toolset — a collection of tools for experimenting with formal languages that I've been developing since 2001. Forlan is implemented as a set of Standard ML (a functional programming language) modules. It's used interactively. In fact, a Forlan session is nothing more than an SML session in which the Forlan modules are available.

As the course progresses, you'll learn how to use Forlan, and you'll be using it when solving some of the problem sets. The textbook contains a brief introduction to SML, but more information can be obtained from L. C. Paulson's ML for the Working Programmer, second edition, Cambridge University Press, 1966. It is available online, for personal use. If you like learning from examples, here's a brief tutorial on SML.

The best way to run Forlan or SML is as a sub-process of the Emacs text editor, using the SML mode for Emacs. We'll also be using JForlan, a Java program for creating and editing Forlan automata and trees.

See the Forlan website for instructions for installing Forlan on personal computers. Forlan is also installed on the CS Department's Linux servers csa1.bu.edu, csa2.bu.edu and csa3.bu.edu. When connecting to one of these servers using the secure shell (ssh) command on Linux or macOS, you must use the -Y option, so as to enable "trusted X11 forwarding". Otherwise, you won't be able to use JForlan. I.e., run one of the commands

  ssh -Y csa1.bu.edu
  ssh -Y csa2.bu.edu
  ssh -Y csa3.bu.edu

Assessment

There will be seven problem sets, each worth 100 points, plus a course project or final exam (students will be asked to choose one or the other), worth 200 points.

Each assessment unit will be graded using the following 100 point scale: A+ (100), A (92), A- (84), B+ (76), B (68), B- (60), C+ (52), C (44), C- (36), D+ (28), D (20), D- (12), F (0). (E.g., a grade of 80 points should be thought of as being halfway between a B+ and an A-.)

Late work will be assessed a penalty of 20% during the first twenty-four hours. Work that is more than twenty-four hours late will not be graded.

The elegance and simplicity of your work will be taken into account when it is assessed. I reserve the right to award extra credit to students who find errors in, or suggest improvements to, course materials.

A student's grade for the course will be the letter (possibly followed by a + or -) whose value is nearest to the weighted average of the grades of the student's problem sets and course project/final exam. Ties will be resolved by selecting the better of the two grades.

When working on a problem set, you may optionally work with a single other student — i.e., work in a pair. If you opt to do this, the two of you will submit a single solution to the problem set, with both of your names on it. You will then receive a joint grade for the problem set. If there is a part of the problem set solution that only one of you understands, you must note that discrepancy in your submission. In that case, you and your partner's grades may be adjusted to reflect that difference in understanding.

Apart from pair work on problem sets, the work you hand in must be your own. You may discuss problem sets with others in general terms, but you may not base your work on other people's work, and you may not show your draft work to others.

If you use resources other than course materials when working on problem sets or the course project (if you opt for a project instead of a final exam), you must cite your sources.

Academic Integrity

You are responsible for reading and understanding BU's Academic Conduct Code (for undergraduates)

http://www.bu.edu/academics/policies/academic-conduct-code

or the GRS Academic and Professional Conduct Code (for graduate students)

http://www.bu.edu/cas/files/2017/02/GRS-Academic-Conduct-Code-Final.pdf

Incidents of academic misconduct will be reported to the Academic Conduct Committee (ACC). If the ACC finds a student guilty, punishment could range from a minimum of a grading sanction (e.g., dropping your final grade by one letter grade) all the way up to suspension or expulsion from the University.

Email Communication, Problem Solving Sessions, Office Hours and Piazza

I will often communicate with the class by email, and so it is important that you read your email frequently.

I'll be offering weekly problem solving sessions, in which I'll solve problems on the blackboard and answer questions. They will be on Tuesdays from 6:30-7:20pm in CAS 222.

My office hours will be on Tuesdays and Thursdays from 2:30-4pm in MCS 122. You can also make an appointment with me at an alternative time.

We'll be using Piazza for online discussions. Visit piazza.com/bu/fall2019/cs591s2 to join in.

LaTeX Document Preparation System

When preparing solutions to assignments, you may wish to make use of the LaTeX document preparation system. LaTeX handles mathematics very well, and is widely used by academic computer scientists.

Class Sessions