Instructor | Alley Stoughton |

stough@bu.edu | |

Personal Website | alleystoughton.us |

Course Website | alleystoughton.us/cs516 |

Class Sessions | Tuesday/Thursday 12:30-1:45pm MCS B33 |

Problem Solving Sessions | Tuesday 6:30pm-7:45pm MCS B33 |

Office Hours | Wednesday 2:30-4:30pm MCS 122 and by appointment |

Course Piazza | piazza.com/bu/spring2022/cs516 |

You might want to take this course if:

- You are a CS undergraduate who likes theory and who is looking for an advanced topics course (Group D). E.g., maybe you enjoyed CAS CS 330 (Introduction to the Analysis of Algorithms) or CAS CS 332 (Elements of the Theory of Computation), and would like to learn more about formal language (automata) theory.
- You are a CS Master's student who wants to satisfy the "software area" breadth requirement for the MS Degree.
- You are a CS doctoral student who wants to satisfy the "software area" breadth requirement for the PhD Degree.
- You are a Mathematics student who would like to learn some computer science theory. Please don't worry about the formal prerequisites; you can talk with me about whether you have enough background knowledge to take the course.
- You like solving challenging mathematical puzzles, and/or you are interested in getting better at doing mathematical proofs, and/or you want to learn some cool math. In particular, the course covers topics from foundations of mathematics — like inductive definitions and well-founded recursion — that are generally useful in computer science theory, but are especially applicable to the theory of programming languages.
- You like functional programming, or are interested in learning about functional programming.
- You are interested in the applications of formal language theory, e.g., to compiler construction, or the design of hardware or network protocols.

Please email me (stough@bu.edu) if you have questions or for further information.

Since the 1930s, the subject of formal language theory, also known as automata theory, has been developed by computer scientists, linguists and mathematicians. Formal languages (or simply languages) are sets of strings over finite sets of symbols, called alphabets, and various ways of describing such languages have been developed and studied, including regular expressions (which "generate" languages), finite automata (which "accept" languages), grammars (which "generate" languages) and Turing machines (which "accept" languages). For example, the set of identifiers of a given programming language is a formal language — one that can be described by a regular expression or a finite automaton. And, the set of all strings of tokens that are generated by a programming language's grammar is another example of a formal language.

Formal language theory has applications to many areas of computer science. Perhaps the best known applications are to compiler construction. For example, regular expressions and finite automata are used when specifying and implementing lexical analyzers, and grammars are used to specify and implement parsers. Finite automata are used when designing hardware and network protocols. And Turing machines — or other machines/programs of equivalent power — are used to formalize the notion of algorithm, which in turn makes possible the study of what is, and is not, computable.

Formal language theory is traditionally taught as a
paper-and-pencil mathematics course, even though it is largely
concerned with algorithms on the subject's formalisms: regular
expressions, automata, grammars and Turing machines. In contrast, this
course approaches the subject in a way that balances proof with
*experimentation*, carried out using the
instructor's Forlan toolset, which implements
the algorithms of formal language theory. Forlan is embedded in the
functional programming
language Standard ML (SML), a
language whose notation and concepts are similar to those of
mathematics. SML is strongly typed and interactive, properties that
make sophisticated experimentation robust and enjoyable.

Here is an example deterministic finite automaton that was synthesized using Forlan:

Here is a description of what the automaton does, i.e., what language it accepts.

The course features a rigorous approach to carrying out mathematical proofs, focusing on basic set theory, definitional techniques including various forms of recursion, and proof techniques including different kinds of induction.

*Formal prerequisites*:
CAS CS
320 (Concepts of Programming Languages) and
CAS
CS 330 (Introduction to Analysis of Algorithms); or consent
of instructor.

*Informal prerequisites*: you should have enough familiarity
with the concepts of programming languages so that learning basic
functional programming won't be too much of a stretch; and you should
have reached an intermediate level of mathematical maturity, and be
competent using proof techniques like mathematical induction, and
proof by contradiction.

We will be using the Spring 2022
draft of my (freely available!) textbook on formal language theory, entitled
*Formal Language Theory: Integrating Experimentation and
Proof*.

You may find it useful to also study the related sections of another book on formal language (automata) theory. Note, however, that the notation and definitions used by any particular book won't be identical to the ones we'll be using.

Many of the algorithms that we will study are implemented as part of the Forlan toolset — a collection of tools for experimenting with formal languages that I've been developing since 2001. Forlan is implemented as a set of Standard ML (a functional programming language) modules. It's used interactively. In fact, a Forlan session is nothing more than an SML session in which the Forlan modules are available.

As the course progresses, you'll learn how to use Forlan, and
you'll be using it when solving some of the problem sets. The textbook
contains a brief introduction to SML, but more information can be obtained
from L. C. Paulson's *ML for the Working Programmer*, second
edition, Cambridge University Press, 1966. It
is available
online, for personal use. If you like learning from examples, here's a
brief tutorial
on SML.

The best way to run Forlan or SML is as a sub-process of the Emacs text editor, using the SML mode for Emacs. We'll also be using JForlan, a Java program for creating and editing Forlan automata and trees.

See the Forlan website for instructions for
installing Forlan on personal computers. Forlan is also installed on
the CS Department's Linux
servers `csa1.bu.edu`, `csa2.bu.edu` and
`csa3.bu.edu`. When
connecting to one of these servers using the secure shell
(`ssh`) command on Linux or macOS, you must use
the `-Y` option, so as to enable "trusted X11
forwarding". Otherwise, you won't be able to use JForlan. I.e., run
one of the commands

ssh -Y csa1.bu.edu ssh -Y csa2.bu.edu ssh -Y csa3.bu.edu

See the file `/home/fac3/stough/CS516-README` for how
to modify your shell's `PATH` variable to add `sml`,
`forlan` and `jforlan` to your shell's search path.

There will be seven problem sets, each worth 100 points, plus a
course project *or* final exam (students will be asked to
choose one or the other), worth 200 points.

Each assessment unit will be graded using the following 100 point scale: A+ (100), A (92), A- (84), B+ (76), B (68), B- (60), C+ (52), C (44), C- (36), D+ (28), D (20), D- (12), F (0). (E.g., a grade of 80 points should be thought of as being halfway between a B+ and an A-.)

Late work will be assessed a penalty of 20% during the first twenty-four hours. Work that is more than twenty-four hours late will not be graded.

The elegance and simplicity of your work will be taken into account when it is assessed. I reserve the right to award extra credit to students who find errors in, or suggest improvements to, course materials.

A student's grade for the course will be the letter (possibly followed by a + or -) whose value is nearest to the weighted average of the grades of the student's problem sets and course project/final exam. Ties will be resolved by selecting the better of the two grades.

When working on a problem set, you may optionally work with a
*single* other student — i.e., work in a pair. If you opt
to do this, the two of you will submit a *single* solution to
the problem set, with both of your names on it. You will then receive
a joint grade for the problem set. If there is a part of the problem
set solution that only one of you understands, you must note that
discrepancy in your submission. In that case, you and your partner's grades
may be adjusted to reflect that difference in understanding.

Apart from pair work on problem sets, the work you hand in must be your own. You may discuss problem sets with others in general terms, but you may not base your work on other people's work, and you may not show your draft work to others.

If you use resources other than course materials when working on problem sets or the course project (if you opt for a project instead of a final exam), you must cite your sources.

You are responsible for reading and understanding BU's Academic Conduct Code (for undergraduates)

or the GRS Academic and Professional Conduct Code (for graduate students)

www.bu.edu/cas/files/2017/02/GRS-Academic-Conduct-Code-Final.pdf

Incidents of academic misconduct will be reported to the Academic Conduct Committee (ACC). If the ACC finds a student guilty, punishment could range from a minimum of a grading sanction (e.g., dropping your final grade by one letter grade) all the way up to suspension or expulsion from the University.

I'll be offering weekly problem solving sessions in which I'll solve problems on the blackboard, show how Forlan is used, and answer questions. They will be on Tuesdays from 6:30-7:45pm in MCS B33.

My office hours will be on Wednesday from 2:30-4:30pm in MCS 122. You can also make an appointment with me at an alternative time, either in person or via Zoom.

We'll be using Piazza for announcements and online
discussion. Visit

You will be submitting solutions to assignments via Gradescope. I will post the entry code on Piazza.

So that you can privately submit Forlan/sml code in a
machine-readable form, you will need to create
a *private* GitHub repository
and grant me (GitHub account: `alleystoughton`) access to
it. If you don't already have a GitHub account, you will need to
create one first.

When preparing solutions to assignments, you may wish to make use of the LaTeX document preparation system. LaTeX handles mathematics very well, and is widely used by academic computer scientists.

Here are some LaTeX macros I've used in writing the book, lecture
slides and problem sets:
`defs.tex` and
`commands.tex`.

- Lecture 1: January 20
- Lecture 2: January 25
- Lecture 3: January 27
- Lecture 4: February 1
- Lecture 5: February 3
- Lecture 6: Febuary 8
- Lecture 7: February 10
- Lecture 8: February 15
- Lecture 9: February 17
- Lecture 10: February 24
- Lecture 11: March 1
- Lecture 12: March 3
- Lecture 13: March 15
- Lecture 14: March 17
- Lecture 15: March 22
- Lecture 16: March 24
- Lecture 17: March 29
- Lecture 18: March 31
- Lecture 19: April 5
- Lecture 20: April 7
- Lecture 21: April 12
- Lecture 22: April 14
- Lecture 23: April 19
- Lecture 24: April 21
- Lecture 25: April 26
- Lecture 26: April 28
- Lecture 27: May 3

- Problem Set 1
- Model Answers to Problem Set 1
- Problem Set 2
(
`ps2-framework.sml`) - Model Answers to Problem Set 2
(
`ps2-explain.sml`,`ps2-testit.sml`) - Problem Set 3
- Model Answers to Problem Set 3
- Problem Set 4
- Model Answers to Problem Set 4
(
`ps4-p1-fa`,`ps4-p1.sml`,`ps4-p2-fa`,`ps4-p2.sml`) - Problem Set 5
- Model Answers to Problem Set 5
(
`ps5-p1-fa`,`ps5-p2.sml`,`ps5-p2-testing.sml`,`ps5-p3.sml`,`ps5-p3-testing.sml`) - Problem Set 6
- Model Answers to Problem Set 6
(
`ps6-p2-gram`,`ps6-p2-testing.sml`,`ps6-p3.sml`,`ps6-p3-testing.sml`) - Problem Set 7
- Model Answers to Problem Set 7
(
`ps7-p3-gram`,`ps7-p4-gen.sml`,`ps7-p4-testing.sml`,`ps7-p4-gram`)

- Final Project or Examination
- Preparing for Final Examination (print)
- Final Examination
- Model Answers to Final Examination

- Problem Set 1
- Model Answers to Problem Set 1
- Problem Set 2
(
`ps2-framework.sml`) - Model Answers to Problem Set 2
(
`ps2-explain.sml`) - Problem Set 3
- Model Answers to Problem Set 3
- Problem Set 4
- Model Answers to Problem Set 4
(
`ps4-dfa`,`ps4-p2.sml`) - Problem Set 5
- Model Answers to Problem Set 5
(
`ps5.sml`,`ps5-test.sml`) - Problem Set 6
- Model Answers to Problem Set 6
- Problem Set 7
- Model Answers to Problem Set 7
(
`ps7-p2-orig-gram.txt`,`ps7-p2-even0s-dfa.txt`,`ps7-p2-find.sml`,`ps7-p2-hand-simp.sml`)

Alley Stoughton (stough@bu.edu)