Boston University — CS 516 — Software Foundations via Formal Languages (Spring 2022)


Instructor Alley Stoughton
Email stough@bu.edu
Personal Website alleystoughton.us
Course Website alleystoughton.us/cs516
Class Sessions Tuesday/Thursday 12:30-1:45pm MCS B33
Problem Solving Sessions Tuesday 6:30pm-7:45pm MCS B33
Office Hours Wednesday 2:30-4:30pm MCS 122 and by appointment
Course Piazza piazza.com/bu/spring2022/cs516

Thinking about Taking this Course?

You might want to take this course if:

Please email me (stough@bu.edu) if you have questions or for further information.

Overview

Since the 1930s, the subject of formal language theory, also known as automata theory, has been developed by computer scientists, linguists and mathematicians. Formal languages (or simply languages) are sets of strings over finite sets of symbols, called alphabets, and various ways of describing such languages have been developed and studied, including regular expressions (which "generate" languages), finite automata (which "accept" languages), grammars (which "generate" languages) and Turing machines (which "accept" languages). For example, the set of identifiers of a given programming language is a formal language — one that can be described by a regular expression or a finite automaton. And, the set of all strings of tokens that are generated by a programming language's grammar is another example of a formal language.

Formal language theory has applications to many areas of computer science. Perhaps the best known applications are to compiler construction. For example, regular expressions and finite automata are used when specifying and implementing lexical analyzers, and grammars are used to specify and implement parsers. Finite automata are used when designing hardware and network protocols. And Turing machines — or other machines/programs of equivalent power — are used to formalize the notion of algorithm, which in turn makes possible the study of what is, and is not, computable.

Formal language theory is traditionally taught as a paper-and-pencil mathematics course, even though it is largely concerned with algorithms on the subject's formalisms: regular expressions, automata, grammars and Turing machines. In contrast, this course approaches the subject in a way that balances proof with experimentation, carried out using the instructor's Forlan toolset, which implements the algorithms of formal language theory. Forlan is embedded in the functional programming language Standard ML (SML), a language whose notation and concepts are similar to those of mathematics. SML is strongly typed and interactive, properties that make sophisticated experimentation robust and enjoyable.

Here is an example deterministic finite automaton that was synthesized using Forlan:

Example Deterministic Finite Automaton

Here is a description of what the automaton does, i.e., what language it accepts.

The course features a rigorous approach to carrying out mathematical proofs, focusing on basic set theory, definitional techniques including various forms of recursion, and proof techniques including different kinds of induction.

Formal prerequisites: CAS CS 320 (Concepts of Programming Languages) and CAS CS 330 (Introduction to Analysis of Algorithms); or consent of instructor.

Informal prerequisites: you should have enough familiarity with the concepts of programming languages so that learning basic functional programming won't be too much of a stretch; and you should have reached an intermediate level of mathematical maturity, and be competent using proof techniques like mathematical induction, and proof by contradiction.

Resources

Textbook

We will be using the Spring 2022 draft of my (freely available!) textbook on formal language theory, entitled Formal Language Theory: Integrating Experimentation and Proof.

You may find it useful to also study the related sections of another book on formal language (automata) theory. Note, however, that the notation and definitions used by any particular book won't be identical to the ones we'll be using.

Forlan Toolset

Many of the algorithms that we will study are implemented as part of the Forlan toolset — a collection of tools for experimenting with formal languages that I've been developing since 2001. Forlan is implemented as a set of Standard ML (a functional programming language) modules. It's used interactively. In fact, a Forlan session is nothing more than an SML session in which the Forlan modules are available.

As the course progresses, you'll learn how to use Forlan, and you'll be using it when solving some of the problem sets. The textbook contains a brief introduction to SML, but more information can be obtained from L. C. Paulson's ML for the Working Programmer, second edition, Cambridge University Press, 1966. It is available online, for personal use. If you like learning from examples, here's a brief tutorial on SML.

The best way to run Forlan or SML is as a sub-process of the Emacs text editor, using the SML mode for Emacs. We'll also be using JForlan, a Java program for creating and editing Forlan automata and trees.

See the Forlan website for instructions for installing Forlan on personal computers. Forlan is also installed on the CS Department's Linux servers csa1.bu.edu, csa2.bu.edu and csa3.bu.edu. When connecting to one of these servers using the secure shell (ssh) command on Linux or macOS, you must use the -Y option, so as to enable "trusted X11 forwarding". Otherwise, you won't be able to use JForlan. I.e., run one of the commands

  ssh -Y csa1.bu.edu
  ssh -Y csa2.bu.edu
  ssh -Y csa3.bu.edu

See the file /home/fac3/stough/CS516-README for how to modify your shell's PATH variable to add sml, forlan and jforlan to your shell's search path.

Assessment

There will be seven problem sets, each worth 100 points, plus a course project or final exam (students will be asked to choose one or the other), worth 200 points.

Each assessment unit will be graded using the following 100 point scale: A+ (100), A (92), A- (84), B+ (76), B (68), B- (60), C+ (52), C (44), C- (36), D+ (28), D (20), D- (12), F (0). (E.g., a grade of 80 points should be thought of as being halfway between a B+ and an A-.)

Late work will be assessed a penalty of 20% during the first twenty-four hours. Work that is more than twenty-four hours late will not be graded.

The elegance and simplicity of your work will be taken into account when it is assessed. I reserve the right to award extra credit to students who find errors in, or suggest improvements to, course materials.

A student's grade for the course will be the letter (possibly followed by a + or -) whose value is nearest to the weighted average of the grades of the student's problem sets and course project/final exam. Ties will be resolved by selecting the better of the two grades.

When working on a problem set, you may optionally work with a single other student — i.e., work in a pair. If you opt to do this, the two of you will submit a single solution to the problem set, with both of your names on it. You will then receive a joint grade for the problem set. If there is a part of the problem set solution that only one of you understands, you must note that discrepancy in your submission. In that case, you and your partner's grades may be adjusted to reflect that difference in understanding.

Apart from pair work on problem sets, the work you hand in must be your own. You may discuss problem sets with others in general terms, but you may not base your work on other people's work, and you may not show your draft work to others.

If you use resources other than course materials when working on problem sets or the course project (if you opt for a project instead of a final exam), you must cite your sources.

Academic Integrity

You are responsible for reading and understanding BU's Academic Conduct Code (for undergraduates)

www.bu.edu/academics/policies/academic-conduct-code

or the GRS Academic and Professional Conduct Code (for graduate students)

www.bu.edu/cas/files/2017/02/GRS-Academic-Conduct-Code-Final.pdf

Incidents of academic misconduct will be reported to the Academic Conduct Committee (ACC). If the ACC finds a student guilty, punishment could range from a minimum of a grading sanction (e.g., dropping your final grade by one letter grade) all the way up to suspension or expulsion from the University.

Problem Solving Sessions, Office Hours, Piazza, Gradescope and GitHub

I'll be offering weekly problem solving sessions in which I'll solve problems on the blackboard, show how Forlan is used, and answer questions. They will be on Tuesdays from 6:30-7:45pm in MCS B33.

My office hours will be on Wednesday from 2:30-4:30pm in MCS 122. You can also make an appointment with me at an alternative time, either in person or via Zoom.

We'll be using Piazza for announcements and online discussion. Visit piazza.com/bu/spring2022/cs516 to join in.

You will be submitting solutions to assignments via Gradescope. I will post the entry code on Piazza.

So that you can privately submit Forlan/sml code in a machine-readable form, you will need to create a private GitHub repository and grant me (GitHub account: alleystoughton) access to it. If you don't already have a GitHub account, you will need to create one first.

LaTeX Document Preparation System

When preparing solutions to assignments, you may wish to make use of the LaTeX document preparation system. LaTeX handles mathematics very well, and is widely used by academic computer scientists.

Here are some LaTeX macros I've used in writing the book, lecture slides and problem sets: defs.tex and commands.tex.

Class Sessions

Problem Sets

Final Project or Examination

Old Problem Sets

Old Final Exam


Alley Stoughton (stough@bu.edu)