How Compiler Construction Remains Relevant (part 3)

Previously, in part 2, I concluded that the compilers class should focus on the design patterns and software engineering involved in constructing a compiler. I believe our students would benefit more if the compiler were treated as a case study rather than the core focus. Recently, I ran across a paper[1] (which I’ll be quoting at length) that backs up this claim and reiterates some observations I made in part 1.

First, it’s incredibly motivating for students when you can demonstrate to them that they’ve learned or improved. Direct demonstration is more memorable and compelling than mere allusion, reference, or assertion. Second, learning is a long, drawn-out process without concrete, observable effects. Unlike manufacturing, you can’t point at a physical or environmental change and declare ‘I made that’ or ‘I did that’. Learning modifies your neural pathways, so it is less directly observable. However, we still want to communicate the message that all those nights of studying, and all those hours spent hacking, do pay off.

In the undergraduate compiler design course at the University of Arizona, the 0th programming assignment has the students use lex and yacc to implement, in roughly 1 ½ weeks, a translator from (a subset of) LaTeX to (a subset of) HTML.

At that point, most students know very little about LaTeX, many don’t know a lot about HTML, and none of them know anything about lex and yacc. The goals of the assignment are twofold: first, to get the students acquainted with lex and yacc, in preparation for a more traditional project implementing a compiler for a subset of C; and second, to illustrate the applicability of these tools to other translation problems.

We use discussion sessions and on-line tutorials to give them just enough acquaintance with LaTeX and HTML so that the students know what they are doing. We revisit the problem in classroom discussions at the end of the term, when they are much better versed with these tools (lex and yacc); students often seem quite surprised and pleased to realize that they are now equipped to implement a nontrivial and practically useful piece of software, for a significant fragment of LaTeX, reasonably quickly and without a great deal of effort.
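To give a flavor of that assignment, here is a minimal sketch of such a translator in plain Python rather than lex and yacc, handling a tiny hypothetical command subset of my own choosing (the actual assignment’s subset is not specified here):

```python
import re

# Hypothetical LaTeX subset: \section, \textbf, \textit.
# A lex/yacc version would tokenize and parse instead of
# pattern-substituting, but the translation idea is the same.
RULES = [
    (r"\\section\{([^}]*)\}", r"<h2>\1</h2>"),
    (r"\\textbf\{([^}]*)\}", r"<b>\1</b>"),
    (r"\\textit\{([^}]*)\}", r"<i>\1</i>"),
]

def latex_to_html(src: str) -> str:
    """Translate a (tiny) LaTeX subset to a (tiny) HTML subset."""
    for pattern, repl in RULES:
        src = re.sub(pattern, repl, src)
    return src
```

For example, `latex_to_html(r"\section{Intro} \textbf{bold}")` yields `<h2>Intro</h2> <b>bold</b>`. Regex substitution obviously doesn’t scale to nested structures; that is exactly where a real grammar-driven front end earns its keep.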

Compiler literature commonly breaks the compiler down into a series of phases, each of which performs a transformation between internal representations.

The execution of a compiler conceptually consists of four phases: (i) lexical analysis and parsing; (ii) semantic analysis; (iii) code generation; and (iv) code optimization.

But these phases aren’t unique to compilers! They apply to any translation program, and in the real world, programmers are frequently asked to write software that performs automatic translation between two different representations.
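The four phases can be sketched end-to-end for a toy language. The example below is my own hypothetical illustration, not from the paper: it translates arithmetic over `+` and `*` (left-associative, no precedence) into stack-machine code, with a peephole constant folder standing in for optimization:

```python
import re

def lex(src):                      # (i) lexical analysis
    return re.findall(r"\d+|[+*]", src)

def parse(tokens):                 # (i) parsing: left-assoc, no precedence
    tree = tokens[0]
    for op, rhs in zip(tokens[1::2], tokens[2::2]):
        tree = (op, tree, rhs)
    return tree

def check(tree):                   # (ii) semantic analysis: operands are numbers
    if isinstance(tree, str):
        assert tree.isdigit(), f"bad operand: {tree}"
    else:
        _, left, right = tree
        check(left); check(right)
    return tree

def gen(tree):                     # (iii) code generation: postfix stack code
    if isinstance(tree, str):
        return [("push", int(tree))]
    op, left, right = tree
    return gen(left) + gen(right) + [(op,)]

def optimize(code):                # (iv) peephole: fold push/push/op
    out = []
    for instr in code:
        if instr[0] in "+*" and len(out) >= 2 \
                and out[-1][0] == out[-2][0] == "push":
            b = out.pop()[1]; a = out.pop()[1]
            out.append(("push", a + b if instr[0] == "+" else a * b))
        else:
            out.append(instr)
    return out
```

Running `optimize(gen(check(parse(lex("2+3*4")))))` folds the whole constant expression down to a single `("push", 20)` (20 because the toy parser is left-associative, so it computes `(2+3)*4`). The point is the shape of the pipeline, not the language.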

The actions carried out by dot [a graph drawing program] when processing an input file … are as follows:

  1. Read in the graph specification using context-free parsing techniques.
  2. Check for semantic consistency between components of the specification. For example, edge_op must be ‘->’ for a graph of type digraph and ‘--’ for one of type graph.
  3. Construct an internal representation of the graph specified.
  4. Determine the “output” representation of this graph, i.e., where different nodes and edges will be placed and how they will look.
  5. Modify the output representation to improve its appearance, e.g., by reducing the number of unnecessary edge crossings (where one edge crosses over another) where possible.
  6. Generate the final PostScript for the graph.

It is not difficult to see that there is a close correspondence between this sequence of actions and those of a compiler: step (1) above corresponds to the lexical analysis and parsing phase of the compiler; step (2) to type checking; step (3) to syntax tree construction; step (4) to code generation; step (5) to code optimization; and step (6) to code generation.
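That correspondence can be made concrete with a toy pipeline for a dot-like language. This is a hypothetical sketch of my own, not dot itself, and it skips the appearance-improving step (5); names and the naive row layout are invented for illustration:

```python
import re

def parse_graph(src):              # step 1: parse the specification (crudely)
    kind = re.match(r"\s*(digraph|graph)", src).group(1)
    edges = re.findall(r"(\w+)\s*(->|--)\s*(\w+)", src)
    return kind, edges

def check_graph(kind, edges):      # step 2: semantic consistency (edge op vs type)
    want = "->" if kind == "digraph" else "--"
    for a, op, b in edges:
        assert op == want, f"{op!r} not allowed in a {kind}"
    return [(a, b) for a, op, b in edges]  # step 3: internal representation

def layout(edges):                 # step 4: decide where nodes will be placed
    nodes = sorted({n for e in edges for n in e})
    return {n: (i * 50, 0) for i, n in enumerate(nodes)}  # naive row layout

def emit(edges, pos):              # step 6: generate the final "output code"
    lines = [f"node {n} at {x},{y}" for n, (x, y) in pos.items()]
    lines += [f"edge {a} -> {b}" for a, b in edges]
    return "\n".join(lines)
```

Swap the emitted text for PostScript and add a crossing-reduction pass between `layout` and `emit`, and the outline of dot’s phases, as described above, is all there.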

So compilers, when looked at from the perspective of software engineering, are much more than they seem. Unfortunately (as I observed in part 1), unless we make explicit the lessons of writing clear, maintainable code using design patterns and refactorings, we should not call ourselves educators. Without these ideas and the associated vocabulary, our students are left without the concepts that would make them better, more capable programmers. Truly understanding a thing also implies being able to communicate it and share it with others.

However, students often seem to compartmentalize their knowledge, and thereby find it difficult to apply lessons from compiler design courses to other translation problems unless the underlying similarities between the problems are pointed out explicitly and repeatedly. Once they see the similarities, however, they find that using tools and techniques from compiler design can be very helpful. As an example, a few years ago, while teaching an undergraduate course on Formal Languages and Automata Theory, I asked one of my teaching assistants to write a software package to allow our students to specify various sorts of automata in the form of a text file and then simulate their behavior on input strings. Despite being a bright student and talented programmer, he struggled unsuccessfully with the construction of a front end for over a week, after which I suggested that he use lex and yacc to construct the front end. Once he was able to abstract away from specifications for automata and view this as just another translation problem, he was able to program up the front end in under a day.
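Viewed as just another translation problem, such a front end really is small. Here is a hypothetical Python sketch for one possible DFA text format (the format, field names, and simulator are my own invention, not the package from the anecdote):

```python
def parse_dfa(text):
    """Front end: translate a text specification into a DFA triple.

    Hypothetical format, one item per line:
        start: q0
        accept: q1
        q0 a q1        (transition: state, symbol, next state)
    """
    start, accept, delta = None, set(), {}
    for line in text.strip().splitlines():
        parts = line.split()
        if parts[0] == "start:":
            start = parts[1]
        elif parts[0] == "accept:":
            accept.update(parts[1:])
        else:
            src, sym, dst = parts
            delta[(src, sym)] = dst
    return start, accept, delta

def run_dfa(dfa, s):
    """Back end: simulate the DFA on an input string."""
    state, accept, delta = dfa
    for ch in s:
        state = delta.get((state, ch))
        if state is None:          # no transition: reject
            return False
    return state in accept
```

A lex/yacc (or any grammar-driven) front end buys error reporting and a richer specification syntax, but the structure is the same: translate the specification language into an internal representation, then operate on that representation.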

I’ll leave this discussion with the conclusions reached in the paper[1].

Compiler design courses typically focus narrowly on the translation of high-level programming languages into low level assembly or machine code. Given that the majority of computer science students are unlikely to be involved in compiler design as a day-to-day professional activity, this limits the relevance of such courses to the students’ eventual careers. However, it is possible to generalize the traditional view and consider the problem of translating from a source language to a target language, where both the source and target languages are defined broadly, e.g., need not even be programming languages. Such a generalized view includes many translation problems, e.g., document formatting or graph drawing, that are not traditionally viewed as “compiler problems.” Viewing such translation problems in this way allows us to identify and understand essential underlying commonalities of the translation process.

This has several benefits, among them that the use of tools such as lex and yacc to generate the front end of a translator reduces development time, and that by relying on well-understood techniques and avoiding ad hoc approaches to the lexical analysis and parsing problems, reliability is enhanced. Overall, therefore, students benefit from having a deeper understanding of a variety of translation problems; being able to apply techniques and tools developed for compilers to other translation problems; and thereby being able to produce better code more quickly.

Compiler construction courses are still relevant, not because of the compiler itself, but because of the software engineering lessons.


[1]
@inproceedings{debray2002making,
  title        = {Making compiler design relevant for students who will (most likely) never design a compiler},
  author       = {Debray, S.},
  booktitle    = {ACM SIGCSE Bulletin},
  volume       = {34},
  number       = {1},
  pages        = {341--345},
  year         = {2002},
  organization = {ACM}
}