How Compiler Construction Remains Relevant (part 1)

Many students struggle with the compiler construction class here at UCI. I’d like to change that. Let me first lay some groundwork by sharing a few observations I’ve made about the general lack of software engineering skills.

Our students don’t have enough coding practice. Specifically, they lack the discipline, guidelines, rules of thumb, and principles we all use to manage larger code bases. What’s particularly discouraging is that the compiler project isn’t even that large when compared to a real-world system. We’re talking ~2K lines compared to ~200K lines. That’s two orders of magnitude smaller than what they will encounter should they get a job as a software developer.

There are several factors contributing to this outcome. One is that, throughout their undergraduate career in CS, our students do assignments that are due weekly or bi-weekly. These assignments are small and highly specified (to cut down on code variation and make grading easier). By coding assignments that are ~300 lines long and completely independent of each other, our students never find out how important it is to write clear, maintainable code. The code they write never grows large enough to be an issue, and they don’t have to work on it after the due date. They never encounter that familiar pain we all feel when we uncover a new fact that invalidates an assumption made much earlier in the development cycle, requiring a huge rewrite.

We also don’t have mandatory lab attendance after the introductory courses. Some of the really bright students (who use their own initiative to read more about what is only alluded to in lecture) would hate for us to make lab attendance mandatory. Still, I think lab is the best time and place to talk about things like code cleanliness, organization, clear naming conventions, design patterns, refactorings, and all the other techniques of the programming trade.

It’s not that these things are entirely missing; it’s that they aren’t discussed explicitly, by name, during lecture. If you examine the code framework and suggested code layout for the assignments in the beginning data structures class, you can see the visitor pattern, the strategy pattern, and even some mock object testing. We’re trying to teach by example, hoping our students will pick it up by osmosis, and it’s not working.

Even though our students take compilers as one of their later classes, and we as educators assume knowledge of Java and OO design, our students are not equipped for success. For example, I got submissions where students were using Strings when something like Symbol would be much better, and where String was used as the type for all fields of the Symbol class. One of my fellow TAs even got a submission for a parser that had huge chains of if-else logic, just sufficient to handle every test case we handed out, rather than following the recursive-descent pattern that was given in discussion!
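To make the contrast concrete, here is a minimal sketch of the recursive-descent shape, with an enum standing in for String-typed fields. Everything in it is an assumption made for illustration: the toy grammar (integers, '+', '*', parentheses) and the names Kind, Token, and ExprParser are hypothetical and are not taken from the course framework.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical token kinds: an enum instead of a String, so typos fail at compile time. */
enum Kind { NUMBER, PLUS, STAR, LPAREN, RPAREN, EOF }

/** A typed token; value is only meaningful when kind == NUMBER. */
final class Token {
    final Kind kind;
    final int value;
    Token(Kind kind, int value) { this.kind = kind; this.value = value; }
}

/** One method per grammar rule; the call structure mirrors the grammar. */
final class ExprParser {
    private final List<Token> tokens;
    private int pos = 0;

    private ExprParser(List<Token> tokens) { this.tokens = tokens; }

    /** Parses and evaluates the whole input, rejecting trailing tokens. */
    static int evaluate(String source) {
        ExprParser parser = new ExprParser(tokenize(source));
        int result = parser.expr();
        parser.expect(Kind.EOF);
        return result;
    }

    // expr := term ('+' term)*
    private int expr() {
        int value = term();
        while (peek().kind == Kind.PLUS) { pos++; value += term(); }
        return value;
    }

    // term := factor ('*' factor)*
    private int term() {
        int value = factor();
        while (peek().kind == Kind.STAR) { pos++; value *= factor(); }
        return value;
    }

    // factor := NUMBER | '(' expr ')'
    private int factor() {
        Token t = peek();
        if (t.kind == Kind.NUMBER) { pos++; return t.value; }
        expect(Kind.LPAREN);
        int value = expr();
        expect(Kind.RPAREN);
        return value;
    }

    private Token peek() { return tokens.get(pos); }

    /** Consumes the current token if it has the given kind, otherwise fails. */
    private void expect(Kind kind) {
        if (peek().kind != kind) {
            throw new IllegalArgumentException("expected " + kind + " but found " + peek().kind);
        }
        pos++;
    }

    /** Splits the input into typed tokens and appends an EOF marker. */
    private static List<Token> tokenize(String s) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c >= '0' && c <= '9') {
                int n = 0;
                while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
                    n = n * 10 + (s.charAt(i) - '0');
                    i++;
                }
                out.add(new Token(Kind.NUMBER, n));
            } else {
                Kind kind;
                switch (c) {
                    case '+': kind = Kind.PLUS; break;
                    case '*': kind = Kind.STAR; break;
                    case '(': kind = Kind.LPAREN; break;
                    case ')': kind = Kind.RPAREN; break;
                    default: throw new IllegalArgumentException("unexpected character: " + c);
                }
                out.add(new Token(kind, 0));
                i++;
            }
        }
        out.add(new Token(Kind.EOF, 0));
        return out;
    }
}
```

With this sketch, ExprParser.evaluate("1 + 2 * 3") returns 7, and precedence falls out of the rule nesting rather than a special case per input. Adding a new operator means adding or extending one rule method, which is the property the if-else submission lacked.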

Repetition is how we learn. When we repeatedly perform a task in a sub-optimal manner, we learn that pattern so thoroughly that it holds back further development. After repeatedly coding in an unorganized manner, barely adequate for small assignments, our students have internalized dirty habits that are now so ingrained that only a complete unlearning could repair the damage. By allowing this behavior to occur, we have set up our students for failure (both in the compiler class and as software developers). As evidence of this, the bad design decisions and poor coding practices from the first assignment (parsing) readily take their toll once students have to rely on their own code in later components (code generation). This has earned the compiler class a reputation for being a tremendous amount of work. But with good code discipline, it doesn’t have to be this way!

Although it may already be too late, I’ve identified some things that will help us do better as educators:

  1. Mention design patterns explicitly, and repeatedly.
  2. Show design trade-offs with concrete examples that give an explicit idea of when to choose one approach over another.
  3. Consider requiring a software engineering class that gives students practice with larger code bases before allowing enrollment in compilers.

I’d really like for these lessons to be applied much earlier, so that students carry a much larger toolbox by the time they arrive on the doorstep of the compilers class.