The Good IR: Other Control Flow Structures

This post is part of a seres:
The Good IR (BasicBlocks and control flow instructions)
The Good IR: Other Control Flow Structures
The Good IR: Instructions and Values
The Good IR: Reporting Semantic Errors via Type Checking
The Good IR: Multiple Returns
—-
In my last post on The Good IR, I had arrived at a representation with two invariants:

  1. Only BasicBlock’s carry information about the ControlFlowGraph; No other edges are allowed between BasicBlocks.
  2. All BasicBlocks end with a control transfer Instruction.

I focused that post solely on the if-then-else control flow structure. I would now like to demonstrate how those invariants translate to the while-loop and if-without-else. Although other languages have do-while, for-loop, and switch-case, I won’t be addressing those now. Instead I’m limiting my focus on a much simpler language (that used by CS241 here at UCI). I want to get a clear picture of how the if-else and while-loop work, before complicating the picture with Return. So, for this post, I’m assuming that we have ‘the usual case’ where the ‘stuff’ inside the body of a loop or branch of an if-then doesn’t proceeds ‘normally’ and doesn’t exit the function.

While Loop
while (cond) {
    // stuff
}

We should note two important aspects of the illustration:

  1. The body of the loop is fully general. That is, I wanted to make clear that the ‘stuff’ in the loop body might have control structures which cause a separation of the first block of the body from the last block. This separation is indicated by a series of small orange blocks.
  2. Not in the illustration is a restriction on the instructions in the loop header. During a parse, when the loop is encountered we must create a new block to store two items: The instructions for evaluating the loop condition and the ConditionalBranch. We do not include instructions for any statement prior to the loop condition, even if such statement appear in the source program. We do this because, we wish to evaluate only the conditional and no other instructions, when control is restored on the loop’s backward control flow edge.

Given these observations. The presence of UnconditionalJump and ConditionalBranch is straightforward, and needs no modification from the last post.

Consider the degenerate case, when the loop body is empty. We could choose to draw the control flow graph omitting the body node. We decide not to do so, because it creates a special case. Instead we prefer that the loop body contains at least one block, even if that block holds only an UncoditionalJump back to the loop header block.

Forcing all loops to have at minimum 3 nodes: the loop-header, the body, and the exit block; gives us a stable footing for control flow optimizations which are implemented later in the compiler. It also gives us another invariant: No block points back to itself, all blocks always point to other blocks.

If-NoElse
if (cond) {
     // stuff
}

The if-noelse is handled similarly. We do not have to create any instructions other than ConditionalBranch and UnconditionalJump. Again, we prefer to keep that ’empty’ else branch. Regularizing the if-then-else to a minimum of 4 blocks: the if-header, then-block, else-block, and join-block. Always creating these blocks definitely helps the control flow graph keep an organized and uniform appearance, especially when you consider the nesting of several different control flow structures. An example from the last post would be quite unmanageable without the presence of these ’empty’ blocks. In the if-then-else control structure, keeping these blocks aids in the placement of SSA φ-instructions, as well as control flow graph traversals.

Design Rule of Thumb: Don’t have the parser optimize away empty blocks, they’re actually useful to keep around (both for SSA and CFG traversals).