# Compiler Construction 2012/2013 Instruction Selection

Peter Thiemann

December 5, 2012

## Optimal vs Optimum Tiling

#### **Optimal Tiling**

No two adjacent tiles can be replaced by a larger tile of lower cost.

#### **Optimum Tiling**

The total cost of the tiling is minimal among all possible tilings.

Tiling is optimum ⇒ tiling is optimal

## Implementation of Optimal Tiling

Maximal Munch Algorithm (Top Down)

```
Temp munchExpr (Tree.Exp e) {
  test patterns from largest to smallest
  choose the first matching pattern
         with instruction INS
  foreach (e i : wildcard (pattern, e))
    recursively invoke temp i = munchExpr (e i)
  emit INS using temp i as arguments
         putting result into new temp 0
 return temp_0
```

## Optimum Tiling



pattern instr tile cost wildcard cost total cost CONST ADDI 1 0 1

## Optimum Tiling

Example (cont'd)

| pattern<br>_ | instr | tile cost | wildcard cost | total cost |
|--------------|-------|-----------|---------------|------------|
| +            | ADD   | 1         | 1+1           | 3          |
| CONST        | ADDI  | 1         | 1             | 2          |
| CONST        | ADDI  | 1         | 1             | 2          |

## **Optimum Tiling**

Example (cont'd)



## Optimum Tiling Emitted Code

ADDI 
$$r_1 \leftarrow r_0 + 1$$
  
LOAD  $r_1 \leftarrow M[r_1 + 2]$ 

## Implementation of Optimum Tiling

Dynamic Programming (Bottom Up)

```
void matchExpr (Tree.Exp e) {
  for (Tree.Exp kid : e.kids())
    matchExpr (kid);
  cost = INFINITY;
  for each pattern P_i
    if (P_i.matches (e)) {
      cost i = cost(P i)
             + sum ((wildcard (P i, e)).mincost);
      if (cost i < cost) {
        cost = cost i; choice = i;
  e.matched = P_{choice}
  e.mincost = cost
```

## Implementation of Optimum Tiling

Collecting the Match (Top Down)

```
Temp emission (Tree.Exp e) {
  foreach (e_i : wildcard (e.matched, e)) {
    temp_i = emission (e_i)
  }
  emit INS using temp_i as arguments
        putting result into new temp_0
  return temp_0
}
```

## Implementation of Pattern Matching

- Additional side conditions (e.g., size of constants, special constants)
- Matching of patterns can be done with a decision tree that avoids checking the same node twice
- The bottom up matcher can remember partial matches and avoid rechecking the same nodes
- ⇒ tree automata

#### Tree Automata

A bottom-up tree automaton is  $\mathcal{M} = (Q, \Sigma, \delta, F)$  where

- Q is a finite set of states
- Σ a ranked alphabet (the tree constructors)
- $\delta \subseteq \bigcup_n \Sigma^{(n)} \times Q^n \times Q$  the transition relation
- $F \subseteq Q$  the set of final states

 $\mathcal{M}$  is deterministic if  $\delta$  is a function.

Define 
$$\Rightarrow$$
 on  $\mathcal{T}_{\Sigma+Q}$  by

$$t[F(q_1,\ldots,q_n)]\Rightarrow t[q_0]$$
 if  $(F,q_1,\ldots,q_n,q_0)\in\delta$ 

$$t \in L(\mathcal{M})$$
 if  $t \Rightarrow^* q$  with  $q \in F$ 

#### Tree automaton for



$$Q = \{q_t, q_c, q_a, q_m\}$$

• 
$$F = \{q_m\}$$

| $\bullet$ $\delta =$ | Σ     | $q_1$          | $q_2$ | q <sub>out</sub> |
|----------------------|-------|----------------|-------|------------------|
|                      | CONST |                |       | $q_c$            |
|                      | TEMP  |                |       | $q_t$            |
|                      | +     | $q_c$          | $q_t$ | $q_a$            |
|                      | MEM   | q <sub>a</sub> |       | $q_m$            |

### Optimum Tiling with Tree Automata

- Generate a bu tree automaton for each pattern
- Simulate them in parallel on expression tree
- At each node
  - determine all patterns whose root matches the current node
  - compute their cost and mark the node with the minimum cost pattern
- There are tools to compile a pattern specification to such an automaton ⇒ BURG (Fraser, Hanson, Proebsting)

#### **Tree Grammars**

- Extension: Different pattern sets leading to different kinds of results
- Some architectures habe different kinds of registers that obey different restrictions
- Set of patterns for each kind of register
- Example: M680x0 distinguishes data and address registers, only the latter may serve for address calculations and indirect addressing
- ⇒ Tree grammar needed

## A <u>context-free tree grammar</u> is defined by $G = (N, \Sigma, P, S)$ where

- N is a finite set of non-terminals
- Σ is a ranked alphabet
- $S \in N$  is the start symbol

• 
$$P \subseteq N \times T_{\Sigma+N}$$

Define  $\Rightarrow$  on  $T_{\Sigma+N}$  by

$$t[A] \Rightarrow t[r]$$
 in  $A \rightarrow r \in P$ 

$$t \in L(\mathcal{G}) \text{ if } S \Rightarrow^* t \in T_{\Sigma}$$

#### **Tree Grammars**

Example: The Schizo-Jouette Architecture (Excerpt)

| Instruction | Effect                      | Pattern                                |
|-------------|-----------------------------|----------------------------------------|
|             |                             | $D \rightarrow +$                      |
| ADD         | $d_i \leftarrow d_j + d_k$  | D D                                    |
|             |                             | $D \rightarrow +$                      |
| ADDI        | $d_i \leftarrow d_j + c$    | D CONST                                |
| MOVEA       | $d_i \leftarrow a_j$        | $	extcolor{D}  ightarrow 	extcolor{A}$ |
| MOVED       | $a_i \leftarrow d_j$        | A 	o D                                 |
|             |                             | $D \rightarrow MEM$                    |
|             |                             |                                        |
|             |                             |                                        |
| LOAD        | $d_i \leftarrow M[a_j + c]$ | A CONST                                |

## Efficiency of Tiling

- N number of nodes in input tree
- T number of patterns
- K average number of labeled nodes in pattern
- K' maximum number of nodes to check for a match
- T' average number of patterns that match at each node
- **Maximal munch.** Each match consumes K nodes: test for matches at N/K nodes. At each candidate node, choose pattern with K' + T' tests.
  - (K' + T')N/K steps on average. Worst case: K = 1.
- **Dynamic programming.** Tests every pattern at every node: (K' + T')N.
- $\Rightarrow$  same linear worst-case complexity. (K' + T')/K is constant, anyway.



#### CISC vs RISC

#### Challenges for Instruction Selection and Register Allocation

| RISC                            | CISC                                         |  |  |
|---------------------------------|----------------------------------------------|--|--|
| 32 registers                    | few registers (16, 8, 6)                     |  |  |
| one class of registers          | different classes with restricted operations |  |  |
| ALU instructions only be-       | ALU operations with mem-                     |  |  |
| tween registers                 | ory operands                                 |  |  |
| three-adress instructions       | two-address instructions                     |  |  |
| $r_1 \leftarrow r_2 \oplus r_3$ | $r_1 \leftarrow r_1 \oplus r_2$              |  |  |
| one addressing mode for         | several addressing modes                     |  |  |
| load/store                      |                                              |  |  |
| every instruction 32 bits long  | different instruction lengths                |  |  |
| one result / instruction        | instructions w/ side effects                 |  |  |
|                                 |                                              |  |  |

## **CISC Examples**

#### Pentium / x86 (32-bit)

- six GPR, sp, bp (+ 8 registers in 64-bit mode)
- multiply / divide only on eax, indexing restricted
- generally two-address instructions

#### MC 680x0 (32-bit)

- 8 data registers, 7 address registers, 2 stack registers
- ALU operations generally on data registers, indirect addressing only through address registers
- generally two-address instructions
- esoteric addressing modes (68020)
- scope entry and exit instructions



### Challenges

- [Few Registers] generate temporaries and rely on register allocation
- [Restricted Registers] generate extra moves and hope that register allocation can get rid of them. Example:
  - Multiply on Pentium requires one operand and destination in eax
  - Most-significant word of result stored to edx

Hence for  $t_1 \leftarrow t_2 \cdot t_3$  generate

```
mov eax, t_2 eax \leftarrow t_2 mul t_3 eax \leftarrow eax \leftarrow eax \cdot t_3; edx \leftarrow MSW(t_2 \cdot t_3) mov t_1, eax t_3 \leftarrow eax
```

## Challenges II

#### [Two-address instructions]

Generate extra move instructions.

For 
$$t_1 \leftarrow t_2 + t_3$$
 generate

mov 
$$t_1, t_2$$
  $t_1 \leftarrow t_2$  add  $t_1, t_3$   $t_1 \leftarrow t_1 + t_3$ ;

#### [Special addressing modes]

Example: memory addressing

```
mov eax,[ebp-8]
add eax, ecx add [ebp-8], ecx
mov [ebp-8], eax
```

#### Two choices:

- Ignore and use separate load and store instructions. Same speed, but an extra register gets trashed.
- 2 Avoid register pressure and use addressing mode. More work for the pattern matcher.

## Challenges III

- [Variable-length instructions]
   No problem for instruction selection or register allocation.
   Assembler deals with it.
- [Instructions with side effects]
   Example: autoincrement after memory fetch (MC 680x0)

$$r_2 \leftarrow M[r_1]; \qquad r_1 \leftarrow r_1 + 4$$

Hard to incorporate in tree-pattern based instruction selection.

- Ignore...
- Ad-hoc solution
- Openition of the property o

## Abstract Assembly Language

Output of Instruction Selection

Class hierarchy for representing instructions



Each instruction specifies a

- set of defined temporaries
- set of used temporaries
- set of branch targets

each of which may be empty

## Abstract Assembly Language

Creating an Operation



Independent of register allocation and jump labels

An operation's def and use set must account for  $\underline{all}$  defined and used registers.

Example: the multiplication instruction on Pentium

- Example: a procedure call trashes many registers (see the calling convention of the architecture)
  - return address
  - return-value register
  - caller-save registers