Getting the hang of polyhedral compilation

The polyhedral model is an optimization and parallelization technique used to speed up the execution of loops. The main idea is to create a mathematical abstraction of a program and use it to exploit the target device’s architecture through the design of sophisticated optimization heuristics. It’s called polyhedral compilation, but modeling through polyhedra is neither required nor sufficient, in fact it is possible to obtain the same optimizations with other techniques.

Case study

Before defining some common concepts, it’s probably better to provide a simple example that shows at least one use case of this technique.

Let’s take the following code as example:

1
2
3
4
for (int i = 0; i < 5; i++)
    A[i] = 2*A[i] + 1;
for (int i = 0; i < 5; i++)
    B[i] = A[5 - i] - i;

In the code above the two loops can be merged into one, becoming:

1
2
3
4
for (int i = 0; i < 5; i++){
    A[i] = 2*A[i] + 1;
    B[5 - i] = A[i] - 5 + i ;
}

The new loop requires less time to execute since it can better exploit the pipeline functionality of the processor and can also be further optimized.

Notice that, on average polyhedral compilation helps the code to obtain a significant speed up, but in some cases can also decrease the performance of the original code.

Definitions

Now we define the objects needed to create the mathematical model on which different operations can be performed to optimize the loop.

Static Control Parts (SCoP)

SCoPs are defined as a set of consecutive statements in loop nests with affine bounds and conditional expressions. To be fair, some restriction needs to be included in this definition, like, no function call in a SCoP, only affine access to memory, etc.; otherwise, we cannot characterize the iteration domain of the loop statically.
We skip some of those details, and we will only use code that respects that definition, besides this is only a post to get the hang of this technique.

Example 1 of SCoP:

1
2
3
4
5
for (int i = 0; i < N; i++){
    for (int j = 0; j < M; j++){
        A[i][j] = i+j;
    }
}

The SCoP of this code is given by only one statement: S1: A[i][j] = i + j .

Example 2 of SCoP:

1
2
3
4
5
6
7
8
9
for (int i = 0; i < N; i++)
    A[i] = 0;

for (int i = 0; i < N; i++){
    A[i] = i;
    for (int j = 0; j < M; j++){
        B[j] = A[i] - j;
    }
}

The SCoP of this code is given by the statements:

S1: A[i] = 0
S2: A[i] = i
S3: B[j] = A[i] - j

Iteration Domain

One of the first mathematical objects needed to describe the polyhedral model is the iteration domain. The iteration domain is a set generated by the statement instances. A statement instance is the execution of a statement exactly once and the set of all the instances can be seen as the ’execution space’ of the loop.

xkcd

Let’s take for example:

1
2
3
for (int i = 0; i < N; i++){
    A[i] = i;
}

The instance for i = 0 of statment S1: A[i] = i is $ S_1(0) = 0 $, for i = 5 we have $ S_1(5) = 5 $ and so on.
What about nested loop? The same:

1
2
3
4
5
for (int i = 0; i < N; i++){
    for(int j = 0; j <N; j++){
        A[i][j] = i + j;
    }
}

In this case for statment S1: A[i][j] = i + j and i = 5 , j = 2 we have

$$ S_1 \left( \begin{smallmatrix} i \\ j \end{smallmatrix} \right) = S_1 \left( \begin{smallmatrix} 5 \\ 2 \end{smallmatrix} \right) = 5 + 2 = 7 $$ where the vector $\vec{i_s} = \left( \begin{smallmatrix} i \\ j \end{smallmatrix} \right)$ is called iteration vector.

Of course, we cannot precompute all the instances since we don’t know how many they are, and in general, we also don’t know the results. That’s why a more formal definition of the iteration domain can be useful: $$ \mathscr{D}_s\left(\vec{p}\right) = \left\{() \rightarrow \vec{i_s} \in \mathbb{Z}^{dim(\vec{i_s})} \middle| \left[D_s\right] \begin{pmatrix} \vec{i_s} \\ \vec{p} \\ 1 \end{pmatrix} \geq \vec{0} \right\} $$ where $D_s \in \mathbb{Z}^{ m_D \times \left( dim(\vec{i_s}) + dim(\vec{p}) + 1 \right) }$ is an integer matrix, $m_D$ is the number of contraints and $\vec{p}$ is the vector of parameters.

For example, the code:

1
2
3
4
5
for (int i = 0; i < N; i++){
    for (int j = 0; j < M; j++){
        A[i][j] = i+j;
    }
}

will have as domain:

$$ \mathscr{D}_s\left(N,M\right) = \left\{() \rightarrow {i \choose j} \in \mathbb{Z}^{2} \middle| \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ -1 & 0 & 1 & 0 & -1 \\ 0 & -1 & 0 & 1 & -1 \end{bmatrix} \begin{pmatrix} i \\ j \\ N \\ M \\ 1 \end{pmatrix} \geq \vec{0} \right\} $$

The numbers in the matrix $D_s$ represent the coefficients of the variables that define the iteration domain. If we compute the matrix-vector multiplication, we get: $$ \begin{pmatrix} i \\ j \\ -i + N - 1\\ -j + M - 1 \\ 1 \end{pmatrix} \geq \vec{0} \implies \begin{pmatrix} i \\ j \\ N - 1\\ M - 1 \\ 1 \end{pmatrix} \geq \begin{pmatrix} 0 \\ 0 \\ i \\ j \\ 1 \end{pmatrix} $$

that is the set of all ‘boundiers’ of our domain.\

Graphically you have:

To show that local variables can be used and an iteration domain can be unions of basic relations, e.g., when a condition in the input code splits the iteration domain into a union of disjoint polyhedra, the next example can be useful.

1
2
3
4
5
6
7
8
for (int i = 0; i < 2 * N; i += 2){
    for(int j = i; j < M; j++){
        if( i == j || i == 0)
            A[i][j] = i + j;
        else
            A[i][j] = -1;
    }
}

$$ \mathscr{D}_{s_1}\left(N,M\right) = \left\{() \rightarrow {i \choose j} \in \mathbb{Z}^{2} \middle| \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ -1 & 0 & 1 & 0 & -1 \\ 0 & -1 & 0 & 1 & -1 \end{bmatrix} \begin{pmatrix} i \\ j \\ N \\ M \\ 1 \end{pmatrix} \geq \vec{0} \right\} $$

$$ \mathscr{D}_{s_2}\left(N,M\right) = \left\{() \rightarrow {i \choose j} \in \mathbb{Z}^{2} \middle| \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ -1 & 0 & 1 & 0 & -1 \\ 0 & -1 & 0 & 1 & -1 \end{bmatrix} \begin{pmatrix} i \\ j \\ N \\ M \\ 1 \end{pmatrix} \geq \vec{0} \right\} $$ In this case the iteration domain is given by: $$ \mathscr{D}_s\left(N,M\right) = \mathscr{D}_2 \left( N,M\right) \bigcup \mathscr{D}_1 \left( N,M\right) $$

Mapping Relations

We just briefly saw the concept of iteration domains, but we can notice that it does not contain any information about the execution order of the statements instances. This is why we need another object to describe it. Usually to scheudle the statements instances for a given statement $S$, $\Theta_s$ is used. For example:

1
2
3
S1: x = a - c
S2: y = d * x
S3: z = f/b

The first two statements cannot be executed ad the same time since $S_2$ needs to wait for $S_1$ to compute x. On the other hand, $S_3$ can be parallelized. The schedule function for the code above would be: $$ \begin{align*} \Theta_{S_1}&=\{() \rightarrow (1)\} \\ \Theta_{S_2}&=\{() \rightarrow (2)\} \\ \Theta_{S_3}&=\{() \rightarrow (1)\} \end{align*} $$

The numbers on the right of the arrow represent the time (or the date) at which the statement can be executed. Those values can have more than one dimension and usually follow a lexicographic order. Like always, a more formal definition is always helpful, so now we have:

$$ \Theta_S\left(\vec{p}\right) = \left\{\vec{i_S} \rightarrow \vec{t_S} \in \mathbb{Z}^{dim(\vec{i_S})} \times \mathbb{Z}^{dim(\vec{t_S})} \middle| \left[T_s\right] \begin{pmatrix} \vec{t_S} \\ \vec{i_S} \\ \vec{p} \\ 1 \end{pmatrix} \geq \vec{0} \right\} $$

where $\vec{i_S}$ is the iteration vector and $T_S \in \mathbb{Z}^{m_{\Theta_S}\times (dim(\vec{i_S}) +\vec{t_S} +\vec{p} + 1)} $ is an integer matrix and $m_{\Theta_S}$ is the number of contraitns. There are multiple functions that can encode the relation $\vec{i_S} \rightarrow \vec{t_S}$. One way is to create an abstract syntax tree of the program in which: each node is a loop, statements are leaves and arcs connect loops with their internal loops and statements. The labels tell the order of loops or statements in the code. Now as usual, let’s see an example:

1
2
3
4
for(int i = 0; i < N; i++){
  S1:  a[i] = 1 << i;
  S2:  b[i] = a[i] + 1;
}

The associate Abstract Tree is: abt

According to the Tree above the scheduling relation is: $$ \Theta_{S_1}\left(N\right) = \left\{(i) \rightarrow \begin{pmatrix} \vec{t}^1_{S_1} \\ \vec{t}^2_{S_1} \\ \vec{t}^3_{S_1} \\ \vec{t}^4_{S_2} \\ \end{pmatrix} \in \mathbb{Z} \times \mathbb{Z}^{4} \middle| \left[ \begin{matrix} -1 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & -1 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & -1 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & -1 & 0 & 0 & 1\\ \end{matrix} \right] \begin{pmatrix} \vec{t}^1_{S_1} \\ \vec{t}^2_{S_1} \\ \vec{t}^3_{S_1} \\ \vec{t}^4_{S_2} \\ i \\ N \\ 1 \end{pmatrix} = \vec{0} \right\} $$

which is $ \Theta_{S_1}\left(N\right) = \begin{pmatrix} 0 \\ i \\ 0 \\ 1 \\ \end{pmatrix} $

Memory Accesses Relations

Our polyhedral model needs yet another key mathematical object to be able to describe the relations between memory access.

$$ \mathcal{A_{S,r}}\left(\vec{p}\right) = \left\{\vec{i_S} \rightarrow \vec{a_{S,r}} \in \mathbb{Z}^{dim(\vec{i_S})} \times \mathbb{Z}^{dim(\vec{a_{S,r}})} \middle| \left[A_{S,r}\right] \begin{pmatrix} \vec{a_{S,r}} \\ \vec{i_S} \\ \vec{p} \\ 1 \end{pmatrix} \geq \vec{0} \right\} $$

where $r$ is the reference number associate to the statement $S$, $\vec{i_S}$ is the iteration vector, $\vec{a_{S,r}}$ is the access vector and $ A_{S,r} \in \mathbb{Z}^{m_{A_{S,r}} \times (dim(\vec{i_S})+dim(\vec{a_{S,r}})+dim(\vec{p})+1)} $ is an integer matrix and $m_{A_{S,r}}$ is the number of constraints.

For example:

1
2
3
for(int i = 0; i < N; i++){
    a[i + 1] = i;
}

generates the following Access relation:

$$ \mathcal{A_{S_1,1}}\left(N\right) = \left\{(i) \rightarrow \vec{a_{S_1,1}} \in \mathbb{Z} \times \mathbb{Z} \middle| \left[ \begin{matrix} -1 & 1 & 0 & 1 \\ \end{matrix} \right] \begin{pmatrix} \vec{a_{S_1,1}} \\ i \\ N \\ 1 \end{pmatrix} = \vec{0} \right\} $$

Conclusion

Hopefully, with this, you should have a rough understanding of what each object is abstracting and how to generate them. All those definitions can be found in Cedric Bastoul Ph.D. thesis. I’ve only added some different examples and left out some details :). In the next post, I’ll try to describe the operations that can be done with those objects.

Case study#

Definitions#

Static Control Parts (SCoP)#

Iteration Domain#

Graphically you have:#

Mapping Relations#

Memory Accesses Relations#

Conclusion#

References#