APOLLO - Automatic speculative POLyhedral Loop Optimizer - is a compiler framework dedicated to the automatic, dynamic, and speculative parallelization and optimization of programs' loop nests. It is developed at Inria in the CAMUS team and at the University of Strasbourg in the CNRS laboratory ICube, France. The framework allows a user to mark, in C/C++ source code, nested loops of any kind (for, while, or do-while loops) so that they are handled by a speculative parallelization process, in order to take advantage of the underlying multi-core processor architecture. The framework is composed of two main parts. First, extensions to the CLANG-LLVM compiler prepare the program to be handled by the runtime system. This phase generates several versions of each target loop nest, as well as several code snippets called "code bones":
- an instrumented version, in which every memory instruction, as well as some scalar updates, is coupled with instructions that collect, at runtime, the referenced memory addresses or the values assigned to the scalars;
- an original version, which corresponds to the sequential execution of the target code;
- several code bones, which are parametrized code snippets in the LLVM intermediate representation; they are instantiated and assembled at runtime to generate optimized and parallelized code as soon as an efficient transformation of the original code has been decided. These code bones are dedicated to the verification of the speculation, to the computations of the original code, or to both.
During execution, the runtime system orchestrates the launch of the different code versions through a mechanism of sliding windows (chunks). Each target loop nest is run as a sequence of slices of the outermost loop, and each slice is run using one of the versions generated at compile time. The scenario is the following:
- at start-up, the runtime system launches a first slice using the instrumented version. This slice is limited to about ten iterations of the outermost loop, in order to limit the time overhead due to sequential execution and instrumentation.
- using the memory addresses and scalar values that have been collected, the runtime system builds, when possible, linear interpolation or regression functions for each instrumented instruction. These functions form a "prediction model".
- the prediction model is then used to select an optimizing and parallelizing transformation for each target loop nest. This transformation is speculative, since it is based on the analysis of a small slice of each target loop nest. The optimizing transformation, which may combine loop transformations such as interchange, shifting, skewing, tiling, fusion, or fission, is applied by instantiating and assembling the code bones. The LLVM just-in-time compiler is then invoked to generate the final executable code.
- the resulting parallel code is launched on a chunk of significant size. The instructions devoted to the control of the speculation validate the execution. In case of invalidation, the chunk is re-executed using the original sequential version, and an instrumented version is then re-launched, in order to detect the change of behavior and to enable a new parallelization of the code.
If you are interested in APOLLO, we strongly encourage you to sign up for the APOLLO Mailing List to receive news, launch discussions, share experiences, and report bugs.
The APOLLO project welcomes contributors.