TY - GEN
T1 - Towards a source level compiler
T2 - 2006 International Conference on Parallel Processing Workshops, ICPP 2006
AU - Ben-Asher, Yosi
AU - Meisler, Danny
PY - 2006
Y1 - 2006
N2 - Modulo scheduling is a major optimization of high-performance compilers wherein the body of a loop is replaced by an overlapping of instructions from different iterations. Hence the compiler can schedule more instructions in parallel than in the original loop. Modulo scheduling, being a scheduling optimization, is a typical backend optimization relying on a detailed description of the underlying CPU and its instructions to produce a good schedule. This work considers the problem of applying modulo scheduling at source level as a loop transformation, using only general information about the underlying CPU architecture. By doing so it is possible to: a) create a more retargetable compiler, as modulo scheduling is now applied at source level; b) study possible interactions between modulo scheduling and common loop transformations; c) obtain a source level optimizer whose output is readable to the programmer, yet whose final output can be efficiently compiled by a relatively "simple" compiler. Experimental results show that source level modulo scheduling can improve performance even when low level modulo scheduling is applied by the final compiler, indicating that high level modulo scheduling and low level modulo scheduling can co-exist to improve performance. An algorithm for source level modulo scheduling that modifies the abstract syntax tree of a program is presented. This algorithm has been implemented in an automatic parallelizer (Tiny). Preliminary experiments also yield runtime and power improvements on the ARM CPU for embedded systems.
AB - Modulo scheduling is a major optimization of high-performance compilers wherein the body of a loop is replaced by an overlapping of instructions from different iterations. Hence the compiler can schedule more instructions in parallel than in the original loop. Modulo scheduling, being a scheduling optimization, is a typical backend optimization relying on a detailed description of the underlying CPU and its instructions to produce a good schedule. This work considers the problem of applying modulo scheduling at source level as a loop transformation, using only general information about the underlying CPU architecture. By doing so it is possible to: a) create a more retargetable compiler, as modulo scheduling is now applied at source level; b) study possible interactions between modulo scheduling and common loop transformations; c) obtain a source level optimizer whose output is readable to the programmer, yet whose final output can be efficiently compiled by a relatively "simple" compiler. Experimental results show that source level modulo scheduling can improve performance even when low level modulo scheduling is applied by the final compiler, indicating that high level modulo scheduling and low level modulo scheduling can co-exist to improve performance. An algorithm for source level modulo scheduling that modifies the abstract syntax tree of a program is presented. This algorithm has been implemented in an automatic parallelizer (Tiny). Preliminary experiments also yield runtime and power improvements on the ARM CPU for embedded systems.
UR - http://www.scopus.com/inward/record.url?scp=34547272642&partnerID=8YFLogxK
U2 - 10.1109/ICPPW.2006.74
DO - 10.1109/ICPPW.2006.74
M3 - Conference contribution
AN - SCOPUS:34547272642
SN - 0769526373
SN - 9780769526379
T3 - Proceedings of the International Conference on Parallel Processing Workshops
SP - 298
EP - 305
BT - Proceedings of the 2006 International Conference on Parallel Processing Workshops, ICPP 2006
Y2 - 14 August 2006 through 18 August 2006
ER -