Proceedings of ECT2012 - 7 Adjoining Hybrid Parallel Code




Paper 7

Adjoining Hybrid Parallel Code

M. Schanen, M. Foerster, J. Lotz, K. Leppkes and U. Naumann
LuFG Informatik 12, Software and Tools for Computational Engineering, RWTH Aachen University, Germany

©2012 Civil-Comp Ltd

Keywords: algorithmic differentiation, adjoint MPI, adjoint OpenMP.

Numerical simulation software is generally run on multi-core parallel architectures. This trend implies hybrid parallelization schemes consisting of both distributed and shared-memory programming models. The de facto standard for distributed memory is the message passing interface (MPI) [1]. MPI is used to decompose the workload into large chunks, which are distributed onto compute nodes. Each node in turn consists of several cores that access the same memory locations through a common physical memory. Hence, we assume that the core numerical problem, called the kernel, is distributed among the nodes through MPI. On each node the kernel is assumed to use OpenMP [2] for shared-memory parallelization.

Numerical simulation and optimization typically rely on robust and efficient derivative information [3]. The authors prefer the adjoint model, which results from the associativity of the chain rule. Algorithmic differentiation (AD) applies this model semi-automatically by transforming a given original code into a derivative code that computes derivatives in addition to the function values. Thus, a potentially tedious implementation of the derivative code by hand is avoided.

No existing AD tool is able to generate the derivative code of a hybrid parallel implementation automatically. In this paper this is achieved by using both categories of AD tools (source transformation and overloading) to implement the adjoint derivative model. At runtime, crucial information for adjoining OpenMP pragmas is missing. Therefore only a source transformation tool (e.g. a compiler) that parses these pragmas is able to adjoin OpenMP code. Moreover, parsing the entire code with an AD tool is a difficult task that no tool has ever completely achieved, or even striven for, since the additional effort far outweighs the benefits. As MPI mostly resides on a higher layer of an application, this is particularly true for adjoint MPI. Hence an overloading AD tool is used for adjoining MPI.

To motivate and illustrate the authors' approach, a distributed dense matrix multiplication based on the Cannon algorithm [4] is implemented. It serves as an emulation of large-scale simulation codes, covering both the distribution of the input problem using MPI and the local computation of a kernel using OpenMP.

References

[1] W. Gropp, E. Lusk, A. Skjellum, "Using MPI: Portable Parallel Programming with the Message Passing Interface", MIT Press, 1994.
[2] OpenMP Architecture Review Board, "OpenMP Application Program Interface", Specification, 2008.
[3] A. Griewank, A. Walther, "Evaluating Derivatives. Principles and Techniques of Algorithmic Differentiation", 2nd Edition, SIAM, Philadelphia, 2008.
[4] L.E. Cannon, "A Cellular Computer to implement the Kalman Filter Algorithm", 1969.
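
As an illustration of the adjoint model for the dense matrix multiplication kernel mentioned in the abstract, the following is a minimal serial sketch in Python. It is not the authors' code and does not use MPI, OpenMP, or any AD tool: the adjoint is written by hand, and the names `matmul`, `matmul_adjoint`, `objective` and the sample matrices are hypothetical, chosen only for this example. For c = a * b with incoming adjoints c_bar, the adjoints of the inputs are a_bar = c_bar * b^T and b_bar = a^T * c_bar.

```python
def matmul(a, b):
    """Primal kernel: c = a * b for dense matrices stored as lists of rows."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def matmul_adjoint(a, b, c_bar):
    """Hand-written adjoint of c = a * b.

    Given the adjoints c_bar of the output, return (a_bar, b_bar) with
    a_bar = c_bar * b^T and b_bar = a^T * c_bar.
    """
    n, m, p = len(a), len(b), len(b[0])
    a_bar = [[0.0] * m for _ in range(n)]
    b_bar = [[0.0] * p for _ in range(m)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                # Adjoint of the primal statement c[i][j] += a[i][k] * b[k][j]:
                # each input that was read now receives an increment.
                a_bar[i][k] += c_bar[i][j] * b[k][j]
                b_bar[k][j] += a[i][k] * c_bar[i][j]
    return a_bar, b_bar


def objective(a, b):
    """Hypothetical scalar output used for checking: sum of all entries of a * b."""
    return sum(sum(row) for row in matmul(a, b))


# Sample data (assumed for illustration only).
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.5, -1.0], [2.0, 0.0]]

# Seeding c_bar with ones yields the gradient of objective(a, b).
C_BAR = [[1.0, 1.0], [1.0, 1.0]]
A_BAR, B_BAR = matmul_adjoint(A, B, C_BAR)
```

With this seed, `A_BAR` is `[[-0.5, 2.0], [-0.5, 2.0]]` and `B_BAR` is `[[4.0, 4.0], [6.0, 6.0]]`, which can be checked against finite differences of `objective`. Note that entries of `b` that are only read in the primal loop become incremented entries of `b_bar` in the adjoint loop; in a shared-memory setting such increments would need protection against concurrent updates, which is one reason the parallel pragmas have to be taken into account when adjoining.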
