
©2012 Civil-Comp Ltd
M. Schanen, M. Foerster, J. Lotz, K. Leppkes and U. Naumann
LuFG Informatik 12, Software and Tools for Computational Engineering, RWTH Aachen University, Germany
Keywords: algorithmic differentiation, adjoint MPI, adjoint OpenMP.
Numerical simulation software is generally run on multi-core parallel
architectures. This trend implies hybrid parallelization schemes that combine
distributed and shared-memory programming models. The de facto standard for
distributed memory is the Message Passing Interface (MPI) [1].
MPI is used to decompose the workload into large chunks which are distributed
across compute nodes. Each node in turn consists of several cores that
share a common physical memory. Hence, we assume
that the core numerical problem, called the kernel, is distributed among the
nodes through MPI, and that on each node the kernel uses OpenMP
[2] for shared-memory parallelization.
Numerical simulation and optimization typically rely on robust and efficient
derivative information [3]. The authors prefer the adjoint model
resulting from the associativity of the chain rule. Algorithmic Differentiation
(AD) applies this model semi-automatically by transforming a given original code
into a derivative code that computes derivatives in addition to the original
values. Thus, a potentially tedious implementation of the derivative code by
hand is avoided.
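To make the notion of a derivative code concrete, the following is a minimal hand-written C++ sketch of an adjoint for a small kernel. The kernel `y = sum_i sin(a * x[i])` and the names `kernel` and `kernel_adjoint` are illustrative assumptions and are not taken from the paper; an AD tool would generate comparable code, not necessarily in this form.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Original (primal) kernel, chosen for illustration: y = sum_i sin(a * x[i]).
double kernel(double a, const std::vector<double>& x) {
  double y = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) y += std::sin(a * x[i]);
  return y;
}

// Hand-written adjoint of the kernel. Given y_bar (the adjoint of the output),
// it increments a_bar and x_bar (the adjoints of the inputs).
// The loop runs in reverse to mirror the reversal of the data flow.
void kernel_adjoint(double a, const std::vector<double>& x, double y_bar,
                    double& a_bar, std::vector<double>& x_bar) {
  assert(x_bar.size() == x.size());
  for (std::size_t i = x.size(); i-- > 0;) {
    const double c = std::cos(a * x[i]) * y_bar;  // adjoint of sin(a * x[i])
    x_bar[i] += a * c;
    // The read of the shared value a in the primal becomes an increment of
    // a_bar here; in a shared-memory parallel version this update would need
    // a reduction or an atomic operation.
    a_bar += x[i] * c;
  }
}
```

A single call to `kernel_adjoint` with `y_bar = 1` and zero-initialized `a_bar` and `x_bar` yields the gradient with respect to `a` and every `x[i]`, independent of the number of inputs.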
No existing AD tool is able to generate the derivative code of a hybrid parallel
implementation automatically. In this paper this is achieved by combining both
categories of AD tools (source transformation and overloading) to implement the
adjoint derivative model. At runtime, crucial information for adjoining OpenMP
pragmas is missing. Therefore only a source transformation tool (e.g. a
compiler) that parses these pragmas is able to adjoin OpenMP code. However,
parsing the entire code with an AD tool is a difficult task that no tool has
ever completely achieved or even attempted, since the additional effort far
outweighs the benefits. As MPI resides mostly on a higher layer of an
application, this is particularly true for adjoint MPI. Hence an overloading AD
tool is used for adjoining MPI.
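The overloading approach can be sketched as follows. This is a minimal tape-based example in C++ using hypothetical names (`Tape`, `Active`, `make_input`, `interpret`) that do not correspond to the authors' tool. Overloaded operators record local partial derivatives at runtime, and a reverse interpretation of the tape propagates adjoints. The tape only sees the arithmetic operations that were executed, which illustrates why pragma information is not available to such a tool.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One tape entry per recorded operation: up to two arguments and the local
// partial derivatives of the result with respect to them.
struct TapeEntry {
  std::size_t arg[2];
  double partial[2];
};

struct Tape {
  std::vector<TapeEntry> entries;
  // Record an operation and return the tape index of its result.
  std::size_t push(std::size_t a0, double p0, std::size_t a1, double p1) {
    entries.push_back({{a0, a1}, {p0, p1}});
    return entries.size() - 1;
  }
};

// Active variable: a value plus its position on the tape.
struct Active {
  Tape* tape;
  std::size_t index;
  double value;
};

// Register an independent input (an entry with zero partials).
Active make_input(Tape& t, double v) { return {&t, t.push(0, 0.0, 0, 0.0), v}; }

Active operator+(const Active& a, const Active& b) {
  return {a.tape, a.tape->push(a.index, 1.0, b.index, 1.0), a.value + b.value};
}

Active operator*(const Active& a, const Active& b) {
  return {a.tape, a.tape->push(a.index, b.value, b.index, a.value),
          a.value * b.value};
}

Active sin(const Active& a) {
  return {a.tape, a.tape->push(a.index, std::cos(a.value), 0, 0.0),
          std::sin(a.value)};
}

// Reverse sweep: seed the adjoint of y with 1 and propagate adjoints from the
// last tape entry back to the first.
std::vector<double> interpret(const Tape& t, const Active& y) {
  assert(y.index < t.entries.size());
  std::vector<double> adj(t.entries.size(), 0.0);
  adj[y.index] = 1.0;
  for (std::size_t i = t.entries.size(); i-- > 0;) {
    const TapeEntry& e = t.entries[i];
    adj[e.arg[0]] += e.partial[0] * adj[i];
    adj[e.arg[1]] += e.partial[1] * adj[i];
  }
  return adj;
}
```

For example, after recording `Active y = sin(x1 * x2) + x1;` for two inputs created with `make_input`, the vector returned by `interpret(tape, y)` holds the partial derivatives of `y` at positions `x1.index` and `x2.index`.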
To motivate and illustrate the authors' approach, a distributed dense
matrix multiplication based on the Cannon algorithm [4] is implemented.
It serves as an emulation of large-scale simulation codes, covering both the
distribution of the input problem using MPI and the local computation of a
kernel using OpenMP.
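The data movement of Cannon's algorithm can be emulated serially, as in the sketch below. It assumes a q x q process grid held in plain arrays, with array copies standing in for MPI communication and a triple loop standing in for the node-local kernel that would be parallelized with OpenMP. The function names are hypothetical and this is not the authors' implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // dense, row-major

// Reference product C = A * B for n x n matrices, used to check the emulation.
Matrix multiply_naive(const Matrix& a, const Matrix& b, std::size_t n) {
  Matrix c(n * n, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t k = 0; k < n; ++k)
      for (std::size_t j = 0; j < n; ++j)
        c[i * n + j] += a[i * n + k] * b[k * n + j];
  return c;
}

// Copy block (bi, bj) of size bs x bs out of the n x n matrix m.
Matrix get_block(const Matrix& m, std::size_t n, std::size_t bs,
                 std::size_t bi, std::size_t bj) {
  Matrix blk(bs * bs);
  for (std::size_t i = 0; i < bs; ++i)
    for (std::size_t j = 0; j < bs; ++j)
      blk[i * bs + j] = m[(bi * bs + i) * n + (bj * bs + j)];
  return blk;
}

// Serial emulation of Cannon's algorithm on a q x q grid of "processes".
Matrix multiply_cannon(const Matrix& a, const Matrix& b, std::size_t n,
                       std::size_t q) {
  assert(q > 0 && n % q == 0);
  const std::size_t bs = n / q;  // block size per process
  std::vector<Matrix> a_loc(q * q), b_loc(q * q);
  // Initial skew: block row pi of A is shifted left by pi positions,
  // block column pj of B is shifted up by pj positions.
  for (std::size_t pi = 0; pi < q; ++pi)
    for (std::size_t pj = 0; pj < q; ++pj) {
      const std::size_t k = (pi + pj) % q;
      a_loc[pi * q + pj] = get_block(a, n, bs, pi, k);
      b_loc[pi * q + pj] = get_block(b, n, bs, k, pj);
    }
  Matrix c(n * n, 0.0);
  for (std::size_t step = 0; step < q; ++step) {
    // Local kernel on every process: C(pi, pj) += A_loc * B_loc.
    // In a hybrid code this is the part that would run under OpenMP.
    for (std::size_t pi = 0; pi < q; ++pi)
      for (std::size_t pj = 0; pj < q; ++pj) {
        const Matrix& al = a_loc[pi * q + pj];
        const Matrix& bl = b_loc[pi * q + pj];
        for (std::size_t i = 0; i < bs; ++i)
          for (std::size_t k = 0; k < bs; ++k)
            for (std::size_t j = 0; j < bs; ++j)
              c[(pi * bs + i) * n + (pj * bs + j)] +=
                  al[i * bs + k] * bl[k * bs + j];
      }
    // Communication step: shift A blocks left by one and B blocks up by one
    // on the torus. In an MPI code these copies would be messages.
    std::vector<Matrix> a_next(q * q), b_next(q * q);
    for (std::size_t pi = 0; pi < q; ++pi)
      for (std::size_t pj = 0; pj < q; ++pj) {
        a_next[pi * q + pj] = a_loc[pi * q + (pj + 1) % q];
        b_next[pi * q + pj] = b_loc[((pi + 1) % q) * q + pj];
      }
    a_loc.swap(a_next);
    b_loc.swap(b_next);
  }
  return c;
}
```

After the initial skew, the process at grid position (pi, pj) holds matching blocks A(pi, k) and B(k, pj) in every step, so q multiply-and-shift steps accumulate the full block product. The result can be compared against `multiply_naive` for any q that divides n.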
[1] W. Gropp, E. Lusk, A. Skjellum, "Using MPI: Portable Parallel Programming with the Message Passing Interface", MIT Press, 1994.
[2] OpenMP Architecture Review Board, "OpenMP Application Program Interface", Specification, 2008.
[3] A. Griewank, A. Walter, "Evaluating Derivatives. Principles and Techniques of Algorithmic Differentiation", 2nd Edition, SIAM, Philadelphia, 2008.
[4] L.E. Cannon, "A Cellular Computer to implement the Kalman Filter Algorithm", 1969.