
©2012 Civil-Comp Ltd
C. Chevalier, G. Grospellier, F. Ledoux and J.C. Weill
CEA, DAM, DIF, Arpajon, France
Keywords: partitioning, load balancing, distributed memory, high performance computing.
On large computers, numerical simulation codes run using distributed memory on a large number of processing units. Many technical issues must therefore be addressed to achieve good performance, including data distribution, data exchange between processes, and load balancing. In an industrial context, this can be handled by a development framework such as Arcane, which manages all of these technical aspects while ensuring high performance on thousands of cores. In this paper the focus is on improving the running times of various dynamic mesh-based multi-physics simulations by optimising the data distribution.
The goal is to determine the assignment of the mesh entities to the processors so that the overall running time of the simulation is minimised. This is mainly a load-balancing problem, which can be solved as a mesh partitioning problem. Graph or hypergraph partitioning tools, such as ParMetis, Scotch or Zoltan, can be used to solve it. The paper demonstrates how to use these tools efficiently to resolve load-balancing issues.
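To make the partitioning step concrete, the following toy sketch stands in for a real partitioner such as ParMetis, Scotch or Zoltan (none of whose APIs are shown here): each mesh cell carries a computational weight, and cells are assigned to parts so that the per-part weight sums stay balanced. The greedy heuristic and the synthetic weights are purely illustrative assumptions.

```python
# Illustrative sketch only: a toy weighted partitioner standing in for
# ParMetis/Scotch/Zoltan. Cells carry computational weights; we assign
# each to a part so per-part weight sums stay balanced.

def greedy_partition(cell_weights, nparts):
    """Assign each cell to the currently lightest part (greedy heuristic)."""
    part_load = [0.0] * nparts
    assignment = {}
    # Placing heavy cells first improves balance for this simple heuristic.
    for cell, w in sorted(cell_weights.items(), key=lambda kv: -kv[1]):
        p = min(range(nparts), key=lambda i: part_load[i])
        assignment[cell] = p
        part_load[p] += w
    return assignment, part_load

# Synthetic example: 12 cells with uneven per-cell costs, split over 3 parts.
weights = {c: 1.0 + (c % 4) for c in range(12)}
parts, loads = greedy_partition(weights, 3)
```

Real partitioners additionally minimise the edge cut of the cell-adjacency graph, which this sketch ignores; it only shows the weight-balancing side of the problem.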
Through dynamic multi-physics experiments it is shown that good
framework driven load balancing can be achieved by using appropriate
characterisations of computational costs. It is shown that a fully
automated characterisation based on timing measures is generally not
sufficient to obtain good load balancing, especially for multi-step
simulations. Ideally, each phase of the simulation would be characterised by an elementary cost associated with each cell of the mesh. For dynamic simulations, timings, even one per physics phase, do not fit this need well.
In this paper, an approach is proposed that requires only minimal instrumentation of the physics code to count elementary costs. Using this approach, mono- or multi-criteria graph partitioning can be used to achieve better-quality mesh distributions. For mono-criterion partitioning, it is shown that the Arcane framework can correctly merge the per-phase criteria into a single one by load-balancing the memory, which can be computed accurately knowing only the number of entities, as their associated data are Arcane variables.
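One way such a merge might work is sketched below. This is a hypothetical illustration, not the Arcane API: per-cell elementary counters collected for each physics phase are combined into one partitioning weight, with each phase's counters scaled by that phase's measured total time so that a cheap phase with many events does not dominate an expensive one. The phase names, counts and times are invented for the example.

```python
# Hypothetical sketch (not the Arcane API): merge per-phase elementary
# counters into a single per-cell partitioning weight. Each phase's
# counters are scaled by that phase's measured total time, giving an
# estimated cost in seconds per cell.

def merge_criteria(phase_counts, phase_times):
    """phase_counts: {phase: {cell: count}}; phase_times: {phase: seconds}."""
    merged = {}
    for phase, counts in phase_counts.items():
        total = sum(counts.values()) or 1
        scale = phase_times[phase] / total  # seconds per elementary event
        for cell, n in counts.items():
            merged[cell] = merged.get(cell, 0.0) + scale * n
    return merged

# Invented data: two cells, two physics phases with different cost profiles.
counts = {"hydro": {0: 10, 1: 10}, "transport": {0: 1, 1: 99}}
times = {"hydro": 2.0, "transport": 8.0}
w = merge_criteria(counts, times)
```

The merged weights sum to the total measured time, and a cell dominated by an expensive phase correctly receives a larger weight than a cell with the same raw event count in a cheap phase.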
Using these methods, good speed-ups can be achieved on large, complex two- or three-dimensional simulations with millions of mesh cells and millions of highly mobile particles, especially at large scale (thousands of MPI processes). These methods and observations are not specific to the Arcane framework and may benefit any dynamic simulation code.
Some issues with standard graph models for meshes and with current graph partitioning tools are described; some of these must be solved in the future to enable truly efficient exascale computing for this kind of simulation.