©2012 Civil-Comp Ltd

Paper 6

Application of GPU-Based Computing to Large Scale Finite Element Analysis of Three-Dimensional Structures

A. Akbariyeh¹, T.J. Carrigan², B.H. Dennis¹, W.S. Chan¹, B.P. Wang¹ and K.L. Lawrence¹
¹University of Texas at Arlington, United States of America
²Pointwise Inc., Fort Worth, Texas, United States of America

Keywords: finite element, graphics processor, parallel processing, sparse matrix.


Finite element analysis for stress and deformation prediction has become routine in many industries. However, the analysis of complex three-dimensional geometries with millions of degrees of freedom is beyond the computing capacity of the typical desktop computer. Recent advances in commodity graphics processing units (GPUs) have opened a new route for computing large-scale finite element solutions on the desktop computer in a timely fashion. The current generation of advanced GPU hardware is equipped with hundreds of streaming processors and offers the potential of teraflop performance. It is no surprise that this hardware is becoming increasingly popular for general scientific computing because of its high performance-to-cost ratio. However, the peak performance of the GPU is difficult to obtain and requires careful attention to both algorithms and implementation. In finite element analysis, the most time-consuming step is solving the resulting linear system of equations. This system is typically large, sparse, and unstructured. Such systems are often solved iteratively, with a domain decomposition technique used to distribute the computational load among the parallel processors.
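To make the solver step concrete, the following is a minimal serial sketch of an iterative solve on a sparse matrix. It assumes compressed sparse row (CSR) storage and the conjugate gradient method; the abstract names neither, so both are illustrative choices, as are all function names. A 1D Laplacian stands in for a symmetric positive definite stiffness matrix. The row loop in `csr_matvec` is the kind of work that is commonly assigned to GPU threads, one row per thread.

```python
import math


def csr_matvec(values, col_idx, row_ptr, x):
    """Return y = A x for a matrix A stored in CSR form (assumed format)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):  # each row is independent of the others
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (unpreconditioned CG).

    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # search direction
    rs_old = dot(r, r)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for iteration in range(max_iter):
        if math.sqrt(rs_old) <= tol * b_norm:
            return x, iteration
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


def tridiagonal_csr(n):
    """Build tridiag(-1, 2, -1) in CSR form as a hypothetical test matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


# Usage: solve a 50-unknown model problem with a unit load vector.
values, col_idx, row_ptr = tridiagonal_csr(50)
load = [1.0] * 50
solution, iterations = conjugate_gradient(values, col_idx, row_ptr, load)
```

Each iteration is dominated by one sparse matrix-vector product plus a few vector updates and inner products, which is why the storage scheme of the matrix has a direct effect on solver throughput.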

In this paper, the benefits of using GPUs to improve the performance of the iterative sparse matrix solver in a finite element program are explored. The focus is specifically on hexahedral elements for linear elasticity with trilinear displacement basis functions. This approach does not require domain decomposition, so it is simpler than the corresponding implementation for distributed memory parallel computers. The performance of the GPU implementation is compared with that of the corresponding serial version run on a conventional processor for various mesh sizes and sparse matrix storage schemes.
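For readers unfamiliar with the element type, the trilinear basis functions of an eight-node hexahedron on the reference cube [-1, 1]³ take the standard form N_i(ξ, η, ζ) = (1 + ξξ_i)(1 + ηη_i)(1 + ζζ_i)/8, where (ξ_i, η_i, ζ_i) are the corner coordinates of node i. The sketch below evaluates them and their derivatives. The node ordering and the function names are assumptions made for illustration, not taken from the paper.

```python
# Natural coordinates of the eight corner nodes of the reference hexahedron.
# The ordering is an assumed convention; the paper's own ordering may differ.
HEX8_NODES = [
    (-1, -1, -1), (1, -1, -1), (1, 1, -1), (-1, 1, -1),
    (-1, -1, 1), (1, -1, 1), (1, 1, 1), (-1, 1, 1),
]


def shape_functions(xi, eta, zeta):
    """Return the eight trilinear basis function values at (xi, eta, zeta)."""
    return [
        0.125 * (1 + xi * a) * (1 + eta * b) * (1 + zeta * c)
        for a, b, c in HEX8_NODES
    ]


def shape_gradients(xi, eta, zeta):
    """Return (dN/dxi, dN/deta, dN/dzeta) for each of the eight nodes."""
    return [
        (
            0.125 * a * (1 + eta * b) * (1 + zeta * c),
            0.125 * b * (1 + xi * a) * (1 + zeta * c),
            0.125 * c * (1 + xi * a) * (1 + eta * b),
        )
        for a, b, c in HEX8_NODES
    ]


def interpolate(nodal_values, xi, eta, zeta):
    """Interpolate one scalar displacement component inside the element."""
    weights = shape_functions(xi, eta, zeta)
    return sum(w * v for w, v in zip(weights, nodal_values))


# Usage: a field equal to the xi coordinate at every node is reproduced exactly.
nodal_xi = [float(a) for a, _, _ in HEX8_NODES]
value_at_point = interpolate(nodal_xi, 0.25, -0.5, 0.75)
```

With three displacement components at each of the eight nodes, every element contributes a 24 × 24 block to the global stiffness matrix, which is the source of the sparsity pattern the solver has to store and traverse.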