Massively Parallel Computing with NUMECA's OMNIS™/Open-DBS
Recent developments in NUMECA OMNIS™/Open-DBS have yielded substantial improvements in massively parallel performance. Beyond improving the scalability of the solver iteration loop, these developments affect every phase of a typical simulation: solver startup, iteration, and solution writing. Taken together, they allow efficient performance with over 20,000 processes and have made rapid turnaround of high-resolution, time-accurate simulations a reality.
In recent years, industrial turbomachinery design and analysis has shifted from traditional steady-state analysis toward higher-fidelity, higher-cost, time-accurate simulations. These simulations allow a deeper understanding of blade passing interaction and other time-varying phenomena that can negatively impact design efficiency. However, this type of simulation presents two significant software challenges:
- Unlike steady-state simulations, which require only a single meshed blade passage, unsteady simulations require multiple blade passages or even a full 360-degree mesh. Maintaining sufficient mesh density can therefore increase the mesh size by an order of magnitude or more. To achieve a quick solution turnaround time, a scalable parallel implementation is mandatory.
- A second problem appears in the form of poorly scaling solver input/output (I/O). The famous saying by Ken Batcher applies: “A supercomputer is a device for turning compute-bound problems into I/O-bound problems.” At higher levels of parallelism, a large percentage of the run time is spent on traditionally serial I/O. This is most apparent in unsteady simulations, which require frequent solution writes.
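The I/O point can be illustrated with a small back-of-the-envelope model. The sketch below is not taken from the solver; it assumes the compute portion of a time step scales ideally with process count while a serial solution write takes a fixed time. The function name and every number in it are hypothetical and chosen only to show the trend.

```python
def io_fraction(serial_compute_seconds, write_seconds, processes):
    """Share of one step's wall-clock time spent in a serial write.

    Assumes (hypothetically) that compute time divides evenly across
    processes and that the write time does not shrink at all.
    """
    compute_seconds = serial_compute_seconds / processes
    return write_seconds / (compute_seconds + write_seconds)


# Hypothetical workload: 100,000 s of compute per step on one process,
# plus a 10 s serial solution write.
for processes in (100, 1_000, 10_000):
    share = io_fraction(100_000.0, 10.0, processes)
    print(f"{processes:>6} processes: {share:.1%} of the step is I/O")
```

Under these assumed numbers the write is about 1% of the step at 100 processes but half of the step at 10,000 processes, which is why a serial write path stops being negligible at scale.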
Recent developments in the NUMECA OMNIS™/Open-DBS solver have addressed both of these problems. First, optimizations to the MPI-based parallel model have yielded efficient parallel performance with over 20,000 processes and a per-process load of less than 30K cells. Second, a comprehensive parallel I/O solution has been implemented with the CGNS-3 file format, allowing scalable initialization, checkpointing, and shutdown of the OMNIS™/Open-DBS solver at any scale.
Figure 1: OMNIS™/Open-DBS Parallel Performance. 6e8 Cell Mesh, Unsteady DES Solved on the OLCF Titan Supercomputer.
Taken together, these developments have enabled massive unsteady simulations that would otherwise have been impractical. In collaboration with Dresser-Rand, high-resolution turbomachinery DES simulations have been performed on the OLCF Titan supercomputer using mesh configurations with between 8e7 and 6e8 cells, the largest of which were solved on over 20,000 compute processes. Using the CPUBooster™ module for rapid convergence, and taking advantage of the efficient parallel performance of the OMNIS™/Open-DBS solver, it is now possible to complete each time step in less than 4 minutes, including the solution write. This allows a complete solution turnaround time on the order of a few days.
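The figures quoted above can be checked with simple arithmetic. The sketch below uses the 6e8-cell mesh, the 20,000-process count, and the 4-minute step time from the text; the number of time steps is not stated in the article, so the value of 1,000 used here is purely an assumption for illustration, as are the function names.

```python
def cells_per_process(total_cells, processes):
    """Average number of mesh cells handled by each process."""
    return total_cells / processes


def turnaround_days(time_steps, minutes_per_step):
    """Wall-clock time for a run, in days, at a fixed cost per time step."""
    return time_steps * minutes_per_step / (60 * 24)


# From the text: largest mesh of 6e8 cells on 20,000 processes.
print(cells_per_process(6e8, 20_000))   # 30000.0 cells per process

# Assumed run length of 1,000 time steps (not given in the article),
# at the quoted upper bound of 4 minutes per step.
print(round(turnaround_days(1_000, 4.0), 2))  # 2.78 days
```

At exactly 20,000 processes the load is 30,000 cells per process, so any larger process count gives the "less than 30K cells" figure. A run of roughly a thousand steps at under 4 minutes each would finish in under three days, consistent with the stated turnaround of a few days.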
Onset of rotating stall in a turbomachinery diffuser / return channel. OMNIS™/Open-DBS unsteady DES solved on the OLCF Titan supercomputer
Posted by David Gutzwiller
David Gutzwiller is a software engineer and head of High Performance Computing at NUMECA-USA, based in San Francisco, CA. David joined NUMECA in 2009 after completing a graduate degree in Aerospace Engineering at the University of Cincinnati. His graduate research focused on the automated structural design and optimization of turbomachinery components. Since joining NUMECA, David has worked on adapting the FINE/Turbo and FINE/Open CFD solvers for use in a massively parallel, heterogeneous environment. In collaboration with industry users, David has constructed frameworks for intelligently driven design and optimization on the supercomputers at the Oak Ridge Leadership Computing Facility and Air Force Research Laboratories.