A Task Scheduler for Astaroth, the Astrophysics Simulation Framework
Lappi, Oskar (2021)
The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe2021050428868
Abstract
Computational scientists use numerical simulations performed by computers to create new knowledge about complex systems. To give an example, computational astrophysicists run large magnetohydrodynamics simulations to study the evolution of magnetic fields in stars.
Simulation workloads are growing in size as scientists study systems at ever larger scales and higher resolutions. As a consequence, the performance requirements for the underlying computing platforms are becoming more demanding.
To meet these requirements, many high-performance computer clusters use GPUs together with high-speed networks to accelerate computation. The fluid dynamics library Astaroth is built for such GPU clusters and has demonstrated a 10x speedup when compared to the equivalent CPU-based Pencil Code library still used by computational physicists today.
This thesis was written in coordination with the research team that created Astaroth and builds on their findings. My goal has been to create a new version of Astaroth that performs better at distributed scale.
Earlier versions of Astaroth perform well due to optimized compute kernels and a coarse pipeline that overlaps computation and communication. The work done for this thesis consists of a finer-grained dependency analysis and a task graph scheduler that exploits the analysis to schedule tasks dynamically.
The task scheduler improves the performance of Astaroth for realistic bandwidth-bound workloads by up to 25% and improves the overall scalability of Astaroth.