MHD - Weak scaling measured on OLCF/TITAN
We measured weak-scaling performance of the MHD scheme (without shearing border or dissipative terms) on the OLCF/TITAN system (the world's largest GPU cluster) up to 4096 GPUs. The sub-domain size is \(256^3\).
Performance is measured in millions of cell updates per second. Each configuration uses one GPU per MPI process.
Notice the good scaling up to 4096 GPUs, which yields a global resolution of \(4096^3\).
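As a quick sanity check on the numbers above, assuming a cubic MPI process grid (which the figures in the text imply), the global resolution follows directly from the per-GPU sub-domain size:

```python
# Weak scaling: each GPU owns a 256^3 sub-domain; with a cubic
# process grid of 4096 = 16^3 MPI tasks the global grid is 4096^3.
sub = 256                                  # cells per dimension per GPU
ntasks = 4096                              # MPI tasks (one GPU each)
procs_per_dim = round(ntasks ** (1 / 3))   # 16 processes per dimension
global_res = procs_per_dim * sub           # 4096 cells per dimension
print(procs_per_dim, global_res)
```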
The following chart shows the very good effective bandwidth obtained (up to almost 30 GB/s) when writing a 4.4 TB file in parallel collective mode, with 4096 MPI tasks contributing 1.07 GB each. It takes about 2.5 minutes to write more than 4 TB of data to disk!
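The effective-bandwidth figure can be checked from the quantities quoted above (the ~2.5 minute write time is an approximation taken from the text):

```python
# Effective bandwidth of the collective write described in the text.
ntasks = 4096
local_gb = 1.07                            # GB written per MPI task
total_tb = ntasks * local_gb / 1000.0      # ~4.38 TB total file size
time_s = 150.0                             # ~2.5 minutes (approximate)
bandwidth_gb_s = ntasks * local_gb / time_s   # ~29 GB/s effective
print(f"{total_tb:.2f} TB written at {bandwidth_gb_s:.1f} GB/s")
```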
Notice also the very strong impact of the Lustre stripe parameter on performance. When writing such large files, you should always tune this parameter (increase it from the default value, which is often 2 or 4).
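A rough illustration of why the stripe count matters (in practice it is set with `lfs setstripe -c <count>` on the output directory): with a small stripe count, a multi-terabyte file lands on only a few OSTs, each of which must absorb terabytes of data. The stripe counts other than the default 2 and 4 below are hypothetical examples, not values from the measurements:

```python
# Data each OST must absorb for a 4.4 TB file at various stripe counts.
# Stripe counts 32 and 128 are hypothetical illustrations.
file_tb = 4.4
for stripe_count in (2, 4, 32, 128):
    per_ost_tb = file_tb / stripe_count
    print(f"stripe count {stripe_count:4d}: {per_ost_tb:7.3f} TB per OST")
```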
| Number of MPI proc | Global size (number of cells) | Total output size | Local size per GPU | Lustre stripe | Time (sec) | Effective bandwidth |
|---|---|---|---|---|---|---|