Dispersion-free ultra-high order FFT-based Maxwell solvers have recently proven essential to a wide range of applications, including the high-fidelity modeling of high-intensity laser–matter interactions with Particle-In-Cell (PIC) codes. To enable massively parallel scaling of these solvers, a novel parallelization technique was recently proposed, which consists of splitting the simulation domain into several processor sub-domains, with guard regions appended at each sub-domain boundary. Maxwell's equations are advanced independently on each sub-domain using local shared-memory FFTs (instead of a single distributed global FFT). This introduces small truncation errors at sub-domain boundaries, whose amplitude depends on the size of the guard regions and the order of the Maxwell solver. For moderate guard region sizes, this "local" technique proved to be highly scalable on up to a million cores and notably enabled the 3D modeling of so-called plasma mirrors, for which only 8 guard cells were enough to prevent truncation error growth. Yet, for other applications, the required number of guard cells might be much higher, which would severely limit the parallel efficiency of this technique because of the large volume of guard cells to be exchanged between sub-domains. In this context, we propose a novel parallelization technique that ensures very good scaling of FFT-based solvers with an arbitrarily high number of guard cells. Our "hybrid" technique consists of performing distributed FFTs on local groups of processors, with guard regions now appended to the boundaries of each group of processors. It uses a dual domain decomposition method for the Maxwell solver and the other parts of the PIC cycle to keep the simulation load-balanced. This "hybrid" technique was implemented in the open-source exascale library PICSAR. Benchmarks show that, for a large number of guard cells, the "hybrid" technique offers both a speed-up and memory savings compared to the "local" one.
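To make the role of guard cells in the "local" technique concrete, the following is a minimal 1D sketch in Python/NumPy. It is not PICSAR code and not the Maxwell solver described above: it only applies a single finite-order spatial derivative through an FFT (using the modified wavenumber of a centered stencil), once on the whole periodic grid and once independently on each sub-domain padded with guard cells. The function names `fd_derivative_fft` and `local_derivative`, the grid size, and the test field are all assumptions chosen for illustration.

```python
import numpy as np

def fd_derivative_fft(f, dx, order):
    """Periodic derivative computed in Fourier space with the modified
    wavenumber of a centered finite-difference stencil (order 2 or 4)."""
    k = 2 * np.pi * np.fft.fftfreq(f.size, d=dx)
    if order == 2:
        k_mod = np.sin(k * dx) / dx
    elif order == 4:
        k_mod = (8 * np.sin(k * dx) - np.sin(2 * k * dx)) / (6 * dx)
    else:
        raise ValueError("this sketch only supports order 2 or 4")
    return np.real(np.fft.ifft(1j * k_mod * np.fft.fft(f)))

def local_derivative(f, dx, order, n_sub, n_guard):
    """Split the periodic grid into n_sub sub-domains, pad each with n_guard
    guard cells copied from its neighbours, run a local FFT on each padded
    sub-domain, and keep only the interior (non-guard) cells."""
    n = f.size
    if n % n_sub != 0:
        raise ValueError("grid size must be divisible by n_sub")
    size = n // n_sub
    out = np.empty_like(f)
    for s in range(n_sub):
        start = s * size
        # Guard-cell exchange: indices wrap around the global periodic grid.
        idx = np.arange(start - n_guard, start + size + n_guard) % n
        d = fd_derivative_fft(f[idx], dx, order)
        out[start:start + size] = d[n_guard:n_guard + size]
    return out

# Hypothetical test field on a periodic grid (values chosen for illustration).
n, length = 256, 1.0
dx = length / n
x = np.arange(n) * dx
field = np.sin(2 * np.pi * 3 * x / length) + 0.5 * np.cos(2 * np.pi * 7 * x / length)

reference = fd_derivative_fft(field, dx, order=4)  # single global FFT
for n_guard in (0, 1, 2, 4):
    local = local_derivative(field, dx, order=4, n_sub=4, n_guard=n_guard)
    err = np.max(np.abs(local - reference))
    print(f"n_guard={n_guard}: max error vs global FFT = {err:.2e}")
```

In this toy setting the stencil of an order-p centered derivative spans p/2 cells on each side, so the local result matches the global one to round-off once `n_guard >= order // 2`, and shows a boundary error otherwise. A full time-dependent Maxwell solver at very high order behaves differently, with truncation errors that shrink with the guard region size rather than vanishing at a fixed width, so this sketch should be read only as an illustration of the trade-off between guard region size and solver order.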
Existing programming models tend to tightly interleave algorithm and optimization in HPC simulation codes. This requires scientists to become experts in both the simulated domain and the optimization process and makes the code difficult to maintain or port to new architectures. In this paper, we propose the INKS programming model that decouples these concerns with two distinct languages: INKSpia to express the simulation algorithm and INKSpso for optimizations. We define INKSpia and evaluate the feasibility of defining INKSpso with three test languages: INKSo/C++, INKSo/loop and INKSo/XMP. We evaluate the approach on synthetic benchmarks (NAS and heat equation) as well as on a more complex example (6D Vlasov–Poisson solver). Our evaluation demonstrates the soundness of the approach as it improves the separation of algorithmic and optimization concerns at no performance cost. We also identify a set of guidelines for the later full definition of the INKSpso language.
Thanks to the Irene Joliot-Curie Grand Challenge, the teams at the Maison de la Simulation vectorized the Particle-in-Cell operators of the Smilei code so that they run as fast as possible on the latest generation of Intel processors with SIMD vector capabilities. They devised an adaptive method, local in both space and time and new for this type of code, that determines the best choice between scalar and vectorized operators in order to reach maximum efficiency. These methods were tested on production cases and can deliver significant speed-ups, up to 2x in our studies.