Every second Tuesday of the month, we welcome a speaker for a presentation on a subject related to scientific modeling and high performance computing. This takes place at the Maison de la Simulation and everyone is welcome to attend with no registration required. The seminar usually starts at 10 AM with some coffee and pastries offered at 9:30 AM.
A mailing list is available if you want to be informed of new seminar sessions; you can subscribe online. Please contact us if you need any additional information.
There is today an increasing consensus that quantum computing, an emerging data-processing technology, may in the future play a significant role, with boundaries not yet fully understood, in high performance scientific computing. The simulation of quantum computers on standard computing platforms is today a necessary step to understand, assess, and develop quantum algorithms, paving the way for the eventual adoption of this disruptive technology. Emulation software will remain useful for some time to validate results from the early quantum computing platforms and to help tune quantum application codes. We will present a fully portable C++ emulation library that has been under development for more than one year. The library provides three interfaces: a shared-memory version running on SMP nodes, an MPI extension enabling access to a larger number of qubits, and a Python wrapper of the C++ library. A significant pedagogical effort has gone into the documentation: besides the traditional C++ class documentation, we are producing higher-level strategic documentation and pedagogical papers on applications, including Jupyter notebooks. This seminar will first propose an introduction to quantum computing, explaining the fundamental differences with classical computing and underlining the strengths as well as the limiting bottlenecks of the technology. We will then present a short, high-level overview of the emulation library, focusing on a few major strategic choices. The issues involved in quantum algorithms will then be illustrated with a quantum chemistry example used in the validation tests of the software. Finally, we will conclude with a rapid overview of the different areas in which quantum algorithm research is evolving (quantum chemistry, condensed matter physics, combinatorial optimization, fault-tolerant quantum computing, ...).
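The abstract does not detail the library's API, so purely as a generic illustration (all names below are hypothetical, not the library's), here is the core computation any state-vector emulator performs: an n-qubit register is stored as a vector of 2^n complex amplitudes, and each gate acts as a small linear map swept across that vector.

```python
import math

def apply_1q(state, gate, target, n):
    """Apply a 2x2 gate to qubit `target` of an n-qubit state vector."""
    new = [0j] * (1 << n)
    for i in range(1 << n):
        bit = (i >> target) & 1
        base = i & ~(1 << target)   # index with the target bit cleared
        # each output amplitude mixes both basis states of the target qubit
        new[i] = gate[bit][0] * state[base] + gate[bit][1] * state[base | (1 << target)]
    return new

def apply_cnot(state, control, target, n):
    """Flip `target` where `control` is 1 (a permutation of amplitudes)."""
    new = list(state)
    for i in range(1 << n):
        if (i >> control) & 1:
            new[i] = state[i ^ (1 << target)]
    return new

n = 2
state = [0j] * (1 << n)
state[0] = 1 + 0j                   # start in |00>
H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]
state = apply_1q(state, H, 0, n)    # Hadamard on qubit 0
state = apply_cnot(state, 0, 1, n)  # entangle -> Bell state
probs = [abs(a) ** 2 for a in state]
print(probs)                        # ~[0.5, 0, 0, 0.5]
```

Each additional qubit doubles the state-vector length, which is precisely why a distributed MPI extension is needed to reach larger qubit counts than a single SMP node's memory allows.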
Seminar starts at 10:30 AM. Welcome coffee at 10:00 AM.
The MPC (Multi-Processor Computing) framework provides a unified parallel runtime designed to improve the scalability and performance of applications running on clusters of (very) large multiprocessor/multicore NUMA nodes. Thanks to its design, MPC allows mixed-mode programming models and efficient interaction with the HPC software stack. MPC provides implementations of the MPI, OpenMP and POSIX threads standards, all of which can be mixed together efficiently thanks to process virtualization and the sharing of information and resources.
Hercule is a CEA-DAM platform for managing the data produced by simulation codes. It integrates different I/O services to read and write parallel databases for protection/recovery, inter-code exchange (coupling or code sequencing) and post-processing (visualization and analysis). It is based on a data model that covers many simulation domains (structured, unstructured, block-based AMR, tree-based AMR, multi-fluid, laser, atomistic, Eulerian, Lagrangian, ALE, 1D, 2D, 3D) and provides services to produce, filter, and disaggregate data in sequential or parallel HPC applications.
PaDaWAn is an infrastructure providing in-memory data exchange between applications, together with a simple configuration model to switch from a file-based workflow to an in-transit workflow. The infrastructure is currently based on the CEA-DAM Hercule parallel I/O library: an ABI-compatible library intercepts simulation data transparently, which facilitates integration into existing simulation codes and tools.
TLT is a new, very fast lossless compression algorithm whose first objective was to reduce, quickly enough, the quantity of data exchanged on massively parallel architectures, whether these data are transferred to distant nodes (MPI) or to storage (HDF5). The algorithm was originally dedicated to floating-point-encoded signals, which are difficult to compress due to their lack of redundancy. It was then extended to classical (fixed-point) encoded signals, in order to allow its implementation on dedicated chips (FPGAs) for use with sensors and measurement chains. This presentation will give a brief overview of the state of the art, of the currently used tools, and of the TLT algorithm itself, finishing with the results obtained on several kinds of data.
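The abstract does not describe TLT's internals, but the difficulty it addresses can be illustrated with a classic building block used by floating-point compressors such as FPC or Gorilla (not necessarily by TLT itself): XOR-ing each double with its predecessor exposes what redundancy does exist, namely the shared sign, exponent and high mantissa bits of smooth signals.

```python
import struct

def xor_deltas(values):
    """XOR each double's 64-bit pattern with its predecessor's.
    Smooth signals yield XOR words with many leading zero bits,
    which a run-length/entropy stage can then encode cheaply."""
    bits = [struct.unpack('<Q', struct.pack('<d', v))[0] for v in values]
    return [bits[0]] + [a ^ b for a, b in zip(bits, bits[1:])]

# A smooth signal: successive samples share sign, exponent and
# high mantissa bits, so the XOR deltas are mostly leading zeros.
signal = [1.0 + 1e-6 * i for i in range(8)]
deltas = xor_deltas(signal)
leading_zeros = [64 - d.bit_length() if d else 64 for d in deltas[1:]]
print(leading_zeros)
```

On white noise, by contrast, the XOR deltas look random across all 64 bits, which is the "lack of redundancy" the abstract refers to.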
Disruptive transitions in the HPC landscape have already occurred when the fundamental strategy for organizing data and computations changed. For example, back in the 1990s, many programmers had to switch from vector processing to massively parallel computers. There was a disruptive ramp-up phase during which developers maintained both the old vector code and the new massively parallel code. We are facing the same kind of situation with the ongoing petascale-to-exascale transition. Many applications are already bound by memory bandwidth, with sustained performance below 2% of CPU peak, and the capabilities of networks, both in terms of latency and bandwidth, can hardly keep up with CPU speed. Applications will therefore need to be agile in evaluating and adopting the most promising technologies and software tools along the way, and will even need to explore new computing paradigms as we move to extreme parallelism and heterogeneity. One key tool for this is the concept of the mini-application. Mini-applications attempt to capture the meaningful aspects of a fraction of a large application. The goal is to provide a sufficiently small code base (e.g. fewer than 2000 lines of code) to try different strategies, algorithms or schemes quickly and at relatively low human cost. This is important both to converge towards good solutions satisfying numerous constraints at a reasonable cost, and to keep pace with software and hardware change.
Gysela is a simulation code used by physicists to study plasma turbulence within tokamak devices. In this parallel code, a semi-Lagrangian scheme is used to solve a set of 5-dimensional gyrokinetic equations self-consistently coupled to a Poisson equation. The relative efficiency of this MPI+OpenMP application, for a weak scaling starting from 8k cores, is higher than 95% at 65k cores on several supercomputers. Producing physics results with this tool already requires large CPU resources, and the needs are expected to increase in the near future due to higher mesh resolution and the interaction between core and edge plasmas. In that respect, adapting the code to upcoming parallel architectures is a key issue, and designing a new code that departs from the existing one is required. In this talk, the bottlenecks and some of the possible solutions on the path to exascale will be given. Modelling the interplay between edge and core turbulent transport with such a gyrokinetic code is to a large extent unexplored territory, mainly because of numerical and physics bottlenecks. These stem from characteristics of the edge physics that all require a dramatic increase in numerical resources: large variations of the equilibrium temperature, by 1 to 2 orders of magnitude, from the core to the scrape-off layer region; complex geometry including plasma-wall interaction and the X-point; and multiresolution schemes. We will show how task programming techniques have the potential to help scale parallel algorithms to a pre-exascale level, while reducing prohibitive communication costs.
Low temperature plasmas, also known as discharge or laboratory plasmas, are ionized gases whose fields of application are rich and varied: materials, environment, energy, transport, etc. An understanding of the main mechanisms leading to plasma generation requires the modeling of the discharge, where complex phenomena have to be considered: plasma generation, transport of charged particles, collisions, particle interaction with the walls, etc. The presentation will be limited to low temperature plasmas (far from thermodynamic equilibrium), which have the peculiarity of generating active species while maintaining a low gas temperature.
In low temperature plasmas, two main families of models are used to describe the transport of charged particles. "Collisional" fluid approaches are valid at high pressure, when the mean free path λ (the distance between two collisions) is very small compared to the typical dimension d; this Eulerian approach describes the plasma through the density, velocity and mean energy of the electrons. The other approach, kinetic, is valid at lower pressure, when λ > d: charged particles are described using macro-particles as a sample of the distribution function (Lagrangian approach). Electromagnetic fields influence the transport of charged particles, which in turn modify the field profiles. In most situations the self-induced magnetic field can be ignored and Maxwell's equations reduce to the Poisson equation. We use the Particle-In-Cell technique to solve the Vlasov-Poisson system.
Strong constraints on the time step, imposed by the resolution of the transport equations and the coupling with the electric field, and on the spatial mesh are particularly penalizing at high plasma densities. This is why, ten to fifteen years ago, these models ran on single-processor machines, were limited to low plasma densities, and were restricted to one-dimensional geometries in most of the cases addressed. The advent of multi-core processors makes it possible today to address increasingly complex problems, with the transition to more realistic 2D and 3D geometries in order to better describe the system under study (branching in streamers, instabilities in magnetized plasmas, etc.).
We will detail the needs, the difficulties and the specific approaches (hybrid MPI/OpenMP parallel programming) associated with high-performance computing in the field of low temperature plasmas, and illustrate the techniques discussed with the study of the electron drift instability observed in Hall thrusters.
The origins of the OpenMP task model go back to Cilk, Jade, and even the data-flow models of the 1970s. After an introduction to the task model as defined since version 3.0 of OpenMP, I will focus on the features of versions 4.0 and above that make tasks attractive to developers. The OpenMP standard, however, only defines how tasks are created and how they may or may not be scheduled concurrently; there is no performance guarantee on the makespan obtained by the task scheduler, which is "implementation defined". In this presentation I will take examples from the standard OpenMP runtimes of GCC, Intel and LLVM to show some conditions under which performance cannot be ensured, especially when fine task granularity is important. I will finish the presentation with runtime modifications that are intended to be integrated into the LLVM OpenMP runtime and are currently available in the libKOMP OpenMP runtime.
Exceptionally, this seminar will start at 10:30 AM (coffee at 10:00 AM).
In this talk, we will address several issues encountered in numerical simulations of complex aero-propulsive flows about space launch vehicles.
The aerospace industry has always been very keen to use numerical modelling for the design of space launchers, since few of the critical problems are easily accessible to physical experiments. For instance, it is extremely difficult to keep a rocket engine running in a wind tunnel, or to reproduce accurately the motion of launcher stages during the stage separation phases. This motivated the development by ArianeGroup of its own calculation tool, FLUSEPA, since the early 90s.
First, the robustness of this tool must allow it to cope with extremely violent transient phases (engine ignition, ...) and with high-energy reactive flows (hypersonic or propulsive) around bodies in relative motion. The proposed methodology, based on geometric intersections, allows the use of an evolving mesh topology and is perfectly conservative. In addition, the numerical schemes were originally chosen for their robustness and their ability to simulate very steep phenomena (strong shock-wave interactions, ...).
Recently, with the advent in an industrial context of large-eddy-resolving simulations (hybrid RANS-LES models), accuracy and numerical efficiency have become crucial. Therefore, the new developments in FLUSEPA rely mostly on high-performance computing techniques, efficient space-time integration schemes and mesh adaptation methods.
In this talk, we will describe how ArianeGroup plans to improve the FLUSEPA code regarding its grid adaptation strategy for the numerical simulation of the aerodynamic loads on space launchers.
This work is part of an effort to develop numerical industrial tools for the simulation of unsteady compressible flows about bodies in relative motion. FLUSEPA, an ArianeGroup CFD code, relies on a high-order Finite Volume formulation and a conservative overlapping of meshes using geometric intersections. In the overlapping regions, the calculation of the fluxes between overlapping mesh and cut cells is done on the exact geometric intersection surfaces: this allows the advection of shocks and unsteady structures.
Recently, an Adaptive Mesh Refinement (AMR) technique for unstructured hexahedral meshes has been implemented in FLUSEPA. This method eases the mesh construction process and ensures a local resolution adapted to the physical features being captured. To be functional, the AMR module has to remain consistent with the pre-existing space-time numerical schemes (i.e. preserve conservativity and accuracy) while keeping their algorithmic performance. Thus, the solution is distributed among several processes, with a load balancing specific to the explicit temporal adaptive scheme and a high-order conservative projection of the variables onto the refined cells. These two properties guarantee a reliable global numerical strategy. Several test cases have been run using this module and validate its use.
First-principle molecular simulation based on electronic structure calculation has become an essential tool in chemistry, condensed matter physics, molecular biology, materials science, and nanosciences. It is also an inexhaustible source of exciting mathematical and numerical problems.
In this talk, I will briefly introduce Density Functional Theory, which is to date the most widely used approach in electronic structure calculation, as it provides the best compromise between accuracy and computational efficiency. I will present some recent progress made in the mathematical understanding and the numerical analysis of this model, which paves the way to high-fidelity numerical simulations (with a posteriori error bounds) of the electronic structure of large molecular systems. I will then discuss the difficult issue of coupling the Kohn-Sham model with coarser models in view of simulating even larger molecular systems, such as drug-protein complexes in solution, or functionalized 2D materials.
Molecular dynamics is now a very widely used tool for studying matter at the molecular level by numerical simulation. It is used in various fields, such as biology, chemistry or materials science, in order to relate the macroscopic properties of matter to its atomistic features for various applications: protein structure prediction, drug design, dynamics of defects in crystals, exploration of the properties of new materials, etc.
Despite the increase in computational power, in some practical cases it remains difficult to simulate a sufficiently large number of atoms over sufficiently long timescales to obtain predictive and precise results. Mathematics plays a fundamental role in deriving coarse-grained models and in analyzing and improving the algorithms used to bridge space and time scales. One of the numerical difficulties is indeed related to timescales: the typical timescale of a molecular dynamics simulation is much smaller than the typical timescale at which the crucial events, from a macroscopic viewpoint, occur. This is related to the metastability of molecular dynamics trajectories.
Many methods have been proposed in the molecular dynamics community to deal with these difficulties, and we will focus on two prototypical ones for which a mathematical analysis gives useful insights. We will first present adaptive importance sampling techniques, which have been proposed to sample efficiently statistical ensembles. Then, we will describe a mathematical analysis of accelerated dynamics methods which have been introduced by A.F. Voter to generate efficiently metastable dynamics.
Porous media are inherently multi-scale systems, with heterogeneity on all scales from nanometres up to thousands of kilometres. Whereas the physical and mathematical description of the processes is reasonably well understood at the scale of individual pores, and established models exist at the scale of laboratory columns, many applications require the simulation of problems at the field scale up to whole landscapes, while experiments are best performed somewhere in the middle of this range. The gap between these scales can only be closed by massive parallel scientific computing.
The talk will present our proven numerical solvers for subsurface water flow and solute transport, shown to scale up to nearly a million processes and several billion unknowns. Results from scalability tests and high-resolution applications will be shown, including parallel I/O of tens of terabytes of data. New approaches to solving coupled surface/subsurface flow with high single-node performance on next-generation supercomputers will also be addressed.
Despite the growing place Computational Fluid Dynamics (CFD) has taken in industrial applications, thanks to its reduced cost and greater flexibility compared to experiments, it has not yet kept all its promises, due, among other things, to the difficulty of providing high-fidelity predictions of complex high-Reynolds-number turbulent flows.
For industrial applications, high-fidelity simulations like DNS (Direct Numerical Simulation) or even LES (Large Eddy Simulation) remain limited and restricted to relatively simple configurations, whereas RANS methods, based on the solution of the averaged equations supplemented by a turbulence model, still represent the workhorse tool, despite their numerous flaws.
As a consequence, the improvement of the predictions and the quantification of uncertainties associated with RANS models appears to be of the utmost importance to progress toward more reliable CFD simulations.
One possibility consists in adapting the model closure parameters so as to quantify the uncertainty associated with using a given set of parameters, calibrated on a small set of simple flow configurations, in radically different cases. Another possibility consists in using statistical learning techniques to infer the error associated with the turbulence closure, namely the constitutive law for the Reynolds stress tensor. In both cases, data from various sources (high-fidelity numerical simulations, experimental measurements) are assimilated into the statistical model in order to provide an a posteriori predictive probability for one or more quantities of interest.
In this work, Bayesian inference techniques are used for achieving the above-mentioned goals.
These are stochastic learning methods that allow one to infer model coefficients and model probabilities from data observation and assimilation.
Unlike other calibration methods, Bayesian techniques provide a quantification of the reliability of the calibrated models, i.e. an estimate of the posterior probability distribution of the solution under parameter uncertainty and model uncertainty (the fact that the model does not represent physical reality exactly).
Additionally, they are well suited for costly fluid dynamics applications, for which only a limited number of independent datasets are available.
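As a toy illustration of this kind of Bayesian calibration (not the actual RANS setting, whose forward models are far more expensive), the sketch below infers a single "closure coefficient" of a linear toy model from noisy synthetic data with a random-walk Metropolis sampler, yielding a posterior mean and spread rather than a single calibrated value.

```python
import math, random

random.seed(0)
# synthetic "high-fidelity" data from a toy model y = theta * x
xs = [0.1 * i for i in range(1, 21)]
theta_true, sigma = 2.0, 0.05
data = [theta_true * x + random.gauss(0, sigma) for x in xs]

def log_like(theta):
    # Gaussian measurement noise of known standard deviation sigma
    return -sum((y - theta * x) ** 2 for x, y in zip(xs, data)) / (2 * sigma ** 2)

# random-walk Metropolis with a flat prior on theta
theta, chain = 1.0, []
ll = log_like(theta)
for it in range(20000):
    prop = theta + random.gauss(0, 0.05)
    ll_prop = log_like(prop)
    if math.log(random.random()) < ll_prop - ll:
        theta, ll = prop, ll_prop
    if it > 2000:                       # discard burn-in
        chain.append(theta)

mean = sum(chain) / len(chain)
std = (sum((t - mean) ** 2 for t in chain) / len(chain)) ** 0.5
print(round(mean, 2), round(std, 3))    # posterior mean near 2, small spread
```

The posterior standard deviation is exactly the "quantification of reliability" mentioned above: with few or noisy data it widens, flagging that the calibrated coefficient should not be trusted blindly.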
In the talk, recent advances on the application of Bayesian methods to the quantification and reduction of the uncertainty of RANS models will be presented, as well as applications to some representative flow configurations, like external flows past wings and internal separated flows.
With the ever-growing need for data in HPC applications, congestion at the I/O level is becoming critical in supercomputers. Architectural enhancements such as burst buffers and pre-fetching are being added to machines, but are not sufficient to prevent congestion. Recent online I/O scheduling strategies have been put in place, but they add an additional congestion point and overhead to the computation of applications. In this work, we show how to take advantage of the periodic nature of HPC applications to develop efficient periodic scheduling strategies for their I/O transfers. Our strategy computes, once during the job scheduling phase, a pattern defining the I/O behavior of each application, after which the applications run independently, transferring their I/O at the specified times. Our strategy limits I/O congestion at the I/O node level and can be easily integrated into current job schedulers. We validate this model through extensive simulations and experiments, comparing it to state-of-the-art online solutions. Specifically, we show not only that our scheduler has the advantage of being decentralized, thus avoiding the overhead of online schedulers, but also that on Mira one can expect an average dilation improvement of 22% with an average throughput improvement of 32%. Finally, we show that one can expect those improvements to grow in the next generation of platforms, where the compute-to-I/O bandwidth imbalance increases.
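The gain such a periodic schedule can bring is easy to see on a toy model (the numbers below are invented for illustration and unrelated to the Mira measurements): two identical periodic applications either hit the shared I/O bandwidth together, or are offset so that their I/O phases never overlap.

```python
# Toy model: two identical periodic applications share I/O bandwidth B.
# Each iteration is C seconds of compute followed by a transfer of V units.
B, C, V = 10.0, 6.0, 40.0
io_alone = V / B                        # 4 s when an app writes alone

# Congested: both apps reach their I/O phase together and split B,
# so each transfer takes twice as long.
congested_iter = C + V / (B / 2)        # 6 + 8 = 14 s per iteration

# Periodic schedule: offset the second app by io_alone seconds so the
# I/O windows never overlap (feasible here: total I/O demand per period,
# 2 * io_alone = 8 s, fits within the period C + io_alone = 10 s).
scheduled_iter = C + io_alone           # 6 + 4 = 10 s per iteration

dilation = congested_iter / scheduled_iter
print(dilation)                         # 1.4: 40% slowdown without scheduling
```

The real strategy generalizes this idea to heterogeneous periods and volumes, computing the offsets once at job-scheduling time instead of arbitrating online.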
Large-scale HPC applications can produce a high load on the file system. This often occurs during access to checkpoint and restart files, which have to be frequently stored to allow for an application restart after program termination or system failure. On large-scale HPC systems with distributed memory, each application task will often perform such I/O individually by creating task-local file objects. This I/O can stress the metadata management components of the I/O subsystem significantly.
SIONlib is a library for writing and reading binary data to/from several thousands of processors into one or a small number of physical files. The SIONlib file layout and API allow the application to take advantage of the scaling behaviour and asynchronous access of a logical task-local pattern while keeping the number of files independent of and significantly smaller than the number of processes.
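SIONlib's actual API is C/Fortran; the Python sketch below only illustrates the underlying file-layout idea: each task owns a block-aligned chunk of one shared file, so every task keeps a logically task-local access pattern while the file system's metadata layer sees a single file. The chunk size and task count here are arbitrary illustration values.

```python
import os, tempfile

# Each "task" owns a fixed-size, block-aligned chunk inside one shared
# file, so writes never interleave; a real implementation would align
# chunks to the file system block size to avoid lock contention.
BLOCK = 4096
ntasks = 8
path = os.path.join(tempfile.mkdtemp(), "shared.sion")

payloads = [bytes([t]) * (100 + 10 * t) for t in range(ntasks)]

with open(path, "wb") as f:
    for t, data in enumerate(payloads):        # one writer per chunk
        f.seek(t * BLOCK)                      # task-local region
        f.write(data)

# any task can later read back its own chunk independently of the others
with open(path, "rb") as f:
    f.seek(3 * BLOCK)
    chunk = f.read(len(payloads[3]))
print(chunk == payloads[3])  # True
```

With thousands of tasks, this collapses thousands of file creations, each a metadata operation, into one, which is exactly the metadata relief described above.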
This talk will give an overview of the design choices behind the SIONlib library, as well as a summary of possible use cases.
Large-scale numerical simulations are producing an ever-growing amount of data, including simulation results as well as execution traces and logs. These data represent a double challenge. First, such volumes are becoming increasingly difficult to analyse with traditional tools. Second, moving these data from the simulation to disks, and later retrieving them from disks to the analysis machine, is becoming increasingly costly in terms of time and energy. This situation is expected to worsen, as supercomputer I/O and, more generally, data movement capabilities are progressing more slowly than compute capabilities. While the simulation used to be the center of attention, it is now time to focus on high performance data analysis. This integration of data analytics with large-scale simulations represents a new kind of workflow that needs adapted software solutions.
In this talk we will survey two major trends related to data analysis, namely Big Data Analytics and In-Situ Analytics, and confront their benefits and shortcomings.
Big Data Analytics solutions like MapReduce, Spark or Flink were developed to answer the need for analysing large amounts of data from the web, from social networks, or generated by business applications on cloud infrastructures. The machines (HPC versus cloud) and the data (mainly text versus structured numerical data) differ deeply. During this talk we will survey the main concepts supported by modern MapReduce frameworks and look at some attempts to use these tools for the analysis of numerical data.
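The map/reduce concepts carry over directly to numerical data; as a minimal sketch, the classic word-count pattern becomes a distributed histogram when the "words" are value bins (block contents below are invented sample data):

```python
from functools import reduce
from collections import Counter

# map: each "block" of simulation output emits (bin, count) pairs;
# reduce: counts for the same bin are summed, exactly as in word count.
def map_block(values, width=1.0):
    return Counter(int(v // width) for v in values)

def reduce_counts(a, b):
    a.update(b)         # merge partial histograms
    return a

blocks = [[0.1, 0.4, 1.2], [1.7, 2.5], [0.9, 2.8, 2.9]]
histogram = reduce(reduce_counts, (map_block(b) for b in blocks), Counter())
print(dict(histogram))  # {0: 3, 1: 2, 2: 3}
```

Frameworks like Spark distribute exactly this pattern: the map runs where each block resides, and only the small partial histograms travel over the network.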
The second trend is in-situ analytics, a recent paradigm that emerged as a solution to the scalability issue of post-mortem data analysis in the HPC context. It proposes to start processing the data while the simulation is running, as soon as the raw results are available in the compute nodes' memory, with multiple benefits: data produced by the simulation can start to be reduced before moving out of the compute nodes, thus saving on data movements and on the amount of data eventually stored to disk. We will discuss the main concepts and show some examples of in-situ analytics scenarios we developed with HPC users.
Parallel and distributed systems are able to run complex HPC applications whose computing resource requirements have grown considerably along with the size of HPC infrastructures. This increase is accompanied by a significant and worrying growth in the electricity consumed by HPC data centers; consequently, energy efficiency is becoming critical for these infrastructures. However, knowing the energy consumed by an application running in a data center is complicated by the lack of fine-grained power meters deployed on these platforms. Yet this understanding is essential to improving their energy efficiency.
In this talk, we will present techniques to model and simulate the energy consumed by HPC applications and infrastructures. In particular, we will show how to predict the performance and power consumption of MPI applications with the SimGrid simulation toolkit. We will also survey the various techniques employed to reduce the energy consumption of data centers and show how to simulate them. Such energy-aware simulation tools can help developers to optimize the energy efficiency of their distributed applications.
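A common ingredient of such simulations (used, for instance, by SimGrid's energy plugin) is a linear node power model; the sketch below applies one to a hypothetical application trace, with wattages and durations invented for illustration.

```python
# Linear node power model: P(u) = P_idle + u * (P_peak - P_idle),
# where u is the CPU utilisation in [0, 1].
P_idle, P_peak = 95.0, 170.0            # watts, illustrative values

def energy(phases):
    """phases: list of (duration_s, utilisation) tuples; returns joules."""
    return sum(t * (P_idle + u * (P_peak - P_idle)) for t, u in phases)

# an MPI rank alternating compute (u = 1) and communication wait (u ~ 0.1)
trace = [(10.0, 1.0), (2.0, 0.1), (10.0, 1.0), (2.0, 0.1)]
print(energy(trace))                    # 3810.0 J
```

The large idle share in this model is why techniques such as DVFS or shutting down unused nodes matter: even a waiting node burns most of its peak power.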
Machine Learning is a rapidly developing field at the intersection of statistics, computer science and applied mathematics. Numerous recent examples have demonstrated the transformative impact it can have across engineering and the natural sciences. The field of system identification, which uses statistical methods to build models of dynamical systems from measured data, can be seen as a subset of Machine Learning. Over the past few years, sparsity and parsimony have been overarching themes attracting a lot of attention from the system identification community. The aim of this seminar is to introduce the audience to Sparse Identification of Nonlinear Dynamics (SINDy), a data-driven framework proposed by Brunton et al. (PNAS, 2016) to identify nonlinear low-order models solely from experimentally available measurements, in the absence of prior knowledge of the underlying governing equations. Throughout this presentation, the two-dimensional shear-driven cavity flow at Re = 7500 will be used for illustration purposes. Despite its geometric simplicity, this flow exhibits complex nonlinear dynamics, such as shear-layer instabilities and inner-cavity motion, occurring over different ranges of length and time scales. Two different reduced-order models (ROMs) of the flow will be presented: a first ROM obtained using the classical Galerkin projection procedure, and a second obtained using sparse Galerkin regression, a new procedure based on SINDy recently introduced by Loiseau and Brunton. In both cases, a dimensionality reduction preprocessing step using Proper Orthogonal Decomposition is performed so as to obtain a low-dimensional representation of the flow's dynamics. As will be shown, the reduced-order model obtained by Galerkin regression largely outperforms the one obtained using standard Galerkin projection.
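The idea behind SINDy can be condensed into a toy example (this is not the authors' implementation, and it assumes noise-free derivative samples): build a library of candidate terms, solve a least-squares problem for their coefficients, and threshold the small ones to obtain a sparse model.

```python
# true (unknown) dynamics: xdot = -2 x + 0.5 x^3
samples = [-1.0 + 0.05 * k for k in range(41)]
xdot = [-2 * x + 0.5 * x ** 3 for x in samples]

# candidate library Theta(x) = [x, x^2, x^3]
theta = [[x, x ** 2, x ** 3] for x in samples]

def solve3(A, b):
    """Solve a 3x3 system by Gauss-Jordan elimination with pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(3):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [M[i][3] / M[i][i] for i in range(3)]

# least squares via the normal equations (Theta^T Theta) c = Theta^T xdot
AtA = [[sum(t[i] * t[j] for t in theta) for j in range(3)] for i in range(3)]
Atb = [sum(t[i] * d for t, d in zip(theta, xdot)) for i in range(3)]
coef = solve3(AtA, Atb)
coef = [c if abs(c) > 0.1 else 0.0 for c in coef]   # sparsification step
print([round(c, 3) for c in coef])                  # [-2.0, 0.0, 0.5]
```

The full algorithm iterates this threshold-and-refit step and must estimate the derivatives from noisy measurements, which is where most of the practical difficulty lies.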
Grid-based solvers for the Vlasov equation give accurate results but suffer from the curse of dimensionality. To enable the grid-based solution of the Vlasov equation in 6d phase space, we need efficient parallelization schemes. In this talk, we consider the 6d Vlasov-Poisson problem discretized by a split-step semi-Lagrangian scheme. To optimize single-node performance, we use vectorization, efficient data access and OpenMP parallelism. For distributed-memory parallelism, we consider two parallelization strategies: a remapping strategy that works with two different layouts, keeping parts of the dimensions sequential, and a classical partitioning into hyperrectangles. With the remapping scheme, the 1d interpolations can be performed sequentially on each processor; on the other hand, the remapping requires an all-to-all communication pattern. The partitioning only requires localized communication, but each 1d interpolation needs to be performed on distributed data. We compare both parallelization schemes and discuss how to efficiently handle the domain boundaries in the interpolation for the partitioning. In order to extend the domain partitioning strategy to problems with fast gyration around a magnetic field, we propose to use a moving mesh in velocity space. For the domain partitioning, we will also discuss the optimal choice of the process grid as well as compression of the halo cells. This is joint work with Klaus Reuter, Markus Rampp and Eric Sonnendrücker.
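A 1d backbone of such a split-step semi-Lagrangian scheme fits in a few lines (linear interpolation here for brevity, where production codes use higher-order interpolation): each grid point traces its characteristic back over one time step and interpolates the old solution at the departure point.

```python
# advance f_t + v f_x = 0 on a periodic grid: trace the foot of the
# characteristic x - v*dt and interpolate f there.
N, L, v, dt = 64, 1.0, 0.5, 0.03125     # dt chosen so v*dt is one grid cell
dx = L / N

def semi_lagrangian_step(f):
    g = [0.0] * N
    for i in range(N):
        x_foot = (i * dx - v * dt) % L  # departure point of the characteristic
        j = int(x_foot / dx)
        w = x_foot / dx - j             # linear interpolation weight
        g[i] = (1 - w) * f[j % N] + w * f[(j + 1) % N]
    return g

f = [1.0 if 16 <= i < 32 else 0.0 for i in range(N)]
f1 = semi_lagrangian_step(f)
# with v*dt exactly one cell, the profile shifts by one grid point
print(f1[17] == 1.0 and f1[16] == 0.0)  # True
```

The parallelization question in the talk is precisely where `f[j]` lives: after a remap it is local to the processor, while under domain partitioning the stencil around the departure point may reach into a neighbour's halo cells.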
Software-defined storage (SDS) promises ease of deployment on a large variety of hardware platforms, specifically commodity ones. SDS is seen both as a way to reduce cost and as a way to increase user adaptability. It appears, however, that emancipation from the powerful appliances with numerous built-in fail-over capabilities that used to be at the core of HPC storage architectures remains a daunting task.
This talk, taking IME as a case study, will review the current challenges to address in order to shift to commodity hardware, and will detail some of the solutions that have been implemented in the IME project.
I will describe the development of a new code aimed at the study of hydrodynamical processes in stellar interiors. Current understanding of the evolution of stellar interiors relies on one-dimensional calculations, in which the complex physical processes driving this evolution, such as convection, rotation, or accretion, are described by simplified, phenomenological approaches. However, the predictive power of these methods is severely hindered by the many free parameters they employ. In an effort to redress this situation, the Multi-dimensional Stellar Implicit Code (MUSIC) has been developed. By solving the equations of hydrodynamics in spherical coordinates, the multi-dimensional processes at the heart of stellar evolution can be studied directly. The use of time-implicit methods allows the specific time-scale of interest to be targeted, and statistically meaningful quantities of data to be gathered. Recent results from two applications will be presented: first, accretion onto a young solar-type star and its impact on subsequent evolution; and second, the problem of convective overshooting and its influence on lithium depletion.
With the advance of modern high performance computing, large-scale simulations have been able to account for more and more realistic physics, capturing very large dynamical ranges. One of the challenges in the current approaches towards exascale computing and ever-increasing resolutions is the change of physics between scales. The simulations currently running on supercomputers have been designed for specific purposes and cannot easily be modified to accommodate more physics; yet reaching across several orders of magnitude in scale also requires changing the physics computed. Moreover, writing new software that accounts simultaneously for different scales, while including as many features as the legacy codes that have been written and maintained for decades, is getting harder and harder. One solution to that problem is, instead of rewriting existing codes, to couple them. I will present the efforts made at the University of Surrey to couple two astrophysics codes, Ramses (P3M + hydrodynamics on an AMR grid) and NBody6 (a direct summation code), via MPI and MIMD techniques. This coupling will allow us to run precise simulations of globular clusters interacting with a host galaxy: Ramses manages all the hydrodynamics and collisionless dynamics of the host galaxy, while NBody6 is in charge of precisely integrating the trajectories and stellar evolution of the stars in a globular cluster. Such a system allows us to simulate precisely the tidal interaction between the two objects, something that until now has been done with rough analytical models, and to cover up to nine orders of magnitude in space, time and mass resolution.
It is well known that the inviscid, adiabatic equations of atmospheric motion constitute a non-canonical Hamiltonian system, and therefore possess many important conserved quantities such as mass, potential vorticity, and total energy. In addition, there are also key mimetic properties (such as curl grad = 0) of the underlying continuous vector calculus. Ideally, a dynamical core should have similar properties.
A general approach to deriving such structure-preserving numerical schemes has been developed through a combination of Hamiltonian methods and mimetic discretizations. Beyond these structure-preserving properties, modern dynamical cores must be efficient on a wide range of computational architectures, and should be able to efficiently leverage the increasing parallelism of modern machines. This is achieved through the selection of a particular class of mimetic discretizations: structured-grid, tensor-product mimetic Galerkin methods.
This talk will discuss Dynamico-FE, a new structure-preserving hydrostatic atmospheric dynamical core built using these techniques, and show results from a standard set of test cases on both the plane and the sphere. It will also briefly discuss the Themis software framework (used to construct this code), which is designed specifically for tensor product Galerkin methods on structured grids.
Climate models simulate atmospheric flows interacting with many physical processes. Because they address long time scales, from centuries to millennia, they need to be efficient, but not at the expense of certain desirable properties, especially conservation of total mass and energy. Most of my talk will explain the design principles behind DYNAMICO, a highly scalable unstructured-mesh energy-conserving finite volume/mimetic finite difference atmospheric flow solver and potential successor of LMD-Z, a structured-mesh (longitude-latitude) solver currently operational as part of IPSL-CM, the Earth System Model developed by Institut Pierre Simon Laplace (IPSL).
Specifically, the design of DYNAMICO leverages the variational structure of the equations of motion and their Hamiltonian formulation, so that the conservation of energy requires only that the discrete grad and div operators be compatible, i.e. that a discrete integration by parts formula holds. At the implementation level, performance is achieved by combining a simple memory layout allowing vectorization, mixed MPI/OpenMP parallelism and using the asynchronous parallel I/O server XIOS.
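As a toy illustration of this compatibility requirement (a sketch only, not DYNAMICO's actual mimetic operators on its unstructured mesh): on a 1-D periodic grid, the centred-difference grad and div operators satisfy a discrete integration-by-parts identity exactly, which is precisely the property discrete energy conservation hinges on.

```python
import numpy as np

# Sketch: on a periodic 1-D grid, centred-difference grad and div (the
# same stencil here; compatibility means div = -grad^T) satisfy the
# discrete integration-by-parts identity <u, grad f> + <div u, f> = 0
# exactly, independent of resolution.
n = 64
dx = 2 * np.pi / n
x = np.arange(n) * dx

def grad(f):   # centred difference, periodic
    return (np.roll(f, -1) - np.roll(f, 1)) / (2 * dx)

def div(u):    # skew-adjoint partner of grad on this grid
    return (np.roll(u, -1) - np.roll(u, 1)) / (2 * dx)

f = np.sin(3 * x)
u = 2.0 + np.cos(3 * x)
lhs = dx * np.sum(u * grad(f))
rhs = -dx * np.sum(div(u) * f)
print(abs(lhs - rhs))  # zero up to rounding: the operators are compatible
```

The identity holds at the discrete level by a simple index shift of the sums, so no residual energy source or sink is introduced by the spatial discretization.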
In this presentation I will offer a vision of the impact of the convergence between computing and data on the software aspects of numerical computing applications, together with a synthesis of the work of the European EXDCI project on this topic.
Joint work with:
The heat equation and the Poisson equation are basic building blocks of many numerical methods for partial differential equations (PDEs). These two equations, which might be considered simple, are actually numerical bottlenecks in many applications such as fluid mechanics and plasma physics, as obtaining fast solvers is always challenging, at least in dimension 3.
I will show that fast and precise parallel solvers are obtained when two conditions are fulfilled:
1) for the heat equation, use explicit high-order stabilized methods;
2) perform arithmetic-intensive matrix-vector products obtained from high-order discretizations.
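To make the second point concrete: each explicit step of the heat equation is essentially a matrix-vector product, which is exactly the kernel that benefits from high arithmetic intensity. Below is a minimal 1-D sketch (second-order in space and plain explicit Euler, not the high-order stabilized schemes of the talk):

```python
import numpy as np

# Minimal 1-D illustration (not the speaker's solver): one explicit Euler
# step of u_t = u_xx reduces to a matrix-vector product plus an axpy.
n = 100
dx = 1.0 / (n + 1)
dt = 0.4 * dx**2           # explicit stability limit: dt <= dx^2 / 2
x = np.linspace(dx, 1 - dx, n)
u = np.sin(np.pi * x)      # initial condition, zero Dirichlet boundaries

# Second-order centred Laplacian, dense here for clarity
L = (np.diag(np.full(n - 1, 1.0), -1)
     - 2.0 * np.eye(n)
     + np.diag(np.full(n - 1, 1.0), 1)) / dx**2

for _ in range(10):
    u = u + dt * (L @ u)   # explicit step = matvec + axpy

# The exact solution of this mode decays as exp(-pi^2 t)
exact = np.sin(np.pi * x) * np.exp(-np.pi**2 * 10 * dt)
print(float(np.max(np.abs(u - exact))))  # small discretization error
```

Higher-order stencils widen the band of L and raise the flop-per-byte ratio of the matvec, which is what makes the approach attractive on modern hardware.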
In the next decade, exascale supercomputers will provide the computational power required to perform very large scale simulations. For certain applications, the results of exascale simulations will be of such high resolution that experimental measurements will be insufficient for validation purposes. As floating-point approximations of numeric expressions are neither associative nor distributive, the results of a numerical simulation can differ between executions. As reported by the numerician I. S. Duff, "Getting different results for different runs of the same computation can be disconcerting for users even if, in a sense, both results are correct". There is a need for an automatic and global approach giving a confidence interval on the results that takes the effects of floating-point arithmetic into account. Estimating the effect of the floating-point model on the accuracy of the computed results is the first step of a rigorous Verification and Validation (V&V) procedure.
This talk is organised as follows. The context of our work, and in particular the dark side of floating-point computation, is first presented. A brief overview of some numerical verification tools is then given. Then the new tool, called verificarlo, is presented. Using verificarlo is transparent for the user and does not require manually modifying the source code. It can be used for the automatic assessment of the numerical accuracy of large scale digital simulations using Monte Carlo Arithmetic. Several examples will be shown, in particular the numerical verification of the solution of linear systems using the LAPACK and BLAS scientific libraries.
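As a rough illustration of the principle (verificarlo itself instruments the compiled code at the LLVM level; the helper below is purely a toy), Monte Carlo Arithmetic perturbs every operation at the level of its last bits, and the spread over many runs yields a confidence interval on the result:

```python
import random
import statistics

# Toy emulation of the idea behind Monte Carlo Arithmetic: perturb each
# floating-point operation by a random relative error of roughly one ulp,
# run the computation many times, and read the rounding-induced
# uncertainty off the sample spread.
EPS = 2.0 ** -53  # double-precision unit roundoff

def noisy(x):
    return x * (1.0 + random.uniform(-EPS, EPS))

def noisy_sum(values):
    acc = 0.0
    for v in values:
        acc = noisy(acc + v)
    return acc

random.seed(0)
data = [1.0 / (i + 1) for i in range(10000)]   # harmonic series
samples = [noisy_sum(data) for _ in range(100)]
mean = statistics.mean(samples)
sigma = statistics.stdev(samples)
print(mean, sigma)  # sigma estimates the error due to rounding alone
```

A tight sigma certifies the digits of the result; a large sigma flags a computation whose output is dominated by rounding noise.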
Radiotherapy treatments consist of irradiating the patient with beams of energetic particles (typically photons) targeting the tumor. As these particles are transported through the medium, they deposit energy in it. This deposited energy is the so-called dose, responsible for the biological effect of the radiation. The present work aims to develop numerical methods for dose computation and optimization that are competitive in terms of computational cost and accuracy compared to reference methods.
The motion of the particles is first studied through a system of linear transport equations at the kinetic level. However, directly solving such systems is numerically too costly for medical applications. Instead, the moment method is used, with a special focus on the Mn models. Those moment equations are non-linear and valid under a condition called realizability.
Standard numerical schemes for moment equations are constrained by stability conditions which happen to be very restrictive when the medium contains low density regions. Unconditionally stable numerical schemes adapted to moment equations (preserving the realizability property) are developed. Those schemes are shown to be competitive in terms of computational cost compared to reference approaches. Finally, they are applied in an optimization procedure aiming to maximize the dose in the tumor while minimizing the dose in healthy tissues.
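To give a flavour of the realizability condition (a sketch for the lowest-order M1 model in slab geometry, not the schemes of the talk): the moments must be compatible with some nonnegative underlying angular distribution, which for M1 reduces to a simple inequality.

```python
# Realizability check for the 1-D M1 moment model (sketch): a pair of
# moments (rho, F) comes from a nonnegative angular distribution only if
# rho >= 0 and |F| <= rho. Realizability-preserving schemes must keep
# every updated cell state inside this cone.
def is_realizable(rho, F):
    return rho >= 0.0 and abs(F) <= rho

print(is_realizable(1.0, 0.5))   # interior state: True
print(is_realizable(1.0, 1.5))   # |F| > rho, no distribution exists: False
```

States on the boundary |F| = rho correspond to a beam with all particles moving in one direction; a scheme that steps outside the cone produces moments with no physical meaning, and the nonlinear closure typically breaks down.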
Magnetic fusion research aims at developing power plants based on the fusion of light nuclei, which produces a large amount of energy and no radioactive waste. The energy of the sun results from such fusion reactions. On Earth, a promising concept to gain energy from fusion is magnetic confinement, where charged particles are confined at a high temperature for a long enough time using a magnetic field. An international experiment called ITER is being built in Cadarache, near Aix-en-Provence, to demonstrate the feasibility of the concept.
Even though the main idea of confining the particles with a magnetic field seems simple and natural, magnetized plasmas, which are dense collections of charged particles, exhibit many instabilities that need to be controlled. This can be done only via intensive numerical simulations.
The "Numerical Methods in Plasma Physics" division at the Max Planck Institute for Plasma Physics in Garching (Germany) develops numerical methods and algorithms for magnetic fusion simulations. It also hosts the High Level Support Team of the EUROfusion consortium, which helps profile and optimize the major European fusion codes for the EUROfusion high performance computer, currently Marconi-Fusion hosted by CINECA in Italy. In this talk we will present an overview of the activities of the division, including the main models used in magnetic fusion (kinetic, MHD, Maxwell) and the development of the software libraries SeLaLib, for semi-Lagrangian and PIC kinetic simulations, and Django-Jorek, for finite element MHD and full-wave (Maxwell) simulations.
The phase diagram of high pressure hydrogen is of great interest for fundamental research, planetary physics, and energy applications [1]. Laboratory experiments to reach the appropriate physical conditions are difficult and extremely expensive, therefore ab initio theory has played a crucial role in developing the field. The accuracy of the conventional method based on Density Functional Theory (DFT) is however limited and often non-predictive. We have developed a quantitative methodology based on Quantum Monte Carlo methods to study hydrogen in extreme conditions: the Coupled Electron-Ion Monte Carlo method (CEIMC) [2].
After a brief introduction to the physical problem, I will outline the main ingredients of the method and describe some applications to high pressure hydrogen.
In particular I will focus on the liquid-liquid phase transition, a first-order phase transition in the fluid phase between a molecular insulating fluid and a monoatomic metallic fluid. The existence and precise location of the transition line is relevant for planetary models. Recent experiments reported contrasting results about the location of the transition [3,4,5]. Theoretical results based on DFT are also very scattered [6,7,8,5]. We report accurate CEIMC calculations of this transition, finding results that lie between the two experimental predictions, close to that measured in diamond anvil cell experiments but at 25-30 GPa higher pressure. The transition along an isotherm is signaled by a discontinuity in the specific volume, a sudden dissociation of the molecules, a jump in electrical conductivity, and a loss of electron localization [9].
References:
[1] J.M. McMahon, M.A. Morales, C. Pierleoni and D.M. Ceperley, "The properties of hydrogen and helium under extreme conditions", Reviews of Modern Physics 84, 1607 (2012).
[2] C. Pierleoni and D.M. Ceperley, "The Coupled Electron-Ion Monte Carlo method", Lect. Notes Phys. 703, 641-683 (2006).
[3] M. Zaghoo, A. Salamat and I.F. Silvera, "Evidence of a first-order phase transition to metallic hydrogen", Phys. Rev. B 93, 155128 (2016).
[4] K. Ohta et al., "Phase boundary of hot dense fluid hydrogen", Scientific Reports 5:16560 (2015).
[5] M.D. Knudson et al., "Direct observation of an abrupt insulator-to-metal transition in dense liquid deuterium", Science 348, 1455 (2015).
[6] M.A. Morales, C. Pierleoni, E. Schwegler and D.M. Ceperley, "Evidence for a first-order liquid-liquid transition in high-pressure hydrogen from ab initio simulations", PNAS 107, 12799 (2010).
[7] M.A. Morales, J.M. McMahon, C. Pierleoni and D.M. Ceperley, "Nuclear quantum effects and nonlocal exchange-correlation functionals applied to liquid hydrogen at high pressure", Phys. Rev. Lett. 110, 065702 (2013).
[8] W. Lorenzen, B. Holst and R. Redmer, "First-order liquid-liquid phase transition in dense hydrogen", Phys. Rev. B 82, 195107 (2010).
[9] C. Pierleoni, M.A. Morales, G. Rillo, M. Holzmann and D.M. Ceperley, "Liquid-liquid phase transition in hydrogen by Coupled Electron-Ion Monte Carlo simulations", PNAS 113, 4953-4957 (2016).
With highly-pipelined vector processors, hardware accelerators (GPU, FPGA), deeper memory hierarchies, and heterogeneous designs becoming mainstream, programmability, portability, and productivity are now important facets to consider on the way to performance. In this context, I will try to illustrate how compiler research, through language developments and automation of code analysis and optimizations, addresses these concerns and why the gap between what compilers can do and what HPC users hope they could do is still very large.
Cost models (for both the users and the compilers), exchange between compiler and user (application knowledge and optimization reporting), and the limits of code analysis remain serious issues. Nevertheless, cost models (such as roofline and ECM), communication handling (such as automatic offloading), locality optimizations (such as tiling), and language design (of various kinds) still make regular progress in these directions.
We describe how random transformations can accelerate the solution of linear systems by avoiding the communication overhead due to pivoting. We have successfully applied this technique to dense linear systems (general or symmetric indefinite), resulting in efficient solvers for current parallel architectures, including multicore, GPU, and Intel Xeon Phi, already integrated in the public domain scientific library MAGMA. We also present some experiments using direct sparse factorizations where randomization is combined with sparsity-preserving strategies. Finally, we illustrate how some iterative solvers based on Krylov subspace methods can also benefit from this approach.
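A minimal sketch of the idea (simplified: the solvers in the literature use recursive "butterfly" transforms, and MAGMA's actual API is not reproduced here; all names below are illustrative): pre- and post-multiplying A by random, well-conditioned matrices makes Gaussian elimination without pivoting safe in practice, removing the pivot-search communication.

```python
import numpy as np

rng = np.random.default_rng(0)

def butterfly(n):
    # depth-1 random butterfly: B = [[R1, R2], [R1, -R2]] / sqrt(2),
    # with R1, R2 random diagonal (n assumed even). Sketch only.
    h = n // 2
    r1 = np.exp(rng.uniform(-0.05, 0.05, h))
    r2 = np.exp(rng.uniform(-0.05, 0.05, h))
    top = np.hstack([np.diag(r1), np.diag(r2)])
    bot = np.hstack([np.diag(r1), -np.diag(r2)])
    return np.vstack([top, bot]) / np.sqrt(2.0)

def lu_nopivot(M):
    # plain Gaussian elimination, no row exchanges: communication-free,
    # safe only because the randomization has "mixed" the matrix
    M = M.copy()
    for k in range(M.shape[0] - 1):
        M[k + 1:, k] /= M[k, k]
        M[k + 1:, k + 1:] -= np.outer(M[k + 1:, k], M[k, k + 1:])
    return M

n = 64
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

Bu, Bv = butterfly(n), butterfly(n)
LU = lu_nopivot(Bu.T @ A @ Bv)        # randomize, then factor w/o pivoting
Lo = np.tril(LU, -1) + np.eye(n)
Up = np.triu(LU)
x = Bv @ np.linalg.solve(Up, np.linalg.solve(Lo, Bu.T @ b))

print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))  # small residual
```

The butterfly matrices cost only O(n^2) to apply and are close to orthogonal, so the randomization neither dominates the O(n^3) factorization cost nor degrades conditioning.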
Turbulent transport of solid particles in a suspending Newtonian fluid is often present in natural and industrial contexts. A few well-known examples are sediment transport in a river bed, sandstorms, slurries, and the flocculation and sedimentation processes in the treatment of drinking water. In many cases the particles have a finite size, i.e., a size comparable to or larger than the smallest scales of the turbulent flow. In these cases turbulence, in itself one of the most challenging problems in classical physics, is greatly modified by the presence of the particles, which interact both with each other and with the suspending fluid. The continuous growth in computing power, together with the development of efficient numerical algorithms, makes the simulation of the detailed interaction of many particles with the fluid turbulence possible. Possible, but challenging. We will present an overview of the steps taken to achieve such massive simulations. In particular, we will present our numerical algorithm, elaborate on technical details such as parallelization and data handling, and finish with some relevant scientific findings.
The ONERA CFD department has been developing and supporting computational fluid dynamics software for decades, both for its own research and for industrial partners in the aeronautical domain. Nowadays, the elsA software, developed at ONERA since 1997, is one of the major CFD tools used by Airbus, Eurocopter, and Safran. In their design offices, it is massively employed to optimize airplane performance (noise or energy consumption reduction, safety improvement,...). Due to environmental constraints, noise reduction in the vicinity of airports has become a major challenge for aircraft manufacturers. The noise radiated during the landing phase is due to turbulent vortices generated by the landing gear and the flaps on the wings, which act like powerful whistles. The numerical simulation of the generated noise requires handling the complex, detailed geometry of landing gear or flaps, and solving billions of unknowns at each timestep to describe the time evolution of the turbulent vortices over millions of timesteps, in order to compute a few seconds of physical time. Therefore HPC capabilities, (re)meshing of complex geometries, and multiphysics coupling (noise generator and propagator) are crucial points for the efficiency of the software in obtaining a solution in a reasonable time. For these reasons, a demonstrator named FAST (Flexible Aerodynamic Solver Technology) has been under development for one to two years in order to prepare a major evolution of elsA in the coming years. This demonstrator aims at providing a software architecture and numerical techniques allowing better flexibility, evolvability, and efficiency, in order to perform simulations out of reach of the present CFD tools. Building on previous expertise, the services required by CFD simulations (pre/post-processing, boundary conditions, solvers, coupling...) are provided by different Python modules in FAST, whereas the CGNS standard (CFD General Notation System) is adopted as the data model for interoperability between modules. Thanks to code modernization (memory access, vectorization,...), we aim to reduce the CPU cost of this kind of computation by at least one order of magnitude on current Xeon and future Xeon Phi (KNL) or Xeon (Skylake) Intel processors.
SAMSON (Software for Adaptive Modeling and Simulation Of Nanosystems) is a new software platform for computational nanoscience available on SAMSON Connect at http://www.samson-connect.net.
SAMSON integrates modeling and simulation to aid in the analysis and design of molecular systems. For instance, an interactive quantum chemistry module (ASED-MO level of theory) makes it possible for users to build and edit structures while interactively visualizing how the electronic density is updated (Figure 1a); interactive flexing and twisting tools allow users to easily perform large-scale flexible deformations of e.g. proteins with a few mouse clicks (Figure 1b); interactive virtual prototyping of hydrocarbon systems may be used to edit and constrain graphene sheets, carbon nanotubes (Figure 1c), or build complex models (Figure 2), potentially through adaptive simulation algorithms [1][2][3].
Most importantly, a Software Development Kit allows developers to extend SAMSON's functionality by developing SAMSON Elements (modules for SAMSON), including e.g. new interaction models, editors, apps, wrappers or interfaces to existing software, connectors to web services, etc. The SAMSON Connect website (http://www.samson-connect.net) is open for developers and users to easily share SAMSON Elements.
We will present SAMSON and its general design principles, as well as specific applications to structural biology and materials science.
References
[1] S. Artemova and S. Redon, Physical Review Letters, 109:19, 2012
[2] M. Bosson et al, Journal of Computational Physics, 231:6, 2012
[3] M. Bosson et al, Journal of Computational Chemistry, 34:6, 2013
Geometric methods furnish novel and transformative strategies for tackling the supercomputing challenge in a wide range of scientific areas (physics, chemistry, biology, life sciences, materials, climate, geosciences). In particular, new numerical schemes have been developed which respect the intrinsic geometric structure of the underlying physics or (big-)data problems.
Recently, a variational strategy for introducing stochasticity into the dynamics of fluids and fluid-structure interactions, which also respects this intrinsic geometric structure, has been discovered. It already constitutes a significant High Performance Computing problem that challenges even exascale computational capability! Fortunately, we can expect the extreme parallelism of Port-Controlled Hamiltonian and Lagrangian systems for lumped descriptions to be particularly well adapted to making the cost and computing power requirements of these stochastic simulations affordable.
These geometric variational methods have been well formulated by eminent experts in mathematics, in an established cooperation with our world-class industrial partner (Thales) and computer engineers within the UMN network, supported by the French National Research Agency (ANR) to facilitate access to the European funding programmes (Horizon 2020, FETHPC, "Transition to Exascale Computing") in 2017.
This talk is about the tools and methods developed during the MACOPA ANR grant (LAPLACE, ONERA, IMFT, IRIT).
Time-consistent numerical integration of systems of transport partial differential equations can be done with fully local time stepping at the cell level, in an explicit formalism, using the so-called "asynchronous" time integration methods. Large speed-ups can be achieved for multi-scale problems. Different numerical schemes have been developed within the MACOPA free software toolkit: finite volume schemes, discontinuous Galerkin schemes, and residual distribution schemes. Higher-order time accuracy can also be achieved using specific asynchronous Runge-Kutta methods. Parallelization of the asynchronous schemes will also be discussed. Some simulation results in combustion, electromagnetism, discharge plasmas, and coupled problems will be shown.
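The local time-stepping idea can be sketched as follows (a toy event-driven integrator for independent stiff decay ODEs, one per "cell", not the MACOPA schemes themselves): each cell advances with its own stability-limited step, and a priority queue always processes the cell that lags furthest behind.

```python
import heapq
import math

# Toy sketch of asynchronous (fully local) time stepping: each "cell"
# integrates its own decay ODE y' = -k y with its own stable explicit
# Euler step dt = 0.5/k; a heap keyed by local time always advances the
# cell that is furthest behind, instead of forcing a global dt = 0.005.
T = 1.0
rates = [1.0, 10.0, 100.0]          # one rate (stiffness) per cell
state = [1.0 for _ in rates]
heap = [(0.0, i) for i in range(len(rates))]
heapq.heapify(heap)

while heap:
    t, i = heapq.heappop(heap)
    dt = min(0.5 / rates[i], T - t)      # local stability-limited step
    state[i] *= (1.0 - rates[i] * dt)    # explicit Euler with local dt
    if t + dt < T:
        heapq.heappush(heap, (t + dt, i))

exact = [math.exp(-k * T) for k in rates]
print(state, exact)
```

With a global step, every cell would pay for the stiffest one (200 steps each); here the slow cell takes only 2 steps, which is the source of the speed-up on multi-scale problems.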
Organic photovoltaic systems have the specificity that the photo-generated excitons are strongly bound, justifying the realisation of donor/acceptor interfaces at which charge separation can be achieved. The exact mechanisms associated with such a process are, however, still a matter of discussion. As such, besides the standard goal of correctly describing the electronic and optical properties of organic semiconductors, one needs to describe correctly the band offsets at the interface and to understand the related electron-hole dissociation processes. As will be shown in this presentation, the standard ab initio tools, namely DFT and its time-dependent extension (TDDFT), present severe limitations. We will show that the so-called GW and Bethe-Salpeter formalisms cure many of the problems associated with the (TD)DFT formalisms. Our GW and Bethe-Salpeter implementation with Gaussian bases allows us to tackle systems comprising a few hundred atoms. The numerical implementation details and its scaling will be presented. Finally, perspectives such as accounting for environment effects on electronic properties will be discussed.
References:
[1] "First principles calculations of charge transfer excitations in polymer-fullerene complexes: Influence of excess energy", D. Niedzialek, I. Duchemin, T. Branquinho de Queiroz, S. Osella, A. Rao, R. Friend, X. Blase, S. Kümmel, and D. Beljonne, Adv. Funct. Mater. 25 pp. 1287-1295 (2015).
[2] "Many-body Green's function study of coumarins for dye-sensitized solar cells", C. Faber, I. Duchemin, T. Deutsch, X. Blase, Phys. Rev. B, 86, 155315 (2012)
[3] "Short-range to long-range charge-transfer excitations in the zincbacteriochlorin-bacteriochlorin complex: A Bethe-Salpeter study", I. Duchemin, T. Deutsch, X. Blase, Phys. Rev. Lett. 109, 167801 (2012).
[4] "First-principles GW calculations for DNA and RNA nucleobases", Carina Faber, Claudio Attaccalite, V. Olevano, E. Runge, X. Blase, Phys. Rev. B 83, 115123 (2011).
Processes taking place in the liquid state, for instance chemical reactions, happen in a sea of solvent molecules. They are legion, and some say they don't forget, but can we predict their effect in solution?
(i) Roughly yes, using rough methods that rely on a macroscopic description of the solvent. They are fast (say, a few CPU-seconds) but unable to capture the physical, molecular nature of solvation: no packing, no orientation effects, no hydrogen bonding, among others.
(ii) Yes, accurately, using explicit simulations like molecular dynamics. But these are at least three to four orders of magnitude slower; hundreds or thousands of CPU-hours are often necessary.
(iii) We will present the molecular density functional theory and its associated code, MDFT. We will show how state-of-the-art liquid-state theory and high-performance algorithms can capture solvation effects at their inherent molecular scale, for the cost of the rough methods.
In this talk I will first present our recent work and developments on SMPI, a flexible simulator of MPI applications. In this tool, we took particular care to ensure our simulator could be used to produce fast and accurate predictions in a wide variety of situations. Although we built SMPI on SimGrid, whose speed and accuracy had already been assessed in other contexts, moving such techniques to an HPC workload required significant additional effort to accurately model communications and network topology. I will also present our recent work on StarPU/SimGrid, a custom simulator that can be used to predict the performance of task-based applications running on top of StarPU to exploit hybrid (CPU+GPU) architectures. We have demonstrated the faithfulness of StarPU/SimGrid for both modern dense and sparse linear algebra solvers.
The prediction of conversion efficiency and pollutant emissions in combustion devices is particularly challenging, as they result from very complex interactions of turbulence, chemistry, and heat exchanges at very different space and time scales. The mesh resolution in the mixing and reaction zones of the combustor is therefore of tremendous importance to reduce the turbulent combustion modeling effort and the inherent modeling errors. In the framework of finite-rate chemistry modeling at low Mach number, h-adaptation has to be supplemented by operator splitting and stiff integration algorithms in order to alleviate the time step restriction due to the chemical time scales. These algorithms introduce a large load imbalance when running on massively parallel supercomputers. However, task sharing and dynamic scheduling approaches make it possible to recover linear scaling on a large number of cores. These techniques have been implemented in the YALES2 CFD solver (http://www.coria-cfd.fr), developed at CORIA and used in several laboratories of the scientific group SUCCESS (http://success.coria-cfd.fr) and in industry. It has been specifically tailored for dealing with very large meshes of up to tens of billions of cells and for efficiently solving the low-Mach-number Navier-Stokes equations on massively parallel computers. The presentation will focus in particular on the recent development in YALES2 of dynamic h-adaptivity and operator splitting approaches, which recover good scaling on modern supercomputers.
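The operator-splitting pattern mentioned above can be sketched on a toy 1-D advection-reaction problem (Strang splitting with an exactly integrable linear reaction; this illustrates the pattern only, not YALES2's chemistry integrators): the stiff reaction is advanced in half steps around each transport step, so the global time step is set by the advective CFL rather than by the chemical time scale.

```python
import numpy as np

# Strang splitting for y_t + a y_x = -k y with stiff k: half reaction
# step, full advection step, half reaction step. The transport step uses
# first-order upwind on a periodic grid; the reaction substep is exact.
n, a, k = 200, 1.0, 50.0
dx = 1.0 / n
dt = 0.5 * dx / a                      # advective CFL, independent of k
x = (np.arange(n) + 0.5) * dx
y = np.exp(-((x - 0.3) / 0.05) ** 2)   # initial Gaussian pulse

def react(y, h):                       # exact stiff substep
    return y * np.exp(-k * h)

def advect(y, h):                      # first-order upwind, periodic
    return y - a * h / dx * (y - np.roll(y, 1))

steps = 100
for _ in range(steps):
    y = react(y, dt / 2)
    y = advect(y, dt)
    y = react(y, dt / 2)

# reference: the pulse advected by a*t and damped by exp(-k*t)
t = steps * dt
ref = np.exp(-(((x - a * t) - 0.3) / 0.05) ** 2) * np.exp(-k * t)
print(np.max(np.abs(y - ref)))
```

For a constant linear reaction the splitting is exact and only the upwind transport error remains; for real finite-rate chemistry the stiff substep is handed to a dedicated implicit ODE integrator, whose per-cell cost varies strongly and creates the load imbalance discussed in the abstract.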
Full waveform inversion (FWI) is a nonlinear data fitting process for high resolution seismic imaging. It is based on the iterative minimization of the L^2 distance between predicted and observed data. The predicted data is computed through the numerical solution of the wave equation in the time or frequency domain. The minimization is performed through local optimization techniques based on the gradient of the misfit function and an estimation of its inverse Hessian following quasi-Newton techniques. Compared to standard tomography methods, full waveform inversion is able to yield higher resolution estimates of the subsurface wave velocity. However, as it is based on the repeated computation of the full wavefield, it requires efficient, carefully implemented numerical algorithms. In addition, local optimization techniques require a sufficiently accurate initial model to converge to the desired solution. In this presentation, we shall review the standard implementation of FWI for industrial scale problems, and present a novel methodology to relax the constraint on the accuracy of the initial model. This methodology is based on computing the distance between predicted and observed data using an optimal transport distance. In particular, we will present the properties of this distance in the framework of FWI, and the algorithm we designed for its efficient evaluation for large-scale problems.
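The appeal of the optimal transport misfit can be seen in one dimension, where the Wasserstein-1 distance between two signals normalized to unit mass is simply the L1 norm of the difference of their cumulative distributions (a toy sketch with a positive Gaussian pulse standing in for a seismic trace; not the algorithm of the talk, which handles oscillatory data and large 3-D problems):

```python
import numpy as np

# 1-D illustration: unlike the L2 misfit, which saturates once two
# pulses no longer overlap, the Wasserstein-1 distance keeps growing
# smoothly with the time shift. This convexity w.r.t. shifts is the
# property exploited to relax the need for an accurate initial model.
t = np.linspace(0.0, 1.0, 1000)
dt = t[1] - t[0]

def pulse(t0):
    # toy positive pulse centred at t0 (not a real seismic wavelet)
    return np.exp(-((t - t0) / 0.02) ** 2)

def l2(f, g):
    return dt * np.sum((f - g) ** 2)

def w1(f, g):
    f = f / (f.sum() * dt)             # normalize to unit mass
    g = g / (g.sum() * dt)
    return dt * np.sum(np.abs(np.cumsum(f - g) * dt))

obs = pulse(0.5)
shifts = (0.05, 0.1, 0.2)
l2_vals = [l2(obs, pulse(0.5 + s)) for s in shifts]
w1_vals = [w1(obs, pulse(0.5 + s)) for s in shifts]
print(l2_vals)  # nearly constant once the pulses are disjoint
print(w1_vals)  # grows roughly linearly with the shift
```

The flat L2 misfit for large shifts is exactly the "cycle-skipping" pathology that traps local optimization in spurious minima; the transport distance restores a useful gradient far from the solution.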
Particle and heat exhaust physics is one of the main challenges magnetic fusion research will have to solve on its way to full scale reactors. The design of the magnetic configuration and wall shape, as well as the tuning of the edge plasma conditions, are critical in order to maintain sustainable power fluxes on plasma facing components while ensuring effective pumping of helium ashes and keeping fusion-favorable conditions in the core plasma. The physics at play involves a complex balance between plasma transport processes and volumetric sources and sinks related to atomic and molecular processes occurring between the plasma and recycling or injected neutrals.
Mean-field "transport codes" have for many years been the key tools for the understanding of edge plasma regimes and the design of future machines. These codes rely on models in which plasma turbulence has been smoothed out by averaging, and simple closures are used to model the average fluxes and stresses due to fluctuations. In particular, transverse transport is commonly described via a gradient-diffusion hypothesis in which fluxes are driven by local gradients and characterized by ad hoc diffusion coefficients whose values are determined experimentally. However, these coefficients differ from one machine to another, from one pulse to another in the same device, and even from one location to another inside a plasma. They must then be considered as free parameters, which drastically reduces the predictive capabilities of these codes. Fluctuation-related non-linearities appearing in atomic physics are not captured by mean-field models either, and could significantly influence the results.
In this presentation, I will describe the effort led at CEA Cadarache to overcome this limitation. I will specifically introduce the two main edge plasma codes developed by our team, namely SOLEDGE2D and TOKAM3X, and report on the latest advances and results obtained with these two tools. Special focus will be given to numerical challenges that still need to be solved on the way to ITER-relevant simulations.
In turbulent transport the advected quantities often exhibit spatial scales that are much smaller than the flow scales. We will describe hybrid algorithms which take this property into account to combine different methods (Eulerian and Lagrangian) on different hardware (CPU and GPU). We will also show how similar ideas can be used to define approaches which would be intermediate between Direct Numerical Simulations and Large Eddy Simulations for turbulent flows.
Modeling fractured porous media is both a major industrial and environmental challenge and a formidably complex problem, owing to the wide range of spatial and temporal scales involved. In this talk we will focus on the class of discrete fracture models, which represent fractures as a network of co-dimension-1 surfaces immersed in the surrounding 3D medium. We will study the mathematical formulation of these models for single-phase and two-phase flows, their discretization by finite-volume-type methods on polyhedral meshes, and their parallelization on distributed architectures.
Differences in simulation results may be observed from one architecture to another, or even within the same architecture. Such reproducibility failures are often due to different rounding errors generated by different orders in the sequence of arithmetic operations. It must be pointed out that the cause of differences in results may be difficult to identify: rounding errors, or a bug? Such differences are particularly noticeable with multicore processors or GPUs (Graphics Processing Units).
In this talk, we describe the principles of DSA (Discrete Stochastic Arithmetic) which enables one to estimate rounding error propagation in simulation programs. We show that DSA can be used to estimate which digits in simulation results may be different from one environment to another because of rounding errors. We present the CADNA library (http://www.lip6.fr/cadna), an implementation of DSA that controls the numerical quality of programs and detects numerical instabilities generated during the execution. A particular version of CADNA which enables numerical validation in hybrid CPU-GPU environments is described. The estimation of numerical reproducibility using DSA is illustrated by a wave propagation code which can be affected by reproducibility problems when executed on different architectures.
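The CESTAC principle behind DSA can be sketched in a few lines (a toy: CADNA itself overloads the arithmetic operators and runs three perturbed instances of the program synchronously): run the computation a few times with every operation randomly perturbed at the last bit, and estimate the number of common significant digits from the sample spread.

```python
import math
import random

# Toy sketch of the CESTAC/DSA estimate: perturb each operation by a
# random relative error of about one ulp, run three instances, and read
# the number of significant digits off the relative spread.
EPS = 2.0 ** -53
rng = random.Random(1)

def perturb(x):
    return x * (1.0 + rng.uniform(-EPS, EPS))

def perturbed_sum(data):
    acc = 0.0
    for v in data:
        acc = perturb(acc + v)
    return acc

data = [math.sqrt(i) for i in range(1, 50001)]
runs = [perturbed_sum(data) for _ in range(3)]
mean = sum(runs) / 3.0
spread = max(runs) - min(runs)
digits = -math.log10(spread / abs(mean)) if spread > 0.0 else 15.9
print(digits)  # roughly the number of digits common to all three runs
```

Digits beyond this estimate differ between environments for rounding reasons alone, which is exactly the reproducibility diagnosis described above; a drop of the estimate to zero flags a numerical instability rather than a bug.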
Addressing the major challenges of software productivity and performance portability becomes necessary to take advantage of emerging computing architectures. There is a growing demand for new programming environments in order to improve scientific productivity, to ease design and implementation, and to optimize large production codes.
We introduce Nabla, a numerical-analysis-specific language: an open-source (nabla-lang.org) Domain Specific Language (DSL) whose purpose is to translate numerical analysis algorithmic sources in order to generate optimized code for different runtimes and architectures. Nabla raises the level of abstraction, following a bottom-up compositional approach that provides a methodology for co-design between applications and underlying software layers for existing middleware or heterogeneous execution models.
One of the key concepts is the introduction of hierarchical logical time within the high-performance computing scientific community. This new dimension of parallelism is expressed explicitly to go beyond the classical single-program multiple-data or bulk-synchronous parallel programming models. Control and data concurrency can be combined consistently to achieve statically analysable transformations and efficient code generation. Shifting the complexity to the compiler offers ease of programming and a more intuitive approach, while retaining the ability to target new hardware and achieving performance portability.
Combining mesh adaptation with massively parallel computing should, in theory, yield maximum efficiency in numerical simulation. Parallel strategies generally rely on a partition of the mesh, itself computed with geometric or topological heuristics. Anisotropic, unstructured mesh adaptation requires revisiting this approach, since everything changes: the mesh, and therefore the partition as well. This mesh-partition coupling can be solved iteratively, and the whole process then becomes adaptive: mesh and partition evolve locally, driven by the error estimator.
Implicit-interface methods decouple the mesh from the geometric representation of the different phases to be taken into account in a computation. This holds for all multiphase computations, whether liquid-gas or fluid-structure. It enables a monolithic, single-mesh approach and a regularization of the interface equations over a thickness, itself controlled by mesh adaptation. Massively parallel computing then becomes possible and again depends on the high accuracy of the mesh-partitioning iterations. We will show examples of multiphase computations based on stabilized finite element methods and anisotropic adaptation, and the performance obtained on several massively parallel machines.
One task in the numerical solution of partial differential equations is to define the computational mesh and its partition among the processors used, for any given domain geometry. For many applications, it is desirable to perform all algorithms for adaptive mesh refinement (AMR) in parallel, in core, and whenever the simulation requires it. Especially for large simulations, this imposes the condition that AMR should scale at least as well as the solvers.
Forest-of-octrees AMR is an approach that offers both geometric flexibility and parallel scalability and is being used in various finite element codes and libraries. Low and high order discretizations alike are enabled by parallel node numbering algorithms that encapsulate the semantics of sharing node values between processors. More general applications, such as semi-Lagrangian and patch-based methods, require additional AMR functionalities. In this talk, we present algorithmic concepts essential for leading-edge adaptive simulations.
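The parallel-scalability ingredient of forest-of-octrees AMR is the ordering of leaves along a space-filling curve, which reduces partitioning to splitting a sorted sequence. Below is a minimal 2D (quadtree) sketch in Python; the `morton` and `partition` helpers are invented for the illustration and stand in for the far more sophisticated algorithms of libraries such as p4est:

```python
def morton(x, y, level):
    """Interleave the bits of (x, y) at a given refinement level (Z-order)."""
    code = 0
    for b in range(level):
        code |= ((x >> b) & 1) << (2 * b)
        code |= ((y >> b) & 1) << (2 * b + 1)
    return code

def partition(leaves, nprocs):
    """Sort quadtree leaves along the space-filling curve and split the
    curve into nprocs contiguous, load-balanced chunks."""
    ordered = sorted(leaves, key=lambda l: morton(l[0], l[1], l[2]))
    n = len(ordered)
    return [ordered[p * n // nprocs:(p + 1) * n // nprocs]
            for p in range(nprocs)]

# A uniformly refined 4x4 grid at level 2, split across 4 processes
leaves = [(x, y, 2) for x in range(4) for y in range(4)]
parts = partition(leaves, 4)
print([len(p) for p in parts])
```

Because the curve preserves locality, each contiguous chunk is a geometrically compact subdomain, which keeps communication volumes low after repartitioning.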
Owing to the access restrictions at the CEA Saclay site announced for the duration of COP21, we have decided to cancel the December 8 seminar and to reschedule the talks planned for that date to June 14, 2016:
Ivan Duchemin : GW and BSE calculations with FIESTA
Maximilien Levesque : Solvation at the molecular scale, the MDFT route
This way, these talks will be accessible to as many people as possible.
We propose to generalize the SIMT execution model of GPUs to general-purpose processors, and use it to design new CPU-GPU hybrid cores. These hybrid cores will form the building blocks of heterogeneous architectures mixing CPU-like cores and GPU-like cores that all share the same instruction set and programming model. The SIMT model used on some GPUs binds together threads of parallel applications so they perform the same instruction at the same time, in order to execute their instructions on energy-efficient SIMD units. Unfortunately, current GPU architectures lack the flexibility to work with standard instruction sets like x86 or ARM. Their implementation of SIMT requires special instruction sets with control-flow reconvergence annotations, and they do not support complex control flow like exceptions, context switches and thread migration. We will see how we can overcome all of these limitations and extend the SIMT model to conventional instruction sets using a PC-based instruction fetch policy. In addition, this solution enables key improvements that were not possible in the traditional SIMD model, such as simultaneous execution of divergent paths. It also opens the way for a whole spectrum of new architectures, hybrids of latency-oriented superscalar processors and throughput-oriented SIMT GPUs.
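The PC-based fetch policy mentioned above can be illustrated with a toy interpreter: each cycle, the scheduler fetches the smallest program counter among active threads and executes that instruction for every thread currently at it, so divergent paths are serialized and reconverge as soon as their PCs meet again, with no reconvergence annotations in the instruction set. The tiny instruction set and `run` loop below are invented for the example; this is a minimal model, not the actual micro-architecture:

```python
# Toy program: each "instruction" is (opcode, arg). A branch makes odd and
# even threads diverge; both paths rejoin at instruction 5.
PROGRAM = [
    ("set", 0),           # 0: r = thread id parity
    ("branch_if_odd", 4), # 1: odd threads jump to 4
    ("add", 10),          # 2: even path
    ("jump", 5),          # 3:
    ("add", 100),         # 4: odd path
    ("halt", None),       # 5: reconvergence point
]

def run(num_threads):
    """Min-PC fetch policy: fetch the smallest active PC each cycle and
    execute that instruction for every thread currently at that PC."""
    pc = [0] * num_threads
    reg = [0] * num_threads
    active = [True] * num_threads
    trace = []
    while any(active):
        cur = min(p for p, a in zip(pc, active) if a)
        group = [t for t in range(num_threads) if active[t] and pc[t] == cur]
        trace.append((cur, len(group)))
        op, arg = PROGRAM[cur]
        for t in group:
            if op == "set":
                reg[t] = t % 2
                pc[t] += 1
            elif op == "branch_if_odd":
                pc[t] = arg if reg[t] else pc[t] + 1
            elif op == "add":
                reg[t] += arg
                pc[t] += 1
            elif op == "jump":
                pc[t] = arg
            elif op == "halt":
                active[t] = False
    return reg, trace

regs, trace = run(4)
print(regs)   # even threads took the +10 path, odd threads the +100 path
```

Note how the last trace entry executes the `halt` with the full group of four threads: reconvergence happens naturally when the divergent PCs become equal again.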
PGAS programming languages and models have existed for some time. Their primary goal was to propose an alternative standard model for parallel programming, as a replacement for MPI and OpenMP. No PGAS model has emerged as dominant so far, particularly on recent parallel machines that include accelerators. Nevertheless, existing PGAS implementations keep improving their efficiency. At the same time, a trend is emerging towards interoperability with MPI, OpenMP and other standard tools, and some effort is being made to incorporate PGAS concepts into non-PGAS programming models. We will present an overview of the current PGAS landscape, existing tools and their use.
We present a fast and parallel finite volume scheme on unstructured meshes applied to air/water flows. The mathematical model is based on a three-dimensional compressible low-Mach two-phase flow model, combined with a linearized 'artificial pressure' law. This hyperbolic system of conservation laws allows an explicit scheme, improved by a block-based adaptive mesh refinement scheme. The numerical density of entropy production (the amount by which the theoretical entropy inequality is violated) is used as an error indicator. This criterion efficiently indicates where the mesh should be refined or coarsened. Moreover, the computational time is kept under control using a local time-stepping method. Finally, we show through several test cases the efficiency of the present scheme on some two- and three-dimensional dam-break problems.
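The numerical density of entropy production can be illustrated on a 1D toy problem. The sketch below uses Burgers' equation with a Lax-Friedrichs scheme (an assumption chosen for brevity; the talk's scheme is a 3D two-phase solver) and computes, in each cell, the residual of the discrete entropy inequality. Its magnitude peaks at the shock, which is exactly where such an indicator should trigger refinement:

```python
def lf_flux(ul, ur, lam):
    """Lax-Friedrichs numerical flux for Burgers' equation, f(u) = u**2/2."""
    return 0.5 * (ul * ul / 2 + ur * ur / 2) - 0.5 / lam * (ur - ul)

def lf_entropy_flux(ul, ur, lam):
    """Matching numerical flux for the entropy pair eta = u**2/2, psi = u**3/3."""
    return 0.5 * (ul ** 3 / 3 + ur ** 3 / 3) - 0.5 / lam * (ur * ur / 2 - ul * ul / 2)

def step_with_indicator(u, dx, dt):
    """One explicit step; also returns the numerical density of entropy
    production per cell, used as a refinement indicator."""
    lam = dt / dx
    unew = u[:]
    prod = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        fl = lf_flux(u[i - 1], u[i], lam)
        fr = lf_flux(u[i], u[i + 1], lam)
        unew[i] = u[i] - lam * (fr - fl)
        gl = lf_entropy_flux(u[i - 1], u[i], lam)
        gr = lf_entropy_flux(u[i], u[i + 1], lam)
        prod[i] = (unew[i] ** 2 / 2 - u[i] ** 2 / 2) / dt + (gr - gl) / dx
    return unew, prod

# Riemann data: a shock starts at mid-domain and travels right at speed 1/2
n, dx, dt = 100, 0.01, 0.004
u = [1.0] * (n // 2) + [0.0] * (n // 2)
for _ in range(50):
    u, prod = step_with_indicator(u, dx, dt)
worst = max(range(n), key=lambda i: abs(prod[i]))
print("refine around cell", worst)
```

In smooth regions the production stays close to zero, so the same criterion can also mark cells for coarsening.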
Projection on Proper Elements (PoPe) is based on an in-depth analysis of diagnostics which require only minimal modifications of the code under test and a minimal computational overhead. No dedicated simulations are needed, since the method can be used in any regime, including chaotic ones. PoPe exploits the bijection between the analytical model implemented in a code and the output of its simulations: if the equations recovered from a simulation are equivalent to the ones theoretically implemented in the code, the code is verified; if not, PoPe gives indications for finding and correcting the error. The accuracy of PoPe diagnostics also makes it possible to recover the convergence of the numerical methods. The verification of the 2D fluid code TOKAM and of the 4D gyrokinetic code TERESA, both used in plasma physics, will be presented.
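The core of PoPe can be demonstrated on a toy ODE: project the time derivative observed in the simulation output onto the operators of the implemented model and compare the recovered coefficients with the ones coded. The model, coefficients and helper names below are invented for the illustration:

```python
# Toy model du/dt = a*u + b*u**3, integrated with explicit Euler.
A_TRUE, B_TRUE = -0.5, -0.2

def simulate(u0=1.5, dt=1e-3, steps=4000):
    u = [u0]
    for _ in range(steps):
        un = u[-1]
        u.append(un + dt * (A_TRUE * un + B_TRUE * un ** 3))
    return u, dt

def recover_coefficients(u, dt):
    """PoPe-style verification: project the observed time derivative onto
    the operators of the model (here u and u**3) by least squares. If the
    recovered coefficients match the implemented ones, the code is verified."""
    d = [(u[n + 1] - u[n]) / dt for n in range(len(u) - 1)]
    g1 = u[:-1]
    g2 = [v ** 3 for v in g1]
    # Solve the 2x2 normal equations by Cramer's rule
    s11 = sum(x * x for x in g1)
    s12 = sum(x * y for x, y in zip(g1, g2))
    s22 = sum(y * y for y in g2)
    r1 = sum(x * w for x, w in zip(g1, d))
    r2 = sum(y * w for y, w in zip(g2, d))
    det = s11 * s22 - s12 * s12
    return (r1 * s22 - r2 * s12) / det, (s11 * r2 - s12 * r1) / det

u, dt = simulate()
a_rec, b_rec = recover_coefficients(u, dt)
print(a_rec, b_rec)   # should recover (-0.5, -0.2)
```

If an error were planted in `simulate` (say, `u**2` instead of `u**3`), the recovered coefficients would drift away from the implemented values, pointing at the faulty operator.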
Massively parallel simulations generate ever larger volumes of data, whose exploitation requires large storage resources, efficient networks and increasingly powerful post-processing facilities. In the coming era of exascale computing, there is an emerging need for new data analysis and visualization strategies.
We will present here an original solution to address these questions for massively parallel direct numerical simulations of transitional and turbulent flows.
Domain decomposition methods are, alongside multigrid methods, one of the dominant paradigms in contemporary large-scale partial differential equation simulation.
In this talk, I will present a lightweight implementation (HPDDM, https://github.com/hpddm/hpddm) of theoretically and numerically scalable domain decomposition preconditioners in the context of overlapping and substructuring methods. A broad spectrum of applications will be covered, ranging from the scalar diffusion equation to Maxwell's equations, and including incompressible linear elasticity. Numerical results with hundreds of processes will be provided, clearly showing the effectiveness and robustness of the proposed approaches.
HPDDM is currently interfaced with two finite element libraries, FreeFem++ (http://www.freefem.org/ff++/) and Feel++ (http://www.feelpp.org/), which allows for quick prototyping and thorough testing.
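To give an idea of what an overlapping domain decomposition iteration does, here is a textbook two-subdomain alternating Schwarz method on the 1D Poisson problem. This is only a didactic sketch, not HPDDM's preconditioners, which add coarse spaces and act as preconditioners for Krylov methods in parallel:

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal)."""
    n = len(d)
    bp, cp, dp = b[:], c[:], d[:]
    for i in range(1, n):
        m = a[i] / bp[i - 1]
        bp[i] -= m * cp[i - 1]
        dp[i] -= m * dp[i - 1]
    x = [0.0] * n
    x[-1] = dp[-1] / bp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (dp[i] - cp[i] * x[i + 1]) / bp[i]
    return x

def solve_subdomain(u, f, h, lo, hi):
    """Dirichlet solve of -u'' = f on indices lo..hi, with interface
    values taken from the current global iterate u."""
    m = hi - lo + 1
    a, b, c = [-1.0] * m, [2.0] * m, [-1.0] * m
    d = [h * h * f[lo + i] for i in range(m)]
    d[0] += u[lo - 1]      # left interface value
    d[-1] += u[hi + 1]     # right interface value
    u[lo:hi + 1] = thomas(a, b, c, d)

# -u'' = 1 on (0,1), u(0) = u(1) = 0; exact solution u(x) = x(1-x)/2
n = 99                     # interior points
h = 1.0 / (n + 1)
f = [1.0] * (n + 2)
u = [0.0] * (n + 2)        # includes the two boundary values
for it in range(200):      # alternating Schwarz on two overlapping halves
    solve_subdomain(u, f, h, 1, 60)    # subdomain 1: indices 1..60
    solve_subdomain(u, f, h, 40, n)    # subdomain 2: indices 40..99
print(u[50])               # converges to the value at x = 0.5
```

Each subdomain solve only needs its neighbor's current interface values, which is what makes the method naturally parallel; without a coarse space, however, the iteration count grows with the number of subdomains, which is precisely what scalable preconditioners like those in HPDDM fix.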
With the heterogeneity of modern parallel architectures and the democratization of parallel hardware in our day-to-day lives, developing scientific applications has become, in the past few years, a complex, HPC-centric discipline. However, numericians are not computer scientists, and even if some of them take parallelization courses, the complexity of modern architectures and the required knowledge of low-level programming demand too much work and time for them to actually reach the performance peak of a given machine.
As a result, one of the main research topics in HPC is how to abstract the intricate details of HPC programming away from non-specialists. Some HPC programming models try to solve general problems but, to stay efficient, most of them do not totally hide parallelization; they merely simplify it. Other HPC programming models try to make the parallelization of codes transparent or implicit. In this case, though, the model has to be specialized to a specific domain where parallelization techniques are known and can be applied transparently.
I will present SIPSim (Structured Implicit Parallelism for scientific SIMulations), an implicitly parallel programming model for mesh-based numerical simulations. This model strikes a balance between keeping a sequential programming style, and thus the development freedom sought by numericians, and transparently obtaining high-performance simulations on distributed memory architectures such as clusters. I will also show that this model can be applied to more complex multi-physics simulations. The parallelization of two real-case simulations will be presented using two implementations of the SIPSim model: one for a water flow simulation on 2D surfaces, the other for an arterial blood flow simulation. Finally, I will extend this work towards the use of component and workflow models to implement the SIPSim model.
The infiltration of rain water through soils is a determining factor in numerous engineering applications (water resources, environment, geotechnics, ...) as well as in various natural phenomena, such as the weathering of continental surfaces (e.g. Goddéris et al., 2012), which is a key process of the carbon cycle (Walker et al., 1981). A classical way to quantify such flows in variably saturated porous media consists in the numerical resolution of the Richards equation, which is 3D, transient and non-linear (Richards, 1931). However, studying the hydrological behaviour of soils under evolving conditions (climate change, land use change, ...) requires modelling at large spatial scales (km² and beyond) and on long time scales (decades, centuries). Massively parallel computing is a major way of dealing with such large-scale problems (see for example Miller et al., 2013). We present here a massively parallel solver for the Richards equation, RichardsFOAM.
This solver has been developed in the framework of the open-source CFD toolbox OpenFOAM® (Jasak, 1996; Weller et al., 1998). RichardsFOAM is able to solve large-scale problems thanks to the good parallel performance of OpenFOAM® (with RichardsFOAM, about 90% parallel efficiency with 1024 cores in both strong and weak scaling). This performance will allow us to propose mechanistic modelling of water fluxes at the space and time scales relevant to the study of weathering processes (> km², > decades).
A detailed study of the parallel performance of RichardsFOAM will be presented (strong and weak scaling, impact of I/O, impact of numerical stiffness), as well as an example of application to a field data set. The associated scientific perspectives will be discussed.
Python is widely used to quickly prototype numerical kernels, thanks to the numpy/scipy/matplotlib/ipython stack. But when it comes to performance, it still lags behind equivalent native code. The Pythran compiler proposes a solution to this problem by statically compiling and optimizing high-level Python/numpy kernels into parallel, vectorized C++11 code. The leitmotiv is to take high-level numpy code, without the need for explicit loops, and rely on the semantics of the numpy operations to generate efficient code.
The talk will present both how Pythran works and how to make it work!
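A typical Pythran workflow is to annotate a plain numpy kernel with an export comment; the module still runs unchanged under CPython, while `pythran <file>.py` compiles it to native code. A small sketch (the kernel is invented for the example; check the Pythran documentation for the exact type-specification syntax of your version):

```python
import numpy as np

# The line below is a Pythran annotation: it is a plain comment to the
# CPython interpreter, so this module runs unchanged without Pythran,
# while `pythran thismodule.py` would compile it to vectorized C++.
#pythran export pairwise_dist(float64[:,:])
def pairwise_dist(X):
    """High-level numpy code with no explicit loop: Pythran relies on the
    semantics of these array operations to generate efficient code."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

X = np.array([[0.0, 0.0], [3.0, 4.0]])
print(pairwise_dist(X))
```

Because the annotation lives in a comment, the same file serves as the prototype, the test case and the input to the compiler, which is what makes this approach attractive for quick prototyping.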
High performance computing hardware is increasingly complex. Servers feature at least two processors with many cores each, shared caches, non-uniform memory access, and possibly accelerators. The actual hardware organization of these resources has a deep impact on HPC application performance, since computation and data transfer speeds depend on data locality. Unfortunately, this organization, as viewed from the application, is unpredictable. Resources can be hierarchically organized or horizontally ordered differently from one machine to another, making topology assumptions highly non-portable and causing application performance to vary significantly even on apparently similar platforms.
This lecture will first detail the complexity of current hardware architectures and explain how it matters to HPC application performance. Then we will introduce the hwloc tool (Hardware Locality) which aims at hiding all these deep hardware details and non-portability issues. We will explain how hwloc models the hardware resource organization in an abstracted and portable manner and exposes it in a simple way to applications. Finally we will show how hwloc can ease the building of portable locality optimization in HPC libraries and applications.
We are interested in the simulation of fast transients involving interacting fluids and structures. The system representation is very general, with an Eulerian or ALE formulation for the fluids, a Lagrangian one for the structures, and numerous kinematic constraints enforcing the couplings between entities (fluid-structure interaction or contact between structures, for example).
Accurately tracking fronts in such a context calls for Adaptive Mesh Refinement (AMR). The goal is to follow waves, in the structures or the fluids, as well as physical interfaces such as immersed structures or interfaces between different fluid phases.
This talk is devoted to the specifics of such developments in a large code (EPX, http://www-epx.cea.fr in the present case): accommodating the characteristics of the existing data structure, moving to parallel computing with as much efficiency and genericity as possible, handling multiple mesh adaptation criteria, and taking kinematic constraints into account without restriction.
In this talk, I will present a Molecular Dynamics N-body simulation for multi-accelerator heterogeneous architectures. It is based on OpenCL and features a new force computation algorithm with a low memory footprint that maintains a high performance level. I will describe the implementation and detail how we had to slightly adapt the kernels for accelerators as different as NVIDIA GPUs and the Intel Xeon Phi to provide better opportunities for code optimization. Finally, I will present our research perspectives on enhancing the OpenCL ecosystem to achieve better portability and performance.
The end of Moore's law has brought the problem of performance portability back to the fore. A major technological shift came with the multi-core processor concept, which rapidly established itself in general-purpose processors, embedded processors and compute accelerators. Computing platforms have become highly parallel and heterogeneous, with a particularly complex memory hierarchy. Application compute kernels must be continually optimized to exploit the growing complexity of architectures and regularly adapted to architectural evolutions. One must therefore optimize, port and maintain these kernels on a regular basis; otherwise it is impossible to get the most out of modern platforms in terms of application performance, and an often very large share of the available computing power is wasted.
There is also the problem of the maintainability and durability of these kernels, especially as the number of architectures keeps growing (Intel Xeon, Xeon Phi, GPU, FPGA, ARM, ...), using different languages (C, Fortran, CUDA, OpenCL), different programming paradigms (OpenMP, OpenACC, MPI, ...) and different compilers that try to exploit intrinsic instructions, for vectorization for example. Advances in automatic parallelization by compilers have not been sufficient, and the parallelization/vectorization work still rests on programmers.
To address this problem, the INRIA Corse joint team has developed an automatic code generation environment, BOAST, for evaluating the various possible optimizations on a given architecture by generating multiple source-code variants whose performance is then analyzed. On two codes, BigDFT and SPECFEM3D, BOAST has demonstrated that it can outperform already optimized routines. To do so, one simply expresses the compute kernels in a dedicated language, describing the possible optimizations (loop unrolling, vectorization, ...). BOAST generates different versions of the kernels in the target languages (C, Fortran, CUDA or OpenCL), then evaluates their performance by running them on the target platform. The programmer can then select the fastest version, best suited to the target architecture.
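The generate-then-benchmark loop at the heart of this approach can be mimicked in a few lines of Python. This is a toy autotuner, not the BOAST DSL (which emits C, Fortran, CUDA or OpenCL rather than Python); the kernel and the tuning space are invented for the example:

```python
import timeit

def make_saxpy(unroll):
    """Generate a source variant of saxpy with a given unroll factor,
    mimicking how a generator emits and benchmarks many versions of one
    kernel before keeping the fastest."""
    body = "\n".join(
        f"        y[i + {k}] += a * x[i + {k}]" for k in range(unroll))
    src = (f"def saxpy(a, x, y):\n"
           f"    for i in range(0, len(x), {unroll}):\n{body}\n")
    ns = {}
    exec(src, ns)
    return ns["saxpy"]

n = 1 << 12
results = {}
for unroll in (1, 2, 4, 8):        # the tuning space explored
    saxpy = make_saxpy(unroll)
    x, y = [1.0] * n, [2.0] * n
    results[unroll] = timeit.timeit(lambda: saxpy(0.5, x, y), number=20)
best = min(results, key=results.get)
print("fastest unroll factor:", best)
```

The important point is that every variant is generated from a single abstract description and the selection is made by measurement on the target machine, not by guessing.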
BOAST was funded and developed in the context of the FP7 European project Mont-Blanc (2011-2016).
Despite the significant progress in blood flow simulation in recent years, the accurate description of the complex multi-physics, multi-scale phenomena characterizing blood flow in realistic geometries remains a very challenging task.
The aim of the first part of the lecture is to present our advances in developing an open-source framework for hemodynamics, and to assess this framework through validation against experimental data for fluid flow in an idealized medical device with rigid boundaries. The core is built on a flexible generic library called Feel++ (Finite Element method Embedded Language in C++, www.feelpp.org), which allows for arbitrary-order continuous and discontinuous Galerkin methods in 1D, 2D and 3D, seamlessly in parallel.
In the second part of the talk, we illustrate the capabilities of this framework by applying it to the development of a computational model for blood flow in the cerebral venous system. The study focuses on the influence of different modeling assumptions (inflow/outflow boundary conditions, viscosity models) on the flow (velocity field, wall shear stresses) and it constitutes a first step towards incorporating and quantifying different sources of uncertainty in the modeling process.
The talk is based on joint works with V. Chabannes, M. Ismail, C. Prud'homme and R. Tarabay.
The detection of gravitational waves is eagerly expected as one of the most important scientific discoveries of the next decade. A worldwide effort is now actively pursuing this goal both at an experimental level, by building ever more sensitive detectors, and at a theoretical level, by improving the modelling of the numerous sources of gravitational waves. Much of this theoretical work is done through the solution of the Einstein equations in those nonlinear regimes where no analytic solutions are possible or known. I will review how this is done in practice and highlight the considerable progress made recently in the description of the dynamics of binary systems of black holes and neutron stars. I will also discuss how the study of these systems provides information well beyond that contained in the gravitational waveforms and opens very exciting windows on the relativistic astrophysics of GRBs and of the cosmological evolution of massive black holes.
Heterogeneous multi-core platforms, mixing regular cores and dedicated accelerators, are now so widespread that they have become today's canonical computing architecture. To fully tap the potential of these hybrid platforms, both in terms of computational efficiency and power savings, pure offloading approaches, in which the main part of the application runs on regular processors and offloads specific parts to accelerators, are not sufficient. The real challenge is to allow applications to use the cumulated computing power of the entire machine, by scheduling parallel jobs dynamically over the whole set of available processing units. The Inria team RUNTIME has been studying this problem of scheduling tasks on heterogeneous multi/many-core architectures for many years, which led to the design of the StarPU runtime system. StarPU is capable of scheduling tasks over heterogeneous, accelerator-based machines. Its core engine integrates both a scheduling framework and a software virtual shared memory (DSM), working in close relationship. The scheduling framework maintains an up-to-date, self-tuned database of kernel performance models over the available computing units to guide the task/unit mapping algorithms. The DSM keeps track of data copies within accelerator-embedded memories and features a data-prefetching engine, avoiding expensive redundant memory transfers and enabling task computations to overlap unavoidable memory transfers. These facilities have been used successfully in the field of parallel linear algebra algorithms, notably: StarPU is now one of the target backends of the University of Tennessee at Knoxville's MAGMA linear algebra library. The StarPU environment typically makes it much easier to exploit heterogeneous multicore machines.
Thanks to the Sequential Task Flow programming paradigm, programs may submit tasks to the StarPU engine using the same logical order as the sequential version, thus preserving initial algorithm layouts and loop patterns.
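The Sequential Task Flow idea can be sketched with a toy engine: tasks are submitted in the program's sequential order with declared data access modes, and read-after-write dependencies are inferred automatically. The class below is invented for the illustration; a real runtime such as StarPU also tracks anti- and output dependencies and schedules ready tasks concurrently on heterogeneous units:

```python
class STF:
    """Toy Sequential Task Flow engine: submission order plus access
    modes (R: read, W: write, RW: both) determine the dependency graph."""
    def __init__(self):
        self.tasks = []            # (function, accesses)
        self.last_writer = {}      # data name -> index of last writing task
        self.deps = {}             # task index -> set of task indices

    def submit(self, func, **accesses):
        tid = len(self.tasks)
        deps = set()
        for name, mode in accesses.items():
            if name in self.last_writer:   # read/write after a write
                deps.add(self.last_writer[name])
            if "W" in mode:
                self.last_writer[name] = tid
        self.tasks.append((func, accesses))
        self.deps[tid] = deps
        return tid

    def run(self, data):
        # Executed here in submission order; a real runtime would launch,
        # possibly in parallel, any task whose dependencies are satisfied.
        for func, _ in self.tasks:
            func(data)
        return self.deps

stf = STF()
stf.submit(lambda d: d.update(a=1), a="W")
stf.submit(lambda d: d.update(b=d["a"] + 1), a="R", b="W")
stf.submit(lambda d: d.update(a=d["a"] * 10), a="RW")
data = {}
deps = stf.run(data)
print(data, deps)
```

The program text keeps its sequential shape (three `submit` calls in algorithm order), while the engine extracts the task graph from the declared accesses: this is the property that lets existing loop nests be ported to a task runtime with minimal restructuring.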
Neuroscience has produced a large body of data on the function and anatomy of the brain, but the transformation of this knowledge into a coherent understanding has been limited. Computational modeling can integrate such fragmented data into models of brain structures that satisfy the broad range of constraints imposed by experiments, thus advancing our understanding of their computational role and of their implementation in the neural substrate.
In the first part of the lecture I will present a comprehensive multi-scale spiking model of cat primary visual cortex satisfying a range of anatomical, statistical and functional properties. It considers cortical layers 4 and 2/3, corresponding to a 5x5 mm patch of V1. We have subjected the model to numerous visual stimulation protocols covering a wide range of input statistics, from sparse noise to natural scenes with simulated eye movements. The model expresses, over multiple scales, a number of statistical and functional properties previously identified experimentally, including: spontaneous activity with a physiologically plausible resting conductance regime; contrast-invariant orientation-tuning width; a realistic adaptive interplay between evoked excitatory and inhibitory conductances; center-surround interaction effects; and stimulus-dependent changes in the precision of the neural code as a function of input statistics. This data-driven model offers numerous insights into how the studied properties interact, and thus contributes to a better understanding of visual cortical dynamics.
In the second part of the talk I will discuss the technology that was required to simulate such a model and share our experience of running it on a small-scale cluster. We will show, however, that so far the main challenges were due not to the scale of the model but to its complexity, which stems from the heterogeneous nature of the model itself and from the complexity of the experimental environment in which the model is tested. To address these issues we have built a highly automated workflow covering the specification of stimulation and experimental protocols, higher-level model design, advanced analysis and visualization libraries, and a central storage module using a common meta-data specification.
The cortical microcircuit, the network comprising a square millimeter of brain tissue, has been the subject of intense experimental and theoretical research. The lecture first introduces a full-scale model of this circuit at cellular and synaptic resolution: the model comprises about 100,000 neurons and one billion local synapses connecting them. The purpose of the model is to investigate the effect of network structure on the observed activity. To this end it incorporates cell-type-specific connectivity but identical single-neuron dynamics for all cell types. The emerging network activity exhibits a number of the fundamental properties of in vivo activity: asynchronous irregular activity, layer-specific spike rates, higher spike rates of inhibitory neurons compared to excitatory neurons, and a characteristic response to transient input. Despite this success, the explanatory power of such local models is limited, since half of the synapses of each excitatory nerve cell have non-local origins, and at the level of areas the brain constitutes a recurrent network of networks.
The second part of the lecture therefore argues for the need of brain-scale models to arrive at self-consistent descriptions of the multi-scale architecture of the network. Such models will enable us to relate the microscopic activity to mesoscopic measures and functional imaging data and to interpret those with respect to brain structure.
The third part of the lecture introduces the technology required to simulate such models and discusses the performance of the upcoming NEST simulation code. Brain-scale networks exhibit a breathtaking heterogeneity in the dynamical properties and parameters of their constituents. Over the past decade researchers have learned to manage the heterogeneity with efficient data structures. Already early parallel codes had distributed target lists, consuming memory for a synapse on just one compute node. As petascale computers with some 100,000 nodes become increasingly available for neuroscience, new challenges arise: Each nerve cell contacts on the order of 10,000 other neurons and thus has targets only on a fraction of all nodes; furthermore, for any given source neuron, at most a single synapse is typically created on any node. The heterogeneity in the synaptic target lists thus collapses along two dimensions: the dimension of the types of synapses and the dimension of the number of synapses of a given type. The latest technology takes advantage of this double collapse using metaprogramming techniques and orchestrates the full memory of petascale computers like JUQUEEN and the K computer into a single brain-scale simulation.
OpenMP 4.0 substantially enlarges the traditional OpenMP programming environment, in particular through the task API for handling recursive or irregular parallel patterns. We discuss the motivations for the integration of advanced programming interfaces and illustrate their relevance with a few examples. All the other major new features proposed by OpenMP 4.0 - cancellation of parallel constructs, thread affinity, vectorization and GPU programming directives - are also reviewed.
The modeling and simulation of multiphase reacting flows covers a large spectrum of applications, ranging from combustion in automotive and aeronautical engines to atmospheric pollution, as well as biomedical engineering. In this seminar we will mainly focus on a disperse liquid phase carried by a gaseous flow field, which can be either laminar or turbulent; the spray can moreover be polydisperse, that is, made of droplets with a large size spectrum. Such flows thus involve a large range of temporal and spatial scales which have to be resolved in order to capture the dynamics of the phenomena and to provide reliable and ultimately predictive simulation tools. Even though the power of computing resources increases steadily, such very stiff problems can lead to serious numerical difficulties and prevent efficient multi-dimensional simulations.
We discuss recent developments in Adjoint Algorithmic Differentiation (AAD) tool support, e.g. in the context of large-scale parameter calibration methods. First- and second-order derivative-based approaches to solving the underlying numerical optimization problems are considered. Specific mathematical and structural properties of the underlying simulation are exploited. The superiority of AAD software tools over manual approaches to the implementation of adjoint numerical models, as well as over numerical approximation of the required sensitivities by finite differences, is illustrated.
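The adjoint (reverse) mode underlying AAD tools can be sketched with a minimal tape: a forward sweep records each operation and its local partial derivatives, then a single reverse sweep propagates adjoints from the output back to all inputs, at a cost independent of the number of inputs. A toy scalar version follows (real AAD tools operate on C/C++/Fortran code, by source transformation or operator overloading, and at vastly larger scale):

```python
import math

class Var:
    """Minimal tape-based reverse-mode ('adjoint') scalar AD."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents     # pairs (parent Var, local partial)
        self.adjoint = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def sin(self):
        return Var(math.sin(self.value), ((self, math.cos(self.value)),))

def gradient(output, inputs):
    """Single reverse sweep: propagate adjoints from output to inputs."""
    order, seen = [], set()
    def visit(v):               # topological order of the recorded tape
        if id(v) not in seen:
            seen.add(id(v))
            for p, _ in v.parents:
                visit(p)
            order.append(v)
    visit(output)
    output.adjoint = 1.0
    for v in reversed(order):   # reverse sweep
        for parent, local in v.parents:
            parent.adjoint += v.adjoint * local
    return [x.adjoint for x in inputs]

# f(x, y) = x*y + sin(x);  df/dx = y + cos(x),  df/dy = x
x, y = Var(2.0), Var(3.0)
f = x * y + x.sin()
g = gradient(f, [x, y])
print(g)
```

This one-reverse-sweep property is what makes adjoints attractive for calibration problems with many parameters and a scalar objective, and it is exactly the structure that AAD tools generate automatically from the simulation source.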
We will introduce an automatic way, the GENEO method, of constructing coarse spaces for parallel methods of Schwarz, BNN and FETI type, which adapts automatically to highly heterogeneous coefficients and to irregular partitions, as long as one works with a finite element discretization of a symmetric positive problem. This includes Darcy and 2D and 3D elasticity. The GENEO method has been implemented in the finite element software FreeFem++. We will present scalability tests and comparisons with the best multigrid solvers for two- and three-dimensional problems with up to several billion unknowns on several thousand cores.
Climate modeling has become one of the major technical and scientific challenges of the century, as a heated controversy has arisen over questions of climate change. One of the most important limitations of climate models, which couple ocean and atmosphere components, is the low spatial resolution (~100 km) imposed by the computational cost. This constraint considerably limits the realism of the physical processes parameterized in the models. The imminent arrival of petaflop machines in France offers a unique opportunity to develop new climate models, with the aim of reducing the recurring biases and uncertainties in climate simulations and in long-term projections of global change. Our approach aims to build a modeling platform for multi-scale coupled ocean-atmosphere simulations, by introducing high-resolution oceanic and atmospheric "zoom" models in these two regions within a global climate model. Following this strategy, we will be able to represent the fine oceanic and atmospheric scales of dynamical processes and to allow regional processes to feed back on the global climate. This presentation will introduce the technical and scientific problems related to climate modeling in general and to this project in particular. We will then illustrate the first results obtained from simulations carried out on Curie.
High performance computing has helped raise computational biochemistry to new levels of accuracy and power, recognized last year by the Nobel prize awarded to Karplus, Levitt, and Warshel: the first Nobel prize ever to reward simulation work. Resources like GPUs, volunteer computing and specialized computers bring protein folding and ligand binding within reach of simulation timescales. Today's computers also allow us to upgrade our physical models and introduce very sophisticated treatments of the molecular interactions. Nevertheless, it is still a major challenge to simulate large cellular machines over realistic timescales. Another challenge is to explore the vast space of possible protein mutations, in search of new proteins or biocatalysts. We will illustrate some of these achievements and challenges, with examples from the literature and our own work. Applications from our lab include theoretical issues with the calculation of biomolecular thermodynamics, and volunteer distributed computing for the design of new proteins.
POSTPONED The modeling and simulation of multiphase reacting flows covers a large spectrum of applications ranging from combustion in automobile and aeronautical engines to atmospheric pollution as well as biomedical engineering. In the framework of this seminar, we will mainly focus on a disperse liquid phase carried by a gaseous flow field which can be either laminar or turbulent; however, this spray can be polydisperse, that is, constituted of droplets with a large size spectrum. Thus, such flows involve a large range of temporal and spatial scales which have to be resolved in order to capture the dynamics of the phenomena and provide reliable and eventually predictive simulation tools. Even if the power of computer resources regularly increases, such very stiff problems can lead to serious numerical difficulties and prevent efficient multi-dimensional simulations. The purpose of this seminar is to introduce the Eulerian modeling of polydisperse evaporating sprays for various applications, that is, the disperse liquid phase carried by a gaseous flow field is modeled by "fluid" conservation equations. Such an approach is very competitive for real applications since it has a strong potential for optimization on parallel architectures and leads to an easy coupling with the gaseous flow field resolution. We will show that all the necessary steps in order to develop a new generation of computational code have to be designed at the same time with a high level of coherence: mathematical modeling through Eulerian moment methods, development of new dedicated stable and accurate numerical methods, implementation of optimized algorithms, as well as verification and validation of both model and methods using other codes and experimental measurements.
We will introduce both a new class of models and their mathematical analysis for the direct numerical simulation of spray dynamics, even in the presence of coalescence and break-up, as well as a set of dedicated numerical methods, and prove that such an approach has the ability, once validated, to lead to high performance computing on parallel architectures. We will finally present a synthesis of recent contributions, which aim at: (1) transferring the proposed models into identified codes for industrial applications in the fields of solid propulsion and aeronautical and automotive engines, and (2) extending the previous work to turbulent flows, where some scales cannot be resolved and have to be modeled, and where dedicated numerical methods have to be designed.
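The quantity at the heart of Eulerian moment methods can be sketched in a few lines (a toy quadrature on a hypothetical size distribution, purely for illustration): instead of transporting the full droplet size distribution n(S), the "fluid" equations transport a few of its size moments M_k.

```python
import numpy as np

# Toy sketch of size moments for a polydisperse spray: the Eulerian
# moment approach replaces the full distribution n(S) by a small set
# of integral moments M_k, here approximated by a Riemann sum.

def size_moments(S, n_of_S, orders):
    """M_k = integral of S^k n(S) dS, by a simple Riemann sum."""
    dS = S[1] - S[0]
    return [float((S ** k * n_of_S).sum() * dS) for k in orders]

S = np.linspace(0.0, 10.0, 2001)   # droplet size grid (arbitrary units)
n_of_S = np.exp(-S)                # hypothetical exponential distribution
M0, M1, M2 = size_moments(S, n_of_S, [0, 1, 2])
# for n(S) = exp(-S) the exact moments are M_k = k!  (M0 = 1, M1 = 1, M2 = 2)
```

The modeling challenge discussed in the talk is then to close and transport such moments stably and accurately, not merely to compute them.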
Predicting the performance of fusion plasmas in terms of amplification factor, namely the ratio of the fusion power over the injected power, is among the key challenges in fusion plasma physics. In this perspective, turbulence and heat transport need to be modeled within the most accurate theoretical framework, using first-principle non-linear simulation tools. The gyrokinetic equation for each species, coupled to Maxwell's equations, is an appropriate self-consistent description of this problem. A new class of global full-f codes has recently emerged, solving the gyrokinetic equation for the entire distribution function on a large radial domain of the tokamak and using some prescribed external heat source [1]. Such simulations are extremely challenging and require state-of-the-art high performance computing (HPC). The non-linear global full-f gyrokinetic 5D code GYSELA, which focuses on the electrostatic toroidal branch of the Ion Temperature Gradient driven turbulence with adiabatic electrons, is one of them. One particularity of the code is to solve the self-consistent problem on a fixed grid with a Backward Semi-Lagrangian scheme [2]. Despite the non-locality of this method, the new two-ion-species version of the code has been successfully ported on BlueGene architecture with a relative efficiency of 91% on 458 752 cores (weak scaling). The hybrid OpenMP/MPI parallelization which makes such performance possible will be detailed. One mid-term objective is to implement kinetic electrons in the code. This will require an increase of the mesh size by a factor of the order of 10^3 and a decrease of the algorithm time step by a factor of 10. Present simulations already require petascale computing resources. We will discuss the various approaches we currently investigate to prepare the code for our exascale future needs.
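The backward semi-Lagrangian idea can be shown on a drastically simplified 1D analogue (GYSELA works in 5D with higher-order interpolation; the constant advection speed, periodic grid and linear interpolation here are illustrative assumptions): follow each grid point's characteristic backward in time and interpolate the old solution at the foot of the characteristic.

```python
import numpy as np

# Toy 1D backward semi-Lagrangian advection step on a fixed periodic
# grid: the foot of the characteristic of grid point i is located, and
# the previous solution is interpolated (linearly) at that foot. The
# scheme is not limited by a CFL condition on the time step.

def semi_lagrangian_step(f, u, dt, dx):
    n = f.size
    i = np.arange(n)
    foot = i - u * dt / dx              # foot in fractional grid coordinates
    i0 = np.floor(foot).astype(int)
    w = foot - i0                       # linear interpolation weight
    return (1.0 - w) * f[i0 % n] + w * f[(i0 + 1) % n]

n, dx, u = 64, 1.0, 1.0
f = np.exp(-0.1 * (np.arange(n) - n / 2) ** 2)
# with u*dt == dx the foot lands exactly on the neighbouring grid point,
# so one step reproduces an exact shift of the profile
f1 = semi_lagrangian_step(f, u, dt=1.0, dx=dx)
```

The interpolation at arbitrary foot points is what makes the method non-local, the property mentioned above as a challenge for distributed-memory parallelization.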
In this talk we will describe in detail a Density Functional Theory method based on a Daubechies wavelet basis set, named BigDFT. We will see that, thanks to wavelet properties, this code shows systematic convergence, very good performance and an excellent efficiency for parallel calculations. BigDFT code operations are also well-suited for GPU acceleration. We will discuss how the challenge of fruitfully benefiting from this new technology can be reconciled with the needs for robustness and flexibility of a complex code like BigDFT.
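To give a flavour of the basis set, here is one level of the Haar transform, the simplest member of the Daubechies wavelet family (BigDFT actually uses higher-order Daubechies wavelets; this is only a toy illustration): the signal is split into local averages and local details, and the transform is exactly invertible and orthonormal.

```python
import numpy as np

# One level of the Haar wavelet transform (Daubechies-2): scaling
# coefficients are local averages, wavelet coefficients are local
# details; with the 1/sqrt(2) normalization the transform is
# orthonormal, hence exactly invertible and energy preserving.

SQRT2 = np.sqrt(2.0)

def haar_forward(f):
    """f has even length; returns (averages, details)."""
    return (f[0::2] + f[1::2]) / SQRT2, (f[0::2] - f[1::2]) / SQRT2

def haar_inverse(avg, det):
    f = np.empty(2 * avg.size)
    f[0::2] = (avg + det) / SQRT2
    f[1::2] = (avg - det) / SQRT2
    return f

f = np.array([4.0, 2.0, 5.0, 7.0, 1.0, 1.0, 3.0, 0.0])
avg, det = haar_forward(f)
g = haar_inverse(avg, det)        # perfect reconstruction
```

Locality of the basis functions plus exact invertibility are the ingredients behind the systematic convergence and the parallel efficiency mentioned above.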
Computational biology greatly benefits from approaches such as molecular dynamics simulations to study complex molecular assemblies. In this context, interactive visualization, manipulation and analysis aids hypothesis generation and exploration of large datasets. Integration of experimental data (SAXS, CryoEM, ...) in this modeling process is challenging. I will illustrate these issues through a) work on a complete dystrophin filament model [1] using our BioSpring simulation engine and b) interactive simulations and analysis of membrane proteins [2]. To tackle the corresponding visualization challenges, my group recently developed the UnityMol framework [3], based on the Unity3D game engine. A particular focus lies on interactive exploration and manipulation using tools such as haptic devices or very recently the LeapMotion controller. Possible display platforms are mobile devices, desktop workstations, display walls and virtual reality setups (CAVE, workbench, ...). [1] Molza et al., FD169: Innovative interactive flexible docking method for multi-scale reconstruction elucidates dystrophin molecular assembly, Faraday Discussion #169, 2014 [2] Dreher et al., ExaViz: a Flexible Framework to Analyse, Steer and Interact with Molecular Dynamics Simulations, Faraday Discussion #169, 2014 [3] Lv et al., Game on, Science - how video game technology may help biologists tackle visualization challenges, PLoS ONE 8(3):e57990, 2013 (http://unitymol.sourceforge.net)
Performing large, intensive or non-trivial computations on array-like data structures is one of the most common tasks in scientific computing, video game development and other fields. This fact is backed up by the large number of tools, languages and libraries designed to perform such tasks. If we restrict ourselves to C++-based solutions, more than a dozen such libraries exist, from BLAS/LAPACK C++ bindings to template metaprogramming based libraries like Blitz++ or Eigen2. While all of these libraries provide good performance or good abstraction, none of them seems to fit the needs of so many different user types. Moreover, as parallel system complexity grows, the need to maintain all those components quickly becomes unwieldy. This talk explores various software design techniques - like Generative Programming, Metaprogramming and Generic Programming - and their application to the implementation of various parallel computing libraries in such a way that abstraction and expressiveness are maximized while the cost of abstraction over raw efficiency is minimized. As a conclusion, we'll skim over various applications and see how they can benefit from such tools.
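The expression-template idea behind libraries like Blitz++ and Eigen can be transposed to a few lines of plain Python (a sketch of the design, not of any C++ library's internals): arithmetic on arrays builds a lightweight expression tree instead of temporaries, and a single fused loop evaluates the whole expression element by element on assignment.

```python
# Sketch of lazy expression evaluation: a + b * c builds a tree of
# BinOp nodes; indexing the tree evaluates one element at a time, so
# the whole expression is computed in one fused pass with no
# intermediate array ever allocated.

class Expr:
    def __add__(self, other):
        return BinOp(self, other, lambda a, b: a + b)

    def __mul__(self, other):
        return BinOp(self, other, lambda a, b: a * b)

class Arr(Expr):
    def __init__(self, data):
        self.data = list(data)

    def __getitem__(self, i):
        return self.data[i]

    def __len__(self):
        return len(self.data)

class BinOp(Expr):
    def __init__(self, lhs, rhs, op):
        self.lhs, self.rhs, self.op = lhs, rhs, op

    def __getitem__(self, i):
        # evaluation happens here, element by element
        return self.op(self.lhs[i], self.rhs[i])

    def __len__(self):
        return len(self.lhs)

def evaluate(expr):
    return Arr(expr[i] for i in range(len(expr)))

a, b, c = Arr([1, 2, 3]), Arr([4, 5, 6]), Arr([7, 8, 9])
r = evaluate(a + b * c)   # one pass; b*c is never materialized
```

In C++ the same structure is resolved at compile time by templates, so the fused loop carries no runtime dispatch cost - precisely the "abstraction without overhead" goal stated above.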
Recent years have seen impressive progress towards hydrodynamic cosmological simulations of galaxy formation that try to account for much of the relevant physics in a realistic fashion. At the same time, numerical uncertainties and scaling limitations in the available simulation codes have been recognized as important challenges. I will review the state of the field in this area, highlighting a number of recent results obtained with large particle-based and mesh-based simulations. I will also describe a novel moving-mesh methodology for gas dynamics in which a fully dynamic and adaptive Voronoi tessellation is used to formulate a finite volume discretization of hydrodynamics which offers numerous advantages compared with traditional techniques. The new approach is fully Galilei-invariant and gives much smaller advection errors than ordinary Eulerian codes, while at the same time offering comparable accuracy for treating shocks and an improved treatment of contact discontinuities. The scheme adjusts its spatial resolution to the local clustering of the flow automatically and continuously, and hence retains a principal advantage of SPH for simulations of cosmological structure growth. Applications of the method in large production calculations that aim to produce disc galaxies similar to the Milky Way will be discussed.
Since 1998, the question of the origin of the accelerated expansion of the Universe has become one of the most fundamental open problems in cosmology. To investigate this question, numerical simulations of large scale structure formation such as clusters of galaxies and filaments are a particularly relevant tool. In this presentation, I will review the current status of these simulations and the related computational problems. I will focus on the Full Universe Run that has been carried out on the entire Curie supercomputer in 2012. Then, I will highlight the lessons learned from this simulation and the ongoing library development based on generative programming to address most of the issues of current cosmological codes. The goal is to avoid the intertwining of physics, algorithms, and parallelization, and to make the best of supercomputers by creating a "compiler inside a compiler". I will describe these techniques and explain their substantial advantages for large cosmological simulations.
As HPC systems keep growing in scale, providing efficient fault tolerance mechanisms becomes a major issue. Studies on future exascale systems highlight that, considering that the expected mean time between failures will range from one day to a few hours, simple solutions based on coordinated checkpoints saved to a parallel file system will not work: more time will be spent dealing with failures than doing useful computation. As a consequence, new checkpointing techniques should be designed. A checkpointing protocol for large scale HPC systems should provide good performance in failure-free execution and in recovery while limiting the amount of resources used for fault tolerance. Designing a solution that can achieve all these conflicting goals is a hard task. In this talk, I will introduce hybrid rollback-recovery protocols as a solution to this problem: a hybrid rollback-recovery protocol combines coordinated checkpointing with some message logging to provide failure containment. In the first part, I will explain why hybrid protocols can be efficiently applied to most MPI HPC applications. In the second part, I will present SPBC, our new checkpointing solution based on a hybrid protocol. SPBC is the first checkpointing solution that provides failure containment without logging any information reliably apart from process checkpoints, and this, without penalizing recovery performance. To achieve this result, we used an original approach: instead of designing a protocol that works for all message-passing applications, we identified properties common to our target applications, namely MPI HPC applications, and we leveraged these properties to design a fault tolerant solution that can be more efficient than existing protocols at large scale.
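The failure-containment idea behind hybrid protocols can be sketched with a toy model (a deliberate simplification, not the SPBC protocol itself; the cluster layout and traffic are hypothetical): processes are grouped into clusters that checkpoint in a coordinated way, only messages crossing a cluster boundary are logged, and on failure only the failed process's cluster rolls back.

```python
# Toy sketch of failure containment in a hybrid rollback-recovery
# protocol: intra-cluster traffic is covered by the coordinated
# checkpoint; only inter-cluster messages must be logged, and only
# the failed process's cluster restarts, replaying logged messages.

cluster_of = {0: "A", 1: "A", 2: "B", 3: "B"}   # hypothetical layout

def must_log(src, dst):
    """Only messages crossing a cluster boundary need logging."""
    return cluster_of[src] != cluster_of[dst]

def processes_to_roll_back(failed, traffic):
    """Restart set = the failed process's cluster; replay set = logged
    messages entering that cluster from the outside."""
    cluster = cluster_of[failed]
    restart = {p for p, c in cluster_of.items() if c == cluster}
    replay = [(s, d) for s, d in traffic
              if must_log(s, d) and d in restart]
    return restart, replay

traffic = [(0, 1), (1, 2), (2, 3), (3, 0)]
restart, replay = processes_to_roll_back(failed=2, traffic=traffic)
```

The containment is visible in the result: cluster A keeps running, and only the one logged boundary message has to be replayed to the restarting cluster.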
With exascale computing on the horizon, the performance variability of I/O systems represents a key challenge in sustaining high performance. In many HPC applications, I/O is concurrently performed by all processes, which leads to I/O bursts. This causes resource contention and substantial variability of I/O performance, which significantly impacts the overall application performance and, most importantly, its predictability over time. In this talk, we describe an original approach to I/O, called Damaris, which leverages dedicated I/O cores on each multicore SMP node, along with the use of shared memory, to efficiently perform asynchronous data processing and I/O in order to hide this variability. We evaluated Damaris on various supercomputers including Titan (ranked 1st in the Top500 at the time of the experiment), Jaguar (ranked 2nd at the time of the experiment) and Kraken (ranked 11th at the time of the experiment), with the CM1 atmospheric model, one of the target HPC applications for the Blue Waters post-petascale supercomputer project. By overlapping I/O with computation and by gathering data into large files while avoiding synchronization between cores, our solution brings several benefits: 1) it fully hides jitter as well as all I/O-related costs, which makes simulation performance predictable; 2) it increases the sustained write throughput by a factor of 15 compared to standard approaches; 3) it allows almost perfect scalability of the simulation up to over 16,000 cores, as opposed to state-of-the-art approaches which fail to scale; 4) it enables a 600% compression ratio without any additional overhead, leading to a major reduction of storage requirements. Additionally, based on Damaris, the Damaris/Viz framework provides support for easy, non-intrusive in situ visualization.
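The core pattern of dedicating resources to I/O can be sketched with a producer/consumer toy (an in-process analogue using a thread and a queue, not Damaris's actual core-level implementation): compute work hands its data to a dedicated I/O worker through shared memory and immediately resumes, so write latency overlaps with computation instead of producing synchronized bursts.

```python
import queue
import threading

# Toy sketch of dedicated-I/O-resource overlap: the compute loop pushes
# each time step's output to an in-memory queue and continues; a
# dedicated worker drains the queue and "writes" asynchronously.

written = []                 # stands in for the parallel file system

def io_worker(q):
    while True:
        item = q.get()
        if item is None:     # shutdown sentinel
            break
        written.append(item) # a real system would aggregate and write here

q = queue.Queue()
t = threading.Thread(target=io_worker, args=(q,))
t.start()

for step in range(5):
    data = step * step       # stand-in for one time step's output
    q.put((step, data))      # non-blocking hand-off: compute continues

q.put(None)
t.join()
```

Because the hand-off is a cheap in-memory operation, the compute side never waits on the storage system - which is how the jitter described above is hidden from the simulation.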
The simulation of astrophysical phenomena generally features a large spread of length and time scales. This makes them extremely challenging from a computational point of view. In many instances, delicate trade-offs between resolution and fidelity to the physical conservation laws have to be made in order to overcome the computational limitations. It is more the rule than the exception that astrophysical phenomena are modeled in a resolution-starved regime. This puts great robustness requirements on the numerical methods used. For instance, many systems of conservation laws used to model physical phenomena possess companion laws. These companion laws are generically fulfilled by analytical solutions to the original system of conservation laws. However, this assertion may not remain true when the equations are solved numerically. A prominent example is the divergence constraint on the magnetic field in magnetohydrodynamics. Other examples include the conservation of angular momentum and the preservation of steady states. We will present a class of methods, termed structure preserving, that are constructed to fulfill as many companion laws of the original system of conservation laws as possible. The defect of standard numerical methods and the need for structure preserving ones will be illustrated through several challenging astrophysical scenarios, including the simulation of magneto-rotationally driven core-collapse supernovae and the merger of two neutron stars.
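The divergence constraint illustrates what a companion law is: for exact solutions of ideal MHD it follows directly from the induction equation, but a discrete scheme does not inherit it automatically.

```latex
% The induction equation of ideal MHD,
\frac{\partial \mathbf{B}}{\partial t} = -\nabla \times \mathbf{E},
\qquad \mathbf{E} = -\mathbf{v} \times \mathbf{B},
% implies, since the divergence of a curl vanishes identically,
\frac{\partial}{\partial t}\,(\nabla \cdot \mathbf{B})
  = -\nabla \cdot (\nabla \times \mathbf{E}) = 0,
% so an initially divergence-free field stays divergence-free for exact
% solutions; a discrete curl and divergence need not satisfy this
% identity, which is what structure-preserving schemes restore.
```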
When a massive star reaches the end of its life, the core of the star collapses to a neutron star or black hole while the outer stellar layers are expelled in a supernova explosion. These cosmic catastrophes are not only among the most spectacular celestial phenomena, they are also responsible for the production and dissemination of a major part of the heavy elements in the universe. A better understanding of the role of supernovae in astrophysics and as laboratories for nuclear and particle physics at extreme conditions requires the solution of one of the longest-standing problems of stellar physics: What is the mechanism that initiates and powers the explosion of stars? Increasingly sophisticated numerical models provide growing support that the energy deposition by neutrinos radiated from the hot, newly formed neutron star and aided by violent hydrodynamic mass motions is the driving agency of the explosion. In this talk I will review recent successes of theoretical modeling and new questions arising as simulations currently push forward to meet the grand computational challenges of the third spatial dimension. I will also discuss possibilities to confront the theoretical picture with observational tests and constraints.
In this work we investigate the parallel scalability of variants of additive Schwarz preconditioners for three dimensional non-overlapping domain decomposition methods. To alleviate the computational cost, both in terms of memory and floating-point complexity, we investigate variants based on a sparse approximation. The robustness of the preconditioners is illustrated on a set of linear systems arising from the finite element discretization of academic convection-diffusion problems (non-symmetric matrices) and from real-life structural mechanics problems (symmetric indefinite matrices). Parallel experiments on up to a thousand processors on some problems will be presented, and results of an ongoing implementation on top of runtime systems for heterogeneous computing will be discussed. The efficiency, from both a numerical and a parallel performance point of view, is studied on problems ranging from a few hundred thousand unknowns up to a few tens of millions.
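The structure of an additive Schwarz preconditioner can be sketched in a few lines of NumPy (a toy non-overlapping form, which degenerates to block Jacobi; the 1D Laplacian and the two-subdomain split are illustrative choices, not the sparse variants or the 3D problems of the talk): the unknowns are split into subdomain index sets, the diagonal blocks are solved locally, and the preconditioned residual is the sum of the local solves, each of which could run on a different processor.

```python
import numpy as np

# Toy additive Schwarz preconditioner: M^{-1} r = sum_i R_i^T A_i^{-1} R_i r,
# where R_i restricts to subdomain i. The local solves are independent,
# which is what makes the preconditioner naturally parallel.

def additive_schwarz(A, subdomains):
    """Return a function applying the preconditioner to a residual."""
    local_inverses = [(idx, np.linalg.inv(A[np.ix_(idx, idx)]))
                      for idx in subdomains]
    def apply(r):
        z = np.zeros_like(r)
        for idx, Ainv in local_inverses:
            z[idx] += Ainv @ r[idx]   # independent local solves
        return z
    return apply

# 1D Laplacian, split into two non-overlapping subdomains
n = 8
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = additive_schwarz(A, [np.arange(0, 4), np.arange(4, 8)])

# preconditioned Richardson iteration on A x = b
b = np.ones(n)
x = np.zeros(n)
for _ in range(200):
    x = x + M(b - A @ x)
```

In practice the dense local inverses used here are exactly what the sparse-approximation variants mentioned above replace, trading some robustness for memory and floating-point savings.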
Life science is already an important player in computational science, and with ever-increasing computational resources it is now possible to study systems composed of thousands of atoms. While different models are used to simulate the behaviour of these systems, all of them are solving an N-body problem.
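The common kernel of all these models is direct pairwise force summation, which can be written in a few vectorized lines (a generic O(N^2) gravitational-style sketch with softening, not the interaction model of POLARIS(MD)):

```python
import numpy as np

# Direct-summation N-body force evaluation: every body feels every
# other body, with Plummer softening (eps) to regularize close
# encounters. Cost is O(N^2), which is why hierarchical and other
# reduced models matter for large systems.

def pairwise_forces(pos, mass, G=1.0, eps=1e-3):
    """Return the force on each body from all the others."""
    d = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]   # d[i, j] = r_j - r_i
    r2 = (d ** 2).sum(axis=-1) + eps ** 2
    inv_r3 = r2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)                       # no self-interaction
    # F_i = G m_i sum_j m_j (r_j - r_i) / |r_ij|^3
    return G * mass[:, None] * (
        mass[None, :, None] * inv_r3[:, :, None] * d).sum(axis=1)

rng = np.random.default_rng(0)
pos = rng.standard_normal((16, 3))
mass = rng.uniform(0.5, 1.5, 16)
F = pairwise_forces(pos, mass)
# Newton's third law: the forces sum to (numerically) zero
```

Because F_ij = -F_ji by construction, the total force vanishes - a cheap sanity check for any N-body kernel.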
The code POLARIS(MD), developed by Michel Masella from the CEA/DSV, is based on a hierarchical model of the solvent surrounding a protein and an accurate microscopic interaction model. This code can handle large biological systems on far fewer CPU cores than its peers, which allows a biochemist to explore the space of molecular complexity and to find the most suitable molecule at much lower cost.
At the Exascale Computing Research center we have analysed POLARIS(MD) with our in-house tools as well as common analysis tools. We have further optimized the code to run efficiently on Intel Xeon and Xeon-Phi architectures, in order to prepare for forthcoming architectures. The talk will present the latest results and give an overview of the experience gathered while preparing codes for higher parallelism on a single node.
Today's many-core GPUs allow theoretical teraflop performance at the cost of a personal computer. But parallel performance strongly depends on the choice of both models and numerical methods. We believe that the rise of many-core processing will certainly deeply impact next-generation computational approaches. In this talk we will focus on a few examples: compact stencil remapped Lagrange methods and low-diffusive transport solvers for interface capturing. We will conclude the talk with a set of demos, showing the capability to attain real-time computations with visualization and interaction for 2D computations (including incompressible Navier-Stokes, compressible Euler equations and thermal-CFD coupling). Regarding applications, we plan to develop serious games involving multiple users using heterogeneous interacting devices. This will be deployed on the DIGISCOPE visualization infrastructure.
We present several numerical simulations of conservation laws on recent many-core processors, such as GPUs, using the OpenCL programming framework. Depending on the chosen numerical method, different implementation strategies have to be considered in order to achieve the best performance. We explain how to program efficiently three methods: a finite volume approach on a structured grid, a high order Discontinuous Galerkin (DG) method on an unstructured grid, and a Particle-In-Cell (PIC) method. The three methods are respectively applied to a two-fluid computation, a Maxwell simulation and a Vlasov-Maxwell simulation.
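As a toy illustration of the first strategy (finite volumes on a structured grid), here is a minimal sequential sketch in Python/NumPy; the vectorized cell update is exactly the data-parallel pattern an OpenCL kernel would express with one work-item per cell. The equation (1D linear advection), scheme, grid size and CFL number are illustrative choices, not taken from the talk.

```python
import numpy as np

# Minimal finite volume scheme on a structured periodic 1D grid for
# linear advection u_t + a u_x = 0 with first-order upwind fluxes.
# The conservative (flux-difference) form preserves the total mass
# exactly, and the monotone scheme introduces no new extrema.

def upwind_step(u, a, dt, dx):
    assert a > 0.0, "upwind direction assumes a > 0"
    flux = a * u                       # flux leaving cell i through its right face
    return u - (dt / dx) * (flux - np.roll(flux, 1))

n, dx, a, dt = 100, 0.01, 1.0, 0.005   # CFL = a*dt/dx = 0.5
u = np.where(np.abs(np.arange(n) * dx - 0.5) < 0.1, 1.0, 0.0)
total0 = u.sum() * dx                  # mass of the initial condition
for _ in range(100):
    u = upwind_step(u, a, dt, dx)
```

The update for each cell reads only its own value and one neighbour, so it parallelizes trivially across cells - structured-grid finite volume methods map onto GPUs for exactly this reason.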
Domain decomposition methods are well suited for parallel computations. Indeed, the division of a problem into smaller subproblems, through artificial subdivisions of the domain, is a means for introducing parallelism. Domain decomposition strategies include in one way or another the following ingredients:
This talk shows how these methods have evolved efficiently over the years by using specially designed boundary conditions on the interfaces. These optimized interface conditions, designed to take into account the heterogeneity between the subdomains on each side of the interfaces (in porous media) or the propagation of waves through the interfaces (in acoustics), for instance, lead to robust and efficient algorithms. In order to use such methods on massively parallel computers, the iterative scheme should be modified, as investigated in this talk. Chaotic iterations are considered here for the solution strategy of the interface problem, leading to some convergence difficulties. After presenting the proof of convergence of the method, we report numerical experiments on large scale engineering problems to illustrate the robustness and efficiency of the proposed method.
XIOS is a new tool developed at IPSL (Institut Pierre Simon Laplace) designed to manage efficiently the file outputs of climate simulation models. It targets two main objectives:
Quantum chemistry is known to be one of the grand challenges of modern science, since many fundamental and applied fields are concerned (drug design, micro-electronics, nanosciences,...). Investigating all these fascinating problems is a tremendous task, since highly accurate solutions of the fundamental underlying Schrödinger equation for a (very) large number of electrons need to be determined. The use of Quantum Monte Carlo (QMC) methods is an emerging alternative to the usual approaches, since they can take advantage of massively parallel architectures. In this talk the QMC=Chem program we develop in Toulouse will be presented, as well as the different strategies we used to reach the petaflop/s scale.
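A toy variational Monte Carlo run gives the flavour of why QMC parallelizes so well (a textbook 1D harmonic oscillator example, not QMC=Chem's algorithms): independent walkers sample |psi|^2 with the Metropolis algorithm and average the local energy, and walkers communicate essentially nothing.

```python
import math
import random

# Toy variational Monte Carlo: sample |psi|^2 for the 1D harmonic
# oscillator with the exact trial wavefunction psi(x) = exp(-x^2/2)
# and average the local energy E_L = -psi''/(2 psi) + x^2/2. With the
# exact trial function E_L is constant (the zero-variance property),
# so the estimate equals the ground-state energy 1/2 exactly.

def local_energy(x):
    # for psi = exp(-x^2/2): psi''/psi = x^2 - 1, so E_L = 1/2
    return -0.5 * (x * x - 1.0) + 0.5 * x * x

def metropolis_energy(n_steps, step=1.0, seed=0):
    rng = random.Random(seed)
    x, acc = 0.0, 0.0
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)
        # Metropolis acceptance test on |psi|^2 = exp(-x^2)
        if rng.random() < math.exp(x * x - trial * trial):
            x = trial
        acc += local_energy(x)
    return acc / n_steps

energy = metropolis_energy(10_000)   # ground-state energy 0.5
```

With an approximate trial function E_L fluctuates and the statistical error shrinks as the number of independent walkers grows - which is why many loosely coupled cores, as on petaflop/s machines, suit QMC so naturally.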