On the second Tuesday of every month, we invite a speaker to give a presentation on a topic related to scientific modelling and high performance computing (HPC). These presentations take place at the Maison de la Simulation and you are welcome to attend. No registration is required. The seminars start around 10 a.m., with coffee and cakes offered from 9:30 a.m.
A mailing list is available if you wish to be informed of upcoming seminars. You can subscribe here. For more information, you can contact us.
Large scale numerical simulations are producing an ever growing amount of data that includes the simulation results as well as execution traces and logs. These data represent a double challenge. First, such amounts of data are becoming increasingly difficult to analyse with traditional tools. Second, moving these data from the simulation to disks, and later retrieving them from disks to the analysis machine, is becoming increasingly costly in terms of time and energy. This situation is expected to worsen, as supercomputer I/O and, more generally, data movement capabilities are progressing more slowly than compute capabilities. While the simulation itself has long been the center of attention, it is now time to focus on high performance data analysis. This integration of data analytics with large scale simulations represents a new kind of workflow that needs adapted software solutions.
In this talk we will survey two major trends related to data analysis, namely Big Data Analytics and In-Situ Analytics, and compare their benefits and shortcomings.
Big Data Analytics solutions like MapReduce, Spark or Flink were developed to answer the need to analyse large amounts of data from the web, social networks, or business applications running on cloud infrastructures. The machines (HPC versus cloud) and the data (mainly text versus structured numerical data) differ deeply between the two settings. During this talk we will survey the main concepts supported by modern Map/Reduce frameworks and look at some attempts to use these tools for the analysis of numerical data.
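To make the Map/Reduce concepts concrete, here is a minimal single-process sketch of the three classic phases (map, shuffle, reduce) applied to numerical data. It is an illustration only: the helper names (`map_phase`, `shuffle`, `reduce_phase`) and the `(cell_id, temperature)` samples are hypothetical, and no real framework such as Spark or Flink is used.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records, mapper):
    """Apply the mapper to every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Combine the values of each key with an associative reducer."""
    return {key: reduce(reducer, values) for key, values in groups.items()}


# Hypothetical input: (cell_id, temperature) samples from a simulation.
samples = [(0, 1.0), (1, 4.0), (0, 3.0), (1, 2.0), (2, 5.0)]


def to_sum_and_count(record):
    """Emit a partial (sum, count) for the cell the sample belongs to."""
    cell, temperature = record
    return [(cell, (temperature, 1))]


def add_partials(a, b):
    """Merge two partial (sum, count) pairs."""
    return (a[0] + b[0], a[1] + b[1])


totals = reduce_phase(shuffle(map_phase(samples, to_sum_and_count)), add_partials)
means = {cell: total / count for cell, (total, count) in totals.items()}
print(means)
```

Because the reducer is associative, a real framework is free to evaluate partial reductions on different machines before merging them, which is what makes the model scale.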
The second trend is In-Situ Analytics, a recent paradigm that emerged as a solution to the scalability issue of postmortem data analysis in the HPC context. It proposes to start processing the data while the simulation is running, as soon as the raw results are available in the compute nodes' memory, with multiple benefits. Data produced by the simulation can start to be reduced before moving out of the compute nodes, thus saving on data movements and on the amount of data eventually stored to disk. We will discuss the main concepts and show some examples of in-situ analytics scenarios we developed with HPC users.
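The idea of reducing data before it leaves the compute node can be sketched in a few lines. In this toy example, `toy_simulation` is a hypothetical stand-in for a solver and `in_situ_summary` is an assumed reduction (minimum, maximum, mean); only the summaries are kept, never the raw fields.

```python
def toy_simulation(n_steps, n_cells):
    """Hypothetical stand-in for a solver: yields one field per time step."""
    for step in range(n_steps):
        yield [float(step * n_cells + i) for i in range(n_cells)]


def in_situ_summary(field):
    """Reduce a field to three numbers while it is still in memory."""
    return {"min": min(field), "max": max(field), "mean": sum(field) / len(field)}


n_steps, n_cells = 3, 4
# Each field is summarised as soon as it is produced, then discarded.
summaries = [in_situ_summary(field) for field in toy_simulation(n_steps, n_cells)]

raw_values = n_steps * n_cells
stored_values = sum(len(summary) for summary in summaries)
print(summaries)
print(f"stored {stored_values} values instead of {raw_values}")
```

With realistic field sizes (millions of cells per step rather than four), the ratio between raw and stored values is what saves I/O bandwidth and disk space.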
Grid-based solvers for the Vlasov equation give accurate results but suffer from the curse of dimensionality. To enable the grid-based solution of the Vlasov equation in 6d phase-space, we need efficient parallelization schemes. In this talk, we consider the 6d Vlasov-Poisson problem discretized by a split-step semi-Lagrangian scheme. To optimize single node performance, we use vectorization, efficient data access and OpenMP parallelism. For distributed memory parallelism, we consider two parallelization strategies: a remapping strategy that works with two different layouts, each keeping part of the dimensions sequential, and a classical partitioning into hyperrectangles. With the remapping scheme, the 1d interpolations can be performed sequentially on each processor, but the remapping itself requires an all-to-all communication pattern. The partitioning only requires localized communication, but each 1d interpolation needs to be performed on distributed data. We compare both parallelization schemes and discuss how to efficiently handle the domain boundaries in the interpolation for partitioning. In order to extend the domain partitioning strategy to problems with fast gyration around a magnetic field, we propose to use a moving mesh in velocity space. For the domain partitioning, we will also discuss the optimal choice of the process grid as well as compression of the halo cells. This is joint work with Klaus Reuter, Markus Rampp and Eric Sonnendrücker.
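As an illustration of the 1d interpolations at the heart of a split-step semi-Lagrangian scheme, the following minimal sketch advects a periodic 1d profile at constant velocity by tracing each grid point back along its characteristic and interpolating the old values there. Linear interpolation and the function name `semi_lagrangian_step` are assumptions chosen for brevity; the abstract does not state which interpolant is used, and production codes typically rely on higher-order ones.

```python
import math


def semi_lagrangian_step(f, velocity, dt, dx):
    """Advance f by one step of df/dt + velocity * df/dx = 0 on a periodic grid."""
    n = len(f)
    shift = velocity * dt / dx  # displacement measured in grid cells
    new_f = []
    for i in range(n):
        foot = i - shift          # foot of the characteristic ending at node i
        left = math.floor(foot)   # grid node just left of the foot
        weight = foot - left      # fractional position inside the cell
        # Linear interpolation between the two neighbouring nodes (periodic wrap).
        new_f.append((1.0 - weight) * f[left % n] + weight * f[(left + 1) % n])
    return new_f


profile = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0]
moved = semi_lagrangian_step(profile, velocity=1.0, dt=2.0, dx=1.0)
half = semi_lagrangian_step(profile, velocity=1.0, dt=0.5, dx=1.0)
print(moved)
print(half)
```

In the remapping strategy, a step like this runs along a dimension that is entirely local to the process. With domain partitioning, the values at `left` and `left + 1` may instead come from halo cells received from neighbouring processes.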
Software defined storage (SDS) promises ease of deployment on a large variety of hardware platforms, specifically commodity ones. SDS is seen simultaneously as a way to reduce cost and to increase user adaptability. However, moving away from the powerful appliances with numerous built-in fail-over capabilities that used to be at the core of HPC storage architectures remains a daunting task.
This talk, taking the IME as a case study, will review the current challenges that must be addressed in order to shift to commodity hardware and will detail some of the solutions that have been implemented in the IME project.
I will describe the development of a new code aimed at the study of hydrodynamical processes in stellar interiors. Current understanding of the evolution of stellar interiors relies on one-dimensional calculations. Complex physical processes which drive this evolution, such as convection, rotation, or accretion, are described by simplified, phenomenological approaches. However, the predictive power of these methods is severely hindered by the many free parameters they employ. In an effort to redress this situation, the Multi-dimensional Stellar Implicit Code (MUSIC) has been developed. By solving the equations of hydrodynamics in spherical coordinates, the multi-dimensional processes at the heart of stellar evolution can be studied directly. The use of time-implicit methods allows the specific time-scale of interest to be targeted, and statistically meaningful quantities of data to be gathered. Recent results from two applications will be presented: first, the accretion onto a young solar-type star and its impact on subsequent evolution; and second, the problem of convective overshooting and its influence on lithium depletion.
With the advance of modern high performance computing, large scale simulations have been able to account for more and more realistic physics, capturing very large dynamical ranges. One of the challenges in the current approaches towards exascale computing and ever increasing resolutions is the change of physics between scales. The simulations currently running on supercomputers have been designed for specific purposes and cannot be easily modified to accommodate more physics, yet reaching different orders of magnitude in scale also requires changing the physics computed. Moreover, writing new software that accounts simultaneously for different scales, while including as many features as legacy codes that have been written and maintained for decades, is getting harder and harder. One solution to this problem is to couple existing codes instead of rewriting them. I will present the efforts that have been made at the University of Surrey to couple two astrophysics codes, Ramses (P3M + hydrodynamics using an AMR grid) and NBody6 (direct summation code), via MPI and MIMD techniques. This coupling will allow us to run precise simulations of globular clusters interacting with a host galaxy. In these runs, Ramses manages all the hydrodynamics and collisionless dynamics of the host galaxy, while NBody6 is in charge of precisely integrating the trajectories and stellar evolution of the stars in a globular cluster. Such a system allows us to precisely simulate the tidal interaction between the two objects, something that has only been done with rough analytical models until now. It also allows us to cover up to nine orders of magnitude in space, time and mass resolution.
It is well known that the inviscid, adiabatic equations of atmospheric motion constitute a non-canonical Hamiltonian system, and therefore possess many important conserved quantities such as mass, potential vorticity and total energy. In addition, there are also key mimetic properties (such as curl grad = 0) of the underlying continuous vector calculus. Ideally, a dynamical core should have similar properties.
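The mimetic property curl grad = 0 can be checked on a very simple discretization. The sketch below uses plain edge differences and cell circulations on a structured 2d grid, which is an assumption made for illustration and not the Galerkin discretization discussed in the talk. Integer nodal values are used so that the cancellation is exact rather than true only up to round-off.

```python
def discrete_grad(phi):
    """Edge differences of a nodal field phi[i][j] on a structured grid."""
    nx, ny = len(phi), len(phi[0])
    gx = [[phi[i + 1][j] - phi[i][j] for j in range(ny)] for i in range(nx - 1)]
    gy = [[phi[i][j + 1] - phi[i][j] for j in range(ny - 1)] for i in range(nx)]
    return gx, gy


def discrete_curl(gx, gy):
    """Counter-clockwise circulation of an edge field around each cell."""
    ncx, ncy = len(gx), len(gy[0])
    return [
        [gx[i][j] + gy[i + 1][j] - gx[i][j + 1] - gy[i][j] for j in range(ncy)]
        for i in range(ncx)
    ]


# Arbitrary integer nodal values on a 5 x 4 grid of nodes.
phi = [[3 * i * i + 7 * j - i * j for j in range(4)] for i in range(5)]
gx, gy = discrete_grad(phi)
curl = discrete_curl(gx, gy)
print(curl)
```

Each nodal value enters the circulation around a cell once with a plus sign and once with a minus sign, so the result vanishes whatever `phi` is. That is the discrete analogue of the continuous identity.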
A general approach to deriving such structure-preserving numerical schemes has been developed through a combination of Hamiltonian methods and mimetic discretizations. Beyond these structure-preserving properties, modern dynamical cores must be efficient on a wide range of computational architectures, and should be able to efficiently leverage the increasing parallelism of modern machines. This is achieved through the selection of a particular class of mimetic discretizations: structured grid, tensor product mimetic Galerkin methods.
This talk will discuss Dynamico-FE, a new structure-preserving hydrostatic atmospheric dynamical core built using these techniques, and show results from a standard set of test cases on both the plane and the sphere. It will also briefly discuss the Themis software framework (used to construct this code), which is designed specifically for tensor product Galerkin methods on structured grids.
Climate models simulate atmospheric flows interacting with many physical processes. Because they address long time scales, from centuries to millennia, they need to be efficient, but not at the expense of certain desirable properties, especially conservation of total mass and energy. Most of my talk will explain the design principles behind DYNAMICO, a highly scalable unstructured-mesh energy-conserving finite volume/mimetic finite difference atmospheric flow solver and potential successor of LMD-Z, a structured-mesh (longitude-latitude) solver currently operational as part of IPSL-CM, the Earth System Model developed by Institut Pierre Simon Laplace (IPSL).
Specifically, the design of DYNAMICO leverages the variational structure of the equations of motion and their Hamiltonian formulation, so that the conservation of energy requires only that the discrete grad and div operators be compatible, i.e. that a discrete integration by parts formula holds. At the implementation level, performance is achieved by combining a simple memory layout allowing vectorization, mixed MPI/OpenMP parallelism and using the asynchronous parallel I/O server XIOS.
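The compatibility condition between discrete grad and div can be illustrated on a 1d periodic staggered grid. This is a toy setting assumed for illustration, not DYNAMICO's unstructured-mesh operators. With values `phi` stored at cells and `u` at edges, the sketch checks that the sum of `u * grad(phi)` equals minus the sum of `div(u) * phi`, using integers so that the identity holds exactly.

```python
def grad(phi):
    """Cell-to-edge difference: edge e sits between cells e and e + 1 (periodic)."""
    n = len(phi)
    return [phi[(e + 1) % n] - phi[e] for e in range(n)]


def div(u):
    """Edge-to-cell difference: cell c has edge c on its right and edge c - 1 on its left."""
    n = len(u)
    return [u[c] - u[(c - 1) % n] for c in range(n)]


def dot(a, b):
    """Discrete inner product (unit cell size assumed)."""
    return sum(x * y for x, y in zip(a, b))


phi = [3, -1, 4, 1, -5, 9]  # arbitrary cell values
u = [2, 7, -1, 8, 2, -8]    # arbitrary edge values

lhs = dot(u, grad(phi))
rhs = -dot(div(u), phi)
print(lhs, rhs)
```

In an energy budget, this identity is what makes the work done by the pressure gradient cancel against the corresponding flux divergence term, so no spurious energy source appears at the discrete level.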
In this presentation, I give an overview of the impact of the convergence of computing and data on the software aspects of numerical computing applications, together with a synthesis of the work carried out on this topic by the European project EXDCI.
Joint work with:
The heat and Poisson equations are basic building blocks of many numerical methods for partial differential equations (PDEs). These two equations, which could be considered simple, are actually numerical bottlenecks in many applications such as fluid mechanics and plasma physics, as obtaining fast solvers is always challenging, at least in dimension 3.
I will show that fast and precise parallel solvers are obtained when two conditions are fulfilled:
1) for the heat equation, use explicit high order stabilized methods,
2) perform arithmetic intensive matrix vector products obtained from high order discretizations.
In the next decade, exascale supercomputers will provide the computational power required to perform very large scale simulations. For certain applications the results of exascale simulations will be of such high resolution that experimental measurements will be insufficient for validation purposes. As floating point arithmetic is neither associative nor distributive, the results of a numerical simulation can differ between executions. As reported by the numerical analyst I. S. Duff, "Getting different results for different runs of the same computation can be disconcerting for users even if, in a sense, both results are correct". There is a need for an automatic and global approach that gives a confidence interval on the results, taking into account the effect of floating point arithmetic. Estimating the effect of the floating point model on the accuracy of the computed results is the first step of a rigorous Verification and Validation (V&V) procedure.
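The loss of associativity is easy to reproduce. The short example below assumes IEEE 754 double precision, which is what Python floats use on common platforms. It shows that regrouping or reordering a sum changes the result, which is exactly what happens when a parallel reduction combines partial sums in a different order from one run to the next.

```python
# Regrouping three additions changes the last bit of the result.
left_grouped = (0.1 + 0.2) + 0.3
right_grouped = 0.1 + (0.2 + 0.3)
print(left_grouped == right_grouped)

# Reordering a sum with terms of very different magnitude changes it completely:
# adding 1.0 to 1.0e16 is lost to rounding before the large terms cancel.
order_a = sum([1.0e16, 1.0, -1.0e16])
order_b = sum([1.0e16, -1.0e16, 1.0])
print(order_a, order_b)
```

Neither ordering is "wrong" in the sense of the quotation above: each is a correctly rounded evaluation of a different sequence of operations.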
This talk is organised as follows. First, the context of our work, and in particular the dark side of floating point computation, is presented. A brief overview of some numerical verification tools is then given. Next, the new tool, called verificarlo, is presented. Using verificarlo is transparent for the user and does not require manual modification of the source code. It can be used for the automatic assessment of the numerical accuracy of large scale numerical simulations by means of Monte Carlo Arithmetic. Several examples will be shown, in particular the numerical verification of the solution of linear systems using the LAPACK and BLAS scientific libraries.
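The principle behind Monte Carlo Arithmetic can be conveyed with a deliberately simplified sketch: perturb the result of each operation by a small random relative error, repeat the computation many times, and estimate the number of reliable digits from the spread of the results. Everything below is an assumption made for illustration (the `perturb` model, the virtual precision `t=40`, the function names); it is not verificarlo's implementation or interface, which instruments compiled code rather than Python functions.

```python
import math
import random
import statistics


def perturb(x, rng, t=40):
    """Apply a random relative error of size about 2**-t (assumed noise model)."""
    return x * (1.0 + rng.uniform(-1.0, 1.0) * 2.0 ** -t)


def mca_samples(compute, n_samples=200, seed=0):
    """Run `compute` repeatedly, handing it a noisy rounding function."""
    rng = random.Random(seed)

    def noisy(x):
        return perturb(x, rng)

    return [compute(noisy) for _ in range(n_samples)]


def significant_digits(samples):
    """Estimate reliable decimal digits as -log10(std / |mean|)."""
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples)
    if std == 0.0:
        return float("inf")
    return -math.log10(std / abs(mean))


def stable(noisy):
    """A benign addition: the result is about 3."""
    return noisy(1.0 + 2.0)


def cancelling(noisy):
    """A subtraction of nearly equal numbers: the result is about 1."""
    big = 1.0e6
    return noisy(noisy(big + 1.0) - big)


digits_stable = significant_digits(mca_samples(stable))
digits_cancelling = significant_digits(mca_samples(cancelling))
print(round(digits_stable, 1), round(digits_cancelling, 1))
```

The cancelling computation loses roughly six digits compared with the stable one, which matches the factor of a million between the operands and the result. The value of the approach is that this loss is measured without any analysis of the code.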
Radiotherapy treatment consists in irradiating the patient with beams of energetic particles (typically photons) targeting the tumor. These particles are transported through the medium and deposit energy in it. This deposited energy is the so-called dose, responsible for the biological effect of the radiation. The present work aims to develop numerical methods for dose computation and optimization that are competitive in terms of computational cost and accuracy compared to reference methods.
The motion of particles is first studied through a system of linear transport equations at the kinetic level. However, solving such systems directly is numerically too costly for medical applications. Instead, the moment method is used, with a special focus on the Mn models. Those moment equations are non-linear and valid under a condition called realizability.
Standard numerical schemes for moment equations are constrained by stability conditions which happen to be very restrictive when the medium contains low density regions. Unconditionally stable numerical schemes adapted to moment equations (preserving the realizability property) are developed. Those schemes are shown to be competitive in terms of computational cost compared to reference approaches. Finally, they are applied in an optimization procedure aiming to maximize the dose in the tumor and to minimize the dose in healthy tissues.
Magnetic fusion research aims at developing power plants based on the fusion of light nuclei, which produces a large amount of energy and no radioactive waste. The energy of the sun results from such fusion reactions. On earth, a promising concept to gain energy from fusion is magnetic confinement, where charged particles are confined at a high temperature for a long enough time using a magnetic field. An international experiment called ITER is being built in Cadarache near Aix-en-Provence to demonstrate the feasibility of the concept.
Even though the main idea of confining the particles with a magnetic field seems simple and natural, magnetized plasmas, which are dense collections of charged particles, exhibit many instabilities that need to be controlled. This can be done only via intensive numerical simulations.
The "Numerical Methods in Plasma Physics" division at the Max-Planck Institute for plasma physics in Garching (Germany) develops numerical methods and algorithms for magnetic fusion simulations. It also hosts the High Level Support Team of the EUROfusion consortium, which helps with profiling and optimizing the major European fusion codes for the EUROfusion high performance computer, currently Marconi-fusion hosted by CINECA in Italy. In this talk we will present an overview of the activities of the division, including the main models (kinetic, MHD, Maxwell) used in magnetic fusion and the development of the software libraries SeLaLib for semi-Lagrangian and PIC kinetic simulations and Django-Jorek for Finite Element MHD and full-wave (Maxwell) simulations.
The phase diagram of high pressure hydrogen is of great interest for fundamental research, planetary physics, and energy applications[1]. Laboratory experiments to reach the appropriate physical conditions are difficult and extremely expensive, therefore ab-initio theory has played a crucial role in developing the field. The accuracy of the conventional method based on Density Functional Theory (DFT) is however limited and often non-predictive. We have developed a quantitative methodology based on Quantum Monte Carlo methods to study hydrogen in extreme conditions: the Coupled Electron-Ion Monte Carlo method (CEIMC)[2].
After a brief introduction to the physical problem, I will outline the main ingredients of the method and describe some applications to high pressure hydrogen.
In particular I will focus on the liquid-liquid phase transition, a first-order phase transition in the fluid phase between a molecular insulating fluid and a monoatomic metallic fluid. The existence and precise location of the transition line are relevant for planetary models. Recent experiments reported contrasting results about the location of the transition[3,4,5]. Theoretical results based on DFT are also very scattered[6,7,8,5]. We report accurate CEIMC calculations of this transition, finding results that lie between the two experimental predictions, close to that measured in diamond anvil cell experiments but at 25-30 GPa higher pressure. The transition along an isotherm is signaled by a discontinuity in the specific volume, a sudden dissociation of the molecules, a jump in electrical conductivity and a loss of electron localization[9].
References:
[1] J.M. McMahon, M.A. Morales, C. Pierleoni and D.M. Ceperley, "The properties of hydrogen and helium under extreme conditions", Reviews of Modern Physics 84, 1607 (2012).
[2] C. Pierleoni and D.M. Ceperley, "The Coupled Electron-Ion Monte Carlo method", Lect. Notes Phys. 703, 641-683 (2006).
[3] M. Zaghoo, A. Salamat and I.F. Silvera, "Evidence of a first-order phase transition to metallic hydrogen", Phys. Rev. B 93, 155128 (2016).
[4] K. Ohta et al., "Phase boundary of hot dense fluid hydrogen", Scientific Reports 5, 16560 (2015).
[5] M.D. Knudson et al., "Direct observation of an abrupt insulator-to-metal transition in dense liquid deuterium", Science 348, 1455 (2015).
[6] M.A. Morales, C. Pierleoni, E. Schwegler and D.M. Ceperley, "Evidence for a first-order liquid-liquid transition in high-pressure hydrogen from ab initio simulations", PNAS 107, 12799 (2010).
[7] M.A. Morales, J.M. McMahon, C. Pierleoni and D.M. Ceperley, "Nuclear quantum effects and nonlocal exchange-correlation functionals applied to liquid hydrogen at high pressure", Phys. Rev. Lett. 110, 065702 (2013).
[8] W. Lorenzen, B. Holst and R. Redmer, "First-order liquid-liquid phase transition in dense hydrogen", Phys. Rev. B 82, 195107 (2010).
[9] C. Pierleoni, M.A. Morales, G. Rillo, M. Holzmann and D.M. Ceperley, "Liquid-liquid phase transition in hydrogen by Coupled Electron-Ion Monte Carlo Simulations", PNAS 113, 4953-4957 (2016).
With highly-pipelined vector processors, hardware accelerators (GPU, FPGA), deeper memory hierarchies, and heterogeneous designs becoming mainstream, programmability, portability, and productivity are now important facets to consider on the way to performance. In this context, I will try to illustrate how compiler research, through language developments and automation of code analysis and optimizations, addresses these concerns and why the gap between what compilers can do and what HPC users hope they could do is still very large.
Cost models (for both the users and the compilers), exchange between compiler and user (application knowledge and optimization reporting), and limits of code analysis, remain serious issues. Nevertheless cost models (such as roofline and ECM), communications (such as automatic offloading), locality optimizations (such as tiling), language design (of various kinds) still make regular progress in these directions.
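As a reminder of what the simplest of these cost models predicts, the basic roofline bound says that attainable performance is the minimum of the machine's peak compute rate and the product of memory bandwidth and arithmetic intensity. The sketch below encodes just that formula; the machine figures (1000 GFLOP/s peak, 100 GB/s bandwidth) and the function name are hypothetical values chosen for round numbers.

```python
def roofline_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Attainable GFLOP/s under the basic roofline model.

    arithmetic_intensity is in FLOP per byte moved to or from memory.
    """
    return min(peak_gflops, bandwidth_gbs * arithmetic_intensity)


# Hypothetical machine, for illustration only.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0

# Intensity at which a kernel stops being memory bound on this machine.
ridge_point = PEAK_GFLOPS / BANDWIDTH_GBS

low_intensity = roofline_gflops(0.25, PEAK_GFLOPS, BANDWIDTH_GBS)   # streaming-like kernel
high_intensity = roofline_gflops(20.0, PEAK_GFLOPS, BANDWIDTH_GBS)  # compute-heavy kernel
print(ridge_point, low_intensity, high_intensity)
```

A kernel at 0.25 FLOP/byte can reach at most 2.5% of peak on this hypothetical machine, which is why locality optimizations such as tiling, which raise the effective intensity, matter so much.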
We describe how random transformations can accelerate the solution of linear systems by avoiding the communication overhead due to pivoting. We have successfully applied this technique to dense linear systems (general or symmetric indefinite), resulting in efficient solvers for current parallel architectures, including multicore, GPU or Intel Xeon Phi, that are already integrated in the public domain scientific library MAGMA. We also present some experiments using direct sparse factorizations where randomization is combined with sparsity-preserving strategies. Finally, we illustrate how some iterative solvers based on Krylov subspace methods can also benefit from this approach.
Turbulent transport of solid particles in a suspending Newtonian fluid is often present in natural and industrial contexts. A few of the many well-known examples are sediment transport in a river bed, sandstorms, slurries, and the flocculation and sedimentation processes in the treatment of drinking water. In many cases the particles have a finite size, i.e., a size comparable to or larger than the smallest scales of the turbulent flow. In these cases turbulence (in itself one of the most challenging problems in classical physics) is greatly modified by the presence of the particles, which interact both with each other and with the suspending fluid. The continuous growth in computing power, together with the development of efficient numerical algorithms, makes the simulation of the detailed interaction of many particles with the fluid turbulence possible. Possible, but challenging. We will present an overview of the steps taken to achieve such massive simulations. In particular, we will present our numerical algorithm, elaborate on technical details such as parallelization and data handling, and conclude with some relevant scientific findings.
The ONERA CFD department has been developing and supporting computational fluid dynamics software for decades, both for its own research and for industrial partners in the aeronautical domain. Nowadays, the elsA software, developed at ONERA since 1997, is one of the major CFD tools used by Airbus, Eurocopter and Safran. In their design departments, it is heavily used to optimize airplane performance (noise or energy consumption reduction, safety improvement, ...). Due to environmental constraints, noise reduction in the vicinity of airports has become a major challenge for aircraft manufacturers. The noise radiated during the landing phase is due to turbulent vortices generated by the landing gear and the wing flaps, which act like powerful whistles. The numerical simulation of the generated noise requires handling the complex, detailed geometry of the landing gear or flaps and solving for billions of unknowns at each time step, over millions of time steps, to describe the time evolution of the turbulent vortices during just a few seconds of physical time. Therefore HPC capabilities, (re)meshing of complex geometries and multiphysics coupling (noise generator and propagator) are crucial for the software to obtain a solution in a reasonable time. For these reasons, a demonstrator named FAST (Flexible Aerodynamic Solver Technology) has been under development for one to two years in order to prepare a major evolution of elsA in the coming years. This demonstrator aims at providing a software architecture and numerical techniques that will allow better flexibility, evolvability and efficiency, in order to perform simulations that are out of reach with present CFD tools. Building on previous expertise, the services required by CFD simulations (pre/post-processing, boundary conditions, solvers, coupling...) are provided by different Python modules in FAST, whereas the CGNS standard (CFD General Notation System) is adopted as a data model for interoperability between modules. Thanks to code modernization (memory access, vectorization, ...) we aim to reduce by at least one order of magnitude the CPU cost of this kind of computation on current Xeon and future Phi (KNL) or Xeon (Skylake) Intel processors.
SAMSON (Software for Adaptive Modeling and Simulation Of Nanosystems) is a new software platform for computational nanoscience available on SAMSON Connect at http://www.samson-connect.net.
SAMSON integrates modeling and simulation to aid in the analysis and design of molecular systems. For instance, an interactive quantum chemistry module (ASED-MO level of theory) makes it possible for users to build and edit structures while interactively visualizing how the electronic density is updated (Figure 1a); interactive flexing and twisting tools allow users to easily perform large-scale flexible deformations of e.g. proteins with a few mouse clicks (Figure 1b); interactive virtual prototyping of hydrocarbon systems may be used to edit and constrain graphene sheets, carbon nanotubes (Figure 1c), or build complex models (Figure 2), potentially through adaptive simulation algorithms [1][2][3].
Most importantly, a Software Development Kit allows developers to extend SAMSON's functionality by developing SAMSON Elements (modules for SAMSON), including e.g. new interaction models, editors, apps, wrappers or interfaces to existing software, connectors to web services, etc. The SAMSON Connect website (http://www.samson-connect.net) is open for developers and users to easily share SAMSON Elements.
We will present SAMSON and its general design principles, as well as specific applications to structural biology and materials science.
References
[1] S. Artemova and S. Redon, Physical Review Letters, 109:19, 2012
[2] M. Bosson et al, Journal of Computational Physics, 231:6, 2012
[3] M. Bosson et al, Journal of Computational Chemistry, 34:6, 2013
Geometric methods furnish novel and transformative strategies for tackling the supercomputing challenge in a wide range of scientific areas (physics, chemistry, biology, life sciences, materials, climate, geosciences). In particular, new numerical schemes have been developed which respect the intrinsic geometrical structure of the challenges of (big-)data or physics.
Recently, a variational strategy has been discovered for introducing stochasticity into the dynamics of fluids and fluid-structure interactions while respecting this intrinsic geometrical structure. It already constitutes a significant High Performance Computing problem that challenges even exascale computational capability! Fortunately, we can expect the extreme parallelism of the Port Controlled Hamiltonian and Lagrangian Systems for lumped descriptions to be particularly well suited to making the cost and computer power requirements of these stochastic simulations affordable.
These geometric variational methods have been well-formulated by eminent experts in mathematics in an established cooperation with our world calibre industrial partner (Thales) and computer engineers within the UMN network, as supported by the National Agency of Research (ANR, France) to facilitate access to the European funding programmes (Horizon 2020, FETHPC, "Transition to Exascale Computing") in 2017.
This talk is about the tools and methods developed during the MACOPA ANR grant (LAPLACE, ONERA, IMFT, IRIT).
Time consistent numerical integration of systems of transport partial differential equations can be done with fully local time stepping at the cell level, in an explicit formalism, using the so-called "asynchronous" time integration methods. Large speed-ups can be achieved for multi-scale problems. Different numerical schemes have been developed within the MACOPA free software toolkit: finite volume schemes, discontinuous Galerkin schemes, distributed residual schemes. Higher order time accuracy can also be achieved by using specific asynchronous Runge-Kutta methods. Parallelization of the asynchronous schemes will also be discussed. Some simulation results in combustion, electromagnetism, discharge plasmas and coupled problems will be shown.
Organic photovoltaic systems have the specificity that the photo-generated excitons are strongly bound, which justifies the realisation of donor/acceptor interfaces at which the charge separation can be achieved. The exact mechanisms involved in this charge separation are, however, still a matter of discussion. As such, besides the standard goal of correctly describing the electronic and optical properties of organic semiconductors, one needs to correctly describe the band offsets at the interface and understand the related electron-hole dissociation processes. As will be shown in this presentation, the standard ab initio tools, namely DFT and its time-dependent extension (TDDFT), present severe limitations. We will show that the so-called GW and Bethe-Salpeter formalisms cure many of the problems associated with the (TD)DFT formalisms. Our GW and Bethe-Salpeter implementation with Gaussian bases makes it possible to tackle systems comprising a few hundred atoms. The details of the numerical implementation and its scaling will be presented. Finally, perspectives such as accounting for the effect of the environment on electronic properties will be discussed.
References:
[1] "First principles calculations of charge transfer excitations in polymer-fullerene complexes: Influence of excess energy", D. Niedzialek, I. Duchemin, T. Branquinho de Queiroz, S. Osella, A. Rao, R. Friend, X. Blase, S. Kümmel, and D. Beljonne, Adv. Funct. Mater. 25 pp. 1287-1295 (2015).
[2] "Many-body Green's function study of coumarins for dye-sensitized solar cells", C. Faber, I. Duchemin, T. Deutsch, X. Blase, Phys. Rev. B, 86, 155315 (2012)
[3] "Short-range to long-range charge-transfer excitations in the zincbacteriochlorin-bacteriochlorin complex: A Bethe-Salpeter study", I. Duchemin, T. Deutsch, X. Blase, Phys. Rev. Lett. 109, 167801 (2012).
[4] "First-principles GW calculations for DNA and RNA nucleobases", Carina Faber, Claudio Attaccalite, V. Olevano, E. Runge, X. Blase, Phys. Rev. B 83, 115123 (2011).
Processes taking place in the liquid state, for instance chemical
reactions, happen in a sea of solvent molecules. They are legion, and
some say they don't forget, but can we predict their effect in solution?
(i) Roughly yes, using rough methods that rely on a macroscopic
description of the solvent. They are fast (say, a few CPU-seconds) but are
not able to capture the physical, molecular nature of solvation. No
packing, no orientation effects, no hydrogen-bonding, among others.
(ii) Yes, accurately, using explicit simulations like molecular
dynamics. But these are at least 3 to 4 orders of magnitude slower.
Hundreds or thousands of CPU-hours are often necessary.
(iii) We will present the molecular density functional theory and its
associated code, MDFT. We will show how state-of-the-art liquid state
theory and high performance algorithms can capture solvation effects at
their inherent molecular scale, for the cost of rough methods.
In this talk I will first present our recent work and developments on SMPI, a flexible simulator of MPI applications. In this tool, we took particular care to ensure our simulator could be used to produce fast and accurate predictions in a wide variety of situations. Although we built SMPI on SimGrid, whose speed and accuracy had already been assessed in other contexts, moving such techniques to an HPC workload required significant additional effort to accurately model communications and network topology. I will also present our recent work on StarPU/SimGrid, a custom simulator that can be used to predict the performance of task-based applications running on top of StarPU to exploit hybrid (CPU+GPU) architectures. We have demonstrated the faithfulness of StarPU/SimGrid for both modern dense and sparse linear algebra solvers.
The prediction of conversion efficiency and pollutant emissions in combustion devices is particularly challenging, as they result from very complex interactions of turbulence, chemistry, and heat exchanges at very different space and time scales. The mesh resolution in the mixing and reaction zones of the combustor is therefore of tremendous importance to reduce the turbulent combustion modeling effort and the inherent modeling errors. In the framework of finite-rate chemistry modeling at low Mach number, h-adaptation has to be supplemented by operator splitting and stiff integration algorithms in order to alleviate the time step restriction due to the chemical time scales. These algorithms introduce a large load imbalance when running on massively parallel supercomputers. However, task sharing and dynamic scheduling approaches make it possible to recover linear scaling on a large number of cores. These techniques have been implemented in the YALES2 CFD solver (http://www.coria-cfd.fr), developed at CORIA and used in several laboratories of the scientific group SUCCESS (http://success.coria-cfd.fr) and in industry. It has been specifically tailored for dealing with very large meshes of up to tens of billions of cells and for efficiently solving the low-Mach number Navier-Stokes equations on massively parallel computers. The presentation will focus in particular on the recent development in YALES2 of dynamic h-adaptivity and operator splitting approaches, which make it possible to recover good scaling on modern supercomputers.
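The effect of dynamic scheduling on load imbalance can be seen with a toy cost model. In the sketch below, a few cells are assumed to be ten times more expensive than the others, as might happen where stiff chemistry is active. A static block partition is compared with a scheme in which each worker pulls the next task as soon as it becomes free. The cost values and function names are hypothetical, and this is not the scheduler used in YALES2.

```python
import heapq


def static_makespan(costs, n_workers):
    """Time to finish when tasks are split into contiguous, equally sized blocks."""
    chunk = -(-len(costs) // n_workers)  # ceiling division
    return max(sum(costs[w * chunk:(w + 1) * chunk]) for w in range(n_workers))


def dynamic_makespan(costs, n_workers):
    """Time to finish when the next task always goes to the first free worker."""
    finish_times = [0] * n_workers
    heapq.heapify(finish_times)
    for cost in costs:
        earliest = heapq.heappop(finish_times)  # worker that becomes free first
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


# Hypothetical per-cell chemistry costs: 12 cheap cells followed by 4 stiff ones.
costs = [1] * 12 + [10] * 4
n_workers = 4

static_time = static_makespan(costs, n_workers)
dynamic_time = dynamic_makespan(costs, n_workers)
ideal_time = sum(costs) / n_workers
print(static_time, dynamic_time, ideal_time)
```

With the static partition one worker receives all the stiff cells and the others sit idle, whereas the dynamic scheme reaches the ideal time in this example. In a real solver the gain is reduced by the cost of moving tasks between processes.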
Full waveform inversion (FWI) is a nonlinear data fitting process for high resolution seismic imaging. It is based on the iterative minimization of the L^2 distance between predicted and observed data. The predicted data is computed through the numerical solution of the wave equation in the time or frequency domain. The minimization is performed through local optimization techniques based on the gradient of the misfit function and an estimation of its inverse Hessian following quasi-Newton techniques. Compared to standard tomography methods, full waveform inversion is able to yield higher resolution estimates of the subsurface wave velocity. However, as it is based on the repeated computation of the full wavefield, it requires efficient, carefully implemented numerical algorithms. In addition, local optimization techniques require a sufficiently accurate initial model to converge to the desired solution. In this presentation, we shall review the standard implementation of FWI for industrial scale problems, and present a novel methodology to relax the constraint on the accuracy of the initial model. This methodology is based on the computation of the distance between predicted and observed data using an optimal transport distance. In particular, we will present the properties of this distance in the framework of FWI, and the algorithm we designed for its efficient evaluation for large-scale problems.
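As a sketch of the standard formulation described above, the misfit and a generic quasi-Newton update can be written as follows. The notation is an assumption, since the abstract introduces none: m is the subsurface model, d_pred(m) and d_obs are the predicted and observed data, alpha_k is a step length and H_k is an approximation of the inverse Hessian. The factor 1/2 and the squared norm are conventional choices, not taken from the abstract.

```latex
\begin{align}
J(m) &= \frac{1}{2}\,\bigl\| d_{\mathrm{pred}}(m) - d_{\mathrm{obs}} \bigr\|_{2}^{2}, \\
m_{k+1} &= m_k - \alpha_k\, H_k\, \nabla J(m_k).
\end{align}
```

In the methodology presented in the talk, the L^2 norm in the first line is replaced by an optimal transport distance between predicted and observed data; its exact form is not given in the abstract and is therefore not reproduced here.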
Particle and heat exhaust physics is one of the main challenges magnetic fusion research will have to solve on its way to full scale reactors. The design of the magnetic configuration and wall shape, as well as the tuning of the edge plasma conditions, is critical in order to maintain sustainable power fluxes on plasma facing components while ensuring effective pumping of helium ash and keeping fusion-favorable conditions in the core plasma. The physics at play involves a complex balance between plasma transport processes and volumetric sources and sinks related to atomic and molecular processes occurring between the plasma and recycling or injected neutrals.
Mean-field "transport codes" have for many years been the key tools for understanding edge plasma regimes and designing future machines. These codes rely on models in which plasma turbulence has been smoothed out by averaging, and simple closures are used to model the average fluxes and stresses due to fluctuations. In particular, transverse transport is commonly described via a gradient-diffusion hypothesis in which fluxes are driven by local gradients and characterized by ad hoc diffusion coefficients whose values are determined experimentally. However, these coefficients differ from one machine to another, from one pulse to another in the same device, and even from one location to another inside a plasma. They must therefore be considered as free parameters, which drastically reduces the predictive capabilities of these codes. Fluctuation-related nonlinearities appearing in atomic physics are not captured by mean-field models either, and could significantly influence the results.
In this presentation, I will describe the effort led at CEA Cadarache to overcome this limitation. I will specifically introduce the two main edge plasma codes developed by our team, namely SOLEDGE2D and TOKAM3X, and report on the latest advances and results obtained with these two tools. Special focus will be given to the numerical challenges that still need to be solved on the way to ITER-relevant simulations.
In turbulent transport the advected quantities often exhibit spatial scales that are much smaller than the flow scales. We will describe hybrid algorithms which take this property into account to combine different methods (Eulerian and Lagrangian) on different hardware (CPU and GPU). We will also show how similar ideas can be used to define approaches which would be intermediate between Direct Numerical Simulations and Large Eddy Simulations for turbulent flows.
Modeling fractured porous media is both a major industrial and environmental issue and a formidably complex problem, owing to the wide range of space and time scales involved. This talk focuses on the class of discrete fracture models, which represent fractures as a network of co-dimension 1 surfaces immersed in the surrounding 3D medium. We will study the mathematical formulation of these models for single-phase and two-phase flows, their discretization by finite volume methods on polyhedral meshes, and their parallelization on distributed architectures.
Differences in simulation results may be observed from one architecture to another or even within the same architecture. Such reproducibility failures are often due to different rounding errors generated by different orders in the sequence of arithmetic operations. It must be pointed out that the cause of differences in results may be difficult to identify: rounding errors or a bug? Such differences are particularly noticeable with multicore processors or GPUs (Graphics Processing Units).
In this talk, we describe the principles of DSA (Discrete Stochastic Arithmetic) which enables one to estimate rounding error propagation in simulation programs. We show that DSA can be used to estimate which digits in simulation results may be different from one environment to another because of rounding errors. We present the CADNA library (http://www.lip6.fr/cadna), an implementation of DSA that controls the numerical quality of programs and detects numerical instabilities generated during the execution. A particular version of CADNA which enables numerical validation in hybrid CPU-GPU environments is described. The estimation of numerical reproducibility using DSA is illustrated by a wave propagation code which can be affected by reproducibility problems when executed on different architectures.
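The order dependence mentioned above can be reproduced with a few lines of plain Python. This is only a minimal illustration of floating-point non-associativity; it does not use CADNA or implement DSA, and the helper name `naive_sum` and the input values are chosen here for the example.

```python
import math

def naive_sum(values):
    """Accumulate in the given order with ordinary double-precision additions."""
    total = 0.0
    for v in values:
        total += v
    return total

# Four terms whose exact sum is 2; the large terms cancel each other.
values = [1e16, 1.0, -1e16, 1.0]

forward = naive_sum(values)             # terms added left to right
backward = naive_sum(reversed(values))  # same terms, opposite order
exact = math.fsum(values)               # correctly rounded sum of the same terms

print("forward :", forward)
print("backward:", backward)
print("fsum    :", exact)
```

The two accumulation orders disagree with each other and with the correctly rounded sum. A parallel reduction that changes the order of operations from one run or one machine to the next can show the same kind of discrepancy.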
Addressing the major challenges of software productivity and performance portability becomes necessary to take advantage of emerging computing architectures. There is a growing demand for new programming environments in order to improve scientific productivity, to ease design and implementation, and to optimize large production codes.
We introduce the numerical analysis specific language Nabla which is an open-source (nabla-lang.org) Domain Specific Language (DSL) whose purpose is to translate numerical analysis algorithmic sources in order to generate optimized code for different runtimes and architectures. Nabla raises the level of abstraction, following a bottom-up compositional approach that provides a methodology to co-design between applications and underlying software layers for existing middleware or heterogeneous execution models.
One of the key concepts is the introduction of hierarchical logical time within the high-performance computing scientific community. This new dimension of parallelism is explicitly expressed to go beyond the classical single-program multiple-data or bulk-synchronous parallel programming models. Control and data concurrencies can be combined consistently to achieve statically analysable transformations and efficient code generation. Shifting the complexity to the compiler offers ease of programming and a more intuitive approach, while making it possible to target new hardware and leading to performance portability.
Combining mesh adaptation with massively parallel computing makes it possible, in theory, to reach maximum efficiency in numerical simulation. Parallel strategies generally rely on partitioning a mesh, and the partition is itself computed with geometric or topological heuristics. Anisotropic, unstructured mesh adaptation requires rethinking this approach, since everything changes: the mesh, and therefore the partition as well. This coupling between mesh and partition can be resolved iteratively, and the whole process then becomes adaptive. Mesh and partition evolve locally, driven by the error estimator.
Implicit boundary methods make it possible to decouple the mesh from the geometric representation of the different phases to be taken into account in a computation. This holds for all multiphase computations, whether liquid-gas or fluid-structure. It enables a monolithic, single-mesh approach and a regularization of the interface equations over a thickness that is itself controlled by mesh adaptation. Massively parallel computing becomes possible and again depends on the extreme accuracy of the mesh-partitioning iterations. We will show examples of multiphase computations based on stabilized finite element methods and anisotropic adaptation, along with the performance obtained on several massively parallel machines.
One task in the numerical solution of partial differential equations is to define the computational mesh and its partition among the processors used, for any given domain geometry. For many applications, it is desirable to perform all algorithms for adaptive mesh refinement (AMR) in parallel, in core, and whenever the simulation requires it. Especially for large simulations, this imposes the condition that AMR should scale at least as well as the solvers.
Forest-of-octrees AMR is an approach that offers both geometric flexibility and parallel scalability and is being used in various finite element codes and libraries. Low and high order discretizations alike are enabled by parallel node numbering algorithms that encapsulate the semantics of sharing node values between processors. More general applications, such as semi-Lagrangian and patch-based methods, require additional AMR functionalities. In this talk, we present algorithmic concepts essential for leading-edge adaptive simulations.
In view of the access difficulties at the CEA Saclay site announced for the duration of COP21, we have decided to cancel the seminar of December 8 and to reschedule the talks planned for that date to June 14, 2016:
Ivan Duchemin: GW and BSE calculations with FIESTA
Maximilien Levesque: Solvation at the molecular scale, the MDFT route
This is so that these talks are accessible to as many people as possible.
We propose to generalize the SIMT execution model of GPUs to general-purpose processors, and use it to design new CPU-GPU hybrid cores. These hybrid cores will form the building blocks of heterogeneous architectures mixing CPU-like cores and GPU-like cores that all share the same instruction set and programming model. The SIMT model used on some GPUs binds together threads of parallel applications so they perform the same instruction at the same time, in order to execute their instructions on energy-efficient SIMD units. Unfortunately, current GPU architectures lack the flexibility to work with standard instruction sets like x86 or ARM. Their implementation of SIMT requires special instruction sets with control-flow reconvergence annotations, and they do not support complex control flow like exceptions, context switches and thread migration. We will see how we can overcome all of these limitations and extend the SIMT model to conventional instruction sets using a PC-based instruction fetch policy. In addition, this solution enables key improvements that were not possible in the traditional SIMD model, such as simultaneous execution of divergent paths. It also opens the way for a whole spectrum of new architectures, hybrids of latency-oriented superscalar processors and throughput-oriented SIMT GPUs.
PGAS programming languages and models have existed for some time. Their primary goal was to propose an alternative standard model for parallel programming as a replacement for MPI and OpenMP. No PGAS model has emerged as a standard so far, particularly on recent parallel machines that include accelerators. Nevertheless, existing PGAS implementations keep improving their efficiency. At the same time, there is a trend towards interoperability with MPI, OpenMP and other standard tools. Efforts are also being made to incorporate PGAS concepts into non-PGAS programming models. We will present an overview of the current PGAS environment, existing tools and their use.
We present a fast and parallel finite volume scheme on unstructured meshes applied to air/water flow. The mathematical model is based on a three-dimensional compressible low-Mach two-phase flow model, combined with a linearized 'artificial pressure' law. This hyperbolic system of conservation laws allows an explicit scheme, improved by a block-based adaptive mesh refinement scheme. The numerical density of entropy production (the amount of violation of the theoretical entropy inequality) is used as an error indicator. This criterion efficiently indicates where the mesh should be refined or coarsened. Moreover, computational time is saved by using a local time-stepping method. Finally, we show the efficiency of the present scheme through several test cases on two- and three-dimensional dam-break problems.
Projection on Proper Elements (PoPe) is based on an in-depth analysis of diagnostics that require only minimal modifications of the code tested and a minimal computational overhead. No dedicated simulations are needed, since this method can be used in any regime, including chaotic ones. PoPe is based on the exploration of the bijection between the analytical model implemented in a code and the output of simulations: if the equations recovered from a simulation are equivalent to the ones theoretically implemented in the code, the code is then verified; if not, PoPe gives indications that help to find and correct the error. The accuracy of PoPe diagnostics also makes it possible to recover the convergence of the numerical methods. The verification of a 2D fluid code, TOKAM, and a 4D gyro-kinetic code, TERESA, both used in plasma physics, is presented.
Massively parallel simulations generate increasing volumes of large data, whose exploitation requires large storage resources, efficient networks and increasingly large post-processing facilities. In the coming era of exascale computations, there is an emerging need for new data analysis and visualization strategies.
We will present here an original solution to address these questions for massively parallel direct numerical simulations of transitional and turbulent flows.
Domain decomposition methods are, alongside multigrid methods, one of the dominant paradigms in contemporary large-scale partial differential equation simulation.
In this talk, I will present a lightweight implementation (HPDDM, https://github.com/hpddm/hpddm) of theoretically and numerically scalable domain decomposition preconditioners in the context of overlapping and substructuring methods. A broad spectrum of applications will be covered, ranging from the scalar diffusion equation to Maxwell's equations, and including incompressible linear elasticity. Numerical results with hundreds of processes will be provided, clearly showing the effectiveness and the robustness of the proposed approaches.
HPDDM is currently interfaced with two finite element libraries, FreeFem++ (http://www.freefem.org/ff++/) and Feel++ (http://www.feelpp.org/), which allows for quick prototyping and thorough testing.
With the heterogeneity of modern parallel architectures, and with the democratization of parallel hardware in our day-to-day life, developing scientific applications has become, in the past few years, a complex HPC-centric discipline. However, numerical analysts are not computer scientists, and even if some of them take parallelization courses, the complexity of modern architectures and the knowledge of low-level programming that is needed represent too much work and time to actually reach the peak performance of a given machine.
As a result, one of the main research topics in HPC is how to abstract away the intricate details of HPC programming for non-specialists. Some HPC programming models try to solve general problems, but most of the time, in order to stay efficient, those models do not totally hide parallelization but simplify it. On the other hand, some HPC programming models try to make the parallelization of codes transparent or implicit. In this case, though, the model has to be specialized to a specific domain where parallelization techniques are known and can be applied transparently.
I will present SIPSim (Structured Implicit Parallelism for scientific SIMulations), an implicitly parallel programming model for mesh-based numerical simulations. This model strikes a good balance: it keeps a sequential programming style, and thus the development freedom sought by numerical analysts, while obtaining a transparent high performance simulation on distributed memory architectures such as clusters. I will also show that this model can be applied to more complex multi-physics simulations. The parallelization of two real-case simulations will be presented using two implementations of the SIPSim model, the first a water flow simulation on 2D surfaces, the second an arterial blood flow simulation. Finally, I will open this work to the use of component and workflow models to implement the SIPSim model.
The infiltration of rainwater through soils is a determining factor in numerous engineering applications (water resources, environment, geotechnics, ...) as well as in various natural phenomena, such as the weathering of continental surfaces (e.g., Goddéris et al., 2012), which is a key process of the carbon cycle (Walker et al., 1981). A classical way to quantify such flows in variably saturated porous media consists in numerically solving the Richards equation, which is three-dimensional, unsteady and nonlinear (Richards, 1931). However, studying the hydrological behaviour of soils under evolving conditions (climate change, land use change, ...) requires modelling at large spatial scales (km² and beyond) and over long time scales (decades, centuries). Massively parallel computing is a major way of dealing with such large-scale problems (see for example Miller et al., 2013). We present here a massively parallel solver for the Richards equation, RichardsFOAM.
This solver has been developed in the framework of the open source CFD toolbox OpenFOAM® (Jasak, 1996; Weller et al., 1998). RichardsFOAM is able to solve large-scale problems thanks to the good parallel performance of OpenFOAM® (with RichardsFOAM, about 90% parallel efficiency on 1024 cores in both strong and weak scaling). This performance will allow us to propose mechanistic models of water fluxes at the space and time scales relevant to the study of weathering processes (> km², > decades).
A detailed study of the parallel performance of RichardsFOAM will be presented (strong and weak scaling, impact of I/O, impact of numerical stiffness), as well as an example application to a field data set. The associated scientific perspectives will be discussed.
Python is increasingly used to quickly prototype numerical kernels, thanks to the numpy/scipy/matplotlib/ipython team. But when it comes to performance, it still lags behind equivalent native code. The Pythran compiler proposes a solution to this problem, by statically compiling and optimizing high-level Python/numpy kernels into parallel, vectorized C++11 code. The leitmotiv is to take high-level numpy code, without the need for explicit loops, and rely on the semantics of the numpy operations to generate efficient code.
The talk will present both how Pythran works and how to make it work!
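To give an idea of the kind of kernel the abstract refers to, here is a minimal sketch of a loop-free numpy function annotated for Pythran. The function name `rms_difference` and the sample arrays are made up for this illustration; the `#pythran export` comment is Pythran's annotation for declaring the exported function and its argument types. The file runs unchanged under the regular Python interpreter, which ignores the comment.

```python
import numpy as np

#pythran export rms_difference(float64[], float64[])
def rms_difference(a, b):
    """Root-mean-square difference of two 1D arrays, written without explicit loops."""
    diff = a - b
    return np.sqrt(np.mean(diff * diff))

# Plain-Python usage; the same call works on the compiled module.
a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 5.0])
result = rms_difference(a, b)
print(result)
```

Assuming the code is saved as a module file, running the `pythran` command on that file should produce a native extension module that can be imported in place of the pure Python version.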
High performance computing hardware is increasingly complex. Servers feature at least two processors, with many cores each, shared caches, non-uniform memory access, and possibly accelerators. The actual hardware organization of these resources has a deep impact on HPC application performance, since computation and data transfer speeds depend on data locality. Unfortunately, this organization as seen from the application is unpredictable. Resources can be hierarchically organized or horizontally ordered differently from one machine to another, making topology assumptions highly non-portable and causing application performance to vary significantly even on apparently similar platforms.
This lecture will first detail the complexity of current hardware architectures and explain how it matters to HPC application performance. Then we will introduce the hwloc tool (Hardware Locality) which aims at hiding all these deep hardware details and non-portability issues. We will explain how hwloc models the hardware resource organization in an abstracted and portable manner and exposes it in a simple way to applications. Finally we will show how hwloc can ease the building of portable locality optimization in HPC libraries and applications.
We consider the simulation of fast transients involving interacting fluids and structures. The representation of the systems is very general, with an Eulerian or ALE formulation for fluids, a Lagrangian formulation for structures, and numerous kinematic constraints that couple the entities (fluid-structure interaction or contact between structures, for example).
Tracking fronts accurately in such a context calls for Adaptive Mesh Refinement (AMR). The aim is to follow waves, in structures or in fluids, or physical interfaces such as immersed structures or interfaces between different fluid phases.
This talk is devoted to the specifics of such developments in a large code (EPX, http://www-epx.cea.fr, in the present case), which must simultaneously accommodate the characteristics of the existing data structure, the move to parallel computing with as much efficiency and genericity as possible, the multiplicity of mesh adaptation criteria, and the unrestricted handling of kinematic constraints.
In this talk, I will present a Molecular Dynamics N-body simulation for multi-accelerator heterogeneous architectures. It is based on OpenCL and features a new force computation algorithm that focuses on a low memory footprint while maintaining a high performance level. I will describe the implementation and detail how we had to slightly adapt the kernels for accelerators as different as NVIDIA GPUs and Intel Xeon Phi to provide better opportunities for code optimization. Finally, I will present our research perspectives on enhancing the OpenCL ecosystem to achieve better portability and performance.
The end of Moore's law has brought the problem of performance portability back to the fore. A major technological break came with the multi-core processor, which quickly became the norm for general-purpose processors, embedded processors and compute accelerators. Computing platforms have become highly parallel and heterogeneous, with a particularly complex memory hierarchy. The computational kernels of applications must be continually optimized to exploit the growing complexity of architectures and regularly adapted to architectural changes. These kernels therefore have to be optimized, ported and maintained on a regular basis. Otherwise, the application cannot get the best performance out of modern platforms, and an often very large share of the available computing power is lost.
There is also the problem of the maintainability and longevity of these kernels, given the ever-growing number of architectures (Intel Xeon, Xeon Phi, GPU, FPGA, ARM, ...) that use different languages (C, Fortran, CUDA, OpenCL) and different programming paradigms (OpenMP, OpenACC, MPI, ...) on different compilers, which try to exploit intrinsic instructions, for vectorization for example. Advances in automatic parallelization by compilers have not been sufficient, and the work of parallelization and vectorization still falls to programmers.
To address this problem, the INRIA joint team Corse has developed an automatic code generation environment [BOAST] that evaluates the different optimization options available on a given architecture by generating multiple source code variants and analyzing their performance. This environment, BOAST, has shown on two codes, BigDFT and SPECFEM3D, that it can deliver better performance than routines that had already been optimized. To use it, one expresses the computational kernels in a dedicated language and describes the possible optimizations (loop unrolling, vectorization, ...). BOAST generates different versions of the kernels in the target languages (C, Fortran, CUDA or OpenCL), then evaluates their performance by running them on the target platform. The programmer can then select the version that performs best and best suits the target architecture.
BOAST was funded and developed in the context of the European FP7 Mont-Blanc projects (2011-2016).
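The generate, run and select loop described above can be illustrated with a toy Python sketch. It is not BOAST and does not use its language or API: the functions `generate_kernel_source`, `build_kernel` and `autotune`, the sum-of-squares kernel and the list of unroll factors are all hypothetical choices made for this example.

```python
import timeit

def generate_kernel_source(unroll):
    """Return Python source for a sum-of-squares kernel unrolled `unroll` times."""
    body = " + ".join(f"x[i + {k}] * x[i + {k}]" for k in range(unroll))
    return (
        f"def kernel(x):\n"
        f"    n = len(x) - len(x) % {unroll}\n"
        f"    total = 0.0\n"
        f"    for i in range(0, n, {unroll}):\n"
        f"        total += {body}\n"
        f"    for i in range(n, len(x)):\n"  # remainder loop for leftover elements
        f"        total += x[i] * x[i]\n"
        f"    return total\n"
    )

def build_kernel(unroll):
    """Compile one generated variant and return it as a callable."""
    namespace = {}
    exec(generate_kernel_source(unroll), namespace)
    return namespace["kernel"]

def autotune(data, unroll_factors=(1, 2, 4, 8), repeats=3):
    """Time every variant on `data` and return (best unroll factor, all timings)."""
    timings = {}
    for unroll in unroll_factors:
        kernel = build_kernel(unroll)
        timings[unroll] = min(
            timeit.repeat(lambda: kernel(data), number=20, repeat=repeats)
        )
    best = min(timings, key=timings.get)
    return best, timings

data = [float(i % 7) for i in range(1000)]
best_unroll, timings = autotune(data)
print("fastest unroll factor on this machine:", best_unroll)
```

Which variant wins depends on the machine and the interpreter, which is the reason for measuring on the target platform rather than fixing the choice in advance.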
Despite the significant progress in blood flow simulations in recent years, the accurate description of the complex multi-physics, multi-scale phenomena characterizing blood flow in realistic geometries still remains a very challenging task.
The aim of the first part of the lecture is to present our advances in developing an open-source framework for hemodynamics and to assess this framework through validation against experimental data for fluid flow in an idealized medical device with rigid boundaries. The core is built on a flexible generic library called Feel++ (Finite Element method Embedded Language in C++, www.feelpp.org), which allows for arbitrary order continuous and discontinuous Galerkin methods in 1D, 2D and 3D, seamlessly in parallel.
In the second part of the talk, we illustrate the capabilities of this framework by applying it to the development of a computational model for blood flow in the cerebral venous system. The study focuses on the influence of different modeling assumptions (inflow/outflow boundary conditions, viscosity models) on the flow (velocity field, wall shear stresses) and it constitutes a first step towards incorporating and quantifying different sources of uncertainty in the modeling process.
The talk is based on joint work with V. Chabannes, M. Ismail, C. Prud'homme and R. Tarabay.
The detection of gravitational waves is eagerly expected as one of the most important scientific discoveries of the next decade. A worldwide effort is now actively pursuing this goal both at an experimental level, by building ever more sensitive detectors, and at a theoretical level, by improving the modelling of the numerous sources of gravitational waves. Much of this theoretical work is carried out through the solution of the Einstein equations in those nonlinear regimes where no analytic solutions are possible or known. I will review how this is done in practice and highlight the considerable progress made recently in the description of the dynamics of binary systems of black holes and neutron stars. I will also discuss how the study of these systems provides information well beyond that contained in the gravitational waveforms and opens very exciting windows on the relativistic astrophysics of GRBs and on the cosmological evolution of massive black holes.
Heterogeneous multi-core platforms, mixing regular cores and dedicated accelerators, are now so widespread that they have become the canonical computing architecture. To fully tap into the potential of these hybrid platforms, in terms of both computational efficiency and power saving, pure offloading approaches, in which the main core of the application runs on regular processors and offloads specific parts to accelerators, are not sufficient. The real challenge is to allow applications to fully use the cumulative computing power of the entire machine, by scheduling parallel jobs dynamically over the whole set of available processing units. The Inria Team RUNTIME has been studying this problem of scheduling tasks on heterogeneous multi/many-core architectures for many years, which led to the design of the StarPU runtime system. The StarPU runtime is capable of scheduling tasks over heterogeneous, accelerator-based machines. Its core engine integrates both a scheduling framework and a software virtual shared memory (DSM), working in close relationship. The scheduling framework maintains an up-to-date, self-tuned database of kernel performance models over the available computing units to guide the task/unit mapping algorithms. The DSM keeps track of data copies within accelerator embedded memories and features a data-prefetching engine, avoiding expensive redundant memory transfers and enabling task computations to overlap unavoidable memory transfers. Such facilities have been used successfully in the field of parallel linear algebra algorithms, notably, where StarPU is now one of the target backends of the University of Tennessee at Knoxville's MAGMA linear algebra library. The StarPU environment typically makes it much easier to exploit heterogeneous multicore machines.
Thanks to the Sequential Task Flow programming paradigm, programs may submit tasks to the StarPU engine using the same logical order as the sequential version, thus preserving initial algorithm layouts and loop patterns.
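The idea behind the Sequential Task Flow paradigm can be sketched in a few lines of Python: tasks are listed in sequential program order with the data they read and write, and the dependencies are inferred from those accesses. This is an illustrative toy, not StarPU's API. The function `infer_dependencies`, the task names and the data handles are hypothetical.

```python
from collections import defaultdict

def infer_dependencies(tasks):
    """Infer task dependencies from data accesses.

    `tasks` is a list of (name, reads, writes) tuples in sequential
    submission order. Returns a dict mapping each task name to the sorted
    list of earlier tasks it must wait for.
    """
    last_writer = {}                         # handle -> last task that wrote it
    readers_since_write = defaultdict(list)  # handle -> readers since that write
    deps = {}
    for name, reads, writes in tasks:
        wait_for = set()
        for handle in reads:
            # Read-after-write: wait for the task that produced the data.
            if handle in last_writer:
                wait_for.add(last_writer[handle])
        for handle in writes:
            # Write-after-write and write-after-read: wait for earlier accesses.
            if handle in last_writer:
                wait_for.add(last_writer[handle])
            wait_for.update(readers_since_write[handle])
        deps[name] = sorted(wait_for)
        for handle in reads:
            readers_since_write[handle].append(name)
        for handle in writes:
            last_writer[handle] = name
            readers_since_write[handle] = []
    return deps

# Tasks submitted in the order of a sequential tiled factorization step.
tasks = [
    ("potrf", ["A00"], ["A00"]),
    ("trsm1", ["A00"], ["A10"]),
    ("trsm2", ["A00"], ["A20"]),
    ("syrk",  ["A10"], ["A11"]),
]
deps = infer_dependencies(tasks)
for name, wait_for in deps.items():
    print(name, "waits for", wait_for)
```

In this example the two `trsm` tasks depend only on `potrf`, so a runtime is free to execute them concurrently on different processing units even though the program submitted them one after the other.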
Neuroscience has produced a large body of data on the function and anatomy of the brain, but the transformation of this knowledge into a coherent understanding has been limited. Computational modeling can integrate such fragmented data into models of brain structures that satisfy the broad range of constraints imposed by experiments, thus advancing our understanding of their computational role and their implementation in the neural substrate.
In the first part of the lecture I will present a comprehensive multi-scale spiking model of cat primary visual cortex satisfying a range of anatomical, statistical and functional properties. It considers cortical layers 4 and 2/3, corresponding to a 5x5 mm patch of V1. We have subjected the model to numerous visual stimulation protocols covering a wide range of input statistics, from spa-rse noise to natural scenes with simulated eye-movements. The model expresses over multiple scales a number of statistical and functional properties previously identified experimentally including: spontaneous activity with a physiologically plausible resting conductance regime; contrast-invariant orientation-tuning width; realistic adaptive interplay between evoked excitatory and inhibitory conductances; center-surround interaction effects; and stimulus-dependent changes in the precision of the neural code as a function of input statistics. This data-driven model offers numerous insights into how the studied properties interact, and thus contributes to a better understanding of visual cortical dynamics.
In the second part of the talk I will discuss the technology that was required to simulate such model and share our experience of running the model on a small-scale cluster. However, we will show that so far the main challenges were not due to the scale of the model itself, but its complexity. This is due to the heterogeneous nature of the model itself, and the complexity of the experimental environment the model is tested in. To address these issues we have built a highly automated workflow covering the specification of stimulation and experimental protocols, higher-level model design, advanced analysis and visualization libraries, and central storage module utilizing common meta-data specification.
The cortical microcircuit, the network comprising a square millimeter of brain tissue, has been the subject of intense experimental and theoretical research. The lecture first introduces a full-scale model of this circuit at cellular and synaptic resolution: the model comprises about 100,000 neurons and one billion local synapses connecting them. The purpose of the model is to investigate the effect of network structure on the observed activity. To this end it incorporates cell-type specific connectivity but identical single-neuron dynamics for all cell types. The emerging network activity exhibits a number of the fundamental properties of in vivo activity: asynchronous irregular activity, layer-specific spike rates, higher spike rates of inhibitory neurons as compared to excitatory neurons, and a characteristic response to transient input. Despite this success, the explanatory power of such local models is limited, as half of the synapses of each excitatory nerve cell have non-local origins and, at the level of areas, the brain constitutes a recurrent network of networks.
The second part of the lecture therefore argues for the need of brain-scale models to arrive at self-consistent descriptions of the multi-scale architecture of the network. Such models will enable us to relate the microscopic activity to mesoscopic measures and functional imaging data and to interpret those with respect to brain structure.
The third part of the lecture introduces the technology required to simulate such models and discusses the performance of the upcoming NEST simulation code. Brain-scale networks exhibit a breathtaking heterogeneity in the dynamical properties and parameters of their constituents. Over the past decade researchers have learned to manage the heterogeneity with efficient data structures. Even early parallel codes used distributed target lists, consuming memory for a synapse on just one compute node. As petascale computers with some 100,000 nodes become increasingly available for neuroscience, new challenges arise: each nerve cell contacts on the order of 10,000 other neurons and thus has targets only on a fraction of all nodes; furthermore, for any given source neuron, at most a single synapse is typically created on any node. The heterogeneity in the synaptic target lists thus collapses along two dimensions: the dimension of the types of synapses and the dimension of the number of synapses of a given type. The latest technology takes advantage of this double collapse using metaprogramming techniques and orchestrates the full memory of petascale computers like JUQUEEN and the K computer into a single brain-scale simulation.
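The "double collapse" can be checked with a few lines of arithmetic. The sketch below uses only the orders of magnitude quoted above; the assumption that targets are spread uniformly at random over the nodes is ours and is made purely for illustration.

```python
# Orders of magnitude quoted in the abstract.
targets_per_neuron = 10_000   # neurons contacted by each nerve cell
compute_nodes = 100_000       # nodes of a petascale machine

# Upper bound: even if every target sat on a different node, one source
# neuron could reach at most this fraction of the machine.
max_fraction_of_nodes = min(1.0, targets_per_neuron / compute_nodes)

# Assumption of this sketch: targets are placed uniformly at random.
mean_synapses_per_node = targets_per_neuron / compute_nodes
fraction_of_nodes_hit = 1.0 - (1.0 - 1.0 / compute_nodes) ** targets_per_neuron

print(max_fraction_of_nodes, mean_synapses_per_node, fraction_of_nodes_hit)
```

Under these assumptions a given source neuron has targets on fewer than one node in ten, and the per-node target list of that neuron almost always holds zero or one synapse, which is what makes specialised data structures for very short lists attractive.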
OpenMP 4.0 substantially enlarges the traditional OpenMP programming environment, in particular in the context of the task API for handling recursive or irregular parallel patterns. We discuss the motivations for the integration of advanced programming interfaces, and illustrate their relevance with a few examples. All the other major new features proposed by OpenMP 4.0 (cancellation of parallel constructs, thread affinity, vectorization and GPU programming directives) are also reviewed.
The modeling and simulation of multiphase reacting flows covers a large spectrum of applications ranging from combustion in automobile and aeronautical engines to atmospheric pollution as well as biomedical engineering. In the framework of this seminar, we will mainly focus on a disperse liquid phase carried by a gaseous flow field which can be either laminar or turbulent; moreover, this spray can be polydisperse, that is, constituted of droplets with a large size spectrum. Thus, such flows involve a large range of temporal and spatial scales which have to be resolved in order to capture the dynamics of the phenomena and provide reliable and eventually predictive simulation tools. Even though the power of computing resources increases steadily, such very stiff problems can lead to serious numerical difficulties and prevent efficient multi-dimensional simulations.
We discuss recent developments in Adjoint Algorithmic Differentiation (AAD) tool support, e.g., in the context of large-scale parameter calibration methods. First- and second-order derivative-based approaches to solving the underlying numerical optimization problems are considered. Specific mathematical and structural properties of the underlying simulation are exploited. The superiority of AAD software tools over manual approaches to the implementation of adjoint numerical models, as well as over numerical approximation of the required sensitivities by finite differences, is illustrated.
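To make the comparison with finite differences concrete, here is a minimal sketch of reverse-mode (adjoint) differentiation in Python. The `Var` class, the `backward` function and the toy `model` are hypothetical names written for this illustration; they do not correspond to any particular AAD tool discussed in the talk.

```python
import math


class Var:
    """Scalar node on a tape for reverse-mode (adjoint) differentiation."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # tuples of (parent Var, local partial derivative)
        self.adjoint = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def sin(x):
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))


def backward(output):
    """Propagate adjoints from the output to all inputs in one reverse sweep."""
    order, seen = [], set()

    def visit(node):            # depth-first topological ordering
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.adjoint = 1.0
    for node in reversed(order):
        for parent, partial in node.parents:
            parent.adjoint += node.adjoint * partial


def model(x, y):
    # Toy stand-in for a simulation: f(x, y) = x*y + sin(x)
    return x * y + sin(x)


x, y = Var(1.5), Var(-2.0)
backward(model(x, y))
adjoint_gradient = (x.adjoint, y.adjoint)

# Finite differences need one extra model evaluation per parameter.
h = 1e-6
f0 = model(Var(1.5), Var(-2.0)).value
fd_gradient = ((model(Var(1.5 + h), Var(-2.0)).value - f0) / h,
               (model(Var(1.5), Var(-2.0 + h)).value - f0) / h)
```

The adjoint sweep returns all sensitivities after a single evaluation of the model, exact up to round-off, whereas the finite-difference gradient costs one additional evaluation per parameter and carries a truncation error that depends on `h`.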
We will introduce the GENEO method, an automatic way of constructing coarse spaces for parallel methods of Schwarz, BNN and FETI type. It adapts automatically to highly heterogeneous coefficients and to irregular partitions, provided that one works with a finite element discretization of a symmetric positive problem. This includes Darcy and 2D and 3D elasticity. The GENEO method has been implemented in the finite element software FreeFem++. We will present scalability tests and comparisons with the best multigrid solvers for two- and three-dimensional problems with up to several billion unknowns on several thousand cores.
Climate modeling has become one of the main technical and scientific challenges of the century, as a heated controversy has arisen over the questions of climate change. One of the most important limitations of climate models, which couple oceanic and atmospheric components, is the low spatial resolution (~100 km) imposed by the computational cost. This constraint considerably limits the realism of the physical processes parameterized in the models. The imminent arrival of petaflop machines in France offers a unique opportunity to develop new climate models, with the aim of reducing recurrent biases and uncertainties in climate simulations and in long-term projections of global change. Our approach aims to build a modeling platform for multi-scale coupled ocean-atmosphere simulations by introducing high-resolution oceanic and atmospheric "zoom" models, in these two regions, within a global climate model. Following this strategy, we will be able to represent the fine oceanic and atmospheric scales of dynamical processes, and to allow regional processes to feed back on the global climate. This presentation will introduce the technical and scientific problems related to climate modeling in general and to this project in particular. We will then illustrate the first results obtained from simulations carried out on Curie.
High performance computing has helped raise computational biochemistry to new levels of accuracy and power, recognized last year by the Nobel prize awarded to Karplus, Levitt, and Warshel: the first Nobel prize ever to reward simulation work. Resources like GPUs, volunteer computing and specialized computers bring protein folding and ligand binding within reach of simulation timescales. Today's computers also allow us to upgrade our physical models and introduce very sophisticated treatments of the molecular interactions. Nevertheless, it is still a major challenge to simulate large cellular machines over realistic timescales. Another challenge is to explore the vast space of possible protein mutations, in search of new proteins or biocatalysts. We will illustrate some of these achievements and challenges, with examples from the literature and our own work. Applications from our lab include theoretical issues with the calculation of biomolecular thermodynamics, and volunteer distributed computing for the design of new proteins.
POSTPONED The modeling and simulation of multiphase reacting flows covers a large spectrum of applications ranging from combustion in automobile and aeronautical engines to atmospheric pollution as well as biomedical engineering. In the framework of this seminar, we will mainly focus on a disperse liquid phase carried by a gaseous flow field which can be either laminar or turbulent; moreover, this spray can be polydisperse, that is, constituted of droplets with a large size spectrum. Thus, such flows involve a large range of temporal and spatial scales which have to be resolved in order to capture the dynamics of the phenomena and provide reliable and eventually predictive simulation tools. Even though the power of computing resources increases steadily, such very stiff problems can lead to serious numerical difficulties and prevent efficient multi-dimensional simulations. The purpose of this seminar is to introduce the Eulerian modeling of polydisperse evaporating sprays for various applications, in which the disperse liquid phase carried by a gaseous flow field is modeled by "fluid" conservation equations. Such an approach is very competitive for real applications since it lends itself well to optimization on parallel architectures and leads to an easy coupling with the resolution of the gaseous flow field. We will show that all the steps necessary to develop a new generation of computational codes have to be designed at the same time, with a high level of coherence: mathematical modeling through Eulerian moment methods, development of new dedicated stable and accurate numerical methods, implementation of optimized algorithms, as well as verification and validation of both model and methods using other codes and experimental measurements.
We will introduce both a new class of models and their mathematical analysis for the direct numerical simulation of spray dynamics, even in the presence of coalescence and break-up, as well as a set of dedicated numerical methods, and show that such an approach has the ability, once validated, to lead to high performance computing on parallel architectures. We will finally present a synthesis of recent contributions, which aim at: (1) on the one hand, transferring the proposed models into identified codes for industrial applications in the fields of solid propulsion, aeronautical and automotive engines; (2) on the other hand, extending the previous work to turbulent flows, where some scales cannot be resolved and have to be modeled, and where some dedicated numerical methods have to be designed.
Predicting the performance of fusion plasmas in terms of amplification factor, namely the ratio of the fusion power over the injected power, is among the key challenges in fusion plasma physics. In this perspective, turbulence and heat transport need to be modeled within the most accurate theoretical framework, using first-principle non-linear simulation tools. The gyrokinetic equation for each species, coupled to Maxwell's equations, is an appropriate self-consistent description of this problem. A new class of global full-f codes has recently emerged, solving the gyrokinetic equation for the entire distribution function on a large radial domain of the tokamak and using some prescribed external heat source [1]. Such simulations are extremely challenging and require state-of-the-art high performance computing (HPC). The non-linear global full-f gyrokinetic 5D code GYSELA, which focuses on the electrostatic toroidal branch of the Ion Temperature Gradient driven turbulence with adiabatic electrons, is one of them. One particularity of the code is that it solves the self-consistent problem on a fixed grid with a Backward Semi-Lagrangian scheme [2]. Despite the non-locality of this method, the new two-ion-species version of the code has been successfully ported to the BlueGene architecture with a relative efficiency of 91% on 458 752 cores (weak scaling). The hybrid OpenMP/MPI parallelization that makes such performance possible will be detailed. One mid-term objective is to implement kinetic electrons in the code. This will require an increase of the mesh size by a factor of the order of 10^3 and a decrease of the algorithm time step by a factor of 10. Present simulations already require petascale computing resources. We will discuss the various approaches we are currently investigating to prepare the code for our future exascale needs.
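For readers unfamiliar with the backward semi-Lagrangian idea, the following Python sketch applies it to the simplest possible case: 1D advection at constant velocity on a fixed periodic grid. It is a toy illustration only. The function name and parameters are hypothetical, and linear interpolation is an assumption made here for brevity; nothing in it is taken from the GYSELA code.

```python
import math


def semi_lagrangian_step(f, velocity, dx, dt):
    """Advance df/dt + v df/dx = 0 by one step on a fixed periodic grid."""
    n = len(f)
    new_f = []
    for i in range(n):
        # Follow the characteristic backwards from grid point i.
        foot = (i * dx - velocity * dt) / dx      # departure point in grid units
        left = math.floor(foot)
        weight = foot - left
        # Linear interpolation between the two neighbouring grid values.
        new_f.append((1.0 - weight) * f[left % n] + weight * f[(left + 1) % n])
    return new_f


n = 64
dx = 1.0 / n
f0 = [math.sin(2.0 * math.pi * i * dx) for i in range(n)]

# A displacement of three cells per step (CFL number 3) is handled directly.
f1 = semi_lagrangian_step(f0, velocity=1.0, dx=dx, dt=3 * dx)

# A non-integer displacement requires interpolation at the foot of the characteristic.
f2 = semi_lagrangian_step(f0, velocity=1.0, dx=dx, dt=2.5 * dx)
```

The value at each grid point is fetched from wherever the characteristic originated, which may lie several cells away. This is the non-locality mentioned above: on a distributed grid, the departure point can belong to another process.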
In this talk we will describe in detail a Density Functional Theory method based on a Daubechies wavelet basis set, named BigDFT. We will see that, thanks to the properties of wavelets, this code shows highly systematic convergence, very good performance and excellent efficiency for parallel calculations. The operations of the BigDFT code are also well suited to GPU acceleration. We will discuss how the benefits of this new technology can be reconciled with the needs for robustness and flexibility of a complex code like BigDFT.
Computational biology greatly benefits from approaches such as molecular dynamics simulations to study complex molecular assemblies. In this context, interactive visualization, manipulation and analysis aid hypothesis generation and exploration of large datasets. Integration of experimental data (SAXS, CryoEM, ...) in this modeling process is challenging. I will illustrate these issues through a) work on a complete dystrophin filament model [1] using our BioSpring simulation engine and b) interactive simulations and analysis of membrane proteins [2]. To tackle the corresponding visualization challenges, my group recently developed the UnityMol framework [3], based on the Unity3D game engine. A particular focus lies on interactive exploration and manipulation using tools such as haptic devices or, very recently, the LeapMotion controller. Possible display platforms are mobile devices, desktop workstations, display walls and virtual reality setups (CAVE, workbench, ...). [1] Molza et al., FD169: Innovative interactive flexible docking method for multi-scale reconstruction elucidates dystrophin molecular assembly, Faraday Discussion #169, 2014 [2] Dreher et al., ExaViz: a Flexible Framework to Analyse, Steer and Interact with Molecular Dynamics Simulations, Faraday Discussion #169, 2014 [3] Lv et al., Game on, Science - how video game technology may help biologists tackle visualization challenges, PLoS ONE 8(3):e57990, 2013 (http://unitymol.sourceforge.net)
Performing large, intensive or non-trivial computations on array-like data structures is one of the most common tasks in scientific computing, video game development and other fields. This is reflected in the large number of tools, languages and libraries available for such tasks. If we restrict ourselves to C++ based solutions, more than a dozen such libraries exist, from BLAS/LAPACK C++ bindings to template meta-programming based libraries such as Blitz++ or Eigen2. While all of these libraries provide good performance or good abstraction, none of them seems to fit the needs of so many different types of users. Moreover, as parallel system complexity grows, maintaining all those components quickly becomes unwieldy. This talk explores various software design techniques, such as Generative Programming, MetaProgramming and Generic Programming, and their application to the implementation of various parallel computing libraries in such a way that abstraction and expressiveness are maximized while the cost in efficiency is minimized. As a conclusion, we'll skim over various applications and see how they can benefit from such tools.
Recent years have seen impressive progress towards hydrodynamic cosmological simulations of galaxy formation that try to account for much of the relevant physics in a realistic fashion. At the same time, numerical uncertainties and scaling limitations in the available simulation codes have been recognized as important challenges. I will review the state of the field in this area, highlighting a number of recent results obtained with large particle-based and mesh-based simulations. I will also describe a novel moving-mesh methodology for gas dynamics in which a fully dynamic and adaptive Voronoi tessellation is used to formulate a finite volume discretization of hydrodynamics which offers numerous advantages compared with traditional techniques. The new approach is fully Galilean-invariant and gives much smaller advection errors than ordinary Eulerian codes, while at the same time offering comparable accuracy for treating shocks and an improved treatment of contact discontinuities. The scheme adjusts its spatial resolution to the local clustering of the flow automatically and continuously, and hence retains a principal advantage of SPH for simulations of cosmological structure growth. Applications of the method in large production calculations that aim to produce disc galaxies similar to the Milky Way will be discussed.
Since 1998, the question of the origin of the accelerated expansion of the Universe has become one of the most fundamental open problems in cosmology. To investigate this question, numerical simulations of large scale structure formation such as clusters of galaxies and filaments are a particularly relevant tool. In this presentation, I will review the current status of these simulations and the related computational problems. I will focus on the Full Universe Run that was carried out on the entire Curie supercomputer in 2012. Then, I will highlight the lessons learned from this simulation and the ongoing library development based on generative programming to address most of the issues of current cosmological codes. The goal is to avoid the intertwining of physics, algorithms, and parallelization, and to make the best of supercomputers by creating a "compiler inside a compiler". I will describe these techniques and explain their substantial advantages for large cosmological simulations.
As HPC systems keep growing in scale, providing efficient fault tolerance mechanisms becomes a major issue. Studies on future exascale systems highlight that, considering that the expected mean time between failures will range from one day to a few hours, simple solutions based on coordinated checkpoints saved to a parallel file system will not work: more time will be spent dealing with failures than doing useful computation. As a consequence, new checkpointing techniques should be designed. A checkpointing protocol for large scale HPC systems should provide good performance in failure-free execution and in recovery while limiting the amount of resources used for fault tolerance. Designing a solution that can achieve all these conflicting goals is a hard task. In this talk, I will introduce hybrid rollback-recovery protocols as a solution to this problem: a hybrid rollback-recovery protocol combines coordinated checkpointing with some message logging to provide failure containment. In the first part, I will explain why hybrid protocols can be efficiently applied to most MPI HPC applications. In the second part, I will present SPBC, our new checkpointing solution based on a hybrid protocol. SPBC is the first checkpointing solution that provides failure containment without logging any information reliably apart from process checkpoints, and it does so without penalizing recovery performance. To achieve this result, we used an original approach: instead of designing a protocol that works for all message-passing applications, we identified properties common to our target applications, namely MPI HPC applications, and we leveraged these properties to design a fault tolerant solution that can be more efficient than existing protocols at large scale.
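The claim that plain coordinated checkpointing stops paying off when failures become frequent can be illustrated with a classical first-order model (Young's approximation), which is not part of the talk itself. In the sketch below, the 30-minute checkpoint cost is an assumed value chosen for illustration, and the model ignores restart time and failures that occur during a checkpoint.

```python
import math


def young_interval(checkpoint_cost, mtbf):
    """Young's first-order estimate of the best time between checkpoints."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def wasted_fraction(checkpoint_cost, mtbf, interval):
    """First-order waste: checkpoint overhead plus expected lost work."""
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


checkpoint_cost = 30.0                     # minutes, assumed for illustration
for mtbf in (24 * 60.0, 2 * 60.0):         # one day, then two hours
    interval = young_interval(checkpoint_cost, mtbf)
    waste = wasted_fraction(checkpoint_cost, mtbf, interval)
    print(f"MTBF {mtbf:6.0f} min: checkpoint every {interval:5.1f} min, "
          f"about {100 * waste:.0f}% of the time wasted")
```

With these assumed numbers, the estimated waste stays moderate for a one-day mean time between failures but exceeds half of the machine time when failures occur every two hours. The model is crude in that regime, yet it shows the trend that motivates cheaper checkpoints and failure containment.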
With exascale computing on the horizon, the performance variability of I/O systems represents a key challenge in sustaining high performance. In many HPC applications, I/O is performed concurrently by all processes, which leads to I/O bursts. This causes resource contention and substantial variability of I/O performance, which significantly impacts the overall application performance and, most importantly, its predictability over time. In this talk, we describe an original approach to I/O, called Damaris, which leverages dedicated I/O cores on each multicore SMP node, along with the use of shared memory, to efficiently perform asynchronous data processing and I/O in order to hide this variability. We evaluated Damaris on various supercomputers including Titan (ranked 1st in the Top500 at the time of the experiment), Jaguar (ranked 2nd at the time of the experiment) and Kraken (ranked 11th at the time of the experiment), with the CM1 atmospheric model, one of the target HPC applications for the Blue Waters post-petascale supercomputer project. By overlapping I/O with computation and by gathering data into large files while avoiding synchronization between cores, our solution brings several benefits: 1) it fully hides jitter as well as all I/O-related costs, which makes simulation performance predictable; 2) it increases the sustained write throughput by a factor of 15 compared to standard approaches; 3) it allows almost perfect scalability of the simulation up to over 16,000 cores, as opposed to state-of-the-art approaches which fail to scale; 4) it enables a 600% compression ratio without any additional overhead, leading to a major reduction of storage requirements. Additionally, based on Damaris, the Damaris/Viz framework provides support for easy, nonintrusive in situ visualization.
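The general pattern of handing data to a dedicated I/O worker, so that the simulation does not wait for writes, can be sketched in a few lines of Python. This is only an analogy built on a thread and a queue: the function names are hypothetical, the "write" is a stand-in, and none of it reflects the actual Damaris interface, which relies on dedicated cores and shared memory.

```python
import queue
import threading


def io_worker(pending, written):
    """Dedicated writer: drains the queue while the simulation keeps computing."""
    while True:
        item = pending.get()
        if item is None:             # sentinel: the simulation has finished
            break
        step, data = item
        written.append((step, sum(data)))   # stand-in for a real file write


def run_simulation(steps):
    pending = queue.Queue()
    written = []
    writer = threading.Thread(target=io_worker, args=(pending, written))
    writer.start()
    state = [0.0] * 8
    for step in range(steps):
        state = [x + 1.0 for x in state]    # stand-in for the compute kernel
        pending.put((step, list(state)))    # hand over a copy, do not wait
    pending.put(None)
    writer.join()
    return written


results = run_simulation(5)
```

The compute loop only pays for copying its data into the queue; the cost and the variability of the write itself are absorbed by the worker.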
Simulations of astrophysical phenomena generally feature a large spread of length and time scales. This makes them extremely challenging from a computational point of view. In many instances, delicate trade-offs between resolution and fidelity to the physical conservation laws have to be made in order to overcome the computational limitations. It is more the rule than the exception that astrophysical phenomena are modeled in a resolution-starved regime. This places great robustness requirements on the numerical methods used. For instance, many systems of conservation laws used to model physical phenomena possess companion laws. These companion laws are generically fulfilled by analytical solutions to the original system of conservation laws. However, this may no longer hold when the equations are solved numerically. A prominent example is the divergence constraint on the magnetic field in magnetohydrodynamics. Other examples include the conservation of angular momentum and the preservation of steady states. We will present a class of methods, termed structure preserving, that are constructed to fulfill as many companion laws of the original system of conservation laws as possible. The defects of standard numerical methods and the need for structure preserving ones will be illustrated through several challenging astrophysical scenarios, including the simulation of magneto-rotationally driven core-collapse supernovae and the merger of two neutron stars.
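The divergence constraint gives a compact example of what "structure preserving" can mean in practice. The Python sketch below uses a staggered 2D periodic grid, with the magnetic field defined on cell faces as the discrete curl of a vector potential stored at cell corners. The grid size and the choice of potential are assumptions made for this illustration, and the construction shown is a standard staggered-grid one, not necessarily the method presented in the talk.

```python
import math

n = 16
dx = dy = 1.0 / n

# Vector potential A_z sampled at the cell corners of a periodic grid.
A = [[math.sin(2 * math.pi * i * dx) * math.cos(2 * math.pi * j * dy)
      for j in range(n)] for i in range(n)]

# Face-centred field B = curl(A_z e_z): Bx = dA/dy, By = -dA/dx.
Bx = [[(A[i][(j + 1) % n] - A[i][j]) / dy for j in range(n)] for i in range(n)]
By = [[-(A[(i + 1) % n][j] - A[i][j]) / dx for j in range(n)] for i in range(n)]


def divergence(i, j):
    """Discrete divergence of B in cell (i, j), computed from its four faces."""
    return ((Bx[(i + 1) % n][j] - Bx[i][j]) / dx
            + (By[i][(j + 1) % n] - By[i][j]) / dy)


max_div = max(abs(divergence(i, j)) for i in range(n) for j in range(n))
max_b = max(abs(b) for row in Bx for b in row)
```

Because the same corner values enter the four face fluxes with opposite signs, the discrete divergence cancels term by term and `max_div` stays at round-off level compared with `max_b`, whatever the resolution.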
When a massive star reaches the end of its life, the core of the star collapses to a neutron star or black hole while the outer stellar layers are expelled in a supernova explosion. These cosmic catastrophes are not only among the most spectacular celestial phenomena, they are also responsible for the production and dissemination of a major part of the heavy elements in the universe. A better understanding of the role of supernovae in astrophysics and as laboratories for nuclear and particle physics at extreme conditions requires the solution of one of the most long-standing problems of stellar physics: What is the mechanism that initiates and powers the explosion of stars? Increasingly sophisticated numerical models provide growing support that the energy deposition by neutrinos radiated from the hot, newly formed neutron star and aided by violent hydrodynamic mass motions is the driving agency of the explosion. In this talk I will review recent successes of theoretical modeling and new questions arising as simulations currently push forward to meet the grand computational challenges of the third spatial dimension. I will also discuss possibilities to confront the theoretical picture with observational tests and constraints.
In this work we investigate the parallel scalability of variants of additive Schwarz preconditioners for three-dimensional non-overlapping domain decomposition methods. To alleviate the computational cost, both in terms of memory and floating-point complexity, we investigate variants based on a sparse approximation. The robustness of the preconditioners is illustrated on a set of linear systems arising from the finite element discretization of academic convection-diffusion problems (unsymmetric matrices), and from real-life structural mechanical problems (symmetric indefinite matrices). Parallel experiments on up to a thousand processors will be presented for some problems, and results of an ongoing implementation on top of runtime systems for heterogeneous computing will be discussed. Efficiency, from both the numerical and the parallel performance viewpoints, is studied on problems ranging from a few hundred thousand unknowns up to a few tens of millions.
Life Science is already an important player in computational science and with the ever increasing computational resources it is now possible to study systems composed of thousands of atoms. While different models are used to simulate the behaviour of these systems, all of them are solving an N-Body problem.
The code POLARIS(MD), developed by Michel Masella from the CEA/DSV, is based on a hierarchical model of the solvent surrounding a protein and an accurate microscopic interaction model. This code can handle large biological systems on far fewer CPU cores compared to its peers, which allows a biochemist to explore the space of the molecular complexity and to find the most suitable molecule at much lower cost.
At the Exascale Computing Research center we have analysed POLARIS(MD) with our in-house tools as well as common analysis tools. We have further optimized the code to run efficiently on Intel Xeon and Xeon-Phi architectures, in order to prepare for forthcoming architectures. The talk will present the latest results and give an overview of the experience gathered while preparing codes for higher parallelism on a single node.
Today's many-core GPUs allow theoretical teraflop performance at the cost of a personal computer. But parallel performance strongly depends on the choice of both models and numerical methods. We believe that the rise of manycore processing will deeply impact next-generation computational approaches. In this talk we will focus on a few examples: compact stencil remapped Lagrange methods and low-diffusive transport solvers for interface capturing. We will conclude the talk with a set of demos, showing the capability to attain real-time computations with live visualization and interaction for 2D computations (including incompressible Navier-Stokes, compressible Euler equations and thermal-CFD coupling). As for applications, we plan to develop serious games involving multiple users using heterogeneous interacting devices. This will be deployed on the DIGISCOPE visualization infrastructure.
We present several numerical simulations of conservation laws on recent multicore processors, such as GPUs, using the OpenCL programming framework. Depending on the chosen numerical method, different implementation strategies have to be considered to achieve the best performance. We explain how to program three methods efficiently: a finite volume approach on a structured grid, a high order Discontinuous Galerkin (DG) method on an unstructured grid and a Particle-In-Cell (PIC) method. The three methods are respectively applied to a two-fluid computation, a Maxwell simulation and a Vlasov-Maxwell simulation.
Domain decomposition methods are well suited for parallel computations. Indeed, the division of a problem into smaller subproblems, through artificial subdivisions of the domain, is a means for introducing parallelism. Domain decomposition strategies include in one way or another the following ingredients:
This talk shows how these methods have evolved efficiently over the years by using specially designed boundary conditions on the interface. These optimized interface conditions, designed for instance to take into account the heterogeneity between the subdomains on each side of the interfaces (in porous media) or the propagation of the wave through the interfaces (in acoustics), lead to robust and efficient algorithms. In order to use such methods on massively parallel computers, the iterative scheme should be modified, as investigated in this talk. Chaotic iterations are here considered for the solution strategy of the interface problem, leading to some convergence difficulties. After the presentation of the proof of the convergence of the method, numerical experiments are performed on large scale engineering problems to illustrate the robustness and efficiency of the proposed method.
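To give a feel for what chaotic (asynchronous) iterations are, the Python sketch below applies them to a small linear system instead of a real interface problem. Components are updated one at a time in random order, and each update may read outdated values of the other components, as would happen when subdomains do not wait for one another. The function name, the delay model and the test matrix are all assumptions made for this illustration.

```python
import random


def chaotic_relaxation(A, b, sweeps=200, max_delay=3, seed=0):
    """Solve A x = b by updating one random component at a time,
    reading the other components from possibly outdated iterates."""
    rng = random.Random(seed)
    n = len(b)
    history = [[0.0] * n]                       # past iterates, newest last
    for _ in range(sweeps * n):
        i = rng.randrange(n)
        total = b[i]
        for j in range(n):
            if j != i:
                # Each neighbour value may be up to max_delay updates old.
                delay = rng.randint(0, min(max_delay, len(history) - 1))
                total -= A[i][j] * history[-1 - delay][j]
        x = list(history[-1])
        x[i] = total / A[i][i]
        history.append(x)
        history = history[-(max_delay + 1):]    # keep only what can be read
    return history[-1]


# Strictly diagonally dominant test system (chosen for this sketch).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]           # exact solution is x = [1, 1, 1]
x = chaotic_relaxation(A, b)
```

Strict diagonal dominance is what guarantees convergence despite the stale reads in this toy case. For general problems no such guarantee holds without further analysis, which is why a convergence proof matters.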
XIOS is a new tool developed at IPSL (Institut Pierre Simon Laplace) to efficiently manage the file output of climate simulation models. It has two main objectives:
Quantum chemistry is known to be one of the grand challenges of modern science, since many fundamental and applied fields are concerned (drug design, micro-electronics, nanosciences, ...). Investigating all these fascinating problems is a tremendous task, since highly accurate solutions of the fundamental underlying Schrödinger equation for a (very) large number of electrons need to be determined. Quantum Monte Carlo methods are an emerging alternative to the usual methods, since they can take advantage of massively parallel architectures. In this talk the QMC=Chem program we develop in Toulouse will be presented, as well as the different strategies we used to reach the petaflop/s scale.
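Why Monte Carlo methods parallelize so naturally can be seen on a toy problem. The Python sketch below runs a variational Monte Carlo estimate for a 1D harmonic oscillator with a Gaussian trial function. The walkers are fully independent and only their averages are combined at the end. The model problem, function names and parameter values are assumptions made for illustration and are unrelated to the QMC=Chem implementation.

```python
import math
import random


def local_energy(x, a):
    """Local energy of the trial function exp(-a x^2) for H = -1/2 d2/dx2 + x^2/2."""
    return a + x * x * (0.5 - 2.0 * a * a)


def run_walker(seed, a, steps=20000, step_size=1.0):
    """One independent Metropolis walker sampling |psi|^2 = exp(-2 a x^2)."""
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)
        # Accept with probability min(1, |psi(trial)|^2 / |psi(x)|^2).
        if rng.random() < math.exp(-2.0 * a * (trial * trial - x * x)):
            x = trial
        total += local_energy(x, a)
    return total / steps


a = 0.4
# Walkers share nothing, so each could run on its own core or node.
energies = [run_walker(seed, a) for seed in range(8)]
estimate = sum(energies) / len(energies)
exact = a / 2.0 + 1.0 / (8.0 * a)     # analytic expectation for this trial function
```

Since each walker needs only its own seed, the loop over seeds can be distributed with no communication until the final average, and the statistical error shrinks as more walkers are added.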