The NEMO (Nucleus for European Modeling of the Ocean) oceanic model is one of the most widely used by the climate community. It is exploited with different configurations in more than 50 research projects for both long and short-term simulations. Computational requirements of the model and its implementation limit the exploitation of the emerging computational infrastructure at peta and exascale. A deep revision and analysis of the model and its implementation were needed. The paper describes the performance evaluation of the last release of the model, based on MPI parallelization, on the MareNostrum platform at the Barcelona Supercomputing Centre. The analysis of the scalability has been carried out taking into account different factors, i.e. the I/O system available on the platform, the domain decomposition of the model and the level of the parallelism. The analysis highlighted different bottlenecks due to the communication overhead. The code has been optimized reducing the communication weight within some frequently called functions and the parallelization has been improved introducing a second level of parallelism based on the OpenMP shared memory paradigm.
The NEMO Oceanic Model: Computational Performance Analysis and Optimization
EPICOCO, Italo
Conceptualization
;ALOISIO, GiovanniSupervision
2011-01-01
Abstract
The NEMO (Nucleus for European Modeling of the Ocean) oceanic model is one of the most widely used by the climate community. It is exploited with different configurations in more than 50 research projects for both long and short-term simulations. Computational requirements of the model and its implementation limit the exploitation of the emerging computational infrastructure at peta and exascale. A deep revision and analysis of the model and its implementation were needed. The paper describes the performance evaluation of the last release of the model, based on MPI parallelization, on the MareNostrum platform at the Barcelona Supercomputing Centre. The analysis of the scalability has been carried out taking into account different factors, i.e. the I/O system available on the platform, the domain decomposition of the model and the level of the parallelism. The analysis highlighted different bottlenecks due to the communication overhead. The code has been optimized reducing the communication weight within some frequently called functions and the parallelization has been improved introducing a second level of parallelism based on the OpenMP shared memory paradigm.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.