Nemo-Med: Extra-Halo Performance Model
EPICOCO, Italo;ALOISIO, Giovanni
2011-01-01
Abstract
The NEMO oceanic model, configured at a resolution of 1/16° for the Mediterranean Basin and used at CMCC, has been analyzed to identify possible bottlenecks to parallel scalability. A detailed scalability analysis of all the routines called during a NEMO time step identified the SOR solver routine as the most expensive in terms of communication. The routine implements the red-black successive over-relaxation method, an iterative algorithm used to solve the elliptic equation for the barotropic stream function. The algorithm iterates until convergence is reached, subject to a limit on the maximum number of iterations. The high frequency of data exchange within this routine implies a high communication overhead. The NEMO code includes an enhanced version of the routine that reduces the frequency of communication by adding an extra-halo region. Using this optimization requires selecting the extra-halo size that best trades off computation against communication. A performance model that allows the optimal extra-halo value to be chosen for a predefined decomposition has been designed. The model has been tested on the MareNostrum cluster at the Barcelona Supercomputing Centre.
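For readers unfamiliar with the solver named in the abstract, the following is a minimal single-process sketch of red-black successive over-relaxation with a convergence test and an iteration cap. It is not NEMO's Fortran routine: the function name `red_black_sor`, the parameters `omega`, `tol` and `max_iter`, and the model problem (a 2-D Poisson equation with zero boundary values on a uniform grid) are all assumptions chosen for illustration, and no MPI halo exchange is shown.

```python
import numpy as np


def red_black_sor(f, h, omega=1.5, tol=1e-8, max_iter=10_000):
    """Illustrative red-black SOR for laplacian(u) = f with u = 0 on the boundary.

    f        : 2-D array holding the right-hand side on a uniform grid
    h        : grid spacing (assumed equal in both directions)
    omega    : relaxation factor, 0 < omega < 2
    tol      : stop when the largest update in one full sweep is below tol
    max_iter : hard cap on the number of full (red + black) sweeps
    Returns (u, sweeps_done, converged).
    """
    f = np.asarray(f, dtype=float)
    u = np.zeros_like(f)

    # Checkerboard colouring of the interior points: the four neighbours of a
    # "red" point are all "black" and vice versa, so each colour can be
    # updated in one vectorised step.
    ii, jj = np.meshgrid(np.arange(f.shape[0]), np.arange(f.shape[1]), indexing="ij")
    interior = np.zeros(f.shape, dtype=bool)
    interior[1:-1, 1:-1] = True
    masks = [interior & ((ii + jj) % 2 == colour) for colour in (0, 1)]

    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for mask in masks:
            # Sum of the four neighbours, using the latest values of u.
            nb = np.zeros_like(u)
            nb[1:-1, 1:-1] = u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
            gauss_seidel = 0.25 * (nb - h * h * f)
            delta = omega * (gauss_seidel[mask] - u[mask])
            u[mask] += delta
            max_change = max(max_change, float(np.max(np.abs(delta))))
            # In a domain-decomposed run, a halo exchange would typically be
            # needed here, after each colour sweep (not shown in this sketch).
        if max_change < tol:
            return u, sweep, True
    return u, max_iter, False


if __name__ == "__main__":
    n = 33
    x = np.linspace(0.0, 1.0, n)
    X, Y = np.meshgrid(x, x, indexing="ij")
    rhs = -2.0 * np.pi**2 * np.sin(np.pi * X) * np.sin(np.pi * Y)
    u, sweeps, ok = red_black_sor(rhs, x[1] - x[0])
    print("converged:", ok, "after", sweeps, "sweeps")
```

The comment inside the colour loop marks where the per-sweep data exchange would occur in a parallel code; that exchange is the communication whose frequency the extra-halo variant described above is meant to reduce, at the price of extra computation on the enlarged halo.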