18.2.470. MPIX_Comm_get_failed
MPIX_Comm_get_failed - Obtain a group that lists failed processes in a communicator.
This is part of the User Level Failure Mitigation (ULFM) extension.
18.2.470.1. SYNTAX
18.2.470.1.1. C Syntax
#include <mpi.h>
#include <mpi-ext.h>
int MPIX_Comm_get_failed(MPI_Comm comm, MPI_Group *failedgrp)
18.2.470.1.2. Fortran Syntax
USE MPI
USE MPI_EXT
! or the older form: INCLUDE 'mpif.h'
MPIX_COMM_GET_FAILED(COMM, FAILEDGRP, IERROR)
     INTEGER COMM, FAILEDGRP, IERROR
18.2.470.1.3. Fortran 2008 Syntax
USE mpi_f08
USE mpi_ext_f08
MPIX_Comm_get_failed(comm, failedgrp, ierror)
     TYPE(MPI_Comm), INTENT(IN) :: comm
     TYPE(MPI_Group), INTENT(OUT) :: failedgrp
     INTEGER, OPTIONAL, INTENT(OUT) :: ierror
18.2.470.2. INPUT PARAMETERS
- comm: Communicator (handle).
18.2.470.3. OUTPUT PARAMETERS
- failedgrp: Group (handle).
- ierror: Fortran only: Error status (integer).
18.2.470.4. DESCRIPTION
This local operation returns the group failedgrp of processes from the communicator comm that are locally known to have failed. The failedgrp can be empty, that is, equal to MPI_GROUP_EMPTY.
For any two groups obtained from calls to this routine at the same MPI process with the same comm, the intersection of the larger group with the smaller group is MPI_IDENT to the smaller group. That is, the same processes have the same ranks in both groups, up to the size of the smaller group.
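This ordering guarantee can be illustrated as a prefix check on plain rank lists. Each list holds the comm ranks of one failedgrp in group order, obtained for example with MPI_Group_translate_ranks. The sketch below is purely illustrative and makes no MPI calls. The function name failed_lists_consistent is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: given the comm ranks of two failed groups obtained
 * at the same process (listed in group order), returns true when the
 * shorter list is a prefix of the longer one, i.e. the same processes
 * appear at the same ranks up to the size of the smaller group. */
bool failed_lists_consistent(const int *a, size_t na, const int *b, size_t nb)
{
    size_t n = na < nb ? na : nb;  /* size of the smaller group */
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return false;
    return true;
}
```

For example, an earlier result {3, 7} followed by a later result {3, 7, 1} is consistent. A later result that reorders the first two entries is not.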
18.2.470.5. PROCESS FAILURES
MPI makes no assumption about asynchronous progress of the failure detection. A valid MPI implementation may choose to update the group of locally known failed MPI processes only when it enters a function that must raise a fault tolerance error.
It is possible that only the calling MPI process has detected the reported failure. If global knowledge of the failure is necessary, the MPI processes that detect it should call MPIX_Comm_revoke so that an error is also raised at the other ranks of comm.
18.2.470.6. WHEN COMMUNICATOR IS AN INTER-COMMUNICATOR
When comm is an inter-communicator, failedgrp contains the processes known to have failed from both the local and the remote groups of comm.
18.2.470.7. ERRORS
Almost all MPI routines return an error value; C routines as the return result of the function and Fortran routines in the last argument.
Before the error value is returned, the current MPI error handler associated with the communication object (e.g., communicator, window, file) is called. If no communication object is associated with the MPI call, the call is considered attached to MPI_COMM_SELF and calls the error handler associated with it. When MPI_COMM_SELF is not initialized (i.e., before MPI_Init/MPI_Init_thread, after MPI_Finalize, or when using the Sessions Model exclusively), the error invokes the initial error handler. The initial error handler can be changed by calling MPI_Comm_set_errhandler on MPI_COMM_SELF when using the World Model, by passing the mpi_initial_errhandler CLI argument to mpiexec, or by setting the mpi_initial_errhandler info key in MPI_Comm_spawn/MPI_Comm_spawn_multiple. If no other appropriate error handler has been set, the MPI_ERRORS_RETURN error handler is called for MPI I/O functions and the MPI_ERRORS_ABORT error handler is called for all other MPI functions.
Open MPI includes three predefined error handlers that can be used:
- MPI_ERRORS_ARE_FATAL: Causes the program to abort all connected MPI processes.
- MPI_ERRORS_ABORT: An error handler that can be invoked on a communicator, window, file, or session. When called on a communicator, it acts as if MPI_Abort was called on that communicator. When called on a window or file, it acts as if MPI_Abort was called on a communicator containing the group of processes in the corresponding window or file. When called on a session, it aborts only the local process.
- MPI_ERRORS_RETURN: Returns an error code to the application.
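MPI applications can also implement their own error handlers by calling:
- MPI_Comm_create_errhandler
- MPI_File_create_errhandler
- MPI_Session_create_errhandler
- MPI_Win_create_errhandler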
Note that MPI does not guarantee that an MPI program can continue past an error.
See the MPI man page for a full list of MPI error codes.
See the Error Handling section of the MPI-3.1 standard for more information.
See also