User-Defined Derived Type Input/Output with NVHPC compiler

User-Defined Derived Type Input/Output (UDTIO) allows the programmer to specify how a derived type is read from or written to a file. The user of an object can then perform input/output operations on it without any knowledge of the object's layout.

Case study: Unformatted I/O

Consider the following mesh class:

type t_mesh
  integer :: nod2D
  real(kind=WP) :: ocean_area
  real(kind=WP), allocatable, dimension(:,:) :: coord_nod2D
contains
  ! type-bound procedures must follow a "contains" statement
  procedure, private :: write_t_mesh
  procedure, private :: read_t_mesh
  generic :: write(unformatted) => write_t_mesh
  generic :: read(unformatted) => read_t_mesh
end type t_mesh

The subroutines for IO of this derived data type are:

subroutine write_t_mesh(mesh, unit, iostat, iomsg)

  class(t_mesh), intent(in) :: mesh
  integer, intent(in) :: unit
  integer, intent(out) :: iostat
  character(*), intent(inout) :: iomsg

  ! return on the first error so a later statement cannot overwrite iostat
  write(unit, iostat=iostat, iomsg=iomsg) mesh%nod2D
  if (iostat /= 0) return
  write(unit, iostat=iostat, iomsg=iomsg) mesh%ocean_area
  if (iostat /= 0) return
  call write_bin_array(mesh%coord_nod2D, unit, iostat, iomsg)

end subroutine write_t_mesh

subroutine read_t_mesh(mesh, unit, iostat, iomsg)

  class(t_mesh), intent(inout) :: mesh
  integer, intent(in) :: unit
  integer, intent(out) :: iostat
  character(*), intent(inout) :: iomsg

  ! return on the first error so a later statement cannot overwrite iostat
  read(unit, iostat=iostat, iomsg=iomsg) mesh%nod2D
  if (iostat /= 0) return
  read(unit, iostat=iostat, iomsg=iomsg) mesh%ocean_area
  if (iostat /= 0) return
  call read_bin_array(mesh%coord_nod2D, unit, iostat, iomsg)

end subroutine read_t_mesh

The main program can then read or write the object without knowing the layout of the derived data type:

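program main

  ! t_mesh and path_in are assumed to be provided by modules not shown here
  implicit none
  type(t_mesh) :: mesh
  integer :: fileunit

  ! mesh derived type
  open(newunit = fileunit, &
       file = trim(path_in), &
       status = 'replace', &
       form = 'unformatted')
  write(fileunit) mesh
  close(fileunit)

end program main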

The NVHPC compiler requires the writing/reading subroutines to be declared private. If the generic write/read bindings are also declared private, the program will not fail with a runtime error, but inconsistent behavior can be observed (such as wrong data being read from files).
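The pieces of this case study can be combined into a single round trip. The following is a minimal sketch, not the original implementation. The module name mesh_sketch, the definition of WP, the file name mesh_sketch.bin, and the inlined handling of the allocatable array (allocation flag, then shape, then data, standing in for write_bin_array/read_bin_array, which are not shown in this document) are all assumptions made for illustration.

```fortran
module mesh_sketch
  implicit none
  private
  public :: t_mesh, WP

  integer, parameter :: WP = kind(1.0d0)   ! assumed working precision

  type t_mesh
    integer :: nod2D = 0
    real(kind=WP) :: ocean_area = 0.0_WP
    real(kind=WP), allocatable :: coord_nod2D(:,:)
  contains
    ! specific procedures private, generic bindings public (see the note above)
    procedure, private :: write_t_mesh
    procedure, private :: read_t_mesh
    generic :: write(unformatted) => write_t_mesh
    generic :: read(unformatted)  => read_t_mesh
  end type t_mesh

contains

  subroutine write_t_mesh(mesh, unit, iostat, iomsg)
    class(t_mesh), intent(in)    :: mesh
    integer,       intent(in)    :: unit
    integer,       intent(out)   :: iostat
    character(*),  intent(inout) :: iomsg

    write(unit, iostat=iostat, iomsg=iomsg) mesh%nod2D, mesh%ocean_area
    if (iostat /= 0) return
    ! store the allocation status and the shape so the reader can allocate
    write(unit, iostat=iostat, iomsg=iomsg) allocated(mesh%coord_nod2D)
    if (iostat /= 0 .or. .not. allocated(mesh%coord_nod2D)) return
    write(unit, iostat=iostat, iomsg=iomsg) shape(mesh%coord_nod2D)
    if (iostat /= 0) return
    write(unit, iostat=iostat, iomsg=iomsg) mesh%coord_nod2D
  end subroutine write_t_mesh

  subroutine read_t_mesh(mesh, unit, iostat, iomsg)
    class(t_mesh), intent(inout) :: mesh
    integer,       intent(in)    :: unit
    integer,       intent(out)   :: iostat
    character(*),  intent(inout) :: iomsg
    logical :: is_alloc
    integer :: dims(2)

    read(unit, iostat=iostat, iomsg=iomsg) mesh%nod2D, mesh%ocean_area
    if (iostat /= 0) return
    read(unit, iostat=iostat, iomsg=iomsg) is_alloc
    if (iostat /= 0) return
    if (allocated(mesh%coord_nod2D)) deallocate(mesh%coord_nod2D)
    if (.not. is_alloc) return
    read(unit, iostat=iostat, iomsg=iomsg) dims
    if (iostat /= 0) return
    allocate(mesh%coord_nod2D(dims(1), dims(2)))
    read(unit, iostat=iostat, iomsg=iomsg) mesh%coord_nod2D
  end subroutine read_t_mesh

end module mesh_sketch

program udtio_roundtrip
  use mesh_sketch, only: t_mesh, WP
  implicit none
  type(t_mesh) :: mesh_out, mesh_in
  integer :: fileunit

  mesh_out%nod2D = 3
  mesh_out%ocean_area = 1.5_WP
  allocate(mesh_out%coord_nod2D(2, 3))
  mesh_out%coord_nod2D = reshape([1.0_WP, 2.0_WP, 3.0_WP, &
                                  4.0_WP, 5.0_WP, 6.0_WP], [2, 3])

  ! write the whole object through the generic write(unformatted) binding
  open(newunit=fileunit, file='mesh_sketch.bin', status='replace', form='unformatted')
  write(fileunit) mesh_out
  close(fileunit)

  ! read it back through the generic read(unformatted) binding
  open(newunit=fileunit, file='mesh_sketch.bin', status='old', form='unformatted')
  read(fileunit) mesh_in
  close(fileunit, status='delete')

  if (mesh_in%nod2D /= mesh_out%nod2D) error stop 'nod2D mismatch'
  if (mesh_in%ocean_area /= mesh_out%ocean_area) error stop 'ocean_area mismatch'
  if (any(mesh_in%coord_nod2D /= mesh_out%coord_nod2D)) error stop 'coord_nod2D mismatch'
  print *, 'round trip ok'
end program udtio_roundtrip
```

Writing the shape ahead of the data is one possible design choice; it lets the reading side allocate the array before the values arrive, so the caller never needs to know the array dimensions in advance.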

Compilation errors

Private components of a derived data type are not accessible with traditional IO operations, so the following code generates a compilation error:

module my_mod
  type t
    integer :: x
    integer, private :: y
  end type t
end module my_mod
program prg1
  use my_mod
  type(t) :: obj
  write(*,*) obj ! Illegal due to private y
end program prg1

F2003 also does not allow IO operations on entire objects that have pointer components, so the following code generates a compilation error:

module my_mod
  type t
    integer :: x
    integer, pointer :: y
  end type
end module my_mod
program prg2
  use my_mod
  type(t) :: obj
  write(*,*) obj ! Illegal due to pointer y
end program prg2
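Both restrictions concern intrinsic I/O on the whole object. In standard Fortran 2003, a defined output procedure bound to the type lifts them, because the procedure lives inside the module and can access every component. The following is a minimal sketch for the private-component case using formatted UDTIO. The names private_component_sketch, make_t, and write_t are hypothetical, and the behavior under NVHPC has not been verified here.

```fortran
module private_component_sketch
  implicit none
  private
  public :: t, make_t

  type t
    integer :: x = 0
    integer, private :: y = 0
  contains
    procedure, private :: write_t
    generic :: write(formatted) => write_t
  end type t

contains

  ! helper constructor, needed because y cannot be set outside the module
  function make_t(x, y) result(obj)
    integer, intent(in) :: x, y
    type(t) :: obj
    obj%x = x
    obj%y = y
  end function make_t

  ! formatted UDTIO procedures take two extra arguments: iotype and v_list
  subroutine write_t(dtv, unit, iotype, v_list, iostat, iomsg)
    class(t),     intent(in)    :: dtv
    integer,      intent(in)    :: unit
    character(*), intent(in)    :: iotype     ! 'LISTDIRECTED' for write(*,*)
    integer,      intent(in)    :: v_list(:)  ! only used with DT edit descriptors
    integer,      intent(out)   :: iostat
    character(*), intent(inout) :: iomsg

    write(unit, '(a,i0,a,i0)', iostat=iostat, iomsg=iomsg) &
      'x=', dtv%x, ' y=', dtv%y
  end subroutine write_t

end module private_component_sketch

program prg1_udtio
  use private_component_sketch
  implicit none
  type(t) :: obj

  obj = make_t(1, 2)
  write(*,*) obj   ! accepted: output is delegated to write_t
end program prg1_udtio
```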

If a derived data type contains an allocatable array of another derived data type, the UDTIO operation fails at compile time. For example:

type t_tracer_data
  real(kind=WP), allocatable, dimension(:,:) :: values, valuesAB
  real(kind=WP) :: gamma0_tra, gamma1_tra, gamma2_tra
  integer :: ID
contains
  procedure, private :: WRITE_T_TRACER_DATA
  procedure, private :: READ_T_TRACER_DATA
  generic :: write(unformatted) => WRITE_T_TRACER_DATA
  generic :: read(unformatted) => READ_T_TRACER_DATA
end type t_tracer_data

type t_tracer_work
  real(kind=WP), allocatable :: del_ttf_advhoriz(:,:), del_ttf_advvert(:,:)
contains
  procedure, private :: WRITE_T_TRACER_WORK
  procedure, private :: READ_T_TRACER_WORK
  generic :: write(unformatted) => WRITE_T_TRACER_WORK
  generic :: read(unformatted) => READ_T_TRACER_WORK
end type t_tracer_work

type t_tracer
  integer :: num_tracers=2
  ! allocatable array of another derived type
  type(t_tracer_data), allocatable :: data(:)
  type(t_tracer_work) :: work
contains
  procedure, private :: WRITE_T_TRACER
  procedure, private :: READ_T_TRACER
  generic :: write(unformatted) => WRITE_T_TRACER
  generic :: read(unformatted) => READ_T_TRACER
end type t_tracer

Debugging during GPU porting with OpenACC

Parallel Compiler Assisted Software Testing (PCAST) is a feature available in the NVIDIA HPC Fortran, C++, and C compilers. It compares the GPU computation against the same program running on the CPU. In this case, all compute constructs are done redundantly, on both CPU and GPU. The GPU results can then be compared against the CPU results and the differences reported.

A semi-automatic method which can be used with OpenACC is to allow the runtime to automatically compare values when they are downloaded from device memory. This is enabled with the -gpu=autocompare compiler flag, which also enables the redundant option. This runs each compute construct redundantly on CPU and GPU and compares the results, with no changes to the program itself.

The autocompare feature only compares data when it would get downloaded to system memory. To compare data after some compute construct that is in a data region, where the data is already present on the device, there are three ways to do the comparison at any point in the program.

  • First, you can insert an update self directive to download the data to compare. With the autocompare option enabled, any data downloaded with an update self directive will be downloaded from the GPU and compared to the values computed on the host CPU.

  • Alternatively, you can add a call to acc_compare, which compares the values then present on the GPU with the corresponding values in host memory. The acc_compare routine has only two arguments: the address of the data to be compared and the number of elements to compare. The data type is available in the OpenACC runtime, so doesn’t need to be specified.

  • Finally, you can use the acc compare directive.
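The first of these options can be sketched as follows. This is a minimal illustration under stated assumptions: the program name, array names, and sizes are hypothetical, and the code is assumed to be compiled with something like nvfortran -acc -gpu=autocompare. Without OpenACC enabled the directives are treated as comments and the program runs on the CPU only.

```fortran
program autocompare_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000
  real(dp) :: a(n), b(n)
  integer :: i

  a = 1.0_dp
  b = 0.0_dp

  !$acc data copyin(a) copy(b)

  !$acc parallel loop
  do i = 1, n
    b(i) = 2.0_dp * a(i) + real(i, dp)
  end do

  ! b stays resident on the device inside the data region, so autocompare
  ! would not check it here on its own; the update downloads it explicitly
  !$acc update self(b)

  !$acc parallel loop
  do i = 1, n
    b(i) = b(i) + 1.0_dp
  end do

  !$acc end data

  ! final value: 2*1 + n + 1
  if (abs(b(n) - (3.0_dp + real(n, dp))) > 1.0e-12_dp) error stop 'unexpected result'
  print *, 'done, b(n) =', b(n)
end program autocompare_sketch
```

Placing the update self directive right after a suspect compute construct narrows down which kernel first produces diverging values; it can be removed again once the port is validated.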

Autocompare is useful for debugging differences between CPU and GPU results during porting. When a runtime error occurs during porting, the error output produced with the NVHPC compiler often gives little information. In this case, it is useful to set export NVCOMPILER_ACC_NOTIFY=3 in the batch script to get information about data movements and kernel launches, which helps locate the part of the code responsible for the crash.