mp_collect bug when compiling with define PARALLEL_IO

Report or discuss software problems and other woes

Moderators: arango, robertson

xiaozhu557
Posts: 61
Joined: Fri Sep 11, 2009 1:48 pm
Location: nmefc

mp_collect bug when compiling with define PARALLEL_IO

#1 Post by xiaozhu557 » Fri Sep 22, 2017 4:35 am

Dear all,
I want to try parallel input/output to speed up I/O, but I ran into a bug in the routine mp_collect when I compiled the ROMS code 857 with the option #define PARALLEL_IO.

I have double-checked that the dummy argument A of the routine mp_collect_f(ng,model,Npts,Aspv,A) is declared in the module distribute_mod as

Code:

real(r8), intent(inout) :: A(Npts)
However, the function nf_fread2d calls it with the actual argument Awrk, which is declared as

Code:

real(r8),allocatable :: Awrk(:,:)
So, when compiling with the option #define PARALLEL_IO, I get the following error messages:

Code:

nf_fread2d.f90(314): error #6285: There is no matching specific subroutine for this generic subroutine call. [MP_COLLECT]
CALL mp_collect (ng, model, Npts, IniVal, Awrk)
-----------------^
compilation aborted for nf_fread2d.f90 (code 1)
make: *** [/fs01/home/Build/nf_fread2d.o] Error 1
make: *** Waiting for unfinished jobs....
Can anyone help me figure out this bug? Thanks.
Last edited by xiaozhu557 on Sat Sep 23, 2017 7:17 am, edited 1 time in total.

kate
Posts: 3770
Joined: Wed Jul 02, 2003 5:29 pm
Location: IMS/UAF, USA

Re: mp_collect bug when compiling with define PARALLEL_IO

#2 Post by kate » Fri Sep 22, 2017 5:12 pm

That's weird - my copy of the trunk code lists (line 2598):

Code:

      real(r8), intent(inout) :: A(Npts)
Try that.

xiaozhu557
Posts: 61
Joined: Fri Sep 11, 2009 1:48 pm
Location: nmefc

Re: mp_collect bug when compiling with define PARALLEL_IO

#3 Post by xiaozhu557 » Sat Sep 23, 2017 7:16 am

Thanks for your reply, Kate.
My copy also has

Code:

real(r8), intent(inout) :: A(Npts)
I am sorry for the error in my previous post; I have now corrected it.

However, the bug I described in my previous post is caused by a rank mismatch between the actual argument (Awrk(:,:), two dimensions) and the dummy argument (A(Npts), one dimension) of the subroutine mp_collect.

kate
Posts: 3770
Joined: Wed Jul 02, 2003 5:29 pm
Location: IMS/UAF, USA

Re: mp_collect bug when compiling with define PARALLEL_IO

#4 Post by kate » Sat Sep 23, 2017 3:23 pm

Ah, then the solution is to add another mp_collect_f routine to the interface with the 2-D array.

Frankly, my I/O speed is limited by the output, not the input. In my branch I split the parallel I/O into two separate switches. After trying the PARALLEL_OUT option, I ended up going for compression of the output instead of parallel output. Which is faster depends on many things, but we need the disk space more. ;)

xiaozhu557
Posts: 61
Joined: Fri Sep 11, 2009 1:48 pm
Location: nmefc

Re: mp_collect bug when compiling with define PARALLEL_IO

#5 Post by xiaozhu557 » Sun Sep 24, 2017 11:59 am

Thank you for the reminder about compressing the output, Kate.

Do you have an mp_collect_f routine for the 2-D array that could be added to the interface? I am not good at parallel programming, so if you have one, could you share it with me?

Thanks.

kate
Posts: 3770
Joined: Wed Jul 02, 2003 5:29 pm
Location: IMS/UAF, USA

Re: mp_collect bug when compiling with define PARALLEL_IO

#6 Post by kate » Mon Sep 25, 2017 4:39 pm

No, I don't have it. However, distribute.F has other examples of using arrays of differing shapes, such as mp_assemblef_1d and mp_assemblef_2d.
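
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of one generic name resolving to rank-specific routines. It is a toy, not ROMS code: the names collect_demo, collect, collect_1d and collect_2d are hypothetical, there is no MPI in it, and the "work" done by the 1-D routine is only a stand-in. It assumes Npts equals the total size of the 2-D array. Fortran resolves a generic call by the type, kind and rank of the actual arguments, which is why a rank-2 Awrk finds no match when only a rank-1 specific exists.

```fortran
! Toy illustration only: none of these routine names come from ROMS.
module collect_demo
  implicit none
  integer, parameter :: r8 = selected_real_kind(12, 300)

  ! One generic name, two rank-specific routines.
  interface collect
    module procedure collect_1d
    module procedure collect_2d
  end interface collect

contains

  ! Rank-1 specific. Stand-in for the real work: it simply
  ! replaces entries equal to the special value Aspv by zero.
  subroutine collect_1d (Npts, Aspv, A)
    integer,  intent(in)    :: Npts
    real(r8), intent(in)    :: Aspv
    real(r8), intent(inout) :: A(Npts)
    integer :: i
    do i = 1, Npts
      if (A(i) == Aspv) A(i) = 0.0_r8
    end do
  end subroutine collect_1d

  ! Rank-2 specific. Copies the data into a 1-D work array,
  ! forwards to the rank-1 routine, and copies the result back.
  ! Assumes Npts == size(A).
  subroutine collect_2d (Npts, Aspv, A)
    integer,  intent(in)    :: Npts
    real(r8), intent(in)    :: Aspv
    real(r8), intent(inout) :: A(:,:)
    real(r8) :: Awrk(Npts)
    Awrk = reshape(A, (/ Npts /))
    call collect_1d (Npts, Aspv, Awrk)
    A = reshape(Awrk, shape(A))
  end subroutine collect_2d

end module collect_demo

program demo
  use collect_demo
  implicit none
  real(r8), allocatable :: Awrk(:,:)

  allocate (Awrk(2,3))
  Awrk = 1.0e+37_r8        ! fill with the special value
  Awrk(1,1) = 5.0_r8       ! one ordinary value

  ! With only collect_1d in the interface, this call would not
  ! compile, because no specific routine accepts a rank-2 array.
  call collect (size(Awrk), 1.0e+37_r8, Awrk)
  print *, Awrk
end program demo
```

Whether a 2-D specific is the right fix for nf_fread2d, or whether the caller should pass a 1-D work array instead, is a design choice for the ROMS sources; the sketch only shows how the generic interface mechanism works.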

arango
Site Admin
Posts: 1106
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University

Re: mp_collect bug when compiling with define PARALLEL_IO

#7 Post by arango » Tue Sep 26, 2017 1:24 am

PARALLEL_IO is slower than serial I/O, so it won't help you improve I/O performance. The last time I checked that option, it was not working well, and I haven't checked whether the NetCDF4 library has made progress on it. I can fix the issue with nf_fread2d.F, but it will not help you much. There is no need to introduce mp_collect for 2D arrays to the module interface in distribute.F.

Parallel I/O requires parallel access to all input and output files. It only makes sense when you have a computer infrastructure (disk storage) that allows it. As far as I know, regular desktops or clusters don't have such infrastructure. We can find it in supercomputer centers.

I assume that you already know that all input files need to be NetCDF4 (HDF5) type files. Also, all output files are of the NetCDF4 type.

I am planning to update the code tomorrow to give the user more options to fine-tune the MPI communications and accelerate the I/O.

xiaozhu557
Posts: 61
Joined: Fri Sep 11, 2009 1:48 pm
Location: nmefc

Re: mp_collect bug when compiling with define PARALLEL_IO

#8 Post by xiaozhu557 » Tue Sep 26, 2017 1:37 am

Many thanks Kate and Arango.

That is good news for me. I am using a supercomputer to run ROMS.

Hopefully I can get your new version ASAP.

xiaozhu557
Posts: 61
Joined: Fri Sep 11, 2009 1:48 pm
Location: nmefc

Re: mp_collect bug when compiling with define PARALLEL_IO

#9 Post by xiaozhu557 » Fri Sep 29, 2017 7:57 am

arango wrote: I assume that you already know that all input files need to be NetCDF4 (HDF5) type files. Also, all output files are the NetCDF4 type.

I am planning to update the code tomorrow to give the user more options to fine-tune the MPI communications and accelerate the I/O.

Hello Arango, could you let me know once you have updated and uploaded the code for the MPI communications?

Thanks a lot.
