mp_collect bug when compiling with define PARALLEL_IO

xiaozhu557
Posts: 62
Joined: Fri Sep 11, 2009 1:48 pm
Location: nmefc

mp_collect bug when compiling with define PARALLEL_IO

#1 Post by xiaozhu557 » Fri Sep 22, 2017 4:35 am

Dear all,
I want to try parallel input/output to speed up I/O, but I found a bug in the subroutine mp_collect when I compiled the ROMS code 857 with the option #define PARALLEL_IO.

I have double-checked that the argument A of the subroutine mp_collect_f(ng,model,Npts,Aspv,A) is declared in the module distribute_mod as

Code: Select all

real(r8), intent(inout) :: A(Npts)
However, in the function nf_fread2d it is called with the argument Awrk, which is declared as

Code: Select all

real(r8),allocatable :: Awrk(:,:)
So, when compiling with the option #define PARALLEL_IO, I get the following error messages:

Code: Select all

nf_fread2d.f90(314): error #6285: There is no matching specific subroutine for this generic subroutine call. [MP_COLLECT]
CALL mp_collect (ng, model, Npts, IniVal, Awrk)
-----------------^
compilation aborted for nf_fread2d.f90 (code 1)
make: *** [/fs01/home/Build/nf_fread2d.o] Error 1
make: *** Waiting for unfinished jobs....
Can anyone help me figure out this bug? Thanks.
Last edited by xiaozhu557 on Sat Sep 23, 2017 7:17 am, edited 1 time in total.

kate
Posts: 3796
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: mp_collect bug when compiling with define PARALLEL_IO

#2 Post by kate » Fri Sep 22, 2017 5:12 pm

That's weird - my copy of the trunk code lists (line 2598):

Code: Select all

      real(r8), intent(inout) :: A(Npts)
Try that.

xiaozhu557
Posts: 62
Joined: Fri Sep 11, 2009 1:48 pm
Location: nmefc

Re: mp_collect bug when compiling with define PARALLEL_IO

#3 Post by xiaozhu557 » Sat Sep 23, 2017 7:16 am

Thanks for your reply, Kate.
It is also

Code: Select all

real(r8), intent(inout) :: A(Npts)
in my copy. I am sorry for the error in my previous post; I have corrected it there.

But I would like to point out that the bug I mentioned is caused by a rank mismatch between the actual argument (Awrk(:,:), two dimensions) and the dummy argument (A(Npts), one dimension) of the subroutine mp_collect.

kate
Posts: 3796
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: mp_collect bug when compiling with define PARALLEL_IO

#4 Post by kate » Sat Sep 23, 2017 3:23 pm

Ah, then the solution is to add another mp_collect_f routine to the interface with the 2-D array.
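
As a rough illustration of that idea, here is a minimal, serial sketch of a generic interface with one rank-1 and one rank-2 specific routine. It is not the ROMS distribute.F source: the names collect_sketch, collect_sketch_1d and collect_sketch_2d are hypothetical, the MPI communication is replaced by a placeholder, and the r8 kind definition is an assumption.

```fortran
! Hypothetical stand-alone sketch; NOT the ROMS distribute.F code.
module collect_sketch_mod
  implicit none
  integer, parameter :: r8 = selected_real_kind(12, 300)  ! assumed kind

  ! The compiler selects a specific routine from the rank of A.
  interface collect_sketch
    module procedure collect_sketch_1d
    module procedure collect_sketch_2d
  end interface collect_sketch

contains

  subroutine collect_sketch_1d (Npts, Aspv, A)
    integer,  intent(in)    :: Npts
    real(r8), intent(in)    :: Aspv
    real(r8), intent(inout) :: A(Npts)
    integer :: i
    ! Placeholder for the MPI collection: entries still equal to the
    ! special value are simply set to zero in this serial sketch.
    do i = 1, Npts
      if (A(i) == Aspv) A(i) = 0.0_r8
    end do
  end subroutine collect_sketch_1d

  subroutine collect_sketch_2d (Npts, Aspv, A)
    integer,  intent(in)    :: Npts
    real(r8), intent(in)    :: Aspv
    real(r8), intent(inout) :: A(:,:)
    real(r8), allocatable   :: wrk(:)
    ! Flatten to rank 1, reuse the 1-D routine, then restore the shape.
    wrk = reshape(A, (/ Npts /))
    call collect_sketch_1d (Npts, Aspv, wrk)
    A = reshape(wrk, shape(A))
  end subroutine collect_sketch_2d

end module collect_sketch_mod

program demo
  use collect_sketch_mod
  implicit none
  real(r8), parameter :: spv = 1.0e+35_r8
  real(r8) :: v(4), w(2,2)

  v = (/ 1.0_r8, spv, 3.0_r8, spv /)
  w = reshape(v, (/ 2, 2 /))
  call collect_sketch (4, spv, v)   ! resolves to the rank-1 routine
  call collect_sketch (4, spv, w)   ! resolves to the rank-2 routine
  print *, v
  print *, w
end program demo
```

With only the rank-1 routine in the interface, the second call would fail to compile with a "no matching specific subroutine" error like the one reported above. Passing a 1-D work array to the existing routine and unpacking it into the 2-D array afterwards avoids the need for a second routine.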

Frankly, my I/O speed is limited by the output, not the input. In my branch, I split the parallel I/O into two separate switches. After trying the PARALLEL_OUT option, I ended up going for compression of the output instead of parallel output. Which is faster depends on many things, but we need the disk space more. ;)

xiaozhu557
Posts: 62
Joined: Fri Sep 11, 2009 1:48 pm
Location: nmefc

Re: mp_collect bug when compiling with define PARALLEL_IO

#5 Post by xiaozhu557 » Sun Sep 24, 2017 11:59 am

Thank you for the reminder about compressing the output, Kate.

Do you have an mp_collect_f routine for the 2-D array that could be added to the interface? I am not very experienced with parallel programming, so if you have one, could you share it with me?

Thanks.

kate
Posts: 3796
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: mp_collect bug when compiling with define PARALLEL_IO

#6 Post by kate » Mon Sep 25, 2017 4:39 pm

No, I don't have it. However, distribute.F has other examples of using arrays of differing shapes, such as mp_assemblef_1d and mp_assemblef_2d.

arango
Site Admin
Posts: 1128
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University

Re: mp_collect bug when compiling with define PARALLEL_IO

#7 Post by arango » Tue Sep 26, 2017 1:24 am

PARALLEL_IO is slower than serial I/O, and it won't help you improve I/O performance. The last time I checked that option, it was not working well, and I haven't checked whether the NetCDF4 library has made progress on it. I can fix the issue with nf_fread2d.F, but it will not help you much. There is no need to introduce mp_collect for 2D arrays to the module interface in distribute.F.

In parallel I/O, we need parallel access to all input and output files. Parallel I/O only makes sense when you have a computer infrastructure (disk storage) that allows it. As far as I know, regular desktops or clusters don't have such infrastructure. It is found at supercomputer centers.

I assume that you already know that all input files need to be NetCDF4 (HDF5) type files. Also, all output files are of the NetCDF4 type.

I am planning to update the code tomorrow to give the user more options to fine-tune the MPI communications and accelerate the I/O.

xiaozhu557
Posts: 62
Joined: Fri Sep 11, 2009 1:48 pm
Location: nmefc

Re: mp_collect bug when compiling with define PARALLEL_IO

#8 Post by xiaozhu557 » Tue Sep 26, 2017 1:37 am

Many thanks Kate and Arango.

That is good news for me. I am using a supercomputer to run ROMS.

Hopefully I can get your new version ASAP.

xiaozhu557
Posts: 62
Joined: Fri Sep 11, 2009 1:48 pm
Location: nmefc

Re: mp_collect bug when compiling with define PARALLEL_IO

#9 Post by xiaozhu557 » Fri Sep 29, 2017 7:57 am

arango wrote: I assume that you already know that all input files need to be NetCDF4 (HDF5) type files. Also, all output files are of the NetCDF4 type.

I am planning to update the code tomorrow to give the user more options to fine-tune the MPI communications and accelerate the I/O.

Hello Arango, could you let me know once you have updated and uploaded the code for the MPI communications?

Thanks a lot.

fabeobd
Posts: 10
Joined: Thu Jun 09, 2011 2:10 pm
Location: University of Helsinki

Re: mp_collect bug when compiling with define PARALLEL_IO

#10 Post by fabeobd » Tue Jun 09, 2020 12:48 pm

Hi,

I'm facing this bug and wondering whether a fix has been made yet.

Thanks,
Fabio
Fabio Boeira Dias, Ph.D.
Postdoctoral researcher
Institute for Atmospheric and Earth System Research
University of Helsinki

fabeobd
Posts: 10
Joined: Thu Jun 09, 2011 2:10 pm
Location: University of Helsinki

Re: mp_collect bug when compiling with define PARALLEL_IO

#11 Post by fabeobd » Mon Jun 22, 2020 12:13 pm

I've figured out this bug, which seems to be fixed in newer versions (I'm using a branch of ROMS 3.6). For the record, in case anyone faces this bug in the future: the mp_collect call should be placed before the Jstr,Jend/Istr,Iend loop that fills Awrk (around line 345), not after it as I had in my nf_fread2d.F code. The model compiled and ran fine after this change, and input reading became significantly faster.

Cheers,
Fabio

xiaozhu557
Posts: 62
Joined: Fri Sep 11, 2009 1:48 pm
Location: nmefc

Re: mp_collect bug when compiling with define PARALLEL_IO

#12 Post by xiaozhu557 » Tue Jun 23, 2020 2:12 am

Thank you for your reply, Fabio.

After the change, my copy looks the same as yours. It is as follows:

Code: Select all

          IF (interpolate) THEN
            CALL mp_collect (ng, model, Npts, IniVal, wrk)
            ic=0
            DO j=Jstr,Jend
              DO i=Istr,Iend
                ic=ic+1
                Awrk(i,j)=wrk(ic)
              END DO
            END DO
          ELSE
Is it the same in your new version?

If your version works, could you share the files distribute.F and nf_fread2d.F? Your fix does not seem to match my case: my problem is that the arrays A and Awrk have different shapes.

Thanks again.

arango
Site Admin
Posts: 1128
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University

Re: mp_collect bug when compiling with define PARALLEL_IO

#13 Post by arango » Tue Jun 23, 2020 3:07 am

I can't believe that this is done. We are not responsible for corrections to the code by a third party. Your solution is straightforward: update your code from the official ROMS repositories. The agreement to receive a freely distributed version of the ROMS framework from us is that users need to keep their code up to date. I highly suggest that you read the terms of the agreement.

fabeobd
Posts: 10
Joined: Thu Jun 09, 2011 2:10 pm
Location: University of Helsinki

Re: mp_collect bug when compiling with define PARALLEL_IO

#14 Post by fabeobd » Tue Jun 23, 2020 7:52 am

Hi Xiau,

The error I initially got was the same you described in your first post:

"nf_fread2d.f90(314): error #6285: There is no matching specific subroutine for this generic subroutine call. [MP_COLLECT]"

Just changing the place where mp_collect is called fixed that (exactly as you posted). I found that by comparing the code I'm using with the latest available. I didn't change anything in distribute.F. If you can't update your code as Arango suggested, I would advise you to compare individual files between your code and the latest one.

Arango, I understand the terms of the agreement and would like to keep my code updated, but that seems to be quite a painful route in some cases. When I started my current project (Antarctic ocean-ice sheet modelling) earlier this year, I got a ROMS setup from my collaborators at the University of Tasmania, which is based on ROMS 3.6 with modifications for the ice sheet coupling. They got the latest ROMS version when they started developing the model (in 2016), but they didn't update it from the main branch in the subsequent years. I guess that is a common approach in several institutions: I have worked with MOM previously and have seen people using different versions because they include mods for specific purposes. If I now try to merge versions, it shows several hundred code changes, so updating the code is not always so straightforward.

Cheers,
Fabio

arango
Site Admin
Posts: 1128
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University

Re: mp_collect bug when compiling with define PARALLEL_IO

#15 Post by arango » Tue Jun 23, 2020 4:21 pm

Well, you cannot pick and choose which routines to update. It doesn't work that way. Every change made to ROMS is well documented in the trac system, and changes sometimes depend on previous changes and may be modified by future ones. The last major update to the Parallel I/O was made in trac ticket 747 on Oct 5, 2017. In that update, 38 files were modified, and many more changes have been made to some of those files in the last three years. ROMS is available in both svn and git repositories. You don't need to make such changes by hand. Doing so can introduce bugs that will be a nightmare to fix, and we are not responsible for those. There are ways to branch in svn and git and to resolve conflicts with third-party features that are not part of our distributed version.

You can check trac and all the changes and differences that have been made to ROMS since the base version of the code that you are using. It will take some time. There are various important changes to the ROMS kernels.

fabeobd
Posts: 10
Joined: Thu Jun 09, 2011 2:10 pm
Location: University of Helsinki

Re: mp_collect bug when compiling with define PARALLEL_IO

#16 Post by fabeobd » Wed Jun 24, 2020 9:48 am

Hi Arango,

thanks for your suggestions. Comparing versions of a single file was only an attempt to diagnose the failure when compiling with PARALLEL_IO. I would very much like to update the code I'm currently using, if there is a doable route. I asked a similar question a few months back (viewtopic.php?f=14&t=5447) and I ended up just using the code I have. But without doubt it would be better to have it aligned with the newest developments. Which steps would you suggest in my case to resolve the conflicts from third-party features?

Cheers
Fabio

kate
Posts: 3796
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: mp_collect bug when compiling with define PARALLEL_IO

#17 Post by kate » Wed Jun 24, 2020 5:32 pm

I suggest learning about git. You can have a git repo with (a) the old ROMS you started from (b) the old ROMS with your modifications and (c) new ROMS. Or the crude thing to do is diff old ROMS vs. your code and save those diffs to a file. Then apply those changes to the new ROMS by hand. I assume that patch will fail after all the intervening ROMS changes, same with git merge.

arango
Site Admin
Posts: 1128
Joined: Wed Feb 26, 2003 4:41 pm
Location: IMCS, Rutgers University

Re: mp_collect bug when compiling with define PARALLEL_IO

#18 Post by arango » Wed Jun 24, 2020 6:13 pm

Everybody has their own procedure for updating code from repositories. I have several research repositories, and I usually do the following steps:
  • Copy the version that I want to update to a temporary one. So, I have a version of the code that works in case something goes wrong with the updating. I also use the old version to benchmark the old code against the new code by comparing the old and new solutions.
  • I get a scope of the changes by comparing old and new versions of the code in the KDiff3 tool. I highly recommend users interested in developing code to install KDiff3. It is the best tool for comparing codes in a tree directory structure. It is free.
  • Update the code with svn (svn update) or git (git pull). I am much more experienced with svn.
  • Resolve the merging conflicts with your favorite text editor. I have been using Xemacs since I was a graduate student. It has wonderful tools for comparing files line-by-line and correcting/merging the desired lines of code. The differences are all color-coded.
Good luck

fabeobd
Posts: 10
Joined: Thu Jun 09, 2011 2:10 pm
Location: University of Helsinki

Re: mp_collect bug when compiling with define PARALLEL_IO

#19 Post by fabeobd » Fri Jun 26, 2020 10:25 am

Hi Kate and Arango,

many thanks for your suggestions. I do use git but I'm more on the newbie side, so I think this is a great incentive to learn it in detail. Thanks for pointing out KDiff3; it does look like an amazing tool. I usually use vimdiff to compare files, and it is also color-coded, which is handy. I'm going to try merging the codes, and I hope it's not as difficult as I was thinking.

Cheers,
Fabio
