There is a parallel bug in shared-memory and serial-with-partitions runs when activating both TS_MPDATA and N-S periodic boundary conditions. I have been hunting this bug for a while and have not been able to find and fix it. It is a very weird bug: it shows up in all of the N-S periodic test cases distributed with ROMS, and only in those. It behaves differently from case to case, which makes it harder to track, and each test case gives a very different clue. The only common feature is that the bug doesn't start right away but appears after several timesteps, which makes it much harder to find. I have never seen a parallel bug of this kind before, so it is new territory for me. It is really weird to have parallel bugs associated with round-off. In my experience, parallel bugs always appear right away. All clues tell me that the problem is in mpdata_adiff.F. However, I am starting to suspect that it is somewhere else. I have done the backward parallel dependency analysis for the private and global arrays in this routine several times and rewritten the ranges several times, and I still get the parallel bug.
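Since the bug only appears after several timesteps, one way to narrow it down is to compare a serial run against a partitioned run step by step and report the first timestep at which they differ. The following is only a minimal sketch of that idea, not part of ROMS: the function name `first_divergence` and the nested-list field layout are hypothetical, and the fields are assumed to have already been read from each run's output into memory.

```python
def first_divergence(serial_steps, tiled_steps, tol=0.0):
    """Find the first timestep at which two runs differ.

    serial_steps, tiled_steps: one 2-D field (list of rows of floats)
    per timestep. This layout is a hypothetical stand-in for a field
    read from model output.
    Returns (step_index, max_abs_diff), or None if the runs agree
    to within tol at every step.
    """
    for step, (a, b) in enumerate(zip(serial_steps, tiled_steps)):
        # Largest pointwise difference between the two runs at this step.
        max_diff = max(
            abs(x - y)
            for row_a, row_b in zip(a, b)
            for x, y in zip(row_a, row_b)
        )
        if max_diff > tol:
            return step, max_diff
    return None


# Made-up data: the runs agree for two steps and differ at the third.
serial = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.5, 2.5], [3.5, 4.5]],
    [[2.0, 3.0], [4.0, 5.0]],
]
tiled = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.5, 2.5], [3.5, 4.5]],
    [[2.0, 3.0], [4.0, 5.25]],
]
print(first_divergence(serial, tiled))  # (2, 0.25)
```

With `tol=0.0` the comparison is bit-for-bit, which is what a parallel bug check needs; a small positive `tol` would instead show when a round-off-level difference grows into something larger.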
TS_MPDATA is still fine with any other type of boundary condition. If you are using TS_MPDATA with N-S periodic boundary conditions, you need to run in serial with no partitions or in distributed-memory (MPI). We will continue hunting for this elusive parallel bug.
By the way, TS_MPDATA is fine for any partition with shared-memory, distributed-memory, and serial with partitions.