Feature/nh2 #107

JinsuRhee · 2021-07-27T02:55:04Z

This pull request includes the following:

- Tree optimisation
- Bug fixes in the 6D core search
- Tree-based MPI decomposition


pelahi · 2021-07-27T03:20:18Z

This requires some changes to the source branch. Some of the changes alter the default behaviour or the default compilation of VR.

pelahi · 2021-07-27T06:34:44Z

CMakeLists.txt

@@ -38,7 +38,7 @@ macro(vr_option optname optdesc status)
 endmacro()

 # Input formats
-vr_option(HDF5 "Attempt to include HDF5 support in VELOCIraptor" ON)
+vr_option(HDF5 "Attempt to include HDF5 support in VELOCIraptor" OFF)


Default compilation should not be changed.

@JinsuRhee, please update the source branch of the pull request.

pelahi · 2021-07-27T06:35:49Z

src/allvars.h

+        impiusemesh = false;
+	impiusetree = true;
 #else
-        impiusemesh = true;
+        impiusemesh = false;
+	impiusetree = true;


The default configuration should not be changed. In addition, SWIFT uses an MPI mesh decomposition, so please revert this.

@JinsuRhee, can you update the user interface so that this option can be set at runtime or when a RAMSES input is loaded?
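One way to do this, sketched under assumptions: keep the existing defaults, read an optional config flag, and switch to the tree decomposition only when the flag asks for it or, if the flag is absent, when the input is RAMSES. `InputFormat`, `treedecompositionflag` and `ResolveDecomposition` are hypothetical names, not existing VELOCIraptor code.

```cpp
// Hypothetical subset of Options and a hypothetical input-format enum; the real
// allvars.h uses its own fields and input-type constants.
enum class InputFormat { Gadget, HDF5, Ramses };

struct Options {
    InputFormat format = InputFormat::Gadget;
    bool impiusemesh = true;        // existing default: mesh decomposition
    bool impiusetree = false;       // tree decomposition is opt-in
    int treedecompositionflag = -1; // hypothetical: -1 not set in config, 0 off, 1 on
};

// Called once the config file has been read and the input format is known.
void ResolveDecomposition(Options &opt) {
    if (opt.treedecompositionflag >= 0)
        opt.impiusetree = (opt.treedecompositionflag > 0);     // explicit user choice wins
    else
        opt.impiusetree = (opt.format == InputFormat::Ramses); // default on only for RAMSES
    // Only turn the mesh off when the tree is selected; otherwise leave the default alone.
    if (opt.impiusetree) opt.impiusemesh = false;
}
```

Because the mesh flag is only touched when the tree path is chosen, SWIFT and other non-RAMSES inputs keep the current mesh decomposition unless the user explicitly requests otherwise.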

pelahi · 2021-07-27T06:37:53Z

src/js_perform.h

@@ -0,0 +1 @@
+//#define JS_ADT_ON


Remove this file since it is not needed.

@JinsuRhee, please be careful not to add unneeded files to pull requests.

pelahi · 2021-07-27T06:54:16Z

src/allvars.h

+        impiusemesh = false;
+	impiusetree = true;
 #else
-        impiusemesh = true;
+        impiusemesh = false;
+	impiusetree = true;


@JinsuRhee, can you update the user interface so that this option can be set at runtime or when a RAMSES input is loaded?

pelahi · 2021-07-27T06:57:06Z

src/ompvar.h

-#define ompsplitsubsearchnum 10000000
+#define ompsplitsubsearchnum 100000000


@JinsuRhee, why the large (tenfold) increase in ompsplitsubsearchnum?

pelahi · 2021-07-27T06:58:37Z

src/ramsesitems.h

-#define RAMSESIDTYPE  unsigned int
+#define RAMSESIDTYPE int


@JinsuRhee, is it possible that this type differs between RAMSES runs? Does it warrant wrapping this in ifdefs and adding some compilation options?
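A minimal sketch of the ifdef wrapping suggested above. `RAMSES_SIGNED_IDS` is a hypothetical macro name; it could be set by a new `vr_option` entry in CMakeLists.txt, but no such option exists yet. The conversion helper is also hypothetical and assumes IDs are widened to a 64-bit signed type internally.

```cpp
// Hypothetical compile-time switch for RAMSES outputs that store signed particle IDs
// (e.g. negative IDs). Without it, the original unsigned type is kept.
#ifdef RAMSES_SIGNED_IDS
#define RAMSESIDTYPE int
#else
#define RAMSESIDTYPE unsigned int
#endif

// Hypothetical helper: widen a raw RAMSES ID to a 64-bit signed ID.
// Negative values survive only when RAMSES_SIGNED_IDS is defined.
inline long long RamsesIDToInternal(RAMSESIDTYPE id) {
    return static_cast<long long>(id);
}
```

Keeping the unsigned type as the default preserves the current behaviour for existing users; only builds that opt in see the signed type.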

pelahi · 2021-07-27T07:04:16Z

src/mpiramsesio.cxx

+		cout<<"%123123-----"<<endl;
+		cout<<"%123123	MPIDomainDecompositionWithTree: Check the size of Part_mpi to include the number of baryons"<<endl;
+		cout<<"%123123-----"<<endl;
+		exit(9);


@JinsuRhee, is there a reason for the %123123 prefix?
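If the check is needed, a descriptive error avoids both the %123123 marker and the bare exit(9). A sketch, assuming the guarded condition is that Part_mpi must have room for the dark-matter particles plus the baryons; `CheckPartMPICapacity` and its parameters are hypothetical. Throwing lets the caller report the problem; in an MPI run the caller would still need to stop all ranks, for example with MPI_Abort.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical capacity check: throws with a descriptive message instead of printing
// marker strings and calling exit(9).
void CheckPartMPICapacity(int task, std::size_t allocated, std::size_t ndark, std::size_t nbaryons) {
    const std::size_t required = ndark + nbaryons;
    if (allocated < required) {
        std::ostringstream msg;
        msg << task << ": MPIDomainDecompositionWithTree: Part_mpi holds " << allocated
            << " particles but " << required << " (" << ndark << " dark matter + "
            << nbaryons << " baryons) are needed";
        throw std::runtime_error(msg.str());
    }
}
```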

pelahi · 2021-07-27T07:06:04Z

src/mpiramsesio.cxx

+//                    byteoffset+=RAMSES_fortran_read(Famr[i],header[i].npart[RAMSESGASTYPE]);
+//
+//                    //then skip the rest
+//                    for (j=0;j<14;j++) RAMSES_fortran_skip(Famr[i]);
+//                    if (lmin>header[i].nlevelmax) lmin=header[i].nlevelmax;
+//                    if (lmax<header[i].nlevelmax) lmax=header[i].nlevelmax;
+//                    //@}
+//                    //read header info from hydro files
+//                    //@{
+//                    RAMSES_fortran_skip(Fhydro[i]);
+//                    RAMSES_fortran_read(Fhydro[i],header[i].nvarh);
+//                    RAMSES_fortran_skip(Fhydro[i]);
+//                    RAMSES_fortran_skip(Fhydro[i]);
+//                    RAMSES_fortran_skip(Fhydro[i]);
+//                    RAMSES_fortran_read(Fhydro[i],header[i].gamma_index);
+//                    //@}
+//
+//                    //then apparently read ngridlevels, which appears to be an array storing the number of grids at a given level
+//                    ngridlevel=new int[header[i].nlevelmax];
+//                    ngridfile=new int[(1+header[i].nboundary)*header[i].nlevelmax];
+//                    RAMSES_fortran_read(Famr[i],ngridlevel);
+//                    for (j=0;j<header[i].nlevelmax;j++) ngridfile[j]=ngridlevel[j];
+//                    //skip some more
+//                    RAMSES_fortran_skip(Famr[i]);
+//                    //if nboundary>0 then need two skip twice then read ngridbound
+//                    if(header[i].nboundary>0) {
+//                        ngridbound=new int[header[i].nboundary*header[i].nlevelmax];
+//                        RAMSES_fortran_skip(Famr[i]);
+//                        RAMSES_fortran_skip(Famr[i]);
+//                        //ngridbound is an array of some sort but I don't see what it is used for
+//                        RAMSES_fortran_read(Famr[i],ngridbound);
+//                        for (j=0;j<header[i].nlevelmax;j++) ngridfile[header[i].nlevelmax+j]=ngridbound[j];
+//                    }
+//                    //skip some more
+//                    RAMSES_fortran_skip(Famr[i],2);
+//                    //if odering list in info is bisection need to skip more
+//                    if (orderingstring==string("bisection")) RAMSES_fortran_skip(Famr[i],5);
+//                    else RAMSES_fortran_skip(Famr[i],4);
+//
+//                    for (k=0;k<header[i].nboundary+1;k++) {
+//                        for (j=0;j<header[i].nlevelmax;j++) {
+//                            //first read amr for positions
+//                            chunksize=nchunk=ngridfile[k*header[i].nlevelmax+j];
+//                            if (chunksize>0) {
+//                                xtempchunk=new RAMSESFLOAT[3*chunksize];
+//                                //store son value in icell
+//                                icellchunk=new int[header[i].twotondim*chunksize];
+//                                //skip grid index, next index and prev index.
+//                                RAMSES_fortran_skip(Famr[i],3);
+//                                //now read grid centre
+//                                for (idim=0;idim<header[i].ndim;idim++) {
+//                                    RAMSES_fortran_read(Famr[i],&xtempchunk[idim*chunksize]);
+//                                }
+//                                //skip father index, then neighbours index
+//                                RAMSES_fortran_skip(Famr[i],1+2*header[i].ndim);
+//                                //read son index to determine if a cell in a specific grid is at the highest resolution and needs to be represented by a particle
+//                                for (idim=0;idim<header[i].twotondim;idim++) {
+//                                    RAMSES_fortran_read(Famr[i],&icellchunk[idim*chunksize]);
+//                                }
+//                                //skip cpu map and refinement map (2^ndim*2)
+//                                RAMSES_fortran_skip(Famr[i],2*header[i].twotondim);
+//                            }
+//                            RAMSES_fortran_skip(Fhydro[i]);
+//                            //then read hydro for other variables (first is density, then velocity, then pressure, then metallicity )
+//                            if (chunksize>0) {
+//                                //first read velocities (for 2 cells per number of dimensions (ie: cell corners?))
+//                                for (idim=0;idim<header[i].twotondim;idim++) {
+//                                    for (ivar=0;ivar<header[i].nvarh;ivar++) {
+//                                        for (igrid=0;igrid<chunksize;igrid++) {
+//                                            //once we have looped over all the hydro data then can start actually storing it into the particle structures
+//                                            if (ivar==header[i].nvarh-1) {
+//                                                //if cell has no internal cells or at maximum level produce a particle
+//                                                if (icellchunk[idim*chunksize+igrid]==0 || j==header[i].nlevelmax-1) {
+//                                                    //first suggestion is to add some jitter to the particle positions
+//                                                    double dx = pow(0.5, j);
+//                                                    int ix, iy, iz;
+//                                                    //below assumes three dimensions with 8 corners (? maybe cells) per grid
+//                                                    iz = idim/4;
+//                                                    iy = (idim - (4*iz))/2;
+//                                                    ix = idim - (2*iy) - (4*iz);
+//                                                    // Calculate absolute coordinates + jitter, and generate particle
+//                                                    xtemp[0] = ((((float)rand()/(float)RAND_MAX) * header[i].BoxSize * dx) +(header[i].BoxSize * (xtempchunk[igrid] + (double(ix)-0.5) * dx )) - (header[i].BoxSize*dx/2.0)) ;
+//                                                    xtemp[1] = ((((float)rand()/(float)RAND_MAX) * header[i].BoxSize * dx) +(header[i].BoxSize * (xtempchunk[igrid+1*chunksize] + (double(iy)-0.5) * dx )) - (header[i].BoxSize*dx/2.0)) ;
+//                                                    xtemp[2] = ((((float)rand()/(float)RAND_MAX) * header[i].BoxSize * dx) +(header[i].BoxSize * (xtempchunk[igrid+2*chunksize] + (double(iz)-0.5) * dx )) - (header[i].BoxSize*dx/2.0)) ;
+//                                                    //determine processor this particle belongs on based on its spatial position
+//                                                    ibuf=MPIGetParticlesProcessor(opt, xtemp[0],xtemp[1],xtemp[2]);
+//                                                    Nbuf[ibuf]++;
+//                                                }
+//                                            }
+//                                        }
+//                                    }
+//                                }
+//                                delete[] xtempchunk;
+//                            }
+//                        }
+//                    }
+//                    Famr[i].close();
+//                }
+//            }
+//        }


@JinsuRhee, is there a reason for commenting out all of this?
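For reference, the commented-out loop turns each leaf cell of a RAMSES oct into a particle. The child index idim (0 to 7) is split into (ix, iy, iz), the cell centre is BoxSize * (grid centre + (i - 0.5) * dx) per axis, and a uniform jitter of up to half a cell is added. The sketch below isolates that step using hypothetical function names. It takes the cell size dx as an argument instead of computing pow(0.5, j), because the mapping from the loop index j to the RAMSES level depends on whether levels are counted from 0 or 1. It also uses a seeded std::mt19937 instead of rand() so that runs are reproducible.

```cpp
#include <random>

// Split the child index of a 3D oct (0..7) into per-axis offsets, ix varying fastest,
// exactly as in the commented-out loop.
void OctChildIndex(int idim, int &ix, int &iy, int &iz) {
    iz = idim / 4;
    iy = (idim - 4 * iz) / 2;
    ix = idim - 2 * iy - 4 * iz;
}

// Centre of child cell idim, scaled by boxsize. gridcentre is in unit-box coordinates
// and dx is the cell size in the same units.
void ChildCellCentre(const double gridcentre[3], int idim, double dx, double boxsize, double out[3]) {
    int ix, iy, iz;
    OctChildIndex(idim, ix, iy, iz);
    const int offs[3] = {ix, iy, iz};
    for (int d = 0; d < 3; d++) out[d] = boxsize * (gridcentre[d] + (offs[d] - 0.5) * dx);
}

// Uniform jitter in [-dx/2, dx/2) * boxsize, essentially the range of the rand()-based
// expression in the loop, drawn from a seeded generator.
void AddJitter(double x[3], double dx, double boxsize, std::mt19937 &rng) {
    std::uniform_real_distribution<double> u(-0.5 * dx * boxsize, 0.5 * dx * boxsize);
    for (int d = 0; d < 3; d++) x[d] += u(rng);
}
```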

pelahi · 2021-07-27T07:17:40Z

src/stf.h

@@ -7,6 +7,7 @@

 #include "allvars.h"
 #include "proto.h"
+#include "js_perform.h"


@JinsuRhee remove this include.

pelahi · 2021-07-27T07:18:30Z

src/mpiramsesio.cxx

+	xtempall   = new RAMSESFLOAT[nbodies*3];
+	famtempall = new int[nbodies];


@JinsuRhee, can you explain how xtempall is a viable solution? There may well be too many particles to fit in memory, hence the previous approach of loading the data in chunks. Would it not make sense to enable this type of load only if a local xtempall allocation can be constructed, and then build a tree to do the MPI decomposition?
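To make the condition concrete: the two global arrays need nbodies × (3 × sizeof(RAMSESFLOAT) + sizeof(int)) bytes on every task that allocates them. With 8-byte floats and 4-byte ints, for example, 10^9 particles need 28 × 10^9 bytes (about 28 GB). A sketch of the suggested guard, where `budgetbytes` is a hypothetical user-supplied memory limit and both function names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed for the proposed global arrays: 3 coordinates of a floating type whose
// size is passed in (e.g. sizeof(RAMSESFLOAT)) plus one int family tag per particle.
std::uint64_t GlobalLoadBytes(std::uint64_t nbodies, std::size_t floatsize) {
    return nbodies * (3 * static_cast<std::uint64_t>(floatsize) + sizeof(int));
}

// Hypothetical policy: use the single-allocation + tree path only when it fits in the
// per-task budget; otherwise the existing chunked loader would be used unchanged.
bool UseGlobalTreeLoad(std::uint64_t nbodies, std::size_t floatsize, std::uint64_t budgetbytes) {
    return GlobalLoadBytes(nbodies, floatsize) <= budgetbytes;
}
```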

pelahi · 2021-07-27T07:20:40Z

src/mpiroutines.cxx

+void MPIInitialDomainDecompositionWithTree(Options &opt){
+    if (ThisTask==0) {
+    }
+}
+


@JinsuRhee, why is there an empty function?

pelahi · 2021-07-27T07:22:43Z

src/omproutines.cxx

-        tree3dfofomp[i] = new KDTree(&Part.data()[ompdomain[i].noffset],ompdomain[i].ncount,opt.Bsize,tree3dfofomp[i]->TPHYS,tree3dfofomp[i]->KEPAN,100,0,0,0,period,NULL);
+        //tree3dfofomp[i] = new KDTree(&Part.data()[ompdomain[i].noffset],ompdomain[i].ncount,opt.Bsize,tree3dfofomp[i]->TPHYS,tree3dfofomp[i]->KEPAN,100,0,0,0,period,NULL);
+        tree3dfofomp[i] = new KDTree(js_bsize, &Part.data()[ompdomain[i].noffset],ompdomain[i].ncount,opt.Bsize,tree3dfofomp[i]->TPHYS,tree3dfofomp[i]->KEPAN,100,0,0,0,period,NULL);


@JinsuRhee, why the KDTree API change? The API should remain the same. Please check the changes to NBodyLib.

pelahi · 2021-07-27T07:23:28Z

src/omproutines.cxx

@@ -44,14 +44,15 @@ OMP_Domain *OpenMPBuildDomains(Options &opt, const Int_t numompregions, KDTree *
 KDTree **OpenMPBuildLocalTrees(Options &opt, const Int_t numompregions, vector<Particle> &Part, OMP_Domain *ompdomain, Double_t *period)
 {
    KDTree **tree3dfofomp = new KDTree*[numompregions];
-    Int_t i;
+    Int_t i, js_bsize;


@JinsuRhee, please rename all variables that use the js_ prefix.

pelahi · 2021-07-27T07:24:48Z

src/omproutines.cxx

+
+	js_count++;
+	cout<<"		3DFOF Log - "<<i<<" th / "<<js_count<<" of "<<numompregions<<" / # ptcls : "<<ompdomain[i].ncount<<" / # groups : "<<ng<<" / Time [s] : "<<MyGetTime() - js_time<<endl;


@JinsuRhee, please remove this, along with js_time.

pelahi · 2021-07-27T07:26:34Z

src/ramsesio.cxx

+
+    getline(Finfo,stringbuf);//Mysterious mis-reading one line
+


@JinsuRhee, is this true for all RAMSES formats? Might it be necessary to wrap this in ifdefs and add compilation options for the RAMSES format? The same might apply to the additions you made below.
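An alternative that does not depend on a fixed line count is to read the info file by key. A sketch, assuming the fields of interest are written as `key = value` lines (for example `ncpu = ...` or `ordering type=hilbert`). Lines without `=`, such as blank lines or the domain table, are skipped, so an extra or missing line no longer shifts the parse. `ReadRamsesInfo` and `Trim` are hypothetical helpers, not existing code.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Strip leading and trailing whitespace.
static std::string Trim(const std::string &s) {
    const auto b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// Collect every "key = value" line into a map; lines without '=' are ignored.
std::map<std::string, std::string> ReadRamsesInfo(std::istream &in) {
    std::map<std::string, std::string> kv;
    std::string line;
    while (std::getline(in, line)) {
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;
        kv[Trim(line.substr(0, eq))] = Trim(line.substr(eq + 1));
    }
    return kv;
}
```

Values can then be converted where needed (e.g. with std::stoi or std::stod), and a missing key can be reported explicitly instead of silently misreading the next line.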

pelahi · 2021-07-27T07:27:44Z

src/search.cxx

-    bool runompfof = (numompregions>=2 && nthreads > 1 && opt.iopenmpfof == 1);
+    //bool runompfof = (numompregions>=2 && nthreads > 1 && opt.iopenmpfof == 1);
+    bool runompfof = (numompregions>=1 && nthreads > 1 && opt.iopenmpfof == 1);


This does not make sense. @JinsuRhee, searches can only run in parallel if there are at least two regions. Please fix.

pelahi · 2021-07-27T07:28:40Z

src/search.cxx

+        //tree = new KDTree(Part.data(),nbodies,opt.openmpfofsize,tree->TPHYS,tree->KEPAN,100);
+        tree = new KDTree(rdist, Part.data(),nbodies,opt.openmpfofsize,tree->TPHYS,tree->KEPAN,100);
+	if(opt.iverbose) cout<<ThisTask<<":	Building Root Tree Done"<<endl;
+	if(opt.iverbose) cout<<"			# of OMP Domains -> "<<tree->GetNumLeafNodes()<<endl;
+


@JinsuRhee, these API changes to KDTree are not acceptable. Please update the library so that it remains backward compatible.
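Prepending a new argument (js_bsize, rdist) shifts every existing argument, so all current callers break. A common way to keep an API backward compatible is to leave the existing signature alone and either append the new parameter with a default that reproduces the old behaviour, or add an overload. A toy illustration using a hypothetical `Tree` class (not NBodyLib's KDTree):

```cpp
#include <cstddef>

// Toy stand-in for a library class. The original constructor arguments are unchanged;
// the new optional input is appended last with a default that keeps the old behaviour.
class Tree {
public:
    Tree(const double *pos, std::size_t n, std::size_t bucketsize,
         const double *nodesizehint = nullptr)  // new, optional; nullptr means old behaviour
        : pos_(pos), n_(n), bucketsize_(bucketsize), adaptive_(nodesizehint != nullptr) {}

    bool Adaptive() const { return adaptive_; }
    std::size_t Size() const { return n_; }
    std::size_t BucketSize() const { return bucketsize_; }
    const double *Data() const { return pos_; }

private:
    const double *pos_;
    std::size_t n_;
    std::size_t bucketsize_;
    bool adaptive_;
};
```

Existing call sites such as `Tree(p, n, 16)` compile and behave as before, while new code opts in by passing the extra argument.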

pelahi · 2021-07-27T07:30:12Z

src/ui.cxx

+		    else if (strcmp(tbuff, "MPI_use_tree_decomposition")==0)
+		        opt.impiusetree = (atoi(vbuff)>0);


I had an earlier comment about adding this config option. It also requires an update to allvars.cxx so that the metadata for this option is written out.
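A sketch of the two halves that need to stay in sync: parsing the option (as in the diff above) and recording it so that it appears in the written configuration metadata. The `Options` subset and `AppendConfigMetadata` are hypothetical; the real allvars.cxx collects this information in its own way.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

struct Options { bool impiusetree = false; };  // hypothetical subset

// Parsing, mirroring the ui.cxx diff.
void ParseConfigEntry(Options &opt, const char *tbuff, const char *vbuff) {
    if (std::strcmp(tbuff, "MPI_use_tree_decomposition") == 0)
        opt.impiusetree = (std::atoi(vbuff) > 0);
}

// Hypothetical metadata writer: record the same key with its current value so the
// written configuration can be read back with the same parser.
void AppendConfigMetadata(const Options &opt, std::vector<std::string> &names,
                          std::vector<std::string> &values) {
    names.push_back("MPI_use_tree_decomposition");
    values.push_back(std::to_string(static_cast<int>(opt.impiusetree)));
}
```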

pelahi · 2021-07-27T07:31:12Z

src/unbind.cxx

-#pragma omp parallel default(shared)  \
+#pragma omp parallel default(shared) num_threads(1) \


There is no point in running a parallel region with num_threads(1). Please remove it.

pelahi · 2021-07-27T07:31:31Z

src/unbind.cxx

-    #pragma omp parallel for \
-    default(shared) private(v2,Ti,mass) schedule(dynamic) \
-    reduction(+:totT,Efrac,nunbound) reduction(max:maxE) num_threads(nthreads)
+    //#pragma omp parallel for \
+    //default(shared) private(v2,Ti,mass) schedule(dynamic) \
+    //reduction(+:totT,Efrac,nunbound) reduction(max:maxE) num_threads(nthreads)


Why is this commented out?
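For both unbind.cxx comments: if a serial run is needed for debugging, setting OMP_NUM_THREADS=1 or adding an OpenMP `if` clause avoids editing the pragma. Below is a minimal sketch of a reduction loop like the commented-out one. It uses hypothetical per-particle arrays rather than the real unbind.cxx variables, and adds an `if` clause that runs the loop serially below a size threshold (`minparallel` is a hypothetical parameter). Floating-point sums can differ slightly between thread counts because the summation order changes.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical result bundle; the real unbind.cxx uses separate local variables.
struct UnbindSums {
    double totT;    // total kinetic energy
    double maxE;    // maximum energy
    long nunbound;  // number of particles with E > 0
};

// Reduction pattern from the commented-out pragma. The if() clause runs the loop
// serially when n < minparallel instead of hard-coding num_threads(1).
UnbindSums SumEnergies(const std::vector<double> &mass, const std::vector<double> &v2,
                       const std::vector<double> &E, long minparallel) {
    double totT = 0.0;
    double maxE = std::numeric_limits<double>::lowest();
    long nunbound = 0;
    const long n = static_cast<long>(mass.size());
    #pragma omp parallel for default(shared) schedule(dynamic) \
        reduction(+:totT,nunbound) reduction(max:maxE) if(n >= minparallel)
    for (long i = 0; i < n; i++) {
        totT += 0.5 * mass[i] * v2[i];
        if (E[i] > maxE) maxE = E[i];
        if (E[i] > 0.0) nunbound++;
    }
    return UnbindSums{totT, maxE, nunbound};
}
```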

JinsuRhee and others added 30 commits October 23, 2020 10:43

Find Leaf Node by its tag value

d43f117

KDTree (local & OMP) for 3D FOF changed

f9e31e9

This time2 is not needed

89e7c7f

Just for output log message (neglect this commit)

2815f59

KDTree for 6D FOF changed

d5f4497

Just for output log message (neglect this commit)

8add255

Tree building for 6D CORE search case

2080181

Adaptive KDTree for the case of single thread 3DFOF search

21be899

Adaptive KDTree for the case of noOMP DFOF search

d741df7

(Bug? Not fixed yet, bypassed) It seems there is a bug related to the…

17e57f8

… OMP settings because the number of final galaxies is different for different runs

typeval should be initialized

1e4a454

Just for output log message (neglect this commit)

e37ab11

Done:)

1db4380

Done..

56ed0f5

ramsesitems.h has changed to cover negative IDs

8620155

test has done

f94a643

(Change) omp criterion altered

1dbfdb1

(Error) format fixed

35f973d

omp cut changed

5551e55

(Fixed) redundant read of id

f04eafb

Log output deleted

d49c44f

Nbody added

0ef637f

(Add) For performance test, Adaptive Tree on/off mount

ce67283

bugs in ramsesio fixed

7e45b04

push

cd33a25

Merge pull request #2 from JinsuRhee/feature/forhighres

6e11c18

push

ramsesio: modified for the new output

d0c9fb3

a quick test for weighting

7f84e9a

unbind.cxx: considering only bound mass fraction part added

3723142

unbind: minbound fraction is added

51a5697

JinsuRhee added 2 commits June 23, 2021 10:01

Merge branch 'feature/NH2' of https://github.com/JinsuRhee/VELOCIrapt…

bdce28a

…or-STF into feature/NHrunver

MPI decomposition by tree is updated

2aa3a3d

pelahi self-requested a review July 27, 2021 04:03

pelahi assigned pelahi and JinsuRhee Jul 27, 2021

pelahi added the enhancement label Jul 27, 2021

pelahi requested changes Jul 27, 2021


mpi decomposition by tree: boundary modification

b065380
