Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/nh2 #107

Open
wants to merge 33 commits into
base: feature/ramses-optimisations
Choose a base branch
from

Conversation

JinsuRhee
Copy link

This request includes the following...

  1. Tree Optimisation
  2. Some bugs fixed in the 6d core search
  3. Tree-based MPI Decomposition

@pelahi
Copy link
Owner

pelahi commented Jul 27, 2021

Requires some changes to source branch. Some changes alter the default behaviour of VR or default compilation.

@@ -38,7 +38,7 @@ macro(vr_option optname optdesc status)
endmacro()

# Input formats
vr_option(HDF5 "Attempt to include HDF5 support in VELOCIraptor" ON)
vr_option(HDF5 "Attempt to include HDF5 support in VELOCIraptor" OFF)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default compilation should not be changed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JinsuRhee, please update the source of the pull

Comment on lines +1113 to +1117
impiusemesh = false;
impiusetree = true;
#else
impiusemesh = true;
impiusemesh = false;
impiusetree = true;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default configuration should not be changed. Plus swift will use a mpi mesh decomposition so please revert.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JinsuRhee , can update the user interface so that this option can be set at runtime or when a ramses input is loaded?

@@ -0,0 +1 @@
//#define JS_ADT_ON
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this file since it is not needed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JinsuRhee, please be careful with pull requests to not add unneeded files

Comment on lines +1113 to +1117
impiusemesh = false;
impiusetree = true;
#else
impiusemesh = true;
impiusemesh = false;
impiusetree = true;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JinsuRhee , can update the user interface so that this option can be set at runtime or when a ramses input is loaded?

#define ompsplitsubsearchnum 10000000
#define ompsplitsubsearchnum 100000000
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JinsuRhee why the large increase in the splitsubsearchnum

#define RAMSESIDTYPE unsigned int
#define RAMSESIDTYPE int
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JinsuRhee is it possible that this type could be different for different runs of RAMSES? Does it warrant wrapping this in idefs and adding some compilation options?

Comment on lines +148 to +151
cout<<"%123123-----"<<endl;
cout<<"%123123 MPIDomainDecompositionWithTree: Check the size of Part_mpi to include the number of baryons"<<endl;
cout<<"%123123-----"<<endl;
exit(9);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JinsuRhee is there a reason for having %123123?

Comment on lines +522 to +664
// byteoffset+=RAMSES_fortran_read(Famr[i],header[i].npart[RAMSESGASTYPE]);
//
// //then skip the rest
// for (j=0;j<14;j++) RAMSES_fortran_skip(Famr[i]);
// if (lmin>header[i].nlevelmax) lmin=header[i].nlevelmax;
// if (lmax<header[i].nlevelmax) lmax=header[i].nlevelmax;
// //@}
// //read header info from hydro files
// //@{
// RAMSES_fortran_skip(Fhydro[i]);
// RAMSES_fortran_read(Fhydro[i],header[i].nvarh);
// RAMSES_fortran_skip(Fhydro[i]);
// RAMSES_fortran_skip(Fhydro[i]);
// RAMSES_fortran_skip(Fhydro[i]);
// RAMSES_fortran_read(Fhydro[i],header[i].gamma_index);
// //@}
//
// //then apparently read ngridlevels, which appears to be an array storing the number of grids at a given level
// ngridlevel=new int[header[i].nlevelmax];
// ngridfile=new int[(1+header[i].nboundary)*header[i].nlevelmax];
// RAMSES_fortran_read(Famr[i],ngridlevel);
// for (j=0;j<header[i].nlevelmax;j++) ngridfile[j]=ngridlevel[j];
// //skip some more
// RAMSES_fortran_skip(Famr[i]);
// //if nboundary>0 then need two skip twice then read ngridbound
// if(header[i].nboundary>0) {
// ngridbound=new int[header[i].nboundary*header[i].nlevelmax];
// RAMSES_fortran_skip(Famr[i]);
// RAMSES_fortran_skip(Famr[i]);
// //ngridbound is an array of some sort but I don't see what it is used for
// RAMSES_fortran_read(Famr[i],ngridbound);
// for (j=0;j<header[i].nlevelmax;j++) ngridfile[header[i].nlevelmax+j]=ngridbound[j];
// }
// //skip some more
// RAMSES_fortran_skip(Famr[i],2);
// //if odering list in info is bisection need to skip more
// if (orderingstring==string("bisection")) RAMSES_fortran_skip(Famr[i],5);
// else RAMSES_fortran_skip(Famr[i],4);
//
// for (k=0;k<header[i].nboundary+1;k++) {
// for (j=0;j<header[i].nlevelmax;j++) {
// //first read amr for positions
// chunksize=nchunk=ngridfile[k*header[i].nlevelmax+j];
// if (chunksize>0) {
// xtempchunk=new RAMSESFLOAT[3*chunksize];
// //store son value in icell
// icellchunk=new int[header[i].twotondim*chunksize];
// //skip grid index, next index and prev index.
// RAMSES_fortran_skip(Famr[i],3);
// //now read grid centre
// for (idim=0;idim<header[i].ndim;idim++) {
// RAMSES_fortran_read(Famr[i],&xtempchunk[idim*chunksize]);
// }
// //skip father index, then neighbours index
// RAMSES_fortran_skip(Famr[i],1+2*header[i].ndim);
// //read son index to determine if a cell in a specific grid is at the highest resolution and needs to be represented by a particle
// for (idim=0;idim<header[i].twotondim;idim++) {
// RAMSES_fortran_read(Famr[i],&icellchunk[idim*chunksize]);
// }
// //skip cpu map and refinement map (2^ndim*2)
// RAMSES_fortran_skip(Famr[i],2*header[i].twotondim);
// }
// RAMSES_fortran_skip(Fhydro[i]);
// //then read hydro for other variables (first is density, then velocity, then pressure, then metallicity )
// if (chunksize>0) {
// //first read velocities (for 2 cells per number of dimensions (ie: cell corners?))
// for (idim=0;idim<header[i].twotondim;idim++) {
// for (ivar=0;ivar<header[i].nvarh;ivar++) {
// for (igrid=0;igrid<chunksize;igrid++) {
// //once we have looped over all the hydro data then can start actually storing it into the particle structures
// if (ivar==header[i].nvarh-1) {
// //if cell has no internal cells or at maximum level produce a particle
// if (icellchunk[idim*chunksize+igrid]==0 || j==header[i].nlevelmax-1) {
// //first suggestion is to add some jitter to the particle positions
// double dx = pow(0.5, j);
// int ix, iy, iz;
// //below assumes three dimensions with 8 corners (? maybe cells) per grid
// iz = idim/4;
// iy = (idim - (4*iz))/2;
// ix = idim - (2*iy) - (4*iz);
// // Calculate absolute coordinates + jitter, and generate particle
// xtemp[0] = ((((float)rand()/(float)RAND_MAX) * header[i].BoxSize * dx) +(header[i].BoxSize * (xtempchunk[igrid] + (double(ix)-0.5) * dx )) - (header[i].BoxSize*dx/2.0)) ;
// xtemp[1] = ((((float)rand()/(float)RAND_MAX) * header[i].BoxSize * dx) +(header[i].BoxSize * (xtempchunk[igrid+1*chunksize] + (double(iy)-0.5) * dx )) - (header[i].BoxSize*dx/2.0)) ;
// xtemp[2] = ((((float)rand()/(float)RAND_MAX) * header[i].BoxSize * dx) +(header[i].BoxSize * (xtempchunk[igrid+2*chunksize] + (double(iz)-0.5) * dx )) - (header[i].BoxSize*dx/2.0)) ;
// //determine processor this particle belongs on based on its spatial position
// ibuf=MPIGetParticlesProcessor(opt, xtemp[0],xtemp[1],xtemp[2]);
// Nbuf[ibuf]++;
// }
// }
// }
// }
// }
// delete[] xtempchunk;
// }
// }
// }
// Famr[i].close();
// }
// }
// }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JinsuRhee is there a reason for commenting all this?

@@ -7,6 +7,7 @@

#include "allvars.h"
#include "proto.h"
#include "js_perform.h"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JinsuRhee remove this include.

Comment on lines +194 to +195
xtempall = new RAMSESFLOAT[nbodies*3];
famtempall = new int[nbodies];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JinsuRhee, can you explain how xtempall is a viable solution? It is quite possible that there are too many particles to fit in memory. Hence the previous approach to load stuff in chunks. Does it not make sense to only enable this type of load if and only if one can construct a local xtempall allocation and then build a tree to do the mpi decomposition?

Comment on lines +262 to +266
void MPIInitialDomainDecompositionWithTree(Options &opt){
if (ThisTask==0) {
}
}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JinsuRhee , why is there any empty function?

Comment on lines -54 to +55
tree3dfofomp[i] = new KDTree(&Part.data()[ompdomain[i].noffset],ompdomain[i].ncount,opt.Bsize,tree3dfofomp[i]->TPHYS,tree3dfofomp[i]->KEPAN,100,0,0,0,period,NULL);
//tree3dfofomp[i] = new KDTree(&Part.data()[ompdomain[i].noffset],ompdomain[i].ncount,opt.Bsize,tree3dfofomp[i]->TPHYS,tree3dfofomp[i]->KEPAN,100,0,0,0,period,NULL);
tree3dfofomp[i] = new KDTree(js_bsize, &Part.data()[ompdomain[i].noffset],ompdomain[i].ncount,opt.Bsize,tree3dfofomp[i]->TPHYS,tree3dfofomp[i]->KEPAN,100,0,0,0,period,NULL);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JinsuRhee , why the KDTree API change? The API should remain the same. Please verify the NBodylib.

@@ -44,14 +44,15 @@ OMP_Domain *OpenMPBuildDomains(Options &opt, const Int_t numompregions, KDTree *
KDTree **OpenMPBuildLocalTrees(Options &opt, const Int_t numompregions, vector<Particle> &Part, OMP_Domain *ompdomain, Double_t *period)
{
KDTree **tree3dfofomp = new KDTree*[numompregions];
Int_t i;
Int_t i, js_bsize;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JinsuRhee, update all variables that use js_ to not use this prefix.

Comment on lines +142 to +144

js_count++;
cout<<" 3DFOF Log - "<<i<<" th / "<<js_count<<" of "<<numompregions<<" / # ptcls : "<<ompdomain[i].ncount<<" / # groups : "<<ng<<" / Time [s] : "<<MyGetTime() - js_time<<endl;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JinsuRhee , please remove this and also the js_time

Comment on lines +299 to +301

getline(Finfo,stringbuf);//Mysterious mis-reading one line

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JinsuRhee, Is this true for all RAMSES formats? Might it be necessary to wrap this in ifdefs and add some compilation options to the ramses format? This might apply to the additions you have added below.

Comment on lines -60 to +61
bool runompfof = (numompregions>=2 && nthreads > 1 && opt.iopenmpfof == 1);
//bool runompfof = (numompregions>=2 && nthreads > 1 && opt.iopenmpfof == 1);
bool runompfof = (numompregions>=1 && nthreads > 1 && opt.iopenmpfof == 1);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not make sense. @JinsuRhee you can only run searches in parallel if there are at least two regions. Please fix.

Comment on lines +95 to +99
//tree = new KDTree(Part.data(),nbodies,opt.openmpfofsize,tree->TPHYS,tree->KEPAN,100);
tree = new KDTree(rdist, Part.data(),nbodies,opt.openmpfofsize,tree->TPHYS,tree->KEPAN,100);
if(opt.iverbose) cout<<ThisTask<<": Building Root Tree Done"<<endl;
if(opt.iverbose) cout<<" # of OMP Domains -> "<<tree->GetNumLeafNodes()<<endl;

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JinsuRhee , These API changes to the KDTree are not acceptable. Please update the library to ensure that it remains backward compatible.

Comment on lines +995 to +996
else if (strcmp(tbuff, "MPI_use_tree_decomposition")==0)
opt.impiusetree = (atoi(vbuff)>0);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a comment about adding this config option. Does also require an update to allvars.cxx to ensure the meta data is written for the config option.

Comment on lines -802 to +831
#pragma omp parallel default(shared) \
#pragma omp parallel default(shared) num_threads(1) \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no point in running a parallel with num_threads(1). Remove

Comment on lines -235 to +265
#pragma omp parallel for \
default(shared) private(v2,Ti,mass) schedule(dynamic) \
reduction(+:totT,Efrac,nunbound) reduction(max:maxE) num_threads(nthreads)
//#pragma omp parallel for \
//default(shared) private(v2,Ti,mass) schedule(dynamic) \
//reduction(+:totT,Efrac,nunbound) reduction(max:maxE) num_threads(nthreads)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this commented out?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants