
Building 1.0.6 with MPI #121

Open

AustinBorger opened this issue Dec 4, 2017 · 3 comments

@AustinBorger

When building RevBayes v1.0.6 with MPI, the following compilation error appears:

    /gpfs/projects/hpc_support/revbayes-dev/src/revlanguage/analysis/mcmc/RlMonteCarloAnalysis.cpp: In member function ‘virtual RevLanguage::RevPtr<RevLanguage::RevVariable> RevLanguage::MonteCarloAnalysis::executeMethod(const string&, const std::vector<RevLanguage::Argument>&, bool&)’:
    /gpfs/projects/hpc_support/revbayes-dev/src/revlanguage/analysis/mcmc/RlMonteCarloAnalysis.cpp:102:53: error: ‘tuning_interval’ was not declared in this scope
             value->run( gen, rules, MPI_COMM_WORLD, tuning_interval );

I've been able to build by going to these lines of code and defining tuning_interval:

    // tuning_interval is commented out in 1.0.6, but the RB_MPI branch
    // below still references it, which is what breaks the build:
    //int tuning_interval = static_cast<const Natural &>(args[2].getVariable()->getRevObject()).getValue();
    bool prior = static_cast<const RlBoolean &>( args[2].getVariable()->getRevObject() ).getValue();
    if ( prior == true )
    {
        value->runPriorSampler( gen, rules );
    }
    else
    {
    #ifdef RB_MPI
            value->run( gen, rules, MPI_COMM_WORLD, tuning_interval );
    #else
            value->run( gen, rules, 0 ); //tuning_interval );
    #endif
    }

I changed the first (commented-out) line to:

    #ifdef RB_MPI
        int tuning_interval = static_cast<const Natural &>(args[2].getVariable()->getRevObject()).getValue();
    #endif
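
Putting the two pieces together, the patched block reads roughly like this (a sketch assembled from the excerpts above, so surrounding details may differ from the exact upstream source; the args[2] index is carried over verbatim from those excerpts):

    #ifdef RB_MPI
        // Only the MPI overload of run() takes a tuning interval here.
        int tuning_interval = static_cast<const Natural &>( args[2].getVariable()->getRevObject() ).getValue();
    #endif
        bool prior = static_cast<const RlBoolean &>( args[2].getVariable()->getRevObject() ).getValue();
        if ( prior == true )
        {
            value->runPriorSampler( gen, rules );
        }
        else
        {
    #ifdef RB_MPI
            value->run( gen, rules, MPI_COMM_WORLD, tuning_interval );
    #else
            value->run( gen, rules, 0 );
    #endif
        }

This keeps the non-MPI build untouched and only declares tuning_interval when the MPI code path actually uses it.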

It would seem that the software hasn't been tested thoroughly with MPI in this release, since it doesn't build. In addition to the compilation error, the validation scripts run more slowly as you increase the number of threads. I provide tech support on a cluster, and one of our users can't get their own script to complete in a timely fashion unless they run on a single processor: it always takes longer with more cores, which defeats the purpose of using MPI on a cluster.

@AustinBorger
Author

Cloning the development branch fixes the compilation error, but not the MPI slowdown.

@hoehna
Member

hoehna commented Dec 5, 2017 via email

@AustinBorger
Author

I'm not sure what the analysis is for (I can find out though), but the problem shows up in the validation scripts as well so we might be able to just focus on those.

For problems that can't be parallelized, I would expect a run with more cores to take about the same amount of time as a single serial thread, with just a slight amount of overhead. However, that's not what's happening: the 28-thread task takes ten times as long to finish as the single-threaded task (still using MPI). I'm assuming the validation scripts don't analyze more data as you increase the thread count, so a slowdown of this size points to some kind of bug.
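
To make that expectation concrete: if every MPI process simply repeats the same fixed workload, the wall time of the slowest rank should stay roughly flat as ranks are added. Here is a minimal standalone MPI sketch of that baseline (illustrative only, not RevBayes code):

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        // Every rank performs the same fixed workload, mimicking an
        // analysis that cannot be split across processes.
        double start = MPI_Wtime();
        double x = 0.0;
        for (long i = 1; i <= 100000000L; ++i)
        {
            x += 1.0 / static_cast<double>(i);
        }

        // The wall time of the slowest rank is what the user actually waits for.
        double elapsed = MPI_Wtime() - start;
        double slowest = 0.0;
        MPI_Reduce(&elapsed, &slowest, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

        if (rank == 0)
        {
            std::printf("result %.3f, slowest rank took %.2f s\n", x, slowest);
        }

        MPI_Finalize();
        return 0;
    }

Launched with mpirun -np 1 versus mpirun -np 28, the reported time should be nearly identical, with a few percent of overhead at most. If a real run instead gets ten times slower at 28 ranks, the time is going somewhere else, e.g. oversubscribed cores, I/O contention, or communication inside a hot loop.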
