Permit multiple input files on the accelsearch command line #120

Open: Kawashima-Azumi wants to merge 6 commits into master

Conversation

@Kawashima-Azumi commented Jul 9, 2020

Permit multiple input files on the command line in order to avoid traversing $PATH and $LD_LIBRARY_PATH each time a new accelsearch starts.
Those who reduce short pulsar observations on large computer clusters restart accelsearch every few milliseconds, traversing their $PATH and $LD_LIBRARY_PATH on the shared filesystem each time and causing a massive slowdown of the whole cluster.

@scottransom (Owner)

Thanks for the pull request! I like the idea. I'm not sure I like the implementation with "--", though, especially as it breaks the automatic creation of all the accelsearch_clig files.

Is it possible to change the way it works so that it simply checks whether there is more than one input file, as was done in #117? I haven't had a chance to fully review that PR yet, but I will almost certainly merge it (and it has exactly the same goal as yours).

@scottransom (Owner)

This also applies to PR #117

I just pushed up changes to the command line parsing for realfft, accelsearch, and zapbirds (which all likely need to be called many times on clusters) to allow up to 16384 input files. I did this using the "clig" software that auto-generates the command line interfaces.

This should make using multiple files a lot better and more consistent. It would work like the following:

After the clig parser is called:

    cmd = parseCmdline(argc, argv);

the number of input files is simply:

    numfiles = cmd->argc;

and you can iterate over them like:

    for (ii = 0; ii < numfiles; ii++) {
        current_filename = cmd->argv[ii];
        ...
    }
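Putting those pieces together, a minimal sketch of the whole pattern might look like the one below. The header name and process_one_file() are illustrative placeholders only, not the actual accelsearch code; cmd->argc and cmd->argv are used as shown above.

    /* Sketch only: assumes the clig-generated header declares the
     * Cmdline struct and parseCmdline(); process_one_file() is a
     * hypothetical per-file routine standing in for the real search. */
    #include "accelsearch_cmd.h"

    extern void process_one_file(Cmdline *cmd, char *filename); /* hypothetical */

    int main(int argc, char *argv[])
    {
        Cmdline *cmd = parseCmdline(argc, argv);  /* parse the options once */
        int ii;

        for (ii = 0; ii < cmd->argc; ii++)        /* cmd->argc = number of input files */
            process_one_file(cmd, cmd->argv[ii]); /* cmd->argv[ii] = ii-th filename */
        return 0;
    }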

If you can please modify your PR so that it handles the new changes, I'll happily merge it.

Thank you!

@scottransom (Owner)

One other comment: Is the fork() really necessary? If you have all the input filenames, why not simply iterate over them with a for loop within the original process?

@Kawashima-Azumi (Author) commented Jul 9, 2020

Thank you for mentioning #117, which I should certainly have read earlier. Yes, I will make them consistent tomorrow, since it's midnight here already.

My apologies for the fork(2) and for breaking the generation of accelsearch_clig. Since my job is at the data center and unrelated to compact object astrophysics, this patch was an emergency fix for someone who was congesting our storage cluster. I had only a few minutes to go over the PRESTO code, and I was unsure whether there were any static variables, or whether a plain loop would be sufficient without fork-ing. So my first attempt used fork(2) to make it safe, and on Linux that doesn't add much cost anyway.
I'll remove the fork(2) if you (the author of PRESTO) are sure that the program state won't be messed up by repeated calls.

@scottransom (Owner)

No apologies necessary! I really appreciate you making the PR and making PRESTO better to use for others. Thank you so much! I'll verify about fork(), but I'm pretty sure that simply looping over calls to "main()" for each file should be completely fine.

@scottransom (Owner)

So I just checked, and it isn't quite as simple as iterating over main(): create_accelobs() will need a slight modification as well, since it gets sent the full parsed command line. I'm happy to help test any changes you make. And if I weren't swamped, I would happily make all the changes myself. If it is really a problem for you to make the changes, I can eventually make them myself, but it might take a while.

@Kawashima-Azumi (Author) commented Jul 10, 2020

I have corrected the clig thing and also got rid of a call to system(3).

However, a closer look showed many calls to exit(3) at various levels of the call chain. Looping without fork-ing is more complex and dangerous than I thought yesterday. Would you like to reconsider the loop-and-fork approach, with command line parsing outside the loop, of course?

Fork-ing may have a negative impact on performance and resource utilization if the program needs to support MS Windows (but it should be fine for the Windows Subsystem for Linux 2).

…void"

This conflicts with the clig command line parser / doc generator files,
and also needs further optimization on command line parsing.

This also missed a call to system(3).
Leaving it there partly defeated the purpose of this commit,
which is to reduce the number of shell executions and associated
searches of $PATH and $LD_LIBRARY_PATH.

This reverts commit f025633.
@scottransom (Owner)

Just checking about the intent of the fork() idea... Is your intent to run all of the accelsearches in parallel with the fork calls? Because that was more than I was thinking. I was simply thinking about running them in serial, all within the original process.

@Kawashima-Azumi (Author)

> Just checking about the intent of the fork() idea... Is your intent to run all of the accelsearches in parallel with the fork calls? Because that was more than I was thinking. I was simply thinking about running them in serial, all within the original process.

No, in serial, but with each search in its own forked child, exit(3), malloc without free, and non-stateless code such as global variables and static local variables cannot possibly mess up subsequent accelsearches, and the parent process is free to decide whether to proceed to the next file when the previous accelsearch fails (i.e., exits with an error status).
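For concreteness, here is a rough sketch of that serial fork-per-file scheme. It is illustrative only, not the PR's actual code; search_one_file() is a hypothetical stand-in for the per-file search, and command line parsing is assumed to have happened once in the parent before this loop runs.

    /* Sketch only: serial fork-per-file, parsing done once by the parent.
     * search_one_file() is a hypothetical per-file routine that may call
     * exit() or leak memory without affecting the other files. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    extern void search_one_file(char *filename);   /* hypothetical */

    static void run_all(int numfiles, char **filenames)
    {
        int ii;
        for (ii = 0; ii < numfiles; ii++) {
            pid_t pid = fork();
            if (pid == 0) {                        /* child: isolated search */
                search_one_file(filenames[ii]);
                exit(EXIT_SUCCESS);
            } else if (pid > 0) {                  /* parent: wait, then decide */
                int status;
                waitpid(pid, &status, 0);          /* serial: one child at a time */
                if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                    fprintf(stderr, "Search of %s failed; continuing.\n",
                            filenames[ii]);
            } else {
                perror("fork");
                exit(EXIT_FAILURE);
            }
        }
    }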

@scottransom (Owner)

I see. OK. That helps. And I agree that that does seem potentially helpful. I'll take a look at your PR as well as the possibility to avoid fork() and still get the same benefits.

@Kawashima-Azumi (Author)

Or perhaps take a look later, when I make another commit with the correct clig changes, just to save your time?

BTW, writing code that loops for a long time (or forever) without randomly crashing can be non-trivial, in my experience writing online data reduction programs for high-frequency radio astronomy. So from my side, at least, fork-ing to isolate code that has only casual error checking is no more of a sin than asking you to work over the weekend :-)

@scottransom (Owner)

Fair enough! And after you explained your reasoning for forking, I think I might be coming around to your way of thinking.

@Kawashima-Azumi (Author)

Updated to a fork-ing approach that conforms to the original command line format. This cuts the metadata server load on a cluster by roughly 50%. I will submit another PR later to get rid of the other ~50%.

@Kawashima-Azumi (Author)

Hi Scott,

Sorry for coming back to this late, but I have heavy technical work to do, and playing with pulsars is only a hobby of mine.

I have included a few more changes in this pull request to reduce filesystem access further, so that acceleration searches on many short observations no longer need to congest the storage cluster.

I think this PR is now complete. Please review.

Best regards,
Azumi

@scottransom (Owner)

Thanks, Azumi! This has been a crazy semester, but I completely plan to handle this PR (perhaps with small changes) over the next month or so. I really appreciate your contribution!
