WIP: add Ctrl-C interrupt #253

Open
wants to merge 19 commits into
base: main

bluegenes
Contributor

@bluegenes bluegenes commented Feb 28, 2024

addresses #40

This PR introduces a ThreadManager struct that contains a SyncSender, a handle to the writer thread, and an atomic bool, interrupted, which is set to true if Ctrl-C is detected.

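use std::sync::{atomic::AtomicBool, mpsc::SyncSender, Arc};
use anyhow::Result; // assumption: the one-parameter Result<()> is taken to be anyhow's

pub struct ThreadManager<T: Send + 'static> {
    pub sender: Option<SyncSender<T>>, // channel for sending data to the writer thread
    pub writer_thread: Option<std::thread::JoinHandle<Result<()>>>, // handle to the writer thread
    pub interrupted: Arc<AtomicBool>, // set to true if Ctrl-C is detected
}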

ThreadManager has three functions:

  • new for building a new instance and setting up Ctrl-C handling
  • send for sending data to the writer thread
  • perform_cleanup, to close the threads. This standardizes cleanup across interruption and successful completion.
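To make the three functions concrete, here is a minimal std-only sketch, not the code in this PR. It assumes a writer that just collects items into a Vec (instead of writing a CSV), uses String errors, and takes a hypothetical `capacity` parameter for the channel. Registering the actual Ctrl-C handler is left out because it needs an external crate; a handler would hold a clone of `interrupted` and set it to true.

```rust
use std::sync::atomic::AtomicBool;
use std::sync::mpsc::{sync_channel, SyncSender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

// Hypothetical stand-in for the PR's ThreadManager: same three fields, but the
// writer thread returns the collected items so the behaviour is easy to check.
pub struct ThreadManager<T: Send + 'static> {
    pub sender: Option<SyncSender<T>>,
    pub writer_thread: Option<JoinHandle<Result<Vec<T>, String>>>,
    pub interrupted: Arc<AtomicBool>,
}

impl<T: Send + 'static> ThreadManager<T> {
    /// Spawn the writer thread and create the shared flag.
    /// (No signal handler is registered in this sketch.)
    pub fn new(capacity: usize) -> Self {
        let (sender, receiver) = sync_channel::<T>(capacity);
        let writer_thread: JoinHandle<Result<Vec<T>, String>> = thread::spawn(move || {
            // Iterating the receiver ends once every sender has been dropped.
            Ok(receiver.into_iter().collect::<Vec<T>>())
        });
        ThreadManager {
            sender: Some(sender),
            writer_thread: Some(writer_thread),
            interrupted: Arc::new(AtomicBool::new(false)),
        }
    }

    /// Send one item to the writer thread; fails after cleanup.
    pub fn send(&self, item: T) -> Result<(), String> {
        match &self.sender {
            Some(s) => s.send(item).map_err(|e| e.to_string()),
            None => Err("sender already closed".to_string()),
        }
    }

    /// Drop the sender so the writer loop ends, then join the writer thread.
    /// The same path is used after an interrupt and after normal completion.
    pub fn perform_cleanup(&mut self) -> Result<Vec<T>, String> {
        drop(self.sender.take());
        match self.writer_thread.take() {
            Some(handle) => handle
                .join()
                .map_err(|_| "writer thread panicked".to_string())?,
            None => Err("cleanup already performed".to_string()),
        }
    }
}
```

Keeping `sender` and `writer_thread` as Options is what lets cleanup `take()` them: dropping the sender closes the channel, and a second cleanup call becomes an error rather than a hang.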

In each main function, we instantiate a ThreadManager and wrap it in Arc<Mutex> for safe sharing across threads. Then, while iterating (e.g. through queries or against signatures), we check the interrupted bool on each iteration. If interrupted is ever true, we return, forgoing the remaining iterations.

How quickly the command returns after Ctrl-C varies: it depends on how long each iteration takes (aka how often we are checking for interruption).
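The per-iteration check can be sketched roughly like this. It is a single-threaded illustration, not the loop from this PR; `run_until_interrupted` and its closure argument are hypothetical names. The flag is read once at the top of each iteration, so an interrupt that arrives mid-iteration is only noticed when the next iteration starts.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Run `work` on each item, checking `interrupted` before every iteration.
/// Returns the number of items that were fully processed.
pub fn run_until_interrupted<I, F>(items: I, interrupted: &AtomicBool, mut work: F) -> usize
where
    I: IntoIterator,
    F: FnMut(I::Item),
{
    let mut processed = 0;
    for item in items {
        // Bail out early, skipping the remaining iterations.
        if interrupted.load(Ordering::SeqCst) {
            return processed;
        }
        work(item);
        processed += 1;
    }
    processed
}
```

Because the flag is a plain AtomicBool, a test can flip it directly instead of sending a real signal, which may be one way to exercise the early-return path.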

This PR adds this ctrl-c capture to:

  • pairwise
  • manysearch
  • manysketch
  • multisearch
  • rocksdb fastmultigather
  • rocksdb manysearch

I want to punt the remaining commands to an issue for future work: some require refactoring (fastgather/fastmultigather/cluster do not use send/recv), and for others (index, check) it wouldn't really be useful, since we only provide a super lightweight wrapper around core functionality.

Notes/Questions:

  • Pytest has its own Ctrl-C handler and we can't have two. So we only set up the Ctrl-C handling if we are not running with pytest.
  • How do I properly test this, other than just running?
  • I have no idea if there are better ways to do this, but it seems logical to me and seems to work :)
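One possible way to do the pytest check from the first note, as a sketch only: pytest sets the PYTEST_CURRENT_TEST environment variable while a test is running, so its presence can gate handler setup. Whether this PR detects pytest that way is an assumption, and both function names below are hypothetical.

```rust
use std::env;
use std::ffi::OsString;

/// Decide whether to register a Ctrl-C handler, given the value of the
/// PYTEST_CURRENT_TEST environment variable (None when it is not set).
pub fn should_install_handler(pytest_current_test: Option<OsString>) -> bool {
    pytest_current_test.is_none()
}

/// Convenience wrapper that reads the real process environment.
pub fn should_install_handler_from_env() -> bool {
    should_install_handler(env::var_os("PYTEST_CURRENT_TEST"))
}
```

Splitting the decision from the environment lookup keeps the logic checkable without mutating process-wide environment variables in a test.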

In progress: benchmark to ensure this doesn't inflate runtimes.

benchmark summary

12k ICTV viral genomes, scaled=200

79.8m comparisons

| command | version | time |
| --- | --- | --- |
| pairwise | PR | 17s |
| pairwise | v0.9.1 | 20s |
| multisearch | PR | 32s |
| multisearch | v0.9.1 | ? |
| manysketch | PR | 4s |
| manysketch | v0.9.1 | ? |
| manysearch | PR | ? |
| manysearch | v0.9.1 | ? |
| rocksdb manysearch | PR | ? |
| rocksdb manysearch | v0.9.1 | ? |
| rocksdb fastmultigather | PR | ? |
| rocksdb fastmultigather | v0.9.1 | ? |

@bluegenes
Contributor Author

verbose benchmarking results (in progress):

DONE. Processed 79752135 comparisons
...pairwise is done! results in 'vmr_MSL38_v1.dna-k21-sc200.pairwise-ani.csv'
        Command being timed: "sourmash scripts pairwise vmr_MSL38_v1.dna-k21-sc200.zip -c 10 -k 21 -s 200 --ani -o vmr_MSL38_v1.dna-k21-sc200.pairwise-ani.csv"
        User time (seconds): 119.37
        System time (seconds): 3.95
        Percent of CPU this job got: 739%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.68
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 178668
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 86987
        Voluntary context switches: 69215
        Involuntary context switches: 126
        Swaps: 0
        File system inputs: 0
        File system outputs: 55608
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

multisearch

DONE. Processed 159516900 comparisons
...multisearch is done! results in 'vmr_MSL38_v1.dna-k21-sc200.pairwise-ani.csv'
        Command being timed: "sourmash scripts multisearch vmr_MSL38_v1.dna-k21-sc200.zip vmr_MSL38_v1.dna-k21-sc200.zip -c 10 -k 21 -s 200 --ani -o vmr_MSL38_v1.dna-k21-sc200.pairwise-ani.csv"
        User time (seconds): 249.55
        System time (seconds): 3.59
        Percent of CPU this job got: 794%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:31.87
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 225256
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 155134
        Voluntary context switches: 138924
        Involuntary context switches: 283
        Swaps: 0
        File system inputs: 0
        File system outputs: 114016
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

manysketch with ctrl-c:

DONE. Processed 67132 fasta files
WARNING: 33566 fasta files skipped - no compatible signatures.
...manysketch is done! results in 'test.zip'
        Command being timed: "sourmash scripts manysketch output.spillover/spillover.fromfile.csv -o test.zip"
        User time (seconds): 6.02
        System time (seconds): 4.14
        Percent of CPU this job got: 278%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.64
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 155568
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 5
        Minor (reclaiming a frame) page faults: 53261
        Voluntary context switches: 78358
        Involuntary context switches: 73616
        Swaps: 0
        File system inputs: 0
        File system outputs: 44368
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

@ctb
Collaborator

ctb commented May 23, 2024

see #40 (comment)
