Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output the progress once more when the whole scan process finished #828

Merged
merged 3 commits into from Mar 18, 2024

Conversation

WangYihang
Copy link
Contributor

Introduction

As the issue (#823) montioned, the status output lacks the final line, which is crucial for certain monitoring systems that track scanning progress via status files. As a result, these systems are currently unable to accurately capture the completion of the scanning process.

For comprehensive background information, refer to issue #823.

Overview of Current Issue

Prior to the proposed changes, the status output format is as follows:

real-time,time-elapsed,time-remaining,percent-complete,hit-rate,active-send-threads,sent-total,sent-last-one-sec,sent-avg-per-sec,recv-success-total,recv-success-last-one-sec,recv-success-avg-per-sec,recv-total,recv-total-last-one-sec,recv-total-avg-per-sec,pcap-drop-total,drop-last-one-sec,drop-avg-per-sec,sendto-fail-total,sendto-fail-last-one-sec,sendto-fail-avg-per-sec
2024-03-16 10:24:50,0,11,0.249121,0.000000,4,2,1,68,0,0,0,0,0,0,0,0,0,0,0,0
...
2024-03-16 10:24:58,8,0,99.590841,19.531250,0,256,1,4009,50,0,6,200,50,25,0,0,0,0,0,0

Proposed Enhancement

With the proposed modifications, the status output format will be updated to include the final line, thereby providing a complete overview of the scanning progress:

real-time,time-elapsed,time-remaining,percent-complete,hit-rate,active-send-threads,sent-total,sent-last-one-sec,sent-avg-per-sec,recv-success-total,recv-success-last-one-sec,recv-success-avg-per-sec,recv-total,recv-total-last-one-sec,recv-total-avg-per-sec,pcap-drop-total,drop-last-one-sec,drop-avg-per-sec,sendto-fail-total,sendto-fail-last-one-sec,sendto-fail-avg-per-sec
2024-03-16 10:24:50,0,11,0.249121,0.000000,4,2,1,68,0,0,0,0,0,0,0,0,0,0,0,0
...
2024-03-16 10:24:58,8,0,99.590841,19.531250,0,256,1,4009,50,0,6,200,50,25,0,0,0,0,0,0

src/monitor.c Outdated
@@ -487,6 +487,9 @@ void monitor_run(iterator_t *it, pthread_mutex_t *lock)
if (status_fd) {
update_status_updates_file(export_status, status_fd);
}
if (zsend.complete && zrecv.complete) {
Copy link
Contributor

@droe droe Mar 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the goal is to guarantee a final status line based on stats in completed state, then there's a race here. The scan can be still incomplete when we obtain stats on line 473 and subsequently produce status lines based on those stats, and then complete on the recv thread just before monitor thread execution reaches line 490, in which case the loop is exited without a final status line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your feedback. It appears that I may not have fully grasped your point.

Currently, ZMap does not correctly update the status file after completing the scan (see #823 ).

My intention is to ensure that, once ZMap's scan is finished, it will refetch the current scanning status one more time and output the last line of the status file. I am not quite familiar with the code base, so my changes maybe naive somehow.

To address this issue, do you have any further suggestions for improvement? I am more than willing to continue assisting in resolving this issue.

Hey, @zakird and @phillip-stephens, any suggestions on these tweaks?

Thank you guys once again!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zsend.complete and zrecv.complete are global vars that can be set at any time on other threads running concurrently to the monitor thread. Their values can change basically at any point in time. There is a lock taken across updating of the stats, but the lock does not protect zrecv.complete. I'm suggesting that this change makes it much more likely that the last status line has completed stats, by moving the sleep to after the condition check, but that it still does not guarantee it.

One way to avoid the race without reworking any of the locking would be to read zsend.complete && zrecv.complete into a local var at the beginning of the loop, and then breaking out of the loop based on that local var. Then we only break out of the loop when we know that all send and recv threads had completed already before updating stats and writing the status lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your feedback. I have a few questions to better understand your comments.

  1. You mentioned that zsend.complete and zrecv.complete are global vars that can be set at any time on other threads running concurrently to the monitor thread. This makes me wonder if the original zmap code src/monitor.c#L472 might have a race condition issue similar to the one you described.

  2. Intuitively the variable zsend.complete and zrecv.complete, they should be monotonic, which means they change only from an incomplete to a complete state, and never revert from complete to incomplete. With this in mind, I believe the current modifications should be safe, as once both variables reach the complete state, they should not be altered by other threads.

  3. About your comment "by moving the sleep to after the condition check, but that it still does not guarantee it," I havn't change the callsite of the sleep function, it remains after the condition check in my latest revision. Could you please specify what additional steps you recommend to ensure the effectiveness of this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, zsend.complete and zrecv.complete are monotonic in the sense that they only go from 0 to 1. Correct, the issue existed before your change, your change does improve the situation, vastly, but seems like an incomplete fix in that it works as expected most of the time, but it does not guarantee that the last status line written was based on counters in their final state after completion.

The corner case I'm (only a little) concerned about:

  1. recv thread is still executing, zrecv.complete is 0.
  2. monitor thread executes (new) lines 474 to 490, outputting a line to the status file based on non-completed status counters.
  3. monitor thread gets preempted, or, say, file I/O takes unexpectedly long.
  4. recv thread completes and sets zrecv.complete to 1.
  5. monitor thread executes line 491, observes zrecv.complete being 1, breaks the loop. The last status line is still based on older counters before completion.

In practice, especially with default cooldown, not a big deal. Still, I would propose to read zsend.complete && zrecv.complete into a local var at the beginning of the loop, and break based on the local var on line 491, knowing that then, when we are complete, we were already complete before outputting status, which means status was based on counters after completion. If completion happens during production of the status line (during execution of lines 474 to 490), we take the loop once more, produce another full status line, then exit after a complete line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, I will make an effort to implement improvements based on your suggestions.

export_status_t *export_status = xmalloc(sizeof(export_status_t));

while (!(zsend.complete && zrecv.complete)) {
void export_then_update(int_status_t *internal_status, iterator_t *it, export_status_t *export_status, pthread_mutex_t *lock) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now encapsulated the main functionality within the while loop (which involves obtaining the current scanning status and writing it to a status file) into a separate function called export_then_update. This encapsulation simplifies the logic in monitor_run.

The structure of the while loop remains unchanged. The only addition is to call export_then_update once more after the while loop concludes. This ensures that the status file is updated based on the latest state.

If there are any concerns regarding the style or functionality of the code modifications I've made, please let me know. Your feedback is greatly appreciated! @droe

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does address the race I pointed out. No other concerns, I'm not a reviewer tho.

Copy link
Contributor

@phillip-stephens phillip-stephens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me, really appreciate the contribution @WangYihang and the PR discussion @droe!

@phillip-stephens phillip-stephens merged commit 4d4166e into zmap:main Mar 18, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants