Spider logs the count of found URLs more than actual existing URIs. #7737

Open
1 task done
jeremychoi opened this issue Feb 15, 2023 · 6 comments · May be fixed by zaproxy/zap-extensions#4963
Labels
add-on enhancement in:spider Issues pertaining to Spider add-on.

Comments

@jeremychoi

jeremychoi commented Feb 15, 2023

Describe the bug

The spider job always reports 3 or more URLs, even when there are no URLs to be found, for example:

"Job spider found 3 URLs"

The log message above comes from the Automation Framework, but the issue seems to be in the Spider add-on itself (e.g. https://github.com/zaproxy/zap-extensions/blob/420d2e9d24c44f6a54089c54ad432531cf336b9c/addOns/spider/src/main/java/org/zaproxy/addon/spider/SpiderController.java#L193).

I assume the count includes the user-supplied URL (e.g. the default target URL) plus robots.txt and sitemap.xml, which the Spider requests automatically.

The root cause might be that the count is incremented regardless of the response status (e.g. 404).
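To make the suspected arithmetic concrete, here is a minimal Java sketch. It assumes the count is simply the number of URIs the spider requested (the three seeds: the target URL, robots.txt and sitemap.xml) and that the response status plays no part. The FetchedUri record and the countAll and emptyDirectoryRun methods are hypothetical illustrations, not part of ZAP's Spider API.

```java
import java.util.List;

public class SeedCountDemo {

    // Hypothetical view of one URI the spider requested; not a ZAP type.
    record FetchedUri(String uri, int status, boolean seed) {}

    // Mirrors the reported behaviour: every requested URI is counted,
    // seeds included, whatever the response status.
    static int countAll(List<FetchedUri> fetched) {
        return fetched.size();
    }

    // Assumed requests when spidering an empty directory served by
    // python3 -m http.server: the directory listing plus two 404s.
    static List<FetchedUri> emptyDirectoryRun() {
        return List.of(
                new FetchedUri("http://localhost:5000/", 200, true),
                new FetchedUri("http://localhost:5000/robots.txt", 404, true),
                new FetchedUri("http://localhost:5000/sitemap.xml", 404, true));
    }

    public static void main(String[] args) {
        // Prints "Job spider found 3 URLs" although nothing was discovered.
        System.out.println("Job spider found " + countAll(emptyDirectoryRun()) + " URLs");
    }
}
```

Under these assumptions the reported "3" is exactly the number of seeds, which is why it appears even for an empty site.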

Steps to reproduce the behavior

It can be reproduced with any web app. A simple example:

  1. Create an empty directory.
  2. In that directory, run: python3 -m http.server 5000
  3. Run the spider against http://localhost:5000 with the Automation Framework.

Expected behavior

It should report the count of existing pages only.

Software versions

ZAP 2.12.0

Screenshots

No response

Errors from the zap.log file

No response

Additional context

No response

Would you like to help fix this issue?

  • Yes

Edit:

In the end, the resolution of this issue will be an "enhancement": excluding the seeds from the "found" count.

As noted in the discussion, ignoring the seeds would probably be more accurate, since those are technically not found while spidering.
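A minimal Java sketch of that enhancement, assuming seeds are registered separately and any discovered URI that is also a seed is skipped. The response status is deliberately not an input, matching the agreement that a found URL returning 404 still counts. FoundUrlCounter and its methods are hypothetical names for illustration; the actual change belongs in the Spider add-on and may be structured differently.

```java
import java.util.HashSet;
import java.util.Set;

public class FoundUrlCounter {

    // URIs the spider starts from (target URL, robots.txt, sitemap.xml, ...).
    private final Set<String> seeds = new HashSet<>();
    // URIs discovered while spidering; a Set so each URI counts once.
    private final Set<String> found = new HashSet<>();

    void addSeed(String uri) {
        seeds.add(uri);
    }

    // Records a discovered URI. There is no status parameter on purpose:
    // a link that returns 404 was still found.
    void addFound(String uri) {
        if (!seeds.contains(uri)) {
            found.add(uri);
        }
    }

    int foundCount() {
        return found.size();
    }

    public static void main(String[] args) {
        FoundUrlCounter counter = new FoundUrlCounter();
        counter.addSeed("http://localhost:5000/");
        counter.addSeed("http://localhost:5000/robots.txt");
        counter.addSeed("http://localhost:5000/sitemap.xml");
        // Empty directory: nothing discovered beyond the seeds, prints 0.
        System.out.println("Job spider found " + counter.foundCount() + " URLs");
        // A link back to a seed is ignored; a broken link is counted, prints 1.
        counter.addFound("http://localhost:5000/");
        counter.addFound("http://localhost:5000/missing.html");
        System.out.println("Job spider found " + counter.foundCount() + " URLs");
    }
}
```

Keeping seeds in their own set means the reported number reflects only what spidering actually discovered, while leaving status handling unchanged.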

@jeremychoi jeremychoi added the bug label Feb 15, 2023
@jeremychoi jeremychoi changed the title from "Spider logs more than actual found URIs." to "Spider logs the count of found URLs more than actual existing URIs." Feb 15, 2023
@thc202 thc202 removed the bug label Feb 15, 2023
@thc202
Member

thc202 commented Feb 15, 2023

That's the expected behaviour; the status code is not relevant to the count of URLs found. Though ignoring the seeds would probably be more accurate, since those are technically not found while spidering.

@jeremychoi
Author

I see, you're right. If a URL is found and returns a 404, it should still be counted. And I agree that ignoring the seeds will make the count more accurate.

@jeremychoi
Author

Thanks for assigning this to me. I will be able to work on it in April.

@thc202 thc202 added the in:spider Issues pertaining to Spider add-on. label Apr 14, 2023
@kingthorin
Member

@jeremychoi do you still plan to tackle this?

@jeremychoi
Author

@kingthorin Yes, sorry for the delay. I couldn't find time to work on it. I'll do it this quarter. However, if someone else wants to fix it, it's fine for this to be reassigned.

@kingthorin
Member

No problem and no rush. Life gets busy.

I’m working on something else with the Spider but might tackle this in a few weeks if it’s still kicking around.


3 participants