Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spider: do not add seeds to the found list #4963

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

jeremychoi
Copy link
Contributor

@jeremychoi jeremychoi commented Oct 4, 2023

Overview

Fix zaproxy/zaproxy#7737

Checklist

  • [N/A] Update help
  • Update changelog
  • Run ./gradlew spotlessApply for code formatting
  • Write tests
  • Check code coverage
  • Sign-off commits
  • Squash commits
  • Use a descriptive title

For more details, please refer to the developer rules and guidelines.

@thc202 thc202 changed the title changed not to add seeds to the spider found list spider: do not add seeds to the spider found list Oct 4, 2023
@thc202
Copy link
Member

thc202 commented Oct 4, 2023

Please see https://github.com/zaproxy/zap-extensions/pull/4963/checks?check_run_id=17379176664

The changelog should be updated. Better to have some tests.

@thc202 thc202 changed the title spider: do not add seeds to the spider found list spider: do not add seeds to the found list Oct 4, 2023
@psiinon
Copy link
Member

psiinon commented Oct 4, 2023

Note that the robots.txt and sitemap.xml URLs still appear in the SiteMap.
This makes sense if they really exist, but maybe we should not include them if they are 404s and requested by the spider? I've wondered about this for a while.
That doesnt have to be part of this PR, just a comment 😁

Signed-off-by: Jeremy Choi <jechoi@redhat.com>
@jeremychoi
Copy link
Contributor Author

Better to have some tests.

maybe I could add a test like

  1. spiderController.addSeed(uri, ...)
  2. assert if the uri does not exist in foundURIs.

Does that makes sense?

BTW, since I'm yet to learn more about the ZAP code, I'm not sure at the moment how I can access 'foundURIs' in SpiderControllerUnitTest.Java? @thc202 Could you give me an example?

@jeremychoi
Copy link
Contributor Author

maybe we should not include them if they are 404s and requested by the spider?

+1 from a user perspective :)

@jeremychoi
Copy link
Contributor Author

The changelog should be updated.

Could you let me know where the changelog is? Maybe it'd be better if the information could be found in https://github.com/zaproxy/zaproxy/blob/main/CONTRIBUTING.md#guidelines-for-pull-request-pr-submission-and-processing

@kingthorin
Copy link
Member

https://github.com/zaproxy/zap-extensions/blob/main/addOns/spider/CHANGELOG.md

Signed-off-by: Jeremy Choi <jechoi@redhat.com>
Co-authored-by: Rick M <kingthorin@users.noreply.github.com>
Signed-off-by: Jeremy Bonghwan Choi <jechoi@redhat.com>
@thc202
Copy link
Member

thc202 commented Oct 5, 2023

For the record that's in https://github.com/zaproxy/zap-extensions/blob/main/CONTRIBUTING.md#changelog

@jeremychoi
Copy link
Contributor Author

maybe I could add a test like

  1. spiderController.addSeed(uri, ...)
  2. assert if the uri does not exist in foundURIs.

Does that makes sense?

BTW, since I'm yet to learn more about the ZAP code, I'm not sure at the moment how I can access 'foundURIs' in SpiderControllerUnitTest.Java? @thc202 Could you give me an example?

@thc202 or anyone, could I get any info which will be helpful?

@thc202
Copy link
Member

thc202 commented Oct 30, 2023

In SpiderControllerUnitTest something like:

@Test
void shouldNotNotifySeedAsFoundUri() {
    // Given
    URI seed = …
    // When
    spiderController.addSeed(seed, …);
    // Then
    verify(spider, times(0)).notifyListenersFoundURI(eq(seed), …);
}

although this is not correct (IMO) we should still notify about the seeds used, the referenced issue was about the count of found URIs so the fix should be to ignore seeds when counting.

@kingthorin
Copy link
Member

@jeremychoi do you still plan to finish this?

@jeremychoi
Copy link
Contributor Author

@kingthorin will try to find time to finish this week.

Signed-off-by: Jeremy Choi <jechoi@redhat.com>
Signed-off-by: Jeremy Choi <jechoi@redhat.com>
@jeremychoi
Copy link
Contributor Author

@kingthorin @thc202 please review.
@thc202 we should still notify about the seeds used => In that case, IMO it should be implemented in a separate way, instead of calling notifyListenersFoundURI() whose name doesn't fit here, because it's not that seeds are found.

Copy link
Member

@kingthorin kingthorin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this mean that none of the seeds are marked found, though we only want to skip the "artificial" seeds (robots, sitemap, ds_store)?

@jeremychoi
Copy link
Contributor Author

Won't this mean that none of the seeds are marked found, though we only want to skip the "artificial" seeds (robots, sitemap, ds_store)?

@kingthorin you're right. would this work if we checked the found-URI belongs to the aritificial seeds?

@kingthorin
Copy link
Member

I'd be good with that.

Copy link

github-actions bot commented Feb 27, 2024

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

…() nor addRootFileSeed()

Signed-off-by: Jeremy Choi <jechoi@redhat.com>
@jeremychoi
Copy link
Contributor Author

I have read the CLA Document and I hereby sign the CLA

@jeremychoi
Copy link
Contributor Author

recheck

zapbot added a commit to zaproxy/cla that referenced this pull request Feb 27, 2024
@jeremychoi
Copy link
Contributor Author

Could anyone help with solving this error?

A failure occurred while executing me.champeau.gradle.japicmp.JApiCmpWorkAction
937 actionable tasks: 99 executed, 311 from cache, 527 up-to-date
Detected binary changes.
- current: spider-0.11.0.jar
- baseline: spider-0.6.0.jar.

Copy link
Member

@kingthorin kingthorin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For binary compatibility you'll need to maintain the original Seed constructor and addSeed method. They can call the new one and pass defaults for the boolean (likely false?)

@@ -5,9 +5,13 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Unreleased
### Changed
- Changed Spider not to notify as a found URI when a seed is added from addRootFileSeed() or addFileSeed() (Issue 7737).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a user friendly explanation not a dev one.

addOns/spider/CHANGELOG.md Show resolved Hide resolved
@kingthorin
Copy link
Member

@jeremychoi are you able to keep going with this?

@jeremychoi
Copy link
Contributor Author

@kingthorin Thanks for the heads up. Yes. Sorry for the delay. I'll find time this weekend.

@thc202
Copy link
Member

thc202 commented May 15, 2024

Can we have a clarification of what the final behaviour is going to be?

@jeremychoi
Copy link
Contributor Author

@thc202

Can we have a clarification of what the final behaviour is going to be?

the file seeds(e.g. robots, sitemap) will not be notified as 'foundURI'.

@thc202
Copy link
Member

thc202 commented May 20, 2024

We are back to the remark in #4963 (comment).

@jeremychoi
Copy link
Contributor Author

@thc202 we should still notify about the seeds used, the referenced issue was about the count of found URIs so the fix should be to ignore seeds when counting. => do you mean file seeds should be notified as found URIs?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
4 participants