Drop anemone and use Spidr for Repo discovery #10947

sjha4 · 2024-03-22T03:26:33Z

To Do:

Support basic auth
Support proxy

What are the changes introduced in this pull request?

Considerations taken when implementing this change?

What are the testing steps for this pull request?

bundle install
Go to Content > Product > Repo discovery
Run repo discovery.

sjha4 · 2024-03-22T03:40:52Z

@evgeni : Thoughts on spidr gem as replacement for anemone?

evgeni · 2024-03-22T06:09:21Z

Doesn't look crazy? Only dep is nokogiri, which we have anyway, tested on modern rubies. Why not.

sjha4 · 2024-05-21T15:40:45Z

I am seeing a significant performance difference between what's on the PR and the existing workflow. Looking at ways to speed this up. Will push updates when I get the performance sorted.

Update: Should be good to go with the latest commit.
I can see chunked output on the repo discovery page, and the task finishes in about the same time as before.

ekohl

Implementation-wise, I think you should split the crawler and the Docker search into separate classes, and perhaps the file crawl as well. Right now it's confusing.
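
For illustration only, here is a minimal sketch of what such a split might look like. Every name in it (`DiscoveryStrategies`, `YumCrawl`, `FileCrawl`, `ContainerSearch`, `for`) is hypothetical and not taken from the PR, and the marker values are assumptions about how each content type is recognized.

```ruby
# Hypothetical layout: one class per content type, chosen by a small factory.
module DiscoveryStrategies
  # Shared constructor; subclasses only describe what they look for.
  class Base
    attr_reader :url

    def initialize(url)
      @url = url
    end
  end

  # Crawls directory listings looking for yum repositories.
  class YumCrawl < Base
    def marker
      'repodata/' # assumption: a repodata/ directory marks a yum repo
    end
  end

  # Crawls directory listings looking for file repositories.
  class FileCrawl < Base
    def marker
      'PULP_MANIFEST' # assumption: a manifest file marks a file repo
    end
  end

  # Queries a registry search endpoint instead of crawling pages.
  class ContainerSearch < Base
    def marker
      nil # no crawl marker; results come from a search API
    end
  end

  STRATEGIES = {
    'yum' => YumCrawl,
    'file' => FileCrawl,
    'docker' => ContainerSearch
  }.freeze

  # Returns the strategy object for a content type, or raises for unknown types.
  def self.for(content_type, url)
    klass = STRATEGIES.fetch(content_type) do
      raise ArgumentError, "unsupported content type: #{content_type}"
    end
    klass.new(url)
  end
end
```

With this shape the caller only needs something like `DiscoveryStrategies.for('yum', url)`, and the crawl and search code paths no longer share one class.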

ekohl · 2024-05-21T20:39:02Z

app/lib/katello/repo_discovery.rb

+
+    def process_page_urls(urls)
+      urls.each do |url|
+        url = url.to_s


I think this is a bad idea. The URL object is way more valuable and I'd only add it to @to_follow with to_s.

sjha4 · 2024-05-22

We don't necessarily use the URL object's properties once we have the list of URLs in the page. It's just string matching after that point, so strings become easier to pass around and work with, IMO?

sjha4 · 2024-05-30

Updated in the latest push to pass the URL object.
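
A minimal sketch of the approach settled on in this thread, assuming the crawler hands over `URI` objects: keep the object while its properties are useful and call `to_s` only when storing results. The class name `UrlDiscoverySketch`, the `@found` list, and the filtering rules are hypothetical illustrations, not the code in the PR.

```ruby
require 'uri'

# Hypothetical stand-in for the discovery class; not Katello's actual code.
class UrlDiscoverySketch
  attr_reader :found, :to_follow

  def initialize(base_url)
    @base = URI.parse(base_url)
    @found = []
    @to_follow = []
  end

  # urls is expected to be an array of URI objects.
  def process_page_urls(urls)
    urls.each do |url|
      # Host and path are available without any string parsing.
      next unless url.host == @base.host
      next unless url.path.start_with?(@base.path)

      if url.path.end_with?('/repodata/')
        # The repository root is the parent of repodata/.
        @found << url.to_s.delete_suffix('repodata/')
      elsif url.path.end_with?('/')
        # Convert to a string only at the point of storage.
        @to_follow << url.to_s
      end
    end
  end
end
```

Usage might look like `UrlDiscoverySketch.new('http://example.com/pub/').process_page_urls(page_urls)`, where `page_urls` is whatever list of `URI` objects the crawler produced for a page.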

pr-processor bot added Not yet reviewed Waiting on contributor labels Mar 22, 2024

sjha4 force-pushed the anemone branch 2 times, most recently from cc6dee6 to 934292d Compare March 22, 2024 16:48

pr-processor bot removed the Waiting on contributor label Mar 22, 2024

jeremylenz reviewed May 14, 2024

ekohl reviewed May 16, 2024

Fixes #37159 - Drop anemone and use Spidr for repo discovery

84094c5

sjha4 force-pushed the anemone branch from 934292d to 759e66d Compare May 21, 2024 16:57

github-actions bot added the Packaging Change label May 21, 2024

sjha4 changed the title ~~Early Draft - Drop anemone and use Spidr~~ Drop anemone and use Spidr for Repo discovery May 21, 2024

sjha4 force-pushed the anemone branch from 759e66d to 5834448 Compare May 21, 2024 16:58

sjha4 marked this pull request as ready for review May 21, 2024 16:59

evgeni reviewed May 21, 2024

sjha4 force-pushed the anemone branch from 5834448 to d96f3f0 Compare May 21, 2024 17:20

Refs #37159 - Improve discovery performance and add gem dependency

1d7766a

sjha4 force-pushed the anemone branch from d96f3f0 to 1d7766a Compare May 21, 2024 17:56

ekohl reviewed May 21, 2024

Refs #37159 - Refactor content specific discoveries

7abf885

sjha4 force-pushed the anemone branch from 2c86dc5 to 7abf885 Compare May 30, 2024 19:24
