NUTCH-3043 Generator: count URLs rejected by URL filters #814

Conversation

sebastian-nagel
Contributor

  • add counters URL_FILTERS_REJECTED and URL_FILTER_EXCEPTION
  • simplify logging statement
  • remove unnecessary cast

Member

@lewismc lewismc left a comment

Fantastic @sebastian-nagel
It would be great if we could augment the metrics documentation once this is merged.
I suppose we could also create a test for the counters.

src/java/org/apache/nutch/crawl/Generator.java (outdated, resolved)
@sebastian-nagel
Contributor Author

Hi @lewismc:

  • "use parameterized logging": done
  • "augment the metrics documentation once this is merged.": will do
  • "we could also create a test for the counters.": for now, TestGenerator is not based on MRUnit. The various Generator::generate(...) methods return the number of generated segments without a way to access the counters (they're logged, however). I'd prefer to track this in a separate issue, because reading the counters would require too many code changes.

@lewismc
Member

lewismc commented Apr 28, 2024

Excellent @sebastian-nagel 👍 I agree

@sebastian-nagel sebastian-nagel merged commit 5f1330a into apache:master May 14, 2024
4 checks passed
@sebastian-nagel
Contributor Author

Thanks, @lewismc! The metrics wiki page was updated.
