NUTCH-2481 HostDatum deltas(previous step statistics) and Metadata expressions #278

Open: wants to merge 10 commits into master

@okedoki okedoki commented Jan 17, 2018

This changes the logic of updatehostdb slightly.

When hostdb.deltaExpression is specified, we don't reset the statistics in the mapper. Instead, we first send the previous step's statistics to the reducer and reset them afterwards.

In line 215 of the mapper,

    if (readingCrawlDb)

is replaced by

    if (readingCrawlDb && !isDeltaStatisticCalculated) {
      hostDatum.resetStatistics();
    }

Please verify that this logic doesn't break the current functionality.
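
To make the intended flow easier to check, here is a minimal, self-contained sketch of the deferred reset described above. It is not the Nutch code: `HostStats`, `Result`, `map` and `reduce` are hypothetical stand-ins, a single `fetched` counter replaces the real HostDatum statistics, and the way the delta is derived (new count minus previous count) is an assumption made for illustration only.

```java
/** Sketch only: every type and method here is a hypothetical stand-in, not a Nutch class. */
final class DeltaResetSketch {

  /** Stand-in for HostDatum, reduced to a single counter. */
  static final class HostStats {
    long fetched;
    HostStats(long fetched) { this.fetched = fetched; }
    void resetStatistics() { fetched = 0; }
  }

  /** Outcome of one update round: the new counter and its change since the previous round. */
  static final class Result {
    final long fetched;
    final long delta;
    Result(long fetched, long delta) { this.fetched = fetched; this.delta = delta; }
  }

  /** Mapper side: applies the condition quoted in the description to a stored host record. */
  static HostStats map(HostStats stored, boolean readingCrawlDb,
      boolean isDeltaStatisticCalculated) {
    HostStats out = new HostStats(stored.fetched);
    if (readingCrawlDb && !isDeltaStatisticCalculated) {
      out.resetStatistics(); // no delta requested: reset right away, as before
    }
    return out; // delta requested: previous statistics travel on to the reducer
  }

  /** Reducer side (assumed): remember the previous value, reset, then re-aggregate. */
  static Result reduce(HostStats fromMapper, long freshFetched,
      boolean isDeltaStatisticCalculated) {
    long previous = fromMapper.fetched; // already 0 when the mapper did the reset
    if (isDeltaStatisticCalculated) {
      fromMapper.resetStatistics(); // the reset deferred from the mapper
    }
    fromMapper.fetched += freshFetched; // counts aggregated from the CrawlDb
    long delta = isDeltaStatisticCalculated ? fromMapper.fetched - previous : 0;
    return new Result(fromMapper.fetched, delta);
  }

  public static void main(String[] args) {
    // Previous round counted 10 fetched pages; this round the CrawlDb yields 14.
    Result plain = reduce(map(new HostStats(10), true, false), 14, false);
    Result withDelta = reduce(map(new HostStats(10), true, true), 14, true);
    System.out.println(plain.fetched + " " + plain.delta);
    System.out.println(withDelta.fetched + " " + withDelta.delta);
  }
}
```

In this sketch both paths end with the same absolute counter, which is the property worth verifying against the existing behaviour. Only the delta path keeps the previous value long enough to compare against it.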

@okedoki okedoki changed the title Nutch 2481 NUTCH-2481 Jan 17, 2018
// Create or retrieve a JexlEngine
JexlEngine jexl = new JexlEngine();

// Dont't be silent and be strict
Contributor

"Dont't" must be a typo, but beyond that, setSilent(true) seems to contradict this comment.

Contributor Author

@YossiTamari
The funny part is that it is a copy-paste from ReadHostDb, line 83.
How do you propose to fix it?

Contributor

I'm not sure what the original intent was, but maybe we should replace this whole block (in both places) with the following?
this.deltaExpression = org.apache.nutch.util.JexlUtil.parseExpression(stringDeltaExpression);

okedoki commented Feb 12, 2018

@YossiTamari
Refactored according to your suggestion. It is quite bad that we have a utility for this and it wasn't used.

@lewismc lewismc changed the title NUTCH-2481 NUTCH-2481 HostDatum deltas(previous step statistics) and Metadata expressions Jan 14, 2021