Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In workflow-attached curation, separate task-list building from execution. #9497

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mwoodiupui
Copy link
Member

@mwoodiupui mwoodiupui commented Apr 23, 2024

The old code would curate the object once for each task, meaning that all but one task would be executed N times up to the length of the list. This patch first configures the Curator with tasks, and then runs it once (or queues it).

References

Description

Completely configure the Curator before using it.

Instructions for Reviewers

List of changes in this PR:

  • Moved Curator.curate out of the loop that configures the Curator with tasks.
  • Replaced literal %n with newline in report output
  • Extended the internal documentation of the package

See comments for a good test procedure.

Checklist

  • My PR is small in size (e.g. less than 1,000 lines of code, not including comments & integration tests). Exceptions may be made if previously agreed upon.
  • My PR passes Checkstyle validation based on the Code Style Guide.
  • My PR includes Javadoc for all new (or modified) public methods and classes. It also includes Javadoc for large or complex private methods.
  • My PR passes all tests and includes new/updated Unit or Integration Tests based on the Code Testing Guide.
  • If my PR includes new libraries/dependencies (in any pom.xml), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.
  • If my PR modifies REST API endpoints, I've opened a separate REST Contract PR related to this change.
  • If my PR includes new configurations, I've provided basic technical documentation in the PR itself.
  • If my PR fixes an issue ticket, I've linked them together.

The old code would curate the object once for each task, meaning that all
but one task would be executed N times up to the length of the list.
@mwoodiupui mwoodiupui added the component: workflow Related to approval workflow or configurable workflow label Apr 23, 2024
@mwoodiupui mwoodiupui self-assigned this Apr 23, 2024
@alanorth
Copy link
Contributor

alanorth commented Apr 24, 2024

Thanks @mwoodiupui. I tested on DSpace 7.6.1 with the following workflow curation task configuration:

...
 <tasksets>
   <taskset name="metadata-curation">
     <flowstep name="editstep">
       <task name="normalizedois" />
     </flowstep>
     <flowstep name="finaleditstep">
       <task name="countrycodetagger" />
     </flowstep>
   </taskset>
 </tasksets>

Before the patch

Immediately after depositing the submission into the workflow, I see the following:

$ tail -f dspace.log | grep -A1 org.dspace.curate
2024-04-24 07:47:36,759 INFO  1859f123-6a5c-42a2-813d-ed0b2b1ef962 e09685ef-ecf5-40c9-b01d-6b776c4cd65e org.dspace.curate.Curator @ Curation task: normalizedois performed on: workflow item: bf20d42e-b9a6-4125-87d5-ed84a75a3755 with status: 0. Result: 'Normalized 1 DOI(s)'
2024-04-24 07:47:36,759 WARN  1859f123-6a5c-42a2-813d-ed0b2b1ef962 e09685ef-ecf5-40c9-b01d-6b776c4cd65e org.dspace.curate.XmlWorkflowCuratorServiceImpl @ No contacts were found for workflow item 50129:  task normalizedois returned action none with message Normalized 1 DOI(s)
2024-04-24 07:47:36,759 INFO  1859f123-6a5c-42a2-813d-ed0b2b1ef962 e09685ef-ecf5-40c9-b01d-6b776c4cd65e org.dspace.curate.XmlWorkflowCuratorServiceImpl @ Curation tasks over item bf20d42e-b9a6-4125-87d5-ed84a75a3755 for step editstep report:%nNormalized 1 DOI(s)All DOIs already normalizedAll DOIs already normalizedAll DOIs already normalizednull: no countries, skipping.Normalized 1 DOI(s)
2024-04-24 07:47:36,761 INFO  1859f123-6a5c-42a2-813d-ed0b2b1ef962 e09685ef-ecf5-40c9-b01d-6b776c4cd65e org.dspace.content.Item @ aorth@mjanja.ch::update_item:item_id=bf20d42e-b9a6-4125-87d5-ed84a75a3755

Note how the "Normalize DOIs" task runs multiple times, and the country code tagger task also runs, despite only being configured for the finaleditstep. Also note the literal %n from DSpace in the report, which was also fixed below.

After the patch

Immediately after depositing the submission into the workflow, I see the following:

$ tail -f /home/dspace7/log/dspace.log | grep -A1 org.dspace.curate
2024-04-24 07:56:37,439 INFO  1859f123-6a5c-42a2-813d-ed0b2b1ef962 1b8d1bfc-b4c3-413a-8a46-13edb456ca0f org.dspace.curate.Curator @ Curation task: normalizedois performed on: workflow item: 51c38bf0-fabe-4b41-aad8-fb27ee42b177 with status: 0. Result: 'Normalized 1 DOI(s)'
2024-04-24 07:56:37,440 WARN  1859f123-6a5c-42a2-813d-ed0b2b1ef962 1b8d1bfc-b4c3-413a-8a46-13edb456ca0f org.dspace.curate.XmlWorkflowCuratorServiceImpl @ No contacts were found for workflow item 50130:  task normalizedois returned action none with message Normalized 1 DOI(s)
2024-04-24 07:56:37,441 INFO  1859f123-6a5c-42a2-813d-ed0b2b1ef962 1b8d1bfc-b4c3-413a-8a46-13edb456ca0f org.dspace.curate.XmlWorkflowCuratorServiceImpl @ Curation tasks over item 51c38bf0-fabe-4b41-aad8-fb27ee42b177 for step editstep report:
Normalized 1 DOI(s)

The "Normalize DOIs" task runs once, and the country code tagger task is not run. This is the correct behavior.

@tdonohue tdonohue added the bug label Apr 24, 2024
@tdonohue
Copy link
Member

As it sounds like this is working, I'm pulling this over to the 8.0 board and flagging as port to dspace-7_x in the hopes that we can get this fix into both releases. That said, obviously we'll have to wait until @mwoodiupui takes this out of draft status.

@tdonohue tdonohue added the port to dspace-7_x This PR needs to be ported to `dspace-7_x` branch for next bug-fix release label Apr 24, 2024
@mwoodiupui mwoodiupui marked this pull request as ready for review April 25, 2024 18:44
@mwoodiupui
Copy link
Member Author

It was draft because I hadn't had time to test it.

@alanorth
Copy link
Contributor

alanorth commented Apr 26, 2024

Let's hold off on this. I tested current DSpace 7.6.1 again several times with the same configuration before this patch and I cannot re-produce the original issue. 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug component: workflow Related to approval workflow or configurable workflow port to dspace-7_x This PR needs to be ported to `dspace-7_x` branch for next bug-fix release
Projects
Status: 🙋 Needs Reviewers Assigned
Development

Successfully merging this pull request may close these issues.

Workflow curation incorrect nesting causes multiple applications of the same task per object.
3 participants