Data migration from DSpace-CRIS 5 #404

Open
alejandratenorio opened this issue Nov 25, 2023 · 6 comments

alejandratenorio commented Nov 25, 2023

Describe the bug

Dear 4Science Team,

We are upgrading our DSpace-CRIS instance from a version based on DSpace 5.10 to DSpace-CRIS 7, release 2023.01.01.
URL: CRIS
CRIS version: 5.10

We need to upgrade our database using the "Data migration from DSpace-CRIS 5" process described in your documentation, but the required tool (Pentaho Data Integration) is no longer available for download.

Could you please help us? Where can we download it?

Thank you in advance.

To Reproduce
Steps to reproduce the behavior:

  1. Go to Pentaho Data Integration
  2. Then we got this message: file could not be found or is not available.
@kskaiser

You can download the tool from https://www.hitachivantara.com/en-us/products/pentaho-platform/data-integration-analytics/pentaho-community-edition.html
If you want to run it on an ARM Mac, you have to follow this guide:
https://stackoverflow.com/questions/67972804/pentaho-data-integration-not-starting-on-new-mac-m1
Windows and Linux should be fine.

@alejandratenorio

@kskaiser
Thank you so much. There are a lot of options; could you tell me which tool I have to download?

@kskaiser

@alejandratenorio

Hi @kskaiser
Thank you very much again. I have run the kitchen script with the following parameters:

kitchen.sh -file:/home/dspace/DSpace-dspace-cris-2023.01.01/dspace/etc/migration/dspace_cris_migration.kjb -param:db_host_name=localhost -param:db_name=dspace -param:db_port_number=5432 -param:db_username=dspace -param:db_password=dspace. -param:eperson_email=mymails@...

but I got this error:

2023/11/30 04:13:27 - insert into imp_record.0 - ERROR (version 9.4.0.0-343, build 0.0 from 2022-11-08 07.50.27 by buildguy) : Because of an error, this step can't continue:
2023/11/30 04:13:27 - metadata visibility configuration.0 - Finished processing (I=2, O=0, R=0, W=2, U=0, E=0)
2023/11/30 04:13:27 - Join rows.0 - Finished processing (I=0, O=0, R=71, W=70, U=0, E=0)
2023/11/30 04:13:27 - Add insert operation and status.0 - Finished processing (I=0, O=0, R=70, W=70, U=0, E=0)
2023/11/30 04:13:27 - insert into imp_record.0 - ERROR (version 9.4.0.0-343, build 0.0 from 2022-11-08 07.50.27 by buildguy) : org.pentaho.di.core.exception.KettleException:
2023/11/30 04:13:27 - insert into imp_record.0 - Error inserting row into table [imp_record] with values: [null], [Funding], [30], [pj00030], [null], [1], [insert], [z], [1], [1c5344cd-fa89-44d8-9c98-7aa20de1d750]
2023/11/30 04:13:27 - insert into imp_record.0 -
2023/11/30 04:13:27 - insert into imp_record.0 - Error inserting/updating row
2023/11/30 04:13:27 - insert into imp_record.0 - ERROR: null value in column "imp_collection_uuid" of relation "imp_record" violates not-null constraint
2023/11/30 04:13:27 - insert into imp_record.0 - Detail: Failing row contains (1, pj00030, 1c5344cd-fa89-44d8-9c98-7aa20de1d750, null, z, insert, null, null, null).
2023/11/30 04:13:27 - insert into imp_record.0 -
2023/11/30 04:13:27 - insert into imp_record.0 -
2023/11/30 04:13:27 - insert into imp_record.0 - at org.pentaho.di.trans.steps.tableoutput.TableOutput.writeToTable(TableOutput.java:384)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.pentaho.di.trans.steps.tableoutput.TableOutput.processRow(TableOutput.java:125)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2023/11/30 04:13:27 - insert into imp_record.0 - at java.base/java.lang.Thread.run(Thread.java:829)
2023/11/30 04:13:27 - insert into imp_record.0 - Caused by: org.pentaho.di.core.exception.KettleDatabaseException:
2023/11/30 04:13:27 - insert into imp_record.0 - Error inserting/updating row
2023/11/30 04:13:27 - insert into imp_record.0 - ERROR: null value in column "imp_collection_uuid" of relation "imp_record" violates not-null constraint
2023/11/30 04:13:27 - insert into imp_record.0 - Detail: Failing row contains (1, pj00030, 1c5344cd-fa89-44d8-9c98-7aa20de1d750, null, z, insert, null, null, null).
2023/11/30 04:13:27 - insert into imp_record.0 -
2023/11/30 04:13:27 - insert into imp_record.0 - at org.pentaho.di.core.database.Database.insertRow(Database.java:1335)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.pentaho.di.trans.steps.tableoutput.TableOutput.writeToTable(TableOutput.java:262)
2023/11/30 04:13:27 - insert into imp_record.0 - ... 3 more
2023/11/30 04:13:27 - insert into imp_record.0 - Caused by: org.postgresql.util.PSQLException: ERROR: null value in column "imp_collection_uuid" of relation "imp_record" violates not-null constraint
2023/11/30 04:13:27 - insert into imp_record.0 - Detail: Failing row contains (1, pj00030, 1c5344cd-fa89-44d8-9c98-7aa20de1d750, null, z, insert, null, null, null).
2023/11/30 04:13:27 - insert into imp_record.0 - at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2552)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2284)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:322)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:481)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:401)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:164)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:130)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.pentaho.di.core.database.Database.insertRow(Database.java:1302)
2023/11/30 04:13:27 - insert into imp_record.0 - ... 4 more
2023/11/30 04:13:27 - Rename to metadata_visibility.0 - Finished processing (I=0, O=0, R=2, W=2, U=0, E=0)
2023/11/30 04:13:27 - Add imp_record_id.0 - Finished processing (I=0, O=0, R=13, W=13, U=0, E=0)
2023/11/30 04:13:27 - Get variables.0 - Finished processing (I=0, O=0, R=13, W=13, U=0, E=0)
2023/11/30 04:13:27 - insert into imp_record.0 - Finished processing (I=0, O=0, R=1, W=0, U=0, E=1)
2023/11/30 04:13:27 - entity_migration - Transformation detected one or more steps with errors.
2023/11/30 04:13:27 - entity_migration - Transformation is killing the other steps!
2023/11/30 04:13:27 - orcid authentication configuration.0 - Finished processing (I=1, O=0, R=0, W=0, U=0, E=0)
2023/11/30 04:13:27 - Placeholder Var.0 - Finished processing (I=0, O=0, R=1, W=0, U=0, E=0)
2023/11/30 04:13:27 - entity_migration - ERROR (version 9.4.0.0-343, build 0.0 from 2022-11-08 07.50.27 by buildguy) : Errors detected!
2023/11/30 04:13:28 - orcid scopes configuration.0 - Finished processing (I=1, O=0, R=0, W=0, U=0, E=0)
2023/11/30 04:13:28 - Join rows 6.0 - Finished processing (I=0, O=0, R=27, W=0, U=0, E=0)
2023/11/30 04:13:28 - entities nested placeholder.0 - Finished reading query, closing connection
2023/11/30 04:13:30 - entity_migration - ERROR (version 9.4.0.0-343, build 0.0 from 2022-11-08 07.50.27 by buildguy) : Errors detected!
2023/11/30 04:13:30 - dspace_cris_migration - Finished job entry [funding migration] (result=[false])
2023/11/30 04:13:30 - dspace_cris_migration - Finished job entry [funding migration setup] (result=[false])
2023/11/30 04:13:30 - dspace_cris_migration - Finished job entry [set funding variables] (result=[false])
2023/11/30 04:13:30 - dspace_cris_migration - Finished job entry [publications migration] (result=[false])
2023/11/30 04:13:30 - dspace_cris_migration - Finished job entry [set database variables] (result=[false])
2023/11/30 04:13:30 - dspace_cris_migration - Job execution finished
2023/11/30 04:13:30 - Kitchen - Finished!
2023/11/30 04:13:30 - Kitchen - ERROR (version 9.4.0.0-343, build 0.0 from 2022-11-08 07.50.27 by buildguy) : Finished with errors
2023/11/30 04:13:30 - Kitchen - Start=2023/11/30 04:13:22.422, Stop=2023/11/30 04:13:30.044
2023/11/30 04:13:30 - Kitchen - Processing ended after 7 seconds.

Has this ever happened to you?
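
A Kitchen log like the one above is easier to read once it is reduced to the steps that failed and the distinct database messages. The following is only a minimal sketch, not part of the migration tooling: the function name summarize_kettle_log is made up for illustration, and the line layout ("timestamp - step - message") and the E= counter are taken from the log excerpt in this thread rather than from any Pentaho specification.

```python
import re

# One log entry, as seen in this thread: "<date> <time> - <step or job> - <message>"
ENTRY = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) - (.+?) - (.*)$")
# Per-step summary line; E is the error counter in the excerpt above.
FINISHED = re.compile(r"Finished processing \(.*\bE=(\d+)\)")


def summarize_kettle_log(log_text):
    """Return (failed_steps, database_errors) found in a Kitchen log."""
    failed_steps = []
    database_errors = []
    for line in log_text.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue  # blank entries and anything that is not a log entry
        _, step, message = match.groups()
        finished = FINISHED.search(message)
        if finished and int(finished.group(1)) > 0 and step not in failed_steps:
            failed_steps.append(step)
        # PostgreSQL messages contain "ERROR: " (with a colon), unlike the
        # Kettle banner "ERROR (version ...)", so the banner is ignored here.
        if "ERROR: " in message:
            text = message[message.index("ERROR: "):]
            if text not in database_errors:
                database_errors.append(text)
    return failed_steps, database_errors
```

Applied to the log above, this would point at the "insert into imp_record.0" step and the not-null violation on "imp_collection_uuid", which is the part worth quoting when asking for help.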

kskaiser commented Nov 30, 2023

Yes, indeed. That also happened to me ;)
You have to edit the migration_configuration.xls Excel file.
On the first tab ("collections"), you have to enter the UUIDs of the collections you generated in a previous step.
You must also edit the other tabs in the file, but for the first run you can try leaving them as they are.
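
Since an empty or mistyped collection UUID apparently ends up as the null "imp_collection_uuid" in the error above, it can help to sanity-check the values before pasting them into the "collections" tab. This is only a rough sketch under assumptions: the function name invalid_collection_uuids and the (label, value) pair layout are hypothetical and do not reflect the real layout of migration_configuration.xls.

```python
import uuid


def invalid_collection_uuids(rows):
    """Return the (label, value) pairs whose value is not a well-formed UUID.

    `rows` is a list of (label, value) pairs, e.g. entity type and collection
    UUID; this two-column shape is an assumption made for this sketch.
    """
    problems = []
    for label, value in rows:
        text = (value or "").strip()
        try:
            # Round-trip comparison: accept only the canonical hyphenated form,
            # so braces, missing hyphens, or stray characters are reported.
            ok = str(uuid.UUID(text)) == text.lower()
        except ValueError:
            ok = False
        if not ok:
            problems.append((label, value))
    return problems
```

An empty result only means the values look like UUIDs; it does not confirm that the collections exist in the target DSpace-CRIS 7 instance.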

Read the PDF documentation carefully.
Not everything is mentioned in the place you'd expect it to be :(
For the Pentaho part:
Run the "spoon.sh" (or "spoon.bat") script to open the GUI. There you can load the "dspace_cris_migration.kjb" file and edit the parameters when you run the job. You can then see at which step something goes wrong ;)

Be aware that between the "dspace_cris_migration.kjb" script and the "dspace_cris_migration_post_import.kjb" you have to run the "Import Script":
dspace dsrun org.dspace.app.batch.ItemImportMainOA -E <eperson-email>
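
The order of the three steps is easy to get wrong, so here is a small sketch that only builds the command lines in sequence and prints them, without running anything. The helper name migration_commands, the /opt/data-integration install path, and the example e-mail address are hypothetical; the -file: and -param: arguments are copied from the kitchen.sh call earlier in this thread, and reusing the same parameters for the post-import job is an assumption, so check the PDF documentation before relying on it.

```python
import shlex


def migration_commands(kettle_dir, migration_dir, params, eperson_email):
    """Build the three command lines in the order described above.

    Nothing is executed here; the caller decides how to run each command.
    """
    param_args = [f"-param:{name}={value}" for name, value in params.items()]
    return [
        # 1. Main migration job
        [f"{kettle_dir}/kitchen.sh",
         f"-file:{migration_dir}/dspace_cris_migration.kjb", *param_args],
        # 2. The "Import Script" that has to run between the two jobs
        ["dspace", "dsrun", "org.dspace.app.batch.ItemImportMainOA",
         "-E", eperson_email],
        # 3. Post-import job (assumption: it accepts the same -param values)
        [f"{kettle_dir}/kitchen.sh",
         f"-file:{migration_dir}/dspace_cris_migration_post_import.kjb",
         *param_args],
    ]


if __name__ == "__main__":
    commands = migration_commands(
        kettle_dir="/opt/data-integration",  # hypothetical install location
        migration_dir="/home/dspace/DSpace-dspace-cris-2023.01.01/dspace/etc/migration",
        params={
            "db_host_name": "localhost",
            "db_name": "dspace",
            "db_port_number": "5432",
            "db_username": "dspace",
            "db_password": "CHANGE_ME",  # placeholder, not a real value
        },
        eperson_email="admin@example.org",  # hypothetical address
    )
    for argv in commands:
        print(shlex.join(argv))
```

Printing the commands first makes it possible to review them, and to stop after the first job if Kitchen reports "Finished with errors".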

Good luck!

@alejandratenorio

Hi @kskaiser

Thank you so much. I have filled in the Excel file and run dspace_cris_migration.kjb, dsrun org.dspace.app.batch.ItemImportMainOA, and dspace_cris_migration_post_import.kjb; everything went well. However, when I ran dspace update-item-references, I got this message:

image

I think I should configure my relationships, shouldn't I?

Another question: the entity data was migrated to the new collections, but the relationships between the entities were not. Am I skipping a step?

image

Thank you so much.
