Web application providing a GUI for editing and loading data into the Cambridge Digital Collection Platform

cambridge-collection/dl-loading-ui

Cambridge Digital Library: Content Loader

This is a web application providing a GUI for editing and loading data into the Cambridge Digital Library Platform. It edits data in the cudl-data-source S3 buckets (prefixed with dev-, staging- and production-). Editing this data triggers processing; see https://github.com/cambridge-collection/data-lambda-transform for details on how the data is processed.

Want to just start the content loader on a folder of data?

If you are new to the content loader, start with the sample data set rather than data in an S3 bucket. Run the following commands to bring in the sample data:

git submodule init
git submodule update

You should then see some sample data in the directory docker/dl-data-samples/source-data.

Then you can build the client and run it using the commands:

mvn clean package

This should produce a target directory containing a WAR file.

Then you can run the command:

docker-compose --env-file example.env --file docker-compose-sample-data.yml up

The content loader should start on http://localhost:8081. Log in using the following credentials:

username: test-all
password: password

This is set in the file docker/dl-loading-db/resources/example_user_data.sql and can be changed through the user management section.
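
The quick-start steps above can be collected into one script. This is a minimal sketch rather than something shipped with the repository: `run`, `quick_start` and the `DRY_RUN` variable are hypothetical names introduced here so the commands can be previewed before they are executed, and the `git submodule update` step assumes the sample data lives in a submodule that still needs fetching.

```shell
# Sketch: the sample-data quick start as one function.
# run, quick_start and DRY_RUN are illustrative names, not part of the repository.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Fetch the sample data, build the WAR file, then start the containers.
# Each step only runs if the previous one succeeded.
quick_start() {
  run git submodule init &&
  run git submodule update &&
  run mvn clean package &&
  run docker-compose --env-file example.env --file docker-compose-sample-data.yml up
}

# Usage (from the repository root):
#   DRY_RUN=1 quick_start   # preview the commands
#   quick_start             # run them
```

Running with `DRY_RUN=1` first is a cheap way to confirm the env file and compose file names before waiting for a full build.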

Ready for getting the edited data into the viewer?

The content loader works on the "source data" (see docker/dl-data-samples/source-data for an example), which needs to be transformed into the processed data (see docker/dl-data-samples/processed-data for an example) before the viewer can use it. To do this we have a Terraform script that creates a pipeline in AWS to transform the data in real time: https://github.com/cambridge-collection/cudl-terraform. If you prefer to do this transformation yourself, the XSLT that transforms the TEI into the JSON format we use for the viewer is here: https://github.com/cambridge-collection/cudl-data-processing-xslt.

Cambridge developer?

Use: docker-compose --env-file <your env file> --file docker-compose-cudl-data.yml up

Running with S3 buckets

For this you will need an env file that points to a valid S3 bucket you have access to, together with an access key that has permission to write to that bucket. The bucket should contain data in the same format as shown in docker/dl-data-samples/source-data.

  1. Build the project:

     $ mvn clean package
    
  2. Copy the example.env file and customise it by adding your specific S3 bucket details and access keys.

  3. Run using the command:

    docker-compose --env-file <your_.env_file> up
    

     Alternatively, you can enable hot reloading (for HTML) using the command:

    docker-compose --file docker-compose-hot.yml --env-file <your_.env_file> up

  4. The application will start an HTTP server listening on http://localhost:8081 (or an alternative port if the configuration was changed).

NOTE: The servers currently take a very long time to start when SSL is enabled. This needs investigating. Be prepared for the delay when deploying to staging/dev.
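
Because startup can be slow, it may help to poll the server instead of refreshing the browser. The following is a minimal sketch, assuming `curl` is installed; `wait_for_url` is a hypothetical helper and the timeout values are arbitrary defaults chosen for illustration.

```shell
# Sketch: poll a URL until the server answers or a timeout passes.
# wait_for_url is a hypothetical helper, not part of the repository.
wait_for_url() {
  url="$1"
  timeout="${2:-600}"   # seconds to wait in total
  interval="${3:-5}"    # seconds between attempts
  elapsed=0
  # curl exits 0 as soon as it gets any HTTP response (including a login
  # redirect or a 401), and non-zero while the connection is refused.
  until curl -s -o /dev/null "$url"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "gave up waiting for $url after ${elapsed}s" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "$url is responding"
}

# Usage:
#   wait_for_url http://localhost:8081 900 10
```

The check only confirms that something is answering on the port; it does not verify that login works.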

ALSO: If you're deploying for CUDL, the env files for staging and dev (there is no production deployment) are in the KeePass file.

Developing

The Maven build is the definitive way to compile/test the project. Your IDE should be able to import the project from the pom.xml.

In addition, compile-time annotation processing is used by the Immutables library. This works automatically with Maven, but your IDE may need some configuration to enable the annotation processing feature of the Java compiler.

  • Instructions for enabling compile-time annotation processing in IDEs are available here: https://immutables.github.io/apt.html
  • The annotation processor is expected to generate code at:
    • target/generated-sources/annotations
    • target/generated-test-sources/test-annotations
  • These locations are added to the build path via the build-helper-maven-plugin and should be picked up by IDEs automatically
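
If the IDE reports missing generated classes, one quick check is whether the Maven build produced the directories listed above. This is a small sketch: `check_generated_sources` is a hypothetical helper, and a directory may legitimately be absent if no code in that source set uses the annotation processor.

```shell
# Sketch: report whether the annotation processor output directories exist.
# check_generated_sources is a hypothetical helper; the paths are the two
# locations listed above.
check_generated_sources() {
  project_dir="${1:-.}"
  status=0
  for dir in target/generated-sources/annotations \
             target/generated-test-sources/test-annotations; do
    if [ -d "$project_dir/$dir" ]; then
      echo "found:   $dir"
    else
      echo "missing: $dir"
      status=1
    fi
  done
  return "$status"
}

# Usage (after mvn clean package, from the repository root):
#   check_generated_sources .
```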

Configuration

Configuration is set through a .env file, which defines a number of environment variables that are passed to the application when it is built and run.

Copy the example.env file and make your own adjustments to the variables.

This file is passed to docker-compose when the application is run.
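
After copying example.env it is easy to leave a variable blank by accident. The sketch below lists any variables that have no value; `empty_env_keys` is a hypothetical helper and `my.env` is just an example file name. An empty value is not necessarily an error, so treat the output as a prompt to double-check.

```shell
# Sketch: list variables in an env file that have been left empty.
# empty_env_keys is a hypothetical helper, not part of the repository.
empty_env_keys() {
  # Match KEY= lines with nothing (or only whitespace) after the equals sign;
  # comments and blank lines never match. Print just the variable name.
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=[[:space:]]*$' "$1" | cut -d= -f1
}

# Usage:
#   cp example.env my.env
#   # ...edit my.env...
#   empty_env_keys my.env
```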

Note

The application can be configured using the methods described in the Externalized Configuration section of the Spring Boot docs.

Authentication / Authorisation

The auth.methods configuration property controls which authentication method(s) are enabled. It's a comma-separated list of method names. Available methods are:

  • basic — HTTP basic authentication; suitable for development/testing only
  • saml — SAML 2.0 authentication

SAML 2.0 Authentication Guide

Setting up your IdP

This application can use a SAML IdP to provide user authentication. I have used the standalone version of Keycloak from https://www.keycloak.org/ (tested with Keycloak 9.0).

Getting started guide for Keycloak: https://www.keycloak.org/docs/latest/getting_started/

After installing Keycloak you need to create a new Realm.

Keycloak defaults to the 'Master' realm. Click on the word Master and it will offer the option to 'Add Realm'. Add the following:

Name: demo
Import file: select the config file 'keycloak-realm-export.json' in the root of this project.

This will set up the Client 'com:demo:spring:sp' and the Role 'DEMO_USER', and configure the connection to use the keystore in src/main/resources/saml/samlKeystore.jks.

For more info, here's a nice guide on setting up a Spring Boot app with SAML and Keycloak: https://blog.codecentric.de/en/2019/03/secure-spring-boot-app-saml-keycloak/

Create a test user

You then need to go into Keycloak and add a user to test with. To do this select Manage -> Users -> Add User. You should then ensure the user has the DEMO_USER role in the 'Role Mappings' tab.

You then need to set the user's password so you can log in. You do this by going back to the Users screen and selecting 'Impersonate'. You can then select the password tab and supply a password for your user.

Adding a Client Scope

You should now see the Client com:demo:spring:sp in your list of clients and DEMO_USER in your list of roles. Now select Client scopes -> Create, to create a new scope for our client to use.

Settings:
    Name: saml_profile
    Protocol: saml

Mappers:
    Add Builtin:
        X500 givenName
        X500 surname
        X500 email

Scope:
    Assigned Roles: DEMO_USER

Now you can add this to your client by selecting com:demo:spring:sp from the list of clients and going to the Client Scopes tab. Make sure saml_profile is in 'Assigned Default Client Scopes'.

If you go to http://your-keycloak-host/auth/realms/demo/protocol/saml/descriptor you should see the XML descriptor for the IdP that the loading-ui will connect to.

You can now update the application.properties to point to this installation e.g. auth.saml.keycloak.auth-server-url=http://your-keycloak-host/auth/realms/demo
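
The realm URL and the descriptor URL above share the same prefix, so they can be derived from the Keycloak host and realm name. The sketch below does that; `keycloak_realm_url` and `keycloak_descriptor_url` are hypothetical helpers, and the `/auth/realms/...` layout is simply the one shown in the URLs above (other Keycloak versions may differ).

```shell
# Sketch: build the realm and SAML descriptor URLs from a base URL and realm.
# Both functions are hypothetical helpers, not part of the repository.
keycloak_realm_url() {
  # ${1%/} drops one trailing slash so the pieces join cleanly.
  printf '%s/auth/realms/%s\n' "${1%/}" "$2"
}

keycloak_descriptor_url() {
  printf '%s/protocol/saml/descriptor\n' "$(keycloak_realm_url "$1" "$2")"
}

# Usage:
#   # Value for auth.saml.keycloak.auth-server-url:
#   keycloak_realm_url http://your-keycloak-host demo
#   # Check that the IdP descriptor is being served:
#   curl -sf "$(keycloak_descriptor_url http://your-keycloak-host demo)" | head -n 5
```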

You can now start the loading ui and log in using the user account you have created.

Database

A database server is required to run the application. Postgres 12.3 is recommended. Docker Compose can be used to create one; see the Docker Compose section.

The database schema is managed by Flyway. Flyway applies any required migrations on application startup, so the app's user needs permission to create/modify the database structure. Alternatively, the Flyway CLI can be used stand-alone to apply the migration files.
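
For the stand-alone route, a Flyway CLI invocation might look like the sketch below. Everything specific in it is an assumption: the `DB_*` and `MIGRATIONS_DIR` variable names are invented for this example, and the default migrations directory is only the conventional Spring Boot location, so check where the migration files actually live in this project. The function prints the command rather than running it so it can be reviewed first.

```shell
# Sketch: print a Flyway CLI command for applying the migrations by hand.
# Assumptions (not from the repository): the DB_* variable names, the
# database name, and the migrations directory.
flyway_command() {
  printf 'flyway -url=jdbc:postgresql://%s:%s/%s -user=%s -locations=filesystem:%s migrate\n' \
    "${DB_HOST:-localhost}" "${DB_PORT:-5432}" "${DB_NAME:?set DB_NAME}" \
    "${DB_USER:?set DB_USER}" "${MIGRATIONS_DIR:-src/main/resources/db/migration}"
}

# Usage:
#   DB_NAME=mydb DB_USER=myuser flyway_command
#   # Supply the password via the FLYWAY_PASSWORD environment variable
#   # rather than on the command line.
```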

Example Data

The database can be populated with sample users and workspaces by executing example_user_data.sql.

Docker Compose

The repository contains a Docker Compose file which will run a suitable database, plus a web-based DB admin UI to manage it.

Run $ docker-compose --env-file example.env up --build in the repository to start the services, using the values set in example.env.

Deployment

First build the project if you have not already:

$ mvn clean package

Enable the library VPN

To deploy to a remote host, set the DOCKER_HOST variable:

export DOCKER_HOST="ssh://digilib@dev.loader.cudl.link"

Then run the docker commands as you would for the local version. You will need to be on the library VPN to connect. Add the -d flag to the up command to run the containers in the background (detached mode).

e.g.

docker-compose --env-file cudl-dev.env down
docker image rm camdl/dl-loading-ui
docker-compose --env-file cudl-dev.env up --build -d

NOTE: This can be very slow at the moment to start up.

Note that to deploy remotely you will first need to copy your key to the server manually, e.g. for dev:

ssh-copy-id -i ~/.ssh/mykey digilib@dev.loader.cudl.link

Passwords for the digilib account can be found in KeePass.

You can then unset the DOCKER_HOST variable to work locally again:

unset DOCKER_HOST
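
One way to avoid forgetting the unset step is to set DOCKER_HOST only inside a subshell. The sketch below wraps the three deployment commands shown above in that way; `remote_deploy`, `run` and `DRY_RUN` are hypothetical names, and the env file and host are passed in as arguments rather than hard-coded.

```shell
# Sketch: run the remote deployment with DOCKER_HOST scoped to a subshell.
# remote_deploy, run and DRY_RUN are illustrative names, not part of the
# repository.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

remote_deploy() {
  env_file="$1"
  host="$2"
  (
    # The parentheses start a subshell, so this export never reaches
    # the calling shell and there is nothing to unset afterwards.
    export DOCKER_HOST="ssh://$host"
    run docker-compose --env-file "$env_file" down
    run docker image rm camdl/dl-loading-ui
    run docker-compose --env-file "$env_file" up --build -d
  )
}

# Usage (on the library VPN):
#   DRY_RUN=1 remote_deploy cudl-dev.env digilib@dev.loader.cudl.link
#   remote_deploy cudl-dev.env digilib@dev.loader.cudl.link
```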

Restart running container on dev

export DOCKER_HOST="ssh://digilib@dev.loader.cudl.link"

Get container id using:

docker container ls

Then restart using:

docker container restart [CONTAINERID]

Alternatively, after setting DOCKER_HOST you can restart every running container in one command:

docker container restart $(docker container ls -q)

Publish Docker Images to the AWS ECR Repository

source <env file>
cd docker/dl-loading-db

Follow the push commands at https://eu-west-1.console.aws.amazon.com/ecr/repositories/private/563181399728/dl-loader-db?region=eu-west-1, substituting the build command with:

docker image build --build-arg LOADING_DB_PASSWORD=$LOADING_DB_PASSWORD --build-arg LOADING_DB_USER_SETUP_SQL=$LOADING_DB_USER_SETUP_SQL -t dl-loader-db .

cd ../..

Follow the push commands at https://eu-west-1.console.aws.amazon.com/ecr/repositories/private/563181399728/dl-loader-ui?region=eu-west-1, substituting the build command with:

docker image build --build-arg LOADING_UI_HARDCODED_USERS_FILE=$LOADING_UI_HARDCODED_USERS_FILE -t dl-loader-ui .
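
For orientation, the push commands shown in the ECR console usually follow the shape sketched below: log in, tag, push. The console page remains the authoritative source. The account ID, region and repository names here are read off the console URLs above, `ecr_registry` and `ecr_push_commands` are hypothetical helpers, and the `latest` tag is an assumption. The function prints the commands instead of running them.

```shell
# Sketch: print the usual login/tag/push sequence for an ECR repository.
# ecr_registry and ecr_push_commands are hypothetical helpers; the latest
# tag is an assumption.
ecr_registry() {
  # Registry hostname format: <account>.dkr.ecr.<region>.amazonaws.com
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

ecr_push_commands() {
  account="$1"
  region="$2"
  repo="$3"
  registry="$(ecr_registry "$account" "$region")"
  cat <<EOF
aws ecr get-login-password --region $region | docker login --username AWS --password-stdin $registry
docker tag $repo:latest $registry/$repo:latest
docker push $registry/$repo:latest
EOF
}

# Usage (after running the docker image build command for the repository):
#   ecr_push_commands 563181399728 eu-west-1 dl-loader-db
#   ecr_push_commands 563181399728 eu-west-1 dl-loader-ui
```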
