
Introduction

RDMS is an application based on the Hyrax 3.3 stack by Cottage Labs and AntLeaf. It is built with Docker containers.

Getting Started

Clone the repository with git clone https://gitlab.ruhr-uni-bochum.de/researchdata/rdms.git.

Ensure you have docker and docker-compose installed.

Open a console and try running docker -h and docker-compose -h to verify they are both accessible.
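The same check can be scripted. The following is a minimal sketch, assuming a POSIX shell; the helper name `require_cmd` is hypothetical and not part of the repository:

```shell
#!/bin/sh
# Hypothetical helper (not part of the RDMS repository): succeeds if the
# named command can be found on PATH, otherwise prints a message and fails.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing required command: $1" >&2
    return 1
}

# Example usage:
#   require_cmd docker && require_cmd docker-compose && echo "all tools found"
```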

Create the environment file .env. You can start by copying the template file .env.template.development to .env and customizing the values to your setup.
Note: For a production environment, use .env.template as your template.

Note: Currently it is necessary to provide a working S3-interfaced object-store for RDMS to function. Therefore, in your .env file you will need to check the S3 settings section and ensure that the properties in this section have valid values (including that USE_S3 is set to true).
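Before starting the containers, it may help to confirm that S3 is switched on in the environment file. The sketch below only checks the USE_S3 flag; it assumes plain KEY=value lines, and the function name `s3_enabled` is hypothetical. The other properties in the S3 settings section still need to be reviewed by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of the RDMS repository): succeeds only if the
# given env file contains an uncommented line that sets USE_S3 to true.
# Assumes plain KEY=value lines, optionally with double quotes around the value.
s3_enabled() {
    env_file="${1:-.env}"
    [ -f "$env_file" ] || return 1
    grep -Eq '^USE_S3="?true"?[[:space:]]*$' "$env_file"
}

# Example usage:
#   s3_enabled .env || echo "USE_S3 is not set to true in .env" >&2
```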

Quick start

If you would like to do a test run of the system, start the docker containers:

$ cd rdms
$ docker-compose -f docker-compose.yml -f docker-compose.development.override.yml up -d

You should see the containers being built and the services start.

Docker compose explained

There are 2 docker-compose files provided in the repository, which build the containers running the services as shown above

Containers running in docker

  • fcrepo is the container running the Fedora Commons 4 repository, an RDF document store.

    By default, this runs the fedora service on port 8080 internally in docker. http://fcrepo:8080/fcrepo/rest

  • Solr container runs Solr, an enterprise search server.

    By default, this runs the Solr service on port 8983 internally in docker. http://solr:8983

  • db containers running a postgres database for use by the Hyrax application (appdb) and Fedora (fcrepodb).

    By default, this runs the database service on port 5432 internally in docker.

  • redis container running redis, used by Hyrax to manage background tasks.

    By default, this runs the redis service on port 6379 internally in docker.

  • app container sets up the Hyrax application, which is then used by 2 services - web and workers.

  • Workers container runs the background tasks, using sidekiq and redis.

  • Hyrax processes long-running or particularly slow work in background jobs to speed up the web request/response cycle. When a user submits a file through a work (using the web or an import task), there are a number of background jobs that are run, initiated by the Hyrax actor stack, as explained here.

    You can monitor the background workers using the RDMS service at http://web:3000/sidekiq when logged in as an admin user.

Container volumes

The data for the application is stored in docker named volumes as specified by the compose files. These are:

$ docker volume list -f name=rdms
DRIVER    VOLUME NAME
local     rdms_app
local     rdms_cache
local     rdms_db-app
local     rdms_db-fcrepo
local     rdms_derivatives
local     rdms_fcrepo
local     rdms_file_uploads
local     rdms_redis
local     rdms_solr

These will persist when the system is brought down and rebuilt. Deleting them will require importers etc. to run again.

Persisting container volumes in a custom directory tree

You can set the DOCKER_VOLUMES_PATH_PREFIX variable in the .env file to a path where all of the above-mentioned volumes should be physically stored. Note that the value must always end in a forward slash /. This is a side effect of docker-compose.yml not placing an explicit / between the variable and the volume names, which allows the variable to be left empty or unset when the Docker default location (/var/lib/docker/volumes) should be used. If you set this variable in the .env file, you can run create_volume_directories.sh to create the directory tree, with a subdirectory for each volume, and set the required access permissions and file system ACLs. The main reason for this feature is to allow for easier and less error-prone backup creation and restoration.
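Because a missing trailing slash is easy to overlook, a small check can catch it before the containers are started. This is a sketch only; the function name `valid_volumes_prefix` is hypothetical and not part of the repository.

```shell
#!/bin/sh
# Hypothetical check (not part of the RDMS repository): the prefix is
# acceptable if it is empty (Docker default location) or ends in "/".
valid_volumes_prefix() {
    case "$1" in
        ""|*/) return 0 ;;
        *)     return 1 ;;
    esac
}

# Example usage, after sourcing .env:
#   valid_volumes_prefix "${DOCKER_VOLUMES_PATH_PREFIX:-}" \
#       || echo "DOCKER_VOLUMES_PATH_PREFIX must end in /" >&2
```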

If you want to migrate data from existing Docker volumes in /var/lib/docker/volumes to the ${DOCKER_VOLUMES_PATH_PREFIX} directory tree, you can do something like this:

# Shut down all Docker containers to ensure consistency
docker compose -f docker-compose.yml down

# Set the `DOCKER_VOLUMES_PATH_PREFIX` variable in the .env file to an absolute or relative path ending in "/", for example "./volumes/"
vim .env

# Source the modified .env file
source .env

# Create the volume directory tree
# Note: This script can also be executed if some or all of the volume directories already exist.
# In that case, it will *not* delete any data, but merely create missing directories and set their filesystem permissions and ACLs.
./create_volume_directories.sh

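# Copy the data from the existing/old volume directories
# (the trailing "/." copies the directory contents, including hidden files; a "*" inside quotes would not be expanded by the shell)
for volume in app cache db-app db-fcrepo derivatives fcrepo file_uploads redis solr
do
	cp -a "/var/lib/docker/volumes/rdms_${volume}/_data/." "${DOCKER_VOLUMES_PATH_PREFIX}${volume}/"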
done

Note that when creating backups, numerical user IDs should be used instead of user/group names (because the numerical IDs of users and groups inside containers will usually align neither with the host system nor with other containers), and filesystem ACLs need to be preserved. Therefore, if you use tar, for example, pass tar's --numeric-owner and --acls parameters both when creating and when extracting a backup tarball.
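A minimal sketch of such a backup and restore with GNU tar follows. The function names and the archive name are hypothetical; with DRY_RUN=1 the commands are only printed, which is a convenient way to review them first. The containers should be stopped beforehand, and the commands typically need to run as root so that ownership and ACLs can be read and restored.

```shell
#!/bin/sh
# Sketch only: prints the tar commands instead of running them when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Archive a volume directory tree, keeping numeric IDs and ACLs.
backup_volumes() {
    src_dir="$1"      # e.g. the DOCKER_VOLUMES_PATH_PREFIX directory
    archive="$2"      # e.g. rdms_volumes.tar.gz (hypothetical name)
    run tar --numeric-owner --acls -czf "$archive" -C "$src_dir" .
}

# Restore the archive into a target directory with the same options.
restore_volumes() {
    archive="$1"
    dest_dir="$2"
    run tar --numeric-owner --acls -xzf "$archive" -C "$dest_dir"
}

# Example usage:
#   DRY_RUN=1 backup_volumes ./volumes/ rdms_volumes.tar.gz
```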

Running RDMS

  • When running in a production environment,

    • Prepare your .env file using .env.template as the template.

    • You need to use docker-compose.yml to build and run the containers.

      You could set up an alias for docker-compose on your local machine, to ease typing

      alias hd='docker-compose -f docker-compose.yml'
  • When running in a development or test environment,

    • Prepare your .env file using .env.template.development as the template.

    • You need to use docker-compose.yml and docker-compose.development.override.yml to build and run your containers.

      You could set up an alias for docker-compose on your local machine, to ease typing

      alias hd='docker-compose -f docker-compose.yml -f docker-compose.development.override.yml'
  • Prepare the file hyrax/seed/setup.json if you would like to create a set of users in the RDMS, as a part of start-up.

Build the docker container

Before running the services, you need to build the system. To do this, issue the build command:

$ hd build

Start and run the services in the containers

To run the containers after build, issue the up command (-d means run as daemon, in the background):

$ hd up -d

The containers should all start and the services should be available at their endpoints as described above.

Docker container status and logs

You can see the state of the containers with hd ps, and view logs e.g. for the web container using hd logs web

The services whose logs you will mainly need to monitor are web and workers.

Running services

Using RDMS

To use the RDMS application on http://localhost:3000, you would need to do the following

  1. Add passwords for the system users, assign the role admin to a user who has signed in through Shibboleth, or register a user with the role admin (see wiki), so they can log in.
  2. Set up the RUB publication workflow, to submit a dataset.
  3. Set up the CRC 1280 publication workflow, to submit an experiment.

Stop the services

You can stop the containers using hd stop.

This will just stop all of the running containers created by hd up

Any background jobs in the workers container that have not completed will fail, and will be retried when the container is restarted.

To shut down the service gracefully, make sure of the following before stopping:

  • There are no background jobs running.
    • If there are any running jobs and you don't want to wait, you can kill the job. The job will move to the dead tab, from where you can retry later, after restarting the service.
  • There is no Create, Update and Delete activity happening in RDMS

To deploy an update of the code

Deploying an update of the code follows steps similar to those described above:

  • Check out the latest code from the Git repository

  • Stop the containers. To deploy an update of the code, you likely want to use

    hd down

    This will stop containers and remove

    • Containers for services defined in the Compose file
    • Networks defined in the networks section of the Compose file
    • The default network, if one is used

    Networks and volumes defined as external are not removed. Named volumes are not removed.

  • Build the system

    hd build
  • Start the containers

    hd up -d
  • Check all the containers have started and the status of the web service with the logs

    hd ps
    hd logs -f web
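The update steps above can be collected into a small script. This is a sketch under a few assumptions: it uses the production compose file only, it assumes `git pull` fetches the code you want to deploy, and the function names are hypothetical. Bash does not expand aliases in non-interactive scripts by default, so the compose wrapper is written as a function (named `compose` rather than `hd` to avoid clashing with the alias). With DRY_RUN=1 the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Function equivalent of the `hd` alias for the production setup.
compose() {
    run docker-compose -f docker-compose.yml "$@"
}

# Runs the update steps in order and stops at the first failure.
deploy_update() {
    run git pull &&      # assumes the checkout tracks the branch to deploy
    compose down &&
    compose build &&
    compose up -d &&
    compose ps
}

# Example usage:
#   DRY_RUN=1 deploy_update   # review the commands first
#   deploy_update             # then run them
```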

Setup of RDMS at startup

The RDMS web container is the main entry point of the rdms application, with which users interact.

At startup, the web container runs docker-entrypoint.sh. This script does the following tasks

1. Initial checks

  • Creates the log folder if it doesn't exist
  • Checks the bundle (and installs it in development)
  • Does the database migration and setup
  • Checks Solr and Fedora are running (waits 15 seconds if needed)
  • Creates the S3 bucket
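To illustrate the idea of waiting for dependent services, here is a generic retry loop. It is a sketch only and not the code in docker-entrypoint.sh; the function name `wait_until`, the number of attempts, and the use of curl as the check are all assumptions.

```shell
#!/bin/sh
# Sketch only, not the actual docker-entrypoint.sh. Retries a check command
# until it succeeds, sleeping between attempts; fails after max_tries attempts.
wait_until() {
    max_tries="$1"
    delay="$2"
    shift 2
    tries=0
    until "$@"; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            return 1
        fi
        sleep "$delay"
    done
    return 0
}

# Example usage (assumes curl is available inside the container and that the
# service answers unauthenticated requests with a success status):
#   wait_until 5 15 curl -sf -o /dev/null http://solr:8983
#   wait_until 5 15 curl -sf -o /dev/null http://fcrepo:8080/fcrepo/rest
```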

2. System users

RDMS creates system users if they don't exist

  • System administrator - the email id of the user is defined in the .env file as SYSTEM_ADMINISTRATOR

  • Publication manager - the email id of the user is defined in the .env file as SYSTEM_PUBLICATION_MANAGER

    Setting a password for the system users

    The system users are created with a random password. If you need to log in as these users, you need to change the password from the rails console (once the web container is up and running):

    docker exec -it rdms-web-1 /bin/bash
    rails c
    u = User.find_by(email: ENV['SYSTEM_ADMINISTRATOR'])
    u.password = '<some password>'
    u.save

3. Loads workflows, creates default admin sets and collection types

  • Loads the default workflows
  • Creates the default collection types and admin sets (Hyrax administrative task)
  • Sets up the participants, visibility and workflow for each admin set
  • Sets up the CRC 1280 collection

Creates users defined in the file hyrax/seed/setup.json, if they haven't already been created.

4. Create users during start-up from setup.json

RDMS uses the file hyrax/seed/setup.json to create a set of users during first startup, if the file exists.

If you would like to create users during startup,

  • Copy the file in hyrax/seed/setup.json.template to hyrax/seed/setup.json

  • Modify hyrax/seed/setup.json so it has the list of users to create / update.

    For more information on the rake task to create users and the json file, see this wiki page.

Note: The file hyrax/seed/setup.json needs to exist before running docker build, for users to be created at start-up.
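The copy step can be made safe to repeat, so that an already customized setup.json is never overwritten. A minimal sketch, with the hypothetical helper name `prepare_setup_json`, to be run from the repository root before building:

```shell
#!/bin/sh
# Hypothetical helper (not part of the RDMS repository): copies
# setup.json.template to setup.json in the given seed directory, without
# overwriting an existing setup.json.
prepare_setup_json() {
    seed_dir="${1:-hyrax/seed}"
    if [ -f "$seed_dir/setup.json" ]; then
        echo "setup.json already exists, leaving it untouched"
        return 0
    fi
    cp "$seed_dir/setup.json.template" "$seed_dir/setup.json"
}

# Example usage:
#   prepare_setup_json hyrax/seed
```

After copying, the file still needs to be edited so that it lists the users to create.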

5. Starts the rails server

Some example docker commands and usage:

Docker cheat sheet

# Bring the whole application up to run in the background, building the containers
hd up -d --build

# Stop the containers
hd stop

# Halt the system
hd down

# Re-create the web container without affecting the rest of the system (and run in the background with -d)
hd up -d --build --no-deps --force-recreate web

# View the logs for the web application container
hd logs web

# Create a log dump file
hd logs web | tee "web_logs_$(date --iso-8601)"
# (writes to e.g. web_logs_2022-02-14)

# View all running containers
hd ps

# Using its container name, you can run a shell in a container to view or make changes directly
docker exec -it rdms-web-1 /bin/bash

Backups

There is docker documentation advising how to back up volumes and their data. Docker suggests mounting the volumes in a container, creating a tar of the contents of the volume in a backup location and restoring them.

It is also possible to stop the containers, copy all of the named volumes in /var/lib/docker/volumes and start the containers. To use the backup, copy the volumes back into /var/lib/docker/volumes.
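As an illustration of the container-based approach, the sketch below archives a single named volume by mounting it read-only in a throwaway container. The function name, the ubuntu image, and the archive naming are assumptions, not something the repository prescribes; with DRY_RUN=1 the command is only printed.

```shell
#!/bin/sh
# Sketch only: prints the docker command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Archive one named volume into backup_dir/<volume>.tar by mounting it
# read-only in a temporary container. The ubuntu image is an assumption.
backup_named_volume() {
    volume="$1"        # e.g. rdms_solr
    backup_dir="$2"    # must be an absolute path for the bind mount
    run docker run --rm \
        -v "$volume:/data:ro" \
        -v "$backup_dir:/backup" \
        ubuntu tar --numeric-owner -cf "/backup/$volume.tar" -C /data .
}

# Example usage:
#   DRY_RUN=1 backup_named_volume rdms_solr /srv/backups
```

Stopping the containers before taking the backup avoids archiving files while they are being written.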