Introduction
RDMS is an application based on the Hyrax 3.3 stack, developed by Cottage Labs and AntLeaf. It is built with Docker containers.
Getting Started
Clone the repository with git clone https://gitlab.ruhr-uni-bochum.de/researchdata/rdms.git.
Ensure you have docker and docker-compose installed. Open a console and run docker -h and docker-compose -h to verify they are both accessible.
Create the environment file .env. You can start by copying the template file .env.template.development to .env and customizing the values to your setup.
Note: For a production environment, use .env.template as your template.
Note: Currently it is necessary to provide a working S3-interfaced object store for RDMS to function. Therefore, in your .env file you will need to check the S3 settings section and ensure that the properties in this section have valid values (including that USE_S3 is set to true).
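For illustration, the S3 section of the .env file looks roughly like the following. This is a sketch only: apart from USE_S3, the variable names here are illustrative, so check .env.template / .env.template.development for the authoritative list.

```shell
# Hypothetical S3 settings - names other than USE_S3 are illustrative,
# check your .env template for the real variable names.
USE_S3=true
S3_ENDPOINT=https://s3.example.org   # URL of your S3-compatible object store
S3_ACCESS_KEY=changeme
S3_SECRET_KEY=changeme
S3_BUCKET=rdms
```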
Quick start
If you would like to do a test run of the system, start the docker containers
$ cd rdms
$ docker-compose -f docker-compose.yml -f docker-compose.development.override.yml up -d
You should see the containers being built and the services start.
Docker compose explained
There are two docker-compose files provided in the repository, which build the containers running the services as shown above:
- docker-compose.yml is the main docker-compose file. It builds all the core services required to run the application.
- docker-compose.development.override.yml is used along with the main docker-compose.yml file in development, mainly to expose ports for the various services and mount the local development environment for rdms.
Containers running in docker
- fcrepo container runs the Fedora 4 Commons repository, an RDF document store. By default, this runs the Fedora service on port 8080 internally in docker. http://fcrepo:8080/fcrepo/rest
- solr container runs Solr, an enterprise search server. By default, this runs the Solr service on port 8983 internally in docker. http://solr:8983
- db containers run a postgres database for use by the Hyrax application (appdb) and Fedora (fcrepodb). By default, these run the database service on port 5432 internally in docker.
- redis container runs redis, used by Hyrax to manage background tasks. By default, this runs the redis service on port 6379 internally in docker.
- app container sets up the Hyrax application, which is then used by 2 services - web and workers.
  - web container runs the application. By default, this runs on port 3000 internally in docker. http://web:3000
  - workers container runs the background tasks, using sidekiq and redis. Hyrax processes long-running or particularly slow work in background jobs to speed up the web request/response cycle. When a user submits a file through a work (using the web or an import task), a number of background jobs are run, initiated by the Hyrax actor stack, as explained here.
You can monitor the background workers using the RDMS service at http://web:3000/sidekiq when logged in as an admin user.
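As a quick reference, the internal endpoints listed above can be collected in a small shell helper. This is a sketch, not part of the repository; the hostnames are the docker-compose service names and are reachable only from inside the docker network.

```shell
# Map a compose service name to its internal endpoint (illustrative helper).
service_url() {
  case "$1" in
    fcrepo) echo "http://fcrepo:8080/fcrepo/rest" ;;
    solr)   echo "http://solr:8983" ;;
    web)    echo "http://web:3000" ;;
    redis)  echo "redis://redis:6379" ;;
    db)     echo "postgresql://db:5432" ;;
    *)      echo "unknown service: $1" >&2; return 1 ;;
  esac
}

service_url web    # prints http://web:3000
```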
Container volumes
The data for the application is stored in docker named volumes as specified by the compose files. These are:
$ docker volume list -f name=rdms
DRIVER VOLUME NAME
local rdms_app
local rdms_cache
local rdms_db-app
local rdms_db-fcrepo
local rdms_derivatives
local rdms_fcrepo
local rdms_file_uploads
local rdms_redis
local rdms_solr
These will persist when the system is brought down and rebuilt. Deleting them will require importers etc. to run again.
Persisting container volumes in a custom directory tree
You can set the DOCKER_VOLUMES_PATH_PREFIX variable in the .env file to a path where all the above-mentioned volumes should be physically stored, but note that this variable should always end in a forward slash / (this is a side effect of not explicitly specifying a / between the variable name and the volume names in docker-compose.yml, because this allows leaving the variable empty/unset if the Docker default should be used, storing the volumes in /var/lib/docker/volumes). If you set this variable to a value in the .env file, you can run create_volume_directories.sh to create the directory tree with a subdirectory for each volume and set its required access permissions / file system ACLs. The main reason for this feature is to allow for easier and less error-prone backup creation and restoration.
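Because the trailing slash is easy to forget, a small check like the following can be run before bringing the system up. This is a sketch; check_prefix is a hypothetical helper, not part of the repository.

```shell
# Verify that DOCKER_VOLUMES_PATH_PREFIX is either empty/unset or ends in "/"
check_prefix() {
  case "$1" in
    "" | */) echo "ok" ;;
    *)       echo "missing trailing slash" ;;
  esac
}

check_prefix "./volumes/"        # prints "ok"
check_prefix "/srv/rdms-volumes" # prints "missing trailing slash"
```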
If you want to migrate data from existing Docker volumes in /var/lib/docker/volumes to the ${DOCKER_VOLUMES_PATH_PREFIX} directory tree, you can do something like this:
# Shut down all Docker containers to ensure consistency
docker compose -f docker-compose.yml down
# Set the `DOCKER_VOLUMES_PATH_PREFIX` variable in the .env file to an absolute or relative path ending in "/", for example "./volumes/"
vim .env
# Source the modified .env file
source .env
# Create the volume directory tree
# Note: This script can also be executed if some or all of the volume directories already exist.
# In that case, it will *not* delete any data, but merely create missing directories and set their filesystem permissions and ACLs)
./create_volume_directories.sh
# Copy the data from the existing/old volume directories
for volume in app cache db-app db-fcrepo derivatives fcrepo file_uploads redis solr
do
  cp -a "/var/lib/docker/volumes/rdms_${volume}/_data/." "${DOCKER_VOLUMES_PATH_PREFIX}${volume}/"
done
Note that when creating backups, numerical user IDs should be used instead of user/group names (because the numerical IDs of users and groups inside containers will usually align neither with the host system nor with other containers), and filesystem ACLs need to be preserved. Therefore, for example, if you use tar, use tar's --numeric-owner and --acls parameters both when creating and extracting a backup tarball.
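The following self-contained sketch demonstrates the round trip on scratch directories; for a real backup, point SRC at a volume directory under ${DOCKER_VOLUMES_PATH_PREFIX} and add --acls on both the create and extract sides if your tar build supports it.

```shell
# Demonstration of a numeric-owner backup/restore round trip (scratch dirs).
SRC=$(mktemp -d)
DST=$(mktemp -d)
echo "sample" > "$SRC/file.txt"

# Create: --numeric-owner stores raw UIDs/GIDs instead of user/group names
tar --numeric-owner -czf /tmp/volume-backup.tar.gz -C "$SRC" .

# Restore: the same flag is needed on extraction
tar --numeric-owner -xzf /tmp/volume-backup.tar.gz -C "$DST"

cat "$DST/file.txt"   # prints "sample"
```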
Running RDMS
- When running in a production environment,
  - Prepare your .env file using .env.template as the template.
  - You need to use docker-compose.yml to build and run the containers. You could set up an alias for docker-compose on your local machine, to ease typing:
    alias hd='docker-compose -f docker-compose.yml'
- When running in a development or test environment,
  - Prepare your .env file using .env.template.development as the template.
  - You need to use docker-compose.yml and docker-compose.development.override.yml to build and run your containers. You could set up an alias for docker-compose on your local machine, to ease typing:
    alias hd='docker-compose -f docker-compose.yml -f docker-compose.development.override.yml'
- Prepare the file hyrax/seed/setup.json if you would like to create a set of users in RDMS as a part of start-up.
Build the docker containers
To start with, you need to build the system before running the services. To do this, issue the build command:
$ hd build
Start and run the services in the containers
To run the containers after build, issue the up command (-d means run as daemon, in the background):
$ hd up -d
The containers should all start and the services should be available at their endpoints as described above:
- web server at http://localhost:3000 in development and https://domain-name in production
Docker container status and logs
You can see the state of the containers with hd ps, and view logs, e.g. for the web container, using hd logs web.
The services you would mainly need to monitor the logs for are web and workers.
Running services
- fcrepo container will run the Fedora service, which will be available on port 8080 at http://localhost:8080/fcrepo/rest
- solr container will run the Solr service, which will be available on port 8983 at http://localhost:8983/solr
- The web container runs the RDMS service, which will be available on port 3000 at http://localhost:3000
Using RDMS
To use the RDMS application at http://localhost:3000, you need to do the following:
- Add passwords for the system users, or assign the role admin to a user who has signed in through Shibboleth, or register a user with role admin (see wiki), so they can log in.
- Set up the RUB publication workflow, to submit a dataset.
- Set up the CRC 1280 publication workflow, to submit an experiment.
Stop the services
You can stop the containers using hd stop.
This will just stop all of the running containers created by hd up.
Any background jobs running in the workers container that have not completed will fail, and will be retried when the container is restarted.
To gracefully shut down the service, before stopping, you should make sure:
- There are no background jobs running. If there are any running jobs and you don't want to wait, you can kill the job. The job will move to the dead tab, from where you can retry it later, after restarting the service.
- There is no create, update or delete activity happening in RDMS.
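Besides the sidekiq dashboard mentioned above, one way to check from the command line whether any jobs are still executing is a sketch like this (it assumes the container name rdms-web-1 from the examples in this document, and that the Sidekiq API is available in the Rails environment):

```shell
# Prints the number of currently executing sidekiq jobs; 0 means it
# should be safe to stop the containers.
docker exec -it rdms-web-1 bundle exec rails runner 'puts Sidekiq::Workers.new.size'
```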
To deploy an update of the code
Similar to the steps described above, to deploy an update of the code:
- Check out the latest code from GitLab.
- Stop the containers. To deploy an update of the code, you likely want to use
  hd down
  This will stop the containers and remove:
  - Containers for services defined in the Compose file
  - Networks defined in the networks section of the Compose file
  - The default network, if one is used
  Networks and volumes defined as external are not removed. Named volumes are not removed.
- Build the system
  hd build
- Start the containers
  hd up -d
- Check that all the containers have started, and check the status of the web service with the logs
  hd ps
  hd logs -f web
Setup of RDMS at startup
The RDMS web container is the main entry point of the RDMS application, with which users interact.
At startup, the web container runs docker-entrypoint.sh. This script does the following tasks:
1. Initial checks
- Creates the log folder if it doesn't exist
- Checks the bundle (and installs it in development)
- Does the database migration and setup
- Checks that Solr and Fedora are running (waits 15 seconds if needed)
- Creates the S3 bucket
2. System users
RDMS creates system users if they don't exist:
- System administrator - the email id of the user is defined in the .env file as SYSTEM_ADMINISTRATOR
- Publication manager - the email id of the user is defined in the .env file as SYSTEM_PUBLICATION_MANAGER
The system users are created with a random password. If you need to log in as these users, you need to change the password from the rails console (once the web container is up and running):
$ docker exec -it rdms-web-1 /bin/bash
$ rails c
> u = User.find_by(email: ENV['SYSTEM_ADMINISTRATOR'])
> u.password = '<some password>'
> u.save
3. Loads workflows, creates default admin sets and collection types
- Loads the default workflows
- Creates the default collection types and admin sets (Hyrax administrative task)
- Sets up the participants, visibility and workflow for each admin set
- Sets up the CRC 1280 collection
- Creates users defined in the file hyrax/seed/setup.json, if they haven't already been created.
4. Create users during start-up from setup.json
RDMS uses the file hyrax/seed/setup.json to create a set of users during first startup, if the file exists.
If you would like to create users during startup,
- Copy the file hyrax/seed/setup.json.template to hyrax/seed/setup.json
- Modify hyrax/seed/setup.json so it has the list of users to create / update. For more information on the rake task to create users and the json file, see this wiki page.
Note: The file hyrax/seed/setup.json needs to exist before running docker build, for users to be created at start-up.
5. Starts the rails server
Some example docker commands and usage:
# Bring the whole application up to run in the background, building the containers
hd up -d --build
# Stop the container
hd stop
# Halt the system
hd down
# Re-create the web container without affecting the rest of the system (and run in the background with -d)
hd up -d --build --no-deps --force-recreate web
# View the logs for the web application container
hd logs web
# Create a log dump file
hd logs web | tee web_logs_`date --iso-8601`
# (writes to e.g. web_logs_2022-02-14)
# View all running containers
hd ps
# Using its container name, you can run a shell in a container to view or make changes directly
docker exec -it rdms-web-1 /bin/bash
Backups
There is docker documentation advising how to back up volumes and their data. Docker suggests mounting the volumes in a container, creating a tar of the contents of the volume in a backup location and restoring them.
It is also possible to stop the containers, copy all of the named volumes in /var/lib/docker/volumes and start the containers. To use the backup, copy the volumes back into /var/lib/docker/volumes.
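Docker's suggested pattern can be sketched as follows. This is illustrative: rdms_solr is one of the named volumes listed earlier, ./backups is an arbitrary host directory, and the same command would be repeated per volume.

```shell
# Mount the named volume read-only in a throwaway container and tar its
# contents into a bind-mounted host directory (one volume shown).
mkdir -p backups
docker run --rm \
  -v rdms_solr:/volume:ro \
  -v "$(pwd)/backups":/backup \
  debian:stable-slim \
  tar --numeric-owner -czf /backup/rdms_solr.tar.gz -C /volume .
```

Restoring works the same way in reverse: mount the (empty) volume read-write, bind-mount the backup directory, and extract the tarball into /volume with the same tar flags.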