## Using Bulkrax *CRC1280 folder parser* to do imports
### Prepare the data
The Bulkrax *CRC1280 folder parser* expects the CRC1280 data to follow a very specific layout: starting from a root folder, the data must be nested as `group/experiment/subject/session/modality`.
An example `import_data` tree is shown in the text file [import_data.txt](/system/development-notes/import_data.txt).
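A quick way to check a folder against this layout before importing is to list the directories at the expected depth. The sketch below assumes that the root you pass directly contains the group folders; `list_modality_dirs` is a hypothetical helper name and is not part of Bulkrax or RDMS.

```shell
# Sketch: list the directories exactly five levels below an import root,
# i.e. at the group/experiment/subject/session/modality depth.
# Assumption: the root passed in directly contains the group folders.
list_modality_dirs() {
  local root="$1"
  # -mindepth/-maxdepth 5 restricts the results to the fifth level under the root
  find "$root" -mindepth 5 -maxdepth 5 -type d | sort
}

# Example usage (the path is a placeholder):
# list_modality_dirs /path/to/import_data
```

If the command prints nothing, the folder is probably shallower than the parser expects.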
Copy the directory to be imported into `CRC_FOLDER_IMPORT_PATH` (the import data directory that is mounted into the web and workers containers, as explained below), so that it is available to the importer.
*Note for developers / system administrators*
* The location of the test data folder on the host machine (from where you will be running docker) needs to be added to the [.env file](https://gitlab.ruhr-uni-bochum.de/researchdata/rdms/-/blob/develop/.env.template#L90).
* The environment variable is `CRC_FOLDER_IMPORT_PATH`.
* This is mapped to the path `/rub-test-data` inside the docker containers (as defined in the [docker-compose.yml](https://gitlab.ruhr-uni-bochum.de/researchdata/rdms/-/blob/develop/docker-compose.yml#L99)).
ℹ After changing the contents of the `.env` file, stop the containers and start them again (updated values only apply after restarting the containers):
```shell
$ docker compose -f docker-compose.yml down
$ docker compose -f docker-compose.yml up -d
```
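To confirm which host directory is configured, the value can be read back from the `.env` file. This is a minimal sketch: `import_path_from_env` is a hypothetical helper, and the `web` service name in the commented usage is an assumption based on the container names mentioned above.

```shell
# Hypothetical helper: print the value of CRC_FOLDER_IMPORT_PATH from an env file.
import_path_from_env() {
  local env_file="${1:-.env}"
  # Take the last matching assignment, drop the key, and strip any surrounding quotes
  grep -E '^CRC_FOLDER_IMPORT_PATH=' "$env_file" | tail -n 1 | cut -d= -f2- | tr -d "\"'"
}

# Example usage, from the directory that contains .env:
# ls "$(import_path_from_env .env)"                                  # contents on the host
# docker compose -f docker-compose.yml exec web ls /rub-test-data    # contents in the container (service name assumed)
```

Both listings should show the same folders if the mapping is working.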
#### Import RUB s3 test data of CRC1280
1. Mount the S3 bucket (Ceph) containing the test data with `rclone mount`:
`rclone mount --daemon s3-rdms-test:fowi-rdms-testbucket/20220425_Test_GroupData /root/rdms.develop/hyrax/rub-test-data/20220425_Test_GroupData/`
⚠️ For reasons not yet fully understood, the `rclone mount` process used to mount the S3 bucket into the bulk importer directory must be run as `root`, even if the RDMS containers are started by a non-root user. We suspect that this is a limitation of the kernel's `fuse` module, and that the `rclone mount` process must be run under the same UID as the importer processes inside the docker containers (which run as root, too).
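Because the mount runs in the background (`--daemon`), it can be useful to check that it is actually in place before starting an import. The sketch below uses `mountpoint` from util-linux; `is_mounted` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of rclone or RDMS): succeeds if the given
# directory is currently a mount point, using mountpoint(1) from util-linux.
is_mounted() {
  mountpoint -q "$1"
}

# Example usage, run as root on the host (path taken from the mount command above):
# is_mounted /root/rdms.develop/hyrax/rub-test-data/20220425_Test_GroupData \
#   && echo "mounted" || echo "not mounted"
#
# To detach the FUSE mount again:
# fusermount -u /root/rdms.develop/hyrax/rub-test-data/20220425_Test_GroupData
```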
### Steps to run an import
1. Log into RDMS as an administrator.
2. On the dashboard you should see the options Importers and Exporters. Click on Importers.
There is one job you cannot monitor from the importer dashboard: the job scheduled to run after all of the collections, works and filesets have been imported. This job creates the relationships between the collections, works and filesets. If it is not run, the uploaded files will not be associated with the filesets.
The importer first creates a csv file for the folder to be imported. This is then imported with a customised csv importer.
The csv files are stored at `rdms/hyrax/tmp/imports/`, where `rdms` is the root of the source directory checked out and from where the `workers` docker container is running.
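When debugging an import, it can help to look at the CSV that the importer generated. The sketch below finds the most recently modified CSV file under the imports directory. It assumes the generated files carry a `.csv` extension and may sit in subfolders; `latest_import_csv` is a hypothetical helper name.

```shell
# Hypothetical helper: print the most recently modified CSV under the imports directory.
# Assumption: generated files end in .csv and may be nested in subfolders.
latest_import_csv() {
  local dir="${1:-hyrax/tmp/imports}"
  # ls -t sorts by modification time, newest first; head keeps the newest entry
  find "$dir" -type f -name '*.csv' -exec ls -t {} + | head -n 1
}

# Example usage, from the root of the rdms checkout:
# head -n 5 "$(latest_import_csv hyrax/tmp/imports)"
```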
### `Mark as completed` and `Rerun` buttons
1. Mark as completed
This button is displayed when the numbers show that the import has completed (total and number processed are the same, and the number failed is 0), but Bulkrax has got its counting wrong and thinks the import is pending (like in the screenshot below). In such a case, it is safe to click on `Mark as completed`. It changes the status of the import from pending to completed (and does nothing else).
2. Rerun
This button is displayed when an import has completed, but with errors. When it is clicked, it cycles through all the entries in the importer and reruns all the failed jobs.
Before clicking `Rerun`, it is worth going into Sidekiq (for example, https://rdms.cottagelabs.com/sidekiq/; you need to be logged in as admin to view this URL) and checking whether the job is still being processed (in busy, enqueued or retries). We have noticed that errors like `Failed to acquire lock` go away on a retry.
If there are no jobs listed in Sidekiq, you can click on `Rerun` to rerun all the failed jobs.
When a new CRCDataset is imported with the same collection name as an existing CRC1280 collection (either previously imported or created by a user), it will create a new collection, rather than reuse the one previously created.
When a CRCDataset that was previously imported is imported again, it will create a new collection and a new experiment, rather than reuse the ones previously created.