⚠️ Note: This page is within the `system` namespace/directory because, as of 2024-09-20, the RUB importer in ReSeeD only supports importing from a subdirectory of the same S3 share ("share" meaning "parent of an S3 bucket") that ReSeeD also uses to store the imported data. That share is configured using the `S3_ENDPOINT`, `S3_ACCESS_KEY`, `S3_SECRET_KEY`, and `S3_REGION` variables in the `.env` file. The name of the S3 bucket from which the RUB importer imports the data is configured in the `S3_FILE_UPLOAD_BUCKET` variable in the `.env` file. Because access to this S3 share requires sysadmin access anyway, this page belongs in the `system` directory of the wiki for the time being.
This process uses the Bulkrax *CSV from S3 parser* to do imports. Data is prepared in CSV files, with commas used to separate columns, and semi-colons used (in some cases) to separate values within a single column.
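Because names such as `Lennon, John` contain commas themselves, such cells have to be quoted in the CSV file. The following is a minimal sketch, using Python's standard `csv` module, of how one row could be written and read back. The helper `join_values` and the choice of `"; "` as the joining string are illustrative assumptions; the column names and values are taken from the examples on this page.

```python
import csv
import io


def join_values(values):
    """Join several values for one column with a semi-colon (assumed style: '; ')."""
    return "; ".join(values)


row = {
    "title": "Test dataset 1 for import",
    "dataset_path": "dataset1",
    "creator": join_values(["Lennon, John", "McCartney, Paul"]),
    "keyword": join_values(["drumming", "pop stars"]),
}

# csv.DictWriter quotes any cell that contains a comma, so the commas inside
# names do not get confused with the column separator.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the row back: split on semi-colons to recover the individual values.
parsed = next(csv.DictReader(io.StringIO(csv_text)))
creators = [name.strip() for name in parsed["creator"].split(";")]
print(creators)
```

Spreadsheet tools usually apply the same quoting when exporting to CSV, so this mainly matters when the file is generated by a script or edited by hand.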
## Prepare the data
The data to be imported needs to have the following file structure:
* There should be a file called `metadata.csv`.
* The format of the columns in the `metadata.csv` file is explained in the *Metadata CSV Format* section (below).
* The CSV file should contain one row for each dataset to be imported.
* Each row should give the path to the dataset, relative to the directory containing `metadata.csv`, in the column `dataset_path`.
* Within each dataset path, there should be a directory named `data` where all the data for the dataset is placed.
An example data structure for 2 datasets is shown below:
```
cl-reseed_import/set1/
├── dataset1
│   └── data
│       └── 1529
│           ├── folder_1
│           │   ├── another_file.exe
│           │   └── some_other_file.json
│           ├── my_software.exe
│           └── mydata.json
├── dataset2
│   └── data
│       ├── AV02CP07GI0
│       │   ├── anat
│       │   │   └── sub-AV02CP07GI0_T1w.nii
│       │   └── func
│       │       └── sub-AV02CP07GI0_task-rest_bold.nii
│       ├── CHANGES
│       ├── README
│       ├── dataset_description.json
│       ├── participants.json
│       └── participants.tsv
└── metadata.csv
```
* The example zip file [Example_RUB_import_data.zip](https://gitlab.ruhr-uni-bochum.de/-/project/864/uploads/afa04d2c5f75c5e5a7d5e69248a1f58e/Example_RUB_import_data.zip) has the datasets and the `metadata.csv` structured as needed.
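Before uploading, the layout rules above can be checked locally. The following is a minimal sketch and not part of ReSeeD or Bulkrax; the function name `check_import_layout` is hypothetical. It only checks that `metadata.csv` exists and that every `dataset_path` points to a folder containing a `data` directory.

```python
import csv
from pathlib import Path


def check_import_layout(root):
    """Return a list of layout problems found under `root` (empty if none)."""
    root = Path(root)
    metadata = root / "metadata.csv"
    if not metadata.is_file():
        return [f"missing {metadata}"]

    problems = []
    with metadata.open(newline="", encoding="utf-8") as handle:
        # Records are counted from 1, not including the header row.
        for record_no, row in enumerate(csv.DictReader(handle), start=1):
            dataset_path = (row.get("dataset_path") or "").strip()
            if not dataset_path:
                problems.append(f"record {record_no}: dataset_path is empty")
                continue
            data_dir = root / dataset_path / "data"
            if not data_dir.is_dir():
                problems.append(f"record {record_no}: {data_dir} is not a directory")
    return problems
```

For the unzipped example data, a call such as `check_import_layout("set1")` would be expected to return an empty list, assuming `set1` is the local folder that contains `metadata.csv`.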
### Steps to run an import
1. Upload the data you want to import (for example, the unzipped contents of [Example_RUB_import_data.zip](https://gitlab.ruhr-uni-bochum.de/-/project/864/uploads/671860fcf003818d516cf4a6d11b8e20/Example_RUB_import_data.zip)) into the S3 bucket that ReSeeD has access to, for example `cl-reseed_import`. Enter this bucket name in the `Specify a bucket name with prefix` field of the importer form.
2. Log into RDMS as an administrator.
3. On the dashboard you should see the options Importers and Exporters. Click on Importers.
Fill in the importer form with the following values:

| Field | Value | Notes |
| --- | --- | --- |
| Administrative Set | RUB publication workflow | This will apply this workflow to all imported datasets |
| Frequency | Once | We are running a one off import |
| Limit | 0 or leave blank | This will import all records in the metadata.csv file |
| Parser | CSV from S3 - ReSeed CSV parser for work (Datasets) from local S3 | This will choose the parser for ReSeed |
| Visibility | Private | The workflow will need all datasets to be private until published |
| Rights statement | Leave blank | It will pick up the rights statement from the csv file |
| Specify a bucket name with prefix | cl-reseed_import | The bucket name with the prefix.<br/>You could also add a path within the bucket, for example: <br/>`cl-reseed_import/set1` |
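
The upload in step 1 can be done with any S3 client. The following is a minimal sketch using the third-party `boto3` library, assuming that `boto3` is installed, that the share is S3-compatible, and that the `S3_*` variables from the `.env` file are exported in the environment in a form `boto3` accepts (for example, an endpoint URL including the scheme). The function names `plan_uploads` and `upload_directory` are hypothetical.

```python
import os
from pathlib import Path


def plan_uploads(local_root, prefix=""):
    """Map each local file under `local_root` to the S3 key it would be stored under."""
    local_root = Path(local_root)
    prefix = prefix.strip("/")
    plan = []
    for path in sorted(local_root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(local_root).as_posix()
            key = f"{prefix}/{relative}" if prefix else relative
            plan.append((path, key))
    return plan


def upload_directory(local_root, bucket, prefix=""):
    """Upload every file under `local_root` to `bucket`, below `prefix`."""
    import boto3  # third-party; imported here so plan_uploads works without it

    client = boto3.client(
        "s3",
        endpoint_url=os.environ["S3_ENDPOINT"],
        aws_access_key_id=os.environ["S3_ACCESS_KEY"],
        aws_secret_access_key=os.environ["S3_SECRET_KEY"],
        region_name=os.environ["S3_REGION"],
    )
    for path, key in plan_uploads(local_root, prefix):
        client.upload_file(str(path), bucket, key)
```

With the example from this page, `upload_directory("set1", "cl-reseed_import", prefix="set1")` would place `metadata.csv` at `set1/metadata.csv` in the bucket, which corresponds to entering `cl-reseed_import/set1` in the `Specify a bucket name with prefix` field. Calling `plan_uploads` first shows the keys without uploading anything.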
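## Metadata CSV Format

| Column | Occurrence | Description | Example 1 | Example 2 |
| --- | --- | --- | --- | --- |
| **title** | One | String<br/>The title of the dataset | Test dataset 1 for import | Test dataset 2 for import |
| **dataset_path** | One | String<br/>Path to the dataset folder within the bucket, relative to the directory containing `metadata.csv` | dataset1 | dataset2 |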
| **alternative_title** | Zero or more | String<br/>The alternative title(s) of the dataset.<br/>Multiple values should be separated with a semi-colon. | The rhythms of old men who hit things with sticks | The rhythms of old men who hit things with sticks; Huh? |
| **description** | Zero or one | String<br/>Description of the dataset | A collection of rhythms from veteran rock drummers | A collection of rhythms from veteran rock drummers |
| **contributor** | Zero or more | Names should be entered in the format **LAST_NAME, FORENAME(S)**.<br/>Multiple contributors should be separated with a semi-colon.<br/>The order of names is significant in relating them to `contributor_orcid` and `contributor_affiliation`. | Starr, Ringo; Bonham, John; Densmore, John; Moon, Keith | Starr, Ringo; Bonham, John; Densmore, John; Moon, Keith |
| **contributor_orcid** | Zero or more | ORCIDs should be entered in their full https format.<br/>The order of ORCIDs is significant in relating them to `contributor`.<br/>ORCIDs should be separated with a semi-colon. The cell should ideally have the same number of semi-colons as `contributor`. | `;;https://orcid.org/0000-0001-5109-3700;` | `https://orcid.org/0000-0001-0001-3700;;;` |
| **contributor_affiliation** | Zero or more | String<br/>The order of affiliations is significant in relating them to `contributor`.<br/>Affiliations should be separated with a semi-colon. The cell should ideally have the same number of semi-colons as `contributor`. | The Beatles; Led Zeppelin; The Doors; The Who | The Beatles;;The Doors; |
| **creator** | One or more | Names should be entered in the format **LAST_NAME, FORENAME(S)**.<br/>Multiple creators should be separated with a semi-colon.<br/>The order of names is significant in relating them to `creator_orcid` and `creator_affiliation`. | Lennon, John | Lennon, John; McCartney, Paul |
| **creator_orcid** | One or more | ORCIDs should be entered in their full https format.<br/>The order of ORCIDs is significant in relating them to `creator`.<br/>ORCIDs should be separated with a semi-colon. The cell should ideally have the same number of semi-colons as `creator`. | https://orcid.org/0000-0001-5109-3700 | https://orcid.org/0000-0001-5109-3700;https://orcid.org/0000-0001-5109-3701 |
| **creator_affiliation** | One or more | String<br/>The order of affiliations is significant in relating them to `creator`.<br/>Affiliations should be separated with a semi-colon. The cell should ideally have the same number of semi-colons as `creator`. | The Beatles | The Beatles;The Beatles |
| **keyword** | One or more | String<br/>Multiple keywords should be separated with a semi-colon. | drumming | drumming; pop stars |
| **resource_type** | One or more | Must be one or more of: *Book*, *BookChapter*, *Collection*, *ComputationalNotebook*, *ConferencePaper*, *DataPaper*, *Dataset*, *Dissertation*, *Event*, *Image*, *InteractiveResource*, *Journal*, *JournalArticle*, *Model*, *OutputManagementPlan*, *PeerReview*, *PhysicalObject*, *Preprint*, *Report*, *Service*, *Software*, *Sound*, *Standard*, *Text*, *Workflow*, *Other*.<br/>If the value is not one of the allowed values, we will set it to Dataset. | Dataset | Dataset |
| **license** | One | Must be one of:<br/>http://rightsstatements.org/vocab/InC/1.0/<br/>https://creativecommons.org/licenses/by/4.0/<br/>https://creativecommons.org/licenses/by-sa/4.0/<br/>https://creativecommons.org/licenses/by-nd/4.0/<br/>https://creativecommons.org/licenses/by-nc/4.0/<br/>https://creativecommons.org/licenses/by-nc-nd/4.0/<br/>https://creativecommons.org/licenses/by-nc-sa/4.0/<br/>http://creativecommons.org/publicdomain/zero/1.0/<br/>http://creativecommons.org/publicdomain/mark/1.0/<br/>http://www.apache.org/licenses/LICENSE-2.0<br/>http://www.gnu.org/licenses/gpl.html<br/>http://opensource.org/licenses/MIT<br/>If the license URI is not one of the allowed values, we will ignore it. | http://creativecommons.org/publicdomain/mark/1.0/ | http://opensource.org/licenses/MIT |
| **date** | Zero or more | Dates should be entered in the format `YYYY-MM-DD <DATE-TYPE>`, for example `2024-05-29 Created`.<br/>Multiple dates should be separated with a semi-colon.<br/>Each date must have a date type, which must be one of the following: *Accepted*, *Available*, *Copyrighted*, *Collected*, *Created*, *Deposited*, *Published*, *Recorded*, *Registered*, *Submitted*, *Updated*, *Archived*.<br/>If the date type is not one of the allowed values, we will ignore the date and the type.<br/>The dates entered here are all metadata dates. The system dates are saved in `create_date`, `date_modified`, `modified_date`, and `date_uploaded`.<br/>The *Published* date, if entered here, will be overwritten when you go through the submission and review workflow. | 2024-05-29 Created; 2024-06-10 Published | 2024-05-29 Created; 2024-06-10 Published |
| **subject** | Zero or more | String<br/>Multiple subjects should be separated with a semi-colon. | drumming | Drumming; music |
| **language** | Zero or more | String<br/>Multiple languages should be separated with a semi-colon. | English | English |
| **location** | Zero or more | String<br/>Multiple locations should be separated with a semi-colon. | London | |
| **software_version** | Zero or more | String<br/>Multiple software versions should be separated with a semi-colon. | | |
| **funder_identifier** | Zero or more | Identifiers should be entered as full URIs.<br/>Multiple funder identifiers should be separated with a semi-colon.<br/>The order of identifiers is significant in relating them to `funder_name`, `award_number`, `award_uri`, and `award_title`. | http://dx.doi.org/10.13039/501100001659 | http://dx.doi.org/10.13039/501100001659;http://dx.doi.org/10.13039/50110000165999 |
| **funder_name** | Zero or more | Multiple funder names should be separated with a semi-colon. The cell should ideally have the same number of semi-colons as `funder_identifier`.<br/>The order of funder names is significant in relating them to `funder_identifier`, `award_number`, `award_uri`, and `award_title`. | DFG | DFG;RUB |
| **award_number** | Zero or more | Multiple award numbers should be separated with a semi-colon. The cell should ideally have the same number of semi-colons as `funder_identifier`.<br/>The order of award numbers is significant in relating them to `funder_identifier`, `funder_name`, `award_uri`, and `award_title`. | A0001 | A0001;W3asxa3 |
| **award_uri** | Zero or more | Multiple award URIs should be separated with a semi-colon. The cell should ideally have the same number of semi-colons as `funder_identifier`.<br/>The order of award URIs is significant in relating them to `funder_identifier`, `funder_name`, `award_number`, and `award_title`. | | |
| **award_title** | Zero or more | Multiple award titles should be separated with a semi-colon. The cell should ideally have the same number of semi-colons as `funder_identifier`.<br/>The order of award titles is significant in relating them to `funder_identifier`, `funder_name`, `award_number`, and `award_uri`. | | |
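
The rules in the table can be checked per row before an import is started. The following is a minimal sketch and not the importer's own validation; the names `check_row`, `split_values`, `REQUIRED_COLUMNS`, `ALIGNED_GROUPS`, and `DATE_TYPES` are hypothetical and simply restate the table. It reports missing required values, columns whose number of semi-colon separated values differs from the column they relate to, and dates that would be ignored.

```python
import re

# Columns whose occurrence is "One" or "One or more" in the table.
REQUIRED_COLUMNS = (
    "title", "dataset_path", "creator", "creator_orcid",
    "creator_affiliation", "keyword", "resource_type", "license",
)

# First column of each group sets the expected number of values for the rest.
ALIGNED_GROUPS = (
    ("contributor", "contributor_orcid", "contributor_affiliation"),
    ("creator", "creator_orcid", "creator_affiliation"),
    ("funder_identifier", "funder_name", "award_number", "award_uri", "award_title"),
)

DATE_TYPES = {
    "Accepted", "Available", "Copyrighted", "Collected", "Created", "Deposited",
    "Published", "Recorded", "Registered", "Submitted", "Updated", "Archived",
}
DATE_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2}) (\w+)")


def split_values(cell):
    """Split a cell on semi-colons, keeping empty slots so positions stay aligned."""
    if cell is None or cell.strip() == "":
        return []
    return [part.strip() for part in cell.split(";")]


def check_row(row):
    """Return a list of problems for one metadata row (a dict keyed by column name)."""
    problems = []
    for column in REQUIRED_COLUMNS:
        if not (row.get(column) or "").strip():
            problems.append(f"{column}: required value is missing")
    for lead, *followers in ALIGNED_GROUPS:
        expected = len(split_values(row.get(lead)))
        for column in followers:
            values = split_values(row.get(column))
            if values and len(values) != expected:
                problems.append(
                    f"{column}: {len(values)} value(s), but {lead} has {expected}"
                )
    for entry in split_values(row.get("date")):
        if not entry:
            continue
        match = DATE_PATTERN.fullmatch(entry)
        if match is None or match.group(2) not in DATE_TYPES:
            problems.append(f"date: {entry!r} would be ignored")
    return problems
```

Rows can be fed to `check_row` from `csv.DictReader`. Since the table only says that related columns should *ideally* have the same number of semi-colons, the count mismatches are best treated as warnings rather than hard errors.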