⚠️ Note: This page is in the `system` namespace/directory because, as of 2024-09-20, the RUB importer in ReSeeD only supports importing from a subdirectory of the same S3 share ("share" meaning "parent of an S3 bucket") that ReSeeD also uses to store the imported data (configured using the `S3_ENDPOINT`, `S3_ACCESS_KEY`, `S3_SECRET_KEY`, and `S3_REGION` variables in the `.env` file). The name of the S3 bucket from which the RUB importer imports data is configured in the `S3_FILE_UPLOAD_BUCKET` variable in the `.env` file. Because access to this S3 share requires sysadmin access anyway, this page belongs in the `system` directory of the wiki for the time being.
----
## About this process
This process uses the Bulkrax *CSV from S3 parser* to do imports. Data is prepared in CSV files, with commas separating columns and, in some cases, semicolons separating multiple values within a single column.
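As a rough illustration of the delimiter convention described above, the following sketch reads a CSV string and splits semicolon-separated cells into lists. It is not how Bulkrax itself parses the file. The column names `title` and `keyword`, the sample data, and the choice of `keyword` as a multi-valued column are assumptions made for this example; only `dataset_path` comes from this page.

```python
import csv
import io

# Hypothetical sample data: `title` and `keyword` are illustrative column
# names, not taken from the actual metadata format.
SAMPLE = (
    "dataset_path,title,keyword\n"
    '"datasets/a","First, with a comma","alpha;beta"\n'
)


def split_multi(cell, sep=";"):
    """Split a multi-valued cell on `sep`, dropping empty parts."""
    return [part.strip() for part in cell.split(sep) if part.strip()]


def read_rows(text, multi_valued=("keyword",)):
    """Parse comma-separated rows; split the assumed multi-valued columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in multi_valued:
            if row.get(col) is not None:
                row[col] = split_multi(row[col])
        rows.append(row)
    return rows


rows = read_rows(SAMPLE)
```

A value that itself contains a comma has to be quoted, as in the `title` cell of the sample, so that the CSV reader does not treat the comma as a column separator.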
## Prepare the data
...
@@ -10,7 +13,7 @@ The data to be imported needs to have the following file structure
* There should be a file called `metadata.csv`
* The format of the columns in the `metadata.csv` file is explained in the section *The metadata CSV format* (below)
* The CSV file should contain one row for each dataset to be imported
* Each row should give, in the `dataset_path` column, the path to the dataset relative to the directory containing `metadata.csv`.
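The requirements listed above can be checked on a local copy of the import directory before it is uploaded. The sketch below is one possible pre-flight check, not part of ReSeeD or Bulkrax: the function name `check_metadata` and the throwaway demo directory are hypothetical, and it assumes the data is available on the local file system rather than already in S3.

```python
import csv
import tempfile
from pathlib import Path


def check_metadata(import_dir):
    """Return a list of problems found in <import_dir>/metadata.csv."""
    root = Path(import_dir)
    csv_path = root / "metadata.csv"
    if not csv_path.is_file():
        return [f"missing file: {csv_path}"]

    problems = []
    with csv_path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if "dataset_path" not in (reader.fieldnames or []):
            return ["metadata.csv has no dataset_path column"]
        for row in reader:
            line = reader.line_num  # last physical line of this record
            rel = (row["dataset_path"] or "").strip()
            if not rel:
                problems.append(f"line {line}: empty dataset_path")
            elif Path(rel).is_absolute():
                problems.append(f"line {line}: dataset_path is not relative: {rel}")
            elif not (root / rel).exists():
                problems.append(f"line {line}: dataset_path not found: {rel}")
    return problems


# Demo on a throwaway directory: one dataset exists, one does not.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "datasets" / "a").mkdir(parents=True)
    (demo_root / "metadata.csv").write_text(
        "dataset_path\ndatasets/a\ndatasets/missing\n", encoding="utf-8"
    )
    problems = check_metadata(demo_root)
```

In the demo, only the second data row is reported, because `datasets/missing` does not exist relative to the directory containing `metadata.csv`.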
...
@@ -78,7 +81,7 @@ The data to be imported needs to have the following file structure