1 change: 1 addition & 0 deletions CHANGELOG.md
Expand Up @@ -35,6 +35,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Timeout value for commands executed via `filesystems/ops` and `status` is now configured with the command execution timeout setting
- Stdout and stderr paths are now fully expanded
- `probing` configuration is optional now for the `clusters` settings
- Updated documentation for large data upload

## [2.4.0]

32 changes: 22 additions & 10 deletions docs/user_guide/README.md
Expand Up @@ -140,33 +140,45 @@ All asynchronous endpoints are located under `/transfer` and follow this path s
## File transfer

FirecREST provides two methods for transferring files:
- Small files (up to 5MB by [default](../setup/conf/README.md)) can be uploaded or downloaded directly.

- Small files (up to 5MB by [default](../setup/conf/#dataoperation)) can be uploaded or downloaded directly.
- Large files must first be transferred to a staging storage system (e.g., S3) before being moved to their final location on the HPC filesystem.

Small file transfer endpoints:

- `/filesystem/{system_name}/ops/download`
- `/filesystem/{system_name}/ops/upload`
- [`/filesystem/{system_name}/ops/download`](https://eth-cscs.github.io/firecrest-v2/openapi/#/filesystem/get_download_filesystem__system_name__ops_download_get)
- [`/filesystem/{system_name}/ops/upload`](https://eth-cscs.github.io/firecrest-v2/openapi/#/filesystem/post_upload_filesystem__system_name__ops_upload_post)

Large file transfer endpoints:

- `/filesystem/{system_name}/transfer/download`
- `/filesystem/{system_name}/transfer/upload`
- [`/filesystem/{system_name}/transfer/download`](https://eth-cscs.github.io/firecrest-v2/openapi/#/filesystem/post_download_filesystem__system_name__transfer_download_post)
- [`/filesystem/{system_name}/transfer/upload`](https://eth-cscs.github.io/firecrest-v2/openapi/#/filesystem/post_upload_filesystem__system_name__transfer_upload_post)

### Downloading Large Files

When requesting a large file download, FirecREST returns a download URL and a `jobId`. Once the remote job is completed, the user can retrieve the file using the provided URL.

### File Transfer with Bash
### Uploading Large Files

Because FirecREST uses a storage service based on [S3 as a staging area](../setup/arch/external_storage/), uploads are subject to the limits of the S3 server. In particular, a file larger than 5GB must be split into chunks before it is uploaded, which makes the upload more involved.

To help with this, we provide a set of examples in different programming and scripting languages, described below:

#### Large Data Upload with Python3

This is the easiest way of using FirecREST. See [FirecREST SDK section](#firecrest-sdk) below for more information and detailed examples.

#### Large Data Upload with Bash

[Detailed example.](file_transfer_bash/README.md)

### File Transfer with .NET
#### Large Data Upload with .NET

[Detailed example.](file_transfer_dotnet/README.md)

### Need more examples?
The complexity of using FirecREST, for example implementing the multipart protocol, can vary depending on the programming language used and how well it aligns with your specific requirements or constraints such as speed, disk space, or else.
#### Need more examples?

If you need tailored examples for your particular use case, feel free to open an [issue on GitHub](https://github.com/eth-cscs/firecrest-v2/issues/new). We'd be happy to create one for you.
If you need examples for your particular use case (e.g., using a language other than those listed above), feel free to open an [issue on GitHub](https://github.com/eth-cscs/firecrest-v2/issues/new). We'd be happy to create one for you.

## FirecREST SDK

61 changes: 34 additions & 27 deletions docs/user_guide/file_transfer_bash/README.md
@@ -1,4 +1,4 @@
# File Transfer with Bash
# Large Data Upload with Bash

## Uploading large files using S3 multipart protocol

Expand All @@ -10,39 +10,47 @@ Once all parts have been uploaded, the user must call the provided complete uplo

The first step is to determine the size of your large file, expressed in bytes. A reliable method is to use the command: `stat --printf "%s" "$LARGE_FILE_NAME"`.
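
As a minimal sketch, the size can be captured in a shell variable before the request is built. The helper name `file_size_bytes` is hypothetical, and the `wc -c` fallback is an assumption for systems where GNU `stat --printf` is not available.

```bash
# Hypothetical helper: print the size of a file in bytes.
file_size_bytes() {
    local file="$1"
    # Try GNU stat first; if it fails (e.g. non-GNU stat), count bytes with wc.
    stat --printf "%s" "$file" 2>/dev/null || wc -c < "$file" | tr -d ' '
}
```

It could be used as `UPLOAD_FILE_SIZE=$(file_size_bytes "$DATA_FILE")`.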

Then call the `/filesystem/{system}/transfer/upload` endpoint as following.
Then call the [`/filesystem/{system}/transfer/upload`](https://eth-cscs.github.io/firecrest-v2/openapi/#/filesystem/post_upload_filesystem__system_name__transfer_upload_post) endpoint as follows.

!!! example "Call to transfer/upload to activate the multipart protocol"
```bash
curl -s --location --globoff "${F7T_URL}/filesystem/${F7T_SYSTEM}/transfer/upload" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $ACCESS_TOKEN" \
--data "{
\"path\":\"${DESTINATION_PATH}\",
\"fileName\":\"${LARGE_FILE_NAME}\",
\"fileSize\":\"${LARGE_FILE_SIZE_IN_BYTES}\"
--header "Content-Type: application/json" \
--header "Authorization: Bearer $ACCESS_TOKEN" \
--data "{
\"path\":\"${DESTINATION_PATH}\",
\"fileName\":\"${DATA_FILE}\",
\"transferDirectives\": {
\"fileSize\":\"${UPLOAD_FILE_SIZE}\",
\"transferMethod\":\"s3\"
}
}"
```

The JSON response from this call follows the structure shown below. FirecREST calculates the number of parts the file must be split into, based on the provided file size and the `maxPartSize` setting. Each part is assigned a number from <i>1</i> to <i>n</i> and must be uploaded using the presigned URLs listed in `partsUploadUrls`. Once all parts have been successfully uploaded, the presigned URL in `completeUploadUrl` is used to finalize the upload sequence and initiate the transfer of the complete data file from S3 to its final destination.
The JSON response from this call follows the structure shown below. FirecREST calculates the number of parts the file must be split into, based on the provided file size and the `maxPartSize` setting. Each part is assigned a number from *1* to *n* and must be uploaded using the presigned URLs listed in `partsUploadUrls`. Once all parts have been successfully uploaded, the presigned URL in `completeUploadUrl` is used to finalize the upload sequence and initiate the transfer of the complete data file from S3 to its final destination.
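
The number of parts can be cross-checked locally with a ceiling division of the file size by the maximum part size. The sketch below only illustrates that arithmetic: the function name `expected_parts` is hypothetical, and the exact calculation performed by the server is an assumption.

```bash
# Hypothetical helper: number of parts needed for a file of $1 bytes
# when each part holds at most $2 bytes (integer ceiling division).
expected_parts() {
    local file_size="$1" max_part_size="$2"
    echo $(( (file_size + max_part_size - 1) / max_part_size ))
}
```

For instance, a 3000000000-byte file with a part size of 1073741824 bytes needs three parts, so three presigned part URLs would be expected in the response.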

!!! example "FirecREST response from `/filesystem/{system}/transfer/upload` endpoint"
!!! example "FirecREST response from [`/filesystem/{system}/transfer/upload`](https://eth-cscs.github.io/firecrest-v2/openapi/#/filesystem/post_upload_filesystem__system_name__transfer_upload_post) endpoint"
```json
{
"transferJob": {
"jobId": nnnnnnnnn,
"system": "SYSTEM",
"workingDirectory": "/xxxxxxxxx",
"logs": {
"outputLog": "/xxxxxxxx.log",
"errorLog": "/xxxxxxxxx.log"
"transferJob": {
"jobId": nnnnnnnnn,
"system": "SYSTEM",
"workingDirectory": "/path/to/workdir",
"logs": {
"outputLog": "/path/to/output.log",
"errorLog": "/path/to/error.log"
}
},
"transferDirectives": {
"transfer_method": "s3",
"parts_upload_urls": [
"https://part01-url",
"https://part02-url",
"https://part03-url"
],
"complete_upload_url": "https://upload-complete-url",
"max_part_size": 1073741824
}
},
"partsUploadUrls": [
"https://part1-url", "https://part2-url", "https://part3-url"
],
"completeUploadUrl": "https://upload-complete-url",
"maxPartSize": 1073741824
}
```
Extract the most useful information from the response using `jq`:
Expand Down Expand Up @@ -121,9 +129,9 @@ Complete the upload by calling the presigned `completeUploadUrl` as in the examp
curl -f --show-error -i -w "%{http_code}" -H "Content-Type: application/xml" -d "$complete_upload_xml" -X POST "$complete_upload_url"
```
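
The request body follows the S3 `CompleteMultipartUpload` XML format, which lists each part number together with the ETag returned when that part was uploaded. A minimal sketch of assembling such a body is given here. The helper name `build_complete_xml` is hypothetical, and it assumes the ETags are passed in part order.

```bash
# Hypothetical helper: build a CompleteMultipartUpload XML body.
# Arguments: the ETag of each uploaded part, in part order (part 1 first).
build_complete_xml() {
    local xml="<CompleteMultipartUpload>"
    local part_number=1
    local etag
    for etag in "$@"; do
        xml="${xml}<Part><PartNumber>${part_number}</PartNumber><ETag>${etag}</ETag></Part>"
        part_number=$((part_number + 1))
    done
    printf '%s' "${xml}</CompleteMultipartUpload>"
}
```

The result could be stored with `complete_upload_xml=$(build_complete_xml "${etags[@]}")`, where `etags` is an illustrative array holding the collected ETags.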

## Script examples
### Script examples

### Using split
#### Using `split` command

To run the [example](examples/multipart_upload_split.sh) you first need to set up the `environment file` using the provided [env-template](examples/env-template) file.
Set the fields in the template as described in the [user guide](../README.md) to match your deployment and save the template as a new file.
Expand All @@ -137,8 +145,7 @@ Launch the script as in the example

The script uploads `your_data_fil.zip` to the designated cluster. Note that the `split` command generates all temporary part files beforehand, so <b>your local disk must have at least as much free space as the total size of the data being uploaded</b>.


### Using dd
#### Using `dd` command

To run the [example](examples/multipart_upload_dd.sh) you first need to set up the `environment file` using the provided [env-template](examples/env-template) file.
Set the fields in the template as described in the [user guide](../README.md) to match your deployment and save the template as a new file.
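
For orientation, `dd` can read a single part directly from the source file by skipping the blocks that belong to earlier parts. The sketch below is not taken from the example script: the helper name `read_part` and the 1 MiB block size are assumptions chosen for illustration.

```bash
# Hypothetical helper: write part number $2 (1-based) of file $1 to stdout.
# Each part spans $3 blocks of 1 MiB; the last part may be shorter.
read_part() {
    local file="$1" part_number="$2" blocks_per_part="$3"
    dd if="$file" bs=1048576 \
       skip=$(( (part_number - 1) * blocks_per_part )) \
       count="$blocks_per_part" 2>/dev/null
}
```

Because each part is produced on demand, only one part needs to be held at a time instead of a full set of pre-generated part files.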
4 changes: 2 additions & 2 deletions docs/user_guide/file_transfer_bash/examples/.env-template
@@ -1,5 +1,5 @@
#!/bin/bash
export CLIENT_ID="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
export CLIENT_SECRET="yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy"
export TOKEN_URI="https://auth.cscs.ch/auth/realms/firecrest-clients/protocol/openid-connect/token"
export F7T_URL="https://api.tds.cscs.ch/stp/firecrest/v2"
export TOKEN_URI="http://localhost:8080/auth/realms/firecrest-clients/protocol/openid-connect/token"
export F7T_URL="http://localhost:8000"
Expand Up @@ -66,7 +66,10 @@ response=$(curl -s --location --globoff "${F7T_URL}/filesystem/${F7T_SYSTEM}/tra
--data "{
\"path\":\"${DESTINATION_PATH}\",
\"fileName\":\"${DATA_FILE}\",
\"fileSize\":\"${UPLOAD_FILE_SIZE}\"
\"transferDirectives\": {
\"fileSize\":\"${UPLOAD_FILE_SIZE}\",
\"transferMethod\":\"s3\"
}
}")

# Extract information
Expand Up @@ -63,7 +63,10 @@ response=$(curl -s --location --globoff "${F7T_URL}/filesystem/${F7T_SYSTEM}/tra
--data "{
\"path\":\"${DESTINATION_PATH}\",
\"fileName\":\"${DATA_FILE}\",
\"fileSize\":\"${UPLOAD_FILE_SIZE}\"
\"transferDirectives\": {
\"fileSize\":\"${UPLOAD_FILE_SIZE}\",
\"transferMethod\":\"s3\"
}
}")

# Extract information