Configuring extra file submission
Loculus supports handling arbitrary files associated with sequence entries. You can configure Loculus to accept extra files when sequences are submitted, and to provide extra files for download alongside the sequence data and metadata. A typical use case is raw reads. The files are stored in S3.
To enable this feature you need to configure an S3 bucket for Loculus to use, and then configure the file categories per organism.
Conceptual overview
Extra files submitted alongside sequences and metadata are handled differently from the sequence and metadata files themselves. Loculus stores them in S3, a generic object storage service. Files are uploaded directly to S3 using presigned URLs.
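A presigned URL lets a client upload directly to S3 without holding S3 credentials: the signature travels in the URL's query string, and the client sends an HTTP PUT with the file bytes. The sketch below, which uses only Python's standard library, builds such a request without sending it. The URL is a made-up placeholder, and how Loculus hands out presigned URLs to clients is not shown here.

```python
import urllib.request


def build_presigned_upload(presigned_url: str, data: bytes) -> urllib.request.Request:
    """Build (but do not send) an HTTP PUT that uploads `data` to a presigned S3 URL.

    The signature is part of the URL's query string, so the client needs no
    S3 credentials of its own.
    """
    return urllib.request.Request(presigned_url, data=data, method="PUT")


# Hypothetical presigned URL; a real one is issued by the server and expires.
req = build_presigned_upload(
    "https://my-s3.net/loculus-data/some-file-id?X-Amz-Signature=EXAMPLE",
    b"@read1\nACGT\n+\nIIII\n",
)
print(req.get_method(), req.full_url)
# To actually perform the upload: urllib.request.urlopen(req)
```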
Unlike files that contain sequence data, these extra files are not inherently coupled to any particular sequence entry. Files are uploaded first and associated with a sequence entry afterwards; at upload time, only the owning group needs to be specified. As a result, the same file can be attached to multiple sequence entries.
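The two-step lifecycle described above (upload with only an owning group, associate with entries later) can be modelled as a small in-memory registry. This is a minimal sketch, not Loculus code: `StoredFile`, `FileRegistry`, and the accessions `LOC_0001`/`LOC_0002` are hypothetical, and the rule that only the owning group may attach a file is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StoredFile:
    """Hypothetical uploaded file: only the owning group is known at upload time."""
    file_id: str
    group_id: int
    entries: set[str] = field(default_factory=set)  # associated sequence entries


class FileRegistry:
    def __init__(self) -> None:
        self._files: dict[str, StoredFile] = {}

    def upload(self, file_id: str, group_id: int) -> StoredFile:
        # Step 1: register the file with its owning group, no entry yet.
        stored = StoredFile(file_id, group_id)
        self._files[file_id] = stored
        return stored

    def attach(self, file_id: str, accession: str, group_id: int) -> None:
        # Step 2: associate the file with a sequence entry.
        # Assumption for this sketch: only the owning group may attach it.
        stored = self._files[file_id]
        if stored.group_id != group_id:
            raise PermissionError(f"file {file_id} belongs to group {stored.group_id}")
        stored.entries.add(accession)

    def get(self, file_id: str) -> StoredFile:
        return self._files[file_id]


registry = FileRegistry()
registry.upload("f1", group_id=1)
registry.attach("f1", "LOC_0001", group_id=1)
registry.attach("f1", "LOC_0002", group_id=1)  # the same file, a second entry
print(sorted(registry.get("f1").entries))  # ['LOC_0001', 'LOC_0002']
```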
Files are not publicly accessible until an associated sequence entry is released. Loculus relies on the access mechanisms built into S3: it tags files that should be public with `public=true`, and the bucket is configured with a policy that makes files with this tag publicly accessible (this policy needs to be applied by the S3 administrator).
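The tagging rule can be sketched as a pure function. This assumes, as a reading of the text, that a file becomes public as soon as any one of its associated entries is released; the input shape (accession mapped to a released flag) is hypothetical. The returned dictionary uses the `{"TagSet": [...]}` shape of the S3 PutObjectTagging API, so with boto3 it could be passed as `s3.put_object_tagging(Bucket=..., Key=..., Tagging=...)`.

```python
def is_public(associated_entries: dict[str, bool]) -> bool:
    """True once at least one associated entry is released.

    `associated_entries` maps accession -> released? (hypothetical shape).
    """
    return any(associated_entries.values())


def tagging_for(associated_entries: dict[str, bool]) -> dict:
    """Tag set to apply to the object, in S3 PutObjectTagging format."""
    if is_public(associated_entries):
        # Matches the bucket policy condition s3:ExistingObjectTag/public = "true".
        return {"TagSet": [{"Key": "public", "Value": "true"}]}
    # Not released yet: no public tag, so the bucket policy does not apply.
    return {"TagSet": []}


print(tagging_for({"LOC_0001": False}))
print(tagging_for({"LOC_0001": False, "LOC_0002": True}))
```

Note that PutObjectTagging replaces the object's whole tag set, so a real implementation that uses other tags would need to merge them.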
When configuring this feature for an organism, you can define file categories that users can submit, as well as file “output” categories, whose files are shown alongside the other sequence data and metadata in the sequence detail view. You can also configure only submission categories (whose files the preprocessing pipeline can then use in some way) or only output categories (whose files the preprocessing pipeline generates on its own). The preprocessing pipeline has access to submitted files before they are released, and it can also upload new files of its own.
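The three possible setups for a category name can be summarized by comparing the set of submission categories with the set of output categories. This is an illustrative sketch only; the role labels and the category names `sampleSheet` and `qcReport` are hypothetical.

```python
def category_roles(submit: set[str], output: set[str]) -> dict[str, str]:
    """Classify each configured file category by how it is used."""
    roles: dict[str, str] = {}
    for name in sorted(submit | output):
        if name in submit and name in output:
            # Users submit it; the pipeline can set it as output for display.
            roles[name] = "submit+output"
        elif name in submit:
            # Users submit it; not displayed, but available to the pipeline.
            roles[name] = "submit-only"
        else:
            # Not submitted by users; the pipeline generates and uploads it.
            roles[name] = "output-only"
    return roles


print(category_roles({"rawReads", "sampleSheet"}, {"rawReads", "qcReport"}))
# {'qcReport': 'output-only', 'rawReads': 'submit+output', 'sampleSheet': 'submit-only'}
```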
Configuring an S3 bucket
You need admin access to an S3 bucket and its credentials.
Enable S3 and configure the location of the bucket:
```yaml
s3:
  enabled: true
  bucket:
    region: us-east-1
    endpoint: my-s3.net
    bucket: loculus-data
```
Configure the credentials using sealed secrets:
```yaml
secrets:
  s3-bucket:
    type: sealedsecret
    clusterWide: 'true'
    encryptedData:
      accessKey: AgCm73j1g21Dn....
      secretKey: AgAS8a/ldl....
```
The backend makes files in the bucket public by tagging them with `public=true`. For this to work, you need to configure a bucket policy like the following, with `$bucket` replaced by the name of your bucket:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": ["arn:aws:s3:::$bucket/*"],
      "Condition": {
        "StringEquals": {
          "s3:ExistingObjectTag/public": "true"
        }
      }
    }
  ]
}
```
Consult the documentation of your particular S3 provider on how to configure bucket policies.
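To avoid editing the policy by hand, it can be rendered for a concrete bucket name. The sketch below reproduces the policy from this section with Python's `json` module and adds a helper that mirrors its condition: anonymous `s3:GetObject` is allowed only for objects carrying the tag `public` with value `true`. The function names are hypothetical; applying the policy is provider-specific (on AWS, for example, boto3's `put_bucket_policy(Bucket=..., Policy=...)` accepts the JSON string).

```python
import json


def public_read_policy(bucket: str) -> str:
    """Render the tag-based public-read bucket policy for `bucket`."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
                "Condition": {
                    "StringEquals": {"s3:ExistingObjectTag/public": "true"}
                },
            }
        ],
    }
    return json.dumps(policy, indent=2)


def anonymously_readable(tags: dict[str, str]) -> bool:
    """Mirror the policy condition: only objects tagged public=true match."""
    return tags.get("public") == "true"


print(public_read_policy("loculus-data"))
```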
Configuring file submission
Users can submit files along with sequence metadata and sequences, or instead of sequences.
For this, you need to enable the `files` submission type and configure at least one file category that users can submit:
```yaml
my-organism:
  schema:
    submissionDataTypes:
      files:
        enabled: true # enable the feature
        categories:
          - name: rawReads # configure a submission category
```
The example above configures the `rawReads` file category.
Files that users submit in this category are passed along to the preprocessing pipeline, which can read them, pass them through as output files, generate additional fields from them, or process them in any other way.
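The configuration rules above can be checked before deployment. This minimal sketch reads an organism's settings as a Python dictionary laid out like the YAML example (as if loaded with a YAML parser) and returns the submittable categories. Rejecting an enabled feature with no categories follows the "at least one file category" requirement; rejecting duplicate names is an extra assumption of this sketch, and `submission_categories` is a hypothetical helper, not part of Loculus.

```python
def submission_categories(organism_config: dict) -> list[str]:
    """Return the file categories users may submit for one organism."""
    files = (
        organism_config.get("schema", {})
        .get("submissionDataTypes", {})
        .get("files", {})
    )
    if not files.get("enabled", False):
        return []  # feature disabled: no file submission at all
    names = [category["name"] for category in files.get("categories", [])]
    if not names:
        raise ValueError("files submission is enabled but no category is configured")
    if len(names) != len(set(names)):
        # Assumption: category names must be unique within an organism.
        raise ValueError(f"duplicate category names: {names}")
    return names


config = {
    "schema": {
        "submissionDataTypes": {
            "files": {"enabled": True, "categories": [{"name": "rawReads"}]}
        }
    }
}
print(submission_categories(config))  # ['rawReads']
```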
Configuring output files
By default, submitted files are not shown in the sequence detail view. To show files there, you also need to configure output file categories, and the preprocessing pipeline needs to set the output files.
To configure:
```yaml
my-organism:
  schema:
    files:
      - name: rawReads
```
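With `rawReads` configured both as a submission category and as an output category, the simplest pipeline behaviour is to pass submitted files straight through. The sketch below illustrates that idea with a hypothetical data shape (category name mapped to a list of file IDs); it is not the actual preprocessing pipeline interface. Categories that are not configured as output categories, such as the hypothetical `notes`, are dropped, since only configured output categories appear in the sequence detail view.

```python
def pass_through(
    submitted: dict[str, list[str]], output_categories: list[str]
) -> dict[str, list[str]]:
    """Hypothetical pipeline step: reuse submitted file IDs as output files.

    Only categories listed under schema.files are kept as output.
    """
    return {
        name: list(file_ids)
        for name, file_ids in submitted.items()
        if name in output_categories
    }


submitted = {"rawReads": ["f1", "f2"], "notes": ["f3"]}
print(pass_through(submitted, ["rawReads"]))  # {'rawReads': ['f1', 'f2']}
```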