Setup with Kubernetes
Prerequisites
Before you get started with deploying Loculus, there are a few prerequisites you’ll need to ensure are in place.
Kubernetes Cluster with Traefik
To deploy Loculus, you’ll currently need a Kubernetes cluster with Traefik installed. Kubernetes is an open-source container orchestration system, and Traefik is a HTTP reverse proxy and load balancer that is used to route traffic to the Loculus application.
Production Environment
For a production environment, if you are deploying on a bare server you can use k3s. K3s is a lightweight, certified Kubernetes distribution designed for production workloads. You could also use a managed Kubernetes environment at a cloud provider like Digital Ocean, Vultr, AWS, etc.
Local Development
If you’re working on local development or testing, you can use k3d to instantly create a disposable k3s like cluster using Docker.
Helm
Helm is a package manager for Kubernetes that simplifies the deployment and management of applications. It uses charts, which are packages of pre-configured Kubernetes resources, to define and install applications in a Kubernetes cluster.
To deploy Loculus, you’ll need to have Helm installed. Helm will be used to manage the dependencies and deploy the Loculus application using the provided Helm chart.
External Database
By default, the provided Helm chart will create temporary databases for testing and development purposes. These temporary databases are suitable for initial setup and experimentation.
However, for a production deployment, you must use a permanent database. We recommend using a managed database service like Amazon RDS, Google Cloud SQL, or DigitalOcean Managed Databases, or you can run your own database server, but you must not use the built in database for production.
To use an external database, you’ll need to provide the necessary connection details, such as the database URL, username, and password.
These details are configured in the secrets
section of the values.yaml
file.
You can also use sealed secrets, see the Sealed Secrets section for more information.
Clone the repository
The helm chart for deploying pathoplexus is within the Loculus repo, in the kubernetes
subdirectory. You can adjust the kubernetes/loculus/values.yaml
file to change the default values.
Configure ingress
[We need to write about how to do this]
Organism Configuration
Loculus supports multiple organisms, each with its own configuration. The organisms section in the values.yaml file allows you to define the specific settings for each organism. See here for a list of the available config fields.
Here’s an example of how the organism configuration might look:
In this example, the configuration for the “ebolavirus-sudan” organism is defined. It includes schema settings, website display options, silo configuration, preprocessing details, and reference genome information.
Note the metadata section includes various fields for how the metadata of specific sequences should be displayed. Each metadata item must have a name
which will also be displayed on the page unless displayName
is also set. The type
of the data, as well as if the field is required
and if autoComplete
is enabled can also be added. Additionally, links from metadata entries to external websites can be added using the customDisplay
option. We also allow metadata to be grouped in sections, specified by the header
field. The noInput
parameter specifies that a parameter is generated internally by loculus (can be specified in the preprocessing pipeline) and should not be expected as input. You can optionally add a columnWidth
for the column in the search table, in pixels, which will be the minimal width for the column.
Your preprocessing pipeline can be customized for each organism. Currently, we use nextclade run
in our preprocessing pipeline and we suggest it as a fast option to do basic checks on your input sequences. Given a nextclade dataset
(in its simplest form a reference sequence and a gene_annotation file) nextclade tries to align new sequences to the reference and will discard sequences that cannot be aligned. It will also compute mutations, insertions and deletions for the nucleotide sequence as well as for the corresponding genes. If you would like to use our preprocessing set-up you can add a nextclade dataset to your values.yaml
as follows:
Additionally, the tableColumns
section defines which metadata fields are shown as columns in the search results.
You can add multiple organisms under the organisms section, each with its own unique configuration.
Multi-segmented pathogens
In Loculus, sequence data from multi-segmented viruses is stored in accessioned sequence entries which group together the segments from a particular sample or isolate. Multi-segmented organisms should be annotated with a list with the names of the segments supplied as nucleotideSequences
. For CCHFV this looks like:
Additionally, if you are using the preprocessing or ingest pipelines, nucleotideSequences
must also be defined in those sections of the config.
Metadata fields can be isolate- or segment-specific. By default we assume metadata fields are isolate-specific (i.e. are shared across all segments), therefore segment-specific fields must be marked as perSegment
in the config file. Marking a field as perSegment
will result in that field existing for each segment. In the example above, instead of there being one metadata field called length
there will now be three fields called length_L
, length_M
and length_S
.
Loculus expects multi-segmented pathogen sequences to be submitted in a specific format. Fasta files should have a separate entry/record for each segment, with a Fasta header of >[submissionID]_[segmentName]
, e.g. >sample123_L
for the L
segment of the sample with the submissionID sample123
. Metadata is uploaded for an entire sequence entry, rather than per segment, i.e. there will be only one row for each submissionID
.
Secrets
Our secrets configuration supports three types of secrets.
raw
This is the simplest type of secret, it is just a key value pair.
sealedsecret
This is a sealed secret, it is encrypted and can only be decrypted by the cluster.
autogen
This is a secret that is automatically generated by the helm chart.
Ready to Deploy?
Once you have the prerequisites in place and have configured the values.yaml
file according to your requirements, you’re ready to deploy Loculus using the provided Helm chart.
Run helm install loculus kubernetes/loculus -f kubernetes/loculus/values.yaml
You can use helm status loculus
to see how the deployment has gone.