Glossary

Accession

An accession is the unique identifier of a sequence entry. The accession itself does not contain the version number. The field that concatenates the accession and the version (<accession>.<version>) is called accessionVersion.
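As an illustration of the relationship between the two fields, the following minimal Python sketch splits an accessionVersion into its parts and joins them again. The helper names and the example accession `EXAMPLE_0001` are hypothetical and not part of Loculus; the sketch also assumes that the version is a plain positive integer written after the last dot.

```python
from typing import NamedTuple


class AccessionVersion(NamedTuple):
    accession: str  # identifier without the version number
    version: int    # version of the sequence entry


def parse_accession_version(value: str) -> AccessionVersion:
    """Split '<accession>.<version>' at the last dot (assumed layout)."""
    accession, separator, version = value.rpartition(".")
    if not separator or not accession or not (version.isascii() and version.isdigit()):
        raise ValueError(f"not an accessionVersion: {value!r}")
    return AccessionVersion(accession, int(version))


def format_accession_version(accession: str, version: int) -> str:
    """Concatenate accession and version into an accessionVersion."""
    return f"{accession}.{version}"


# 'EXAMPLE_0001' is a made-up accession used only for illustration.
parsed = parse_accession_version("EXAMPLE_0001.2")
print(parsed.accession, parsed.version)  # EXAMPLE_0001 2
```

Splitting at the last dot rather than the first keeps the sketch working even if an accession were to contain a dot itself, which is an assumption rather than a documented property.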

Aligned sequence

An aligned sequence is a sequence that has been aligned to a reference sequence; as a result, it has the same length as the reference sequence. It is the task of the preprocessing pipeline to perform the alignment.

Backend

The “Loculus backend” is the central server service of Loculus and is responsible for managing submissions and ensuring data persistence. Among other things, it offers APIs to submit and revise data. For querying and retrieving data, LAPIS is usually used. The backend is written in Kotlin and uses the Spring framework.

Deletion

A deletion is a type of mutation where a nucleotide or amino acid is present in a reference sequence but not present in the sample sequence. The notation for a deletion in the case of a single-segmented nucleotide sequence is <base of reference genome><position>- (e.g., C100-). A deletion in the case of a multi-segmented nucleotide sequence or an amino acid sequence is further prefixed with the segment or gene name by adding <segment/gene name>: (e.g., E:S100-).

Insertion

An insertion is a type of mutation where one or more nucleotides or amino acids are present in a sample sequence but not in a reference sequence. The notation for an insertion in the case of a single-segmented nucleotide sequence is ins_<position>:<inserted bases> (e.g., ins_100:AAT). An insertion in the case of a multi-segmented nucleotide sequence or an amino acid sequence further contains <segment/gene name>: in front of the position (e.g., ins_E:100:AAT).
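The insertion notation can be read mechanically. The sketch below is one possible parser, written only from the two forms described above; the names `Insertion` and `parse_insertion` are hypothetical, and the set of allowed symbols (letters and `*`) is an assumption.

```python
import re
from typing import NamedTuple, Optional

# Optional '<segment/gene name>:' prefix, then '<position>:<inserted symbols>'.
# The allowed symbol set (letters and '*') is an assumption of this sketch.
INSERTION_PATTERN = re.compile(
    r"ins_(?:(?P<name>[^:]+):)?(?P<position>[0-9]+):(?P<inserted>[A-Za-z*]+)"
)


class Insertion(NamedTuple):
    name: Optional[str]  # segment or gene name, None if the notation has no prefix
    position: int
    inserted: str


def parse_insertion(text: str) -> Insertion:
    """Parse 'ins_100:AAT' or 'ins_E:100:AAT' into its components."""
    match = INSERTION_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not an insertion: {text!r}")
    return Insertion(match["name"], int(match["position"]), match["inserted"])


print(parse_insertion("ins_100:AAT"))
# Insertion(name=None, position=100, inserted='AAT')
print(parse_insertion("ins_E:100:AAT"))
# Insertion(name='E', position=100, inserted='AAT')
```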

Instance

An instance is a specific deployment of Loculus. Each instance operates independently, with its own set of data, user management, and preprocessing pipelines.

Keycloak

Keycloak is an open-source identity and access management software. Loculus uses it to manage user accounts and authentication.

LAPIS

LAPIS is an open-source software for querying genomic sequences. It provides convenient APIs to filter and download data and get aggregated information. It uses SILO for the main computations. Users may directly use the LAPIS APIs to retrieve data and there is an R package under development. In Loculus, there is a LAPIS instance for each organism.

Metadata

Metadata refers to sequence entry-specific information. Some metadata are provided by the submitters (typical fields include sampling location and time and information about the host), whereas other metadata can be derived from a sequence by the preprocessing pipeline (e.g., the lineage) or appended by Loculus (e.g., the submission date). Metadata fields are configurable, and different Loculus instances or different organisms within an instance may have different fields.

Mutation

A mutation is a change in the nucleotide or amino acid sequence of a sample relative to the reference sequence. We distinguish between substitutions, deletions and insertions.
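Because an aligned sequence has the same length as the reference sequence, substitutions and deletions can be found by comparing the two position by position. The following sketch shows the idea for a single-segmented nucleotide sequence. It is not the algorithm used by any Loculus preprocessing pipeline: the function name is hypothetical, positions are assumed to be 1-based, and a deleted position is assumed to be written as `-` in the aligned sequence. Insertions cannot be represented in an aligned sequence of fixed length and are therefore not covered.

```python
from typing import List


def call_point_mutations(reference: str, aligned: str) -> List[str]:
    """List substitutions and deletions of an aligned sample relative to the reference."""
    if len(reference) != len(aligned):
        raise ValueError("an aligned sequence must have the length of the reference")
    mutations = []
    # Assumption: positions are counted from 1.
    for position, (ref_base, sample_base) in enumerate(zip(reference, aligned), start=1):
        if sample_base != ref_base:
            # A '-' in the sample yields the deletion form, e.g. 'G3-';
            # any other differing symbol yields the substitution form, e.g. 'C2T'.
            mutations.append(f"{ref_base}{position}{sample_base}")
    return mutations


print(call_point_mutations("ACGT", "AT-T"))  # ['C2T', 'G3-']
```

The sketch reports every differing symbol as a mutation. A real pipeline would likely treat ambiguous symbols such as `N` separately.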

Nextclade

Nextclade is an open-source software for sequence alignment, clade and mutation calling and sequence quality checks for viral data. Loculus provides a preprocessing pipeline that uses Nextclade.

Nucleotide sequences and amino acid sequences

Users upload unaligned nucleotide sequences. The preprocessing pipeline aligns the sequences to an organism-specific reference genome and translates them to amino acid sequences.

Organism

A Loculus instance is capable of storing data from multiple organisms. Organisms are independent of each other: they may have different metadata fields, use different preprocessing pipelines and different reference sequences.

Preprocessing pipeline

A preprocessing pipeline takes submitter-provided data for a specific organism, adds alignments, translations, and annotations, and identifies errors both in metadata and sequences.

Processed data

Processed data is generated by the preprocessing pipeline based on the unprocessed data and contains both sequences and metadata. Processed data usually includes derived information such as sequence alignments, translations and lineages. The preprocessing pipeline will also “clean” the unprocessed data (typically this entails formatting metadata fields in a standard way and flagging potential errors in the metadata). Users can usually only see the processed data.

Reference sequences

Each organism has its own reference sequence(s) which are used for alignment, enabling easier comparison of sequences. It is customary to choose a reference sequence which has been accepted as a standard by the research community.

Revision

A revision adds an updated version of a sequence entry.

Revocation

A revocation adds a new version that declares a sequence entry to be revoked. Revoked sequences are still publicly available but are highlighted as revoked.

Schema

A schema is a part of the configuration and describes the data structure of an instance. It includes the list of organisms and, for each organism, the available metadata fields and segments.

Segment

A segment refers to a part of a genome. Some viruses only have one segment (e.g., SARS-CoV-2 and mpox) while others have multiple (e.g., Influenza A has 8 segments). Loculus supports both single- and multi-segmented organisms. In terms of the data structure, each segment is a nucleotide sequence; i.e., for a multi-segmented organism, there are multiple nucleotide sequences per sequence entry.

Sequence entry

A sequence entry consists of a genome sequence (or sequences if the organism has a segmented genome) and associated metadata. It is the main entity of the Loculus application. Users submit sequence entries and search for sequence entries. Each sequence entry has its own accession. Changes to sequence entries are versioned, meaning that a sequence entry can have multiple versions.

SILO

SILO is an open-source query engine for genomic sequences. It is usually used together with LAPIS which provides more convenient APIs. In Loculus, there is a SILO instance for each organism.

Submission

A submission adds new sequence entries. See also revision and revocation.

Submission ID

When users upload sequence entries, they have to provide a submission ID to link the entries in the metadata file and the FASTA file. Each submission ID must be unique within the submission, but re-use across submissions is acceptable.
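The uniqueness and linking rules can be checked before uploading. The sketch below is a hypothetical pre-upload check, not the validation performed by Loculus: the function name is invented, the IDs are assumed to have already been read from the metadata file and the FASTA headers, and exactly one FASTA record per submission ID is assumed (the single-segmented case).

```python
from collections import Counter
from typing import Iterable, List


def check_submission_ids(metadata_ids: Iterable[str], fasta_ids: Iterable[str]) -> List[str]:
    """Return human-readable problems; an empty list means the IDs line up."""
    metadata_list = list(metadata_ids)
    fasta_list = list(fasta_ids)
    problems = []

    # Each submission ID must be unique within the submission.
    for source, ids in (("metadata", metadata_list), ("FASTA", fasta_list)):
        for submission_id, count in Counter(ids).items():
            if count > 1:
                problems.append(f"{submission_id} appears {count} times in the {source} file")

    # Each ID should link one metadata row to one sequence (assumption of this sketch).
    for submission_id in sorted(set(metadata_list) - set(fasta_list)):
        problems.append(f"{submission_id} has metadata but no sequence")
    for submission_id in sorted(set(fasta_list) - set(metadata_list)):
        problems.append(f"{submission_id} has a sequence but no metadata")
    return problems


print(check_submission_ids(["a", "a", "b"], ["a"]))
# ['a appears 2 times in the metadata file', 'b has metadata but no sequence']
```

The check looks at a single submission only, since re-using an ID across submissions is acceptable.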

Submitter

A submitter is a user who submitted (or revised or revoked) a sequence.

Submitting group

In Loculus, every sequence entry belongs to a submitting group. A submitting group can have one or more users and a user may be a member of multiple groups. A member of a group may submit new sequences or revise or revoke existing sequences on behalf of the group.

Substitution

A substitution is a type of mutation where at a given position in a sample a nucleotide or amino acid differs from the reference sequence. The notation for a substitution in the case of a single-segmented nucleotide sequence is <base of reference genome><position><base of the sequence> (e.g., C100T). A substitution in the case of a multi-segmented nucleotide sequence or an amino acid sequence is further prefixed with the segment or gene name by adding <segment/gene name>: (e.g., E:S100K).
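Substitutions and deletions share the same shape and differ only in the final symbol, which is `-` for a deletion. The sketch below parses both forms. It is an illustration based solely on the notation described in this glossary; the names `PointMutation` and `parse_point_mutation` are hypothetical, and the allowed symbol set (letters and `*`) is an assumption.

```python
import re
from typing import NamedTuple, Optional

# Optional '<segment/gene name>:' prefix, reference symbol, position, sample symbol.
# A trailing '-' marks a deletion. The symbol set is an assumption of this sketch.
POINT_MUTATION_PATTERN = re.compile(
    r"(?:(?P<name>[^:]+):)?(?P<reference>[A-Za-z*])(?P<position>[0-9]+)(?P<sample>[A-Za-z*-])"
)


class PointMutation(NamedTuple):
    name: Optional[str]  # segment or gene name, None if the notation has no prefix
    reference: str
    position: int
    sample: str

    @property
    def is_deletion(self) -> bool:
        return self.sample == "-"


def parse_point_mutation(text: str) -> PointMutation:
    """Parse 'C100T', 'E:S100K', 'C100-' or 'E:S100-' into its components."""
    match = POINT_MUTATION_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a substitution or deletion: {text!r}")
    return PointMutation(
        match["name"], match["reference"], int(match["position"]), match["sample"]
    )


print(parse_point_mutation("C100T"))
# PointMutation(name=None, reference='C', position=100, sample='T')
print(parse_point_mutation("E:S100-").is_deletion)  # True
```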

Superuser

A superuser is a user role. Superusers have the privileges to act on behalf of any submitting group. This role is designed to be used by curators.

Unaligned sequence

An unaligned sequence is a sequence that has not undergone alignment. It may or may not have the same length as the reference sequence. Generally, users upload unaligned sequences.

Unprocessed data

The unprocessed data consists of the original submissions, including unaligned sequences and their accompanying metadata. Unprocessed data needs to be processed by the preprocessing pipeline. Users can usually only see the processed data.

Version

Sequence entries are versioned and every revision creates a new version. The first version is 1.

Website

The “Loculus website” is the frontend part of Loculus. It interacts with the backend and LAPIS through their APIs. The website is written in TypeScript and uses the frameworks Astro and React.