Search sequences
Mutations
Nucleotide mutations and insertions
A nucleotide mutation has the format <position><base> or <base_ref><position><base>. A <base> can be one of the four nucleotides A, T, C, and G. It can also be - for a deletion or N for an unknown base. For example, if the reference sequence has A at position 23, both 23T and A23T will yield the same results.
If your organism is multi-segmented, you must prepend the name of the segment, followed by a colon, to the mutation, e.g. S:23T and S:A23T for a mutation in segment S.
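As an illustration of the syntax above, the following Python sketch splits a nucleotide mutation string into its parts. The regular expression, the NucMutation type, and parse_nuc_mutation are hypothetical names made up for this example and are not part of Loculus. The sketch also assumes that the reference base is one of A, T, C, G and that segment names contain no colon.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for [segment:][base_ref]position base (not Loculus's own parser).
NUC_MUTATION = re.compile(
    r"^(?:(?P<segment>[^:]+):)?"  # optional segment name, e.g. "S:"
    r"(?P<ref>[ACGT])?"           # optional reference base (assumed to be A, C, G or T)
    r"(?P<position>\d+)"          # position in the reference sequence
    r"(?P<base>[ACGTN-])$"        # new base, "-" for deletion, "N" for unknown
)


class NucMutation(NamedTuple):
    segment: Optional[str]
    ref: Optional[str]
    position: int
    base: str


def parse_nuc_mutation(text: str) -> NucMutation:
    """Split a query such as "23T" or "S:A23T" into its components."""
    match = NUC_MUTATION.match(text)
    if match is None:
        raise ValueError(f"not a nucleotide mutation: {text!r}")
    return NucMutation(
        match["segment"], match["ref"], int(match["position"]), match["base"]
    )
```

With this sketch, parse_nuc_mutation("23T") and parse_nuc_mutation("A23T") differ only in the optional ref field, which mirrors the statement that both forms yield the same results.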
Insertions can be searched for in the same manner; they just need the prefix ins_ at the start of the mutation. Example: ins_10462:A, or ins_S:10462:A if the organism is multi-segmented.
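A minimal sketch of how such an insertion query could be assembled programmatically is shown below. The helper format_insertion is a hypothetical name for this example only. It simply follows the two example strings above and is not an API provided by Loculus.

```python
from typing import Optional


def format_insertion(position: int, inserted: str, segment: Optional[str] = None) -> str:
    """Build an insertion query such as "ins_10462:A" or "ins_S:10462:A"."""
    if position < 0:
        raise ValueError("position must not be negative")
    if not inserted:
        raise ValueError("the inserted sequence must not be empty")
    # The segment name is only included for multi-segmented organisms.
    prefix = f"{segment}:" if segment else ""
    return f"ins_{prefix}{position}:{inserted}"
```

For example, format_insertion(10462, "A") reproduces the single-segment example, and passing segment="S" reproduces the multi-segmented one.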
Amino acid mutations and insertions
An amino acid mutation has the format <gene>:<position><base> or <gene>:<base_ref><position><base>. Here a <base> can be one of the 20 amino acid codes. It can also be - for a deletion or X for an unknown amino acid. Example: E:57Q.
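The amino acid format can be checked in the same spirit. The sketch below is only an illustration: AA_MUTATION, AaMutation, and parse_aa_mutation are hypothetical names, and the set of accepted letters is assumed to be the 20 standard one-letter codes plus X and -.

```python
import re
from typing import NamedTuple, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes

# Hypothetical pattern for gene:[base_ref]position base (not Loculus's own parser).
AA_MUTATION = re.compile(
    rf"^(?P<gene>[^:]+):"           # gene name is mandatory, e.g. "E:"
    rf"(?P<ref>[{AMINO_ACIDS}])?"   # optional reference amino acid
    rf"(?P<position>\d+)"           # position within the gene
    rf"(?P<aa>[{AMINO_ACIDS}X-])$"  # new amino acid, "-" deletion, "X" unknown
)


class AaMutation(NamedTuple):
    gene: str
    ref: Optional[str]
    position: int
    aa: str


def parse_aa_mutation(text: str) -> AaMutation:
    """Split a query such as "E:57Q" into gene, reference, position and amino acid."""
    match = AA_MUTATION.match(text)
    if match is None:
        raise ValueError(f"not an amino acid mutation: {text!r}")
    return AaMutation(match["gene"], match["ref"], int(match["position"]), match["aa"])
```

Unlike the nucleotide case, the gene name is required here, so a string without the <gene>: part is rejected.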
Insertions can be searched for in the same manner; they just need the prefix ins_ at the start of the mutation. Example: ins_NS4B:31:N.
Insertion wildcards
Loculus supports insertion queries that contain the wildcard character ? (a question mark). For example, ins_S:214:?EP? will match all cases where gene S has an insertion of EP between positions 214 and 215, as well as longer insertions of amino acids that include EP, e.g. the insertion EPE will also be matched.
You can also use wildcards to match any insertion at a given position. For example, ins_S:214:?: will match any insertion (of at least one character) between positions 214 and 215.
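The wildcard behaviour described above can be approximated with a regular expression. The sketch below assumes that ? stands for any run of zero or more characters, which is inferred from the EP and EPE examples rather than taken from the Loculus implementation. The function name insertion_matches is hypothetical.

```python
import re


def insertion_matches(pattern: str, inserted: str) -> bool:
    """Check an inserted sequence against a wildcard pattern such as "?EP?".

    Assumption: each "?" matches zero or more arbitrary characters.
    """
    if not inserted:
        # An insertion always consists of at least one character.
        return False
    # Escape the literal parts and join them with ".*" where the wildcards were.
    literal_parts = [re.escape(part) for part in pattern.split("?")]
    regex = ".*".join(literal_parts)
    return re.fullmatch(regex, inserted) is not None
```

Under this assumption, the pattern ?EP? accepts EP and EPE but rejects an insertion that lacks EP, and the pattern ? accepts every non-empty insertion.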
Multiple mutations
Multiple mutation filters can be provided by adding one mutation after the other.
Any mutation
To filter for any mutation at a given position, you can omit the <base>.
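To make the effect of omitting the <base> concrete, here is a toy filter over a list of nucleotide mutation strings for a single-segment organism. It is a sketch under stated assumptions: filter_mutations and both patterns are hypothetical, and a real search returns sequences rather than mutation strings.

```python
import re
from typing import Iterable, List

# Query: a position with an optional base, e.g. "23" or "23T" (hypothetical pattern).
QUERY = re.compile(r"^(?P<position>\d+)(?P<base>[ACGTN-])?$")
# Observed mutation: optional reference base, position, new base, e.g. "A23T".
OBSERVED = re.compile(r"^[ACGT]?(?P<position>\d+)(?P<base>[ACGTN-])$")


def filter_mutations(query: str, mutations: Iterable[str]) -> List[str]:
    """Return the observed mutations selected by the query."""
    q = QUERY.match(query)
    if q is None:
        raise ValueError(f"not a valid query: {query!r}")
    selected = []
    for mutation in mutations:
        m = OBSERVED.match(mutation)
        if m is None:
            continue  # skip strings that are not nucleotide mutations
        same_position = int(m["position"]) == int(q["position"])
        # A query without a base accepts every base at that position.
        same_base = q["base"] is None or m["base"] == q["base"]
        if same_position and same_base:
            selected.append(mutation)
    return selected
```

Given the observed mutations A23T, A23G, and C45T, the query 23 selects the first two, while 23T selects only A23T.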
Date ranges
Dates such as the collection date of a sequence can’t always be given as a single exact day; sometimes they are only known as a date range (“This sequence was collected sometime during this week”). When searching sequences by collection date, there are two ways to decide whether a sequence falls into the search range: require complete overlap or accept partial overlap.
The graphic above illustrates this. In Loculus these two modes are called “strict” and “not strict”. Strict means that the date range of the sequence must be wholly contained in the search range. Not strict means that a partial overlap is sufficient.
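The two modes can be expressed as simple interval comparisons. The following Python sketch assumes that all range bounds are inclusive, which the text above does not specify. The function name in_search_range and its parameters are hypothetical.

```python
from datetime import date


def in_search_range(
    seq_start: date,
    seq_end: date,
    search_start: date,
    search_end: date,
    strict: bool,
) -> bool:
    """Decide whether a sequence's date range falls into the search range.

    Assumption: both ranges include their start and end dates.
    """
    if strict:
        # Complete overlap: the sequence range lies entirely inside the search range.
        return search_start <= seq_start and seq_end <= search_end
    # Partial overlap: the two ranges share at least one day.
    return seq_start <= search_end and search_start <= seq_end
```

For instance, a sequence collected sometime between 28 February and 3 March falls into a search for March only in the "not strict" mode, because part of its range lies in February.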