FASTA format
The FASTA format is a standard way to store sequence data along with optional metadata. Loculus provides sequence data in FASTA files and expects FASTA-formatted input when sequences are submitted.
Each sequence entry begins with a metadata line starting with the > character. For example:
>TTKC257461 2021-05-12, Congo
TTATGCTTCGTAAAATGTAGGTCTTGAACCAAACATTCTTTGAAAAAATGAGATGCATAAAACTTTATTATCCAATAGATTAACTATTTCAGACGTCAATCGTTTAAAGTAAACTTCGTA
The text immediately following > and extending to the first space (or the end of the line) is the ID of the sequence or segment. In the example above, the ID is TTKC257461.
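The ID rule can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not Loculus's own parser; the function name fasta_id is hypothetical.

```python
def fasta_id(header_line: str) -> str:
    """Return the text between '>' and the first space (or the end of the line).

    Hypothetical helper for illustration; not part of Loculus.
    """
    if not header_line.startswith(">"):
        raise ValueError("a FASTA header line must start with '>'")
    # Drop the leading '>' and any trailing newline, then cut at the first space.
    return header_line[1:].rstrip("\r\n").split(" ", 1)[0]


print(fasta_id(">TTKC257461 2021-05-12, Congo"))  # TTKC257461
```

Everything after the first space (here the date and country) is ignored when the ID is determined.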
For isolates composed of multiple segments, Loculus requires one metadata entry per sample, and every segment must appear as a separate sequence in the uploaded FASTA file.
The metadata file should include a field named fastaIds, containing a space-separated list of all FASTA IDs associated with that sample. For example, if the following three sequences correspond to the metadata entry NIPAK-sample, the fastaIds should be:
test_NIHPAK-19_L test_NIHPAK-19_M test_NIHPAK-19_S
Example sequences:
>test_NIHPAK-19_L
CCACATTGACACAGANAGCTCCAGTAGTGGTTCTCTGTCCTTATTAAACCATGGACTTCTTAAGAAACCTTGACTGGACTCAGGTGATTGCTAGTCAGTATGTGACCAATCCCAGGTTTAATATCTCTGATTACTTCGAGATTGTTCGACAGCCTGAAGCAGATAAGTCTTCACTACTCATGAGTTTC
>test_NIHPAK-19_M
GTGGATTGAGCATCTTAATTGCAGCATACTTGTCAACATCATGCATATATCATTGATGTATGCAGTTTTCTGCTTGCAGCTGTGCGGTCTAGGGAAAACTAACGGACTACACAATGGGACTGAACACAATAAGACACACGTTATGACAACGCCTGATGACAGTCGGAAGAGCTGTGAAATAGACAGTATC
>test_NIHPAK-19_S
GTGTTCTCTTGAGTGTTGGCAAAATGGAAAACAAAATCGAGGTGAACAACAAAGATGAGATGAACAAATGGTTTGAGGAGTTCAAGAAAGGAAATGGACTTGTGGACACTTTCACAAACTCNTATTCCTTTTGTGAAAGCGTNCCAAATCTGGACAGNTTTGTNAATGGAGAAAAGACATAGGCTTCCGTGTCA
A segment cannot be empty.
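Before uploading, it can help to check locally that every ID listed in fastaIds has a matching, non-empty sequence in the FASTA file. The following is a minimal sketch of such a check, assuming the rules described above; the helper names read_fasta and check_fasta_ids are hypothetical, the sequences are shortened for readability, and the validation Loculus itself performs may differ.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Map each FASTA ID to its concatenated sequence (hypothetical helper)."""
    records: dict[str, str] = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The ID runs from '>' to the first space or the end of the line.
            current_id = line[1:].split(" ", 1)[0]
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line
    return records


def check_fasta_ids(fasta_ids: str, records: dict[str, str]) -> list[str]:
    """Return one message per listed ID that is missing or has an empty sequence."""
    problems = []
    for fasta_id in fasta_ids.split():
        if fasta_id not in records:
            problems.append(f"{fasta_id}: no matching sequence in FASTA file")
        elif not records[fasta_id]:
            problems.append(f"{fasta_id}: segment is empty")
    return problems


# Shortened example: the S segment is deliberately left empty to trigger a message.
example = """>test_NIHPAK-19_L
CCACATTGACACAGANAGCTCCAGTAGTGG
>test_NIHPAK-19_M
GTGGATTGAGCATCTTAATTGCAGCATACT
>test_NIHPAK-19_S
"""
records = read_fasta(example)
problems = check_fasta_ids(
    "test_NIHPAK-19_L test_NIHPAK-19_M test_NIHPAK-19_S", records
)
print(problems)  # ['test_NIHPAK-19_S: segment is empty']
```

This sketch does not detect duplicate IDs; a later record with the same ID simply overwrites the earlier one.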