FASTA formatted files

FASTA is a text-based file format for representing biological sequences. A FASTA file stores a list of sequence records, each with a name, an optional description, and a sequence.

The template of a sequence record is as follows, where the trailing ? marks the description as optional:

>{name} {description}?
{sequence}

Here is an example header line for a chromosomal sequence record:

>chrI chromosome 1
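
To make the template concrete, the following is a minimal sketch in plain Julia of how a header line splits into a name and an optional description. The function split_header is a hypothetical helper written for illustration; it is not part of FASTX, whose own parser may treat whitespace differently.

```julia
# Hypothetical helper (not part of FASTX): split a FASTA header line
# into its name and optional description.
function split_header(line::AbstractString)
    startswith(line, ">") || throw(ArgumentError("a FASTA header must start with '>'"))
    # Drop the leading '>' and split at the first space only, so the
    # description may itself contain spaces.
    parts = split(line[2:end], ' '; limit = 2)
    name = String(parts[1])
    description = length(parts) == 2 ? String(parts[2]) : nothing
    return name, description
end

split_header(">chrI chromosome 1")  # returns ("chrI", "chromosome 1")
split_header(">chrI")               # returns ("chrI", nothing)
```

Splitting at the first space only mirrors the template: everything up to the first space is the name, and the rest of the line, if any, is the description.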

Readers and Writers

The reader and writer for FASTA formatted files are found within the FASTA submodule of FASTX.

Both can be constructed from an IOStream:

using FASTX

r = FASTA.Reader(open("my-seqs.fasta", "r"))
w = FASTA.Writer(open("my-out.fasta", "w"))

As always with Julia IO types, remember to close your file readers and writers after you are finished.

Using open with a do-block can help ensure you close a stream after you are finished. open is overloaded with methods that take the reader or writer type for this purpose.

r = open(FASTA.Reader, "my-seqs.fasta")
w = open(FASTA.Writer, "my-out.fasta")

Usually sequence records will be read sequentially from a file by iteration.

open(FASTA.Reader, "my-seqs.fasta") do reader
    for record in reader
        # Do something with each record,
        # like showing the identifiers
        @show FASTA.identifier(record)
    end
end

Gzip-compressed files can be streamed to the Reader using the CodecZlib.jl package.

using FASTX, CodecZlib
reader = FASTA.Reader(GzipDecompressorStream(open("my-reads.fasta.gz")))
for record in reader
    # Do something
end
close(reader)

You can also overwrite a single preallocated record in a while loop to avoid excessive memory allocation.

open(FASTA.Reader, "my-seqs.fasta") do reader
    record = FASTA.Record()  # allocated once, overwritten on every read
    while !eof(reader)
        read!(reader, record)
        # Do something.
    end
end

If the FASTA file has an auxiliary index file in the fai format, the reader supports random access to FASTA records, which is useful when accessing specific parts of a huge genome sequence:

open(FASTA.Reader, "sacCer.fa", index = "sacCer.fa.fai") do reader
    chrIV = reader["chrIV"]  # directly read the record named chrIV.
end

Reading in a sequence from a FASTA formatted file will give you a variable of type FASTA.Record.

Various getters and setters are available for FASTA.Record values, for example FASTA.identifier, used in the iteration example above.
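
As a small, hedged illustration of the getters, the sketch below builds a record in memory and reads two of its fields back. FASTA.identifier appears earlier on this page; FASTA.sequence is assumed to be available under that name, and accessor names and return types may differ between FASTX versions.

```julia
using FASTX, BioSequences

# Build a record in memory, as in the writing example on this page.
rec = FASTA.Record("MySeq", dna"ACGT")

# Getter used earlier on this page: the record's identifier.
@show FASTA.identifier(rec)

# Assumed getter: the record's sequence (its return type depends on the FASTX version).
@show FASTA.sequence(rec)
```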

To write a BioSequence to a FASTA file, you first have to create a FASTA.Record:

using BioSequences
x = dna"aaaaatttttcccccggggg"
rec = FASTA.Record("MySeq", x)
# The do-block receives the writer as w and closes it on exit.
open(FASTA.Writer, "my-out.fasta") do w
    write(w, rec)
end