FASTA formatted files

NB: First read the overview in the sidebar

FASTA is a text-based file format for representing biological sequences. A FASTA file stores a list of sequence records, each consisting of a description (whose first word is the record's identifier, or name) and a sequence.

The template of a sequence record is:

>{description}
{sequence}

Here, the "identifier" is the part of the description up to the first whitespace character, or the entire description if it contains no whitespace.

Here is an example of a chromosomal sequence:

>chrI chromosome 1
CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACC
CACACACACACATCCTAACACTACCCTAACACAGCCCTAATCTA

Here:

The identifier is "chrI"
The description is "chrI chromosome 1", containing the identifier
The sequence is the DNA sequence "CCACA...", formed by concatenating both sequence lines
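The identifier rule can be sketched in plain Julia. `fasta_identifier` is a hypothetical helper written for illustration, not part of FASTX, which provides its own `identifier` function:

```julia
# Hypothetical helper (not part of FASTX): derive the identifier from a
# description by taking everything up to the first whitespace character.
function fasta_identifier(description::AbstractString)
    m = findfirst(isspace, description)
    # No whitespace: the whole description is the identifier
    m === nothing && return String(description)
    return String(description[1:prevind(description, m)])
end

println(fasta_identifier("chrI chromosome 1"))  # prints chrI
println(fasta_identifier("chrI"))               # prints chrI
```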

The `FASTARecord`

FASTA records are, by design, very lax in what they can contain. They may hold almost arbitrary byte sequences, including invalid Unicode and trailing whitespace on sequence lines, which is interpreted as part of the sequence. If you need more certainty about the content, either check the sequences with a regex or, preferably, convert them to the desired BioSequence type.
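A minimal sketch of the regex option, in plain Julia. The allowed alphabet (A, C, G, T, N, case-insensitive) and the name `is_plain_dna` are assumptions for illustration; the preferred conversion to a BioSequence type is not shown here. Note the `\A`/`\z` anchors: PCRE's `$` would also accept a trailing newline.

```julia
# Hypothetical check: accept only A, C, G, T and N (case-insensitive).
# Adjust the character class to the alphabet your data actually uses.
is_plain_dna(seq::AbstractString) = occursin(r"\A[ACGTN]*\z"i, seq)

is_plain_dna("TAGCCN")   # true
is_plain_dna("TAqACC")   # false: 'q' is not a nucleotide
is_plain_dna("TAG ")     # false: trailing whitespace is part of the sequence
```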

FASTX.FASTA.Record — Type

FASTA.Record

Mutable struct representing a FASTA record as parsed from a FASTA file. The content of the record can be queried with the following functions: identifier, description, sequence.

FASTA records are untyped, i.e. they are agnostic to what kind of data they contain.

Examples

julia> rec = parse(FASTARecord, ">some header\nTAqA\nCC");

julia> identifier(rec)
"some"

julia> description(rec)
"some header"

julia> sequence(rec)
"TAqACC"

julia> typeof(description(rec)) == typeof(sequence(rec)) <: AbstractString
true
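To make the "untyped record" idea concrete, here is a toy model in plain Julia. `ToyRecord`, `parse_toy` and `toy_identifier` are hypothetical names, and this is a simplification rather than how FASTX stores records internally; it only mirrors the behaviour shown in the example above, where sequence lines are concatenated verbatim.

```julia
# Hypothetical toy model of an untyped FASTA record (not FASTX's Record):
# description and sequence are stored as plain strings.
mutable struct ToyRecord
    description::String
    sequence::String
end

# Parse one record: a header line, then sequence lines concatenated as-is
# (no validation, mirroring FASTA's laxness).
function parse_toy(text::AbstractString)
    lines = split(text, '\n')
    startswith(lines[1], ">") || error("record must start with '>'")
    return ToyRecord(String(lines[1][2:end]), join(lines[2:end]))
end

# Identifier: the description up to the first whitespace character
toy_identifier(r::ToyRecord) = String(first(split(r.description, r"\s"; limit = 2)))

rec = parse_toy(">some header\nTAqA\nCC")
toy_identifier(rec)   # "some"
rec.description       # "some header"
rec.sequence          # "TAqACC"
```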

`FASTAReader` and `FASTAWriter`

FASTAWriter accepts an optional keyword argument width (default 70) that controls the line width of sequences. If width is zero or negative, each record's sequence is written on a single line; otherwise, sequence lines are wrapped at the given maximum width.
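The wrapping rule can be sketched as a small plain-Julia function. `wrap_sequence` is a hypothetical helper that only illustrates the width semantics; it is not the FASTX implementation.

```julia
# Sketch of the wrapping rule (hypothetical helper, not FASTX code):
# width < 1 means "write the sequence on one line".
function wrap_sequence(seq::AbstractString, width::Integer)
    width < 1 && return String(seq)
    chars = collect(seq)
    # Slice the characters into chunks of at most `width`
    chunks = [String(chars[i:min(i + width - 1, end)]) for i in 1:width:length(chars)]
    return join(chunks, '\n')
end

print(wrap_sequence("CCACACCACA", 4))
# CCAC
# ACCA
# CA
```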

Reference:

FASTX.FASTA — Module

FASTA

Module under FASTX with code related to FASTA files.

FASTX.FASTA.Reader — Type

FASTA.Reader(input::IO; index=nothing, copy::Bool=true)

Create a buffered data reader of the FASTA file format. The reader is a BioGenerics.IO.AbstractReader, a stateful iterator of FASTA.Record. Readers take ownership of the underlying IO: mutating or closing the underlying IO other than through the reader is undefined behaviour. Closing the Reader also closes the underlying IO.

See more examples in the FASTX documentation.

Arguments

input: data source
index: Optional random-access index (currently, only the fai format is supported). index can be nothing; a FASTA.Index; an IO, from which an index will be parsed; or an AbstractString, which is treated as the path to a fai file.
copy::Bool: If true (the default), iterating returns a fresh copy of each record. Set to false for improved performance, but be aware that iteration then returns the same Record every time and mutates it in place.
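The risk behind copy=false can be illustrated without FASTX. The sketch below is a hypothetical analogy: `Slot` and `ReusingIter` stand in for a non-copying reader that overwrites one shared object on each step. Collecting such an iterator keeps several references to the same object, so only the last value survives.

```julia
# Hypothetical analogy (not FASTX code) for the copy=false trade-off.
mutable struct Slot
    seq::String
end

struct ReusingIter
    seqs::Vector{String}
    slot::Slot
end

Base.length(it::ReusingIter) = length(it.seqs)
Base.eltype(::Type{ReusingIter}) = Slot

function Base.iterate(it::ReusingIter, i::Int = 1)
    i > length(it.seqs) && return nothing
    it.slot.seq = it.seqs[i]      # mutate the shared object in place
    return (it.slot, i + 1)
end

# Collecting without copying: both entries are the same object
aliased = collect(ReusingIter(["TAG", "AGA"], Slot("")))
[s.seq for s in aliased]          # ["AGA", "AGA"]

# Copying each item as it is produced preserves every value
copied = [deepcopy(s) for s in ReusingIter(["TAG", "AGA"], Slot(""))]
[s.seq for s in copied]           # ["TAG", "AGA"]
```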

Examples

julia> rdr = FASTAReader(IOBuffer(">header\nTAG\n>another\nAGA"));

julia> records = collect(rdr); close(rdr);

julia> foreach(println, map(identifier, records))
header
another

julia> foreach(println, map(sequence, records))
TAG
AGA
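For intuition about what a streaming reader does, here is a much simplified plain-Julia sketch that returns `(description, sequence)` tuples. `read_simple_fasta` is a hypothetical function; unlike FASTX's Reader it has no buffering, indexing or record reuse, and it handles only the basic layout.

```julia
# Hypothetical line-based reader sketch (not FASTX's Reader).
function read_simple_fasta(io::IO)
    records = Tuple{String,String}[]
    desc = nothing
    seq = IOBuffer()
    for line in eachline(io)
        if startswith(line, ">")
            # A new header ends the previous record, if any
            desc === nothing || push!(records, (desc, String(take!(seq))))
            desc = line[2:end]
        elseif desc === nothing
            isempty(line) || error("sequence data before the first header")
        else
            write(seq, line)      # sequence lines are concatenated as-is
        end
    end
    desc === nothing || push!(records, (desc, String(take!(seq))))
    return records
end

recs = read_simple_fasta(IOBuffer(">header\nTAG\n>another\nAGA"))
foreach(println, first.(recs))    # header, another
foreach(println, last.(recs))     # TAG, AGA
```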

FASTX.FASTA.Writer — Type

FASTA.Writer(output::IO; width=70)

Create a data writer of the FASTA file format. The writer is a BioGenerics.IO.AbstractWriter. Writers take ownership of the underlying IO: mutating or closing the underlying IO other than through the writer is undefined behaviour. Closing the writer also closes the underlying IO.

See more examples in the FASTX documentation.

Arguments

output: Data sink to write to
width: Wrapping width of sequence characters. If < 1, no wrapping.

Examples

julia> FASTA.Writer(open("some_file.fna", "w")) do writer
    write(writer, record) # a FASTA.Record
end
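The do-block form is convenient because the writer, and with it the underlying IO, is closed once the block finishes. The general pattern can be sketched in plain Julia; `with_closing` is a hypothetical name, and this is an assumption about the shape of such a method, not FASTX's actual code.

```julia
# Hypothetical sketch of a do-block helper that guarantees cleanup:
# run the user's function, then close the resource even if it throws.
function with_closing(f, io::IO)
    try
        return f(io)
    finally
        close(io)
    end
end

buf = IOBuffer()
with_closing(buf) do io
    write(io, ">rec\nTAG\n")
end
isopen(buf)   # false: the buffer was closed when the block finished
```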

FASTX.FASTA.validate_fasta — Function

validate_fasta(io::IO) >: Nothing

Check if io is a valid FASTA file. Return nothing if it is, and an instance of another type if not.

Examples

julia> validate_fasta(IOBuffer(">a bc\nTAG\nTA")) === nothing
true

julia> validate_fasta(IOBuffer(">a bc\nT>G\nTA")) === nothing
false
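The return convention (nothing for valid input, some other value otherwise) can be illustrated with a deliberately simplified validator. `simple_validate` is hypothetical and far less thorough than validate_fasta. It returns the 1-based number of the first offending line and, as a simplifying assumption, tolerates blank lines.

```julia
# Simplified validator sketch (hypothetical, not FASTX's validate_fasta):
# `nothing` means valid; otherwise the first offending line number.
function simple_validate(text::AbstractString)
    seen_header = false
    for (lineno, line) in enumerate(split(text, '\n'))
        isempty(line) && continue              # blank lines tolerated here
        if startswith(line, ">")
            seen_header = true
        elseif !seen_header || occursin(">", line)
            return lineno                      # data before a header, or stray '>'
        end
    end
    return nothing
end

simple_validate(">a bc\nTAG\nTA") === nothing   # true
simple_validate(">a bc\nT>G\nTA")              # 2
```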
