FASTA formatted files

FASTA is a text-based file format for representing biological sequences. A FASTA file stores a list of sequence records, each with an identifier, a description, and a sequence.

The template of a sequence record is:

>{description}
{sequence}

Here, the "identifier" is the first part of the description up to the first whitespace (or the entire description if there is no whitespace).

Here is an example of a chromosomal sequence:

>chrI chromosome 1


  • The identifier is "chrI"
  • The description is "chrI chromosome 1", containing the identifier
  • The sequence is the DNA sequence "CCACA..."
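As a rough illustration of the identifier rule, the sketch below splits a header line using only Base Julia. The function name `split_header` is hypothetical and not part of FASTX. In FASTX itself you would call `identifier` and `description` on a record.

```julia
# Hypothetical helper (not part of FASTX): split a FASTA header line into
# its description (everything after '>') and its identifier (the
# description up to the first whitespace, or all of it if there is none).
function split_header(line::AbstractString)
    startswith(line, '>') || throw(ArgumentError("header must start with '>'"))
    description = line[nextind(line, 1):end]
    stop = findfirst(isspace, description)
    identifier = stop === nothing ? description :
                 description[1:prevind(description, stop)]
    return (identifier = identifier, description = description)
end

h = split_header(">chrI chromosome 1")
println(h.identifier)   # chrI
println(h.description)  # chrI chromosome 1
```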

The FASTARecord

FASTA records are, by design, very lax in what they can contain. They can hold almost arbitrary byte sequences, including invalid Unicode, and trailing whitespace on sequence lines is interpreted as part of the sequence. If you want more certainty about the format, you can either check the content of the sequences with a regex or (preferably) convert them to the desired BioSequence type.
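A minimal sketch of the regex route, assuming the sequences should contain only the four unambiguous DNA bases in either case. The pattern and the helper name `is_plain_dna` are illustrative assumptions, not FASTX API. The `\A` and `\z` anchors cover the whole string, so a trailing space or newline is rejected (PCRE's `$` would tolerate a final newline).

```julia
# Hypothetical check (not a FASTX function): accept only A, C, G, T in
# upper or lower case, with nothing else in the string.
const DNA_ONLY = r"\A[ACGTacgt]*\z"

is_plain_dna(seq::AbstractString) = occursin(DNA_ONLY, seq)

println(is_plain_dna("TAGCC"))    # true
println(is_plain_dna("TAqACC"))   # false: 'q' is not a DNA base
println(is_plain_dna("TAG "))     # false: trailing whitespace
```

Converting to a BioSequence type remains the more thorough option, since it also gives you a typed sequence to work with afterwards.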


Mutable struct representing a FASTA record as parsed from a FASTA file. The content of the record can be queried with the following functions: identifier, description, sequence.

FASTA records are untyped, i.e. they are agnostic to what kind of data they contain.

See also: FASTA.Reader, FASTA.Writer


julia> rec = parse(FASTARecord, ">some header\nTAqA\nCC");

julia> identifier(rec)
"some"

julia> description(rec)
"some header"

julia> sequence(rec)
"TAqACC"

julia> typeof(description(rec)) == typeof(sequence(rec)) <: AbstractString
true

FASTAReader and FASTAWriter

FASTAWriter accepts an optional keyword argument width that controls the line width. If width is zero or negative, each record's sequence is written on a single line; otherwise, sequence lines are wrapped to at most width characters.
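The width rule can be sketched in plain Julia as below. This is a hypothetical helper (`write_fasta_record`), not FASTX's writer implementation. It slices the sequence by index, so it assumes ASCII sequence data.

```julia
# Hypothetical sketch of the wrapping rule (not FASTX's actual writer):
# width < 1 puts the whole sequence on one line; otherwise the sequence
# is emitted in chunks of at most `width` characters.
function write_fasta_record(io::IO, description::AbstractString,
                            seq::AbstractString; width::Integer=70)
    println(io, '>', description)
    if width < 1 || isempty(seq)
        println(io, seq)
    else
        n = length(seq)  # assumes ASCII, so length == last byte index
        for i in 1:width:n
            println(io, seq[i:min(i + width - 1, n)])
        end
    end
    return nothing
end

buf = IOBuffer()
write_fasta_record(buf, "header", "TAGGCTA"; width=3)
print(String(take!(buf)))
# >header
# TAG
# GCT
# A
```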



Module under FASTX with code related to FASTA files.

FASTA.Reader(input::IO; index=nothing, copy::Bool=true)

Create a buffered data reader of the FASTA file format. The reader is a BioGenerics.IO.AbstractReader, a stateful iterator of FASTA.Record. Readers take ownership of the underlying IO: mutating or closing the underlying IO other than through the reader is undefined behaviour. Closing the reader also closes the underlying IO.
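To show what a reader does with multi-line sequences, here is a simplified, eager parser in Base Julia. `read_fasta_pairs` is a hypothetical name; unlike FASTA.Reader, it returns a vector of `(description, sequence)` tuples rather than acting as a stateful iterator of FASTA.Record, and it ignores lines before the first header instead of reporting an error.

```julia
# Hypothetical, simplified parser (not FASTX's Reader): collects
# (description, sequence) pairs, joining multi-line sequences.
function read_fasta_pairs(io::IO)
    records = Tuple{String,String}[]
    description = nothing          # no header seen yet
    seq = IOBuffer()               # accumulates sequence lines
    for line in eachline(io)
        if startswith(line, '>')
            # A new header closes the previous record, if any.
            description === nothing || push!(records, (description, String(take!(seq))))
            description = line[2:end]
        elseif description !== nothing
            print(seq, line)       # lines before the first header are ignored
        end
    end
    description === nothing || push!(records, (description, String(take!(seq))))
    return records
end

pairs = read_fasta_pairs(IOBuffer(">header\nTAG\n>another\nAGA"))
# [("header", "TAG"), ("another", "AGA")]
```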

See more examples in the FASTX documentation.

See also: FASTA.Record, FASTA.Writer


  • input: data source
  • index: Optional random access index (currently, fai is supported). index can be nothing, a FASTA.Index, an IO (in which case an index is parsed from the IO), or an AbstractString (in which case it is treated as the path to a fai file).
  • copy::Bool: If true, iterating returns fresh copies instead of the same Record. Set to false for improved performance, but be wary that iterating then mutates the returned record in place.


julia> rdr = FASTAReader(IOBuffer(">header\nTAG\n>another\nAGA"));

julia> records = collect(rdr); close(rdr);

julia> foreach(println, map(identifier, records))
header
another

julia> foreach(println, map(sequence, records))
TAG
AGA

FASTA.Writer(output::IO; width=70)

Create a data writer of the FASTA file format. The writer is a BioGenerics.IO.AbstractWriter. Writers take ownership of the underlying IO: mutating or closing the underlying IO other than through the writer is undefined behaviour. Closing the writer also closes the underlying IO.

See more examples in the FASTX documentation.

See also: FASTA.Record, FASTA.Reader


  • output: Data sink to write to
  • width: Wrapping width, in sequence characters. If < 1, sequences are not wrapped.


julia> FASTA.Writer(open("some_file.fna", "w")) do writer
           write(writer, record) # a FASTA.Record
       end

validate_fasta(io::IO) >: Nothing

Check if io is a valid FASTA file. Return nothing if it is, and an instance of another type if not.


julia> validate_fasta(IOBuffer(">a bc\nTAG\nTA")) === nothing
true

julia> validate_fasta(IOBuffer(">a bc\nT>G\nTA")) === nothing
false
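The general idea of a line-based check can be sketched as follows. This is a hypothetical, simplified validator (`check_fasta_lines`) that enforces only two rules: non-blank content must start with a header, and sequence lines must not contain `>`. FASTX's validate_fasta may check more than this, and its non-nothing return value is not necessarily a line number.

```julia
# Hypothetical, simplified validator (not FASTX's validate_fasta):
# returns `nothing` if the input passes, otherwise the 1-based number
# of the first offending line.
function check_fasta_lines(io::IO)
    seen_header = false
    for (lineno, line) in enumerate(eachline(io))
        if startswith(line, '>')
            seen_header = true
        elseif !seen_header && !isempty(strip(line))
            return lineno          # sequence data before any header
        elseif occursin('>', line)
            return lineno          # '>' inside a sequence line
        end
    end
    return nothing
end

check_fasta_lines(IOBuffer(">a bc\nTAG\nTA"))  # nothing
check_fasta_lines(IOBuffer(">a bc\nT>G\nTA"))  # 2
```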