BioStockholm.jl

Julia parser for the Stockholm file format (.sto), which is used for multiple sequence alignments of protein, RNA, or DNA sequences by databases such as Pfam and Rfam. The package uses Automa.jl to generate a finite-state-machine parser.

Installation

Enter package mode in the Julia REPL by pressing ], then install with:

add BioStockholm

Usage

using BioStockholm

msa = MSA{Char}(;
    seq = Dict("human"   => "ACACGCGAAA.GCGCAA.CAAACGUGCACGG",
               "chimp"   => "GAAUGUGAAAAACACCA.CUCUUGAGGACCU",
               "bigfoot" => "UUGAG.UUCG..CUCGUUUUCUCGAGUACAC"),
     GC = Dict("SS_cons" => "...<<<.....>>>....<<....>>.....")
)

# example2.sto is an example Stockholm file in the package's test directory;
# load its raw text to inspect it
msa_path = joinpath(dirname(pathof(BioStockholm)), "..",
                    "test", "example2.sto")
msa_str = read(msa_path, String)
print(msa_str)

# read from a file or parse from a String
msa = read(msa_path, MSA)
msa = parse(MSA, msa_str)

# write to a file
write("foobar.sto", msa)

# pretty-print
print(msa)
print(stdout, msa)
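
As a rough sketch of what the parser consumes, the example below parses a small hand-written Stockholm document from a String using the parse(MSA, ...) call shown above. The sequence names and contents are made up for illustration, and the sketch assumes the parser accepts a minimal file with only sequence lines and one #=GC annotation.

```julia
using BioStockholm

# A minimal Stockholm document (illustrative data, not from a real database):
# a header line, two aligned sequences of equal length, one per-column
# annotation (#=GC), and the terminating "//" line.
sto = """
# STOCKHOLM 1.0
human        ACACGCGAAA.GCGCAA
chimp        GAAUGUGAAAAACACCA
#=GC SS_cons ...<<<.....>>>...
//
"""

# Parse the String into an MSA and print it back in Stockholm format.
msa = parse(MSA, sto)
print(msa)
```

In the Stockholm format, lines starting with #=GC carry per-column annotations such as a consensus secondary structure, which corresponds to the GC keyword used when constructing an MSA by hand.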

Limitations / TODO

  • when writing, long sequences or text are never split over multiple lines
  • integrate with BioJulia string types

MIToS.jl is a package for analysing protein sequences that also supports parsing the Stockholm format, among many other features.