CUBScout only works with coding sequences.
CUBScout does not identify ORFs, pause at stop codons, or parse non-nucleotide characters. It is assumed the coding sequences you provide are in-frame and don't contain 5' or 3' untranslated regions. Codons which have non-specific nucleotides, like "W", are skipped. Sequences with characters outside of those recognized by BioSequences will throw an error.
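To make the codon-skipping behavior described above concrete, here is a minimal sketch in plain Julia. It is not CUBScout's implementation: `count_unambiguous_codons` is a hypothetical helper written for illustration, and ignoring trailing bases that do not form a full codon is an assumption of the sketch. Note that it reads straight through the stop codon, as the text describes.

```julia
# Illustrative only: walk an in-frame sequence codon by codon and skip any
# codon containing a character other than A, C, G, T, or U.
function count_unambiguous_codons(seq::AbstractString)
    counts = Dict{String,Int}()
    s = uppercase(seq)
    # Step through complete codons only (assumption: leftover bases are ignored).
    for i in 1:3:(length(s) - 2)
        codon = s[i:i+2]
        # Skip codons with ambiguous nucleotides such as "W".
        all(in("ACGTU"), codon) || continue
        counts[codon] = get(counts, codon, 0) + 1
    end
    return counts
end

# "GCW" is skipped; counting continues past the stop codon "TAA".
counts = count_unambiguous_codons("ATGGCWAAATAAATG")
```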
Some CUBScout functions, like count_codons, are meaningful when applied to a single nucleotide sequence. However, most CUBScout functions are designed to work at the genome level and calculate metrics that rely on comparisons between multiple genes. Specifically, none of the codon usage bias or expressivity functions accept a single nucleotide sequence; all expect to operate across a set of sequences, whether in a FASTA file or a vector of BioSequences.
Most functions in CUBScout accept any FASTA-formatted file (e.g. .fa, .fna, .fasta) where each entry corresponds to a coding sequence or open reading frame.
CUBScout accepts either a String containing the complete filepath to a FASTA-formatted file, or an object of type FASTAReader or IO that points to a FASTA-formatted file. There is no significant performance difference between these three options, so there is no reason to prefer one over another unless you already have a FASTAReader open for another purpose.
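As a rough sketch of the file-based input forms (a filepath String, an IO stream, and a FASTAReader), the example below passes the same toy file to count_codons in each form. The toy file and its contents are made up for illustration, and the assumption that count_codons has a method for each input type should be checked against the installed version of CUBScout.

```julia
using CUBScout
using FASTX

# Write a tiny two-gene FASTA file so the example is self-contained.
# (Hypothetical data; each entry is an in-frame coding sequence.)
path = joinpath(mktempdir(), "toy_genome.fna")
write(path, ">gene1\nATGGCTAAAGGTTAA\n>gene2\nATGCCCGGGTTTAAATAA\n")

# 1. A String holding the complete filepath.
from_path = count_codons(path)

# 2. An IO stream pointing at the same file.
from_io = open(path) do io
    count_codons(io)
end

# 3. A FASTAReader, e.g. one you already opened for another purpose.
reader = FASTAReader(open(path))
from_reader = count_codons(reader)
close(reader)
```

All three calls read the same records, so which one to use is mostly a matter of what you already have in hand.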
CUBScout functions also accept nucleotide sequences from BioSequences (<:NucSeq). Keep in mind that most CUBScout functions are designed to operate across genomes, and so accept a vector of nucleotide sequences. The vector corresponds to a genome, with each DNA or RNA string corresponding to a coding sequence.
While CUBScout functions run slightly faster when given BioSequences rather than a filepath, supplying a filepath is still faster than the cumulative time spent reading the file into BioSequences and then running a CUBScout function. Supplying a filepath also uses less memory and so is generally recommended, unless you already have BioSequences loaded into Julia's environment for a separate reason.
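For cases where the sequences are already in memory, a minimal sketch of the BioSequences input form might look like the following. The two toy sequences are invented for illustration, and the example assumes count_codons accepts both a vector of nucleotide sequences and a single sequence, as described above.

```julia
using BioSequences
using CUBScout

# A toy "genome": each element is one in-frame coding sequence.
genome = [dna"ATGGCTAAAGGTTAA", dna"ATGCCCGGGTTTAAATAA"]

# Genome-level call: codon counts across the whole vector of sequences.
genome_counts = count_codons(genome)

# count_codons is one of the functions that is also meaningful
# for a single nucleotide sequence.
single_counts = count_codons(genome[1])
```

The codon usage bias and expressivity functions would take the `genome` vector in the same position, but not a single sequence.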