Iteration
As you might expect, sequence types are iterators over their elements:
julia> n = 0
0
julia> for nt in dna"ATNGNNT"
if nt == DNA_N
global n += 1
end
end
julia> n
3
However, there are many other ways to iterate over a biological sequence, and they are described in the following sections.
Kmers and Skipmers
Kmers
To iterate over every overlapping kmer in a longer DNA or RNA sequence, use the each
method:
BioSequences.each
— Methodeach(::Type{T}, seq::BioSequence) where {T<:AbstractMer}
Initialize an iterator over all overlapping k-mers in a sequence seq
skipping ambiguous nucleotides without changing the reading frame.
Each iteration yields a MerIterResult
struct that has the following fields:
position
: the position in the sequence at which the mer began.fw
: the mer in the same orientation as the sequence from which it was generated.bw
: the reverse complement of thefw
mer.
Iterating over mers in a sequence builds both a mer and it's reverse complement at the same time, as it is more efficient, and it is a common requirement; for example during mapping or constructing assemblies it is often useful to know if the mer your are processing is a canonical mer or not.
Kmers with jumps
You can also iterate over non-overlapping kmers using a step
parameter.
each(DNAMer{27}, seq, 10)
Skipmers
To iterate over Mers using the Skipmer method of selecting nucleotides, then you provide a bases_per_cycle
and a cycle_len
parameter. For example, where bases_per_cycle = 2
and cycle_len = 3
.
each(DNAMer{27}, seq, 2, 3)
Positions
You can iterate over the positions/residues of a sequence that satisfy some condition or query function:
BioSequences.each
— Methodeach(f::Function, seq::BioSequence) = ConditionIterator(f, seq)
Return an iterator over each element in seq
that satisfies some function f
.
This is similar to the Iterators.filter
iterator, except that every iteration yields a tuple of the position of the residue that satisfied f
, and the residue itself.
julia> dna_seq = dna"NATTCGRATY"
10nt DNA Sequence:
NATTCGRATY
julia> # Iterate over each ambiguous residue
julia> for (pos, nuc) in each(isambiguous, dna_seq)
println("Position $pos is $nuc")
end
Position 1 is N
Position 7 is R
Position 10 is Y
julia> # Iterate over each non-ambiguous residue
julia> for (pos, nuc) in each(iscertain, dna_seq)
println("Position $pos is $nuc")
end
Position 2 is A
Position 3 is T
Position 4 is T
Position 5 is C
Position 6 is G
Position 8 is A
Position 9 is T