API

Import

using CAOS

Functions

CAOS.add_blanks (Method)
add_blanks(query_path::String, db_path::String, character_labels::Dict{String,String},
character_labels_no_gaps::Dict{String,String} ; return_blast::Bool=false)

Adds blanks to an input sequence given a database.

Arguments

  • query_path::String: path to the query file.
  • db_path::String: path to the BLAST database.
  • character_labels::Dict{String,String}: a mapping of the character labels to the corresponding sequences.
  • character_labels_no_gaps::Dict{String,String}: character labels with gaps removed from sequences.
  • return_blast::Bool=false: whether to return the BLAST results.
  • protein::Bool=false: whether the input is a protein sequence.

CAOS.add_nodes! (Method)
add_nodes!(tree::Node,sPu::Array{Dict{String,Any}},sPr::Array{Dict{String,Any}},
cPu::Array{Dict{String,Any}},cPr::Array{Dict{String,Any}},taxa_labels::Dict{String,String},
character_labels::Dict{String,String},nodes::Array{Dict{String,Any}},node_num::Int64;complex::Bool=true)

Takes a tree (Node) and adds all the CAs (characteristic attributes) from the entire tree to the internal representation.

Arguments

  • tree::Node: the tree represented as a Node.
  • sPu::Array{Dict{String,Any}}: an array of simple pure rules.
  • sPr::Array{Dict{String,Any}}: an array of simple private rules.
  • cPu::Array{Dict{String,Any}}: an array of complex pure rules.
  • cPr::Array{Dict{String,Any}}: an array of complex private rules.
  • taxa_labels::Dict{String,String}: a mapping of the taxa labels to the character labels.
  • character_labels::Dict{String,String}: a mapping of the character labels to the corresponding sequences.
  • nodes::Array{Dict{String,Any}}: an array of nodes.
  • node_num::Int64: the current node number.
  • complex::Bool=true: indicates whether complex rules should be calculated.
  • protein::Bool=false: indicates whether the dataset consists of protein (rather than nucleotide) sequences.

CAOS.classify_new_sequence (Method)
classify_new_sequence(tree::Node, character_labels::Dict{String,String}, taxa_labels::Dict{String,String},
sequence_file_path::String, output_directory::String ; all_CA_weights::Dict{Int64,
Dict{String,Int64}}=Dict(1=>Dict("sPu"=>1,"sPr"=>1,"cPu"=>1,"cPr"=>1)), occurrence_weighting::Bool=false,
tiebreaker::Vector{Dict{String,Int64}}=[Dict{String,Int64}()], combo_classification::Bool=false)

Takes a tree (Node) and a sequence, and classifies the new sequence using the CAOS tree.

Arguments

  • tree::Node: the tree represented as a Node.
  • character_labels::Dict{String,String}: a mapping of the character labels to the corresponding sequences.
  • taxa_labels::Dict{String,String}: a mapping of the taxa labels to the character labels.
  • sequence_file_path::String: a file path to the sequence to classify.
  • output_directory::String: path to the output directory.
  • all_CA_weights::Dict{Int64,Dict{String,Int64}}=Dict(1=>Dict("sPu"=>1,"sPr"=>1,"cPu"=>1,"cPr"=>1)): CA weights to be used.
  • occurrence_weighting::Bool=false: whether to use occurrence weighting in classification.
  • tiebreaker::Vector{Dict{String,Int64}}=[Dict{String,Int64}()]: tiebreaker to be used in classification.
  • combo_classification::Bool=false: whether to use a combination of BLAST and CAOS for classification.
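
The weighting and tiebreaker keyword arguments are plain Julia containers, so they can be built ahead of the call. The following is a minimal sketch of values with the documented types. The second weight set and its numbers are purely illustrative, and what the integer key selects is not stated in this reference.

```julia
# Weight sets keyed by an integer; each set assigns a weight to the four rule types.
all_CA_weights = Dict{Int64,Dict{String,Int64}}(
    1 => Dict("sPu" => 1, "sPr" => 1, "cPu" => 1, "cPr" => 1),  # the documented default
    2 => Dict("sPu" => 2, "sPr" => 1, "cPu" => 2, "cPr" => 1),  # illustrative values only
)

# The documented default tiebreaker: a vector holding one empty dictionary.
tiebreaker = Dict{String,Int64}[Dict{String,Int64}()]
```

These values could then be passed as the all_CA_weights and tiebreaker keyword arguments.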

CAOS.classify_sequence (Method)
classify_sequence(sequence::String, tree::Node, CA_weights::Dict{String,Int64},
all_CA_weights::Dict{Int64,Dict{String,Int64}}, occurrence_weighting::Bool,
depth::Int64, tiebreaker::Vector{Dict{String,Int64}} ; blast_results=["Fake Label"], combo_classification::Bool=false, protein::Bool=false)

Classifies an input sequence given a phylogenetic tree.

Arguments

  • sequence::String: the sequence to classify.
  • tree::Node: the tree represented as a Node.
  • CA_weights::Dict{String,Int64}: weights to use for CA counts.
  • all_CA_weights::Dict{Int64,Dict{String,Int64}}: all sets of weights to use for CA counts.
  • occurrence_weighting::Bool: whether to use occurrence weighting during counting.
  • depth::Int64: current depth of the tree.
  • tiebreaker::Vector{Dict{String,Int64}}: tiebreaking procedures to use.
  • blast_results=["Fake Label"]: list of BLAST results.
  • combo_classification::Bool=false: whether to use both BLAST and CAOS for classification.
  • protein::Bool=false: whether the sequence is a protein (rather than nucleotide) sequence.

CAOS.convert_to_struct (Method)
convert_to_struct(tree_dict::Dict{String,Any}, tree_obj::Node)

Takes a tree loaded from JSON and converts it back to the proper internal representation.

Arguments

  • tree_dict::Dict{String,Any}: the tree as a dictionary, as read from JSON.
  • tree_obj::Node: the tree (Node).

CAOS.downsample_taxa (Method)
downsample_taxa(taxa::Array{String}, perc_keep::Float64)

Downsamples taxa by a certain percentage.

Arguments

  • taxa::Array{String}: list of taxa.
  • perc_keep::Float64: percentage of taxa to keep.
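
As a rough illustration of downsampling, the sketch below keeps a random subset of the taxa. It is not the package's implementation: downsample_sketch is a hypothetical name, and it assumes that perc_keep is a fraction between 0 and 1 and that taxa are chosen at random, neither of which is stated above.

```julia
using Random

# Hypothetical stand-in for downsample_taxa: keep a random fraction of the taxa.
function downsample_sketch(taxa::Vector{String}, perc_keep::Float64;
                           rng::AbstractRNG=Random.default_rng())
    n_keep = round(Int, length(taxa) * perc_keep)   # number of taxa to retain
    return shuffle(rng, taxa)[1:n_keep]             # random subset; the input is not modified
end

taxa = ["t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10"]
kept = downsample_sketch(taxa, 0.5; rng=MersenneTwister(1))
```

Passing an explicit random number generator keeps the selection reproducible.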

CAOS.find_sequence (Method)
find_sequence(tree::Node, taxa_label::String)

Takes a tree (Node) and a taxa label and finds the subtree containing the sequence with that label.

Arguments

  • tree::Node: the tree represented as a Node.
  • taxa_label::String: taxa label.

CAOS.generate_caos_rules (Method)
generate_caos_rules(tree_file_path::String, output_directory::String)

Takes a Nexus file and generates all the CAOS rules for the tree.

Arguments

  • tree_file_path::String: path to the Nexus file.
  • output_directory::String: path to the output directory.

CAOS.get_adjusted_start (Method)
get_adjusted_start(original_start::Int, subject::String)

Adjusts the start of the matched subject based on its blanks.

Arguments

  • original_start::Int: the index of the original starting position.
  • subject::String: the matched subject.
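
One plausible reading of this adjustment is sketched below: a start position counted in non-blank characters is mapped to the corresponding index in the gapped subject. The name adjusted_start_sketch, the use of '-' as the blank character, and this interpretation are all assumptions. The sketch also assumes ASCII sequences, so character counts and string indices coincide.

```julia
# Hypothetical helper: map a position counted in non-blank characters
# to the matching index in the gapped subject string.
function adjusted_start_sketch(original_start::Int, subject::String)
    seen = 0                                  # non-blank characters seen so far
    for (i, c) in enumerate(subject)
        c == '-' && continue                  # blanks do not count toward the position
        seen += 1
        seen == original_start && return i
    end
    return nothing                            # subject has fewer non-blank characters
end

adjusted_start_sketch(3, "A--CG-T")           # the third non-blank character, 'G', sits at index 5
```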

CAOS.get_all_neighbors (Method)
get_all_neighbors(tree::Node, character_labels::Dict{String,String}, taxa_label::String)

Takes a tree (Node) and a taxa label and finds all the neighbors (including duplicates).

Arguments

  • tree::Node: the tree represented as a Node.
  • character_labels::Dict{String,String}: character label mappings.
  • taxa_label::String: taxa label.

CAOS.get_cPu_and_cPr (Method)
get_cPu_and_cPr(nodes::Array{Dict{String,Any}}, node_num::Int64, taxa_labels::Dict{String,String},
character_labels::Dict{String,String}, sPu::Array{Dict{String,Any}}, sPr::Array{Dict{String,Any}})

Gets all the cPu and cPr for the entire character sequence at a specific node (does not support nucleotide options).

Arguments

  • nodes::Array{Dict{String,Any}}: list of nodes.
  • node_num::Int64: current node index.
  • taxa_labels::Dict{String,String}: a mapping of the taxa labels to the character labels.
  • character_labels::Dict{String,String}: a mapping of the character labels to the corresponding sequences.
  • sPu::Array{Dict{String,Any}}: list of simple pure rules.
  • sPr::Array{Dict{String,Any}}: list of simple private rules.

CAOS.get_descendents (Method)
get_descendents(tree::Node)

Gets the descendants of a Node (tree or subtree).

Arguments

  • tree::Node: the tree represented as a Node.
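
Collecting descendants is a recursive walk over the tree. The sketch below uses a hypothetical SketchNode type with a children field, since the fields of the real Node struct are not listed here. It also assumes that the descendants of interest are the taxa labels at the leaves.

```julia
# Hypothetical minimal tree type; the real CAOS Node may be laid out differently.
struct SketchNode
    children::Vector{SketchNode}
    taxa_label::String
end

# Collect the taxa labels of all leaves below (or at) the given node.
function leaf_labels_sketch(tree::SketchNode)
    isempty(tree.children) && return [tree.taxa_label]
    return reduce(vcat, [leaf_labels_sketch(child) for child in tree.children])
end

leaf(label) = SketchNode(SketchNode[], label)
tree = SketchNode([leaf("A"), SketchNode([leaf("B"), leaf("C")], "")], "")
labels = leaf_labels_sketch(tree)             # ["A", "B", "C"]
```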

CAOS.get_first_taxa_from_tree (Method)
get_first_taxa_from_tree(tree::Node)

Gets the first taxon from a tree.

Arguments

  • tree::Node: the tree represented as a Node.

CAOS.get_group_combos (Method)
get_group_combos(group_taxa::Array{Array{String}})

Gets all the combinations of group versus non-group taxa.

Arguments

  • group_taxa::Array{Array{String}}: list of taxa within a group.
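
To make the idea of group versus non-group combinations concrete, the sketch below pairs each group with the pooled taxa of every other group. This is an assumed interpretation: group_combos_sketch is a hypothetical name, and the actual return format of get_group_combos is not documented here.

```julia
# Hypothetical sketch: for each group, pair it with all taxa outside that group.
function group_combos_sketch(group_taxa::Vector{Vector{String}})
    combos = Tuple{Vector{String},Vector{String}}[]
    for (i, group) in enumerate(group_taxa)
        non_group = String[]
        for (j, other) in enumerate(group_taxa)
            j == i || append!(non_group, other)   # pool every group except the current one
        end
        push!(combos, (group, non_group))
    end
    return combos
end

combos = group_combos_sketch([["a", "b"], ["c"], ["d"]])
```

Each entry holds one group and the taxa it would be compared against.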

CAOS.get_group_taxa_at_node (Method)
get_group_taxa_at_node(nodes::Array{Dict{String,Any}}, node_num::Int64)

Gets the sets of taxa for each group at a node.

Arguments

  • nodes::Array{Dict{String,Any}}: list of nodes.
  • node_num::Int64: current node index.

CAOS.get_max_depth (Method)
get_max_depth(tree::Node, depth::Int64)

Takes a tree (Node) and gets the maximum depth.

Arguments

  • tree::Node: the tree represented as a Node.
  • depth::Int64: current depth.
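
The depth argument suggests a recursion that threads the current depth down the tree. A minimal sketch of that pattern follows, using a hypothetical SketchNode type because the fields of the real Node struct are not listed here. Starting the count at 0 for the root is also an assumption.

```julia
# Hypothetical minimal tree type; the real CAOS Node may be laid out differently.
struct SketchNode
    children::Vector{SketchNode}
    taxa_label::String
end

# Return the depth of the deepest leaf, given the depth of the current node.
function max_depth_sketch(tree::SketchNode, depth::Int)
    isempty(tree.children) && return depth
    return maximum(max_depth_sketch(child, depth + 1) for child in tree.children)
end

leaf(label) = SketchNode(SketchNode[], label)
tree = SketchNode([leaf("A"), SketchNode([leaf("B"), leaf("C")], "")], "")
deepest = max_depth_sketch(tree, 0)           # 2: leaves "B" and "C" sit two levels below the root
```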

CAOS.get_neighbors (Method)
get_neighbors(tree::Node, taxa_label::String)

Takes a tree (Node) and a taxa label and finds all the neighbors (the taxa that come after it in the subtree containing the input taxon).

Arguments

  • tree::Node: the tree represented as a Node.
  • taxa_label::String: taxa label.

CAOS.get_nodes (Method)
get_nodes(tree::String ; taxa_to_remove::Union{Array{String,1},Bool}=false)

Takes a tree in Newick format, returns an internal representation of the tree.

Arguments

  • tree::String: the tree in Newick format.
  • taxa_to_remove::Union{Array{String,1},Bool}=false: the taxa that will be removed (if applicable).

CAOS.get_sPu_and_sPr (Method)
get_sPu_and_sPr(nodes::Array{Dict{String,Any}}, node_num::Int64,
taxa_labels::Dict{String,String}, character_labels::Dict{String,String})

Gets all the sPu and sPr for the entire character sequence at a specific node.

Arguments

  • nodes::Array{Dict{String,Any}}: list of nodes.
  • node_num::Int64: current node index.
  • taxa_labels::Dict{String,String}: a mapping of the taxa labels to the character labels.
  • character_labels::Dict{String,String}: a mapping of the character labels to the corresponding sequences.
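
For orientation, the sketch below finds simple pure and simple private attributes at a single alignment position. It follows the usual CAOS terminology, which this reference does not spell out: a pure attribute is shared by every member of the group and absent from the other group, while a private attribute occurs in only some members of the group and is absent from the other group. The function name and return shape are hypothetical, and ASCII sequences of equal length are assumed.

```julia
# Hypothetical sketch: classify the characters at one position as pure or private for a group.
function simple_rules_at_position_sketch(group::Vector{String}, non_group::Vector{String}, pos::Int)
    group_chars = [seq[pos] for seq in group]
    non_group_chars = Set(seq[pos] for seq in non_group)
    pure = Char[]
    private = Char[]
    for c in unique(group_chars)
        c in non_group_chars && continue      # also seen outside the group: not diagnostic
        if all(==(c), group_chars)
            push!(pure, c)                    # every group member carries it
        else
            push!(private, c)                 # only some group members carry it
        end
    end
    return (pure=pure, private=private)
end

group = ["ACGT", "ACGA"]
non_group = ["TCGG", "TCCG"]
first_pos = simple_rules_at_position_sketch(group, non_group, 1)   # 'A' is pure
last_pos = simple_rules_at_position_sketch(group, non_group, 4)    # 'T' and 'A' are private
```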

CAOS.load_tree (Method)
load_tree(directory::String)

Loads a CAOS tree from file.

Arguments

  • directory::String: path to directory where tree exists.

CAOS.parse_tree (Method)
parse_tree(file_path::String; taxa_to_remove::Union{Array{String,1},Bool}=false)

Takes a Nexus file for a tree, returns an internal representation of that tree (and other relevant information).

Arguments

  • file_path::String: file path to the Nexus file.
  • taxa_to_remove::Union{Array{String,1},Bool}=false: the taxa that will be removed (if applicable).

CAOS.remove_blanks (Method)
remove_blanks(char_label_dict::Dict{String,String} ; change_to_N::Bool=false)

Removes all blanks from the character sequences, or changes them to N's when change_to_N is true.

Arguments

  • char_label_dict::Dict{String,String}: character label mappings.
  • change_to_N::Bool=false: whether to change to N or just remove.
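
The two behaviors selected by change_to_N can be sketched as follows. This is a hypothetical re-implementation (remove_blanks_sketch) for illustration only, and it assumes that blanks are written as '-' characters.

```julia
# Hypothetical sketch: strip blanks from every sequence, or replace them with "N".
function remove_blanks_sketch(char_label_dict::Dict{String,String}; change_to_N::Bool=false)
    replacement = change_to_N ? "N" : ""
    return Dict(label => replace(seq, "-" => replacement) for (label, seq) in char_label_dict)
end

labels = Dict("taxon1" => "A-C-", "taxon2" => "--GT")
stripped = remove_blanks_sketch(labels)                      # "AC" and "GT"
masked = remove_blanks_sketch(labels; change_to_N=true)      # "ANCN" and "NNGT"
```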

CAOS.remove_from_tree! (Method)
remove_from_tree!(tree_tokens::Vector{String}, taxa_to_remove::Union{Array{String,1},Bool})

Takes a tokenized tree in Newick format and removes the specified taxa from it.

Arguments

  • tree_tokens::Vector{String}: the tree in Newick format, tokenized.
  • taxa_to_remove::Union{Array{String,1},Bool}: the taxa that will be removed.
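
The tokenized form is not defined in this reference. One common way to tokenize a Newick string is to split it into structural symbols and labels, as in the sketch below. tokenize_newick_sketch is a hypothetical helper, and the package's own tokens may differ, for example in how branch lengths are handled.

```julia
# Hypothetical sketch: split a Newick string into parentheses, commas, semicolons and labels.
function tokenize_newick_sketch(tree::String)
    tokens = String[]
    current = ""                                       # label being accumulated
    for c in tree
        if c in "(),;"
            isempty(current) || push!(tokens, current) # flush a pending label
            push!(tokens, string(c))
            current = ""
        elseif !isspace(c)
            current *= c
        end
    end
    isempty(current) || push!(tokens, current)
    return tokens
end

tokens = tokenize_newick_sketch("((A,B),C);")
```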

CAOS.CA_matches (Method)
CA_matches(sequence::String, CAs::Vector{Rule}, CA_weights::Dict{String,Int64}, occurrence_weighting::Bool)

Counts the number of CAs matched by a sequence (only simple rules are supported).

Arguments

  • sequence::String: sequence to count matches.
  • CAs::Vector{Rule}: list of all CAs.
  • CA_weights::Dict{String,Int64}: weights to use for CA counts.
  • occurrence_weighting::Bool: whether to use occurrence weighting during counting.
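
A weighted match count over simple rules might look like the sketch below. Everything beyond the documented argument names is an assumption: RuleSketch is a hypothetical, reduced version of the Rule struct, pure rules are taken to use the "sPu" weight and non-pure rules the "sPr" weight, and occurrence weighting is modeled as multiplying the weight by the rule's occurrence count.

```julia
# Hypothetical, reduced stand-in for the Rule struct.
struct RuleSketch
    idxs::Tuple{Vararg{Int}}
    char_attr::Tuple{Vararg{Char}}
    is_pure::Bool
    occurances::Int          # spelling follows the documented field name
end

# Sum the weights of all simple rules whose character matches the sequence.
function ca_matches_sketch(sequence::String, CAs::Vector{RuleSketch},
                           CA_weights::Dict{String,Int}, occurrence_weighting::Bool)
    score = 0
    for rule in CAs
        length(rule.idxs) == 1 || continue                            # simple rules only
        idx, attr = rule.idxs[1], rule.char_attr[1]
        (idx <= length(sequence) && sequence[idx] == attr) || continue
        weight = CA_weights[rule.is_pure ? "sPu" : "sPr"]
        score += occurrence_weighting ? weight * rule.occurances : weight
    end
    return score
end

rules = [RuleSketch((1,), ('A',), true, 3),
         RuleSketch((4,), ('T',), false, 2),
         RuleSketch((2,), ('G',), true, 1)]
weights = Dict("sPu" => 2, "sPr" => 1)
plain = ca_matches_sketch("ACGT", rules, weights, false)     # 2 + 1 = 3
weighted = ca_matches_sketch("ACGT", rules, weights, true)   # 2*3 + 1*2 = 8
```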

CAOS.add_blanks_to_back (Method)
add_blanks_to_back(subject::String, query::String, new_seq::String,
subj_len::Int64, query_len::Int64, subj_non_blanks::Int64,
hitnames::Vector{String}, hit_idx::Int64, character_labels::Dict{String,String})

Adds blanks to the back of a sequence from a blast match.

Arguments

  • subject::String: the subject the query is being matched to.
  • query::String: the query that is having blanks added to it.
  • new_seq::String: the new sequence (query with added blanks).
  • subj_len::Int64: length of the subject.
  • query_len::Int64: length of the query.
  • subj_non_blanks::Int64: number of non blanks in the subject.
  • hitnames::Vector{String}: list of blast hits.
  • hit_idx::Int64: index of the current blast hit.
  • character_labels::Dict{String,String}: a mapping of the character labels to the corresponding sequences.

CAOS.add_blanks_to_front (Method)
add_blanks_to_front(subject::String, query::String, new_seq::String,
subj_len::Int64, query_len::Int64, subj_non_blanks::Int64,
hitnames::Vector{String}, hit_idx::Int64, character_labels::Dict{String,String})

Adds blanks to the front of a sequence from a blast match.

Arguments

  • subject::String: the subject the query is being matched to.
  • query::String: the query that is having blanks added to it.
  • new_seq::String: the new sequence (query with added blanks).
  • subj_len::Int64: length of the subject.
  • query_len::Int64: length of the query.
  • subj_non_blanks::Int64: number of non blanks in the subject.
  • hitnames::Vector{String}: list of blast hits.
  • hit_idx::Int64: index of the current blast hit.
  • character_labels::Dict{String,String}: a mapping of the character labels to the corresponding sequences.

CAOS.get_best_hit (Method)
get_best_hit(results::Array{BioTools.BLAST.BLASTResult,1}, query::String,
character_labels::Dict{String,String},  character_labels_no_gaps::Dict{String,String})

Gets the hit from blastn that has the most sequence coverage with no gaps compared to the query sequence.

Arguments

  • results::Array{BioTools.BLAST.BLASTResult,1}: blastn results.
  • query::String: the query that is having blanks added to it.
  • character_labels::Dict{String,String}: a mapping of the character labels to the corresponding sequences.
  • character_labels_no_gaps::Dict{String,String}: character labels with gaps removed from sequences.

CAOS.get_duplicate_labels (Method)
get_duplicate_labels(character_labels::Dict{String,String}, label::String)

Takes the character labels and a specific label and finds any other labels whose sequences are identical.

Arguments

  • character_labels::Dict{String,String}: character label mappings.
  • label::String: taxa label to search for duplicates of.
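
Finding duplicates amounts to comparing each sequence with the one stored under the given label. The sketch below is a hypothetical re-implementation (duplicate_labels_sketch). Sorting the result is a choice made here for predictable output, not documented behavior.

```julia
# Hypothetical sketch: list every other label whose sequence equals that of `label`.
function duplicate_labels_sketch(character_labels::Dict{String,String}, label::String)
    target = character_labels[label]
    return sort([other for (other, seq) in character_labels
                 if other != label && seq == target])
end

character_labels = Dict("a" => "ACGT", "b" => "ACGT", "c" => "ACGA", "d" => "ACGT")
dups = duplicate_labels_sketch(character_labels, "a")        # ["b", "d"]
```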

CAOS.get_next_hit (Method)
get_next_hit(hitnames::Vector{String}, hit_idx::Int64)

Gets the next best hit returned from a blastn search.

Arguments

  • hitnames::Vector{String}: a list of all blastn hitnames.
  • hit_idx::Int64: index of the current hit.

CAOS.Node (Type)
Node(CAs::Array{Rule,1}, taxa_label::String="")

Recursive struct that stores a node of the tree.

CAOS.Rule (Type)
Rule(idxs::Tuple{Vararg{Int}}, char_attr::Tuple{Vararg{Char}},
is_pure::Bool, num_group::Int, num_non_group::Int, occurances::Int)

Struct to store relevant information about a CA.