## Import
using CAOS
## Index
CAOS.Node
CAOS.Rule
CAOS.CA_matches
CAOS.add_blanks
CAOS.add_blanks_to_back
CAOS.add_blanks_to_front
CAOS.add_nodes!
CAOS.classify_new_sequence
CAOS.classify_sequence
CAOS.convert_to_struct
CAOS.downsample_taxa
CAOS.find_sequence
CAOS.generate_caos_rules
CAOS.get_adjusted_start
CAOS.get_all_neighbors
CAOS.get_best_hit
CAOS.get_cPu_and_cPr
CAOS.get_descendents
CAOS.get_duplicate_labels
CAOS.get_first_taxa_from_tree
CAOS.get_group_combos
CAOS.get_group_taxa_at_node
CAOS.get_max_depth
CAOS.get_neighbors
CAOS.get_next_hit
CAOS.get_nodes
CAOS.get_sPu_and_sPr
CAOS.load_tree
CAOS.parse_tree
CAOS.remove_blanks
CAOS.remove_from_tree!
## Functions
CAOS.add_blanks — Method.
add_blanks(query_path::String, db_path::String, character_labels::Dict{String,String},
character_labels_no_gaps::Dict{String,String} ; return_blast::Bool=false)
Adds blanks to an input sequence given a database.
Arguments
query_path::String
: path to the query file.
db_path::String
: path to the blast database.
character_labels::Dict{String,String}
: a mapping of the character labels to the corresponding sequences.
character_labels_no_gaps::Dict{String,String}
: character labels with gaps removed from sequences.
return_blast::Bool=false
: whether to return blast results.
protein::Bool=false
: if protein sequence.
CAOS.add_nodes! — Method.
add_nodes!(tree::Node,sPu::Array{Dict{String,Any}},sPr::Array{Dict{String,Any}},
cPu::Array{Dict{String,Any}},cPr::Array{Dict{String,Any}},taxa_labels::Dict{String,String},
character_labels::Dict{String,String},nodes::Array{Dict{String,Any}},node_num::Int64;complex::Bool=true)
Takes a tree (Node) and adds all the CA's from the entire tree to the internal representation.
Arguments
tree::Node
: the tree represented as a Node.
sPu::Array{Dict{String,Any}}
: an array of simple pure rules.
sPr::Array{Dict{String,Any}}
: an array of simple private rules.
cPu::Array{Dict{String,Any}}
: an array of complex pure rules.
cPr::Array{Dict{String,Any}}
: an array of complex private rules.
taxa_labels::Dict{String,String}
: a mapping of the taxa labels to the character labels.
character_labels::Dict{String,String}
: a mapping of the character labels to the corresponding sequences.
nodes::Array{Dict{String,Any}}
: an array of nodes.
node_num::Int64
: the current node number.
complex::Bool=true
: indicates whether complex rules should be calculated.
protein::Bool=false
: indicates whether the dataset is protein (rather than nucleotide).
CAOS.classify_new_sequence — Method.
classify_new_sequence(tree::Node, character_labels::Dict{String,String}, taxa_labels::Dict{String,String},
sequence_file_path::String, output_directory::String ; all_CA_weights::Dict{Int64,
Dict{String,Int64}}=Dict(1=>Dict("sPu"=>1,"sPr"=>1,"cPu"=>1,"cPr"=>1)), occurrence_weighting::Bool=false,
tiebreaker::Vector{Dict{String,Int64}}=[Dict{String,Int64}()], combo_classification::Bool=false)
Takes a tree (Node) and a sequence, and classifies the new sequence using the CAOS tree.
Arguments
tree::Node
: the tree represented as a Node.
character_labels::Dict{String,String}
: a mapping of the character labels to the corresponding sequences.
taxa_labels::Dict{String,String}
: a mapping of the taxa labels to the character labels.
sequence_file_path::String
: a file path to the sequence to classify.
output_directory::String
: path to the output directory.
all_CA_weights::Dict{Int64,Dict{String,Int64}}=Dict(1=>Dict("sPu"=>1,"sPr"=>1,"cPu"=>1,"cPr"=>1))
: CA weights to be used.
occurrence_weighting::Bool=false
: whether to use occurrence weighting in classification.
tiebreaker::Vector{Dict{String,Int64}}=[Dict{String,Int64}()]
: tiebreaker to be used in classification.
combo_classification::Bool=false
: whether to use a combo of Blast and CAOS for classification.
CAOS.classify_sequence — Method.
classify_sequence(sequence::String, tree::Node, CA_weights::Dict{String,Int64},
all_CA_weights::Dict{Int64,Dict{String,Int64}}, occurrence_weighting::Bool,
depth::Int64, tiebreaker::Vector{Dict{String,Int64}} ; blast_results=["Fake Label"], combo_classification::Bool=false, protein::Bool=false)
Classifies an input sequence given a phylogenetic tree.
Arguments
sequence::String
: sequence to count matches.
tree::Node
: the tree represented as a Node.
CA_weights::Dict{String,Int64}
: weights to use for CA counts.
all_CA_weights::Dict{Int64,Dict{String,Int64}}
: all sets of weights to use for CA counts.
occurrence_weighting::Bool
: whether to use occurrence weighting during counting.
depth::Int64
: current depth of the tree.
tiebreaker::Vector{Dict{String,Int64}}
: tiebreaking procedures to use.
blast_results=["Fake Label"]
: list of blast results.
combo_classification::Bool=false
: whether to use both Blast and CAOS for classification.
CAOS.convert_to_struct — Method.
convert_to_struct(tree_dict::Dict{String,Any}, tree_obj::Node)
Takes a tree loaded from JSON and converts it back to a proper internal representation.
Arguments
tree_dict::Dict{String,Any}
: tree as a dictionary after being read from JSON.
tree_obj::Node
: the tree (Node).
CAOS.downsample_taxa — Method.
downsample_taxa(taxa::Array{String}, perc_keep::Float64)
Downsamples taxa by a certain percentage.
Arguments
taxa::Array{String}
: list of taxa.
perc_keep::Float64
: percentage of taxa to keep.
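As an illustration of what downsampling by a percentage could look like, here is a minimal sketch. It is not the CAOS implementation: the name `downsample_sketch`, the choice of random sampling without replacement, and the reading of `perc_keep` as a fraction between 0 and 1 are all assumptions made for the example.

```julia
using Random

# Hypothetical helper, not part of CAOS.
# Assumption: perc_keep is a fraction in [0, 1], not a value in [0, 100].
function downsample_sketch(taxa::Vector{String}, perc_keep::Float64;
                           rng=Random.default_rng())
    0.0 <= perc_keep <= 1.0 ||
        throw(ArgumentError("perc_keep must be between 0 and 1"))
    # Number of taxa to keep, rounded to the nearest integer.
    n_keep = round(Int, length(taxa) * perc_keep)
    # Shuffle a copy and keep the first n_keep entries (no replacement).
    return shuffle(rng, taxa)[1:n_keep]
end

kept = downsample_sketch(["t1", "t2", "t3", "t4"], 0.5)
```

Passing a seeded generator through the `rng` keyword makes the selection reproducible, which can help when comparing classification runs.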
CAOS.find_sequence — Method.
find_sequence(tree::Node, taxa_label::String)
Takes a tree (Node) and a taxa label and finds the subtree containing that sequence.
Arguments
tree::Node
: the tree represented as a Node.
taxa_label::String
: taxa label.
CAOS.generate_caos_rules — Method.
generate_caos_rules(tree_file_path::String, output_directory::String)
Takes a Nexus file and generates all the CAOS rules for the tree.
Arguments
tree_file_path::String
: path to the Nexus file.
output_directory::String
: path to the output directory.
CAOS.get_adjusted_start — Method.
get_adjusted_start(original_start::Int, subject::String)
Adjusts the start of the matched subject based on its blanks.
Arguments
original_start::Int
: the index of the original starting position.
subject::String
: the matched subject.
CAOS.get_all_neighbors — Method.
get_all_neighbors(tree::Node, character_labels::Dict{String,String}, taxa_label::String)
Takes a tree (Node) and a taxa label and finds all the neighbors (including duplicates).
Arguments
tree::Node
: the tree represented as a Node.
character_labels::Dict{String,String}
: character label mappings.
taxa_label::String
: taxa label.
CAOS.get_cPu_and_cPr — Method.
get_cPu_and_cPr(nodes::Array{Dict{String,Any}}, node_num::Int64, taxa_labels::Dict{String,String},
character_labels::Dict{String,String}, sPu::Array{Dict{String,Any}}, sPr::Array{Dict{String,Any}})
Gets all the cPu and cPr for the entire character sequence at a specific node (does not support nucleotide options).
Arguments
nodes::Array{Dict{String,Any}}
: list of nodes.
node_num::Int64
: current node index.
taxa_labels::Dict{String,String}
: a mapping of the taxa labels to the character labels.
character_labels::Dict{String,String}
: a mapping of the character labels to the corresponding sequences.
sPu::Array{Dict{String,Any}}
: list of simple pure rules.
sPr::Array{Dict{String,Any}}
: list of simple private rules.
CAOS.get_descendents — Method.
get_descendents(tree::Node)
Gets descendents of a Node (tree or subtree).
Arguments
tree::Node
: the tree represented as a Node.
CAOS.get_first_taxa_from_tree — Method.
get_first_taxa_from_tree(tree::Node)
Gets the first taxa from a tree.
Arguments
tree::Node
: the tree represented as a Node.
CAOS.get_group_combos — Method.
get_group_combos(group_taxa::Array{Array{String}})
Gets all the combinations of group vs non groups.
Arguments
group_taxa::Array{Array{String}}
: list of taxa within a group.
CAOS.get_group_taxa_at_node — Method.
get_group_taxa_at_node(nodes::Array{Dict{String,Any}}, node_num::Int64)
Gets the sets of taxa for each group at a node.
Arguments
nodes::Array{Dict{String,Any}}
: list of nodes.
node_num::Int64
: current node index.
CAOS.get_max_depth — Method.
get_max_depth(tree::Node, depth::Int64)
Takes a tree (Node) and gets the maximum depth.
Arguments
tree::Node
: the tree represented as a Node.
depth::Int64
: current depth.
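The `depth` argument suggests a recursive traversal in which the caller passes the depth reached so far. The sketch below shows that pattern on a stand-in type. `SketchNode`, its `children` field, and `max_depth_sketch` are hypothetical names for illustration only; the actual fields of `CAOS.Node` are not documented here.

```julia
# Hypothetical stand-in for a recursive tree node (not CAOS.Node).
struct SketchNode
    taxa_label::String
    children::Vector{SketchNode}
end

# Convenience constructor for a leaf with no children.
SketchNode(label::String) = SketchNode(label, SketchNode[])

# Returns the depth of the deepest leaf, where `depth` is the depth of `node`.
function max_depth_sketch(node::SketchNode, depth::Int)
    isempty(node.children) && return depth
    return maximum(max_depth_sketch(child, depth + 1) for child in node.children)
end

# Root with one leaf child and one internal child holding two leaves.
tree = SketchNode("", [SketchNode("A"),
                       SketchNode("", [SketchNode("B"), SketchNode("C")])])
deepest = max_depth_sketch(tree, 0)
```

With the root at depth 0, the leaves `B` and `C` sit at depth 2, so `deepest` is 2.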
CAOS.get_neighbors — Method.
get_neighbors(tree::Node, taxa_label::String)
Takes a tree (Node) and a taxa label and finds all the neighbors (taxa that come after from the subtree containing the input taxa).
Arguments
tree::Node
: the tree represented as a Node.
taxa_label::String
: taxa label.
CAOS.get_nodes — Method.
get_nodes(tree::String ; taxa_to_remove::Union{Array{String,1},Bool}=false)
Takes a tree in Newick format and returns an internal representation of the tree.
Arguments
tree::String
: the tree in Newick format.
taxa_to_remove::Union{Array{String,1},Bool}=false
: the taxa that will be removed (if applicable).
CAOS.get_sPu_and_sPr — Method.
get_sPu_and_sPr(nodes::Array{Dict{String,Any}}, node_num::Int64,
taxa_labels::Dict{String,String}, character_labels::Dict{String,String})
Gets all the sPu and sPr for the entire character sequence at a specific node.
Arguments
nodes::Array{Dict{String,Any}}
: list of nodes.
node_num::Int64
: current node index.
taxa_labels::Dict{String,String}
: a mapping of the taxa labels to the character labels.
character_labels::Dict{String,String}
: a mapping of the character labels to the corresponding sequences.
CAOS.load_tree — Method.
load_tree(directory::String)
Loads a CAOS tree from file.
Arguments
directory::String
: path to directory where tree exists.
CAOS.parse_tree — Method.
parse_tree(file_path::String; taxa_to_remove::Union{Array{String,1},Bool}=false)
Takes a Nexus file for a tree and returns an internal representation of that tree (and other relevant information).
Arguments
file_path::String
: file path to the Nexus file.
taxa_to_remove::Union{Array{String,1},Bool}=false
: the taxa that will be removed (if applicable).
CAOS.remove_blanks — Method.
remove_blanks(char_label_dict::Dict{String,String} ; change_to_N::Bool=false)
Removes blanks from character sequences, or changes them to N's when change_to_N is true.
Arguments
char_label_dict::Dict{String,String}
: character label mappings.
change_to_N::Bool=false
: whether to change to N or just remove.
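The two behaviours selected by `change_to_N` can be sketched as a single string replacement over the dictionary values. This is an illustrative stand-in, not the CAOS source: `remove_blanks_sketch` is a hypothetical name, and treating `-` as the blank character is an assumption.

```julia
# Hypothetical helper, not part of CAOS.
# Assumption: a blank is the gap character '-'.
function remove_blanks_sketch(char_label_dict::Dict{String,String};
                              change_to_N::Bool=false)
    # Replace each gap with "N", or with nothing to drop it entirely.
    replacement = change_to_N ? "N" : ""
    return Dict(label => replace(seq, "-" => replacement)
                for (label, seq) in char_label_dict)
end

labels = Dict("taxon1" => "AC-G-", "taxon2" => "ACTGA")
stripped = remove_blanks_sketch(labels)
masked = remove_blanks_sketch(labels; change_to_N=true)
```

Here `stripped["taxon1"]` is `"ACG"` and `masked["taxon1"]` is `"ACNGN"`. The input dictionary is left unchanged because the sketch builds a new one.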
CAOS.remove_from_tree! — Method.
remove_from_tree!(tree_tokens::Vector{String}, taxa_to_remove::Union{Array{String,1},Bool})
Takes a tree in Newick format and removes the specified taxa from the tree.
Arguments
tree_tokens::Vector{String}
: the tree in Newick format, tokenized.
taxa_to_remove::Union{Array{String,1},Bool}
: the taxa that will be removed.
CAOS.CA_matches — Method.
CA_matches(sequence::String, CAs::Vector{Rule}, CA_weights::Dict{String,Int64}, occurrence_weighting::Bool)
Counts the number of CA's matched by a sequence (only simple rules are supported).
Arguments
sequence::String
: sequence to count matches.
CAs::Vector{Rule}
: list of all CA's.
CA_weights::Dict{String,Int64}
: weights to use for CA counts.
occurrence_weighting::Bool
: whether to use occurrence weighting during counting.
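To make the idea of weighted CA counting concrete, here is a rough sketch under stated assumptions. It is not the CAOS algorithm. `SketchRule` copies only some of the fields listed for `CAOS.Rule`, and the scoring choices are guesses: a rule counts as matched when every indexed position holds the expected character, pure rules use the `"sPu"` weight and private rules the `"sPr"` weight, and occurrence weighting multiplies the weight by the rule's `occurances` field. Sequences are assumed to be ASCII.

```julia
# Hypothetical stand-in for CAOS.Rule, keeping only the fields used below.
struct SketchRule
    idxs::Tuple{Vararg{Int}}
    char_attr::Tuple{Vararg{Char}}
    is_pure::Bool
    occurances::Int   # spelling follows the documented Rule signature
end

# Assumed scoring scheme; the real CA_matches may differ.
function ca_matches_sketch(sequence::String, rules::Vector{SketchRule},
                           weights::Dict{String,Int}, occurrence_weighting::Bool)
    score = 0
    for rule in rules
        # A rule matches when each indexed position holds the expected character.
        matched = all((1 <= i <= length(sequence) && sequence[i] == c)
                      for (i, c) in zip(rule.idxs, rule.char_attr))
        matched || continue
        weight = weights[rule.is_pure ? "sPu" : "sPr"]
        score += occurrence_weighting ? weight * rule.occurances : weight
    end
    return score
end

rules = [SketchRule((2,), ('C',), true, 3),    # matches "ACGT" at position 2
         SketchRule((1,), ('G',), false, 1)]   # does not match: position 1 is 'A'
weights = Dict("sPu" => 2, "sPr" => 1)
plain = ca_matches_sketch("ACGT", rules, weights, false)
weighted = ca_matches_sketch("ACGT", rules, weights, true)
```

Only the first rule matches, so `plain` is 2 and `weighted` is 6 (weight 2 times 3 occurrences).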
CAOS.add_blanks_to_back — Method.
add_blanks_to_back(subject::String, query::String, new_seq::String,
subj_len::Int64, query_len::Int64, subj_non_blanks::Int64,
hitnames::Vector{String}, hit_idx::Int64, character_labels::Dict{String,String})
Adds blanks to the back of a sequence from a blast match.
Arguments
subject::String
: the subject the query is being matched to.
query::String
: the query that is having blanks added to it.
new_seq::String
: the new sequence (query with added blanks).
subj_len::Int64
: length of the subject.
query_len::Int64
: length of the query.
subj_non_blanks::Int64
: number of non blanks in the subject.
hitnames::Vector{String}
: list of blast hits.
hit_idx::Int64
: index of the current blast hit.
character_labels::Dict{String,String}
: a mapping of the character labels to the corresponding sequences.
CAOS.add_blanks_to_front — Method.
add_blanks_to_front(subject::String, query::String, new_seq::String,
subj_len::Int64, query_len::Int64, subj_non_blanks::Int64,
hitnames::Vector{String}, hit_idx::Int64, character_labels::Dict{String,String})
Adds blanks to the front of a sequence from a blast match.
Arguments
subject::String
: the subject the query is being matched to.
query::String
: the query that is having blanks added to it.
new_seq::String
: the new sequence (query with added blanks).
subj_len::Int64
: length of the subject.
query_len::Int64
: length of the query.
subj_non_blanks::Int64
: number of non blanks in the subject.
hitnames::Vector{String}
: list of blast hits.
hit_idx::Int64
: index of the current blast hit.
character_labels::Dict{String,String}
: a mapping of the character labels to the corresponding sequences.
CAOS.get_best_hit — Method.
get_best_hit(results::Array{BioTools.BLAST.BLASTResult,1}, query::String,
character_labels::Dict{String,String}, character_labels_no_gaps::Dict{String,String})
Gets the hit from blastn that has the most sequence coverage with no gaps compared to the query sequence.
Arguments
results::Array{BioTools.BLAST.BLASTResult,1}
: blastn results.
query::String
: the query that is having blanks added to it.
character_labels::Dict{String,String}
: a mapping of the character labels to the corresponding sequences.
character_labels_no_gaps::Dict{String,String}
: character labels with gaps removed from sequences.
CAOS.get_duplicate_labels — Method.
get_duplicate_labels(character_labels::Dict{String,String}, label::String)
Takes the character labels and a specific label and finds if any other sequences are the same.
Arguments
character_labels::Dict{String,String}
: character label mappings.
label::String
: taxa label to search for duplicates of.
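Finding labels that share a sequence can be sketched as a filter over the dictionary. The function below is a hypothetical illustration, not the CAOS implementation. It assumes that `label` is a key of the dictionary and that the result should list the other labels whose sequence is identical; the return type of the real function is not documented here.

```julia
# Hypothetical helper, not part of CAOS.
# Assumption: `label` is a key of `character_labels`.
function duplicate_labels_sketch(character_labels::Dict{String,String},
                                 label::String)
    target = character_labels[label]
    # Collect every other label mapped to an identical sequence.
    # Sorting only makes the result order deterministic.
    return sort(String[other for (other, seq) in character_labels
                       if other != label && seq == target])
end

seqs = Dict("a" => "ACG", "b" => "ACG", "c" => "TTT")
dups_of_a = duplicate_labels_sketch(seqs, "a")
dups_of_c = duplicate_labels_sketch(seqs, "c")
```

In this example `dups_of_a` is `["b"]` and `dups_of_c` is empty.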
CAOS.get_next_hit — Method.
get_next_hit(hitnames::Vector{String}, hit_idx::Int64)
Gets the next best hit returned from a blastn search.
Arguments
hitnames::Vector{String}
: a list of all blastn hitnames.
hit_idx::Int64
: index of the current hit.
CAOS.Node — Type.
Node(CAs::Array{Rule,1}, taxa_label::String="")
Recursive struct that stores a node.
CAOS.Rule — Type.
Rule(idxs::Tuple{Vararg{Int}}, char_attr::Tuple{Vararg{Char}},
is_pure::Bool, num_group::Int, num_non_group::Int, occurances::Int)
Struct to store relevant information about a CA.