phylojunction.data package

Submodules

phylojunction.data.attribute_transition module

class phylojunction.data.attribute_transition.AttributeTransition(attr_label: str, subtending_node_label: str, global_time: float, from_state: int, to_state: int, age: float | None = None, to_state2: int | None = None, at_speciation: bool = False)

Bases: object

age: float
at_speciation: bool | None
attr_label: str
from_state: int
global_time: float
str_representation: str
subtending_or_speciating_node_label: str
to_state: int
to_state2: int
update_daughter_members(daughter_node_label: str, daughter_node_time: float) None

phylojunction.data.sampled_ancestor module

class phylojunction.data.sampled_ancestor.SampledAncestor(label: str, lineage_node_label: str, global_time: float, age: float | None = None, state: int = 0, time_to_lineage_node: float = -1.0)

Bases: object

age: float
global_time: float
label: str
lineage_node_label: str
state: int
str_representation: str
time_to_lineage_node: float

phylojunction.data.tree module

class phylojunction.data.tree.AnnotatedTree(a_tree: Tree, total_state_count: int, start_at_origin: bool = False, alternative_root_label: str = '', condition_on_obs_both_sides_root: bool = False, max_age: float | None = None, slice_t_ends: List[float] | None = None, slice_age_ends: List[float] | None = None, sa_lineage_dict: Dict[str, List[SampledAncestor]] | None = None, at_dict: Dict[str, List[AttributeTransition]] | None = None, clado_at_dict: Dict[str, AttributeTransition] | None = None, tree_died: bool | None = None, tree_invalid: bool | None = None, read_as_newick_string: bool = False, epsilon: float = 1e-12)

Bases: Tree

Tree annotated with discrete states.

Parameters:
  • tree (dendropy.Tree) – Main class member, holding the full tree.

  • tree_reconstructed (dendropy.Tree) – Reconstructed tree produced by pruning non-observed phylogenetic paths from the full tree.

  • origin_node (dendropy.Node) – Origin node. It is ‘None’ if no origin node.

  • root_node (dendropy.Node) – Root node. It is ‘None’ if no root node.

  • rec_tr_root_node (dendropy.Node, optional) – Root node of reconstructed tree. It is ‘None’ if ‘extract_reconstructed_tree’ method is not called.

  • brosc_node (dendropy.Node) – Node that occurs before the root node when tree starts from origin node. The brosc_node can be a terminal or internal node, and is replaced by the root node if a speciation event happens.

  • alternative_root_label (str) – An AnnotatedTree may be read in with a constant function rather than be simulated by discrete_sse(). In this case, the root node is named ‘root’ so that AnnotatedTree is initialized properly, but we want to keep track of an alternative label for the root node, which may appear in a stochastic map table, or in a table with tip states. Defaults to ‘’.

  • with_origin (bool) – Flag indicating that tree starts at origin node. Upon instantiation of class, defaults to ‘False’.

  • tree_read_as_newick (bool) – Flag indicating that tree was not simulated in PJ, but read as a Newick string instead. Upon instantiation of class, defaults to ‘False’.

  • condition_on_obs_both_sides_root (bool) – Flag indicating if the root node, when the simulation starts from it, has at least one living lineage on each side. If not specified by user upon instantiation of class, defaults to ‘False’.

  • tree_died (bool) – Flag indicating if full tree went extinct before reaching the specified age stopping condition (i.e., the reconstructed tree should be empty).

  • tree_invalid (bool) – Flag indicating (if tree was simulated) that rejection sampling deemed the tree invalid. If not specified upon class instantiation, tree is assumed to have been read as a string, and flag is assigned ‘True’.

  • seed_age (float) – Age of seed node (either origin or root).

  • max_age (float) – Maximum age of the tree when it was simulated with an age stopping condition. If not provided by user upon class instantiation, it is ‘None’.

  • origin_age (float) – Age of origin node, if there is one, otherwise ‘None’.

  • origin_edge_length (float) – Length of the branch connecting the origin node to its single child (brosc or root) or to the present. If there is a root, this length ignores any intervening direct (sampled) ancestors between the origin and the root. Will be 0.0 if there is no origin node.

  • root_age (float) – Age of root node, if there is one, otherwise 0.0.

  • rec_tr_root_age (float) – Age of root node of reconstructed tree.

  • node_heights_dict (dict) – Dictionary that holds the heights of all nodes in the tree. Keys are node labels, values are floats.

  • rec_node_heights_dict (dict) – Dictionary that holds the heights of all nodes in the reconstructed tree. Keys are node labels, values are floats.

  • node_ages_dict (dict) – Dictionary that holds the ages of all nodes in the tree. Keys are node labels, values are floats.

  • rec_node_ages_dict (dict) – Dictionary that holds the ages of all nodes in the reconstructed tree. Keys are node labels, values are floats.

  • slice_t_ends (list) – List of floats with the end times for specified time slices (epochs). If not provided by user upon instantiation of class, will be ‘None’.

  • slice_age_ends (list) – List of floats with the end ages for specified time slices (epochs). If not provided by user upon instantiation of class, will be ‘None’.

  • state_count (int) – How many states there are.

  • state_count_dict (dict) – Dictionary tabulating how many terminal nodes in the full tree are in each state. Keys are integers representing states and values are their counts.

  • extant_terminal_state_count_dict (dict) – Dictionary tabulating how many living (terminal) nodes are in each state. Keys are integers representing states and values are their counts.

  • extant_sampled_terminal_state_count_dict (dict) – Dictionary tabulating how many living (terminal) and sampled nodes are in each state. Keys are integers representing states and values are their counts.

  • extinct_sampled_state_count_dict (dict) – Dictionary tabulating how many extinct (terminal) nodes are in each state. Keys are integers representing states and values are their counts.

  • sa_node_state_count_dict (dict) – Dictionary tabulating how many direct (sampled) ancestor nodes are in each state. Keys are integers representing states and values are their counts.

  • node_attr_dict (dict) – Nested dictionaries. Keys of the outer dictionary are node labels, values are inner dictionaries. The inner dictionaries’ keys are attribute names (str) and their values are the attributes’ values (Any).

  • n_extant_terminal_nodes (int) – Count of living terminal nodes.

  • n_extinct_terminal_nodes (int) – Count of dead terminal nodes.

  • n_extant_sampled_terminal_nodes (int) – Count of living and sampled terminal nodes.

  • n_sa_nodes (int) – Count of direct (sampled) ancestors. These are nodes that are both sampled and observed by definition.

  • sa_lineage_dict (dict) –

  • at_dict (dict, optional) – Dictionary holding a list of AttributeTransition objects for each branch (subtending an internal node) along which state transitions happened. When a cladogenetic event changes states, it is recorded as the first (oldest) state transition of its children.

  • rec_tr_at_dict (dict, optional) – Same as ‘at_dict’, but applies to reconstructed tree.

  • clado_at_dict (dict) –

  • epsilon (float) – Float threshold to determine if a tiny decimal number is to be considered 0.0 or not. In other words, if the difference between a tiny value ‘x’ and 0.0 is smaller than epsilon, then ‘x’ is set to 0.0. If not provided by user upon initialization of class, defaults to 1e-12.
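
The thresholding described for ‘epsilon’ can be illustrated with a minimal sketch. The helper name snap_to_zero is hypothetical and not part of PhyloJunction, and treating negative values symmetrically through an absolute value is an assumption.

```python
def snap_to_zero(x: float, epsilon: float = 1e-12) -> float:
    """Return 0.0 if x is closer to zero than epsilon, otherwise x unchanged.

    Hypothetical helper: uses the absolute difference between x and 0.0,
    which is an assumption about how negative values are handled.
    """
    return 0.0 if abs(x - 0.0) < epsilon else x


tiny = snap_to_zero(3e-13)           # below the threshold, becomes 0.0
tiny_negative = snap_to_zero(-3e-13)  # also snapped under the symmetric assumption
regular = snap_to_zero(0.25)          # well above the threshold, left as-is
print(tiny, tiny_negative, regular)
```

A strict comparison is used here, so a value exactly equal to epsilon is left untouched.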

alternative_root_label: str
at_dict: Dict[str, List[AttributeTransition]] | None
brosc_node: Node | None
clado_at_dict: Dict[str, AttributeTransition] | None
condition_on_obs_both_sides_root: bool
epsilon: float
extant_sampled_terminal_nodes_labels: Tuple[str, ...]
extant_terminal_nodes_labels: Tuple[str, ...]
extant_terminal_sampled_state_count_dict: Dict[int, int]
extant_terminal_state_count_dict: Dict[int, int]
extinct_terminal_nodes_labels: Tuple[str, ...]
extinct_terminal_state_count_dict: Dict[int, int]
extract_reconstructed_tree(plotting_overhead: bool = False, require_obs_both_sides: bool | None = None) Tree

Extract reconstructed tree from complete tree.

This method is designed to be called outside of AnnotatedTree’s initialization. It is instead triggered by rejection sampling (when conditioning on both sides of the root) and by summarization and printing functions, should the user ask for them. This saves running time.

The method deep-copies self.tree, which is of type dendropy.Tree, populates it appropriately and then returns it. One important step is the update of the reconstructed tree’s ‘rec_tr_at_dict’ and ‘rec_tr_sa_lineage_dict’ members, which will be different from the complete tree’s. These updates are carried out by ‘update_rec_tr_at_dict’ and ‘update_rec_tr_sa_lineage_dict’.

When populating the reconstructed tree, this method effectively prunes extinct taxa, and re-roots the resulting tree at the MRCA of the sampled (observed) taxa. This includes all taxa sampled at the present and direct ancestors.

Side effects on the complete tree’s AnnotatedTree object:
  1. Populates self.rec_tr_root_node

  2. Populates self.rec_tr_at_dict

  3. Populates self.rec_tr_sa_lineage_dict

Parameters:

require_obs_both_sides (bool, optional) – Flag specifying if sampled (observed) taxa are required on both sides of the root.

Returns:

A tree instance containing the reconstructed tree extracted from the AnnotatedTree instance that owns the method call.

Return type:

(dendropy.Tree)
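
The pruning and re-rooting idea behind extract_reconstructed_tree() can be sketched on a toy tree. This is not PhyloJunction's implementation: ToyNode and prune_unobserved are hypothetical names, and the sketch ignores branch lengths, direct (sampled) ancestors, and the transition dictionaries. It drops subtrees without observed tips and collapses unary nodes, so the returned node is the MRCA of the observed tips.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for a tree node (not dendropy.Node)."""
    label: str
    observed: bool = False  # True for tips sampled at the present
    children: List["ToyNode"] = field(default_factory=list)


def prune_unobserved(nd: ToyNode) -> Optional[ToyNode]:
    """Return a pruned copy of nd, or None if no observed tip descends from it."""
    if not nd.children:
        return ToyNode(nd.label, nd.observed) if nd.observed else None
    pruned = (prune_unobserved(ch) for ch in nd.children)
    kept = [ch for ch in pruned if ch is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # collapse a node left with a single surviving child
    return ToyNode(nd.label, nd.observed, kept)


# Complete tree: root -> (nd1 -> (sp1, sp2), sp3), where sp3 went extinct.
complete = ToyNode("root", children=[
    ToyNode("nd1", children=[ToyNode("sp1", True), ToyNode("sp2", True)]),
    ToyNode("sp3", observed=False),
])
reconstructed = prune_unobserved(complete)
print(reconstructed.label, [ch.label for ch in reconstructed.children])
```

Because the sketch builds new nodes instead of editing the input, the complete tree stays intact, which mirrors the deep copy described above.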

get_stats_dict() Dict[str, int | float]

Get dictionary with AnnotatedTree’s summary stats.

This method is required whenever the user asks for a DAG node to be summarized.

Returns:

Dictionary with summary statistic names as keys and their values as values.

Return type:

(dict)

get_taxon_states_str(nexus: bool = False) str

Get states for all nodes in tree as single string.

Parameters:

nexus (bool) – Flag specifying whether states are being collected for Nexus printing (‘True’ if so). Defaults to ‘False’.

Returns:

String containing the states of all nodes in tree.

Return type:

(str)

is_extant_or_sa_on_both_sides_complete_tr_root(a_node: Node) bool

Verify one or more sampled nodes exist on both root sides.

This method is called by extract_reconstructed_tree(), and verifies that there is at least one sampled node (direct sampled ancestor, or sampled extant) on both sides of the root of the complete tree. A root node labeled ‘root’ should be guaranteed to exist by this function’s caller.

Parameters:

a_node (dendropy.Node) – Node from which to start recursing in order to determine whether there are sampled taxa on both sides of the root. E.g., the origin node.

Returns:

Whether there is at least one sampled (observed) taxon on each side of the complete tree’s root node.

Return type:

(bool)
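
As a rough illustration of this check, the sketch below recurses from a starting node to a node labeled ‘root’ and then tests each side for a sampled descendant. ToyNode and all function names are hypothetical stand-ins, not the library's code, and a bifurcating root is assumed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for a tree node (not dendropy.Node)."""
    label: str
    sampled: bool = False  # sampled extant tip or direct (sampled) ancestor
    children: List["ToyNode"] = field(default_factory=list)


def find_root(nd: ToyNode) -> Optional[ToyNode]:
    """Depth-first search for the node labeled 'root'."""
    if nd.label == "root":
        return nd
    for ch in nd.children:
        hit = find_root(ch)
        if hit is not None:
            return hit
    return None


def has_sampled(nd: ToyNode) -> bool:
    """True if nd or any of its descendants is sampled."""
    return nd.sampled or any(has_sampled(ch) for ch in nd.children)


def sampled_on_both_sides(a_node: ToyNode) -> bool:
    """True if each of the root's two children leads to a sampled node."""
    root = find_root(a_node)
    if root is None:
        raise ValueError("no node labeled 'root' below the starting node")
    return len(root.children) == 2 and all(has_sampled(ch) for ch in root.children)


left = ToyNode("sp1", sampled=True)
right = ToyNode("nd1", children=[ToyNode("sp2"), ToyNode("sp3", sampled=True)])
origin = ToyNode("origin", children=[ToyNode("root", children=[left, right])])
both_sides = sampled_on_both_sides(origin)
print(both_sides)
```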

max_age: float | None
n_extant_sampled_terminal_nodes: int
n_extant_terminal_nodes: int
n_extinct_sampled_terminal_nodes: int
n_extinct_terminal_nodes: int
n_sa_nodes: int
node_ages_dict: Dict[str, float]
node_attr_dict: Dict[str, Dict[str, Any]]
node_heights_dict: Dict[str, float]
origin_age: float | None
origin_edge_length: float
origin_node: Node | None
plot_node(axes: Axes, node_attr: str = 'state', draw_reconstructed: bool | None = False, **kwargs) None

Draw tree on provided Axes instance.

This method is required whenever the user asks for a DAG node to be drawn.

Parameters:
  • axes (matplotlib.pyplot.Axes) – Axes object where we are drawing the tree.

  • node_attr (str) – Name of the attribute according to which one wants to color the AnnotatedTree’s branches. Defaults to ‘state’.

  • draw_reconstructed (bool, optional) – Whether we are drawing the reconstructed tree instead of the complete tree. Defaults to ‘False’.

populate_nd_attr_dict(attrs_of_interest_list: List[str], attr_dict_added_separately_from_tree: bool = False) None

Populate member nested dictionary with node attributes.

This method is not called upon initialization of class, but rather when information about nodes’ states is required, e.g., when get_taxon_states_str() is called. The method takes the attribute values (for one or more attributes) stored in a DendroPy.Tree, and copies them into the member dictionary node_attr_dict.

There is no return and only a side-effect.

Parameters:
  • attrs_of_interest_list (list) – List of attribute names (str) to store in member dictionary.

  • attr_dict_added_separately_from_tree (bool) – Flag specifying whether method is being called outside tree initialization (i.e., to a tree that already exists), or during tree initialization.
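
The nested layout of node_attr_dict (node label, then attribute name, then value) can be sketched as follows. The function collect_node_attrs and the SimpleNamespace nodes are hypothetical stand-ins. Unlike the method documented here, the sketch returns the dictionary instead of storing it on the instance.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def collect_node_attrs(nodes: Iterable[Any],
                       attrs_of_interest_list: List[str]) -> Dict[str, Dict[str, Any]]:
    """Copy the requested attributes of each node into a nested dictionary."""
    node_attr_dict: Dict[str, Dict[str, Any]] = {}
    for nd in nodes:
        inner = node_attr_dict.setdefault(nd.label, {})
        for attr in attrs_of_interest_list:
            if hasattr(nd, attr):  # skip attributes a node does not carry
                inner[attr] = getattr(nd, attr)
    return node_attr_dict


toy_nodes = [
    SimpleNamespace(label="sp1", state=0, alive=True),
    SimpleNamespace(label="sp2", state=1, alive=False),
]
attrs = collect_node_attrs(toy_nodes, ["state"])
print(attrs)
```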

rec_node_ages_dict: Dict[str, float]
rec_node_heights_dict: Dict[str, float]
rec_str() str
rec_tr_at_dict: Dict[str, List[AttributeTransition]] | None
rec_tr_clado_at_dict: Dict[str, AttributeTransition] | None
rec_tr_root_age: float
rec_tr_root_node: Node | None
rec_tr_sa_lineage_dict: Dict[str, List[SampledAncestor]] | None
root_age: float
root_node: Node | None
sa_lineage_dict: Dict[str, List[SampledAncestor]] | None
sa_obs_nodes_labels: Tuple[str, ...]
seed_age: float
slice_age_ends: List[float] | None
slice_t_ends: List[float] | None
state_count: int
state_count_dict: Dict[int, int]
tree: Tree
tree_died: bool | None
tree_invalid: bool | None
tree_read_as_newick: bool
tree_reconstructed: Tree
update_rec_tr_at_dict(rec_tree_root_nd: Node) None

Update ‘rec_tr_at_dict’ and ‘rec_tr_clado_at_dict’ members.

The ‘at_dict’ member of the AnnotatedTree, when defined, will by default host the state transitions of every node of the complete tree. This method initializes ‘rec_tr_at_dict’ so that it reflects the reconstructed tree – it is only called when necessary, by the ‘extract_reconstructed_tree’ method.

Member ‘rec_tr_clado_at_dict’ is also updated. Internal nodes undergoing cladogenetic changes that are in the complete tree but not in the reconstructed tree are removed.

update_rec_tr_sa_lineage_dict() None

Update ‘rec_tr_sa_lineage_dict’ member.

The ‘rec_tr_sa_lineage_dict’ member of the AnnotatedTree, when initialized, will be identical to the complete tree’s ‘sa_lineage_dict’. But with the pruning of the complete tree into the reconstructed tree, some of the internal nodes whose names are keys inside ‘sa_lineage_dict’ are pruned (because one of their child lineages dies off). Their associated SampledAncestor instances must then be passed down to their surviving lineages (which then must also be annotated with ‘is_sa_lineage == True’).

Parameters:

rec_tree_root_nd (DendroPy.Node) – Node that roots the reconstructed tree.

with_origin: bool
phylojunction.data.tree.get_color_map(n_states: int) Dict[int, str]

Create and return a map from discrete state to color.

This method uses a palette of very contrasting colors if the number of states is less than or equal to 20. If greater than 20 and less than or equal to 120, the method switches to another palette and carries out some truncation to avoid colors that are almost white. If the number of states is greater than 120, it switches to yet another palette, truncating it again. The choice of palette and truncation is arbitrary, and was informed by an experiment in plotting.pj_seeing_colors.py.

Parameters:

n_states (int) – Number of states.

Returns:

A dictionary whose keys are integers representing discrete states, and whose values are strings containing the hex codes of colors.

Return type:

(dict)
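
The shape of the returned mapping (integer state to hex color string) can be illustrated with a stand-in. The sketch below does not reproduce the palettes or truncation described above. It simply spaces hues evenly with the standard-library colorsys module, and the function name toy_color_map is hypothetical.

```python
import colorsys
from typing import Dict


def toy_color_map(n_states: int) -> Dict[int, str]:
    """Map each state 0..n_states-1 to a hex color with evenly spaced hues."""
    color_map: Dict[int, str] = {}
    for state in range(n_states):
        hue = state / n_states  # hues spread over [0, 1)
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        color_map[state] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return color_map


three_state_map = toy_color_map(3)
print(three_state_map)
```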

phylojunction.data.tree.get_node_name(nd: Node) str

Return node name.

Get name of node that may be stored in a DendroPy.Node as either its label, taxon, or taxon.label members. It tries them all to make sure a node name is found and returned. If none is found, an exception is raised.

Parameters:

nd (dendropy.Node) – Node whose name one wants to collect.

Returns:

Name of the node.

Return type:

(str)
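
A fallback lookup of this kind might look like the sketch below. It uses duck-typed stand-in objects instead of dendropy.Node, the function name toy_node_name is hypothetical, and the order in which the members are tried is an assumption.

```python
from types import SimpleNamespace
from typing import Any


def toy_node_name(nd: Any) -> str:
    """Return a node's name, trying label, then taxon.label, then taxon."""
    label = getattr(nd, "label", None)
    if label:
        return str(label)
    taxon = getattr(nd, "taxon", None)
    if taxon is not None:
        taxon_label = getattr(taxon, "label", None)
        if taxon_label:
            return str(taxon_label)
        return str(taxon)
    raise ValueError("node has no label, taxon, or taxon.label")


name_from_label = toy_node_name(SimpleNamespace(label="nd1", taxon=None))
name_from_taxon_label = toy_node_name(
    SimpleNamespace(label=None, taxon=SimpleNamespace(label="sp2"))
)
name_from_taxon = toy_node_name(SimpleNamespace(label=None, taxon="sp3"))
print(name_from_label, name_from_taxon_label, name_from_taxon)
```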

phylojunction.data.tree.get_x_coord_from_nd_heights(ann_tr: AnnotatedTree, use_age: bool = False, unit_branch_lengths: bool = False, draw_reconstructed: bool = False) Dict[str, float]

Return x-coordinates for all nodes in tree.

This method returns a dictionary of node labels as keys, node x_coords (time) as values.

Parameters:
  • ann_tr (AnnotatedTree) – Instance of AnnotatedTree that we are drawing.

  • use_age (bool) – Flag specifying whether to use node ages. Defaults to ‘False’.

  • unit_branch_lengths (bool, optional) – If branch lengths are all 1.0 (currently not used). Defaults to ‘False’.

  • draw_reconstructed (bool, optional) – Whether we are drawing the reconstructed tree instead of the complete tree. Defaults to ‘False’.

phylojunction.data.tree.get_y_coord_from_n_obs_nodes(ann_tr: AnnotatedTree, start_at_origin: bool = False, sa_along_branches: bool = True, draw_reconstructed: bool = False) Dict[str, float]

Return y-coordinates for all nodes in tree.

This method returns a dictionary of node labels as keys, y-coords as values. Y-coords here are integers that go from 1 to the total number of observable nodes. Consecutive observable nodes are 1 y-unit apart.

Parameters:
  • ann_tr (AnnotatedTree) – Instance of AnnotatedTree that we are drawing.

  • start_at_origin (bool) – Flag specifying if tree starts at the origin node. Defaults to ‘False’.

  • sa_along_branches (bool) – Flag specifying if direct (sampled) ancestors should be placed along a branch. Defaults to ‘True’.

  • draw_reconstructed (bool, optional) – Flag specifying if tree being drawn is the reconstructed tree or complete tree. Defaults to ‘False’.
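
The spacing rule for observable nodes can be sketched in a few lines. The function name and the input list are hypothetical, and the sketch covers only observable nodes, leaving aside how the remaining nodes are positioned.

```python
from typing import Dict, List


def toy_y_coords(obs_labels: List[str]) -> Dict[str, float]:
    """Assign y = 1, 2, ..., n to observable nodes in the order given."""
    return {label: float(i) for i, label in enumerate(obs_labels, start=1)}


y_coords = toy_y_coords(["sp1", "sp2", "sp3"])
print(y_coords)
```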

phylojunction.data.tree.pj_get_name_mrca_obs_terminals(nd: Node, nd_label_list: List[str]) str

Get name of the MRCA of specified observed terminal nodes.

This method recursively finds the most recent common ancestor of the terminal nodes whose names are specified as input. These nodes must be observed, i.e., be either direct (sampled) ancestors, or sampled extant terminal nodes.

Parameters:
  • nd (dendropy.Node) – Node from which to recurse and whose name may be returned.

  • nd_label_list (list) – List of node names (str) whose MRCA’s name is being searched.

Returns:

Name of the MRCA node.

Return type:

(str)
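
One way to find an MRCA recursively is sketched below on a toy tree. This is an illustration under assumptions, not the library's algorithm: ToyNode, _find_mrca and toy_mrca_name are hypothetical names, and node labels are assumed to be unique.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class ToyNode:
    """Hypothetical stand-in for a tree node (not dendropy.Node)."""
    label: str
    children: List["ToyNode"] = field(default_factory=list)


def _find_mrca(nd: ToyNode, targets: Set[str]) -> Tuple[int, Optional[str]]:
    """Return (number of targets at or below nd, MRCA label once found)."""
    found = 1 if nd.label in targets else 0
    for ch in nd.children:
        n_below, mrca = _find_mrca(ch, targets)
        if mrca is not None:
            return n_below, mrca  # a deeper node already covers every target
        found += n_below
    if found == len(targets):
        return found, nd.label  # first (deepest) node covering every target
    return found, None


def toy_mrca_name(nd: ToyNode, nd_label_list: List[str]) -> str:
    """Return the label of the MRCA of the listed nodes, searching below nd."""
    targets = set(nd_label_list)
    if not targets:
        raise ValueError("nd_label_list is empty")
    _, mrca = _find_mrca(nd, targets)
    if mrca is None:
        raise ValueError("not all labels were found below the starting node")
    return mrca


tree = ToyNode("root", children=[
    ToyNode("nd1", children=[ToyNode("sp1"), ToyNode("sp2")]),
    ToyNode("sp3"),
])
inner_mrca = toy_mrca_name(tree, ["sp1", "sp2"])
outer_mrca = toy_mrca_name(tree, ["sp1", "sp3"])
print(inner_mrca, outer_mrca)
```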

phylojunction.data.tree.plot_ann_tree(ann_tr: AnnotatedTree, axes: Axes, use_age: bool = False, start_at_origin: bool = False, attr_of_interest: str = 'state', sa_along_branches: bool = True, draw_reconstructed: bool = False) Tuple[Dict[str, float]]

Plot instance of AnnotatedTree on provided Axes instance.

Plotting is a side-effect.

Parameters:
  • ann_tr (AnnotatedTree) – Instance of AnnotatedTree that we are drawing.

  • axes (matplotlib.pyplot.Axes) – Axes object where we are drawing the tree.

  • use_age (bool) – Flag specifying if age or time is being used.

  • start_at_origin (bool) – Flag specifying if drawing starts at the origin node. Defaults to ‘False’.

  • attr_of_interest (str) – Name of the attribute according to which states we are coloring tree branches. Defaults to ‘state’.

  • sa_along_branches (bool) – Flag specifying if direct (sampled) ancestors should be placed along a branch. Defaults to ‘True’.

  • draw_reconstructed (bool) – Flag specifying if reconstructed tree is meant to be drawn instead of complete tree. Defaults to ‘False’.

Returns:

Tuple with two dictionaries. This return is used for unit testing purposes only. Keys are node names, values are either x- or y-coordinates.

Return type:

(tuple)

Module contents