phylojunction.distribution package

Submodules

phylojunction.distribution.dn_discrete_sse module

class phylojunction.distribution.dn_discrete_sse.DnSSE(sse_stash: SSEStash, n: int = 1, n_replicates: int = 1, origin: bool = True, start_states_list: List[int] = [], stop: str = '', stop_value: List[float] = [], condition_on_speciation: bool = False, condition_on_survival: bool = True, condition_on_obs_both_sides_root: bool = False, min_rec_taxa: int = 0, max_rec_taxa: int = 1000000000000, abort_at_alive_count: int = 1000000000000, epsilon: float = 1e-12, runtime_limit: int = 300, max_n_failed_attempts: int = 200, rng_seed: int | None = None, debug: bool | None = False, info: bool | None = False)

Bases: DistrForSampling

Discrete SSE distribution.

Class for the discrete state-dependent speciation and extinction (SSE) distribution. Used for sampling phylogenetic trees annotated with discrete states.

The simulate() method in this class is a ‘rising tide’ sampler, where all living lineages grow together. It is not a recursive sampler in which lineages take care of growing themselves and recur upon birth events.

At all times we know:
  1. how many lineages are alive and dead

  2. all the character states represented by living lineages
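As an illustration of the 'rising tide' idea only, here is a minimal sketch. It assumes a single-rate birth-death process in which states are inherited unchanged. The function name rising_tide and all of its arguments are hypothetical and are not part of PhyloJunction. All living lineages share one clock: the waiting time to the next event is drawn from the total rate over living lineages, and then one lineage is picked to speciate or go extinct.

```python
import random


def rising_tide(birth_rate, death_rate, stop_age, start_state=0, seed=None):
    """Grow all lineages together until stop_age or until extinction.

    Returns the states of the living lineages and the number of dead ones.
    Hypothetical helper: states are simply inherited, rates are state-independent.
    """
    rng = random.Random(seed)
    alive_states = [start_state]  # one entry per living lineage
    n_dead = 0
    t = 0.0
    per_lineage_rate = birth_rate + death_rate

    while alive_states and per_lineage_rate > 0.0:
        # One shared clock: the total rate scales with the number alive.
        t += rng.expovariate(len(alive_states) * per_lineage_rate)
        if t >= stop_age:
            break
        idx = rng.randrange(len(alive_states))  # lineage hit by the event
        if rng.random() < birth_rate / per_lineage_rate:
            alive_states.append(alive_states[idx])  # speciation
        else:
            alive_states.pop(idx)  # extinction
            n_dead += 1

    return alive_states, n_dead


alive, dead = rising_tide(birth_rate=1.0, death_rate=0.2, stop_age=2.0, seed=1)
print(len(alive), "alive;", dead, "dead; states:", sorted(set(alive)))
```

Because the loop holds the full list of living lineages, the alive and dead counts and the set of represented states are available at every step, which is the property listed above.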

Parameters:
  • n_sim (int) – Number of trees to sample. Defaults to 1.

  • n_repl (int) – Number of tree replicates per sample. This is equivalent to the size of a plate in the DAG representation. Defaults to 1.

  • with_origin (bool) – Flag for whether the process starts at the origin or not.

  • root_is_born (bool) – Attribute that records if root node was created and is in tree (for when it starts at the origin). ‘True’ if process starts at the root.

  • start_states (List[int]) – List of integers representing the starting state of each of the ‘n_sim’ samples.

  • seed_age (float, optional) – Age of seed node (either origin or root).

  • condition_on_speciation (bool) – Flag for rejecting tree samples that do not undergo a single speciation event before ‘stop_val’ is met. Note that this first speciation event may or may not be what is canonically the reconstructed tree root node. If ‘True’, rejects tree samples failing to meet this condition. Defaults to ‘False’.

  • condition_on_survival (bool) – Flag for rejecting tree samples that go extinct before ‘stop_val’ is met. If ‘True’, rejects tree sample failing to meet condition. Defaults to ‘True’.

  • condition_on_obs_both_sides_root (bool) – Flag for rejecting tree samples that do not have observed nodes on both sides of the complete tree’s root node. If ‘True’, rejects tree sample failing to meet condition. Defaults to ‘False’.

  • stop (str) – Stop condition to end the sampling (simulation) procedure and return the tree. If ‘age’, stops when the age of either the origin or the root is equal to ‘stop_val’ (see below). If ‘size’, stops when the tree has ‘stop_val’ observed nodes.

  • stop_val (List[float]) – List of values used by ‘stop’ (see above) to end each of the ‘n_sim’ sampling (simulation) procedures and return the tree. Each value is either a maximum age or a maximum count of observable nodes.

  • min_rec_taxa (int) – Required minimum number of observed taxa in reconstructed tree. Defaults to 0.

  • max_rec_taxa (int) – Required maximum number of observed taxa in reconstructed tree. Defaults to 1e12.

  • abort_at_alive_count (int) – Number of living (not observed!) nodes at which point sample is rejected. This parameter is used to abort samples whose SSE parameters cause trees to grow too large. Defaults to 1e12.

  • sse_stash (SSEStash) – Object holding all discrete state-dependent parameters we need to sample (i.e., simulate). This object holds (i) the number of discrete states of the process, (ii) the number of time slices if process is time-heterogeneous, (iii) the end age of each time slice.

  • events (MacroevolEventHandler) – Object that computes total rate values, and samples SSE events. It is a member of sse_stash.

  • state_count (int) – Number of states of SSE process.

  • n_time_slices (int) – Number of time slices (epochs).

  • slice_t_ends (List[float], optional) – List of floats with the end time of each time slice (forward in time!).

  • prob_handler (DiscreteStateDependentProbabilityHandler) – Object that takes care of state-dependent taxon sampling across time slices.

  • epsilon (float, optional) – Float threshold to determine if a tiny decimal number is to be considered 0.0 or not. In other words, if the difference between a tiny value ‘x’ and 0.0 is smaller than epsilon, then ‘x’ is set to 0.0. Defaults to 1e-12.

  • runtime_limit (int, optional) – Runtime ceiling (in seconds) for obtaining the ‘n’ tree samples. If this limit is met, the sampling procedure is aborted. Defaults to 300.

  • max_n_failed_attempts (int, optional) – Maximum number of failed tree sampling attempts (replicates included) before PhyloJunction quits. Defaults to 200.

  • rng_seed (int, optional) – Integer seed for the two random number generators used in this class. This seed is only relevant when the user bypasses the scripting language (otherwise random number generator seeds are handled in cmd_parse.py). Defaults to None.

  • debug (bool, optional) – Flag for whether to print debugging messages during sampling procedure.

  • info (bool, optional) – Flag for whether to print information about running sampling procedure.
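The thresholding described for ‘epsilon’ can be sketched as follows. The helper name snap_to_zero is hypothetical and only illustrates the comparison. It is not PhyloJunction code.

```python
def snap_to_zero(x, epsilon=1e-12):
    """Return 0.0 if x is within epsilon of 0.0, otherwise x unchanged."""
    return 0.0 if abs(x - 0.0) < epsilon else x


# Floating-point residue from arithmetic that should cancel exactly.
residue = 0.3 - 0.1 * 3
print(residue)                # tiny non-zero value
print(snap_to_zero(residue))  # 0.0
print(snap_to_zero(0.001))    # 0.001, left untouched
```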

DN_NAME = 'DnSSE'
abort_at_alive_count: int
condition_on_obs_both_sides_root: bool
condition_on_speciation: bool
condition_on_survival: bool
debug: bool
epsilon: float
events: MacroevolEventHandler
generate() List[AnnotatedTree]

Generate samples according to distribution.

The number of samples will be the specified number of samples times the number of replicates (per sample).

Returns:

Valid simulated trees annotated with discrete traits.

Return type:

(List[AnnotatedTree])
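The sketch below shows one way a rejection loop could honor the conditioning flags and the ‘max_n_failed_attempts’ limit while collecting ‘n_sim’ times ‘n_repl’ accepted samples. The function generate_with_rejection and its callables simulate_once and is_valid are hypothetical stand-ins. Whether PhyloJunction counts failures per sample or overall is not stated here, so a single overall counter is an assumption.

```python
import random


def generate_with_rejection(simulate_once, is_valid, n_sim=1, n_repl=1,
                            max_n_failed_attempts=200):
    """Collect n_sim * n_repl draws that pass is_valid, rejecting the rest."""
    target = n_sim * n_repl
    accepted = []
    n_failed = 0  # assumption: one counter shared by all samples

    while len(accepted) < target:
        draw = simulate_once()
        if is_valid(draw):
            accepted.append(draw)
            continue
        n_failed += 1
        if n_failed >= max_n_failed_attempts:
            raise RuntimeError(
                f"Gave up after {n_failed} rejected attempts "
                f"({len(accepted)} of {target} samples accepted)."
            )

    return accepted


# Toy stand-in: a "tree" is just its number of living tips, and the
# survival condition rejects draws with zero tips.
rng = random.Random(0)
trees = generate_with_rejection(
    simulate_once=lambda: rng.randint(0, 5),
    is_valid=lambda n_tips: n_tips > 0,
    n_sim=3,
    n_repl=2,
)
print(len(trees))  # 6 accepted draws: 3 samples times 2 replicates
```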

get_rev_inference_spec_info() List[str]
info: bool
init_check_vectorize_sample_size() None

Vectorize SSE rates and probs, start values and stop values.

This method is in the parent class.

Raises:

DimensionalityError – Is raised if (i) more than one SSE rate value is provided, but that number is smaller than the requested number of tree samples (because then we do not know how to vectorize the values), (ii) same as (i), but for SSE probabilities, (iii) same as (i), but for starting states, (iv) same as (i), but for stop values
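The vectorization rule described above can be sketched as follows. This is an illustration under assumptions: the function vectorize is hypothetical, the DimensionalityError class defined here is a local stand-in for the one named above, and truncating surplus values is a guess, since the text only specifies the error case.

```python
class DimensionalityError(Exception):
    """Local stand-in for the exception named in the documentation."""


def vectorize(values, n_sim, name="parameter"):
    """Expand or validate values so there is one per requested sample."""
    values = list(values)
    if len(values) == 1:
        # A single value is reused for every sample.
        return values * n_sim
    if 1 < len(values) < n_sim:
        # More than one value, but too few to pair with the samples.
        raise DimensionalityError(
            f"{name}: got {len(values)} values for {n_sim} samples."
        )
    # Assumption: surplus values are ignored; an empty list is returned as is.
    return values[:n_sim]


print(vectorize([0.5], 3))          # [0.5, 0.5, 0.5]
print(vectorize([0, 1, 2], 3))      # [0, 1, 2]
try:
    vectorize([0, 1], 3, name="start states")
except DimensionalityError as err:
    print("rejected:", err)
```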

max_n_failed_attempts: int
max_rec_taxa: int
min_rec_taxa: int
n_sim: int
n_time_slices: int
prob_handler: DiscreteStateDependentProbabilityHandler
rng_seed: int
root_is_born: bool
runtime_limit: int
seed_age: float | None
simulate(a_start_state: int, a_stop_value: int | float, sample_idx: int = 0) AnnotatedTree

Sample (simulate) discrete-SSE tree.

The simulated tree may or may not meet the conditions specified by the user. Conditioning is handled outside of this method (see documentation for .generate()).

Parameters:
  • a_start_state (int) – State at seed node (origin or root).

  • a_stop_value (float) – Value to stop simulation with (number of tips or tree height).

  • sample_idx (int) – Index of current sample.

Returns:

A tree annotated with discrete states.

Return type:

(AnnotatedTree)

slice_t_ends: List[float | None]
sse_stash: SSEStash
start_states: List[int]
state_count: int
stop: str
stop_val: List[float]
with_origin: bool

phylojunction.distribution.dn_parametric module

class phylojunction.distribution.dn_parametric.DnExponential(n_samples: int, n_repl: int, scale_or_rate_param: List[float], rate_parameterization: bool, parent_node_tracker: Dict[str, str] | None = None)

Bases: DistrForSampling

DN_NAME = 'Exponential'
static draw_exp(n_samples: int, scale_or_rate_param: float, rate_parameterization: bool = True) ndarray

Return sample from exponential distribution.

Parameters:
  • n_samples (int) – Number of draws (sample size).

  • scale_or_rate_param (float) – Scale or rate (default) of exponential distribution.

  • rate_parameterization (bool, optional) – Argument of ‘scale_or_rate_param’ is rate instead of scale. Defaults to ‘True’.

Returns:

List of floats sampled from exponential distribution.

Return type:

(list)
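To make the rate versus scale parameterization concrete, here is a minimal sketch using only the Python standard library. The function draw_exp_sketch is hypothetical and returns a plain list, whereas the signature above returns an ndarray. The only point illustrated is that scale is the reciprocal of rate.

```python
import random


def draw_exp_sketch(n_samples, scale_or_rate_param,
                    rate_parameterization=True, seed=None):
    """Draw n_samples exponential variates from a rate or a scale."""
    rng = random.Random(seed)
    # random.expovariate expects a rate, so a scale must be inverted.
    if rate_parameterization:
        rate = scale_or_rate_param
    else:
        rate = 1.0 / scale_or_rate_param
    return [rng.expovariate(rate) for _ in range(n_samples)]


# A rate of 2.0 and a scale of 0.5 describe the same distribution, so the
# same seed yields the same draws.
by_rate = draw_exp_sketch(5, 2.0, rate_parameterization=True, seed=42)
by_scale = draw_exp_sketch(5, 0.5, rate_parameterization=False, seed=42)
print(by_rate == by_scale)  # True
```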

exp_rate_parameterization: bool
exp_scale_or_rate_list: List[float]
generate() List[float]
get_rev_inference_spec_info() List[str]
init_check_vectorize_sample_size(param_list: List[Any] = []) List[List[int | float | str]] | None

Check sample size against number of provided parameter values

This is the function behind the vectorization functionality

param_dict: Dict[str, bool | List[int | float | str]]
parent_node_tracker: Dict[str, str] | None
vectorized_params: List[List[int | float | str]]
class phylojunction.distribution.dn_parametric.DnGamma(n_samples: int, n_repl: int, shape_param: List[float], scale_or_rate_param: List[float], rate_parameterization: bool, parent_node_tracker: Dict[str, str] | None = None)

Bases: DistrForSampling

DN_NAME = 'Gamma'
static draw_gamma(n_samples: int, shape_param: float, scale_or_rate_param: float, rate_parameterization: bool = False) float64 | ndarray

Return sample from gamma distribution.

Parameters:
  • n_samples (int) – Number of draws (sample size).

  • shape_param (float) – Gamma distribution shape parameter (represented by alpha or kappa sometimes).

  • scale_or_rate_param (float) – Gamma distribution scale or rate parameter.

  • rate_parameterization (bool, optional) – Argument of ‘scale_or_rate_param’ is rate instead of scale. Defaults to ‘False’.

Returns:

List of floats sampled from gamma distribution.

Return type:

(list)
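The same conversion applies to the gamma distribution, with the difference that the default here is the scale parameterization. The sketch below uses the standard library, in which random.gammavariate takes a shape and a scale. The function draw_gamma_sketch is hypothetical and returns a plain list.

```python
import random


def draw_gamma_sketch(n_samples, shape_param, scale_or_rate_param,
                      rate_parameterization=False, seed=None):
    """Draw n_samples gamma variates from a shape plus a scale or a rate."""
    rng = random.Random(seed)
    # random.gammavariate(alpha, beta) takes beta as a scale.
    if rate_parameterization:
        scale = 1.0 / scale_or_rate_param
    else:
        scale = scale_or_rate_param
    return [rng.gammavariate(shape_param, scale) for _ in range(n_samples)]


# Shape 2 with scale 0.5 is the same distribution as shape 2 with rate 2.
by_scale = draw_gamma_sketch(4, 2.0, 0.5, rate_parameterization=False, seed=7)
by_rate = draw_gamma_sketch(4, 2.0, 2.0, rate_parameterization=True, seed=7)
print(by_scale == by_rate)  # True
```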

gamma_rate_parameterization: bool
gamma_scale_or_rate_param_list: List[float]
gamma_shape_param_list: List[float]
generate() List[float]
get_rev_inference_spec_info() List[str]
init_check_vectorize_sample_size(param_list: List[Any] = []) List[List[int | float | str]] | None

Check sample size against number of provided parameter values

This is the function behind the vectorization functionality

param_dict: Dict[str, bool | List[int | float | str]] = {}
parent_node_tracker: Dict[str, str] | None
vectorized_params: List[List[int | float | str]]
class phylojunction.distribution.dn_parametric.DnLogNormal(n_samples: int, n_repl: int, ln_mean: List[float], ln_sd: List[float], ln_log_space: bool, parent_node_tracker: Dict[str, str] | None = None)

Bases: DistrForSampling

DN_NAME = 'Log-normal'
static draw_ln(n_samples: int, mean_param: float, sd_param: float, scale: float = 1.0, log_space: bool = True) float64 | ndarray

Return sample from log-normal distribution.

Parameters:
  • n_samples (int) – Number of draws (sample size).

  • mean_param (float) – Mean (scale) of log-normal.

  • sd_param (float) – Std. deviation (shape) of log-normal distribution.

  • scale (float) – Mean (scale) of log-normal distribution in log-space. Defaults to 1.0.

  • log_space (bool, optional) – Flag specifying if mean of distribution is provided in log-space. Defaults to ‘True’.

Returns:

List of floats sampled from log-normal distribution.

Return type:

(list)
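A hedged sketch of what the ‘log_space’ flag might control follows. It assumes that when ‘log_space’ is False the mean is given on the natural scale and must be log-transformed before sampling. That reading is an assumption and is not confirmed by the text above. The function draw_ln_sketch is hypothetical, omits the ‘scale’ argument, and returns a plain list.

```python
import math
import random


def draw_ln_sketch(n_samples, mean_param, sd_param, log_space=True, seed=None):
    """Draw n_samples log-normal variates.

    Assumption: with log_space=False, mean_param is on the natural scale
    and is converted to log-space before sampling.
    """
    rng = random.Random(seed)
    mu = mean_param if log_space else math.log(mean_param)
    # random.lognormvariate takes the mean and sd of the underlying normal.
    return [rng.lognormvariate(mu, sd_param) for _ in range(n_samples)]


# Under the stated assumption, 0.0 in log-space and 1.0 in natural space
# are the same parameter, because log(1.0) == 0.0.
in_log = draw_ln_sketch(4, 0.0, 0.5, log_space=True, seed=3)
in_nat = draw_ln_sketch(4, 1.0, 0.5, log_space=False, seed=3)
print(in_log == in_nat)  # True
```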

generate() List[float]
get_rev_inference_spec_info() List[str]
init_check_vectorize_sample_size(param_list: List[Any] = []) List[List[int | float | str]] | None

Check sample size against number of provided parameter values

This is the function behind the vectorization functionality

ln_log_space: bool
ln_mean_list: List[float]
ln_sd_list: List[float]
param_dict: Dict[str, bool | List[int | float | str]]
parent_node_tracker: Dict[str, str] | None
vectorized_params: List[List[int | float | str]]
class phylojunction.distribution.dn_parametric.DnNormal(n_samples: int, n_repl: int, norm_mean_param: List[float], norm_sd_param: List[float], parent_node_tracker: Dict[str, str] | None)

Bases: DistrForSampling

DN_NAME = 'Normal'
static draw_normal(n_samples: int, mean_param: float, sd_param: float) float64 | ndarray

Return sample from normal distribution.

Parameters:
  • n_samples (int) – Number of draws (sample size).

  • mean_param (float) – Mean (location) of normal distribution.

  • sd_param (float) – Std. deviation (scale) of normal distribution.

Returns:

List of values sampled from normal distribution.

Return type:

(list)

generate() List[float]
get_rev_inference_spec_info() List[str]
init_check_vectorize_sample_size(param_list: List[Any] = []) List[List[int | float | str]] | None

Check sample size against number of provided parameter values

This is the function behind the vectorization functionality

norm_mean_param_list: List[float]
norm_sd_param_list: List[float]
param_dict: Dict[str, bool | List[int | float | str]]
parent_node_tracker: Dict[str, str] | None
vectorized_params: List[List[int | float | str]]
class phylojunction.distribution.dn_parametric.DnUnif(n_samples: int, n_repl: int, min_param: List[float], max_param: List[float], parent_node_tracker: Dict[str, str] | None = None)

Bases: DistrForSampling

DN_NAME = 'Uniform'
static draw_unif(n_samples: int, min_param: float, max_param: float) float64 | ndarray

Return sample from uniform distribution.

Parameters:
  • n_samples (int) – Number of draws (sample size).

  • min_param (float) – Minimum value.

  • max_param (float) – Maximum value.

Returns:

List of floats sampled from uniform distribution.

Return type:

(list)

generate() List[float]
get_rev_inference_spec_info() List[str]
init_check_vectorize_sample_size(param_list: List[Any] = []) List[List[int | float | str]] | None

Check sample size against number of provided parameter values

This is the function behind the vectorization functionality

max_param_list: List[float]
min_param_list: List[float]
param_dict: Dict[str, bool | List[int | float | str]]
parent_node_tracker: Dict[str, str] | None
vectorized_params: List[List[int | float | str]]

Module contents