phylojunction.distribution package
Submodules
phylojunction.distribution.dn_discrete_sse module
- class phylojunction.distribution.dn_discrete_sse.DnSSE(sse_stash: SSEStash, n: int = 1, n_replicates: int = 1, origin: bool = True, start_states_list: List[int] = [], stop: str = '', stop_value: List[float] = [], condition_on_speciation: bool = False, condition_on_survival: bool = True, condition_on_obs_both_sides_root: bool = False, min_rec_taxa: int = 0, max_rec_taxa: int = 1000000000000, abort_at_alive_count: int = 1000000000000, epsilon: float = 1e-12, runtime_limit: int = 300, max_n_failed_attempts: int = 200, rng_seed: int | None = None, debug: bool | None = False, info: bool | None = False)
Bases:
DistrForSampling
Discrete SSE distribution.
Class for the discrete state-dependent speciation and extinction (SSE) distribution. Used for sampling phylogenetic trees annotated with discrete states.
The simulate() method in this class is a ‘rising tide’ sampler, where all living lineages grow together. It is not a recursive sampler in which lineages take care of growing themselves and recur upon birth events.
- At all times we know:
how many lineages are alive and dead
all the character states represented by living lineages
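The 'rising tide' idea can be illustrated with a minimal single-state birth-death sketch. This is not the DnSSE implementation: the function name `rising_tide_counts`, its arguments, and the restriction to one state with constant rates are assumptions made for illustration. All living lineages share one clock; at each step a waiting time is drawn from the total rate, and then one living lineage is picked to undergo the event.

```python
import random


def rising_tide_counts(birth_rate, death_rate, stop_age, seed=None):
    """Hypothetical sketch: grow all living lineages together until stop_age.

    Returns (n_alive, n_dead) at the end of the simulation.
    """
    rng = random.Random(seed)
    alive = [0]      # ids of living lineages; the process starts with one
    n_dead = 0
    next_id = 1
    t = 0.0

    while alive:
        # The total rate is summed over every living lineage.
        total_rate = len(alive) * (birth_rate + death_rate)
        if total_rate <= 0.0:
            break  # no event can happen anymore
        t += rng.expovariate(total_rate)
        if t >= stop_age:
            break  # stop condition met before the next event

        # Pick which living lineage undergoes the event.
        idx = rng.randrange(len(alive))
        if rng.random() < birth_rate / (birth_rate + death_rate):
            # Speciation: the parent is replaced by two children.
            alive.pop(idx)
            alive.extend([next_id, next_id + 1])
            next_id += 2
        else:
            # Extinction: the lineage moves from alive to dead.
            alive.pop(idx)
            n_dead += 1

    return len(alive), n_dead
```

Because the loop owns the list of living lineages, the counts of living and dead lineages are available after every event, which is the property the list above describes.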
- Parameters:
n_sim (int) – Number of trees to sample. Defaults to 1.
n_repl (int) – Number of tree replicates per sample. This is equivalent to the size of a plate in the DAG representation. Defaults to 1.
with_origin (bool) – Flag for whether the process starts at the origin or not.
root_is_born (bool) – Attribute that records if root node was created and is in tree (for when it starts at the origin). ‘True’ if process starts at the root.
start_states (List[int]) – List of integers representing the starting states of each of the ‘n_sim’ samples.
seed_age (float, optional) – Age of seed node (either origin or root).
condition_on_speciation (bool) – Flag for rejecting tree samples that do not undergo a single speciation event before ‘stop_val’ is met. Note that this first speciation event may or may not be what is canonically the reconstructed tree root node. If ‘True’, rejects tree samples failing to meet the condition. Defaults to ‘False’.
condition_on_survival (bool) – Flag for rejecting tree samples that go extinct before ‘stop_val’ is met. If ‘True’, rejects tree sample failing to meet condition. Defaults to ‘True’.
condition_on_obs_both_sides_root (bool) – Flag for rejecting tree samples that do not have observed nodes on both sides of the complete tree’s root node. If ‘True’, rejects tree sample failing to meet condition. Defaults to ‘False’.
stop (str) – Stop condition to end sampling (simulation) procedure and return tree. If ‘age’, stops when the age of either origin or root is equal to ‘stop_val’ (see below). If ‘size’, stops when tree has ‘stop_val’ observed nodes.
stop_val (List[float]) – List of values used by ‘stop’ (see above) to end each of the ‘n_sim’ sampling (simulation) procedures and return tree. Either maximum age, or maximum count of observable nodes.
min_rec_taxa (int) – Required minimum number of observed taxa in reconstructed tree. Defaults to 0.
max_rec_taxa (int) – Required maximum number of observed taxa in reconstructed tree. Defaults to 1e12.
abort_at_alive_count (int) – Number of living (not observed!) nodes at which point sample is rejected. This parameter is used to abort samples whose SSE parameters cause trees to grow too large. Defaults to 1e12.
sse_stash (SSEStash) – Object holding all discrete state-dependent parameters we need to sample (i.e., simulate). This object holds (i) the number of discrete states of the process, (ii) the number of time slices if process is time-heterogeneous, (iii) the end age of each time slice.
events (MacroevolEventHandler) – Object that computes total rate values, and samples SSE events. It is a member of sse_stash.
state_count (int) – Number of states of SSE process.
n_time_slices (int) – Number of time slices (epochs).
slice_t_ends (List[float | None], optional) – List of floats with the end time of each time slice (forward in time).
prob_handler (DiscreteStateDependentProbabilityHandler) – Object that takes care of state-dependent taxon sampling across time slices.
epsilon (float, optional) – Float threshold to determine if a tiny decimal number is to be considered 0.0 or not. In other words, if the difference between a tiny value ‘x’ and 0.0 is smaller than epsilon, then ‘x’ is set to 0.0. Defaults to 1e-12.
runtime_limit (int, optional) – Runtime ceiling (in seconds) for obtaining the ‘n’ tree samples. If this limit is met, the sampling procedure is aborted. Defaults to 300.
max_n_failed_attempts (int, optional) – Maximum number of failed tree sampling attempts (replicates included) before PhyloJunction quits. Defaults to 200.
rng_seed (int, optional) – Integer seed for the two random number generators used in this class. Users only need to set this seed when bypassing the scripting language (otherwise random number generator seeds are handled in cmd_parse.py). Defaults to None.
debug (bool, optional) – Flag for whether to print debugging messages during sampling procedure.
info (bool, optional) – Flag for whether to print information about running sampling procedure.
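The rule stated for ‘epsilon’ can be sketched as follows. The helper `snap_to_zero` is hypothetical and not part of PhyloJunction; it only illustrates how a tiny floating-point residue would be treated as 0.0 under the documented threshold.

```python
def snap_to_zero(x, epsilon=1e-12):
    """Hypothetical helper: treat values within epsilon of 0.0 as exactly 0.0."""
    return 0.0 if abs(x - 0.0) < epsilon else x


# Subtracting two ages that should be equal can leave a tiny residue
# because of floating-point rounding; the residue is snapped to 0.0.
remaining = snap_to_zero(0.3 - (0.1 + 0.2))
```

Values whose magnitude is at or above the threshold are returned unchanged.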
- DN_NAME = 'DnSSE'
- abort_at_alive_count: int
- condition_on_obs_both_sides_root: bool
- condition_on_speciation: bool
- condition_on_survival: bool
- debug: bool
- epsilon: float
- events: MacroevolEventHandler
- generate() List[AnnotatedTree]
Generate samples according to distribution.
The number of samples will be the specified number of samples times the number of replicates (per sample).
- Returns:
Valid simulated trees annotated with discrete traits.
- Return type:
List[AnnotatedTree]
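A rejection loop of the kind generate() is described as performing might look like the sketch below. It is an illustration only: `generate_with_rejection`, `simulate_one` and `is_valid` are hypothetical names, and counting failures across all samples and replicates without resetting is an assumption based on the wording of ‘max_n_failed_attempts’.

```python
def generate_with_rejection(simulate_one, is_valid, n_sim, n_repl,
                            max_n_failed_attempts=200):
    """Hypothetical sketch of a generate()-style rejection loop.

    simulate_one(sample_idx) returns a candidate tree and
    is_valid(candidate) applies the user's conditions.
    Both callables are placeholders supplied by the caller.
    """
    accepted = []
    n_failed = 0  # ASSUMPTION: one counter shared by all samples and replicates
    for sample_idx in range(n_sim):
        n_accepted_here = 0
        while n_accepted_here < n_repl:
            candidate = simulate_one(sample_idx)
            if is_valid(candidate):
                accepted.append(candidate)
                n_accepted_here += 1
            else:
                n_failed += 1
                if n_failed >= max_n_failed_attempts:
                    raise RuntimeError(
                        f"Gave up after {n_failed} failed attempts."
                    )
    # On success the list holds n_sim * n_repl accepted candidates.
    return accepted
```

With three samples and two replicates each, a successful run returns six candidates, matching the "samples times replicates" count stated above.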
- get_rev_inference_spec_info() List[str]
- info: bool
- init_check_vectorize_sample_size() None
Vectorize SSE rates and probs, start values and stop values.
This method is in the parent class.
- Raises:
DimensionalityError – Raised if (i) more than one SSE rate value is provided, but that number is smaller than the requested number of tree samples (because then we do not know how to vectorize the values); (ii) same as (i), but for SSE probabilities; (iii) same as (i), but for starting states; (iv) same as (i), but for stop values.
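The vectorization rule can be sketched as below. The function `vectorize` is hypothetical, and `ValueError` stands in for DimensionalityError. The documentation only covers the case of too few values; rejecting any count other than 1 or the number of samples is an assumption of this sketch.

```python
def vectorize(values, n_sim, what="parameter"):
    """Hypothetical sketch: expand or validate values against the sample size."""
    if len(values) == 1:
        # A single value is recycled for every sample.
        return list(values) * n_sim
    if len(values) == n_sim:
        # One value per sample: nothing to do.
        return list(values)
    # ASSUMPTION: every other count is rejected, including more values
    # than samples, which the documentation does not address.
    raise ValueError(
        f"Got {len(values)} {what} values for {n_sim} samples; "
        "cannot vectorize."
    )
```

For example, `vectorize([0.5], 3)` recycles the single rate for all three samples, while `vectorize([0.5, 1.0], 3)` raises because two values cannot be mapped onto three samples.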
- max_n_failed_attempts: int
- max_rec_taxa: int
- min_rec_taxa: int
- n_sim: int
- n_time_slices: int
- prob_handler: DiscreteStateDependentProbabilityHandler
- rng_seed: int
- root_is_born: bool
- runtime_limit: int
- seed_age: float | None
- simulate(a_start_state: int, a_stop_value: int | float, sample_idx: int = 0) AnnotatedTree
Sample (simulate) discrete-SSE tree.
The simulated tree may or may not meet the conditions specified by the user. Conditioning is handled outside of this method (see documentation for .generate()).
- Parameters:
a_start_state (int) – State at seed node (origin or root).
a_stop_value (int | float) – Value to stop simulation with (number of tips or tree height).
sample_idx (int) – Index of current sample.
- Returns:
A tree annotated with discrete states.
- Return type:
AnnotatedTree
- slice_t_ends: List[float | None]
- start_states: List[int]
- state_count: int
- stop: str
- stop_val: List[float]
- with_origin: bool
phylojunction.distribution.dn_parametric module
- class phylojunction.distribution.dn_parametric.DnExponential(n_samples: int, n_repl: int, scale_or_rate_param: List[float], rate_parameterization: bool, parent_node_tracker: Dict[str, str] | None = None)
Bases:
DistrForSampling
- DN_NAME = 'Exponential'
- static draw_exp(n_samples: int, scale_or_rate_param: float, rate_parameterization: bool = True) ndarray
Return sample from exponential distribution.
- Parameters:
n_samples (int) – Number of draws (sample size).
scale_or_rate_param (float) – Scale or rate (default) of exponential distribution.
rate_parameterization (bool, optional) – Argument of ‘scale_or_rate_param’ is rate instead of scale. Defaults to ‘True’.
- Returns:
Array of floats sampled from exponential distribution.
- Return type:
(ndarray)
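The difference between the two parameterizations can be sketched with Python's standard `random` module, used here instead of NumPy to keep the example dependency-free. The function `draw_exp_sketch` and its `seed` argument are hypothetical and are not the library's draw_exp().

```python
import random


def draw_exp_sketch(n_samples, scale_or_rate_param,
                    rate_parameterization=True, seed=None):
    """Hypothetical sketch: exponential draws under rate or scale."""
    rng = random.Random(seed)
    # random.expovariate() expects a rate, so a scale (the mean)
    # is inverted first.
    if rate_parameterization:
        rate = scale_or_rate_param
    else:
        rate = 1.0 / scale_or_rate_param
    return [rng.expovariate(rate) for _ in range(n_samples)]
```

With the same seed, a rate of 2.0 and a scale of 0.5 produce identical draws, since scale is the reciprocal of rate.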
- exp_rate_parameterization: bool
- exp_scale_or_rate_list: List[float]
- generate() List[float]
- get_rev_inference_spec_info() List[str]
- init_check_vectorize_sample_size(param_list: List[Any] = []) List[List[int | float | str]] | None
Check sample size against number of provided parameter values.
This is the function behind the vectorization functionality.
- param_dict: Dict[str, bool | List[int | float | str]]
- parent_node_tracker: Dict[str, str] | None
- vectorized_params: List[List[int | float | str]]
- class phylojunction.distribution.dn_parametric.DnGamma(n_samples: int, n_repl: int, shape_param: List[float], scale_or_rate_param: List[float], rate_parameterization: bool, parent_node_tracker: Dict[str, str] | None = None)
Bases:
DistrForSampling
- DN_NAME = 'Gamma'
- static draw_gamma(n_samples: int, shape_param: float, scale_or_rate_param: float, rate_parameterization: bool = False) float64 | ndarray
Return sample from gamma distribution.
- Parameters:
n_samples (int) – Number of draws (sample size).
shape_param (float) – Gamma distribution shape parameter (represented by alpha or kappa sometimes).
scale_or_rate_param (float) – Gamma distribution scale or rate parameter.
rate_parameterization (bool, optional) – Argument of ‘scale_or_rate_param’ is rate instead of scale. Defaults to ‘False’.
- Returns:
Value or array of floats sampled from gamma distribution.
- Return type:
(float64 | ndarray)
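For the gamma distribution the documented default is the scale parameterization. A minimal sketch using the standard `random` module follows; `draw_gamma_sketch` and its `seed` argument are hypothetical names, not the library's draw_gamma().

```python
import random


def draw_gamma_sketch(n_samples, shape_param, scale_or_rate_param,
                      rate_parameterization=False, seed=None):
    """Hypothetical sketch: gamma draws under scale or rate."""
    rng = random.Random(seed)
    # random.gammavariate(alpha, beta) takes beta as a scale,
    # so a rate is inverted first.
    if rate_parameterization:
        scale = 1.0 / scale_or_rate_param
    else:
        scale = scale_or_rate_param
    return [rng.gammavariate(shape_param, scale) for _ in range(n_samples)]
```

Under either parameterization the mean of the distribution is shape times scale, so a shape of 2.0 with a rate of 2.0 has mean 1.0.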
- gamma_rate_parameterization: bool
- gamma_scale_or_rate_param_list: List[float]
- gamma_shape_param_list: List[float]
- generate() List[float]
- get_rev_inference_spec_info() List[str]
- init_check_vectorize_sample_size(param_list: List[Any] = []) List[List[int | float | str]] | None
Check sample size against number of provided parameter values.
This is the function behind the vectorization functionality.
- param_dict: Dict[str, bool | List[int | float | str]] = {}
- parent_node_tracker: Dict[str, str] | None
- vectorized_params: List[List[int | float | str]]
- class phylojunction.distribution.dn_parametric.DnLogNormal(n_samples: int, n_repl: int, ln_mean: List[float], ln_sd: List[float], ln_log_space: bool, parent_node_tracker: Dict[str, str] | None = None)
Bases:
DistrForSampling
- DN_NAME = 'Log-normal'
- static draw_ln(n_samples: int, mean_param: float, sd_param: float, scale: float = 1.0, log_space: bool = True) float64 | ndarray
Return sample from log-normal distribution.
- Parameters:
n_samples (int) – Number of draws (sample size).
mean_param (float) – Mean (scale) of log-normal distribution.
sd_param (float) – Std. deviation (shape) of log-normal distribution.
scale (float) – Mean (scale) of log-normal distribution in log-space. Defaults to 1.0.
log_space (bool, optional) – Flag specifying if mean of distribution is provided in log-space. Defaults to ‘True’.
- Returns:
Value or array of floats sampled from log-normal distribution.
- Return type:
(float64 | ndarray)
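One way to read the ‘log_space’ flag is sketched below with the standard `random` module. This is an assumption, not the documented behavior of draw_ln(): the sketch treats a natural-space value as exp(mu), where mu is the mean of the underlying normal distribution, and it ignores the separate ‘scale’ argument. The name `draw_ln_sketch` and the `seed` argument are hypothetical.

```python
import math
import random


def draw_ln_sketch(n_samples, mean_param, sd_param,
                   log_space=True, seed=None):
    """Hypothetical sketch: log-normal draws with a log-space switch."""
    rng = random.Random(seed)
    # random.lognormvariate(mu, sigma) takes mu in log-space.
    # ASSUMPTION: a natural-space value equals exp(mu), so it is
    # log-transformed before drawing.
    mu = mean_param if log_space else math.log(mean_param)
    return [rng.lognormvariate(mu, sd_param) for _ in range(n_samples)]
```

Under this assumption, a log-space value of 0.0 and a natural-space value of 1.0 describe the same distribution.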
- generate() List[float]
- get_rev_inference_spec_info() List[str]
- init_check_vectorize_sample_size(param_list: List[Any] = []) List[List[int | float | str]] | None
Check sample size against number of provided parameter values.
This is the function behind the vectorization functionality.
- ln_log_space: bool
- ln_mean_list: List[float]
- ln_sd_list: List[float]
- param_dict: Dict[str, bool | List[int | float | str]]
- parent_node_tracker: Dict[str, str] | None
- vectorized_params: List[List[int | float | str]]
- class phylojunction.distribution.dn_parametric.DnNormal(n_samples: int, n_repl: int, norm_mean_param: List[float], norm_sd_param: List[float], parent_node_tracker: Dict[str, str] | None)
Bases:
DistrForSampling
- DN_NAME = 'Normal'
- static draw_normal(n_samples: int, mean_param: float, sd_param: float) float64 | ndarray
Return sample from normal distribution.
- Parameters:
n_samples (int) – Number of draws (sample size).
mean_param (float) – Mean (location) of normal distribution.
sd_param (float) – Std. deviation (scale) of normal distribution.
- Returns:
Value or array of floats sampled from normal distribution.
- Return type:
(float64 | ndarray)
- generate() List[float]
- get_rev_inference_spec_info() List[str]
- init_check_vectorize_sample_size(param_list: List[Any] = []) List[List[int | float | str]] | None
Check sample size against number of provided parameter values.
This is the function behind the vectorization functionality.
- norm_mean_param_list: List[float]
- norm_sd_param_list: List[float]
- param_dict: Dict[str, bool | List[int | float | str]]
- parent_node_tracker: Dict[str, str] | None
- vectorized_params: List[List[int | float | str]]
- class phylojunction.distribution.dn_parametric.DnUnif(n_samples: int, n_repl: int, min_param: List[float], max_param: List[float], parent_node_tracker: Dict[str, str] | None = None)
Bases:
DistrForSampling
- DN_NAME = 'Uniform'
- static draw_unif(n_samples: int, min_param: float, max_param: float) float64 | ndarray
Return sample from uniform distribution.
- Parameters:
n_samples (int) – Number of draws (sample size).
min_param (float) – Minimum value.
max_param (float) – Maximum value.
- Returns:
Value or array of floats sampled from uniform distribution.
- Return type:
(float64 | ndarray)
- generate() List[float]
- get_rev_inference_spec_info() List[str]
- init_check_vectorize_sample_size(param_list: List[Any] = []) List[List[int | float | str]] | None
Check sample size against number of provided parameter values.
This is the function behind the vectorization functionality.
- max_param_list: List[float]
- min_param_list: List[float]
- param_dict: Dict[str, bool | List[int | float | str]]
- parent_node_tracker: Dict[str, str] | None
- vectorized_params: List[List[int | float | str]]