phylojunction.readwrite package

Submodules

phylojunction.readwrite.pj_read module

phylojunction.readwrite.pj_read.is_csv(fp_string: str) → bool

Check if file in provided path is in CSV format.

Parameters: fp_string (str) – String containing file path to text file being read.
Returns: True if file is in CSV format, False otherwise.
Return type: bool
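The entry above does not say how the check is performed. As a rough illustration only, the sketch below uses a hypothetical helper, looks_like_csv, which treats a file as CSV when every non-empty row splits on commas into the same number of fields (at least two). This rule is an assumption about what such a check might look like, not the pj_read implementation.

```python
import csv
import os
import tempfile


def looks_like_csv(fp_string: str) -> bool:
    """Hypothetical stand-in for a CSV check (not the pj_read code)."""
    with open(fp_string, newline="") as infile:
        # Split each line on commas and ignore blank lines
        rows = [row for row in csv.reader(infile, delimiter=",") if row]

    if not rows:
        return False

    n_fields = len(rows[0])
    # Require at least two columns and the same column count in every row
    return n_fields > 1 and all(len(row) == n_fields for row in rows)


def _write_tmp(content: str) -> str:
    """Write content to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as outfile:
        outfile.write(content)
    return path


csv_path = _write_tmp("sample,value\n1,0.5\n2,0.7\n")
tsv_path = _write_tmp("sample\tvalue\n1\t0.5\n2\t0.7\n")

csv_result = looks_like_csv(csv_path)
tsv_result = looks_like_csv(tsv_path)

os.remove(csv_path)
os.remove(tsv_path)
```

Under this rule a tab-separated file is rejected because each of its lines yields a single comma-delimited field.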

phylojunction.readwrite.pj_read.is_tsv(fp_string: str) → bool

phylojunction.readwrite.pj_read.parse_cli_str_write_fig(str_write_fig: str) → Dict[str, Tuple[int]]

Parse command-line string argument for generating node plots.

Parameters:

str_write_fig (str) – User-provided string passed as the argument to the command-line interface's -f parameter (e.g., ‘tr;0-10’).

Returns:

Dictionary with node names as keys (str) and a tuple with the range (int, int) as values.
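To make the mapping from the -f string to the returned dictionary concrete, here is a minimal sketch that handles only the single documented form, ‘name;start-end’. The helper name parse_fig_arg is hypothetical, and how the real function treats several nodes or malformed input is not covered here.

```python
from typing import Dict, Tuple


def parse_fig_arg(str_write_fig: str) -> Dict[str, Tuple[int, int]]:
    """Hypothetical parser for one 'name;start-end' specification."""
    # Left of ';' is the node name, right of it is the range
    node_name, range_str = str_write_fig.split(";")
    # The range is written as two integers separated by '-'
    start_str, end_str = range_str.split("-")
    return {node_name.strip(): (int(start_str), int(end_str))}


parsed = parse_fig_arg("tr;0-10")
```

Here parsed maps the node name "tr" to the tuple (0, 10).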

phylojunction.readwrite.pj_read.read_csv_tsv_into_dataframe(fp_string: str, is_file_csv: bool = True) → DataFrame

Read .csv/.tsv file into a pandas DataFrame.

Parameters:

fp_string (str) – String containing file path to text file being read
is_file_csv (bool) – Flag specifying whether the file is a .csv (True) or a .tsv (False). Defaults to True.

Returns:

pandas DataFrame object (empty DataFrame if the file is neither CSV nor TSV)

Return type:

pd.DataFrame

phylojunction.readwrite.pj_read.read_node_attr_update_tree(attr_tsv_path: str, attr_name: str, attr_cast: Callable, ann_tr: AnnotatedTree) → None

Update AnnotatedTree members with attribute information.

Parameters:

attr_tsv_path (str) – Path to .tsv file containing attribute values for nodes in the tree.
attr_name (str) – Name of the attribute.
attr_cast (ty.Callable) – Type used to cast each value (e.g., int, float).
ann_tr (AnnotatedTree) – AnnotatedTree object to be updated with attribute name and values.
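The role of attr_cast can be sketched as follows. The layout of the .tsv file is not described above, so this example assumes two tab-separated columns (node name, then value) with no header row. The helper read_attr_values is hypothetical and returns a plain dictionary instead of updating an AnnotatedTree.

```python
import csv
import io
from typing import Any, Callable, Dict, TextIO


def read_attr_values(tsv_handle: TextIO,
                     attr_cast: Callable[[str], Any]) -> Dict[str, Any]:
    """Hypothetical reader: map node names to cast attribute values."""
    attr_by_node: Dict[str, Any] = {}
    for row in csv.reader(tsv_handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        # Assumed layout: first column is node name, second is the value
        node_name, raw_value = row[0], row[1]
        attr_by_node[node_name] = attr_cast(raw_value)
    return attr_by_node


tsv_text = "sp1\t1\nsp2\t0\nsp3\t2\n"
states = read_attr_values(io.StringIO(tsv_text), int)
```

Passing float instead of int as attr_cast would store the same values as floating-point numbers.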

phylojunction.readwrite.pj_read.read_nwk_tree_str(nwk_tree_path_or_str: str, fn_name: str = 'read_tree', in_file: bool = True, node_names_attribute: str = '', n_states: int = 1, epsilon: float = 1e-12) → AnnotatedTree

Read Newick tree string directly or in provided file.

Parameters:

nwk_tree_path_or_str (str) – Tree Newick string, or path to file containing single tree Newick string.
fn_name (str) – Name of the function (in the .pj script) being called.
in_file (bool) – Whether the tree string is in a file whose path is passed as argument (True) or the Newick string is passed directly (False). Defaults to ‘True’.
node_names_attribute (str) – Defaults to empty string “”.

Returns:

AnnotatedTree with populated attributes dictionary (i.e., a dendropy.Tree that has been annotated)
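The two input modes selected by in_file can be illustrated with a small sketch. The helper get_newick_string is hypothetical and only resolves the argument to a Newick string; it does not build an AnnotatedTree.

```python
import os
import tempfile


def get_newick_string(nwk_tree_path_or_str: str, in_file: bool = True) -> str:
    """Hypothetical helper: return the Newick string from a file or as given."""
    if in_file:
        # Argument is a path to a file holding a single Newick string
        with open(nwk_tree_path_or_str) as infile:
            return infile.read().strip()
    # Argument is already the Newick string
    return nwk_tree_path_or_str.strip()


newick = "((sp1:1.0,sp2:1.0):1.0,sp3:2.0);"

fd, path = tempfile.mkstemp(suffix=".tre")
with os.fdopen(fd, "w") as outfile:
    outfile.write(newick + "\n")

from_file = get_newick_string(path, in_file=True)
from_str = get_newick_string(newick, in_file=False)

os.remove(path)
```

Both calls yield the same string. With dendropy installed, such a string can be parsed with dendropy.Tree.get(data=newick, schema="newick"); whether read_nwk_tree_str does this internally is not stated above.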

phylojunction.readwrite.pj_read.read_serialized_pgm(fp_string: str) → DirectedAcyclicGraph

Read binary file storing PGM from a previous PJ session.

Parameters: fp_string (str) – String containing file path to binary file storing DAG from previous PJ session.
Returns: DAG object to be initialized.
Return type: DirectedAcyclicGraph

phylojunction.readwrite.pj_read.read_text_file(fp_string: str) → List[str]

Read and parse text file into list of strings (one per line).

Parameters: fp_string (str) – String containing file path to text file being read.
Returns: List of strings, each being a line of the input text file.
Return type: List[str]
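A minimal sketch of reading a text file into one string per line is shown below. The helper read_lines is hypothetical, and stripping the trailing newline from each line is an assumption; the entry above does not say whether read_text_file does so.

```python
import os
import tempfile
from typing import List


def read_lines(fp_string: str) -> List[str]:
    """Hypothetical helper: return the file's lines without trailing newlines."""
    with open(fp_string) as infile:
        return [line.rstrip("\n") for line in infile]


fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as outfile:
    outfile.write("first line\nsecond line\n")

lines = read_lines(path)
os.remove(path)
```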

phylojunction.readwrite.pj_write module

phylojunction.readwrite.pj_write.dump_pgm_data(dir_string: str, dag_obj: DirectedAcyclicGraph, prefix: str = '', write_nex_states: bool = False) → None

Write stochastic-node sampled values in specified directory

Parameters:

dir_string (str) – Where to save the files to be written
dag_obj (DirectedAcyclicGraph) – DAG object whose sampled values we are extracting and writing to file.
prefix (str) – String to precede file names.

Returns:

None

phylojunction.readwrite.pj_write.dump_serialized_pgm(file_name: str, dag_obj: DirectedAcyclicGraph, cmd_log_list: List[str], prefix: str = '', to_folder: bool = False) → None

Write serialized DAG in specified directory.

Parameters:

file_name (str) – Serialized file name
dag_obj (DirectedAcyclicGraph) – DAG object to be serialized and saved.
prefix (str) – String to precede file name.
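The write-then-read cycle behind dump_serialized_pgm and read_serialized_pgm can be pictured as a binary round trip. The sketch below uses Python's pickle module and a plain dictionary as a stand-in for the DAG object. Both choices are assumptions: the serialization library is not named above, and the helpers dump_obj and load_obj are hypothetical.

```python
import os
import pickle
import tempfile
from typing import Any


def dump_obj(file_path: str, obj: Any) -> None:
    """Hypothetical helper: serialize obj to a binary file."""
    with open(file_path, "wb") as outfile:
        pickle.dump(obj, outfile)


def load_obj(file_path: str) -> Any:
    """Hypothetical helper: restore an object from a binary file."""
    with open(file_path, "rb") as infile:
        return pickle.load(infile)


# Stand-in for a DirectedAcyclicGraph (assumption, not the real class)
dag_stand_in = {"node_names": ["rate", "tr"], "sample_size": 2}

fd, path = tempfile.mkstemp(suffix=".pickle")
os.close(fd)

dump_obj(path, dag_stand_in)
restored = load_obj(path)
os.remove(path)
```

As with any pickle-based format, such files should only be loaded when they come from a trusted source.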

phylojunction.readwrite.pj_write.dump_trees_rb_smap_dfs(dir_string: str, dag_obj: DirectedAcyclicGraph, tr_dag_node_name_list: List[str], mapped_attr_name: str, prefix: str = '') → None

phylojunction.readwrite.pj_write.get_write_inference_rev_scripts(all_sims_model_spec_list: List[str], all_sims_mcmc_logging_spec_list: List[str], dir_list: List[str], prefix: str = '', write2file: bool = False) → List[str]

Get and/or write full inference .Rev scripts

Parameters:

all_sims_model_spec_list (list of str) – List of strings specifying just the model part of a .Rev script, one element per simulation
all_sims_mcmc_logging_spec_list (list of str) – List of strings specifying just the MCMC and logging part of a .Rev script, one element per simulation
dir_list (list of str) – List of three strings specifying directories (inference root, scripts, results)
prefix (str) – String prefix to place before the name of files being written
write2file (bool) – If ‘True’, function writes to file. Defaults to ‘False’.

Returns:

A list of full .Rev script string specifications, one per simulation

Return type:

list of str
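The pairing of the two input lists can be sketched as below: each full script is the model part of a simulation followed by its MCMC and logging part. The helper build_full_scripts and the blank-line separator are assumptions, and the placeholder strings are comments, not real .Rev code.

```python
from typing import List


def build_full_scripts(model_specs: List[str],
                       mcmc_specs: List[str]) -> List[str]:
    """Hypothetical helper: join model and MCMC parts, one script per simulation."""
    if len(model_specs) != len(mcmc_specs):
        raise ValueError("Both lists need one element per simulation.")
    # Model part first, then the MCMC and logging part
    return [
        model.rstrip("\n") + "\n\n" + mcmc
        for model, mcmc in zip(model_specs, mcmc_specs)
    ]


model_specs = ["# model, sim 1", "# model, sim 2"]
mcmc_specs = ["# MCMC and logging, sim 1", "# MCMC and logging, sim 2"]

scripts = build_full_scripts(model_specs, mcmc_specs)
```

The result has one string per simulation, which matches the documented return value; writing those strings to the directories in dir_list is left out of this sketch.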

phylojunction.readwrite.pj_write.initialize_scalar_dataframe(sample_size: int, n_repl: int = 1, summaries_avg_over_repl: bool = False) → DataFrame

Initialize pandas DataFrame to hold scalar information

Parameters:

sample_size (int) – Number of samples (i.e., independent full model simulations)
n_repl (int, optional) – How many times the scalar variable is replicated. Defaults to 1.
summaries_avg_over_repl (bool, optional) – Dataframe will hold statistics (avg., st. dev.) summarized over replicates. Defaults to False.

Returns:

DataFrame with certain columns holding 0 or 0.0 values

Return type:

pd.DataFrame

phylojunction.readwrite.pj_write.initialize_tree_dataframe(sample_size: int, n_repl: int = 1, summaries: bool = False, summaries_avg_over_repl: bool = False) → DataFrame

Initialize pandas DataFrame to hold tree information

Parameters:

sample_size (int) – Number of samples (i.e., independent full model simulations)
n_repl (int, optional) – How many times the tree is replicated. Defaults to 1.
summaries (bool, optional) – Dataframe will hold individual-tree statistics. Defaults to False.
summaries_avg_over_repl (bool, optional) – Dataframe will hold statistics summarized over replicates. Defaults to False.

Returns:

DataFrame with certain columns holding 0 or 0.0 values

Return type:

pd.DataFrame

phylojunction.readwrite.pj_write.prep_data_df(dag_obj: DirectedAcyclicGraph, write_nex_states: bool = False) → Tuple[List[DataFrame | Dict[int, DataFrame]], List[Dict[str, DataFrame]]]

Return two lists holding pandas DataFrames, one for scalar and one for tree random variables.

Parameters:

dag_obj (DirectedAcyclicGraph) – DAG object holding the simulated data to be tabulated.
write_nex_states (bool) – Whether to write .nex file with states

Returns:

Tuple with two lists as elements, one with file-suffix strings, another with pandas DataFrames

Return type:

(tuple)

phylojunction.readwrite.pj_write.prep_data_filepaths_dfs(scalar_output_stash: List[DataFrame | Dict[int, DataFrame]], tree_output_stash: List[Dict[str, DataFrame] | Dict[str, str]]) → Tuple[List[str], List[DataFrame | str]]

Prepare list of file paths and list of pandas DataFrames.

Parameters:

scalar_output_stash (list) – List of either pandas dataframes or dictionaries with number of replicates as keys and pandas dataframes as values. These contain simulated scalar data.
tree_output_stash (ty.List[ty.Dict[str, pd.DataFrame]]) – List of dictionaries with tree node names as keys and pandas dataframes as values. These contain simulated tree data.

Returns:

List of file path strings and list of pandas dataframes to be written to disk.

Return type:

(tuple)

phylojunction.readwrite.pj_write.prep_trees_rb_smap_dfs(dag_obj: DirectedAcyclicGraph, tree_dag_node_name_list: List[str], mapped_attr_name: str) → Tuple[Dict[str, List[DataFrame]], Dict[str, List[str]]]

Initialize pandas DataFrames for holding stochastic maps.

Each dataframe will hold stochastic maps for all nodes for all (replicate) trees in a single sample. Each iteration will correspond to a single replicate.

Parameters:

dag_obj (DirectedAcyclicGraph) – Instance of DAG class, holding the model.
tree_dag_node_name_list (list) – List with the names of the DAG nodes holding tree random variables along which attributes have transitioned, and for which a stochastic mapping dataframe is produced.
mapped_attr_name (str) – Name of the attribute being stochastically mapped. E.g., ‘state’.

Returns:

A tuple with two dictionaries. The keys of both dictionaries are DAG node names, and the values are lists. In one dictionary, each list holds pandas DataFrame instances, one dataframe per sample, with all replicates of that sample in a single dataframe. In the other dictionary, each list holds strings with the contents of those dataframes (used for unit testing).

Return type:

(tuple)

phylojunction.readwrite.pj_write.write_data_df(outfile_handle: IO, data_df: DataFrame, format='csv') → None

Write a pandas DataFrame to output file stream

Parameters:

outfile_handle (file) – Output file object to write to
data_df (pandas.DataFrame) – A data frame containing random variable values to print to file
format (str) – Extension for output file. It can be ‘csv’ or ‘tsv’. Defaults to ‘csv’.
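How the format argument could select the field separator is sketched below. To stay free of third-party dependencies, the example writes plain rows with the standard csv module instead of a pandas DataFrame, so write_rows is a hypothetical stand-in and not the pj_write function. With pandas, the equivalent call would be data_df.to_csv(outfile_handle, sep=...).

```python
import csv
import io
from typing import List, Sequence, TextIO


def write_rows(outfile_handle: TextIO,
               header: Sequence[str],
               rows: List[Sequence[object]],
               format: str = "csv") -> None:
    """Hypothetical helper: write a table using ',' or tab as separator."""
    separators = {"csv": ",", "tsv": "\t"}
    if format not in separators:
        raise ValueError("format must be 'csv' or 'tsv'")

    writer = csv.writer(
        outfile_handle, delimiter=separators[format], lineterminator="\n"
    )
    writer.writerow(header)
    writer.writerows(rows)


handle = io.StringIO()
write_rows(handle, ["sample", "value"], [[1, 0.5], [2, 0.7]], format="tsv")
tsv_text = handle.getvalue()
```

Because the function takes an open handle and not a path, the caller stays in charge of opening and closing the output file.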

phylojunction.readwrite.pj_write.write_fig_to_file(outfile_path: str, fig_obj: Figure) → None

phylojunction.readwrite.pj_write.write_str_list(outfile_handle: IO, content_string_list: List[str]) → None

Module contents