phylojunction.readwrite package

Submodules

phylojunction.readwrite.pj_read module

phylojunction.readwrite.pj_read.is_csv(fp_string: str) bool

Check if file in provided path is in CSV format.

Parameters:

fp_string (str) – String containing file path to text file being read

Returns:

True if file is in CSV format, False otherwise

Return type:

bool

phylojunction.readwrite.pj_read.is_tsv(fp_string: str) bool

phylojunction.readwrite.pj_read.parse_cli_str_write_fig(str_write_fig: str) Dict[str, Tuple[int]]

Parse command-line string argument for generating node plots.

Parameters:

str_write_fig (str) – User-provided string as argument to command-line interface -f parameter (e.g., ‘tr;0-10’)

Returns:

Dictionary with node names as keys (str) and tuples with the range bounds (int, int) as values
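
The shape of the -f argument can be illustrated with a minimal sketch. This is not the package's implementation: the helper name parse_fig_spec is hypothetical, and the sketch assumes a single 'name;start-end' entry like the 'tr;0-10' example above.

```python
from typing import Dict, Tuple


def parse_fig_spec(spec: str) -> Dict[str, Tuple[int, int]]:
    """Parse a 'name;start-end' string into {name: (start, end)}.

    Hypothetical helper; assumes exactly one node entry per string.
    """
    node_name, sep, range_str = spec.partition(";")
    if not sep or not node_name:
        raise ValueError(f"expected 'name;start-end', got {spec!r}")

    start_str, dash, end_str = range_str.partition("-")
    if not dash:
        raise ValueError(f"expected a 'start-end' range, got {range_str!r}")

    start, end = int(start_str), int(end_str)
    if start > end:
        raise ValueError("range start must not exceed range end")

    return {node_name.strip(): (start, end)}


print(parse_fig_spec("tr;0-10"))  # {'tr': (0, 10)}
```

Raising ValueError on malformed input is a design choice of this sketch; how the actual function reports bad input is not stated here.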

phylojunction.readwrite.pj_read.read_csv_tsv_into_dataframe(fp_string: str, is_file_csv: bool = True) DataFrame

Read .csv/.tsv file into a pandas DataFrame.

Parameters:
  • fp_string (str) – String containing file path to text file being read

  • is_file_csv (bool) – Flag specifying if file is .csv (otherwise it is a .tsv). Defaults to True.

Returns:

pandas DataFrame object (empty DataFrame if the file is neither CSV nor TSV)

Return type:

pd.DataFrame

phylojunction.readwrite.pj_read.read_node_attr_update_tree(attr_tsv_path: str, attr_name: str, attr_cast: Callable, ann_tr: AnnotatedTree) None

Update AnnotatedTree members with attribute information.

Parameters:
  • attr_tsv_path (str) – Path to .tsv file containing attribute values for nodes in the tree.

  • attr_name (str) – Name of the attribute.

  • attr_cast (ty.Callable) – Callable used to cast each value (e.g., int, float).

  • ann_tr (AnnotatedTree) – AnnotatedTree object to be updated with attribute name and values.
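
The role of attr_cast can be sketched as follows. This is an illustration only: read_node_attrs is a hypothetical helper, the two-column layout (node name, then value, no header) is an assumption about the .tsv file, and a plain dictionary stands in for the AnnotatedTree being updated.

```python
import csv
import io
from typing import Callable, Dict


def read_node_attrs(tsv_text: str, attr_cast: Callable) -> Dict[str, object]:
    """Map node names to cast attribute values.

    Assumes two tab-separated columns per line: node name, raw value.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    attrs = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        node_name, raw_value = row[0], row[1]
        attrs[node_name] = attr_cast(raw_value)  # e.g., "1" -> 1 with int
    return attrs


# Hypothetical input: one node per line, name and value separated by a tab.
tsv_text = "sp1\t0\nsp2\t1\nnd1\t1\n"
node_states = read_node_attrs(tsv_text, int)
print(node_states)  # {'sp1': 0, 'sp2': 1, 'nd1': 1}
```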

phylojunction.readwrite.pj_read.read_nwk_tree_str(nwk_tree_path_or_str: str, fn_name: str = 'read_tree', in_file: bool = True, node_names_attribute: str = '', n_states: int = 1, epsilon: float = 1e-12) AnnotatedTree

Read Newick tree string directly or in provided file.

Parameters:
  • nwk_tree_path_or_str (str) – Tree Newick string, or path to file containing single tree Newick string.

  • fn_name (str) – Name of the function (in the .pj script) being called.

  • in_file (bool) – If tree string is in a file being passed as argument (True) or if Newick string is being passed directly (False). Defaults to ‘True’.

  • node_names_attribute (str) – Defaults to empty string “”.

Returns:

AnnotatedTree with populated attributes dictionary (i.e., a dendropy.Tree that has been annotated)

phylojunction.readwrite.pj_read.read_serialized_pgm(fp_string: str) DirectedAcyclicGraph

Read binary file storing PGM from a previous PJ session.

Parameters:

fp_string (str) – String containing file path to binary file storing DAG from previous PJ session.

Returns:

DAG object to be initialized.

Return type:

(DirectedAcyclicGraph)

phylojunction.readwrite.pj_read.read_text_file(fp_string: str) List[str]

Read and parse text file into list of strings (one per line).

Parameters:

fp_string (str) – String containing file path to text file being read

Returns:

List of strings, each being a line of the input text file

Return type:

list of str

phylojunction.readwrite.pj_write module

phylojunction.readwrite.pj_write.dump_pgm_data(dir_string: str, dag_obj: DirectedAcyclicGraph, prefix: str = '', write_nex_states: bool = False) None

Write stochastic-node sampled values in specified directory.

Parameters:
  • dir_string (str) – Where to save the files to be written

  • dag_obj (DirectedAcyclicGraph) – DAG object whose sampled values we are extracting and writing to file.

  • prefix (str) – String to precede file names

Returns:

None

phylojunction.readwrite.pj_write.dump_serialized_pgm(file_name: str, dag_obj: DirectedAcyclicGraph, cmd_log_list: List[str], prefix: str = '', to_folder: bool = False) None

Write serialized DAG in specified directory.

Parameters:
  • file_name (str) – Serialized file name

  • dag_obj (DirectedAcyclicGraph) – DAG object to be serialized and saved.

  • prefix (str) – String to precede file name.
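
A serialize-then-restore round trip, as performed by dump_serialized_pgm and read_serialized_pgm, can be sketched with the standard library. Everything here is an assumption made for illustration: the use of pickle as the binary format, the helper names, and the plain dictionary standing in for a DirectedAcyclicGraph.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a DAG: node names mapped to sampled values.
toy_dag = {"birth_rate": [1.0, 1.2], "death_rate": [0.5, 0.4]}


def dump_serialized(file_path, obj):
    """Serialize obj to a binary file."""
    with open(file_path, "wb") as handle:
        pickle.dump(obj, handle)


def read_serialized(file_path):
    """Load an object previously written by dump_serialized."""
    with open(file_path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "session.pickle")
    dump_serialized(path, toy_dag)
    restored = read_serialized(path)

print(restored == toy_dag)  # True
```

If pickle is the format in use, only files from a trusted source should be loaded, since unpickling can execute arbitrary code.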

phylojunction.readwrite.pj_write.dump_trees_rb_smap_dfs(dir_string: str, dag_obj: DirectedAcyclicGraph, tr_dag_node_name_list: List[str], mapped_attr_name: str, prefix: str = '') None

phylojunction.readwrite.pj_write.get_write_inference_rev_scripts(all_sims_model_spec_list: List[str], all_sims_mcmc_logging_spec_list: List[str], dir_list: List[str], prefix: str = '', write2file: bool = False) List[str]

Get and/or write full inference .Rev scripts.

Parameters:
  • all_sims_model_spec_list (list of str) – List of strings specifying just the model part of a .Rev script, one element per simulation

  • all_sims_mcmc_logging_spec_list (list of str) – List of strings specifying just the MCMC and logging part of a .Rev script, one element per simulation

  • dir_list (list of str) – List of three strings specifying directories (inference root, scripts, results)

  • prefix (str) – String prefix to place before the name of files being written

  • write2file (bool) – If ‘True’, function writes to file. Defaults to ‘False’.

Returns:

A list of full .Rev script string specifications, one per simulation

Return type:

list of str
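
The pairing of model and MCMC/logging parts, one pair per simulation, can be sketched as below. The helper assemble_scripts is hypothetical, joining the two parts with a newline is an assumption, and the strings are placeholders rather than real .Rev code.

```python
from typing import List


def assemble_scripts(model_specs: List[str], mcmc_specs: List[str]) -> List[str]:
    """Join the i-th model part with the i-th MCMC/logging part."""
    if len(model_specs) != len(mcmc_specs):
        raise ValueError("need one model part and one MCMC part per simulation")
    return [model + "\n" + mcmc for model, mcmc in zip(model_specs, mcmc_specs)]


# Placeholder contents for two simulations.
scripts = assemble_scripts(["MODEL_1", "MODEL_2"], ["MCMC_1", "MCMC_2"])
print(len(scripts))  # 2
print(scripts[0])    # MODEL_1, then MCMC_1 on the next line
```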

phylojunction.readwrite.pj_write.initialize_scalar_dataframe(sample_size: int, n_repl: int = 1, summaries_avg_over_repl: bool = False) DataFrame

Initialize pandas DataFrame to hold scalar information.

Parameters:
  • sample_size (int) – Number of samples (i.e., independent full model simulations)

  • n_repl (int, optional) – How many times the scalar variable is replicated. Defaults to 1.

  • summaries_avg_over_repl (bool, optional) – Dataframe will hold statistics (avg., st. dev.) summarized over replicates. Defaults to False.

Returns:

DataFrame with certain columns holding 0 or 0.0 values

Return type:

pd.DataFrame

phylojunction.readwrite.pj_write.initialize_tree_dataframe(sample_size: int, n_repl: int = 1, summaries: bool = False, summaries_avg_over_repl: bool = False) DataFrame

Initialize pandas DataFrame to hold tree information.

Parameters:
  • sample_size (int) – Number of samples (i.e., independent full model simulations)

  • n_repl (int, optional) – How many times the tree is replicated. Defaults to 1.

  • summaries (bool, optional) – Dataframe will hold individual-tree statistics. Defaults to False.

  • summaries_avg_over_repl (bool, optional) – Dataframe will hold statistics summarized over replicates. Defaults to False.

Returns:

DataFrame with certain columns holding 0 or 0.0 values

Return type:

pd.DataFrame

phylojunction.readwrite.pj_write.prep_data_df(dag_obj: DirectedAcyclicGraph, write_nex_states: bool = False) Tuple[List[DataFrame | Dict[int, DataFrame]], List[Dict[str, DataFrame]]]

Return pandas DataFrames with scalar and tree random variables.

Parameters:
  • dag_obj (DirectedAcyclicGraph) – DAG object holding the simulated data to be tabulated.

  • write_nex_states (bool) – Whether to write .nex file with states

Returns:

Tuple with two lists as elements, one with file-suffix strings, another with pandas DataFrames

Return type:

(tuple)

phylojunction.readwrite.pj_write.prep_data_filepaths_dfs(scalar_output_stash: List[DataFrame | Dict[int, DataFrame]], tree_output_stash: List[Dict[str, DataFrame] | Dict[str, str]]) Tuple[List[str], List[DataFrame | str]]

Prepare list of file paths and list of pandas DataFrames.

Parameters:
  • scalar_output_stash (list) – List of either pandas dataframes, or dictionaries with number of replicates as keys, and pandas dataframes as values. These contain scalar simulated data.

  • tree_output_stash (ty.List[ty.Dict[str, pd.DataFrame]]) – List of dictionaries with tree node names as keys, and pandas dataframes as values. These contain tree simulated data.

Returns:

List of filepath strings and a list of pandas dataframes to be written to disk.

Return type:

(tuple)

phylojunction.readwrite.pj_write.prep_trees_rb_smap_dfs(dag_obj: DirectedAcyclicGraph, tree_dag_node_name_list: List[str], mapped_attr_name: str) Tuple[Dict[str, List[DataFrame]], Dict[str, List[str]]]

Initialize pandas DataFrames for holding stochastic maps.

Each dataframe will hold stochastic maps for all nodes for all (replicate) trees in a single sample. Each iteration will correspond to a single replicate.

Parameters:
  • dag_obj (DirectedAcyclicGraph) – Instance of DAG class, holding the model.

  • tree_dag_node_name_list (list) – List with names of the DAG nodes (tree random variables) along which attributes have transitioned and for which we are producing a stochastic mapping dataframe.

  • mapped_attr_name (str) – Name of the attribute being stochastically mapped. E.g., ‘state’.

Returns:

A tuple with two dictionaries. The keys of both dictionaries are DAG node names, and the values are lists. In one dictionary, the lists hold pandas DataFrame instances, one dataframe per sample, each containing all replicates of that sample. In the other dictionary, the lists hold strings with the contents of the dataframes (used for unit testing).

Return type:

(tuple)

phylojunction.readwrite.pj_write.write_data_df(outfile_handle: IO, data_df: DataFrame, format='csv') None

Write a pandas DataFrame to output file stream.

Parameters:
  • outfile_handle (file) – Output file object to write to

  • data_df (pandas.DataFrame) – A data frame containing random variable values to print to file

  • format (str) – Extension for output file. It can be ‘csv’ or ‘tsv’. Defaults to ‘csv’.

phylojunction.readwrite.pj_write.write_fig_to_file(outfile_path: str, fig_obj: Figure) None

phylojunction.readwrite.pj_write.write_str_list(outfile_handle: IO, content_string_list: List[str]) None

Module contents