phylojunction.readwrite package
Submodules
phylojunction.readwrite.pj_read module
- phylojunction.readwrite.pj_read.is_csv(fp_string: str) bool
Check if file in provided path is in CSV format.
- Parameters:
fp_string (str) – String containing file path to text file being read
- Returns:
True if file is in CSV format, False otherwise
- Return type:
bool
- phylojunction.readwrite.pj_read.is_tsv(fp_string: str) bool
- phylojunction.readwrite.pj_read.parse_cli_str_write_fig(str_write_fig: str) Dict[str, Tuple[int]]
Parse command-line string argument for generating node plots.
- Parameters:
str_write_fig (str) – User-provided string as argument to command-line interface -f parameter (e.g., ‘tr;0-10’)
- Returns:
Dictionary with node names as keys (str) and tuples with the range start and end as values (int, int)
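As a rough illustration of the mapping described above, the sketch below parses a single name;start-end entry such as 'tr;0-10' into a dictionary. The helper name parse_fig_arg_sketch is hypothetical, and the roles of the ';' and '-' separators are assumptions inferred from the example string. It is not PhyloJunction's own implementation.

```python
from typing import Dict, Tuple


def parse_fig_arg_sketch(str_write_fig: str) -> Dict[str, Tuple[int, int]]:
    """Hypothetical stand-in: parse one 'name;start-end' entry."""
    # Assumption: ';' separates the node name from the range,
    # and '-' separates the range start from the range end.
    node_name, range_str = str_write_fig.split(";")
    start_str, end_str = range_str.split("-")
    start, end = int(start_str), int(end_str)
    if start > end:
        raise ValueError(f"Range start {start} exceeds range end {end}")
    return {node_name.strip(): (start, end)}


parsed = parse_fig_arg_sketch("tr;0-10")
print(parsed)  # {'tr': (0, 10)}
```

Returning a dictionary keyed by node name keeps the lookup of a node's plotting range to a single step, which matches the return value documented above.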
- phylojunction.readwrite.pj_read.read_csv_tsv_into_dataframe(fp_string: str, is_file_csv: bool = True) DataFrame
Read .csv/.tsv file into a pandas DataFrame.
- Parameters:
fp_string (str) – String containing file path to text file being read
is_file_csv (bool) – Flag specifying whether the file is a .csv (True) or a .tsv (False). Defaults to True.
- Returns:
pandas DataFrame object (empty DataFrame if the file is neither CSV nor TSV)
- Return type:
pd.DataFrame
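The flag-based choice between the two formats can be sketched with pandas.read_csv, which takes the field separator through its sep argument. The helper read_delimited_sketch below is a hypothetical stand-in that reads from in-memory text rather than from a file path; how PhyloJunction itself detects or validates the format is not shown here.

```python
import io

import pandas as pd


def read_delimited_sketch(handle, is_file_csv: bool = True) -> pd.DataFrame:
    """Hypothetical stand-in: choose the separator from the flag."""
    sep = "," if is_file_csv else "\t"
    return pd.read_csv(handle, sep=sep)


# The same small table, once comma-separated and once tab-separated.
csv_df = read_delimited_sketch(io.StringIO("node,value\nnd1,0.5\nnd2,1.5\n"))
tsv_df = read_delimited_sketch(
    io.StringIO("node\tvalue\nnd1\t0.5\nnd2\t1.5\n"), is_file_csv=False
)
print(csv_df.equals(tsv_df))
```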
- phylojunction.readwrite.pj_read.read_node_attr_update_tree(attr_tsv_path: str, attr_name: str, attr_cast: Callable, ann_tr: AnnotatedTree) None
Update AnnotatedTree members with attribute information.
- Parameters:
attr_tsv_path (str) – Path to .tsv file containing attribute values for nodes in the tree.
attr_name (str) – Name of the attribute.
attr_cast (ty.Callable) – Callable used to cast each value (e.g., int, float).
ann_tr (AnnotatedTree) – AnnotatedTree object to be updated with attribute name and values.
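The role of attr_cast can be illustrated without a tree: values read from a .tsv file arrive as strings, and the callable converts each one to the desired type. The sketch below assumes a two-column, header-less layout (node name, attribute value), which is not specified in this reference. The helper read_node_attrs_sketch is hypothetical and returns a plain dictionary instead of updating an AnnotatedTree.

```python
import csv
import io
from typing import Any, Callable, Dict


def read_node_attrs_sketch(handle, attr_cast: Callable[[str], Any]) -> Dict[str, Any]:
    """Hypothetical stand-in: map node name -> cast attribute value."""
    # Assumption: each row holds exactly two tab-separated fields.
    reader = csv.reader(handle, delimiter="\t")
    return {node_name: attr_cast(raw_value) for node_name, raw_value in reader}


tsv_text = "nd1\t0\nnd2\t1\n"
as_int = read_node_attrs_sketch(io.StringIO(tsv_text), int)
as_float = read_node_attrs_sketch(io.StringIO(tsv_text), float)
print(as_int)
print(as_float)
```

Passing the cast as a callable lets the same reading code serve integer-valued attributes (such as discrete states) and real-valued ones.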
- phylojunction.readwrite.pj_read.read_nwk_tree_str(nwk_tree_path_or_str: str, fn_name: str = 'read_tree', in_file: bool = True, node_names_attribute: str = '', n_states: int = 1, epsilon: float = 1e-12) AnnotatedTree
Read Newick tree string directly or in provided file.
- Parameters:
nwk_tree_path_or_str (str) – Tree Newick string, or path to file containing single tree Newick string.
fn_name (str) – Name of the function (in the .pj script) being called.
in_file (bool) – If tree string is in a file being passed as argument (True) or if Newick string is being passed directly (False). Defaults to ‘True’.
node_names_attribute (str) – Defaults to empty string “”.
- Returns:
AnnotatedTree with populated attributes dictionary (i.e., a dendropy.Tree that has been annotated)
- phylojunction.readwrite.pj_read.read_serialized_pgm(fp_string: str) DirectedAcyclicGraph
Read binary file storing PGM from a previous PJ session.
- Parameters:
fp_string (str) – String containing file path to binary file storing DAG from previous PJ session.
- Returns:
DAG object to be initialized.
- Return type:
DirectedAcyclicGraph
- phylojunction.readwrite.pj_read.read_text_file(fp_string: str) List[str]
Read and parse text file into list of strings (one per line).
- Parameters:
fp_string (str) – String containing file path to text file being read
- Returns:
List of strings, each being a line of the input text file
- Return type:
list of str
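A minimal sketch of reading a text file into one string per line is given below, using only the standard library. The helper read_lines_sketch is hypothetical, and whether PhyloJunction strips trailing newline characters from each line is an assumption made here.

```python
import os
import tempfile
from typing import List


def read_lines_sketch(fp_string: str) -> List[str]:
    """Hypothetical stand-in: return the file's lines without newlines."""
    with open(fp_string, "r", encoding="utf-8") as infile:
        return infile.read().splitlines()


# Write a small file to a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, "example.txt")
    with open(example_path, "w", encoding="utf-8") as outfile:
        outfile.write("first line\nsecond line\n")
    lines = read_lines_sketch(example_path)

print(lines)  # ['first line', 'second line']
```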
phylojunction.readwrite.pj_write module
- phylojunction.readwrite.pj_write.dump_pgm_data(dir_string: str, dag_obj: DirectedAcyclicGraph, prefix: str = '', write_nex_states: bool = False) None
Write the sampled values of stochastic nodes to the specified directory.
- Parameters:
dir_string (str) – Where to save the files to be written
dag_obj (DirectedAcyclicGraph) – DAG object whose sampled values we are extracting and writing to file.
prefix (str) – String to precede file names
- Returns:
None
- phylojunction.readwrite.pj_write.dump_serialized_pgm(file_name: str, dag_obj: DirectedAcyclicGraph, cmd_log_list: List[str], prefix: str = '', to_folder: bool = False) None
Write serialized DAG in specified directory.
- Parameters:
file_name (str) – Serialized file name
dag_obj (DirectedAcyclicGraph) – DAG object to be serialized and saved.
prefix (str) – String to precede file name.
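The write-then-restore cycle implied by dump_serialized_pgm and read_serialized_pgm can be sketched as a round trip through a binary file. The serialization format is not stated in this reference; the example assumes Python's pickle module purely for illustration, uses a plain dictionary as a stand-in for a DirectedAcyclicGraph, and the file name and prefix are hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for a DAG object; a plain dict is used only for illustration.
dag_stand_in = {"node_names": ["lambda", "tr"], "sample_size": 2}

with tempfile.TemporaryDirectory() as tmp_dir:
    prefix = "run1_"  # hypothetical prefix placed before the file name
    out_path = os.path.join(tmp_dir, prefix + "model.pickle")

    # Serialize the object to a binary file ...
    with open(out_path, "wb") as outfile:
        pickle.dump(dag_stand_in, outfile)

    # ... and restore it, as a later session would.
    with open(out_path, "rb") as infile:
        restored = pickle.load(infile)

print(restored == dag_stand_in)
```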
- phylojunction.readwrite.pj_write.dump_trees_rb_smap_dfs(dir_string: str, dag_obj: DirectedAcyclicGraph, tr_dag_node_name_list: List[str], mapped_attr_name: str, prefix: str = '') None
- phylojunction.readwrite.pj_write.get_write_inference_rev_scripts(all_sims_model_spec_list: List[str], all_sims_mcmc_logging_spec_list: List[str], dir_list: List[str], prefix: str = '', write2file: bool = False) List[str]
Get and/or write full inference .Rev scripts.
- Parameters:
all_sims_model_spec_list (list of str) – List of strings specifying just the model part of a .Rev script, one element per simulation
all_sims_mcmc_logging_spec_list (list of str) – List of strings specifying just the MCMC and logging part of a .Rev script, one element per simulation
dir_list (list of str) – List of three strings specifying directories (inference root, scripts, results)
prefix (str) – String prefix to place before the name of files being written
write2file (bool) – If ‘True’, function writes to file. Defaults to ‘False’.
- Returns:
A list of full .Rev script string specifications, one per simulation
- Return type:
list of str
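The pairing of per-simulation pieces described above can be sketched as follows: the i-th model specification is combined with the i-th MCMC and logging specification into one full script string. The helper join_rev_specs_sketch is hypothetical, the newline used to join the two parts is an assumption, and the placeholder strings stand in for real .Rev code.

```python
from typing import List


def join_rev_specs_sketch(model_specs: List[str], mcmc_specs: List[str]) -> List[str]:
    """Hypothetical stand-in: build one full script string per simulation."""
    if len(model_specs) != len(mcmc_specs):
        raise ValueError("Expected one model and one MCMC spec per simulation")
    return [model + "\n" + mcmc for model, mcmc in zip(model_specs, mcmc_specs)]


# Placeholder contents; real entries would hold .Rev code.
model_specs = ["# model, sim 1", "# model, sim 2"]
mcmc_specs = ["# mcmc and logging, sim 1", "# mcmc and logging, sim 2"]
full_scripts = join_rev_specs_sketch(model_specs, mcmc_specs)
print(len(full_scripts))  # 2
```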
- phylojunction.readwrite.pj_write.initialize_scalar_dataframe(sample_size: int, n_repl: int = 1, summaries_avg_over_repl: bool = False) DataFrame
Initialize pandas DataFrame to hold scalar information
- Parameters:
sample_size (int) – Number of samples (i.e., independent full model simulations)
n_repl (int, optional) – How many times the scalar variable is replicated. Defaults to 1.
summaries_avg_over_repl (bool, optional) – Dataframe will hold statistics (avg., st. dev.) summarized over replicates. Defaults to False.
- Returns:
DataFrame with certain columns holding 0 or 0.0 values
- Return type:
pd.DataFrame
- phylojunction.readwrite.pj_write.initialize_tree_dataframe(sample_size: int, n_repl: int = 1, summaries: bool = False, summaries_avg_over_repl: bool = False) DataFrame
Initialize pandas DataFrame to hold tree information
- Parameters:
sample_size (int) – Number of samples (i.e., independent full model simulations)
n_repl (int, optional) – How many times the tree is replicated. Defaults to 1.
summaries (bool, optional) – Dataframe will hold individual-tree statistics. Defaults to False.
summaries_avg_over_repl (bool, optional) – Dataframe will hold statistics summarized over replicates. Defaults to False.
- Returns:
DataFrame with certain columns holding 0 or 0.0 values
- Return type:
pd.DataFrame
- phylojunction.readwrite.pj_write.prep_data_df(dag_obj: DirectedAcyclicGraph, write_nex_states: bool = False) Tuple[List[DataFrame | Dict[int, DataFrame]], List[Dict[str, DataFrame]]]
Return two lists of pandas DataFrames, one for scalar and one for tree random variables.
- Parameters:
dag_obj (DirectedAcyclicGraph) – DAG object holding the simulated data to be tabulated.
write_nex_states (bool) – Whether to write .nex file with states
- Returns:
Tuple with two lists as elements, one with file-suffix strings, another with pandas DataFrames
- Return type:
(tuple)
- phylojunction.readwrite.pj_write.prep_data_filepaths_dfs(scalar_output_stash: List[DataFrame | Dict[int, DataFrame]], tree_output_stash: List[Dict[str, DataFrame] | Dict[str, str]]) Tuple[List[str], List[DataFrame | str]]
Prepare list of file paths and list of pandas DataFrames.
- Parameters:
scalar_output_stash (list) – List of either pandas dataframes, or dictionaries with number of replicates as keys, and pandas dataframes as values. These contain scalar simulated data.
tree_output_stash (ty.List[ty.Dict[str, pd.DataFrame]]) – List of dictionaries with tree node names as keys, and pandas dataframes as values. These contain tree simulated data.
- Returns:
List of filepath strings and list of pandas dataframes to be written to disk.
- Return type:
(tuple)
- phylojunction.readwrite.pj_write.prep_trees_rb_smap_dfs(dag_obj: DirectedAcyclicGraph, tree_dag_node_name_list: List[str], mapped_attr_name: str) Tuple[Dict[str, List[DataFrame]], Dict[str, List[str]]]
Initialize pandas DataFrames for holding stochastic maps.
Each dataframe will hold stochastic maps for all nodes for all (replicate) trees in a single sample. Each iteration will correspond to a single replicate.
- Parameters:
dag_obj (DirectedAcyclicGraph) – Instance of DAG class, holding the model.
tree_dag_node_name_list (list) – List with names of the DAG nodes (tree random variables) along which attributes have transitioned and for which we are producing a stochastic mapping dataframe.
mapped_attr_name (str) – Name of the attribute being stochastically mapped. E.g., ‘state’.
- Returns:
A tuple with two dictionaries. The keys of both dictionaries are DAG node names, and the values are lists. In one dictionary, each list holds pandas DataFrame instances, one dataframe per sample, with all replicates of that sample in a single dataframe. In the other dictionary, each list holds strings with the contents of those dataframes (used for unit testing).
- Return type:
(tuple)
- phylojunction.readwrite.pj_write.write_data_df(outfile_handle: IO, data_df: DataFrame, format='csv') None
Write a pandas DataFrame to an output file stream.
- Parameters:
outfile_handle (file) – Output file object to write to
data_df (pandas.DataFrame) – A data frame containing random variable values to print to file
format (str) – Extension for output file. It can be ‘csv’ or ‘tsv’. Defaults to ‘csv’.
- phylojunction.readwrite.pj_write.write_fig_to_file(outfile_path: str, fig_obj: Figure) None
- phylojunction.readwrite.pj_write.write_str_list(outfile_handle: IO, content_string_list: List[str]) None