Clustering — mdpath.src.cluster
This module contains the class PatwayClustering, which calculates the overlap between pathways and clusters them based on that overlap. Clusters are generated through hierarchical clustering using scipy. The optimal number of clusters is evaluated using the silhouette score.
Classes
PatwayClustering
- class mdpath.src.cluster.PatwayClustering(df_close_res: DataFrame, pathways: list, num_processes: int)
Bases: object
Perform clustering of pathways based on the overlap of close residue pairs.
- Attributes:
df (pd.DataFrame): DataFrame containing close residue pairs.
pathways (list): List of pathways, where each pathway is a list of residue indices.
num_processes (int): Number of processes to use for parallel computation.
overlapp_df (pd.DataFrame): DataFrame containing the overlap between all pathway pairs.
- calculate_overlap_for_pathway(args: tuple) -> list
Calculates the overlap between a pathway and all other pathways.
- Args:
args (tuple): Argument wrapper containing the pathway index, the pathway, all pathways, and the DataFrame with close residue pairs.
- Returns:
result (list): List of dictionaries with the overlap between the given pathway and all other pathways.
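As a rough illustration of what such a per-pathway overlap calculation might look like, the sketch below counts, for one pathway, how many residue combinations with each other pathway appear among the close residue pairs. The function name `overlap_for_pathway`, the record keys, and the use of a plain set of pairs in place of the DataFrame are assumptions made for this example; they are not taken from the mdpath source.

```python
def overlap_for_pathway(args):
    """Count close-residue contacts between one pathway and every other pathway.

    args is a tuple (index, pathway, all_pathways, close_pairs), mirroring the
    argument wrapper described above. close_pairs is a set of (res1, res2)
    tuples here, standing in for the DataFrame (assumption for this sketch).
    """
    index, pathway, all_pathways, close_pairs = args
    records = []
    for other_index, other in enumerate(all_pathways):
        if other_index == index:
            continue  # skip the pathway itself
        overlap = sum(
            1
            for res1 in pathway
            for res2 in other
            if (res1, res2) in close_pairs or (res2, res1) in close_pairs
        )
        # Record keys are hypothetical names chosen for this example.
        records.append(
            {"Pathway1": index, "Pathway2": other_index, "Overlap": overlap}
        )
    return records


# Illustrative toy data: three pathways and three close residue pairs.
pathways = [[1, 2, 3], [2, 3, 4], [10, 11]]
close_pairs = {(1, 2), (2, 3), (3, 4)}
records = overlap_for_pathway((0, pathways[0], pathways, close_pairs))
```

Packing all inputs into a single tuple keeps the function compatible with `map`-style parallel dispatch, which passes exactly one argument per task.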
- calculate_overlap_parallel() -> DataFrame
Parallelization wrapper for the calculate_overlap_for_pathway function.
- Returns:
overlap_df (pd.DataFrame): Pandas DataFrame with the overlap between every pair of pathways.
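A parallelization wrapper of this kind can be sketched with `multiprocessing.Pool`: one task per pathway, with the per-pathway results flattened into a single list of records. The names `_shared_residues` and `overlap_parallel` are hypothetical, and the worker uses a deliberately simplified overlap measure (the number of shared residues) so the example stays short; mdpath's own worker may differ.

```python
from multiprocessing import Pool


def _shared_residues(args):
    """Hypothetical worker: residues pathway `index` shares with each other pathway."""
    index, pathways = args
    current = set(pathways[index])
    return [
        {"Pathway1": index, "Pathway2": j, "Overlap": len(current & set(other))}
        for j, other in enumerate(pathways)
        if j != index
    ]


def overlap_parallel(pathways, num_processes=1):
    """Run the worker once per pathway and flatten the per-pathway results."""
    tasks = [(i, pathways) for i in range(len(pathways))]
    if num_processes > 1:
        # Distribute one task per pathway across the worker processes.
        with Pool(processes=num_processes) as pool:
            nested = pool.map(_shared_residues, tasks)
    else:
        # Serial fallback, convenient for debugging.
        nested = [_shared_residues(task) for task in tasks]
    # Flatten into one list of records; pandas.DataFrame(records) would
    # turn this list into a table with one row per pathway pair.
    return [record for per_pathway in nested for record in per_pathway]


records = overlap_parallel([[1, 2, 3], [2, 3, 4], [10, 11]])
```

On platforms that start worker processes by spawning, calls with `num_processes > 1` should be placed under an `if __name__ == "__main__":` guard.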
- pathway_clusters_dictionary(clusters: dict, sorted_paths: list) -> dict
Generates a dictionary mapping cluster numbers to lists of pathways.
- Args:
clusters (dict): A dictionary where keys are cluster numbers and values are lists of pathway IDs.
sorted_paths (list): A list of pathways, where each pathway is a tuple and the first element is the pathway name.
- Returns:
dict: A dictionary where keys are cluster numbers and values are lists of pathways corresponding to each cluster.
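The mapping described here can be pictured with a small sketch. It assumes that pathway IDs are positions in `sorted_paths` and that the first element of each tuple is the entry to keep for that pathway; both points are assumptions for illustration, as is the function name `clusters_to_pathways`.

```python
def clusters_to_pathways(clusters, sorted_paths):
    """Map each cluster number to the pathways whose IDs it contains.

    Assumption: pathway IDs index into sorted_paths, and the first tuple
    element identifies the pathway.
    """
    return {
        cluster: [sorted_paths[pathway_id][0] for pathway_id in pathway_ids]
        for cluster, pathway_ids in clusters.items()
    }


# Illustrative data: the second tuple element is a made-up score.
sorted_paths = [("path_a", 12.5), ("path_b", 9.1), ("path_c", 7.3)]
clusters = {1: [0, 2], 2: [1]}
result = clusters_to_pathways(clusters, sorted_paths)
```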
- pathways_cluster(n_top_clust: int = 0, save_path: str = 'clustered_paths.png') -> dict
Clustering of pathways based on the overlap between them.
- Args:
n_top_clust (int, optional): Number of clusters to output. Defaults to 0, which outputs all clusters.
save_path (str, optional): Save path for the cluster dendrogram figure. Defaults to “clustered_paths.png”.
- Returns:
clusters (dict): Dictionary with the clusters and their pathways.
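The general technique, hierarchical clustering with scipy followed by silhouette-based selection of the number of clusters, can be sketched as follows. This is a minimal illustration under stated assumptions, not mdpath's implementation: the overlap matrix is converted to a distance as `max(overlap) - overlap`, average linkage is used, the silhouette score is computed by hand with NumPy, and the names `silhouette` and `cluster_by_overlap` are hypothetical. Dendrogram plotting and the `n_top_clust` filter are left out.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def silhouette(dist, labels):
    """Mean silhouette score from a square distance matrix and cluster labels."""
    scores = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        a = dist[i, same].mean()  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            dist[i, labels == other].mean()
            for other in np.unique(labels)
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return float(np.mean(scores))


def cluster_by_overlap(overlap):
    """Cluster pathways from a symmetric overlap matrix; return (best_k, clusters)."""
    overlap = np.asarray(overlap, dtype=float)
    dist = overlap.max() - overlap  # assumption: high overlap means small distance
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist), method="average")

    best_k, best_score, best_labels = None, -np.inf, None
    for k in range(2, len(overlap)):
        labels = fcluster(tree, t=k, criterion="maxclust")
        if len(np.unique(labels)) < 2:
            continue  # silhouette needs at least two clusters
        score = silhouette(dist, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels

    clusters = {}
    for pathway_id, label in enumerate(best_labels):
        clusters.setdefault(int(label), []).append(pathway_id)
    return best_k, clusters


# Toy overlap matrix: pathways 0 and 1 overlap strongly, as do 2 and 3.
overlap = [[0, 8, 1, 0], [8, 0, 0, 1], [1, 0, 0, 9], [0, 1, 9, 0]]
best_k, clusters = cluster_by_overlap(overlap)
```

The linkage matrix `tree` can also be passed to `scipy.cluster.hierarchy.dendrogram` to draw a dendrogram figure.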