cluster
A cluster is an ordered set of hits related to a model which satisfy the model distance constraints.
cluster API reference
Class Cluster
- class macsylib.cluster.Cluster(hits: list[CoreHit] | list[ModelHit], model, hit_weights)
Handles hits related to a model which colocalize.
- __contains__(m_hit: ModelHit) → bool
- Parameters:
m_hit – The hit to test
- Returns:
True if the hit is in the cluster hits, False otherwise
- __init__(hits: list[CoreHit] | list[ModelHit], model, hit_weights) → None
- Parameters:
hits – the hits constituting this cluster
model – the model associated to this cluster
hit_weights – the weights of the hits, used to compute the score
- __weakref__
list of weak references to the object
- _check_replicon_consistency() → None
- Raises:
MacsylibError – if the hits of the cluster are not all related to the same replicon
- fulfilled_function(*genes: ModelGene | str) → frozenset[str]
- Parameters:
genes – The genes which must be tested.
- Returns:
the common functions between genes and this cluster.
- property functions: frozenset[str]
- Returns:
The set of functions encoded by this cluster. A function is a gene name, or the reference gene name for exchangeable genes. For instance, with the model
<model vers="2.0">
    <gene a presence="mandatory"/>
    <gene b presence="accessory">
        <exchangeable>
            <gene c />
        </exchangeable>
    </gene>
</model>
the functions for a cluster corresponding to this model will be {'a', 'b'}
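The relation between hits, functions and fulfilled_function can be sketched as follows. This is only an illustration: ToyHit and the two helper functions are hypothetical stand-ins written for this example, not the library's classes, and genes are passed as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyHit:
    """Hypothetical stand-in for a hit (not a macsylib class)."""
    gene_name: str  # the gene that was actually matched
    function: str   # reference gene name for an exchangeable, else the gene name


def functions(hits: list[ToyHit]) -> frozenset[str]:
    """Set of functions carried by the hits."""
    return frozenset(h.function for h in hits)


def fulfilled_function(hits: list[ToyHit], *genes: str) -> frozenset[str]:
    """Functions shared between the requested genes and the hits."""
    return functions(hits) & frozenset(genes)


# Gene c is an exchangeable of b, so a hit on c fulfils function b.
cluster_hits = [ToyHit("a", "a"), ToyHit("c", "b")]
cluster_functions = functions(cluster_hits)
shared = fulfilled_function(cluster_hits, "b", "d")
```

In this sketch cluster_functions contains a and b, and shared contains only b, since no hit fulfils d.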
- property hit_weights: HitWeight
- Returns:
The weights of the hits, used to compute the score
- property loner: bool
- Returns:
True if this cluster is made only of hits representing the same gene and this gene is tagged as loner. False otherwise, i.e. if the cluster:
contains several hits coding for different genes
contains one hit but the gene is not tagged as loner (max_gene_required = 1)
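The rule above can be expressed as a small predicate. This is a minimal sketch under assumptions: ToyHit and its is_loner flag are hypothetical names chosen for the example, and the library may store the loner tag differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyHit:
    """Hypothetical stand-in for a hit (not a macsylib class)."""
    gene_name: str
    is_loner: bool  # assumed flag: the gene is tagged as loner in the model


def is_loner_cluster(hits: list[ToyHit]) -> bool:
    """True only if all hits represent one single gene tagged as loner."""
    if not hits:
        return False
    gene_names = {h.gene_name for h in hits}
    return len(gene_names) == 1 and all(h.is_loner for h in hits)


same_loner_gene = [ToyHit("a", True), ToyHit("a", True)]
different_genes = [ToyHit("a", True), ToyHit("b", True)]
not_tagged = [ToyHit("a", False)]
```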
- merge(cluster: Cluster, before: bool = False) → None
Merge the given cluster into this one (in place).
- Parameters:
cluster – the cluster to merge into this one
before (bool) – If False, the hits of cluster are added at the end of this one; otherwise they are inserted before the hits of this one.
- Raises:
MacError – if the two clusters do not have the same model
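The effect of the before flag can be illustrated with a toy class. This is a sketch, not the library's implementation: ToyCluster is hypothetical, hits are plain strings, and ValueError stands in for the library's own exception.

```python
class ToyCluster:
    """Hypothetical stand-in for a cluster (not macsylib.cluster.Cluster)."""

    def __init__(self, hits: list[str], model: str) -> None:
        self.hits = list(hits)
        self.model = model

    def merge(self, other: "ToyCluster", before: bool = False) -> None:
        """Merge the hits of other into this cluster, in place."""
        if other.model != self.model:
            # ValueError is an assumption; the library raises its own error type.
            raise ValueError("cannot merge clusters built for different models")
        if before:
            self.hits = other.hits + self.hits
        else:
            self.hits = self.hits + other.hits


main = ToyCluster(["h1", "h2"], "model_X")
main.merge(ToyCluster(["h3"], "model_X"))               # appended at the end
main.merge(ToyCluster(["h0"], "model_X"), before=True)  # inserted at the front
```

After both calls, main.hits is ordered h0, h1, h2, h3, while the merged clusters themselves are left untouched.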
- property multi_system: bool
- Returns:
True if this cluster is made of only one hit representing a multi_system gene. False otherwise, i.e. if the cluster:
contains several hits
contains one hit but the gene is not tagged as multi_system (max_gene_required = 1)
- replace(old: ModelHit, new: ModelHit) → None
Replace the hit old in this cluster with new (in place). Beware: the hits in a cluster are sorted by position, so if old and new do not have the same position, the order will be changed.
- Parameters:
old – the hit to replace
new – the new hit
- Returns:
None
- property replicon_name: str
- Returns:
The name of the replicon where this cluster is located
- Return type:
str
- property score: float
- Returns:
The score for this cluster
cluster functions
Functions that help to build macsylib.cluster.Cluster objects.
- macsylib.cluster.build_clusters(hits: list[ModelHit], rep_info: RepliconInfo, model: Model, hit_weights: HitWeight) → tuple[list[Cluster], dict[str, Loner | LonerMultiSystem]]
From a list of filtered hits and replicon information (topology, length), build all lists of hits that satisfy the constraints:
max_gene_inter_space
loner
multi_system
If the constraints are satisfied, a cluster is created. A cluster contains at least two hits separated by at most max_gene_inter_space, except for loner genes, which are allowed to be alone in a cluster.
- Parameters:
hits – list of filtered hits
rep_info – the replicon to analyse
model – the model to study
hit_weights – the hit weight needed to compute the cluster score
- Returns:
a list of regular clusters and the special clusters (loners not in a cluster, and multi systems)
- Return type:
tuple with 2 elements:
true_clusters: a list of Cluster objects
true_loners: a dict {str function: macsylib.hit.Loner | macsylib.hit.LonerMultiSystem object}
- macsylib.cluster.closest_hit(hit: ModelHit, ref_hits: list[ModelHit]) → ModelHit
- Parameters:
hit – the hit
ref_hits – The reference hits. The distance between hit and each ref_hit is computed, and the closest ref_hit is returned.
- Returns:
The closest ref_hit to the hit. If two ref_hits are equidistant from the hit, the one with the lowest position is returned. For instance, with hit at position 40, H1 at position 20 and H2 at position 60,
closest_hit(hit, [H1, H2])
will return H1
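The tie-breaking rule can be sketched with a composite sort key. This is an illustration only: ToyHit is a hypothetical stand-in, and the distance is taken as the plain difference between positions, which ignores any special handling the library may apply (for instance on circular replicons).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyHit:
    """Hypothetical stand-in for a hit (not a macsylib class)."""
    name: str
    position: int


def closest_hit(hit: ToyHit, ref_hits: list[ToyHit]) -> ToyHit:
    """Return the nearest reference hit; on a tie, the lowest position wins."""
    return min(
        ref_hits,
        key=lambda ref: (abs(ref.position - hit.position), ref.position),
    )


hit = ToyHit("hit", 40)
h1 = ToyHit("H1", 20)
h2 = ToyHit("H2", 60)
nearest = closest_hit(hit, [h2, h1])  # both are 20 away, so H1 is selected
```

Sorting on the pair (distance, position) makes the result independent of the order in which the reference hits are given.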
- macsylib.cluster.clusterize_hits_around_key_genes(key_genes: set[str], hits: list[ModelHit], model: Model, hit_weights: HitWeight, rep_info: RepliconInfo) → list[Cluster]
Clusterize the hits according to the distance between them and around the key genes.
- Parameters:
hits (list of macsylib.model.ModelHit objects) – the hits to clusterize
model (macsylib.model.Model object) – the model to consider
hit_weights (macsylib.hit.HitWeight object) – the hit weights used to compute the score
- Returns:
the clusters
- Return type:
list of macsylib.cluster.Cluster objects
- macsylib.cluster.clusterize_hits_on_distance_only(hits: list[ModelHit], model: Model, hit_weights: HitWeight, rep_info: RepliconInfo) → list[Cluster]
Clusterize the hits according to the distance between them.
- Parameters:
hits – the hits to clusterize
model – the model to consider
hit_weights – the hit weight to compute the score
rep_info – The information on the replicon
- Returns:
the clusters
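Distance-based grouping can be sketched as a single pass over sorted positions. This is a simplified illustration under assumptions: hits are reduced to integer positions on a linear replicon, group_by_distance is a hypothetical helper, and the gap is measured as the plain difference between positions, which may differ from the way the library counts the space between genes.

```python
def group_by_distance(positions: list[int], max_space: int) -> list[list[int]]:
    """Group positions; a new group starts when the gap exceeds max_space."""
    groups: list[list[int]] = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][-1] <= max_space:
            groups[-1].append(pos)  # close enough to the previous hit
        else:
            groups.append([pos])    # too far: open a new group
    return groups


groups = group_by_distance([30, 1, 2, 3, 10, 11], max_space=2)
# Keep only groups of at least two hits; isolated hits would need the
# loner rule to be retained, which this sketch does not model.
clusters = [group for group in groups if len(group) >= 2]
```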
- macsylib.cluster.is_a(hit: ModelHit | CoreHit, ref_hits: set[str]) → bool
- Parameters:
hit – The hit to check
ref_hits – the gene names of the reference hits
- Returns:
True if the hit belongs to the reference hits, False otherwise
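A membership test of this kind can be sketched as follows. ToyHit and its gene_name attribute are assumptions made for the example; the library may compare a different name, for instance the reference gene of an exchangeable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyHit:
    """Hypothetical stand-in for a hit (not a macsylib class)."""
    gene_name: str


def is_a(hit: ToyHit, ref_hits: set[str]) -> bool:
    """True if the gene name of the hit is one of the reference gene names."""
    return hit.gene_name in ref_hits


key_gene_names = {"KG1", "KG2"}
is_key = is_a(ToyHit("KG1"), key_gene_names)
is_other = is_a(ToyHit("A"), key_gene_names)
```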
- macsylib.cluster.scaffold_to_cluster(cluster_scaffold: list[ModelHit], model: Model, hit_weights: HitWeight) → Cluster
Transform a list of ModelHit into a cluster if the hits colocalize, they are not all neutral, and they do not code for the same gene. Add the new cluster to the clusters.
- Parameters:
cluster_scaffold – model hit to transform in cluster
model – The model related to this cluster
hit_weights – the hit weight to compute scores
- Returns:
Cluster
- macsylib.cluster.split_cluster_on_key_genes(key_genes: set[str], cluster: Cluster) → list[Cluster]
Split a cluster containing several key genes into one cluster per key gene, each with its closest hits.
For instance, if a set of genes clusterizes as follows (assuming each gene is 10 positions away from the next one):
positions  10  20   30  40  50  60   70
genes      A   KG1  B   C   D   KG2  E
the resulting clusters after splitting around the 2 KG (key genes) are:
c1 = [A, KG1, B, C], c2 = [D, KG2, E]
Gene C is equidistant from KG1 and KG2; in that case it is clustered with the leftmost cluster.
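The example above can be reproduced with a short sketch that assigns each hit to its nearest key gene. It is an illustration under assumptions: ToyHit and split_on_key_genes are hypothetical, clusters are plain lists, and distance is the plain difference between positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyHit:
    """Hypothetical stand-in for a hit (not a macsylib class)."""
    gene_name: str
    position: int


def split_on_key_genes(key_genes: set[str], hits: list[ToyHit]) -> list[list[ToyHit]]:
    """Return one group per key gene, each hit going to its nearest key gene."""
    ordered = sorted(hits, key=lambda h: h.position)
    seeds = [h for h in ordered if h.gene_name in key_genes]
    if len(seeds) <= 1:
        return [ordered]  # nothing to split
    groups: list[list[ToyHit]] = [[] for _ in seeds]
    for hit in ordered:
        # On a tie, the seed with the lowest position (the leftmost) wins.
        nearest = min(
            range(len(seeds)),
            key=lambda i: (abs(seeds[i].position - hit.position), seeds[i].position),
        )
        groups[nearest].append(hit)
    return groups


names = ["A", "KG1", "B", "C", "D", "KG2", "E"]
hits = [ToyHit(name, 10 * (i + 1)) for i, name in enumerate(names)]
c1, c2 = split_on_key_genes({"KG1", "KG2"}, hits)
```

Here C sits at position 40, exactly 20 away from both key genes, and ends up in c1 because KG1 has the lower position.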
- Parameters:
key_genes – the names of the genes that seed the clusters
cluster – The cluster to split
- Returns: