search_genes

Manages the parallelization of the code that ultimately runs hmmsearch to find the genes constituting the models in the input dataset.

search_genes API reference

search_genes

Manage the HMM step (run hmmsearch or recover the results of a previous run) in parallel.

macsylib.search_genes.search_genes(genes: list[ModelGene], cfg: Config) -> list[HMMReport]

For each gene in the list, use the corresponding profile to perform an Hmmer search, and parse the output to generate an HMMReport that is saved in a file after CoreHit filtering. These tasks are performed in parallel using threads. The number of workers can be limited by the worker_nb directive in the config object or on the command line with the “-w” option.

Parameters:
  • genes – the genes to search in the input sequence dataset

  • cfg – the configuration object
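The thread-based pattern described above can be sketched with the standard library alone. This is only an illustration under stated assumptions, not macsylib's implementation: search_one and search_all are hypothetical names, genes are represented by plain strings, and the hmmsearch call and report parsing are replaced by a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def search_one(gene_name: str) -> str:
    """Hypothetical stand-in for one unit of work.

    In the real library this step would run hmmsearch with the gene's
    profile and parse the output into a report; here it only builds a string.
    """
    return f"report for {gene_name}"


def search_all(gene_names: list[str], worker_nb: int) -> list[str]:
    """Run search_one for every gene, using at most worker_nb threads."""
    with ThreadPoolExecutor(max_workers=worker_nb) as pool:
        # map() returns the results in the same order as the input genes.
        return list(pool.map(search_one, gene_names))


reports = search_all(["geneA", "geneB", "geneC"], worker_nb=2)
```

Threads are a reasonable fit for this kind of workload when each task spends most of its time waiting on an external process, since the waiting threads do not compete for the interpreter.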

macsylib.search_genes.worker_cpu(genes_nb: int, cfg: Config) -> tuple[int, int]

Compute the optimal number of workers and of CPUs per worker. The number of workers is set by the user (1 by default; 0 means all available workers).

One worker is used per gene. If the number of workers is greater than the number of genes, hmmsearch can use several CPUs to speed up the search step.

Parameters:
  • genes_nb – the number of genes to search

  • cfg – The macsylib configuration

Returns:

the number of workers and the number of CPUs per worker to use

Return type:

tuple (int worker_nb, int cpu_per_worker)
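One way to read the rule described above is sketched below. It is an assumption-laden illustration, not the library's code: worker_cpu_sketch is a hypothetical name, the user setting is passed as a plain integer instead of a Config object, and the choice to split the spare workers evenly with integer division is a guess at the intended behaviour.

```python
import os
from typing import Optional


def worker_cpu_sketch(
    genes_nb: int,
    requested_workers: int,
    available_cpus: Optional[int] = None,
) -> tuple[int, int]:
    """Return (worker_nb, cpu_per_worker) for genes_nb genes.

    requested_workers mimics the user setting: 0 means "use everything
    available". available_cpus defaults to the machine's CPU count.
    """
    if available_cpus is None:
        available_cpus = os.cpu_count() or 1
    # Assumption: 0 expands to the number of CPUs on the machine.
    budget = available_cpus if requested_workers == 0 else requested_workers
    # One worker per gene, never more workers than the budget allows,
    # and always at least one worker.
    worker_nb = max(1, min(budget, genes_nb))
    # Assumption: any budget left over is shared evenly between workers.
    cpu_per_worker = max(1, budget // worker_nb)
    return worker_nb, cpu_per_worker


few_genes = worker_cpu_sketch(genes_nb=2, requested_workers=8)
many_genes = worker_cpu_sketch(genes_nb=10, requested_workers=4)
```

With 2 genes and 8 requested workers the sketch yields 2 workers with 4 CPUs each; with 10 genes and 4 requested workers it yields 4 workers with 1 CPU each.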