scripts

There are 4 entry points.

  • macsydata: allows users to manage the models

  • macsyprofile: a utility dedicated to modelers that gathers information about HMMER output

API reference

macsydata

This is the entry point of the msl_data command. msl_data allows the user to manage the MacSyLib models.

macsylib.scripts.macsydata._find_all_installed_packages(models_dir: list[str] | None = None, package_name: str = 'macsylib') → ModelRegistry
Parameters:
  • models_dir – list of paths where packages can be found.

  • package_name – the name of the high-level tool that embeds macsylib

Returns:

all installed models

macsylib.scripts.macsydata._find_installed_package(model_pack_name: str, models_dir: list[str] | None = None, package_name: str = 'macsylib') → ModelLocation | None

Search whether a package named model_pack_name is already installed.

Parameters:
  • model_pack_name – the name of the model family to search for

  • models_dir – list of paths where packages can be found.

  • package_name – the name of the high-level tool that embeds macsylib, for instance: ‘macsyfinder’

Returns:

The model location corresponding to model_pack_name, or None if the package is not installed

macsylib.scripts.macsydata._get_remote_available_versions(model_pack_name: str, org: str) → list[str]

Ask the organization org for the available versions of the package model_pack_name.

Parameters:
  • model_pack_name – the name of the models package

  • org – the remote organization to query

Returns:

the list of available versions for the package

macsylib.scripts.macsydata._search_in_desc(pattern: str, remote: RemoteModelIndex, m_packages: list[str], match_case: bool = False) → tuple[str, str, str]
Parameters:
  • pattern – the substring to search for in package descriptions

  • remote – the remote macsy-models index (a RemoteModelIndex object)

  • m_packages – list of model packages to search in

  • match_case – True if the search is case-sensitive, False otherwise

Returns:

macsylib.scripts.macsydata._search_in_pack_name(pattern: str, remote: RemoteModelIndex, m_packages: list[str], match_case: bool = False) → list[tuple[str, str, dict]]
Parameters:
  • pattern – the substring to search for in package names

  • remote – the remote macsy-models index (a RemoteModelIndex object)

  • m_packages – list of model packages to search in

  • match_case – True if the search is case-sensitive, False otherwise

Returns:

macsylib.scripts.macsydata.build_arg_parser(header: str, version: str, package_name: str = 'macsylib', tool_name: str = 'msl_data') → ArgumentParser

Build the argument parser.

Parameters:
  • header – the header of the console script

  • version – the version message of the tool

  • package_name – the name of the higher-level package that embeds macsylib (e.g. ‘macsyfinder’)

  • tool_name – the name of this tool as it appears in pyproject.toml

Returns:

The argument parser

macsylib.scripts.macsydata.cmd_name(args: Namespace) → str

Return the name of the command being executed (script name + operation).

Example

msl_data uninstall

Parameters:

args – the arguments passed on the command line
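The parser built by build_arg_parser dispatches each sub-command to one of the do_* functions documented in this section. The following stdlib-only sketch shows that pattern, assuming each sub-parser binds its handler with set_defaults(func=...) and that cmd_name derives the operation from the handler's name. The two handlers are stand-ins; this is not MacSyLib's actual parser.

```python
import argparse


def do_list(args: argparse.Namespace) -> None:
    # Stand-in for macsydata.do_list
    print("listing installed models")


def do_uninstall(args: argparse.Namespace) -> None:
    # Stand-in for macsydata.do_uninstall
    print(f"uninstalling {args.package}")


def build_arg_parser(tool_name: str = "msl_data") -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog=tool_name)
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="increase verbosity (can be repeated)")
    subparsers = parser.add_subparsers(title="operations")

    list_parser = subparsers.add_parser("list", help="list installed models")
    list_parser.set_defaults(func=do_list)

    uninstall_parser = subparsers.add_parser("uninstall", help="remove a model package")
    uninstall_parser.add_argument("package", help="the model package name")
    uninstall_parser.set_defaults(func=do_uninstall)
    return parser


def cmd_name(args: argparse.Namespace, tool_name: str = "msl_data") -> str:
    # The handler "do_uninstall" becomes the command "msl_data uninstall"
    operation = args.func.__name__.removeprefix("do_")
    return f"{tool_name} {operation}"


if __name__ == "__main__":
    parsed = build_arg_parser().parse_args(["-v", "uninstall", "TXSS+"])
    print(cmd_name(parsed))
    parsed.func(parsed)
```

Binding the handler on the sub-parser keeps main() free of a long if/elif chain: it only has to call args.func(args).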

macsylib.scripts.macsydata.do_available(args: Namespace) → None

List the models available on macsy-models.

Parameters:

args – the arguments passed on the command line

Returns:

None

macsylib.scripts.macsydata.do_check(args: Namespace) → None
Parameters:

args – the arguments passed on the command line

Return type:

None

macsylib.scripts.macsydata.do_cite(args: Namespace) → None

Display how to cite an installed model.

Parameters:

args – the arguments passed on the command line

macsylib.scripts.macsydata.do_download(args: Namespace) → str | None

Download a tarball from the remote models repository.

Parameters:

args (argparse.Namespace object) – the arguments passed on the command line

macsylib.scripts.macsydata.do_freeze(args: Namespace) → None

Display all installed models with their respective versions, in requirements format.

Parameters:

args – the arguments passed on the command line
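Requirements format is assumed here to mean one name==version line per package, as produced by pip freeze. The sketch below uses a hypothetical freeze_lines helper that takes a name-to-version mapping; the package versions in the test are made up for illustration, and this is not MacSyLib's implementation.

```python
def freeze_lines(installed: dict[str, str]) -> list[str]:
    """Hypothetical helper: format installed packages as name==version lines."""
    # Sort case-insensitively so the output order is stable across runs
    ordered = sorted(installed.items(), key=lambda item: item[0].lower())
    return [f"{name}=={version}" for name, version in ordered]


if __name__ == "__main__":
    for line in freeze_lines({"TXSS+": "1.0", "CONJScan": "2.0"}):
        print(line)
```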

macsylib.scripts.macsydata.do_help(args: Namespace) → None

Display the content of the README file on stdout. If the README file does not exist, display a message to the user. See macsylib.package.help().

Parameters:

args – the arguments passed on the command line (the package name)

Returns:

None

Raises:

ValueError – if the package name is not known.

macsylib.scripts.macsydata.do_info(args: Namespace) → None

Show information about an installed model.

Parameters:

args – the arguments passed on the command line

Raises:

ValueError – if the package is not found locally

macsylib.scripts.macsydata.do_init_package(args: Namespace) → None

Create a template for a data package:

  • skeleton for metadata.yml

  • definitions directory with a skeleton of models.xml

  • profiles directory

  • skeleton for README.md file

  • COPYRIGHT file (if holders option is set)

  • LICENSE file (if model_license option is set)

Parameters:

args – the parsed command-line sub-command arguments

Returns:

None

macsylib.scripts.macsydata.do_install(args: Namespace) → None

Install new models in the macsylib local models repository.

Parameters:

args – the arguments passed on the command line

Raises:
  • RuntimeError – if there is a problem with the installed package

  • ValueError – if the package and/or version is not found

macsylib.scripts.macsydata.do_list(args: Namespace) → None

List installed models.

Parameters:

args – the arguments passed on the command line

macsylib.scripts.macsydata.do_search(args: Namespace) → None

Search macsy-models for models in a remote index. By default, the search is performed on package names; if option -S is set, descriptions are searched as well. The search is case-insensitive unless option --match-case is set.

Parameters:

args – the arguments passed on the command line
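The matching rules described above can be sketched as a plain substring test. The helper below is hypothetical and works on an in-memory mapping of package names to descriptions rather than on the remote macsy-models index; search_desc plays the role of option -S and match_case that of --match-case. The descriptions in the test are made up.

```python
def search_packages(pattern: str, packages: dict[str, str],
                    search_desc: bool = False, match_case: bool = False) -> list[str]:
    """Hypothetical helper: return the names of packages matching pattern."""
    def norm(text: str) -> str:
        # Case-insensitive by default: compare lower-cased strings
        return text if match_case else text.lower()

    needle = norm(pattern)
    hits = []
    for name, desc in packages.items():
        # Names are always searched; descriptions only when search_desc is set
        if needle in norm(name) or (search_desc and needle in norm(desc)):
            hits.append(name)
    return hits
```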

macsylib.scripts.macsydata.do_show_definition(args: Namespace) → None

Display a model definition on stdout. If only a package or sub-package is specified, display all model definitions in the corresponding package or sub-package.

For instance:

TXSS+/bacterial T6SSii T6SSiii

displays models TXSS+/bacterial/T6SSii and TXSS+/bacterial/T6SSiii

TXSS+/bacterial all or TXSS+/bacterial

displays all models contained in the TXSS+/bacterial sub-package

Parameters:

args – the arguments passed on the command line
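The way a package path and a list of model names are combined in the examples above can be sketched as follows. expand_model_names is hypothetical: it assumes the fully qualified model names are already known and that "all", or an empty list, selects every model under the given path. The model names in the test are illustrative.

```python
def expand_model_names(path: str, names: list[str], available: list[str]) -> list[str]:
    """Hypothetical helper: resolve which fully qualified models to display."""
    prefix = path.rstrip("/") + "/"
    if not names or names == ["all"]:
        # Every known model whose fully qualified name lies under path
        return sorted(m for m in available if m.startswith(prefix))
    # Explicit names are joined to the path without validation
    return [prefix + name for name in names]
```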

macsylib.scripts.macsydata.do_show_package(args: Namespace) → None

Display the structure of an installed model package: the family, sub-families and models, in a tree-like format.

Parameters:

args – the arguments passed on the command line (the package name)

Returns:

None

Raises:

ValueError – if the package is not found.
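A tree-like listing of families, sub-families and models can be produced by a short recursive function. This is a hypothetical sketch over a nested dict; the connectors it draws are an assumption and may differ from the real do_show_package output.

```python
def render_tree(tree: dict, prefix: str = "") -> list[str]:
    """Hypothetical helper: render a nested dict as tree lines."""
    lines = []
    items = sorted(tree.items())
    for i, (name, subtree) in enumerate(items):
        last = i == len(items) - 1
        lines.append(prefix + ("└── " if last else "├── ") + name)
        if subtree:
            # Continue the vertical bar only while siblings remain below
            lines.extend(render_tree(subtree, prefix + ("    " if last else "│   ")))
    return lines


if __name__ == "__main__":
    print("TXSS+")
    print("\n".join(render_tree({"bacterial": {"T6SSi": {}, "T6SSii": {}}})))
```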

macsylib.scripts.macsydata.do_uninstall(args: Namespace) → None

Remove models from the macsylib local models repository.

Parameters:

args – the arguments passed on the command line

Raises:

ValueError – if the package is not found locally

macsylib.scripts.macsydata.init_logger(level: Literal['NOTSET', 'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'] | int = 'INFO', out: bool = True) → Logger
Parameters:
  • level – the logger threshold; can be a positive int or a string among: ‘CRITICAL’, ‘ERROR’, ‘WARNING’, ‘INFO’, ‘DEBUG’

  • out – whether log messages must be displayed

Returns:

the logger

macsylib.scripts.macsydata.main(args: list[str] = None, header: str = "\n\n     *            *               *\n*           *               *   *   *  *    **\n  **     *    *   *  *     *        *\n            *      _      *   _   *   _      *\n  *  _ __ ___  ___| |      __| | __ _| |_ __ _\n    | '_ ` _ \\/ __| |     / _` |/ _` | __/ _` |\n    | | | | | \\__ \\ |    | (_| | (_| | || (_| |\n    |_| |_| |_|___/_|_____\\__,_|\\__,_|\\__\\__,_|\n           *        |_____|          *\n *      *   * *     *   **         *   *  *\n  *      *         *        *    *\n*                           *  *           *\n\n\nmsl_data - Model Management Tool\n", version="msl_data 1.0.1 \nPython 3.13.5 (main, Jul 17 2025, 00:32:28) [GCC 14.3.0]\n\nMacSyLib is distributed under the terms of the GNU General Public License (GPLv3).\nSee the COPYING file for details.\n\nIf you use this software please cite:\nNéron, Bertrand; Denise, Rémi; Coluzzi, Charles; Touchon, Marie; Rocha, Eduardo P.C.; Abby, Sophie S.\nMacSyFinder v2: Improved modelling and search engine to identify molecular systems in genomes.\nPeer Community Journal, Volume 3 (2023), article no. e28. doi : 10.24072/pcjournal.250.\nhttps://peercommunityjournal.org/articles/10.24072/pcjournal.250/\nand don't forget to cite models used:\nmacsydata cite <model>\n", package_name: str = 'macsylib', tool_name='msl_data') → None

Main entry point.

Parameters:
  • args – the arguments passed on the command line (before parsing)

  • header – the header of the console script

  • version – the version message of the tool

  • package_name – the name of the higher-level package that embeds macsylib (e.g. ‘macsyfinder’)

  • tool_name – the name of this tool as it appears in pyproject.toml

macsylib.scripts.macsydata.verbosity_to_log_level(verbosity: int) → int

Transform the number of -v options into a log level.

Parameters:

verbosity – the number of -v options on the command line

Returns:

an int corresponding to a logging level
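One plausible mapping, assumed here rather than taken from the source, starts at INFO and lowers the threshold by one logging step (10) per -v, never going below 1:

```python
import logging


def verbosity_to_log_level(verbosity: int) -> int:
    # Assumed mapping: 0 -> INFO (20), 1 -> DEBUG (10), 2 or more -> 1
    return max(logging.INFO - 10 * verbosity, 1)
```

The floor of 1 rather than 0 matters: level 0 is NOTSET, which makes a logger defer to its parent's threshold, whereas 1 lets every record through.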

macsyprofile

class macsylib.scripts.macsyprofile.HmmProfile(gene_name: str, gene_profile_lg: int, hmmer_output: str, cfg: Config)

Handle the HMMER output files.

__init__(gene_name: str, gene_profile_lg: int, hmmer_output: str, cfg: Config)
Parameters:
  • gene_name – the name of the gene corresponding to the profile search reported here

  • gene_profile_lg – the length of the profile

  • hmmer_output – the path to the raw HMMER output file

  • cfg – the configuration object

__weakref__

list of weak references to the object

_build_my_db(hmm_output: str) → dict[str, None]

Build the keys of a dictionary object to store sequence identifiers of hits.

Parameters:

hmm_output – the path to the hmmsearch output to parse.

Returns:

a dictionary containing a key for each sequence id of the hits

_fill_my_db(db: dict[str, tuple[int, int]]) → None

Fill the dictionary with information on the matched sequences.

Parameters:

db – the database containing all sequence ids of the hits.

_hit_start(line: str) → bool
Parameters:

line – the line to parse

Returns:

True if it is the beginning of a new hit in HMMER raw output files, False otherwise

_parse_hmm_body(hit_id: str, gene_profile_lg: int, seq_lg: int, coverage_threshold: float, replicon_name: str, position_hit: int, i_evalue_sel: float, b_grp: list[list[str]]) → list[CoreHit]

Parse the raw HMMER output to extract the hits, and filter them with the selected threshold criteria (“coverage_profile” and “i_evalue_select” command-line parameters).

Parameters:
  • hit_id – the sequence identifier

  • gene_profile_lg – the length of the matched profile

  • seq_lg – the length of the sequence

  • coverage_threshold – the minimal coverage of the profile to be reached in the HMMER alignment for hit selection.

  • replicon_name – the identifier of the replicon

  • position_hit – the rank of the sequence matched in the input dataset file

  • i_evalue_sel – the maximal i-evalue (independent evalue) for hit selection

  • b_grp – the Hmmer output lines to deal with (grouped by hit)

Returns:

a sequence of hits
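The selection step can be illustrated on already-parsed domain records. This sketch assumes that profile coverage is the aligned profile span (hmm_to - hmm_from + 1) divided by the profile length, and sequence coverage the aligned sequence span divided by the sequence length. The Domain record and select_domains helper are hypothetical; the real method reads these values from the HMMER text itself.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical parsed domain line from a HMMER alignment."""
    i_evalue: float
    hmm_from: int
    hmm_to: int
    ali_from: int
    ali_to: int


def select_domains(domains: list[Domain], profile_len: int, seq_len: int,
                   coverage_threshold: float,
                   i_evalue_sel: float) -> list[tuple[Domain, float, float]]:
    """Keep domains covering enough of the profile with a low enough i-evalue."""
    selected = []
    for d in domains:
        cov_profile = (d.hmm_to - d.hmm_from + 1) / profile_len
        cov_seq = (d.ali_to - d.ali_from + 1) / seq_len
        if cov_profile >= coverage_threshold and d.i_evalue <= i_evalue_sel:
            selected.append((d, cov_profile, cov_seq))
    return selected
```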

_parse_hmm_header(h_grp: str) → str
Parameters:

h_grp – the sequence of strings returned by the groupby function, representing the header of a hit

Returns:

the sequence identifier from a set of lines that corresponds to a single hit
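In hmmsearch's default text output, the domain annotation of each target sequence starts with a line beginning with ">>" followed by the sequence identifier. Assuming _hit_start tests for that marker, the header/body split that feeds _parse_hmm_header and _parse_hmm_body can be sketched with itertools.groupby; hit_start and hit_ids are hypothetical helpers.

```python
from itertools import groupby


def hit_start(line: str) -> bool:
    # A new per-sequence block in hmmsearch output begins with ">> <sequence id>"
    return line.startswith(">> ")


def hit_ids(lines: list[str]) -> list[str]:
    """Hypothetical helper: sequence ids of all hits, in file order."""
    ids = []
    # groupby alternates between runs of header lines and runs of body lines
    for is_header, group in groupby(lines, key=hit_start):
        if is_header:
            for header in group:
                # ">> seq_id  description" -> "seq_id"
                ids.append(header.split()[1])
    return ids
```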

parse() → list[LightHit]

Parse a HMMER output file, extract all hits and do some basic computation (profile coverage).

Returns:

The list of extracted hits

class macsylib.scripts.macsyprofile.LightHit(gene_name: str, id: str, seq_length: int, replicon_name: str, position: int, i_eval: float, score: float, profile_coverage: float, sequence_coverage: float, begin_match: int, end_match: int)

Handle HMMER hits.

__eq__(other)

Return self==value.

__init__(gene_name: str, id: str, seq_length: int, replicon_name: str, position: int, i_eval: float, score: float, profile_coverage: float, sequence_coverage: float, begin_match: int, end_match: int) → None

__repr__()

Return repr(self).

__str__() → str

Return str(self).

__weakref__

list of weak references to the object

macsylib.scripts.macsyprofile.get_gene_name(path: str, suffix: str) → str
Parameters:
  • path – the path to the HMMER output to analyse

  • suffix – the suffix of the HMMER output file

Returns:

the name of the analysed gene
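Deriving the gene name from an output file name amounts to stripping the directory and the suffix. A minimal sketch, using a hypothetical gene_name_from_path helper and an illustrative suffix rather than the tool's default:

```python
import os


def gene_name_from_path(path: str, suffix: str) -> str:
    # Keep the file name only, then drop the trailing suffix if present
    return os.path.basename(path).removesuffix(suffix)
```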

macsylib.scripts.macsyprofile.get_profile_len(path: str) → int

Parse the HMM profile to extract the length and the presence of the GA bit threshold.

Parameters:

path – the path to the HMM profile used to produce the hmmsearch output to analyse

Returns:

the length and the presence of the GA bit threshold

macsylib.scripts.macsyprofile.get_version_message(tool_name: str = 'msl_profile', data_mgr: str = 'msl_data') → str
Parameters:

tool_name – the name of the high-level tool

Returns:

the long description of the macsylib version

macsylib.scripts.macsyprofile.init_logger(level: Literal['NOTSET', 'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'] | int = 'INFO', out: bool = True)
Parameters:
  • level – the logger threshold; can be a positive int or a string among: ‘CRITICAL’, ‘ERROR’, ‘WARNING’, ‘INFO’, ‘DEBUG’

  • out – whether log messages must be displayed

Returns:

the logger

macsylib.scripts.macsyprofile.main(args: list[str] | None = None, header: str = "\n     *            *               *                   * *\n            *               *   *   *  *    **           \n  **     *    *   *  *     *                    *        \n            *       _   *             **    __ _ _     *         \n      _ __ ___  ___| |     _ __  _ __ ___  / _(_) | ___          \n     | '_ ` _ \\/ __| |    | '_ \\| '__/ _ \\| |_| | |/ _ \\       \n     | | | | | \\__ \\ |    | |_) | | | (_) |  _| | |  __/\n     |_| |_| |_|___/_|____| .__/|_|  \\___/|_| |_|_|\\___|\n           *         |_____|_|        *                  *\n        *   * *     *   **         *   *  *           *\n  *      *         *        *    *              *        \n             *                           *  *           * \n\n\nmsl_profile - MacSyLib profile helper tool\n", package_name: str = 'macsylib', tool_name: str = 'msl_profile', log_level: str | int | None = None) → None

Main entry point.

Parameters:
  • args – the arguments passed on the command line without the program name

  • header – the header of the console script

  • package_name – the name of the higher-level package that embeds macsylib (e.g. ‘macsyfinder’)

  • tool_name – the name of this tool as it appears in pyproject.toml

  • log_level – the output verbosity

macsylib.scripts.macsyprofile.parse_args(header: str, args: list[str], package_name='macsylib', tool_name: str = 'msl_profile') → Namespace

Build the argument parser and parse the command-line arguments.

Parameters:
  • header – the header of the console script

  • args – the arguments provided on the command line

  • package_name – the name of the higher-level package that embeds macsylib (e.g. ‘macsyfinder’)

  • tool_name – the name of this tool as it appears in pyproject.toml

Returns:

The arguments parsed

macsylib.scripts.macsyprofile.result_header(cmd: list[str], model: str, model_vers: str, tool_name='msl_profile') → str
Parameters:
  • cmd – the command used to launch this analysis

  • model – the name of the model family

  • model_vers – the version of the model

Returns:

The header of the result file

macsylib.scripts.macsyprofile.verbosity_to_log_level(verbosity: int) → int

Transform the number of -v options into a log level.

Parameters:

verbosity – the number of -v options on the command line

Returns:

an int corresponding to a logging level