scripts
There are 4 entry points.
macsydata: manages the models
macsyprofile: a utility dedicated to modelers, which gathers information about HMMER output
API reference
macsydata
This is the entry point to the msl_data command. msl_data allows the user to manage the MacSyLib models.
- macsylib.scripts.macsydata._find_all_installed_packages(models_dir: list[str] | None = None, package_name: str = 'macsylib') ModelRegistry[source]
- Parameters:
models_dir – list of paths where packages can be found.
package_name – the name of the high-level tool that embeds macsylib
- Returns:
all models installed
- macsylib.scripts.macsydata._find_installed_package(model_pack_name: str, models_dir: list[str] | None = None, package_name: str = 'macsylib') ModelLocation | None[source]
Search whether a package named model_pack_name is already installed.
- Parameters:
model_pack_name – the name of the family model to search for
models_dir – list of paths where packages can be found.
package_name – the name of the high-level tool that embeds macsylib, for instance: ‘macsyfinder’
- Returns:
The model location corresponding to model_pack_name
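The lookup described above can be pictured with a minimal sketch that scans a list of directories for a sub-directory carrying the requested name. This is not MacSyLib's implementation: `find_pack_dir` is a hypothetical helper, and the documented function returns a `ModelLocation` (or `None`) rather than a path.

```python
from __future__ import annotations

from pathlib import Path


def find_pack_dir(model_pack_name: str, models_dir: list[str]) -> Path | None:
    """Return the first directory named model_pack_name, or None (hypothetical helper)."""
    for root in models_dir:
        candidate = Path(root) / model_pack_name
        if candidate.is_dir():
            return candidate
    # Nothing matched in any of the searched directories.
    return None
```

Returning `None` instead of raising mirrors the `ModelLocation | None` return type shown in the signature.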
- macsylib.scripts.macsydata._get_remote_available_versions(model_pack_name: str, org: str) list[str][source]
Ask the organization org for the available versions of the package model_pack_name.
- Parameters:
model_pack_name – the name of the models package
org – the remote organization to query
- Returns:
list of available versions for the package
- macsylib.scripts.macsydata._search_in_desc(pattern: str, remote: RemoteModelIndex, m_packages: list[str], match_case: bool = False) tuple[str, str, str][source]
- Parameters:
pattern – the substring to search for in package descriptions
remote – the uri of the macsy-models index
m_packages – list of model packages to search in
match_case – True if the search is case-sensitive, False otherwise
- Returns:
- macsylib.scripts.macsydata._search_in_pack_name(pattern: str, remote: RemoteModelIndex, m_packages: list[str], match_case: bool = False) list[tuple[str, str, dict]][source]
- Parameters:
pattern – the substring to search for in package names
remote – the uri of the macsy-models index
m_packages – list of model packages to search in
match_case – True if the search is case-sensitive, False otherwise
- Returns:
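The effect of match_case on a substring search can be sketched as follows. This is an illustration only: `search_names` is a hypothetical helper that works on a plain list of names, whereas the documented functions query a `RemoteModelIndex`; the package names are sample values.

```python
def search_names(pattern: str, names: list[str], match_case: bool = False) -> list[str]:
    """Return the names containing pattern as a substring (hypothetical helper)."""
    if match_case:
        return [name for name in names if pattern in name]
    # Case-insensitive search: compare lowercased pattern and names.
    lowered = pattern.lower()
    return [name for name in names if lowered in name.lower()]


packages = ["TXSScan", "CasFinder", "Conjugation"]  # sample names for illustration
print(search_names("txs", packages))                   # ['TXSScan']
print(search_names("txs", packages, match_case=True))  # []
```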
- macsylib.scripts.macsydata.build_arg_parser(header: str, version: str, package_name: str = 'macsylib', tool_name: str = 'msl_data') ArgumentParser[source]
Build argument parser.
- Parameters:
header – the header of console script
args – The arguments provided on the command line
package_name – the name of the higher-level package that embeds macsylib (e.g. ‘macsyfinder’)
tool_name – the name of this tool as it appears in pyproject.toml
- Returns:
The arguments parsed
- macsylib.scripts.macsydata.cmd_name(args: Namespace) str[source]
Return the name of the command being executed (scriptname + operation).
- Example
msl_data uninstall
- Parameters:
args – the arguments passed on the command line
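One way to obtain such a name is sketched below with the standard `argparse` module. It assumes that each subcommand stores its handler on the namespace through `set_defaults(func=...)` and that handlers are named `do_<operation>`; both are assumptions made for this sketch, and `sketch_cmd_name` is a hypothetical stand-in, not the library function.

```python
import argparse


def do_uninstall(args: argparse.Namespace) -> None:
    """Placeholder handler; the real one removes a model package."""


def sketch_cmd_name(args: argparse.Namespace, tool_name: str = "msl_data") -> str:
    """Build '<tool> <operation>' from the handler stored on the namespace."""
    operation = args.func.__name__.removeprefix("do_")  # requires Python 3.9+
    return f"{tool_name} {operation}"


parser = argparse.ArgumentParser(prog="msl_data")
subparsers = parser.add_subparsers()
uninstall = subparsers.add_parser("uninstall")
uninstall.add_argument("package")
uninstall.set_defaults(func=do_uninstall)  # assumed dispatch convention

args = parser.parse_args(["uninstall", "TXSScan"])
print(sketch_cmd_name(args))  # msl_data uninstall
```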
- macsylib.scripts.macsydata.do_available(args: Namespace) None[source]
List the models available on macsy-models.
- Parameters:
args – the arguments passed on the command line
- Returns:
None
- macsylib.scripts.macsydata.do_check(args: Namespace) None[source]
- Parameters:
args – the arguments passed on the command line
- Return type:
None
- macsylib.scripts.macsydata.do_cite(args: Namespace) None[source]
How to cite an installed model.
- Parameters:
args – the arguments passed on the command line
- macsylib.scripts.macsydata.do_download(args: Namespace) str | None[source]
Download tarball from remote models’ repository.
- Parameters:
args (argparse.Namespace object) – the arguments passed on the command line
- macsylib.scripts.macsydata.do_freeze(args: Namespace) None[source]
Display all installed models with their respective versions, in requirement format.
- Parameters:
args – the arguments passed on the command line
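A minimal sketch of what a requirement-style listing could look like is given below. It assumes the pip-like `name==version` convention, which this page does not spell out; `freeze_lines` is a hypothetical helper and the package names and versions are made up.

```python
def freeze_lines(installed: dict[str, str]) -> list[str]:
    """Format {name: version} pairs as 'name==version', sorted by name."""
    return [f"{name}=={version}" for name, version in sorted(installed.items())]


# Illustrative package names and versions, not real data.
for line in freeze_lines({"pack_b": "1.0.1", "pack_a": "0.2"}):
    print(line)
```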
- macsylib.scripts.macsydata.do_help(args: Namespace) None[source]
Display on stdout the content of the readme file. If the readme file does not exist, display a message to the user. See macsylib.package.help().
- Parameters:
args – the arguments passed on the command line (the package name)
- Returns:
None
- Raises:
ValueError – if the package name is not known.
- macsylib.scripts.macsydata.do_info(args: Namespace) None[source]
Show information about installed model.
- Parameters:
args – the arguments passed on the command line
- Raises:
ValueError – if the package is not found locally
- macsylib.scripts.macsydata.do_init_package(args: Namespace) None[source]
Create a template for a data package:
skeleton for metadata.yml
definitions directory with a skeleton of models.xml
profiles directory
skeleton for README.md file
COPYRIGHT file (if holders option is set)
LICENSE file (if model_license option is set)
- Parameters:
args – The parsed commandline subcommand arguments
- Returns:
None
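The layout listed above can be sketched with `pathlib`. This is a hypothetical helper, not the library code: it creates empty placeholders only, and it leaves out the model definition skeleton, the COPYRIGHT file and the LICENSE file, whose content depends on options not detailed here.

```python
import tempfile
from pathlib import Path


def init_package_sketch(root: Path, pack_name: str) -> Path:
    """Create the basic directory layout with empty placeholder files."""
    pack_dir = root / pack_name
    (pack_dir / "definitions").mkdir(parents=True)
    (pack_dir / "profiles").mkdir()
    for file_name in ("metadata.yml", "README.md"):
        (pack_dir / file_name).touch()
    return pack_dir


with tempfile.TemporaryDirectory() as tmp:
    pack = init_package_sketch(Path(tmp), "my_models")  # 'my_models' is a made-up name
    print(sorted(p.name for p in pack.iterdir()))
    # ['README.md', 'definitions', 'metadata.yml', 'profiles']
```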
- macsylib.scripts.macsydata.do_install(args: Namespace) None[source]
Install new models in macsylib local models repository.
- Parameters:
args – the arguments passed on the command line
- Raises:
RuntimeError – if there is a problem with the installed package
ValueError – if the package and/or version is not found
- macsylib.scripts.macsydata.do_list(args: Namespace) None[source]
List installed models.
- Parameters:
args – the arguments passed on the command line
- macsylib.scripts.macsydata.do_search(args: Namespace) None[source]
Search macsy-models for models in a remote index. By default, search in package names; if option -S is set, also search in descriptions. By default, the search is case-insensitive, unless option --match-case is set.
- Parameters:
args – the arguments passed on the command line
- macsylib.scripts.macsydata.do_show_definition(args: Namespace) None[source]
Display on stdout the definition of one or more models. If only a package or sub-package is specified, display all model definitions in the corresponding package or sub-package.
For instance:
TXSS+/bacterial T6SSii T6SSiii
displays the models TXSS+/bacterial/T6SSii and TXSS+/bacterial/T6SSiii
TXSS+/bacterial all or TXSS+/bacterial
displays all models contained in the TXSS+/bacterial sub-package
- Parameters:
args – the arguments passed on the command line
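The selection rule illustrated by the two examples can be sketched as a small function. `expand_models` is a hypothetical helper written for this page; how the real command discovers the available models is not covered here, so they are passed in explicitly.

```python
from __future__ import annotations


def expand_models(package: str, models: list[str], available: list[str]) -> list[str]:
    """Return fully qualified model names for a (sub)package selection.

    An empty selection or the keyword 'all' selects every available model.
    """
    if not models or "all" in models:
        models = available
    return [f"{package}/{name}" for name in models]


print(expand_models("TXSS+/bacterial", ["T6SSii", "T6SSiii"], ["T6SSi", "T6SSii", "T6SSiii"]))
# ['TXSS+/bacterial/T6SSii', 'TXSS+/bacterial/T6SSiii']
```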
- macsylib.scripts.macsydata.do_show_package(args: Namespace) None[source]
Display the structure of an installed model package: the family, sub-families and models, in a tree-like format.
- Parameters:
args – the arguments passed on the command line (the package name)
- Returns:
None
- Raises:
ValueError – if the package is not found.
- macsylib.scripts.macsydata.do_uninstall(args: Namespace) None[source]
Remove models from macsylib local models repository.
- Parameters:
args – the arguments passed on the command line
- Raises:
ValueError – if the package is not found locally
- macsylib.scripts.macsydata.init_logger(level: Literal['NOTSET', 'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'] | int = 'INFO', out: bool = True) Logger[source]
- Parameters:
level – the logger threshold; either a positive int or a string among: ‘CRITICAL’, ‘ERROR’, ‘WARNING’, ‘INFO’, ‘DEBUG’
out – whether the log messages must be displayed
- Returns:
logger
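The two parameters map naturally onto the standard `logging` module, as in the sketch below. This is a hypothetical stand-in, not the library function: the logger name, the message format and the use of `NullHandler` when out is false are all assumptions.

```python
from __future__ import annotations

import logging
import sys


def sketch_init_logger(level: str | int = "INFO", out: bool = True) -> logging.Logger:
    """Create a logger with the given threshold (hypothetical helper)."""
    logger = logging.getLogger("sketch.msl_data")  # assumed logger name
    logger.handlers.clear()  # avoid duplicated handlers on repeated calls
    handler = logging.StreamHandler(sys.stderr) if out else logging.NullHandler()
    handler.setFormatter(logging.Formatter("%(levelname)-8s : %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)  # setLevel accepts both level names and ints
    return logger


log = sketch_init_logger("WARNING")
log.info("hidden: below the threshold")
log.warning("shown on stderr")
```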
- macsylib.scripts.macsydata.main(args: list[str] = None, header: str = "\n\n * * *\n* * * * * * **\n ** * * * * * *\n * _ * _ * _ *\n * _ __ ___ ___| | __| | __ _| |_ __ _\n | '_ ` _ \\/ __| | / _` |/ _` | __/ _` |\n | | | | | \\__ \\ | | (_| | (_| | || (_| |\n |_| |_| |_|___/_|_____\\__,_|\\__,_|\\__\\__,_|\n * |_____| *\n * * * * * ** * * *\n * * * * *\n* * * *\n\n\nmsl_data - Model Management Tool\n", version="msl_data 1.0.1 \nPython 3.13.5 (main, Jul 17 2025, 00:32:28) [GCC 14.3.0]\n\nMacSyLib is distributed under the terms of the GNU General Public License (GPLv3).\nSee the COPYING file for details.\n\nIf you use this software please cite:\nNéron, Bertrand; Denise, Rémi; Coluzzi, Charles; Touchon, Marie; Rocha, Eduardo P.C.; Abby, Sophie S.\nMacSyFinder v2: Improved modelling and search engine to identify molecular systems in genomes.\nPeer Community Journal, Volume 3 (2023), article no. e28. doi : 10.24072/pcjournal.250.\nhttps://peercommunityjournal.org/articles/10.24072/pcjournal.250/\nand don't forget to cite models used:\nmacsydata cite <model>\n", package_name: str = 'macsylib', tool_name='msl_data') None[source]
Main entry point.
- Parameters:
args – the arguments passed on the command line (before parsing)
header – the header of the console script
package_name – the name of the higher-level package that embeds macsylib (e.g. ‘macsyfinder’)
tool_name – the name of this tool as it appears in pyproject.toml
macsyprofile
- class macsylib.scripts.macsyprofile.HmmProfile(gene_name: str, gene_profile_lg: int, hmmer_output: str, cfg: Config)[source]
Handle the HMM output files
- __init__(gene_name: str, gene_profile_lg: int, hmmer_output: str, cfg: Config)[source]
- Parameters:
gene_name – the name of the gene corresponding to the profile search reported here
hmmer_output – The path to the raw Hmmer output file
cfg – the configuration object
- __weakref__
list of weak references to the object
- _build_my_db(hmm_output: str) dict[str: None][source]
Build the keys of a dictionary object to store sequence identifiers of hits.
- Parameters:
hmm_output – the path to the hmmsearch output to parse.
- Returns:
a dictionary containing a key for each sequence id of the hits
- _fill_my_db(db: dict[str: tuple[int, int]]) None[source]
Fill the dictionary with information on the matched sequences
- Parameters:
db – the database containing all sequence id of the hits.
- _hit_start(line: str) bool[source]
- Parameters:
line – the line to parse
- Returns:
True if it’s the beginning of a new hit in Hmmer raw output files. False otherwise
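In the text output of hmmsearch, each per-sequence block opens with a line starting with `>>` followed by the sequence identifier. A minimal sketch of such a test, assuming that convention is what the method relies on, could look like this; `is_hit_start` is a hypothetical stand-in and the sample lines are made up.

```python
def is_hit_start(line: str) -> bool:
    """True if the line opens a per-sequence block in hmmsearch text output."""
    return line.startswith(">>")


# Abridged, made-up lines shaped like hmmsearch output.
lines = [
    ">> seq_001  hypothetical protein",
    "   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to",
    ">> seq_002",
]
print([line.split()[1] for line in lines if is_hit_start(line)])  # ['seq_001', 'seq_002']
```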
- _parse_hmm_body(hit_id: str, gene_profile_lg: int, seq_lg: int, coverage_threshold: float, replicon_name: str, position_hit: int, i_evalue_sel: float, b_grp: list[list[str]]) list[CoreHit][source]
Parse the raw Hmmer output to extract the hits, and filter them with threshold criteria selected (“coverage_profile” and “i_evalue_select” command-line parameters)
- Parameters:
hit_id – the sequence identifier
gene_profile_lg – the length of the profile matched
seq_lg – the length of the sequence
coverage_threshold – the minimal coverage of the profile to be reached in the Hmmer alignment for hit selection.
replicon_name – the identifier of the replicon
position_hit – the rank of the sequence matched in the input dataset file
i_evalue_sel – the maximal i-evalue (independent evalue) for hit selection
b_grp – the Hmmer output lines to deal with (grouped by hit)
- Returns:
a sequence of hits
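The two selection criteria can be sketched as follows. The `Domain` container and `select_domains` function are hypothetical, and the coverage formula (aligned profile positions divided by profile length) is an assumption about how the threshold is applied, not a statement about the library code.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """One domain line of a hit (hypothetical container for this sketch)."""
    i_evalue: float
    hmm_from: int
    hmm_to: int


def select_domains(domains: list[Domain], gene_profile_lg: int,
                   coverage_threshold: float, i_evalue_sel: float) -> list[Domain]:
    """Keep domains passing both the i-evalue and the profile coverage thresholds."""
    selected = []
    for dom in domains:
        # Fraction of the profile covered by the alignment (assumed definition).
        coverage = (dom.hmm_to - dom.hmm_from + 1) / gene_profile_lg
        if dom.i_evalue <= i_evalue_sel and coverage >= coverage_threshold:
            selected.append(dom)
    return selected


domains = [Domain(1e-20, 1, 90), Domain(1e-20, 1, 30), Domain(5.0, 1, 95)]
kept = select_domains(domains, gene_profile_lg=100, coverage_threshold=0.5, i_evalue_sel=0.001)
print(len(kept))  # 1
```

Only the first domain is kept: the second covers 30% of the profile, and the third has an i-evalue above the cutoff.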
- class macsylib.scripts.macsyprofile.LightHit(gene_name: str, id: str, seq_length: int, replicon_name: str, position: int, i_eval: float, score: float, profile_coverage: float, sequence_coverage: float, begin_match: int, end_match: int)[source]
Handle hmm hits
- __eq__(other)
Return self==value.
- __init__(gene_name: str, id: str, seq_length: int, replicon_name: str, position: int, i_eval: float, score: float, profile_coverage: float, sequence_coverage: float, begin_match: int, end_match: int) None
- __repr__()
Return repr(self).
- __weakref__
list of weak references to the object
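The generated `__init__`, `__eq__` and `__repr__` members suggest a dataclass-like container, although this page does not say so explicitly. Under that assumption, a stand-in with the same field names behaves as sketched below; `LightHitSketch` is hypothetical and all values are made up.

```python
from dataclasses import dataclass


@dataclass
class LightHitSketch:
    """Stand-in for LightHit with the fields listed in its signature."""
    gene_name: str
    id: str
    seq_length: int
    replicon_name: str
    position: int
    i_eval: float
    score: float
    profile_coverage: float
    sequence_coverage: float
    begin_match: int
    end_match: int


# Made-up values for illustration only.
a = LightHitSketch("geneA", "seq_001", 250, "replicon_1", 12, 1e-30, 180.5, 0.95, 0.80, 10, 210)
b = LightHitSketch("geneA", "seq_001", 250, "replicon_1", 12, 1e-30, 180.5, 0.95, 0.80, 10, 210)
print(a == b)  # True: the generated __eq__ compares field by field
```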
- macsylib.scripts.macsyprofile.get_gene_name(path: str, suffix: str) str[source]
- Parameters:
path – The path to the hmm output to analyse
suffix – the suffix of the hmm output file
- Returns:
the name of the analysed gene
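A plausible reading of these two parameters is that the gene name is the file name with the suffix removed. The sketch below implements that reading; `gene_name_from_path` is a hypothetical helper, and the path and suffix are sample values.

```python
import os


def gene_name_from_path(path: str, suffix: str) -> str:
    """Return the file name without its directory and trailing suffix."""
    file_name = os.path.basename(path)
    if suffix and file_name.endswith(suffix):
        file_name = file_name[: -len(suffix)]
    return file_name


print(gene_name_from_path("hmmer_results/T2SS_gspD.search_hmm.out", ".search_hmm.out"))
# T2SS_gspD
```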
- macsylib.scripts.macsyprofile.get_profile_len(path: str) int[source]
Parse the HMM profile to extract the length and the presence of GA bit threshold
- Parameters:
path – The path to the hmm profile used to produce the hmm search output to analyse
- Returns:
the length and the presence of a GA bit threshold
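In the HMMER3 profile format, the header carries a `LENG` line with the model length and, optionally, a `GA` line with the gathering thresholds. A minimal sketch of reading both is shown below; `profile_length_and_ga` is a hypothetical helper working on lines already read from the file, and the header values are made up. The signature above advertises `int` while the description mentions two values; this sketch returns both.

```python
from __future__ import annotations


def profile_length_and_ga(lines: list[str]) -> tuple[int | None, bool]:
    """Scan HMMER3 profile header lines for the LENG and GA fields."""
    length = None
    has_ga = False
    for line in lines:
        if line.startswith("LENG"):
            length = int(line.split()[1])
        elif line.startswith("GA"):
            has_ga = True
        elif line.startswith("HMM "):
            break  # the header ends where the model section begins
    return length, has_ga


# Abridged header shaped like a HMMER3 profile; values are made up.
header = [
    "HMMER3/f [3.1b2 | February 2015]",
    "NAME  geneA",
    "LENG  86",
    "GA    25.00 25.00;",
    "HMM          A        C",
]
print(profile_length_and_ga(header))  # (86, True)
```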
- macsylib.scripts.macsyprofile.get_version_message(tool_name: str = 'msl_profile', data_mgr: str = 'msl_data') str[source]
- Parameters:
tool_name – The name of the high level tool
- Returns:
the long description of the macsylib version
- macsylib.scripts.macsyprofile.init_logger(level: Literal['NOTSET', 'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'] | int = 'INFO', out: bool = True)[source]
- Parameters:
level – the logger threshold; either a positive int or a string among: ‘CRITICAL’, ‘ERROR’, ‘WARNING’, ‘INFO’, ‘DEBUG’
out – whether the log messages must be displayed
- Returns:
logger
- macsylib.scripts.macsyprofile.main(args: list[str] | None = None, header: str = "\n * * * * *\n * * * * * ** \n ** * * * * * * \n * _ * ** __ _ _ * \n _ __ ___ ___| | _ __ _ __ ___ / _(_) | ___ \n | '_ ` _ \\/ __| | | '_ \\| '__/ _ \\| |_| | |/ _ \\ \n | | | | | \\__ \\ | | |_) | | | (_) | _| | | __/\n |_| |_| |_|___/_|____| .__/|_| \\___/|_| |_|_|\\___|\n * |_____|_| * *\n * * * * ** * * * *\n * * * * * * \n * * * * \n\n\nmsl_profile - MacSyLib profile helper tool\n", package_name: str = 'macsylib', tool_name: str = 'msl_profile', log_level: str | int | None = None) None[source]
Main entry point.
- Parameters:
args – the arguments passed on the command line without the program name
package_name – the name of the higher-level package that embeds macsylib (e.g. ‘macsyfinder’)
tool_name – the name of this tool as it appears in pyproject.toml
log_level – the output verbosity
- macsylib.scripts.macsyprofile.parse_args(header: str, args: list[str], package_name='macsylib', tool_name: str = 'msl_profile') Namespace[source]
Build argument parser.
- Parameters:
header – the header of the console script
args – the arguments provided on the command line
package_name – the name of the higher-level package that embeds macsylib (e.g. ‘macsyfinder’)
tool_name – the name of this tool as it appears in pyproject.toml
- Returns:
The arguments parsed