Metadata-Version: 2.1
Name: xlms-tools
Version: 1.0.5
Summary: xlms-tools is a set of command line tools to apply crosslinking mass spectrometry (XL-MS) data to protein structure models.
Author-email: Topf Lab <karen.manalastas-cantos@cssb-hamburg.de>
Project-URL: Homepage, https://gitlab.com/topf-lab/xlms-tools
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: biopython>=1.78
Requires-Dist: numpy>=1.19.2
Requires-Dist: scipy>=1.5.2

# xlms-tools

## Description
xlms-tools is a set of command line tools to apply crosslinking mass spectrometry (XL-MS) data to protein structure models.


## Setting up xlms-tools
xlms-tools can be installed directly from PyPI, as follows:
```bash
$ pip install xlms-tools
```
Alternatively, download or clone this repository and run the following from the project directory:
```bash
$ pip install dist/xlms_tools-1.0.5-py3-none-any.whl
```

## Using xlms-tools
Currently, xlms-tools can be run in two modes. The first scores how well a protein structure model agrees with XL-MS data, which is specified as a list of crosslinks and monolinks derived from an XL-MS experiment. The second computes the depths of individual residues in protein structures.

### To score how well a protein structure agrees with XL-MS data
1. First, write the crosslinks and monolinks to a text file in the following format:
```bash
98|A|147|A 5.1		#
72|A|161|A 4.3		# crosslinks: <residue # of a>|<chain of a>|<res#, b>|<chain, b> <occupancy>
72|A|180|A 2.7		#
35|A 1.9	#		
137|A 5.3	# monolinks: <residue # of a>|<chain of a> <occupancy>
97|A 2.6	#
```
where each line corresponds to either a crosslink or a monolink. Optionally, a numerical value can be appended at the end of each line, corresponding to the occupancy of each individual monolink or crosslink. For a detailed discussion on occupancy, please refer to the paper.
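
The format above can be read with a few lines of Python. The following is a minimal sketch, not the parser used by xlms-tools: the `Link` type and `parse_link_line` function are hypothetical names introduced for illustration, and the sketch assumes that everything after a `#` is a comment, as in the sample above.

```python
from typing import NamedTuple, Optional, Tuple


class Link(NamedTuple):
    """One parsed line of the link list (hypothetical helper type)."""
    # One (residue number, chain) pair for a monolink, two for a crosslink.
    sites: Tuple[Tuple[int, str], ...]
    # None when the line carries no occupancy value.
    occupancy: Optional[float]


def parse_link_line(line: str) -> Optional[Link]:
    """Parse one line; return None for blank or comment-only lines."""
    text = line.split("#", 1)[0].strip()  # assumption: '#' starts a comment
    if not text:
        return None
    fields = text.split()
    parts = fields[0].split("|")
    if len(parts) not in (2, 4):
        raise ValueError(f"expected 2 or 4 '|'-separated fields, got {fields[0]!r}")
    # Pair up residue numbers and chain identifiers.
    sites = tuple((int(parts[i]), parts[i + 1]) for i in range(0, len(parts), 2))
    occupancy = float(fields[1]) if len(fields) > 1 else None
    return Link(sites, occupancy)


example = """\
98|A|147|A 5.1
35|A 1.9
72|A|180|A
"""
links = [link for link in map(parse_link_line, example.splitlines()) if link]
for link in links:
    kind = "crosslink" if len(link.sites) == 2 else "monolink"
    print(kind, link.sites, link.occupancy)
```

A line with two `|`-separated fields is treated as a monolink and a line with four as a crosslink; anything else raises an error rather than being silently skipped.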

2. Score protein structure model/s:
To compute the crosslink (XLP) and monolink probability (MP) scores of one or more protein structures, execute the following in the command line:

```bash
$ xlms-tools -m score -l [list of crosslinks and/or monolinks] [PDB file/s] --name [name of run]
```

Example files can be found in the tests/ directory. You can navigate to the tests/ directory and run the scoring, as follows:
```bash
$ xlms-tools -m score -l xlms_data_qtv.txt model_1.pdb --name withoccupancy_m1 #score model_1.pdb
$ xlms-tools -m score -l xlms_data_qtv.txt model_*pdb --name withoccupancy_all #score all models
```

3. The outputs include (a) a tab-separated (.tsv) file containing the scores, which can be viewed in a spreadsheet editor, and (b) a ChimeraX command (.cxc) file, which can be executed by double-clicking it (given a working installation of ChimeraX). In the ChimeraX visualization, crosslinks and monolinks are color-coded. Blue means that the spanning distance of the crosslink, or the depth of the monolinked residue, is well within the cutoff. Yellow means that a crosslink is within the cutoff but approaching it. Red indicates a maximum distance violation (for crosslinks) or a maximum depth violation (for monolinks). The thresholds are listed in the table below. The distance cutoffs are currently only defined for BS3/DSS, but will be expanded in future releases.

	<img src="./imgs/chimerax.png">

| link type | blue | yellow | red |
| --- | --- | --- | --- |
| any monolink | depth ≤ 6.25Å | - | depth > 6.25Å |
| BS3 or DSS crosslink | CA-CA distance ≤ 21Å | 21Å < CA-CA distance ≤ 33Å | CA-CA distance >  33Å |
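
The color rules in the table can be written out as two small functions. This is an illustrative sketch only: the function and constant names are hypothetical and are not part of the xlms-tools API; only the threshold values are taken from the table.

```python
# Thresholds copied from the table above (all values in Å).
MONOLINK_MAX_DEPTH = 6.25
CROSSLINK_BLUE_MAX = 21.0
CROSSLINK_YELLOW_MAX = 33.0


def monolink_color(depth: float) -> str:
    """Color for a monolink, given the depth of the monolinked residue."""
    return "blue" if depth <= MONOLINK_MAX_DEPTH else "red"


def crosslink_color(ca_ca_distance: float) -> str:
    """Color for a BS3/DSS crosslink, given its CA-CA distance."""
    if ca_ca_distance <= CROSSLINK_BLUE_MAX:
        return "blue"
    if ca_ca_distance <= CROSSLINK_YELLOW_MAX:
        return "yellow"
    return "red"


print(monolink_color(4.3))     # blue
print(crosslink_color(27.5))   # yellow
print(crosslink_color(35.0))   # red
```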

### To compute residue depths in a protein structure
1. Run the following command:
```bash
$ xlms-tools -m depth [PDB file/s]
```
Example files can be found in the tests/ directory. You can navigate to the tests/ directory and run the depth computations, as follows:
```bash
$ xlms-tools -m depth model_1.pdb #compute residue depths for model_1.pdb
$ xlms-tools -m depth model_*pdb #compute residue depths for all models
```

2. Each line of the output file (.depth file) corresponds to one residue:
```bash
# format: <chain>:<residue number>	<amino acid>	<residue depth in Å>
A:26	LYS	4.317777777777779	
A:27	LEU	4.608300983124843
A:28	VAL	5.739574753218949
A:29	VAL	8.684474490011493
A:30	ALA	8.926983412282244
A:31	THR	8.0268463237516
A:32	ASP	6.0348264453008715
A:33	THR	4.487608873498731
A:34	ALA	4.371281572999748
A:35	PHE	5.2527378512712275
A:36	VAL	5.226420625104608
A:37	PRO	6.844806528409208
...
```
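
For downstream analysis, a .depth file can be loaded into a dictionary. The sketch below is one possible reader, assuming the whitespace-separated three-column layout shown above; `parse_depth_text` is a hypothetical helper, not a function provided by xlms-tools. As an example, it lists the residues that lie deeper than the 6.25 Å monolink cutoff.

```python
from typing import Dict, Tuple


def parse_depth_text(text: str) -> Dict[str, Tuple[str, float]]:
    """Map 'chain:residue number' to (amino acid, depth in Å)."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and anything without three columns.
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        residue_id, amino_acid, depth = fields[:3]
        table[residue_id] = (amino_acid, float(depth))
    return table


# Two lines taken from the sample output above.
sample = "A:26\tLYS\t4.317777777777779\nA:29\tVAL\t8.684474490011493\n"
depths = parse_depth_text(sample)

# Residues deeper than the monolink depth cutoff of 6.25 Å.
buried = [rid for rid, (_, depth) in depths.items() if depth > 6.25]
print(buried)  # ['A:29']
```

To read a real file, pass its contents to the function, for example `parse_depth_text(open(path).read())`, where `path` points to a .depth file.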

### API
**xlms_tools.depth.computedepth**(*biopdbstruct*)
+ Computes the depth of each residue in a Biopython PDB structure

**Parameters:**
+ *biopdbstruct*: an instance of the [Bio.PDB structure class](https://biopython.org/docs/1.75/api/Bio.PDB.Structure.html)

**Returns:**
+ *depths*: dictionary mapping each residue, keyed as chainid:residue\#, to its residue depth
+ *residues*: dictionary mapping each residue, keyed as chainid:residue\#, to its residue name

**Example usage:**
```python
from Bio.PDB import PDBParser
from xlms_tools.depth import computedepth

parser = PDBParser()
biopdbstruct = parser.get_structure('test', 'tests/bigmodel.pdb')
depths, residues = computedepth(biopdbstruct)
```


**xlms_tools.score.linkmetrics**(*biopdbstruct*, *link*, *depths*, *linkweight*=1.0, *linker*='BS3/DSS')
+ Computes the score and CA-CA distance/residue depth in a Biopython PDB structure for a given crosslink or monolink.

**Parameters:**
+ *biopdbstruct*: an instance of the [Bio.PDB structure class](https://biopython.org/docs/1.75/api/Bio.PDB.Structure.html)
+ *link*: a tuple defining either a monolink (residue\#, chainid), or a crosslink (residue\#_1, chainid_1, residue\#_2, chainid_2)
+ *depths*: dictionary of residue depths (from **xlms_tools.depth.computedepth**(*biopdbstruct*))
+ *linkweight*: for quantitative XL-MS data, a measure of relative abundance or occupancy. If unspecified, default weight is 1.0
+ *linker*: crosslinking reagent used (if unspecified, default is BS3/DSS)

**Returns:**
+ *score*: monolink probability (MP) score for a monolink, crosslink probability (XLP) score for a crosslink
+ *monolinkdepth* (monolinks only): depth of the monolinked residue
+ *crosslinkdistance* (crosslinks only): CA-CA distance of the crosslink

**Example usage:**
```python
from Bio.PDB import PDBParser
from xlms_tools.depth import computedepth
from xlms_tools.score import linkmetrics

parser = PDBParser()
biopdbstruct = parser.get_structure('test', 'tests/bigmodel.pdb')
depths, residues = computedepth(biopdbstruct)

# compute monolink score
mpscore, mpdepth = linkmetrics(biopdbstruct, (1, 'A'), depths)

# compute crosslink score
xlpscore, distance = linkmetrics(biopdbstruct, (1049, 'B', 1409, 'B'), depths)

```

 
## Citations
When using xlms-tools, please cite:
Manalastas-Cantos, K., Adoni, K. R., Pfeifer, M., Märtens, B., Grünewald, K., Thalassinos, K., & Topf, M. (2024). Modeling flexible protein structure with AlphaFold2 and cross-linking mass spectrometry. Molecular & Cellular Proteomics. https://doi.org/10.1016/j.mcpro.2024.100724
