Metadata-Version: 2.1
Name: OAS_wrapper
Version: 1.0
Summary: Python package to parse Observed antibody sequence data for easy reporting, visualization, annotation and alignment.
Home-page: https://github.com/sridhara-omics/OAS_wrapper
Author: Viswanadham Sridhara
Author-email: Sridhara.Utils@gmail.com
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy <2
Requires-Dist: scipy
Requires-Dist: pandas
Requires-Dist: matplotlib
Requires-Dist: Bio

# OAS_wrapper
A wrapper to parse Observed Antibody Space (OAS) data for better visualization, annotation and comparison of sequence to germline.

## Functionalities

The scripts include functionalities to:
1. Provide basic metrics and visualizations of sequences and their annotations
2. Provide original IMGT sequences for V, D and J calls made in OAS data using IMGT reference database
3. Align sequence and germline to highlight regions of mismatches, providing positional information e.g.,  
```
Example Output:  
Sequence: ATGCCGT  
Germline: ATGTCGT  
Aligned and Highlighted Differences:  
ATGC[RED]C[/RED]GT  
ATG[RED]T[/RED]CGT  
Indices of Differences: [3]  
Mapped Regions: [(3, 'CDR2')]  
--------------------------------------------------
```
5. Group data by germline, to infer sequences that originate from germline, including providing information on V, D and J annotations
```
Example Output:
germline,number_of_sequences,sequence,v_call,d_call,j_call,quality,source
IGHV1-69,3,ATGC,IGHV1,IGHD2,IGHJ4,High,Lab1
IGHV1-69,3,GCTA,IGHV1,IGHD2,IGHJ4,Medium,Lab2
IGHV1-69,3,ATGC,IGHV1,IGHD2,IGHJ4,High,Lab1
IGHV3-23,2,TAGC,IGHV3,IGHD3,IGHJ5,Low,Lab3
IGHV3-23,2,GCAT,IGHV3,IGHD3,IGHJ5,Medium,Lab2
```
   
7. Annotate sequence with CDRs and FWRs for easy inference of regions of interest
```
Example Output:  
Original Sequence: ATGCCGTAACTG  
Annotated Sequence: A(Unannotated)T(CDR1)GC(FWR1)GT(CDR2)AA(FWR2)CT(CDR3)G(Unannotated)  
--------------------------------------------------  
Row 2:  
Original Sequence: GATTACAGTGCTA  
Annotated Sequence: GAT(CDR1)TAC(FWR1)AG(CDR2)TG(FWR2)CTA(CDR3)  
--------------------------------------------------  
```

## Project Organization

```
â”œâ”€â”€ LICENSE            <- Open-source license if one is chosen
â”œâ”€â”€ README.md          <- The top-level README for developers using this project.
â”œâ”€â”€ data 
â”œâ”€â”€ notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
â””â”€â”€ src                <- Source code for use in this project.
```

--------

