Metadata-Version: 2.4
Name: rwe
Version: 0.0.20
Summary: Real World Evidence utilities and reporting
Author: Deepro Banerjee
License: MIT License
        
        Copyright (c) 2026 Deepro Banerjee
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Keywords: genomics,phewas,rwe,reporting
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: matplotlib
Requires-Dist: seaborn>=0.13
Requires-Dist: python-docx>=1.1.0
Requires-Dist: tqdm
Requires-Dist: requests
Requires-Dist: scipy
Requires-Dist: pyarrow
Requires-Dist: gcsfs
Requires-Dist: google-cloud-storage
Requires-Dist: playwright
Requires-Dist: phetk
Dynamic: license-file

# Real world evidence of siRNA targets
The pipeline generates a real-world genetic evidence report for an siRNA target by summarizing the phenotypes of individuals who carry predicted loss-of-function (pLoF) mutations in that target across multiple biobanks. The report supports three broad use cases:
- Discovering new target-indication pairs
- Evaluating the safety of a potential target
- Identifying repurposing opportunities for an existing target

# Description of the report
The report currently has the following sections:
- Variant information and demographics
- Clinical records
- Labs and measurements
- Survey information
- Homozygous loss of function carriers
- Plasma proteomics
- Indication specific report

Future updates might have the following additional sections:
- OpenTargets
- Knowledge portal networks: https://hugeamp.org/research.html?pageid=kpn_portals
- Genomics England information: https://www.genomicsengland.co.uk/
- Genes and Health information: https://www.genesandhealth.org/
- Automated PowerPoint report generation
- AI-generated summary of the indication-specific section (via API)
- AI-generated PowerPoint report for efficacy

## Variant information and demographics
### Variant information
Provides the number of pLoF carriers across four variant categories in the All of Us cohort:
- stop gained
- frameshift
- splice acceptor
- splice donor
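
As a rough illustration of the tally described above, the sketch below counts distinct carriers per category. The record layout (`person_id`, `consequence`), the underscored category labels, and the sample rows are hypothetical placeholders, not the package's actual schema or real participant data.

```python
# The four pLoF categories listed above; the underscored spelling is an assumption.
PLOF_CATEGORIES = ("stop_gained", "frameshift", "splice_acceptor", "splice_donor")


def count_carriers(records):
    """Count distinct carriers per pLoF category.

    `records` is an iterable of (person_id, consequence) pairs. This layout
    is assumed for the example only.
    """
    carriers = {category: set() for category in PLOF_CATEGORIES}
    for person_id, consequence in records:
        if consequence in carriers:  # skip consequences outside the four categories
            carriers[consequence].add(person_id)
    return {category: len(ids) for category, ids in carriers.items()}


# Made-up rows for demonstration only.
example = [
    ("p1", "stop_gained"),
    ("p2", "frameshift"),
    ("p2", "frameshift"),   # duplicate row, counted once
    ("p3", "splice_donor"),
    ("p4", "missense"),     # not a pLoF category, skipped
]
print(count_carriers(example))
```

Using a set per category keeps a participant with several rows in the same category from being counted more than once.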

### Demographics
Includes age, sex, ancestry and ethnicity information of pLoF carriers in comparison with non-carriers.

## Clinical records
Provides phenome-wide association study (PheWAS) results for pLoF carriers in the All of Us and UK Biobank cohorts. The All of Us association results are generated in-house. The UK Biobank results are collected from the GeneBass and AstraZeneca PheWAS open-source portals.
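
This README does not state which statistical test the in-house analysis uses. Purely as a sketch of what a single carrier-versus-non-carrier association looks like, the example below computes an odds ratio and a two-sided Fisher exact p-value from a 2x2 table. The function names and the counts are invented for illustration and are not part of the `rwe` API.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]].

    Rows: carriers / non-carriers. Columns: cases / controls.
    """
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x cases among carriers.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Invented counts: 3 of 4 carriers and 1 of 4 non-carriers have the phenotype.
print(odds_ratio(3, 1, 1, 3), fisher_exact_two_sided(3, 1, 1, 3))
```

A real PheWAS repeats a test like this across many phenotypes, usually with covariate adjustment and multiple-testing correction, which this sketch leaves out.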

## Labs and measurements
Provides lab results for pLoF carriers in the All of Us and UK Biobank cohorts in comparison with non-carriers.
Detailed measurement definitions and concept IDs are maintained in `docs/labs_and_measurements.md` (included in the source distribution).
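
One simple way to compare a lab measurement between carriers and non-carriers is sketched below, using group medians and Welch's t statistic. This is an assumed illustration, not the comparison the pipeline necessarily performs; the function name and the values are made up.

```python
from math import sqrt
from statistics import mean, median, variance


def compare_lab(carriers, non_carriers):
    """Summarize one lab measurement for carriers versus non-carriers.

    Returns group medians and Welch's t statistic (unequal variances).
    Each group needs at least two values for the sample variance.
    """
    standard_error = sqrt(
        variance(carriers) / len(carriers)
        + variance(non_carriers) / len(non_carriers)
    )
    return {
        "median_carriers": median(carriers),
        "median_non_carriers": median(non_carriers),
        "welch_t": (mean(carriers) - mean(non_carriers)) / standard_error,
    }


# Made-up measurement values for demonstration only.
print(compare_lab([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

A negative `welch_t` indicates a lower mean in carriers; turning it into a p-value needs a t distribution, for example from `scipy.stats`.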

## Survey information
Includes self-reported survey information about general, mental, physical and overall health of pLoF carriers in comparison with non-carriers in the All of Us cohort.

## Homozygous loss of function carriers
Provides demographics and survey information for biallelic LoF variant carriers in All of Us.

## Plasma proteomics
Provides association statistics between gene pLoF status and plasma protein levels.

## Indication specific report
Provides association results for user-specified indications from the All of Us and UK Biobank cohorts. Currently available indications are:
- obesity
- type_2_diabetes
- diabetic_kidney_disease
- dyslipidemia
- cold_agglutinin_disease
- long_qt_syndrome
- hypertrophic_cardiomyopathy
- metabolic_syndrome
- angelman_syndrome
- hemophilia
- essential_thrombocythemia
- aortic_valve_stenosis
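
Because the indication keys above are fixed strings, it can help to validate user input before requesting a report. The helper below is a hypothetical sketch (`normalize_indication` is not part of the `rwe` package); it assumes that lowercase, underscore-separated keys are what the pipeline expects, as the list suggests.

```python
import re

# Keys copied from the list of currently available indications.
AVAILABLE_INDICATIONS = frozenset({
    "obesity",
    "type_2_diabetes",
    "diabetic_kidney_disease",
    "dyslipidemia",
    "cold_agglutinin_disease",
    "long_qt_syndrome",
    "hypertrophic_cardiomyopathy",
    "metabolic_syndrome",
    "angelman_syndrome",
    "hemophilia",
    "essential_thrombocythemia",
    "aortic_valve_stenosis",
})


def normalize_indication(name):
    """Convert free-form input to an indication key, or raise ValueError."""
    # Lowercase, then collapse runs of spaces and hyphens into one underscore.
    key = re.sub(r"[\s\-]+", "_", name.strip().lower())
    if key not in AVAILABLE_INDICATIONS:
        raise ValueError(
            f"Unknown indication {name!r}; choose one of {sorted(AVAILABLE_INDICATIONS)}"
        )
    return key


print(normalize_indication("Type 2 Diabetes"))
```

Failing early with the list of valid keys avoids running a long analysis on a misspelled indication.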

# Resources used to generate the report

## Controlled Datasets

### All of Us
The All of Us cohort currently consists of 420k participants with whole genome sequencing and phenotypic data. 

## Open Source Databases
The following open-source databases are used to gather evidence about the targets:

### GeneBass
GeneBass reports phenome-wide associations for LoF carriers among 380k participants from the UK Biobank cohort.

### AstraZeneca PheWAS portal
AstraZeneca reports phenome-wide associations for LoF carriers among 500k participants from the UK Biobank cohort.

# Updates and Installation
Separately in TODO

## Internal Use for installation
```bash
# Install or upgrade the build and upload tooling
python -m pip install -U pip build twine

# Package and upload a new version
rm -rf dist build *.egg-info src/*.egg-info   # clear stale build artifacts
conda activate rwe
python -m build
pip install dist/rwe-0.0.20-py3-none-any.whl
# Smoke test: confirm the installed wheel imports cleanly before uploading
python -c "from rwe.generate_report import generate_rwe_report; import rwe.clients.aou as aou; import rwe.clients.azn as azn; import rwe.clients.genebass as gbs; print('import ok')"
twine upload dist/*

# Environment test before packaging
conda install -c conda-forge python=3.12
pip install -r requirements.txt
playwright install                   # download browser binaries
python -m playwright install-deps    # install OS-level browser dependencies
```
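
The import one-liner above can also be written as a small script that reports which module failed, which is easier to read in CI logs. This is a sketch: the module names are taken from the command above, while `check_imports` is a hypothetical helper and not part of the package.

```python
import importlib

# Module paths taken from the smoke-test command above.
MODULES = (
    "rwe.generate_report",
    "rwe.clients.aou",
    "rwe.clients.azn",
    "rwe.clients.genebass",
)


def check_imports(module_names):
    """Try to import each module; map its name to None on success or an error message."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, error in check_imports(MODULES).items():
        print(f"{name}: {'ok' if error is None else 'FAILED (' + error + ')'}")
```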

# Resources
1. ICD to Phecode mappings: https://www.vumc.org/wei-lab/sites/default/files/public_files/ICD_to_Phecode_mapping.csv
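
A mapping file like this one is typically loaded into a lookup from ICD code to one or more phecodes. The sketch below assumes column names `ICD` and `Phecode`, which should be checked against the actual CSV header. The sample rows are placeholders, not real codes.

```python
import csv
import io
from collections import defaultdict


def build_icd_to_phecode(csv_file, icd_col="ICD", phecode_col="Phecode"):
    """Build {icd_code: [phecode, ...]} from an open CSV file object.

    The column names are assumptions for this example. One ICD code may map
    to several phecodes, so each value is a list without duplicates.
    """
    lookup = defaultdict(list)
    for row in csv.DictReader(csv_file):
        phecodes = lookup[row[icd_col]]
        if row[phecode_col] not in phecodes:
            phecodes.append(row[phecode_col])
    return dict(lookup)


# Placeholder content standing in for the downloaded mapping file.
SAMPLE = "ICD,Phecode\nICD_A,PHE_1\nICD_A,PHE_2\nICD_B,PHE_1\nICD_A,PHE_1\n"
lookup = build_icd_to_phecode(io.StringIO(SAMPLE))
print(lookup)
```

For the real file, pass an open handle instead, for example `open(path, newline="")`.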
