Metadata-Version: 2.2
Name: PanR2
Version: 0.1.1
Summary: A Python tool for panresistome analysis
Home-page: https://github.com/Tasnimul-Arabi-Anik/PanR2
Author: Tasnimul Arabi Anik
Author-email: arabianik987@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.3
Requires-Dist: matplotlib>=3.5
Requires-Dist: seaborn>=0.11
Requires-Dist: numpy>=1.21
Requires-Dist: scipy>=1.7
Requires-Dist: plotly>=5.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# PanR2: Panresistome Analysis Tool

## Overview
PanR2 is a Python tool for analyzing panresistome data. It merges NCBI metadata with Abricate summary files and generates heatmaps, bar plots, boxplots, and interactive HTML plots. It is designed to help researchers analyze and visualize the presence, prevalence, and distribution of antibiotic resistance genes across geographic locations and collection dates.

**Prerequisites:**
- `ncbi_clean.csv` from [FetchM](https://github.com/Tasnimul-Arabi-Anik/FetchM)
- Summary files in `.tab` (preferred) or `.csv` format from [Abricate](https://github.com/tseemann/abricate)

### Key Features:
- Merges and analyzes NCBI and Abricate outputs
- Calculates gene presence/absence across multiple categories (continent, location, subcontinent, collection date)
- Performs comprehensive statistical tests and correlation analyses
- Generates multiple visualization types: heatmaps, bar plots, boxplots, lollipop plots, and correlation plots
- Creates interactive HTML visualizations for enhanced data exploration
- Generates an interactive HTML index for easy navigation of all results
- Provides detailed statistical analysis outputs
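
The merge-and-count step listed above can be sketched with pandas. This is a minimal illustration under stated assumptions, not PanR2's actual code: the column names `Assembly Accession` and `Continent`, the sample values, the `.fna` file suffix, and the helper `presence_by_group` are all hypothetical. The Abricate layout (a `#FILE` column, a `NUM_FOUND` column, then one column per gene with `.` marking absence) follows the output of `abricate --summary`.

```python
import pandas as pd

# Hypothetical metadata standing in for ncbi_clean.csv (column names are assumptions).
ncbi = pd.DataFrame({
    "Assembly Accession": ["GCF_1", "GCF_2", "GCF_3", "GCF_4"],
    "Continent": ["Asia", "Asia", "Europe", "Europe"],
})

# Hypothetical Abricate summary: "." marks an absent gene, a number is % coverage.
abricate = pd.DataFrame({
    "#FILE": ["GCF_1.fna", "GCF_2.fna", "GCF_3.fna", "GCF_4.fna"],
    "NUM_FOUND": [2, 1, 1, 0],
    "blaTEM-1": ["100.00", ".", "99.50", "."],
    "tet(A)": ["100.00", "98.20", ".", "."],
})

def presence_by_group(ncbi, abricate, group_col):
    """Return the percentage of genomes in each group that carry each gene."""
    genes = [c for c in abricate.columns if c not in ("#FILE", "NUM_FOUND")]
    # Turn the coverage strings into a 0/1 presence matrix.
    presence = (abricate[genes] != ".").astype(int)
    # Derive the accession from the file name so the two tables can be joined.
    presence["Assembly Accession"] = abricate["#FILE"].str.replace(
        r"\.fna$", "", regex=True
    )
    merged = ncbi.merge(presence, on="Assembly Accession", how="inner")
    # The mean of a 0/1 column is the fraction of genomes carrying the gene.
    return merged.groupby(group_col)[genes].mean() * 100

table = presence_by_group(ncbi, abricate, "Continent")
print(table)
```

The same function could be called with any other metadata column (for example a subcontinent or collection-date column) to produce the table behind each heatmap.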

---

## Installation

### Method 1: Using `pip` with `conda` (Recommended)
```bash
conda create -n panr_env python=3.9
conda activate panr_env
pip install panR2
```

### Method 2: Direct installation from GitHub
```bash
conda create -n panr_env python=3.8
conda activate panr_env
pip install git+https://github.com/Tasnimul-Arabi-Anik/PanR2.git
```

### Method 3: Manual Installation from Source
```bash
git clone https://github.com/Tasnimul-Arabi-Anik/PanR2.git
cd PanR2
pip install -r requirements.txt
pip install .
```

### Confirm Installation
```bash
panr --help
```

---

## Usage

### Command-Line Interface
```bash
panr --ncbi-dir <NCBI_CLEAN_CSV> --abricate-dir <ABRICATE_DIRECTORY> --output-dir <OUTPUT_DIRECTORY> [OPTIONS]
```

### Required Arguments
| Argument | Description |
|----------|-------------|
| `--ncbi-dir` | Path to the `ncbi_clean.csv` file from FetchM |
| `--abricate-dir` | Directory containing Abricate summary `.tab` or `.csv` files |
| `--output-dir` | Directory to store merged results and visualizations |

### Optional Arguments
| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--genep` | float | - | Minimum gene presence (%) required for a gene to be included in heatmaps |
| `--nseq` | int | - | Minimum number of sequences required per group in heatmaps |
| `--format` | str | `png` | Output format for figures (`tiff`, `svg`, `png`, `pdf`) |
| `--version` | - | - | Show program's version number and exit |
| `-h, --help` | - | - | Show help message and exit |
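
As a rough illustration of what the `--genep` and `--nseq` thresholds might do to a heatmap table, the sketch below filters a per-group percentage table with pandas. The exact semantics inside PanR2 are not documented here, so this is an assumption: groups with fewer than `nseq` sequences are dropped first, then genes that never reach `genep` percent in any remaining group. The data and the helper `filter_heatmap` are hypothetical.

```python
import pandas as pd

# Hypothetical table: rows are groups, columns are genes, values are % presence.
pct = pd.DataFrame(
    {"blaTEM-1": [50.0, 5.0, 80.0], "tet(A)": [2.0, 4.0, 0.0]},
    index=["Asia", "Europe", "Africa"],
)
# Hypothetical number of sequences behind each group.
n_seq = pd.Series({"Asia": 40, "Europe": 12, "Africa": 3})

def filter_heatmap(pct, n_seq, genep=None, nseq=None):
    """Drop small groups, then genes that never reach the presence threshold."""
    out = pct
    if nseq is not None:
        # Keep only groups backed by at least `nseq` sequences.
        out = out.loc[n_seq[out.index] >= nseq]
    if genep is not None:
        # Keep only genes that reach `genep` percent in at least one group.
        out = out.loc[:, (out >= genep).any(axis=0)]
    return out

filtered = filter_heatmap(pct, n_seq, genep=10, nseq=5)
print(filtered)
```

With these assumed inputs, Africa is removed for having only 3 sequences, and `tet(A)` is removed because it stays below 10% in the remaining groups.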

### Example Usage
```bash
# Basic usage
panr --ncbi-dir ./data/ncbi_clean.csv --abricate-dir ./data/abricate --output-dir ./output

# With optional parameters
panr --ncbi-dir ./data/ncbi_clean.csv --abricate-dir ./data/abricate --output-dir ./output --format pdf --genep 10 --nseq 5
```
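
When PanR2 is run as one step of a larger Python workflow, the same command can be assembled programmatically. The sketch below only uses the flags documented above; the helper names `build_panr_command` and `run_panr` are hypothetical and not part of PanR2.

```python
import shutil
import subprocess

def build_panr_command(ncbi_csv, abricate_dir, output_dir,
                       fmt=None, genep=None, nseq=None):
    """Assemble the panr argument list from the documented flags."""
    cmd = [
        "panr",
        "--ncbi-dir", str(ncbi_csv),
        "--abricate-dir", str(abricate_dir),
        "--output-dir", str(output_dir),
    ]
    if fmt is not None:
        cmd += ["--format", fmt]
    if genep is not None:
        cmd += ["--genep", str(genep)]
    if nseq is not None:
        cmd += ["--nseq", str(nseq)]
    return cmd

def run_panr(cmd):
    """Run panr if it is on PATH; return True when it was executed."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True

cmd = build_panr_command(
    "./data/ncbi_clean.csv", "./data/abricate", "./output",
    fmt="pdf", genep=10, nseq=5,
)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces; calling `run_panr(cmd)` then executes the command only when `panr` is installed.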

---

## Output Structure

PanR2 generates a comprehensive set of outputs organized in the following directory structure:

```
output/
├── figures/
│   ├── heatmap/                          # Geographic heatmaps
│   ├── html_files/                       # Interactive HTML plots
│   ├── mean_ARG/                         # Mean antibiotic resistance gene plots
│   ├── Stat_analysis/                    # Statistical analysis results
│   ├── index.html                        # Main navigation page
│   └── [Various static plots]
└── [Processed CSV files]
```
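
To check what a run actually produced, the tree can be walked with `pathlib`. The following is a small sketch, not a PanR2 feature: `collect_outputs` is a hypothetical helper, and the demonstration builds a mock directory containing a few of the file names listed in this README rather than reading real results.

```python
import tempfile
from pathlib import Path

def collect_outputs(output_dir):
    """Map each folder under output_dir (as a relative path) to its file names."""
    root = Path(output_dir)
    found = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            folder = path.parent.relative_to(root).as_posix()
            found.setdefault(folder, []).append(path.name)
    return found

# Demonstration on a mock tree that mimics part of the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in [
        "figures/index.html",
        "figures/heatmap/ncbi_ncbi_Continent_heatmap.png",
        "figures/html_files/Resistance_gene_presence.html",
    ]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    outputs = collect_outputs(root)

for folder, names in outputs.items():
    print(folder, names)
```

Pointing `collect_outputs` at a real `--output-dir` gives a quick inventory of which plots and tables were generated.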

### Output Files Description

#### 1. Static Visualizations
- **`Resistance_gene_presence.{format}`** - Bar plot showing gene presence across samples
- **`Resistance_gene_percentage.{format}`** - Lollipop plot showing gene percentage distribution
- **`Resistance_gene_identity_boxplot.{format}`** - Boxplot of resistance gene identity scores
- **`Resistance_percentage_by_Antibiotics.{format}`** - Bar plot of resistance by antibiotic classes

#### 2. Heatmaps (`heatmap/` directory)
- **`ncbi_ncbi_Continent_heatmap.{format}`** - Resistance gene distribution by continent
- **`ncbi_ncbi_Geographic_Location_heatmap.{format}`** - Distribution by geographic location
- **`ncbi_ncbi_Subcontinent_heatmap.{format}`** - Distribution by subcontinent
- **`ncbi_ncbi_Collection_Date_heatmap.{format}`** - Temporal distribution patterns

#### 3. Mean ARG Analysis (`mean_ARG/` directory)
- **`Mean_ARG_by_Continent.{format}`** - Average antibiotic resistance genes by continent
- **`Mean_ARG_by_Geographic Location.{format}`** - Average ARGs by geographic location
- **`Mean_ARG_by_Subcontinent.{format}`** - Average ARGs by subcontinent  
- **`Mean_ARG_by_Collection Date.{format}`** - Temporal trends in ARG abundance
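
The quantity behind these plots is a per-group average of the number of resistance genes found in each genome. A minimal pandas sketch follows, assuming a merged table with a `Continent` column and Abricate's `NUM_FOUND` count; the column pairing and the values are hypothetical and may differ from what PanR2 uses internally.

```python
import pandas as pd

# Hypothetical merged table: one row per genome.
genomes = pd.DataFrame({
    "Continent": ["Asia", "Asia", "Europe", "Europe", "Europe"],
    "NUM_FOUND": [4, 6, 1, 2, 3],
})

# Mean ARG count and number of genomes per continent.
mean_arg = genomes.groupby("Continent")["NUM_FOUND"].agg(["mean", "count"])
print(mean_arg)
```

Reporting the genome count next to the mean makes it easier to spot groups whose average rests on very few sequences.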

#### 4. Interactive HTML Visualizations (`html_files/` directory)
- **`Resistance_gene_distribution_heatmap.html`** - Interactive heatmap of gene distribution
- **`Resistance_gene_geographic_distribution.html`** - Geographic distribution map
- **`Resistance_gene_frequency_boxplot.html`** - Interactive frequency analysis
- **`Resistance_gene_identity_boxplot.html`** - Interactive identity score analysis
- **`Resistance_gene_presence.html`** - Interactive presence/absence visualization
- **`Resistance_gene_percentage.html`** - Interactive percentage analysis
- **`Resistance_percentage_by_Antibiotics.html`** - Interactive antibiotic class analysis
- **`Mean_Frequency_Antibiotic_Resistance_genes.html`** - Mean frequency analysis
- **`Continent_correlation_plot.html`** - Continental correlation analysis
- **`Geographic_Location_correlation_plot.html`** - Location-based correlations
- **`Subcontinent_correlation_plot.html`** - Subcontinental correlation patterns

#### 5. Statistical Analysis (`Stat_analysis/` directory)
- **`combined_geographic_correlation_summary.csv`** - Geographic correlation statistics
- **`combined_overall_tests.csv`** - Overall statistical test results
- **`combined_pairwise_comparisons.csv`** - Pairwise comparison results
- **`combined_summary_statistics.csv`** - Comprehensive summary statistics
- **`ncbi_gene_presence_count_percentage.csv`** - Gene presence counts and percentages

#### 6. Navigation
- **`index.html`** - Interactive HTML index page for easy navigation of all generated visualizations

---

## Example Visualizations

### Static Plots
![Resistance Gene Presence](figures/Resistance_gene_presence.png)
*Bar plot showing the presence of resistance genes across samples*

![Resistance Gene Percentage](figures/Resistance_gene_percentage.png) 
*Lollipop plot displaying gene percentage distribution*

![Geographic Heatmap](figures/heatmap/ncbi_ncbi_Continent_heatmap.png)
*Heatmap showing resistance gene distribution across continents*

### Interactive Features
The tool generates interactive HTML visualizations that allow for:
- Zooming and panning
- Hover tooltips with detailed information
- Dynamic filtering and selection
- Exportable high-quality images
- Responsive design for different screen sizes

Access all interactive plots through the generated `index.html` file in the `figures/` folder of your output directory.

---

## Statistical Analysis Features

PanR2 provides comprehensive statistical analysis including:
- **Correlation Analysis**: Geographic and temporal correlations
- **Comparative Statistics**: Between-group comparisons
- **Summary Statistics**: Descriptive statistics for all variables
- **Pairwise Comparisons**: Detailed pairwise statistical tests
- **Geographic Patterns**: Spatial distribution analysis
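
This README does not name the specific tests PanR2 runs, so the sketch below is only an illustration of the general pattern of an overall test followed by pairwise comparisons. It assumes a Kruskal-Wallis test across groups and Bonferroni-corrected Mann-Whitney U tests between pairs, using SciPy; the ARG counts are hypothetical.

```python
from itertools import combinations

from scipy import stats

# Hypothetical ARG counts per genome, grouped by continent.
groups = {
    "Asia": [4, 6, 5, 7, 6],
    "Europe": [1, 2, 3, 2, 1],
    "Africa": [5, 4, 6, 5, 7],
}

# Overall test: do the groups differ in their ARG counts?
h_stat, p_overall = stats.kruskal(*groups.values())
print(f"Kruskal-Wallis H={h_stat:.2f}, p={p_overall:.4f}")

# Pairwise tests with a Bonferroni correction for the number of pairs.
pairs = list(combinations(groups, 2))
pairwise = {}
for a, b in pairs:
    _, p = stats.mannwhitneyu(groups[a], groups[b], alternative="two-sided")
    pairwise[(a, b)] = min(p * len(pairs), 1.0)

for (a, b), p_adj in pairwise.items():
    print(f"{a} vs {b}: adjusted p={p_adj:.4f}")
```

Nonparametric tests are used here because gene counts are often skewed; the files in `Stat_analysis/` remain the authoritative record of what PanR2 computed.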

---

## Requirements

- Python 3.7 or later (the `Requires-Python` value declared in the package metadata)
- Required Python packages (automatically installed):
  - pandas
  - numpy
  - matplotlib
  - seaborn
  - plotly
  - scipy
  - Other dependencies listed in `requirements.txt`

---

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request or open an Issue for bugs, feature requests, or improvements.

---

## License

This tool is provided under the MIT License. See `LICENSE` file for details.

---

## Citation

If you use PanR2 in your research, please cite it (DOI: 10.1101/2025.04.08.647722):
```
PanR2: A comprehensive tool for panresistome analysis and visualization
Author: Tasnimul Arabi Anik
GitHub: https://github.com/Tasnimul-Arabi-Anik/PanR2
```

---

## Support

For questions, issues, or feature requests, please:
1. Check the existing [Issues](https://github.com/Tasnimul-Arabi-Anik/PanR2/issues)
2. Create a new issue with detailed information
3. Contact the author: Tasnimul Arabi Anik

---

## Changelog

### Latest Updates
- Added interactive HTML visualizations
- Enhanced statistical analysis capabilities  
- Improved output organization and navigation
- Added support for multiple figure formats
- Enhanced correlation analysis features
