Metadata-Version: 2.4
Name: pam-python
Version: 0.1.42
Summary: Pam Python Library
Home-page: https://github.com/heart/pam-python
Author: Narongrit Kanhanoi
Author-email: narongrit@pams.ai
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Customer Service
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: setuptools>=70.0.0
Requires-Dist: Flask>=3.0.2
Requires-Dist: aiohttp>=3.11.11
Requires-Dist: pandas>=2.2.3
Requires-Dist: Faker>=33.1.0
Requires-Dist: dask>=2024.12.1
Requires-Dist: dask-expr>=1.1.21
Requires-Dist: requests>=2.32.3
Requires-Dist: gunicorn>=23.0.0
Requires-Dist: pyarrow>=19.0.1
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# pam-python-data-plugin-framework

This repository provides the `pam` CLI and runtime framework for building Data Plugin services for PAM Real CDP. It generates a ready-to-run project, standardizes the service lifecycle, and takes care of common tasks such as input handling, temp storage, result uploads, and service monitoring.

This README is a practical, step-by-step guide you can follow to create and run a real service.

**What you get**

- CLI to initialize a project and scaffold services
- Service lifecycle contract (start, data input, upload, exit)
- Temp file and SQLite helpers
- A monitoring loop for service timeouts and periodic cleanup

---

**Table of Contents**

1. Prerequisites
2. Install
3. Initialize a Project
4. Create a Service
5. Understand the Lifecycle
6. Using Temp Files Correctly
7. Uploading Results in Batches
8. Running the Server
9. Testing a Service
10. Configuration
11. Project Structure
12. Troubleshooting
13. Next Steps

---

**Prerequisites**

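- Python 3.8+ (the package declares `Requires-Python: >=3.8`); note that some pinned dependencies, such as `pandas>=2.2.3`, require Python 3.9 or newer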
- `pip` and a working virtual environment

---

**Install**

Create a project folder and a virtual environment.

```bash
mkdir my_data_plugin
cd my_data_plugin
python3 -m venv venv
source venv/bin/activate
```

Install the framework:

```bash
pip install pam-python
```

---

**Initialize a Project**

This creates a runnable project with templates (including `AGENT.md`).

```bash
pam init
```

If a `requirements.txt` already exists, `pam init` prompts you to choose how to proceed:

- overwrite
- keep
- merge

---

**Create a Service**

Generate a service scaffold with the CLI instead of creating service files by hand.

```bash
pam new service rfm_segment
```

This creates a new folder (e.g. `rfm_segment/`) with:

- a service class (`RfmSegmentSvc.py`)
- `functions.py` for your logic
- `service.yaml` for registration
- a test file (`test_rfm_segment.py`)

---

**Understand the Lifecycle**

The runtime calls your service in two main phases:

1. `on_start`
   - Called once at the beginning
   - Read parameters from `self.request.runtime_parameters`
   - Should return quickly (start a thread for long work)

2. `on_data_input`
   - Called when CDP sends input files
   - `req.input_files` contains the input CSV files, in order
   - Should also return quickly (use a thread if needed)

When your service is done:

- Call `self._upload_result(...)` or `self._upload_report(...)`
- Call `self._exit()` to signal completion
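The contract above can be sketched as a small runnable toy: both callbacks return immediately, the real work runs on a worker thread, and `_exit()` is called only after the upload. `StubServiceBase`, its assumed `_upload_result(rows, name)` signature, the `min_amount` parameter, and the CSV columns are all hypothetical stand-ins so the example runs without the framework. In a real project, keep the base class that `pam new service` generates and pass whatever `_upload_result` actually expects.

```python
import csv
import os
import tempfile
import threading
from types import SimpleNamespace


class StubServiceBase:
    """Hypothetical stand-in for the framework base class (not pam's real API)."""

    def __init__(self, runtime_parameters):
        # The real runtime exposes parameters as self.request.runtime_parameters.
        self.request = SimpleNamespace(runtime_parameters=runtime_parameters)
        self.uploaded = []               # records what _upload_result received
        self.exited = threading.Event()  # set when _exit() is called

    def _upload_result(self, rows, name):  # assumed signature, for illustration only
        self.uploaded.append((name, len(rows)))

    def _exit(self):
        self.exited.set()


class RfmSegmentSvc(StubServiceBase):
    def on_start(self):
        # Read parameters once and return quickly.
        self.min_amount = float(self.request.runtime_parameters.get("min_amount", 0))

    def on_data_input(self, req):
        # Hand the CSV processing to a worker thread so this call returns at once.
        files = list(req.input_files)
        threading.Thread(target=self._process, args=(files,), daemon=True).start()

    def _process(self, input_files):
        rows = []
        for path in input_files:  # files arrive in order
            with open(path, newline="") as f:
                rows.extend(r for r in csv.DictReader(f) if float(r["amount"]) >= self.min_amount)
        self._upload_result(rows, "segments")
        self._exit()  # signal completion only after the upload


# Demo: two small CSV inputs written to an OS temp directory.
tmpdir = tempfile.mkdtemp()
paths = []
for i, amounts in enumerate([[5, 50], [120]]):
    path = os.path.join(tmpdir, f"input_{i}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["customer_id", "amount"])
        for n, amount in enumerate(amounts):
            writer.writerow([f"c{i}{n}", amount])
    paths.append(path)

svc = RfmSegmentSvc({"min_amount": "10"})
svc.on_start()
svc.on_data_input(SimpleNamespace(input_files=paths))
svc.exited.wait(timeout=5)
print(svc.uploaded)  # [('segments', 2)]
```

The daemon thread mirrors the "return quickly" rule: `on_data_input` hands off the file list and returns, and the `Event` lets the demo wait for completion the way the runtime waits for `_exit()`.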

---

**Using Temp Files Correctly**

Temp storage is managed by the framework. Do not delete temp files manually.

Standard helpers:

- `TempfileUtils.get_temp_path_for_service(self, self.service_name)`
- `TempfileUtils.get_temp_file_name_for_service(self, self.service_name, prefix, extension)`

Notes:

- `get_temp_path_for_service(...)` returns a directory path without a trailing slash.
- The temp path includes date/service/token in this structure:
  `TEMP_DATASOURCE_PATH/YYYY_MM_DD/<service>/<token>`
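To make the layout and the "no trailing slash" note concrete, the sketch below builds a path with the same shape. `sketch_temp_path` and the token value are hypothetical; inside a service, call `TempfileUtils.get_temp_path_for_service(...)` rather than building the path yourself.

```python
import posixpath
from datetime import date


def sketch_temp_path(datasource_path, service, token, day):
    """Hypothetical helper: lay out TEMP_DATASOURCE_PATH/YYYY_MM_DD/<service>/<token>."""
    base = datasource_path.rstrip("/")
    # No trailing slash, matching what get_temp_path_for_service(...) is documented to return.
    return posixpath.join(base, day.strftime("%Y_%m_%d"), service, token)


path = sketch_temp_path("/app/data/data_sources/", "rfm_segment", "abc123", date(2025, 1, 31))
print(path)  # /app/data/data_sources/2025_01_31/rfm_segment/abc123

# Because the directory has no trailing slash, add the separator when appending a file name.
csv_path = posixpath.join(path, "segments.csv")
```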

---

**Uploading Results in Batches**

If your service produces large results, or only a few rows per input event, use `ResultBatchUploader` to buffer rows and upload them in chunks of a fixed size.

Recommended usage:

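```python
from pam.result_batch_uploader import ResultBatchUploader

# Inside your service: `self` is the running service instance, `df` a DataFrame of results.
batch_uploader = ResultBatchUploader(self, batch_size=50000)

# Call upload() as often as needed; rows are uploaded in chunks of batch_size.
batch_uploader.upload(df, "data-name")

# Upload any remaining rows that did not fill a complete batch.
batch_uploader.flush()
status = batch_uploader.get_status()
```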

Notes:

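- The second argument (`name`, `"data-name"` above) identifies a result stream. Use a different name for each output that has its own schema (for example, streams A and B) so their columns do not conflict.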
- `flush()` uploads any remaining rows that are below the batch size.
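To make the buffering behaviour concrete, here is a small model of what the notes above describe: rows accumulate per stream name, each full batch is sent as soon as it fills, and `flush()` sends the remainder. `SketchBatchUploader` and its `send` callback are hypothetical; this is not `ResultBatchUploader`'s implementation, and plain lists stand in for DataFrames.

```python
class SketchBatchUploader:
    """Hypothetical model of per-stream batching; not pam's ResultBatchUploader."""

    def __init__(self, send, batch_size):
        self.send = send              # callable(name, rows) performing one upload
        self.batch_size = batch_size
        self.pending = {}             # name -> buffered rows, kept separate per stream

    def upload(self, rows, name):
        buf = self.pending.setdefault(name, [])
        buf.extend(rows)
        # Emit every full batch as soon as it is available.
        while len(buf) >= self.batch_size:
            self.send(name, buf[: self.batch_size])
            del buf[: self.batch_size]

    def flush(self):
        # Send whatever is left below batch_size for every stream.
        for name, buf in self.pending.items():
            if buf:
                self.send(name, list(buf))
                buf.clear()


sent = []
uploader = SketchBatchUploader(lambda name, rows: sent.append((name, len(rows))), batch_size=3)
uploader.upload(list(range(7)), "A")
uploader.upload(list(range(2)), "B")
# sent == [("A", 3), ("A", 3)]
uploader.flush()
# sent == [("A", 3), ("A", 3), ("A", 1), ("B", 2)]
```

Because `flush()` sends the tail of each stream, call it after your last `upload()` and before `self._exit()`.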

---

**Running the Server**

The generated `main.py` runs the Flask server.

```bash
python main.py
```

By default, the server binds to `0.0.0.0:8000`. To change the host or port, set these environment variables before starting it:

```bash
export SERVER_HOST=0.0.0.0
export SERVER_PORT=8000
```

---

**Testing a Service**

Run unit tests for a service:

```bash
pam test rfm_segment
```

If you write custom tests, place them in the service folder and name them `test_<service>.py`.
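Keeping your logic in `functions.py` as plain functions lets you test it without starting the server. A minimal sketch of such a test file, assuming a `unittest`-compatible runner (this README does not say which runner `pam test` uses); `summarize_customers` is a hypothetical function standing in for your own logic and would normally be imported from `functions.py`.

```python
import unittest


# Hypothetical logic that would normally live in rfm_segment/functions.py.
def summarize_customers(transactions):
    """Return {customer_id: (frequency, monetary)} from (customer_id, amount) pairs."""
    summary = {}
    for customer_id, amount in transactions:
        freq, total = summary.get(customer_id, (0, 0.0))
        summary[customer_id] = (freq + 1, total + amount)
    return summary


class TestSummarizeCustomers(unittest.TestCase):
    def test_groups_by_customer(self):
        result = summarize_customers([("c1", 10.0), ("c2", 5.0), ("c1", 2.5)])
        self.assertEqual(result, {"c1": (2, 12.5), "c2": (1, 5.0)})

    def test_empty_input(self):
        self.assertEqual(summarize_customers([]), {})
```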

---

**Configuration**

Environment variables you can set:

- `SERVER_HOST` (default `0.0.0.0`)
- `SERVER_PORT` (default `8000`)
- `TEMP_BASE_PATH` (default `/app/data`)
- `TEMP_DATASOURCE_PATH` (default `/app/data/data_sources`)
- `TEMP_CLEAN_DAYS` (default `10`)
- `TEMP_CLEAN_INTERVAL_HOURS` (default `6`, set empty to disable periodic cleanup)
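The framework reads these variables itself. The sketch below only shows how the documented defaults combine, for example if you want to log the effective settings at startup. `load_config` and its dictionary keys are hypothetical, and the framework's own parsing (such as whether fractional hours are accepted) may differ.

```python
import os


def load_config(env=None):
    """Hypothetical reader for the variables listed above, using their documented defaults."""
    env = os.environ if env is None else env
    interval = env.get("TEMP_CLEAN_INTERVAL_HOURS", "6")
    return {
        "host": env.get("SERVER_HOST", "0.0.0.0"),
        "port": int(env.get("SERVER_PORT", "8000")),
        "temp_base_path": env.get("TEMP_BASE_PATH", "/app/data"),
        "temp_datasource_path": env.get("TEMP_DATASOURCE_PATH", "/app/data/data_sources"),
        "clean_days": int(env.get("TEMP_CLEAN_DAYS", "10")),
        # An empty value disables periodic cleanup; represented here as None.
        "clean_interval_hours": float(interval) if interval.strip() else None,
    }


defaults = load_config({})
custom = load_config({"SERVER_PORT": "9000", "TEMP_CLEAN_INTERVAL_HOURS": ""})
print(defaults)
print(custom)
```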

---

**Project Structure**

After running `pam init` and creating one service:

```
.
├── main.py
├── AGENT.md
├── Dockerfile
├── rfm_segment/
│   ├── RfmSegmentSvc.py
│   ├── functions.py
│   ├── service.yaml
│   └── test_rfm_segment.py
├── requirements.txt
└── run_unit_test.sh
```

---

**Troubleshooting**

- If the `pam` command is not found, make sure your virtual environment is activated.
- If `pam new service` fails, confirm that you passed a service name (for example, `pam new service rfm_segment`).
- If temp files are cleaned up too often or not often enough, adjust `TEMP_CLEAN_INTERVAL_HOURS` and `TEMP_CLEAN_DAYS`.

---

**Next Steps**

- Implement your logic in `functions.py`.
- Wire it into `on_start` and `on_data_input` in your service class.
- Use the temp utilities to write intermediate files.
- Use `_upload_result` to return output to CDP.
