Metadata-Version: 2.3
Name: mlops-python-sdk
Version: 1.0.2
Summary: MLOps Python SDK for XCloud Service API
License: MIT
Author: mlops
Author-email: mlops@example.com
Requires-Python: >=3.9,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: attrs (>=23.2.0)
Requires-Dist: httpx (>=0.27.0,<1.0.0)
Requires-Dist: packaging (>=24.1)
Requires-Dist: python-dateutil (>=2.8.2)
Requires-Dist: typing-extensions (>=4.1.0)
Project-URL: Bug Tracker, https://github.com/xcloud-service/xservice/issues
Project-URL: Homepage, https://mlops.cloud/
Project-URL: Repository, https://github.com/xcloud-service/xservice
Description-Content-Type: text/markdown

# SDK

Software Development Kits for integrating with the XCloud Service API.

> [!NOTE] SDK Support
> SDKs provide type-safe, high-level interfaces for interacting with the platform API. They handle authentication, error handling, and request retries automatically.

## Available SDKs

### Python SDK

### Installation

Install the SDK with pip (Python 3.9 or later is required):

```bash
pip install mlops-python-sdk
```

### Configuration

The SDK reads configuration from environment variables by default:

- `MLOPS_API_KEY`: API key (required)
- `MLOPS_DOMAIN`: API domain, e.g. `localhost:8090` or `https://example.com`
- `MLOPS_API_PATH`: API path prefix (default: `/api/v1`)
- `MLOPS_DEBUG`: `true|false` (default: `false`)

Or configure in code:

```python
from mlops import ConnectionConfig, Task

config = ConnectionConfig(
    api_key="xck_...",
    domain="https://example.com",
    api_path="/api/v1",
    debug=False,
)
task = Task(config=config)
```
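The environment variables and defaults listed above can be illustrated with a small standalone sketch. It does not use the SDK's own loader: `read_mlops_settings` is a hypothetical helper written only for this example, and the real SDK may parse these values differently (for instance, it may accept other spellings for `MLOPS_DEBUG`).

```python
import os
from typing import Mapping, Optional


def read_mlops_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: resolve the documented variables and their defaults."""
    env = os.environ if env is None else env

    api_key = env.get("MLOPS_API_KEY")
    if not api_key:
        # MLOPS_API_KEY is the only variable documented as required.
        raise ValueError("MLOPS_API_KEY is required")

    return {
        "api_key": api_key,
        "domain": env.get("MLOPS_DOMAIN"),
        "api_path": env.get("MLOPS_API_PATH", "/api/v1"),
        # Assumption: only the literal string "true" (any case) enables debug.
        "debug": env.get("MLOPS_DEBUG", "false").strip().lower() == "true",
    }


# Pass an explicit mapping so the example does not depend on the real environment.
settings = read_mlops_settings(
    {"MLOPS_API_KEY": "xck_example", "MLOPS_DOMAIN": "localhost:8090"}
)
print(settings["api_path"], settings["debug"])
```

Calling `read_mlops_settings()` with no argument reads from `os.environ` instead, which is closer to how the SDK is described as behaving by default.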

### Usage

```python
from mlops import Task
from mlops.api.client.models.task_status import TaskStatus
from pathlib import Path

# Initialize Task client (uses environment variables by default)
task = Task()

# Submit a GPU task
try:
    result = task.submit(
        name="gpu-task-from-sdk",
        image="/mnt/minio/images/01ai-registry.cn-shanghai.cr.aliyuncs.com+public+llamafactory+0.9.3.sqsh",
        entry_command="llamafactory-cli train /workspace/config/test_lora.yaml",
        resources={
            "partition": "gpu",
            "nodes": 2,
            "ntasks": 2,
            "cpus_per_task": 2,
            "memory": "4G",
            "time": "01:00:00",
            "gres": "gpu:nvidia_a10:1",
            "qos": "qos_xcloud",
        },
        cluster_name="slurm-cn",
        team_id=1,
        file_path="your file path",  # optional; supports .zip, .tar.gz, and .tgz archives
    )

    if result is not None:
        print("==== gpu task submitted successfully ====")
        job_id = result.job_id
    else:
        print("==== gpu task submission failed ====")
except Exception as e:
    print("==== gpu task submission error ====", e)

# Submit a CPU task (the entry command is read from a local shell script)
try:
    entry_content = Path("entry.sh").read_text(encoding="utf-8")
    result = task.submit(
        name="cpu-task-from-sdk",
        image="docker://01ai-registry.cn-shanghai.cr.aliyuncs.com/01-ai/xcs/v2/alpine:3.23.0",
        entry_command=entry_content,
        resources={
            "partition": "cpu",
            "nodes": 1,
            "ntasks": 1,
            "cpus_per_task": 1,
            "memory": "1G",
            "time": "01:00:00",
            "qos": "qos_xcloud",
        },
        cluster_name="slurm-cn",
        team_id=1,
    )

    if result is not None:
        print("==== cpu task submitted successfully ====")
        job_id = result.job_id
    else:
        print("==== cpu task submission failed ====")
except Exception as e:
    print("==== cpu task submission error ====", e)

# List tasks with filters
try:
    completed_tasks = task.list(
        status=TaskStatus.COMPLETED,
        cluster_name="slurm-cn",
        page=1,
        page_size=20
    )

    # Get task details
    if completed_tasks is not None and len(completed_tasks.tasks) > 0:
        print("==== completed_tasks number ====", len(completed_tasks.tasks))
        task_info = task.get(task_id=completed_tasks.tasks[0].job_id, cluster_name="slurm-cn")
        print("==== task_info ====", task_info)
    else:
        print("==== no completed tasks to get details ====")
except Exception as e:
    print("==== get task details failed error ====", e)


# Cancel a running task
try:
    running_tasks = task.list(
        status=TaskStatus.RUNNING,
        cluster_name="slurm-cn",
        page=1,
        page_size=20
    )
    if running_tasks is not None and len(running_tasks.tasks) > 0:
        print("==== running_tasks number ====", len(running_tasks.tasks))
        # Cancel a task
        result = task.cancel(task_id=running_tasks.tasks[0].job_id, cluster_name="slurm-cn")
        print("==== task cancelled ====", running_tasks.tasks[0].job_id, result)
    else:
        print("==== no running tasks to cancel ====")
except Exception as e:
    print("==== cancel running task failed error ====", e)


# Delete a task
try:
    completed_tasks = task.list(
        status=TaskStatus.COMPLETED,
        cluster_name="slurm-cn",
        page=1,
        page_size=20
    )
    if completed_tasks is not None and len(completed_tasks.tasks) > 0:
        print("==== completed_tasks number ====", len(completed_tasks.tasks))
        # Delete a task
        result = task.delete(task_id=completed_tasks.tasks[0].job_id, cluster_name="slurm-cn")
        print("==== task deleted ====", completed_tasks.tasks[0].job_id, result)
    else:
        print("==== no completed tasks to delete ====")
except Exception as e:
    print("==== delete completed task failed error ====", e)
```
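The examples above only request the first page of results. A minimal sketch of walking every page is shown below. It assumes that a page shorter than `page_size` (or an empty or `None` response) marks the end of the results, which the SDK documentation does not confirm. `iter_all_tasks` and `fake_list` are hypothetical names; `fake_list` stands in for `task.list` so the snippet runs without a server.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterator


def iter_all_tasks(
    list_page: Callable[..., Any], page_size: int = 20, **filters: Any
) -> Iterator[Any]:
    """Yield tasks page by page until a short or empty page is returned."""
    page = 1
    while True:
        response = list_page(page=page, page_size=page_size, **filters)
        tasks = [] if response is None else list(response.tasks)
        yield from tasks
        # Assumption: fewer items than page_size means this was the last page.
        if len(tasks) < page_size:
            return
        page += 1


# Stand-in for task.list: five fake jobs served in slices.
_FAKE_JOBS = [SimpleNamespace(job_id=i) for i in range(1, 6)]


def fake_list(page: int, page_size: int, **filters: Any) -> SimpleNamespace:
    start = (page - 1) * page_size
    return SimpleNamespace(tasks=_FAKE_JOBS[start:start + page_size])


job_ids = [
    t.job_id for t in iter_all_tasks(fake_list, page_size=2, cluster_name="slurm-cn")
]
print(job_ids)
```

With the real client, the same helper would be called as `iter_all_tasks(task.list, status=TaskStatus.COMPLETED, cluster_name="slurm-cn")`, provided the stopping assumption holds for the API.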

**Task Management Methods:**

- `submit()` - Submit a new task with container image and entry command
- `get()` - Get task details by task ID
- `list()` - List tasks with optional filters (status, cluster_name, team_id, user_id)
- `cancel()` - Cancel a running task
- `delete()` - Delete a task record

**Task Status Values:**

```python
from mlops.api.client.models.task_status import TaskStatus

TaskStatus.PENDING      # Task is pending
TaskStatus.QUEUED       # Task is queued
TaskStatus.RUNNING      # Task is running
TaskStatus.COMPLETED    # Task completed successfully
TaskStatus.SUCCEEDED    # Task succeeded
TaskStatus.FAILED       # Task failed
TaskStatus.CANCELLED    # Task was cancelled
TaskStatus.CREATED      # Task was created
```
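A common use of these status values is waiting for a task to finish. The sketch below is one possible polling loop, not part of the SDK. It makes two assumptions: that `COMPLETED`, `SUCCEEDED`, `FAILED`, and `CANCELLED` are final states, and that the caller can supply a function returning the current status name as a string. `wait_until_finished` and `fetch_status` are hypothetical names.

```python
import time
from typing import Callable

# Assumption: these four states are final; all others mean "still in progress".
TERMINAL_STATES = {"COMPLETED", "SUCCEEDED", "FAILED", "CANCELLED"}


def wait_until_finished(
    fetch_status: Callable[[], str], interval: float = 5.0, timeout: float = 3600.0
) -> str:
    """Poll fetch_status() until it reports a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status().upper()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status} after {timeout} seconds")
        time.sleep(interval)


# Simulated status sequence so the example runs without a cluster.
_states = iter(["PENDING", "QUEUED", "RUNNING", "COMPLETED"])
final_status = wait_until_finished(lambda: next(_states), interval=0.0)
print(final_status)
```

In real use, `fetch_status` would wrap a call to `task.get(...)` and extract the status from the response; the exact attribute name depends on the SDK's response model and is not shown here.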

**Error Handling:**

```python
from mlops import Task
from mlops.exceptions import (
    APIException,
    AuthenticationException,
    NotFoundException,
    RateLimitException,
    TimeoutException,
    InvalidArgumentException,
    NotEnoughSpaceException,
)

task = Task()

try:
    # entry_command matches the parameter name used in the Usage examples
    result = task.submit(name="test", cluster_name="slurm-cn", entry_command="echo hello")
except AuthenticationException as e:
    print(f"Authentication failed: {e}")
except NotFoundException as e:
    print(f"Resource not found: {e}")
except APIException as e:
    print(f"API error: {e}")
```

> [!TIP] Error Handling
> SDKs automatically handle common errors and retry failed requests. Check SDK documentation for error handling best practices.

## Features

- Type-safe API clients
- Automatic authentication
- Error handling
- Request retry logic
- Response validation

## Resources

- [Python SDK Documentation](https://github.com/xcloud-service/xservice/tree/main/client/python-sdk)
- [API Reference](https://xcloud-service.com/docs/api)

