Metadata-Version: 2.4
Name: robin-commons
Version: 0.3.0
Summary: Common infrastructure components for Robin microservices
License-File: LICENSE
License-File: NOTICE
Author: Robin OSS
Author-email: oss@neeve.ai
Requires-Python: >=3.12.0,<4.0.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: fastapi (>=0.110.0)
Requires-Dist: httpx (>=0.28.1,<0.29.0)
Requires-Dist: loguru (>=0.7.2,<0.8.0)
Requires-Dist: nats-py (>=2.7.2,<3.0.0)
Requires-Dist: opentelemetry-api (>=1.37.0,<2.0.0)
Requires-Dist: opentelemetry-exporter-otlp (>=1.37.0,<2.0.0)
Requires-Dist: opentelemetry-instrumentation-fastapi (>=0.58b0,<0.59)
Requires-Dist: opentelemetry-instrumentation-grpc (>=0.58b0,<0.59)
Requires-Dist: opentelemetry-instrumentation-httpx (>=0.58b0,<0.59)
Requires-Dist: opentelemetry-instrumentation-redis (>=0.58b0,<0.59)
Requires-Dist: opentelemetry-instrumentation-sqlalchemy (>=0.58b0,<0.59)
Requires-Dist: opentelemetry-sdk (>=1.37.0,<2.0.0)
Requires-Dist: pydantic (>=2.12.5,<3.0.0)
Requires-Dist: redis (>=6.0.0,<7.0.0)
Requires-Dist: sqlalchemy (>=2.0.28,<3.0.0)
Description-Content-Type: text/markdown

# Common Infrastructure Library

[![CI](https://github.com/neeve-ai/robin-commons/actions/workflows/ci.yml/badge.svg)](https://github.com/neeve-ai/robin-commons/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/robin-commons)](https://pypi.org/project/robin-commons/)

A comprehensive Python library providing battle-tested infrastructure components for building resilient, observable microservices. Designed for teams building distributed systems with Python.

## Overview

This library extracts common patterns and utilities used in production microservices, enabling:

- **Consistent infrastructure patterns** across services
- **Single source of truth** for resilience, observability, and messaging
- **Rapid onboarding** of new services
- **Independent versioning** and upgrade paths

📖 **[Read the User Guide](docs/USER_GUIDE.md)** for comprehensive documentation and examples.

## Components

### 🛡️ Resilience: Circuit Breaker

Prevent cascading failures with automatic circuit breaking. Implements the three-state model (CLOSED, HALF_OPEN, OPEN) with configurable failure thresholds and recovery timeouts. Thread-safe with full async/sync support.

```python
from robin_commons.resilience.breaker import CircuitBreaker, CircuitBreakerConfig

config = CircuitBreakerConfig(
    failure_threshold=5,
    recovery_timeout_seconds=60,
    success_threshold=2
)
breaker = CircuitBreaker(config)

# Async context manager usage
async with breaker:
    await external_service.call()

# Or synchronous usage
with breaker:
    external_service.call()
```

**Features:**
- Three-state model: CLOSED (normal) → OPEN (failing fast) → HALF_OPEN (recovery testing)
- Thread-safe operations with RLock
- Both async and sync context manager support
- Detailed logging for state transitions and failures
- Observability properties: `is_open`, `state`, `failure_count`, `last_failure_time`

### 📝 Logging: Structured JSON Logging

Production-ready logging with JSON output and context propagation. Built on Loguru with environment-aware formatting for Grafana Loki/Alloy collection.

```python
from robin_commons.log import logger, configure_logging

configure_logging()
logger.info("Application started", service="my-service", version="1.0.0")

# Context variables are automatically propagated
from robin_commons.telemetry import set_correlation_id
set_correlation_id("correlation-123")
```

**Features:**
- JSON structured logging for container environments
- Automatic trace context integration
- Environment-aware output (JSON for production, human-readable for development)
- Thread-safe with enqueue=True
- Full backtrace and diagnostic information

### 📊 Telemetry: Observability Suite

Complete observability stack with distributed tracing, metrics collection, and request correlation using OpenTelemetry.

**Setup & Configuration:**
```python
from robin_commons.telemetry import setup_observability, get_observability_config

# Configure once at startup
setup_observability()

# Access config for service metadata
config = get_observability_config()
```

**Distributed Tracing:**
```python
from robin_commons.telemetry import span, async_span, add_span_event

# Synchronous span
@span("operation_name")
def process_data():
    add_span_event("processing_started")
    return data

# Asynchronous span
@async_span("async_operation")
async def fetch_data():
    add_span_event("fetch_completed")
    return data
```

**Metrics Collection:**
```python
from robin_commons.telemetry import (
    record_http_request,
    record_database_query,
    record_cache_operation,
    timed_operation
)

# Record HTTP requests
record_http_request(method="GET", status=200, duration=0.234)

# Record database queries
record_database_query(query="SELECT *", duration=0.015)

# Record cache operations
record_cache_operation(operation="get", hit=True, duration=0.001)

# Time operations
with timed_operation("expensive_operation"):
    result = perform_work()
```

**Request Correlation:**
```python
from robin_commons.telemetry import (
    get_correlation_id,
    set_correlation_id,
    get_request_id,
    set_request_id,
    log_correlation_context
)

# Set/get correlation IDs
set_correlation_id("corr-123")
correlation_id = get_correlation_id()

# Log correlation context
log_correlation_context()
```

**FastAPI Integration:**
```python
from fastapi import FastAPI
from robin_commons.telemetry import TraceMiddleware, setup_observability

app = FastAPI()
app.add_middleware(TraceMiddleware)

@app.on_event("startup")
async def startup():
    setup_observability()
```

**Auto-instrumentation:**
```python
from robin_commons.telemetry import get_instrumentation_manager

manager = get_instrumentation_manager()
manager.setup_all_instrumentation(app)

# Automatically instruments:
# - FastAPI applications
# - SQLAlchemy database operations
# - Redis cache operations
# - HTTPX/Requests HTTP clients
# - NATS messaging
# - gRPC services
```

### Status: Planned Components

**💾 Cache: Redis Client** (Coming soon)
- Redis client with automatic cluster detection
- Connection pooling and resilience features

**📨 Messaging: NATS Client** (Coming soon)
- Production-grade NATS client with JetStream support
- Durable pub/sub messaging with typed event publishing

## Installation

### From PyPI (when available)

```bash
pip install robin-commons
```

### From Source

```bash
git clone https://github.com/neeve-ai/robin-commons.git
cd robin-commons
pip install -e .
```

### Optional Dependencies

Install additional instrumentation for specific frameworks:

```bash
# FastAPI and SQLAlchemy observability
pip install robin-commons[fastapi,sqlalchemy]

# All optional dependencies
pip install robin-commons[all]
```

## Quick Start

### 1. Set Up Logging

```python
from robin_commons.log import configure_logging, logger

configure_logging()
logger.info("Application initialized")
```

### 2. Initialize Observability

```python
from robin_commons.telemetry import setup_observability

setup_observability()
```

### 3. Add Circuit Breaker

```python
from robin_commons.resilience.breaker import CircuitBreaker, CircuitBreakerConfig

breaker_config = CircuitBreakerConfig(
    failure_threshold=5,
    recovery_timeout_seconds=30,
    success_threshold=2
)
breaker = CircuitBreaker(breaker_config)

async with breaker:
    result = await call_external_service()
```

### 4. Add FastAPI with Observability

```python
from fastapi import FastAPI
from robin_commons.telemetry import TraceMiddleware, setup_observability
from robin_commons.log import configure_logging

# Configure at startup
configure_logging()
setup_observability()

app = FastAPI()
app.add_middleware(TraceMiddleware)

# Auto-instrument FastAPI and dependencies
from robin_commons.telemetry import get_instrumentation_manager
get_instrumentation_manager().setup_all_instrumentation(app)
```

## Configuration

All components support configuration via environment variables:

```bash
# Logging
ENVIRONMENT=production

# Observability
OTEL_SERVICE_NAME=my-service
OTEL_SERVICE_VERSION=1.0.0
OTEL_ENVIRONMENT=production
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_TRACES_SAMPLER_ARG=0.1

# For Grafana Cloud (optional)
GRAFANA_CLOUD_OTLP_ENDPOINT=https://otlp-gateway-prod-us-west-0.grafana.net/otlp
OTEL_EXPORTER_OTLP_HEADERS=authorization=Bearer<token>

# For local Alloy (cost optimization)
ENABLE_ALLOY=true
ALLOY_HOST=localhost
ALLOY_PORT=4317

# Instrumentation flags
OTEL_FASTAPI_INSTRUMENTATION_ENABLED=true
OTEL_SQLALCHEMY_INSTRUMENTATION_ENABLED=true
OTEL_REDIS_INSTRUMENTATION_ENABLED=true
OTEL_HTTPX_INSTRUMENTATION_ENABLED=true
OTEL_NATS_INSTRUMENTATION_ENABLED=true
OTEL_GRPC_INSTRUMENTATION_ENABLED=true
```

### Using Dotenv

Create a `.env` file:

```env
ENVIRONMENT=development
OTEL_SERVICE_NAME=my-service
OTEL_SERVICE_VERSION=0.1.0
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
```

Load in your application:

```python
from dotenv import load_dotenv
load_dotenv()

from robin_commons.log import configure_logging
from robin_commons.telemetry import setup_observability

configure_logging()
setup_observability()
```

## Architecture

```
robin_commons/
├── resilience/
│   ├── __init__.py
│   └── breaker.py              # Circuit breaker implementation
│       ├── CircuitBreaker       # Main class with async/sync support
│       ├── CircuitBreakerState  # Enum: CLOSED, OPEN, HALF_OPEN
│       ├── CircuitBreakerConfig # Configuration dataclass
│       └── CircuitBreakerError  # Exception for open circuit
│
├── log/
│   ├── __init__.py
│   └── config.py               # Logging configuration
│       ├── configure_logging()  # Setup JSON logging for Loki
│       └── logger               # Loguru logger instance
│
└── telemetry/
    ├── __init__.py             # Public exports
    ├── config.py               # Observability configuration
    │   ├── ServiceConfig        # Service metadata
    │   ├── OtlpExporterConfig   # OTLP endpoint config
    │   └── ObservabilityConfig  # Main config class
    ├── correlation.py          # Request correlation context
    │   ├── set_correlation_id()  # Set correlation context
    │   ├── get_correlation_id()  # Get correlation context
    │   └── Utilities for trace/span/user IDs
    ├── tracing.py              # Distributed tracing
    │   ├── setup_observability()        # Bootstrap tracing
    │   ├── @span decorator              # Sync spans
    │   ├── @async_span decorator        # Async spans
    │   ├── add_span_event()             # Add span events
    │   └── CircuitBreakerSpanExporter   # OTLP with circuit breaker
    ├── metrics.py              # Metrics collection
    │   ├── BaseMetricsCollector # Metrics manager
    │   ├── record_http_request()   # HTTP metrics
    │   ├── record_database_query() # DB metrics
    │   ├── record_cache_operation()# Cache metrics
    │   ├── record_error()         # Error tracking
    │   └── @timed_operation decorator
    ├── middleware.py           # FastAPI HTTP middleware
    │   └── TraceMiddleware      # W3C trace propagation
    └── instrumentation.py      # Auto-instrumentation manager
        └── InstrumentationManager
            ├── setup_all_instrumentation()
            ├── setup_fastapi_instrumentation()
            ├── setup_sqlalchemy_instrumentation()
            ├── setup_redis_instrumentation()
            ├── setup_httpx_instrumentation()
            ├── setup_nats_instrumentation()
            └── setup_grpc_instrumentation()
```

### Dependencies

**Core:**
- `pydantic>=2.12.5` - Configuration validation
- `loguru>=0.7.2` - Structured logging
- `httpx>=0.28.1` - Async HTTP client

**Observability:**
- `opentelemetry-api>=1.37.0` - OpenTelemetry API
- `opentelemetry-sdk>=1.37.0` - OpenTelemetry SDK
- `opentelemetry-exporter-otlp>=1.37.0` - OTLP exporter

**Auto-instrumentation:**
- `opentelemetry-instrumentation-fastapi>=0.58b0`
- `opentelemetry-instrumentation-sqlalchemy>=0.58b0`
- `opentelemetry-instrumentation-redis>=0.58b0`
- `opentelemetry-instrumentation-httpx>=0.58b0`
- `opentelemetry-instrumentation-grpc>=0.58b0`

**Framework Integrations:**
- `fastapi>=0.110.0` - Web framework
- `sqlalchemy>=2.0.28` - ORM
- `redis>=5.0.1` - Cache client

## Versioning

This library follows [Semantic Versioning](https://semver.org/):

- **MAJOR**: Breaking API changes
- **MINOR**: New features (backward-compatible)
- **PATCH**: Bug fixes

See [CHANGELOG.md](CHANGELOG.md) for detailed version history.

## Testing

Run tests locally:

```bash
pytest tests/ -v

# With coverage
pytest tests/ --cov=robin_commons --cov-report=html
```

Run integration tests (requires Docker):

```bash
docker-compose -f docker-compose.test.yml up
pytest tests/integration/ -v
```

## Troubleshooting

### Circuit Breaker Always Open

**Problem:** Your circuit breaker stays in the OPEN state and doesn't recover.

**Solutions:**
1. Check your `failure_threshold` - it might be too low. Try increasing it.
2. Verify the external service is actually recovering and returning successful responses.
3. Check `recovery_timeout_seconds` - ensure it's giving enough time for recovery (default: 60s).
4. In HALF_OPEN state, you need `success_threshold` consecutive successes to close (default: 2).

**Debug:**
```python
breaker = CircuitBreaker(config)
print(f"State: {breaker.state}")
print(f"Failure count: {breaker.failure_count}")
print(f"Next attempt: {breaker.get_next_attempt_time()}")
```

### Missing Traces in OTLP Collector

**Problem:** Spans are not appearing in Grafana, Jaeger, or your OTLP collector.

**Solutions:**
1. Verify `OTEL_EXPORTER_OTLP_ENDPOINT` is correct and reachable:
   ```bash
   curl -i http://localhost:4317/healthz
   ```
2. Check if the OTLP collector is running and accessible from your application.
3. Verify network connectivity - check firewall rules, DNS resolution.
4. Enable debug logging:
   ```python
   import logging
   logging.basicConfig(level=logging.DEBUG)
   ```
5. Check sample rate - default is 0.1 (10% of traces). Set `OTEL_TRACES_SAMPLER_ARG=1.0` for 100%.

### Logging Not Appearing

**Problem:** Your logs aren't being captured or formatted incorrectly.

**Solutions:**
1. Call `configure_logging()` early in your application startup (before creating loggers).
2. For development, set `ENVIRONMENT=development` to get colored console output.
3. For production, use `ENVIRONMENT=production` for JSON output suitable for Loki.
4. Verify log level - default is INFO. Set `DEBUG` to see more details.
5. Check if logs are being enqueued properly (they are async by default).

**Debug:**
```python
from robin_commons.log import logger, configure_logging
configure_logging()
logger.debug("Debug message")
logger.info("Info message")
logger.error("Error message")
```

### Instrumentation Not Working

**Problem:** FastAPI, SQLAlchemy, Redis, or other libraries aren't being instrumented.

**Solutions:**
1. Call instrumentation setup **after** creating app instances:
   ```python
   app = FastAPI()
   from robin_commons.telemetry import get_instrumentation_manager
   manager = get_instrumentation_manager()
   manager.setup_all_instrumentation(app)
   ```
2. Verify instrumentation is enabled in config:
   ```bash
   OTEL_FASTAPI_INSTRUMENTATION_ENABLED=true
   OTEL_SQLALCHEMY_INSTRUMENTATION_ENABLED=true
   ```
3. Check that required packages are installed (instrumentation packages are optional).
4. For FastAPI, add middleware **before** instrumentation:
   ```python
   from robin_commons.telemetry import TraceMiddleware
   app.add_middleware(TraceMiddleware)
   get_instrumentation_manager().setup_all_instrumentation(app)
   ```

### Context Variables Not Propagating

**Problem:** Correlation IDs or trace context not appearing in logs.

**Solutions:**
1. Set correlation context **early** in request processing:
   ```python
   from robin_commons.telemetry import set_correlation_id
   set_correlation_id(request.headers.get("X-Correlation-ID"))
   ```
2. Ensure `TraceMiddleware` is added to FastAPI:
   ```python
   app.add_middleware(TraceMiddleware)
   ```
3. Context variables are async-aware - ensure you're using async functions.
4. For manual context setup, use async tasks carefully:
   ```python
   # In async context - this works
   set_correlation_id("id-123")
   
   # In thread pool - create new context
   import asyncio
   asyncio.run(async_operation())
   ```

### OTLP Connection Errors

**Problem:** Getting connection refused or timeout errors when exporting spans.

**Solutions:**
1. **Local development:** Ensure OTLP collector is running:
   ```bash
   docker run -p 4317:4317 ghcr.io/open-telemetry/opentelemetry-collector
   ```
2. **Using Grafana Cloud:** Verify endpoint and headers:
   ```bash
   GRAFANA_CLOUD_OTLP_ENDPOINT=https://otlp-gateway-prod-us-west-0.grafana.net/otlp
   OTEL_EXPORTER_OTLP_HEADERS=authorization=Bearer<YOUR_TOKEN>
   ```
3. **Using Alloy:** Enable with:
   ```bash
   ENABLE_ALLOY=true
   ```
4. **Network issues:** Check:
   - Firewall rules allowing egress to OTLP endpoint
   - DNS resolution for the endpoint
   - TLS certificate validity (for https endpoints)

### High Memory Usage

**Problem:** Application memory grows over time due to telemetry.

**Solutions:**
1. Adjust batch processor settings - configure maximum queue size:
   ```python
   # Default batch size is 512 spans
   # Adjust if needed based on traffic
   ```
2. Reduce sample rate if 100% sampling is enabled:
   ```bash
   OTEL_TRACES_SAMPLER_ARG=0.1  # Sample 10% instead of 100%
   ```
3. For high-traffic services, consider:
   - Using Grafana Cloud or managed observability
   - Sampling at the application level
   - Disabling specific instrumentations if not needed

### Configuration Not Being Applied

**Problem:** Environment variables or configuration changes don't take effect.

**Solutions:**
1. Ensure variables are set **before** importing robin_commons:
   ```python
   import os
   os.environ["OTEL_SERVICE_NAME"] = "my-service"
   
   from robin_commons.telemetry import setup_observability
   setup_observability()
   ```
2. Use `.env` files with `python-dotenv`:
   ```python
   from dotenv import load_dotenv
   load_dotenv()
   ```
3. Verify variables are actually set:
   ```python
   import os
   print(os.getenv("OTEL_SERVICE_NAME"))
   ```
4. Some configuration is cached - restart the application after changing env vars.

See [docs/troubleshooting.md](docs/troubleshooting.md) for more detailed guidance and common issues.

## Contributing

Contributions are welcome! Please follow these guidelines:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Add tests for new functionality
4. Ensure all tests pass (`pytest`)
5. Commit with clear messages (`git commit -m 'Add amazing feature'`)
6. Push to your branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request

### Code Style

- Use [Black](https://github.com/psf/black) for formatting
- Follow [PEP 8](https://pep8.org/) conventions
- Add type hints to all functions
- Maintain > 90% test coverage

## License

This project is licensed under the **Apache License, Version 2.0**.

You are free to use, modify, and distribute this library in accordance
with the terms of the license. A copy of the license is available in the
[LICENSE](./LICENSE) file.

### Scope Clarification

This repository contains **open-source shared libraries** used within
the Robin ecosystem, such as common utilities, logging infrastructure,
and foundational components.

It does **not** include:
- The Robin core engine
- Agent orchestration logic
- Proprietary AI models or workflows
- Commercial SaaS infrastructure

Those components are part of Neeve’s proprietary systems and are
distributed separately under commercial terms.

### Contributions

By contributing to this repository, you agree that your contributions
will be licensed under the Apache License, Version 2.0.

## Support

- 📚 [Documentation](docs/)
- 🐛 [Issue Tracker](https://github.com/neeve-ai/robin-commons/issues)
- 💬 [Discussions](https://github.com/neeve-ai/robin-commons/discussions)

## Related Reading

- [OpenTelemetry Documentation](https://opentelemetry.io/)
- [NATS Documentation](https://docs.nats.io/)
- [Circuit Breaker Pattern](https://martinfowler.com/bliki/CircuitBreaker.html)

