Metadata-Version: 2.4
Name: storage-node-env
Version: 0.13.0
Summary: Gymnasium environments for simulating energy nodes with battery energy storage systems
Author-email: Leonardo Guiducci <leonardo.guiducci@unisi.it>
License: MIT
Project-URL: Homepage, https://github.com/unisi-lab305/storage-node-environment
Project-URL: Repository, https://github.com/unisi-lab305/storage-node-environment
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: gymnasium>=0.29.0
Requires-Dist: holidays>=0.35
Requires-Dist: numpy>=1.24.0
Requires-Dist: pandas>=2.0.0
Provides-Extra: dev
Requires-Dist: pre-commit>=3.5.0; extra == "dev"
Requires-Dist: pylint>=3.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Provides-Extra: rl
Requires-Dist: sb3-contrib>=2.0.0; extra == "rl"
Requires-Dist: stable-baselines3>=2.0.0; extra == "rl"
Requires-Dist: tensorboard>=2.14.0; extra == "rl"
Provides-Extra: visualization
Requires-Dist: lttb>=0.3.0; extra == "visualization"
Requires-Dist: matplotlib>=3.5.0; extra == "visualization"
Requires-Dist: rich>=14.0.0; extra == "visualization"
Requires-Dist: seaborn>=0.12.0; extra == "visualization"
Requires-Dist: tqdm>=4.66.0; extra == "visualization"
Provides-Extra: all
Requires-Dist: pre-commit>=3.5.0; extra == "all"
Requires-Dist: pylint>=3.0.0; extra == "all"
Requires-Dist: pytest-cov>=4.0.0; extra == "all"
Requires-Dist: pytest>=7.4.0; extra == "all"
Requires-Dist: ruff>=0.1.0; extra == "all"
Requires-Dist: sb3-contrib>=2.0.0; extra == "all"
Requires-Dist: stable-baselines3>=2.0.0; extra == "all"
Requires-Dist: tensorboard>=2.14.0; extra == "all"
Requires-Dist: lttb>=0.3.0; extra == "all"
Requires-Dist: matplotlib>=3.5.0; extra == "all"
Requires-Dist: rich>=14.0.0; extra == "all"
Requires-Dist: seaborn>=0.12.0; extra == "all"
Requires-Dist: tqdm>=4.66.0; extra == "all"
Dynamic: license-file

# StorageNode Environment

Gymnasium environment for simulating an energy node with a battery energy storage system (BESS). It provides physics-based battery modeling from commercial datasheet parameters, targeted at reinforcement learning applications.

[![Python 3.12](https://img.shields.io/badge/python-3.12-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
![CI Tests](https://github.com/unisi-lab305/storage-node-environment/workflows/CI%20Tests/badge.svg)

## Features

- **Gymnasium-compatible environment** registered as `storage_node_env/EnergyStorage-v0`
- **Physics-based battery modeling** with commercial datasheet parameters
- **Two energy node types**: Producer (production only) and Prosumer (production + consumption)
- **Modular reward system** for different optimization objectives (`self_consumption`, `economic`, `self_consumption_delta`)
- **Rule-based controllers** for baseline comparison
- **Flexible observation space** with optional preprocessing and cyclical encoding

## Installation

### From Source (Development Mode)

```bash
git clone https://github.com/unisi-lab305/storage-node-environment.git
cd storage-node-environment
pip install -e .
```

### From PyPI (When Published)

```bash
pip install storage-node-env
```

The environment is automatically registered with Gymnasium on import and can be instantiated using `gym.make()`.

## Quick Start

### Method 1: Using gym.make() (Recommended)

```python
import gymnasium as gym
import storage_node_env  # Trigger environment registration

# Battery configuration
battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

# Create environment
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
)

# Run simulation
obs, info = env.reset(seed=42)
for step in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        break

env.close()
```

### Method 2: Direct Import (Backward Compatible)

```python
from storage_node_env.gym import EnergyStorageEnv

battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

env = EnergyStorageEnv(
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
)

obs, info = env.reset()
# ... same usage as above
```

**Note:** The `gym.make()` approach is recommended as it follows standard Gymnasium conventions and ensures compatibility with Gymnasium ecosystem tools.

## Environment Parameters

| Parameter | Type | Default | Required | Description |
|-----------|------|---------|----------|-------------|
| `node_type` | `str` | - | **Yes** | Type of energy node: `'producer'` or `'prosumer'` |
| `csv_path` | `str` | - | **Yes** | Path to CSV file with historical data |
| `battery_config` | `dict[str, float]` | - | **Yes** | Dictionary with battery parameters (see below) |
| `delta_t` | `float` | - | **Yes** | Timestep duration in hours (e.g., 1.0, 0.25) |
| `lookback_n` | `int` | `2` | No | Number of historical timesteps in observation buffer |
| `num_actions` | `int` | `21` | No | Number of discrete actions (must be odd) |
| `use_preprocessing` | `bool` | `False` | No | Enable observation preprocessing (cyclical encoding, normalization) |
| `add_holiday` | `bool` | `True` | No | Add Italian holiday feature (requires `use_preprocessing=True`) |
| `reward_settings` | `dict \| None` | `None` | No | Reward configuration (see Reward System section) |
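
`num_actions` must be odd so that the middle index can serve as a neutral (idle) action. A plausible linear mapping from a discrete action index to a battery power setpoint is sketched below; this is illustrative only, not necessarily the package's exact implementation (the function name and mapping are assumptions):

```python
def action_to_power(action: int, num_actions: int,
                    p_charge_max: float, p_discharge_max: float) -> float:
    """Map a discrete action index to a power setpoint in kW (sketch).

    The middle index is neutral; indices above it charge, indices
    below it discharge. Positive = charging, negative = discharging.
    """
    mid = (num_actions - 1) // 2
    frac = (action - mid) / mid              # fraction in [-1, 1]
    if frac >= 0:
        return frac * p_charge_max           # charging side
    return frac * p_discharge_max            # discharging side
```

With `num_actions=21`, index 10 is idle, index 20 requests full charge power, and index 0 requests full discharge power.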

### CSV Data Requirements

The CSV file must contain a `datetime` column and node-specific columns:

**For Producer Nodes:**

- `datetime`: Timestamp (e.g., `'2024-01-15 00:00:00'`)
- `production`: Power produced in kW
- `buy_price`: Grid purchase price in €/kWh
- `sell_price`: Grid selling price in €/kWh

**For Prosumer Nodes:**

- `datetime`: Timestamp
- `production`: Power produced in kW
- `consumption`: Power consumed by loads in kW
- `buy_price`: Grid purchase price in €/kWh
- `sell_price`: Grid selling price in €/kWh

**Important:** The `delta_t` parameter must match the frequency of your CSV data (e.g., `delta_t=1.0` for hourly data, `delta_t=0.25` for 15-minute data).
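
A quick sanity check, sketched with pandas, that the `delta_t` you pass matches the spacing of the CSV's `datetime` column (the helper name is an illustration, not part of the package):

```python
import pandas as pd

def infer_delta_t(csv_path: str) -> float:
    """Return the timestep of the CSV's datetime column in hours.

    Raises ValueError if timestamps are not evenly spaced.
    """
    df = pd.read_csv(csv_path, parse_dates=['datetime'])
    diffs = df['datetime'].diff().dropna()
    if diffs.nunique() != 1:
        raise ValueError('datetime column is not evenly spaced')
    return diffs.iloc[0].total_seconds() / 3600.0
```

Hourly data yields `1.0` and 15-minute data yields `0.25`, matching the `delta_t` values in the examples.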

## Battery Configuration

The `battery_config` dictionary contains physical parameters for battery simulation based on commercial datasheets.

### Parameters

| Parameter | Type | Required | Valid Range | Units | Description |
|-----------|------|----------|-------------|-------|-------------|
| `capacity` | `float` | **Yes** | > 0 | kWh | Nominal capacity (C_nom) |
| `dod_max` | `float` | **Yes** | 0 < x ≤ 100 | % | Maximum depth of discharge |
| `power_charge_max` | `float` | **Yes** | > 0 | kW | Maximum charging power |
| `power_discharge_max` | `float` | **Yes** | > 0 | kW | Maximum discharging power |
| `efficiency_charge` | `float` | **Yes** | 0 < x ≤ 1 | - | Charging efficiency (e.g., 0.95 for 95%) |
| `efficiency_discharge` | `float` | **Yes** | 0 < x ≤ 1 | - | Discharging efficiency (e.g., 0.95 for 95%) |
| `alpha` | `float` | No | 0 ≤ x < 1 | - | Parasitic loss coefficient (default: 0.0) |
| `soc_initial` | `float \| None` | No | C_min ≤ x ≤ C_max | kWh | Initial state of charge (default: 50% capacity) |
| `allow_arbitrage` | `bool` | No | `True` / `False` | - | If `False`, charging is capped at current PV production each timestep — battery cannot charge from the grid. Compatible with all reward types and controllers. (default: `True`) |
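
The `allow_arbitrage=False` rule described in the table amounts to a one-line cap on charging requests (an illustrative sketch; the environment enforces this internally, and the function name is an assumption):

```python
def apply_arbitrage_cap(power_request: float, production: float,
                        allow_arbitrage: bool) -> float:
    """Cap a charging request at current PV production when arbitrage
    is disabled. Positive power = charging, negative = discharging;
    discharging is never capped by this rule.
    """
    if allow_arbitrage or power_request <= 0:
        return power_request
    return min(power_request, production)
```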

### Physical Meaning

- **Capacity**: Total energy storage when fully charged
- **DoD (Depth of Discharge)**: Usable capacity percentage (e.g., 90% DoD means 90% of nominal capacity is usable)
- **Power limits**: C-rate constraints from battery datasheet (separate for charge/discharge)
- **Efficiency**: Round-trip energy losses during charge/discharge operations (separate for each direction)
- **Alpha**: Standby consumption per timestep (e.g., 0.001 = 0.1% loss per timestep)
- **SoC initial**: Starting energy level in kWh (if `None`, starts at 50% of nominal capacity)
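
One plausible discrete-time SoC update consistent with the parameters above (a sketch of the physics only; the environment's actual equations may differ):

```python
def soc_update(soc: float, power: float, delta_t: float,
               capacity: float, dod_max: float,
               eta_ch: float, eta_dis: float, alpha: float = 0.0) -> float:
    """Advance the state of charge (kWh) by one timestep (sketch).

    power > 0 charges (efficiency losses reduce the energy stored),
    power < 0 discharges (losses increase the energy drawn from the
    battery). The result is clipped to the usable range [C_min, C_max].
    """
    c_min = (1 - dod_max / 100) * capacity
    soc = soc * (1 - alpha)                  # parasitic standby loss
    if power >= 0:
        soc += eta_ch * power * delta_t      # charging
    else:
        soc += power * delta_t / eta_dis     # discharging
    return min(max(soc, c_min), capacity)
```

For example, charging at 2.5 kW for one hour at 95% efficiency stores 2.375 kWh.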

### Power Convention

- **Positive power** = charging (battery absorbs energy from the grid)
- **Negative power** = discharging (battery releases energy to the grid)

### Example Configuration

**Typical values based on ZCS AZZURRO HV ZBT 5K battery:**

```python
battery_config = {
    'capacity': 5.12,                    # 5.12 kWh nominal capacity
    'dod_max': 90,                       # 90% depth of discharge
    'power_charge_max': 2.5,             # 2.5 kW maximum charging power
    'power_discharge_max': 2.5,          # 2.5 kW maximum discharging power
    'efficiency_charge': 0.95,           # 95% charging efficiency
    'efficiency_discharge': 0.95,        # 95% discharging efficiency
    'alpha': 0.0,                        # No parasitic losses (optional)
    'soc_initial': 2.56                  # Start at 50% SoC (optional)
}
```

**Derived parameters (computed automatically):**

- `C_min = (1 - dod_max/100) × capacity` → Minimum usable SoC (kWh)
- `C_max = capacity` → Maximum usable SoC (kWh)
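
Worked through for the ZCS AZZURRO example above:

```python
capacity, dod_max = 5.12, 90
c_min = (1 - dod_max / 100) * capacity   # minimum usable SoC → 0.512 kWh
c_max = capacity                         # maximum usable SoC → 5.12 kWh
```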

## Reward System

The environment provides a **modular reward system** supporting different optimization objectives through configurable reward calculators.

### Available Reward Types

| Reward Type | Description | Best For | Suitable Node Types |
| --- | --- | --- | --- |
| `'self_consumption'` | Maximize local energy consumption, minimize grid dependency | Prosumer nodes optimizing grid independence | `['prosumer']` |
| `'economic'` | Maximize profit / minimize cost based on net economic outcome | Economic optimization, price-responsive agents | `['producer', 'prosumer']` |
| `'self_consumption_delta'` | Dense, zero-centred signal: improvement over no-battery baseline. Positive when battery helps, negative when it hurts | Prosumers where natural self-consumption is already high (sparse gradient with `'self_consumption'`) | `['prosumer']` |

### Configuration Structure

```python
reward_settings = {
    'type': str,                 # Required: 'self_consumption', 'economic', or 'self_consumption_delta'
    'weights': dict[str, float], # Optional: weight coefficients
    'normalize': bool            # Optional: normalize rewards (default: False)
}
```

### Weight Parameters

| Weight Key | Default | Description |
|------------|---------|-------------|
| `'main'` | `1.0` | Weight for main reward component |
| `'violation_penalty'` | `0.5` | Weight for power constraint violation penalty (normalised to ≈ [0, 1]) |
| `'storage_usage_penalty'` | `0.0` | Weight for battery usage/wear penalty (disabled by default) |

### Reward Composition

The total reward is a weighted linear combination:

```text
total_reward = (weights['main'] × R_main)
             - (weights['violation_penalty'] × P_violation)
             - (weights['storage_usage_penalty'] × P_usage)
```

Where:

- `R_main`: Main reward component (implementation-specific, typically ∈ [0, 1] or ≈ [-1, 1])
- `P_violation`: Power constraint violation normalised by battery max charge power (`|kW| / P_cha_max`) → ≈ [0, 1]
- `P_usage`: Battery usage penalty (absolute SoC change in percentage points)

### Configuration Examples

#### 1. Default (Automatic Selection)

If `reward_settings=None`, the environment automatically selects:

- **Prosumer nodes** → `'self_consumption'`
- **Producer nodes** → `'economic'`

```python
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
    # No reward_settings → uses 'self_consumption' by default
)
```

#### 2. Minimal Configuration

Specify only the reward type, use default weights:

```python
reward_settings = {
    'type': 'economic'
    # 'weights' will use defaults from registry
    # 'normalize' will default to False
}
```

#### 3. Balanced Strategy

Moderate optimization with constraint awareness:

```python
reward_settings = {
    'type': 'self_consumption',
    'weights': {
        'main': 1.0,
        'violation_penalty': 0.5,
        'storage_usage_penalty': 0.1
    },
    'normalize': False
}
```

#### 4. Aggressive Optimization

High main weight, low penalties (may violate constraints):

```python
reward_settings = {
    'type': 'economic',
    'weights': {
        'main': 10.0,              # Strong economic signal
        'violation_penalty': 0.1,   # Allow some violations
        'storage_usage_penalty': 0.01  # Minimal wear penalty
    }
}
```

#### 5. Conservative Strategy

High penalties for strict constraint adherence:

```python
reward_settings = {
    'type': 'self_consumption',
    'weights': {
        'main': 1.0,
        'violation_penalty': 5.0,    # Strict constraint adherence
        'storage_usage_penalty': 1.0  # Discourage battery cycling
    }
}
```

#### 6. Dense Delta Reward

Zero-centred per-step signal based on improvement over the idle-battery baseline:

```python
reward_settings = {
    'type': 'self_consumption_delta',
    'weights': {
        'main': 1.0,
        'violation_penalty': 0.5,
        'storage_usage_penalty': 0.0
    }
}

env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    reward_settings=reward_settings
)
```

### Choosing Reward Type

| Node Type | Primary Goal | Recommended Reward |
| --- | --- | --- |
| Prosumer | Minimize grid dependency | `'self_consumption'` |
| Prosumer | Minimize costs | `'economic'` |
| Prosumer | Dense training signal (high baseline sc) | `'self_consumption_delta'` |
| Producer | Maximize profit | `'economic'` |

### Reward Normalization

By default, rewards are **raw (unnormalized)** for interpretability and Stable-Baselines3 compatibility.

#### Option 1: SB3 VecNormalize (Recommended)

```python
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

env = gym.make('storage_node_env/EnergyStorage-v0', ...)
env = DummyVecEnv([lambda: env])
env = VecNormalize(
    env,
    norm_obs=False,      # Disable observation normalization
    norm_reward=True,    # Enable reward normalization
    clip_reward=10.0,
    gamma=0.99
)
```

#### Option 2: Built-in Normalization

```python
reward_settings = {
    'type': 'self_consumption',
    'normalize': True  # Enable built-in normalization
}
```

## Rule-Based Controllers

The environment includes **rule-based controllers** that serve as baselines for comparing reinforcement learning agents. These controllers implement fixed decision rules.

**Two usage patterns:**

1. **Direct node evaluation** (recommended for standalone RBC testing): Use controllers with energy node classes (Battery + Producer/Prosumer)
2. **Gymnasium environment evaluation** (v0.4.0+, for RBC vs RL comparison): use the `get_controller_observation()` method to evaluate controllers on Gymnasium environments

### Available Controllers

| Controller | Policy | Use Case | Parameters |
|------------|--------|----------|------------|
| `NaiveController` | Always neutral action (no battery control) | Baseline to measure value of any control strategy | `num_actions` |
| `PriceBasedController` | Energy arbitrage based on electricity prices (charge at low prices, discharge at high prices) | Producer nodes or prosumers with time-of-use tariffs | `num_actions`, `window_size`, `charge_action_pct`, `discharge_action_pct` |
| `SelfConsumptionController` | Maximize local self-consumption (charge during excess production, discharge during deficit) | Prosumer nodes optimizing for grid independence | `num_actions`, `balance_threshold` |

### Usage Example

Controllers are used with Node classes (Producer/Prosumer), not with the Gymnasium environment:

```python
from storage_node_env.core import Prosumer, Battery
from storage_node_env.gym.controllers import SelfConsumptionController

# Create battery and node
battery = Battery(
    capacity=30.0,
    dod_max=90,
    power_charge_max=10.0,
    power_discharge_max=10.0,
    efficiency_charge=0.95,
    efficiency_discharge=0.95
)

node = Prosumer(
    csv_path='dataset/1h/prosumer_test_data.csv',
    delta_t=1.0,
    num_actions=21
)
node.set_storage(battery)
node.reset()

# Create controller
controller = SelfConsumptionController(num_actions=21, balance_threshold=0.5)

# Evaluation loop
total_cost = 0.0
for t in range(len(node.data) - 2):
    # Get current data
    current_row = node.data.iloc[node.time_step]

    # Build observation dictionary for controller
    observation = {
        'production': current_row['production'],
        'consumption': current_row['consumption'],
        'buy_price': current_row['buy_price'],
        'sell_price': current_row['sell_price'],
        'energy_balance': current_row['production'] - current_row['consumption'],
        'final_soc': battery.soc_percent,
        'upper_bound': battery.get_bounds_percent(node.delta_t)[0],
        'lower_bound': battery.get_bounds_percent(node.delta_t)[1]
    }

    # Get action from controller
    action = controller.choose_action(observation, {})

    # Step node
    node_results = node.step(action)
    total_cost += node_results['net_cost']

    # Advance time
    node.advance_time()

print(f'Total cost: {total_cost:.4f} €')
```

### Evaluating Controllers on Gymnasium Environment (v0.4.0+)

**NEW**: For comparing rule-based controllers against RL agents on the same environment:

```python
from typing import cast
import gymnasium as gym
from storage_node_env.gym import EnergyStorageEnv
from storage_node_env.gym.controllers import SelfConsumptionController

# Create environment
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
)

# Access unwrapped environment for custom methods
gym_env = cast(EnergyStorageEnv, env.unwrapped)

# Create controller
controller = SelfConsumptionController(num_actions=21)

# Evaluation loop
obs, info = env.reset(seed=42)
total_cost = 0.0

while True:
    # Get controller observation from unwrapped environment
    controller_obs = gym_env.get_controller_observation()
    action = controller.choose_action(controller_obs, {})

    obs, reward, terminated, truncated, info = env.step(action)
    total_cost += info['net_cost']

    if terminated or truncated:
        break

print(f'Total cost: {total_cost:.4f} €')
env.close()
```

**Benefits:**

- ✅ RBC and RL agents see identical data
- ✅ Works with Gym wrappers (VecEnv, Monitor)
- ✅ Type-safe API with `get_controller_observation()`

### Instantiation Examples

```python
from storage_node_env.gym.controllers import (
    NaiveController,
    PriceBasedController,
    SelfConsumptionController
)

# 1. Naive controller (baseline)
naive = NaiveController(num_actions=21)

# 2. Price-based controller (energy arbitrage)
price_based = PriceBasedController(
    num_actions=21,
    window_size=168,           # 1 week rolling window
    charge_action_pct=75.0,    # 50% charge power
    discharge_action_pct=25.0  # 50% discharge power
)
price_based.reset()  # Reset before each episode

# 3. Self-consumption controller
self_consumption = SelfConsumptionController(
    num_actions=21,
    balance_threshold=0.5  # Minimum 0.5 kW imbalance to act
)
```

### Utility Functions

```python
from storage_node_env.gym.controllers import list_controllers, print_controllers

# List available controllers
controllers_info = list_controllers()
# Returns: {'NaiveController': 'description...', 'PriceBasedController': ...}

# Print formatted information
print_controllers()
```

## Complete Examples

### Example 1: Prosumer with Preprocessing

```python
import gymnasium as gym
import storage_node_env

battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

reward_settings = {
    'type': 'self_consumption',
    'weights': {
        'main': 1.0,
        'violation_penalty': 0.5,
        'storage_usage_penalty': 0.0
    }
}

env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    lookback_n=2,
    use_preprocessing=True,    # Enable cyclical encoding
    add_holiday=True,          # Add holiday feature
    reward_settings=reward_settings
)

obs, info = env.reset(seed=42)
for step in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    print(f'Step {step+1}: reward={reward:.4f}, net_cost={info["net_cost"]:.4f} €')

    if terminated or truncated:
        break

env.close()
```

### Example 2: Producer with Energy Arbitrage

```python
import gymnasium as gym
import storage_node_env

battery_config = {
    'capacity': 30.0,
    'dod_max': 90,
    'power_charge_max': 10.0,
    'power_discharge_max': 10.0,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

reward_settings = {
    'type': 'economic',
    'weights': {
        'main': 100.0,             # Amplify economic signal
        'violation_penalty': 10.0,
        'storage_usage_penalty': 1.0
    }
}

env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='producer',
    csv_path='dataset/1h/producer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    reward_settings=reward_settings
)

obs, info = env.reset()
for step in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    print(f'Step {step+1}: reward={reward:.4f}, net_profit={info["net_profit"]:.4f} €')

    if terminated or truncated:
        break

env.close()
```

### Example 3: Training with Stable-Baselines3

```python
import gymnasium as gym
import storage_node_env
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

# Create environment
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    use_preprocessing=True
)

# Wrap in vectorized environment and normalize rewards
env = DummyVecEnv([lambda: env])
env = VecNormalize(env, norm_obs=False, norm_reward=True, clip_reward=10.0)

# Train PPO agent
model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=100000)

# Save model
model.save('ppo_prosumer')
```

### Example 4: Parallel Training with SubprocVecEnv

`SubprocVecEnv` spawns each environment in a separate subprocess, enabling true CPU parallelism. Each worker runs an independent episode — all reading the same CSV but with different random seeds — so data collection scales with the number of available cores.

```python
import gymnasium as gym
import storage_node_env
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv, VecNormalize
from stable_baselines3.common.utils import set_random_seed

def make_env(csv_path: str, battery_config: dict, rank: int, seed: int = 0):
    """Factory that creates one environment instance for a subprocess worker."""
    def _init():
        env = gym.make(
            'storage_node_env/EnergyStorage-v0',
            node_type='prosumer',
            csv_path=csv_path,
            battery_config=battery_config,
            delta_t=1.0,
            use_preprocessing=True,
            reward_settings={'type': 'self_consumption_delta'}
        )
        env.reset(seed=seed + rank)
        return env
    set_random_seed(seed + rank)
    return _init

battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

N_ENVS = 4  # number of parallel workers (tune to CPU core count)
CSV_PATH = 'dataset/1h/prosumer_test_data.csv'

# Create vectorised environment with one subprocess per worker
vec_env = SubprocVecEnv(
    [make_env(CSV_PATH, battery_config, rank=i) for i in range(N_ENVS)]
)

# Normalise rewards online across all workers
vec_env = VecNormalize(vec_env, norm_obs=False, norm_reward=True, clip_reward=10.0)

# Train — SB3 collects N_ENVS steps per rollout step automatically
model = PPO('MlpPolicy', vec_env, verbose=1, n_steps=512, batch_size=128)
model.learn(total_timesteps=500_000)

model.save('ppo_prosumer_parallel')
vec_env.save('vec_normalize.pkl')
vec_env.close()
```

**When to use `SubprocVecEnv` vs `DummyVecEnv`:**

| | `DummyVecEnv` | `SubprocVecEnv` |
| --- | --- | --- |
| Parallelism | Sequential (single process) | True multiprocess (one core per env) |
| Overhead | Minimal | IPC serialisation per step |
| Best for | Debugging, fast envs, < 4 cores | Long rollouts, many cores, slow envs |
| Usage | Drop-in replacement | Replace `DummyVecEnv` with `SubprocVecEnv` |

### Example 5: Multiple Independent Environments for Evaluation

Run several independent evaluation episodes in parallel and aggregate metrics:

```python
import numpy as np
import gymnasium as gym
import storage_node_env
from stable_baselines3.common.vec_env import SubprocVecEnv

def make_eval_env(csv_path: str, battery_config: dict, rank: int):
    def _init():
        return gym.make(
            'storage_node_env/EnergyStorage-v0',
            node_type='prosumer',
            csv_path=csv_path,
            battery_config=battery_config,
            delta_t=1.0,
            use_preprocessing=True,
        )
    return _init

# Reuse the battery configuration from the earlier examples
battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

N_EVAL = 4
vec_env = SubprocVecEnv(
    [make_eval_env('dataset/1h/prosumer_test_data.csv', battery_config, i)
     for i in range(N_EVAL)]
)

obs = vec_env.reset()
episode_costs = np.zeros(N_EVAL)
done_flags = np.zeros(N_EVAL, dtype=bool)

while not done_flags.all():
    # VecEnv.step expects one action per worker; replace with model.predict(obs)
    actions = np.array([vec_env.action_space.sample() for _ in range(N_EVAL)])
    obs, rewards, dones, infos = vec_env.step(actions)
    for i, info in enumerate(infos):
        if not done_flags[i]:
            episode_costs[i] += info.get('net_cost', 0.0)
    done_flags |= dones

print(f'Mean episode cost across {N_EVAL} workers: {episode_costs.mean():.4f} €')
vec_env.close()
```

## Project Structure

```txt
storage_node_env/
├── core/                    # Core simulation components
│   ├── base/                # Abstract base classes
│   ├── storage/             # Battery implementation
│   └── nodes/               # Energy node implementations (Producer, Prosumer)
├── gym/                       # Gymnasium integration
│   ├── energy_storage_env.py  # Main environment class
│   ├── utils.py               # Observation building utilities
│   ├── preprocessing/         # Feature encoding and preprocessing
│   ├── rewards/               # Modular reward system
│   └── controllers/           # Rule-based baseline controllers
└── __init__.py                # Package initialization and version info
```

## Documentation

- **[REWARD_SYSTEM.md](storage_node_env/gym/rewards/REWARD_SYSTEM.md)**: Detailed reward system documentation
- **[CONTROLLERS.md](storage_node_env/gym/controllers/CONTROLLERS.md)**: Detailed rule-based controller documentation

## Repository

- **GitHub**: [https://github.com/unisi-lab305/storage-node-environment](https://github.com/unisi-lab305/storage-node-environment)
- **License**: MIT

## Citation

If you use this environment in your research, please cite:

```bibtex
@software{storage_node_env,
  title = {Storage Node Environment: Gymnasium Environment for Battery Energy Storage Systems},
  author = {Leonardo Guiducci},
  year = {2025},
  url = {https://github.com/unisi-lab305/storage-node-environment}
}
```

## Contributing

Contributions are welcome! Please see [CLAUDE.md](CLAUDE.md) for development guidelines and coding standards.

## License

This project is licensed under the MIT License - see the LICENSE file for details.
