Metadata-Version: 2.4
Name: colver
Version: 0.6.2
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Games/Entertainment :: Board Games
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: numpy
Requires-Dist: matplotlib>=3.5
Requires-Dist: runpod>=1.8.1
Requires-Dist: colver[web] ; extra == 'dev'
Requires-Dist: torch>=2.0 ; extra == 'dev'
Requires-Dist: maturin>=1.0,<2.0 ; extra == 'dev'
Requires-Dist: fastapi>=0.100.0 ; extra == 'web'
Requires-Dist: uvicorn[standard]>=0.20.0 ; extra == 'web'
Requires-Dist: websockets>=12.0 ; extra == 'web'
Requires-Dist: aiosqlite>=0.20.0 ; extra == 'web'
Provides-Extra: dev
Provides-Extra: web
License-File: LICENSE
Summary: Fast Belote Contree game engine with AI agents
Keywords: belote,card-game,reinforcement-learning,mcts
License-Expression: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

<p align="center">
  <img src="https://raw.githubusercontent.com/Avo-k/colver/master/images/colver.png" alt="Colver Logo" width="200">
</p>

<p align="center">
  <a href="https://pypi.org/project/colver/"><img src="https://img.shields.io/pypi/v/colver?color=blue" alt="PyPI"></a>
  <a href="https://pypi.org/project/colver/"><img src="https://img.shields.io/pypi/pyversions/colver" alt="Python"></a>
  <a href="https://colver.net/"><img src="https://img.shields.io/badge/demo-colver.net-green" alt="Live Demo"></a>
  <a href="https://github.com/Avo-k/colver/blob/master/LICENSE"><img src="https://img.shields.io/pypi/l/colver" alt="License"></a>
</p>

# Colver

**[Lire en français](README.fr.md)**

Fast Belote Contree game environment for reinforcement learning. Rust core with Python bindings.

**Live demo: [colver.net](https://colver.net)** — running on a Raspberry Pi.

## Features

- **~1.4M rollouts/sec** single-threaded (play phase), ~895K rollouts/sec on a full deal
- **56-byte `Copy` game state** for fast MCTS cloning
- **Six AI agents** — DMC Q-network, IS-DD with belief network, DD oracle, Smart/Naive IS-MCTS, and heuristic
- **NN bidding** — "Bid V5 IS-DD", a score-aware Dueling DQN (113→512³→43) trained 25M steps on real IS-DD points, used by all agents
- **ML interpretability** — XGBoost distillation + hidden-layer probe reveal the NN's implicit scoring system, translated into human-usable rules (88-94% agreement)
- **Belief network** — NN-based card location prediction for IS-DD search
- **Web interface** — play against AI, spectate, analyze, and solve problems (FastAPI + WebSocket)
- **Python bindings** via PyO3 — `Env` class with full type stubs, installable from PyPI
- Zero dependencies in the core (only `rand` behind a feature flag)

## Web Interface

Play against AI agents directly in your browser at **[colver.net](https://colver.net)**, or run it locally:

```bash
uv run python -m colver.web
# Or: uv run colver-web
# Open http://localhost:8000
```

**Humain vs IA** — Play as South against AI opponents. Choose the agent for your opponents (East/West) and your partner (North) independently. The game follows official FFB Belote Contree rules: bidding with coinche/surcoinche, then 8 tricks. Cards are played instantly on click; the pause slider controls AI thinking delay.

![Play tab](https://raw.githubusercontent.com/Avo-k/colver/master/images/screenshots/tab-play.png)

**IA vs IA** — Spectate AI vs AI matches with all hands visible. Assign a different agent to each of the 4 seats. Step through actions, play full tricks, or use auto-play. The stats panel shows Q-values, DD scores, or hand evaluations for each decision. Paste a CFN string to load a specific position.

![Watch tab](https://raw.githubusercontent.com/Avo-k/colver/master/images/screenshots/tab-watch.png)

**Rejouer** — Browse and replay past games (played or spectated). Click an entry to step through it with navigation controls.

**Annonces** — Compose an 8-card hand, choose your position in the bidding round, and see what *Bid V5 IS-DD* (the NN bidder) would bid — with Q-values for every legal action, plus an XGBoost-distilled "key factors" panel (which features drove the decision) and a DouDou50 simulation of how often the contract actually succeeds.

![Annonces tab](https://raw.githubusercontent.com/Avo-k/colver/master/images/screenshots/tab-annonces.png)

**Annoncer** — Visual strategy guide derived from the bot via ML (per-card point weights, decision rules per position, mirror rule for defense). 88-94% agreement with the NN, memorizable in a few minutes.

**Aide** — Visual cheat sheet: card strength order and values (trump / non-trump), deal point totals, bidding rules.

**Croyances** — Visualize how the belief network and heuristic model predict card locations as a game progresses. Generate a random game, step through it, and see per-card probability bars with ground truth overlay and accuracy stats. Switch observer perspective (N/E/S/W) and compare NN vs heuristic predictions side by side.

![Croyances tab](https://raw.githubusercontent.com/Avo-k/colver/master/images/screenshots/tab-croyances.png)

**Problèmes d'annonce** — Bidding practice problems. See a hand and bidding history, then find the right bid. The AI evaluates your answer against the NN bidder's recommendation.

**Problèmes de jeu** — Card play practice problems. See a mid-game position and find the best card. Compare your choice to the DD solver's optimal play.

## Build & Run

Requires Rust 1.70+ and Python 3.10+.

```bash
# Tests (357 tests)
cargo test -p colver-core

# Performance benchmark
cargo run -p colver-core --bin bench --release

# MCTS vs random demo
cargo run -p colver-core --bin mcts_demo --release -- 100

# Smart IS-MCTS vs random + vs naive demo
cargo run -p colver-core --bin smart_ismcts_demo --release -- 100

# Python bindings (via uv)
uv sync
uv run python3 -c "import colver; env = colver.Env(); print(env.reset())"

# Web interface (play against AI)
uv run python -m colver.web

# DMC training (Q-network)
PYTHONPATH=scripts/training uv run python scripts/training/train_dmc.py --num-envs 256 --steps 20000000

# DMC evaluation vs IS-MCTS
uv run python scripts/analysis/eval_dmc.py models/dmc_final.pt --baseline smart --time-ms 20 --both-sides
```

## AI Agents

### Oracle — DD Solver (`solver.rs`)

Perfect-information double-dummy solver that sees all 4 hands — it *cheats*. Alpha-beta with transposition tables, PVS, killer moves, and card equivalence pruning. Computes the exact optimal card in ~7ms (median). Useful as an upper bound.
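The pruning idea at the heart of the solver can be sketched in a few lines. This is a minimal plain alpha-beta over an abstract toy game tree — no transposition table, PVS, killer moves, or card-equivalence pruning, and the `children`/`value` callbacks are illustrative stand-ins, not the `solver.rs` API:

```python
def alphabeta(node, alpha, beta, maximizing, children, value):
    """Plain alpha-beta search over an abstract game tree.

    `children(node)` yields successor nodes; `value(node)` scores leaves.
    """
    kids = children(node)
    if not kids:
        return value(node)
    if maximizing:
        best = float("-inf")
        for child in kids:
            best = max(best, alphabeta(child, alpha, beta, False, children, value))
            alpha = max(alpha, best)
            if alpha >= beta:  # beta cutoff: the minimizer will never allow this line
                break
        return best
    else:
        best = float("inf")
        for child in kids:
            best = min(best, alphabeta(child, alpha, beta, True, children, value))
            beta = min(beta, best)
            if beta <= alpha:  # alpha cutoff: the maximizer already has better
                break
        return best
```

The real solver layers transposition tables and move ordering on top of this skeleton, which is where most of the ~7ms median comes from.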

### Dede — IS-DD (`is_dd.rs`)

Information Set Double-Dummy search. Maintains a probabilistic belief model over hidden cards — updated after every action via hard constraints (voids, trump ceiling) and soft inference (bidding signals, play patterns). Optionally augmented with a **belief network** (NN-based card location prediction, 330→512→512→128, ~2MB). Samples plausible opponent hands weighted by these beliefs, then solves each world exactly with the alpha-beta DD solver. IS-DD sounds like "is Dede" — hence the name.
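The sample-then-solve loop can be sketched as follows. `sample_world` and `solve_dd` are hypothetical stand-ins for the belief model and the exact DD solver in `is_dd.rs`; the aggregation (average DD score per action across sampled worlds) is the part this sketch illustrates:

```python
import random
from collections import defaultdict

def is_dd_choose(legal_actions, sample_world, solve_dd, n_worlds=20, rng=None):
    """Determinized search sketch: sample plausible worlds from the belief
    model, solve each one exactly, and average per-action DD values.

    `sample_world(rng)` draws one assignment of the hidden cards;
    `solve_dd(world, action)` returns the exact DD score of `action` in `world`.
    """
    rng = rng or random.Random()
    totals = defaultdict(float)
    for _ in range(n_worlds):
        world = sample_world(rng)
        for action in legal_actions:
            totals[action] += solve_dd(world, action)
    # Best action on average over the sampled worlds.
    return max(legal_actions, key=lambda a: totals[a])
```

In the real agent the sampling is weighted by the belief model (hard constraints plus soft inference), so implausible worlds are rarely or never solved.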

### DouDou50 — DMC Q-Network (`dmc_net.rs`)

[DouZero](https://arxiv.org/abs/2106.06135)-style reinforcement learning agent. A Q-network picks card plays with a single forward pass — **no search tree**. Default play model, trained 50M steps with the NN bidder frozen (triforge play-only phase).

**Architecture**: ResNet Dueling DQN 411→1024→1024→1024→32 with LayerNorm and skip connections (~2.6M parameters). Uses canonical suit encoding (no augmentation needed). Inference in pure Rust (~1ms/decision, no PyTorch needed). Strongest overall agent.
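The dueling-head aggregation used by networks of this family combines a scalar state value with per-action advantages, Q(s, a) = V(s) + A(s, a) − meanₐ A(s, a). A minimal sketch (scalar lists, not the actual tensors or shapes in `dmc_net.rs`):

```python
def dueling_argmax(value, advantages, legal_mask):
    """Dueling-head aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a),
    then argmax restricted to legal actions.

    `value` is the scalar state value, `advantages` one float per action,
    `legal_mask` one bool per action.
    """
    mean_adv = sum(advantages) / len(advantages)
    q = [value + a - mean_adv for a in advantages]
    # Mask illegal actions so argmax can never select them.
    masked = [qi if legal else float("-inf") for qi, legal in zip(q, legal_mask)]
    return masked.index(max(masked))
```

Subtracting the mean advantage removes the degree of freedom between V and A, which is the standard identifiability trick from the dueling-DQN literature.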

The previous model **DouDou35** (415→1024³→32, legacy obs, 35M steps) is still supported for backward compatibility. *DouDou* = a reference to DouZero.

### Older search agents

**Smart IS-MCTS** (`smart_ismcts.rs`) — Belief-weighted [Information Set MCTS](https://doi.org/10.1109/TCIAIG.2012.2200894) with heuristic card beliefs. **Naive IS-MCTS** (`naive_ismcts.rs`) — Ensemble determinization without beliefs. Both are configurable and documented in [docs/SMART_ISMCTS.md](docs/SMART_ISMCTS.md).

### Bid V5 IS-DD — NN Bidder (`bid_net.rs`)

Dueling DQN **113→512→512→512→43** with score-aware v2 observation (5 precomputed match-score features: my/opp normalized, win probability, leader distance, signed diff). Trained **25M steps on real IS-DD points** (not DD oracle) with reward clipping, Polyak EMA (τ=0.005), and cosine LR decay. Best arena performance across DMC and IS-DD play (+11% DMC, +14.6% IS-DD winrate vs previous champion). `BidNet::load` auto-detects hidden size and obs_dim (108 / 110 / 113).
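The five match-score features named above might be computed along these lines. The exact formulas are not spelled out here, so every expression below (the normalization target, the linear win-probability proxy) is an illustrative guess, not the real v2 observation code:

```python
def score_features(my_score, opp_score, target=1000):
    """Plausible reconstruction of the 5 precomputed match-score features
    (my/opp normalized, win probability, leader distance, signed diff).
    All formulas here are assumptions for illustration only.
    """
    my_norm = min(my_score / target, 1.0)
    opp_norm = min(opp_score / target, 1.0)
    diff = (my_score - opp_score) / target                 # signed difference
    leader_dist = (target - max(my_score, opp_score)) / target
    win_prob = 0.5 + diff / 2                              # crude linear proxy
    return [my_norm, opp_norm, win_prob, leader_dist, diff]
```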

Past versions still supported via auto-detect:
- **Bid V3 Max** — 108-dim, trained on `max(DMC, IS-DD)` real points (20M steps)
- **Bid à Dédé** (v2) — 108-dim, DD oracle reward
- **Bid à Doudou** (v1) — 114→256² dueling, DouZero self-play

**Interpretability**: XGBoost distillation and a hidden-layer linear probe revealed that the NN's implicit scoring differs sharply from the classical hand evaluation (e.g., J atout = +11 effective, 9 = +4, A atout = +1, side A = 0 net; plus an anti-synergy J×9 = −2). Translated into a mnemonic 5-feature decision tree reaching 88-94% NN agreement — see [docs/bid/strategies/bid_v5_human_guide.md](docs/bid/strategies/bid_v5_human_guide.md) and [docs/bid/interpretability/probe_morning_report.md](docs/bid/interpretability/probe_morning_report.md).
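The effective weights quoted above can be turned into a tiny trump-suit scorer. This fragment uses only the numbers stated in this section (J = +11, 9 = +4, trump A = +1, J×9 anti-synergy = −2) and ignores the rest of the distilled rule set, so it is a simplified illustration of the guide, not a substitute for it:

```python
def effective_trump_points(has_j, has_9, has_a):
    """Score a candidate trump suit with the NN's effective weights reported
    above: J = +11, 9 = +4, trump A = +1, and a J x 9 anti-synergy of -2.
    A simplified fragment of the distilled rules, not the full human guide.
    """
    points = 0
    if has_j:
        points += 11
    if has_9:
        points += 4
    if has_a:
        points += 1
    if has_j and has_9:
        points -= 2  # holding both J and 9 counts slightly less than the sum
    return points
```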

## Agent Comparison

| Agent | Type | Speed/move | Notes |
|---|---|---|---|
| Oracle (DD) | DD solver (cheats) | ~7ms | Perfect info upper bound |
| Dede (IS-DD) | DD solver + beliefs | ~20ms | Strongest search-based |
| **DouDou50** | **Q-network (ResNet)** | **<1ms** | Strongest overall, no search |
| Smart IS-MCTS | Search + beliefs | ~9ms | Configurable budget |
| Naive IS-MCTS | Search | ~8ms | Configurable budget |

**Note**: Search-based agents get stronger with more time budget. The DMC agent uses no search — one forward pass per decision.

## Architecture

**Workspace:** `colver-core` (pure Rust) + `colver-py` (PyO3/NumPy FFI) + `colver-web` (FastAPI/WebSocket)

### Card Representation

Bitmask system: `Card = u8` (0-31), `CardSet = u32` (bitmask). Layout: Spades\[0-7\], Hearts\[8-15\], Diamonds\[16-23\], Clubs\[24-31\]. Within each suit: 7, 8, 9, J, Q, K, 10, A (plain strength order). Trump strength: J > 9 > A > 10 > K > Q > 8 > 7.
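The layout translates directly into bit arithmetic. A Python mirror of the Rust representation, following exactly the suit blocks and in-suit rank order described above:

```python
# Mirror of the Rust bitmask layout: Card = u8 (0-31), CardSet = u32.
SUITS = ["Spades", "Hearts", "Diamonds", "Clubs"]   # blocks of 8 bits each
RANKS = ["7", "8", "9", "J", "Q", "K", "10", "A"]   # plain strength order

def card(suit, rank):
    """Card index = suit block * 8 + rank position within the suit."""
    return SUITS.index(suit) * 8 + RANKS.index(rank)

def add_card(card_set, c):
    """A CardSet is a u32 bitmask; adding a card sets its bit."""
    return card_set | (1 << c)

def suit_mask(suit):
    """All 8 cards of one suit occupy one contiguous byte of the mask."""
    return 0xFF << (SUITS.index(suit) * 8)
```

One byte per suit is what makes void tracking and per-suit queries a single mask-and-test.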

### Game State

`GameState` is `Copy` and ≤96 bytes (compile-time enforced) for fast MCTS cloning. Contains hands, current trick, contract, points/tricks per team, bidding state, played cards bitmask, void tracking, and belote tracking.

### Action Encoding

| Phase | Actions | Encoding |
|---|---|---|
| Bidding | 43 total | 0=PASS, 1-36=bids (9 values x 4 suits), 37-40=capot x 4, 41=COINCHE, 42=SURCOINCHE |
| Playing | 32 total | Card index 0-31 directly |
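A decoder for bidding-phase actions, following the block boundaries in the table above. The bid-value set and the value-major ordering inside the 1-36 block are assumptions (the table only states "9 values x 4 suits"):

```python
BID_VALUES = [80, 90, 100, 110, 120, 130, 140, 150, 160]  # assumed 9-value set
SUITS = ["Spades", "Hearts", "Diamonds", "Clubs"]

def decode_bid_action(action):
    """Decode a bidding-phase action id (0-42) per the table above.

    Only the block boundaries come from the table; the value/suit ordering
    inside the 1-36 block (value-major here) is an assumption.
    """
    if action == 0:
        return "PASS"
    if 1 <= action <= 36:
        value_idx, suit_idx = divmod(action - 1, 4)
        return (BID_VALUES[value_idx], SUITS[suit_idx])
    if 37 <= action <= 40:
        return ("CAPOT", SUITS[action - 37])
    if action == 41:
        return "COINCHE"
    if action == 42:
        return "SURCOINCHE"
    raise ValueError(f"not a bidding action: {action}")
```

Play-phase actions need no decoder: the action id is the card index 0-31 itself.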

### Game Flow

Bidding → Playing → Done. Bidding ends after 3 consecutive passes, a surcoinche, or 4 passes (void deal). Playing runs 8 tricks of 4 cards. Card point total = 152; with dix de der = 162 (normal) or 252 (capot).
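The point totals above can be captured in one helper, using only the numbers stated in this section:

```python
def deal_total(capot=False):
    """Total points at stake in one deal: 152 card points + 10 for the
    dix de der, or a flat 252 when one side takes all 8 tricks (capot)."""
    CARD_POINTS = 152
    DIX_DE_DER = 10
    return 252 if capot else CARD_POINTS + DIX_DE_DER
```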

## Python API

```python
import colver

print(colver.__version__)  # "0.6.2"

# Single environment
env = colver.Env()
obs, legal_actions = env.reset()
obs, reward, done, legal_actions = env.step(action)

env.current_player()       # 0-3
env.phase()                # 0=Bidding, 1=Playing, 2=Done
env.legal_action_mask()    # numpy array (43,)
env.rewards()              # [NS_score, EW_score]
env.bid_improved()         # improved_bid action
env.deal_outcome()         # [NS_outcome, EW_outcome] binary
env.get_observation()      # 415-float observation vector
env.action_naive_ismcts(20)  # naive IS-MCTS action (20ms)
env.action_smart_ismcts(20)  # smart IS-MCTS action (20ms)

# DMC Q-network (if model weights downloaded)
model = colver.model_path()  # ~/.cache/colver/models/dmc_final.bin
if model:
    env.load_dmc_model(str(model))
    result = env.action_dmc_with_stats()  # {"best_action": 5, "q_values": [...]}
```

## Performance

| Workload | Throughput | Latency |
|---|---|---|
| Play-phase rollout | 1.4M/sec | ~720 ns |
| Full-deal rollout | 895K/sec | ~1118 ns |
| MCTS game (1000 iter) vs random | — | 8 ms |
| Smart IS-MCTS game (20x50) vs random | — | 9 ms |
| DMC Q-Network inference | — | <1 ms |

## Docker

The Docker image lets you deploy the web interface on any machine, including a Raspberry Pi (ARM64).

```bash
# Build and run
docker build -t colver .
docker run -p 8000:8000 colver

# Or with Docker Compose
docker compose up -d

# Cross-build for Raspberry Pi (ARM64)
docker buildx build --platform linux/arm64 -t colver .
```

The image is ~257 MB (no PyTorch dependency). All agents run in pure Rust and work on all architectures.

## Rules

Implements Belote Contree with 4 suits (Spades, Hearts, Diamonds, Clubs). Scoring mode: "points faits + points demandés". See `REGLES-DE-LA-BELOTE-CONTREE.pdf` for the full FFB rulebook.

## References

- Kocsis, L. & Szepesvari, C. (2006). [Bandit Based Monte-Carlo Planning](https://link.springer.com/chapter/10.1007/11871842_29). *ECML*.
- Cowling, P.I., Powley, E.J. & Whitehouse, D. (2012). [Information Set Monte Carlo Tree Search](https://doi.org/10.1109/TCIAIG.2012.2200894). *IEEE Transactions on Computational Intelligence and AI in Games*.
- Zha, D. et al. (2021). [DouZero: Mastering DouDiZhu with Self-Play Deep Reinforcement Learning](https://arxiv.org/abs/2106.06135). *ICML*.
- Auer, P., Cesa-Bianchi, N. & Fischer, P. (2002). [Finite-time Analysis of the Multiarmed Bandit Problem](https://homes.di.unimi.it/~cesabian/Pubblicazioni/ml-02.pdf). *Machine Learning*.

## Acknowledgments

Thanks to **Ronan Guillou**, seasoned coinche player, for his advice on the game and for being the first tester — his good sense guided many UI decisions.

