Metadata-Version: 2.4
Name: graphforge
Version: 0.3.9
Summary: Composable graph tooling for analysis, construction, and refinement
Project-URL: Homepage, https://github.com/DecisionNerd/graphforge
Project-URL: Repository, https://github.com/DecisionNerd/graphforge
Project-URL: Issues, https://github.com/DecisionNerd/graphforge/issues
Author: David Spencer
License: MIT
License-File: LICENSE
Keywords: analysis,graph,opencypher,pydantic
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: defusedxml>=0.7.1
Requires-Dist: isodate>=0.6.1
Requires-Dist: lark>=1.1
Requires-Dist: msgpack>=1.0
Requires-Dist: pydantic>=2.6
Requires-Dist: python-dateutil>=2.8.2
Requires-Dist: pyyaml>=6.0.3
Provides-Extra: dev
Requires-Dist: hypothesis>=6.0; extra == 'dev'
Requires-Dist: pytest-bdd>=7.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.0; extra == 'dev'
Requires-Dist: pytest-split>=0.9.0; extra == 'dev'
Requires-Dist: pytest-timeout>=2.0; extra == 'dev'
Requires-Dist: pytest-xdist>=3.0; extra == 'dev'
Requires-Dist: pytest>=9.0.3; extra == 'dev'
Requires-Dist: ruff==0.15.12; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-autorefs>=1.0.0; extra == 'docs'
Requires-Dist: mkdocs-material>=9.5.0; extra == 'docs'
Requires-Dist: mkdocs>=1.6.0; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == 'docs'
Description-Content-Type: text/markdown

<h1 align="center">GraphForge</h1>

<p align="center">
  <a href="https://pypi.org/project/graphforge/"><img src="https://img.shields.io/pypi/v/graphforge.svg?label=PyPI&logo=pypi" alt="PyPI version" /></a>
  <a href="https://pypi.org/project/graphforge/"><img src="https://img.shields.io/pypi/dm/graphforge.svg?label=Downloads" alt="Monthly downloads" /></a>
  <a href="https://pypi.org/project/graphforge/"><img src="https://img.shields.io/pypi/pyversions/graphforge.svg?logo=python&logoColor=white" alt="Python versions" /></a>
  <a href="https://github.com/DecisionNerd/graphforge/actions"><img src="https://github.com/DecisionNerd/graphforge/workflows/Test%20Suite/badge.svg" alt="Build status" /></a>
  <a href="https://codecov.io/gh/DecisionNerd/graphforge"><img src="https://codecov.io/gh/DecisionNerd/graphforge/graph/badge.svg" alt="Coverage" /></a>
  <a href="https://github.com/DecisionNerd/graphforge/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License" /></a>
</p>

<p align="center">
  <strong>Composable graph tooling for analysis, construction, and refinement</strong>
</p>

<p align="center">
  A lightweight, embedded, openCypher-compatible graph engine for research and investigative workflows
</p>

---

## Table of Contents

- [Why GraphForge?](#why-graphforge)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Cypher Features](#cypher-features)
- [Datasets](#datasets)
- [Transactions](#transactions)
- [Architecture](#architecture)
- [Development](#development)
- [Roadmap](#roadmap)
- [License](#license)

---

## Why GraphForge?

> *We are not building a database for applications.*
> *We are building a graph execution environment for thinking.*

Modern data science and ML workflows increasingly produce graph-shaped data — entity relationships extracted by LLMs, citation networks, dependency graphs, social connections, knowledge bases. Working with this data shouldn't require running a database server. GraphForge brings the full expressiveness of the openCypher query language to the Python notebook and script environment: zero configuration, single-file persistence, and first-class Python integration.

| | NetworkX | **GraphForge** | Neo4j / Memgraph |
|:---|:---|:---|:---|
| **Setup** | `pip install` | `pip install` | Run a server |
| **Query language** | Python API | **Full openCypher** | Full Cypher |
| **Persistence** | Manual | **SQLite (automatic)** | Native |
| **Notebook-friendly** | ✓ | ✓ | Requires connection |
| **Graph size** | Millions | up to ~20M edges† | Billions |
| **TCK compliance** | N/A | **100% (3,885/3,885)** | ~100% |

**Use GraphForge for:** knowledge graphs, citation networks, research workflows, LLM output storage, social network analysis in notebooks.

**Use a production database for:** high throughput, multi-user access, or graphs beyond the limits in [Scale Limits](docs/reference/scale-limits.md).

† *Traversal queries with LIMIT scale to ~20M edges; full-scan aggregations are practical up to ~1M edges.*

### v0.3.9 — Performance Release

v0.3.9 delivers substantial performance improvements over v0.3.8: LALR(1) linear-time parsing, an O(1) property-equality index, LIMIT short-circuiting for traversal and UNWIND, a bulk ingestion API, SQLite PRAGMA tuning, and `elementId()` support. TCK compliance is maintained at **3,885/3,885 (100%)**.

See [CHANGELOG.md](CHANGELOG.md) for the full list of changes.

---

## Installation

```bash
pip install graphforge
# or
uv add graphforge
```

**Requirements:** Python 3.10–3.14

**Core dependencies:** `pydantic>=2.6`, `lark>=1.1`, `msgpack>=1.0`

---

## Quick Start

### In-memory graph

```python
from graphforge import GraphForge

db = GraphForge()

# Create nodes and relationships
db.execute("""
    CREATE (alice:Person {name: 'Alice', age: 30})
    CREATE (bob:Person {name: 'Bob', age: 25})
    CREATE (alice)-[:KNOWS {since: 2020}]->(bob)
""")

# Query the graph
results = db.execute("""
    MATCH (p:Person)-[:KNOWS]->(friend)
    WHERE p.age > 25
    RETURN p.name AS person, friend.name AS friend, p.age AS age
    ORDER BY p.age DESC
""")

for row in results:
    print(f"{row['person'].value} (age {row['age'].value}) knows {row['friend'].value}")
```

### Persistent graph

```python
# Save to SQLite
db = GraphForge("research.db")
db.execute("CREATE (:Paper {title: 'Graph Neural Networks', year: 2024})")
db.close()

# Reload later
db = GraphForge("research.db")
result = db.execute("MATCH (p:Paper) RETURN p.title AS t")
print(result[0]['t'].value)  # Graph Neural Networks
```

### Python builder API

```python
alice = db.create_node(['Person', 'Employee'], name='Alice', age=30)
bob = db.create_node(['Person'], name='Bob', age=25)
db.create_relationship(alice, bob, 'KNOWS', since=2020)
```

### Access result values

Results contain `CypherValue` objects — use `.value` to get the Python value:

```python
results = db.execute("MATCH (p:Person) RETURN p.name AS name, p.age AS age")

for row in results:
    name: str = row['name'].value
    age: int  = row['age'].value
```
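
If you prefer plain Python values end to end, a small unwrapping helper can sit on top of this. This is a hypothetical convenience function, not part of GraphForge; it relies only on the subscript access and `.value` attribute shown above, and the `_FakeValue` class below is a stand-in for demonstration:

```python
def unwrap(row, *keys):
    # Hypothetical helper (not part of GraphForge): pull the plain
    # Python value out of each CypherValue by column name.
    return {key: row[key].value for key in keys}

# Minimal stand-in for a result row, for demonstration only.
class _FakeValue:
    def __init__(self, value):
        self.value = value

row = {"name": _FakeValue("Alice"), "age": _FakeValue(30)}
print(unwrap(row, "name", "age"))  # {'name': 'Alice', 'age': 30}
```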

---

## Cypher Features

GraphForge implements the full openCypher language (100% TCK compliant as of v0.3.8).

### Clauses

```cypher
// Reading
MATCH (n:Person)-[:KNOWS]->(friend)
OPTIONAL MATCH (n)-[:WORKS_AT]->(company)
WHERE n.age > 25
WITH n, count(friend) AS friends
RETURN n.name, friends
ORDER BY friends DESC
LIMIT 10

// Writing
CREATE (n:Person {name: 'Alice'})
MERGE (n:Person {name: 'Alice'})
SET n.age = 30
REMOVE n.temp
DELETE n
DETACH DELETE n

// Iteration
UNWIND [1, 2, 3] AS x
RETURN x * 2 AS doubled

// Subqueries
MATCH (n) WHERE EXISTS { MATCH (n)-[:KNOWS]->() }
RETURN n
```

### Patterns

```cypher
(n)                                // Any node
(n:Person)                         // Node with label
(n:Person {age: 30})               // Node with property
(a)-[r:KNOWS]->(b)                 // Directed relationship
(a)-[r:KNOWS|LIKES]->(b)           // Multiple types
(a)-[*1..3]->(b)                   // Variable-length (1 to 3 hops)
(a)-[*]->(b)                       // Any length
p = (a)-[*]->(b)                   // Bind path to variable
```
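
As intuition for the variable-length patterns above, `(a)-[*1..3]->(b)` matches every node reachable from `a` in one to three directed hops. A minimal pure-Python sketch of that reachability idea (not GraphForge's actual matcher, which additionally enforces relationship uniqueness along each path):

```python
def reachable(adj, start, min_hops=1, max_hops=3):
    """Nodes reachable from `start` in min_hops..max_hops directed steps,
    mirroring the spirit of the pattern (a)-[*1..3]->(b)."""
    found = set()
    frontier = {start}
    for hop in range(1, max_hops + 1):
        # Expand the frontier by one hop.
        frontier = {m for n in frontier for m in adj.get(n, ())}
        if hop >= min_hops:
            found |= frontier
        if not frontier:
            break
    return found

adj = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
print(sorted(reachable(adj, "a", 1, 3)))  # ['b', 'c', 'd']
```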

### Functions

| Category | Functions |
|----------|-----------|
| String | `toLower`, `toUpper`, `trim`, `split`, `replace`, `substring`, `left`, `right`, `reverse`, `size` |
| Math | `abs`, `ceil`, `floor`, `round`, `sqrt`, `pow`, `exp`, `log`, `sin`, `cos`, `tan`, `pi`, `e` |
| List | `head`, `tail`, `last`, `range`, `size`, `reverse`, `sort`, `collect`, `reduce`, `filter`, `extract` |
| Aggregation | `count`, `sum`, `avg`, `min`, `max`, `collect`, `stDev`, `percentileDisc` |
| Predicate | `all`, `any`, `none`, `single`, `exists`, `isEmpty` |
| Temporal | `date`, `datetime`, `localdatetime`, `time`, `localtime`, `duration`, `now` |
| Spatial | `point`, `distance` |
| Graph | `id`, `labels`, `type`, `keys`, `properties`, `nodes`, `relationships`, `startNode`, `endNode` |
| Conversion | `toInteger`, `toFloat`, `toString`, `toBoolean`, `coalesce` |

### Temporal types (full precision)

```cypher
// Dates, times, datetimes
RETURN date('2024-01-15')
RETURN datetime('2024-01-15T14:30:00[Europe/London]')  // IANA timezone
RETURN duration('P1Y2M3DT4H5M6.789S')

// Nanosecond precision
RETURN duration('PT0.000000789S').nanoseconds  // 789

// Extreme years (outside Python's 1-9999 range)
RETURN localdatetime('+999999999-12-31T23:59:59')

// Arithmetic
RETURN date('2024-01-01') + duration('P1M')  // 2024-02-01
RETURN duration.between(date('2020-01-01'), date('2024-01-01'))
```
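
The "extreme years" example matters because Python's own `datetime` types cannot represent such values. A quick stdlib check of those bounds:

```python
from datetime import MINYEAR, MAXYEAR, date

# Python's built-in date/datetime types only cover years 1..9999,
# so a value like year +999999999 needs a representation beyond stdlib.
print(MINYEAR, MAXYEAR)  # 1 9999

# Constructing a date outside that range raises ValueError.
try:
    date(10_000, 1, 1)
except ValueError as exc:
    print(type(exc).__name__)  # ValueError
```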

---

## Datasets

Load 100+ real-world graphs instantly:

```python
from graphforge import GraphForge
from graphforge.datasets import load_dataset, list_datasets

db = GraphForge()

# Load any pre-registered dataset (auto-downloads and caches)
load_dataset(db, "snap-ego-facebook")   # Facebook ego networks (SNAP)
load_dataset(db, "ldbc-snb-sf0.1")      # Social network benchmark (LDBC)
load_dataset(db, "netrepo-karate")      # Karate club (NetworkRepository)

# Browse available datasets
for ds in list_datasets(source="snap")[:3]:
    print(f"{ds.name}: {ds.nodes:,} nodes, {ds.edges:,} edges")

# Analyze immediately
results = db.execute("""
    MATCH (n)-[r]->()
    RETURN n.id AS user, count(r) AS degree
    ORDER BY degree DESC LIMIT 5
""")
```

**Available sources:**
- **SNAP** (Stanford): 95 social, web, email, citation, and collaboration networks
- **LDBC**: 10 social network benchmark datasets with temporal data
- **NetworkRepository**: 10 pre-registered datasets

---

## Transactions

```python
db = GraphForge("graph.db")

db.begin()
try:
    db.execute("MATCH (p:Person {id: 123}) SET p.status = 'inactive'")
    db.execute("CREATE (:AuditLog {action: 'deactivate', user_id: 123})")
    db.commit()
except Exception:
    db.rollback()
    raise
finally:
    db.close()
```

---

## Architecture

GraphForge is built in four independent layers:

```
┌─────────────────────────────────────────────────┐
│  Parser         cypher.lark + parser.py         │  Cypher → AST
├─────────────────────────────────────────────────┤
│  Planner        planner.py + operators.py       │  AST → Logical plan
├─────────────────────────────────────────────────┤
│  Executor       executor.py + evaluator.py      │  Plan → Results
├─────────────────────────────────────────────────┤
│  Storage        memory.py + sqlite_backend.py   │  In-memory + SQLite
└─────────────────────────────────────────────────┘
```

Storage uses **MessagePack** for efficient binary encoding of graph properties. Persistence is a single SQLite file with WAL mode for durability.
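
"WAL mode" refers to SQLite's write-ahead-log journal, which lets readers proceed while a writer commits. A stdlib-only illustration of enabling it, independent of GraphForge's internals:

```python
import os
import sqlite3
import tempfile

# WAL requires a file-backed database; in-memory DBs use a "memory" journal.
path = os.path.join(tempfile.mkdtemp(), "example.db")
conn = sqlite3.connect(path)

# The pragma returns the journal mode now in effect.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # wal
conn.close()
```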

---

## Development

```bash
# Install with dev dependencies
uv sync --dev

# Run all checks (mirrors CI)
make pre-push

# Run tests
uv run pytest tests/unit tests/integration
uv run pytest tests/tck/ -n auto   # Full TCK (3,885 scenarios)

# Coverage
make coverage
```

---

## Roadmap

| Version | Focus | Status |
|---------|-------|--------|
| v0.3.8 | Full TCK compliance (3,885/3,885) | **Released** |
| v0.3.9 | Performance: LALR parser, property indexes, bulk ingest, SQLite tuning, LIMIT short-circuit | **Released** |
| v0.3.10 | Analytics integration: NetworkX/igraph export, parse/plan cache | Planned |
| v0.4.0 | Native SNA algorithms: PageRank, betweenness, WCC, shortest path via `CALL gf.algo.*` | Planned |
| v1.0 | Production-ready: thread safety, large graph support | Future |

See [CHANGELOG.md](CHANGELOG.md) for full release history.

---

## License

MIT © David Spencer — see [LICENSE](LICENSE) for details.

Built on [Lark](https://github.com/lark-parser/lark), [Pydantic](https://docs.pydantic.dev/), [MessagePack](https://msgpack.org/), and the [openCypher](https://opencypher.org/) specification.
