Metadata-Version: 2.4
Name: rdf4j-python
Version: 0.2.2
Summary: The Python client for RDF4J
Author-email: Chengxu Bian <cbian564@gmail.com>
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx>=0.28.1
Requires-Dist: pyoxigraph>=0.4.10
Provides-Extra: sparqlwrapper
Requires-Dist: sparqlwrapper>=2.0.0; extra == "sparqlwrapper"
Dynamic: license-file

# rdf4j-python

[![PyPI version](https://badge.fury.io/py/rdf4j-python.svg)](https://badge.fury.io/py/rdf4j-python)
[![Python Versions](https://img.shields.io/pypi/pyversions/rdf4j-python.svg)](https://pypi.org/project/rdf4j-python/)
[![CI](https://github.com/odysa/rdf4j-python/actions/workflows/ci.yaml/badge.svg)](https://github.com/odysa/rdf4j-python/actions/workflows/ci.yaml)
[![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
[![Documentation](https://img.shields.io/badge/docs-sphinx-blue.svg)](https://github.com/odysa/rdf4j-python/tree/main/docs)

**A modern Python client for the Eclipse RDF4J framework, enabling seamless RDF data management and SPARQL operations from Python applications.**

rdf4j-python bridges the gap between Python and the robust [Eclipse RDF4J](https://rdf4j.org/) ecosystem, providing a clean, async-first API for managing RDF repositories, executing SPARQL queries, and handling semantic data with ease.

## Features

- **Async-First Design**: Native support for async/await with synchronous fallback
- **Repository Management**: Create, access, and manage RDF4J repositories programmatically
- **SPARQL Support**: Execute SELECT, ASK, CONSTRUCT, and UPDATE queries effortlessly
- **SPARQL Query Builder**: Fluent, programmatic query construction with method chaining
- **Transaction Support**: Atomic operations with commit/rollback and isolation levels
- **Flexible Data Handling**: Add, retrieve, and manipulate RDF triples and quads
- **File Upload**: Upload RDF files (Turtle, N-Triples, N-Quads, RDF/XML, JSON-LD, TriG, N3) directly to repositories
- **Multiple Formats**: Support for various RDF serialization formats
- **Repository Types**: Memory stores, native stores, HTTP repositories, and more
- **Named Graph Support**: Work with multiple graphs within repositories
- **Inferencing**: Built-in support for RDFS and custom inferencing rules

## Installation

### Prerequisites

- Python 3.11 or higher
- RDF4J Server (for remote repositories) or embedded usage

### Install from PyPI

```bash
pip install rdf4j-python
```

### Install with Optional Dependencies

```bash
# Include SPARQLWrapper integration
pip install rdf4j-python[sparqlwrapper]
```

### Development Installation

```bash
git clone https://github.com/odysa/rdf4j-python.git
cd rdf4j-python
uv sync --group dev
```

## Usage

### Quick Start

```python
import asyncio
from rdf4j_python import AsyncRdf4j
from rdf4j_python.model.repository_config import RepositoryConfig, MemoryStoreConfig, SailRepositoryConfig
from rdf4j_python.model.term import IRI, Literal

async def main():
    # Connect to RDF4J server
    async with AsyncRdf4j("http://localhost:19780/rdf4j-server") as db:
        # Create an in-memory repository
        config = RepositoryConfig(
            repo_id="my-repo",
            title="My Repository",
            impl=SailRepositoryConfig(sail_impl=MemoryStoreConfig(persist=False))
        )
        repo = await db.create_repository(config=config)
        
        # Add some data
        await repo.add_statement(
            IRI("http://example.com/person/alice"),
            IRI("http://xmlns.com/foaf/0.1/name"),
            Literal("Alice")
        )
        
        # Query the data
        results = await repo.query("SELECT * WHERE { ?s ?p ?o }")
        for result in results:
            print(f"Subject: {result['s']}, Predicate: {result['p']}, Object: {result['o']}")

if __name__ == "__main__":
    asyncio.run(main())
```

### SPARQL Query Builder

Build queries programmatically with method chaining instead of writing raw SPARQL strings:

```python
from rdf4j_python import select, ask, construct, describe, GraphPattern, Namespace

ex = Namespace("ex", "http://example.org/")
foaf = Namespace("foaf", "http://xmlns.com/foaf/0.1/")

# SELECT with typed terms — IRIs serialize automatically
query = (
    select("?person", "?name")
    .where("?person", "a", ex.Person)
    .where("?person", foaf.name, "?name")
    .optional("?person", foaf.email, "?email")
    .filter("?name != 'Bob'")
    .order_by("?name")
    .limit(10)
    .build()
)

# Or use string-based prefixed names
query = (
    select("?name")
    .prefix("foaf", "http://xmlns.com/foaf/0.1/")
    .where("?person", "a", "foaf:Person")
    .where("?person", "foaf:name", "?name")
    .build()
)

# GROUP BY with aggregation
query = (
    select("?city", "(COUNT(?person) AS ?count)")
    .where("?person", ex.city, "?city")
    .group_by("?city")
    .having("COUNT(?person) > 1")
    .order_by("DESC(?count)")
    .build()
)

# ASK, CONSTRUCT, and DESCRIBE
ask_query = ask().where("?s", ex.name, "?name").build()

construct_query = (
    construct(("?s", ex.fullName, "?name"))
    .where("?s", ex.firstName, "?fname")
    .where("?s", ex.lastName, "?lname")  # ?lname must be bound before CONCAT uses it
    .bind("CONCAT(?fname, ' ', ?lname)", "?name")
    .build()
)

describe_query = describe(ex.alice).build()
```

The query builder supports FILTER, OPTIONAL, UNION, BIND, VALUES, sub-queries, DISTINCT, ORDER BY, GROUP BY, HAVING, LIMIT, and OFFSET. Both raw strings and typed objects (`IRI`, `Variable`, `Literal`, `Namespace`) work as terms.
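
For comparison, here is a sketch of the hand-written SPARQL that the prefixed-name and GROUP BY examples above are meant to express. These strings are written by hand for illustration; the exact text returned by `.build()` is an assumption not verified here and may differ in layout.

```python
# Hand-written SPARQL equivalents of two builder examples.
# The builder's actual output may differ in whitespace and ordering of clauses' text.

PREFIXED_NAME_QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE {
  ?person a foaf:Person .
  ?person foaf:name ?name .
}
""".strip()

GROUP_BY_QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?city (COUNT(?person) AS ?count)
WHERE {
  ?person ex:city ?city .
}
GROUP BY ?city
HAVING (COUNT(?person) > 1)
ORDER BY DESC(?count)
""".strip()

print(PREFIXED_NAME_QUERY)
print()
print(GROUP_BY_QUERY)
```

Either string can be passed to `repo.query(...)` in the same way as the raw query in the Quick Start.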

### Working with Multiple Graphs

```python
from rdf4j_python import AsyncRdf4j
from rdf4j_python.model.term import IRI, Literal, Quad

async def multi_graph_example():
    async with AsyncRdf4j("http://localhost:19780/rdf4j-server") as db:
        repo = await db.get_repository("my-repo")
        
        # Add data to specific graphs
        statements = [
            Quad(
                IRI("http://example.com/person/bob"),
                IRI("http://xmlns.com/foaf/0.1/name"),
                Literal("Bob"),
                IRI("http://example.com/graph/people")
            ),
            Quad(
                IRI("http://example.com/person/bob"),
                IRI("http://xmlns.com/foaf/0.1/age"),
                Literal("30", datatype=IRI("http://www.w3.org/2001/XMLSchema#integer")),
                IRI("http://example.com/graph/demographics")
            )
        ]
        await repo.add_statements(statements)
        
        # Query specific graph
        graph_query = """
        SELECT * WHERE {
            GRAPH <http://example.com/graph/people> {
                ?person ?property ?value
            }
        }
        """
        results = await repo.query(graph_query)
```
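
When the same query has to run against several named graphs, the `GRAPH` clause can be generated instead of copied. The helper below is a minimal sketch using only the standard library; `graph_scoped_select` is a hypothetical name, not part of rdf4j-python.

```python
def graph_scoped_select(graph_iri: str) -> str:
    """Return a SELECT query restricted to a single named graph."""
    # Characters that the SPARQL grammar does not allow inside <...>
    forbidden = set('<>"{}|^`\\')
    if any(ch in forbidden or ch <= " " for ch in graph_iri):
        raise ValueError(f"not a valid IRI: {graph_iri!r}")
    return (
        "SELECT * WHERE {\n"
        f"    GRAPH <{graph_iri}> {{\n"
        "        ?person ?property ?value\n"
        "    }\n"
        "}"
    )


# Build one query per graph used in the example above
for iri in (
    "http://example.com/graph/people",
    "http://example.com/graph/demographics",
):
    print(graph_scoped_select(iri))
```

Each resulting string can be passed to `repo.query(...)`. Rejecting characters that are illegal in an IRI keeps a malformed graph name from altering the query structure.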

### Advanced Repository Configuration

The following example creates a memory store with persistence enabled, bulk-loads statements into a named graph, and queries them with the query builder:

```python
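from rdf4j_python import AsyncRdf4j
from rdf4j_python.model.repository_config import RepositoryConfig, MemoryStoreConfig, SailRepositoryConfig
from rdf4j_python.model.term import IRI, Literal, Quad

async def advanced_example():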
    async with AsyncRdf4j("http://localhost:19780/rdf4j-server") as db:
        # Memory store with persistence
        persistent_config = RepositoryConfig(
            repo_id="persistent-repo",
            title="Persistent Memory Store",
            impl=SailRepositoryConfig(sail_impl=MemoryStoreConfig(persist=True))
        )
        
        # Create and populate repository
        repo = await db.create_repository(config=persistent_config)
        
        # Bulk data operations
        data = [
            (IRI("http://example.com/alice"), IRI("http://xmlns.com/foaf/0.1/name"), Literal("Alice")),
            (IRI("http://example.com/alice"), IRI("http://xmlns.com/foaf/0.1/email"), Literal("alice@example.com")),
            (IRI("http://example.com/bob"), IRI("http://xmlns.com/foaf/0.1/name"), Literal("Bob")),
        ]
        
        statements = [
            Quad(subj, pred, obj, IRI("http://example.com/default"))
            for subj, pred, obj in data
        ]
        await repo.add_statements(statements)
        
        # Query with the fluent query builder
        from rdf4j_python import Namespace, select

        foaf = Namespace("foaf", "http://xmlns.com/foaf/0.1/")
        query = (
            select("?name", "?email")
            .where("?person", foaf.name, "?name")
            .optional("?person", foaf.email, "?email")
            .order_by("?name")
            .build()
        )
        results = await repo.query(query)
```

### Uploading RDF Files

```python
import pyoxigraph as og

from rdf4j_python import AsyncRdf4j
from rdf4j_python.model.term import IRI

async def upload_example():
    async with AsyncRdf4j("http://localhost:19780/rdf4j-server") as db:
        repo = await db.get_repository("my-repo")

        # Upload a Turtle file (format auto-detected from extension)
        await repo.upload_file("data.ttl")

        # Upload to a specific named graph
        await repo.upload_file("data.ttl", context=IRI("http://example.com/graph"))

        # Upload with explicit format
        await repo.upload_file("data.txt", rdf_format=og.RdfFormat.N_TRIPLES)

        # Upload with base URI for relative URIs
        await repo.upload_file("data.ttl", base_uri="http://example.com/")
```
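
To illustrate what detecting the format from the file extension involves, here is a minimal standard-library sketch that maps common RDF extensions to their registered media types. It is not the library's implementation; `guess_rdf_media_type` and `RDF_MEDIA_TYPES` are hypothetical names chosen for this example.

```python
from pathlib import Path

# File extensions and media types for the formats listed under Features
RDF_MEDIA_TYPES = {
    ".ttl": "text/turtle",
    ".nt": "application/n-triples",
    ".nq": "application/n-quads",
    ".rdf": "application/rdf+xml",
    ".jsonld": "application/ld+json",
    ".trig": "application/trig",
    ".n3": "text/n3",
}


def guess_rdf_media_type(path: str) -> str:
    """Return the RDF media type implied by the file extension."""
    suffix = Path(path).suffix.lower()
    try:
        return RDF_MEDIA_TYPES[suffix]
    except KeyError:
        # Unknown extensions need an explicit format from the caller
        raise ValueError(f"cannot infer RDF format from {path!r}") from None


print(guess_rdf_media_type("data.ttl"))
```

A file such as `data.txt` has no RDF-specific extension, which is why the example above passes `rdf_format` explicitly for it.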

### Using Transactions

```python
from rdf4j_python import AsyncRdf4j, IsolationLevel
from rdf4j_python.model.term import IRI, Literal, Quad

async def transaction_example():
    async with AsyncRdf4j("http://localhost:19780/rdf4j-server") as db:
        repo = await db.get_repository("my-repo")

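        # Statement to remove inside the transaction (illustrative value)
        old_quad = Quad(IRI("http://example.com/alice"), IRI("http://xmlns.com/foaf/0.1/name"), Literal("Alicia"))

        # Atomic operations with auto-commit/rollback
        async with repo.transaction() as txn: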
            await txn.add_statements([
                Quad(IRI("http://example.com/alice"), IRI("http://xmlns.com/foaf/0.1/name"), Literal("Alice")),
                Quad(IRI("http://example.com/bob"), IRI("http://xmlns.com/foaf/0.1/name"), Literal("Bob")),
            ])
            await txn.delete_statements([old_quad])
            # Commits automatically on success, rolls back on exception

        # With specific isolation level
        async with repo.transaction(IsolationLevel.SERIALIZABLE) as txn:
            await txn.update("""
                DELETE { ?s <http://example.com/status> "draft" }
                INSERT { ?s <http://example.com/status> "published" }
                WHERE { ?s <http://example.com/status> "draft" }
            """)
```
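
The commit-on-success, rollback-on-exception behaviour can be sketched with a plain async context manager. The example below uses an in-memory stand-in so it runs without a server; `FakeStore` is hypothetical and only illustrates the pattern, not how rdf4j-python implements transactions.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeStore:
    """In-memory stand-in for a repository (hypothetical, for illustration)."""

    def __init__(self):
        self.committed = set()

    @asynccontextmanager
    async def transaction(self):
        staged = set(self.committed)  # changes are made on a copy
        try:
            yield staged
        except Exception:
            # Rollback: the staged copy is discarded and the error propagates
            raise
        else:
            # Commit: the staged copy replaces the committed state
            self.committed = staged


async def demo():
    store = FakeStore()

    # Completes normally, so the change is committed
    async with store.transaction() as txn:
        txn.add(("alice", "name", "Alice"))

    # Raises inside the block, so the change is rolled back
    try:
        async with store.transaction() as txn:
            txn.add(("bob", "name", "Bob"))
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass

    return store.committed


print(asyncio.run(demo()))
```

Only the statement from the first block survives, because the second block raised before leaving the `async with` body.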

For more detailed examples, see the [examples](examples/) directory.

## Development

### Setting up Development Environment

1. **Clone the repository**:
   ```bash
   git clone https://github.com/odysa/rdf4j-python.git
   cd rdf4j-python
   ```

2. **Install development dependencies**:
   ```bash
   uv sync --group dev
   ```

3. **Start RDF4J Server** (for integration tests):
   ```bash
   # Using Docker
   docker run -p 19780:8080 eclipse/rdf4j:latest
   ```

4. **Run tests**:
   ```bash
   pytest tests/
   ```

5. **Run linting**:
   ```bash
   ruff check .
   ruff format .
   ```
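
Before running the integration tests in step 4, it can help to confirm that the server started in step 3 is reachable. The sketch below uses only the standard library and requests the `/protocol` endpoint of the RDF4J REST API; the function name `rdf4j_server_is_up` is hypothetical, and the base URL is the one used throughout the examples.

```python
import urllib.error
import urllib.request


def rdf4j_server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the RDF4J server answers on its /protocol endpoint."""
    url = base_url.rstrip("/") + "/protocol"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or malformed URL
        return False


if __name__ == "__main__":
    base = "http://localhost:19780/rdf4j-server"
    print("server reachable" if rdf4j_server_is_up(base) else "server not reachable")
```

A freshly started container may need a few seconds before it responds, so a short retry loop around this check is a reasonable addition.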

### Project Structure

```
rdf4j_python/
├── _driver/          # Core async driver implementation
├── model/            # Data models and configurations
├── query/            # SPARQL query builder
├── exception/        # Custom exceptions
└── utils/            # Utility functions

examples/             # Usage examples
tests/                # Test suite
docs/                 # Documentation
```

## Contributing

We welcome contributions! Here's how to get involved:

1. Fork the repository on GitHub
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes and add tests
4. Run the test suite to ensure everything works
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to your branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request

### Running Examples

```bash
# Make sure RDF4J server is running on localhost:19780
python examples/complete_workflow.py
python examples/query.py
```

## License

This project is licensed under the BSD 3-Clause License. See the [LICENSE](LICENSE) file for details.

Copyright (c) 2025, Chengxu Bian

## Support

- **Issues & Bug Reports**: [GitHub Issues](https://github.com/odysa/rdf4j-python/issues)
- **Documentation**: [docs/](https://github.com/odysa/rdf4j-python/tree/main/docs)
- **Questions**: Feel free to open a discussion or issue

If you find this project useful, please consider starring the repository!
