Metadata-Version: 2.4
Name: affiliation-builder
Version: 0.1.0
Summary: Build bipartite networks from JSON affiliation data
Project-URL: Homepage, https://github.com/timofruehwirth/affiliation-builder
Project-URL: Repository, https://github.com/timofruehwirth/affiliation-builder
Project-URL: Issues, https://github.com/timofruehwirth/affiliation-builder/issues
Project-URL: Documentation, https://github.com/timofruehwirth/affiliation-builder#readme
Author-email: Timo Frühwirth <timo.fruehwirth@oeaw.ac.at>
License-Expression: MIT
License-File: LICENSE
Keywords: affiliation,affiliation-network,bipartite-network,digital-humanities,json,network,network-analysis,networkx
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Requires-Dist: networkx>=3.0
Requires-Dist: requests>=2.31.0
Description-Content-Type: text/markdown

Affiliation Builder is currently in alpha and welcomes feedback from early adopters.

# Affiliation Builder

Build bipartite affiliation networks from JSON data using NetworkX.

[![PyPI version](https://badge.fury.io/py/affiliation-builder.svg)](https://pypi.org/project/affiliation-builder/)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Overview

**Affiliation Builder** is a Python package for creating bipartite networks from JSON data on co-affiliation relationships. It transforms structured data about entities (such as people and organizations) and the items they are affiliated with (such as events) into NetworkX graph objects for analysis and visualization.

While designed with event-participant data in mind, the package works with any co-affiliation scenario where a set of entities connects to a set of items through shared relationships.

## Features

- **Flexible JSON input:** Supports various JSON structures (arrays, wrapped objects)
- **Multiple entity types:** Handles different entity types simultaneously (such as persons and organizations)
- **Simple and complex entities:** Works with string identifiers or objects
- **Rich metadata:** Preserves all JSON attributes as node properties
- **URL support:** Loads data from local files or URLs
- **Comprehensive validation:** Detailed error messages and logging
- **NetworkX integration:** Returns standard NetworkX graph objects

## Requirements

The package has been developed and tested with:
- Python 3.9+
- NetworkX 3.0+
- Requests 2.31.0+

## Installation

```bash
pip install affiliation-builder
```

## Quick Start

```python
from affiliation_builder import build

# Build a bipartite network from JSON data
G = build(
    json_path='events.json',
    node_set_0_key='events',
    node_set_1_keys='participants',
    identifier_key='event_id',
    node_set_1_identifier_key='person_name'
)

# Returns a standard NetworkX Graph object
print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")

# Access node sets
node_set_0 = {n for n, d in G.nodes(data=True) if d['bipartite'] == 0}
node_set_1 = {n for n, d in G.nodes(data=True) if d['bipartite'] == 1}
```

## Understanding the Parameters

The `build()` function takes five parameters that control how your JSON data maps to the bipartite network:

### Parameter 1: `json_path` (str or Path)

**What it is:** Path to your local JSON file or URL

**Examples:**

```python
json_path='data/events.json'
json_path='https://example.com/data.json'
```

---

### Parameter 2: `node_set_0_key` (str or None)

**What it is:** JSON key containing your items (such as events)

**Use `None` if:** The JSON is a direct array of items (not wrapped in an object)

**Examples:**

Wrapped object format (specify the key):

```json
{
  "events": [
    {"id": "evt1", "participants": ["Alice", "Bob"]},
    {"id": "evt2", "participants": ["Bob", "Carol"]}
  ]
}
```

```python
node_set_0_key='events'
```

Direct array format (use None):

```json
[
  {"id": "evt1", "participants": ["Alice", "Bob"]},
  {"id": "evt2", "participants": ["Bob", "Carol"]}
]
```

```python
node_set_0_key=None
```

---

### Parameter 3: `node_set_1_keys` (str or list of str)

**What it is:** The JSON key(s) that contain the entities affiliated with each item

**Pass a list when:** You have multiple entity types (e.g., both persons and organizations)

**Examples:**

Single entity type:

```json
{"id": "evt1", "participants": ["Alice", "Bob"]}
```

```python
node_set_1_keys='participants'
```

Multiple entity types:

```json
{
  "id": "evt1",
  "persons": ["Alice", "Bob"],
  "organizations": ["University A", "Company B"]
}
```

```python
node_set_1_keys=['persons', 'organizations']
```

---

### Parameter 4: `identifier_key` (str)

**What it is:** JSON key that uniquely identifies each item (e.g., an event)

**Examples:**

```json
{"id": "evt1", "name": "Conference 2024", ...}
```

```python
identifier_key='id'
```

---

### Parameter 5: `node_set_1_identifier_key` (str or None, optional)

**What it is:** Key to extract identifiers from entity objects (when entities are objects, not strings)

**Use `None` (default) when:** Entities are simple strings/numbers

**Pass a key when:** Entities are objects with multiple attributes

**Examples:**

Simple entities (strings):

```json
{"id": "evt1", "participants": ["Alice", "Bob"]}
```

```python
node_set_1_identifier_key=None
```

Complex entities (objects):

```json
{
  "id": "evt1",
  "participants": [
    {"person_name": "Alice", "role": "speaker", "affiliation": "MIT"},
    {"person_name": "Bob", "role": "attendee", "affiliation": "Stanford"}
  ]
}
```

```python
# Extract 'Alice' and 'Bob' as node IDs
# All other attributes (role, affiliation) are preserved as node properties
node_set_1_identifier_key='person_name'
```
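
To make the mapping concrete, here is a minimal sketch of how the five parameters could select data from JSON that has already been parsed. `sketch_build` is a hypothetical stand-in written with plain NetworkX calls. It is not the package's implementation, and it skips file/URL loading, validation, logging, and item metadata.

```python
import networkx as nx


def sketch_build(data, node_set_0_key, node_set_1_keys, identifier_key,
                 node_set_1_identifier_key=None):
    """Hypothetical illustration of the parameter mapping (not the real build())."""
    # node_set_0_key=None means the JSON is a direct array of items
    items = data if node_set_0_key is None else data[node_set_0_key]
    # Accept a single key or a list of keys
    if isinstance(node_set_1_keys, str):
        node_set_1_keys = [node_set_1_keys]

    G = nx.Graph()
    for item in items:
        item_id = item[identifier_key]
        G.add_node(item_id, bipartite=0)
        for key in node_set_1_keys:
            for entity in item.get(key, []):
                if node_set_1_identifier_key is None:
                    # Simple entity: the value itself is the node ID
                    entity_id, attrs = entity, {}
                else:
                    # Complex entity: one key is the node ID, the rest are attributes
                    entity_id = entity[node_set_1_identifier_key]
                    attrs = {k: v for k, v in entity.items()
                             if k != node_set_1_identifier_key}
                G.add_node(entity_id, bipartite=1, **attrs)
                G.add_edge(item_id, entity_id)
    return G


data = {
    "events": [
        {"id": "evt1", "participants": [
            {"person_name": "Alice", "role": "speaker"},
            {"person_name": "Bob", "role": "attendee"},
        ]},
        {"id": "evt2", "participants": [
            {"person_name": "Bob", "role": "attendee"},
            {"person_name": "Carol", "role": "speaker"},
        ]},
    ]
}

G = sketch_build(data, 'events', 'participants', 'id', 'person_name')

# Direct array with simple string entities
G2 = sketch_build([{"id": "evt1", "participants": ["Alice", "Bob"]}],
                  None, 'participants', 'id')
```

Here `G` has two item nodes and three entity nodes, and `G2` shows the `node_set_0_key=None` case with plain strings as node IDs.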

## JSON Structure Examples

### Example 1: Wrapped Object with Simple Entities

```json
{
  "events": [
    {"name": "Conference 2024", "participants": ["Alice", "Bob", "Carol"]},
    {"name": "Workshop 2024", "participants": ["Bob", "David"]}
  ]
}
```

```python
G = build(
    json_path='events.json',
    node_set_0_key='events',
    node_set_1_keys='participants',
    identifier_key='name'
)
```

---

### Example 2: Direct Array with Complex Entities

```json
[
  {
    "project_id": "proj1",
    "members": [
      {"name": "Alice", "role": "lead", "department": "Engineering"},
      {"name": "Bob", "role": "contributor", "department": "Design"}
    ]
  }
]
```

```python
G = build(
    json_path='projects.json',
    node_set_0_key=None,  # Direct array
    node_set_1_keys='members',
    identifier_key='project_id',
    node_set_1_identifier_key='name'  # Extract name from member objects
)

# All attributes preserved as node properties
print(G.nodes['Alice'])  # {'bipartite': 1, 'role': 'lead', 'department': 'Engineering'}
```

---

### Example 3: Multiple Entity Types

```json
{
  "events": [
    {
      "name": "Summit 2024",
      "persons": ["Alice", "Bob"],
      "organizations": ["Company A", "University B"]
    }
  ]
}
```

```python
G = build(
    json_path='https://example.com/data/events.json',
    node_set_0_key='events',
    node_set_1_keys=['persons', 'organizations'],  # Multiple types
    identifier_key='name'
)
```

## Working with the Output

The `build()` function returns a standard NetworkX Graph object with bipartite structure for further processing:

```python
import networkx as nx
from affiliation_builder import build

# Build network
G = build('events.json', 'events', 'participants', 'event_id')

# Access node sets
events = {n for n, d in G.nodes(data=True) if d['bipartite'] == 0}
participants = {n for n, d in G.nodes(data=True) if d['bipartite'] == 1}

# Check bipartite validity
print(nx.is_bipartite(G))

# Analyze the network
print(f"Number of events: {len(events)}")
print(f"Number of participants: {len(participants)}")
print(f"Network density: {nx.density(G)}")

# Project to unipartite network
P = nx.bipartite.weighted_projected_graph(G, participants)
print(f"Co-affiliation edges: {P.number_of_edges()}")
```
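
To show how the projection weights can be read, the following sketch builds a small bipartite graph by hand with plain NetworkX. The graph is assumed to resemble what `build()` returns (nodes tagged with a `bipartite` attribute); the node names are illustrative.

```python
import networkx as nx

# Hand-built stand-in for a graph returned by build() (assumed structure)
G = nx.Graph()
G.add_nodes_from(['evt1', 'evt2'], bipartite=0)
G.add_nodes_from(['Alice', 'Bob', 'Carol'], bipartite=1)
G.add_edges_from([
    ('evt1', 'Alice'), ('evt1', 'Bob'),
    ('evt2', 'Bob'), ('evt2', 'Carol'),
])

participants = {n for n, d in G.nodes(data=True) if d['bipartite'] == 1}

# In a bipartite graph, a participant's degree is the number of events attended
events_per_participant = {p: G.degree(p) for p in participants}

# In the weighted projection, each edge weight counts the events two participants share
P = nx.bipartite.weighted_projected_graph(G, participants)
for u, v, weight in P.edges(data='weight'):
    print(f"{u} and {v} share {weight} event(s)")
```

Alice and Carol never attend the same event, so the projection contains no edge between them.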

## Duplicate Entity Node Handling

When the same entity appears multiple times (such as a participant in several events), the node is created once and edges are added for each affiliation. This is the expected behavior for affiliation networks.

However, if the same entity appears with different attributes in different items, the last set of attributes overwrites earlier sets. For example:
```json
{
  "events": [
    {
      "name": "Event 1",
      "participants": [{"name": "Alice", "role": "speaker"}]
    },
    {
      "name": "Event 2",
      "participants": [{"name": "Alice", "role": "attendee"}]
    }
  ]
}
```

After processing, `G.nodes['Alice']` will have `role: 'attendee'` (from Event 2), but not `role: 'speaker'` (from Event 1).
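
The overwrite matches how NetworkX itself treats repeated `add_node` calls, which the sketch below demonstrates with plain NetworkX (it does not call the package). It also shows one possible workaround, offered as an assumption rather than a package feature: keep per-event attributes in your own lookup and attach them to edges with `nx.set_edge_attributes`.

```python
import networkx as nx

G = nx.Graph()
G.add_nodes_from(['Event 1', 'Event 2'], bipartite=0)

# (event, participant object) pairs, mirroring the JSON above
affiliations = [
    ('Event 1', {'name': 'Alice', 'role': 'speaker'}),
    ('Event 2', {'name': 'Alice', 'role': 'attendee'}),
]

edge_roles = {}  # hypothetical side table keyed by (event, person)
for event, person in affiliations:
    # Re-adding an existing node updates its attributes in place
    G.add_node(person['name'], bipartite=1, role=person['role'])
    G.add_edge(event, person['name'])
    edge_roles[(event, person['name'])] = person['role']

print(G.nodes['Alice']['role'])  # the later value wins
print(G.degree('Alice'))         # one node, one edge per affiliation

# Workaround sketch: store the per-event role on the edge instead of the node
nx.set_edge_attributes(G, edge_roles, 'role')
print(G.edges['Event 1', 'Alice']['role'])
```

Storing the role on the edge keeps both values, because each affiliation has its own edge.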

## Limitations

- **UTF-8 encoding:** Local JSON files must be UTF-8 encoded. Other encodings will raise an error. (URL sources handle encoding automatically based on server response headers.)
- **Hashable identifiers:** Node IDs must be hashable Python types (strings, numbers, tuples). Lists or dictionaries as identifiers will be skipped with a warning.
- **Flat entity lists:** Entity values (under `node_set_1_keys`) must be arrays. Nested structures are not recursively processed.
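
Two of these limitations can be worked around before calling `build()`. The sketch below uses only the standard library; `reencode_to_utf8` and `is_usable_identifier` are hypothetical helper names, and the Latin-1 source encoding is an assumption for the demo.

```python
import tempfile
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding):
    """Rewrite a text file as UTF-8; the caller must know the source encoding."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding='utf-8')


def is_usable_identifier(value):
    """Return True if the value is hashable and could therefore serve as a node ID."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


# Demo: convert a Latin-1 encoded JSON file to UTF-8
with tempfile.TemporaryDirectory() as tmp:
    legacy = Path(tmp) / 'events_latin1.json'
    converted = Path(tmp) / 'events_utf8.json'
    legacy.write_text('{"name": "Frühwirth"}', encoding='latin-1')
    reencode_to_utf8(legacy, converted, source_encoding='latin-1')
    roundtrip = converted.read_text(encoding='utf-8')

# Strings and tuples are hashable; lists and dictionaries are not
checks = [is_usable_identifier(v) for v in ('evt1', (1, 2), ['a'], {'a': 1})]
```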

## Security Considerations

Be aware of potential security risks when processing JSON data from untrusted sources:

### Resource Exhaustion

- **Large files:** No size limits are enforced on JSON files or URL downloads
- **Deep nesting:** Extremely nested JSON structures could cause memory or stack issues
- **Malicious data:** An attacker could provide data designed to consume excessive resources

### Recommendations

1. **Trust your sources:** Only load JSON from sources you control or trust
2. **Validate externally:** Pre-validate JSON files for size and structure if loading from untrusted sources
3. **Monitor resources:** For production use, implement resource monitoring
4. **Sandbox if needed:** Run in isolated environments if processing untrusted data
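
As one way to follow the second recommendation, the sketch below checks a local file's size and nesting depth before it is handed to `build()`. The helper names and both thresholds (`MAX_BYTES`, `MAX_DEPTH`) are assumptions to adjust for your data. Note that `json.load` can itself raise `RecursionError` on extremely deep input, so that case is caught as well.

```python
import json
import os

MAX_BYTES = 10 * 1024 * 1024  # assumed limit: 10 MiB
MAX_DEPTH = 20                # assumed limit on nested lists/objects


def nesting_depth(value):
    """Depth of nested lists/dicts, computed iteratively to avoid deep recursion."""
    max_depth = 0
    stack = [(value, 1)]
    while stack:
        current, depth = stack.pop()
        if isinstance(current, dict):
            children = current.values()
        elif isinstance(current, list):
            children = current
        else:
            continue  # scalars do not add depth
        max_depth = max(max_depth, depth)
        stack.extend((child, depth + 1) for child in children)
    return max_depth


def precheck_json_file(path, max_bytes=MAX_BYTES, max_depth=MAX_DEPTH):
    """Raise ValueError if the file is too large or too deeply nested."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes; limit is {max_bytes}")
    try:
        with open(path, encoding='utf-8') as handle:
            data = json.load(handle)
    except RecursionError as exc:
        raise ValueError(f"{path} is nested too deeply to parse") from exc
    depth = nesting_depth(data)
    if depth > max_depth:
        raise ValueError(f"{path} has nesting depth {depth}; limit is {max_depth}")
    return data
```

If `precheck_json_file('events.json')` returns without raising, the same path can then be passed to `build()`. For URL sources, a similar check would have to run on a copy you download yourself first.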

### Future Considerations

Future versions may include:
- Optional `max_size` parameter for downloads
- Configurable nesting depth limits
- Enhanced validation options

**For now:** Use this package with data from trusted sources, or implement your own validation layer for untrusted input.


## Logging

The package uses Python's `logging` module. By default, log messages are not displayed. To receive processing information, configure logging in your application:

Display full logging from DEBUG level upward:
```python
import logging
from affiliation_builder import build

logging.getLogger('affiliation_builder').setLevel(logging.DEBUG)
logging.getLogger('affiliation_builder').addHandler(logging.StreamHandler())
```

To show only messages at INFO level and above, set the level to `logging.INFO` instead.
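
A possible INFO-level setup, using only the standard `logging` module, might look like this. The format string is an arbitrary choice; the logger name `affiliation_builder` is the one used above.

```python
import logging

logger = logging.getLogger('affiliation_builder')
logger.setLevel(logging.INFO)  # DEBUG messages are filtered out

# Send messages to stderr with a timestamp and level name
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter('%(asctime)s %(name)s %(levelname)s: %(message)s')
)
logger.addHandler(handler)
```

Attaching the handler to the package's logger, rather than to the root logger, leaves the logging configuration of the rest of your application untouched.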

## Examples

See the [`examples/`](https://github.com/timofruehwirth/affiliation-builder/tree/main/examples) directory for:
- `example.json` - Sample JSON data structure
- `example.ipynb` - Complete Jupyter Notebook with test analysis and visualization

## Contributing

Contributions are welcome! Please feel free to submit issues or pull requests on [GitHub](https://github.com/timofruehwirth/affiliation-builder).

## License

This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.

## Citation

If you use this software in your research, please cite:
```bibtex
@software{fruehwirth2025affiliation,
  author = {Frühwirth, Timo},
  title = {Affiliation Builder: Build bipartite affiliation networks from JSON data},
  year = {2025},
  url = {https://github.com/timofruehwirth/affiliation-builder},
  version = {0.1.0}
}
```

## Acknowledgments

Built with [NetworkX](https://networkx.org/) for network analysis and [Requests](https://requests.readthedocs.io/) for HTTP functionality.