Metadata-Version: 2.4
Name: pitedgar
Version: 0.4.0
Summary: Point-in-time SEC EDGAR financial data pipeline
License: MIT
License-File: LICENSE
Keywords: SEC,EDGAR,financial-data,point-in-time,XBRL,backtesting
Author: Ariel Nacamulli
Requires-Python: >=3.11
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Scientific/Engineering
Classifier: Typing :: Typed
Requires-Dist: click
Requires-Dist: edgartools (>=5.0)
Requires-Dist: loguru
Requires-Dist: pandas (>=2.0)
Requires-Dist: pyarrow
Requires-Dist: pydantic (>=2.0)
Requires-Dist: requests
Requires-Dist: tqdm
Project-URL: Documentation, https://github.com/arielNacamulli/pitedgar#readme
Project-URL: Homepage, https://github.com/arielNacamulli/pitedgar
Project-URL: Repository, https://github.com/arielNacamulli/pitedgar
Description-Content-Type: text/markdown

# pitedgar

[![CI](https://github.com/arielNacamulli/pitedgar/actions/workflows/ci.yml/badge.svg)](https://github.com/arielNacamulli/pitedgar/actions/workflows/ci.yml)
[![PyPI version](https://img.shields.io/pypi/v/pitedgar.svg)](https://pypi.org/project/pitedgar/)
[![Python versions](https://img.shields.io/pypi/pyversions/pitedgar.svg)](https://pypi.org/project/pitedgar/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

Point-in-time SEC EDGAR financial data pipeline.

Downloads SEC EDGAR `companyfacts.zip`, parses XBRL JSON facts into a local
parquet file, and exposes a query API with **zero look-ahead bias** — every
value is stamped with the `filed` date (when the data was actually available
to the market), not the period-end date.
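
As a rough illustration of why the `filed` date matters, the sketch below compares a point-in-time lookup with a naive period-end lookup on a toy table. It uses plain pandas only. The column names (`ticker`, `concept`, `end`, `filed`, `val`), the `as_of` helper, and the numbers are assumptions made for this example; they are not pitedgar's actual schema or implementation.

```python
import pandas as pd

# Toy long-format facts. Column names and values are illustrative assumptions,
# not pitedgar's parquet schema.
facts = pd.DataFrame(
    {
        "ticker": ["AAPL", "AAPL", "AAPL"],
        "concept": ["us-gaap:Revenues"] * 3,
        "end": pd.to_datetime(["2022-03-31", "2022-06-30", "2022-09-30"]),
        "filed": pd.to_datetime(["2022-04-29", "2022-07-29", "2022-10-28"]),
        "val": [100.0, 110.0, 120.0],
    }
)


def as_of(df: pd.DataFrame, date: str) -> pd.DataFrame:
    """Hypothetical helper: latest-period fact per (ticker, concept) filed on or before `date`."""
    cutoff = pd.Timestamp(date)
    visible = df[df["filed"] <= cutoff]  # drop anything not yet public
    latest = (
        visible.sort_values(["end", "filed"])
        .groupby(["ticker", "concept"])
        .tail(1)
    )
    return latest.reset_index(drop=True)


# Point-in-time view: the quarter ending 2022-06-30 was not filed until 2022-07-29,
# so only the quarter ending 2022-03-31 is visible on 2022-06-30.
pit_val = as_of(facts, "2022-06-30")["val"].iloc[0]

# Naive view: filtering on period end leaks a value the market had not seen yet.
naive_val = (
    facts[facts["end"] <= pd.Timestamp("2022-06-30")]
    .sort_values("end")["val"]
    .iloc[-1]
)

print(pit_val, naive_val)
```

The two lookups disagree because the naive one selects the quarter whose period had ended but whose filing was still a month away.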

---

## Installation

```bash
pip install pitedgar
# or add it to a Poetry-managed project
poetry add pitedgar
```

---

## Quick start

```python
from pathlib import Path
from pitedgar import PitEdgarConfig, build_cik_map, download_bulk, parse_all, PitQuery

config = PitEdgarConfig(
    edgar_identity="Mario Rossi mario@example.com",  # required by SEC
    data_dir=Path("./data"),
)

# Step 1 — one-shot ticker → CIK mapping
tickers = ["AAPL", "MSFT", "JPM", "GOOGL"]
cik_map = build_cik_map(tickers, config)

# Step 2 — download ~1.5 GB bulk ZIP (do this periodically, not every run)
download_bulk(config)

# Step 3 — parse JSON → parquet (sub-minute for 500 companies)
master = parse_all(config, cik_map)

# Step 4 — query
q = PitQuery(config.data_dir / "pit_financials.parquet")

# What revenue figure was available to the market on 2022-06-30?
result = q.as_of(["AAPL", "MSFT"], "us-gaap:Revenues", "2022-06-30")

# Full history
hist = q.history("AAPL", "us-gaap:NetIncomeLoss", freq="A")

# Portfolio cross-section signal
xs = q.cross_section("us-gaap:NetIncomeLoss", "2023-12-31")
```

---

## CLI

```bash
# Resolve tickers (tickers.txt has one ticker per line)
pitedgar map --tickers tickers.txt --identity "Name name@email.com"

# Download bulk ZIP
pitedgar fetch --identity "Name name@email.com"

# Parse to parquet
pitedgar build --identity "Name name@email.com"

# Query a single value
pitedgar query --ticker AAPL --concept us-gaap:Revenues --as-of 2023-06-30
```

---

## Key design decisions

| Decision | Rationale |
|---|---|
| `filed` as PIT timestamp | The date the filing was submitted to the SEC, which is when the information became public |
| Deduplication keeps the latest `filed` per `(concept, end)` | Companies sometimes refile restated figures; the superseding value is kept |
| Raw USD values, no scale conversion | SEC reports values as-filed; downstream code applies any needed normalization |
| Local parquet, no runtime HTTP | Queries run at DataFrame speed with no network dependency |
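
The deduplication rule in the table above can be sketched in plain pandas as follows. This is a minimal illustration on made-up data, assuming columns named `concept`, `end`, `filed`, and `val`; the parser's real implementation and column names may differ.

```python
import pandas as pd

# Toy facts for one company. The fiscal year ending 2021-12-31 appears twice:
# once as originally filed and once as restated in a later filing.
facts = pd.DataFrame(
    {
        "concept": ["us-gaap:NetIncomeLoss"] * 3,
        "end": pd.to_datetime(["2021-12-31", "2021-12-31", "2022-12-31"]),
        "filed": pd.to_datetime(["2022-02-15", "2022-08-01", "2023-02-14"]),
        "val": [500.0, 480.0, 550.0],
    }
)

# Sort by filing date so that keep="last" retains the most recently filed row
# for each (concept, end) pair.
deduped = (
    facts.sort_values("filed")
    .drop_duplicates(subset=["concept", "end"], keep="last")
    .sort_values("end")
    .reset_index(drop=True)
)

print(deduped)
```

In a table covering several companies, the company identifier would presumably need to be part of the `subset` key as well.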

---

## Supported XBRL concepts (defaults)

See `pitedgar.config.DEFAULT_CONCEPTS` for the full list, which includes
revenues, net income, assets, liabilities, equity, EPS, cash, debt, operating
cash flow, capex, and R&D expense.

---

## Examples

- [`examples/fcf_sp500.py`](examples/fcf_sp500.py) — S&P 500 free cash flow benchmark: fetches constituents, builds the parquet, and queries FCF cross-sections across 20 quarters. Useful as an end-to-end performance reference.
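
The script itself is not reproduced here. As a hedged sketch of the underlying arithmetic only, free cash flow is commonly taken as operating cash flow minus capital expenditure. The example below assumes a long-format cross-section with hypothetical columns `ticker`, `concept`, and `val`, made-up tickers and values, and two us-gaap tags that may or may not match the concepts the example script uses.

```python
import pandas as pd

# Assumed concept tags for operating cash flow and capex.
OCF = "us-gaap:NetCashProvidedByUsedInOperatingActivities"
CAPEX = "us-gaap:PaymentsToAcquirePropertyPlantAndEquipment"

# Made-up cross-section: one row per (ticker, concept).
xs = pd.DataFrame(
    {
        "ticker": ["AAA", "AAA", "BBB", "BBB"],
        "concept": [OCF, CAPEX, OCF, CAPEX],
        "val": [1000.0, 300.0, 400.0, 150.0],
    }
)

# Pivot to one row per ticker and one column per concept, then subtract.
wide = xs.pivot(index="ticker", columns="concept", values="val")
fcf = (wide[OCF] - wide[CAPEX]).rename("fcf")

print(fcf)
```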

---

## Contributing

Contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for setup instructions, coding conventions, and the PR process.

---

## License

[MIT](LICENSE)

