Metadata-Version: 2.4
Name: pyhmfd
Version: 0.1.2
Summary: Read Human Mortality Database and Human Fertility Database data from the web
Project-URL: Homepage, https://github.com/filipeclduarte/pyhmfd
Project-URL: Bug Tracker, https://github.com/filipeclduarte/pyhmfd/issues
Project-URL: R original, https://github.com/timriffe/HMDHFDplus
Author-email: Filipe Duarte <filipe_pb_duarte@hotmail.com>
License: GPL-2.0
Keywords: HFD,HMD,demography,fertility,mortality
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v2 (GPLv2)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.10
Requires-Dist: beautifulsoup4>=4.12
Requires-Dist: keyring>=24.0
Requires-Dist: pandas>=2.0
Requires-Dist: requests>=2.31
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: responses>=0.25; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Description-Content-Type: text/markdown

# pyhmfd

A Python package for reading data from the Human Mortality Database (HMD),
Human Fertility Database (HFD), Human Fertility Collection (HFC),
Japanese Mortality Database (JMD), and Canadian Historical Mortality Database (CHMD).

Returns tidy `pandas.DataFrame` objects ready for analysis.

## Credits

This package is a Python port of the R package **HMDHFDplus** by
Tim Riffe, Jose Manuel Aburto, and contributors:

> Riffe T, Aburto JM, et al. (2023). *HMDHFDplus: Read Human Mortality Database
> and Human Fertility Database Data from the Web*. R package.
> <https://github.com/timriffe/HMDHFDplus>

The authentication flow, URL patterns, parsing logic, and data-cleaning
conventions are derived directly from that work. Like the original R package,
pyhmfd is licensed under GPL-2.0.

## Supported databases

| Database | Short name | Authentication |
|---|---|---|
| [Human Mortality Database](https://www.mortality.org) | HMD | account required |
| [Human Fertility Database](https://www.humanfertility.org) | HFD | account required |
| [Human Fertility Collection](https://www.fertilitydata.org) | HFC | none |
| [Japanese Mortality Database](https://www.ipss.go.jp/p-toukei/JMD) | JMD | none |
| [Canadian Historical Mortality Database](https://www.prdh.umontreal.ca/BDLC) | CHMD | none |

## Installation

```bash
pip install pyhmfd
```

Or from source:

```bash
git clone https://github.com/filipeclduarte/pyhmfd.git
cd pyhmfd
pip install -e ".[dev]"
```

## Credentials

For HMD and HFD you need a free account at their respective websites.
Supply credentials via environment variables (recommended for scripts/CI):

```bash
export HMD_USER="your@email.com"
export HMD_PASSWORD="yourpassword"
export HFD_USER="your@email.com"
export HFD_PASSWORD="yourpassword"
```

Alternatively, pass `username` and `password` directly to the reader function,
or let the package prompt for them interactively. Credentials entered
interactively are offered to the system keyring for storage.
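
For scripts that should fail early when the variables are missing, the environment lookup can also be done explicitly. The sketch below is only an illustration: `credentials_from_env` is a hypothetical helper, not part of pyhmfd, and it assumes the `username` and `password` keyword arguments shown in the Quick start.

```python
import os


def credentials_from_env(prefix: str) -> dict[str, str]:
    """Read <PREFIX>_USER and <PREFIX>_PASSWORD from the environment.

    Hypothetical helper, not part of pyhmfd. Returns keyword arguments
    named after the `username` / `password` parameters of the readers.
    """
    user = os.environ.get(f"{prefix}_USER")
    password = os.environ.get(f"{prefix}_PASSWORD")
    if not user or not password:
        raise RuntimeError(
            f"Set {prefix}_USER and {prefix}_PASSWORD before running this script"
        )
    return {"username": user, "password": password}


# Usage (needs network access and a valid HMD account):
# import pyhmfd
# df = pyhmfd.read_hmd_web("USA", "Mx_1x1", **credentials_from_env("HMD"))
```

Raising before any request is made keeps unattended jobs from blocking on the interactive prompt.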

## Quick start

```python
import pyhmfd

# Human Mortality Database — needs credentials
df = pyhmfd.read_hmd_web("USA", "Mx_1x1")
df = pyhmfd.read_hmd_web("FRATNP", "Deaths_1x1", username="u@example.com", password="pw")

# Human Fertility Database — needs credentials
df = pyhmfd.read_hfd_web("USA", "asfrRR")

# Japanese Mortality Database — no auth
df = pyhmfd.read_jmd_web("01", "Deaths_1x1")   # 01 = Hokkaido

# Canadian Historical Mortality Database — no auth
df = pyhmfd.read_chmd_web("que", "Mx_1x1")     # que = Quebec

# Human Fertility Collection — no auth
df = pyhmfd.read_hfc_web("RUS", "ASFRstand")

# Read a locally downloaded file
df = pyhmfd.read_hmd("/path/to/Mx_1x1.txt")
df = pyhmfd.read_hfd("/path/to/asfrRR.txt", item="asfrRR")
```

## Utility functions

```python
# List available countries
pyhmfd.get_hmd_countries()         # ['AUS', 'AUT', ..., 'USA']
pyhmfd.get_hfd_countries()         # DataFrame with country names and codes
pyhmfd.get_hfc_countries()         # list of codes
pyhmfd.get_jmd_prefectures()       # dict: name → 2-digit code
pyhmfd.get_chmd_provinces()        # ['alb', 'bco', 'can', ...]

# List available data items per country
pyhmfd.get_hmd_items("USA")        # DataFrame: item, description, url
pyhmfd.get_hfd_items("USA")        # DataFrame: item, description, url

# Last-update date for an HFD country
pyhmfd.get_hfd_date("USA")         # '20260323' (date of last update)
```

## Output format

The `read_*` functions return a `pandas.DataFrame`. When `fixup=True` (the default):

- `Age` column is `Int64` (nullable integer).
- `OpenInterval` boolean column marks the terminal open age group (e.g. 110+).
- `Year` and `Cohort` are `Int64`.
- Rate and count columns are `float64`.
- Missing values coded as `'.'` in source files become `NaN`.
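
To make these conventions concrete, the sketch below applies them by hand to a two-row sample using plain pandas. It is not pyhmfd's actual implementation: the sample values are invented, and the column layout (`Year`, `Age`, `Female`, `Male`, `Total`) is assumed for illustration.

```python
import io

import pandas as pd

# Invented sample in a whitespace-delimited layout; '.' marks a missing value.
raw = io.StringIO(
    "Year Age Female Male Total\n"
    "2000 109 0.61 0.70 0.63\n"
    "2000 110+ . 0.75 0.71\n"
)

# '.' becomes NaN, so the rate columns parse as float64.
df = pd.read_csv(raw, sep=r"\s+", na_values=".")

# Flag the terminal open age group, then strip the '+' and store Age
# as a nullable integer.
age_text = df["Age"].astype(str)
df["OpenInterval"] = age_text.str.endswith("+")
df["Age"] = pd.to_numeric(age_text.str.rstrip("+")).astype("Int64")
df["Year"] = df["Year"].astype("Int64")

print(df.dtypes)
```

Keeping the open interval in a separate boolean column lets `Age` stay numeric, so filters such as `df[df["Age"] >= 100]` work without string handling.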

## Running tests

```bash
pytest
```

## License

GPL-2.0, following the original R package.
