Metadata-Version: 2.1
Name: lm-checkpoints
Version: 0.1.15
Summary: Simple library for loading checkpoints of language models.
License: MIT
Author: Oskar van der Wal
Author-email: odw@duck.com
Requires-Python: >=3.8,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Provides-Extra: eval
Requires-Dist: accelerate (>=0.24.1,<0.25.0)
Requires-Dist: lm-eval (>=0.4.2,<0.5.0); extra == "eval"
Requires-Dist: torch (>=2.0.0,!=2.0.1,!=2.1.0)
Requires-Dist: transformers (>=4.35.0,<5.0.0)
Description-Content-Type: text/markdown

# 🤖🚩lm-checkpoints

> Simple library for dealing with language model checkpoints to study training dynamics.

**lm-checkpoints** makes it easier to work with the intermediate training checkpoints that are released for some language models (LMs), such as MultiBERTs and Pythia. The library lets you iterate over training steps, define subsets of checkpoints, automatically clear the cache for previously seen checkpoints, and so on. Nothing fancy: it is simply a wrapper around 🤗 models for studying their training dynamics.

Install using `pip install lm-checkpoints`.

## Checkpoints
Currently implemented for the following models on Hugging Face:
- [The Pythia models](https://github.com/EleutherAI/pythia)
- [MultiBERTs](https://huggingface.co/google/multiberts-seed_0)

## Example
Say you want to compute some metrics for all model checkpoints of Pythia 160m, but only seed 0.

```python
from lm_checkpoints import PythiaCheckpoints

for ckpt in PythiaCheckpoints(size=160, seed=[0]):
    # Do something with ckpt.model, ckpt.config or ckpt.tokenizer
    print(ckpt.config)
```

Or if you only want to load steps `0, 1, 2, 4, 8, 16` for all available seeds:
```python
from lm_checkpoints import PythiaCheckpoints

for ckpt in PythiaCheckpoints(size=160, step=[0, 1, 2, 4, 8, 16]):
    # Do something with ckpt.model, ckpt.config or ckpt.tokenizer
    print(ckpt.config)
```
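
The step list above is 0 followed by successive powers of two. For a longer range it may be easier to generate the list than to type it out. The sketch below uses a hypothetical helper, `log_spaced_steps`, which is not part of lm-checkpoints, and it assumes that checkpoints actually exist at the generated steps.

```python
def log_spaced_steps(max_step):
    """Return [0, 1, 2, 4, ...]: 0 plus all powers of two up to and including max_step."""
    steps = [0]
    step = 1
    while step <= max_step:
        steps.append(step)
        step *= 2
    return steps


print(log_spaced_steps(16))  # [0, 1, 2, 4, 8, 16]
```

The returned list could then be passed as the `step` argument, e.g. `step=log_spaced_steps(16)`.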

Alternatively, you may want to load all final checkpoints of MultiBERTs:
```python
from lm_checkpoints import MultiBERTCheckpoints

for ckpt in MultiBERTCheckpoints.final_checkpoints():
    # Do something with ckpt.model, ckpt.config or ckpt.tokenizer
    print(ckpt.config)
```

### Loading "chunks" of checkpoints for parallel computations
You can split the checkpoints into N "chunks", which is useful if you want to run computations in parallel:
```python
from lm_checkpoints import PythiaCheckpoints

N = 4  # Number of chunks, e.g., one per parallel job

chunks = []
checkpoints = PythiaCheckpoints(size=160, seed=[0])
for chunk in checkpoints.split(N):
    chunks.append(chunk)
```
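
To illustrate what chunking means, here is a sketch that operates on a plain list of step numbers. `split_evenly` is a hypothetical helper written for this illustration; how `split` distributes checkpoints internally may well differ.

```python
def split_evenly(items, n):
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    base, extra = divmod(len(items), n)
    chunks = []
    start = 0
    for i in range(n):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


steps = [0, 1, 2, 4, 8, 16, 32]
print(split_evenly(steps, 3))  # [[0, 1, 2], [4, 8], [16, 32]]
```

Each chunk could then be handed to a separate worker or job.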

### Dealing with limited disk space
If you don't want the checkpoints to fill up your disk, pass `clean_cache=True` to delete earlier checkpoints while iterating over these models (NB: they have to be downloaded again if you rerun the code!):
```python
from lm_checkpoints import PythiaCheckpoints

for ckpt in PythiaCheckpoints(size=14, clean_cache=True):
    # Do something with ckpt.model, ckpt.config or ckpt.tokenizer
    print(ckpt.config)
```

### Evaluating checkpoints using lm-evaluation-harness
If you install lm-checkpoints with the `eval` option (`pip install "lm-checkpoints[eval]"`), you can use the `evaluate` function to run lm-evaluation-harness for all checkpoints:
```python
from lm_checkpoints import evaluate, PythiaCheckpoints

ckpts = PythiaCheckpoints(size=14, step=[0, 1, 2, 4], seed=[0], device="cuda")

evaluate(
    ckpts,
    tasks=["triviaqa", "crows_pairs_english"],
    output_dir="test_results",
    log_samples=True,
    skip_if_exists=True,
    # limit=5,  # For testing purposes!
)
```

Or you can use the `evaluate_checkpoints` script:
```bash
evaluate_checkpoints pythia --output test_results --size 14 --seed 1 --step 0 1 2 --tasks blimp crows_pairs_english --device cuda --skip_if_exists
```

Both examples create a subdirectory structure in `test_results/` for each model and step. Each subdirectory contains a results JSON file (e.g., `results_crows_pairs_english,triviaqa.json`) and, if sample logging is enabled (`log_samples=True` or `--log_samples`), a JSONL file with the LM responses to the individual test items for each task (e.g., `samples_triviaqa.jsonl`).
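
Once an evaluation has finished, the results files can be gathered for analysis. The sketch below is not part of lm-checkpoints: `collect_results` is a hypothetical helper that relies only on the `results_*.json` naming shown above. It makes no assumption about the exact subdirectory layout or the fields inside each file.

```python
import json
from pathlib import Path


def collect_results(output_dir):
    """Map each results_*.json path found under output_dir to its parsed contents."""
    results = {}
    for path in sorted(Path(output_dir).rglob("results_*.json")):
        with path.open(encoding="utf-8") as f:
            results[str(path)] = json.load(f)
    return results


# Show which results files were found and what each file contains at the top level.
for path, content in collect_results("test_results").items():
    print(path, list(content))
```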
