Metadata-Version: 2.4
Name: huggingfacesync
Version: 0.1.4
Summary: Manifest-driven Hugging Face upload/download CLI.
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: huggingface_hub>=1.0.0

# huggingfacesync (alias: hfsync)

Manifest-driven CLI for syncing files between a local folder and a Hugging Face Hub repo.

## What it does

- Upload files matched by manifest patterns.
- Download files matched by manifest patterns.
- Read default repo config from manifest (`repo_id`, `repo_type`).
- Skip missing local patterns during upload.
- Skip missing remote files during download.

## Authentication

Login before syncing:

```bash
huggingface-cli login
```

Or set token in environment:

```bash
export HF_TOKEN=your_token
```

## Install

```bash
python -m pip install --upgrade pip
python -m pip install "git+https://github.com/cplusx/hfsync.git"
```

Runtime dependency `huggingface_hub` is installed automatically by pip.

Then run:

```bash
huggingfacesync --help
hfsync --help
```

Both command names are available and equivalent:

- `huggingfacesync`
- `hfsync`

If your shell cannot find these commands, use:

```bash
python -m upload_to_hf --help
```

## Manifest format (`.hfupload`)

If `.hfupload` does not exist, `huggingfacesync` (or `hfsync`) will auto-create a template file and exit.
If the file already exists, it is never overwritten.

```txt
# hfsync manifest
# repo_id=<your-namespace>/<your-repo>
# repo_type=dataset

# One glob per line
data/**
artifacts/**/*.json
metadata/*.csv
```

### Config keys

- `repo_id`: Hugging Face repo id, e.g. `alice/my_dataset`.
- `repo_type`: one of `dataset`, `model`, `space` (default: `dataset`).

You can still override from CLI:

```bash
hfsync upload --repo-id alice/another_repo
```

## Usage

```bash
# Upload matched local files
hfsync upload

# Download matched remote files
hfsync download

# Use custom manifest path
hfsync --manifest path/to/.hfupload upload
hfsync --manifest path/to/.hfupload download

# Preview only
hfsync upload --dry-run
hfsync download --dry-run
```

### Large upload strategy

`upload` supports automatic switching to `upload_large_folder`:

```bash
# default: auto
hfsync upload

# force large-folder uploader
hfsync upload --large-upload always

# force regular uploader
hfsync upload --large-upload never
```

Auto mode switches to large uploader when either threshold is exceeded:

- `--large-files-threshold` (default `3000`)
- `--large-bytes-threshold` (default `1000000000`)

You can also set `--large-num-workers` for `upload_large_folder`.

## Notes

- Upload uses explicit matched files, so patterns that match nothing are reported and skipped.
- Download first lists remote files then filters with manifest patterns.
- Keep `.hfupload` under version control for reproducible sync behavior.

For project maintainers, release instructions are in `MAINTAINING.md`.
