Metadata-Version: 2.4
Name: fetchanything
Version: 0.2.0
Summary: A command-line tool to fetch files from websites recursively
Home-page: https://github.com/yourusername/fetchanything
Author: Chao-Chung Kuo
Author-email: chao-chung.kuo@rwth-aachen.de
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.31.0
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: tqdm>=4.66.0
Requires-Dist: argparse>=1.4.0
Requires-Dist: urllib3>=2.0.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# FetchAnything

A command-line tool to fetch files from websites recursively.

## Installation

You can install FetchAnything using pip:

```bash
pip install fetchanything
```

Or from source:

```bash
git clone https://github.com/yourusername/fetchanything.git
cd fetchanything
pip install -e .
```

## Usage

Basic usage:

```bash
fetchanything <URL> [options]
```

### Options

- `-l, --level LEVEL`: Maximum crawl depth (default: 2)
- `-f, --filter PATTERN`: Glob-style file pattern to match (e.g., "*.pdf", "*.jpg")
- `-u, --url-pattern PATTERN`: Regex pattern to match URLs for crawling (e.g., ".*/blog/.*")
- `-o, --out DIRECTORY`: Output directory (default: downloads)
- `-v, --verbose`: Enable verbose output
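
The two pattern options use different syntaxes: `--filter` takes a glob, while `--url-pattern` takes a regular expression. The sketch below illustrates the difference using Python's standard library. It is not taken from FetchAnything's source; the helper names `matches_filter` and `should_crawl` are hypothetical, and the choice of `fnmatchcase` and `re.match` (anchored at the start of the URL) is an assumption about how such matching might work.

```python
import re
from fnmatch import fnmatchcase
from posixpath import basename
from typing import Optional
from urllib.parse import urlparse


def matches_filter(url: str, pattern: str) -> bool:
    """Glob-match the last path component of a URL (hypothetical helper)."""
    filename = basename(urlparse(url).path)
    return fnmatchcase(filename, pattern)


def should_crawl(url: str, url_pattern: Optional[str]) -> bool:
    """Regex-match the full URL; no pattern means every URL is accepted.

    Assumption: the regex is anchored at the start of the URL (re.match).
    """
    if url_pattern is None:
        return True
    return re.match(url_pattern, url) is not None


if __name__ == "__main__":
    print(matches_filter("https://example.com/docs/report.pdf", "*.pdf"))  # True
    print(matches_filter("https://example.com/img/photo.jpg", "*.pdf"))    # False
    print(should_crawl("https://example.com/blog/post-1", ".*/blog/.*"))   # True
    print(should_crawl("https://example.com/about", ".*/blog/.*"))         # False
```

A glob such as `*.pdf` is compared against a file name only, whereas a regex such as `.*/blog/.*` is compared against the whole URL, which is why the two options are quoted differently in the examples.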

### Examples

1. Download all PDF files from a website up to depth 2:
```bash
fetchanything https://example.com --level 2 --filter "*.pdf" --out download_pdf
```

2. Download all files from a website up to depth 1:
```bash
fetchanything https://example.com --level 1 --out downloads
```

3. Download all JPEG images with verbose output:
```bash
fetchanything https://example.com --filter "*.jpg" -v
```

4. Download PDFs only from blog pages:
```bash
fetchanything https://example.com --filter "*.pdf" --url-pattern ".*/blog/.*"
```

5. Download files only from a specific subdomain:
```bash
fetchanything https://example.com --url-pattern "https://docs\\.example\\.com/.*"
```
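
The dots in the subdomain pattern are escaped because an unescaped `.` in a regular expression matches any character. The short check below shows the difference, assuming the pattern is interpreted with Python's `re` module (an assumption; the README does not state which regex engine is used). The look-alike host name is made up for illustration.

```python
import re

# Escaped dots match only a literal "."
escaped = re.compile(r"https://docs\.example\.com/.*")
# Unescaped dots match any single character
unescaped = re.compile(r"https://docs.example.com/.*")

real = "https://docs.example.com/guide.pdf"
lookalike = "https://docs-example.com/guide.pdf"  # hypothetical look-alike host

print(bool(escaped.match(real)))         # True
print(bool(escaped.match(lookalike)))    # False
print(bool(unescaped.match(lookalike)))  # True
```

Inside double quotes in bash, `\\.` reaches the program as `\.`, so the shell command above passes the escaped form.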

## Features

- Recursive website crawling with depth control
- File pattern matching
- URL pattern filtering
- Progress tracking with tqdm
- Verbose logging option
- Persistent HTTP sessions
- Error handling and graceful interruption
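
To make "depth control" concrete, here is a minimal breadth-first crawl over an in-memory link map, with no network access. It is a sketch under stated assumptions, not FetchAnything's implementation: the function `crawl`, the `links` dictionary, and the convention that the start page is depth 0 are all hypothetical.

```python
from collections import deque
from typing import Dict, List


def crawl(start: str, links: Dict[str, List[str]], max_depth: int) -> List[str]:
    """Return pages reachable from `start` within `max_depth` link hops.

    Assumption: the start page counts as depth 0.
    """
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for nxt in links.get(url, []):
            if nxt not in seen:  # skip pages already queued or visited
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order


# Hypothetical site structure: page -> pages it links to
site = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/b": ["/"],
    "/a/1": ["/a/1/x"],
}

print(crawl("/", site, 1))  # ['/', '/a', '/b']
print(crawl("/", site, 2))  # ['/', '/a', '/b', '/a/1']
```

Tracking visited pages keeps the crawl from looping when pages link back to each other, as `/b` does here.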

## Requirements

- Python 3.7 or higher
- requests
- beautifulsoup4
- tqdm
- urllib3

## License

Released under the MIT License. See the `LICENSE` file for details.
