Metadata-Version: 2.1
Name: blackfeed
Version: 0.0.18
Summary: A python package that allows the download of thousands of files concurrently
Home-page: https://github.com/ibragim64/blackfeed
Author: Ibragim Abubakarov
Author-email: ibragim.ai95@gmail.com
Maintainer: Ibragim Abubakarov
Maintainer-email: ibragim.ai95@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Internet :: File Transfer Protocol (FTP)
Description-Content-Type: text/markdown
Requires-Dist: requests
Requires-Dist: boto3
Requires-Dist: pysftp

# BlackFeed
> BlackFeed is a micro Python library that lets you download and upload files concurrently.
> You can download files to local storage, or upload them to your cloud without writing them to disk.

### Packages required
> Installed automatically with **pip**
- requests
- boto3
- pysftp

## Install
```bash
pip install blackfeed
```

## Usage
Download files and upload them to AWS S3.

**For this to work, the AWS CLI must be configured.**
```python
from blackfeed.downloader import Downloader
from blackfeed.adapter.s3 import S3Adapter

queue = [
    {
        'url': 'https://www.example.com/path/to/image.jpg', # Required
        'destination': 'some/key/image.jpg' # S3 key - Required 
    },{
        'url': 'https://www.example.com/path/to/image2.jpg',
        'destination': 'some/key/image2.jpg' 
    }
]

downloader = Downloader(
    S3Adapter(bucket='bucketname'),
    multi=True, # If True, uploads files to S3 with multithreading
    stateless=False, # If False, generates md5 hashes of files and stores them in a file
    state_id='flux_states', # Name of the file where hashes will be stored (states.txt) - not required
    bulksize=200 # Number of concurrent downloads
)
downloader.process(queue)
stats = downloader.get_stats() # Returns a dict with information about the process
```
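
When the list of URLs comes from elsewhere, the queue can be built programmatically. The sketch below is only an illustration: `build_queue` and `key_prefix` are hypothetical names, not part of BlackFeed, and it assumes each destination key should be the prefix joined with the file name taken from the URL.

```python
import posixpath
from urllib.parse import urlparse


def build_queue(urls, key_prefix):
    """Return a list of {'url', 'destination'} dicts, one per URL.

    Hypothetical helper, not part of BlackFeed.
    """
    queue = []
    for url in urls:
        # Use the last path segment of the URL as the file name.
        filename = posixpath.basename(urlparse(url).path)
        if not filename:
            # Skip URLs that do not end in a file name (e.g. "https://host/").
            continue
        queue.append({
            'url': url,
            'destination': posixpath.join(key_prefix, filename),
        })
    return queue


queue = build_queue(
    [
        'https://www.example.com/path/to/image.jpg',
        'https://www.example.com/path/to/image2.jpg',
    ],
    key_prefix='some/key',
)
```

The resulting `queue` has the same shape as the hand-written one above and can be passed to `downloader.process(queue)`.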

### Download files with states
Loading states is useful if you don't want to download the same file twice.
```python
from blackfeed.downloader import Downloader
from blackfeed.adapter.s3 import S3Adapter

queue = [
...
]

downloader = Downloader(
    S3Adapter(bucket='bucketname'),
    multi=True,
    stateless=False,
    state_id='filename'
)

# You can add a callback function if needed
# This function will be called after each bulk is processed
def callback(responses):
    # response: {
    #    'destination': destination of the file can be local or can be S3 key,
    #    'url': URL from where the file was downloaded,
    #    'httpcode': HTTP code returned by the server,
    #    'status': True|False,
    #    'content-type': Mime type of the downloaded resource Example: image/jpeg
    # }
    # responses: response[]

    pass # Your logic

downloader.set_callback(callback)

downloader.load_states('filename') # This will load states from "filename.txt"
downloader.process(queue)
stats = downloader.get_stats() # Statistics 
```
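
As an illustration of what a callback might do, the sketch below records every response whose `status` is falsy so failed downloads can be inspected or retried later. It relies only on the response fields documented in the comment above; `failed` and `collect_failures` are hypothetical names chosen for this example.

```python
# Accumulates (url, httpcode) pairs for downloads that did not succeed.
failed = []


def collect_failures(responses):
    """Callback sketch: remember every response whose status is falsy."""
    for response in responses:
        if not response.get('status'):
            failed.append((response.get('url'), response.get('httpcode')))


# Simulated bulk, shaped like the documented response dicts.
collect_failures([
    {'destination': 'some/key/a.jpg', 'url': 'https://www.example.com/a.jpg',
     'httpcode': 200, 'status': True, 'content-type': 'image/jpeg'},
    {'destination': 'some/key/b.jpg', 'url': 'https://www.example.com/b.jpg',
     'httpcode': 404, 'status': False, 'content-type': 'text/html'},
])
```

In real use the function would be registered with `downloader.set_callback(collect_failures)` instead of being called by hand, and `failed` would be read after `downloader.process(queue)` returns.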

## ElasticDownloader
> Lets you download/retrieve files from FTP, SFTP and HTTP/S servers easily.

### Examples
#### Downloading a file from FTP
```python
from blackfeed.elasticdownloader import ElasticDownloader

uri = 'ftp://user:password@ftp.server.com/path/to/file.csv'

retriever = ElasticDownloader()
res = retriever.download(uri, localpath='/tmp/myfile.csv') # localpath is optional
# .download() returns False if there was an error, or the local path of the downloaded file on success.
print(res)
```
```bash
/tmp/myfile.csv
```

#### Retrieving the binary content of a file from FTP
```python
from blackfeed.elasticdownloader import ElasticDownloader

uri = 'ftp://user:password@ftp.server.com/path/to/file.csv'

retriever = ElasticDownloader()
res = retriever.retrieve(uri) # Return type: io.BytesIO | False

if res is not False: # retrieve() may return False instead of a buffer
    with open('/tmp/myfile.csv', 'wb') as f:
        f.write(res.getvalue())
```
**ElasticDownloader** handles FTP, SFTP and HTTP URIs automatically.
Use the **download** method to download a file locally, and the **retrieve** method to get the binary content of a file.
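
Because `download` signals failure by returning `False` rather than raising, callers may prefer to wrap it. The following is a minimal sketch, assuming only the `download(uri, localpath=...)` call shown above; `download_or_raise` and `redact` are hypothetical helpers, not part of BlackFeed. The credentials are stripped from the URI before it appears in an error message.

```python
from urllib.parse import urlsplit, urlunsplit


def redact(uri):
    """Return the URI without its user:password part (hypothetical helper)."""
    parts = urlsplit(uri)
    host = parts.hostname or ''
    if parts.port:
        host = f'{host}:{parts.port}'
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def download_or_raise(retriever, uri, localpath=None):
    """Call retriever.download() and turn a False result into an exception."""
    if localpath is None:
        result = retriever.download(uri)
    else:
        result = retriever.download(uri, localpath=localpath)
    if result is False:
        raise RuntimeError(f'Download failed for {redact(uri)}')
    return result  # Local path of the downloaded file


# Usage, assuming blackfeed is installed:
# from blackfeed.elasticdownloader import ElasticDownloader
# path = download_or_raise(ElasticDownloader(), 'ftp://user:password@ftp.server.com/path/to/file.csv')
```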

